mirror of synced 2025-12-22 19:38:29 -05:00

182 Commits

Author SHA1 Message Date
Serhii Lazebnyi
5df66cd572 Source S3: Connector does not enforce SSL/TLS for non-S3 endpoints (#17800)
* Deleted SSL/TLS flag from config

* Updated PR number

* auto-bump connector version [ci skip]

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-10-12 16:07:22 +02:00
Augustin
ff4ea3961a Republish connectors using CDK 0.1.88 to 0.1.89 (#17304) 2022-09-28 18:18:59 +02:00
Denys Davydov
9054468c21 Source s3: upgrade pyarrow (#16921)
* #423 oncall source s3: upgrade pyarrow

* source s3: upd changelog

* auto-bump connector version [ci skip]

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-09-20 19:24:07 +03:00
Denys Davydov
4dc394cb9a Source S3: fix reading jsonl files with nested data (#16607)
* #531 source s3: fix reading nested jsonl files

* #531 source s3: upd changelog

* oncall #531 source s3: fix sample file

* auto-bump connector version [ci skip]

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-09-19 12:09:40 +03:00
Denys Davydov
73ba7b63d5 Source S3: choose between data types when merging master schema (#16631)
* #422 source s3: choose broadest data type when there is a mismatch during merging json schemas

* #422 source s3: upd changelog
2022-09-19 10:50:18 +03:00
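The rule described in #16631, choosing the broadest data type when files disagree on a column's type while merging JSON schemas, can be illustrated with a small sketch. This is not the connector's code: the widening rule (`integer` and `number` widen to `number`, any other mismatch falls back to `string`) and the `broadest_type` / `merge_schemas` helpers are assumptions made for the example.

```python
from typing import Dict

# Hypothetical widening rule (an assumption, not taken from the connector):
# integer and number widen to number; any other mismatch falls back to string,
# since every value can be represented as text.
NUMERIC_TYPES = {"integer", "number"}


def broadest_type(a: str, b: str) -> str:
    """Return a type able to hold values of both a and b."""
    if a == b:
        return a
    if a in NUMERIC_TYPES and b in NUMERIC_TYPES:
        return "number"
    return "string"


def merge_schemas(master: Dict[str, str], incoming: Dict[str, str]) -> Dict[str, str]:
    """Merge one file's column -> type mapping into the master schema."""
    merged = dict(master)
    for column, col_type in incoming.items():
        if column in merged:
            # Column seen before: keep the broader of the two types.
            merged[column] = broadest_type(merged[column], col_type)
        else:
            # New column: take its type as-is.
            merged[column] = col_type
    return merged


master = {"id": "integer", "price": "integer", "active": "boolean"}
incoming = {"id": "integer", "price": "number", "active": "string", "note": "string"}
merged = merge_schemas(master, incoming)
# merged == {"id": "integer", "price": "number", "active": "string", "note": "string"}
```

Falling back to `string` trades type precision for never rejecting a record, which is one plausible reading of "broadest"; the real connector may order types differently.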
Bhupesh Varshney
498d70089f Source S3: Doc fix grammar & typo describing parquet file source (#16264) 2022-09-05 17:53:41 -03:00
Liren Tu
a475235d89 📝 S3 source: update doc about path_prefix + path_pattern (#16206)
* Add more explanation for path_prefix + path_pattern

* Simplify wording

* Update wording
2022-08-31 20:51:00 -07:00
Jagruti Tiwari
8288c16485 fix: replace airbyte oss with airbyte open source (#15885)
Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com>
2022-08-24 01:01:53 -03:00
sivankumar86
3d499557b7 source-S3: Support JSON format (#14213)
* json format support added

* json format support added

* code formatted

* format conversion changed

* format naming conversion changed

* test case issue fixed

* test case issue resolved

* sample file and config added for integration tests

* Json doc added

Json doc added

* update

* sample file and config added for integration tests

* sample file and config added for integration tests

* update jsonl files

* review 1

* review 1

* review 1

* pyarrow version upgrade

* clean integration test folder architecture

* add timestamp record to simple_test.jsonl

* fixed integration test and parser review change

* simplify table read

* doc update

* fix specs

* user sample files

* fix sample files

* add newlines at end of files

* rename json parser

* rename jsonfile to jsonlfile

* schema inference added

* patch review fix

* Update docs/integrations/sources/s3.md

doc update

Co-authored-by: George Claireaux <george@airbyte.io>

* changing the version

* changing the title to sync with other type

* fix expected csv records

* fix expected records for avro and parquet

* review fix

* fixed master schema handling

* remove sample configs

* fix expected records

* json doc update

added more details on json parser

* fixed api name

* bump version

* auto-bump connector version [ci skip]

Co-authored-by: alafanechere <augustin.lafanechere@gmail.com>
Co-authored-by: George Claireaux <george@airbyte.io>
Co-authored-by: George Claireaux <george@claireaux.co.uk>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-08-01 15:48:23 +01:00
Serhii Chvaliuk
29d6a61a21 🐛 Source S3: "decimal" type added for parquet (#14911)
Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
2022-07-22 01:04:44 +03:00
Baz
7cf67e2c85 🐛 Source S3: fixed bug when extra columns not in master schema (#14669) 2022-07-13 22:56:03 +03:00
Topher Lubaway
9c6c092a22 Revert "Improving docusaurus sidebar generation (#1927) (#14369)" (#14596)
This reverts commit a2c194a11f.
2022-07-11 15:27:14 -05:00
Mykyta Serbynevskiy
a2c194a11f Improving docusaurus sidebar generation (#1927) (#14369)
* Improving docusaurus sidebar generation (#1927)

* Added "Career & open positions" folder to sidebar, adjusted "Project overview" folder

* Deleted "career-and-open-positions" folder from sidebar
2022-07-08 14:18:27 -05:00
Serhii Lazebnyi
f896c574d1 🎉 Source Amazon S3: Fix docs link issue (#14397)
* Fix UI connector name and link issue

* Revert name to S3
2022-07-08 15:13:00 +03:00
Serhii Lazebnyi
f9348b2251 🐛 Source Amazon S3: solve possible case of files being missed during incremental syncs (#12568)
* Added history to state

* Deleted unused import

* Rollback abnormal state file

* Rollback abnormal state file

* Fixed type error issue

* Fix state issue

* Updated after review

* Bumped version
2022-05-31 21:39:10 +03:00
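#12568 tackles files being missed during incremental syncs by adding a history to the state. One way to see why a history helps: if the state holds only the latest `last_modified` value seen, a file that lands later with the same (or a slightly older) timestamp can be skipped. The sketch below keeps, next to the cursor, the set of file keys already synced per day and treats any unseen key near the cursor as new. The state layout (`cursor`, `history`), the per-day buckets, the one-day `LOOKBACK`, and both helper functions are hypothetical, not the connector's actual state format.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical state layout (an assumption for this sketch):
# {"cursor": datetime, "history": {"YYYY-MM-DD": {file keys synced that day}}}
State = Dict[str, object]

LOOKBACK = timedelta(days=1)  # assumed tolerance for late-arriving files


def files_to_sync(state: State, listing: Iterable[Tuple[str, datetime]]) -> List[str]:
    """Pick files not older than cursor - LOOKBACK, skipping keys already in the history."""
    cursor = state.get("cursor")
    history: Dict[str, Set[str]] = state.get("history", {})
    selected = []
    for key, modified in listing:
        if cursor is not None and modified < cursor - LOOKBACK:
            continue  # clearly older than anything still in play
        if key in history.get(modified.strftime("%Y-%m-%d"), set()):
            continue  # already synced in a previous run
        selected.append(key)
    return selected


def update_state(state: State, synced: Iterable[Tuple[str, datetime]]) -> State:
    """Record synced files in the history and advance the cursor."""
    history: Dict[str, Set[str]] = {
        day: set(keys) for day, keys in state.get("history", {}).items()
    }
    cursor = state.get("cursor")
    for key, modified in synced:
        history.setdefault(modified.strftime("%Y-%m-%d"), set()).add(key)
        if cursor is None or modified > cursor:
            cursor = modified
    return {"cursor": cursor, "history": history}


t0 = datetime(2022, 5, 31, 12, 0)
state = update_state({}, [("a.csv", t0)])
# b.csv arrives after the first sync but carries the same timestamp as a.csv;
# a cursor-only state would skip it, the history lets it through.
late = files_to_sync(state, [("a.csv", t0), ("b.csv", t0)])  # ["b.csv"]
```

In a real connector the history would also need pruning (for example, dropping days older than the lookback) so the state does not grow without bound.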
Serhii Lazebnyi
91326749d9 🎉 Source Amazon S3: increase unit test coverage to at least 90% (#11967)
* Increased unittest coverage

* #11676 test coverage 85%

* #11676 unit tests 90%

* #11676 two more unit tests

* #11676 bump version

* auto-bump connector version

Co-authored-by: Denys Davydov <denys.i.davydov@globallogic.com>
Co-authored-by: Denys Davydov <davydov.den18@gmail.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-05-23 13:37:27 +03:00
Serhii Lazebnyi
225aecd37c 🐛Source Amazon S3: Fixed empty options issue (#12730)
* Fixed empty options issue

* Update airbyte-integrations/connectors/source-s3/source_s3/utils.py

Co-authored-by: Denis Davydov <denys.i.davydov@globallogic.com>

* Bumped version

* Fix typo

* Bumped seed version

* Fix changelog

* Bumped version in docker file

* auto-bump connector version

Co-authored-by: Denis Davydov <denys.i.davydov@globallogic.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-05-11 21:21:54 +03:00
Melker Öhrman
f9188590cc Add avro parser to s3 source (#12602)
* added MVP avro parser running fine locally

* added unit tests for avro

* added wip state of avro integration test setup

* deleted unused files

* added avro specific config path

* fixed comments. Added nested record support, simplify code and minor fixes

* bumped version + docs update

* Added working acceptance tests + format

* auto-bump connector version

Co-authored-by: George Claireaux <george@claireaux.co.uk>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-05-11 16:59:58 +01:00
Davin Chia
f8a35eaa80 Add Java Catalog documentation. (#12751)
Clean up and add better guidelines on how to use the Java catalogs we recently added.

Took the chance to move existing documentation to improve reading flow.
2022-05-11 13:02:07 +08:00
Serhii Lazebnyi
27e6ce2ca8 Source Amazon S3: Refactored docs (#12534)
* Refactored spec and docs

* Updated spec.json

* Rollback spec formatting

* Rollback spec formatting

* Rollback spec formatting
2022-05-09 14:56:52 +03:00
Sherif A. Nada
b8e147538c Update various connector input configs & docs copy (#12500) 2022-05-04 23:37:10 -07:00
Maksym Pavlenok
91eff1dffd 🐛 Source S3: Loading of files' metadata (#8252) 2022-02-02 00:49:18 +02:00
Serhii Chvaliuk
c7021e6f30 🐛 Source S3: work-around for format.delimiter change '\\t' -> '\t' (#9163)
* work-around for format.delimiter '\\t' -> '\t'

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
2022-01-06 20:49:55 +02:00
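#9163 works around a delimiter arriving as the two-character string `\\t` (a backslash followed by `t`) instead of an actual tab character. A minimal sketch of such a normalisation, assuming the configured delimiter is a plain Python string and that decoding it with the standard `unicode_escape` codec is acceptable; the `normalize_delimiter` name is hypothetical. The demo feeds the result to the standard `csv` module, whereas the connector itself reads CSV through pyarrow.

```python
import csv
import io


def normalize_delimiter(delimiter: str) -> str:
    """Turn an escaped delimiter such as backslash + 't' into the single
    character it denotes; return anything else unchanged."""
    if len(delimiter) > 1 and delimiter.startswith("\\") and delimiter.isascii():
        try:
            decoded = delimiter.encode("ascii").decode("unicode_escape")
        except UnicodeDecodeError:
            return delimiter  # malformed escape such as "\\x": leave as-is
        if len(decoded) == 1:
            return decoded
    return delimiter


# A config value of "\\t" now splits tab-separated data correctly.
rows = list(csv.reader(io.StringIO("a\tb\n1\t2\n"), delimiter=normalize_delimiter("\\t")))
# rows == [["a", "b"], ["1", "2"]]
```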
Augustin
14b301ce37 Update S3 and file sources docs: we do not support unstructured data (#9192) 2021-12-29 18:22:59 +01:00
Vadym
504580d833 Remove base-python gradle dependencies in connectors where base-python is not used (#7499)
* Remove base-python references.

* Add requirements.txt

* Fix requirements.txt blank line

* Fix source-exchange rates to common CDK approach

* Fix source-smartsheets SAT.
Fix source-exchange-rates build.gradle.

* Bump docker version

* Update source-dixa SAT config

* Fix source-exchange-rates SAT config

* Revert bump scaffold sources version

* Fix source-shortio SAT config

* Fix source-square invalid_config.json

* Fix source-us-census invalid_config.json

* Fix source-intercom versioning
2021-11-10 13:12:29 +02:00
George Claireaux
1d3a17a8fb 🎉 Source S3 - memory & performance optimisations + advanced CSV options (#6615)
* memory & performance optimisations

* address comments

* version bump

* added advanced_options for reading csv without header, and more custom pyarrow ReadOptions

* updated to use the latest airbyte-cdk

* updated docs

* bump source-s3 to 0.1.6

* remove unneeded lines

* Use the all dep ami for python builds.

* ec2-instance-id should be ec2-image-id

* ec2-instance-id should be ec2-image-id

Co-authored-by: Jingkun Zhuang <Jingkun.Zhuang@icims.com>
Co-authored-by: Davin Chia <davinchia@gmail.com>
2021-10-19 16:50:51 +01:00
Abhi Vaidyanatha
ae32ecbb27 GitBook: [master] 186 pages and 77 assets modified 2021-10-08 21:17:47 +00:00
Dmytro
6767424b6d 🎉 S3 source: add support for non-AWS S3 Storage (#6398) 2021-09-27 16:40:24 +03:00
Maksym Pavlenok
e5c44e64b1 🎉 Source S3: support of Parquet format (#5305)
* add parquet parser

* add integration tests for parquet formats

* add unit tests for parquet

* update docs and secrets

* fix incorrect import for tests

* add lib pandas for unit tests

* revert changes of foreign connectors

* update secret settings

* fix config values

* Update airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/formats/parquet_spec.py

Co-authored-by: George Claireaux <george@claireaux.co.uk>

* Update airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/formats/parquet_spec.py

Co-authored-by: George Claireaux <george@claireaux.co.uk>

* remove some unused default options

* update tests

* update docs

* bump its version

* fix expected test

Co-authored-by: Maksym Pavlenok <maksym.pavlenok@globallogic.com>
Co-authored-by: George Claireaux <george@claireaux.co.uk>
2021-09-05 02:40:49 +03:00
George Claireaux
137257b62b 🐛 Source S3: fixed bug where sync could hang indefinitely (#5197)
* infer schema in multi process

* use dill to pickle function

* moved funcs

* Revert "moved funcs"

This reverts commit c1739ad988.

* Revert "use dill to pickle function"

This reverts commit 52404a9f1b.

* Revert "infer schema in multi process"

This reverts commit f0fb6f66f9.

* multiprocess in csv schema infer

* simplify what happens in the multiprocess to offending code

* try this

* using tempfile

* formatting

* version bump

* changelog + formatting

* addressed review comments

* re-trigger checks

* ran testScaffoldTemplates to fix breaking check
2021-08-06 00:07:46 +01:00
George Claireaux
9e529545c2 🐛 Source S3: fixed bug in spec so that Format field displays in UI correctly (#5135)
* fixed bug in spec so that Format field displays in UI correctly

* newline & changelog
2021-08-02 17:23:10 +01:00
George Claireaux
d9f11bcf6a 🎉 New Source: S3 (+ abstract files source) (#4990)
* minor line length changes

* cdk generated source + oop structure + start of implementation

* fixed some broken syntax stuff

* pre-pyarrow convert

* introducing pyarrow

* skeleton for unit tests

* read working on multiple files

* incremental first draft

* blobfile -> fileclient

* change references of 'blob' to 'file'

* minor tidy to make draft PR

* fixes

* addressed review comments + more unit tests

* finished unit tests

* bugfixes and abstract integration tests framework

* remove old commented stuff

* docstrings

* restructure as source-s3

* Delete playground.py

* integration tests

* acceptance tests and some more reshuffling

* source S3 credentials

* change _airbyte_ columns to _ab_

* update spec with better descriptions and ordering

* created s3 source docs

* source definition

* reverse docstring change in cdk

* reverse docstring change

* reverse change

* reverse docstring change

* remove TODO comments

* add PR to changelog

* removed unused libraries

* formatting & address some review comments

* rename of files/classes for clarity

* addressing review comments

* address reviews

* add s3 source

* building spec with pydantic for provider-specific inheritance

* pydantic spec and improved path pattern with wcmatch.glob

* update path patterns info in doc

* formatting

* tests gzip and bz2 compression on csv

* updated compression support in doc

* forgot to upload bz2 test file

* added pattern validation to dataset

* formatting

* Format.

* ran testScaffoldTemplates & generated this diff

* bumped version because of documentationUrl fix

Co-authored-by: Davin Chia <davinchia@gmail.com>
2021-07-30 15:06:11 +01:00