Oliver Meyer
5975c323d8
🐛 Source S3: fix datetime format string in FileStream ( #23195 )
...
* Fix datetime format string in FileStream
* Update changelog
* Fix integration tests
* Localize datetime objects
* Bump Dockerfile version
* auto-bump connector version
---------
Co-authored-by: Nataly Merezhuk <65251165+natalyjazzviolin@users.noreply.github.com >
Co-authored-by: sh4sh <6833405+sh4sh@users.noreply.github.com >
Co-authored-by: Evan Tahler <evan@airbyte.io >
Co-authored-by: Augustin <augustin@airbyte.io >
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com >
2023-03-16 11:40:31 -07:00
Denys Davydov
3eecf5408c
Source S3: infer schema of the first file only ( #23189 )
...
* #1470 Source S3: infer schema of the first file
* #1470 source s3: upd changelog
* #1470 source s3: review fixes
* #1470 source s3: review fixes
* #1470 source s3: bump version
* #1470 source s3: review fixes
* auto-bump connector version
---------
Co-authored-by: Serhii Lazebnyi <53845333+lazebnyi@users.noreply.github.com >
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com >
2023-03-14 20:09:15 +02:00
Sophia Wiley
5512befeb1
Docs: updated links from .io to .com ( #23652 )
...
* updated links
* edited contributors link
* deleted line about CDK in docs
2023-03-06 17:27:55 +01:00
Baz
6a6039bbc5
🐛 Source S3: Make Advanced Reader Options and Advanced Options truly Optional ( #23669 )
2023-03-03 15:12:49 +02:00
Artem Inzhyyants
f83621ae05
Source S3: fix error handling: raise error on guessing file schema ( #23502 )
...
* Source S3: fix error handling: raise error on guessing file schema
* Source S3: update docs
* auto-bump connector version
---------
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com >
2023-02-27 19:19:52 +01:00
Denys Davydov
e17464703d
Source s3: fix avro discovery ( #23198 )
...
* #23197 source s3: fix avro discovery
* #23197 source s3: upd changelog
* #23197 source s3: add allowed hosts
* #23197 source s3: fix tests
* #23197 - fix build: formatting
* auto-bump connector version
---------
Co-authored-by: Serhii Lazebnyi <53845333+lazebnyi@users.noreply.github.com >
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com >
2023-02-24 12:37:00 +02:00
Denys Davydov
3dc79f5a99
Source S3: speed up discovery ( #22500 )
...
* #1470 source S3: speed up discovery
* #1470 source s3: upd changelog
* auto-bump connector version
---------
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com >
2023-02-09 21:44:48 +02:00
Denys Davydov
fcd3b0334e
Source S3: validate CSV read options and convert options ( #22550 )
...
* #1467 source S3: validate CSV read options and convert options
* #1467 source S3: upd changelog
* #1467 source s3: review fixes
* auto-bump connector version
---------
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com >
2023-02-09 18:27:25 +02:00
Joe Reuter
6e373435f2
Small spec fixes to make sure they work with connector form UI ( #21587 )
2023-01-25 19:43:26 +01:00
Roman Yermilov [GL]
04a77ad3aa
Source S3: keep processing but warn if OSError happens ( #21604 )
...
* Source S3: keep processing but warn if OSError happens
* Source S3: bump version and update changelog
* auto-bump connector version
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com >
2023-01-24 20:00:51 +04:00
Artem Inzhyyants
31edbd8bae
Source S3: update block size for json ( #21210 )
...
* Source S3: update block size for json
* Source S3: update docs
* auto-bump connector version
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com >
2023-01-10 19:53:42 +00:00
Amruta Ranade
cae63965bd
Deployment docs and sidebar cleanup ( #20965 )
2023-01-03 19:18:35 +05:30
Artem Inzhyyants
09cfcbf599
🐛 Source S3: Check config settings for CSV file format ( #20262 )
...
* Source S3: get master schema on check connection
* Source S3: bump version
* Source S3: update docs
* Source S3: fix test
* Source S3: add fields validation for CSV source
* Source S3: add test
* Source S3: Refactor config validation
* Source S3: update docs
* Source S3: format
* Source S3: format
* Source S3: fix tests
* Source S3: fix tests
* Source S3: fix tests
* auto-bump connector version
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com >
2022-12-14 21:53:06 +01:00
Arnaud Jeannin
0164355635
🎨 Add oss/cloud tags on doc for GA connectors ( #19118 )
...
* feat: add cloud and oss tags
* put headers back
* fix: rm prettier style
* fix: aws styles
2022-11-17 17:01:20 +01:00
Xingyuan-Chen
425cc91c85
Source S3: Add virtual-hosted-style option ( #19006 )
...
* add virtual-hosted-style option for S3 source
* update s3 version
* auto-bump connector version
Co-authored-by: Vincent Koc <vincentkoc@ieee.org >
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com >
2022-11-08 10:48:16 -05:00
Denys Davydov
6a40ac52fe
Source S3: use AirbyteTracedException ( #18602 )
...
* #750 #837 #904 Source S3: use AirbyteTracedException
* source s3: upd changelog
* auto-bump connector version
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com >
2022-10-29 07:23:39 +03:00
Denys Davydov
5aa25a1e1a
Source S3 - fix schema inference ( #17991 )
...
* #678 oncall. Source S3 - fix schema inference
* source s3: upd changelog
* auto-bump connector version [ci skip]
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com >
2022-10-14 14:53:39 +03:00
Serhii Lazebnyi
5df66cd572
Source S3: Connector does not enforce SSL/TLS for non-S3 endpoints ( #17800 )
...
* Deleted ssl/tls flag from config
* Updated PR number
* auto-bump connector version [ci skip]
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com >
2022-10-12 16:07:22 +02:00
Augustin
ff4ea3961a
Republish connectors using CDK 0.1.88 to 0.1.89 ( #17304 )
2022-09-28 18:18:59 +02:00
Denys Davydov
9054468c21
Source s3: upgrade pyarrow ( #16921 )
...
* #423 oncall source s3: upgrade pyarrow
* source s3: upd changelog
* auto-bump connector version [ci skip]
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com >
2022-09-20 19:24:07 +03:00
Denys Davydov
4dc394cb9a
Source S3: fix reading jsonl files with nested data ( #16607 )
...
* #531 source s3: fix reading nested jsonl files
* #531 source s3: upd changelog
* oncall #531 source s3: fix sample file
* auto-bump connector version [ci skip]
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com >
2022-09-19 12:09:40 +03:00
Denys Davydov
73ba7b63d5
Source S3: choose between data types when merging master schema ( #16631 )
...
* #422 source s3: choose broadest data type when there is a mismatch during merging json schemas
* #422 source s3: upd changelog
2022-09-19 10:50:18 +03:00
Bhupesh Varshney
498d70089f
Source S3: Doc fix grammar & typo describing parquet file source ( #16264 )
2022-09-05 17:53:41 -03:00
Liren Tu
a475235d89
📝 S3 source: update doc about path_prefix + path_pattern ( #16206 )
...
* Add more explanation for path_prefix + path_pattern
* Simplify wording
* Update wording
2022-08-31 20:51:00 -07:00
Jagruti Tiwari
8288c16485
fix: replace airbyte oss with airbyte open source ( #15885 )
...
Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com >
2022-08-24 01:01:53 -03:00
sivankumar86
3d499557b7
source-S3: Support JSON format ( #14213 )
...
* json format support added
* json format support added
* code formatted
* format conversion changed
* format naming convention changed
* test case issue fixed
* test case issue resolved
* sample file and config added for integration tests
* Json doc added
Json doc added
* update
* sample file and config added for integration tests
* sample file and config added for integration tests
* update jsonl files
* review 1
* review 1
* review 1
* pyarrow version upgrade
* clean integration test folder architecture
* add timestamp record to simple_test.jsonl
* fixed integration test and parser review change
* simplify table read
* doc update
* fix specs
* user sample files
* fix sample files
* add newlines at end of files
* rename json parser
* rename jsonfile to jsonlfile
* schema inference added
* patch review fix
* Update docs/integrations/sources/s3.md
doc update
Co-authored-by: George Claireaux <george@airbyte.io >
* changing the version
* changing the title to sync with other type
* fix expected csv records
* fix expected records for avro and parquet
* review fix
* fixed master schema handling
* remove sample configs
* fix expected records
* json doc update
added more details on json parser
* fixed api name
* bump version
* auto-bump connector version [ci skip]
Co-authored-by: alafanechere <augustin.lafanechere@gmail.com >
Co-authored-by: George Claireaux <george@airbyte.io >
Co-authored-by: George Claireaux <george@claireaux.co.uk >
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com >
2022-08-01 15:48:23 +01:00
Serhii Chvaliuk
29d6a61a21
🐛 Source S3: "decimal" type added for parquet ( #14911 )
...
Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com >
2022-07-22 01:04:44 +03:00
Baz
7cf67e2c85
🐛 Source S3: fixed bug when extra columns not in master schema ( #14669 )
2022-07-13 22:56:03 +03:00
Topher Lubaway
9c6c092a22
Revert "Improving docusaurus sidebar generation ( #1927 ) ( #14369 )" ( #14596 )
...
This reverts commit a2c194a11f.
2022-07-11 15:27:14 -05:00
Mykyta Serbynevskiy
a2c194a11f
Improving docusaurus sidebar generation ( #1927 ) ( #14369 )
...
* Improving docusaurus sidebar generation (#1927 )
* Added "Career & open positions" folder to sidebar, adjusted "Project overview" folder
* Deleted "career-and-open-positions" folder from sidebar
2022-07-08 14:18:27 -05:00
Serhii Lazebnyi
f896c574d1
🎉 Source Amazon S3: Fix docs link issue ( #14397 )
...
* Fix UI connector name and link issue
* Revert name to S3
2022-07-08 15:13:00 +03:00
Serhii Lazebnyi
f9348b2251
🐛 Source Amazon S3: solve possible case of files being missed during incremental syncs ( #12568 )
...
* Added history to state
* Deleted unused import
* Rollback abnormal state file
* Rollback abnormal state file
* Fixed type error issue
* Fix state issue
* Updated after review
* Bumped version
2022-05-31 21:39:10 +03:00
Serhii Lazebnyi
91326749d9
🎉 Source Amazon S3: increase unit test coverage to at least 90% ( #11967 )
...
* Increased unittest coverage
* #11676 test coverage 85%
* #11676 unit tests 90%
* #11676 two more unit tests
* #11676 bump version
* auto-bump connector version
Co-authored-by: Denys Davydov <denys.i.davydov@globallogic.com >
Co-authored-by: Denys Davydov <davydov.den18@gmail.com >
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com >
2022-05-23 13:37:27 +03:00
Serhii Lazebnyi
225aecd37c
🐛 Source Amazon S3: Fixed empty options issue ( #12730 )
...
* Fixed empty options issue
* Update airbyte-integrations/connectors/source-s3/source_s3/utils.py
Co-authored-by: Denis Davydov <denys.i.davydov@globallogic.com >
* Bumped version
* Fix typo
* Bumped seed version
* Fix changelog
* Bumped version in docker file
* auto-bump connector version
Co-authored-by: Denis Davydov <denys.i.davydov@globallogic.com >
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com >
2022-05-11 21:21:54 +03:00
Melker Öhrman
f9188590cc
Add avro parser to s3 source ( #12602 )
...
* added MVP avro parser running fine locally
* added unit tests for avro
* added wip state of avro integration test setup
* deleted unused files
* added avro specific config path
* fixed comments. Added nested record support, simplify code and minor fixes
* bumped version + docs update
* Added working acceptance tests + format
* auto-bump connector version
Co-authored-by: George Claireaux <george@claireaux.co.uk >
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com >
2022-05-11 16:59:58 +01:00
Davin Chia
f8a35eaa80
Add Java Catalog documentation. ( #12751 )
...
Clean up and add better guidelines on how to use the Java catalogs we recently added.
Took the chance to move existing documentation to improve reading flow.
2022-05-11 13:02:07 +08:00
Serhii Lazebnyi
27e6ce2ca8
Source Amazon S3: Refactored docs ( #12534 )
...
* Refactored spec and docs
* Updated spec.json
* Rollback spec formatting
* Rollback spec formatting
* Rollback spec formatting
2022-05-09 14:56:52 +03:00
Sherif A. Nada
b8e147538c
Update various connector input configs & docs copy ( #12500 )
2022-05-04 23:37:10 -07:00
Maksym Pavlenok
91eff1dffd
🐛 Source S3: Loading of files' metadata ( #8252 )
2022-02-02 00:49:18 +02:00
Serhii Chvaliuk
c7021e6f30
🐛 Source S3: work-around for format.delimiter change '\\t' -> '\t' ( #9163 )
...
* work-around for format.delimiter '\\t' -> '\t'
Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com >
2022-01-06 20:49:55 +02:00
Augustin
14b301ce37
Update S3 and file sources docs: we do not support unstructured data ( #9192 )
2021-12-29 18:22:59 +01:00
Vadym
504580d833
Remove base-python gradle dependencies in connectors where base-python is not used ( #7499 )
...
* Remove base-python references.
* Add requirements.txt
* Fix requirements.txt blank line
* Fix source-exchange rates to common CDK approach
* Fix source-smartsheets SAT.
Fix source-exchange-rates build.gradle.
* Bump docker version
* Update source-dixa SAT config
* Fix source-exchange-rates SAT config
* Revert bump scaffold sources version
* Fix source-shortio SAT config
* Fix source-square invalid_config.json
* Fix source-us-census invalid_config.json
* Fix source-intercom versioning
2021-11-10 13:12:29 +02:00
George Claireaux
1d3a17a8fb
🎉 Source S3 - memory & performance optimisations + advanced CSV options ( #6615 )
...
* memory & performance optimisations
* address comments
* version bump
* added advanced_options for reading csv without header, and more custom pyarrow ReadOptions
* updated to use the latest airbyte-cdk
* updated docs
* bump source-s3 to 0.1.6
* remove unneeded lines
* Use the all dep ami for python builds.
* ec2-instance-id should be ec2-image-id
* ec2-instance-id should be ec2-image-id
Co-authored-by: Jingkun Zhuang <Jingkun.Zhuang@icims.com >
Co-authored-by: Davin Chia <davinchia@gmail.com >
2021-10-19 16:50:51 +01:00
Abhi Vaidyanatha
ae32ecbb27
GitBook: [master] 186 pages and 77 assets modified
2021-10-08 21:17:47 +00:00
Dmytro
6767424b6d
🎉 S3 source: add support for non-AWS S3 Storage ( #6398 )
2021-09-27 16:40:24 +03:00
Maksym Pavlenok
e5c44e64b1
🎉 Source S3: support of Parquet format ( #5305 )
...
* add parquet parser
* add integration tests for parquet formats
* add unit tests for parquet
* update docs and secrets
* fix incorrect import for tests
* add lib pandas for unit tests
* revert changes of foreign connectors
* update secret settings
* fix config values
* Update airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/formats/parquet_spec.py
Co-authored-by: George Claireaux <george@claireaux.co.uk >
* Update airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/formats/parquet_spec.py
Co-authored-by: George Claireaux <george@claireaux.co.uk >
* remove some unused default options
* update tests
* update docs
* bump its version
* fix expected test
Co-authored-by: Maksym Pavlenok <maksym.pavlenok@globallogic.com >
Co-authored-by: George Claireaux <george@claireaux.co.uk >
2021-09-05 02:40:49 +03:00
George Claireaux
137257b62b
🐛 Source S3: fixed bug where sync could hang indefinitely ( #5197 )
...
* infer schema in multi process
* use dill to pickle function
* moved funcs
* Revert "moved funcs"
This reverts commit c1739ad988.
* Revert "use dill to pickle function"
This reverts commit 52404a9f1b.
* Revert "infer schema in multi process"
This reverts commit f0fb6f66f9.
* multiprocess in csv schema infer
* simplify what happens in the multiprocess to offending code
* try this
* using tempfile
* formatting
* version bump
* changelog + formatting
* addressed review comments
* re-trigger checks
* ran testScaffoldTemplates to fix breaking check
2021-08-06 00:07:46 +01:00
George Claireaux
9e529545c2
🐛 Source S3: fixed bug in spec so that Format field displays in UI correctly ( #5135 )
...
* fixed bug in spec so that Format field displays in UI correctly
* newline & changelog
2021-08-02 17:23:10 +01:00
George Claireaux
d9f11bcf6a
🎉 New Source: S3 (+ abstract files source) ( #4990 )
...
* minor line length changes
* cdk generated source + oop structure + start of implementation
* fixed some broken syntax stuff
* pre-pyarrow convert
* introducing pyarrow
* skeleton for unit tests
* read working on multiple files
* incremental first draft
* blobfile -> fileclient
* change references of 'blob' to 'file'
* minor tidy to make draft PR
* fixes
* addressed review comments + more unit tests
* finished unit tests
* bugfixes and abstract integration tests framework
* remove old commented stuff
* docstrings
* restructure as source-s3
* Delete playground.py
* integration tests
* acceptance tests and some more reshuffling
* source S3 credentials
* change _airbyte_ columns to _ab_
* update spec with better descriptions and ordering
* created s3 source docs
* source definition
* reverse docstring change in cdk
* reverse docstring change
* reverse change
* reverse docstring change
* remove TODO comments
* add PR to changelog
* removed unused libraries
* formatting & address some review comments
* rename of files/classes for clarity
* addressing review comments
* address reviews
* add s3 source
* building spec with pydantic for provider-specific inheritance
* pydantic spec and improved path pattern with wcmatch.glob
* update path patterns info in doc
* formatting
* tests gzip and bz2 compression on csv
* updated compression support in doc
* forgot to upload bz2 test file
* added pattern validation to dataset
* formatting
* Format.
* ran testScaffoldTemplates & generated this diff
* bumped version because of documentationUrl fix
Co-authored-by: Davin Chia <davinchia@gmail.com >
2021-07-30 15:06:11 +01:00