mirror of synced 2025-12-22 03:21:25 -05:00
149 Commits

Author SHA1 Message Date
Oliver Meyer
5975c323d8 🐛 Source S3: fix datetime format string in FileStream (#23195)
* Fix datetime format string in FileStream

* Update changelog

* Fix integration tests

* Localize datetime objects

* Bump Dockerfile version

* auto-bump connector version

---------

Co-authored-by: Nataly Merezhuk <65251165+natalyjazzviolin@users.noreply.github.com>
Co-authored-by: sh4sh <6833405+sh4sh@users.noreply.github.com>
Co-authored-by: Evan Tahler <evan@airbyte.io>
Co-authored-by: Augustin <augustin@airbyte.io>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-03-16 11:40:31 -07:00
Denys Davydov
3eecf5408c Source S3: infer schema of the first file only (#23189)
* #1470 Source S3: infer schema of the first file

* #1470 source s3: upd changelog

* #1470 source s3: review fixes

* #1470 source s3: review fixes

* #1470 source s3: bump version

* #1470 source s3: review fixes

* auto-bump connector version

---------

Co-authored-by: Serhii Lazebnyi <53845333+lazebnyi@users.noreply.github.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-03-14 20:09:15 +02:00
Sophia Wiley
5512befeb1 Docs: updated links from .io to .com (#23652)
* updated links

* edited contributors link

* deleted line about CDK in docs
2023-03-06 17:27:55 +01:00
Baz
6a6039bbc5 🐛 Source S3: Make Advanced Reader Options and Advanced Options truly Optional (#23669) 2023-03-03 15:12:49 +02:00
Artem Inzhyyants
f83621ae05 Source S3: fix error handling: raise error on guessing file schema (#23502)
* Source S3: fix error handling: raise error on guessing file schema

* Source S3: update docs

* auto-bump connector version

---------

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-02-27 19:19:52 +01:00
Denys Davydov
e17464703d Source s3: fix avro discovery (#23198)
* #23197 source s3: fix avro discovery

* #23197 source s3: upd changelog

* #23197 source s3: add allowed hosts

* #23197 source s3: fix tests

* #23197 - fix build: formatting

* auto-bump connector version

---------

Co-authored-by: Serhii Lazebnyi <53845333+lazebnyi@users.noreply.github.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-02-24 12:37:00 +02:00
Denys Davydov
3dc79f5a99 Source S3: speed up discovery (#22500)
* #1470 source S3: speed up discovery

* #1470 source s3: upd changelog

* auto-bump connector version

---------

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-02-09 21:44:48 +02:00
Denys Davydov
fcd3b0334e Source S3: validate CSV read options and convert options (#22550)
* #1467 source S3: validate CSV read options and convert options

* #1467 source S3: upd changelog

* #1467 source s3: review fixes

* auto-bump connector version

---------

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-02-09 18:27:25 +02:00
Joe Reuter
6e373435f2 Small spec fixes to make sure they work with connector form UI (#21587) 2023-01-25 19:43:26 +01:00
Roman Yermilov [GL]
04a77ad3aa Source S3: keep processing but warn if OSError happens (#21604)
* Source S3: keep processing but warn if OSError happens

* Source S3: bump version and update changelog

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-01-24 20:00:51 +04:00
Artem Inzhyyants
31edbd8bae Source S3: update block size for json (#21210)
* Source S3: update block size for json

* Source S3: update docs

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-01-10 19:53:42 +00:00
Amruta Ranade
cae63965bd Deployment docs and sidebar cleanup (#20965) 2023-01-03 19:18:35 +05:30
Artem Inzhyyants
09cfcbf599 🐛 Source S3: Check config settings for CSV file format (#20262)
* Source S3: get master schema on check connection

* Source S3: bump version

* Source S3: update docs

* Source S3: fix test

* Source S3: add fields validation for CSV source

* Source S3: add test

* Source S3: Refactor config validation

* Source S3: update docs

* Source S3: format

* Source S3: format

* Source S3: fix tests

* Source S3: fix tests

* Source S3: fix tests

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-12-14 21:53:06 +01:00
Arnaud Jeannin
0164355635 🎨 Add oss/cloud tags on doc for GA connectors (#19118)
* feat: add cloud and oss tags

* put headers back

* fix: rm prettier style

* fix: aws styles
2022-11-17 17:01:20 +01:00
Xingyuan-Chen
425cc91c85 Source S3: Add virtual-hosted-style option (#19006)
* add virtual-hosted-style option for S3 source

* update s3 version

* auto-bump connector version

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-11-08 10:48:16 -05:00
Denys Davydov
6a40ac52fe Source S3: use AirbyteTracedException (#18602)
* #750 # 837 #904 Source S3: use AirbyteTracedException

* source s3: upd changelog

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-10-29 07:23:39 +03:00
Denys Davydov
5aa25a1e1a Source S3 - fix schema inference (#17991)
* #678 oncall. Source S3 - fix schema inference

* source s3: upd changelog

* auto-bump connector version [ci skip]

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-10-14 14:53:39 +03:00
Serhii Lazebnyi
5df66cd572 Source S3: Connector does not enforce SSL/TLS for non-S3 endpoints (#17800)
* Deleted ssl/tls flag from config

* Updated PR number

* auto-bump connector version [ci skip]

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-10-12 16:07:22 +02:00
Augustin
ff4ea3961a Republish connectors using CDK 0.1.88 to 0.1.89 (#17304) 2022-09-28 18:18:59 +02:00
Denys Davydov
9054468c21 Source s3: upgrade pyarrow (#16921)
* #423 oncall source s3: upgrade pyarrow

* source s3: upd changelog

* auto-bump connector version [ci skip]

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-09-20 19:24:07 +03:00
Denys Davydov
4dc394cb9a Source S3: fix reading jsonl files with nested data (#16607)
* #531 source s3: fix reading nested jsonl files

* #531 source s3: upd changelog

* oncall #531 source s3: fix sample file

* auto-bump connector version [ci skip]

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-09-19 12:09:40 +03:00
Denys Davydov
73ba7b63d5 Source S3: choose between data types when merging master schema (#16631)
* #422 source s3: choose broadest data type when there is a mismatch during merging json schemas

* #422 source s3: upd changelog
2022-09-19 10:50:18 +03:00
Bhupesh Varshney
498d70089f Source S3: Doc fix grammar & typo describing parquet file source (#16264) 2022-09-05 17:53:41 -03:00
Liren Tu
a475235d89 📝 S3 source: update doc about path_prefix + path_pattern (#16206)
* Add more explanation for path_prefix + path_pattern

* Simplify wording

* Update wording
2022-08-31 20:51:00 -07:00
Jagruti Tiwari
8288c16485 fix: replace airbyte oss with airbyte open source (#15885)
Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com>
2022-08-24 01:01:53 -03:00
sivankumar86
3d499557b7 source-S3: Support JSON format (#14213)
* json format support added

* json format support added

* code formatted

* format convertion changed

* format naming convertion changed

* test case issue fixed

* test case issue resolved

* sample file and config added for integration tests

* Json doc added

Json doc added

* update

* sample file and config added for integration tests

* sample file and config added for integration tests

* update jsonl files

* review 1

* review 1

* review 1

* pyarrow version upgrade

* clean integration test folder architecture

* add timestamp record to simple_test.jsonl

* fixed integration test and parser review change

* simplify table read

* doc update

* fix specs

* user sample files

* fix sample files

* add newlines at end of files

* rename json parser

* rename jsonfile to jsonlfile

* schema inference added

* patch review fix

* Update docs/integrations/sources/s3.md

doc update

Co-authored-by: George Claireaux <george@airbyte.io>

* changing the version

* changing the title to sync with other type

* fix expected csv records

* fix expected records for avro and parquet

* review fix

* fixed master schema handling

* remove sample configs

* fix expected records

* json doc update

added more details on json parser

* fixed api name

* bump version

* auto-bump connector version [ci skip]

Co-authored-by: alafanechere <augustin.lafanechere@gmail.com>
Co-authored-by: George Claireaux <george@airbyte.io>
Co-authored-by: George Claireaux <george@claireaux.co.uk>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-08-01 15:48:23 +01:00
Serhii Chvaliuk
29d6a61a21 🐛 Source S3: "decimal" type added for parquet (#14911)
Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
2022-07-22 01:04:44 +03:00
Baz
7cf67e2c85 🐛 Source S3: fixed bug when extra columns not in master schema (#14669) 2022-07-13 22:56:03 +03:00
Topher Lubaway
9c6c092a22 Revert "Improving docusaurus sidebar generation (#1927) (#14369)" (#14596)
This reverts commit a2c194a11f.
2022-07-11 15:27:14 -05:00
Mykyta Serbynevskiy
a2c194a11f Improving docusaurus sidebar generation (#1927) (#14369)
* Improving docusaurus sidebar generation (#1927)

* Added "Career & open positions" folder to sidebar, adjusted "Project overview" folder

* Deleted "career-and-open-positions" folder from sidebar
2022-07-08 14:18:27 -05:00
Serhii Lazebnyi
f896c574d1 🎉 Source Amazon S3: Fix docs link issue (#14397)
* Fix UI connector name and link issue

* Revert name to S3
2022-07-08 15:13:00 +03:00
Serhii Lazebnyi
f9348b2251 🐛 Source Amazon S3: solve possible case of files being missed during incremental syncs (#12568)
* Added history to state

* Deleted unused import

* Rollback abnormal state file

* Rollback abnormal state file

* Fixed type error issue

* Fix state issue

* Updated after review

* Bumped version
2022-05-31 21:39:10 +03:00
Serhii Lazebnyi
91326749d9 🎉Source Amazon S3: increase unit test coverage at least 90% (#11967)
* Increased unittest coverage

* #11676 test coverage 85%

* #11676 unit tests 90%

* #11676 two more unit tests

* #11676 bump version

* auto-bump connector version

Co-authored-by: Denys Davydov <denys.i.davydov@globallogic.com>
Co-authored-by: Denys Davydov <davydov.den18@gmail.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-05-23 13:37:27 +03:00
Serhii Lazebnyi
225aecd37c 🐛 Source Amazon S3: Fixed empty options issue (#12730)
* Fixed empty options issue

* Update airbyte-integrations/connectors/source-s3/source_s3/utils.py

Co-authored-by: Denis Davydov <denys.i.davydov@globallogic.com>

* Bumped version

* Fix typo

* Bumped seed version

* Fix changelog

* Bumped version in docker file

* auto-bump connector version

Co-authored-by: Denis Davydov <denys.i.davydov@globallogic.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-05-11 21:21:54 +03:00
Melker Öhrman
f9188590cc Add avro parser to s3 source (#12602)
* added MVP avro parser running fine locally

* added unit tests for avro

* added wip state of avro integration test setup

* deleted unused files

* added avro specific config path

* fixed comments. Added nested record support, simplify code and minor fixes

* bumped version + docs update

* Added working acceptance tests + format

* auto-bump connector version

Co-authored-by: George Claireaux <george@claireaux.co.uk>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-05-11 16:59:58 +01:00
Davin Chia
f8a35eaa80 Add Java Catalog documentation. (#12751)
Clean up and add better guidelines on how to use the Java catalogs we recently added.

Took the chance to move existing documentation to improve reading flow.
2022-05-11 13:02:07 +08:00
Serhii Lazebnyi
27e6ce2ca8 Source Amazon S3: Refactored docs (#12534)
* Refactored spec and docs

* Updated spec.json

* Rollback spec formatting

* Rollback spec formatting

* Rollback spec formatting
2022-05-09 14:56:52 +03:00
Sherif A. Nada
b8e147538c Update various connector input configs & docs copy (#12500) 2022-05-04 23:37:10 -07:00
Maksym Pavlenok
91eff1dffd 🐛 Source S3: Loading of files' metadata (#8252) 2022-02-02 00:49:18 +02:00
Serhii Chvaliuk
c7021e6f30 🐛 Source S3: work-around for format.delimiter change '\\t' -> '\t' (#9163)
* work-around for format.delimiter '\\t' -> '\t'

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
2022-01-06 20:49:55 +02:00
Augustin
14b301ce37 Update S3 and file sources docs: we do not support unstructured data (#9192) 2021-12-29 18:22:59 +01:00
Vadym
504580d833 Remove base-python gradle dependencies in connectors where base-python is not used (#7499)
* Remove base-python references.

* Add requirements.txt

* Fix requirements.txt blank line

* Fix source-exchange rates to common CDK approach

* Fix source-smartsheets SAT.
Fix source-exchange-rates build.gradle.

* Bump docker version

* Update source-dixa SAT config

* Fix source-exchange-rates SAT config

* Revert bump scaffold sources version

* Fix source-shortio SAT config

* Fix source-square invalid_config.json

* Fix source-us-census invalid_config.json

* Fix source-intercom versioning
2021-11-10 13:12:29 +02:00
George Claireaux
1d3a17a8fb 🎉 Source S3 - memory & performance optimisations + advanced CSV options (#6615)
* memory & performance optimisations

* address comments

* version bump

* added advanced_options for reading csv without header, and more custom pyarrow ReadOptions

* updated to use the latest airbyte-cdk

* updated docs

* bump source-s3 to 0.1.6

* remove unneeded lines

* Use the all dep ami for python builds.

* ec2-instance-id should be ec2-image-id

* ec2-instance-id should be ec2-image-id

Co-authored-by: Jingkun Zhuang <Jingkun.Zhuang@icims.com>
Co-authored-by: Davin Chia <davinchia@gmail.com>
2021-10-19 16:50:51 +01:00
Abhi Vaidyanatha
ae32ecbb27 GitBook: [master] 186 pages and 77 assets modified 2021-10-08 21:17:47 +00:00
Dmytro
6767424b6d 🎉 S3 source: add support for non-AWS S3 Storage (#6398) 2021-09-27 16:40:24 +03:00
Maksym Pavlenok
e5c44e64b1 🎉 Source S3: support of Parquet format (#5305)
* add parquet parser

* add integration tests for parquet formats

* add unit tests for parquet

* update docs and secrets

* fix incorrect import for tests

* add lib pandas for unit tests

* revert changes of foreign connectors

* update secret settings

* fix config values

* Update airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/formats/parquet_spec.py

Co-authored-by: George Claireaux <george@claireaux.co.uk>

* Update airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/formats/parquet_spec.py

Co-authored-by: George Claireaux <george@claireaux.co.uk>

* remove some unused default options

* update tests

* update docs

* bump its version

* fix expected test

Co-authored-by: Maksym Pavlenok <maksym.pavlenok@globallogic.com>
Co-authored-by: George Claireaux <george@claireaux.co.uk>
2021-09-05 02:40:49 +03:00
George Claireaux
137257b62b 🐛 Source S3: fixed bug where sync could hang indefinitely (#5197)
* infer schema in multi process

* use dill to pickle function

* moved funcs

* Revert "moved funcs"

This reverts commit c1739ad988.

* Revert "use dill to pickle function"

This reverts commit 52404a9f1b.

* Revert "infer schema in multi process"

This reverts commit f0fb6f66f9.

* multiprocess in csv schema infer

* simplify what happens in the multiprocess to offending code

* try this

* using tempfile

* formatting

* version bump

* changelog + formatting

* addressed review comments

* re-trigger checks

* ran testScaffoldTemplates to fix breaking check
2021-08-06 00:07:46 +01:00
George Claireaux
9e529545c2 🐛 Source S3: fixed bug in spec so that Format field displays in UI correctly (#5135)
* fixed bug in spec so that Format field displays in UI correctly

* newline & changelog
2021-08-02 17:23:10 +01:00
George Claireaux
d9f11bcf6a 🎉 New Source: S3 (+ abstract files source) (#4990)
* minor line length changes

* cdk generated source + oop structure + start of implementation

* fixed some broken syntax stuff

* pre-pyarrow convert

* introducing pyarrow

* skeleton for unit tests

* read working on multiple files

* incremental first draft

* blobfile -> fileclient

* change references of 'blob' to 'file'

* minor tidy to make draft PR

* fixes

* addressed review comments + more unit tests

* finished unit tests

* bugfixes and abstract integration tests framework

* remove old commented stuff

* docstrings

* restructure as source-s3

* Delete playground.py

* integration tests

* acceptance tests and some more reshuffling

* source S3 credentials

* change _airbyte_ columns to _ab_

* update spec with better descriptions and ordering

* created s3 source docs

* source definition

* reverse docstring change in cdk

* reverse docstring change

* reverse change

* reverse docstring change

* remove TODO comments

* add PR to changelog

* removed unused libraries

* formatting & address some review comments

* rename of files/classes for clarity

* addressing review comments

* address reviews

* add s3 source

* building spec with pydantic for provider-specific inheritance

* pydantic spec and improved path pattern with wcmatch.glob

* update path patterns info in doc

* formatting

* tests gzip and bz2 compression on csv

* updated compression support in doc

* forgot to upload bz2 test file

* added pattern validation to dataset

* formatting

* Format.

* ran testScaffoldTemplates & generated this diff

* bumped version because of documentationUrl fix

Co-authored-by: Davin Chia <davinchia@gmail.com>
2021-07-30 15:06:11 +01:00