Mirror synced 2025-12-21 19:11:14 -05:00

13 Commits

Author SHA1 Message Date
Aaron ("AJ") Steers
83ecbe0fc3 CI: apply pre-commit format fix from #49806 (#49852) 2024-12-18 14:05:43 -08:00
Anton Karpets
9c8d0c19df Source S3: avoid error on empty stream when running discover (#38674) 2024-05-28 15:59:04 +03:00
Catherine Noll
6946052513 Source S3: maintain backwards compatibility between V3 & V4 state messages (#29028) 2023-08-11 11:38:43 -04:00
Artem Inzhyyants
c68afefdf0 Source S3: handle Bucket Access Errors (#27651)
* Source S3: handle bucket access errors

* Source S3: update docs
2023-06-23 13:22:57 +02:00
Artem Inzhyyants
3080f65429 Source S3: Add start date filter for files (#25010)
* Source S3: Add start date filter for files

* Source S3: add docs

* Source S3: add unittest

* Source S3: add unittest

* Source S3: add unittest

* Source S3: Fix spec test

* Source S3: bump version

* Source S3: fix tests

* Source S3: fix description

* auto-bump connector version

* Source S3: refactor start_date filtering

* Source S3: update setup

* Source S3: serialize state for cache

* Source S3: refactor skip file filter

* Source S3: bump version + update docs

* auto-bump connector version

---------

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-04-18 14:07:15 +02:00
Cole Snodgrass
2e099acc52 update headers from 2022 -> 2023 (#22594)
* It's 2023!

* 2022 -> 2023

---------

Co-authored-by: evantahler <evan@airbyte.io>
2023-02-08 13:01:16 -08:00
Marcos Marx
dca2256a7c Bump 2022 license version (#13233)
* Bump year in license short to 2022

* remove protocol from cdk
2022-05-26 15:00:42 -03:00
Maksym Pavlenok
91eff1dffd 🐛 Source S3: Loading of files' metadata (#8252) 2022-02-02 00:49:18 +02:00
Marcos Eliziario Santos
ff0c09a724 s3-source add option to not infer datatypes (#7892)
* Implement Flag to avoid inferring data types for CSV input files in s3 SOURCE

* Unit tests for the flag to avoid inferring data types for CSV input files in s3 SOURCE
Refactor parametrized tests in CSV and Parquet formats to use pytest.mark.parametrize for better error reporting on test failure.

* S3 Source, infer_datatypes flag: additional unit tests

* wrong method signature

* Refactors

* s3-source - infer_datatypes flag, fix user message

* Update airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/formats/csv_spec.py

Co-authored-by: Eugene Kulak <widowmakerreborn@gmail.com>

* s3-source - refactor - use spec defaults instead of hardcoding them in code.

* Update airbyte-integrations/connectors/source-s3/source_s3/utils.py

Co-authored-by: Eugene Kulak <widowmakerreborn@gmail.com>

* code review changes

Co-authored-by: Eugene Kulak <widowmakerreborn@gmail.com>
2021-11-30 10:03:49 -03:00
George Claireaux
1d3a17a8fb 🎉 Source S3 - memory & performance optimisations + advanced CSV options (#6615)
* memory & performance optimisations

* address comments

* version bump

* added advanced_options for reading csv without header, and more custom pyarrow ReadOptions

* updated to use the latest airbyte-cdk

* updated docs

* bump source-s3 to 0.1.6

* remove unneeded lines

* Use the all dep ami for python builds.

* ec2-instance-id should be ec2-image-id

* ec2-instance-id should be ec2-image-id

Co-authored-by: Jingkun Zhuang <Jingkun.Zhuang@icims.com>
Co-authored-by: Davin Chia <davinchia@gmail.com>
2021-10-19 16:50:51 +01:00
Michel Tricot
1773e41e47 Shorten our headers + adds contributors file (#6478) 2021-09-27 10:45:50 -07:00
Dmytro
6767424b6d 🎉 S3 source: add support for non-AWS S3 Storage (#6398) 2021-09-27 16:40:24 +03:00
George Claireaux
d9f11bcf6a 🎉 New Source: S3 (+ abstract files source) (#4990)
* minor line length changes

* cdk generated source + oop structure + start of implementation

* fixed some broken syntax stuff

* pre-pyarrow convert

* introducing pyarrow

* skeleton for unit tests

* read working on multiple files

* incremental first draft

* blobfile -> fileclient

* change references of 'blob' to 'file'

* minor tidy to make draft PR

* fixes

* addressed review comments + more unit tests

* finished unit tests

* bugfixes and abstract integration tests framework

* remove old commented stuff

* docstrings

* restructure as source-s3

* Delete playground.py

* integration tests

* acceptance tests and some more reshuffling

* source S3 credentials

* change _airbyte_ columns to _ab_

* update spec with better descriptions and ordering

* created s3 source docs

* source definition

* reverse docstring change in cdk

* reverse docstring change

* reverse change

* reverse docstring change

* remove TODO comments

* add PR to changelog

* removed unused libraries

* formatting & address some review comments

* rename of files/classes for clarity

* addressing review comments

* address reviews

* add s3 source

* building spec with pydantic for provider-specific inheritance

* pydantic spec and improved path pattern with wcmatch.glob

* update path patterns info in doc

* formatting

* tests gzip and bz2 compression on csv

* updated compression support in doc

* forgot to upload bz2 test file

* added pattern validation to dataset

* formatting

* Format.

* ran testScaffoldTemplates & generated this diff

* bumped version because of documentationUrl fix

Co-authored-by: Davin Chia <davinchia@gmail.com>
2021-07-30 15:06:11 +01:00