Mirror synced 2025-12-21 19:11:14 -05:00

30 Commits

Author SHA1 Message Date
Aldo Gonzalez
a662d91db1 feat(source-s3): adjust file record message protocol (#57498) 2025-05-05 10:00:17 -06:00
Aaron ("AJ") Steers
83ecbe0fc3 CI: apply pre-commit format fix from #49806 (#49852) 2024-12-18 14:05:43 -08:00
Aldo Gonzalez
ce34807b4e Feat(source-s3): add file-transfer feature to s3 (#48346) 2024-11-12 09:27:05 -06:00
Ella Rohm-Ensing
9b19d9078f update source-s3 to use airbyte-cdk version with protocol pydantic v2 (#39573) 2024-06-27 02:51:15 +02:00
Anton Karpets
9c8d0c19df Source S3: avoid error on empty stream when running discover (#38674) 2024-05-28 15:59:04 +03:00
Catherine Noll
6ed63f5b2c Source S3: run incremental syncs with concurrency (#34895) 2024-02-25 12:53:39 -05:00
Roman Yermilov [GL]
bbb06b866f Source S3: add filter by start date (#35392) 2024-02-20 19:44:45 +01:00
Catherine Noll
53d71f94e8 Source S3: updates for compatibility with the concurrent CDK (#34591) 2024-02-05 10:01:28 -05:00
Anatolii Yatsuk
63c6961e78 Source S3: Add IAM Role Authentication (#33818) 2024-01-17 19:15:09 +02:00
Ella Rohm-Ensing
ac3eb28de2 airbyte-ci: add format commands (#31831)
Co-authored-by: Ben Church <ben@airbyte.io>
Co-authored-by: bnchrch <bnchrch@users.noreply.github.com>
Co-authored-by: alafanechere <augustin.lafanechere@gmail.com>
Co-authored-by: Augustin <augustin@airbyte.io>
Co-authored-by: Marius Posta <marius@airbyte.io>
Co-authored-by: alafanechere <alafanechere@users.noreply.github.com>
2023-11-14 02:17:48 -06:00
Anatolii Yatsuk
21cfb2a083 Source S3: Add HTTPS validation for S3 endpoint (#32109) 2023-11-10 12:33:38 +02:00
Joe Reuter
7c7acade71 S3 and Azure Blob Storage: Update File CDK to support document file types (#31904)
Co-authored-by: alafanechere <augustin.lafanechere@gmail.com>
2023-10-31 11:21:22 +01:00
Joe Reuter
68e99ce224 🎉 Source S3: Reduce image size and add acceptance test (#31654)
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-10-25 06:44:00 -04:00
Anatolii Yatsuk
951605ae8a Source S3: Add reading files inside zip archive (#31340) 2023-10-18 11:53:01 +03:00
Joe Reuter
0a01bc26f4 S3: Basic unstructured file support (#31209)
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-10-17 15:18:27 +02:00
Marius Posta
7ae97175a6 gradle: fix repo wide behaviour (#30607) 2023-09-28 05:01:13 -07:00
Maxime Carbonneau-Leclerc
2954cbb7ce Source S3: remove streams.*.file_type from source-s3 configuration (#30476) 2023-09-18 09:34:26 -04:00
Maxime Carbonneau-Leclerc
4e7c70f767 Source S3: v4 rollout - take 3 (#30153)
Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com>
2023-09-05 14:33:36 -04:00
Marius Posta
f5c7c1c0b8 chore: get ./gradlew format to pass for the whole repo (same java style) (#29786) 2023-08-24 05:09:42 -05:00
Maxime Carbonneau-Leclerc
40b76a7813 Source S3: v4 rollout/feature parity (#29753) 2023-08-23 11:30:08 -04:00
Catherine Noll
620a941d21 Source S3: don't require history to be present to identify legacy state format (#29520) 2023-08-18 17:35:10 +00:00
Alexandre Girard
7e95c1d175 🐛 Source S3 (V4): Ensure all files are not resync'd when migrating from v3 to v4 (#29418) 2023-08-15 18:11:15 -07:00
Catherine Noll
a29dbdfe04 Source S3: handle legacy path_prefix + path_patterns (#29382) 2023-08-15 18:45:43 -04:00
Alexandre Girard
690479d221 Source S3 (v4): Set decimal_as_float to True for parquet files (#29342)
* [ISSUE #28893] infer csv schema

* [ISSUE #28893] align with pyarrow

* Automated Commit - Formatting Changes

* [ISSUE #28893] legacy inference and infer only when needed

* [ISSUE #28893] fix scenario tests

* [ISSUE #28893] using discovered schema as part of read

* [ISSUE #28893] self-review + cleanup

* [ISSUE #28893] fix test

* [ISSUE #28893] code review part #1

* [ISSUE #28893] code review part #2

* Fix test

* formatcdk

* [ISSUE #28893] code review

* FIX test log level

* Re-adding failing tests

* [ISSUE #28893] improve inferrence to consider multiple types per value

* set decimal_as_float to True

* update

* Automated Commit - Formatting Changes

* add file adapters for avro, csv, jsonl, and parquet

* fix try catch

* update

* format

* pr feedback with a few additional default options set

---------

Co-authored-by: maxi297 <maxime@airbyte.io>
Co-authored-by: maxi297 <maxi297@users.noreply.github.com>
Co-authored-by: brianjlai <brian.lai@airbyte.io>
2023-08-14 20:13:52 -05:00
Brian Lai
82b8274063 [file-based cdk] S3 file format adapter (#29353)
* [ISSUE #28893] infer csv schema

* [ISSUE #28893] align with pyarrow

* Automated Commit - Formatting Changes

* [ISSUE #28893] legacy inference and infer only when needed

* [ISSUE #28893] fix scenario tests

* [ISSUE #28893] using discovered schema as part of read

* [ISSUE #28893] self-review + cleanup

* [ISSUE #28893] fix test

* [ISSUE #28893] code review part #1

* [ISSUE #28893] code review part #2

* Fix test

* formatcdk

* [ISSUE #28893] code review

* FIX test log level

* Re-adding failing tests

* [ISSUE #28893] improve inferrence to consider multiple types per value

* Automated Commit - Formatting Changes

* add file adapters for avro, csv, jsonl, and parquet

* fix try catch

* pr feedback with a few additional default options set

* fix things from the rebase of master

---------

Co-authored-by: maxi297 <maxime@airbyte.io>
Co-authored-by: maxi297 <maxi297@users.noreply.github.com>
2023-08-14 18:47:08 -04:00
Catherine Noll
6946052513 Source S3: maintain backwards compatibility between V3 & V4 state messages (#29028) 2023-08-11 11:38:43 -04:00
Brian Lai
0543099b4d [file based cdk] S3 legacy config adapter (#29145)
* s3 adapter

* pr feedback and updates after rebasing master

* add comment

* formatting
2023-08-09 19:09:47 -04:00
Alexandre Girard
0aa86cf156 File-based CDK + Source S3 (v4): Pass configured file encoding to stream reader (#29110)
* Add encoding to open_file interface

* pass the encoding set in the config

* cleanup

* cleanup

* Automated Commit - Formatting Changes

* Add missing test

* Automated Commit - Formatting Changes

* Update infer_schema too

* Automated Commit - Formatting Changes

* Update unit test

* add a unit test

* fix

* format

* format

* remove newline

* use a mock

* fix

* format

---------

Co-authored-by: girarda <girarda@users.noreply.github.com>
2023-08-09 09:05:06 -05:00
Brian Lai
b8d5ca77db 🐛 [file based cdk] Fix S3 and abstract spec to be compatible with Airbyte UI and CAT (#29075)
* remove version, make validation_policy enum, fix input_schema for s3 and abstract file based configs

* remove multiple file format options from stream config

* pr feedback

* fix tests after rebase

* additional spec changes to work with the UI

* fix tests post-rebase

* fix tests post-rebase and cleanup

* formatting
2023-08-08 18:10:05 -04:00
Catherine Noll
57d3dafe16 Source S3: basic structure using file-based CDK (#28786) 2023-08-01 12:45:17 -04:00