Aldo Gonzalez
a662d91db1
✨ feat(source-s3): adjust file record message protocol (#57498)
2025-05-05 10:00:17 -06:00
Aaron ("AJ") Steers
83ecbe0fc3
CI: apply pre-commit format fix from #49806 (#49852)
2024-12-18 14:05:43 -08:00
Aldo Gonzalez
ce34807b4e
✨ Feat(source-s3): add file-transfer feature to s3 (#48346)
2024-11-12 09:27:05 -06:00
Ella Rohm-Ensing
9b19d9078f
update source-s3 to use airbyte-cdk version with protocol pydantic v2 (#39573)
2024-06-27 02:51:15 +02:00
Anton Karpets
9c8d0c19df
✨ Source S3: avoid error on empty stream when running discover (#38674)
2024-05-28 15:59:04 +03:00
Catherine Noll
6ed63f5b2c
Source S3: run incremental syncs with concurrency (#34895)
2024-02-25 12:53:39 -05:00
Roman Yermilov [GL]
bbb06b866f
Source S3: add filter by start date (#35392)
2024-02-20 19:44:45 +01:00
Catherine Noll
53d71f94e8
Source S3: updates for compatibility with the concurrent CDK (#34591)
2024-02-05 10:01:28 -05:00
Anatolii Yatsuk
63c6961e78
✨ Source S3: Add IAM Role Authentication (#33818)
2024-01-17 19:15:09 +02:00
Ella Rohm-Ensing
ac3eb28de2
airbyte-ci: add format commands (#31831)
...
Co-authored-by: Ben Church <ben@airbyte.io>
Co-authored-by: bnchrch <bnchrch@users.noreply.github.com>
Co-authored-by: alafanechere <augustin.lafanechere@gmail.com>
Co-authored-by: Augustin <augustin@airbyte.io>
Co-authored-by: Marius Posta <marius@airbyte.io>
Co-authored-by: alafanechere <alafanechere@users.noreply.github.com>
2023-11-14 02:17:48 -06:00
Anatolii Yatsuk
21cfb2a083
✨ Source S3: Add HTTPS validation for S3 endpoint (#32109)
2023-11-10 12:33:38 +02:00
Joe Reuter
7c7acade71
S3 and Azure Blob Storage: Update File CDK to support document file types (#31904)
...
Co-authored-by: alafanechere <augustin.lafanechere@gmail.com>
2023-10-31 11:21:22 +01:00
Joe Reuter
68e99ce224
🎉 Source S3: Reduce image size and add acceptance test (#31654)
...
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-10-25 06:44:00 -04:00
Anatolii Yatsuk
951605ae8a
✨ Source S3: Add reading files inside zip archive (#31340)
2023-10-18 11:53:01 +03:00
Joe Reuter
0a01bc26f4
S3: Basic unstructured file support (#31209)
...
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-10-17 15:18:27 +02:00
Marius Posta
7ae97175a6
gradle: fix repo wide behaviour (#30607)
2023-09-28 05:01:13 -07:00
Maxime Carbonneau-Leclerc
2954cbb7ce
✨ Source S3: remove streams.*.file_type from source-s3 configuration (#30476)
2023-09-18 09:34:26 -04:00
Maxime Carbonneau-Leclerc
4e7c70f767
✨ Source S3: v4 rollout - take 3 (#30153)
...
Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com>
2023-09-05 14:33:36 -04:00
Marius Posta
f5c7c1c0b8
chore: get ./gradlew format to pass for the whole repo (same java style) (#29786)
2023-08-24 05:09:42 -05:00
Maxime Carbonneau-Leclerc
40b76a7813
✨ Source S3: v4 rollout/feature parity (#29753)
2023-08-23 11:30:08 -04:00
Catherine Noll
620a941d21
Source S3: don't require history to be present to identify legacy state format (#29520)
2023-08-18 17:35:10 +00:00
Alexandre Girard
7e95c1d175
🐛 Source S3 (V4): Ensure all files are not resync'd when migrating from v3 to v4 (#29418)
2023-08-15 18:11:15 -07:00
Catherine Noll
a29dbdfe04
Source S3: handle legacy path_prefix + path_patterns (#29382)
2023-08-15 18:45:43 -04:00
Alexandre Girard
690479d221
✨ Source S3 (v4): Set decimal_as_float to True for parquet files (#29342)
...
* [ISSUE #28893] infer csv schema
* [ISSUE #28893] align with pyarrow
* Automated Commit - Formatting Changes
* [ISSUE #28893] legacy inference and infer only when needed
* [ISSUE #28893] fix scenario tests
* [ISSUE #28893] using discovered schema as part of read
* [ISSUE #28893] self-review + cleanup
* [ISSUE #28893] fix test
* [ISSUE #28893] code review part #1
* [ISSUE #28893] code review part #2
* Fix test
* formatcdk
* [ISSUE #28893] code review
* FIX test log level
* Re-adding failing tests
* [ISSUE #28893] improve inferrence to consider multiple types per value
* set decimal_as_float to True
* update
* Automated Commit - Formatting Changes
* add file adapters for avro, csv, jsonl, and parquet
* fix try catch
* update
* format
* pr feedback with a few additional default options set
---------
Co-authored-by: maxi297 <maxime@airbyte.io>
Co-authored-by: maxi297 <maxi297@users.noreply.github.com>
Co-authored-by: brianjlai <brian.lai@airbyte.io>
2023-08-14 20:13:52 -05:00
Brian Lai
82b8274063
✨ [file-based cdk] S3 file format adapter (#29353)
...
* [ISSUE #28893] infer csv schema
* [ISSUE #28893] align with pyarrow
* Automated Commit - Formatting Changes
* [ISSUE #28893] legacy inference and infer only when needed
* [ISSUE #28893] fix scenario tests
* [ISSUE #28893] using discovered schema as part of read
* [ISSUE #28893] self-review + cleanup
* [ISSUE #28893] fix test
* [ISSUE #28893] code review part #1
* [ISSUE #28893] code review part #2
* Fix test
* formatcdk
* [ISSUE #28893] code review
* FIX test log level
* Re-adding failing tests
* [ISSUE #28893] improve inferrence to consider multiple types per value
* Automated Commit - Formatting Changes
* add file adapters for avro, csv, jsonl, and parquet
* fix try catch
* pr feedback with a few additional default options set
* fix things from the rebase of master
---------
Co-authored-by: maxi297 <maxime@airbyte.io>
Co-authored-by: maxi297 <maxi297@users.noreply.github.com>
2023-08-14 18:47:08 -04:00
Catherine Noll
6946052513
Source S3: maintain backwards compatibility between V3 & V4 state messages (#29028)
2023-08-11 11:38:43 -04:00
Brian Lai
0543099b4d
✨ [file based cdk] S3 legacy config adapter (#29145)
...
* s3 adapter
* pr feedback and updates after rebasing master
* add comment
* formatting
2023-08-09 19:09:47 -04:00
Alexandre Girard
0aa86cf156
File-based CDK + Source S3 (v4): Pass configured file encoding to stream reader (#29110)
...
* Add encoding to open_file interface
* pass the encoding set in the config
* cleanup
* cleanup
* Automated Commit - Formatting Changes
* Add missing test
* Automated Commit - Formatting Changes
* Update infer_schema too
* Automated Commit - Formatting Changes
* Update unit test
* add a unit test
* fix
* format
* format
* remove newline
* use a mock
* fix
* format
---------
Co-authored-by: girarda <girarda@users.noreply.github.com>
2023-08-09 09:05:06 -05:00
Brian Lai
b8d5ca77db
🐛 [file based cdk] Fix S3 and abstract spec to be compatible with Airbyte UI and CAT (#29075)
...
* remove version, make validation_policy enum, fix input_schema for s3 and abstract file based configs
* remove multiple file format options from stream config
* pr feedback
* fix tests after rebase
* additional spec changes to work with the UI
* fix tests post-rebase
* fix tests post-rebase and cleanup
* formatting
2023-08-08 18:10:05 -04:00
Catherine Noll
57d3dafe16
Source S3: basic structure using file-based CDK (#28786)
2023-08-01 12:45:17 -04:00