Mirror synced 2026-01-07 09:05:45 -05:00
Commit Graph

58 Commits

Author SHA1 Message Date
Artem Inzhyyants
c68afefdf0 Source S3: handle Bucket Access Errors (#27651)
* Source S3: handle bucket access errors

* Source S3: update docs
2023-06-23 13:22:57 +02:00
Artem Inzhyyants
0c3d4499d6 Source S3: fix start date (#27611)
* Source S3: fix start date

* Source S3: update docs

* Source S3: bump version
2023-06-22 17:17:52 +02:00
Artem Inzhyyants
eef872e9f3 Source S3: Add logging for file reading (#27604)
* Source S3: Add logging for file reading

* Source S3: update docs
2023-06-22 10:53:32 +02:00
Artem Inzhyyants
93f3286a0d 🚨🚨Source S3: use platform-handled schema evolution (#25127)
* Source S3: Remove match_target_schema; use platform-handled schema evolution instead

* Source S3: Remove ab_additional_col

* Source S3: update docs; bump version

* Source S3: fix unit tests

* Source S3: fix expected_records

* Source S3: revert _match_target_schema

* Source S3: update expected records for parquet dataset

* Source S3: update metadata

* auto-bump connector version

---------

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-05-15 17:14:26 +02:00
Artem Inzhyyants
f74d96f9e2 Source S3: support parquet dataset (#25937)
* Source S3: support parquet dataset

* Source S3: update docs

* Source S3: Fix expected records

* Source S3: Fix expected records

* Source S3: update sem version

* auto-bump connector version

---------

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-05-10 11:25:00 +02:00
Artem Inzhyyants
64726c7413 Source S3: Parse nested avro schemas (#25361)
* Source S3: Parse nested avro schemas

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
Co-authored-by: Sergey Chvalyuk <grubberr@gmail.com>
2023-05-01 22:31:25 +03:00
Artem Inzhyyants
7ce322552e 🐛 Source S3: remove minimum block size (#25706)
* Source S3: remove minimum block size

* Source S3: update docs

* auto-bump connector version

---------

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-05-01 15:00:42 +02:00
Artem Inzhyyants
e22f9e4cc0 Source S3: handle block size related errors (#25067)
* Source S3: handle pyarrow block size errors

* Source S3: bump version

* Automated Change

* Source S3: fix null field check

* Revert "Automated Change"

This reverts commit dc707f729d.

* Automated Change

* Source S3: bump version + update docs

* auto-bump connector version

---------

Co-authored-by: artem1205 <artem1205@users.noreply.github.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-04-18 16:08:23 +02:00
Artem Inzhyyants
3080f65429 Source S3: Add start date filter for files (#25010)
* Source S3: Add start date filter for files

* Source S3: add docs

* Source S3: add unittest

* Source S3: add unittest

* Source S3: add unittest

* Source S3: Fix spec test

* Source S3: bump version

* Source S3: fix tests

* Source S3: fix description

* auto-bump connector version

* Source S3: refactor start_date filtering

* Source S3: update setup

* Source S3: serialize state for cache

* Source S3: refactor skip file filter

* Source S3: bump version + update docs

* auto-bump connector version

---------

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-04-18 14:07:15 +02:00
Denys Davydov
13ac15130d Source S3: read a single record on check (#24429)
* #1697 source S3: read a single record on check

* #1697 source s3: upd changelog

* #1697 source s3: fix unit_tests

* auto-bump connector version

---------

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-03-27 12:56:48 +03:00
Denys Davydov
6a88625cca Source s3: fix datetime conversion (#24178)
* #1669 source s3: fix datetime conversion

* #1669 source s3: review fixes

* auto-bump connector version

---------

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-03-17 20:14:08 +02:00
Denys Davydov
db45f05814 Source S3: fix discovery issues (#24157)
* #1652 #1664 Source S3: fix discovery issues

* #1652 #1664 source s3: upd changelog

* #1652 #1664 source s3: review comments

* auto-bump connector version

---------

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-03-16 22:39:29 +02:00
Oliver Meyer
5975c323d8 🐛 Source S3: fix datetime format string in FileStream (#23195)
* Fix datetime format string in FileStream

* Update changelog

* Fix integration tests

* Localize datetime objects

* Bump Dockerfile version

* auto-bump connector version

---------

Co-authored-by: Nataly Merezhuk <65251165+natalyjazzviolin@users.noreply.github.com>
Co-authored-by: sh4sh <6833405+sh4sh@users.noreply.github.com>
Co-authored-by: Evan Tahler <evan@airbyte.io>
Co-authored-by: Augustin <augustin@airbyte.io>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-03-16 11:40:31 -07:00
Denys Davydov
3eecf5408c Source S3: infer schema of the first file only (#23189)
* #1470 Source S3: infer schema of the first file

* #1470 source s3: upd changelog

* #1470 source s3: review fixes

* #1470 source s3: review fixes

* #1470 source s3: bump version

* #1470 source s3: review fixes

* auto-bump connector version

---------

Co-authored-by: Serhii Lazebnyi <53845333+lazebnyi@users.noreply.github.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-03-14 20:09:15 +02:00
Baz
6a6039bbc5 🐛 Source S3: Make Advanced Reader Options and Advanced Options truly Optional (#23669)
2023-03-03 15:12:49 +02:00
Artem Inzhyyants
f83621ae05 Source S3: fix error handling: raise error on guessing file schema (#23502)
* Source S3: fix error handling: raise error on guessing file schema

* Source S3: update docs

* auto-bump connector version

---------

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-02-27 19:19:52 +01:00
Denys Davydov
e17464703d Source s3: fix avro discovery (#23198)
* #23197 source s3: fix avro discovery

* #23197 source s3: upd changelog

* #23197 source s3: add allowed hosts

* #23197 source s3: fix tests

* #23197 - fix build: formatting

* auto-bump connector version

---------

Co-authored-by: Serhii Lazebnyi <53845333+lazebnyi@users.noreply.github.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-02-24 12:37:00 +02:00
Denys Davydov
3dc79f5a99 Source S3: speed up discovery (#22500)
* #1470 source S3: speed up discovery

* #1470 source s3: upd changelog

* auto-bump connector version

---------

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-02-09 21:44:48 +02:00
Denys Davydov
fcd3b0334e Source S3: validate CSV read options and convert options (#22550)
* #1467 source S3: validate CSV read options and convert options

* #1467 source S3: upd changelog

* #1467 source s3: review fixes

* auto-bump connector version

---------

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-02-09 18:27:25 +02:00
Cole Snodgrass
2e099acc52 update headers from 2022 -> 2023 (#22594)
* It's 2023!

* 2022 -> 2023

---------

Co-authored-by: evantahler <evan@airbyte.io>
2023-02-08 13:01:16 -08:00
Joe Reuter
6e373435f2 Small spec fixes to make sure they work with connector form UI (#21587)
2023-01-25 19:43:26 +01:00
Roman Yermilov [GL]
04a77ad3aa Source S3: keep processing but warn if OSError happen (#21604)
* Source S3: keep processing but warn if OSError happen

* Source S3: bump version and update changelog

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-01-24 20:00:51 +04:00
Artem Inzhyyants
31edbd8bae Source S3: update block size for json (#21210)
* Source S3: update block size for json

* Source S3: update docs

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-01-10 19:53:42 +00:00
Artem Inzhyyants
09cfcbf599 🐛 Source S3: Check config settings for CSV file format (#20262)
* Source S3: get master schema on check connection

* Source S3: bump version

* Source S3: update docs

* Source S3: fix test

* Source S3: add fields validation for CSV source

* Source S3: add test

* Source S3: Refactor config validation

* Source S3: update docs

* Source S3: format

* Source S3: format

* Source S3: fix tests

* Source S3: fix tests

* Source S3: fix tests

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-12-14 21:53:06 +01:00
Xingyuan-Chen
425cc91c85 Source S3: Add virtual-hosted-style option (#19006)
* add virtual-hosted-style option for S3 source

* update s3 version

* auto-bump connector version

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-11-08 10:48:16 -05:00
Denys Davydov
6a40ac52fe Source S3: use AirbyteTracedException (#18602)
* #750 # 837 #904 Source S3: use AirbyteTracedException

* source s3: upd changelog

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-10-29 07:23:39 +03:00
Denys Davydov
5aa25a1e1a Source S3 - fix schema inference (#17991)
* #678 oncall. Source S3 - fix schema inference

* source s3: upd changelog

* auto-bump connector version [ci skip]

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-10-14 14:53:39 +03:00
Serhii Lazebnyi
5df66cd572 Source S3: Connector does not enforce SSL/TLS for non-S3 endpoints (#17800)
* Deleted ssl/tsl flag from config

* Updated PR number

* auto-bump connector version [ci skip]

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-10-12 16:07:22 +02:00
Pedro S. Lopez
938436bcc9 update connector specs and definitions with new .com documentation urls (#17585)
* update definitions with new .com docs urls

* update docs urls in specs

* update generators

* regenerate scaffold connectors

* remove unrelated changes

* update more urls

* update specs

* fix tests

* run `:airbyte-config:specs:generateSeedConnectorSpecs` to fix formatting

* revert docs changes to make pr more reviewable

* revert generator readme changes to make more reviewable

* fix mysql strict encrypt expected spec

* fix postgres expected spec
2022-10-11 11:04:23 -04:00
Evan Tahler
49cb3360de Remove redundant title labels from connector specs (#17544)
* Remove redundant title labels from connector specs

* Manually update specs

* add env variable

* Remove debugging log
2022-10-05 12:58:38 -07:00
Denys Davydov
4dc394cb9a Source S3: fix reading jsonl files with nested data (#16607)
* #531 source s3: fix reading nested jsonl files

* #531 source s3: upd changelog

* oncall #531 source s3: fix sample file

* auto-bump connector version [ci skip]

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-09-19 12:09:40 +03:00
Denys Davydov
73ba7b63d5 Source S3: choose between data types when merging master schema (#16631)
* #422 source s3: choose broadest data type when there is a mismatch during merging json schemas

* #422 source s3: upd changelog
2022-09-19 10:50:18 +03:00
sivankumar86
3d499557b7 source-S3: Support JSON format (#14213)
* json format support added

* json format support added

* code formatted

* format conversion changed

* format naming conversion changed

* test cased issue fixed

* test case issued resolved

* sample file and config added for integration tests

* Json doc added

Json doc added

* update

* sample file and config added for integration tests

* sample file and config added for integration tests

* update jsonl files

* review 1

* review 1

* review 1

* pyarrow version upgrade

* clean integration test folder architecture

* add timestamp record to simple_test.jsonl

* fixed integration test and parser review change

* simplify table read

* doc update

* fix specs

* user sample files

* fix sample files

* add newlines at end of files

* rename json parser

* rename jsonfile to jsonlfile

* schema inference added

* patch review fix

* Update docs/integrations/sources/s3.md

doc update

Co-authored-by: George Claireaux <george@airbyte.io>

* changing the version

* changing the title to sync with other type

* fix expected csv records

* fix expected records for avro and parquet

* review fix

* fixed master schema handling

* remove sample configs

* fix expected records

* json doc update

added more details on json parser

* fixed api name

* bump version

* auto-bump connector version [ci skip]

Co-authored-by: alafanechere <augustin.lafanechere@gmail.com>
Co-authored-by: George Claireaux <george@airbyte.io>
Co-authored-by: George Claireaux <george@claireaux.co.uk>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-08-01 15:48:23 +01:00
Serhii Chvaliuk
29d6a61a21 🐛 Source S3: "decimal" type added for parquet (#14911)
Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
2022-07-22 01:04:44 +03:00
Baz
7cf67e2c85 🐛 Source S3: fixed bug when extra columns not in master schema (#14669)
2022-07-13 22:56:03 +03:00
Serhii Lazebnyi
f9348b2251 🐛 Source Amazon S3: solve possible case of files being missed during incremental syncs (#12568)
* Added history to state

* Deleted unused import

* Rollback abnormal state file

* Rollback abnormal state file

* Fixed type error issue

* Fix state issue

* Updated after review

* Bumped version
2022-05-31 21:39:10 +03:00
Marcos Marx
dca2256a7c Bump 2022 license version (#13233)
* Bump year in license short to 2022

* remove protocol from cdk
2022-05-26 15:00:42 -03:00
Serhii Lazebnyi
91326749d9 🎉Source Amazon S3: increase unit test coverage at least 90% (#11967)
* Increased unittest coverage

* #11676 test coverage 85%

* #11676 unit tests 90%

* #11676 two more unit tests

* #11676 bump version

* auto-bump connector version

Co-authored-by: Denys Davydov <denys.i.davydov@globallogic.com>
Co-authored-by: Denys Davydov <davydov.den18@gmail.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-05-23 13:37:27 +03:00
Serhii Lazebnyi
225aecd37c 🐛Source Amazon S3: Fixed empty options issue (#12730)
* Fixed empty options issue

* Update airbyte-integrations/connectors/source-s3/source_s3/utils.py

Co-authored-by: Denis Davydov <denys.i.davydov@globallogic.com>

* Bumped version

* Fix typo

* Bumped seed version

* Fix changelog

* Bumped version in docker file

* auto-bump connector version

Co-authored-by: Denis Davydov <denys.i.davydov@globallogic.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-05-11 21:21:54 +03:00
Melker Öhrman
f9188590cc Add avro parser to s3 source (#12602)
* added MVP avro parser running fine locally

* added unit tests for avro

* added wip state of avro integration test setup

* deleted unused files

* added avro specific config path

* fixed comments. Added nested record support, simplify code and minor fixes

* bumped version + docs update

* Added working acceptance tests + format

* auto-bump connector version

Co-authored-by: George Claireaux <george@claireaux.co.uk>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-05-11 16:59:58 +01:00
Sherif A. Nada
b8e147538c Update various connector input configs & docs copy (#12500)
2022-05-04 23:37:10 -07:00
Maksym Pavlenok
bbd13802d8 🐛 Fix Python checker configs and Connector Base workflow (#10505)
2022-02-22 19:58:55 +02:00
Maksym Pavlenok
91eff1dffd 🐛 Source S3: Loading of files' metadata (#8252)
2022-02-02 00:49:18 +02:00
Serhii Chvaliuk
c7021e6f30 🐛 Source S3: work-around for format.delimiter change '\\t' -> '\t' (#9163)
* work-around for format.delimiter '\\t' -> '\t'

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
2022-01-06 20:49:55 +02:00
Anna Lvova
9f685b95c8 🐛 Source LinkedIn Ads: hands 429 response status code (#8382)
* Add log message for 429 status_code

* fix

* bump version

* bump version and format
2021-12-02 14:17:33 +02:00
Christophe Duong
b424c1a0e7 🐛 Fix incremental normalization with empty tables (#8394)
* Fix incremental with empty final tables

* upgrade docker images

* Regen SQL

* Bumpversion & format
2021-12-01 23:40:14 +01:00
Marcos Eliziario Santos
ff0c09a724 s3-source add option to not infer datatypes (#7892)
* Implement Flag to avoid inferring data types for CSV input files in s3 SOURCE

* Unit Tests to Flag to avoid inferring data types for CSV input files in s3 SOURCE
Refactor parametrized tests in CSV and Parquet formats to use pytest.parametrize for better error reporting on test failure.

* S3 Source, infer_datatypes flag: additional unit tests

* wrong method signature

* Refactors

* s3-source - infer_datatypes flag, fix user message

* Update airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/formats/csv_spec.py

Co-authored-by: Eugene Kulak <widowmakerreborn@gmail.com>

* s3-source - refactor - use spec defaults instead of hardcoding them in code.

* Update airbyte-integrations/connectors/source-s3/source_s3/utils.py

Co-authored-by: Eugene Kulak <widowmakerreborn@gmail.com>

* code review changes

Co-authored-by: Eugene Kulak <widowmakerreborn@gmail.com>
2021-11-30 10:03:49 -03:00
Christophe Duong
86ca36c5c0 Format code (#7978)
2021-11-15 14:51:10 +01:00
Marcos Marx
e821919066 run gradlew format (#7223)
2021-10-20 16:43:14 -03:00
George Claireaux
1d3a17a8fb 🎉 Source S3 - memory & performance optimisations + advanced CSV options (#6615)
* memory & performance optimisations

* address comments

* version bump

* added advanced_options for reading csv without header, and more custom pyarrow ReadOptions

* updated to use the latest airbyte-cdk

* updated docs

* bump source-s3 to 0.1.6

* remove unneeded lines

* Use the all dep ami for python builds.

* ec2-instance-id should be ec2-image-id

* ec2-instance-id should be ec2-image-id

Co-authored-by: Jingkun Zhuang <Jingkun.Zhuang@icims.com>
Co-authored-by: Davin Chia <davinchia@gmail.com>
2021-10-19 16:50:51 +01:00