mirror of synced 2025-12-22 03:21:25 -05:00
Commit Graph

228 Commits

Author SHA1 Message Date
Melker Öhrman
f9188590cc Add avro parser to s3 source (#12602)
* added MVP avro parser running fine locally

* added unit tests for avro

* added wip state of avro integration test setup

* deleted unused files

* added avro specific config path

* fixed comments. Added nested record support, simplify code and minor fixes

* bumped version + docs update

* Added working acceptance tests + format

* auto-bump connector version

Co-authored-by: George Claireaux <george@claireaux.co.uk>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-05-11 16:59:58 +01:00
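The avro parser commit mentions nested record support but does not show how Avro types become the JSON schema a source emits. Below is a minimal, hypothetical sketch of one possible mapping. It assumes Avro's primitive type names and JSON Schema's type names; `avro_to_json_schema` is an illustrative helper, not the connector's code. Named-type references are not handled.

```python
from typing import Any, Dict

# Avro primitive type names mapped to JSON Schema type names (assumed mapping).
AVRO_PRIMITIVES: Dict[str, str] = {
    "null": "null",
    "boolean": "boolean",
    "int": "integer",
    "long": "integer",
    "float": "number",
    "double": "number",
    "bytes": "string",
    "string": "string",
}


def avro_to_json_schema(avro_type: Any) -> Dict[str, Any]:
    """Hypothetical helper: convert an Avro schema fragment to JSON Schema."""
    if isinstance(avro_type, str):
        # Primitive type name; references to named types are not supported here.
        return {"type": AVRO_PRIMITIVES[avro_type]}
    if isinstance(avro_type, list):
        # A union such as ["null", "double"] becomes anyOf its branches.
        return {"anyOf": [avro_to_json_schema(t) for t in avro_type]}
    kind = avro_type["type"]
    if kind == "record":
        # Nested records recurse into their fields.
        return {
            "type": "object",
            "properties": {f["name"]: avro_to_json_schema(f["type"]) for f in avro_type["fields"]},
        }
    if kind == "array":
        return {"type": "array", "items": avro_to_json_schema(avro_type["items"])}
    if kind == "map":
        return {"type": "object", "additionalProperties": avro_to_json_schema(avro_type["values"])}
    if kind in ("enum", "fixed"):
        return {"type": "string"}
    # A primitive in object form, e.g. {"type": "long", "logicalType": "timestamp-millis"}.
    return avro_to_json_schema(kind)
```

Recursing on records, arrays and maps is what lets arbitrarily nested records map to nested JSON objects.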
Serhii Lazebnyi
27e6ce2ca8 Source Amazon S3: Refactored docs (#12534)
* Refactored spec and docs

* Updated spec.json

* Roll back spec formatting

* Roll back spec formatting

* Roll back spec formatting
2022-05-09 14:56:52 +03:00
Sherif A. Nada
b8e147538c Update various connector input configs & docs copy (#12500) 2022-05-04 23:37:10 -07:00
Brian Leonard
c302af45ff Upgrade to Python 3.9 (#11763)
* Dockerfile to 3.9

* Python version

* More python updates

* 3.9 on GitHub actions and lint updates

* Test out 3.9.11 on GitHub actions

* install python with an action

* formatting: newline

* Also has python code

* only check first level for changed modules
Previous example (source-google-search-console/credentials)

* Test failure: there is no logger.trace
2022-04-11 20:51:37 -07:00
Maksym Pavlenok
bbd13802d8 🐛 Fix Python checker configs and Connector Base workflow (#10505) 2022-02-22 19:58:55 +02:00
VitaliiMaltsev
e30d8348b2 Change JsonSchemaPrimitive to a class (#9913)
* fix for jdk 17

* add JsonSchemaType class

* fix tests

* fix tests

* fix tests

* fix tests

* fix tests

* fix tests

* fix Oracle tests

* fix Redshift tests

* fix Redshift tests

* fix checkstyle

* fix MSSQL tests

* fix cockroachdb tests

* fix checkstyle

* fix checkstyle

* replace star imports

* replace star imports

* replace star imports

* update JsonSchemaType | fixed checkstyle

* Remove unused variables in test

* Fix imports

* Expand imports

* Fix more imports

Co-authored-by: vmaltsev <vitalii.maltsev@globallogic.com>
Co-authored-by: Liren Tu <tuliren.git@outlook.com>
2022-02-14 02:12:37 -08:00
Maksym Pavlenok
91eff1dffd 🐛 Source S3: Loading of files' metadata (#8252) 2022-02-02 00:49:18 +02:00
Serhii Chvaliuk
c7021e6f30 🐛 Source S3: work-around for format.delimiter change '\\t' -> '\t' (#9163)
* work-around for format.delimiter '\\t' -> '\t'

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
2022-01-06 20:49:55 +02:00
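The delimiter work-around above handles a config value holding the two characters backslash and `t` instead of an actual tab. A minimal sketch of that normalization, assuming the CSV reader wants a single delimiter character; `normalize_delimiter` is a hypothetical name, not the connector's function:

```python
def normalize_delimiter(delimiter: str) -> str:
    """Hypothetical helper: map an escaped tab from the config to a real tab."""
    # "\\t" in Python source is the two-character string: backslash followed by "t".
    if delimiter == "\\t":
        return "\t"
    # Any other value, including an actual tab character, passes through unchanged.
    return delimiter
```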
Marcos Eliziario Santos
aa6829bb54 source-s3: bump version (#8695)
* source-s3: bump version

* Fix SAT test spec

* bump version
2021-12-29 09:13:17 -03:00
Anna Lvova
9f685b95c8 🐛 Source LinkedIn Ads: handle 429 response status code (#8382)
* Add log message for 429 status_code

* fix

* bump version

* bump version and format
2021-12-02 14:17:33 +02:00
Christophe Duong
b424c1a0e7 🐛 Fix incremental normalization with empty tables (#8394)
* Fix incremental with empty final tables

* upgrade docker images

* Regen SQL

* Bumpversion & format
2021-12-01 23:40:14 +01:00
Marcos Eliziario Santos
ff0c09a724 s3-source add option to not infer datatypes (#7892)
* Implement Flag to avoid inferring data types for CSV input files in s3 SOURCE

* Unit tests for the flag to avoid inferring data types for CSV input files in s3 SOURCE
Refactor parametrized tests in CSV and Parquet formats to use pytest.mark.parametrize for better error reporting on test failure.

* S3 Source, infer_datatypes flag: additional unit tests

* wrong method signature

* Refactors

* s3-source - infer_datatypes flag, fix user message

* Update airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/formats/csv_spec.py

Co-authored-by: Eugene Kulak <widowmakerreborn@gmail.com>

* s3-source - refactor - use spec defaults instead of hardcoding them in code.

* Update airbyte-integrations/connectors/source-s3/source_s3/utils.py

Co-authored-by: Eugene Kulak <widowmakerreborn@gmail.com>

* code review changes

Co-authored-by: Eugene Kulak <widowmakerreborn@gmail.com>
2021-11-30 10:03:49 -03:00
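The `infer_datatypes` flag controls whether CSV values are typed or left as strings. The connector reads CSV through pyarrow, and that code path is not shown here. The following stdlib-only sketch just illustrates the flag's effect. `read_csv_records` and its int/float inference rule are assumptions for illustration.

```python
import csv
import io
from typing import Any, Dict, Iterator


def _infer(value: str) -> Any:
    # Try the narrowest numeric type first, then fall back to the raw string.
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def read_csv_records(text: str, infer_datatypes: bool = True) -> Iterator[Dict[str, Any]]:
    """Hypothetical reader: yield one dict per CSV row, typed only if requested."""
    for row in csv.DictReader(io.StringIO(text)):
        if infer_datatypes:
            yield {key: _infer(value) for key, value in row.items()}
        else:
            # With inference disabled every value stays a string.
            yield dict(row)
```

For example, with inference disabled `"1"` stays `"1"`, and with inference enabled it becomes `1`.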
Christophe Duong
86ca36c5c0 Format code (#7978) 2021-11-15 14:51:10 +01:00
Vadym
504580d833 Remove base-python gradle dependencies in connectors where base-python is not used (#7499)
* Remove base-python references.

* Add requirements.txt

* Fix requirements.txt blank line

* Fix source-exchange-rates to follow the common CDK approach

* Fix source-smartsheets SAT.
Fix source-exchange-rates build.gradle.

* Bump docker version

* Update source-dixa SAT config

* Fix source-exchange-rates SAT config

* Revert bump scaffold sources version

* Fix source-shortio SAT config

* Fix source-square invalid_config.json

* Fix source-us-census invalid_config.json

* Fix source-intercom versioning
2021-11-10 13:12:29 +02:00
Marcos Marx
e821919066 run gradlew format (#7223) 2021-10-20 16:43:14 -03:00
George Claireaux
1d3a17a8fb 🎉 Source S3 - memory & performance optimisations + advanced CSV options (#6615)
* memory & performance optimisations

* address comments

* version bump

* added advanced_options for reading csv without header, and more custom pyarrow ReadOptions

* updated to use the latest airbyte-cdk

* updated docs

* bump source-s3 to 0.1.6

* remove unneeded lines

* Use the all-deps AMI for Python builds.

* ec2-instance-id should be ec2-image-id

* ec2-instance-id should be ec2-image-id

Co-authored-by: Jingkun Zhuang <Jingkun.Zhuang@icims.com>
Co-authored-by: Davin Chia <davinchia@gmail.com>
2021-10-19 16:50:51 +01:00
vitaliizazmic
741001ae92 🐛 Airbyte CDK: fix failing integration tests
* Airbyte CDK native logger #1279 - fix import logger error

* Airbyte CDK native logger #1279 - source-paypal-transaction: change level from "WARN" to logging.WARN in self.logger.log

* Airbyte CDK native logger #1279 - source-s3: use native logger instead of AirbyteLogger

* Airbyte CDK native logger #1279 - source-zuora: use native logger instead of AirbyteLogger

* Airbyte CDK native logger #1279 - fix get logger
2021-10-18 18:46:55 +03:00
Maksym Pavlenok
b80f81ef9b 🐛 Source S3: timestamp parquet data (#6613)
* fix datetime parquet data

* Update airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/formats/parquet_parser.py

Co-authored-by: George Claireaux <george@claireaux.co.uk>

* aggregate pyarrow types

Co-authored-by: Maksym Pavlenok <maksym.pavlenok@globallogic.com>
Co-authored-by: George Claireaux <george@claireaux.co.uk>
2021-10-04 14:04:01 +03:00
Michel Tricot
1773e41e47 Shorten our headers + adds contributors file (#6478) 2021-09-27 10:45:50 -07:00
Dmytro
6767424b6d 🎉 S3 source: add support for non-AWS S3 Storage (#6398) 2021-09-27 16:40:24 +03:00
Maksym Pavlenok
e5c44e64b1 🎉 Source S3: support of Parquet format (#5305)
* add parquet parser

* add integration tests for parquet formats

* add unit tests for parquet

* update docs and secrets

* fix incorrect import for tests

* add lib pandas for unit tests

* revert changes to other connectors

* update secret settings

* fix config values

* Update airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/formats/parquet_spec.py

Co-authored-by: George Claireaux <george@claireaux.co.uk>

* Update airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/formats/parquet_spec.py

Co-authored-by: George Claireaux <george@claireaux.co.uk>

* remove some unused default options

* update tests

* update docs

* bump its version

* fix expected test

Co-authored-by: Maksym Pavlenok <maksym.pavlenok@globallogic.com>
Co-authored-by: George Claireaux <george@claireaux.co.uk>
2021-09-05 02:40:49 +03:00
LiRen Tu
2906ec287a CI: Add action to check broken doc links (#5254)
* Add action to check broken doc links

* Ignore localhost

* Update config

* Fix broken links

* Use quiet mode

* Ignore PR link

* Fix more broken links

* Fix more broken links

* Fix more broken links

* Verify pattern

* Fix more broken links

* Separate full and pr check

* Update pattern

* Test invalid link

* Remove invalid link
2021-08-07 14:28:02 -07:00
Sherif A. Nada
8e74551703 🐛 Source hubspot: correctly use logger exception printing (#5250) 2021-08-06 13:41:14 -07:00
George Claireaux
137257b62b 🐛 Source S3: fixed bug where sync could hang indefinitely (#5197)
* infer schema in multi process

* use dill to pickle function

* moved funcs

* Revert "moved funcs"

This reverts commit c1739ad988.

* Revert "use dill to pickle function"

This reverts commit 52404a9f1b.

* Revert "infer schema in multi process"

This reverts commit f0fb6f66f9.

* multiprocess in csv schema infer

* simplify what happens in the multiprocess to offending code

* try this

* using tempfile

* formatting

* version bump

* changelog + formatting

* addressed review comments

* re-trigger checks

* ran testScaffoldTemplates to fix breaking check
2021-08-06 00:07:46 +01:00
Sherif A. Nada
6b07d3014f publish PR 4550 (#5219)
Co-authored-by: Rodrigo Parra <rodpar07@gmail.com>
2021-08-04 23:10:39 -07:00
Dmytro
abde7c7727 Remove json schema parameter (#4907)
The source-acceptance-test framework no longer requires json_schema parameters in the catalog file. This parameter is verbose, makes the config file harder to read, and can be misleading when debugging acceptance test issues.

Co-authored-by: Dmytro Rezchykov <dmitry.rezchykov@zazmic.com>
2021-08-04 18:16:59 +03:00
George Claireaux
9e529545c2 🐛 Source S3: fixed bug in spec so that Format field displays in UI correctly (#5135)
* fixed bug in spec so that Format field displays in UI correctly

* newline & changelog
2021-08-02 17:23:10 +01:00
George Claireaux
d9f11bcf6a 🎉 New Source: S3 (+ abstract files source) (#4990)
* minor line length changes

* cdk generated source + oop structure + start of implementation

* fixed some broken syntax stuff

* pre-pyarrow convert

* introducing pyarrow

* skeleton for unit tests

* read working on multiple files

* incremental first draft

* blobfile -> fileclient

* change references of 'blob' to 'file'

* minor tidy to make draft PR

* fixes

* addressed review comments + more unit tests

* finished unit tests

* bugfixes and abstract integration tests framework

* remove old commented stuff

* docstrings

* restructure as source-s3

* Delete playground.py

* integration tests

* acceptance tests and some more reshuffling

* source S3 credentials

* change _airbyte_ columns to _ab_

* update spec with better descriptions and ordering

* created s3 source docs

* source definition

* reverse docstring change in cdk

* reverse docstring change

* reverse change

* reverse docstring change

* remove TODO comments

* add PR to changelog

* removed unused libraries

* formatting & address some review comments

* rename of files/classes for clarity

* addressing review comments

* address reviews

* add s3 source

* building spec with pydantic for provider-specific inheritance

* pydantic spec and improved path pattern with wcmatch.glob

* update path patterns info in doc

* formatting

* tests gzip and bz2 compression on csv

* updated compression support in doc

* forgot to upload bz2 test file

* added pattern validation to dataset

* formatting

* Format.

* ran testScaffoldTemplates & generated this diff

* bumped version because of documentationUrl fix

Co-authored-by: Davin Chia <davinchia@gmail.com>
2021-07-30 15:06:11 +01:00
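The initial S3 source commit tests gzip and bz2 compression on CSV. How source-s3 actually detects compression is not shown in this log. The sketch below is a hypothetical stdlib-only helper that picks a codec from the object key's extension before the bytes are parsed as CSV.

```python
import bz2
import gzip


def decompress_by_extension(key: str, data: bytes) -> bytes:
    """Hypothetical helper: decompress an object's bytes based on its key suffix."""
    if key.endswith(".gz"):
        return gzip.decompress(data)
    if key.endswith(".bz2"):
        return bz2.decompress(data)
    # Any other suffix is treated as uncompressed.
    return data
```

A caller could then decode the result, e.g. `decompress_by_extension("data/file.csv.gz", body).decode("utf-8")`, and hand the text to a CSV reader. Detecting by suffix is the simplest choice. Sniffing magic bytes would also catch mislabeled files.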