mirror of synced 2026-02-02 16:02:07 -05:00
Commit Graph

412 Commits

Author SHA1 Message Date
Marius Posta
3e680675a4 github workflows: repo-wide auto-format (#29798)
Co-authored-by: postamar <postamar@users.noreply.github.com>
2023-08-25 10:20:41 -07:00
Maxime Carbonneau-Leclerc
82a96e0c69 File-based CDK: allow for extension mismatch (#29835) 2023-08-25 11:44:49 -04:00
Maxime Carbonneau-Leclerc
cb2796de0a File-based CDK: Remove excessive logging when there are more fields i… (#29778) 2023-08-23 17:05:54 -04:00
Maxime Carbonneau-Leclerc
40b76a7813 Source S3: v4 rollout/feature parity (#29753) 2023-08-23 11:30:08 -04:00
Maxime Carbonneau-Leclerc
b801a3d24f Do not stop processing file on parsing error (#29679) 2023-08-21 15:56:01 -04:00
Joe Reuter
f8de9d12df CDK: Remove list endpoint (#29581) 2023-08-21 12:43:44 +02:00
Joe Reuter
d293e1cce4 Embedded CDK: run a check before starting to load (#29079) 2023-08-21 12:42:58 +02:00
Alexandre Girard
b4ce532762 low-code: Allow formatting datetimes as milliseconds since unix epoch (#29504)
Co-authored-by: girarda <girarda@users.noreply.github.com>
2023-08-17 18:49:28 -07:00
Maxime Carbonneau-Leclerc
ec290eed8a Issue 27019/doc (#29434)
Co-authored-by: Alexandre Girard <alexandre@airbyte.io>
Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com>
2023-08-17 17:28:13 -04:00
Maxime Carbonneau-Leclerc
e9d99630ed Removing validation on skip rows and autogenerated headers (#29488) 2023-08-17 16:14:19 -04:00
Catherine Noll
7c1d6081de File-based CDK: handle legacy path_prefix + globs (#29389) 2023-08-15 12:18:25 -04:00
Brian Lai
5908b85e69 [file-based cdk] Remove CSV quoting_behavior config option (#29388)
* remove CSV quoting_behavior config option

* cleanup after getting latest master
2023-08-14 20:37:38 -04:00
Alexandre Girard
b512fa4628 file-based CDK: Configurable strings_can_be_null (#29298)
* [ISSUE #28893] infer csv schema

* [ISSUE #28893] align with pyarrow

* Automated Commit - Formatting Changes

* [ISSUE #28893] legacy inference and infer only when needed

* [ISSUE #28893] fix scenario tests

* [ISSUE #28893] using discovered schema as part of read

* [ISSUE #28893] self-review + cleanup

* [ISSUE #28893] fix test

* [ISSUE #28893] code review part #1

* [ISSUE #28893] code review part #2

* Fix test

* formatcdk

* first pass

* [ISSUE #28893] code review

* fix mypy issues

* comment

* rename for clarity

* Add a scenario test case

* this isn't optional anymore

* FIX test log level

* Re-adding failing tests

* [ISSUE #28893] improve inference to consider multiple types per value

* Automated Commit - Formatting Changes

* [ISSUE #28893] remove InferenceType.PRIMITIVE_AND_COMPLEX_TYPES

* Code review

* Automated Commit - Formatting Changes

* fix unit tests

---------

Co-authored-by: maxi297 <maxime@airbyte.io>
Co-authored-by: maxi297 <maxi297@users.noreply.github.com>
2023-08-14 12:51:27 -07:00
Maxime Carbonneau-Leclerc
12f1304a67 Issue 28893/infer schema csv (#29099) 2023-08-14 15:14:46 -04:00
Catherine Noll
6946052513 Source S3: maintain backwards compatibility between V3 & V4 state messages (#29028) 2023-08-11 11:38:43 -04:00
Alexandre Girard
1a120ecd4b File-CDK (Avro) Set double_as_string to false by default (#29339)
* set double_as_string to false by default

* Use default config when irrelevant to the test

* Update description

* Update the description again
2023-08-10 14:31:52 -07:00
Maxime Carbonneau-Leclerc
cfbd0b8219 [ISSUE #26764] support brute force multiline json objects for JSONL (#29331)
* [ISSUE #26764] support brute force multiline json objects for JSONL

* [ISSUE #26764] infer_schema to support multiline json objects as well

* [ISSUE #26764] code review
2023-08-10 15:54:46 -04:00
Alexandre Girard
0aa86cf156 File-based CDK + Source S3 (v4): Pass configured file encoding to stream reader (#29110)
* Add encoding to open_file interface

* pass the encoding set in the config

* cleanup

* cleanup

* Automated Commit - Formatting Changes

* Add missing test

* Automated Commit - Formatting Changes

* Update infer_schema too

* Automated Commit - Formatting Changes

* Update unit test

* add a unit test

* fix

* format

* format

* remove newline

* use a mock

* fix

* format

---------

Co-authored-by: girarda <girarda@users.noreply.github.com>
2023-08-09 09:05:06 -05:00
Brian Lai
b8d5ca77db 🐛 [file based cdk] Fix S3 and abstract spec to be compatible with Airbyte UI and CAT (#29075)
* remove version, make validation_policy enum, fix input_schema for s3 and abstract file based configs

* remove multiple file format options from stream config

* pr feedback

* fix tests after rebase

* additional spec changes to work with the UI

* fix tests post-rebase

* fix tests post-rebase and cleanup

* formatting
2023-08-08 18:10:05 -04:00
Alexandre Girard
78b00e088b Parquet parser return Decimal fields as strings (#29191)
* Update the test so it fails if the type is different

* Update to convert values

* Add columns from file partitions

* update
2023-08-08 11:38:16 -07:00
Alexandre Girard
1b6428877d Avro parser: return Decimal fields as strings (#29182)
* update avro parsing

* rename field

* output as iso strings
2023-08-08 11:34:25 -07:00
Brian Lai
01045d674d Add start_date to all file-based configs (#28845)
* add start_date config to abstract spec and apply it in the cursor

* rollback start date cursor changes

* revert back to filtering in the reader and pr feedback

* fix tests post-rebase and pr feedback
2023-08-07 20:43:07 -04:00
Lake Mossman
4bf6b8e15a Fix title & description of datetime_format field (#29025)
* fix description of datetime_format field

* Automated Commit - Formatting Changes

* improve description of cursor datetime formats field

* Automated Commit - Formatting Changes

---------

Co-authored-by: lmossman <lmossman@users.noreply.github.com>
2023-08-04 14:04:24 -07:00
Catherine Noll
53d8450ec2 File-based CDK: allow FileBasedSource to take a cursor_cls (#29027) 2023-08-04 09:49:03 -04:00
Catherine Noll
8ced5ff1db airbyte-cdk: allow Entrypoint to extract config (#28980) 2023-08-03 22:48:06 -04:00
Alexandre Girard
641a65a1e3 Add CSV options to the CSV parser (#28491)
* remove invalid legacy option

* remove unused option

* the tests pass but this is quite messy

* very slight clean up

* Add skip options to csv format

* fix some of the typing issues

* fixme comment

* remove extra log message

* fix typing issues

* skip before header

* skip after header

* format

* add another test

* Automated Commit - Formatting Changes

* auto generate column names

* delete dead code

* update title and description

* true and false values

* Update the tests

* Add comment

* missing test

* rename

* update expected spec

* move to method

* Update comment

* fix typo

* remove unused import

* Add a comment

* None records do not pass the WaitForDiscoverPolicy

* format

* remove second branch to ensure we always go through the same processing

* Raise an exception if the record is None

* reset

* Update tests

* handle unquoted newlines

* Automated Commit - Formatting Changes

* Update test case so the quoting is explicit

* Update comment

* Automated Commit - Formatting Changes

* Fail validation if skipping rows before header and header is autogenerated

* always fail if a record cannot be parsed

* format

* set write line_no in error message

* remove none check

* Automated Commit - Formatting Changes

* enable autogenerate test

* remove duplicate test

* missing unit tests

* Update

* remove branching

* remove unused none check

* Update tests

* remove branching

* format

* extract to function

* comment

* missing type

* type annotation

* use set

* Document that the strings are case-sensitive

* public -> private

* add unit test

* newline

---------

Co-authored-by: girarda <girarda@users.noreply.github.com>
2023-08-03 08:59:55 -07:00
Joe Reuter
df3b1d9c8d 🚨🚨 Low code CDK: Decouple SimpleRetriever and HttpStream (#28657)
* fix tests

* format

* review comments

* Automated Commit - Formatting Changes

* review comments

* review comments

* review comments

* log all messages

* log all message

* review comments

* review comments

* Automated Commit - Formatting Changes

* add comment

---------

Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-08-03 12:30:59 +02:00
Joe Reuter
1ee4c04203 CDK: Embedded reader utils (#28873)
* relax pydantic dep

* Automated Commit - Format and Process Resources Changes

* wip

* wrap up base integration

* add init file

* introduce CDK runner and improve error message

* make state param optional

* update protocol models

* review comments

* always run incremental if possible

* fix

---------

Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-08-03 12:02:31 +02:00
Joe Reuter
60e1d72b42 Python CDK: Relax pydantic version requirement (#28854)
* relax pydantic dep

* Automated Commit - Format and Process Resources Changes

* update protocol models

* format change

---------

Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-08-02 13:03:03 +02:00
Catherine Noll
09ebb47b24 File cdk parser and cursor updates (#28900)
* File-based CDK: update parquet parser to handle partitions

* File-based CDK: make the record output & cursor date time format consistent
2023-08-01 21:47:58 -04:00
Maxime Carbonneau-Leclerc
e158bec4b2 [ISSUE #28782] support multiple cursor field datetime formats (#28936)
* [ISSUE #28782] support multiple cursor field datetime formats

* Making sure we use the proper format for creating slices

* Code review
2023-08-01 17:59:17 -04:00
Catherine Noll
22ff7e0fae File-based CDK: reorganize FileReadMode to fix circular import (#28885) 2023-07-31 17:55:29 -04:00
Catherine Noll
642e7680b4 File-based CDK: add read mode to stream reader interface & parsers (#28862) 2023-07-31 16:55:00 -04:00
Catherine Noll
73395a187a File-based CDK: allow null values for all inferred columns (#28847) 2023-07-31 15:10:21 -04:00
Maxime Carbonneau-Leclerc
48bf520d87 Fix stream read given stream doesn't have any slice (#28746)
* Fix stream read given stream doesn't have any slice

* Not return slices if there are none

* Fix test
2023-07-27 10:05:35 -04:00
Lake Mossman
f66079cdb1 Capitalize title of request_authentication field (#28585)
* capitalize title of request_authentication field

* Automated Commit - Formatting Changes

---------

Co-authored-by: lmossman <lmossman@users.noreply.github.com>
2023-07-25 17:17:27 -06:00
Alexandre Girard
df01616951 [Issue #23497] Deduplicate query parameters for declarative connectors (#28550)
* remove duplicate param

* remove duplicate params

* fix some of the typing issues

* fix typing issues

* newline

* format

* Enable by default

* Add missing file

* refactor and remove flag

* none check

* move line of code

* fix typing in rate_limiting

* comment

* use typedef

* else branch

* format

* gate the feature

* rename test

* fix the test

* only dedupe if the values are the same

* Add some tests

* convert values to strings

* Document the change

* implement in requester too
2023-07-25 14:22:25 -07:00
Brian Lai
59300093b1 [file-based cdk] Add avro parser for inferring schema and reading records (#28500)
* add avro parser for inferring schema and reading records

* fix mypy check not caught locally

* pr feedback and some additional types

* add decimal_as_float for avro

* formatting + mypy
2023-07-25 12:54:16 -04:00
Maxime Carbonneau-Leclerc
b1a5f270ae Fix remove field transform (#28518)
* Fix remove field transform

* mypy
2023-07-20 10:15:42 -04:00
Joe Reuter
58cc540c6b 🚨 Low code CDK: Add session token authenticator (#28050)
This PR adds a new authenticator: The SessionTokenAuthenticator. The existing authenticator under the same name is renamed to LegacySessionTokenAuthenticator.
2023-07-19 17:10:24 +02:00
Joe Reuter
78728410f4 Low code CDK: Fix mypy errors (#28386)
* ignore unit tests in mypy check

* Update airbyte-cdk/python/bin/run-mypy-on-modified-files.sh

Co-authored-by: Alexandre Girard <alexandre@airbyte.io>

* work through mypy errors

* fix a bunch of stuff

* fix more type hints

* fix model_to_component_factory types

* format

* ignore list instead of allow list

---------

Co-authored-by: Alexandre Girard <alexandre@airbyte.io>
2023-07-19 15:08:35 +02:00
Alexandre Girard
9de707fbf0 Parquet files: support decimal as floats, map, null, and fixed sized binary types (#28320)
* tests pass

* everything except parquet config seems to work

* the file format needs a literal

* Add a comment

* Update

* comment

* Ensure only one file type is specified

* Add a test

* add test

* update

* Automated Commit - Formatting Changes

* extract formats

* Automated Commit - Formatting Changes

* fix typo

* Update tests

* Also test jsonl

* Update airbyte-cdk/python/airbyte_cdk/sources/file_based/config/abstract_file_based_spec.py

Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com>

* Update the spec

* update to new config format

* set decimal_as_float to True on legacy configs for backward compatibility

* comments

* Update airbyte-cdk/python/airbyte_cdk/sources/file_based/config/file_based_stream_config.py

Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com>

* format

---------

Co-authored-by: girarda <girarda@users.noreply.github.com>
Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com>
2023-07-18 18:40:51 -05:00
Alexandre Girard
3ae73fb0ff connector builder: Set test_read_limit_reached to true if we hit the max records limit (#28293)
* set test_read_limit_reached to true if we hit the max records limit

* rename slice to _slice to avoid shadowing a builtin keyword

* newline

* fix some of the typing issues

* fix some more typing issues

* another fix

* fix last typing issue

* format

* Automated Commit - Formatting Changes

* reset type

* fix the type

* Update for clarity

* Update types

---------

Co-authored-by: girarda <girarda@users.noreply.github.com>
2023-07-18 15:53:53 -07:00
Catherine Noll
e2bb01838e File-based CDK: implement JSONL parser (#28259) 2023-07-17 22:46:58 -06:00
Joe Reuter
0d185a2b40 fix date format detection (#28268) 2023-07-14 13:16:15 +02:00
Alexandre Girard
97a353d5c5 Run mypy on airbyte-cdk as part of the build pipeline and fix typing issues in the file-based module (#27790)
* Try running only on modified files

* make a change

* return something with the wrong type

* Revert "return something with the wrong type"

This reverts commit 23b828371e.

* fix typing in file-based

* format

* Mypy

* fix

* leave as Mapping

* Revert "leave as Mapping"

This reverts commit 908f063f70.

* Use Dict

* update

* move dict()

* Revert "move dict()"

This reverts commit fa347a8236.

* Revert "Revert "move dict()""

This reverts commit c9237df2e4.

* Revert "Revert "Revert "move dict()"""

This reverts commit 5ac1616414.

* use Mapping

* point to config file

* comment

* strict = False

* remove --

* Revert "comment"

This reverts commit 6000814a82.

* install types

* install types in same command as mypy runs

* non-interactive

* freeze version

* pydantic plugin

* plugins

* update

* ignore missing import

* Revert "ignore missing import"

This reverts commit 1da7930fb7.

* Install pydantic instead

* fix

* this passes locally

* strict = true

* format

* explicitly import models

* Update

* remove old mypy.ini config

* temporarily disable mypy

* format

* any

* format

* fix tests

* format

* Automated Commit - Formatting Changes

* Revert "temporarily disable mypy"

This reverts commit eb8470fa3f.

* implicit reexport

* update test

* fix mypy

* Automated Commit - Formatting Changes

* fix some errors in tests

* more type fixes

* more fixes

* more

* .

* done with tests

* fix last files

* format

* Update gradle

* change source-stripe

* only run mypy on cdk

* remove strict

* Add more rules

* update

* ignore missing imports

* cast to string

* Allow untyped decorator

* reset to master

* move to the cdk

* derp

* move explicit imports around

* Automated Commit - Formatting Changes

* Revert "move explicit imports around"

This reverts commit 56e306b72f.

* move explicit imports around

* Upgrade mypy version

* point to config file

* Update readme

* Ignore errors in the models module

* Automated Commit - Formatting Changes

* move check to gradle build

* Any

* try checking out master too

* Revert "try checking out master too"

This reverts commit 8a8f3e373c.

* fetch master

* install mypy

* try without origin

* fetch from the script

* checkout master

* ls the branches

* remotes/origin/master

* remove some cruft

* comment

* remove pydantic types

* unpin mypy

* fetch from the script

* Update connectors base too

* modify a non-cdk file to confirm it doesn't get checked by mypy

* run mypy after generateComponentManifestClassFiles

* run from the venv

* pass files as arguments

* update

* fix when running without args

* with subdir

* path

* try without /

* ./

* remove filter

* try resetting

* Revert "try resetting"

This reverts commit 3a54c424de.

* exclude autogen file

* do not use the github action

* works locally

* remove extra fetch

* run on connectors base

* try bad typing

* Revert "try bad typing"

This reverts commit 33b512a3e4.

* reset stripe

* Revert "reset stripe"

This reverts commit 28f23fc6dd.

* Revert "Revert "reset stripe""

This reverts commit 5bf5dee371.

* missing return type

* do not ignore the autogen file

* remove extra installs

* run from venv

* Only check files modified on current branch

* Revert "Only check files modified on current branch"

This reverts commit b4b728e654.

* use merge-base

* Revert "use merge-base"

This reverts commit 3136670cbf.

* try with updated mypy

* bump

* run other steps after mypy

* reset task ordering

* run mypy though

* looser config

* tests pass

* fix mypy issues

* type: ignore

* optional

* this is always a bool

* ignore

* fix typing issues

* remove ignore

* remove mapping

* Automated Commit - Formatting Changes

* Revert "remove ignore"

This reverts commit 9ffeeb6cb1.

* update config

---------

Co-authored-by: girarda <girarda@users.noreply.github.com>
Co-authored-by: Joe Bell <joseph.bell@airbyte.io>
2023-07-13 16:55:48 -07:00
Brian Lai
8e835963c1 [file-based cdk] spec schema improvements and fixes (#28263)
* fix spec schema incompatibility with ui and improve spec documentation and titles

* fix schema to account for latest changes pulled from main

* tests

* remove duplicate test
2023-07-13 15:14:05 -04:00
Catherine Noll
48843cf807 File-based CDK: handle user-input schema (#28052) 2023-07-13 11:59:42 -04:00
Brian Lai
f0951ffbd8 [file-based cdk] file based spec boilerplate backed by pydantic models (#28139)
* file based spec operation backed by pydantic models

* pr feedback to clean up various config and the test scenarios

* fix tests after rebase
2023-07-12 19:42:50 -04:00
Alexandre Girard
40e62fbcb4 Implement parquet parser (#28064)
* Implement parquet parser

* move comment

* comments

* Automated Commit - Formatting Changes

* cleanup

* Update

* remove superfluous method

* update

* format

---------

Co-authored-by: girarda <girarda@users.noreply.github.com>
2023-07-12 13:40:30 -07:00