mirror of synced 2025-12-30 12:04:43 -05:00

539 Commits

Author SHA1 Message Date
Ben Church
fb7258e2bd Move tools/ci_* projects to airbyte-ci, update to use Poetry, bump to python 3.10 (#27957)
* Move ci_connector_ops

* Move ci_credentials

* Move tools/ci_common_utils

* Rename tools to airbyte-ci

* Move to ci

* Convert ci_credentials

* Convert ci_common_utils

* Convert ci_connector_ops

* Get pipelines running

* Move pipelines to own poetry project

* Update readme

* Delete

* Add ci_code_validator

* Use pipx to install gha deps

* Fix

* Ensure every thing is running

* Automated Commit - Formatting Changes

* Gitignore miss

* Add pipx installer

* Get local pipx dependencies

* Fix paths

* Install pipx

* ceremonial source-faker change

* Add installation step for ci_code_validator

* Add comment

* remove ci_code_validator

* Address code review comments

* add pipx install to acceptance-test-docker.sh

* Run formatter

* Revert "ceremonial source-faker change"

This reverts commit 26884cd0db.

* gitignore legacy pipeline report path

* update poetry.lock

* skip upload if logs do not exist

---------

Co-authored-by: bnchrch <bnchrch@users.noreply.github.com>
Co-authored-by: alafanechere <augustin.lafanechere@gmail.com>
2023-07-26 15:49:59 +00:00
Lake Mossman
f66079cdb1 Capitalize title of request_authentication field (#28585)
* capitalize title of request_authentication field

* Automated Commit - Formatting Changes

---------

Co-authored-by: lmossman <lmossman@users.noreply.github.com>
2023-07-25 17:17:27 -06:00
girarda
6e75a7812e 🤖 Bump patch version of Airbyte CDK 2023-07-25 21:48:36 +00:00
Alexandre Girard
df01616951 [Issue #23497] Deduplicate query parameters for declarative connectors (#28550)
* remove duplicate param

* remove duplicate params

* fix some of the typing issues

* fix typing issues

* newline

* format

* Enable by default

* Add missing file

* refactor and remove flag

* none check

* move line of code

* fix typing in rate_limiting

* comment

* use typedef

* else branch

* format

* gate the feature

* rename test

* fix the test

* only dedupe if the values are the same

* Add some tests

* convert values to strings

* Document the change

* implement in requester too
2023-07-25 14:22:25 -07:00
Brian Lai
59300093b1 [file-based cdk] Add avro parser for inferring schema and reading records (#28500)
* add avro parser for inferring schema and reading records

* fix mypy check not caught locally

* pr feedback and some additional types

* add decimal_as_float for avro

* formatting + mypy
2023-07-25 12:54:16 -04:00
maxi297
62ca5b82ea 🤖 Bump patch version of Airbyte CDK 2023-07-20 14:50:12 +00:00
Maxime Carbonneau-Leclerc
b1a5f270ae Fix remove field transform (#28518)
* Fix remove field transform

* mypy
2023-07-20 10:15:42 -04:00
flash1293
05b00303ba 🤖 Bump minor version of Airbyte CDK 2023-07-19 15:18:35 +00:00
Joe Reuter
58cc540c6b 🚨 Low code CDK: Add session token authenticator (#28050)
This PR adds a new authenticator: The SessionTokenAuthenticator. The existing authenticator under the same name is renamed to LegacySessionTokenAuthenticator.
2023-07-19 17:10:24 +02:00
Joe Reuter
78728410f4 Low code CDK: Fix mypy errors (#28386)
* ignore unit tests in mypy check

* Update airbyte-cdk/python/bin/run-mypy-on-modified-files.sh

Co-authored-by: Alexandre Girard <alexandre@airbyte.io>

* work through mypy errors

* fix a bunch of stuff

* fix more type hints

* fix model_to_component_factory types

* format

* ignore list instead of allow list

---------

Co-authored-by: Alexandre Girard <alexandre@airbyte.io>
2023-07-19 15:08:35 +02:00
Joe Reuter
db16853fd8 Ignore unit tests in mypy check (#28359)
* ignore unit tests in mypy check

* Update airbyte-cdk/python/bin/run-mypy-on-modified-files.sh

Co-authored-by: Alexandre Girard <alexandre@airbyte.io>

* ignore list instead of allow list

---------

Co-authored-by: Alexandre Girard <alexandre@airbyte.io>
2023-07-19 10:41:34 +02:00
girarda
8099c254f8 🤖 Bump patch version of Airbyte CDK 2023-07-19 01:56:49 +00:00
Alexandre Girard
dcf35701f4 Fix cdk unit test (#28447)
* Update spec

* fix test_create_custom_components

* ignore errors
2023-07-18 18:40:32 -07:00
Alexandre Girard
9de707fbf0 Parquet files: support decimal as floats, map, null, and fixed sized binary types (#28320)
* tests pass

* everything except parquet config seems to work

* the file format needs a literal

* Add a comment

* Update

* comment

* Ensure only one file type is specified

* Add a test

* add test

* update

* Automated Commit - Formatting Changes

* extract formats

* Automated Commit - Formatting Changes

* fix typo

* Update tests

* Also test jsonl

* Update airbyte-cdk/python/airbyte_cdk/sources/file_based/config/abstract_file_based_spec.py

Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com>

* Update the spec

* update to new config format

* set decimal_as_float to True on legacy configs for backward compatibility

* comments

* Update airbyte-cdk/python/airbyte_cdk/sources/file_based/config/file_based_stream_config.py

Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com>

* format

---------

Co-authored-by: girarda <girarda@users.noreply.github.com>
Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com>
2023-07-18 18:40:51 -05:00
Alexandre Girard
3ae73fb0ff connector builder: Set test_read_limit_reached to true if we hit the max records limit (#28293)
* set test_read_limit_reached to true if we hit the max records limit

* rename slice to _slice to avoid shadowing a builtin

* newline

* fix some of the typing issues

* fix some more typing issues

* another fix

* fix last typing issue

* format

* Automated Commit - Formatting Changes

* reset type

* fix the type

* Update for clarity

* Update types

---------

Co-authored-by: girarda <girarda@users.noreply.github.com>
2023-07-18 15:53:53 -07:00
girarda
ef26793fac 🤖 Bump minor version of Airbyte CDK 2023-07-18 21:52:18 +00:00
Alexandre Girard
5ca1b41eb7 Move pyarrow to CDK extra (#28413)
* move pyarrow to extra

* Automated Commit - Formatting Changes

* remove parquet tests

* delete the import

* missing space

* Automated Commit - Formatting Changes

* comment parquet_parser too

* optimize imports

* comment out temporary file source

* add pyarrow to dev extra

* reset files

* share pyarrow dependency

* use alpine for declarative_source

* Automated Commit - Formatting Changes

* Revert "use alpine for declarative_source"

This reverts commit a3ad47ccca.

* pin cdk version

* reset the cdk version

---------

Co-authored-by: girarda <girarda@users.noreply.github.com>
2023-07-18 16:20:53 -05:00
Maxime Carbonneau-Leclerc
250e508f8c Remove pyyaml/cython tmp fix in airbyte-cdk docker (#28396) 2023-07-18 10:38:53 -04:00
maxi297
25fee2206a 🤖 Bump minor version of Airbyte CDK 2023-07-18 14:16:25 +00:00
Maxime Carbonneau-Leclerc
21d1a3bd62 Fix pyyaml/cython issue (#28393) 2023-07-18 10:09:58 -04:00
Catherine Noll
e2bb01838e File-based CDK: implement JSONL parser (#28259) 2023-07-17 22:46:58 -06:00
Joe Reuter
0d185a2b40 fix date format detection (#28268) 2023-07-14 13:16:15 +02:00
Alexandre Girard
97a353d5c5 Run mypy on airbyte-cdk as part of the build pipeline and fix typing issues in the file-based module (#27790)
* Try running only on modified files

* make a change

* return something with the wrong type

* Revert "return something with the wrong type"

This reverts commit 23b828371e.

* fix typing in file-based

* format

* Mypy

* fix

* leave as Mapping

* Revert "leave as Mapping"

This reverts commit 908f063f70.

* Use Dict

* update

* move dict()

* Revert "move dict()"

This reverts commit fa347a8236.

* Revert "Revert "move dict()""

This reverts commit c9237df2e4.

* Revert "Revert "Revert "move dict()"""

This reverts commit 5ac1616414.

* use Mapping

* point to config file

* comment

* strict = False

* remove --

* Revert "comment"

This reverts commit 6000814a82.

* install types

* install types in same command as mypy runs

* non-interactive

* freeze version

* pydantic plugin

* plugins

* update

* ignore missing import

* Revert "ignore missing import"

This reverts commit 1da7930fb7.

* Install pydantic instead

* fix

* this passes locally

* strict = true

* format

* explicitly import models

* Update

* remove old mypy.ini config

* temporarily disable mypy

* format

* any

* format

* fix tests

* format

* Automated Commit - Formatting Changes

* Revert "temporarily disable mypy"

This reverts commit eb8470fa3f.

* implicit reexport

* update test

* fix mypy

* Automated Commit - Formatting Changes

* fix some errors in tests

* more type fixes

* more fixes

* more

* .

* done with tests

* fix last files

* format

* Update gradle

* change source-stripe

* only run mypy on cdk

* remove strict

* Add more rules

* update

* ignore missing imports

* cast to string

* Allow untyped decorator

* reset to master

* move to the cdk

* derp

* move explicit imports around

* Automated Commit - Formatting Changes

* Revert "move explicit imports around"

This reverts commit 56e306b72f.

* move explicit imports around

* Upgrade mypy version

* point to config file

* Update readme

* Ignore errors in the models module

* Automated Commit - Formatting Changes

* move check to gradle build

* Any

* try checking out master too

* Revert "try checking out master too"

This reverts commit 8a8f3e373c.

* fetch master

* install mypy

* try without origin

* fetch from the script

* checkout master

* ls the branches

* remotes/origin/master

* remove some cruft

* comment

* remove pydantic types

* unpin mypy

* fetch from the script

* Update connectors base too

* modify a non-cdk file to confirm it doesn't get checked by mypy

* run mypy after generateComponentManifestClassFiles

* run from the venv

* pass files as arguments

* update

* fix when running without args

* with subdir

* path

* try without /

* ./

* remove filter

* try resetting

* Revert "try resetting"

This reverts commit 3a54c424de.

* exclude autogen file

* do not use the github action

* works locally

* remove extra fetch

* run on connectors base

* try bad  typing

* Revert "try bad  typing"

This reverts commit 33b512a3e4.

* reset stripe

* Revert "reset stripe"

This reverts commit 28f23fc6dd.

* Revert "Revert "reset stripe""

This reverts commit 5bf5dee371.

* missing return type

* do not ignore the autogen file

* remove extra installs

* run from venv

* Only check files modified on current branch

* Revert "Only check files modified on current branch"

This reverts commit b4b728e654.

* use merge-base

* Revert "use merge-base"

This reverts commit 3136670cbf.

* try with updated mypy

* bump

* run other steps after mypy

* reset task ordering

* run mypy though

* looser config

* tests pass

* fix mypy issues

* type: ignore

* optional

* this is always a bool

* ignore

* fix typing issues

* remove ignore

* remove mapping

* Automated Commit - Formatting Changes

* Revert "remove ignore"

This reverts commit 9ffeeb6cb1.

* update config

---------

Co-authored-by: girarda <girarda@users.noreply.github.com>
Co-authored-by: Joe Bell <joseph.bell@airbyte.io>
2023-07-13 16:55:48 -07:00
Brian Lai
8e835963c1 [file-based cdk] spec schema improvements and fixes (#28263)
* fix spec schema incompatibility with ui and improve spec documentation and titles

* fix schema to account for latest changes pulled from main

* tests

* remove duplicate test
2023-07-13 15:14:05 -04:00
Catherine Noll
48843cf807 File-based CDK: handle user-input schema (#28052) 2023-07-13 11:59:42 -04:00
Brian Lai
f0951ffbd8 [file-based cdk] file based spec boilerplate backed by pydantic models (#28139)
* file based spec operation backed by pydantic models

* pr feedback to clean up various config and the test scenarios

* fix tests after rebase
2023-07-12 19:42:50 -04:00
Alexandre Girard
40e62fbcb4 Implement parquet parser (#28064)
* Implement parquet parser

* move comment

* comments

* Automated Commit - Formatting Changes

* cleanup

* Update

* remove superfluous method

* update

* format

---------

Co-authored-by: girarda <girarda@users.noreply.github.com>
2023-07-12 13:40:30 -07:00
maxi297
07da56914f 🤖 Bump patch version of Airbyte CDK 2023-07-11 17:43:50 +00:00
Maxime Carbonneau-Leclerc
df2a6e50bb Issue 21014/oauth requests (#27973)
* [ISSUE #27494] fix type issue caused by connector builder logging

* [ISSUE #21014] log request/response for oauth as 'global_requests'

* formatcdk

* [ISSUE #21014] support DeclarativeOauth2Authenticator as well

* [ISSUE #21014] improving message grouper tests

* formatcdk

* Test solution with logic in MessageRepository (#27990)

* Test solution with logic in MessageRepository

* Solution without creating a new ModelToComponentFactory

* [ISSUE #21014] adding tests

* [ISSUE #21014] add title and description to global requests

* Revert "Solution without creating a new ModelToComponentFactory"

This reverts commit f17799ecff.

* Automated Commit - Formatting Changes

* [ISSUE #21014] code review

* [ISSUE #21014] do not break on log appender conflict

* Automated Commit - Formatting Changes

* [ISSUE #21014] code review

* formatcdk

* [ISSUE #21014] moving is_global to is_auxiliary
2023-07-11 13:37:38 -04:00
Catherine Noll
07286f7069 File-based CDK: implement schemaless option (#28063) 2023-07-11 11:52:47 -04:00
Brian Lai
f79aa72d64 refactor config validation_policy to not store policies on the config (#28097) 2023-07-10 20:27:02 -04:00
maxi297
b058261081 🤖 Bump patch version of Airbyte CDK 2023-07-06 12:09:49 +00:00
Maxime Carbonneau-Leclerc
c609897848 Stream state is not recorded if cursor field is result of transformation (#27915)
* [ISSUE #27494] move transformation for record selection

* formatcdk

* [ISSUE #27494] fix type issue caused by connector builder logging

* formatcdk

* [ISSUE #27494] code review
2023-07-06 08:02:16 -04:00
Catherine Noll
cfec41b1e5 File-based CDK: implement schema validation policy options (#27816) 2023-07-06 03:35:48 -04:00
Brian Lai
aa57cc21ba [26763] csv config options validation and use by reader (#27850)
* csv options validation applying dialect to reader and refactoring parser interfaces a bit

* fix tests

* pr feedback

* add quoting behavior config format
2023-07-05 18:33:23 -04:00
Lake Mossman
b78762f641 Update the request_body_data description to remove typos and be more readable (#27783)
* fix typo in schema

* remove hyphen
2023-07-03 15:45:59 -07:00
maxi297
f472bc0667 🤖 Bump patch version of Airbyte CDK 2023-06-29 17:31:02 +00:00
Maxime Carbonneau-Leclerc
4376527266 Fixing an issue with square as CATs only compare string and not datetime (#27840)
* Fixing an issue with square as CATs only compare string and not datetime

* formatcdk
2023-06-29 13:23:06 -04:00
maxi297
95bd388f9b 🤖 Bump patch version of Airbyte CDK 2023-06-29 14:22:36 +00:00
Maxime Carbonneau-Leclerc
91a56171a0 [ISSUE #26343] update close_slice to use the greater record (#27818)
* [ISSUE #26343] update close_slice to use the greater record

* Renaming parameter for close_slice

* code review
2023-06-29 10:09:12 -04:00
Catherine Noll
6fb53c65ee File-based CDK: implement AvailabilityStrategy.check_availability (#27609) 2023-06-28 17:32:07 -04:00
maxi297
59f6acf2f4 🤖 Bump minor version of Airbyte CDK 2023-06-28 20:58:41 +00:00
Maxime Carbonneau-Leclerc
a013fad5a9 [ISSUE-26343] data feed (#27475)
* [ISSUE #26581] per partition cursor

* [ISSUE #26581] format

* [ISSUE #26581] clean up state management

* [ISSUE #26581] improving Hashabledict

* [ISSUE #26581] format cdk

* [ISSUE #26581] fix tests

* [ISSUE #26581] code review from girarda

* Retrigger pipeline

* Decouple cursor and stream slicer and pushing state management as far up cursor as possible

* Format cdk

* Small fixes/comments

* DatetimeBasedCursor should not update state based on slice (for now at least since it wasn't doing this before)

* [ISSUE #26581] code review

* Automated Commit - Formatting Changes

* [ISSUE #26581] validation overlapping keys

* [ISSUE #26581] add typing

* [ISSUE #26581] code review

* Remove SyncMode from stream_slices

* Removing SyncMode from stream_slices up until SimpleRetriever and fixing typing

* [ISSUE-26434] replacing Record primitive by class

* [ISSUE-26434] update Cursor.update_state to use new record object

* Issue 26343/data feed incremental sync solution 2 (#27481)

* TMP [ISSUE-26434] first solution to enable stop condition on pagination

* TMP [ISSUE-26434] second solution to enable stop condition on pagination

* TMP [ISSUE-26434] second solution fix

* [ISSUE #26343] fixing behavior and adding tests

* [ISSUE #26343] only updating state once a slice to allow for data feed

* [ISSUE #26343] removing freezing of cursor

* format cdk

* [ISSUE #26343] ensure data_feed doesn't have end_datetime

* [ISSUE #26343] self review

* [ISSUE #26343] code review

* [ISSUE #26343] code review clean up

* [ISSUE #26343] code review clean up

* Code review

* [ISSUE #26343] add warn log message in DatetimeBasedCursor

* format

* Format
2023-06-28 16:53:00 -04:00
girarda
8f8cbd80a7 🤖 Bump patch version of Airbyte CDK 2023-06-28 16:38:56 +00:00
Alexandre Girard
4d08781d04 Revert "Low-Code CDK: make RecordFilter.filter_records as generator (#24772)" (#27789)
This reverts commit 032f9b8045.
2023-06-28 09:23:05 -07:00
Alexandre Girard
6ebabdc2fa File-based CDK: Support for incremental syncs (#27382)
* New file-based CDK module scaffolding

* Address code review comments

* Formatting

* Automated Commit - Formatting Changes

* Apply suggestions from code review

Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
Co-authored-by: Alexandre Girard <alexandre@airbyte.io>

* Automated Commit - Formatting Changes

* address CR comments

* Update tests to use builder pattern

* Move max files for schema inference onto the discovery policy

* Reorganize stream & its dependencies

* File CDK: error handling for CSV parser (#27176)

* file url and updated_at timestamp is added to state's history field

* Address CR comments

* Address CR comments

* Use stream_slice to determine which files to sync

* fix

* test with no input state

* test with multiple files

* filter out older files

* group by timestamp

* Add another test

* comment

* use min time

* skip files that are already in the history

* move the code around

* include files that are not in the history

* remove start_timestamp

* cleanup

* sync missing recent files even if history is more recent

* remove old files if history is full

* resync files if history is incomplete

* sync recent files

* comment

* configurable history size

* configurable days to sync if history is full

* move to a stateful object

* Only update state once per file

* two unit tests

* Unit tests

* missing files

* remove inner state

* fix tests

* fix interface

* fix constructor

* Update interface

* cleanup

* format

* Update

* cleanup

* Add timestamp and source file to schema

* set file uri on record

* format

* comment

* reset

* notes

* delete dead code

* format

* remove dead code

* remove dead code

* warning if history is not complete

* always set is_history_partial in the state

* rename

* Add a readme

* format

* Update

* rename

* rename

* missing files

* get instead of compute

* sort alphabetically, and sync everything if the history is not partial

* unit tests

* Update airbyte-cdk/python/airbyte_cdk/sources/file_based/README.md

Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com>

* Update docs

* reset

* Test to verify we remove files sorted (datetime, alphabetically)

* comment

* Update scenario

* Rename method to get_state

* If the file's ts is equal to the earliest ts, only sync it if it's alphabetically greater than the file

* add missing test

* rename

* rename and update comments

* Update comment for clarity

* inject the cursor

* add interface

* comment

* Handle the case where the file has been modified since it was synced

* Only inject from AbstractFileSource

* keep the remote files in the stream slices

* Use file_based typedefs

* format

* Update the comment

* simplify the logic, update comment, and add a test

* Add a comment

* slightly cleaner

* clean up

* typing

* comment

* I think this is simpler to reason about

* create the cursor in the source

* update

* Remove methods from FileBasedStreamReader and AbstractFileBasedStream interface (#27736)

* update the interface

* Add a comment

* rename

---------

Co-authored-by: Catherine Noll <noll.catherine@gmail.com>
Co-authored-by: clnoll <clnoll@users.noreply.github.com>
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
2023-06-27 15:58:26 -07:00
flash1293
d0d906c0a7 🤖 Bump patch version of Airbyte CDK 2023-06-27 06:33:44 +00:00
Joe Reuter
8aba48810c Low-code CDK: Serialize request body as string for connector builder module (#27657)
* serialize request body as string

* fix some bugs
2023-06-27 08:27:16 +02:00
brianjlai
4cd2cbbea5 🤖 Bump patch version of Airbyte CDK 2023-06-23 17:55:25 +00:00
midavadim
c44c3eae48 CDK: availability check - handle HttpErrors which happen during slice extraction (#26630)
* for availability check - handle HttpError that happens during slice extraction (reading of parent stream),
updated reason messages,
moved check availability call under common try/except which handles errors during usual stream read,
moved log messages which indicate start of the stream sync before availability check to make it clear which stream is the source of errors

* why do we return here and not try next stream?

* fixed bug in CheckStream, now we try to check availability for all streams
2023-06-23 13:15:25 -04:00