1
0
mirror of synced 2025-12-26 14:02:10 -05:00
Commit Graph

25 Commits

Author SHA1 Message Date
Serhii Lazebnyi
dd5adef0d4 feat(concurrent-cdk): add to concurrent per slice tracking of the most recent cursor (#45180) 2024-10-07 23:26:48 +02:00
Daryna Ishchenko
b3406937c6 feat(airbyt-cdk): add transform_record() to class DefaultFileBasedStream (#45698) 2024-09-24 17:02:21 +03:00
Artem Inzhyyants
df34893b63 feat(airbyte-cdk): replace pydantic BaseModel with dataclasses + serpyco-rs in protocol (#44444)
Signed-off-by: Artem Inzhyyants <artem.inzhyyants@gmail.com>
2024-09-02 17:48:17 +02:00
Brian Lai
fca0460030 [airbyte-cdk] tech-debt Remove support for parsing legacy state message format (#43459) 2024-08-16 21:06:37 -04:00
Anton Karpets
6c439a8859 [file-based cdk]: add config option to limit number of files for schema discover (#39317)
Co-authored-by: askarpets <anton.karpets@globallogic.com>
Co-authored-by: Serhii Lazebnyi <serhii.lazebnyi@globallogic.com>
Co-authored-by: Serhii Lazebnyi <53845333+lazebnyi@users.noreply.github.com>
2024-07-11 15:16:09 +02:00
Ella Rohm-Ensing
b7819d9f6c python: assert actual == expected ordering (#36980) 2024-04-11 15:16:33 +00:00
Ella Rohm-Ensing
acbdc2d6e1 Introduce FinalStateCursor to emit state messages at the end of full refresh syncs (#35905)
Co-authored-by: brianjlai <brian.lai@airbyte.io>
2024-03-08 16:58:26 -05:00
Brian Lai
ef98194673 Emit final state message for full refresh syncs and consolidate read flows (#35622) 2024-03-05 01:05:06 -05:00
Artem Inzhyyants
0954ad3d3a Airbyte CDK: add interpolation for request options (#35485)
Signed-off-by: Artem Inzhyyants <artem.inzhyyants@gmail.com>
Co-authored-by: Alexandre Girard <alexandre@airbyte.io>
2024-02-22 19:40:44 +01:00
Catherine Noll
e8910e427a File-based CDK: make incremental syncs concurrent (#34540) 2024-02-07 20:41:04 -05:00
Catherine Noll
eb31e4d2ba File-based CDK: make full refresh concurrent (#34411) 2024-01-29 19:33:50 -05:00
Baz
cf7f700bbb 🎉 Airbyte CDK (File-based CDK): Stop the sync if the record could not be parsed (#32589) 2024-01-11 21:26:23 +02:00
Augustin
0b33caecda Revert "[skip ci] formatting: add missing license headers (#33250)" (#33289) 2023-12-11 11:38:37 +01:00
Augustin
60c1cc01ad [skip ci] formatting: add missing license headers (#33250) 2023-12-11 10:15:18 +01:00
Joe Reuter
aa220fc515 Stop sync on traced exception (#33246)
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-12-08 18:07:25 +01:00
Joe Reuter
e35a1f2cd9 File CDK: Allow configuration of parsed records during check and discover from parser (#31281)
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-10-13 09:50:22 +02:00
Marius Posta
7ae97175a6 gradle: fix repo wide behaviour (#30607) 2023-09-28 05:01:13 -07:00
Maxime Carbonneau-Leclerc
b6836ad950 [ISSUE #30353] remove file_type from stream config (#30453) 2023-09-18 08:50:00 -04:00
Maxime Carbonneau-Leclerc
b801a3d24f Do not stop processing file on parsing error (#29679) 2023-08-21 15:56:01 -04:00
Brian Lai
b8d5ca77db 🐛 [file based cdk] Fix S3 and abstract spec to be compatible with Airbyte UI and CAT (#29075)
* remove version, make validation_policy enum, fix input_schema for s3 and abstract file based configs

* remove multiple file format options from stream config

* pr feedback

* fix tests after rebase

* additional spec changes to work with the UI

* fix tests post-rebase

* fix tests post-rebase and cleanup

* formatting
2023-08-08 18:10:05 -04:00
Catherine Noll
53d8450ec2 File-based CDK: allow FileBasedSource to take a cursor_cls (#29027) 2023-08-04 09:49:03 -04:00
Catherine Noll
09ebb47b24 File cdk parser and cursor updates (#28900)
* File-based CDK: update parquet parser to handle partitions

* File-based CDK: make the record output & cursor date time format consistent
2023-08-01 21:47:58 -04:00
Catherine Noll
73395a187a File-based CDK: allow null values for all inferred columns (#28847) 2023-07-31 15:10:21 -04:00
Alexandre Girard
97a353d5c5 Run mypy on airbyte-cdk as part of the build pipeline and fix typing issues in the file-based module (#27790)
* Try running only on modified files

* make a change

* return something with the wrong type

* Revert "return something with the wrong type"

This reverts commit 23b828371e.

* fix typing in file-based

* format

* Mypy

* fix

* leave as Mapping

* Revert "leave as Mapping"

This reverts commit 908f063f70.

* Use Dict

* update

* move dict()

* Revert "move dict()"

This reverts commit fa347a8236.

* Revert "Revert "move dict()""

This reverts commit c9237df2e4.

* Revert "Revert "Revert "move dict()"""

This reverts commit 5ac1616414.

* use Mapping

* point to config file

* comment

* strict = False

* remove --

* Revert "comment"

This reverts commit 6000814a82.

* install types

* install types in same command as mypy runs

* non-interactive

* freeze version

* pydantic plugin

* plugins

* update

* ignore missing import

* Revert "ignore missing import"

This reverts commit 1da7930fb7.

* Install pydantic instead

* fix

* this passes locally

* strict = true

* format

* explicitly import models

* Update

* remove old mypy.ini config

* temporarily disable mypy

* format

* any

* format

* fix tests

* format

* Automated Commit - Formatting Changes

* Revert "temporarily disable mypy"

This reverts commit eb8470fa3f.

* implicit reexport

* update test

* fix mypy

* Automated Commit - Formatting Changes

* fix some errors in tests

* more type fixes

* more fixes

* more

* .

* done with tests

* fix last files

* format

* Update gradle

* change source-stripe

* only run mypy on cdk

* remove strict

* Add more rules

* update

* ignore missing imports

* cast to string

* Allow untyped decorator

* reset to master

* move to the cdk

* derp

* move explicit imports around

* Automated Commit - Formatting Changes

* Revert "move explicit imports around"

This reverts commit 56e306b72f.

* move explicit imports around

* Upgrade mypy version

* point to config file

* Update readme

* Ignore errors in the models module

* Automated Commit - Formatting Changes

* move check to gradle build

* Any

* try checking out master too

* Revert "try checking out master too"

This reverts commit 8a8f3e373c.

* fetch master

* install mypy

* try without origin

* fetch from the script

* checkout master

* ls the branches

* remotes/origin/master

* remove some cruft

* comment

* remove pydantic types

* unpin mypy

* fetch from the script

* Update connectors base too

* modify a non-cdk file to confirm it doesn't get checked by mypy

* run mypy after generateComponentManifestClassFiles

* run from the venv

* pass files as arguments

* update

* fix when running without args

* with subdir

* path

* try without /

* ./

* remove filter

* try resetting

* Revert "try resetting"

This reverts commit 3a54c424de.

* exclude autogen file

* do not use the github action

* works locally

* remove extra fetch

* run on connectors base

* try bad  typing

* Revert "try bad  typing"

This reverts commit 33b512a3e4.

* reset stripe

* Revert "reset stripe"

This reverts commit 28f23fc6dd.

* Revert "Revert "reset stripe""

This reverts commit 5bf5dee371.

* missing return type

* do not ignore the autogen file

* remove extra installs

* run from venv

* Only check files modified on current branch

* Revert "Only check files modified on current branch"

This reverts commit b4b728e654.

* use merge-base

* Revert "use merge-base"

This reverts commit 3136670cbf.

* try with updated mypy

* bump

* run other steps after mypy

* reset task ordering

* run mypy though

* looser config

* tests pass

* fix mypy issues

* type: ignore

* optional

* this is always a bool

* ignore

* fix typing issues

* remove ignore

* remove mapping

* Automated Commit - Formatting Changes

* Revert "remove ignore"

This reverts commit 9ffeeb6cb1.

* update config

---------

Co-authored-by: girarda <girarda@users.noreply.github.com>
Co-authored-by: Joe Bell <joseph.bell@airbyte.io>
2023-07-13 16:55:48 -07:00
Alexandre Girard
6ebabdc2fa File-based CDK: Support for incremental syncs (#27382)
* New file-based CDK module scaffolding

* Address code review comments

* Formatting

* Automated Commit - Formatting Changes

* Apply suggestions from code review

Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
Co-authored-by: Alexandre Girard <alexandre@airbyte.io>

* Automated Commit - Formatting Changes

* address CR comments

* Update tests to use builder pattern

* Move max files for schema inference onto the discovery policy

* Reorganize stream & its dependencies

* File CDK: error handling for CSV parser (#27176)

* file url and updated_at timestamp is added to state's history field

* Address CR comments

* Address CR comments

* Use stream_slice to determine which files to sync

* fix

* test with no input state

* test with multiple files

* filter out older files

* group by timestamp

* Add another test

* comment

* use min time

* skip files that are already in the history

* move the code around

* include files that are not in the history

* remove start_timestamp

* cleanup

* sync misisng recent files even if history is more recent

* remove old files if history is full

* resync files if history is incomplete

* sync recent files

* comment

* configurable history size

* configurable days to sync if history is full

* move to a stateful object

* Only update state once per file

* two unit tests

* Unit tests

* missing files

* remove inner state

* fix tests

* fix interface

* fix constructor

* Update interface

* cleanup

* format

* Update

* cleanup

* Add timestamp and source file to schema

* set file uri on record

* format

* comment

* reset

* notes

* delete dead code

* format

* remove dead code

* remove dead code

* warning if history is not complete

* always set is_history_partial in the state

* rename

* Add a readme

* format

* Update

* rename

* rename

* missing files

* get instead of compute

* sort alphabetically, and sync everthing if the history is not partial

* unit tests

* Update airbyte-cdk/python/airbyte_cdk/sources/file_based/README.md

Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com>

* Update docs

* reset

* Test to verify we remove files sorted (datetime, alphabetically)

* comment

* Update scenario

* Rename method to get_state

* If the file's ts is equal to the earliest ts, only sync it if its alphabetically greater than the file

* add missing test

* rename

* rename and update comments

* Update comment for clarity

* inject the cursor

* add interface

* comment

* Handle the case where the file has been modified since it was synced

* Only inject from AbstractFileSource

* keep the remote files in the stream slices

* Use file_based typedefs

* format

* Update the comment

* simplify the logic, update comment, and add a test

* Add a comment

* slightly cleaner

* clean up

* typing

* comment

* I think this is simpler to reason about

* create the cursor in the source

* update

* Remove methods from FiledBasedStreamReader and AbstractFileBasedStream interface (#27736)

* update the interface

* Add a comment

* rename

---------

Co-authored-by: Catherine Noll <noll.catherine@gmail.com>
Co-authored-by: clnoll <clnoll@users.noreply.github.com>
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
2023-06-27 15:58:26 -07:00