airbyte/airbyte-cdk/python/airbyte_cdk/sources/file_based

Incremental syncs

The file-based connectors support the following sync modes:

Feature | Supported?
Full Refresh Sync | Yes
Incremental Sync | Yes
Replicate Incremental Deletes | No
Replicate Multiple Files (pattern matching) | Yes
Replicate Multiple Streams (distinct tables) | Yes
Namespaces | No

We recommend you do not

Incremental sync

After the initial sync, the connector only pulls files that were modified since the last sync.
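
As a rough illustration of the rule above, the sketch below selects the files to pull by comparing each file's last-modified time with the time recorded for it during the previous sync. The function name `files_to_sync` and the shape of `history` (a mapping from file URI to last-modified time) are assumptions made for this example, not the CDK's actual API.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple


def files_to_sync(
    listing: List[Tuple[str, datetime]],
    history: Dict[str, datetime],
) -> List[str]:
    """Return the URIs of files that are new or were modified since the last sync.

    `listing` holds (uri, last_modified) pairs as reported by the file store.
    `history` maps each previously synced URI to the last-modified time that
    was recorded for it (hypothetical state shape, for illustration only).
    """
    selected = []
    for uri, last_modified in listing:
        recorded = history.get(uri)
        # A file is pulled if it was never synced, or if it changed afterwards.
        if recorded is None or last_modified > recorded:
            selected.append(uri)
    return selected


if __name__ == "__main__":
    t0 = datetime(2023, 7, 1, tzinfo=timezone.utc)
    t1 = datetime(2023, 7, 10, tzinfo=timezone.utc)
    state = {"a.csv": t0, "b.csv": t0}
    current = [("a.csv", t0), ("b.csv", t1), ("c.csv", t1)]
    print(files_to_sync(current, state))
```

On the initial sync the history is empty, so every listed file is selected.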

The connector checkpoints the connection state once it has finished syncing all files for a given timestamp. The connection state only keeps track of the last 10,000 files synced. If more than 10,000 files are synced, the connector can no longer rely on the connection state to deduplicate files. In this case, the connector initializes its cursor to the earlier of two timestamps: that of the earliest file in the history, or 3 days ago.

Both the maximum number of files and the time buffer can be configured by connector developers.
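
The fallback described above can be sketched as follows. This is a minimal illustration, not the CDK's implementation: the names `fallback_cursor`, `DEFAULT_MAX_HISTORY_SIZE`, and `DEFAULT_TIME_WINDOW` are hypothetical, and treating the history as "full" once it holds the maximum number of entries is an assumption. Returning `None` when the history is not full is also an assumption, standing in for "the history alone is enough to deduplicate".

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# Hypothetical defaults mirroring the limits described in the text.
DEFAULT_MAX_HISTORY_SIZE = 10_000
DEFAULT_TIME_WINDOW = timedelta(days=3)


def fallback_cursor(
    history: Dict[str, datetime],
    now: datetime,
    max_history_size: int = DEFAULT_MAX_HISTORY_SIZE,
    time_window: timedelta = DEFAULT_TIME_WINDOW,
) -> Optional[datetime]:
    """Return the cursor to start from when the history can no longer deduplicate.

    `history` maps file URIs to their recorded last-modified times.
    Returns None while the history is below its size limit (assumption:
    the history alone is then sufficient to deduplicate files).
    """
    if len(history) < max_history_size:
        return None
    earliest_in_history = min(history.values())
    # Take the earlier of the two timestamps so no recent file is skipped.
    return min(earliest_in_history, now - time_window)


if __name__ == "__main__":
    now = datetime(2023, 7, 18, tzinfo=timezone.utc)
    full = {f"file_{i}.csv": now - timedelta(days=1) for i in range(10_000)}
    print(fallback_cursor(full, now))
```

Exposing the size limit and the time window as parameters reflects the note that both values are configurable by connector developers.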