Data integration platform for ELT pipelines from APIs, databases & files to databases, warehouses & lakes
We believe that only an open-source solution to data movement can cover the long tail of data sources while empowering data engineers to customize existing connectors. Our ultimate vision is to help you move data from any source to any destination. Airbyte already provides the largest catalog of 300+ connectors for APIs, databases, data warehouses, and data lakes.
[Screenshot taken from Airbyte Cloud.]
Getting Started
- Deploy Airbyte Open Source or set up Airbyte Cloud to start centralizing your data.
- Create connectors in minutes with our no-code Connector Builder or low-code CDK (see the connector sketch after this list).
- Explore popular use cases in our tutorials.
- Orchestrate Airbyte syncs with Airflow, Prefect, Dagster, or the Airbyte API (see the Airflow sketch after this list).
- Easily transform loaded data with SQL or dbt.
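As a rough illustration of what building a connector involves, here is a minimal sketch of a stream built on the Python CDK's HttpStream base class. The class name, the PokeAPI endpoint, and the pagination handling are assumptions made for the example, not an official connector; the low-code CDK expresses the same ideas declaratively in a YAML manifest instead.

```python
# Illustrative sketch only: a minimal stream built on the Python CDK.
# The class name, endpoint, and primary key are assumptions for this example.
from typing import Any, Iterable, Mapping, Optional

import requests
from airbyte_cdk.sources.streams.http import HttpStream


class PokemonStream(HttpStream):
    url_base = "https://pokeapi.co/api/v2/"
    primary_key = "name"

    def path(self, **kwargs) -> str:
        # Endpoint appended to url_base for each request.
        return "pokemon"

    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
        # PokeAPI responses include a "next" URL when more pages exist;
        # returning None stops pagination. A full connector would also use
        # this token in request_params or path to fetch the next page.
        next_url = response.json().get("next")
        return {"next_url": next_url} if next_url else None

    def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping[str, Any]]:
        # Each record yielded here becomes an Airbyte record message.
        yield from response.json().get("results", [])
```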
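If you orchestrate with Airflow, triggering a sync looks roughly like the sketch below. It assumes the apache-airflow-providers-airbyte package is installed, an Airflow connection named "airbyte_default" points at your Airbyte instance, and the connection_id is a placeholder for one of your Airbyte connections.

```python
# Sketch of an Airflow DAG that triggers an Airbyte sync.
from datetime import datetime

from airflow import DAG
from airflow.providers.airbyte.operators.airbyte import AirbyteTriggerSyncOperator

with DAG(
    dag_id="trigger_airbyte_sync",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    sync = AirbyteTriggerSyncOperator(
        task_id="airbyte_sync",
        airbyte_conn_id="airbyte_default",          # Airflow connection to the Airbyte instance
        connection_id="<your-airbyte-connection-id>",  # placeholder: the Airbyte connection to run
        asynchronous=False,  # block until the sync job finishes
    )
```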
Try it out yourself with our demo app, visit our full documentation, and learn more about recent announcements. See our registry for a full list of connectors already available in Airbyte or Airbyte Cloud.
Join the Airbyte Community
The Airbyte community can be found in the Airbyte Community Slack, where you can ask questions and voice ideas. You can also ask for help in our Discourse forum, or join our office hours. Airbyte's roadmap is publicly viewable on GitHub.
For videos and blogs on data engineering and building your data stack, check out Airbyte's Content Hub and YouTube channel, and sign up for our newsletter.
Dedicated support with direct access to our team is also available for Open Source users. If you are interested, please fill out this form.
Contributing
If you've found a problem with Airbyte, please open a GitHub issue. To contribute to Airbyte and see our Code of Conduct, please see the contributing guide. We have a list of good first issues containing bugs with a relatively limited scope. This is a great place to get started, gain experience, and get familiar with our contribution process.
Security
Airbyte takes security issues very seriously. Please do not file GitHub issues or post on our public forum for security vulnerabilities. Email security@airbyte.io if you believe you have uncovered a vulnerability. In the message, try to provide a description of the issue and ideally a way of reproducing it. The security team will get back to you as soon as possible.
Airbyte Enterprise also offers additional security features, among other capabilities, on top of Airbyte Open Source.
License
See the LICENSE file for licensing information, and our FAQ for any questions you may have on that topic.
Thank You
Airbyte would not be possible without the support and assistance of other open-source tools and companies. Visit our thank you page to learn more about how we build Airbyte.