1
0
mirror of synced 2025-12-30 21:02:43 -05:00
Files
airbyte/docs/archive/faq/data-loading.md
Sophia Wiley 4eab55bb20 Cloud docs restructure (#23319)
* cloud docs restructure

* edited webhook notifications wording

* undo changes to webapp links

* removed one other change to webapp links

* removed most redirects because of query strings
commit

* changed utilize to use

* edited data residency intro based on patsys suggestion

* deleted typo in redirect

---------

Co-authored-by: Amruta Ranade <11484018+Amruta-Ranade@users.noreply.github.com>
2023-02-28 09:50:04 -08:00

125 lines
7.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Data Loading
## **Why dont I see any data in my destination yet?**
It can take a while for Airbyte to load data into your destination. Some sources have restrictive API limits which constrain how much
data we can sync in a given time. Large amounts of data in your source can also make the initial sync take longer. You can check your
sync status in your connection detail page that you can access through the destination detail page or the source one.
## **Why my final tables are being recreated everytime?**
Airbyte ingests data into raw tables and applies the process of normalization if you selected it in the connection page.
The normalization runs a full refresh each sync and for some destinations like Snowflake, Redshift, Bigquery this may incur more
resource consumption and more costs. You need to pay attention to the frequency that you're retrieving your data to avoid issues.
For example, if you create a connection to sync every 5 minutes with incremental sync on, it will only retrieve new records into the raw tables but will apply normalization
to *all* the data in every sync! If you have tons of data, this may not be the right sync frequency for you.
There is a [Github issue](https://github.com/airbytehq/airbyte/issues/4286) to implement normalization using incremental, which will reduce
costs and resources in your destination.
## **What happens if a sync fails?**
You won't lose data when a sync fails, however, no data will be added or updated in your destination.
Airbyte will automatically attempt to replicate data 3 times. You can see and export the logs for those attempts in the connection
detail page. You can access this page through the Source or Destination detail page.
You can configure a Slack webhook to warn you when a sync fails.
In the future you will be able to configure other notification method (email, Sentry) and an option to create a
GitHub issue with the logs. Were still working on it, and the purpose would be to help the community and the Airbyte team to fix the
issue as soon as possible, especially if it is a connector issue.
Until Airbyte has this system in place, here is what you can do:
* File a GitHub issue: go [here](https://github.com/airbytehq/airbyte/issues/new?assignees=&labels=type%2Fbug&template=bug-report.md&title=)
and file an issue with the detailed logs copied in the issues description. The team will be notified about your issue and will update
it for any progress or comment on it.
* Fix the issue yourself: Airbyte is open source so you dont need to wait for anybody to fix your issue if it is important to you.
To do so, just fork the [GitHub project](https://github.com/airbytehq/airbyte) and fix the piece of code that need fixing. If youre okay
with contributing your fix to the community, you can submit a pull request. We will review it ASAP.
* Ask on Slack: dont hesitate to ping the team on [Slack](https://slack.airbyte.io).
Once all this is done, Airbyte resumes your sync from where it left off.
We truly appreciate any contribution you make to help the community. Airbyte will become the open-source standard only if everybody participates.
## **Can Airbyte support 2-way sync i.e. changes from A go to B and changes from B go to A?**
Airbyte actually does not support this right now. There are some details around how we handle schema and tables names that isn't going to
work for you in the current iteration.
If you attempt to do a circular dependency between source and destination, you'll end up with the following
A.public.table_foo writes to B.public.public_table_foo to A.public.public_public_table_foo. You won't be writing into your original table,
which I think is your intention.
## **What happens to data in the pipeline if the destination gets disconnected? Could I lose data, or wind up with duplicate data when the pipeline is reconnected?**
Airbyte is architected to prevent data loss or duplication. Airbyte will display a failure for the sync, and re-attempt it at the next syncing,
according to the frequency you set.
## **How frequently can Airbyte sync data?**
You can adjust the load time to run as frequent as every hour or as infrequent as once a year using [Cron expressions](https://docs.airbyte.com/cloud/managing-airbyte-cloud/edit-stream-configuration).
## **Why wouldnt I choose to load all of my data more frequently?**
While frequent data loads will give you more up-to-date data, there are a few reasons you wouldnt want to load your too frequently, including:
* Higher API usage may cause you to hit a limit that could impact other systems that rely on that API.
* Higher cost of loading data into your warehouse.
* More frequent delays, resulting in increased delay notification emails. For instance, if the data source generally takes several hours to
update but you wanted five-minute increments, you may receive a delay notification every sync.
Generally is recommended setting the incremental loads to every hour to help limit API calls.
## **Is there a way to know the estimated time to completion for the first historic sync?**
Unfortunately not yet.
## **Do you support change data capture \(CDC\) or logical replication for databases?**
Airbyte currently supports [CDC for Postgres and Mysql](../../understanding-airbyte/cdc.md). Airbyte is adding support for a few other
databases you can check in the roadmap.
## Using incremental sync, is it possible to add more fields when some new columns are added to a source table, or when a new table is added?
For the moment, incremental sync doesn't support schema changes, so you would need to perform a full refresh whenever that happens.
Heres a related [Github issue](https://github.com/airbytehq/airbyte/issues/1601).
## There is a limit of how many tables one connection can handle?
Yes, for more than 6000 thousand tables could be a problem to load the information on UI.
There are two Github issues about this limitation: [Issue #3942](https://github.com/airbytehq/airbyte/issues/3942)
and [Issue #3943](https://github.com/airbytehq/airbyte/issues/3943).
## Help, Airbyte is hanging/taking a long time to discover my source's schema!
This usually happens for database sources that contain a lot of tables. This should resolve itself in half an hour or so.
If the source contains more than 6k tables, see the [above question](#there-is-a-limit-of-how-many-tables-one-connection-can-handle).
There is a known issue with [Oracle databases](https://github.com/airbytehq/airbyte/issues/4944).
## **I see you support a lot of connectors what about connectors Airbyte doesnt support yet?**
You can either:
* Submit a [connector request](https://github.com/airbytehq/airbyte/issues/new?assignees=&labels=area%2Fintegration%2C+new-integration&template=new-integration-request.md&title=) on our Github project, and be notified once we or the community build a connector for it.
* Build a connector yourself by forking our [GitHub project](https://github.com/airbytehq/airbyte) and submitting a pull request. Here
are the [instructions how to build a connector](../../contributing-to-airbyte/README.md).
* Ask on Slack: dont hesitate to ping the team on [Slack](https://slack.airbyte.io).
## **What kind of notifications do I get?**
For the moment, the UI will only display one kind of notification: when a sync fails, Airbyte will display the failure at the source/destination
level in the list of sources/destinations, and in the connection detail page along with the logs.
However, there are other types of notifications:
* When a connector that you use is no longer up to date
* When your connections fails
* When core isn't up to date