🍾 Update scaling doc. Add question on long running discover schema. (#5607)
@@ -83,6 +83,14 @@ Yes, for more than 6000 thousand tables could be a problem to load the informati
There are two Github issues about this limitation: [Issue #3942](https://github.com/airbytehq/airbyte/issues/3942)
and [Issue #3943](https://github.com/airbytehq/airbyte/issues/3943).

## Help, Airbyte is hanging/taking a long time to discover my source's schema!

This usually happens for database sources that contain a lot of tables. This should resolve itself in half an hour or so.

If the source contains more than 6k tables, see the [above question](#there-is-a-limit-of-how-many-tables-one-connection-can-handle).

There is a known issue with [Oracle databases](https://github.com/airbytehq/airbyte/issues/4944).

## **I see you support a lot of connectors – what about connectors Airbyte doesn’t support yet?**

You can either:

@@ -10,7 +10,7 @@ It depends on your source and destination. Check our setup guides to see the tas

## **What data sources does Airbyte offer connectors for?**

We already offer 50+ connectors, and will focus all our effort in ramping up the number of connectors and strengthening them. View the [full list here](../project-overview/changelog/connectors.md). If you don’t see a source you need, you can file a [connector request here](https://github.com/airbytehq/airbyte/issues/new?assignees=&labels=area%2Fintegration%2C+new-integration&template=new-integration-request.md&title=).
We already offer 100+ connectors, and will focus all our effort in ramping up the number of connectors and strengthening them. View the [full list here](../project-overview/changelog/connectors.md). If you don’t see a source you need, you can file a [connector request here](https://github.com/airbytehq/airbyte/issues/new?assignees=&labels=area%2Fintegration%2C+new-integration&template=new-integration-request.md&title=).

## **Where can I see my data in Airbyte?**

@@ -1,11 +1,11 @@
# Scaling Airbyte

As depicted in our [High-Level View](../understanding-airbyte/high-level-view.md), Airbyte is made up of several components under the hood:
1) Scheduler
2) Server
3) Temporal
4) Webapp
5) Database
1. Scheduler
2. Server
3. Temporal
4. Webapp
5. Database

These components perform control plane operations that are low-scale, low-resource work. In addition to the work being low cost, these components are efficient and optimized for these jobs, meaning that only uncommonly large workloads will require deployments at scale. In general, you would only encounter scaling issues when running over a thousand connections.

@@ -27,20 +27,18 @@ One worker reads from the source; the other worker writes to the destination.
**In general, we recommend starting out with a mid-sized cloud instance (e.g. 4 or 8 cores) and gradually tuning instance size to your workload.**

There are two resources to be aware of when thinking of scale:
1) Memory
2) Disk space
1. Memory
2. Disk space

### Memory
As mentioned above, we are mainly concerned with scaling Sync jobs. Within a Sync job, the main memory culprit is the Source worker.

This is because the Source worker reads up to 10,000 records in memory. This can present problems for database sources with tables that have large row sizes. For example, a table with an average row size of 0.5MB requires roughly 0.5 * 10000 / 1000 = 5GB of RAM. See [this issue](https://github.com/airbytehq/airbyte/issues/3439) for more information.

Our Java connectors currently follow Java's default behaviour with container memory and will only use up to 1/4 of the host's allocated memory. On Docker agent with 8GBs of RAM configured, a Java connector limits itself to 2Gbs of RAM and will see Out-of-Memory exceptions if this goes higher.
Our Java connectors currently follow Java's default behaviour with container memory and will only use up to 1/4 of the host's allocated memory. e.g. On a Docker agent with 8GBs of RAM configured, a Java connector limits itself to 2Gbs of RAM and will see Out-of-Memory exceptions if this goes higher. The same applies to Kubernetes pods.

Note that all Source database connectors are Java connectors. This means that users currently need to over-provision memory for Java connectors.

On Docker, this can be solved by giving the Docker agent more memory. See [here](https://stackoverflow.com/questions/44533319/how-to-assign-more-memory-to-docker-container) for instructions. You might need to switch to a node with more memory. On Kubernetes, this can be solved by using a node type with more memory.

Improving this behaviour is on our roadmap. Please see [this issue](https://github.com/airbytehq/airbyte/issues/3440) for more information.
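
The memory arithmetic in this section can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate, not Airbyte code: the function names are hypothetical, while the 10,000-record buffer, the 1/4 default heap ratio, and the 1000 MB-per-GB factor are taken from the text above.

```python
# Rough sizing sketch; all function names are illustrative, not part of Airbyte.

RECORDS_BUFFERED = 10_000   # records the Source worker may hold in memory (from the text)
DEFAULT_HEAP_FRACTION = 4   # the JVM uses up to 1/4 of available memory by default (from the text)


def source_buffer_gb(avg_row_mb, records=RECORDS_BUFFERED):
    """Estimate the Source worker buffer in GB for a given average row size in MB."""
    return avg_row_mb * records / 1000


def default_jvm_heap_gb(container_memory_gb):
    """Heap a Java connector would limit itself to under the default 1/4 rule."""
    return container_memory_gb / DEFAULT_HEAP_FRACTION


def min_container_memory_gb(avg_row_mb):
    """Memory to give the container so the default heap can hold the buffer."""
    return source_buffer_gb(avg_row_mb) * DEFAULT_HEAP_FRACTION


if __name__ == "__main__":
    print(source_buffer_gb(0.5))         # buffer for 0.5MB rows
    print(default_jvm_heap_gb(8))        # heap available on an 8GB Docker agent
    print(min_container_memory_gb(0.5))  # memory needed to fit that buffer
```

For the 0.5MB example this gives a 5GB buffer, which does not fit in the 2GB default heap of an 8GB Docker agent. Under these assumptions the agent would need about 20GB.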

### Disk Space
@@ -48,9 +46,9 @@ Airbyte uses backpressure to try to read the minimal amount of logs required. In

However, disk space might become an issue for the following reasons:

1) Long-running syncs can produce a fair amount of logs from the Docker agent and Airbyte on Docker deployments. Some work has been done to minimize accidental logging, so this should no longer be an acute problem, but is still an open issue.
1. Long-running syncs can produce a fair amount of logs from the Docker agent and Airbyte on Docker deployments. Some work has been done to minimize accidental logging, so this should no longer be an acute problem, but is still an open issue.

2) Although Airyte connector images aren't massive, they aren't exactly small either. The typical connector image is ~300MB. An Airbyte deployment with
2. Although Airbyte connector images aren't massive, they aren't exactly small either. The typical connector image is ~300MB. An Airbyte deployment with
multiple connectors can easily use up to 10GBs of disk space.

Because of this, we recommend allocating a minimum of 30GBs of disk space per node. Since storage is relatively cheap, it is better to be safe than sorry, so err on the side of over-provisioning.
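
The disk figures above can be checked with a similar rough calculation. The helper names below are hypothetical. The ~300MB image size and the 30GB recommendation come from the text, and log volume is deliberately left out because the text gives no number for it.

```python
# Rough disk sizing sketch; helper names are illustrative, not part of Airbyte.

CONNECTOR_IMAGE_MB = 300        # typical connector image size (from the text)
RECOMMENDED_MIN_DISK_GB = 30    # recommended minimum disk per node (from the text)


def connector_images_gb(num_connectors, image_mb=CONNECTOR_IMAGE_MB):
    """Disk used by connector images alone, in GB."""
    return num_connectors * image_mb / 1000


def disk_headroom_gb(num_connectors, disk_gb=RECOMMENDED_MIN_DISK_GB):
    """Disk left for logs and everything else after pulling connector images."""
    return disk_gb - connector_images_gb(num_connectors)


if __name__ == "__main__":
    print(connector_images_gb(30))  # space taken by 30 connector images
    print(disk_headroom_gb(30))     # what remains of a 30GB node
```

With 30 connectors the images alone take about 9GB, which is in line with the "up to 10GBs" figure and leaves roughly 21GB of a 30GB node for logs and other data.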