
[Docs] Simplify Connection Settings (#36502)

Co-authored-by: Chandler Prall <chandler.prall@gmail.com>
Natalie Kwong
2024-03-26 13:21:33 -07:00
committed by GitHub
parent 7382c877ef
commit 1bb83eccc6
16 changed files with 76 additions and 66 deletions

View File

@@ -14,7 +14,7 @@ To configure these settings:
1. In the Airbyte UI, click **Connections** and then click the connection you want to change.
2. Click the **Replication** tab.
2. Click the **Settings** tab.
3. Click the **Configuration** dropdown to expand the options.
@@ -29,31 +29,23 @@ You can configure the following settings:
| Setting | Description |
|--------------------------------------|-------------------------------------------------------------------------------------|
| Connection Name | A custom name for your connection |
| [Replication frequency](/using-airbyte/core-concepts/sync-schedules.md) | How often data syncs (can be scheduled, cron, API-triggered or manual) |
| [Destination namespace](/using-airbyte/core-concepts/namespaces.md) | Where the replicated data is written to in the destination |
| Destination stream prefix | A prefix added to each table name in the destination |
| [Schedule Type](/using-airbyte/core-concepts/sync-schedules.md) | How often data syncs (can be scheduled, cron, API-triggered or manual) |
| [Destination Namespace](/using-airbyte/core-concepts/namespaces.md) | Where the replicated data is written to in the destination |
| Destination Stream Prefix | A prefix added to each table name in the destination |
| [Detect and propagate schema changes](/cloud/managing-airbyte-cloud/manage-schema-changes.md) | How Airbyte handles schema changes in the source |
| [Connection Data Residency](/cloud/managing-airbyte-cloud/manage-data-residency.md) | Where data will be processed (Cloud only) |
## Modify streams in your connection
## Modify Streams
In the **Activate the streams you want to sync** table, you choose which streams to sync and how they are loaded to the destination.
On the "Schema" tab, you choose which streams to sync and how they are loaded to the destination.
:::info
A connection's schema consists of one or many streams. Each stream is most commonly associated with a database table or an API endpoint. Within a stream, there can be one or many fields or columns.
:::
To modify streams:
To modify streams, click **Connections** and then click the connection you want to change. Click the **Schema** tab to see all the streams Airbyte can sync. To modify an individual stream:
1. In the Airbyte UI, click **Connections** and then click the connection you want to change.
2. Click the **Replication** tab.
3. Scroll down to the **Activate the streams you want to sync** table.
Modify an individual stream:
1. In the **Activate the streams you want to sync** table, toggle **Sync** on or off for your selected stream. To select or deselect all streams, click the checkbox in the table header. To deselect an individual stream, deselect its checkbox in the table.
1. Toggle **Sync** on or off for your selected stream. To select or deselect all streams at once, use "Hide disabled streams" in the table header. To deselect an individual stream, use the toggle in its row.
2. Click the **Sync mode** dropdown and select the sync mode you want to apply. Depending on the sync mode you select, you may need to choose a cursor or primary key.

View File

@@ -44,7 +44,7 @@ To purchase credits directly through the UI,
## Automatic reload of credits
You can enroll in automatic top-ups of your credit balance. Thie feature is for those who do not want to manually add credits each time.
You can enroll in automatic top-ups of your credit balance. This feature is for those who do not want to manually add credits each time.
To enroll, [email us](mailto:billing@airbyte.io) with:

View File

@@ -10,9 +10,10 @@ As a part of connection setup, you select where in the destination you want to w
| Destination Namespace | Description |
| ---------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- |
| Destination default | All streams will be replicated to the single default namespace defined in the Destination's settings. |
| Mirror source structure | Some sources (for example, databases) provide namespace information for a stream. If a source provides namespace information, the destination will mirror the same namespace when this configuration is set. For sources or streams where the source namespace is not known, the behavior will default to the "Destination default" option. |
| Custom format | All streams will be replicated to a single user-defined namespace. See<a href="/understanding-airbyte/namespaces#--custom-format"> Custom format</a> for more details |
| Custom | All streams will be replicated to a single user-defined namespace. See <a href="/understanding-airbyte/namespaces#--custom-format">Custom format</a> for more details. |
| Destination-defined | All streams will be replicated to the single default namespace defined in the Destination's settings. |
| Source-defined | Some sources (for example, databases) provide namespace information for a stream. If a source provides namespace information, the destination will mirror the same namespace when this configuration is set. For sources or streams where the source namespace is not known, the behavior will default to the "Destination-defined" option. |
Most of our destinations support this feature. To learn whether your connector supports it, see the individual connector page. If your desired destination doesn't support it, you can ignore this feature.
@@ -26,7 +27,19 @@ In a source, the namespace is the location from where the data is replicated to
Airbyte supports namespaces and allows Sources to define namespaces, and Destinations to write to various namespaces. In Airbyte, the following options are available and are set on each individual connection.
### Destination default
### Custom
When replicating multiple sources into the same destination, you may create table conflicts where tables are overwritten by different syncs. This is where using a custom namespace will ensure data is synced accurately.
For example, a GitHub source can be replicated into a `github` schema. However, you may have multiple connections writing from different GitHub repositories \(common in multi-tenant scenarios\).
:::tip
To write more than 1 table with the same name to your destination, Airbyte recommends writing the connections to unique namespaces to avoid mixing data from the different GitHub repositories.
:::
You can enter plain text (most common) or additionally add a dynamic parameter `${SOURCE_NAMESPACE}`, which uses the namespace provided by the source if available.
### Destination-defined
All streams will be replicated and stored in the default namespace defined on the destination settings page, which is typically defined when the destination was set up. Depending on your destination, the namespace refers to:
@@ -45,21 +58,9 @@ All streams will be replicated and stored in the default namespace defined on th
If you prefer to replicate multiple sources into the same namespace, use the `Stream Prefix` configuration to differentiate data from these sources to ensure no streams collide when writing to the destination.
:::
### Mirror source structure
### Source-Defined
Some sources \(such as databases based on JDBC\) provide namespace information from which a stream has been extracted. Whenever a source is able to fill this field in the catalog.json file, the destination will try to write to exactly the same namespace when this configuration is set. For sources or streams where the source namespace is not known, the behavior will fall back to the "Destination default". Most APIs do not provide namespace information.
### Custom format
When replicating multiple sources into the same destination, you may create table conflicts where tables are overwritten by different syncs. This is where using a custom namespace will ensure data is synced accurately.
For example, a Github source can be replicated into a `github` schema. However, you may have multiple connections writing from different GitHub repositories \(common in multi-tenant scenarios\).
:::tip
To write more than 1 table with the same name to your destination, Airbyte recommends writing the connections to unique namespaces to avoid mixing data from the different GitHub repositories.
:::
You can enter plain text (most common) or additionally add a dynamic parameter `${SOURCE_NAMESPACE}`, which uses the namespace provided by the source if available.
Some sources \(such as databases based on JDBC\) provide namespace information from which a stream has been extracted. Whenever a source is able to fill this field in the catalog.json file, the destination will try to write to exactly the same namespace when this configuration is set. For sources or streams where the source namespace is not known, the behavior will fall back to the default namespace defined in the destination configuration. Most APIs do not provide namespace information.
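The three namespace options described above can be sketched as a small resolution function. This is only an illustration of the documented behavior, not Airbyte's implementation: the function name, the option strings, and the choice to substitute an empty string when a custom format references `${SOURCE_NAMESPACE}` but the source provides none are all assumptions.

```python
from typing import Optional


def resolve_namespace(
    option: str,
    destination_default: str,
    source_namespace: Optional[str] = None,
    custom_format: str = "",
) -> str:
    """Hypothetical helper: pick the namespace a stream is written to."""
    if option == "destination-defined":
        # Always use the default namespace from the destination settings.
        return destination_default
    if option == "source-defined":
        # Mirror the source namespace; fall back to the destination default
        # when the source does not provide one (most APIs do not).
        return source_namespace or destination_default
    if option == "custom":
        # Plain text is used as-is; ${SOURCE_NAMESPACE} is replaced by the
        # source namespace if available (empty string here is an assumption).
        return custom_format.replace("${SOURCE_NAMESPACE}", source_namespace or "")
    raise ValueError(f"unknown namespace option: {option}")


print(resolve_namespace("destination-defined", "public", "sales"))
print(resolve_namespace("source-defined", "public", "sales"))
print(resolve_namespace("source-defined", "public"))
print(resolve_namespace("custom", "public", "sales", "github_${SOURCE_NAMESPACE}"))
```

With these inputs, the source-defined option yields `sales` when the source reports a namespace and `public` when it does not, while the custom format produces `github_sales`.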
### Examples

View File

@@ -26,11 +26,11 @@ A connection is an automated data pipeline that replicates data from a source to
| Concept | Description |
|-----------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------|
| [Replication Frequency](/using-airbyte/core-concepts/sync-schedules.md) | When should a data sync be triggered? |
| [Destination Namespace and Stream Prefix](/using-airbyte/core-concepts/namespaces.md) | Where should the replicated data be written? |
| [Sync Mode](/using-airbyte/core-concepts/sync-modes/README.md) | How should the streams be replicated (read and written)? |
| [Schema Propagation](/cloud/managing-airbyte-cloud/manage-schema-changes.md) | How should Airbyte handle schema drift in sources? |
| [Catalog Selection](/cloud/managing-airbyte-cloud/configuring-connections.md#modify-streams-in-your-connection) | What data should be replicated from the source to the destination? |
| [Sync Mode](/using-airbyte/core-concepts/sync-modes/README.md) | How should the streams be replicated (read and written)? |
| [Sync Schedule](/using-airbyte/core-concepts/sync-schedules.md) | When should a data sync be triggered? |
| [Destination Namespace and Stream Prefix](/using-airbyte/core-concepts/namespaces.md) | Where should the replicated data be written? |
| [Schema Propagation](/cloud/managing-airbyte-cloud/manage-schema-changes.md) | How should Airbyte handle schema drift in sources? |
## Stream
@@ -51,7 +51,7 @@ Examples of fields:
- A column in the table in a relational database
- A field in an API response
## Sync Schedules
## Sync Schedule
There are three options for scheduling a sync to run:

View File

@@ -4,13 +4,13 @@ products: all
# Sync Schedules
For each connection, you can select between three options that allow a sync to run. The three options for `Replication Frequency` are:
For each connection, you can select between three options that allow a sync to run. The three options for `Schedule Type` are:
- Scheduled (e.g. every 24 hours, every 2 hours)
- Cron scheduling
- Cron
- Manual
## Sync Limitations
## Sync Considerations
* Only one sync per connection can run at a time.
* If a sync is scheduled to run before the previous sync finishes, the scheduled sync will start after the completion of the previous sync.
@@ -21,6 +21,15 @@ For Scheduled or cron scheduled syncs, Airbyte guarantees syncs will initiate wi
:::
## Scheduled syncs
You can choose between the following scheduled options:
- Every 24 hours (most common)
- Every 12 hours
- Every 8 hours
- Every 6 hours
- Every 3 hours
- Every 2 hours
- Every 1 hour
When a scheduled connection is first created, a sync is executed immediately after creation. After that, a sync is run once the time since the last sync \(whether it was triggered manually or due to a schedule\) has exceeded the schedule interval. For example:
- **October 1st, 2pm**, a user sets up a connection to sync data every 24 hours.
@@ -30,7 +39,7 @@ When a scheduled connection is first created, a sync is executed immediately aft
- **October 3rd, 2:01pm:** since the last sync was less than 24 hours ago, no sync is run
- **October 3rd, 5:01pm:** It has been more than 24 hours since the last sync, so a sync is run
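
The rule behind this timeline (a sync runs once the time since the last sync exceeds the schedule interval) can be sketched in a few lines of Python. This is a minimal illustration rather than Airbyte's scheduler code; the function name, the year, and the assumed time of the most recent sync (October 2nd, 5:00pm) are hypothetical.

```python
from datetime import datetime, timedelta


def is_sync_due(last_sync: datetime, now: datetime, interval: timedelta) -> bool:
    """Return True once the time since the last sync has exceeded the interval."""
    return now - last_sync > interval


interval = timedelta(hours=24)
# Assumption: the most recent sync (manual or scheduled) ran on October 2nd at 5:00pm.
last_sync = datetime(2023, 10, 2, 17, 0)

# October 3rd, 2:01pm: only 21 hours have passed, so no sync is run.
print(is_sync_due(last_sync, datetime(2023, 10, 3, 14, 1), interval))  # False
# October 3rd, 5:01pm: more than 24 hours have passed, so a sync is run.
print(is_sync_due(last_sync, datetime(2023, 10, 3, 17, 1), interval))  # True
```

Because the check is based on the last sync of any kind, a manual sync pushes the next scheduled run later.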
## Cron Scheduling
## Cron Syncs
If you prefer more precision in scheduling your sync, you can also use CRON scheduling to set a specific time of day or month.
Airbyte uses the CRON scheduler from [Quartz](http://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html). We recommend reading their [documentation](http://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html) to understand the required formatting. You can also refer to these examples:
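
To get a feel for the Quartz format, note that an expression has six required fields (seconds, minutes, hours, day of month, month, day of week) plus an optional year. The sketch below only labels the fields of an expression; it is not a validator and not part of Airbyte, and the helper name is hypothetical. The expression `0 0 8 * * ?` fires at 8:00 AM every day.

```python
QUARTZ_FIELDS = [
    "seconds",
    "minutes",
    "hours",
    "day_of_month",
    "month",
    "day_of_week",
    "year",  # optional seventh field
]


def describe_quartz_cron(expression: str) -> dict[str, str]:
    """Hypothetical helper: map each part of a Quartz cron expression to its field name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    # zip stops at the shorter sequence, so the year is omitted when absent.
    return dict(zip(QUARTZ_FIELDS, parts))


fields = describe_quartz_cron("0 0 8 * * ?")
print(fields["hours"])        # 8
print(fields["day_of_week"])  # ?
```

A five-field Unix-style expression such as `0 8 * * *` is rejected by this check, which is a common source of confusion when moving from crontab to Quartz.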

View File

@@ -11,7 +11,7 @@ Destinations are the data warehouses, data lakes, databases and analytics tools
Once you've signed up for Airbyte Cloud or logged in to your Airbyte Open Source deployment, click on the **Destinations** tab in the navigation bar found on the left side of the dashboard. This will take you to the list of available destinations.
![Destination List](./assets/getting-started-destination-list.png)
![Destination List](./assets/getting-started-destination-catalog.png)
You can use the provided search bar at the top of the page, or scroll down the list to find the destination you want to replicate data to.

View File

@@ -6,8 +6,7 @@ products: all
Setting up a new source in Airbyte is a quick and simple process! When viewing the Airbyte UI, you'll see the main navigation bar on the left side of your screen. Click the **Sources** tab to bring up a list of all available sources.
<Arcade id="4V0TGOX02P0rwVNwz4MR" title="Getting Started (Source)" />
<Arcade id="WjbChISa931Hc55yx4cb" title="Getting Started (Source)" />
You can use the provided search bar, or simply scroll down the list to find the source you want to replicate data from. Let's use a demo source, Faker, as an example. Clicking on the **Sample Data (Faker)** card will bring us to its setup page.

Binary image files not shown (sizes as listed in the diff view):

- After: 186 KiB
- Before: 181 KiB
- After: 317 KiB
- Before: 146 KiB
- After: 183 KiB
- After: 156 KiB
- Before: 164 KiB

View File

@@ -7,41 +7,50 @@ import TabItem from "@theme/TabItem";
# Set up a Connection
Now that you've learned how to set up your first [source](./add-a-source) and [destination](./add-a-destination), it's time to finish the job by creating your very first connection!
Now that you've learned how to set up your first [source](./add-a-source) and [destination](./add-a-destination), it's time to finish the setup by creating your very first connection!
On the left side of your main Airbyte dashboard, select **Connections**. You will be prompted to choose which source and destination to use for this connection. For this example, we'll use the **Google Sheets** source and the destination you previously set up, either **Local JSON** or **Google Sheets**.
## Configure the connection
Once you've chosen your source and destination, you'll be able to configure the connection. You can refer to [this page](/cloud/managing-airbyte-cloud/configuring-connections.md) for more information on each available configuration. For this demo, we'll simply set the **Replication frequency** to a 24 hour interval and leave the other fields at their default values.
Once you've chosen your source and destination, you can configure the connection. You'll first be asked a few questions about how your data should sync. These correspond to our sync modes, which you can read more about on [this page](/cloud/managing-airbyte-cloud/configuring-connections.md).
![Connection config](./assets/getting-started-connection-configuration.png)
Most users select "Mirror Source", which copies the data from the source to the destination so that you see one row in the destination for each row in the source. If you prefer to Append Historical Changes or take a Full Snapshot with each sync, you can select those options instead, but keep in mind that they will create duplicate records in your destination. The sync mode applied to all enabled streams reflects your selection here.
<Arcade id="9E7CQiWoHtFvB12Yd5zN" title="Getting Started (Select Streams)" />
Next, you can toggle which streams you want to replicate. Our test data consists of three streams, which we've enabled and set to `Incremental - Append + Deduped` sync mode.
![Setup streams](./assets/getting-started-select-streams.png)
Your sync mode is already determined by your selection above, but you can change the sync mode for an individual stream. You can also select a cursor or primary key to enable incremental syncs and/or deduplication. For more information on the nature of each sync mode supported by Airbyte, see [this page](/using-airbyte/core-concepts/sync-modes).
You can also select individual fields to sync on this page. Expand the fields available by clicking any stream. This is helpful when you have security concerns or don't want to sync all the data from the source.
![Column Selection](./assets/getting-started-column-selection.png)
Click **Next** to complete your stream setup and move to the connection configuration. This is where you'll set up how often your data will sync and where it will live in the destination. For this demo, we'll set the connection to run at 8 AM every day and sync the connection to a custom namespace with a stream prefix.
<Arcade id="KdySgaUBwroRxkYLnemX" title="Getting Started (Configure Connection)" />
:::note
By default, data will sync to the default defined in the destination. To ensure your data is synced to the correct place, see our examples for [Destination Namespace](/using-airbyte/core-concepts/namespaces.md)
To ensure your data is synced to the correct place, see our examples for [Destination Namespace](/using-airbyte/core-concepts/namespaces.md).
:::
Next, you can toggle which streams you want to replicate, as well as setting up the desired sync mode for each stream. For more information on the nature of each sync mode supported by Airbyte, see [this page](/using-airbyte/core-concepts/sync-modes).
Our test data consists of three streams, which we've enabled and set to `Incremental - Append + Deduped` sync mode.
![Stream config](./assets/getting-started-stream-selection.png)
Click **Set up connection** to complete your first connection. Your first sync is about to begin!
Once you've set up all the connection settings, click "Set up connection". You've successfully set up your first data pipeline with Airbyte. Your first sync is about to begin!
## Connection Overview
Once you've finished setting up the connection, you will be automatically redirected to a connection overview containing all the tools you need to keep track of your connection.
![Connection dashboard](./assets/getting-started-connection-complete.png)
![Connection dashboard](./assets/getting-started-status-page.png)
Here's a basic overview of the tabs and their use:
1. The **Status** tab shows you an overview of your connector's sync health.
2. The **Job History** tab allows you to check the logs for each sync. If you encounter any errors or unexpected behaviors during a sync, checking the logs is always a good first step to finding the cause and solution.
3. The **Replication** tab allows you to modify the configurations you chose during the connection setup.
3. The **Schema** tab allows you to modify the streams you chose during the connection setup.
4. The **Transformation** tab allows you to set up custom post-sync transformations using dbt.
4. The **Settings** tab contains additional settings, and the option to delete the connection if you no longer wish to use it.
4. The **Settings** tab contains the connection settings, and the option to delete the connection if you no longer wish to use it.
### Check the data from your first sync

View File

@@ -477,11 +477,6 @@ module.exports = {
"cloud/managing-airbyte-cloud/manage-connection-state",
],
},
{
type: "doc",
label: "Using PyAirbyte",
id: "using-airbyte/pyairbyte/getting-started",
},
{
type: "category",
label: "Workspace Management",
@@ -569,6 +564,11 @@ module.exports = {
type: "doc",
id: "terraform-documentation",
},
{
type: "doc",
label: "Using PyAirbyte",
id: "using-airbyte/pyairbyte/getting-started",
},
understandingAirbyte,
contributeToAirbyte,
{