1
0
mirror of synced 2025-12-25 02:09:19 -05:00

update incremental docs (#1364)

This commit is contained in:
Charles
2020-12-17 09:38:47 -08:00
committed by GitHub
parent a4c05f2d56
commit e5fe44eaa1
3 changed files with 25 additions and 18 deletions

Binary file not shown.

After

Width:  |  Height:  |  Size: 38 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 48 KiB

View File

@@ -1,26 +1,22 @@
# Incremental
# Incremental - Append
## Overview
Incremental syncs in Airbyte allow sources to replicate only new or modified data. This prevents re-fetching data that you have already replicated from a source. We will call this set of new or updated records the delta going forward.
Sources syncing with **Incremental - Append** only replicate _new_ or _modified_ data. This prevents re-fetching data that you have already replicated from a source.
## Configuration
In this flavor of incremental, records in the warehouse will never be deleted or mutated. A copy of each new or updated record is _appended_ to the data in the warehouse. This means you can find multiple copies of the same record in the destination warehouse. We provide an "at least once" guarantee of replicating each record that is present when the sync runs.
For a source to do incremental sync is must be able to keep track of new and updated records. This can take a couple different forms. Before we jump into them, we are going to use the word cursor or cursor field to describe the field or column in the data that Airbyte uses as a comparable to determine if any given record is new or has been updated since the last sync.
## Definitions
## Source-Defined Cursor.
We are going to use the word `cursor` to describe the value used to track whether a record should be replicated in an incremental sync. A common example of a `cursor` would be a timestamp from an `updated_at` column in a database table.
Some sources are able to determine the cursor that the use without any user input. For example, in the exchange rates api source, the source itself can determine that date field should be used to determine the last record that was synced. In these cases, the source will set the `cursor_field` attribute in the `AirbyteStream`.
While a `cursor` is the _value_ that is used to determine if a record should be replicated, a `cursor field` is the _field_ or _column_ in the data where that cursor can be found. Extending the above example, the `updated_at` column in the database would be the `cursor field`.
## User-Defined Cursor
Some sources cannot define the cursor without user input. For example, in the postgres source, the user needs to choose for themselves which database tables they want to sync. The author of the source cannot predict this. In these cases the user sets the `cursor_field` in the `ConfiguredAirbyteStream`.
In some cases, the source may propose a `default_cursor_field` in the `AirbyteStream`. In this case, if the user does not specify a `cursor_field` in the `ConfiguredAirbyteStream`, Airbyte will fallback on the default provided by the source. The user is allowed to override the source's `default_cursor_field` by setting the `cursor_field` value in the `ConfiguredAirbyteStream`, but they CANNOT override the `cursor_field` specified in an `AirbyteStream`
We will refer to the set of records that the source identifies as being new or updated as a `delta`.
## Rules
The delta from a sync will be _appended_ to the existing data in the data warehouse. Incremental will never delete or mutate existing records. Let's walk through a few examples.
As mentioned above, the delta from a sync will be _appended_ to the existing data in the data warehouse. Incremental will never delete or mutate existing records. Let's walk through a few examples.
### Newly Created Record
@@ -33,13 +29,13 @@ Assume that `updated_at` is our `cursor_field`. Let's say the following data alr
]
```
In the next sync the delta contains the following record:
In the next sync, the delta contains the following record:
```javascript
{ "name": "Louis XVII", "deceased": false, "updated_at": 1785 }
```
At the end of this incremental sync the data warehouse would now contain:
At the end of this incremental sync, the data warehouse would now contain:
```javascript
[
@@ -51,7 +47,7 @@ At the end of this incremental sync the data warehouse would now contain:
### Updating a Record
Let's assume that our warehouse contains all of the data that it did at the end of the previous section. Now unfortunately the king and queen lose their heads. Let's see that delta:
Let's assume that our warehouse contains all the data that it did at the end of the previous section. Now unfortunately the king and queen lose their heads. Let's see that delta:
```javascript
[
@@ -60,7 +56,7 @@ Let's assume that our warehouse contains all of the data that it did at the end
]
```
The output we expect to see in the warehouse is as follows.
The output we expect to see in the warehouse is as follows:
```javascript
[
@@ -72,7 +68,18 @@ The output we expect to see in the warehouse is as follows.
]
```
### Schema Migration
## Source-Defined Cursor
If the schema for the stream changes, Airbyte will _not_ allow an incremental sync to that stream. The user must first run a full refresh.
Some sources are able to determine the cursor that they use without any user input. For example, in the [exchange rates source](../integrations/sources/exchangeratesapi-io.md), the source knows that the date field should be used to determine the last record that was synced. In these cases, simply select the incremental option in the UI.
![](../.gitbook/assets/incremental_source_defined.png)
(You can find a more technical details about the configuration data model [here](catalog.md)).
## User-Defined Cursor
Some sources cannot define the cursor without user input. For example, in the [postgres source](../integrations/sources/postgres.md), the user needs to choose which column in a database table they want to use as the `cursor field`. In these cases, select the column in the sync settings dropdown that should be used as the `cursor field`.
![](../.gitbook/assets/incremental_user_defined.png)
(You can find a more technical details about the configuration data model [here](catalog.md)).