AirbyteCatalog Reference
Overview
An AirbyteCatalog is a struct that is produced by the discover action of a source. It is a list of AirbyteStreams. Each AirbyteStream describes the data available to be synced from the source. After a source produces an AirbyteCatalog or AirbyteStream, it should be treated as read-only. A ConfiguredAirbyteCatalog is a list of ConfiguredAirbyteStreams. Each ConfiguredAirbyteStream describes how to sync an AirbyteStream.
Cursor
- The cursor is how sources track which records are new or updated since the last sync.
- A "cursor field" is the field whose values are compared to make this determination.
- If a configuration requires a cursor field, it requires an array of strings that serves as a path to the desired field. For example, if the structure of a stream is { value: 2, metadata: { updated_at: 2020-11-01 } }, the default_cursor_field might be ["metadata", "updated_at"].
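As a rough illustration of how an array of strings can act as a path into a record, the sketch below walks the example structure one key at a time. The helper name get_cursor_value is hypothetical and is not part of Airbyte; the date is written as a string so the record is valid Python.

```python
from typing import Any, Mapping, Sequence


def get_cursor_value(record: Mapping[str, Any], cursor_field: Sequence[str]) -> Any:
    """Follow cursor_field, one key per nesting level, and return the value found."""
    value: Any = record
    for key in cursor_field:
        # Raises KeyError if the path does not exist in the record.
        value = value[key]
    return value


record = {"value": 2, "metadata": {"updated_at": "2020-11-01"}}
cursor_value = get_cursor_value(record, ["metadata", "updated_at"])
print(cursor_value)  # 2020-11-01
```

A single-element path such as ["value"] addresses a top-level field in the same way.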
AirbyteStream
This section documents the meaning of each field in an AirbyteStream.
- json_schema - This field contains a JsonSchema representation of the schema of the stream.
- supported_sync_modes - The sync modes that the stream supports. By default, all sources support FULL_REFRESH. Even if this array is empty, it can be assumed that a source supports FULL_REFRESH. The allowed sync modes are FULL_REFRESH and INCREMENTAL.
- source_defined_cursor - If a source supports the INCREMENTAL sync mode and sets this field to true, the source is responsible for determining internally how it tracks which records are new or updated since the last sync.
- default_cursor_field - If a source supports the INCREMENTAL sync mode, it may optionally set this field. It is an array of keys to a field in the schema. If this field is set, and the user does not override it with the cursor_field attribute in the ConfiguredAirbyteStream (described below), this field will be used as the cursor.
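To make the fields concrete, here is a minimal sketch of an AirbyteStream written as a Python dictionary. The stream contents are invented for illustration, only the four fields described above are shown, and the helper effective_sync_modes is a hypothetical name that encodes the rule that FULL_REFRESH is assumed even when the array is empty.

```python
from typing import Any, Mapping, Set

# Hypothetical stream; the schema and values are illustrative only.
airbyte_stream = {
    "json_schema": {
        "type": "object",
        "properties": {
            "value": {"type": "integer"},
            "metadata": {
                "type": "object",
                "properties": {"updated_at": {"type": "string"}},
            },
        },
    },
    "supported_sync_modes": ["FULL_REFRESH", "INCREMENTAL"],
    "source_defined_cursor": False,
    "default_cursor_field": ["metadata", "updated_at"],
}


def effective_sync_modes(stream: Mapping[str, Any]) -> Set[str]:
    """Return the declared sync modes, always including FULL_REFRESH."""
    declared = stream.get("supported_sync_modes") or []
    return set(declared) | {"FULL_REFRESH"}


print(sorted(effective_sync_modes(airbyte_stream)))
print(sorted(effective_sync_modes({"supported_sync_modes": []})))
```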
ConfiguredAirbyteStream
This section documents the meaning of each field in a ConfiguredAirbyteStream.
- stream - This field contains the AirbyteStream that is being configured.
- sync_mode - The sync mode that will be used to sync that stream. The value in this field MUST be present in the supported_sync_modes array for the discovered AirbyteStream of this stream.
- cursor_field - This field is an array of keys to a field in the schema that, in the INCREMENTAL sync mode, will be used to determine if a record is new or updated since the last sync.
  - If an AirbyteStream has source_defined_cursor set to true, then the cursor_field attribute in ConfiguredAirbyteStream will be ignored.
  - If an AirbyteStream defines a default_cursor_field, then the cursor_field attribute in ConfiguredAirbyteStream is not required, but if it is set, it will override the default value.
  - If an AirbyteStream does not set source_defined_cursor to true and does not define a default_cursor_field, then ConfiguredAirbyteStream must define a cursor_field.
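The sync_mode requirement can be sketched as a small check. This is an illustrative helper under stated assumptions, not Airbyte's own validation code: the function name validate_sync_mode and the dictionary layout are hypothetical, and FULL_REFRESH is treated as always supported, following the AirbyteStream section.

```python
from typing import Any, Mapping


def validate_sync_mode(configured_stream: Mapping[str, Any]) -> None:
    """Raise ValueError if sync_mode is not supported by the wrapped stream."""
    stream = configured_stream["stream"]
    # FULL_REFRESH is assumed to be supported even when the array is empty.
    supported = set(stream.get("supported_sync_modes") or []) | {"FULL_REFRESH"}
    sync_mode = configured_stream["sync_mode"]
    if sync_mode not in supported:
        raise ValueError(
            f"sync_mode {sync_mode!r} is not one of {sorted(supported)}"
        )


# Hypothetical configured stream wrapping a discovered stream.
configured_stream = {
    "stream": {
        "json_schema": {"type": "object"},
        "supported_sync_modes": ["FULL_REFRESH", "INCREMENTAL"],
        "source_defined_cursor": False,
    },
    "sync_mode": "INCREMENTAL",
    "cursor_field": ["metadata", "updated_at"],
}

validate_sync_mode(configured_stream)  # passes without raising
```

Requesting INCREMENTAL for a stream that only declares FULL_REFRESH would raise ValueError in this sketch.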
Logic for resolving the Cursor Field
This section lays out how a cursor field is determined for a stream that is doing an incremental sync. The rules are checked in order:
- If source_defined_cursor in AirbyteStream is true, then the source determines the cursor field internally. It cannot be overridden. If it is false, continue...
- If cursor_field in ConfiguredAirbyteStream is set, then the source uses that field as the cursor. If it is not set, continue...
- If default_cursor_field in AirbyteStream is set, then the source uses that field as the cursor. If it is not set, continue...
- Illegal - If source_defined_cursor is false and neither cursor_field nor default_cursor_field is set, this is an invalid configuration.
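The resolution order above can be sketched as a single function. This is a minimal illustration rather than Airbyte's implementation: resolve_cursor_field is a hypothetical name, streams are plain dictionaries, and returning None to mean "the source tracks the cursor itself" is an assumption of this sketch.

```python
from typing import Any, List, Mapping, Optional


def resolve_cursor_field(configured_stream: Mapping[str, Any]) -> Optional[List[str]]:
    """Return the cursor path to use, or None if the source defines the cursor."""
    stream = configured_stream["stream"]

    # 1. A source-defined cursor wins and cannot be overridden.
    if stream.get("source_defined_cursor"):
        return None

    # 2. Otherwise the user's cursor_field is used when it is set.
    if configured_stream.get("cursor_field"):
        return list(configured_stream["cursor_field"])

    # 3. Otherwise fall back to the stream's default_cursor_field.
    if stream.get("default_cursor_field"):
        return list(stream["default_cursor_field"])

    # 4. Nothing is set: the configuration is invalid.
    raise ValueError("no cursor field could be resolved for an incremental sync")


with_default = {
    "stream": {"source_defined_cursor": False, "default_cursor_field": ["updated_at"]},
    "sync_mode": "INCREMENTAL",
}
with_override = {**with_default, "cursor_field": ["metadata", "updated_at"]}

print(resolve_cursor_field(with_default))   # ['updated_at']
print(resolve_cursor_field(with_override))  # ['metadata', 'updated_at']
```

Empty arrays are treated the same as unset fields here, which matches the "all unset" wording of the last rule.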