Alexandre Girard 288c3cabad Tutorial and documentation for config-based connectors (#15027)
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
Co-authored-by: Augustin <augustin.lafanechere@gmail.com>
Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com>
2022-08-12 15:50:54 -07:00


Record selector

The record selector is responsible for translating an HTTP response into a list of Airbyte records by extracting records from the response and, optionally, filtering and shaping them.

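The current record extraction implementation uses dpath to select records from the JSON-decoded HTTP response. The location of the records within the response is given by the extractor's field_pointer, a list of keys.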
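The extraction behavior described in the recipes below can be approximated in plain Python. This is a minimal sketch, not the connector framework's implementation: it replaces dpath with a hypothetical `extract_records` helper that walks the `field_pointer` keys one at a time, returns a list as-is, wraps a single object in a one-element list, and (as an assumption of this sketch) returns an empty list when the pointer does not resolve. Glob patterns, which dpath supports, are not handled.

```python
import json
from typing import Any, Dict, List, Sequence


def extract_records(body: Any, field_pointer: Sequence[str]) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for the dpath-based extractor.

    An empty field_pointer selects the whole response body.
    """
    node = body
    for key in field_pointer:
        if not isinstance(node, dict) or key not in node:
            return []  # assumption: an unresolvable pointer yields no records
        node = node[key]
    if isinstance(node, list):
        return node  # already a list of records
    if node:
        return [node]  # a single record is wrapped in a list
    return []


# Response bodies mirroring the recipes in this section.
root_array = json.loads('[{"id": 0}, {"id": 1}]')
single_object = json.loads('{"id": 1}')
with_metadata = json.loads('{"data": [{"id": 0}, {"id": 1}], "metadata": {"api-version": "1.0.0"}}')
nested = json.loads('{"data": {"records": [{"id": 1}, {"id": 2}]}}')

print(extract_records(root_array, []))               # [{'id': 0}, {'id': 1}]
print(extract_records(single_object, []))            # [{'id': 1}]
print(extract_records(with_metadata, ["data"]))      # [{'id': 0}, {'id': 1}]
print(extract_records(nested, ["data", "records"]))  # [{'id': 1}, {'id': 2}]
```

Walking the keys explicitly makes it easy to see why the "metadata" object is ignored when the pointer is ["data"]: only the value under the selected key is ever returned.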

Common recipes

Selecting the whole response

If the root of the response is an array containing the records, the records can be extracted using the following definition:

selector:
  extractor:
    field_pointer: [ ]

If the root of the response is a JSON object representing a single record, the same definition extracts the record and wraps it in an array.

For example, given a response body of the form

{
  "id": 1
}

and a selector

selector:
  extractor:
    field_pointer: [ ]

The selected records will be

[
  {
    "id": 1
  }
]

Selecting a field

Given a response body of the form

{
  "data": [{"id": 0}, {"id": 1}],
  "metadata": {"api-version": "1.0.0"}
}

and a selector

selector:
  extractor:
    field_pointer: [ "data" ]

The selected records will be

[
  {
    "id": 0
  },
  {
    "id": 1
  }
]

Selecting an inner field

Given a response body of the form

{
  "data": {
    "records": [
      {
        "id": 1
      },
      {
        "id": 2
      }
    ]
  }
}

and a selector

selector:
  extractor:
    field_pointer:
      - "data"
      - "records"

The selected records will be

[
  {
    "id": 1
  },
  {
    "id": 2
  }
]

Filtering records

Records can be filtered by adding a record_filter to the selector. The filter's condition is evaluated as a boolean for each record, and only records for which it evaluates to true are included.

In this example, all records whose created_at field is greater than or equal to the stream slice's start_time will be filtered out:

selector:
  extractor:
    field_pointer: [ ]
  record_filter:
    condition: "{{ record['created_at'] < stream_slice['start_time'] }}"
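Conceptually, the filter keeps a record only when its condition is true. The sketch below mimics the example above with a plain Python predicate in place of the interpolated `{{ ... }}` string; `filter_records`, `created_before_start`, and the sample data are hypothetical, and the real condition is evaluated as an interpolated string, not as a Python function. The sample uses ISO 8601 timestamps in the same format and timezone, which is an assumption that makes plain string comparison order them correctly.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def filter_records(
    records: List[Record],
    stream_slice: Dict[str, Any],
    condition: Callable[[Record, Dict[str, Any]], bool],
) -> List[Record]:
    """Keep only the records for which `condition` returns True."""
    return [record for record in records if condition(record, stream_slice)]


# Python equivalent of "{{ record['created_at'] < stream_slice['start_time'] }}"
def created_before_start(record: Record, stream_slice: Dict[str, Any]) -> bool:
    return record["created_at"] < stream_slice["start_time"]


records = [
    {"id": 1, "created_at": "2022-01-01T00:00:00Z"},
    {"id": 2, "created_at": "2022-03-01T00:00:00Z"},
]
stream_slice = {"start_time": "2022-02-01T00:00:00Z"}

print(filter_records(records, stream_slice, created_before_start))
# [{'id': 1, 'created_at': '2022-01-01T00:00:00Z'}]
```

Record 2 is dropped because its created_at is not earlier than the slice's start_time.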

Transformations

Fields can be added to or removed from records by adding Transformations to a stream's definition.

Adding fields

Fields can be added with the AddFields transformation. This example adds a top-level field "field1" with the value "static_value":

stream:
  <...>
  transformations:
      - type: AddFields
        fields:
          - path: [ "field1" ]
            value: "static_value"

This example adds a top-level field "start_date", whose value is evaluated from the stream slice:

stream:
  <...>
  transformations:
      - type: AddFields
        fields:
          - path: [ "start_date" ]
            value: "{{ stream_slice['start_date'] }}"

Fields can also be added to a nested object by writing the field's path as a list of keys.

Given a record of the following shape:

{
  "id": 0,
  "data":
  {
    "field0": "some_data"
  }
}

this definition will add a field in the "data" nested object:

stream:
  <...>
  transformations:
      - type: AddFields
        fields:
          - path: [ "data", "field1" ]
            value: "static_value"

resulting in the following record:

{
  "id": 0,
  "data":
  {
    "field0": "some_data",
    "field1": "static_value"
  }
}
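Adding a field at a nested path amounts to walking the keys of the path and setting the last one. The following is a minimal sketch of that behavior, not the framework's AddFields implementation: `add_fields` is a hypothetical helper, creating missing intermediate objects is an assumption of this sketch, and a Python callable computed from the stream slice stands in for an interpolated value such as "{{ stream_slice['start_date'] }}".

```python
import copy
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def add_fields(
    record: Record,
    fields: List[Tuple[List[str], Any]],
    stream_slice: Dict[str, Any],
) -> Record:
    """Return a copy of `record` with each (path, value) pair added.

    A callable value is evaluated against the stream slice, standing in for
    an interpolated string; any other value is inserted as-is.
    """
    result = copy.deepcopy(record)
    for path, value in fields:
        node = result
        for key in path[:-1]:
            node = node.setdefault(key, {})  # assumption: create missing objects
        node[path[-1]] = value(stream_slice) if callable(value) else value
    return result


record = {"id": 0, "data": {"field0": "some_data"}}
stream_slice = {"start_date": "2022-01-01"}

print(add_fields(
    record,
    [
        (["data", "field1"], "static_value"),         # nested, static value
        (["start_date"], lambda s: s["start_date"]),  # top-level, from the slice
    ],
    stream_slice,
))
# {'id': 0, 'data': {'field0': 'some_data', 'field1': 'static_value'}, 'start_date': '2022-01-01'}
```

The sketch copies the record first so the input is left unchanged; whether the real transformation mutates records in place is not stated here.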

Removing fields

Fields can be removed from records with the RemoveFields transformation.

Given a record of the following shape:

{
  "path": 
  {
    "to":
    {
      "field1": "data_to_remove",
      "field2": "data_to_keep"
    }
  },
  "path2": "data_to_remove",
  "path3": "data_to_keep"
}

this definition will remove the two instances of "data_to_remove", found at "path2" and "path.to.field1":

the_stream:
  <...>
  transformations:
      - type: RemoveFields
        field_pointers:
          - [ "path", "to", "field1" ]
          - [ "path2" ]

resulting in the following record:

{
  "path": 
  {
    "to":
    {
      "field2": "data_to_keep"
    }
  },
  "path3": "data_to_keep"
}
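Each entry in field_pointers is a list of keys leading to the field to delete. The sketch below reproduces the example above in plain Python; `remove_fields` is a hypothetical helper rather than the framework's RemoveFields implementation, and silently ignoring pointers that do not resolve is an assumption of this sketch.

```python
import copy
from typing import Any, Dict, List

Record = Dict[str, Any]


def remove_fields(record: Record, field_pointers: List[List[str]]) -> Record:
    """Return a copy of `record` without the fields at the given key paths.

    Assumption: pointers that do not resolve are ignored.
    """
    result = copy.deepcopy(record)
    for pointer in field_pointers:
        node = result
        for key in pointer[:-1]:
            # Descend one level; stop if the path leaves the object tree.
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, dict):
            node.pop(pointer[-1], None)  # delete the last key if present
    return result


record = {
    "path": {"to": {"field1": "data_to_remove", "field2": "data_to_keep"}},
    "path2": "data_to_remove",
    "path3": "data_to_keep",
}

print(remove_fields(record, [["path", "to", "field1"], ["path2"]]))
# {'path': {'to': {'field2': 'data_to_keep'}}, 'path3': 'data_to_keep'}
```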