airbyte/docs/connector-development/config-based/advanced-topics.md
Brian Lai cbf9ea76c1 [Low-Code CDK] Construct declarative components from Pydantic models (#21050)
Co-authored-by: Catherine Noll <noll.catherine@gmail.com>
Co-authored-by: brianjlai <brianjlai@users.noreply.github.com>
2023-01-12 21:02:08 -05:00

# Advanced Topics
## Object instantiation
This section describes how objects are instantiated from the YAML definition.
If the component is a literal, then it is returned as is:
```
3
```
will result in
```
3
```
If the component definition is a mapping with a "type" field,
the factory will look up the type in the [CLASS_TYPES_REGISTRY](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/class_types_registry.py), replace the "type" field with "class_name" -> CLASS_TYPES_REGISTRY[type],
and instantiate the object from the resulting mapping.
If the component definition is a mapping with neither a "class_name" nor a "type" field,
the factory will make a best-effort attempt at inferring the component type by looking up the parent object's constructor type hints.
If the type hint is an interface present in [DEFAULT_IMPLEMENTATIONS_REGISTRY](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/default_implementation_registry.py),
then the factory will create an object of its default implementation.
If the component definition is a list, then the factory will iterate over the elements of the list,
instantiate its subcomponents, and return a list of instantiated objects.
If the component has subcomponents, the factory will create the subcomponents before instantiating the top-level object:
```
{
  "type": "TopLevel",
  "param": {
    "type": "ParamType",
    "k": "v"
  }
}
```
will result in
```
TopLevel(param=ParamType(k="v"))
```
More details on object instantiation can be found [here](https://airbyte-cdk.readthedocs.io/en/latest/api/airbyte_cdk.sources.declarative.parsers.html?highlight=factory#airbyte_cdk.sources.declarative.parsers.factory.DeclarativeComponentFactory).
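The rules above can be sketched as a small recursive function. This is a minimal illustration, not the CDK's factory: `REGISTRY`, `create_component`, and the two dataclasses are hypothetical names, and the sketch leaves out type-hint inference and default implementations.

```python
from dataclasses import dataclass
from typing import Any, Mapping


# Hypothetical components and registry, used only for illustration.
@dataclass
class ParamType:
    k: str


@dataclass
class TopLevel:
    param: ParamType


REGISTRY = {"TopLevel": TopLevel, "ParamType": ParamType}


def create_component(definition: Any) -> Any:
    """Recursively build objects from a parsed definition."""
    if isinstance(definition, list):
        # Lists: instantiate every element and return the list of results.
        return [create_component(item) for item in definition]
    if isinstance(definition, Mapping) and "type" in definition:
        cls = REGISTRY[definition["type"]]
        # Subcomponents are created before the object that owns them.
        kwargs = {k: create_component(v) for k, v in definition.items() if k != "type"}
        return cls(**kwargs)
    if isinstance(definition, Mapping):
        # Untyped mappings are kept as plain dicts in this sketch.
        return {k: create_component(v) for k, v in definition.items()}
    # Literals are returned as is.
    return definition


component = create_component({"type": "TopLevel", "param": {"type": "ParamType", "k": "v"}})
print(component)  # TopLevel(param=ParamType(k='v'))
```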
## $options
Parameters can be passed down from a parent component to its subcomponents using the $options key.
This can be used to avoid repetitions.
Schema:
```yaml
"$options":
  type: object
  additionalProperties: true
```
Example:
```yaml
outer:
  $options:
    MyKey: MyValue
  inner:
    k2: v2
```
In the example above, if both outer and inner are types with a "MyKey" field, both of them will evaluate to "MyValue".
These parameters can be overwritten by subcomponents as a form of specialization:
```yaml
outer:
  $options:
    MyKey: MyValue
  inner:
    $options:
      MyKey: YourValue
    k2: v2
```
In this example, "outer.MyKey" will evaluate to "MyValue", and "inner.MyKey" will evaluate to "YourValue".
The value can also be used for string interpolation:
```yaml
outer:
  $options:
    MyKey: MyValue
  inner:
    k2: "MyKey is {{ options['MyKey'] }}"
```
In this example, "outer.inner.k2" will evaluate to "MyKey is MyValue".
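The inheritance and override behavior of `$options` can be sketched as a recursive merge. This is an illustration under assumptions, not the framework's code: `propagate_options` is a hypothetical helper that works on plain dictionaries.

```python
from typing import Any, Dict, Mapping, Optional


def propagate_options(
    component: Mapping[str, Any], inherited: Optional[Mapping[str, Any]] = None
) -> Dict[str, Any]:
    """Return a copy of `component` where each nested mapping carries its effective `$options`."""
    # Options declared on this component override those inherited from the parent.
    merged = {**(inherited or {}), **component.get("$options", {})}
    result: Dict[str, Any] = {}
    for key, value in component.items():
        if key == "$options":
            continue
        if isinstance(value, Mapping):
            # Subcomponents receive the merged options of their parent.
            result[key] = propagate_options(value, merged)
        else:
            result[key] = value
    if merged:
        result["$options"] = merged
    return result


manifest = {
    "outer": {
        "$options": {"MyKey": "MyValue"},
        "inner": {"$options": {"MyKey": "YourValue"}, "k2": "v2"},
    }
}
resolved = propagate_options(manifest)
print(resolved["outer"]["$options"]["MyKey"])           # MyValue
print(resolved["outer"]["inner"]["$options"]["MyKey"])  # YourValue
```

If `inner` declared no `$options` of its own, it would inherit `MyKey: MyValue` from `outer` unchanged.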
## References
Strings can contain references to previously defined values.
The parser will dereference these values to produce a complete object definition.
References can be defined using a "*ref({arg})" string.
```yaml
key: 1234
reference: "*ref(key)"
```
will produce the following definition:
```yaml
key: 1234
reference: 1234
```
This also works with objects:
```yaml
key_value_pairs:
  k1: v1
  k2: v2
same_key_value_pairs: "*ref(key_value_pairs)"
```
will produce the following definition:
```yaml
key_value_pairs:
  k1: v1
  k2: v2
same_key_value_pairs:
  k1: v1
  k2: v2
```
The $ref keyword can be used to refer to an object and enhance it with additional key-value pairs:
```yaml
key_value_pairs:
  k1: v1
  k2: v2
same_key_value_pairs:
  $ref: "*ref(key_value_pairs)"
  k3: v3
```
will produce the following definition:
```yaml
key_value_pairs:
  k1: v1
  k2: v2
same_key_value_pairs:
  k1: v1
  k2: v2
  k3: v3
```
References can also point to nested values.
Nested references are ambiguous because a key can itself contain a `.`.
In this example, we want to refer to the limit key in the dict object:
```yaml
dict:
  limit: 50
limit_ref: "*ref(dict.limit)"
```
will produce the following definition:
```yaml
dict:
  limit: 50
limit_ref: 50
```
Here, by contrast, we want to access the value of the top-level `nested.path` key:
```yaml
nested:
  path: "first one"
nested.path: "uh oh"
value: "*ref(nested.path)"
```
will produce the following definition:
```yaml
nested:
  path: "first one"
nested.path: "uh oh"
value: "uh oh"
```
To resolve the ambiguity, we try looking for the reference key at the top-level, and then traverse the structs downward
until we find a key with the given path, or until there is nothing to traverse.
More details on referencing values can be found [here](https://airbyte-cdk.readthedocs.io/en/latest/api/airbyte_cdk.sources.declarative.parsers.html?highlight=yamlparser#airbyte_cdk.sources.declarative.parsers.yaml_parser.YamlParser).
## String interpolation
String values can be evaluated as Jinja2 templates.
If the input string is a raw string, the interpolated string will be the same.
`"hello world" -> "hello world"`
The engine will evaluate the content passed within `{{...}}`, interpolating the keys from context-specific arguments.
The "options" keyword (see [$options](#options)) can be referenced.
For example, some_object.inner_object.key will evaluate to "Hello airbyte" at runtime.
```yaml
some_object:
  $options:
    name: "airbyte"
  inner_object:
    key: "Hello {{ options.name }}"
```
Some components also pass in additional arguments to the context.
This is the case for the [record selector](./understanding-the-yaml-file/record-selector.md), which passes in an additional `response` argument.
Dot notation and bracket notation (with single quotes, `'`) are interchangeable.
This means that both these string templates will evaluate to the same string:
1. `"{{ options.name }}"`
2. `"{{ options['name'] }}"`
In addition to passing additional values through the $options argument, macros can be called from within the string interpolation.
For example,
`"{{ max(2, 3) }}" -> 3`
The macros available can be found [here](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/declarative/interpolation/macros.py).
Additional information on jinja templating can be found at [https://jinja.palletsprojects.com/en/3.1.x/templates/#](https://jinja.palletsprojects.com/en/3.1.x/templates/#)
## Component schema reference
A JSON schema representation of the relationships between the components that can be used in the YAML configuration can be found [here](../../../airbyte-cdk/python/airbyte_cdk/sources/declarative/declarative_component_schema.yaml).
## Custom components
:::info
Please help us improve the low code CDK! If you find yourself needing to build a custom component, please [create a feature request issue](https://github.com/airbytehq/airbyte/issues/new?assignees=&labels=type%2Fenhancement%2C+%2Cneeds-triage%2C+area%2Flow-code%2Fcomponents&template=feature-request.md&title=Low%20Code%20Feature:). If appropriate, we'll add it directly to the framework (or you can submit a PR)!
If an issue already exists for the missing feature you need, please upvote or comment on it so we can prioritize the issue accordingly.
:::
Any built-in component can be overridden by a custom Python class.
To create a custom component, define a new class in a new file in the connector's module.
The class must implement the interface of the component it is replacing. For instance, a pagination strategy must implement `airbyte_cdk.sources.declarative.requesters.paginators.strategies.pagination_strategy.PaginationStrategy`.
The class must also be a dataclass where each field represents an argument to configure from the YAML file, plus an `InitVar` named `options`.
For example:
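```python
from dataclasses import InitVar, dataclass
from typing import Any, List, Mapping, Optional, Union

import requests
from airbyte_cdk.sources.declarative.interpolation.interpolated_string import InterpolatedString
from airbyte_cdk.sources.declarative.requesters.paginators.strategies.pagination_strategy import PaginationStrategy


@dataclass
class MyPaginationStrategy(PaginationStrategy):
    my_field: Union[InterpolatedString, str]  # configured from the YAML file
    options: InitVar[Mapping[str, Any]]  # values passed down through $options

    def __post_init__(self, options: Mapping[str, Any]):
        pass

    def next_page_token(self, response: requests.Response, last_records: List[Mapping[str, Any]]) -> Optional[Any]:
        pass

    def reset(self):
        pass
```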
This class can then be referenced from the YAML file by specifying the type of custom component and using its fully qualified class name:
```yaml
pagination_strategy:
  type: "CustomPaginationStrategy"
  class_name: "my_connector_module.MyPaginationStrategy"
  my_field: "hello world"
```
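One plausible way to turn a fully qualified `class_name` into an object is a dynamic import. The sketch below is an assumption about the general technique, not the CDK's loader: `load_class` and `build_custom_component` are hypothetical helpers, a standard-library class stands in for `my_connector_module.MyPaginationStrategy`, and the `options` argument is left out.

```python
import importlib
from typing import Any, Mapping


def load_class(class_name: str) -> type:
    """Import the module part of a dotted path and return the named class."""
    module_name, _, attribute = class_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


def build_custom_component(definition: Mapping[str, Any]) -> Any:
    """Instantiate the class named by `class_name`, using the remaining keys as arguments."""
    cls = load_class(definition["class_name"])
    kwargs = {k: v for k, v in definition.items() if k not in ("type", "class_name")}
    return cls(**kwargs)


# `datetime.timedelta` is only a stand-in for a connector's own class.
component = build_custom_component(
    {"type": "CustomPaginationStrategy", "class_name": "datetime.timedelta", "seconds": 30}
)
print(component.total_seconds())  # 30.0
```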
## How the framework works
1. Given the connection config and an optional stream state, the `StreamSlicer` computes the stream slices to read.
2. Iterate over all the stream slices defined by the stream slicer.
3. For each stream slice:
    1. Submit a request to the partner API as defined by the requester
    2. Select the records from the response
    3. Repeat for as long as the paginator points to a next page
![connector-flow](./assets/connector-flow.png)
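The read loop described above can be sketched with plain callables. This is a simplified model under assumptions, not the framework's code: `read_stream` and its parameters are hypothetical, and `FAKE_API` stands in for the partner API.

```python
from typing import Any, Callable, Iterable, Iterator, Mapping, Optional

Record = Mapping[str, Any]


def read_stream(
    slices: Iterable[Mapping[str, Any]],
    send_request: Callable[[Mapping[str, Any], Optional[str]], Mapping[str, Any]],
    select_records: Callable[[Mapping[str, Any]], Iterable[Record]],
    next_page_token: Callable[[Mapping[str, Any]], Optional[str]],
) -> Iterator[Record]:
    """Yield every record of every page of every stream slice."""
    for stream_slice in slices:
        page_token = None
        while True:
            # 1. Submit a request for the current slice and page.
            response = send_request(stream_slice, page_token)
            # 2. Select the records from the response.
            yield from select_records(response)
            # 3. Repeat for as long as there is a next page.
            page_token = next_page_token(response)
            if page_token is None:
                break


# Fake responses keyed by (slice, page token), standing in for a real API.
FAKE_API = {
    ("2023-01", None): {"data": [{"id": 1}, {"id": 2}], "next": "p2"},
    ("2023-01", "p2"): {"data": [{"id": 3}], "next": None},
    ("2023-02", None): {"data": [{"id": 4}], "next": None},
}

records = list(
    read_stream(
        slices=[{"month": "2023-01"}, {"month": "2023-02"}],
        send_request=lambda stream_slice, token: FAKE_API[(stream_slice["month"], token)],
        select_records=lambda response: response["data"],
        next_page_token=lambda response: response["next"],
    )
)
print([record["id"] for record in records])  # [1, 2, 3, 4]
```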
## More readings
- [Record selector](./understanding-the-yaml-file/record-selector.md)
- [Stream slicers](./understanding-the-yaml-file/stream-slicers.md)
- [Source schema](../../../airbyte-cdk/python/airbyte_cdk/sources/declarative/declarative_component_schema.yaml)