## What

Migrate to Pydantic V2 for protocol messages to speed up emitting records. This gives us a 2.5x boost over V1.

Closes https://github.com/airbytehq/airbyte-internal-issues/issues/8333

## How

- Switch to the protocol models generated for pydantic_v2, which live in a new (temporary) package, `airbyte-protocol-models-pdv2`.
- Update the CDK's pydantic dependency to v2 accordingly.
- For minimal impact, keep using the `pydantic.v1` compatibility code in all airbyte-cdk pydantic code that does not interact with the protocol models.

## Review guide

1. Check out the code and clear your CDK virtual env (either `rm -rf .venv && python -m venv .venv` or `poetry env list; poetry env remove <env>`). This is necessary to fully clean out the `airbyte_protocol` library, for some reason. Then run `poetry lock --no-update && poetry install --all-extras`. This should install the CDK with the new models.
2. Run the unit tests on the CDK.
3. Take your favorite connector, point its `pyproject.toml` at the local CDK (see the example in `source-s3`), and try running its tests and its regression tests.

## User Impact

> [!warning]
> This is a major CDK change due to the pydantic dependency change. Connectors that use pydantic 1.10 will break and will need similar `from pydantic.v1` updates to get running again. Therefore, we should release this as a major CDK version bump.

## Can this PR be safely reverted and rolled back?

- [x] YES 💚
- [ ] NO ❌

Even if sources migrate to this version, the state format should not change, so a revert should be possible.
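The `from pydantic.v1` update mentioned under User Impact is mostly mechanical, so a connector author could script a first pass. The following is a minimal sketch, not part of the CDK: `rewrite_pydantic_imports` is a hypothetical helper that uses a naive regex over source text, so its output should still be reviewed by hand.

```python
import re

# "from pydantic import ..." or "from pydantic.<submodule> import ...",
# skipping lines that already go through pydantic.v1 and unrelated
# packages such as pydantic_core.
_FROM_IMPORT = re.compile(r"^([ \t]*from[ \t]+pydantic)(?!\.v1\b)(?=[.\s])", re.MULTILINE)
# A bare "import pydantic" line.
_PLAIN_IMPORT = re.compile(r"^([ \t]*)import[ \t]+pydantic[ \t]*$", re.MULTILINE)


def rewrite_pydantic_imports(source: str) -> str:
    """Point pydantic imports at the pydantic.v1 compatibility namespace.

    Hypothetical helper for illustration only; it does not parse Python,
    so imports inside strings or comments would be rewritten as well.
    """
    source = _FROM_IMPORT.sub(r"\1.v1", source)
    return _PLAIN_IMPORT.sub(r"\1import pydantic.v1 as pydantic", source)
```

Running the helper twice produces the same result as running it once, so it can be reapplied after a partial manual migration.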
## Follow up work - Ella to move into issues

<details>

### Source-s3 - turn this into an issue

- [ ] Update source s3 CDK version and any required code changes
- [ ] Fix source-s3 unit tests
- [ ] Run source-s3 regression tests
- [ ] Merge and release source-s3 by June 21st

### Docs

- [ ] Update documentation on how to build with CDK

### CDK pieces

- [ ] Update file-based CDK format validation to use Pydantic V2
  - This is doable, and requires a breaking change to `OneOfOptionConfig`. A few unhandled test cases present issues we're not yet sure how to handle.
- [ ] Update low-code component generators to use Pydantic V2
  - This is doable; a few issues around custom component generation are still unhandled.

### Further CDK performance work - create issues for these

- [ ] Research if we can replace prints with buffered output (write to byte buffer and then flush to stdout)
- [ ] Replace `json` with `orjson`

...

</details>
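To make the buffered-output item above concrete, here is a rough sketch of the idea: serialize records into an in-memory byte buffer and write to the output stream in batches instead of once per record. `emit_records` and `flush_every` are hypothetical names chosen for illustration and do not exist in the CDK. The standard-library `json` module stands in for the serializer, since replacing it with `orjson` is tracked as a separate item.

```python
import io
import json
from typing import Any, BinaryIO, Iterable, Mapping


def emit_records(
    records: Iterable[Mapping[str, Any]],
    out: BinaryIO,
    flush_every: int = 1000,
) -> int:
    """Write records as newline-delimited JSON, flushing in batches.

    Hypothetical helper: `flush_every` is an assumed tuning knob, not a
    CDK setting. Returns the number of records written.
    """
    buffer = io.BytesIO()
    pending = 0
    total = 0
    for record in records:
        buffer.write(json.dumps(record, separators=(",", ":")).encode("utf-8"))
        buffer.write(b"\n")
        pending += 1
        total += 1
        if pending >= flush_every:
            out.write(buffer.getvalue())
            out.flush()
            # Reset the buffer so it can be reused for the next batch.
            buffer.seek(0)
            buffer.truncate()
            pending = 0
    if pending:
        # Write whatever is left after the last full batch.
        out.write(buffer.getvalue())
        out.flush()
    return total
```

In a connector this would be called with `sys.stdout.buffer` as `out`. Whether batching helps in practice is exactly what the research item is meant to find out.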
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#

from typing import List, Union

from airbyte_cdk.sources.config import BaseConfig
from pydantic.v1 import BaseModel, Field


class InnerClass(BaseModel):
    field1: str
    field2: int


class Choice1(BaseModel):
    selected_strategy = Field("option1", const=True)

    name: str
    count: int


class Choice2(BaseModel):
    selected_strategy = Field("option2", const=True)

    sequence: List[str]


class SomeSourceConfig(BaseConfig):
    class Config:
        title = "Some Source"

    items: List[InnerClass]
    choice: Union[Choice1, Choice2]


class TestBaseConfig:
    EXPECTED_SCHEMA = {
        "properties": {
            "choice": {
                "oneOf": [
                    {
                        "properties": {
                            "count": {"title": "Count", "type": "integer"},
                            "name": {"title": "Name", "type": "string"},
                            "selected_strategy": {
                                "const": "option1",
                                "title": "Selected Strategy",
                                "type": "string",
                                "default": "option1",
                            },
                        },
                        "required": ["name", "count"],
                        "title": "Choice1",
                        "type": "object",
                    },
                    {
                        "properties": {
                            "selected_strategy": {
                                "const": "option2",
                                "title": "Selected Strategy",
                                "type": "string",
                                "default": "option2",
                            },
                            "sequence": {"items": {"type": "string"}, "title": "Sequence", "type": "array"},
                        },
                        "required": ["sequence"],
                        "title": "Choice2",
                        "type": "object",
                    },
                ],
                "title": "Choice",
            },
            "items": {
                "items": {
                    "properties": {"field1": {"title": "Field1", "type": "string"}, "field2": {"title": "Field2", "type": "integer"}},
                    "required": ["field1", "field2"],
                    "title": "InnerClass",
                    "type": "object",
                },
                "title": "Items",
                "type": "array",
            },
        },
        "required": ["items", "choice"],
        "title": "Some Source",
        "type": "object",
    }

    def test_schema_postprocessing(self):
        schema = SomeSourceConfig.schema()
        assert schema == self.EXPECTED_SCHEMA
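
In the expected schema in this test, every `oneOf` branch carries a `selected_strategy` property with a `const` value. When reviewing schema changes during the migration, a small standalone check can list those discriminators. This is a sketch only: `one_of_discriminators` is a hypothetical helper, not part of the CDK or of this test file, and it works on plain dictionaries so it does not depend on any pydantic version.

```python
from typing import Any, Dict, List


def one_of_discriminators(prop_schema: Dict[str, Any], key: str = "selected_strategy") -> List[str]:
    """Return the `const` value of `key` for every branch of a `oneOf` property.

    Hypothetical helper for illustration. Raises ValueError if a branch
    has no `const` for `key`.
    """
    values = []
    for branch in prop_schema.get("oneOf", []):
        field = branch.get("properties", {}).get(key, {})
        if "const" not in field:
            raise ValueError(f"{branch.get('title', '<untitled>')} has no const {key!r}")
        values.append(field["const"])
    return values
```

Applied to the `choice` property of a schema shaped like `EXPECTED_SCHEMA`, it returns the branch constants in declaration order.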