## What

Migrating to Pydantic V2 for protocol messages to speed up emitting records. This gives us a 2.5x boost over V1.

Closes https://github.com/airbytehq/airbyte-internal-issues/issues/8333

## How

- Switch to using protocol models generated for pydantic_v2, in a new (temporary) package, `airbyte-protocol-models-pdv2`.
- Update the CDK's pydantic dependency to v2 accordingly.
- For minimal impact, keep using the `pydantic.v1` compatibility code in all airbyte-cdk pydantic code that does not interact with the protocol models.

## Review guide

1. Check out the code and clear your CDK virtual env (either `rm -rf .venv && python -m venv .venv` or `poetry env list; poetry env remove <env>`). This is necessary to fully clean out the `airbyte_protocol` library, for some reason. Then: `poetry lock --no-update && poetry install --all-extras`. This should install the CDK with the new models.
2. Run unit tests on the CDK.
3. Take your favorite connector, point its `pyproject.toml` at the local CDK (see the example in `source-s3`), and try running its tests and its regression tests.

## User Impact

> [!warning]
> This is a major CDK change due to the pydantic dependency change. Connectors that use pydantic 1.10 will break and will need similar `from pydantic.v1` updates to get running again. Therefore, we should release this as a major CDK version bump.

## Can this PR be safely reverted and rolled back?

- [x] YES 💚
- [ ] NO ❌

Even if sources migrate to this version, the state format should not change, so a revert should be possible.

## Follow up work - Ella to move into issues

<details>

### Source-s3 - turn this into an issue

- [ ] Update source s3 CDK version and any required code changes
- [ ] Fix source-s3 unit tests
- [ ] Run source-s3 regression tests
- [ ] Merge and release source-s3 by June 21st

### Docs

- [ ] Update documentation on how to build with CDK

### CDK pieces

- [ ] Update file-based CDK format validation to use Pydantic V2
  - This is doable, and requires a breaking change to change `OneOfOptionConfig`. There are a few unhandled test cases that present issues we're unsure of how to handle so far.
- [ ] Update low-code component generators to use Pydantic V2
  - This is doable, there are a few issues around custom component generation that are unhandled.

### Further CDK performance work - create issues for these

- [ ] Research if we can replace prints with buffered output (write to byte buffer and then flush to stdout)
- [ ] Replace `json` with `orjson`

...

</details>
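
The buffered-output idea listed under the performance follow-ups could look roughly like the sketch below. It is only an illustration of "write to a byte buffer, then flush to stdout" using the standard library: the function name `emit_records` and the `flush_every` parameter are hypothetical and do not exist in the CDK, and no performance gain is claimed or measured here.

```python
import io
import json
import sys
from typing import Any, BinaryIO, Dict, Iterable


def emit_records(records: Iterable[Dict[str, Any]], out: BinaryIO, flush_every: int = 1000) -> int:
    """Serialize records as JSON lines into a byte buffer and write them out in batches.

    Hypothetical helper: `out` is any binary stream, e.g. sys.stdout.buffer.
    Returns the number of records written.
    """
    buffer = io.BytesIO()
    pending = 0
    total = 0
    for record in records:
        buffer.write(json.dumps(record).encode("utf-8"))
        buffer.write(b"\n")
        pending += 1
        total += 1
        if pending >= flush_every:
            # Hand the whole batch to the stream in one call, then reset the buffer.
            out.write(buffer.getvalue())
            buffer.seek(0)
            buffer.truncate()
            pending = 0
    if pending:
        # Write whatever is left over from the last partial batch.
        out.write(buffer.getvalue())
    out.flush()
    return total


if __name__ == "__main__":
    emit_records(({"id": i} for i in range(3)), sys.stdout.buffer)
```

Swapping `json.dumps` for another serializer would only change the line that encodes each record, so the two follow-up items can be explored independently.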
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#

from typing import Any, Dict, Optional, Type

from airbyte_cdk.sources.utils.schema_helpers import expand_refs
from pydantic.v1 import BaseModel, Extra
from pydantic.v1.main import ModelMetaclass
from pydantic.v1.typing import resolve_annotations


class AllOptional(ModelMetaclass):
    """
    Metaclass for marking all Pydantic model fields as Optional.
    Here is an example of declaring a model using this metaclass:
    '''
            class MyModel(BaseModel, metaclass=AllOptional):
                a: str
                b: str
    '''
    It is equivalent to:
    '''
            class MyModel(BaseModel):
                a: Optional[str]
                b: Optional[str]
    '''
    This makes the code clearer and eliminates a lot of manual work.
    """

    def __new__(mcs, name, bases, namespaces, **kwargs):  # type: ignore[no-untyped-def] # super().__new__ is also untyped
        """
        Iterate through fields and wrap them with the typing.Optional type.
        """
        annotations = resolve_annotations(namespaces.get("__annotations__", {}), namespaces.get("__module__", None))
        for base in bases:
            annotations = {**annotations, **getattr(base, "__annotations__", {})}
        for field in annotations:
            if not field.startswith("__"):
                annotations[field] = Optional[annotations[field]]  # type: ignore[assignment]
        namespaces["__annotations__"] = annotations
        return super().__new__(mcs, name, bases, namespaces, **kwargs)


class BaseSchemaModel(BaseModel):
    """
    Base class for all schema models. It has some extra schema postprocessing.
    Can be used in combination with the AllOptional metaclass.
    """

    class Config:
        extra = Extra.allow

        @classmethod
        def schema_extra(cls, schema: Dict[str, Any], model: Type[BaseModel]) -> None:
            """Modify generated jsonschema, remove "title", "description" and "required" fields.

            Pydantic doesn't treat the Union[None, Any] type correctly when generating jsonschema,
            so we can't set a field as nullable (i.e. a field that can have either null or non-null values).
            We generate this jsonschema value manually.

            :param schema: generated jsonschema
            :param model:
            """
            schema.pop("title", None)
            schema.pop("description", None)
            schema.pop("required", None)
            for name, prop in schema.get("properties", {}).items():
                prop.pop("title", None)
                prop.pop("description", None)
                allow_none = model.__fields__[name].allow_none
                if allow_none:
                    if "type" in prop:
                        prop["type"] = ["null", prop["type"]]
                    elif "$ref" in prop:
                        ref = prop.pop("$ref")
                        prop["oneOf"] = [{"type": "null"}, {"$ref": ref}]

    @classmethod
    def schema(cls, *args: Any, **kwargs: Any) -> Dict[str, Any]:
        """We're overriding the schema classmethod to enable some post-processing"""
        schema = super().schema(*args, **kwargs)
        expand_refs(schema)
        return schema  # type: ignore[no-any-return]
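
To make the post-processing in `schema_extra` easier to follow, here is a minimal standalone sketch of the same transformation applied to a plain dictionary, with no pydantic dependency. The function name `mark_nullable` and the `nullable_fields` argument are hypothetical stand-ins for the `model.__fields__[name].allow_none` lookup above; this is an illustration, not part of the CDK.

```python
from typing import Any, Dict, Set


def mark_nullable(schema: Dict[str, Any], nullable_fields: Set[str]) -> Dict[str, Any]:
    """Strip title/description/required and mark the given properties as nullable.

    Mirrors the logic of BaseSchemaModel.Config.schema_extra on a plain dict.
    The schema is modified in place and also returned for convenience.
    """
    schema.pop("title", None)
    schema.pop("description", None)
    schema.pop("required", None)
    for name, prop in schema.get("properties", {}).items():
        prop.pop("title", None)
        prop.pop("description", None)
        if name in nullable_fields:
            if "type" in prop:
                # Simple typed property: allow null alongside the declared type.
                prop["type"] = ["null", prop["type"]]
            elif "$ref" in prop:
                # Referenced property: wrap the reference in a oneOf with null.
                ref = prop.pop("$ref")
                prop["oneOf"] = [{"type": "null"}, {"$ref": ref}]
    return schema


example = {
    "title": "MyModel",
    "required": ["a"],
    "properties": {
        "a": {"title": "A", "type": "string"},
        "b": {"$ref": "#/definitions/B"},
        "c": {"type": "integer"},
    },
}
result = mark_nullable(example, {"a", "b"})
```

In this example, `a` ends up with type `["null", "string"]`, `b` becomes a `oneOf` over null and the original reference, and `c` is left as a plain integer because it is not in `nullable_fields`.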