airbyte/airbyte-cdk/python/airbyte_cdk/sources/declarative/spec/spec.py
Ella Rohm-Ensing fc12432305 airbyte-cdk: only update airbyte-protocol-models to pydantic v2 (#39524)
## What

Migrating the protocol messages to Pydantic V2 to speed up emitting records. This gives us a 2.5x boost over V1.

Close https://github.com/airbytehq/airbyte-internal-issues/issues/8333
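
For a sense of where the gain comes from, a hypothetical micro-benchmark along these lines (the model and payload are made up for illustration; this is not the benchmark behind the number above):

```python
import timeit

from pydantic import BaseModel  # pydantic v2 API
from pydantic.v1 import BaseModel as BaseModelV1  # v1 compatibility layer


class RecordV1(BaseModelV1):
    stream: str
    data: dict
    emitted_at: int


class RecordV2(BaseModel):
    stream: str
    data: dict
    emitted_at: int


payload = {"stream": "users", "data": {"id": 1, "name": "ada"}, "emitted_at": 0}

# Compare v1 .json() with v2 .model_dump_json() for repeated record serialization.
v1_time = timeit.timeit(lambda: RecordV1(**payload).json(), number=50_000)
v2_time = timeit.timeit(lambda: RecordV2(**payload).model_dump_json(), number=50_000)
print(f"v1: {v1_time:.2f}s  v2: {v2_time:.2f}s")
```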

## How
- Switch to using protocol models generated for pydantic_v2, in a new (temporary) package, `airbyte-protocol-models-pdv2`.
- Update the CDK's pydantic dependency to v2 accordingly.
- For minimal impact, keep using the `pydantic.v1` compatibility layer in all airbyte-cdk pydantic code that does not interact with the protocol models (see the sketch after this list).
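
For illustration, a minimal sketch of how the two styles coexist after this change; the re-export path and the internal model name are assumptions for the example, not code from this PR:

```python
# Protocol models are now generated for pydantic v2 and re-exported by the CDK.
from airbyte_cdk.models import AirbyteMessage, AirbyteRecordMessage, Type

# Internal CDK models keep their v1-style code via pydantic's compatibility layer.
from pydantic.v1 import BaseModel, Field


class PaginationOptions(BaseModel):  # hypothetical internal model, unchanged v1-style code
    page_size: int = Field(default=100)


message = AirbyteMessage(
    type=Type.RECORD,
    record=AirbyteRecordMessage(stream="users", data={"id": 1}, emitted_at=0),
)
# v2-style serialization on the protocol models is what buys the speedup.
print(message.model_dump_json(exclude_none=True))
```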

## Review guide
1. Check out the code and clear your CDK virtual env (either `rm -rf .venv && python -m venv .venv` or `poetry env list; poetry env remove <env>`). This is necessary to fully clean out the `airbyte_protocol` library, for some reason. Then run `poetry lock --no-update && poetry install --all-extras`. This should install the CDK with the new models.
2. Run the unit tests on the CDK.
3. Take your favorite connector, point its `pyproject.toml` at the local CDK (see the example in `source-s3`), and try running its unit tests and regression tests.

## User Impact

> [!warning]
> This is a major CDK change due to the pydantic dependency change: if a connector still uses pydantic 1.10, it will break and will need similar `from pydantic.v1` updates to get running again (see the sketch below). Therefore, we should release this as a major CDK version bump.
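
For a connector that still has pydantic 1.10-style code, the fix is usually just the import path; `SourceConfig` and its validator here are a made-up example, not from a real connector:

```python
# Before (breaks once the CDK pins pydantic v2):
#   from pydantic import BaseModel, validator

# After: keep the v1 API through pydantic's compatibility layer.
from pydantic.v1 import BaseModel, validator


class SourceConfig(BaseModel):  # hypothetical connector config model
    start_date: str

    @validator("start_date")
    def check_start_date(cls, value: str) -> str:
        if not value:
            raise ValueError("start_date must not be empty")
        return value
```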

## Can this PR be safely reverted and rolled back?
- [x] YES 💚
- [ ] NO 

Even if sources migrate to this version, the state format should not change, so a revert should be possible.

## Follow-up work - Ella to move into issues

<details>

### Source-s3 - turn this into an issue
- [ ] Update source s3 CDK version and any required code changes
- [ ] Fix source-s3 unit tests
- [ ] Run source-s3 regression tests
- [ ] Merge and release source-s3 by June 21st

### Docs
- [ ] Update documentation on how to build with the CDK

### CDK pieces
- [ ] Update file-based CDK format validation to use Pydantic V2
  - This is doable, but requires a breaking change to `OneOfOptionConfig`. There are a few unhandled test cases that present issues we are not yet sure how to handle.
- [ ] Update low-code component generators to use Pydantic V2
  - This is doable; there are a few unhandled issues around custom component generation.

### Further CDK performance work - create issues for these
- [ ] Research whether we can replace prints with buffered output (write to a byte buffer, then flush to stdout)
- [ ] Replace `json` with `orjson` (rough sketch of both ideas below)
...
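
A rough sketch of what the buffered-output plus `orjson` idea could look like; this is not implemented in this PR, and `emit_messages` is an illustrative name only:

```python
import sys
from typing import Iterable, Mapping

import orjson


def emit_messages(messages: Iterable[Mapping]) -> None:
    """Serialize already-dict-shaped messages with orjson and write them to stdout's
    binary buffer, flushing once at the end instead of printing line by line."""
    buffer = sys.stdout.buffer
    for message in messages:
        buffer.write(orjson.dumps(message))
        buffer.write(b"\n")
    buffer.flush()


# Usage sketch:
emit_messages([{"type": "RECORD", "record": {"stream": "users", "data": {"id": 1}}}])
```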

</details>

#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#

from dataclasses import InitVar, dataclass
from typing import Any, Mapping, Optional

from airbyte_cdk.models.airbyte_protocol import AdvancedAuth, ConnectorSpecification  # type: ignore [attr-defined]
from airbyte_cdk.sources.declarative.models.declarative_component_schema import AuthFlow


@dataclass
class Spec:
    """
    Returns a connection specification made up of information about the connector and how it can be configured

    Attributes:
        connection_specification (Mapping[str, Any]): information related to how a connector can be configured
        documentation_url (Optional[str]): The link to the Airbyte documentation about this connector
    """

    connection_specification: Mapping[str, Any]
    parameters: InitVar[Mapping[str, Any]]
    documentation_url: Optional[str] = None
    advanced_auth: Optional[AuthFlow] = None

    def generate_spec(self) -> ConnectorSpecification:
        """
        Returns the connector specification according to the spec block defined in the low-code connector manifest.
        """
        obj: dict[str, Mapping[str, Any] | str | AdvancedAuth] = {"connectionSpecification": self.connection_specification}

        if self.documentation_url:
            obj["documentationUrl"] = self.documentation_url
        if self.advanced_auth:
            self.advanced_auth.auth_flow_type = self.advanced_auth.auth_flow_type.value  # type: ignore # We know this is always assigned to an AuthFlow which has the auth_flow_type field
            # Map CDK AuthFlow model to protocol AdvancedAuth model
            obj["advanced_auth"] = AdvancedAuth.parse_obj(self.advanced_auth.dict())

        # We remap these keys to camel case because that's the existing format expected by the rest of the platform
        return ConnectorSpecification.parse_obj(obj)
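
For reference, a minimal usage sketch of this class; the connection specification values below are made up:

```python
spec = Spec(
    connection_specification={
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {"api_key": {"type": "string", "airbyte_secret": True}},
    },
    parameters={},
    documentation_url="https://docs.airbyte.com/integrations/sources/example",
)
connector_spec = spec.generate_spec()
print(connector_spec.connectionSpecification)
```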