## What

Migrate the protocol messages to Pydantic V2 to speed up emitting records. This gives us a 2.5x boost over V1.

Closes https://github.com/airbytehq/airbyte-internal-issues/issues/8333

## How

- Switch to the protocol models generated for pydantic_v2, published in a new (temporary) package, `airbyte-protocol-models-pdv2`.
- Update the CDK's pydantic dependency to v2 accordingly.
- For minimal impact, keep using the `pydantic.v1` compatibility code in all airbyte-cdk pydantic code that does not interact with the protocol models.

## Review guide

1. Check out the code and clear your CDK virtual env (either `rm -rf .venv && python -m venv .venv` or `poetry env list; poetry env remove <env>`). This is necessary to fully clean out the `airbyte_protocol` library, for some reason. Then run `poetry lock --no-update && poetry install --all-extras`. This should install the CDK with the new models.
2. Run the unit tests on the CDK.
3. Take your favorite connector, point its `pyproject.toml` at the local CDK (see the example in `source-s3`), and try running its tests and its regression tests.

## User Impact

> [!warning]
> This is a major CDK change because of the pydantic dependency change: connectors that use pydantic 1.10 will break and will need similar `from pydantic.v1` updates to get running again. Therefore, we should release this as a major CDK version bump.

## Can this PR be safely reverted and rolled back?

- [x] YES 💚
- [ ] NO ❌

Even if sources migrate to this version, the state format should not change, so a revert should be possible.
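As a rough sketch of the `from pydantic.v1` update mentioned under User Impact, connector code that still relies on the V1 API could import it from the compatibility namespace. The model name and its fields below are hypothetical and only illustrate the import change; the fallback branch is an assumption for environments that still have pydantic 1.10 without the `pydantic.v1` shim.

```python
# Hypothetical connector model: the class and field names are illustrative only.
try:
    # With pydantic 2.x installed, the V1 API lives in the compatibility namespace.
    from pydantic.v1 import BaseModel, Field
except ImportError:
    # On pydantic 1.10 without the shim, the top-level package is already the V1 API.
    from pydantic import BaseModel, Field


class ExampleConnectorConfig(BaseModel):
    api_key: str = Field(..., description="Illustrative secret field")
    page_size: int = 100


# V1-style parsing and serialization keep working unchanged after the import swap.
config = ExampleConnectorConfig.parse_obj({"api_key": "abc", "page_size": "50"})
print(config.dict())
```

Only the import line changes here; V1 methods such as `parse_obj` and `dict` stay as they were, which is what keeps the impact on connector code small.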
## Follow up work - Ella to move into issues

<details>

### Source-s3 - turn this into an issue

- [ ] Update source s3 CDK version and any required code changes
- [ ] Fix source-s3 unit tests
- [ ] Run source-s3 regression tests
- [ ] Merge and release source-s3 by June 21st

### Docs

- [ ] Update documentation on how to build with CDK

### CDK pieces

- [ ] Update file-based CDK format validation to use Pydantic V2
  - This is doable, and requires a breaking change to change `OneOfOptionConfig`. There are a few unhandled test cases that present issues we're unsure of how to handle so far.
- [ ] Update low-code component generators to use Pydantic V2
  - This is doable, there are a few issues around custom component generation that are unhandled.

### Further CDK performance work - create issues for these

- [ ] Research if we can replace prints with buffered output (write to byte buffer and then flush to stdout)
- [ ] Replace `json` with `orjson`

...

</details>
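The buffered-output idea from the performance follow-ups could look roughly like the sketch below. It is not CDK code: `BufferedRecordWriter`, `emit`, and the 64 KiB threshold are hypothetical names and values chosen for illustration, and it uses the standard `json` module rather than `orjson`.

```python
import io
import json
import sys


class BufferedRecordWriter:
    """Collects serialized messages in memory and writes them out in batches."""

    def __init__(self, stream, max_bytes=64 * 1024):
        self._stream = stream          # binary stream, e.g. sys.stdout.buffer
        self._max_bytes = max_bytes    # flush once the buffer reaches this size
        self._buffer = io.BytesIO()

    def write(self, message: dict) -> None:
        # One JSON document per line, encoded once into the byte buffer.
        self._buffer.write(json.dumps(message).encode("utf-8"))
        self._buffer.write(b"\n")
        if self._buffer.tell() >= self._max_bytes:
            self.flush()

    def flush(self) -> None:
        data = self._buffer.getvalue()
        if data:
            self._stream.write(data)
            self._stream.flush()
        self._buffer = io.BytesIO()


def emit(messages, stream=None, max_bytes=64 * 1024):
    """Write all messages through the buffer and return how many were emitted."""
    target = stream if stream is not None else sys.stdout.buffer
    writer = BufferedRecordWriter(target, max_bytes)
    count = 0
    for message in messages:
        writer.write(message)
        count += 1
    writer.flush()  # make sure the tail of the buffer reaches the stream
    return count


if __name__ == "__main__":
    emit({"type": "RECORD", "index": i} for i in range(3))
```

Whether this beats plain `print` would need measuring on real syncs; the sketch only shows the shape of the change, with one write call per batch instead of one per record.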
# The earlier versions of airbyte-cdk (0.28.0<=) had the airbyte_protocol python classes
# declared inline in the airbyte-cdk code. However, somewhere around Feb 2023 the
# Airbyte Protocol moved to its own repo/PyPi package, called airbyte-protocol-models.
# This directory including the airbyte_protocol.py and well_known_types.py files
# are just wrappers on top of that stand-alone package which do some namespacing magic
# to make the airbyte_protocol python classes available to the airbyte-cdk consumer as part
# of airbyte-cdk rather than a standalone package.
from .airbyte_protocol import (
    AdvancedAuth,
    AirbyteAnalyticsTraceMessage,
    AirbyteCatalog,
    AirbyteConnectionStatus,
    AirbyteControlConnectorConfigMessage,
    AirbyteControlMessage,
    AirbyteErrorTraceMessage,
    AirbyteEstimateTraceMessage,
    AirbyteGlobalState,
    AirbyteLogMessage,
    AirbyteMessage,
    AirbyteProtocol,
    AirbyteRecordMessage,
    AirbyteStateBlob,
    AirbyteStateMessage,
    AirbyteStateType,
    AirbyteStream,
    AirbyteStreamState,
    AirbyteStreamStatus,
    AirbyteStreamStatusTraceMessage,
    AirbyteTraceMessage,
    AuthFlowType,
    ConfiguredAirbyteCatalog,
    ConfiguredAirbyteStream,
    ConnectorSpecification,
    DestinationSyncMode,
    EstimateType,
    FailureType,
    Level,
    OAuthConfigSpecification,
    OrchestratorType,
    Status,
    StreamDescriptor,
    SyncMode,
    TraceType,
    Type,
)
from .well_known_types import (
    BinaryData,
    Boolean,
    Date,
    Integer,
    Model,
    Number,
    String,
    TimestampWithoutTimezone,
    TimestampWithTimezone,
    TimeWithoutTimezone,
    TimeWithTimezone,
)