## What
Migrate the protocol message models to Pydantic V2 to speed up record emission. This gives us a 2.5x speedup over V1.
Closes https://github.com/airbytehq/airbyte-internal-issues/issues/8333
## How
- Switch to the protocol models generated for Pydantic V2, published in a new (temporary) package, `airbyte-protocol-models-pdv2`.
- Update the CDK's `pydantic` dependency to v2 accordingly.
- To minimize impact, keep using the `pydantic.v1` compatibility namespace in all airbyte-cdk Pydantic code that does not interact with the protocol models.
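
A minimal sketch of this split, assuming pydantic 2.x is installed: protocol-facing models use the native V2 API, while other CDK models keep importing from the `pydantic.v1` compatibility namespace. `RecordMessageSketch` and `SpecSketch` are hypothetical stand-ins, not the generated classes in `airbyte-protocol-models-pdv2` or actual CDK models.

```python
# Sketch assuming pydantic>=2 is installed; class names are hypothetical.
from typing import Any, Dict

from pydantic import BaseModel  # native Pydantic V2 API
from pydantic.v1 import BaseModel as V1BaseModel  # V1 compatibility namespace


class RecordMessageSketch(BaseModel):
    """Hypothetical stand-in for a protocol model generated for Pydantic V2."""

    stream: str
    data: Dict[str, Any]
    emitted_at: int


class SpecSketch(V1BaseModel):
    """Hypothetical CDK-internal model that stays on the pydantic.v1 API."""

    name: str
    enabled: bool = True


record = RecordMessageSketch(stream="users", data={"id": 1}, emitted_at=1700000000000)

# V2 models serialize with model_dump_json(); V1 models keep using .json().
record_json = record.model_dump_json()
spec_json = SpecSketch(name="example").json()
```

The two class hierarchies are independent, so code kept on `pydantic.v1` continues to use `.json()` and `.dict()`, while protocol models use the V2 methods such as `model_dump_json()`.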
## Review guide
1. Check out the code and clear your CDK virtual env (either `rm -rf .venv && python -m venv .venv` or `poetry env list; poetry env remove <env>`). For some reason, this is necessary to fully clean out the `airbyte_protocol` library. Then run `poetry lock --no-update && poetry install --all-extras`, which should install the CDK with the new models.
2. Run the CDK unit tests.
3. Take your favorite connector, point its `pyproject.toml` at the local CDK (see the example in `source-s3`), and run its unit tests and regression tests.
## User Impact
> [!warning]
> This is a major CDK change because of the pydantic dependency change: connectors that use pydantic 1.10 will break and will need similar `from pydantic.v1` import updates to run again. Therefore, we should release this as a major CDK version bump.
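
For a connector that imports pydantic directly, the update could look like the sketch below. It assumes the connector may need to run against both the old CDK (pydantic 1.x) and the new one (pydantic 2.x) during the transition, so it falls back to the top-level import when `pydantic.v1` is unavailable; `ExampleConfig` is a hypothetical model, not one from an actual connector.

```python
# Hypothetical connector code that works with pydantic 1.x or 2.x installed.
try:
    # pydantic 2.x exposes the V1 API under pydantic.v1
    from pydantic.v1 import BaseModel, Field
except ImportError:
    # older pydantic 1.x releases only have the top-level V1 API
    from pydantic import BaseModel, Field


class ExampleConfig(BaseModel):
    """Hypothetical connector config model written against the V1 API."""

    api_key: str = Field(..., description="API key used to authenticate")
    page_size: int = Field(100, description="Records requested per page")


# parse_obj and .dict() are V1-style methods, available through pydantic.v1.
config = ExampleConfig.parse_obj({"api_key": "secret"})
```

A connector that only ever runs on the new CDK can drop the fallback and import from `pydantic.v1` unconditionally.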
## Can this PR be safely reverted and rolled back?
- [x] YES 💚
- [ ] NO ❌
Even if sources migrate to this version, the state format should not change, so a revert should be possible.
## Follow-up work (Ella to move into issues)
<details>
### Source-s3 - turn this into an issue
- [ ] Update source s3 CDK version and any required code changes
- [ ] Fix source-s3 unit tests
- [ ] Run source-s3 regression tests
- [ ] Merge and release source-s3 by June 21st
### Docs
- [ ] Update documentation on how to build with CDK
### CDK pieces
- [ ] Update file-based CDK format validation to use Pydantic V2
  - This is doable but requires a breaking change to `OneOfOptionConfig`. A few test cases remain unhandled, and we are not yet sure how to resolve them.
- [ ] Update low-code component generators to use Pydantic V2
  - This is doable, but a few issues around custom component generation remain unhandled.
### Further CDK performance work - create issues for these
- [ ] Research if we can replace prints with buffered output (write to byte buffer and then flush to stdout)
- [ ] Replace `json` with `orjson`
...
</details>
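
The buffered-output idea from the follow-up list could look roughly like the sketch below: serialize each message, append the bytes to an in-memory buffer, and write batches to the output stream instead of calling `print` once per record. This is an assumption about the approach, not CDK code; `emit_buffered` and `flush_every` are hypothetical names, and messages are plain dicts rather than protocol models.

```python
import io
import json
from typing import Any, BinaryIO, Dict, Iterable


def emit_buffered(
    messages: Iterable[Dict[str, Any]],
    out: BinaryIO,
    flush_every: int = 1000,  # hypothetical batch size; would need tuning
) -> int:
    """Write newline-delimited JSON messages to `out`, batching writes.

    Returns the number of messages written.
    """
    buffer = io.BytesIO()
    pending = 0
    written = 0
    for message in messages:
        # Serialize into the in-memory byte buffer instead of printing.
        buffer.write(json.dumps(message).encode("utf-8"))
        buffer.write(b"\n")
        pending += 1
        written += 1
        if pending >= flush_every:
            # One write call for the whole batch, then reset the buffer.
            out.write(buffer.getvalue())
            buffer.seek(0)
            buffer.truncate()
            pending = 0
    if pending:
        out.write(buffer.getvalue())
    out.flush()
    return written
```

In a connector, `out` would presumably be `sys.stdout.buffer`. The batch size trades memory for fewer write calls; whether this beats per-record `print` would still need to be measured.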
* [ISSUE #26581] per partition cursor
* [ISSUE #26581] format
* [ISSUE #26581] clean up state management
* [ISSUE #26581] improving Hashabledict
* [ISSUE #26581] format cdk
* [ISSUE #26581] fix tests
* [ISSUE #26581] code review from girarda
* Retrigger pipeline
* Decouple cursor and stream slicer and pushing state management as far up cursor as possible
* Format cdk
* Small fixes/comments
* DatetimeBasedCursor should not update state based on slice (for now at least since it wasn't doing this before)
* [ISSUE #26581] code review
* Automated Commit - Formatting Changes
* [ISSUE #26581] validation overlapping keys
* [ISSUE #26581] add typing
* [ISSUE #26581] code review
* Remove SyncMode from stream_slices
* Removing SyncMode from stream_slices up until SimpleRetriever and fixing typing
* [ISSUE-26434] replacing Record primitive by class
* [ISSUE-26434] update Cursor.update_state to use new record object
* Issue 26343/data feed incremental sync solution 2 (#27481)
* TMP [ISSUE-26434] first solution to enable stop condition on pagination
* TMP [ISSUE-26434] second solution to enable stop condition on pagination
* TMP [ISSUE-26434] second solution fix
* [ISSUE #26343] fixing behavior and adding tests
* [ISSUE #26343] only updating state once a slice to allow for data feed
* [ISSUE #26343] removing freezing of cursor
* format cdk
* [ISSUE #26343] ensure data_feed doesn't have end_datetime
* [ISSUE #26343] self review
* [ISSUE #26343] code review
* [ISSUE #26343] code review clean up
* [ISSUE #26343] code review clean up
* Code review
* [ISSUE #26343] add warn log message in DatetimeBasedCursor
* format
* Format
* format cdk
* [ISSUE #19410] remove request_options_provider from the … (#21403)
* [ISSUE #19410] (incomplete) remove request_options_provider from the manifest
* [ISSUE #19410] (incomplete) incomplete cleanup config_component_schema.json as well
* [ISSUE #19410] update source-monday
* [ISSUE #19410] code review
* [ISSUE #19410] formatting files
* [Low-Code CDK] Replace the $options keyword with $parameters (#21632)
* refactor flows and tests to use parameters instead of options
* update documentation to reflect the change from options to parameters
* create migration script to replace options with parameters in existing manifests
* update template to use parameters instead of options
* fix tests after rebasing from the branch
* address pr feedback and extra uses of options that I missed
* additional changes needed after rebasing from master
* migrate low-code connectors to use parameters instead of options
* 🚨🚨 [Low Code CDK] Update `*ref` format to `#/` (#21434)
* [Low-Code CDK] Remove JsonSchema type in favor of JsonSchemaFileLoader (#21832)
* fully deprecate JsonSchema in favor of JsonFileSchemaLoader
* remove usage in the legacy registry
* Update migration scripts according to manifest file rename (#21920)
* Issue 21866 remove legacy factory and validation flow (#21878)
* [ISSUE #21866] clean ManifestDeclarativeSource validation
* [ISSUE #21866] remove dataclasses-jsonschema
* [ISSUE #21866] code review
* [ISSUE-21866] flake8
* [ISSUE #21559] remove DefaultPaginator.url_base (#21823)
* [ISSUE #21559] remove DefaultPaginator.url_base
* [ISSUE #21559] code review
* [ISSUE #21559] update migration script
* [ISSUE #21559] code review
* [ISSUE #21559] update documentation
* [ISSUE #21559] run migration (#21824)
* [ISSUE #21559] fix manifests
* [ISSUE #21926] setup server to allow for local tests (#21974)
* [Low Code CDK] remove checkpoint_interval from DeclarativeStream component (#22120)
* Issue #21576 rename dpathextractor fieldpointer (#21990)
* [ISSUE #21926] setup server to allow for local tests
* [ISSUE #21576] Rename DpathExtractor.field_pointer to field_path
* [ISSUE #21576] migration script
* [ISSUE #21576] update source-monday and source-pocket as well
* [ISSUE #21576] migration (#21997)
* [ISSUE #21576] code review
* Remove checkpoint_interval from source-prestashop manifest (#22141)
* replacing options with parameters for a few connectors I missed or were newly added
* [Low-Code CDK] Remove stream_cursor_field from stream and derive it from stream_slicer (#22294)
* update schema to derive cursor_field from a stream slicer if it exists
* remove usage of stream_cursor_field on simple connector use cases
* fixing some of the more complex usage of stream_cursor_field that rely on cartesian product stream slicers
* fix documentation to replace references to stream_cursor_field
* Low Code CDK: Remove `name` and `primary_key` from non-DeclarativeStream components (#21891)
* fix eslint issues for webapp (#22462)
* 🪟🔧 Connector Builder frontend fixes for low_code_cdk_to_beta (#22375)
* bump connector builder server to latest CDK version
* fix breaking CDK changes in connector builder FE
* [Low-Code CDK] Separate request path from RequestOption component (#22398)
* split apart path from RequestOption and fix usages and cleanup the code
* replace usage of path with RequestPath and get rid of default to RequestOption
* fix bug where stream_slice_field was used in outbound request instead of request_option field_name
* organize yaml schema names and update documentation for RequestOption and RequestPath
* clean up tests
* regenerate models
* [ISSUE #19961] refactor stream slices (#22225)
* [ISSUE #19961] add 'incremental' and partially remove CartesianProductStreamSlicer - Google PageSpeed Insights not working yet
* [ISSUE #19961] fixing Google PageSpeed Insights
* move incremental_sync field to the stream level and perform merging into one stream slicer at that level
* add tests to merging incremental and iterable into cartesian
* rewrite documentation to separate incremental sync and iterator concepts
* update documentation to use partition router and revise the tutorial to reflect the new changes to the components
* [ISSUE #19961] update code to newest CDK version and clean autogenerated files (#22670)
* [ISSUE #19961] rename stream_slicer to partition_router and update ma… (#22590)
* [ISSUE #19961] rename stream_slicer to partition_router and update manifests (for incremental_sync as well)
* [ISSUE 19961] rename CustomStreamSlicer (#22598)
* [ISSUE 19961] rename CustomStreamSlicer
* [ISSUE #19961] code review CustomStreamSlicer
* [ISSUE #19961] fix source_square incremental sync
* [ISSUE #19961] rename SingleSlice to SinglePartitionRouter (#22591)
* [ISSUE #19961] rename SingleSlice to SinglePartitionRouter
* remove SinglePartitionRouter from the schema
---------
Co-authored-by: brianjlai <brian.lai@airbyte.io>
* [ISSUE #19961] rename SubstreamSlicer to SubstreamPartitionRouter (#22596)
* [ISSUE #19961] TMP rename SubstreamSlicer to SubstreamPartitionRouter
* [ISSUE #19961] revert DatetimeStreamSlicer.stream_state_field_start and DatetimeStreamSlicer.stream_state_field_end
* [ISSUE #19961] rename ListStreamSlicer to ListPartitionRouter (#22593)
---------
Co-authored-by: brianjlai <brian.lai@airbyte.io>
* [ISSUE #19961] clean faulty merge
* [ISSUE #19961] rename DatetimeStreamSlicer (#22617)
* [ISSUE #19961] rename DatetimeStreamSlicer
* Update docs/connector-development/config-based/understanding-the-yaml-file/partition-router.md
Co-authored-by: Maxime Carbonneau-Leclerc <maxi297@users.noreply.github.com>
* Update docs/connector-development/config-based/understanding-the-yaml-file/partition-router.md
Co-authored-by: Maxime Carbonneau-Leclerc <maxi297@users.noreply.github.com>
* Update docs/connector-development/config-based/understanding-the-yaml-file/yaml-overview.md
Co-authored-by: Maxime Carbonneau-Leclerc <maxi297@users.noreply.github.com>
* Update docs/connector-development/config-based/understanding-the-yaml-file/partition-router.md
Co-authored-by: Maxime Carbonneau-Leclerc <maxi297@users.noreply.github.com>
* Update docs/connector-development/config-based/understanding-the-yaml-file/partition-router.md
Co-authored-by: Maxime Carbonneau-Leclerc <maxi297@users.noreply.github.com>
* Update docs/connector-development/config-based/understanding-the-yaml-file/partition-router.md
Co-authored-by: Maxime Carbonneau-Leclerc <maxi297@users.noreply.github.com>
* Update docs/connector-development/config-based/understanding-the-yaml-file/incremental-syncs.md
Co-authored-by: Maxime Carbonneau-Leclerc <maxi297@users.noreply.github.com>
* update docs
* [ISSUE #19961] clean unit tests files
* [ISSUE #19961] code review
---------
Co-authored-by: brianjlai <brian.lai@airbyte.io>
Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com>
* [Low-Code CDK] Allow for children of custom components to specify parameters that are normally derived (#22379)
* Fix a bug where child components of a custom component cannot receive fields from other components
* add tests, documentation and commenting
* fix test from merge
* add better error message for nested initialization failures
* 🪟🔧 Connector Builder frontend fixes for low_code_cdk_to_beta (#22880)
* restrict name to stream level
* remove checkpoint interval
* adjust logic for new request options
* refactor slicers
* wording
* review comments
* make oldest supported version explicit
* separate the frontend and connector builder changes from the low-code to beta release
* [Low-Code CDK] Add script to run low code unit tests and address issues with a few connectors (#23123)
* consolidate all the changes into a new PR after I messed up the merge on the side branch
* add set to allow this to be called externally if necessary later
* remove last few extra fields I found and fix docs links
* fix docs one more time
---------
Co-authored-by: Maxime Carbonneau-Leclerc <maxi297@users.noreply.github.com>
Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com>
Co-authored-by: maxi297 <maxime@airbyte.io>
Co-authored-by: Lake Mossman <lake@airbyte.io>
Co-authored-by: Joe Reuter <joe@airbyte.io>