## What
Migrate the protocol messages to Pydantic V2 to speed up emitting records. This gives us a 2.5x boost over V1.
Close https://github.com/airbytehq/airbyte-internal-issues/issues/8333
## How
- Switch to the protocol models generated for pydantic_v2, published in a new (temporary) package, `airbyte-protocol-models-pdv2`.
- Update the CDK's pydantic dependency to v2 accordingly.
- For minimal impact, keep using the `pydantic.v1` compatibility namespace in all airbyte-cdk pydantic code that does not interact with the protocol models.
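
The compatibility import from the last bullet can be sketched roughly as follows. This is a minimal illustration rather than code from the CDK: `LegacyConfig` and `parse_config` are hypothetical names, and the fallback branch only exists so the snippet also runs where an older Pydantic 1.x without the `v1` namespace is installed.

```python
# Pydantic 2.x ships the 1.x API under the `pydantic.v1` namespace.
try:
    from pydantic.v1 import BaseModel, ValidationError
except ImportError:  # older Pydantic 1.x releases without the `v1` namespace
    from pydantic import BaseModel, ValidationError


class LegacyConfig(BaseModel):
    """Hypothetical V1-style model that never touches the protocol models."""

    start_date: str
    page_size: int = 100


def parse_config(raw: dict) -> LegacyConfig:
    # `parse_obj` is V1-style API and keeps working through the compat namespace.
    return LegacyConfig.parse_obj(raw)


config = parse_config({"start_date": "2024-01-01", "page_size": "50"})
```

Only the import line changes for existing V1-style models; the model body and V1 methods such as `parse_obj` and `dict()` stay as they were.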
## Review guide
1. Check out the code and clear your CDK virtual env (either `rm -rf .venv && python -m venv .venv` or `poetry env list; poetry env remove <env>`). This is necessary to fully clean out the `airbyte_protocol` library, for some reason. Then run `poetry lock --no-update && poetry install --all-extras`. This should install the CDK with the new models.
2. Run unit tests on the CDK
3. Take your favorite connector, point its `pyproject.toml` to the local CDK (see the example in `source-s3`), and try running its tests and its regression tests.
## User Impact
> [!warning]
> This is a major CDK change due to the pydantic dependency change - if connectors use pydantic 1.10, they will break and will need to do similar `from pydantic.v1` updates to get running again. Therefore, we should release this as a major CDK version bump.
## Can this PR be safely reverted and rolled back?
- [x] YES 💚
- [ ] NO ❌
Even if sources migrate to this version, state format should not change, so a revert should be possible.
## Follow up work - Ella to move into issues
<details>
### Source-s3 - turn this into an issue
- [ ] Update source s3 CDK version and any required code changes
- [ ] Fix source-s3 unit tests
- [ ] Run source-s3 regression tests
- [ ] Merge and release source-s3 by June 21st
### Docs
- [ ] Update documentation on how to build with CDK
### CDK pieces
- [ ] Update file-based CDK format validation to use Pydantic V2
- This is doable, but requires a breaking change to `OneOfOptionConfig`. A few unhandled test cases raise issues we are not yet sure how to handle.
- [ ] Update low-code component generators to use Pydantic V2
- This is doable, but a few issues around custom component generation are still unhandled.
### Further CDK performance work - create issues for these
- [ ] Research if we can replace prints with buffered output (write to byte buffer and then flush to stdout)
- [ ] Replace `json` with `orjson`
...
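
For the buffered-output research item above, one possible shape is sketched below. It is an assumption about what the experiment might look like, not the CDK's implementation: `emit_buffered` and `flush_every` are hypothetical names, and plain `json` is used for serialization.

```python
import io
import json
import sys
from typing import Any, BinaryIO, Iterable, Mapping, Optional


def emit_buffered(
    records: Iterable[Mapping[str, Any]],
    out: Optional[BinaryIO] = None,
    flush_every: int = 1000,
) -> int:
    """Write records as JSON lines, batching writes through a byte buffer."""
    # Resolve stdout lazily so the function can be redirected in tests.
    sink = out if out is not None else sys.stdout.buffer
    buffer = io.BytesIO()
    pending = 0
    total = 0
    for record in records:
        buffer.write(json.dumps(record).encode("utf-8"))
        buffer.write(b"\n")
        pending += 1
        total += 1
        if pending >= flush_every:
            # Hand the whole batch to the sink in one write, then reset the buffer.
            sink.write(buffer.getvalue())
            buffer.seek(0)
            buffer.truncate()
            pending = 0
    if pending:
        sink.write(buffer.getvalue())
    sink.flush()
    return total
```

Calling `emit_buffered(records)` writes to stdout; passing an `io.BytesIO()` as `out` makes it easy to compare the output against per-record `print` calls when measuring.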
</details>
* relax pydantic dep
* Automated Commit - Format and Process Resources Changes
* update protocol models
* format change
---------
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
* Try running only on modified files
* make a change
* return something with the wrong type
* Revert "return something with the wrong type"
This reverts commit 23b828371e.
* fix typing in file-based
* format
* Mypy
* fix
* leave as Mapping
* Revert "leave as Mapping"
This reverts commit 908f063f70.
* Use Dict
* update
* move dict()
* Revert "move dict()"
This reverts commit fa347a8236.
* Revert "Revert "move dict()""
This reverts commit c9237df2e4.
* Revert "Revert "Revert "move dict()"""
This reverts commit 5ac1616414.
* use Mapping
* point to config file
* comment
* strict = False
* remove --
* Revert "comment"
This reverts commit 6000814a82.
* install types
* install types in same command as mypy runs
* non-interactive
* freeze version
* pydantic plugin
* plugins
* update
* ignore missing import
* Revert "ignore missing import"
This reverts commit 1da7930fb7.
* Install pydantic instead
* fix
* this passes locally
* strict = true
* format
* explicitly import models
* Update
* remove old mypy.ini config
* temporarily disable mypy
* format
* any
* format
* fix tests
* format
* Automated Commit - Formatting Changes
* Revert "temporarily disable mypy"
This reverts commit eb8470fa3f.
* implicit reexport
* update test
* fix mypy
* Automated Commit - Formatting Changes
* fix some errors in tests
* more type fixes
* more fixes
* more
* .
* done with tests
* fix last files
* format
* Update gradle
* change source-stripe
* only run mypy on cdk
* remove strict
* Add more rules
* update
* ignore missing imports
* cast to string
* Allow untyped decorator
* reset to master
* move to the cdk
* derp
* move explicit imports around
* Automated Commit - Formatting Changes
* Revert "move explicit imports around"
This reverts commit 56e306b72f.
* move explicit imports around
* Upgrade mypy version
* point to config file
* Update readme
* Ignore errors in the models module
* Automated Commit - Formatting Changes
* move check to gradle build
* Any
* try checking out master too
* Revert "try checking out master too"
This reverts commit 8a8f3e373c.
* fetch master
* install mypy
* try without origin
* fetch from the script
* checkout master
* ls the branches
* remotes/origin/master
* remove some cruft
* comment
* remove pydantic types
* unpin mypy
* fetch from the script
* Update connectors base too
* modify a non-cdk file to confirm it doesn't get checked by mypy
* run mypy after generateComponentManifestClassFiles
* run from the venv
* pass files as arguments
* update
* fix when running without args
* with subdir
* path
* try without /
* ./
* remove filter
* try resetting
* Revert "try resetting"
This reverts commit 3a54c424de.
* exclude autogen file
* do not use the github action
* works locally
* remove extra fetch
* run on connectors base
* try bad typing
* Revert "try bad typing"
This reverts commit 33b512a3e4.
* reset stripe
* Revert "reset stripe"
This reverts commit 28f23fc6dd.
* Revert "Revert "reset stripe""
This reverts commit 5bf5dee371.
* missing return type
* do not ignore the autogen file
* remove extra installs
* run from venv
* Only check files modified on current branch
* Revert "Only check files modified on current branch"
This reverts commit b4b728e654.
* use merge-base
* Revert "use merge-base"
This reverts commit 3136670cbf.
* try with updated mypy
* bump
* run other steps after mypy
* reset task ordering
* run mypy though
* looser config
* tests pass
* fix mypy issues
* type: ignore
* optional
* this is always a bool
* ignore
* fix typing issues
* remove ignore
* remove mapping
* Automated Commit - Formatting Changes
* Revert "remove ignore"
This reverts commit 9ffeeb6cb1.
* update config
---------
Co-authored-by: girarda <girarda@users.noreply.github.com>
Co-authored-by: Joe Bell <joseph.bell@airbyte.io>
* airbyte_protocol as dependency for cdk
download from pypi instead of using local directory in monorepo
* remove generate-protocol-files.sh
* protocol_models => models
airbyte_protocol.models more closely reflects java package name
* use published pypi package
* fix imports
* context because someone will wonder what happened here
* run formatter
---------
Co-authored-by: Conor <cpdeethree@users.noreply.github.com>
Co-authored-by: Sherif Nada <snadalive@gmail.com>
Co-authored-by: cgardens <charles@airbyte.io>
* Improve airbyte cdk invalid message data type error message
* Test cdk invalid message data type custom error is raised
* Fix test to pass stream as a string
* Add valid record message data input type test
* Add object type and value to AirbyteRecordMessage validator message
Co-authored-by: Alexandre Girard <alexandre@airbyte.io>
* Add Version to AirbyteMessage
* Move protocol version to ConnectorSpecification
* Add cdk generated protocol model
* Add protocol_version to the sample ConnectorSpec in the docs
* Update airbyte-protocol/protocol-models/src/main/resources/airbyte_protocol/airbyte_protocol.yaml
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
* update doc
* Update CDK changelog
* Update CDK protocol model
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
* Update schema
* generate python
* Stream as an object
* PR comments
* generate python
* rm unused required
* Description the state with no type
* Fix connector build
* Format
* format
Co-authored-by: cgardens <charles@airbyte.io>
* generate AirbyteTraceMessage `type` enum with descriptive class name
* add comment on `title` usage
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
* apply changes to bases/airbyte-protocol
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
* Pass worker metadata to connector
* Fix compilation
* Pass in job id and image from worker
* Remove application version
* Add default job environment variables
* Add back removed comment
* Rename env map to job metadata
* Fix env configs
* Read connector from application
* Use empty string
* Remove println
* Fix unit test
* Fix compilation error
* Introduce constants for worker env
* Add worker env to ENV_VARS_TO_TRANSFER
* Pass into getWorkerMetadata map to all constructions
* Format code
* Format octavia cli
* Fix test compilation
* Fix typos
* [9044] Destination-gcs\destination-bigquery(gcs) - updated check() method to handle that user has both storage.objects.create and storage.multipartUploads.create roles
This commit reverts #9348 (9bb28939ee) because it does not work. The `test_docker_runner[standard]` and `test_docker_runner[waiting]` test cases still fail transiently.
* Change OAuth API
* Change protocol for new OAuthConfigSpecification
* Refactor OAuth classes and tests
* Remove webbackend source/destination creation
* Change from webback to normal API
* Implement new protocol change with OAuth specs
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
* format
* format
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
* Change OAuth API
* Change protocol for new OAuth Spec (#7827)
* Add examples
* Add protocol object to api too
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
* adding google sheets oauth flow to server
* fix oauth type in protocol yaml
* bump sheets version in definitions
* added GDrive scope
* update sheets to master changes
* update protocol incl. cdk
* protocol typing for oauth rootobject
* format
* destination-specification: add supportsNormalization and supportsDBT attributes
* address review comment
* missed this one
* output after gradle format