## What
Migrate the protocol messages to Pydantic V2 to speed up emitting records. This gives us a 2.5x boost over V1.
Closes https://github.com/airbytehq/airbyte-internal-issues/issues/8333
## How
- Switch to the protocol models generated for pydantic_v2, published in a new (temporary) package, `airbyte-protocol-models-pdv2`.
- Update the CDK's pydantic dependency to v2 accordingly.
- To minimize impact, keep using the `pydantic.v1` compatibility module in all airbyte-cdk pydantic code that does not interact with the protocol models.
## Review guide
1. Check out the code and clear your CDK virtual env (either `rm -rf .venv && python -m venv .venv` or `poetry env list; poetry env remove <env>`). This is necessary to fully clean out the `airbyte_protocol` library, for some reason. Then run `poetry lock --no-update && poetry install --all-extras`. This should install the CDK with the new models.
2. Run the CDK unit tests.
3. Take your favorite connector, point its `pyproject.toml` at the local CDK (see the example in `source-s3`), and try running its tests and its regression tests.
## User Impact
> [!warning]
> This is a major CDK change because of the pydantic dependency change. Connectors that use pydantic 1.10 will break and will need similar `from pydantic.v1` updates to run again. Therefore, we should release this as a major CDK version bump.
## Can this PR be safely reverted and rolled back?
- [x] YES 💚
- [ ] NO ❌
Even if sources migrate to this version, state format should not change, so a revert should be possible.
## Follow-up work - Ella to move into issues
<details>
### Source-s3 - turn this into an issue
- [ ] Update the source-s3 CDK version and make any required code changes
- [ ] Fix source-s3 unit tests
- [ ] Run source-s3 regression tests
- [ ] Merge and release source-s3 by June 21st
### Docs
- [ ] Update documentation on how to build with CDK
### CDK pieces
- [ ] Update file-based CDK format validation to use Pydantic V2
  - This is doable but requires a breaking change to `OneOfOptionConfig`. A few unhandled test cases raise issues we are not yet sure how to handle.
- [ ] Update low-code component generators to use Pydantic V2
  - This is doable, but a few issues around custom component generation are still unhandled.
### Further CDK performance work - create issues for these
- [ ] Research if we can replace prints with buffered output (write to byte buffer and then flush to stdout)
- [ ] Replace `json` with `orjson`
...
</details>
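
The buffered-output idea from the performance follow-ups could look roughly like the sketch below. It is only an illustration of the technique, not the CDK's implementation: `emit_buffered` and `flush_threshold` are hypothetical names, and messages are represented as plain dicts rather than protocol models.

```python
import io
import json
import sys
from typing import BinaryIO, Iterable, Mapping


def emit_buffered(
    messages: Iterable[Mapping[str, object]],
    out: BinaryIO,
    flush_threshold: int = 64 * 1024,
) -> int:
    """Serialize messages as JSON lines into a byte buffer and write them in chunks.

    Returns the number of messages written. `flush_threshold` (hypothetical)
    is the buffer size in bytes at which the buffer is drained to `out`.
    """
    buffer = io.BytesIO()
    count = 0
    for message in messages:
        buffer.write(json.dumps(message).encode("utf-8"))
        buffer.write(b"\n")
        count += 1
        if buffer.tell() >= flush_threshold:
            # Drain the buffer in one write instead of one write per message.
            out.write(buffer.getvalue())
            buffer.seek(0)
            buffer.truncate()
    # Write whatever is left, then flush the underlying stream once.
    out.write(buffer.getvalue())
    out.flush()
    return count
```

A connector entrypoint would pass `sys.stdout.buffer` as `out`; whether this actually beats `print` would need to be measured as part of the research item.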
<!--
Thanks for your contribution!
Before you submit the pull request,
I'd like to kindly remind you to take a moment and read through our guidelines
to ensure that your contribution aligns with the type of contributions our project accepts.
All the information you need can be found here:
https://docs.airbyte.com/contributing-to-airbyte/
We truly appreciate your interest in contributing to Airbyte,
and we're excited to see what you have to offer!
If you have any questions or need any assistance, feel free to reach out in #contributions Slack channel.
-->
## What
* After https://github.com/airbytehq/airbyte/pull/35905, we should be emitting a state message with every successful sync. However, a few tests were too lenient and exercised syncs that were not actually _successful_. This PR fixes those cases and adds validation that we emit at least one state message per successful sync.
## How
* Add an assertion that we get at least 1 state message for a successful sync
* Fix some tests that previously "output 0 expected records" but actually errored silently; they are no longer run as read tests
* Fix a test that failed silently because multi-format is not supported
* Add a new test for syncs that output 0 records successfully
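
The new assertion can be sketched roughly as follows. This is a simplified illustration rather than the test helper added in this PR: `assert_state_emitted` is a hypothetical name, and messages are modeled as plain dicts with a `type` key instead of protocol message objects.

```python
from typing import Any, Iterable, Mapping


def assert_state_emitted(messages: Iterable[Mapping[str, Any]]) -> int:
    """Return the number of STATE messages, raising if the sync emitted none."""
    state_count = sum(1 for message in messages if message.get("type") == "STATE")
    assert state_count >= 1, "a successful sync must emit at least one state message"
    return state_count
```

With this check, a sync that successfully outputs 0 records still passes as long as it emits a state message, while a sync that errored silently and emitted nothing fails.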
## 🚨 User Impact 🚨
None - test changes
## Pre-merge Actions
*Expand the relevant checklist and delete the others.*
<details><summary><strong>New Connector</strong></summary>
### Community member or Airbyter
- **Community member?** Grant edit access to maintainers ([instructions](https://docs.github.com/en/github/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork#enabling-repository-maintainer-permissions-on-existing-pull-requests))
- Unit & integration tests added and passing. Community members, please provide proof of success locally, e.g. a screenshot or copy-pasted unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow the instructions in the README. For Java connectors, run `./gradlew :airbyte-integrations:connectors:<name>:integrationTest`.
- Connector version is set to `0.0.1`
- `Dockerfile` has version `0.0.1`
- Documentation updated
  - Connector's `README.md`
  - Connector's `bootstrap.md`. See [description and examples](https://docs.google.com/document/d/1ypdgmwmEHWv-TrO4_YOQ7pAJGVrMp5BOkEVh831N260/edit?usp=sharing)
  - `docs/integrations/<source or destination>/<name>.md` including changelog with an entry for the initial version. See changelog [example](https://docs.airbyte.io/integrations/sources/stripe#changelog)
  - `docs/integrations/README.md`
### Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
- Create a non-forked branch based on this PR and test the below items on it
- Build is successful
- If new credentials are required for use in CI, add them to GSM. [Instructions](https://docs.airbyte.io/connector-development#using-credentials-in-ci).
</details>
<details><summary><strong>Updating a connector</strong></summary>
### Community member or Airbyter
- Grant edit access to maintainers ([instructions](https://docs.github.com/en/github/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork#enabling-repository-maintainer-permissions-on-existing-pull-requests))
- Unit & integration tests added
### Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
- Create a non-forked branch based on this PR and test the below items on it
- Build is successful
- If new credentials are required for use in CI, add them to GSM. [Instructions](https://docs.airbyte.io/connector-development#using-credentials-in-ci).
</details>
<details><summary><strong>Connector Generator</strong></summary>
- Issue acceptance criteria met
- PR name follows [PR naming conventions](https://docs.airbyte.com/contributing-to-airbyte/resources/pull-requests-handbook)
- If adding a new generator, add it to the [list of scaffold modules being tested](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connector-templates/generator/build.gradle#L41)
- The generator test modules (all connectors with `-scaffold` in their name) have been updated with the latest scaffold by running `./gradlew :airbyte-integrations:connector-templates:generator:generateScaffolds` then checking in your changes
- Documentation which references the generator is updated as needed
</details>
<details><summary><strong>Updating the Python CDK</strong></summary>
### Airbyter
Before merging:
- Pull Request description explains what problem it is solving
- Code change is unit tested
- Build and mypy checks pass
- Smoke test the change on at least one affected connector
  - On GitHub: Run [this workflow](https://github.com/airbytehq/airbyte/actions/workflows/connectors_tests.yml), passing `--use-local-cdk --name=source-<connector>` as options
  - Locally: `airbyte-ci connectors --use-local-cdk --name=source-<connector> test`
- PR is reviewed and approved
After merging:
- [Publish the CDK](https://github.com/airbytehq/airbyte/actions/workflows/publish-cdk-command-manually.yml)
  - The CDK does not follow proper semantic versioning. Choose minor if the change has significant user impact or is a breaking change. Choose patch otherwise.
  - Write a thoughtful changelog message so we know what was updated.
- Merge the platform PR that was auto-created for updating the Connector Builder's CDK version
  - This step is optional if the change does not affect the connector builder or declarative connectors.
</details>
## What
* Closes https://github.com/airbytehq/airbyte/issues/34151
* Closes https://github.com/airbytehq/oncall/issues/4386
## How
Handle cases where the Python value of a pyarrow scalar is `None`. This can be caused by null values in the data, as well as by null-like values such as `NaT` (similar to `NaN`). Previously we handled `None` only for binary types; now we handle it for any type.
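
A minimal sketch of the idea, assuming only that the scalar exposes `as_py()` as pyarrow scalars do. `scalar_to_python` and `ScalarLike` are hypothetical names, and the bytes decoding is a stand-in for the parser's real type-specific conversions, not the CDK code.

```python
from typing import Any, Protocol


class ScalarLike(Protocol):
    """Anything with an `as_py()` method, e.g. a pyarrow scalar."""

    def as_py(self) -> Any: ...


def scalar_to_python(scalar: ScalarLike) -> Any:
    """Convert a scalar to a Python value, short-circuiting on null-like values."""
    value = scalar.as_py()
    if value is None:
        # Applies to a null of any type, not only binary columns.
        return None
    if isinstance(value, bytes):
        # Illustrative type-specific conversion (assumption, not the CDK's logic).
        return value.decode("utf-8")
    return value
```

Checking for `None` before any type-specific branch means later conversions never have to guard against nulls themselves.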
## 🚨 User Impact 🚨
No breaking changes. After this CDK version is released, we should update the CDK dependency in S3 and any other file sources that parse Parquet.