mirror of synced 2025-12-26 14:02:10 -05:00
34 Commits

Author SHA1 Message Date
Artem Inzhyyants
df34893b63 feat(airbyte-cdk): replace pydantic BaseModel with dataclasses + serpyco-rs in protocol (#44444)
Signed-off-by: Artem Inzhyyants <artem.inzhyyants@gmail.com>
2024-09-02 17:48:17 +02:00
Ella Rohm-Ensing
fc12432305 airbyte-cdk: only update airbyte-protocol-models to pydantic v2 (#39524)
## What

Migrating to Pydantic V2 for Protocol Messages to speed up emitting records. This gives us a 2.5x boost over V1.

Close https://github.com/airbytehq/airbyte-internal-issues/issues/8333

## How
- Switch to using protocol models generated for pydantic_v2, in a new (temporary) package, `airbyte-protocol-models-pdv2`.
- Update pydantic dependency of the CDK accordingly to v2.
- For minimal impact, still use the compatibility code `pydantic.v1` in all of our pydantic code from airbyte-cdk that does not interact with the protocol models.

## Review guide
1. Check out the code and clear your CDK virtual env (either `rm -rf .venv && python -m venv .venv` or `poetry env list; poetry env remove <env>`). For some reason, this is necessary to fully clean out the `airbyte_protocol` library. Then run `poetry lock --no-update && poetry install --all-extras`, which should install the CDK with the new models.
2. Run unit tests on the CDK
3. Take your favorite connector, point its `pyproject.toml` at the local CDK (see the example in `source-s3`), and try running its tests and its regression tests.

## User Impact

> [!warning]
> This is a major CDK change due to the pydantic dependency change - if connectors use pydantic 1.10, they will break and will need to do similar `from pydantic.v1` updates to get running again. Therefore, we should release this as a major CDK version bump.

## Can this PR be safely reverted and rolled back?
- [x] YES 💚
- [ ] NO 

Even if sources migrate to this version, state format should not change, so a revert should be possible.

## Follow up work - Ella to move into issues

<details>

### Source-s3 - turn this into an issue
- [ ] Update source s3 CDK version and any required code changes
- [ ] Fix source-s3 unit tests
- [ ] Run source-s3 regression tests
- [ ] Merge and release source-s3 by June 21st

### Docs
- [ ] Update documentation on how to build with CDK 

### CDK pieces
- [ ] Update file-based CDK format validation to use Pydantic V2
  - This is doable, but requires a breaking change to `OneOfOptionConfig`. A few unhandled test cases present issues we're not yet sure how to handle.
- [ ] Update low-code component generators to use Pydantic V2
  - This is doable, but a few issues around custom component generation are still unhandled.

### Further CDK performance work - create issues for these
- [ ] Research if we can replace prints with buffered output (write to byte buffer and then flush to stdout)
- [ ] Replace `json` with `orjson`
...

</details>
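
A minimal sketch of the "buffered output" idea from the follow-up list above, assuming records are plain dicts emitted as JSON lines. The function name `emit_records` and the `flush_every` parameter are hypothetical and not part of the CDK; the point is only to illustrate writing to a byte buffer and flushing in batches instead of calling `print` once per record.

```python
import io
import json
from typing import BinaryIO, Iterable


def emit_records(records: Iterable[dict], out: BinaryIO, flush_every: int = 1000) -> int:
    """Serialize records as JSON lines into a byte buffer, flushing to `out` in batches.

    Hypothetical helper for illustration only; returns the number of records written.
    """
    buffer = io.BytesIO()
    pending = 0  # records currently sitting in the buffer
    total = 0
    for record in records:
        buffer.write(json.dumps(record).encode("utf-8"))
        buffer.write(b"\n")
        pending += 1
        total += 1
        if pending >= flush_every:
            # One write and one flush per batch rather than per record.
            out.write(buffer.getvalue())
            out.flush()
            buffer.seek(0)
            buffer.truncate()
            pending = 0
    if pending:
        # Flush whatever is left after the last full batch.
        out.write(buffer.getvalue())
        out.flush()
    return total
```

To write to standard output, one could pass `sys.stdout.buffer` as `out`. Whether this actually helps is the open research question the item describes, so any gain would need to be measured.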
2024-06-21 01:53:44 +02:00
Natik Gadzhi
8b82caa4df [airbyte-cdk] Fix dpath.util.* deprecation warnings (#38847)
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2024-06-03 12:51:48 -07:00
Ignas Vyšniauskas
82a2283a76 🐛 [cdk] Return correct type during 'check' if config does not match schema (#37398)
Co-authored-by: Natik Gadzhi <natik@respawn.io>
Co-authored-by: Augustin <augustin@airbyte.io>
2024-05-24 17:32:37 -07:00
Ella Rohm-Ensing
b7819d9f6c python: assert actual == expected ordering (#36980) 2024-04-11 15:16:33 +00:00
Artem Inzhyyants
0954ad3d3a Airbyte CDK: add interpolation for request options (#35485)
Signed-off-by: Artem Inzhyyants <artem.inzhyyants@gmail.com>
Co-authored-by: Alexandre Girard <alexandre@airbyte.io>
2024-02-22 19:40:44 +01:00
Joe Reuter
55d5345bff Vector DB CDK: Refactor to improve readability (#33255)
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-12-13 11:23:39 +00:00
Augustin
0b33caecda Revert "[skip ci] formatting: add missing license headers (#33250)" (#33289) 2023-12-11 11:38:37 +01:00
Augustin
60c1cc01ad [skip ci] formatting: add missing license headers (#33250) 2023-12-11 10:15:18 +01:00
Joe Reuter
21b3b2f638 Vector DB CDK: Fix special tokens (#33065) 2023-12-08 11:46:46 +01:00
Joe Reuter
28e8692624 Vector DB CDK: Add omit_raw_text flag (#32698)
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-11-30 09:49:02 +01:00
Joe Reuter
aa111d2bea Vector DB CDK: Delete cdc records (#32496) 2023-11-16 15:32:15 +00:00
Ella Rohm-Ensing
ac3eb28de2 airbyte-ci: add format commands (#31831)
Co-authored-by: Ben Church <ben@airbyte.io>
Co-authored-by: bnchrch <bnchrch@users.noreply.github.com>
Co-authored-by: alafanechere <augustin.lafanechere@gmail.com>
Co-authored-by: Augustin <augustin@airbyte.io>
Co-authored-by: Marius Posta <marius@airbyte.io>
Co-authored-by: alafanechere <alafanechere@users.noreply.github.com>
2023-11-14 02:17:48 -06:00
Joe Reuter
e113ff66c5 CDK: Make consts required in Pydantic generated json schemas (#32251) 2023-11-09 16:12:11 +01:00
Martin Hwasser
40b0e05526 vector_based_cdk: Add option to rename field names (#31524)
Co-authored-by: Joe Reuter <joe@airbyte.io>
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-10-19 15:37:47 +02:00
Joe Reuter
67324a4b5b Vector DB CDK: Batch by documents separately for each stream and namespace (#31158) 2023-10-12 13:47:27 +00:00
Joe Reuter
5ab372170b Vector DB CDK: Add embedding option for openai-compatible embedding services (#30137) 2023-10-02 16:21:44 +00:00
Marius Posta
7ae97175a6 gradle: fix repo wide behaviour (#30607) 2023-09-28 05:01:13 -07:00
Joe Reuter
7e3437f05b Add chunking options to vector_db CDK (#30305)
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-09-25 10:09:37 +00:00
Joe Reuter
a609902106 Vector DB CDK: Split openai embedding calls (#30512) 2023-09-19 14:21:13 +00:00
Joe Reuter
da5b432255 Vector DB CDK: AzureOpenAIEmbedder (#30136) 2023-09-14 12:41:00 +02:00
Joe Reuter
f2a8bebdc5 Vector DB CDK: Add "from field" embedding strategy (#30140) 2023-09-06 14:54:17 +02:00
Joe Reuter
56580b70c3 Vector DB CDK: Better error message for misconfigured text fields (#30129)
Co-authored-by: Pedro S. Lopez <pedroslopez@me.com>
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-09-06 10:11:56 +02:00
Joe Reuter
7966a4e8f6 Vector DB CDK: Fix id generation, improve config spec, add base test case (#30081) 2023-09-01 15:04:10 +02:00
Joe Reuter
a6547456b9 Vector based CDK (#29703)
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-08-29 16:04:32 +02:00
Cole Snodgrass
2e099acc52 update headers from 2022 -> 2023 (#22594)
* It's 2023!

* 2022 -> 2023

---------

Co-authored-by: evantahler <evan@airbyte.io>
2023-02-08 13:01:16 -08:00
Augustin
b564f3eb78 Protocol: make supported_sync_modes a required not empty list on AirbyteStream (#15591) 2022-10-19 15:22:25 +02:00
Augustin
b76b73bbfb cdk: do not call init_uncaught_exception_handler from modules' root (#14892) 2022-07-21 16:34:20 +02:00
Alexandre Girard
3894134d11 Bump year in license short to 2022 (#13191)
* Bump to 2022

* format
2022-05-25 17:56:49 -07:00
Eugene Kulak
aa67604f09 CDK: Add base pydantic model for connector config and schemas (#8485)
* add base spec model

* fix usage of state_checkpoint_interval in case it is dynamic

* add schema base models, fix spelling, signatures and polishing

Co-authored-by: Eugene Kulak <kulak.eugene@gmail.com>
2021-12-08 01:14:59 +02:00
Michel Tricot
1773e41e47 Shorten our headers + adds contributors file (#6478) 2021-09-27 10:45:50 -07:00
oleh.zorenko
4dca32713b 🎉 CDK: Add support for custom headers passing to the request in OAuth2Authenticator.refresh_access_token (#6219)
* Add support for headers to OAuth2Authenticator

Send custom headers in `refresh_access_token()`.

* Bump version + update CHANGELOG.md

* Add tests

* Update tests for refresh_access_token()

* Assert that mock_refresh_token_call was called

* Remove init file
2021-09-22 07:45:05 +03:00
Dmytro
b1f2bf5665 4776: Python CDK: Validate input config.py against spec (#5457)
Python CDK: Validate input config.py against spec

Co-authored-by: Dmytro Rezchykov <dmitry.rezchykov@zazmic.com>
2021-08-19 13:14:37 +03:00
Sherif A. Nada
cb4fe7254c CDK: Add initial Destination abstraction and tests (#4719)
Co-authored-by: Eugene Kulak <widowmakerreborn@gmail.com>
2021-07-13 16:18:08 -07:00