mirror of synced 2026-01-09 06:03:17 -05:00
36 Commits

Author SHA1 Message Date
Cynthia Yin
8400d20352 Destination Redshift: deprecate old migration normalization code (#25771)
* first pass normalization

* add pr link

* remove python test & resources

* linting
2023-05-05 14:18:27 -07:00
Mikhail Shustov
2ce3c17048 🎉 Destination ClickHouse: bump dbt-clickhouse to v1.4.0 (#23023)
* bump dbt-clickhouse to 1.4.0

* fix clickhouse integration test

* exclude duckdb from tests

* add to changelog

* bump normalization version in definitions

---------

Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com>
Co-authored-by: Edward Gao <edward.gao@airbyte.io>
2023-02-16 20:15:09 -08:00
Cole Snodgrass
2e099acc52 update headers from 2022 -> 2023 (#22594)
* It's 2023!

* 2022 -> 2023

---------

Co-authored-by: evantahler <evan@airbyte.io>
2023-02-08 13:01:16 -08:00
Simon Späti
2bbc4f6f83 🎉 New Destination: DuckDB (#17494)
This is the first version of the DuckDB destination. Some edge cases still need to be handled, and feedback is welcome.
2023-02-07 11:33:10 +01:00
Edward Gao
517fc6ac10 Normalization: Revert to protocol v0 (#22283)
* Revert "Normalization: handle non-object top-level schemas; treat binary data as string (#22165)"

This reverts commit 8276d03359.

* Revert "Normalization: check for ref type existence (#22161)"

This reverts commit dbe56d6fc2.

* Revert "🎉Updated normalization to handle new datatypes (#19721)"

This reverts commit c1d7736639.

* revert dest definitions

* also dockerfile

* re-add to changelog

* add comment in dockerfile
2023-02-06 10:14:36 -08:00
Jimmy Ma
6660b13ad2 Add Airbyte Protocol V1 support. (#20036)
* Add Airbyte Protocol V1 support.

* Fix VersionedAirbyteStreamFactoryTest

* Remove AirbyteMessageMigrationV0 example

* Add Protocol Version constants

* 🎉Updated normalization to handle new datatypes (#19721)

* Updated normalization simple stream processing to handle new datatypes

* Updated normalization nested stream processing to handle new datatypes

* Updated normalization nested stream processing to handle new datatypes

* Updated normalization drop_scd_catalog processing to handle new datatypes

* Updated normalization ephemeral test processing to handle new datatypes

* fixed more tests for normalization

* fixed more tests for normalization

* fixed more tests for normalization

* fixed more tests for normalization

* fixed more issues

* fixed more issues (clickhouse)

* fixed more issues

* fixed more issues

* fixed more issues

* added binary type processing for some DBs

* cleared commented code and moved some hardcodes to processing as macro

* fixed codestyle and cleared commented code

* minor refactor

* minor refactor

* minor refactor

* fixed bool cast error

* fixed dict->str cast error

* fixed is_combining_node cast py check

* removed commented code

* removed commented code

* committed autogenerated normalization_test_output files

* committed autogenerated normalization_test_output files (new files)

* refactored utils.py

* Updated utils.py to use Callable functions and get rid of property_type in is_number and is_bool functions

* committed autogenerated normalization_test_output files (new files)

* fixed typo in TIMESTAMP_WITH_TIMEZONE_TYPE

* updated stream_processor to handle string type first as a wider type

* fixed arrays normalization by updating is_simple_property method as per new approaches

* format

Co-authored-by: Edward Gao <edward.gao@airbyte.io>

* Update airbyte protocol migration (#20745)

* Extract MigrationContainer from AirbyteMessageMigrator

* Add ConfiguredAirbyteCatalogMigrations

* Add ConfiguredAirbyteCatalog to AirbyteMessageMigrations

* Enable ConfiguredAirbyteCatalog migration

* Fix tests

* Remove extra this.

* Add missing docs

* Typo

Co-authored-by: Edward Gao <edward.gao@airbyte.io>

* Data types update: Implement protocol message migrations (#19240)

* Extract MigrationContainer from AirbyteMessageMigrator

* Add ConfiguredAirbyteCatalogMigrations

* Add ConfiguredAirbyteCatalog to AirbyteMessageMigrations

* Enable ConfiguredAirbyteCatalog migration

* set up scaffolding

* [wip] more scaffolding, basic unit test

* minimal green code

* [wip] add failing test for other primitive types

* correct version number

* handle basic primitive type decls

* add implicit cases

* add recursive schema

* formatting

* comment

* support not

* fix indentation

* handle all nested schema cases

* handle boolean schemas

* verify empty schema handling

* cleanup

* extract map

* code organization

* extract method

* reformat

* [wip] more tests, minor fix type array handling

* corrected test

* cleanup

* reformat

* switch to v1

* add support for multityped fields

* missed test case

* nested test class

* basic record upgrade

* implement record upgrades

* slight refactor

* comments+clarifications

* extract constants

* (partly) correct model classes

* add de/ser

* formatting

* extract constants

* fix json reference

* update docs

* switch to v1 models

* fix compile+test

* add base64 handling

* use vnull

* Data types update: Implement protocol message downgrade path (#19909)

* rough skeleton for passing catalog into migration

* basic test

* more scaffolding

* basic implementation

* add primitives test

* add in other tests (nested fields currently failing)

* add formats

* implement oneOf handling

* formatting

* oneOf handling

* better tests

* comments + organization

* progress

* basic test case

* downgrade objects, ish

* basic array implementation

* handle numeric failure

* test for new type

* handle array items

* empty schema handling

* first pass at oneof handling

* add more tests+handling

* more tests

* comments

* add empty oneof test case

* format + reorganize

* more reorganize

* fix name

* also downgrade binary data

* only import vnull

* move migrations into v1 package

* extract schema mutation code

* comment

* extract schema migration to new class

* extract record downgrade logic for future use

* format

* fix build after rebase

* rename private method for consistency

* also implement configuredcatalog migrations >.>

* quick and dirty tests

* slight cleanup

* fix tests

* pmd

* pmd test

* null check on message objects

* maybe fix acceptance tests?

* fix name

* extract constants

* more fixes

* tmp

* meh

* fix cdc acc tests

* revert to master source-postgres

* remove log messages

* revert other misc hacks

* integers are valid cursors

* remove unrelated change

* fix build

* fix build more?

* [MUST REVERT] use dev normalization

* capture kube logs

* also here?

* no debug logs?

* delete dup from merging

* add final everywhere

* revert test changes

Co-authored-by: Jimmy Ma <jimmy@airbyte.io>

* On-the-fly migrations of persisted catalogs (#21757)

* On the fly catalog migration for normalization activity

* On the fly catalog migration for job persistence

* On the fly migration for standard sync persistence

* On the fly migration for airbyte catalogs

* Refactor code to share JsonSchema traversal

* Add V0 Data type search function

* PMD and Format

* Fix getOrInsertActorCatalog and ConfigRepositoryE2E tests

* Null-proofing CatalogMigrationV1Helper

* More null checks

* Fix test

* Format

* Add data type v1 support to the FE

* Changes AC test check to check exited ps (#21672)

some docker compose changes no longer show exited
processes.  this broke our test

this change should fix master

tested in a runner that failed

* Move wellknown types mapping to the utility function

* use protocolv1 normalization

---------

Co-authored-by: Topher Lubaway <asimplechris@gmail.com>
Co-authored-by: Edward Gao <edward.gao@airbyte.io>

* Update protocol support range (#21996)

* bump normalization version to 0.3.0

* Add version check on normalization (#22048)

* Add normalization min version check

* Add visible for testing

---------

Co-authored-by: Edward Gao <edward.gao@airbyte.io>
Co-authored-by: Eugene <etsybaev@gmail.com>
Co-authored-by: Topher Lubaway <asimplechris@gmail.com>
2023-01-30 10:17:49 -08:00
Greg Solovyev
8cf546483d 🐛 Add a drop table hook to drop scd tables in case of overwrite sync (#18015)
* Add a drop table hook to drop scd tables in case of overwrite sync

* Add an integration test for dropping SCD table on overwrite

* skip new test for Oracle and TiDB

* Add normalization run after initial reset

* Bump normalization version
2022-11-01 08:52:02 -07:00
Daemonxiao
d4524032ae 🎉 New Destination: TiDB (#15592)
* Add new destination-tidb

* support sync

* Add normalization-tidb

* fix failed tests

* Add unnest macro

* fmt

* Add new destination-tidb

* support sync

* Add normalization-tidb

* fix failed tests

* Add unnest macro

* fmt

* fmt

* fix integration test

* Update docs/integrations/destinations/tidb.md

Co-authored-by: Xiang Zhang <angwerzx@126.com>

* Update doc

* Update doc

* Update doc

* bump normalization version

* update normalization changelog

* run format

* add dest def

* generate spec

Co-authored-by: Xiang Zhang <angwerzx@126.com>
Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com>
Co-authored-by: marcosmarxm <marcosmarxm@gmail.com>
2022-08-31 16:50:27 -03:00
Baz
062b12f1ba 🎉 Base Normalization: clean-up Redshift tmp_schemas after SAT (#14015)
After the `base-normalization` SAT run, test leftovers are now automatically cleaned up from Destination Redshift. Other destinations are not covered yet.
2022-06-27 20:44:04 +03:00
Serhii Chvaliuk
0342699daf Normalization: rename *.sql -> *.sql.j2 (#13474)
Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
2022-06-06 18:58:34 +03:00
Brian Leonard
b882538147 Snowflake integration test steps (#13205)
* Destination-snowflake test config update

* Tests assume this ones doesn’t work!

* Make GCS integration

* Use existing GCS integration because tests use it

* Add comments

* Snowflake setup in base-normalization

* Markdown!

* Respect the env variable

* readme update

* Updated snapshot
2022-05-26 13:24:29 -07:00
Alexandre Girard
3894134d11 Bump year in license short to 2022 (#13191)
* Bump to 2022

* format
2022-05-25 17:56:49 -07:00
Serhii Chvaliuk
7023fbd48e Redshift SUPER type (#12064)
* 🎉 Destination Redshift: Use SUPER data type on Redshift destination for raw JSON data (#9407)

Co-authored-by: Oleksandr Tsukanov <alexander.tsukanovvv@gmail.com>
Co-authored-by: Sergey Chvalyuk <grubberr@gmail.com>
Co-authored-by: Christophe Duong <christophe.duong@gmail.com>
Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
2022-04-20 15:11:22 +03:00
Edward Gao
c1381cde2c Revert Redshift SUPER PRs (#12041) 2022-04-14 12:36:26 -07:00
Serhii Chvaliuk
9b05bc1f34 Normalization redshift - add support SUPER type (#9610)
Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
Co-authored-by: Oleksandr Tsukanov <alexander.tsukanovvv@gmail.com>
2022-04-12 21:42:43 +03:00
Edward Gao
046fc5e1cc 🎉 upgrade dbt to 1.0.0 (except for oracle and mysql) (#11051) 2022-03-11 16:38:37 -08:00
Edward Gao
b6926d44d4 🚨 Snowflake produces permanent tables 🚨 (#9063) 2022-01-06 10:10:25 -08:00
Bo Lu
bbcd461bc5 🎉 New Destination: ClickHouse (#7620)
* add ClickHouse destination

* update docs

* format code

* code improvement as per code review

* add ssh tunneling and ssl/tls support and code enhancement

* merge from master

* disable testCustomDbtTransformationsFailure test

* fix string format bug

* fix reserved keywords bug and disable dbt

* disable dbt in expect result

* add type hints

* bump connector version

Co-authored-by: Alexander Tsukanov <alexander.tsukanovvv@gmail.com>
Co-authored-by: Marcos Marx <marcosmarxm@gmail.com>
2021-12-13 19:39:19 -03:00
Christophe Duong
b424c1a0e7 🐛 Fix incremental normalization with empty tables (#8394)
* Fix incremental with empty final tables

* upgrade docker images

* Regen SQL

* Bumpversion & format
2021-12-01 23:40:14 +01:00
Christophe Duong
affea7f60b 🐛 Minor fixes to incremental normalization and nesting (#7669) 2021-11-08 17:42:57 +01:00
Christophe Duong
5fc50df39d 🎉 Incremental Normalization (#7162) 2021-10-29 13:53:02 +02:00
Christophe Duong
c4620559d7 🎉 Refactor Normalization docker images and upgrade to use dbt 0.21.0 (#6959)
* Split normalization docker images for some connectors with specific dependencies

* Regenerate (#7003)
2021-10-14 20:29:16 +02:00
Baz
e5abaeccef 🎉 Base-normalization: Implement normalization for MSSQL-destination (#6079)
See the attached PR (https://github.com/airbytehq/airbyte/pull/6079)
2021-10-07 18:46:27 +03:00
Michel Tricot
1773e41e47 Shorten our headers + adds contributors file (#6478) 2021-09-27 10:45:50 -07:00
Marcos Marx
589d535a61 🎉 Oracle normalization (#5562)
* oracle normalization

* correct dbt_project function for oracle

* unit tests

* run format

* correct ephemeral tests

* add gradle dependency for oracle destination

* run int tests

* add oracle in settings.gradle for normalization run

* use default airbyte columns

* format

* test all destination ephemeral

* correct unit test

* correct unit test

* destination docs update

* correct mypy

* integration test all dest

* refactor oracle function

* merge master

* run all destinations

* flake8 escape regex

* surrogate key function

* correct few minor comments

* refactor scd sql function

* refactor scd function

* revert test

* refactor minor details

* revert tests

* revert ephemeral test

* revert unit test table_registry

* revert airbyte_protocol format

* format

* bump normalization version in worker

* minor changes

* minor changes

* correct json_column for other destinations

* gradlew format

* revert tests

* remove comments

* add Oracle destination explicit in safe_cast_str

* add quote_in_parenthesis inside if clause

* gradlew format
2021-09-07 16:39:17 -03:00
Christophe Duong
d6429a410a Normalization handles quote in column names (#5027)
* Handle quotes in columns names
2021-07-28 16:00:13 +02:00
Christophe Duong
5cdc7f8517 🐛 (contribution) Fix SQL model to build a Type 2 SCD to handle NULL cursor_field values correctly (#4881)
* Update SQL model to build a Type 2 Slowly Changing Dimension (#4802)

* Make SQL more portable

* Bumpversion of normalization

Co-authored-by: Daniel Diamond <33811744+danieldiamond@users.noreply.github.com>
2021-07-22 16:27:54 +02:00
LiRen Tu
2caf3904f0 🎉 MySQL destination: normalization (#4163)
* Add mysql dbt package

* Add mysql normalization support in java

* Add mysql normalization support in python

* Fix unit tests

* Update readme

* Setup mysql container in integration test

* Add macros

* Depend on dbt-mysql from git repo

* Remove mysql limitation test

* Test normalization

* Revert protocol format change

* Fix mysql json macros

* Fix two more macros

* Fix table name length

* Fix array macro

* Fix equality test macro

* Update replace-identifiers

* Add more identifiers to replace

* Fix unnest macro

* Fix equality macro

* Check in mysql test output

* Update column limit test for mysql

* Escape parentheses

* Remove unnecessary mysql test

* Remove mysql output for easier code review

* Remove unnecessary mysql test

* Remove parentheses

* Update dependencies

* Skip mysql instead of manually write out types

* Bump version

* Check in unit test for mysql name transformer

* Fix type conversion

* Use json_value to extract scalar json fields

* Move dbt-mysql to Dockerfile (#4459)

* Format code

* Check in mysql dbt output

* Remove unnecessary quote

* Update mysql equality test to match 0.19.0

* Check in schema_test update

* Update readme

* Bump base normalization version

* Update document

Co-authored-by: Christophe Duong <christophe.duong@gmail.com>
2021-07-03 20:30:59 -07:00
Christophe Duong
bb4dcb1987 🎉 Remove hash when it is not necessary from normalization outputs (#3704)
* Refactor `generate_new_table_name` using a table name registry class instead

* update normalization docs

* Enable MyPy

* Regenerate output files

* Closes https://github.com/airbytehq/airbyte/issues/2389

* Bumpversion normalization
2021-06-01 17:07:22 +02:00
Christophe Duong
8862fba1bb 🎉 Avoid dbt runtime exception "maximum recursion depth exceeded" in ephemeral materialization
* Create new test_ephemeral and refactor with test_normalization

* Add notes in docs

* Refactor common normalization tests into DbtIntegrationTest

* Bumpversion of normalization image
2021-05-21 18:07:20 +02:00
Christophe Duong
8790fc10ab Simple rename of dbt generated models folder (#3469)
* Rename folder where models are generated from airbyte_views to airbyte_ctes

* Integration test outputs are moved around as a result
2021-05-18 20:01:09 +02:00
Charles
0df53170c9 Stop formatting python with spotless (#3388) 2021-05-13 17:46:34 -07:00
Christophe Duong
f666fd2f18 Upgrade normalization to use dbt from docker images (#3186) 2021-05-04 10:31:07 +02:00
Christophe Duong
0265012e42 Handle special characters in columns names (#3133)
* Handle special characters in columns names (add test case to integration tests)

* Add test case with column name collisions

* Bumpversion of normalization image
2021-04-30 11:59:55 +02:00
Christophe Duong
86513d6c54 Fix normalization Nesting bug (#3110)
* New test case for nested streams

* Fix filename naming (collisions and nesting)

* Update generated files from tests with new file naming

* Allow invalid json data in raw tables when normalizing on redshift

* Regenerate final sql files

* Disable unit tests on stream naming (temporarily)

* Fix unnesting bug in postgres

* Reactivate unit tests and change table registry

* Move normalization unit tests to integration tests (too slow)

* Remove heavy catalog.json used in unit_tests (actual catalog from facebook/stripe with thousands of lines)

* Bumpversion of normalization image
2021-04-29 14:32:59 +02:00
Christophe Duong
c2fa3e4c9c Introduce normalization integration tests (#3025)
* Speed up normalization unit tests by dropping hubspot catalog (too heavy, will be covering it in integration tests instead)

* Add integration tests for normalization

* Add dedup test case

* adjust build.gradle

* add readme for normalization

* Share PATH env variable with subprocess calls

* Handle git non-versioned tests vs versioned ones

* Format code

* Add tests check to normalization integration tests

* Add docs

* complete docs on normalization integration tests

* format code

* Normalization integration tests output (#3026)

* Version generated/output files from normalization integration tests

* simplify cast of float columns to string when used as partition key (#3027)

* bump version of normalization image

* Apply suggestions from code review

Co-authored-by: Jared Rhizor <jared@dataline.io>

* Apply suggestions from code review

Co-authored-by: Jared Rhizor <jared@dataline.io>
2021-04-27 12:01:04 +02:00