mirror of synced 2026-01-05 21:02:13 -05:00

76 Commits

Author SHA1 Message Date
Christophe Duong
b424c1a0e7 🐛 Fix incremental normalization with empty tables (#8394)
* Fix incremental with empty final tables

* upgrade docker images

* Regen SQL

* Bumpversion & format
2021-12-01 23:40:14 +01:00
Christophe Duong
c4c92bd689 Fix normalization when un-nesting (#8378)
* Remove unique key on exploded nested tables

* un-nest hint

* Regen SQL
2021-12-01 17:13:04 +01:00
Christophe Duong
c5a7267378 🐛🐌 Optimize incremental normalization runtime with snowflake (#8088) 2021-11-19 15:03:52 +01:00
Christophe Duong
affea7f60b 🐛 Minor fixes to incremental normalization and nesting (#7669) 2021-11-08 17:42:57 +01:00
Christophe Duong
5fc50df39d 🎉 Incremental Normalization (#7162) 2021-10-29 13:53:02 +02:00
Andrés Bravo
abf0159778 🎉 Add configurable dbt parameter to destination-bigquery (#7118)
* Add configurable dbt parameter to destination-bigquery

* Update airbyte-integrations/connectors/destination-bigquery/Dockerfile
2021-10-19 14:51:37 +02:00
Christophe Duong
c4620559d7 🎉 Refactor Normalization docker images and upgrade to use dbt 0.21.0 (#6959)
* Split normalization docker images for some connectors with specific dependencies

* Regenerate (#7003)
2021-10-14 20:29:16 +02:00
Anna Lvova
ec68f478ff 🐛 fix: Normalization date-time should handle empty strings "" (#6379)
* add empty string normalization for postgres

* add empty string normalization for destinations

* fix

* fix

* fix

* fix for snowflake

* fix for mysql

* fix normalization for mysql

* upd doc

* upd doc

* Update airbyte-integrations/bases/base-normalization/integration_tests/dbt_integration_test.py

Co-authored-by: Christophe Duong <christophe.duong@gmail.com>

* Update airbyte-integrations/bases/base-normalization/integration_tests/dbt_integration_test.py

Co-authored-by: Christophe Duong <christophe.duong@gmail.com>

* bump version

* bump version

* add datetime normalization for mssql

* upd row count for mssql

* upd

* bump version

* upd docs for 0.1.50 normalization version

Co-authored-by: Christophe Duong <christophe.duong@gmail.com>
2021-10-08 13:57:37 +03:00
Daniel Diamond
f13313e37e Order CDC records by cdc log position if present (#6688) 2021-10-08 14:00:44 +05:30
Baz
e5abaeccef 🎉 Base-normalization: Implement normalization for MSSQL-destination (#6079)
See the attached PR (https://github.com/airbytehq/airbyte/pull/6079)
2021-10-07 18:46:27 +03:00
Christophe Duong
a3196428a7 Forward destination location to dbt profiles (#6709)
* Forward destination location to dbt profiles

* Format code

* Update version
2021-10-06 19:20:15 +02:00
Charles
5e750164ac Publish SSL-only version of Postgres Destination (#6496)
* try to publish new normalization version

* default to using ssl in postgres destination

* tidy up

* Run normalization tests using postgres DB with SSL support

* bump version

Co-authored-by: Christophe Duong <christophe.duong@gmail.com>
2021-09-30 12:55:26 +02:00
andriikorotkov
8fa15713c3 🎉 Destination MySQL - Added support for connection via ssh (aka bastion server) (#6317)
* updated mysql tests

* updated mysql tests

* added mysql ssh tunnel tests by key

* fixed remarks

* fixed remarks

* updated DatabricksStreamCopier

* switch to custom file for ssh config in normalization

* updated MySQL SSH tests

* bump version

* get local port properly

* updated assertSameValue for MySQL ssh tunnel

* updated image version and documentation

* updated code style

* updated CI credentials

* updated normalization documentation

Co-authored-by: George Claireaux <george@claireaux.co.uk>
2021-09-28 13:11:32 +03:00
Michel Tricot
1773e41e47 Shorten our headers + adds contributors file (#6478) 2021-09-27 10:45:50 -07:00
George Claireaux
3d8625e03d Fix ssh tunneling for normalization (#6396)
* switch to custom file for ssh config in normalization

* bump version

* get local port properly

* added unit test for write_ssh_config

* format
2021-09-23 14:08:45 +01:00
Yaroslav Dudar
a6ecfda2ca 🐛 Fix Snowflake destination normalization to accept any date-time format. (#6052)
snowflake date-time format parser
2021-09-23 11:10:12 +03:00
Charles
8ad43afb07 SSH for Postgres Destination (#5743)
Co-authored-by: George Claireaux <phlair@users.noreply.github.com>
2021-09-07 17:06:25 -07:00
Marcos Marx
589d535a61 🎉 Oracle normalization (#5562)
* oracle normalization

* correct dbt_project function for oracle

* unit tests

* run format

* correct ephemeral tests

* add gradle dependency for oracle destination

* run int tests

* add oracle in settings.gradle for normalization run

* use default airbyte columns

* format

* test all destination ephemeral

* correct unit test

* correct unit test

* destination docs update

* correct mypy

* integration test all dest

* refactor oracle function

* merge master

* run all destinations

* flake8 escape regex

* surrogate key function

* correct few minor comments

* refactor scd sql function

* refactor scd function

* revert test

* refactor minor details

* revert tests

* revert ephemeral test

* revert unit test table_registry

* revert airbyte_protocol format

* format

* bump normalization version in worker

* minor changes

* minor changes

* correct json_column for other destinations

* gradlew format

* revert tests

* remove comments

* add Oracle destination explicit in safe_cast_str

* add quote_in_parenthesis inside if clause

* gradlew format
2021-09-07 16:39:17 -03:00
Marcos Marx
7225187fa1 run gradlew format (#5552) 2021-08-20 15:38:28 -03:00
Marcos Marx
a9b2c08934 Add condition for unnest_column_name for pg/redshift/mysql (#5467)
* add unnest_column case conflict

* add redshift files

* format

* change logic

* change logic for unnest

* bump normalization version

* add files

* add stream test unnest_alias
2021-08-20 11:09:15 -03:00
Christophe Duong
158594fccc Remove BQ_keyfile.json mentions in normalization (#5528)
* Remove BQ_keyfile.json mentions in normalization

* Bumpversion normalization
2021-08-19 15:36:44 +02:00
Christophe Duong
f9705bf731 BigQuery normalization: make credentials json optional (#5433)
* Allow service-account-json or oauth methods for bigquery destinations
2021-08-17 11:50:17 +02:00
Marcos Marx
e4fe62f739 Normalization: solve conflict when stream and field have same name (#4557)
* solve conflict when stream and field have same name

* add logic to handle conflict

* change files

* change json_extract functions

* json_operations

* add normalization files

* test integration mysql

* remove table_alias

* mysql run

* json ops

* solve conflict with master

* solve mysql circle dependency dbt

* add tests for scalar and arrays

* add sql files

* bump normalization version

* format
2021-08-11 20:18:45 -03:00
Subodh Kant Chaturvedi
923884b897 introduce implementation for date-time support in normalization (#5180)
* introduce implementation for date-time support in normalization

* update test output for all destinations

* add comment
2021-08-11 02:28:03 +05:30
Christophe Duong
d6429a410a Normalization handles quote in column names (#5027)
* Handle quotes in columns names
2021-07-28 16:00:13 +02:00
Christophe Duong
5cdc7f8517 🐛 (contribution) Fix SQL model to build a Type 2 SCD to handle NULL cursor_field values correctly (#4881)
* Update SQL model to build a Type 2 Slowly Changing Dimension (#4802)

* Make SQL more portable

* Bumpversion of normalization

Co-authored-by: Daniel Diamond <33811744+danieldiamond@users.noreply.github.com>
2021-07-22 16:27:54 +02:00
LiRen Tu
2caf3904f0 🎉 MySQL destination: normalization (#4163)
* Add mysql dbt package

* Add mysql normalization support in java

* Add mysql normalization support in python

* Fix unit tests

* Update readme

* Setup mysql container in integration test

* Add macros

* Depend on dbt-mysql from git repo

* Remove mysql limitation test

* Test normalization

* Revert protocol format change

* Fix mysql json macros

* Fix two more macros

* Fix table name length

* Fix array macro

* Fix equality test macro

* Update replace-identifiers

* Add more identifiers to replace

* Fix unnest macro

* Fix equality macro

* Check in mysql test output

* Update column limit test for mysql

* Escape parentheses

* Remove unnecessary mysql test

* Remove mysql output for easier code review

* Remove unnecessary mysql test

* Remove parentheses

* Update dependencies

* Skip mysql instead of manually write out types

* Bump version

* Check in unit test for mysql name transformer

* Fix type conversion

* Use json_value to extract scalar json fields

* Move dbt-mysql to Dockerfile (#4459)

* Format code

* Check in mysql dbt output

* Remove unnecessary quote

* Update mysql equality test to match 0.19.0

* Check in schema_test update

* Update readme

* Bump base normalization version

* Update document

Co-authored-by: Christophe Duong <christophe.duong@gmail.com>
2021-07-03 20:30:59 -07:00
Marcos Marx
265e7f79d8 Normalization: remove dedup cdc excluded (#4297)
* change stream processor

* integration tests

* add integration tests

* format gradle file

* add excluded files

* change catalog and msgs

* add cdc messages

* solve cdc excluded problem with tests

* remove .egg files

* remove time import

* tab stream_processor

* uncommented local test

* add tests for dbt!

* add excluded files

* add missing snowflake file

* add pg, bq and snowflake

* chris comments

* test comment

* pytest parametrize tests

* bump normalization version

* formatting

* run test for all destinations
2021-06-30 14:59:13 -03:00
Christophe Duong
75a1dda07e 🎉 New BigQuery destination with Structured/Repeated Records (#4176) 2021-06-23 16:19:36 +02:00
Marcos Marx
810fde9e21 Documentation correct summary normalization docs (#4158)
* correct summary

* run format master failing
2021-06-16 12:36:41 -03:00
Christophe Duong
144bc7814e Normalization with empty catalog (#4020)
* Normalization with empty catalog
2021-06-10 14:22:55 +02:00
Christophe Duong
bb4dcb1987 🎉 Remove hash when it is not necessary from normalization outputs (#3704)
* Refactor `generate_new_table_name` using a table name registry class instead

* update normalization docs

* Enable MyPy

* Regenerate output files

* Closes https://github.com/airbytehq/airbyte/issues/2389

* Bumpversion normalization
2021-06-01 17:07:22 +02:00
Christophe Duong
8862fba1bb 🎉 Avoid dbt runtime exception "maximum recursion depth exceeded" in ephemeral materialization
* Create new test_ephemeral and refactor with test_normalization

* Add notes in docs

* Refactor common normalization tests into DbtIntegrationTest

* Bumpversion of normalization image
2021-05-21 18:07:20 +02:00
Christophe Duong
8790fc10ab Simple rename of dbt generated models folder (#3469)
* Rename folder where models are generated from airbyte_views to airbyte_ctes

* Integration test outputs are moved around as a result
2021-05-18 20:01:09 +02:00
Christophe Duong
083aebcbcb Workflow to handle operations (custom transformation) (#3379)
* Keep normalization backward compatible with old settings from destination

* Bumpversion normalization image
2021-05-17 18:08:27 +02:00
Charles
0df53170c9 Stop formatting python with spotless (#3388) 2021-05-13 17:46:34 -07:00
Christophe Duong
86513d6c54 Fix normalization Nesting bug (#3110)
* New test case for nested streams

* Fix filename naming (collisions and nesting)

* Update generated files from tests with new file naming

* Allow invalid json data in raw tables when normalizing on redshift

* Regenerate final sql files

* Disable unit tests on stream naming (temporarily)

* Fix unnesting bug in postgres

* Reactivate unit tests and change table registry

* Move normalization unit tests to integration tests (too slow)

* Remove heavy catalog.json used in unit_tests (actual catalog from facebook/stripe with thousands of lines)

* Bumpversion of normalization image
2021-04-29 14:32:59 +02:00
Christophe Duong
c2fa3e4c9c Introduce normalization integration tests (#3025)
* Speed up normalization unit tests by dropping hubspot catalog (too heavy, will be covering it in integration tests instead)

* Add integration tests for normalization

* Add dedup test case

* adjust build.gradle

* add readme for normalization

* Share PATH env variable with subprocess calls

* Handle git non-versioned tests vs versioned ones

* Format code

* Add tests check to normalization integration tests

* Add docs

* complete docs on normalization integration tests

* format code

* Normalization integration tests output (#3026)

* Version generated/output files from normalization integration tests

* simplify cast of float columns to string when used as partition key (#3027)

* bump version of normalization image

* Apply suggestions from code review

Co-authored-by: Jared Rhizor <jared@dataline.io>

* Apply suggestions from code review

Co-authored-by: Jared Rhizor <jared@dataline.io>
2021-04-27 12:01:04 +02:00
Davin Chia
f660b0a946 Add template generation for Santa aka CDK. (#3034)
Template generation for new Source using the Santa CDK - provide basic scaffolding for someone implementing a new source.

General approach is to buff up comments in the original SDK, and add TODOs with secondary comments in the generated stub methods, as well as links to existing examples (e.g. Stripe or ExchangeRate api) users can look at.

Checked in and added tests for the generated modules.
2021-04-25 18:02:33 +08:00
Charles
f445fdb5b2 match styling for spotlessApply and format (#3017)
* as a java developer I want to be able to run spotlessApply without changing styles in python code
2021-04-23 09:21:41 -07:00
Christophe Duong
07a45df454 Add normalization test cases (#2992)
* Add normalization test cases

* Fix new normalization test on name collisions
2021-04-22 19:39:39 +02:00
Christophe Duong
5859e0cef1 Fix Normalization failing with "adapter" does not exist (#2941)
* Fix normalization dedup on non-string primary key columns

* Bumpversion of normalization image

* Add test cases to standard test
2021-04-19 18:32:35 +02:00
Davin Chia
b9014acfca 🎉 Namespace support. Supported source-destination pairs will now sync data into the same namespace as the source. (#2862)
This PR introduces the following behavior for JDBC sources:
Instead of streamName = schema.tableName, this is now streamName = tableName and namespace = schema. This means that, when replicating from these sources, data will be replicated into a form matching the source, e.g. public.users (postgres source) -> public.users (postgres destination) instead of the current behaviour of public.public_users. Since MySQL does not have schemas, the MySQL source uses the database as its namespace.

To do so:
- Make namespace a field class concept in Airbyte Protocol. This allows the source to propagate namespace and destinations to write to a source-defined namespace. Also sets us up for future namespace related configurability.
- Add an optional namespace field to the AirbyteRecordMessage. This field will be set by sources that support namespace.
- Introduce AirbyteStreamNameNamespacePair as a type-safe manner of identifying streams throughout our code base.
- Modify base_normalisation to better support source defined namespace, specifically allowing normalisation of tables with the same name to different schemas.
2021-04-17 15:33:22 +08:00
Davin Chia
e11ccfd0a1 Revert "Remove schema from stream name. (#2807)" (#2857)
This reverts commit 6e9d6fce59.
2021-04-12 14:56:11 -07:00
Davin Chia
6e9d6fce59 Remove schema from stream name. (#2807)
Last step (besides documentation) of namespace changes. This is a follow-up to #2767.

After this change, the following JDBC sources will change their behaviour to the behaviour described in the above document.

Namely, instead of streamName = schema.tableName, this will become streamName = tableName and namespace = schema. This means that, when replicating from these sources, data will be replicated into a form matching the source, e.g. public.users (postgres source) -> public.users (postgres destination) instead of the current behaviour of public.public_users. Since MySQL does not have schemas, the MySQL source uses the database as its namespace.

I cleaned up some bits of the CatalogHelpers. This affected the destinations, so I'm also running the destination tests.
2021-04-12 21:02:29 +08:00
Christophe Duong
fafc25d86a Add primary key tests to TestDestination (#2776)
* Add primary tests to TestDestination

* Test with composite primary keys
2021-04-08 11:01:02 +02:00
Christophe Duong
0b6a9830da Missing keywords in redshift (#2700)
* Missing keywords in redshift
2021-04-01 17:33:11 +02:00
Christophe Duong
dbbb58d0a8 🎉 normalization bugfix: support integers with precision > 32 bits & support union types (#2410) 2021-03-12 12:19:18 +01:00
Christophe Duong
28b5134d0e Normalization support destination sync modes append_dedup #2372 (#2394)
(This is not enabled for usage until front-end work is ready)
2021-03-12 12:18:24 +01:00
Jared Rhizor
fa505c7800 deterministic table name collision resolution for normalization (#2206)
* deterministic collision handling for table names

* remove debugging print statement

* fmt

* fix flake check

* fix

* fix

* fix usage

* respond to more feedback

* fix everything except truncation

* fix everything but expected values

* add test for just table name middle truncation

* handle inconsistent suffixes

* update tests

* fmt

* refactor (again)

* fix

* update comments

* remove formatting

* use full path

* remove logging

* remove print statements
2021-03-01 11:25:51 -08:00