mirror of synced 2025-12-22 03:21:25 -05:00
50 Commits

Author SHA1 Message Date
Edward Gao
c8e3ec0210 Fix build: Revert "chore: clean out unused "bases" and utils (#53234)" (#53621) 2025-02-10 21:36:30 +00:00
Natik Gadzhi
4dec57a29f chore: clean out unused "bases" and utils (#53234) 2025-02-07 15:19:32 -08:00
Natik Gadzhi
cb80e6922a [tools] prettier rules for .md + formatting cleanup 2024-05-07 08:19:33 -07:00
Edward Gao
9c56062a7d Normalization integration tests: set explicit cursor on cdc streams (#27670) 2023-06-23 14:33:49 -07:00
Edward Gao
fb152a9a0a Normalization: Better handling for CDC transactional updates (#25993)
* try this?

* fix tests

* assert cdc values

* handle case where we have lsn but no updated_at

* readability improvements

* tweaks to test

* version bumps + changelogs

* Automated Change

---------

Co-authored-by: edgao <edgao@users.noreply.github.com>
2023-05-12 12:53:23 +00:00
Cynthia Yin
8400d20352 Destination Redshift: deprecate old migration normalization code (#25771)
* first pass normalization

* add pr link

* remove python test & resources

* linting
2023-05-05 14:18:27 -07:00
Edward Gao
9b7b30f92b Normalization: Use strict > comparison in incremental mode (#22381)
* copy tests from other branch

* switch to >

* [wip] wire up tests

* make tests work

* fixes

* nicer test structure

* maybe add feature flag?

* pattern matching

* also add version check

* formatting

* refactor test also

* extract test + fix method call

* minor tweaks

* add context to log message

* put workspace id in normalization input

* use non-semver tag

* add flag for version of normalization

* also flag old version

* add test

* missed part of the commit

* format

* add test for null workspace ID

* Revert "also flag old version"

This reverts commit 3be601d16c.

* Revert "missed part of the commit"

This reverts commit 47a67b4631.

* always apply flag, even if we're behind a version

* derp

* Add more logging to the normalization activity

* Update charts and kustomize for the feature flag

* fix clickhouse integration test

* remove replace_identifiers

* Revert "remove replace_identifiers"

This reverts commit 0e7ded5a7b.

* fix replace_identifiers

* garbage debug logs

* stop trying to setup duckdb test

* wake up and choose violence

* fix mssql

* exclude duckdb from tests

* make snowflake happy

* uncomment tests

* derp

* derpderp

* format

* format

* also fix redshift???

* maybe now everything works???

* remove debug logs

* use special docker tag

* bump to new tag

* use random test schema in publish also

* properly cleanup

* remove feature flag stuff

* version bump + changelog

* Automated Commit - Formatting Changes

* bump definitions

---------

Co-authored-by: Jimmy Ma <gosusnp@users.noreply.github.com>
Co-authored-by: Jimmy Ma <jimmy@airbyte.io>
Co-authored-by: octavia-squidington-iii <octavia-bot@airbyte.io>
Co-authored-by: edgao <edgao@users.noreply.github.com>
2023-03-23 09:37:15 -07:00
Simon Späti
2bbc4f6f83 🎉 New Destination: DuckDB (#17494)
This is the first version of the DuckDB destination. There are potential edge cases that still need to be taken care of. But looking forward to your feedback.
2023-02-07 11:33:10 +01:00
Edward Gao
517fc6ac10 Normalization: Revert to protocol v0 (#22283)
* Revert "Normalization: handle non-object top-level schemas; treat binary data as string (#22165)"

This reverts commit 8276d03359.

* Revert "Normalization: check for ref type existence (#22161)"

This reverts commit dbe56d6fc2.

* Revert "🎉Updated normalization to handle new datatypes (#19721)"

This reverts commit c1d7736639.

* revert dest definitions

* also dockerfile

* re-add to changelog

* add comment in dockerfile
2023-02-06 10:14:36 -08:00
Jimmy Ma
6660b13ad2 Add Airbyte Protocol V1 support. (#20036)
* Add Airbyte Protocol V1 support.

* Fix VersionedAirbyteStreamFactoryTest

* Remove AirbyteMessageMigrationV0 example

* Add Protocol Version constants

* 🎉Updated normalization to handle new datatypes (#19721)

* Updated normalization simple stream processing to handle new datatypes

* Updated normalization nested stream processing to handle new datatypes

* Updated normalization nested stream processing to handle new datatypes

* Updated normalization drop_scd_catalog processing to handle new datatypes

* Updated normalization ephemeral test processing to handle new datatypes

* fixed more tests for normalization

* fixed more tests for normalization

* fixed more tests for normalization

* fixed more tests for normalization

* fixed more issues

* fixed more issues (clickhouse)

* fixed more issues

* fixed more issues

* fixed more issues

* added binary type processing for some DBs

* cleared commented code and moved some hardcodes to processing as macro

* fixed codestyle and cleared commented code

* minor refactor

* minor refactor

* minor refactor

* fixed bool cast error

* fixed dict->str cast error

* fixed is_combining_node cast py check

* removed commented code

* removed commented code

* committed autogenerated normalization_test_output files

* committed autogenerated normalization_test_output files (new files)

* refactored utils.py

* Updated utils.py to use Callable functions and get rid of property_type in is_number and is_bool functions

* committed autogenerated normalization_test_output files (new files)

* fixed typo in TIMESTAMP_WITH_TIMEZONE_TYPE

* updated stream_processor to handle string type first as a wider type

* fixed arrays normalization by updating is_simple_property method as per new approaches

* format

Co-authored-by: Edward Gao <edward.gao@airbyte.io>

* Update airbyte protocol migration (#20745)

* Extract MigrationContainer from AirbyteMessageMigrator

* Add ConfiguredAirbyteCatalogMigrations

* Add ConfiguredAirbyteCatalog to AirbyteMessageMigrations

* Enable ConfiguredAirbyteCatalog migration

* Fix tests

* Remove extra this.

* Add missing docs

* Typo

Co-authored-by: Edward Gao <edward.gao@airbyte.io>

* Data types update: Implement protocol message migrations (#19240)

* Extract MigrationContainer from AirbyteMessageMigrator

* Add ConfiguredAirbyteCatalogMigrations

* Add ConfiguredAirbyteCatalog to AirbyteMessageMigrations

* Enable ConfiguredAirbyteCatalog migration

* set up scaffolding

* [wip] more scaffolding, basic unit test

* minimal green code

* [wip] add failing test for other primitive types

* correct version number

* handle basic primitive type decls

* add implicit cases

* add recursive schema

* formatting

* comment

* support not

* fix indentation

* handle all nested schema cases

* handle boolean schemas

* verify empty schema handling

* cleanup

* extract map

* code organization

* extract method

* reformat

* [wip] more tests, minor fix type array handling

* corrected test

* cleanup

* reformat

* switch to v1

* add support for multityped fields

* missed test case

* nested test class

* basic record upgrade

* implement record upgrades

* slight refactor

* comments+clarifications

* extract constants

* (partly) correct model classes

* add de/ser

* formatting

* extract constants

* fix json reference

* update docs

* switch to v1 models

* fix compile+test

* add base64 handling

* use vnull

* Data types update: Implement protocol message downgrade path (#19909)

* rough skeleton for passing catalog into migration

* basic test

* more scaffolding

* basic implementation

* add primitives test

* add in other tests (nested fields currently failing)

* add formats

* implement oneOf handling

* formatting

* oneOf handling

* better tests

* comments + organization

* progress

* basic test case

* downgrade objects, ish

* basic array implementation

* handle numeric failure

* test for new type

* handle array items

* empty schema handling

* first pass at oneof handling

* add more tests+handling

* more tests

* comments

* add empty oneof test case

* format + reorganize

* more reorganize

* fix name

* also downgrade binary data

* only import vnull

* move migrations into v1 package

* extract schema mutation code

* comment

* extract schema migration to new class

* extract record downgrade logic for future use

* format

* fix build after rebase

* rename private method for consistency

* also implement configuredcatalog migrations >.>

* quick and dirty tests

* slight cleanup

* fix tests

* pmd

* pmd test

* null check on message objects

* maybe fix acceptance tests?

* fix name

* extract constants

* more fixes

* tmp

* meh

* fix cdc acc tests

* revert to master source-postgres

* remove log messages

* revert other misc hacks

* integers are valid cursors

* remove unrelated change

* fix build

* fix build more?

* [MUST REVERT] use dev normalization

* capture kube logs

* also here?

* no debug logs?

* delete dup from merging

* add final everywhere

* revert test changes

Co-authored-by: Jimmy Ma <jimmy@airbyte.io>

* On-the-fly migrations of persisted catalogs (#21757)

* On the fly catalog migration for normalization activity

* On the fly catalog migration for job persistence

* On the fly migration for standard sync persistence

* On the fly migration for airbyte catalogs

* Refactor code to share JsonSchema traversal

* Add V0 Data type search function

* PMD and Format

* Fix getOrInsertActorCatalog and ConfigRepositoryE2E tests

* Null-proofing CatalogMigrationV1Helper

* More null checks

* Fix test

* Format

* Add data type v1 support to the FE

* Changes AC test check to check exited ps (#21672)

some docker compose changes no longer show exited
processes. this broke our test

this change should fix master

tested in a runner that failed

* Move wellknown types mapping to the utility function

* use protocolv1 normalization

---------

Co-authored-by: Topher Lubaway <asimplechris@gmail.com>
Co-authored-by: Edward Gao <edward.gao@airbyte.io>

* Update protocol support range (#21996)

* bump normalization version to 0.3.0

* Add version check on normalization (#22048)

* Add normalization min version check

* Add visible for testing

---------

Co-authored-by: Edward Gao <edward.gao@airbyte.io>
Co-authored-by: Eugene <etsybaev@gmail.com>
Co-authored-by: Topher Lubaway <asimplechris@gmail.com>
2023-01-30 10:17:49 -08:00
Greg Solovyev
8cf546483d 🐛 Add a drop table hook to drop scd tables in case of overwrite sync (#18015)
* Add a drop table hook to drop scd tables in case of overwrite sync

* Add an integration test for dropping SCD table on overwrite

* skip new test for Oracle and TiDB

* Add normalization run after initial reset

* Bump normalization version
2022-11-01 08:52:02 -07:00
Daemonxiao
d4524032ae 🎉 New Destination: TiDB (#15592)
* Add new destination-tidb

* support sync

* Add normalization-tidb

* fix failed tests

* Add unnest macro

* fmt

* Add new destination-tidb

* support sync

* Add normalization-tidb

* fix failed tests

* Add unnest macro

* fmt

* fmt

* fix integration test

* Update docs/integrations/destinations/tidb.md

Co-authored-by: Xiang Zhang <angwerzx@126.com>

* Update doc

* Update doc

* Update doc

* bump normalization version

* update normalization changelog

* run format

* add dest def

* generate spec

Co-authored-by: Xiang Zhang <angwerzx@126.com>
Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com>
Co-authored-by: marcosmarxm <marcosmarxm@gmail.com>
2022-08-31 16:50:27 -03:00
Greg Solovyev
5819733ab1 Greg/guykoh update dbt clickhouse (#14897)
* Update dbt-clickhouse version to 1.1.7 to support AirByte on ClickHouse cloud

* Fix quote handling in Clickhouse normalization tests

* Update test output for Clickhouse

* Bump version and update changelog

Co-authored-by: guykohen <guy@clickhouse.com>
2022-08-22 21:53:11 -07:00
Edward Gao
b2dd470d3d Handle ints and longs in normalization (#14362)
* generate airbyte_type:integer

* normalization accepts `airbyte_type: integer`

* handles ints+longs

* update avro for consistency

* delete long type for now, treat all ints as longs

* update avro type mappings

{type:number, airbyte_type:integer} -> long
{type:number, airbyte_type:big_integer} -> string (i.e. "unbounded integer")

* fix test

* remove long handling

* Revert "remove long handling"

This reverts commit 33ade8d2831e675c3545ac6019d200ec312e54d9.

* Revert "update avro type mappings"

This reverts commit 5b0349badad7545efe8e1191291a628445fe1c84.

* Revert "delete long type for now, treat all ints as longs"

This reverts commit 018efd4a5d0c59f392fd8e3b0d0967c666b72947.

* Revert "update avro for consistency"

This reverts commit bcf47c6799b5906deb4f219d7f6e64ea73b41b74.

* newline@eof

* update test

* slightly better local tests

* fix test

* missed a few cases

* postgres tests use correct hostnames

* fix normalization

* fix int macro

* add test case

* normalization test output

* handle int/long correctly

* fix types for other DBs

* uint32 -> bigint; tests

* add type value assertions

* more test updates

* regenerate output

* reconcile big_integer to match docs

* update comment

* fix type

* fix mysql constructor call

* bigint only has 38 digits

* fix s3 ints, fix DAT test case

* big_integer should be string

* reduce to 28 digit big_ints

* fix test setup, mysql

* kill big_integer tests

* regenerate output

* version bumps

* auto-bump connector version [ci skip]

* auto-bump connector version [ci skip]

* auto-bump connector version [ci skip]

* auto-bump connector version [ci skip]

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-07-26 16:40:14 -07:00
Anna Lvova
49636982c1 🎉 Base Normalization: handle airbyte_type from stream schema in normalization (#13591)
* add datatypes

* up

* up

* add MySQL

* add MSSQL

* fix

* add macros

* add macros

* upd

* upd

* upd for clickhouse

* Return datetime2 for MS SQL

* Upd time type for mysql

* Upd datetime for MySQL

* update

* upd date type for clickhouse

* up

* auto-generate

* bump version

* bump version
2022-07-26 19:49:05 +03:00
Edward Gao
89e78a6be5 🐛 Destination BigQuery can handle nulls inside arrays (#14522) 2022-07-13 18:21:17 -07:00
Serhii Chvaliuk
49d181a198 Normalization: Fix incorrect jinja2 macro json_extract_array call (#13894)
Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
2022-06-19 13:13:49 +03:00
Edward Gao
897522cf51 Add some dev-facing normalization docs (#13780) 2022-06-15 08:21:14 -07:00
Edward Gao
61ce03a436 🐛 Normalization correctly propagates deletions to the final tables (#12846) 2022-06-14 14:56:18 -07:00
Serhii Chvaliuk
0342699daf Normalization: rename *.sql -> *.sql.j2 (#13474)
Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
2022-06-06 18:58:34 +03:00
Davin Chia
e93bb85dc7 Fix build. (#12242) 2022-04-21 20:46:31 +08:00
Yurii Bidiuk
9d9507b227 revert formatting for test_pokemon_super.sql (#12234) 2022-04-21 11:28:23 +03:00
Yurii Bidiuk
785bcc4a9a 🐛 Destination Redshift: fix switching mode (#12085)
* fix switching mode for redshift

* bump version

* format code

* update spec
2022-04-20 16:57:15 +03:00
Serhii Chvaliuk
7023fbd48e Redshift SUPER type (#12064)
* 🎉 Destination Redshift: Use SUPER data type on Redshift destination for raw JSON data (#9407)

Co-authored-by: Oleksandr Tsukanov <alexander.tsukanovvv@gmail.com>
Co-authored-by: Sergey Chvalyuk <grubberr@gmail.com>
Co-authored-by: Christophe Duong <christophe.duong@gmail.com>
Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
2022-04-20 15:11:22 +03:00
Marcos Marx
511819b5ae Normalization fix Prefix Tables starting with number (#9301)
* add normalization-clickhouse docker build step

* bump normalization version

* small changes gradle

* fix settings gradle

* fix eof file

* correct clickhouse normalization

* Refactor jinja template for scd (#9278)

* merge chris code and regenerate sql files

* correct scd post-hook generation for snowflake

* fix scd table for snowflake prefix table with number

* scd fix for all destinations

* use quote

* use normalize column for post-hook

* change logic to apply quote

* add logic to handle prefix for mssql and oracle

* run tests

* correct unit test

* bump normalization version

Co-authored-by: James Zhao <james.zhao@sinoreps.com>
Co-authored-by: Edward Gao <edward.gao@airbyte.io>
Co-authored-by: Christophe Duong <christophe.duong@gmail.com>
2022-01-06 23:39:41 -03:00
Christophe Duong
c5d4a97363 🐛 Fix normalization issue with quoted & case sensitive columns (#9317) 2022-01-06 18:59:09 +01:00
Christophe Duong
e0bac4aaeb 🐛 Fix normalization SCD partition by float columns errors with BigQuery (#9281) 2022-01-06 18:49:31 +01:00
Bo Lu
bbcd461bc5 🎉 New Destination: ClickHouse (#7620)
* add ClickHouse destination

* update docs

* format code

* code improvement as per code review

* add ssh tunneling and ssl/tls support and code enhancement

* merge from master

* disable testCustomDbtTransformationsFailure test

* fix string format bug

* fix reserved keywords bug and disable dbt

* disable dbt in expect result

* add type hints

* bump connector version

Co-authored-by: Alexander Tsukanov <alexander.tsukanovvv@gmail.com>
Co-authored-by: Marcos Marx <marcosmarxm@gmail.com>
2021-12-13 19:39:19 -03:00
Christophe Duong
b424c1a0e7 🐛 Fix incremental normalization with empty tables (#8394)
* Fix incremental with empty final tables

* upgrade docker images

* Regen SQL

* Bumpversion & format
2021-12-01 23:40:14 +01:00
Christophe Duong
affea7f60b 🐛 Minor fixes to incremental normalization and nesting (#7669) 2021-11-08 17:42:57 +01:00
Christophe Duong
5fc50df39d 🎉 Incremental Normalization (#7162) 2021-10-29 13:53:02 +02:00
Christophe Duong
c4620559d7 🎉 Refactor Normalization docker images and upgrade to use dbt 0.21.0 (#6959)
* Split normalization docker images for some connectors with specific dependencies

* Regenerate (#7003)
2021-10-14 20:29:16 +02:00
Anna Lvova
ec68f478ff 🐛 fix: Normalization date-time should handle empty strings "" (#6379)
* add empty string normalization for postgres

* add empty string normalization for destinations

* fix

* fix

* fix

* fix for snowflake

* fix for mysql

* fix normalization for mysql

* upd doc

* upd doc

* Update airbyte-integrations/bases/base-normalization/integration_tests/dbt_integration_test.py

Co-authored-by: Christophe Duong <christophe.duong@gmail.com>

* Update airbyte-integrations/bases/base-normalization/integration_tests/dbt_integration_test.py

Co-authored-by: Christophe Duong <christophe.duong@gmail.com>

* bump version

* bump version

* add datetime normalization for mssql

* upd row count for mssql

* upd

* bump version

* upd docs for 0.1.50 normalization version

Co-authored-by: Christophe Duong <christophe.duong@gmail.com>
2021-10-08 13:57:37 +03:00
Harshith Mullapudi
29ea7f19eb Add integration tests for Normalization - added ad_cdc_log_pos (#6799)
* integration tests for bigquery

* added for postgres

* added tests for all the destinations

* Bump version
2021-10-08 14:31:28 +05:30
Baz
e5abaeccef 🎉 Base-normalization: Implement normalization for MSSQL-destination (#6079)
See the attached PR (https://github.com/airbytehq/airbyte/pull/6079)
2021-10-07 18:46:27 +03:00
Yaroslav Dudar
a6ecfda2ca 🐛 Fix Snowflake destination normalization to accept any date-time format. (#6052)
snowflake date-time format parser
2021-09-23 11:10:12 +03:00
Marcos Marx
589d535a61 🎉 Oracle normalization (#5562)
* oracle normalization

* correct dbt_project function for oracle

* unit tests

* run format

* correct ephemeral tests

* add gradle dependency for oracle destination

* run int tests

* add oracle in settings.gradle for normalization run

* use default airbyte columns

* format

* test all destination ephemeral

* correct unit test

* correct unit test

* destination docs update

* correct mypy

* integration test all dest

* refactor oracle function

* merge master

* run all destinations

* flake8 escape regex

* surrogate key function

* correct few minor comments

* refactor scd sql function

* refactor scd function

* revert test

* refactor minor details

* revert tests

* revert ephemeral test

* revert unit test table_registry

* revert airbyte_protocol format

* format

* bump normalization version in worker

* minor changes

* minor changes

* correct json_column for other destinations

* gradlew format

* revert tests

* remove comments

* add Oracle destination explicit in safe_cast_str

* add quote_in_parenthesis inside if clause

* gradlew format
2021-09-07 16:39:17 -03:00
Marcos Marx
7225187fa1 run gradlew format (#5552) 2021-08-20 15:38:28 -03:00
Marcos Marx
a9b2c08934 Add condition for unnest_column_name for pg/redshift/mysql (#5467)
* add unnest_column case conflict

* add redshift files

* format

* change logic

* change logic for unnest

* bump normalization version

* add files

* add stream test unnest_alias
2021-08-20 11:09:15 -03:00
Christophe Duong
f9705bf731 BigQuery normalization: make credentials json optional (#5433)
* Allow service-account-json or oauth methods for bigquery destinations
2021-08-17 11:50:17 +02:00
Marcos Marx
e4fe62f739 Normalization: solve conflict when stream and field have same name (#4557)
* solve conflict when stream and field have same name

* add logic to handle conflict

* change files

* change json_extract functions

* json_operations

* add normalization files

* test integration mysql

* remove table_alias

* mysql run

* json ops

* solve conflict with master

* solve mysql circle dependency dbt

* add tests for scalar and arrays

* add sql files

* bump normalization version

* format
2021-08-11 20:18:45 -03:00
Subodh Kant Chaturvedi
923884b897 introduce implementation for date-time support in normalization (#5180)
* introduce implementation for date-time support in normalization

* update test output for all destinations

* add comment
2021-08-11 02:28:03 +05:30
Christophe Duong
d6429a410a Normalization handles quote in column names (#5027)
* Handle quotes in columns names
2021-07-28 16:00:13 +02:00
Christophe Duong
5cdc7f8517 🐛 (contribution) Fix SQL model to build a Type 2 SCD to handle NULL cursor_field values correctly (#4881)
* Update SQL model to build a Type 2 Slowly Changing Dimension (#4802)

* Make SQL more portable

* Bumpversion of normalization

Co-authored-by: Daniel Diamond <33811744+danieldiamond@users.noreply.github.com>
2021-07-22 16:27:54 +02:00
LiRen Tu
2caf3904f0 🎉 MySQL destination: normalization (#4163)
* Add mysql dbt package

* Add mysql normalization support in java

* Add mysql normalization support in python

* Fix unit tests

* Update readme

* Setup mysql container in integration test

* Add macros

* Depend on dbt-mysql from git repo

* Remove mysql limitation test

* Test normalization

* Revert protocol format change

* Fix mysql json macros

* Fix two more macros

* Fix table name length

* Fix array macro

* Fix equality test macro

* Update replace-identifiers

* Add more identifiers to replace

* Fix unnest macro

* Fix equality macro

* Check in mysql test output

* Update column limit test for mysql

* Escape parentheses

* Remove unnecessary mysql test

* Remove mysql output for easier code review

* Remove unnecessary mysql test

* Remove parentheses

* Update dependencies

* Skip mysql instead of manually write out types

* Bump version

* Check in unit test for mysql name transformer

* Fix type conversion

* Use json_value to extract scalar json fields

* Move dbt-mysql to Dockerfile (#4459)

* Format code

* Check in mysql dbt output

* Remove unnecessary quote

* Update mysql equality test to match 0.19.0

* Check in schema_test update

* Update readme

* Bump base normalization version

* Update document

Co-authored-by: Christophe Duong <christophe.duong@gmail.com>
2021-07-03 20:30:59 -07:00
Marcos Marx
265e7f79d8 Normalization: remove dedup cdc excluded (#4297)
* change stream processor

* integration tests

* add integration tests

* format gradle file

* add excluded files

* change catalog and msgs

* add cdc messages

* solve cdc excluded problem with tests

* remove .egg files

* remove time import

* tab stream_processor

* uncommented local test

* add tests for dbt!

* add excluded files

* add missing snowflake file

* add pg, bq and snowflake

* chris comments

* test comment

* pytest parametrize tests

* bump normalization version

* formatting

* run test for all destinations
2021-06-30 14:59:13 -03:00
Christophe Duong
bb4dcb1987 🎉 Remove hash when it is not necessary from normalization outputs (#3704)
* Refactor `generate_new_table_name` using a table name registry class instead

* update normalization docs

* Enable MyPy

* Regenerate output files

* Closes https://github.com/airbytehq/airbyte/issues/2389

* Bumpversion normalization
2021-06-01 17:07:22 +02:00
Christophe Duong
0265012e42 Handle special characters in columns names (#3133)
* Handle special characters in columns names (add test case to integration tests)

* Add test case with column name collisions

* Bumpversion of normalization image
2021-04-30 11:59:55 +02:00
Christophe Duong
86513d6c54 Fix normalization Nesting bug (#3110)
* New test case for nested streams

* Fix filename naming (collisions and nesting)

* Update generated files from tests with new file naming

* Allow invalid json data in raw tables when normalizing on redshift

* Regenerate final sql files

* Disable unit tests on stream naming (temporarily)

* Fix unnesting bug in postgres

* Reactivate unit tests and change table registry

* Move normalization unit tests to integration tests (too slow)

* Remove heavy catalog.json used in unit_tests (actual catalog from facebook/stripe with thousands of lines)

* Bumpversion of normalization image
2021-04-29 14:32:59 +02:00
Christophe Duong
c2fa3e4c9c Introduce normalization integration tests (#3025)
* Speed normalization unit tests by dropping hubspot catalog (too heavy, will be covering it in integration tests instead)

* Add integration tests for normalization

* Add dedup test case

* adjust build.gradle

* add readme for normalization

* Share PATH env variable with subprocess calls

* Handle git non-versioned tests vs versioned ones

* Format code

* Add tests check to normalization integration tests

* Add docs

* complete docs on normalization integration tests

* format code

* Normalization integration tests output (#3026)

* Version generated/output files from normalization integration tests

* simplify cast of float columns to string when used as partition key (#3027)

* bump version of normalization image

* Apply suggestions from code review

Co-authored-by: Jared Rhizor <jared@dataline.io>

* Apply suggestions from code review

Co-authored-by: Jared Rhizor <jared@dataline.io>
2021-04-27 12:01:04 +02:00