* try to publish new normalization version
* default to using ssl in postgres destination
* tidy up
* Run normalization tests using postgres DB with SSL support
* bump version
Co-authored-by: Christophe Duong <christophe.duong@gmail.com>
* updated mysql tests
* updated mysql tests
* added mysql ssh tunnel tests by key
* fixed remarks
* fixed remarks
* updated DatabricksStreamCopier
* switch to custom file for ssh config in normalization
* updated MySQL SSH tests
* bump version
* get local port properly
* updated assertSameValue for MySQL ssh tunnel
* updated image version and documentation
* updated code style
* updated CI credentials
* updated normalization documentation
Co-authored-by: George Claireaux <george@claireaux.co.uk>
* oracle normalization
* correct dbt_project function for oracle
* unit tests
* run format
* correct ephemeral tests
* add gradle dependency for oracle destination
* run int tests
* add oracle in settings.gradle for normalization run
* use default airbyte columns
* format
* test ephemeral on all destinations
* correct unit test
* correct unit test
* destination docs update
* correct mypy
* integration test all dest
* refactor oracle function
* merge master
* run all destinations
* flake8 escape regex
* surrogate key function
* correct a few minor comments
* refactor scd sql function
* refactor scd function
* revert test
* refactor minor details
* revert tests
* revert ephemeral test
* revert unit test table_registry
* revert airbyte_protocol format
* format
* bump normalization version in worker
* minor changes
* minor changes
* correct json_column for other destinations
* gradlew format
* revert tests
* remove comments
* add Oracle destination explicitly in safe_cast_str
* add quote_in_parenthesis inside if clause
* gradlew format
* solve conflict when stream and field have same name
* add logic to handle conflict
* change files
* change json_extract functions
* json_operations
* add normalization files
* test integration mysql
* remove table_alias
* mysql run
* json ops
* solve conflict with master
* solve mysql circular dependency with dbt
* add tests for scalar and arrays
* add sql files
* bump normalization version
* format
* Add mysql dbt package
* Add mysql normalization support in java
* Add mysql normalization support in python
* Fix unit tests
* Update readme
* Setup mysql container in integration test
* Add macros
* Depend on dbt-mysql from git repo
* Remove mysql limitation test
* Test normalization
* Revert protocol format change
* Fix mysql json macros
* Fix two more macros
* Fix table name length
* Fix array macro
* Fix equality test macro
* Update replace-identifiers
* Add more identifiers to replace
* Fix unnest macro
* Fix equality macro
* Check in mysql test output
* Update column limit test for mysql
* Escape parentheses
* Remove unnecessary mysql test
* Remove mysql output for easier code review
* Remove unnecessary mysql test
* Remove parentheses
* Update dependencies
* Skip mysql instead of manually writing out types
* Bump version
* Check in unit test for mysql name transformer
* Fix type conversion
* Use json_value to extract scalar json fields
* Move dbt-mysql to Dockerfile (#4459)
* Format code
* Check in mysql dbt output
* Remove unnecessary quote
* Update mysql equality test to match 0.19.0
* Check in schema_test update
* Update readme
* Bump base normalization version
* Update document
Co-authored-by: Christophe Duong <christophe.duong@gmail.com>
* Create new test_ephemeral and refactor with test_normalization
* Add notes in docs
* Refactor common normalization tests into DbtIntegrationTest
* Bump version of normalization image
* New test case for nested streams
* Fix filename naming (collisions and nesting)
* Update generated files from tests with new file naming
* Allow invalid json data in raw tables when normalizing on redshift
* Regenerate final sql files
* Disable unit tests on stream naming (temporarily)
* Fix unnesting bug in postgres
* Reactivate unit tests and change table registry
* Move normalization unit tests to integration tests (too slow)
* Remove heavy catalog.json used in unit_tests (actual catalog from facebook/stripe with thousands of lines)
* Bump version of normalization image
* Speed up normalization unit tests by dropping the hubspot catalog (too heavy; it will be covered in integration tests instead)
* Add integration tests for normalization
* Add dedup test case
* adjust build.gradle
* add readme for normalization
* Share PATH env variable with subprocess calls
* Handle git non-versioned tests vs versioned ones
* Format code
* Add tests check to normalization integration tests
* Add docs
* complete docs on normalization integration tests
* format code
* Normalization integration tests output (#3026)
* Version generated/output files from normalization integration tests
* simplify cast of float columns to string when used as partition key (#3027)
* bump version of normalization image
* Apply suggestions from code review
Co-authored-by: Jared Rhizor <jared@dataline.io>
* Apply suggestions from code review
Co-authored-by: Jared Rhizor <jared@dataline.io>
Template generation for a new Source using the Santa CDK - provides basic scaffolding for someone implementing a new source.
The general approach is to flesh out the comments in the original CDK, add TODOs with explanatory comments to the generated stub methods, and link to existing examples (e.g. the Stripe or ExchangeRate APIs) that users can consult.
Checked in and added tests for the generated modules.
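To give a feel for the scaffolding, here is a minimal sketch of the kind of stub the generator produces, assuming the Airbyte Python CDK's `HttpStream` base class; the class name, base URL, and endpoint below are illustrative placeholders rather than the actual generated code.

```python
from typing import Any, Iterable, Mapping, Optional

import requests
from airbyte_cdk.sources.streams.http import HttpStream


class Customers(HttpStream):
    # TODO: point this at the API you are implementing (hypothetical URL)
    url_base = "https://example-api.com/v1/"
    primary_key = "id"

    def path(self, **kwargs) -> str:
        # TODO: return the endpoint for this stream
        return "customers"

    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
        # TODO: return a token for the next page, or None when pagination is done
        return None

    def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping[str, Any]]:
        # TODO: map the raw API response into Airbyte records
        yield from response.json()
```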
This PR introduces the following behavior for JDBC sources:
Instead of streamName = schema.tableName, this is now streamName = tableName and namespace = schema. This means that, when replicating from these sources, data will be replicated into a form matching the source, e.g. public.users (postgres source) -> public.users (postgres destination) instead of the current behaviour of public.public_users. Since MySQL does not have schemas, the MySQL source uses the database as its namespace.
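For illustration, a minimal sketch of the naming change; the table and schema names are hypothetical and the dicts only show the relevant stream fields.

```python
# Hypothetical source table.
table = {"schema": "public", "table": "users"}

# Before: the schema was folded into the stream name, so the destination
# ended up writing to public.public_users.
before = {"name": f"{table['schema']}.{table['table']}"}

# After: the stream name is just the table and the schema travels as namespace,
# so the destination can mirror the source and write to public.users.
after = {"name": table["table"], "namespace": table["schema"]}
```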
To do so:
- Make namespace a first-class concept in the Airbyte Protocol. This allows sources to propagate a namespace and destinations to write to a source-defined namespace. It also sets us up for future namespace-related configurability.
- Add an optional namespace field to the AirbyteRecordMessage. This field will be set by sources that support namespaces.
- Introduce AirbyteStreamNameNamespacePair as a type-safe way of identifying streams throughout our code base.
- Modify base_normalisation to better support source-defined namespaces, specifically allowing tables with the same name to be normalised into different schemas.
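As a rough sketch of the record-level change, assuming the protocol's JSON shape (the payload values are hypothetical):

```python
# Field names follow the AirbyteRecordMessage; the values are made up.
record = {
    "type": "RECORD",
    "record": {
        "stream": "users",        # the stream name no longer embeds the schema
        "namespace": "public",    # optional field, set by namespace-aware sources
        "data": {"id": 1, "email": "someone@example.com"},
        "emitted_at": 1622000000000,
    },
}

# Streams are then keyed by (name, namespace) rather than by name alone,
# mirroring what AirbyteStreamNameNamespacePair provides on the Java side.
stream_key = (record["record"]["stream"], record["record"]["namespace"])
```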
This is the last step (besides documentation) of the namespace changes, and a follow-up to #2767.
After this change, the following JDBC sources will adopt the behaviour described in the document above.
Namely, instead of streamName = schema.tableName, this becomes streamName = tableName and namespace = schema. This means that, when replicating from these sources, data will be replicated into a form matching the source, e.g. public.users (postgres source) -> public.users (postgres destination) instead of the current behaviour of public.public_users. Since MySQL does not have schemas, the MySQL source uses the database as its namespace.
I cleaned up some bits of the CatalogHelpers. This affected the destinations, so I'm also running the destination tests.
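To make the normalization side concrete, here is an illustrative sketch of the intended schema resolution; the helper function is hypothetical, not the actual base_normalisation code.

```python
from typing import Optional

# Illustrative only: shows how a source-defined namespace lets two streams with
# the same name land in different destination schemas instead of colliding.
def destination_schema(default_schema: str, namespace: Optional[str]) -> str:
    # Use the source-defined namespace when present, otherwise fall back to the
    # destination's configured default schema.
    return namespace if namespace else default_schema

streams = [
    {"name": "users", "namespace": "public"},
    {"name": "users", "namespace": "sales"},
    {"name": "events", "namespace": None},
]

for s in streams:
    schema = destination_schema("airbyte_default", s["namespace"])
    print(f"{s['name']} -> {schema}.{s['name']}")
# users  -> public.users
# users  -> sales.users
# events -> airbyte_default.events
```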