* copy files from edgao branch
* start writing create table statement
* add basic unit test setup
* create a table, probably
* remove outdated todo
* derp, one more column
* ugh
* add partitioning+clustering
* use StringSubstitutor
* substitutions in updateTable
* wip generate update/insert statement
* split up into smaller methods
* handle json types correctly
* rename stuff
* more json_query vs _value stuff
* minor tweak
* super basic test setup
* laying foundation for type parsing
* more stuff
* tweaks
* more progress on type parsing
* fix json_value stuff?
* misc fixes in insert
* fix dedupFinalTable
* add testDedupRaw
* full e2e test
* type parsing: gave up and mirrored the dbt code structure to avoid bugs
* type parsing - more cleanup
* handle column name collisions
* handle tablename collisions...?
* comments
* remove original ns/name from quotedstream
* also javadoc
* remove redundant method
* fix table rename
* add incremental append test
* add full refresh append test
* comment
* call T+D sql in a reasonable location for standard inserts
* add config option
* use config option here
* type parsing - fix fromJsonSchema
* gate everything
* log query + runtime
* add spec option temporarily
* Raw Table Updates
* fix more stuff
* first big pass at toDialectType
* no quotes
* wrap everything in quotes
* resolve some TODOs
* log sql statement in tests
* overwriteFinalTable returns optional
* minor clean up
* add raw dataset override
* try to preserve the original namespace for t+d?
* write to the raw table correctly
* update todos
* write directly to raw table
this is kind of dumb because we're still trying to do tmp table operations,
and we still don't ack state until the end of the entire sync.
* standard inserts write to raw table correctly
* imports + log statements
* move logs + add comment
* explicitly create raw table
* move comment to better place
* Typing issues
* bash attempt
* formatting updates
* formatting updates
* write to the airbyte schema by default unless overridden by config options
* standard inserts truncate raw table at start of sync
* full refresh overwrite will overwrite correctly!
* fix avro record schema parsing
* better raw table recreate
* rename raw table to match standard inserts
* full refresh overwrite does tmp table things
* small clean up
* small clean up
* remove errors entry if no errors
* pull out destination config into singleton
* clean up singleton stuff
* make sure dest config exists when trying to do lookups
* avoid stringifying null
* quick thoughts on alter table
* add basic cdc testcase
* tweak cdc test setup
* rename raw table to match standard inserts
* minor tweak
* delete exact sql string assertions
* switch to JSON type
* minor cleanup
* sql whitespace changes
* explain cdc deletions
* GCS Staging Full Refresh create temp table
* assert schema
* first out of order cdc test
* add another cdc test case (currently failing)
* better test structure
* make this work
* oops, fix test
* stop trying to delete deletion records
* minor improvements to code+test
* enable concurrent test runs on integration test
* move stuff to static initializer
* extract utility method
* formatting
* Move conditional to the base java package, replace conditionals which did not use the typing and deduping flag but should have.
* 🤖 Auto format destination-bigquery code [skip ci]
* 🤖 Auto format destination-gcs code [skip ci]
* switch back to empty list; write big assert
* minor wording tweaks
* 🤖 Auto format destination-bigquery code [skip ci]
* 🤖 Auto format destination-gcs code [skip ci]
* DestinationConfigTest
* 🤖 Auto format destination-bigquery code [skip ci]
* 🤖 Auto format destination-gcs code [skip ci]
* formatting
* remove ParsedType
* 🤖 Auto format destination-gcs code [skip ci]
* 🤖 Auto format destination-bigquery code [skip ci]
* tests verify every data type
* 🤖 Auto format destination-bigquery code [skip ci]
* 🤖 Auto format destination-gcs code [skip ci]
* full update with all data types
* 🤖 Auto format destination-bigquery code [skip ci]
* 🤖 Auto format destination-gcs code [skip ci]
* move stuff to new base lib
* 🤖 Auto format destination-gcs code [skip ci]
* Automated Commit - Formatting Changes
* 🤖 Auto format destination-bigquery code [skip ci]
* fix test
* 🤖 Auto format destination-bigquery code [skip ci]
* 🤖 Auto format destination-bigquery code [skip ci]
* 🤖 Auto format destination-gcs code [skip ci]
* asserts in dedupFinalTable
* better asserts in dedupRawTable
* [wip] test case for all data types
* 🤖 Auto format destination-gcs code [skip ci]
* 🤖 Auto format destination-bigquery code [skip ci]
* AirbyteTypeTest
* Automated Commit - Formatting Changes
* remove comments
* test chooseOneOf
* slightly better test output
* Automated Commit - Formatting Changes
* add some awful pretty print code
* more comment
* minor tweaks
* verify array/object type
* fix test
* handle deletions more correctly
* test toDialectType
* Destinations v2: better namespace handling (#27682)
* [wip] better namespace handling
* 🤖 Auto format destination-bigquery code [skip ci]
* wip also implement in gcs
* get gcs working (?)
* 🤖 Auto format destination-bigquery code [skip ci]
* remove duplicate method
* 🤖 Auto format destination-bigquery code [skip ci]
* fixed my code style settings
* make ci happy?
* 🤖 Auto format destination-bigquery code [skip ci]
* make ci happy?
* remove incorrect test
* blank line change
* initialize singleton
---------
Co-authored-by: octavia-squidington-iii <octavia-squidington-iii@users.noreply.github.com>
* reset args correctly
* Automated Commit - Formatting Changes
* more bash stuff
* parse implicit structs
* initialize singleton in more tests
* Automated Commit - Formatting Changes
* I missed this namespace handling thing
* test more schemas
* fix singular types specified in arrays
* Automated Commit - Formatting Changes
* disable test for unimplemented feature
* initialize singleton
* remove spec options; changelogs+metadata
* randomize namespace
* also bump dockerfile
* unremove namespace sanitizing in legacy mode
* ... disable the correct test
* even more unit test fixes!
* move integration test to integration tests
---------
Co-authored-by: Cynthia Yin <cynthia@airbyte.io>
Co-authored-by: Joe Bell <joseph.bell@airbyte.io>
Co-authored-by: octavia-squidington-iii <octavia-squidington-iii@users.noreply.github.com>
Co-authored-by: edgao <edgao@users.noreply.github.com>
Co-authored-by: cynthiaxyin <cynthiaxyin@users.noreply.github.com>
* try this?
* fix tests
* assert cdc values
* handle case where we have lsn but no updated_at
* readability improvements
* tweaks to test
* version bumps + changelogs
* Automated Change
---------
Co-authored-by: edgao <edgao@users.noreply.github.com>
* publish normalization
* bump normalization container version in all the destinations that use it
Co-authored-by: Edward Gao <edward.gao@airbyte.io>
Co-authored-by: edgao <edgao@users.noreply.github.com>
* copy tests from other branch
* switch to >
* [wip] wire up tests
* make tests work
* fixes
* nicer test structure
* maybe add feature flag?
* pattern matching
* also add version check
* formatting
* refactor test also
* extract test + fix method call
* minor tweaks
* add context to log message
* put workspace id in normalization input
* use non-semver tag
* add flag for version of normalization
* also flag old version
* add test
* missed part of the commit
* format
* add test for null workspace ID
* Revert "also flag old version"
This reverts commit 3be601d16c.
* Revert "missed part of the commit"
This reverts commit 47a67b4631.
* always apply flag, even if we're behind a version
* derp
* Add more logging to the normalization activity
* Update charts and kustomize for the feature flag
* fix clickhouse integration test
* remove replace_identifiers
* Revert "remove replace_identifiers"
This reverts commit 0e7ded5a7b.
* fix replace_identifiers
* garbage debug logs
* stop trying to setup duckdb test
* wake up and choose violence
* fix mssql
* exclude duckdb from tests
* make snowflake happy
* uncomment tests
* derp
* derpderp
* format
* format
* also fix redshift???
* maybe now everything works???
* remove debug logs
* use special docker tag
* bump to new tag
* use random test schema in publish also
* properly cleanup
* remove feature flag stuff
* version bump + changelog
* Automated Commit - Formatting Changes
* bump definitions
---------
Co-authored-by: Jimmy Ma <gosusnp@users.noreply.github.com>
Co-authored-by: Jimmy Ma <jimmy@airbyte.io>
Co-authored-by: octavia-squidington-iii <octavia-bot@airbyte.io>
Co-authored-by: edgao <edgao@users.noreply.github.com>
* bump dbt-clickhouse to 1.4.0
* fix clickhouse integration test
* exclude duckdb from tests
* add to changelog
* bump normalization version in definitions
---------
Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com>
Co-authored-by: Edward Gao <edward.gao@airbyte.io>
* Revert "Normalization: handle non-object top-level schemas; treat binary data as string (#22165)"
This reverts commit 8276d03359.
* Revert "Normalization: check for ref type existence (#22161)"
This reverts commit dbe56d6fc2.
* Revert "🎉Updated normalization to handle new datatypes (#19721)"
This reverts commit c1d7736639.
* revert dest definitions
* also dockerfile
* re-add to changelog
* add comment in dockerfile
* Add Airbyte Protocol V1 support.
* Fix VersionedAirbyteStreamFactoryTest
* Remove AirbyteMessageMigrationV0 example
* Add Protocol Version constants
* 🎉Updated normalization to handle new datatypes (#19721)
* Updated normalization simple stream processing to handle new datatypes
* Updated normalization nested stream processing to handle new datatypes
* Updated normalization nested stream processing to handle new datatypes
* Updated normalization drop_scd_catalog processing to handle new datatypes
* Updated normalization ephemeral test processing to handle new datatypes
* fixed more tests for normalization
* fixed more tests for normalization
* fixed more tests for normalization
* fixed more tests for normalization
* fixed more issues
* fixed more issues (clickhouse)
* fixed more issues
* fixed more issues
* fixed more issues
* added binary type processing for some DBs
* cleared commented code and moved some hardcodes to processing as macro
* fixed codestyle and cleared commented code
* minor refactor
* minor refactor
* minor refactor
* fixed bool cast error
* fixed dict->str cast error
* fixed is_combining_node cast py check
* removed commented code
* removed commented code
* committed autogenerated normalization_test_output files
* committed autogenerated normalization_test_output files (new files)
* refactored utils.py
* Updated utils.py to use Callable functions and get rid of property_type in is_number and is_bool functions
* committed autogenerated normalization_test_output files (new files)
* fixed typo in TIMESTAMP_WITH_TIMEZONE_TYPE
* updated stream_processor to handle string type first as a wider type
* fixed arrays normalization by updating is_simple_property method as per new approaches
* format
Co-authored-by: Edward Gao <edward.gao@airbyte.io>
* Update airbyte protocol migration (#20745)
* Extract MigrationContainer from AirbyteMessageMigrator
* Add ConfiguredAirbyteCatalogMigrations
* Add ConfiguredAirbyteCatalog to AirbyteMessageMigrations
* Enable ConfiguredAirbyteCatalog migration
* Fix tests
* Remove extra this.
* Add missing docs
* Typo
Co-authored-by: Edward Gao <edward.gao@airbyte.io>
* Data types update: Implement protocol message migrations (#19240)
* Extract MigrationContainer from AirbyteMessageMigrator
* Add ConfiguredAirbyteCatalogMigrations
* Add ConfiguredAirbyteCatalog to AirbyteMessageMigrations
* Enable ConfiguredAirbyteCatalog migration
* set up scaffolding
* [wip] more scaffolding, basic unit test
* minimal green code
* [wip] add failing test for other primitive types
* correct version number
* handle basic primitive type decls
* add implicit cases
* add recursive schema
* formatting
* comment
* support not
* fix indentation
* handle all nested schema cases
* handle boolean schemas
* verify empty schema handling
* cleanup
* extract map
* code organization
* extract method
* reformat
* [wip] more tests, minor fix type array handling
* corrected test
* cleanup
* reformat
* switch to v1
* add support for multityped fields
* missed test case
* nested test class
* basic record upgrade
* implement record upgrades
* slight refactor
* comments+clarifications
* extract constants
* (partly) correct model classes
* add de/ser
* formatting
* extract constants
* fix json reference
* update docs
* switch to v1 models
* fix compile+test
* add base64 handling
* use vnull
* Data types update: Implement protocol message downgrade path (#19909)
* rough skeleton for passing catalog into migration
* basic test
* more scaffolding
* basic implementation
* add primitives test
* add in other tests (nested fields currently failing)
* add formats
* implement oneOf handling
* formatting
* oneOf handling
* better tests
* comments + organization
* progress
* basic test case
* downgrade objects, ish
* basic array implementation
* handle numeric failure
* test for new type
* handle array items
* empty schema handling
* first pass at oneof handling
* add more tests+handling
* more tests
* comments
* add empty oneof test case
* format + reorganize
* more reorganize
* fix name
* also downgrade binary data
* only import vnull
* move migrations into v1 package
* extract schema mutation code
* comment
* extract schema migration to new class
* extract record downgrade logic for future use
* format
* fix build after rebase
* rename private method for consistency
* also implement configuredcatalog migrations >.>
* quick and dirty tests
* slight cleanup
* fix tests
* pmd
* pmd test
* null check on message objects
* maybe fix acceptance tests?
* fix name
* extract constants
* more fixes
* tmp
* meh
* fix cdc acc tests
* revert to master source-postgres
* remove log messages
* revert other misc hacks
* integers are valid cursors
* remove unrelated change
* fix build
* fix build more?
* [MUST REVERT] use dev normalization
* capture kube logs
* also here?
* no debug logs?
* delete dup from merging
* add final everywhere
* revert test changes
Co-authored-by: Jimmy Ma <jimmy@airbyte.io>
* On-the-fly migrations of persisted catalogs (#21757)
* On the fly catalog migration for normalization activity
* On the fly catalog migration for job persistence
* On the fly migration for standard sync persistence
* On the fly migration for airbyte catalogs
* Refactor code to share JsonSchema traversal
* Add V0 Data type search function
* PMD and Format
* Fix getOrInsertActorCatalog and ConfigRepositoryE2E tests
* Null-proofing CatalogMigrationV1Helper
* More null checks
* Fix test
* Format
* Add data type v1 support to the FE
* Changes AC test check to check exited ps (#21672)
some docker compose changes no longer show exited
processes. this broke our test
this change should fix master
tested in a runner that failed
* Move wellknown types mapping to the utility function
* use protocolv1 normalization
---------
Co-authored-by: Topher Lubaway <asimplechris@gmail.com>
Co-authored-by: Edward Gao <edward.gao@airbyte.io>
* Update protocol support range (#21996)
* bump normalization version to 0.3.0
* Add version check on normalization (#22048)
* Add normalization min version check
* Add visible for testing
---------
Co-authored-by: Edward Gao <edward.gao@airbyte.io>
Co-authored-by: Eugene <etsybaev@gmail.com>
Co-authored-by: Topher Lubaway <asimplechris@gmail.com>
* add WellKnownTypes.yaml
* rename to snakecase + put in airbyte-protocol
* add examples
* more descriptions
* descriptions, more restrictions, better regex
* update documentation
* explicitly call out BC support
* Update cdc.md
Added a link to the article that explains Airbyte replication modes
* Update cdc.md
Added a link to the CDC "exploration" tutorial
* Update incremental-deduped-history.md
Added a link to the incremental sync tutorial
* Update incremental-deduped-history.md
* Update incremental-deduped-history.md