* Update schema
* generate python
* Stream as an object
* PR comments
* generate python
* rm unused required
* Describe the state with no type
* Fix connector build
* Format
* format
Co-authored-by: cgardens <charles@airbyte.io>
## What
Finale of https://github.com/airbytehq/airbyte/pull/13122.
We've renamed all directories in previous PRs. Here we remove the fat jar configuration and add publishing to all subprojects.
Explanation for what is happening:
Identically named subprojects have the following issues:
* Publishing as-is leads to classpath confusion when jars with the same names are placed in the Java distribution. This causes NoClassDefFoundError at runtime.
* Deconflicting the jar names without changing directory names leads to dependency errors, because the OSS jar POM files are generated from project dependencies (i.e. a dependency on a sibling subproject in the same repo), which use the subproject's group and name as the reference. The generated jars therefore look for jars that no longer exist (their names have been changed) and cannot compile.
* The workaround for changing a subproject's name involves resetting the subproject's name in settings.gradle and depending on the new name in each build.gradle. This increases configuration burden and hurts readability, since one has to check settings.gradle to know the right subproject name. See https://github.com/gradle/gradle/issues/847 for more info.
* Given that Gradle itself doesn't support identically named subprojects (see the linked issue), the simplest solution is to not allow duplicated directory names. I've only renamed the conflicting directories here to keep things simple, and will create a follow-up issue to enforce non-identical subproject names in our builds.
## How
* Remove fat jar configuration.
* Add publishing to all subprojects.
Part 1 of #13122.
Rename airbyte-db:lib to airbyte-db:db-lib.
Rename airbyte-metrics:lib to airbyte-metrics:metrics-lib.
Rename airbyte-protocol:models to airbyte-protocol:protocol-models.
## What
Update the Airbyte state message to support per-stream state.
The state message still supports the old way of storing state in the `data` field. It introduces two new fields to represent global and per-stream state.
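To make the description above concrete, here is a minimal, illustrative sketch of what a per-stream state message could look like. Only the legacy `data` field comes from the text above; the other field names (`type`, `stream`, `stream_descriptor`, `stream_state`) are assumptions for illustration, not a copy of the actual schema.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

// Illustrative only: builds a JSON document shaped like the described state message,
// with the legacy `data` field alongside a hypothetical per-stream section.
public class StateMessageShapeSketch {

  public static void main(final String[] args) throws Exception {
    final ObjectMapper mapper = new ObjectMapper();

    final ObjectNode state = mapper.createObjectNode();
    state.put("type", "STREAM");                         // hypothetical discriminator: legacy / global / per-stream
    state.set("data", mapper.createObjectNode());        // legacy whole-connection state, kept for compatibility

    final ObjectNode stream = mapper.createObjectNode(); // hypothetical per-stream section
    stream.set("stream_descriptor",
        mapper.createObjectNode().put("name", "users").put("namespace", "public"));
    stream.set("stream_state", mapper.createObjectNode().put("cursor", "2022-01-01"));
    state.set("stream", stream);

    System.out.println(mapper.writerWithDefaultPrettyPrinter().writeValueAsString(state));
  }
}
```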
* generate AirbyteTraceMessage `type` enum with descriptive class name
* add comment on `title` usage
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
* apply changes to bases/airbyte-protocol
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
* Source-MySql: transform binary data to base64 format, add integration tests (see the sketch below)
* Source-MySql: fix code style
* Source-MySql: bump versions
* Source-MySql: bump versions in source_specs.yaml
* Source-MySql: added test for stream with binary data for DestinationAbstractTest
* Source-MySql: added format
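A minimal sketch of the base64 transformation referenced in the commits above, assuming the value arrives as a JDBC byte array and is written into a Jackson ObjectNode; the class, method, and column names are made up for illustration.

```java
import com.fasterxml.jackson.databind.node.ObjectNode;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Base64;

// Illustration only: encode a binary column as base64 text before emitting it as JSON.
public final class BinaryColumnSketch {

  public static void putBase64(final ObjectNode json,
                               final ResultSet resultSet,
                               final String columnName) throws SQLException {
    final byte[] raw = resultSet.getBytes(columnName);
    if (raw == null) {
      json.putNull(columnName);
    } else {
      json.put(columnName, Base64.getEncoder().encodeToString(raw));
    }
  }
}
```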
* Change OAuth API
* Change protocol for new OAuth Spec (#7827)
* Add examples
* Add protocol object to api too
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
* add specs module with logic to fetch specs on build (see the sketch below)
* format + build and add gradle dependency for new script
* check seed file for existing specs + refactor
* add tests + a bit more refactoring
* run gw format
* update yaml config persistence to merge specs into definitions
* add comment
* delete secrets migration to be consistent with master
* add dep
* add tests for GcsBucketSpecFetcher
* get rid of static block + format
* DRY up parse call
* add GCS details to comment
* formatting + fix test
* update comment
* do not format seed specs files
* change signature of run to allow cloud to reuse this script
* run gw format
* revert commits that change signature of run
* fix comment typo
Co-authored-by: Davin Chia <davinchia@gmail.com>
* rename enum to be distinct from the enum in cloud
* add missing dependencies between modules
* add readme for seed connector spec generator
* reword
* reference readme in comment
* ignore 'spec' field in newFields logic
Co-authored-by: Davin Chia <davinchia@gmail.com>
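The commits above describe fetching connector specs from a GCS bucket at build time and keeping the seed file's existing spec when nothing is found. Below is a hedged sketch of that idea using the Google Cloud Storage client; the bucket layout (`specs/<image>/<version>/spec.json`), class name, and bucket name are assumptions for illustration, not the actual GcsBucketSpecFetcher implementation.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Sketch: try to download a connector spec from GCS; an empty Optional means the caller
// should keep whatever spec is already present in the seed file.
public class SpecFetchSketch {

  private static final ObjectMapper MAPPER = new ObjectMapper();

  public static Optional<JsonNode> attemptFetch(final Storage storage,
                                                final String bucket,
                                                final String dockerImage,
                                                final String dockerTag) throws IOException {
    final String path = String.format("specs/%s/%s/spec.json", dockerImage, dockerTag); // assumed layout
    final Blob blob = storage.get(BlobId.of(bucket, path));
    if (blob == null) {
      return Optional.empty();
    }
    return Optional.of(MAPPER.readTree(new String(blob.getContent(), StandardCharsets.UTF_8)));
  }

  public static void main(final String[] args) throws IOException {
    final Storage storage = StorageOptions.getDefaultInstance().getService();
    attemptFetch(storage, "my-spec-bucket", "airbyte/source-postgres", "0.1.0")
        .ifPresent(spec -> System.out.println(spec.toPrettyString()));
  }
}
```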
* adding google sheets oauth flow to server
* fix oauth type in protocol yaml
* bump sheets version in definitions
* added GDrive scope
* update sheets to master changes
* update protocol incl. cdk
* protocol typing for oauth rootobject
* format
* destination-specification: add supportsNormalization and supportsDBT attributes (see the sketch below)
* address review comment
* missed this one
* output after gradle format
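A small sketch of how a consumer might read the two new attributes from a destination specification document. The attribute names come from the commit above; the JSON handling around them is illustrative only.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// Illustration only: read supportsNormalization / supportsDBT from a spec JSON document.
public class SpecFlagsSketch {

  public static void main(final String[] args) throws Exception {
    final JsonNode spec = new ObjectMapper()
        .readTree("{\"supportsNormalization\": true, \"supportsDBT\": false}");

    final boolean supportsNormalization = spec.path("supportsNormalization").asBoolean(false);
    final boolean supportsDbt = spec.path("supportsDBT").asBoolean(false);

    System.out.println("normalization=" + supportsNormalization + " dbt=" + supportsDbt);
  }
}
```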
* destination: add implementation for mysql as destination
* Fix formatting errors.
* address review comments + fix flaky test
* fix formatting
* address Davin's review comments
* add missing todo
* enable namespace test + only provide test user the minimum permissions required
Co-authored-by: Davin Chia <davinchia@gmail.com>
This PR introduces the following behavior for JDBC sources:
Instead of streamName = schema.tableName, this is now streamName = tableName and namespace = schema. This means that, when replicating from these sources, data will be replicated into a form matching the source, e.g. public.users (postgres source) -> public.users (postgres destination) instead of the current behaviour of public.public_users. Since MySQL does not have schemas, the MySQL source uses the database as its namespace.
To do so:
- Make namespace a first-class concept in the Airbyte Protocol. This allows the source to propagate a namespace and destinations to write to a source-defined namespace. It also sets us up for future namespace-related configurability.
- Add an optional namespace field to the AirbyteRecordMessage. This field will be set by sources that support namespaces.
- Introduce AirbyteStreamNameNamespacePair as a type-safe way of identifying streams throughout our codebase (see the sketch after this list).
- Modify base-normalization to better support source-defined namespaces, specifically allowing normalization of tables with the same name into different schemas.
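A minimal sketch of the idea behind AirbyteStreamNameNamespacePair described above: identify a stream by a (namespace, name) pair instead of a mangled `schema_table` string. This is a simplified stand-in, not the real class.

```java
// Simplified stand-in (Java 16+ record syntax): a value type keyed on (namespace, name),
// so public.users stays namespace = "public", name = "users" instead of becoming "public_users".
public record StreamNameNamespacePairSketch(String namespace, String name) {

  public static void main(final String[] args) {
    // A Postgres table public.users keeps its schema as the namespace...
    final var postgresUsers = new StreamNameNamespacePairSketch("public", "users");
    // ...while a MySQL table uses its database as the namespace, since MySQL has no schemas.
    final var mysqlUsers = new StreamNameNamespacePairSketch("my_database", "users");

    System.out.println(postgresUsers + " vs " + mysqlUsers);
  }
}
```

A record gives value-based equals/hashCode for free, which is the property such a key type needs when streams are looked up by (namespace, name).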
Last step (besides documentation) of the namespace changes. This is a follow-up to #2767.
After this change, the following JDBC sources will adopt the behaviour described in the document above.
Namely, instead of streamName = schema.tableName, this will become streamName = tableName and namespace = schema. This means that, when replicating from these sources, data will be replicated into a form matching the source, e.g. public.users (postgres source) -> public.users (postgres destination) instead of the current behaviour of public.public_users. Since MySQL does not have schemas, the MySQL source uses the database as its namespace.
I cleaned up some bits of the CatalogHelpers. This affected the destinations, so I'm also running the destination tests.
This PR is step 5 of this tech spec - https://docs.google.com/document/d/1qFk4YqnwxE4MCGeJ9M2scGOYej6JnDy9A0zbICP_zjI/edit.
The first of (at least) 2 PRs to implement this on the source side. I made some headway before deciding to break the changes into one PR implementing this for discover schema job, and another PR implementing this for read. The combined PR would have been too big otherwise.
Also refactor MoreResources, as the test method was attempting to write to the location the classes were loaded from; we cannot guarantee that location is writable. Changed this to write to a random folder in the temp directory.
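A minimal sketch of the temp-directory approach described above, using plain JDK file APIs; the directory prefix and file name are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustration only: write test output to a random folder under the system temp directory
// instead of next to wherever the class happened to be loaded from.
public class TempDirWriteSketch {

  public static void main(final String[] args) throws IOException {
    final Path tmpDir = Files.createTempDirectory("more-resources-test");
    final Path out = tmpDir.resolve("copied-resource.txt");
    Files.writeString(out, "test content");
    System.out.println("wrote " + out);
  }
}
```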
Add a namespace field to the Airbyte Stream in preparation for propagating a source-defined namespace to the Destination.
This namespace field is then consumed as the destination schema the table is written to.
This only applies to JDBC destinations.
This is Steps 1 - 4 of the namespace tech spec, seen at https://docs.google.com/document/d/1qFk4YqnwxE4MCGeJ9M2scGOYej6JnDy9A0zbICP_zjI/edit.
Some minor refactoring and commenting as I go.
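A hedged sketch of the destination-side rule described above: if the stream carries a namespace, use it as the target schema, otherwise fall back to the connection's configured default schema. The class, method, and variable names are illustrative, not the actual JDBC destination code.

```java
// Illustration only: choose the schema a table is written to.
public final class TargetSchemaSketch {

  public static String targetSchema(final String streamNamespace, final String defaultSchema) {
    // Prefer the source-defined namespace; fall back to the destination's default schema.
    return streamNamespace != null && !streamNamespace.isBlank() ? streamNamespace : defaultSchema;
  }

  public static void main(final String[] args) {
    System.out.println(targetSchema("public", "airbyte_default")); // -> public
    System.out.println(targetSchema(null, "airbyte_default"));     // -> airbyte_default
  }
}
```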
* Remove unnecessary test classes, as they match the integration tests in terms of what is being tested. They have no real value, since the corresponding integration test can be run locally without additional credentials. The main value these classes bring is letting us run tests without building the Docker image (the integration tests require doing so); however, I feel this benefit is not worth the additional maintenance cost.
* Centralise DataArgumentProvider into its own class for easier maintenance and usability.
* Handle destination sync mode in destinations (see the sketch below)
* Source & Destination sync modes are required (#2500)
* Provide a migration script to make sure it is always defined for previous sync configs
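A minimal sketch of what handling the destination sync mode can look like in a JDBC destination, assuming only two modes (append and overwrite); the enum and SQL here are local to the sketch, not the protocol's own class or Airbyte's actual code.

```java
// Illustration only: decide how to prepare the target table based on the sync mode.
public final class SyncModeSketch {

  enum DestinationSyncMode { APPEND, OVERWRITE }

  static String prepareTableSql(final DestinationSyncMode mode, final String schema, final String table) {
    return switch (mode) {
      // OVERWRITE: clear the table before loading the new data.
      case OVERWRITE -> String.format("TRUNCATE TABLE %s.%s", schema, table);
      // APPEND: keep existing rows; nothing to do up front.
      case APPEND -> String.format("SELECT 1 -- keep existing rows in %s.%s", schema, table);
    };
  }

  public static void main(final String[] args) {
    System.out.println(prepareTableSql(DestinationSyncMode.OVERWRITE, "public", "users"));
  }
}
```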
* Add standard tests for sources that use the JdbcSource to guarantee that changes do not break any sources that rely on JdbcSource.
* Add JdbcStressTest to verify that we stream / chunk data properly (i.e. any JdbcSource can handle more data than fits in memory); see the sketch below.
* Migrate MSSQL and Redshift to use the new base source.
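To illustrate the streaming behaviour JdbcStressTest is meant to exercise, here is a hedged sketch of reading a large table with a bounded fetch size so the whole result set never has to fit in memory. The query, table, and connection URL are placeholders, not Airbyte's actual source code.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Illustration only: stream rows in chunks instead of materialising the full result set.
public class StreamingReadSketch {

  public static void main(final String[] args) throws SQLException {
    final String jdbcUrl = args[0]; // e.g. a Postgres or MySQL connection string
    try (Connection connection = DriverManager.getConnection(jdbcUrl)) {
      connection.setAutoCommit(false); // some drivers (e.g. Postgres) only stream inside a transaction
      try (PreparedStatement statement = connection.prepareStatement("SELECT id, name FROM users")) {
        statement.setFetchSize(1_000);  // pull roughly 1000 rows at a time from the server
        try (ResultSet resultSet = statement.executeQuery()) {
          while (resultSet.next()) {
            emit(resultSet.getLong("id"), resultSet.getString("name"));
          }
        }
      }
    }
  }

  private static void emit(final long id, final String name) {
    System.out.println(id + "," + name); // in a real source this would become an AirbyteRecordMessage
  }
}
```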