* Switch to measure max row byte size
* Reduce fetch size change logs
* Update unit tests
* Determine jdbc buffer size based on max memory
* Bump postgres version
* Bump postgres version
* Bump mysql version
* Bump mssql version
* Format java code
* Increase hikari connection timeout
* Update data source default parameters
* auto-bump connector version
* Mark postgres 0.4.21 as not published
* Revert "Bump mysql version"
This reverts commit ad9135258c.
* Fix unit test
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
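For readers of this commit list, here is a rough sketch of how a fetch size could be derived from the measured max row byte size and a buffer sized from the JVM max memory. Every name and constant below (`FetchSizeSketch`, `BUFFER_HEAP_PERCENT`, the min/max bounds) is an assumption for illustration and is not taken from the connector code.

```java
// Hypothetical sketch: names and constants are illustrative, not the connector's actual values.
final class FetchSizeSketch {
    // Assumed share (in percent) of the JVM max heap to spend on buffered rows.
    static final long BUFFER_HEAP_PERCENT = 60;
    // Assumed bounds for the resulting fetch size.
    static final int MIN_FETCH_SIZE = 1;
    static final int MAX_FETCH_SIZE = 100_000;

    private FetchSizeSketch() {}

    /** Buffer budget in bytes, derived from the max heap size using integer arithmetic. */
    static long bufferBytes(long maxMemoryBytes) {
        return maxMemoryBytes / 100 * BUFFER_HEAP_PERCENT;
    }

    /** Rows per fetch such that (largest row seen so far) x (fetch size) fits in the buffer. */
    static int fetchSize(long bufferBytes, long maxRowBytes) {
        if (maxRowBytes <= 0) {
            return MAX_FETCH_SIZE; // no row measured yet, fall back to the upper bound
        }
        long rows = bufferBytes / maxRowBytes;
        return (int) Math.max(MIN_FETCH_SIZE, Math.min(MAX_FETCH_SIZE, rows));
    }
}
```

In a real source, the budget would presumably come from `FetchSizeSketch.bufferBytes(Runtime.getRuntime().maxMemory())`, and the result would be passed to the JDBC statement's `setFetchSize`.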
Part 1 of #13122.
Rename airbyte-db:lib to airbyte-db:db-lib.
Rename airbyte-metrics:lib to airbyte-metrics:metrics-lib.
Rename airbyte-protocol:models to airbyte-protocol:protocol-models.
Explanation for what is happening:
Identically named subprojects have the following issues:
- Publishing as is leads to classpath confusion when jars with the same name are placed in the Java distribution. This leads to NoClassDefFoundError at runtime.
- Deconflicting the jar names without changing directory names leads to dependency errors, because the OSS jar pom files are generated from project dependencies (i.e. a dependency on a sibling subproject in the same repo), which use the subproject's group and name as a reference. This means the generated jars look for jars that do not exist (as their names have been changed) and cannot compile.
- The workaround for changing a subproject's name involves resetting the subproject's name in the settings.gradle and depending on the new name in each build.gradle. This increases configuration burden and makes the build harder to read, since one has to check the settings.gradle to know what the right subproject name is. See gradle/gradle#847 ("Projects with same name lead to unintended conflict resolution") for more info.
- Given that Gradle itself doesn't support identically named subprojects (see the linked issue), the simplest solution is to not allow duplicated directory names. I've only renamed conflicting directories here to keep things simple. I will create a follow-up issue to enforce non-identical subproject names in our builds.
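As a sketch of what enforcing non-identical subproject names could look like, the check below groups Gradle project paths by their last segment and reports any name used more than once. `SubprojectNameCheck` and its methods are hypothetical names for illustration; this is not an existing Gradle API or the follow-up implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: detects subprojects whose leaf names collide.
final class SubprojectNameCheck {
    private SubprojectNameCheck() {}

    /** Returns the last segment of a Gradle project path, e.g. ":airbyte-db:lib" -> "lib". */
    static String leafName(String projectPath) {
        return projectPath.substring(projectPath.lastIndexOf(':') + 1);
    }

    /** Groups project paths by leaf name and keeps only the names used more than once. */
    static Map<String, List<String>> findDuplicates(List<String> projectPaths) {
        Map<String, List<String>> byName = new TreeMap<>();
        for (String path : projectPaths) {
            byName.computeIfAbsent(leafName(path), k -> new ArrayList<>()).add(path);
        }
        // Drop every name that maps to a single project.
        byName.values().removeIf(paths -> paths.size() < 2);
        return byName;
    }
}
```

With the names from before this PR (`:airbyte-db:lib`, `:airbyte-metrics:lib`), the check reports `lib` as a conflict; with the renamed projects it reports nothing.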
# Summary
- A follow-up PR for #5543.
- This PR splits the `airbyte-db` project into two modules:
  - `lib` is the original `airbyte-db`.
  - `jooq` is for jOOQ code generation.
- This is necessary because the jOOQ generator requires a custom database implementation that can run Flyway migrations, so the code generator logic needs to depend on the compiled output of the original `airbyte-db` project.
# Commits
* Separate db to lib and jooq modules
* Update dependencies
* Add jobs db migrator test
* Fix compose build
* Add migration dev center
* Add schema dump task
* Update airbyte-db/lib/README.md
* Co-authored-by: Davin Chia <davinchia@gmail.com>
* Update readme
* Remove bom dependency
* Update readme
* Use jooq code in db config persistence
* Remove AirbyteConfigsTable
Co-authored-by: Davin Chia <davinchia@gmail.com>
* Use CDK to generate source that can be configured to emit a certain number of records and always works.
* Checkpoint: socat works from inside the docker container.
* Override the entry point.
* Clean up and add ReadMe.
* Clean up socat.
* Checkpoint: connect to Kube cluster and list all the pods.
* Checkpoint: Sync worker pod is able to send output to the destination pod.
* Checkpoint: Sync worker creates Dest pod if none existed previously. It also waits for the pod to be ready before doing anything else. Sync worker will also remove the pod on termination.
* update readme
* Checkpoint: Dest pod does not restart after finishing. Comment out delete command in Sync worker.
* working towards named pipes
* named pipes working
* update readme
* WIP named pipe / socat sidecar kube port forwarding (#3518)
* nearly working sources
* update
* stdin example
* move all kube testing yamls into the airbyte-workers directories. sort the airbyte-workers resource folder; place all the poc yamls together.
* Format.
* Put back the original KubeProcessBuilderFactory.
* Fix slight errors.
* Checkpoint: Worker pod knows its own IP. Successfully starts and writes to Dest pod after refactor.
* remove unused file and update readme
* Dest pod loops back into worker pod. However, the right messages do not seem to be passing in.
* Switch back to worker ip.
* SWEET VICTORY!
* wrap kube pod in process (#3540)
also clean up kubernetes deploys.
* More clean up. (#3586)
The first 6 points of #3464.
The only interesting thing about this PR is the kube pod shutdown. For whatever reason, the OkHttpPool isn't respecting the evictAll call and 1 idle thread remains. So instead of shutting down immediately, the worker pod shuts down after 5 mins, when the idle thread is reaped. There isn't an easy way to modify the pool's idle reap configuration right now. I do not think this issue is blocking since it's relatively benign, so I vote we create a ticket and come back to this once we do an e2e test.
* Implements redirecting standard error as well. (#3623)
* Clean up before next implementation.
* kube process launching (#3790)
* processes must handle file mounting
* remove comment
* default to base entrypoint
* use process builder factory / select stdin / use a pool of ports
* fix up
* add super hacky copying example
* Checkpoint: Works end to end!
* Checkpoint: Use API to make sure init container is ready instead of blind sleep. Propagate exception in DefaultCheckConnectionWorker.
* Refactor KubePodProcess. Checked to make sure everything still works.
* Format.
* Clean up code. Begin putting this into variables and breaking up long constructor function.
* Add comments to explain what is happening.
* fix normalization test
* increase timeout for initcontainer
Co-authored-by: Davin Chia <davinchia@gmail.com>
* facepalm moment
* clean up kube poc pr (#3834)
* clean up
* remove source-always-works
* create separate commons-docker
* fix test
* enable kube e2e tests (#3866)
* enable kube e2e tests
* use more generally accepted env definition
* use new runners
* use its own runner and install minikube differently
* update name
* use kubectl alias
* use link instead of alias that doesn't propagate
* start minikube
* use driver=none
* go back to using action
* mess with versions
* revert runner
* install socat
* print logs after run
* also try re-running tasks
* always wait for file transfer
* use ports
* increase wait timeout for kube
* use different localhost ips and bump normalization to include an entrypoint
* proposed fix
* all working locally
* revert temporary changes
* revert normalization image change that's happening in a separate pr
* readability
* final comment
* Working Kube Cancel. (#3983)
* Port over the basic changes.
* Add logic to return proper exit code in the event of termination. Add comments to explain why.
* revert envs change and merge master to fix kube acceptance tests (#4012)
* use older env format
* fix build
Co-authored-by: jrhizor <me@jaredrhizor.com>
Co-authored-by: Jared Rhizor <jared@dataline.io>
This PR introduces the following behavior for JDBC sources:
Instead of streamName = schema.tableName, streams are now identified by streamName = tableName and namespace = schema. This means that, when replicating from these sources, data is replicated into a form matching the source, e.g. public.users (postgres source) -> public.users (postgres destination) instead of the current behaviour of public.public_users. Since MySQL does not have schemas, the MySQL source uses the database as its namespace.
To do so:
- Make namespace a first-class concept in the Airbyte Protocol. This allows sources to propagate a namespace and destinations to write to a source-defined namespace. It also sets us up for future namespace-related configurability.
- Add an optional namespace field to the AirbyteRecordMessage. This field will be set by sources that support namespace.
- Introduce AirbyteStreamNameNamespacePair as a type-safe manner of identifying streams throughout our code base.
- Modify base_normalisation to better support source defined namespace, specifically allowing normalisation of tables with the same name to different schemas.
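
To illustrate the idea of a type-safe name/namespace pair, here is a minimal sketch. `StreamNameNamespacePair` is a hypothetical stand-in for `AirbyteStreamNameNamespacePair` (whose actual fields and methods may differ), and `fromLegacyName` is an invented helper that assumes the schema part of a legacy `schema.tableName` contains no dots.

```java
import java.util.Objects;

// Hypothetical stand-in for AirbyteStreamNameNamespacePair; the real class may differ.
final class StreamNameNamespacePair {
    private final String name;
    private final String namespace; // null when the source does not set a namespace

    StreamNameNamespacePair(String name, String namespace) {
        this.name = Objects.requireNonNull(name, "name");
        this.namespace = namespace;
    }

    /** Splits a legacy "schema.tableName" stream name at the first dot (assumes no dots in the schema). */
    static StreamNameNamespacePair fromLegacyName(String legacyName) {
        int dot = legacyName.indexOf('.');
        if (dot < 0) {
            return new StreamNameNamespacePair(legacyName, null);
        }
        return new StreamNameNamespacePair(legacyName.substring(dot + 1), legacyName.substring(0, dot));
    }

    String getName() {
        return name;
    }

    String getNamespace() {
        return namespace;
    }

    // Value equality, so that "users" in two different schemas are distinct map keys.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof StreamNameNamespacePair)) return false;
        StreamNameNamespacePair other = (StreamNameNamespacePair) o;
        return name.equals(other.name) && Objects.equals(namespace, other.namespace);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, namespace);
    }

    @Override
    public String toString() {
        return namespace == null ? name : namespace + "." + name;
    }
}
```

Keying stream state on a pair like this, rather than on a concatenated string, is what lets two tables with the same name in different schemas coexist.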
Last step (besides documentation) of the namespace changes. This is a follow-up to #2767.
After this change, the following JDBC sources will change their behaviour to the behaviour described in the above document.
Namely, instead of streamName = schema.tableName, this will become streamName = tableName and namespace = schema. This means that, when replicating from these sources, data will be replicated into a form matching the source, e.g. public.users (postgres source) -> public.users (postgres destination) instead of the current behaviour of public.public_users. Since MySQL does not have schemas, the MySQL source uses the database as its namespace.
I cleaned up some bits of the CatalogHelpers. This affected the destinations, so I'm also running the destination tests.
* The newest stable version of Docker for Mac triggers an issue in Testcontainers (see issue). The Testcontainers maintainers have a fix, but it is currently only in 1.15.0-rc2. This PR upgrades to that version for now. Without this fix, you can't build our project on the newest version of Docker for Mac.
* Issue to upgrade to 1.15 when it becomes stable so we don't stay on the RC forever: https://github.com/airbytehq/airbyte/issues/493