# Debugging Docker Containers

This guide covers debugging **JVM docker containers**, whether started via Docker Compose or by the
worker container (such as a Destination container). It assumes you are using [IntelliJ Community edition](https://www.jetbrains.com/idea/),
but the steps can be applied to another IDE or debugger.

## Prerequisites

You should have the airbyte repo downloaded and should be able to [run the platform locally](https://docs.airbyte.com/deploying-airbyte/local-deployment).
Also, if you're on macOS you will need to follow the installation steps for [Docker Mac Connect](https://github.com/chipmk/docker-mac-net-connect).

## Connecting your debugger

This solution uses the `JAVA_TOOL_OPTIONS` environment variable, which, when set to a specific value, allows us to connect our debugger.
We will also set up a **Remote JVM Debug** run configuration in IntelliJ, which uses an IP address or hostname to connect.

> **Note**
> The [Docker Mac Connect](https://github.com/chipmk/docker-mac-net-connect) tool is what makes it possible for macOS users to connect to a docker container
> by IP address.

### Docker Compose Extension

By default, the `docker compose` command looks for a `docker-compose.yaml` file in your directory and executes its instructions. However, you can
provide multiple files to the `docker compose` command with the `-f` option. You can read more about how Docker Compose combines or overrides values when
you provide multiple files [on Docker's website](https://docs.docker.com/compose/extends/).

In the Airbyte repo, there is already another file, `docker-compose.debug.yaml`, which extends the `docker-compose.yaml` file. Our goal is to set the
`JAVA_TOOL_OPTIONS` environment variable in the environment of the container we wish to debug. If you look at the `server` configuration under `services`
in the `docker-compose.debug.yaml` file, it should look like this:

```yaml
server:
  environment:
    - JAVA_TOOL_OPTIONS=${DEBUG_SERVER_JAVA_OPTIONS}
```

This says: for the `server` service, add an environment variable `JAVA_TOOL_OPTIONS` whose value is taken from the variable `DEBUG_SERVER_JAVA_OPTIONS`.
`DEBUG_SERVER_JAVA_OPTIONS` has no default value, so if we don't provide one, `JAVA_TOOL_OPTIONS` will be empty. When you run the `docker compose` command,
Docker looks in your local environment for a value of `DEBUG_SERVER_JAVA_OPTIONS` and copies it. To set this value,
you can either `export` the variable in your environment before running the `docker compose` command, or prepend the variable to the command. For our debugging purposes,
we want the value to be `-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=*:5005`, so to connect our debugger to the `server` container, run the following:

```bash
DEBUG_SERVER_JAVA_OPTIONS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=*:5005" VERSION="dev" docker compose -f docker-compose.yaml -f docker-compose.debug.yaml up
```

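
The `export` alternative mentioned above could look roughly like the sketch below. The variable names and values are the ones used in this guide; the function name `start_debug_stack` is hypothetical and only wraps the same `docker compose` invocation so that nothing starts until you call it.

```bash
# Export the variables once so every later `docker compose` call in this shell sees them.
export DEBUG_SERVER_JAVA_OPTIONS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=*:5005"
export VERSION="dev"

# Hypothetical helper (not part of the Airbyte repo): the same command as above,
# without the inline variable prefixes.
start_debug_stack() {
  docker compose -f docker-compose.yaml -f docker-compose.debug.yaml up
}
```

Call `start_debug_stack` from the directory containing both compose files. Because `suspend=y` stays exported for the rest of the shell session, you may want to `unset DEBUG_SERVER_JAVA_OPTIONS` once you are done debugging.
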
> **Note**
> This command also passes in the `VERSION=dev` environment variable, as recommended in the comments of the `docker-compose.debug.yaml` file.

### Connecting the Debugger

Now we need to connect our debugger. In IntelliJ, open `Edit Configurations...` from the Run menu (or search for `Edit Configurations` in the command palette).
Create a new *Remote JVM Debug* run configuration. The `host` option defaults to `localhost`; if you're on Linux, you can leave this unchanged.
On a Mac, however, you need to find the IP address of your container. **Make sure you've installed and started the [Docker Mac Connect](https://github.com/chipmk/docker-mac-net-connect)
service prior to running the `docker compose` command**. With your containers running, run the following command to fetch the IP addresses:

```bash
$ docker inspect $(docker ps -q ) --format='{{ printf "%-50s" .Name}} {{printf "%-50s" .Config.Image}} {{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}'
/airbyte-proxy airbyte/proxy:dev 172.18.0.10172.19.0.4
/airbyte-server airbyte/server:dev 172.18.0.9
/airbyte-worker airbyte/worker:dev 172.18.0.8
/airbyte-source sha256:5eea76716a190d10fd866f5ac6498c8306382f55c6d910231d37a749ad305960 172.17.0.2
/airbyte-connector-builder-server airbyte/connector-builder-server:dev 172.18.0.6
/airbyte-webapp airbyte/webapp:dev 172.18.0.7
/airbyte-cron airbyte/cron:dev 172.18.0.5
/airbyte-temporal airbyte/temporal:dev 172.18.0.2
/airbyte-db airbyte/db:dev 172.18.0.4172.19.0.3
/airbyte-temporal-ui temporalio/web:1.13.0 172.18.0.3172.19.0.2
```

You should see an entry for `/airbyte-server`, which is the container we've been targeting. Copy its IP address (`172.18.0.9` in the example output above)
and replace `localhost` in your IntelliJ run configuration with it.

Save your Remote JVM Debug run configuration and run it with the debug option. You should now be able to place breakpoints in any code that is executed by the
`server` container. If you need to debug another container from the original `docker-compose.yaml` file, you can add a similar entry for it to the `docker-compose.debug.yaml` file.

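
If you only need the address of one container, a small wrapper around `docker inspect` can be easier to read than the full listing. The sketch below is illustrative: the function name `container_ips` is hypothetical, and the only change to the format string is a space after each address. Entries such as `172.18.0.10172.19.0.4` in the listing above are two addresses printed back to back, because that format string has no separator for containers attached to more than one network.

```bash
# Hypothetical helper: print the IP address(es) of a single container.
# Usage: container_ips airbyte-server
container_ips() {
  local container="$1"
  # The space before {{end}} keeps addresses apart when the container
  # is attached to more than one network.
  docker inspect --format '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}' "$container"
}
```

For example, `container_ips airbyte-server` prints the address to paste into the `host` field of the Remote JVM Debug run configuration.
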
### Debugging Containers Launched by the Worker container

The Airbyte platform launches some containers as needed at runtime, and these are not defined in the `docker-compose.yaml` file. They include the source and destination
tasks, among other things. If we can't pass environment variables to them through the `docker-compose.debug.yaml` file, how can we set the
`JAVA_TOOL_OPTIONS` environment variable? The answer is that we can *pass it through* the container which launches the other containers: the `worker` container.

For this example, let's say that we want to debug something that happens in the `destination-postgres` connector container. To follow along, you will
need to have set up a connection which uses postgres as a destination. If you want to use a different connector, such as `source-postgres` or `destination-bigquery`, that's fine.

In the `docker-compose.debug.yaml` file you should see an entry for the `worker` service which looks like this:

```yaml
worker:
  environment:
    - DEBUG_CONTAINER_IMAGE=${DEBUG_CONTAINER_IMAGE}
    - DEBUG_CONTAINER_JAVA_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=*:5005
```

As in the previous debugging example, we want to pass an environment variable to the `docker compose` command. This time we're setting the
`DEBUG_CONTAINER_IMAGE` environment variable to the name of the container we're targeting. For our example that is `destination-postgres`, so run the command:

```bash
DEBUG_CONTAINER_IMAGE="destination-postgres:5005" VERSION="dev" docker compose -f docker-compose.yaml -f docker-compose.debug.yaml up
```

The `worker` container now has a `DEBUG_CONTAINER_IMAGE` environment variable that names `destination-postgres`, and it checks this variable whenever it
spawns a container. If the container name matches the environment variable, the worker sets the `JAVA_TOOL_OPTIONS` environment variable in that container to
the value of its own `DEBUG_CONTAINER_JAVA_OPTS` environment variable, which is the same value we used in the `server` example.

#### Connecting the Debugger to a Worker Spawned Container

To connect your debugger, **the container must be running**. The `destination-postgres` container only runs while it is executing one of its tasks,
such as a replication. Navigate to a connection in your local Airbyte instance at http://localhost:8000 which uses postgres as a destination.
If you ran through the [Postgres to Postgres replication tutorial](https://airbyte.com/tutorials/postgres-replication), you can use that connection.

On the connection page, trigger a manual sync with the "Sync now" button. Because we set the `suspend` option to `y` in our `JAVA_TOOL_OPTIONS`, the
container will pause all execution until the debugger is connected. This is very useful for methods which run very quickly, such as the Check method.
However, it could be very detrimental if it were pushed into a production environment. For now, it gives us time to set up a new Remote JVM Debug configuration.

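
Because the connector container only exists while a sync is running, it can help to poll for it after clicking "Sync now". The sketch below is a convenience helper under stated assumptions: the function name `wait_for_container` and the default timeout are hypothetical, and it relies only on the `--filter name=` and `--format` options of `docker ps`.

```bash
# Hypothetical helper: poll `docker ps` until a running container whose name
# contains the given fragment appears, then print that name.
# Usage: wait_for_container destination-postgres [timeout_seconds]
wait_for_container() {
  local fragment="$1"
  local timeout="${2:-120}"
  local waited=0
  local name
  while [ "$waited" -lt "$timeout" ]; do
    # --filter name=... matches on part of the container name.
    name="$(docker ps --filter "name=${fragment}" --format '{{.Names}}' | head -n 1)"
    if [ -n "$name" ]; then
      printf '%s\n' "$name"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "no container matching '${fragment}' after ${timeout}s" >&2
  return 1
}
```

Running `wait_for_container destination-postgres` in a second terminal prints the generated container name as soon as the worker spawns it, which tells you when it is worth attaching the debugger.
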
This container will have a different IP than the one in the `server` Remote JVM Debug run configuration we set up earlier, so let's set up a new one with the IP of
the `destination-postgres` container:

```bash
$ docker inspect $(docker ps -q ) --format='{{ printf "%-50s" .Name}} {{printf "%-50s" .Config.Image}} {{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}'
/destination-postgres-write-52-0-grbsw airbyte/destination-postgres:0.3.26
/airbyte-proxy airbyte/proxy:dev 172.18.0.10172.19.0.4
/airbyte-worker airbyte/worker:dev 172.18.0.8
/airbyte-server airbyte/server:dev 172.18.0.9
/airbyte-destination postgres 172.17.0.3
/airbyte-source sha256:5eea76716a190d10fd866f5ac6498c8306382f55c6d910231d37a749ad305960 172.17.0.2
/airbyte-connector-builder-server airbyte/connector-builder-server:dev 172.18.0.6
/airbyte-webapp airbyte/webapp:dev 172.18.0.7
/airbyte-cron airbyte/cron:dev 172.18.0.5
/airbyte-temporal airbyte/temporal:dev 172.18.0.3
/airbyte-db airbyte/db:dev 172.18.0.2172.19.0.3
/airbyte-temporal-ui temporalio/web:1.13.0 172.18.0.4172.19.0.2
```

Huh? No IP address is listed for `/destination-postgres-write-52-0-grbsw`. Interestingly enough, the `172.18.0.x` addresses are sequential but one is missing: `172.18.0.1`
(typically the gateway address of the Docker network). If we use that IP in the remote debugger, it works!

You can now add breakpoints and debug any code which would be executed in the `destination-postgres` container.

Happy Debugging!

#### Connecting the Debugger to an Integration Test Spawned Container

You can also debug code running in containers spawned by an integration test! This is useful both for debugging the integration tests themselves and for testing code changes.
The steps involved are:

1. Follow all the steps outlined above to set up the **Remote JVM Debug** run configuration.
2. Edit the run configuration associated with the given integration test and add the following environment variables: `DEBUG_CONTAINER_IMAGE=source-postgres;DEBUG_CONTAINER_JAVA_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=*:5005`.
   Note that you will have to repeat this step for every new integration test run configuration you create.
3. Run the integration test in debug mode. In the debug tab, open up the Remote JVM Debugger run configuration you just created.
4. Keep trying to attach the Remote JVM Debugger. It will likely fail a couple of times and eventually connect to the test container. If you want a more
   deterministic way to connect the debugger, you can set a breakpoint in the `DockerProcessFactory.localDebuggingOptions()` method. Resume the integration test run and
   then attempt to attach the Remote JVM Debugger (you still might need a couple of tries).

## Gotchas

So now that your debugger is set up, what else is there to know?

### Code changes

When you're debugging, you might want to make a code change. Any time you do, your code will be out of sync with the container which is run by the platform.
This means that after making a change you will need to rebuild the docker container you're debugging. Additionally, for connector containers, you may have to navigate to
"Settings" in your local Airbyte Platform's web UI and change the version of the container to `dev`. See your connector's `README` for details on how to rebuild the container image.

### Ports

In this tutorial we've been using port `5005` for all debugging. It's the default, so we haven't changed it. If you need to debug *multiple* containers, however, they will clash on this port.
In that case, you will have to modify your setup so that each container uses a different port that is not already in use.

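
One way to avoid typing the agent string by hand for each port is to build it from a parameter. The sketch below is illustrative: the function name `jdwp_opts` is hypothetical, and port `5006` is just an example of an unused port.

```bash
# Hypothetical helper: build the JDWP agent string used in this guide for a given port.
# Usage: jdwp_opts [port]   (defaults to 5005)
jdwp_opts() {
  local port="${1:-5005}"
  printf '%s%s\n' '-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=*:' "$port"
}

# Example: have the `server` container listen on 5006 instead of 5005.
export DEBUG_SERVER_JAVA_OPTIONS="$(jdwp_opts 5006)"
```

Remember to enter the same port in the matching Remote JVM Debug run configuration. The `worker` entry shown earlier hard-codes `5005` in `DEBUG_CONTAINER_JAVA_OPTS`, so changing the port for worker-spawned containers means editing that line in `docker-compose.debug.yaml`.
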