* use SHORT_ACTIVITY_OPTIONS on check connection activity so that it has retries
* retry workflow after delay instead of quarantining
* allow activity env vars to be configured in docker and kube
* add env var for workflow restart delay and refactor slightly
* update tests to handle new restart behavior
* update test name
* add empty env var values to .env files
* fail attempt before job in cleanJobState to prevent state machine failure
* change default value of max activity attempt retries from 10 to 5
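A minimal sketch of the retry-after-delay idea, assuming a plain Java helper rather than the Temporal retry options the commits above actually configure. `RetryWithDelay`, `maxAttempts`, and `delayMs` are hypothetical names chosen for illustration; the value 5 in the usage note mirrors the new default mentioned above.

```java
import java.util.function.Supplier;

// Hypothetical helper; the real change relies on Temporal activity/workflow options.
final class RetryWithDelay {

  // Runs the task up to maxAttempts times, sleeping delayMs between failed attempts.
  static <T> T call(Supplier<T> task, int maxAttempts, long delayMs) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    RuntimeException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return task.get();
      } catch (RuntimeException e) {
        last = e;
        if (attempt < maxAttempts) {
          try {
            Thread.sleep(delayMs);
          } catch (InterruptedException interrupted) {
            // Restore the interrupt flag and stop retrying.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", interrupted);
          }
        }
      }
    }
    // All attempts failed: surface the last failure instead of quarantining.
    throw last;
  }
}
```

A caller might use `RetryWithDelay.call(() -> checkConnection(), 5, 1000)`, where `checkConnection` is a placeholder for the activity being retried.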
* Create interface, factory for metric client
* remove unused func
* change count val to use long
* PR fix
* otel metric client implementation
* merge conflicts resolve
* build fix
* add a test, moved version into deps catalog
* fix test
* add docs for open telemetry
* fix kube setting for otel, and add doc
* helm related fields update for opentel
* sweep all scheduler application code and new-scheduler conditional logic
* remove airbyte-scheduler from deployments and docs
* format
* remove 'v2' from github actions
* add back scheduler in delete deployment command
* remove scheduler parameters from helm chart values
* add back job cleaner + test and add comment
* remove now-unused env vars from code and docs
* format
* remove feature flags from web backend connection handler as it is no longer needed
* remove feature flags from config api as it is no longer needed
* remove feature flags input from config api test
* format + shorter url
* remove scheduler parameters from helm chart readme
* switch bootloader to job and add recreate strategy to db
* add comment about recreate strategy
* Add comment about ttl on bootloader
* add comment about Pod vs Job
This PR adds the possibility to define pod annotations to the pods created by the workers.
Pod annotations can be required in different situations, such as configuring which IP pool to use when using some network plugins.
The original PR was #9874; we decided to split it into 3 different PRs.
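As a rough illustration of how worker-defined pod annotations could be supplied, the sketch below parses a comma-separated `key=value` string into a map. The class name `PodAnnotationParser`, the input format, and the example annotation keys are assumptions made for this sketch, not the names used in the PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: parses "key1=value1,key2=value2" into pod annotations.
final class PodAnnotationParser {

  static Map<String, String> parse(String raw) {
    Map<String, String> annotations = new LinkedHashMap<>();
    if (raw == null || raw.isBlank()) {
      return annotations;
    }
    for (String pair : raw.split(",")) {
      // Split on the first '=' only, so values may themselves contain '='.
      String[] parts = pair.trim().split("=", 2);
      if (parts.length == 2 && !parts[0].isBlank()) {
        annotations.put(parts[0].trim(), parts[1].trim());
      }
    }
    return annotations;
  }

  public static void main(String[] args) {
    // Hypothetical annotation, e.g. selecting an IP pool for a network plugin.
    System.out.println(parse("example.com/ip-pool=pool-a, team=data"));
  }
}
```

Malformed pairs without an `=` are skipped here; a stricter variant could reject them instead.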
* add misc todos
* save work so far
* configure async pod processing
* remove comment
* fmt
* working except logging propagation?
* add comment
* add logging and misc configuration fixes
* add output propagation
* fix state reading
* logging is working (but background highlighting is not)
* fix log highlighting
* use sys instead of ctx
* comment
* clean up and test state management
* clean up orchestrator app construction
* unify launcher workers and handle resuming
* respond to comments
* misc
* disable
* fix comment
* respond to comments
- Add the CONFIGS_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION and JOBS_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION env vars. These determine whether the database is ready for an application to start.
- Add the CONFIGS_DATABASE_INITIALIZATION_TIMEOUT_MS and the JOBS_DATABASE_INITIALIZATION_TIMEOUT_MS env vars to determine how long an application should wait for the DB before giving up.
- Create the MinimumFlywayMigrationVersionCheck class. This class contains all the assertions to check that 1) a database is initialised and 2) a database meets the minimum migration version.
- Remove all set up operations from the ServerApp. Use MinimumFlywayMigrationVersionCheck operations instead.
- I also had to modify the Databases and BaseDatabaseInstance classes to support connecting to a database with timeouts. We would previously try forever.
- Add Bootloader to the relevant docker files and Kube files.
- Clean up the migration acceptance tests so it's clear what is happening.
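The readiness check described above can be sketched roughly as follows. This is not the actual MinimumFlywayMigrationVersionCheck class: `MigrationVersionGate`, its method names, the polling interval, and the dotted version format used in the comments are all assumptions for illustration. The `Supplier` stands in for whatever query returns the database's current migration version.

```java
import java.util.function.Supplier;

// Hypothetical sketch; not the actual MinimumFlywayMigrationVersionCheck class.
final class MigrationVersionGate {

  // Compares dotted numeric versions (e.g. "0.35.15.001") segment by segment.
  static int compareVersions(String a, String b) {
    String[] left = a.split("\\.");
    String[] right = b.split("\\.");
    int n = Math.max(left.length, right.length);
    for (int i = 0; i < n; i++) {
      long l = i < left.length ? Long.parseLong(left[i]) : 0;
      long r = i < right.length ? Long.parseLong(right[i]) : 0;
      if (l != r) {
        return Long.compare(l, r);
      }
    }
    return 0;
  }

  // Polls until the current version reaches the minimum, or gives up after timeoutMs.
  static void awaitMinimumVersion(Supplier<String> currentVersion,
                                  String minimumVersion,
                                  long timeoutMs,
                                  long pollMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (true) {
      String current = currentVersion.get(); // null means "not initialised yet"
      if (current != null && compareVersions(current, minimumVersion) >= 0) {
        return;
      }
      if (System.currentTimeMillis() >= deadline) {
        throw new IllegalStateException("Database did not reach migration version "
            + minimumVersion + " within " + timeoutMs + " ms");
      }
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting for the database", e);
      }
    }
  }
}
```

Bounding the wait with a deadline reflects the change from retrying forever to giving up after the configured timeout.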
* Update platform containers to use non-root users
* Update kube template for the webapp container to use port 8080
After having updated the webapp nginx image to expose port 8080 instead of 80
* missing 80 -> 8080 changes
Co-authored-by: alafanechere <augustin.lafanechere@gmail.com>
* Rename GcsStorageBucket to GcsLogBucket.
* Update all references to GCP_STORAGE_BUCKET to GCS_LOG_BUCKET.
* Undo this for configuration files for older Airbyte versions.
* Clean up Job env vars. (#8462)
* Rename MAX_SYNC_JOB_ATTEMPTS to SYNC_JOB_MAX_ATTEMPTS.
* Rename MAX_SYNC_TIMEOUT_DAYS to SYNC_JOB_MAX_TIMEOUT_DAYS.
* Rename WORKER_POD_TOLERATIONS to JOB_POD_TOLERATIONS.
* Rename WORKER_POD_NODE_SELECTORS to JOB_POD_NODE_SELECTORS.
* Rename JOB_IMAGE_PULL_POLICY to JOB_POD_MAIN_CONTAINER_IMAGE_PULL_POLICY.
* Rename JOBS_IMAGE_PULL_SECRET to JOB_POD_MAIN_CONTAINER_IMAGE_PULL_SECRET.
* Rename JOB_SOCAT_IMAGE to JOB_POD_SOCAT_IMAGE.
* Rename JOB_BUSYBOX_IMAGE to JOB_POD_BUSYBOX_IMAGE.
* Rename JOB_CURL_IMAGE to JOB_POD_CURL_IMAGE.
* Rename KUBE_NAMESPACE to JOB_POD_KUBE_NAMESPACE.
* Rename RESOURCE_CPU_REQUEST to JOB_POD_MAIN_CONTAINER_CPU_REQUEST.
* Rename RESOURCE_CPU_LIMIT to JOB_POD_MAIN_CONTAINER_CPU_LIMIT.
* Rename RESOURCE_MEMORY_REQUEST to JOB_POD_MAIN_CONTAINER_MEMORY_REQUEST.
* Rename RESOURCE_MEMORY_LIMIT to JOB_POD_MAIN_CONTAINER_MEMORY_LIMIT.
* Remove worker suffix from created pods to reduce confusion with actual worker pods.
* Use sync instead of worker to name job pods.
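For deployments that still set the old names, one possible way to stay compatible is to look up the new name first and fall back to the old one. The sketch below is an assumption about how such a fallback could look, not something this change is known to implement; `RenamedEnvVars` and `resolve` are hypothetical, and the map covers only a subset of the renames listed above.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical fallback lookup for renamed env vars (subset of the renames above).
final class RenamedEnvVars {

  // New name -> old name.
  static final Map<String, String> NEW_TO_OLD = Map.of(
      "SYNC_JOB_MAX_ATTEMPTS", "MAX_SYNC_JOB_ATTEMPTS",
      "SYNC_JOB_MAX_TIMEOUT_DAYS", "MAX_SYNC_TIMEOUT_DAYS",
      "JOB_POD_TOLERATIONS", "WORKER_POD_TOLERATIONS",
      "JOB_POD_NODE_SELECTORS", "WORKER_POD_NODE_SELECTORS",
      "JOB_POD_KUBE_NAMESPACE", "KUBE_NAMESPACE");

  // Prefers the new name; falls back to the old name if only that one is set.
  static Optional<String> resolve(String newName, Map<String, String> env) {
    String value = env.get(newName);
    if (value == null && NEW_TO_OLD.containsKey(newName)) {
      value = env.get(NEW_TO_OLD.get(newName));
    }
    return Optional.ofNullable(value);
  }
}
```

In a running process the `env` argument would typically be `System.getenv()`; passing a map keeps the lookup easy to test.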
* Created skeleton of a migration utility module.
* Secrets migration hello world functional
* Added dependencies for secrets migration
* Create Secrets store migration utility and related tests.
* Make secrets migration work in kube
* Make pod for secrets migration give right health result to kube
* docker-compose split of scheduler and worker
* fix heartbeat location bug + add support for kubernetes
* use two workers in integration tests
* capture logs in AirbyteTestContainer
* add waiting
* rename to make it easier to review
* rename module
* fix remaining conflicts
* allow configuring max workers of each type and document usage
* fix build
* remove comment
* add worker resource requirements
* try to fix for connector build
* fix regression in build
* add env comments for SUBMITTER_NUM_THREADS
* Update airbyte-workers/src/main/java/io/airbyte/workers/WorkerApp.java
Co-authored-by: Davin Chia <davinchia@gmail.com>
* Update airbyte-workers/src/main/java/io/airbyte/workers/temporal/TemporalPool.java
Co-authored-by: Davin Chia <davinchia@gmail.com>
* merge temporalpool into workerapp
* output docker system info
* move check to before
* remove unnecessary parts of the patch
* could this be the problem? i thought i added this
* show disk usage
* add print statements
* add pruning
* fix prune option
* use force
Co-authored-by: Davin Chia <davinchia@gmail.com>
Get rid of the workspace volume on Kube. This used to contain logs and configs. Since we've moved configs into the config db, it now only contains logs. However, on Kube we log to external storage, which means we can get rid of this volume.
- Set env specific log4j2 context map keys and modify our log4j configuration to publish to cloud/local depending on those keys.
- In the process, I discovered a bug with how we were creating the Minio client - that meant Kube deployments with Minio were almost certainly all using the local workspace volume for logs instead of Minio. Fixed this.
* allow scheduler and server to run on separate nodes
* re-add workspace mount for docker compose only
* remove stacktrace printing
* add affinity testing components
* reorder mounts
* just try a two node cluster
* add waiting log line
* seed containers are now axed
* remove unused var
* add comment
* rename to integration-test
* use kube service user for pod sweeper
* add pod sweeper logs
* temporarily switch to stable for testing
* temporarily remove building steps for kube testing since it can use prod images
* output date strings from date command
* load stable images
* remove loading since it can pull the images
* increase window for success storage to two hours
* revert test logging changes
Implement logging to and reading from Minio. Use the same S3 client for this.
Configure Airbyte Kube Prod and Staging to use Minio by default, so Airbyte Kube is a standalone deployment.
Also update documentation to reflect this.
* introduce automatic migration at the startup of server
* handle versions with non-zero patch
* it works!!!
* add dummy data
* cleanup orphan configs
* add more assertions
* format + add comments
* move migration acceptance test to acceptance test directory
* add automatic migration test to the build
* address review comments
* missed out on these
* format
* add more assertions
* format
* fix test
* format
* use default port for temporal
* move seed to server + introduce atomic replacement for config
* make tests better
* remove unwanted changes
* move atomic replacement logic behind persistence + pass path to latest seeds
* format
* update seeds
* review comments
* update seeds
* merge latest seeds with configs
* fix bug around latest seed
* update seed
* update seed
* seeds should be populated by separate container
* address review comment + change latest definition url
* update seeds
* format
* update seed references
* update seed
* update seed
* update seed
* update seed references
* update seed references + add Migration Acceptance Test
* update seed container in kube + disable automatic migration for kube + update docs
* update docs
* address review comments from Michel
* update doc
* temporary commit to see if build becomes green
* delete seeds from airbyte config + undo temp commit
* Specify kube namespace
* remove work that turned out to be unnecessary and add debugging statement
* make it possible to run without configuring s3
* cleanup
* use default as default namespace
Co-authored-by: jrhizor <me@jaredrhizor.com>
Use Log4j2 appender to support routing logs to S3.
Create LogClient to support reading from S3.
Some clean up of the Log4j2 xml variables.
Several dependency changes to be more explicit when configuring jackson.
* working except for too much logging and bad success case
* succeeds on passing case
* completes successfully
* just doesn't kill the main
* working zombie killing
* cleanup
* more cleanup
* use correct path
* fmt
* cleanups, bugfixes, integration tests
* run worker integration tests as part of ci
* delete tester class
* fix hanging checkpoint container problem
* fix name of command
* replace todo with clarifying comment
* Use CDK to generate source that can be configured to emit a certain number of records and always works.
* Checkpoint: socat works from inside the docker container.
* Override the entry point.
* Clean up and add ReadMe.
* Clean up socat.
* Checkpoint: connect to Kube cluster and list all the pods.
* Checkpoint: Sync worker pod is able to send output to the destination pod.
* Checkpoint: Sync worker creates Dest pod if none existed previously. It also waits for the pod to be ready before doing anything else. Sync worker will also remove the pod on termination.
* update readme
* Checkpoint: Dest pod does not restart after finishing. Comment out delete command in Sync worker.
* working towards named pipes
* named pipes working
* update readme
* WIP named pipe / socat sidecar kube port forwarding (#3518)
* nearly working sources
* update
* stdin example
* move all kube testing yamls into the airbyte-workers directories. sort the airbyte-workers resource folder; place all the poc yamls together.
* Format.
* Put back the original KubeProcessBuilderFactory.
* Fix slight errors.
* Checkpoint: Worker pod knows its own IP. Successfully starts and writes to Dest pod after refactor.
* remove unused file and update readme
* Dest pod loops back into worker pod. However, the right messages do not seem to be passing in.
* Switch back to worker ip.
* SWEET VICTORY!
* wrap kube pod in process (#3540)
also clean up kubernetes deploys.
* More clean up. (#3586)
The first 6 points of #3464.
The only interesting thing about this PR is the kube pod shutdown. For whatever reason, the OkHttpPool isn't respecting the evictAll call and 1 idle thread remains. So instead of shutting down immediately, the worker pod shuts down after 5 mins when the idle thread is reaped. There isn't an easy way to modify the pool's idle reap configuration now. I do not think this issue is blocking since it's relatively benign, so I vote we create a ticket and come back to this once we do an e2e test.
* Implements redirecting standard error as well. (#3623)
* Clean up before next implementation.
* kube process launching (#3790)
* processes must handle file mounting
* remove comment
* default to base entrypoint
* use process builder factory / select stdin / use a pool of ports
* fix up
* add super hacky copying example
* Checkpoint: Works end to end!
* Checkpoint: Use API to make sure init container is ready instead of blind sleep. Propagate exception in DefaultCheckConnectionWorker.
* Refactor KubePodProcess. Checked to make sure everything still works.
* Format.
* Clean up code. Begin putting this into variables and breaking up long constructor function.
* Add comments to explain what is happening.
* fix normalization test
* increase timeout for initcontainer
Co-authored-by: Davin Chia <davinchia@gmail.com>
* facepalm moment
* clean up kube poc pr (#3834)
* clean up
* remove source-always-works
* create separate commons-docker
* fix test
* enable kube e2e tests (#3866)
* enable kube e2e tests
* use more generally accepted env definition
* use new runners
* use its own runner and install minikube differently
* update name
* use kubectl alias
* use link instead of alias that doesn't propagate
* start minikube
* use driver=none
* go back to using action
* mess with versions
* revert runner
* install socat
* print logs after run
* also try re-running tasks
* always wait for file transfer
* use ports
* increase wait timeout for kube
* use different localhost ips and bump normalization to include an entrypoint
* proposed fix
* all working locally
* revert temporary changes
* revert normalization image change that's happening in a separate pr
* readability
* final comment
* Working Kube Cancel. (#3983)
* Port over the basic changes.
* Add logic to return proper exit code in the event of termination. Add comments to explain why.
* revert envs change and merge master to fix kube acceptance tests (#4012)
* use older env format
* fix build
Co-authored-by: jrhizor <me@jaredrhizor.com>
Co-authored-by: Jared Rhizor <jared@dataline.io>
* updating .env files
* adding temporal and bumping K8s-supported version
* adding temporal yaml manifest
* updating scheduler and server manifests to include envars