54 Commits

Author SHA1 Message Date
Lake Mossman
fc5ba66e1c Restart workflow after activity failure instead of quarantining (#13779)
* use SHORT_ACTIVITY_OPTIONS on check connection activity so that it has retries

* retry workflow after delay instead of quarantining

* allow activity env vars to be configured in docker and kube

* add env var for workflow restart delay and refactor slightly

* update tests to handle new restart behavior

* update test name

* add empty env var values to .env files

* fail attempt before job in cleanJobState to prevent state machine failure

* change default value of max activity attempt retries from 10 to 5
2022-06-15 16:07:47 -07:00
Xiaohan Song
05472b807a Kube fix and Docs for open telemetry integration (#13701)
* Create interface, factory for metric client

* remove unused func

* change count val to use long

* PR fix

* otel metric client implementation

* merge conflicts resolve

* build fix

* add a test, moved version into deps catalog

* fix test

* add docs for open telemetry

* fix kube setting for otel, and add doc

* helm related fields update for opentel
2022-06-14 09:17:46 -07:00
Lake Mossman
73034c64da Sweep old scheduler code (#13400)
* sweep all scheduler application code and new-scheduler conditional logic

* remove airbyte-scheduler from deployments and docs

* format

* remove 'v2' from github actions

* add back scheduler in delete deployment command

* remove scheduler parameters from helm chart values

* add back job cleaner + test and add comment

* remove now-unused env vars from code and docs

* format

* remove feature flags from web backend connection handler as it is no longer needed

* remove feature flags from config api as it is no longer needed

* remove feature flags input from config api test

* format + shorter url

* remove scheduler parameters from helm chart readme
2022-06-06 10:49:17 -07:00
Lake Mossman
88390f24ea Improve kube deploy process. (#13397)
* switch bootloader to job and add recreate strategy to db

* add comment about recreate strategy

* Add comment about ttl on bootloader

* add comment about Pod vs Job
2022-06-02 15:27:55 -07:00
Jonathan Pearlin
e06a9de60f Use Database Availability/Initialization Check (#13178)
* Use isolated database initialization logic

* Address PMD warnings

* Use test provider where possible

* Initialize database on bootloader load

* Combine availability and migration checks

* Ensure env vars are set

* Fix typo

* Avoid duplicate literals

* Add log message

* Use correct data source

* Revert change

* Update copyright

* Remove redundant exception catch/throw
2022-05-27 09:47:33 -04:00
Prasanna Ram Venkatachalam
f1c6f1964c Pod Sweeper: fix parsing missing date (#11781) 2022-04-26 15:19:32 -03:00
Thibaud Chardonnens
5528d7d874 airbyte-workers: add support for kubernetes pod annotations (#10753)
This PR adds the possibility to define pod annotations to the pods created by the workers.
Pod annotations can be required in different situations, such as configuring which IP pool to use when using some network plugins.

The original PR was here: #9874 we decided to split it into 3 different PRs.
2022-04-05 17:44:59 +08:00
Jared Rhizor
77ea0e0d33 sweep pods from end time not start time (#10614) 2022-02-24 09:04:13 -08:00
Jared Rhizor
db4093277f async container launch kubernetes "process" (#9242)
* add misc todos

* save work so far

* configure async pod processing

* remove comment

* fmt

* working except logging propagation?

* add comment

* add logging and misc configuration fixes

* add output propagation

* fix state reading

* logging is working (but background highlighting is not)

* fix log highlighting

* use sys instead of ctx

* comment

* clean up and test state management

* clean up orchestrator app construction

* unify launcher workers and handle resuming

* respond to comments

* misc

* disable

* fix comment

* respond to comments
2022-01-20 07:56:06 -08:00
Augustin
c51fb7afe6 Improve JOB_POD variable naming + improve doc about memory management (#9048) 2021-12-23 18:42:13 +01:00
Davin Chia
60e32373e8 Revert "Revert "Switch to use Bootloader. (#8584)" (#8778)" (#8790)
This reverts commit 216501b4fa.

Turn this back on since this was originally reverted for logging update changes.
2021-12-15 14:09:43 +08:00
Davin Chia
216501b4fa Revert "Switch to use Bootloader. (#8584)" (#8778)
This reverts commit 5cf3967424.
2021-12-14 21:36:12 +08:00
Davin Chia
5cf3967424 Switch to use Bootloader. (#8584)
- Add the CONFIGS_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION and JOBS_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION. These are env vars that will determine if the database is ready for an application to start.
- Add the CONFIGS_DATABASE_INITIALIZATION_TIMEOUT_MS and the JOBS_DATABASE_INITIALIZATION_TIMEOUT_MS env vars to determine how long an application should wait for the DB before giving up.
- Create the MinimumFlywayMigrationVersionCheck class. This class contains all the assertions to check that 1) a database is initialised and 2) a database meets the minimum migration version.
- Remove all set up operations from the ServerApp. Use MinimumFlywayMigrationVersionCheck operations instead.
- I also had to modify the Databases and BaseDatabaseInstance classes to support connecting to a database with timeouts. We would previously try forever.
- Add Bootloader to the relevant docker files and Kube files.
- Clean up the migration acceptance tests so it's clear what is happening.
2021-12-14 21:30:18 +08:00
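The readiness check described in the commit above (wait for the database up to a timeout, then require a minimum migration version) can be illustrated with a minimal Python sketch. This is not the project's Java implementation: the function names, the `get_current_version` callback, and the assumption that migration versions are dot-separated integers are all hypothetical and chosen only for illustration.

```python
import time


def parse_version(version):
    """Turn a dot-separated version string into a comparable tuple of ints.

    Assumption: versions look like "0.30.22.001"; this format is not
    confirmed by the commit message.
    """
    return tuple(int(part) for part in version.split("."))


def wait_for_minimum_version(get_current_version, minimum_version, timeout_ms,
                             poll_interval_ms=100, sleep=time.sleep):
    """Poll until the database reports at least minimum_version.

    get_current_version is a hypothetical callback that returns the current
    migration version as a string, returns None if the database is not yet
    initialised, or raises ConnectionError if it cannot be reached.
    Raises TimeoutError once timeout_ms has elapsed without success.
    """
    deadline = time.monotonic() + timeout_ms / 1000.0
    required = parse_version(minimum_version)
    while True:
        try:
            current = get_current_version()
        except ConnectionError:
            current = None  # treat an unreachable database as "not ready yet"
        if current is not None and parse_version(current) >= required:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "database did not reach version %s within %d ms"
                % (minimum_version, timeout_ms)
            )
        sleep(poll_interval_ms / 1000.0)
```

In this sketch the timeout plays the role of the `*_DATABASE_INITIALIZATION_TIMEOUT_MS` variables and the minimum version plays the role of the `*_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION` variables named in the commit; how the real classes read and apply them is not shown here.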
Jared Rhizor
58475ce2a4 Revert "Update platform containers to use non-root users (#7872)" (#8611)
This reverts commit ebcaf2bcad.
2021-12-07 21:55:06 -08:00
Per-Victor Persson
ebcaf2bcad Update platform containers to use non-root users (#7872)
* Update platform containers to use non-root users

* Update kube template for the webapp container to use port 8080

After having updated the webapp nginx image to expose port 8080 instead of 80

* missing 80 -> 8080 changes

Co-authored-by: alafanechere <augustin.lafanechere@gmail.com>
2021-12-07 21:40:32 -08:00
Davin Chia
341f505a94 Rename env vars for better readability. (#8447)
* Rename GcsStorageBucket to GcsLogBucket.

* Update all references to GCP_STORAGE_BUCKET to GCS_LOG_BUCKET.

* Undo this for configuration files for older Airbyte versions.

* Clean up Job env vars. (#8462)

* Rename MAX_SYNC_JOB_ATTEMPTS to SYNC_JOB_MAX_ATTEMPTS.

* Rename MAX_SYNC_TIMEOUT_DAYS to SYNC_JOB_MAX_TIMEOUT_DAYS.

* Rename WORKER_POD_TOLERATIONS to JOB_POD_TOLERATIONS.

* Rename WORKER_POD_NODE_SELECTORS to JOB_POD_NODE_SELECTORS.

* Rename JOB_IMAGE_PULL_POLICY to JOB_POD_MAIN_CONTAINER_IMAGE_PULL_POLICY.

* Rename JOBS_IMAGE_PULL_SECRET to JOB_POD_MAIN_CONTAINER_IMAGE_PULL_SECRET.

* Rename JOB_SOCAT_IMAGE to JOB_POD_SOCAT_IMAGE.

* Rename JOB_BUSYBOX_IMAGE to JOB_POD_BUSYBOX_IMAGE.

* Rename JOB_CURL_IMAGE to JOB_POD_CURL_IMAGE.

* Rename KUBE_NAMESPACE to JOB_POD_KUBE_NAMESPACE.

* Rename RESOURCE_CPU_REQUEST to JOB_POD_MAIN_CONTAINER_CPU_REQUEST.

* Rename RESOURCE_CPU_LIMIT to JOB_POD_MAIN_CONTAINER_CPU_LIMIT.

* Rename RESOURCE_MEMORY_REQUEST to JOB_POD_MAIN_CONTAINER_MEMORY_REQUEST.

* Rename RESOURCE_MEMORY_LIMIT to JOB_POD_MAIN_CONTAINER_MEMORY_LIMIT.

* Remove worker suffix from created pods to reduce confusion with actual worker pods.

* Use sync instead of worker to name job pods.
2021-12-03 23:28:48 +08:00
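The renames listed in the commit above can be collected into a lookup table. The following Python sketch is only a convenience for updating an older set of environment values; the mapping is copied from the commit message, while the `rename_env_vars` helper and its "new name wins" rule are assumptions made for illustration and are not part of the project.

```python
# Old-to-new names, copied from the rename list in the commit message.
RENAMED_ENV_VARS = {
    "GCP_STORAGE_BUCKET": "GCS_LOG_BUCKET",
    "MAX_SYNC_JOB_ATTEMPTS": "SYNC_JOB_MAX_ATTEMPTS",
    "MAX_SYNC_TIMEOUT_DAYS": "SYNC_JOB_MAX_TIMEOUT_DAYS",
    "WORKER_POD_TOLERATIONS": "JOB_POD_TOLERATIONS",
    "WORKER_POD_NODE_SELECTORS": "JOB_POD_NODE_SELECTORS",
    "JOB_IMAGE_PULL_POLICY": "JOB_POD_MAIN_CONTAINER_IMAGE_PULL_POLICY",
    "JOBS_IMAGE_PULL_SECRET": "JOB_POD_MAIN_CONTAINER_IMAGE_PULL_SECRET",
    "JOB_SOCAT_IMAGE": "JOB_POD_SOCAT_IMAGE",
    "JOB_BUSYBOX_IMAGE": "JOB_POD_BUSYBOX_IMAGE",
    "JOB_CURL_IMAGE": "JOB_POD_CURL_IMAGE",
    "KUBE_NAMESPACE": "JOB_POD_KUBE_NAMESPACE",
    "RESOURCE_CPU_REQUEST": "JOB_POD_MAIN_CONTAINER_CPU_REQUEST",
    "RESOURCE_CPU_LIMIT": "JOB_POD_MAIN_CONTAINER_CPU_LIMIT",
    "RESOURCE_MEMORY_REQUEST": "JOB_POD_MAIN_CONTAINER_MEMORY_REQUEST",
    "RESOURCE_MEMORY_LIMIT": "JOB_POD_MAIN_CONTAINER_MEMORY_LIMIT",
}


def rename_env_vars(env):
    """Return a copy of env with old variable names replaced by new ones.

    Hypothetical helper: if both the old and the new name are present,
    the value already stored under the new name is kept.
    """
    # Keep everything that is not an old name unchanged.
    result = {key: value for key, value in env.items()
              if key not in RENAMED_ENV_VARS}
    # Move values from old names to new names, without overwriting.
    for old_name, new_name in RENAMED_ENV_VARS.items():
        if old_name in env and new_name not in result:
            result[new_name] = env[old_name]
    return result
```

For example, passing `{"KUBE_NAMESPACE": "default"}` yields `{"JOB_POD_KUBE_NAMESPACE": "default"}`. Per the commit message, configuration files for older Airbyte versions kept the old names, so a helper like this would apply only when moving to a version that includes the rename.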
Jared Rhizor
1c1e402b3e remove minio hostport (#8310) 2021-11-29 07:46:16 -08:00
Artem Astapenko
fea3bf6798 Remove papercups (#6955)
* Remove papercups

* Remove support chat
2021-10-13 15:59:28 +03:00
Artem Astapenko
d0d92011e6 Remove openreplay (#6940)
* Remove openreplay

* Add lock file. Fix openreplay import
2021-10-13 15:17:59 +03:00
Mario Molina
febeb521ce 🎉 Configurable job pull image policy in k8s (#6827) 2021-10-06 15:04:50 +08:00
Jared Rhizor
35934f6c3c remove secrets migration from kube deployments (#6590) 2021-09-30 14:12:09 -07:00
Jenny Brown
eb04c1aec1 Secrets Migration utility (#6489)
* Created skeleton of a migration utility module.
* Secrets migration hello world functional
* Added dependencies for secrets migration
* Create Secrets store migration utility and related tests.
* Make secrets migration work in kube
* Make pod for secrets migration give right health result to kube
2021-09-29 11:33:38 -05:00
Prasanna Ram Venkatachalam
82cb353a39 inject tolerations and add node selectors for worker pods (#6123)
* inject worker pod tolerations env to deployments

* add support for worker pod node selectors
2021-09-20 06:34:39 -07:00
mohammad-bolt
332687a18b add secrets to kubernetes yamls (#5962) 2021-09-10 09:14:30 -07:00
Jared Rhizor
c7d8055731 split scheduler and worker (#5737)
* docker-compose split of scheduler and worker

* fix heartbeat location bug + add support for kubernetes

* use two workers in integration tests

* capture logs in AirbyteTestContainer

* add waiting

* rename to make it easier to review

* rename module

* fix remaining conflicts

* allow configuring max workers of each type and document usage

* fix build

* remove comment

* add worker resource requirements

* try to fix for connector build

* fix regression in build

* add env comments for SUBMITTER_NUM_THREADS

* Update airbyte-workers/src/main/java/io/airbyte/workers/WorkerApp.java

Co-authored-by: Davin Chia <davinchia@gmail.com>

* Update airbyte-workers/src/main/java/io/airbyte/workers/temporal/TemporalPool.java

Co-authored-by: Davin Chia <davinchia@gmail.com>

* merge temporalpool into workerapp

* output docker system info

* move check to before

* remove unnecessary parts of the patch

* could this be the problem? i thought i added this

* show disk usage

* add print statements

* add pruning

* fix prune option

* use force

Co-authored-by: Davin Chia <davinchia@gmail.com>
2021-09-08 11:27:32 -07:00
Davin Chia
8353eea00c 🔪 : Get rid of Kube Workspace Volume. (#5663)
Get rid of the workspace volume on Kube. This used to contain logs and configs. Since we've moved things into the config db, this now only contains logs. However, on Kube we log to external storage, which means we can get rid of this.

- Set env specific log4j2 context map keys and modify our log4j configuration to publish to cloud/local depending on those keys.
- In the process, I discovered a bug with how we were creating the Minio client - that meant Kube deployments with Minio were almost certainly all using the local workspace volume for logs instead of Minio. Fixed this.
2021-09-02 19:00:49 +08:00
Jared Rhizor
164d13866e allow scheduler and server to run on separate nodes (#5506)
* allow scheduler and server to run on separate nodes

* re-add workspace mount for docker compose only

* remove stacktrace printing

* add affinity testing components

* reorder mounts

* just try a two node cluster

* add waiting log line

* seed containers are now axed

* remove unused var

* add comment

* rename to integration-test
2021-08-26 15:51:16 -04:00
Davin Chia
d9ec412e68 Remove seed from kube manifest. (#5491) 2021-08-19 21:13:40 +08:00
Davin Chia
23dc777f8a Kube QOL Improvements. (#4968)
* Use cluster ip when possible.

* Removed noisy logging.
2021-07-27 12:05:44 +08:00
Artem Astapenko
2aee3233bb Add openreplay (#4685)
* Add openreplay

* Add env variables for openreplay

* Add openreplay env for k8s
2021-07-20 06:14:52 +07:00
Davin Chia
8034aa5e57 Make number of Concurrent Jobs configurable. (#4687) 2021-07-19 20:17:59 +08:00
Jared Rhizor
baed7b4997 use kube service user for pod sweeper (#4737)
* use kube service user for pod sweeper

* add pod sweeper logs

* temporarily switch to stable for testing

* temporarily remove building steps for kube testing since it can use prod images

* output date strings from date command

* load stable images

* remove loading since it can pull the images

* increase window for success storage to two hours

* revert test logging changes
2021-07-14 16:29:10 -07:00
Davin Chia
e2074a4dc1 Logging to GCS. (#4501)
Add the ability to log to GCS.
2021-07-07 21:06:25 +08:00
George Claireaux
2ca4f9441b 🐛 Kubernetes: Fix server starting up before Temporal ready to operate (#4567)
* refactor waitForTemporalServerAndLog into TemporalUtils

* remove use of docker-compose-wait
2021-07-07 09:51:00 +01:00
Christophe Duong
af57170d21 Deploy a sweeper pod to clear completed pod histories (#4500)
* Deploy a sweeper pod to clear completed pod histories
2021-07-07 10:32:09 +02:00
Christophe Duong
53b02e0499 Tweak .env in kube/overlays to connect to external db (#4403)
* Mirror .env to connect to external db
2021-07-01 19:27:33 +02:00
Christophe Duong
8b6093ce61 Configure kube pod resources for workers/syncs (#4381)
* Configure kube pod resources for workers/syncs
2021-07-01 12:25:14 +02:00
Jared Rhizor
19020d97b3 preserve kube db after deleting db pod (#4424) 2021-06-29 22:47:07 -07:00
Davin Chia
52bd5c96f0 🎉 Add Minio support to Kube. (#4365)
Implement logging to and reading from Minio. Use the same S3 client for this.

Configure Airbyte Kube Prod and Staging to use Minio by default, so Airbyte Kube is a standalone deployment.

Also update documentation to reflect this.
2021-06-30 09:42:50 +08:00
Subodh Kant Chaturvedi
0e194fb331 use old seed mount (#4413) 2021-06-30 01:21:48 +05:30
Subodh Kant Chaturvedi
887752822c 🎉 introduce automatic migration at the startup of server for docker environment (#3980)
* introduce automatic migration at the startup of server

* handle versions with non-zero patch

* it works!!!

* add dummy data

* cleanup orphan configs

* add more assertions

* format + add comments

* move migration acceptance test to acceptance test directory

* add automatic migration test to the build

* address review comments

* missed out on these

* format

* add more assertions

* format

* fix test

* format

* use default port for temporal

* move seed to server + introduce atomic replacement for config

* make tests better

* remove unwanted changes

* move atomic replacement logic behind persistence + pass path to latest seeds

* format

* update seeds

* review comments

* update seeds

* merge latest seeds with configs

* fix bug around latest seed

* update seed

* update seed

* seeds should be populated by separate container

* address review comment + change latest definition url

* update seeds

* format

* update seed references

* update seed

* update seed

* update seed

* update seed references

* update seed references + add Migration Acceptance Test

* update seed container in kube + disable automatic migration for kube + update docs

* update docs

* address review comments from Michel

* update doc

* temporary commit to see if build becomes green

* delete seeds from airbyte config + undo temp commit
2021-06-29 23:50:00 +05:30
Christophe Duong
997344299a Specify kube namespace (#4317)
* Specify kube namespace

* remove work that turned out to be unnecessary and add debugging statement

* make it possible to run without configuring s3

* cleanup

* use default as default namespace

Co-authored-by: jrhizor <me@jaredrhizor.com>
2021-06-24 17:18:11 -07:00
Davin Chia
e60f5369e0 🎉 Kube Logs support storing to and reading from S3. (#4053)
Use Log4j2 appender to support routing logs to S3.

Create LogClient to support reading from S3.

Some clean up of the Log4j2 xml variables.

Several dependency changes to be more explicit when configuring jackson.
2021-06-19 12:15:15 +08:00
Jared Rhizor
ef853153a5 kube zombie handling (#4137)
* working except for too much logging and bad success case

* succeeds on passing case

* completes successfully

* just doesn't kill the main

* working zombie killing

* cleanup

* more cleanup

* use correct path

* fmt

* cleanups, bugfixes, integration tests

* run worker integration tests as part of ci

* delete tester class

* fix hanging checkpoint container problem

* fix name of command

* replace todo with clarifying comment
2021-06-17 17:51:58 -07:00
Jared Rhizor
991cb68884 improve temporal configuration on kubernetes (#4183)
* use proper dynamic config

* fmt

* clarify comment
2021-06-17 17:50:50 -07:00
Jared Rhizor
34a695829e add fullstory (#4171)
* add fullstory

* fixes

* fix
2021-06-16 19:57:23 -07:00
Davin Chia
b04c080c95 Kube Queueing POC (#3464)
* Use CDK to generate source that can be configured to emit a certain number of records and always works.

* Checkpoint: socat works from inside the docker container.

* Override the entry point.

* Clean up and add ReadMe.

* Clean up socat.

* Checkpoint: connect to Kube cluster and list all the pods.

* Checkpoint: Sync worker pod is able to send output to the destination pod.

* Checkpoint: Sync worker creates Dest pod if none existed previously. It also waits for the pod to be ready before doing anything else. Sync worker will also remove the pod on termination.

* update readme

* Checkpoint: Dest pod does not restart after finishing. Comment out delete command in Sync worker.

* working towards named pipes

* named pipes working

* update readme

* WIP named pipe / socat sidecar kube port forwarding (#3518)

* nearly working sources

* update

* stdin example

* move all kube testing yamls into the airbyte-workers directories. sort the airbyte-workers resource folder; place all the poc yamls together.

* Format.

* Put back the original KubeProcessBuilderFactory.

* Fix slight errors.

* Checkpoint: Worker pod knows its own IP. Successfully starts and writes to Dest pod after refactor.

* remove unused file and update readme

* Dest pod loops back into worker pod. However, the right messages do not seem to be passing in.

* Switch back to worker ip.

* SWEET VICTORY!.

* wrap kube pod in process (#3540)

also clean up kubernetes deploys.

* More clean up. (#3586)

The first 6 points of #3464.

The only interesting thing about this PR is the kube pod shutdown. For whatever reason, the OkHttpPool isn't respecting the evictAll call and 1 idle thread remains. So instead of shutting down immediately, the worker pod shuts down after 5 mins when the idle thread is reaped. There isn't an easy way to modify the pool's idle reap configuration now. I do not think this issue is blocking since it's relatively benign, so I vote we create a ticket and come back to this once we do an e2e test.

* Implements redirecting standard error as well. (#3623)

* Clean up before next implementation.

* kube process launching (#3790)

* processes must handle file mounting

* remove comment

* default to base entrypoint

* use process builder factory / select stdin / use a pool of ports

* fix up

* add super hacky copying example

* Checkpoint: Works end to end!

* Checkpoint: Use API to make sure init container is ready instead of blind sleep. Propagate exception in DefaultCheckConnectionWorker.

* Refactor KubePodProcess. Checked to make sure everything still works.

* Format.

* Clean up code. Begin putting this into variables and breaking up long constructor function.

* Add comments to explain what is happening.

* fix normalization test

* increase timeout for initcontainer

Co-authored-by: Davin Chia <davinchia@gmail.com>

* facepalm moment

* clean up kube poc pr (#3834)

* clean up

* remove source-always-works

* create separate commons-docker

* fix test

* enable kube e2e tests (#3866)

* enable kube e2e tests

* use more generally accepted env definition

* use new runners

* use its own runner and install minikube differently

* update name

* use kubectl alias

* use link instead of alias that doesn't propagate

* start minikube

* use driver=none

* go back to using action

* mess with versions

* revert runner

* install socat

* print logs after run

* also try re-running tasks

* always wait for file transfer

* use ports

* increase wait timeout for kube

* use different localhost ips and bump normalization to include an entrypoint

* proposed fix

* all working locally

* revert temporary changes

* revert normalization image change that's happening in a separate pr

* readability

* final comment

* Working Kube Cancel. (#3983)

* Port over the basic changes.

* Add logic to return proper exit code in the event of termination. Add comments to explain why.

* revert envs change and merge master to fix kube acceptance tests (#4012)

* use older env format

* fix build

Co-authored-by: jrhizor <me@jaredrhizor.com>
Co-authored-by: Jared Rhizor <jared@dataline.io>
2021-06-09 18:12:39 -07:00
Jared Rhizor
581b27335f 🎉 allow users to access both the api and webapp from the same port (#3603)
* single port v2

* fix upstream location

* add kube support

* fix .env.dev

* set INTERNAL_API_HOST for kube
2021-05-26 13:57:34 -07:00
Marcos Marx
ee9be7c6eb fix format (#3300) 2021-05-08 08:45:50 -03:00
Coetzee van Staden
bed3e06d8a Bump K8s deployment version to latest stable version (#3246)
* updating .env files

* adding temporal and bumping K8s-supported version

* adding temporal yaml manifest

* updating scheduler and server manifests to include envars
2021-05-07 21:21:54 -03:00