mirror of synced 2026-01-01 09:02:59 -05:00
96 Commits

Author SHA1 Message Date
Lake Mossman
fc5ba66e1c Restart workflow after activity failure instead of quarantining (#13779)
* use SHORT_ACTIVITY_OPTIONS on check connection activity so that it has retries

* retry workflow after delay instead of quarantining

* allow activity env vars to be configured in docker and kube

* add env var for workflow restart delay and refactor slightly

* update tests to handle new restart behavior

* update test name

* add empty env var values to .env files

* fail attempt before job in cleanJobState to prevent state machine failure

* change default value of max activity attempt retries from 10 to 5
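The restart behavior listed above can be sketched roughly as follows. This is a minimal illustration in Python, not the Temporal workflow code from this commit: `run_with_restart`, `restart_delay_seconds`, `max_restarts`, and the injected `sleep` are hypothetical names, and the bounded restart count exists only so the sketch terminates. The default of 5 activity attempts is the only value taken from the commit message.

```python
import time
from typing import Callable


def run_with_restart(
    activity: Callable[[], str],
    max_activity_attempts: int = 5,       # default taken from the commit message
    restart_delay_seconds: float = 600.0,  # hypothetical value for the new delay env var
    max_restarts: int = 3,                 # hypothetical bound so the sketch terminates
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry an activity, then restart the whole run after a delay instead of giving up."""
    for restart in range(max_restarts + 1):
        # Activity-level retries: try the activity up to max_activity_attempts times.
        for _ in range(max_activity_attempts):
            try:
                return activity()
            except Exception:
                continue
        # All attempts failed: wait, then start over rather than quarantining.
        if restart < max_restarts:
            sleep(restart_delay_seconds)
    raise RuntimeError("activity kept failing after all restarts")
```

Passing `sleep` as a parameter keeps the delay testable without actually waiting.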
2022-06-15 16:07:47 -07:00
LiRen Tu
973f0b1165 Make connector adaptable based on deployment mode (#13522)
* Add deployment mode to env shared with jobs

* Add adaptive runners

* Migrate postgres source to use adaptive runner

* Add an array of specs in docker image spec definition

* Add copyright

* Parse docker image spec with specs list

* Update spec yaml files

* Pass in DEPLOYMENT_MODE to docker compose file

* Revert "Parse docker image spec with specs list"

This reverts commit 8fe41dd3b7.

* Revert changes in docker image spec

* Read cloud specific spec files based on deployment mode

* Revert "Update spec yaml files"

This reverts commit 059f326432.

* Publish cloud spec file if necessary

* Fix upload script

* Move test files

* Update docker compose file

* Format code

* Add comment about spec filename

* Add unit tests

* Remove redundant jdbc acceptance test

When running `PostgresStrictEncryptJdbcSourceAcceptanceTest`, the `discover` method tests always fail because there are unexpected columns in the catalog:
- `wakeup_at`
- `last_visited_at`
- `last_comment_at`

These columns only exist in `PostgresJdbcSourceAcceptanceTest`. And this failure cannot be reproduced locally.

The hypothesis is that when the JDBC unit tests are run on CI, they are run in parallel, and the same testcontainer is used for both tests. That's why the strict encrypt test can discover columns from the ordinary unit test.

Given that the JDBC strict encrypt test is basically redundant, it is removed.
2022-06-15 08:23:54 -07:00
Xiaohan Song
96149e2fe4 otel metric client implementation (#13473)
* Create interface, factory for metric client

* remove unused func

* change count val to use long

* PR fix

* otel metric client implementation

* merge conflicts resolve

* build fix

* add a test, moved version into deps catalog

* fix test
2022-06-08 14:39:09 -07:00
Lake Mossman
73034c64da Sweep old scheduler code (#13400)
* sweep all scheduler application code and new-scheduler conditional logic

* remove airbyte-scheduler from deployments and docs

* format

* remove 'v2' from github actions

* add back scheduler in delete deployment command

* remove scheduler parameters from helm chart values

* add back job cleaner + test and add comment

* remove now-unused env vars from code and docs

* format

* remove feature flags from web backend connection handler as it is no longer needed

* remove feature flags from config api as it is no longer needed

* remove feature flags input from config api test

* format + shorter url

* remove scheduler parameters from helm chart readme
2022-06-06 10:49:17 -07:00
Jonathan Pearlin
e06a9de60f Use Database Availability/Initialization Check (#13178)
* Use isolated database initialization logic

* Address PMD warnings

* Use test provider where possible

* Initialize database on bootloader load

* Combine availability and migration checks

* Ensure env vars are set

* Fix typo

* Avoid duplicate literals

* Add log message

* Use correct data source

* Revert change

* Update copyright

* Remove redundant exception catch/throw
2022-05-27 09:47:33 -04:00
terencecho
f4bb7b21b2 Add Auto-Disable Failing Connections feature (#11099)
* Add Disable Failing Connections feature

* Rename and cleanup

* list jobs based off connection id

* Move variables to env config and update unit tests

* Fix env flag name

* Fix missing name changes

* Add comments to unit test

* Address PR comments

* Support multiple config types

* Update unit tests

* Remove the attemptId notion in the connectionManagerWorkflow (#10780)

This is removing the attemptId from the create attempt activity to replace it with the attemptNumber. This will be modified in the workflow in a later commit.

* Revert "Remove the attemptId notion in the connectionManagerWorkflow (#10780)" (#11057)

This reverts commit 99338c852a.

* Revert "Revert "Remove the attemptId notion in the connectionManagerWorkflow (#10780)" (#11057)" (#11073)

This reverts commit 892dc7ec66.

* Revert "Revert "Revert "Remove the attemptId notion in the connectionManagerWorkflow (#10780)" (#11057)" (#11073)" (#11081)

This reverts commit e27bb74050.

* Add Disable Failing Connections feature

* Rename and cleanup

* Fix rebase

* only disable if first job is older than max days

* Return boolean for activity

* Return boolean for activity

* Add unit tests for ConnectionManagerWorkflow

* Utilize object output for activity and ignore non success or failed runs

* Utilize object output for activity and ignore non success or failed runs

Co-authored-by: Benoit Moriceau <benoit@airbyte.io>
2022-03-18 11:13:28 -07:00
Benoit Moriceau
2ac327486e Heartbeat for long running activity (#9852)
* Heartbeat for long running activity

* PR comments
2022-01-27 14:18:49 -08:00
Benoit Moriceau
4b148db54c Bmoric/deployment signal (#9799)
This forces the temporal sync workflow to fail. Using a signal didn't work because the cancellation scope didn't cancel the child workflow.
2022-01-26 10:55:47 -08:00
Benoit Moriceau
e7da9232bb Fix record count and add acceptance test to the new scheduler (#9487)
* Add a job notification

The new scheduler was missing a notification step after the job is done.

This is needed in order to report the number of records in a sync.

* Acceptance test with the new scheduler

Add a new github action task to run the acceptance tests with the new scheduler

* Retry on failure

* PR comments
2022-01-19 18:16:19 -08:00
Davin Chia
c9adee6178 Clean up Docker compose env vars. (#9209)
- sort docker env vars.
- remove all non-docker related env vars.
- add what is missing.

For the .env file:
- sort the file to match the Configs.java layout for better reading.
- get rid of env vars that are not used in docker
- get rid of env vars that have defaults, to prevent the env file from getting too large; the exceptions are vars used for scaling, e.g. submitter_num_threads and worker-related vars
- add a header to clarify when/where to add env vars to the file

For the docker compose file:
- sort the env vars alphabetically
- get rid of env vars that aren't used in that application
- add missing env vars into the worker application
2022-01-03 23:59:55 +08:00
Benoit Moriceau
389efbd23d Feature/new temporal scheduler (#8352)
This gets rid of scheduling with the scheduler app and the job submitter. It is replaced by a temporal workflow which will be responsible for scheduling the syncs on time.
2021-12-23 20:15:38 +01:00
Augustin
c51fb7afe6 Improve JOB_POD variable naming + improve doc about memory management (#9048) 2021-12-23 18:42:13 +01:00
Davin Chia
60e32373e8 Revert "Revert "Switch to use Bootloader. (#8584)" (#8778)" (#8790)
This reverts commit 216501b4fa.

Turn this back on since this was originally reverted for logging update changes.
2021-12-15 14:09:43 +08:00
Davin Chia
216501b4fa Revert "Switch to use Bootloader. (#8584)" (#8778)
This reverts commit 5cf3967424.
2021-12-14 21:36:12 +08:00
Davin Chia
5cf3967424 Switch to use Bootloader. (#8584)
- Add the CONFIGS_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION and JOBS_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION. These are env vars that will determine if the database is ready for an application to start.
- Add the CONFIGS_DATABASE_INITIALIZATION_TIMEOUT_MS and the JOBS_DATABASE_INITIALIZATION_TIMEOUT_MS env vars to determine how long an application should wait for the DB before giving up.
- Create the MinimumFlywayMigrationVersionCheck class. This class contains all the assertions to check if 1) a database is initialised. 2) a database meets the minimum migration version.
- Remove all set up operations from the ServerApp. Use MinimumFlywayMigrationVersionCheck operations instead.
- I also had to modify the Databases and BaseDatabaseInstance classes to support connecting to a database with timeouts. We would previously try forever.
- Add Bootloader to the relevant docker files and Kube files.
- Clean up the migration acceptance tests so it's clear what is happening.
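The readiness check described above (wait for the database up to a timeout, then require a minimum migration version) can be sketched as follows. This is a hedged Python illustration, not the `MinimumFlywayMigrationVersionCheck` class itself: `wait_for_minimum_migration`, `parse_version`, and the `current_version` callback are hypothetical, and the dotted numeric version format is an assumption.

```python
import time
from typing import Callable, Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '0.30.1' into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))


def wait_for_minimum_migration(
    current_version: Callable[[], Optional[str]],
    minimum_version: str,
    timeout_ms: int,
    poll_interval_ms: int = 100,
) -> str:
    """Poll until the database reports at least minimum_version, or raise on timeout.

    current_version is a hypothetical callback returning the latest applied
    migration version, or None while the database is not initialised yet.
    """
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        version = current_version()
        if version is not None and parse_version(version) >= parse_version(minimum_version):
            return version
        if time.monotonic() >= deadline:
            # Give up instead of retrying forever.
            raise TimeoutError(
                f"database did not reach migration version {minimum_version} "
                f"within {timeout_ms} ms"
            )
        time.sleep(poll_interval_ms / 1000)
```

In this sketch, `timeout_ms` plays the role of the `*_DATABASE_INITIALIZATION_TIMEOUT_MS` variables and `minimum_version` the role of the `*_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION` variables.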
2021-12-14 21:30:18 +08:00
Jared Rhizor
58475ce2a4 Revert "Update platform containers to use non-root users (#7872)" (#8611)
This reverts commit ebcaf2bcad.
2021-12-07 21:55:06 -08:00
Per-Victor Persson
ebcaf2bcad Update platform containers to use non-root users (#7872)
* Update platform containers to use non-root users

* Update kube template for the webapp container to use port 8080

After having updated the webapp nginx image to expose port 8080 instead of 80

* missing 80 -> 8080 changes

Co-authored-by: alafanechere <augustin.lafanechere@gmail.com>
2021-12-07 21:40:32 -08:00
Davin Chia
341f505a94 Rename env vars for better readability. (#8447)
* Rename GcsStorageBucket to GcsLogBucket.

* Update all references to GCP_STORAGE_BUCKET to GCS_LOG_BUCKET.

* Undo this for configuration files for older Airbyte versions.

* Clean up Job env vars. (#8462)

* Rename MAX_SYNC_JOB_ATTEMPTS to SYNC_JOB_MAX_ATTEMPTS.

* Rename MAX_SYNC_TIMEOUT_DAYS to SYNC_JOB_MAX_TIMEOUT_DAYS.

* Rename WORKER_POD_TOLERATIONS to JOB_POD_TOLERATIONS.

* Rename WORKER_POD_NODE_SELECTORS to JOB_POD_NODE_SELECTORS.

* Rename JOB_IMAGE_PULL_POLICY to JOB_POD_MAIN_CONTAINER_IMAGE_PULL_POLICY.

* Rename JOBS_IMAGE_PULL_SECRET to JOB_POD_MAIN_CONTAINER_IMAGE_PULL_SECRET.

* Rename JOB_SOCAT_IMAGE to JOB_POD_SOCAT_IMAGE.

* Rename JOB_BUSYBOX_IMAGE to JOB_POD_BUSYBOX_IMAGE.

* Rename JOB_CURL_IMAGE to JOB_POD_CURL_IMAGE.

* Rename KUBE_NAMESPACE to JOB_POD_KUBE_NAMESPACE.

* Rename RESOURCE_CPU_REQUEST to JOB_POD_MAIN_CONTAINER_CPU_REQUEST.

* Rename RESOURCE_CPU_LIMIT to JOB_POD_MAIN_CONTAINER_CPU_LIMIT.

* Rename RESOURCE_MEMORY_REQUEST to JOB_POD_MAIN_CONTAINER_MEMORY_REQUEST.

* Rename RESOURCE_MEMORY_LIMIT to JOB_POD_MAIN_CONTAINER_MEMORY_LIMIT.

* Remove worker suffix from created pods to reduce confusion with actual worker pods.

* Use sync instead of worker to name job pods.
2021-12-03 23:28:48 +08:00
Lake Mossman
8775cbe1c2 Make spec field required on definitions + remove SpecFetcher usage (#8140)
* make spec field required

* remove spec backfill logic

* remove usages of specFetcher.getSpec()

* remove unused code and the caching scheduler clients

* fix tests to work with fetching specs from definitions

* fetch spec from definition in config repository and fix RunMigrationTest

* remove unused SpecFetcher methods/tests

* run gw format

* run gw format

* undo change to main method

* add back newlines

* set additional properties to true on destination definition

* remove now-unused VERSION_0_32_0_FORCE_UPGRADE env var
2021-11-30 16:53:56 -08:00
Charles
c9642a4882 🐛 Fix Migration Bug in v0.31.0-alpha (#7923) 2021-11-12 11:04:49 -08:00
Benoit Moriceau
9d05b1c477 Faux Major Version Bump (#7876) 2021-11-11 13:40:09 -08:00
Benoit Moriceau
73a9131526 Revert "Custom auto-setup temporal docker image (#7681)" (#7835)
This reverts commit e20d98fe0a.
2021-11-10 15:09:24 -08:00
Benoit Moriceau
e20d98fe0a Custom auto-setup temporal docker image (#7681)
This is a custom auto-setup script for the temporal environment. Unfortunately there is no other way to properly update the DB without copy-pasting parts of the temporal auto-setup script. Ideally temporal would provide a dedicated container for its DB, but that is not the case right now.
2021-11-10 14:32:36 -08:00
LiRen Tu
779c39c088 Copy job attempt state to configs database (#7219)
* Add migration to create latest state table

* Log migration name

* Expose db variables to airbyte-db

* Implement migration

* Fix migration test

* temp

* Rebase on master

* Save state in temporal (#7253)

* Copy state to airbyte_configs table

* Add standard sync state

* Move state methods to config repository

* Add unit tests

* Fix unit tests

* Register standard sync state in migration

* Add comment

* Use config model instead of json node

* Add comments

* Remove unnecessary method

* Fix migration query

* Remove unused config database

* Move persist statement and log the call

* Update dev doc

* Add unit tests for sync workflow

Co-authored-by: Charles <giardina.charles@gmail.com>
2021-10-25 17:08:08 -07:00
Jared Rhizor
f82c9ce90e temporarily re-add webapp papercups/openreplay vars (#7015) 2021-10-13 10:28:08 -07:00
Artem Astapenko
fea3bf6798 Remove papercups (#6955)
* Remove papercups

* Remove support chat
2021-10-13 15:59:28 +03:00
Artem Astapenko
d0d92011e6 Remove openreplay (#6940)
* Remove openreplay

* Add lock file. Fix openreplay import
2021-10-13 15:17:59 +03:00
Jared Rhizor
f88b8313a8 add the ability to use a secret persistence (#6415)
* test exposing secrets in configrepo

* fix local persistence sql

* working propagation, just without check/discover replacements and without feature flagging

* switch if statement

* set up secret persistence for google secrets manager

* add ttl-based secret persistence for check/discover usage in the future

* set up check/discover to pass around necessary parts

* Revert "set up check/discover to pass around necessary parts"

This reverts commit 489d2d5f5d.

* working updates + check/discover operations

* fix additional configs created on deletion

* clean up docker compose file

* finish up configrepo

* make api path optional

* clean up schedulerapp and local testing persistence

* make optional in the worker app

* add rest of feature flagging

* fmt

* remove completed todo

* fix refactoring typo

* fix another refactoring typo

* fix compilation error in test case

* fix tests

* final cleanups

* fix conditional

* address a couple of things

* add hydrator interface

* add replaceAllConfigs

* specfetcher handling

* fix constructor

* fix test

* fix typo

* fix merge build error

* remove extra config

* fix integration test

* fix final piece
2021-09-29 11:53:29 -07:00
Jared Rhizor
c7d8055731 split scheduler and worker (#5737)
* docker-compose split of scheduler and worker

* fix heartbeat location bug + add support for kubernetes

* use two workers in integration tests

* capture logs in AirbyteTestContainer

* add waiting

* rename to make it easier to review

* rename module

* fix remaining conflicts

* allow configuring max workers of each type and document usage

* fix build

* remove comment

* add worker resource requirements

* try to fix for connector build

* fix regression in build

* add env comments for SUBMITTER_NUM_THREADS

* Update airbyte-workers/src/main/java/io/airbyte/workers/WorkerApp.java

Co-authored-by: Davin Chia <davinchia@gmail.com>

* Update airbyte-workers/src/main/java/io/airbyte/workers/temporal/TemporalPool.java

Co-authored-by: Davin Chia <davinchia@gmail.com>

* merge temporalpool into workerapp

* output docker system info

* move check to before

* remove unnecessary parts of the patch

* could this be the problem? i thought i added this

* show disk usage

* add print statements

* add pruning

* fix prune option

* use force

Co-authored-by: Davin Chia <davinchia@gmail.com>
2021-09-08 11:27:32 -07:00
LiRen Tu
36ebf95b32 Revert "add data volume again (#5715)" (#5719) 2021-08-30 08:56:34 -07:00
Marcos Marx
32c2a6b1ad add data volume again (#5715) 2021-08-27 18:04:37 -03:00
Jared Rhizor
164d13866e allow scheduler and server to run on separate nodes (#5506)
* allow scheduler and server to run on separate nodes

* re-add workspace mount for docker compose only

* remove stacktrace printing

* add affinity testing components

* reorder mounts

* just try a two node cluster

* add waiting log line

* seed containers are now axed

* remove unused var

* add comment

* rename to integration-test
2021-08-26 15:51:16 -04:00
LiRen Tu
37c53db4bb Integrate database with Flyway and jOOQ (#5543)
* Implement database migrator

* Add unit tests

* Add RUN_FLYWAY_MIGRATION variable

* Run flyway migration in server

* Add db migration info api

* Add db migration migrate api

* Add unit test

* Remove base airbyte migration

* Implement migration dev helper

* Dry and format code

* Fix url

* Use camel case

* Add db migration page

* Add button to run migration

* Update migration table

* Fix resource warning

* Update readme

* Revert package-lock changes

* Update readme

* Address simple frontend review comments

* Add java migration template (not completed yet)

* Add method to generate migration file

* Set up jooq code generation

* Check in generated code

* Move generated code to build directory

* Exclude db dev center methods in gradle

* Update airbyte-db/README.md

Co-authored-by: Davin Chia <davinchia@gmail.com>

* Mark getMigrator as private

Co-authored-by: Davin Chia <davinchia@gmail.com>

* Address review comments

* Format code

* Fix format output column name

* Remove config persistence builder

* Remove dumpSchemaToFile method

* Run baseline in server

* Rename info to list

Co-authored-by: Charles <giardina.charles@gmail.com>

* Rename executeDbMigrationInfo to listMigrations

Co-authored-by: Charles <giardina.charles@gmail.com>

* Rename RUN_FLYWAY_MIGRATION

* Clean up migration apis

* Remove redundant version comparison

* Refactor db migrator

* Add migration file location parameter back

This is necessary because other databases may exist in a different module and follow different patterns.

* Fix build

* Generate jooq code in gradle

* Remove frontend changes

* Remove testing migration

Co-authored-by: Davin Chia <davinchia@gmail.com>
Co-authored-by: Charles <giardina.charles@gmail.com>
2021-08-25 04:34:19 -07:00
Davin Chia
9dd05d4790 🔪 Get rid of retries within an attempt. (#5570)
See https://airbytehq.slack.com/archives/C019WEENQRM/p1629383779144300 for more details.

TLDR:
Retrying within an attempt is confusing from a UX perspective (we already have attempts), does not provide more value from a retries perspective (we already have attempts), and is inefficient from a resource perspective (hogs API limits and compute/memory resources).

Follow up ticket: #5571
2021-08-24 10:11:20 +08:00
Vladimir remar
fe8f7faf8d 🎉 Set max_sync_timeout in days by env vars (#5297)
* Set max_sync_timeout in days by env vars

* update name var to MAX_SYNC_TIMEOUT_DAYS
2021-08-23 18:38:20 -03:00
LiRen Tu
79b8fd5c12 Remove seed generation task and seed from YAML files (#5335)
## What
- This is the first PR for #4890.
  - This PR does not remove the config volume.
  - This PR does not mount the directories for the local connectors.
- Resolves #5373.

## How
- Previously, the seed container copied the configs to the storage root. The operation could take some time to complete, so `CONFIG_DIR` might not show up right away, and we could not infer anything from the existence of this directory. Now that the seed generation step has been removed, we can tell immediately whether `CONFIG_DIR` exists.
  - If `CONFIG_DIR` exists, it means the user has just migrated Airbyte from an old version that uses this file system config persistence.
  - Otherwise, we can seed the config persistence from the YAML files.
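The decision described above amounts to a single directory check. A minimal Python sketch, assuming a hypothetical `choose_config_source` helper and illustrative return labels (neither is from the Airbyte code base):

```python
import os


def choose_config_source(config_dir: str) -> str:
    """Decide how to initialise the config persistence from the state of CONFIG_DIR."""
    if os.path.isdir(config_dir):
        # The directory only exists when the user has just migrated from an
        # old version that used the file system config persistence.
        return "migrate-from-file-system"
    # No legacy directory: seed the config persistence from the YAML files.
    return "seed-from-yaml"
```

Because nothing creates `CONFIG_DIR` in the background any more, the check can run once at startup without waiting or polling.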
2021-08-17 11:40:06 -07:00
Vladimir remar
9e5054b891 Set MAX_RETRIES and MAX_SYNC_JOB_ATTEMPTS by env vars (#5098)
* Set MAX_RETRIES and MAX_SYNC_JOB_ATTEMPTS by env vars
2021-08-02 16:33:18 +08:00
Davin Chia
196cbd51df Add worker env to docker deploy to silence any logs about picking default values. (#5068) 2021-07-29 17:06:48 +08:00
Artem Astapenko
660fca03e2 Add openreplay variable (#4844) 2021-07-20 07:13:03 +07:00
Davin Chia
8034aa5e57 Make number of Concurrent Jobs configurable. (#4687) 2021-07-19 20:17:59 +08:00
LiRen Tu
e577b4987e 🎉 Migrate config persistence to database (#4670)
* Implement db config persistence

* Fix database readiness check

* Reduce logging noise

* Setup config database in config persistence factory

* Update documentation

* Load seed from yaml files

* Refactor config persistence factory

* Add one more test to mimic migration

* Remove unnecessary changes

* Run code formatter

* Update placeholder env values

* Set default config database parameters in docker compose

Co-authored-by: Christophe Duong <christophe.duong@gmail.com>

* Default setupDatabase to false

* Rename variable

* Set default config db parameters for server

* Remove config db parameters from the env file

* Remove unnecessary environment statements

* Hide config persistence factory (#4772)

* Remove CONFIG_DATABASE_HOST

* Use builder in the test

* Simplify config persistence builder

* Clarify config db connection readiness

* Format code

* Add logging

* Fix typo

Co-authored-by: Christophe Duong <christophe.duong@gmail.com>

* Add a config_id only index

* Reuse record insertion code

* Add id field name to config schema

* Support data loading from legacy config schemas

* Log missing logs in migration test

* Move airbyte configs table to separate directory

* Update exception message

* Dump specific tables from the job database

* Remove postgres specific uuid extension

* Comment out future branch

* Default configs db variables to empty

When defaulting them to the jobs db variables, it somehow does not work.

* Log inserted config records

* Log all db write operations

* Add back config db variables in env file to mute warnings

* Log connection exception to debug flaky e2e test

* Leave config db variables empty

`.env` file does not support variable expansion.

Co-authored-by: Christophe Duong <christophe.duong@gmail.com>
Co-authored-by: Charles <giardina.charles@gmail.com>
2021-07-19 03:52:40 -07:00
Davin Chia
d0b994926b 🐛 Stub out the GCP Env Var in Docker to prevent noisy and harmless errors. (#4642)
* Add this to prevent noisy errors.
2021-07-09 16:37:48 +08:00
George Claireaux
2ca4f9441b 🐛 Kubernetes: Fix server starting up before Temporal ready to operate (#4567)
* refactor waitForTemporalServerAndLog into TemporalUtils

* remove use of docker-compose-wait
2021-07-07 09:51:00 +01:00
Duk Panhavad
a7196d40c9 Add LOG_LEVEL for Temporal docker (#4532) 2021-07-05 15:26:30 +08:00
Davin Chia
fd5c5be352 Inject LOG_LEVEL for docker. (#4498) 2021-07-02 16:57:12 +08:00
Christophe Duong
8b6093ce61 Configure kube pod resources for workers/syncs (#4381)
* Configure kube pod resources for workers/syncs
2021-07-01 12:25:14 +02:00
Subodh Kant Chaturvedi
4b6465d853 fix versioning for server in docker compose (#4430) 2021-06-30 13:05:33 +05:30
Subodh Kant Chaturvedi
0e194fb331 use old seed mount (#4413) 2021-06-30 01:21:48 +05:30
Subodh Kant Chaturvedi
887752822c 🎉 introduce automatic migration at the startup of server for docker environment (#3980)
* introduce automatic migration at the startup of server

* handle versions with non-zero patch

* it works!!!

* add dummy data

* cleanup orphan configs

* add more assertions

* format + add comments

* move migration acceptance test to acceptance test directory

* add automatic migration test to the build

* address review comments

* missed out on these

* format

* add more assertions

* format

* fix test

* format

* use default port for temporal

* move seed to server + introduce atomic replacement for config

* make tests better

* remove unwanted changes

* move atomic replacement logic behind persistence + pass path to latest seeds

* format

* update seeds

* review comments

* update seeds

* merge latest seeds with configs

* fix bug around latest seed

* update seed

* update seed

* seeds should be populated by separate container

* address review comment + change latest definition url

* update seeds

* format

* update seed references

* update seed

* update seed

* update seed

* update seed references

* update seed references + add Migration Acceptance Test

* update seed container in kube + disable automatic migration for kube + update docs

* update docs

* address review comments from Michel

* update doc

* temporary commit to see if build becomes green

* delete seeds from airbyte config + undo temp commit
2021-06-29 23:50:00 +05:30
Jared Rhizor
0e5ec6a1b6 revert log-breaking change and add comments (#4303) 2021-06-23 11:27:35 -07:00