mirror of synced 2026-01-05 12:05:28 -05:00
149 Commits

Author SHA1 Message Date
Lake Mossman
73034c64da Sweep old scheduler code (#13400)
* sweep all scheduler application code and new-scheduler conditional logic

* remove airbyte-scheduler from deployments and docs

* format

* remove 'v2' from github actions

* add back scheduler in delete deployment command

* remove scheduler parameters from helm chart values

* add back job cleaner + test and add comment

* remove now-unused env vars from code and docs

* format

* remove feature flags from web backend connection handler as it is no longer needed

* remove feature flags from config api as it is no longer needed

* remove feature flags input from config api test

* format + shorter url

* remove scheduler parameters from helm chart readme
2022-06-06 10:49:17 -07:00
Davin Chia
7788594e22 Start publishing proper artifacts. (#13484)
## What
Finale of https://github.com/airbytehq/airbyte/pull/13122.

We've renamed all directories in previous PRs. Here we remove the fat jar configuration and add publishing to all subprojects.

Explanation for what is happening:

Identically named subprojects have the following issues:
* publishing as is leads to classpath confusion when jars with the same names are placed in the Java distribution. This leads to NoClassDefFound errors at runtime.
* deconflicting the jar names without changing directory names leads to dependency errors, as the OSS jar pom files are generated using project dependencies (suggesting a dependency on a sibling subproject in the same repo) that use the subproject's group and name as a reference. This means the generated jars look for jars that do not exist (as their names have been changed) and cannot compile.
* the workaround for changing a subproject's name involves resetting the subproject's name in settings.gradle and depending on the new name in each build.gradle. This increases configuration burden and hurts readability, since one has to check settings.gradle to know the right subproject name. See https://github.com/gradle/gradle/issues/847 for more info.
* given that Gradle itself doesn't have support for identically named subprojects (see the linked issue), the simplest solution is to not allow duplicated directories. I've only renamed conflicting directories here to keep things simple. I will create a follow-up issue to enforce non-identical subproject names in our builds.

## How
* Remove fat jar configuration.
* Add publishing to all subprojects.
2022-06-06 17:15:25 +08:00
Marcos Marx
9aa0220820 revert redshift and json operations (#13465) 2022-06-03 19:44:40 -03:00
Marcos Marx
adf4b6df25 run gradlew add headers, correct formatting (#13460) 2022-06-03 16:17:03 -03:00
Adam
071a8e9dc8 Destination Redshift: fixed array contents verification for SUPER (#13069)
* add array handling to json flatten

* bump version, add changelog entry

* test: add flatten unit tests

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-06-03 15:46:16 -03:00
LiRen Tu
3dcda7ae52 Use cheaper operation to estimate json data byte size (#13240)
* Simplify byte size estimation

* Format code

* Update comment
2022-05-26 19:42:54 -07:00
Alexandre Girard
3894134d11 Bump year in license short to 2022 (#13191)
* Bump to 2022

* format
2022-05-25 17:56:49 -07:00
Lake Mossman
26ed3856e1 Migrate OSS to temporal scheduler (#12757)
* Migrate OSS to temporal scheduler

* add comment about migration being performed in server

* add comments about removing migration logic

* formatting and add tests for migration logic

* rm duplicated test

* remove more duplicated build task

* remove retry

* disable acceptance tests that call temporal directly when on kube

* set NEW_SCHEDULER and CONTAINER_ORCHESTRATOR_ENABLED env vars to true to be consistent

* set default value of container orchestrator enabled to true

* Revert "set default value of container orchestrator enabled to true"

This reverts commit 21b36703a9.

* Revert "set NEW_SCHEDULER and CONTAINER_ORCHESTRATOR_ENABLED env vars to true to be consistent"

This reverts commit 6dd2ec04a2.

* Revert "Revert "set NEW_SCHEDULER and CONTAINER_ORCHESTRATOR_ENABLED env vars to true to be consistent""

This reverts commit 2f40f9da50.

* Revert "Revert "set default value of container orchestrator enabled to true""

This reverts commit 26068d5b31.

* fix sync workflow test

* remove defunct cancellation tests due to internal temporal error

* format - remove unused imports

* revert changes that set container orchestrator enabled to true everywhere

* remove NEW_SCHEDULER feature flag from .env files, and set CONTAINER_ORCHESTRATOR_ENABLED flag to true for kube .env files

Co-authored-by: Benoit Moriceau <benoit@airbyte.io>
2022-05-18 17:05:42 -07:00
Jonathan Pearlin
ebb9f3e1ac Prepare Database Access Layer for Dependency Injection (#12546)
* Prepare database access objects for dependency injection

* Replace duplicate code

* Remove unused imports

* Remove redundant validation call

* Remove unused imports

* Use constants

* Disable fast fail during connection pool initialization

* Remove typo

* Add missing test dependency

* Add missing test dependency

* Add missing test dependency

* Fix issue caused by rebase

* Add method for cloud

* Autoclose DSL context during migration

* Better connection close handling

* Fix typo in dependency

* Fix SpotBugs issue

* React to rebase

* Fix typo

* Update JavaDoc

* Fix database close calls

* Pass configs to getServer

* Fix typo

* Fix call to removed method

* Fix typo

* Use catalog to manage versions

* PR feedback

* Centralize shutdown hook

* Fix rebase issues

* Document test cases

* Document test cases

* Formatting

* Properly close database resources

* Rebase cleanup
2022-05-09 15:26:54 -04:00
Benoit Moriceau
455decc018 Migrate to a secret store by default (#12516)
This adds a metadata entry that makes the DB secret store the default one.
It avoids having secrets outside of the secret table.
2022-05-06 12:45:13 -07:00
Benoit Moriceau
07359ffd77 Migrate secret from a non secret store to a secret store (#12088)
Introduce a migration to a secret manager

If a secret manager is specified, the migration goes through all the configs and saves each secret in the configured secret store. A secret that is already in a store is not migrated again.
2022-05-04 16:40:33 -07:00
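The migration described in this commit can be sketched roughly as follows. This is a minimal Python illustration, not the Airbyte code: the function name `migrate_config`, the in-memory `store` dict, and the `{"_secret": coordinate}` reference shape are all assumptions made for the example.

```python
import uuid
from typing import Any, Dict, Set


def migrate_config(config: Dict[str, Any], secret_keys: Set[str],
                   store: Dict[str, Any]) -> Dict[str, Any]:
    """Move plain-text secrets into `store` and leave references behind."""
    migrated = {}
    for key, value in config.items():
        # Assumption: a value shaped like {"_secret": ...} is already a reference.
        already_stored = isinstance(value, dict) and "_secret" in value
        if key in secret_keys and not already_stored:
            coordinate = f"secret_{uuid.uuid4()}"  # hypothetical naming scheme
            store[coordinate] = value
            migrated[key] = {"_secret": coordinate}
        else:
            # Non-secret fields and already-migrated secrets are kept as-is.
            migrated[key] = value
    return migrated
```

Running the sketch twice on the same config leaves the store unchanged the second time, which mirrors the "already in a store" rule above.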
Benoit Moriceau
e8813ee60c Restore jsonPath and fix it (#12325)
This restores the JSON traversal library. A bug was introduced in the JSON path library; this PR fixes it.

In a JSON schema, an enum can be defined without specifying a "type" attribute. The previous implementation did not handle this case. The getType method now returns the right type, and the enum is processed the same way as an integer/boolean/string type.
2022-04-28 13:58:59 -07:00
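The enum case described in this commit can be illustrated with a small Python sketch. It is not the Airbyte implementation: the function name `get_type` and the choice to report a type-less enum as `"string"` are assumptions made for illustration.

```python
def get_type(node: dict) -> str:
    """Return the JSON Schema type of a node, tolerating enums with no "type"."""
    declared = node.get("type")
    if isinstance(declared, str):
        return declared
    if "enum" in node:
        # Assumption: a type-less enum is treated like any other primitive leaf.
        return "string"
    raise ValueError("schema node has neither 'type' nor 'enum'")
```

With this, `{"enum": ["a", "b"]}` is handled on the same path as `{"type": "string"}` instead of failing the traversal.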
Subodh Kant Chaturvedi
10a3aa70a4 Revert "json schema traversal + secrets (#11847)" (#12185)
This reverts commit d41c3f7d6f.
2022-04-20 23:05:43 +05:30
Charles
d41c3f7d6f json schema traversal + secrets (#11847) 2022-04-14 17:37:06 -07:00
Parker Mossman
884a94ed29 Un-Revert OSS branch build for Cloud workflow (#11808)
* Revert "Revert "Build OSS branch for deploying to Cloud env (#11474)""

This reverts commit 55e3c1e051.

* add action to get dev branch tag to OSS project instead of doing it in cloud

* remove dev branch version action, going to do this in cloud after all
2022-04-08 15:17:04 -07:00
Charles
f512208afd Introduce json path commons lib (#11680) 2022-04-08 12:03:57 -07:00
lmossman
55e3c1e051 Revert "Build OSS branch for deploying to Cloud env (#11474)"
This reverts commit 189efe7b42.
2022-04-05 15:44:31 -07:00
Parker Mossman
189efe7b42 Build OSS branch for deploying to Cloud env (#11474)
* add VERSION buildArg to Dockerfiles, default to current airbyte version but overwritable

* use VERSION env var consistently as Dockerfile buildArg, jar version, and tag

pass version and image_tag into docker build task function

* add github action for building and pushing an OSS branch for Cloud to consume

* allow AirbyteVersion to validate versions containing 'oss-branch' prefix

* change oss-branch prefix to dev for branch-based versions

* better action name

* add docker-compose-cloud.build.yaml to define minimum set of cloud images that are pushed by oss branch action

* update local dev docs to describe optional usage of VERSION env var

* make branch_version_tag input optional, if not provided, generates dev-<commit_hash>

* fix typo

* fix missed merge conflict

* update docker docs

* update integrationRunner isDev check
2022-04-05 15:06:17 -07:00
Christophe Duong
848bb349b5 🎉 Change destination-s3 buffering to reduce/stabilize memory/thread consumption (#11294)
* Refactor destination-s3 to use the new serialization strategy and get memory usage under control
2022-03-28 17:40:44 +02:00
Peter Hu
8fbbf4b7cf List actor definitions for workspace (#11336)
* writeActorDefinitionWorkspaceGrant

* listPublicActorDefinitions

* listGrantedActorDefinitions

* list concat helper

* listSourceDefinitionsForWorkspace

* listDestinationDefinitionsForWorkspace

* named mock definitions

don't rely on magic index

* remove excessive line breaks
2022-03-23 16:14:26 -07:00
Benoit Moriceau
f2cb12cbc0 Avoid putting secret in the exported JSON object (#11296)
The OSS deployment allows exporting and importing the workspace configuration. If users are not using a secret manager in their deployment, the export returns API keys and passwords in plain text. This is a serious security issue, especially since @timroes has shown that some instances are exposed on the public internet. To avoid returning passwords or other values that should be kept secret, the export needs to be sanitized, which is what this PR does.
2022-03-23 10:08:09 -07:00
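The sanitization idea can be sketched in Python as below. This is an illustration under assumptions, not the code from the PR: it assumes secret fields are flagged with `airbyte_secret: true` in the connector spec, and the `sanitize` function and `SECRET_MASK` placeholder are hypothetical names.

```python
from typing import Any

SECRET_MASK = "**********"  # hypothetical placeholder value


def sanitize(config: Any, spec: dict) -> Any:
    """Return a copy of config with every field marked airbyte_secret masked."""
    if spec.get("airbyte_secret") and config is not None:
        return SECRET_MASK
    if isinstance(config, dict):
        props = spec.get("properties", {})
        # Recurse into nested objects, pairing each value with its sub-spec.
        return {k: sanitize(v, props.get(k, {})) for k, v in config.items()}
    return config


spec = {"properties": {"host": {"type": "string"},
                       "password": {"type": "string", "airbyte_secret": True}}}
exported = sanitize({"host": "db.internal", "password": "hunter2"}, spec)
```

Here `exported` keeps `host` unchanged and carries only the mask in place of the password.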
terencecho
f4bb7b21b2 Add Auto-Disable Failing Connections feature (#11099)
* Add Disable Failing Connections feature

* Rename and cleanup

* list jobs based off connection id

* Move variables to env config and update unit tests

* Fix env flag name

* Fix missing name changes

* Add comments to unit test

* Address PR comments

* Support multiple config types

* Update unit tests

* Remove the attemptId notion in the connectionManagerWorkflow (#10780)

This is removing the attemptId from the create attempt activity to replace it with the attemptNumber. This will be modified in the workflow in a later commit.

* Revert "Remove the attemptId notion in the connectionManagerWorkflow (#10780)" (#11057)

This reverts commit 99338c852a.

* Revert "Revert "Remove the attemptId notion in the connectionManagerWorkflow (#10780)" (#11057)" (#11073)

This reverts commit 892dc7ec66.

* Revert "Revert "Revert "Remove the attemptId notion in the connectionManagerWorkflow (#10780)" (#11057)" (#11073)" (#11081)

This reverts commit e27bb74050.

* Add Disable Failing Connections feature

* Rename and cleanup

* Fix rebase

* only disable if first job is older than max days

* Return boolean for activity

* Return boolean for activity

* Add unit tests for ConnectionManagerWorkflow

* Utilize object output for activity and ignore non success or failed runs

* Utilize object output for activity and ignore non success or failed runs

Co-authored-by: Benoit Moriceau <benoit@airbyte.io>
2022-03-18 11:13:28 -07:00
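One possible reading of the rules listed in this commit (ignore runs that neither succeeded nor failed, only disable if the first job is older than the maximum number of days, return a boolean) is sketched below in Python. The job tuple shape, the status strings, and the "every remaining run failed" criterion are assumptions; the actual thresholds live in env config and are not shown in this log.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Job = Tuple[datetime, str]  # (created_at, status); hypothetical shape


def should_disable(jobs: List[Job], now: datetime, max_days: int) -> bool:
    """Decide whether a connection should be auto-disabled (illustrative only)."""
    # Ignore runs that neither succeeded nor failed (e.g. cancelled or running).
    finished = [(t, s) for t, s in jobs if s in ("succeeded", "failed")]
    if not finished:
        return False
    # Only disable if the first (oldest) job is older than max_days.
    first_job_at = min(t for t, _ in finished)
    if now - first_job_at < timedelta(days=max_days):
        return False
    # Assumption: disable only when every remaining run failed.
    return all(s == "failed" for _, s in finished)
```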
Charles
c1c8675366 Add readmes to all modules (#8893) 2022-03-13 14:45:36 -07:00
Davin Chia
af6c64d2d6 Reporter App Monitoring. (#11074)
Add a metric to monitor the monitoring app's rate of publishing metrics. Though this isn't perfect, it gives us some insight into whether metric publishing is okay or running into issues.
2022-03-13 11:48:56 +08:00
Charles
5fde59fdbd add spotbugs (#10522) 2022-03-11 12:05:17 -08:00
Davin Chia
a135ff2259 Refactor Reporter App. (#11070)
Refactor for readability.

Create a separate enum class. This enum is added to when a dev needs to add another metric to the reporter.

This helps us isolate the emission logic + scheduling configuration from the actual threads pushing the logic.
2022-03-12 02:03:54 +08:00
LiRen Tu
f5748998c8 Fix unit test failures from env configs (#10998)
* Fix unit test failures from env configs

* Default null to empty string

* Format code

* Fix one more unit test

* Remove unused import
2022-03-09 14:24:54 -08:00
Marcos Marx
44e8f6fcdf Auto-upgrade connectors that are in use only for patch version updates (#10515)
* auto-upgrade connectors that are in use with patch version only

* update check version docstring

* remove try/catch from hasNewPatchVersion

* refactor write std defs function

* run format

* add unit test and change exception

* update airbyte version function name to be more clear

* correct unit test in migration tests

* run format
2022-03-09 18:43:48 -03:00
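The patch-only check mentioned above (`hasNewPatchVersion`) might look like the following Python sketch. It assumes plain `major.minor.patch` version strings and is not the Airbyte implementation; `parse` is a hypothetical helper.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Split a 'major.minor.patch' string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def has_new_patch_version(current: str, candidate: str) -> bool:
    """True only when candidate differs from current by a higher patch number."""
    cur, cand = parse(current), parse(candidate)
    return cur[:2] == cand[:2] and cand[2] > cur[2]
```

Under this rule a connector in use at `0.3.1` would be upgraded to `0.3.2` but not to `0.4.0`.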
LiRen Tu
81417e6728 Add connector metadata as sentry tags (#10475)
* Pass worker metadata to connector

* Fix compilation

* Pass in job id and image from worker

* Remove application version

* Add default job environment variables

* Add back removed comment

* Rename env map to job metadata

* Fix env configs

* Read connector from application

* Use empty string

* Remove println

* Fix unit test

* Fix compilation error

* Introduce constants for worker env

* Add worker env to ENV_VARS_TO_TRANSFER

* Pass into getWorkerMetadata map to all constructions

* Format code

* Format octavia cli

* Fix test compilation

* Fix typos
2022-03-09 07:36:03 -08:00
Peter Hu
c78b510028 push failures to segment (#10715)
* test: new failures metadata for segment tracking

* new failures metadata for segment tracking

failure_reasons: array of all failures (as json objects) for a job
- for general analytics on failures
main_failure_reason: main failure reason (as json object) for this job
- for operational usage (for Intercom)
- currently this is just the first failure reason chronologically
    - we'll probably change this when we have more data on how to
determine failure reasons more intelligently

- added an attempt_id to failures so we can group failures by attempt
- removed stacktrace from failures since it's not clear how we'd use
these in an analytics use case (and because segment has a 32kb size
limit for events)

* remove attempt_id

attempt info is already in failure metadata

* explicitly sort failures array chronologically

* replace "unknown" enums with null

note: ImmutableMaps don't allow nulls

* move sorting to the correct place
2022-03-01 11:33:42 -08:00
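The metadata shape described in this commit can be sketched in Python as follows. The keys `failure_reasons` and `main_failure_reason` come from the commit message; the `timestamp` and `stacktrace` field names on each failure, and the function name, are assumptions for illustration.

```python
from typing import Any, Dict, List


def build_failure_metadata(failures: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build tracking metadata from a job's failures (illustrative only)."""
    # Drop stacktraces to keep the event small (segment has a size limit).
    cleaned = [{k: v for k, v in f.items() if k != "stacktrace"} for f in failures]
    # Explicitly sort chronologically so the first entry is the earliest failure.
    cleaned.sort(key=lambda f: f["timestamp"])
    return {
        "failure_reasons": cleaned,
        # Currently just the first failure reason chronologically.
        "main_failure_reason": cleaned[0] if cleaned else None,
    }
```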
Jared Rhizor
a85c02ab76 run format (#10445) 2022-02-17 15:28:28 -08:00
Edward Gao
07e2232025 Track const config values in analytics (#10120) 2022-02-17 15:17:03 -08:00
LiRen Tu
049a11b2bc 🎉 Snowflake destination: reduce memory footprint (#10394)
* Add detailed logging for flushing

* Log sentry transaction event id

* Adjust logging

* Log memory usage

* Add jvm monitoring

* Remove log

* Remove port 9010

* Remove host network mode

* Sample record size

* Remove profiling code

* Add unit tests

* Use average estimation

* Rename variable

* Format code

* Bump version

* Revert unnecessary change

* Update doc

* Fix format

* Bump version in seed
2022-02-17 12:55:35 -08:00
Jared Rhizor
a843fdac0f allow injecting the image for the container orchestrator (#10169) 2022-02-07 21:02:52 -08:00
Benoit Moriceau
20b940abcc Add logs (#10158) 2022-02-07 12:14:35 -08:00
Jared Rhizor
27e4c71fa4 fix container orchestrator logging (#10117)
* fix container orchestrator logging

* just reconfiguring is sufficient

* remove out of date comment
2022-02-04 15:38:53 -08:00
Jared Rhizor
c1ae3073d1 upgrade log4j2 to 2.17.1 (#8977)
* upgrade log4j2 to 2.17.0

* Use colors instead of background.

* Put this back.

* Fix syntax.

* go back to original log colors

* fix comments + remove system property -> env var fallback mechanism

* upgrade all the way to 2.17.1

Co-authored-by: Davin Chia <davinchia@gmail.com>
Co-authored-by: lmossman <lake@airbyte.io>
2022-02-04 10:00:21 -08:00
Jared Rhizor
db4093277f async container launch kubernetes "process" (#9242)
* add misc todos

* save work so far

* configure async pod processing

* remove comment

* fmt

* working except logging propagation?

* add comment

* add logging and misc configuration fixes

* add output propagation

* fix state reading

* logging is working (but background highlighting is not)

* fix log highlighting

* use sys instead of ctx

* comment

* clean up and test state management

* clean up orchestrator app construction

* unify launcher workers and handle resuming

* respond to comments

* misc

* disable

* fix comment

* respond to comments
2022-01-20 07:56:06 -08:00
Benoit Moriceau
e7da9232bb Fix record count and add acceptance test to the new scheduler (#9487)
* Add a job notification

The new scheduler was missing a notification step after the job is done.

This is needed in order to report the number of records of a sync.

* Acceptance test with the new scheduler

Add a new github action task to run the acceptance tests with the new scheduler

* Retry on failure

* PR comments
2022-01-19 18:16:19 -08:00
Davin Chia
8c3c68c160 Document various available configuration. (#9249)
- Add comments to the interface methods in Configs.java.
- Add new document on configuring airbyte. Transfer the non internal-only variables to this document.
2022-01-04 17:27:58 +08:00
Alexander Tsukanov
eea41b4fc8 🎉 Destination Snowflake and RedShift: Implement the Byte-buffered logic (#8869)
* airbyte-8336: Byte based approach.

* test-commit

* airbyte-8336: Split file by chunks.

* airbyte-8336: Renamed variable.

* airbyte-8336: make snowflake DEFAULT_MAX_BATCH_SIZE_BYTES_SNOWFLAKE constant.

* airbyte-8336: make snowflake DEFAULT_MAX_BATCH_SIZE_BYTES_SNOWFLAKE constant.

* airbyte-8336: make snowflake DEFAULT_MAX_BATCH_SIZE_BYTES_SNOWFLAKE constant.

* airbyte-8336: fix of unit tests

* airbyte-8336: Changed to default buffer size in SnowFlake.

* airbyte-8336: Changed 15 GB to 1 GB for max size.

* airbyte-8336: Changed to default buffer size in SnowFlake.

* airbyte-8336: Bumped connector version.

* airbyte-8336: Bumped connector version.

* airbyte-8336: Bumped connector version.
2021-12-24 12:32:10 +02:00
Benoit Moriceau
389efbd23d Feature/new temporal scheduler (#8352)
This gets rid of scheduling with the scheduler app and the job submitter. They are replaced by a Temporal workflow that is responsible for scheduling the syncs on time.
2021-12-23 20:15:38 +01:00
Subodh Kant Chaturvedi
8654c4a62f implement flyway migration for config database normalization + tests (#8563)
* implement flyway migration for config database normalization + tests

* use Enums + create connection operation table

* address review comments

* format

* undo change

* handle null value for enums

* implement new database config persistence (#8620)

* implement new database config persistence

* make new persistence compatible with applications

* update tests

* do not update createdAt timestamp

* review comments + incorporate changes from bootloader

* address review comments

* fixed test + remove unused method

* fix archive handler test

* handle null value for tombstone

* add logs for assert migration

* final review comments
2021-12-21 22:33:25 +05:30
Charles
cc70c0f721 allow serializing to yaml without quoting strings (#8896) 2021-12-19 11:16:50 -08:00
Jared Rhizor
20c488948e add container orchestrator in feature flag (#8015)
* it's working

* clean up process factory

* fix and add tests

* clarify airbyte_default

* fix build

* fix kube acceptance test (maybe)

* oops

* fix output prop issue

* fix output propagation regression

* fix kube singleton problem

* sync passing on kube but getting wrong exit code of 7

* misc

* fix port usage

* remove host port that causes conflicts

* eliminate envconfigs static usage

* this took way too long to figure out

* get rid of annoying ==== this is new log messages

* finally successfully completing syncs

* stop using magic strings and clean up logging

* misc minor cleanups

* fmt

* misc

* correct

* misc fixes

* rename + misc

* better logs

* logging fix 1

* logging fix 1 -- fixed

* finally get logging working nicely

* add comment for simplification

* fmt

* misc

* fmt

* break into separate class

* remove comment

* remove flaky multi-node testing

* try to fix connector build

* remove separate node check

* switch to new configs type

* fix regression from logging config changes

* only log path one time

* remove misleading setting terminology

* fix connector build

* fix earlier merge conflict

* fix runtime kubernetes bug

* fix connector build (again)

* greatly simplify logging config by forcing the container-orchestrator to use default (non-job) logging

* add secret insertion for orchestrator on kube

* fix k8s ports

* add four ports

* fix logging test regression

* temporarily disable kube tests to check logging

* improve comments

* make Docker run more secure by limiting env vars transferred

* re-enable kubernetes tests

* fix conflict

* fix docker launching

* revert temporal hacks

* match master

* fix typo

* remove completed todo

* fix conflict

* increase memory requirement to something reasonable

* Update airbyte-container-orchestrator/Dockerfile

Co-authored-by: Davin Chia <davinchia@gmail.com>

* Update airbyte-container-orchestrator/Dockerfile

Co-authored-by: Davin Chia <davinchia@gmail.com>

* see if this stabilizes tests

* address review comments

* bump new container version

* revert temporary addition

* change port back to 9000

* make re-initialization actually a no-op

* add feature flag

* fix version from merging earlier

* fix dockerfile

* fix connector build

* fix

* bump node version

* fix dockerfile

Co-authored-by: Davin Chia <davinchia@gmail.com>
2021-12-17 16:51:17 -08:00
Davin Chia
bab9edabd8 Use throwable for tryDeserialise. (#8631) 2021-12-09 01:32:06 +08:00
Davin Chia
635109aeba 🐛 Fix forgotten rename. (#8511) 2021-12-04 23:48:49 +08:00
Davin Chia
341f505a94 Rename env vars for better readability. (#8447)
* Rename GcsStorageBucket to GcsLogBucket.

* Update all references to GCP_STORAGE_BUCKET to GCS_LOG_BUCKET.

* Undo this for configuration files for older Airbyte versions.

* Clean up Job env vars. (#8462)

* Rename MAX_SYNC_JOB_ATTEMPTS to SYNC_JOB_MAX_ATTEMPTS.

* Rename MAX_SYNC_TIMEOUT_DAYS to SYNC_JOB_MAX_TIMEOUT_DAYS.

* Rename WORKER_POD_TOLERATIONS to JOB_POD_TOLERATIONS.

* Rename WORKER_POD_NODE_SELECTORS to JOB_POD_NODE_SELECTORS.

* Rename JOB_IMAGE_PULL_POLICY to JOB_POD_MAIN_CONTAINER_IMAGE_PULL_POLICY.

* Rename JOBS_IMAGE_PULL_SECRET to JOB_POD_MAIN_CONTAINER_IMAGE_PULL_SECRET.

* Rename JOB_SOCAT_IMAGE to JOB_POD_SOCAT_IMAGE.

* Rename JOB_BUSYBOX_IMAGE to JOB_POD_BUSYBOX_IMAGE.

* Rename JOB_CURL_IMAGE to JOB_POD_CURL_IMAGE.

* Rename KUBE_NAMESPACE to JOB_POD_KUBE_NAMESPACE.

* Rename RESOURCE_CPU_REQUEST to JOB_POD_MAIN_CONTAINER_CPU_REQUEST.

* Rename RESOURCE_CPU_LIMIT to JOB_POD_MAIN_CONTAINER_CPU_LIMIT.

* Rename RESOURCE_MEMORY_REQUEST to JOB_POD_MAIN_CONTAINER_MEMORY_REQUEST.

* Rename RESOURCE_MEMORY_LIMIT to JOB_POD_MAIN_CONTAINER_MEMORY_LIMIT.

* Remove worker suffix from created pods to reduce confusion with actual worker pods.

* Use sync instead of worker to name job pods.
2021-12-03 23:28:48 +08:00
Charles
ada2e1724a Refactor MigrationAcceptanceTest to test for major version bumps (#8154) 2021-11-29 20:14:18 -08:00
mkhokh-33
5032addf3e 🐛 Source MySQL: transform binary data base64 format (#8047)
* Source-MySql: transform binary data base64 format, add integration tests

* Source-MySql: fix code style

* Source-MySql: bump versions

* Source-MySql: bump versions in source_specs.yaml

* Source-MySql: added test for stream with binary data for DestinationAbstractTest

* Source-MySql: added format
2021-11-23 16:04:48 +02:00