mirror of synced 2026-01-25 01:01:56 -05:00
Commit Graph

367 Commits

Author SHA1 Message Date
Thibaud Chardonnens
921f4a13c6 add /tmp emptyDir volume to connector pods (#10761)
Some connectors (such as destination-s3) need to write temporary data, usually to /tmp.
Enforcing a read-only root filesystem on Kubernetes pods is a good security practice, and some production Kubernetes clusters require every pod to run with one.
Therefore, to let connectors keep writing temporary data to /tmp when the root filesystem is read-only, we mount an emptyDir volume at /tmp.

The original PR was #9874; we decided to split it into three separate PRs.

The limit for this volume will be handled in https://github.com/airbytehq/airbyte/issues/11025.
2022-03-10 21:01:09 +08:00
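The pod layout described in #10761 can be sketched as a Kubernetes pod-spec fragment: a container with a read-only root filesystem and an `emptyDir` volume mounted at `/tmp`. The Java helper below is hypothetical (its class name, volume name, and arguments are illustrative) and only renders the YAML; it is not the code from the PR.

```java
// Hypothetical helper that renders the pod-spec fragment described above.
final class TmpVolumeSpec {

  // Hypothetical volume name; any valid Kubernetes volume name would do.
  static final String TMP_VOLUME_NAME = "tmp";

  // Returns YAML for one container whose root filesystem is read-only but
  // which can still write temporary files to /tmp through an emptyDir volume.
  static String podSpecFragment(String containerName, String image) {
    return """
        containers:
          - name: %s
            image: %s
            securityContext:
              readOnlyRootFilesystem: true
            volumeMounts:
              - name: %s
                mountPath: /tmp
        volumes:
          - name: %s
            emptyDir: {}
        """.formatted(containerName, image, TMP_VOLUME_NAME, TMP_VOLUME_NAME);
  }
}
```

An `emptyDir` volume lives only as long as the pod, which fits scratch data that a connector does not need to keep after the job.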
LiRen Tu
f5748998c8 Fix unit test failures from env configs (#10998)
* Fix unit test failures from env configs

* Default null to empty string

* Format code

* Fix one more unit test

* Remove unused import
2022-03-09 14:24:54 -08:00
LiRen Tu
81417e6728 Add connector metadata as sentry tags (#10475)
* Pass worker metadata to connector

* Fix compilation

* Pass in job id and image from worker

* Remove application version

* Add default job environment variables

* Add back removed comment

* Rename env map to job metadata

* Fix env configs

* Read connector from application

* Use empty string

* Remove println

* Fix unit test

* Fix compilation error

* Introduce constants for worker env

* Add worker env to ENV_VARS_TO_TRANSFER

* Pass into getWorkerMetadata map to all constructions

* Format code

* Format octavia cli

* Fix test compilation

* Fix typos
2022-03-09 07:36:03 -08:00
Benoit Moriceau
780c98c476 Add test that checks that we continue a reset as a reset if it failed (#10806)
This adds tests to make sure that a reset is continued as a reset after a failed attempt, or in a new job when the maximum number of attempts is reached.

It also fixes the workflow so that a reset that fails more than the maximum number of attempts continues as a reset in a new job.

Open questions:

- Is this what we want for the job (continue as a reset if the job failed)?
- Do we need to respect the schedule if the reset failed more than the maximum number of attempts?
2022-03-08 17:36:39 -08:00
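The behaviour tested in #10806 can be summarised as a small decision function. This is a plain-Java sketch, not the Temporal workflow code; the enum, record, and method names, and the 1-based attempt counter, are assumptions made for illustration. It does not model the open scheduling question.

```java
// Hypothetical model of how a failed attempt is continued.
enum NextRun { RETRY_IN_SAME_JOB, CONTINUE_IN_NEW_JOB }

record Continuation(NextRun nextRun, boolean isReset) {}

final class ResetContinuation {

  // attemptNumber is 1-based; maxAttempts is the per-job attempt budget.
  static Continuation afterFailedAttempt(boolean wasReset, int attemptNumber, int maxAttempts) {
    NextRun next = attemptNumber < maxAttempts
        ? NextRun.RETRY_IN_SAME_JOB
        : NextRun.CONTINUE_IN_NEW_JOB;
    // Whichever path is taken, a failed reset stays a reset; it is never
    // silently turned into a regular sync.
    return new Continuation(next, wasReset);
  }
}
```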
Benoit Moriceau
fd1f9339ed Disable flaky test (#10927) 2022-03-07 15:26:10 -08:00
Davin Chia
7bbcb369aa Add failure origin metric. (#10884)
Part of https://docs.google.com/document/d/11pEUsHyKUhh4CtV3aReau3SUG-ncEvy6ROJRVln6YB4/edit#.

Introduce a metric to track failure origins.
2022-03-07 11:52:29 +08:00
Artemiy Kzr
e34c3578fd Destination Clickhouse: enable normalization for Secure connections (#10754)
* Clickhouse Destination: enable normalization for Secure connections

* bump normalization version

* run mypy check

* add lib

* install stubs running mypy

* rollback gradlew command

Co-authored-by: marcosmarxm <marcosmarxm@gmail.com>
2022-03-04 22:12:49 -03:00
Benoit Moriceau
fbe084f17b Rm remaining merge conflict in comments (#10876) 2022-03-04 17:01:36 -08:00
Lake Mossman
754452c220 raise timeout on flaky test (#10868) 2022-03-04 13:22:50 -08:00
Benoit Moriceau
f18d304a86 Add autoformat (#10808) 2022-03-02 15:34:45 -08:00
Jared Rhizor
5aecbc30f8 default to no resource limits for OSS (#10800) 2022-03-02 15:18:24 -08:00
Benoit Moriceau
b610393279 Extract event from the temporal worker run factory (#10739)
Extract the different events that can happen during a sync into an interface that is not tied to Temporal.
2022-03-01 15:09:49 -08:00
Thibaud Chardonnens
ce9b967597 Adds default sidecar cpu request and limit and add resources to the init container (#10759) 2022-03-01 18:46:09 -03:00
Benoit Moriceau
a60aa5f1a0 Update temporal retention TTL from 7 to 30 days (#10635)
Increase the Temporal retention from 7 to 30 days. This will help with on-call investigations.
2022-03-01 13:10:43 -08:00
Jared Rhizor
a46b885356 remove --cpu-shares flag (#10738) 2022-02-28 15:14:18 -08:00
Davin Chia
ed9c3dca02 Fix error NPE in metrics emission. (#10675) 2022-02-28 18:04:14 +08:00
lmossman
dd58eb3004 add mocks to tests 2022-02-25 11:19:40 -08:00
Charles
1242aa8b4a Set resource limits for connector definitions: expose in worker (#10483)
* pipe through to worker

* wip

* pass source and dest def resource reqs to job client

* fix test

* use resource requirements utils to get resource reqs for legacy and new impls

* undo changes to pass sync input to container launcher worker factory

* remove import

* fix hierarchy order of resource requirements

* add nullable annotations

* undo change to test

* format

* use destination resource reqs for normalization and make resource req utils more flexible

* format

* refactor resource requirements utils and add tests

* switch to storing source/dest resource requirements directly on job sync config

* fix tests and javadocs

* use sync input resource requirements for container orchestrator pod

* do not set connection resource reqs to worker reqs

* add overridden requirement utils method + test + comment

Co-authored-by: lmossman <lake@airbyte.io>
2022-02-25 10:00:30 -08:00
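Several bullets in #10483 concern the order in which resource requirements are layered. A minimal sketch of field-by-field overriding follows, assuming the precedence worker default, then connector definition, then connection (highest). The entry says the hierarchy order was fixed but does not spell it out, and the record and resolver are hypothetical stand-ins, not the PR's resource requirements utils.

```java
// Hypothetical stand-in for a resource requirements type.
record ResourceRequirements(String cpuRequest, String cpuLimit,
                            String memoryRequest, String memoryLimit) {

  // Field-by-field override: a non-null field in `override` replaces the
  // corresponding field of this instance; null fields are inherited.
  ResourceRequirements overriddenBy(ResourceRequirements override) {
    if (override == null) {
      return this;
    }
    return new ResourceRequirements(
        pick(override.cpuRequest(), cpuRequest),
        pick(override.cpuLimit(), cpuLimit),
        pick(override.memoryRequest(), memoryRequest),
        pick(override.memoryLimit(), memoryLimit));
  }

  private static String pick(String preferred, String fallback) {
    return preferred != null ? preferred : fallback;
  }
}

final class ResourceRequirementsResolver {

  // Assumed precedence, lowest to highest: worker default, connector
  // definition, connection. Either of the last two may be null.
  static ResourceRequirements resolve(ResourceRequirements workerDefault,
                                      ResourceRequirements definition,
                                      ResourceRequirements connection) {
    return workerDefault.overriddenBy(definition).overriddenBy(connection);
  }
}
```

Merging per field rather than per object means a connection can raise only its memory limit without discarding the CPU settings inherited from the definition.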
Davin Chia
e21ec5a15b Add attempt status by release stage metrics. (#10659)
Add:

- attempt_created_by_release_stage
- attempt_failed_by_release_stage
- attempt_succeeded_by_release_stage
2022-02-25 19:33:10 +08:00
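Tagging a counter by release stage can be illustrated with an in-memory stand-in. The metric names come from the list above; the `ReleaseStage` values and the counter class are assumptions, and a real deployment would forward these counts to a metrics backend instead of keeping them in a map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical release stages used as a metric tag.
enum ReleaseStage { ALPHA, BETA, GENERALLY_AVAILABLE, CUSTOM }

final class ReleaseStageCounters {

  // One counter per (metric name, release stage) pair.
  private record Key(String metric, ReleaseStage stage) {}

  private final Map<Key, LongAdder> counters = new ConcurrentHashMap<>();

  void increment(String metric, ReleaseStage stage) {
    counters.computeIfAbsent(new Key(metric, stage), k -> new LongAdder()).increment();
  }

  long count(String metric, ReleaseStage stage) {
    LongAdder adder = counters.get(new Key(metric, stage));
    return adder == null ? 0 : adder.sum();
  }
}
```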
Davin Chia
29095c659e Correct cancelled job metric name. (#10658) 2022-02-25 19:18:55 +08:00
Davin Chia
5bc6d814ba Cloud Dashboard 1 (#10628)
Publish metrics for:
- created jobs tagged by release stage
- failed jobs tagged by release stage
- cancelled jobs tagged by release stage
- succeeded jobs tagged by release stage
2022-02-25 16:21:47 +08:00
Parker Mossman
2157b47b60 Log pod state if init pod wait condition times out (for debugging transient test issue) (#10639)
* log pod state if init pod search times out

* increase test timeout from 5 to 6 minutes to give kube pod process timeout time to trigger

* format
2022-02-24 14:29:09 -08:00
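The pattern in #10639, waiting for a condition with a deadline and logging the pod's state if it never holds, can be sketched without a Kubernetes client. The condition and state suppliers stand in for whatever pod lookups the real code performs; the class and method names are hypothetical.

```java
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

final class PodWait {

  // Polls `condition` until it holds or `timeout` elapses. On timeout it
  // captures the pod state via `describeState` so the failure is debuggable.
  static void waitFor(BooleanSupplier condition, Supplier<String> describeState,
                      Duration timeout, Duration pollInterval)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (System.nanoTime() - deadline < 0) {
      if (condition.getAsBoolean()) {
        return;
      }
      Thread.sleep(pollInterval.toMillis());
    }
    if (condition.getAsBoolean()) {
      return; // the condition became true during the last sleep
    }
    String state = describeState.get();
    System.err.println("Init pod wait condition timed out; pod state: " + state);
    throw new TimeoutException("Timed out waiting for init pod; last known state: " + state);
  }
}
```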
Jared Rhizor
def938a3c9 stabilize connection manager tests (#10606)
* stabilize connection manager tests

* just call shutdown once

* another run just so we can see if it's passing

* another run just so we can see if it's passing

* re-disable test

* run another test

* run another test

* run another test

* run another test
2022-02-24 14:00:43 -08:00
Benoit Moriceau
b4e14d7a44 Add missing continue as new (#10636) 2022-02-24 11:02:47 -08:00
Jared Rhizor
9bf67dd91d fix orchestrator restart problem for cloud (#10565)
* test time ranges for cancellations

* try with wait

* fix cancellation on worker restart

* revert for CI testing that the test fails without the retry policy

* revert testing change

* matrix test the different possible cases

* re-enable new retry policy

* switch to no_retry

* switch back to new retry

* parameterize correctly

* revert to no-retry

* re-enable new retry policy

* speed up test + fixes

* significantly speed up test

* fix ordering

* use multiple task queues in connection manager test

* use versioning for task queue change

* remove sync workflow registration for the connection manager queue

* use more specific example

* respond to parker's comments
2022-02-23 22:11:39 -08:00
Parker Mossman
34be57c4c1 Add timeout to connector pod init container command (#10592)
* add timeout to init container command

* add disk usage check into init command

* fix up disk usage checking and logs from init entrypoint

* run format
2022-02-23 16:28:48 -08:00
Benoit Moriceau
22e4f6cd54 Change the block logic and block after the job creation (#10597)
This changes the check for whether a connection exists to make it faster and more accurate: it makes sure that the workflow is reachable by trying to query it.
2022-02-23 15:24:24 -08:00
Benoit Moriceau
2c09037597 Bmoric/move flag check to handler (#10469)
Move the feature flag checks from the configuration API to the handler. This would have avoided some bugs caused by the missing flag check in the Cloud project.
2022-02-23 13:36:09 -08:00
Charles
2e4d91eb0a refactor ConnectionHelper so that conversion logic can be shared (#10480) 2022-02-22 18:08:42 -08:00
Subodh Kant Chaturvedi
54b134c255 convert enum to use the same thing as API (#10562)
* convert enum to use the same thing as API

* Fix flaky test

Co-authored-by: Benoit Moriceau <benoit@airbyte.io>
2022-02-22 16:27:12 -08:00
Benoit Moriceau
dfce970430 Fix input (#10557) 2022-02-22 12:29:35 -08:00
Maksym Pavlenok
bbd13802d8 🐛 Fix Python checker configs and Connector Base workflow (#10505) 2022-02-22 19:58:55 +02:00
Davin Chia
db4dcdda75 Cloud Health Dashboard Step 0: Set up Metrics Registry. (#10478)
Set up a Metrics Registry. The purpose of this registry is to better enforce the metric -> application and metric -> description relationships, provide a central location where folks can see which metrics OSS Airbyte emits, and enforce some standards.

Past experience has shown me that metrics emission can quickly get out of hand: (1) it becomes unclear what is emitted, (2) similar metrics are emitted in multiple places, and (3) it is not clear which metrics correspond to which application.

This is my attempt to provide a framework for us to operate in.

Let me know if folks think this provides more complexity than is useful.

I've added the KubePodProcess metric in here to demonstrate/test how everything will work in practice.
2022-02-22 23:29:43 +08:00
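The registry idea, declaring each metric once together with its emitting application and a description, maps naturally onto a Java enum. This is a sketch under that assumption: the application names, the second metric, and all metric strings are hypothetical and not taken from #10478.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical applications that emit metrics.
enum MetricEmittingApp { SCHEDULER, WORKER, SERVER }

// Each metric is declared once with the application that emits it and a
// description, so the full list of emitted metrics lives in one place.
enum OssMetric {
  KUBE_POD_PROCESS_CREATE_TIME_MILLISECS(
      MetricEmittingApp.WORKER,
      "kube_pod_process_create_time_millisecs",
      "time taken to create a new kube pod process"),
  JOB_CREATED_BY_RELEASE_STAGE(
      MetricEmittingApp.SCHEDULER,
      "job_created_by_release_stage",
      "incremented when a new job is created, tagged by release stage");

  final MetricEmittingApp application;
  final String metricName;
  final String description;

  OssMetric(MetricEmittingApp application, String metricName, String description) {
    this.application = application;
    this.metricName = metricName;
    this.description = description;
  }

  // Answers "which metrics does this application emit?" from the registry.
  static List<OssMetric> emittedBy(MetricEmittingApp app) {
    return Arrays.stream(values()).filter(m -> m.application == app).toList();
  }
}
```

Because emission code can only refer to enum constants, an undeclared or undescribed metric fails at compile time rather than appearing silently in a dashboard.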
Charles
e7d7c773be add keepalive for TCP socket in KubePodProcess (#10528)
* add keepalive

* afsdlk
2022-02-22 20:33:17 +05:30
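Enabling TCP keepalive in Java is a single call on the socket, `Socket.setKeepAlive(true)`, which sets `SO_KEEPALIVE`. The sketch assumes the socket in question is one accepted from a `ServerSocket`; the helper class is hypothetical, not the `KubePodProcess` code.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

final class KeepAliveSockets {

  // Accepts one connection and enables SO_KEEPALIVE on it, so the OS sends
  // probes on an otherwise idle connection and a dead peer is eventually
  // detected instead of leaving reads blocked indefinitely.
  static Socket acceptWithKeepAlive(ServerSocket server) throws IOException {
    Socket socket = server.accept();
    socket.setKeepAlive(true);
    return socket;
  }
}
```

The probe timing comes from operating-system settings, not from Java; on Linux, `net.ipv4.tcp_keepalive_time` defaults to 7200 seconds of idle time before the first probe.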
Vadym Hevlich
5464b1c830 🐛 Normalization: Fix sync from HubSpot to MySQL fails with "Row size too large" on create table (#10485)
* Update mysql normalization to cast string as text.
Bump docker version.
Update basic-normalization.md docs.

* Update docs PR reference

* Update mysql normalization to cast string as for is_timestamp_with_time_zone type
2022-02-22 14:22:26 +02:00
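The usual explanation for "Row size too large" is MySQL's row size limit: all columns of a row together may use at most 65,535 bytes, and `TEXT` columns count only 9 to 12 bytes toward that limit because their contents are stored separately. The back-of-the-envelope arithmetic below assumes a hypothetical table with 100 string columns declared as `VARCHAR(255)` in `utf8mb4` (up to 4 bytes per character); these figures are illustrative, not taken from the HubSpot schema or the previous normalization output.

```java
// Back-of-the-envelope check against MySQL's 65,535-byte row size limit.
final class MySqlRowSize {

  static final int MAX_ROW_BYTES = 65_535;

  // VARCHAR(n) is counted inline: n * bytesPerChar bytes plus a length
  // prefix of 1 byte (max byte length <= 255) or 2 bytes (otherwise).
  static long varcharRowBytes(int columns, int maxChars, int bytesPerChar) {
    int maxBytes = maxChars * bytesPerChar;
    int lengthPrefix = maxBytes > 255 ? 2 : 1;
    return (long) columns * (maxBytes + lengthPrefix);
  }

  // TEXT contributes at most 12 bytes per column toward the limit,
  // because the contents are stored separately from the row.
  static long textRowBytesUpperBound(int columns) {
    return columns * 12L;
  }
}
```

With these assumptions the `VARCHAR` layout needs 100 × (1,020 + 2) = 102,200 bytes, well over the limit, while the `TEXT` layout needs at most 1,200.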
Malik Diarra
b26bdf7cc3 Add database object to ConfigRepository (#10473) 2022-02-18 17:56:37 -08:00
Benoit Moriceau
76e969f2e5 Bmoric/add worflow internal state (#10439)
Refactor the connection workflow to use a single final object as the workflow state
2022-02-18 13:14:49 -08:00
Benoit Moriceau
eb728c8dc4 Bmoric/refacto connection manager (#10370)
This reorganizes the connectionManagerWorkflow to make it easier to understand. It:

- Removes some unneeded variables
- Extracts the activity calls into their own methods
- Reports the status as soon as possible, instead of waiting to leave the cancellation scope before reporting success or failure
- Avoids else branches, to be more explicit about when we exit.
2022-02-18 11:29:22 -08:00
Davin Chia
ef673c5695 Inject check connection resource (#10410)
Make it possible to set resource limits specifically for Check Connection. This helps speed up the Check Connection operation for Java-based connectors.

After this PR is merged, I will do an OSS release and make the required Helm changes in Cloud.
2022-02-18 16:06:11 +08:00
Benoit Moriceau
9d546d3ebb Add a maximum page size and use the count instead of the list (#10443)
* Add a maximum page size and use the count instead of the list

* Fix typo
2022-02-17 15:28:14 -08:00
Jared Rhizor
a66d8be03a continue workflows on restarts (#10294)
* fix normalization output processing in container orchestrator

* add full scheduler v2 acceptance tests

* speed up tests

* fixes

* clean up

* wip handle worker restarts

* only downtime during sync test not passing

* commit temp

* mostly cleaned up

* add attempt count check

* remove todo

* switch all pending checks to running checks

* use ++

* Update airbyte-container-orchestrator/src/main/java/io/airbyte/container_orchestrator/ContainerOrchestratorApp.java

Co-authored-by: Charles <giardina.charles@gmail.com>

* Update airbyte-workers/src/main/java/io/airbyte/workers/temporal/sync/LauncherWorker.java

Co-authored-by: Charles <giardina.charles@gmail.com>

* add more context

* remove unused arg

* test on CI that no_retry is insufficient

* revert back to orchestrator retry

* test for retry logic

* remove failing test and switch back activity config to just no retry

Co-authored-by: Charles <giardina.charles@gmail.com>
2022-02-17 15:14:51 -08:00
LiRen Tu
049a11b2bc 🎉 Snowflake destination: reduce memory footprint (#10394)
* Add detailed logging for flushing

* Log sentry transaction event id

* Adjust logging

* Log memory usage

* Add jvm monitoring

* Remove log

* Remove port 9010

* Remove host network mode

* Sample record size

* Remove profiling code

* Add unit tests

* Use average estimation

* Rename variable

* Format code

* Bump version

* Revert unnecessary change

* Update doc

* Fix format

* Bump version in seed
2022-02-17 12:55:35 -08:00
Jared Rhizor
3da09aa152 make status checks configurable from env vars + use shorter replication interval for testing (#10368)
* make status check interval env-configurable

* apply to test files to get the speed improvements

* evert "apply to test files to get the speed improvements"

This reverts commit 97159e3a8b.

* Revert "evert "apply to test files to get the speed improvements""

This reverts commit bf3c6a5612.
2022-02-16 11:14:05 -08:00
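Making an interval configurable from an environment variable typically means reading the variable with a fallback default. In this sketch the variable name, the default of 30 seconds, and the class are hypothetical; the method takes a `Map` so tests can pass a short interval, and production code would pass `System.getenv()`.

```java
import java.time.Duration;
import java.util.Map;

final class StatusCheckConfig {

  // Hypothetical variable name and default value.
  static final String ENV_VAR = "WORKER_STATUS_CHECK_INTERVAL_SECONDS";
  static final Duration DEFAULT_INTERVAL = Duration.ofSeconds(30);

  // Returns the configured interval, or the default when the variable is
  // unset or blank. A non-numeric value raises NumberFormatException.
  static Duration statusCheckInterval(Map<String, String> env) {
    String raw = env.get(ENV_VAR);
    if (raw == null || raw.isBlank()) {
      return DEFAULT_INTERVAL;
    }
    return Duration.ofSeconds(Long.parseLong(raw.trim()));
  }
}
```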
Nikolai Korolev
1f908436fb Normalization Clickhouse: Fix exception in case password is not provided (#10219)
* Normalization Clickhouse: Fix exception in case password is not provided

* Do not provide password in dbt config in case there is no one

* bump connector version

* bump normalization version

Co-authored-by: Marcos Marx <marcosmarxm@gmail.com>
2022-02-16 15:56:08 -03:00
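The fix in #10219, leaving the password out of the dbt config when none is provided, comes down to a conditional when building the target. This Java sketch only illustrates that rule; the method and the profile keys (`type`, `host`, `port`, `user`, `password`) are assumptions about a dbt ClickHouse target, not the normalization code from the PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ClickhouseDbtProfile {

  // Builds a dbt target for ClickHouse. The password key is written only
  // when a non-empty password was configured, rather than emitting null.
  static Map<String, Object> target(String host, int port, String user, String password) {
    Map<String, Object> target = new LinkedHashMap<>();
    target.put("type", "clickhouse");
    target.put("host", host);
    target.put("port", port);
    target.put("user", user);
    if (password != null && !password.isEmpty()) {
      target.put("password", password);
    }
    return target;
  }
}
```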
Benoit Moriceau
ab10996f89 If an activity is failing, block the workflow and make it queryable (#10121)
After an activity failure, we now block the workflow. A new query method lets us query the workflows to get the list of stuck workflows.
The failed activity can then be retried with a signal.
2022-02-15 14:32:03 -08:00
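The blocking behaviour in #10121 can be illustrated with a plain-Java analogue: the workflow waits after a failure, a query reports whether it is stuck, and a signal releases it. This is not Temporal code (a Temporal workflow would wait with `Workflow.await` rather than `Object.wait`), and the class and method names are hypothetical.

```java
// Plain-Java analogue of "block after a failed activity until signalled".
final class StuckWorkflowSketch {

  private boolean stuck = false;

  // Workflow side: called after an activity fails. Marks the workflow as
  // stuck and blocks until a retry signal clears the flag.
  synchronized void blockUntilRetrySignal() throws InterruptedException {
    stuck = true;
    while (stuck) {
      wait();
    }
  }

  // Query handler: lets an operator find workflows that are currently stuck.
  synchronized boolean isStuck() {
    return stuck;
  }

  // Signal handler: releases the workflow so the failed activity can be retried.
  synchronized void retryFailedActivity() {
    stuck = false;
    notifyAll();
  }
}
```

Waiting in a loop on the flag, rather than with a single `wait()`, guards against spurious wakeups.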
Parker Mossman
b742a451a0 Configure kube pod process per job type (#10200)
* split workerConfigs and processFactory by job type, env var for check job node selectors

* move status check interval to WorkerConfigs and customize for check worker

* add scaffolding for spec, discover, and sync configs

* optional orElse instead of orElseGet

* add replicationWorkerConfigs with custom resource requirements
2022-02-15 09:59:41 -08:00
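Splitting configuration by job type, with an environment variable for check-job node selectors, can be sketched as an `EnumMap` keyed by job type. The environment variable names, the `key=value,key=value` format, and the job type list are assumptions for illustration, not the exact names introduced in #10200.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Map;

enum JobType { SPEC, CHECK, DISCOVER, SYNC }

final class NodeSelectorConfig {

  // Hypothetical environment variable names.
  static final String DEFAULT_SELECTORS_VAR = "JOB_KUBE_NODE_SELECTORS";
  static final String CHECK_SELECTORS_VAR = "CHECK_JOB_KUBE_NODE_SELECTORS";

  // Parses "key1=value1,key2=value2" into an ordered, unmodifiable map.
  static Map<String, String> parse(String raw) {
    Map<String, String> selectors = new LinkedHashMap<>();
    if (raw == null || raw.isBlank()) {
      return Collections.unmodifiableMap(selectors);
    }
    for (String pair : raw.split(",")) {
      if (pair.isBlank()) {
        continue;
      }
      String[] kv = pair.split("=", 2);
      if (kv.length != 2) {
        throw new IllegalArgumentException("Expected key=value but got: " + pair);
      }
      selectors.put(kv[0].trim(), kv[1].trim());
    }
    return Collections.unmodifiableMap(selectors);
  }

  // Every job type gets the default selectors; check jobs may override them.
  static Map<JobType, Map<String, String>> byJobType(Map<String, String> env) {
    Map<String, String> defaults = parse(env.get(DEFAULT_SELECTORS_VAR));
    Map<JobType, Map<String, String>> result = new EnumMap<>(JobType.class);
    for (JobType type : JobType.values()) {
      result.put(type, defaults);
    }
    String check = env.get(CHECK_SELECTORS_VAR);
    if (check != null && !check.isBlank()) {
      result.put(JobType.CHECK, parse(check));
    }
    return result;
  }
}
```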
Jared Rhizor
af5e133a89 fix race conditions in ConnectionManagerWorkflowTest (#10296) 2022-02-14 14:11:17 -08:00
Benoit Moriceau
c5e199f260 Disable flaky test (#10321) 2022-02-14 09:44:00 -08:00
VitaliiMaltsev
e30d8348b2 Change JsonSchemaPrimitive to a class (#9913)
* fix for jdk 17

* add JsonSchemaType class

* fix tests

* fix tests

* fix tests

* fix tests

* fix tests

* fix tests

* fix Oracle tests

* fix Redshift tests

* fix Redshift tests

* fix checkstyle

* fix MSSQL tests

* fix cockroachdb tests

* fix checkstyle

* fix checkstyle

* replace star imports

* replace star imports

* replace star imports

* update JsonSchemaType | fixed checkstyle

* Remove unused variables in test

* Fix imports

* Expand imports

* Fix more imports

Co-authored-by: vmaltsev <vitalii.maltsev@globallogic.com>
Co-authored-by: Liren Tu <tuliren.git@outlook.com>
2022-02-14 02:12:37 -08:00
Lake Mossman
820a9ff840 do not wipe out existing connection resource requirements on update (#10291)
* do not wipe out existing connection resource requirements on update

* format

* do not pull any values from worker configs here

* remove logger
2022-02-11 15:57:13 -08:00