Some connectors (such as destination-s3) need to write temporary data (generally to /tmp).
It is good security practice to enforce a read-only root filesystem on Kubernetes pods, and some production Kubernetes clusters require that all pods run with a read-only root filesystem.
Therefore, to still allow connectors to write temporary data to /tmp with a read-only root filesystem, we must mount an emptyDir volume at /tmp.
The original PR was #9874; we decided to split it into 3 different PRs.
A limit for this volume will be handled in https://github.com/airbytehq/airbyte/issues/11025.
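As a rough sketch of what the pod spec change looks like (using the fabric8 Kubernetes client; the container name, pod name, and builder wiring here are illustrative, not the exact Airbyte implementation):

```java
import io.fabric8.kubernetes.api.model.Container;
import io.fabric8.kubernetes.api.model.ContainerBuilder;
import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.api.model.PodBuilder;
import io.fabric8.kubernetes.api.model.Volume;
import io.fabric8.kubernetes.api.model.VolumeBuilder;
import io.fabric8.kubernetes.api.model.VolumeMount;
import io.fabric8.kubernetes.api.model.VolumeMountBuilder;

public class TmpVolumeExample {

  static Pod buildConnectorPod(final String image) {
    // emptyDir volume backing /tmp, so the container can write temporary data
    // even when the root filesystem is read-only.
    final Volume tmpVolume = new VolumeBuilder()
        .withName("tmp")
        .withNewEmptyDir()
        .endEmptyDir()
        .build();

    final VolumeMount tmpMount = new VolumeMountBuilder()
        .withName("tmp")
        .withMountPath("/tmp")
        .build();

    final Container main = new ContainerBuilder()
        .withName("connector-main")
        .withImage(image)
        .withNewSecurityContext()
          .withReadOnlyRootFilesystem(true) // enforced by some clusters anyway
        .endSecurityContext()
        .withVolumeMounts(tmpMount)
        .build();

    return new PodBuilder()
        .withNewMetadata().withName("connector-pod").endMetadata()
        .withNewSpec()
          .withContainers(main)
          .withVolumes(tmpVolume)
          .withRestartPolicy("Never")
        .endSpec()
        .build();
  }
}
```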
* Pass worker metadata to connector
* Fix compilation
* Pass in job id and image from worker
* Remove application version
* Add default job environment variables
* Add back removed comment
* Rename env map to job metadata
* Fix env configs
* Read connector from application
* Use empty string
* Remove println
* Fix unit test
* Fix compilation error
* Introduce constants for worker env
* Add worker env to ENV_VARS_TO_TRANSFER
* Pass getWorkerMetadata map into all constructors
* Format code
* Format octavia cli
* Fix test compilation
* Fix typos
This adds tests to make sure that a reset is continued as a reset after a failed attempt, and as a new job when the maximum number of attempts is reached.
It also fixes the workflow to continue as a reset in a new job if it fails more than the maximum number of attempts (see the sketch after the open questions below).
Open questions:
- Is this what we want for the job (continue as a reset if the job failed)?
- Do we need to respect the schedule if the reset failed more than the maximum number of attempts?
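A minimal sketch of the continue-as-new behavior described above, assuming a hypothetical `ConnectionUpdaterInput` with a reset flag and an attempt counter (the real workflow input and field names may differ):

```java
import io.temporal.workflow.Workflow;

public class ConnectionManagerWorkflowSketch {

  private static final int MAX_ATTEMPTS = 3;

  // Hypothetical workflow input: carries whether the current job is a reset
  // and how many attempts it has made so far.
  public static class ConnectionUpdaterInput {
    public boolean resetConnection;
    public int attemptNumber;
  }

  // Must run inside a workflow implementation; shown here in isolation as a sketch.
  public void handleAttemptOutcome(final ConnectionUpdaterInput input, final boolean attemptFailed) {
    if (!attemptFailed) {
      return; // the job succeeded, nothing to continue
    }

    if (input.attemptNumber < MAX_ATTEMPTS) {
      // Retry within the same job: keep the reset flag so a reset stays a reset.
      input.attemptNumber++;
      return;
    }

    // Max attempts reached: continue as a new workflow run that starts a new job.
    // If the failed job was a reset, the new job is started as a reset as well.
    final ConnectionUpdaterInput next = new ConnectionUpdaterInput();
    next.resetConnection = input.resetConnection;
    next.attemptNumber = 0;
    Workflow.continueAsNew(next);
  }
}
```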
* pipe through to worker
* wip
* pass source and dest def resource reqs to job client
* fix test
* use resource requirements utils to get resource reqs for legacy and new impls
* undo changes to pass sync input to container launcher worker factory
* remove import
* fix hierarchy order of resource requirements
* add nullable annotations
* undo change to test
* format
* use destination resource reqs for normalization and make resource req utils more flexible
* format
* refactor resource requirements utils and add tests
* switch to storing source/dest resource requirements directly on job sync config
* fix tests and javadocs
* use sync input resource requirements for container orchestrator pod
* do not set connection resource reqs to worker reqs
* add overridden requirement utils method + test + comment
Co-authored-by: lmossman <lake@airbyte.io>
Publish metrics for:
- created jobs tagged by release stage
- failed jobs tagged by release stage
- cancelled jobs tagged by release stage
- succeeded jobs tagged by release stage
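A hedged sketch of what the emission could look like; `MetricClient`, `count`, the metric names, and the tag format are hypothetical stand-ins for whatever metrics client is actually in use:

```java
// Hypothetical metrics client interface; the real client/API may differ.
interface MetricClient {
  void count(String metricName, long value, String... tags);
}

public class JobMetricsExample {

  private final MetricClient metricClient;

  public JobMetricsExample(final MetricClient metricClient) {
    this.metricClient = metricClient;
  }

  // releaseStage would be e.g. alpha/beta/generally_available for the connector involved.
  public void recordJobCreated(final String releaseStage) {
    metricClient.count("job_created_by_release_stage", 1, "release_stage:" + releaseStage);
  }

  public void recordJobFailed(final String releaseStage) {
    metricClient.count("job_failed_by_release_stage", 1, "release_stage:" + releaseStage);
  }

  public void recordJobCancelled(final String releaseStage) {
    metricClient.count("job_cancelled_by_release_stage", 1, "release_stage:" + releaseStage);
  }

  public void recordJobSucceeded(final String releaseStage) {
    metricClient.count("job_succeeded_by_release_stage", 1, "release_stage:" + releaseStage);
  }
}
```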
* stabilize connection manager tests
* just call shutdown once
* another run just so we can see if it's passing
* another run just so we can see if it's passing
* re-disable test
* run another test
* run another test
* run another test
* run another test
* test time ranges for cancellations
* try with wait
* fix cancellation on worker restart
* revert for CI testing that the test fails without the retry policy
* revert testing change
* matrix test the different possible cases
* re-enable new retry policy
* switch to no_retry
* switch back to new retry
* parameterize correctly
* revert to no-retry
* re-enable new retry policy
* speed up test + fixes
* significantly speed up test
* fix ordering
* use multiple task queues in connection manager test
* use versioning for task queue change
* remove sync workflow registration for the connection manager queue
* use more specific example
* respond to parker's comments
* add timeout to init container command
* add disk usage check into init command
* fix up disk usage checking and logs from init entrypoint
* run format
This changes the check that determines whether a connection exists, making it more performant and more accurate. It makes sure that the workflow is reachable by trying to query it.
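Roughly, the reachability check amounts to querying the workflow and treating any failure as "not reachable"; the interface and exception handling below are a simplified sketch, not the exact production code:

```java
import io.temporal.client.WorkflowClient;
import io.temporal.workflow.QueryMethod;
import io.temporal.workflow.WorkflowInterface;

public class ConnectionExistenceCheck {

  // Simplified stand-in for the connection manager workflow interface.
  @WorkflowInterface
  public interface ConnectionManagerWorkflow {
    @QueryMethod
    String getState();
  }

  // Returns true only if the workflow for this connection is reachable,
  // i.e. the query round-trip succeeds.
  static boolean isWorkflowReachable(final WorkflowClient client, final String workflowId) {
    try {
      final ConnectionManagerWorkflow workflow =
          client.newWorkflowStub(ConnectionManagerWorkflow.class, workflowId);
      workflow.getState(); // throws if the workflow does not exist or is unreachable
      return true;
    } catch (final Exception e) {
      return false;
    }
  }
}
```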
Move the feature flag checks to the handler instead of the configuration API. This could have avoided some bugs related to the missing flag check in the cloud project.
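As an illustration of the intended shape (the interface, flag, and method names here are hypothetical), the handler guards the operation itself instead of relying on the API layer to do it:

```java
// Hypothetical feature flag interface; the real one may expose different flags.
interface FeatureFlags {
  boolean resetViaNewSchedulerEnabled();
}

public class ConnectionsHandlerSketch {

  private final FeatureFlags featureFlags;

  public ConnectionsHandlerSketch(final FeatureFlags featureFlags) {
    this.featureFlags = featureFlags;
  }

  public void resetConnection(final String connectionId) {
    // The check lives in the handler, so every caller (OSS API, Cloud, tests)
    // goes through it instead of depending on the configuration API layer.
    if (!featureFlags.resetViaNewSchedulerEnabled()) {
      throw new UnsupportedOperationException("Reset via the new scheduler is not enabled");
    }
    // ... perform the reset ...
  }
}
```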
Set up a Metrics Registry. The purpose of this registry is to better enforce the metric -> application relationship and the metric -> description relationship, provide a central location where folks can understand what metrics OSS Airbyte emits, and enforce some standards.
Past experience has shown me that metrics emission can quickly get out of hand: 1) it is unclear what is emitted, 2) similar metrics are emitted in multiple places, 3) it is not clear which metrics correspond to which application.
This is my attempt to provide a framework for us to operate in.
Let me know if folks think this provides more complexity than is useful.
I've added the KubePodProcess metric here to demonstrate/test how everything will work in practice.
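A minimal sketch of the registry idea: an enum that ties each metric to an owning application and a description, so new metrics are declared in one place (the enum and metric names below are illustrative):

```java
public class MetricsRegistrySketch {

  enum MetricEmittingApp {
    WORKER,
    SCHEDULER
  }

  // Every metric must declare which application emits it and what it means.
  enum OssMetricsRegistry {
    KUBE_POD_PROCESS_CREATE_TIME_MILLISECS(
        MetricEmittingApp.WORKER,
        "kube_pod_process_create_time_millisecs",
        "time taken to create a new kube pod process");

    private final MetricEmittingApp application;
    private final String metricName;
    private final String metricDescription;

    OssMetricsRegistry(final MetricEmittingApp application,
                       final String metricName,
                       final String metricDescription) {
      this.application = application;
      this.metricName = metricName;
      this.metricDescription = metricDescription;
    }

    public MetricEmittingApp getApplication() {
      return application;
    }

    public String getMetricName() {
      return metricName;
    }

    public String getMetricDescription() {
      return metricDescription;
    }
  }
}
```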
* Update mysql normalization to cast string as text.
Bump docker version.
Update basic-normalization.md docs.
* Update docs PR reference
* Update mysql normalization to cast string as text for is_timestamp_with_time_zone type
This reorganizes the connectionManagerWorkflow in order to make it easier to understand. It:
- Removes some unneeded variables
- Extracts the activity calls into their own methods (see the sketch below)
- Reports the status as soon as possible: it does not wait to be out of the cancellation scope to report success and failure
- Avoids else conditions in order to be more explicit about when we exit
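A small sketch of the "extract the activity call into its own method" shape; the activity interface and method names are hypothetical, and the stub must live inside a workflow implementation:

```java
import io.temporal.activity.ActivityInterface;
import io.temporal.activity.ActivityMethod;
import io.temporal.activity.ActivityOptions;
import io.temporal.workflow.Workflow;
import java.time.Duration;

public class ConnectionManagerWorkflowRefactorSketch {

  // Hypothetical activity used to report job status.
  @ActivityInterface
  public interface JobStatusActivity {
    @ActivityMethod
    void reportSuccess(long jobId);

    @ActivityMethod
    void reportFailure(long jobId);
  }

  private final JobStatusActivity jobStatusActivity = Workflow.newActivityStub(
      JobStatusActivity.class,
      ActivityOptions.newBuilder()
          .setStartToCloseTimeout(Duration.ofSeconds(30))
          .build());

  // Each activity call lives in its own small method, and the status is
  // reported as soon as the outcome is known rather than after leaving
  // the cancellation scope.
  private void reportSuccess(final long jobId) {
    jobStatusActivity.reportSuccess(jobId);
  }

  private void reportFailure(final long jobId) {
    jobStatusActivity.reportFailure(jobId);
  }
}
```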
Make it possible to set resource limits specifically for Check Connection. This helps speed up the Check Connection operation for Java-based connectors.
After this PR is merged, I will do an OSS release and make the required Helm changes in Cloud.
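A sketch of how such per-operation limits could be read from the environment; the environment variable names and default values below are illustrative assumptions, not necessarily the exact ones introduced by this PR:

```java
import java.util.Optional;

public class CheckConnectionResourceConfigSketch {

  // Simple holder for container resource requirements.
  public record ResourceRequirements(String cpuRequest, String cpuLimit,
                                     String memoryRequest, String memoryLimit) {}

  // Reads check-specific limits from the environment, falling back to illustrative defaults.
  static ResourceRequirements checkConnectionResources() {
    return new ResourceRequirements(
        env("CHECK_JOB_MAIN_CONTAINER_CPU_REQUEST", "0.5"),
        env("CHECK_JOB_MAIN_CONTAINER_CPU_LIMIT", "1"),
        env("CHECK_JOB_MAIN_CONTAINER_MEMORY_REQUEST", "512Mi"),
        env("CHECK_JOB_MAIN_CONTAINER_MEMORY_LIMIT", "1Gi"));
  }

  private static String env(final String name, final String defaultValue) {
    return Optional.ofNullable(System.getenv(name)).orElse(defaultValue);
  }
}
```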
* fix normalization output processing in container orchestrator
* add full scheduler v2 acceptance tests
* speed up tests
* fixes
* clean up
* wip handle worker restarts
* only downtime during sync test not passing
* commit temp
* mostly cleaned up
* add attempt count check
* remove todo
* switch all pending checks to running checks
* use ++
* Update airbyte-container-orchestrator/src/main/java/io/airbyte/container_orchestrator/ContainerOrchestratorApp.java
Co-authored-by: Charles <giardina.charles@gmail.com>
* Update airbyte-workers/src/main/java/io/airbyte/workers/temporal/sync/LauncherWorker.java
Co-authored-by: Charles <giardina.charles@gmail.com>
* add more context
* remove unused arg
* test on CI that no_retry is insufficient
* revert back to orchestrator retry
* test for retry logic
* remove failing test and switch back activity config to just no retry
Co-authored-by: Charles <giardina.charles@gmail.com>
* make status check interval env-configurable
* apply to test files to get the speed improvements
* evert "apply to test files to get the speed improvements"
This reverts commit 97159e3a8b.
* Revert "evert "apply to test files to get the speed improvements""
This reverts commit bf3c6a5612.
* Normalization Clickhouse: Fix exception in case password is not provided
* Do not provide password in dbt config if there is none
* bump connector version
* bump normalization version
Co-authored-by: Marcos Marx <marcosmarxm@gmail.com>
After an activity failure, we block the workflow. A new query method is available to query workflows and get the list of workflows that are stuck.
The activity can then be retried with a signal.
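A sketch of the query/signal pair this describes, with simplified names (the production workflow interface may differ):

```java
import io.temporal.workflow.QueryMethod;
import io.temporal.workflow.SignalMethod;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;

@WorkflowInterface
public interface ConnectionManagerWorkflowStuckSketch {

  @WorkflowMethod
  void run();

  // Returns true when the workflow is blocked after an activity failure,
  // so an operator (or a script) can list the stuck workflows.
  @QueryMethod
  boolean isQuarantined();

  // Unblocks the workflow and retries the failed activity.
  @SignalMethod
  void retryFailedActivity();
}
```

Operators can then list the workflows whose query returns true and send the signal to unblock them.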
* split workerConfigs and processFactory by job type, env var for check job node selectors
* move status check interval to WorkerConfigs and customize for check worker
* add scaffolding for spec, discover, and sync configs
* optional orElse instead of orElseGet
* add replicationWorkerConfigs with custom resource requirements