* track latest config message
* pass new config as part of outputs
* persist new config
* persist config as the messages come through, don't set output
* clean up old implementation
* accept control messages for destinations
* get api client from micronaut
* mask instance-wide oauth params when updating configs
* DefaultReplicationWorker tests
* formatting
* tests for source/destination handlers
* rm todo
* refactor test a bit to fix pmd
* fix pmd
* fix test
* add PersistConfigHelperTest
* update message tracker comment
* fix pmd
* format
* move ApiClientBeanFactory to commons-worker, use in container-orchestrator
* pull out config updating to separate methods
* add jitter
* rename PersistConfigHelper -> UpdateConnectorConfigHelper, docs
* fix exception type
* fmt
* move message type check into runnable
* formatting
* pass api client env vars to container orchestrator
* pass micronaut envs to container orchestrator
* print stacktrace for debugging
* different api host for container orchestrator
* fix default env var
* format
* fix errors after merge
* set source and destination actor id as part of the sync input
* fix: get destination definition
* fix null ptr
* remove "actor" from naming
* fix missing change from rename
* revert ContainerOrchestratorConfigBeanFactory changes
* inject sourceapi/destinationapi directly rather than airbyteapiclient
* UpdateConnectorConfigHelper -> ConnectorConfigUpdater
* rm log
* fix test
* don't fail on config update error
* pass id, not full config to runnables/accept control message
* add new config required for api client
* add test file
* fix test compatibility
* mount data plane credentials secret to container orchestrator (#20724)
* mount data plane credentials secret to container orchestrator
* rm copy-pasta
* properly handle empty strings
* set env vars like before
* use the right config vars
Follow-up PR to #20787. Make stats available to the read APIs so that they are available to the webapp.
After this, all that is left is writing these stats as the job progresses.
Add the required logic in JobHistoryHandler.java.
Took the chance to also rename our internal Attempt model's field from id to attemptNumber to better reflect that the field stores the job's attempt number, not the row's database id. Most of the file changes here are due to that rename.
* pass workspace id to sync workflow and use it to selectively enable field selection
* fix tests around workspace id in job creation
* make sure field selection environment variables get passed through properly
* clean up handling around field selection flags
* debug logging for field selection
* properly handle empty field selection feature flag
* fix pmd
* actually fix pmd
* add log
* fix application.yml
* remove logging
* var name for boolean
* test setup
* test
* more fix for testing
* self review
* remove unrelated changes
* remove unwanted cdk changes
* more clean ups
* Disable auto detect schema activity bits
* Disable impacted tests
* Disable auto detect schema checks
* Add comment as to why code has been disabled
* Fix PMD warnings
* Fix PMD warning
Co-authored-by: Peter Hu <peter@airbyte.io>
* updated IntegrationLauncherConfig.yaml and added the suportDBT and normalizationImage fields to this class. Added code to the GenerateInputActivityImpl and TemporalClient classes to read destination_definition.yaml and get the suportDBT and normalizationImage fields. Added logging that compares the normalization images from NormalizationRunnerFactory and destination_definition.yaml
* updated minor remarks
* updated minor remarks
* fixed minor remarks
* added normalization data to the tests
* fixed minor remarks
* removed NormalizationRunnerFactory
* fixed remarks
* fixed remarks
* fixed remarks
* updated acceptance tests
* updated acceptance tests
* updated check_images_exist.sh script
* updated method for getting the normalization image name for destination acceptance test
* fixed code style
* fixed code style and removed tests data
* updated JobErrorReporterTest.java
* updated JobErrorReporterTest.java
* fixed remarks
* added integration type field to the destination_definition file and actor_definition table
* fixed tests
* fixed tests
* fixed minor changes after pulling master changes
* fixed minor changes after pulling master changes
* renamed integrationType to normalizationIntegrationType; fixed minor remarks
* renamed extra dependencies
* updated docs
* updated docs
* fixed minor remarks
* added NormalizationDestinationDefinitionConfig.yaml for StandardDestinationDefinition.yaml and updated configuration
* updated normalization tag
* updated DestinationAcceptanceTest.java
* updated DestinationAcceptanceTest.java
* updated imports and descriptions
* Revert "fix: remove unused CONTAINER_ORCHESTRATOR_ENABLED var (#20261)"
This reverts commit ce29361b55.
* docs: add additional commentary on flag usage
* implement column filtering in the replication workflow
* fixes to column selection in replication workflow
* add a basic acceptance test for column selection
* make CI acceptance tests run with new field selection flag enabled
* fix format
* readability improvements around columns selection tests and other small fixes
* kubeprocessfactory flag
* plumbing through custom connector bit
* config, style, pmd
* fix more missing configs pass along
* fix sync config
* test fix
* add a flag to specify if we want to use a separate pool or not
* add missing micronaut configs
* make orchestrator job run in custom pool too
* micronaut fix
* pass config to orchestrator
* fix test
* destination test fix
* PR comments fix
* style fix
* comment fix
* no checks on kubeprocess
* rename
* add flags for reset work
* test fix
Follow-up from #19814, where we introduced the StreamStats object to consolidate/simplify some of the stats memory objects.
In this PR, we extend the StreamStats object to also include the emitted records and bytes.
- Make StreamStats into a proper object. We cannot use a record as record fields are immutable. We need mutable fields to count.
- Consolidate the emitted records into StreamStats.
- Take the chance to move all the stats/metrics related classes into a book_keeping package to keep things clean.
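As a rough illustration of why a mutable class is needed here rather than a record, the sketch below keeps per-stream counters that are incremented in place. The class, field, and method names are hypothetical and are not taken from the actual StreamStats implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names are illustrative, not the real StreamStats class.
class StreamStatsSketch {
  // Mutable counters. Record components are final, so a record could not be
  // incremented in place and would need a new instance for every message.
  long emittedRecords;
  long emittedBytes;

  void recordEmitted(final long bytes) {
    emittedRecords++;
    emittedBytes += bytes;
  }
}

// Hypothetical tracker holding one stats object per stream name.
class StreamStatsTrackerSketch {
  private final Map<String, StreamStatsSketch> statsByStream = new HashMap<>();

  // Called once per emitted record with that record's size in bytes.
  void onRecord(final String streamName, final long bytes) {
    statsByStream.computeIfAbsent(streamName, k -> new StreamStatsSketch()).recordEmitted(bytes);
  }

  StreamStatsSketch get(final String streamName) {
    return statsByStream.get(streamName);
  }
}
```

With a record, each update would have to replace the map entry with a fresh object; the mutable class keeps the hot path to two field increments.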
Implement estimate message processing, allowing the platform to hold on to estimate message counts in memory.
The estimate message is a protocol message that connectors can choose to emit to support progress bar calculations. There are two kinds of estimates: per-Sync or per-Stream. Sources cannot emit both types in a single sync.
Per-stream estimates are what we usually expect. Per-sync estimates are for sources that cannot provide more granular estimates for whatever reason, e.g. CDC sources.
In a follow up PR, the platform will periodically save these messages through the save stats api.
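The in-memory handling described above might look roughly like the following sketch. All names are hypothetical, and throwing when a source mixes the two estimate types is an assumption about how the rule could be enforced, not a description of the actual behavior.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; not the platform's actual estimate tracking code.
class EstimateTrackerSketch {
  private final Map<String, Long> streamRowEstimates = new HashMap<>();
  private Long syncRowEstimate; // stays null until a per-sync estimate arrives

  // Per-stream estimate: the latest estimate for a stream replaces the previous one.
  void acceptStreamEstimate(final String stream, final long rows) {
    if (syncRowEstimate != null) {
      // Assumption: mixing estimate types is rejected outright.
      throw new IllegalStateException("per-sync estimate already received");
    }
    streamRowEstimates.put(stream, rows);
  }

  // Per-sync estimate, for sources that cannot estimate per stream.
  void acceptSyncEstimate(final long rows) {
    if (!streamRowEstimates.isEmpty()) {
      throw new IllegalStateException("per-stream estimates already received");
    }
    syncRowEstimate = rows;
  }

  // Total estimated rows for the sync, or empty if nothing has been emitted yet.
  Optional<Long> totalRowEstimate() {
    if (syncRowEstimate != null) {
      return Optional.of(syncRowEstimate);
    }
    if (streamRowEstimates.isEmpty()) {
      return Optional.empty();
    }
    return Optional.of(streamRowEstimates.values().stream().mapToLong(Long::longValue).sum());
  }
}
```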
Today we often see 'HTTP/1.1 header parser received no bytes' during syncs, especially in the Data Plane.
This PR attempts to fix this by adding naive retries.
Add a basic retry wrapper with the unique ability to retry for a much longer period on the last retry. This is particularly useful for us as most of our jobs are long running workflows, and the benefit of not having to restart the entire job outweighs the added wait time.
Alternative solutions I explored:
- Switching the underlying HTTP client to a more fully featured HTTP client. E.g. Apache or OkHttp. Issues with this:
  - These clients do not support the ability to configure the retry policy we want.
  - These clients do not support the ability to inject application aware logging.
  - Most importantly, because this changes the interface, the resulting change set is big and affects many unrelated classes. I do think we eventually want to switch the underlying libraries out. However I don't think we should do this as part of OC work.
- Exploring pairing retry libraries such as https://resilience4j.readme.io/docs with the native http clients. The main issue here is the lack of ability to configure the last retry period.
Since the hand-rolled wrapper is simple + gets the job done, my thoughts are to run with this for the time being and revisit this if additional requirements around the clients come up.
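A minimal sketch of the kind of retry wrapper described above follows. The class name, method signature, and parameters are hypothetical, and it assumes that "a much longer period on the last retry" means waiting longer before the final attempt; the actual wrapper may differ.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

// Hypothetical sketch of a naive retry wrapper; not the actual implementation.
class RetryWithLongFinalWaitSketch {

  // Runs the action up to maxAttempts times. Between ordinary attempts it sleeps
  // for `wait`; before the final attempt it sleeps for the longer `finalWait`.
  static <T> T call(final Callable<T> action,
                    final int maxAttempts,
                    final Duration wait,
                    final Duration finalWait) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return action.call();
      } catch (final Exception e) {
        last = e;
        if (attempt == maxAttempts) {
          break; // out of attempts, rethrow below
        }
        // Give the last retry a much longer pause than the others.
        final Duration pause = (attempt == maxAttempts - 1) ? finalWait : wait;
        Thread.sleep(pause.toMillis());
      }
    }
    throw last;
  }
}
```

For a long-running job, a long final pause costs far less than restarting the whole workflow, which is the trade-off the description argues for.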
* Tmp
* Move when the deletion is performed
* Re-enable disabled test
* PR comments
* Use cancel
* rename
* Fix test and version check position
* remove unused temporal deletion code
* Remove false todo
* Rm repeated test
* Rm unused import
* Log exception