Let's try #20937 again, this time with better tests for the error cases.
See original PR for description.
This PR adds testing and logic to handle empty/bad job input.
* track latest config message
* pass new config as part of outputs
* persist new config
* persist config as the messages come through, don't set output
* clean up old implementation
* accept control messages for destinations
* get api client from micronaut
* mask instance-wide oauth params when updating configs
* defaultreplicationworker tests
* formatting
* tests for source/destination handlers
* rm todo
* refactor test a bit to fix pmd
* fix pmd
* fix test
* add PersistConfigHelperTest
* update message tracker comment
* fix pmd
* format
* move ApiClientBeanFactory to commons-worker, use in container-orchestrator
* pull out config updating to separate methods
* add jitter
* rename PersistConfigHelper -> UpdateConnectorConfigHelper, docs
* fix exception type
* fmt
* move message type check into runnable
* formatting
* pass api client env vars to container orchestrator
* pass micronaut envs to container orchestrator
* print stacktrace for debugging
* different api host for container orchestrator
* fix default env var
* format
* fix errors after merge
* set source and destination actor id as part of the sync input
* fix: get destination definition
* fix null ptr
* remove "actor" from naming
* fix missing change from rename
* revert ContainerOrchestratorConfigBeanFactory changes
* inject sourceapi/destinationapi directly rather than airbyteapiclient
* UpdateConnectorConfigHelper -> ConnectorConfigUpdater
* rm log
* fix test
* don't fail on config update error
* pass id, not full config to runnables/accept control message
* add new config required for api client
* add test file
* fix test compatibility
* mount data plane credentials secret to container orchestrator (#20724)
* mount data plane credentials secret to container orchestrator
* rm copy-pasta
* properly handle empty strings
* set env vars like before
* use the right config vars
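The control-message handling described in the bullets above can be sketched roughly as follows. This is a hedged sketch, not the actual implementation: the class name mirrors the ConnectorConfigUpdater rename from the commits, but the message-type string, the persistence callback, and the map-based config shape are simplifying assumptions.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.BiConsumer;

public class ConnectorConfigUpdater {

    // Callback that persists a new config for a given connector id,
    // e.g. via the API client. (Assumed shape, for illustration only.)
    private final BiConsumer<String, Map<String, ?>> persistConfig;

    // Track the latest config control message seen during the sync.
    private Map<String, ?> latestConfig;

    public ConnectorConfigUpdater(final BiConsumer<String, Map<String, ?>> persistConfig) {
        this.persistConfig = persistConfig;
    }

    // Accept a control message; only connector-config messages update the
    // stored config, other control message types are ignored.
    public void acceptControlMessage(final String connectorId, final String type, final Map<String, ?> config) {
        if (!"CONNECTOR_CONFIG".equals(type)) {
            return;
        }
        latestConfig = config;
        try {
            // Persist as the messages come through rather than setting an output.
            persistConfig.accept(connectorId, config);
        } catch (final RuntimeException e) {
            // Per the commit notes: don't fail the sync on a config update error.
        }
    }

    public Optional<Map<String, ?>> getLatestConfig() {
        return Optional.ofNullable(latestConfig);
    }
}
```

The same accept path can serve both sources and destinations, since only the connector id passed to the persistence callback differs.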
Follow-up PR to #20787. Make stats available to the read APIs so they are available to the webapp.
After this, all that is left is writing these stats as the job progresses.
Add the required logic in JobHistoryHandler.java.
Took the chance to also rename our internal Attempt model's field from id to attemptNumber, to better reflect that the field stores the job's attempt number rather than the row's database id. Most of the file changes here are due to that rename.
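To make the rename concrete, a minimal sketch of the model field in question; the surrounding fields and the 0-based numbering are assumptions for illustration, not the actual Airbyte model:

```java
// Hedged sketch: attemptNumber is the attempt's position within its job,
// not the attempt row's database id.
public record Attempt(long jobId, int attemptNumber) {

    // Illustrative helper: a 1-based number for display, assuming
    // attemptNumber is 0-based internally.
    public int displayNumber() {
        return attemptNumber + 1;
    }
}
```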
* pass workspace id to sync workflow and use it to selectively enable field selection
* fix tests around workspace id in job creation
* make sure field selection environment variables get passed through properly
* clean up handling around field selection flags
* debug logging for field selection
* properly handle empty field selection feature flag
* fix pmd
* actually fix pmd
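The field-selection behavior the bullets above describe can be sketched as below. This is an assumption-laden illustration: the class and method names are hypothetical, and the "properly handle empty field selection" bullet is interpreted here as treating an empty or absent selection as "select everything".

```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class FieldSelector {

    // Keep only the selected top-level fields of a record. If selection is
    // disabled or empty (e.g. the feature flag is off for this workspace),
    // pass the record through unchanged.
    public static <V> Map<String, V> applySelection(final Map<String, V> recordData,
                                                    final Set<String> selectedFields) {
        if (selectedFields == null || selectedFields.isEmpty()) {
            return recordData;
        }
        return recordData.entrySet().stream()
            .filter(e -> selectedFields.contains(e.getKey()))
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }
}
```

Whether selection is enabled at all would be decided upstream, from the workspace id passed into the sync workflow.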
Implement the persistence layer changes following #19191.
This PR handles writing and reading stats to the new stream stats table and to columns in the existing sync_stats table.
At the same time we introduce upserts of stats records, i.e. merging updates into a single record, in preparation for real-time stats updates, versus the current approach where a new stats record is always written.
There will be two remaining PRs after this:
- First PR will be to fully wire up and test the API.
- Second PR will be to actually save stats while jobs are running.
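The upsert-instead-of-append idea can be illustrated with a small in-memory sketch; the class, record, and field names are hypothetical, and the real implementation would of course do this at the database layer:

```java
import java.util.HashMap;
import java.util.Map;

public class StatsUpserter {

    // Illustrative stat fields; the real tables carry more columns.
    public record StreamStats(long recordsEmitted, long bytesEmitted) {}

    private final Map<String, StreamStats> statsByStream = new HashMap<>();

    // Upsert: merge updates into the single record for this stream,
    // rather than appending a new stats row on every update.
    public void upsert(final String streamName, final StreamStats latest) {
        statsByStream.put(streamName, latest);
    }

    public StreamStats get(final String streamName) {
        return statsByStream.get(streamName);
    }

    public int size() {
        return statsByStream.size();
    }
}
```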
* updated IntegrationLauncherConfig.yaml and added the supportDBT and normalizationImage fields to this class. Added code to the GenerateInputActivityImpl and TemporalClient classes to read destination_definition.yaml and fetch the supportDBT and normalizationImage fields. Added logging and a comparison of the normalization images from NormalizationRunnerFactory and destination_definition.yaml
* updated minor remarks
* updated minor remarks
* fixed minor remarks
* added normalization data to the tests
* fixed minor remarks
* removed NormalizationRunnerFactory
* fixed remarks
* fixed remarks
* fixed remarks
* updated acceptance tests
* updated acceptance tests
* updated check_images_exist.sh script
* updated method for get normalization image name for destination acceptance test
* fixed code style
* fixed code style and removed tests data
* updated JobErrorReporterTest.java
* updated JobErrorReporterTest.java
* fixed remarks
* added integration type field to the destination_definition file and actor_definition table
* fixed tests
* fixed tests
* fixed minor changes after pulling master changes
* fixed minor changes after pulling master changes
* renamed integrationType to normalizationIntegrationType/ fixed minor remarks
* renamed extra dependencies
* updated docs
* updated docs
* fixed minor remarks
* added NormalizationDestinationDefinitionConfig.yaml for StandardDestinationDefinition.yaml and updated configuration
* updated normalization tag
* updated DestinationAcceptanceTest.java
* updated DestinationAcceptanceTest.java
* updated imports and descriptions
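The direction of the change above, resolving the normalization image from the destination definition rather than from a hardcoded NormalizationRunnerFactory mapping, can be sketched as follows; the record shape and field names are assumptions about what NormalizationDestinationDefinitionConfig might carry:

```java
public class NormalizationImageResolver {

    // Hypothetical shape of the normalization block added to
    // StandardDestinationDefinition; field names are illustrative.
    public record NormalizationConfig(String repository, String tag, String integrationType) {}

    // Build the full docker image name from the destination definition's
    // normalization config instead of a factory-held constant.
    public static String resolveImage(final NormalizationConfig config) {
        return config.repository() + ":" + config.tag();
    }
}
```

With the image defined per destination, tests (including destination acceptance tests) can read the expected normalization image from the same definition file the runtime uses.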
* kubeprocessfactory flag
* plumbing through custom connector bit
* config, style, pmd
* fix more missing configs pass along
* fix sync config
* test fix
* add a flag to specify if we want to use a separate pool or not
* add missing micronaut configs
* make orchestrator job to run in custom pool too
* micronaut fix
* pass config to orchestrator
* fix test
* destination test fix
* PR comments fix
* style fix
* comment fix
* no checks on kubeprocess
* rename
* add flags for reset work
* test fix
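The pool-selection flag described above can be sketched like this; the method and parameter names are hypothetical, and the real code would be choosing Kubernetes node selectors for connector (and orchestrator) pods:

```java
public class NodePoolSelector {

    // Custom (user-supplied) connectors run in an isolated pool when the
    // feature flag is enabled; everything else stays in the default pool.
    public static String selectPool(final boolean useCustomPool,
                                    final boolean isCustomConnector,
                                    final String defaultPool,
                                    final String customPool) {
        if (useCustomPool && isCustomConnector) {
            return customPool;
        }
        return defaultPool;
    }
}
```

Per the "make orchestrator job run in custom pool too" bullet, the same decision would presumably be applied to the orchestrator pod, not just the connector pods.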
Some refactoring in preparation for the progress bar persistence changes.
The main change here was to simplify some of the JobPersistence methods by moving the logic to calculate attemptId into the JobPersistence implementation. This logic currently sits outside the class and is duplicated in multiple places. We could expose a helper method to calculate this logic, however that felt unnecessary at this point.
The alternative is to further duplicate this logic as the progress bar logic is implemented, so I want to get that out of the way.
The other reason it's cleaner to use jobId and attemptNumber is that these concepts/terms are more familiar throughout the rest of the codebase, and it feels more intuitive to continue speaking this language (in my opinion).
There are some random bits I wanted to clean up along the way as well. I will leave comments in the files as appropriate.
Method interface:
- has a generic type that isn't needed. This is confusing because nothing generic is happening here.
- has two parameters that are data fields of one of the other existing parameters. This is confusing since the two additional parameters aren't actually supposed to be passed in separately from the existing parameter.
Remove both.
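The "move the attemptId calculation inside the persistence implementation" idea can be sketched as below. This is a hedged illustration: callers identify an attempt by (jobId, attemptNumber), and only the persistence layer knows or computes the internal attempt id; the id-allocation strategy here is a stand-in, not the real database logic.

```java
import java.util.HashMap;
import java.util.Map;

public class JobPersistenceSketch {

    private final Map<String, Long> attemptIdsByKey = new HashMap<>();
    private long nextId = 1;

    // Callers pass jobId + attemptNumber, the terms used throughout the
    // codebase; the internal attempt id is resolved (here: allocated) inside
    // the persistence layer instead of being duplicated at every call site.
    public long getOrCreateAttemptId(final long jobId, final int attemptNumber) {
        return attemptIdsByKey.computeIfAbsent(jobId + ":" + attemptNumber, k -> nextId++);
    }
}
```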
We should only force (FULL_REFRESH, OVERWRITE) for the streams being reset.
For the other streams, we only want to replace OVERWRITE with APPEND, to
avoid having the destination clear the stream. The other cases should be
left as is.
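The rule above translates into a small per-stream decision; this sketch uses hypothetical enum and helper names, but the branching follows the description directly:

```java
public class ResetSyncModeHelper {

    public enum SyncMode { FULL_REFRESH, INCREMENTAL }
    public enum DestinationSyncMode { OVERWRITE, APPEND, APPEND_DEDUP }

    public record Modes(SyncMode syncMode, DestinationSyncMode destinationSyncMode) {}

    // Streams being reset are forced to (FULL_REFRESH, OVERWRITE). For other
    // streams, only OVERWRITE is downgraded to APPEND so the destination does
    // not clear them; all other combinations are left as is.
    public static Modes modesForStream(final boolean isBeingReset, final Modes configured) {
        if (isBeingReset) {
            return new Modes(SyncMode.FULL_REFRESH, DestinationSyncMode.OVERWRITE);
        }
        if (configured.destinationSyncMode() == DestinationSyncMode.OVERWRITE) {
            return new Modes(configured.syncMode(), DestinationSyncMode.APPEND);
        }
        return configured;
    }
}
```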
* Pass protocol version into IntegrationLauncherConfig
* Use VersionedStreamStreamFactory in AirbyteSource/Destination
* Add AirbyteMessageBufferedWriter
* Use VersionedBufferedWriter
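The versioned-writer idea above can be sketched as a buffered writer parameterized by a version-specific serializer; the class shape here is a simplification of whatever AirbyteMessageBufferedWriter actually looks like, and the string-based serialization is an assumption:

```java
import java.util.function.Function;

public class VersionedBufferedWriter<T> {

    private final StringBuilder buffer = new StringBuilder();

    // Version-specific serialization: the factory would pick the serializer
    // matching the protocol version passed in via IntegrationLauncherConfig.
    private final Function<T, String> serializer;

    public VersionedBufferedWriter(final Function<T, String> serializer) {
        this.serializer = serializer;
    }

    // Serialize the message in the connector's protocol version and buffer
    // one line per message, as on a connector's stdin.
    public void write(final T message) {
        buffer.append(serializer.apply(message)).append('\n');
    }

    public String contents() {
        return buffer.toString();
    }
}
```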
Implements the webhook operation as part of the sync workflow.
- Introduces the new activity implementation
- Updates the various interfaces that pass input to get the relevant configs to the sync workflow
- Hooks the new activity into the sync workflow
- Passes the webhook configs along into the sync workflow job
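The activity's core lookup-and-invoke step might look roughly like this; the config record, the id-keyed map, and the invoker callback are all hypothetical stand-ins for the real webhook config plumbing:

```java
import java.util.Map;
import java.util.function.Predicate;

public class WebhookOperationActivitySketch {

    // Hypothetical webhook config passed through the sync input.
    public record WebhookConfig(String url, String secretId) {}

    // Look up the config for the operation's webhook id and invoke it;
    // returns whether the call was dispatched successfully.
    public static boolean runWebhook(final String webhookId,
                                     final Map<String, WebhookConfig> configs,
                                     final Predicate<WebhookConfig> invoker) {
        final WebhookConfig config = configs.get(webhookId);
        if (config == null) {
            return false; // no config was passed along for this webhook id
        }
        return invoker.test(config);
    }
}
```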
* WIP - Add additional sync timing information
* Fixup tests
* fix PMD problem
* send data to segment
* Test JobTracker
* respond to PR suggestions
* fixup test
* formatting
* fix initializer for stats
* Make thread-safe with synchronized
* Don't clobber syncStats on init
* add comments and fix init
* Do what Pedro says
* Extract timeTracker pojo
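A minimal sketch of the extracted timeTracker POJO, with the synchronized access the bullets mention; the field names, the millisecond representation, and the single replication phase are assumptions:

```java
public class TimeTracker {

    private Long replicationStartTime;
    private Long replicationEndTime;

    // synchronized per the "Make thread-safe with synchronized" commit.
    public synchronized void trackReplicationStart(final long epochMillis) {
        replicationStartTime = epochMillis;
    }

    public synchronized void trackReplicationEnd(final long epochMillis) {
        replicationEndTime = epochMillis;
    }

    // Null until both milestones are recorded, so an unfinished phase never
    // reports a bogus duration (and init never clobbers recorded values).
    public synchronized Long replicationDurationMillis() {
        if (replicationStartTime == null || replicationEndTime == null) {
            return null;
        }
        return replicationEndTime - replicationStartTime;
    }
}
```

The tracked durations would then be attached to the sync summary and forwarded to Segment by the job tracker.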
* query once for all needed models, instead of querying within connections loop
* cleanup and fix failing tests
* pmd fix
* fix query and add test
* return empty if input list is empty
* undo aggressive autoformatting
* don't query for connection operations in a loop, instead query once and group-by connectionID in memory
* try handling operationIds in a single query instead of two
* remove optional
* fix operationIds query
* very annoying: the test was failing because operationIds can be listed in a different order. verify operationIds separately from the rest of the object
* combined queries/functions instead of separate queries for actor and definition
* remove leftover lines that aren't doing anything
* format
* add javadoc
* format
* use leftjoin so that connections that lack operations aren't left out
* clean up comments and format
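The central change above, one query plus an in-memory group-by instead of a per-connection query loop, can be sketched as follows; the record shape is illustrative, and the "query once" part is represented by the already-fetched list:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class OperationsLookup {

    public record Operation(String connectionId, String operationId) {}

    // Fetch all operations for the relevant connections in a single query
    // (using a left join so connections without operations aren't dropped),
    // then group by connection id in memory.
    public static Map<String, List<Operation>> groupByConnection(final List<Operation> allOperations) {
        return allOperations.stream()
            .collect(Collectors.groupingBy(Operation::connectionId));
    }
}
```

Note the ordering caveat from the commits: the operations within each group can come back in any order, so tests should compare them order-insensitively.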
* Add Airbyte Protocol Range configs
* Refactor metadata read/write
* Add ProtocolVersion Min/Max get/set to JobsPersistence
* Store the supported protocol version range in airbyte_metadata
* Use defaults in EnvConfigs instead of .env
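Storing the supported protocol version range as metadata can be sketched like this; the metadata key names and the in-memory map standing in for the airbyte_metadata table are assumptions:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class ProtocolVersionMetadata {

    // Hypothetical key names; the real keys live in the airbyte_metadata table.
    private static final String MIN_KEY = "airbyte_protocol_version_min";
    private static final String MAX_KEY = "airbyte_protocol_version_max";

    private final Map<String, String> metadata = new HashMap<>();

    // Persist the min/max supported protocol versions so other services can
    // check connector compatibility against the range.
    public void setProtocolVersionRange(final String min, final String max) {
        metadata.put(MIN_KEY, min);
        metadata.put(MAX_KEY, max);
    }

    public Optional<String> getMinProtocolVersion() {
        return Optional.ofNullable(metadata.get(MIN_KEY));
    }

    public Optional<String> getMaxProtocolVersion() {
        return Optional.ofNullable(metadata.get(MAX_KEY));
    }
}
```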
* wip for moving scheduler-persistence to airbyte-persistence
* move main/resources
* move settings include to match existing includes
* fix incorrect import paths
* fix import order