Testing shows this is costing roughly 5 MB/s of throughput on the platform. This is not needed since we can simply modify the already present Json node instead of a cloned object.
This should help both CPU and GC pressure.
* wip: return whether configuration was updated
* updated outputs working
* fix pmd
* update description, format
* add didUpdateConfiguration to metadata, rm unneeded generics
* add didUpdateConfiguration to api response
* update name to fix pmd
* not required
* rename to match api response
* remove unused field
* match naming
* track latest config message
* pass new config as part of outputs
* persist new config
* persist config as the messages come through, dont set output
* clean up old implementation
* accept control messages for destinations
* get api client from micronaut
* mask instance-wide oauth params when updating configs
* defaultreplicationworker tests
* formatting
* tests for source/destination handlers
* rm todo
* refactor test a bit to fix pmd
* fix pmd
* fix test
* add PersistConfigHelperTest
* update message tracker comment
* fix pmd
* format
* move ApiClientBeanFactory to commons-worker, use in container-orchestrator
* pull out config updating to separate methods
* add jitter
* rename PersistConfigHelper -> UpdateConnectorConfigHelper, docs
* fix exception type
* fmt
* move message type check into runnable
* formatting
* pass api client env vars to container orchestrator
* pass micronaut envs to container orchestrator
* print stacktrace for debugging
* different api host for container orchestrator
* fix default env var
* format
* fix errors after merge
* set source and destination actor id as part of the sync input
* fix: get destination definition
* fix null ptr
* remove "actor" from naming
* fix missing change from rename
* revert ContainerOrchestratorConfigBeanFactory changes
* inject sourceapi/destinationapi directly rather than airbyteapiclient
* UpdateConnectorConfigHelper -> ConnectorConfigUpdater
* rm log
* fix test
* dont fail on config update error
* process control messages for discover jobs
* process control messages for CHECK
* persist config updates on check_connection_for_update
* get last config message rather than first
* fix pmd
* fix failing tests
* add tests
* source id not required for check connection (create case)
* suppress pmd warning for BusyWait literal
* source id not required for check connection (create case) (p2)
* pass id, not full config to runnables/accept control message
* add new config required for api client
* add test file
* remove debugging logs
* rename method (getLast -> getMostRecent)
* rm version check (re-added this in by mistake on merge)
* fix test compatibility
* simplify
* mount data plane credentials secret to container orchestrator (#20724)
* mount data plane credentials secret to container orchestrator
* rm copy-pasta
* properly handle empty strings
* set env vars like before
* use the right config vars
Introduce a performance test harness for the default replication worker to make it easy for devs to test effect of changes on platform throughput.
The current setup is designed to be run manually. In the future, we can look into integrating this report into our build pipelines. For now, this is good enough as I wanted to start somewhere.
The general idea is to use JMH to run the test n times (currently 4). The dev can then look at the logs to see the throughput and how it varies.
As of this PR, we see general platform throughput of ~20-25 MB/s.
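A rough sketch of the kind of number the harness reports, assuming throughput is measured as bytes moved over wall-clock time. This is a hand-rolled stand-in for illustration only, not JMH and not the harness itself; `ThroughputSketch`, `megabytesPerSecond` and `moveBytes` are hypothetical names.

```java
// Hand-rolled stand-in for illustration; the actual harness uses JMH.
// All names here are hypothetical.
final class ThroughputSketch {

  // Converts a byte count and an elapsed time into MB/s (1 MB = 1024 * 1024 bytes).
  static double megabytesPerSecond(final long bytes, final long elapsedNanos) {
    final double megabytes = bytes / (1024.0 * 1024.0);
    final double seconds = elapsedNanos / 1_000_000_000.0;
    return megabytes / seconds;
  }

  // Hypothetical workload: copies a buffer to simulate moving records.
  static long moveBytes(final int totalBytes) {
    final byte[] source = new byte[totalBytes];
    final byte[] target = new byte[totalBytes];
    System.arraycopy(source, 0, target, 0, totalBytes);
    return target.length;
  }

  public static void main(final String[] args) {
    final int iterations = 4; // mirrors the "currently 4" runs mentioned above
    for (int i = 0; i < iterations; i++) {
      final long start = System.nanoTime();
      final long moved = moveBytes(32 * 1024 * 1024);
      final long elapsed = System.nanoTime() - start;
      System.out.printf("iteration %d: %.1f MB/s%n", i + 1, megabytesPerSecond(moved, elapsed));
    }
  }
}
```

Printing one line per iteration makes run-to-run variance visible, which is the same thing a dev would look for in the JMH logs.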
* pass workspace id to sync workflow and use it to selectively enable field selection
* fix tests around workspace id in job creation
* make sure field selection environment variables get passed through properly
* clean up handling around field selection flags
* debug logging for field selection
* properly handle empty field selection feature flag
* fix pmd
* actually fix pmd
The Java 19 toolchain doesn't like sneaky throws. Not entirely sure why. However, I think it's better practice to not use sneaky throws as it makes it clearer what is thrown and where.
Example error message when trying to compile the current codebase with Java 19:
error: Error during the transformation of 'io.airbyte.validation.json.JsonSchemaValidatorTest'; post-compiler 'lombok.bytecode.SneakyThrowsRemover' caused an exception: java.lang.IllegalArgumentException: Unsupported class file major version 63
at org.objectweb.asm.ClassReader.<init>(ClassReader.java:199)
at org.objectweb.asm.ClassReader.<init>(ClassReader.java:180)
at org.objectweb.asm.ClassReader.<init>(ClassReader.java:166)
at lombok.bytecode.AsmUtil.fixJSRInlining(AsmUtil.java:37)
at lombok.bytecode.SneakyThrowsRemover.applyTransformations(SneakyThrowsRemover.java:46)
at lombok.core.PostCompiler.applyTransformations(PostCompiler.java:44)
at lombok.core.PostCompiler$1.close(PostCompiler.java:87)
at jdk.compiler/com.sun.tools.javac.jvm.ClassWriter.writeClass(ClassWriter.java:1508)
at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.genCode(JavaCompiler.java:738)
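A minimal sketch of the alternative to sneaky throws: declare the checked exception in the method signature so callers can see what is thrown and where. The class and method names (`ExplicitThrowsExample`, `readFirstLine`) are hypothetical and not taken from the Airbyte codebase.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical example; not from the Airbyte codebase.
final class ExplicitThrowsExample {

  // With Lombok's @SneakyThrows, the IOException would be hidden from the signature.
  // Declaring it explicitly makes the failure mode visible to every caller.
  static String readFirstLine(final String text) throws IOException {
    try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
      return reader.readLine();
    }
  }

  public static void main(final String[] args) throws IOException {
    System.out.println(readFirstLine("first\nsecond"));
  }
}
```

Because nothing is rewritten after compilation, this form does not depend on Lombok's bytecode post-processor, which is the step that fails in the stack trace above.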
Add the orchestrator label to orchestrators so we can better differentiate orchestrator pods.
This is useful since orchestrator pods are the only pods in the job namespace with a need to talk to the main Airbyte application pods. These labels allow us to apply more granular network filtering.
Also took the chance to do some clean up of labels.
* updated IntegrationLauncherConfig.yaml and added the suportDBT and normalizationImage fields to this class. Added code to the GenerateInputActivityImpl and TemporalClient classes to read destination_definition.yaml and get the suportDBT and normalizationImage fields. Added logging and a comparison of the normalization images from NormalizationRunnerFactory and destination_definition.yaml
* updated minor remarks
* updated minor remarks
* fixed minor remarks
* added normalization data to the tests
* fixed minor remarks
* removed NormalizationRunnerFactory
* fixed remarks
* fixed remarks
* fixed remarks
* updated acceptance tests
* updated acceptance tests
* updated check_images_exist.sh script
* updated method for getting the normalization image name for destination acceptance test
* fixed code style
* fixed code style and removed tests data
* updated JobErrorReporterTest.java
* updated JobErrorReporterTest.java
* fixed remarks
* added integration type field to the destination_definition file and actor_definition table
* fixed tests
* fixed tests
* fixed minor changes after pulling master changes
* fixed minor changes after pulling master changes
* renamed integrationType to normalizationIntegrationType/ fixed minor remarks
* renamed extra dependencies
* updated docs
* updated docs
* fixed minor remarks
* added NormalizationDestinationDefinitionConfig.yaml for StandardDestinationDefinition.yaml and updated configuration
* updated normalization tag
* updated DestinationAcceptanceTest.java
* updated DestinationAcceptanceTest.java
* updated imports and descriptions
* implement column filtering in the replication workflow
* fixes to column selection in replication workflow
* add a basic acceptance test for column selection
* make CI acceptance tests run with new field selection flag enabled
* fix format
* readability improvements around columns selection tests and other small fixes
* kubeprocessfactory flag
* plumbing through custom connector bit
* config, style, pmd
* fix more missing configs pass along
* fix sync config
* test fix
* add a flag to specify if we want to use a separate pool or not
* add missing micronaut configs
* make orchestrator job to run in custom pool too
* micronaut fix
* pass config to orchestrator
* fix test
* destination test fix
* PR comments fix
* style fix
* comment fix
* no checks on kubeprocess
* rename
Follow up from #19814, where we introduced the StreamStats object to consolidate/simplify some of the stats memory objects.
In this PR, we extend the StreamStats object to also include the emitted records and bytes.
- Make StreamStats into a proper object. We cannot use a record as record fields are immutable. We need mutable fields to count.
- Consolidate the emitted records into StreamStats.
- Take the chance to move all the stats/metrics related classes into a book_keeping package to keep things clean.
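To illustrate why a regular class is used instead of a record, here is a simplified sketch of a mutable stats holder. It is not the actual StreamStats class; the class name, field names and method names are assumptions.

```java
// Hypothetical, simplified sketch; field and method names are assumptions.
final class StreamStatsSketch {

  private long emittedRecords;
  private long emittedBytes;

  // A record's components are final, so counting in place needs a regular class.
  void recordEmitted(final long bytes) {
    emittedRecords++;
    emittedBytes += bytes;
  }

  long getEmittedRecords() {
    return emittedRecords;
  }

  long getEmittedBytes() {
    return emittedBytes;
  }

  public static void main(final String[] args) {
    final StreamStatsSketch stats = new StreamStatsSketch();
    stats.recordEmitted(10);
    stats.recordEmitted(5);
    System.out.println(stats.getEmittedRecords() + " records, " + stats.getEmittedBytes() + " bytes");
  }
}
```

With a record, every increment would have to allocate a new instance, which is what the mutable fields avoid.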
* wip; add micronaut
* add additional json deserializer methods
* wip; converting to micronaut
* misc cleanup
* wip; broken
* wip; still broken
* wip
* formatting
* minor code cleanup; no actual changes
* wip; still broken
* removed commented out code; no longer broken
* wip; clean-up micronaut code
* cleanup; format
* fix pmd issues
* remove unused file
* init ApplicationTest
* edited link (#19444)
* move 'Example values' into intl (#19446)
* Revert "Update action.yml (#19416)" (#19450)
This reverts commit 78fb528a9a.
* Notifications Workflow (#18735)
* notification workflow
* Bmoric/remove unused code (#19188)
* Tmp
* Move when the deletion is performed
* Re-enable disable test
* PR comments
* Use cancel
* rename
* Fix test and version check position
* remove unused temporal deletion code
* Remove false todo
* Rm repeated test
* Rm unused import
* Make sure that long running activity are not retried (#19452)
* Parse list of dicts in json_schema_helper.find_nodes() (#19386)
* Get test on nested list/dict passing - use index to query next object for list
* Fix flakecheck
* Test that get_node provides correct value
* Improve test and test cases
* Rewrite method for better comprehension
* Add test for base-level key. Rewrite method for comprehension and handling this case
* adding tests
* fix test
* formatting
* remove unused dependencies
* add missing test resource
* format
* add missing test resource (real)
* format
* add back protocol-models dep
* format
* pr feedback; log stacktrace
Co-authored-by: Sophia Wiley <106352739+sophia-wiley@users.noreply.github.com>
Co-authored-by: Lake Mossman <lake@airbyte.io>
Co-authored-by: Topher Lubaway <asimplechris@gmail.com>
Co-authored-by: Anne <102554163+alovew@users.noreply.github.com>
Co-authored-by: Benoit Moriceau <benoit@airbyte.io>
Co-authored-by: Ella Rohm-Ensing <erohmensing@gmail.com>
* updated IntegrationLauncherConfig.yaml and added the suportDBT and normalizationImage fields to this class. Added code to the GenerateInputActivityImpl and TemporalClient classes to read destination_definition.yaml and get the suportDBT and normalizationImage fields. Added logging and a comparison of the normalization images from NormalizationRunnerFactory and destination_definition.yaml
* updated minor remarks
* updated minor remarks
* fixed minor remarks
* added normalization data to the tests
* fixed minor remarks
* fixed remarks
Implement estimate message processing allowing the platform to hold on to estimate message counts in memory.
The estimate message is a protocol message that connectors can choose to emit to support progress bar calculations. There are two kinds of estimates: per-sync and per-stream. Sources cannot emit both types in a single sync.
Per-stream estimates are what we usually expect. Per-sync estimates are for sources that cannot provide more granular estimates for whatever reason, e.g. CDC sources.
In a follow up PR, the platform will periodically save these messages through the save stats api.
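The rule that a source cannot mix the two estimate types can be sketched as below. This is an illustration under assumptions, not the platform's implementation: `EstimateTrackerSketch`, `EstimateType`, `accept` and `totalRowEstimate` are hypothetical names, and the sketch assumes a newer estimate replaces an older one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; class, enum, and method names are assumptions, not the platform's API.
final class EstimateTrackerSketch {

  enum EstimateType { SYNC, STREAM }

  private EstimateType seenType; // null until the first estimate arrives
  private long syncRowEstimate;
  private final Map<String, Long> streamRowEstimates = new HashMap<>();

  // Holds the latest estimate in memory. Assumes a newer estimate replaces an older one.
  void accept(final EstimateType type, final String streamName, final long rowEstimate) {
    if (seenType != null && seenType != type) {
      throw new IllegalStateException("A source cannot emit both per-sync and per-stream estimates");
    }
    seenType = type;
    if (type == EstimateType.SYNC) {
      syncRowEstimate = rowEstimate;
    } else {
      streamRowEstimates.put(streamName, rowEstimate);
    }
  }

  // Total expected rows: the sync-level value, or the sum over streams.
  long totalRowEstimate() {
    if (seenType == EstimateType.SYNC) {
      return syncRowEstimate;
    }
    return streamRowEstimates.values().stream().mapToLong(Long::longValue).sum();
  }

  public static void main(final String[] args) {
    final EstimateTrackerSketch tracker = new EstimateTrackerSketch();
    tracker.accept(EstimateType.STREAM, "users", 100L);
    tracker.accept(EstimateType.STREAM, "orders", 50L);
    System.out.println("total estimated rows: " + tracker.totalRowEstimate());
  }
}
```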
#19191 made me realise the DefaultReplicationWorker's metric tracking has a bug: we aren't accounting for namespace when tracking metrics, i.e. streams with the same name but different namespaces will have their metrics merged.
While reading the code to figure out a fix, I realised we don't have a good conceptual representation of stream namespace <> name pairs within the platform today. We use a concatenated string. Though this works, it will become harder and harder to read/track as we do more operations that involve namespace, e.g. progress bars and column selection.
This PR introduces the AirbyteStreamNameNamespacePair object into the platform code to make it more convenient to work with Streams in the future. (Especially if we proceed with the project to make streams a first-class citizen!)
The AirbyteStreamNameNamespacePair object was written to deal with the same issue of namespace <> name pair manipulation within the Java destination code. It implements the Comparable interface, which makes it convenient to use for Collections operations.
For an example of how this is consumed, see #19361.
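A simplified sketch of the idea, assuming a pair type keyed on both name and namespace. This is not the real AirbyteStreamNameNamespacePair; `StreamPairSketch` and `StreamPairDemo` are hypothetical, and the ordering (namespace first, nulls first, then name) is an assumption made for the example.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a name/namespace pair; the ordering is an assumption.
record StreamPairSketch(String name, String namespace) implements Comparable<StreamPairSketch> {

  private static final Comparator<String> NULLS_FIRST = Comparator.nullsFirst(Comparator.naturalOrder());

  @Override
  public int compareTo(final StreamPairSketch other) {
    final int byNamespace = NULLS_FIRST.compare(namespace, other.namespace);
    return byNamespace != 0 ? byNamespace : name.compareTo(other.name);
  }
}

final class StreamPairDemo {

  // Counts records per stream; same-named streams in different namespaces stay separate.
  static Map<StreamPairSketch, Long> countRecords() {
    final Map<StreamPairSketch, Long> counts = new TreeMap<>();
    counts.merge(new StreamPairSketch("users", "public"), 1L, Long::sum);
    counts.merge(new StreamPairSketch("users", "archive"), 1L, Long::sum);
    counts.merge(new StreamPairSketch("users", "public"), 1L, Long::sum);
    return counts;
  }

  public static void main(final String[] args) {
    countRecords().forEach((pair, count) -> System.out.println(pair + " -> " + count));
  }
}
```

With a concatenated string key the two `users` streams can only be told apart by parsing the string; with a pair type, equality and ordering come from the type itself.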
* init attempt at initcontainer
* wait for init container to be up instead of main container
* copy files to init container
* Revert "Bmoric/extract webbackend api (#18988)"
This reverts commit b05a5b2a6a.
* Revert "Revert "Bmoric/extract webbackend api (#18988)""
This reverts commit ebef6e44e8.
* block on initContainer status; cleanup init script
* add log messages
* add quotes to log messages
* pr feedback, add comment to bash script
Logic in this class is going to have to change as part of two big upcoming projects:
- column selection
- progress bars
To prepare for this, I've gone ahead and refactored the run method for readability. This is a monster function. The current function is too long and contains several operational abstractions, increasing unnecessary complexity. This is the core of what we do, so it's important to ensure this code is extremely understandable.
Ultimately we probably want to break the run method up into two or more separate classes - one that deals with replication and one that deals with outputs - for better testing, readability and isolation. This sets the stage for that.
I have intentionally NOT removed or touched any logic, nor have I tried to consolidate the function signatures, in order to preserve as much of the pre-existing logic as possible and keep the changeset small and reviewable.
This changeset only renames and moves code around.
* To be remove
* Remove the signal for waiting after a failed activity and ensure we are waiting the expected time
* Revert "To be remove"
This reverts commit 3a5f7b4f72.
* Remove unused and move failure reason to the helper
* Avoid repetitive new Set()
* Pass protocol version into IntegrationLauncherConfig
* Use VersionedStreamStreamFactory in AirbyteSource/Destination
* Add AirbyteMessageBufferedWriter
* Use VersionedBufferedWriter