Implement the persistence layer changes following #19191.
This PR handles writing stats to, and reading them from, the new stream_stats table and the new columns in the existing sync_stats table.
At the same time we introduce upserts of stats records, i.e. merging updates into a single record, in preparation for real-time stats updates. The current approach always writes a new stats record.
Two PRs will remain after this one:
- The first will fully wire up and test the API.
- The second will actually save stats while jobs are running.
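To illustrate the upsert behavior described above, here is a minimal sketch using SQLite from the Python standard library. The simplified schema, the column names, and the helpers `save_stream_stats` and `read_stream_stats` are assumptions made for illustration; they are not the actual migration or persistence code in this PR.

```python
import sqlite3

# Hypothetical, simplified schema. The real table has more columns.
SCHEMA = """
CREATE TABLE stream_stats (
    attempt_id      INTEGER NOT NULL,
    stream_name     TEXT    NOT NULL,
    records_emitted INTEGER NOT NULL,
    bytes_emitted   INTEGER NOT NULL,
    UNIQUE (attempt_id, stream_name)
)
"""

# Insert a new row, or overwrite the existing row for the same
# (attempt_id, stream_name) pair instead of writing a second record.
UPSERT = """
INSERT INTO stream_stats (attempt_id, stream_name, records_emitted, bytes_emitted)
VALUES (?, ?, ?, ?)
ON CONFLICT (attempt_id, stream_name) DO UPDATE SET
    records_emitted = excluded.records_emitted,
    bytes_emitted   = excluded.bytes_emitted
"""


def save_stream_stats(conn, attempt_id, stream_name, records_emitted, bytes_emitted):
    """Upsert the latest stats for one stream of one attempt."""
    with conn:  # commits on success, rolls back on error
        conn.execute(UPSERT, (attempt_id, stream_name, records_emitted, bytes_emitted))


def read_stream_stats(conn, attempt_id):
    """Return (stream_name, records_emitted, bytes_emitted) rows for an attempt."""
    cursor = conn.execute(
        "SELECT stream_name, records_emitted, bytes_emitted "
        "FROM stream_stats WHERE attempt_id = ? ORDER BY stream_name",
        (attempt_id,),
    )
    return cursor.fetchall()
```

With this shape, repeated saves during a running job keep exactly one row per attempt and stream, which is what real-time updates would need.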
* database migration to add column for field selection info
* add field selection info to standard sync persistence
* fix around persistence of field selection info
* API changes to support configuring column selection
* style and testing improvements around column selection api impl
* acceptance test fix for field selection api changes
When the auto-detect schema changes feature flag is on, disable connections that have breaking schema changes, as well as connections that have any schema change where the user has set their preference to disable.
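The rule above can be sketched as a small decision function. The names `NonBreakingChangesPreference`, `SchemaDiff`, and `should_disable_connection` are hypothetical and only mirror the wording of this commit message, not the actual handler code.

```python
from dataclasses import dataclass
from enum import Enum


class NonBreakingChangesPreference(Enum):
    """Hypothetical user preference for how to react to schema changes."""
    IGNORE = "ignore"
    DISABLE = "disable"


@dataclass
class SchemaDiff:
    """Hypothetical summary of a detected schema diff."""
    has_changes: bool
    has_breaking_changes: bool


def should_disable_connection(flag_enabled, diff, preference):
    """Decide whether a connection should be disabled after schema detection."""
    if not flag_enabled:
        # Feature flag off: never disable automatically.
        return False
    if diff.has_breaking_changes:
        # Breaking changes always disable the connection.
        return True
    # Any other change disables only if the user asked for that.
    return diff.has_changes and preference is NonBreakingChangesPreference.DISABLE
```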
* add structured dbt cloud information to the operations api
* remove unused webhook features, test updates
* update tests to use structured dbt cloud operation api
* add missing webhook operator type
API changes to support the progress bar.
- The eventual idea is for the save_stats route to be called by the workers during replication. Workers will save stats for a job id and attempt number.
- Modify the /jobs/list and /jobs/get_debug_info routes to also return estimated bytes/records. We need both the estimated metadata and the running stats to calculate the progress bar and throughput.
- Add the save_stats route. This is the route that workers will call. I've done my best to reuse existing OpenAPI bodies to reduce duplication.
- Add the estimatedRecords and estimatedBytes fields to the AttemptStats body, which is part of the AttemptRead and AttemptStreamStats objects. These fields eventually filter up to the jobs/list and jobs/get_debug_info objects, and they also appear on all the endpoints that previously returned stats information. I think the duplicated data is a small issue and don't think it's worth splitting out new API objects, though I will gladly do so if folks feel strongly.
- Minor changes to the AttemptApiController to support the new route.
- I've stubbed out the handlers for now since the backend is not yet implemented.
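As a rough illustration of how a client might combine the estimated and running values, here is a minimal sketch. The `AttemptStats` dataclass below is a hypothetical stand-in for the API body (snake_case names, only four fields), and the progress and throughput formulas are assumptions, not the actual frontend calculation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttemptStats:
    """Hypothetical, trimmed-down stand-in for the AttemptStats body."""
    records_emitted: int
    bytes_emitted: int
    estimated_records: Optional[int] = None
    estimated_bytes: Optional[int] = None


def progress_fraction(stats):
    """Fraction of estimated records emitted so far, or None without an estimate."""
    if not stats.estimated_records:  # missing or zero estimate
        return None
    # Estimates can be low, so cap the result at 100%.
    return min(1.0, stats.records_emitted / stats.estimated_records)


def throughput_bytes_per_second(stats, elapsed_seconds):
    """Average bytes per second since the attempt started, or None if unknown."""
    if elapsed_seconds <= 0:
        return None
    return stats.bytes_emitted / elapsed_seconds
```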
* Tmp
* Extract the Attempt API from the V1 API
* Add comments
* Move Connection API out of configuration API
* format
* format
* Rename to Controller
* Rename to Controller
* Add values to the factory
* Change the constructor to use handler instead of objects needed by the handler
* Update with new tags.
* tmp
* Fix PMD errors
* Extract DB migrator
* Add something that I forgot
* extract destination definition api
* restore destination factory initialization
* extract destination definition specification api
* format
* format
* format
* extract health check api
* extract jobs api
* fix test
* format
* Extract logs api
* Add missing declaration
* Fix build
* Tmp
* format and PR comments
* Extract notification API
* re-organize tags
* Extract all Oauth
* Fix PMD
* add schemaChange
* merge conflict
* frontend tests
* tests
* l
* fix source catalog id
* test
* formatting
* move schema change to build backend web connection
* check if actor catalog id is different
* fix
* tests and fixes
* remove extra var
* remove logging
* continue to pass back new catalog id
* api updates
* fix mockdata
* tests
* tests
* optional of nullable
* Tmp
* For diff
* Add test
* More test
* Fix test and add some
* Fix merge and test
* Fix PMD
* Fix test
* Rm dead code
* Fix pmd
* Address PR comments
* RM unused column
Co-authored-by: alovew <anne@airbyte.io>
* ensure workspace webhook configs can be correctly passed between API and persistence layers
* remove unnecessary logging
* add unit tests to workspace webhook config handling
* additional testing and style cleanup around workspace webhook config handling
* introduce webhook operations to the operations API and persistence
* add unit tests for webhooks in operations endpoint handler
* fixes and additional testing in webhook operations handler
* cleanup refactor around operations handling to reduce duplicate code
* wip
* handle webhook configs in workspaces endpoint and split/hydrate secrets
* style improvements to documentation around webhook configs
* Clarify documentation around webhook auth tokens
* More documentation clarification around webhook configs
* Format.
* unit test coverage for webhook config handling
* use common json parsing libraries around webhook configs
* clean up around testing webhook operation configs
Co-authored-by: Davin Chia <davinchia@gmail.com>
* progress on adding geography throughout api
* fix workspace handler test
* more progress
* implement workspace defaulting and add/update more tests
* fix bootloader tests
* set defaultGeography in missing places
* add Geography column when reading Connection record from DB
* fix pmd
* add more comments/description
* format
* description
* Use PATCH Api for toggling connections
* remove catalog from web backend connection list, and move icon to source/destination field in response
* Adjust FE code
* comment out failing tests
* Resolve merge conflict
* add back in tests and make them pass with new list items
* format
* leave repository layer alone for now, just remove catalog from API response
* format
* load the icon when returning
* update sourceHandler and destinationHandler test to account for icons
* add icon to source and destination in webBackendConnectionHandlerTest
* change icon test to actually load an svg rather than using static mocks
* also fix icon test for WebBackendConnectionsHandlerTest
* make PMD happy
Co-authored-by: Tim Roes <tim@airbyte.io>
Co-authored-by: Davin Chia <davinchia@gmail.com>
Co-authored-by: KC <krishna@airbyte.io>
* remove operationIds from WebBackendConnectionUpdate, just use operations
* refactor connection updates to patch-style update, where null fields remain unchanged
* better comment and arg name
* format
* make sure we are still 'dual-writing' to the old schedule column, even when the patch doesn't specify anything for it
* update acceptance test to update with new schedule syntax
* add catalog sorting to preserve stream order during patch, and more tests
* format
* add description, throw runtime exception for impossible branches, move streamReset to private helper
* PR suggestions
* add nested test classes and write a test for the catalog sorting method
* format
* add comment clarifying that the catalog sort is for UX, and isn't critical
* format
* format
* update acceptance tests to send proper catalog patches instead of whole new catalog
* format
* format
* simplify catalog patching: if a catalog is present on the request, replace the entire catalog with it; otherwise, if the catalog on the request is null, leave the catalog unchanged
* format
* format
* Revert "update acceptance tests to send proper catalog patches instead of whole new catalog"
This reverts commit 71922648b4e070f46ff6c468813b7ab8dd9d6651.
* adjust description
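The patch-style update described in the commits above (null fields remain unchanged, and a catalog on the request replaces the stored catalog wholesale) can be sketched as follows. `Connection`, `ConnectionPatch`, and `apply_patch` are hypothetical names with only three illustrative fields; the real update objects carry many more.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Connection:
    """Hypothetical stored connection with a few illustrative fields."""
    name: str
    status: str
    catalog: Tuple[str, ...]


@dataclass(frozen=True)
class ConnectionPatch:
    """Hypothetical patch request: None means 'leave this field unchanged'."""
    name: Optional[str] = None
    status: Optional[str] = None
    catalog: Optional[Tuple[str, ...]] = None


def apply_patch(current, patch):
    """Return a new Connection with only the non-null patch fields applied."""
    changes = {
        f.name: getattr(patch, f.name)
        for f in fields(patch)
        if getattr(patch, f.name) is not None
    }
    # A catalog on the patch replaces the whole catalog; no per-stream merge.
    return replace(current, **changes)
```

One consequence of this design is that a field cannot be cleared by sending null, since null is reserved to mean "no change".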
* add jobInfoLight API endpoint that excludes attempt information, which can be enormous as it includes all log lines
* update replication activity to call new light endpoint
* start implementation of new persistence method
* add includingJobId and totalJobCount to job list request
* format
* update local openapi as well
* refactor queries into JOOQ and return empty list if target job cannot be found
* fix descriptions and undo changes from other branch
* switch including job to starting job
* fix job history handler tests
* rewrite jobs subqueries in jooq
* fix multiple config type querying
* remove unnecessary casts
* switch back to 'including' and return multiple of page size necessary to include job
* undo webapp changes
* fix test description
* format
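The includingJobId behavior described in the commits above (return as many whole pages as needed to include the target job, or an empty list if it cannot be found) can be sketched in memory like this. The function name `jobs_including` and the list-based input are assumptions for illustration; the actual implementation is a jOOQ query against the jobs table.

```python
def jobs_including(job_ids, including_job_id, page_size):
    """Return the leading pages of job_ids needed to include including_job_id.

    job_ids is assumed to be already ordered the way the job list is shown.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    try:
        index = job_ids.index(including_job_id)
    except ValueError:
        # Target job not found: return an empty list.
        return []
    # Number of whole pages needed so that the target job is on the last one.
    pages_needed = index // page_size + 1
    return job_ids[: pages_needed * page_size]
```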
* Update protocol version from actor defs API operations
* Implement default airbyte protocol version support
* Add version parsing
* Add acceptance tests
* Fix Acceptance Tests
* format
* Make test package private
* add new Configs for multi-cloud
* add api endpoints for setting workflow attempt id and createOrUpdate state
* update activities to call APIs instead of persistence
* workerApp refactor to separately initialize control or data plane dependencies
* modify syncWorkflow to call new activity that decides which task queue for data plane tasks
* misc to get build working
* move StateConverter to worker, so that server and worker can both access without needing to introduce any new dependencies
* update configs - remove extraneous helpers, clarify naming and comments, remove COMBINED value
* forgot to actually remove COMBINED enum value, this removes it
* add WorkerApp todo for breaking API Client into a scoped client
* rename decideTaskQueueActivity var to routeToTaskQueueActivity
* pr comments
* naming fix
* refactor secretHydrator instantiation
* WorkerApp PR feedback: move API client logic to separate class, use updated configs, etc
* add a RouterService class that is injected into RouteToTaskQueueActivityImpl
* AttemptApi cleanup and added unit test coverage
* fix confusion between AttemptId and AttemptNumber in new AttemptApi
* remove unused getDataPlaneSecretsHydrator
* remove unused import
Co-authored-by: Xiaohan Song <xiaohan@airbyte.io>
* save
* clean up more usages and remove withRefreshedCatalog
* make webapp use correct endpoint
* add back intercept
* fix acceptance test
* fix log
* remove 'new' from test name
* fix: clone api doesn't take update configurations
* fix: allow creating a clone in a different workspace
* fix: added description to source/destination body