mirror of synced 2025-12-19 18:14:56 -05:00
2353 Commits

Author SHA1 Message Date
Edward Gao
21c1ccbf8a Bulk Load CDK: Add remaining data coercion tests (#70828) 2025-12-15 14:37:55 -08:00
Edward Gao
dc92c5a254 Bulk Load CDK: add int/number coercion tests (#70230) 2025-12-11 19:24:21 +00:00
Ryan Br...
6a48afc369 Add CDC_CURSOR_COLUMN to load cdk for reference. (#70852) 2025-12-10 15:34:29 -08:00
Ryan Br...
9e8263dab3 Map the namespace before creating the final table names. (#70827) 2025-12-09 19:39:50 -08:00
Ryan Br...
8f9119be2d Rbroughan/cdk component test fixes plus escape hatch (#70714) 2025-12-09 10:48:12 -08:00
Wenqi Hu
c0eb6dcb4b Fix CDC race condition when Debezium engine thread closes before record emitted (#70360) 2025-12-05 09:43:15 -08:00
Ryan Br...
8e437b321d Rbroughan/dest stream table schema final (#70279) 2025-12-03 13:29:15 -08:00
Jimmy Ma
a515e9ae81 chore: add dependency injection tests (#69845)
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2025-12-02 09:42:07 -08:00
Edward Gao
652c2eda86 Bulk load CDK: more schema evolution test cases (#69343) 2025-12-01 21:00:05 +00:00
Aaron ("AJ") Steers
5938dee934 docs(bulk-cdk): Add API documentation links to README and CONTRIBUTING (#69811)
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2025-11-21 11:22:42 -08:00
Aaron ("AJ") Steers
130339b57f feat(bulk-cdk): Add Dokka documentation generation with Vercel deployment (#69752)
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2025-11-21 09:30:17 -08:00
Edward Gao
aacf63d66a Bulk load cdk: improve component tests' upsert test coverage (#69338) 2025-11-20 00:43:33 +00:00
Wenqi Hu
88758565c0 Fix default partition_id value for CheckpointOnlyPartitionReader (#69753) 2025-11-19 20:51:21 +00:00
Edward Gao
e10b9022ad Bulk CDK: check that we aren't republishing _any_ version (#69330) 2025-11-14 08:13:29 -08:00
Wenqi Hu
e62b337005 Support multiple ts precision decoding (#69326) 2025-11-13 16:59:54 -08:00
Edward Gao
f943c0f299 Bulk load CDK: schema evolution test suite (#69234) 2025-11-13 23:17:28 +00:00
Wenqi Hu
dadad3a8ee Fix duplicate metadata key in JdbcMetadataQuerier (#69309) 2025-11-12 16:54:23 -08:00
Francis Genet
409566389d Remove a random println from our code (#69288) 2025-11-11 15:42:44 -08:00
Rodi Reich Zilberman
ce42e776be 9834 java l3 mysql cdc null values in flexible milestone cpms and flexible milestone positions streams (#69254) 2025-11-11 15:18:00 -08:00
Edward Gao
be3a818445 Bulk load CDK: break down ensureSchemaMatches (#69090) 2025-11-11 19:02:57 +00:00
Francis Genet
79a014e1c2 [GCS-DL] More CDK changes on the IcebergTableSynchronizer (#69267) 2025-11-11 18:58:09 +00:00
sophiecuiy
0d9d75d52a case sensitivity fix (#69225)
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2025-11-10 16:32:43 -08:00
Subodh Kant Chaturvedi
c6c51ebf0f chore(dataflow-cdk): remove logs from state store cause it confuses users (#69214) 2025-11-10 08:07:08 +05:30
Francis Genet
313a09e375 [GCS-DL] Make getOperation in iceberg public (#69243) 2025-11-07 14:28:45 -08:00
Francis Genet
9aaf7265b7 Update the IcebergTableSynchronizer.kt to allow us to force commit between schema updates (#69213) 2025-11-06 14:38:31 -08:00
Jonathan Pearlin
8f93b643c1 chore: refactor additional stat tracking (#69222) 2025-11-06 14:23:38 -05:00
Jonathan Pearlin
f244a50b56 fix: add additional stats to expected destination state message (#69201) 2025-11-05 11:42:17 -05:00
Jonathan Pearlin
dc83e41e77 refactor: separate validation result handling from coercer (#69113) 2025-11-05 09:18:59 -05:00
Wenqi Hu
6b0db42e24 Query table metadata per schema (#69184) 2025-11-04 14:13:38 -08:00
sophiecuiy
c761e11888 Adding table filtering to JDBC (#69094)
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2025-11-04 09:54:43 -08:00
Matt Bayley
4774b7611a Introduce extract-trigger toolkit (#68189) 2025-11-03 16:56:22 -08:00
Edward Gao
93c8bd934f Bulk load CDK: parse non-property-ful schema into UnknownType (#69109) 2025-11-03 08:10:21 -08:00
Subodh Kant Chaturvedi
2c1179dcae fix(dataflow-cdk) : fix bug related to index of state messages (#69099) 2025-10-30 23:52:29 +05:30
Edward Gao
781f01f951 Bulk load CDK: move test-only TableOpClient methods to new interface (#68200) 2025-10-29 10:21:53 -07:00
Edward Gao
136831a8a9 publish bulk cdk for 2bd50bf216 (#68693) 2025-10-28 16:37:15 -07:00
Edward Gao
2bd50bf216 Edgao/component upsert tests/cdk (#68185) 2025-10-28 16:24:47 -07:00
Edward Gao
348cbfc73b Bulk load CDK: log diagnostics when unflushed states at end of sync (#68668) 2025-10-28 18:03:24 +00:00
Rodi Reich Zilberman
5610f7211f Remove CDC metafield decoration from streams that cannot be incremental - no pk (#68651) 2025-10-28 09:27:42 -07:00
Edward Gao
dcec4c899c Bulk load CDK: more component test stuff (#68167) 2025-10-28 15:41:50 +00:00
Wenqi Hu
e526e316ba Timeout for inactive DB during CDC sync (#68118) 2025-10-24 10:08:02 -07:00
Subodh Kant Chaturvedi
1b48ee9adf feat: improve temporal representation in proto + shared encoder/decoder for source and dest (#67016)
Co-authored-by: Rodi Reich Zilberman <867491+rodireich@users.noreply.github.com>
2025-10-21 22:31:47 +05:30
Wenqi Hu
7bae93473e Safe debezium engine shutdown (#68208)
## What
While the engine is shutting down, we don't accept any incoming events because:
a. It's incorrect, since we have already decided to close the engine (i.e.,
target reached)
b. It's unsafe for socket writes, as the engine forcefully kills its event
thread, which may cause our socket to disconnect if it is in the middle of a
write

## How
Added a flag that signals the engine is shutting down and prevents accepting
any incoming events.
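
A minimal sketch of the flag-based guard described above. `EventGate`, `beginShutdown`, and `offer` are hypothetical names for illustration only, not the CDK's actual classes, and the real change may be structured differently:

```kotlin
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for the component that receives engine events.
class EventGate<T> {
    // Set once the decision to close the engine has been made.
    private val shuttingDown = AtomicBoolean(false)
    private val accepted = ConcurrentLinkedQueue<T>()

    /** Marks the gate as shutting down; events offered afterwards are rejected. */
    fun beginShutdown() {
        shuttingDown.set(true)
    }

    /** Returns true if the event was accepted, false if it arrived during shutdown. */
    fun offer(event: T): Boolean {
        if (shuttingDown.get()) return false
        accepted.add(event)
        return true
    }

    /** Snapshot of the events accepted so far. */
    fun acceptedEvents(): List<T> = accepted.toList()
}
```

Calling `beginShutdown()` before asking the engine to close means any event delivered afterwards is dropped instead of being written to the socket.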
2025-10-20 18:27:22 -07:00
Ryan Br...
7843cc6178 Refactor: Separate Interface Concerns & Add Table Component Tests (#67624)
## What

Refactors the database operation interfaces and establishes the first
component test suite

## How

### Interface Separation

- Consolidated `AirbyteClient` and `DirectLoadTableSqlOperations` into
  `TableOperationsClient`
  - Standard SQL operations (CREATE, DROP, COPY, UPSERT) and their
    compositions
  - Adds straightforward methods for test with default impls to avoid
    breaking connectors

- Renamed `DirectLoadTableNativeOperations` → `TableSchemaEvolutionClient`
  - Better reflects its responsibility for complex schema evolution
    operations
  - Distinguishes it from simple SQL operations handled by
    `TableOperationsClient`

### Test Suite Infrastructure

- TableOperationsSuite: Interface-based test suite for validating all
table operations
- TableOperationsTestHarness: Helper class for test execution
- TableOperationsFixtures: Centralized test data and constants

### Documentation Improvements

- Added comprehensive JavaDoc explaining the complexity domains each
interface handles
- Documented specific challenges implementors must address (type
translation, nullable columns, PK changes, etc.)
- Clear cross-references between related interfaces

### Other
- Moves some interfaces / objects from the `db` toolkit to the main
`load` cdk for simpler dependency handling

### Migration Guide

For connector implementations:
1. TableOperationsClient replaces AirbyteClient and
DirectLoadTableSqlOperations
2. Replace DirectLoadTableNativeOperations with
TableSchemaEvolutionClient
3. Ensure getGenerationId() calls use TableOperationsClient instance
4. Update imports
2025-10-16 11:19:56 -07:00
Subodh Kant Chaturvedi
57a436fc2a feat(s3-datalake-connector-cdk): implement polaris catalog spec (#67942)
Issue: https://github.com/airbytehq/airbyte-internal-issues/issues/14734
2025-10-15 21:45:39 +05:30
Ryan Br...
e159e212cf Rbroughan/better interleaved streams (#67583)
## What
Better handle interleaved streams

Refactor `MemoryAndParallelismConfig` to `AggregatePublishingConfig` for
clarity

## How
* We now specify the total allowed memory (`maxEstBytesAllAggregates`)
and evict the largest if total usage is above that
* We continue to check cardinality first out of performance concerns
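
A rough sketch of the eviction rule described above, under stated assumptions: `Aggregate`, `AggregateStore`, `maxCount`, and `evictIfNeeded` are hypothetical names, and only `maxEstBytesAllAggregates` comes from the commit message. Evicting the largest aggregate when the count limit is exceeded is also an assumption.

```kotlin
// Hypothetical types for illustration; not the CDK's actual classes.
data class Aggregate(val key: String, val estBytes: Long)

class AggregateStore(
    private val maxCount: Int,
    private val maxEstBytesAllAggregates: Long,
) {
    private val aggregates = mutableMapOf<String, Aggregate>()

    fun put(aggregate: Aggregate) {
        aggregates[aggregate.key] = aggregate
    }

    /** Removes and returns the largest aggregate if a limit is exceeded, else null. */
    fun evictIfNeeded(): Aggregate? {
        // Cheap check first: the number of open aggregates.
        val overCount = aggregates.size > maxCount
        // Only sum the estimated sizes when the cheap check did not already trigger.
        val overBytes = !overCount &&
            aggregates.values.sumOf { it.estBytes } > maxEstBytesAllAggregates
        if (!overCount && !overBytes) return null
        val largest = aggregates.values.maxByOrNull { it.estBytes } ?: return null
        aggregates.remove(largest.key)
        return largest
    }
}
```

Checking the count before summing sizes mirrors the ordering the commit message calls out for performance reasons.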

## Note
This is a "breaking change" because it renames some stuff
2025-10-09 10:50:45 -07:00
Wenqi Hu
8873692d9d Support nano sec to preserve precision in TimeAccessor (#67559)
## What
As title. Need this for mssql data type
2025-10-08 15:04:33 -07:00
Wenqi Hu
0d200684a0 Enable heart beat timeout for CDC sync (#67075)
## What
When the connector configures the `"airbyte.first.record.wait.seconds"`
property, we use it as the heartbeat timeout so that the sync can time out
if there is no CDC activity
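
A minimal sketch of how such a timeout could be derived and checked. Only the property key comes from the commit message; `HeartbeatMonitor` and `heartbeatTimeoutFrom` are hypothetical helpers, and the actual CDK wiring is not shown here:

```kotlin
import java.time.Duration
import java.time.Instant

// Hypothetical helper: reads the timeout from connector properties, if configured.
fun heartbeatTimeoutFrom(properties: Map<String, String>): Duration? =
    properties["airbyte.first.record.wait.seconds"]
        ?.toLongOrNull()
        ?.let { Duration.ofSeconds(it) }

// Hypothetical monitor: tracks the last observed CDC activity.
class HeartbeatMonitor(private val timeout: Duration?, startedAt: Instant) {
    private var lastActivity: Instant = startedAt

    /** Call whenever a record or heartbeat is observed. */
    fun recordActivity(now: Instant) {
        lastActivity = now
    }

    /** True when a timeout is configured and no activity was seen for longer than it. */
    fun isTimedOut(now: Instant): Boolean =
        timeout != null && Duration.between(lastActivity, now) > timeout
}
```

Passing `now` explicitly keeps the sketch deterministic; when the property is absent, the timeout is `null` and the monitor never reports a timeout.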
2025-10-08 17:56:36 +00:00
Rodi Reich Zilberman
a421083ca4 CDK changes only (#67152) 2025-10-07 10:51:43 -07:00
Ryan Br...
b4c677cf2d Move last 2 inlined dispatchers into bean factory (#67002) 2025-10-01 17:05:40 -07:00
Ryan Br...
fc1a15238a Rbroughan/state enricher tests (#66997) 2025-10-01 15:33:48 -07:00