mirror of synced 2025-12-21 19:11:14 -05:00
469 Commits

Author SHA1 Message Date
Edward Gao
21c1ccbf8a Bulk Load CDK: Add remaining data coercion tests (#70828) 2025-12-15 14:37:55 -08:00
Edward Gao
dc92c5a254 Bulk Load CDK: add int/number coercion tests (#70230) 2025-12-11 19:24:21 +00:00
Ryan Br...
6a48afc369 Add CDC_CURSOR_COLUMN to load cdk for reference. (#70852) 2025-12-10 15:34:29 -08:00
Ryan Br...
9e8263dab3 Map the namespace before creating the final table names. (#70827) 2025-12-09 19:39:50 -08:00
Ryan Br...
8f9119be2d Rbroughan/cdk component test fixes plus escape hatch (#70714) 2025-12-09 10:48:12 -08:00
Ryan Br...
8e437b321d Rbroughan/dest stream table schema final (#70279) 2025-12-03 13:29:15 -08:00
Jimmy Ma
a515e9ae81 chore: add dependency injection tests (#69845)
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2025-12-02 09:42:07 -08:00
Edward Gao
652c2eda86 Bulk load CDK: more schema evolution test cases (#69343) 2025-12-01 21:00:05 +00:00
Edward Gao
aacf63d66a Bulk load cdk: improve component tests' upsert test coverage (#69338) 2025-11-20 00:43:33 +00:00
Wenqi Hu
e62b337005 Support multiple ts precision decoding (#69326) 2025-11-13 16:59:54 -08:00
Edward Gao
f943c0f299 Bulk load CDK: schema evolution test suite (#69234) 2025-11-13 23:17:28 +00:00
Francis Genet
409566389d Remove a random println from our code (#69288) 2025-11-11 15:42:44 -08:00
Edward Gao
be3a818445 Bulk load CDK: break down ensureSchemaMatches (#69090) 2025-11-11 19:02:57 +00:00
Subodh Kant Chaturvedi
c6c51ebf0f chore(dataflow-cdk): remove logs from state store cause it confuses users (#69214) 2025-11-10 08:07:08 +05:30
Jonathan Pearlin
8f93b643c1 chore: refactor additional stat tracking (#69222) 2025-11-06 14:23:38 -05:00
Jonathan Pearlin
f244a50b56 fix: add additional stats to expected destination state message (#69201) 2025-11-05 11:42:17 -05:00
Jonathan Pearlin
dc83e41e77 refactor: separate validation result handling from coercer (#69113) 2025-11-05 09:18:59 -05:00
Edward Gao
93c8bd934f Bulk load CDK: parse non-property-ful schema into UnknownType (#69109) 2025-11-03 08:10:21 -08:00
Subodh Kant Chaturvedi
2c1179dcae fix(dataflow-cdk) : fix bug related to index of state messages (#69099) 2025-10-30 23:52:29 +05:30
Edward Gao
781f01f951 Bulk load CDK: move test-only TableOpClient methods to new interface (#68200) 2025-10-29 10:21:53 -07:00
Edward Gao
2bd50bf216 Edgao/component upsert tests/cdk (#68185) 2025-10-28 16:24:47 -07:00
Edward Gao
348cbfc73b Bulk load CDK: log diagnostics when unflushed states at end of sync (#68668) 2025-10-28 18:03:24 +00:00
Rodi Reich Zilberman
5610f7211f Remove CDC metafield decoration from streams that cannot be incremental - no pk (#68651) 2025-10-28 09:27:42 -07:00
Edward Gao
dcec4c899c Bulk load CDK: more component test stuff (#68167) 2025-10-28 15:41:50 +00:00
Subodh Kant Chaturvedi
1b48ee9adf feat: improve temporal representation in proto + shared encoder/decoder for source and dest (#67016)
Co-authored-by: Rodi Reich Zilberman <867491+rodireich@users.noreply.github.com>
2025-10-21 22:31:47 +05:30
Ryan Br...
7843cc6178 Refactor: Separate Interface Concerns & Add Table Component Tests (#67624)
## What

Refactors the database operation interfaces and establishes the first
component test suite

## How

### Interface Separation

- Consolidated `AirbyteClient` and `DirectLoadTableSqlOperations` into
`TableOperationsClient`
  - Standard SQL operations (CREATE, DROP, COPY, UPSERT) and their
  compositions
  - Adds straightforward methods for test with default impls to avoid
  breaking connectors

- Renamed `DirectLoadTableNativeOperations` → `TableSchemaEvolutionClient`
  - Better reflects its responsibility for complex schema evolution
  operations
  - Distinguishes it from simple SQL operations handled by
  `TableOperationsClient`

### Test Suite Infrastructure

- TableOperationsSuite: Interface-based test suite for validating all
table operations
- TableOperationsTestHarness: Helper class for test execution
- TableOperationsFixtures: Centralized test data and constants

### Documentation Improvements

- Added comprehensive JavaDoc explaining the complexity domains each
interface handles
- Documented specific challenges implementors must address (type
translation, nullable columns, PK changes, etc.)
- Clear cross-references between related interfaces

### Other
- Moves some interfaces / objects from the `db` toolkit to the main
`load` cdk for simpler dependency handling

### Migration Guide

For connector implementations:
1. Replace AirbyteClient and DirectLoadTableSqlOperations with
TableOperationsClient
2. Replace DirectLoadTableNativeOperations with
TableSchemaEvolutionClient
3. Ensure getGenerationId() calls use a TableOperationsClient instance
4. Update imports
2025-10-16 11:19:56 -07:00
Ryan Br...
e159e212cf Rbroughan/better interleaved streams (#67583)
## What
Better handle interleaved streams

Refactor `MemoryAndParallelismConfig` to `AggregatePublishingConfig` for
clarity

## How
* We now specify the total allowed memory (`maxEstBytesAllAggregates`)
and evict the largest if total usage is above that
* We continue to check cardinality first out of performance concerns

## Note
This is a "breaking change" because it renames some stuff
2025-10-09 10:50:45 -07:00
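
The eviction rule described in the commit above (#67583) can be sketched roughly as follows. This is an illustrative Python sketch, not code from the CDK: `Aggregate`, `AggregateStore`, `accept`, and `max_open_aggregates` are hypothetical names, only `max_est_bytes_all_aggregates` mirrors the `maxEstBytesAllAggregates` setting named in the message, and reading "cardinality" as the number of open per-stream aggregates is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Aggregate:
    """Hypothetical per-stream buffer with an estimated byte size."""
    stream: str
    est_bytes: int = 0
    record_count: int = 0


class AggregateStore:
    """Hypothetical store that bounds open aggregates by count and by total bytes."""

    def __init__(self, max_open_aggregates: int, max_est_bytes_all_aggregates: int):
        self.max_open_aggregates = max_open_aggregates
        self.max_est_bytes_all_aggregates = max_est_bytes_all_aggregates
        self._open: Dict[str, Aggregate] = {}
        self._total_est_bytes = 0  # running total, so the byte check is O(1)

    def accept(self, stream: str, est_record_bytes: int) -> List[Aggregate]:
        """Add one record to its stream's aggregate; return any evicted aggregates."""
        agg = self._open.setdefault(stream, Aggregate(stream))
        agg.est_bytes += est_record_bytes
        agg.record_count += 1
        self._total_est_bytes += est_record_bytes

        evicted: List[Aggregate] = []
        # Cardinality is checked first (assumed here to mean the open-aggregate count).
        while len(self._open) > self.max_open_aggregates:
            evicted.append(self._evict_largest())
        # Then the total estimated memory across all aggregates.
        while self._open and self._total_est_bytes > self.max_est_bytes_all_aggregates:
            evicted.append(self._evict_largest())
        return evicted

    def _evict_largest(self) -> Aggregate:
        # Evict the aggregate holding the most estimated bytes.
        largest = max(self._open.values(), key=lambda a: a.est_bytes)
        del self._open[largest.stream]
        self._total_est_bytes -= largest.est_bytes
        return largest
```

With a 100-byte budget, adding 40 bytes to `users`, 50 to `orders`, and then 20 more to `users` pushes the total to 110, so the largest aggregate (`users`, 60 bytes) is evicted and handed back to the caller for flushing.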
Rodi Reich Zilberman
a421083ca4 CDK changes only (#67152) 2025-10-07 10:51:43 -07:00
Ryan Br...
b4c677cf2d Move last 2 inlined dispatchers into bean factory (#67002) 2025-10-01 17:05:40 -07:00
Ryan Br...
fc1a15238a Rbroughan/state enricher tests (#66997) 2025-10-01 15:33:48 -07:00
Edward Gao
12837e2833 Bulk CDK: enforce test timeouts better (#66722) 2025-09-29 09:31:05 -07:00
Ryan Br...
9b8cbba8e2 Just set dest stats to source stats for now. (#66698) 2025-09-25 10:37:13 -07:00
Ryan Br...
bcbfd97b29 Rbroughan/fix interleaved stream state (#66686) 2025-09-25 10:01:31 -07:00
Ryan Br...
1c29927e11 Rbroughan/fix proto input stream hang (#66576)
Co-authored-by: Subodh Kant Chaturvedi <subodh1810@gmail.com>
2025-09-24 12:28:52 -07:00
Subodh Kant Chaturvedi
2a9057a68f fix: dataflow destination cdk minor fixes (#66559) 2025-09-22 12:41:47 -07:00
Ryan Br...
ec2a9694e3 Rbroughan/dataflow speed stats (#66543) 2025-09-19 15:38:13 -07:00
Subodh Kant Chaturvedi
73010e2231 fix: use dedicated dispatcher for parse+aggregate stage for an individual pipeline + cache column name lookup (#66496) 2025-09-19 13:19:55 +05:30
Ryan Br...
5a2c1a69fc fix docker test input stream (#66327) 2025-09-16 15:00:04 -04:00
Edward Gao
224459edf3 chore: re-add the bulk CDK version file (#66193) 2025-09-16 17:01:26 +00:00
Ryan Br...
82a6735530 Fix non dockerized test input (#66240) 2025-09-15 21:05:32 -04:00
Jonathan Pearlin
43f95df54b fix: use correct field name for generation ID meta column (#66209) 2025-09-12 13:27:54 -04:00
Rodi Reich Zilberman
856acc93cd Extract CDK proto encoding (#66152)
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2025-09-10 16:44:03 -07:00
Edward Gao
5a6690808f move cdk version into its own file (#66175) 2025-09-10 21:44:15 +00:00
Jonathan Pearlin
f5bebd7741 fix: add per-test additional properties for spec integration test (#66173) 2025-09-10 16:15:39 -04:00
Ryan Br...
0e9f41089f Misc. transform package cleanup (#66142) 2025-09-09 14:13:19 -07:00
Subodh Kant Chaturvedi
51f91ed9b1 feat: implement proto record munger for clickhouse (#65939)
Co-authored-by: tryangul <ryan.broughan@gmail.com>
2025-09-09 23:09:53 +05:30
Jose Pefaur
dabed92279 fix: set io dispatchers on discover tests (#66026) 2025-09-09 12:28:59 -05:00
Ryan Br...
1502f68b52 Add bytes to dataflow cdk emitted states (#65953) 2025-09-09 10:08:19 -07:00
Ryan Br...
4464737f19 Throw and fail sync if unflushed states exist at sync end. (#65975) 2025-09-08 10:13:55 -07:00
Maxime Carbonneau-Leclerc
a0d0e1cdff feat(declarative bulk): spec for low code (#65157)
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2025-09-08 08:52:35 -04:00