Edward Gao
21c1ccbf8a
Bulk Load CDK: Add remaining data coercion tests ( #70828 )
2025-12-15 14:37:55 -08:00
Edward Gao
dc92c5a254
Bulk Load CDK: add int/number coercion tests ( #70230 )
2025-12-11 19:24:21 +00:00
Ryan Br...
6a48afc369
Add CDC_CURSOR_COLUMN to load cdk for reference. ( #70852 )
2025-12-10 15:34:29 -08:00
Ryan Br...
9e8263dab3
Map the namespace before creating the final table names. ( #70827 )
2025-12-09 19:39:50 -08:00
Ryan Br...
8f9119be2d
Rbroughan/cdk component test fixes plus escape hatch ( #70714 )
2025-12-09 10:48:12 -08:00
Ryan Br...
8e437b321d
Rbroughan/dest stream table schema final ( #70279 )
2025-12-03 13:29:15 -08:00
Jimmy Ma
a515e9ae81
chore: add dependency injection tests ( #69845 )
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2025-12-02 09:42:07 -08:00
Edward Gao
652c2eda86
Bulk load CDK: more schema evolution test cases ( #69343 )
2025-12-01 21:00:05 +00:00
Edward Gao
aacf63d66a
Bulk load cdk: improve component tests' upsert test coverage ( #69338 )
2025-11-20 00:43:33 +00:00
Edward Gao
f943c0f299
Bulk load CDK: schema evolution test suite ( #69234 )
2025-11-13 23:17:28 +00:00
Francis Genet
409566389d
Remove a random println from our code ( #69288 )
2025-11-11 15:42:44 -08:00
Edward Gao
be3a818445
Bulk load CDK: break down ensureSchemaMatches ( #69090 )
2025-11-11 19:02:57 +00:00
Subodh Kant Chaturvedi
c6c51ebf0f
chore(dataflow-cdk): remove logs from state store cause it confuses users ( #69214 )
2025-11-10 08:07:08 +05:30
Jonathan Pearlin
8f93b643c1
chore: refactor additional stat tracking ( #69222 )
2025-11-06 14:23:38 -05:00
Jonathan Pearlin
f244a50b56
fix: add additional stats to expected destination state message ( #69201 )
2025-11-05 11:42:17 -05:00
Jonathan Pearlin
dc83e41e77
refactor: separate validation result handling from coercer ( #69113 )
2025-11-05 09:18:59 -05:00
Edward Gao
93c8bd934f
Bulk load CDK: parse non-property-ful schema into UnknownType ( #69109 )
2025-11-03 08:10:21 -08:00
Subodh Kant Chaturvedi
2c1179dcae
fix(dataflow-cdk) : fix bug related to index of state messages ( #69099 )
2025-10-30 23:52:29 +05:30
Edward Gao
781f01f951
Bulk load CDK: move test-only TableOpClient methods to new interface ( #68200 )
2025-10-29 10:21:53 -07:00
Edward Gao
2bd50bf216
Edgao/component upsert tests/cdk ( #68185 )
2025-10-28 16:24:47 -07:00
Edward Gao
348cbfc73b
Bulk load CDK: log diagnostics when unflushed states at end of sync ( #68668 )
2025-10-28 18:03:24 +00:00
Edward Gao
dcec4c899c
Bulk load CDK: more component test stuff ( #68167 )
2025-10-28 15:41:50 +00:00
Subodh Kant Chaturvedi
1b48ee9adf
feat: improve temporal representation in proto + shared encoder/decoder for source and dest ( #67016 )
Co-authored-by: Rodi Reich Zilberman <867491+rodireich@users.noreply.github.com>
2025-10-21 22:31:47 +05:30
Ryan Br...
7843cc6178
Refactor: Separate Interface Concerns & Add Table Component Tests ( #67624 )
## What
Refactors the database operation interfaces and establishes the first
component test suite
## How
### Interface Separation
- Consolidated `AirbyteClient` and `DirectLoadTableSqlOperations` into
`TableOperationsClient`
- Standard SQL operations (CREATE, DROP, COPY, UPSERT) and their
compositions
- Adds straightforward methods for tests, with default implementations to
avoid breaking connectors
- Renamed DirectLoadTableNativeOperations → TableSchemaEvolutionClient
- Better reflects its responsibility for complex schema evolution
operations
- Distinguishes it from simple SQL operations handled by
TableOperationsClient
### Test Suite Infrastructure
- TableOperationsSuite: Interface-based test suite for validating all
table operations
- TableOperationsTestHarness: Helper class for test execution
- TableOperationsFixtures: Centralized test data and constants
### Documentation Improvements
- Added comprehensive JavaDoc explaining the complexity domains each
interface handles
- Documented specific challenges implementors must address (type
translation, nullable columns, PK changes, etc.)
- Clear cross-references between related interfaces
### Other
- Moves some interfaces / objects from the `db` toolkit to the main
`load` cdk for simpler dependency handling
### Migration Guide
For connector implementations:
1. Replace AirbyteClient and DirectLoadTableSqlOperations with
TableOperationsClient
2. Replace DirectLoadTableNativeOperations with
TableSchemaEvolutionClient
3. Ensure getGenerationId() calls use TableOperationsClient instance
4. Update imports
2025-10-16 11:19:56 -07:00
Ryan Br...
e159e212cf
Rbroughan/better interleaved streams ( #67583 )
## What
Better handle interleaved streams
Refactor `MemoryAndParallelismConfig` to `AggregatePublishingConfig` for
clarity
## How
* We now specify the total allowed memory (`maxEstBytesAllAggregates`)
and evict the largest aggregate when total usage exceeds that limit
* We continue to check cardinality first out of performance concerns
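The eviction rule described above can be sketched roughly as follows. This is an illustrative Python model, not the CDK's code: the `AggregateStore` class, its methods, and the `max_aggregates` parameter are hypothetical names. Only `maxEstBytesAllAggregates` comes from the commit message. The sketch also assumes the largest aggregate is the one evicted when the cardinality limit is hit, which the message does not state.

```python
from dataclasses import dataclass, field


@dataclass
class AggregateStore:
    """Hypothetical model of per-stream aggregates with two eviction limits."""

    max_aggregates: int                # assumed cardinality limit (name is illustrative)
    max_est_bytes_all_aggregates: int  # models maxEstBytesAllAggregates from the commit
    sizes: dict = field(default_factory=dict)  # aggregate key -> estimated bytes

    def add(self, key: str, est_bytes: int) -> None:
        """Record additional estimated bytes for the aggregate identified by key."""
        self.sizes[key] = self.sizes.get(key, 0) + est_bytes

    def next_eviction(self):
        """Return the key of the aggregate to evict, or None if within limits."""
        if not self.sizes:
            return None
        # Cardinality is checked first because len() is cheaper than summing sizes.
        over_cardinality = len(self.sizes) > self.max_aggregates
        if over_cardinality or sum(self.sizes.values()) > self.max_est_bytes_all_aggregates:
            # Assumption: the largest aggregate is evicted in both cases.
            return max(self.sizes, key=self.sizes.get)
        return None

    def evict(self, key: str) -> int:
        """Remove an aggregate and return its estimated size in bytes."""
        return self.sizes.pop(key)
```

With interleaved streams, a caller would run `next_eviction()` after each `add()` and publish whichever aggregate it returns, so the total estimated memory stays bounded no matter how many streams are active.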
## Note
This is a "breaking change" because it renames some stuff
2025-10-09 10:50:45 -07:00
Ryan Br...
b4c677cf2d
Move last 2 inlined dispatchers into bean factory ( #67002 )
2025-10-01 17:05:40 -07:00
Ryan Br...
fc1a15238a
Rbroughan/state enricher tests ( #66997 )
2025-10-01 15:33:48 -07:00
Ryan Br...
9b8cbba8e2
Just set dest stats to source stats for now. ( #66698 )
2025-09-25 10:37:13 -07:00
Ryan Br...
bcbfd97b29
Rbroughan/fix interleaved stream state ( #66686 )
2025-09-25 10:01:31 -07:00
Ryan Br...
1c29927e11
Rbroughan/fix proto input stream hang ( #66576 )
Co-authored-by: Subodh Kant Chaturvedi <subodh1810@gmail.com>
2025-09-24 12:28:52 -07:00
Subodh Kant Chaturvedi
2a9057a68f
fix: dataflow destination cdk minor fixes ( #66559 )
2025-09-22 12:41:47 -07:00
Ryan Br...
ec2a9694e3
Rbroughan/dataflow speed stats ( #66543 )
2025-09-19 15:38:13 -07:00
Subodh Kant Chaturvedi
73010e2231
fix: use dedicated dispatcher for parse+aggregate stage for an individual pipeline + cache column name lookup ( #66496 )
2025-09-19 13:19:55 +05:30
Ryan Br...
5a2c1a69fc
fix docker test input stream ( #66327 )
2025-09-16 15:00:04 -04:00
Ryan Br...
82a6735530
Fix non dockerized test input ( #66240 )
2025-09-15 21:05:32 -04:00
Jonathan Pearlin
43f95df54b
fix: use correct field name for generation ID meta column ( #66209 )
2025-09-12 13:27:54 -04:00
Jonathan Pearlin
f5bebd7741
fix: add per-test additional properties for spec integration test ( #66173 )
2025-09-10 16:15:39 -04:00
Ryan Br...
0e9f41089f
Misc. transform package cleanup ( #66142 )
2025-09-09 14:13:19 -07:00
Subodh Kant Chaturvedi
51f91ed9b1
feat: implement proto record munger for clickhouse ( #65939 )
Co-authored-by: tryangul <ryan.broughan@gmail.com>
2025-09-09 23:09:53 +05:30
Jose Pefaur
dabed92279
fix: set io dispatchers on discover tests ( #66026 )
2025-09-09 12:28:59 -05:00
Ryan Br...
1502f68b52
Add bytes to dataflow cdk emitted states ( #65953 )
2025-09-09 10:08:19 -07:00
Ryan Br...
4464737f19
Throw and fail sync if unflushed states exist at sync end. ( #65975 )
2025-09-08 10:13:55 -07:00
Maxime Carbonneau-Leclerc
a0d0e1cdff
feat(declarative bulk): spec for low code ( #65157 )
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2025-09-08 08:52:35 -04:00
Ryan Br...
e401d477ec
Socket support for Dataflow ( #65606 )
2025-09-04 16:18:50 -07:00
Ryan Br...
f8ae71e16e
Fixes a rare array out of bounds error in stream init phase ( #65621 )
2025-08-29 15:27:20 -07:00
Subodh Kant Chaturvedi
6a453b3df4
feat: implement socket+proto support for bigquery ( #65114 )
2025-08-28 00:16:15 +05:30
Ryan Br...
51536bab93
Add missing unit test. ( #65536 )
2025-08-26 09:57:40 -07:00
Ryan Br...
882d560e9e
Dataflow CDK feature completeness. ( #65143 )
2025-08-22 10:52:13 -07:00
Ryan Br...
741893caf8
Set test env for dockerized dests tests ( #65141 )
2025-08-22 10:18:37 -07:00
Ryan Br...
c63fcdb398
run-agg-and-flush-on-diff-dispatchers ( #64924 )
2025-08-14 11:07:23 -07:00