mirror of synced 2025-12-23 11:57:55 -05:00
Commit Graph

1127 Commits

Author SHA1 Message Date
Edward Gao
53da5baa7d Destination bigquery 1s1t: fix 1s1t schema change logic; extract TyperDeduper (#28490)
* rename for clarity

* fix cleanup method

* giant commit because I'm irresponsible

* rename constant

* better raw table creation

* fix build?

* move code around

* tweaks

* more code shuffling

* Automated Commit - Format and Process Resources Changes

* add tests

* minor tweak

* remove unimportant methods

* cleanup

* Automated Commit - Format and Process Resources Changes

* derp

* clean up tests

* some more fixes post-merge

* botched merge

* create NoopTyperDeduper

* try and update everything to work?

* tweak comment

* move suffix args to end of list

* fix exception message

* Automated Commit - Format and Process Resources Changes

* add sqlgenerator test for softReset

* only prepare once

* update log message

* do what intellij says

* implement one more test

* less indirection

* Automated Commit - Format and Process Resources Changes

* rename test

* use noop in test

* version bump + changelog

* use stringutils

* fix typo

* flip if-statement

* typo

* simplify logic

* fix schema change logic

* typo

* use spy for clarity

* Automated Commit - Format and Process Resources Changes

* better test teardown

* slightly better logs

* fix exception message

* softReset returns single string

* Automated Commit - Format and Process Resources Changes

* simplify if chain

---------

Co-authored-by: edgao <edgao@users.noreply.github.com>
2023-07-21 13:45:19 -06:00
Maxime Carbonneau-Leclerc
2464106459 Version bump to release CATs (#28577) 2023-07-21 15:12:25 -04:00
Maxime Carbonneau-Leclerc
675175a50e Allow for sources without spec.json or spec.yaml (#28519)
* Allow for sources without spec.json or spec.yaml

* Automated Commit - Format and Process Resources Changes
2023-07-21 08:32:32 -04:00
Cynthia Yin
7e4797d90d Destinations V2: clean up AirbyteType code (#28430)
* general cleanup - move stuff around, add more comments

* guarantee `getAirbyteProtocolType` won't handle array values for `type`

* rename OneOf to Union

* simplify union ordering logic

* update testChooseUnion

* fix docs typos

* Automated Commit - Format and Process Resources Changes

* address comments

* Automated Commit - Format and Process Resources Changes

---------

Co-authored-by: cynthiaxyin <cynthiaxyin@users.noreply.github.com>
2023-07-20 14:59:52 -06:00
Edward Gao
225bfc4900 Destination Bigquery: improve test reliability by randomizing staging path in GCS tests (#28520) 2023-07-20 13:13:02 -07:00
Joe Bell
a16cbea2ae Destination BigQuery - Handle Schema Changes (#28382)
* Add ability to detect differences in expected Schemas and perform soft resets

* Remove alter table for overwrite syncs since it's unnecessary

* Updates after testing

* pr reorganize

* comments

* add collection util test

* Add Tests

* bump version

* Automated Commit - Format and Process Resources Changes

* Destination BigQuery - Reduce amount of typing and deduping for GCS staging (#28489)

* undo comment out

* centralize t&d logic for staging and standard, add valve to staging

* Share more logic for typing and deduping

* Remove record checking logic and use only time for staging inserts

* Add Javadoc

* Automated Commit - Format and Process Resources Changes

---------

Co-authored-by: jbfbell <jbfbell@users.noreply.github.com>

* Change TableNotMigratedException to extend runtime exception, remove SqlGenerator interface method

* Make Lambda slightly more readable

* add test for validating v2 schemas

* change soft reset to single string

* convert back to list, update dockerfile

* remove needless default

---------

Co-authored-by: jbfbell <jbfbell@users.noreply.github.com>
2023-07-20 09:25:44 -06:00
Cynthia Yin
80a57912f5 found a bug with time types (#28495) 2023-07-19 20:20:29 -05:00
Pedro S. Lopez
e10f768b50 workaround for normalizations (#28451) 2023-07-18 23:37:34 -05:00
Alexandre Girard
95c20eb79c connector-acceptance-tests: Bump PyYaml to 6.0 (#28432)
* pin cdk to 0.46.0

* bump pyyaml

* reset
2023-07-18 18:41:24 -05:00
Edward Gao
4f64c2adfb Destination Bigquery: 1s1t: handle raw name collisions (#28366)
* move records to separate files

* better raw table names

* update tests

* manual format b/c CI bugged out
2023-07-17 15:39:20 -05:00
Evan Tahler
b81cc031e0 destination-redshift should fail syncs if records or properties are too large, rather than silently skipping records and succeeding (#27993)
* `destination-redshift` will fail syncs if records or properties are too large, rather than silently skipping records and succeeding

* Bump version

* remove tests that don't matter any more

* more test removal

* more test removal

---------

Co-authored-by: Augustin <augustin@airbyte.io>
2023-07-14 14:27:12 -05:00
Edward Gao
934acaa137 Destination bigquery: rerelease 1s1t behind gate (#27936)
* Revert "Revert "Destination Bigquery: Scaffolding for destinations v2 (#27268)""

This reverts commit 348c577dbb.

* version bumps+changelog

* Speed up BQ by having 2 queries, and not an OR (#27981)

* 🐛 Destination Bigquery: fix bug in standard inserts for syncs >10K records (#27856)

* only run t+d code if it's enabled

* dockerfile+changelog

* remove changelog entry

* Destinations V2: handle optional fields for `object` and `array` types (#27898)

* catch null schema

* fix null properties

* clean up

* consolidate + add more tests

* try catch

* empty json test

* Automated Commit - Formatting Changes

* remove todo

* destination bigquery: misc updates to 1s1t code (#28057)

* switch to checkedconsumer

* add unit test for buildColumnId

* use flag

* restructure prefix check

* fix build

* more type-parsing fixes (#28100)

* more type-parsing fixes

* handle duplicates

* Automated Commit - Format and Process Resources Changes

* add tests for asColumns

* Automated Commit - Format and Process Resources Changes

* log warnings instead of throwing exception

* better log message

* error level

---------

Co-authored-by: edgao <edgao@users.noreply.github.com>

* Automated Commit - Formatting Changes

* Improve protocol type parsing (#28126)

* Automated Commit - Formatting Changes

* Change from T&D every 10k records to an increasing time based interval (#28130)

* fifteen minute t&d

* add typing and deduping operation valve for increased intervals of typing and deduping

* Automated Commit - Format and Process Resources Changes

* resolve bizarre merge conflict

* Automated Commit - Format and Process Resources Changes

---------

Co-authored-by: jbfbell <jbfbell@users.noreply.github.com>

* Simplify and speed up CDC delete support [DestinationsV2] (#28029)

* Simplify and speed up CDC delete support [DestinationsV2]

* better QUOTE

* spotbugs?

* recompile dbt image for local arch and use that when building images

* things compile, but tests fail

* tests working-ish

* comment

* fix logic to re-insert deleted records for cursor comparison.

tests pass!

* remove comment

* Skip CDC re-include logic if there are no CDC columns

* stop hardcoding pk (#28092)

* wip

* remove TODOs

---------

Co-authored-by: Edward Gao <edward.gao@airbyte.io>

* update method name

* Automated Commit - Formatting Changes

* depend on pinned normalization version

* implement 1s1t DATs for destination-bigquery (#27852)

* initial implementation

* Automated Commit - Formatting Changes

* add second sync to test

* do concurrent things

* Automated Commit - Formatting Changes

* clarify comment

* minor tweaks

* more stuff

* Automated Commit - Formatting Changes

* minor cleanup

* lots of fixes

* handle sql vs json null better
* verify extra columns
* only check deleted_at if in DEDUP mode and the column exists
* add full refresh append test case

* Automated Commit - Formatting Changes

* add tests for the remaining sync modes

* Automated Commit - Formatting Changes

* readability stuff

* Automated Commit - Formatting Changes

* add test for gcs mode

* remove static fields

* Automated Commit - Formatting Changes

* add more test cases, tweak test scaffold

* cleanup

* Automated Commit - Formatting Changes

* extract recorddiffer

* and use it in the sql generator test

* fix

* comment

* naming+comment

* one more comment

* better assert

* remove unnecessary thing

* one last thing

* Automated Commit - Formatting Changes

* enable concurrent execution on all java integration tests

* add test for default namespace

* Automated Commit - Formatting Changes

* implement a 2-stream test

* Automated Commit - Formatting Changes

* extract methods

* invert jsonNodesNotEquivalent

* Automated Commit - Formatting Changes

* fix conditional

* pull out diffSingleRecord

* Automated Commit - Formatting Changes

* handle nulls correctly

* remove raw-specific handling; break up methods

* Automated Commit - Formatting Changes

---------

Co-authored-by: edgao <edgao@users.noreply.github.com>
Co-authored-by: octavia-approvington <octavia-approvington@users.noreply.github.com>

* Destinations V2: move create raw tables earlier (#28255)

* move create raw tables

* better log message

* stop building normalization (#28256)

* fix ability to run tests

* disable incremental t+d for now

* Automated Commit - Formatting Changes

---------

Co-authored-by: Evan Tahler <evan@airbyte.io>
Co-authored-by: Cynthia Yin <cynthia@airbyte.io>
Co-authored-by: cynthiaxyin <cynthiaxyin@users.noreply.github.com>
Co-authored-by: edgao <edgao@users.noreply.github.com>
Co-authored-by: Joe Bell <joseph.bell@airbyte.io>
Co-authored-by: jbfbell <jbfbell@users.noreply.github.com>
Co-authored-by: octavia-approvington <octavia-approvington@users.noreply.github.com>
2023-07-14 09:34:56 -05:00
Charles
6c2fb9ff81 fix async destination order (#28257)
Partly closes airbytehq/oncall#2437 - a rare condition that can occur when comparator values change mid-comparison.

Closes #28154 - a NPE when retrieving from the queue.

- For the comparator error, we are unfortunately unable to write a test case to prove the error. It's exceedingly difficult to force the java.lang.IllegalArgumentException: Comparison method violates its general contract! error. We do know the current implementation contains an error this PR fixes.
- The NPE error is straightforward. The peek returns null when the queue is empty, and we were not accounting for that before.
2023-07-13 19:47:03 -07:00
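
The two failure modes described in the commit above can be illustrated with a minimal sketch. This is not the code from #28257: `StreamBuffer`, `FlushOrder`, and `largestFirst` are hypothetical names, and the sketch assumes the comparator problem came from sort keys that other threads could update during a sort. It snapshots each key once before sorting and wraps the nullable result of `peek()`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a per-stream buffer; not an Airbyte class.
final class StreamBuffer {
  final String name;
  final AtomicLong queuedBytes = new AtomicLong(); // may be updated by other threads
  final Queue<Long> arrivalTimes = new ConcurrentLinkedQueue<>();

  StreamBuffer(String name) {
    this.name = name;
  }

  // Queue.peek() returns null when the queue is empty, so wrap the result
  // instead of dereferencing it directly.
  Optional<Long> timeOfOldestMessage() {
    return Optional.ofNullable(arrivalTimes.peek());
  }
}

final class FlushOrder {
  // Sort key captured once per buffer so it cannot change while sorting.
  private static final class Snapshot {
    final StreamBuffer buffer;
    final long bytes;

    Snapshot(StreamBuffer buffer, long bytes) {
      this.buffer = buffer;
      this.bytes = bytes;
    }
  }

  // Orders buffers by queued bytes, largest first, using the snapshots only.
  static List<StreamBuffer> largestFirst(List<StreamBuffer> buffers) {
    List<Snapshot> snapshots = new ArrayList<>();
    for (StreamBuffer b : buffers) {
      snapshots.add(new Snapshot(b, b.queuedBytes.get()));
    }
    snapshots.sort(Comparator.comparingLong((Snapshot s) -> s.bytes).reversed());
    List<StreamBuffer> ordered = new ArrayList<>();
    for (Snapshot s : snapshots) {
      ordered.add(s.buffer);
    }
    return ordered;
  }
}
```

Sorting immutable snapshots keeps the comparator consistent for the duration of the sort, even if `queuedBytes` changes concurrently; the ordering may be slightly stale, which is usually acceptable for a scheduling heuristic.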
Duy Nguyen
633c939d46 [Source-postgres] Set default cursor value for cdc mode (#27442)
* use LSN as default cursor for postgres CDC

* Fixed static constant

* Set lsn default cursor value for postgres sync

* Bumped metadata and dockerfile versions

* Disable acceptance backwards compatibility discovery test as this is a breaking change

---------

Co-authored-by: Conor <cpdeethree@users.noreply.github.com>
2023-07-13 11:12:34 -05:00
Subodh Kant Chaturvedi
4ddd057039 source-postgres: implement logic to make CDC compatible with ctid (#27652)
* wip

* wip 2

* undo unwanted change

* some refactoring

* fix few tests

* more fixing

* more fixes

* Automated Commit - Format and Process Resources Changes

* more improvements

* 1 more test

* another test

* add flag for ctid enabling

* fix conflicts

* else block is not required

* use emittedAt

* skip WAL processing if streams under vacuuming

---------

Co-authored-by: subodh1810 <subodh1810@users.noreply.github.com>
Co-authored-by: Augustin <augustin@airbyte.io>
2023-07-06 13:37:28 -05:00
Joe Bell
b03be1b714 🐛 Destination Snowflake: Merge old snowflake work (#27935)
* Adds data as JsonNode to pass through, running into memory issues so add JVM args to attach VisualVM

* JsonNode

* Lowers the optimal batch size to see if this improves movement

* Fixes NPE by checking if PartialAirbyteMessage contains a PartialAirbyteRecord

* Fixes config switch when config is not explicitly set (config migration needed)

* Adds logic to check if queue has elements before getting timeOfLastMessage

* Add PartialSerialisedMessage test. (#27452)

* Test deserialise.

* Add tests.

* Simplify and fix tests.

* Format.

* Adds tests for deserializeAirbyteMessage

* Adds tests for deserializeAirbyteMessage with bad data

* Cleans up deserializeAirbyteMessage and throws Exception when invalid message

* More code cleanup

---------

Co-authored-by: ryankfu <ryan.fu@airbyte.io>

* 🤖 Auto format destination-snowflake code [skip ci]

* Cleans up code w/o JVM args & rebase

* 🤖 Auto format destination-snowflake code [skip ci]

* Adds breadcrumb on the STATE message deviation and where the deserialize/serialize is done to unpack

* 🤖 Auto format destination-snowflake code [skip ci]

* Adds back line formatter removed and comment describing rationale for lower batchSize

* Bumps Snowflake version and type checks

* Added note to remove PartialAirbyteRecordMessage with low resource testing

* Automated Commit - Format and Process Resources Changes

* Fix issue with multiple namespaces in snowflake not writing to the correct staging schema

* Fix issue with multiple namespaces in snowflake not writing to the correct staging schema

* remove stage name manipulating method

* update readme

* Source Stripe: update credit_notes expected records (#27941)

* Source Zendesk Talk: update expected records (#27942)

* Source Xero: update expected records (#27943)

* Metadata: Persist Registry entries (#27766)

* DNC

* Update poetry

* Update dagster

* Apply partition

* Get metadata entry

* Use helpers

* Write registry entry to appropriate location

* Delete when registry removed

* Update to use new file (broken)

* Render registry from registry entries

* Run format

* Fix plural issue

* Update to all metadata file blobs

* Fix test

* Update to all blobs

* Add ignore validation error for version logic

* Rename to max_run_request

* Pedros review

* Ella suggestions

Co-authored-by: Ella Rohm-Ensing <erohmensing@gmail.com>

* Update airbyte-ci/connectors/metadata_service/orchestrator/orchestrator/assets/registry_entry.py

Co-authored-by: Ella Rohm-Ensing <erohmensing@gmail.com>

* Update naming

* Add tests for connector type and deletion

* Test safe parse

* Format

---------

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@sers.noreply.github.com>
Co-authored-by: Ella Rohm-Ensing <erohmensing@gmail.com>

* Fix Dagster Deploy Failure (#27955)

* Add pydantic

* Add pydantic to orchestration deploy pipeline

* 🐛 Source Jira: update expected records (#27951)

* Source Jira: update expected records

* Update issues expected records

* Source Zendesk Chat: update expected records (#27965)

* 🐛 Source Pipedrive: update expected records (#27967)

* 🐛 Source Pinterest: update expected records (#27964)

*  Source Amazon-Ads: Add streams for portfolios and sponsored brands v3 (#27607)

* Add stream for sponsored brands v3
* Add new stream Portfolios

* Source Google Search Console: added discover and googleNews to searchType (#27952)

* added discover and googleNews to searchType

* updated changelog

* fixed types for streams

* 🎉 Source Instagram: Improve, refactor `STATE` management (#27908)

* add test for enabling

* update versions

* fix test

* update other snowflake loading method types

* remove standard

---------

Co-authored-by: ryankfu <ryan.fu@airbyte.io>
Co-authored-by: Davin Chia <davinchia@gmail.com>
Co-authored-by: octavia-squidington-iii <octavia-squidington-iii@users.noreply.github.com>
Co-authored-by: ryankfu <ryankfu@users.noreply.github.com>
Co-authored-by: Augustin <augustin@airbyte.io>
Co-authored-by: Arsen Losenko <20901439+arsenlosenko@users.noreply.github.com>
Co-authored-by: Ben Church <ben@airbyte.io>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@sers.noreply.github.com>
Co-authored-by: Ella Rohm-Ensing <erohmensing@gmail.com>
Co-authored-by: Anatolii Yatsuk <35109939+tolik0@users.noreply.github.com>
Co-authored-by: Daryna Ishchenko <80129833+darynaishchenko@users.noreply.github.com>
Co-authored-by: Baz <oleksandr.bazarnov@globallogic.com>
2023-07-06 12:28:52 -05:00
Augustin
816e83700e connectors-ci/cat: validate image under test is the right one (#27919) 2023-07-03 18:19:52 +00:00
Edward Gao
52b8cbe39d Revert "Destination Bigquery: Scaffolding for destinations v2 (#27268)" (#27891)
* Revert "Destination Bigquery: Scaffolding for destinations v2 (#27268)"

This reverts commit ba3e39bb0c.

* bump versions to 1.5.1 everywhere
2023-06-30 20:26:48 -04:00
Cynthia Yin
98a0dad893 Destinations V2: change singleton log message level from warn to debug (#27887) 2023-06-30 13:59:06 -07:00
Edward Gao
ba3e39bb0c Destination Bigquery: Scaffolding for destinations v2 (#27268)
* copy files from edgao branch

* start writing create table statement

* add basic unit test setup

* create a table, probably

* remove outdated todo

* derp, one more column

* ugh

* add partitioning+clustering

* use StringSubstitutor

* substitutions in updateTable

* wip generate update/insert statement

* split up into smaller methods

* handle json types correctly

* rename stuff

* more json_query vs _value stuff

* minor tweak

* super basic test setup

* laying foundation for type parsing

* more stuff

* tweaks

* more progress on type parsing

* fix json_value stuff?

* misc fixes in insert

* fix dedupFinalTable

* add testDedupRaw

* full e2e test

* type parsing: gave up and mirrored the dbt code structure to avoid bugs

* type parsing - more cleanup

* handle column name collisions

* handle tablename collisions...?

* comments

* remove original ns/name from quotedstream

* also javadoc

* remove redundant method

* fix table rename

* add incremental append test

* add full refresh append test

* comment

* call T+D sql in a reasonable location for standard inserts

* add config option

* use config option here

* type parsing - fix fromJsonSchema

* gate everything

* log query + runtime

* add spec option temporarily

* Raw Table Updates

* fix more stuff

* first big pass at toDialectType

* no quotes

* wrap everything in quotes

* resolve some TODOs

* log sql statement in tests

* overwriteFinalTable returns optional

* minor clean up

* add raw dataset override

* try to preserve the original namespace for t+d?

* write to the raw table correctly

* update todos

* write directly to raw table

this is kind of dumb because we're still trying to do tmp table operations,
and we still don't ack state until the end of the entire sync.

* standard inserts write to raw table correctly

* imports + log statements

* move logs + add comment

* explicitly create raw table

* move comment to better place

* Typing issues

* bash attempt

* formatting updates

* formatting updates

* write to the airbyte schema by default unless overridden by config options

* standard inserts truncate raw table at start of sync

* full refresh overwrite will overwrite correctly!

* fix avro record schema parsing

* better raw table recreate

* rename raw table to match standard inserts

* full refresh overwrite does tmp table things

* small clean up

* small clean up

* remove errors entry if no errors

* pull out destination config into singleton

* clean up singleton stuff

* make sure dest config exists when trying to do lookups

* avoid stringifying null

* quick thoughts on alter table

* add basic cdc testcase

* tweak cdc test setup

* rename raw table to match standard inserts

* minor tweak

* delete exact sql string assertions

* switch to JSON type

* minor cleanup

* sql whitespace changes

* explain cdc deletions

* GCS Staging Full Refresh create temp table

* assert schema

* first out of order cdc test

* add another cdc test case (currently failing)

* better test structure

* make this work

* oops, fix test

* stop trying to delete deletion records

* minor improvements to code+test

* enable concurrent test runs on integration test

* move stuff to static initializer

* extract utility method

* formatting

* Move conditional to the base java package, replace conditionals which did not use the typing and deduping flag but should have.

* 🤖 Auto format destination-bigquery code [skip ci]

* 🤖 Auto format destination-gcs code [skip ci]

* switch back to empty list; write big assert

* minor wording tweaks

* 🤖 Auto format destination-bigquery code [skip ci]

* 🤖 Auto format destination-gcs code [skip ci]

* DestinationConfigTest

* 🤖 Auto format destination-bigquery code [skip ci]

* 🤖 Auto format destination-gcs code [skip ci]

* formatting

* remove ParsedType

* 🤖 Auto format destination-gcs code [skip ci]

* 🤖 Auto format destination-bigquery code [skip ci]

* tests verify every data type

* 🤖 Auto format destination-bigquery code [skip ci]

* 🤖 Auto format destination-gcs code [skip ci]

* full update with all data types

* 🤖 Auto format destination-bigquery code [skip ci]

* 🤖 Auto format destination-gcs code [skip ci]

* move stuff to new base lib

* 🤖 Auto format destination-gcs code [skip ci]

* Automated Commit - Formatting Changes

* 🤖 Auto format destination-bigquery code [skip ci]

* fix test

* 🤖 Auto format destination-bigquery code [skip ci]

* 🤖 Auto format destination-bigquery code [skip ci]

* 🤖 Auto format destination-gcs code [skip ci]

* asserts in dedupFinalTable

* better asserts in dedupRawTable

* [wip] test case for all data types

* 🤖 Auto format destination-gcs code [skip ci]

* 🤖 Auto format destination-bigquery code [skip ci]

* AirbyteTypeTest

* Automated Commit - Formatting Changes

* remove comments

* test chooseOneOf

* slightly better test output

* Automated Commit - Formatting Changes

* add some awful pretty print code

* more comment

* minor tweaks

* verify array/object type

* fix test

* handle deletions more correctly

* test toDialectType

* Destinations v2: better namespace handling (#27682)

* [wip] better namespace handling

* 🤖 Auto format destination-bigquery code [skip ci]

* wip also implement in gcs

* get gcs working (?)

* 🤖 Auto format destination-bigquery code [skip ci]

* remove duplicate method

* 🤖 Auto format destination-bigquery code [skip ci]

* fixed my code style settings

* make ci happy?

* 🤖 Auto format destination-bigquery code [skip ci]

* make ci happy?

* remove incorrect test

* blank line change

* initialize singleton

---------

Co-authored-by: octavia-squidington-iii <octavia-squidington-iii@users.noreply.github.com>

* reset args correctly

* Automated Commit - Formatting Changes

* more bash stuff

* parse implicit structs

* initialize singleton in more tests

* Automated Commit - Formatting Changes

* I missed this namespace handling thing

* test more schemas

* fix singular types specified in arrays

* Automated Commit - Formatting Changes

* disable test for unimplemented feature

* initialize singleton

* remove spec options; changelogs+metadata

* randomize namespace

* also bump dockerfile

* unremove namespace sanitizing in legacy mode

* ... disable the correct test

* even more unit test fixes!

* move integration test to integration tests

---------

Co-authored-by: Cynthia Yin <cynthia@airbyte.io>
Co-authored-by: Joe Bell <joseph.bell@airbyte.io>
Co-authored-by: octavia-squidington-iii <octavia-squidington-iii@users.noreply.github.com>
Co-authored-by: edgao <edgao@users.noreply.github.com>
Co-authored-by: cynthiaxyin <cynthiaxyin@users.noreply.github.com>
2023-06-29 08:44:37 -07:00
Ben Church
5b183cbb7a Bnchrch/cat/backwards fail removed prop (#27685)
* Incorrect way to do this

* Working

* Make tests pretty

* Revert "Incorrect way to do this"

This reverts commit f8e29594c1b5fa07bad805806f2571af883d27fd.

* Add backwards compatibility docs

* bump version

* format

---------

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@sers.noreply.github.com>
2023-06-28 08:21:27 -07:00
Evan Tahler
4fb1f98221 Fix destination-s3 build (#27786)
* bump version

* PR id

* shh normalization, shh

* remove a bunch of arm64 deps?

* might as well match the dockerfile
2023-06-27 17:15:44 -07:00
Edward Gao
9c56062a7d Normalization integration tests: set explicit cursor on cdc streams (#27670) 2023-06-23 14:33:49 -07:00
Augustin
8dcaf2b469 cat: increase docker connection timeout to 2mn (#27429) 2023-06-19 12:36:47 +02:00
Maxime Carbonneau-Leclerc
77dcefc47b Issue 26607/fix cats support array in state (#27358)
* [ISSUE #26607] fix cursor paths

* [ISSUE #26607] changelog and bump version

* Automated Commit - Format and Process Resources Changes
2023-06-14 11:33:26 -05:00
Augustin
5b8200181c connectors-ci: deprecate slash test (#27200) 2023-06-14 18:19:13 +02:00
Maxime Carbonneau-Leclerc
01a8f19d83 [ISSUE #26607] CATs: support list in state (#27306) 2023-06-13 12:10:22 -04:00
Ryan Fu
769ed3efa7 Adds PartialAirbyteMessage overhead & removes serialize/deserialize on CSV writer (#27222)
* No Op change to see if Connectors Base is unrelated

* Isolated test to see the memory used after processing CSV writer

* Changes record writer

* Removed Json.serialize on string

* Turn on Snowflake Async

* Fixed emittedAt timestamp in milliseconds

* Add comments. Undo Evan build changes.

* Allow remote debugging.

* Adds calc for PartialAirbyteRecordMessage memory overhead

* Revert back airbyte build architecture

* Passes in the parsed emittedAt to the CSVwriter

* Cleans up lingering comments

* Performance enhancement - increase the available JVM memory known by the container

* Serializes the data earlier to avoid normalization issues and lowers optimal batch size

* Passes messageString instead of serializing the data, because the memory overhead causes an early OOM

* Removes memory overhead of record data

* Cleans up PR, fixes memory issues & increases throughput by lowering optimal batch size

* Add calculation comment & removes procps from docker install

* Automated Commit - Format and Process Resources Changes

* Turns on Async for testing purposes

* Moves up implementation higher up the classes to remove unnecessary no-op

* Automated Commit - Format and Process Resources Changes

* Adds TODOs for migrating buffers to use only serialized records

* Breaks down calc & throws UnsupportedOperationException

* Adds TODO for migrating all destinations to use the getDataRow(id, formattedString, emittedAt) to avoid unnecessary ser-de overhead

---------

Co-authored-by: Davin Chia <davinchia@gmail.com>
Co-authored-by: ryankfu <ryankfu@users.noreply.github.com>
2023-06-12 18:27:13 -07:00
Subodh Kant Chaturvedi
42e0a683f6 postgres-cdc: implementation to construct initial debezium state manually to skip snapshot (#27106)
* postgres-cdc: implementation to construct initial debezium state manually to skip snapshot

* add comment
2023-06-12 21:55:29 +05:30
Conor
4c9acc55a2 allow specifying override version in bump_version.sh (#27127)
* allow specifying override version in bump_version.sh

* Automated Commit - Format and Process Resources Changes
2023-06-07 16:04:34 -05:00
Ben Church
9711bee6ad CAT: Ensure we only generate iso8601 with hypothesis in backwards compatibility test (#26683)
* Ensure we only generate iso8601

* Remove format in case of pattern

* Remove unused types

---------

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@sers.noreply.github.com>
2023-06-07 16:43:11 +00:00
Evan Tahler
4dd9fe0c1c Fix normalization builds (#26930) 2023-06-02 07:41:40 -07:00
Augustin
f3c7e5f875 cat: skip test_catalog_has_supported_data_types (#26926)
* skip test_catalog_has_supported_data_types

* bump version
2023-06-01 17:07:16 -05:00
Charles
4ac62f3b4f Async Snowflake Destination (#26703)
* snowflake at end of coding retreat week

* turn off async snowflake

* add comment

* Fixed test to point to the ShimMessageConsumer instead of AsyncMessageConsumer

* Automated Change

---------

Co-authored-by: ryankfu <ryan.fu@airbyte.io>
Co-authored-by: ryankfu <ryankfu@users.noreply.github.com>
2023-05-31 23:01:36 +00:00
Augustin
de052da4f4 cat: fix test_catalog_has_supporte_data_types when / in stream property (#26868) 2023-05-31 22:10:55 +00:00
Augustin
789c5dd384 cat: integer is a supported airbyte type (#26856) 2023-05-31 20:09:29 +02:00
Charles
223f698516 encapsulate trigger logic for flush workers (#26706) 2023-05-31 09:59:05 -07:00
Charles
480a43b9ab grab bag of non-controversial clean up tasks (#26702) 2023-05-31 08:53:27 -07:00
Charles
567f839834 Serialized Message Consumer (#26700) 2023-05-31 08:14:21 -07:00
Augustin
bef78705b8 cat: fix typo in changelog (#26797) 2023-05-30 15:44:00 -05:00
Augustin
818e68860a cat: fix bug in test_catalog_has_supported_data_types (#26766) 2023-05-30 22:41:22 +02:00
Augustin
7de2886d7b cat: validate types, formats, airbyte types and their combinations on catalog (#26669) 2023-05-30 10:48:00 +02:00
Evan Tahler
75240b0bbf Multi-architecture normalization build (local) (#26677)
* Multi-architecture normalization build (local)

When building and testing normalization locally, we need to force the base images to match the local host OS.

This is not a problem when publishing the connectors as `airbyte-ci`/dagger handles this for us

* Update build.gradle
2023-05-26 10:57:15 -07:00
Edward Gao
cf2ded2bbb Destination Bigquery: small tweak to clarify logs (#26585)
* make logs less misleading

* version bumps + changelog

* tweak wording
2023-05-25 18:20:40 +00:00
Subodh Kant Chaturvedi
9b9809b006 fix(cdc): limit queue size to lower memory consumption (#26473)
* fix(cdc): limit queue size to lower memory consumption

* add queue size attribute in spec

* disable retries

* fix log

* review comments

* add validation test for queue size

* update expected spec

* bump version + changelog

* update metadata files

---------

Co-authored-by: Ben Church <ben@airbyte.io>
2023-05-25 22:34:40 +05:30
Augustin
fd3655707e connectors-ci: add finalize_build logic to handle custom Dockerfiles (#26489) 2023-05-25 09:32:19 +02:00
Rodi Reich Zilberman
2aa8a0dbbb Allow sessionvariables jdbc url param in source-mysql (#25859)
* initial commit

* cleanup

* cleanup

* Throw a configuration exception in case of bad jdbc url param

* End-to-End test session variable jdbc param

* reafctoring sanity

* sanity

* bump version and update note

* cherry pick fix to unrelated build error

* fix failing test

---------

Co-authored-by: Ryan Fu <ryan.fu@airbyte.io>
2023-05-22 18:53:33 -07:00
Charles
5f3ed16408 hide queue inside MemoryBoundedLinkedBlockingQueue (#26375)
Hiding the actual Java Queue as an inner class to avoid the chance that someone tries to use native queue methods that we haven't overridden. Good thing that I did this too, because one of the changes we made during hack dayz wasn't reflected in our current feature branch. We need to override poll(time, unit), not just poll. This PR makes sure we won't make that mistake again!
2023-05-22 18:04:05 -07:00
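
The encapsulation idea in the commit above can be sketched roughly as follows. This is an illustration under assumptions, not the actual `MemoryBoundedLinkedBlockingQueue`: the names `BoundedByteQueue`, `HiddenQueue`, and `Sized` are hypothetical, and the byte accounting is simplified. The outer class exposes only methods that keep the byte counter in sync, so callers cannot reach an un-overridden method of the underlying queue, and both `poll()` and `poll(timeout, unit)` go through the same release path.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only; the real MemoryBoundedLinkedBlockingQueue differs.
final class BoundedByteQueue<E> {
  private final HiddenQueue<E> hidden;

  BoundedByteQueue(long maxBytes) {
    this.hidden = new HiddenQueue<>(maxBytes);
  }

  // Only these methods are visible; each one keeps the byte count consistent.
  boolean offer(E item, long sizeBytes) {
    return hidden.offer(item, sizeBytes);
  }

  E poll() {
    return hidden.poll();
  }

  E poll(long timeout, TimeUnit unit) {
    return hidden.poll(timeout, unit);
  }

  long currentBytes() {
    return hidden.bytes.get();
  }

  // The java.util.concurrent queue lives here and is never exposed.
  private static final class HiddenQueue<T> {
    private static final class Sized<V> {
      final V item;
      final long size;

      Sized(V item, long size) {
        this.item = item;
        this.size = size;
      }
    }

    private final LinkedBlockingQueue<Sized<T>> queue = new LinkedBlockingQueue<>();
    private final AtomicLong bytes = new AtomicLong();
    private final long maxBytes;

    HiddenQueue(long maxBytes) {
      this.maxBytes = maxBytes;
    }

    boolean offer(T item, long size) {
      // Reserve the bytes first; roll back if the bound would be exceeded.
      if (bytes.addAndGet(size) > maxBytes) {
        bytes.addAndGet(-size);
        return false;
      }
      queue.add(new Sized<>(item, size));
      return true;
    }

    T poll() {
      return release(queue.poll());
    }

    T poll(long timeout, TimeUnit unit) {
      try {
        return release(queue.poll(timeout, unit));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt flag
        return null;
      }
    }

    // Shared release path: frees the reserved bytes for any dequeued element.
    private T release(Sized<T> sized) {
      if (sized == null) {
        return null;
      }
      bytes.addAndGet(-sized.size);
      return sized.item;
    }
  }
}
```

Composition is used instead of subclassing `LinkedBlockingQueue` so that any queue method not deliberately wrapped is simply unavailable, rather than silently bypassing the memory accounting.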
Davin Chia
1ed32e55bc Async Code V0: Flush Worker Improvements. (#26384)
Incorporate the changes from #26178.

- Update queue method from setMaxMemory to addMaxMemory.
- Update work retrieval logic to only assign work
  - if there are free worker threads
  - to account for in-progress worker threads
2023-05-22 17:53:10 -07:00
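
The work-retrieval rule listed in the commit above (assign work only when worker threads are free, and account for workers already in progress) might look something like the sketch below. The class `FlushPlanner`, the method `workersToStart`, and all of its parameters are hypothetical; the commit does not show the actual logic, so this is only one plausible reading of it.

```java
// Hypothetical planner; names and parameters are assumptions, not Airbyte code.
final class FlushPlanner {
  /**
   * Returns how many additional flush workers to start for one stream.
   *
   * @param queuedBytes bytes currently buffered for the stream
   * @param batchBytes bytes one worker is expected to flush
   * @param inProgressForStream workers already flushing this stream
   * @param freeThreads idle threads in the worker pool
   */
  static int workersToStart(long queuedBytes, long batchBytes, int inProgressForStream, int freeThreads) {
    if (freeThreads <= 0 || batchBytes <= 0 || queuedBytes <= 0) {
      return 0; // nothing to assign, or nobody free to take it
    }
    // Ceiling division: number of batches needed to drain the queue.
    long batchesNeeded = (queuedBytes + batchBytes - 1) / batchBytes;
    // Batches already claimed by in-progress workers should not be reassigned.
    long unclaimed = Math.max(0, batchesNeeded - inProgressForStream);
    return (int) Math.min(unclaimed, freeThreads);
  }
}
```

For example, with 250 queued bytes, 100-byte batches, one worker already running, and five free threads, the sketch would start two more workers rather than three.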
Edward Gao
d261259575 fix build (#26378)
* fix build

* delete unused variable
2023-05-23 00:19:00 +00:00