mirror of synced 2025-12-26 14:02:10 -05:00
227 Commits

Author SHA1 Message Date
Davin Chia
ec0f83b03f 🚨 Destination Snowflake: Remove GCS/S3 Staging. (#29236)
As title, remove the GCS/S3 staging methods.

There isn't much usage so we can remove this. Internal Staging is also recommended by Snowflake, so using that is both cheaper and faster.

Co-authored-by: davinchia <davinchia@users.noreply.github.com>
Co-authored-by: Evan Tahler <evan@airbyte.io>
Co-authored-by: Pedro S. Lopez <pedroslopez@me.com>
2023-08-18 13:15:25 -07:00
Edward Gao
f003a062ba 🐛 Destination bigquery: Properly fix per-stream state handling (#29498)
Co-authored-by: edgao <edgao@users.noreply.github.com>
2023-08-17 15:30:37 +00:00
Joe Bell
085b1215ca 🐛 Destination Bigquery - fix migration logic (#29461)
2023-08-16 22:51:23 -06:00
Edward Gao
e9f1a7ad01 Destination snowflake 1s1t: release v2 early access (#29174)
* disable 1s1t in gcs/s3 mode

* derp

* quote things in many places

* fix timestamp format...?

* delete unused tests

* more expectedrecord timestamp fixes

* implement dumpFinalTable

* fix bug

* bugfix in schema change detection

* add schema change detection tests

* fix schema change detection

* add spec options

* logistics

* add timestamp format test

* and snowflake

* accept raw schema override

* fix test handling

* fix unit test

* Automated Commit - Format and Process Resources Changes

* typo

* forgot to fix this

* ... I had uncommitted changes

* resolve TODOs

* add regex examples

* correctly drop check table in check connection

* also bump teradata >.>

---------

Co-authored-by: edgao <edgao@users.noreply.github.com>
2023-08-09 10:21:05 -06:00
Edward Gao
38530ee0b3 Destinations snowflake, redshift: Simplify default namespace handling (#29188)
* pass default namespace to more convenient location

* add final modifier

* logistics
2023-08-08 18:34:41 -06:00
Edward Gao
2866ed6f2f Destination snowflake: mostly done implementations for sqlgenerator+destinationhandler (#28677)
* csv sheet generator supports 1s1t

* create+insert raw tables 1s1t

* add skeletons

* start writing tests

* progress in creating raw tables

* fix tests

* add s3 test; better csv generation

* handle case-sensitive column names

* also add gcs test

* hook T+D into the destination

* fix redshift; simplify

* Delete unused files?

* disable test; enable cleanup

* initialize config singleton in tests

* logistics

* header

* simplify

* fix unit tests

* correctly disable tests

* use default null for loaded_at

* fix test

* autoformat

* cython >.>

* more singleton init

* literally how?

* basic destinationhandler impl

* use raw string for type >.>

* add toDialectType

* basic createTable impl

* better sql query

* comment

* unused variables

* recorddiffer can be case-sensitive

* misc fixes

* add expected_records

* move constants to base-java

* use ternary

* fix tests

* resolve todo

* T+D can trigger on first commit

* fix test teardown

* implement softReset

* implement overwriteFinalTable

* better type stuff; check table schema

* fix

* derp

* implement updateTable?

* derp

* random wip stuff

* fix insertRaw

* theoretically implement stuff?

* stuff

* put suffix at the end

* different uuids

* fix expected records

* move tdtest resources into dat folder

* use resource files

* stuff

* move code around

* more stuff

* rename final table

* stuff

* cdc immediate deletion

* cdcComplexUpdate

* cleanup

* botched rebase

* more tests

* move back to old file

* Automated Commit - Format and Process Resources Changes

* add comments

* Automated Commit - Format and Process Resources Changes

* fix merge

* move expected_records into dat folder

* wip implement sqlgenerator test

* basic implementation

* tons of fixes, still tons more to go

* more stuff

* fix more things

* hacky convert temporal types to varchar

* test data fix

* fix variant parsing

* fix number

* fix time parsing; fix test data

* typo

* fix input data

* progress

* switch back to float

* add more test files

* swap int -> number

* fix PK null check

* fix overwriteTable

* better test

* Automated Commit - Format and Process Resources Changes

* type aliases, one more test

* also verify numeric precision/scale

* logistics

---------

Co-authored-by: edgao <edgao@users.noreply.github.com>
2023-08-07 09:14:59 -07:00
Benoit Moriceau
d7f6bcbefe 🐛 Avoid writing records to log (#29047)
* Avoid writing records to log

* Update version
2023-08-03 15:20:12 -05:00
Jonathan Pearlin
549e36f156 Proof of concept parallel source stream reading implementation for MySQL (#26580)
* Proof of concept parallel source stream reading implementation for MySQL

* Automated Change

* Add read method that supports concurrent execution to Source interface

* Remove parallel iterator

* Ensure that executor service is stopped

* Automated Commit - Format and Process Resources Changes

* Expose method to fix compilation issue

* Use concurrent map to avoid access issues

* Automated Commit - Format and Process Resources Changes

* Ensure concurrent streams finish before closing source

* Fix compile issue

* Formatting

* Exclude concurrent stream threads from orphan thread watcher

* Automated Commit - Format and Process Resources Changes

* Refactor orphaned thread logic to account for concurrent execution

* PR feedback

* Implement readStreams in wrapper source

* Automated Commit - Format and Process Resources Changes

* Add readStream override

* Automated Commit - Format and Process Resources Changes

* 🤖 Auto format source-mysql code [skip ci]

* 🤖 Auto format source-mysql code [skip ci]

* 🤖 Auto format source-mysql code [skip ci]

* 🤖 Auto format source-mysql code [skip ci]

* 🤖 Auto format source-mysql code [skip ci]

* Debug logging

* Reduce logging level

* Replace synchronized calls to System.out.println when concurrent

* Close consumer

* Flush before close

* Automated Commit - Format and Process Resources Changes

* Remove charset

* Use ASCII and flush periodically for parallel streams

* Test performance harness patch

* Automated Commit - Format and Process Resources Changes

* Cleanup

* Logging to identify concurrent read enabled

* Mark parameter as final

---------

Co-authored-by: jdpgrailsdev <jdpgrailsdev@users.noreply.github.com>
Co-authored-by: octavia-squidington-iii <octavia-squidington-iii@users.noreply.github.com>
Co-authored-by: Rodi Reich Zilberman <867491+rodireich@users.noreply.github.com>
Co-authored-by: rodireich <rodireich@users.noreply.github.com>
2023-08-03 13:23:52 -05:00
Benoit Moriceau
792a2e5f57 Add common async methods (#29003)
* Add common async methods

* Automated Commit - Format and Process Resources Changes

---------

Co-authored-by: benmoriceau <benmoriceau@users.noreply.github.com>
2023-08-02 13:32:06 -05:00
Benoit Moriceau
a68ea60f63 Reduce log noise (#28917)
* Reduce log noise

* Automated Commit - Format and Process Resources Changes

---------

Co-authored-by: benmoriceau <benmoriceau@users.noreply.github.com>
2023-08-01 13:21:41 -05:00
Edward Gao
360f0e8f74 Destination snowflake: Add 1s1t skeletons (#28618)
* csv sheet generator supports 1s1t

* create+insert raw tables 1s1t

* add skeletons

* start writing tests

* progress in creating raw tables

* fix tests

* add s3 test; better csv generation

* handle case-sensitive column names

* also add gcs test

* hook T+D into the destination

* fix redshift; simplify

* Delete unused files?

* disable test; enable cleanup

* initialize config singleton in tests

* logistics

* header

* simplify

* fix unit tests

* correctly disable tests

* use default null for loaded_at

* fix test

* autoformat

* cython >.>

* more singleton init

* literally how?

* unused variables

* recorddiffer can be case-sensitive

* move constants to base-java

* use ternary
2023-07-31 11:14:25 -05:00
Edward Gao
83fb3caeea 🚨 Destination bigquery 1s1t: change raw dataset + table name (#28723)
* add test for raw dataset override

* tests hardcode raw dataset name

* rename raw tables

* minimum 1

* logistics

* different option per destination
2023-07-27 12:37:17 -05:00
Edward Gao
df274b7f40 Destination snowflake: destinations v2 scaffolding (#28584)
* deps

* scaffolding

* logistics

* base-jdbc depends on base-td
2023-07-24 14:58:50 -05:00
Davin Chia
496854caf4 Fix Async Framework Race Condition. (#28342)
While running the Snowflake certification test, we noticed NPE on this line, indicating the state id queue was empty.

This should never happen as we always keep an 'open' state id with a running counter to associate the next state the source emits to.

I eventually realised this happens because of a race condition in the FlushWorkers: multiple threads flush the same queue, and a thread does not realise that the state id its flushing logic was conducted on has already been flushed by another thread.
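
The race described above is a check-then-act problem. The following is a minimal sketch of the general pattern only, assuming a hypothetical `StateIdTracker` class; the class, its methods, and the use of a `ConcurrentLinkedDeque` are illustrative assumptions and are not taken from the Airbyte code or from this PR's actual fix.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentLinkedDeque;

// Hypothetical illustration; names do not come from the Airbyte code base.
final class StateIdTracker {
  private final ConcurrentLinkedDeque<Long> openStateIds = new ConcurrentLinkedDeque<>();

  void open(final long stateId) {
    openStateIds.addLast(stateId);
  }

  // Racy check-then-act: between isEmpty() and pollFirst() another flush
  // thread may already have taken the head, so pollFirst() can return null
  // and unboxing it throws a NullPointerException.
  long claimOldestUnsafe() {
    if (openStateIds.isEmpty()) {
      throw new IllegalStateException("no open state id");
    }
    return openStateIds.pollFirst();
  }

  // Atomic claim: pollFirst() removes and returns the head in one step, so at
  // most one thread receives a given id; every other caller sees an empty Optional.
  Optional<Long> claimOldest() {
    return Optional.ofNullable(openStateIds.pollFirst());
  }
}
```

With `claimOldest()`, a worker that loses the race gets an empty `Optional` and can skip the flush instead of failing.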
2023-07-21 14:41:38 -07:00
Edward Gao
934acaa137 Destination bigquery: rerelease 1s1t behind gate (#27936)
* Revert "Revert "Destination Bigquery: Scaffolding for destinations v2 (#27268)""

This reverts commit 348c577dbb.

* version bumps+changelog

* Speed up BQ by having 2 queries, and not an OR (#27981)

* 🐛 Destination Bigquery: fix bug in standard inserts for syncs >10K records (#27856)

* only run t+d code if it's enabled

* dockerfile+changelog

* remove changelog entry

* Destinations V2: handle optional fields for `object` and `array` types (#27898)

* catch null schema

* fix null properties

* clean up

* consolidate + add more tests

* try catch

* empty json test

* Automated Commit - Formatting Changes

* remove todo

* destination bigquery: misc updates to 1s1t code (#28057)

* switch to checkedconsumer

* add unit test for buildColumnId

* use flag

* restructure prefix check

* fix build

* more type-parsing fixes (#28100)

* more type-parsing fixes

* handle duplicates

* Automated Commit - Format and Process Resources Changes

* add tests for asColumns

* Automated Commit - Format and Process Resources Changes

* log warnings instead of throwing exception

* better log message

* error level

---------

Co-authored-by: edgao <edgao@users.noreply.github.com>

* Automated Commit - Formatting Changes

* Improve protocol type parsing (#28126)

* Automated Commit - Formatting Changes

* Change from T&D every 10k records to an increasing time based interval (#28130)

* fifteen minute t&d

* add typing and deduping operation valve for increased intervals of typing and deduping

* Automated Commit - Format and Process Resources Changes

* resolve bizarre merge conflict

* Automated Commit - Format and Process Resources Changes

---------

Co-authored-by: jbfbell <jbfbell@users.noreply.github.com>

* Simplify and speed up CDC delete support [DestinationsV2] (#28029)

* Simplify and speed up CDC delete support [DestinationsV2]

* better QUOTE

* spotbugs?

* recompile dbt image for local arch and use that when building images

* things compile, but tests fail

* tests working-ish

* comment

* fix logic to re-insert deleted records for cursor comparison.

tests pass!

* remove comment

* Skip CDC re-include logic if there are no CDC columns

* stop hardcoding pk (#28092)

* wip

* remove TODOs

---------

Co-authored-by: Edward Gao <edward.gao@airbyte.io>

* update method name

* Automated Commit - Formatting Changes

* depend on pinned normalization version

* implement 1s1t DATs for destination-bigquery (#27852)

* initial implementation

* Automated Commit - Formatting Changes

* add second sync to test

* do concurrent things

* Automated Commit - Formatting Changes

* clarify comment

* minor tweaks

* more stuff

* Automated Commit - Formatting Changes

* minor cleanup

* lots of fixes

* handle sql vs json null better
* verify extra columns
* only check deleted_at if in DEDUP mode and the column exists
* add full refresh append test case

* Automated Commit - Formatting Changes

* add tests for the remaining sync modes

* Automated Commit - Formatting Changes

* readability stuff

* Automated Commit - Formatting Changes

* add test for gcs mode

* remove static fields

* Automated Commit - Formatting Changes

* add more test cases, tweak test scaffold

* cleanup

* Automated Commit - Formatting Changes

* extract recorddiffer

* and use it in the sql generator test

* fix

* comment

* naming+comment

* one more comment

* better assert

* remove unnecessary thing

* one last thing

* Automated Commit - Formatting Changes

* enable concurrent execution on all java integration tests

* add test for default namespace

* Automated Commit - Formatting Changes

* implement a 2-stream test

* Automated Commit - Formatting Changes

* extract methods

* invert jsonNodesNotEquivalent

* Automated Commit - Formatting Changes

* fix conditional

* pull out diffSingleRecord

* Automated Commit - Formatting Changes

* handle nulls correctly

* remove raw-specific handling; break up methods

* Automated Commit - Formatting Changes

---------

Co-authored-by: edgao <edgao@users.noreply.github.com>
Co-authored-by: octavia-approvington <octavia-approvington@users.noreply.github.com>

* Destinations V2: move create raw tables earlier (#28255)

* move create raw tables

* better log message

* stop building normalization (#28256)

* fix ability to run tests

* disable incremental t+d for now

* Automated Commit - Formatting Changes

---------

Co-authored-by: Evan Tahler <evan@airbyte.io>
Co-authored-by: Cynthia Yin <cynthia@airbyte.io>
Co-authored-by: cynthiaxyin <cynthiaxyin@users.noreply.github.com>
Co-authored-by: edgao <edgao@users.noreply.github.com>
Co-authored-by: Joe Bell <joseph.bell@airbyte.io>
Co-authored-by: jbfbell <jbfbell@users.noreply.github.com>
Co-authored-by: octavia-approvington <octavia-approvington@users.noreply.github.com>
2023-07-14 09:34:56 -05:00
Charles
6c2fb9ff81 fix async destination order (#28257)
Partly closes airbytehq/oncall#2437 - a rare condition that can occur when comparator values change mid-comparison.

Closes #28154 - a NPE when retrieving from the queue.

- For the comparator error, we are unfortunately unable to write a test case to prove the error. It's exceedingly difficult to force the java.lang.IllegalArgumentException: Comparison method violates its general contract! error. We do know the current implementation contains an error this PR fixes.
- The NPE error is straight-forward. The peek returns a null and we were not accounting for that before.
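
Both failure modes can be illustrated with a small sketch. It assumes hypothetical names (`FlushOrdering`, `largestFirst`, `peek`) and a map of live per-stream byte counters; none of these come from the PR itself. The idea shown is to snapshot mutable values once before sorting, so the comparator only ever sees stable numbers, and to wrap `peek()` so that an empty queue cannot produce an NPE.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;

// Hypothetical illustration; not the Airbyte implementation.
final class FlushOrdering {
  // Immutable copy of one stream's counter, taken once before sorting.
  private static final class Snapshot {
    private final String stream;
    private final long bytes;

    Snapshot(final String stream, final long bytes) {
      this.stream = stream;
      this.bytes = bytes;
    }

    String stream() {
      return stream;
    }

    long bytes() {
      return bytes;
    }
  }

  // Orders streams by queued bytes, largest first, with the name as a tie-breaker.
  // Sorting the snapshots (not the live counters) keeps the comparator consistent
  // even if other threads change the counters while the sort is running.
  static List<String> largestFirst(final Map<String, AtomicLong> liveQueuedBytes) {
    final List<Snapshot> snapshots = new ArrayList<>();
    liveQueuedBytes.forEach((stream, bytes) -> snapshots.add(new Snapshot(stream, bytes.get())));
    snapshots.sort(Comparator.comparingLong(Snapshot::bytes).reversed().thenComparing(Snapshot::stream));
    return snapshots.stream().map(Snapshot::stream).collect(Collectors.toList());
  }

  // Queue.peek() returns null on an empty queue; surface that as an empty Optional.
  static <T> Optional<T> peek(final Queue<T> queue) {
    return Optional.ofNullable(queue.peek());
  }
}
```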
2023-07-13 19:47:03 -07:00
Joe Bell
b03be1b714 🐛 Destination Snowflake: Merge old snowflake work (#27935)
* Adds data as JsonNode to pass through, running into memory issues so add JVM args to attach VisualVM

* JsonNode

* Lowers the optimal batch size to see if this improves movement

* Fixes NPE by checking if PartialAirbyteMessage contains a PartialAirbyteRecord

* Fixes config switch when config is not explicitly set (config migration needed)

* Adds logic to check if queue has elements before getting timeOfLastMessage

* Add PartialSerialisedMessage test. (#27452)

* Test deserialise.

* Add tests.

* Simplify and fix tests.

* Format.

* Adds tests for deserializeAirbyteMessage

* Adds tests for deserializeAirbyteMessage with bad data

* Cleans up deserializeAirbyteMessage and throws Exception when invalid message

* More code cleanup

---------

Co-authored-by: ryankfu <ryan.fu@airbyte.io>

* 🤖 Auto format destination-snowflake code [skip ci]

* Cleans up code w/o JVM args & rebase

* 🤖 Auto format destination-snowflake code [skip ci]

* Adds breadcrumb on the STATE message deviation and where the deserialize/serialize is done to unpack

* 🤖 Auto format destination-snowflake code [skip ci]

* Adds back line formatter removed and comment describing rationale for lower batchSize

* Bumps Snowflake version and type checks

* Added note to remove PartialAirbyteRecordMessage with low resource testing

* Automated Commit - Format and Process Resources Changes

* Fix issue with multiple namespaces in snowflake not writing to the correct staging schema

* Fix issue with multiple namespaces in snowflake not writing to the correct staging schema

* remove stage name manipulating method

* update readme

* Source Stripe: update credit_notes expected records (#27941)

* Source Zendesk Talk: update expected records (#27942)

* Source Xero: update expected records (#27943)

* Metadata: Persist Registry entries (#27766)

* DNC

* Update poetry

* Update dagster

* Apply partition

* Get metadata entry

* Use helpers

* Write registry entry to appropriate location

* Delete when registry removed

* Update to use new file (broken)

* Render registry from registry entries

* Run format

* Fix plural issue

* Update to all metadata file blobs

* Fix test

* Update to all blobs

* Add ignore validation error for version logic

* Rename to max_run_request

* Pedros review

* Ella suggestions

Co-authored-by: Ella Rohm-Ensing <erohmensing@gmail.com>

* Update airbyte-ci/connectors/metadata_service/orchestrator/orchestrator/assets/registry_entry.py

Co-authored-by: Ella Rohm-Ensing <erohmensing@gmail.com>

* Update naming

* Add tests for connector type and deletion

* Test safe parse

* Format

---------

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@sers.noreply.github.com>
Co-authored-by: Ella Rohm-Ensing <erohmensing@gmail.com>

* Fix Dagster Deploy Failure (#27955)

* Add pydantic

* Add pydantic to orchestration deploy pipeline

* 🐛 Source Jira: update expected records (#27951)

* Source Jira: update expected records

* Update issues expected records

* Source Zendesk Chat: update expected records (#27965)

* 🐛 Source Pipedrive: update expected records (#27967)

* 🐛 Source Pinterest: update expected records (#27964)

* Source Amazon-Ads: Add streams for portfolios and sponsored brands v3 (#27607)

* Add stream for sponsored brands v3
* Add new stream Portfolios

* Source Google Search Console: added discover and googleNews to searchType (#27952)

* added discover and googleNews to searchType

* updated changelog

* fixed types for streams

* 🎉 Source Instagram: Improve, refactor `STATE` management (#27908)

* add test for enabling

* update versions

* fix test

* update other snowflake loading method types

* remove standard

---------

Co-authored-by: ryankfu <ryan.fu@airbyte.io>
Co-authored-by: Davin Chia <davinchia@gmail.com>
Co-authored-by: octavia-squidington-iii <octavia-squidington-iii@users.noreply.github.com>
Co-authored-by: ryankfu <ryankfu@users.noreply.github.com>
Co-authored-by: Augustin <augustin@airbyte.io>
Co-authored-by: Arsen Losenko <20901439+arsenlosenko@users.noreply.github.com>
Co-authored-by: Ben Church <ben@airbyte.io>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@sers.noreply.github.com>
Co-authored-by: Ella Rohm-Ensing <erohmensing@gmail.com>
Co-authored-by: Anatolii Yatsuk <35109939+tolik0@users.noreply.github.com>
Co-authored-by: Daryna Ishchenko <80129833+darynaishchenko@users.noreply.github.com>
Co-authored-by: Baz <oleksandr.bazarnov@globallogic.com>
2023-07-06 12:28:52 -05:00
Edward Gao
52b8cbe39d Revert "Destination Bigquery: Scaffolding for destinations v2 (#27268)" (#27891)
* Revert "Destination Bigquery: Scaffolding for destinations v2 (#27268)"

This reverts commit ba3e39bb0c.

* bump versions to 1.5.1 everywhere
2023-06-30 20:26:48 -04:00
Cynthia Yin
98a0dad893 Destinations V2: change singleton log message level from warn to debug (#27887)
2023-06-30 13:59:06 -07:00
Edward Gao
ba3e39bb0c Destination Bigquery: Scaffolding for destinations v2 (#27268)
* copy files from edgao branch

* start writing create table statement

* add basic unit test setup

* create a table, probably

* remove outdated todo

* derp, one more column

* ugh

* add partitioning+clustering

* use StringSubstitutor

* substitutions in updateTable

* wip generate update/insert statement

* split up into smaller methods

* handle json types correctly

* rename stuff

* more json_query vs _value stuff

* minor tweak

* super basic test setup

* laying foundation for type parsing

* more stuff

* tweaks

* more progress on type parsing

* fix json_value stuff?

* misc fixes in insert

* fix dedupFinalTable

* add testDedupRaw

* full e2e test

* type parsing: gave up and mirrored the dbt code structure to avoid bugs

* type parsing - more cleanup

* handle column name collisions

* handle tablename collisions...?

* comments

* remove original ns/name from quotedstream

* also javadoc

* remove redundant method

* fix table rename

* add incremental append test

* add full refresh append test

* comment

* call T+D sql in a reasonable location for standard inserts

* add config option

* use config option here

* type parsing - fix fromJsonSchema

* gate everything

* log query + runtime

* add spec option temporarily

* Raw Table Updates

* fix more stuff

* first big pass at toDialectType

* no quotes

* wrap everything in quotes

* resolve some TODOs

* log sql statement in tests

* overwriteFinalTable returns optional

* minor clean up

* add raw dataset override

* try to preserve the original namespace for t+d?

* write to the raw table correctly

* update todos

* write directly to raw table

this is kind of dumb because we're still trying to do tmp table operations,
and we still don't ack state until the end of the entire sync.

* standard inserts write to raw table correctly

* imports + log statements

* move logs + add comment

* explicitly create raw table

* move comment to better place

* Typing issues

* bash attempt

* formatting updates

* formatting updates

* write to the airbyte schema by default unless overridden by config options

* standard inserts truncate raw table at start of sync

* full refresh overwrite will overwrite correctly!

* fix avro record schema parsing

* better raw table recreate

* rename raw table to match standard inserts

* full refresh overwrite does tmp table things

* small clean up

* small clean up

* remove errors entry if no errors

* pull out destination config into singleton

* clean up singleton stuff

* make sure dest config exists when trying to do lookups

* avoid stringifying null

* quick thoughts on alter table

* add basic cdc testcase

* tweak cdc test setup

* rename raw table to match standard inserts

* minor tweak

* delete exact sql string assertions

* switch to JSON type

* minor cleanup

* sql whitespace changes

* explain cdc deletions

* GCS Staging Full Refresh create temp table

* assert schema

* first out of order cdc test

* add another cdc test case (currently failing)

* better test structure

* make this work

* oops, fix test

* stop trying to delete deletion records

* minor improvements to code+test

* enable concurrent test runs on integration test

* move stuff to static initializer

* extract utility method

* formatting

* Move conditional to the base java package, replace conditionals which did not use the typing and deduping flag but should have.

* 🤖 Auto format destination-bigquery code [skip ci]

* 🤖 Auto format destination-gcs code [skip ci]

* switch back to empty list; write big assert

* minor wording tweaks

* 🤖 Auto format destination-bigquery code [skip ci]

* 🤖 Auto format destination-gcs code [skip ci]

* DestinationConfigTest

* 🤖 Auto format destination-bigquery code [skip ci]

* 🤖 Auto format destination-gcs code [skip ci]

* formatting

* remove ParsedType

* 🤖 Auto format destination-gcs code [skip ci]

* 🤖 Auto format destination-bigquery code [skip ci]

* tests verify every data type

* 🤖 Auto format destination-bigquery code [skip ci]

* 🤖 Auto format destination-gcs code [skip ci]

* full update with all data types

* 🤖 Auto format destination-bigquery code [skip ci]

* 🤖 Auto format destination-gcs code [skip ci]

* move stuff to new base lib

* 🤖 Auto format destination-gcs code [skip ci]

* Automated Commit - Formatting Changes

* 🤖 Auto format destination-bigquery code [skip ci]

* fix test

* 🤖 Auto format destination-bigquery code [skip ci]

* 🤖 Auto format destination-bigquery code [skip ci]

* 🤖 Auto format destination-gcs code [skip ci]

* asserts in dedupFinalTable

* better asserts in dedupRawTable

* [wip] test case for all data types

* 🤖 Auto format destination-gcs code [skip ci]

* 🤖 Auto format destination-bigquery code [skip ci]

* AirbyteTypeTest

* Automated Commit - Formatting Changes

* remove comments

* test chooseOneOf

* slightly better test output

* Automated Commit - Formatting Changes

* add some awful pretty print code

* more comment

* minor tweaks

* verify array/object type

* fix test

* handle deletions more correctly

* test toDialectType

* Destinations v2: better namespace handling (#27682)

* [wip] better namespace handling

* 🤖 Auto format destination-bigquery code [skip ci]

* wip also implement in gcs

* get gcs working (?)

* 🤖 Auto format destination-bigquery code [skip ci]

* remove duplicate method

* 🤖 Auto format destination-bigquery code [skip ci]

* fixed my code style settings

* make ci happy?

* 🤖 Auto format destination-bigquery code [skip ci]

* make ci happy?

* remove incorrect test

* blank line change

* initialize singleton

---------

Co-authored-by: octavia-squidington-iii <octavia-squidington-iii@users.noreply.github.com>

* reset args correctly

* Automated Commit - Formatting Changes

* more bash stuff

* parse implicit structs

* initialize singleton in more tests

* Automated Commit - Formatting Changes

* I missed this namespace handling thing

* test more schemas

* fix singular types specified in arrays

* Automated Commit - Formatting Changes

* disable test for unimplemented feature

* initialize singleton

* remove spec options; changelogs+metadata

* randomize namespace

* also bump dockerfile

* unremove namespace sanitizing in legacy mode

* ... disable the correct test

* even more unit test fixes!

* move integration test to integration tests

---------

Co-authored-by: Cynthia Yin <cynthia@airbyte.io>
Co-authored-by: Joe Bell <joseph.bell@airbyte.io>
Co-authored-by: octavia-squidington-iii <octavia-squidington-iii@users.noreply.github.com>
Co-authored-by: edgao <edgao@users.noreply.github.com>
Co-authored-by: cynthiaxyin <cynthiaxyin@users.noreply.github.com>
2023-06-29 08:44:37 -07:00
Augustin
5b8200181c connectors-ci: deprecate slash test (#27200)
2023-06-14 18:19:13 +02:00
Ryan Fu
769ed3efa7 Adds PartialAirbyteMessage overhead & removes serialize/deserialize on CSV writer (#27222)
* No Op change to see if Connectors Base is unrelated

* Isolated test to see the memory used after processing CSV writer

* Changes record writer

* Removed Json.serialize on string

* Turn on Snowflake Async

* Fixed emittedAt timestamp in milliseconds

* Add comments. Undo Evan build changes.

* Allow remote debugging.

* Adds calc for PartialAirbyteRecordMessage memory overhead

* Revert back airbyte build architecture

* Passes in the parsed emittedAt to the CSVwriter

* Cleans up lingering comments

* Performance enhancement - increase the available JVM memory known by the container

* Serializes the data earlier to avoid normalization issues and lowers optimal batch size

* Passes messageString instead of serializing the data due to memory overhead causes OOM early

* Removes memory overhead of record data

* Cleans up PR, fixes memory issues & increases throughput by lowering optimal batch size

* Add calculation comment & removes procps from docker install

* Automated Commit - Format and Process Resources Changes

* Turns on Async for testing purposes

* Moves up implementation higher up the classes to remove unnecessary no-op

* Automated Commit - Format and Process Resources Changes

* Adds TODOs for migrating buffers to use only serialized records

* Breaks down calc & throws UnsupportedOperationException

* Adds TODO for migrating all destinations to use the getDataRow(id, formattedString, emittedAt) to avoid unnecessary ser-de overhead

---------

Co-authored-by: Davin Chia <davinchia@gmail.com>
Co-authored-by: ryankfu <ryankfu@users.noreply.github.com>
2023-06-12 18:27:13 -07:00
Charles
4ac62f3b4f Async Snowflake Destination (#26703)
* snowflake at end of coding retreat week

* turn off async snowflake

* add comment

* Fixed test to point to the ShimMessageConsumer instead of AsyncMessageConsumer

* Automated Change

---------

Co-authored-by: ryankfu <ryan.fu@airbyte.io>
Co-authored-by: ryankfu <ryankfu@users.noreply.github.com>
2023-05-31 23:01:36 +00:00
Charles
223f698516 encapsulate trigger logic for flush workers (#26706)
2023-05-31 09:59:05 -07:00
Charles
480a43b9ab grab bag of non-controversial clean up tasks (#26702)
2023-05-31 08:53:27 -07:00
Charles
567f839834 Serialized Message Consumer (#26700)
2023-05-31 08:14:21 -07:00
Edward Gao
cf2ded2bbb Destination Bigquery: small tweak to clarify logs (#26585)
* make logs less misleading

* version bumps + changelog

* tweak wording
2023-05-25 18:20:40 +00:00
Augustin
fd3655707e connectors-ci: add finalize_build logic to handle custom Dockerfiles (#26489)
2023-05-25 09:32:19 +02:00
Charles
5f3ed16408 hide queue inside MemoryBoundedLinkedBlockingQueue (#26375)
Hiding the actual Java Queue as an inner class avoids the chance that someone tries to use native queue methods that we haven't overridden. Good thing that I did this too, because one of the changes we made during hack dayz wasn't reflected in our current feature branch. We need to override poll(time, unit), not just poll. This PR makes sure we won't make that mistake again!
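
The encapsulation idea can be sketched as follows. This is a simplified illustration under stated assumptions: the class name `MemoryBoundedQueueSketch`, the `Sized` wrapper, and the byte-accounting scheme are hypothetical and are not the real `MemoryBoundedLinkedBlockingQueue`. Because the underlying `LinkedBlockingQueue` is a private field, callers can only reach the methods that keep the byte count correct, including the timed `poll(timeout, unit)`.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of hiding a queue behind a narrow API.
final class MemoryBoundedQueueSketch<E> {
  // Pairs an item with the number of bytes it was accounted for.
  private static final class Sized<E> {
    final E item;
    final long bytes;

    Sized(final E item, final long bytes) {
      this.item = item;
      this.bytes = bytes;
    }
  }

  // Never exposed, so unaccounted native methods cannot be called from outside.
  private final LinkedBlockingQueue<Sized<E>> hidden = new LinkedBlockingQueue<>();
  private final AtomicLong usedBytes = new AtomicLong();
  private final long maxBytes;

  MemoryBoundedQueueSketch(final long maxBytes) {
    this.maxBytes = maxBytes;
  }

  // Reserves the bytes with a compare-and-set loop, then enqueues the item.
  boolean offer(final E item, final long bytes) {
    long used;
    do {
      used = usedBytes.get();
      if (used + bytes > maxBytes) {
        return false;
      }
    } while (!usedBytes.compareAndSet(used, used + bytes));
    hidden.add(new Sized<>(item, bytes));
    return true;
  }

  E poll() {
    return release(hidden.poll());
  }

  // The timed variant must release bytes too; forgetting it would leak capacity.
  E poll(final long timeout, final TimeUnit unit) throws InterruptedException {
    return release(hidden.poll(timeout, unit));
  }

  long currentBytes() {
    return usedBytes.get();
  }

  private E release(final Sized<E> sized) {
    if (sized == null) {
      return null;
    }
    usedBytes.addAndGet(-sized.bytes);
    return sized.item;
  }
}
```

Composition is used here instead of subclassing so that any queue method not explicitly wrapped simply does not exist on the public surface.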
2023-05-22 18:04:05 -07:00
Davin Chia
1ed32e55bc Async Code V0: Flush Worker Improvements. (#26384)
Incorporate the changes from #26178 .

- Update queue method from setMaxMemory to addMaxMemory.
- Update work retrieval logic to only assign work
  - if there are free worker threads
  - to account for in-progress worker threads
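
One way to read the work-assignment rule above is as a small capacity calculation. The sketch below is an interpretation rather than the PR's code: `FlushScheduling` and `batchesToAssign` are hypothetical names, and treating in-progress workers as simply occupying slots is an assumption.

```java
// Hypothetical illustration; not the Airbyte flush worker logic.
final class FlushScheduling {
  // Returns how many pending batches may be handed out right now:
  // never more than the free worker slots, and never more than what is pending.
  static int batchesToAssign(final int maxWorkers, final int inProgressWorkers, final int pendingBatches) {
    final int freeWorkers = Math.max(0, maxWorkers - inProgressWorkers);
    return Math.min(freeWorkers, Math.max(0, pendingBatches));
  }
}
```

For example, with 5 workers of which 3 are busy and 10 batches pending, only 2 batches would be assigned on this pass.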
2023-05-22 17:53:10 -07:00
Davin Chia
a7442e3c2d Async Destination V0: Async Stream Consumer (#26366)
Follow up after #26324 .

Introduce the AsyncStreamConsumer.

After this, one more PR to add the Staging Consumer changes in.
2023-05-22 13:16:36 -07:00
Davin Chia
988ce24b3f Async Destination V0 - Split up BufferManager (#26331)
Follow up to #26324 - here we split up the BufferManager and add tests and comments.

- Split up the buffer manager class into -> BufferManager, BufferEnqueue and BufferDequeue.
- Move all buffer related code to the buffers package.
- Rename test classes to match this split.
- Add java docs and tests as part of this split.
- Simplify the BufferDequeue interface to return a set of streams representing the buffered streams instead of the underlying map of buffers. This lets us keep the memory queue package private.
- all getYMethods now return Optionals for better error handling. This would have resulted in NPEs previously.
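
The last two points can be sketched together. The class below is a simplified stand-in with hypothetical names (`StreamBuffers`, `bufferedStreams`, `queuedRecordCount`), not the real `BufferDequeue`: it exposes only a copy of the stream names, keeps the underlying map private, and returns an `Optional` from its getter so an unknown stream cannot cause an NPE.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical illustration; not the Airbyte buffer classes.
final class StreamBuffers {
  // The map of per-stream queues stays private to this class.
  private final Map<String, Queue<String>> buffers = new ConcurrentHashMap<>();

  void enqueue(final String stream, final String record) {
    buffers.computeIfAbsent(stream, ignored -> new ConcurrentLinkedQueue<>()).add(record);
  }

  // Callers see an immutable copy of the stream names, never the queues themselves.
  Set<String> bufferedStreams() {
    return Set.copyOf(buffers.keySet());
  }

  // Empty when the stream has no buffer, instead of dereferencing a missing entry.
  Optional<Integer> queuedRecordCount(final String stream) {
    return Optional.ofNullable(buffers.get(stream)).map(Queue::size);
  }
}
```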
2023-05-22 11:16:12 -07:00
Augustin
80032f73f9 connectors-ci: deprecate slash publish (#25865)
2023-05-22 10:10:56 +02:00
Davin Chia
8bfbef23cb Async Code V0 (#26324)
Split out the smallest set of reasonable changes from #26086 .

My goal was to split out the interface, as well as show how the interface is meant to be used.

Follow up PRs:
- Split out classes from BufferManager and add more tests there.
- Add in the AsyncConsumer with tests.
- Add in the StagingConsumer factory.
2023-05-20 13:41:54 -07:00
Augustin
5c5eab0308 connectors-ci: fix postgres integration testing (#25942)
2023-05-11 21:19:29 +02:00
Jeff Cowan (Airbyte)
79db9f8e68 Clean up destination bases (#25346)
Changes in this refactor PR
* Use the proper interface name for the OnStartFunction
* Use the proper interface name for the OnCloseFunction
* Create and use a proper interface name for the FlushBufferFunction
* Create and use a proper interface name for the BufferCreateFunction
* Mostly naming consistency changes. These are things caught in static, compile time checks so should be low risk.

---------

Co-authored-by: jcowanpdx <jcowanpdx@users.noreply.github.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-04-27 08:47:02 -07:00
Jonathan Pearlin
a38af089e9 Track stream status in source (#24971)
* WIP Track stream status in source

* Revert formatting

* Revert formatting changes

* Remove unnecessary file

* Automated Change

* Automated Change

* Use new stream status trace message

* Rename class

* Remove unnecessary import

* Formatting

* Add tests

* Fix compile issues

* Automated Commit - Formatting Changes

* Remove TODO

* Fix compilation error

* Split STOPPED into INCOMPLETE and COMPLETE

* Remove unused import

* Changelog updates for source-postgres

* Remove unused import

* auto-bump connector version

---------

Co-authored-by: jdpgrailsdev <jdpgrailsdev@users.noreply.github.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-04-26 15:14:25 -05:00
Edward Gao
7abda87840 destination bigquery: run normalization inside container if env var is set (#25097)
* super hacky start

* also check that we're writing

* v0 convert normalization logs to airbytemessage

* add start+end logs

* aggregate errors into a single trace?

* pipefail; quick tweaks to log parser

* make spotbugs happy

* more comments, uncomment env var check

* copy in SentryExceptionHelper

* final fixes

* write tests + fix bugs

* move to base-java

* remove outdated comment

* fix spotbugs

* Automated Change

* minor version bump

* changelog

* fix behavior when env var not set

* run normalization even if destination fails

* better logic

* better logging

* oops

* move to base-java

* rebump version

* Automated Change

* auto-bump connector version

* wtf how did this work previously

* auto-bump connector version

---------

Co-authored-by: edgao <edgao@users.noreply.github.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-04-25 20:11:57 +00:00
Rodi Reich Zilberman
0bab1756b8 Rename airbyte-config module (#24885)
* rename airbyte-config module

* Automated Commit - Formatting Changes

* sanity

* update import

* update import

* update script

* update script

* update script

* update script

* Automated Change

* Automated Change

* Automated Change

* Automated Change

* update awsdatalake icon

* point slash commands to new path

* sanity

* Automated Commit - Formatting Changes

* sanity

* Automated Change

* Automated Change

* sanity

---------

Co-authored-by: rodireich <rodireich@users.noreply.github.com>
2023-04-06 10:47:30 -07:00
Rodi Reich Zilberman
cd928a7844 Fix all tests to pass on local and CI environments (#24683)
* test docker behavior on CI env

* Automated Change

* test docker behavior on CI env

* Make all unit and integration tests in source-postgres pass locally

* Fix mysql ssh integration test

* Fix failing test

* Fix source-mssql build

* source-mssql runs tests locally.
Fix compilation errors

---------

Co-authored-by: rodireich <rodireich@users.noreply.github.com>
2023-04-03 12:30:11 -07:00
Edward Gao
05860064f8 Staging destinations: Fail fast on error during periodic checkpoint (#24671)
* rethrow exception for fail fast

* version bumps + changelog

* auto-bump connector version

* bump versions

* regenerate

---------

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-03-29 23:49:14 +00:00
Lake Mossman
b1fbd6f33f Simplify Github and Postgres forms #2 (#24255)
* add grouping and collapsing fields to postgres source

* add auth group to github source connector

* revert postgres field order changes and adjust group of schemas field

* inject group into ssh tunnel spec for postgres only, through overloaded methods

* Automated Change

* bump Dockerfile versions and update changelogs

* bump strict encrypt version as well

* fix postgres acceptance test

* fix acceptance test again

* fix all postgres acceptance tests

* add newline

* undo other changes to postgres readme file

* add security group to tunnel_method in expected_spec.json

* bump version of strict encrypt

* manually bump versions in seed files

---------

Co-authored-by: lmossman <lmossman@users.noreply.github.com>
2023-03-22 11:56:17 -07:00
oneshcheret
f6bcc4914f Postgres source: add integration with data dog (#21533)
* Source postgres: add dd for env running locally

* Source postgres: add dd for running in cloud

* auto-bump connector version

* Source postgres: bump postgres strict-encrypt version

* Source postgres: filter datadog agent env variables just for postgres source

* Source postgres: format

* Source postgres: clean code

* Source postgres: pass java opts for all connectors

* Source postgres: temp removing dd agent from image

* Source postgres: add dd agent to image

* Source postgres: temp revert adding dd env variable

* Source postgres: temp hardcoded dd env variable

* Source postgres: temp hardcoded dd env variable

* Source postgres: temp hardcoded dd env variable

* Source postgres: temp hardcoded dd env variable

* Source postgres: temp hardcoded dd env variable

* Source postgres: temp hardcoded dd env variable

* Source postgres: temp hardcoded dd env variable

* Source postgres: temp hardcoded dd env variable

* Source postgres: temp removing hardcoded dd env variable

* Source postgres: temp added hardcoded dd env variable

* Source postgres: temp added hardcoded dd env variable

* Source postgres: temp added hardcoded dd env variable

* Source postgres: rename to java_opts and pass data dog host

* Source postgres: add vars to kube pods

* Source postgres: add vars to kube pods

* Source postgres: add vars to kube pods

* Source postgres: add Trace to more methods

* Source postgres: add Trace to more methods

* Source postgres: add Trace to more methods

* Source postgres: temp reverting service name removing

* Source postgres: temp reverting service name removing

* Source postgres: temp reverting service name removing

* Source postgres: temp adding trace to integration runner

* Source postgres: temp adding trace to integration runner

* Source postgres: bump postgres source dd version

* Source postgres: bump postgres source dd version

* Source postgres: revert temp changes

* Source postgres: merge with master

* Automated Commit - Formatting Changes

* Source postgres: move dd java agent to base java

* Source postgres: move dd java agent to base java

* Source postgres: clean up

* Source postgres: clean up

* Automated Change

* Source postgres: clean up

* Source postgres: bump version

* Source postgres: bump version for test

* Source postgres: temp bump version

* Source postgres: bump version

* Automated Change

* Source postgres: bump version

* auto-bump connector version

---------

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
Co-authored-by: sashaNeshcheret <sashaNeshcheret@users.noreply.github.com>
2023-03-21 22:16:09 +05:30
Lake Mossman
2d3c48da8d Revert "Simplify postgres and GitHub forms (#24127)" (#24219)
This reverts commit fcc80cb5be.
2023-03-18 19:22:38 -07:00
Lake Mossman
fcc80cb5be Simplify postgres and GitHub forms (#24127)
* add grouping and collapsing fields to postgres source

* add auth group to github source connector

* revert postgres field order changes and adjust group of schemas field

* inject group into ssh tunnel spec for postgres only, through overloaded methods

* Automated Change

* bump Dockerfile versions and update changelogs

* bump strict encrypt version as well

* fix postgres acceptance test

* fix acceptance test again

---------

Co-authored-by: lmossman <lmossman@users.noreply.github.com>
2023-03-17 23:02:03 +00:00
Ryan Fu
85391864f7 Ryan/periodic buffer flush (#23931)
* Added support for periodic buffer flush with tests and uses env variable

* Improves code readability and encapsulates testing logic

* Removed demo changes and created const for tests

* Updated constructor to reuse method signature

* Increases Snowflake parallel integration forks

* Bumps version number, fixes linting issues and constant format

* Generate seed spec
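
A minimal sketch of a time-based flush check driven by an environment-style setting, assuming the interval is given in seconds. The class name, method names and the idea of passing the raw value in as a string are all hypothetical; the commit does not state the variable name or the actual implementation.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical illustration only; not the real Airbyte implementation.
public class PeriodicFlushSketch {

    private final Duration interval;
    private Instant lastFlush;

    public PeriodicFlushSketch(Duration interval, Instant start) {
        this.interval = interval;
        this.lastFlush = start;
    }

    // Parses a raw value (for example the result of System.getenv on some
    // assumed variable name) as seconds; falls back when it is unset or invalid.
    public static Duration intervalFromRaw(String rawValue, Duration fallback) {
        if (rawValue == null || rawValue.isBlank()) {
            return fallback;
        }
        try {
            return Duration.ofSeconds(Long.parseLong(rawValue.trim()));
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    // True once at least one full interval has elapsed since the last flush.
    public boolean shouldFlush(Instant now) {
        return Duration.between(lastFlush, now).compareTo(interval) >= 0;
    }

    // Records that a flush happened at the given time.
    public void markFlushed(Instant now) {
        lastFlush = now;
    }
}
```

Passing the clock value in as a parameter keeps the check deterministic in tests, which fits the commit's mention of a constant for tests.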
2023-03-10 21:43:39 +00:00
Sherif A. Nada
e85eda088e Remove ExtendedNameTransformer (#23655) 2023-03-07 17:22:08 -08:00
Ryan Fu
32ae1b0c94 Logging recordWriter and onStreamFlush completion (#23360)
* Adds additional logging when flushing buffer and writing records

* Removes logging for writeRecord since this will explode log lines

* Added logging when uploading records to stage/bucket

* Fixes log lines to properly capture when records have been uploaded

* Bumps version and fixes logging message to more accurately reflect logic

* auto-bump connector version

---------

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-02-27 11:49:58 -08:00
Rodi Reich Zilberman
49f97f1142 Integration Branch for DB/DW Sources team for Feb '23 code freeze (#23185)
* source-snowflake: use a safer method for parsing a BigInteger cursor value (#22358)

* use a safer method for parsing a BigInteger cursor value

* Add testing

* fix format change

* Fix failing integration tests

* Try removing the failing incremental test

* Try removing the failing incremental test

* Fix failing test

* Add metadata to connector logs (log level, class name, method name and line number) (#23105)

* Issue #17861 Add labels, class, method name and line numbers to connector logs

* Refactored unit test

* fix for warning about UTF8 charset in test class

---------

Co-authored-by: prateekmukhedkar <prateek@airbyte.io>

* Update docker image and release notes

* auto-bump connector version

* manually bump version on spec

---------

Co-authored-by: Prateek Mukhedkar <123108018+prateekmukhedkar@users.noreply.github.com>
Co-authored-by: prateekmukhedkar <prateek@airbyte.io>
Co-authored-by: Sergio Ropero <sergio@airbyte.io>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2023-02-22 11:59:23 -08:00
Peter Hu
e5eac0a4cc use published protocol models jar (#22498)
* add airbyte-protocol to deps.toml

* use published protocol jar for platform

* use published protocol jar for connectors

* point at published jar

* fix dep

* bump gcs storage

* fix build failures in standard-source-test

* fix deps

* downgrade alloy db because it is missing strictness tests

* Revert "downgrade alloy db because it is missing strictness tests"

This reverts commit cc6089d053.

---------

Co-authored-by: cgardens <charles@airbyte.io>
2023-02-13 12:50:43 -06:00