Today we recreate JSON validators every time we perform JSON validation.
JSON validation runs twice for each record:
- Validation that the record conforms to the general Airbyte Protocol schema.
- Validation that the record conforms to that Stream's schema.
Looking at the code, we create at least three objects that are discarded for every single record that passes through the platform. This is inefficient in terms of both CPU and garbage collection. In particular, creating the validator object is expensive, since it parses the entire JSON schema each time.
This CPU/GC inefficiency applies to all code that uses the current JSON validator class, which can include Sources and Destinations.
- Instead of recreating the schema validators each time, initialise the validators once and reuse them (see the sketch after this list).
- The JsonSchemaValidator class should be rewritten to clean up its methods and to cache validators by default. I'm skipping this for now to keep the PR small and will revisit it in a follow-up PR.
- Improved the ReplicationWorkerPerformanceTest so the source messages are emitted through the various stream factories for a higher-fidelity test.
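As a rough illustration of the reuse approach, here is a minimal sketch assuming the networknt json-schema-validator library; the class name and the cache-key choice (the schema's string form) are hypothetical, not the actual JsonSchemaValidator implementation:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

import com.fasterxml.jackson.databind.JsonNode;
import com.networknt.schema.JsonSchema;
import com.networknt.schema.JsonSchemaFactory;
import com.networknt.schema.SpecVersion;
import com.networknt.schema.ValidationMessage;

// Hypothetical sketch: caches compiled validators so the expensive schema
// parse happens once per schema instead of once per record.
public class CachingJsonValidator {

  private static final JsonSchemaFactory FACTORY =
      JsonSchemaFactory.getInstance(SpecVersion.VersionFlag.V7);

  // Keyed by the schema's string form; compiled validators are reused across records.
  private final Map<String, JsonSchema> validators = new ConcurrentHashMap<>();

  public Set<ValidationMessage> validate(final JsonNode schema, final JsonNode record) {
    final JsonSchema validator =
        validators.computeIfAbsent(schema.toString(), ignored -> FACTORY.getSchema(schema));
    return validator.validate(record);
  }
}
```

With something like this, both per-record validations (protocol schema and stream schema) hit an already-compiled validator, so no validator objects are created or garbage-collected on the hot path.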
* Add Airbyte Protocol V1 support.
* Fix VersionedAirbyteStreamFactoryTest
* Remove AirbyteMessageMigrationV0 example
* Add Protocol Version constants
* 🎉 Updated normalization to handle new datatypes (#19721)
* Updated normalization simple stream processing to handle new datatypes
* Updated normalization nested stream processing to handle new datatypes
* Updated normalization nested stream processing to handle new datatypes
* Updated normalization drop_scd_catalog processing to handle new datatypes
* Updated normalization ephemeral test processing to handle new datatypes
* fixed more tests for normalization
* fixed more tests for normalization
* fixed more tests for normalization
* fixed more tests for normalization
* fixed more issues
* fixed more issues (clickhouse)
* fixed more issues
* fixed more issues
* fixed more issues
* added binary type processing for some DBs
* cleared commented code and moved some hardcodes to processing as macro
* fixed codestyle and cleared commented code
* minor refactor
* minor refactor
* minor refactor
* fixed bool cast error
* fixed dict->str cast error
* fixed is_combining_node cast py check
* removed commented code
* removed commented code
* committed autogenerated normalization_test_output files
* committed autogenerated normalization_test_output files (new files)
* refactored utils.py
* Updated utils.py to use Callable functions and get rid of property_type in is_number and is_bool functions
* committed autogenerated normalization_test_output files (new files)
* fixed typo in TIMESTAMP_WITH_TIMEZONE_TYPE
* updated stream_processor to handle string type first as a wider type
* fixed arrays normalization by updating is_simple_property method as per new approaches
* format
Co-authored-by: Edward Gao <edward.gao@airbyte.io>
* Update airbyte protocol migration (#20745)
* Extract MigrationContainer from AirbyteMessageMigrator
* Add ConfiguredAirbyteCatalogMigrations
* Add ConfiguredAirbyteCatalog to AirbyteMessageMigrations
* Enable ConfiguredAirbyteCatalog migration
* Fix tests
* Remove extra this.
* Add missing docs
* Typo
Co-authored-by: Edward Gao <edward.gao@airbyte.io>
* Data types update: Implement protocol message migrations (#19240)
* Extract MigrationContainer from AirbyteMessageMigrator
* Add ConfiguredAirbyteCatalogMigrations
* Add ConfiguredAirbyteCatalog to AirbyteMessageMigrations
* Enable ConfiguredAirbyteCatalog migration
* set up scaffolding
* [wip] more scaffolding, basic unit test
* minimal green code
* [wip] add failing test for other primitive types
* correct version number
* handle basic primitive type decls
* add implicit cases
* add recursive schema
* formatting
* comment
* support not
* fix indentation
* handle all nested schema cases
* handle boolean schemas
* verify empty schema handling
* cleanup
* extract map
* code organization
* extract method
* reformat
* [wip] more tests, minor fix type array handling
* corrected test
* cleanup
* reformat
* switch to v1
* add support for multityped fields
* missed test case
* nested test class
* basic record upgrade
* implement record upgrades
* slight refactor
* comments + clarifications
* extract constants
* (partly) correct model classes
* add de/ser
* formatting
* extract constants
* fix json reference
* update docs
* switch to v1 models
* fix compile+test
* add base64 handling
* use vnull
* Data types update: Implement protocol message downgrade path (#19909)
* rough skeleton for passing catalog into migration
* basic test
* more scaffolding
* basic implementation
* add primitives test
* add in other tests (nested fields currently failing)
* add formats
* implement oneOf handling
* formatting
* oneOf handling
* better tests
* comments + organization
* progress
* basic test case
* downgrade objects, ish
* basic array implementation
* handle numeric failure
* test for new type
* handle array items
* empty schema handling
* first pass at oneof handling
* add more tests+handling
* more tests
* comments
* add empty oneof test case
* format + reorganize
* more reorganize
* fix name
* also downgrade binary data
* only import vnull
* move migrations into v1 package
* extract schema mutation code
* comment
* extract schema migration to new class
* extract record downgrade logic for future use
* format
* fix build after rebase
* rename private method for consistency
* also implement configuredcatalog migrations >.>
* quick and dirty tests
* slight cleanup
* fix tests
* pmd
* pmd test
* null check on message objects
* maybe fix acceptance tests?
* fix name
* extract constants
* more fixes
* tmp
* meh
* fix cdc acc tests
* revert to master source-postgres
* remove log messages
* revert other misc hacks
* integers are valid cursors
* remove unrelated change
* fix build
* fix build more?
* [MUST REVERT] use dev normalization
* capture kube logs
* also here?
* no debug logs?
* delete dup from merging
* add final everywhere
* revert test changes
Co-authored-by: Jimmy Ma <jimmy@airbyte.io>
* On-the-fly migrations of persisted catalogs (#21757)
* On the fly catalog migration for normalization activity
* On the fly catalog migration for job persistence
* On the fly migration for standard sync persistence
* On the fly migration for airbyte catalogs
* Refactor code to share JsonSchema traversal
* Add V0 Data type search function
* PMD and Format
* Fix getOrInsertActorCatalog and ConfigRepositoryE2E tests
* Null-proofing CatalogMigrationV1Helper
* More null checks
* Fix test
* Format
* Add data type v1 support to the FE
* Changes AC test check to check exited ps (#21672)
some docker compose changes no longer show exited processes. this broke our test.
this change should fix master.
tested in a runner that failed.
* Move wellknown types mapping to the utility function
* use protocolv1 normalization
---------
Co-authored-by: Topher Lubaway <asimplechris@gmail.com>
Co-authored-by: Edward Gao <edward.gao@airbyte.io>
* Update protocol support range (#21996)
* bump normalization version to 0.3.0
* Add version check on normalization (#22048)
* Add normalization min version check
* Add visible for testing
---------
Co-authored-by: Edward Gao <edward.gao@airbyte.io>
Co-authored-by: Eugene <etsybaev@gmail.com>
Co-authored-by: Topher Lubaway <asimplechris@gmail.com>
The Java 19 toolchain doesn't like sneaky throws. Not entirely sure why. However, I think it's better practice to avoid sneaky throws anyway, as explicit declarations make it clearer what is thrown and where.
Example error message when trying to compile the current codebase with Java 19:
error: Error during the transformation of 'io.airbyte.validation.json.JsonSchemaValidatorTest'; post-compiler 'lombok.bytecode.SneakyThrowsRemover' caused an exception: java.lang.IllegalArgumentException: Unsupported class file major version 63
at org.objectweb.asm.ClassReader.<init>(ClassReader.java:199)
at org.objectweb.asm.ClassReader.<init>(ClassReader.java:180)
at org.objectweb.asm.ClassReader.<init>(ClassReader.java:166)
at lombok.bytecode.AsmUtil.fixJSRInlining(AsmUtil.java:37)
at lombok.bytecode.SneakyThrowsRemover.applyTransformations(SneakyThrowsRemover.java:46)
at lombok.core.PostCompiler.applyTransformations(PostCompiler.java:44)
at lombok.core.PostCompiler$1.close(PostCompiler.java:87)
at jdk.compiler/com.sun.tools.javac.jvm.ClassWriter.writeClass(ClassWriter.java:1508)
at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.genCode(JavaCompiler.java:738)
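For context, a hedged before/after sketch of what removing a sneaky throw looks like (the method is hypothetical, not code from this change):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

import lombok.SneakyThrows;

class SneakyThrowsExample {

  // Before: Lombok post-processes the bytecode so the checked IOException
  // escapes without being declared, hiding what is thrown and where.
  @SneakyThrows
  String readSneaky(final Path path) {
    return Files.readString(path);
  }

  // After: the checked exception is declared explicitly, so callers see it
  // and there is no bytecode post-processing step for the toolchain to trip over.
  String readExplicit(final Path path) throws IOException {
    return Files.readString(path);
  }
}
```

The explicit version also avoids the post-compiler transformation step that produced the `Unsupported class file major version 63` error above.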
## What
Finale of https://github.com/airbytehq/airbyte/pull/13122.
We've renamed all directories in previous PRs. Here we remove the fat jar configuration and add publishing to all subprojects.
Explanation for what is happening:
Identically named subprojects have the following issues:
* publishing as is leads to classpath confusion when jars with the same names are placed in the Java distribution. This leads to NoClassDefFound errors at runtime.
* deconflicting the jar names without changing directory names leads to dependency errors, as the OSS jar pom files are generated using project dependencies (i.e. a dependency on a sibling subproject in the same repo) that use the subproject's group and name as a reference. This means the generated jars look for jars that do not exist (as their names have been changed) and cannot compile.
* the workaround to changing a subproject's name involves resetting the subproject's name in settings.gradle and depending on the new name in each build.gradle. This increases the configuration burden and decreases readability, since one has to check settings.gradle to know what the right subproject name is. See https://github.com/gradle/gradle/issues/847 for more info.
* given that Gradle itself doesn't support identically named subprojects (see the linked issue), the simplest solution is to disallow duplicate directory names. I've only renamed conflicting directories here to keep things simple. I will create a follow-up issue to enforce non-identical subproject names in our builds.
## How
* Remove fat jar configuration.
* Add publishing to all subprojects.
* add specs module with logic to fetch specs on build
* format + build and add gradle dependency for new script
* check seed file for existing specs + refactor
* add tests + a bit more refactoring
* run gw format
* update yaml config persistence to merge specs into definitions
* add comment
* delete secrets migration to be consistent with master
* add dep
* add tests for GcsBucketSpecFetcher
* get rid of static block + format
* DRY up parse call
* add GCS details to comment
* formatting + fix test
* update comment
* do not format seed specs files
* change signature of run to allow cloud to reuse this script
* run gw format
* revert commits that change signature of run
* fix comment typo
Co-authored-by: Davin Chia <davinchia@gmail.com>
* rename enum to be distinct from the enum in cloud
* add missing dependencies between modules
* add readme for seed connector spec generator
* reword
* reference readme in comment
* ignore 'spec' field in newFields logic
Co-authored-by: Davin Chia <davinchia@gmail.com>