* add AIRBYTE_ENTRYPOINT for kubernetes support
* bump versions
* bump version in seed
* Update generic template
* keep scaffold sources at 0.1.0
* add missing newline
* handle python base versions correctly
* re-bump mysql and postgres sources
* re-bump snowflake destination
* add skip tests option
* switch to running tests
* reverse conditional to make it safer
* fix publish to include the test running
* fix iterable version
* fix file generation
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
* Update README icon links
* Update airbyte-specification doc
* Extend base connector
* Remove redundant region
* Separate warning from info
* Implement s3 destination
* Run format
* Clarify logging message
* Rename variables and functions
* Update documentation
* Rename and annotate interface
* Inject formatter factory
* Remove part size
* Fix spec field names and add unit tests
* Add unit tests for csv output formatter
* Format code
* Complete acceptance test and fix bugs
* Fix uuid
* Remove generator template files
They belong to another PR.
* Add unhappy test case
* Checkin airbyte state message
* Adjust stream transfer manager parameters
* Use underscore in filename
* Create csv sheet generator to handle data processing
* Format code
* Add partition id to filename
* Rename date format variable
* Migrate BufferedStreamConsumer users (e.g. all JDBC destinations, MeiliSearch) (#3473)
* Add checkpointing test cases in Acceptance Tests (#3473)
* Add testing for emitting state in Destination Standard Test (#3546)
* Migrate BQ to support checkpointing (#3546)
* Migrate copy destinations support checkpointing (#3547)
* Checkpointing: Migrate CSV and JSON destinations (#3551)
* use strict JSONL definition of new lines in destinations
* failing test case
* use next instead of nextLine
* add \n in string for test
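For illustration, a minimal sketch of the next-vs-nextLine distinction behind these commits, assuming the destination reads records from stdin with java.util.Scanner (the class and helper names here are hypothetical, not the actual reader): nextLine() splits on any line terminator, including \r and \r\n, while fixing the delimiter to \n enforces the strict JSONL definition of a record boundary.

```java
import java.io.InputStream;
import java.util.Scanner;
import java.util.function.Consumer;
import java.util.regex.Pattern;

class StrictJsonlReader {

  // Hypothetical helper: iterate over records using a strict "\n" delimiter.
  // Scanner.nextLine() would also treat "\r" and "\r\n" as record boundaries,
  // which can split a JSON record containing an embedded carriage return.
  static void consume(InputStream in, Consumer<String> handleRecord) {
    try (Scanner scanner = new Scanner(in, "UTF-8").useDelimiter(Pattern.compile("\n"))) {
      while (scanner.hasNext()) {
        handleRecord.accept(scanner.next());
      }
    }
  }
}
```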
* bump destination versions
* bump to even newer version
* bump versions in dockerfiles as well
* force mysql test to pass
* Abort sync if one of the part fails to copy to temp table
* Check for record size when copying data from s3 to redshift
* Handle big record in RedshiftInsertDestination too
This PR introduces the following behavior for JDBC sources:
Instead of streamName = schema.tableName, this is now streamName = tableName and namespace = schema. This means that, when replicating from these sources, data will be replicated into a form matching the source, e.g. public.users (Postgres source) -> public.users (Postgres destination) instead of the current behaviour of public.public_users. Since MySQL does not have schemas, the MySQL source uses the database as its namespace.
To do so:
- Make namespace a first-class concept in the Airbyte Protocol. This allows the source to propagate a namespace and destinations to write to a source-defined namespace. It also sets us up for future namespace-related configurability.
- Add an optional namespace field to the AirbyteRecordMessage. This field will be set by sources that support namespace.
- Introduce AirbyteStreamNameNamespacePair as a type-safe manner of identifying streams throughout our code base.
- Modify base_normalisation to better support source-defined namespaces, specifically allowing normalisation of tables with the same name into different schemas.
Add a namespace field to the Airbyte Stream in preparation for propagating a source-defined namespace to the Destination.
This namespace field is then consumed as the destination schema the table is written to.
This only applies to JDBC destinations.
This is Steps 1 - 4 of the namespace tech spec, seen at https://docs.google.com/document/d/1qFk4YqnwxE4MCGeJ9M2scGOYej6JnDy9A0zbICP_zjI/edit.
Some minor refactoring and commenting as I go.
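To make the name/namespace split above concrete, here is a hedged sketch of a type-safe stream identifier and how a JDBC destination might resolve it to an output schema. The class and method names are simplified illustrations (the real class is AirbyteStreamNameNamespacePair), and the fall-back-to-default-schema behaviour is an assumption.

```java
import java.util.Objects;

// Simplified illustration of a type-safe stream identifier carrying both the
// stream name (e.g. "users") and its source-defined namespace (e.g. "public").
final class StreamNameNamespacePair {

  private final String name;
  private final String namespace; // may be null if the source does not support namespaces

  StreamNameNamespacePair(String name, String namespace) {
    this.name = name;
    this.namespace = namespace;
  }

  // Hypothetical resolution used by a JDBC destination: write into the
  // source-defined namespace when present, otherwise fall back to the
  // schema configured on the destination.
  String resolveDestinationSchema(String defaultSchema) {
    return namespace != null ? namespace : defaultSchema;
  }

  String getName() {
    return name;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof StreamNameNamespacePair)) return false;
    StreamNameNamespacePair that = (StreamNameNamespacePair) o;
    return Objects.equals(name, that.name) && Objects.equals(namespace, that.namespace);
  }

  @Override
  public int hashCode() {
    return Objects.hash(name, namespace);
  }
}
```

Under this model, a Postgres table public.users arrives with name "users" and namespace "public", so a Postgres destination writes it back as public.users rather than public.public_users.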
* Remove unnecessary test classes as they match the integration tests in terms of what is being tested. They have no real value since the corresponding integration test can be run locally without additional credentials. The main value the classes bring is letting us run tests without building the docker image (the integration tests require doing so). However, I feel this benefit is not worth the additional maintenance cost.
* Centralise DataArgumentProvider into its own class for easier maintenance and usability.
The acceptTracked method should accept an AirbyteRecordMessage instead of a generic AirbyteMessage. This allows us to centralise checking for a record and makes the interface easier to understand.
We can also consolidate checking whether a received message has a corresponding stream. However, that's more involved and can be revisited at a later date.
Clean up Destination interfaces, with the goal of less repeated code and hopefully better readability for the next person.
* Rename the write method in the Destination interface to getConsumer to better reflect that the method is not writing itself, but returning a consumer that will write upon accepting a message. This was consuming me when I was reading the code.
* Remove generics from the FailureTrackingConsumer and the DestinationConsumer. Besides tests, there are no generic uses of the FailureTrackingConsumer. Replace the type parameter with AirbyteMessage to make explicit that this is the FailureTrackingConsumer's use case. Although this does restrict our future use of the FailureTrackingConsumer class, I'd rather limit this now and re-inject generics once we have more use cases. This was also confusing me - I kept wondering what other data types this interface could consume.
* Rename FailureTrackingConsumer to FailureTrackingAirbyteMessageConsumer to better reflect how the consumer is meant to be used strictly as a DestinationConsumer (it implements the interface).
* Rename DestinationConsumer to AirbyteMessageConsumer.
In a subsequent PR, I plan to consolidate the logic that errors if the received Airbyte message is not of Record type into the FailureTrackingAirbyteMessageConsumer class.
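A rough sketch of the reshaped interfaces described above, kept in one file for brevity. The signatures and package paths are assumptions for illustration; the point is the getConsumer rename and the non-generic, AirbyteMessage-only consumer with failure tracking.

```java
import com.fasterxml.jackson.databind.JsonNode;
import io.airbyte.protocol.models.AirbyteMessage;
import io.airbyte.protocol.models.ConfiguredAirbyteCatalog;

// Sketch only: the exact parameters of getConsumer are an assumption.
interface AirbyteMessageConsumer extends AutoCloseable {

  void accept(AirbyteMessage message) throws Exception;

}

interface Destination {

  // Renamed from write(): the method does not write anything itself, it
  // returns a consumer that writes each message it accepts.
  AirbyteMessageConsumer getConsumer(JsonNode config, ConfiguredAirbyteCatalog catalog) throws Exception;

}

// Non-generic failure tracker: accepts AirbyteMessage directly so close() can
// decide between committing and cleaning up based on whether accept() failed.
abstract class FailureTrackingAirbyteMessageConsumer implements AirbyteMessageConsumer {

  private boolean hasFailed = false;

  @Override
  public void accept(AirbyteMessage message) throws Exception {
    try {
      acceptTracked(message);
    } catch (Exception e) {
      hasFailed = true;
      throw e;
    }
  }

  protected abstract void acceptTracked(AirbyteMessage message) throws Exception;

  @Override
  public void close() throws Exception {
    close(hasFailed);
  }

  protected abstract void close(boolean hasFailed) throws Exception;

}
```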
* Handle destination sync mode in destinations
* Source & Destination sync modes are required (#2500)
* Provide Migration script making sure it is always defined for previous sync configs
Instead of inserts, we write the data to S3 and issue a COPY command copying from S3 into Redshift.
Use a single file as it's sufficiently performant and we do not want to introduce file-destination-related operations yet.
Use an open source library for uploads as AWS does not natively support streaming loads.
My intention with this PR is to first implement the meat of the write-and-copy strategy. This is mainly centered around the RedshiftCopier class. I plan to hook up the RedshiftCopier to the actual Destination class, and implement all the plumbing, in a follow-up PR.
Co-authored-by: Davin Chia <davinchia@Davins-MacBook-Pro.local>
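A condensed sketch of the write-and-copy strategy, assuming the AWS SDK v1 S3 client and a plain JDBC connection to Redshift. The helper name, the single putObject upload (the actual commit streams the upload through an open source multipart-upload library), and the exact COPY options are illustrative rather than the real RedshiftCopier.

```java
import java.io.File;
import java.sql.Connection;
import java.sql.Statement;

import com.amazonaws.services.s3.AmazonS3;

// Illustrative only: stages a single local file to S3, then issues a COPY so
// Redshift bulk-loads the data instead of receiving row-by-row INSERTs.
class WriteAndCopySketch {

  static void copyIntoRedshift(AmazonS3 s3,
                               Connection redshift,
                               File stagedCsv,
                               String bucket,
                               String key,
                               String targetTable,
                               String iamRole) throws Exception {
    // 1. Upload the staged records as one object (a single file is sufficient here).
    s3.putObject(bucket, key, stagedCsv);

    // 2. Ask Redshift to pull the object directly from S3.
    String copySql = String.format(
        "COPY %s FROM 's3://%s/%s' IAM_ROLE '%s' CSV", targetTable, bucket, key, iamRole);
    try (Statement statement = redshift.createStatement()) {
      statement.execute(copySql);
    }
  }
}
```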
* gives better visibility into the progress of each phase of both the source and destination.
* 1 log for every 10K records in the source
* the destination logs each step, including setting up tmp tables, creating final tables, and cleaning up tmp tables
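A minimal sketch of the per-10K-records progress logging described above, assuming SLF4J; the counter wrapper itself is hypothetical.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative progress reporter: emits one log line per 10,000 records read
// from the source so long-running syncs show visible forward progress.
class RecordProgressLogger {

  private static final Logger LOGGER = LoggerFactory.getLogger(RecordProgressLogger.class);
  private static final long LOG_EVERY_N_RECORDS = 10_000;

  private long recordCount = 0;

  void onRecord() {
    recordCount++;
    if (recordCount % LOG_EVERY_N_RECORDS == 0) {
      LOGGER.info("Records read from source: {}", recordCount);
    }
  }
}
```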
* support cached builds for base -> base-java -> snowflake
* use plugin for image building
* fix matching on COPY from
* remove docker.gradle
* tmp commit
* update connectors
* finish rest of build files
* fix ide errors
* more build fixes
* clean up
* clean up for new sources
* fix spotless
* fix flake problems
* add recommended empty file
* python caching
* fixes upon review
* clean up docker and build test files
* clean up python
* clean up
* fix integration test dependencies
* fix standard tests
* fix
* remove symlink
* re-add requirements to fix normalization build
* fix symlink
* fix dumbest build problem of all
* add missing integration test def
* fix missing dep
* remove class exclusion
* move trim so null source versions are allowed
* rename map
* fix hardcoded value
* remove unnecessary dep
* use dashes for salesforce package name
* fix typo
* DRY and fix test image name
* Fix edit
* assert string is not empty
* build integration test image only for integrationTest
* move code generator to tools and rename docker build tasks
* make source test depend on integration test build, not the other way
* remove guard because the docker build should exist before the integrationtest is applied
* remove comment
* DRY up airbyte-source-test
* fix plugin compilation
* add missing dependency
* rename getTaggedImage to getDevTaggedImage
* fix test vs main docker build bug