Connector Development
Airbyte supports two types of connectors: Sources and Destinations. A connector takes the form of a Docker image which follows the Airbyte specification.
To build a new connector in Java or Python, we provide templates so you don't need to start everything from scratch.
Note: you are not required to maintain the connectors you create. The goal is that the Airbyte core team and the community help maintain the connector.
Airbyte provides some Connector Development Kits (CDKs) to help you build connectors.
If you need help from our team with connector development, we offer premium support to our open-source users; talk to our team to get access to it.
Connector builder UI
The connector builder UI is based on the low-code development framework below and allows you to develop and use connectors without leaving the Airbyte UI (no local development environment required).
Low-code Connector-Development Framework
You can use the low-code framework to build source connectors for REST APIs by modifying boilerplate YAML files.
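To give a sense of what those YAML files look like, here is a minimal sketch of a declarative source with a single stream. The stream name and API URL are hypothetical, and the exact schema depends on the CDK version your generated connector uses, so treat this only as an illustration:

```yaml
# Illustrative sketch of a low-code manifest; field names follow the
# declarative framework, but "customers" and the URL are made up.
version: "0.50.0"
type: DeclarativeSource
check:
  type: CheckStream
  stream_names: ["customers"]
streams:
  - type: DeclarativeStream
    name: customers
    retriever:
      type: SimpleRetriever
      requester:
        type: HttpRequester
        url_base: "https://api.example.com"
        path: "/customers"
        http_method: GET
      record_selector:
        type: RecordSelector
        extractor:
          type: DpathExtractor
          field_path: []
```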
Python Connector-Development Kit (CDK)
You can build a connector very quickly in Python with the Airbyte CDK, which generates 75% of the code required for you.
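As a rough illustration of how little code is left for you to write, here is a minimal sketch of an HTTP source built with the Python CDK. The source name, stream, and API URL are hypothetical; a real connector generated from the template will have more structure around configuration and schemas:

```python
# Minimal sketch of a CDK-based HTTP source; "Customers" and the URL are illustrative.
from typing import Any, Iterable, List, Mapping, Optional, Tuple

import requests
from airbyte_cdk.sources import AbstractSource
from airbyte_cdk.sources.streams import Stream
from airbyte_cdk.sources.streams.http import HttpStream


class Customers(HttpStream):
    url_base = "https://api.example.com/"
    primary_key = "id"

    def path(self, **kwargs) -> str:
        # Endpoint relative to url_base
        return "customers"

    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
        # Single page for simplicity; return a token here to paginate
        return None

    def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
        # Each yielded dict becomes one Airbyte record
        yield from response.json()


class SourceExample(AbstractSource):
    def check_connection(self, logger, config) -> Tuple[bool, Any]:
        # Verify credentials/connectivity here; (True, None) means success
        return True, None

    def streams(self, config: Mapping[str, Any]) -> List[Stream]:
        return [Customers()]
```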
Community maintained CDKs
The Airbyte community also maintains some CDKs:
- The TypeScript CDK is actively maintained by Faros.ai for use in their product.
- The Airbyte Dotnet CDK comes with C# templates which can be used to generate 75% of the code required for you.
The Airbyte specification
Before building a new connector, review Airbyte's data protocol specification.
Adding a new connector
Requirements
To add a new connector you need to:
- Implement & package your connector in an Airbyte Protocol-compliant Docker image
- Add integration tests for your connector. At a minimum, all connectors must pass Airbyte's standard test suite, but you can also add your own tests.
- Document how to build & test your connector
- Publish the Docker image containing the connector
Each requirement has a subsection below.
1. Implement & package the connector
If you are building a connector in any of the following languages/frameworks, then you're in luck! We provide autogenerated templates to get you started quickly:
Sources
- Python Source Connector
- Singer-based Python Source Connector. Singer.io is an open source framework with a large community and many available connectors (known as taps & targets). To build an Airbyte connector from a Singer tap, wrap the tap in a thin Python package to make it Airbyte Protocol-compatible. See the GitHub Connector for an example of an Airbyte Connector implemented on top of a Singer tap.
- Generic Connector: This template provides a basic starting point for any language.
Destinations
- Java Destination Connector
- Python Destination Connector
Creating a connector from a template
Run the interactive generator:
cd airbyte-integrations/connector-templates/generator
./generate.sh
and choose the relevant template by using the arrow keys. This will generate a new connector in the airbyte-integrations/connectors/<your-connector> directory.
Search the generated directory for "TODO"s and follow them to implement your connector. For more detailed walkthroughs and instructions, follow the relevant tutorial:
- Speedrun: Building a HTTP source with the CDK
- Building a HTTP source with the CDK
- Building a Python source
- Building a Python destination
- Building a Java destination
As you implement your connector, make sure to review the Best Practices for Connector Development guide. Following best practices is not a requirement for merging your contribution to Airbyte, but it certainly doesn't hurt ;)
2. Integration tests
At a minimum, your connector must implement the acceptance tests described in Testing Connectors.
Note: Acceptance tests are not yet available for Python destination connectors. Coming soon!
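Acceptance tests are typically driven by an acceptance-test-config.yml file at the root of the connector directory. A minimal sketch might look like the following; the image name and paths are illustrative, and the exact format depends on the version of the acceptance-test suite your connector uses:

```yaml
# Illustrative acceptance-test configuration for a hypothetical source
connector_image: airbyte/source-example:dev
tests:
  spec:
    - spec_path: "source_example/spec.yaml"
  connection:
    - config_path: "secrets/config.json"
      status: "succeed"
  discovery:
    - config_path: "secrets/config.json"
  basic_read:
    - config_path: "secrets/config.json"
      configured_catalog_path: "integration_tests/configured_catalog.json"
```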
3. Document building & testing your connector
If you're writing in Python or Java, skip this section -- it is provided automatically.
If you're writing in another language, please document the commands needed to:
- Build your connector's Docker image (usually this is just docker build ., but let us know if there are necessary flags, gotchas, etc.)
- Run any unit or integration tests in a Docker image.
Your integration and unit tests must be runnable entirely within a Docker image. This is important to guarantee consistent build environments.
When you submit a PR to Airbyte with your connector, the reviewer will use the commands you provide to integrate your connector into Airbyte's build system as follows:
- :airbyte-integrations:connectors:source-<name>:build should run unit tests and build the integration's Docker image.
- :airbyte-integrations:connectors:source-<name>:integrationTest should run integration tests, including Airbyte's standard test suite.
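In practice, the reviewer will want to be able to run something like the following from the repository root (shown for a hypothetical connector named source-<name>):

```bash
# run unit tests and build the connector's Docker image
./gradlew :airbyte-integrations:connectors:source-<name>:build

# run integration tests, including Airbyte's standard test suite
./gradlew :airbyte-integrations:connectors:source-<name>:integrationTest
```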
4. Publish the connector
Typically this will be handled as part of code review by an Airbyter. The section below describes the steps needed to publish a connector; it will mostly be used by Airbyte employees publishing the connector.
Updating an existing connector
The steps for updating an existing connector are the same as for building a new one, minus the need to use the autogenerator. Therefore the steps are:
- Iterate on the connector to make the needed changes
- Run tests
- Add any needed docs updates
- Create a PR to get the connector published
Adding normalization to a connector
In order to enable normalization for a destination connector, you'll need to set some fields on the connector's destination definition entry. This is done in the metadata.yaml file found at the root of each connector.
Here's an example of normalization fields being set to enable normalization for the Postgres destination:
data:
# ... other fields
normalizationConfig:
normalizationRepository: airbyte/normalization
normalizationTag: 0.2.25
normalizationIntegrationType: postgres
For more information about what these fields mean, see the NormalizationDestinationDefinitionConfig schema.
The presence of these fields will enable normalization for the connector, and determine which docker image will run.
Publishing a connector
Once you've finished iterating on the changes to a connector as specified in its README.md, follow these instructions to ship the new version of the connector with Airbyte out of the box.
- Bump the version in the Dockerfile of the connector (LABEL io.airbyte.version=X.X.X).
- Bump the docker image version in the metadata.yaml of the connector. (An example of both changes is shown after this list.)
- Submit a PR containing the changes you made.
- One of the Airbyte maintainers will review the change in the new version and make sure the tests are passing.
- You or an Airbyte maintainer can merge the PR once it is approved and all the required CI checks are passing.
- Once the PR is merged, the new connector version will be published to DockerHub, and the connector will be available for everyone who uses it. Thank you!
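For example, bumping a hypothetical connector from 0.2.0 to 0.2.1 touches both files. The dockerImageTag field name below is an assumption based on current connector metadata files; check the connector's existing metadata.yaml for the exact field:

```Dockerfile
# Dockerfile (version label bumped to the new release)
LABEL io.airbyte.version=0.2.1
```

```yaml
# metadata.yaml (keep this in sync with the Dockerfile label)
data:
  dockerImageTag: 0.2.1
```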
Updating Connector Metadata
When a new connector (or an updated version of an existing one) is ready, our automations will check your branch for a few things:
- Does the connector have an icon?
- Does the connector have documentation and is it in the proper format?
- Does the connector have a changelog entry for this version?
- Is the metadata.yaml file valid?
If any of the above are failing, you won't be able to merge your PR or publish your connector.
Connector icons should be square SVGs and be located in this directory.
Connector documentation and changelogs are markdown files living either here for sources, or here for destinations.
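For reference, a changelog entry is typically a row in a markdown table at the bottom of the connector's documentation page. The version, date, PR number, and subject below are placeholders, and the exact column layout may vary slightly between connectors:

```markdown
| Version | Date       | Pull Request                                             | Subject                      |
| :------ | :--------- | :------------------------------------------------------- | :--------------------------- |
| 0.2.1   | 2023-07-01 | [12345](https://github.com/airbytehq/airbyte/pull/12345) | Example: fix stream pagination |
```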
Using credentials in CI
To run integration tests in CI, you'll often need to inject credentials. There are a few steps for doing this:
- Place the credentials into Google Secret Manager (GSM): Airbyte uses a Google Secret Manager project as the source of truth for all CI secrets. Place the credentials into a GSM secret exactly as they should be used by the connector, i.e. it should basically be a copy-paste of the config.json passed into a connector via the --config flag. We use the following naming pattern: SECRET_<capital source OR destination name>_CREDS, e.g. SECRET_SOURCE-S3_CREDS or SECRET_DESTINATION-SNOWFLAKE_CREDS. (See the example command after this list.)
- Add the GSM secret's labels:
  - connector (required) -- the connector's name, or a set of connectors' names with '_' as the delimiter, e.g. connector=source-s3, connector=destination-snowflake.
  - filename (optional) -- a custom target secret file. Google doesn't allow '.' in label values, so Airbyte CI scripts will add '.json' to the end automatically. By default, secrets are saved to ./secrets/config.json, e.g. filename=config_auth => secrets/config_auth.json.
- Save the necessary JSON value (Example).
- That should be it.
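As a sketch, creating such a secret from the command line might look like the following; you can equally do this through the Cloud Console, and the label values here are illustrative:

```bash
# Create a GSM secret whose value is the connector's config.json,
# named and labeled following the conventions above.
gcloud secrets create SECRET_SOURCE-S3_CREDS \
  --project=dataline-integration-testing \
  --data-file=secrets/config.json \
  --labels=connector=source-s3,filename=config_auth
```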
Access CI secrets on GSM
Access to GSM storage is limited to Airbyte employees. To give an employee permissions to the project:
- Go to the permissions page
- Add a new principal to dataline-integration-testing:
  - input their login email
  - select the role Development_CI_Secrets
  - Save