
Docs: Tutorials formatting + from scratch connector tutorial cleanup (#33839)

Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com>
Natik Gadzhi, 2024-03-06 20:15:27 -08:00, committed by GitHub
parent 4fcff41901
commit e4ccffbf6e
15 changed files with 1034 additions and 513 deletions


@@ -2,24 +2,26 @@
## Summary
This article provides a checklist for how to create a Java destination. Each step in the checklist
has a link to a more detailed explanation below.
## Requirements
Docker and Java with the versions listed in the
[tech stack section](../../understanding-airbyte/tech-stack.md).
## Checklist
### Creating a destination
- Step 1: Create the destination using the template generator
- Step 2: Build the newly generated destination
- Step 3: Implement `spec` to define the configuration required to run the connector
- Step 4: Implement `check` to provide a way to validate configurations provided to the connector
- Step 5: Implement `write` to write data to the destination
- Step 6: Set up Acceptance Tests
- Step 7: Write unit tests or integration tests
- Step 8: Update the docs \(in `docs/integrations/destinations/<destination-name>.md`\)
:::info
@@ -29,7 +31,8 @@ All `./gradlew` commands must be run from the root of the airbyte project.
:::info
If you need help with any step of the process, feel free to submit a PR with your progress and any
questions you have, or ask us on [slack](https://slack.airbyte.io).
:::
@@ -44,7 +47,9 @@ $ cd airbyte-integrations/connector-templates/generator # assumes you are starti
$ ./generate.sh
```
Select the `Java Destination` template and then input the name of your connector. We'll refer to the
destination as `<name>-destination` in this tutorial, but you should replace `<name>` with the
actual name you used for your connector, e.g. `BigQueryDestination` or `bigquery-destination`.
### Step 2: Build the newly generated destination
@@ -55,11 +60,14 @@ You can build the destination by running:
./gradlew :airbyte-integrations:connectors:destination-<name>:build
```
This compiles the Java code for your destination and builds a Docker image with the connector. At
this point, we haven't implemented anything of value yet, but once we do, you'll use this command to
compile your code and Docker image.
:::info
Airbyte uses Gradle to manage Java dependencies. To add dependencies for your connector, manage them
in the `build.gradle` file inside your connector's directory.
:::
@@ -67,38 +75,52 @@ Airbyte uses Gradle to manage Java dependencies. To add dependencies for your co
We recommend the following ways of iterating on your connector as you're making changes:
- Test-driven development \(TDD\) in Java
- Test-driven development \(TDD\) using Airbyte's Acceptance Tests
- Directly running the docker image
#### Test-driven development in Java
This should feel like a standard flow for a Java developer: you make some code changes, then run Java
tests against them. You can do this directly in your IDE, but you can also run all unit tests via
Gradle by running the command to build the connector:
```text
./gradlew :airbyte-integrations:connectors:destination-<name>:build
```
This will build the code and run any unit tests. This approach is great when you are testing local
behaviors and writing unit tests.
#### TDD using acceptance tests & integration tests
Airbyte provides a standard test suite \(dubbed "Acceptance Tests"\) that runs against every
destination connector. They are "free" baseline tests to ensure the basic functionality of the
destination. When developing a connector, you can simply run the tests between each change and use
the feedback to guide your development.
If you want to try out this approach, check out Step 6 which describes what you need to do to set up
the Acceptance Tests for your destination.
The nice thing about this approach is that you are running your destination exactly as Airbyte will
run it in the CI. The downside is that the tests do not run very quickly. As such, we recommend this
iteration approach only once you've implemented most of your connector and are in the finishing
stages of implementation. Note that Acceptance Tests are required for every connector supported by
Airbyte, so you should make sure to run them a couple of times while iterating to make sure your
connector is compatible with Airbyte.
#### Directly running the destination using Docker
If you want to run your destination exactly as it will be run by Airbyte \(i.e. within a docker
container\), you can use the following commands from the connector module directory
\(`airbyte-integrations/connectors/destination-<name>`\):
```text
# First build the container
./gradlew :airbyte-integrations:connectors:destination-<name>:build
# Then use the following commands to run it
# Runs the "spec" command, used to find out what configurations are needed to run a connector
# Runs the "spec" command, used to find out what configurations are needed to run a connector
docker run --rm airbyte/destination-<name>:dev spec
# Runs the "check" command, used to validate if the input configurations are valid
@@ -108,54 +130,72 @@ docker run --rm -v $(pwd)/secrets:/secrets airbyte/destination-<name>:dev check
docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/sample_files:/sample_files airbyte/destination-<name>:dev write --config /secrets/config.json --catalog /sample_files/configured_catalog.json
```
Note: Each time you make a change to your implementation you need to re-build the connector image
via `./gradlew :airbyte-integrations:connectors:destination-<name>:build`.
The nice thing about this approach is that you are running your destination exactly as it will be
run by Airbyte. The tradeoff is that iteration is slightly slower, because you need to re-build the
connector between each change.
#### Handling Exceptions
In order to best propagate user-friendly error messages and log error information to the platform,
the [Airbyte Protocol](../../understanding-airbyte/airbyte-protocol.md#The Airbyte Protocol)
implements AirbyteTraceMessage.
We recommend using AirbyteTraceMessages for known errors, as in these cases you can likely offer the
user a helpful message as to what went wrong and suggest how they can resolve it.
Airbyte provides a static utility class, `io.airbyte.integrations.base.AirbyteTraceMessageUtility`,
to give you a clear and straightforward way to emit these AirbyteTraceMessages. Example usage:
```java
try {
  // some connector code responsible for doing X
} catch (ExceptionIndicatingIncorrectCredentials credErr) {
  AirbyteTraceMessageUtility.emitConfigErrorTrace(
      credErr, "Connector failed due to incorrect credentials while doing X. Please check your connection is using valid credentials.");
  throw credErr;
} catch (ExceptionIndicatingKnownErrorY knownErr) {
  AirbyteTraceMessageUtility.emitSystemErrorTrace(
      knownErr, "Connector failed because of reason Y while doing X. Please check/do/make ... to resolve this.");
  throw knownErr;
} catch (Exception e) {
  AirbyteTraceMessageUtility.emitSystemErrorTrace(
      e, "Connector failed while doing X. Possible reasons for this could be ...");
  throw e;
}
```
Note the two different error trace methods.
- Where possible `emitConfigErrorTrace` should be used when we are certain the issue arises from a
problem with the user's input configuration, e.g. invalid credentials.
- For everything else or if unsure, use `emitSystemErrorTrace`.
### Step 3: Implement `spec`
Each destination contains a specification written in JsonSchema that describes its inputs. Defining
the specification is a good place to start when developing your destination. Check out the
documentation [here](https://json-schema.org/) to learn the syntax. Here's
[an example](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-postgres/src/main/resources/spec.json)
of what the `spec.json` looks like for the Postgres destination.
Your generated template should have the spec file in
`airbyte-integrations/connectors/destination-<name>/src/main/resources/spec.json`. The generated
connector will take care of reading this file and converting it to the correct output. Edit it and
you should be done with this step.
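As a rough illustration, a minimal `spec.json` for a hypothetical destination that only needs an
output path might look like this (the field name is invented for the example; the Postgres spec
linked above shows a real one):

```json
{
  "documentationUrl": "https://docs.airbyte.com/integrations/destinations/<name>",
  "connectionSpecification": {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Destination <Name>",
    "type": "object",
    "required": ["destination_path"],
    "properties": {
      "destination_path": {
        "type": "string",
        "description": "Path under which the destination writes its output files.",
        "examples": ["/local/data"]
      }
    }
  }
}
```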
For more details on what the spec is, you can read about the Airbyte Protocol
[here](../../understanding-airbyte/airbyte-protocol.md).
See the `spec` operation in action:
```bash
# First build the connector
./gradlew :airbyte-integrations:connectors:destination-<name>:build
# Run the spec operation
@@ -164,11 +204,17 @@ docker run --rm airbyte/destination-<name>:dev spec
### Step 4: Implement `check`
The check operation accepts a JSON object conforming to the `spec.json`. In other words if the
`spec.json` said that the destination requires a `username` and `password` the config object might
be `{ "username": "airbyte", "password": "password123" }`. It returns a json object that reports,
given the credentials in the config, whether we were able to connect to the destination.
While developing, we recommend storing any credentials in `secrets/config.json`. Any `secrets`
directory in the Airbyte repo is gitignored by default.
Implement the `check` method in the generated file `<Name>Destination.java`. Here's an
[example implementation](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-bigquery/src/main/java/io/airbyte/integrations/destination/bigquery/BigQueryDestination.java#L94)
from the BigQuery destination.
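As a rough sketch, a minimal `check` might look like the following. The `testConnection` helper is
hypothetical; substitute whatever cheap operation proves your destination is reachable and writable
with the given config:

```java
@Override
public AirbyteConnectionStatus check(final JsonNode config) {
  try {
    // Hypothetical probe: connect with the user-supplied credentials and
    // perform a trivial operation to prove we can write.
    testConnection(config);
    return new AirbyteConnectionStatus().withStatus(AirbyteConnectionStatus.Status.SUCCEEDED);
  } catch (final Exception e) {
    return new AirbyteConnectionStatus()
        .withStatus(AirbyteConnectionStatus.Status.FAILED)
        .withMessage("Could not connect with the provided configuration: " + e.getMessage());
  }
}
```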
Verify that the method is working by placing your config in `secrets/config.json` then running:
@@ -182,41 +228,66 @@ docker run -v $(pwd)/secrets:/secrets --rm airbyte/destination-<name>:dev check
### Step 5: Implement `write`
The `write` operation is the main workhorse of a destination connector: it reads input data from the
source and writes it to the underlying destination. It takes as input the config file used to run
the connector as well as the configured catalog: the file used to describe the schema of the
incoming data and how it should be written to the destination. Its "output" is two things:
1. Data written to the underlying destination
2. `AirbyteMessage`s of type `AirbyteStateMessage`, written to stdout to indicate which records have
been written so far during a sync. It's important to output these messages when possible in order
to avoid re-extracting messages from the source. See the
[write operation protocol reference](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol#write)
for more information.
To implement the `write` Airbyte operation, implement the `getConsumer` method in your generated
`<Name>Destination.java` file. Here are some example implementations from different destination
connectors:
- [BigQuery](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-bigquery/src/main/java/io/airbyte/integrations/destination/bigquery/BigQueryDestination.java#L188)
- [Google Pubsub](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-pubsub/src/main/java/io/airbyte/integrations/destination/pubsub/PubsubDestination.java#L98)
- [Local CSV](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-csv/src/main/java/io/airbyte/integrations/destination/csv/CsvDestination.java#L90)
- [Postgres](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-postgres/src/main/java/io/airbyte/integrations/destination/postgres/PostgresDestination.java)
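Distilling those examples down, a heavily simplified `getConsumer` might look like the sketch
below. The signatures follow the Java CDK, but the bodies are placeholders; a real consumer would
buffer records and batch its writes:

```java
@Override
public AirbyteMessageConsumer getConsumer(final JsonNode config,
                                          final ConfiguredAirbyteCatalog catalog,
                                          final Consumer<AirbyteMessage> outputRecordCollector) {
  return new AirbyteMessageConsumer() {

    @Override
    public void start() {
      // Open connections and prepare write buffers for each stream in the catalog.
    }

    @Override
    public void accept(final AirbyteMessage message) {
      if (message.getType() == AirbyteMessage.Type.RECORD) {
        // Write (or buffer) the record in the underlying destination.
      } else if (message.getType() == AirbyteMessage.Type.STATE) {
        // Emit the state message only once the records before it are safely persisted.
        outputRecordCollector.accept(message);
      }
    }

    @Override
    public void close() {
      // Flush any remaining buffers and release resources.
    }

  };
}
```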
:::info
The Postgres destination leverages the `AbstractJdbcDestination` superclass which makes it extremely
easy to create a destination for a database or data warehouse if it has a compatible JDBC driver. If
the destination you are implementing has a JDBC driver, be sure to check out
`AbstractJdbcDestination`.
:::
For a brief overview on the Airbyte catalog check out
[the Beginner's Guide to the Airbyte Catalog](../../understanding-airbyte/beginners-guide-to-catalog.md).
### Step 6: Set up Acceptance Tests
The Acceptance Tests are a set of tests that run against all destinations. These tests are run in
the Airbyte CI to prevent regressions and verify a baseline of functionality. The test cases are
contained and documented in the
[following file](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/bases/standard-destination-test/src/main/java/io/airbyte/integrations/standardtest/destination/DestinationAcceptanceTest.java).
To set up Acceptance Tests for your connector, follow the `TODO`s in the generated file
`<name>DestinationAcceptanceTest.java`. Once set up, you can run the tests using
`./gradlew :airbyte-integrations:connectors:destination-<name>:integrationTest`. Make sure to run
this command from the Airbyte repository root.
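As a sketch of what the filled-in `TODO`s look like, the snippet below implements a few of the
required methods. The class name, image name, and config fields are illustrative, and the class is
left abstract so the sketch stands alone without the remaining record-retrieval and setup/teardown
methods:

```java
public abstract class ExampleDestinationAcceptanceTest extends DestinationAcceptanceTest {

  @Override
  protected String getImageName() {
    // The image produced by the Gradle build above.
    return "airbyte/destination-example:dev";
  }

  @Override
  protected JsonNode getConfig() {
    // A config that `check` should accept, typically read from secrets/config.json.
    return Jsons.deserialize(IOs.readFile(Path.of("secrets/config.json")));
  }

  @Override
  protected JsonNode getFailCheckConfig() {
    // A config that `check` should reject, e.g. deliberately invalid credentials.
    return Jsons.deserialize("{ \"api_key\": \"not-a-real-key\" }");
  }

}
```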
### Step 7: Write unit tests and/or integration tests
The Acceptance Tests are meant to cover the basic functionality of a destination. Think of it as the
bare minimum required for us to add a destination to Airbyte. You should probably add some unit
testing or custom integration testing in case you need to test additional functionality of your
destination.
### Step 8: Update the docs
Each connector has its own documentation page. By convention, that page should have the following
path: `docs/integrations/destinations/<destination-name>.md`. For the documentation to get
packaged with the docs, make sure to add a link to it in `docs/SUMMARY.md`. You can pattern match
doing that from existing connectors.
## Wrapping up
Well done on making it this far! If you'd like your connector to ship with Airbyte by default,
create a PR against the Airbyte repo and we'll work with you to get it across the finish line.


@@ -2,15 +2,20 @@
## Summary
This article provides a checklist for how to create a Python source. Each step in the checklist has
a link to a more detailed explanation below.
## Requirements
Docker, Python, and Java with the versions listed in the
[tech stack section](../../understanding-airbyte/tech-stack.md).
:::info
All the commands below assume that `python` points to a version of Python >3.7. On some systems,
`python` points to a Python 2 installation and `python3` points to Python 3. If this is the case on
your machine, substitute all `python` commands in this guide with `python3`. Otherwise, make sure
to install Python 3 before beginning.
:::
@@ -18,18 +23,21 @@ All the commands below assume that `python` points to a version of python >3.
### Creating a Source
- Step 1: Create the source using template
- Step 2: Build the newly generated source
- Step 3: Set up your Airbyte development environment
- Step 4: Implement `spec` \(and define the specification for the source
`airbyte-integrations/connectors/source-<source-name>/spec.yaml`\)
- Step 5: Implement `check`
- Step 6: Implement `discover`
- Step 7: Implement `read`
- Step 8: Set up Connector Acceptance Tests
- Step 9: Write unit tests or integration tests
- Step 10: Update the `README.md` \(If API credentials are required to run the integration, please
document how they can be obtained or link to a how-to guide.\)
- Step 11: Update the `metadata.yaml` file with accurate information about your connector. This
  metadata will be used to add the connector to Airbyte's connector registry.
- Step 12: Add docs \(in `docs/integrations/sources/<source-name>.md`\)
:::info
Each step of the Creating a Source checklist is explained in more detail below.
@@ -41,14 +49,24 @@ All `./gradlew` commands must be run from the root of the airbyte project.
### Submitting a Source to Airbyte
- If you need help with any step of the process, feel free to submit a PR with your progress and any
questions you have.
- Submit a PR.
- To run integration tests, Airbyte needs access to a test account/environment. Coordinate with an
Airbyte engineer \(via the PR\) to add test credentials so that we can run tests for the
integration in the CI. \(We will create our own test account once you let us know what source we
need to create it for.\)
- Once the config is stored in GitHub Secrets, edit `.github/workflows/test-command.yml` and
`.github/workflows/publish-command.yml` to inject the config into the build environment.
- Edit the `airbyte/tools/bin/ci_credentials.sh` script to pull the config from the build
  environment and write it to `secrets/config.json` during the build.
:::info
If you have a question about a step in the Submitting a Source to Airbyte checklist, include it
in your PR or ask it on
[#help-connector-development channel on Slack](https://airbytehq.slack.com/archives/C027KKE4BCZ).
:::
## Explaining Each Step
@@ -62,7 +80,8 @@ $ cd airbyte-integrations/connector-templates/generator # assumes you are starti
$ ./generate.sh
```
Select the `python` template and then input the name of your connector. For this walkthrough, we
will refer to our source as `example-python`.
### Step 2: Install the newly generated source
@@ -73,40 +92,58 @@ cd airbyte-integrations/connectors/source-<name>
poetry install
```
This step sets up the initial Python environment.
### Step 3: Set up your Airbyte development environment
The generator creates a file `source_<source_name>/source.py`. This will be where you implement the
logic for your source. The templated `source.py` contains extensive comments explaining each method
that needs to be implemented. Briefly here is an overview of each of these methods.
1. `spec`: declares the user-provided credentials or configuration needed to run the connector
2. `check`: tests whether the connector can connect to the underlying data source with the
   user-provided configuration.
3. `discover`: declares the different streams of data that this connector can output
4. `read`: reads data from the underlying data source \(The stock ticker API\)
#### Dependencies
Python dependencies for your source should be declared in
`airbyte-integrations/connectors/source-<source-name>/setup.py` in the `install_requires` field. You
will notice that a couple of Airbyte dependencies are already declared there. Do not remove these;
they give your source access to the helper interface that is provided by the generator.
You may notice that there is a `requirements.txt` in your source's directory as well. Do not touch
this. It is autogenerated and used to provide Airbyte dependencies. All your dependencies should be
declared in `setup.py`.
#### Development Environment
The commands we ran above created a virtual environment for your source. If you want your IDE to
auto complete and resolve dependencies properly, point it at the virtual env
`airbyte-integrations/connectors/source-<source-name>/.venv`. Also anytime you change the
dependencies in the `setup.py` make sure to re-run the build command. The build system will handle
installing all dependencies in the `setup.py` into the virtual environment.
Pretty much all it takes to create a source is to implement the `Source` interface. The template
fills in a lot of information for you and has extensive docstrings describing what you need to do to
implement each method. The next 4 steps are just implementing that interface.
:::info
All logging should be done through the `logger` object passed into each method. Otherwise,
logs will not be shown in the Airbyte UI.
:::
#### Iterating on your implementation
Everyone develops differently but here are 3 ways that we recommend iterating on a source. Consider
using whichever one matches your style.
**Run the source using python**
You'll notice in your source's directory that there is a python file called `main.py`. This file
exists as a convenience for development. You can call it from within the virtual environment
mentioned above \(`. ./.venv/bin/activate`\) to test out that your source works.
```bash
# from airbyte-integrations/connectors/source-<source-name>
@@ -116,30 +153,38 @@ poetry run source-<source-name> discover --config secrets/config.json
poetry run source-<source-name> read --config secrets/config.json --catalog sample_files/configured_catalog.json
```
The nice thing about this approach is that you can iterate completely within Python. The downside
is that you are not quite running your source as it will actually be run by Airbyte. Specifically,
you're not running it from within the docker container that will house it.
**Build the source docker image**
You have to build a docker image for your connector if you want to run your source exactly as it
will be run by Airbyte.
**Option A: Building the docker image with `airbyte-ci`**
This is the preferred method for building and testing connectors.
If you want to open source your connector we encourage you to use our
[`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md)
tool to build your connector. It will not use a Dockerfile but will build the connector image from
our
[base image](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/base_images/README.md)
and use our internal build logic to build an image from your Python connector code.
Running `airbyte-ci connectors --name source-<source-name> build` will build your connector image.
Once the command is done, you will find your connector image in your local docker host:
`airbyte/source-<source-name>:dev`.
**Option B: Building the docker image with a Dockerfile**
If you don't want to rely on `airbyte-ci` to build your connector, you can build the docker image
using your own Dockerfile. This method is not preferred, and is not supported for certified
connectors.
Create a `Dockerfile` in the root of your connector directory. The `Dockerfile` should look
something like this:
```Dockerfile
@@ -156,6 +201,7 @@ RUN pip install ./airbyte/integration_code
Please use this as an example. This is not optimized.
Build your image:
```bash
docker build . -t airbyte/source-example-python:dev
```
@@ -170,20 +216,28 @@ docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/sample_files:/sample_files
```
:::info
Each time you make a change to your implementation you need to re-build the connector image.
This ensures the new python code is added into the docker container.
:::
The nice thing about this approach is that you are running your source exactly as it will be run by
Airbyte. The tradeoff is that iteration is slightly slower, because you need to re-build the
connector between each change.
**Detailed Debug Messages**
During development of your connector, you can enable the printing of detailed debug information
during a sync by specifying the `--debug` flag. This will allow you to get a better picture of what
is happening during each step of your sync.
```bash
poetry run source-<source-name> read --config secrets/config.json --catalog sample_files/configured_catalog.json --debug
```
In addition to the preset CDK debug statements, you can also emit custom debug information from your
connector by introducing your own debug statements:
```python
self.logger.debug(
@@ -197,50 +251,87 @@ self.logger.debug(
**TDD using acceptance tests & integration tests**
Airbyte provides an acceptance test suite that is run against every source. The objective of these
tests is to provide some "free" tests that can sanity check that the basic functionality of the
source works. One approach to developing your connector is to simply run the tests between each
change and use the feedback from them to guide your development.
If you want to try out this approach, check out Step 8 which describes what you need to do to set up
the standard tests for your source.
The nice thing about this approach is that you are running your source exactly as Airbyte will run
it in the CI. The downside is that the tests do not run very quickly.
### Step 4: Implement `spec`
Each source contains a specification that describes what inputs it needs in order for it to pull
data. This file can be found in `airbyte-integrations/connectors/source-<source-name>/spec.yaml`.
This is a good place to start when developing your source. Using JsonSchema define what the inputs
are \(e.g. username and password\). Here's
[an example](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-stripe/source_stripe/spec.yaml)
of what the `spec.yaml` looks like for the Stripe source.
For more details on what the spec is, you can read about the Airbyte Protocol
[here](../../understanding-airbyte/airbyte-protocol.md).
The generated code that Airbyte provides handles implementing the `spec` method for you. It assumes
that there will be a file called `spec.yaml` in the same directory as `source.py`. If you have
declared the necessary JsonSchema in `spec.yaml` you should be done with this step.
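For instance, a minimal `spec.yaml` for a hypothetical source that only needs an API key might look
like this (the field name is invented for the example):

```yaml
documentationUrl: https://docs.airbyte.com/integrations/sources/example-python
connectionSpecification:
  $schema: http://json-schema.org/draft-07/schema#
  title: Example Python Source Spec
  type: object
  required:
    - api_key
  properties:
    api_key:
      type: string
      description: API key used to authenticate against the underlying service.
      airbyte_secret: true
```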
### Step 5: Implement `check`
As described in the template code, this method takes in a JSON object called config that has the
values described in the `spec.yaml` filled in. In other words if the `spec.yaml` said that the
source requires a `username` and `password` the config object might be
`{ "username": "airbyte", "password": "password123" }`. It returns a json object that reports, given
the credentials in the config, whether we were able to connect to the source. For example, with the
given credentials could the source connect to the database server.
While developing, we recommend storing this object in `secrets/config.json`. The `secrets` directory
is gitignored by default.
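A minimal sketch of what this can look like in `source.py`, assuming a hypothetical `ping` helper
that makes the cheapest possible authenticated call (`AirbyteConnectionStatus` and `Status` come
from `airbyte_cdk.models`):

```python
def check(self, logger, config) -> AirbyteConnectionStatus:
    try:
        # Hypothetical probe; replace with a real call to your data source.
        ping(api_key=config["api_key"])
        return AirbyteConnectionStatus(status=Status.SUCCEEDED)
    except Exception as e:
        return AirbyteConnectionStatus(
            status=Status.FAILED, message=f"An exception occurred: {e}"
        )
```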
### Step 6: Implement `discover`
As described in the template code, this method takes in the same config object as `check`. It then
returns a JSON object called a `catalog` that describes what data is available and metadata on what
options are available for how to replicate it.
For a brief overview on the catalog check out
[Beginner's Guide to the Airbyte Catalog](../../understanding-airbyte/beginners-guide-to-catalog.md).
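As a sketch, a single-stream `discover` might look like this (the stream name and schema are
invented for the example; `AirbyteCatalog` and `AirbyteStream` come from `airbyte_cdk.models`):

```python
def discover(self, logger, config) -> AirbyteCatalog:
    streams = [
        AirbyteStream(
            name="example_stream",
            json_schema={
                "$schema": "http://json-schema.org/draft-07/schema#",
                "type": "object",
                "properties": {"id": {"type": "number"}, "name": {"type": "string"}},
            },
            supported_sync_modes=["full_refresh"],
        )
    ]
    return AirbyteCatalog(streams=streams)
```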
### Step 7: Implement `read`
As described in the template code, this method takes in the same config object as the previous
methods. It also takes in a "configured catalog". This object wraps the catalog emitted by the
`discover` step and includes configuration on how the data should be replicated. For a brief
overview on the configured catalog check out
[Beginner's Guide to the Airbyte Catalog](../../understanding-airbyte/beginners-guide-to-catalog.md).
It then returns a generator which returns each record in the stream.
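A minimal sketch, assuming a hypothetical `fetch_rows` helper that calls the underlying API and
yields one dictionary per record (the message classes come from `airbyte_cdk.models`):

```python
def read(self, logger, config, catalog, state) -> Generator[AirbyteMessage, None, None]:
    for row in fetch_rows(config):  # illustrative helper, not part of the template
        yield AirbyteMessage(
            type=Type.RECORD,
            record=AirbyteRecordMessage(
                stream="example_stream",
                data=row,
                emitted_at=int(datetime.now().timestamp()) * 1000,
            ),
        )
```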
### Step 8: Set up Connector Acceptance Tests (CATs)
The Connector Acceptance Tests are a set of tests that run against all sources. These tests are run
in the Airbyte CI to prevent regressions. They also can help you sanity check that your source works
as expected. The following [article](../testing-connectors/connector-acceptance-tests-reference.md)
explains Connector Acceptance Tests and how to run them.
You can run the tests using [`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md):
`airbyte-ci connectors --name source-<source-name> test --only-step=acceptance`
:::info
In some rare cases we make exceptions and allow a source to not need to pass all the
standard tests. If for some reason you think your source cannot reasonably pass one of the test
cases, reach out to us on GitHub or Slack, and we can determine whether there's a change we can make
so that the test will pass or if we should skip that test for your source.
:::
### Step 9: Write unit tests and/or integration tests
The connector acceptance tests are meant to cover the basic functionality of a source. Think of it
as the bare minimum required for us to add a source to Airbyte. In case you need to test additional
functionality of your source, write unit or integration tests.
#### Unit Tests
@@ -250,32 +341,49 @@ You can run the tests using `poetry run pytest tests/unit_tests`
#### Integration Tests
Place any integration tests in the `integration_tests` directory such that they can be
[discovered by pytest](https://docs.pytest.org/en/6.2.x/goodpractices.html#conventions-for-python-test-discovery).
You can run the tests using `poetry run pytest tests/integration_tests`
### Step 10: Update the `README.md`
The template fills in most of the information for the readme for you. Unless there is a special
case, the only piece of information you need to add is how one can get the credentials required to
run the source, e.g. where one can find the relevant API key.
### Step 11: Add the connector to the API/UI
There are multiple ways to use the connector you have built.
If you are self-hosting Airbyte (OSS) you are able to use the Custom Connector feature. This feature
allows you to run any Docker container that implements the Airbyte protocol. You can read more about
it [here](https://docs.airbyte.com/integrations/custom-connectors/).
If you are using Airbyte Cloud (or OSS), you can submit a PR to add your connector to the Airbyte
repository. Once the PR is merged, the connector will be available to all Airbyte Cloud users. You
can read more about it
[here](https://docs.airbyte.com/contributing-to-airbyte/submit-new-connector).
Note that when submitting an Airbyte connector, you will need to ensure that:
1. The connector passes the CAT suite. See
[Set up Connector Acceptance Tests](<#step-8-set-up-connector-acceptance-tests-(cats)>).
2. The metadata.yaml file (created by our generator) is filled out and valid. See
[Connector Metadata File](https://docs.airbyte.com/connector-development/connector-metadata-file).
3. You have created appropriate documentation for the connector. See [Add docs](#step-12-add-docs).
### Step 12: Add docs
Each connector has its own documentation page. By convention, that page should have the following
path: `docs/integrations/sources/<source-name>.md`. For the documentation to get packaged with
the docs, make sure to add a link to it in `docs/SUMMARY.md`. You can pattern match doing that from
existing connectors.
## Related tutorials
For additional examples of how to use the Python CDK to build an Airbyte source connector, see the
following tutorials:
- [Python CDK Speedrun: Creating a Source](https://docs.airbyte.com/connector-development/tutorials/cdk-speedrun)
- [Build a connector to extract data from the Webflow API](https://airbyte.com/tutorials/extract-data-from-the-webflow-api)


@@ -2,9 +2,11 @@
## CDK Speedrun \(HTTP API Source Creation Any Route\)
This is a blazing fast guide to building an HTTP source connector. Think of it as the TL;DR version
of [this tutorial](cdk-tutorial-python-http/getting-started.md).
If you are a visual learner and want to see a video version of this guide going over each part in
detail, check it out below.
[A speedy CDK overview.](https://www.youtube.com/watch?v=kJ3hLoNfz_E)
@@ -19,9 +21,9 @@ If you are a visual learner and want to see a video version of this guide going
```bash
# clone the repo if you haven't already
# git clone --depth 1 https://github.com/airbytehq/airbyte/
# cd airbyte # start from repo root
cd airbyte-integrations/connector-templates/generator
./generate.sh
```
@@ -40,7 +42,8 @@ poetry install
cd source_python_http_example
```
We're working with the PokeAPI, so we need to define our input schema to reflect that. Open the
`spec.yaml` file here and replace it with:
```yaml
documentationUrl: https://docs.airbyte.com/integrations/sources/pokeapi
@@ -61,9 +64,14 @@ connectionSpecification:
- snorlax
```
As you can see, we have one input to our input schema, `pokemon_name`, which is required.
Normally, input schemas will contain information such as API keys and client secrets that need to
get passed down to all endpoints or streams.
Ok, let's write a function that checks the inputs we just defined. Nuke the `source.py` file. Now
add this code to it. For a crucial time skip, we're going to define all the imports we need in the
future here. Also note that your `AbstractSource` class name must be a camel-cased version of the
name you gave in the generation phase. In our case, this is `SourcePythonHttpExample`.
```python
from typing import Any, Iterable, List, Mapping, MutableMapping, Optional, Tuple
@@ -94,7 +102,9 @@ class SourcePythonHttpExample(AbstractSource):
return [Pokemon(pokemon_name=config["pokemon_name"])]
```
Create a new file called `pokemon_list.py` at the same level. This will handle input validation for
us so that we don't input invalid Pokemon. Let's start with a very limited list - any Pokemon not
included in this list will get rejected.
```python
"""
@@ -133,7 +143,8 @@ Expected output:
### Define your Stream
In your `source.py` file, add this `Pokemon` class. This stream represents an endpoint you want to
hit, which in our case, is the single [Pokemon endpoint](https://pokeapi.co/docs/v2#pokemon).
```python
class Pokemon(HttpStream):
@@ -151,7 +162,7 @@ class Pokemon(HttpStream):
return None
def path(
self,
) -> str:
return "" # TODO
@@ -161,9 +172,16 @@ class Pokemon(HttpStream):
return None # TODO
```
Now download [this file](./cdk-speedrun-assets/pokemon.json). Name it `pokemon.json` and place it in
`/source_python_http_example/schemas`.
This file defines your output schema for every endpoint that you want to implement. Normally, this
will likely be the most time-consuming section of the connector development process, as it requires
defining the output of the endpoint exactly. This is really important, as Airbyte needs to have
clear expectations for what the stream will output. Note that the name of this stream will be
consistent in the naming of the JSON schema and the `HttpStream` class, as `pokemon.json` and
`Pokemon` respectively in this case. Learn more about schema creation
[here](https://docs.airbyte.com/connector-development/cdk-python/full-refresh-stream#defining-the-streams-schema).
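If you want a quick local sanity check that a schema file matches what your endpoint returns, you
can validate a sample record against it. This is a minimal sketch, assuming the third-party
`jsonschema` package is installed; the sample record fields are hypothetical.

```python
import json

from jsonschema import validate  # third-party: `pip install jsonschema`

# Load the schema file we just placed in the schemas/ directory
with open("source_python_http_example/schemas/pokemon.json") as f:
    schema = json.load(f)

# validate() raises a ValidationError if the record violates the schema
validate(instance={"id": 143, "name": "snorlax"}, schema=schema)
print("sample record matches the schema")
```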
Test your discover function. You should receive a fairly large JSON object in return.
@@ -171,7 +189,8 @@ Test your discover function. You should receive a fairly large JSON object in re
poetry run source-python-http-example discover --config sample_files/config.json
```
Note that our discover function is using the `pokemon_name` config variable passed in from the
`Pokemon` stream when we set it in the `__init__` function.
### Reading Data from the Source
@@ -220,7 +239,13 @@ class Pokemon(HttpStream):
return None
```
We now need a catalog that defines all of our streams. We only have one stream: `Pokemon`. Download
that file [here](./cdk-speedrun-assets/configured_catalog_pokeapi.json). Place it in `/sample_files`
named as `configured_catalog.json`. Put differently, this is where we tell Airbyte which
streams/endpoints the connector supports and which sync modes Airbyte can run the connector in.
Learn more about the AirbyteCatalog
[here](https://docs.airbyte.com/understanding-airbyte/beginners-guide-to-catalog) and learn more
about sync modes [here](https://docs.airbyte.com/understanding-airbyte/connections#sync-modes).
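For orientation, the downloaded catalog has roughly the shape below, rendered here as a Python dict
(the real artifact is JSON, and its exact contents may differ):

```python
# Roughly the shape of configured_catalog.json for our single stream
configured_catalog = {
    "streams": [
        {
            "stream": {
                "name": "pokemon",
                "json_schema": {},  # populated from schemas/pokemon.json
                "supported_sync_modes": ["full_refresh"],
            },
            "sync_mode": "full_refresh",
            "destination_sync_mode": "overwrite",
        }
    ]
}
```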
Let's read some data.
@@ -230,24 +255,30 @@ poetry run source-python-http-example read --config sample_files/config.json --c
If all goes well, containerize it so you can use it in the UI:
**Option A: Building the docker image with `airbyte-ci`**
This is the preferred method for building and testing connectors.
If you want to open source your connector we encourage you to use our
[`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md)
tool to build your connector. It will not use a Dockerfile but will build the connector image from
our
[base image](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/base_images/README.md)
and use our internal build logic to build an image from your Python connector code.
Running `airbyte-ci connectors --name source-<source-name> build` will build your connector image.
Once the command is done, you will find your connector image in your local docker host:
`airbyte/source-<source-name>:dev`.
**Option B: Building the docker image with a Dockerfile**
If you don't want to rely on `airbyte-ci` to build your connector, you can build the docker image
using your own Dockerfile. This method is not preferred, and is not supported for certified
connectors.
Create a `Dockerfile` in the root of your connector directory. The `Dockerfile` should look
something like this:
```Dockerfile
FROM airbyte/python-connector-base:1.1.0
@@ -263,13 +294,15 @@ RUN pip install ./airbyte/integration_code
Please use this as an example. This is not optimized.
Build your image:
```bash
docker build . -t airbyte/source-example-python:dev
```
You're done. Stop the clock :\)
## Further reading
If you have enjoyed the above example and would like to explore the Python CDK in even more detail,
you may be interested in looking at
[how to build a connector to extract data from the Webflow API](https://airbyte.com/tutorials/extract-data-from-the-webflow-api)

View File

@@ -2,10 +2,18 @@
The second operation in the Airbyte Protocol that we'll implement is the `check` operation.
This operation verifies that the input configuration supplied by the user can be used to connect to
the underlying data source. Note that this user-supplied configuration has the values described in
the `spec.yaml` filled in. In other words, if the `spec.yaml` said that the source requires a
`username` and `password`, the config object might be
`{ "username": "airbyte", "password": "password123" }`. You should then implement something that
returns a JSON object reporting, given the credentials in the config, whether we were able to
connect to the source.
In order to make requests to the API, we need to specify the access. In our case, this is a fairly
trivial check since the API requires no credentials. Instead, let's verify that the user-input
`base` currency is a legitimate currency. In `source.py` we'll find the following autogenerated
source:
```python
class SourcePythonHttpTutorial(AbstractSource):
@@ -26,7 +34,8 @@ class SourcePythonHttpTutorial(AbstractSource):
...
```
Following the docstring instructions, we'll change the implementation to verify that the input
currency is a real currency:
```python
def check_connection(self, logger, config) -> Tuple[bool, any]:
@@ -38,9 +47,19 @@ Following the docstring instructions, we'll change the implementation to verify
return True, None
```
:::info
In a real implementation you should write code to connect to the API to validate connectivity
and not just validate inputs - for an example see `check_connection` in the
[OneSignal source connector implementation](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-onesignal/source_onesignal/source.py)
:::
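For illustration, a connectivity-based check might look roughly like the sketch below. The endpoint
and auth header are hypothetical and the `Tuple` import comes from earlier; this shows the general
pattern of issuing a real request and reporting the outcome, not this tutorial's implementation.

```python
import requests

def check_connection(self, logger, config) -> Tuple[bool, any]:
    try:
        # Issue a real request so bad credentials or network problems surface here
        response = requests.get(
            "https://api.example.com/ping",        # hypothetical endpoint
            headers={"apikey": config["apikey"]},  # hypothetical auth header
            timeout=30,
        )
        response.raise_for_status()
        return True, None
    except requests.RequestException as e:
        return False, f"Unable to reach the API: {e}"
```

The rest of this step sticks with the simpler input-validation version above.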
Let's test out this implementation by creating two configs, one valid and one invalid, and
attempting to give them as input to the connector. For this section, you will need to take the API
access key generated earlier and add it to both configs. Because these configs contain secrets, we
recommend storing them in `secrets/config.json`, since the `secrets` directory is gitignored by
default.
```bash
mkdir sample_files
@@ -60,4 +79,5 @@ You should see output like the following:
{"type": "CONNECTION_STATUS", "connectionStatus": {"status": "FAILED", "message": "Input currency BTC is invalid. Please input one of the following currencies: {'DKK', 'USD', 'CZK', 'BGN', 'JPY'}"}}
```
While developing, we recommend storing configs which contain secrets in `secrets/config.json`
because the `secrets` directory is gitignored by default.

View File

@@ -8,9 +8,15 @@ $ cd airbyte-integrations/connector-templates/generator # assumes you are starti
$ ./generate.sh
```
This will bring up an interactive helper application. Use the arrow keys to pick a template from the
list. Select the `Python HTTP API Source` template and then input the name of your connector. The
application will create a new directory in airbyte/airbyte-integrations/connectors/ with the name of
your new connector.
For this walk-through we will refer to our source as `python-http-example`. The finalized source
code for this tutorial can be found
[here](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/source-python-http-tutorial).
The source we will build in this tutorial will pull data from the
[Rates API](https://exchangeratesapi.io/), a free and open API which documents historical exchange
rates for fiat currencies.

View File

@@ -1,15 +1,26 @@
# Step 5: Declare the Schema
The `discover` method of the Airbyte Protocol returns an `AirbyteCatalog`: an object which declares
all the streams output by a connector and their schemas. It also declares the sync modes supported
by the stream \(full refresh or incremental\). See the
[catalog tutorial](https://docs.airbyte.com/understanding-airbyte/beginners-guide-to-catalog) for
more information.
This is a simple task with the Airbyte CDK. For each stream in our connector we'll need to:
1. Create a python `class` in `source.py` which extends `HttpStream`.
2. Place a `<stream_name>.json` file in the `source_<name>/schemas/` directory. The name of the file
should be the snake_case name of the stream whose schema it describes, and its contents should be
the JsonSchema describing the output from that stream.
Let's create a class in `source.py` which extends `HttpStream`. You'll notice there are classes with
extensive comments describing what needs to be done to implement various connector features. Feel
free to read these classes as needed. But for the purposes of this tutorial, let's assume that we
are adding classes from scratch either by deleting those generated classes or editing them to match
the implementation below.
We'll begin by creating a stream to represent the data that we're pulling from the Exchange Rates
API:
```python
class ExchangeRates(HttpStream):
@@ -23,9 +34,9 @@ class ExchangeRates(HttpStream):
return None
def path(
self,
stream_state: Mapping[str, Any] = None,
stream_slice: Mapping[str, Any] = None,
next_page_token: Mapping[str, Any] = None
) -> str:
return "" # TODO
@@ -40,7 +51,9 @@ class ExchangeRates(HttpStream):
return None # TODO
```
Note that this implementation is entirely empty -- we haven't actually done anything. We'll come
back to this in the next step. But for now we just want to declare the schema of this stream. We'll
declare this as a stream that the connector outputs by returning it from the `streams` method:
```python
from airbyte_cdk.sources.streams.http.auth import NoAuth
@@ -53,26 +66,32 @@ class SourcePythonHttpTutorial(AbstractSource):
def streams(self, config: Mapping[str, Any]) -> List[Stream]:
# NoAuth just means there is no authentication required for this API and is included for completeness.
# Skip passing an authenticator if no authentication is required.
# Other authenticators are available for API token-based auth and Oauth2.
auth = NoAuth()
return [ExchangeRates(authenticator=auth)]
```
Having created this stream in code, we'll put a file `exchange_rates.json` in the `schemas/` folder.
You can download the JSON file describing the output schema [here](./exchange_rates_schema.json) for
convenience and place it in `schemas/`.
With `.json` schema file in place, let's see if the connector can now find this schema and produce a
valid catalog:
```bash
poetry run source-python-http-example discover --config secrets/config.json # this is not a mistake: the schema file is found via the snake_case naming convention described above
```
You should see some output like:
```json
{"type": "CATALOG", "catalog": {"streams": [{"name": "exchange_rates", "json_schema": {"$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": {"base": {"type": "string"}, "rates": {"type": "object", "properties": {"GBP": {"type": "number"}, "HKD": {"type": "number"}, "IDR": {"type": "number"}, "PHP": {"type": "number"}, "LVL": {"type": "number"}, "INR": {"type": "number"}, "CHF": {"type": "number"}, "MXN": {"type": "number"}, "SGD": {"type": "number"}, "CZK": {"type": "number"}, "THB": {"type": "number"}, "BGN": {"type": "number"}, "EUR": {"type": "number"}, "MYR": {"type": "number"}, "NOK": {"type": "number"}, "CNY": {"type": "number"}, "HRK": {"type": "number"}, "PLN": {"type": "number"}, "LTL": {"type": "number"}, "TRY": {"type": "number"}, "ZAR": {"type": "number"}, "CAD": {"type": "number"}, "BRL": {"type": "number"}, "RON": {"type": "number"}, "DKK": {"type": "number"}, "NZD": {"type": "number"}, "EEK": {"type": "number"}, "JPY": {"type": "number"}, "RUB": {"type": "number"}, "KRW": {"type": "number"}, "USD": {"type": "number"}, "AUD": {"type": "number"}, "HUF": {"type": "number"}, "SEK": {"type": "number"}}}, "date": {"type": "string"}}}, "supported_sync_modes": ["full_refresh"]}]}}
```
It's that simple! Now the connector knows how to declare your connector's stream's schema. We
declare only one stream since our source is simple, but the principle is exactly the same if you had
many streams.
You can also dynamically define schemas, but that's beyond the scope of this tutorial. See the
[schema docs](../../cdk-python/full-refresh-stream.md#defining-the-streams-schema) for more
information.
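As a taste of what a dynamic schema looks like, it usually means overriding `get_json_schema` on
the stream instead of shipping a static `.json` file. A minimal sketch, with a deliberately trimmed
property set:

```python
def get_json_schema(self) -> Mapping[str, Any]:
    # Build the schema at runtime instead of loading schemas/exchange_rates.json
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {
            "base": {"type": "string"},
            "date": {"type": "string"},
            "rates": {"type": "object"},
        },
    }
```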

View File

@@ -1,14 +1,25 @@
# Step 3: Define Inputs
Each connector declares the inputs it needs to read data from the underlying data source. This is
the Airbyte Protocol's `spec` operation.
The simplest way to implement this is by creating a `spec.yaml` file in `source_<name>/spec.yaml`
which describes your connector's inputs according to the
[ConnectorSpecification](https://github.com/airbytehq/airbyte/blob/master/docs/understanding-airbyte/airbyte-protocol.md#spec)
schema. This is a good place to start when developing your source. Using JsonSchema, define what the
inputs are \(e.g. username and password\). Here's
[an example](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-stripe/source_stripe/spec.yaml)
of what the `spec.yaml` looks like for the Stripe API source.
For more details on what the spec is, you can read about the Airbyte Protocol
[here](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol).
The generated code that Airbyte provides handles implementing the `spec` method for you. It assumes
that there will be a file called `spec.yaml` in the same directory as `source.py`. If you have
declared the necessary JsonSchema in `spec.yaml`, you should be done with this step.
Given that we'll be pulling currency data for our example source, we'll define the following
`spec.yaml`:
```yaml
documentationUrl: https://docs.airbyte.com/integrations/sources/exchangeratesapi
@@ -36,12 +47,13 @@ connectionSpecification:
examples:
- USD
- EUR
description: "ISO reference currency. See <a href=\"https://www.ecb.europa.eu/stats/policy_and_exchange_rates/euro_reference_exchange_rates/html/index.en.html\">here</a>."
description:
'ISO reference currency. See <a
href="https://www.ecb.europa.eu/stats/policy_and_exchange_rates/euro_reference_exchange_rates/html/index.en.html">here</a>.'
```
In addition to metadata, we define three inputs:
- `apikey`: The API access key used to authenticate requests to the API
- `start_date`: The beginning date to start tracking currency exchange rates from
- `base`: The currency whose rates we're interested in tracking
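A config that satisfies this spec might look like the following. The values are placeholders; in
practice this lives in `secrets/config.json` as JSON.

```python
# Placeholder values for illustration only
config = {
    "apikey": "<your-api-access-key>",
    "start_date": "2021-04-01",
    "base": "USD",
}
```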

View File

@@ -2,30 +2,37 @@
## Summary
This is a step-by-step guide for how to create an Airbyte source in Python to read data from an HTTP
API. We'll be using the Exchange Rates API as an example since it is simple and demonstrates a lot
of the capabilities of the CDK.
## Requirements
- Python >= 3.9
- [Poetry](https://python-poetry.org/)
- Docker
All the commands below assume that `python` points to a version of python >=3.9.0. On some systems,
`python` points to a Python2 installation and `python3` points to Python3. If this is the case on
your machine, substitute all `python` commands in this guide with `python3`.
## Exchange Rates API Setup
For this guide we will be making API calls to the Exchange Rates API. In order to generate the API
access key that will be used by the new connector, you will have to follow steps on the
[Exchange Rates Data API](https://apilayer.com/marketplace/exchangerates_data-api/) by signing up
for the Free tier plan. Once you have an API access key, you can continue with the guide.
## Checklist
- Step 1: Create the source using the template
- Step 2: Install dependencies for the new source
- Step 3: Define the inputs needed by your connector
- Step 4: Implement connection checking
- Step 5: Declare the schema of your streams
- Step 6: Implement functionality for reading your streams
- Step 7: Use the connector in Airbyte
- Step 8: Write unit tests or integration tests
Each step of the Creating a Source checklist is explained in more detail in the following steps. We
also mention how you can submit the connector to be included with the general Airbyte release at the
end of the tutorial.

View File

@@ -7,7 +7,6 @@ cd ../../connectors/source-<name>
poetry install
```
Let's verify everything is working as intended. Run:
```bash
@@ -16,32 +15,43 @@ poetry run source-<name> spec
You should see some output:
```json
{"type": "SPEC", "spec": {"documentationUrl": "https://docsurl.com", "connectionSpecification": {"$schema": "http://json-schema.org/draft-07/schema#", "title": "Python Http Tutorial Spec", "type": "object", "required": ["TODO"], "properties": {"TODO: This schema defines the configuration required for the source. This usually involves metadata such as database and/or authentication information.": {"type": "string", "description": "describe me"}}}}}
```
We just ran Airbyte Protocol's `spec` command! We'll talk more about this later, but this is a
simple sanity check to make sure everything is wired up correctly.
## Notes on iteration cycle
### Dependencies
Python dependencies for your source should be declared in
`airbyte-integrations/connectors/source-<source-name>/setup.py` in the `install_requires` field. You
will notice that a couple of Airbyte dependencies are already declared there. Do not remove these;
they give your source access to the helper interfaces provided by the generator.
You may notice that there is a `requirements.txt` in your source's directory as well. Don't edit
this. It is autogenerated and used to provide Airbyte dependencies. All your dependencies should be
declared in `setup.py`.
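For example, adding a dependency of your own means extending `install_requires` while leaving the
generated entries in place. A trimmed sketch (the package list here is illustrative, not your
connector's actual `setup.py`):

```python
from setuptools import find_packages, setup

setup(
    name="source_python_http_example",
    packages=find_packages(),
    install_requires=[
        "airbyte-cdk",  # generated Airbyte dependency; do not remove
        "requests",     # an example dependency you might add yourself
    ],
)
```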
### Development Environment
The commands we ran above created a
[Python virtual environment](https://docs.python.org/3/tutorial/venv.html) for your source. If you
want your IDE to auto complete and resolve dependencies properly, point it at the virtual env
`airbyte-integrations/connectors/source-<source-name>/.venv`. Also anytime you change the
dependencies in the `setup.py` make sure to re-run `pip install -r requirements.txt`.
### Iterating on your implementation
There are two ways we recommend iterating on a source. Consider using whichever one matches your
style.
**Run the source using python**
You'll notice in your source's directory that there is a python file called `main.py`. This file
exists as a convenience for development. You can run it to test that your source works:
```bash
# from airbyte-integrations/connectors/source-<name>
@@ -51,11 +61,15 @@ poetry run source-<name> discover --config secrets/config.json
poetry run source-<name> read --config secrets/config.json --catalog sample_files/configured_catalog.json
```
The nice thing about this approach is that you can iterate completely within python. The downside is
that you are not quite running your source as it will actually be run by Airbyte. Specifically,
you're not running it from within the docker container that will house it.
**Run the source using docker**
If you want to run your source exactly as it will be run by Airbyte \(i.e. within a docker
container\), you can use the following commands from the connector module directory
\(`airbyte-integrations/connectors/source-python-http-example`\):
```bash
# First build the container
@@ -68,7 +82,14 @@ docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-<name>:dev discover --
docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/sample_files:/sample_files airbyte/source-<name>:dev read --config /secrets/config.json --catalog /sample_files/configured_catalog.json
```
:::info
Each time you make a change to your implementation you need to re-build the connector image
via `docker build . -t airbyte/source-<name>:dev`. This ensures the new python code is added into
the docker container.
:::
The nice thing about this approach is that you are running your source exactly as it will be run by
Airbyte. The tradeoff is iteration is slightly slower, as the connector is re-built between each
change.

View File

@@ -1,36 +1,45 @@
# Step 6: Read Data
Describing schemas is good and all, but at some point we have to start reading data! So let's get to
work. But before, let's describe what we're about to do:
The `HttpStream` superclass, as described in the
[concepts documentation](../../cdk-python/http-streams.md), facilitates reading data from HTTP
endpoints. It contains built-in functions or helpers for:
- authentication
- pagination
- handling rate limiting or transient errors
- and other useful functionality
In order for it to be able to do this, we have to provide it with a few inputs:
- the URL base and path of the endpoint we'd like to hit
- how to parse the response from the API
- how to perform pagination
Optionally, we can provide additional inputs to customize requests:
- request parameters and headers
- how to recognize rate limit errors, and how long to wait \(by default it retries 429 and 5XX
errors using exponential backoff\)
- HTTP method and request body if applicable
- configure exponential backoff policy
Backoff policy options:
- `retry_factor` Specifies factor for exponential backoff policy \(by default is 5\)
- `max_retries` Specifies maximum amount of retries for backoff policy \(by default is 5\)
- `raise_on_http_errors` If set to False, allows opting-out of raising HTTP code exception \(by
default is True\)
There are many other customizable options - you can find them in the
[`airbyte_cdk.sources.streams.http.HttpStream`](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/streams/http/http.py)
class.
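For instance, the backoff knobs listed above can be tuned by overriding them on your stream class.
A sketch, assuming the defaults described earlier:

```python
class ExchangeRates(HttpStream):
    # ... url_base, path, parse_response, etc. as in the full example below ...

    # Tune the built-in backoff behavior described above
    max_retries = 3      # give up after 3 retries instead of the default 5
    retry_factor = 10    # wait longer between attempts (the default factor is 5)
    raise_on_http_errors = True  # keep raising on non-retryable HTTP errors
```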
So in order to read data from the exchange rates API, we'll fill out the necessary information for
the stream to do its work. First, we'll implement a basic read that just reads the last day's
exchange rates, then we'll implement incremental sync using stream slicing.
Let's begin by pulling data for the last day's rates by using the `/latest` endpoint:
@@ -47,13 +56,13 @@ class ExchangeRates(HttpStream):
def path(
self,
stream_state: Mapping[str, Any] = None,
stream_slice: Mapping[str, Any] = None,
next_page_token: Mapping[str, Any] = None
) -> str:
# The "/latest" path gives us the latest currency exchange rates
return "latest"
return "latest"
def request_headers(
self, stream_state: Mapping[str, Any], stream_slice: Mapping[str, Any] = None, next_page_token: Mapping[str, Any] = None
@@ -77,23 +86,30 @@ class ExchangeRates(HttpStream):
stream_slice: Mapping[str, Any] = None,
next_page_token: Mapping[str, Any] = None,
) -> Iterable[Mapping]:
# The response is a simple JSON whose schema matches our stream's schema exactly,
# so we just return a list containing the response
return [response.json()]
def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
# The API does not offer pagination,
# so we return None to indicate there are no more pages in the response
return None
```
This may look big, but that's just because there are lots of \(unused, for now\) parameters in these
methods \(those can be hidden with Python's `**kwargs`, but don't worry about it for now\). Really
we just added a few lines of "significant" code:
1. Added a constructor `__init__` which stores the `base` currency to query for and the `apikey`
used for authentication.
2. `return {'base': self.base}` to add the `?base=<base-value>` query parameter to the request based
on the `base` input by the user.
3. `return {'apikey': self.apikey}` to add the header `apikey=<apikey-string>` to the request based
on the `apikey` input by the user.
4. `return [response.json()]` to parse the response from the API to match the schema of our schema
`.json` file.
5. `return "latest"` to indicate that we want to hit the `/latest` endpoint of the API to get the
latest exchange rate data.
Let's also pass the config specified by the user to the stream class:
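A hedged sketch of that change in the `streams` method; the constructor arguments must match the
`__init__` you defined above:

```python
def streams(self, config: Mapping[str, Any]) -> List[Stream]:
    auth = NoAuth()
    # Forward the user-supplied values into the stream's constructor
    return [ExchangeRates(authenticator=auth, base=config["base"], apikey=config["apikey"])]
```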
@@ -105,7 +121,11 @@ Let's also pass the config specified by the user to the stream class:
We're now ready to query the API!
To do this, we'll need a
[ConfiguredCatalog](../../../understanding-airbyte/beginners-guide-to-catalog.md). We've prepared
one
[here](https://github.com/airbytehq/airbyte/blob/master/docs/connector-development/tutorials/cdk-tutorial-python-http/configured_catalog.json)
-- download this and place it in `sample_files/configured_catalog.json`. Then run:
```bash
poetry run source-<name> --config secrets/config.json --catalog sample_files/configured_catalog.json
@@ -119,20 +139,25 @@ you should see some output lines, one of which is a record from the API:
There we have it - a stream which reads data in just a few lines of code!
We theoretically _could_ stop here and call it a connector. But let's give adding incremental sync a
shot.
## Adding incremental sync
To add incremental sync, we'll do a few things:
1. Pass the `start_date` param input by the user into the stream.
2. Declare the stream's `cursor_field`.
3. Declare the stream's property `_cursor_value` to hold the state value
4. Add `IncrementalMixin` to the list of the ancestors of the stream and implement setter and getter
of the `state`.
5. Implement the `stream_slices` method.
6. Update the `path` method to specify the date to pull exchange rates for.
7. Update the configured catalog to use `incremental` sync when we're testing the stream.
We'll describe what each of these methods do below. Before we begin, it may help to familiarize
yourself with how incremental sync works in Airbyte by reading the
[docs on incremental](/using-airbyte/core-concepts/sync-modes/incremental-append.md).
To keep things concise, we'll only show functions as we edit them one by one.
@@ -166,11 +191,18 @@ class ExchangeRates(HttpStream, IncrementalMixin):
self._cursor_value = None
```
Declaring the `cursor_field` informs the framework that this stream now supports incremental sync.
The next time you run `python main_dev.py discover --config secrets/config.json` you'll find that
the `supported_sync_modes` field now also contains `incremental`.
But we're not quite done with supporting incremental, we have to actually emit state! We'll
structure our state object very simply: it will be a `dict` whose single key is `'date'` and value
is the date of the last day we synced data from. For example, `{'date': '2021-04-26'}` indicates the
connector previously read data up until April 26th and therefore shouldn't re-read anything before
April 26th.
Let's do this by implementing the getter and setter for the `state` inside the `ExchangeRates`
class.
```python
@property
@@ -179,7 +211,7 @@ Let's do this by implementing the getter and setter for the `state` inside the `
return {self.cursor_field: self._cursor_value.strftime('%Y-%m-%d')}
else:
return {self.cursor_field: self.start_date.strftime('%Y-%m-%d')}
@state.setter
def state(self, value: Mapping[str, Any]):
self._cursor_value = datetime.strptime(value[self.cursor_field], '%Y-%m-%d')
@@ -197,9 +229,11 @@ Update internal state `cursor_value` inside `read_records` method
```
This implementation compares the date from the latest record with the date in the current state and
takes the maximum as the "new" state object.
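The `read_records` override elided in the hunk above boils down to something like this sketch:
delegate to the parent implementation and advance the cursor as records flow through.

```python
def read_records(self, *args, **kwargs) -> Iterable[Mapping[str, Any]]:
    for record in super().read_records(*args, **kwargs):
        if self._cursor_value:
            latest_record_date = datetime.strptime(record[self.cursor_field], "%Y-%m-%d")
            self._cursor_value = max(self._cursor_value, latest_record_date)
        yield record
```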
We'll implement the `stream_slices` method to return a list of the dates for which we should pull
data based on the stream state if it exists:
```python
def _chunk_date_range(self, start_date: datetime) -> List[Mapping[str, Any]]:
@@ -218,18 +252,24 @@ We'll implement the `stream_slices` method to return a list of the dates for whi
return self._chunk_date_range(start_date)
```
Each slice will cause an HTTP request to be made to the API. We can then use the information present
in the `stream_slice` parameter \(a single element from the list we constructed in `stream_slices`
above\) to set other configurations for the outgoing request like `path` or `request_params`. For
more info about stream slicing, see [the slicing docs](../../cdk-python/stream-slices.md).
In order to pull data for a specific date, the Exchange Rates API requires that we pass the date as
the path component of the URL. Let's override the `path` method to achieve this:
```python
def path(self, stream_state: Mapping[str, Any] = None, stream_slice: Mapping[str, Any] = None, next_page_token: Mapping[str, Any] = None) -> str:
return stream_slice['date']
```
With these changes, your implementation should look like the file
[here](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-python-http-tutorial/source_python_http_tutorial/source.py).
The last thing we need to do is change the `sync_mode` field in the
`sample_files/configured_catalog.json` to `incremental`:
```text
"sync_mode": "incremental",
@@ -243,7 +283,8 @@ Let's try it out:
poetry run source-<name> --config secrets/config.json --catalog sample_files/configured_catalog.json
```
You should see a bunch of `RECORD` messages and `STATE` messages. To verify that incremental sync is
working, pass the input state back to the connector and run it again:
```bash
# Save the latest state to sample_files/state.json
@@ -253,7 +294,7 @@ poetry run source-<name> --config secrets/config.json --catalog sample_files/con
poetry run source-<name> --config secrets/config.json --catalog sample_files/configured_catalog.json --state sample_files/state.json
```
You should see that only the record from the last date is being synced! This is acceptable behavior,
since Airbyte requires at-least-once delivery of records, so repeating the last record twice is OK.
With that, we've implemented incremental sync for our connector!

View File

@@ -1,4 +1,4 @@
# Step 8: Test the Connector
## Unit Tests
@@ -8,15 +8,21 @@ You can run the tests using `poetry run pytest tests/unit_tests`.
## Integration Tests
Place any integration tests in the `integration_tests` directory such that they can be
[discovered by pytest](https://docs.pytest.org/en/6.2.x/goodpractices.html#conventions-for-python-test-discovery).
You can run the tests using `poetry run pytest tests/integration_tests`.
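A minimal integration test might look like the sketch below. The module and class names follow this
tutorial; adjust them to your connector, and note that it assumes a valid `secrets/config.json`
exists.

```python
# integration_tests/test_source.py
import json

from source_python_http_tutorial.source import SourcePythonHttpTutorial

def test_check_connection_succeeds_with_valid_config():
    with open("secrets/config.json") as f:
        config = json.load(f)
    ok, error = SourcePythonHttpTutorial().check_connection(logger=None, config=config)
    assert ok
    assert error is None
```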
More information on integration testing can be found on
[the Testing Connectors doc](https://docs.airbyte.com/connector-development/testing-connectors/#running-integration-tests).
## Connector Acceptance Tests
Connector Acceptance Tests (CATs) are a fixed set of tests Airbyte provides that every Airbyte
source connector must pass. While they're only required if you intend to submit your connector
to Airbyte, you might find them helpful in any case. See
[Testing your connectors](../../testing-connectors/).
If you want to submit this connector to become a default connector within Airbyte, follow steps 8
onwards from the
[Python source checklist](../building-a-python-source.md#step-8-set-up-standard-tests).

View File

@@ -1,26 +1,32 @@
# Step 7: Use the Connector in Airbyte
To use your connector in your own installation of Airbyte you have to build the docker image for
your connector.
**Option A: Building the docker image with `airbyte-ci`**
This is the preferred method for building and testing connectors.
If you want to open source your connector we encourage you to use our
[`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md)
tool to build your connector. It will not use a Dockerfile but will build the connector image from
our
[base image](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/base_images/README.md)
and use our internal build logic to build an image from your Python connector code.
Running `airbyte-ci connectors --name source-<source-name> build` will build your connector image.
Once the command is done, you will find your connector image in your local docker host:
`airbyte/source-<source-name>:dev`.
**Option B: Building the docker image with a Dockerfile**
If you don't want to rely on `airbyte-ci` to build your connector, you can build the docker image
using your own Dockerfile. This method is not preferred, and is not supported for certified
connectors.
Create a `Dockerfile` in the root of your connector directory. The `Dockerfile` should look
something like this:
```Dockerfile
FROM airbyte/python-connector-base:1.1.0
@@ -36,11 +42,15 @@ RUN pip install ./airbyte/integration_code
Please use this as an example. This is not optimized.
Build your image:
```bash
docker build . -t airbyte/source-example-python:dev
```
Then, follow the instructions from the
[building a Python source tutorial](../building-a-python-source.md#step-11-add-the-connector-to-the-api-ui)
for using the connector in the Airbyte UI, replacing the name as appropriate.
Note: your built docker image must be accessible to the `docker` daemon running on the Airbyte node.
If you're doing this tutorial locally, these instructions are sufficient. Otherwise you may need to
push your Docker image to Dockerhub.

View File

@@ -1,97 +1,119 @@
# Profile Java Connector Memory Usage
This tutorial demos how to profile the memory usage of a Java connector with Visual VM. Such
profiling can be useful when we want to debug memory leaks, or optimize the connector's memory
footprint.
The example focuses on docker deployment, because it is more straightforward. It is also possible to
apply the same procedure to Kubernetes deployments.
## Prerequisites
- [Docker](https://www.docker.com/products/personal) running locally.
- [VisualVM](https://visualvm.github.io/) preinstalled.
## Step-by-Step
1. Enable JMX in `airbyte-integrations/connectors/<connector-name>/build.gradle`, and expose it on
   port 6000. The port is chosen arbitrarily and can be any available port number.

   - `<connector-name>` examples: `source-mysql`, `source-github`, `destination-snowflake`.
```groovy
application {
mainClass = 'io.airbyte.integrations.<connector-main-class>'
applicationDefaultJvmArgs = [
'-XX:+ExitOnOutOfMemoryError',
'-XX:MaxRAMPercentage=75.0',
// add the following JVM arguments to enable JMX:
'-XX:NativeMemoryTracking=detail',
'-XX:+UsePerfData',
'-Djava.rmi.server.hostname=localhost',
'-Dcom.sun.management.jmxremote=true',
'-Dcom.sun.management.jmxremote.port=6000',
"-Dcom.sun.management.jmxremote.rmi.port=6000",
'-Dcom.sun.management.jmxremote.local.only=false',
'-Dcom.sun.management.jmxremote.authenticate=false',
'-Dcom.sun.management.jmxremote.ssl=false',
// optionally, add a max heap size to limit the memory usage
'-Xmx2000m',
]
}
```
2. Modify `airbyte-integrations/connectors/<connector-name>/Dockerfile` to expose the JMX port.

   ```dockerfile
   # optionally install procps to enable the ps command in the connector container
   RUN apt-get update && apt-get install -y procps && rm -rf /var/lib/apt/lists/*

   # expose the same JMX port specified in the previous step
   EXPOSE 6000
   ```
3. Expose the same port in
   `airbyte-workers/src/main/java/io/airbyte/workers/process/DockerProcessFactory.java`.

   ```java
   // map local 6000 to the JMX port from the container
   if (imageName.startsWith("airbyte/<connector-name>")) {
     LOGGER.info("Exposing image {} port 6000", imageName);
     cmd.add("-p");
     cmd.add("6000:6000");
   }
   ```
   Disable the [`host` network mode](https://docs.docker.com/network/host/) by _removing_ the
   following code block in the same file. This is necessary because under the `host` network mode,
   published ports are discarded.

   ```java
   if (networkName != null) {
     cmd.add("--network");
     cmd.add(networkName);
   }
   ```
   (This
   [commit](https://github.com/airbytehq/airbyte/pull/10394/commits/097ec57869a64027f5b7858aa8bb9575844e8b76)
   can be used as a reference: it reverts these changes, so just do the opposite.)
4. Build and launch Airbyte locally. This is necessary because we have modified
   `DockerProcessFactory.java`.

   ```sh
   SUB_BUILD=PLATFORM ./gradlew build -x test
   VERSION=dev docker compose up
   ```
5. Build the connector to be profiled locally. This creates a `dev` version local image:
   `airbyte/<connector-name>:dev`.

   ```sh
   ./gradlew :airbyte-integrations:connectors:<connector-name>:airbyteDocker
   ```
6. Connect to the launched local Airbyte server at `localhost:8000`, go to the `Settings` page, and
   change the version of the connector to be profiled to `dev`, which was just built in the
   previous step.
7. Create a connection using the connector to be profiled.

   - The `Replication frequency` of this connection should be `manual` so that we can control when
     it starts.
   - We can use the e2e test connectors as either the source or destination for convenience.
   - The e2e test connectors are usually very reliable and require little configuration.
   - For example, if we are profiling a source connector, create an e2e test destination at the
     other end of the connection.
8. Profile the connector in question.

   - Launch a data sync run.
   - After the run starts, open Visual VM, and click `File` / `Add JMX Connection...`. A modal will
     show up. Type in `localhost:6000`, and click `OK`.
   - A new connection will show up under the `Local` category on the left, and information about
     the connector's JVM will be retrieved.

   ![visual vm screenshot](https://visualvm.github.io/images/visualvm_screenshot_20.png)
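If VisualVM cannot connect, first confirm that the published JMX port is actually reachable from
the host. A quick probe (our suggestion, not part of the original setup) is:

```sh
# check that the container's JMX port is reachable from the host
nc -vz localhost 6000
```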
# Adding Incremental Sync to a Source
## Overview
This tutorial will assume that you already have a working source. If you do not, feel free to refer
to the [Building a Toy Connector](build-a-connector-the-hard-way.md) tutorial. This tutorial will
build directly off the example from that article. We will also assume that you have a basic
understanding of how Airbyte's Incremental-Append replication strategy works. We have a brief
explanation of it [here](../../../using-airbyte/core-concepts/sync-modes/incremental-append.md).
## Update Catalog in `discover`
First we need to identify a given stream in the Source as supporting incremental. This information
is declared in the catalog that the `discover` method returns. You will notice that each stream
object contains a field called `supported_sync_modes`. If we are adding incremental to an existing
stream, we just need to add `"incremental"` to that array. This tells Airbyte that this stream can
be synced in an incremental fashion. In practice, this means that in the UI, a user will have the
ability to configure this type of sync.
In the example we used in the Toy Connector tutorial, the `discover` method would now look like
this. Note that `"incremental"` has been added to the `supported_sync_modes` array. We also set
`source_defined_cursor` to `True` and `default_cursor_field` to `["date"]` to declare that the
Source knows what field to use for the cursor, in this case the date field, and does not require
user input. Nothing else has changed.
```python
def discover():
    # ... (the stream definitions are unchanged apart from the catalog fields
    #      described above; the full method is shown in the previous tutorial)
```
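For illustration, here is a sketch of what the updated `stock_prices` stream entry in the catalog
could look like. The field names follow the Airbyte protocol, but the exact schema wording here is
ours, not the original file's:

```javascript
{
  "name": "stock_prices",
  "supported_sync_modes": ["full_refresh", "incremental"],
  "source_defined_cursor": true,
  "default_cursor_field": ["date"],
  "json_schema": {
    "properties": {
      "date": { "type": "string" },
      "price": { "type": "number" },
      "stock_ticker": { "type": "string" }
    }
  }
}
```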
Also, create a file called `incremental_configured_catalog.json` with the following content:
```javascript
{
  "streams": [
    // ... (the stock_prices stream configured with "sync_mode": "incremental")
  ]
}
```
Next we will adapt the `read` method that we wrote previously. We need to change three things.
First, we need to pass it information about what data was replicated in the previous sync. In
Airbyte this is called a `state` object. The structure of the state object is determined by the
Source. This means that each Source can construct a state object that makes sense to it and does not
need to worry about adhering to any other convention. That being said, a pretty typical structure
for a state object is a map of stream name to the last value in the cursor field for that stream.
In this case we might choose something like this:
```javascript
{
  "stock_prices": {
    "date": "2022-03-11"
  }
}
```
The second change we need to make to the `read` method is to use the state object so that we only
emit new records.
Lastly, we need to emit an updated state object, so that the next time this Source runs we do not
resend messages that we have already sent.
Here's what our updated `read` method would look like.
```python
def read(config, catalog, state):
    # ... (use the state to fetch only records newer than the cursor, then
    #      emit an updated STATE message after the records)
```
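Because the diff elides the full method, here is a minimal sketch of what an incremental `read`
could look like. It assumes the tutorial's `_call_api` and `log_error` helpers and the Polygon.io
aggregates response shape (`t` is a millisecond timestamp, `c` the closing price); treat it as an
illustration, not the exact original code:

```python
import json
from datetime import date, datetime, timedelta, timezone

def read(config, catalog, state):
    # Find the configured stream; bail out if stock_prices was not selected.
    stream = next((s for s in catalog["streams"] if s["stream"]["name"] == "stock_prices"), None)
    if stream is None:
        log_error("No stock_prices stream selected")
        return

    # Default to the last 7 days, but resume after the saved cursor when the
    # stream is configured as incremental and a prior state exists.
    from_day = date.today() - timedelta(days=7)
    if stream["sync_mode"] == "incremental" and state and "stock_prices" in state:
        from_day = date.fromisoformat(state["stock_prices"]["date"]) + timedelta(days=1)

    response = _call_api(config["stock_ticker"], config["api_key"],
                         from_day.isoformat(), date.today().isoformat())
    last_date = None
    for result in response.json().get("results", []):
        day = datetime.fromtimestamp(result["t"] / 1000, tz=timezone.utc).date()
        data = {"date": day.isoformat(), "stock_ticker": config["stock_ticker"], "price": result["c"]}
        record = {"stream": "stock_prices", "data": data,
                  "emitted_at": int(datetime.now(timezone.utc).timestamp()) * 1000}
        print(json.dumps({"type": "RECORD", "record": record}))
        last_date = day

    # Emit the updated state so the next sync starts from the new cursor.
    if last_date is not None:
        state_message = {"type": "STATE", "state": {"data": {"stock_prices": {"date": last_date.isoformat()}}}}
        print(json.dumps(state_message))
```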
That code requires adding a new import in the `source.py` file:
```python
from datetime import timezone
```
We will also need to parse the `state` argument in the `run` method. To do that, we will modify the
code that calls the `read` method from the `run` method:
```python
elif command == "read":
    config = read_json(get_input_file_path(parsed_args.config))
    configured_catalog = read_json(get_input_file_path(parsed_args.catalog))
    # read the state file only when one was provided on the command line
    state = read_json(get_input_file_path(parsed_args.state)) if parsed_args.state else None
    read(config, configured_catalog, state)
```
Finally, we need to pass more arguments to our `_call_api` method in order to fetch only new prices
for incremental sync:
```python
def _call_api(ticker, token, from_day, to_day):
return requests.get(f"https://api.polygon.io/v2/aggs/ticker/{ticker}/range/1/day/{from_day}/{to_day}?sort=asc&limit=120&apiKey={token}")
```
You will notice that in order to test these changes you need a `state` object. If you run an
incremental sync without passing a state object, the new code will output a state object that you
can use with the next sync. If you run this:
```bash
python source.py read --config secrets/valid_config.json --catalog incremental_configured_catalog.json
```
The output will look like the following:
```bash
{"type": "RECORD", "record": {"stream": "stock_prices", "data": {"date": "2022-03-07", "stock_ticker": "TSLA", "price": 804.58}, "emitted_at": 1647294277000}}
{"type": "RECORD", "record": {"stream": "stock_prices", "data": {"date": "2022-03-08", "stock_ticker": "TSLA", "price": 824.4}, "emitted_at": 1647294277000}}
...
{"type": "STATE", "state": {"data": {"stock_prices": {"date": "2022-03-11"}}}}
```
Notice that the last line of output is the state object. Copy the state object:
```json
{"stock_prices": {"date": "2022-03-11"}}
{ "stock_prices": { "date": "2022-03-11" } }
```
and paste it into a new file (i.e. `state.json`). Now you can run an incremental sync:
```bash
python source.py read --config secrets/valid_config.json --catalog incremental_configured_catalog.json --state state.json
```
## Run the incremental tests
The
[Connector Acceptance Test (CAT) suite](../../testing-connectors/connector-acceptance-tests-reference)
also includes test cases to ensure that incremental mode is working correctly.
To enable these tests, modify the existing `acceptance-test-config.yml` by adding the following:
```yaml
incremental:
  - config_path: "secrets/valid_config.json"
    configured_catalog_path: "incremental_configured_catalog.json"
    future_state_path: "abnormal_state.json"
```
Your full `acceptance-test-config.yml` should look something like this:
```yaml
tests:
  # ... (the existing spec, connection, discovery, and full_refresh sections)
  incremental:
    - config_path: "secrets/valid_config.json"
      configured_catalog_path: "incremental_configured_catalog.json"
      future_state_path: "abnormal_state.json"
```
You will also need to create an `abnormal_state.json` file with a date in the future, which should
not produce any records:
```javascript
{"stock_prices": {"date": "2121-01-01"}}
```
And lastly you need to modify the `check` function call to include the new parameters `from_day` and
`to_day` in `source.py`:
```python
def check(config):
# Validate input configuration by attempting to get the daily closing prices of the input stock ticker
    # ... (compute from_day and to_day, then pass them to _call_api as shown above)
```
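As a sketch, the updated call might compute the new arguments like this, keeping the same 7-day
validation window as before (the exact original wording of the method is elided in the diff):

```python
from datetime import date, timedelta

def check(config):
    # Validate input configuration by attempting to get the daily closing
    # prices of the input stock ticker over the last week.
    from_day = (date.today() - timedelta(days=7)).isoformat()
    to_day = date.today().isoformat()
    response = _call_api(config["stock_ticker"], config["api_key"], from_day, to_day)
    # ... (build and print the CONNECTION_STATUS message from the response)
```

With that change in place, run the tests once again.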
And finally, you should see a successful test summary:
```
collecting ...
test_core.py ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓ 86% ████████▋
test_full_refresh.py ✓ 91% █████████▏
test_incremental.py ✓✓ 100% ██████████

Results (8.90s):
```
That's all you need to do to add incremental functionality to the stock ticker Source.
You can deploy the new version of your connector simply by running:
```bash
./gradlew clean :airbyte-integrations:connectors:source-stock-ticker-api:build
```
Bonus points: go to the Airbyte UI and reconfigure the connection to use incremental sync.
Incremental definitely requires more configurability than full refresh, so your implementation may
deviate slightly depending on whether your cursor field is source defined or user-defined. If you
think you are running into one of those cases, check out our
[incremental](/using-airbyte/core-concepts/sync-modes/incremental-append.md) documentation for more
information on different types of configuration.
---
description:
  Building a source connector without using any helpers to learn the Airbyte Specification for
  sources
---
# Building a Source Connector: The Hard Way
This tutorial walks you through building a simple Airbyte source without using any helpers to
demonstrate the following concepts in action:
- [The Airbyte Specification](../../../understanding-airbyte/airbyte-protocol.md) and the interface
  implemented by a source connector
- [The AirbyteCatalog](../../../understanding-airbyte/beginners-guide-to-catalog.md)
- [Packaging your connector](https://docs.airbyte.com/connector-development#1.-implement-and-package-the-connector)
- [Testing your connector](../../testing-connectors/connector-acceptance-tests-reference.md)
:::warning

**This tutorial is meant for those interested in learning how the Airbyte Specification works in
detail, not for creating production connectors**. If you're building a real source, you should
start with the [Connector Builder](../../connector-builder-ui/overview) or the
[Connector Development Kit](https://github.com/airbytehq/airbyte/tree/master/airbyte-cdk/python/docs/tutorials).

:::
## Requirements
To run this tutorial, you'll need:
- Docker, Python, and Java with the versions listed in the
  [tech stack section](../../../understanding-airbyte/tech-stack.md).
- The `requests` Python package installed via `pip install requests` \(or `pip3` if `pip` is linked
  to a Python2 installation on your system\)
## Our connector: a stock ticker API
The connector will output the daily price of a stock since a given date. We'll leverage the
[Polygon.io API](https://polygon.io/) for this.
:::info
We'll use Python to implement the connector, but you could build an Airbyte connector in any
language.
:::
Here's the outline of what we'll do to build the connector:
1. Use the Airbyte connector template to bootstrap the connector package
2. Implement the methods required by the Airbyte Specification for our connector:
1. `spec`: declares the user-provided credentials or configuration needed to run the connector
2. `check`: tests if the connector can connect with the underlying data source with the
   user-provided configuration
3. `discover`: declares the different streams of data that this connector can output
4. `read`: reads data from the underlying data source \(The stock ticker API\)
3. Package the connector in a Docker image
[Part 2 of this article](adding-incremental-sync.md) covers:
- Support [incremental sync](../../../using-airbyte/core-concepts/sync-modes/incremental-append.md)
- Add custom integration tests
Let's get started!
---
### 1. Bootstrap the connector package

We'll work from the root of the Airbyte monorepo:

```bash
$ pwd
/Users/sherifnada/code/airbyte
```
Airbyte provides a code generator which bootstraps the scaffolding for our connector. Let's use it
by running:
```bash
$ cd airbyte-integrations/connector-templates/generator
$ ./generate.sh
```
Select the `Generic Source` template and call the connector `stock-ticker-api`:
![](../../../.gitbook/assets/newsourcetutorial_plop.gif)
:::info

This tutorial uses the bare-bones `Generic Source` template to illustrate how all the pieces of a
connector work together. For real connectors, the generator provides `Python` and
`Python HTTP API` source templates, which use the [Airbyte CDK](../../cdk-python/README.md).

:::
```bash
$ cd ../../connectors/source-stock-ticker-api
$ ls
Dockerfile    README.md    acceptance-test-config.yml    ...
```
### 2. Implement the connector in line with the Airbyte Specification
In the connector package directory, create a single Python file `source.py` that will hold our
implementation:
```bash
touch source.py
```
#### Implement the spec operation
The `spec` operation is described in the
[Airbyte Protocol](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/#spec). It's a
way for the connector to tell Airbyte what user inputs it needs in order to connect to the source
\(the stock ticker API in our case\). Airbyte expects the command to output a connector
specification in `AirbyteMessage` format.
To contact the stock ticker API, we need two things:
1. Which stock ticker we're interested in
2. The API key to use when contacting the API \(you can obtain a free API token from the
   [Polygon.io](https://polygon.io/dashboard/signup) free plan\)
:::info

For reference, the API docs we'll be using
[can be found here](https://polygon.io/docs/stocks/get_v2_aggs_ticker__stocksticker__range__multiplier___timespan___from___to).

:::
Let's create a [JSONSchema](http://json-schema.org/) file `spec.json` encoding these two
requirements:
```javascript
{
  // ... (documentationUrl plus a connectionSpecification whose "properties"
  //      declare the stock_ticker and api_key fields described below)
}
```
- `documentationUrl` is the URL that will appear in the UI for the user to gain more info about this
  connector. Typically this points to
  `docs.airbyte.com/integrations/sources/source-<connector_name>`, but to keep things simple we
  won't show adding documentation
- `title` is the "human readable" title displayed in the UI. Without this field, the Stock Ticker
  field will have the title `stock_ticker` in the UI
- `description` will be shown in the Airbyte UI under each field to help the user understand it
- `airbyte_secret` is used by Airbyte to determine if the field should be displayed as a password
  \(e.g: `********`\) in the UI and not readable from the API
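Putting the pieces together, a `spec.json` consistent with the fields above might look like the
following sketch; the titles, descriptions, and examples are illustrative rather than the exact
original file:

```javascript
{
  "documentationUrl": "https://docs.airbyte.com/integrations/sources/stock-ticker-api",
  "connectionSpecification": {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "required": ["stock_ticker", "api_key"],
    "properties": {
      "stock_ticker": {
        "type": "string",
        "title": "Stock Ticker",
        "description": "The stock ticker to track",
        "examples": ["AAPL", "TSLA", "AMZN"]
      },
      "api_key": {
        "type": "string",
        "title": "API Key",
        "description": "The Polygon.io Stocks API key to use to hit the API",
        "airbyte_secret": true
      }
    }
  }
}
```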
```bash
$ ls -1
...
metadata.yaml
source.py
spec.json
```
Now, let's edit `source.py` to detect if the program was invoked with the `spec` argument and if so,
output the connector specification:
```python
# source.py
# ... (a log(message) helper, the spec() implementation, and an argument
#      parser that dispatches on the "spec" command)

if __name__ == "__main__":
    main()
```
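The diff elides the implementation itself; as a sketch consistent with the notes below (and
assuming a `read_json` helper), `spec` could look like this:

```python
import json
import os

def spec():
    # Read the connector specification that sits next to this file.
    current_script_directory = os.path.dirname(os.path.realpath(__file__))
    spec_path = os.path.join(current_script_directory, "spec.json")
    specification = read_json(spec_path)

    # "Return" the spec by printing an AirbyteMessage to stdout.
    airbyte_message = {"type": "SPEC", "spec": specification}
    print(json.dumps(airbyte_message))
```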
Some notes on the above code:
1. As described in the
   [specification](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/#key-takeaways),
   Airbyte connectors are CLIs which communicate via stdout, so the output of the command is simply
   a JSON string formatted according to the Airbyte Specification. To "return" a value, we use
   `print` to output the return value to stdout.
2. All Airbyte commands can output log messages that take the form
   `{"type":"LOG", "log":"message"}`, so we create a helper method `log(message)` to allow logging.
3. All Airbyte commands can output error messages that take the form
   `{"type":"TRACE", "trace": {"type": "ERROR", "emitted_at": current_time_in_ms, "error": {"message": error_message}}}`,
   so we create a helper method `log_error(message)` to allow error messages.
You can test the command by running:

```bash
python source.py spec
```
#### Implementing check connection
The second command to implement is the
[check operation](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/#check)
`check --config <config_name>`, which tells the user whether a config file they gave us is correct.
In our case, "correct" means they input a valid stock ticker and a correct API key like we declare
via the `spec` operation.
To achieve this, we'll:
1. Create valid and invalid configuration files to test the success and failure cases with our
   connector. We'll place config files in the `secrets/` directory, which is gitignored everywhere
   in the Airbyte monorepo by default to avoid accidentally checking in API keys.
2. Add a `check` method which calls the Polygon.io API to verify if the provided token & stock
   ticker are correct and output the correct airbyte message.
3. Extend the argument parser to recognize the `check --config <config>` command and call the
   `check` method when the `check` command is invoked.
Let's first add the configuration files:
```bash
$ mkdir secrets
$ echo '{"api_key": "put_your_key_here", "stock_ticker": "TSLA"}' > secrets/valid_config.json
$ echo '{"api_key": "not_a_real_key", "stock_ticker": "TSLA"}' > secrets/invalid_config.json
```
Make sure to add your actual API key instead of the placeholder value `put_your_key_here` when
following the tutorial.
Then we'll add the `check` method:
```python
def check(config):
    # ... (call the Polygon.io API with the user-provided ticker and API key,
    #      then build the output_message below based on the response status)
    print(json.dumps(output_message))
```
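Here is a sketch of what the body might look like; the endpoint matches the Polygon.io aggregates
API used elsewhere in this tutorial, but the exact status-code handling is our assumption:

```python
import json
import requests
from datetime import date, timedelta

def check(config):
    # Validate the input configuration by attempting to get the daily closing
    # prices of the input stock ticker for the last week.
    from_day = (date.today() - timedelta(days=7)).isoformat()
    to_day = date.today().isoformat()
    response = requests.get(
        f"https://api.polygon.io/v2/aggs/ticker/{config['stock_ticker']}/range/1/day/"
        f"{from_day}/{to_day}?sort=asc&limit=120&apiKey={config['api_key']}")

    if response.status_code == 200:
        result = {"status": "SUCCEEDED"}
    elif response.status_code == 403:
        # HTTP 403 means authentication failed, i.e. the API key is wrong.
        result = {"status": "FAILED", "message": "API Key is incorrect."}
    else:
        result = {"status": "FAILED", "message": "Input configuration is incorrect."}

    output_message = {"type": "CONNECTION_STATUS", "connectionStatus": result}
    print(json.dumps(output_message))
```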
In Airbyte, the contract for input files is that they will be available in the current working
directory if they are not provided as an absolute path. This method helps us achieve that:
```python
def get_input_file_path(path):
    # resolve relative paths against the current working directory
    if os.path.isabs(path):
        return path
    return os.path.join(os.getcwd(), path)
```

After wiring `check` into the argument parser, we can try it out with both config files:

```bash
$ python source.py check --config secrets/valid_config.json
{'type': 'CONNECTION_STATUS', 'connectionStatus': {'status': 'SUCCEEDED'}}
$ python source.py check --config secrets/invalid_config.json
{'type': 'CONNECTION_STATUS', 'connectionStatus': {'status': 'FAILED', 'message': 'API Key is incorrect.'}}
```
Our connector is able to detect valid and invalid configs correctly. Two methods down, two more to
go!
#### Implementing Discover
The `discover` command outputs a Catalog, a struct that declares the Streams and Fields \(Airbyte's
equivalents of tables and columns\) output by the connector. It also includes metadata around which
features a connector supports \(e.g. which sync modes\). In other words it describes what data is
available in the source. If you'd like to read a bit more about this concept check out our
[Beginner's Guide to the Airbyte Catalog](../../../understanding-airbyte/beginners-guide-to-catalog.md)
or for a more detailed treatment read the
[Airbyte Specification](../../../understanding-airbyte/airbyte-protocol.md).
The stock ticker connector outputs records belonging to exactly one Stream \(table\). Each record
contains three Fields \(columns\): `date`, `price`, and `stock_ticker`, corresponding to the price
of a stock on a given day.
To implement `discover`, we'll:
1. Add a method `discover` in `source.py` which outputs the Catalog. To better understand what a
   catalog is, check out our
   [Beginner's Guide to the AirbyteCatalog](../../../understanding-airbyte/beginners-guide-to-catalog.md)
2. Extend the arguments parser to detect the `discover --config <config_path>` command and call the
   `discover` method
Let's implement `discover` by adding the following in `source.py`:
```python
def discover():
    # ... (build the catalog containing the stock_prices stream and print it
    #      as an AirbyteMessage of type CATALOG; see the sketch below)
```

We need to update our list of available commands:
```python
log("Invalid command. Allowable commands: [spec, check, discover]")
```
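For reference, here is a sketch of a complete `discover` consistent with the stream and fields
described above (the schema wording is illustrative):

```python
import json

def discover():
    catalog = {
        "streams": [{
            "name": "stock_prices",
            "supported_sync_modes": ["full_refresh"],
            "json_schema": {
                "properties": {
                    "date": {"type": "string"},
                    "price": {"type": "number"},
                    "stock_ticker": {"type": "string"},
                }
            },
        }]
    }
    # "Return" the catalog by printing an AirbyteMessage to stdout.
    airbyte_message = {"type": "CATALOG", "catalog": catalog}
    print(json.dumps(airbyte_message))
```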
:::info

You may be wondering why `config` is a required input to `discover` if it's not used. This is done
for consistency: the Airbyte Specification requires `--config` as an input to `discover` because
many sources require it \(e.g: to discover the tables available in a Postgres database, you must
supply a password\). So instead of guessing whether the flag is required depending on the
connector, we always assume it is required, and the connector can choose whether to use it.

:::
The full run method is now below:
```python
def run(args):
    # ... (parse the CLI arguments and dispatch to spec, check, or discover)
```

With that, we're done implementing the `discover` command.
#### Implementing the read operation
We've done a lot so far, but a connector ultimately exists to read data! This is where the
[`read` command](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/#read) comes in.
The format of the command is:
```bash
python source.py read --config <config_file_path> --catalog <configured_catalog.json> [--state <state_file_path>]
```
Each of these is described in the Airbyte Specification in detail, but we'll give a quick
description of the two options we haven't seen so far:
- `--catalog` points to a Configured Catalog. The Configured Catalog contains the contents for the
  Catalog \(remember the Catalog we output from discover?\). It also contains some configuration
  information that describes how the data will be replicated. For example, we had
  `supported_sync_modes` in the Catalog. In the Configured Catalog, we select which of the
  `supported_sync_modes` we want to use by specifying the `sync_mode` field. \(This is the most
  complicated concept when working with Airbyte, so if it is still not making sense that's okay for
  now. If you're just dying to understand how the Configured Catalog works, check out the
  [Beginner's Guide to the Airbyte Catalog](../../../understanding-airbyte/beginners-guide-to-catalog.md)\).
- `--state` points to a state file. The state file is only relevant when some Streams are synced
  with the sync mode `incremental`, so we'll cover the state file in more detail in the incremental
  section below.
Our connector only supports one Stream, `stock_prices`, so we'd expect the input catalog to contain
that stream configured to sync in full refresh. Since our connector doesn't support incremental sync
yet, we'll ignore the state option for now.
To read data in our connector, we'll:
1. Create a configured catalog which tells our connector that we want to sync the `stock_prices`
   stream
2. Implement a method `read` in `source.py`. For now we'll always read the last 7 days of a stock
   price's data
3. Extend the arguments parser to recognize the `read` command and its arguments
First, let's create a configured catalog `fullrefresh_configured_catalog.json` to use as test input
for the read operation:
```javascript
{
  "streams": [
    // ... (the stock_prices stream configured with "sync_mode": "full_refresh";
    //      see the full sketch below)
  ]
}
```

Next comes the `read` method itself \(the diff elides most of its body\):

```python
def read(config, catalog):
    # ... (validate the configured catalog, request the last 7 days of prices,
    #      then print one RECORD message per day)
print(json.dumps(output_message))
```
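As a sketch, the full `fullrefresh_configured_catalog.json` might look like this; the stream
definition mirrors what `discover` outputs, while `sync_mode` and `destination_sync_mode` are the
configuration layered on top (the `destination_sync_mode` value is our assumption):

```javascript
{
  "streams": [
    {
      "stream": {
        "name": "stock_prices",
        "supported_sync_modes": ["full_refresh"],
        "json_schema": {
          "properties": {
            "date": { "type": "string" },
            "price": { "type": "number" },
            "stock_ticker": { "type": "string" }
          }
        }
      },
      "sync_mode": "full_refresh",
      "destination_sync_mode": "overwrite"
    }
  ]
}
```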
After doing some input validation, the code above calls the API to obtain daily prices for the input
stock ticker, then outputs the prices. As always, our output is formatted according to the Airbyte
Specification. Let's update our args parser with the following blocks:
```python
# Accept the read command
# ... (add a "read" command that accepts --config, --catalog, and an optional --state)
```

Let's try it out:

```bash
$ python source.py read --config secrets/valid_config.json --catalog fullrefresh_configured_catalog.json
{'type': 'RECORD', 'record': {'stream': 'stock_prices', 'data': {'date': '2020-12-21', 'stock_ticker': 'TSLA', 'price': 649.86}, 'emitted_at': 1608626365000}}
```
With this method, we now have a fully functioning connector! Let's pat ourselves on the back for
getting there.
For reference, the full `source.py` file now looks like this:
```python
# source.py
# ... (the full implementation: the helpers plus spec, check, discover, and read)

if __name__ == "__main__":
main()
```
A full connector in about 200 lines of code. Not bad! We're now ready to package & test our
connector then use it in the Airbyte UI.
---
### 3. Package the connector in a Docker image
Our connector is very lightweight, so the Dockerfile needed to run it is very light as well. Edit
the `Dockerfile` as follows:
```Dockerfile
FROM python:3.9-slim
# ... (install the requests dependency, copy source.py and spec.json into the
#      image, set the ENTRYPOINT, and add the io.airbyte.name/version labels)
```

Once we save the `Dockerfile`, we can build the image by running:

```bash
docker build . -t airbyte/source-stock-ticker-api:dev
```
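For reference, a complete `Dockerfile` consistent with the steps above might look like the
following sketch (the exact paths and the version label value are illustrative):

```Dockerfile
FROM python:3.9-slim

# install the connector's only runtime dependency
RUN pip install requests

# copy the connector code and spec into the image
WORKDIR /airbyte/integration_code
COPY source.py spec.json ./

ENTRYPOINT ["python", "/airbyte/integration_code/source.py"]

# Airbyte derives the image name and version from these labels
LABEL io.airbyte.name=airbyte/source-stock-ticker-api
LABEL io.airbyte.version=0.1.0
```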
To run any of our commands, we'll need to mount all the inputs into the Docker container first, then
refer to their _mounted_ paths when invoking the connector. This allows the connector to access your
secrets without having to build them into the container. For example, we'd run `check` or `read` as
follows:
```bash
$ docker run airbyte/source-stock-ticker-api:dev spec
{"type": "SPEC", "spec": ...}
$ docker run -v $(pwd)/secrets/valid_config.json:/data/config.json airbyte/source-stock-ticker-api:dev check --config /data/config.json
{'type': 'CONNECTION_STATUS', 'connectionStatus': {'status': 'SUCCEEDED'}}
$ docker run -v $(pwd)/secrets/valid_config.json:/data/config.json -v $(pwd)/fullrefresh_configured_catalog.json:/data/fullrefresh_configured_catalog.json airbyte/source-stock-ticker-api:dev read --config /data/config.json --catalog /data/fullrefresh_configured_catalog.json
```
### 4. Test the connector
The minimum requirement for testing your connector is to pass the
[Connector Acceptance Test](https://docs.airbyte.com/connector-development/testing-connectors/connector-acceptance-tests-reference)
suite. The connector acceptance test is a blackbox test suite containing a number of tests that
validate your connector behaves as intended by the Airbyte Specification. You're encouraged to add
custom test cases for your connector where it makes sense to do so, e.g: to test edge cases that are
not covered by the standard suite. But at the very least, your connector must pass Airbyte's
acceptance test suite.
The code generator makes a minimal acceptance test configuration. Let's modify it as follows to
set up tests for each operation with valid and invalid credentials. Edit
`acceptance-test-config.yaml` to look as follows:
```yaml
# See [Connector Acceptance Tests](https://docs.airbyte.com/connector-development/testing-connectors/connector-acceptance-tests-reference)
acceptance_tests:
  # ... (spec, connection, discovery, and basic_read entries exercising the
  #      valid and invalid configs, plus a commented-out incremental section)
```
To run the test suite, we'll use
[`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md).
You can build and install `airbyte-ci` locally from the Airbyte repository root by running `make`.
Assuming you have it already:
```shell
airbyte-ci connectors --name=<name-of-your-connector> --use-remote-secrets=false test
```
`airbyte-ci` will build and then test your connector, and provide a report on the test results.
That's it! We've created a fully functioning connector. Now let's get to the exciting part: using it
from the Airbyte UI.
---
### 5. Use the connector in the Airbyte UI
To use it from the Airbyte UI, we need to:
1. Publish our connector's Docker image somewhere accessible by Airbyte Core \(Airbyte's server,
   scheduler, workers, and webapp infrastructure\)
2. Add the connector via the Airbyte UI and set up a connection from our new connector to a local
   CSV file for illustration purposes
3. Run a sync and inspect the output
#### 1. Publish the Docker image
Since we're running this tutorial locally, Airbyte will have access to any Docker images available
to your local `docker` daemon. So all we need to do is build & tag our connector. For real
production connectors to be available on Airbyte Cloud, you'd need to publish them on DockerHub.
Airbyte's build system builds and tags your connector's image correctly by default as part of the
connector's standard `build` process. **From the Airbyte repo root**, run:
```bash
./gradlew clean :airbyte-integrations:connectors:source-stock-ticker-api:build
```
This is the equivalent of running `docker build . -t airbyte/source-stock-ticker-api:dev` from the
connector root, where the tag `airbyte/source-stock-ticker-api` is extracted from the label
`LABEL io.airbyte.name` inside your `Dockerfile`.
Verify the image was built by running:
```bash
$ docker images | head
REPOSITORY                          TAG       IMAGE ID       CREATED        SIZE
airbyte/source-stock-ticker-api     dev       ...            ...            ...
<none>                              <none>    1caf57c72afd   3 hours ago    121MB
```
`airbyte/source-stock-ticker-api` was built and tagged with the `dev` tag. Now let's head to the
last step.
#### 2. Add the connector via the Airbyte UI
If the Airbyte server isn't already running, start it by running **from the Airbyte repository
root**:
```bash
docker compose up
```
When the Airbyte server is done starting up, it prints the following banner in the log output \(it
can take 10-20 seconds for the server to start\):
```bash
airbyte-server | 2022-03-11 18:38:33 INFO i.a.s.ServerApp(start):121 -
airbyte-server | ...
airbyte-server | Version: dev
airbyte-server |
```
After you see the above banner printed out in the terminal window where you are running
`docker compose up`, visit [http://localhost:8000](http://localhost:8000) in your browser and log in
with the default credentials: username `airbyte` and password `password`.
If this is the first time using the Airbyte UI, then you will be prompted to go through a first-time
wizard. To skip it, click the "Skip Onboarding" button.
In the UI, click the "Settings" button in the left side bar:
![](../../../.gitbook/assets/newsourcetutorial_sidebar_settings.png)
Then on the Settings page, select Sources
![](../../../.gitbook/assets/newsourcetutorial_settings_page.png)
Then on the Settings/Sources page, click "+ New Connector" button at the top right:
![](../../../.gitbook/assets/newsourcetutorial_settings_sources_newconnector.png)
On the modal that pops up, enter the following information, then click "Add":
![](../../../.gitbook/assets/newsourcetutorial_new_connector_modal.png)
After you click "Add", the modal will close and you will be back at the Settings page.
Now click "Sources" in the navigation bar on the left:
After you click "Add", the modal will close and you will be back at the Settings page. Now click
"Sources" in the navigation bar on the left:
![](../../../.gitbook/assets/newsourcetutorial_sources_navbar.png)
You will be redirected to the Sources page, which, if you have not set up any connections, will be
empty. On the Sources page, click "+ new source" in the top right corner:
![](../../../.gitbook/assets/newsourcetutorial_sources_page.png)
A new modal will prompt you for details of the new source. Type "Stock Ticker" in the Name field.
Then, find your connector in the Source type dropdown. We have lots of connectors already, so it
might be easier to find your connector by typing part of its name:
![](../../../.gitbook/assets/newsourcetutorial_find_your_connector.png)
After you select your connector in the Source type dropdown, the modal will show two more fields:
API Key and Stock Ticker. Remember that `spec.json` file you created at the very beginning of this
tutorial? These fields should correspond to the `properties` section of that file. Copy-paste your
Polygon.io API key and a stock ticker into these fields, then click the "Set up source" button at
the bottom right of the modal.
![](../../../.gitbook/assets/newsourcetutorial_source_config.png)
Once you click "Set up source", Airbyte will spin up your connector and run "check" method to verify the configuration.
You will see a progress bar briefly and if the configuration is valid, you will see a success message,
the modal will close and you will see your connector on the updated Sources page.
Once you click "Set up source", Airbyte will spin up your connector and run "check" method to verify
the configuration. You will see a progress bar briefly and if the configuration is valid, you will
see a success message, the modal will close and you will see your connector on the updated Sources
page.
![](../../../.gitbook/assets/newsourcetutorial_sources_stock_ticker.png)
The next step is to add a destination. On the same page, click "add destination" and then click
"+ add a new destination":
![](../../../.gitbook/assets/newsourcetutorial_add_destination_new_destination.png)
"New destination" wizard will show up. Type a name (e.g. "Local JSON") into the Name field and select "Local JSON" in Destination type drop-down.
After you select the destination type, type `/local/tutorial_json` into Destination path field.
When we run syncs, we'll find the output on our local filesystem in `/tmp/airbyte_local/tutorial_json`.
"New destination" wizard will show up. Type a name (e.g. "Local JSON") into the Name field and
select "Local JSON" in Destination type drop-down. After you select the destination type, type
`/local/tutorial_json` into Destination path field. When we run syncs, we'll find the output on our
local filesystem in `/tmp/airbyte_local/tutorial_json`.
Click "Set up destination" at the lower right of the form.
![](../../../.gitbook/assets/newsourcetutorial_add_destination.png)
After that, Airbyte will test the destination and prompt you to configure the connection between
the Stock Ticker source and the Local JSON destination. Select "Mirror source structure" in the
Destination Namespace, check the checkbox next to the stock_prices stream, and click the "Set up
connection" button at the bottom of the form:
![](../../../.gitbook/assets/newsourcetutorial_configure_connection.png)
Ta-da! Your connection is now configured to sync once a day. You will see your new connection on the
next screen:
![](../../../.gitbook/assets/newsourcetutorial_connection_done.png)
Airbyte will run the first sync job as soon as your connection is saved. Navigate to "Connections"
in the side bar and wait for the first sync to succeed:
![](../../../.gitbook/assets/newsourcetutorial_first_sync.png)
Let's verify the output. From your shell, run:
```bash
$ cat /tmp/airbyte_local/tutorial_json/_airbyte_raw_stock_prices.jsonl
...
{"_airbyte_ab_id":"0b7a8d33-4500-4a6d-9d74-11716bd22f01","_airbyte_emitted_at":1647026803000,"_airbyte_data":{"date":"2022-03-10","stock_ticker":"TSLA","price":838.3}}
```
Congratulations! We've successfully written a fully functioning Airbyte connector. You're an Airbyte
contributor now ;\)
1. Follow the [next tutorial](adding-incremental-sync.md) to implement incremental sync.
2. Implement another connector using the Low-code CDK,
   [Connector Builder](../../connector-builder-ui/overview.md), or the
   [Connector Development Kit](https://github.com/airbytehq/airbyte/tree/master/airbyte-cdk/python/docs/tutorials)
3. We welcome low-code configuration based connector contributions! If you make a connector in the
   connector builder and want to share it with everyone using Airbyte, pull requests are welcome!
## Additional guides
- [Building a Python Source](https://docs.airbyte.com/connector-development/tutorials/building-a-python-source.md)
- [Building a Java Destination](https://docs.airbyte.com/connector-development/tutorials/building-a-java-destination)