GitBook: [master] 84 pages and 72 assets modified
@@ -1,4 +1,4 @@
# Overview
# Introduction

@@ -32,7 +32,7 @@ docker-compose up
Now visit [http://localhost:8000](http://localhost:8000)

Here is a [step-by-step guide](docs/quickstart/getting-started.md) showing you how to load data from an API into a file, all on your computer.
Here is a [step-by-step guide](https://github.com/airbytehq/airbyte/tree/e378d40236b6a34e1c1cb481c8952735ec687d88/docs/quickstart/getting-started.md) showing you how to load data from an API into a file, all on your computer.

## Features
Binary image assets under docs/.gitbook/assets/ were modified, added, or removed in this commit. New duplicate "(2)"/"(3)" copies (and "(3)"/"(4)" for datasources) were added for: change-to-per-week.png, datasources.png, duration-spent-in-weekly-webinars.png, evolution-of-meetings-per-week.png, launch.png, meetings-participant-ranked.png, postgres_credentials.png, schema.png, setup-successful.png, sync-screen.png, tableau-dashboard.png, and zoom-marketplace-build-screen.png.
@@ -10,11 +10,11 @@
* [Browsing Output Logs](tutorials/browsing-output-logs.md)
* [Upgrading Airbyte](tutorials/upgrading-airbyte.md)
* [Using the Airflow Airbyte Operator](tutorials/using-the-airflow-airbyte-operator.md)
* [Contributing to Airbyte](contributing-to-airbyte/tutorials/README.md)
* [A Beginner's Guide to the AirbyteCatalog](contributing-to-airbyte/tutorials/beginners-guide-to-catalog.md)
* [Building a Toy Connector](contributing-to-airbyte/tutorials/toy-connector.md)
* [Adding Incremental to a Source](contributing-to-airbyte/tutorials/adding-incremental-sync.md)
* [Building a Python Source](contributing-to-airbyte/tutorials/building-a-python-source.md)
* [Contributing to Airbyte](tutorials/tutorials/README.md)
* [A Beginner's Guide to the AirbyteCatalog](tutorials/tutorials/beginners-guide-to-catalog.md)
* [Building a Toy Connector](tutorials/tutorials/toy-connector.md)
* [Adding Incremental to a Source](tutorials/tutorials/adding-incremental-sync.md)
* [Building a Python Source](tutorials/tutorials/building-a-python-source.md)
* [Transformations and Normalization](tutorials/transformation-and-normalization/README.md)
* [Transformations with SQL \(Part 1/2\)](tutorials/transformation-and-normalization/transformations-with-sql.md)
* [Transformations with DBT \(Part 2/2\)](tutorials/transformation-and-normalization/transformations-with-dbt.md)

@@ -95,20 +95,20 @@
* [Monorepo Python Development](contributing-to-airbyte/building-new-connector/monorepo-python-development.md)
* [Testing Connectors](contributing-to-airbyte/building-new-connector/testing-connectors.md)
* [Standard Source Test Suite](contributing-to-airbyte/building-new-connector/standard-source-tests.md)
* [Using the Airbyte CDK](contributing-to-airbyte/tutorials/cdk-tutorial-alpha/README.md)
* [Getting Started](contributing-to-airbyte/tutorials/cdk-tutorial-alpha/0-getting-started.md)
* [Step 1: Creating the Source](contributing-to-airbyte/tutorials/cdk-tutorial-alpha/1-creating-the-source.md)
* [Step 2: Install Dependencies](contributing-to-airbyte/tutorials/cdk-tutorial-alpha/2-install-dependencies.md)
* [Step 3: Define Inputs](contributing-to-airbyte/tutorials/cdk-tutorial-alpha/3-define-inputs.md)
* [Step 4: Connection Checking](contributing-to-airbyte/tutorials/cdk-tutorial-alpha/4-connection-checking.md)
* [Step 5: Declare the Schema](contributing-to-airbyte/tutorials/cdk-tutorial-alpha/5-declare-schema.md)
* [Step 6: Read Data](contributing-to-airbyte/tutorials/cdk-tutorial-alpha/6-read-data.md)
* [Step 7: Use the Connector in Airbyte](contributing-to-airbyte/tutorials/cdk-tutorial-alpha/7-use-connector-in-airbyte.md)
* [Step 8: Test Connector](contributing-to-airbyte/tutorials/cdk-tutorial-alpha/8-test-your-connector.md)
* [Using the Airbyte CDK](contributing-to-airbyte/cdk-tutorial-alpha/README.md)
* [Getting Started](contributing-to-airbyte/cdk-tutorial-alpha/0-getting-started.md)
* [Step 1: Creating the Source](contributing-to-airbyte/cdk-tutorial-alpha/1-creating-the-source.md)
* [Step 2: Install Dependencies](contributing-to-airbyte/cdk-tutorial-alpha/2-install-dependencies.md)
* [Step 3: Define Inputs](contributing-to-airbyte/cdk-tutorial-alpha/3-define-inputs.md)
* [Step 4: Connection Checking](contributing-to-airbyte/cdk-tutorial-alpha/4-connection-checking.md)
* [Step 5: Declare the Schema](contributing-to-airbyte/cdk-tutorial-alpha/5-declare-schema.md)
* [Step 6: Read Data](contributing-to-airbyte/cdk-tutorial-alpha/6-read-data.md)
* [Step 7: Use the Connector in Airbyte](contributing-to-airbyte/cdk-tutorial-alpha/7-use-connector-in-airbyte.md)
* [Step 8: Test Connector](contributing-to-airbyte/cdk-tutorial-alpha/8-test-your-connector.md)
* [Code Style](contributing-to-airbyte/code-style.md)
* [Updating Documentation](contributing-to-airbyte/updating-documentation.md)
* [Templates](contributing-to-airbyte/templates/README.md)
* [Connector Doc Template](contributing-to-airbyte/templates/integration-documentation-template.md)
* [Connector Doc Template](contributing-to-airbyte/templates/integration-documentation-template.md)
* [Understanding Airbyte](understanding-airbyte/README.md)
* [AirbyteCatalog & ConfiguredAirbyteCatalog](understanding-airbyte/catalog.md)
* [Airbyte Specification](understanding-airbyte/airbyte-specification.md)

@@ -121,9 +121,9 @@
* [High-level View](understanding-airbyte/high-level-view.md)
* [Workers & Jobs](understanding-airbyte/jobs.md)
* [Technical Stack](understanding-airbyte/tech-stack.md)
* [Change Data Capture (CDC)](understanding-airbyte/cdc.md)
* [Change Data Capture \(CDC\)](understanding-airbyte/cdc.md)
* [Namespaces](understanding-airbyte/namespaces.md)
* [API documentation](api-documentation.md)
* [API documentation](api-documentation.md)
* [Project Overview](project-overview/README.md)
* [Roadmap](project-overview/roadmap.md)
* [Changelog](project-overview/changelog/README.md)

@@ -131,7 +131,7 @@
* [Connectors](project-overview/changelog/connectors.md)
* [License](project-overview/license.md)
* [Careers & Open Positions](career-and-open-positions/README.md)
* [Senior Software Engineer](career-and-open-positions/senior-software-engineer.md)
* [Senior Software Engineer](career-and-open-positions/senior-software-engineer.md)
* [FAQ](faq/README.md)
* [Technical Support](faq/technical-support.md)
* [Getting Started](faq/getting-started.md)

@@ -144,3 +144,4 @@
* [Singer vs Airbyte](faq/differences-with/singer-vs-airbyte.md)
* [Pipelinewise vs Airbyte](faq/differences-with/pipelinewise-vs-airbyte.md)
* [Meltano vs Airbyte](faq/differences-with/meltano-vs-airbyte.md)
@@ -1,10 +1,10 @@
# Career & Open Positions
# Careers & Open Positions

## **Who we are**

[Airbyte](http://airbyte.io) is the upcoming open-source standard for EL\(T\). We enable data teams to replicate data from applications, APIs, and databases to data warehouses, lakes, and other destinations. We believe only an open-source approach can solve the problem of data integration, as it enables us to cover the long tail of integrations while enabling teams to adapt prebuilt connectors to their needs.

Airbyte is remote friendly, with most of the team still based in the Silicon Valley. We’re fully open as a company. Our **[company handbook](https://handbook.airbyte.io)**, **[culture & values](https://handbook.airbyte.io/company/culture-and-values)**, **[strategy](https://handbook.airbyte.io/strategy/strategy)** and **[roadmap](../project-overview/roadmap.md)** are open to all.
Airbyte is remote friendly, with most of the team still based in the Silicon Valley. We’re fully open as a company. Our [**company handbook**](https://handbook.airbyte.io), [**culture & values**](https://handbook.airbyte.io/company/culture-and-values), [**strategy**](https://handbook.airbyte.io/strategy/strategy) and [**roadmap**](../project-overview/roadmap.md) are open to all.

We're backed by some of the world's [top investors](./#our-investors) and believe in product-led growth, where we build something awesome and let our product bring the users, rather than an outbound sales engine with cold calls.

@@ -50,12 +50,12 @@ If the written interview is a success, we might set you up with one or 2 additio
Once all of this is done, we will discuss the process internally and get back to you very fast \(velocity is everything here\)! So about 2-3 calls and one written interview, that's it!

## **[Our Benefits](https://handbook.airbyte.io/people/benefits)**
## [**Our Benefits**](https://handbook.airbyte.io/people/benefits)

* **Flexible work environment as fully remote** - we don’t look at when you log in, log out or how much time you work. We trust you, it’s the only way remote can actually work.
* **[Unlimited vacation policy](https://handbook.airbyte.io/people/time-off)** with mandatory minimum time off - so you can fit work around your life.
* **[Co-working space stipend](https://handbook.airbyte.io/people/expense-policy#work-space)** - we provide everyone with $200/month to use on a coworking space of their choice, if any.
* **[Parental leave](https://handbook.airbyte.io/people/time-off#parental-leave)** \(for both parents, after one year spent with the company\) - so those raising families can do so while still working for us.
* [**Unlimited vacation policy**](https://handbook.airbyte.io/people/time-off) with mandatory minimum time off - so you can fit work around your life.
* [**Co-working space stipend**](https://handbook.airbyte.io/people/expense-policy#work-space) - we provide everyone with $200/month to use on a coworking space of their choice, if any.
* [**Parental leave**](https://handbook.airbyte.io/people/time-off#parental-leave) \(for both parents, after one year spent with the company\) - so those raising families can do so while still working for us.
* **Open book policy** - we reimburse books that employees want to purchase for their professional and career development.
* **Continuous learning / training policy** - we sponsor the conferences and training programs you feel would add to your development in the company.
* **Health insurance** for those from countries that do not provide this freely. Through Savvy in the US, which means you can choose the insurance you want and will receive a stipend from the company.
@@ -39,9 +39,9 @@ Wherever you want!
## **Perks!!!**

* **Flexible work environment as fully remote** - we don’t look at when you log in, log out or how much time you work. We trust you, it’s the only way remote can actually work.
* **[Unlimited vacation policy](https://handbook.airbyte.io/people/time-off)** with mandatory minimum time off - so you can fit work around your life.
* **[Co-working space stipend](https://handbook.airbyte.io/people/expense-policy#work-space)** - we provide everyone with $200/month to use on a coworking space of their choice, if any.
* **[Parental leave](https://handbook.airbyte.io/people/time-off#parental-leave)** \(for both parents, after one year spent with the company\) - so those raising families can do so while still working for us.
* [**Unlimited vacation policy**](https://handbook.airbyte.io/people/time-off) with mandatory minimum time off - so you can fit work around your life.
* [**Co-working space stipend**](https://handbook.airbyte.io/people/expense-policy#work-space) - we provide everyone with $200/month to use on a coworking space of their choice, if any.
* [**Parental leave**](https://handbook.airbyte.io/people/time-off#parental-leave) \(for both parents, after one year spent with the company\) - so those raising families can do so while still working for us.
* **Open book policy** - we reimburse books that employees want to purchase for their professional and career development.
* **Continuous learning / training policy** - we sponsor the conferences and training programs you feel would add to your development in the company.
* **Health insurance** for those from countries that do not provide this freely. Through Savvy in the US, which means you can choose the insurance you want and will receive a stipend from the company.
@@ -29,7 +29,7 @@ Here is a list of easy [good first issues](https://github.com/airbytehq/airbyte/
It's easy to add your own connector to Airbyte! **Since Airbyte connectors are encapsulated within Docker containers, you can use any language you like.** Here are some links on how to add sources and destinations. We haven't built the documentation for all languages yet, so don't hesitate to reach out to us if you'd like help developing connectors in other languages.

* See [Building new connectors](building-new-connector/) to get started.
* Since we frequently build connectors in Python, on top of Singer or in Java, we've created generator libraries to get you started quickly: [Build Python Source Connectors](tutorials/building-a-python-source.md) and [Build Java Connectors](building-new-connector/java-connectors.md)
* Since we frequently build connectors in Python, on top of Singer or in Java, we've created generator libraries to get you started quickly: [Build Python Source Connectors](../tutorials/tutorials/building-a-python-source.md) and [Build Java Connectors](building-new-connector/java-connectors.md)
* Integration tests \(tests that run a connector's image against an external resource\) can be run one of three ways, as detailed [here](building-new-connector/testing-connectors.md)

**Please note that, at no point in time, we will ask you to maintain your connector.** The goal is that the Airbyte team and the community helps maintain the connector.

@@ -42,9 +42,9 @@ npm run generate
and choose the relevant template. This will generate a new connector in the `airbyte-integrations/connectors/<your-connector>` directory.

Search the generated directory for "TODO"s and follow them to implement your connector.
Search the generated directory for "TODO"s and follow them to implement your connector.

If you are developing a Python connector, you may find the [building a Python connector tutorial](../tutorials/building-a-python-source.md) helpful.
If you are developing a Python connector, you may find the [building a Python connector tutorial](../../tutorials/tutorials/building-a-python-source.md) helpful.

### 2. Integration tests
@@ -54,14 +54,14 @@ At a minimum, your connector must implement the standard tests described in [Tes
If you're writing in Python or Java, skip this section -- it is provided automatically.

If you're writing in another language, please document the commands needed to:
If you're writing in another language, please document the commands needed to:

1. Build your connector docker image \(usually this is just `docker build .` but let us know if there are necessary flags, gotchas, etc..\)
2. Run any unit or integration tests _in a Docker image_.

Your integration and unit tests must be runnable entirely within a Docker image. This is important to guarantee consistent build environments.

When you submit a PR to Airbyte with your connector, the reviewer will use the commands you provide to integrate your connector into Airbyte's build system as follows:
When you submit a PR to Airbyte with your connector, the reviewer will use the commands you provide to integrate your connector into Airbyte's build system as follows:

1. `:airbyte-integrations:connectors:source-<name>:build` should run unit tests and build the integration's Docker image
2. `:airbyte-integrations:connectors:source-<name>:integrationTest` should run integration tests including Airbyte's Standard test suite.

@@ -4,7 +4,7 @@ This guide contains instructions on how to setup Python with Gradle within the A
## Python Connector Development

Before working with connectors written in Python, we recommend running `./gradlew :airbyte-integrations:connectors:<connector directory name>:build` (e.g. `./gradlew :airbyte-integrations:connectors:source-postgres:build`) from the root project directory. This will create a `virtualenv` and install dependencies for the connector you want to work on as well as any internal Airbyte python packages it depends on.
Before working with connectors written in Python, we recommend running `./gradlew :airbyte-integrations:connectors:<connector directory name>:build` \(e.g. `./gradlew :airbyte-integrations:connectors:source-postgres:build`\) from the root project directory. This will create a `virtualenv` and install dependencies for the connector you want to work on as well as any internal Airbyte python packages it depends on.

When iterating on a single connector, you will often iterate by running
@@ -4,9 +4,10 @@
To ensure a minimum quality bar, Airbyte runs all connectors against the same set of integration tests \(sources & destinations have two different test suites\). Those tests ensure that each connector adheres to the [Airbyte Specification](../../understanding-airbyte/airbyte-specification.md) and responds correctly to Airbyte commands when provided valid \(or invalid\) inputs.

*Note: If you are looking for reference documentation for the deprecated first version of test suites, see [Standard Tests (Legacy)](legacy-standard-source-tests.md).*
_Note: If you are looking for reference documentation for the deprecated first version of test suites, see_ [_Standard Tests \(Legacy\)_](https://github.com/airbytehq/airbyte/tree/e378d40236b6a34e1c1cb481c8952735ec687d88/docs/contributing-to-airbyte/building-new-connector/legacy-standard-source-tests.md)_._

### Architecture of standard tests

The Standard Test Suite runs its tests against the connector's Docker image. It takes as input the configuration file `acceptance-tests-config.yml`.


@@ -15,42 +16,46 @@ The Standard Test Suite use pytest as a test runner and was built as pytest plug
Each test suite has a timeout and will fail if the limit is exceeded.

See all the test cases, their description, and inputs in [Source Acceptance Tests](source-acceptance-tests.md).
See all the test cases, their description, and inputs in [Source Acceptance Tests](https://github.com/airbytehq/airbyte/tree/e378d40236b6a34e1c1cb481c8952735ec687d88/docs/contributing-to-airbyte/building-new-connector/source-acceptance-tests.md).

### Setting up standard tests for your connector

Create `acceptance-test-config.yml`. In most cases, your connector already has this file in its root folder.
Here is an example of the minimal `acceptance-test-config.yml`:
Create `acceptance-test-config.yml`. In most cases, your connector already has this file in its root folder. Here is an example of the minimal `acceptance-test-config.yml`:

```yaml
connector_image: airbyte/source-some-connector:dev
tests:
  spec:
    - spec_path: "some_folder/spec.json"
```

Build your connector image if needed.
```

```text
docker build .
```
Run one of the two scripts in the root of the connector:
- `python -m pytest -p integration_tests.acceptance` - to run tests inside virtual environment
- `./acceptance-test-docker.sh` - to run tests from a docker container

If the test fails you will see detail about the test and where to find its inputs and outputs to reproduce it.
You can also debug failed tests by adding `—pdb —last-failed`:
```
Run one of the two scripts in the root of the connector:

* `python -m pytest -p integration_tests.acceptance` - to run tests inside virtual environment
* `./acceptance-test-docker.sh` - to run tests from a docker container

If the test fails you will see detail about the test and where to find its inputs and outputs to reproduce it. You can also debug failed tests by adding `—pdb —last-failed`:

```text
python -m pytest -p integration_tests.acceptance --pdb --last-failed
```

See other useful pytest options [here](https://docs.pytest.org/en/stable/usage.html)
### Dynamically managing inputs & resources used in standard tests

Since the inputs to standard tests are often static, the file-based runner is sufficient for most connectors. However, in some cases, you may need to run pre or post hooks to dynamically create or destroy resources for use in standard tests.
For example, if we need to spin up a Redshift cluster to use in the test then tear it down afterwards, we need the ability to run code before and after the tests, as well as customize the Redshift cluster URL we pass to the standard tests.
If you have need for this use case, please reach out to us via [Github](https://github.com/airbytehq/airbyte) or [Slack](https://slack.airbyte.io).
We currently support it for Java & Python, and other languages can be made available upon request.
Since the inputs to standard tests are often static, the file-based runner is sufficient for most connectors. However, in some cases, you may need to run pre or post hooks to dynamically create or destroy resources for use in standard tests. For example, if we need to spin up a Redshift cluster to use in the test then tear it down afterwards, we need the ability to run code before and after the tests, as well as customize the Redshift cluster URL we pass to the standard tests. If you have need for this use case, please reach out to us via [Github](https://github.com/airbytehq/airbyte) or [Slack](https://slack.airbyte.io). We currently support it for Java & Python, and other languages can be made available upon request.

#### Python

Create pytest yield-fixture with your custom setup/teardown code and place it in `integration_tests/acceptance.py`,
Example of fixture that starts a docker container before tests and stops before exit:

Create pytest yield-fixture with your custom setup/teardown code and place it in `integration_tests/acceptance.py`, Example of fixture that starts a docker container before tests and stops before exit:

```python
@pytest.fixture(scope="session", autouse=True)
def connector_setup():
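    # --- Illustrative sketch only: the body of this fixture is cut off by the
    # hunk above. A session-scoped yield fixture typically starts the external
    # resource before the tests run and tears it down afterwards. The image
    # name and options below are hypothetical placeholders, not Airbyte's
    # actual example code.
    import docker

    client = docker.from_env()
    container = client.containers.run(
        image="your/resource-image:latest",  # hypothetical image name
        detach=True,
    )
    yield container  # tests execute while the container is running
    container.stop()
```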
@@ -108,3 +113,4 @@ Note that integration tests can be triggered with a slightly different syntax fo
Commits to `master` attempt to launch integration tests. Two workflows launch for each commit: one is a launcher for integration tests, the other is the core build \(the same as the default for PR and branch builds\).

Since some of our connectors use rate-limited external resources, we don't want to overload from multiple commits to master. If a certain threshold of `master` integration tests are running, the integration test launcher passes but does not launch any tests. This can manually be re-run if necessary. The `master` build also runs every few hours automatically, and will launch the integration tests at that time.
@@ -1,19 +1,19 @@
# Building a Python Source for an HTTP API
# Getting Started

## Summary

This is a step-by-step guide for how to create an Airbyte source in Python to read data from an HTTP API. We'll be using the
Exchange Rates API as an example since it is simple and demonstrates a lot of the capabilities of the CDK.
This is a step-by-step guide for how to create an Airbyte source in Python to read data from an HTTP API. We'll be using the Exchange Rates API as an example since it is simple and demonstrates a lot of the capabilities of the CDK.

## Requirements

* Python >= 3.7
* Python >= 3.7
* Docker
* NodeJS (only used to generate the connector). We'll remove the NodeJS dependency soon.
* NodeJS \(only used to generate the connector\). We'll remove the NodeJS dependency soon.

All the commands below assume that `python` points to a version of python >=3.7.9. On some systems, `python` points to a Python2 installation and `python3` points to Python3. If this is the case on your machine, substitute all `python` commands in this guide with `python3`.
All the commands below assume that `python` points to a version of python >=3.7.9. On some systems, `python` points to a Python2 installation and `python3` points to Python3. If this is the case on your machine, substitute all `python` commands in this guide with `python3`.

## Checklist

* Step 1: Create the source using the template
* Step 2: Install dependencies for the new source
* Step 3: Define the inputs needed by your connector

@@ -23,4 +23,5 @@ All the commands below assume that `python` points to a version of python >=3.7.
* Step 7: Use the connector in Airbyte
* Step 8: Write unit tests or integration tests

Each step of the Creating a Source checklist is explained in more detail in the following steps. We also mention how you can submit the connector to be included with the general Airbyte release at the end of the tutorial.
Each step of the Creating a Source checklist is explained in more detail in the following steps. We also mention how you can submit the connector to be included with the general Airbyte release at the end of the tutorial.
@@ -1,4 +1,4 @@
# Step 1: Create the source using template
# Step 1: Creating the Source

Airbyte provides a code generator which bootstraps the scaffolding for our connector.

@@ -11,5 +11,5 @@ $ npm run generate
Select the `Python HTTP API Source` template and then input the name of your connector. For this walk-through we will refer to our source as `python-http-example`. The finalized source code for this tutorial can be found [here](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/source-python-http-tutorial).

The source we will build in this tutorial will pull data from the [Rates API](ratesapi.io), a free and open API which
documents historical exchange rates for fiat currencies.
The source we will build in this tutorial will pull data from the [Rates API](https://github.com/airbytehq/airbyte/tree/d940c78307f09f38198e50e54195052d762af944/docs/contributing-to-airbyte/tutorials/cdk-tutorial-alpha/ratesapi.io), a free and open API which documents historical exchange rates for fiat currencies.
@@ -1,4 +1,5 @@
# Step 2: Install dependencies for the new source
# Step 2: Install Dependencies

Now that you've generated the module, let's navigate to its directory and install dependencies:

```text

@@ -12,12 +13,13 @@ This step sets up the initial python environment. **All** subsequent `python` or
Let's verify everything is working as intended. Run:

```
```text
python main_dev.py spec
```

You should see some output:
```

```text
{"type": "SPEC", "spec": {"documentationUrl": "https://docsurl.com", "connectionSpecification": {"$schema": "http://json-schema.org/draft-07/schema#", "title": "Python Http Tutorial Spec", "type": "object", "required": ["TODO"], "additionalProperties": false, "properties": {"TODO: This schema defines the configuration required for the source. This usually involves metadata such as database and/or authentication information.": {"type": "string", "description": "describe me"}}}}}
```
@@ -26,6 +28,7 @@ We just ran Airbyte Protocol's `spec` command! We'll talk more about this later,
Note that the `main_dev.py` file is a simple script that makes it easy to run your connector. Its invocation format is `python main_dev.py <command> [args]`. See the module's generated `README.md` for the commands it supports.

## Notes on iteration cycle

### Dependencies

Python dependencies for your source should be declared in `airbyte-integrations/connectors/source-<source-name>/setup.py` in the `install_requires` field. You will notice that a couple of Airbyte dependencies are already declared there. Do not remove these; they give your source access to the helper interfaces provided by the generator.

@@ -33,9 +36,11 @@ Python dependencies for your source should be declared in `airbyte-integrations/
You may notice that there is a `requirements.txt` in your source's directory as well. Don't edit this. It is autogenerated and used to provide Airbyte dependencies. All your dependencies should be declared in `setup.py`.

### Development Environment

The commands we ran above created a [Python virtual environment](https://docs.python.org/3/tutorial/venv.html) for your source. If you want your IDE to auto complete and resolve dependencies properly, point it at the virtual env `airbyte-integrations/connectors/source-<source-name>/.venv`. Also anytime you change the dependencies in the `setup.py` make sure to re-run `pip install -r requirements.txt`.

### Iterating on your implementation

There are two ways we recommend iterating on a source. Consider using whichever one matches your style.

**Run the source using python**

@@ -70,3 +75,4 @@ docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/sample_files:/sample_files
Note: Each time you make a change to your implementation you need to re-build the connector image via `docker build . -t airbyte/source-<name>:dev`. This ensures the new python code is added into the docker container.

The nice thing about this approach is that you are running your source exactly as it will be run by Airbyte. The tradeoff is iteration is slightly slower, as the connector is re-built between each change.
@@ -1,4 +1,4 @@
# Step 3: Define the inputs required by your connector
# Step 3: Define Inputs

Each connector declares the inputs it needs to read data from the underlying data source. This is the Airbyte Protocol's `spec` operation.

@@ -10,7 +10,7 @@ The generated code that Airbyte provides, handles implementing the `spec` method
Given that we'll be pulling currency data for our example source, we'll define the following `spec.json`:

```
```text
{
  "documentationUrl": "https://docs.airbyte.io/integrations/sources/exchangeratesapi",
  "connectionSpecification": {

@@ -37,5 +37,7 @@ Given that we'll pulling currency data for our example source, we'll define the
```

In addition to metadata, we define two inputs:

* `start_date`: The beginning date to start tracking currency exchange rates from
* `base`: The currency whose rates we're interested in tracking
* `base`: The currency whose rates we're interested in tracking
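For orientation, here is a minimal sketch of how these two inputs could be declared as JSON-Schema properties, written as a Python dict purely for illustration. The tutorial's actual `spec.json` is only partially visible in the hunk above, so anything beyond the `start_date` and `base` fields is an assumption.

```python
# Illustrative sketch only, not the tutorial's verbatim spec.json.
connection_specification = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Python Http Tutorial Spec",
    "type": "object",
    "required": ["start_date", "base"],
    "additionalProperties": False,
    "properties": {
        # The beginning date to start tracking currency exchange rates from.
        "start_date": {"type": "string", "examples": ["2021-04-01"]},
        # The currency whose rates we're interested in tracking.
        "base": {"type": "string", "examples": ["USD"]},
    },
}
```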
@@ -1,4 +1,5 @@
# Step 4: Implement connection checking
# Step 4: Connection Checking

The second operation in the Airbyte Protocol that we'll implement is the `check` operation.

This operation verifies that the input configuration supplied by the user can be used to connect to the underlying data source. Note that this user-supplied configuration has the values described in the `spec.json` filled in. In other words if the `spec.json` said that the source requires a `username` and `password` the config object might be `{ "username": "airbyte", "password": "password123" }`. You should then implement something that returns a json object reporting, given the credentials in the config, whether we were able to connect to the source.
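The hunks below only show how to exercise the check from the command line. As a point of reference, a `check_connection` implementation consistent with the sample output further down might look roughly like this; the method name and signature are assumed from the CDK's `AbstractSource`, and the currency list simply mirrors the error message shown below.

```python
from typing import Any, Mapping, Tuple


# Sketch of a method on the generated Source class (signature assumed from the CDK).
def check_connection(self, logger, config: Mapping[str, Any]) -> Tuple[bool, Any]:
    valid_currencies = {"DKK", "USD", "CZK", "BGN", "JPY"}  # mirrors the sample output below
    base = config["base"]
    if base not in valid_currencies:
        return False, f"Input currency {base} is invalid. Please input one of the following currencies: {valid_currencies}"
    return True, None
```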
@@ -38,7 +39,7 @@ Following the docstring instructions, we'll change the implementation to verify
Let's test out this implementation by creating two objects: a valid and an invalid config and attempt to give them as input to the connector

```
```text
echo '{"start_date": "2021-04-01", "base": "USD"}' > sample_files/config.json
echo '{"start_date": "2021-04-01", "base": "BTC"}' > sample_files/invalid_config.json
python main_dev.py check --config sample_files/config.json

@@ -47,7 +48,7 @@ python main_dev.py check --config sample_files/invalid_config.json
You should see output like the following:

```
```text
> python main_dev.py check --config sample_files/config.json
{"type": "CONNECTION_STATUS", "connectionStatus": {"status": "SUCCEEDED"}}

@@ -55,4 +56,5 @@ You should see output like the following:
{"type": "CONNECTION_STATUS", "connectionStatus": {"status": "FAILED", "message": "Input currency BTC is invalid. Please input one of the following currencies: {'DKK', 'USD', 'CZK', 'BGN', 'JPY'}"}}
```

While developing, we recommend storing configs which contain secrets in `secrets/config.json` because the `secrets` directory is gitignored by default.
While developing, we recommend storing configs which contain secrets in `secrets/config.json` because the `secrets` directory is gitignored by default.
@@ -1,14 +1,13 @@
# Step 5: Declare the schema of your streams
# Step 5: Declare the Schema

The `discover` method of the Airbyte Protocol returns an `AirbyteCatalog`: an object which declares all the streams output by a connector and their schemas. It also declares the sync modes supported by the stream (full refresh or incremental). See the [catalog tutorial](https://docs.airbyte.io/tutorials/tutorials/beginners-guide-to-catalog) for more information.
The `discover` method of the Airbyte Protocol returns an `AirbyteCatalog`: an object which declares all the streams output by a connector and their schemas. It also declares the sync modes supported by the stream \(full refresh or incremental\). See the [catalog tutorial](https://docs.airbyte.io/tutorials/tutorials/beginners-guide-to-catalog) for more information.

This is a simple task with the Airbyte CDK. For each stream in our connector we'll need to:
1. Create a python `class` in `source.py` which extends `HttpStream`
2. Place a `<stream_name>.json` file in the `source_<name>/schemas/` directory. The name of the file should be the snake_case name of the stream whose schema it describes, and its contents should be the JsonSchema describing the output from that stream.
This is a simple task with the Airbyte CDK. For each stream in our connector we'll need to: 1. Create a python `class` in `source.py` which extends `HttpStream` 2. Place a `<stream_name>.json` file in the `source_<name>/schemas/` directory. The name of the file should be the snake\_case name of the stream whose schema it describes, and its contents should be the JsonSchema describing the output from that stream.

Let's create a class in `source.py` which extends `HttpStream`. You'll notice there are classes with extensive comments describing what needs to be done to implement various connector features. Feel free to read these classes as needed. But for the purposes of this tutorial, let's assume that we are adding classes from scratch either by deleting those generated classes or editing them to match the implementation below.

We'll begin by creating a stream to represent the data that we're pulling from the Exchange Rates API:

```python
class ExchangeRates(HttpStream):
url_base = "https://api.ratesapi.io/"

@@ -32,8 +31,7 @@ class ExchangeRates(HttpStream):
stream_slice: Mapping[str, Any] = None,
next_page_token: Mapping[str, Any] = None,
) -> Iterable[Mapping]:
return None # TODO

return None # TODO
```

Note that this implementation is entirely empty -- we haven't actually done anything. We'll come back to this in the next step. But for now we just want to declare the schema of this stream. We'll declare this as a stream that the connector outputs by returning it from the `streams` method:

@@ -52,22 +50,23 @@ class SourcePythonHttpTutorial(AbstractSource):
# Other authenticators are available for API token-based auth and Oauth2.
auth = NoAuth()
return [ExchangeRates(authenticator=auth)]

```
Having created this stream in code, we'll put a file `exchange_rates.json` in the `schemas/` folder. You can download the JSON file describing the output schema [here](http_api_source_assets/exchange_rates.json) for convenience and place it in `schemas/`.
Having created this stream in code, we'll put a file `exchange_rates.json` in the `schemas/` folder. You can download the JSON file describing the output schema [here](https://github.com/airbytehq/airbyte/tree/d940c78307f09f38198e50e54195052d762af944/docs/contributing-to-airbyte/tutorials/cdk-tutorial-alpha/http_api_source_assets/exchange_rates.json) for convenience and place it in `schemas/`.
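If you prefer to write the file by hand, a trimmed sketch of its shape, inferred from the `discover` output shown further down and expressed as a Python dict for illustration, looks like this; the real file declares one numeric property per currency.

```python
# Trimmed, illustrative sketch of schemas/exchange_rates.json (as a Python dict).
exchange_rates_schema = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "properties": {
        "base": {"type": "string"},   # the base currency that was queried
        "date": {"type": "string"},   # the date the rates apply to
        "rates": {
            "type": "object",
            # one numeric property per currency; only a few shown here
            "properties": {
                "USD": {"type": "number"},
                "EUR": {"type": "number"},
                "GBP": {"type": "number"},
            },
        },
    },
}
```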
With `.json` schema file in place, let's see if the connector can now find this schema and produce a valid catalog:

```
```text
python main_dev.py discover --config sample_files/config.json
```

you should see some output like:
```

```text
{"type": "CATALOG", "catalog": {"streams": [{"name": "exchange_rates", "json_schema": {"$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": {"base": {"type": "string"}, "rates": {"type": "object", "properties": {"GBP": {"type": "number"}, "HKD": {"type": "number"}, "IDR": {"type": "number"}, "PHP": {"type": "number"}, "LVL": {"type": "number"}, "INR": {"type": "number"}, "CHF": {"type": "number"}, "MXN": {"type": "number"}, "SGD": {"type": "number"}, "CZK": {"type": "number"}, "THB": {"type": "number"}, "BGN": {"type": "number"}, "EUR": {"type": "number"}, "MYR": {"type": "number"}, "NOK": {"type": "number"}, "CNY": {"type": "number"}, "HRK": {"type": "number"}, "PLN": {"type": "number"}, "LTL": {"type": "number"}, "TRY": {"type": "number"}, "ZAR": {"type": "number"}, "CAD": {"type": "number"}, "BRL": {"type": "number"}, "RON": {"type": "number"}, "DKK": {"type": "number"}, "NZD": {"type": "number"}, "EEK": {"type": "number"}, "JPY": {"type": "number"}, "RUB": {"type": "number"}, "KRW": {"type": "number"}, "USD": {"type": "number"}, "AUD": {"type": "number"}, "HUF": {"type": "number"}, "SEK": {"type": "number"}}}, "date": {"type": "string"}}}, "supported_sync_modes": ["full_refresh"]}]}}
```

It's that simple! Now the connector knows how to declare your connector's stream's schema. We declare only one stream since our source is simple, but the principle is exactly the same if you had many streams.

You can also dynamically define schemas, but that's beyond the scope of this tutorial. See the [schema docs](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/bases/base-python/docs/schemas.md) for more information.
@@ -1,20 +1,24 @@
# Step 6: Read data from the API
# Step 6: Read Data

Describing schemas is good and all, but at some point we have to start reading data! So let's get to work. But before, let's describe what we're about to do:

The `HttpStream` superclass, as described in the [concepts documentation](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/bases/base-python/README.md), facilitates reading data from HTTP endpoints. It contains built-in functions or helpers for:

* authentication
* pagination
* handling rate limiting or transient errors
* and other useful functionality

In order for it to be able to do this, we have to provide it with a few inputs:

* the URL base and path of the endpoint we'd like to hit
* how to parse the response from the API
* how to perform pagination

Optionally, we can provide additional inputs to customize requests:

* request parameters and headers
* how to recognize rate limit errors, and how long to wait (by default it retries 429 and 5XX errors using exponential backoff)
* how to recognize rate limit errors, and how long to wait \(by default it retries 429 and 5XX errors using exponential backoff\)
* HTTP method and request body if applicable

There are many other customizable options - you can find them in the [`base_python.cdk.streams.http.HttpStream`](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/bases/base-python/base_python/cdk/streams/http.py) class.
@@ -26,12 +30,12 @@ Let's begin by pulling data for the last day's rates by using the `/latest` endp
```python
class ExchangeRates(HttpStream):
url_base = "https://api.ratesapi.io/"

def __init__(self, base: str, **kwargs):
super().__init__()
self.base = base

def path(
self,
stream_state: Mapping[str, Any] = None,

@@ -65,14 +69,9 @@ class ExchangeRates(HttpStream):
# The API does not offer pagination,
# so we return None to indicate there are no more pages in the response
return None
```

This may look big, but that's just because there are lots of (unused, for now) parameters in these methods (those can be hidden with Python's `**kwargs`, but don't worry about it for now). Really we just added a few lines of "significant" code:
1. Added a constructor `__init__` which stores the `base` currency to query for.
2. `return {'base': self.base}` to add the `?base=<base-value>` query parameter to the request based on the `base` input by the user.
3. `return [response.json()]` to parse the response from the API to match the schema of our schema `.json` file.
4. `return "latest"` to indicate that we want to hit the `/latest` endpoint of the API to get the latest exchange rate data.
This may look big, but that's just because there are lots of \(unused, for now\) parameters in these methods \(those can be hidden with Python's `**kwargs`, but don't worry about it for now\). Really we just added a few lines of "significant" code: 1. Added a constructor `__init__` which stores the `base` currency to query for. 2. `return {'base': self.base}` to add the `?base=<base-value>` query parameter to the request based on the `base` input by the user. 3. `return [response.json()]` to parse the response from the API to match the schema of our schema `.json` file. 4. `return "latest"` to indicate that we want to hit the `/latest` endpoint of the API to get the latest exchange rate data.
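Condensed into one sketch, those pieces look roughly like the following. The overridden method names follow the `HttpStream` hooks discussed above (`path` and `request_params` are named in this tutorial; `parse_response` and `next_page_token` are the assumed names of the response-parsing and pagination hooks), and the full parameter lists are the ones already present in the generated class.

```python
# Condensed sketch of the significant lines described above; signatures are
# simplified here, the generated class contains the full overridden ones.
class ExchangeRates(HttpStream):
    url_base = "https://api.ratesapi.io/"

    def __init__(self, base: str, **kwargs):
        super().__init__()
        self.base = base  # 1. store the `base` currency to query for

    def path(self, stream_state=None, stream_slice=None, next_page_token=None) -> str:
        return "latest"  # 4. hit the /latest endpoint for the latest rates

    def request_params(self, stream_state, stream_slice=None, next_page_token=None):
        return {'base': self.base}  # 2. adds the ?base=<base-value> query parameter

    def parse_response(self, response, stream_state=None, stream_slice=None, next_page_token=None):
        return [response.json()]  # 3. parse the response to match the schema .json file

    def next_page_token(self, response):
        return None  # the API does not offer pagination, so there are no more pages
```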
Let's also pass the `base` parameter input by the user to the stream class:

@@ -84,15 +83,15 @@ def streams(self, config: Mapping[str, Any]) -> List[Stream]:
We're now ready to query the API!

To do this, we'll need a [ConfiguredCatalog](https://docs.airbyte.io/tutorials/tutorials/beginners-guide-to-catalog). We've prepared one [here](http_api_source_assets/configured_catalog.json) -- download this and place it in `sample_files/configured_catalog.json`. Then run:
To do this, we'll need a [ConfiguredCatalog](https://docs.airbyte.io/tutorials/tutorials/beginners-guide-to-catalog). We've prepared one [here](https://github.com/airbytehq/airbyte/tree/d940c78307f09f38198e50e54195052d762af944/docs/contributing-to-airbyte/tutorials/cdk-tutorial-alpha/http_api_source_assets/configured_catalog.json) -- download this and place it in `sample_files/configured_catalog.json`. Then run:

```
```text
python main_dev.py read --config sample_files/config.json --catalog sample_files/configured_catalog.json
```

you should see some output lines, one of which is a record from the API:

```
```text
{"type": "RECORD", "record": {"stream": "exchange_rates", "data": {"base": "USD", "rates": {"GBP": 0.7196938353, "HKD": 7.7597848573, "IDR": 14482.4824162185, "ILS": 3.2412081092, "DKK": 6.1532478279, "INR": 74.7852709971, "CHF": 0.915763343, "MXN": 19.8439387671, "CZK": 21.3545717832, "SGD": 1.3261894911, "THB": 31.4398014067, "HRK": 6.2599917253, "EUR": 0.8274720728, "MYR": 4.0979726934, "NOK": 8.3043442284, "CNY": 6.4856433595, "BGN": 1.61836988, "PHP": 48.3516756309, "PLN": 3.770872983, "ZAR": 14.2690111709, "CAD": 1.2436905254, "ISK": 124.9482829954, "BRL": 5.4526272238, "RON": 4.0738932561, "NZD": 1.3841125362, "TRY": 8.3101365329, "JPY": 108.0182043856, "RUB": 74.9555647497, "KRW": 1111.7583781547, "USD": 1.0, "AUD": 1.2840711626, "HUF": 300.6206040546, "SEK": 8.3829540753}, "date": "2021-04-26"}, "emitted_at": 1619498062000}}
```
@@ -101,13 +100,8 @@ There we have it - a stream which reads data in just a few lines of code!
We theoretically _could_ stop here and call it a connector. But let's give adding incremental sync a shot.

## Adding incremental sync

To add incremental sync, we'll do a few things:
1. Pass the `start_date` param input by the user into the stream.
2. Declare the stream's `cursor_field`.
3. Implement the `get_updated_state` method.
4. Implement the `stream_slices` method.
5. Update the `path` method to specify the date to pull exchange rates for.
6. Update the configured catalog to use `incremental` sync when we're testing the stream.
To add incremental sync, we'll do a few things: 1. Pass the `start_date` param input by the user into the stream. 2. Declare the stream's `cursor_field`. 3. Implement the `get_updated_state` method. 4. Implement the `stream_slices` method. 5. Update the `path` method to specify the date to pull exchange rates for. 6. Update the configured catalog to use `incremental` sync when we're testing the stream.

We'll describe what each of these methods do below. Before we begin, it may help to familiarize yourself with how incremental sync works in Airbyte by reading the [docs on incremental](https://docs.airbyte.io/architecture/connections/incremental-append).
@@ -155,7 +149,7 @@ Let's do this by implementing the `get_updated_state` method inside the `Exchang
return {'date': max(current_parsed_date, latest_record_date).strftime('%Y-%m-%d')}
else:
return {'date': self.start_date.strftime('%Y-%m-%d')}
```
```

This implementation compares the date from the latest record with the date in the current state and takes the maximum as the "new" state object.
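Assembled from the fragment above, the whole method looks roughly like this; the parameter names are assumed from the CDK's incremental-stream interface, and `datetime` is already used elsewhere in this file.

```python
# Sketch of the full method, assembled from the fragment above.
def get_updated_state(self, current_stream_state, latest_record):
    if current_stream_state is not None and 'date' in current_stream_state:
        current_parsed_date = datetime.strptime(current_stream_state['date'], '%Y-%m-%d')
        latest_record_date = datetime.strptime(latest_record['date'], '%Y-%m-%d')
        return {'date': max(current_parsed_date, latest_record_date).strftime('%Y-%m-%d')}
    else:
        return {'date': self.start_date.strftime('%Y-%m-%d')}
```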
@@ -177,11 +171,12 @@ We'll implement the `stream_slices` method to return a list of the dates for whi
Optional[Mapping[str, any]]]:
start_date = datetime.strptime(stream_state['date'], '%Y-%m-%d') if stream_state and 'date' in stream_state else self.start_date
return self._chunk_date_range(start_date)
```
```

Each slice will cause an HTTP request to be made to the API. We can then use the information present in the `stream_slice` parameter (a single element from the list we constructed in `stream_slices` above) to set other configurations for the outgoing request like `path` or `request_params`. For more info about stream slicing, see [the slicing docs](../concepts/stream_slices.md).
Each slice will cause an HTTP request to be made to the API. We can then use the information present in the `stream_slice` parameter \(a single element from the list we constructed in `stream_slices` above\) to set other configurations for the outgoing request like `path` or `request_params`. For more info about stream slicing, see [the slicing docs](https://github.com/airbytehq/airbyte/tree/d940c78307f09f38198e50e54195052d762af944/docs/contributing-to-airbyte/tutorials/concepts/stream_slices.md).
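The `_chunk_date_range` helper referenced above is not shown in this hunk; a minimal sketch of such a helper, producing one slice per day from the start date up to today, could look like this.

```python
# Illustrative sketch of a date-chunking helper like the one referenced above:
# one {'date': 'YYYY-MM-DD'} slice per day from start_date to today.
# Assumes `from datetime import datetime, timedelta` at the top of the file.
def _chunk_date_range(self, start_date):
    dates = []
    while start_date < datetime.now():
        dates.append({'date': start_date.strftime('%Y-%m-%d')})
        start_date += timedelta(days=1)
    return dates
```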
In order to pull data for a specific date, the Exchange Rates API requires that we pass the date as the path component of the URL. Let's override the `path` method to achieve this:

```python
def path(self, stream_state: Mapping[str, Any] = None, stream_slice: Mapping[str, Any] = None, next_page_token: Mapping[str, Any] = None) -> str:
return stream_slice['date']

@@ -190,7 +185,8 @@ def path(self, stream_state: Mapping[str, Any] = None, stream_slice: Mapping[str
With these changes, your implementation should look like the file [here](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-python-http-tutorial/source_python_http_tutorial/source.py).

The last thing we need to do is change the `sync_mode` field in the `sample_files/configured_catalog.json` to `incremental`:
```

```text
"sync_mode": "incremental",
```
@@ -198,12 +194,13 @@ We should now have a working implementation of incremental sync!
Let's try it out:

```
```text
python main_dev.py read --config sample_files/config.json --catalog sample_files/configured_catalog.json
```

You should see a bunch of `RECORD` messages and `STATE` messages. To verify that incremental sync is working, pass the input state back to the connector and run it again:
```

```text
# Save the latest state to sample_files/state.json
python main_dev.py read --config sample_files/config.json --catalog sample_files/configured_catalog.json | grep STATE | tail -n 1 | jq .state.data > sample_files/state.json

@@ -213,4 +210,5 @@ python main_dev.py read --config sample_files/config.json --catalog sample_files
You should see that only the record from the last date is being synced! This is acceptable behavior, since Airbyte requires at-least-once delivery of records, so repeating the last record twice is OK.

With that, we've implemented incremental sync for our connector!
With that, we've implemented incremental sync for our connector!
@@ -1,4 +1,6 @@

# Step 7: Use the Connector in Airbyte

To use your connector in your own installation of Airbyte, build the docker image for your container by running `docker build . -t airbyte/source-python-http-example:dev`. Then, follow the instructions from the [building a toy source tutorial](https://docs.airbyte.io/tutorials/tutorials/toy-connector#use-the-connector-in-the-airbyte-ui) for using the connector in the Airbyte UI, replacing the name as appropriate.

Note: your built docker image must be accessible to the `docker` daemon running on the Airbyte node. If you're doing this tutorial locally, these instructions are sufficient. Otherwise you may need to push your Docker image to Dockerhub.

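Before wiring it into the UI, you can sanity-check the image by invoking the connector's `spec` command directly (a quick smoke test; the image tag matches the build command above):

```text
docker run --rm airbyte/source-python-http-example:dev spec
```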
@@ -1,14 +1,18 @@

# Step 8: Test Connector

## Unit Tests

Add any relevant unit tests to the `unit_tests` directory. Unit tests should **not** depend on any secrets.

You can run the tests using `python -m pytest -s unit_tests`.

## Integration Tests

Place any integration tests in the `integration_tests` directory such that they can be [discovered by pytest](https://docs.pytest.org/en/reorganize-docs/new-docs/user/naming_conventions.html).

## Standard Tests

Standard tests are a fixed set of tests Airbyte provides that every Airbyte source connector must pass. While they're only required if you intend to submit your connector to Airbyte, you might find them helpful in any case. See [Testing your connectors](https://docs.airbyte.io/contributing-to-airbyte/building-new-connector/testing-connectors).

If you want to submit this connector to become a default connector within Airbyte, follow steps 8 onwards from the [Python source checklist](https://docs.airbyte.io/tutorials/tutorials/building-a-python-source#step-8-set-up-standard-tests).

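As a starting point, a unit test for the incremental pieces might look like the following sketch (the class name and constructor arguments are assumptions; adapt them to your connector):

```python
# unit_tests/test_streams.py -- illustrative sketch, not the tutorial's exact test
from source_python_http_tutorial.source import ExchangeRates


def test_path_uses_slice_date():
    # Constructor arguments here are hypothetical placeholders.
    stream = ExchangeRates(base="USD", start_date="2021-01-01")
    # The path override shown earlier should echo back the slice's date.
    assert stream.path(stream_slice={"date": "2021-04-01"}) == "2021-04-01"
```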
@@ -0,0 +1,2 @@

# Using the Airbyte CDK

@@ -1 +0,0 @@

# Tutorials

@@ -1 +0,0 @@

# CDK Python Source Tutorial (Alpha)

@@ -1,7 +1,8 @@

# On AWS ECS \(Coming Soon\)

{% hint style="info" %}
We do not currently support deployment on ECS.
{% endhint %}

The current iteration is not compatible with ECS. Airbyte currently relies on docker containers being able to create other docker containers. ECS does not permit containers to do this. We will be revising this strategy soon, so that we can be compatible with ECS and other container services.

@@ -1 +1,2 @@

# Example Use Cases

@@ -44,19 +44,19 @@ Choosing Zoom as **source type** will cause Airbyte to display the configuration

![](../.gitbook/assets/zoom-connector-form%20%283%29%20%282%29.png)

The Zoom connector for Airbyte requires you to provide it with a Zoom JWT token. Let’s take a detour and look at how to obtain one from Zoom.

### Obtaining a Zoom JWT Token

To obtain a Zoom JWT Token, log in to your Zoom account and go to the [Zoom Marketplace](https://marketplace.zoom.us/). If this is your first time in the marketplace, you will need to agree to the Zoom marketplace terms of use.

Once you are in, click on the **Develop** dropdown and then click on **Build App**.

![](../.gitbook/assets/zoom-marketplace-build-screen%20%283%29%20%283%29.png)

Clicking on **Build App** for the first time will display a modal for you to accept the Zoom API license and terms of use. Accept if you agree, and you will be presented with the below screen.

![](../.gitbook/assets/build_app%20%283%29%20%283%29.png)

Select **JWT** as the app you want to build and click on the **Create** button on the card. You will be presented with a modal to enter the app name; type in `airbyte-zoom`.

@@ -78,15 +78,15 @@ After copying it, click on the **Continue** button.

![](../.gitbook/assets/activation%20%283%29%20%282%29.png)

You will be taken to a screen to activate **Event Subscriptions**. Just leave it as is, as we won’t be needing Webhooks. Click on **Continue**, and your app should be marked as activated.

### Connecting Zoom on Airbyte

So let’s go back to the Airbyte web UI and provide it with the JWT token we copied from our Zoom app.

Now click on the **Set up source** button. You will see the below success message when the connection is made successfully.

![](../.gitbook/assets/connection-successful%20%283%29%20%283%29.png)

And you will be taken to the page to add your destination.

@@ -94,19 +94,19 @@ And you will be taken to the page to add your destination.

![](../.gitbook/assets/destination%20%283%29%20%282%29.png)

For our destination, we will be using a PostgreSQL database, since Tableau supports PostgreSQL as a data source. Click on the **add destination** button, and then in the drop down click on **+ add a new destination**. In the page that presents itself, add the destination name and choose the Postgres destination.

![](../.gitbook/assets/choose-postgres-destination%20%283%29%20%282%29.png)

To supply Airbyte with the PostgreSQL configuration parameters needed to make a PostgreSQL destination, we will spin up a PostgreSQL container with Docker using the following command in our terminal.

`docker run --rm --name airbyte-zoom-db -e POSTGRES_PASSWORD=password -v airbyte_zoom_data:/var/lib/postgresql/data -p 2000:5432 -d postgres`

This will spin up a Docker container and persist the data we will be replicating in the PostgreSQL database in a Docker volume `airbyte_zoom_data`.

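Given the flags above, the connection parameters to enter in the Airbyte form are roughly the following (the host depends on your Docker setup, e.g. `localhost` or `host.docker.internal`, and `public` is the default schema):

```text
host: localhost
port: 2000
database: postgres
username: postgres
password: password
schema: public
```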

Now, let’s supply the above credentials to the Airbyte UI requiring those credentials.

![](../.gitbook/assets/postgres_credentials%20%283%29%20%283%29.png)

Then click on the **Set up destination** button.

@@ -114,20 +114,20 @@ After the connection has been made to your PostgreSQL database successfully, Air

Leave all the fields checked.

![](../.gitbook/assets/fields-to-sync%20%283%29%20%283%29.png)

Select a **Sync frequency** of **manual** and then click on **Set up connection**.

After successfully making the connection, you will see your PostgreSQL destination. Click on the Launch button to start the data replication.

![](../.gitbook/assets/launch%20%283%29%20%283%29.png)

Then click on the **airbyte-zoom-destination** to see the Sync page.

![](../.gitbook/assets/sync-screen%20%283%29%20%283%29.png)

Syncing should take a few minutes or longer depending on the size of the data being replicated. Once Airbyte is done replicating the data, you will get a **succeeded** status.

Then, you can run the following SQL command on the PostgreSQL container to confirm that the sync was done successfully.

`docker exec airbyte-zoom-db psql -U postgres -c "SELECT * FROM public.users;"`

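If you'd rather check counts than scan raw rows, the same approach works for any of the synced tables (the `meetings` table is one example; table names follow the streams you selected):

```text
docker exec airbyte-zoom-db psql -U postgres -c "SELECT COUNT(*) FROM public.meetings;"
```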
@@ -144,15 +144,15 @@ Go ahead and install Tableau on your machine. After the installation is complete

Once your activation is successful, you will see your Tableau dashboard.

![](../.gitbook/assets/tableau-dashboard%20%283%29%20%283%29.png)

On the sidebar menu under the **To a Server** section, click on the **More…** menu. You will see a list of datasource connectors you can connect Tableau with.

![](../.gitbook/assets/datasources%20%283%29%20%283%29.png)

Select **PostgreSQL** and you will be presented with a connection credentials modal.

Fill in the same details of the PostgreSQL database we used as the destination in Airbyte.

![](../.gitbook/assets/postgres_connection%20%283%29%20%282%29.png)

@@ -160,8 +160,6 @@ Next, click on the **Sign In** button. If the connection was made successfully,

_Note: If you are having trouble connecting PostgreSQL with Tableau, it might be because the driver Tableau comes with for PostgreSQL might not work for newer versions of PostgreSQL. You can download the JDBC driver for PostgreSQL_ [_here_](https://www.tableau.com/support/drivers?_ga=2.62351404.1800241672.1616922684-1838321730.1615100968) _and follow the setup instructions._

Now that we have replicated our Zoom data into a PostgreSQL database using Airbyte’s Zoom connector, and connected Tableau with our PostgreSQL database containing our Zoom data, let’s proceed to creating the charts we need to visualize the time spent by a team in Zoom calls.

## Step 3: Create the charts on Tableau with the Zoom data

@@ -172,25 +170,23 @@ To create this chart, we will need to use the count of the meetings and the **cr

![](../.gitbook/assets/connect-to-postgres%20%283%29%20%282%29.png)

Drag the **meetings** table from the sidebar onto the space with the prompt.

Now that we have the meetings table, we can start building out the chart by clicking on **Sheet 1** at the bottom left of Tableau.

![](../.gitbook/assets/sheet-1%20%283%29%20%282%29.png)

As stated earlier, we need **Created At**, but currently it’s a String data type. Let’s change that by converting it to a date time. Right click on **Created At**, select `ChangeDataType`, and choose Date & Time. And that’s it! That field is now of type **Date & Time**.

![](../.gitbook/assets/change-data-type%20%283%29%20%282%29.png)

Next, drag **Created At** to **Columns**.

![](../.gitbook/assets/drag-created-at%20%283%29%20%282%29.png)

Currently, we get the Created At in **YEAR**, but per our requirement we want them in Weeks, so right click on the **YEAR\(Created At\)** and choose **Week Number**.

![](../.gitbook/assets/change-to-per-week%20%283%29%20%283%29.png)

Tableau should now look like this:

@@ -198,7 +194,7 @@ Tableau should now look like this:

Now, to finish up, we need to add the **meetings\(Count\) measure** Tableau already calculated for us to the **Rows** section. So drag **meetings\(Count\)** onto **Rows** to complete the chart.

![](../.gitbook/assets/evolution-of-meetings-per-week%20%283%29%20%283%29.png)

And now we are done with the very first chart. Let's save the sheet and create a new Dashboard that we will add this sheet to as well as the others we will be creating.

@@ -224,19 +220,19 @@ Note: We are adding a filter on the Duration to filter out null values. You can

### Evolution of the number of participants for all meetings per week

For this chart, we will need to have a calculated field called **\# of meetings attended**, which will be an aggregate of the counts of rows matching a particular user's email in the `report_meeting_participants` table plotted against the **Created At** field of the **meetings** table. To get this done, right click on the **User Email** field. Select **create** and click on **calculatedField**, then enter the title of the field as **\# of meetings attended**. Next, enter the below formula:

`COUNT(IF [User Email] == [User Email] THEN [Id (Report Meeting Participants)] END)`

Then click on apply. Finally, drag the **Created At** field \(make sure it’s on the **Weekly** number\) and the calculated field you just created to match the below screenshot:

![](../.gitbook/assets/number-of-participants-per-weekly-meetings%20%283%29%20%283%29.png)

### Listing of team members with the number of meetings per week and number of hours spent in meetings, ranked

To get this chart, we need to create a relationship between the **meetings** table and the `report_meeting_participants` table. You can do this by dragging the `report_meeting_participants` table in as a source alongside the **meetings** table and relating both via the **meeting id**. Then you will be able to create a new worksheet that looks like this:

![](../.gitbook/assets/meetings-participant-ranked%20%283%29%20%283%29.png)

Note: To achieve the ranking, we simply use the sort menu icon on the top menu bar.

@@ -250,7 +246,7 @@ The rest of the charts will be needing the **webinars** and `report_webinar_part

For this chart, as for the meeting’s counterpart, we will get a calculated field off the Duration field to get the **Webinar Duration in Hours**, and then plot **Created At** against the **Sum of Webinar Duration in Hours**, as shown in the screenshot below. Note: Make sure you create a new sheet for each of these graphs.

![](../.gitbook/assets/duration-spent-in-weekly-webinars%20%283%29%20%283%29.png)

### Evolution of the number of participants for all webinars per week

@@ -264,7 +260,7 @@ Below is the chart:

![](../.gitbook/assets/number-of-webinar-participants-per-week%20%283%29%20%282%29.png)

#### Listing of team members with the number of webinars per week and number of hours spent in meetings, ranked

Below is the chart with these specs:

@@ -272,6 +268,7 @@ Below is the chart with these specs

## Conclusion

In this article, we saw how we can use Airbyte to get data off the Zoom API onto a PostgreSQL database, and then use that data to create some chart visualizations in Tableau.

You can leverage Airbyte and Tableau to produce graphs on any collaboration tool. We just used Zoom to illustrate how it can be done. Hope this is helpful!

@@ -1,6 +1,6 @@

# Meltano vs Airbyte

We wrote an article, “[The State of Open-Source Data Integration and ETL](https://airbyte.io/articles/data-engineering-thoughts/the-state-of-open-source-data-integration-and-etl/),” in which we list and compare all ETL-related open-source projects, including Meltano and Airbyte. Don’t hesitate to check it out for more detailed arguments. As a summary, here are the differences:

![](../../.gitbook/assets/meltano-vs-airbyte%20%282%29%20%282%29%20%283%29.png)

@@ -16,7 +16,7 @@ Meltano is a Gitlab side project. Since 2019, they have been iterating on severa

## **Airbyte:**

In contrast, Airbyte is a company fully committed to the open-source MIT project and has a [business model](https://github.com/airbytehq/airbyte/tree/428e10e727c05e5aed4235610ab86f0e5b304864/docs/company-handbook/business-model.md) in mind around this project. Our [team](https://github.com/airbytehq/airbyte/tree/428e10e727c05e5aed4235610ab86f0e5b304864/docs/company-handbook/team.md) is made up of data integration experts that have built more than 1,000 integrations collectively at large scale. The team now counts 20 engineers working full-time on Airbyte.

* **Airbyte supports more than 60 connectors after only 8 months since its inception**, 20% of which were built by the community. Our ambition is to support **200+ connectors by the end of 2021.**
* Airbyte’s connectors are **usable out of the box through a UI and API,** with monitoring, scheduling and orchestration. Airbyte was built on the premise that a user, whatever their background, should be able to move data in 2 minutes. Data engineers might want to use raw data and their own transformation processes, or to use Airbyte’s API to include data integration in their workflows. On the other hand, analysts and data scientists might want to use normalized consolidated data in their database or data warehouses. Airbyte supports all these use cases.

@@ -71,23 +71,26 @@ Depending on your Docker network configuration, you may not be able to connect t

If you are running into connection refused errors when running Airbyte via Docker Compose on Mac, try using `host.docker.internal` as the host. On Linux, you may have to modify `docker-compose.yml` and add a host that maps to your local machine using [`extra_hosts`](https://docs.docker.com/compose/compose-file/compose-file-v3/#extra_hosts).

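As a sketch of the Linux workaround (the service name is illustrative, and the `host-gateway` shorthand requires a reasonably recent Docker):

```text
services:
  worker:
    extra_hosts:
      - "host.docker.internal:host-gateway"
```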
## **Do you support change data capture \(CDC\) or logical replication for databases?**

We currently support [CDC for Postgres 10+](../integrations/sources/postgres.md). We are adding support for a few other databases in April/May 2021.

## **Can I disable analytics in Airbyte?**

Yes, you can control what's sent outside of Airbyte for analytics purposes.

We instrumented some parts of Airbyte for the following reasons:

* measure usage of Airbyte
* measure usage of features & connectors
* collect connector telemetry to measure stability
* reach out to our users if they opt-in
* ...

To disable telemetry, modify the `.env` file and define the two following environment variables:

```text
TRACKING_STRATEGY=logging
PAPERCUPS_STORYTIME=disabled
```

@@ -14,6 +14,7 @@ Airbyte continues to sync data using the configured schema until that schema is

For now, the schema can only be updated manually in the UI \(by clicking "Update Schema" in the settings page for the connection\). When a schema is updated Airbyte will re-sync all data for that source using the new schema.

## **How does Airbyte handle namespaces \(or schemas for the DB-inclined\)?**

Airbyte respects source-defined namespaces when syncing data with a namespace-supported destination. See [this](../understanding-airbyte/namespaces.md) for more details.

@@ -20,10 +20,10 @@ Each stream will be output into its own file. Each file will contain 3 columns:

#### Features

| Feature | Supported |
| :--- | :--- |
| Full Refresh Sync | Yes |
| Incremental - Append Sync | Yes |
| Namespaces | No |

#### Performance considerations

@@ -20,10 +20,10 @@ Each stream will be output into its own file. Each file will a collections of `j

#### Features

| Feature | Supported |
| :--- | :--- |
| Full Refresh Sync | Yes |
| Incremental - Append Sync | Yes |
| Namespaces | No |

#### Performance considerations

@@ -4,9 +4,7 @@

The Airbyte Redshift destination allows you to sync data to Redshift.

This Redshift destination connector has two replication strategies:

1. INSERT: Replicates data via SQL INSERT queries. This is built on top of the destination-jdbc code base and is configured to rely on JDBC 4.2 standard drivers provided by Amazon via Mulesoft [here](https://mvnrepository.com/artifact/com.amazon.redshift/redshift-jdbc42) as described in Redshift documentation [here](https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-install.html). Not recommended for production workloads as this does not scale well.
2. COPY: Replicates data by first uploading data to an S3 bucket and issuing a COPY command. This is the recommended loading approach described by Redshift [best practices](https://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html). Requires an S3 bucket and credentials.

Airbyte automatically picks an approach depending on the given configuration - if S3 configuration is present, Airbyte will use the COPY strategy and vice versa.

@@ -40,7 +38,7 @@ You will need to choose an existing database or create a new database that will

1. Active Redshift cluster
2. Allow connections from Airbyte to your Redshift cluster \(if they exist in separate VPCs\)
3. A staging S3 bucket with credentials \(for the COPY strategy\).

### Setup guide

@@ -62,7 +60,7 @@ You should have all the requirements needed to configure Redshift as a destinati

* **Database**
  * This database needs to exist within the cluster provided.

#### 2a. Fill up S3 info \(for COPY strategy\)

Provide the required S3 info.

@@ -40,9 +40,9 @@ For more information, see the [Facebook Insights API documentation. ](https://de

| Feature | Supported?\(Yes/No\) | Notes |
| :--- | :--- | :--- |
| Full Refresh Sync | Yes |  |
| Incremental Sync | Yes | except AdCreatives |
| Namespaces | No |  |

### Rate Limiting & Performance Considerations

@@ -59,3 +59,4 @@ When you apply for a token, you need to mention:

* That you have full access to the server running the code \(because you're self-hosting Airbyte\)

If for any reason the request gets denied, let us know and we will be able to unblock you.

@@ -1,4 +1,4 @@

# Google Workspace Admin Reports

## Overview

@@ -12,7 +12,7 @@ This Source is capable of syncing the following Streams:

* [drive](https://developers.google.com/admin-sdk/reports/v1/guides/manage-audit-drive)
* [logins](https://developers.google.com/admin-sdk/reports/v1/guides/manage-audit-login)
* [mobile](https://developers.google.com/admin-sdk/reports/v1/guides/manage-audit-mobile)
* [oauth\_tokens](https://developers.google.com/admin-sdk/reports/v1/guides/manage-audit-tokens)

### Data type mapping

@@ -39,16 +39,18 @@ This connector attempts to back off gracefully when it hits Reports API's rate l

## Getting started

### Requirements

* Credentials to a Google Service Account with delegated Domain Wide Authority
* Email address of the workspace admin which created the Service Account

### Create a Service Account with delegated domain wide authority

Follow the Google Documentation for performing [Domain Wide Delegation of Authority](https://developers.google.com/admin-sdk/reports/v1/guides/delegation) to create a Service account with delegated domain wide authority. This account must be created by an administrator of the Google Workspace. Please make sure to grant the following OAuth scopes to the service user:

1. `https://www.googleapis.com/auth/admin.reports.audit.readonly`
2. `https://www.googleapis.com/auth/admin.reports.usage.readonly`

At the end of this process, you should have JSON credentials to this Google Service Account.

You should now be ready to use the Google Workspace Admin Reports API connector in Airbyte.

@@ -2,7 +2,7 @@

## Overview

The Iterable source supports full refresh and incremental sync.

This source can sync data for the [Iterable API](https://api.iterable.com/api/docs).

@@ -12,20 +12,20 @@ Several output streams are available from this source:

* [Campaigns](https://api.iterable.com/api/docs#campaigns_campaigns)
* [Channels](https://api.iterable.com/api/docs#channels_channels)
* [Email Bounce](https://api.iterable.com/api/docs#export_exportDataJson) \(Incremental sync\)
* [Email Click](https://api.iterable.com/api/docs#export_exportDataJson) \(Incremental sync\)
* [Email Complaint](https://api.iterable.com/api/docs#export_exportDataJson) \(Incremental sync\)
* [Email Open](https://api.iterable.com/api/docs#export_exportDataJson) \(Incremental sync\)
* [Email Send](https://api.iterable.com/api/docs#export_exportDataJson) \(Incremental sync\)
* [Email Send Skip](https://api.iterable.com/api/docs#export_exportDataJson) \(Incremental sync\)
* [Email Subscribe](https://api.iterable.com/api/docs#export_exportDataJson) \(Incremental sync\)
* [Email Unsubscribe](https://api.iterable.com/api/docs#export_exportDataJson) \(Incremental sync\)
* [Lists](https://api.iterable.com/api/docs#lists_getLists)
* [List Users](https://api.iterable.com/api/docs#lists_getLists_0)
* [Message Types](https://api.iterable.com/api/docs#messageTypes_messageTypes)
* [Metadata](https://api.iterable.com/api/docs#metadata_list_tables)
* [Templates](https://api.iterable.com/api/docs#templates_getTemplates) \(Incremental sync\)
* [Users](https://api.iterable.com/api/docs#export_exportDataJson) \(Incremental sync\)

If there are more endpoints you'd like Airbyte to support, please [create an issue.](https://github.com/airbytehq/airbyte/issues/new/choose)

@@ -30,12 +30,12 @@ If you do not see a type in this list, assume that it is coerced into a string.

| Feature | Supported | Notes |
| :--- | :--- | :--- |
| Full Refresh Sync | Yes |  |
| Incremental Sync - Append | Yes |  |
| Replicate Incremental Deletes | Coming soon |  |
| Logical Replication \(WAL\) | Coming soon |  |
| SSL Support | Yes |  |
| SSH Tunnel Connection | Coming soon |  |
| Namespaces | Yes | Enabled by default |

## Getting started

@@ -19,12 +19,12 @@ MySQL data types are mapped to the following data types when synchronizing data:

| `date` | string |  |
| `datetime` | string |  |
| `enum` | string |  |
| `tinyint` | number |  |
| `smallint` | number |  |
| `mediumint` | number |  |
| `int` | number |  |
| `bigint` | number |  |
| `numeric` | number |  |
| `string` | string |  |

If you do not see a type in this list, assume that it is coerced into a string. We are happy to take feedback on preferred mappings.

@@ -35,12 +35,12 @@ If you do not see a type in this list, assume that it is coerced into a string.

| Feature | Supported | Notes |
| :--- | :--- | :--- |
| Full Refresh Sync | Yes |  |
| Incremental - Append Sync | Yes |  |
| Replicate Incremental Deletes | Coming soon |  |
| Logical Replication \(WAL\) | Coming soon |  |
| SSL Support | Yes |  |
| SSH Tunnel Connection | Coming soon |  |
| Namespaces | Yes | Enabled by default |

## Getting started

@@ -1,17 +1,12 @@

# Oracle DB

## Overview

The Oracle Database source supports both Full Refresh and Incremental syncs. You can choose if this connector will copy only the new or updated data, or all rows in the tables and columns you set up for replication, every time a sync is run.

### Resulting schema

The Oracle source does not alter the schema present in your database. Depending on the destination connected to this source, however, the schema may be altered. See the destination's documentation for more details.

### Data type mapping

@@ -19,27 +14,26 @@ Oracle data types are mapped to the following data types when synchronizing data

| Oracle Type | Resulting Type | Notes |
| :--- | :--- | :--- |
| `number` | number |  |
| `integer` | number |  |
| `decimal` | number |  |
| `float` | number |  |
| everything else | string |  |

If you do not see a type in this list, assume that it is coerced into a string. We are happy to take feedback on preferred mappings.

### Features

| Feature | Supported | Notes |
| :--- | :--- | :--- |
| Full Refresh Sync | Yes |  |
| Incremental - Append Sync | Yes |  |
| Replicate Incremental Deletes | Coming soon |  |
| Logical Replication \(WAL\) | Coming soon |  |
| SSL Support | Coming soon |  |
| SSH Tunnel Connection | Coming soon |  |
| LogMiner | Coming soon |  |
| Flashback | Coming soon |  |
| Namespaces | Yes | Enabled by default |

## Getting started

@@ -74,9 +68,11 @@ GRANT SELECT ANY TABLE TO airbyte;

```

Or you can be more granular:

```sql
GRANT SELECT ON "<schema_a>"."<table_1>" TO airbyte;
GRANT SELECT ON "<schema_b>"."<table_2>" TO airbyte;
```

Your database user should now be ready for use with Airbyte.

@@ -49,12 +49,12 @@ Postgres data types are mapped to the following data types when synchronizing da

| Feature | Supported | Notes |
| :--- | :--- | :--- |
| Full Refresh Sync | Yes |  |
| Incremental - Append Sync | Yes |  |
| Replicate Incremental Deletes | Yes |  |
| Logical Replication \(WAL\) | Yes |  |
| SSL Support | Yes |  |
| SSH Tunnel Connection | Coming soon |  |
| Namespaces | Yes | Enabled by default |

## Getting started

@@ -100,33 +100,36 @@ ALTER DEFAULT PRIVILEGES IN SCHEMA <schema_name> GRANT SELECT ON TABLES TO airby

#### 3. Set up CDC \(Optional\)

Please read [the section on CDC below](postgres.md#setting-up-cdc-for-postgres) for more information.

#### 4. That's it!

Your database user should now be ready for use with Airbyte.

## Change Data Capture \(CDC\) / Logical Replication / WAL Replication

We use [logical replication](https://www.postgresql.org/docs/10/logical-replication.html) of the Postgres write-ahead log \(WAL\) to incrementally capture deletes using the `pgoutput` plugin.

We do not require installing custom plugins like `wal2json` or `test_decoding`. We use `pgoutput`, which is included in Postgres 10+ by default.

Please read the [CDC docs](../../understanding-airbyte/cdc.md) for an overview of how Airbyte approaches CDC.

### Should I use CDC for Postgres?

* If you need a record of deletions and can accept the limitations posted below, you should use CDC for Postgres.
* If your data set is small and you just want a snapshot of your table in the destination, consider using Full Refresh replication for your table instead of CDC.
* If the limitations prevent you from using CDC and your goal is to maintain a snapshot of your table in the destination, consider using non-CDC incremental and occasionally reset the data and re-sync.
* If your table has a primary key but doesn't have a reasonable cursor field for incremental syncing \(i.e. `updated_at`\), CDC allows you to sync your table incrementally.

### CDC Limitations

* Make sure to read our [CDC docs](../../understanding-airbyte/cdc.md) to see limitations that impact all databases using CDC replication.
* CDC is only available for Postgres 10+.
* Airbyte requires a replication slot configured only for its use. Only one source should be configured that uses this replication slot. Instructions on how to set up a replication slot can be found below.
* Log-based replication only works for master instances of Postgres.
* Using logical replication increases disk space used on the database server. The additional data is stored until it is consumed.
  * We recommend setting frequent syncs for CDC in order to ensure that this data doesn't fill up your disk space.
  * If you stop syncing a CDC-configured Postgres instance to Airbyte, you should delete the replication slot. Otherwise, it may fill up your disk space.
* Our CDC implementation uses at least once delivery for all change records.

### Setting up CDC for Postgres

@@ -134,17 +137,20 @@ Please read the [CDC docs](../../understanding-airbyte/cdc.md) for an overview o

#### Enable logical replication

Follow one of these guides to enable logical replication:

* [Bare Metal, VMs \(EC2/GCE/etc\), Docker, etc.](postgres.md#setting-up-cdc-on-bare-metal-vms-ec2gceetc-docker-etc)
* [AWS Postgres RDS or Aurora](postgres.md#setting-up-cdc-on-aws-postgres-rds-or-aurora)
* [Azure Database for Postgres](postgres.md#setting-up-cdc-on-azure-database-for-postgres)

#### Add user-level permissions

We recommend using a user specifically for Airbyte's replication so you can minimize access. This Airbyte user for your instance needs to be granted `REPLICATION` and `LOGIN` permissions. You can create a role with `CREATE ROLE <name> REPLICATION LOGIN;` and grant that role to the user. You still need to make sure the user can connect to the database, use the schema, and to use `SELECT` on tables \(the same are required for non-CDC incremental syncs and all full refreshes\).

#### Create replication slot

Next, you will need to create a replication slot. Here is the query used to create a replication slot called `airbyte_slot`:

```text
SELECT pg_create_logical_replication_slot('airbyte_slot', 'pgoutput');
```

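With the slot in place, you also need a publication covering the tables you want to replicate with CDC. A sketch, with placeholder table names:

```text
CREATE PUBLICATION airbyte_publication FOR TABLE tbl1, tbl2, tbl3;
```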
@@ -157,10 +163,12 @@ For each table you want to replicate with CDC, you will need to run `CREATE PUBL

The UI currently allows selecting any tables for CDC. If a table is selected that is not part of the publication, it will not replicate even though it is selected. If a table is part of the publication but does not have a replication identity, that replication identity will be created automatically on the first run if the Airbyte user has the necessary permissions.

#### Start syncing

When configuring the source, select CDC and provide the replication slot and publication you just created. You should be ready to sync data with CDC!

### Setting up CDC on Bare Metal, VMs \(EC2/GCE/etc\), Docker, etc.

Some settings must be configured in the `postgresql.conf` file for your database. You can find the location of this file using `psql -U postgres -c 'SHOW config_file'` with the correct `psql` credentials specified. Alternatively, a custom file can be specified when running postgres with the `-c` flag. For example `postgres -c config_file=/etc/postgresql/postgresql.conf` runs Postgres with the config file at `/etc/postgresql/postgresql.conf`.

If you are syncing data from a server using the `postgres` Docker image, you will need to mount a file and change the command to run Postgres with the set config file. If you're just testing CDC behavior, you may want to use a modified version of a [sample `postgresql.conf`](https://github.com/postgres/postgres/blob/master/src/backend/utils/misc/postgresql.conf.sample).

@@ -169,7 +177,8 @@ If you are syncing data from a server using the `postgres` Docker image, you wil

* `max_replication_slots` is the maximum number of replication slots that are allowed to stream WAL changes. This must be one if Airbyte will be the only service subscribing to WAL changes, or more if other services are also reading from the WAL.

Here is what these settings would look like in `postgresql.conf`:

```text
wal_level = logical
max_wal_senders = 1
max_replication_slots = 1
```

@@ -177,27 +186,32 @@ max_replication_slots = 1

After setting these values you will need to restart your instance.

Finally, [follow the rest of steps above](postgres.md#setting-up-cdc-for-postgres).

### Setting up CDC on AWS Postgres RDS or Aurora

* Go to the `Configuration` tab for your DB cluster.
* Find your cluster parameter group. You will either edit the parameters for this group or create a copy of this parameter group to edit. If you create a copy you will need to change your cluster's parameter group before restarting.
* Within the parameter group page, search for `rds.logical_replication`. Select this row and click on the `Edit parameters` button. Set this value to `1`.
* Wait for a maintenance window to automatically restart the instance or restart it manually.
* Finally, [follow the rest of steps above](postgres.md#setting-up-cdc-for-postgres).

### Setting up CDC on Azure Database for Postgres

Use the Azure CLI to:

```text
az postgres server configuration set --resource-group group --server-name server --name azure.replication_support --value logical
az postgres server restart --resource-group group --name server
```

Finally, [follow the rest of steps above](postgres.md#setting-up-cdc-for-postgres).

### Setting up CDC on Google CloudSQL

Unfortunately, logical replication is not configurable for Google CloudSQL. You can indicate your support for this feature on the [Google Issue Tracker](https://issuetracker.google.com/issues/120274585).

### Setting up CDC on other platforms

If you encounter a platform not listed above, please consider [contributing to our docs](https://github.com/airbytehq/airbyte/tree/master/docs) and providing setup instructions.

@@ -64,6 +64,7 @@ This Source is capable of syncing the following [Streams](https://developer.intu

3. Obtain credentials

### Requirements

* Client ID
* Client Secret
* Realm ID

@@ -71,4 +72,5 @@ This Source is capable of syncing the following [Streams](https://developer.intu

The easiest way to get these credentials is by using Quickbook's [OAuth 2.0 playground](https://developer.intuit.com/app/developer/qbo/docs/develop/authentication-and-authorization/oauth-2.0-playground).

**Important note:** The refresh token expires every 100 days. You will need to manually revisit the OAuth playground to obtain a refresh token every 100 days, or your syncs will expire. We plan on offering full OAuth support soon so you don't need to redo this process manually.

@@ -36,7 +36,7 @@ This Source is capable of syncing the following core Streams:

| :--- | :--- | :--- |
| Full Refresh Sync | Yes |  |
| Incremental - Append Sync | Yes |  |
| Namespaces | No |  |

### Performance considerations

@@ -1,85 +1,75 @@

# Smartsheets

### Table of Contents

* [Sync Details](smartsheets.md#sync-details)
* [Column datatype mapping](smartsheets.md#column-datatype-mapping)
* [Features](smartsheets.md#Features)
* [Performance Considerations](smartsheets.md#performance-considerations)
* [Getting Started](smartsheets.md#getting-started)
* [Requirements](smartsheets.md#requirements)
* [Setup Guide](smartsheets.md#setup-guide)
* [Configuring the source in the Airbyte UI](smartsheets.md#configuring-the-source-in-the-airbyte-ui)

## Sync Details

The Smartsheet Source is written to pull data from a single Smartsheet spreadsheet. Unlike Google Sheets, Smartsheets only allows one sheet per Smartsheet - so a given Airbyte connector instance can sync only one sheet at a time.

To replicate multiple spreadsheets, you can create multiple instances of the Smartsheet Source in Airbyte, reusing the API token for all your sheets that you need to sync.

**Note: Column headers must contain only alphanumeric characters or `_` , as specified in the** [**Airbyte Protocol**](../../understanding-airbyte/airbyte-specification.md).

### Column datatype mapping

The data type mapping adopted by this connector is based on the Smartsheet [documentation](https://smartsheet.redoc.ly/tag/columnsRelated#section/Column-Types).

**NOTE**: For any column datatypes interpreted by Smartsheets beside `DATE` and `DATETIME`, this connector's source schema generation assumes a `string` type, in which case the `format` field is not required by Airbyte.

| Integration Type | Airbyte Type | Airbyte Format |
| :--- | :--- | :--- |
| `TEXT_NUMBER` | `string` |  |
| `DATE` | `string` | `format: date` |
| `DATETIME` | `string` | `format: date-time` |
| `anything else` | `string` |  |

The remaining column datatypes supported by Smartsheets are more complex types \(e.g. Predecessor, Dropdown List\) and are not supported by this connector beyond its `string` representation.

### Features

This source connector only supports Full Refresh Sync. Since Smartsheets only allows 5000 rows per sheet, it's likely that the Full Refresh Sync Mode will suit the majority of use-cases.

| Feature | Supported? |
| :--- | :--- |
| Full Refresh Sync | Yes |
| Incremental Sync | No |
| Namespaces | No |

### Performance considerations

At the time of writing, the [Smartsheets API rate limit](https://developers.smartsheet.com/blog/smartsheet-api-best-practices#:~:text=The%20Smartsheet%20API%20currently%20imposes,per%20minute%20per%20Access%20Token.) is 300 requests per minute per API access token. This connector makes 6 API calls per sync operation.

## Getting started

### Requirements

To configure the Smartsheet Source for syncs, you'll need the following:

* A Smartsheets API access token - generated by a Smartsheets user with at least **read** access
* The ID of the spreadsheet you'd like to sync

### Setup guide

#### Obtain a Smartsheets API access token

You can generate an API key for your account from a session of your Smartsheet webapp by clicking:

* Account \(top-right icon\)
* Apps & Integrations
* API Access
* Generate new access token

For questions on advanced authorization flows, refer to [this](https://www.smartsheet.com/content-center/best-practices/tips-tricks/api-getting-started).

#### The spreadsheet ID of your Smartsheet

You'll also need the ID of the Spreadsheet you'd like to sync. Unlike Google Sheets, this ID is not found in the URL. You can find the required spreadsheet ID from your Smartsheet app session by going to:

* File
* Properties

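If you prefer the API over the UI, listing your sheets also returns their IDs. A sketch using the access token from the previous step:

```text
curl -H "Authorization: Bearer <your-access-token>" https://api.smartsheet.com/2.0/sheets
```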
### Configuring the source in the Airbyte UI

To set up your new Smartsheets source, Airbyte will need:

1. Your API access token

@@ -30,7 +30,7 @@ This Source is capable of syncing the following core Streams:

* [Transcriptions](https://www.twilio.com/docs/voice/api/recording-transcription?code-sample=code-read-list-all-transcriptions&code-language=curl&code-sdk-version=json#read-multiple-transcription-resources)
* [Queues](https://www.twilio.com/docs/voice/api/queue-resource#read-multiple-queue-resources)
* [Message media](https://www.twilio.com/docs/sms/api/media-resource#read-multiple-media-resources)
* [Messages](https://www.twilio.com/docs/sms/api/message-resource#read-multiple-message-resources)

\(stream data can only be received for the last 400 days\)
