diff --git a/README.md b/README.md index 514b35f56d8..7fed3aa3003 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ # Introduction -[![GitHub Workflow Status](https://img.shields.io/github/workflow/status/airbytehq/airbyte/Airbyte%20CI)](https://github.com/airbytehq/airbyte/actions/workflows/gradle.yml) [![License](https://img.shields.io/static/v1?label=license&message=MIT&color=brightgreen)](./LICENSE) [![License](https://img.shields.io/static/v1?label=license&message=ELv2&color=brightgreen)](./LICENSE) +[![GitHub Workflow Status](https://img.shields.io/github/workflow/status/airbytehq/airbyte/Airbyte%20CI)](https://github.com/airbytehq/airbyte/actions/workflows/gradle.yml) [![License](https://img.shields.io/static/v1?label=license&message=MIT&color=brightgreen)](https://github.com/airbytehq/airbyte/tree/a9b1c6c0420550ad5069aca66c295223e0d05e27/LICENSE/README.md) [![License](https://img.shields.io/static/v1?label=license&message=ELv2&color=brightgreen)](https://github.com/airbytehq/airbyte/tree/a9b1c6c0420550ad5069aca66c295223e0d05e27/LICENSE/README.md) ![](docs/.gitbook/assets/airbyte_new_logo.svg) @@ -20,7 +20,7 @@ Airbyte is on a mission to make data integration pipelines a commodity. * **No more security compliance process** to go through as Airbyte is self-hosted. * **No more pricing indexed on volume**, as cloud-based solutions offer. -Here's a list of our [connectors with their health status](docs/integrations). +Here's a list of our [connectors with their health status](docs/integrations/). ## Quick start @@ -52,7 +52,7 @@ If you want to schedule a 20-min call with our team to help you get set up, plea We love contributions to Airbyte, big or small. -See our [Contributing guide](docs/contributing-to-airbyte/) on how to get started. Not sure where to start? We’ve listed some [good first issues](https://github.com/airbytehq/airbyte/labels/good%20first%20issue) to start with. If you have any questions, please open a draft PR or visit our [slack channel](slack.airbyte.io) where the core team can help answer your questions. +See our [Contributing guide](docs/contributing-to-airbyte/) on how to get started. Not sure where to start? We’ve listed some [good first issues](https://github.com/airbytehq/airbyte/labels/good%20first%20issue) to start with. If you have any questions, please open a draft PR or visit our [slack channel](https://github.com/airbytehq/airbyte/tree/a9b1c6c0420550ad5069aca66c295223e0d05e27/slack.airbyte.io) where the core team can help answer your questions. **Note that you are able to create connectors using the language you want, as Airbyte connections run as Docker containers.** @@ -73,5 +73,5 @@ Check out our [roadmap](docs/project-overview/roadmap.md) to get informed on wha ## License -See the [LICENSE](docs/project-overview/licenses/README.md) file for licensing information. +See the [LICENSE](docs/project-overview/licenses/) file for licensing information. 
diff --git a/docs/.gitbook/assets/change-to-per-week (3) (1).png b/docs/.gitbook/assets/change-to-per-week (3) (3) (1).png similarity index 100% rename from docs/.gitbook/assets/change-to-per-week (3) (1).png rename to docs/.gitbook/assets/change-to-per-week (3) (3) (1).png diff --git a/docs/.gitbook/assets/change-to-per-week (3) (2).png b/docs/.gitbook/assets/change-to-per-week (3) (3) (2).png similarity index 100% rename from docs/.gitbook/assets/change-to-per-week (3) (2).png rename to docs/.gitbook/assets/change-to-per-week (3) (3) (2).png diff --git a/docs/.gitbook/assets/change-to-per-week (3).png b/docs/.gitbook/assets/change-to-per-week (3) (3) (3).png similarity index 100% rename from docs/.gitbook/assets/change-to-per-week (3).png rename to docs/.gitbook/assets/change-to-per-week (3) (3) (3).png diff --git a/docs/.gitbook/assets/change-to-per-week (3) (3) (4).png b/docs/.gitbook/assets/change-to-per-week (3) (3) (4).png new file mode 100644 index 00000000000..015d141aed6 Binary files /dev/null and b/docs/.gitbook/assets/change-to-per-week (3) (3) (4).png differ diff --git a/docs/.gitbook/assets/datasources (4) (1).png b/docs/.gitbook/assets/datasources (4) (4) (1).png similarity index 100% rename from docs/.gitbook/assets/datasources (4) (1).png rename to docs/.gitbook/assets/datasources (4) (4) (1).png diff --git a/docs/.gitbook/assets/datasources (4) (2).png b/docs/.gitbook/assets/datasources (4) (4) (2).png similarity index 100% rename from docs/.gitbook/assets/datasources (4) (2).png rename to docs/.gitbook/assets/datasources (4) (4) (2).png diff --git a/docs/.gitbook/assets/datasources (4) (3).png b/docs/.gitbook/assets/datasources (4) (4) (3).png similarity index 100% rename from docs/.gitbook/assets/datasources (4) (3).png rename to docs/.gitbook/assets/datasources (4) (4) (3).png diff --git a/docs/.gitbook/assets/datasources (4).png b/docs/.gitbook/assets/datasources (4) (4) (4).png similarity index 100% rename from docs/.gitbook/assets/datasources (4).png rename to docs/.gitbook/assets/datasources (4) (4) (4).png diff --git a/docs/.gitbook/assets/duration-spent-in-weekly-webinars (3) (4).png b/docs/.gitbook/assets/duration-spent-in-weekly-webinars (3) (4).png new file mode 100644 index 00000000000..441c104f5c0 Binary files /dev/null and b/docs/.gitbook/assets/duration-spent-in-weekly-webinars (3) (4).png differ diff --git a/docs/.gitbook/assets/evolution-of-meetings-per-week (3) (1).png b/docs/.gitbook/assets/evolution-of-meetings-per-week (3) (3) (1).png similarity index 100% rename from docs/.gitbook/assets/evolution-of-meetings-per-week (3) (1).png rename to docs/.gitbook/assets/evolution-of-meetings-per-week (3) (3) (1).png diff --git a/docs/.gitbook/assets/evolution-of-meetings-per-week (3) (2).png b/docs/.gitbook/assets/evolution-of-meetings-per-week (3) (3) (2).png similarity index 100% rename from docs/.gitbook/assets/evolution-of-meetings-per-week (3) (2).png rename to docs/.gitbook/assets/evolution-of-meetings-per-week (3) (3) (2).png diff --git a/docs/.gitbook/assets/evolution-of-meetings-per-week (3).png b/docs/.gitbook/assets/evolution-of-meetings-per-week (3) (3) (3).png similarity index 100% rename from docs/.gitbook/assets/evolution-of-meetings-per-week (3).png rename to docs/.gitbook/assets/evolution-of-meetings-per-week (3) (3) (3).png diff --git a/docs/.gitbook/assets/evolution-of-meetings-per-week (3) (3) (4).png b/docs/.gitbook/assets/evolution-of-meetings-per-week (3) (3) (4).png new file mode 100644 index 00000000000..23054ffc2eb Binary files 
/dev/null and b/docs/.gitbook/assets/evolution-of-meetings-per-week (3) (3) (4).png differ diff --git a/docs/.gitbook/assets/launch (3) (1).png b/docs/.gitbook/assets/launch (3) (3) (1).png similarity index 100% rename from docs/.gitbook/assets/launch (3) (1).png rename to docs/.gitbook/assets/launch (3) (3) (1).png diff --git a/docs/.gitbook/assets/launch (3) (2).png b/docs/.gitbook/assets/launch (3) (3) (2).png similarity index 100% rename from docs/.gitbook/assets/launch (3) (2).png rename to docs/.gitbook/assets/launch (3) (3) (2).png diff --git a/docs/.gitbook/assets/launch (3).png b/docs/.gitbook/assets/launch (3) (3) (3).png similarity index 100% rename from docs/.gitbook/assets/launch (3).png rename to docs/.gitbook/assets/launch (3) (3) (3).png diff --git a/docs/.gitbook/assets/launch (3) (3) (4).png b/docs/.gitbook/assets/launch (3) (3) (4).png new file mode 100644 index 00000000000..cfcc543a16b Binary files /dev/null and b/docs/.gitbook/assets/launch (3) (3) (4).png differ diff --git a/docs/.gitbook/assets/meetings-participant-ranked (3) (1).png b/docs/.gitbook/assets/meetings-participant-ranked (3) (3) (1).png similarity index 100% rename from docs/.gitbook/assets/meetings-participant-ranked (3) (1).png rename to docs/.gitbook/assets/meetings-participant-ranked (3) (3) (1).png diff --git a/docs/.gitbook/assets/meetings-participant-ranked (3) (2).png b/docs/.gitbook/assets/meetings-participant-ranked (3) (3) (2).png similarity index 100% rename from docs/.gitbook/assets/meetings-participant-ranked (3) (2).png rename to docs/.gitbook/assets/meetings-participant-ranked (3) (3) (2).png diff --git a/docs/.gitbook/assets/meetings-participant-ranked (3).png b/docs/.gitbook/assets/meetings-participant-ranked (3) (3) (3).png similarity index 100% rename from docs/.gitbook/assets/meetings-participant-ranked (3).png rename to docs/.gitbook/assets/meetings-participant-ranked (3) (3) (3).png diff --git a/docs/.gitbook/assets/meetings-participant-ranked (3) (3) (4).png b/docs/.gitbook/assets/meetings-participant-ranked (3) (3) (4).png new file mode 100644 index 00000000000..2f943f18e90 Binary files /dev/null and b/docs/.gitbook/assets/meetings-participant-ranked (3) (3) (4).png differ diff --git a/docs/.gitbook/assets/postgres_credentials (3) (1).png b/docs/.gitbook/assets/postgres_credentials (3) (3) (1).png similarity index 100% rename from docs/.gitbook/assets/postgres_credentials (3) (1).png rename to docs/.gitbook/assets/postgres_credentials (3) (3) (1).png diff --git a/docs/.gitbook/assets/postgres_credentials (3) (2).png b/docs/.gitbook/assets/postgres_credentials (3) (3) (2).png similarity index 100% rename from docs/.gitbook/assets/postgres_credentials (3) (2).png rename to docs/.gitbook/assets/postgres_credentials (3) (3) (2).png diff --git a/docs/.gitbook/assets/postgres_credentials (3).png b/docs/.gitbook/assets/postgres_credentials (3) (3) (3).png similarity index 100% rename from docs/.gitbook/assets/postgres_credentials (3).png rename to docs/.gitbook/assets/postgres_credentials (3) (3) (3).png diff --git a/docs/.gitbook/assets/postgres_credentials (3) (3) (4).png b/docs/.gitbook/assets/postgres_credentials (3) (3) (4).png new file mode 100644 index 00000000000..b56bc6dc50a Binary files /dev/null and b/docs/.gitbook/assets/postgres_credentials (3) (3) (4).png differ diff --git a/docs/.gitbook/assets/schema (3) (1).png b/docs/.gitbook/assets/schema (3) (3) (1).png similarity index 100% rename from docs/.gitbook/assets/schema (3) (1).png rename to docs/.gitbook/assets/schema (3) 
(3) (1).png diff --git a/docs/.gitbook/assets/schema (3) (2).png b/docs/.gitbook/assets/schema (3) (3) (2).png similarity index 100% rename from docs/.gitbook/assets/schema (3) (2).png rename to docs/.gitbook/assets/schema (3) (3) (2).png diff --git a/docs/.gitbook/assets/schema (3).png b/docs/.gitbook/assets/schema (3) (3) (3).png similarity index 100% rename from docs/.gitbook/assets/schema (3).png rename to docs/.gitbook/assets/schema (3) (3) (3).png diff --git a/docs/.gitbook/assets/schema (3) (3) (4).png b/docs/.gitbook/assets/schema (3) (3) (4).png new file mode 100644 index 00000000000..9d4e5d4b692 Binary files /dev/null and b/docs/.gitbook/assets/schema (3) (3) (4).png differ diff --git a/docs/.gitbook/assets/setup-successful (3) (1).png b/docs/.gitbook/assets/setup-successful (3) (2) (1).png similarity index 100% rename from docs/.gitbook/assets/setup-successful (3) (1).png rename to docs/.gitbook/assets/setup-successful (3) (2) (1).png diff --git a/docs/.gitbook/assets/setup-successful (3) (3).png b/docs/.gitbook/assets/setup-successful (3) (2) (2).png similarity index 100% rename from docs/.gitbook/assets/setup-successful (3) (3).png rename to docs/.gitbook/assets/setup-successful (3) (2) (2).png diff --git a/docs/.gitbook/assets/setup-successful (3).png b/docs/.gitbook/assets/setup-successful (3) (2) (3).png similarity index 100% rename from docs/.gitbook/assets/setup-successful (3).png rename to docs/.gitbook/assets/setup-successful (3) (2) (3).png diff --git a/docs/.gitbook/assets/sync-screen (3) (1).png b/docs/.gitbook/assets/sync-screen (3) (3) (1).png similarity index 100% rename from docs/.gitbook/assets/sync-screen (3) (1).png rename to docs/.gitbook/assets/sync-screen (3) (3) (1).png diff --git a/docs/.gitbook/assets/sync-screen (3) (2).png b/docs/.gitbook/assets/sync-screen (3) (3) (2).png similarity index 100% rename from docs/.gitbook/assets/sync-screen (3) (2).png rename to docs/.gitbook/assets/sync-screen (3) (3) (2).png diff --git a/docs/.gitbook/assets/sync-screen (3).png b/docs/.gitbook/assets/sync-screen (3) (3) (3).png similarity index 100% rename from docs/.gitbook/assets/sync-screen (3).png rename to docs/.gitbook/assets/sync-screen (3) (3) (3).png diff --git a/docs/.gitbook/assets/tableau-dashboard (3) (1).png b/docs/.gitbook/assets/tableau-dashboard (3) (3) (1).png similarity index 100% rename from docs/.gitbook/assets/tableau-dashboard (3) (1).png rename to docs/.gitbook/assets/tableau-dashboard (3) (3) (1).png diff --git a/docs/.gitbook/assets/tableau-dashboard (3) (2).png b/docs/.gitbook/assets/tableau-dashboard (3) (3) (2).png similarity index 100% rename from docs/.gitbook/assets/tableau-dashboard (3) (2).png rename to docs/.gitbook/assets/tableau-dashboard (3) (3) (2).png diff --git a/docs/.gitbook/assets/tableau-dashboard (3).png b/docs/.gitbook/assets/tableau-dashboard (3) (3) (3).png similarity index 100% rename from docs/.gitbook/assets/tableau-dashboard (3).png rename to docs/.gitbook/assets/tableau-dashboard (3) (3) (3).png diff --git a/docs/.gitbook/assets/tableau-dashboard (3) (3) (4).png b/docs/.gitbook/assets/tableau-dashboard (3) (3) (4).png new file mode 100644 index 00000000000..b3c5c91b7ba Binary files /dev/null and b/docs/.gitbook/assets/tableau-dashboard (3) (3) (4).png differ diff --git a/docs/.gitbook/assets/zoom-marketplace-build-screen (3) (1).png b/docs/.gitbook/assets/zoom-marketplace-build-screen (3) (3) (1).png similarity index 100% rename from docs/.gitbook/assets/zoom-marketplace-build-screen (3) (1).png rename to 
docs/.gitbook/assets/zoom-marketplace-build-screen (3) (3) (1).png diff --git a/docs/.gitbook/assets/zoom-marketplace-build-screen (3) (2).png b/docs/.gitbook/assets/zoom-marketplace-build-screen (3) (3) (2).png similarity index 100% rename from docs/.gitbook/assets/zoom-marketplace-build-screen (3) (2).png rename to docs/.gitbook/assets/zoom-marketplace-build-screen (3) (3) (2).png diff --git a/docs/.gitbook/assets/zoom-marketplace-build-screen (3).png b/docs/.gitbook/assets/zoom-marketplace-build-screen (3) (3) (3).png similarity index 100% rename from docs/.gitbook/assets/zoom-marketplace-build-screen (3).png rename to docs/.gitbook/assets/zoom-marketplace-build-screen (3) (3) (3).png diff --git a/docs/.gitbook/assets/zoom-marketplace-build-screen (3) (3) (4).png b/docs/.gitbook/assets/zoom-marketplace-build-screen (3) (3) (4).png new file mode 100644 index 00000000000..f481e066aaa Binary files /dev/null and b/docs/.gitbook/assets/zoom-marketplace-build-screen (3) (3) (4).png differ diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md index a110e92ff4a..a8b273a6b97 100644 --- a/docs/SUMMARY.md +++ b/docs/SUMMARY.md @@ -8,7 +8,7 @@ * [Set up a Connection](quickstart/set-up-a-connection.md) * [Deploying Airbyte](deploying-airbyte/README.md) * [Local Deployment](deploying-airbyte/local-deployment.md) - * [On Airbyte Cloud](deploying-airbyte/on-cloud.md) + * [On Airbyte Cloud](deploying-airbyte/on-cloud.md) * [On AWS \(EC2\)](deploying-airbyte/on-aws-ec2.md) * [On AWS ECS \(Coming Soon\)](deploying-airbyte/on-aws-ecs.md) * [On Azure\(VM\)](deploying-airbyte/on-azure-vm-cloud-shell.md) @@ -139,14 +139,14 @@ * [Databricks](integrations/destinations/databricks.md) * [DynamoDB](integrations/destinations/dynamodb.md) * [Chargify](integrations/destinations/keen.md) - * [Google Cloud Storage (GCS)](integrations/destinations/gcs.md) + * [Google Cloud Storage \(GCS\)](integrations/destinations/gcs.md) * [Google PubSub](integrations/destinations/pubsub.md) * [Kafka](integrations/destinations/kafka.md) - * [Keen](integrations/destinations/keen.md) + * [Keen](integrations/destinations/keen-1.md) * [Local CSV](integrations/destinations/local-csv.md) * [Local JSON](integrations/destinations/local-json.md) * [MeiliSearch](integrations/destinations/meilisearch.md) - * [MongoDB](integrations/destinations/mongodb.md) + * [MongoDB](integrations/destinations/mongodb.md) * [MSSQL](integrations/destinations/mssql.md) * [MySQL](integrations/destinations/mysql.md) * [Oracle DB](integrations/destinations/oracle.md) @@ -179,7 +179,7 @@ * [HTTP-API-based Connectors](connector-development/cdk-python/http-streams.md) * [Python Concepts](connector-development/cdk-python/python-concepts.md) * [Stream Slices](connector-development/cdk-python/stream-slices.md) - * [Connector Development Kit \(Javascript\)](connector-development/cdk-faros-js/README.md) + * [Connector Development Kit \(Javascript\)](connector-development/cdk-faros-js.md) * [Airbyte 101 for Connector Development](connector-development/airbyte101.md) * [Testing Connectors](connector-development/testing-connectors/README.md) * [Source Acceptance Tests Reference](connector-development/testing-connectors/source-acceptance-tests-reference.md) @@ -227,3 +227,4 @@ * [On Setting up a New Connection](troubleshooting/new-connection.md) * [On Running a Sync](troubleshooting/running-sync.md) * [On Upgrading](troubleshooting/on-upgrading.md) + diff --git a/docs/connector-development/README.md b/docs/connector-development/README.md index cb38d961aa8..fe8ce35eb40 100644 --- 
a/docs/connector-development/README.md +++ b/docs/connector-development/README.md @@ -6,13 +6,13 @@ To build a new connector in Java or Python, we provide templates so you don't ne **Note: you are not required to maintain the connectors you create.** The goal is that the Airbyte core team and the community help maintain the connector. -## Python Connector-Development Kit (CDK) +## Python Connector-Development Kit \(CDK\) -You can build a connector very quickly in Python with the [Airbyte CDK](cdk-python/README.md), which generates 75% of the code required for you. +You can build a connector very quickly in Python with the [Airbyte CDK](cdk-python/), which generates 75% of the code required for you. -## TS/JS Connector-Development Kit (Faros AI Airbyte CDK) +## TS/JS Connector-Development Kit \(Faros AI Airbyte CDK\) -You can build a connector in TypeScript/JavaScript with the [Faros AI CDK](./cdk-faros-js/README.md), which generates and boostraps most of the code required for HTTP Airbyte sources. +You can build a connector in TypeScript/JavaScript with the [Faros AI CDK](https://github.com/airbytehq/airbyte/tree/01b905a38385ca514c2d9c07cc44a8f9a48ce762/docs/connector-development/cdk-faros-js/README.md), which generates and boostraps most of the code required for HTTP Airbyte sources. ## The Airbyte specification @@ -25,7 +25,7 @@ Before building a new connector, review [Airbyte's data protocol specification]( To add a new connector you need to: 1. Implement & Package your connector in an Airbyte Protocol compliant Docker image -2. Add integration tests for your connector. At a minimum, all connectors must pass [Airbyte's standard test suite](testing-connectors/README.md), but you can also add your own tests. +2. Add integration tests for your connector. At a minimum, all connectors must pass [Airbyte's standard test suite](testing-connectors/), but you can also add your own tests. 3. Document how to build & test your connector 4. Publish the Docker image containing the connector @@ -36,11 +36,13 @@ Each requirement has a subsection below. If you are building a connector in any of the following languages/frameworks, then you're in luck! We provide autogenerated templates to get you started quickly: #### Sources + * **Python Source Connector** * [**Singer**](https://singer.io)**-based Python Source Connector**. [Singer.io](https://singer.io/) is an open source framework with a large community and many available connectors \(known as taps & targets\). To build an Airbyte connector from a Singer tap, wrap the tap in a thin Python package to make it Airbyte Protocol-compatible. See the [Github Connector](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/source-github-singer) for an example of an Airbyte Connector implemented on top of a Singer tap. * **Generic Connector**: This template provides a basic starting point for any language. #### Destinations + * **Java Destination Connector** * **Python Destination Connector** @@ -58,7 +60,7 @@ and choose the relevant template by using the arrow keys. This will generate a n Search the generated directory for "TODO"s and follow them to implement your connector. 
For more detailed walkthroughs and instructions, follow the relevant tutorial: * [Speedrun: Building a HTTP source with the CDK](tutorials/cdk-speedrun.md) -* [Building a HTTP source with the CDK](tutorials/cdk-tutorial-python-http) +* [Building a HTTP source with the CDK](tutorials/cdk-tutorial-python-http/) * [Building a Python source](tutorials/building-a-python-source.md) * [Building a Python destination](tutorials/building-a-python-destination.md) * [Building a Java destination](tutorials/building-a-java-destination.md) @@ -67,9 +69,9 @@ As you implement your connector, make sure to review the [Best Practices for Con ### 2. Integration tests -At a minimum, your connector must implement the acceptance tests described in [Testing Connectors](testing-connectors/README.md) +At a minimum, your connector must implement the acceptance tests described in [Testing Connectors](testing-connectors/) -**Note: Acceptance tests are not yet available for Python destination connectors. Coming [soon](https://github.com/airbytehq/airbyte/issues/4698)!** +**Note: Acceptance tests are not yet available for Python destination connectors. Coming** [**soon**](https://github.com/airbytehq/airbyte/issues/4698)**!** ### 3. Document building & testing your connector @@ -88,10 +90,12 @@ When you submit a PR to Airbyte with your connector, the reviewer will use the c 2. `:airbyte-integrations:connectors:source-:integrationTest` should run integration tests including Airbyte's Standard test suite. ### 4. Publish the connector -Typically this will be handled as part of code review by an Airbyter. There is a section below on what steps are needed for publishing a connector and will mostly be used by Airbyte employees publishing the connector. + +Typically this will be handled as part of code review by an Airbyter. There is a section below on what steps are needed for publishing a connector and will mostly be used by Airbyte employees publishing the connector. ## Updating an existing connector -The steps for updating an existing connector are the same as for building a new connector minus the need to use the autogenerator to create a new connector. Therefore the steps are: + +The steps for updating an existing connector are the same as for building a new connector minus the need to use the autogenerator to create a new connector. Therefore the steps are: 1. Iterate on the connector to make the needed changes 2. Run tests @@ -100,7 +104,7 @@ The steps for updating an existing connector are the same as for building a new ## Publishing a connector -Once you've finished iterating on the changes to a connector as specified in its `README.md`, follow these instructions to ship the new version of the connector with Airbyte out of the box. +Once you've finished iterating on the changes to a connector as specified in its `README.md`, follow these instructions to ship the new version of the connector with Airbyte out of the box. 1. Bump the version in the `Dockerfile` of the connector \(`LABEL io.airbyte.version=X.X.X`\). 2. Update the connector definition in the Airbyte connector index to use the new version: @@ -125,6 +129,7 @@ Once you've finished iterating on the changes to a connector as specified in its 6. The new version of the connector is now available for everyone who uses it. Thank you! ## Using credentials in CI + In order to run integration tests in CI, you'll often need to inject credentials into CI. There are a few steps for doing this: 1. 
**Place the credentials into Lastpass**: Airbyte uses a shared Lastpass account as the source of truth for all secrets. Place the credentials **exactly as they should be used by the connector** into a secure note i.e: it should basically be a copy paste of the `config.json` passed into a connector via the `--config` flag. We use the following naming pattern: ` creds` e.g: `source google adwords creds` or `destination snowflake creds`. @@ -132,3 +137,4 @@ In order to run integration tests in CI, you'll often need to inject credentials 3. **Inject the credentials into test and publish CI workflows**: edit the files `.github/workflows/publish-command.yml` and `.github/workflows/test-command.yml` to inject the secret into the CI run. This will make these secrets available to the `/test` and `/publish` commands. 4. **During CI, write the secret from env variables to the connector directory**: edit `tools/bin/ci_credentials.sh` to write the secret into the `secrets/` directory of the relevant connector. 5. That should be it. + diff --git a/docs/connector-development/airbyte101.md b/docs/connector-development/airbyte101.md index a49f5fada8d..258d85262a8 100644 --- a/docs/connector-development/airbyte101.md +++ b/docs/connector-development/airbyte101.md @@ -2,5 +2,5 @@ ## The Airbyte Catalog -The Airbyte catalog defines the relationship between your incoming data's schema and the schema of your output stream. This -is an incredibly important concept to understand as a connector dev, so check out the AirbyteCatalog [here](../understanding-airbyte/beginners-guide-to-catalog.md). \ No newline at end of file +The Airbyte catalog defines the relationship between your incoming data's schema and the schema of your output stream. This is an incredibly important concept to understand as a connector dev, so check out the AirbyteCatalog [here](../understanding-airbyte/beginners-guide-to-catalog.md). + diff --git a/docs/connector-development/best-practices.md b/docs/connector-development/best-practices.md index 7c30a237919..21459e00132 100644 --- a/docs/connector-development/best-practices.md +++ b/docs/connector-development/best-practices.md @@ -48,3 +48,4 @@ When reviewing connectors, we'll use the following "checklist" to verify whether ### Rate Limiting Most APIs enforce rate limits. Your connector should gracefully handle those \(i.e: without failing the connector process\). The most common way to handle rate limits is to implement backoff. + diff --git a/docs/connector-development/cdk-faros-js/README.md b/docs/connector-development/cdk-faros-js.md similarity index 71% rename from docs/connector-development/cdk-faros-js/README.md rename to docs/connector-development/cdk-faros-js.md index 18820a4bcf8..b1b33e0fb79 100644 --- a/docs/connector-development/cdk-faros-js/README.md +++ b/docs/connector-development/cdk-faros-js.md @@ -1,11 +1,12 @@ -# Connector Development Kit (TypeScript/JavaScript) +# Connector Development Kit \(Javascript\) -The [Faros AI TypeScript/JavaScript CDK](https://github.com/faros-ai/airbyte-connectors/tree/main/faros-airbyte-cdk) allows you to build Airbyte connectors quickly similarly to how our [Python CDK](../cdk-python) does. This CDK currently offers support for creating Airbyte source connectors for: +The [Faros AI TypeScript/JavaScript CDK](https://github.com/faros-ai/airbyte-connectors/tree/main/faros-airbyte-cdk) allows you to build Airbyte connectors quickly similarly to how our [Python CDK](cdk-python/) does. 
This CDK currently offers support for creating Airbyte source connectors for: -- HTTP APIs +* HTTP APIs ## Resources [This document](https://github.com/faros-ai/airbyte-connectors/blob/main/sources/README.md) is the main guide for developing an Airbyte source with the Faros CDK. -An example of a source built with the Faros AI CDK can be found [here](https://github.com/faros-ai/airbyte-connectors/tree/main/sources/example-source). It's recommended that you follow along with the example source while building for the first time. \ No newline at end of file +An example of a source built with the Faros AI CDK can be found [here](https://github.com/faros-ai/airbyte-connectors/tree/main/sources/example-source). It's recommended that you follow along with the example source while building for the first time. + diff --git a/docs/connector-development/cdk-python/README.md b/docs/connector-development/cdk-python/README.md index 206518dff45..2e3262ba4d4 100644 --- a/docs/connector-development/cdk-python/README.md +++ b/docs/connector-development/cdk-python/README.md @@ -10,7 +10,7 @@ The CDK provides an improved developer experience by providing basic implementat This document is a general introduction to the CDK. Readers should have basic familiarity with the [Airbyte Specification](https://docs.airbyte.io/architecture/airbyte-specification) before proceeding. -If you have any issues with troubleshooting or want to learn more about the CDK from the Airbyte team, head to the #connector-development channel in [our Slack](https://airbytehq.slack.com/ssb/redirect) to inquire further! +If you have any issues with troubleshooting or want to learn more about the CDK from the Airbyte team, head to the \#connector-development channel in [our Slack](https://airbytehq.slack.com/ssb/redirect) to inquire further! ## Getting Started @@ -29,23 +29,23 @@ Additionally, you can follow [this tutorial](https://docs.airbyte.io/connector-d #### Basic Concepts -If you want to learn more about the classes required to implement an Airbyte Source, head to our [basic concepts doc](./basic-concepts.md). +If you want to learn more about the classes required to implement an Airbyte Source, head to our [basic concepts doc](basic-concepts.md). #### Full Refresh Streams -If you have questions or are running into issues creating your first full refresh stream, head over to our [full refresh stream doc](./full-refresh-stream.md). If you have questions about implementing a `path` or `parse_response` function, this doc is for you. +If you have questions or are running into issues creating your first full refresh stream, head over to our [full refresh stream doc](full-refresh-stream.md). If you have questions about implementing a `path` or `parse_response` function, this doc is for you. #### Incremental Streams -Having trouble figuring out how to write a `stream_slices` function or aren't sure what a `cursor_field` is? Head to our [incremental stream doc](./incremental-stream.md). +Having trouble figuring out how to write a `stream_slices` function or aren't sure what a `cursor_field` is? Head to our [incremental stream doc](incremental-stream.md). #### Practical Tips Airbyte recommends using the CDK template generator to develop with the CDK. The template generates created all the required scaffolding, with convenient TODOs, allowing developers to truly focus on implementing the API. -For tips on useful Python knowledge, see the [Python Concepts](./python-concepts.md) page. 
+For tips on useful Python knowledge, see the [Python Concepts](python-concepts.md) page. -You can find a complete tutorial for implementing an HTTP source connector in [this tutorial](../tutorials/cdk-tutorial-python-http) +You can find a complete tutorial for implementing an HTTP source connector in [this tutorial](../tutorials/cdk-tutorial-python-http/) ### Example Connectors diff --git a/docs/connector-development/cdk-python/basic-concepts.md b/docs/connector-development/cdk-python/basic-concepts.md index 53513202cf1..f8c65f1c395 100644 --- a/docs/connector-development/cdk-python/basic-concepts.md +++ b/docs/connector-development/cdk-python/basic-concepts.md @@ -46,7 +46,7 @@ Note that while this is the most flexible way to implement a source connector, i An `AbstractSource` also owns a set of `Stream`s. This is populated via the `AbstractSource`'s `streams` [function](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/abstract_source.py#L63). `Discover` and `Read` rely on this populated set. -`Discover` returns an `AirbyteCatalog` representing all the distinct resources the underlying API supports. Here is the [entrypoint](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/abstract_source.py#L74) for those interested in reading the code. See [schemas](schemas.md) for more information on how to declare the schema of a stream. +`Discover` returns an `AirbyteCatalog` representing all the distinct resources the underlying API supports. Here is the [entrypoint](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/abstract_source.py#L74) for those interested in reading the code. See [schemas](https://github.com/airbytehq/airbyte/tree/21116cad97f744f936e503f9af5a59ed3ac59c38/docs/contributing-to-airbyte/python/concepts/schemas.md) for more information on how to declare the schema of a stream. `Read` creates an in-memory stream reading from each of the `AbstractSource`'s streams. Here is the [entrypoint](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/abstract_source.py#L90) for those interested. diff --git a/docs/connector-development/cdk-python/incremental-stream.md b/docs/connector-development/cdk-python/incremental-stream.md index 9c46208c66a..9acce80890e 100644 --- a/docs/connector-development/cdk-python/incremental-stream.md +++ b/docs/connector-development/cdk-python/incremental-stream.md @@ -8,7 +8,7 @@ Several new pieces are essential to understand how incrementality works with the * cursor fields * `Stream.get_updated_state` - as well as a few other optional concepts. + as well as a few other optional concepts. ### `AirbyteStateMessage` @@ -26,23 +26,22 @@ In the context of the CDK, setting the `Stream.cursor_field` property to any tru This function helps the stream keep track of the latest state by inspecting every record output by the stream \(as returned by the `Stream.read_records` method\) and comparing it against the most recent state object. This allows sync to resume from where the previous sync last stopped, regardless of success or failure. This function typically compares the state object's and the latest record's cursor field, picking the latest one. 
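To make this concrete, here is a minimal sketch of a `get_updated_state` implementation for an illustrative stream whose records carry an ISO-8601 `updated_at` cursor field (the class name and field name are placeholders, and the other members a `Stream` subclass needs are omitted):

```python
from typing import Any, Mapping, MutableMapping

from airbyte_cdk.sources.streams.core import Stream


class EmployeesStream(Stream):
    cursor_field = "updated_at"  # any truthy value marks the stream as incremental

    def get_updated_state(
        self, current_stream_state: MutableMapping[str, Any], latest_record: Mapping[str, Any]
    ) -> Mapping[str, Any]:
        # Compare the cursor already stored in state with the cursor on the record that
        # was just read, and keep the later of the two (ISO-8601 strings sort lexicographically).
        latest_cursor = latest_record.get(self.cursor_field, "")
        current_cursor = (current_stream_state or {}).get(self.cursor_field, "")
        return {self.cursor_field: max(latest_cursor, current_cursor)}
```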
- ## Checkpointing state -There are two ways to checkpointing state (i.e: controling the timing of when state is saved) while reading data from a connector: +There are two ways to checkpointing state \(i.e: controling the timing of when state is saved\) while reading data from a connector: 1. Interval-based checkpointing 2. Stream Slices - ### Interval based checkpointing -This is the simplest method for checkpointing. When the interval is set to a truthy value e.g: 100, then state is persisted after every 100 records output by the connector e.g: state is saved after reading 100 records, then 200, 300, etc.. -While this is very simple, **it requires that records are output in ascending order with regards to the cursor field**. For example, if your stream outputs records in ascending order of the `updated_at` field, then this is a good fit for your usecase. But if the stream outputs records in a random order, then you cannot use this method because we can only be certain that we read records after a particular `updated_at` timestamp once all records have been fully read. +This is the simplest method for checkpointing. When the interval is set to a truthy value e.g: 100, then state is persisted after every 100 records output by the connector e.g: state is saved after reading 100 records, then 200, 300, etc.. -Interval based checkpointing can be implemented by setting the `Stream.state_checkpoint_interval` property e.g: +While this is very simple, **it requires that records are output in ascending order with regards to the cursor field**. For example, if your stream outputs records in ascending order of the `updated_at` field, then this is a good fit for your usecase. But if the stream outputs records in a random order, then you cannot use this method because we can only be certain that we read records after a particular `updated_at` timestamp once all records have been fully read. -``` +Interval based checkpointing can be implemented by setting the `Stream.state_checkpoint_interval` property e.g: + +```text class MyAmazingStream(Stream): # Save the state every 100 records state_checkpoint_interval = 100 @@ -58,7 +57,7 @@ A Slice object is not typed, and the developer is free to include any informatio As an example, suppose an API is able to dispense data hourly. If the last sync was exactly 24 hours ago, we can either make an API call retrieving all data at once, or make 24 calls each retrieving an hour's worth of data. In the latter case, the `stream_slices` function, sees that the previous state contains yesterday's timestamp, and returns a list of 24 Slices, each with a different hourly timestamp to be used when creating request. If the stream fails halfway through \(at the 12th slice\), then the next time it starts reading, it will read from the beginning of the 12th slice. -For a more in-depth description of stream slicing, see the [Stream Slices guide](stream-slices.md). +For a more in-depth description of stream slicing, see the [Stream Slices guide](https://github.com/airbytehq/airbyte/tree/8500fef4133d3d06e16e8b600d65ebf2c58afefd/docs/connector-development/cdk-python/stream-slices.md). 
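A sketch of the hourly-slicing example above might look like the following; the override signature mirrors the Python CDK's `Stream.stream_slices` at the time of writing, while the class name, field names, and slicing scheme are illustrative only:

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Iterable, List, Mapping, Optional

from airbyte_cdk.models import SyncMode
from airbyte_cdk.sources.streams.core import Stream


class HourlyApiStream(Stream):
    cursor_field = "timestamp"

    def stream_slices(
        self, sync_mode: SyncMode, cursor_field: List[str] = None, stream_state: Mapping[str, Any] = None
    ) -> Iterable[Optional[Mapping[str, Any]]]:
        # Resume from the cursor recorded in the previous state, or default to 24 hours ago.
        stream_state = stream_state or {}
        cursor = (
            datetime.fromisoformat(stream_state[self.cursor_field])
            if self.cursor_field in stream_state
            else datetime.now(timezone.utc) - timedelta(hours=24)
        )
        # Emit one slice per hour between the last cursor and now; a sync that last ran
        # exactly 24 hours ago therefore yields 24 slices.
        while cursor < datetime.now(timezone.utc):
            yield {"hour_start": cursor.isoformat(), "hour_end": (cursor + timedelta(hours=1)).isoformat()}
            cursor += timedelta(hours=1)
```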
## Conclusion diff --git a/docs/connector-development/cdk-python/schemas.md b/docs/connector-development/cdk-python/schemas.md index 71c5e2e68b0..f623dde6eda 100644 --- a/docs/connector-development/cdk-python/schemas.md +++ b/docs/connector-development/cdk-python/schemas.md @@ -1,24 +1,30 @@ -# Defining your stream schemas -Your connector must describe the schema of each stream it can output using [JSONSchema](https://json-schema.org). +# Defining Stream Schemas + +Your connector must describe the schema of each stream it can output using [JSONSchema](https://json-schema.org). + +The simplest way to do this is to describe the schema of your streams using one `.json` file per stream. You can also dynamically generate the schema of your stream in code, or you can combine both approaches: start with a `.json` file and dynamically add properties to it. -The simplest way to do this is to describe the schema of your streams using one `.json` file per stream. You can also dynamically generate the schema of your stream in code, or you can combine both approaches: start with a `.json` file and dynamically add properties to it. - The schema of a stream is the return value of `Stream.get_json_schema`. - + ## Static schemas + By default, `Stream.get_json_schema` reads a `.json` file in the `schemas/` directory whose name is equal to the value of the `Stream.name` property. In turn `Stream.name` by default returns the name of the class in snake case. Therefore, if you have a class `class EmployeeBenefits(HttpStream)` the default behavior will look for a file called `schemas/employee_benefits.json`. You can override any of these behaviors as you need. Important note: any objects referenced via `$ref` should be placed in the `shared/` directory in their own `.json` files. ### Generating schemas from OpenAPI definitions + If you are implementing a connector to pull data from an API which publishes an [OpenAPI/Swagger spec](https://swagger.io/specification/), you can use a tool we've provided for generating JSON schemas from the OpenAPI definition file. Detailed information can be found [here](https://github.com/airbytehq/airbyte/tree/master/tools/openapi2jsonschema/). - + ## Dynamic schemas + If you'd rather define your schema in code, override `Stream.get_json_schema` in your stream class to return a `dict` describing the schema using [JSONSchema](https://json-schema.org). -## Dynamically modifying static schemas -Override `Stream.get_json_schema` to run the default behavior, edit the returned value, then return the edited value: -``` +## Dynamically modifying static schemas + +Override `Stream.get_json_schema` to run the default behavior, edit the returned value, then return the edited value: + +```text def get_json_schema(self): schema = super().get_json_schema() schema['dynamically_determined_property'] = "property" @@ -27,11 +33,12 @@ def get_json_schema(self): ## Type transformation -It is important to ensure output data conforms to the declared json schema. This is because the destination receiving this data to load into tables may strictly enforce schema (e.g. when data is stored in a SQL database, you can't put CHAT type into INTEGER column). In the case of changes to API output (which is almost guaranteed to happen over time) or a minor mistake in jsonschema definition, data syncs could thus break because of mismatched datatype schemas. +It is important to ensure output data conforms to the declared json schema. 
This is because the destination receiving this data to load into tables may strictly enforce schema \(e.g. when data is stored in a SQL database, you can't put CHAT type into INTEGER column\). In the case of changes to API output \(which is almost guaranteed to happen over time\) or a minor mistake in jsonschema definition, data syncs could thus break because of mismatched datatype schemas. -To remain robust in operation, the CDK provides a transformation ability to perform automatic object mutation to align with desired schema before outputting to the destination. All streams inherited from airbyte_cdk.sources.streams.core.Stream class have this transform configuration available. It is _disabled_ by default and can be configured per stream within a source connector. +To remain robust in operation, the CDK provides a transformation ability to perform automatic object mutation to align with desired schema before outputting to the destination. All streams inherited from airbyte_cdk.sources.streams.core.Stream class have this transform configuration available. It is \_disabled_ by default and can be configured per stream within a source connector. ### Default type transformation + Here's how you can configure the TypeTransformer: ```python @@ -43,26 +50,35 @@ class MyStream(Stream): transformer = Transformer(TransformConfig.DefaultSchemaNormalization) ... ``` + In this case default transformation will be applied. For example if you have schema like this -```json + +```javascript {"type": "object", "properties": {"value": {"type": "string"}}} ``` + and source API returned object with non-string type, it would be casted to string automaticaly: -```json + +```javascript {"value": 12} -> {"value": "12"} ``` + Also it works on complex types: -```json + +```javascript {"value": {"unexpected_object": "value"}} -> {"value": "{'unexpected_object': 'value'}"} ``` + And objects inside array of referenced by $ref attribute. - If the value cannot be cast (e.g. string "asdf" cannot be casted to integer), the field would retain its original value. Schema type transformation support any jsonschema types, nested objects/arrays and reference types. Types described as array of more than one type (except "null"), types under oneOf/anyOf keyword wont be transformed. +If the value cannot be cast \(e.g. string "asdf" cannot be casted to integer\), the field would retain its original value. Schema type transformation support any jsonschema types, nested objects/arrays and reference types. Types described as array of more than one type \(except "null"\), types under oneOf/anyOf keyword wont be transformed. -*Note:* This transformation is done by the source, not the stream itself. I.e. if you have overriden "read_records" method in your stream it wont affect object transformation. All transformation are done in-place by modifing output object before passing it to "get_updated_state" method, so "get_updated_state" would receive the transformed object. +_Note:_ This transformation is done by the source, not the stream itself. I.e. if you have overriden "read\_records" method in your stream it wont affect object transformation. All transformation are done in-place by modifing output object before passing it to "get\_updated\_state" method, so "get\_updated\_state" would receive the transformed object. ### Custom schema type transformation + Default schema type transformation performs simple type casting. 
Sometimes you want to perform more sophisticated transform like making "date-time" field compliant to rcf3339 standard. In this case you can use custom schema type transformation: + ```python class MyStream(Stream): ... @@ -74,27 +90,34 @@ class MyStream(Stream): # transformed_value = ... return transformed_value ``` -Where original_value is initial field value and field_schema is part of jsonschema describing field type. For schema -```json + +Where original\_value is initial field value and field\_schema is part of jsonschema describing field type. For schema + +```javascript {"type": "object", "properties": {"value": {"type": "string", "format": "date-time"}}} ``` -field_schema variable would be equal to -```json + +field\_schema variable would be equal to + +```javascript {"type": "string", "format": "date-time"} ``` + In this case default transformation would be skipped and only custom transformation apply. If you want to run both default and custom transformation you can configure transdormer object by combining config flags: + ```python transformer = Transformer(TransformConfig.DefaultSchemaNormalization | TransformConfig.CustomSchemaNormalization) ``` + In this case custom transformation will be applied after default type transformation function. Note that order of flags doesnt matter, default transformation will always be run before custom. ### Performance consideration -Transofrming each object on the fly would add some time for each object processing. This time is depends on object/schema complexitiy and hardware configuration. +Transofrming each object on the fly would add some time for each object processing. This time is depends on object/schema complexitiy and hardware configuration. -There is some performance benchmark we've done with ads_insights facebook schema (it is complex schema with objects nested inside arrays ob object and a lot of references) and example object. -Here is average transform time per single object, seconds: -``` +There is some performance benchmark we've done with ads\_insights facebook schema \(it is complex schema with objects nested inside arrays ob object and a lot of references\) and example object. Here is average transform time per single object, seconds: + +```text regular transform: 0.0008423403530008121 @@ -107,4 +130,6 @@ transform without actual value setting (but iterating through object properties just traverse/validate through json schema and object fields: 0.0006139181846665452 ``` -On my PC (AMD Ryzen 7 5800X) it took 0.8 milliseconds per one object. As you can see most time (~ 75%) is taken by jsonschema traverse/validation routine and very little (less than 10 %) by actual converting. Processing time can be reduced by skipping jsonschema type checking but it would be no warnings about possible object jsonschema inconsistency. + +On my PC \(AMD Ryzen 7 5800X\) it took 0.8 milliseconds per one object. As you can see most time \(~ 75%\) is taken by jsonschema traverse/validation routine and very little \(less than 10 %\) by actual converting. Processing time can be reduced by skipping jsonschema type checking but it would be no warnings about possible object jsonschema inconsistency. 
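The custom transform above leaves its body as a placeholder. One way to fill it in, shown purely as an illustration, is to normalize unix-timestamp values in fields whose schema declares `"format": "date-time"`; the import path is the CDK's transform module as of this writing, aliased to match the `Transformer` name used in the snippets above:

```python
from datetime import datetime, timezone
from typing import Any, Dict

from airbyte_cdk.sources.streams.core import Stream
from airbyte_cdk.sources.utils.transform import TransformConfig, TypeTransformer as Transformer


class MyStream(Stream):
    # Run the default casting first, then the custom transform registered below.
    transformer = Transformer(TransformConfig.DefaultSchemaNormalization | TransformConfig.CustomSchemaNormalization)

    @transformer.registerCustomTransform
    def transform_function(original_value: Any, field_schema: Dict[str, Any]) -> Any:
        # Convert numeric unix timestamps into ISO-8601 strings when the declared
        # schema expects a date-time; leave every other value untouched.
        if field_schema.get("format") == "date-time" and isinstance(original_value, (int, float)):
            return datetime.fromtimestamp(original_value, tz=timezone.utc).isoformat()
        return original_value
```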
+ diff --git a/docs/connector-development/connector-specification-reference.md b/docs/connector-development/connector-specification-reference.md index 7d1fbbfd40c..b82a7783d85 100644 --- a/docs/connector-development/connector-specification-reference.md +++ b/docs/connector-development/connector-specification-reference.md @@ -1,14 +1,16 @@ # Connector Specification Reference -The [connector specification](../understanding-airbyte/airbyte-specification.md#spec) describes what inputs can be used to configure a connector. Like the rest of the Airbyte Protocol, it uses [JsonSchema](https://json-schema.org), but with some slight modifications. + +The [connector specification](../understanding-airbyte/airbyte-specification.md#spec) describes what inputs can be used to configure a connector. Like the rest of the Airbyte Protocol, it uses [JsonSchema](https://json-schema.org), but with some slight modifications. ## Demoing your specification + While iterating on your specification, you can preview what it will look like in the UI in realtime by following the instructions [here](https://github.com/airbytehq/airbyte/blob/master/airbyte-webapp/docs/HowTo-ConnectionSpecification.md). - ### Secret obfuscation -By default, any fields in a connector's specification are visible can be read in the UI. However, if you want to obfuscate fields in the UI and API (for example when working with a password), add the `airbyte_secret` annotation to your connector's `spec.json` e.g: -``` +By default, any fields in a connector's specification are visible can be read in the UI. However, if you want to obfuscate fields in the UI and API \(for example when working with a password\), add the `airbyte_secret` annotation to your connector's `spec.json` e.g: + +```text "password": { "type": "string", "examples": ["hunter2"], @@ -16,14 +18,13 @@ By default, any fields in a connector's specification are visible can be read in }, ``` -Here is an example of what the password field would look like: -Screen Shot 2021-08-04 at 11 15 04 PM - +Here is an example of what the password field would look like: ![Screen Shot 2021-08-04 at 11 15 04 PM](https://user-images.githubusercontent.com/6246757/128300633-7f379b05-5f4a-46e8-ad88-88155e7f4260.png) ### Multi-line String inputs -Sometimes when a user is inputting a string field into a connector, newlines need to be preserveed. For example, if we want a connector to use an RSA key which looks like this: -``` +Sometimes when a user is inputting a string field into a connector, newlines need to be preserveed. For example, if we want a connector to use an RSA key which looks like this: + +```text ---- BEGIN PRIVATE KEY ---- 123 456 @@ -31,11 +32,11 @@ Sometimes when a user is inputting a string field into a connector, newlines nee ---- END PRIVATE KEY ---- ``` -we need to preserve the line-breaks. In other words, the string `---- BEGIN PRIVATE KEY ----123456789---- END PRIVATE KEY ----` is not equivalent to the one above since it loses linebreaks. +we need to preserve the line-breaks. In other words, the string `---- BEGIN PRIVATE KEY ----123456789---- END PRIVATE KEY ----` is not equivalent to the one above since it loses linebreaks. -By default, string inputs in the UI can lose their linebreaks. In order to accept multi-line strings in the UI, annotate your string field with `multiline: true` e.g: +By default, string inputs in the UI can lose their linebreaks. 
In order to accept multi-line strings in the UI, annotate your string field with `multiline: true` e.g: -``` +```text "private_key": { "type": "string", "description": "RSA private key to use for SSH connection", @@ -44,30 +45,27 @@ By default, string inputs in the UI can lose their linebreaks. In order to accep }, ``` -this will display a multi-line textbox in the UI like the following screenshot: -Screen Shot 2021-08-04 at 11 13 09 PM +this will display a multi-line textbox in the UI like the following screenshot: ![Screen Shot 2021-08-04 at 11 13 09 PM](https://user-images.githubusercontent.com/6246757/128300404-1dc35323-bceb-4f93-9b81-b23cc4beb670.png) +### Using `oneOf`s -### Using `oneOf`s In some cases, a connector needs to accept one out of many options. For example, a connector might need to know the compression codec of the file it will read, which will render in the Airbyte UI as a list of the available codecs. In JSONSchema, this can be expressed using the [oneOf](https://json-schema.org/understanding-json-schema/reference/combining.html#oneof) keyword. {% hint style="info" %} Some connectors may follow an older format for dropdown lists, we are currently migrating away from that to this standard. {% endhint %} -In order for the Airbyte UI to correctly render a specification, however, a few extra rules must be followed: +In order for the Airbyte UI to correctly render a specification, however, a few extra rules must be followed: 1. The top-level item containing the `oneOf` must have `type: object`. 2. Each item in the `oneOf` array must be a property with `type: object`. 3. One `string` field with the same property name must be consistently present throughout each object inside the `oneOf` array. It is required to add a [`const`](https://json-schema.org/understanding-json-schema/reference/generic.html#constant-values) value unique to that `oneOf` option. -Let's look at the [source-file](../integrations/sources/file.md) implementation as an example. In this example, we have `provider` as a dropdown -list option, which allows the user to select what provider their file is being hosted on. We note that the `oneOf` keyword lives under the `provider` object as follows: +Let's look at the [source-file](../integrations/sources/file.md) implementation as an example. In this example, we have `provider` as a dropdown list option, which allows the user to select what provider their file is being hosted on. We note that the `oneOf` keyword lives under the `provider` object as follows: -In each item in the `oneOf` array, the `option_title` string field exists with the aforementioned `const`, `default` and `enum` value unique to that item. There is a [Github issue](https://github.com/airbytehq/airbyte/issues/6384) to improve it and use only `const` in the specification. This helps the UI and the connector distinguish between the option that was chosen by the user. This can -be displayed with adapting the file source spec to this example: +In each item in the `oneOf` array, the `option_title` string field exists with the aforementioned `const`, `default` and `enum` value unique to that item. There is a [Github issue](https://github.com/airbytehq/airbyte/issues/6384) to improve it and use only `const` in the specification. This helps the UI and the connector distinguish between the option that was chosen by the user. 
This can be displayed with adapting the file source spec to this example: -```json +```javascript { "connection_specification": { "$schema": "http://json-schema.org/draft-07/schema#", @@ -126,5 +124,6 @@ be displayed with adapting the file source spec to this example: ] } } -} +} ``` + diff --git a/docs/connector-development/testing-connectors/source-acceptance-tests-reference.md b/docs/connector-development/testing-connectors/source-acceptance-tests-reference.md index d3605a98a9a..bbe75d0c611 100644 --- a/docs/connector-development/testing-connectors/source-acceptance-tests-reference.md +++ b/docs/connector-development/testing-connectors/source-acceptance-tests-reference.md @@ -65,10 +65,10 @@ def connector_setup(): container.stop() ``` -These tests are configurable via `acceptance-test-config.yml`. Each test has a number of inputs, -you can provide multiple sets of inputs which will cause the same to run multiple times - one for each set of inputs. +These tests are configurable via `acceptance-test-config.yml`. Each test has a number of inputs, you can provide multiple sets of inputs which will cause the same to run multiple times - one for each set of inputs. Example of `acceptance-test-config.yml`: + ```yaml connector_image: string # Docker image to test, for example 'airbyte/source-hubspot:0.1.0' base_path: string # Base path for all relative paths, optional, default - ./ @@ -84,97 +84,111 @@ tests: # Tests configuration ``` ## Test Spec + Verify that a spec operation issued to the connector returns a valid spec. -| Input | Type| Default | Note | -|--|--|--|--| -| `spec_path` | string | `secrets/spec.json` |Path to a JSON object representing the spec expected to be output by this connector | -| `timeout_seconds` | int | 10 |Test execution timeout in seconds| + +| Input | Type | Default | Note | +| :--- | :--- | :--- | :--- | +| `spec_path` | string | `secrets/spec.json` | Path to a JSON object representing the spec expected to be output by this connector | +| `timeout_seconds` | int | 10 | Test execution timeout in seconds | ## Test Connection + Verify that a check operation issued to the connector with the input config file returns a successful response. -| Input | Type| Default | Note | -|--|--|--|--| -| `config_path` | string | `secrets/config.json` |Path to a JSON object representing a valid connector configuration| -| `status` | `succeed` `failed` `exception`| |Indicate if connection check should succeed with provided config| -| `timeout_seconds` | int | 30 |Test execution timeout in seconds| + +| Input | Type | Default | Note | +| :--- | :--- | :--- | :--- | +| `config_path` | string | `secrets/config.json` | Path to a JSON object representing a valid connector configuration | +| `status` | `succeed` `failed` `exception` | | Indicate if connection check should succeed with provided config | +| `timeout_seconds` | int | 30 | Test execution timeout in seconds | ## Test Discovery Verifies when a discover operation is run on the connector using the given config file, a valid catalog is produced by the connector. 
-| Input | Type| Default | Note | -|--|--|--|--| -| `config_path` | string | `secrets/config.json` |Path to a JSON object representing a valid connector configuration| -| `configured_catalog_path` | string| `integration_tests/configured_catalog.json` |Path to configured catalog| -| `timeout_seconds` | int | 30 |Test execution timeout in seconds| + +| Input | Type | Default | Note | +| :--- | :--- | :--- | :--- | +| `config_path` | string | `secrets/config.json` | Path to a JSON object representing a valid connector configuration | +| `configured_catalog_path` | string | `integration_tests/configured_catalog.json` | Path to configured catalog | +| `timeout_seconds` | int | 30 | Test execution timeout in seconds | ## Test Basic Read -Configuring all streams in the input catalog to full refresh mode verifies that a read operation produces some RECORD messages. -Each stream should have some data, if you can't guarantee this for particular streams - add them to the `empty_streams` list. -| Input | Type| Default | Note | -|--|--|--|--| -| `config_path` | string | `secrets/config.json` |Path to a JSON object representing a valid connector configuration| -| `configured_catalog_path` | string| `integration_tests/configured_catalog.json` |Path to configured catalog| -| `empty_streams` | array | [] |List of streams that might be empty| -| `validate_schema` | boolean | True |Verify that structure and types of records matches the schema from discovery command| -| `timeout_seconds` | int | 5*60 |Test execution timeout in seconds| -| `expect_records` | object |None| Compare produced records with expected records, see details below| -| `expect_records.path` | string | | File with expected records| +Configuring all streams in the input catalog to full refresh mode verifies that a read operation produces some RECORD messages. Each stream should have some data, if you can't guarantee this for particular streams - add them to the `empty_streams` list. + +| Input | Type | Default | Note | +| :--- | :--- | :--- | :--- | +| `config_path` | string | `secrets/config.json` | Path to a JSON object representing a valid connector configuration | +| `configured_catalog_path` | string | `integration_tests/configured_catalog.json` | Path to configured catalog | +| `empty_streams` | array | \[\] | List of streams that might be empty | +| `validate_schema` | boolean | True | Verify that structure and types of records matches the schema from discovery command | +| `timeout_seconds` | int | 5\*60 | Test execution timeout in seconds | +| `expect_records` | object | None | Compare produced records with expected records, see details below | +| `expect_records.path` | string | | File with expected records | | `expect_records.extra_fields` | boolean | False | Allow output records to have other fields i.e: expected records are a subset | -| `expect_records.exact_order` | boolean | False | Ensure that records produced in exact same order| -| `expect_records.extra_records` | boolean | True | Allow connector to produce extra records, but still enforce all records from the expected file to be produced| +| `expect_records.exact_order` | boolean | False | Ensure that records produced in exact same order | +| `expect_records.extra_records` | boolean | True | Allow connector to produce extra records, but still enforce all records from the expected file to be produced | `expect_records` is a nested configuration, if omitted - the part of the test responsible for record matching will be skipped. 
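For illustration, here is what a filled-in `acceptance-test-config.yml` might look like for a hypothetical `source-example` connector. The image name, file paths and chosen inputs below are placeholders for this sketch, not values from a real connector:

```yaml
# Hypothetical example - replace the image name and paths with your connector's own.
connector_image: airbyte/source-example:dev
tests:
  spec:
    - spec_path: "source_example/spec.json"
  connection:
    - config_path: "secrets/config.json"
      status: "succeed"
    - config_path: "integration_tests/invalid_config.json"
      status: "failed"
  discovery:
    - config_path: "secrets/config.json"
  basic_read:
    - config_path: "secrets/config.json"
      configured_catalog_path: "integration_tests/configured_catalog.json"
      empty_streams: []
      expect_records:
        path: "integration_tests/expected_records.txt"
        extra_fields: false
        exact_order: false
        extra_records: true
```

Each list entry is one set of inputs for that test, so adding a second entry (for example a deliberately invalid config under `connection`) runs the same test again with those inputs.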
Due to the fact that we can't identify records without primary keys, only the following flag combinations are supported: -| extra_fields | exact_order| extra_records | -|--|--|--| -|x|x|| -||x|x| -||x|| -|||x| -|||| + +| extra\_fields | exact\_order | extra\_records | +| :--- | :--- | :--- | +| x | x | | +| | x | x | +| | x | | +| | | x | +| | | | + ### Schema format checking If some field has [format](https://json-schema.org/understanding-json-schema/reference/string.html#format) attribute specified on its catalog json schema, Source Acceptance Testing framework performs checking against format. It support checking of all [builtin](https://json-schema.org/understanding-json-schema/reference/string.html#built-in-formats) jsonschema formats for draft 7 specification: email, hostnames, ip addresses, time, date and date-time formats. -Note: For date-time we are not checking against compliance against ISO8601 (and RFC3339 as subset of it). Since we are using specified format to set database column type on db normalization stage, value should be compliant to bigquery [timestamp](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#timestamp_type) and SQL "timestamp with timezone" formats. -### Example of `expected_records.txt`: -In general, the expected_records.json should contain the subset of output of the records of particular stream you need to test. -The required fields are: `stream, data, emitted_at` +Note: For date-time we are not checking against compliance against ISO8601 \(and RFC3339 as subset of it\). Since we are using specified format to set database column type on db normalization stage, value should be compliant to bigquery [timestamp](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#timestamp_type) and SQL "timestamp with timezone" formats. -```JSON +### Example of `expected_records.txt`: + +In general, the expected\_records.json should contain the subset of output of the records of particular stream you need to test. The required fields are: `stream, data, emitted_at` + +```javascript {"stream": "my_stream", "data": {"field_1": "value0", "field_2": "value0", "field_3": null, "field_4": {"is_true": true}, "field_5": 123}, "emitted_at": 1626172757000} {"stream": "my_stream", "data": {"field_1": "value1", "field_2": "value1", "field_3": null, "field_4": {"is_true": false}, "field_5": 456}, "emitted_at": 1626172757000} {"stream": "my_stream", "data": {"field_1": "value2", "field_2": "value2", "field_3": null, "field_4": {"is_true": true}, "field_5": 678}, "emitted_at": 1626172757000} {"stream": "my_stream", "data": {"field_1": "value3", "field_2": "value3", "field_3": null, "field_4": {"is_true": false}, "field_5": 91011}, "emitted_at": 1626172757000} - ``` ## Test Full Refresh sync + ### TestSequentialReads This test performs two read operations on all streams which support full refresh syncs. It then verifies that the RECORD messages output from both were identical or the former is a strict subset of the latter. 
-| Input | Type| Default | Note | -|--|--|--|--| -| `config_path` | string | `secrets/config.json` |Path to a JSON object representing a valid connector configuration| -| `configured_catalog_path` | string | `integration_tests/configured_catalog.json` |Path to configured catalog| -| `timeout_seconds` | int | 20*60 |Test execution timeout in seconds| + +| Input | Type | Default | Note | +| :--- | :--- | :--- | :--- | +| `config_path` | string | `secrets/config.json` | Path to a JSON object representing a valid connector configuration | +| `configured_catalog_path` | string | `integration_tests/configured_catalog.json` | Path to configured catalog | +| `timeout_seconds` | int | 20\*60 | Test execution timeout in seconds | ## Test Incremental sync + ### TestTwoSequentialReads + This test verifies that all streams in the input catalog which support incremental sync can do so correctly. It does this by running two read operations: the first takes the configured catalog and config provided to this test as input. It then verifies that the sync produced a non-zero number of `RECORD` and `STATE` messages. The second read takes the same catalog and config used in the first test, plus the last `STATE` message output by the first read operation as the input state file. It verifies that either no records are produced \(since we read all records in the first sync\) or all records that produced have cursor value greater or equal to cursor value from `STATE` message. This test is performed only for streams that support incremental. Streams that do not support incremental sync are ignored. If no streams in the input catalog support incremental sync, this test is skipped. -| Input | Type| Default | Note | -|--|--|--|--| -| `config_path` | string | `secrets/config.json` |Path to a JSON object representing a valid connector configuration| -| `configured_catalog_path` | string | `integration_tests/configured_catalog.json` |Path to configured catalog| -| `cursor_paths` | dict | {} | For each stream, the path of its cursor field in the output state messages. If omitted the path will be taken from the last piece of path from stream cursor_field.| -| `timeout_seconds` | int | 20*60 |Test execution timeout in seconds| + +| Input | Type | Default | Note | +| :--- | :--- | :--- | :--- | +| `config_path` | string | `secrets/config.json` | Path to a JSON object representing a valid connector configuration | +| `configured_catalog_path` | string | `integration_tests/configured_catalog.json` | Path to configured catalog | +| `cursor_paths` | dict | {} | For each stream, the path of its cursor field in the output state messages. If omitted the path will be taken from the last piece of path from stream cursor\_field. 
| +| `timeout_seconds` | int | 20\*60 | Test execution timeout in seconds | ### TestStateWithAbnormallyLargeValues This test verifies that sync produces no records when run with the STATE with abnormally large values -| Input | Type| Default | Note | -|--|--|--|--| -| `config_path` | string | `secrets/config.json` |Path to a JSON object representing a valid connector configuration| -| `configured_catalog_path` | string | `integration_tests/configured_catalog.json` |Path to configured catalog| -| `future_state_path` | string | None |Path to the state file with abnormally large cursor values| -| `timeout_seconds` | int | 20*60 |Test execution timeout in seconds| + +| Input | Type | Default | Note | +| :--- | :--- | :--- | :--- | +| `config_path` | string | `secrets/config.json` | Path to a JSON object representing a valid connector configuration | +| `configured_catalog_path` | string | `integration_tests/configured_catalog.json` | Path to configured catalog | +| `future_state_path` | string | None | Path to the state file with abnormally large cursor values | +| `timeout_seconds` | int | 20\*60 | Test execution timeout in seconds | + diff --git a/docs/connector-development/tutorials/building-a-java-destination.md b/docs/connector-development/tutorials/building-a-java-destination.md index 7edf95a465b..a7a6425a72d 100644 --- a/docs/connector-development/tutorials/building-a-java-destination.md +++ b/docs/connector-development/tutorials/building-a-java-destination.md @@ -40,7 +40,7 @@ $ cd airbyte-integrations/connector-templates/generator # assumes you are starti $ ./generate.sh ``` -Select the `Java Destination` template and then input the name of your connector. We'll refer to the destination as `-destination` in this tutorial, but you should replace `` with the actual name you used for your connector e.g: `BigQueryDestination` or `bigquery-destination`. +Select the `Java Destination` template and then input the name of your connector. We'll refer to the destination as `-destination` in this tutorial, but you should replace `` with the actual name you used for your connector e.g: `BigQueryDestination` or `bigquery-destination`. ### Step 2: Build the newly generated destination @@ -51,43 +51,45 @@ You can build the destination by running: ./gradlew :airbyte-integrations:connectors:destination-:build ``` -On Mac M1(Apple Silicon) machines(until openjdk images natively support ARM64 images) set the platform variable as shown below and build +On Mac M1\(Apple Silicon\) machines\(until openjdk images natively support ARM64 images\) set the platform variable as shown below and build + ```bash export DOCKER_BUILD_PLATFORM=linux/amd64 # Must be run from the Airbyte project root ./gradlew :airbyte-integrations:connectors:destination-:build ``` -this compiles the java code for your destination and builds a Docker image with the connector. At this point, we haven't implemented anything of value yet, but once we do, you'll use this command to compile your code and Docker image. +this compiles the java code for your destination and builds a Docker image with the connector. At this point, we haven't implemented anything of value yet, but once we do, you'll use this command to compile your code and Docker image. {% hint style="info" %} -Airbyte uses Gradle to manage Java dependencies. To add dependencies for your connector, manage them in the `build.gradle` file inside your connector's directory. +Airbyte uses Gradle to manage Java dependencies. 
To add dependencies for your connector, manage them in the `build.gradle` file inside your connector's directory. {% endhint %} #### Iterating on your implementation We recommend the following ways of iterating on your connector as you're making changes: -* Test-driven development (TDD) in Java -* Test-driven development (TDD) using Airbyte's Acceptance Tests +* Test-driven development \(TDD\) in Java +* Test-driven development \(TDD\) using Airbyte's Acceptance Tests * Directly running the docker image #### Test-driven development in Java -This should feel like a standard flow for a Java developer: you make some code changes then run java tests against them. You can do this directly in your IDE, but you can also run all unit tests via Gradle by running the command to build the connector: -``` +This should feel like a standard flow for a Java developer: you make some code changes then run java tests against them. You can do this directly in your IDE, but you can also run all unit tests via Gradle by running the command to build the connector: + +```text ./gradlew :airbyte-integrations:connectors:destination-:build ``` -This will build the code and run any unit tests. This approach is great when you are testing local behaviors and writing unit tests. +This will build the code and run any unit tests. This approach is great when you are testing local behaviors and writing unit tests. #### TDD using acceptance tests & integration tests -Airbyte provides a standard test suite (dubbed "Acceptance Tests") that runs against every destination connector. They are "free" baseline tests to ensure the basic functionality of the destination. When developing a connector, you can simply run the tests between each change and use the feedback to guide your development. +Airbyte provides a standard test suite \(dubbed "Acceptance Tests"\) that runs against every destination connector. They are "free" baseline tests to ensure the basic functionality of the destination. When developing a connector, you can simply run the tests between each change and use the feedback to guide your development. If you want to try out this approach, check out Step 6 which describes what you need to do to set up the acceptance Tests for your destination. -The nice thing about this approach is that you are running your destination exactly as Airbyte will run it in the CI. The downside is that the tests do not run very quickly. As such, we recommend this iteration approach only once you've implemented most of your connector and are in the finishing stages of implementation. Note that Acceptance Tests are required for every connector supported by Airbyte, so you should make sure to run them a couple of times while iterating to make sure your connector is compatible with Airbyte. +The nice thing about this approach is that you are running your destination exactly as Airbyte will run it in the CI. The downside is that the tests do not run very quickly. As such, we recommend this iteration approach only once you've implemented most of your connector and are in the finishing stages of implementation. Note that Acceptance Tests are required for every connector supported by Airbyte, so you should make sure to run them a couple of times while iterating to make sure your connector is compatible with Airbyte. 
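For reference, once the Acceptance Tests are set up (see Step 6), they can be run with a single Gradle command from the Airbyte repository root; `<name>` below is a placeholder for your connector's name:

```bash
# Runs the acceptance/integration test suite for your destination connector
./gradlew :airbyte-integrations:connectors:destination-<name>:integrationTest
```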
#### Directly running the destination using Docker @@ -116,11 +118,12 @@ The nice thing about this approach is that you are running your destination exac Each destination contains a specification written in JsonSchema that describes its inputs. Defining the specification is a good place to start when developing your destination. Check out the documentation [here](https://json-schema.org/) to learn the syntax. Here's [an example](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-postgres/src/main/resources/spec.json) of what the `spec.json` looks like for the postgres destination. -Your generated template should have the spec file in `airbyte-integrations/connectors/destination-/src/main/resources/spec.json`. The generated connector will take care of reading this file and converting it to the correct output. Edit it and you should be done with this step. +Your generated template should have the spec file in `airbyte-integrations/connectors/destination-/src/main/resources/spec.json`. The generated connector will take care of reading this file and converting it to the correct output. Edit it and you should be done with this step. For more details on what the spec is, you can read about the Airbyte Protocol [here](../../understanding-airbyte/airbyte-specification.md). -See the `spec` operation in action: +See the `spec` operation in action: + ```bash # First build the connector ./gradlew :airbyte-integrations:connectors:destination-:build @@ -131,15 +134,15 @@ docker run --rm airbyte/destination-:dev spec ### Step 4: Implement `check` -The check operation accepts a JSON object conforming to the `spec.json`. In other words if the `spec.json` said that the destination requires a `username` and `password` the config object might be `{ "username": "airbyte", "password": "password123" }`. It returns a json object that reports, given the credentials in the config, whether we were able to connect to the destination. +The check operation accepts a JSON object conforming to the `spec.json`. In other words if the `spec.json` said that the destination requires a `username` and `password` the config object might be `{ "username": "airbyte", "password": "password123" }`. It returns a json object that reports, given the credentials in the config, whether we were able to connect to the destination. While developing, we recommend storing any credentials in `secrets/config.json`. Any `secrets` directory in the Airbyte repo is gitignored by default. -Implement the `check` method in the generated file `Destination.java`. Here's an [example implementation](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-bigquery/src/main/java/io/airbyte/integrations/destination/bigquery/BigQueryDestination.java#L94) from the BigQuery destination. +Implement the `check` method in the generated file `Destination.java`. Here's an [example implementation](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-bigquery/src/main/java/io/airbyte/integrations/destination/bigquery/BigQueryDestination.java#L94) from the BigQuery destination. 
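To give a feel for the shape of the method before you read the full BigQuery example, here is a rough, illustrative sketch of a `check` implementation. It assumes the class and base types produced by the generator (whose imports come from the generated file); exact package names and superclasses may differ between versions, so treat the generated template and the linked example as the source of truth:

```java
// Sketch only - the generated template and the BigQuery example linked above are authoritative.
import com.fasterxml.jackson.databind.JsonNode;
import io.airbyte.protocol.models.AirbyteConnectionStatus;
import io.airbyte.protocol.models.AirbyteConnectionStatus.Status;

public class ExampleDestination extends BaseConnector implements Destination {

  @Override
  public AirbyteConnectionStatus check(final JsonNode config) {
    try {
      // Attempt a cheap operation with the user-supplied credentials,
      // e.g. open and immediately close a connection built from `config`.
      return new AirbyteConnectionStatus().withStatus(Status.SUCCEEDED);
    } catch (final Exception e) {
      return new AirbyteConnectionStatus()
          .withStatus(Status.FAILED)
          .withMessage("Could not connect with the provided configuration: " + e.getMessage());
    }
  }

  // spec() and getConsumer(...) are omitted here; they are covered in Steps 3 and 5.

}
```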
-Verify that the method is working by placing your config in `secrets/config.json` then running: +Verify that the method is working by placing your config in `secrets/config.json` then running: -``` +```text # First build the connector ./gradlew :airbyte-integrations:connectors:destination-:build @@ -148,26 +151,25 @@ docker run -v $(pwd)/secrets:/secrets --rm airbyte/destination-:dev check ``` ### Step 5: Implement `write` -The `write` operation is the main workhorse of a destination connector: it reads input data from the source and writes it to the underlying destination. It takes as input the config file used to run the connector as well as the configured catalog: the file used to describe the schema of the incoming data and how it should be written to the destination. Its "output" is two things: + +The `write` operation is the main workhorse of a destination connector: it reads input data from the source and writes it to the underlying destination. It takes as input the config file used to run the connector as well as the configured catalog: the file used to describe the schema of the incoming data and how it should be written to the destination. Its "output" is two things: 1. Data written to the underlying destination 2. `AirbyteMessage`s of type `AirbyteStateMessage`, written to stdout to indicate which records have been written so far during a sync. It's important to output these messages when possible in order to avoid re-extracting messages from the source. See the [write operation protocol reference](https://docs.airbyte.io/understanding-airbyte/airbyte-specification#write) for more information. To implement the `write` Airbyte operation, implement the `getConsumer` method in your generated `Destination.java` file. Here are some example implementations from different destination connectors: - + * [BigQuery](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-bigquery/src/main/java/io/airbyte/integrations/destination/bigquery/BigQueryDestination.java#L188) * [Google Pubsub](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-pubsub/src/main/java/io/airbyte/integrations/destination/pubsub/PubsubDestination.java#L98) * [Local CSV](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-csv/src/main/java/io/airbyte/integrations/destination/csv/CsvDestination.java#L90) * [Postgres](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-postgres/src/main/java/io/airbyte/integrations/destination/postgres/PostgresDestination.java) - {% hint style="info" %} -The Postgres destination leverages the `AbstractJdbcDestination` superclass which makes it extremely easy to create a destination for a database or data warehouse if it has a compatible JDBC driver. If the destination you are implementing has a JDBC driver, be sure to check out `AbstractJdbcDestination`. +The Postgres destination leverages the `AbstractJdbcDestination` superclass, which makes it extremely easy to create a destination for a database or data warehouse if it has a compatible JDBC driver. If the destination you are implementing has a JDBC driver, be sure to check out `AbstractJdbcDestination`. {% endhint %} For a brief overview of the Airbyte catalog, check out [the Beginner's Guide to the Airbyte Catalog](../../understanding-airbyte/beginners-guide-to-catalog.md).
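To make the shape of `getConsumer` more concrete, here is a heavily simplified, illustrative sketch rather than a working implementation; the connectors linked above are the authoritative references, and the exact consumer interface may vary slightly between versions of the Java CDK. It goes inside your generated `Destination.java`:

```java
@Override
public AirbyteMessageConsumer getConsumer(final JsonNode config,
                                          final ConfiguredAirbyteCatalog catalog,
                                          final Consumer<AirbyteMessage> outputRecordCollector) {
  return new AirbyteMessageConsumer() {

    @Override
    public void start() {
      // Open connections and create any tables/staging areas described by `catalog`.
    }

    @Override
    public void accept(final AirbyteMessage message) {
      if (message.getType() == AirbyteMessage.Type.RECORD) {
        // Buffer or write message.getRecord().getData() to the destination.
      } else if (message.getType() == AirbyteMessage.Type.STATE) {
        // Only forward state once the records received before it are safely written.
        outputRecordCollector.accept(message);
      }
    }

    @Override
    public void close() {
      // Flush any remaining buffered records and release resources.
    }

  };
}
```

The important protocol detail is the state handling: a state message should only be passed to `outputRecordCollector` once every record received before it has been durably written.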
- ### Step 6: Set up Acceptance Tests The Acceptance Tests are a set of tests that run against all destinations. These tests are run in the Airbyte CI to prevent regressions and verify a baseline of functionality. The test cases are contained and documented in the [following file](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/bases/standard-destination-test/src/main/java/io/airbyte/integrations/standardtest/destination/DestinationAcceptanceTest.java). @@ -175,6 +177,7 @@ The Acceptance Tests are a set of tests that run against all destinations. These To setup acceptance Tests for your connector, follow the `TODO`s in the generated file `DestinationAcceptanceTest.java`. Once setup, you can run the tests using `./gradlew :airbyte-integrations:connectors:destination-:integrationTest`. Make sure to run this command from the Airbyte repository root. ### Step 7: Write unit tests and/or integration tests + The Acceptance Tests are meant to cover the basic functionality of a destination. Think of it as the bare minimum required for us to add a destination to Airbyte. You should probably add some unit testing or custom integration testing in case you need to test additional functionality of your destination. #### Step 8: Update the docs @@ -182,4 +185,6 @@ The Acceptance Tests are meant to cover the basic functionality of a destination Each connector has its own documentation page. By convention, that page should have the following path: in `docs/integrations/destinations/.md`. For the documentation to get packaged with the docs, make sure to add a link to it in `docs/SUMMARY.md`. You can pattern match doing that from existing connectors. ## Wrapping up -Well done on making it this far! If you'd like your connector to ship with Airbyte by default, create a PR against the Airbyte repo and we'll work with you to get it across the finish line. + +Well done on making it this far! If you'd like your connector to ship with Airbyte by default, create a PR against the Airbyte repo and we'll work with you to get it across the finish line. + diff --git a/docs/connector-development/tutorials/building-a-python-destination.md b/docs/connector-development/tutorials/building-a-python-destination.md index 4e2df1a2e37..c6afcbf3a4b 100644 --- a/docs/connector-development/tutorials/building-a-python-destination.md +++ b/docs/connector-development/tutorials/building-a-python-destination.md @@ -6,7 +6,7 @@ This article provides a checklist for how to create a Python destination. Each s ## Requirements -Docker and Python with the versions listed in the [tech stack section](../../understanding-airbyte/tech-stack.md). You can use any Python version between 3.7 and 3.9, but this tutorial was tested with 3.7. +Docker and Python with the versions listed in the [tech stack section](../../understanding-airbyte/tech-stack.md). You can use any Python version between 3.7 and 3.9, but this tutorial was tested with 3.7. ## Checklist @@ -22,7 +22,7 @@ Docker and Python with the versions listed in the [tech stack section](../../und * Step 8: Update the docs \(in `docs/integrations/destinations/.md`\) {% hint style="info" %} -If you need help with any step of the process, feel free to submit a PR with your progress and any questions you have, or ask us on [slack](https://slack.airbyte.io). Also reference the KvDB python destination implementation if you want to see an example of a working destination. 
+If you need help with any step of the process, feel free to submit a PR with your progress and any questions you have, or ask us on [slack](https://slack.airbyte.io). Also reference the KvDB python destination implementation if you want to see an example of a working destination. {% endhint %} ## Explaining Each Step @@ -36,11 +36,11 @@ $ cd airbyte-integrations/connector-templates/generator # assumes you are starti $ ./generate.sh ``` -Select the `Python Destination` template and then input the name of your connector. We'll refer to the destination as `destination-` in this tutorial, but you should replace `` with the actual name you used for your connector e.g: `redis` or `google-sheets`. +Select the `Python Destination` template and then input the name of your connector. We'll refer to the destination as `destination-` in this tutorial, but you should replace `` with the actual name you used for your connector e.g: `redis` or `google-sheets`. ### Step 2: Setup the dev environment -Setup your Python virtual environment: +Setup your Python virtual environment: ```bash cd airbyte-integrations/connectors/destination- @@ -54,6 +54,7 @@ source .venv/bin/activate # Install with the "tests" extra which provides test requirements pip install '.[tests]' ``` + This step sets up the initial python environment. **All** subsequent `python` or `pip` commands assume you have activated your virtual environment. If you want your IDE to auto complete and resolve dependencies properly, point it at the python binary in `airbyte-integrations/connectors/destination-/.venv/bin/python`. Also anytime you change the dependencies in the `setup.py` make sure to re-run the build command. The build system will handle installing all dependencies in the `setup.py` into the virtual environment. @@ -62,14 +63,14 @@ Let's quickly get a few housekeeping items out of the way. #### Dependencies -Python dependencies for your destination should be declared in `airbyte-integrations/connectors/destination-/setup.py` in the `install_requires` field. You might notice that a couple of Airbyte dependencies are already declared there (mainly the Airbyte CDK and potentially some testing libraries or helpers). Keep those as they will be useful during development. +Python dependencies for your destination should be declared in `airbyte-integrations/connectors/destination-/setup.py` in the `install_requires` field. You might notice that a couple of Airbyte dependencies are already declared there \(mainly the Airbyte CDK and potentially some testing libraries or helpers\). Keep those as they will be useful during development. You may notice that there is a `requirements.txt` in your destination's directory as well. Do not touch this. It is autogenerated and used to install local Airbyte dependencies which are not published to PyPI. All your dependencies should be declared in `setup.py`. #### Iterating on your implementation -Pretty much all it takes to create a destination is to implement the `Destination` interface. Let's briefly recap the three methods implemented by a Destination: - +Pretty much all it takes to create a destination is to implement the `Destination` interface. Let's briefly recap the three methods implemented by a Destination: + 1. `spec`: declares the user-provided credentials or configuration needed to run the connector 2. `check`: tests if the user-provided configuration can be used to connect to the underlying data destination, and with the correct write permissions 3. 
`write`: writes data to the underlying destination by reading a configuration, a stream of records from stdin, and a configured catalog describing the schema of the data and how it should be written to the destination @@ -98,8 +99,7 @@ cat messages.jsonl | python main.py write --config secrets/config.json --catalog The nice thing about this approach is that you can iterate completely within in python. The downside is that you are not quite running your destination as it will actually be run by Airbyte. Specifically you're not running it from within the docker container that will house it. -**Run using Docker** -If you want to run your destination exactly as it will be run by Airbyte \(i.e. within a docker container\), you can use the following commands from the connector module directory \(`airbyte-integrations/connectors/destination-`\): +**Run using Docker** If you want to run your destination exactly as it will be run by Airbyte \(i.e. within a docker container\), you can use the following commands from the connector module directory \(`airbyte-integrations/connectors/destination-`\): ```bash # First build the container @@ -117,7 +117,7 @@ The nice thing about this approach is that you are running your source exactly a **TDD using standard tests** -_note: these tests aren't yet available for Python connectors but will be very soon. Until then you should use custom unit or integration tests for TDD_. +_note: these tests aren't yet available for Python connectors but will be very soon. Until then you should use custom unit or integration tests for TDD_. Airbyte provides a standard test suite that is run against every destination. The objective of these tests is to provide some "free" tests that can sanity check that the basic functionality of the destination works. One approach to developing your connector is to simply run the tests between each change and use the feedback from them to guide your development. @@ -127,26 +127,25 @@ The nice thing about this approach is that you are running your destination exac ### Step 3: Implement `spec` -Each destination contains a specification written in JsonSchema that describes the inputs it requires and accepts. Defining the specification is a good place to start development. -To do this, find the spec file generated in `airbyte-integrations/connectors/destination-/src/main/resources/spec.json`. Edit it and you should be done with this step. The generated connector will take care of reading this file and converting it to the correct output. +Each destination contains a specification written in JsonSchema that describes the inputs it requires and accepts. Defining the specification is a good place to start development. To do this, find the spec file generated in `airbyte-integrations/connectors/destination-/src/main/resources/spec.json`. Edit it and you should be done with this step. The generated connector will take care of reading this file and converting it to the correct output. Some notes about fields in the output spec: + * `supportsNormalization` is a boolean which indicates if this connector supports [basic normalization via DBT](https://docs.airbyte.io/understanding-airbyte/basic-normalization). If true, `supportsDBT` must also be true. * `supportsDBT` is a boolean which indicates whether this destination is compatible with DBT. If set to true, the user can define custom DBT transformations that run on this destination after each successful sync. This must be true if `supportsNormalization` is set to true. 
* `supported_destination_sync_modes`: An array of strings declaring the sync modes supported by this connector. The available options are: - * `overwrite`: The connector can be configured to wipe any existing data in a stream before writing new data - * `append`: The connector can be configured to append new data to existing data - * `append_dedupe`: The connector can be configured to deduplicate (i.e: UPSERT) data in the destination based on the new data and primary keys + * `overwrite`: The connector can be configured to wipe any existing data in a stream before writing new data + * `append`: The connector can be configured to append new data to existing data + * `append_dedupe`: The connector can be configured to deduplicate \(i.e: UPSERT\) data in the destination based on the new data and primary keys * `supportsIncremental`: Whether the connector supports any `append` sync mode. Must be set to true if `append` or `append_dedupe` are included in the `supported_destination_sync_modes`. - -Some helpful resources: +Some helpful resources: * [**JSONSchema website**](https://json-schema.org/) -* [**Definition of Airbyte Protocol data models**](https://github.com/airbytehq/airbyte/blob/master/airbyte-protocol/models/src/main/resources/airbyte_protocol/airbyte_protocol.yaml). The output of `spec` is described by the `ConnectorSpecification` model (which is wrapped in an `AirbyteConnectionStatus` message). +* [**Definition of Airbyte Protocol data models**](https://github.com/airbytehq/airbyte/blob/master/airbyte-protocol/models/src/main/resources/airbyte_protocol/airbyte_protocol.yaml). The output of `spec` is described by the `ConnectorSpecification` model \(which is wrapped in an `AirbyteConnectionStatus` message\). * [**Postgres Destination's spec.json file**](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-postgres/src/main/resources/spec.json) as an example `spec.json`. -Once you've edited the file, see the `spec` operation in action: +Once you've edited the file, see the `spec` operation in action: ```bash python main.py spec @@ -154,20 +153,21 @@ python main.py spec ### Step 4: Implement `check` -The check operation accepts a JSON object conforming to the `spec.json`. In other words if the `spec.json` said that the destination requires a `username` and `password`, the config object might be `{ "username": "airbyte", "password": "password123" }`. It returns a json object that reports, given the credentials in the config, whether we were able to connect to the destination. +The check operation accepts a JSON object conforming to the `spec.json`. In other words if the `spec.json` said that the destination requires a `username` and `password`, the config object might be `{ "username": "airbyte", "password": "password123" }`. It returns a json object that reports, given the credentials in the config, whether we were able to connect to the destination. While developing, we recommend storing any credentials in `secrets/config.json`. Any `secrets` directory in the Airbyte repo is gitignored by default. -Implement the `check` method in the generated file `destination_/destination.py`. Here's an [example implementation](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-kvdb/destination_kvdb/destination.py) from the KvDB destination. +Implement the `check` method in the generated file `destination_/destination.py`. 
Here's an [example implementation](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-kvdb/destination_kvdb/destination.py) from the KvDB destination. -Verify that the method is working by placing your config in `secrets/config.json` then running: +Verify that the method is working by placing your config in `secrets/config.json` then running: ```bash python main.py check --config secrets/config.json ``` ### Step 5: Implement `write` -The `write` operation is the main workhorse of a destination connector: it reads input data from the source and writes it to the underlying destination. It takes as input the config file used to run the connector as well as the configured catalog: the file used to describe the schema of the incoming data and how it should be written to the destination. Its "output" is two things: + +The `write` operation is the main workhorse of a destination connector: it reads input data from the source and writes it to the underlying destination. It takes as input the config file used to run the connector as well as the configured catalog: the file used to describe the schema of the incoming data and how it should be written to the destination. Its "output" is two things: 1. Data written to the underlying destination 2. `AirbyteMessage`s of type `AirbyteStateMessage`, written to stdout to indicate which records have been written so far during a sync. It's important to output these messages when possible in order to avoid re-extracting messages from the source. See the [write operation protocol reference](https://docs.airbyte.io/understanding-airbyte/airbyte-specification#write) for more information. @@ -176,22 +176,25 @@ To implement the `write` Airbyte operation, implement the `write` method in your ### Step 6: Set up Acceptance Tests -_Coming soon. These tests are not yet available for Python destinations but will be very soon. For now please skip this step and rely on copious -amounts of integration and unit testing_. +_Coming soon. These tests are not yet available for Python destinations but will be very soon. For now please skip this step and rely on copious amounts of integration and unit testing_. ### Step 7: Write unit tests and/or integration tests + The Acceptance Tests are meant to cover the basic functionality of a destination. Think of it as the bare minimum required for us to add a destination to Airbyte. You should probably add some unit testing or custom integration testing in case you need to test additional functionality of your destination. Add unit tests in `unit_tests/` directory and integration tests in the `integration_tests/` directory. Run them via + ```bash python -m pytest -s -vv integration_tests/ -``` +``` -See the [KvDB integration tests](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-kvdb/integration_tests/integration_test.py) for an example of tests you can implement. +See the [KvDB integration tests](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-kvdb/integration_tests/integration_test.py) for an example of tests you can implement. #### Step 8: Update the docs Each connector has its own documentation page. By convention, that page should have the following path: in `docs/integrations/destinations/.md`. For the documentation to get packaged with the docs, make sure to add a link to it in `docs/SUMMARY.md`. You can pattern match doing that from existing connectors. 
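To tie Steps 4 and 5 together, here is a compressed, illustrative sketch of `destination_<name>/destination.py`, loosely modeled on the KvDB example linked above. Treat it as a starting point only: the class name and the `api_key` field are placeholders, and the exact imports and signatures may differ slightly between CDK versions.

```python
from typing import Any, Iterable, Mapping

from airbyte_cdk import AirbyteLogger
from airbyte_cdk.destinations import Destination
from airbyte_cdk.models import (
    AirbyteConnectionStatus,
    AirbyteMessage,
    ConfiguredAirbyteCatalog,
    Status,
    Type,
)


class DestinationExample(Destination):
    def check(self, logger: AirbyteLogger, config: Mapping[str, Any]) -> AirbyteConnectionStatus:
        try:
            # Cheap sanity check with the user-provided config, e.g. authenticate
            # using config["api_key"] (placeholder field name).
            return AirbyteConnectionStatus(status=Status.SUCCEEDED)
        except Exception as e:
            return AirbyteConnectionStatus(status=Status.FAILED, message=f"Connection failed: {e}")

    def write(
        self,
        config: Mapping[str, Any],
        configured_catalog: ConfiguredAirbyteCatalog,
        input_messages: Iterable[AirbyteMessage],
    ) -> Iterable[AirbyteMessage]:
        for message in input_messages:
            if message.type == Type.RECORD:
                # Write message.record.data to the underlying destination here.
                pass
            elif message.type == Type.STATE:
                # Emit state only after the records before it are durably written.
                yield message
```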
## Wrapping up -Well done on making it this far! If you'd like your connector to ship with Airbyte by default, create a PR against the Airbyte repo and we'll work with you to get it across the finish line. + +Well done on making it this far! If you'd like your connector to ship with Airbyte by default, create a PR against the Airbyte repo and we'll work with you to get it across the finish line. + diff --git a/docs/connector-development/tutorials/cdk-speedrun.md b/docs/connector-development/tutorials/cdk-speedrun.md index b665221d1d1..5e5bb93bcca 100644 --- a/docs/connector-development/tutorials/cdk-speedrun.md +++ b/docs/connector-development/tutorials/cdk-speedrun.md @@ -6,6 +6,8 @@ This is a blazing fast guide to building an HTTP source connector. Think of it a If you are a visual learner and want to see a video version of this guide going over each part in detail, check it out below. +{% embed url="https://www.youtube.com/watch?v=kJ3hLoNfz\_E&t=3s" caption="A speedy CDK overview." %} + ## Dependencies 1. Python >= 3.7 @@ -38,7 +40,7 @@ cd source_python_http_example We're working with the PokeAPI, so we need to define our input schema to reflect that. Open the `spec.json` file here and replace it with: -```json +```javascript { "documentationUrl": "https://docs.airbyte.io/integrations/sources/pokeapi", "connectionSpecification": { @@ -58,10 +60,10 @@ We're working with the PokeAPI, so we need to define our input schema to reflect } } ``` + As you can see, we have one input to our input schema, which is `pokemon_name`, which is required. Normally, input schemas will contain information such as API keys and client secrets that need to get passed down to all endpoints or streams. -Ok, let's write a function that checks the inputs we just defined. Nuke the `source.py` file. Now add this code to it. For a crucial time skip, we're going to define all the imports we need in the future here. Also note -that your `AbstractSource` class name must be a camel-cased version of the name you gave in the generation phase. In our case, this is `SourcePythonHttpExample`. +Ok, let's write a function that checks the inputs we just defined. Nuke the `source.py` file. Now add this code to it. For a crucial time skip, we're going to define all the imports we need in the future here. Also note that your `AbstractSource` class name must be a camel-cased version of the name you gave in the generation phase. In our case, this is `SourcePythonHttpExample`. ```python from typing import Any, Iterable, List, Mapping, MutableMapping, Optional, Tuple @@ -152,11 +154,9 @@ class Pokemon(HttpStream): return None # TODO ``` -Now download [this file](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/docs/tutorials/http_api_source_assets/pokemon.json). Name it `pokemon.json` and place it in `/source_python_http_example/schemas`. +Now download [this file](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/docs/tutorials/http_api_source_assets/pokemon.json). Name it `pokemon.json` and place it in `/source_python_http_example/schemas`. -This file defines your output schema for every endpoint that you want to implement. Normally, this will likely be the most time-consuming section of the connector development process, as it requires defining the output of the endpoint -exactly. This is really important, as Airbyte needs to have clear expectations for what the stream will output. 
Note that the name of this stream will be consistent in the naming of the JSON schema and the `HttpStream` class, as -`pokemon.json` and `Pokemon` respectively in this case. Learn more about schema creation [here](https://docs.airbyte.io/connector-development/cdk-python/full-refresh-stream#defining-the-streams-schema). +This file defines your output schema for every endpoint that you want to implement. Normally, this will likely be the most time-consuming section of the connector development process, as it requires defining the output of the endpoint exactly. This is really important, as Airbyte needs to have clear expectations for what the stream will output. Note that the name of this stream will be consistent in the naming of the JSON schema and the `HttpStream` class, as `pokemon.json` and `Pokemon` respectively in this case. Learn more about schema creation [here](https://docs.airbyte.io/connector-development/cdk-python/full-refresh-stream#defining-the-streams-schema). Test your discover function. You should receive a fairly large JSON object in return. @@ -213,8 +213,7 @@ class Pokemon(HttpStream): return None ``` -We now need a catalog that defines all of our streams. We only have one stream: `Pokemon`. Download that file [here](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/docs/tutorials/http_api_source_assets/configured_catalog_pokeapi.json). Place it in `/sample_files` named as `configured_catalog.json`. More clearly, -this is where we tell Airbyte all the streams/endpoints we support for the connector and in which sync modes Airbyte can run the connector on. Learn more about the AirbyteCatalog [here](https://docs.airbyte.io/understanding-airbyte/beginners-guide-to-catalog) and learn more about sync modes [here](https://docs.airbyte.io/understanding-airbyte/connections#sync-modes). +We now need a catalog that defines all of our streams. We only have one stream: `Pokemon`. Download that file [here](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/docs/tutorials/http_api_source_assets/configured_catalog_pokeapi.json). Place it in `/sample_files` named as `configured_catalog.json`. More clearly, this is where we tell Airbyte all the streams/endpoints we support for the connector and in which sync modes Airbyte can run the connector on. Learn more about the AirbyteCatalog [here](https://docs.airbyte.io/understanding-airbyte/beginners-guide-to-catalog) and learn more about sync modes [here](https://docs.airbyte.io/understanding-airbyte/connections#sync-modes). Let's read some data. diff --git a/docs/connector-development/tutorials/cdk-tutorial-python-http/1-creating-the-source.md b/docs/connector-development/tutorials/cdk-tutorial-python-http/1-creating-the-source.md index 50c09811d08..bead7be4942 100644 --- a/docs/connector-development/tutorials/cdk-tutorial-python-http/1-creating-the-source.md +++ b/docs/connector-development/tutorials/cdk-tutorial-python-http/1-creating-the-source.md @@ -8,7 +8,7 @@ $ cd airbyte-integrations/connector-templates/generator # assumes you are starti $ ./generate.sh ``` -This will bring up an interactive helper application. Use the arrow keys to pick a template from the list. Select the `Python HTTP API Source` template and then input the name of your connector. The application will create a new directory in airbyte/airbyte-integrations/connectors/ with the name of your new connector. +This will bring up an interactive helper application. Use the arrow keys to pick a template from the list. 
Select the `Python HTTP API Source` template and then input the name of your connector. The application will create a new directory in airbyte/airbyte-integrations/connectors/ with the name of your new connector. For this walk-through we will refer to our source as `python-http-example`. The finalized source code for this tutorial can be found [here](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/source-python-http-tutorial). diff --git a/docs/connector-development/tutorials/cdk-tutorial-python-http/6-read-data.md b/docs/connector-development/tutorials/cdk-tutorial-python-http/6-read-data.md index 9f0674b39bc..6f6f5a09aa8 100644 --- a/docs/connector-development/tutorials/cdk-tutorial-python-http/6-read-data.md +++ b/docs/connector-development/tutorials/cdk-tutorial-python-http/6-read-data.md @@ -24,9 +24,9 @@ Optionally, we can provide additional inputs to customize requests: Backoff policy options: -- `retry_factor` Specifies factor for exponential backoff policy (by default is 5) -- `max_retries` Specifies maximum amount of retries for backoff policy (by default is 5) -- `raise_on_http_errors` If set to False, allows opting-out of raising HTTP code exception (by default is True) +* `retry_factor` Specifies factor for exponential backoff policy \(by default is 5\) +* `max_retries` Specifies maximum amount of retries for backoff policy \(by default is 5\) +* `raise_on_http_errors` If set to False, allows opting-out of raising HTTP code exception \(by default is True\) There are many other customizable options - you can find them in the [`airbyte_cdk.sources.streams.http.HttpStream`](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/streams/http/http.py) class. diff --git a/docs/connector-development/tutorials/cdk-tutorial-python-http/8-test-your-connector.md b/docs/connector-development/tutorials/cdk-tutorial-python-http/8-test-your-connector.md index 8c4a3a53156..970bdcf0fe0 100644 --- a/docs/connector-development/tutorials/cdk-tutorial-python-http/8-test-your-connector.md +++ b/docs/connector-development/tutorials/cdk-tutorial-python-http/8-test-your-connector.md @@ -12,7 +12,7 @@ Place any integration tests in the `integration_tests` directory such that they ## Standard Tests -Standard tests are a fixed set of tests Airbyte provides that every Airbyte source connector must pass. While they're only required if you intend to submit your connector to Airbyte, you might find them helpful in any case. See [Testing your connectors](../../testing-connectors/README.md) +Standard tests are a fixed set of tests Airbyte provides that every Airbyte source connector must pass. While they're only required if you intend to submit your connector to Airbyte, you might find them helpful in any case. 
See [Testing your connectors](../../testing-connectors/) If you want to submit this connector to become a default connector within Airbyte, follow steps 8 onwards from the [Python source checklist](../building-a-python-source.md#step-8-set-up-standard-tests) diff --git a/docs/connector-development/tutorials/cdk-tutorial-python-http/README.md b/docs/connector-development/tutorials/cdk-tutorial-python-http/README.md index 1faf1a277c0..7339bcb01a2 100644 --- a/docs/connector-development/tutorials/cdk-tutorial-python-http/README.md +++ b/docs/connector-development/tutorials/cdk-tutorial-python-http/README.md @@ -1,2 +1,2 @@ -# Creating an HTTP API Source with the Python CDK +# Python CDK: Creating a HTTP API Source diff --git a/docs/contributing-to-airbyte/README.md b/docs/contributing-to-airbyte/README.md index adb4ea4f95d..5b5339b81e6 100644 --- a/docs/contributing-to-airbyte/README.md +++ b/docs/contributing-to-airbyte/README.md @@ -28,13 +28,13 @@ Here is a list of easy [good first issues](https://github.com/airbytehq/airbyte/ It's easy to add your own connector to Airbyte! **Since Airbyte connectors are encapsulated within Docker containers, you can use any language you like.** Here are some links on how to add sources and destinations. We haven't built the documentation for all languages yet, so don't hesitate to reach out to us if you'd like help developing connectors in other languages. -For sources, simply head over to our [Python CDK](../connector-development/cdk-python/README.md). +For sources, simply head over to our [Python CDK](../connector-development/cdk-python/). {% hint style="info" %} The CDK currently does not support creating destinations, but it will very soon. {% endhint %} -* See [Building new connectors](../connector-development/README.md) to get started. +* See [Building new connectors](../connector-development/) to get started. * Since we frequently build connectors in Python, on top of Singer or in Java, we've created generator libraries to get you started quickly: [Build Python Source Connectors](../connector-development/tutorials/building-a-python-source.md) and [Build Java Destination Connectors](../connector-development/tutorials/building-a-java-destination.md) * Integration tests \(tests that run a connector's image against an external resource\) can be run one of three ways, as detailed [here](../connector-development/testing-connectors/source-acceptance-tests-reference.md) @@ -72,7 +72,7 @@ First, a big thank you! A few things to keep in mind when contributing code: * If you're working on an issue, please comment that you are doing so to prevent duplicate work by others also. * Rebase master with your branch before submitting a pull request. -Here are some details about [our review process](#review-process). +Here are some details about [our review process](./#review-process). ### **Upvoting issues, feature and connector requests** diff --git a/docs/contributing-to-airbyte/code-style.md b/docs/contributing-to-airbyte/code-style.md index c486c9f97f7..71ee349b202 100644 --- a/docs/contributing-to-airbyte/code-style.md +++ b/docs/contributing-to-airbyte/code-style.md @@ -16,6 +16,6 @@ Install it in IntelliJ:‌ 2. Select the file we just downloaded 3. Select `GoogleStyle` in the drop down 4. Change default `Hard wrap at` in `Wrapping and Braces` tab to **150**. -5. We prefer `import foo.bar.ClassName` over `import foo.bar.*`. Even in cases where we import multiple classes from the same package. 
This can be set by going to `Preferences > Code Style > Java > Imports` and changing `Class count to use import with '*'` to 9999 and `Names count to use static import with '*' to 9999. +5. We prefer `import foo.bar.ClassName` over `import foo.bar.*`. Even in cases where we import multiple classes from the same package. This can be set by going to `Preferences > Code Style > Java > Imports` and changing `Class count to use import with '*'` to 9999 and \`Names count to use static import with '\*' to 9999. 6. You're done! diff --git a/docs/contributing-to-airbyte/developing-locally.md b/docs/contributing-to-airbyte/developing-locally.md index eca32f629ce..e95ba817f2c 100644 --- a/docs/contributing-to-airbyte/developing-locally.md +++ b/docs/contributing-to-airbyte/developing-locally.md @@ -28,7 +28,7 @@ To start contributing: ## Build with `gradle` -To compile and build just the platform (not all the connectors): +To compile and build just the platform \(not all the connectors\): ```bash SUB_BUILD=PLATFORM ./gradlew build @@ -38,7 +38,6 @@ This will build all the code and run all the unit tests. `SUB_BUILD=PLATFORM ./gradlew build` creates all the necessary artifacts \(Webapp, Jars and Docker images\) so that you can run Airbyte locally. Since this builds everything, it can take some time. - {% hint style="info" %} Gradle will use all CPU cores by default. If Gradle uses too much/too little CPU, tuning the number of CPU cores it uses to better suit a dev's need can help. diff --git a/docs/contributing-to-airbyte/developing-on-kubernetes.md b/docs/contributing-to-airbyte/developing-on-kubernetes.md index ff40c7d549d..023479cd3e6 100644 --- a/docs/contributing-to-airbyte/developing-on-kubernetes.md +++ b/docs/contributing-to-airbyte/developing-on-kubernetes.md @@ -1,14 +1,15 @@ -# Developing On Kubernetes +# Developing on Kubernetes -Make sure to read [our docs for developing locally](./developing-locally.md) first. +Make sure to read [our docs for developing locally](developing-locally.md) first. ## Architecture ![Airbyte on Kubernetes](../.gitbook/assets/contributing-to-airbyte-k8s-architecture.png) -## Iteration Cycle (Locally) +## Iteration Cycle \(Locally\) If you're developing locally using Minikube/Docker Desktop/Kind, you can iterate with the following series of commands: + ```bash ./gradlew composeBuild # build dev images kubectl delete -k kube/overlays/dev # optional (allows you to recreate resources from scratch) @@ -18,18 +19,16 @@ kubectl port-forward svc/airbyte-webapp-svc 8000:80 # port forward the api/ui ## Iteration Cycle \(on GKE\) -The process is similar to developing on a local cluster, except you will need to build the local version and push it to your own container -registry with names such as `your-registry/scheduler`. Then you will need to configure an overlay to override the name of images and apply -your overlay with `kubectl apply -k `. +The process is similar to developing on a local cluster, except you will need to build the local version and push it to your own container registry with names such as `your-registry/scheduler`. Then you will need to configure an overlay to override the name of images and apply your overlay with `kubectl apply -k `. We are [working to improve this process](https://github.com/airbytehq/airbyte/issues/4225). ## Completely resetting a local cluster -In most cases, running `kubectl delete -k kube/overlays/dev` is sufficient to remove the core Airbyte-related components. 
However, if you are in a dev environment on a local cluster only running Airbyte and want to start **completely from scratch** (removing all PVCs, pods, completed pods, etc.), you can use the following command -to destroy everything on the cluster: +In most cases, running `kubectl delete -k kube/overlays/dev` is sufficient to remove the core Airbyte-related components. However, if you are in a dev environment on a local cluster only running Airbyte and want to start **completely from scratch** \(removing all PVCs, pods, completed pods, etc.\), you can use the following command to destroy everything on the cluster: ```bash # BE CAREFUL, THIS COMMAND DELETES ALL RESOURCES, EVEN NON-AIRBYTE ONES! kubectl delete "$(kubectl api-resources --namespaced=true --verbs=delete -o name | tr "\n" "," | sed -e 's/,$//')" --all ``` + diff --git a/docs/contributing-to-airbyte/gradle-cheatsheet.md b/docs/contributing-to-airbyte/gradle-cheatsheet.md index eb3ba5515e1..31e4dec1350 100644 --- a/docs/contributing-to-airbyte/gradle-cheatsheet.md +++ b/docs/contributing-to-airbyte/gradle-cheatsheet.md @@ -1,7 +1,7 @@ # Gradle Cheatsheet - ## Overview + We have 3 ways of slicing our builds: 1. **Build Everything**: Including every single connectors. @@ -12,39 +12,43 @@ We have 3 ways of slicing our builds: In our CI we run **Build Platform** and **Build Connectors Base**. Then separately, on a regular cadence, we build each connector and run its integration tests. -We split Build Platform and Build Connectors Base from each other for a few reasons: -1. The tech stacks are very different. The Platform is almost entirely Java. Because of differing needs around separating environments, the Platform build can be optimized separately from the Connectors one. -2. We want to the iteration cycles of people working on connectors or the platform faster _and_ independent. e.g. Before this change someone working on a Platform feature needs to run formatting on the entire codebase (including connectors). This led to a lot of cosmetic build failures that obfuscated actually problems. Ideally a failure on the connectors side should not block progress on the platform side. -3. The lifecycles are different. One can safely release the Platform even if parts of Connectors Base is failing (and vice versa). +We split Build Platform and Build Connectors Base from each other for a few reasons: 1. The tech stacks are very different. The Platform is almost entirely Java. Because of differing needs around separating environments, the Platform build can be optimized separately from the Connectors one. 2. We want to the iteration cycles of people working on connectors or the platform faster _and_ independent. e.g. Before this change someone working on a Platform feature needs to run formatting on the entire codebase \(including connectors\). This led to a lot of cosmetic build failures that obfuscated actually problems. Ideally a failure on the connectors side should not block progress on the platform side. 3. The lifecycles are different. One can safely release the Platform even if parts of Connectors Base is failing \(and vice versa\). Future Work: The next step here is to figure out how to more formally split connectors and platform. Right now we exploit behavior in `settings.gradle` to separate them. This is not a best practice. Ultimately, we want these two builds to be totally separate. We do not know what that will look like yet. ## Cheatsheet + Here is a cheatsheet for common gradle commands. 
### Basic Build Syntax + Here is the syntax for running gradle commands on the different parts of the code base that we called out above. #### Build Everything -```shell + +```text ./gradlew ``` #### Build Platform -```shell + +```text SUB_BUILD=PLATFORM ./gradlew ``` #### Build Connectors Base -```shell + +```text SUB_BUILD=CONNECTORS_BASE ./gradlew ``` ### Build -In order to "build" the project. This task includes producing all artifacts and running unit tests (anything called in the `:test` task). It does _not_ include integration tests (anything called in the `:integrationTest` task). + +In order to "build" the project. This task includes producing all artifacts and running unit tests \(anything called in the `:test` task\). It does _not_ include integration tests \(anything called in the `:integrationTest` task\). For example all the following are valid. -```shell + +```text ./gradlew build SUB_BUILD=PLATFORM ./gradlew build SUB_BUILD=CONNECTORS_BASE ./gradlew build @@ -52,10 +56,11 @@ SUB_BUILD=CONNECTORS_BASE ./gradlew build ### Formatting -The build system has a custom task called `format`. It is not called as part of `build`. If the command is called on a subset of the project, it will (mostly) target just the included modules. The exception is that `spotless` (a gradle formatter) will always format any file types that it is configured to manage regardless of which sub build is run. `spotless` is relatively fast, so this should not be too much of an annoyance. It can lead to formatting changes in unexpected parts of the code base. +The build system has a custom task called `format`. It is not called as part of `build`. If the command is called on a subset of the project, it will \(mostly\) target just the included modules. The exception is that `spotless` \(a gradle formatter\) will always format any file types that it is configured to manage regardless of which sub build is run. `spotless` is relatively fast, so this should not be too much of an annoyance. It can lead to formatting changes in unexpected parts of the code base. For example all the following are valid. -```shell + +```text ./gradlew format SUB_BUILD=PLATFORM ./gradlew format SUB_BUILD=CONNECTORS_BASE ./gradlew format @@ -64,44 +69,53 @@ SUB_BUILD=CONNECTORS_BASE ./gradlew format ### Platform-Specific Commands #### Build Artifacts + This command just builds the docker images that are used as artifacts in the platform. It bypasses running tests. -```shell +```text SUB_BUILD=PLATFORM ./gradlew composeBuild ``` #### Running Tests + The Platform has 3 different levels of tests: Unit Tests, Acceptance Tests, Frontend Acceptance Tests. -##### Unit Tests -Unit Tests can be run using the `:test` task on any submodule. These test class-level behavior. They should avoid using external resources (e.g. calling staging services or pulling resources from the internet). We do allow these tests to spin up local resources (usually in docker containers). For example, we use test containers frequently to spin up test postgres databases. +**Unit Tests** + +Unit Tests can be run using the `:test` task on any submodule. These test class-level behavior. They should avoid using external resources \(e.g. calling staging services or pulling resources from the internet\). We do allow these tests to spin up local resources \(usually in docker containers\). For example, we use test containers frequently to spin up test postgres databases. 
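As a quick illustration of the unit-test task described above, here is a minimal sketch. `airbyte-server` is used purely as an example module name; substitute whichever submodule you are working on. Testcontainers-based tests assume a local Docker daemon is available.

```bash
# Run unit tests for a single submodule (airbyte-server is only an example name)
SUB_BUILD=PLATFORM ./gradlew :airbyte-server:test

# Run unit tests across the whole platform sub-build
SUB_BUILD=PLATFORM ./gradlew test
```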
+ +**Acceptance Tests** -##### Acceptance Tests We split Acceptance Tests into 2 different test suites: + * Platform Acceptance Tests: These tests are a coarse test to sanity check that each major feature in the platform. They are run with the following command: `SUB_BUILD=PLATFORM ./gradlew :airbyte-tests:acceptanceTests`. These tests expect to find a local version of Airbyte running. For testing the docker version start Airbyte locally. For an example, see the [script](https://github.com/airbytehq/airbyte/blob/master/tools/bin/acceptance_test.sh) that is used by the CI. For Kubernetes, see the [script](https://github.com/airbytehq/airbyte/blob/master/tools/bin/acceptance_test_kube.sh) that is used by the CI. * Migration Acceptance Tests: These tests make sure the end-to-end process of migrating from one version of Airbyte to the next works. These tests are run with the following command: `SUB_BUILD=PLATFORM ./gradlew :airbyte-tests:automaticMigrationAcceptanceTest --scan`. These tests do not expect there to be a separate deployment of Airbyte running. These tests currently all live in `airbyte-tests` -##### Frontend Acceptance Tests +**Frontend Acceptance Tests** + These are acceptance tests for the frontend. They are run with `SUB_BUILD=PLATFORM ./gradlew --no-daemon :airbyte-e2e-testing:e2etest`. Like the Platform Acceptance Tests, they expect Airbyte to be running locally. See the [script](https://github.com/airbytehq/airbyte/blob/master/tools/bin/e2e_test.sh) that is used by the CI. These tests currently all live in `airbyte-e2e-testing`. -##### Future Work -Our story around "integration testing" or "E2E testing" is a little ambiguous. Our Platform Acceptance Test Suite is getting somewhat unwieldy. It was meant to just be some coarse sanity checks, but over time we have found more need to test interactions between systems more granular. Whether we start supporting a separate class of tests (e.g. integration tests) or figure out how allow for more granular tests in the existing Acceptance Test framework is TBD. +**Future Work** -### Connectors-Specific Commands (Connector Development) +Our story around "integration testing" or "E2E testing" is a little ambiguous. Our Platform Acceptance Test Suite is getting somewhat unwieldy. It was meant to just be some coarse sanity checks, but over time we have found more need to test interactions between systems more granular. Whether we start supporting a separate class of tests \(e.g. integration tests\) or figure out how allow for more granular tests in the existing Acceptance Test framework is TBD. + +### Connectors-Specific Commands \(Connector Development\) #### Commands used in CI -All connectors, regardless of implementation language, implement the following interface to allow uniformity in the build system when run from CI: -**Build connector, run unit tests, and build Docker image**: `./gradlew :airbyte-integrations:connectors::build` -**Run integration tests**: `./gradlew :airbyte-integrations:connectors::integrationTest` +All connectors, regardless of implementation language, implement the following interface to allow uniformity in the build system when run from CI: + +**Build connector, run unit tests, and build Docker image**: `./gradlew :airbyte-integrations:connectors::build` **Run integration tests**: `./gradlew :airbyte-integrations:connectors::integrationTest` #### Python -The ideal end state for a Python connector developer is that they shouldn't have to know Gradle exists. 
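The connector-level commands above lose their connector-name placeholder in this patch (they appear as `connectors::build` and `connectors::integrationTest`). As a sketch, with `source-postgres` standing in for any directory under `airbyte-integrations/connectors/`:

```bash
# Build one connector, run its unit tests, and build its Docker image
# (source-postgres is only an example; substitute your connector's directory name)
./gradlew :airbyte-integrations:connectors:source-postgres:build

# Run that connector's integration tests
./gradlew :airbyte-integrations:connectors:source-postgres:integrationTest
```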
+ +The ideal end state for a Python connector developer is that they shouldn't have to know Gradle exists. We're almost there, but today there is only one Gradle command that's needed when developing in Python, used for formatting code. **Formatting python module**: `./gradlew :airbyte-integrations:connectors::airbytePythonFormat` + diff --git a/docs/contributing-to-airbyte/monorepo-python-development.md b/docs/contributing-to-airbyte/monorepo-python-development.md index e1c809cb30b..bce42d17d51 100644 --- a/docs/contributing-to-airbyte/monorepo-python-development.md +++ b/docs/contributing-to-airbyte/monorepo-python-development.md @@ -4,15 +4,18 @@ This guide contains instructions on how to setup Python with Gradle within the A ## Python Connector Development -Before working with connectors written in Python, we recommend running +Before working with connectors written in Python, we recommend running + ```bash ./gradlew :airbyte-integrations:connectors::build ``` + e.g ```bash ./gradlew :airbyte-integrations:connectors:source-postgres:build ``` + from the root project directory. This will create a `virtualenv` and install dependencies for the connector you want to work on as well as any internal Airbyte python packages it depends on. When iterating on a single connector, you will often iterate by running @@ -30,37 +33,45 @@ This command will: 2. [isort](https://pypi.org/project/isort/) to sort imports 3. [Flake8](https://pypi.org/project/flake8/) to check formatting 4. [MyPy](https://pypi.org/project/mypy/) to check type usage - + ## Formatting/linting -To format and lint your code before commit you can use the Gradle command above, but for convenience we support [pre-commit](https://pre-commit.com/) tool. -To use it you need to install it first: + +To format and lint your code before commit you can use the Gradle command above, but for convenience we support [pre-commit](https://pre-commit.com/) tool. To use it you need to install it first: + ```bash pip install pre-commit -``` -then, to install `pre-commit` as a git hook, run ``` + +then, to install `pre-commit` as a git hook, run + +```text pre-commit install ``` + That's it, `pre-commit` will format/lint the code every time you commit something. You find more information about pre-commit [here](https://pre-commit.com/). ## IDE + At Airbyte, we use IntelliJ IDEA for development. Although it is possible to develop connectors with any IDE, we typically recommend IntelliJ IDEA or PyCharm, since we actively work towards compatibility. ### Autocompletion + Install the [Pydantic](https://plugins.jetbrains.com/plugin/12861-pydantic) plugin. This will help autocompletion with some of our internal types. -### PyCharm (ItelliJ IDEA) +### PyCharm \(ItelliJ IDEA\) + The following setup steps are written for PyCharm but should have similar equivalents for IntelliJ IDEA: -2. Go to `File -> New -> Project...` -3. Select `Pure Python`. -4. Select a project name like `airbyte` and a directory **outside of** the `airbyte` code root. -5. Go to `Prefferences -> Project -> Python Interpreter` -6. Find a gear ⚙️ button next to `Python interpreter` dropdown list, click and select `Add` -7. Select `Virtual Environment -> Existing` -8. Set the interpreter path to the one that was created by Gradle command, i.e. `airbyte-integrations/connectors/your-connector-dir/.venv/bin/python`. -9. Wait for PyCharm to finish indexing and loading skeletons from selected virtual environment. +1. Go to `File -> New -> Project...` +2. Select `Pure Python`. +3. 
Select a project name like `airbyte` and a directory **outside of** the `airbyte` code root. +4. Go to `Prefferences -> Project -> Python Interpreter` +5. Find a gear ⚙️ button next to `Python interpreter` dropdown list, click and select `Add` +6. Select `Virtual Environment -> Existing` +7. Set the interpreter path to the one that was created by Gradle command, i.e. `airbyte-integrations/connectors/your-connector-dir/.venv/bin/python`. +8. Wait for PyCharm to finish indexing and loading skeletons from selected virtual environment. You should now have access to code completion and proper syntax highlighting for python projects. If you need to work on another connector you can quickly change the current virtual environment in the bottom toolbar. + diff --git a/docs/contributing-to-airbyte/updating-documentation.md b/docs/contributing-to-airbyte/updating-documentation.md index d0a122d183f..b3e37fb4b4c 100644 --- a/docs/contributing-to-airbyte/updating-documentation.md +++ b/docs/contributing-to-airbyte/updating-documentation.md @@ -4,12 +4,13 @@ Our documentation uses [GitBook](https://gitbook.com), and all the [Markdown](ht ## Workflow for updating docs -1. Modify docs using Git or the Github UI (All docs live in the `docs/` folder in the [Airbyte repository](https://github.com/airbytehq/airbyte)) +1. Modify docs using Git or the Github UI \(All docs live in the `docs/` folder in the [Airbyte repository](https://github.com/airbytehq/airbyte)\) 2. If you're adding new files, update `docs/SUMMARY.md`. 3. If you're moving existing pages, add redirects in the [`.gitbook.yaml` file](https://github.com/airbytehq/airbyte/blob/master/.gitbook.yaml) in the Airbyte repository root directory 4. Create a Pull Request ### Modify in the Github UI + 1. Directly edit the docs you want to edit [in the Github UI](https://docs.github.com/en/github/managing-files-in-a-repository/managing-files-on-github/editing-files-in-your-repository) 2. Create a Pull Request @@ -22,20 +23,22 @@ Our documentation uses [GitBook](https://gitbook.com), and all the [Markdown](ht git clone git@github.com:{YOUR_USERNAME}/airbyte.git cd airbyte ``` + Or + ```bash git clone https://github.com/{YOUR_USERNAME}/airbyte.git cd airbyte ``` - {% hint style="info" %} + While cloning on Windows, you might encounter errors about long filenames. Refer to the instructions [here](../deploying-airbyte/local-deployment.md#handling-long-filename-error) to correct it. - {% endhint %} 3. Modify the documentation. 4. Create a pull request ## Documentation Best Practices -Connectors typically have the following documentation elements: + +Connectors typically have the following documentation elements: * READMEs * Changelogs @@ -43,86 +46,91 @@ Connectors typically have the following documentation elements: * Source code comments * How-to guides -Below are some best practices related to each of these. +Below are some best practices related to each of these. ### READMEs + Every module should have a README containing: * A brief description of the module -* development pre-requisites (like which language or binaries are required for development) +* development pre-requisites \(like which language or binaries are required for development\) * how to install dependencies * how to build and run the code locally & via Docker * any other information needed for local iteration - + ### Changelogs -##### Core +**Core** + Core changelogs should be updated in the `docs/project-overview/platform.md` file. 
#### Connectors -Each connector should have a CHANGELOG.md section in its public facing docs in the `docs/integrations//` at the bottom of the page. Inside, each new connector version should have a section whose title is the connector's version number. The body of this section should describe the changes added in the new version. For example: -``` +Each connector should have a CHANGELOG.md section in its public facing docs in the `docs/integrations//` at the bottom of the page. Inside, each new connector version should have a section whose title is the connector's version number. The body of this section should describe the changes added in the new version. For example: + +```text | Version | Date | Pull Request | Subject | | :------ | :-------- | :----- | :------ | | 0.2.0 | 20XX-05-XX | [PR2#](https://github.com/airbytehq/airbyte/pull/PR2#) | Fixed bug with schema generation

Added a better description for the `password` input parameter | | 0.1.0 | 20XX-04-XX | [PR#](https://github.com/airbytehq/airbyte/pull/PR#) | Added incremental sync | ``` - + ### Source code comments -It's hard to pin down exactly what to do around source code comments, but there are two (very subjective) and rough guidelines: + +It's hard to pin down exactly what to do around source code comments, but there are two \(very subjective\) and rough guidelines: **If something is not obvious, write it down**. Examples include: * non-trivial class definitions should have docstrings -* magic variables should have comments explaining why those values are used (e.g: if using a page size of 10 in a connector, describe why if possible. If there is no reason, that's also fine, just mention in a comment). +* magic variables should have comments explaining why those values are used \(e.g: if using a page size of 10 in a connector, describe why if possible. If there is no reason, that's also fine, just mention in a comment\). * Complicated subroutines/logic which cannot be refactored should have comments explaining what they are doing and why - -**If something is obvious, don't write it down** since it's probably more likely to go out of date. For example, a comment like `x = 42; // sets x to 42` is not adding any new information and is therefore better omitted. -### Issues & Pull Requests +**If something is obvious, don't write it down** since it's probably more likely to go out of date. For example, a comment like `x = 42; // sets x to 42` is not adding any new information and is therefore better omitted. + +### Issues & Pull Requests #### Titles -**Describe outputs, not implementation**: An issue or PR title should describe the desired end result, not the implementation. The exception is child issues/subissues of an epic. -**Be specific about the domain**. Airbyte operates a monorepo, so being specific about what is being changed in the PR or issue title is important. +**Describe outputs, not implementation**: An issue or PR title should describe the desired end result, not the implementation. The exception is child issues/subissues of an epic. **Be specific about the domain**. Airbyte operates a monorepo, so being specific about what is being changed in the PR or issue title is important. -Some examples: -_subpar issue title_: `Remove airbyteCdk.dependsOn("unrelatedPackage")`. This describes a solution not a problem. +Some examples: _subpar issue title_: `Remove airbyteCdk.dependsOn("unrelatedPackage")`. This describes a solution not a problem. -_good issue title_: `Building the Airbyte Python CDK should not build unrelated packages`. Describes desired end state and the intent is understandable without reading the full issue. +_good issue title_: `Building the Airbyte Python CDK should not build unrelated packages`. Describes desired end state and the intent is understandable without reading the full issue. _subpar PR title_: `Update tests`. Which tests? What was the update? - -_good PR title_: `Source MySQL: update acceptance tests to connect to SSL-enabled database`. Specific about the domain and change that was made. -**PR title conventions** -When creating a PR, follow the naming conventions depending on the change being made: +_good PR title_: `Source MySQL: update acceptance tests to connect to SSL-enabled database`. Specific about the domain and change that was made. 
-* Notable updates to Airbyte Core: "🎉" - * e.g: `🎉 enable configuring un-nesting in normalization` -* New connectors: “🎉 New source or destination: ” e.g: `🎉 New Source: Okta` -* New connector features: “🎉 : E.g: - * `🎉 Destination Redshift: write JSONs as SUPER type instead of VARCHAR` - * `🎉 Source MySQL: enable logical replication` +**PR title conventions** When creating a PR, follow the naming conventions depending on the change being made: + +* Notable updates to Airbyte Core: "🎉" + * e.g: `🎉 enable configuring un-nesting in normalization` +* New connectors: “🎉 New source or destination: ” e.g: `🎉 New Source: Okta` +* New connector features: “🎉 : E.g: + * `🎉 Destination Redshift: write JSONs as SUPER type instead of VARCHAR` + * `🎉 Source MySQL: enable logical replication` * Bugfixes should start with the 🐛 emoji - * `🐛 Source Facebook Marketing: fix incorrect parsing of lookback window` + * `🐛 Source Facebook Marketing: fix incorrect parsing of lookback window` * Documentation improvements should start with any of the book/paper emojis: 📚 📝 etc… -* Any refactors, cleanups, etc.. that are not visible improvements to the user should not have emojis +* Any refactors, cleanups, etc.. that are not visible improvements to the user should not have emojis -The emojis help us identify which commits should be included in the product release notes. +The emojis help us identify which commits should be included in the product release notes. -#### Descriptions -**Context**: Provide enough information (or a link to enough information) in the description so team members with no context can understand what the issue or PR is trying to accomplish. This usually means you should include two things: +#### Descriptions + +**Context**: Provide enough information \(or a link to enough information\) in the description so team members with no context can understand what the issue or PR is trying to accomplish. This usually means you should include two things: 1. Some background information motivating the problem 2. A description of the problem itself 3. Good places to start reading and file changes that can be skipped -Some examples: -_insufficient context_: `Create an OpenAPI to JSON schema generator`. Unclear what the value or problem being solved here is. + Some examples: + +_insufficient context_: `Create an OpenAPI to JSON schema generator`. Unclear what the value or problem being solved here is. _good context_: -``` + +```text When creating or updating connectors, we spend a lot of time manually transcribing JSON Schema files based on OpenAPI docs. This is ncessary because OpenAPI and JSON schema are very similar but not perfectly compatible. This process is automatable. Therefore we should create a program which converts from OpenAPI to JSONSchema format. -``` +``` + diff --git a/docs/deploying-airbyte/local-deployment.md b/docs/deploying-airbyte/local-deployment.md index 0add0111026..5084e8d33d3 100644 --- a/docs/deploying-airbyte/local-deployment.md +++ b/docs/deploying-airbyte/local-deployment.md @@ -26,10 +26,9 @@ We recommend following [this guide](https://docs.docker.com/docker-for-windows/i **I have a Mac with the M1 chip. Is it possible to run Airbyte?** -Some users using Macs with an M1 chip are facing some problems running Airbyte. -The problem is related with the chip and Docker. [Issue #2017](https://github.com/airbytehq/airbyte/issues/2017) was created to follow up the problem, you can subscribe to it and get updates about the resolution. 
-If you can successfully run Airbyte using a MacBook with the M1 chip, let us know so that we can share the process with the community! +Some users using Macs with an M1 chip are facing some problems running Airbyte. The problem is related with the chip and Docker. [Issue \#2017](https://github.com/airbytehq/airbyte/issues/2017) was created to follow up the problem, you can subscribe to it and get updates about the resolution. If you can successfully run Airbyte using a MacBook with the M1 chip, let us know so that we can share the process with the community! **Other issues** If you encounter any issues, just connect to our [Slack](https://slack.airbyte.io). Our community will help! We also have a [troubleshooting](../troubleshooting/on-deploying.md) section in our docs for common problems. + diff --git a/docs/deploying-airbyte/on-cloud.md b/docs/deploying-airbyte/on-cloud.md index 09ea060a4de..6d7fa52f913 100644 --- a/docs/deploying-airbyte/on-cloud.md +++ b/docs/deploying-airbyte/on-cloud.md @@ -11,7 +11,7 @@ Airbyte Cloud requires no setup and can be immediately run from your web browser If you don't have an invite, sign up [here!](https://airbyte.io/cloud-waitlist) **2. Click on the default workspace.** - + You will be provided 1000 credits to get your first few syncs going! ![](../.gitbook/assets/cloud_onboarding.png) @@ -21,3 +21,4 @@ You will be provided 1000 credits to get your first few syncs going! ![](../.gitbook/assets/cloud_connection_onboarding.png) **4. You're done!** + diff --git a/docs/deploying-airbyte/on-kubernetes.md b/docs/deploying-airbyte/on-kubernetes.md index 3d4f03de3d8..eece6c1f7ad 100644 --- a/docs/deploying-airbyte/on-kubernetes.md +++ b/docs/deploying-airbyte/on-kubernetes.md @@ -1,14 +1,16 @@ -# On Kubernetes +# On Kubernetes \(Beta\) ## Overview -Airbyte allows scaling sync workloads horizontally using Kubernetes. The core components (api server, scheduler, etc) run as deployments while the scheduler launches connector-related pods on different nodes. +Airbyte allows scaling sync workloads horizontally using Kubernetes. The core components \(api server, scheduler, etc\) run as deployments while the scheduler launches connector-related pods on different nodes. ## Getting Started ### Cluster Setup + For local testing we recommend following one of the following setup guides: -* [Docker Desktop (Mac)](https://docs.docker.com/desktop/kubernetes/) + +* [Docker Desktop \(Mac\)](https://docs.docker.com/desktop/kubernetes/) * [Minikube](https://minikube.sigs.k8s.io/docs/start/) * NOTE: Start Minikube with at least 4gb RAM with `minikube start --memory=4000` * [Kind](https://kind.sigs.k8s.io/docs/user/quick-start/) @@ -17,8 +19,7 @@ For testing on GKE you can [create a cluster with the command line or the Cloud For testing on EKS you can [install eksctl](https://eksctl.io/introduction/) and run `eksctl create cluster` to create an EKS cluster/VPC/subnets/etc. This process should take 10-15 minutes. -For production, Airbyte should function on most clusters v1.19 and above. We have tested support on GKE and EKS. If you run into a problem starting -Airbyte, please reach out on the `#troubleshooting` channel on our [Slack](https://slack.airbyte.io/) or [create an issue on GitHub](https://github.com/airbytehq/airbyte/issues/new?assignees=&labels=type%2Fbug&template=bug-report.md&title=). +For production, Airbyte should function on most clusters v1.19 and above. We have tested support on GKE and EKS. 
If you run into a problem starting Airbyte, please reach out on the `#troubleshooting` channel on our [Slack](https://slack.airbyte.io/) or [create an issue on GitHub](https://github.com/airbytehq/airbyte/issues/new?assignees=&labels=type%2Fbug&template=bug-report.md&title=). ### Install `kubectl` @@ -31,7 +32,9 @@ Configure `kubectl` to connect to your cluster by using `kubectl use-context my- * For GKE * Configure `gcloud` with `gcloud auth login`. * On the Google Cloud Console, the cluster page will have a `Connect` button, which will give a command to run locally that looks like + `gcloud container clusters get-credentials CLUSTER_NAME --zone ZONE_NAME --project PROJECT_NAME`. + * Use `kubectl config get-contexts` to show the contexts available. * Run `kubectl use-context ` to access the cluster from `kubectl`. * For EKS @@ -43,16 +46,16 @@ Configure `kubectl` to connect to your cluster by using `kubectl use-context my- ### Configure Logs -Both `dev` and `stable` versions of Airbyte include a stand-alone `Minio` deployment. Airbyte publishes logs to this `Minio` deployment by default. -This means Airbyte comes as a **self-contained Kubernetes deployment - no other configuration is required**. +Both `dev` and `stable` versions of Airbyte include a stand-alone `Minio` deployment. Airbyte publishes logs to this `Minio` deployment by default. This means Airbyte comes as a **self-contained Kubernetes deployment - no other configuration is required**. -Airbyte currently supports logging to `Minio`, `S3` or `GCS`. The following instructions are for users wishing to log to their own `Minio` layer, `S3` bucket -or `GCS` bucket. +Airbyte currently supports logging to `Minio`, `S3` or `GCS`. The following instructions are for users wishing to log to their own `Minio` layer, `S3` bucket or `GCS` bucket. The provided credentials require both read and write permissions. The logger attempts to create the log bucket if it does not exist. #### Configuring Custom Minio Log Location + Replace the following variables in the `.env` file in the `kube/overlays/stable` directory: + ```text # The Minio bucket to write logs in. S3_LOG_BUCKET= @@ -63,10 +66,13 @@ AWS_SECRET_ACCESS_KEY= # Endpoint where Minio is deployed at. S3_MINIO_ENDPOINT= ``` + The `S3_PATH_STYLE_ACCESS` variable should remain `true`. The `S3_LOG_BUCKET_REGION` variable should remain empty. #### Configuring Custom S3 Log Location + Replace the following variables in the `.env` file in the `kube/overlays/stable` directory: + ```text # The S3 bucket to write logs in. S3_LOG_BUCKET= @@ -82,20 +88,22 @@ S3_MINIO_ENDPOINT= S3_PATH_STYLE_ACCESS= ``` -See [here](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html) for instructions on creating an S3 bucket and -[here](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys) for instructions on creating AWS credentials. +See [here](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html) for instructions on creating an S3 bucket and [here](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys) for instructions on creating AWS credentials. #### Configuring Custom GCS Log Location + Create the GCP service account with read/write permission to the GCS log bucket. -1) Base64 encode the GCP json secret. -``` +1\) Base64 encode the GCP json secret. + +```text # The output of this command will be a Base64 string. 
$ cat gcp.json | base64 ``` -2) Populate the gcs-log-creds secrets with the Base64-encoded credential. This is as simple as taking the encoded credential from the previous step -and adding it to the `secret-gcs-log-creds,yaml` file. -``` + +2\) Populate the gcs-log-creds secrets with the Base64-encoded credential. This is as simple as taking the encoded credential from the previous step and adding it to the `secret-gcs-log-creds,yaml` file. + +```text apiVersion: v1 kind: Secret metadata: @@ -105,28 +113,28 @@ data: gcp.json: ``` -3) Replace the following variables in the `.env` file in the `kube/overlays/stable` directory: -``` +3\) Replace the following variables in the `.env` file in the `kube/overlays/stable` directory: + +```text # The GCS bucket to write logs in. GCP_STORAGE_BUCKET= # The path the GCS creds are written to. Unless you know what you are doing, use the below default value. GOOGLE_APPLICATION_CREDENTIALS=/secrets/gcs-log-creds/gcp.json ``` -See [here](https://cloud.google.com/storage/docs/creating-buckets) for instruction on creating a GCS bucket and -[here](https://cloud.google.com/iam/docs/creating-managing-service-account-keys#iam-service-account-keys-create-console) for instruction on creating GCP credentials. +See [here](https://cloud.google.com/storage/docs/creating-buckets) for instruction on creating a GCS bucket and [here](https://cloud.google.com/iam/docs/creating-managing-service-account-keys#iam-service-account-keys-create-console) for instruction on creating GCP credentials. ### Launch Airbyte Run the following commands to launch Airbyte: + ```text git clone https://github.com/airbytehq/airbyte.git cd airbyte kubectl apply -k kube/overlays/stable ``` -After 2-5 minutes, `kubectl get pods | grep airbyte` should show `Running` as the status for all the core Airbyte pods. This may take longer -on Kubernetes clusters with slow internet connections. +After 2-5 minutes, `kubectl get pods | grep airbyte` should show `Running` as the status for all the core Airbyte pods. This may take longer on Kubernetes clusters with slow internet connections. Run `kubectl port-forward svc/airbyte-webapp-svc 8000:80` to allow access to the UI/API. @@ -147,50 +155,40 @@ Now visit [http://localhost:8000](http://localhost:8000) in your browser and sta ### Increasing job parallelism -The number of simultaneous jobs (getting specs, checking connections, discovering schemas, and performing syncs) is limited by a few factors. First of all, the `SUBMITTER_NUM_THREADS` (set in the `.env` file for your Kustimization overlay) provides a global limit on the number of simultaneous jobs that can run across all worker pods. +The number of simultaneous jobs \(getting specs, checking connections, discovering schemas, and performing syncs\) is limited by a few factors. First of all, the `SUBMITTER_NUM_THREADS` \(set in the `.env` file for your Kustimization overlay\) provides a global limit on the number of simultaneous jobs that can run across all worker pods. -The number of worker pods can be changed by increasing the number of replicas for the `airbyte-worker` deployment. An example of a Kustomization patch that increases this number can be seen in `airbyte/kube/overlays/dev-integration-test/kustomization.yaml` and `airbyte/kube/overlays/dev-integration-test/parallelize-worker.yaml`. The number of simultaneous jobs on a specific worker pod is also limited by the number of ports exposed by the worker deployment and set by `TEMPORAL_WORKER_PORTS` in your `.env` file. 
Without additional ports used to communicate to connector pods, jobs will start to run but will hang until ports become available. +The number of worker pods can be changed by increasing the number of replicas for the `airbyte-worker` deployment. An example of a Kustomization patch that increases this number can be seen in `airbyte/kube/overlays/dev-integration-test/kustomization.yaml` and `airbyte/kube/overlays/dev-integration-test/parallelize-worker.yaml`. The number of simultaneous jobs on a specific worker pod is also limited by the number of ports exposed by the worker deployment and set by `TEMPORAL_WORKER_PORTS` in your `.env` file. Without additional ports used to communicate to connector pods, jobs will start to run but will hang until ports become available. -You can also tune environment variables for the max simultaneous job types that can run on the worker pod by setting `MAX_SPEC_WORKERS`, `MAX_CHECK_WORKERS`, `MAX_DISCOVER_WORKERS`, `MAX_SYNC_WORKERS` for the worker pod deployment (not in the `.env` file). These values can be used if you want to create separate worker deployments for separate types of workers with different resource allocations. +You can also tune environment variables for the max simultaneous job types that can run on the worker pod by setting `MAX_SPEC_WORKERS`, `MAX_CHECK_WORKERS`, `MAX_DISCOVER_WORKERS`, `MAX_SYNC_WORKERS` for the worker pod deployment \(not in the `.env` file\). These values can be used if you want to create separate worker deployments for separate types of workers with different resource allocations. ### Cloud logging -Airbyte writes logs to two directories. App logs, including server and scheduler logs, are written to the `app-logging` directory. -Job logs are written to the `job-logging` directory. Both directories live at the top-level e.g., the `app-logging` directory lives at -`s3://log-bucket/app-logging` etc. These paths can change, so we recommend having a dedicated log bucket, and to not use this bucket for other -purposes. +Airbyte writes logs to two directories. App logs, including server and scheduler logs, are written to the `app-logging` directory. Job logs are written to the `job-logging` directory. Both directories live at the top-level e.g., the `app-logging` directory lives at `s3://log-bucket/app-logging` etc. These paths can change, so we recommend having a dedicated log bucket, and to not use this bucket for other purposes. -Airbyte publishes logs every minute. This means it is normal to see minute-long log delays. Each publish creates it's own log file, since Cloud -Storages do not support append operations. This also mean it is normal to see hundreds of files in your log bucket. +Airbyte publishes logs every minute. This means it is normal to see minute-long log delays. Each publish creates it's own log file, since Cloud Storages do not support append operations. This also mean it is normal to see hundreds of files in your log bucket. -Each log file is named `{yyyyMMddHH24mmss}_{podname}_{UUID}` and is not compressed. Users can view logs simply by navigating to the relevant folder and -downloading the file for the time period in question. +Each log file is named `{yyyyMMddHH24mmss}_{podname}_{UUID}` and is not compressed. Users can view logs simply by navigating to the relevant folder and downloading the file for the time period in question. -See the [Known Issues](#known-issues) section for planned logging improvements. 
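As a concrete sketch of browsing these logs, assuming an S3 log bucket and an AWS CLI profile with read access (the bucket name below is a placeholder):

```bash
# List the per-minute app log files (server and scheduler logs)
aws s3 ls s3://log-bucket/app-logging/

# Download one job log file for inspection; keys follow the
# {yyyyMMddHH24mmss}_{podname}_{UUID} naming described above
aws s3 cp s3://log-bucket/job-logging/<path-to-log-file> ./job.log
```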
+See the [Known Issues](on-kubernetes.md#known-issues) section for planned logging improvements. ### Using an external DB -After [Issue #3605](https://github.com/airbytehq/airbyte/issues/3605) is completed, users will be able to configure custom dbs instead of a simple -`postgres` container running directly in Kubernetes. This separate instance (preferable on a system like AWS RDS or Google Cloud SQL) should be easier -and safer to maintain than Postgres on your cluster. +After [Issue \#3605](https://github.com/airbytehq/airbyte/issues/3605) is completed, users will be able to configure custom dbs instead of a simple `postgres` container running directly in Kubernetes. This separate instance \(preferable on a system like AWS RDS or Google Cloud SQL\) should be easier and safer to maintain than Postgres on your cluster. ## Known Issues -As we improve our Kubernetes offering, we would like to point out some common pain points. We are working on improving these. Please let us know if -there are any other issues blocking your adoption of Airbyte or if you would like to contribute fixes to address any of these issues. +As we improve our Kubernetes offering, we would like to point out some common pain points. We are working on improving these. Please let us know if there are any other issues blocking your adoption of Airbyte or if you would like to contribute fixes to address any of these issues. -* Some UI operations have higher latency on Kubernetes than Docker-Compose. ([#4233](https://github.com/airbytehq/airbyte/issues/4233)) -* Logging to Azure Storage is not supported. ([#4200](https://github.com/airbytehq/airbyte/issues/4200)) -* Large log files might take a while to load. ([#4201](https://github.com/airbytehq/airbyte/issues/4201)) -* UI does not include configured buckets in the displayed log path. ([#4204](https://github.com/airbytehq/airbyte/issues/4204)) -* Logs are not reset when Airbyte is re-deployed. ([#4235](https://github.com/airbytehq/airbyte/issues/4235)) +* Some UI operations have higher latency on Kubernetes than Docker-Compose. \([\#4233](https://github.com/airbytehq/airbyte/issues/4233)\) +* Logging to Azure Storage is not supported. \([\#4200](https://github.com/airbytehq/airbyte/issues/4200)\) +* Large log files might take a while to load. \([\#4201](https://github.com/airbytehq/airbyte/issues/4201)\) +* UI does not include configured buckets in the displayed log path. \([\#4204](https://github.com/airbytehq/airbyte/issues/4204)\) +* Logs are not reset when Airbyte is re-deployed. \([\#4235](https://github.com/airbytehq/airbyte/issues/4235)\) * File sources reading from and file destinations writing to local mounts are not supported on Kubernetes. ## Customizing Airbyte Manifests -We use [Kustomize](https://kustomize.io/) to allow overrides for different environments. Our shared resources are in the `kube/resources` directory, -and we define overlays for each environment. We recommend creating your own overlay if you want to customize your deployments. -This overlay can live in your own VCS. +We use [Kustomize](https://kustomize.io/) to allow overrides for different environments. Our shared resources are in the `kube/resources` directory, and we define overlays for each environment. We recommend creating your own overlay if you want to customize your deployments. This overlay can live in your own VCS. 
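A minimal sketch of working with such an overlay, assuming it lives at a hypothetical `kube/overlays/custom` path:

```bash
# Preview the manifests the overlay produces (no changes are made to the cluster)
kubectl kustomize kube/overlays/custom

# Apply the customized deployment
kubectl apply -k kube/overlays/custom
```

The `kustomization.yaml` example that follows shows the kind of file such an overlay would contain.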
Example `kustomization.yaml` file: @@ -204,8 +202,7 @@ bases: ### View Raw Manifests -For a specific overlay, you can run `kubectl kustomize kube/overlays/stable` to view the manifests that Kustomize will apply to your Kubernetes cluster. -This is useful for debugging because it will show the exact resources you are defining. +For a specific overlay, you can run `kubectl kustomize kube/overlays/stable` to view the manifests that Kustomize will apply to your Kubernetes cluster. This is useful for debugging because it will show the exact resources you are defining. ### Helm Charts @@ -214,41 +211,47 @@ Check out the [Helm Chart Readme](https://github.com/airbytehq/airbyte/tree/mast ## Operator Guide ### View API Server Logs -`kubectl logs deployments/airbyte-server` to view real-time logs. Logs can also be downloaded as a text file via the Admin tab in the UI. + +`kubectl logs deployments/airbyte-server` to view real-time logs. Logs can also be downloaded as a text file via the Admin tab in the UI. ### View Scheduler or Job Logs + `kubectl logs deployments/airbyte-scheduler` to view real-time logs. Logs can also be downloaded as a text file via the Admin tab in the UI. ### Connector Container Logs -Although all logs can be accessed by viewing the scheduler logs, connector container logs may be easier to understand when isolated by accessing from -the Airbyte UI or the [Airbyte API](../api-documentation.md) for a specific job attempt. Connector pods launched by Airbyte will not relay logs directly -to Kubernetes logging. You must access these logs through Airbyte. + +Although all logs can be accessed by viewing the scheduler logs, connector container logs may be easier to understand when isolated by accessing from the Airbyte UI or the [Airbyte API](../api-documentation.md) for a specific job attempt. Connector pods launched by Airbyte will not relay logs directly to Kubernetes logging. You must access these logs through Airbyte. ### Upgrading Airbyte Kube + See [Upgrading K8s](../operator-guides/upgrading-airbyte.md). ### Resizing Volumes -To resize a volume, change the `.spec.resources.requests.storage` value. After re-applying, the mount should be extended if that operation is supported -for your type of mount. For a production deployment, it's useful to track the usage of volumes to ensure they don't run out of space. + +To resize a volume, change the `.spec.resources.requests.storage` value. After re-applying, the mount should be extended if that operation is supported for your type of mount. For a production deployment, it's useful to track the usage of volumes to ensure they don't run out of space. ### Copy Files To/From Volumes + See the documentation for [`kubectl cp`](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#cp). ### Listing Files + ```bash kubectl exec -it airbyte-scheduler-6b5747df5c-bj4fx ls /tmp/workspace/8 ``` ### Reading Files + ```bash kubectl exec -it airbyte-scheduler-6b5747df5c-bj4fx cat /tmp/workspace/8/0/logs.log ``` ### Persistent storage on GKE regional cluster -Running Airbyte on GKE regional cluster requires enabling persistent regional storage. To do so, enable [CSI driver](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/gce-pd-csi-driver) -on GKE. After enabling, add `storageClassName: standard-rwo` to the [volume-configs](../../kube/resources/volume-configs.yaml) yaml. + +Running Airbyte on GKE regional cluster requires enabling persistent regional storage. 
To do so, enable [CSI driver](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/gce-pd-csi-driver) on GKE. After enabling, add `storageClassName: standard-rwo` to the [volume-configs](https://github.com/airbytehq/airbyte/tree/86ee2ad05bccb4aca91df2fb07c412efde5ba71c/kube/resources/volume-configs.yaml) yaml. `volume-configs.yaml` example: + ```yaml apiVersion: v1 kind: PersistentVolumeClaim @@ -266,8 +269,10 @@ spec: ``` ## Troubleshooting -If you run into any problems operating Airbyte on Kubernetes, please reach out on the `#issues` channel on our [Slack](https://slack.airbyte.io/) or -[create an issue on GitHub](https://github.com/airbytehq/airbyte/issues/new?assignees=&labels=type%2Fbug&template=bug-report.md&title=). + +If you run into any problems operating Airbyte on Kubernetes, please reach out on the `#issues` channel on our [Slack](https://slack.airbyte.io/) or [create an issue on GitHub](https://github.com/airbytehq/airbyte/issues/new?assignees=&labels=type%2Fbug&template=bug-report.md&title=). ## Developing Airbyte on Kubernetes + [Read about the Kubernetes dev cycle!](https://docs.airbyte.io/contributing-to-airbyte/developing-on-kubernetes) + diff --git a/docs/deploying-airbyte/on-oci-vm.md b/docs/deploying-airbyte/on-oci-vm.md index 2e4d1672ad9..de7cdb6e7c0 100644 --- a/docs/deploying-airbyte/on-oci-vm.md +++ b/docs/deploying-airbyte/on-oci-vm.md @@ -1,25 +1,26 @@ -# Install Airbyte on Oracle Cloud Infrastructure (OCI) VM +# On Oracle Cloud Infrastructure VM Install Airbyte on Oracle Cloud Infrastructure VM running Oracle Linux 7 -## Create OCI Instance -Go to OCI Console > Compute > Instances > Create Instance +## Create OCI Instance + +Go to OCI Console > Compute > Instances > Create Instance ![](../.gitbook/assets/OCIScreen1.png) ![](../.gitbook/assets/OCIScreen2.png) - ## Whitelist Port 8000 for a CIDR range in Security List of OCI VM Subnet -Go to OCI Console > Networking > Virtual Cloud Network -Select the Subnet > Security List > Add Ingress Rules +Go to OCI Console > Networking > Virtual Cloud Network + +Select the Subnet > Security List > Add Ingress Rules ![](../.gitbook/assets/OCIScreen3.png) - ## Login to the Instance/VM with the SSH key and 'opc' user -``` + +```text chmod 600 private-key-file ssh -i private-key-file opc@oci-private-instance-ip -p 2200 @@ -37,52 +38,48 @@ sudo service docker start sudo usermod -a -G docker $USER - ### Install Docker Compose -sudo wget https://github.com/docker/compose/releases/download/1.26.2/docker-compose-$(uname -s)-$(uname -m) -O /usr/local/bin/docker-compose +sudo wget [https://github.com/docker/compose/releases/download/1.26.2/docker-compose-$\(uname](https://github.com/docker/compose/releases/download/1.26.2/docker-compose-$%28uname) -s\)-$\(uname -m\) -O /usr/local/bin/docker-compose sudo chmod +x /usr/local/bin/docker-compose docker-compose --version - ### Install Airbyte mkdir airbyte && cd airbyte -wget https://raw.githubusercontent.com/airbytehq/airbyte/master/{.env,docker-compose.yaml} +wget [https://raw.githubusercontent.com/airbytehq/airbyte/master/{.env,docker-compose.yaml}](https://raw.githubusercontent.com/airbytehq/airbyte/master/{.env,docker-compose.yaml}) -which docker-compose +which docker-compose sudo /usr/local/bin/docker-compose up -d - - ## Create SSH Tunnel to Login to the Instance -it is highly recommended to not have a Public IP for the Instance where you are running Airbyte). +it is highly recommended to not have a Public IP for the Instance where you are running Airbyte\). 
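Note that the install commands above lose their formatting in this patch (the `wget` URLs are turned into Markdown links). For reference, a consolidated sketch of those same steps as they would be run on the VM, using the version and URLs given above:

```bash
# Install Docker Compose (version pinned as in the steps above)
sudo wget "https://github.com/docker/compose/releases/download/1.26.2/docker-compose-$(uname -s)-$(uname -m)" -O /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
docker-compose --version

# Fetch the Airbyte compose files and start Airbyte
mkdir airbyte && cd airbyte
wget https://raw.githubusercontent.com/airbytehq/airbyte/master/{.env,docker-compose.yaml}
sudo /usr/local/bin/docker-compose up -d
```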
### SSH Local Port Forward to Airbyte VM From your local workstation -``` +```text ssh opc@bastion-host-public-ip -i -L 2200:oci-private-instance-ip:22 ssh opc@localhost -i -p 2200 ``` ### Airbyte GUI Local Port Forward to Airbyte VM -``` +```text ssh opc@bastion-host-public-ip -i -L 8000:oci-private-instance-ip:8000 ``` - ## Access Airbyte -Open URL in Browser : http://localhost:8000/ +Open URL in Browser : [http://localhost:8000/](http://localhost:8000/) ![](../.gitbook/assets/OCIScreen4.png) -/* Please note Airbyte currently does not support SSL/TLS certificates */ +/ _Please note Airbyte currently does not support SSL/TLS certificates_ / + diff --git a/docs/integrations/README.md b/docs/integrations/README.md index c160f916667..0ce0f7212f3 100644 --- a/docs/integrations/README.md +++ b/docs/integrations/README.md @@ -1,137 +1,141 @@ # Connector Catalog ## Connector grades + Airbyte uses a grading system for connectors to help users understand what to expect from a connector. There are three grades, explained below: **Certified**: This connector has been proven to be robust via usage by a large number of users and extensive testing. -**Beta**: While this connector is well tested and is expected to work a majority of the time, it was released recently. There may be some unhandled edge cases but Airbyte will provide very quick turnaround for support on any issues (we'll publish our target KPIs for support turnaround very soon). All beta connectors will make their way to certified status after enough field testing. +**Beta**: While this connector is well tested and is expected to work a majority of the time, it was released recently. There may be some unhandled edge cases but Airbyte will provide very quick turnaround for support on any issues \(we'll publish our target KPIs for support turnaround very soon\). All beta connectors will make their way to certified status after enough field testing. -**Alpha**: This connector is either not sufficiently tested, has extremely limited functionality (e.g: created as an example connector), or for any other reason may not be very mature. +**Alpha**: This connector is either not sufficiently tested, has extremely limited functionality \(e.g: created as an example connector\), or for any other reason may not be very mature. 
### Sources + | Connector | Grade | -|----|----| -|[Amazon Seller Partner](./sources/amazon-seller-partner.md)| Alpha | -|[Amplitude](./sources/amplitude.md)| Beta | -|[Apify Dataset](./sources/apify-dataset.md)| Alpha | -|[Appstore](./sources/appstore.md)| Alpha | -|[Asana](./sources/asana.md) | Beta | -|[AWS CloudTrail](./sources/aws-cloudtrail.md)| Beta | -|[BambooHR](./sources/bamboo-hr.md)| Alpha | -|[Braintree](./sources/braintree.md)| Alpha | -|[BigCommerce](./sources/bigcommerce.md)| Alpha | -|[BigQuery](./sources/bigquery.md)| Beta | -|[Bing Ads](./sources/bing-ads.md)| Beta | -|[Cart](./sources/cart.md)| Beta | -|[Chargebee](./sources/chargebee.md)| Alpha | -|[ClickHouse](./sources/clickhouse.md)| Beta | -|[Close.com](./sources/close-com.md)| Beta | -|[CockroachDB](./sources/cockroachdb.md)| Beta | -|[Db2](./sources/db2.md)| Beta | -|[Dixa](./sources/dixa.md) | Alpha | -|[Drift](./sources/drift.md)| Beta | -|[Drupal](./sources/drupal.md)| Beta | -|[Exchange Rates API](./sources/exchangeratesapi.md)| Certified | -|[Facebook Marketing](./sources/facebook-marketing.md)| Beta | -|[Facebook Pages](./sources/facebook-pages.md)| Alpha | -|[Files](./sources/file.md)| Certified | -|[Freshdesk](./sources/freshdesk.md)| Certified | -|[GitHub](./sources/github.md)| Beta | -|[GitLab](./sources/gitlab.md)| Beta | -|[Google Ads](./sources/google-ads.md)| Beta | -|[Google Adwords](./sources/google-adwords.md)| Beta | -|[Google Analytics v4](./sources/google-analytics-v4.md)| Beta | -|[Google Directory](./sources/google-directory.md)| Certified | -|[Google Search Console](./sources/google-search-console.md)| Beta | -|[Google Sheets](./sources/google-sheets.md)| Certified | -|[Google Workspace Admin Reports](./sources/google-workspace-admin-reports.md)| Certified | -|[Greenhouse](./sources/greenhouse.md)| Beta | -|[Hubspot](./sources/hubspot.md)| Certified | -|[Instagram](./sources/instagram.md)| Certified | -|[Intercom](./sources/intercom.md)| Beta | -|[Iterable](./sources/iterable.md)| Beta | -|[Jira](./sources/jira.md)| Certified | -|[Klaviyo](./sources/klaviyo.md)| Beta | -|[Klaviyo](./sources/kustomer.md)| Alpha | -|[LinkedIn Ads](./sources/linkedin-ads.md)| Beta | -|[Kustomer](./sources/kustomer.md)| Alpha | -|[Lever Hiring](./sources/lever-hiring.md)| Beta | -|[Looker](./sources/looker.md)| Beta | -|[Magento](./sources/magento.md)| Beta | -|[Mailchimp](./sources/mailchimp.md)| Certified | -|[Marketo](./sources/marketo.md)| Beta | -|[Microsoft SQL Server \(MSSQL\)](./sources/mssql.md)| Certified | -|[Microsoft Dynamics AX](./sources/microsoft-dynamics-ax.md)| Beta | -|[Microsoft Dynamics Customer Engagement](./sources/microsoft-dynamics-customer-engagement.md)| Beta | -|[Microsoft Dynamics GP](./sources/microsoft-dynamics-gp.md)| Beta | -|[Microsoft Dynamics NAV](./sources/microsoft-dynamics-nav.md)| Beta | -|[Microsoft Teams](./sources/microsoft-teams.md)| Certified | -|[Mixpanel](./sources/mixpanel.md)| Beta | -|[Mongo DB](./sources/mongodb-v2.md)| Beta | -|[MySQL](./sources/mysql.md)| Certified | -|[Okta](./sources/okta.md)| Beta | -|[Oracle DB](./sources/oracle.md)| Certified | -|[Oracle PeopleSoft](./sources/oracle-peoplesoft.md)| Beta | -|[Oracle Siebel CRM](./sources/oracle-siebel-crm.md)| Beta | -|[PayPal Transaction](./sources/paypal-transaction.md)| Beta | -|[Pipedrive](./sources/pipedrive.md)| Alpha | -|[Plaid](./sources/plaid.md)| Alpha | -|[PokéAPI](./sources/pokeapi.md)| Beta | -|[Postgres](./sources/postgres.md)| Certified | -|[PostHog](./sources/posthog.md)| Beta | 
-|[PrestaShop](./sources/presta-shop.md)| Beta | -|[Quickbooks](./sources/quickbooks.md)| Beta | -|[Recharge](./sources/recharge.md)| Beta | -|[Recurly](./sources/recurly.md)| Beta | -|[Redshift](./sources/redshift.md)| Certified | -|[S3](./sources/s3.md)| Alpha | -|[Salesforce](./sources/salesforce.md)| Certified | -|[SAP Business One](./sources/sap-business-one.md)| Beta | -|[Sendgrid](./sources/sendgrid.md)| Certified | -|[Shopify](./sources/shopify.md)| Certified | -|[Short.io](./sources/shortio.md)| Beta | -|[Slack](./sources/slack.md)| Beta | -|[Spree Commerce](./sources/spree-commerce.md)| Beta | -|[Smartsheets](./sources/smartsheets.md)| Beta | -|[Snowflake](./sources/snowflake.md)| Beta | -|[Square](./sources/square.md)| Beta | -|[Stripe](./sources/stripe.md)| Certified | -|[Sugar CRM](./sources/sugar-crm.md)| Beta | -|[SurveyMonkey](./sources/surveymonkey.md)| Beta | -|[Tempo](./sources/tempo.md)| Beta | -|[Trello](./sources/trello.md)| Beta | -|[Twilio](./sources/twilio.md)| Beta | -|[US Census](./sources/us-census.md)| Alpha | -|[WooCommerce](./sources/woo-commerce.md)| Beta | -|[Wordpress](./sources/wordpress.md)| Beta | -|[Zencart](./sources/zencart.md)| Beta | -|[Zendesk Chat](./sources/zendesk-chat.md)| Certified | -|[Zendesk Sunshine](./sources/zendesk-sunshine.md)| Beta | -|[Zendesk Support](./sources/zendesk-support.md)| Certified | -|[Zendesk Talk](./sources/zendesk-talk.md)| Certified | -|[Zoom](./sources/zoom.md)| Beta | -|[Zuora](./sources/zuora.md)| Beta | +| :--- | :--- | +| [Amazon Seller Partner](sources/amazon-seller-partner.md) | Alpha | +| [Amplitude](sources/amplitude.md) | Beta | +| [Apify Dataset](sources/apify-dataset.md) | Alpha | +| [Appstore](sources/appstore.md) | Alpha | +| [Asana](sources/asana.md) | Beta | +| [AWS CloudTrail](sources/aws-cloudtrail.md) | Beta | +| [BambooHR](sources/bamboo-hr.md) | Alpha | +| [Braintree](sources/braintree.md) | Alpha | +| [BigCommerce](sources/bigcommerce.md) | Alpha | +| [BigQuery](sources/bigquery.md) | Beta | +| [Bing Ads](sources/bing-ads.md) | Beta | +| [Cart](sources/cart.md) | Beta | +| [Chargebee](sources/chargebee.md) | Alpha | +| [ClickHouse](sources/clickhouse.md) | Beta | +| [Close.com](sources/close-com.md) | Beta | +| [CockroachDB](sources/cockroachdb.md) | Beta | +| [Db2](sources/db2.md) | Beta | +| [Dixa](sources/dixa.md) | Alpha | +| [Drift](sources/drift.md) | Beta | +| [Drupal](sources/drupal.md) | Beta | +| [Exchange Rates API](sources/exchangeratesapi.md) | Certified | +| [Facebook Marketing](sources/facebook-marketing.md) | Beta | +| [Facebook Pages](sources/facebook-pages.md) | Alpha | +| [Files](sources/file.md) | Certified | +| [Freshdesk](sources/freshdesk.md) | Certified | +| [GitHub](sources/github.md) | Beta | +| [GitLab](sources/gitlab.md) | Beta | +| [Google Ads](sources/google-ads.md) | Beta | +| [Google Adwords](sources/google-adwords.md) | Beta | +| [Google Analytics v4](sources/google-analytics-v4.md) | Beta | +| [Google Directory](sources/google-directory.md) | Certified | +| [Google Search Console](sources/google-search-console.md) | Beta | +| [Google Sheets](sources/google-sheets.md) | Certified | +| [Google Workspace Admin Reports](sources/google-workspace-admin-reports.md) | Certified | +| [Greenhouse](sources/greenhouse.md) | Beta | +| [Hubspot](sources/hubspot.md) | Certified | +| [Instagram](sources/instagram.md) | Certified | +| [Intercom](sources/intercom.md) | Beta | +| [Iterable](sources/iterable.md) | Beta | +| [Jira](sources/jira.md) | Certified | +| 
[Klaviyo](sources/klaviyo.md) | Beta | +| [Klaviyo](sources/kustomer.md) | Alpha | +| [LinkedIn Ads](sources/linkedin-ads.md) | Beta | +| [Kustomer](sources/kustomer.md) | Alpha | +| [Lever Hiring](sources/lever-hiring.md) | Beta | +| [Looker](sources/looker.md) | Beta | +| [Magento](sources/magento.md) | Beta | +| [Mailchimp](sources/mailchimp.md) | Certified | +| [Marketo](sources/marketo.md) | Beta | +| [Microsoft SQL Server \(MSSQL\)](sources/mssql.md) | Certified | +| [Microsoft Dynamics AX](sources/microsoft-dynamics-ax.md) | Beta | +| [Microsoft Dynamics Customer Engagement](sources/microsoft-dynamics-customer-engagement.md) | Beta | +| [Microsoft Dynamics GP](sources/microsoft-dynamics-gp.md) | Beta | +| [Microsoft Dynamics NAV](sources/microsoft-dynamics-nav.md) | Beta | +| [Microsoft Teams](sources/microsoft-teams.md) | Certified | +| [Mixpanel](sources/mixpanel.md) | Beta | +| [Mongo DB](sources/mongodb-v2.md) | Beta | +| [MySQL](sources/mysql.md) | Certified | +| [Okta](sources/okta.md) | Beta | +| [Oracle DB](sources/oracle.md) | Certified | +| [Oracle PeopleSoft](sources/oracle-peoplesoft.md) | Beta | +| [Oracle Siebel CRM](sources/oracle-siebel-crm.md) | Beta | +| [PayPal Transaction](sources/paypal-transaction.md) | Beta | +| [Pipedrive](sources/pipedrive.md) | Alpha | +| [Plaid](sources/plaid.md) | Alpha | +| [PokéAPI](sources/pokeapi.md) | Beta | +| [Postgres](sources/postgres.md) | Certified | +| [PostHog](sources/posthog.md) | Beta | +| [PrestaShop](sources/presta-shop.md) | Beta | +| [Quickbooks](sources/quickbooks.md) | Beta | +| [Recharge](sources/recharge.md) | Beta | +| [Recurly](sources/recurly.md) | Beta | +| [Redshift](sources/redshift.md) | Certified | +| [S3](sources/s3.md) | Alpha | +| [Salesforce](sources/salesforce.md) | Certified | +| [SAP Business One](sources/sap-business-one.md) | Beta | +| [Sendgrid](sources/sendgrid.md) | Certified | +| [Shopify](sources/shopify.md) | Certified | +| [Short.io](sources/shortio.md) | Beta | +| [Slack](sources/slack.md) | Beta | +| [Spree Commerce](sources/spree-commerce.md) | Beta | +| [Smartsheets](sources/smartsheets.md) | Beta | +| [Snowflake](sources/snowflake.md) | Beta | +| [Square](sources/square.md) | Beta | +| [Stripe](sources/stripe.md) | Certified | +| [Sugar CRM](sources/sugar-crm.md) | Beta | +| [SurveyMonkey](sources/surveymonkey.md) | Beta | +| [Tempo](sources/tempo.md) | Beta | +| [Trello](sources/trello.md) | Beta | +| [Twilio](sources/twilio.md) | Beta | +| [US Census](sources/us-census.md) | Alpha | +| [WooCommerce](https://github.com/airbytehq/airbyte/tree/8d599c86a84726235c765c78db1ddd85c558bf7f/docs/integrations/sources/woo-commerce.md) | Beta | +| [Wordpress](sources/wordpress.md) | Beta | +| [Zencart](sources/zencart.md) | Beta | +| [Zendesk Chat](sources/zendesk-chat.md) | Certified | +| [Zendesk Sunshine](sources/zendesk-sunshine.md) | Beta | +| [Zendesk Support](sources/zendesk-support.md) | Certified | +| [Zendesk Talk](sources/zendesk-talk.md) | Certified | +| [Zoom](sources/zoom.md) | Beta | +| [Zuora](sources/zuora.md) | Beta | ### Destinations + | Connector | Grade | -|----|----| -|[AzureBlobStorage](./destinations/azureblobstorage.md)| Alpha | -|[BigQuery](./destinations/bigquery.md)| Certified | -|[Chargify (Keen)](./destinations/keen.md)| Alpha | -|[Databricks](./destinations/databricks.md) | Beta | -|[Google Cloud Storage (GCS)](./destinations/gcs.md)| Alpha | -|[Google Pubsub](./destinations/pubsub.md)| Alpha | -|[Kafka](./destinations/kafka.md)| Alpha | 
-|[Keen](./destinations/keen.md)| Alpha | -|[Local CSV](./destinations/local-csv.md)| Certified | -|[Local JSON](./destinations/local-json.md)| Certified | -|[MeiliSearch](./destinations/meilisearch.md)| Beta | -|[MongoDB](./destinations/mongodb.md)| Alpha | -|[MySQL](./destinations/mysql.md)| Beta | -|[Oracle](./destinations/oracle.md)| Alpha | -|[Postgres](./destinations/postgres.md)| Certified | -|[Redshift](./destinations/redshift.md)| Certified | -|[S3](./destinations/s3.md)| Certified | -|[SQL Server (MSSQL)](./destinations/mssql.md)| Alpha | -|[Snowflake](./destinations/snowflake.md)| Certified | +| :--- | :--- | +| [AzureBlobStorage](destinations/azureblobstorage.md) | Alpha | +| [BigQuery](destinations/bigquery.md) | Certified | +| [Chargify \(Keen\)]() | Alpha | +| [Databricks](destinations/databricks.md) | Beta | +| [Google Cloud Storage \(GCS\)](destinations/gcs.md) | Alpha | +| [Google Pubsub](destinations/pubsub.md) | Alpha | +| [Kafka](destinations/kafka.md) | Alpha | +| [Keen]() | Alpha | +| [Local CSV](destinations/local-csv.md) | Certified | +| [Local JSON](destinations/local-json.md) | Certified | +| [MeiliSearch](destinations/meilisearch.md) | Beta | +| [MongoDB](destinations/mongodb.md) | Alpha | +| [MySQL](destinations/mysql.md) | Beta | +| [Oracle](destinations/oracle.md) | Alpha | +| [Postgres](destinations/postgres.md) | Certified | +| [Redshift](destinations/redshift.md) | Certified | +| [S3](destinations/s3.md) | Certified | +| [SQL Server \(MSSQL\)](destinations/mssql.md) | Alpha | +| [Snowflake](destinations/snowflake.md) | Certified | + diff --git a/docs/integrations/custom-connectors.md b/docs/integrations/custom-connectors.md index 98492b5a769..ab8c133b954 100644 --- a/docs/integrations/custom-connectors.md +++ b/docs/integrations/custom-connectors.md @@ -6,13 +6,13 @@ description: Missing a connector? If you'd like to **ask for a new connector,** you can request it directly [here](https://github.com/airbytehq/airbyte/issues/new?assignees=&labels=area%2Fintegration%2C+new-integration&template=new-integration-request.md&title=). -If you'd like to build new connectors and **make them part of the pool of pre-built connectors on Airbyte,** first a big thank you. We invite you to check our [contributing guide on building connectors](../contributing-to-airbyte/README.md). +If you'd like to build new connectors and **make them part of the pool of pre-built connectors on Airbyte,** first a big thank you. We invite you to check our [contributing guide on building connectors](../contributing-to-airbyte/). If you'd like to build new connectors, or update existing ones, **for your own usage,** without contributing to the Airbyte codebase, read along. ## Developing your own connector -It's easy to code your own connectors on Airbyte. Here is a link to instruct on how to code new sources and destinations: [building new connectors](../contributing-to-airbyte/README.md) +It's easy to code your own connectors on Airbyte. Here is a link to instruct on how to code new sources and destinations: [building new connectors](../contributing-to-airbyte/) While the guides in the link above are specific to the languages used most frequently to write integrations, **Airbyte connectors can be written in any language**. Please reach out to us if you'd like help developing connectors in other languages. 
diff --git a/docs/integrations/destinations/azureblobstorage.md b/docs/integrations/destinations/azureblobstorage.md index 91f3f36a602..8c8c29b2606 100644 --- a/docs/integrations/destinations/azureblobstorage.md +++ b/docs/integrations/destinations/azureblobstorage.md @@ -1,4 +1,4 @@ -# Azure Blob Storage +# AzureBlobStorage ## Overview @@ -11,43 +11,42 @@ The Airbyte Azure Blob Storage destination allows you to sync data to Azure Blob | Feature | Support | Notes | | :--- | :---: | :--- | | Full Refresh Sync | ✅ | Warning: this mode deletes all previously synced data in the configured blob. | -| Incremental - Append Sync | ✅ | The append mode would only work for "Append blobs" blobs as per Azure limitations, more details https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction#blobs | +| Incremental - Append Sync | ✅ | The append mode would only work for "Append blobs" blobs as per Azure limitations, more details [https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction\#blobs](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction#blobs) | | Incremental - Deduped History | ❌ | As this connector does not support dbt, we don't support this sync mode on this destination. | ## Configuration | Parameter | Type | Notes | | :--- | :---: | :--- | -| Endpoint Domain Name | string | This is Azure Blob Storage endpoint domain name. Leave default value (or leave it empty if run container from command line) to use Microsoft native one. | -| Azure blob storage container (Bucket) Name | string | A name of the Azure blob storage container. If not exists - will be created automatically. If leave empty, then will be created automatically airbytecontainer+timestamp. | +| Endpoint Domain Name | string | This is Azure Blob Storage endpoint domain name. Leave default value \(or leave it empty if run container from command line\) to use Microsoft native one. | +| Azure blob storage container \(Bucket\) Name | string | A name of the Azure blob storage container. If not exists - will be created automatically. If leave empty, then will be created automatically airbytecontainer+timestamp. | | Azure Blob Storage account name | string | The account's name of the Azure Blob Storage. | | The Azure blob storage account key | string | Azure blob storage account key. Example: `abcdefghijklmnopqrstuvwxyz/0123456789+ABCDEFGHIJKLMNOPQRSTUVWXYZ/0123456789%++sampleKey==`. | | Format | object | Format specific configuration. See below for details. | ⚠️ Please note that under "Full Refresh Sync" mode, data in the configured blob will be wiped out before each sync. We recommend you to provision a dedicated Azure Blob Storage Container resource for this sync to prevent unexpected data deletion from misconfiguration. ⚠️ - ## Output Schema Each stream will be outputted to its dedicated Blob according to the configuration. The complete datastore of each stream includes all the output files under that Blob. You can think of the Blob as equivalent of a Table in the database world. -- Under Full Refresh Sync mode, old output files will be purged before new files are created. -- Under Incremental - Append Sync mode, new output files will be added that only contain the new data. +* Under Full Refresh Sync mode, old output files will be purged before new files are created. +* Under Incremental - Append Sync mode, new output files will be added that only contain the new data. 
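As a quick sanity check on what a sync wrote, the blobs in the configured container can be listed directly. The sketch below uses the `azure-storage-blob` Python package and assumes the same account, key, and container configured for the destination are at hand; all three values are placeholders, not fields taken from this connector's spec.

```python
from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

# Placeholders: reuse the account name, account key, and container name
# that were configured for the Azure Blob Storage destination.
service = BlobServiceClient(
    account_url="https://<account_name>.blob.core.windows.net",
    credential="<account_key>",
)
container = service.get_container_client("<container_name>")

# Each stream is written to its own blob; under "Incremental - Append Sync"
# new records are appended to that blob on subsequent syncs.
for blob in container.list_blobs():
    print(blob.name, blob.size)
```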
### CSV -Like most of the other Airbyte destination connectors, usually the output has three columns: a UUID, an emission timestamp, and the data blob. With the CSV output, it is possible to normalize (flatten) the data blob to multiple columns. +Like most of the other Airbyte destination connectors, usually the output has three columns: a UUID, an emission timestamp, and the data blob. With the CSV output, it is possible to normalize \(flatten\) the data blob to multiple columns. | Column | Condition | Description | | :--- | :--- | :--- | | `_airbyte_ab_id` | Always exists | A uuid assigned by Airbyte to each processed record. | | `_airbyte_emitted_at` | Always exists. | A timestamp representing when the event was pulled from the data source. | -| `_airbyte_data` | When no normalization (flattening) is needed, all data reside under this column as a json blob. | -| root level fields | When root level normalization (flattening) is selected, the root level fields are expanded. | +| `_airbyte_data` | When no normalization \(flattening\) is needed, all data reside under this column as a json blob. | | +| root level fields | When root level normalization \(flattening\) is selected, the root level fields are expanded. | | For example, given the following json object from a source: -```json +```javascript { "user_id": 123, "name": { @@ -69,11 +68,11 @@ With root level normalization, the output CSV is: | :--- | :--- | :--- | :--- | | `26d73cde-7eb1-4e1e-b7db-a4c03b4cf206` | 1622135805000 | 123 | `{ "first": "John", "last": "Doe" }` | -### JSON Lines (JSONL) +### JSON Lines \(JSONL\) [Json Lines](https://jsonlines.org/) is a text format with one JSON per line. Each line has a structure as follows: -```json +```javascript { "_airbyte_ab_id": "", "_airbyte_emitted_at": "", @@ -83,7 +82,7 @@ With root level normalization, the output CSV is: For example, given the following two json objects from a source: -```json +```javascript [ { "user_id": 123, @@ -104,7 +103,7 @@ For example, given the following two json objects from a source: They will be like this in the output file: -```jsonl +```text { "_airbyte_ab_id": "26d73cde-7eb1-4e1e-b7db-a4c03b4cf206", "_airbyte_emitted_at": "1622135805000", "_airbyte_data": { "user_id": 123, "name": { "first": "John", "last": "Doe" } } } { "_airbyte_ab_id": "0a61de1b-9cdd-4455-a739-93572c9a5f20", "_airbyte_emitted_at": "1631948170000", "_airbyte_data": { "user_id": 456, "name": { "first": "Jane", "last": "Roe" } } } ``` @@ -114,17 +113,17 @@ They will be like this in the output file: ### Requirements 1. Create an AzureBlobStorage account. -2. Check if it works under https://portal.azure.com/ -> "Storage explorer (preview)". +2. Check if it works under [https://portal.azure.com/](https://portal.azure.com/) -> "Storage explorer \(preview\)". ### Setup guide * Fill up AzureBlobStorage info * **Endpoint Domain Name** - * Leave default value (or leave it empty if run container from command line) to use Microsoft native one or use your own. + * Leave default value \(or leave it empty if run container from command line\) to use Microsoft native one or use your own. * **Azure blob storage container** * If not exists - will be created automatically. If leave empty, then will be created automatically airbytecontainer+timestamp.. * **Azure Blob Storage account name** - * See [this](https://docs.microsoft.com/en-us/azure/storage/common/storage-account-create?tabs=azure-portal) on how to create an account. 
+ * See [this](https://docs.microsoft.com/en-us/azure/storage/common/storage-account-create?tabs=azure-portal) on how to create an account. * **The Azure blob storage account key** * Corresponding key to the above user. * **Format** @@ -133,10 +132,9 @@ They will be like this in the output file: * This depends on your networking setup. * The easiest way to verify if Airbyte is able to connect to your Azure blob storage container is via the check connection tool in the UI. - - ## CHANGELOG | Version | Date | Pull Request | Subject | -| :--- | :--- | :--- | :--- | -| 0.1.0 | 2021-08-30 | [#5332](https://github.com/airbytehq/airbyte/pull/5332) | Initial release with JSONL and CSV output. | +| :--- | :--- | :--- | :--- | +| 0.1.0 | 2021-08-30 | [\#5332](https://github.com/airbytehq/airbyte/pull/5332) | Initial release with JSONL and CSV output. | + diff --git a/docs/integrations/destinations/bigquery.md b/docs/integrations/destinations/bigquery.md index 643213004bd..ba6974883af 100644 --- a/docs/integrations/destinations/bigquery.md +++ b/docs/integrations/destinations/bigquery.md @@ -13,7 +13,7 @@ description: >- | Full Refresh Sync | Yes | | | Incremental - Append Sync | Yes | | | Incremental - Deduped History | Yes | | -| Bulk loading | Yes | | +| Bulk loading | Yes | | | Namespaces | Yes | | There are two flavors of connectors for this destination: @@ -30,10 +30,10 @@ Check out common troubleshooting issues for the BigQuery destination connector o Each stream will be output into its own table in BigQuery. Each table will contain 3 columns: * `_airbyte_ab_id`: a uuid assigned by Airbyte to each event that is processed. The column type in BigQuery is `String`. -* `_airbyte_emitted_at`: a timestamp representing when the event was pulled from the data source. The column type in BigQuery is `String`. Due to a Google [limitations](https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv#data_types) for data migration from GCs to BigQuery by its native job - the timestamp (seconds from 1970' can't be used). Only date format, so only String is accepted for us in this case. +* `_airbyte_emitted_at`: a timestamp representing when the event was pulled from the data source. The column type in BigQuery is `String`. Due to a Google [limitations](https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv#data_types) for data migration from GCs to BigQuery by its native job - the timestamp \(seconds from 1970' can't be used\). Only date format, so only String is accepted for us in this case. * `_airbyte_data`: a json blob representing with the event data. The column type in BigQuery is `String`. -## Getting Started (Airbyte Open-Source / Airbyte Cloud) +## Getting Started \(Airbyte Open-Source / Airbyte Cloud\) #### Requirements @@ -45,6 +45,7 @@ To use the BigQuery destination, you'll need: * A Service Account Key to authenticate into your Service Account For GCS Staging upload mode: + * GCS role enabled for same user as used for biqquery * HMAC key obtained for user. Currently, only the [HMAC key](https://cloud.google.com/storage/docs/authentication/hmackeys) is supported. More credential types will be added in the future. @@ -88,39 +89,42 @@ You should now have all the requirements needed to configure BigQuery as a desti * **Dataset Location** * **Dataset ID**: the name of the schema where the tables will be created. 
* **Service Account Key**: the contents of your Service Account Key JSON file -* **Google BigQuery client chunk size**: Google BigQuery client's chunk(buffer) size (MIN=1, MAX = 15) for each table. The default 15MiB value is used if not set explicitly. It's recommended to decrease value for big data sets migration for less HEAP memory consumption and avoiding crashes. For more details refer to https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.client.Client.html +* **Google BigQuery client chunk size**: Google BigQuery client's chunk\(buffer\) size \(MIN=1, MAX = 15\) for each table. The default 15MiB value is used if not set explicitly. It's recommended to decrease value for big data sets migration for less HEAP memory consumption and avoiding crashes. For more details refer to [https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.client.Client.html](https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.client.Client.html) Once you've configured BigQuery as a destination, delete the Service Account Key from your computer. #### Uploading Options + There are 2 available options to upload data to BigQuery `Standard` and `GCS Staging`. -- `Standard` is option to upload data directly from your source to BigQuery storage. This way is faster and requires less resources than GCS one. + +* `Standard` is option to upload data directly from your source to BigQuery storage. This way is faster and requires less resources than GCS one. + Please be aware you may see some fails for big datasets and slow sources, i.e. if reading from source takes more than 10-12 hours. - This is caused by the Google BigQuery SDK client limitations. For more details please check https://github.com/airbytehq/airbyte/issues/3549 -- `GCS Uploading (CSV format)`: This approach has been implemented in order to avoid the issue for big datasets mentioned above. + + This is caused by the Google BigQuery SDK client limitations. For more details please check [https://github.com/airbytehq/airbyte/issues/3549](https://github.com/airbytehq/airbyte/issues/3549) + +* `GCS Uploading (CSV format)`: This approach has been implemented in order to avoid the issue for big datasets mentioned above. + At the first step all data is uploaded to GCS bucket and then all moved to BigQuery at one shot stream by stream. - The [destination-gcs connector](./gcs.md) is partially used under the hood here, so you may check its documentation for more details. + + The [destination-gcs connector](gcs.md) is partially used under the hood here, so you may check its documentation for more details. For the GCS Staging upload type additional params must be configured: - * **GCS Bucket Name** - * **GCS Bucket Path** - * **GCS Bucket Keep files after migration** - * See [this](https://cloud.google.com/storage/docs/creating-buckets) to create an S3 bucket. - * **HMAC Key Access ID** - * See [this](https://cloud.google.com/storage/docs/authentication/hmackeys) on how to generate an access key. - * We recommend creating an Airbyte-specific user or service account. This user or account will require read and write permissions to objects in the bucket. - * **Secret Access Key** - * Corresponding key to the above access ID. -* Make sure your GCS bucket is accessible from the machine running Airbyte. - * This depends on your networking setup. - * The easiest way to verify if Airbyte is able to connect to your GCS bucket is via the check connection tool in the UI. 
+* **GCS Bucket Name** +* **GCS Bucket Path** +* **GCS Bucket Keep files after migration** + * See [this](https://cloud.google.com/storage/docs/creating-buckets) to create an S3 bucket. +* **HMAC Key Access ID** + * See [this](https://cloud.google.com/storage/docs/authentication/hmackeys) on how to generate an access key. + * We recommend creating an Airbyte-specific user or service account. This user or account will require read and write permissions to objects in the bucket. +* **Secret Access Key** + * Corresponding key to the above access ID. + * Make sure your GCS bucket is accessible from the machine running Airbyte. +* This depends on your networking setup. +* The easiest way to verify if Airbyte is able to connect to your GCS bucket is via the check connection tool in the UI. - -Note: -It partially re-uses the destination-gcs connector under the hood. So you may also refer to its guide for additional clarifications. -**GCS Region** for GCS would be used the same as set for BigQuery -**Format** - Gcs format is set to CSV +Note: It partially re-uses the destination-gcs connector under the hood. So you may also refer to its guide for additional clarifications. **GCS Region** for GCS would be used the same as set for BigQuery **Format** - Gcs format is set to CSV ## Naming Conventions @@ -143,24 +147,25 @@ Therefore, Airbyte BigQuery destination will convert any invalid characters into ### bigquery | Version | Date | Pull Request | Subject | -| :--- | :--- | :--- | :--- | -| 0.4.0 | 2021-10-04 | [#6733](https://github.com/airbytehq/airbyte/issues/6733) | Support dataset starting with numbers | -| 0.4.0 | 2021-08-26 | [#5296](https://github.com/airbytehq/airbyte/issues/5296) | Added GCS Staging uploading option | -| 0.3.12 | 2021-08-03 | [#3549](https://github.com/airbytehq/airbyte/issues/3549) | Add optional arg to make a possibility to change the BigQuery client's chunk\buffer size | -| 0.3.11 | 2021-07-30 | [#5125](https://github.com/airbytehq/airbyte/pull/5125) | Enable `additionalPropertities` in spec.json | -| 0.3.10 | 2021-07-28 | [#3549](https://github.com/airbytehq/airbyte/issues/3549) | Add extended logs and made JobId filled with region and projectId | -| 0.3.9 | 2021-07-28 | [#5026](https://github.com/airbytehq/airbyte/pull/5026) | Add sanitized json fields in raw tables to handle quotes in column names | -| 0.3.6 | 2021-06-18 | [#3947](https://github.com/airbytehq/airbyte/issues/3947) | Service account credentials are now optional. 
| -| 0.3.4 | 2021-06-07 | [#3277](https://github.com/airbytehq/airbyte/issues/3277) | Add dataset location option | +| :--- | :--- | :--- | :--- | +| 0.4.0 | 2021-10-04 | [\#6733](https://github.com/airbytehq/airbyte/issues/6733) | Support dataset starting with numbers | +| 0.4.0 | 2021-08-26 | [\#5296](https://github.com/airbytehq/airbyte/issues/5296) | Added GCS Staging uploading option | +| 0.3.12 | 2021-08-03 | [\#3549](https://github.com/airbytehq/airbyte/issues/3549) | Add optional arg to make a possibility to change the BigQuery client's chunk\buffer size | +| 0.3.11 | 2021-07-30 | [\#5125](https://github.com/airbytehq/airbyte/pull/5125) | Enable `additionalPropertities` in spec.json | +| 0.3.10 | 2021-07-28 | [\#3549](https://github.com/airbytehq/airbyte/issues/3549) | Add extended logs and made JobId filled with region and projectId | +| 0.3.9 | 2021-07-28 | [\#5026](https://github.com/airbytehq/airbyte/pull/5026) | Add sanitized json fields in raw tables to handle quotes in column names | +| 0.3.6 | 2021-06-18 | [\#3947](https://github.com/airbytehq/airbyte/issues/3947) | Service account credentials are now optional. | +| 0.3.4 | 2021-06-07 | [\#3277](https://github.com/airbytehq/airbyte/issues/3277) | Add dataset location option | ### bigquery-denormalized | Version | Date | Pull Request | Subject | -| :--- | :--- | :--- | :--- | -| 0.1.6 | 2021-09-16 | [#6145](https://github.com/airbytehq/airbyte/pull/6145) | BigQuery Denormalized support for date, datetime & timestamp types through the json "format" key -| 0.1.5 | 2021-09-07 | [#5881](https://github.com/airbytehq/airbyte/pull/5881) | BigQuery Denormalized NPE fix -| 0.1.4 | 2021-09-04 | [#5813](https://github.com/airbytehq/airbyte/pull/5813) | fix Stackoverflow error when receive a schema from source where "Array" type doesn't contain a required "items" element | -| 0.1.3 | 2021-08-07 | [#5261](https://github.com/airbytehq/airbyte/pull/5261) | 🐛 Destination BigQuery(Denormalized): Fix processing arrays of records | -| 0.1.2 | 2021-07-30 | [#5125](https://github.com/airbytehq/airbyte/pull/5125) | Enable `additionalPropertities` in spec.json | -| 0.1.1 | 2021-06-21 | [#3555](https://github.com/airbytehq/airbyte/pull/3555) | Partial Success in BufferedStreamConsumer | -| 0.1.0 | 2021-06-21 | [#4176](https://github.com/airbytehq/airbyte/pull/4176) | Destination using Typed Struct and Repeated fields | +| :--- | :--- | :--- | :--- | +| 0.1.6 | 2021-09-16 | [\#6145](https://github.com/airbytehq/airbyte/pull/6145) | BigQuery Denormalized support for date, datetime & timestamp types through the json "format" key | +| 0.1.5 | 2021-09-07 | [\#5881](https://github.com/airbytehq/airbyte/pull/5881) | BigQuery Denormalized NPE fix | +| 0.1.4 | 2021-09-04 | [\#5813](https://github.com/airbytehq/airbyte/pull/5813) | fix Stackoverflow error when receive a schema from source where "Array" type doesn't contain a required "items" element | +| 0.1.3 | 2021-08-07 | [\#5261](https://github.com/airbytehq/airbyte/pull/5261) | 🐛 Destination BigQuery\(Denormalized\): Fix processing arrays of records | +| 0.1.2 | 2021-07-30 | [\#5125](https://github.com/airbytehq/airbyte/pull/5125) | Enable `additionalPropertities` in spec.json | +| 0.1.1 | 2021-06-21 | [\#3555](https://github.com/airbytehq/airbyte/pull/3555) | Partial Success in BufferedStreamConsumer | +| 0.1.0 | 2021-06-21 | [\#4176](https://github.com/airbytehq/airbyte/pull/4176) | Destination using Typed Struct and Repeated fields | + diff --git a/docs/integrations/destinations/databricks.md 
b/docs/integrations/destinations/databricks.md index 7bb74c5168a..b10eb2254db 100644 --- a/docs/integrations/destinations/databricks.md +++ b/docs/integrations/destinations/databricks.md @@ -13,11 +13,12 @@ Due to legal reasons, this is currently a private connector that is only availab | Feature | Support | Notes | | :--- | :---: | :--- | | Full Refresh Sync | ✅ | Warning: this mode deletes all previously synced data in the configured bucket path. | -| Incremental - Append Sync | ✅ | | -| Incremental - Deduped History | ❌ | | -| Namespaces | ✅ | | +| Incremental - Append Sync | ✅ | | +| Incremental - Deduped History | ❌ | | +| Namespaces | ✅ | | ## Data Source + Databricks supports various cloud storage as the [data source](https://docs.databricks.com/data/data-sources/index.html). Currently, only Amazon S3 is supported. ## Configuration @@ -25,16 +26,16 @@ Databricks supports various cloud storage as the [data source](https://docs.data | Category | Parameter | Type | Notes | | :--- | :--- | :---: | :--- | | Databricks | Server Hostname | string | Required. See [documentation](https://docs.databricks.com/integrations/bi/jdbc-odbc-bi.html#get-server-hostname-port-http-path-and-jdbc-url). | -| | HTTP Path | string | Required. See [documentation](https://docs.databricks.com/integrations/bi/jdbc-odbc-bi.html#get-server-hostname-port-http-path-and-jdbc-url). | -| | Port | string | Optional. Default to "443". See [documentation](https://docs.databricks.com/integrations/bi/jdbc-odbc-bi.html#get-server-hostname-port-http-path-and-jdbc-url). | -| | Personal Access Token | string | Required. See [documentation](https://docs.databricks.com/sql/user/security/personal-access-tokens.html). | +| | HTTP Path | string | Required. See [documentation](https://docs.databricks.com/integrations/bi/jdbc-odbc-bi.html#get-server-hostname-port-http-path-and-jdbc-url). | +| | Port | string | Optional. Default to "443". See [documentation](https://docs.databricks.com/integrations/bi/jdbc-odbc-bi.html#get-server-hostname-port-http-path-and-jdbc-url). | +| | Personal Access Token | string | Required. See [documentation](https://docs.databricks.com/sql/user/security/personal-access-tokens.html). | | General | Database schema | string | Optional. Default to "public". Each data stream will be written to a table under this database schema. | -| | Purge Staging Data | boolean | The connector creates staging files and tables on S3. By default they will be purged when the data sync is complete. Set it to `false` for debugging purpose. | +| | Purge Staging Data | boolean | The connector creates staging files and tables on S3. By default they will be purged when the data sync is complete. Set it to `false` for debugging purpose. | | Data Source - S3 | Bucket Name | string | Name of the bucket to sync data into. | -| | Bucket Path | string | Subdirectory under the above bucket to sync the data into. | -| | Region | string | See [documentation](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-available-regions) for all region codes. | -| | Access Key ID | string | AWS/Minio credential. | -| | Secret Access Key | string | AWS/Minio credential. | +| | Bucket Path | string | Subdirectory under the above bucket to sync the data into. | +| | Region | string | See [documentation](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-available-regions) for all region codes. | +| | Access Key ID | string | AWS/Minio credential. 
| +| | Secret Access Key | string | AWS/Minio credential. | ⚠️ Please note that under "Full Refresh Sync" mode, data in the configured bucket and path will be wiped out before each sync. We recommend you to provision a dedicated S3 resource for this sync to prevent unexpected data deletion from misconfiguration. ⚠️ @@ -42,13 +43,13 @@ Databricks supports various cloud storage as the [data source](https://docs.data Data streams are first written as staging Parquet files on S3, and then loaded into Databricks tables. All the staging files will be deleted after the sync is done. For debugging purposes, here is the full path for a staging file: -``` +```text s3:///// ``` For example: -``` +```text s3://testing_bucket/data_output_path/98c450be-5b1c-422d-b8b5-6ca9903727d9/users ↑ ↑ ↑ ↑ | | | stream name @@ -57,18 +58,17 @@ s3://testing_bucket/data_output_path/98c450be-5b1c-422d-b8b5-6ca9903727d9/users bucket name ``` - ## Unmanaged Spark SQL Table Currently, all streams are synced into unmanaged Spark SQL tables. See [documentation](https://docs.databricks.com/data/tables.html#managed-and-unmanaged-tables) for details. In summary, you have full control of the location of the data underlying an unmanaged table. The full path of each data stream is: -``` +```text s3:///// ``` For example: -``` +```text s3://testing_bucket/data_output_path/public/users ↑ ↑ ↑ ↑ | | | stream name @@ -97,11 +97,12 @@ Learn how source data is converted to Parquet and the current limitations [here] 1. Credentials for a Databricks cluster. See [documentation](https://docs.databricks.com/clusters/create.html). 2. Credentials for an S3 bucket. See [documentation](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys). -3. Grant the Databricks cluster full access to the S3 bucket. Or mount it as Databricks File System (DBFS). See [documentation](https://docs.databricks.com/data/data-sources/aws/amazon-s3.html). +3. Grant the Databricks cluster full access to the S3 bucket. Or mount it as Databricks File System \(DBFS\). See [documentation](https://docs.databricks.com/data/data-sources/aws/amazon-s3.html). ## CHANGELOG | Version | Date | Pull Request | Subject | -| :--- | :--- | :--- | :--- | -| 0.1.1 | 2021-10-05 | [#6792](https://github.com/airbytehq/airbyte/pull/6792) | Require users to accept Databricks JDBC Driver [Terms & Conditions](https://databricks.com/jdbc-odbc-driver-license). | -| 0.1.0 | 2021-09-14 | [#5998](https://github.com/airbytehq/airbyte/pull/5998) | Initial private release. | +| :--- | :--- | :--- | :--- | +| 0.1.1 | 2021-10-05 | [\#6792](https://github.com/airbytehq/airbyte/pull/6792) | Require users to accept Databricks JDBC Driver [Terms & Conditions](https://databricks.com/jdbc-odbc-driver-license). | +| 0.1.0 | 2021-09-14 | [\#5998](https://github.com/airbytehq/airbyte/pull/5998) | Initial private release. | + diff --git a/docs/integrations/destinations/dynamodb.md b/docs/integrations/destinations/dynamodb.md index ebaf106ccd2..5e01c7e43a2 100644 --- a/docs/integrations/destinations/dynamodb.md +++ b/docs/integrations/destinations/dynamodb.md @@ -1,4 +1,4 @@ -# Dynamodb +# DynamoDB This destination writes data to AWS DynamoDB. @@ -20,9 +20,9 @@ Each stream will be output into its own DynamoDB table. Each table will a collec | Feature | Support | Notes | | :--- | :---: | :--- | | Full Refresh Sync | ✅ | Warning: this mode deletes all previously synced data in the configured DynamoDB table. 
| -| Incremental - Append Sync | ✅ | | +| Incremental - Append Sync | ✅ | | | Incremental - Deduped History | ❌ | As this connector does not support dbt, we don't support this sync mode on this destination. | -| Namespaces | ✅ | Namespace will be used as part of the table name. | +| Namespaces | ✅ | Namespace will be used as part of the table name. | ### Performance considerations @@ -38,24 +38,25 @@ This connector by default uses 10 capacity units for both Read and Write in Dyna ### Setup guide * Fill up DynamoDB info - * **DynamoDB Endpoint** - * Leave empty if using AWS DynamoDB, fill in endpoint URL if using customized endpoint. - * **DynamoDB Table Name** - * The name prefix of the DynamoDB table to store the extracted data. The table name is \\_\\_\. - * **DynamoDB Region** - * The region of the DynamoDB. - * **Access Key Id** - * See [this](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys) on how to generate an access key. - * We recommend creating an Airbyte-specific user. This user will require [read and write permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_examples_dynamodb_specific-table.html) to the DynamoDB table. - * **Secret Access Key** - * Corresponding key to the above key id. + * **DynamoDB Endpoint** + * Leave empty if using AWS DynamoDB, fill in endpoint URL if using customized endpoint. + * **DynamoDB Table Name** + * The name prefix of the DynamoDB table to store the extracted data. The table name is \\_\\_\. + * **DynamoDB Region** + * The region of the DynamoDB. + * **Access Key Id** + * See [this](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys) on how to generate an access key. + * We recommend creating an Airbyte-specific user. This user will require [read and write permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_examples_dynamodb_specific-table.html) to the DynamoDB table. + * **Secret Access Key** + * Corresponding key to the above key id. * Make sure your DynamoDB tables are accessible from the machine running Airbyte. - * This depends on your networking setup. - * You can check AWS DynamoDB documentation with a tutorial on how to properly configure your DynamoDB's access [here](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/access-control-overview.html). - * The easiest way to verify if Airbyte is able to connect to your DynamoDB tables is via the check connection tool in the UI. + * This depends on your networking setup. + * You can check AWS DynamoDB documentation with a tutorial on how to properly configure your DynamoDB's access [here](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/access-control-overview.html). + * The easiest way to verify if Airbyte is able to connect to your DynamoDB tables is via the check connection tool in the UI. ## CHANGELOG | Version | Date | Pull Request | Subject | -| :--- | :--- | :--- | :--- | -| 0.1.0 | 2021-08-20 | [#5561](https://github.com/airbytehq/airbyte/pull/5561) | Initial release. | +| :--- | :--- | :--- | :--- | +| 0.1.0 | 2021-08-20 | [\#5561](https://github.com/airbytehq/airbyte/pull/5561) | Initial release. 
| + diff --git a/docs/integrations/destinations/gcs.md b/docs/integrations/destinations/gcs.md index d7f054e2a36..d0f4806f40c 100644 --- a/docs/integrations/destinations/gcs.md +++ b/docs/integrations/destinations/gcs.md @@ -1,4 +1,4 @@ -# Google Cloud Storage +# Google Cloud Storage \(GCS\) ## Overview @@ -11,7 +11,7 @@ The Airbyte GCS destination allows you to sync data to cloud storage buckets. Ea | Feature | Support | Notes | | :--- | :---: | :--- | | Full Refresh Sync | ✅ | Warning: this mode deletes all previously synced data in the configured bucket path. | -| Incremental - Append Sync | ✅ | | +| Incremental - Append Sync | ✅ | | | Incremental - Deduped History | ❌ | As this connector does not support dbt, we don't support this sync mode on this destination. | | Namespaces | ❌ | Setting a specific bucket path is equivalent to having separate namespaces. | @@ -25,7 +25,7 @@ The Airbyte GCS destination allows you to sync data to cloud storage buckets. Ea | HMAC Key Access ID | string | HMAC key access ID . The access ID for the GCS bucket. When linked to a service account, this ID is 61 characters long; when linked to a user account, it is 24 characters long. See [HMAC key](https://cloud.google.com/storage/docs/authentication/hmackeys) for details. | | HMAC Key Secret | string | The corresponding secret for the access ID. It is a 40-character base-64 encoded string. | | Format | object | Format specific configuration. See below for details. | -| Part Size | integer | Arg to configure a block size. Max allowed blocks by GCS = 10,000, i.e. max stream size = blockSize * 10,000 blocks. | +| Part Size | integer | Arg to configure a block size. Max allowed blocks by GCS = 10,000, i.e. max stream size = blockSize \* 10,000 blocks. | Currently, only the [HMAC key](https://cloud.google.com/storage/docs/authentication/hmackeys) is supported. More credential types will be added in the future. @@ -33,13 +33,13 @@ Currently, only the [HMAC key](https://cloud.google.com/storage/docs/authenticat The full path of the output data is: -``` +```text ///--. ``` For example: -``` +```text testing_bucket/data_output_path/public/users/2021_01_01_1609541171643_0.csv ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ | | | | | | | format extension @@ -54,10 +54,7 @@ bucket name Please note that the stream name may contain a prefix, if it is configured on the connection. -The rationales behind this naming pattern are: -1. Each stream has its own directory. -2. The data output files can be sorted by upload time. -3. The upload time composes of a date part and millis part so that it is both readable and unique. +The rationales behind this naming pattern are: 1. Each stream has its own directory. 2. The data output files can be sorted by upload time. 3. The upload time composes of a date part and millis part so that it is both readable and unique. Currently, each data sync will only create one file per stream. In the future, the output file can be partitioned by size. Each partition is identifiable by the partition ID, which is always 0 for now. @@ -65,8 +62,8 @@ Currently, each data sync will only create one file per stream. In the future, t Each stream will be outputted to its dedicated directory according to the configuration. The complete datastore of each stream includes all the output files under that directory. You can think of the directory as equivalent of a Table in the database world. -- Under Full Refresh Sync mode, old output files will be purged before new files are created. 
-- Under Incremental - Append Sync mode, new output files will be added that only contain the new data. +* Under Full Refresh Sync mode, old output files will be purged before new files are created. +* Under Incremental - Append Sync mode, new output files will be added that only contain the new data. ### Avro @@ -76,28 +73,28 @@ Each stream will be outputted to its dedicated directory according to the config Here is the available compression codecs: -- No compression -- `deflate` - - Compression level - - Range `[0, 9]`. Default to 0. - - Level 0: no compression & fastest. - - Level 9: best compression & slowest. -- `bzip2` -- `xz` - - Compression level - - Range `[0, 9]`. Default to 6. - - Level 0-3 are fast with medium compression. - - Level 4-6 are fairly slow with high compression. - - Level 7-9 are like level 6 but use bigger dictionaries and have higher memory requirements. Unless the uncompressed size of the file exceeds 8 MiB, 16 MiB, or 32 MiB, it is waste of memory to use the presets 7, 8, or 9, respectively. -- `zstandard` - - Compression level - - Range `[-5, 22]`. Default to 3. - - Negative levels are 'fast' modes akin to `lz4` or `snappy`. - - Levels above 9 are generally for archival purposes. - - Levels above 18 use a lot of memory. - - Include checksum - - If set to `true`, a checksum will be included in each data block. -- `snappy` +* No compression +* `deflate` + * Compression level + * Range `[0, 9]`. Default to 0. + * Level 0: no compression & fastest. + * Level 9: best compression & slowest. +* `bzip2` +* `xz` + * Compression level + * Range `[0, 9]`. Default to 6. + * Level 0-3 are fast with medium compression. + * Level 4-6 are fairly slow with high compression. + * Level 7-9 are like level 6 but use bigger dictionaries and have higher memory requirements. Unless the uncompressed size of the file exceeds 8 MiB, 16 MiB, or 32 MiB, it is waste of memory to use the presets 7, 8, or 9, respectively. +* `zstandard` + * Compression level + * Range `[-5, 22]`. Default to 3. + * Negative levels are 'fast' modes akin to `lz4` or `snappy`. + * Levels above 9 are generally for archival purposes. + * Levels above 18 use a lot of memory. + * Include checksum + * If set to `true`, a checksum will be included in each data block. +* `snappy` #### Data schema @@ -106,7 +103,7 @@ Under the hood, an Airbyte data stream in Json schema is converted to an Avro sc 1. Json schema types are mapped to Avro typea as follows: | Json Data Type | Avro Data Type | - | :---: | :---: | +| :---: | :---: | | string | string | | number | double | | integer | int | @@ -115,32 +112,33 @@ Under the hood, an Airbyte data stream in Json schema is converted to an Avro sc | object | record | | array | array | -2. Built-in Json schema formats are not mapped to Avro logical types at this moment. -2. Combined restrictions ("allOf", "anyOf", and "oneOf") will be converted to type unions. The corresponding Avro schema can be less stringent. For example, the following Json schema +1. Built-in Json schema formats are not mapped to Avro logical types at this moment. +2. Combined restrictions \("allOf", "anyOf", and "oneOf"\) will be converted to type unions. The corresponding Avro schema can be less stringent. 
For example, the following Json schema - ```json - { + ```javascript + { "oneOf": [ { "type": "string" }, { "type": "integer" } ] - } - ``` -will become this in Avro schema: + } + ``` - ```json - { + will become this in Avro schema: + + ```javascript + { "type": ["null", "string", "int"] - } - ``` + } + ``` -2. Keyword `not` is not supported, as there is no equivalent validation mechanism in Avro schema. -3. Only alphanumeric characters and underscores (`/a-zA-Z0-9_/`) are allowed in a stream or field name. Any special character will be converted to an alphabet or underscore. For example, `spécial:character_names` will become `special_character_names`. The original names will be stored in the `doc` property in this format: `_airbyte_original_name:`. -4. All field will be nullable. For example, a `string` Json field will be typed as `["null", "string"]` in Avro. This is necessary because the incoming data stream may have optional fields. -5. For array fields in Json schema, when the `items` property is an array, it means that each element in the array should follow its own schema sequentially. For example, the following specification means the first item in the array should be a string, and the second a number. +3. Keyword `not` is not supported, as there is no equivalent validation mechanism in Avro schema. +4. Only alphanumeric characters and underscores \(`/a-zA-Z0-9_/`\) are allowed in a stream or field name. Any special character will be converted to an alphabet or underscore. For example, `spécial:character_names` will become `special_character_names`. The original names will be stored in the `doc` property in this format: `_airbyte_original_name:`. +5. All field will be nullable. For example, a `string` Json field will be typed as `["null", "string"]` in Avro. This is necessary because the incoming data stream may have optional fields. +6. For array fields in Json schema, when the `items` property is an array, it means that each element in the array should follow its own schema sequentially. For example, the following specification means the first item in the array should be a string, and the second a number. - ```json - { + ```javascript + { "array_field": { "type": "array", "items": [ @@ -148,12 +146,12 @@ will become this in Avro schema: { "type": "number" } ] } - } - ``` + } + ``` -This is not supported in Avro schema. As a compromise, the converter creates a union, ["string", "number"], which is less stringent: +This is not supported in Avro schema. As a compromise, the converter creates a union, \["string", "number"\], which is less stringent: - ```json +```javascript { "name": "array_field", "type": [ @@ -165,20 +163,20 @@ This is not supported in Avro schema. As a compromise, the converter creates a u ], "default": null } - ``` +``` -6. Two Airbyte specific fields will be added to each Avro record: +1. Two Airbyte specific fields will be added to each Avro record: | Field | Schema | Document | - | :--- | :--- | :---: | -| `_airbyte_ab_id` | `uuid` | [link](http://avro.apache.org/docs/current/spec.html#UUID) +| :--- | :--- | :---: | +| `_airbyte_ab_id` | `uuid` | [link](http://avro.apache.org/docs/current/spec.html#UUID) | | `_airbyte_emitted_at` | `timestamp-millis` | [link](http://avro.apache.org/docs/current/spec.html#Timestamp+%28millisecond+precision%29) | -7. Currently `additionalProperties` is not supported. This means if the source is schemaless (e.g. Mongo), or has flexible fields, they will be ignored. We will have a solution soon. 
Feel free to submit a new issue if this is blocking for you. +1. Currently `additionalProperties` is not supported. This means if the source is schemaless \(e.g. Mongo\), or has flexible fields, they will be ignored. We will have a solution soon. Feel free to submit a new issue if this is blocking for you. For example, given the following Json schema: -```json +```javascript { "type": "object", "$schema": "http://json-schema.org/draft-07/schema#", @@ -207,7 +205,7 @@ For example, given the following Json schema: Its corresponding Avro schema will be: -```json +```javascript { "name" : "stream_name", "type" : "record", @@ -254,18 +252,18 @@ Its corresponding Avro schema will be: ### CSV -Like most of the other Airbyte destination connectors, usually the output has three columns: a UUID, an emission timestamp, and the data blob. With the CSV output, it is possible to normalize (flatten) the data blob to multiple columns. +Like most of the other Airbyte destination connectors, usually the output has three columns: a UUID, an emission timestamp, and the data blob. With the CSV output, it is possible to normalize \(flatten\) the data blob to multiple columns. | Column | Condition | Description | | :--- | :--- | :--- | | `_airbyte_ab_id` | Always exists | A uuid assigned by Airbyte to each processed record. | | `_airbyte_emitted_at` | Always exists. | A timestamp representing when the event was pulled from the data source. | -| `_airbyte_data` | When no normalization (flattening) is needed, all data reside under this column as a json blob. | -| root level fields | When root level normalization (flattening) is selected, the root level fields are expanded. | +| `_airbyte_data` | When no normalization \(flattening\) is needed, all data reside under this column as a json blob. | | +| root level fields | When root level normalization \(flattening\) is selected, the root level fields are expanded. | | For example, given the following json object from a source: -```json +```javascript { "user_id": 123, "name": { @@ -287,11 +285,11 @@ With root level normalization, the output CSV is: | :--- | :--- | :--- | :--- | | `26d73cde-7eb1-4e1e-b7db-a4c03b4cf206` | 1622135805000 | 123 | `{ "first": "John", "last": "Doe" }` | -### JSON Lines (JSONL) +### JSON Lines \(JSONL\) [Json Lines](https://jsonlines.org/) is a text format with one JSON per line. Each line has a structure as follows: -```json +```javascript { "_airbyte_ab_id": "", "_airbyte_emitted_at": "", @@ -301,7 +299,7 @@ With root level normalization, the output CSV is: For example, given the following two json objects from a source: -```json +```javascript [ { "user_id": 123, @@ -322,7 +320,7 @@ For example, given the following two json objects from a source: They will be like this in the output file: -```jsonl +```text { "_airbyte_ab_id": "26d73cde-7eb1-4e1e-b7db-a4c03b4cf206", "_airbyte_emitted_at": "1622135805000", "_airbyte_data": { "user_id": 123, "name": { "first": "John", "last": "Doe" } } } { "_airbyte_ab_id": "0a61de1b-9cdd-4455-a739-93572c9a5f20", "_airbyte_emitted_at": "1631948170000", "_airbyte_data": { "user_id": 456, "name": { "first": "Jane", "last": "Roe" } } } ``` @@ -336,17 +334,17 @@ The following configuration is available to configure the Parquet output: | Parameter | Type | Default | Description | | :--- | :---: | :---: | :--- | | `compression_codec` | enum | `UNCOMPRESSED` | **Compression algorithm**. Available candidates are: `UNCOMPRESSED`, `SNAPPY`, `GZIP`, `LZO`, `BROTLI`, `LZ4`, and `ZSTD`. 
| -| `block_size_mb` | integer | 128 (MB) | **Block size (row group size)** in MB. This is the size of a row group being buffered in memory. It limits the memory usage when writing. Larger values will improve the IO when reading, but consume more memory when writing. | -| `max_padding_size_mb` | integer | 8 (MB) | **Max padding size** in MB. This is the maximum size allowed as padding to align row groups. This is also the minimum size of a row group. | -| `page_size_kb` | integer | 1024 (KB) | **Page size** in KB. The page size is for compression. A block is composed of pages. A page is the smallest unit that must be read fully to access a single record. If this value is too small, the compression will deteriorate. | -| `dictionary_page_size_kb` | integer | 1024 (KB) | **Dictionary Page Size** in KB. There is one dictionary page per column per row group when dictionary encoding is used. The dictionary page size works like the page size but for dictionary. | +| `block_size_mb` | integer | 128 \(MB\) | **Block size \(row group size\)** in MB. This is the size of a row group being buffered in memory. It limits the memory usage when writing. Larger values will improve the IO when reading, but consume more memory when writing. | +| `max_padding_size_mb` | integer | 8 \(MB\) | **Max padding size** in MB. This is the maximum size allowed as padding to align row groups. This is also the minimum size of a row group. | +| `page_size_kb` | integer | 1024 \(KB\) | **Page size** in KB. The page size is for compression. A block is composed of pages. A page is the smallest unit that must be read fully to access a single record. If this value is too small, the compression will deteriorate. | +| `dictionary_page_size_kb` | integer | 1024 \(KB\) | **Dictionary Page Size** in KB. There is one dictionary page per column per row group when dictionary encoding is used. The dictionary page size works like the page size but for dictionary. | | `dictionary_encoding` | boolean | `true` | **Dictionary encoding**. This parameter controls whether dictionary encoding is turned on. | -These parameters are related to the `ParquetOutputFormat`. See the [Java doc](https://www.javadoc.io/doc/org.apache.parquet/parquet-hadoop/1.12.0/org/apache/parquet/hadoop/ParquetOutputFormat.html) for more details. Also see [Parquet documentation](https://parquet.apache.org/documentation/latest/#configurations) for their recommended configurations (512 - 1024 MB block size, 8 KB page size). +These parameters are related to the `ParquetOutputFormat`. See the [Java doc](https://www.javadoc.io/doc/org.apache.parquet/parquet-hadoop/1.12.0/org/apache/parquet/hadoop/ParquetOutputFormat.html) for more details. Also see [Parquet documentation](https://parquet.apache.org/documentation/latest/#configurations) for their recommended configurations \(512 - 1024 MB block size, 8 KB page size\). #### Data schema -Under the hood, an Airbyte data stream in Json schema is first converted to an Avro schema, then the Json object is converted to an Avro record, and finally the Avro record is outputted to the Parquet format. See the `Data schema` section from the [Avro output](#avro) for rules and limitations. +Under the hood, an Airbyte data stream in Json schema is first converted to an Avro schema, then the Json object is converted to an Avro record, and finally the Avro record is outputted to the Parquet format. See the `Data schema` section from the [Avro output](gcs.md#avro) for rules and limitations. 
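For a quick look at what ends up in a Parquet output file (for example, a copy downloaded locally from the bucket), a short `pyarrow` sketch like the one below could be used. The file name is only illustrative of the naming pattern described earlier, and the two `_airbyte_*` columns are the ones the connector adds to each record.

```python
import pyarrow.parquet as pq  # pip install pyarrow

# Illustrative name following <upload_date>_<upload_millis>_<partition_id>.parquet
pf = pq.ParquetFile("2021_01_01_1609541171643_0.parquet")

# The schema holds _airbyte_ab_id, _airbyte_emitted_at, and the stream's own fields.
print(pf.schema_arrow)

# The row-group count reflects the configured block size (block_size_mb).
print(pf.metadata.num_row_groups)
```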
## Getting started @@ -373,7 +371,8 @@ Under the hood, an Airbyte data stream in Json schema is first converted to an A ## CHANGELOG | Version | Date | Pull Request | Subject | -| :--- | :--- | :--- | :--- | -| 0.1.2 | 2021-09-12 | [#5720](https://github.com/airbytehq/airbyte/issues/5720) | Added configurable block size for stream. Each stream is limited to 10,000 by GCS | -| 0.1.1 | 2021-08-26 | [#5296](https://github.com/airbytehq/airbyte/issues/5296) | Added storing gcsCsvFileLocation property for CSV format. This is used by destination-bigquery (GCS Staging upload type) | -| 0.1.0 | 2021-07-16 | [#4329](https://github.com/airbytehq/airbyte/pull/4784) | Initial release. | +| :--- | :--- | :--- | :--- | +| 0.1.2 | 2021-09-12 | [\#5720](https://github.com/airbytehq/airbyte/issues/5720) | Added configurable block size for stream. Each stream is limited to 10,000 by GCS | +| 0.1.1 | 2021-08-26 | [\#5296](https://github.com/airbytehq/airbyte/issues/5296) | Added storing gcsCsvFileLocation property for CSV format. This is used by destination-bigquery \(GCS Staging upload type\) | +| 0.1.0 | 2021-07-16 | [\#4329](https://github.com/airbytehq/airbyte/pull/4784) | Initial release. | + diff --git a/docs/integrations/destinations/kafka.md b/docs/integrations/destinations/kafka.md index 206f6272f9f..d7f3fa98ae1 100644 --- a/docs/integrations/destinations/kafka.md +++ b/docs/integrations/destinations/kafka.md @@ -10,8 +10,7 @@ The Airbyte Kafka destination allows you to sync data to Kafka. Each stream is w Each stream will be output into a Kafka topic. -Currently, this connector only writes data with JSON format. More formats (e.g. Apache Avro) will be supported in -the future. +Currently, this connector only writes data with JSON format. More formats \(e.g. Apache Avro\) will be supported in the future. Each record will contain in its key the uuid assigned by Airbyte, and in the value these 3 fields: @@ -29,7 +28,6 @@ Each record will contain in its key the uuid assigned by Airbyte, and in the val | Incremental - Deduped History | No | As this connector does not support dbt, we don't support this sync mode on this destination. | | Namespaces | Yes | | - ## Getting started ### Requirements @@ -46,41 +44,25 @@ Make sure your Kafka brokers can be accessed by Airbyte. #### **Permissions** -Airbyte should be allowed to write messages into topics, and these topics should be created before writing into Kafka -or, at least, enable the configuration in the brokers `auto.create.topics.enable` (which is not recommended for -production environments). +Airbyte should be allowed to write messages into topics, and these topics should be created before writing into Kafka or, at least, enable the configuration in the brokers `auto.create.topics.enable` \(which is not recommended for production environments\). -Note that if you choose to use dynamic topic names, you will probably need to enable `auto.create.topics.enable` -to avoid your connection failing if there was an update to the source connector's schema. Otherwise a hardcoded -topic name may be best. +Note that if you choose to use dynamic topic names, you will probably need to enable `auto.create.topics.enable` to avoid your connection failing if there was an update to the source connector's schema. Otherwise a hardcoded topic name may be best. #### Target topics -You can determine the topics to which messages are written via the `topic_pattern` configuration parameter. 
-Messages can be written to either a hardcoded, pre-defined topic, or dynamically written to different topics -based on the [namespace](https://docs.airbyte.io/understanding-airbyte/namespaces) or stream they came from. +You can determine the topics to which messages are written via the `topic_pattern` configuration parameter. Messages can be written to either a hardcoded, pre-defined topic, or dynamically written to different topics based on the [namespace](https://docs.airbyte.io/understanding-airbyte/namespaces) or stream they came from. -To write all messages to a single hardcoded topic, enter its name in the `topic_pattern` field -e.g: setting `topic_pattern` to `my-topic-name` will write all messages from all streams and namespaces to that topic. +To write all messages to a single hardcoded topic, enter its name in the `topic_pattern` field e.g: setting `topic_pattern` to `my-topic-name` will write all messages from all streams and namespaces to that topic. -To define the output topics dynamically, you can leverage the `{namespace}` and `{stream}` pattern variables, -which cause messages to be written to different topics based on the values present when producing the records. -For example, setting the `topic_pattern` parameter to `airbyte_syncs/{namespace}/{stream}` means that messages -from namespace `n1` and stream `s1` will get written to the topic `airbyte_syncs/n1/s1`, and messages -from `s2` to `airbyte_syncs/n1/s2` etc. +To define the output topics dynamically, you can leverage the `{namespace}` and `{stream}` pattern variables, which cause messages to be written to different topics based on the values present when producing the records. For example, setting the `topic_pattern` parameter to `airbyte_syncs/{namespace}/{stream}` means that messages from namespace `n1` and stream `s1` will get written to the topic `airbyte_syncs/n1/s1`, and messages from `s2` to `airbyte_syncs/n1/s2` etc. -If you define output topic dynamically, you might want to enable `auto.create.topics.enable` to -avoid your connection failing if there was an update to the source connector's schema. -Otherwise, you'll need to manually create topics in Kafka as they are added/updated in the source, which is the -recommended option for production environments. +If you define output topic dynamically, you might want to enable `auto.create.topics.enable` to avoid your connection failing if there was an update to the source connector's schema. Otherwise, you'll need to manually create topics in Kafka as they are added/updated in the source, which is the recommended option for production environments. -**NOTICE**: a naming convention transformation will be applied to the target topic name using -the `StandardNameTransformer` so that some special characters will be replaced. +**NOTICE**: a naming convention transformation will be applied to the target topic name using the `StandardNameTransformer` so that some special characters will be replaced. ### Setup the Kafka destination in Airbyte -You should now have all the requirements needed to configure Kafka as a destination in the UI. You can configure the -following parameters on the Kafka destination (though many of these are optional or have default values): +You should now have all the requirements needed to configure Kafka as a destination in the UI. 
You can configure the following parameters on the Kafka destination \(though many of these are optional or have default values\): * **Bootstrap servers** * **Topic pattern** @@ -110,12 +92,13 @@ following parameters on the Kafka destination (though many of these are optional More info about this can be found in the [Kafka producer configs documentation site](https://kafka.apache.org/documentation/#producerconfigs). -*NOTE*: Some configurations for SSL are not available yet. +_NOTE_: Some configurations for SSL are not available yet. ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.2 | 2021-09-14 | [#6040](https://github.com/airbytehq/airbyte/pull/6040) | Change spec.json and config parser | -| 0.1.1 | 2021-07-30 | [#5125](https://github.com/airbytehq/airbyte/pull/5125) | Enable `additionalPropertities` in spec.json | -| 0.1.0 | 2021-07-21 | [#3746](https://github.com/airbytehq/airbyte/pull/3746) | Initial Release | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.2 | 2021-09-14 | [\#6040](https://github.com/airbytehq/airbyte/pull/6040) | Change spec.json and config parser | +| 0.1.1 | 2021-07-30 | [\#5125](https://github.com/airbytehq/airbyte/pull/5125) | Enable `additionalPropertities` in spec.json | +| 0.1.0 | 2021-07-21 | [\#3746](https://github.com/airbytehq/airbyte/pull/3746) | Initial Release | + diff --git a/docs/integrations/destinations/keen-1.md b/docs/integrations/destinations/keen-1.md new file mode 100644 index 00000000000..c76a373263f --- /dev/null +++ b/docs/integrations/destinations/keen-1.md @@ -0,0 +1,67 @@ +--- +description: Keen is a fully managed event streaming and analytic platform. +--- + +# Keen + +## Overview + +The Airbyte Keen destination allows you to send/stream data into Keen. Keen is a flexible, fully managed event streaming and analytic platform. + +### Sync overview + +#### Output schema + +Each stream will output an event in Keen. Each collection will inherit the name from the stream with all non-alphanumeric characters removed, except for `.-_` and whitespace characters. When possible, the connector will try to guess the timestamp value for the record and override the special field `keen.timestamp` with it. + +#### Features + +| Feature | Supported?\(Yes/No\) | Notes | +| :--- | :--- | :--- | +| Full Refresh Sync | Yes | | +| Incremental - Append Sync | Yes | | +| Incremental - Deduped History | No | As this connector does not support dbt, we don't support this sync mode on this destination. | +| Namespaces | No | | + +## Getting started + +### Requirements + +To use the Keen destination, you'll need: + +* A Keen Project ID +* A Keen Master API key associated with the project + +See the setup guide for more information about how to acquire the required resources. + +### Setup guide + +#### Keen Project + +If you already have the project set up, jump to the "Access" section. + +Login to your [Keen](https://keen.io/) account, then click the Add New link next to the Projects label on the left-hand side tab. Then give project a name. + +#### API Key and Project ID + +Keen connector uses Keen Kafka Inbound Cluster to stream the data. It requires `Project ID` and `Master Key` for the authentication. To get them, navigate to the `Access` tab from the left-hand side panel and check the `Project Details` section. **Important**: This destination requires the Project's **Master** Key. 
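As a rough sketch, a Keen destination configuration carries these two values plus the timestamp option covered in the next section. The JSON property names below are assumptions for illustration; the UI field labels are the source of truth.

```javascript
{
  "project_id": "<keen-project-id>",   // found under Access > Project Details
  "api_key": "<keen-master-key>",      // must be the project's Master key
  "infer_timestamp": true              // see "Timestamp Inference" below
}
```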
+ +#### Timestamp Inference + +`Infer Timestamp` field lets you specify if you want the connector to guess the special `keen.timestamp` field based on the streamed data. It might be useful for historical data synchronization to fully leverage Keen's analytics power. If not selected, `keen.timestamp` will be set to date when data was streamed. By default, set to `true`. + +### Setup the Keen destination in Airbyte + +Now you should have all the parameters needed to configure Keen destination. + +* **Project ID** +* **Master API Key** +* **Infer Timestamp** + +## CHANGELOG + +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.2.0 | 2021-09-10 | [\#5973](https://github.com/airbytehq/airbyte/pull/5973) | Fix timestamp inference for complex schemas | +| 0.1.0 | 2021-08-18 | [\#5339](https://github.com/airbytehq/airbyte/pull/5339) | Keen Destination Release! | + diff --git a/docs/integrations/destinations/keen.md b/docs/integrations/destinations/keen.md index 4589117022d..c1559b18f81 100644 --- a/docs/integrations/destinations/keen.md +++ b/docs/integrations/destinations/keen.md @@ -2,7 +2,7 @@ description: Keen is a fully managed event streaming and analytic platform. --- -# Keen +# Chargify ## Overview @@ -12,8 +12,7 @@ The Airbyte Keen destination allows you to send/stream data into Keen. Keen is a #### Output schema -Each stream will output an event in Keen. Each collection will inherit the name from the stream with all non-alphanumeric characters removed, except for `.-_ ` and whitespace characters. When possible, the connector will try to guess the timestamp value for the record and override the special field `keen.timestamp` with it. - +Each stream will output an event in Keen. Each collection will inherit the name from the stream with all non-alphanumeric characters removed, except for `.-_` and whitespace characters. When possible, the connector will try to guess the timestamp value for the record and override the special field `keen.timestamp` with it. #### Features @@ -43,11 +42,9 @@ If you already have the project set up, jump to the "Access" section. Login to your [Keen](https://keen.io/) account, then click the Add New link next to the Projects label on the left-hand side tab. Then give project a name. +#### API Key and Project ID -#### API Key and Project ID - -Keen connector uses Keen Kafka Inbound Cluster to stream the data. It requires `Project ID` and `Master Key` for the authentication. To get them, navigate to the `Access` tab from the left-hand side panel and check the `Project Details` section. -**Important**: This destination requires the Project's **Master** Key. +Keen connector uses Keen Kafka Inbound Cluster to stream the data. It requires `Project ID` and `Master Key` for the authentication. To get them, navigate to the `Access` tab from the left-hand side panel and check the `Project Details` section. **Important**: This destination requires the Project's **Master** Key. #### Timestamp Inference @@ -63,8 +60,8 @@ Now you should have all the parameters needed to configure Keen destination. ## CHANGELOG -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.2.0 | 2021-09-10 | [#5973](https://github.com/airbytehq/airbyte/pull/5973) | Fix timestamp inference for complex schemas | -| 0.1.0 | 2021-08-18 | [#5339](https://github.com/airbytehq/airbyte/pull/5339) | Keen Destination Release! 
| +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.2.0 | 2021-09-10 | [\#5973](https://github.com/airbytehq/airbyte/pull/5973) | Fix timestamp inference for complex schemas | +| 0.1.0 | 2021-08-18 | [\#5339](https://github.com/airbytehq/airbyte/pull/5339) | Keen Destination Release! | diff --git a/docs/integrations/destinations/local-csv.md b/docs/integrations/destinations/local-csv.md index 5849081fe08..ec4b3a6a35d 100644 --- a/docs/integrations/destinations/local-csv.md +++ b/docs/integrations/destinations/local-csv.md @@ -47,8 +47,7 @@ The local mount is mounted by Docker onto `LOCAL_ROOT`. This means the `/local` ## Access Replicated Data Files -If your Airbyte instance is running on the same computer that you are navigating with, you can open your browser and enter [file:///tmp/airbyte\_local](file:///tmp/airbyte_local) to look at the replicated data locally. -If the first approach fails or if your Airbyte instance is running on a remote server, follow the following steps to access the replicated files: +If your Airbyte instance is running on the same computer that you are navigating with, you can open your browser and enter [file:///tmp/airbyte\_local](file:///tmp/airbyte_local) to look at the replicated data locally. If the first approach fails or if your Airbyte instance is running on a remote server, follow the following steps to access the replicated files: 1. Access the scheduler container using `docker exec -it airbyte-scheduler bash` 2. Navigate to the default local mount using `cd /tmp/airbyte_local` @@ -58,9 +57,9 @@ If the first approach fails or if your Airbyte instance is running on a remote s You can also copy the output file to your host machine, the following command will copy the file to the current working directory you are using: -``` +```text docker cp airbyte-scheduler:/tmp/airbyte_local/{destination_path}/{filename}.csv . ``` - Note: If you are running Airbyte on Windows with Docker backed by WSL2, you have to use similar step as above or refer to this [link](../../operator-guides/locating-files-local-destination.md) for an alternative approach. + diff --git a/docs/integrations/destinations/local-json.md b/docs/integrations/destinations/local-json.md index dbb05d5e50e..aea607f486d 100644 --- a/docs/integrations/destinations/local-json.md +++ b/docs/integrations/destinations/local-json.md @@ -47,8 +47,7 @@ The local mount is mounted by Docker onto `LOCAL_ROOT`. This means the `/local` ## Access Replicated Data Files -If your Airbyte instance is running on the same computer that you are navigating with, you can open your browser and enter [file:///tmp/airbyte\_local](file:///tmp/airbyte_local) to look at the replicated data locally. -If the first approach fails or if your Airbyte instance is running on a remote server, follow the following steps to access the replicated files: +If your Airbyte instance is running on the same computer that you are navigating with, you can open your browser and enter [file:///tmp/airbyte\_local](file:///tmp/airbyte_local) to look at the replicated data locally. If the first approach fails or if your Airbyte instance is running on a remote server, follow the following steps to access the replicated files: 1. Access the scheduler container using `docker exec -it airbyte-scheduler bash` 2. 
Navigate to the default local mount using `cd /tmp/airbyte_local` @@ -58,8 +57,9 @@ If the first approach fails or if your Airbyte instance is running on a remote s You can also copy the output file to your host machine, the following command will copy the file to the current working directory you are using: -``` +```text docker cp airbyte-scheduler:/tmp/airbyte_local/{destination_path}/{filename}.jsonl . ``` Note: If you are running Airbyte on Windows with Docker backed by WSL2, you have to use similar step as above or refer to this [link](../../operator-guides/locating-files-local-destination.md) for an alternative approach. + diff --git a/docs/integrations/destinations/mongodb.md b/docs/integrations/destinations/mongodb.md index 7ea055ae4ac..824295b9ecb 100644 --- a/docs/integrations/destinations/mongodb.md +++ b/docs/integrations/destinations/mongodb.md @@ -1,4 +1,4 @@ -# Mongodb +# MongoDB ## Features @@ -17,21 +17,24 @@ Each stream will be output into its own collection in MongoDB. Each collection w * `_airbyte_emitted_at`: a timestamp representing when the event was pulled from the data source. The field type in MongoDB is `Timestamp`. * `_airbyte_data`: a json blob representing with the event data. The field type in MongoDB is `Object`. -## Getting Started (Airbyte Cloud) +## Getting Started \(Airbyte Cloud\) + Airbyte Cloud only supports connecting to your MongoDB instance with TLS encryption. Other than that, you can proceed with the open-source instructions below. -## Getting Started (Airbyte Open-Source) +## Getting Started \(Airbyte Open-Source\) #### Requirements To use the MongoDB destination, you'll need: * A MongoDB server - + #### **Permissions** + You need a MongoDB user that can create collections and write documents. We highly recommend creating an Airbyte-specific user for this purpose. #### Target Database + You will need to choose an existing database or create a new database that will be used to store synced data from Airbyte. ### Setup the MongoDB destination in Airbyte @@ -39,14 +42,14 @@ You will need to choose an existing database or create a new database that will You should now have all the requirements needed to configure MongoDB as a destination in the UI. You'll need the following information to configure the MongoDB destination: * **Standalone MongoDb instance** - * Host: URL of the database - * Port: Port to use for connecting to the database - * TLS: indicates whether to create encrypted connection + * Host: URL of the database + * Port: Port to use for connecting to the database + * TLS: indicates whether to create encrypted connection * **Replica Set** - * Server addresses: the members of a replica set - * Replica Set: A replica set name + * Server addresses: the members of a replica set + * Replica Set: A replica set name * **MongoDb Atlas Cluster** - * Cluster URL: URL of a cluster to connect to + * Cluster URL: URL of a cluster to connect to * **Database** * **Username** * **Password** @@ -63,12 +66,13 @@ Since database names are case insensitive in MongoDB, database names cannot diff #### Restrictions on Database Names for Windows -For MongoDB deployments running on Windows, database names cannot contain any of the following characters: /\. "$*<>:|?* +For MongoDB deployments running on Windows, database names cannot contain any of the following characters: /. "$_<>:\|?_ Also database names cannot contain the null character. 
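Returning to the output schema described above, a single synced record stored in a MongoDB collection can be pictured roughly as follows (shell-style notation, placeholder values only):

```javascript
{
  "_airbyte_ab_id": "9b1deb4d-0000-0000-0000-000000000000",   // uuid assigned by Airbyte to the record
  "_airbyte_emitted_at": Timestamp(1633046400, 1),            // stored as a MongoDB Timestamp
  "_airbyte_data": { "id": 42, "name": "example row" }        // the source event, stored as an Object
}
```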
#### Restrictions on Database Names for Unix and Linux Systems -For MongoDB deployments running on Unix and Linux systems, database names cannot contain any of the following characters: /\. "$ + +For MongoDB deployments running on Unix and Linux systems, database names cannot contain any of the following characters: /. "$ Also database names cannot contain the null character. @@ -81,11 +85,13 @@ Database names cannot be empty and must have fewer than 64 characters. Collection names should begin with an underscore or a letter character, and cannot: * contain the $. -* be an empty string (e.g. ""). +* be an empty string \(e.g. ""\). * contain the null character. -* begin with the system. prefix. (Reserved for internal use.) +* begin with the system. prefix. \(Reserved for internal use.\) ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.1 | 2021-09-29 | [6536](https://github.com/airbytehq/airbyte/pull/6536) | Destination MongoDb: added support via TLS/SSL | + +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.1 | 2021-09-29 | [6536](https://github.com/airbytehq/airbyte/pull/6536) | Destination MongoDb: added support via TLS/SSL | + diff --git a/docs/integrations/destinations/mssql.md b/docs/integrations/destinations/mssql.md index 410cb62752b..dbc7145392d 100644 --- a/docs/integrations/destinations/mssql.md +++ b/docs/integrations/destinations/mssql.md @@ -1,4 +1,4 @@ -# MS SQL Server +# MSSQL ## Features @@ -21,21 +21,23 @@ Each stream will be output into its own table in SQL Server. Each table will con * `_airbyte_emitted_at`: a timestamp representing when the event was pulled from the data source. The column type in SQL Server is `DATETIMEOFFSET(7)`. * `_airbyte_data`: a JSON blob representing with the event data. The column type in SQL Server is `NVARCHAR(MAX)`. -#### Microsoft SQL Server specifics or why NVARCHAR type is used here: +#### Microsoft SQL Server specifics or why NVARCHAR type is used here: + * NVARCHAR is Unicode - 2 bytes per character, therefore max. of 1 billion characters; will handle East Asian, Arabic, Hebrew, Cyrillic etc. characters just fine. * VARCHAR is non-Unicode - 1 byte per character, max. capacity is 2 billion characters, but limited to the character set you're SQL Server is using, basically - no support for those languages mentioned before -## Getting Started (Airbyte Cloud) +## Getting Started \(Airbyte Cloud\) + Airbyte Cloud only supports connecting to your MSSQL instance with TLS encryption. Other than that, you can proceed with the open-source instructions below. | Feature | Supported?\(Yes/No\) | Notes | | :--- | :--- | :--- | | Full Refresh Sync | Yes | | | Incremental - Append Sync | Yes | | -| Incremental - Deduped History | Yes | | +| Incremental - Deduped History | Yes | | | Namespaces | Yes | | -## Getting Started (Airbyte Open-Source) +## Getting Started \(Airbyte Open-Source\) ### Requirements @@ -45,11 +47,10 @@ MS SQL Server: `Azure SQL Database`, `Azure Synapse Analytics`, `Azure SQL Manag ### Normalization Requirements -To sync **with** normalization you'll need to use MS SQL Server of the following versions: -`SQL Server 2019`, `SQL Server 2017`, `SQL Server 2016`, `SQL Server 2014`. -The work of normalization on `SQL Server 2012` and bellow are not guaranteed. +To sync **with** normalization you'll need to use MS SQL Server of the following versions: `SQL Server 2019`, `SQL Server 2017`, `SQL Server 2016`, `SQL Server 2014`. 
The work of normalization on `SQL Server 2012` and bellow are not guaranteed. ### Setup guide + * MS SQL Server: `Azure SQL Database`, `Azure Synapse Analytics`, `Azure SQL Managed Instance`, `SQL Server 2019`, `SQL Server 2017`, `SQL Server 2016`, `SQL Server 2014`, `SQL Server 2012`, or `PDW 2008R2 AU34`. #### Network Access @@ -64,9 +65,9 @@ You need a user configured in SQL Server that can create tables and write rows. You will need to choose an existing database or create a new database that will be used to store synced data from Airbyte. -#### SSL configuration (optional) +#### SSL configuration \(optional\) -Airbyte supports a SSL-encrypted connection to the database. If you want to use SSL to securely access your database, ensure that [the server is configured to use an SSL certificate.](https://support.microsoft.com/en-us/topic/how-to-enable-ssl-encryption-for-an-instance-of-sql-server-by-using-microsoft-management-console-1c7ae22f-8518-2b3e-93eb-d735af9e344c) +Airbyte supports a SSL-encrypted connection to the database. If you want to use SSL to securely access your database, ensure that [the server is configured to use an SSL certificate.](https://support.microsoft.com/en-us/topic/how-to-enable-ssl-encryption-for-an-instance-of-sql-server-by-using-microsoft-management-console-1c7ae22f-8518-2b3e-93eb-d735af9e344c) ### Setup the MSSQL destination in Airbyte @@ -78,48 +79,53 @@ You should now have all the requirements needed to configure SQL Server as a des * **Password** * **Schema** * **Database** - * This database needs to exist within the schema provided. + * This database needs to exist within the schema provided. * **SSL Method**: - * The SSL configuration supports three modes: Unencrypted, Encrypted (trust server certificate), and Encrypted (verify certificate). + * The SSL configuration supports three modes: Unencrypted, Encrypted \(trust server certificate\), and Encrypted \(verify certificate\). * **Unencrypted**: Do not use SSL encryption on the database connection - * **Encrypted (trust server certificate)**: Use SSL encryption without verifying the server's certificate. This is useful for self-signed certificates in testing scenarios, but should not be used in production. - * **Encrypted (verify certificate)**: Use the server's SSL certificate, after standard certificate verification. - * **Host Name In Certificate** (optional): When using certificate verification, this property can be set to specify an expected name for added security. If this value is present, and the server's certificate's host name does not match it, certificate verification will fail. - + * **Encrypted \(trust server certificate\)**: Use SSL encryption without verifying the server's certificate. This is useful for self-signed certificates in testing scenarios, but should not be used in production. + * **Encrypted \(verify certificate\)**: Use the server's SSL certificate, after standard certificate verification. + * **Host Name In Certificate** \(optional\): When using certificate verification, this property can be set to specify an expected name for added security. If this value is present, and the server's certificate's host name does not match it, certificate verification will fail. + ## Connection via SSH Tunnel -Airbyte has the ability to connect to the MS SQL Server instance via an SSH Tunnel. The reason you might want to do this because it is not possible -(or against security policy) to connect to the database directly (e.g. it does not have a public IP address). 
+Airbyte has the ability to connect to the MS SQL Server instance via an SSH Tunnel. The reason you might want to do this is that it is not possible \(or against security policy\) to connect to the database directly \(e.g. it does not have a public IP address\). -When using an SSH tunnel, you are configuring Airbyte to connect to an intermediate server (a.k.a. a bastion sever) that have direct access to the database. -Airbyte connects to the bastion and then asks the bastion to connect directly to the server. +When using an SSH tunnel, you are configuring Airbyte to connect to an intermediate server \(a.k.a. a bastion server\) that has direct access to the database. Airbyte connects to the bastion and then asks the bastion to connect directly to the server. Using this feature requires additional configuration, when creating the source. We will talk through what each piece of configuration means. 1. Configure all fields for the source as you normally would, except `SSH Tunnel Method`. -2. `SSH Tunnel Method` defaults to `No Tunnel` (meaning a direct connection). If you want to use an SSH Tunnel choose `SSH Key Authentication` or `Password Authentication`. -3. Choose `Key Authentication` if you will be using an RSA private key as your secret for establishing the SSH Tunnel (see below for more information on generating this key). 4. Choose `Password Authentication` if you will be using a password as your secret for establishing the SSH Tunnel. -5. `SSH Tunnel Jump Server Host` refers to the intermediate (bastion) server that Airbyte will connect to. This should be a hostname or an IP Address. +2. `SSH Tunnel Method` defaults to `No Tunnel` \(meaning a direct connection\). If you want to use an SSH Tunnel choose `SSH Key Authentication` or `Password Authentication`. +3. Choose `Key Authentication` if you will be using an RSA private key as your secret for establishing the SSH Tunnel \(see below for more information on generating this key\). 4. Choose `Password Authentication` if you will be using a password as your secret for establishing the SSH Tunnel. +5. `SSH Tunnel Jump Server Host` refers to the intermediate \(bastion\) server that Airbyte will connect to. This should be a hostname or an IP Address. 6. `SSH Connection Port` is the port on the bastion server with which to make the SSH connection. The default port for SSH connections is `22`, -so unless you have explicitly changed something, go with the default. + + so unless you have explicitly changed something, go with the default. + 7. `SSH Login Username` is the username that Airbyte should use when connecting to the bastion server. This is NOT the MS SQL Server username. 8. If you are using `Password Authentication`, then `SSH Login Username` should be set to the password of the User from the previous step. -If you are using `SSH Key Authentication` leave this blank. Again, this is not the MS SQL Server password, but the password for the OS-user that -Airbyte is using to perform commands on the bastion. + + If you are using `SSH Key Authentication` leave this blank. Again, this is not the MS SQL Server password, but the password for the OS-user that + + Airbyte is using to perform commands on the bastion. + 9. If you are using `SSH Key Authentication`, then `SSH Private Key` should be set to the RSA Private Key that you are using to create the SSH connection. -This should be the full contents of the key file starting with `-----BEGIN RSA PRIVATE KEY-----` and ending with `-----END RSA PRIVATE KEY-----`.
+ + This should be the full contents of the key file starting with `-----BEGIN RSA PRIVATE KEY-----` and ending with `-----END RSA PRIVATE KEY-----`. ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.9 | 2021-09-29 | [#5970](https://github.com/airbytehq/airbyte/pull/5970) | Add support & test cases for MSSQL Destination via SSH tunnels | -| 0.1.8 | 2021-08-07 | [#5272](https://github.com/airbytehq/airbyte/pull/5272) | Add batch method to insert records | -| 0.1.7 | 2021-07-30 | [#5125](https://github.com/airbytehq/airbyte/pull/5125) | Enable `additionalPropertities` in spec.json | -| 0.1.6 | 2021-06-21 | [#3555](https://github.com/airbytehq/airbyte/pull/3555) | Partial Success in BufferedStreamConsumer | -| 0.1.5 | 2021-07-20 | [#4874](https://github.com/airbytehq/airbyte/pull/4874) | declare object types correctly in spec | -| 0.1.4 | 2021-06-17 | [#3744](https://github.com/airbytehq/airbyte/pull/3744) | Fix doc/params in specification file | -| 0.1.3 | 2021-05-28 | [#3728](https://github.com/airbytehq/airbyte/pull/3973) | Change dockerfile entrypoint | -| 0.1.2 | 2021-05-13 | [#3367](https://github.com/airbytehq/airbyte/pull/3671) | Fix handle symbols unicode | -| 0.1.1 | 2021-05-11 | [#3566](https://github.com/airbytehq/airbyte/pull/3195) | MS SQL Server Destination Release! | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.9 | 2021-09-29 | [\#5970](https://github.com/airbytehq/airbyte/pull/5970) | Add support & test cases for MSSQL Destination via SSH tunnels | +| 0.1.8 | 2021-08-07 | [\#5272](https://github.com/airbytehq/airbyte/pull/5272) | Add batch method to insert records | +| 0.1.7 | 2021-07-30 | [\#5125](https://github.com/airbytehq/airbyte/pull/5125) | Enable `additionalPropertities` in spec.json | +| 0.1.6 | 2021-06-21 | [\#3555](https://github.com/airbytehq/airbyte/pull/3555) | Partial Success in BufferedStreamConsumer | +| 0.1.5 | 2021-07-20 | [\#4874](https://github.com/airbytehq/airbyte/pull/4874) | declare object types correctly in spec | +| 0.1.4 | 2021-06-17 | [\#3744](https://github.com/airbytehq/airbyte/pull/3744) | Fix doc/params in specification file | +| 0.1.3 | 2021-05-28 | [\#3728](https://github.com/airbytehq/airbyte/pull/3973) | Change dockerfile entrypoint | +| 0.1.2 | 2021-05-13 | [\#3367](https://github.com/airbytehq/airbyte/pull/3671) | Fix handle symbols unicode | +| 0.1.1 | 2021-05-11 | [\#3566](https://github.com/airbytehq/airbyte/pull/3195) | MS SQL Server Destination Release! | + diff --git a/docs/integrations/destinations/mysql.md b/docs/integrations/destinations/mysql.md index 8ec54671b23..5900cda6e81 100644 --- a/docs/integrations/destinations/mysql.md +++ b/docs/integrations/destinations/mysql.md @@ -18,10 +18,11 @@ Each stream will be output into its own table in MySQL. Each table will contain * `_airbyte_emitted_at`: a timestamp representing when the event was pulled from the data source. The column type in MySQL is `TIMESTAMP(6)`. * `_airbyte_data`: a json blob representing with the event data. The column type in MySQL is `JSON`. -## Getting Started (Airbyte Cloud) +## Getting Started \(Airbyte Cloud\) + Airbyte Cloud only supports connecting to your MySQL instance with TLS encryption. Other than that, you can proceed with the open-source instructions below. 
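As an illustrative sketch of the raw table layout described above, one row can be pictured like this (rendered as JSON for readability; in MySQL these are separate columns, and `_airbyte_ab_id` is the usual Airbyte record id column):

```javascript
{
  "_airbyte_ab_id": "b7e2a1c4-0000-0000-0000-000000000000",    // record id assigned by Airbyte
  "_airbyte_emitted_at": "2021-10-01 12:00:00.000000",          // TIMESTAMP(6) column
  "_airbyte_data": { "id": 42, "email": "user@example.com" }    // stored in the JSON column
}
```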
-## Getting Started (Airbyte Open-Source) +## Getting Started \(Airbyte Open-Source\) ### Requirements @@ -56,29 +57,30 @@ You should now have all the requirements needed to configure MySQL as a destinat ## Known Limitations -Note that MySQL documentation discusses identifiers case sensitivity using the `lower_case_table_names` system variable. -One of their recommendations is: +Note that MySQL documentation discusses identifiers case sensitivity using the `lower_case_table_names` system variable. One of their recommendations is: - "It is best to adopt a consistent convention, such as always creating and referring to databases and tables using lowercase names. - This convention is recommended for maximum portability and ease of use." +```text +"It is best to adopt a consistent convention, such as always creating and referring to databases and tables using lowercase names. + This convention is recommended for maximum portability and ease of use." +``` [Source: MySQL docs](https://dev.mysql.com/doc/refman/8.0/en/identifier-case-sensitivity.html) -As a result, Airbyte MySQL destination forces all identifier (table, schema and columns) names to be lowercase. +As a result, Airbyte MySQL destination forces all identifier \(table, schema and columns\) names to be lowercase. ## Connection via SSH Tunnel -Airbyte has the ability to connect to a MySQl instance via an SSH Tunnel. The reason you might want to do this because it is not possible (or against security policy) to connect to the database directly (e.g. it does not have a public IP address). +Airbyte has the ability to connect to a MySQl instance via an SSH Tunnel. The reason you might want to do this because it is not possible \(or against security policy\) to connect to the database directly \(e.g. it does not have a public IP address\). -When using an SSH tunnel, you are configuring Airbyte to connect to an intermediate server (a.k.a. a bastion sever) that _does_ have direct access to the database. Airbyte connects to the bastion and then asks the bastion to connect directly to the server. +When using an SSH tunnel, you are configuring Airbyte to connect to an intermediate server \(a.k.a. a bastion sever\) that _does_ have direct access to the database. Airbyte connects to the bastion and then asks the bastion to connect directly to the server. Using this feature requires additional configuration, when creating the destination. We will talk through what each piece of configuration means. 1. Configure all fields for the destination as you normally would, except `SSH Tunnel Method`. -2. `SSH Tunnel Method` defaults to `No Tunnel` (meaning a direct connection). If you want to use an SSH Tunnel choose `SSH Key Authentication` or `Password Authentication`. - 1. Choose `Key Authentication` if you will be using an RSA private key as your secret for establishing the SSH Tunnel (see below for more information on generating this key). - 2. Choose `Password Authentication` if you will be using a password as your secret for establishing the SSH Tunnel. -3. `SSH Tunnel Jump Server Host` refers to the intermediate (bastion) server that Airbyte will connect to. This should be a hostname or an IP Address. +2. `SSH Tunnel Method` defaults to `No Tunnel` \(meaning a direct connection\). If you want to use an SSH Tunnel choose `SSH Key Authentication` or `Password Authentication`. + 1. Choose `Key Authentication` if you will be using an RSA private key as your secret for establishing the SSH Tunnel \(see below for more information on generating this key\). 
+ 2. Choose `Password Authentication` if you will be using a password as your secret for establishing the SSH Tunnel. +3. `SSH Tunnel Jump Server Host` refers to the intermediate \(bastion\) server that Airbyte will connect to. This should be a hostname or an IP Address. 4. `SSH Connection Port` is the port on the bastion server with which to make the SSH connection. The default port for SSH connections is `22`, so unless you have explicitly changed something, go with the default. 5. `SSH Login Username` is the username that Airbyte should use when connection to the bastion server. This is NOT the MySQl username. 6. If you are using `Password Authentication`, then `SSH Login Username` should be set to the password of the User from the previous step. If you are using `SSH Key Authentication` leave this blank. Again, this is not the MySQl password, but the password for the OS-user that Airbyte is using to perform commands on the bastion. @@ -87,16 +89,17 @@ Using this feature requires additional configuration, when creating the destinat ## CHANGELOG | Version | Date | Pull Request | Subject | -| :--- | :--- | :--- | :--- | -| 0.1.13 | 2021-09-28 | [#6506](https://github.com/airbytehq/airbyte/pull/6506) | Added support for MySQL destination via TLS/SSL | -| 0.1.12 | 2021-09-24 | [#6317](https://github.com/airbytehq/airbyte/pull/6317) | Added option to connect to DB via SSH | -| 0.1.11 | 2021-07-30 | [#5125](https://github.com/airbytehq/airbyte/pull/5125) | Enable `additionalPropertities` in spec.json | -| 0.1.10 | 2021-07-28 | [#5026](https://github.com/airbytehq/airbyte/pull/5026) | Add sanitized json fields in raw tables to handle quotes in column names | -| 0.1.7 | 2021-07-09 | [#4651](https://github.com/airbytehq/airbyte/pull/4651) | Switch normalization flag on so users can use normalization. | -| 0.1.6 | 2021-07-03 | [#4531](https://github.com/airbytehq/airbyte/pull/4531) | Added normalization for MySQL. | -| 0.1.5 | 2021-07-03 | [#3973](https://github.com/airbytehq/airbyte/pull/3973) | Added `AIRBYTE_ENTRYPOINT` for kubernetes support. | -| 0.1.4 | 2021-07-03 | [#3290](https://github.com/airbytehq/airbyte/pull/3290) | Switched to get states from destination instead of source. | -| 0.1.3 | 2021-07-03 | [#3387](https://github.com/airbytehq/airbyte/pull/3387) | Fixed a bug for message length checking. | -| 0.1.2 | 2021-07-03 | [#3327](https://github.com/airbytehq/airbyte/pull/3327) | Fixed LSEP unicode characters. | -| 0.1.1 | 2021-07-03 | [#3289](https://github.com/airbytehq/airbyte/pull/3289) | Added support for outputting messages. | -| 0.1.0 | 2021-05-06 | [#3242](https://github.com/airbytehq/airbyte/pull/3242) | Added MySQL destination. | +| :--- | :--- | :--- | :--- | +| 0.1.13 | 2021-09-28 | [\#6506](https://github.com/airbytehq/airbyte/pull/6506) | Added support for MySQL destination via TLS/SSL | +| 0.1.12 | 2021-09-24 | [\#6317](https://github.com/airbytehq/airbyte/pull/6317) | Added option to connect to DB via SSH | +| 0.1.11 | 2021-07-30 | [\#5125](https://github.com/airbytehq/airbyte/pull/5125) | Enable `additionalPropertities` in spec.json | +| 0.1.10 | 2021-07-28 | [\#5026](https://github.com/airbytehq/airbyte/pull/5026) | Add sanitized json fields in raw tables to handle quotes in column names | +| 0.1.7 | 2021-07-09 | [\#4651](https://github.com/airbytehq/airbyte/pull/4651) | Switch normalization flag on so users can use normalization. | +| 0.1.6 | 2021-07-03 | [\#4531](https://github.com/airbytehq/airbyte/pull/4531) | Added normalization for MySQL. 
| +| 0.1.5 | 2021-07-03 | [\#3973](https://github.com/airbytehq/airbyte/pull/3973) | Added `AIRBYTE_ENTRYPOINT` for kubernetes support. | +| 0.1.4 | 2021-07-03 | [\#3290](https://github.com/airbytehq/airbyte/pull/3290) | Switched to get states from destination instead of source. | +| 0.1.3 | 2021-07-03 | [\#3387](https://github.com/airbytehq/airbyte/pull/3387) | Fixed a bug for message length checking. | +| 0.1.2 | 2021-07-03 | [\#3327](https://github.com/airbytehq/airbyte/pull/3327) | Fixed LSEP unicode characters. | +| 0.1.1 | 2021-07-03 | [\#3289](https://github.com/airbytehq/airbyte/pull/3289) | Added support for outputting messages. | +| 0.1.0 | 2021-05-06 | [\#3242](https://github.com/airbytehq/airbyte/pull/3242) | Added MySQL destination. | + diff --git a/docs/integrations/destinations/oracle.md b/docs/integrations/destinations/oracle.md index 99e04a7d9bd..5d7039c969e 100644 --- a/docs/integrations/destinations/oracle.md +++ b/docs/integrations/destinations/oracle.md @@ -1,4 +1,4 @@ -# Oracle +# Oracle DB ## Features @@ -19,12 +19,13 @@ By default, each stream will be output into its own table in Oracle. Each table * `_AIRBYTE_EMITTED_AT`: a timestamp representing when the event was pulled from the data source. The column type in Oracle is `TIMESTAMP WITH TIME ZONE`. * `_AIRBYTE_DATA`: a json blob representing with the event data. The column type in Oracles is `NCLOB`. -Enabling normalization will also create normalized, strongly typed tables. +Enabling normalization will also create normalized, strongly typed tables. + +## Getting Started \(Airbyte Cloud\) -## Getting Started (Airbyte Cloud) The Oracle connector is currently in Alpha on Airbyte Cloud. Only TLS encrypted connections to your DB can be made from Airbyte Cloud. Other than that, follow the open-source instructions below. -## Getting Started (Airbyte Open-Source) +## Getting Started \(Airbyte Open-Source\) #### Requirements @@ -43,10 +44,10 @@ As Airbyte namespaces allows us to store data into different schemas, we have di | Login user | Destination user | Required permissions | Comment | | :--- | :--- | :--- | :--- | -| DBA User | Any user | - | | -| Regular user | Same user as login | Create, drop and write table, create session | | +| DBA User | Any user | - | | +| Regular user | Same user as login | Create, drop and write table, create session | | | Regular user | Any existing user | Create, drop and write ANY table, create session | Grants can be provided on a system level by DBA or by target user directly | -| Regular user | Not existing user | Create, drop and write ANY table, create user, create session | Grants should be provided on a system level by DBA | +| Regular user | Not existing user | Create, drop and write ANY table, create user, create session | Grants should be provided on a system level by DBA | We highly recommend creating an Airbyte-specific user for this purpose. @@ -59,33 +60,34 @@ You should now have all the requirements needed to configure Oracle as a destina * **Username** * **Password** * **Database** -* -## Connection via SSH Tunnel +* **Connection via SSH Tunnel** -Airbyte has the ability to connect to a Oracle instance via an SSH Tunnel. The reason you might want to do this because it is not possible (or against security policy) to connect to the database directly (e.g. it does not have a public IP address). +Airbyte has the ability to connect to a Oracle instance via an SSH Tunnel. 
The reason you might want to do this is that it is not possible \(or against security policy\) to connect to the database directly \(e.g. it does not have a public IP address\). -When using an SSH tunnel, you are configuring Airbyte to connect to an intermediate server (a.k.a. a bastion sever) that _does_ have direct access to the database. Airbyte connects to the bastion and then asks the bastion to connect directly to the server. +When using an SSH tunnel, you are configuring Airbyte to connect to an intermediate server \(a.k.a. a bastion server\) that _does_ have direct access to the database. Airbyte connects to the bastion and then asks the bastion to connect directly to the server. Using this feature requires additional configuration, when creating the source. We will talk through what each piece of configuration means. 1. Configure all fields for the source as you normally would, except `SSH Tunnel Method`. -2. `SSH Tunnel Method` defaults to `No Tunnel` (meaning a direct connection). If you want to use an SSH Tunnel choose `SSH Key Authentication` or `Password Authentication`. - 1. Choose `Key Authentication` if you will be using an RSA private key as your secret for establishing the SSH Tunnel (see below for more information on generating this key). - 2. Choose `Password Authentication` if you will be using a password as your secret for establishing the SSH Tunnel. -3. `SSH Tunnel Jump Server Host` refers to the intermediate (bastion) server that Airbyte will connect to. This should be a hostname or an IP Address. +2. `SSH Tunnel Method` defaults to `No Tunnel` \(meaning a direct connection\). If you want to use an SSH Tunnel choose `SSH Key Authentication` or `Password Authentication`. + 1. Choose `Key Authentication` if you will be using an RSA private key as your secret for establishing the SSH Tunnel \(see below for more information on generating this key\). + 2. Choose `Password Authentication` if you will be using a password as your secret for establishing the SSH Tunnel. +3. `SSH Tunnel Jump Server Host` refers to the intermediate \(bastion\) server that Airbyte will connect to. This should be a hostname or an IP Address. 4. `SSH Connection Port` is the port on the bastion server with which to make the SSH connection. The default port for SSH connections is `22`, so unless you have explicitly changed something, go with the default. 5. `SSH Login Username` is the username that Airbyte should use when connecting to the bastion server. This is NOT the Oracle username. 6. If you are using `Password Authentication`, then `SSH Login Username` should be set to the password of the User from the previous step. If you are using `SSH Key Authentication` leave this blank. Again, this is not the Oracle password, but the password for the OS-user that Airbyte is using to perform commands on the bastion. 7. If you are using `SSH Key Authentication`, then `SSH Private Key` should be set to the RSA Private Key that you are using to create the SSH connection. This should be the full contents of the key file starting with `-----BEGIN RSA PRIVATE KEY-----` and ending with `-----END RSA PRIVATE KEY-----`.
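Putting the steps above together, an Oracle destination configuration that tunnels through a bastion with key authentication might look roughly like the sketch below. The JSON property names (for example `tunnel_method`, `tunnel_host`) are assumptions for illustration; the UI field labels above and the connector specification are authoritative.

```javascript
{
  "host": "oracle.internal.example.com",
  "port": 1521,                                  // default Oracle listener port
  "sid": "ORCL",                                 // the changelog notes `sid` is required instead of `database`
  "username": "airbyte_user",
  "password": "<password>",
  "tunnel_method": {                             // hypothetical grouping of the SSH fields described above
    "tunnel_method": "SSH_KEY_AUTH",             // SSH Tunnel Method: SSH Key Authentication
    "tunnel_host": "bastion.example.com",        // SSH Tunnel Jump Server Host
    "tunnel_port": 22,                           // SSH Connection Port
    "tunnel_user": "airbyte",                    // SSH Login Username (the OS user on the bastion)
    "ssh_key": "-----BEGIN RSA PRIVATE KEY-----\n...\n-----END RSA PRIVATE KEY-----"
  }
}
```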
## Changelog + | Version | Date | Pull Request | Subject | -| :--- | :--- | :--- | :--- | -| 0.1.9 | 2021-10-06 | [#6611](https://github.com/airbytehq/airbyte/pull/6611)| 🐛 Destination Oracle: maxStringLength should be 128| -| 0.1.8 | 2021-09-28 | [#6370](https://github.com/airbytehq/airbyte/pull/6370)| Add SSH Support for Oracle Destination | -| 0.1.7 | 2021-08-30 | [#5746](https://github.com/airbytehq/airbyte/pull/5746) | Use default column name for raw tables | -| 0.1.6 | 2021-08-23 | [#5542](https://github.com/airbytehq/airbyte/pull/5542) | Remove support for Oracle 11g to allow normalization | -| 0.1.5 | 2021-08-10 | [#5307](https://github.com/airbytehq/airbyte/pull/5307) | 🐛 Destination Oracle: Fix destination check for users without dba role | -| 0.1.4 | 2021-07-30 | [#5125](https://github.com/airbytehq/airbyte/pull/5125) | Enable `additionalPropertities` in spec.json | -| 0.1.3 | 2021-07-21 | [#3555](https://github.com/airbytehq/airbyte/pull/3555) | Partial Success in BufferedStreamConsumer | +| :--- | :--- | :--- | :--- | +| 0.1.9 | 2021-10-06 | [\#6611](https://github.com/airbytehq/airbyte/pull/6611) | 🐛 Destination Oracle: maxStringLength should be 128 | +| 0.1.8 | 2021-09-28 | [\#6370](https://github.com/airbytehq/airbyte/pull/6370) | Add SSH Support for Oracle Destination | +| 0.1.7 | 2021-08-30 | [\#5746](https://github.com/airbytehq/airbyte/pull/5746) | Use default column name for raw tables | +| 0.1.6 | 2021-08-23 | [\#5542](https://github.com/airbytehq/airbyte/pull/5542) | Remove support for Oracle 11g to allow normalization | +| 0.1.5 | 2021-08-10 | [\#5307](https://github.com/airbytehq/airbyte/pull/5307) | 🐛 Destination Oracle: Fix destination check for users without dba role | +| 0.1.4 | 2021-07-30 | [\#5125](https://github.com/airbytehq/airbyte/pull/5125) | Enable `additionalPropertities` in spec.json | +| 0.1.3 | 2021-07-21 | [\#3555](https://github.com/airbytehq/airbyte/pull/3555) | Partial Success in BufferedStreamConsumer | | 0.1.2 | 2021-07-20 | [4874](https://github.com/airbytehq/airbyte/pull/4874) | Require `sid` instead of `database` in connector specification | + diff --git a/docs/integrations/destinations/postgres.md b/docs/integrations/destinations/postgres.md index 8fca4a44241..71ab0314223 100644 --- a/docs/integrations/destinations/postgres.md +++ b/docs/integrations/destinations/postgres.md @@ -21,10 +21,11 @@ Each stream will be output into its own table in Postgres. Each table will conta * `_airbyte_emitted_at`: a timestamp representing when the event was pulled from the data source. The column type in Postgres is `TIMESTAMP WITH TIME ZONE`. * `_airbyte_data`: a json blob representing with the event data. The column type in Postgres is `JSONB`. -## Getting Started (Airbyte Cloud) +## Getting Started \(Airbyte Cloud\) + Airbyte Cloud only supports connecting to your Postgres instance with SSL or TLS encryption. TLS is used by default. Other than that, you can proceed with the open-source instructions below. -## Getting Started (Airbyte Open-Source) +## Getting Started \(Airbyte Open-Source\) #### Requirements @@ -45,6 +46,7 @@ You need a Postgres user that can create tables and write rows. We highly recomm You will need to choose an existing database or create a new database that will be used to store synced data from Airbyte. ### Setup the Postgres Destination in Airbyte + You should now have all the requirements needed to configure Postgres as a destination in the UI. 
You'll need the following information to configure the Postgres destination: * **Host** @@ -72,7 +74,9 @@ From [Postgres SQL Identifiers syntax](https://www.postgresql.org/docs/9.0/sql-s Therefore, Airbyte Postgres destination will create tables and schemas using the Unquoted identifiers when possible or fallback to Quoted Identifiers if the names are containing special characters. ## Changelog + | Version | Date | Pull Request | Subject | -| :--- | :--- | :--- | :--- | -| 0.3.10 | 2021-08-11 | [#5336](https://github.com/airbytehq/airbyte/pull/5336) | 🐛 Destination Postgres: fix \u0000(NULL) value processing | -| 0.3.11 | 2021-09-07 | [#5743](https://github.com/airbytehq/airbyte/pull/5743) | Add SSH Tunnel support | +| :--- | :--- | :--- | :--- | +| 0.3.10 | 2021-08-11 | [\#5336](https://github.com/airbytehq/airbyte/pull/5336) | 🐛 Destination Postgres: fix \u0000\(NULL\) value processing | +| 0.3.11 | 2021-09-07 | [\#5743](https://github.com/airbytehq/airbyte/pull/5743) | Add SSH Tunnel support | + diff --git a/docs/integrations/destinations/pubsub.md b/docs/integrations/destinations/pubsub.md index c27801533f2..60feec7bed2 100644 --- a/docs/integrations/destinations/pubsub.md +++ b/docs/integrations/destinations/pubsub.md @@ -1,5 +1,7 @@ --- -description: 'Pub/Sub is an asynchronous messaging service provided by Google Cloud Provider.' +description: >- + Pub/Sub is an asynchronous messaging service provided by Google Cloud + Provider. --- # Google PubSub @@ -9,7 +11,7 @@ description: 'Pub/Sub is an asynchronous messaging service provided by Google Cl The Airbyte Google PubSub destination allows you to send/stream data into PubSub. Pub/Sub is an asynchronous messaging service provided by Google Cloud Provider. ### Sync overview - + #### Output schema Each stream will be output a PubSubMessage with attributes. The message attributes will be @@ -17,9 +19,10 @@ Each stream will be output a PubSubMessage with attributes. The message attribut * `_stream`: the name of stream where the data is coming from * `_namespace`: namespace if available from the stream -The data will be a serialized JSON, containing the following fields +The data will be a serialized JSON, containing the following fields + * `_airbyte_ab_id`: a uuid string assigned by Airbyte to each event that is processed. -* `_airbyte_emitted_at`: a long timestamp(ms) representing when the event was pulled from the data source. +* `_airbyte_emitted_at`: a long timestamp\(ms\) representing when the event was pulled from the data source. * `_airbyte_data`: a json string representing source data. #### Features @@ -50,8 +53,7 @@ See the setup guide for more information about how to create the required resour If you have a Google Cloud Project with PubSub enabled, skip to the "Create a Topic" section. -First, follow along the Google Cloud instructions to [Create a Project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#before_you_begin). -PubSub is enabled automatically in new projects. If this is not the case for your project, find it in [Marketplace](https://console.cloud.google.com/marketplace/product/google/pubsub.googleapis.com) and enable. +First, follow along the Google Cloud instructions to [Create a Project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#before_you_begin). PubSub is enabled automatically in new projects. If this is not the case for your project, find it in [Marketplace](https://console.cloud.google.com/marketplace/product/google/pubsub.googleapis.com) and enable. 
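To visualize the output format described above, a single published message can be sketched as follows (placeholder values; the `data` payload is shown un-serialized for readability):

```javascript
{
  "attributes": {
    "_stream": "users",        // name of the stream the record came from
    "_namespace": "public"     // namespace, when the stream provides one
  },
  "data": {
    "_airbyte_ab_id": "4f6c9a2e-0000-0000-0000-000000000000",
    "_airbyte_emitted_at": 1633046400000,                        // long timestamp in milliseconds
    "_airbyte_data": { "id": 42, "name": "example row" }
  }
}
```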
#### PubSub topic for Airbyte syncs @@ -86,6 +88,7 @@ Once you've configured PubSub as a destination, delete the Service Account Key f ## CHANGELOG | Version | Date | Pull Request | Subject | -| :--- | :--- | :--- | :--- | -| 0.1.1 | August 13, 2021 | [#4699](https://github.com/airbytehq/airbyte/pull/4699)| Added json config validator | -| 0.1.0 | June 24, 2021 | [#4339](https://github.com/airbytehq/airbyte/pull/4339)| Initial release | +| :--- | :--- | :--- | :--- | +| 0.1.1 | August 13, 2021 | [\#4699](https://github.com/airbytehq/airbyte/pull/4699) | Added json config validator | +| 0.1.0 | June 24, 2021 | [\#4339](https://github.com/airbytehq/airbyte/pull/4339) | Initial release | + diff --git a/docs/integrations/destinations/redshift.md b/docs/integrations/destinations/redshift.md index e4cdf389a01..42be929b638 100644 --- a/docs/integrations/destinations/redshift.md +++ b/docs/integrations/destinations/redshift.md @@ -4,7 +4,7 @@ The Airbyte Redshift destination allows you to sync data to Redshift. -This Redshift destination connector has two replication strategies: +This Redshift destination connector has two replication strategies: 1. INSERT: Replicates data via SQL INSERT queries. This is built on top of the destination-jdbc code base and is configured to rely on JDBC 4.2 standard drivers provided by Amazon via Mulesoft [here](https://mvnrepository.com/artifact/com.amazon.redshift/redshift-jdbc42) as described in Redshift documentation [here](https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-install.html). **Not recommended for production workloads as this does not scale well**. 2. COPY: Replicates data by first uploading data to an S3 bucket and issuing a COPY command. This is the recommended loading approach described by Redshift [best practices](https://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html). Requires an S3 bucket and credentials. @@ -45,7 +45,7 @@ You will need to choose an existing database or create a new database that will 3. A staging S3 bucket with credentials \(for the COPY strategy\). {% hint style="info" %} -Even if your Airbyte instance is running on a server in the same VPC as your Redshift cluster, you may need to place them in the **same security group** to allow connections between the two. +Even if your Airbyte instance is running on a server in the same VPC as your Redshift cluster, you may need to place them in the **same security group** to allow connections between the two. 
{% endhint %} ### Setup guide @@ -109,9 +109,10 @@ See [docs](https://docs.aws.amazon.com/redshift/latest/dg/r_Character_types.html ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.3.14 | 2021-10-08 | [5924](https://github.com/airbytehq/airbyte/pull/5924) | Fixed AWS S3 Staging COPY is writing records from different table in the same raw table | -| 0.3.13 | 2021-09-02 | [5745](https://github.com/airbytehq/airbyte/pull/5745) | Disable STATUPDATE flag when using S3 staging to speed up performance | -| 0.3.12 | 2021-07-21 | [3555](https://github.com/airbytehq/airbyte/pull/3555) | Enable partial checkpointing for halfway syncs | -| 0.3.11 | 2021-07-20 | [4874](https://github.com/airbytehq/airbyte/pull/4874) | allow `additionalProperties` in connector spec | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.3.14 | 2021-10-08 | [5924](https://github.com/airbytehq/airbyte/pull/5924) | Fixed AWS S3 Staging COPY is writing records from different table in the same raw table | +| 0.3.13 | 2021-09-02 | [5745](https://github.com/airbytehq/airbyte/pull/5745) | Disable STATUPDATE flag when using S3 staging to speed up performance | +| 0.3.12 | 2021-07-21 | [3555](https://github.com/airbytehq/airbyte/pull/3555) | Enable partial checkpointing for halfway syncs | +| 0.3.11 | 2021-07-20 | [4874](https://github.com/airbytehq/airbyte/pull/4874) | allow `additionalProperties` in connector spec | + diff --git a/docs/integrations/destinations/s3.md b/docs/integrations/destinations/s3.md index 90cd4ad5a79..ab811c48bf0 100644 --- a/docs/integrations/destinations/s3.md +++ b/docs/integrations/destinations/s3.md @@ -5,7 +5,7 @@ | Feature | Support | Notes | | :--- | :---: | :--- | | Full Refresh Sync | ✅ | Warning: this mode deletes all previously synced data in the configured bucket path. | -| Incremental - Append Sync | ✅ | | +| Incremental - Append Sync | ✅ | | | Incremental - Deduped History | ❌ | As this connector does not support dbt, we don't support this sync mode on this destination. | | Namespaces | ❌ | Setting a specific bucket path is equivalent to having separate namespaces. | @@ -26,19 +26,19 @@ Check out common troubleshooting issues for the S3 destination connector on our | Access Key ID | string | AWS/Minio credential. | | Secret Access Key | string | AWS/Minio credential. | | Format | object | Format specific configuration. See below for details. | -| Part Size | integer | Arg to configure a block size. Max allowed blocks by S3 = 10,000, i.e. max stream size = blockSize * 10,000 blocks. | +| Part Size | integer | Arg to configure a block size. Max allowed blocks by S3 = 10,000, i.e. max stream size = blockSize \* 10,000 blocks. | ⚠️ Please note that under "Full Refresh Sync" mode, data in the configured bucket and path will be wiped out before each sync. We recommend you to provision a dedicated S3 resource for this sync to prevent unexpected data deletion from misconfiguration. ⚠️ The full path of the output data is: -``` +```text ///--. ``` For example: -``` +```text testing_bucket/data_output_path/public/users/2021_01_01_1609541171643_0.csv ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ | | | | | | | format extension @@ -53,10 +53,7 @@ bucket name Please note that the stream name may contain a prefix, if it is configured on the connection. -The rationales behind this naming pattern are: -1. Each stream has its own directory. -2. The data output files can be sorted by upload time. -3. 
The upload time composes of a date part and millis part so that it is both readable and unique. +The rationales behind this naming pattern are: 1. Each stream has its own directory. 2. The data output files can be sorted by upload time. 3. The upload time composes of a date part and millis part so that it is both readable and unique. Currently, each data sync will only create one file per stream. In the future, the output file can be partitioned by size. Each partition is identifiable by the partition ID, which is always 0 for now. @@ -64,39 +61,39 @@ Currently, each data sync will only create one file per stream. In the future, t Each stream will be outputted to its dedicated directory according to the configuration. The complete datastore of each stream includes all the output files under that directory. You can think of the directory as equivalent of a Table in the database world. -- Under Full Refresh Sync mode, old output files will be purged before new files are created. -- Under Incremental - Append Sync mode, new output files will be added that only contain the new data. +* Under Full Refresh Sync mode, old output files will be purged before new files are created. +* Under Incremental - Append Sync mode, new output files will be added that only contain the new data. ### Avro -[Apache Avro](https://avro.apache.org/) serializes data in a compact binary format. Currently, the Airbyte S3 Avro connector always uses the [binary encoding](http://avro.apache.org/docs/current/spec.html#binary_encoding), and assumes that all data records follow the same schema. +[Apache Avro](https://avro.apache.org/) serializes data in a compact binary format. Currently, the Airbyte S3 Avro connector always uses the [binary encoding](http://avro.apache.org/docs/current/spec.html#binary_encoding), and assumes that all data records follow the same schema. #### Configuration Here is the available compression codecs: -- No compression -- `deflate` - - Compression level - - Range `[0, 9]`. Default to 0. - - Level 0: no compression & fastest. - - Level 9: best compression & slowest. -- `bzip2` -- `xz` - - Compression level - - Range `[0, 9]`. Default to 6. - - Level 0-3 are fast with medium compression. - - Level 4-6 are fairly slow with high compression. - - Level 7-9 are like level 6 but use bigger dictionaries and have higher memory requirements. Unless the uncompressed size of the file exceeds 8 MiB, 16 MiB, or 32 MiB, it is waste of memory to use the presets 7, 8, or 9, respectively. -- `zstandard` - - Compression level - - Range `[-5, 22]`. Default to 3. - - Negative levels are 'fast' modes akin to `lz4` or `snappy`. - - Levels above 9 are generally for archival purposes. - - Levels above 18 use a lot of memory. - - Include checksum - - If set to `true`, a checksum will be included in each data block. -- `snappy` +* No compression +* `deflate` + * Compression level + * Range `[0, 9]`. Default to 0. + * Level 0: no compression & fastest. + * Level 9: best compression & slowest. +* `bzip2` +* `xz` + * Compression level + * Range `[0, 9]`. Default to 6. + * Level 0-3 are fast with medium compression. + * Level 4-6 are fairly slow with high compression. + * Level 7-9 are like level 6 but use bigger dictionaries and have higher memory requirements. Unless the uncompressed size of the file exceeds 8 MiB, 16 MiB, or 32 MiB, it is waste of memory to use the presets 7, 8, or 9, respectively. +* `zstandard` + * Compression level + * Range `[-5, 22]`. Default to 3. 
+ * Negative levels are 'fast' modes akin to `lz4` or `snappy`. + * Levels above 9 are generally for archival purposes. + * Levels above 18 use a lot of memory. + * Include checksum + * If set to `true`, a checksum will be included in each data block. +* `snappy` #### Data schema @@ -104,43 +101,44 @@ Under the hood, an Airbyte data stream in Json schema is converted to an Avro sc 1. Json schema types are mapped to Avro types as follows: - | Json Data Type | Avro Data Type | - | :---: | :---: | - | string | string | - | number | double | - | integer | int | - | boolean | boolean | - | null | null | - | object | record | - | array | array | + | Json Data Type | Avro Data Type | + | :---: | :---: | + | string | string | + | number | double | + | integer | int | + | boolean | boolean | + | null | null | + | object | record | + | array | array | 2. Built-in Json schema formats are not mapped to Avro logical types at this moment. -2. Combined restrictions ("allOf", "anyOf", and "oneOf") will be converted to type unions. The corresponding Avro schema can be less stringent. For example, the following Json schema +3. Combined restrictions \("allOf", "anyOf", and "oneOf"\) will be converted to type unions. The corresponding Avro schema can be less stringent. For example, the following Json schema - ```json - { + ```javascript + { "oneOf": [ { "type": "string" }, { "type": "integer" } ] - } - ``` - will become this in Avro schema: + } + ``` - ```json - { + will become this in Avro schema: + + ```javascript + { "type": ["null", "string", "int"] - } - ``` + } + ``` -2. Keyword `not` is not supported, as there is no equivalent validation mechanism in Avro schema. -3. Only alphanumeric characters and underscores (`/a-zA-Z0-9_/`) are allowed in a stream or field name. Any special character will be converted to an alphabet or underscore. For example, `spécial:character_names` will become `special_character_names`. The original names will be stored in the `doc` property in this format: `_airbyte_original_name:`. -4. The field name cannot start with a number, so an underscore will be added to the field name at the beginning. -5. All field will be nullable. For example, a `string` Json field will be typed as `["null", "string"]` in Avro. This is necessary because the incoming data stream may have optional fields. -6. For array fields in Json schema, when the `items` property is an array, it means that each element in the array should follow its own schema sequentially. For example, the following specification means the first item in the array should be a string, and the second a number. +4. Keyword `not` is not supported, as there is no equivalent validation mechanism in Avro schema. +5. Only alphanumeric characters and underscores \(`/a-zA-Z0-9_/`\) are allowed in a stream or field name. Any special character will be converted to an alphabet or underscore. For example, `spécial:character_names` will become `special_character_names`. The original names will be stored in the `doc` property in this format: `_airbyte_original_name:`. +6. The field name cannot start with a number, so an underscore will be added to the field name at the beginning. +7. All field will be nullable. For example, a `string` Json field will be typed as `["null", "string"]` in Avro. This is necessary because the incoming data stream may have optional fields. +8. For array fields in Json schema, when the `items` property is an array, it means that each element in the array should follow its own schema sequentially. 
For example, the following specification means the first item in the array should be a string, and the second a number. - ```json - { + ```javascript + { "array_field": { "type": "array", "items": [ @@ -148,13 +146,13 @@ Under the hood, an Airbyte data stream in Json schema is converted to an Avro sc { "type": "number" } ] } - } - ``` + } + ``` - This is not supported in Avro schema. As a compromise, the converter creates a union, ["string", "number"], which is less stringent: + This is not supported in Avro schema. As a compromise, the converter creates a union, \["string", "number"\], which is less stringent: - ```json - { + ```javascript + { "name": "array_field", "type": [ "null", @@ -164,21 +162,21 @@ Under the hood, an Airbyte data stream in Json schema is converted to an Avro sc } ], "default": null - } - ``` + } + ``` -6. Two Airbyte specific fields will be added to each Avro record: +9. Two Airbyte specific fields will be added to each Avro record: - | Field | Schema | Document | - | :--- | :--- | :---: | - | `_airbyte_ab_id` | `uuid` | [link](http://avro.apache.org/docs/current/spec.html#UUID) - | `_airbyte_emitted_at` | `timestamp-millis` | [link](http://avro.apache.org/docs/current/spec.html#Timestamp+%28millisecond+precision%29) | + | Field | Schema | Document | + | :--- | :--- | :---: | + | `_airbyte_ab_id` | `uuid` | [link](http://avro.apache.org/docs/current/spec.html#UUID) | + | `_airbyte_emitted_at` | `timestamp-millis` | [link](http://avro.apache.org/docs/current/spec.html#Timestamp+%28millisecond+precision%29) | -7. Currently `additionalProperties` is not supported. This means if the source is schemaless (e.g. Mongo), or has flexible fields, they will be ignored. We will have a solution soon. Feel free to submit a new issue if this is blocking for you. +10. Currently `additionalProperties` is not supported. This means if the source is schemaless \(e.g. Mongo\), or has flexible fields, they will be ignored. We will have a solution soon. Feel free to submit a new issue if this is blocking for you. For example, given the following Json schema: -```json +```javascript { "type": "object", "$schema": "http://json-schema.org/draft-07/schema#", @@ -207,7 +205,7 @@ For example, given the following Json schema: Its corresponding Avro schema will be: -```json +```javascript { "name" : "stream_name", "type" : "record", @@ -254,18 +252,18 @@ Its corresponding Avro schema will be: ### CSV -Like most of the other Airbyte destination connectors, usually the output has three columns: a UUID, an emission timestamp, and the data blob. With the CSV output, it is possible to normalize (flatten) the data blob to multiple columns. +Like most of the other Airbyte destination connectors, usually the output has three columns: a UUID, an emission timestamp, and the data blob. With the CSV output, it is possible to normalize \(flatten\) the data blob to multiple columns. | Column | Condition | Description | | :--- | :--- | :--- | | `_airbyte_ab_id` | Always exists | A uuid assigned by Airbyte to each processed record. | | `_airbyte_emitted_at` | Always exists. | A timestamp representing when the event was pulled from the data source. | -| `_airbyte_data` | When no normalization (flattening) is needed, all data reside under this column as a json blob. | -| root level fields | When root level normalization (flattening) is selected, the root level fields are expanded. | +| `_airbyte_data` | When no normalization \(flattening\) is needed, all data reside under this column as a json blob. 
| | +| root level fields | When root level normalization \(flattening\) is selected, the root level fields are expanded. | | For example, given the following json object from a source: -```json +```javascript { "user_id": 123, "name": { @@ -287,11 +285,11 @@ With root level normalization, the output CSV is: | :--- | :--- | :--- | :--- | | `26d73cde-7eb1-4e1e-b7db-a4c03b4cf206` | 1622135805000 | 123 | `{ "first": "John", "last": "Doe" }` | -### JSON Lines (JSONL) +### JSON Lines \(JSONL\) [Json Lines](https://jsonlines.org/) is a text format with one JSON per line. Each line has a structure as follows: -```json +```javascript { "_airbyte_ab_id": "", "_airbyte_emitted_at": "", @@ -301,7 +299,7 @@ With root level normalization, the output CSV is: For example, given the following two json objects from a source: -```json +```javascript [ { "user_id": 123, @@ -322,7 +320,7 @@ For example, given the following two json objects from a source: They will be like this in the output file: -```jsonl +```text { "_airbyte_ab_id": "26d73cde-7eb1-4e1e-b7db-a4c03b4cf206", "_airbyte_emitted_at": "1622135805000", "_airbyte_data": { "user_id": 123, "name": { "first": "John", "last": "Doe" } } } { "_airbyte_ab_id": "0a61de1b-9cdd-4455-a739-93572c9a5f20", "_airbyte_emitted_at": "1631948170000", "_airbyte_data": { "user_id": 456, "name": { "first": "Jane", "last": "Roe" } } } ``` @@ -336,19 +334,19 @@ The following configuration is available to configure the Parquet output: | Parameter | Type | Default | Description | | :--- | :---: | :---: | :--- | | `compression_codec` | enum | `UNCOMPRESSED` | **Compression algorithm**. Available candidates are: `UNCOMPRESSED`, `SNAPPY`, `GZIP`, `LZO`, `BROTLI`, `LZ4`, and `ZSTD`. | -| `block_size_mb` | integer | 128 (MB) | **Block size (row group size)** in MB. This is the size of a row group being buffered in memory. It limits the memory usage when writing. Larger values will improve the IO when reading, but consume more memory when writing. | -| `max_padding_size_mb` | integer | 8 (MB) | **Max padding size** in MB. This is the maximum size allowed as padding to align row groups. This is also the minimum size of a row group. | -| `page_size_kb` | integer | 1024 (KB) | **Page size** in KB. The page size is for compression. A block is composed of pages. A page is the smallest unit that must be read fully to access a single record. If this value is too small, the compression will deteriorate. | -| `dictionary_page_size_kb` | integer | 1024 (KB) | **Dictionary Page Size** in KB. There is one dictionary page per column per row group when dictionary encoding is used. The dictionary page size works like the page size but for dictionary. | +| `block_size_mb` | integer | 128 \(MB\) | **Block size \(row group size\)** in MB. This is the size of a row group being buffered in memory. It limits the memory usage when writing. Larger values will improve the IO when reading, but consume more memory when writing. | +| `max_padding_size_mb` | integer | 8 \(MB\) | **Max padding size** in MB. This is the maximum size allowed as padding to align row groups. This is also the minimum size of a row group. | +| `page_size_kb` | integer | 1024 \(KB\) | **Page size** in KB. The page size is for compression. A block is composed of pages. A page is the smallest unit that must be read fully to access a single record. If this value is too small, the compression will deteriorate. | +| `dictionary_page_size_kb` | integer | 1024 \(KB\) | **Dictionary Page Size** in KB. 
There is one dictionary page per column per row group when dictionary encoding is used. The dictionary page size works like the page size but for dictionary. | | `dictionary_encoding` | boolean | `true` | **Dictionary encoding**. This parameter controls whether dictionary encoding is turned on. | -These parameters are related to the `ParquetOutputFormat`. See the [Java doc](https://www.javadoc.io/doc/org.apache.parquet/parquet-hadoop/1.12.0/org/apache/parquet/hadoop/ParquetOutputFormat.html) for more details. Also see [Parquet documentation](https://parquet.apache.org/documentation/latest/#configurations) for their recommended configurations (512 - 1024 MB block size, 8 KB page size). +These parameters are related to the `ParquetOutputFormat`. See the [Java doc](https://www.javadoc.io/doc/org.apache.parquet/parquet-hadoop/1.12.0/org/apache/parquet/hadoop/ParquetOutputFormat.html) for more details. Also see [Parquet documentation](https://parquet.apache.org/documentation/latest/#configurations) for their recommended configurations \(512 - 1024 MB block size, 8 KB page size\). #### Data schema -Under the hood, an Airbyte data stream in Json schema is first converted to an Avro schema, then the Json object is converted to an Avro record, and finally the Avro record is outputted to the Parquet format. See the `Data schema` section from the [Avro output](#avro) for rules and limitations. +Under the hood, an Airbyte data stream in Json schema is first converted to an Avro schema, then the Json object is converted to an Avro record, and finally the Avro record is outputted to the Parquet format. See the `Data schema` section from the [Avro output](s3.md#avro) for rules and limitations. -## Getting Started (Airbyte Open-Source / Airbyte Cloud) +## Getting Started \(Airbyte Open-Source / Airbyte Cloud\) #### Requirements @@ -356,6 +354,7 @@ Under the hood, an Airbyte data stream in Json schema is first converted to an A 2. An S3 bucket with credentials. #### Setup Guide + * Fill up S3 info * **S3 Endpoint** * Leave empty if using AWS S3, fill in S3 URL if using Minio S3. @@ -375,17 +374,18 @@ Under the hood, an Airbyte data stream in Json schema is first converted to an A ## CHANGELOG | Version | Date | Pull Request | Subject | -| :--- | :--- | :--- | :--- | -| 0.1.12 | 2021-09-13 | [#5720](https://github.com/airbytehq/airbyte/issues/5720) | Added configurable block size for stream. Each stream is limited to 10,000 by S3 | -| 0.1.11 | 2021-09-10 | [#5729](https://github.com/airbytehq/airbyte/pull/5729) | For field names that start with a digit, a `_` will be appended at the beginning for the` Parquet` and `Avro` formats. | -| 0.1.10 | 2021-08-17 | [#4699](https://github.com/airbytehq/airbyte/pull/4699) | Added json config validator | -| 0.1.9 | 2021-07-12 | [#4666](https://github.com/airbytehq/airbyte/pull/4666) | Fix MinIO output for Parquet format. | -| 0.1.8 | 2021-07-07 | [#4613](https://github.com/airbytehq/airbyte/pull/4613) | Patched schema converter to support combined restrictions. | -| 0.1.7 | 2021-06-23 | [#4227](https://github.com/airbytehq/airbyte/pull/4227) | Added Avro and JSONL output. | -| 0.1.6 | 2021-06-16 | [#4130](https://github.com/airbytehq/airbyte/pull/4130) | Patched the check to verify prefix access instead of full-bucket access. | -| 0.1.5 | 2021-06-14 | [#3908](https://github.com/airbytehq/airbyte/pull/3908) | Fixed default `max_padding_size_mb` in `spec.json`. | -| 0.1.4 | 2021-06-14 | [#3908](https://github.com/airbytehq/airbyte/pull/3908) | Added Parquet output. 
| -| 0.1.3 | 2021-06-13 | [#4038](https://github.com/airbytehq/airbyte/pull/4038) | Added support for alternative S3. | -| 0.1.2 | 2021-06-10 | [#4029](https://github.com/airbytehq/airbyte/pull/4029) | Fixed `_airbyte_emitted_at` field to be a UTC instead of local timestamp for consistency. | -| 0.1.1 | 2021-06-09 | [#3973](https://github.com/airbytehq/airbyte/pull/3973) | Added `AIRBYTE_ENTRYPOINT` in base Docker image for Kubernetes support. | -| 0.1.0 | 2021-06-03 | [#3672](https://github.com/airbytehq/airbyte/pull/3672) | Initial release with CSV output. | +| :--- | :--- | :--- | :--- | +| 0.1.12 | 2021-09-13 | [\#5720](https://github.com/airbytehq/airbyte/issues/5720) | Added configurable block size for stream. Each stream is limited to 10,000 by S3 | +| 0.1.11 | 2021-09-10 | [\#5729](https://github.com/airbytehq/airbyte/pull/5729) | For field names that start with a digit, a `_` will be appended at the beginning for the`Parquet` and `Avro` formats. | +| 0.1.10 | 2021-08-17 | [\#4699](https://github.com/airbytehq/airbyte/pull/4699) | Added json config validator | +| 0.1.9 | 2021-07-12 | [\#4666](https://github.com/airbytehq/airbyte/pull/4666) | Fix MinIO output for Parquet format. | +| 0.1.8 | 2021-07-07 | [\#4613](https://github.com/airbytehq/airbyte/pull/4613) | Patched schema converter to support combined restrictions. | +| 0.1.7 | 2021-06-23 | [\#4227](https://github.com/airbytehq/airbyte/pull/4227) | Added Avro and JSONL output. | +| 0.1.6 | 2021-06-16 | [\#4130](https://github.com/airbytehq/airbyte/pull/4130) | Patched the check to verify prefix access instead of full-bucket access. | +| 0.1.5 | 2021-06-14 | [\#3908](https://github.com/airbytehq/airbyte/pull/3908) | Fixed default `max_padding_size_mb` in `spec.json`. | +| 0.1.4 | 2021-06-14 | [\#3908](https://github.com/airbytehq/airbyte/pull/3908) | Added Parquet output. | +| 0.1.3 | 2021-06-13 | [\#4038](https://github.com/airbytehq/airbyte/pull/4038) | Added support for alternative S3. | +| 0.1.2 | 2021-06-10 | [\#4029](https://github.com/airbytehq/airbyte/pull/4029) | Fixed `_airbyte_emitted_at` field to be a UTC instead of local timestamp for consistency. | +| 0.1.1 | 2021-06-09 | [\#3973](https://github.com/airbytehq/airbyte/pull/3973) | Added `AIRBYTE_ENTRYPOINT` in base Docker image for Kubernetes support. | +| 0.1.0 | 2021-06-03 | [\#3672](https://github.com/airbytehq/airbyte/pull/3672) | Initial release with CSV output. | + diff --git a/docs/integrations/destinations/snowflake.md b/docs/integrations/destinations/snowflake.md index 95e89e40c50..56db5d68f7d 100644 --- a/docs/integrations/destinations/snowflake.md +++ b/docs/integrations/destinations/snowflake.md @@ -187,12 +187,11 @@ The final query should show a `STORAGE_GCP_SERVICE_ACCOUNT` property with an ema Finally, you need to add read/write permissions to your bucket with that email. 
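As one possible way to grant that access, here is a minimal Python sketch using the `google-cloud-storage` client library. The bucket name, service-account email, and the `roles/storage.objectAdmin` role choice are assumptions for illustration (the role simply needs to provide read/write object access); you may prefer to manage the binding through the Cloud Console, `gsutil`, or Terraform instead:

```python
from google.cloud import storage  # assumes google-cloud-storage is installed

# Placeholders: your staging bucket and the email reported as
# STORAGE_GCP_SERVICE_ACCOUNT by DESC STORAGE INTEGRATION.
BUCKET_NAME = "<your-staging-bucket>"
SNOWFLAKE_MEMBER = "serviceAccount:<snowflake-service-account-email>"

client = storage.Client()  # uses your Application Default Credentials
bucket = client.bucket(BUCKET_NAME)

# Append an IAM binding granting read/write object access to that account.
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append(
    {"role": "roles/storage.objectAdmin", "members": {SNOWFLAKE_MEMBER}}
)
bucket.set_iam_policy(policy)
```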
- -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.3.14 | 2021-09-08 | [#5924](https://github.com/airbytehq/airbyte/pull/5924) | Fixed AWS S3 Staging COPY is writing records from different table in the same raw table | -| 0.3.13 | 2021-09-01 | [#5784](https://github.com/airbytehq/airbyte/pull/5784) | Updated query timeout from 30 minutes to 3 hours | -| 0.3.12 | 2021-07-30 | [#5125](https://github.com/airbytehq/airbyte/pull/5125) | Enable `additionalPropertities` in spec.json | -| 0.3.11 | 2021-07-21 | [#3555](https://github.com/airbytehq/airbyte/pull/3555) | Partial Success in BufferedStreamConsumer | -| 0.3.10 | 2021-07-12 | [#4713](https://github.com/airbytehq/airbyte/pull/4713)| Tag traffic with `airbyte` label to enable optimization opportunities from Snowflake | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.3.14 | 2021-09-08 | [\#5924](https://github.com/airbytehq/airbyte/pull/5924) | Fixed AWS S3 Staging COPY is writing records from different table in the same raw table | +| 0.3.13 | 2021-09-01 | [\#5784](https://github.com/airbytehq/airbyte/pull/5784) | Updated query timeout from 30 minutes to 3 hours | +| 0.3.12 | 2021-07-30 | [\#5125](https://github.com/airbytehq/airbyte/pull/5125) | Enable `additionalPropertities` in spec.json | +| 0.3.11 | 2021-07-21 | [\#3555](https://github.com/airbytehq/airbyte/pull/3555) | Partial Success in BufferedStreamConsumer | +| 0.3.10 | 2021-07-12 | [\#4713](https://github.com/airbytehq/airbyte/pull/4713) | Tag traffic with `airbyte` label to enable optimization opportunities from Snowflake | diff --git a/docs/integrations/sources/amazon-ads.md b/docs/integrations/sources/amazon-ads.md index ba74c375eef..0ba7e922a3d 100644 --- a/docs/integrations/sources/amazon-ads.md +++ b/docs/integrations/sources/amazon-ads.md @@ -56,27 +56,27 @@ Information about expected report generation waiting time you may find [here](ht ### Requirements -* client_id -* client_secret -* refresh_token +* client\_id +* client\_secret +* refresh\_token * scope * profiles * region -* start_date (optional) +* start\_date \(optional\) -More how to get client_id and client_secret you can find on [AWS docs](https://advertising.amazon.com/API/docs/en-us/setting-up/step-1-create-lwa-app). +More how to get client\_id and client\_secret you can find on [AWS docs](https://advertising.amazon.com/API/docs/en-us/setting-up/step-1-create-lwa-app). Refresh token is generated according to standard [AWS Oauth 2.0 flow](https://developer.amazon.com/docs/login-with-amazon/conceptual-overview.html) -Scope usually has "advertising::campaign_management" value, but customers may need to set scope to "cpc_advertising:campaign_management", - -Start date used for generating reports starting from the specified start date. Should be in YYYY-MM-DD format and not more than 60 days in the past. If not specified today date is used. Date for specific profile is calculated according to its timezone, this parameter should be specified in UTC timezone. Since it have no sense of generate report for current day (metrics could be changed) it generates report for day before (e.g. if start_date is 2021-10-11 it would use 20211010 as reportDate parameter for request). +Scope usually has "advertising::campaign\_management" value, but customers may need to set scope to "cpc\_advertising:campaign\_management", +Start date used for generating reports starting from the specified start date. 
Should be in YYYY-MM-DD format and not more than 60 days in the past. If not specified today date is used. Date for specific profile is calculated according to its timezone, this parameter should be specified in UTC timezone. Since it have no sense of generate report for current day \(metrics could be changed\) it generates report for day before \(e.g. if start\_date is 2021-10-11 it would use 20211010 as reportDate parameter for request\). ## CHANGELOG | Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| `0.1.2` | 2021-10-01 | [#6367](https://github.com/airbytehq/airbyte/pull/6461) | `Add option to pull data for different regions. Add option to choose profiles we want to pull data. Add lookback` | -| `0.1.1` | 2021-09-22 | [#6367](https://github.com/airbytehq/airbyte/pull/6367) | `Add seller and vendor filters to profiles stream` | -| `0.1.0` | 2021-08-13 | [#5023](https://github.com/airbytehq/airbyte/pull/5023) | `Initial version` | +| :--- | :--- | :--- | :--- | +| `0.1.2` | 2021-10-01 | [\#6367](https://github.com/airbytehq/airbyte/pull/6461) | `Add option to pull data for different regions. Add option to choose profiles we want to pull data. Add lookback` | +| `0.1.1` | 2021-09-22 | [\#6367](https://github.com/airbytehq/airbyte/pull/6367) | `Add seller and vendor filters to profiles stream` | +| `0.1.0` | 2021-08-13 | [\#5023](https://github.com/airbytehq/airbyte/pull/5023) | `Initial version` | + diff --git a/docs/integrations/sources/amazon-seller-partner.md b/docs/integrations/sources/amazon-seller-partner.md index 063fd5460b1..eda0fda4d75 100644 --- a/docs/integrations/sources/amazon-seller-partner.md +++ b/docs/integrations/sources/amazon-seller-partner.md @@ -8,18 +8,17 @@ This source can sync data for the [Amazon Seller Partner API](https://github.com This source is capable of syncing the following streams: -* [GET_FLAT_FILE_ALL_ORDERS_DATA_BY_ORDER_DATE_GENERAL](https://sellercentral.amazon.com/gp/help/help.html?itemID=201648780) -* [GET_MERCHANT_LISTINGS_ALL_DATA](https://github.com/amzn/selling-partner-api-docs/blob/main/references/reports-api/reporttype-values.md#inventory-reports) -* [GET_FBA_INVENTORY_AGED_DATA](https://sellercentral.amazon.com/gp/help/200740930) -* [GET_AMAZON_FULFILLED_SHIPMENTS_DATA_GENERAL](https://sellercentral.amazon.com/gp/help/help.html?itemID=200453120) -* [GET_FLAT_FILE_OPEN_LISTINGS_DATA](https://github.com/amzn/selling-partner-api-docs/blob/main/references/reports-api/reporttype-values.md#inventory-reports) -* [GET_FBA_FULFILLMENT_REMOVAL_ORDER_DETAIL_DATA](https://sellercentral.amazon.com/gp/help/help.html?itemID=200989110) -* [GET_FBA_FULFILLMENT_REMOVAL_SHIPMENT_DETAIL_DATA](https://sellercentral.amazon.com/gp/help/help.html?itemID=200989100) -* [GET_VENDOR_INVENTORY_HEALTH_AND_PLANNING_REPORT](https://github.com/amzn/selling-partner-api-docs/blob/main/references/reports-api/reporttype-values.md#vendor-retail-analytics-reports) -* [Orders](https://github.com/amzn/selling-partner-api-docs/blob/main/references/orders-api/ordersV0.md) (incremental) +* [GET\_FLAT\_FILE\_ALL\_ORDERS\_DATA\_BY\_ORDER\_DATE\_GENERAL](https://sellercentral.amazon.com/gp/help/help.html?itemID=201648780) +* [GET\_MERCHANT\_LISTINGS\_ALL\_DATA](https://github.com/amzn/selling-partner-api-docs/blob/main/references/reports-api/reporttype-values.md#inventory-reports) +* [GET\_FBA\_INVENTORY\_AGED\_DATA](https://sellercentral.amazon.com/gp/help/200740930) +* 
[GET\_AMAZON\_FULFILLED\_SHIPMENTS\_DATA\_GENERAL](https://sellercentral.amazon.com/gp/help/help.html?itemID=200453120) +* [GET\_FLAT\_FILE\_OPEN\_LISTINGS\_DATA](https://github.com/amzn/selling-partner-api-docs/blob/main/references/reports-api/reporttype-values.md#inventory-reports) +* [GET\_FBA\_FULFILLMENT\_REMOVAL\_ORDER\_DETAIL\_DATA](https://sellercentral.amazon.com/gp/help/help.html?itemID=200989110) +* [GET\_FBA\_FULFILLMENT\_REMOVAL\_SHIPMENT\_DETAIL\_DATA](https://sellercentral.amazon.com/gp/help/help.html?itemID=200989100) +* [GET\_VENDOR\_INVENTORY\_HEALTH\_AND\_PLANNING\_REPORT](https://github.com/amzn/selling-partner-api-docs/blob/main/references/reports-api/reporttype-values.md#vendor-retail-analytics-reports) +* [Orders](https://github.com/amzn/selling-partner-api-docs/blob/main/references/orders-api/ordersV0.md) \(incremental\) * [VendorDirectFulfillmentShipping](https://github.com/amzn/selling-partner-api-docs/blob/main/references/vendor-direct-fulfillment-shipping-api/vendorDirectFulfillmentShippingV1.md) - ### Data type mapping | Integration Type | Airbyte Type | Notes | @@ -47,14 +46,14 @@ Information about rate limits you may find [here](https://github.com/amzn/sellin ### Requirements -* replication_start_date -* refresh_token -* lwa_app_id -* lwa_client_secret -* aws_access_key -* aws_secret_key -* role_arn -* aws_environment +* replication\_start\_date +* refresh\_token +* lwa\_app\_id +* lwa\_client\_secret +* aws\_access\_key +* aws\_secret\_key +* role\_arn +* aws\_environment * region ### Setup guide @@ -64,8 +63,9 @@ Information about how to get credentials you may find [here](https://github.com/ ## CHANGELOG | Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| `0.2.1` | 2021-09-17 | [#5248](https://github.com/airbytehq/airbyte/pull/5248) | `Added extra stream support. Updated reports streams logics` | -| `0.2.0` | 2021-08-06 | [#4863](https://github.com/airbytehq/airbyte/pull/4863) | `Rebuild source with airbyte-cdk` | -| `0.1.3` | 2021-06-23 | [#4288](https://github.com/airbytehq/airbyte/pull/4288) | `Bugfix failing connection check` | -| `0.1.2` | 2021-06-15 | [#4108](https://github.com/airbytehq/airbyte/pull/4108) | `Fixed: Sync fails with timeout when create report is CANCELLED` | +| :--- | :--- | :--- | :--- | +| `0.2.1` | 2021-09-17 | [\#5248](https://github.com/airbytehq/airbyte/pull/5248) | `Added extra stream support. 
Updated reports streams logics` | +| `0.2.0` | 2021-08-06 | [\#4863](https://github.com/airbytehq/airbyte/pull/4863) | `Rebuild source with airbyte-cdk` | +| `0.1.3` | 2021-06-23 | [\#4288](https://github.com/airbytehq/airbyte/pull/4288) | `Bugfix failing connection check` | +| `0.1.2` | 2021-06-15 | [\#4108](https://github.com/airbytehq/airbyte/pull/4108) | `Fixed: Sync fails with timeout when create report is CANCELLED` | + diff --git a/docs/integrations/sources/amplitude.md b/docs/integrations/sources/amplitude.md index 82ea52fc190..9011f9fa91e 100644 --- a/docs/integrations/sources/amplitude.md +++ b/docs/integrations/sources/amplitude.md @@ -43,8 +43,9 @@ Please read [How to get your API key and Secret key](https://help.amplitude.com/ ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.2 | 2021-09-21 | [6353](https://github.com/airbytehq/airbyte/pull/6353) | Correct output schemas on cohorts, events, active_users, and average_session_lengths streams | -| 0.1.1 | 2021-06-09 | [3973](https://github.com/airbytehq/airbyte/pull/3973) | Add AIRBYTE_ENTRYPOINT for kubernetes support | -| 0.1.0 | 2021-06-08 | [3664](https://github.com/airbytehq/airbyte/pull/3664) | New Source: Amplitude | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.2 | 2021-09-21 | [6353](https://github.com/airbytehq/airbyte/pull/6353) | Correct output schemas on cohorts, events, active\_users, and average\_session\_lengths streams | +| 0.1.1 | 2021-06-09 | [3973](https://github.com/airbytehq/airbyte/pull/3973) | Add AIRBYTE\_ENTRYPOINT for kubernetes support | +| 0.1.0 | 2021-06-08 | [3664](https://github.com/airbytehq/airbyte/pull/3664) | New Source: Amplitude | + diff --git a/docs/integrations/sources/apify-dataset.md b/docs/integrations/sources/apify-dataset.md index 89ef85cd606..ddb3426d38d 100644 --- a/docs/integrations/sources/apify-dataset.md +++ b/docs/integrations/sources/apify-dataset.md @@ -1,23 +1,20 @@ --- -description: >- - Web scraping and automation platform. +description: Web scraping and automation platform. --- -# Apify dataset +# Apify Dataset ## Overview + [Apify](https://www.apify.com) is a web scraping and web automation platform providing both ready-made and custom solutions, an open-source [SDK](https://sdk.apify.com/) for web scraping, proxies, and many other tools to help you build and run web automation jobs at scale. -The results of a scraping job are usually stored in [Apify Dataset](https://docs.apify.com/storage/dataset). This Airbyte connector allows you -to automatically sync the contents of a dataset to your chosen destination using Airbyte. +The results of a scraping job are usually stored in [Apify Dataset](https://docs.apify.com/storage/dataset). This Airbyte connector allows you to automatically sync the contents of a dataset to your chosen destination using Airbyte. To sync data from a dataset, all you need to know is its ID. You will find it in [Apify console](https://my.apify.com/) under storages. ### Running Airbyte sync from Apify webhook -When your Apify job (aka [actor run](https://docs.apify.com/actors/running)) finishes, it can trigger an Airbyte sync by calling the Airbyte -[API](https://airbyte-public-api-docs.s3.us-east-2.amazonaws.com/rapidoc-api-docs.html#post-/v1/connections/sync) manual -connection trigger (`POST /v1/connections/sync`). The API can be called from Apify [webhook](https://docs.apify.com/webhooks) which is -executed when your Apify run finishes. 
+ +When your Apify job \(aka [actor run](https://docs.apify.com/actors/running)\) finishes, it can trigger an Airbyte sync by calling the Airbyte [API](https://airbyte-public-api-docs.s3.us-east-2.amazonaws.com/rapidoc-api-docs.html#post-/v1/connections/sync) manual connection trigger \(`POST /v1/connections/sync`\). The API can be called from Apify [webhook](https://docs.apify.com/webhooks) which is executed when your Apify run finishes. ![](../../.gitbook/assets/apify_trigger_airbyte_connection.png) @@ -43,6 +40,8 @@ The Apify dataset connector uses [Apify Python Client](https://docs.apify.com/ap * Apify [dataset](https://docs.apify.com/storage/dataset) ID ### Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.0 | 2021-07-29 | [PR#5069](https://github.com/airbytehq/airbyte/pull/5069) | Initial version of the connector | \ No newline at end of file + +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.0 | 2021-07-29 | [PR\#5069](https://github.com/airbytehq/airbyte/pull/5069) | Initial version of the connector | + diff --git a/docs/integrations/sources/appstore.md b/docs/integrations/sources/appstore.md index 1ea5eb7890c..688ce56b3d6 100644 --- a/docs/integrations/sources/appstore.md +++ b/docs/integrations/sources/appstore.md @@ -60,6 +60,7 @@ Generate/Find all requirements using this [external article](https://leapfin.com ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.2.4 | 2021-07-06 | [4539](https://github.com/airbytehq/airbyte/pull/4539) | Add `AIRBYTE_ENTRYPOINT` for Kubernetes support | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.2.4 | 2021-07-06 | [4539](https://github.com/airbytehq/airbyte/pull/4539) | Add `AIRBYTE_ENTRYPOINT` for Kubernetes support | + diff --git a/docs/integrations/sources/asana.md b/docs/integrations/sources/asana.md index 34af79ddcf5..1b29f15eedf 100644 --- a/docs/integrations/sources/asana.md +++ b/docs/integrations/sources/asana.md @@ -2,8 +2,7 @@ ## Sync overview -This source can sync data for the [Asana API](https://developers.asana.com/docs). It supports only Full Refresh syncs. - +This source can sync data for the [Asana API](https://developers.asana.com/docs). It supports only Full Refresh syncs. ### Output schema @@ -53,15 +52,14 @@ The Asana connector should not run into Asana API limitations under normal usage ### Setup guide -Please follow these [steps](https://developers.asana.com/docs/personal-access-token) -to obtain Personal Access Token for your account. - +Please follow these [steps](https://developers.asana.com/docs/personal-access-token) to obtain Personal Access Token for your account. 
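If you want to sanity-check the token before configuring the source, a small sketch like the following (the token value is a placeholder, and the `requests` package is assumed to be installed) should return your own user record from the Asana API:

```python
import requests

ASANA_PAT = "<your-personal-access-token>"  # placeholder

response = requests.get(
    "https://app.asana.com/api/1.0/users/me",
    headers={"Authorization": f"Bearer {ASANA_PAT}"},
    timeout=30,
)
response.raise_for_status()
# A 200 response with your user name means the token is usable by the connector.
print(response.json()["data"]["name"])
```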
## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.3 | 2021-10-06 | [](https://github.com/airbytehq/airbyte/pull/) | Add oauth init flow parameters support | -| 0.1.2 | 2021-09-24 | [6402](https://github.com/airbytehq/airbyte/pull/6402) | Fix SAT tests: update schemas and invalid_config.json file | -| 0.1.1 | 2021-06-09 | [3973](https://github.com/airbytehq/airbyte/pull/3973) | Add entrypoint and bump version for connector | -| 0.1.0 | 2021-05-25 | [3510](https://github.com/airbytehq/airbyte/pull/3510) | New Source: Asana | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.3 | 2021-10-06 | | Add oauth init flow parameters support | +| 0.1.2 | 2021-09-24 | [6402](https://github.com/airbytehq/airbyte/pull/6402) | Fix SAT tests: update schemas and invalid\_config.json file | +| 0.1.1 | 2021-06-09 | [3973](https://github.com/airbytehq/airbyte/pull/3973) | Add entrypoint and bump version for connector | +| 0.1.0 | 2021-05-25 | [3510](https://github.com/airbytehq/airbyte/pull/3510) | New Source: Asana | + diff --git a/docs/integrations/sources/aws-cloudtrail.md b/docs/integrations/sources/aws-cloudtrail.md index 8c9b46d0d5f..4f987f3d8a0 100644 --- a/docs/integrations/sources/aws-cloudtrail.md +++ b/docs/integrations/sources/aws-cloudtrail.md @@ -33,8 +33,7 @@ Insight events are not supported right now. Only Management events are available ### Performance considerations -The rate of lookup requests for `events` stream is limited to two per second, per account, per region. -This connector gracefully retries when encountering a throttling error. However if the errors continue repeatedly after multiple retries (for example if you setup many instances of this connector using the same account and region), the connector sync will fail. +The rate of lookup requests for the `events` stream is limited to two per second, per account, per region. This connector gracefully retries when encountering a throttling error. However, if the errors continue repeatedly after multiple retries \(for example, if you set up many instances of this connector using the same account and region\), the connector sync will fail. ## Getting started @@ -48,11 +47,11 @@ Please follow these [steps](https://docs.aws.amazon.com/powershell/latest/userguide/pstools-appendix-sign-up.html) to get your AWS access key and secret.
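To confirm the key pair can actually call the CloudTrail API the connector relies on, a short `boto3` sketch such as the one below is usually enough; the region and credential values are placeholders:

```python
import boto3

client = boto3.client(
    "cloudtrail",
    region_name="us-east-1",                       # placeholder region
    aws_access_key_id="<aws-access-key-id>",       # placeholder credentials
    aws_secret_access_key="<aws-secret-access-key>",
)

# LookupEvents is the same management-event API the `events` stream uses,
# so a single-result call is a cheap way to verify access and permissions.
response = client.lookup_events(MaxResults=1)
print(response["Events"])
```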
- ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.2 | 2021-08-04 | [5152](https://github.com/airbytehq/airbyte/pull/5152) | Fix connector spec.json | -| 0.1.1 | 2021-07-06 | [4539](https://github.com/airbytehq/airbyte/pull/4539) | Add `AIRBYTE_ENTRYPOINT` for Kubernetes support | -| 0.1.0 | 2021-06-23 | [4122](https://github.com/airbytehq/airbyte/pull/4122) | Initial release supporting the LookupEvent API | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.2 | 2021-08-04 | [5152](https://github.com/airbytehq/airbyte/pull/5152) | Fix connector spec.json | +| 0.1.1 | 2021-07-06 | [4539](https://github.com/airbytehq/airbyte/pull/4539) | Add `AIRBYTE_ENTRYPOINT` for Kubernetes support | +| 0.1.0 | 2021-06-23 | [4122](https://github.com/airbytehq/airbyte/pull/4122) | Initial release supporting the LookupEvent API | + diff --git a/docs/integrations/sources/bamboo-hr.md b/docs/integrations/sources/bamboo-hr.md index 9d1d4bd9e98..ec3584496d9 100644 --- a/docs/integrations/sources/bamboo-hr.md +++ b/docs/integrations/sources/bamboo-hr.md @@ -32,6 +32,7 @@ BambooHR has the [rate limits](https://documentation.bamboohr.com/docs/api-detai ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.0 | 2021-08-27 | [5054](https://github.com/airbytehq/airbyte/pull/5054) | Initial release with Employees API | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.0 | 2021-08-27 | [5054](https://github.com/airbytehq/airbyte/pull/5054) | Initial release with Employees API | + diff --git a/docs/integrations/sources/bigcommerce.md b/docs/integrations/sources/bigcommerce.md index d7239e49345..eaa4489a4f2 100644 --- a/docs/integrations/sources/bigcommerce.md +++ b/docs/integrations/sources/bigcommerce.md @@ -40,16 +40,16 @@ BigCommerce has some [rate limit restrictions](https://developer.bigcommerce.com ## Getting started -1. Navigate to your store’s control panel (Advanced Settings > API Accounts > Create API Account) +1. Navigate to your store’s control panel \(Advanced Settings > API Accounts > Create API Account\) 2. Create an API account. 3. Select the resources you want to allow access to. Airbyte only needs read-level access. * Note: The UI will show all possible data sources and will show errors when syncing if it doesn't have permissions to access a resource. 4. The generated `Access Token` is what you'll use as the `access_token` for the integration. 5. You're ready to set up BigCommerce in Airbyte! - ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.0 | 2021-08-19 | [5521](https://github.com/airbytehq/airbyte/pull/5521) | Initial Release. Source BigCommerce | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.0 | 2021-08-19 | [5521](https://github.com/airbytehq/airbyte/pull/5521) | Initial Release. 
Source BigCommerce | + diff --git a/docs/integrations/sources/bigquery.md b/docs/integrations/sources/bigquery.md index adc19e06d21..9cae803213b 100644 --- a/docs/integrations/sources/bigquery.md +++ b/docs/integrations/sources/bigquery.md @@ -20,27 +20,27 @@ The BigQuery data types mapping: | CockroachDb Type | Resulting Type | Notes | | :--- | :--- | :--- | -| `BOOL` | Boolean | | -| `INT64` | Number | | -| `FLOAT64` | Number | | -| `NUMERIC` | Number | | -| `BIGNUMERIC` | Number | | -| `STRING` | String | | -| `BYTES` | String | | +| `BOOL` | Boolean | | +| `INT64` | Number | | +| `FLOAT64` | Number | | +| `NUMERIC` | Number | | +| `BIGNUMERIC` | Number | | +| `STRING` | String | | +| `BYTES` | String | | | `DATE` | String | In ISO8601 format | | `DATETIME` | String | In ISO8601 format | | `TIMESTAMP` | String | In ISO8601 format | -| `TIME` | String | | -| `ARRAY` | Array | | -| `STRUCT` | Object | | -| `GEOGRAPHY` | String | | +| `TIME` | String | | +| `ARRAY` | Array | | +| `STRUCT` | Object | | +| `GEOGRAPHY` | String | | ### Features | Feature | Supported | Notes | | :--- | :--- | :--- | | Full Refresh Sync | Yes | | -| Incremental Sync| Yes | | +| Incremental Sync | Yes | | | Change Data Capture | No | | | SSL Support | Yes | | @@ -77,7 +77,7 @@ Follow the [Creating and Managing Service Account Keys](https://cloud.google.com You should now have all the requirements needed to configure BigQuery as a source in the UI. You'll need the following information to configure the BigQuery source: * **Project ID** -* **Default Dataset ID [Optional]**: the schema name if only one schema is interested. Dramatically boost source discover operation. +* **Default Dataset ID \[Optional\]**: the schema name if only one schema is interested. Dramatically boost source discover operation. * **Credentials JSON**: the contents of your Service Account Key JSON file Once you've configured BigQuery as a source, delete the Service Account Key from your computer. @@ -87,9 +87,10 @@ Once you've configured BigQuery as a source, delete the Service Account Key from ### source-bigquery | Version | Date | Pull Request | Subject | -| :--- | :--- | :--- | :--- | -| 0.1.4 | 2021-09-30 | [#6524](https://github.com/airbytehq/airbyte/pull/6524) | Allow `dataset_id` null in spec | -| 0.1.3 | 2021-09-16 | [#6051](https://github.com/airbytehq/airbyte/pull/6051) | Handle NPE `dataset_id` is not provided| -| 0.1.2 | 2021-09-16 | [#6135](https://github.com/airbytehq/airbyte/pull/6135) | 🐛 BigQuery source: Fix nested structs | -| 0.1.1 | 2021-07-28 | [#4981](https://github.com/airbytehq/airbyte/pull/4981) | 🐛 BigQuery source: Fix nested arrays | -| 0.1.0 | 2021-07-22 | [#4457](https://github.com/airbytehq/airbyte/pull/4457) | 🎉 New Source: Big Query. | +| :--- | :--- | :--- | :--- | +| 0.1.4 | 2021-09-30 | [\#6524](https://github.com/airbytehq/airbyte/pull/6524) | Allow `dataset_id` null in spec | +| 0.1.3 | 2021-09-16 | [\#6051](https://github.com/airbytehq/airbyte/pull/6051) | Handle NPE `dataset_id` is not provided | +| 0.1.2 | 2021-09-16 | [\#6135](https://github.com/airbytehq/airbyte/pull/6135) | 🐛 BigQuery source: Fix nested structs | +| 0.1.1 | 2021-07-28 | [\#4981](https://github.com/airbytehq/airbyte/pull/4981) | 🐛 BigQuery source: Fix nested arrays | +| 0.1.0 | 2021-07-22 | [\#4457](https://github.com/airbytehq/airbyte/pull/4457) | 🎉 New Source: Big Query. 
| + diff --git a/docs/integrations/sources/bing-ads.md b/docs/integrations/sources/bing-ads.md index f1fced30077..8ce0f251521 100644 --- a/docs/integrations/sources/bing-ads.md +++ b/docs/integrations/sources/bing-ads.md @@ -2,8 +2,7 @@ ## Overview -This source can sync data from the [Bing Ads](https://docs.microsoft.com/en-us/advertising/guides/?view=bingads-13). -Connector is based on a [Bing Ads Python SDK](https://github.com/BingAds/BingAds-Python-SDK). +This source can sync data from the [Bing Ads](https://docs.microsoft.com/en-us/advertising/guides/?view=bingads-13). Connector is based on a [Bing Ads Python SDK](https://github.com/BingAds/BingAds-Python-SDK). ### Output schema @@ -14,8 +13,8 @@ This Source is capable of syncing the following core Streams: * [AdGroups](https://docs.microsoft.com/en-us/advertising/campaign-management-service/getadgroupsbycampaignid?view=bingads-13) * [Ads](https://docs.microsoft.com/en-us/advertising/campaign-management-service/getadsbyadgroupid?view=bingads-13) - Supported report streams: + * [AccountPerformanceReport](https://docs.microsoft.com/en-us/advertising/reporting-service/accountperformancereportrequest?view=bingads-13) * [AdPerformanceReport](https://docs.microsoft.com/en-us/advertising/reporting-service/adperformancereportrequest?view=bingads-13) * [AdGroupPerformanceReport](https://docs.microsoft.com/en-us/advertising/reporting-service/adgroupperformancereportrequest?view=bingads-13) @@ -23,7 +22,6 @@ Supported report streams: * [BudgetSummaryReport](https://docs.microsoft.com/en-us/advertising/reporting-service/budgetsummaryreportrequest?view=bingads-13) * [KeywordPerformanceReport](https://docs.microsoft.com/en-us/advertising/reporting-service/keywordperformancereportrequest?view=bingads-13) - ### Data type mapping | Integration Type | Airbyte Type | Notes | @@ -50,20 +48,20 @@ API limits number of requests for all Microsoft Advertising clients. You can fin ### Requirements * accounts: Has 2 options - - fetch data from all accounts to which you have access - - you need to provide specific account ids for which you a going to pull data. Use this [guide](https://docs.microsoft.com/en-us/advertising/guides/get-started?view=bingads-13#get-ids) to find your account id -* user_id: Sign in to the Microsoft Advertising web application. The URL will contain a uid key/value pair in the query string that identifies your User ID -* customer_id: Use this [guide](https://docs.microsoft.com/en-us/advertising/guides/get-started?view=bingads-13#get-ids) to get this id -* developer_token: You can find this token [here](https://docs.microsoft.com/en-us/advertising/guides/get-started?view=bingads-13#get-developer-token) -* refresh_token: Token received during [auth process](https://docs.microsoft.com/en-us/advertising/guides/authentication-oauth?view=bingads-13) -* client_secret: Secret generated during application registration -* client_id: Id generated during application registration -* reports_start_date: From which date report generation should start -* report_aggregation: Defines how report data will be aggregated -* hourly_reports: includes hourly report streams if true -* daily_reports: includes daily report streams if true -* weekly_reports: includes weekly report streams if true -* monthly_reports: includes monthly report streams if true + * fetch data from all accounts to which you have access + * you need to provide specific account ids for which you a going to pull data. 
Use this [guide](https://docs.microsoft.com/en-us/advertising/guides/get-started?view=bingads-13#get-ids) to find your account id +* user\_id: Sign in to the Microsoft Advertising web application. The URL will contain a uid key/value pair in the query string that identifies your User ID +* customer\_id: Use this [guide](https://docs.microsoft.com/en-us/advertising/guides/get-started?view=bingads-13#get-ids) to get this id +* developer\_token: You can find this token [here](https://docs.microsoft.com/en-us/advertising/guides/get-started?view=bingads-13#get-developer-token) +* refresh\_token: Token received during [auth process](https://docs.microsoft.com/en-us/advertising/guides/authentication-oauth?view=bingads-13) +* client\_secret: Secret generated during application registration +* client\_id: Id generated during application registration +* reports\_start\_date: From which date report generation should start +* report\_aggregation: Defines how report data will be aggregated +* hourly\_reports: includes hourly report streams if true +* daily\_reports: includes daily report streams if true +* weekly\_reports: includes weekly report streams if true +* monthly\_reports: includes monthly report streams if true ### Setup guide @@ -75,10 +73,10 @@ Full authentication process described [here](https://docs.microsoft.com/en-us/ad Be aware that `refresh token` will expire in 90 days. You need to repeat auth process to get the new one `refresh token` - ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.1 | 2021-08-31 | [5750](https://github.com/airbytehq/airbyte/pull/5750) | Added reporting streams) | -| 0.1.0 | 2021-07-22 | [4911](https://github.com/airbytehq/airbyte/pull/4911) | Initial release supported core streams (Accounts, Campaigns, Ads, AdGroups) | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.1 | 2021-08-31 | [5750](https://github.com/airbytehq/airbyte/pull/5750) | Added reporting streams\) | +| 0.1.0 | 2021-07-22 | [4911](https://github.com/airbytehq/airbyte/pull/4911) | Initial release supported core streams \(Accounts, Campaigns, Ads, AdGroups\) | + diff --git a/docs/integrations/sources/braintree.md b/docs/integrations/sources/braintree.md index e4931a66e01..d6c5f6042c6 100644 --- a/docs/integrations/sources/braintree.md +++ b/docs/integrations/sources/braintree.md @@ -56,6 +56,7 @@ We recommend creating a restricted, read-only key specifically for Airbyte acces ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.0 | 2021-08-17 | [5362](https://github.com/airbytehq/airbyte/pull/5362) | Initial version | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.0 | 2021-08-17 | [5362](https://github.com/airbytehq/airbyte/pull/5362) | Initial version | + diff --git a/docs/integrations/sources/cart.md b/docs/integrations/sources/cart.md index bce8ec04df0..d096457acf9 100644 --- a/docs/integrations/sources/cart.md +++ b/docs/integrations/sources/cart.md @@ -2,8 +2,7 @@ ## Sync overview -This source can sync data for the [Cart API](https://developers.cart.com/docs/rest-api/docs/README.md). It supports both Full Refresh and Incremental sync for all streams. -You can choose if this connector will copy only the new or updated data, or all rows in the tables and columns you set up for replication, every time a sync is run. 
+This source can sync data for the [Cart API](https://developers.cart.com/docs/rest-api/docs/README.md). It supports both Full Refresh and Incremental sync for all streams. You can choose if this connector will copy only the new or updated data, or all rows in the tables and columns you set up for replication, every time a sync is run. ### Output schema @@ -45,12 +44,13 @@ The Cart api has some request limitation. See [this](https://developers.cart.com ### Setup guide -Please follow these [steps](https://developers.cart.com/docs/rest-api/docs/README.md#setup) to obtain Access Token for your account. +Please follow these [steps](https://developers.cart.com/docs/rest-api/docs/README.md#setup) to obtain Access Token for your account. ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.3 | 2021-08-26 | [5465](https://github.com/airbytehq/airbyte/pull/5465) | Add the end_date option for limitation of the amount of synced data| -| 0.1.2 | 2021-08-23 | [1111](https://github.com/airbytehq/airbyte/pull/1111) | Add `order_items` stream | -| 0.1.0 | 2021-06-08 | [4574](https://github.com/airbytehq/airbyte/pull/4574) | Initial Release | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.3 | 2021-08-26 | [5465](https://github.com/airbytehq/airbyte/pull/5465) | Add the end\_date option for limitation of the amount of synced data | +| 0.1.2 | 2021-08-23 | [1111](https://github.com/airbytehq/airbyte/pull/1111) | Add `order_items` stream | +| 0.1.0 | 2021-06-08 | [4574](https://github.com/airbytehq/airbyte/pull/4574) | Initial Release | + diff --git a/docs/integrations/sources/chargebee.md b/docs/integrations/sources/chargebee.md index e9a485d7daa..55ec510a971 100644 --- a/docs/integrations/sources/chargebee.md +++ b/docs/integrations/sources/chargebee.md @@ -25,29 +25,29 @@ This connector outputs the following streams: Some streams may depend on Product Catalog version and be accessible only on sites with specific Product Catalog version. This means that we have following streams: 1. presented in both `Product Catalog 1.0` and `Product Catalog 2.0`: - - Subscriptions - - Customers - - Invoices - - Orders - + * Subscriptions + * Customers + * Invoices + * Orders 2. presented only in `Product Catalog 1.0`: - - Plans - - Addons - + * Plans + * Addons 3. presented only in `Product Catalog 2.0`: - - Items - - Item Prices - - Attached Items + * Items + * Item Prices + * Attached Items Also, 8 streams from the above 9 incremental streams are pure incremental meaning that they: -- read only new records; -- output only new records. + +* read only new records; +* output only new records. `Attached Items` incremental stream is also incremental but with one difference, it: -- read all records; -- output only new records. -This means that syncing the `Attached Items` stream, even in incremental mode, is expensive in terms of your Chargebee API quota. Generally speaking, it incurs a number of API calls equal to the total number of attached items in your chargebee instance divided by 100, regardless of how many AttachedItems were actually changed or synced in a particular sync job. +* read all records; +* output only new records. + +This means that syncing the `Attached Items` stream, even in incremental mode, is expensive in terms of your Chargebee API quota. 
Generally speaking, it incurs a number of API calls equal to the total number of attached items in your chargebee instance divided by 100, regardless of how many AttachedItems were actually changed or synced in a particular sync job. ### Features @@ -75,16 +75,15 @@ The Chargebee connector should not run into [Chargebee API](https://apidocs.char ### Setup guide -Log into Chargebee and then generate an [API Key](https://apidocs.chargebee.com/docs/api?prod_cat_ver=2#api_authentication). -Then follow [these](https://apidocs.chargebee.com/docs/api?prod_cat_ver=2) instructions, under `API Version` section, on how to find your Product Catalog version. - +Log into Chargebee and then generate an [API Key](https://apidocs.chargebee.com/docs/api?prod_cat_ver=2#api_authentication). Then follow [these](https://apidocs.chargebee.com/docs/api?prod_cat_ver=2) instructions, under `API Version` section, on how to find your Product Catalog version. ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.4 | 2021-09-27 | [6454](https://github.com/airbytehq/airbyte/pull/6454) | Fix examples in spec file | -| 0.1.3 | 2021-08-17 | [5421](https://github.com/airbytehq/airbyte/pull/5421) | Add support for "Product Catalog 2.0" specific streams: `Items`, `Item prices` and `Attached Items` | -| 0.1.2 | 2021-07-30 | [5067](https://github.com/airbytehq/airbyte/pull/5067) | Prepare connector for publishing | -| 0.1.1 | 2021-07-07 | [4539](https://github.com/airbytehq/airbyte/pull/4539) | Add entrypoint and bump version for connector | -| 0.1.0 | 2021-06-30 | [3410](https://github.com/airbytehq/airbyte/pull/3410) | New Source: Chargebee | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.4 | 2021-09-27 | [6454](https://github.com/airbytehq/airbyte/pull/6454) | Fix examples in spec file | +| 0.1.3 | 2021-08-17 | [5421](https://github.com/airbytehq/airbyte/pull/5421) | Add support for "Product Catalog 2.0" specific streams: `Items`, `Item prices` and `Attached Items` | +| 0.1.2 | 2021-07-30 | [5067](https://github.com/airbytehq/airbyte/pull/5067) | Prepare connector for publishing | +| 0.1.1 | 2021-07-07 | [4539](https://github.com/airbytehq/airbyte/pull/4539) | Add entrypoint and bump version for connector | +| 0.1.0 | 2021-06-30 | [3410](https://github.com/airbytehq/airbyte/pull/3410) | New Source: Chargebee | + diff --git a/docs/integrations/sources/clickhouse.md b/docs/integrations/sources/clickhouse.md index 8de90cfdafd..b8354651f23 100644 --- a/docs/integrations/sources/clickhouse.md +++ b/docs/integrations/sources/clickhouse.md @@ -55,9 +55,9 @@ You can limit this grant down to specific tables instead of the whole database. Your database user should now be ready for use with Airbyte. 
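A quick connectivity check with that user can save a failed sync later. The sketch below is an assumption-laden example, not part of the connector: it uses the `clickhouse-driver` package over the native protocol, with host, user, password, and database as placeholders:

```python
from clickhouse_driver import Client  # assumes the clickhouse-driver package

client = Client(
    host="<host>",
    port=9000,                   # native-protocol port; adjust if yours differs
    user="<airbyte_user>",       # the read-only user you created for Airbyte
    password="<password>",
    database="<database>",
)

# SELECT access is all the source needs, so listing tables is a sufficient check.
print(client.execute("SHOW TABLES"))
```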
- ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.2 | 2021-08-13 | [4699](https://github.com/airbytehq/airbyte/pull/4699) | Added json config validator | \ No newline at end of file +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.2 | 2021-08-13 | [4699](https://github.com/airbytehq/airbyte/pull/4699) | Added json config validator | + diff --git a/docs/integrations/sources/close-com.md b/docs/integrations/sources/close-com.md index 14bb0081c89..41a2cb8300c 100644 --- a/docs/integrations/sources/close-com.md +++ b/docs/integrations/sources/close-com.md @@ -71,8 +71,7 @@ The [Close.com API](https://developer.close.com/) uses the same [JSONSchema](htt ### Performance considerations -The Close.com Connector has rate limit. There are 60 RPS for Organizations. -You can find detailed info [here](https://developer.close.com/#ratelimits). +The Close.com Connector has rate limit. There are 60 RPS for Organizations. You can find detailed info [here](https://developer.close.com/#ratelimits). ## Getting started @@ -89,6 +88,7 @@ We recommend creating a restricted key specifically for Airbyte access. This wil ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.0 | 2021-08-10 | [5366](https://github.com/airbytehq/airbyte/pull/5366) | Initial release of Close.com connector for Airbyte | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.0 | 2021-08-10 | [5366](https://github.com/airbytehq/airbyte/pull/5366) | Initial release of Close.com connector for Airbyte | + diff --git a/docs/integrations/sources/cockroachdb.md b/docs/integrations/sources/cockroachdb.md index 0f616aa48d6..5f03415b8b9 100644 --- a/docs/integrations/sources/cockroachdb.md +++ b/docs/integrations/sources/cockroachdb.md @@ -42,11 +42,10 @@ CockroachDb data types are mapped to the following data types when synchronizing | Feature | Supported | Notes | | :--- | :--- | :--- | | Full Refresh Sync | Yes | | -| Incremental Sync| Yes | | +| Incremental Sync | Yes | | | Change Data Capture | No | | | SSL Support | Yes | | - ## Getting started ### Requirements @@ -94,6 +93,7 @@ Your database user should now be ready for use with Airbyte. ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.2 | 2021-08-13 | [4699](https://github.com/airbytehq/airbyte/pull/4699) | Added json config validator | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.2 | 2021-08-13 | [4699](https://github.com/airbytehq/airbyte/pull/4699) | Added json config validator | + diff --git a/docs/integrations/sources/db2.md b/docs/integrations/sources/db2.md index 3ab4c8ba7db..0e3a6a6eda5 100644 --- a/docs/integrations/sources/db2.md +++ b/docs/integrations/sources/db2.md @@ -1,11 +1,10 @@ -# IBM Db2 +# Db2 ## Overview -The IBM Db2 source allows you to sync data from Db2. -It supports both Full Refresh and Incremental syncs. You can choose if this connector will copy only the new or updated data, or all rows in the tables and columns you set up for replication, every time a sync is run. +The IBM Db2 source allows you to sync data from Db2. It supports both Full Refresh and Incremental syncs. You can choose if this connector will copy only the new or updated data, or all rows in the tables and columns you set up for replication, every time a sync is run. 
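The connector itself uses the JDBC driver described just below, but the same connection details listed under Requirements (host, port, database, username, password) can be smoke-tested from Python with the `ibm_db` package before configuring the source. A rough sketch with placeholder values, not part of the connector:

```python
import ibm_db  # pip install ibm_db

# Placeholder connection details -- substitute the host, port, database and
# the dedicated read-only Airbyte user from the setup guide below.
conn_str = (
    "DATABASE=TESTDB;"
    "HOSTNAME=db2.example.com;"
    "PORT=50000;"
    "PROTOCOL=TCPIP;"
    "UID=AIRBYTE_USER;"
    "PWD=secret;"
)
conn = ibm_db.connect(conn_str, "", "")

# List a few tables visible to the user to confirm the grants work as expected.
stmt = ibm_db.exec_immediate(
    conn, "SELECT TABSCHEMA, TABNAME FROM SYSCAT.TABLES FETCH FIRST 5 ROWS ONLY"
)
row = ibm_db.fetch_tuple(stmt)
while row:
    print(row)
    row = ibm_db.fetch_tuple(stmt)
ibm_db.close(conn)
```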
-This IBM Db2 source connector is built on top of the [IBM Data Server Driver](https://mvnrepository.com/artifact/com.ibm.db2/jcc/11.5.5.0) for JDBC and SQLJ. It is a pure-Java driver (Type 4) that supports the JDBC 4 specification as described in IBM Db2 [documentation](https://www.ibm.com/docs/en/db2/11.5?topic=apis-supported-drivers-jdbc-sqlj). +This IBM Db2 source connector is built on top of the [IBM Data Server Driver](https://mvnrepository.com/artifact/com.ibm.db2/jcc/11.5.5.0) for JDBC and SQLJ. It is a pure-Java driver \(Type 4\) that supports the JDBC 4 specification as described in IBM Db2 [documentation](https://www.ibm.com/docs/en/db2/11.5?topic=apis-supported-drivers-jdbc-sqlj). #### Resulting schema @@ -24,16 +23,15 @@ The IBM Db2 source does not alter the schema present in your warehouse. Dependin ### Requirements 1. You'll need the following information to configure the IBM Db2 source: - -* **Host** -* **Port** -* **Database** -* **Username** -* **Password** - -2. Create a dedicated read-only Airbyte user and role with access to all schemas needed for replication. +2. **Host** +3. **Port** +4. **Database** +5. **Username** +6. **Password** +7. Create a dedicated read-only Airbyte user and role with access to all schemas needed for replication. ### Setup guide + #### 1. Specify port, host and name of the database. #### 2. Create a dedicated read-only user with access to the relevant schemas \(Recommended but optional\) @@ -49,15 +47,14 @@ CREATE ROLE 'AIRBYTE_ROLE'; -- grant Airbyte database access GRANT CONNECT ON 'DATABASE' TO ROLE 'AIRBYTE_ROLE' GRANT ROLE 'AIRBYTE_ROLE' TO USER 'AIRBYTE_USER' - ``` Your database user should now be ready for use with Airbyte. - ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.1 | 2021-08-13 | [4699](https://github.com/airbytehq/airbyte/pull/4699) | Added json config validator | -| 0.1.0 | 2021-06-22 | [4197](https://github.com/airbytehq/airbyte/pull/4197) | New Source: IBM DB2 | \ No newline at end of file +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.1 | 2021-08-13 | [4699](https://github.com/airbytehq/airbyte/pull/4699) | Added json config validator | +| 0.1.0 | 2021-06-22 | [4197](https://github.com/airbytehq/airbyte/pull/4197) | New Source: IBM DB2 | + diff --git a/docs/integrations/sources/dixa.md b/docs/integrations/sources/dixa.md index 818522fa1cf..2ed02da7bd4 100644 --- a/docs/integrations/sources/dixa.md +++ b/docs/integrations/sources/dixa.md @@ -2,9 +2,7 @@ ## Sync overview -This source can sync data for the [Dixa conversation_export API](https://support.dixa.help/en/articles/174-export-conversations-via-api). -It supports both Full Refresh and Incremental syncs. -You can choose if this connector will copy only the new or updated data, or all rows in the tables and columns you set up for replication, every time a sync is run. +This source can sync data for the [Dixa conversation\_export API](https://support.dixa.help/en/articles/174-export-conversations-via-api). It supports both Full Refresh and Incremental syncs. You can choose if this connector will copy only the new or updated data, or all rows in the tables and columns you set up for replication, every time a sync is run. 
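The setup and performance notes further down this page describe pulling records with `updated_at >= start_timestamp` and batching requests by a `batch_size` of days. A minimal sketch of that windowing logic, using a hypothetical endpoint URL and parameter names rather than the connector's actual code:

```python
import time
from datetime import timedelta

import requests

# Hypothetical endpoint and parameter names -- check the Dixa
# conversation_export documentation for the real ones.
EXPORT_URL = "https://exports.dixa.io/v1/conversation_export"


def fetch_conversations(api_token: str, start_timestamp_ms: int, batch_size_days: int):
    """Yield records updated at or after start_timestamp, one date window at a time."""
    window_ms = int(timedelta(days=batch_size_days).total_seconds() * 1000)
    now_ms = int(time.time() * 1000)
    lower = start_timestamp_ms
    while lower < now_ms:
        upper = min(lower + window_ms, now_ms)
        response = requests.get(
            EXPORT_URL,
            headers={"Authorization": f"Bearer {api_token}"},
            params={"updated_after": lower, "updated_before": upper},
        )
        response.raise_for_status()
        yield from response.json()
        # A larger batch_size means fewer requests but slower individual responses.
        lower = upper
```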
### Output schema @@ -31,12 +29,9 @@ This Source is capable of syncing the following Streams: ### Performance considerations -The connector is limited by standard Dixa conversation_export API [limits](https://support.dixa.help/en/articles/174-export-conversations-via-api). -It should not run into limitations under normal usage. -Please [create an issue](https://github.com/airbytehq/airbyte/issues) if you see any rate limit issues that are not automatically retried successfully. +The connector is limited by standard Dixa conversation\_export API [limits](https://support.dixa.help/en/articles/174-export-conversations-via-api). It should not run into limitations under normal usage. Please [create an issue](https://github.com/airbytehq/airbyte/issues) if you see any rate limit issues that are not automatically retried successfully. -When using the connector, keep in mind that increasing the `batch_size` parameter will -decrease the number of requests sent to the API, but increase the response and processing time. +When using the connector, keep in mind that increasing the `batch_size` parameter will decrease the number of requests sent to the API, but increase the response and processing time. ## Getting started @@ -47,12 +42,15 @@ decrease the number of requests sent to the API, but increase the response and p ### Setup guide 1. Generate an API token using the [Dixa documentation](https://support.dixa.help/en/articles/259-how-to-generate-an-api-token). -1. Define a `start_timestamp`: the connector will pull records with `updated_at >= start_timestamp` -1. Define a `batch_size`: this represents the number of days which will be batched in a single request. +2. Define a `start_timestamp`: the connector will pull records with `updated_at >= start_timestamp` +3. Define a `batch_size`: this represents the number of days which will be batched in a single request. + Keep the performance consideration above in mind ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.1 | 2021-08-12 | [5367](https://github.com/airbytehq/airbyte/pull/5367) | Migrated to CI Sandbox, refactorred code structure for future support | -| 0.1.0 | 2021-07-07 | [4358](https://github.com/airbytehq/airbyte/pull/4358) | New source | + +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.1 | 2021-08-12 | [5367](https://github.com/airbytehq/airbyte/pull/5367) | Migrated to CI Sandbox, refactorred code structure for future support | +| 0.1.0 | 2021-07-07 | [4358](https://github.com/airbytehq/airbyte/pull/4358) | New source | + diff --git a/docs/integrations/sources/drupal.md b/docs/integrations/sources/drupal.md index 07358d35b2e..95b4f3eab33 100644 --- a/docs/integrations/sources/drupal.md +++ b/docs/integrations/sources/drupal.md @@ -4,21 +4,22 @@ ## Sync overview - {% hint style="warning" %} You will only be able to connect to a self-hosted instance of Drupal using these instructions. {% endhint %} -Drupal can run on MySQL, Percona, MariaDb, MSSQL, MongoDB, Postgres, or SQL-Lite. If you're not using SQL-lite, you can use Airbyte to sync your Drupal instance by connecting to the underlying database using the appropriate Airbyte connector: +Drupal can run on MySQL, Percona, MariaDb, MSSQL, MongoDB, Postgres, or SQL-Lite. 
If you're not using SQL-lite, you can use Airbyte to sync your Drupal instance by connecting to the underlying database using the appropriate Airbyte connector: -* [MySQL/Percona/MariaDB](./mysql.md) -* [MSSQL](./mssql.md) -* [Mongo](mongodb.md) +* [MySQL/Percona/MariaDB](mysql.md) +* [MSSQL](mssql.md) +* [Mongo]() * [Postgres](postgres.md) {% hint style="info" %} -Reach out to your service representative or system admin to find the parameters required to connect to the underlying database +Reach out to your service representative or system admin to find the parameters required to connect to the underlying database {% endhint %} ### Output schema -The schema will be loaded according to the rules of the underlying database's connector. + +The schema will be loaded according to the rules of the underlying database's connector. + diff --git a/docs/integrations/sources/exchangeratesapi.md b/docs/integrations/sources/exchangeratesapi.md index 613e3752218..42edbd7eb64 100644 --- a/docs/integrations/sources/exchangeratesapi.md +++ b/docs/integrations/sources/exchangeratesapi.md @@ -6,7 +6,6 @@ The exchange rates integration is a toy integration to demonstrate how Airbyte w It pulls all its data from [https://exchangeratesapi.io](https://exchangeratesapi.io) - #### Output schema It contains one stream: `exchange_rates` @@ -38,15 +37,15 @@ Currencies are `number` and the date is a `string`. In order to get an `API Access Key` please go to [this](https://manage.exchangeratesapi.io/signup/free) page and enter needed info. After registration and login you will see your `API Access Key`, also you may find it [here](https://manage.exchangeratesapi.io/dashboard). -If you have `free` subscription plan (you may check it [here](https://manage.exchangeratesapi.io/plan)) this means that you will have 2 limitations: +If you have `free` subscription plan \(you may check it [here](https://manage.exchangeratesapi.io/plan)\) this means that you will have 2 limitations: 1. 1000 API calls per month. 2. You won't be able to specify the `base` parameter, meaning that you will be dealing only with default base value which is EUR. - ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.2.0 | 2021-05-26 | [3566](https://github.com/airbytehq/airbyte/pull/3566) | Move from `api.ratesapi.io/` to `api.exchangeratesapi.io/`.
Add required field `access_key` to `config.json`. | -| 0.1.0 | 2021-04-19 | [2942](https://github.com/airbytehq/airbyte/pull/2942) | Implement Exchange API using the CDK | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.2.0 | 2021-05-26 | [3566](https://github.com/airbytehq/airbyte/pull/3566) | Move from `api.ratesapi.io/` to `api.exchangeratesapi.io/`. Add required field `access_key` to `config.json`. | +| 0.1.0 | 2021-04-19 | [2942](https://github.com/airbytehq/airbyte/pull/2942) | Implement Exchange API using the CDK | + diff --git a/docs/integrations/sources/facebook-marketing.md b/docs/integrations/sources/facebook-marketing.md index e8d02c1ded3..c31b93411de 100644 --- a/docs/integrations/sources/facebook-marketing.md +++ b/docs/integrations/sources/facebook-marketing.md @@ -18,22 +18,23 @@ This Source is capable of syncing the following tables and their data: * [AdInsights](https://developers.facebook.com/docs/marketing-api/reference/adgroup/insights/) You can segment the AdInsights table into parts based on the following information. Each part will be synced as a separate table if normalization is enabled: + * Country * DMA \(Designated Market Area\) * Gender & Age * Platform & Device * Region -For more information, see the [Facebook Insights API documentation. ](https://developers.facebook.com/docs/marketing-api/reference/adgroup/insights/)\\ +For more information, see the [Facebook Insights API documentation. ](https://developers.facebook.com/docs/marketing-api/reference/adgroup/insights/)\ -## Getting Started (Airbyte Cloud) +## Getting Started \(Airbyte Cloud\) 1. Click `Authenticate your Facebook Marketing account`. 2. Enter your Account ID. Learn how to find it are [here](https://www.facebook.com/business/help/1492627900875762). 3. Enter a start date and your Insights settings. 4. You're done. -## Getting Started (Airbyte Open-Source) +## Getting Started \(Airbyte Open-Source\) #### Requirements @@ -70,7 +71,7 @@ See the Facebook [documentation on Authorization](https://developers.facebook.co With the Ad Account ID and API access token, you should be ready to start pulling data from the Facebook Marketing API. Head to the Airbyte UI to setup your source connector! -## Rate Limiting & Performance Considerations (Airbyte Open Source) +## Rate Limiting & Performance Considerations \(Airbyte Open Source\) Facebook heavily throttles API tokens generated from Facebook Apps by default, making it infeasible to use such a token for syncs with Airbyte. To be able to use this connector without your syncs taking days due to rate limiting follow the instructions in the Setup Guide below to access better rate limits. 
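As a rough illustration of how a client can cope with this throttling (the changelog below mentions error code 80000 being treated as a rate-limit signal), the sketch below retries a Graph API call with exponential backoff. The other error codes and the API version are assumptions for the example, not something this connector guarantees:

```python
import time

import requests

GRAPH_URL = "https://graph.facebook.com/v12.0"  # version chosen for illustration

# 80000 appears in the changelog below as a rate-limiting code; 4, 17 and 32
# are common Graph API throttling codes and are an assumption here.
RATE_LIMIT_CODES = {4, 17, 32, 80000}


def get_with_backoff(path: str, params: dict, max_retries: int = 5) -> dict:
    """Call the Graph API, sleeping exponentially longer whenever it throttles us."""
    for attempt in range(max_retries):
        response = requests.get(f"{GRAPH_URL}/{path}", params=params)
        error_code = response.json().get("error", {}).get("code")
        if error_code in RATE_LIMIT_CODES:
            time.sleep(30 * 2 ** attempt)  # back off: 30s, 60s, 120s, ...
            continue
        response.raise_for_status()
        return response.json()
    raise RuntimeError("Still rate limited after retries")
```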
@@ -87,28 +88,29 @@ See Facebook's [documentation on rate limiting](https://developers.facebook.com/ ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.2.20 | 2021-10-04 | [6719](https://github.com/airbytehq/airbyte/pull/6719) | Update version of facebook_bussiness package to 12.0 | -| 0.2.19 | 2021-09-30 | [6438](https://github.com/airbytehq/airbyte/pull/6438) | Annotate Oauth2 flow initialization parameters in connector specification | -| 0.2.18 | 2021-09-28 | [6499](https://github.com/airbytehq/airbyte/pull/6499) | Fix field values converting fail | -| 0.2.17 | 2021-09-14 | [4978](https://github.com/airbytehq/airbyte/pull/4978) | Convert values' types according to schema types | -| 0.2.16 | 2021-09-14 | [6060](https://github.com/airbytehq/airbyte/pull/6060) | Fix schema for `ads_insights` stream | -| 0.2.15 | 2021-09-14 | [5958](https://github.com/airbytehq/airbyte/pull/5958) | Fix url parsing and add report that exposes conversions | -| 0.2.14 | 2021-07-19 | [4820](https://github.com/airbytehq/airbyte/pull/4820) | Improve the rate limit management | -| 0.2.12 | 2021-06-20 | [3743](https://github.com/airbytehq/airbyte/pull/3743) | Refactor connector to use CDK:
- Improve error handling.
- Improve async job performance (insights).
- Add new configuration parameter `insights_days_per_job`.
- Rename stream `adsets` to `ad_sets`.
- Refactor schema logic for insights, allowing to configure any possible insight stream. | -| 0.2.10 | 2021-06-16 | [3973](https://github.com/airbytehq/airbyte/pull/3973) | Update version of facebook_bussiness to 11.0 | -| 0.2.9 | 2021-06-10 | [3996](https://github.com/airbytehq/airbyte/pull/3996) | Add `AIRBYTE_ENTRYPOINT` for Kubernetes support | -| 0.2.8 | 2021-06-09 | [3973](https://github.com/airbytehq/airbyte/pull/3973) | Add 80000 as a rate-limiting error code | -| 0.2.7 | 2021-06-03 | [3646](https://github.com/airbytehq/airbyte/pull/3646) | Add missing fields to AdInsights streams | -| 0.2.6 | 2021-05-25 | [3525](https://github.com/airbytehq/airbyte/pull/3525) | Fix handling call rate limit | -| 0.2.5 | 2021-05-20 | [3396](https://github.com/airbytehq/airbyte/pull/3396) | Allow configuring insights lookback window | -| 0.2.4 | 2021-05-13 | [3395](https://github.com/airbytehq/airbyte/pull/3395) | Fix an issue that caused losing Insights data from the past 28 days while incremental sync | -| 0.2.3 | 2021-04-28 | [3116](https://github.com/airbytehq/airbyte/pull/3116) | Wait longer (5 min) for async jobs to start | -| 0.2.2 | 2021-04-03 | [2726](https://github.com/airbytehq/airbyte/pull/2726) | Fix base connector versioning | -| 0.2.1 | 2021-03-12 | [2391](https://github.com/airbytehq/airbyte/pull/2391) | Support FB Marketing API v10 | -| 0.2.0 | 2021-03-09 | [2238](https://github.com/airbytehq/airbyte/pull/2238) | Protocol allows future/unknown properties | -| 0.1.4 | 2021-02-24 | [1902](https://github.com/airbytehq/airbyte/pull/1902) | Add `include_deleted` option in params | -| 0.1.3 | 2021-02-15 | [1990](https://github.com/airbytehq/airbyte/pull/1990) | Support Insights stream via async queries | -| 0.1.2 | 2021-01-22 | [1699](https://github.com/airbytehq/airbyte/pull/1699) | Add incremental support | -| 0.1.1 | 2021-01-15 | [1552](https://github.com/airbytehq/airbyte/pull/1552) | Release Native Facebook Marketing Connector | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.2.20 | 2021-10-04 | [6719](https://github.com/airbytehq/airbyte/pull/6719) | Update version of facebook\_bussiness package to 12.0 | +| 0.2.19 | 2021-09-30 | [6438](https://github.com/airbytehq/airbyte/pull/6438) | Annotate Oauth2 flow initialization parameters in connector specification | +| 0.2.18 | 2021-09-28 | [6499](https://github.com/airbytehq/airbyte/pull/6499) | Fix field values converting fail | +| 0.2.17 | 2021-09-14 | [4978](https://github.com/airbytehq/airbyte/pull/4978) | Convert values' types according to schema types | +| 0.2.16 | 2021-09-14 | [6060](https://github.com/airbytehq/airbyte/pull/6060) | Fix schema for `ads_insights` stream | +| 0.2.15 | 2021-09-14 | [5958](https://github.com/airbytehq/airbyte/pull/5958) | Fix url parsing and add report that exposes conversions | +| 0.2.14 | 2021-07-19 | [4820](https://github.com/airbytehq/airbyte/pull/4820) | Improve the rate limit management | +| 0.2.12 | 2021-06-20 | [3743](https://github.com/airbytehq/airbyte/pull/3743) | Refactor connector to use CDK: - Improve error handling. - Improve async job performance \(insights\). - Add new configuration parameter `insights_days_per_job`. - Rename stream `adsets` to `ad_sets`. - Refactor schema logic for insights, allowing to configure any possible insight stream. 
| +| 0.2.10 | 2021-06-16 | [3973](https://github.com/airbytehq/airbyte/pull/3973) | Update version of facebook\_bussiness to 11.0 | +| 0.2.9 | 2021-06-10 | [3996](https://github.com/airbytehq/airbyte/pull/3996) | Add `AIRBYTE_ENTRYPOINT` for Kubernetes support | +| 0.2.8 | 2021-06-09 | [3973](https://github.com/airbytehq/airbyte/pull/3973) | Add 80000 as a rate-limiting error code | +| 0.2.7 | 2021-06-03 | [3646](https://github.com/airbytehq/airbyte/pull/3646) | Add missing fields to AdInsights streams | +| 0.2.6 | 2021-05-25 | [3525](https://github.com/airbytehq/airbyte/pull/3525) | Fix handling call rate limit | +| 0.2.5 | 2021-05-20 | [3396](https://github.com/airbytehq/airbyte/pull/3396) | Allow configuring insights lookback window | +| 0.2.4 | 2021-05-13 | [3395](https://github.com/airbytehq/airbyte/pull/3395) | Fix an issue that caused losing Insights data from the past 28 days while incremental sync | +| 0.2.3 | 2021-04-28 | [3116](https://github.com/airbytehq/airbyte/pull/3116) | Wait longer \(5 min\) for async jobs to start | +| 0.2.2 | 2021-04-03 | [2726](https://github.com/airbytehq/airbyte/pull/2726) | Fix base connector versioning | +| 0.2.1 | 2021-03-12 | [2391](https://github.com/airbytehq/airbyte/pull/2391) | Support FB Marketing API v10 | +| 0.2.0 | 2021-03-09 | [2238](https://github.com/airbytehq/airbyte/pull/2238) | Protocol allows future/unknown properties | +| 0.1.4 | 2021-02-24 | [1902](https://github.com/airbytehq/airbyte/pull/1902) | Add `include_deleted` option in params | +| 0.1.3 | 2021-02-15 | [1990](https://github.com/airbytehq/airbyte/pull/1990) | Support Insights stream via async queries | +| 0.1.2 | 2021-01-22 | [1699](https://github.com/airbytehq/airbyte/pull/1699) | Add incremental support | +| 0.1.1 | 2021-01-15 | [1552](https://github.com/airbytehq/airbyte/pull/1552) | Release Native Facebook Marketing Connector | + diff --git a/docs/integrations/sources/facebook-pages.md b/docs/integrations/sources/facebook-pages.md index 7c522adb90e..d2854818f12 100644 --- a/docs/integrations/sources/facebook-pages.md +++ b/docs/integrations/sources/facebook-pages.md @@ -34,7 +34,6 @@ The linked Facebook docs go into detail about the fields present on those stream ### Rate Limiting & Performance Considerations - Facebook heavily throttles API tokens generated from Facebook Apps by default, making it infeasible to use such a token for syncs with Airbyte. To be able to use this connector without your syncs taking days due to rate limiting follow the instructions in the Setup Guide below to access better rate limits. See Facebook's [documentation on rate limiting](https://developers.facebook.com/docs/graph-api/overview/rate-limiting) for more information on requesting a quota upgrade. @@ -62,29 +61,28 @@ Visit the [Facebook Developers App hub](https://developers.facebook.com/apps/) a #### Connect a User Page -Follow the [Graph API Explorer](https://developers.facebook.com/tools/explorer/) link. -1. Choose your app at `Facebook App` field -2. Choose your Page at `User or Page` field -3. Add next permission: - * pages_read_engagement - * pages_read_user_content - * pages_show_list - * read_insights -4. Click Generate Access Token and follow instructions. +Follow the [Graph API Explorer](https://developers.facebook.com/tools/explorer/) link. 1. Choose your app at `Facebook App` field 2. Choose your Page at `User or Page` field 3. Add next permission: + +* pages\_read\_engagement +* pages\_read\_user\_content +* pages\_show\_list +* read\_insights + 1. 
Click Generate Access Token and follow instructions. After all the steps, it should look something like this ![](../../.gitbook/assets/facebook-pages-1.png) -Now can copy your Access Token from `Access Token` field (This is a short live Page access token, if you need a long-lived Page access token, you can [generate](https://developers.facebook.com/docs/facebook-login/access-tokens/refreshing#get-a-long-lived-page-access-token) one from a long-lived User access token. Long-lived Page access token do not have an expiration date and only expire or are invalidated under certain conditions.) +Now can copy your Access Token from `Access Token` field \(This is a short live Page access token, if you need a long-lived Page access token, you can [generate](https://developers.facebook.com/docs/facebook-login/access-tokens/refreshing#get-a-long-lived-page-access-token) one from a long-lived User access token. Long-lived Page access token do not have an expiration date and only expire or are invalidated under certain conditions.\) #### Getting Page ID -You can easily get the page id from the page url. For example, if you have a page URL such as `https://www.facebook.com/Test-1111111111`, the ID would be` Test-1111111111`. +You can easily get the page id from the page url. For example, if you have a page URL such as `https://www.facebook.com/Test-1111111111`, the ID would be`Test-1111111111`. ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.1 | 2021-09-30 | [6438](https://github.com/airbytehq/airbyte/pull/6438) | Annotate Oauth2 flow initialization parameters in connector specification | -| 0.1.0 | 2021-09-01 | [5158](https://github.com/airbytehq/airbyte/pull/5158) | Initial Release | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.1 | 2021-09-30 | [6438](https://github.com/airbytehq/airbyte/pull/6438) | Annotate Oauth2 flow initialization parameters in connector specification | +| 0.1.0 | 2021-09-01 | [5158](https://github.com/airbytehq/airbyte/pull/5158) | Initial Release | + diff --git a/docs/integrations/sources/file.md b/docs/integrations/sources/file.md index d4339cc1a14..c2469f77e10 100644 --- a/docs/integrations/sources/file.md +++ b/docs/integrations/sources/file.md @@ -43,24 +43,25 @@ This source produces a single table for the target file as it replicates only on | HTML | No | | XML | No | | Excel | Yes | -| Excel Binary Workbook| Yes | +| Excel Binary Workbook | Yes | | Feather | Yes | | Parquet | Yes | | Pickle | No | -## Getting Started (Airbyte Cloud) +## Getting Started \(Airbyte Cloud\) Setup through Airbyte Cloud will be exactly the same as the open-source setup, except for the fact that local files are disabled. -## Getting Started (Airbyte Open-Source) +## Getting Started \(Airbyte Open-Source\) 1. Once the File Source is selected, you should define both the storage provider along its URL and format of the file. 2. Depending on the provider choice and privacy of the data, you will have to configure more options. #### Provider Specific Information + * In case of GCS, it is necessary to provide the content of the service account keyfile to access private buckets. See settings of [BigQuery Destination](../destinations/bigquery.md) * In case of AWS S3, the pair of `aws_access_key_id` and `aws_secret_access_key` is necessary to access private S3 buckets. -* In case of AzBlob, it is necessary to provide the `storage_account` in which the blob you want to access resides. 
Either `sas_token` [(info)](https://docs.microsoft.com/en-us/azure/storage/blobs/sas-service-create?tabs=dotnet) or `shared_key` [(info)](https://docs.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage?tabs=azure-portal) is necessary to access private blobs. +* In case of AzBlob, it is necessary to provide the `storage_account` in which the blob you want to access resides. Either `sas_token` [\(info\)](https://docs.microsoft.com/en-us/azure/storage/blobs/sas-service-create?tabs=dotnet) or `shared_key` [\(info\)](https://docs.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage?tabs=azure-portal) is necessary to access private blobs. ### Reader Options @@ -121,18 +122,19 @@ In order to read large files from a remote location, this connector uses the [sm ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.2.6 | 2021-08-26 | [5613](https://github.com/airbytehq/airbyte/pull/5613) | Add support to xlsb format | -| 0.2.5 | 2021-07-26 | [4953](https://github.com/airbytehq/airbyte/pull/4953) | Allow non-default port for SFTP type | -| 0.2.4 | 2021-06-09 | [3973](https://github.com/airbytehq/airbyte/pull/3973) | Add AIRBYTE_ENTRYPOINT for Kubernetes support | -| 0.2.3 | 2021-06-01 | [3771](https://github.com/airbytehq/airbyte/pull/3771) | Add Azure Storage Blob Files option | -| 0.2.2 | 2021-04-16 | [2883](https://github.com/airbytehq/airbyte/pull/2883) | Fix CSV discovery memory consumption | -| 0.2.1 | 2021-04-03 | [2726](https://github.com/airbytehq/airbyte/pull/2726) | Fix base connector versioning | -| 0.2.0 | 2021-03-09 | [2238](https://github.com/airbytehq/airbyte/pull/2238) | Protocol allows future/unknown properties | -| 0.1.10 | 2021-02-18 | [2118](https://github.com/airbytehq/airbyte/pull/2118) | Support JSONL format | -| 0.1.9 | 2021-02-02 | [1768](https://github.com/airbytehq/airbyte/pull/1768) | Add test cases for all formats | -| 0.1.8 | 2021-01-27 | [1738](https://github.com/airbytehq/airbyte/pull/1738) | Adopt connector best practices | -| 0.1.7 | 2020-12-16 | [1331](https://github.com/airbytehq/airbyte/pull/1331) | Refactor Python base connector | -| 0.1.6 | 2020-12-08 | [1249](https://github.com/airbytehq/airbyte/pull/1249) | Handle NaN values | -| 0.1.5 | 2020-11-30 | [1046](https://github.com/airbytehq/airbyte/pull/1046) | Add connectors using an index YAML file | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.2.6 | 2021-08-26 | [5613](https://github.com/airbytehq/airbyte/pull/5613) | Add support to xlsb format | +| 0.2.5 | 2021-07-26 | [4953](https://github.com/airbytehq/airbyte/pull/4953) | Allow non-default port for SFTP type | +| 0.2.4 | 2021-06-09 | [3973](https://github.com/airbytehq/airbyte/pull/3973) | Add AIRBYTE\_ENTRYPOINT for Kubernetes support | +| 0.2.3 | 2021-06-01 | [3771](https://github.com/airbytehq/airbyte/pull/3771) | Add Azure Storage Blob Files option | +| 0.2.2 | 2021-04-16 | [2883](https://github.com/airbytehq/airbyte/pull/2883) | Fix CSV discovery memory consumption | +| 0.2.1 | 2021-04-03 | [2726](https://github.com/airbytehq/airbyte/pull/2726) | Fix base connector versioning | +| 0.2.0 | 2021-03-09 | [2238](https://github.com/airbytehq/airbyte/pull/2238) | Protocol allows future/unknown properties | +| 0.1.10 | 2021-02-18 | [2118](https://github.com/airbytehq/airbyte/pull/2118) | Support JSONL format | +| 0.1.9 | 2021-02-02 | [1768](https://github.com/airbytehq/airbyte/pull/1768) | Add test cases for all formats | +| 0.1.8 | 
2021-01-27 | [1738](https://github.com/airbytehq/airbyte/pull/1738) | Adopt connector best practices | +| 0.1.7 | 2020-12-16 | [1331](https://github.com/airbytehq/airbyte/pull/1331) | Refactor Python base connector | +| 0.1.6 | 2020-12-08 | [1249](https://github.com/airbytehq/airbyte/pull/1249) | Handle NaN values | +| 0.1.5 | 2020-11-30 | [1046](https://github.com/airbytehq/airbyte/pull/1046) | Add connectors using an index YAML file | + diff --git a/docs/integrations/sources/github.md b/docs/integrations/sources/github.md index 3a1274d1959..a0fb528c262 100644 --- a/docs/integrations/sources/github.md +++ b/docs/integrations/sources/github.md @@ -41,21 +41,17 @@ This connector outputs the following incremental streams: ### Notes -1. Only 3 streams from above 12 incremental streams (`comments`, `commits` and `issues`) are pure incremental -meaning that they: - - read only new records; - - output only new records. +1. Only 3 streams from above 12 incremental streams \(`comments`, `commits` and `issues`\) are pure incremental meaning that they: + * read only new records; + * output only new records. - Other 8 incremental streams are also incremental but with one difference, they: - - read all records; - - output only new records. + Other 8 incremental streams are also incremental but with one difference, they: - Please, consider this behaviour when using those 8 incremental streams because it may affect you API call limits. + * read all records; + * output only new records. -2. We are passing few parameters (`since`, `sort` and `direction`) to GitHub in order to filter records and sometimes - for large streams specifying very distant `start_date` in the past may result in keep on getting error from GitHub - instead of records (respective `WARN` log message will be outputted). In this case Specifying more recent - `start_date` may help. + Please, consider this behaviour when using those 8 incremental streams because it may affect you API call limits. +2. We are passing few parameters \(`since`, `sort` and `direction`\) to GitHub in order to filter records and sometimes for large streams specifying very distant `start_date` in the past may result in keep on getting error from GitHub instead of records \(respective `WARN` log message will be outputted\). In this case Specifying more recent `start_date` may help. ### Features @@ -82,7 +78,6 @@ The Github connector should not run into Github API limitations under normal usa **Note**: if you want to specify the organization to receive data from all its repositories, then you should specify it according to the following pattern: `/*` - ### Setup guide Log into Github and then generate a [personal access token](https://github.com/settings/tokens). @@ -95,21 +90,22 @@ Your token should have at least the `repo` scope. 
Depending on which streams you ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.2.3 | 2021-10-06 | [6833](https://github.com/airbytehq/airbyte/pull/6833) | Fix config backward compatability | -| 0.2.2 | 2021-10-05 | [6761](https://github.com/airbytehq/airbyte/pull/6761) | Add oauth worflow specification | -| 0.2.1 | 2021-09-22 | [6223](https://github.com/airbytehq/airbyte/pull/6223) | Add option to pull commits from user-specified branches | -| 0.2.0 | 2021-09-19 | [5898](https://github.com/airbytehq/airbyte/pull/5898) and [6227](https://github.com/airbytehq/airbyte/pull/6227) | Don't minimize any output fields & add better error handling | -| 0.1.11 | 2021-09-15 | [5949](https://github.com/airbytehq/airbyte/pull/5949) | Add caching for all streams | -| 0.1.10 | 2021-09-09 | [5860](https://github.com/airbytehq/airbyte/pull/5860) | Add reaction streams | -| 0.1.9 | 2021-09-02 | [5788](https://github.com/airbytehq/airbyte/pull/5788) | Handling empty repository, check method using RepositoryStats stream | -| 0.1.8 | 2021-09-01 | [5757](https://github.com/airbytehq/airbyte/pull/5757) | Add more streams | -| 0.1.7 | 2021-08-27 | [5696](https://github.com/airbytehq/airbyte/pull/5696) | Handle negative backoff values | -| 0.1.6 | 2021-08-18 | [5456](https://github.com/airbytehq/airbyte/pull/5223) | Add MultipleTokenAuthenticator | -| 0.1.5 | 2021-08-18 | [5456](https://github.com/airbytehq/airbyte/pull/5456) | Fix set up validation | -| 0.1.4 | 2021-08-13 | [5136](https://github.com/airbytehq/airbyte/pull/5136) | Support syncing multiple repositories/organizations | -| 0.1.3 | 2021-08-03 | [5156](https://github.com/airbytehq/airbyte/pull/5156) | Extended existing schemas with `users` property for certain streams | -| 0.1.2 | 2021-07-13 | [4708](https://github.com/airbytehq/airbyte/pull/4708) | Fix bug with IssueEvents stream and add handling for rate limiting | -| 0.1.1 | 2021-07-07 | [4590](https://github.com/airbytehq/airbyte/pull/4590) | Fix schema in the `pull_request` stream | -| 0.1.0 | 2021-07-06 | [4174](https://github.com/airbytehq/airbyte/pull/4174) | New Source: GitHub | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.2.3 | 2021-10-06 | [6833](https://github.com/airbytehq/airbyte/pull/6833) | Fix config backward compatability | +| 0.2.2 | 2021-10-05 | [6761](https://github.com/airbytehq/airbyte/pull/6761) | Add oauth worflow specification | +| 0.2.1 | 2021-09-22 | [6223](https://github.com/airbytehq/airbyte/pull/6223) | Add option to pull commits from user-specified branches | +| 0.2.0 | 2021-09-19 | [5898](https://github.com/airbytehq/airbyte/pull/5898) and [6227](https://github.com/airbytehq/airbyte/pull/6227) | Don't minimize any output fields & add better error handling | +| 0.1.11 | 2021-09-15 | [5949](https://github.com/airbytehq/airbyte/pull/5949) | Add caching for all streams | +| 0.1.10 | 2021-09-09 | [5860](https://github.com/airbytehq/airbyte/pull/5860) | Add reaction streams | +| 0.1.9 | 2021-09-02 | [5788](https://github.com/airbytehq/airbyte/pull/5788) | Handling empty repository, check method using RepositoryStats stream | +| 0.1.8 | 2021-09-01 | [5757](https://github.com/airbytehq/airbyte/pull/5757) | Add more streams | +| 0.1.7 | 2021-08-27 | [5696](https://github.com/airbytehq/airbyte/pull/5696) | Handle negative backoff values | +| 0.1.6 | 2021-08-18 | [5456](https://github.com/airbytehq/airbyte/pull/5223) | Add MultipleTokenAuthenticator | +| 0.1.5 | 2021-08-18 | 
[5456](https://github.com/airbytehq/airbyte/pull/5456) | Fix set up validation | +| 0.1.4 | 2021-08-13 | [5136](https://github.com/airbytehq/airbyte/pull/5136) | Support syncing multiple repositories/organizations | +| 0.1.3 | 2021-08-03 | [5156](https://github.com/airbytehq/airbyte/pull/5156) | Extended existing schemas with `users` property for certain streams | +| 0.1.2 | 2021-07-13 | [4708](https://github.com/airbytehq/airbyte/pull/4708) | Fix bug with IssueEvents stream and add handling for rate limiting | +| 0.1.1 | 2021-07-07 | [4590](https://github.com/airbytehq/airbyte/pull/4590) | Fix schema in the `pull_request` stream | +| 0.1.0 | 2021-07-06 | [4174](https://github.com/airbytehq/airbyte/pull/4174) | New Source: GitHub | + diff --git a/docs/integrations/sources/gitlab.md b/docs/integrations/sources/gitlab.md index 1e43dcf15b4..a00b4d941f4 100644 --- a/docs/integrations/sources/gitlab.md +++ b/docs/integrations/sources/gitlab.md @@ -53,9 +53,9 @@ Log into Gitlab and then generate a [personal access token](https://docs.gitlab. Your token should have the `read_api` scope, that Grants read access to the API, including all groups and projects, the container registry, and the package registry. - ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.0 | 2021-07-06 | [4174](https://github.com/airbytehq/airbyte/pull/4174) | Initial Release | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.0 | 2021-07-06 | [4174](https://github.com/airbytehq/airbyte/pull/4174) | Initial Release | + diff --git a/docs/integrations/sources/google-ads.md b/docs/integrations/sources/google-ads.md index df23471009d..b5c13b64a81 100644 --- a/docs/integrations/sources/google-ads.md +++ b/docs/integrations/sources/google-ads.md @@ -1,7 +1,7 @@ # Google Ads {% hint style="warning" %} -If you don't already have a developer token from Google Ads, make sure you follow the [instructions](#how-to-apply-for-the-developer-token) so your request doesn't get denied. +If you don't already have a developer token from Google Ads, make sure you follow the [instructions](google-ads.md#how-to-apply-for-the-developer-token) so your request doesn't get denied. 
{% endhint %} ## Features @@ -18,22 +18,24 @@ If you don't already have a developer token from Google Ads, make sure you follo This source is capable of syncing the following tables and their data: #### Main Tables + * [accounts](https://developers.google.com/google-ads/api/fields/v8/customer) -* [ad_group_ads](https://developers.google.com/google-ads/api/fields/v8/ad_group_ad) -* [ad_groups](https://developers.google.com/google-ads/api/fields/v8/ad_group) +* [ad\_group\_ads](https://developers.google.com/google-ads/api/fields/v8/ad_group_ad) +* [ad\_groups](https://developers.google.com/google-ads/api/fields/v8/ad_group) * [campaigns](https://developers.google.com/google-ads/api/fields/v8/campaign) -* [click_view](https://developers.google.com/google-ads/api/reference/rpc/v8/ClickView) +* [click\_view](https://developers.google.com/google-ads/api/reference/rpc/v8/ClickView) #### Report Tables -* [account_performance_report](https://developers.google.com/google-ads/api/docs/migration/mapping#account_performance) -* [ad_group_ad_report](https://developers.google.com/google-ads/api/docs/migration/mapping#ad_performance) -* [display_keyword_report](https://developers.google.com/google-ads/api/docs/migration/mapping#display_keyword_performance) -* [display_topics_report](https://developers.google.com/google-ads/api/docs/migration/mapping#display_topics_performance) -* [shopping_performance_report](https://developers.google.com/google-ads/api/docs/migration/mapping#shopping_performance) + +* [account\_performance\_report](https://developers.google.com/google-ads/api/docs/migration/mapping#account_performance) +* [ad\_group\_ad\_report](https://developers.google.com/google-ads/api/docs/migration/mapping#ad_performance) +* [display\_keyword\_report](https://developers.google.com/google-ads/api/docs/migration/mapping#display_keyword_performance) +* [display\_topics\_report](https://developers.google.com/google-ads/api/docs/migration/mapping#display_topics_performance) +* [shopping\_performance\_report](https://developers.google.com/google-ads/api/docs/migration/mapping#shopping_performance) **Note**: Due to constraints from the Google Ads API, the `click_view` stream retrieves data one day at a time and can only retrieve data newer than 90 days ago -## Getting Started (Airbyte-Cloud) +## Getting Started \(Airbyte-Cloud\) 1. Click `Authenticate your Google Ads account` to sign in with Google and authorize your account. 2. Get the customer ID for your account. Learn how to do that [here](https://support.google.com/google-ads/answer/1704344) @@ -41,19 +43,19 @@ This source is capable of syncing the following tables and their data: 4. Fill out a start date, and optionally, a conversion window, and custom [GAQL](https://developers.google.com/google-ads/api/docs/query/overview). 5. You're done. -## Getting Started (Airbyte Open-Source) +## Getting Started \(Airbyte Open-Source\) #### Requirements Google Ads Account with an approved Developer Token \(note: In order to get API access to Google Ads, you must have a "manager" account. This must be created separately from your standard account. 
You can find more information about this distinction in the [google ads docs](https://ads.google.com/home/tools/manager-accounts/).\) -* developer_token -* client_id -* client_secret -* refresh_token -* start_date -* customer_id -* login_customer_id (you can find more information about this field in [Google Ads docs](https://developers.google.com/google-ads/api/docs/concepts/call-structure#cid)) +* developer\_token +* client\_id +* client\_secret +* refresh\_token +* start\_date +* customer\_id +* login\_customer\_id \(you can find more information about this field in [Google Ads docs](https://developers.google.com/google-ads/api/docs/concepts/call-structure#cid)\) #### Setup guide @@ -62,7 +64,7 @@ This guide will provide information as if starting from scratch. Please skip ove * Create an Google Ads Account. Here are [Google's instruction](https://support.google.com/google-ads/answer/6366720) on how to create one. * Create an Google Ads MANAGER Account. Here are [Google's instruction](https://ads.google.com/home/tools/manager-accounts/) on how to create one. * You should now have two Google Ads accounts: a normal account and a manager account. Link the Manager account to the normal account following [Google's documentation](https://support.google.com/google-ads/answer/7459601). -* Apply for a developer token \(**make sure you follow our** [**instructions**](#how-to-apply-for-the-developer-token)\) on your Manager account. This token allows you to access your data from the Google Ads API. Here are [Google's instructions](https://developers.google.com/google-ads/api/docs/first-call/dev-token). The docs are a little unclear on this point, but you will _not_ be able to access your data via the Google Ads API until this token is approved. You cannot use a test developer token, it has to be at least a basic developer token. It usually takes Google 24 hours to respond to these applications. This developer token is the value you will use in the `developer_token` field. +* Apply for a developer token \(**make sure you follow our** [**instructions**](google-ads.md#how-to-apply-for-the-developer-token)\) on your Manager account. This token allows you to access your data from the Google Ads API. Here are [Google's instructions](https://developers.google.com/google-ads/api/docs/first-call/dev-token). The docs are a little unclear on this point, but you will _not_ be able to access your data via the Google Ads API until this token is approved. You cannot use a test developer token, it has to be at least a basic developer token. It usually takes Google 24 hours to respond to these applications. This developer token is the value you will use in the `developer_token` field. * Fetch your `client_id`, `client_secret`, and `refresh_token`. Google provides [instructions](https://developers.google.com/google-ads/api/docs/first-call/overview) on how to do this. * Select your `customer_id`. The `customer_is` refer to the id of each of your Google Ads accounts. This is the 10 digit number in the top corner of the page when you are in google ads ui. The source will only pull data from the accounts for which you provide an id. If you are having trouble finding it, check out [Google's instructions](https://support.google.com/google-ads/answer/1704344). @@ -85,26 +87,26 @@ If for any reason the request gets denied, let us know and we will be able to un The Google Ads Query Language can query the Google Ads API. 
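For reference, a custom query might look like the sketch below. The GAQL itself uses documented campaign resources and fields; the surrounding `table_name`/`query` shape is an assumption about the connector's custom-query configuration and should be checked against the spec:

```python
# Illustrative custom GAQL query for the Google Ads source.
custom_query = {
    # Name of the resulting stream (assumed field name).
    "table_name": "campaign_performance_last_30_days",
    "query": """
        SELECT
          campaign.id,
          campaign.name,
          segments.date,
          metrics.impressions,
          metrics.clicks,
          metrics.cost_micros
        FROM campaign
        WHERE segments.date DURING LAST_30_DAYS
    """,
}
print(custom_query["query"])
```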
Check out [Google Ads Query Language](https://developers.google.com/google-ads/api/docs/query/overview) -## Rate Limiting & Performance Considerations (Airbyte Open Source) +## Rate Limiting & Performance Considerations \(Airbyte Open Source\) This source is constrained by whatever API limits are set for the Google Ads that is used. You can read more about those limits in the [Google Developer docs](https://developers.google.com/google-ads/api/docs/best-practices/quotas). - ## CHANGELOG | Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | +| :--- | :--- | :--- | :--- | | `0.1.15` | 2021-10-07 | [6684](https://github.com/airbytehq/airbyte/pull/6684) | Add new stream `click_view` | | `0.1.14` | 2021-10-01 | [6565](https://github.com/airbytehq/airbyte/pull/6565) | Fix OAuth Spec File | | `0.1.13` | 2021-09-27 | [6458](https://github.com/airbytehq/airbyte/pull/6458) | Update OAuth Spec File | -| `0.1.11` | 2021-09-22 | [#6373](https://github.com/airbytehq/airbyte/pull/6373) | Fix inconsistent segments.date field type across all streams | -| `0.1.10` | 2021-09-13 | [#6022](https://github.com/airbytehq/airbyte/pull/6022) | Annotate Oauth2 flow initialization parameters in connector spec | -| `0.1.9` | 2021-09-07 | [#5302](https://github.com/airbytehq/airbyte/pull/5302) | Add custom query stream support | -| `0.1.8` | 2021-08-03 | [#5509](https://github.com/airbytehq/airbyte/pull/5509) | allow additionalProperties in spec.json | -| `0.1.7` | 2021-08-03 | [#5422](https://github.com/airbytehq/airbyte/pull/5422) | Correct query to not skip dates | -| `0.1.6` | 2021-08-03 | [#5423](https://github.com/airbytehq/airbyte/pull/5423) | Added new stream UserLocationReport | -| `0.1.5` | 2021-08-03 | [#5159](https://github.com/airbytehq/airbyte/pull/5159) | Add field `login_customer_id` to spec | -| `0.1.4` | 2021-07-28 | [#4962](https://github.com/airbytehq/airbyte/pull/4962) | Support new Report streams | -| `0.1.3` | 2021-07-23 | [#4788](https://github.com/airbytehq/airbyte/pull/4788) | Support main streams, fix bug with exception `DATE_RANGE_TOO_NARROW` for incremental streams | -| `0.1.2` | 2021-07-06 | [#4539](https://github.com/airbytehq/airbyte/pull/4539) | Add `AIRBYTE_ENTRYPOINT` for Kubernetes support | -| `0.1.1` | 2021-06-23 | [#4288](https://github.com/airbytehq/airbyte/pull/4288) | `Bugfix: Correctly declare required parameters ` | +| `0.1.11` | 2021-09-22 | [\#6373](https://github.com/airbytehq/airbyte/pull/6373) | Fix inconsistent segments.date field type across all streams | +| `0.1.10` | 2021-09-13 | [\#6022](https://github.com/airbytehq/airbyte/pull/6022) | Annotate Oauth2 flow initialization parameters in connector spec | +| `0.1.9` | 2021-09-07 | [\#5302](https://github.com/airbytehq/airbyte/pull/5302) | Add custom query stream support | +| `0.1.8` | 2021-08-03 | [\#5509](https://github.com/airbytehq/airbyte/pull/5509) | allow additionalProperties in spec.json | +| `0.1.7` | 2021-08-03 | [\#5422](https://github.com/airbytehq/airbyte/pull/5422) | Correct query to not skip dates | +| `0.1.6` | 2021-08-03 | [\#5423](https://github.com/airbytehq/airbyte/pull/5423) | Added new stream UserLocationReport | +| `0.1.5` | 2021-08-03 | [\#5159](https://github.com/airbytehq/airbyte/pull/5159) | Add field `login_customer_id` to spec | +| `0.1.4` | 2021-07-28 | [\#4962](https://github.com/airbytehq/airbyte/pull/4962) | Support new Report streams | +| `0.1.3` | 2021-07-23 | [\#4788](https://github.com/airbytehq/airbyte/pull/4788) | Support main streams, fix bug with 
exception `DATE_RANGE_TOO_NARROW` for incremental streams | +| `0.1.2` | 2021-07-06 | [\#4539](https://github.com/airbytehq/airbyte/pull/4539) | Add `AIRBYTE_ENTRYPOINT` for Kubernetes support | +| `0.1.1` | 2021-06-23 | [\#4288](https://github.com/airbytehq/airbyte/pull/4288) | `Bugfix: Correctly declare required parameters` | + diff --git a/docs/integrations/sources/google-adwords.md b/docs/integrations/sources/google-adwords.md index 3382cfc4019..51f2c0ab945 100644 --- a/docs/integrations/sources/google-adwords.md +++ b/docs/integrations/sources/google-adwords.md @@ -1,4 +1,4 @@ -# Google Adwords (Deprecated) +# Google Adwords As mentioned by Google, the AdWords API will sunset in [April 2022](https://ads-developers.googleblog.com/2021/04/upgrade-to-google-ads-api-from-adwords.html). Migrate all requests to the Google Ads API by then to continue managing your Google Ads accounts. @@ -22,9 +22,9 @@ This Adwords source wraps the [Singer Adwords Tap](https://github.com/singer-io/ Several tables and their data are available from this source \(accounts, campaigns, ads, etc.\) For a comprehensive output schema [look at the Singer tap schema files](https://github.com/singer-io/tap-adwords/tree/master/tap_adwords/schemas). -## Getting Started (Airbyte Open-Source / Airbyte Cloud) +## Getting Started \(Airbyte Open-Source / Airbyte Cloud\) -#### Requirements +### Requirements * Google Adwords Manager Account with an approved Developer Token \(note: In order to get API access to Google Adwords, you must have a "manager" account. This must be created separately from your standard account. You can find more information about this distinction in the [Google Ads docs](https://ads.google.com/home/tools/manager-accounts/).\) @@ -54,12 +54,13 @@ If for any reason the request gets denied, let us know and we will be able to un Tokens issued after April 28, 2021 are only given access to the Google Ads API as the AdWords API is no longer available for new users. Thus, this source can only be used if you already have a token issued previously. A new source using the Google Ads API is being built \(see [issue 3457](https://github.com/airbytehq/airbyte/issues/3457) for more information\). -## Rate Limiting & Performance Considerations (Airbyte Open-Source) +## Rate Limiting & Performance Considerations \(Airbyte Open-Source\) This source is constrained by whatever API limits are set for the Google Adwords Manager that is used. You can read more about those limits in the [Google Developer docs](https://developers.google.com/adwords/api/faq#access). ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.2 | 2021-06-25 | [4205](https://github.com/airbytehq/airbyte/pull/4205) | Set up CDK SAT tests. Incremental tests are disabled due to unsupported state structure in current tests: required structure: {stream_name: cursor_value} given {‘bookmarks’: {stream_name: cursor_value}} | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.2 | 2021-06-25 | [4205](https://github.com/airbytehq/airbyte/pull/4205) | Set up CDK SAT tests. 
Incremental tests are disabled due to unsupported state structure in current tests: required structure: {stream\_name: cursor\_value} given {‘bookmarks’: {stream\_name: cursor\_value}} | + diff --git a/docs/integrations/sources/google-analytics-v4.md b/docs/integrations/sources/google-analytics-v4.md index 5a5cf6c57d7..1a59867e5c4 100644 --- a/docs/integrations/sources/google-analytics-v4.md +++ b/docs/integrations/sources/google-analytics-v4.md @@ -1,4 +1,4 @@ -# Google Analytics V4 +# Google Analytics ## Features @@ -14,32 +14,33 @@ This source is capable of syncing the following tables and their data: -* website_overview -* traffic_sources +* website\_overview +* traffic\_sources * pages * locations -* monthly_active_users -* four_weekly_active_users -* two_weekly_active_users -* weekly_active_users -* daily_active_users +* monthly\_active\_users +* four\_weekly\_active\_users +* two\_weekly\_active\_users +* weekly\_active\_users +* daily\_active\_users * devices * Any custom reports. See [below](https://docs.airbyte.io/integrations/sources/google-analytics-v4#reading-custom-reports-from-google-analytics) for details. Please reach out to us on Slack or [create an issue](https://github.com/airbytehq/airbyte/issues) if you need to send custom Google Analytics report data with Airbyte. -## Getting Started (Airbyte Cloud) +## Getting Started \(Airbyte Cloud\) 1. Click `OAuth2.0 authorization` then `Authenticate your Google Analytics account`. 2. Find your View ID for the view you want to fetch data from. Find it [here](https://ga-dev-tools.web.app/account-explorer/). 3. Enter a start date, window size, and custom report information. 4. You're done. -## Getting Started (Airbyte Open-Source) +## Getting Started \(Airbyte Open-Source\) There are 2 options of setting up authorization for this source: - - Create service account specifically for Airbyte and authorize with JWT. Select "JWT authorization" from the "Authentication mechanism" dropdown list. - - Use your Google account and authorize over Google's OAuth on connection setup. Select "Default OAuth2.0 authorization" from dropdown list. + +* Create service account specifically for Airbyte and authorize with JWT. Select "JWT authorization" from the "Authentication mechanism" dropdown list. +* Use your Google account and authorize over Google's OAuth on connection setup. Select "Default OAuth2.0 authorization" from dropdown list. #### Create a Service Account @@ -116,7 +117,7 @@ A custom report can contain no more than 10 unique metrics. The default availabl Incremental sync supports only if you add `ga:date` dimension to your custom report. 
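As a concrete illustration of such a report, the snippet below builds a definition that includes the `ga:date` dimension so incremental sync stays possible and keeps the metric count under the limit. The exact configuration field names are an assumption and should be verified against the connector's spec:

```python
import json

# Hypothetical custom report definition for the Google Analytics source.
custom_reports = [
    {
        "name": "daily_sessions_by_country",
        "dimensions": ["ga:date", "ga:country"],  # ga:date enables incremental sync
        "metrics": ["ga:sessions", "ga:users"],   # stay under the 10-unique-metrics limit
    }
]

# The connector typically takes this as a JSON string in its configuration.
print(json.dumps(custom_reports))
```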
-## Rate Limits & Performance Considerations (Airbyte Open-Source) +## Rate Limits & Performance Considerations \(Airbyte Open-Source\) [Analytics Reporting API v4](https://developers.google.com/analytics/devguides/reporting/core/v4/limits-quotas) @@ -129,11 +130,12 @@ The Google Analytics connector should not run into Google Analytics API limitati ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.7 | 2021-10-07 | [6414](https://github.com/airbytehq/airbyte/pull/6414) | Declare oauth parameters in google sources | -| 0.1.6 | 2021-09-27 | [6459](https://github.com/airbytehq/airbyte/pull/6459) | Update OAuth Spec File | -| 0.1.3 | 2021-09-21 | [6357](https://github.com/airbytehq/airbyte/pull/6357) | Fix oauth workflow parameters | -| 0.1.2 | 2021-09-20 | [6306](https://github.com/airbytehq/airbyte/pull/6306) | Support of airbyte OAuth initialization flow | -| 0.1.1 | 2021-08-25 | [5655](https://github.com/airbytehq/airbyte/pull/5655) | Corrected validation of empty custom report| -| 0.1.0 | 2021-08-10 | [5290](https://github.com/airbytehq/airbyte/pull/5290) | Initial Release| +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.7 | 2021-10-07 | [6414](https://github.com/airbytehq/airbyte/pull/6414) | Declare oauth parameters in google sources | +| 0.1.6 | 2021-09-27 | [6459](https://github.com/airbytehq/airbyte/pull/6459) | Update OAuth Spec File | +| 0.1.3 | 2021-09-21 | [6357](https://github.com/airbytehq/airbyte/pull/6357) | Fix oauth workflow parameters | +| 0.1.2 | 2021-09-20 | [6306](https://github.com/airbytehq/airbyte/pull/6306) | Support of airbyte OAuth initialization flow | +| 0.1.1 | 2021-08-25 | [5655](https://github.com/airbytehq/airbyte/pull/5655) | Corrected validation of empty custom report | +| 0.1.0 | 2021-08-10 | [5290](https://github.com/airbytehq/airbyte/pull/5290) | Initial Release | + diff --git a/docs/integrations/sources/google-search-console.md b/docs/integrations/sources/google-search-console.md index e7a14832886..e771b096b32 100644 --- a/docs/integrations/sources/google-search-console.md +++ b/docs/integrations/sources/google-search-console.md @@ -10,7 +10,7 @@ This Source is capable of syncing the following Streams: * [Sites](https://developers.google.com/webmaster-tools/search-console-api-original/v3/sites/get) * [Sitemaps](https://developers.google.com/webmaster-tools/search-console-api-original/v3/sitemaps/list) -* [Full Analytics report](https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query) (this stream has a long sync time because it is very detailed, use with care) +* [Full Analytics report](https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query) \(this stream has a long sync time because it is very detailed, use with care\) * [Analytics report by country](https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query) * [Analytics report by date](https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query) * [Analytics report by device](https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query) @@ -43,41 +43,43 @@ This connector attempts to back off gracefully when it hits Reports API's rate l ### Requirements -* Credentials to a Google Service Account (or Google Service Account with delegated Domain Wide Authority) or Google User Account +* Credentials to a Google 
Service Account \(or Google Service Account with delegated Domain Wide Authority\) or Google User Account ## How to create the client credentials for Google Search Console, to use with Airbyte? You can either: + * Use the existing `Service Account` for your Google Project with granted Admin Permissions * Use your personal Google User Account with oauth. If you choose this option, your account must have permissions to view the Google Search Console project you choose. * Create the new `Service Account` credentials for your Google Project, and grant Admin Permissions to it * Follow the `Delegating domain-wide authority` process to obtain the necessary permissions to your google account from the administrator of Workspace ### Creating a Google service account + A service account's credentials include a generated email address that is unique and at least one public/private key pair. If domain-wide delegation is enabled, then a client ID is also part of the service account's credentials. 1. Open the [Service accounts page](https://console.developers.google.com/iam-admin/serviceaccounts) 2. If prompted, select an existing project, or create a new project. 3. Click `+ Create service account`. 4. Under Service account details, type a `name`, `ID`, and `description` for the service account, then click `Create`. - * Optional: Under `Service account permissions`, select the `IAM roles` to grant to the service account, then click `Continue`. - * Optional: Under `Grant users access to this service account`, add the `users` or `groups` that are allowed to use and manage the service account. + * Optional: Under `Service account permissions`, select the `IAM roles` to grant to the service account, then click `Continue`. + * Optional: Under `Grant users access to this service account`, add the `users` or `groups` that are allowed to use and manage the service account. 5. Go to [API Console/Credentials](https://console.cloud.google.com/apis/credentials), check the `Service Accounts` section, click on the Email address of service account you just created. 6. Open `Details` tab and find `Show domain-wide delegation`, checkmark the `Enable Google Workspace Domain-wide Delegation`. 7. On `Keys` tab click `+ Add key`, then click `Create new key`. -Your new public/private key pair should be now generated and downloaded to your machine as `.json` you can find it in the `Downloads` folder or somewhere else if you use another default destination for downloaded files. This file serves as the only copy of the private key. You are responsible for storing it securely. -If you lose this key pair, you will need to generate a new one! +Your new public/private key pair should be now generated and downloaded to your machine as `.json` you can find it in the `Downloads` folder or somewhere else if you use another default destination for downloaded files. This file serves as the only copy of the private key. You are responsible for storing it securely. If you lose this key pair, you will need to generate a new one! + +### Using the existing Service Account -### Using the existing Service Account 1. Go to [API Console/Credentials](https://console.cloud.google.com/apis/credentials), check the `Service Accounts` section, click on the Email address of service account you just created. 2. Click on `Details` tab and find `Show domain-wide delegation`, checkmark the `Enable Google Workspace Domain-wide Delegation`. -2. On `Keys` tab click `+ Add key`, then click `Create new key`. +3. 
On `Keys` tab click `+ Add key`, then click `Create new key`. -Your new public/private key pair should be now generated and downloaded to your machine as `.json` you can find it in the `Downloads` folder or somewhere else if you use another default destination for downloaded files. This file serves as the only copy of the private key. You are responsible for storing it securely. -If you lose this key pair, you will need to generate a new one! +Your new public/private key pair should be now generated and downloaded to your machine as `.json` you can find it in the `Downloads` folder or somewhere else if you use another default destination for downloaded files. This file serves as the only copy of the private key. You are responsible for storing it securely. If you lose this key pair, you will need to generate a new one! ### Note + You can return to the [API Console/Credentials](https://console.cloud.google.com/apis/credentials) at any time to view the email address, public key fingerprints, and other information, or to generate additional public/private key pairs. For more details about service account credentials in the API Console, see [Service accounts](https://cloud.google.com/iam/docs/understanding-service-accounts) in the API Console help file. ### Create a Service Account with delegated domain-wide authority @@ -90,14 +92,14 @@ At the end of this process, you should have JSON credentials to this Google Serv You should now be ready to use the Google Workspace Admin Reports API connector in Airbyte. - ## CHANGELOG | Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | +| :--- | :--- | :--- | :--- | | `0.1.6` | 2021-09-27 | [6460](https://github.com/airbytehq/airbyte/pull/6460) | Update OAuth Spec File | | `0.1.4` | 2021-09-23 | [6394](https://github.com/airbytehq/airbyte/pull/6394) | Update Doc link Spec File | | `0.1.3` | 2021-09-23 | [6405](https://github.com/airbytehq/airbyte/pull/6405) | Correct Spec File | | `0.1.2` | 2021-09-17 | [6222](https://github.com/airbytehq/airbyte/pull/6222) | Correct Spec File | | `0.1.1` | 2021-09-22 | [6315](https://github.com/airbytehq/airbyte/pull/6315) | Verify access to all sites when performing connection check | | `0.1.0` | 2021-09-03 | [5350](https://github.com/airbytehq/airbyte/pull/5350) | Initial Release | + diff --git a/docs/integrations/sources/google-sheets.md b/docs/integrations/sources/google-sheets.md index a0b069530cf..bdcb43bd348 100644 --- a/docs/integrations/sources/google-sheets.md +++ b/docs/integrations/sources/google-sheets.md @@ -84,18 +84,18 @@ The Airbyte UI will ask for two things: 1. The spreadsheet ID 2. The content of the credentials JSON you created in the "Create a Service Account and Service Account Key" step above. This should be as simple as opening the file and copy-pasting all its contents into this field in the Airbyte UI. 
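Before pasting the credentials into Airbyte, it can be worth confirming that the service account can actually read the spreadsheet. The following is a minimal sketch only, assuming the `google-auth` and `google-api-python-client` packages; the key-file path and spreadsheet ID are placeholders, not values from the steps above:

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

# Placeholder values used only for illustration.
CREDENTIALS_FILE = "credentials.json"   # the JSON key downloaded above
SPREADSHEET_ID = "your-spreadsheet-id"  # taken from the sheet's URL

# A read-only scope is enough to verify access for the source connector.
SCOPES = ["https://www.googleapis.com/auth/spreadsheets.readonly"]

credentials = service_account.Credentials.from_service_account_file(
    CREDENTIALS_FILE, scopes=SCOPES
)
sheets = build("sheets", "v4", credentials=credentials)

# Fetch spreadsheet metadata; a 403 or 404 here usually means the sheet
# has not been shared with the service account's email address yet.
metadata = sheets.spreadsheets().get(spreadsheetId=SPREADSHEET_ID).execute()
print(metadata["properties"]["title"])
```

If this call succeeds, the same JSON key contents can be pasted into the credentials field in the Airbyte UI.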
- ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.2.5 | 2021-09-12 | [5972](https://github.com/airbytehq/airbyte/pull/5972) | Fix full_refresh test by adding supported_sync_modes to Stream initialization | -| 0.2.4 | 2021-08-05 | [5233](https://github.com/airbytehq/airbyte/pull/5233) | Fix error during listing sheets with diagram only | -| 0.2.3 | 2021-06-09 | [3973](https://github.com/airbytehq/airbyte/pull/3973) | Add AIRBYTE_ENTRYPOINT for Kubernetes support | -| 0.2.2 | 2021-04-20 | [2994](https://github.com/airbytehq/airbyte/pull/2994) | Formatting spec | -| 0.2.1 | 2021-04-03 | [2726](https://github.com/airbytehq/airbyte/pull/2726) | Fix base connector versioning | -| 0.2.0 | 2021-03-09 | [2238](https://github.com/airbytehq/airbyte/pull/2238) | Protocol allows future/unknown properties | -| 0.1.7 | 2021-01-21 | [1762](https://github.com/airbytehq/airbyte/pull/1762) | Fix issue large spreadsheet | -| 0.1.6 | 2021-01-27 | [1668](https://github.com/airbytehq/airbyte/pull/1668) | Adopt connector best practices | -| 0.1.5 | 2020-12-30 | [1438](https://github.com/airbytehq/airbyte/pull/1438) | Implement backoff | -| 0.1.4 | 2020-11-30 | [1046](https://github.com/airbytehq/airbyte/pull/1046) | Add connectors using an index YAML file | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.2.5 | 2021-09-12 | [5972](https://github.com/airbytehq/airbyte/pull/5972) | Fix full\_refresh test by adding supported\_sync\_modes to Stream initialization | +| 0.2.4 | 2021-08-05 | [5233](https://github.com/airbytehq/airbyte/pull/5233) | Fix error during listing sheets with diagram only | +| 0.2.3 | 2021-06-09 | [3973](https://github.com/airbytehq/airbyte/pull/3973) | Add AIRBYTE\_ENTRYPOINT for Kubernetes support | +| 0.2.2 | 2021-04-20 | [2994](https://github.com/airbytehq/airbyte/pull/2994) | Formatting spec | +| 0.2.1 | 2021-04-03 | [2726](https://github.com/airbytehq/airbyte/pull/2726) | Fix base connector versioning | +| 0.2.0 | 2021-03-09 | [2238](https://github.com/airbytehq/airbyte/pull/2238) | Protocol allows future/unknown properties | +| 0.1.7 | 2021-01-21 | [1762](https://github.com/airbytehq/airbyte/pull/1762) | Fix issue large spreadsheet | +| 0.1.6 | 2021-01-27 | [1668](https://github.com/airbytehq/airbyte/pull/1668) | Adopt connector best practices | +| 0.1.5 | 2020-12-30 | [1438](https://github.com/airbytehq/airbyte/pull/1438) | Implement backoff | +| 0.1.4 | 2020-11-30 | [1046](https://github.com/airbytehq/airbyte/pull/1046) | Add connectors using an index YAML file | + diff --git a/docs/integrations/sources/greenhouse.md b/docs/integrations/sources/greenhouse.md index 3e433d4f4e8..444101faccb 100644 --- a/docs/integrations/sources/greenhouse.md +++ b/docs/integrations/sources/greenhouse.md @@ -48,6 +48,7 @@ Please follow the [Greenhouse documentation for generating an API key](https://d ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.2.4 | 2021-09-15 | [6238](https://github.com/airbytehq/airbyte/pull/6238) | added identification of accessible streams for API keys with limited permissions | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.2.4 | 2021-09-15 | [6238](https://github.com/airbytehq/airbyte/pull/6238) | added identification of accessible streams for API keys with limited permissions | + diff --git a/docs/integrations/sources/harvest.md b/docs/integrations/sources/harvest.md index 
013b08bc6d0..b4bc7f88642 100644 --- a/docs/integrations/sources/harvest.md +++ b/docs/integrations/sources/harvest.md @@ -2,8 +2,7 @@ ## Overview -The Harvest connector can be used to sync your Harvest data. It supports full refresh sync for all streams and incremental sync for all streams except of Expense Reports streams which are: Clients Report, Projects Report, Categories Report, Team Report. -Incremental sync is also now available for Company stream, but it always has only one record. +The Harvest connector can be used to sync your Harvest data. It supports full refresh sync for all streams and incremental sync for all streams except of Expense Reports streams which are: Clients Report, Projects Report, Categories Report, Team Report. Incremental sync is also now available for Company stream, but it always has only one record. ### Output schema @@ -35,7 +34,6 @@ Several output streams are available from this source: * [Time Reports](https://help.getharvest.com/api-v2/reports-api/reports/time-reports/) * [Project Budget Report](https://help.getharvest.com/api-v2/reports-api/reports/project-budget-report/) - ### Features | Feature | Supported? | @@ -62,19 +60,19 @@ The Harvest connector will gracefully handle rate limits. For more information, This connector supports only authentication with API Key. To obtain API key follow the instructions below: 1. Go to Account Settings page; -1. Under Integrations section press Authorized OAuth2 API Clients button; -1. New page will be opened on which you need to click on Create New Personal Access Token button and follow instructions. +2. Under Integrations section press Authorized OAuth2 API Clients button; +3. New page will be opened on which you need to click on Create New Personal Access Token button and follow instructions. See [docs](https://help.getharvest.com/api-v2/authentication-api/authentication/authentication/) for more details. - ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.5 | 2021-09-28 | [5747](https://github.com/airbytehq/airbyte/pull/5747) | Update schema date-time fields | -| 0.1.4 | 2021-06-22 | [5701](https://github.com/airbytehq/airbyte/pull/5071) | Harvest normalization failure: fixing the schemas | -| 0.1.3 | 2021-06-22 | [4274](https://github.com/airbytehq/airbyte/pull/4274) | Fix wrong data type on `statement_key` in `clients` stream | -| 0.1.2 | 2021-06-07 | [4222](https://github.com/airbytehq/airbyte/pull/4222) | Correct specification parameter name | -| 0.1.1 | 2021-06-09 | [3973](https://github.com/airbytehq/airbyte/pull/3973) | Add `AIRBYTE_ENTRYPOINT` for Kubernetes support | -| 0.1.0 | 2021-06-07 | [3709](https://github.com/airbytehq/airbyte/pull/3709) | Release Harvest connector! 
| \ No newline at end of file +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.5 | 2021-09-28 | [5747](https://github.com/airbytehq/airbyte/pull/5747) | Update schema date-time fields | +| 0.1.4 | 2021-06-22 | [5701](https://github.com/airbytehq/airbyte/pull/5071) | Harvest normalization failure: fixing the schemas | +| 0.1.3 | 2021-06-22 | [4274](https://github.com/airbytehq/airbyte/pull/4274) | Fix wrong data type on `statement_key` in `clients` stream | +| 0.1.2 | 2021-06-07 | [4222](https://github.com/airbytehq/airbyte/pull/4222) | Correct specification parameter name | +| 0.1.1 | 2021-06-09 | [3973](https://github.com/airbytehq/airbyte/pull/3973) | Add `AIRBYTE_ENTRYPOINT` for Kubernetes support | +| 0.1.0 | 2021-06-07 | [3709](https://github.com/airbytehq/airbyte/pull/3709) | Release Harvest connector! | + diff --git a/docs/integrations/sources/hubspot.md b/docs/integrations/sources/hubspot.md index b35e0eb3c5b..092f9c34237 100644 --- a/docs/integrations/sources/hubspot.md +++ b/docs/integrations/sources/hubspot.md @@ -9,9 +9,9 @@ | Replicate Incremental Deletes | No | | SSL connection | Yes | -## Troubleshooting +## Troubleshooting -Check out common troubleshooting issues for the Hubspot connector on our Discourse [here](https://discuss.airbyte.io/tags/c/connector/11/source-hubspot). +Check out common troubleshooting issues for the Hubspot connector on our Discourse [here](https://discuss.airbyte.io/tags/c/connector/11/source-hubspot). ## Supported Tables @@ -22,7 +22,7 @@ This source is capable of syncing the following tables and their data: * [Contact Lists](http://developers.hubspot.com/docs/methods/lists/get_lists) * [Contacts](https://developers.hubspot.com/docs/methods/contacts/get_contacts) * [Deal Pipelines](https://developers.hubspot.com/docs/methods/pipelines/get_pipelines_for_object_type) -* [Deals](https://developers.hubspot.com/docs/api/crm/deals) (including Contact associations) +* [Deals](https://developers.hubspot.com/docs/api/crm/deals) \(including Contact associations\) * [Email Events](https://developers.hubspot.com/docs/methods/email/get_events) \(Incremental\) * [Engagements](https://legacydocs.hubspot.com/docs/methods/engagements/get-all-engagements) * [Forms](https://developers.hubspot.com/docs/api/marketing/forms) @@ -34,7 +34,7 @@ This source is capable of syncing the following tables and their data: * [Tickets](https://developers.hubspot.com/docs/api/crm/tickets) * [Workflows](https://legacydocs.hubspot.com/docs/methods/workflows/v3/get_workflows) -## Getting Started (Airbyte Open-Source / Airbyte Cloud) +## Getting Started \(Airbyte Open-Source / Airbyte Cloud\) #### Requirements @@ -55,7 +55,8 @@ The connector is restricted by normal Hubspot [rate limitations](https://legacyd When connector reads the stream using `API Key` that doesn't have neccessary permissions to read particular stream, like `workflows`, which requires to be enabled in order to be processed, the log message returned to the output and sync operation goes on with other streams available. Example of the output message when trying to read `workflows` stream with missing permissions for the `API Key`: -``` + +```text { "type": "LOG", "log": { @@ -70,12 +71,12 @@ Example of the output message when trying to read `workflows` stream with missin If you are using Oauth, most of the streams require the appropriate [scopes](https://legacydocs.hubspot.com/docs/methods/oauth2/initiate-oauth-integration#scopes) enabled for the API account. 
| Stream | Required Scope | -| :--- | :---- | +| :--- | :--- | | `campaigns` | `content` | | `companies` | `contacts` | | `contact_lists` | `contacts` | | `contacts` | `contacts` | -| `deal_pipelines` | either the `contacts` scope (to fetch deals pipelines) or the `tickets` scope. | +| `deal_pipelines` | either the `contacts` scope \(to fetch deals pipelines\) or the `tickets` scope. | | `deals` | `contacts` | | `email_events` | `content` | | `engagements` | `contacts` | @@ -90,15 +91,16 @@ If you are using Oauth, most of the streams require the appropriate [scopes](htt ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.16 | 2021-09-27 | [6465](https://github.com/airbytehq/airbyte/pull/6465) | Implement OAuth support. Use CDK authenticator instead of connector specific authenticator | -| 0.1.15 | 2021-09-23 | [6374](https://github.com/airbytehq/airbyte/pull/6374) | Use correct schema for `owners` stream | -| 0.1.14 | 2021-09-08 | [5693](https://github.com/airbytehq/airbyte/pull/5693) | Include deal_to_contact association when pulling deal stream and include contact ID in contact stream | -| 0.1.13 | 2021-09-08 | [5834](https://github.com/airbytehq/airbyte/pull/5834) | Fixed array fields without items property in schema | -| 0.1.12 | 2021-09-02 | [5798](https://github.com/airbytehq/airbyte/pull/5798) | Treat empty string values as None for field with format to fix normalization errors | -| 0.1.11 | 2021-08-26 | [5685](https://github.com/airbytehq/airbyte/pull/5685) | Remove all date-time format from schemas | -| 0.1.10 | 2021-08-17 | [5463](https://github.com/airbytehq/airbyte/pull/5463) | Fix fail on reading stream using `API Key` without required permissions | -| 0.1.9 | 2021-08-11 | [5334](https://github.com/airbytehq/airbyte/pull/5334) | Fix empty strings inside float datatype | -| 0.1.8 | 2021-08-06 | [5250](https://github.com/airbytehq/airbyte/pull/5250) | Fix issue with printing exceptions | -| 0.1.7 | 2021-07-27 | [4913](https://github.com/airbytehq/airbyte/pull/4913) | Update fields schema | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.16 | 2021-09-27 | [6465](https://github.com/airbytehq/airbyte/pull/6465) | Implement OAuth support. 
Use CDK authenticator instead of connector specific authenticator | +| 0.1.15 | 2021-09-23 | [6374](https://github.com/airbytehq/airbyte/pull/6374) | Use correct schema for `owners` stream | +| 0.1.14 | 2021-09-08 | [5693](https://github.com/airbytehq/airbyte/pull/5693) | Include deal\_to\_contact association when pulling deal stream and include contact ID in contact stream | +| 0.1.13 | 2021-09-08 | [5834](https://github.com/airbytehq/airbyte/pull/5834) | Fixed array fields without items property in schema | +| 0.1.12 | 2021-09-02 | [5798](https://github.com/airbytehq/airbyte/pull/5798) | Treat empty string values as None for field with format to fix normalization errors | +| 0.1.11 | 2021-08-26 | [5685](https://github.com/airbytehq/airbyte/pull/5685) | Remove all date-time format from schemas | +| 0.1.10 | 2021-08-17 | [5463](https://github.com/airbytehq/airbyte/pull/5463) | Fix fail on reading stream using `API Key` without required permissions | +| 0.1.9 | 2021-08-11 | [5334](https://github.com/airbytehq/airbyte/pull/5334) | Fix empty strings inside float datatype | +| 0.1.8 | 2021-08-06 | [5250](https://github.com/airbytehq/airbyte/pull/5250) | Fix issue with printing exceptions | +| 0.1.7 | 2021-07-27 | [4913](https://github.com/airbytehq/airbyte/pull/4913) | Update fields schema | + diff --git a/docs/integrations/sources/instagram.md b/docs/integrations/sources/instagram.md index d4f4edbb987..a87e3c7f0e2 100644 --- a/docs/integrations/sources/instagram.md +++ b/docs/integrations/sources/instagram.md @@ -81,9 +81,10 @@ With the Instagram Account ID and API access token, you should be ready to start ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.9 | 2021-09-30 | [6438](https://github.com/airbytehq/airbyte/pull/6438) | Annotate Oauth2 flow initialization parameters in connector specification | -| 0.1.8 | 2021-08-11 | [5354](https://github.com/airbytehq/airbyte/pull/5354) | added check for empty state and fixed tests.| -| 0.1.7 | 2021-07-19 | [4805](https://github.com/airbytehq/airbyte/pull/4805) | Add support for previous format of STATE.| -| 0.1.6 | 2021-07-07 | [4210](https://github.com/airbytehq/airbyte/pull/4210) | Refactor connector to use CDK:
- improve error handling.
- fix sync fail with HTTP status 400.
- integrate SAT.| +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.9 | 2021-09-30 | [6438](https://github.com/airbytehq/airbyte/pull/6438) | Annotate Oauth2 flow initialization parameters in connector specification | +| 0.1.8 | 2021-08-11 | [5354](https://github.com/airbytehq/airbyte/pull/5354) | added check for empty state and fixed tests. | +| 0.1.7 | 2021-07-19 | [4805](https://github.com/airbytehq/airbyte/pull/4805) | Add support for previous format of STATE. | +| 0.1.6 | 2021-07-07 | [4210](https://github.com/airbytehq/airbyte/pull/4210) | Refactor connector to use CDK: - improve error handling. - fix sync fail with HTTP status 400. - integrate SAT. | + diff --git a/docs/integrations/sources/intercom.md b/docs/integrations/sources/intercom.md index c9187ea0acd..69332587df2 100644 --- a/docs/integrations/sources/intercom.md +++ b/docs/integrations/sources/intercom.md @@ -12,7 +12,7 @@ Several output streams are available from this source: * [Admins](https://developers.intercom.com/intercom-api-reference/reference#list-admins) \(Full table\) * [Companies](https://developers.intercom.com/intercom-api-reference/reference#list-companies) \(Incremental\) - * [Company Segments](https://developers.intercom.com/intercom-api-reference/reference#list-attached-segments-1) \(Incremental\) + * [Company Segments](https://developers.intercom.com/intercom-api-reference/reference#list-attached-segments-1) \(Incremental\) * [Conversations](https://developers.intercom.com/intercom-api-reference/reference#list-conversations) \(Incremental\) * [Conversation Parts](https://developers.intercom.com/intercom-api-reference/reference#get-a-single-conversation) \(Incremental\) * [Data Attributes](https://developers.intercom.com/intercom-api-reference/reference#data-attributes) \(Full table\) @@ -53,11 +53,12 @@ Please read [How to get your Access Token](https://developers.intercom.com/build ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.5 | 2021-09-28 | [6082](https://github.com/airbytehq/airbyte/pull/6082) | Corrected android_last_seen_at field data type in schemas | -| 0.1.4 | 2021-09-20 | [6087](https://github.com/airbytehq/airbyte/pull/6087) | Corrected updated_at field data type in schemas | -| 0.1.3 | 2021-09-08 | [5908](https://github.com/airbytehq/airbyte/pull/5908) | Corrected timestamp and arrays in schemas | -| 0.1.2 | 2021-08-19 | [5531](https://github.com/airbytehq/airbyte/pull/5531) | Corrected pagination | -| 0.1.1 | 2021-07-31 | [5123](https://github.com/airbytehq/airbyte/pull/5123) | Corrected rate limit | -| 0.1.0 | 2021-07-19 | [4676](https://github.com/airbytehq/airbyte/pull/4676) | Release Slack CDK Connector | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.5 | 2021-09-28 | [6082](https://github.com/airbytehq/airbyte/pull/6082) | Corrected android\_last\_seen\_at field data type in schemas | +| 0.1.4 | 2021-09-20 | [6087](https://github.com/airbytehq/airbyte/pull/6087) | Corrected updated\_at field data type in schemas | +| 0.1.3 | 2021-09-08 | [5908](https://github.com/airbytehq/airbyte/pull/5908) | Corrected timestamp and arrays in schemas | +| 0.1.2 | 2021-08-19 | [5531](https://github.com/airbytehq/airbyte/pull/5531) | Corrected pagination | +| 0.1.1 | 2021-07-31 | [5123](https://github.com/airbytehq/airbyte/pull/5123) | Corrected rate limit | +| 0.1.0 | 2021-07-19 | [4676](https://github.com/airbytehq/airbyte/pull/4676) | Release Slack CDK Connector | + diff 
--git a/docs/integrations/sources/iterable.md b/docs/integrations/sources/iterable.md index 9c249c82ef2..01f6cbfe262 100644 --- a/docs/integrations/sources/iterable.md +++ b/docs/integrations/sources/iterable.md @@ -57,6 +57,7 @@ Please read [How to find your API key](https://support.iterable.com/hc/en-us/art ## CHANGELOG | Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| `0.1.8` | 2021-09-20 | [5915](https://github.com/airbytehq/airbyte/pull/5915) | Add new streams: campaign_metrics, events | +| :--- | :--- | :--- | :--- | +| `0.1.8` | 2021-09-20 | [5915](https://github.com/airbytehq/airbyte/pull/5915) | Add new streams: campaign\_metrics, events | | `0.1.7` | 2021-09-20 | [6242](https://github.com/airbytehq/airbyte/pull/6242) | Updated schema for: campaigns, lists, templates, metadata | + diff --git a/docs/integrations/sources/jira.md b/docs/integrations/sources/jira.md index 177c3e02580..506110e80fd 100644 --- a/docs/integrations/sources/jira.md +++ b/docs/integrations/sources/jira.md @@ -3,10 +3,10 @@ ## Features | Feature | Supported? | | -| :--- | :--- | :--- +| :--- | :--- | :--- | | Full Refresh Sync | Yes | | | Incremental Sync | Yes | Only Issues | -| Replicate Incremental Deletes | Coming soon | | +| Replicate Incremental Deletes | Coming soon | | | SSL connection | Yes | | ## Troubleshooting @@ -67,7 +67,7 @@ This source is capable of syncing the following tables and their data: If there are more endpoints you'd like Airbyte to support, please [create an issue.](https://github.com/airbytehq/airbyte/issues/new/choose) -## Getting Started (Airbyte Open-Source / Airbyte Cloud) +## Getting Started \(Airbyte Open-Source / Airbyte Cloud\) ### Requirements @@ -84,12 +84,13 @@ The Jira connector should not run into Jira API limitations under normal usage. ## CHANGELOG | Version | Date | Pull Request | Subject | -| :--- | :--- | :--- | :--- | -| 0.2.11 | 2021-09-02 | [#6523](https://github.com/airbytehq/airbyte/pull/6523) | Add cache and more streams (boards and sprints) | -| 0.2.9 | 2021-07-28 | [#5426](https://github.com/airbytehq/airbyte/pull/5426) | Changed cursor field from fields.created to fields.updated for Issues stream. Made Issues worklogs stream full refresh. | -| 0.2.8 | 2021-07-28 | [#4947](https://github.com/airbytehq/airbyte/pull/4947) | Source Jira: fixing schemas accordinately to response. | -| 0.2.7 | 2021-07-19 | [#4817](https://github.com/airbytehq/airbyte/pull/4817) | Fixed `labels` schema properties issue. | -| 0.2.6 | 2021-06-15 | [#4113](https://github.com/airbytehq/airbyte/pull/4113) | Fixed `user` stream with the correct endpoint and query param. | -| 0.2.5 | 2021-06-09 | [#3973](https://github.com/airbytehq/airbyte/pull/3973) | Added `AIRBYTE_ENTRYPOINT` in base Docker image for Kubernetes support. | -| 0.2.4 | | | Implementing base_read acceptance test dived by stream groups. | -| 0.2.3 | | | Implementing incremental sync. Migrated to airbyte-cdk. Adding all available entities in Jira Cloud. | +| :--- | :--- | :--- | :--- | +| 0.2.11 | 2021-09-02 | [\#6523](https://github.com/airbytehq/airbyte/pull/6523) | Add cache and more streams \(boards and sprints\) | +| 0.2.9 | 2021-07-28 | [\#5426](https://github.com/airbytehq/airbyte/pull/5426) | Changed cursor field from fields.created to fields.updated for Issues stream. Made Issues worklogs stream full refresh. | +| 0.2.8 | 2021-07-28 | [\#4947](https://github.com/airbytehq/airbyte/pull/4947) | Source Jira: fixing schemas accordinately to response. 
| +| 0.2.7 | 2021-07-19 | [\#4817](https://github.com/airbytehq/airbyte/pull/4817) | Fixed `labels` schema properties issue. | +| 0.2.6 | 2021-06-15 | [\#4113](https://github.com/airbytehq/airbyte/pull/4113) | Fixed `user` stream with the correct endpoint and query param. | +| 0.2.5 | 2021-06-09 | [\#3973](https://github.com/airbytehq/airbyte/pull/3973) | Added `AIRBYTE_ENTRYPOINT` in base Docker image for Kubernetes support. | +| 0.2.4 | | | Implementing base\_read acceptance test dived by stream groups. | +| 0.2.3 | | | Implementing incremental sync. Migrated to airbyte-cdk. Adding all available entities in Jira Cloud. | + diff --git a/docs/integrations/sources/kafka.md b/docs/integrations/sources/kafka.md index d4e2379541f..a7971b615d4 100644 --- a/docs/integrations/sources/kafka.md +++ b/docs/integrations/sources/kafka.md @@ -10,8 +10,7 @@ The Airbyte Kafka source allows you to sync data from Kafka. Each Kafka topic is Each Kafka topic will be output into a stream. -Currently, this connector only reads data with JSON format. More formats (e.g. Apache Avro) will be supported in -the future. +Currently, this connector only reads data with JSON format. More formats \(e.g. Apache Avro\) will be supported in the future. #### Features @@ -41,18 +40,15 @@ Airbyte should be allowed to read messages from topics, and these topics should #### Target topics -You can determine the topics from which messages are read via the `topic_pattern` configuration parameter. -Messages can be read from a hardcoded, pre-defined topic. +You can determine the topics from which messages are read via the `topic_pattern` configuration parameter. Messages can be read from a hardcoded, pre-defined topic. -To read all messages from a single hardcoded topic, enter its name in the `topic_pattern` field -e.g: setting `topic_pattern` to `my-topic-name` will read all messages from that topic. +To read all messages from a single hardcoded topic, enter its name in the `topic_pattern` field e.g: setting `topic_pattern` to `my-topic-name` will read all messages from that topic. -You can determine the topic partitions from which messages are read via the `topic_partitions` configuration parameter. +You can determine the topic partitions from which messages are read via the `topic_partitions` configuration parameter. ### Setup the Kafka destination in Airbyte -You should now have all the requirements needed to configure Kafka as a destination in the UI. You can configure the -following parameters on the Kafka destination (though many of these are optional or have default values): +You should now have all the requirements needed to configure Kafka as a destination in the UI. You can configure the following parameters on the Kafka destination \(though many of these are optional or have default values\): * **Bootstrap servers** * **Topic pattern** @@ -74,3 +70,4 @@ following parameters on the Kafka destination (though many of these are optional More info about this can be found in the [Kafka consumer configs documentation site](https://kafka.apache.org/documentation/#consumerconfigs). ## Changelog + diff --git a/docs/integrations/sources/klaviyo.md b/docs/integrations/sources/klaviyo.md index 4c5bc5afc25..dc7ce68f901 100644 --- a/docs/integrations/sources/klaviyo.md +++ b/docs/integrations/sources/klaviyo.md @@ -2,8 +2,7 @@ ## Sync overview -This source can sync data for the [Klaviyo API](https://apidocs.klaviyo.com/reference/api-overview). It supports both Full Refresh and Incremental syncs. 
-You can choose if this connector will copy only the new or updated data, or all rows in the tables and columns you set up for replication, every time a sync is run. +This source can sync data for the [Klaviyo API](https://apidocs.klaviyo.com/reference/api-overview). It supports both Full Refresh and Incremental syncs. You can choose if this connector will copy only the new or updated data, or all rows in the tables and columns you set up for replication, every time a sync is run. ### Output schema @@ -28,15 +27,14 @@ This Source is capable of syncing the following core Streams: | Feature | Supported?\(Yes/No\) | Notes | | :--- | :--- | :--- | | Full Refresh Sync | Yes | | -| Incremental Sync | Yes | Only Events | +| Incremental Sync | Yes | Only Events | | Namespaces | No | | ### Performance considerations The connector is restricted by normal Klaviyo [requests limitation](https://apidocs.klaviyo.com/reference/api-overview#rate-limits). -The Klaviyo connector should not run into Klaviyo API limitations under normal usage. -Please [create an issue](https://github.com/airbytehq/airbyte/issues) if you see any rate limit issues that are not automatically retried successfully. +The Klaviyo connector should not run into Klaviyo API limitations under normal usage. Please [create an issue](https://github.com/airbytehq/airbyte/issues) if you see any rate limit issues that are not automatically retried successfully. ## Getting started @@ -46,5 +44,5 @@ Please [create an issue](https://github.com/airbytehq/airbyte/issues) if you see ### Setup guide -Please follow these [steps](https://help.klaviyo.com/hc/en-us/articles/115005062267-How-to-Manage-Your-Account-s-API-Keys#your-private-api-keys3) -to obtain Private API Key for your account. +Please follow these [steps](https://help.klaviyo.com/hc/en-us/articles/115005062267-How-to-Manage-Your-Account-s-API-Keys#your-private-api-keys3) to obtain Private API Key for your account. 
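As a quick sanity check of the Private API Key outside Airbyte, a sketch along these lines can be used. It assumes the legacy `a.klaviyo.com` REST endpoints that accept the key as an `api_key` query parameter (as Klaviyo documented them at the time) and adds a small backoff on HTTP 429, mirroring the rate-limit note above:

```python
import time

import requests

API_KEY = "pk_..."  # placeholder for your Private API Key


def get_with_backoff(url, params, max_retries=5):
    """Retry on HTTP 429 with a simple exponential backoff."""
    for attempt in range(max_retries):
        response = requests.get(url, params=params, timeout=30)
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        time.sleep(2 ** attempt)
    raise RuntimeError("Klaviyo rate limit still exceeded after retries")


# Listing metrics is a lightweight way to confirm the key works.
data = get_with_backoff(
    "https://a.klaviyo.com/api/v1/metrics",
    params={"api_key": API_KEY, "count": 1},
)
print(data)
```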
+ diff --git a/docs/integrations/sources/kustomer.md b/docs/integrations/sources/kustomer.md index 8cbf64e189d..82c8c57633c 100644 --- a/docs/integrations/sources/kustomer.md +++ b/docs/integrations/sources/kustomer.md @@ -12,33 +12,35 @@ This Source Connector is based on a [Singer tap](https://github.com/singer-io/ta This Source is capable of syncing the following core Streams: -- [Conversations](https://developer.kustomer.com/kustomer-api-docs/reference/conversations) -- [Customers](https://developer.kustomer.com/kustomer-api-docs/reference/customers) -- [KObjects](https://developer.kustomer.com/kustomer-api-docs/reference/kobjects-custom-objects) -- [Messages](https://developer.kustomer.com/kustomer-api-docs/reference/messages) -- [Notes](https://developer.kustomer.com/kustomer-api-docs/reference/notes) -- [Shortcuts](https://developer.kustomer.com/kustomer-api-docs/reference/shortcuts) -- [Tags](https://developer.kustomer.com/kustomer-api-docs/reference/tags-knowledge-base) -- [Teams](https://developer.kustomer.com/kustomer-api-docs/reference/teams) -- [Users](https://developer.kustomer.com/kustomer-api-docs/reference/users) +* [Conversations](https://developer.kustomer.com/kustomer-api-docs/reference/conversations) +* [Customers](https://developer.kustomer.com/kustomer-api-docs/reference/customers) +* [KObjects](https://developer.kustomer.com/kustomer-api-docs/reference/kobjects-custom-objects) +* [Messages](https://developer.kustomer.com/kustomer-api-docs/reference/messages) +* [Notes](https://developer.kustomer.com/kustomer-api-docs/reference/notes) +* [Shortcuts](https://developer.kustomer.com/kustomer-api-docs/reference/shortcuts) +* [Tags](https://developer.kustomer.com/kustomer-api-docs/reference/tags-knowledge-base) +* [Teams](https://developer.kustomer.com/kustomer-api-docs/reference/teams) +* [Users](https://developer.kustomer.com/kustomer-api-docs/reference/users) ### Features -| Feature | Supported?\(Yes/No\) | Notes | -| :------------------------ | :------------------- | :---- | -| Full Refresh Sync | Yes | | -| Incremental - Append Sync | Yes | | -| Namespaces | No | | +| Feature | Supported?\(Yes/No\) | Notes | +| :--- | :--- | :--- | +| Full Refresh Sync | Yes | | +| Incremental - Append Sync | Yes | | +| Namespaces | No | | ### Performance considerations Kustomer has some [rate limit restrictions](https://developer.kustomer.com/kustomer-api-docs/reference/rate-limiting). ## Requirements + * **Kustomer API token**. See the [Kustomer docs](https://help.kustomer.com/api-keys-SJs5YTIWX) for information on how to obtain an API token. 
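A short sketch for verifying the API token before configuring the source; it assumes the `requests` package, the `api.kustomerapp.com` base URL and Bearer-token authentication described in the Kustomer docs linked above, and uses the Customers endpoint only because it is one of the streams listed earlier:

```python
import requests

API_TOKEN = "<your Kustomer API token>"  # placeholder only

# Kustomer expects the token as a Bearer token in the Authorization header.
response = requests.get(
    "https://api.kustomerapp.com/v1/customers",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=30,
)
response.raise_for_status()

# Rate-limit headers, if present, show how close you are to the
# restrictions mentioned above.
print(response.headers.get("x-ratelimit-remaining"))
print(len(response.json().get("data", [])), "customers returned on the first page")
```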
## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :--------- | :----------------------------------------------------- | :---------------------------- | -| 0.1.0 | 2021-07-22 | [4550](https://github.com/airbytehq/airbyte/pull/4550) | Add Kustomer Source Connector | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.0 | 2021-07-22 | [4550](https://github.com/airbytehq/airbyte/pull/4550) | Add Kustomer Source Connector | + diff --git a/docs/integrations/sources/lever-hiring.md b/docs/integrations/sources/lever-hiring.md index 2c3534d93e7..2c18aafb0c9 100644 --- a/docs/integrations/sources/lever-hiring.md +++ b/docs/integrations/sources/lever-hiring.md @@ -10,22 +10,22 @@ This source can sync data for the [Lever Hiring API](https://hire.lever.co/devel This Source is capable of syncing the following core Streams: -- [Applications](https://hire.lever.co/developer/documentation#list-all-applications) -- [Interviews](https://hire.lever.co/developer/documentation#list-all-interviews) -- [Notes](https://hire.lever.co/developer/documentation#list-all-notes) -- [Offers](https://hire.lever.co/developer/documentation#list-all-offers) -- [Opportunities](https://hire.lever.co/developer/documentation#list-all-opportunities) -- [Referrals](https://hire.lever.co/developer/documentation#list-all-referrals) -- [Users](https://hire.lever.co/developer/documentation#list-all-users) +* [Applications](https://hire.lever.co/developer/documentation#list-all-applications) +* [Interviews](https://hire.lever.co/developer/documentation#list-all-interviews) +* [Notes](https://hire.lever.co/developer/documentation#list-all-notes) +* [Offers](https://hire.lever.co/developer/documentation#list-all-offers) +* [Opportunities](https://hire.lever.co/developer/documentation#list-all-opportunities) +* [Referrals](https://hire.lever.co/developer/documentation#list-all-referrals) +* [Users](https://hire.lever.co/developer/documentation#list-all-users) ### Features -| Feature | Supported?\(Yes/No\) | Notes | -| :------------------------ | :------------------- | :---- | -| Full Refresh Sync | Yes | | -| Incremental - Append Sync | Yes | | -| SSL connection | Yes | | -| Namespaces | No | | +| Feature | Supported?\(Yes/No\) | Notes | +| :--- | :--- | :--- | +| Full Refresh Sync | Yes | | +| Incremental - Append Sync | Yes | | +| SSL connection | Yes | | +| Namespaces | No | | ### Performance considerations @@ -41,6 +41,7 @@ The Lever Hiring connector should not run into Lever Hiring API limitations unde ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :--------- | :----------------------------------------------------- | :---------------------------- | -| 0.1.0 | 2021-09-22 | [6141](https://github.com/airbytehq/airbyte/pull/6141) | Add Lever Hiring Source Connector | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.0 | 2021-09-22 | [6141](https://github.com/airbytehq/airbyte/pull/6141) | Add Lever Hiring Source Connector | + diff --git a/docs/integrations/sources/linkedin-ads.md b/docs/integrations/sources/linkedin-ads.md index 902b2e991cd..9039a1f2be1 100644 --- a/docs/integrations/sources/linkedin-ads.md +++ b/docs/integrations/sources/linkedin-ads.md @@ -4,12 +4,12 @@ The LinkedIn Ads source supports both Full Refresh and Incremental syncs. You can choose if this connector will copy only the new or updated data, or all rows in the tables and columns you set up for replication, every time a sync is run. 
-This Source Connector is based on a [Airbyte CDK](https://docs.airbyte.io/connector-development/cdk-python). -Airbyte uses [LinkedIn Marketing Developer Platform - API](https://docs.microsoft.com/en-us/linkedin/marketing/integrations/marketing-integrations-overview) to fetch data from LinkedIn Ads. +This Source Connector is based on a [Airbyte CDK](https://docs.airbyte.io/connector-development/cdk-python). Airbyte uses [LinkedIn Marketing Developer Platform - API](https://docs.microsoft.com/en-us/linkedin/marketing/integrations/marketing-integrations-overview) to fetch data from LinkedIn Ads. ### Output schema This Source is capable of syncing the following data as streams: + * [Accounts](https://docs.microsoft.com/en-us/linkedin/marketing/integrations/ads/account-structure/create-and-manage-accounts?tabs=http#search-for-accounts) * [Account Users](https://docs.microsoft.com/en-us/linkedin/marketing/integrations/ads/account-structure/create-and-manage-account-users?tabs=http#find-ad-account-users-by-accounts) * [Campaign Groups](https://docs.microsoft.com/en-us/linkedin/marketing/integrations/ads/account-structure/create-and-manage-campaign-groups?tabs=http#search-for-campaign-groups) @@ -19,8 +19,8 @@ This Source is capable of syncing the following data as streams: * [Ad Analytics by Campaign](https://docs.microsoft.com/en-us/linkedin/marketing/integrations/ads-reporting/ads-reporting?tabs=curl#ad-analytics) * [Ad Analytics by Creative](https://docs.microsoft.com/en-us/linkedin/marketing/integrations/ads-reporting/ads-reporting?tabs=curl#ad-analytics) - ### NOTE: + `Ad Direct Sponsored Contents` includes the information about VIDEO ADS, as well as SINGLE IMAGE ADS and other directly sponsored ads your account might have. ### Data type mapping @@ -40,83 +40,89 @@ This Source is capable of syncing the following data as streams: | Feature | Supported?\(Yes/No\) | Notes | | :--- | :--- | :--- | | Full Refresh Overwrite Sync | Yes | | -| Full Refresh Append Sync | Yes | | +| Full Refresh Append Sync | Yes | | | Incremental - Append Sync | Yes | | | Incremental - Append + Deduplication Sync | Yes | | | Namespaces | No | | - ### Performance considerations -There are official Rate Limits for LinkedIn Ads API Usage, [more information here](https://docs.microsoft.com/en-us/linkedin/shared/api-guide/concepts/rate-limits?context=linkedin/marketing/context). -Rate limited requests will receive a 429 response. In rare cases, LinkedIn may also return a 429 response as part of infrastructure protection. API service will return to normal automatically. -In such cases you will receive the next error message: -``` +There are official Rate Limits for LinkedIn Ads API Usage, [more information here](https://docs.microsoft.com/en-us/linkedin/shared/api-guide/concepts/rate-limits?context=linkedin/marketing/context). Rate limited requests will receive a 429 response. In rare cases, LinkedIn may also return a 429 response as part of infrastructure protection. API service will return to normal automatically. In such cases you will receive the next error message: + +```text "Caught retryable error ' or null' after tries. Waiting seconds then retrying..." ``` -This is expected when the connector hits the 429 - Rate Limit Exceeded HTTP Error. -If the maximum of available API requests capacity is reached, you will have the following message: -``` + +This is expected when the connector hits the 429 - Rate Limit Exceeded HTTP Error. 
If the maximum of available API requests capacity is reached, you will have the following message: + +```text "Max try rate limit exceded..." ``` -After 5 unsuccessful attempts - the connector will stop the sync operation. -In such cases check your Rate Limits [on this page](https://www.linkedin.com/developers/apps) > Choose you app > Analytics +After 5 unsuccessful attempts - the connector will stop the sync operation. In such cases check your Rate Limits [on this page](https://www.linkedin.com/developers/apps) > Choose you app > Analytics ## Getting started ### Authentication + The source LinkedIn uses `access_token` provided in the UI connector's settings to make API requests. Access tokens expire after `2 months from generating date (60 days)` and require a user to manually authenticate again. If you receive a `401 invalid token response`, the error logs will state that your access token has expired and to re-authenticate your connection to generate a new token. This is described more [here](https://docs.microsoft.com/en-us/linkedin/shared/authentication/authorization-code-flow?context=linkedin/context). The API user account should be assigned one of the following roles: -* ACCOUNT_BILLING_ADMIN -* ACCOUNT_MANAGER -* CAMPAIGN_MANAGER -* CREATIVE_MANAGER -* VIEWER (Recommended) + +* ACCOUNT\_BILLING\_ADMIN +* ACCOUNT\_MANAGER +* CAMPAIGN\_MANAGER +* CREATIVE\_MANAGER +* VIEWER \(Recommended\) The API user account should be assigned the following permissions for the API endpoints: -Endpoints such as: -`Accounts`, `Account Users`, `Ad Direct Sponsored Contents`, `Campaign Groups`, `Campaings`, `Creatives` requires the next permissions set: +Endpoints such as: `Accounts`, `Account Users`, `Ad Direct Sponsored Contents`, `Campaign Groups`, `Campaings`, `Creatives` requires the next permissions set: - * `r_ads`: read ads (Recommended), `rw_ads`: read-write ads +* `r_ads`: read ads \(Recommended\), `rw_ads`: read-write ads Endpoints such as: `Ad Analytics by Campaign`, `Ad Analytics by Creatives` requires the next permissions set: - * `r_ads_reporting`: read ads reporting +* `r_ads_reporting`: read ads reporting The complete set of prmissions is: - * `r_emailaddress,r_liteprofile,r_ads,r_ads_reporting,r_organization_social` -### Generate the Access_Token +* `r_emailaddress,r_liteprofile,r_ads,r_ads_reporting,r_organization_social` + +### Generate the Access\_Token + 1. **Login to LinkedIn as the API user.** 2. **Create an App** [here](https://www.linkedin.com/developers/apps): - * `App Name`: airbyte-source - * `Company`: search and find your company LinkedIn page - * `Privacy policy URL`: link to company privacy policy - * `Business email`: developer/admin email address - * `App logo`: Airbyte's (or Company's) logo - * `Products`: Select [Marketing Developer Platform](https://www.linkedin.com/developers/apps/122632736/products/marketing-developer-platform) (checkbox) - Review/agree to legal terms and create app. + * `App Name`: airbyte-source + * `Company`: search and find your company LinkedIn page + * `Privacy policy URL`: link to company privacy policy + * `Business email`: developer/admin email address + * `App logo`: Airbyte's \(or Company's\) logo + * `Products`: Select [Marketing Developer Platform](https://www.linkedin.com/developers/apps/122632736/products/marketing-developer-platform) \(checkbox\) + + Review/agree to legal terms and create app. 3. **Verify App**: - * Provide the verify URL to your Company's LinkedIn Admin to verify and authorize the app. 
- * Once verified, select the App in the Console [here](https://www.linkedin.com/developers/apps). - * **Review the `Auth` tab**: - * Record `client_id` and `client_secret` (for later steps). - * Review permissions and ensure app has the permissions (above). - * Oauth 2.0 settings: Provide a `redirect_uri` (for later steps): `https://airbyte.io` - * Review the `Products` tab and ensure `Marketing Developer Platform` has been added and approved (listed in the `Products` section/tab). - * Review the `Usage & limits` tab. This shows the daily application and user/member limits with percent used for each resource endpoint. -4. **Authorize App**: (The authorization token `lasts 60-days before expiring`. The connector app will need to be reauthorized when the authorization token expires.): + * Provide the verify URL to your Company's LinkedIn Admin to verify and authorize the app. + * Once verified, select the App in the Console [here](https://www.linkedin.com/developers/apps). + * **Review the `Auth` tab**: + * Record `client_id` and `client_secret` \(for later steps\). + * Review permissions and ensure app has the permissions \(above\). + * Oauth 2.0 settings: Provide a `redirect_uri` \(for later steps\): `https://airbyte.io` + * Review the `Products` tab and ensure `Marketing Developer Platform` has been added and approved \(listed in the `Products` section/tab\). + * Review the `Usage & limits` tab. This shows the daily application and user/member limits with percent used for each resource endpoint. +4. **Authorize App**: \(The authorization token `lasts 60-days before expiring`. The connector app will need to be reauthorized when the authorization token expires.\): + Create an Authorization URL with the following pattern: - * The permissions set you need to use is: `r_emailaddress,r_liteprofile,r_ads,r_ads_reporting,r_organization_social` - * URL pattern: Provide the scope from permissions above (with + delimiting each permission) and replace the other highlighted parameters: `https://www.linkedin.com/oauth/v2/authorization?response_type=code&client_id=YOUR_CLIENT_ID&redirect_uri=YOUR_REDIRECT_URI&scope=r_emailaddress,r_liteprofile,r_ads,r_ads_reporting,r_organization_social` - * Modify and open the `url` in the browser. - * Once redirected, click `Allow` to authorize app. - * The browser will be redirected to the `redirect_uri`. Record the `code` parameter listed in the redirect URL in the Browser header URL. + + * The permissions set you need to use is: `r_emailaddress,r_liteprofile,r_ads,r_ads_reporting,r_organization_social` + * URL pattern: Provide the scope from permissions above \(with + delimiting each permission\) and replace the other highlighted parameters: `https://www.linkedin.com/oauth/v2/authorization?response_type=code&client_id=YOUR_CLIENT_ID&redirect_uri=YOUR_REDIRECT_URI&scope=r_emailaddress,r_liteprofile,r_ads,r_ads_reporting,r_organization_social` + * Modify and open the `url` in the browser. + * Once redirected, click `Allow` to authorize app. + * The browser will be redirected to the `redirect_uri`. Record the `code` parameter listed in the redirect URL in the Browser header URL. + 5. **Run the following curl command** using `Terminal` or `Command line` with the parameters replaced to return your `access_token`. The `access_token` expires in 2-months. 
- ``` + + ```text curl -0 -v -X POST https://www.linkedin.com/oauth/v2/accessToken\ -H "Accept: application/json"\ -H "application/x-www-form-urlencoded"\ @@ -125,12 +131,14 @@ The complete set of prmissions is: -d "client_id=YOUR_CLIENT_ID"\ -d "client_secret=YOUR_CLIENT_SECRET"\ -d "redirect_uri=YOUR_REDIRECT_URI" - ``` + ``` + 6. **Use the `access_token`** from response from the `Step 5` to autorize LinkedIn Ads connector. ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :-------- | :------ | -| 0.1.1 | 2021-10-02 | [6610](https://github.com/airbytehq/airbyte/pull/6610) | Fix for `Campaigns/targetingCriteria` transformation, coerced `Creatives/variables/values` to string by default | -| 0.1.0 | 2021-09-05 | [5285](https://github.com/airbytehq/airbyte/pull/5285) | Initial release of Native LinkedIn Ads connector for Airbyte | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.1 | 2021-10-02 | [6610](https://github.com/airbytehq/airbyte/pull/6610) | Fix for `Campaigns/targetingCriteria` transformation, coerced `Creatives/variables/values` to string by default | +| 0.1.0 | 2021-09-05 | [5285](https://github.com/airbytehq/airbyte/pull/5285) | Initial release of Native LinkedIn Ads connector for Airbyte | + diff --git a/docs/integrations/sources/looker.md b/docs/integrations/sources/looker.md index f3a691ce282..323e729a242 100644 --- a/docs/integrations/sources/looker.md +++ b/docs/integrations/sources/looker.md @@ -77,11 +77,12 @@ Please read the "API3 Key" section in [Looker's information for users docs](http ## CHANGELOG | Version | Date | Pull Request | Subject | -| :--- | :--- | :--- | :--- | -| 0.2.4 | 2021-06-25 | [#3911](https://github.com/airbytehq/airbyte/pull/3911) | Added `run_look` endpoint. | -| 0.2.3 | 2021-06-22 | [#3587](https://github.com/airbytehq/airbyte/pull/3587) | Added support for self-hosted instances. | -| 0.2.2 | 2021-06-09 | [#3973](https://github.com/airbytehq/airbyte/pull/3973) | Added `AIRBYTE_ENTRYPOINT` for kubernetes support. | -| 0.2.1 | 2021-04-02 | [#2726](https://github.com/airbytehq/airbyte/pull/2726) | Fixed connector base versioning. | -| 0.2.0 | 2021-03-09 | [#2238](https://github.com/airbytehq/airbyte/pull/2238) | Allowed future / unknown properties in the protocol. | -| 0.1.1 | 2021-01-27 | [#1857](https://github.com/airbytehq/airbyte/pull/1857) | Fix failed CI tests. | -| 0.1.0 | 2020-12-24 | [#1441](https://github.com/airbytehq/airbyte/pull/1441) | Added looker connector. | +| :--- | :--- | :--- | :--- | +| 0.2.4 | 2021-06-25 | [\#3911](https://github.com/airbytehq/airbyte/pull/3911) | Added `run_look` endpoint. | +| 0.2.3 | 2021-06-22 | [\#3587](https://github.com/airbytehq/airbyte/pull/3587) | Added support for self-hosted instances. | +| 0.2.2 | 2021-06-09 | [\#3973](https://github.com/airbytehq/airbyte/pull/3973) | Added `AIRBYTE_ENTRYPOINT` for kubernetes support. | +| 0.2.1 | 2021-04-02 | [\#2726](https://github.com/airbytehq/airbyte/pull/2726) | Fixed connector base versioning. | +| 0.2.0 | 2021-03-09 | [\#2238](https://github.com/airbytehq/airbyte/pull/2238) | Allowed future / unknown properties in the protocol. | +| 0.1.1 | 2021-01-27 | [\#1857](https://github.com/airbytehq/airbyte/pull/1857) | Fix failed CI tests. | +| 0.1.0 | 2020-12-24 | [\#1441](https://github.com/airbytehq/airbyte/pull/1441) | Added looker connector. 
| + diff --git a/docs/integrations/sources/magento.md b/docs/integrations/sources/magento.md index 126db8d925d..69718d8dbcf 100644 --- a/docs/integrations/sources/magento.md +++ b/docs/integrations/sources/magento.md @@ -4,12 +4,13 @@ ## Sync overview -Magento runs on MySQL. You can use Airbyte to sync your Magento instance by connecting to the underlying database using the [MySQL connector](mysql.md). - +Magento runs on MySQL. You can use Airbyte to sync your Magento instance by connecting to the underlying database using the [MySQL connector](mysql.md). {% hint style="info" %} -Reach out to your service representative or system admin to find the parameters required to connect to the underlying database +Reach out to your service representative or system admin to find the parameters required to connect to the underlying database {% endhint %} ### Output schema -The output schema is described in the [Magento docs](https://docs.magento.com/mbi/data-analyst/importing-data/integrations/magento-data.html). See the [MySQL connector](mysql.md) for more info on general rules followed by the MySQL connector when moving data. + +The output schema is described in the [Magento docs](https://docs.magento.com/mbi/data-analyst/importing-data/integrations/magento-data.html). See the [MySQL connector](mysql.md) for more info on general rules followed by the MySQL connector when moving data. + diff --git a/docs/integrations/sources/mailchimp.md b/docs/integrations/sources/mailchimp.md index c4b5ee1e463..b875cd6eb8b 100644 --- a/docs/integrations/sources/mailchimp.md +++ b/docs/integrations/sources/mailchimp.md @@ -48,15 +48,16 @@ To start syncing Mailchimp data with Airbyte, you'll need two things: ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.2.8 | 2021-08-17 | [5481](https://github.com/airbytehq/airbyte/pull/5481) | Remove date-time type from some fields | -| 0.2.7 | 2021-08-03 | [5137](https://github.com/airbytehq/airbyte/pull/5137) | Source Mailchimp: fix primary key for email activities | -| 0.2.6 | 2021-07-28 | [5024](https://github.com/airbytehq/airbyte/pull/5024) | Source Mailchimp: handle records with no no "activity" field in response | -| 0.2.5 | 2021-07-08 | [4621](https://github.com/airbytehq/airbyte/pull/4621) | Mailchimp fix url-base | -| 0.2.4 | 2021-06-09 | [4285](https://github.com/airbytehq/airbyte/pull/4285) | Use datacenter URL parameter from apikey | -| 0.2.3 | 2021-06-08 | [3973](https://github.com/airbytehq/airbyte/pull/3973) | Add AIRBYTE_ENTRYPOINT for Kubernetes support | -| 0.2.2 | 2021-06-08 | [3415](https://github.com/airbytehq/airbyte/pull/3415) | Get Members activities | -| 0.2.1 | 2021-04-03 | [2726](https://github.com/airbytehq/airbyte/pull/2726) | Fix base connector versioning | -| 0.2.0 | 2021-03-09 | [2238](https://github.com/airbytehq/airbyte/pull/2238) | Protocol allows future/unknown properties | -| 0.1.4 | 2020-11-30 | [1046](https://github.com/airbytehq/airbyte/pull/1046) | Add connectors using an index YAML file | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.2.8 | 2021-08-17 | [5481](https://github.com/airbytehq/airbyte/pull/5481) | Remove date-time type from some fields | +| 0.2.7 | 2021-08-03 | [5137](https://github.com/airbytehq/airbyte/pull/5137) | Source Mailchimp: fix primary key for email activities | +| 0.2.6 | 2021-07-28 | [5024](https://github.com/airbytehq/airbyte/pull/5024) | Source Mailchimp: handle records with no no "activity" field in response | +| 
0.2.5 | 2021-07-08 | [4621](https://github.com/airbytehq/airbyte/pull/4621) | Mailchimp fix url-base | +| 0.2.4 | 2021-06-09 | [4285](https://github.com/airbytehq/airbyte/pull/4285) | Use datacenter URL parameter from apikey | +| 0.2.3 | 2021-06-08 | [3973](https://github.com/airbytehq/airbyte/pull/3973) | Add AIRBYTE\_ENTRYPOINT for Kubernetes support | +| 0.2.2 | 2021-06-08 | [3415](https://github.com/airbytehq/airbyte/pull/3415) | Get Members activities | +| 0.2.1 | 2021-04-03 | [2726](https://github.com/airbytehq/airbyte/pull/2726) | Fix base connector versioning | +| 0.2.0 | 2021-03-09 | [2238](https://github.com/airbytehq/airbyte/pull/2238) | Protocol allows future/unknown properties | +| 0.1.4 | 2020-11-30 | [1046](https://github.com/airbytehq/airbyte/pull/1046) | Add connectors using an index YAML file | + diff --git a/docs/integrations/sources/marketo.md b/docs/integrations/sources/marketo.md index 826542e3e9b..655c2d181cf 100644 --- a/docs/integrations/sources/marketo.md +++ b/docs/integrations/sources/marketo.md @@ -90,5 +90,6 @@ We're almost there! Armed with your Endpoint & Identity URLs and your Client ID ## CHANGELOG | Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| `0.1.0` | 2021-09-06 | [5863](https://github.com/airbytehq/airbyte/pull/5863) | Release Marketo CDK Connector| +| :--- | :--- | :--- | :--- | +| `0.1.0` | 2021-09-06 | [5863](https://github.com/airbytehq/airbyte/pull/5863) | Release Marketo CDK Connector | + diff --git a/docs/integrations/sources/microsoft-dynamics-ax.md b/docs/integrations/sources/microsoft-dynamics-ax.md index 8604312e323..cff5b16acb2 100644 --- a/docs/integrations/sources/microsoft-dynamics-ax.md +++ b/docs/integrations/sources/microsoft-dynamics-ax.md @@ -1,10 +1,12 @@ -# MS Dynamics AX +# Microsoft Dynamics AX -[MS Dynamics AX](https://dynamics.microsoft.com/en-us/ax) is a powerful enterprise resource planning (ERP) software package for finance and operations. +[MS Dynamics AX](https://dynamics.microsoft.com/en-us/ax) is a powerful enterprise resource planning \(ERP\) software package for finance and operations. ## Sync overview -MS Dynamics AX runs on the MSSQL database. You can use the [MSSQL connector](mssql.md) to sync your MS Dynamics AX instance by connecting to the underlying database. +MS Dynamics AX runs on the MSSQL database. You can use the [MSSQL connector](mssql.md) to sync your MS Dynamics AX instance by connecting to the underlying database. ### Output schema -To understand your MS Dynamics AX database schema, see the [Microsoft docs](https://docs.microsoft.com/en-us/dynamicsax-2012/developer/database-erds-on-the-axerd-website). Otherwise, the schema will be loaded according to the rules of MSSQL connector. + +To understand your MS Dynamics AX database schema, see the [Microsoft docs](https://docs.microsoft.com/en-us/dynamicsax-2012/developer/database-erds-on-the-axerd-website). Otherwise, the schema will be loaded according to the rules of MSSQL connector. 
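Because the data is read from the underlying SQL Server database rather than from Dynamics AX itself, it can help to confirm connectivity and browse the available tables before setting up the MSSQL source. A minimal sketch, assuming the `pyodbc` package and an installed ODBC driver; the server, database, and user names below are placeholders, not values from this guide:

```python
import pyodbc

# Placeholder connection details; obtain the real values from your
# service representative or system admin.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=dynamics-ax-db.example.com,1433;"
    "DATABASE=MicrosoftDynamicsAX;"
    "UID=airbyte_reader;PWD=********"
)

# List the tables visible to the read-only user; these are roughly the
# tables the MSSQL source connector will later discover as streams.
cursor = conn.cursor()
cursor.execute(
    "SELECT TABLE_SCHEMA, TABLE_NAME "
    "FROM INFORMATION_SCHEMA.TABLES "
    "WHERE TABLE_TYPE = 'BASE TABLE' "
    "ORDER BY TABLE_SCHEMA, TABLE_NAME"
)
for schema, table in cursor.fetchall():
    print(f"{schema}.{table}")
conn.close()
```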
+ diff --git a/docs/integrations/sources/microsoft-dynamics-customer-engagement.md b/docs/integrations/sources/microsoft-dynamics-customer-engagement.md index 55125929231..88ae5f951e0 100644 --- a/docs/integrations/sources/microsoft-dynamics-customer-engagement.md +++ b/docs/integrations/sources/microsoft-dynamics-customer-engagement.md @@ -1,15 +1,16 @@ -# MS Dynamics Customer Engagement On-premise +# Microsoft Dynamics Customer Engagement -[MS Dynamics Customer Engagement](https://docs.microsoft.com/en-us/dynamics365/customerengagement/on-premises/overview?view=op-9-1) is an on-premise Customer Relationship Management (CRM) software. +[MS Dynamics Customer Engagement](https://docs.microsoft.com/en-us/dynamics365/customerengagement/on-premises/overview?view=op-9-1) is an on-premise Customer Relationship Management \(CRM\) software. ## Sync overview -MS Dynamics Customer Engagement runs on [MSSQL](https://docs.microsoft.com/en-us/dynamics365/customerengagement/on-premises/deploy/system-requirements-required-technologies?view=op-9-1) database. You can use the [MSSQL connector](mssql.md) to sync your MS Dynamics Customer Engagement instance by connecting to the underlying database. +MS Dynamics Customer Engagement runs on [MSSQL](https://docs.microsoft.com/en-us/dynamics365/customerengagement/on-premises/deploy/system-requirements-required-technologies?view=op-9-1) database. You can use the [MSSQL connector](mssql.md) to sync your MS Dynamics Customer Engagement instance by connecting to the underlying database. {% hint style="info" %} -Reach out to your service representative or system admin to find the parameters required to connect to the underlying database +Reach out to your service representative or system admin to find the parameters required to connect to the underlying database {% endhint %} - ### Output schema -To understand your MS Dynamics Customer Engagement database schema, see the [Entity Reference documentation](https://docs.microsoft.com/en-us/dynamics365/customerengagement/on-premises/developer/about-entity-reference?view=op-9-1). Otherwise, the schema will be loaded according to the rules of MSSQL connector. + +To understand your MS Dynamics Customer Engagement database schema, see the [Entity Reference documentation](https://docs.microsoft.com/en-us/dynamics365/customerengagement/on-premises/developer/about-entity-reference?view=op-9-1). Otherwise, the schema will be loaded according to the rules of MSSQL connector. + diff --git a/docs/integrations/sources/microsoft-dynamics-gp.md b/docs/integrations/sources/microsoft-dynamics-gp.md index d3c33953c96..d5a8317905a 100644 --- a/docs/integrations/sources/microsoft-dynamics-gp.md +++ b/docs/integrations/sources/microsoft-dynamics-gp.md @@ -1,15 +1,16 @@ -# MS Dynamics GP +# Microsoft Dynamics GP -[MS Dynamics GP](https://dynamics.microsoft.com/en-us/gp/) is a mid-market business accounting or enterprise resource planning (ERP) software. +[MS Dynamics GP](https://dynamics.microsoft.com/en-us/gp/) is a mid-market business accounting or enterprise resource planning \(ERP\) software. ## Sync overview -MS Dynamics GP runs on the [MSSQL](https://docs.microsoft.com/en-us/dynamics-gp/installation/installing-on-first-computer) database. You can use the [MSSQL connector](mssql.md) to sync your MS Dynamics GP instance by connecting to the underlying database. +MS Dynamics GP runs on the [MSSQL](https://docs.microsoft.com/en-us/dynamics-gp/installation/installing-on-first-computer) database. 
You can use the [MSSQL connector](mssql.md) to sync your MS Dynamics GP instance by connecting to the underlying database. {% hint style="info" %} -Reach out to your service representative or system admin to find the parameters required to connect to the underlying database +Reach out to your service representative or system admin to find the parameters required to connect to the underlying database {% endhint %} ### Output schema -To understand your MS Dynamics GP database schema, see the [Microsoft docs](https://docs.microsoft.com/en-us/dynamicsax-2012/developer/tables-overview). Otherwise, the schema will be loaded according to the rules of MSSQL connector. + +To understand your MS Dynamics GP database schema, see the [Microsoft docs](https://docs.microsoft.com/en-us/dynamicsax-2012/developer/tables-overview). Otherwise, the schema will be loaded according to the rules of MSSQL connector. diff --git a/docs/integrations/sources/microsoft-dynamics-nav.md b/docs/integrations/sources/microsoft-dynamics-nav.md index c633a9669d7..d98aa67282c 100644 --- a/docs/integrations/sources/microsoft-dynamics-nav.md +++ b/docs/integrations/sources/microsoft-dynamics-nav.md @@ -1,15 +1,16 @@ -# MS Dynamics NAV +# Microsoft Dynamics NAV -[MS Dynamics NAV](https://dynamics.microsoft.com/en-us/nav-overview/) is a business management solution for small and mid-sized organizations that automates and streamlines business processes and helps you manage your business. +[MS Dynamics NAV](https://dynamics.microsoft.com/en-us/nav-overview/) is a business management solution for small and mid-sized organizations that automates and streamlines business processes and helps you manage your business. ## Sync overview -MS Dynamics NAV runs on the [MSSQL](https://docs.microsoft.com/en-us/dynamics-nav/installation-considerations-for-microsoft-sql-server) database. You can use the [MSSQL connector](mssql.md) to sync your MS Dynamics NAV instance by connecting to the underlying database. +MS Dynamics NAV runs on the [MSSQL](https://docs.microsoft.com/en-us/dynamics-nav/installation-considerations-for-microsoft-sql-server) database. You can use the [MSSQL connector](mssql.md) to sync your MS Dynamics NAV instance by connecting to the underlying database. {% hint style="info" %} -Reach out to your service representative or system admin to find the parameters required to connect to the underlying database +Reach out to your service representative or system admin to find the parameters required to connect to the underlying database {% endhint %} - ### Output schema -To understand your MS Dynamics NAV database schema, see the [Microsoft docs](https://docs.microsoft.com/en-us/dynamics-nav-app/). Otherwise, the schema will be loaded according to the rules of MSSQL connector. + +To understand your MS Dynamics NAV database schema, see the [Microsoft docs](https://docs.microsoft.com/en-us/dynamics-nav-app/). Otherwise, the schema will be loaded according to the rules of MSSQL connector. + diff --git a/docs/integrations/sources/mixpanel.md b/docs/integrations/sources/mixpanel.md index 3a9ca1006b1..4c3e3dc1ce9 100644 --- a/docs/integrations/sources/mixpanel.md +++ b/docs/integrations/sources/mixpanel.md @@ -5,6 +5,7 @@ The Mixpanel source supports both Full Refresh and Incremental syncs. You can choose if this connector will copy only the new or updated data, or all rows in the tables and columns you set up for replication, every time a sync is run. 
This Source Connector is based on the [Airbyte CDK](https://docs.airbyte.io/connector-development/cdk-python). + ### Output schema Several output streams are available from this source: @@ -29,11 +30,12 @@ If there are more endpoints you'd like Airbyte to support, please [create an iss | SSL connection | Yes | | Namespaces | No | -Please note, that incremental sync could return duplicated (old records) for the state date due to API filter limitation, which is granular to the whole day only. +Please note that incremental sync could return duplicated \(old\) records for the state date due to an API filter limitation, which is granular to the whole day only. ### Performance considerations The Mixpanel connector should not run into Mixpanel API limitations under normal usage. Please [create an issue](https://github.com/airbytehq/airbyte/issues) if you see any rate limit issues that are not automatically retried successfully. + * Export stream - 60 reqs per hour * All streams - 400 reqs per hour @@ -42,19 +44,18 @@ The Mixpanel connector should not run into Mixpanel API limitations under normal ### Requirements * Mixpanel API Secret - * Project region `US` or `EU` ### Setup guide Please read [Find API Secret](https://help.mixpanel.com/hc/en-us/articles/115004502806-Find-Project-Token-). -Select the correct region (EU or US) for your Mixpanel project. See detail [here](https://help.mixpanel.com/hc/en-us/articles/360039135652-Data-Residency-in-EU) - +Select the correct region \(EU or US\) for your Mixpanel project. See details [here](https://help.mixpanel.com/hc/en-us/articles/360039135652-Data-Residency-in-EU). ## CHANGELOG | Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | +| :--- | :--- | :--- | :--- | | `0.1.1` | 2021-09-16 | [6075](https://github.com/airbytehq/airbyte/issues/6075) | Added option to select project region | | `0.1.0` | 2021-07-06 | [3698](https://github.com/airbytehq/airbyte/issues/3698) | created CDK native mixpanel connector | + diff --git a/docs/integrations/sources/mongodb-v2.md b/docs/integrations/sources/mongodb-v2.md index 7fb8eb10867..72846473fd8 100644 --- a/docs/integrations/sources/mongodb-v2.md +++ b/docs/integrations/sources/mongodb-v2.md @@ -78,11 +78,10 @@ Your `READ_ONLY_USER` should now be ready for use with Airbyte. ### TLS/SSL on a Connection -It is recommended to use encrypted connection. -Connection with TLS/SSL security protocol for MongoDb Atlas Cluster and Replica Set instances is enabled by default. -To enable TSL/SSL connection with Standalone MongoDb instance, please refer to [MongoDb Documentation](https://docs.mongodb.com/manual/tutorial/configure-ssl/). +It is recommended to use an encrypted connection. Connections with the TLS/SSL security protocol are enabled by default for MongoDb Atlas Cluster and Replica Set instances. To enable a TLS/SSL connection with a standalone MongoDb instance, please refer to the [MongoDb Documentation](https://docs.mongodb.com/manual/tutorial/configure-ssl/). ### Configuration Parameters + * Database: database name * Authentication Source: specifies the database that the supplied credentials should be validated against. Defaults to `admin`. * User: username to use when connecting @@ -100,7 +99,9 @@ To enable TSL/SSL connection with Standalone MongoDb instance, please refer to [ For more information regarding configuration parameters, please see [MongoDb Documentation](https://docs.mongodb.com/drivers/java/sync/v4.3/fundamentals/connection/). 
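+For illustration only \(this is not part of the connector configuration itself\), the sketch below shows how these parameters map onto a standard client connection. It assumes the `pymongo` Python driver; the host, credentials, and database names are placeholder values, not defaults:
+
+```python
+from pymongo import MongoClient
+
+# Placeholder values -- substitute the same values you would enter in the connector configuration.
+client = MongoClient(
+    host="cluster0-shard-00-00.example.mongodb.net",  # Host
+    port=27017,                                       # Port
+    username="READ_ONLY_USER",                        # User
+    password="your_password_here",                    # Password
+    authSource="admin",                               # Authentication Source (defaults to `admin`)
+    tls=True,                                         # TLS/SSL, enabled by default for Atlas and replica sets
+)
+
+# Listing collections in the configured database is a quick way to confirm the credentials work.
+print(client["your_database"].list_collection_names())
+```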
## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.1 | 2021-09-21 | [6364](https://github.com/airbytehq/airbyte/pull/6364) | Source MongoDb: added support via TLS/SSL | -| 0.1.0 | 2021-08-30 | [5530](https://github.com/airbytehq/airbyte/pull/5530) | New source: MongoDb ported to java | + +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.1 | 2021-09-21 | [6364](https://github.com/airbytehq/airbyte/pull/6364) | Source MongoDb: added support via TLS/SSL | +| 0.1.0 | 2021-08-30 | [5530](https://github.com/airbytehq/airbyte/pull/5530) | New source: MongoDb ported to java | + diff --git a/docs/integrations/sources/mssql.md b/docs/integrations/sources/mssql.md index accce0a140c..d15213490f0 100644 --- a/docs/integrations/sources/mssql.md +++ b/docs/integrations/sources/mssql.md @@ -7,7 +7,7 @@ | Full Refresh Sync | Yes | | | Incremental Sync - Append | Yes | | | Replicate Incremental Deletes | Yes | | -| CDC (Change Data Capture) | Yes | | +| CDC \(Change Data Capture\) | Yes | | | SSL Support | Yes | | | SSH Tunnel Connection | Yes | | | Namespaces | Yes | Enabled by default | @@ -18,10 +18,11 @@ The MSSQL source does not alter the schema present in your database. Depending o You may run into an issue where the connector provides wrong values for some data types. See [discussion](https://github.com/airbytehq/airbyte/issues/4270) on unexpected behaviour for certain datatypes. -## Getting Started (Airbyte Cloud) +## Getting Started \(Airbyte Cloud\) + On Airbyte Cloud, only TLS connections to your MSSQL instance are supported in source configuration. Other than that, you can proceed with the open-source instructions below. -## Getting Started (Airbyte Open-Source) +## Getting Started \(Airbyte Open-Source\) #### Requirements @@ -41,12 +42,11 @@ _Coming soon: suggestions on how to create this user._ #### 3. Your database user should now be ready for use with Airbyte! -## Change Data Capture (CDC) +## Change Data Capture \(CDC\) -We use [SQL Server's change data capture feature](https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/about-change-data-capture-sql-server?view=sql-server-2017) -to capture row-level `INSERT`, `UPDATE` and `DELETE` operations that occur on cdc-enabled tables. +We use [SQL Server's change data capture feature](https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/about-change-data-capture-sql-server?view=sql-server-2017) to capture row-level `INSERT`, `UPDATE` and `DELETE` operations that occur on cdc-enabled tables. -Some extra setup requiring at least *db_owner* permissions on the database(s) you intend to sync from will be required (detailed [below](mssql.md#setting-up-cdc-for-mssql)). +Some extra setup requiring at least _db\_owner_ permissions on the database\(s\) you intend to sync from will be required \(detailed [below](mssql.md#setting-up-cdc-for-mssql)\). Please read the [CDC docs](../../understanding-airbyte/cdc.md) for an overview of how Airbyte approaches CDC. @@ -61,15 +61,15 @@ Please read the [CDC docs](../../understanding-airbyte/cdc.md) for an overview o * Make sure to read our [CDC docs](../../understanding-airbyte/cdc.md) to see limitations that impact all databases using CDC replication. * There are some critical issues regarding certain datatypes. Please find detailed info in [this Github issue](https://github.com/airbytehq/airbyte/issues/4542). -* CDC is only available for SQL Server 2016 Service Pack 1 (SP1) and later. 
-* *db_owner* (or higher) permissions are required to perform the [neccessary setup](mssql.md#setting-up-cdc-for-mssql) for CDC. -* You must enable [snapshot isolation mode](https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/sql/snapshot-isolation-in-sql-server) on the database(s) you want to sync. This is used for retrieving an initial snapshot without locking tables. -* On Linux, CDC is not supported on versions earlier than SQL Server 2017 CU18 (SQL Server 2019 is supported). -* Change data capture cannot be enabled on tables with a clustered columnstore index. (It can be enabled on tables with a *non-clustered* columnstore index). +* CDC is only available for SQL Server 2016 Service Pack 1 \(SP1\) and later. +* _db\_owner_ \(or higher\) permissions are required to perform the [neccessary setup](mssql.md#setting-up-cdc-for-mssql) for CDC. +* You must enable [snapshot isolation mode](https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/sql/snapshot-isolation-in-sql-server) on the database\(s\) you want to sync. This is used for retrieving an initial snapshot without locking tables. +* On Linux, CDC is not supported on versions earlier than SQL Server 2017 CU18 \(SQL Server 2019 is supported\). +* Change data capture cannot be enabled on tables with a clustered columnstore index. \(It can be enabled on tables with a _non-clustered_ columnstore index\). * The SQL Server CDC feature processes changes that occur in user-created tables only. You cannot enable CDC on the SQL Server master database. -* Using variables with partition switching on databases or tables with change data capture (CDC) is not supported for the `ALTER TABLE` ... `SWITCH TO` ... `PARTITION` ... statement -* Our implementation has not been tested with managed instances, such as Azure SQL Database (we welcome any feedback from users who try this!) - * If you do want to try this, CDC can only be enabled on Azure SQL databases tiers above Standard 3 (S3+). Basic, S0, S1 and S2 tiers are not supported for CDC. +* Using variables with partition switching on databases or tables with change data capture \(CDC\) is not supported for the `ALTER TABLE` ... `SWITCH TO` ... `PARTITION` ... statement +* Our implementation has not been tested with managed instances, such as Azure SQL Database \(we welcome any feedback from users who try this!\) + * If you do want to try this, CDC can only be enabled on Azure SQL databases tiers above Standard 3 \(S3+\). Basic, S0, S1 and S2 tiers are not supported for CDC. * Our CDC implementation uses at least once delivery for all change records. * Read more on CDC limitations in the [Microsoft docs](https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/about-change-data-capture-sql-server?view=sql-server-2017#limitations). @@ -79,15 +79,18 @@ Please read the [CDC docs](../../understanding-airbyte/cdc.md) for an overview o MS SQL Server provides some built-in stored procedures to enable CDC. -- To enable CDC, a SQL Server administrator with the necessary privileges (*db_owner* or *sysadmin*) must first run a query to enable CDC at the database level. -```text +* To enable CDC, a SQL Server administrator with the necessary privileges \(_db\_owner_ or _sysadmin_\) must first run a query to enable CDC at the database level. + + ```text USE {database name} GO EXEC sys.sp_cdc_enable_db GO - ``` -- The administrator must then enable CDC for each table that you want to capture. 
Here's an example: -```text + ``` + +* The administrator must then enable CDC for each table that you want to capture. Here's an example: + + ```text USE {database name} GO @@ -98,132 +101,158 @@ MS SQL Server provides some built-in stored procedures to enable CDC. @filegroup_name = N'{fiilegroup name}', [2] @supports_net_changes = 0 [3] GO -``` - - [1] Specifies a role which will gain `SELECT` permission on the captured columns of the source table. We suggest putting a value here so you can use this role in the next step but you can also set the value of @role_name to `NULL` to allow only *sysadmin* and *db_owner* to have access. Be sure that the credentials used to connect to the source in Airbyte align with this role so that Airbyte can access the cdc tables. - - [2] Specifies the filegroup where SQL Server places the change table. We recommend creating a separate filegroup for CDC but you can leave this parameter out to use the default filegroup. - - [3] If 0, only the support functions to query for all changes are generated. If 1, the functions that are needed to query for net changes are also generated. If supports_net_changes is set to 1, index_name must be specified, or the source table must have a defined primary key. + ``` -- (For more details on parameters, see the [Microsoft doc page](https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sys-sp-cdc-enable-table-transact-sql?view=sql-server-ver15) for this stored procedure). + * \[1\] Specifies a role which will gain `SELECT` permission on the captured columns of the source table. We suggest putting a value here so you can use this role in the next step but you can also set the value of @role\_name to `NULL` to allow only _sysadmin_ and _db\_owner_ to have access. Be sure that the credentials used to connect to the source in Airbyte align with this role so that Airbyte can access the cdc tables. + * \[2\] Specifies the filegroup where SQL Server places the change table. We recommend creating a separate filegroup for CDC but you can leave this parameter out to use the default filegroup. + * \[3\] If 0, only the support functions to query for all changes are generated. If 1, the functions that are needed to query for net changes are also generated. If supports\_net\_changes is set to 1, index\_name must be specified, or the source table must have a defined primary key. - -- If you have many tables to enable CDC on and would like to avoid having to run this query one-by-one for every table, [this script](http://www.techbrothersit.com/2013/06/change-data-capture-cdc-sql-server_69.html) might help! +* \(For more details on parameters, see the [Microsoft doc page](https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sys-sp-cdc-enable-table-transact-sql?view=sql-server-ver15) for this stored procedure\). +* If you have many tables to enable CDC on and would like to avoid having to run this query one-by-one for every table, [this script](http://www.techbrothersit.com/2013/06/change-data-capture-cdc-sql-server_69.html) might help! For further detail, see the [Microsoft docs on enabling and disabling CDC](https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/enable-and-disable-change-data-capture-sql-server?view=sql-server-ver15). #### 2. Enable snapshot isolation -- When a sync runs for the first time using CDC, Airbyte performs an initial consistent snapshot of your database. 
To avoid acquiring table locks, Airbyte uses *snapshot isolation*, allowing simultaneous writes by other database clients. This must be enabled on the database like so: -```text +* When a sync runs for the first time using CDC, Airbyte performs an initial consistent snapshot of your database. To avoid acquiring table locks, Airbyte uses _snapshot isolation_, allowing simultaneous writes by other database clients. This must be enabled on the database like so: + + ```text ALTER DATABASE {database name} SET ALLOW_SNAPSHOT_ISOLATION ON; -``` + ``` #### 3. Create a user and grant appropriate permissions -- Rather than use *sysadmin* or *db_owner* credentials, we recommend creating a new user with the relevant CDC access for use with Airbyte. First let's create the login and user and add to the [db_datareader](https://docs.microsoft.com/en-us/sql/relational-databases/security/authentication-access/database-level-roles?view=sql-server-ver15) role: -```text + +* Rather than use _sysadmin_ or _db\_owner_ credentials, we recommend creating a new user with the relevant CDC access for use with Airbyte. First let's create the login and user and add to the [db\_datareader](https://docs.microsoft.com/en-us/sql/relational-databases/security/authentication-access/database-level-roles?view=sql-server-ver15) role: + + ```text USE {database name}; CREATE LOGIN {user name} WITH PASSWORD = '{password}'; CREATE USER {user name} FOR LOGIN {user name}; EXEC sp_addrolemember 'db_datareader', '{user name}'; -``` - - Add the user to the role specified earlier when enabling cdc on the table(s): -```text - EXEC sp_addrolemember '{role name}', '{user name}'; -``` - - This should be enough access, but if you run into problems, try also directly granting the user `SELECT` access on the cdc schema: -```text - USE {database name}; - GRANT SELECT ON SCHEMA :: [cdc] TO {user name}; -``` - - If feasible, granting this user 'VIEW SERVER STATE' permissions will allow Airbyte to check whether or not the [SQL Server Agent](https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/about-change-data-capture-sql-server?view=sql-server-ver15#relationship-with-log-reader-agent) is running. This is preferred as it ensures syncs will fail if the CDC tables are not being updated by the Agent in the source database. -```text - USE master; - GRANT VIEW SERVER STATE TO {user name}; -``` + ``` + + * Add the user to the role specified earlier when enabling cdc on the table\(s\): + + ```text + EXEC sp_addrolemember '{role name}', '{user name}'; + ``` + + * This should be enough access, but if you run into problems, try also directly granting the user `SELECT` access on the cdc schema: + + ```text + USE {database name}; + GRANT SELECT ON SCHEMA :: [cdc] TO {user name}; + ``` + + * If feasible, granting this user 'VIEW SERVER STATE' permissions will allow Airbyte to check whether or not the [SQL Server Agent](https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/about-change-data-capture-sql-server?view=sql-server-ver15#relationship-with-log-reader-agent) is running. This is preferred as it ensures syncs will fail if the CDC tables are not being updated by the Agent in the source database. + + ```text + USE master; + GRANT VIEW SERVER STATE TO {user name}; + ``` #### 4. Extend the retention period of CDC data -- In SQL Server, by default, only three days of data are retained in the change tables. 
Unless you are running very frequent syncs, we suggest increasing this retention so that in case of a failure in sync or if the sync is paused, there is still some bandwidth to start from the last point in incremental sync. -- These settings can be changed using the stored procedure [sys.sp_cdc_change_job](https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sys-sp-cdc-change-job-transact-sql?view=sql-server-ver15) as below: -```text +* In SQL Server, by default, only three days of data are retained in the change tables. Unless you are running very frequent syncs, we suggest increasing this retention so that in case of a failure in sync or if the sync is paused, there is still some bandwidth to start from the last point in incremental sync. +* These settings can be changed using the stored procedure [sys.sp\_cdc\_change\_job](https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sys-sp-cdc-change-job-transact-sql?view=sql-server-ver15) as below: + + ```text -- we recommend 14400 minutes (10 days) as retention period EXEC sp_cdc_change_job @job_type='cleanup', @retention = {minutes} -``` + ``` -- After making this change, a restart of the cleanup job is required: +* After making this change, a restart of the cleanup job is required: ```text EXEC sys.sp_cdc_stop_job @job_type = 'cleanup'; - + EXEC sys.sp_cdc_start_job @job_type = 'cleanup'; ``` #### 5. Ensure the SQL Server Agent is running -- MSSQL uses the SQL Server Agent +* MSSQL uses the SQL Server Agent + to [run the jobs necessary](https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/about-change-data-capture-sql-server?view=sql-server-ver15#agent-jobs) + for CDC. It is therefore vital that the Agent is operational in order for CDC to work effectively. You can check + the status of the SQL Server Agent as follows: ```text EXEC xp_servicecontrol 'QueryState', N'SQLServerAGENT'; ``` -- If you see something other than 'Running.' please follow +* If you see something other than 'Running.', please follow + the [Microsoft docs](https://docs.microsoft.com/en-us/sql/ssms/agent/start-stop-or-pause-the-sql-server-agent-service?view=sql-server-ver15) + to start the service. ## Connection to MSSQL via an SSH Tunnel -Airbyte has the ability to connect to a MSSQL instance via an SSH Tunnel. The reason you might want to do this because -it is not possible (or against security policy) to connect to the database directly (e.g. it does not have a public IP -address). +Airbyte has the ability to connect to an MSSQL instance via an SSH Tunnel. The reason you might want to do this is that it is not possible \(or against security policy\) to connect to the database directly \(e.g. it does not have a public IP address\). -When using an SSH tunnel, you are configuring Airbyte to connect to an intermediate server (a.k.a. a bastion sever) -that _does_ have direct access to the database. Airbyte connects to the bastion and then asks the bastion to connect -directly to the server. +When using an SSH tunnel, you are configuring Airbyte to connect to an intermediate server \(a.k.a. a bastion server\) that _does_ have direct access to the database. Airbyte connects to the bastion and then asks the bastion to connect directly to the server. -Using this feature requires additional configuration, when creating the source. We will talk through what each piece of -configuration means. +Using this feature requires additional configuration when creating the source. 
We will talk through what each piece of configuration means. 1. Configure all fields for the source as you normally would, except `SSH Tunnel Method`. -2. `SSH Tunnel Method` defaults to `No Tunnel` (meaning a direct connection). If you want to use an +2. `SSH Tunnel Method` defaults to `No Tunnel` \(meaning a direct connection\). If you want to use an + SSH Tunnel choose `SSH Key Authentication` or `Password Authentication`. - 1. Choose `Key Authentication` if you will be using an RSA private key as your secret for - establishing the SSH Tunnel (see below for more information on generating this key). - 2. Choose `Password Authentication` if you will be using a password as your secret for establishing - the SSH Tunnel. -3. `SSH Tunnel Jump Server Host` refers to the intermediate (bastion) server that Airbyte will connect to. This should + + 1. Choose `Key Authentication` if you will be using an RSA private key as your secret for + + establishing the SSH Tunnel \(see below for more information on generating this key\). + + 2. Choose `Password Authentication` if you will be using a password as your secret for establishing + + the SSH Tunnel. + +3. `SSH Tunnel Jump Server Host` refers to the intermediate \(bastion\) server that Airbyte will connect to. This should + be a hostname or an IP Address. + 4. `SSH Connection Port` is the port on the bastion server with which to make the SSH connection. The default port for + SSH connections is `22`, so unless you have explicitly changed something, go with the default. + 5. `SSH Login Username` is the username that Airbyte should use when connection to the bastion server. This is NOT the + MSSQL username. + 6. If you are using `Password Authentication`, then `SSH Login Username` should be set to the + password of the User from the previous step. If you are using `SSH Key Authentication` leave this + blank. Again, this is not the MSSQL password, but the password for the OS-user that Airbyte is + using to perform commands on the bastion. + 7. If you are using `SSH Key Authentication`, then `SSH Private Key` should be set to the RSA + private Key that you are using to create the SSH connection. This should be the full contents of + the key file starting with `-----BEGIN RSA PRIVATE KEY-----` and ending + with `-----END RSA PRIVATE KEY-----`. ### Generating an SSH Key Pair -The connector expects an RSA key in PEM format. To generate this key: +The connector expects an RSA key in PEM format. To generate this key: - ssh-keygen -t rsa -m PEM -f myuser_rsa +```text +ssh-keygen -t rsa -m PEM -f myuser_rsa +``` -This produces the private key in pem format, and the public key remains in the standard format used by the `authorized_keys` file on -your bastion host. The public key should be added to your bastion host to whichever user you want to use with Airbyte. The private -key is provided via copy-and-paste to the Airbyte connector configuration screen, so it may log in to the bastion. +This produces the private key in pem format, and the public key remains in the standard format used by the `authorized_keys` file on your bastion host. The public key should be added to your bastion host to whichever user you want to use with Airbyte. The private key is provided via copy-and-paste to the Airbyte connector configuration screen, so it may log in to the bastion. ## Data type mapping -MSSQL data types are mapped to the following data types when synchronizing data. 
-You can check the test values examples [here](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-mssql/src/test-integration/java/io/airbyte/integrations/source/mssql/MssqlSourceComprehensiveTest.java). -If you can't find the data type you are looking for or have any problems feel free to add a new test! +MSSQL data types are mapped to the following data types when synchronizing data. You can check the test values examples [here](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-mssql/src/test-integration/java/io/airbyte/integrations/source/mssql/MssqlSourceComprehensiveTest.java). If you can't find the data type you are looking for or have any problems feel free to add a new test! | MSSQL Type | Resulting Type | Notes | | :--- | :--- | :--- | @@ -263,24 +292,25 @@ If you do not see a type in this list, assume that it is coerced into a string. ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.3.7 | 2021-09-30 | [6585](https://github.com/airbytehq/airbyte/pull/6585) | Improved SSH Tunnel key generation steps | -| 0.3.6 | 2021-09-17 | [6318](https://github.com/airbytehq/airbyte/pull/6318) | Added option to connect to DB via SSH | -| 0.3.4 | 2021-08-13 | [4699](https://github.com/airbytehq/airbyte/pull/4699) | Added json config validator | -| 0.3.3 | 2021-07-05 | [4689](https://github.com/airbytehq/airbyte/pull/4689) | Add CDC support | -| 0.3.2 | 2021-06-09 | [3179](https://github.com/airbytehq/airbyte/pull/3973) | Add AIRBYTE_ENTRYPOINT for Kubernetes support | -| 0.3.1 | 2021-06-08 | [3893](https://github.com/airbytehq/airbyte/pull/3893) | Enable SSL connection | -| 0.3.0 | 2021-04-21 | [2990](https://github.com/airbytehq/airbyte/pull/2990) | Support namespaces | -| 0.2.3 | 2021-03-28 | [2600](https://github.com/airbytehq/airbyte/pull/2600) | Add NCHAR and NVCHAR support to DB and cursor type casting | -| 0.2.2 | 2021-03-26 | [2460](https://github.com/airbytehq/airbyte/pull/2460) | Destination supports destination sync mode | -| 0.2.1 | 2021-03-18 | [2488](https://github.com/airbytehq/airbyte/pull/2488) | Sources support primary keys | -| 0.2.0 | 2021-03-09 | [2238](https://github.com/airbytehq/airbyte/pull/2238) | Protocol allows future/unknown properties | -| 0.1.11 | 2021-02-02 | [1887](https://github.com/airbytehq/airbyte/pull/1887) | Migrate AbstractJdbcSource to use iterators |] -| 0.1.10 | 2021-01-25 | [1746](https://github.com/airbytehq/airbyte/pull/1746) | Fix NPE in State Decorator | -| 0.1.9 | 2021-01-19 | [1724](https://github.com/airbytehq/airbyte/pull/1724) | Fix JdbcSource handling of tables with same names in different schemas | -| 0.1.9 | 2021-01-14 | [1655](https://github.com/airbytehq/airbyte/pull/1655) | Fix JdbcSource OOM | -| 0.1.8 | 2021-01-13 | [1588](https://github.com/airbytehq/airbyte/pull/1588) | Handle invalid numeric values in JDBC source | -| 0.1.6 | 2020-12-09 | [1172](https://github.com/airbytehq/airbyte/pull/1172) | Support incremental sync | -| 0.1.5 | 2020-11-30 | [1038](https://github.com/airbytehq/airbyte/pull/1038) | Change JDBC sources to discover more than standard schemas | -| 0.1.4 | 2020-11-30 | [1046](https://github.com/airbytehq/airbyte/pull/1046) | Add connectors using an index YAML file | +| Version | Date | Pull Request | Subject | | +| :--- | :--- | :--- | :--- | :--- | +| 0.3.7 | 2021-09-30 | [6585](https://github.com/airbytehq/airbyte/pull/6585) | Improved SSH Tunnel key generation steps | | +| 0.3.6 | 
2021-09-17 | [6318](https://github.com/airbytehq/airbyte/pull/6318) | Added option to connect to DB via SSH | | +| 0.3.4 | 2021-08-13 | [4699](https://github.com/airbytehq/airbyte/pull/4699) | Added json config validator | | +| 0.3.3 | 2021-07-05 | [4689](https://github.com/airbytehq/airbyte/pull/4689) | Add CDC support | | +| 0.3.2 | 2021-06-09 | [3179](https://github.com/airbytehq/airbyte/pull/3973) | Add AIRBYTE\_ENTRYPOINT for Kubernetes support | | +| 0.3.1 | 2021-06-08 | [3893](https://github.com/airbytehq/airbyte/pull/3893) | Enable SSL connection | | +| 0.3.0 | 2021-04-21 | [2990](https://github.com/airbytehq/airbyte/pull/2990) | Support namespaces | | +| 0.2.3 | 2021-03-28 | [2600](https://github.com/airbytehq/airbyte/pull/2600) | Add NCHAR and NVCHAR support to DB and cursor type casting | | +| 0.2.2 | 2021-03-26 | [2460](https://github.com/airbytehq/airbyte/pull/2460) | Destination supports destination sync mode | | +| 0.2.1 | 2021-03-18 | [2488](https://github.com/airbytehq/airbyte/pull/2488) | Sources support primary keys | | +| 0.2.0 | 2021-03-09 | [2238](https://github.com/airbytehq/airbyte/pull/2238) | Protocol allows future/unknown properties | | +| 0.1.11 | 2021-02-02 | [1887](https://github.com/airbytehq/airbyte/pull/1887) | Migrate AbstractJdbcSource to use iterators | \] | +| 0.1.10 | 2021-01-25 | [1746](https://github.com/airbytehq/airbyte/pull/1746) | Fix NPE in State Decorator | | +| 0.1.9 | 2021-01-19 | [1724](https://github.com/airbytehq/airbyte/pull/1724) | Fix JdbcSource handling of tables with same names in different schemas | | +| 0.1.9 | 2021-01-14 | [1655](https://github.com/airbytehq/airbyte/pull/1655) | Fix JdbcSource OOM | | +| 0.1.8 | 2021-01-13 | [1588](https://github.com/airbytehq/airbyte/pull/1588) | Handle invalid numeric values in JDBC source | | +| 0.1.6 | 2020-12-09 | [1172](https://github.com/airbytehq/airbyte/pull/1172) | Support incremental sync | | +| 0.1.5 | 2020-11-30 | [1038](https://github.com/airbytehq/airbyte/pull/1038) | Change JDBC sources to discover more than standard schemas | | +| 0.1.4 | 2020-11-30 | [1046](https://github.com/airbytehq/airbyte/pull/1046) | Add connectors using an index YAML file | | + diff --git a/docs/integrations/sources/mysql.md b/docs/integrations/sources/mysql.md index 853195de31a..a11bfa70b54 100644 --- a/docs/integrations/sources/mysql.md +++ b/docs/integrations/sources/mysql.md @@ -11,7 +11,7 @@ | SSL Support | Yes | | | SSH Tunnel Connection | Yes | | | Namespaces | Yes | Enabled by default | -| Arrays | Yes | Byte arrays are not supported yet | +| Arrays | Yes | Byte arrays are not supported yet | The MySQL source does not alter the schema present in your database. Depending on the destination connected to this source, however, the schema may be altered. See the destination's documentation for more details. @@ -19,10 +19,11 @@ The MySQL source does not alter the schema present in your database. Depending o There may be problems with mapping values in MySQL's datetime field to other relational data stores. MySQL permits zero values for date/time instead of NULL which may not be accepted by other data stores. To work around this problem, you can pass the following key value pair in the JDBC connector of the source setting `zerodatetimebehavior=Converttonull`. -## Getting Started (Airbyte Cloud) -On Airbyte Cloud, only TLS connections to your MySQL instance are supported. Other than that, you can proceed with the open-source instructions below. 
+## Getting Started \(Airbyte Cloud\) -## Getting Started (Airbyte Open-Source) +On Airbyte Cloud, only TLS connections to your MySQL instance are supported. Other than that, you can proceed with the open-source instructions below. + +## Getting Started \(Airbyte Open-Source\) #### Requirements @@ -43,13 +44,14 @@ To create a dedicated database user, run the following commands against your dat CREATE USER 'airbyte'@'%' IDENTIFIED BY 'your_password_here'; ``` -The right set of permissions differ between the `STANDARD` and `CDC` replication method. -For `STANDARD` replication method, only `SELECT` permission is required. +The right set of permissions differ between the `STANDARD` and `CDC` replication method. For `STANDARD` replication method, only `SELECT` permission is required. ```sql GRANT SELECT ON .* TO 'airbyte'@'%'; ``` + For `CDC` replication method, `SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT` permissions are required. + ```sql GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'airbyte'@'%'; ``` @@ -64,7 +66,6 @@ For `STANDARD` replication method this is not applicable. If you select the `CDC Your database user should now be ready for use with Airbyte. - ## Change Data Capture \(CDC\) * If you need a record of deletions and can accept the limitations posted below, you should be able to use CDC for MySQL. @@ -77,51 +78,48 @@ Your database user should now be ready for use with Airbyte. * Make sure to read our [CDC docs](../../understanding-airbyte/cdc.md) to see limitations that impact all databases using CDC replication. * Our CDC implementation uses at least once delivery for all change records. - **1. Enable binary logging** You must enable binary logging for MySQL replication. The binary logs record transaction updates for replication tools to propagate changes. You can configure your MySQL server configuration file with the following properties, which are described in below: -``` + +```text server-id = 223344 log_bin = mysql-bin binlog_format = ROW binlog_row_image = FULL expire_logs_days = 10 ``` + * server-id : The value for the server-id must be unique for each server and replication client in the MySQL cluster. The `server-id` should be a non-zero value. If the `server-id` is already set to a non-zero value, you don't need to make any change. You can set the `server-id` to any value between 1 and 4294967295. For more information refer [mysql doc](https://dev.mysql.com/doc/refman/8.0/en/replication-options.html#sysvar_server_id) -* log_bin : The value of log_bin is the base name of the sequence of binlog files. If the `log_bin` is already set, you don't need to make any change. For more information refer [mysql doc](https://dev.mysql.com/doc/refman/8.0/en/replication-options-binary-log.html#option_mysqld_log-bin) -* binlog_format : The `binlog_format` must be set to `ROW`. For more information refer [mysql doc](https://dev.mysql.com/doc/refman/8.0/en/replication-options-binary-log.html#sysvar_binlog_format) -* binlog_row_image : The `binlog_row_image` must be set to `FULL`. It determines how row images are written to the binary log. For more information refer [mysql doc](https://dev.mysql.com/doc/refman/5.7/en/replication-options-binary-log.html#sysvar_binlog_row_image) -* expire_logs_days : This is the number of days for automatic binlog file removal. We recommend 10 days so that in case of a failure in sync or if the sync is paused, we still have some bandwidth to start from the last point in incremental sync. 
We also recommend setting frequent syncs for CDC. +* log\_bin : The value of log\_bin is the base name of the sequence of binlog files. If the `log_bin` is already set, you don't need to make any change. For more information refer [mysql doc](https://dev.mysql.com/doc/refman/8.0/en/replication-options-binary-log.html#option_mysqld_log-bin) +* binlog\_format : The `binlog_format` must be set to `ROW`. For more information refer [mysql doc](https://dev.mysql.com/doc/refman/8.0/en/replication-options-binary-log.html#sysvar_binlog_format) +* binlog\_row\_image : The `binlog_row_image` must be set to `FULL`. It determines how row images are written to the binary log. For more information refer [mysql doc](https://dev.mysql.com/doc/refman/5.7/en/replication-options-binary-log.html#sysvar_binlog_row_image) +* expire\_logs\_days : This is the number of days for automatic binlog file removal. We recommend 10 days so that in case of a failure in sync or if the sync is paused, we still have some bandwidth to start from the last point in incremental sync. We also recommend setting frequent syncs for CDC. **2. Enable GTIDs \(Optional\)** -Global transaction identifiers (GTIDs) uniquely identify transactions that occur on a server within a cluster. -Though not required for a Airbyte MySQL connector, using GTIDs simplifies replication and enables you to more easily confirm if primary and replica servers are consistent. -For more information refer [mysql doc](https://dev.mysql.com/doc/refman/8.0/en/replication-options-gtids.html#option_mysqld_gtid-mode) -* Enable gtid_mode : Boolean that specifies whether GTID mode of the MySQL server is enabled or not. Enable it via `mysql> gtid_mode=ON` -* Enable enforce_gtid_consistency : Boolean that specifies whether the server enforces GTID consistency by allowing the execution of statements that can be logged in a transactionally safe manner. Required when using GTIDs. Enable it via `mysql> enforce_gtid_consistency=ON` +Global transaction identifiers \(GTIDs\) uniquely identify transactions that occur on a server within a cluster. Though not required for a Airbyte MySQL connector, using GTIDs simplifies replication and enables you to more easily confirm if primary and replica servers are consistent. For more information refer [mysql doc](https://dev.mysql.com/doc/refman/8.0/en/replication-options-gtids.html#option_mysqld_gtid-mode) -**Note** +* Enable gtid\_mode : Boolean that specifies whether GTID mode of the MySQL server is enabled or not. Enable it via `mysql> gtid_mode=ON` +* Enable enforce\_gtid\_consistency : Boolean that specifies whether the server enforces GTID consistency by allowing the execution of statements that can be logged in a transactionally safe manner. Required when using GTIDs. Enable it via `mysql> enforce_gtid_consistency=ON` -When a sync runs for the first time using CDC, Airbyte performs an initial consistent snapshot of your database. -Airbyte doesn't acquire any table locks (for tables defined with MyISAM engine, the tables would still be locked) while creating the snapshot to allow writes by other database clients. -But in order for the sync to work without any error/unexpected behaviour, it is assumed that no schema changes are happening while the snapshot is running. +**Note** +When a sync runs for the first time using CDC, Airbyte performs an initial consistent snapshot of your database. 
Airbyte doesn't acquire any table locks \(for tables defined with the MyISAM engine, the tables would still be locked\) while creating the snapshot to allow writes by other database clients. But in order for the sync to work without errors or unexpected behaviour, it is assumed that no schema changes are happening while the snapshot is running. ## Connection via SSH Tunnel -Airbyte has the ability to connect to a MySQl instance via an SSH Tunnel. The reason you might want to do this because it is not possible (or against security policy) to connect to the database directly (e.g. it does not have a public IP address). +Airbyte has the ability to connect to a MySQL instance via an SSH Tunnel. The reason you might want to do this is that it is not possible \(or against security policy\) to connect to the database directly \(e.g. it does not have a public IP address\). -When using an SSH tunnel, you are configuring Airbyte to connect to an intermediate server (a.k.a. a bastion sever) that _does_ have direct access to the database. Airbyte connects to the bastion and then asks the bastion to connect directly to the server. +When using an SSH tunnel, you are configuring Airbyte to connect to an intermediate server \(a.k.a. a bastion server\) that _does_ have direct access to the database. Airbyte connects to the bastion and then asks the bastion to connect directly to the server. Using this feature requires additional configuration when creating the source. We will talk through what each piece of configuration means. 1. Configure all fields for the source as you normally would, except `SSH Tunnel Method`. -2. `SSH Tunnel Method` defaults to `No Tunnel` (meaning a direct connection). If you want to use an SSH Tunnel choose `SSH Key Authentication` or `Password Authentication`. - 1. Choose `Key Authentication` if you will be using an RSA private key as your secret for establishing the SSH Tunnel (see below for more information on generating this key). +2. `SSH Tunnel Method` defaults to `No Tunnel` \(meaning a direct connection\). If you want to use an SSH Tunnel, choose `SSH Key Authentication` or `Password Authentication`. + 1. Choose `Key Authentication` if you will be using an RSA private key as your secret for establishing the SSH Tunnel \(see below for more information on generating this key\). 2. Choose `Password Authentication` if you will be using a password as your secret for establishing the SSH Tunnel. -3. `SSH Tunnel Jump Server Host` refers to the intermediate (bastion) server that Airbyte will connect to. This should be a hostname or an IP Address. +3. `SSH Tunnel Jump Server Host` refers to the intermediate \(bastion\) server that Airbyte will connect to. This should be a hostname or an IP Address. 4. `SSH Connection Port` is the port on the bastion server with which to make the SSH connection. The default port for SSH connections is `22`, so unless you have explicitly changed something, go with the default. 5. `SSH Login Username` is the username that Airbyte should use when connecting to the bastion server. This is NOT the MySQL username. 6. If you are using `Password Authentication`, then `SSH Login Username` should be set to the password of the User from the previous step. If you are using `SSH Key Authentication` leave this blank. Again, this is not the MySQL password, but the password for the OS-user that Airbyte is using to perform commands on the bastion. @@ -129,19 +127,17 @@ Using this feature requires additional configuration, when creating the source. 
### Generating an SSH Key Pair -The connector expects an RSA key in PEM format. To generate this key: +The connector expects an RSA key in PEM format. To generate this key: - ssh-keygen -t rsa -m PEM -f myuser_rsa +```text +ssh-keygen -t rsa -m PEM -f myuser_rsa +``` -This produces the private key in pem format, and the public key remains in the standard format used by the `authorized_keys` file on -your bastion host. The public key should be added to your bastion host to whichever user you want to use with Airbyte. The private -key is provided via copy-and-paste to the Airbyte connector configuration screen, so it may log in to the bastion. +This produces the private key in pem format, and the public key remains in the standard format used by the `authorized_keys` file on your bastion host. The public key should be added to your bastion host to whichever user you want to use with Airbyte. The private key is provided via copy-and-paste to the Airbyte connector configuration screen, so it may log in to the bastion. ## Data Type Mapping -MySQL data types are mapped to the following data types when synchronizing data. -You can check the test values examples [here](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-mysql/src/test-integration/java/io/airbyte/integrations/source/mysql/MySqlSourceComprehensiveTest.java). -If you can't find the data type you are looking for or have any problems feel free to add a new test! +MySQL data types are mapped to the following data types when synchronizing data. You can check the test values examples [here](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-mysql/src/test-integration/java/io/airbyte/integrations/source/mysql/MySqlSourceComprehensiveTest.java). If you can't find the data type you are looking for or have any problems feel free to add a new test! | MySQL Type | Resulting Type | Notes | | :--- | :--- | :--- | @@ -182,30 +178,31 @@ If you do not see a type in this list, assume that it is coerced into a string. 
## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.4.7 | 2021-09-30 | [6585](https://github.com/airbytehq/airbyte/pull/6585) | Improved SSH Tunnel key generation steps | -| 0.4.6 | 2021-09-29 | [6510](https://github.com/airbytehq/airbyte/pull/6510) | Support SSL connection | -| 0.4.5 | 2021-09-17 | [6146](https://github.com/airbytehq/airbyte/pull/6146) | Added option to connect to DB via SSH| -| 0.4.1 | 2021-07-23 | [4956](https://github.com/airbytehq/airbyte/pull/4956) | Fix log link | -| 0.3.7 | 2021-06-09 | [3179](https://github.com/airbytehq/airbyte/pull/3973) | Add AIRBYTE_ENTRYPOINT for Kubernetes support | -| 0.3.6 | 2021-06-09 | [3966](https://github.com/airbytehq/airbyte/pull/3966) | Fix excessive logging for CDC method | -| 0.3.5 | 2021-06-07 | [3890](https://github.com/airbytehq/airbyte/pull/3890) | Fix CDC handle tinyint(1) and boolean types | -| 0.3.4 | 2021-06-04 | [3846](https://github.com/airbytehq/airbyte/pull/3846) | Fix max integer value failure | -| 0.3.3 | 2021-06-02 | [3789](https://github.com/airbytehq/airbyte/pull/3789) | MySQL CDC poll wait 5 minutes when not received a single record | -| 0.3.2 | 2021-06-01 | [3757](https://github.com/airbytehq/airbyte/pull/3757) | MySQL CDC poll 5s to 5 min | -| 0.3.1 | 2021-06-01 | [3505](https://github.com/airbytehq/airbyte/pull/3505) | Implemented MySQL CDC | -| 0.3.0 | 2021-04-21 | [2990](https://github.com/airbytehq/airbyte/pull/2990) | Support namespaces | -| 0.2.5 | 2021-04-15 | [2899](https://github.com/airbytehq/airbyte/pull/2899) | Fix bug in tests | -| 0.2.4 | 2021-03-28 | [2600](https://github.com/airbytehq/airbyte/pull/2600) | Add NCHAR and NVCHAR support to DB and cursor type casting | -| 0.2.3 | 2021-03-26 | [2611](https://github.com/airbytehq/airbyte/pull/2611) | Add an optional `jdbc_url_params` in parameters | -| 0.2.2 | 2021-03-26 | [2460](https://github.com/airbytehq/airbyte/pull/2460) | Destination supports destination sync mode | -| 0.2.1 | 2021-03-18 | [2488](https://github.com/airbytehq/airbyte/pull/2488) | Sources support primary keys | -| 0.2.0 | 2021-03-09 | [2238](https://github.com/airbytehq/airbyte/pull/2238) | Protocol allows future/unknown properties | -| 0.1.10 | 2021-02-02 | [1887](https://github.com/airbytehq/airbyte/pull/1887) | Migrate AbstractJdbcSource to use iterators | -| 0.1.9 | 2021-01-25 | [1746](https://github.com/airbytehq/airbyte/pull/1746) | Fix NPE in State Decorator | -| 0.1.8 | 2021-01-19 | [1724](https://github.com/airbytehq/airbyte/pull/1724) | Fix JdbcSource handling of tables with same names in different schemas | -| 0.1.7 | 2021-01-14 | [1655](https://github.com/airbytehq/airbyte/pull/1655) | Fix JdbcSource OOM | -| 0.1.6 | 2021-01-08 | [1307](https://github.com/airbytehq/airbyte/pull/1307) | Migrate Postgres and MySQL to use new JdbcSource | -| 0.1.5 | 2020-12-11 | [1267](https://github.com/airbytehq/airbyte/pull/1267) | Support incremental sync | -| 0.1.4 | 2020-11-30 | [1046](https://github.com/airbytehq/airbyte/pull/1046) | Add connectors using an index YAML file | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.4.7 | 2021-09-30 | [6585](https://github.com/airbytehq/airbyte/pull/6585) | Improved SSH Tunnel key generation steps | +| 0.4.6 | 2021-09-29 | [6510](https://github.com/airbytehq/airbyte/pull/6510) | Support SSL connection | +| 0.4.5 | 2021-09-17 | [6146](https://github.com/airbytehq/airbyte/pull/6146) | Added option to connect to DB via SSH | +| 0.4.1 | 2021-07-23 | 
[4956](https://github.com/airbytehq/airbyte/pull/4956) | Fix log link | +| 0.3.7 | 2021-06-09 | [3179](https://github.com/airbytehq/airbyte/pull/3973) | Add AIRBYTE\_ENTRYPOINT for Kubernetes support | +| 0.3.6 | 2021-06-09 | [3966](https://github.com/airbytehq/airbyte/pull/3966) | Fix excessive logging for CDC method | +| 0.3.5 | 2021-06-07 | [3890](https://github.com/airbytehq/airbyte/pull/3890) | Fix CDC handle tinyint\(1\) and boolean types | +| 0.3.4 | 2021-06-04 | [3846](https://github.com/airbytehq/airbyte/pull/3846) | Fix max integer value failure | +| 0.3.3 | 2021-06-02 | [3789](https://github.com/airbytehq/airbyte/pull/3789) | MySQL CDC poll wait 5 minutes when not received a single record | +| 0.3.2 | 2021-06-01 | [3757](https://github.com/airbytehq/airbyte/pull/3757) | MySQL CDC poll 5s to 5 min | +| 0.3.1 | 2021-06-01 | [3505](https://github.com/airbytehq/airbyte/pull/3505) | Implemented MySQL CDC | +| 0.3.0 | 2021-04-21 | [2990](https://github.com/airbytehq/airbyte/pull/2990) | Support namespaces | +| 0.2.5 | 2021-04-15 | [2899](https://github.com/airbytehq/airbyte/pull/2899) | Fix bug in tests | +| 0.2.4 | 2021-03-28 | [2600](https://github.com/airbytehq/airbyte/pull/2600) | Add NCHAR and NVCHAR support to DB and cursor type casting | +| 0.2.3 | 2021-03-26 | [2611](https://github.com/airbytehq/airbyte/pull/2611) | Add an optional `jdbc_url_params` in parameters | +| 0.2.2 | 2021-03-26 | [2460](https://github.com/airbytehq/airbyte/pull/2460) | Destination supports destination sync mode | +| 0.2.1 | 2021-03-18 | [2488](https://github.com/airbytehq/airbyte/pull/2488) | Sources support primary keys | +| 0.2.0 | 2021-03-09 | [2238](https://github.com/airbytehq/airbyte/pull/2238) | Protocol allows future/unknown properties | +| 0.1.10 | 2021-02-02 | [1887](https://github.com/airbytehq/airbyte/pull/1887) | Migrate AbstractJdbcSource to use iterators | +| 0.1.9 | 2021-01-25 | [1746](https://github.com/airbytehq/airbyte/pull/1746) | Fix NPE in State Decorator | +| 0.1.8 | 2021-01-19 | [1724](https://github.com/airbytehq/airbyte/pull/1724) | Fix JdbcSource handling of tables with same names in different schemas | +| 0.1.7 | 2021-01-14 | [1655](https://github.com/airbytehq/airbyte/pull/1655) | Fix JdbcSource OOM | +| 0.1.6 | 2021-01-08 | [1307](https://github.com/airbytehq/airbyte/pull/1307) | Migrate Postgres and MySQL to use new JdbcSource | +| 0.1.5 | 2020-12-11 | [1267](https://github.com/airbytehq/airbyte/pull/1267) | Support incremental sync | +| 0.1.4 | 2020-11-30 | [1046](https://github.com/airbytehq/airbyte/pull/1046) | Add connectors using an index YAML file | + diff --git a/docs/integrations/sources/okta.md b/docs/integrations/sources/okta.md index b3384961a9c..c459e0b7715 100644 --- a/docs/integrations/sources/okta.md +++ b/docs/integrations/sources/okta.md @@ -52,16 +52,15 @@ Different Okta APIs require different admin privilege levels. API tokens inherit 3. Click Create Token. 4. Name your token and click Create Token. 5. Record the token value. This is the only opportunity to see it and record it. -8. In Airbyte, create a Okta source. -9. You can now pull data from your Okta instance! - +6. In Airbyte, create a Okta source. +7. You can now pull data from your Okta instance! 
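+If you want to sanity-check the token before configuring the source, the minimal sketch below \(optional, and not required by Airbyte\) assumes Python with the `requests` library; the Okta domain and token shown are placeholders:
+
+```python
+import requests
+
+OKTA_DOMAIN = "your-org.okta.com"  # placeholder: your Okta org domain
+API_TOKEN = "REPLACE_WITH_TOKEN"   # placeholder: the token value recorded above
+
+# Okta API tokens are sent using the SSWS authorization scheme.
+response = requests.get(
+    f"https://{OKTA_DOMAIN}/api/v1/users?limit=1",
+    headers={"Authorization": f"SSWS {API_TOKEN}", "Accept": "application/json"},
+)
+response.raise_for_status()
+print(response.json())  # a successful response confirms the token can read user data
+```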
## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.3 | 2021-09-08 | [5905](https://github.com/airbytehq/airbyte/pull/5905)| Fix incremental stream defect | -| 0.1.2 | 2021-07-01 | [4456](https://github.com/airbytehq/airbyte/pull/4456)| Bugfix infinite pagination in logs stream | -| 0.1.1 | 2021-06-09 | [3937](https://github.com/airbytehq/airbyte/pull/3973) | Add `AIRBYTE_ENTRYPOINT` env variable for kubernetes support| -| 0.1.0 | 2021-05-30 | [3563](https://github.com/airbytehq/airbyte/pull/3563) | Initial Release | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.3 | 2021-09-08 | [5905](https://github.com/airbytehq/airbyte/pull/5905) | Fix incremental stream defect | +| 0.1.2 | 2021-07-01 | [4456](https://github.com/airbytehq/airbyte/pull/4456) | Bugfix infinite pagination in logs stream | +| 0.1.1 | 2021-06-09 | [3937](https://github.com/airbytehq/airbyte/pull/3973) | Add `AIRBYTE_ENTRYPOINT` env variable for kubernetes support | +| 0.1.0 | 2021-05-30 | [3563](https://github.com/airbytehq/airbyte/pull/3563) | Initial Release | diff --git a/docs/integrations/sources/oracle-peoplesoft.md b/docs/integrations/sources/oracle-peoplesoft.md index 37bc7684142..535bd1cd16d 100644 --- a/docs/integrations/sources/oracle-peoplesoft.md +++ b/docs/integrations/sources/oracle-peoplesoft.md @@ -1,19 +1,20 @@ -# Oracle PeopleSoft +# Oracle Peoplesoft [Oracle PeopleSoft](https://www.oracle.com/applications/peoplesoft/) is a Human Resource, Financial, Supply Chain, Customer Relationship, and Enterprise Performance Management System. ## Sync overview -Oracle PeopleSoft can run on the [Oracle, MSSQL, or IBM DB2](https://docs.oracle.com/en/applications/peoplesoft/peopletools/index.html) databases. You can use Airbyte to sync your Oracle PeopleSoft instance by connecting to the underlying database using the appropriate Airbyte connector: +Oracle PeopleSoft can run on the [Oracle, MSSQL, or IBM DB2](https://docs.oracle.com/en/applications/peoplesoft/peopletools/index.html) databases. You can use Airbyte to sync your Oracle PeopleSoft instance by connecting to the underlying database using the appropriate Airbyte connector: * [DB2](db2.md) -* [MSSQL](./mssql.md) +* [MSSQL](mssql.md) * [Oracle](oracle.md) {% hint style="info" %} -Reach out to your service representative or system admin to find the parameters required to connect to the underlying database +Reach out to your service representative or system admin to find the parameters required to connect to the underlying database {% endhint %} - ### Output schema -The schema will be loaded according to the rules of the underlying database's connector. Oracle provides ERD diagrams but they are behind a paywall. Contact your Oracle rep to gain access. + +The schema will be loaded according to the rules of the underlying database's connector. Oracle provides ERD diagrams but they are behind a paywall. Contact your Oracle rep to gain access. + diff --git a/docs/integrations/sources/oracle-siebel-crm.md b/docs/integrations/sources/oracle-siebel-crm.md index c983bc315f6..5a1e40c9f70 100644 --- a/docs/integrations/sources/oracle-siebel-crm.md +++ b/docs/integrations/sources/oracle-siebel-crm.md @@ -1,19 +1,20 @@ # Oracle Siebel CRM -[Oracle Siebel CRM](https://www.oracle.com/cx/siebel/) is a Customer Relationship Management platform. +[Oracle Siebel CRM](https://www.oracle.com/cx/siebel/) is a Customer Relationship Management platform. 
## Sync overview -Oracle Siebel CRM can run on the [Oracle, MSSQL, or IBM DB2](https://docs.oracle.com/cd/E88140_01/books/DevDep/installing-and-configuring-siebel-crm.html#PrerequisiteSoftware) databases. You can use Airbyte to sync your Oracle Siebel CRM instance by connecting to the underlying database using the appropriate Airbyte connector: +Oracle Siebel CRM can run on the [Oracle, MSSQL, or IBM DB2](https://docs.oracle.com/cd/E88140_01/books/DevDep/installing-and-configuring-siebel-crm.html#PrerequisiteSoftware) databases. You can use Airbyte to sync your Oracle Siebel CRM instance by connecting to the underlying database using the appropriate Airbyte connector: * [DB2](db2.md) -* [MSSQL](./mssql.md) +* [MSSQL](mssql.md) * [Oracle](oracle.md) {% hint style="info" %} -Reach out to your service representative or system admin to find the parameters required to connect to the underlying database +Reach out to your service representative or system admin to find the parameters required to connect to the underlying database {% endhint %} - ### Output schema -To understand your Oracle Siebel CRM database schema, see the [Organization Setup Overview docs](https://docs.oracle.com/cd/E88140_01/books/DevDep/basic-organization-setup-overview.html#basic-organization-setup-overview) documentation. Otherwise, the schema will be loaded according to the rules of the underlying database's connector. + +To understand your Oracle Siebel CRM database schema, see the [Organization Setup Overview docs](https://docs.oracle.com/cd/E88140_01/books/DevDep/basic-organization-setup-overview.html#basic-organization-setup-overview) documentation. Otherwise, the schema will be loaded according to the rules of the underlying database's connector. + diff --git a/docs/integrations/sources/oracle.md b/docs/integrations/sources/oracle.md index bbf6a338961..30b67a5c6eb 100644 --- a/docs/integrations/sources/oracle.md +++ b/docs/integrations/sources/oracle.md @@ -16,10 +16,11 @@ The Oracle source does not alter the schema present in your database. Depending on the destination connected to this source, however, the schema may be altered. See the destination's documentation for more details. -## Getting Started (Airbyte Cloud) +## Getting Started \(Airbyte Cloud\) + On Airbyte Cloud, only TLS connections to your Oracle instance are supported. Other than that, you can proceed with the open-source instructions below. -## Getting Started (Airbyte Open-Source) +## Getting Started \(Airbyte Open-Source\) #### Requirements @@ -63,17 +64,17 @@ Case sensitive. Defaults to the upper-cased user if empty. If the user does not ## Connection via SSH Tunnel -Airbyte has the ability to connect to a Oracle instance via an SSH Tunnel. The reason you might want to do this because it is not possible (or against security policy) to connect to the database directly (e.g. it does not have a public IP address). +Airbyte has the ability to connect to a Oracle instance via an SSH Tunnel. The reason you might want to do this because it is not possible \(or against security policy\) to connect to the database directly \(e.g. it does not have a public IP address\). -When using an SSH tunnel, you are configuring Airbyte to connect to an intermediate server (a.k.a. a bastion sever) that _does_ have direct access to the database. Airbyte connects to the bastion and then asks the bastion to connect directly to the server. +When using an SSH tunnel, you are configuring Airbyte to connect to an intermediate server \(a.k.a. 
a bastion sever\) that _does_ have direct access to the database. Airbyte connects to the bastion and then asks the bastion to connect directly to the server. Using this feature requires additional configuration, when creating the source. We will talk through what each piece of configuration means. 1. Configure all fields for the source as you normally would, except `SSH Tunnel Method`. -2. `SSH Tunnel Method` defaults to `No Tunnel` (meaning a direct connection). If you want to use an SSH Tunnel choose `SSH Key Authentication` or `Password Authentication`. - 1. Choose `Key Authentication` if you will be using an RSA private key as your secret for establishing the SSH Tunnel (see below for more information on generating this key). +2. `SSH Tunnel Method` defaults to `No Tunnel` \(meaning a direct connection\). If you want to use an SSH Tunnel choose `SSH Key Authentication` or `Password Authentication`. + 1. Choose `Key Authentication` if you will be using an RSA private key as your secret for establishing the SSH Tunnel \(see below for more information on generating this key\). 2. Choose `Password Authentication` if you will be using a password as your secret for establishing the SSH Tunnel. -3. `SSH Tunnel Jump Server Host` refers to the intermediate (bastion) server that Airbyte will connect to. This should be a hostname or an IP Address. +3. `SSH Tunnel Jump Server Host` refers to the intermediate \(bastion\) server that Airbyte will connect to. This should be a hostname or an IP Address. 4. `SSH Connection Port` is the port on the bastion server with which to make the SSH connection. The default port for SSH connections is `22`, so unless you have explicitly changed something, go with the default. 5. `SSH Login Username` is the username that Airbyte should use when connection to the bastion server. This is NOT the Oracle username. 6. If you are using `Password Authentication`, then `SSH Login Username` should be set to the password of the User from the previous step. If you are using `SSH Key Authentication` leave this blank. Again, this is not the Oracle password, but the password for the OS-user that Airbyte is using to perform commands on the bastion. @@ -81,19 +82,17 @@ Using this feature requires additional configuration, when creating the source. ### Generating an SSH Key Pair -The connector expects an RSA key in PEM format. To generate this key: +The connector expects an RSA key in PEM format. To generate this key: - ssh-keygen -t rsa -m PEM -f myuser_rsa +```text +ssh-keygen -t rsa -m PEM -f myuser_rsa +``` -This produces the private key in pem format, and the public key remains in the standard format used by the `authorized_keys` file on -your bastion host. The public key should be added to your bastion host to whichever user you want to use with Airbyte. The private -key is provided via copy-and-paste to the Airbyte connector configuration screen, so it may log in to the bastion. +This produces the private key in pem format, and the public key remains in the standard format used by the `authorized_keys` file on your bastion host. The public key should be added to your bastion host to whichever user you want to use with Airbyte. The private key is provided via copy-and-paste to the Airbyte connector configuration screen, so it may log in to the bastion. ## Data Type Mapping -Oracle data types are mapped to the following data types when synchronizing data. 
-You can check the test values examples [here](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-oracle/src/test-integration/java/io/airbyte/integrations/source/oracle/OracleSourceComprehensiveTest.java). -If you can't find the data type you are looking for or have any problems feel free to add a new test! +Oracle data types are mapped to the following data types when synchronizing data. You can check the test values examples [here](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-oracle/src/test-integration/java/io/airbyte/integrations/source/oracle/OracleSourceComprehensiveTest.java). If you can't find the data type you are looking for or have any problems feel free to add a new test! | Oracle Type | Resulting Type | Notes | | :--- | :--- | :--- | @@ -127,20 +126,16 @@ If you do not see a type in this list, assume that it is coerced into a string. Airbite has the ability to connect to the Oracle source with 3 network connectivity options: -1.`Unencrypted` the connection will be made using the TCP protocol. In this case, all data over the network will be transmitted in unencrypted form. -2.`Native network encryption` gives you the ability to encrypt database connections, without the configuration overhead of TCP / IP and SSL / TLS and without the need to open and listen on different ports. -In this case, the *SQLNET.ENCRYPTION_CLIENT* option will always be set as *REQUIRED* by default: The client or server will only accept encrypted traffic, -but the user has the opportunity to choose an `Encryption algorithm` according to the security policies he needs. -3.`TLS Encrypted` (verify certificate) - if this option is selected, data transfer will be transfered using the TLS protocol, taking into account the handshake procedure and certificate verification. -To use this option, insert the content of the certificate issued by the server into the `SSL PEM file` field +1.`Unencrypted` the connection will be made using the TCP protocol. In this case, all data over the network will be transmitted in unencrypted form. 2.`Native network encryption` gives you the ability to encrypt database connections, without the configuration overhead of TCP / IP and SSL / TLS and without the need to open and listen on different ports. In this case, the _SQLNET.ENCRYPTION\_CLIENT_ option will always be set as _REQUIRED_ by default: The client or server will only accept encrypted traffic, but the user has the opportunity to choose an `Encryption algorithm` according to the security policies he needs. 3.`TLS Encrypted` \(verify certificate\) - if this option is selected, data transfer will be transfered using the TLS protocol, taking into account the handshake procedure and certificate verification. To use this option, insert the content of the certificate issued by the server into the `SSL PEM file` field ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.3.7 | 2021-10-01 | [6616](https://github.com/airbytehq/airbyte/pull/6616) | Added network encryption options | -| 0.3.6 | 2021-09-30 | [6585](https://github.com/airbytehq/airbyte/pull/6585) | Improved SSH Tunnel key generation steps | -| 0.3.5 | 2021-09-22 | [6356](https://github.com/airbytehq/airbyte/pull/6356) | Added option to connect to DB via SSH. | -| 0.3.4 | 2021-09-01 | [6038](https://github.com/airbytehq/airbyte/pull/6038) | Remove automatic filtering of system schemas. 
| -| 0.3.3 | 2021-09-01 | [5779](https://github.com/airbytehq/airbyte/pull/5779) | Ability to only discover certain schemas. | -| 0.3.2 | 2021-08-13 | [4699](https://github.com/airbytehq/airbyte/pull/4699) | Added json config validator. | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.3.7 | 2021-10-01 | [6616](https://github.com/airbytehq/airbyte/pull/6616) | Added network encryption options | +| 0.3.6 | 2021-09-30 | [6585](https://github.com/airbytehq/airbyte/pull/6585) | Improved SSH Tunnel key generation steps | +| 0.3.5 | 2021-09-22 | [6356](https://github.com/airbytehq/airbyte/pull/6356) | Added option to connect to DB via SSH. | +| 0.3.4 | 2021-09-01 | [6038](https://github.com/airbytehq/airbyte/pull/6038) | Remove automatic filtering of system schemas. | +| 0.3.3 | 2021-09-01 | [5779](https://github.com/airbytehq/airbyte/pull/5779) | Ability to only discover certain schemas. | +| 0.3.2 | 2021-08-13 | [4699](https://github.com/airbytehq/airbyte/pull/4699) | Added json config validator. | + diff --git a/docs/integrations/sources/paypal-transaction.md b/docs/integrations/sources/paypal-transaction.md index ef658044e1d..6aa6c89c9f9 100644 --- a/docs/integrations/sources/paypal-transaction.md +++ b/docs/integrations/sources/paypal-transaction.md @@ -1,65 +1,62 @@ -# Paypal Transaction API - -## Overview - -The [Paypal Transaction API](https://developer.paypal.com/docs/api/transaction-search/v1/). is used to get the history of transactions for a PayPal account. - - -#### Output schema - -This Source is capable of syncing the following core Streams: - -* [Transactions](https://developer.paypal.com/docs/api/transaction-search/v1/#transactions) -* [Balances](https://developer.paypal.com/docs/api/transaction-search/v1/#balances) - -#### Data type mapping - -| Integration Type | Airbyte Type | Notes | -| :--- | :--- | :--- | -| `string` | `string` | | -| `number` | `number` | | -| `array` | `array` | | -| `object` | `object` | | - -#### Features - -| Feature | Supported? | -| :--- | :--- | -| Full Refresh Sync | Yes | -| Incremental - Append Sync | Yes | -| Namespaces | No | - - -### Getting started - -### Requirements - -* client_id. -* secret. -* is_sandbox. - -### Setup guide - -In order to get an `Client ID` and `Secret` please go to [this](https://developer.paypal.com/docs/platforms/get-started/ page and follow the instructions. After registration you may find your `Client ID` and `Secret` [here](https://developer.paypal.com/developer/accounts/). - - -## Performance considerations - -Paypal transaction API has some [limits](https://developer.paypal.com/docs/integration/direct/transaction-search/) -- `start_date_min` = 3 years, API call lists transaction for the previous three years. -- `start_date_max` = 1.5 days, it takes a maximum of three hours for executed transactions to appear in the list transactions call. It is set to 1.5 days by default based on experience, otherwise API throw an error. -- `stream_slice_period` = 1 day, the maximum supported date range is 31 days. -- `records_per_request` = 10000, the maximum number of records in a single request. -- `page_size` = 500, the maximum page size is 500. -- `requests_per_minute` = 30, maximum limit is 50 requests per minute from IP address to all endpoint - -Transactions sync is performed with default `stream_slice_period` = 1 day, it means that there will be 1 request for each day between start_date and now (or end_date). if `start_date` is greater then `start_date_max`. 
-Balances sync is similarly performed with default `stream_slice_period` = 1 day, but it will do additional request for the end_date of the sync (now). - -## Changelog - -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.1 | 2021-08-03 | [5155](https://github.com/airbytehq/airbyte/pull/5155) | fix start_date_min limit | -| 0.1.0 | 2021-06-10 | [4240](https://github.com/airbytehq/airbyte/pull/4240) | PayPal Transaction Search API | - +# Paypal Transaction + +## Overview + +The [Paypal Transaction API](https://developer.paypal.com/docs/api/transaction-search/v1/). is used to get the history of transactions for a PayPal account. + +#### Output schema + +This Source is capable of syncing the following core Streams: + +* [Transactions](https://developer.paypal.com/docs/api/transaction-search/v1/#transactions) +* [Balances](https://developer.paypal.com/docs/api/transaction-search/v1/#balances) + +#### Data type mapping + +| Integration Type | Airbyte Type | Notes | +| :--- | :--- | :--- | +| `string` | `string` | | +| `number` | `number` | | +| `array` | `array` | | +| `object` | `object` | | + +#### Features + +| Feature | Supported? | +| :--- | :--- | +| Full Refresh Sync | Yes | +| Incremental - Append Sync | Yes | +| Namespaces | No | + +### Getting started + +### Requirements + +* client\_id. +* secret. +* is\_sandbox. + +### Setup guide + +In order to get an `Client ID` and `Secret` please go to \[this\]\([https://developer.paypal.com/docs/platforms/get-started/](https://developer.paypal.com/docs/platforms/get-started/) page and follow the instructions. After registration you may find your `Client ID` and `Secret` [here](https://developer.paypal.com/developer/accounts/). + +## Performance considerations + +Paypal transaction API has some [limits](https://developer.paypal.com/docs/integration/direct/transaction-search/) + +* `start_date_min` = 3 years, API call lists transaction for the previous three years. +* `start_date_max` = 1.5 days, it takes a maximum of three hours for executed transactions to appear in the list transactions call. It is set to 1.5 days by default based on experience, otherwise API throw an error. +* `stream_slice_period` = 1 day, the maximum supported date range is 31 days. +* `records_per_request` = 10000, the maximum number of records in a single request. +* `page_size` = 500, the maximum page size is 500. +* `requests_per_minute` = 30, maximum limit is 50 requests per minute from IP address to all endpoint + +Transactions sync is performed with default `stream_slice_period` = 1 day, it means that there will be 1 request for each day between start\_date and now \(or end\_date\). if `start_date` is greater then `start_date_max`. Balances sync is similarly performed with default `stream_slice_period` = 1 day, but it will do additional request for the end\_date of the sync \(now\). + +## Changelog + +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.1 | 2021-08-03 | [5155](https://github.com/airbytehq/airbyte/pull/5155) | fix start\_date\_min limit | +| 0.1.0 | 2021-06-10 | [4240](https://github.com/airbytehq/airbyte/pull/4240) | PayPal Transaction Search API | + diff --git a/docs/integrations/sources/pipedrive.md b/docs/integrations/sources/pipedrive.md index f2e572cc913..c3b3f0f5d48 100644 --- a/docs/integrations/sources/pipedrive.md +++ b/docs/integrations/sources/pipedrive.md @@ -2,31 +2,39 @@ ## Overview -The Pipedrive connector can be used to sync your Pipedrive data. 
It supports full refresh sync for Deals, Leads, Activities, ActivityFields, -Persons, Pipelines, Stages, Users streams and incremental sync for Activities, Deals, Persons, Pipelines, Stages, Users streams. +The Pipedrive connector can be used to sync your Pipedrive data. It supports full refresh sync for Deals, Leads, Activities, ActivityFields, Persons, Pipelines, Stages, Users streams and incremental sync for Activities, Deals, Persons, Pipelines, Stages, Users streams. -There was a priority to include at least a single stream of each stream type which is present on Pipedrive, so the list of the supported -streams is meant to be easily extendable. By the way, we can only support incremental stream support for the streams listed -[there](https://developers.pipedrive.com/docs/api/v1/Recents#getRecents). +There was a priority to include at least a single stream of each stream type which is present on Pipedrive, so the list of the supported streams is meant to be easily extendable. By the way, we can only support incremental stream support for the streams listed [there](https://developers.pipedrive.com/docs/api/v1/Recents#getRecents). ### Output schema Several output streams are available from this source: * [Activities](https://developers.pipedrive.com/docs/api/v1/Activities#getActivities), - retrieved by [getRecents](https://developers.pipedrive.com/docs/api/v1/Recents#getRecents) (incremental) + + retrieved by [getRecents](https://developers.pipedrive.com/docs/api/v1/Recents#getRecents) \(incremental\) + * [ActivityFields](https://developers.pipedrive.com/docs/api/v1/ActivityFields#getActivityFields) * [Deals](https://developers.pipedrive.com/docs/api/v1/Deals#getDeals), - retrieved by [getRecents](https://developers.pipedrive.com/docs/api/v1/Recents#getRecents) (incremental) + + retrieved by [getRecents](https://developers.pipedrive.com/docs/api/v1/Recents#getRecents) \(incremental\) + * [Leads](https://developers.pipedrive.com/docs/api/v1/Leads#getLeads) * [Persons](https://developers.pipedrive.com/docs/api/v1/Persons#getPersons), - retrieved by [getRecents](https://developers.pipedrive.com/docs/api/v1/Recents#getRecents) (incremental) + + retrieved by [getRecents](https://developers.pipedrive.com/docs/api/v1/Recents#getRecents) \(incremental\) + * [Pipelines](https://developers.pipedrive.com/docs/api/v1/Pipelines#getPipelines), - retrieved by [getRecents](https://developers.pipedrive.com/docs/api/v1/Recents#getRecents) (incremental) + + retrieved by [getRecents](https://developers.pipedrive.com/docs/api/v1/Recents#getRecents) \(incremental\) + * [Stages](https://developers.pipedrive.com/docs/api/v1/Stages#getStages), - retrieved by [getRecents](https://developers.pipedrive.com/docs/api/v1/Recents#getRecents) (incremental) + + retrieved by [getRecents](https://developers.pipedrive.com/docs/api/v1/Recents#getRecents) \(incremental\) + * [Users](https://developers.pipedrive.com/docs/api/v1/Users#getUsers), - retrieved by [getRecents](https://developers.pipedrive.com/docs/api/v1/Recents#getRecents) (incremental) + + retrieved by [getRecents](https://developers.pipedrive.com/docs/api/v1/Recents#getRecents) \(incremental\) ### Features @@ -53,32 +61,36 @@ The Pipedrive connector will gracefully handle rate limits. For more information This connector supports only authentication with API Token. To obtain API Token follow the instructions below: #### Enable API: + 1. Click Manage users from the left-side menu. -1. Click on the Permission sets tab. -1. 
Choose the set where the user (who needs the API enabled) belongs to. -1. Lastly, click on "use API" on the right-hand side section (you need to scroll down a bit). +2. Click on the Permission sets tab. +3. Choose the set where the user \(who needs the API enabled\) belongs to. +4. Lastly, click on "use API" on the right-hand side section \(you need to scroll down a bit\). + Now all users who belong in the set that has the API enabled can find their API token under - Settings > Personal Preferences > API in their Pipedrive web app. - + + Settings > Personal Preferences > API in their Pipedrive web app. + See [Enabling API for company users](https://pipedrive.readme.io/docs/enabling-api-for-company-users) for more info. - + #### How to find the API token: -1. Account name (on the top right) -1. Company settings -1. Personal preferences -1. API -1. Copy API Token + +1. Account name \(on the top right\) +2. Company settings +3. Personal preferences +4. API +5. Copy API Token See [How to find the API token](https://pipedrive.readme.io/docs/how-to-find-the-api-token) for more info. - ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.5 | 2021-09-27 | [6441](https://github.com/airbytehq/airbyte/pull/6441) | Fix normalization error | -| 0.1.4 | 2021-08-26 | [5943](https://github.com/airbytehq/airbyte/pull/5943) | Add organizations stream | -| 0.1.3 | 2021-08-26 | [5642](https://github.com/airbytehq/airbyte/pull/5642) | Remove date-time from deals stream | -| 0.1.2 | 2021-07-23 | [4912](https://github.com/airbytehq/airbyte/pull/4912) | Update money type to support floating point | -| 0.1.1 | 2021-07-19 | [4686](https://github.com/airbytehq/airbyte/pull/4686) | Update spec.json | -| 0.1.0 | 2021-07-19 | [4686](https://github.com/airbytehq/airbyte/pull/4686) | Release Pipedrive connector! | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.5 | 2021-09-27 | [6441](https://github.com/airbytehq/airbyte/pull/6441) | Fix normalization error | +| 0.1.4 | 2021-08-26 | [5943](https://github.com/airbytehq/airbyte/pull/5943) | Add organizations stream | +| 0.1.3 | 2021-08-26 | [5642](https://github.com/airbytehq/airbyte/pull/5642) | Remove date-time from deals stream | +| 0.1.2 | 2021-07-23 | [4912](https://github.com/airbytehq/airbyte/pull/4912) | Update money type to support floating point | +| 0.1.1 | 2021-07-19 | [4686](https://github.com/airbytehq/airbyte/pull/4686) | Update spec.json | +| 0.1.0 | 2021-07-19 | [4686](https://github.com/airbytehq/airbyte/pull/4686) | Release Pipedrive connector! | + diff --git a/docs/integrations/sources/pokeapi.md b/docs/integrations/sources/pokeapi.md index e3b18fdbff3..001e7111c31 100644 --- a/docs/integrations/sources/pokeapi.md +++ b/docs/integrations/sources/pokeapi.md @@ -24,7 +24,7 @@ This source uses the fully open [PokéAPI](https://pokeapi.co/docs/v2#info) to s Currently, only one output stream is available from this source, which is the Pokémon output stream. This schema is defined [here](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/source-pokeapi/source_pokeapi/schemas/pokemon.json). -## Rate Limiting & Performance Considerations (Airbyte Open-Source) +## Rate Limiting & Performance Considerations \(Airbyte Open-Source\) According to the API's [fair use policy](https://pokeapi.co/docs/v2#fairuse), please make sure to cache resources retrieved from the PokéAPI wherever possible. That said, the PokéAPI does not perform rate limiting. 
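For reference, this is roughly the kind of request the Pokémon stream issues, with a small in-process cache added in line with the fair-use policy above. This is a sketch, not the connector's actual code; the field names used are from the public PokéAPI response:

```python
from functools import lru_cache

import requests


@lru_cache(maxsize=None)
def get_pokemon(name: str) -> dict:
    """Fetch one Pokémon record and cache it, per the fair-use policy."""
    resp = requests.get(f"https://pokeapi.co/api/v2/pokemon/{name}")
    resp.raise_for_status()
    return resp.json()


ditto = get_pokemon("ditto")
print(ditto["id"], [t["type"]["name"] for t in ditto["types"]])
```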
@@ -32,11 +32,10 @@ According to the API's [fair use policy](https://pokeapi.co/docs/v2#fairuse), pl The PokéAPI uses the same [JSONSchema](https://json-schema.org/understanding-json-schema/reference/index.html) types that Airbyte uses internally \(`string`, `date-time`, `object`, `array`, `boolean`, `integer`, and `number`\), so no type conversions happen as part of this source. - ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.1 | 2020-06-29 | [1046](https://github.com/airbytehq/airbyte/pull/4410) | Fix runtime UI error from GitHub store path. | -| 0.1.0 | 2020-05-04 | [1046](https://github.com/airbytehq/airbyte/pull/3149) | Add source for PokeAPI. | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.1 | 2020-06-29 | [1046](https://github.com/airbytehq/airbyte/pull/4410) | Fix runtime UI error from GitHub store path. | +| 0.1.0 | 2020-05-04 | [1046](https://github.com/airbytehq/airbyte/pull/3149) | Add source for PokeAPI. | diff --git a/docs/integrations/sources/postgres.md b/docs/integrations/sources/postgres.md index ca2a9d4a847..0a1b27acdd6 100644 --- a/docs/integrations/sources/postgres.md +++ b/docs/integrations/sources/postgres.md @@ -19,10 +19,11 @@ The Postgres source does not alter the schema present in your database. Depending on the destination connected to this source, however, the schema may be altered. See the destination's documentation for more details. -## Getting Started (Airbyte Cloud) +## Getting Started \(Airbyte Cloud\) + On Airbyte Cloud, only TLS connections to your Postgres instance are supported. Other than that, you can proceed with the open-source instructions below. -## Getting Started (Airbyte Open-Source) +## Getting Started \(Airbyte Open-Source\) #### Requirements @@ -109,8 +110,7 @@ We recommend using a user specifically for Airbyte's replication so you can mini #### 3. Select replication plugin -We recommend using a `pgoutput` plugin as it is the standard logical decoding plugin in Postgres. -In case the replication table contains a lot of big JSON blobs and table size exceeds 1 GB, we recommend using a `wal2json` instead. Please note that `wal2json` may require additional installation for Bare Metal, VMs \(EC2/GCE/etc\), Docker, etc. For more information read [wal2json documentation](https://github.com/eulerto/wal2json). +We recommend using a `pgoutput` plugin as it is the standard logical decoding plugin in Postgres. In case the replication table contains a lot of big JSON blobs and table size exceeds 1 GB, we recommend using a `wal2json` instead. Please note that `wal2json` may require additional installation for Bare Metal, VMs \(EC2/GCE/etc\), Docker, etc. For more information read [wal2json documentation](https://github.com/eulerto/wal2json). #### 4. Create replication slot @@ -179,17 +179,17 @@ Unfortunately, logical replication is not configurable for Google CloudSQL. You ## Connection via SSH Tunnel -Airbyte has the ability to connect to a Postgres instance via an SSH Tunnel. The reason you might want to do this because it is not possible (or against security policy) to connect to the database directly (e.g. it does not have a public IP address). +Airbyte has the ability to connect to a Postgres instance via an SSH Tunnel. The reason you might want to do this because it is not possible \(or against security policy\) to connect to the database directly \(e.g. it does not have a public IP address\). 
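To picture what the tunnel does, the sketch below opens an SSH tunnel through a bastion and connects to Postgres over it. This only illustrates the idea and is not how Airbyte implements it; the `sshtunnel` and `psycopg2` packages, host names, user names, and key path are all assumptions made for the example:

```python
import psycopg2
from sshtunnel import SSHTunnelForwarder

# Hypothetical hosts, users, and key path, just to illustrate the bastion hop.
with SSHTunnelForwarder(
    ("bastion.example.com", 22),             # jump server host and SSH port
    ssh_username="airbyte",                  # OS user on the bastion, not the DB user
    ssh_pkey="/path/to/myuser_rsa",          # the RSA private key in PEM format
    remote_bind_address=("10.0.0.12", 5432), # the Postgres host as seen from the bastion
) as tunnel:
    conn = psycopg2.connect(
        host="127.0.0.1",
        port=tunnel.local_bind_port,         # local end of the tunnel
        dbname="postgres",
        user="airbyte",
        password="...",
    )
    with conn.cursor() as cur:
        cur.execute("SELECT 1;")
        print(cur.fetchone())
    conn.close()
```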
-When using an SSH tunnel, you are configuring Airbyte to connect to an intermediate server (a.k.a. a bastion sever) that _does_ have direct access to the database. Airbyte connects to the bastion and then asks the bastion to connect directly to the server. +When using an SSH tunnel, you are configuring Airbyte to connect to an intermediate server \(a.k.a. a bastion sever\) that _does_ have direct access to the database. Airbyte connects to the bastion and then asks the bastion to connect directly to the server. Using this feature requires additional configuration, when creating the source. We will talk through what each piece of configuration means. 1. Configure all fields for the source as you normally would, except `SSH Tunnel Method`. -2. `SSH Tunnel Method` defaults to `No Tunnel` (meaning a direct connection). If you want to use an SSH Tunnel choose `SSH Key Authentication` or `Password Authentication`. - 1. Choose `Key Authentication` if you will be using an RSA Private as your secrets for establishing the SSH Tunnel (see below for more information on generating this key). +2. `SSH Tunnel Method` defaults to `No Tunnel` \(meaning a direct connection\). If you want to use an SSH Tunnel choose `SSH Key Authentication` or `Password Authentication`. + 1. Choose `Key Authentication` if you will be using an RSA Private as your secrets for establishing the SSH Tunnel \(see below for more information on generating this key\). 2. Choose `Password Authentication` if you will be using a password as your secret for establishing the SSH Tunnel. -3. `SSH Tunnel Jump Server Host` refers to the intermediate (bastion) server that Airbyte will connect to. This should be a hostname or an IP Address. +3. `SSH Tunnel Jump Server Host` refers to the intermediate \(bastion\) server that Airbyte will connect to. This should be a hostname or an IP Address. 4. `SSH Connection Port` is the port on the bastion server with which to make the SSH connection. The default port for SSH connections is `22`, so unless you have explicitly changed something, go with the default. 5. `SSH Login Username` is the username that Airbyte should use when connection to the bastion server. This is NOT the Postgres username. 6. If you are using `Password Authentication`, then `SSH Login Username` should be set to the password of the User from the previous step. If you are using `SSH Key Authentication` leave this blank. Again, this is not the Postgres password, but the password for the OS-user that Airbyte is using to perform commands on the bastion. @@ -197,25 +197,23 @@ Using this feature requires additional configuration, when creating the source. ### Generating an RSA Private Key -The connector expects an RSA key in PEM format. To generate this key: +The connector expects an RSA key in PEM format. To generate this key: - ssh-keygen -t rsa -m PEM -f myuser_rsa +```text +ssh-keygen -t rsa -m PEM -f myuser_rsa +``` -This produces the private key in pem format, and the public key remains in the standard format used by the `authorized_keys` file on -your bastion host. The public key should be added to your bastion host to whichever user you want to use with Airbyte. The private -key is provided via copy-and-paste to the Airbyte connector configuration screen, so it may log in to the bastion. +This produces the private key in pem format, and the public key remains in the standard format used by the `authorized_keys` file on your bastion host. 
The public key should be added to your bastion host to whichever user you want to use with Airbyte. The private key is provided via copy-and-paste to the Airbyte connector configuration screen, so it may log in to the bastion. ## Data type mapping -Postgres data types are mapped to the following data types when synchronizing data. -You can check the test values examples [here](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-postgres/src/test-integration/java/io/airbyte/integrations/io/airbyte/integration_tests/sources/PostresSourceComprehensiveTest.java). -If you can't find the data type you are looking for or have any problems feel free to add a new test! +Postgres data types are mapped to the following data types when synchronizing data. You can check the test values examples [here](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-postgres/src/test-integration/java/io/airbyte/integrations/io/airbyte/integration_tests/sources/PostresSourceComprehensiveTest.java). If you can't find the data type you are looking for or have any problems feel free to add a new test! | Postgres Type | Resulting Type | Notes | | :--- | :--- | :--- | | `bigint` | number | | | `bigserial` | number | | -| `bit` | boolean | | +| `bit` | boolean | | | `blob` | boolean | | | `boolean` | boolean | | | `box` | string | | @@ -267,32 +265,33 @@ If you can't find the data type you are looking for or have any problems feel fr ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.3.12 | 2021-09-30 | [6585](https://github.com/airbytehq/airbyte/pull/6585) | Improved SSH Tunnel key generation steps | -| 0.3.11 | 2021-09-02 | [5742](https://github.com/airbytehq/airbyte/pull/5742) | Add SSH Tunnel support | -| 0.3.9 | 2021-08-17 | [5304](https://github.com/airbytehq/airbyte/pull/5304) | Fix CDC OOM issue | -| 0.3.8 | 2021-08-13 | [4699](https://github.com/airbytehq/airbyte/pull/4699) | Added json config validator | -| 0.3.4 | 2021-06-09 | [3973](https://github.com/airbytehq/airbyte/pull/3973) | Add `AIRBYTE_ENTRYPOINT` for Kubernetes support | -| 0.3.3 | 2021-06-08 | [3960](https://github.com/airbytehq/airbyte/pull/3960) | Add method field in specification parameters | -| 0.3.2 | 2021-05-26 | [3179](https://github.com/airbytehq/airbyte/pull/3179) | Remove `isCDC` logging | -| 0.3.1 | 2021-04-21 | [2878](https://github.com/airbytehq/airbyte/pull/2878) | Set defined cursor for CDC | -| 0.3.0 | 2021-04-21 | [2990](https://github.com/airbytehq/airbyte/pull/2990) | Support namespaces | -| 0.2.7 | 2021-04-16 | [2923](https://github.com/airbytehq/airbyte/pull/2923) | SSL spec as optional | -| 0.2.6 | 2021-04-16 | [2757](https://github.com/airbytehq/airbyte/pull/2757) | Support SSL connection | -| 0.2.5 | 2021-04-12 | [2859](https://github.com/airbytehq/airbyte/pull/2859) | CDC bugfix | -| 0.2.4 | 2021-04-09 | [2548](https://github.com/airbytehq/airbyte/pull/2548) | Support CDC | -| 0.2.3 | 2021-03-28 | [2600](https://github.com/airbytehq/airbyte/pull/2600) | Add NCHAR and NVCHAR support to DB and cursor type casting | -| 0.2.2 | 2021-03-26 | [2460](https://github.com/airbytehq/airbyte/pull/2460) | Destination supports destination sync mode | -| 0.2.1 | 2021-03-18 | [2488](https://github.com/airbytehq/airbyte/pull/2488) | Sources support primary keys | -| 0.2.0 | 2021-03-09 | [2238](https://github.com/airbytehq/airbyte/pull/2238) | Protocol allows future/unknown properties | -| 0.1.13 | 2021-02-02 | 
[1887](https://github.com/airbytehq/airbyte/pull/1887) | Migrate AbstractJdbcSource to use iterators | -| 0.1.12 | 2021-01-25 | [1746](https://github.com/airbytehq/airbyte/pull/1746) | Fix NPE in State Decorator | -| 0.1.11 | 2021-01-25 | [1765](https://github.com/airbytehq/airbyte/pull/1765) | Add field titles to specification | -| 0.1.10 | 2021-01-19 | [1724](https://github.com/airbytehq/airbyte/pull/1724) | Fix JdbcSource handling of tables with same names in different schemas | -| 0.1.9 | 2021-01-14 | [1655](https://github.com/airbytehq/airbyte/pull/1655) | Fix JdbcSource OOM | -| 0.1.8 | 2021-01-13 | [1588](https://github.com/airbytehq/airbyte/pull/1588) | Handle invalid numeric values in JDBC source | -| 0.1.7 | 2021-01-08 | [1307](https://github.com/airbytehq/airbyte/pull/1307) | Migrate Postgres and MySql to use new JdbcSource | -| 0.1.6 | 2020-12-09 | [1172](https://github.com/airbytehq/airbyte/pull/1172) | Support incremental sync | -| 0.1.5 | 2020-11-30 | [1038](https://github.com/airbytehq/airbyte/pull/1038) | Change JDBC sources to discover more than standard schemas | -| 0.1.4 | 2020-11-30 | [1046](https://github.com/airbytehq/airbyte/pull/1046) | Add connectors using an index YAML file | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.3.12 | 2021-09-30 | [6585](https://github.com/airbytehq/airbyte/pull/6585) | Improved SSH Tunnel key generation steps | +| 0.3.11 | 2021-09-02 | [5742](https://github.com/airbytehq/airbyte/pull/5742) | Add SSH Tunnel support | +| 0.3.9 | 2021-08-17 | [5304](https://github.com/airbytehq/airbyte/pull/5304) | Fix CDC OOM issue | +| 0.3.8 | 2021-08-13 | [4699](https://github.com/airbytehq/airbyte/pull/4699) | Added json config validator | +| 0.3.4 | 2021-06-09 | [3973](https://github.com/airbytehq/airbyte/pull/3973) | Add `AIRBYTE_ENTRYPOINT` for Kubernetes support | +| 0.3.3 | 2021-06-08 | [3960](https://github.com/airbytehq/airbyte/pull/3960) | Add method field in specification parameters | +| 0.3.2 | 2021-05-26 | [3179](https://github.com/airbytehq/airbyte/pull/3179) | Remove `isCDC` logging | +| 0.3.1 | 2021-04-21 | [2878](https://github.com/airbytehq/airbyte/pull/2878) | Set defined cursor for CDC | +| 0.3.0 | 2021-04-21 | [2990](https://github.com/airbytehq/airbyte/pull/2990) | Support namespaces | +| 0.2.7 | 2021-04-16 | [2923](https://github.com/airbytehq/airbyte/pull/2923) | SSL spec as optional | +| 0.2.6 | 2021-04-16 | [2757](https://github.com/airbytehq/airbyte/pull/2757) | Support SSL connection | +| 0.2.5 | 2021-04-12 | [2859](https://github.com/airbytehq/airbyte/pull/2859) | CDC bugfix | +| 0.2.4 | 2021-04-09 | [2548](https://github.com/airbytehq/airbyte/pull/2548) | Support CDC | +| 0.2.3 | 2021-03-28 | [2600](https://github.com/airbytehq/airbyte/pull/2600) | Add NCHAR and NVCHAR support to DB and cursor type casting | +| 0.2.2 | 2021-03-26 | [2460](https://github.com/airbytehq/airbyte/pull/2460) | Destination supports destination sync mode | +| 0.2.1 | 2021-03-18 | [2488](https://github.com/airbytehq/airbyte/pull/2488) | Sources support primary keys | +| 0.2.0 | 2021-03-09 | [2238](https://github.com/airbytehq/airbyte/pull/2238) | Protocol allows future/unknown properties | +| 0.1.13 | 2021-02-02 | [1887](https://github.com/airbytehq/airbyte/pull/1887) | Migrate AbstractJdbcSource to use iterators | +| 0.1.12 | 2021-01-25 | [1746](https://github.com/airbytehq/airbyte/pull/1746) | Fix NPE in State Decorator | +| 0.1.11 | 2021-01-25 | [1765](https://github.com/airbytehq/airbyte/pull/1765) | Add field 
titles to specification | +| 0.1.10 | 2021-01-19 | [1724](https://github.com/airbytehq/airbyte/pull/1724) | Fix JdbcSource handling of tables with same names in different schemas | +| 0.1.9 | 2021-01-14 | [1655](https://github.com/airbytehq/airbyte/pull/1655) | Fix JdbcSource OOM | +| 0.1.8 | 2021-01-13 | [1588](https://github.com/airbytehq/airbyte/pull/1588) | Handle invalid numeric values in JDBC source | +| 0.1.7 | 2021-01-08 | [1307](https://github.com/airbytehq/airbyte/pull/1307) | Migrate Postgres and MySql to use new JdbcSource | +| 0.1.6 | 2020-12-09 | [1172](https://github.com/airbytehq/airbyte/pull/1172) | Support incremental sync | +| 0.1.5 | 2020-11-30 | [1038](https://github.com/airbytehq/airbyte/pull/1038) | Change JDBC sources to discover more than standard schemas | +| 0.1.4 | 2020-11-30 | [1046](https://github.com/airbytehq/airbyte/pull/1046) | Add connectors using an index YAML file | + diff --git a/docs/integrations/sources/posthog.md b/docs/integrations/sources/posthog.md index a20e70243ea..fe2bcf5c528 100644 --- a/docs/integrations/sources/posthog.md +++ b/docs/integrations/sources/posthog.md @@ -2,16 +2,15 @@ ## Sync overview -This source can sync data for the [PostHog API](https://posthog.com/docs/api/overview). It supports both Full Refresh and Incremental syncs. -You can choose if this connector will copy only the new or updated data, or all rows in the tables and columns you set up for replication, every time a sync is run. +This source can sync data for the [PostHog API](https://posthog.com/docs/api/overview). It supports both Full Refresh and Incremental syncs. You can choose if this connector will copy only the new or updated data, or all rows in the tables and columns you set up for replication, every time a sync is run. ### Output schema This Source is capable of syncing the following core Streams: -* [Annotations](https://posthog.com/docs/api/annotations) (Incremental) +* [Annotations](https://posthog.com/docs/api/annotations) \(Incremental\) * [Cohorts](https://posthog.com/docs/api/cohorts) -* [Events](https://posthog.com/docs/api/events) (Incremental) +* [Events](https://posthog.com/docs/api/events) \(Incremental\) * [FeatureFlags](https://posthog.com/docs/api/feature-flags) * [Insights](https://posthog.com/docs/api/insights) * [InsightsPath](https://posthog.com/docs/api/insights) @@ -50,14 +49,15 @@ Please [create an issue](https://github.com/airbytehq/airbyte/issues) if you see ### Setup guide -Please follow these [steps](https://posthog.com/docs/api/overview#how-to-obtain-a-personal-api-key) to obtain Private API Key for your account. +Please follow these [steps](https://posthog.com/docs/api/overview#how-to-obtain-a-personal-api-key) to obtain Private API Key for your account. 
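Once you have the key, a quick authenticated request is an easy way to check it before configuring the source. A sketch using Python's `requests`; the base URL shown is for PostHog Cloud (self-hosted instances use their own host), and the exact endpoint path is an assumption for this example rather than something this guide prescribes:

```python
import os

import requests

BASE_URL = "https://app.posthog.com"            # or your self-hosted PostHog host
key = os.environ["POSTHOG_PRIVATE_API_KEY"]     # the Private API Key obtained above

resp = requests.get(
    f"{BASE_URL}/api/projects/",
    headers={"Authorization": f"Bearer {key}"},
)
resp.raise_for_status()
print("Key accepted; projects visible:", len(resp.json().get("results", [])))
```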
## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.4 | 2021-09-14 | [6058](https://github.com/airbytehq/airbyte/pull/6058) | Support self-hosted posthog instances | -| 0.1.3 | 2021-07-20 | [4001](https://github.com/airbytehq/airbyte/pull/4001) | Incremental streams read only relevant pages| -| 0.1.2 | 2021-07-15 | [4692](https://github.com/airbytehq/airbyte/pull/4692) | Use account information for checking the connection| -| 0.1.1 | 2021-07-05 | [4539](https://github.com/airbytehq/airbyte/pull/4539) | Add `AIRBYTE_ENTRYPOINT` env variable for kubernetes support| -| 0.1.0 | 2021-06-08 | [3768](https://github.com/airbytehq/airbyte/pull/3768) | Initial Release| +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.4 | 2021-09-14 | [6058](https://github.com/airbytehq/airbyte/pull/6058) | Support self-hosted posthog instances | +| 0.1.3 | 2021-07-20 | [4001](https://github.com/airbytehq/airbyte/pull/4001) | Incremental streams read only relevant pages | +| 0.1.2 | 2021-07-15 | [4692](https://github.com/airbytehq/airbyte/pull/4692) | Use account information for checking the connection | +| 0.1.1 | 2021-07-05 | [4539](https://github.com/airbytehq/airbyte/pull/4539) | Add `AIRBYTE_ENTRYPOINT` env variable for kubernetes support | +| 0.1.0 | 2021-06-08 | [3768](https://github.com/airbytehq/airbyte/pull/3768) | Initial Release | + diff --git a/docs/integrations/sources/presta-shop.md b/docs/integrations/sources/presta-shop.md index 61692a7c7b7..a04c203fc3a 100644 --- a/docs/integrations/sources/presta-shop.md +++ b/docs/integrations/sources/presta-shop.md @@ -64,17 +64,15 @@ This Source is capable of syncing the following core Streams: * [Weight Ranges](https://devdocs.prestashop.com/1.7/webservice/resources/weight_ranges/) * [Zones](https://devdocs.prestashop.com/1.7/webservice/resources/zones/) - - If there are more endpoints you'd like Airbyte to support, please [create an issue.](https://github.com/airbytehq/airbyte/issues/new/choose) ### Features | Feature | Supported? 
| | -| :--- | :--- | :--- +| :--- | :--- | :--- | | Full Refresh Sync | Yes | | | Incremental Sync | Yes | Addresses, Cart Rules, Carts, Categories, Customer Messages, Customer Threads, Customers, Manufacturers, Messages, Order Carriers, Order Histories, Order Invoices, Order Payments, Order Slip, Orders, Products, Stock Movement Reasons, Stock Movements, Stores, Suppliers, Tax Rule Groups | -| Replicate Incremental Deletes | Coming soon | | +| Replicate Incremental Deletes | Coming soon | | | SSL connection | Yes | | | Namespaces | No | | @@ -92,5 +90,6 @@ By default, the webservice feature is disabled on PrestaShop and needs to be [sw ## CHANGELOG | Version | Date | Pull Request | Subject | -| :--- | :--- | :--- | :--- | -| 0.1.0 | 2021-07-02 | [#4465](https://github.com/airbytehq/airbyte/pull/4465) | Initial implementation | +| :--- | :--- | :--- | :--- | +| 0.1.0 | 2021-07-02 | [\#4465](https://github.com/airbytehq/airbyte/pull/4465) | Initial implementation | + diff --git a/docs/integrations/sources/quickbooks.md b/docs/integrations/sources/quickbooks.md index a69312711f0..ab7509066c5 100644 --- a/docs/integrations/sources/quickbooks.md +++ b/docs/integrations/sources/quickbooks.md @@ -77,6 +77,7 @@ The easiest way to get these credentials is by using Quickbook's [OAuth 2.0 play ## CHANGELOG | Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | +| :--- | :--- | :--- | :--- | | `0.1.3` | 2021-08-10 | [4986](https://github.com/airbytehq/airbyte/pull/4986) | Using number data type for decimal fields instead string | | `0.1.2` | 2021-07-06 | [4539](https://github.com/airbytehq/airbyte/pull/4539) | Add `AIRBYTE_ENTRYPOINT` for Kubernetes support | + diff --git a/docs/integrations/sources/recharge.md b/docs/integrations/sources/recharge.md index fead348af5f..9bd2b4fbb9a 100644 --- a/docs/integrations/sources/recharge.md +++ b/docs/integrations/sources/recharge.md @@ -48,7 +48,8 @@ Please read [How to generate your API token](https://support.rechargepayments.co ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.2 | 2021-09-17 | [6149](https://github.com/airbytehq/airbyte/pull/6149) | Change `cursor_field` for Incremental streams | -| \ No newline at end of file +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.2 | 2021-09-17 | [6149](https://github.com/airbytehq/airbyte/pull/6149) | Change `cursor_field` for Incremental streams | +| | | | | + diff --git a/docs/integrations/sources/redshift.md b/docs/integrations/sources/redshift.md index d65c1df233e..7d7a711ddae 100644 --- a/docs/integrations/sources/redshift.md +++ b/docs/integrations/sources/redshift.md @@ -45,9 +45,9 @@ This is dependent on your networking setup. The easiest way to verify if Airbyte Next is to provide the necessary information on how to connect to your cluster such as the `host` whcih is part of the connection string or Endpoint accessible [here](https://docs.aws.amazon.com/redshift/latest/gsg/rs-gsg-connect-to-cluster.html#rs-gsg-how-to-get-connection-string) without the `port` and `database` name \(it typically includes the cluster-id, region and end with `.redshift.amazonaws.com`\). 
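If you want to verify those connection details before configuring the source, Redshift speaks the Postgres wire protocol, so a plain `psycopg2` connection is enough to test them. A sketch with a hypothetical cluster endpoint (5439 is the Redshift default port):

```python
import psycopg2

conn = psycopg2.connect(
    host="examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com",  # hypothetical endpoint
    port=5439,        # Redshift default port
    dbname="dev",
    user="awsuser",
    password="...",
)
with conn.cursor() as cur:
    cur.execute("SELECT current_database(), current_user;")
    print(cur.fetchone())
conn.close()
```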
- ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.3.2 | 2021-08-13 | [4699](https://github.com/airbytehq/airbyte/pull/4699) | Added json config validator | \ No newline at end of file +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.3.2 | 2021-08-13 | [4699](https://github.com/airbytehq/airbyte/pull/4699) | Added json config validator | + diff --git a/docs/integrations/sources/s3.md b/docs/integrations/sources/s3.md index fae4e8e0974..055167ee70e 100644 --- a/docs/integrations/sources/s3.md +++ b/docs/integrations/sources/s3.md @@ -1,25 +1,26 @@ -# AWS S3 +# S3 ## Overview -The S3 source enables syncing of file-based tables with support for multiple files using glob-like pattern matching, and both Full Refresh and Incremental syncs, using the last_modified property of files to determine incremental batches. +The S3 source enables syncing of file-based tables with support for multiple files using glob-like pattern matching, and both Full Refresh and Incremental syncs, using the last\_modified property of files to determine incremental batches. You can choose if this connector will read only the new/updated files, or all the matching files, every time a sync is run. Connector allows using either Amazon S3 storage or 3rd party S3 compatible service like Wasabi or custom S3 services set up with minio, leofs, ceph etc. + ### Output Schema -At this time, this source produces only a single stream (table) for the target files. +At this time, this source produces only a single stream \(table\) for the target files. -By default, the schema will be automatically inferred from all the relevant files present when setting up the connection, however you can also specify a schema in the source settings to enforce desired columns and datatypes. Any additional columns found (on any sync) are packed into an extra mapping field called `_ab_additional_properties`. Any missing columns will be added and null-filled. +By default, the schema will be automatically inferred from all the relevant files present when setting up the connection, however you can also specify a schema in the source settings to enforce desired columns and datatypes. Any additional columns found \(on any sync\) are packed into an extra mapping field called `_ab_additional_properties`. Any missing columns will be added and null-filled. -We'll be considering extending these behaviours in the future and welcome your feedback! +We'll be considering extending these behaviours in the future and welcome your feedback! Note that you should provide the `dataset` which dictates how the table will be identified in the destination. ### Data Types -Currently, complex types (array and object) are coerced to string, but we'll be looking to improve support for this in the future! +Currently, complex types \(array and object\) are coerced to string, but we'll be looking to improve support for this in the future! ### Features @@ -68,62 +69,64 @@ We're looking to enable these other formats very soon, so watch this space! ### Requirements -- If syncing from a private bucket, the credentials you use for the connection must have have both `read` and `list` access on the S3 bucket. `list` is required to discover files based on the provided pattern(s). +* If syncing from a private bucket, the credentials you use for the connection must have have both `read` and `list` access on the S3 bucket. `list` is required to discover files based on the provided pattern\(s\). 
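A quick way to check that a set of credentials satisfies both requirements is to list and then read one object with `boto3`. A sketch with a hypothetical bucket, prefix, and key, mirroring the example layout further down this page:

```python
import boto3

# Hypothetical bucket, prefix, and key; the credentials must allow both `list` and `read`.
s3 = boto3.client(
    "s3",
    aws_access_key_id="...",
    aws_secret_access_key="...",
)

listing = s3.list_objects_v2(Bucket="myBucket", Prefix="some_table_files/")
for obj in listing.get("Contents", []):
    print(obj["Key"], obj["Size"])          # proves `list` access

body = s3.get_object(Bucket="myBucket", Key="some_table_files/part1.csv")["Body"].read()
print(len(body), "bytes read")              # proves `read` access
```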
### Quickstart 1. Create a new S3 source with a suitable name. Since each S3 source maps to just a single table, it may be worth including that in the name. -1. Set `dataset` appropriately. This will be the name of the table in the destination. -1. If your bucket contains *only* files containing data for this table, use `**` as path_pattern. See the [Path Patterns section](s3.md#path-patterns) for more specific pattern matching. -1. Leave schema as `{}` to automatically infer it from the file(s). For details on providing a schema, see the [User Schema section](s3.md#user-schema). -1. Fill in the fields within the provider box appropriately. If your bucket is not public, add [credentials](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys) with sufficient permissions under `aws_access_key_id` and `aws_secret_access_key`. -1. Choose the format corresponding to the format of your files and fill in fields as required. If unsure about values, try out the defaults and come back if needed. Find details on these settings [here](s3.md#file-format-settings). +2. Set `dataset` appropriately. This will be the name of the table in the destination. +3. If your bucket contains _only_ files containing data for this table, use `**` as path\_pattern. See the [Path Patterns section](s3.md#path-patterns) for more specific pattern matching. +4. Leave schema as `{}` to automatically infer it from the file\(s\). For details on providing a schema, see the [User Schema section](s3.md#user-schema). +5. Fill in the fields within the provider box appropriately. If your bucket is not public, add [credentials](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys) with sufficient permissions under `aws_access_key_id` and `aws_secret_access_key`. +6. Choose the format corresponding to the format of your files and fill in fields as required. If unsure about values, try out the defaults and come back if needed. Find details on these settings [here](s3.md#file-format-settings). ### Path Pattern -(tl;dr -> path pattern syntax using [wcmatch.glob](https://facelessuser.github.io/wcmatch/glob/). GLOBSTAR and SPLIT flags are enabled.) +\(tl;dr -> path pattern syntax using [wcmatch.glob](https://facelessuser.github.io/wcmatch/glob/). GLOBSTAR and SPLIT flags are enabled.\) This connector can sync multiple files by using glob-style patterns, rather than requiring a specific path for every file. This enables: -- Referencing many files with just one pattern, e.g. `**` would indicate every file in the bucket. -- Referencing future files that don't exist yet (and therefore don't have a specific path). +* Referencing many files with just one pattern, e.g. `**` would indicate every file in the bucket. +* Referencing future files that don't exist yet \(and therefore don't have a specific path\). -You must provide a path pattern. You can also provide many patterns split with | for more complex directory layouts. +You must provide a path pattern. You can also provide many patterns split with \| for more complex directory layouts. -Each path pattern is a reference from the *root* of the bucket, so don't include the bucket name in the pattern(s). +Each path pattern is a reference from the _root_ of the bucket, so don't include the bucket name in the pattern\(s\). Some example patterns: -- `**` : match everything. -- `**/*.csv` : match all files with specific extension. -- `myFolder/**/*.csv` : match all csv files anywhere under myFolder. 
-- `*/**` : match everything at least one folder deep. -- `*/*/*/**` : match everything at least three folders deep. -- `**/file.*|**/file` : match every file called "file" with any extension (or no extension). -- `x/*/y/*` : match all files that sit in folder x -> any folder -> folder y. -- `**/prefix*.csv` : match all csv files with specific prefix. -- `**/prefix*.parquet` : match all parquet files with specific prefix. +* `**` : match everything. +* `**/*.csv` : match all files with specific extension. +* `myFolder/**/*.csv` : match all csv files anywhere under myFolder. +* `*/**` : match everything at least one folder deep. +* `*/*/*/**` : match everything at least three folders deep. +* `**/file.*|**/file` : match every file called "file" with any extension \(or no extension\). +* `x/*/y/*` : match all files that sit in folder x -> any folder -> folder y. +* `**/prefix*.csv` : match all csv files with specific prefix. +* `**/prefix*.parquet` : match all parquet files with specific prefix. Let's look at a specific example, matching the following bucket layout: - myBucket - -> log_files - -> some_table_files - -> part1.csv - -> part2.csv - -> images - -> more_table_files - -> part3.csv - -> extras - -> misc - -> another_part1.csv +```text +myBucket + -> log_files + -> some_table_files + -> part1.csv + -> part2.csv + -> images + -> more_table_files + -> part3.csv + -> extras + -> misc + -> another_part1.csv +``` -We want to pick up part1.csv, part2.csv and part3.csv (excluding another_part1.csv for now). We could do this a few different ways: +We want to pick up part1.csv, part2.csv and part3.csv \(excluding another\_part1.csv for now\). We could do this a few different ways: -- We could pick up every csv file called "partX" with the single pattern `**/part*.csv`. -- To be a bit more robust, we could use the dual pattern `some_table_files/*.csv|more_table_files/*.csv` to pick up relevant files only from those exact folders. -- We could achieve the above in a single pattern by using the pattern `*table_files/*.csv`. This could however cause problems in the future if new unexpected folders started being created. -- We can also recursively wildcard, so adding the pattern `extras/**/*.csv` would pick up any csv files nested in folders below "extras", such as "extras/misc/another_part1.csv". +* We could pick up every csv file called "partX" with the single pattern `**/part*.csv`. +* To be a bit more robust, we could use the dual pattern `some_table_files/*.csv|more_table_files/*.csv` to pick up relevant files only from those exact folders. +* We could achieve the above in a single pattern by using the pattern `*table_files/*.csv`. This could however cause problems in the future if new unexpected folders started being created. +* We can also recursively wildcard, so adding the pattern `extras/**/*.csv` would pick up any csv files nested in folders below "extras", such as "extras/misc/another\_part1.csv". As you can probably tell, there are many ways to achieve the same goal with path patterns. We recommend using a pattern that ensures clarity and is robust against future additions to the directory structure. @@ -131,76 +134,80 @@ As you can probably tell, there are many ways to achieve the same goal with path Providing a schema allows for more control over the output of this stream. Without a provided schema, columns and datatypes will be inferred from each file and a superset schema created. 
This will probably be fine in most cases but there may be situations you want to enforce a schema instead, e.g.: -- You only care about a specific known subset of the columns. The other columns would all still be included, but packed into the `_ab_additional_properties` map. -- Your initial dataset is quite small (in terms of number of records), and you think the automatic type inference from this sample might not be representative of the data in the future. -- You want to purposely define types for every column. -- You know the names of columns that will be added to future data and want to include these in the core schema as columns rather than have them appear in the `_ab_additional_properties` map. +* You only care about a specific known subset of the columns. The other columns would all still be included, but packed into the `_ab_additional_properties` map. +* Your initial dataset is quite small \(in terms of number of records\), and you think the automatic type inference from this sample might not be representative of the data in the future. +* You want to purposely define types for every column. +* You know the names of columns that will be added to future data and want to include these in the core schema as columns rather than have them appear in the `_ab_additional_properties` map. Or any other reason! The schema must be provided as valid JSON as a map of `{"column": "datatype"}` where each datatype is one of: -- string -- number -- integer -- object -- array -- boolean -- null +* string +* number +* integer +* object +* array +* boolean +* null For example: -- {"id": "integer", "location": "string", "longitude": "number", "latitude": "number"} -- {"username": "string", "friends": "array", "information": "object"} +* {"id": "integer", "location": "string", "longitude": "number", "latitude": "number"} +* {"username": "string", "friends": "array", "information": "object"} ### S3 Provider Settings -- `bucket` : name of the bucket your files are in -- `aws_access_key_id` : one half of the [required credentials](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys) for accessing a private bucket. -- `aws_secret_access_key` : other half of the [required credentials](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys) for accessing a private bucket. -- `path_prefix` : an optional string that limits the files returned by AWS when listing files to only that those starting with this prefix. This is different to path_pattern as it gets pushed down to the API call made to S3 rather than filtered in Airbyte and it does not accept pattern-style symbols (like wildcards `*`). We recommend using this if your bucket has many folders and files that are unrelated to this stream and all the relevant files will always sit under this chosen prefix. -- `endpoint` : optional parameter that allow using of non Amazon S3 compatible services. Leave it blank for using default Amazon serivce. -- `use_ssl` : Allows using custom servers that configured to use plain http. Ignored in case of using Amazon service. -- `verify_ssl_cert` : Skip ssl validity check in case of using custom servers with self signed certificates. Ignored in case of using Amazon service. -### File Format Settings -The Reader in charge of loading the file format is currently based on [PyArrow](https://arrow.apache.org/docs/python/generated/pyarrow.csv.open_csv.html) (Apache Arrow). 
-Note that all files within one stream must adhere to the same read options for every provided format. +* `bucket` : name of the bucket your files are in +* `aws_access_key_id` : one half of the [required credentials](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys) for accessing a private bucket. +* `aws_secret_access_key` : other half of the [required credentials](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys) for accessing a private bucket. +* `path_prefix` : an optional string that limits the files returned by AWS when listing files to only those starting with this prefix. This is different to path\_pattern as it gets pushed down to the API call made to S3 rather than filtered in Airbyte and it does not accept pattern-style symbols \(like wildcards `*`\). We recommend using this if your bucket has many folders and files that are unrelated to this stream and all the relevant files will always sit under this chosen prefix. +* `endpoint` : optional parameter that allows using non-Amazon, S3-compatible services. Leave it blank to use the default Amazon service. +* `use_ssl` : Allows using custom servers that are configured to use plain HTTP. Ignored when using the Amazon service. +* `verify_ssl_cert` : Skips the SSL validity check when using custom servers with self-signed certificates. Ignored when using the Amazon service. + + **File Format Settings** + + The Reader in charge of loading the file format is currently based on [PyArrow](https://arrow.apache.org/docs/python/generated/pyarrow.csv.open_csv.html) \(Apache Arrow\). + + Note that all files within one stream must adhere to the same read options for every provided format. #### CSV -Since CSV files are effectively plain text, providing specific reader options is often required for correct parsing of the files. -These settings are applied when a CSV is created or exported so please ensure that this process happens consistently over time. +Since CSV files are effectively plain text, providing specific reader options is often required for correct parsing of the files. These settings are applied when a CSV is created or exported so please ensure that this process happens consistently over time. -- `delimiter` : Even though CSV is an acronymn for Comma Separated Values, it is used more generally as a term for flat file data that may or may not be comma separated. The delimiter field lets you specify which character acts as the separator. -- `quote_char` : In some cases, data values may contain instances of reserved characters (like a comma, if that's the delimiter). CSVs can allow this behaviour by wrapping a value in defined quote characters so that on read it can parse it correctly. -- `escape_char` : An escape character can be used to prefix a reserved character and allow correct parsing. -- `encoding` : Some data may use a different character set (typically when different alphabets are involved). See the [list of allowable encodings here](https://docs.python.org/3/library/codecs.html#standard-encodings). -- `double_quote` : Whether two quotes in a quoted CSV value denote a single quote in the data. -- `newlines_in_values` : Sometimes referred to as `multiline`. In most cases, newline characters signal the end of a row in a CSV, however text data may contain newline characters within it. Setting this to True allows correct parsing in this case.
-- `block_size` : This is the number of bytes to process in memory at a time while reading files. The default value here is usually fine but if your table is particularly wide (lots of columns / data in fields is large) then raising this might solve failures on detecting schema. Since this defines how much data to read into memory, raising this too high could cause Out Of Memory issues so use with caution. +* `delimiter` : Even though CSV is an acronym for Comma Separated Values, it is used more generally as a term for flat file data that may or may not be comma separated. The delimiter field lets you specify which character acts as the separator. +* `quote_char` : In some cases, data values may contain instances of reserved characters \(like a comma, if that's the delimiter\). CSVs can allow this behaviour by wrapping a value in defined quote characters so that on read it can parse it correctly. +* `escape_char` : An escape character can be used to prefix a reserved character and allow correct parsing. +* `encoding` : Some data may use a different character set \(typically when different alphabets are involved\). See the [list of allowable encodings here](https://docs.python.org/3/library/codecs.html#standard-encodings). +* `double_quote` : Whether two quotes in a quoted CSV value denote a single quote in the data. +* `newlines_in_values` : Sometimes referred to as `multiline`. In most cases, newline characters signal the end of a row in a CSV, however text data may contain newline characters within it. Setting this to True allows correct parsing in this case. +* `block_size` : This is the number of bytes to process in memory at a time while reading files. The default value here is usually fine but if your table is particularly wide \(lots of columns / data in fields is large\) then raising this might solve failures on detecting schema. Since this defines how much data to read into memory, raising this too high could cause Out Of Memory issues so use with caution. -The final setting in the UI is `additional_reader_options`. This is a catch-all to allow for editing the less commonly required CSV parsing options. The value must be a valid JSON string, e.g.: +The final setting in the UI is `additional_reader_options`. This is a catch-all to allow for editing the less commonly required CSV parsing options. The value must be a valid JSON string, e.g.: - {"timestamp_parsers": ["%m/%d/%Y %H:%M", "%Y/%m/%d %H:%M"], "strings_can_be_null": true, "null_values": ["NA", "NULL"]} +```text +{"timestamp_parsers": ["%m/%d/%Y %H:%M", "%Y/%m/%d %H:%M"], "strings_can_be_null": true, "null_values": ["NA", "NULL"]} +``` You can find details on [available options here](https://arrow.apache.org/docs/python/generated/pyarrow.csv.ConvertOptions.html#pyarrow.csv.ConvertOptions). #### Parquet -Apache Parquet file is a column-oriented data storage format of the Apache Hadoop ecosystem. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk. For now this solutiion are iterating through individual files at the abstract-level thus partitioned parquet datasets are unsupported. -The following settings are available: -- `buffer_size` : If positive, perform read buffering when deserializing individual column chunks. Otherwise IO calls are unbuffered. -- `columns` : If not None, only these columns will be read from the file. -- `batch_size` : Maximum number of records per batch. Batches may be smaller if there aren’t enough rows in the file.
+Apache Parquet is a column-oriented data storage format of the Apache Hadoop ecosystem. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk. For now this solution iterates through individual files at the abstract level, so partitioned Parquet datasets are unsupported. The following settings are available: +* `buffer_size` : If positive, perform read buffering when deserializing individual column chunks. Otherwise IO calls are unbuffered. +* `columns` : If not None, only these columns will be read from the file. +* `batch_size` : Maximum number of records per batch. Batches may be smaller if there aren’t enough rows in the file. You can find more details [here](https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetFile.html#pyarrow.parquet.ParquetFile.iter_batches). ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.5 | 2021-09-24 | [6398](https://github.com/airbytehq/airbyte/pull/6398) | Support custom non Amazon S3 services | -| 0.1.4 | 2021-08-13 | [5305](https://github.com/airbytehq/airbyte/pull/5305) | Support of Parquet format | -| 0.1.3 | 2021-08-04 | [5197](https://github.com/airbytehq/airbyte/pull/5197) | Fixed bug where sync could hang indefinitely on schema inference | -| 0.1.2 | 2021-08-02 | [5135](https://github.com/airbytehq/airbyte/pull/5135) | Fixed bug in spec so it displays in UI correctly | -| 0.1.1 | 2021-07-30 | [4990](https://github.com/airbytehq/airbyte/pull/4990/commits/ff5f70662c5f84eabc03526cddfcc9d73c58c0f4) | Fixed documentation url in source definition | -| 0.1.0 | 2021-07-30 | [4990](https://github.com/airbytehq/airbyte/pull/4990) | Created S3 source connector | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.5 | 2021-09-24 | [6398](https://github.com/airbytehq/airbyte/pull/6398) | Support custom non Amazon S3 services | +| 0.1.4 | 2021-08-13 | [5305](https://github.com/airbytehq/airbyte/pull/5305) | Support of Parquet format | +| 0.1.3 | 2021-08-04 | [5197](https://github.com/airbytehq/airbyte/pull/5197) | Fixed bug where sync could hang indefinitely on schema inference | +| 0.1.2 | 2021-08-02 | [5135](https://github.com/airbytehq/airbyte/pull/5135) | Fixed bug in spec so it displays in UI correctly | +| 0.1.1 | 2021-07-30 | [4990](https://github.com/airbytehq/airbyte/pull/4990/commits/ff5f70662c5f84eabc03526cddfcc9d73c58c0f4) | Fixed documentation url in source definition | +| 0.1.0 | 2021-07-30 | [4990](https://github.com/airbytehq/airbyte/pull/4990) | Created S3 source connector | + diff --git a/docs/integrations/sources/salesforce.md b/docs/integrations/sources/salesforce.md index ce9338dda51..64b27d5c5bf 100644 --- a/docs/integrations/sources/salesforce.md +++ b/docs/integrations/sources/salesforce.md @@ -45,7 +45,7 @@ If you log in using at [https://login.salesforce.com](https://login.salesforce.c ## Streams -**Note**: The connector supports reading not only standard streams (listed below), but also reading `Custom Objects`. +**Note**: The connector supports reading not only standard streams \(listed below\), but also reading `Custom Objects`.
List of available streams: @@ -730,11 +730,11 @@ List of available streams: * TaskWhoRelation * UndecidedEventRelation - ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.2 | 2021-09-30 | [6438](https://github.com/airbytehq/airbyte/pull/6438) | Annotate Oauth2 flow initialization parameters in connector specification | -| 0.1.1 | 2021-09-21 | [6209](https://github.com/airbytehq/airbyte/pull/6209) | Fix bug with pagination for BULK API | -| 0.1.0 | 2021-09-08 | [5619](https://github.com/airbytehq/airbyte/pull/5619) | Salesforce Aitbyte-Native Connector | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.2 | 2021-09-30 | [6438](https://github.com/airbytehq/airbyte/pull/6438) | Annotate Oauth2 flow initialization parameters in connector specification | +| 0.1.1 | 2021-09-21 | [6209](https://github.com/airbytehq/airbyte/pull/6209) | Fix bug with pagination for BULK API | +| 0.1.0 | 2021-09-08 | [5619](https://github.com/airbytehq/airbyte/pull/5619) | Salesforce Aitbyte-Native Connector | + diff --git a/docs/integrations/sources/sap-business-one.md b/docs/integrations/sources/sap-business-one.md index 3d2e2e53c3a..bfd3ac62e3b 100644 --- a/docs/integrations/sources/sap-business-one.md +++ b/docs/integrations/sources/sap-business-one.md @@ -1,15 +1,16 @@ # SAP Business One -[SAP Business One](https://www.sap.com/products/business-one.html) is an Enterprise Resource Planning (ERP) system. +[SAP Business One](https://www.sap.com/products/business-one.html) is an Enterprise Resource Planning \(ERP\) system. ## Sync overview SAP Business One can run on the MSSQL or SAP HANA databases. If your instance is deployed on MSSQL, you can use Airbyte to sync your SAP Business One instance by using the [MSSQL connector](mssql.md). {% hint style="info" %} -Reach out to your service representative or system admin to find the parameters required to connect to the underlying database +Reach out to your service representative or system admin to find the parameters required to connect to the underlying database {% endhint %} - ### Output schema -The schema will be loaded according to the rules of the underlying database's connector and the data available in your B1 instance. + +The schema will be loaded according to the rules of the underlying database's connector and the data available in your B1 instance. + diff --git a/docs/integrations/sources/sendgrid.md b/docs/integrations/sources/sendgrid.md index 05df604bf67..41b2088c842 100644 --- a/docs/integrations/sources/sendgrid.md +++ b/docs/integrations/sources/sendgrid.md @@ -43,7 +43,8 @@ Generate a API key using the [Sendgrid documentation](https://sendgrid.com/docs/ We recommend creating a key specifically for Airbyte access. This will allow you to control which resources Airbyte should be able to access. The API key should be read-only on all resources except Marketing, where it needs Full Access. 
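A quick way to confirm the new key is scoped the way you intend is to ask the Sendgrid API which permissions it carries. This is a minimal sketch, assuming the standard Sendgrid v3 API; `SENDGRID_API_KEY` is a placeholder for the key you just generated:

```text
# List the scopes granted to the API key Airbyte will use (illustrative check only)
curl -s -H "Authorization: Bearer $SENDGRID_API_KEY" https://api.sendgrid.com/v3/scopes
```

If the response is missing read access for a resource you expect Airbyte to sync, regenerate the key with the appropriate permissions.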
-| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.2.7 | 2021-09-08 | [5910](https://github.com/airbytehq/airbyte/pull/5910) | Add Single Sends Stats stream | -| 0.2.6 | 2021-07-19 | [4839](https://github.com/airbytehq/airbyte/pull/4839) | Gracefully handle malformed responses from the API | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.2.7 | 2021-09-08 | [5910](https://github.com/airbytehq/airbyte/pull/5910) | Add Single Sends Stats stream | +| 0.2.6 | 2021-07-19 | [4839](https://github.com/airbytehq/airbyte/pull/4839) | Gracefully handle malformed responses from the API | + diff --git a/docs/integrations/sources/shopify.md b/docs/integrations/sources/shopify.md index 6cc0ee68d24..cee1f238aba 100644 --- a/docs/integrations/sources/shopify.md +++ b/docs/integrations/sources/shopify.md @@ -31,8 +31,10 @@ This Source is capable of syncing the following core Streams: * [Pages](https://help.shopify.com/en/api/reference/online-store/page) * [Price Rules](https://help.shopify.com/en/api/reference/discounts/pricerule) -#### NOTE: +#### NOTE: + For better experience with `Incremental Refresh` the follwing is recomended: + * `Order Refunds`, `Order Risks`, `Transactions` should be synced along with `Orders` stream. * `Discount Codes` should be synced along with `Price Rules` stream. @@ -57,13 +59,13 @@ If child streams are synced alone from the parent stream - the full sync will ta ### Performance considerations -Shopify has some [rate limit restrictions](https://shopify.dev/concepts/about-apis/rate-limits). -Typically, there should not be issues with throttling or exceeding the rate limits but in some edge cases, user can receive the warning message as follows: -``` +Shopify has some [rate limit restrictions](https://shopify.dev/concepts/about-apis/rate-limits). Typically, there should not be issues with throttling or exceeding the rate limits but in some edge cases, users can receive the following warning message: + +```text "Caught retryable error ' or null' after tries. Waiting seconds then retrying..." ``` -This is expected when the connector hits the 429 - Rate Limit Exceeded HTTP Error. -With given error message the sync operation is still goes on, but will require more time to finish. + +This is expected when the connector hits the 429 - Rate Limit Exceeded HTTP error. When this message appears, the sync operation still goes on, but it will require more time to finish. ## Getting started @@ -75,24 +77,24 @@ With given error message the sync operation is still goes on, but will require m 5. The password under the `Admin API` section is what you'll use as the `api_password` for the integration. 6. You're ready to set up Shopify in Airbyte!
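Before configuring the connector, it can be worth checking that the credentials from step 5 actually work. The sketch below is only an illustration, assuming a private app using basic auth against the Admin REST API; `your-store`, `API_KEY` and `API_PASSWORD` are placeholders, and the API version should match the one your shop supports (the connector currently targets 2021-07):

```text
# Fetch basic shop details to verify the api_password (placeholders must be replaced)
curl -s -u "$API_KEY:$API_PASSWORD" "https://your-store.myshopify.com/admin/api/2021-07/shop.json"
```

A JSON payload describing the shop indicates the `api_password` is valid.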
- ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.18 | 2021-09-21 | [6056](https://github.com/airbytehq/airbyte/pull/6056) | Added `pre_tax_price` to the `orders/line_items` schema | -| 0.1.17 | 2021-09-17 | [5244](https://github.com/airbytehq/airbyte/pull/5244) | Created data type enforcer for converting prices into numbers | -| 0.1.16 | 2021-09-09 | [5965](https://github.com/airbytehq/airbyte/pull/5945) | Fixed the connector's performance for `Incremental refresh` | -| 0.1.15 | 2021-09-02 | [5853](https://github.com/airbytehq/airbyte/pull/5853) | Fixed `amount` type in `order_refund` schema | -| 0.1.14 | 2021-09-02 | [5801](https://github.com/airbytehq/airbyte/pull/5801) | Fixed `line_items/discount allocations` & `duties` parts of `orders` schema | -| 0.1.13 | 2021-08-17 | [5470](https://github.com/airbytehq/airbyte/pull/5470) | Fixed rate limits throttling | -| 0.1.12 | 2021-08-09 | [5276](https://github.com/airbytehq/airbyte/pull/5276) | Add status property to product schema | -| 0.1.11 | 2021-07-23 | [4943](https://github.com/airbytehq/airbyte/pull/4943) | Fix products schema up to API 2021-07 | -| 0.1.10 | 2021-07-19 | [4830](https://github.com/airbytehq/airbyte/pull/4830) | Fix for streams json schemas, upgrade to API version 2021-07 | -| 0.1.9 | 2021-07-04 | [4472](https://github.com/airbytehq/airbyte/pull/4472) | Incremental sync is now using updated_at instead of since_id by default | -| 0.1.8 | 2021-06-29 | [4121](https://github.com/airbytehq/airbyte/pull/4121) | Add draft orders stream | -| 0.1.7 | 2021-06-26 | [4290](https://github.com/airbytehq/airbyte/pull/4290) | Fixed the bug when limiting output records to 1 caused infinity loop | -| 0.1.6 | 2021-06-24 | [4009](https://github.com/airbytehq/airbyte/pull/4009) | Add pages, price rules and discount codes streams | -| 0.1.5 | 2021-06-10 | [3973](https://github.com/airbytehq/airbyte/pull/3973) | Add `AIRBYTE_ENTRYPOINT` for Kubernetes support | -| 0.1.4 | 2021-06-09 | [3926](https://github.com/airbytehq/airbyte/pull/3926) | New attributes to Orders schema | -| 0.1.3 | 2021-06-08 | [3787](https://github.com/airbytehq/airbyte/pull/3787) | Add Native Shopify Source Connector | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.18 | 2021-09-21 | [6056](https://github.com/airbytehq/airbyte/pull/6056) | Added `pre_tax_price` to the `orders/line_items` schema | +| 0.1.17 | 2021-09-17 | [5244](https://github.com/airbytehq/airbyte/pull/5244) | Created data type enforcer for converting prices into numbers | +| 0.1.16 | 2021-09-09 | [5965](https://github.com/airbytehq/airbyte/pull/5945) | Fixed the connector's performance for `Incremental refresh` | +| 0.1.15 | 2021-09-02 | [5853](https://github.com/airbytehq/airbyte/pull/5853) | Fixed `amount` type in `order_refund` schema | +| 0.1.14 | 2021-09-02 | [5801](https://github.com/airbytehq/airbyte/pull/5801) | Fixed `line_items/discount allocations` & `duties` parts of `orders` schema | +| 0.1.13 | 2021-08-17 | [5470](https://github.com/airbytehq/airbyte/pull/5470) | Fixed rate limits throttling | +| 0.1.12 | 2021-08-09 | [5276](https://github.com/airbytehq/airbyte/pull/5276) | Add status property to product schema | +| 0.1.11 | 2021-07-23 | [4943](https://github.com/airbytehq/airbyte/pull/4943) | Fix products schema up to API 2021-07 | +| 0.1.10 | 2021-07-19 | [4830](https://github.com/airbytehq/airbyte/pull/4830) | Fix for streams json schemas, upgrade to API version 2021-07 | +| 0.1.9 | 
2021-07-04 | [4472](https://github.com/airbytehq/airbyte/pull/4472) | Incremental sync is now using updated\_at instead of since\_id by default | +| 0.1.8 | 2021-06-29 | [4121](https://github.com/airbytehq/airbyte/pull/4121) | Add draft orders stream | +| 0.1.7 | 2021-06-26 | [4290](https://github.com/airbytehq/airbyte/pull/4290) | Fixed the bug when limiting output records to 1 caused infinity loop | +| 0.1.6 | 2021-06-24 | [4009](https://github.com/airbytehq/airbyte/pull/4009) | Add pages, price rules and discount codes streams | +| 0.1.5 | 2021-06-10 | [3973](https://github.com/airbytehq/airbyte/pull/3973) | Add `AIRBYTE_ENTRYPOINT` for Kubernetes support | +| 0.1.4 | 2021-06-09 | [3926](https://github.com/airbytehq/airbyte/pull/3926) | New attributes to Orders schema | +| 0.1.3 | 2021-06-08 | [3787](https://github.com/airbytehq/airbyte/pull/3787) | Add Native Shopify Source Connector | + diff --git a/docs/integrations/sources/shortio.md b/docs/integrations/sources/shortio.md index fffe153eb1b..d77b6171ef3 100644 --- a/docs/integrations/sources/shortio.md +++ b/docs/integrations/sources/shortio.md @@ -1,4 +1,4 @@ -# Short.io +# Shortio ## Sync overview @@ -41,6 +41,7 @@ This Source is capable of syncing the following Streams: ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.0 | 2021-08-16 | [3787](https://github.com/airbytehq/airbyte/pull/5418) | Add Native Shortio Source Connector | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.0 | 2021-08-16 | [3787](https://github.com/airbytehq/airbyte/pull/5418) | Add Native Shortio Source Connector | + diff --git a/docs/integrations/sources/slack.md b/docs/integrations/sources/slack.md index 54b321e1037..57babfdf708 100644 --- a/docs/integrations/sources/slack.md +++ b/docs/integrations/sources/slack.md @@ -47,6 +47,7 @@ The Slack connector should not run into Slack API limitations under normal usage #### Slack connector can be connected using two types of authentication: OAuth2.0 or API Token #### Using OAuth2.0 authenticator + * Client ID - issued when you created your app * Client Secret - issued when you created your app * Refresh Token - a special kind of token used to obtain a renewed access token @@ -54,6 +55,7 @@ The Slack connector should not run into Slack API limitations under normal usage You can get more detailed information about this type of authentication by reading [Slack's documentation about OAuth2.0](https://api.slack.com/authentication/oauth-v2). 
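As a rough illustration of what the three OAuth fields are used for (assuming token rotation is enabled for your Slack app), an access token can be renewed by calling `oauth.v2.access` with `grant_type=refresh_token`; all values below are placeholders, and the connector performs this exchange for you:

```text
# Exchange the refresh token for a fresh access token (illustrative only)
curl -s -X POST https://slack.com/api/oauth.v2.access \
  -d "client_id=$CLIENT_ID" \
  -d "client_secret=$CLIENT_SECRET" \
  -d "grant_type=refresh_token" \
  -d "refresh_token=$REFRESH_TOKEN"
```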
#### Using API Token + * Slack API Token ### Setup guide @@ -107,11 +109,12 @@ We recommend creating a restricted, read-only key specifically for Airbyte acces ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.12 | 2021-10-07 | [6570](https://github.com/airbytehq/airbyte/pull/6570) | Implement OAuth support with OAuth authenticator | -| 0.1.11 | 2021-08-27 | [5830](https://github.com/airbytehq/airbyte/pull/5830) | Fixed sync operations hang forever issue | -| 0.1.10 | 2021-08-27 | [5697](https://github.com/airbytehq/airbyte/pull/5697) | Fixed max retries issue | -| 0.1.9 | 2021-07-20 | [4860](https://github.com/airbytehq/airbyte/pull/4860) | Fixed reading threads issue | -| 0.1.8 | 2021-07-14 | [4683](https://github.com/airbytehq/airbyte/pull/4683) | Add float_ts primary key | -| 0.1.7 | 2021-06-25 | [3978](https://github.com/airbytehq/airbyte/pull/3978) | Release Slack CDK Connector | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.12 | 2021-10-07 | [6570](https://github.com/airbytehq/airbyte/pull/6570) | Implement OAuth support with OAuth authenticator | +| 0.1.11 | 2021-08-27 | [5830](https://github.com/airbytehq/airbyte/pull/5830) | Fixed sync operations hang forever issue | +| 0.1.10 | 2021-08-27 | [5697](https://github.com/airbytehq/airbyte/pull/5697) | Fixed max retries issue | +| 0.1.9 | 2021-07-20 | [4860](https://github.com/airbytehq/airbyte/pull/4860) | Fixed reading threads issue | +| 0.1.8 | 2021-07-14 | [4683](https://github.com/airbytehq/airbyte/pull/4683) | Add float\_ts primary key | +| 0.1.7 | 2021-06-25 | [3978](https://github.com/airbytehq/airbyte/pull/3978) | Release Slack CDK Connector | + diff --git a/docs/integrations/sources/smartsheets.md b/docs/integrations/sources/smartsheets.md index de514c550be..0670eb8f154 100644 --- a/docs/integrations/sources/smartsheets.md +++ b/docs/integrations/sources/smartsheets.md @@ -25,24 +25,24 @@ The data type mapping adopted by this connector is based on the Smartsheet [docu **NOTE**: For any column datatypes interpreted by Smartsheets beside `DATE` and `DATETIME`, this connector's source schema generation assumes a `string` type, in which case the `format` field is not required by Airbyte. - | Integration Type | Airbyte Type | Airbyte Format | - | :--- | :--- | :--- | - | `TEXT_NUMBER` | `string` | | - | `DATE` | `string` | `format: date` | - | `DATETIME` | `string` | `format: date-time` | - | `anything else` | `string` | | +| Integration Type | Airbyte Type | Airbyte Format | +| :--- | :--- | :--- | +| `TEXT_NUMBER` | `string` | | +| `DATE` | `string` | `format: date` | +| `DATETIME` | `string` | `format: date-time` | +| `anything else` | `string` | | -The remaining column datatypes supported by Smartsheets are more complex types (e.g. Predecessor, Dropdown List) and are not supported by this connector beyond its `string` representation. +The remaining column datatypes supported by Smartsheets are more complex types \(e.g. Predecessor, Dropdown List\) and are not supported by this connector beyond its `string` representation. ### Features This source connector only supports Full Refresh Sync. Since Smartsheets only allows 5000 rows per sheet, it's likely that the Full Refresh Sync Mode will suit the majority of use-cases. - | Feature | Supported?| - | :--- | :--- | - | Full Refresh Sync |Yes | - | Incremental Sync |No | - | Namespaces |No | +| Feature | Supported? 
| +| :--- | :--- | +| Full Refresh Sync | Yes | +| Incremental Sync | No | +| Namespaces | No | ### Performance considerations @@ -63,7 +63,7 @@ To configure the Smartsheet Source for syncs, you'll need the following: You can generate an API key for your account from a session of your Smartsheet webapp by clicking: -* Account (top-right icon) +* Account \(top-right icon\) * Apps & Integrations * API Access * Generate new access token diff --git a/docs/integrations/sources/snapchat-marketing.md b/docs/integrations/sources/snapchat-marketing.md index 23acae47299..4fdbb93f588 100644 --- a/docs/integrations/sources/snapchat-marketing.md +++ b/docs/integrations/sources/snapchat-marketing.md @@ -1,28 +1,29 @@ -# Snapchat Marketing API +# Snapchat Marketing ## Overview The Snapchat Marketing source can sync data from the [Snapchat Marketing API](https://marketingapi.snapchat.com/docs/) Useful links: -- [Snapchat Ads Manager](https://ads.snapchat.com/) -- [Snapchat API Docs](https://marketingapi.snapchat.com/docs/) -- [Snapchat API FAQ](https://businesshelp.snapchat.com/s/article/api-faq?language=en_US) -- [Set up Snapchat Business account](https://businesshelp.snapchat.com/s/article/get-started?language=en_US) -- [Activate Access to the Snapchat Marketing API](https://businesshelp.snapchat.com/s/article/api-apply?language=en_US) + +* [Snapchat Ads Manager](https://ads.snapchat.com/) +* [Snapchat API Docs](https://marketingapi.snapchat.com/docs/) +* [Snapchat API FAQ](https://businesshelp.snapchat.com/s/article/api-faq?language=en_US) +* [Set up Snapchat Business account](https://businesshelp.snapchat.com/s/article/get-started?language=en_US) +* [Activate Access to the Snapchat Marketing API](https://businesshelp.snapchat.com/s/article/api-apply?language=en_US) #### Output schema This Source is capable of syncing the following Streams: -- [Organization](https://marketingapi.snapchat.com/docs/#organizations) -- [Ad Account](https://marketingapi.snapchat.com/docs/#get-all-ad-accounts) (Incremental) -- [Creative](https://marketingapi.snapchat.com/docs/#get-all-creatives) (Incremental) -- [Media](https://marketingapi.snapchat.com/docs/#get-all-media) (Incremental) -- [Campaign](https://marketingapi.snapchat.com/docs/#get-all-campaigns) (Incremental) -- [Ad](https://marketingapi.snapchat.com/docs/#get-all-ads-under-an-ad-account) (Incremental) -- [Ad Squad](https://marketingapi.snapchat.com/docs/#get-all-ad-squads-under-an-ad-account) (Incremental) -- [Segments](https://marketingapi.snapchat.com/docs/#get-all-audience-segments) (Incremental) +* [Organization](https://marketingapi.snapchat.com/docs/#organizations) +* [Ad Account](https://marketingapi.snapchat.com/docs/#get-all-ad-accounts) \(Incremental\) +* [Creative](https://marketingapi.snapchat.com/docs/#get-all-creatives) \(Incremental\) +* [Media](https://marketingapi.snapchat.com/docs/#get-all-media) \(Incremental\) +* [Campaign](https://marketingapi.snapchat.com/docs/#get-all-campaigns) \(Incremental\) +* [Ad](https://marketingapi.snapchat.com/docs/#get-all-ads-under-an-ad-account) \(Incremental\) +* [Ad Squad](https://marketingapi.snapchat.com/docs/#get-all-ad-squads-under-an-ad-account) \(Incremental\) +* [Segments](https://marketingapi.snapchat.com/docs/#get-all-audience-segments) \(Incremental\) #### Data type mapping @@ -45,30 +46,29 @@ This Source is capable of syncing the following Streams: ### Requirements -* client_id - Snapchat account client ID -* client_secret - Snapchat account client secret -* refresh_token - Snapchat account 
refresh token +* client\_id - Snapchat account client ID +* client\_secret - Snapchat account client secret +* refresh\_token - Snapchat account refresh token ### Setup guide -To get the required credentials you need to set up a snapchat business account. -Follow this guide to set up one: +To get the required credentials you need to set up a snapchat business account. Follow this guide to set up one: + * [Set up Snapchat Business account](https://businesshelp.snapchat.com/s/article/get-started?language=en_US) * After that - [Activate Access to the Snapchat Marketing API](https://businesshelp.snapchat.com/s/article/api-apply?language=en_US) * Adding the OAuth2 app requires the `redirect_url` parameter. If you have the API endpoint that will handle next OAuth process - write it to this parameter. -If not - just use some valid url. Here's the discussion about it: [Snapchat Redirect URL - Clarity in documentation please](https://github.com/Snap-Kit/bitmoji-sample/issues/3) + + If not - just use some valid url. Here's the discussion about it: [Snapchat Redirect URL - Clarity in documentation please](https://github.com/Snap-Kit/bitmoji-sample/issues/3) + * On this step you will retrieve **Client ID** and **Client Secret** carefully save **Client Secret** - you cannot view it in UI, only by regenerating -Snapchat uses OAuth2 authentication, so to get the refresh token the workflow in next: -1. Open the authorize link in a browser: - https://accounts.snapchat.com/login/oauth2/authorize?response_type=code&client_id={client_id}&redirect_uri={redirect_uri}&scope=snapchat-marketing-api&state=wmKkg0TWgppW8PTBZ20sldUmF7hwvU +Snapchat uses OAuth2 authentication, so to get the refresh token the workflow in next: 1. Open the authorize link in a browser: [https://accounts.snapchat.com/login/oauth2/authorize?response\_type=code&client\_id={client\_id}&redirect\_uri={redirect\_uri}&scope=snapchat-marketing-api&state=wmKkg0TWgppW8PTBZ20sldUmF7hwvU](https://accounts.snapchat.com/login/oauth2/authorize?response_type=code&client_id={client_id}&redirect_uri={redirect_uri}&scope=snapchat-marketing-api&state=wmKkg0TWgppW8PTBZ20sldUmF7hwvU) -2. Login & Authorize via UI +1. Login & Authorize via UI +2. Locate "code" query parameter in the redirect +3. Exchange code for access token + refresh token -3. Locate "code" query parameter in the redirect - -4. Exchange code for access token + refresh token - ``` + ```text curl -X POST \ -d "code={one_time_use_code}" \ -d "client_id={client_id}" \ @@ -78,8 +78,8 @@ Snapchat uses OAuth2 authentication, so to get the refresh token the workflow in https://accounts.snapchat.com/login/oauth2/access_token ``` -You will receive the API key and refresh token in response. Use this refresh token in the connector specifications. -The useful link to Authentication process is [here](https://marketingapi.snapchat.com/docs/#authentication) +You will receive the API key and refresh token in response. Use this refresh token in the connector specifications. 
+The useful link to Authentication process is [here](https://marketingapi.snapchat.com/docs/#authentication) ## Performance considerations @@ -87,7 +87,8 @@ Snapchat Marketing API has limitations to 1000 items per page ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.0 | 2021-07-26 | [4843](https://github.com/airbytehq/airbyte/pull/4843) | Initial release supporting the Snapchat Marketing API | -| 0.1.1 | 2021-07-29 | [5072](https://github.com/airbytehq/airbyte/pull/5072) | Fix bug with incorrect stream_state value | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.0 | 2021-07-26 | [4843](https://github.com/airbytehq/airbyte/pull/4843) | Initial release supporting the Snapchat Marketing API | +| 0.1.1 | 2021-07-29 | [5072](https://github.com/airbytehq/airbyte/pull/5072) | Fix bug with incorrect stream\_state value | + diff --git a/docs/integrations/sources/snowflake.md b/docs/integrations/sources/snowflake.md index b04ea2135a0..7fd16511b83 100644 --- a/docs/integrations/sources/snowflake.md +++ b/docs/integrations/sources/snowflake.md @@ -2,8 +2,7 @@ ## Overview -The Snowflake source allows you to sync data from Snowflake. -It supports both Full Refresh and Incremental syncs. You can choose if this connector will copy only the new or updated data, or all rows in the tables and columns you set up for replication, every time a sync is run. +The Snowflake source allows you to sync data from Snowflake. It supports both Full Refresh and Incremental syncs. You can choose if this connector will copy only the new or updated data, or all rows in the tables and columns you set up for replication, every time a sync is run. This Snowflake source connector is built on top of the source-jdbc code base and is configured to rely on JDBC 3.12.14 [Snowflake driver](https://github.com/snowflakedb/snowflake-jdbc) as described in Snowflake [documentation](https://docs.snowflake.com/en/user-guide/jdbc.html). @@ -24,18 +23,17 @@ The Snowflake source does not alter the schema present in your warehouse. Depend ### Requirements 1. You'll need the following information to configure the Snowflake source: - -* **Host** -* **Role** -* **Warehouse** -* **Database** -* **Schema** -* **Username** -* **Password** - -2. Create a dedicated read-only Airbyte user and role with access to all schemas needed for replication. +2. **Host** +3. **Role** +4. **Warehouse** +5. **Database** +6. **Schema** +7. **Username** +8. **Password** +9. Create a dedicated read-only Airbyte user and role with access to all schemas needed for replication. ### Setup guide + #### 1. Additional information about Snowflake connection parameters could be found [here](https://docs.snowflake.com/en/user-guide/jdbc-configure.html#connection-parameters). #### 2. Create a dedicated read-only user with access to the relevant schemas \(Recommended but optional\) @@ -73,9 +71,9 @@ You can limit this grant down to specific schemas instead of the whole database. Your database user should now be ready for use with Airbyte. 
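For reference, the kind of read-only role and grants described above can be sketched as follows. This is an assumption-heavy example rather than the exact statements from the guide: object names in angle brackets are placeholders, it assumes the SnowSQL CLI is available, and the grants should be adapted to the schemas you actually replicate:

```text
# Placeholders in angle brackets must be replaced before running
snowsql -a <account> -u <admin_user> -q "
  CREATE ROLE IF NOT EXISTS AIRBYTE_ROLE;
  CREATE USER IF NOT EXISTS AIRBYTE_USER PASSWORD='<strong_password>' DEFAULT_ROLE=AIRBYTE_ROLE;
  GRANT ROLE AIRBYTE_ROLE TO USER AIRBYTE_USER;
  GRANT USAGE ON WAREHOUSE <warehouse> TO ROLE AIRBYTE_ROLE;
  GRANT USAGE ON DATABASE <database> TO ROLE AIRBYTE_ROLE;
  GRANT USAGE ON SCHEMA <database>.<schema> TO ROLE AIRBYTE_ROLE;
  GRANT SELECT ON ALL TABLES IN SCHEMA <database>.<schema> TO ROLE AIRBYTE_ROLE;
"
```

Limiting the `USAGE` and `SELECT` grants to specific schemas, as mentioned above, keeps the Airbyte user strictly read-only on just the data you plan to sync.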
- ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.1 | 2021-08-13 | [4699](https://github.com/airbytehq/airbyte/pull/4699) | Added json config validator | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.1 | 2021-08-13 | [4699](https://github.com/airbytehq/airbyte/pull/4699) | Added json config validator | + diff --git a/docs/integrations/sources/spree-commerce.md b/docs/integrations/sources/spree-commerce.md index 76359780f72..7e582a78565 100644 --- a/docs/integrations/sources/spree-commerce.md +++ b/docs/integrations/sources/spree-commerce.md @@ -4,15 +4,16 @@ ## Sync overview -Spree Commerce can run on the MySQL or Postgres databases. You can use Airbyte to sync your Spree Commerce instance by connecting to the underlying database using the appropriate Airbyte connector: +Spree Commerce can run on the MySQL or Postgres databases. You can use Airbyte to sync your Spree Commerce instance by connecting to the underlying database using the appropriate Airbyte connector: * [MySQL](mysql.md) * [Postgres](postgres.md) {% hint style="info" %} -Reach out to your service representative or system admin to find the parameters required to connect to the underlying database +Reach out to your service representative or system admin to find the parameters required to connect to the underlying database {% endhint %} - ### Output schema -The Spree Commerce schema is described in the [Spree Internals](https://dev-docs.spreecommerce.org/internals/) section of the Spree docs. Otherwise, the schema will follow the rules of the MySQL or Postgres connectors. + +The Spree Commerce schema is described in the [Spree Internals](https://dev-docs.spreecommerce.org/internals/) section of the Spree docs. Otherwise, the schema will follow the rules of the MySQL or Postgres connectors. 
+ diff --git a/docs/integrations/sources/square.md b/docs/integrations/sources/square.md index 6bef3160a5f..4ef083eeb23 100644 --- a/docs/integrations/sources/square.md +++ b/docs/integrations/sources/square.md @@ -1,31 +1,32 @@ -# Square API +# Square ## Overview The Square Source can sync data from the [Square API](https://developer.squareup.com/reference/square) Useful links: -- [Square API Explorer](https://developer.squareup.com/explorer/square) -- [Square API Docs](https://developer.squareup.com/reference/square) -- [Square Developer Dashboard](https://developer.squareup.com/apps) + +* [Square API Explorer](https://developer.squareup.com/explorer/square) +* [Square API Docs](https://developer.squareup.com/reference/square) +* [Square Developer Dashboard](https://developer.squareup.com/apps) #### Output schema This Source is capable of syncing the following Streams: -- [Items](https://developer.squareup.com/explorer/square/catalog-api/search-catalog-objects) (Incremental) -- [Categories](https://developer.squareup.com/explorer/square/catalog-api/search-catalog-objects) (Incremental) -- [Discounts](https://developer.squareup.com/explorer/square/catalog-api/search-catalog-objects) (Incremental) -- [Taxes](https://developer.squareup.com/explorer/square/catalog-api/search-catalog-objects) (Incremental) -- [ModifierLists](https://developer.squareup.com/explorer/square/catalog-api/search-catalog-objects) (Incremental) -- [Payments](https://developer.squareup.com/reference/square_2021-06-16/payments-api/list-payments) (Incremental) -- [Refunds](https://developer.squareup.com/reference/square_2021-06-16/refunds-api/list-payment-refunds) (Incremental) -- [Locations](https://developer.squareup.com/explorer/square/locations-api/list-locations) -- [Team Members](https://developer.squareup.com/reference/square_2021-06-16/team-api/search-team-members) (old V1 Employees API) -- [List Team Member Wages](https://developer.squareup.com/explorer/square/labor-api/list-team-member-wages) (old V1 Roles API) -- [Customers](https://developer.squareup.com/explorer/square/customers-api/list-customers) -- [Shifts](https://developer.squareup.com/reference/square/labor-api/search-shifts) -- [Orders](https://developer.squareup.com/reference/square/orders-api/search-orders) +* [Items](https://developer.squareup.com/explorer/square/catalog-api/search-catalog-objects) \(Incremental\) +* [Categories](https://developer.squareup.com/explorer/square/catalog-api/search-catalog-objects) \(Incremental\) +* [Discounts](https://developer.squareup.com/explorer/square/catalog-api/search-catalog-objects) \(Incremental\) +* [Taxes](https://developer.squareup.com/explorer/square/catalog-api/search-catalog-objects) \(Incremental\) +* [ModifierLists](https://developer.squareup.com/explorer/square/catalog-api/search-catalog-objects) \(Incremental\) +* [Payments](https://developer.squareup.com/reference/square_2021-06-16/payments-api/list-payments) \(Incremental\) +* [Refunds](https://developer.squareup.com/reference/square_2021-06-16/refunds-api/list-payment-refunds) \(Incremental\) +* [Locations](https://developer.squareup.com/explorer/square/locations-api/list-locations) +* [Team Members](https://developer.squareup.com/reference/square_2021-06-16/team-api/search-team-members) \(old V1 Employees API\) +* [List Team Member Wages](https://developer.squareup.com/explorer/square/labor-api/list-team-member-wages) \(old V1 Roles API\) +* [Customers](https://developer.squareup.com/explorer/square/customers-api/list-customers) +* 
[Shifts](https://developer.squareup.com/reference/square/labor-api/search-shifts) +* [Orders](https://developer.squareup.com/reference/square/orders-api/search-orders) #### Data type mapping @@ -47,37 +48,35 @@ This Source is capable of syncing the following Streams: ### Requirements -* api_key - The Square API key token -* is_sandbox - the switch between sandbox (true) and production (false) environments +* api\_key - The Square API key token +* is\_sandbox - the switch between sandbox \(true\) and production \(false\) environments ### Setup guide -To get the API key for your square application follow [Geting started](https://developer.squareup.com/docs/get-started) -and [Access token](https://developer.squareup.com/docs/build-basics/access-tokens) guides +To get the API key for your Square application, follow the [Getting started](https://developer.squareup.com/docs/get-started) and [Access token](https://developer.squareup.com/docs/build-basics/access-tokens) guides. ## Performance considerations -No defined API rate limits were found in Square documentation however considering -[this information](https://stackoverflow.com/questions/28033966/whats-the-rate-limit-on-the-square-connect-api/28053836#28053836) -it has 10 QPS limits. The connector doesn't handle rate limits exceptions, but no errors were raised during testing. +No API rate limits are defined in the Square documentation; however, based on [this information](https://stackoverflow.com/questions/28033966/whats-the-rate-limit-on-the-square-connect-api/28053836#28053836), it appears to be limited to roughly 10 QPS. The connector doesn't handle rate limit exceptions, but no errors were raised during testing. -Some Square API endpoints has different page size limitation +Some Square API endpoints have different page size limitations: -- Items - 1000 -- Categories - 1000 -- Discounts - 1000 -- Taxes - 1000 -- ModifierLists - 1000 -- Payments - 100 -- Refunds - 100 -- TeamMembers - 100 -- ListTeamMemberWages - 200 -- Shifts - 200 -- Orders - 500 +* Items - 1000 +* Categories - 1000 +* Discounts - 1000 +* Taxes - 1000 +* ModifierLists - 1000 +* Payments - 100 +* Refunds - 100 +* TeamMembers - 100 +* ListTeamMemberWages - 200 +* Shifts - 200 +* Orders - 500 ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.1 | 2021-07-09 | [4645](https://github.com/airbytehq/airbyte/pull/4645) | Update _send_request method due to Airbyte CDK changes | -| 0.1.0 | 2021-06-30 | [4439](https://github.com/airbytehq/airbyte/pull/4439) | Initial release supporting the Square API | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.1 | 2021-07-09 | [4645](https://github.com/airbytehq/airbyte/pull/4645) | Update \_send\_request method due to Airbyte CDK changes | +| 0.1.0 | 2021-06-30 | [4439](https://github.com/airbytehq/airbyte/pull/4439) | Initial release supporting the Square API | + diff --git a/docs/integrations/sources/stripe.md b/docs/integrations/sources/stripe.md index 3e498ee2763..c5411102e1c 100644 --- a/docs/integrations/sources/stripe.md +++ b/docs/integrations/sources/stripe.md @@ -32,7 +32,7 @@ This Source is capable of syncing the following core Streams: The Stripe API does not allow querying objects which were updated since the last sync. Therefore, this connector uses the `created` field to query for new data in your Stripe account.
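To make the `created`-based querying concrete, here is a hedged sketch of the kind of request this implies, using a `created[gte]` filter on one of Stripe's list endpoints (the charges endpoint, the epoch timestamp and the key variable are placeholders for illustration):

```text
# List charges created on or after a given Unix timestamp (illustrative only)
curl -s -G https://api.stripe.com/v1/charges \
  -u "$STRIPE_SECRET_KEY:" \
  -d "created[gte]=1633046400" \
  -d "limit=100"
```

Because only `created` can be filtered this way, records that are updated after creation are not picked up by such a query on their own, which is why the lookback window option described below exists.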
-If your data is updated after creation, you can use the Loockback Window option when configuring the connector to always reload data from the past N days. This will allow you to pick up updates to the data. +If your data is updated after creation, you can use the Loockback Window option when configuring the connector to always reload data from the past N days. This will allow you to pick up updates to the data. ### Data type mapping @@ -69,19 +69,20 @@ If you would like to test Airbyte using test data on Stripe, `sk_test_` and `rk_ ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.21 | 2021-10-07 | [6841](https://github.com/airbytehq/airbyte/pull/6841) | Fix missing `start_date` argument + update json files for SAT | -| 0.1.20 | 2021-09-30 | [6017](https://github.com/airbytehq/airbyte/pull/6017) | Add lookback_window_days parameter | -| 0.1.19 | 2021-09-27 | [6466](https://github.com/airbytehq/airbyte/pull/6466) | Use `start_date` parameter in incremental streams | -| 0.1.18 | 2021-09-14 | [6004](https://github.com/airbytehq/airbyte/pull/6004) | Fix coupons and subscriptions stream schemas by removing incorrect timestamp formatting | -| 0.1.17 | 2021-09-14 | [6004](https://github.com/airbytehq/airbyte/pull/6004) | Add `PaymentIntents` stream | -| 0.1.16 | 2021-07-28 | [4980](https://github.com/airbytehq/airbyte/pull/4980) | Remove Updated field from schemas | -| 0.1.15 | 2021-07-21 | [4878](https://github.com/airbytehq/airbyte/pull/4878) | Fix incorrect percent_off and discounts data filed types | -| 0.1.14 | 2021-07-09 | [4669](https://github.com/airbytehq/airbyte/pull/4669) | Subscriptions Stream now returns all kinds of subscriptions (including expired and canceled) | -| 0.1.13 | 2021-07-03 | [4528](https://github.com/airbytehq/airbyte/pull/4528) | Remove regex for acc validation | -| 0.1.12 | 2021-06-08 | [3973](https://github.com/airbytehq/airbyte/pull/3973) | Add `AIRBYTE_ENTRYPOINT` for Kubernetes support | -| 0.1.11 | 2021-05-30 | [3744](https://github.com/airbytehq/airbyte/pull/3744) | Fix types in schema | -| 0.1.10 | 2021-05-28 | [3728](https://github.com/airbytehq/airbyte/pull/3728) | Update data types to be number instead of int | -| 0.1.9 | 2021-05-13 | [3367](https://github.com/airbytehq/airbyte/pull/3367) | Add acceptance tests for connected accounts | -| 0.1.8 | 2021-05-11 | [3566](https://github.com/airbytehq/airbyte/pull/3368) | Bump CDK connectors | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.21 | 2021-10-07 | [6841](https://github.com/airbytehq/airbyte/pull/6841) | Fix missing `start_date` argument + update json files for SAT | +| 0.1.20 | 2021-09-30 | [6017](https://github.com/airbytehq/airbyte/pull/6017) | Add lookback\_window\_days parameter | +| 0.1.19 | 2021-09-27 | [6466](https://github.com/airbytehq/airbyte/pull/6466) | Use `start_date` parameter in incremental streams | +| 0.1.18 | 2021-09-14 | [6004](https://github.com/airbytehq/airbyte/pull/6004) | Fix coupons and subscriptions stream schemas by removing incorrect timestamp formatting | +| 0.1.17 | 2021-09-14 | [6004](https://github.com/airbytehq/airbyte/pull/6004) | Add `PaymentIntents` stream | +| 0.1.16 | 2021-07-28 | [4980](https://github.com/airbytehq/airbyte/pull/4980) | Remove Updated field from schemas | +| 0.1.15 | 2021-07-21 | [4878](https://github.com/airbytehq/airbyte/pull/4878) | Fix incorrect percent\_off and discounts data filed types | +| 0.1.14 | 2021-07-09 | 
[4669](https://github.com/airbytehq/airbyte/pull/4669) | Subscriptions Stream now returns all kinds of subscriptions \(including expired and canceled\) | +| 0.1.13 | 2021-07-03 | [4528](https://github.com/airbytehq/airbyte/pull/4528) | Remove regex for acc validation | +| 0.1.12 | 2021-06-08 | [3973](https://github.com/airbytehq/airbyte/pull/3973) | Add `AIRBYTE_ENTRYPOINT` for Kubernetes support | +| 0.1.11 | 2021-05-30 | [3744](https://github.com/airbytehq/airbyte/pull/3744) | Fix types in schema | +| 0.1.10 | 2021-05-28 | [3728](https://github.com/airbytehq/airbyte/pull/3728) | Update data types to be number instead of int | +| 0.1.9 | 2021-05-13 | [3367](https://github.com/airbytehq/airbyte/pull/3367) | Add acceptance tests for connected accounts | +| 0.1.8 | 2021-05-11 | [3566](https://github.com/airbytehq/airbyte/pull/3368) | Bump CDK connectors | + diff --git a/docs/integrations/sources/sugar-crm.md b/docs/integrations/sources/sugar-crm.md index 5265d7b3281..9a6f5ce9f05 100644 --- a/docs/integrations/sources/sugar-crm.md +++ b/docs/integrations/sources/sugar-crm.md @@ -4,28 +4,26 @@ ## Sync overview - {% hint style="warning" %} You will only be able to connect to a self-hosted instance of Sugar CRM using these instructions. {% endhint %} -Sugar CRM can run on the MySQL, MSSQL, Oracle, or Db2 databases. You can use Airbyte to sync your Sugar CRM instance by connecting to the underlying database using the appropriate Airbyte connector: +Sugar CRM can run on the MySQL, MSSQL, Oracle, or Db2 databases. You can use Airbyte to sync your Sugar CRM instance by connecting to the underlying database using the appropriate Airbyte connector: * [DB2](db2.md) -* [MySQL](./mysql.md) -* [MSSQL](./mssql.md) +* [MySQL](mysql.md) +* [MSSQL](mssql.md) * [Oracle](oracle.md) - {% hint style="info" %} -To use Oracle or DB2, you'll require an Enterprise or Ultimate Sugar subscription. +To use Oracle or DB2, you'll require an Enterprise or Ultimate Sugar subscription. {% endhint %} {% hint style="info" %} -Reach out to your service representative or system admin to find the parameters required to connect to the underlying database +Reach out to your service representative or system admin to find the parameters required to connect to the underlying database {% endhint %} - - ### Output schema -To understand your Sugar CRM database schema, see the [VarDefs](https://support.sugarcrm.com/Documentation/Sugar_Developer/Sugar_Developer_Guide_11.0/Data_Framework/Vardefs/) documentation. Otherwise, the schema will be loaded according to the rules of the underlying database's connector. + +To understand your Sugar CRM database schema, see the [VarDefs](https://support.sugarcrm.com/Documentation/Sugar_Developer/Sugar_Developer_Guide_11.0/Data_Framework/Vardefs/) documentation. Otherwise, the schema will be loaded according to the rules of the underlying database's connector. + diff --git a/docs/integrations/sources/surveymonkey.md b/docs/integrations/sources/surveymonkey.md index fdacfe9c57a..74348b5ef07 100644 --- a/docs/integrations/sources/surveymonkey.md +++ b/docs/integrations/sources/surveymonkey.md @@ -2,14 +2,13 @@ ## Sync overview -This source can sync data for the [SurveyMonkey API](https://developer.surveymonkey.com/api/v3/). It supports both Full Refresh and Incremental syncs. -You can choose if this connector will copy only the new or updated data, or all rows in the tables and columns you set up for replication, every time a sync is run. 
+This source can sync data for the [SurveyMonkey API](https://developer.surveymonkey.com/api/v3/). It supports both Full Refresh and Incremental syncs. You can choose if this connector will copy only the new or updated data, or all rows in the tables and columns you set up for replication, every time a sync is run. ### Output schema This Source is capable of syncing the following core Streams: -* [Surveys](https://developer.surveymonkey.com/api/v3/#surveys) (Incremental) +* [Surveys](https://developer.surveymonkey.com/api/v3/#surveys) \(Incremental\) * [SurveyPages](https://developer.surveymonkey.com/api/v3/#surveys-id-pages) * [SurveyQuestions](https://developer.surveymonkey.com/api/v3/#surveys-id-pages-id-questions) * [SurveyResponses](https://developer.surveymonkey.com/api/v3/#survey-responses) @@ -34,6 +33,7 @@ This Source is capable of syncing the following core Streams: ### Performance considerations The SurveyMonkey API applies heavy API quotas for default private apps, which have the following limits: + * 125 requests per minute * 500 requests per day @@ -49,13 +49,12 @@ Please [create an issue](https://github.com/airbytehq/airbyte/issues) if you see ### Setup guide -Please read this [docs](https://developer.surveymonkey.com/api/v3/#getting-started). -Register your application [here](https://developer.surveymonkey.com/apps/) -Then go to Settings and copy your access token +Please read the [docs](https://developer.surveymonkey.com/api/v3/#getting-started), register your application [here](https://developer.surveymonkey.com/apps/), then go to Settings and copy your access token. ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.1 | 2021-09-10 | [5983](https://github.com/airbytehq/airbyte/pull/5983) | Fix caching for gzip compressed http response | -| 0.1.0 | 2021-07-06 | [4097](https://github.com/airbytehq/airbyte/pull/4097) | Initial Release | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.1 | 2021-09-10 | [5983](https://github.com/airbytehq/airbyte/pull/5983) | Fix caching for gzip compressed http response | +| 0.1.0 | 2021-07-06 | [4097](https://github.com/airbytehq/airbyte/pull/4097) | Initial Release | + diff --git a/docs/integrations/sources/tempo.md b/docs/integrations/sources/tempo.md index 3e509eef0a4..eb61976e35a 100644 --- a/docs/integrations/sources/tempo.md +++ b/docs/integrations/sources/tempo.md @@ -13,7 +13,6 @@ This connector outputs the following streams: * Worklogs * Workload Schemes - ### Features | Feature | Supported? | @@ -34,9 +33,7 @@ If there are more endpoints you'd like Airbyte to support, please [create an iss ### Setup guide -Source Tempo is designed to interact with the data your permissions give you access to. To do so, you will need to generate a Tempo OAuth 2.0 token for an individual user. - -Go to **Tempo > Settings**, scroll down to **Data Access** and select **API integration**. - +Source Tempo is designed to interact with the data your permissions give you access to. To do so, you will need to generate a Tempo OAuth 2.0 token for an individual user. +Go to **Tempo > Settings**, scroll down to **Data Access** and select **API integration**. 
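Before wiring the Tempo token into Airbyte, you may want to confirm it works with a direct API call. The snippet below is only a sketch: the base URL `https://api.tempo.io/core/3` and the `worklogs` endpoint are assumptions not spelled out in this doc, so adjust them to whatever your Tempo documentation specifies.

```bash
# Hypothetical smoke test for a freshly generated Tempo token.
# Base URL and endpoint are assumptions, not taken from this doc.
export TEMPO_TOKEN="<your-tempo-oauth-token>"
curl -s -H "Authorization: Bearer $TEMPO_TOKEN" \
  "https://api.tempo.io/core/3/worklogs?limit=1"
```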
diff --git a/docs/integrations/sources/trello.md b/docs/integrations/sources/trello.md index 95ff1ee8523..8febd126cff 100644 --- a/docs/integrations/sources/trello.md +++ b/docs/integrations/sources/trello.md @@ -11,11 +11,11 @@ This Source Connector is based on a [Airbyte CDK](https://docs.airbyte.io/connec Several output streams are available from this source: * [Boards](https://developers.intercom.com/intercom-api-reference/reference#list-attached-segments-1) \(Full table\) - * [Actions](https://developer.atlassian.com/cloud/trello/rest/api-group-boards/#api-boards-boardid-actions-get) \(Incremental\) - * [Cards](https://developer.atlassian.com/cloud/trello/rest/api-group-boards/#api-boards-id-cards-get) \(Full table\) - * [Checklists](https://developer.atlassian.com/cloud/trello/rest/api-group-boards/#api-boards-id-checklists-get) \(Full table\) - * [Lists](https://developer.atlassian.com/cloud/trello/rest/api-group-boards/#api-boards-id-lists-get) \(Full table\) - * [Users](https://developer.atlassian.com/cloud/trello/rest/api-group-boards/#api-boards-id-members-get) \(Full table\) + * [Actions](https://developer.atlassian.com/cloud/trello/rest/api-group-boards/#api-boards-boardid-actions-get) \(Incremental\) + * [Cards](https://developer.atlassian.com/cloud/trello/rest/api-group-boards/#api-boards-id-cards-get) \(Full table\) + * [Checklists](https://developer.atlassian.com/cloud/trello/rest/api-group-boards/#api-boards-id-checklists-get) \(Full table\) + * [Lists](https://developer.atlassian.com/cloud/trello/rest/api-group-boards/#api-boards-id-lists-get) \(Full table\) + * [Users](https://developer.atlassian.com/cloud/trello/rest/api-group-boards/#api-boards-id-members-get) \(Full table\) If there are more endpoints you'd like Airbyte to support, please [create an issue.](https://github.com/airbytehq/airbyte/issues/new/choose) @@ -47,6 +47,7 @@ Please read [How to get your APIs Token and Key](https://developer.atlassian.com ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.0 | 2021-08-18 | [5501](https://github.com/airbytehq/airbyte/pull/5501) | Release Trello CDK Connector | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.0 | 2021-08-18 | [5501](https://github.com/airbytehq/airbyte/pull/5501) | Release Trello CDK Connector | + diff --git a/docs/integrations/sources/twilio.md b/docs/integrations/sources/twilio.md index 09603fdcec6..6c98ba0b914 100644 --- a/docs/integrations/sources/twilio.md +++ b/docs/integrations/sources/twilio.md @@ -2,8 +2,7 @@ ## Overview -The Twilio connector can be used to sync your Twilio data. -It supports full refresh sync for all streams and incremental sync for the Alerts, Calls, Conferences, Message Media, Messages, Recordings and Usage Records streams. +The Twilio connector can be used to sync your Twilio data. It supports full refresh sync for all streams and incremental sync for the Alerts, Calls, Conferences, Message Media, Messages, Recordings and Usage Records streams. ### Output schema @@ -65,6 +64,7 @@ See [docs](https://www.twilio.com/docs/iam/api) for more details. 
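Once the token from the steps above has been generated, a quick way to confirm it is valid is to list your forms directly against the Typeform API. This is an illustrative check only, assuming the standard `https://api.typeform.com` base URL and bearer-token authentication; it is not part of the connector setup itself.

```bash
# Sanity-check the personal access token created in the steps above.
export TYPEFORM_TOKEN="<your-personal-access-token>"
curl -s -H "Authorization: Bearer $TYPEFORM_TOKEN" "https://api.typeform.com/forms"
```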
## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.0 | 2021-07-02 | [4070](https://github.com/airbytehq/airbyte/pull/4070) | Native Twilio connector implemented | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.0 | 2021-07-02 | [4070](https://github.com/airbytehq/airbyte/pull/4070) | Native Twilio connector implemented | + diff --git a/docs/integrations/sources/typeform.md b/docs/integrations/sources/typeform.md index 4530c3392e7..aef1ead8e8c 100644 --- a/docs/integrations/sources/typeform.md +++ b/docs/integrations/sources/typeform.md @@ -1,19 +1,19 @@ -# Typeform API +# Typeform ## Overview The Typeform Connector can be used to sync your [Typeform](https://developer.typeform.com/get-started/) data Useful links: -- [Token generation](https://developer.typeform.com/get-started/personal-access-token/) + +* [Token generation](https://developer.typeform.com/get-started/personal-access-token/) #### Output schema This Source is capable of syncing the following Streams: -- [Forms](https://developer.typeform.com/create/reference/retrieve-form/) (Full Refresh) -- [Responses](https://developer.typeform.com/responses/reference/retrieve-responses/) (Incremental) - +* [Forms](https://developer.typeform.com/create/reference/retrieve-form/) \(Full Refresh\) +* [Responses](https://developer.typeform.com/responses/reference/retrieve-responses/) \(Incremental\) #### Data type mapping @@ -36,34 +36,35 @@ This Source is capable of syncing the following Streams: ### Requirements * token - The Typeform API key token -* start_date - Date to start fetching Responses stream data from. +* start\_date - Date to start fetching Responses stream data from. ### Setup guide To get the API token for your application follow this [steps](https://developer.typeform.com/get-started/personal-access-token/) -- Log in to your account at Typeform. -- In the upper-right corner, in the drop-down menu next to your profile photo, click My Account. -- In the left menu, click Personal tokens. -- Click Generate a new token. -- In the Token name field, type a name for the token to help you identify it. -- Choose needed scopes (API actions this token can perform - or permissions it has). See here for more details on scopes. -- Click Generate token. +* Log in to your account at Typeform. +* In the upper-right corner, in the drop-down menu next to your profile photo, click My Account. +* In the left menu, click Personal tokens. +* Click Generate a new token. +* In the Token name field, type a name for the token to help you identify it. +* Choose needed scopes \(API actions this token can perform - or permissions it has\). See here for more details on scopes. +* Click Generate token. 
## Performance considerations Typeform API page size limit per source: -- Forms - 200 -- Responses - 1000 +* Forms - 200 +* Responses - 1000 Connector performs additional API call to fetch all possible `form ids` on an account using [retrieve forms endpoint](https://developer.typeform.com/create/reference/retrieve-forms/) -API rate limits (2 requests per second): https://developer.typeform.com/get-started/#rate-limits +API rate limits \(2 requests per second\): [https://developer.typeform.com/get-started/\#rate-limits](https://developer.typeform.com/get-started/#rate-limits) ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.1 | 2021-09-06 | [5799](https://github.com/airbytehq/airbyte/pull/5799) | Add missed choices field to responses schema | -| 0.1.0 | 2021-07-10 | [4541](https://github.com/airbytehq/airbyte/pull/4541) | Initial release for Typeform API supporting Forms and Responses streams | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.1 | 2021-09-06 | [5799](https://github.com/airbytehq/airbyte/pull/5799) | Add missed choices field to responses schema | +| 0.1.0 | 2021-07-10 | [4541](https://github.com/airbytehq/airbyte/pull/4541) | Initial release for Typeform API supporting Forms and Responses streams | + diff --git a/docs/integrations/sources/us-census.md b/docs/integrations/sources/us-census.md index 4ac80fe9e51..586bd400b78 100644 --- a/docs/integrations/sources/us-census.md +++ b/docs/integrations/sources/us-census.md @@ -6,7 +6,7 @@ This connector syncs data from the [US Census API](https://www.census.gov/data/d ### Output schema -This source always outputs a single stream, `us_census_stream`. The output of the stream depends on the configuration of the connector. +This source always outputs a single stream, `us_census_stream`. The output of the stream depends on the configuration of the connector. ### Features @@ -17,10 +17,10 @@ This source always outputs a single stream, `us_census_stream`. The output of th | SSL connection | Yes | | Namespaces | No | - ## Getting started ### Requirements + * US Census API key * US Census dataset path & query parameters @@ -28,7 +28,7 @@ This source always outputs a single stream, `us_census_stream`. The output of th Visit the [US Census API page](https://api.census.gov/data/key_signup.html) to obtain an API key. -In addition, to understand how to configure the dataset path and query parameters, follow the guide and examples in the [API documentation](https://www.census.gov/data/developers/data-sets.html). Some particularly helpful pages: +In addition, to understand how to configure the dataset path and query parameters, follow the guide and examples in the [API documentation](https://www.census.gov/data/developers/data-sets.html). 
Some particularly helpful pages: * [Available Datasets](https://www.census.gov/data/developers/guidance/api-user-guide.Available_Data.html) * [Core Concepts](https://www.census.gov/data/developers/guidance/api-user-guide.Core_Concepts.html) @@ -36,6 +36,7 @@ In addition, to understand how to configure the dataset path and query parameter ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.0 | 2021-07-20 | [4228](https://github.com/airbytehq/airbyte/pull/4228) | Initial release | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.0 | 2021-07-20 | [4228](https://github.com/airbytehq/airbyte/pull/4228) | Initial release | + diff --git a/docs/integrations/sources/woo-commerce.md b/docs/integrations/sources/woo-commerce.md new file mode 100644 index 00000000000..27fb96eb99d --- /dev/null +++ b/docs/integrations/sources/woo-commerce.md @@ -0,0 +1,2 @@ +# Woo Commerce + diff --git a/docs/integrations/sources/wordpress.md b/docs/integrations/sources/wordpress.md index ff2d4a15f59..4a3a3c27cef 100644 --- a/docs/integrations/sources/wordpress.md +++ b/docs/integrations/sources/wordpress.md @@ -4,12 +4,13 @@ ## Sync overview -Wordpress runs on a MySQL database. You can use Airbyte to sync your Wordpress instance by connecting to the underlying MySQL database and leveraging the [MySQL](./mysql.md) connector. +Wordpress runs on a MySQL database. You can use Airbyte to sync your Wordpress instance by connecting to the underlying MySQL database and leveraging the [MySQL](mysql.md) connector. {% hint style="info" %} -Reach out to your service representative or system admin to find the parameters required to connect to the underlying database +Reach out to your service representative or system admin to find the parameters required to connect to the underlying database {% endhint %} - ### Output schema -The output schema is the same as that of the [Wordpress Database](https://codex.wordpress.org/Database_Description) described here. + +The output schema is the same as that of the [Wordpress Database](https://codex.wordpress.org/Database_Description) described here. + diff --git a/docs/integrations/sources/zencart.md b/docs/integrations/sources/zencart.md index 28aa116de59..81b731d4d82 100644 --- a/docs/integrations/sources/zencart.md +++ b/docs/integrations/sources/zencart.md @@ -1,14 +1,16 @@ # Zencart -[Zencart](https://zen-cart.com) is an open source online store management system built on PHP, MySQL, and HTML. +[Zencart](https://zen-cart.com) is an open source online store management system built on PHP, MySQL, and HTML. ## Sync overview -Zencart runs on a MySQL database. You can use Airbyte to sync your Zencart instance by connecting to the underlying MySQL database and leveraging the [MySQL](./mysql.md) connector. +Zencart runs on a MySQL database. You can use Airbyte to sync your Zencart instance by connecting to the underlying MySQL database and leveraging the [MySQL](mysql.md) connector. {% hint style="info" %} -Reach out to your service representative or system admin to find the parameters required to connect to the underlying database +Reach out to your service representative or system admin to find the parameters required to connect to the underlying database {% endhint %} ### Output schema -The output schema is the same as that of the [Zencart Database](https://docs.zen-cart.com/dev/schema/) described here. 
+ +The output schema is the same as that of the [Zencart Database](https://docs.zen-cart.com/dev/schema/) described here. + diff --git a/docs/integrations/sources/zendesk-chat.md b/docs/integrations/sources/zendesk-chat.md index de636cbbdb8..91e56231b6b 100644 --- a/docs/integrations/sources/zendesk-chat.md +++ b/docs/integrations/sources/zendesk-chat.md @@ -60,8 +60,9 @@ We recommend creating a restricted, read-only key specifically for Airbyte acces ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.2 | 2021-08-17 | [5476](https://github.com/airbytehq/airbyte/pull/5476) | Correct field unread to boolean type | -| 0.1.1 | 2021-06-09 | [3973](https://github.com/airbytehq/airbyte/pull/3973) | Add `AIRBYTE_ENTRYPOINT` for Kubernetes support | -| 0.1.0 | 2021-05-03 | [3088](https://github.com/airbytehq/airbyte/pull/3088) | Initial release | \ No newline at end of file +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.2 | 2021-08-17 | [5476](https://github.com/airbytehq/airbyte/pull/5476) | Correct field unread to boolean type | +| 0.1.1 | 2021-06-09 | [3973](https://github.com/airbytehq/airbyte/pull/3973) | Add `AIRBYTE_ENTRYPOINT` for Kubernetes support | +| 0.1.0 | 2021-05-03 | [3088](https://github.com/airbytehq/airbyte/pull/3088) | Initial release | + diff --git a/docs/integrations/sources/zendesk-sunshine.md b/docs/integrations/sources/zendesk-sunshine.md index 844d6721b3a..365370ea00d 100644 --- a/docs/integrations/sources/zendesk-sunshine.md +++ b/docs/integrations/sources/zendesk-sunshine.md @@ -16,12 +16,11 @@ This Source is capable of syncing the following core Streams: * [RelationshipRecords](https://developer.zendesk.com/api-reference/custom-data/custom-objects-api/relationships/) * [ObjectTypePolicies](https://developer.zendesk.com/api-reference/custom-data/custom-objects-api/permissions/) * [Jobs](https://developer.zendesk.com/api-reference/custom-data/custom-objects-api/jobs/) + This stream is currently not available because it stores data temporary. + * [Limits](https://developer.zendesk.com/api-reference/custom-data/custom-objects-api/limits/) - - - ### Data type mapping | Integration Type | Airbyte Type | Notes | @@ -60,7 +59,7 @@ We recommend creating a restricted, read-only key specifically for Airbyte acces ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.1.0 | 2021-07-08 | [4359](https://github.com/airbytehq/airbyte/pull/4359) | Initial Release | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.0 | 2021-07-08 | [4359](https://github.com/airbytehq/airbyte/pull/4359) | Initial Release | diff --git a/docs/integrations/sources/zendesk-support.md b/docs/integrations/sources/zendesk-support.md index 510154b5944..74904a619dd 100644 --- a/docs/integrations/sources/zendesk-support.md +++ b/docs/integrations/sources/zendesk-support.md @@ -4,9 +4,8 @@ The Zendesk Support source supports both Full Refresh and Incremental syncs. You can choose if this connector will copy only the new or updated data, or all rows in the tables and columns you set up for replication, every time a sync is run. -This source can sync data for the [Zendesk Support API](https://developer.zendesk.com/api-reference/apps/apps-support-api/introduction/). -This Source Connector is based on a [Airbyte CDK](https://docs.airbyte.io/connector-development/cdk-python). 
-Incremental sync are implemented on API side by its filters +This source can sync data for the [Zendesk Support API](https://developer.zendesk.com/api-reference/apps/apps-support-api/introduction/). This Source Connector is based on the [Airbyte CDK](https://docs.airbyte.io/connector-development/cdk-python). Incremental syncs are implemented on the API side via its filters. + ### Output schema This Source is capable of syncing the following core Streams: @@ -26,16 +25,20 @@ This Source is capable of syncing the following core Streams: * [Tags](https://developer.zendesk.com/rest_api/docs/support/tags) * [SLA Policies](https://developer.zendesk.com/rest_api/docs/support/sla_policies) - ### Not implemented schema - These Zendesk endpoints are available too. But syncing with them will be implemented in the future. - #### Tickets + **Not implemented schema** + + These Zendesk endpoints are available too, but syncing with them will be implemented in the future. + + **Tickets** + * [Ticket Attachments](https://developer.zendesk.com/api-reference/ticketing/tickets/ticket-attachments/) * [Ticket Requests](https://developer.zendesk.com/api-reference/ticketing/tickets/ticket-requests/) * [Ticket Metric Events](https://developer.zendesk.com/api-reference/ticketing/tickets/ticket_metric_events/) * [Ticket Activities](https://developer.zendesk.com/api-reference/ticketing/tickets/activity_stream/) * [Ticket Skips](https://developer.zendesk.com/api-reference/ticketing/tickets/ticket_skips/) - #### Help Center + **Help Center** + * [Articles](https://developer.zendesk.com/api-reference/help_center/help-center-api/articles/) * [Article Attachments](https://developer.zendesk.com/api-reference/help_center/help-center-api/article_attachments/) * [Article Comments](https://developer.zendesk.com/api-reference/help_center/help-center-api/article_comments/) @@ -57,13 +60,14 @@ This Source is capable of syncing the following core Streams: | `number` | `number` | | | `array` | `array` | | | `object` | `object` | | + ### Features | Feature | Supported?\(Yes/No\) | Notes | | :--- | :--- | :--- | | Full Refresh Sync | Yes | | | Incremental - Append Sync | Yes | | -| Incremental - Debuped + History Sync | Yes | Enabled according to type of destination | +| Incremental - Deduped + History Sync | Yes | Enabled according to type of destination | | Namespaces | No | | ### Performance considerations @@ -73,14 +77,15 @@ The connector is restricted by normal Zendesk [requests limitation](https://deve The Zendesk connector should not run into Zendesk API limitations under normal usage. Please [create an issue](https://github.com/airbytehq/airbyte/issues) if you see any rate limit issues that are not automatically retried successfully. ## Getting started + ### Requirements + * Zendesk Subdomain * Auth Method * API Token * Zendesk API Token * Zendesk Email - * oAuth2 (not implemented) - + * oAuth2 \(not implemented\) ### Setup guide @@ -89,8 +94,9 @@ Generate a API access token using the [Zendesk support](https://support.zendesk. We recommend creating a restricted, read-only key specifically for Airbyte access. This will allow you to control which resources Airbyte should be able to access. 
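As a quick way to verify the subdomain, email, and API token before configuring the source, you can query the Support API directly. The snippet below is only a sketch: it assumes the conventional `https://<subdomain>.zendesk.com/api/v2/...` endpoint layout and the `email/token:api_token` basic-auth convention.

```bash
# Hypothetical credentials check; replace the placeholders with your own values.
SUBDOMAIN="<your-subdomain>"
EMAIL="<your-zendesk-email>"
API_TOKEN="<your-api-token>"
curl -s -u "$EMAIL/token:$API_TOKEN" \
  "https://$SUBDOMAIN.zendesk.com/api/v2/tickets.json?per_page=1"
```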
### CHANGELOG + | Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| `0.1.1` | 2021-09-02 | [5787](https://github.com/airbytehq/airbyte/pull/5787) | fixed incremental logic for the ticket_comments stream | +| :--- | :--- | :--- | :--- | +| `0.1.1` | 2021-09-02 | [5787](https://github.com/airbytehq/airbyte/pull/5787) | fixed incremental logic for the ticket\_comments stream | | `0.1.0` | 2021-07-21 | [4861](https://github.com/airbytehq/airbyte/pull/4861) | created CDK native zendesk connector | diff --git a/docs/integrations/sources/zoom.md b/docs/integrations/sources/zoom.md index b809a04177e..59f8fbcc25d 100644 --- a/docs/integrations/sources/zoom.md +++ b/docs/integrations/sources/zoom.md @@ -60,6 +60,7 @@ The Zoom connector should not run into Zoom API limitations under normal usage. Please read [How to generate your JWT Token](https://marketplace.zoom.us/docs/guides/build/jwt-app). -| Version | Date | Pull Request | Subject | -| :------ | :-------- | :----- | :------ | -| 0.2.4 | 2021-07-06 | [4539](https://github.com/airbytehq/airbyte/pull/4539) | Add `AIRBYTE_ENTRYPOINT` for Kubernetes support | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.2.4 | 2021-07-06 | [4539](https://github.com/airbytehq/airbyte/pull/4539) | Add `AIRBYTE_ENTRYPOINT` for Kubernetes support | + diff --git a/docs/integrations/sources/zuora.md b/docs/integrations/sources/zuora.md index 7e8b80cf37e..9a3f0a2abd0 100644 --- a/docs/integrations/sources/zuora.md +++ b/docs/integrations/sources/zuora.md @@ -4,7 +4,7 @@ The Zuora source supports both Full Refresh and Incremental syncs. You can choose if this connector will copy only the new or updated data, or all rows in the tables and columns you set up for replication, every time a sync is run. -Airbyte uses [REST API](https://www.zuora.com/developer/api-reference/#section/Introduction) to fetch data from Zuora. The REST API accepts [ZOQL (Zuora Object Query Language)](https://knowledgecenter.zuora.com/Central_Platform/Query/Export_ZOQL), a SQL-like language, to export the data. +Airbyte uses the [REST API](https://www.zuora.com/developer/api-reference/#section/Introduction) to fetch data from Zuora. The REST API accepts [ZOQL \(Zuora Object Query Language\)](https://knowledgecenter.zuora.com/Central_Platform/Query/Export_ZOQL), a SQL-like language, to export the data. This Source Connector is based on a [Airbyte CDK](https://docs.airbyte.io/connector-development/cdk-python). @@ -19,9 +19,8 @@ This Source is capable of syncing: The discovering of Zuora Account objects schema may take a while, if you add the connection for the first time, and/or you need to refresh your list of available streams. Please take your time to wait and don't cancel this operation, usually it takes up to 5-10 min, depending on number of objects available in Zuora Account. ### Note: -Some of the Zuora Objects may not be available for sync due to limitations of Zuora Supscription Plan, Permissions. -For details refer to the [Availability of Data Source Objects](https://knowledgecenter.zuora.com/DC_Developers/M_Export_ZOQL) section in the Zuora documentation. +Some of the Zuora Objects may not be available for sync due to limitations of your Zuora Subscription Plan and permissions. For details, refer to the [Availability of Data Source Objects](https://knowledgecenter.zuora.com/DC_Developers/M_Export_ZOQL) section in the Zuora documentation. 
### Data type mapping @@ -59,77 +58,90 @@ Any other data type not listed in the table above will be treated as `string`. | Feature | Supported?\(Yes/No\) | Notes | | :--- | :--- | :--- | | Full Refresh Overwrite Sync | Yes | | -| Full Refresh Append Sync | Yes | | +| Full Refresh Append Sync | Yes | | | Incremental - Append Sync | Yes | | | Incremental - Append + Deduplication Sync | Yes | | | Namespaces | No | | ## Supported Environments for Zuora + | Environment | Supported?\(Yes/No\) | Notes | | :--- | :--- | :--- | -| Production | Yes | Select from exising options while setup| +| Production | Yes | Select from exising options while setup | | Sandbox | Yes | Select from exising options while setup | ## List of Supported Environments for Zuora + ### Production - + | Environment | Endpoint | | :--- | :--- | -| US Production | https://rest.zuora.com | -| US Cloud Production | https://rest.na.zuora.com | -| EU Production | https://rest.eu.zuora.com | +| US Production | [https://rest.zuora.com](https://rest.zuora.com) | +| US Cloud Production | [https://rest.na.zuora.com](https://rest.na.zuora.com) | +| EU Production | [https://rest.eu.zuora.com](https://rest.eu.zuora.com) | + ### Sandbox + | Environment | Endpoint | | :--- | :--- | -| US API Sandbox | https://rest.apisandbox.zuora.com | -| US Cloud API Sandbox | https://rest.sandbox.na.zuora.com | -| US Central Sandbox | https://rest.test.zuora.com | -| EU API Sandbox | https://rest.sandbox.eu.zuora.com | -| EU Central Sandbox | https://rest.test.eu.zuora.com | - +| US API Sandbox | [https://rest.apisandbox.zuora.com](https://rest.apisandbox.zuora.com) | +| US Cloud API Sandbox | [https://rest.sandbox.na.zuora.com](https://rest.sandbox.na.zuora.com) | +| US Central Sandbox | [https://rest.test.zuora.com](https://rest.test.zuora.com) | +| EU API Sandbox | [https://rest.sandbox.eu.zuora.com](https://rest.sandbox.eu.zuora.com) | +| EU Central Sandbox | [https://rest.test.eu.zuora.com](https://rest.test.eu.zuora.com) | + ### Other + | Environment | Endpoint | | :--- | :--- | -| US Performance Test | https://rest.pt1.zuora.com | +| US Performance Test | [https://rest.pt1.zuora.com](https://rest.pt1.zuora.com) | For more information about available environments, please visit [this page](https://knowledgecenter.zuora.com/BB_Introducing_Z_Business/D_Zuora_Environments) ### Performance considerations If you experience the long time for sync operation, please consider: + * to increase the `window_in_days` parameter inside Zuora source configuration * use the smaller date range by tuning `start_date` parameter. ### Note + Usually, the very first sync operation for all of the objects inside Zuora account takes up to 25-45-60 min, the more data you have, the more time you'll need. ## Getting started ### Create an API user role + 1. Log in to your `Zuora acccount`. -2. In the top right corner of the Zuora dashboard, select `Settings` > `Administration Settings`. +2. In the top right corner of the Zuora dashboard, select `Settings` > `Administration Settings`. 3. Select `Manage User Roles`. 4. Select `Add new role` to create a new role, and fill in neccessary information up to the form. ### Assign the role to a user -5. From the `administration` page, click `Manage Users`. -6. Click `add single user`. -7. Create a user and assign it to the role you created in `Create an API user role` section. -8. You should receive an email with activation instructions. Follow them to activate your API user. 
-For more information visit [Create an API User page](https://knowledgecenter.zuora.com/Billing/Tenant_Management/A_Administrator_Settings/Manage_Users/Create_an_API_User) + +1. From the `administration` page, click `Manage Users`. +2. Click `add single user`. +3. Create a user and assign it to the role you created in `Create an API user role` section. +4. You should receive an email with activation instructions. Follow them to activate your API user. + + For more information visit [Create an API User page](https://knowledgecenter.zuora.com/Billing/Tenant_Management/A_Administrator_Settings/Manage_Users/Create_an_API_User) ### Create Client ID and Client Secret -9. From the `administration` page, click `Manage Users`. -10. Click on User Name of the target user. -11. Enter a client name and description and click `create`. -12. A pop-up will open with your Client ID and Client Secret. - Make a note of your Client ID and Client Secret because they will never be shown again. You will need them to configure Airbyte Zuora Connector. -13. You're ready to set up Zuora connector in Airbyte, using created `Client ID` and `Client Secret`! +1. From the `administration` page, click `Manage Users`. +2. Click on User Name of the target user. +3. Enter a client name and description and click `create`. +4. A pop-up will open with your Client ID and Client Secret. + + Make a note of your Client ID and Client Secret because they will never be shown again. You will need them to configure Airbyte Zuora Connector. + +5. You're ready to set up Zuora connector in Airbyte, using created `Client ID` and `Client Secret`! ## Changelog -| Version | Date | Pull Request | Subject | -| :------ | :------ -| 0.1.1 | 2021-10-01 | [6575](https://github.com/airbytehq/airbyte/pull/6575) | Added OAuth support for Airbyte Cloud | -| 0.1.0 | 2021-08-01 | [4661](https://github.com/airbytehq/airbyte/pull/4661) | Initial release of Native Zuora connector for Airbyte | +| Version | Date | Pull Request | Subject | +| :--- | :--- | :--- | :--- | +| 0.1.1 | 2021-10-01 | [6575](https://github.com/airbytehq/airbyte/pull/6575) | Added OAuth support for Airbyte Cloud | +| 0.1.0 | 2021-08-01 | [4661](https://github.com/airbytehq/airbyte/pull/4661) | Initial release of Native Zuora connector for Airbyte | + diff --git a/docs/operator-guides/README.md b/docs/operator-guides/README.md index 84ce15b7886..10a3aad8349 100644 --- a/docs/operator-guides/README.md +++ b/docs/operator-guides/README.md @@ -1,2 +1,2 @@ -# Tutorials +# Operator Guides diff --git a/docs/operator-guides/configuring-airbyte-db.md b/docs/operator-guides/configuring-airbyte-db.md index 6b98ebb080c..6374abcd970 100644 --- a/docs/operator-guides/configuring-airbyte-db.md +++ b/docs/operator-guides/configuring-airbyte-db.md @@ -1,29 +1,31 @@ -# Configuring the Airbyte Internal Database +# Configuring the Airbyte Database Airbyte uses different objects to store internal state and metadata. This data is stored and manipulated by the various Airbyte components, but you have the ability to manage the deployment of this database in the following two ways: -- Using the default Postgres database that Airbyte spins-up as part of the Docker service described in the `docker-compose.yml` file: `airbyte/db`. -- Through a dedicated custom Postgres instance (the `airbyte/db` is in this case unused, and can therefore be removed or de-activated from the `docker-compose.yml` file). 
+ +* Using the default Postgres database that Airbyte spins-up as part of the Docker service described in the `docker-compose.yml` file: `airbyte/db`. +* Through a dedicated custom Postgres instance \(the `airbyte/db` is in this case unused, and can therefore be removed or de-activated from the `docker-compose.yml` file\). The various entities are persisted in two internal databases: -- Job database - - Data about executions of Airbyte Jobs and various runtime metadata. - - Data about the internal orchestrator used by Airbyte, Temporal.io (Tasks, Workflow data, Events, and visibility data). -- Config database - - Connectors, Sync Connections and various Airbyte configuration objects. +* Job database + * Data about executions of Airbyte Jobs and various runtime metadata. + * Data about the internal orchestrator used by Airbyte, Temporal.io \(Tasks, Workflow data, Events, and visibility data\). +* Config database + * Connectors, Sync Connections and various Airbyte configuration objects. -Note that no actual data from the source (or destination) connectors ever transits or is retained in this internal database. +Note that no actual data from the source \(or destination\) connectors ever transits or is retained in this internal database. -If you need to interact with it, for example, to make back-ups or perform some clean-up maintenances, you can also gain access to the Export and Import functionalities of this database via the API or the UI (in the Admin page, in the Configuration Tab). +If you need to interact with it, for example, to make back-ups or perform some clean-up maintenances, you can also gain access to the Export and Import functionalities of this database via the API or the UI \(in the Admin page, in the Configuration Tab\). ## Connecting to an External Postgres database Let's walk through what is required to use a Postgres instance that is not managed by Airbyte. First, for the sake of the tutorial, we will run a new instance of Postgres in its own docker container with the command below. If you already have Postgres running elsewhere, you can skip this step and use the credentials for that in future steps. + ```bash docker run --rm --name airbyte-postgres -e POSTGRES_PASSWORD=password -p 3000:5432 -d postgres ``` -In order to configure Airbyte services with this new database, we need to edit the following environment variables declared in the `.env` file (used by the docker-compose command afterward): +In order to configure Airbyte services with this new database, we need to edit the following environment variables declared in the `.env` file \(used by the docker-compose command afterward\): ```bash DATABASE_USER=postgres @@ -40,7 +42,7 @@ CONFIG_DATABASE_USER=airbyte_config_db_user CONFIG_DATABASE_PASSWORD=password ``` -Additionally, you must redefine the JDBC URL constructed in the environment variable `DATABASE_URL` to include the correct host, port, and database. If you need to provide extra arguments to the JDBC driver (for example, to handle SSL) you should add it here as well: +Additionally, you must redefine the JDBC URL constructed in the environment variable `DATABASE_URL` to include the correct host, port, and database. 
If you need to provide extra arguments to the JDBC driver \(for example, to handle SSL\) you should add it here as well: ```bash DATABASE_URL=jdbc:postgresql://host.docker.internal:3000/postgres?ssl=true&sslmode=require @@ -59,16 +61,19 @@ This step is only required when you setup Airbyte with a custom database for the {% endhint %} If you provide an empty database to Airbyte and start Airbyte up for the first time, the server will automatically create the relevant tables in your database, and copy the data. Please make sure: + * The database exists in the server. * The user has both read and write permissions to the database. * The database is empty. * If the database is not empty, and has a table that shares the same name as one of the Airbyte tables, the server will assume that the database has been initialized, and will not copy the data over, resulting in server failure. If you run into this issue, just wipe out the database, and launch the server again. ## Accessing the default database located in docker airbyte-db + In extraordinary circumstances while using the default `airbyte-db` Postgres database, if a developer wants to access the data that tracks jobs, they can do so with the following instructions. As we've seen previously, the credentials for the database are specified in the `.env` file that is used to run Airbyte. By default, the values are: -```shell + +```text DATABASE_USER=docker DATABASE_PASSWORD=docker DATABASE_DB=airbyte @@ -76,10 +81,11 @@ DATABASE_DB=airbyte If you have overridden these defaults, you will need to substitute them in the instructions below. -The following command will allow you to access the database instance using `psql`. +The following command will allow you to access the database instance using `psql`. -```shell +```text docker exec -ti airbyte-db psql -U docker -d airbyte ``` -To access the configuration files for sources, destinations, and connections that have been added, simply query the `airbyte-configs` table. \ No newline at end of file +To access the configuration files for sources, destinations, and connections that have been added, simply query the `airbyte-configs` table. + diff --git a/docs/operator-guides/locating-files-local-destination.md b/docs/operator-guides/locating-files-local-destination.md index f281afb2d3a..e5ce906683e 100644 --- a/docs/operator-guides/locating-files-local-destination.md +++ b/docs/operator-guides/locating-files-local-destination.md @@ -1,4 +1,4 @@ -# Windows - Looking outputs for local destination (csv/json) +# Windows - Browsing Local File Output ## Overview @@ -8,12 +8,14 @@ There can be confusion when using local destinations in Airbyte on Windows, espe ## Locating where your temp folder is -While running Airbyte's Docker image on Windows with WSL2, you can access your temp folder by doing the following: +While running Airbyte's Docker image on Windows with WSL2, you can access your temp folder by doing the following: -1. Open File Explorer (Or any folder where you can access the address bar) +1. Open File Explorer \(Or any folder where you can access the address bar\) 2. Type in `\\wsl$` in the address bar 3. The folders below will be displayed -![](../.gitbook/assets/windows-wsl2-docker-folders.png) + + ![](../.gitbook/assets/windows-wsl2-docker-folders.png) + 4. You can start digging here, but it is recommended to start searching from here and just search for the folder name you used for your local files. 
The folder address should be similar to `\\wsl$\docker-desktop\tmp\docker-desktop-root\containers\services\docker\rootfs\tmp\airbyte_local` 5. You should be able to locate your local destination CSV or JSON files in this folder. @@ -21,3 +23,4 @@ While running Airbyte's Docker image on Windows with WSL2, you can access your t 1. Local JSON and Local CSV files do not persist between Docker restarts. This means that once you turn off your Docker image, your data is lost. This is consistent with the `tmp` nature of the folder. 2. In the root folder of your docker files, it might generate tmp and var folders that only have empty folders inside. + diff --git a/docs/operator-guides/reset.md b/docs/operator-guides/reset.md index 71bed9077d5..ff7dc4d0612 100644 --- a/docs/operator-guides/reset.md +++ b/docs/operator-guides/reset.md @@ -7,11 +7,14 @@ The reset button gives you a blank slate, of sorts, to perform a fresh new sync. As outlined above, you can click on the `Reset your data` button to give you that clean slate. Just as a heads up, here is what it does and doesn't do: The reset button **DOES**: -- Delete all records in your destination tables -- Delete all records in your destination file + +* Delete all records in your destination tables +* Delete all records in your destination file The reset button **DOES NOT**: -- Delete the destination tables -- Delete a destination file if using the LocalCSV or LocalJSON Destinations + +* Delete the destination tables +* Delete a destination file if using the LocalCSV or LocalJSON Destinations Because of this, if you have any orphaned tables or files that are no longer being synced to, they will have to be cleaned up later, as Airbyte will not clean them up for you. + diff --git a/docs/operator-guides/scaling-airbyte.md b/docs/operator-guides/scaling-airbyte.md index eb702f2c34b..d89226bf909 100644 --- a/docs/operator-guides/scaling-airbyte.md +++ b/docs/operator-guides/scaling-airbyte.md @@ -1,39 +1,30 @@ # Scaling Airbyte -As depicted in our [High-Level View](../understanding-airbyte/high-level-view.md), Airbyte is made up of several components under the hood: -1. Scheduler -2. Server -3. Temporal -4. Webapp -5. Database +As depicted in our [High-Level View](../understanding-airbyte/high-level-view.md), Airbyte is made up of several components under the hood: 1. Scheduler 2. Server 3. Temporal 4. Webapp 5. Database These components perform control plane operations that are low-scale, low-resource work. In addition to the work being low cost, these components are efficient and optimized for these jobs, meaning that only uncommonly large workloads will require deployments at scale. In general, you would only encounter scaling issues when running over a thousand connections. -As a reference point, the typical Airbyte user has 5 - 20 connectors and 10 - 100 connections configured. Almost all of these connections are scheduled, -either hourly or daily, resulting in at most 100 concurrent jobs. +As a reference point, the typical Airbyte user has 5 - 20 connectors and 10 - 100 connections configured. Almost all of these connections are scheduled, either hourly or daily, resulting in at most 100 concurrent jobs. ## What To Scale -[Workers](../understanding-airbyte/jobs.md) do all the heavy lifting within Airbyte. A worker is responsible for executing Airbyte operations (e.g. Discover, Read, Sync etc), -and is created on demand whenever these operations are requested. Thus, every job has a corresponding worker executing its work. 
-How a worker executes work depends on the Airbyte deployment. In the Docker deployment, an Airbyte worker spins up at least one Docker container. In the Kubernetes -deployment, an Airbyte worker will create at least one Kubernetes pod. The created resource (Docker container or Kubernetes pod) does all the actual work. +[Workers](../understanding-airbyte/jobs.md) do all the heavy lifting within Airbyte. A worker is responsible for executing Airbyte operations \(e.g. Discover, Read, Sync etc\), and is created on demand whenever these operations are requested. Thus, every job has a corresponding worker executing its work. + +How a worker executes work depends on the Airbyte deployment. In the Docker deployment, an Airbyte worker spins up at least one Docker container. In the Kubernetes deployment, an Airbyte worker will create at least one Kubernetes pod. The created resource \(Docker container or Kubernetes pod\) does all the actual work. Thus, scaling Airbyte is a matter of ensuring that the Docker container or Kubernetes Pod running the jobs has sufficient resources to execute its work. -Jobs-wise, we are mainly concerned with Sync jobs when thinking about scale. Sync jobs sync data from sources to destinations and are the majority of jobs run. Sync jobs use two workers. -One worker reads from the source; the other worker writes to the destination. +Jobs-wise, we are mainly concerned with Sync jobs when thinking about scale. Sync jobs sync data from sources to destinations and are the majority of jobs run. Sync jobs use two workers. One worker reads from the source; the other worker writes to the destination. -**In general, we recommend starting out with a mid-sized cloud instance (e.g. 4 or 8 cores) and gradually tuning instance size to your workload.** +**In general, we recommend starting out with a mid-sized cloud instance \(e.g. 4 or 8 cores\) and gradually tuning instance size to your workload.** -There are two resources to be aware of when thinking of scale: -1. Memory -2. Disk space +There are two resources to be aware of when thinking of scale: 1. Memory 2. Disk space ### Memory + As mentioned above, we are mainly concerned with scaling Sync jobs. Within a Sync job, the main memory culprit is the Source worker. -This is because the Source worker reads up to 10,000 records in memory. This can present problems for database sources with tables that have large row sizes. e.g. a table with an average row size of 0.5MBs will require 0.5 * 10000 / 1000 = 5GBs of RAM. See [this issue](https://github.com/airbytehq/airbyte/issues/3439) for more information. +This is because the Source worker reads up to 10,000 records in memory. This can present problems for database sources with tables that have large row sizes. e.g. a table with an average row size of 0.5MBs will require 0.5 \* 10000 / 1000 = 5GBs of RAM. See [this issue](https://github.com/airbytehq/airbyte/issues/3439) for more information. Our Java connectors currently follow Java's default behaviour with container memory and will only use up to 1/4 of the host's allocated memory. e.g. On a Docker agent with 8GBs of RAM configured, a Java connector limits itself to 2Gbs of RAM and will see Out-of-Memory exceptions if this goes higher. The same applies to Kubernetes pods. @@ -42,22 +33,21 @@ Note that all Source database connectors are Java connectors. This means that us Improving this behaviour is on our roadmap. Please see [this issue](https://github.com/airbytehq/airbyte/issues/3440) for more information. 
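If you want to see the one-quarter default described above for yourself, you can ask a containerized JVM what maximum heap it would pick under a memory limit. This is purely a demonstration of the general JVM behaviour, not an Airbyte command, and `openjdk:17-slim` is only an example of any recent OpenJDK image.

```bash
# A container capped at 8 GB reports a MaxHeapSize of roughly 2 GB (25% of container memory),
# matching the default heap sizing described above.
docker run --rm -m 8g openjdk:17-slim java -XX:+PrintFlagsFinal -version | grep -i maxheapsize
```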
### Disk Space + Airbyte uses backpressure to try to read the minimal amount of logs required. In the past, disk space was a large concern, but we've since deprecated the expensive on-disk queue approach. However, disk space might become an issue for the following reasons: 1. Long-running syncs can produce a fair amount of logs from the Docker agent and Airbyte on Docker deployments. Some work has been done to minimize accidental logging, so this should no longer be an acute problem, but is still an open issue. - -2. Although Airbyte connector images aren't massive, they aren't exactly small either. The typical connector image is ~300MB. An Airbyte deployment with -multiple connectors can easily use up to 10GBs of disk space. +2. Although Airbyte connector images aren't massive, they aren't exactly small either. The typical connector image is ~300MB. An Airbyte deployment with multiple connectors can easily use up to 10GBs of disk space. Because of this, we recommend allocating a minimum of 30GBs of disk space per node. Since storage is on the cheaper side, we'd recommend you be safe than sorry, so err on the side of over-provisioning. -### On Kubernetes +### On Kubernetes + Users running Airbyte Kubernetes also have to make sure the Kubernetes cluster can accommodate the number of pods Airbyte creates. -To be safe, make sure the Kubernetes cluster can schedule up to `2 x ` pods at once. This is the worse case estimate, and most users should be fine with `2 x ` -as a rule of thumb. +To be safe, make sure the Kubernetes cluster can schedule up to `2 x ` pods at once. This is the worse case estimate, and most users should be fine with `2 x ` as a rule of thumb. This is a **non-issue** for users running Airbyte Docker. @@ -66,3 +56,4 @@ This is a **non-issue** for users running Airbyte Docker. The advice here is best-effort and by no means comprehensive. Please reach out on Slack if anything doesn't make sense or if something can be improved. If you've been running Airbyte in production and have more tips up your sleeve, we welcome contributions! + diff --git a/docs/operator-guides/transformation-and-normalization/transformations-with-airbyte.md b/docs/operator-guides/transformation-and-normalization/transformations-with-airbyte.md index 6d62f127e9c..5ac5090e4c0 100644 --- a/docs/operator-guides/transformation-and-normalization/transformations-with-airbyte.md +++ b/docs/operator-guides/transformation-and-normalization/transformations-with-airbyte.md @@ -4,39 +4,41 @@ This tutorial will describe how to push a custom dbt transformation project back to Airbyte to use during syncs. -This guide is the last part of the tutorial series on transformations, following [Transformations with SQL](transformations-with-sql.md) and -[connecting EL with T using dbt](transformations-with-dbt.md). +This guide is the last part of the tutorial series on transformations, following [Transformations with SQL](transformations-with-sql.md) and [connecting EL with T using dbt](transformations-with-dbt.md). -(Example outputs are updated with Airbyte version 0.23.0-alpha from May 2021) +\(Example outputs are updated with Airbyte version 0.23.0-alpha from May 2021\) ## Transformations with Airbyte -After replication of data from a source connector (Extract) to a destination connector (Load), multiple optional transformation steps can now be applied as part of an Airbyte Sync. 
Possible workflows are: +After replication of data from a source connector \(Extract\) to a destination connector \(Load\), multiple optional transformation steps can now be applied as part of an Airbyte Sync. Possible workflows are: 1. Basic normalization transformations as automatically generated by Airbyte dbt code generator. -2. Customized normalization transformations as edited by the user (the default generated normalization one should therefore be disabled) +2. Customized normalization transformations as edited by the user \(the default generated normalization one should therefore be disabled\) 3. Customized business transformations as specified by the user. ## Public Git repository -In the connection settings page, I can add new Transformations steps to apply after [normalization](../../understanding-airbyte/basic-normalization.md). For example, I want to run my custom dbt project [jaffle_shop](https://github.com/fishtown-analytics/jaffle_shop), whenever my sync is done replicating and normalizing my data. + +In the connection settings page, I can add new Transformations steps to apply after [normalization](../../understanding-airbyte/basic-normalization.md). For example, I want to run my custom dbt project [jaffle\_shop](https://github.com/fishtown-analytics/jaffle_shop), whenever my sync is done replicating and normalizing my data. ![](../../.gitbook/assets/custom-dbt-transformations-seed.png) ![](../../.gitbook/assets/custom-dbt-transformations.png) - ## Private Git repository + Now, let's connect my mono-repo Business Intelligence project stored in a private git repository to update the related tables and dashboards when my Airbyte syncs complete. Note that if you need to connect to a private git repository, the recommended way to do so is to generate a `Personal Access Token` that can be used instead of a password. Then, you'll be able to include the credentials in the git repository url: -- [GitHub - Personal Access Tokens](https://docs.github.com/en/github/authenticating-to-github/keeping-your-account-and-data-secure/creating-a-personal-access-token) -- [Gitlab - Personal Access Tokens](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html) -- [Azure DevOps - Personal Access Tokens](https://docs.microsoft.com/en-us/azure/devops/organizations/accounts/use-personal-access-tokens-to-authenticate) +* [GitHub - Personal Access Tokens](https://docs.github.com/en/github/authenticating-to-github/keeping-your-account-and-data-secure/creating-a-personal-access-token) +* [Gitlab - Personal Access Tokens](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html) +* [Azure DevOps - Personal Access Tokens](https://docs.microsoft.com/en-us/azure/devops/organizations/accounts/use-personal-access-tokens-to-authenticate) And then use it for cloning: - git clone https://username:token@github.com/user/repo +```text +git clone https://username:token@github.com/user/repo +``` Where `https://username:token@github.com/user/repo` is the git repository url. @@ -54,7 +56,6 @@ In Airbyte, I can use the git url as: `https://airbyteuser:ghp_***********ShLrG2 ![](../../.gitbook/assets/setup-custom-transformation.png) - ## How-to use custom dbt tips ### Refresh models partially @@ -63,18 +64,21 @@ Since I am using a mono-repo from my organization, other team members or departm The whole warehouse is scheduled for full refresh on a different orchestration tool, or as part of the git repository CI. 
However, here, I want to partially refresh some small relevant tables when attaching this operation to a specific Airbyte sync, in this case, the Covid dataset. -Therefore, I can restrict the execution of models to a particular tag or folder by specifying in the dbt cli arguments, in this case whatever is related to "covid_api": +Therefore, I can restrict the execution of models to a particular tag or folder by specifying in the dbt cli arguments, in this case whatever is related to "covid\_api": - run --models tag:covid_api opendata.base.* +```text +run --models tag:covid_api opendata.base.* +``` Now, when replications syncs are triggered by Airbyte, my custom transformations from my private git repository are also run at the end! -### Using a custom run with variables +### Using a custom run with variables If you want to use a custom run and pass variables you need to use the follow syntax: + ```bash run --vars '{table_name":"sample","schema_name":"other_value"}' ``` -This string must have no space. -There is a [Github issue](https://github.com/airbytehq/airbyte/issues/4348) to improve this. -If you want to contribute to Airbyte, this is a good opportunity! + +This string must have no space. There is a [Github issue](https://github.com/airbytehq/airbyte/issues/4348) to improve this. If you want to contribute to Airbyte, this is a good opportunity! + diff --git a/docs/operator-guides/transformation-and-normalization/transformations-with-dbt.md b/docs/operator-guides/transformation-and-normalization/transformations-with-dbt.md index 59c68a84b13..aabd62a4068 100644 --- a/docs/operator-guides/transformation-and-normalization/transformations-with-dbt.md +++ b/docs/operator-guides/transformation-and-normalization/transformations-with-dbt.md @@ -6,18 +6,17 @@ This tutorial will describe how to integrate SQL based transformations with Airb This tutorial is the second part of the previous tutorial [Transformations with SQL](transformations-with-sql.md). Next, we'll wrap-up with a third part on submitting transformations back in Airbyte: [Transformations with Airbyte](transformations-with-airbyte.md). -(Example outputs are updated with Airbyte version 0.23.0-alpha from May 2021) +\(Example outputs are updated with Airbyte version 0.23.0-alpha from May 2021\) ## Transformations with dbt The tool in charge of transformation behind the scenes is actually called [dbt](https://blog.getdbt.com/what--exactly--is-dbt-/) \(Data Build Tool\). -Before generating the SQL files as we've seen in the previous tutorial, Airbyte sets up a dbt Docker instance and automatically generates a dbt project for us. This is created as specified in the [dbt project documentation page](https://docs.getdbt.com/docs/building-a-dbt-project/projects) with the right credentials for the target destination. The dbt models are then run afterward, thanks to the [dbt CLI](https://docs.getdbt.com/dbt-cli/cli-overview). -However, for now, let's run through working with the dbt tool. +Before generating the SQL files as we've seen in the previous tutorial, Airbyte sets up a dbt Docker instance and automatically generates a dbt project for us. This is created as specified in the [dbt project documentation page](https://docs.getdbt.com/docs/building-a-dbt-project/projects) with the right credentials for the target destination. The dbt models are then run afterward, thanks to the [dbt CLI](https://docs.getdbt.com/dbt-cli/cli-overview). However, for now, let's run through working with the dbt tool. 
### Validate dbt project settings -Let's say we identified our workspace (as shown in the previous tutorial [Transformations with SQL](transformations-with-sql.md)), and we have a workspace ID of: +Let's say we identified our workspace \(as shown in the previous tutorial [Transformations with SQL](transformations-with-sql.md)\), and we have a workspace ID of: ```bash NORMALIZE_WORKSPACE="5/0/" @@ -59,6 +58,7 @@ Connection: sslmode: None Connection test: OK connection ok ``` + ### Compile and build dbt normalization models If the previous command does not show any errors or discrepancies, it is now possible to invoke the CLI from within the docker image to trigger transformation processing: @@ -78,13 +78,14 @@ Concurrency: 32 threads (target='prod') 1 of 1 START table model quarantine.covid_epidemiology....................................................... [RUN] 1 of 1 OK created table model quarantine.covid_epidemiology.................................................. [SELECT 35822 in 0.47s] - + Finished running 1 table model in 0.74s. Completed successfully Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1 ``` + ### Exporting dbt normalization project outside Airbyte As seen in the tutorial on [exploring workspace folder](../browsing-output-logs.md), it is possible to browse the `normalize` folder and examine further logs if an error occurs. @@ -214,3 +215,4 @@ Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1 Now, that you've exported the generated normalization models, you can edit and tweak them as necessary. If you want to know how to push your modifications back to Airbyte and use your updated dbt project during Airbyte syncs, you can continue with the following [tutorial on importing transformations into Airbyte](transformations-with-airbyte.md)... + diff --git a/docs/operator-guides/transformation-and-normalization/transformations-with-sql.md b/docs/operator-guides/transformation-and-normalization/transformations-with-sql.md index fd7551c717f..df085355c6e 100644 --- a/docs/operator-guides/transformation-and-normalization/transformations-with-sql.md +++ b/docs/operator-guides/transformation-and-normalization/transformations-with-sql.md @@ -1,14 +1,16 @@ # Transformations with SQL \(Part 1/3\) -## Overview +## Transformations with SQL \(Part 1/3\) + +### Overview This tutorial will describe how to integrate SQL based transformations with Airbyte syncs using plain SQL queries. This is the first part of ELT tutorial. The second part goes deeper with [Transformations with dbt](transformations-with-dbt.md) and then wrap-up with a third part on [Transformations with Airbyte](transformations-with-airbyte.md). -# (Examples outputs are updated with Airbyte version 0.23.0-alpha from May 2021) +## \(Examples outputs are updated with Airbyte version 0.23.0-alpha from May 2021\) -## First transformation step: Normalization +### First transformation step: Normalization At its core, Airbyte is geared to handle the EL \(Extract Load\) steps of an ELT process. These steps can also be referred in Airbyte's dialect as "Source" and "Destination". @@ -34,7 +36,7 @@ In order to do so, we will now describe how you can leverage the basic normaliza Note: We will rely on docker commands that we've gone over as part of another [Tutorial on Exploring Docker Volumes](../browsing-output-logs.md). 
-## \(Optional\) Configure some Covid \(data\) source and Postgres destinations +### \(Optional\) Configure some Covid \(data\) source and Postgres destinations If you have sources and destinations already setup on your deployment, you can skip to the next section. @@ -57,7 +59,7 @@ After setting up the connectors, we can trigger the sync and study the logs: Notice that the process ran in the `/tmp/workspace/5/0` folder. -## Identify Workspace ID with Normalize steps +### Identify Workspace ID with Normalize steps If you went through the previous setup of source/destination section and run a sync, you were able to identify which workspace was used, let's define some environment variables to remember this: @@ -72,7 +74,7 @@ Or if you want to find any folder where the normalize step was run: NORMALIZE_WORKSPACE=`docker run --rm -i -v airbyte_workspace:/data busybox find /data -path "*normalize/models*" | sed -E "s;/data/([0-9]+/[0-9]+/)normalize/.*;\1;g" | sort | uniq | tail -n 1` ``` -## Export Plain SQL files +### Export Plain SQL files Airbyte is internally using a specialized tool for handling transformations called dbt. @@ -114,7 +116,7 @@ Example Output: ```sql create table "postgres".quarantine."covid_epidemiology_f11__dbt_tmp" as ( - + with __dbt__CTE__covid_epidemiology_ab1_558 as ( -- SQL model to parse JSON blob stored in a single column and extract into separated field columns as described by the JSON Schema @@ -175,7 +177,7 @@ from __dbt__CTE__covid_epidemiology_ab1_558 select *, md5(cast( - + coalesce(cast("key" as varchar ), '') || '-' || coalesce(cast("date" as @@ -222,7 +224,7 @@ from __dbt__CTE__covid_epidemiology_ab3_558 ); ``` -### Simple SQL Query +#### Simple SQL Query We could simplify the SQL query by removing some parts that may be unnecessary for your current usage \(such as generating a md5 column; [Why exactly would I want to use that?!](https://blog.getdbt.com/the-most-underutilized-function-in-sql/)\). @@ -249,7 +251,7 @@ as ( ); ``` -### Customize SQL Query +#### Customize SQL Query Feel free to: @@ -314,3 +316,4 @@ create view "postgres"."public"."covid_epidemiology" as ( Then you can run in your preferred SQL editor or tool! If you are familiar with dbt or want to learn more about it, you can continue with the following [tutorial using dbt](transformations-with-dbt.md)... + diff --git a/docs/operator-guides/upgrading-airbyte.md b/docs/operator-guides/upgrading-airbyte.md index 14ae9c4efa9..7518ce92b59 100644 --- a/docs/operator-guides/upgrading-airbyte.md +++ b/docs/operator-guides/upgrading-airbyte.md @@ -8,7 +8,7 @@ This tutorial will describe how to determine if you need to run this upgrade pro Airbyte intelligently performs upgrades automatically based off of your version defined in your `.env` file and will handle data migration for you. -If you are running [Airbyte on Kubernetes](../deploying-airbyte/on-kubernetes.md), you will need to use one of the two processes defined [here](https://docs.airbyte.io/upgrading-airbyte#upgrading-k-8-s) that differ based on your Airbyte version. +If you are running [Airbyte on Kubernetes](../deploying-airbyte/on-kubernetes.md), you will need to use one of the two processes defined [here](https://docs.airbyte.io/upgrading-airbyte#upgrading-k-8-s) that differ based on your Airbyte version. ## Upgrading on Docker @@ -38,9 +38,9 @@ If you did not start Airbyte from the root of the Airbyte monorepo, you may run This will completely reset your Airbyte deployment back to scratch and you will lose all data. 
{% endhint %} -## Upgrading on K8s (0.27.0-alpha and above) +## Upgrading on K8s \(0.27.0-alpha and above\) -If you are upgrading from (i.e. your current version of Airbyte is) Airbyte version **0.27.0-alpha or above** on Kubernetes : +If you are upgrading from \(i.e. your current version of Airbyte is\) Airbyte version **0.27.0-alpha or above** on Kubernetes : 1. In a terminal, on the host where Airbyte is running, turn off Airbyte. @@ -57,13 +57,14 @@ If you are upgrading from (i.e. your current version of Airbyte is) Airbyte vers ```bash kubectl apply -k kube/overlays/stable ``` - After 2-5 minutes, `kubectl get pods | grep airbyte` should show `Running` as the status for all the core Airbyte pods. This may take longer - on Kubernetes clusters with slow internet connections. + + After 2-5 minutes, `kubectl get pods | grep airbyte` should show `Running` as the status for all the core Airbyte pods. This may take longer on Kubernetes clusters with slow internet connections. Run `kubectl port-forward svc/airbyte-webapp-svc 8000:80` to allow access to the UI/API. -## Upgrading on K8s (0.26.4-alpha and below) -If you are upgrading from (i.e. your current version of Airbyte is) Airbyte version **before 0.27.0-alpha** on Kubernetes we **do not** support automatic migration. Please follow the following steps to upgrade your Airbyte Kubernetes deployment. +## Upgrading on K8s \(0.26.4-alpha and below\) + +If you are upgrading from \(i.e. your current version of Airbyte is\) Airbyte version **before 0.27.0-alpha** on Kubernetes we **do not** support automatic migration. Please follow the following steps to upgrade your Airbyte Kubernetes deployment. 1. Switching over to your browser, navigate to the Admin page in the UI. Then go to the Configuration Tab. Click Export. This will download a compressed back-up archive \(gzipped tarball\) of all of your Airbyte configuration data and sync history locally. @@ -96,13 +97,14 @@ If you are upgrading from (i.e. your current version of Airbyte is) Airbyte ver # Careful, this is deleting data! kubectl delete -k kube/overlays/stable ``` -4. Follow **Step 2** in the `Upgrading on Docker` section to check out the most recent version of Airbyte. Although it is possible to - migrate by changing the `.env` file in the kube overlay directory, this is not recommended as it does not capture any changes to the Kubernetes manifests. +4. Follow **Step 2** in the `Upgrading on Docker` section to check out the most recent version of Airbyte. Although it is possible to migrate by changing the `.env` file in the kube overlay directory, this is not recommended as it does not capture any changes to the Kubernetes manifests. 5. Bring Airbyte back up. + ```bash kubectl apply -k kube/overlays/stable ``` + 6. Switching over to your browser, navigate to the Admin page in the UI. Then go to the Configuration Tab and click on Import. Upload your migrated archive. 
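For reference, the steps above boil down to a handful of shell commands. This is only a rough recap, assuming you deploy from a checkout of the Airbyte monorepo; the release tag is an example, and exporting/importing your configuration still happens through the UI (or through the API calls described next).

```bash
# After exporting your configuration from the UI, wipe the old deployment (careful: this deletes data!)
kubectl delete -k kube/overlays/stable

# Check out the release you are upgrading to (example tag)
git checkout v0.29.17-alpha

# Bring Airbyte back up and wait until the core pods report Running
kubectl apply -k kube/overlays/stable
kubectl get pods | grep airbyte

# If you access the UI via port-forwarding, re-open it, then import your archive from the Admin page
kubectl port-forward svc/airbyte-webapp-svc 8000:80
```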
If you prefer to import and export your data via API instead the UI, follow these instructions: @@ -124,3 +126,4 @@ Here is an example of what this request might look like assuming that the migrat ```bash curl -H "Content-Type: application/x-gzip" -X POST localhost:8000/api/v1/deployment/import --data-binary @/tmp/airbyte_archive_migrated.tar.gz ``` + diff --git a/docs/operator-guides/using-the-airflow-airbyte-operator.md b/docs/operator-guides/using-the-airflow-airbyte-operator.md index f87002203d6..a365bd3462a 100644 --- a/docs/operator-guides/using-the-airflow-airbyte-operator.md +++ b/docs/operator-guides/using-the-airflow-airbyte-operator.md @@ -2,11 +2,9 @@ description: Start triggering Airbyte jobs with Apache Airflow in minutes --- -# Using Apache Airflow with the Airbyte Operator +# Using the Airflow Airbyte Operator -Airbyte is an official community provider for the Apache Airflow project. -The Airbyte operator allows you to trigger synchronization jobs in Apache Airflow, -and this tutorial will walk through configuring your Airflow DAG to do so. +Airbyte is an official community provider for the Apache Airflow project. The Airbyte operator allows you to trigger synchronization jobs in Apache Airflow, and this tutorial will walk through configuring your Airflow DAG to do so. {% hint style="warning" %} Due to some difficulties in setting up Airflow, we recommend first trying out the deployment using the local example [here](https://github.com/airbytehq/airbyte/tree/master/resources/examples/airflow), as it contains accurate configuration required to get the Airbyte operator up and running. @@ -26,9 +24,7 @@ For the purposes of this tutorial, set your Connection's **sync frequency** to * ### **Start Apache Airflow** -If you don't have an Airflow instance, we recommend following this [guide](https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html) to set one up. -Additionally, you will need to install the `apache-airflow-providers-airbyte` package to use Airbyte Operator on Apache Airflow. -You can read more about it [here](https://airflow.apache.org/docs/apache-airflow-providers-airbyte/stable/index.html) +If you don't have an Airflow instance, we recommend following this [guide](https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html) to set one up. Additionally, you will need to install the `apache-airflow-providers-airbyte` package to use Airbyte Operator on Apache Airflow. You can read more about it [here](https://airflow.apache.org/docs/apache-airflow-providers-airbyte/stable/index.html) ## 2. Create a DAG in Apache Airflow to trigger your Airbyte job diff --git a/docs/project-overview/changelog/README.md b/docs/project-overview/changelog/README.md index 8ddadb37ce4..86c8266c239 100644 --- a/docs/project-overview/changelog/README.md +++ b/docs/project-overview/changelog/README.md @@ -4,27 +4,22 @@ We're going over the changes from 0.29.17 and before... and there's a lot of big improvements here, so don't miss them! -**New Source**: Facebook Pages -**New Destination**: MongoDB -**New Destination**: DynamoDB +**New Source**: Facebook Pages **New Destination**: MongoDB **New Destination**: DynamoDB -* 🎉 You can now send notifications via webhook for successes and failures on Airbyte syncs. (This is a massive contribution by @Pras, thank you) :tada: +* 🎉 You can now send notifications via webhook for successes and failures on Airbyte syncs. 
\(This is a massive contribution by @Pras, thank you\) :tada: * 🎉 Scheduling jobs and worker jobs are now separated, allowing for workers to be scaled horizontally. * 🎉 When developing a connector, you can now preview what your spec looks like in real time with this process. * 🎉 Oracle destination: Now has basic normalization. -* 🎉 Add XLSB (binary excel) support to the Files source (contributed by Muutech). +* 🎉 Add XLSB \(binary excel\) support to the Files source \(contributed by Muutech\). * 🎉 You can now properly cancel K8s deployments. - * ✨ S3 source: Support for Parquet format. -* ✨ Github source: Branches, repositories, organization users, tags, and pull request stats streams added (contributed by @Christopher Wu). +* ✨ Github source: Branches, repositories, organization users, tags, and pull request stats streams added \(contributed by @Christopher Wu\). * ✨ BigQuery destination: Added GCS upload option. * ✨ Salesforce source: Now Airbyte native. * ✨ Redshift destination: Optimized for performance. - * 🏗 CDK: :tada: We’ve released a tool to generate JSON Schemas from OpenAPI specs. This should make specifying schemas for API connectors a breeze! :tada: * 🏗 CDK: Source Acceptance Tests now verify that connectors correctly format strings which are declared as using date-time and date formats. -* 🏗 CDK: Add private options to help in testing: _limit and _page_size are now accepted by any CDK connector to minimze your output size for quick iteration while testing. - +* 🏗 CDK: Add private options to help in testing: \_limit and \_page\_size are now accepted by any CDK connector to minimze your output size for quick iteration while testing. * 🐛 Fixed a bug that made it possible for connector definitions to be duplicated, violating uniqueness. * 🐛 Pipedrive source: Output schemas no longer remove timestamp from fields. * 🐛 Github source: Empty repos and negative backoff values are now handled correctly. @@ -38,7 +33,7 @@ We're going over the changes from 0.29.17 and before... and there's a lot of big * 🐛 Hubspot source: Empty strings are no longer handled as dates, fixing the deals, companies, and contacts streams. * 🐛 Typeform source: Allows for multiple choices in responses now. * 🐛 Shopify source: The type for the amount field is now fixed in the schema. -* 🐛 Postgres destination: \u0000(NULL) value processing is now fixed. +* 🐛 Postgres destination: \u0000\(NULL\) value processing is now fixed. As usual... thank you to our wonderful contributors this week: Pras, Christopher Wu, Brian M, yahu98, Michele Zuccala, jinnig, and luizgribeiro! @@ -49,14 +44,10 @@ Got the changes from 0.29.13... with some other surprises! * 🔥 There's a new way to create Airbyte sources! The team at Faros AI has created a Javascript/Typescript CDK which can be found here and in our docs here. This is absolutely awesome and give a huge thanks to Chalenge Masekera, Christopher Wu, eskrm, and Matthew Tovbin! * ✨ New Destination: Azure Blob Storage :sparkles: -**New Source**: Bamboo HR (contributed by @Oren Haliva) -**New Source**: BigCommerce (contributed by @James Wilson) -**New Source**: Trello -**New Source**: Google Analytics V4 -**New Source**: Amazon Ads +**New Source**: Bamboo HR \(contributed by @Oren Haliva\) **New Source**: BigCommerce \(contributed by @James Wilson\) **New Source**: Trello **New Source**: Google Analytics V4 **New Source**: Amazon Ads * 💎 Alpine Docker images are the new standard for Python connectors, so image sizes have dropped by around 100 MB! 
-* ✨ You can now apply tolerations for Airbyte Pods on K8s deployments (contributed by @Pras). +* ✨ You can now apply tolerations for Airbyte Pods on K8s deployments \(contributed by @Pras\). * 🐛 Shopify source: Rate limit throttling fixed. * 📚 We now have a doc on how to deploy Airbyte at scale. Check it out here! * 🏗 Airbyte CDK: You can now ignore HTTP status errors and override retry parameters. @@ -65,15 +56,17 @@ As usual, thank you to our awesome contributors: Oren Haliva, Pras, James Wilson ## 08/26/2021 Summary -New Source: Short.io (contributed by @Apostol Tegko) +New Source: Short.io \(contributed by @Apostol Tegko\) + * 💎 GitHub source: Added support for rotating through multiple API tokens! -* ✨ Syncs are now scheduled with a 3 day timeout (contributed by @Vladimir Remar). -* ✨ Google Ads source: Added UserLocationReport stream (contributed by @Max Krog). -* ✨ Cart source: Added the order_items stream. +* ✨ Syncs are now scheduled with a 3 day timeout \(contributed by @Vladimir Remar\). +* ✨ Google Ads source: Added UserLocationReport stream \(contributed by @Max Krog\). +* ✨ Cart source: Added the order\_items stream. * 🐛 Postgres source: Fixed out-of-memory issue with CDC interacting with large JSON blobs. * 🐛 Intercom source: Pagination now works as expected. As always, thank you to our awesome community contributors this week: Apostol Tegko, Vladimir Remar, Max Krog, Pras, Marco Fontana, Troy Harvey, and damianlegawiec! + ## 08/20/2021 Summary Hey Airbyte community, we got some patch notes for y'all. Here's all the changes we've pushed since the last update. @@ -119,12 +112,10 @@ For this week's update, we got... a few new connectors this week in 0.29.0. We f * New Source: Sugar CRM * New Source: Wordpress * New Source: Zencart - - * 🐛 Shopify source: Fixed the products schema to be in accordance with the API * 🐛 BigQuery source: No longer fails with nested array data types. -View the full release highlights here: [Platform](./platform.md), [Connectors](./connectors.md) +View the full release highlights here: [Platform](platform.md), [Connectors](connectors.md) And as always, thank you to our wonderful contributors: Madison Swain-Bowden, Brian Krausz, Apostol Tegko, Matej Hamas, Vladimir Remar, Oren Haliva, satishblotout, jacqueskpoty, wallies @@ -132,22 +123,19 @@ And as always, thank you to our wonderful contributors: Madison Swain-Bowden, Br What's going on? We just released 0.28.0 and here's the main highlights. - * New Destination: Google Cloud Storage ✨ -* New Destination: Kafka ✨ (contributed by @Mario Molina) +* New Destination: Kafka ✨ \(contributed by @Mario Molina\) * New Source: Pipedrive -* New Source: US Census (contributed by @Daniel Mateus Pires (Earnest Research)) - - +* New Source: US Census \(contributed by @Daniel Mateus Pires \(Earnest Research\)\) * ✨ Google Ads source: Now supports Campaigns, Ads, AdGroups, and Accounts streams. -* ✨ Stripe source: All subscription types (including expired and canceled ones) are now returned. +* ✨ Stripe source: All subscription types \(including expired and canceled ones\) are now returned. * 🐛 Facebook source: Improved rate limit management -* 🐛 Square source: The send_request method is no longer broken due to CDK changes +* 🐛 Square source: The send\_request method is no longer broken due to CDK changes * 🐛 MySQL destination: Does not fail on columns with JSON data now. 
-View the full release highlights here: [Platform](./platform.md), [Connectors](./connectors.md) +View the full release highlights here: [Platform](platform.md), [Connectors](connectors.md) -And as always, thank you to our wonderful contributors: Mario Molina, Daniel Mateus Pires (Earnest Research), gunu, Ankur Adhikari, Vladimir Remar, Madison Swain-Bowden, Maksym Pavlenok, Sam Crowder, mildbyte, avida, and gaart +And as always, thank you to our wonderful contributors: Mario Molina, Daniel Mateus Pires \(Earnest Research\), gunu, Ankur Adhikari, Vladimir Remar, Madison Swain-Bowden, Maksym Pavlenok, Sam Crowder, mildbyte, avida, and gaart ## 07/16/2021 Summary @@ -156,15 +144,13 @@ As for our changes this week... * New Source: Zendesk Sunshine * New Source: Dixa * New Source: Typeform - - * 💎 MySQL destination: Now supports normalization! -* 💎 MSSQL source: Now supports CDC (Change Data Capture) +* 💎 MSSQL source: Now supports CDC \(Change Data Capture\) * ✨ Snowflake destination: Data coming from Airbyte is now identifiable * 🐛 GitHub source: Now uses the correct cursor field for the IssueEvents stream -* 🐛 Square source: The send_request method is no longer broken due to CDK changes +* 🐛 Square source: The send\_request method is no longer broken due to CDK changes -View the full release highlights here: [Platform](./platform.md), [Connectors](./connectors.md) +View the full release highlights here: [Platform](platform.md), [Connectors](connectors.md) As usual, thank you to our awesome community contributors this week: Oliver Meyer, Varun, Brian Krausz, shadabshaukat, Serhii Lazebnyi, Juliano Benvenuto Piovezan, mildbyte, and Sam Crowder! @@ -177,20 +163,18 @@ As usual, thank you to our awesome community contributors this week: Oliver Meye * New Source: Airbyte-Native GitHub * New Source: Airbyte-Native GitLab * New Source: Airbyte-Native Twilio - - * ✨ S3 destination: Now supports anyOf, oneOf and allOf schema fields. * ✨ Instagram source: Migrated to the CDK and has improved error handling. * ✨ Shopify source: Add support for draft orders. * ✨ K8s Deployments: Now support logging to GCS. -* 🐛 GitHub source: Fixed issue with locked breaking normalization of the pull_request stream. +* 🐛 GitHub source: Fixed issue with locked breaking normalization of the pull\_request stream. * 🐛 Okta source: Fix endless loop when syncing data from logs stream. * 🐛 PostgreSQL source: Fixed decimal handling with CDC. * 🐛 Fixed random silent source failures. * 📚 New document on how the CDK handles schemas. * 🏗️ Python CDK: Now allows setting of network adapter args on outgoing HTTP requests. -View the full release highlights here: [Platform](./platform.md), [Connectors](./connectors.md) +View the full release highlights here: [Platform](platform.md), [Connectors](connectors.md) As usual, thank you to our awesome community contributors this week: gunu, P.VAD, Rodrigo Parra, Mario Molina, Antonio Grass, sabifranjo, Jaime Farres, shadabshaukat, Rodrigo Menezes, dkelwa, Jonathan Duval, and Augustin Lafanechère. 
@@ -199,10 +183,9 @@ As usual, thank you to our awesome community contributors this week: gunu, P.VAD * New Destination: Google PubSub * New Source: AWS CloudTrail -*The risks and issues with upgrading Airbyte are now gone...* +_The risks and issues with upgrading Airbyte are now gone..._ + * 🎉 Airbyte automatically upgrades versions safely at server startup 🎉 - - * 💎 Logs on K8s are now stored in Minio by default, no S3 bucket required * ✨ Looker Source: Supports the Run Look output stream * ✨ Slack Source: is now Airbyte native! @@ -211,11 +194,11 @@ As usual, thank you to our awesome community contributors this week: gunu, P.VAD Starting from next week, our weekly office hours will now become demo days! Drop by to get sneak peeks and new feature demos. -* We added the #careers channel, so if you're hiring, post your job reqs there! -* We added a #understanding-airbyte channel to mirror [this](../../understanding-airbyte) section on our docs site. Ask any questions about our architecture or protocol there. -* We added a #contributing-to-airbyte channel. A lot of people ask us about how to contribute to the project, so ask away there! +* We added the \#careers channel, so if you're hiring, post your job reqs there! +* We added a \#understanding-airbyte channel to mirror [this](../../understanding-airbyte/) section on our docs site. Ask any questions about our architecture or protocol there. +* We added a \#contributing-to-airbyte channel. A lot of people ask us about how to contribute to the project, so ask away there! -View the full release highlights here: [Platform](./platform.md), [Connectors](./connectors.md) +View the full release highlights here: [Platform](platform.md), [Connectors](connectors.md) As usual, thank you to our awesome community contributors this week: Harshith Mullapudi, Michael Irvine, and [sabifranjo](https://github.com/sabifranjo). @@ -227,7 +210,7 @@ As usual, thank you to our awesome community contributors this week: Harshith Mu * ✨ Looker source now supports self-hosted instances. * ✨ Facebook Marketing source is now migrated to the CDK, massively improving async job performance and error handling. -View the full connector release notes [here](./connectors.md). +View the full connector release notes [here](connectors.md). As usual, thank you to some of our awesome community contributors this week: Harshith Mullapudi, Tyler DeLange, Daniel Mateus Pires, EdBizarro, Tyler Schroeder, and Konrad Schlatte! @@ -238,7 +221,7 @@ As usual, thank you to some of our awesome community contributors this week: Har * ✨ We now support configuring your destination namespace at the table level when setting up a connection! * ✨ The S3 destination now supports Minio S3 and Parquet output! -View the full release notes here: [Platform](./platform.md), [Connectors](./connectors.md) +View the full release notes here: [Platform](platform.md), [Connectors](connectors.md) As usual, thank you to some of our awesome community contributors this week: Tyler DeLange, Mario Molina, Rodrigo Parra, Prashanth Patali, Christopher Wu, Itai Admi, Fred Reimer, and Konrad Schlatte! @@ -247,25 +230,25 @@ As usual, thank you to some of our awesome community contributors this week: Tyl * New Destination: [S3!!](../../integrations/destinations/s3.md) * New Sources: [Harvest](../../integrations/sources/harvest.md), [Amplitude](../../integrations/sources/amplitude.md), [Posthog](../../integrations/sources/posthog.md) * 🐛 Ensure that logs from threads created by replication workers are added to the log file. 
-* 🐛 Handle TINYINT(1) and BOOLEAN correctly and fix target file comparison for MySQL CDC. +* 🐛 Handle TINYINT\(1\) and BOOLEAN correctly and fix target file comparison for MySQL CDC. * Jira source: now supports all available entities in Jira Cloud. * 📚 Added a troubleshooting section, a gradle cheatsheet, a reminder on what the reset button does, and a refresh on our docs best practices. #### Connector Development: + * Containerized connector code generator * Added JDBC source connector bootstrap template. * Added Java destination generator. -View the full release notes highlights here: [Platform](./platform.md), [Connectors](./connectors.md) +View the full release notes highlights here: [Platform](platform.md), [Connectors](connectors.md) -As usual, thank you to some of our awesome community contributors this week (I've noticed that we've had more contributors to our docs, which we really appreciate). -Ping, Harshith Mullapudi, Michael Irvine, Matheus di Paula, jacqueskpoty and P.VAD. +As usual, thank you to some of our awesome community contributors this week \(I've noticed that we've had more contributors to our docs, which we really appreciate\). Ping, Harshith Mullapudi, Michael Irvine, Matheus di Paula, jacqueskpoty and P.VAD. ## Overview Airbyte is comprised of 2 parts: -* Platform (The scheduler, workers, api, web app, and the Airbyte protocol). Here is the [changelog for Platform](platform.md). +* Platform \(The scheduler, workers, api, web app, and the Airbyte protocol\). Here is the [changelog for Platform](platform.md). * Connectors that run in Docker containers. Here is the [changelog for the connectors](connectors.md). ## Airbyte Platform Releases diff --git a/docs/project-overview/changelog/connectors.md b/docs/project-overview/changelog/connectors.md index 1f3a3628c43..08a32a01b92 100644 --- a/docs/project-overview/changelog/connectors.md +++ b/docs/project-overview/changelog/connectors.md @@ -13,20 +13,24 @@ Check out our [connector roadmap](https://github.com/airbytehq/airbyte/projects/ ## 9/9/2021 New source: + * [**Facebook Pages**](https://docs.airbyte.io/integrations/sources/facebook-pages) New destinations: + * [**MongoDB**](https://docs.airbyte.io/integrations/destinations/mongodb) * [**DynamoDB**](https://docs.airbyte.io/integrations/destinations/dynamodb) New features: + * **S3** source: Support for Parquet format. -* **Github** source: Branches, repositories, organization users, tags, and pull request stats streams added (contributed by @Christopher Wu). +* **Github** source: Branches, repositories, organization users, tags, and pull request stats streams added \(contributed by @Christopher Wu\). * **BigQuery** destination: Added GCS upload option. * **Salesforce** source: Now Airbyte native. * **Redshift** destination: Optimized for performance. Bug fixes: + * **Pipedrive** source: Output schemas no longer remove timestamp from fields. * **Github** source: Empty repos and negative backoff values are now handled correctly. * **Harvest** source: Normalization now works as expected. @@ -39,11 +43,12 @@ Bug fixes: * **Hubspot** source: Empty strings are no longer handled as dates, fixing the deals, companies, and contacts streams. * **Typeform** source: Allows for multiple choices in responses now. * **Shopify** source: The type for the amount field is now fixed in the schema. -* **Postgres** destination: \u0000(NULL) value processing is now fixed. +* **Postgres** destination: \u0000\(NULL\) value processing is now fixed. 
## 9/1/2021 New sources: + * [**Bamboo HR**](https://docs.airbyte.io/integrations/sources/bamboo-hr) * [**BigCommerce**](https://docs.airbyte.io/integrations/sources/bigcommerce) * [**Trello**](https://docs.airbyte.io/integrations/sources/trello) @@ -51,34 +56,42 @@ New sources: * [**Amazon Ads**](https://docs.airbyte.io/integrations/sources/google-analytics-v4) Bug fixes: + * **Shopify** source: Rate limit throttling fixed. ## 8/26/2021 -New source: +New source: + * [**Short.io**](https://docs.airbyte.io/integrations/sources/shortio) New features: + * **GitHub** source: Add support for rotating through multiple API tokens. * **Google Ads** source: Added `UserLocationReport` stream. * **Cart** source: Added the `order_items` stream. Bug fixes: + * **Postgres** source: Fix out-of-memory issue with CDC interacting with large JSON blobs. * **Intercom** source: Pagination now works as expected. ## 8/18/2021 New source: + * [**Bing Ads**](https://docs.airbyte.io/integrations/sources/bing-ads) New destination: + * [**Keen**](https://docs.airbyte.io/integrations/destinations/keen) New features: + * **Chargebee** source: Adds support for the `items`, `item prices` and `attached items` endpoints. Bug fixes: + * **Quickbooks** source: Now uses the number data type for decimal fields. * **Hubspot** source: Fixed `empty string` inside of the `number` and `float` datatypes. * **GitHub** source: Validation fixed on non-required fields. @@ -88,14 +101,15 @@ Bug fixes: ## 8/9/2021 New sources: + * [**S3/Abstract Files**](https://docs.airbyte.io/integrations/sources/s3) * [**Zuora**](https://docs.airbyte.io/integrations/sources/zuora) * [**Kustomer**](https://docs.airbyte.io/integrations/sources/kustomer) * [**Apify**](https://docs.airbyte.io/integrations/sources/apify-dataset) * [**Chargebee**](https://docs.airbyte.io/integrations/sources/chargebee) - New features: + * **Shopify** source: The `status` property is now in the `Products` stream. * **Amazon Seller Partner** source: Added support for `GET_MERCHANT_LISTINGS_ALL_DATA` and `GET_FBA_INVENTORY_AGED_DATA` stream endpoints. * **GitHub** source: Existing streams now don't minify the `user` property. @@ -103,8 +117,8 @@ New features: * **Zendesk** source: Migrated from Singer to the Airbyte CDK. * **Amazon Seller Partner** source: Migrated to the Airbyte CDK. - Bug fixes: + * **Hubspot** source: Casting exceptions are now logged correctly. * **S3** source: Fixed bug where syncs could hang indefinitely. * **Shopify** source: Fixed the `products` schema to be in accordance with the API. @@ -114,12 +128,13 @@ Bug fixes: * **S3** source: Fixed bug in spec to properly display the `format` field in the UI. New CDK features: -* Now allows for setting request data in non-JSON formats. +* Now allows for setting request data in non-JSON formats. ## 7/30/2021 New sources: + * [**PrestaShop**](https://docs.airbyte.io/integrations/sources/presta-shop) * [**Snapchat Marketing**](https://docs.airbyte.io/integrations/sources/snapchat-marketing) * [**Drupal**](https://docs.airbyte.io/integrations/sources/drupal) @@ -138,6 +153,7 @@ New sources: * [**Zencart**](https://docs.airbyte.io/integrations/sources/zencart) Bug fixes: + * **Shopify** source: Fixed the `products` schema to be in accordance with the API. * **BigQuery** source: No longer fails with `Array of Records` data types. * **BigQuery** destination: Improved logging, Job IDs are now filled with location and Project IDs. 
@@ -145,22 +161,26 @@ Bug fixes: ## 7/23/2021 New sources: + * [**Pipedrive**](https://docs.airbyte.io/integrations/sources/pipedrive) * [**US Census**](https://docs.airbyte.io/integrations/sources/us-census) * [**BigQuery**](https://docs.airbyte.io/integrations/sources/bigquery) New destinations: + * [**Google Cloud Storage**](https://docs.airbyte.io/integrations/destinations/gcs) * [**Kafka**](https://docs.airbyte.io/integrations/destinations/kafka) New Features: + * **Java Connectors**: Now have config validators for check, discover, read, and write calls -* **Stripe** source: All subscription types are returnable (including expired and canceled ones). +* **Stripe** source: All subscription types are returnable \(including expired and canceled ones\). * **Mixpanel** source: Migrated to the CDK. * **Intercom** source: Migrated to the CDK. * **Google Ads** source: Now supports the `Campaigns`, `Ads`, `AdGroups`, and `Accounts` streams. Bug Fixes: + * **Facebook** source: Improved rate limit management * **Instagram** source: Now supports old format for state and automatically updates it to the new format. * **Sendgrid** source: Now gracefully handles malformed responses from API. @@ -169,23 +189,29 @@ Bug Fixes: * **Slack** source: Now does not fail stream slicing on reading threads. ## 7/16/2021 + 3 new sources: + * [**Zendesk Sunshine**](https://docs.airbyte.io/integrations/sources/zendesk-sunshine) * [**Dixa**](https://docs.airbyte.io/integrations/sources/dixa) * [**Typeform**](https://docs.airbyte.io/integrations/sources/typeform) New Features: + * **MySQL** destination: Now supports normalization! -* **MSSQL** source: Now supports CDC (Change Data Capture). +* **MSSQL** source: Now supports CDC \(Change Data Capture\). * **Snowflake** destination: Data coming from Airbyte is now identifiable. * **GitHub** source: Now handles rate limiting. Bug Fixes: + * **GitHub** source: Now uses the correct cursor field for the `IssueEvents` stream. * **Square** source: `send_request` method is no longer broken. ## 7/08/2021 + 7 new sources: + * [**PayPal Transaction**](https://docs.airbyte.io/integrations/sources/paypal-transaction) * [**Square**](https://docs.airbyte.io/integrations/sources/square) * [**SurveyMonkey**](https://docs.airbyte.io/integrations/sources/surveymonkey) @@ -195,26 +221,29 @@ Bug Fixes: * [**Airbyte-native Twilio**](https://docs.airbyte.io/integrations/sources/twilio) New Features: + * **S3** destination: Now supports `anyOf`, `oneOf` and `allOf` schema fields. * **Instagram** source: Migrated to the CDK and has improved error handling. * **Snowflake** source: Now has comprehensive data type tests. * **Shopify** source: Change the default stream cursor field to `update_at` where possible. * **Shopify** source: Add support for draft orders. * **MySQL** destination: Now supports normalization. - + Connector Development: + * **Python CDK**: Now allows setting of network adapter args on outgoing HTTP requests. * Abstract classes for non-JDBC relational database sources. Bugfixes: -* **GitHub** source: Fixed issue with `locked` breaking normalization of the pull_request stream. + +* **GitHub** source: Fixed issue with `locked` breaking normalization of the pull\_request stream. * **PostgreSQL** source: Fixed decimal handling with CDC. * **Okta** source: Fix endless loop when syncing data from logs stream. - ## 7/01/2021 Bugfixes: + * **Looker** source: Now supports the Run Look stream. * **Google Adwords**: CI is fixed and new version is published. 
* **Slack** source: Now Airbyte native and supports channels, channel members, messages, users, and threads streams. @@ -224,9 +253,11 @@ Bugfixes: ## 6/24/2021 1 new source: + * [**Db2**](https://docs.airbyte.io/integrations/sources/db2) New features: + * **S3** destination: supports Avro and Jsonl output! * **BigQuery** destination: now supports loading JSON data as structured data. * **Looker** source: Now supports self-hosted instances. @@ -235,9 +266,11 @@ New features: ## 6/18/2021 1 new source: + * [**Snowflake**](https://docs.airbyte.io/integrations/sources/snowflake) New features: + * **Postgres** source: now has comprehensive data type tests. * **Google Ads** source: now uses the [Google Ads Query Language](https://developers.google.com/google-ads/api/docs/query/overview)! * **S3** destination: supports Parquet output! @@ -245,15 +278,19 @@ New features: * **BigQuery** destination: credentials are now optional. ## 6/10/2021 + 1 new destination: + * [**S3**](https://docs.airbyte.io/integrations/destinations/s3) 3 new sources: + * [**Harvest**](https://docs.airbyte.io/integrations/sources/harvest) * [**Amplitude**](https://docs.airbyte.io/integrations/sources/amplitude) * [**Posthog**](https://docs.airbyte.io/integrations/sources/posthog) New features: + * **Jira** source: now supports all available entities in Jira Cloud. * **ExchangeRatesAPI** source: clearer messages around unsupported currencies. * **MySQL** source: Comprehensive core extension to be more compatible with other JDBC sources. @@ -261,6 +298,7 @@ New features: * **Shopify** source: Add order risks + new attributes to orders schema for native connector Bugfixes: + * **MSSQL** destination: fixed handling of unicode symbols. Connector development updates: @@ -269,41 +307,43 @@ Connector development updates: * Added JDBC source connector bootstrap template. * Added Java destination generator. 
- ## 06/3/2021 2 new sources: + * [**Okta**](https://docs.airbyte.io/integrations/sources/okta) * [**Amazon Seller Partner**](https://docs.airbyte.io/integrations/sources/amazon-seller-partner) New features: -* **MySQL CDC** now only polls for 5 minutes if we haven't received any records ([#3789](https://github.com/airbytehq/airbyte/pull/3789)) -* **Python CDK** now supports Python 3.7.X ([#3692](https://github.com/airbytehq/airbyte/pull/3692)) -* **File** source: now supports Azure Blob Storage ([#3660](https://github.com/airbytehq/airbyte/pull/3660)) + +* **MySQL CDC** now only polls for 5 minutes if we haven't received any records \([\#3789](https://github.com/airbytehq/airbyte/pull/3789)\) +* **Python CDK** now supports Python 3.7.X \([\#3692](https://github.com/airbytehq/airbyte/pull/3692)\) +* **File** source: now supports Azure Blob Storage \([\#3660](https://github.com/airbytehq/airbyte/pull/3660)\) Bugfixes: -* **Recurly** source: now uses type `number` instead of `integer` ([#3769](https://github.com/airbytehq/airbyte/pull/3769)) -* **Stripe** source: fix types in schema ([#3744](https://github.com/airbytehq/airbyte/pull/3744)) -* **Stripe** source: output `number` instead of `int` ([#3728](https://github.com/airbytehq/airbyte/pull/3728)) -* **MSSQL** destination: fix issue with unicode symbols handling ([#3671](https://github.com/airbytehq/airbyte/pull/3671)) -*** +* **Recurly** source: now uses type `number` instead of `integer` \([\#3769](https://github.com/airbytehq/airbyte/pull/3769)\) +* **Stripe** source: fix types in schema \([\#3744](https://github.com/airbytehq/airbyte/pull/3744)\) +* **Stripe** source: output `number` instead of `int` \([\#3728](https://github.com/airbytehq/airbyte/pull/3728)\) +* **MSSQL** destination: fix issue with unicode symbols handling \([\#3671](https://github.com/airbytehq/airbyte/pull/3671)\) ## 05/25/2021 4 new sources: + * [**Asana**](https://docs.airbyte.io/integrations/sources/asana) * [**Klaviyo**](https://docs.airbyte.io/integrations/sources/klaviyo) * [**Recharge**](https://docs.airbyte.io/integrations/sources/recharge) * [**Tempo**](https://docs.airbyte.io/integrations/sources/tempo) Progress on connectors: + * **CDC for MySQL** is now available! 
-* **Sendgrid** source: support incremental sync, as rewritten using HTTP CDK ([#3445](https://github.com/airbytehq/airbyte/pull/3445)) -* **Github** source bugfix: exception when parsing null date values, use `created_at` as cursor value for issue_milestones ([#3314](https://github.com/airbytehq/airbyte/pull/3314)) -* **Slack** source bugfix: don't overwrite thread_ts in threads stream ([#3483](https://github.com/airbytehq/airbyte/pull/3483)) -* **Facebook Marketing** source: allow configuring insights lookback window ([#3396](https://github.com/airbytehq/airbyte/pull/3396)) -* **Freshdesk** source: fix discovery ([#3591](https://github.com/airbytehq/airbyte/pull/3591)) +* **Sendgrid** source: support incremental sync, as rewritten using HTTP CDK \([\#3445](https://github.com/airbytehq/airbyte/pull/3445)\) +* **Github** source bugfix: exception when parsing null date values, use `created_at` as cursor value for issue\_milestones \([\#3314](https://github.com/airbytehq/airbyte/pull/3314)\) +* **Slack** source bugfix: don't overwrite thread\_ts in threads stream \([\#3483](https://github.com/airbytehq/airbyte/pull/3483)\) +* **Facebook Marketing** source: allow configuring insights lookback window \([\#3396](https://github.com/airbytehq/airbyte/pull/3396)\) +* **Freshdesk** source: fix discovery \([\#3591](https://github.com/airbytehq/airbyte/pull/3591)\) ## 05/18/2021 @@ -312,19 +352,22 @@ Progress on connectors: 1 new source: [**ClickHouse**](https://docs.airbyte.io/integrations/sources/clickhouse) Progress on connectors: -* **Shopify**: make this source more resilient to timeouts ([#3409](https://github.com/airbytehq/airbyte/pull/3409)) -* **Freshdesk** bugfix: output correct schema for various streams ([#3376](https://github.com/airbytehq/airbyte/pull/3376)) -* **Iterable**: update to use latest version of CDK ([#3378](https://github.com/airbytehq/airbyte/pull/3378)) + +* **Shopify**: make this source more resilient to timeouts \([\#3409](https://github.com/airbytehq/airbyte/pull/3409)\) +* **Freshdesk** bugfix: output correct schema for various streams \([\#3376](https://github.com/airbytehq/airbyte/pull/3376)\) +* **Iterable**: update to use latest version of CDK \([\#3378](https://github.com/airbytehq/airbyte/pull/3378)\) ## 05/11/2021 1 new destination: [**MySQL**](https://docs.airbyte.io/integrations/destinations/mysql) 2 new sources: + * [**Google Search Console**](https://docs.airbyte.io/integrations/sources/google-search-console) * [**PokeAPI**](https://docs.airbyte.io/integrations/sources/pokeapi) \(talking about long tail and having fun ;\)\) Progress on connectors: + * **Zoom**: bugfix on declaring correct types to match data coming from API \([\#3159](https://github.com/airbytehq/airbyte/pull/3159)\), thanks to [vovavovavovavova](https://github.com/vovavovavovavova) * **Smartsheets**: bugfix on gracefully handling empty cell values \([\#3337](https://github.com/airbytehq/airbyte/pull/3337)\), thanks to [Nathan Nowack](https://github.com/zzstoatzz) * **Stripe**: fix date property name, only add connected account header when set, and set primary key \(\#3210\), thanks to [Nathan Yergler](https://github.com/nyergler) @@ -458,7 +501,7 @@ Other progress on connectors: ## 01/19/2021 -* **Our new** [**Connector Health Grade**](../../integrations) **page** +* **Our new** [**Connector Health Grade**](../../integrations/) **page** * **1 new source:** App Store \(thanks to [@Muriloo](https://github.com/Muriloo)\) * Fixes on connectors: * Bug fix writing boolean columns to Redshift @@ 
-499,8 +542,7 @@ Other progress on connectors: ## 12/04/2020 -**New sources:** [Redshift](../../integrations/sources/redshift.md), [Greenhouse](../../integrations/sources/greenhouse.md) -**New destination:** [Redshift](../../integrations/destinations/redshift.md) +**New sources:** [Redshift](../../integrations/sources/redshift.md), [Greenhouse](../../integrations/sources/greenhouse.md) **New destination:** [Redshift](../../integrations/destinations/redshift.md) ## 11/30/2020 @@ -532,8 +574,7 @@ Other progress on connectors: ## 11/04/2020 -**New sources:** [Facebook Ads](connectors.md), [Google Ads](../../integrations/sources/google-adwords.md), [Marketo](../../integrations/sources/marketo.md) -**New destination:** [Snowflake](../../integrations/destinations/snowflake.md) +**New sources:** [Facebook Ads](connectors.md), [Google Ads](../../integrations/sources/google-adwords.md), [Marketo](../../integrations/sources/marketo.md) **New destination:** [Snowflake](../../integrations/destinations/snowflake.md) ## 10/30/2020 @@ -545,6 +586,5 @@ Other progress on connectors: ## 09/23/2020 -**New sources:** [Stripe](../../integrations/sources/stripe.md), [Postgres](../../integrations/sources/postgres.md) -**New destinations:** [BigQuery](../../integrations/destinations/bigquery.md), [Postgres](../../integrations/destinations/postgres.md), [local CSV](../../integrations/destinations/local-csv.md) +**New sources:** [Stripe](../../integrations/sources/stripe.md), [Postgres](../../integrations/sources/postgres.md) **New destinations:** [BigQuery](../../integrations/destinations/bigquery.md), [Postgres](../../integrations/destinations/postgres.md), [local CSV](../../integrations/destinations/local-csv.md) diff --git a/docs/project-overview/changelog/platform.md b/docs/project-overview/changelog/platform.md index 356d7c9a80f..d0894e56769 100644 --- a/docs/project-overview/changelog/platform.md +++ b/docs/project-overview/changelog/platform.md @@ -7,92 +7,119 @@ description: Be sure to not miss out on new features and improvements! This is the changelog for Airbyte Platform. For our connector changelog, please visit our [Connector Changelog](connectors.md) page. ## [09-08-2021 - 0.29.17](https://github.com/airbytehq/airbyte/releases/tag/v0.29.17-alpha) + * You can now properly cancel deployments when deploying on K8s. ## [09-08-2021 - 0.29.16](https://github.com/airbytehq/airbyte/releases/tag/v0.29.16-alpha) + * You can now send notifications via webhook for successes and failures on Airbyte syncs. * Scheduling jobs and worker jobs are now separated, allowing for workers to be scaled horizontally. ## [09-04-2021 - 0.29.15](https://github.com/airbytehq/airbyte/releases/tag/v0.29.15-alpha) + * Fixed a bug that made it possible for connector definitions to be duplicated, violating uniqueness. ## [09-02-2021 - 0.29.14](https://github.com/airbytehq/airbyte/releases/tag/v0.29.14-alpha) + * Nothing of note. ## [08-27-2021 - 0.29.13](https://github.com/airbytehq/airbyte/releases/tag/v0.29.13-alpha) + * The scheduler now waits for the server before it creates any databases. * You can now apply tolerations for Airbyte Pods on K8s deployments. ## [08-23-2021 - 0.29.12](https://github.com/airbytehq/airbyte/releases/tag/v0.29.12-alpha) + * Syncs now have a `max_sync_timeout` that times them out after 3 days. * Fixed Kube deploys when logging with Minio. ## [08-20-2021 - 0.29.11](https://github.com/airbytehq/airbyte/releases/tag/v0.29.11-alpha) + * Nothing of note. 
## [08-20-2021 - 0.29.10](https://github.com/airbytehq/airbyte/releases/tag/v0.29.10-alpha) + * Migration of Python connector template images to Alpine Docker images to reduce size. ## [08-20-2021 - 0.29.9](https://github.com/airbytehq/airbyte/releases/tag/v0.29.9-alpha) + * Nothing of note. ## [08-17-2021 - 0.29.8](https://github.com/airbytehq/airbyte/releases/tag/v0.29.8-alpha) + * Nothing of note. ## [08-14-2021 - 0.29.7](https://github.com/airbytehq/airbyte/releases/tag/v0.29.7-alpha) + * Re-release: Fixed errant ENV variable in `0.29.6` ## [08-14-2021 - 0.29.6](https://github.com/airbytehq/airbyte/releases/tag/v0.29.6-alpha) + * Connector pods no longer fail with edge case names for the associated Docker images. ## [08-14-2021 - 0.29.5](https://github.com/airbytehq/airbyte/releases/tag/v0.29.5-alpha) + * Nothing of note. ## [08-12-2021 - 0.29.4](https://github.com/airbytehq/airbyte/releases/tag/v0.29.4-alpha) + * Introduced implementation for date-time support in normalization. ## [08-9-2021 - 0.29.3](https://github.com/airbytehq/airbyte/releases/tag/v0.29.3-alpha) + * Importing configuration no longer removes available but unused connectors. ## [08-6-2021 - 0.29.2](https://github.com/airbytehq/airbyte/releases/tag/v0.29.2-alpha) + * Fixed nil pointer exception in version migrations. ## [07-29-2021 - 0.29.1](https://github.com/airbytehq/airbyte/releases/tag/v0.29.1-alpha) + * When migrating, types represented in the config archive need to be a subset of the types declared in the schema. ## [07-28-2021 - 0.29.0](https://github.com/airbytehq/airbyte/releases/tag/v0.29.0-alpha) + * Deprecated `DEFAULT_WORKSPACE_ID`; default workspace no longer exists by default. ## [07-28-2021 - 0.28.2](https://github.com/airbytehq/airbyte/releases/tag/v0.28.2-alpha) + * Backend now handles workspaceId for WebBackend operations. ## [07-26-2021 - 0.28.1](https://github.com/airbytehq/airbyte/releases/tag/v0.28.1-alpha) + * K8s: Overly-sensitive logs are now silenced. ## [07-22-2021 - 0.28.0](https://github.com/airbytehq/airbyte/releases/tag/v0.28.0-alpha) + * Acceptance test dependencies fixed. ## [07-22-2021 - 0.27.5](https://github.com/airbytehq/airbyte/releases/tag/v0.27.5-alpha) + * Fixed unreliable logging on Kubernetes deployments. * Introduced pre-commit to auto-format files on commits. ## [07-21-2021 - 0.27.4](https://github.com/airbytehq/airbyte/releases/tag/v0.27.4-alpha) + * Config persistence is now migrated to the internal Airbyte database. * Source connector ports now properly close when deployed on Kubernetes. * Missing dependencies added that allow acceptance tests to run. ## [07-15-2021 - 0.27.3](https://github.com/airbytehq/airbyte/releases/tag/v0.27.3-alpha) + * Fixed some minor API spec errors. ## [07-12-2021 - 0.27.2](https://github.com/airbytehq/airbyte/releases/tag/v0.27.2-alpha) + * GCP environment variable is now stubbed out to prevent noisy and harmless errors. ## [07-8-2021 - 0.27.1](https://github.com/airbytehq/airbyte/releases/tag/v0.27.1-alpha) + * New API endpoint: List workspaces * K8s: Server doesn't start up before Temporal is ready to operate now. * Silent source failures caused by last patch fixed to throw exceptions. ## [07-1-2021 - 0.27.0](https://github.com/airbytehq/airbyte/releases/tag/v0.27.0-alpha) + * Airbyte now automatically upgrades on server startup! * Airbyte will check whether your `.env` Airbyte version is compatible with the Airbyte version in the database and upgrade accordingly. 
* When running Airbyte on K8s logs will automatically be stored in a Minio bucket unless configured otherwise. @@ -110,7 +137,7 @@ This is the changelog for Airbyte Platform. For our connector changelog, please ## [06-09-2021 - 0.24.8 / 0.25.0](https://github.com/airbytehq/airbyte/releases/tag/v0.24.8-alpha) -* Bugfix: Handle TINYINT(1) and BOOLEAN correctly and fix target file comparison for MySQL CDC. +* Bugfix: Handle TINYINT\(1\) and BOOLEAN correctly and fix target file comparison for MySQL CDC. * Bugfix: Updating the source/destination name in the UI now works as intended. ## [06-04-2021 - 0.24.7](https://github.com/airbytehq/airbyte/releases/tag/v0.24.7-alpha) @@ -129,26 +156,26 @@ This is the changelog for Airbyte Platform. For our connector changelog, please * Minor fixes to documentation * Reliability updates in preparation for custom transformations -* Limit Docker log size to 500 MB ([#3702](https://github.com/airbytehq/airbyte/pull/3702)) +* Limit Docker log size to 500 MB \([\#3702](https://github.com/airbytehq/airbyte/pull/3702)\) ## [05-26-2021 - 0.24.2](https://github.com/airbytehq/airbyte/releases/tag/v0.24.2-alpha) -* Fix for file names being too long in Windows deployments ([#3625](https://github.com/airbytehq/airbyte/pull/3625)) -* Allow users to access the API and WebApp from the same port ([#3603](https://github.com/airbytehq/airbyte/pull/3603)) +* Fix for file names being too long in Windows deployments \([\#3625](https://github.com/airbytehq/airbyte/pull/3625)\) +* Allow users to access the API and WebApp from the same port \([\#3603](https://github.com/airbytehq/airbyte/pull/3603)\) ## [05-25-2021 - 0.24.1](https://github.com/airbytehq/airbyte/releases/tag/v0.24.1-alpha) -* **Checkpointing for incremental syncs** that will now continue where they left off even if they fail! ([#3290](https://github.com/airbytehq/airbyte/pull/3290)) +* **Checkpointing for incremental syncs** that will now continue where they left off even if they fail! \([\#3290](https://github.com/airbytehq/airbyte/pull/3290)\) ## [05-25-2021 - 0.24.0](https://github.com/airbytehq/airbyte/releases/tag/v0.24.0-alpha) -* Avoid dbt runtime exception "maximum recursion depth exceeded" in ephemeral materialization ([#3470](https://github.com/airbytehq/airbyte/pull/3470)) +* Avoid dbt runtime exception "maximum recursion depth exceeded" in ephemeral materialization \([\#3470](https://github.com/airbytehq/airbyte/pull/3470)\) ## [05-18-2021 - 0.23.0](https://github.com/airbytehq/airbyte/releases/tag/v0.23.0-alpha) -* Documentation to deploy locally on Windows is now available ([#3425](https://github.com/airbytehq/airbyte/pull/3425)) +* Documentation to deploy locally on Windows is now available \([\#3425](https://github.com/airbytehq/airbyte/pull/3425)\) * Connector icons are now displayed in the UI -* Restart core containers if they fail automatically ([#3423](https://github.com/airbytehq/airbyte/pull/3423)) +* Restart core containers if they fail automatically \([\#3423](https://github.com/airbytehq/airbyte/pull/3423)\) * Progress on supporting custom transformation using dbt. More updates on this soon! 
## [05-11-2021 - 0.22.3](https://github.com/airbytehq/airbyte/releases/tag/v0.22.3-alpha) diff --git a/docs/project-overview/licenses/README.md b/docs/project-overview/licenses/README.md index 93c64ce24f5..5627fdc6f31 100644 --- a/docs/project-overview/licenses/README.md +++ b/docs/project-overview/licenses/README.md @@ -9,6 +9,5 @@ The license for a particular work is defined with following prioritized rules: 3. First LICENSE found when exploring parent directories up to the project top level directory 4. Defaults to Elastic License 2.0 -If you have any question regarding licenses, just visit our [FAQ](https://airbyte.io/license-faq) or [contact us](mailto:license@airbyte.io). - +If you have any question regarding licenses, just visit our [FAQ](https://airbyte.io/license-faq) or [contact us](mailto:license@airbyte.io). diff --git a/docs/project-overview/licenses/elv2-license.md b/docs/project-overview/licenses/elv2-license.md index 6a52dfa68df..2986bc13962 100644 --- a/docs/project-overview/licenses/elv2-license.md +++ b/docs/project-overview/licenses/elv2-license.md @@ -1,44 +1,38 @@ -Elastic License 2.0 (ELv2) +# ELv2 -**Acceptance** -By using the software, you agree to all of the terms and conditions below. +Elastic License 2.0 \(ELv2\) -**Copyright License** -The licensor grants you a non-exclusive, royalty-free, worldwide, non-sublicensable, non-transferable license to use, copy, distribute, make available, and prepare derivative works of the software, in each case subject to the limitations and conditions below +**Acceptance** By using the software, you agree to all of the terms and conditions below. -**Limitations** -You may not provide the software to third parties as a hosted or managed service, where the service provides users with access to any substantial set of the features or functionality of the software. +**Copyright License** The licensor grants you a non-exclusive, royalty-free, worldwide, non-sublicensable, non-transferable license to use, copy, distribute, make available, and prepare derivative works of the software, in each case subject to the limitations and conditions below + +**Limitations** You may not provide the software to third parties as a hosted or managed service, where the service provides users with access to any substantial set of the features or functionality of the software. You may not move, change, disable, or circumvent the license key functionality in the software, and you may not remove or obscure any functionality in the software that is protected by the license key. You may not alter, remove, or obscure any licensing, copyright, or other notices of the licensor in the software. Any use of the licensor’s trademarks is subject to applicable law. -**Patents** -The licensor grants you a license, under any patent claims the licensor can license, or becomes able to license, to make, have made, use, sell, offer for sale, import and have imported the software, in each case subject to the limitations and conditions in this license. This license does not cover any patent claims that you cause to be infringed by modifications or additions to the software. If you or your company make any written claim that the software infringes or contributes to infringement of any patent, your patent license for the software granted under these terms ends immediately. If your company makes such a claim, your patent license ends immediately for work on behalf of your company. 
+**Patents** The licensor grants you a license, under any patent claims the licensor can license, or becomes able to license, to make, have made, use, sell, offer for sale, import and have imported the software, in each case subject to the limitations and conditions in this license. This license does not cover any patent claims that you cause to be infringed by modifications or additions to the software. If you or your company make any written claim that the software infringes or contributes to infringement of any patent, your patent license for the software granted under these terms ends immediately. If your company makes such a claim, your patent license ends immediately for work on behalf of your company. -**Notices** -You must ensure that anyone who gets a copy of any part of the software from you also gets a copy of these terms. +**Notices** You must ensure that anyone who gets a copy of any part of the software from you also gets a copy of these terms. If you modify the software, you must include in any modified copies of the software prominent notices stating that you have modified the software. -**No Other Rights** -These terms do not imply any licenses other than those expressly granted in these terms. +**No Other Rights** These terms do not imply any licenses other than those expressly granted in these terms. -**Termination** -If you use the software in violation of these terms, such use is not licensed, and your licenses will automatically terminate. If the licensor provides you with a notice of your violation, and you cease all violation of this license no later than 30 days after you receive that notice, your licenses will be reinstated retroactively. However, if you violate these terms after such reinstatement, any additional violation of these terms will cause your licenses to terminate automatically and permanently. +**Termination** If you use the software in violation of these terms, such use is not licensed, and your licenses will automatically terminate. If the licensor provides you with a notice of your violation, and you cease all violation of this license no later than 30 days after you receive that notice, your licenses will be reinstated retroactively. However, if you violate these terms after such reinstatement, any additional violation of these terms will cause your licenses to terminate automatically and permanently. -**No Liability** -As far as the law allows, the software comes as is, without any warranty or condition, and the licensor will not be liable to you for any damages arising out of these terms or the use or nature of the software, under any kind of legal claim. +**No Liability** As far as the law allows, the software comes as is, without any warranty or condition, and the licensor will not be liable to you for any damages arising out of these terms or the use or nature of the software, under any kind of legal claim. -**Definitions** -The *licensor* is the entity offering these terms, and the *software* is the software the licensor makes available under these terms, including any portion of it. +**Definitions** The _licensor_ is the entity offering these terms, and the _software_ is the software the licensor makes available under these terms, including any portion of it. -*you* refers to the individual or entity agreeing to these terms. +_you_ refers to the individual or entity agreeing to these terms. 
-*your company* is any legal entity, sole proprietorship, or other kind of organization that you work for, plus all organizations that have control over, are under the control of, or are under common control with that organization. *control* means ownership of substantially all the assets of an entity, or the power to direct its management and policies by vote, contract, or otherwise. Control can be direct or indirect. +_your company_ is any legal entity, sole proprietorship, or other kind of organization that you work for, plus all organizations that have control over, are under the control of, or are under common control with that organization. _control_ means ownership of substantially all the assets of an entity, or the power to direct its management and policies by vote, contract, or otherwise. Control can be direct or indirect. -*your licenses* are all the licenses granted to you for the software under these terms. +_your licenses_ are all the licenses granted to you for the software under these terms. -*use* means anything you do with the software requiring one of your licenses. +_use_ means anything you do with the software requiring one of your licenses. + +_trademark_ means trademarks, service marks, and similar rights. -*trademark* means trademarks, service marks, and similar rights. diff --git a/docs/project-overview/licenses/mit-license.md b/docs/project-overview/licenses/mit-license.md index ec45d182fcb..c9cef864ea5 100644 --- a/docs/project-overview/licenses/mit-license.md +++ b/docs/project-overview/licenses/mit-license.md @@ -1,21 +1,12 @@ +# MIT + MIT License -Copyright (c) 2020 Airbyte, Inc. +Copyright \(c\) 2020 Airbyte, Inc. -Permission is hereby granted, free of charge, to any person obtaining a copy -of this software and associated documentation files (the "Software"), to deal -in the Software without restriction, including without limitation the rights -to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -copies of the Software, and to permit persons to whom the Software is -furnished to do so, subject to the following conditions: +Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files \(the "Software"\), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: -The above copyright notice and this permission notice shall be included in all -copies or substantial portions of the Software. +The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE -AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -SOFTWARE. diff --git a/docs/project-overview/roadmap.md b/docs/project-overview/roadmap.md index 4f2702b6fc1..99ef5c8638a 100644 --- a/docs/project-overview/roadmap.md +++ b/docs/project-overview/roadmap.md @@ -20,7 +20,7 @@ We understand that we're not "production-ready" for a lot of companies yet. In t * OAuth support for connector configuration \([\#768](https://github.com/airbytehq/airbyte/issues/768)\). **Coming a bit later:** - + * Support for creating destination connectors with the CDK. * Our declarative interface \(CLI\). * Credential and secrets vaulting \([\#837](https://github.com/airbytehq/airbyte/issues/837)\). @@ -32,7 +32,7 @@ Our goal is to become "production-ready" for any company whatever their data sta We also wanted to share with you how we think about the high-level roadmap over the next few months and years. We foresee several high-level phases that we will try to share here. -### **1. Parity on data consolidation (ELT) in warehouses / databases** +### **1. Parity on data consolidation \(ELT\) in warehouses / databases** Our first focus is to support batch-type ELT integrations. We feel that we can provide value right away as soon as we support one of the integrations you need. Batch integrations are also easier to build and sustain. So we would rather start with that. diff --git a/docs/project-overview/slack-code-of-conduct.md b/docs/project-overview/slack-code-of-conduct.md index 7486e187638..25761a2481e 100644 --- a/docs/project-overview/slack-code-of-conduct.md +++ b/docs/project-overview/slack-code-of-conduct.md @@ -1,5 +1,5 @@ --- -description: 'Be nice to one another.' +description: Be nice to one another. --- # Slack Code of Conduct @@ -21,14 +21,16 @@ If you ask a question and don’t get a response, review your question for clari This is a public forum; do not contact individual members of this community without their explicit permission, independent of reason. ## Rule 5: No soliciting! -If you’re a vendor, you may advertise your product in #shameless-plugs. Advertising your product anywhere else, especially in direct messages (DMs), is strictly against the rules. We are appreciative when recruiters and vendors identify themselves through their Slack username. + +If you’re a vendor, you may advertise your product in \#shameless-plugs. Advertising your product anywhere else, especially in direct messages \(DMs\), is strictly against the rules. We are appreciative when recruiters and vendors identify themselves through their Slack username. ## Rule 6: Don't spam tags. For support and questions, generally avoid tagging community members. You will find that our community of volunteers is generally very responsive and amazingly helpful! As mentioned above, if you don’t receive an answer to your question, feel free to ping @support. ## Rule 7: Use threads for discussion. + Using threads allows us to scope conversations without burying messages before it! They allow us to be organized in responding to questions and help keep our time to first response and resolution very low. ---- -*If you see a message or receive a direct message that violates any of these rules, please contact an Airbyte team member and we will take the appropriate moderation action immediately. 
We have zero tolerance for intentional rule-breaking and hate speech.* +_If you see a message or receive a direct message that violates any of these rules, please contact an Airbyte team member and we will take the appropriate moderation action immediately. We have zero tolerance for intentional rule-breaking and hate speech._ + diff --git a/docs/quickstart/add-a-source.md b/docs/quickstart/add-a-source.md index d9e6ce797ae..3d299d4b319 100644 --- a/docs/quickstart/add-a-source.md +++ b/docs/quickstart/add-a-source.md @@ -12,4 +12,5 @@ You might have to wait ~30 seconds before the fields show up because it is the f ![](../.gitbook/assets/getting-started-source.png) -Can't find the connectors that you want? Try your hand at easily building one yourself using our [Python CDK for HTTP API sources!](../connector-development/cdk-python) +Can't find the connectors that you want? Try your hand at easily building one yourself using our [Python CDK for HTTP API sources!](../connector-development/cdk-python/) + diff --git a/docs/quickstart/deploy-airbyte.md b/docs/quickstart/deploy-airbyte.md index 22f124a220f..38163974c68 100644 --- a/docs/quickstart/deploy-airbyte.md +++ b/docs/quickstart/deploy-airbyte.md @@ -1,9 +1,8 @@ -# Deploy Airbyte Open-Source +# Deploy Airbyte Deploying Airbyte Open-Source just takes two steps. 1. Install Docker on your workstation \(see [instructions](https://www.docker.com/products/docker-desktop)\). Make sure you're on the latest version of `docker-compose`. - 2. Run the following commands in your terminal: ```bash @@ -20,8 +19,9 @@ Alternatively, if you have an Airbyte Cloud invite, just follow [these steps.](. If you have any questions about the Airbyte Open-Source setup and deployment process, head over to our [Getting Started FAQ](https://discuss.airbyte.io/c/faq/15) on our Discourse that answers the following questions and more: -- How long does it take to set up Airbyte? -- Where can I see my data once I've run a sync? -- Can I set a start time for my sync? +* How long does it take to set up Airbyte? +* Where can I see my data once I've run a sync? +* Can I set a start time for my sync? + +If there are any questions that we couldn't answer here, we'd love to help you get started. [Join our Slack](https://airbytehq.slack.com/ssb/redirect) and feel free to ask your questions in the \#getting-started channel. -If there are any questions that we couldn't answer here, we'd love to help you get started. [Join our Slack](https://airbytehq.slack.com/ssb/redirect) and feel free to ask your questions in the #getting-started channel. diff --git a/docs/quickstart/set-up-a-connection.md b/docs/quickstart/set-up-a-connection.md index 31ab2180e4e..7147edcba3a 100644 --- a/docs/quickstart/set-up-a-connection.md +++ b/docs/quickstart/set-up-a-connection.md @@ -34,7 +34,7 @@ jq '._airbyte_data | {abilities: .abilities, weight: .weight}' And there you have it. You've pulled data from an API directly into a file, with all of the actual configuration for this replication only taking place in the UI. Note: If you are using Airbyte on Windows with WSL2 and Docker, refer to [this tutorial](../operator-guides/locating-files-local-destination.md) or [this section](../integrations/destinations/local-json.md#access-replicated-data-files) in the local-json destination guide to locate the replicated folder and file. - + ## That's it! This is just the beginning of using Airbyte. We support a large collection of sources and destinations. You can even contribute your own. 
@@ -43,9 +43,7 @@ If you have any questions at all, please reach out to us on [Slack](https://slac Thank you and we hope you enjoy using Airbyte. - {% hint style="warning" %} -At the moment, Airbyte runs a full-refresh to recreate the final tables. This can cause more costs in some destinations like Snowflake, Redshift, and Bigquery. -To understand better what sync mode and frequency you should select, read [this doc](../understanding-airbyte/connections/README.md). -There is a FAQ topic on our Discourse that more extensively explains the cost issue [here](https://discuss.airbyte.io/t/why-are-my-final-tables-are-being-recreated-everytime/76). +At the moment, Airbyte runs a full-refresh to recreate the final tables. This can cause more costs in some destinations like Snowflake, Redshift, and Bigquery. To understand better what sync mode and frequency you should select, read [this doc](../understanding-airbyte/connections/). There is a FAQ topic on our Discourse that more extensively explains the cost issue [here](https://discuss.airbyte.io/t/why-are-my-final-tables-are-being-recreated-everytime/76). {% endhint %} + diff --git a/docs/troubleshooting/README.md b/docs/troubleshooting/README.md index 865994b9107..43c736ddbd4 100644 --- a/docs/troubleshooting/README.md +++ b/docs/troubleshooting/README.md @@ -1,33 +1,30 @@ # Troubleshooting & FAQ -Our FAQ is now a section on our Discourse forum. Check it out [here](https://discuss.airbyte.io/c/faq/15)! -If you don't see your question answered, feel free to open up a new topic for it. +Our FAQ is now a section on our Discourse forum. Check it out [here](https://discuss.airbyte.io/c/faq/15)! If you don't see your question answered, feel free to open up a new topic for it. -The troubleshooting section is aimed at collecting common issues users have to provide quick solutions. -There are some sections you can find: -- [On Deploying](on-deploying.md): -- [On Setting up a New Connection](new-connection.md) -- [On Running a Sync](running-sync.md) -- [On Upgrading](on-upgrading.md) +The troubleshooting section is aimed at collecting common issues users have to provide quick solutions. There are some sections you can find: +* [On Deploying](on-deploying.md): +* [On Setting up a New Connection](new-connection.md) +* [On Running a Sync](running-sync.md) +* [On Upgrading](on-upgrading.md) -If you don't see your issue listed in those sections, you can send a message in our #issues Slack channel. -Using the template bellow will allow us to address your issue quickly and will give us full understanding of your situation. - +If you don't see your issue listed in those sections, you can send a message in our \#issues Slack channel. Using the template bellow will allow us to address your issue quickly and will give us full understanding of your situation. ## Slack Issue Template -**Is this your first time deploying Airbyte**: No / Yes
-**OS Version / Instance**: Ubuntu 18.04, Mac OS, Windows, GCP , EC2 micro.a4
-**Memory / Disk**: 16Gb / 1Tb SSD
-**Deployment**: Docker / Kubernetes
-**Airbyte Version**: 0.26.2-alpha
-**Source name/version**: File 0.24
-**Destination name/version**: Postgres 0.3.0
-**Step**: Setting new connection, source / On sync
-**Description**: I'm trying to sync for the first time and the process doesn't finish. I had enabled CDC and other cool features.
+**Is this your first time deploying Airbyte**: No / Yes + **OS Version / Instance**: Ubuntu 18.04, Mac OS, Windows, GCP , EC2 micro.a4 + **Memory / Disk**: 16Gb / 1Tb SSD + **Deployment**: Docker / Kubernetes + **Airbyte Version**: 0.26.2-alpha + **Source name/version**: File 0.24 + **Destination name/version**: Postgres 0.3.0 + **Step**: Setting new connection, source / On sync + **Description**: I'm trying to sync for the first time and the process doesn't finish. I had enabled CDC and other cool features. -Add the logs and other relevant information in the message thread. -Below is an example: + +Add the logs and other relevant information in the message thread. Below is an example: ![](../.gitbook/assets/issue-example.png) + diff --git a/docs/troubleshooting/new-connection.md b/docs/troubleshooting/new-connection.md index 69d309be0d7..ada2ebb126c 100644 --- a/docs/troubleshooting/new-connection.md +++ b/docs/troubleshooting/new-connection.md @@ -30,16 +30,17 @@ If you are running into connection refused errors when running Airbyte via Docke ## I don’t see a form when selecting a connector -We’ve had that issue once. (no spinner & 500 http error). We don’t know why. Resolution: try to stop airbyte (`docker-compose down`) & restart (`docker-compose up`) +We’ve had that issue once. \(no spinner & 500 http error\). We don’t know why. Resolution: try to stop airbyte \(`docker-compose down`\) & restart \(`docker-compose up`\) ## Connection hangs when trying to run the discovery step -You receive the error below when you tried to sync a database with a lot of tables (6000 or more). +You receive the error below when you tried to sync a database with a lot of tables \(6000 or more\). ```bash airbyte-scheduler | io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: grpc: received message larger than max ( vs. 4194304) ``` -There are two Github issues tracking this problem: [Issue #3942](https://github.com/airbytehq/airbyte/issues/3942) and [Issue #3943](https://github.com/airbytehq/airbyte/issues/3943) -The workaround for this is trying to transfer the tables you really want to use to another namespace. -If you need all tables you should split them into separate namespaces and try to use two connections. +There are two Github issues tracking this problem: [Issue \#3942](https://github.com/airbytehq/airbyte/issues/3942) and [Issue \#3943](https://github.com/airbytehq/airbyte/issues/3943) + +The workaround for this is trying to transfer the tables you really want to use to another namespace. If you need all tables you should split them into separate namespaces and try to use two connections. + diff --git a/docs/troubleshooting/on-deploying.md b/docs/troubleshooting/on-deploying.md index ae6842893fd..e9175bae3cd 100644 --- a/docs/troubleshooting/on-deploying.md +++ b/docs/troubleshooting/on-deploying.md @@ -2,24 +2,27 @@ description: Common issues and their workarounds when trying to deploy Airbyte --- -# On deploying +# On Deploying ## Stuck in onboarding, can’t skip or do anything -To fully reset Airbyte, you also need to delete the docker volumes associated with Airbyte. This is where data is stored. -Assuming that you are running Airbyte by running `docker-compose up`, then what you need to do is: + +To fully reset Airbyte, you also need to delete the docker volumes associated with Airbyte. This is where data is stored. 
Assuming that you are running Airbyte by running `docker-compose up`, then what you need to do is: + * Turn off Airbyte completely: `docker-compose down -v` * Turn Airbyte back on: `docker-compose up` -that should handle you getting reset to the beginning. -I would be curious if we can see the logs associated with the failure you are seeing. I would say if after you reset you run into it again we can debug that. +that should handle you getting reset to the beginning. I would be curious if we can see the logs associated with the failure you are seeing. I would say if after you reset you run into it again we can debug that. ## Git says file names are too long. -If you are cloning the repo, you might run into a problem where git indicates that certain filenames are too long and it therefore can't create the local file. So if you received this error after cloning the repo, run the following commands in *git bash*: + +If you are cloning the repo, you might run into a problem where git indicates that certain filenames are too long and it therefore can't create the local file. So if you received this error after cloning the repo, run the following commands in _git bash_: + ```bash cd airbyte git config core.longpaths true git reset --hard HEAD ``` + However it's worth pointing out that the `core.longpaths` option is defaulted to false for a reason, so use with caution. This git configuration is only changed within the cloned Airbyte repo, so you won't need to worry about changing this setting for other repositories. Find more details about this issue in [this stack overflow question](https://stackoverflow.com/questions/22575662/filename-too-long-in-git-for-windows). Instead of cloning the repo, you can alternatively download the latest Airbyte release [here](https://github.com/airbytehq/airbyte/releases). Unzip the downloaded file, access the unzipped file using PowerShell terminal, and run `docker-compose up`. After this, you should see the Airbyte containers in the Docker application as in the image below. @@ -28,11 +31,11 @@ Instead of cloning the repo, you can alternatively download the latest Airbyte r ## I have run `docker-compose up` and can not access the interface -- If you see a blank screen and not a loading icon: - +* If you see a blank screen and not a loading icon: + Check your web browser version; Some old versions of web browsers doesn't support our current Front-end stack. -- If you see a loading icon or the message `Cannot reach the server` persist: +* If you see a loading icon or the message `Cannot reach the server` persist: Check if all Airbyte containers are running, executing: `docker ps` @@ -44,22 +47,26 @@ f02fc709b130 airbyte/server:1.11.1-alpha "/bin/bash -c './wai…" 2 hou b88d94652268 airbyte/db:1.11.1-alpha "docker-entrypoint.s…" 2 hours ago Up 2 hours 5432/tcp airbyte-db 0573681a10e0 temporalio/auto-setup:1.7.0 "/entrypoint.sh /bin…" 2 hours ago Up 2 hours 6933-6935/tcp, [...] airbyte-temporal ``` + You must see 5 containers running. If you are not seeing execute the following steps: + * `docker-compose down -v` * `docker-compose up` -Keep in mind the commands above will delete ALL containers, volumes and data created by Airbyte. + + Keep in mind the commands above will delete ALL containers, volumes and data created by Airbyte. + We do not recommend this is you already deploy and have connection created. -First, let's check the server logs by running `docker logs airbyte-server | grep ERROR`.
-If this command returns any output, please run `docker logs airbyte-server > airbyte-server.log`.
-This command will create a file in the current directory. We advise you to send a message on our #issues on Slack channel +First, let's check the server logs by running `docker logs airbyte-server | grep ERROR`. + If this command returns any output, please run `docker logs airbyte-server > airbyte-server.log`. + This command will create a file in the current directory. We advise you to send a message in our \#issues Slack channel. -If you don't have any server errors let's check the scheduler, `docker logs airbyte-scheduler | grep ERROR`.
-If this command returns any output, please run `docker logs airbyte-scheduler > airbyte-scheduler.log`.
-This command will create a file in the current directory. We advise you to send a message on our #issues on Slack channel +If you don't have any server errors, let's check the scheduler: `docker logs airbyte-scheduler | grep ERROR`. + If this command returns any output, please run `docker logs airbyte-scheduler > airbyte-scheduler.log`. + This command will create a file in the current directory. We advise you to send a message in our \#issues Slack channel. -If there is no error printed in both cases, we recommend running: `docker restart airbyte-server airbyte-scheduler`
-Wait a few moments and try to access the interface again. +If there is no error printed in both cases, we recommend running: `docker restart airbyte-server airbyte-scheduler` + Wait a few moments and try to access the interface again. ## `docker.errors.DockerException`: Error while fetching server API version @@ -73,9 +80,9 @@ directory')) It usually means that Docker isn't running on your machine \(and a running Docker daemon is required to run Airbyte\). An easy way to verify this is to run `docker ps`, which will show `Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?` if the Docker daemon is not running on your machine. -This happens (sometimes) on Windows system when you first install `docker`. You need to restart your machine. - +This happens \(sometimes\) on Windows system when you first install `docker`. You need to restart your machine. ## Getting a weird error related to setting up the Airbyte server when running Docker Compose -- wondering if this is because I played around with Airbyte in a past version? If you are okay with losing your previous Airbyte configurations, you can run `docker-compose down -v` and that should fix things then `docker-compose up`. + diff --git a/docs/troubleshooting/on-upgrading.md b/docs/troubleshooting/on-upgrading.md index d3f5a12faa9..b1ba4dea74c 100644 --- a/docs/troubleshooting/on-upgrading.md +++ b/docs/troubleshooting/on-upgrading.md @@ -1 +1,2 @@ - +# On Upgrading + diff --git a/docs/troubleshooting/running-sync.md b/docs/troubleshooting/running-sync.md index 1dc4a40fcff..6429e006a2a 100644 --- a/docs/troubleshooting/running-sync.md +++ b/docs/troubleshooting/running-sync.md @@ -4,7 +4,7 @@ Several things to check: -* **Is Airbyte updated to your latest version?** You can see the latest version [here](https://github.com/airbytehq/airbyte/tags). If not, please upgrade to the latest one, [upgrading instructions are here](../operator-guides/upgrading-airbyte.md) +* **Is Airbyte updated to your latest version?** You can see the latest version [here](https://github.com/airbytehq/airbyte/tags). If not, please upgrade to the latest one, [upgrading instructions are here]() * **Is the connector that is failing updated to the latest version?** You can check the latest version available for the connectors [in the yamls here](https://github.com/airbytehq/airbyte/tree/master/airbyte-config/init/src/main/resources/seed). If you don't have the latest connector version, make sure you first update to the latest Airbyte version, and then go to the Admin section in the web app and put the right version in the cell for the connector. Then try again. If the above workaround does not fix your problem, please report it [here](https://github.com/airbytehq/airbyte/issues/1462) or in our [Slack](https://slack.airbyte.io). @@ -15,7 +15,7 @@ Our current version of incremental is [append](../understanding-airbyte/connecti If this is true, then, there are still several things to check: -* **Is Airbyte updated to your latest version?** You can see the latest version [here](https://github.com/airbytehq/airbyte/tags). If not, please upgrade to the latest one, [upgrading instructions are here](../operator-guides/upgrading-airbyte.md) +* **Is Airbyte updated to your latest version?** You can see the latest version [here](https://github.com/airbytehq/airbyte/tags). 
If not, please upgrade to the latest one, [upgrading instructions are here]() * **Is the connector that is failing updated to the latest version?** You can check the latest version available for the connectors [in the yamls here](https://github.com/airbytehq/airbyte/tree/master/airbyte-config/init/src/main/resources/seed). If you don't have the latest connector version, make sure you first update to the latest Airbyte version, and then go to the Admin section in the web app and put the right version in the cell for the connector. Then try again. If the above workaround does not fix your problem, please report it [here](https://github.com/airbytehq/airbyte/issues/1462) or in our [Slack](https://slack.airbyte.io). @@ -26,7 +26,8 @@ Several things to check: * What is the name of the table you are looking at in the destination? Let's make sure you're not looking at a temporary table. * **Is the basic normalization toggle set to true at the connection settings?** If it's false, you won't see columns but most probably a JSON file. So you need to switch it on true, and try again. -* **Is Airbyte updated to your latest version?** You can see the latest version [here](https://github.com/airbytehq/airbyte/tags). If not, please upgrade to the latest one, [upgrading instructions are here](../operator-guides/upgrading-airbyte.md) +* **Is Airbyte updated to your latest version?** You can see the latest version [here](https://github.com/airbytehq/airbyte/tags). If not, please upgrade to the latest one, [upgrading instructions are here]() * **Is the connector that is failing updated to the latest version?** You can check the latest version available for the connectors [in the yamls here](https://github.com/airbytehq/airbyte/tree/master/airbyte-config/init/src/main/resources/seed). If you don't have the latest connector version, make sure you first update to the latest Airbyte version, and then go to the Admin section in the web app and put the right version in the cell for the connector. Then try again. -If the above workaround does not fix your problem, please report it [here](https://github.com/airbytehq/airbyte/issues/1462) or in our [Slack](https://slack.airbyte.io). \ No newline at end of file +If the above workaround does not fix your problem, please report it [here](https://github.com/airbytehq/airbyte/issues/1462) or in our [Slack](https://slack.airbyte.io). + diff --git a/docs/understanding-airbyte/basic-normalization.md b/docs/understanding-airbyte/basic-normalization.md index 8372a03aaaf..3939e64e140 100644 --- a/docs/understanding-airbyte/basic-normalization.md +++ b/docs/understanding-airbyte/basic-normalization.md @@ -6,8 +6,7 @@ The high-level overview contains all the information you need to use Basic Normalization when pulling from APIs. Information past that can be read for advanced or educational purposes. {% endhint %} -When you run your first Airbyte sync without the basic normalization, you'll notice that your data gets written to your destination as one data column with a JSON blob that contains all of your data. This is the `_airbyte_raw_` table that you may have seen before. Why do we create this table? A core tenet of ELT philosophy is that data should be untouched as it moves through the E and L stages so that the raw data is always accessible. If an unmodified version of the -data exists in the destination, it can be retransformed without needing to sync data again. 
+When you run your first Airbyte sync without the basic normalization, you'll notice that your data gets written to your destination as one data column with a JSON blob that contains all of your data. This is the `_airbyte_raw_` table that you may have seen before. Why do we create this table? A core tenet of ELT philosophy is that data should be untouched as it moves through the E and L stages so that the raw data is always accessible. If an unmodified version of the data exists in the destination, it can be retransformed without needing to sync data again. If you have Basic Normalization enabled, Airbyte automatically uses this JSON blob to create a schema and tables with your data in mind, converting it to the format of your destination. This runs after your sync and may take a long time if you have a large amount of data synced. If you don't enable Basic Normalization, you'll have to transform the JSON data from that column yourself. @@ -47,7 +46,7 @@ Airbyte places the json blob version of your data in a table called `_airbyte_ra ## Why does Airbyte have Basic Normalization? -At its core, Airbyte is geared to handle the EL \(Extract Load\) steps of an ELT process. These steps can also be referred in Airbyte's dialect as "Source" and "Destination". +At its core, Airbyte is geared to handle the EL \(Extract Load\) steps of an ELT process. These steps can also be referred in Airbyte's dialect as "Source" and "Destination". However, this is actually producing a table in the destination with a JSON blob column... For the typical analytics use case, you probably want this json blob normalized so that each field is its own column. @@ -60,15 +59,16 @@ To summarize, we can represent the ELT process in the diagram below. These are s ![](../.gitbook/assets/connecting-EL-with-T-4.png) In Airbyte, the current normalization option is implemented using a dbt Transformer composed of: -- Airbyte base-normalization python package to generate dbt SQL models files -- dbt to compile and executes the models on top of the data in the destinations that supports it. + +* Airbyte base-normalization python package to generate dbt SQL models files +* dbt to compile and executes the models on top of the data in the destinations that supports it. ## Destinations that Support Basic Normalization * [BigQuery](../integrations/destinations/bigquery.md) * [MySQL](../integrations/destinations/mysql.md) * The server must support the `WITH` keyword. - * Require MySQL >= 8.0, or MariaDB >= 10.2.1. + * Require MySQL >= 8.0, or MariaDB >= 10.2.1. * [Postgres](../integrations/destinations/postgres.md) * [Snowflake](../integrations/destinations/snowflake.md) * [Redshift](../integrations/destinations/redshift.md) @@ -277,13 +277,13 @@ As an example from the hubspot source, we could have the following tables with n As mentioned in the overview: -- Airbyte places the json blob version of your data in a table called `_airbyte_raw_`. -- If basic normalization is turned on, it will place a separate copy of the data in a table called ``. -- In certain pathological cases, basic normalization is required to generate large models with many columns and multiple intermediate transformation steps for a stream. This may break down the "ephemeral" materialization strategy and require the use of additional intermediate views or tables instead. As a result, you may notice additional temporary tables being generated in the destination to handle these checkpoints. 
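For a concrete sense of what this automation saves you, here is a minimal, hedged SQL sketch of the manual alternative: flattening the raw JSON blob yourself. It assumes a Postgres destination, a hypothetical stream named `cards`, and the raw columns `_airbyte_data` and `_airbyte_emitted_at`; actual table and column names vary by destination and Airbyte version.

```sql
-- Hypothetical sketch: manually flatten the raw JSON blob that Airbyte lands in
-- _airbyte_raw_<stream>. Basic Normalization generates equivalent dbt models for you.
SELECT
    _airbyte_data ->> 'id'                 AS id,        -- extract a text field from the JSON blob
    (_airbyte_data ->> 'amount')::numeric  AS amount,    -- cast a field to a typed column
    _airbyte_emitted_at                    AS synced_at  -- when Airbyte emitted the record
FROM _airbyte_raw_cards;
```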
+* Airbyte places the json blob version of your data in a table called `_airbyte_raw_`. +* If basic normalization is turned on, it will place a separate copy of the data in a table called ``. +* In certain pathological cases, basic normalization is required to generate large models with many columns and multiple intermediate transformation steps for a stream. This may break down the "ephemeral" materialization strategy and require the use of additional intermediate views or tables instead. As a result, you may notice additional temporary tables being generated in the destination to handle these checkpoints. ## UI Configurations -To enable basic normalization (which is optional), you can toggle it on or disable it in the "Normalization and Transformation" section when setting up your connection: +To enable basic normalization \(which is optional\), you can toggle it on or disable it in the "Normalization and Transformation" section when setting up your connection: ![](../.gitbook/assets/basic-normalization-configuration.png) @@ -299,22 +299,21 @@ Note that all the choices made by Normalization as described in this documentati ### airbyte-integration/bases/base-normalization -Note that Basic Normalization is packaged in a docker image `airbyte/normalization`. -This image is tied to and released along with a specific Airbyte version. -It is not configurable independently like it is possible to do with connectors (source & destinations) +Note that Basic Normalization is packaged in a docker image `airbyte/normalization`. This image is tied to and released along with a specific Airbyte version. It is not configurable independently like it is possible to do with connectors \(source & destinations\) Therefore, in order to "upgrade" to the desired normalization version, you need to use the corresponding Airbyte version that it's being released in: | Airbyte Version | Normalization Version | Date | Pull Request | Subject | -| :--- | :--- | :--- | :--- | :--- | -| 0.30.16-alpha | 0.1.52 | 2021-10-07 | [#6379](https://github.com/airbytehq/airbyte/pull/6379) | Handle empty string for date and date-time format | -| 0.30.16-alpha | 0.1.51 | 2021-10-08 | [#6799](https://github.com/airbytehq/airbyte/pull/6799) | Added support for ad_cdc_log_pos while normalization | -| 0.30.16-alpha | 0.1.50 | 2021-10-07 | [#6079](https://github.com/airbytehq/airbyte/pull/6079) | Added support for MS SQL Server normalization | -| 0.30.16-alpha | 0.1.49 | 2021-10-06 | [#6709](https://github.com/airbytehq/airbyte/pull/6709) | Forward destination dataset location to dbt profiles | -| 0.29.17-alpha | 0.1.47 | 2021-09-20 | [#6317](https://github.com/airbytehq/airbyte/pull/6317) | MySQL: updated MySQL normalization with using SSH tunnel | -| 0.29.17-alpha | 0.1.45 | 2021-09-18 | [#6052](https://github.com/airbytehq/airbyte/pull/6052) | Snowflake: accept any date-time format | -| 0.29.8-alpha | 0.1.40 | 2021-08-18 | [#5433](https://github.com/airbytehq/airbyte/pull/5433) | Allow optional credentials_json for BigQuery | -| 0.29.5-alpha | 0.1.39 | 2021-08-11 | [#4557](https://github.com/airbytehq/airbyte/pull/4557) | Handle date times and solve conflict name btw stream/field | -| 0.28.2-alpha | 0.1.38 | 2021-07-28 | [#5027](https://github.com/airbytehq/airbyte/pull/5027) | Handle quotes in column names when parsing JSON blob | -| 0.27.5-alpha | 0.1.37 | 2021-07-22 | [#3947](https://github.com/airbytehq/airbyte/pull/4881/) | Handle `NULL` cursor field values when deduping | -| 0.27.2-alpha | 0.1.36 | 2021-07-09 | 
[#3947](https://github.com/airbytehq/airbyte/pull/4163/) | Enable normalization for MySQL destination | +| :--- | :--- | :--- | :--- | :--- | +| 0.30.16-alpha | 0.1.52 | 2021-10-07 | [\#6379](https://github.com/airbytehq/airbyte/pull/6379) | Handle empty string for date and date-time format | +| 0.30.16-alpha | 0.1.51 | 2021-10-08 | [\#6799](https://github.com/airbytehq/airbyte/pull/6799) | Added support for ad\_cdc\_log\_pos while normalization | +| 0.30.16-alpha | 0.1.50 | 2021-10-07 | [\#6079](https://github.com/airbytehq/airbyte/pull/6079) | Added support for MS SQL Server normalization | +| 0.30.16-alpha | 0.1.49 | 2021-10-06 | [\#6709](https://github.com/airbytehq/airbyte/pull/6709) | Forward destination dataset location to dbt profiles | +| 0.29.17-alpha | 0.1.47 | 2021-09-20 | [\#6317](https://github.com/airbytehq/airbyte/pull/6317) | MySQL: updated MySQL normalization with using SSH tunnel | +| 0.29.17-alpha | 0.1.45 | 2021-09-18 | [\#6052](https://github.com/airbytehq/airbyte/pull/6052) | Snowflake: accept any date-time format | +| 0.29.8-alpha | 0.1.40 | 2021-08-18 | [\#5433](https://github.com/airbytehq/airbyte/pull/5433) | Allow optional credentials\_json for BigQuery | +| 0.29.5-alpha | 0.1.39 | 2021-08-11 | [\#4557](https://github.com/airbytehq/airbyte/pull/4557) | Handle date times and solve conflict name btw stream/field | +| 0.28.2-alpha | 0.1.38 | 2021-07-28 | [\#5027](https://github.com/airbytehq/airbyte/pull/5027) | Handle quotes in column names when parsing JSON blob | +| 0.27.5-alpha | 0.1.37 | 2021-07-22 | [\#3947](https://github.com/airbytehq/airbyte/pull/4881/) | Handle `NULL` cursor field values when deduping | +| 0.27.2-alpha | 0.1.36 | 2021-07-09 | [\#3947](https://github.com/airbytehq/airbyte/pull/4163/) | Enable normalization for MySQL destination | + diff --git a/docs/understanding-airbyte/catalog.md b/docs/understanding-airbyte/catalog.md index 9841febb94b..0a4ed7cebec 100644 --- a/docs/understanding-airbyte/catalog.md +++ b/docs/understanding-airbyte/catalog.md @@ -1,4 +1,4 @@ -# AirbyteCatalog & ConfiguredAirbyteCatalog Reference +# AirbyteCatalog Reference ## Overview diff --git a/docs/understanding-airbyte/cdc.md b/docs/understanding-airbyte/cdc.md index 506ba61ab11..18367d899a7 100644 --- a/docs/understanding-airbyte/cdc.md +++ b/docs/understanding-airbyte/cdc.md @@ -14,8 +14,8 @@ The Airbyte Protocol outputs records from sources. 
Records from `UPDATE` stateme We add some metadata columns for CDC sources: -* `ab_cdc_lsn` (postgres and sql server sources) is the point in the log where the record was retrieved -* `ab_cdc_log_file` & `ab_cdc_log_pos` (specific to mysql source) is the file name and position in the file where the record was retrieved +* `ab_cdc_lsn` \(postgres and sql server sources\) is the point in the log where the record was retrieved +* `ab_cdc_log_file` & `ab_cdc_log_pos` \(specific to mysql source\) is the file name and position in the file where the record was retrieved * `ab_cdc_updated_at` is the timestamp for the database transaction that resulted in this record change and is present for records from `DELETE`/`INSERT`/`UPDATE` statements * `ab_cdc_deleted_at` is the timestamp for the database transaction that resulted in this record change and is only present for records from `DELETE` statements @@ -30,7 +30,7 @@ We add some metadata columns for CDC sources: ## Current Support -* [Postgres](../integrations/sources/postgres.md) (For a quick video overview of CDC on Postgres, click [here](https://www.youtube.com/watch?v=NMODvLgZvuE&ab_channel=Airbyte)) +* [Postgres](../integrations/sources/postgres.md) \(For a quick video overview of CDC on Postgres, click [here](https://www.youtube.com/watch?v=NMODvLgZvuE&ab_channel=Airbyte)\) * [MySQL](../integrations/sources/mysql.md) * [Microsoft SQL Server / MSSQL](../integrations/sources/mssql.md) diff --git a/docs/understanding-airbyte/connections/README.md b/docs/understanding-airbyte/connections/README.md index d7086466b73..f47189fcad7 100644 --- a/docs/understanding-airbyte/connections/README.md +++ b/docs/understanding-airbyte/connections/README.md @@ -5,8 +5,8 @@ A connection is a configuration for syncing data between a source and a destinat * Sync schedule: when to trigger a sync of the data. * Destination [Namespace](../namespaces.md) and stream names: where the data will end up being written. * A catalog selection: which [streams and fields](../catalog.md) to replicate from the source -* Sync mode: how streams should be replicated (read and write): -* Optional transformations: how to convert Airbyte protocol messages (raw JSON blob) data into some other data representations. +* Sync mode: how streams should be replicated \(read and write\): +* Optional transformations: how to convert Airbyte protocol messages \(raw JSON blob\) data into some other data representations. ## Sync schedules @@ -28,7 +28,7 @@ When a scheduled connection is first created, a sync is executed as soon as poss ## Destination namespace -The location of where a connection replication will store data is referenced as the destination namespace. The destination connectors should create and write records (for both raw and normalized tables) in the specified namespace which should be configurable in the UI via the Namespace Configuration field (or NamespaceDefinition in the API). You can read more about configuring namespaces [here](../namespaces.md). +The location of where a connection replication will store data is referenced as the destination namespace. The destination connectors should create and write records \(for both raw and normalized tables\) in the specified namespace which should be configurable in the UI via the Namespace Configuration field \(or NamespaceDefinition in the API\). You can read more about configuring namespaces [here](../namespaces.md). 
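Returning to the CDC metadata columns listed above, the sketch below shows one way the latest non-deleted state of each row could be reconstructed from CDC output. It is only an illustration: the `users_cdc_records` table name and the `id` primary key are assumptions, and the exact metadata column names depend on the source and Airbyte version.

```sql
-- Hedged illustration of using the CDC metadata columns: keep the most recent
-- change per primary key, and drop rows whose last recorded change was a DELETE.
WITH latest AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY id
               ORDER BY ab_cdc_updated_at DESC, ab_cdc_lsn DESC
           ) AS rn
    FROM users_cdc_records
)
SELECT *
FROM latest
WHERE rn = 1
  AND ab_cdc_deleted_at IS NULL;
```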
## Destination stream name @@ -44,20 +44,18 @@ All the customization of namespace and stream names described above will be equa A sync mode governs how Airbyte reads from a source and writes to a destination. Airbyte provides different sync modes to account for various use cases. To minimize confusion, a mode's behavior is reflected in its name. The easiest way to understand Airbyte's sync modes is to understand how the modes are named. -1. The first part of the name denotes how the source connector reads data from the source: - -* Incremental: Read records added to the source since the last sync job. (The first sync using Incremental is equivalent to a Full Refresh) - * Method 1: Using a cursor. Generally supported by all connectors whose data source allows extracting records incrementally. - * Method 2: Using change data capture. Only supported by some sources. See [CDC](../cdc.md) for more info. -* Full Refresh: Read everything in the source. - -2. The second part of the sync mode name denotes how the destination connector writes data. This is not affected by how the source connector produced the data: - -* Overwrite: Overwrite by first deleting existing data in the destination. -* Append: Write by adding data to existing tables in the destination. -* Deduped History: Write by first adding data to existing tables in the destination to keep a history of changes. The final table is produced by de-duplicating the intermediate ones using a primary key. +1. The first part of the name denotes how the source connector reads data from the source: +2. Incremental: Read records added to the source since the last sync job. \(The first sync using Incremental is equivalent to a Full Refresh\) + * Method 1: Using a cursor. Generally supported by all connectors whose data source allows extracting records incrementally. + * Method 2: Using change data capture. Only supported by some sources. See [CDC](../cdc.md) for more info. +3. Full Refresh: Read everything in the source. +4. The second part of the sync mode name denotes how the destination connector writes data. This is not affected by how the source connector produced the data: +5. Overwrite: Overwrite by first deleting existing data in the destination. +6. Append: Write by adding data to existing tables in the destination. +7. Deduped History: Write by first adding data to existing tables in the destination to keep a history of changes. The final table is produced by de-duplicating the intermediate ones using a primary key. A sync mode is therefore, a combination of a source and destination mode together. The UI exposes the following options, whenever both source and destination connectors are capable to support it for the corresponding stream: + * [Full Refresh Overwrite](full-refresh-overwrite.md): Sync the whole stream and replace data in destination by overwriting it. * [Full Refresh Append](full-refresh-append.md): Sync the whole stream and append data in destination. * [Incremental Append](incremental-append.md): Sync new records from stream and append data in destination. @@ -71,11 +69,11 @@ As described by the [Airbyte Protocol from the Airbyte Specifications](../airbyt On top of this replication, Airbyte provides the option to enable or disable an additional transformation step at the end of the sync called [basic normalization](../basic-normalization.md). This operation is: -- only available for destinations that support dbt execution. 
-- responsible for automatically generating a pipeline or a DAG of dbt transformation models to convert JSON blob objects into normalized tables. -- responsible for running and applying these dbt models to the data written in the destination. +* only available for destinations that support dbt execution. +* responsible for automatically generating a pipeline or a DAG of dbt transformation models to convert JSON blob objects into normalized tables. +* responsible for running and applying these dbt models to the data written in the destination. ### Custom sync operations -Further operations can be included in a sync on top of Airbyte basic normalization (or even to replace it completely). -See [operations](../operations.md) for more details. +Further operations can be included in a sync on top of Airbyte basic normalization \(or even to replace it completely\). See [operations](../operations.md) for more details. + diff --git a/docs/understanding-airbyte/connections/full-refresh-append.md b/docs/understanding-airbyte/connections/full-refresh-append.md index a6c0dba23fc..56fdbaab447 100644 --- a/docs/understanding-airbyte/connections/full-refresh-append.md +++ b/docs/understanding-airbyte/connections/full-refresh-append.md @@ -2,7 +2,7 @@ ## Overview -The **Full Refresh** modes are the simplest methods that Airbyte uses to sync data, as they always retrieve all available data requested from the source, regardless of whether it has been synced before. This contrasts with [**Incremental sync**](./incremental-append.md), which does not sync data that has already been synced before. +The **Full Refresh** modes are the simplest methods that Airbyte uses to sync data, as they always retrieve all available data requested from the source, regardless of whether it has been synced before. This contrasts with [**Incremental sync**](incremental-append.md), which does not sync data that has already been synced before. In the **Append** variant, new syncs will take all data from the sync and append it to the destination table. Therefore, if syncing similar information multiple times, every sync will create duplicates of already existing data. diff --git a/docs/understanding-airbyte/connections/full-refresh-overwrite.md b/docs/understanding-airbyte/connections/full-refresh-overwrite.md index 62a30821847..f5c962da8ce 100644 --- a/docs/understanding-airbyte/connections/full-refresh-overwrite.md +++ b/docs/understanding-airbyte/connections/full-refresh-overwrite.md @@ -2,7 +2,7 @@ ## Overview -The **Full Refresh** modes are the simplest methods that Airbyte uses to sync data, as they always retrieve all available information requested from the source, regardless of whether it has been synced before. This contrasts with [**Incremental sync**](./incremental-append.md), which does not sync data that has already been synced before. +The **Full Refresh** modes are the simplest methods that Airbyte uses to sync data, as they always retrieve all available information requested from the source, regardless of whether it has been synced before. This contrasts with [**Incremental sync**](incremental-append.md), which does not sync data that has already been synced before. In the **Overwrite** variant, new syncs will destroy all data in the existing destination table and then pull the new data in. Therefore, data that has been removed from the source after an old sync will be deleted in the destination table. 
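To make the difference between the two Full Refresh variants above more tangible, here is a hedged SQL sketch of their observable effect on a destination table. This is not how Airbyte's connectors are literally implemented, and the `public.customers` and `staging.customers_new_sync` names are purely illustrative.

```sql
-- Full Refresh | Overwrite: existing rows are removed, then the new snapshot is loaded.
TRUNCATE TABLE public.customers;
INSERT INTO public.customers
SELECT * FROM staging.customers_new_sync;

-- Full Refresh | Append: the new snapshot is added on top of the existing rows,
-- so re-syncing unchanged data produces duplicate records.
INSERT INTO public.customers
SELECT * FROM staging.customers_new_sync;
```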
diff --git a/docs/understanding-airbyte/connections/incremental-append.md b/docs/understanding-airbyte/connections/incremental-append.md index ca588712986..a097e3f054c 100644 --- a/docs/understanding-airbyte/connections/incremental-append.md +++ b/docs/understanding-airbyte/connections/incremental-append.md @@ -22,20 +22,20 @@ As mentioned above, the delta from a sync will be _appended_ to the existing dat Assume that `updated_at` is our `cursor_field`. Let's say the following data already exists into our data warehouse. -| name | deceased | updated_at | +| name | deceased | updated\_at | | :--- | :--- | :--- | | Louis XVI | false | 1754 | | Marie Antoinette | false | 1755 | In the next sync, the delta contains the following record: -| name | deceased | updated_at | +| name | deceased | updated\_at | | :--- | :--- | :--- | | Louis XVII | false | 1785 | At the end of this incremental sync, the data warehouse would now contain: -| name | deceased | updated_at | +| name | deceased | updated\_at | | :--- | :--- | :--- | | Louis XVI | false | 1754 | | Marie Antoinette | false | 1755 | @@ -45,14 +45,14 @@ At the end of this incremental sync, the data warehouse would now contain: Let's assume that our warehouse contains all the data that it did at the end of the previous section. Now, unfortunately the king and queen lose their heads. Let's see that delta: -| name | deceased | updated_at | +| name | deceased | updated\_at | | :--- | :--- | :--- | | Louis XVI | true | 1793 | | Marie Antoinette | true | 1793 | The output we expect to see in the warehouse is as follows: -| name | deceased | updated_at | +| name | deceased | updated\_at | | :--- | :--- | :--- | | Louis XVI | false | 1754 | | Marie Antoinette | false | 1755 | @@ -90,7 +90,7 @@ If you only care about having the latest snapshot of your data, you may want to When replicating data incrementally, Airbyte provides an at-least-once delivery guarantee. This means that it is acceptable for sources to re-send some data when ran incrementally. One case where this is particularly relevant is when a source's cursor is not very granular. For example, if a cursor field has the granularity of a day \(but not hours, seconds, etc\), then if that source is run twice in the same day, there is no way for the source to know which records that are that date were already replicated earlier that day. By convention, sources should prefer resending data if the cursor field is ambiguous. -Additionally, you may run into behavior where you see the same row being emitted during each sync. This will occur if your data has not changed and you attempt to run additional syncs, as the cursor field will always be greater than or equal to itself, causing it to pull the latest row multiple times until there is new data at the source. +Additionally, you may run into behavior where you see the same row being emitted during each sync. This will occur if your data has not changed and you attempt to run additional syncs, as the cursor field will always be greater than or equal to itself, causing it to pull the latest row multiple times until there is new data at the source. ## Known Limitations @@ -102,29 +102,29 @@ SELECT * FROM table WHERE cursor_field >= 'last_sync_max_cursor_field_value' Let's say the following data already exists into our data warehouse. 
-| name | deceased | updated_at | +| name | deceased | updated\_at | | :--- | :--- | :--- | | Louis XVI | false | 1754 | | Marie Antoinette | false | 1755 | At the start of the next sync, the source data contains the following new record: -| name | deceased | updated_at | +| name | deceased | updated\_at | | :--- | :--- | :--- | | Louis XVI | true | 1754 | At the end of the second incremental sync, the data warehouse would still contain data from the first sync because the delta record did not provide a valid value for the cursor field \(the cursor field is not greater than last sync's max value, `1754 < 1755`\), so it is not emitted by the source as a new or modified record. -| name | deceased | updated_at | +| name | deceased | updated\_at | | :--- | :--- | :--- | | Louis XVI | false | 1754 | | Marie Antoinette | false | 1755 | Similarly, if multiple modifications are made during the same day to the same records. If the frequency of the sync is not granular enough \(for example, set for every 24h\), then intermediate modifications to the data are not going to be detected and emitted. Only the state of data at the time the sync runs will be reflected in the destination. -Those concerns could be solved by using a different incremental approach based on binary logs, Write-Ahead-Logs \(WAL\), or also called [Change Data Capture (CDC)](../cdc.md). +Those concerns could be solved by using a different incremental approach based on binary logs, Write-Ahead-Logs \(WAL\), or also called [Change Data Capture \(CDC\)](../cdc.md). The current behavior of **Incremental** is not able to handle source schema changes yet, for example, when a column is added, renamed or deleted from an existing table etc. It is recommended to trigger a [Full refresh - Overwrite](full-refresh-overwrite.md) to correctly replicate the data to the destination with the new schema changes. -If you are not satisfied with how transformations are applied on top of the appended data, you can find more relevant SQL transformations you might need to do on your data in the [Connecting EL with T using SQL \(part 1/2\)]() +If you are not satisfied with how transformations are applied on top of the appended data, you can find more relevant SQL transformations you might need to do on your data in the [Connecting EL with T using SQL \(part 1/2\)](incremental-append.md) diff --git a/docs/understanding-airbyte/connections/incremental-deduped-history.md b/docs/understanding-airbyte/connections/incremental-deduped-history.md index f17fdba1f15..bcf182cb1b6 100644 --- a/docs/understanding-airbyte/connections/incremental-deduped-history.md +++ b/docs/understanding-airbyte/connections/incremental-deduped-history.md @@ -30,20 +30,20 @@ As mentioned above, the delta from a sync will be _appended_ to the existing his Assume that `updated_at` is our `cursor_field` and `name` is the `primary_key`. Let's say the following data already exists into our data warehouse. 
-| name | deceased | updated_at | +| name | deceased | updated\_at | | :--- | :--- | :--- | | Louis XVI | false | 1754 | | Marie Antoinette | false | 1755 | In the next sync, the delta contains the following record: -| name | deceased | updated_at | +| name | deceased | updated\_at | | :--- | :--- | :--- | | Louis XVII | false | 1785 | At the end of this incremental sync, the data warehouse would now contain: -| name | deceased | updated_at | +| name | deceased | updated\_at | | :--- | :--- | :--- | | Louis XVI | false | 1754 | | Marie Antoinette | false | 1755 | @@ -53,7 +53,7 @@ At the end of this incremental sync, the data warehouse would now contain: Let's assume that our warehouse contains all the data that it did at the end of the previous section. Now, unfortunately the king and queen lose their heads. Let's see that delta: -| name | deceased | updated_at | +| name | deceased | updated\_at | | :--- | :--- | :--- | | Louis XVI | true | 1793 | | Marie Antoinette | true | 1793 | @@ -62,7 +62,7 @@ The output we expect to see in the warehouse is as follows: In the history table: -| name | deceased | updated_at | start_at | end_at | +| name | deceased | updated\_at | start\_at | end\_at | | :--- | :--- | :--- | :--- | :--- | | Louis XVI | false | 1754 | 1754 | 1793 | | Louis XVI | true | 1793 | 1793 | NULL | @@ -72,7 +72,7 @@ In the history table: In the final de-duplicated table: -| name | deceased | updated_at | +| name | deceased | updated\_at | | :--- | :--- | :--- | | Louis XVI | true | 1793 | | Louis XVII | false | 1785 | @@ -122,33 +122,31 @@ select * from table where cursor_field > 'last_sync_max_cursor_field_value' Let's say the following data already exists into our data warehouse. -| name | deceased | updated_at | +| name | deceased | updated\_at | | :--- | :--- | :--- | | Louis XVI | false | 1754 | | Marie Antoinette | false | 1755 | At the start of the next sync, the source data contains the following new record: -| name | deceased | updated_at | +| name | deceased | updated\_at | | :--- | :--- | :--- | | Louis XVI | true | 1754 | At the end of the second incremental sync, the data warehouse would still contain data from the first sync because the delta record did not provide a valid value for the cursor field \(the cursor field is not greater than last sync's max value, `1754 < 1755`\), so it is not emitted by the source as a new or modified record. -| name | deceased | updated_at | +| name | deceased | updated\_at | | :--- | :--- | :--- | | Louis XVI | false | 1754 | | Marie Antoinette | false | 1755 | Similarly, if multiple modifications are made during the same day to the same records. If the frequency of the sync is not granular enough \(for example, set for every 24h\), then intermediate modifications to the data are not going to be detected and emitted. Only the state of data at the time the sync runs will be reflected in the destination. -Those concerns could be solved by using a different incremental approach based on binary logs, Write-Ahead-Logs \(WAL\), or also called [Change Data Capture (CDC)](../cdc.md). +Those concerns could be solved by using a different incremental approach based on binary logs, Write-Ahead-Logs \(WAL\), or also called [Change Data Capture \(CDC\)](../cdc.md). The current behavior of **Incremental** is not able to handle source schema changes yet, for example, when a column is added, renamed or deleted from an existing table etc. 
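To connect the history and de-duplicated example tables above, here is a hedged SQL sketch of the de-duplication step itself: keep one row per primary key, choosing the record with the highest cursor value. Airbyte generates equivalent logic as dbt models during normalization; the `history_table` name is illustrative, while `name` and `updated_at` are the primary key and cursor field from the example.

```sql
-- Hedged sketch of producing the final de-duplicated table from the appended history:
-- one row per primary key (name), keeping the latest cursor value (updated_at).
SELECT name, deceased, updated_at
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY name
               ORDER BY updated_at DESC
           ) AS rn
    FROM history_table
) AS ranked
WHERE rn = 1;
```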
It is recommended to trigger a [Full refresh - Overwrite](full-refresh-overwrite.md) to correctly replicate the data to the destination with the new schema changes. -Additionally, this sync mode is only supported for destinations where dbt/normalization is possible for the moment. -The de-duplicating logic is indeed implemented as dbt models as part of a sequence of transformations applied after the Extract and Load activities (thus, an ELT approach). -Nevertheless, it is theoretically possible that destinations can handle directly this logic (maybe in the future) before actually writing records to the destination (as in traditional ETL manner), but that's not the way it is implemented at this time. +Additionally, this sync mode is only supported for destinations where dbt/normalization is possible for the moment. The de-duplicating logic is indeed implemented as dbt models as part of a sequence of transformations applied after the Extract and Load activities \(thus, an ELT approach\). Nevertheless, it is theoretically possible that destinations can handle directly this logic \(maybe in the future\) before actually writing records to the destination \(as in traditional ETL manner\), but that's not the way it is implemented at this time. If you are not satisfied with how transformations are applied on top of the appended data, you can find more relevant SQL transformations you might need to do on your data in the [Connecting EL with T using SQL \(part 1/2\)](../../operator-guides/transformation-and-normalization/transformations-with-sql.md) diff --git a/docs/understanding-airbyte/glossary.md b/docs/understanding-airbyte/glossary.md index d07f2e3e515..e6913602595 100644 --- a/docs/understanding-airbyte/glossary.md +++ b/docs/understanding-airbyte/glossary.md @@ -2,21 +2,21 @@ ### Airbyte CDK -The Airbyte CDK (Connector Development Kit) allows you to create connectors for Sources or Destinations. If your source or destination doesn't exist, you can use the CDK to make the building process a lot easier. It generates all the tests and files you need and all you need to do is write the connector-specific code for your source or destination. We created one in Python which you can check out [here](../connector-development/cdk-python) and the Faros AI team created a Javascript/Typescript one that you can check out [here](../connector-development/cdk-faros-js). +The Airbyte CDK \(Connector Development Kit\) allows you to create connectors for Sources or Destinations. If your source or destination doesn't exist, you can use the CDK to make the building process a lot easier. It generates all the tests and files you need and all you need to do is write the connector-specific code for your source or destination. We created one in Python which you can check out [here](../connector-development/cdk-python/) and the Faros AI team created a Javascript/Typescript one that you can check out [here](../connector-development/cdk-faros-js.md). ### DAG -DAG stands for **Directed Acyclic Graph**. It's a term originally coined by math graph theorists that describes a tree-like process that cannot contain loops. For example, in the following diagram, you start at A and can choose B or C, which then proceed to D and E, respectively. This kind of structure is great for representing workflows and is what tools like [Airflow](https://airflow.apache.org/) use to orchestrate the execution of software based on different cases or states. 
-![](../.gitbook/assets/glossary_dag_example.png) +DAG stands for **Directed Acyclic Graph**. It's a term originally coined by math graph theorists that describes a tree-like process that cannot contain loops. For example, in the following diagram, you start at A and can choose B or C, which then proceed to D and E, respectively. This kind of structure is great for representing workflows and is what tools like [Airflow](https://airflow.apache.org/) use to orchestrate the execution of software based on different cases or states. ![](../.gitbook/assets/glossary_dag_example.png) ### ETL/ELT + Stands for **E**xtract, **T**ransform, and **L**oad and **E**xtract, **L**oad, and **T**ransform, respectively. -**Extract**: Retrieve data from a [source](../integrations/sources), which can be an application, database, anything really. +**Extract**: Retrieve data from a [source](../integrations/sources/), which can be an application, database, anything really. -**Load**: Move data to your [destination](../integrations/destinations). +**Load**: Move data to your [destination](../integrations/destinations/). -**Transform**: Clean up the data. This is referred to as [normalization](./basic-normalization.md) in Airbyte and involves [deduplication](./connections/incremental-deduped-history.md), changing data types, formats, and more. +**Transform**: Clean up the data. This is referred to as [normalization](basic-normalization.md) in Airbyte and involves [deduplication](connections/incremental-deduped-history.md), changing data types, formats, and more. ### Full Refresh Sync @@ -33,22 +33,26 @@ Airbyte spits out tables with the prefix `_airbyte_raw_`. This is your replicate ## Advanced Terms ### AirbyteCatalog + {% hint style="info" %} This is only relevant for individuals who want to create a connector. {% endhint %} -This refers to how you define the data that you can retrieve from a Source. For example, if you want to retrieve information from an API, the data that you can receive needs to be defined clearly so that Airbyte can have a clear expectation of what endpoints are supported and what the objects that the streams return look like. This is represented as a sort of schema that Airbyte can interpret. Learn more [here](./beginners-guide-to-catalog.md). +This refers to how you define the data that you can retrieve from a Source. For example, if you want to retrieve information from an API, the data that you can receive needs to be defined clearly so that Airbyte can have a clear expectation of what endpoints are supported and what the objects that the streams return look like. This is represented as a sort of schema that Airbyte can interpret. Learn more [here](beginners-guide-to-catalog.md). ### Airbyte Specification + {% hint style="info" %} This is only relevant for individuals who want to create a connector. {% endhint %} -This refers to the functions that a Source or Destination must implement to successfully retrieve data and load it, respectively. Implementing these functions using the Airbyte Specification makes a Source or Destination work correctly. Learn more [here](./airbyte-specification.md). +This refers to the functions that a Source or Destination must implement to successfully retrieve data and load it, respectively. Implementing these functions using the Airbyte Specification makes a Source or Destination work correctly. Learn more [here](airbyte-specification.md). 
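To give a feel for what those functions amount to in practice, here is a deliberately simplified Python sketch. It is not the real Airbyte protocol messages or the CDK classes: only the four command names and the rough shape of a record message are borrowed from the specification, and the schema and data below are made up for illustration.

```python
import json
import sys
import time

# Toy "source" that mirrors the shape of the four commands a real source implements:
# spec, check, discover, and read. Everything below is hard-coded for illustration.

def spec():
    return {"connectionSpecification": {"type": "object", "properties": {"api_key": {"type": "string"}}}}

def check(config):
    return {"status": "SUCCEEDED" if config.get("api_key") else "FAILED"}

def discover(config):
    return {"streams": [{"name": "users", "json_schema": {"type": "object"}}]}

def read(config, catalog):
    for row in [{"id": 1}, {"id": 2}]:  # stand-in for real API or database calls
        yield {"type": "RECORD", "record": {"stream": "users", "data": row, "emitted_at": int(time.time() * 1000)}}

if __name__ == "__main__":
    command = sys.argv[1] if len(sys.argv) > 1 else "spec"
    config, catalog = {"api_key": "fake"}, {}
    if command == "read":
        for message in read(config, catalog):
            print(json.dumps(message))
    elif command == "check":
        print(json.dumps(check(config)))
    elif command == "discover":
        print(json.dumps(discover(config)))
    else:
        print(json.dumps(spec()))
```

Because connectors only need to honor this kind of command and JSON-message contract, they can be written in any language.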
### Temporal + {% hint style="info" %} This is only relevant for individuals who want to learn about or contribute to our underlying platform. {% endhint %} -[Temporal](https://temporal.io/) is a development kit that lets you create workflows, parallelize them, and handle failures/retries gracefully. We use it to reliably schedule each step of the ELT process, and a Temporal service is always deployed with each Airbyte installation. \ No newline at end of file +[Temporal](https://temporal.io/) is a development kit that lets you create workflows, parallelize them, and handle failures/retries gracefully. We use it to reliably schedule each step of the ELT process, and a Temporal service is always deployed with each Airbyte installation. + diff --git a/docs/understanding-airbyte/namespaces.md b/docs/understanding-airbyte/namespaces.md index 061c2b4b294..9c2ff5cd64b 100644 --- a/docs/understanding-airbyte/namespaces.md +++ b/docs/understanding-airbyte/namespaces.md @@ -8,7 +8,7 @@ The high-level overview contains all the information you need to use Namespaces When looking through our connector docs, you'll notice that some sources and destinations support "Namespaces." These allow you to organize and separate your data into groups in the destination if the destination supports it. For example, in a database, a namespace could be a schema in the database. If your desired destination doesn't support it, you can ignore this feature. -Note that this is the location that both your normalized and raw data will get written to. Your raw data will show up with the prefix `_airbyte_raw_` in the namespace you define. If you don't enable basic normalization, you will only receive the raw tables. +Note that this is the location that both your normalized and raw data will get written to. Your raw data will show up with the prefix `_airbyte_raw_` in the namespace you define. If you don't enable basic normalization, you will only receive the raw tables. If only your destination supports namespaces, you have two simple options. **This is the most likely case**, as all HTTP APIs currently don't support Namespaces. @@ -33,9 +33,7 @@ If the Destination does not support namespaces, the [namespace field](https://gi ## Destination namespace configuration -As part of the [connections sync settings](connections/README.md), it is possible to configure the namespace used by: -1. destination connectors: to store the `_airbyte_raw_*` tables. -2. basic normalization: to store the final normalized tables. +As part of the [connections sync settings](connections/), it is possible to configure the namespace used by: 1. destination connectors: to store the `_airbyte_raw_*` tables. 2. basic normalization: to store the final normalized tables. Note that custom transformation outputs are not affected by the namespace settings from Airbyte: It is up to the configuration of the custom dbt project, and how it is written to handle its [custom schemas](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/using-custom-schemas). The default target schema for dbt in this case, will always be the destination namespace. @@ -43,13 +41,11 @@ Available options for namespace configurations are: ### - Mirror source structure -Some sources (such as databases based on JDBC for example) are providing namespace informations from which a stream has been extracted from. Whenever a source is able to fill this field in the catalog.json file, the destination will try to reproduce exactly the same namespace when this configuraton is set. 
-For sources or streams where the source namespace is not known, the behavior will fall back to the "Destination Connector settings". +Some sources \(such as databases based on JDBC, for example\) provide namespace information indicating which namespace a stream has been extracted from. Whenever a source is able to fill this field in the catalog.json file, the destination will try to reproduce exactly the same namespace when this configuration is set. For sources or streams where the source namespace is not known, the behavior will fall back to the "Destination Connector settings". ### - Destination connector settings -All stream will be replicated and store in the default namespace defined on the destination settings page. -In the destinations, namespace refers to: +All streams will be replicated and stored in the default namespace defined on the destination settings page. In the destinations, namespace refers to: | Destination Connector | Namespace setting | | :--- | :--- | @@ -66,35 +62,34 @@ In the destinations, namespace refers to: When replicating multiple sources into the same destination, conflicts on tables being overwritten by syncs can occur. -For example, a Github source can be replicated into a "github" schema. -But if we have multiple connections to different GitHub repositories (similar in multi-tenant scenarios): +For example, a GitHub source can be replicated into a "github" schema. But if we have multiple connections to different GitHub repositories \(as in multi-tenant scenarios\): -- we'd probably wish to keep the same table names (to keep consistent queries downstream) -- but store them in different namespaces (to avoid mixing data from different "tenants") +* we'd probably wish to keep the same table names \(to keep consistent queries downstream\) +* but store them in different namespaces \(to avoid mixing data from different "tenants"\) To solve this, we can either: -- use a specific namespace for each connection, thus this option of custom format. -- or, use prefix to stream names as described below. +* use a specific namespace for each connection, hence this custom format option. +* or, add a prefix to stream names as described below. Note that we can use a template format string using variables that will be resolved during replication as follow: -- `${SOURCE_NAMESPACE}`: will be replaced by the namespace provided by the source if available +* `${SOURCE_NAMESPACE}`: will be replaced by the namespace provided by the source if available ### Examples -The following table summarises how this works. We assume an example of replication configurations between a Postgres Source and Snowflake Destination (with settings of schema = "my_schema"): +The following table summarises how this works.
We assume an example of replication configurations between a Postgres Source and Snowflake Destination \(with settings of schema = "my\_schema"\): | Namespace Configuration | Source Namespace | Source Table Name | Destination Namespace | Destination Table Name | | :--- | :--- | :--- | :--- | :--- | -| Mirror source structure | public | my_table | public | my_table | -| Mirror source structure | | my_table | my_schema | my_table | -| Destination connector settings | public | my_table | my_schema | my_table | -| Destination connector settings | | my_table | my_schema | my_table | -| Custom format = "custom" | public | my_table | custom | my_table | -| Custom format = "${SOURCE_NAMESPACE}" | public | my_table | public | my_table | -| Custom format = "my_${SOURCE_NAMESPACE}_schema" | public | my_table | my_public_schema | my_table | -| Custom format = " " | public | my_table | my_schema | my_table | +| Mirror source structure | public | my\_table | public | my\_table | +| Mirror source structure | | my\_table | my\_schema | my\_table | +| Destination connector settings | public | my\_table | my\_schema | my\_table | +| Destination connector settings | | my\_table | my\_schema | my\_table | +| Custom format = "custom" | public | my\_table | custom | my\_table | +| Custom format = "${SOURCE\_NAMESPACE}" | public | my\_table | public | my\_table | +| Custom format = "my\_${SOURCE\_NAMESPACE}\_schema" | public | my\_table | my\_public\_schema | my\_table | +| Custom format = " " | public | my\_table | my\_schema | my\_table | ## Requirements @@ -122,3 +117,4 @@ The following table summarises how this works. We assume an example of replicati * Redshift * Snowflake * S3 + diff --git a/docs/understanding-airbyte/operations.md b/docs/understanding-airbyte/operations.md index e5cfb691d8e..0e8cb909765 100644 --- a/docs/understanding-airbyte/operations.md +++ b/docs/understanding-airbyte/operations.md @@ -1,39 +1,43 @@ -# Sync Operations +# Operations -Airbyte [connections](connections/README.md) support configuring additional transformations that execute after the sync. Useful applications could be: +Airbyte [connections](connections/) support configuring additional transformations that execute after the sync. Useful applications could be: -- Customized normalization to better fit the requirements of your own business context. -- Business transformations from a technical data representation into a more logical and business oriented data structure. This can facilitate usage by end-users, non-technical operators, and executives looking to generate Business Intelligence dashboards and reports. -- Data Quality, performance optimization, alerting and monitoring, etc. -- Integration with other tools from your data stack (orchestration, data visualization, etc.) +* Customized normalization to better fit the requirements of your own business context. +* Business transformations from a technical data representation into a more logical and business oriented data structure. This can facilitate usage by end-users, non-technical operators, and executives looking to generate Business Intelligence dashboards and reports. +* Data Quality, performance optimization, alerting and monitoring, etc. +* Integration with other tools from your data stack \(orchestration, data visualization, etc.\) ## Supported Operations ### dbt transformations #### - git repository url: -A url to a git repository to (shallow) clone the latest dbt project code from. 
+ +A url to a git repository to \(shallow\) clone the latest dbt project code from. The project versioned in the repository is expected to: -- be a valid dbt package with a `dbt_project.yml` file at its root. -- have a `dbt_project.yml` with a "profile" name declared as described [here](https://docs.getdbt.com/dbt-cli/configure-your-profile). +* be a valid dbt package with a `dbt_project.yml` file at its root. +* have a `dbt_project.yml` with a "profile" name declared as described [here](https://docs.getdbt.com/dbt-cli/configure-your-profile). When using the dbt CLI, dbt checks your `profiles.yml` file for a profile with the same name. A profile contains all the details required to connect to your data warehouse. This file generally lives outside of your dbt project to avoid sensitive credentials being checked in to version control. Therefore, a `profiles.yml` will be generated according to the configured destination from the Airbyte UI. Note that if you prefer to use your own `profiles.yml` stored in the git repository or in the Docker image, then you can specify an override with `--profiles-dir=` in the dbt CLI arguments. -#### - git repository branch (optional): +#### - git repository branch \(optional\): + The name of the branch to use when cloning the git repository. If left empty, git will use the default branch of your repository. #### - docker image: + A Docker image and tag to run dbt commands from. The Docker image should have `/bin/bash` and `dbt` installed for this operation type to work. A typical value for this field would be for example: `fishtownanalytics/dbt:0.19.1` from [dbt dockerhub](https://hub.docker.com/r/fishtownanalytics/dbt/tags?page=1&ordering=last_updated). -This field lets you configure the version of dbt that your custom dbt project requires and the loading of additional software and packages necessary for your transformations (other than your dbt `packages.yml` file). +This field lets you configure the version of dbt that your custom dbt project requires and the loading of additional software and packages necessary for your transformations \(other than your dbt `packages.yml` file\). #### - dbt cli arguments + This operation type is aimed at running the dbt cli. A typical value for this field would be "run" and the actual command invoked would as a result be: `dbt run` in the docker container. @@ -49,3 +53,4 @@ One thing to consider is that dbt allows for vast configuration of the run comma ## Going Further In the meantime, please feel free to react, comment, and share your thoughts/use cases with us. We would be glad to hear your feedback and ideas as they will help shape the next set of features and our roadmap for the future. You can head to our GitHub and participate in the corresponding issue or discussions. Thank you! + diff --git a/docs/understanding-airbyte/tech-stack.md b/docs/understanding-airbyte/tech-stack.md index 1372dd1f622..9c18d894d44 100644 --- a/docs/understanding-airbyte/tech-stack.md +++ b/docs/understanding-airbyte/tech-stack.md @@ -31,18 +31,19 @@ Connectors can be written in any language. However the most common languages are ## FAQ -#### *Why do we write most destination/database connectors in Java?* +### _Why do we write most destination/database connectors in Java?_ JDBC makes writing reusable database connector frameworks fairly easy, saving us a lot of development time. 
-#### *Why are most REST API connectors written in Python?* +### _Why are most REST API connectors written in Python?_ -Most contributors felt comfortable writing in Python, so we created a [Python CDK](../connector-development/cdk-python) to accelerate this development. You can write a connector from scratch in any language as long as it follows the [Airbyte Specification](./airbyte-specification.md). +Most contributors felt comfortable writing in Python, so we created a [Python CDK](../connector-development/cdk-python/) to accelerate this development. You can write a connector from scratch in any language as long as it follows the [Airbyte Specification](airbyte-specification.md). -#### *Why did we choose to build the server with Java?* +### _Why did we choose to build the server with Java?_ Simply put, the team has more experience writing production Java code. -#### *Why do we use [Temporal](https://temporal.io) for orchestration?* +### _Why do we use_ [_Temporal_](https://temporal.io) _for orchestration?_ + +Temporal solves the two major hurdles that exist in orchestrating hundreds to thousands of jobs simultaneously: scaling state management and proper queue management. It does so by offering primitives that allow serialising a job's current runtime memory into a DB. Since a job's entire state is stored, it's trivial to recover from failures, and it's easy to determine if a job was assigned correctly. -Temporal solves the two major hurdles that exist in orchestrating hundreds to thousands of jobs simultaneously: scaling state management and proper queue management. Temporal solves this by offering primitives that allow serialising the jobs' current runtime memory into a DB. Since a job's entire state is stored, it's trivial to recover from failures, and it's easy to determine if a job was assigned correctly. \ No newline at end of file
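To illustrate only the underlying idea \(checkpointing a job's state so that work can resume after a failure\), here is a toy Python sketch. It is not Temporal's API: Temporal provides durable, scalable primitives for this, whereas the snippet below just mimics the concept with a local JSON file.

```python
import json
import pathlib

STATE_FILE = pathlib.Path("job_state.json")  # stand-in for the database a real system would persist state to

def load_state():
    return json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {"completed_steps": []}

def save_state(state):
    STATE_FILE.write_text(json.dumps(state))

def run_job(steps):
    state = load_state()
    for name, step in steps:
        if name in state["completed_steps"]:
            continue  # finished before a previous crash, so skip it on resume
        step()
        state["completed_steps"].append(name)
        save_state(state)  # persist progress after every step

run_job([
    ("extract", lambda: print("extracting...")),
    ("load", lambda: print("loading...")),
    ("normalize", lambda: print("normalizing...")),
])
```

Temporal generalises this pattern \(along with queueing, retries, and scheduling\) so that each step of the ELT process can run reliably.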