
Documentation Reorganization (#3124)

* First reorganization pass.

* Stop auto-generating api docs html file.

* Update spelling

* Final cleanup.

* Final changes_REAL_actual_2_thisone

* fix path for generating api docs

Co-authored-by: Abhi Vaidyanatha <abhivaidyanatha@Abhis-MacBook-Pro.local>
Co-authored-by: Davin Chia <davinchia@gmail.com>
Co-authored-by: jrhizor <me@jaredrhizor.com>
Authored by Abhi Vaidyanatha on 2021-04-29 15:18:52 -07:00; committed by GitHub
parent 34433d972c | commit e378d40236
63 changed files with 237 additions and 147 deletions

View File

@@ -7,7 +7,7 @@
**Data integration made simple, secure and extensible.**
The new open-source standard to sync data from applications, APIs & databases to warehouses, lakes & other destinations.
[![](docs/.gitbook/assets/deploy-locally.svg)](docs/deploying-airbyte/on-your-workstation.md) [![](docs/.gitbook/assets/deploy-on-aws.svg)](docs/deploying-airbyte/on-aws-ec2.md) [![](docs/.gitbook/assets/deploy-on-gcp.svg)](docs/deploying-airbyte/on-gcp-compute-engine.md)
[![](docs/.gitbook/assets/deploy-locally.svg)](docs/deploying-airbyte/local-deployment.md) [![](docs/.gitbook/assets/deploy-on-aws.svg)](docs/deploying-airbyte/on-aws-ec2.md) [![](docs/.gitbook/assets/deploy-on-gcp.svg)](docs/deploying-airbyte/on-gcp-compute-engine.md)
![](docs/.gitbook/assets/airbyte-ui-for-your-integration-pipelines.png)
@@ -32,7 +32,7 @@ docker-compose up
Now visit [http://localhost:8000](http://localhost:8000)
Here is a [step-by-step guide](docs/getting-started.md) showing you how to load data from an API into a file, all on your computer.
Here is a [step-by-step guide](docs/quickstart/getting-started.md) showing you how to load data from an API into a file, all on your computer.
## Features
@@ -67,9 +67,9 @@ For general help using Airbyte, please refer to the official Airbyte documentati
## Roadmap
Check out our [roadmap](docs/roadmap.md) to get informed on what we are currently working on, and what we have in mind for the next weeks, months and years.
Check out our [roadmap](docs/project-overview/roadmap.md) to get informed on what we are currently working on, and what we have in mind for the next weeks, months and years.
## License
Airbyte is licensed under the MIT license. See the [LICENSE](docs/license.md) file for licensing information.
Airbyte is licensed under the MIT license. See the [LICENSE](docs/project-overview/license.md) file for licensing information.

View File

@@ -101,7 +101,7 @@ task generateApiDocs(type: GenerateTask) {
]
doLast {
def target = file(rootProject.file("docs/api/generated-api-html"))
def target = file(rootProject.file("docs/reference/api/generated-api-html"))
delete target
mkdir target
copy {
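After the path change, the docs can presumably be regenerated by invoking the task from the repository root (the invocation below is an assumption based on the task name in the hunk above):

```bash
# Re-run API docs generation so output lands in docs/reference/api/generated-api-html
./gradlew generateApiDocs
```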

View File

@@ -1,25 +1,34 @@
# Table of contents
* [Overview](../README.md)
* [Getting Started](getting-started.md)
* [Introduction](../README.md)
* [Quickstart](quickstart/README.md)
* [Deploy Airbyte](quickstart/deploy-airbyte.md)
* [Add a Source](quickstart/add-a-source.md)
* [Add a Destination](quickstart/add-a-destination.md)
* [Set up a Connection](quickstart/set-up-a-connection.md)
* [Tutorials](tutorials/README.md)
* [Adding Incremental to a Source](tutorials/adding-incremental-sync.md)
* [Beginner's Guide to the AirbyteCatalog](tutorials/beginners-guide-to-catalog.md)
* [Build a Slack Activity Dashboard](tutorials/build-a-slack-activity-dashboard.md)
* [Building a Python Source](tutorials/building-a-python-source.md)
* [Building a Toy Connector](tutorials/toy-connector.md)
* [Connecting EL with T using SQL \(part 1/2\)](tutorials/connecting-el-with-t-using-sql.md)
* [Connecting EL with T using DBT \(part 2/2\)](tutorials/connecting-el-with-t-using-dbt.md)
* [Exploring Airbyte logs folder](tutorials/exploring-workspace-folder.md)
* [Postgres Replication](tutorials/postgres-replication.md)
* [Save and Search Through Your Slack History on a Free Slack Plan](tutorials/slack-history.md)
* [Browsing Output Logs](tutorials/browsing-output-logs.md)
* [Upgrading Airbyte](tutorials/upgrading-airbyte.md)
* [Using the Airflow Airbyte Operator](tutorials/using-the-airflow-airbyte-operator.md)
* [Visualizing the Time Spent by Your Team in Zoom Calls](tutorials/zoom-activity-dashboard.md)
* [Changelog](changelog/README.md)
* [Platform](changelog/platform.md)
* [Connectors](changelog/connectors.md)
* [Roadmap](roadmap.md)
* [Contributing to Airbyte](contributing-to-airbyte/tutorials/README.md)
* [A Beginner's Guide to the AirbyteCatalog](contributing-to-airbyte/tutorials/beginners-guide-to-catalog.md)
* [Building a Toy Connector](contributing-to-airbyte/tutorials/toy-connector.md)
* [Adding Incremental to a Source](contributing-to-airbyte/tutorials/adding-incremental-sync.md)
* [Building a Python Source](contributing-to-airbyte/tutorials/building-a-python-source.md)
* [Transformations and Normalization](tutorials/transformation-and-normalization/README.md)
* [Transformations with SQL \(Part 1/2\)](tutorials/transformation-and-normalization/transformations-with-sql.md)
* [Transformations with DBT \(Part 2/2\)](tutorials/transformation-and-normalization/transformations-with-dbt.md)
* [Example Use Cases](examples/README.md)
* [Postgres Replication](examples/postgres-replication.md)
* [Build a Slack Activity Dashboard](examples/build-a-slack-activity-dashboard.md)
* [Visualizing the Time Spent by Your Team in Zoom Calls](examples/zoom-activity-dashboard.md)
* [Save and Search Through Your Slack History on a Free Slack Plan](examples/slack-history.md)
* [Deploying Airbyte](deploying-airbyte/README.md)
* [Local Deployment](deploying-airbyte/local-deployment.md)
* [On AWS \(EC2\)](deploying-airbyte/on-aws-ec2.md)
* [On GCP \(Compute Engine\)](deploying-airbyte/on-gcp-compute-engine.md)
* [On Kubernetes \(Alpha\)](deploying-airbyte/on-kubernetes.md)
* [On AWS ECS \(Coming Soon\)](deploying-airbyte/on-aws-ecs.md)
* [Connectors](integrations/README.md)
* [Sources](integrations/sources/README.md)
* [Appstore](integrations/sources/appstore.md)
@@ -77,27 +86,6 @@
* [Snowflake](integrations/destinations/snowflake.md)
* [Custom or New Connector](integrations/custom-connectors.md)
* [Connector Health](integrations/connector-health.md)
* [Deploying Airbyte](deploying-airbyte/README.md)
* [On Your Workstation](deploying-airbyte/on-your-workstation.md)
* [On AWS \(EC2\)](deploying-airbyte/on-aws-ec2.md)
* [On GCP \(Compute Engine\)](deploying-airbyte/on-gcp-compute-engine.md)
* [On Kubernetes \(Alpha\)](deploying-airbyte/on-kubernetes.md)
* [On AWS ECS \(Coming Soon\)](deploying-airbyte/on-aws-ecs.md)
* [API documentation](api-documentation.md)
* [Architecture](architecture/README.md)
* [AirbyteCatalog & ConfiguredAirbyteCatalog](architecture/catalog.md)
* [Airbyte Specification](architecture/airbyte-specification.md)
* [Basic Normalization](architecture/basic-normalization.md)
* [Connections](architecture/connections/README.md)
* [Full Refresh - Overwrite](architecture/connections/full-refresh-overwrite.md)
* [Full Refresh - Append](architecture/connections/full-refresh-append.md)
* [Incremental Sync - Append](architecture/connections/incremental-append.md)
* [Incremental Sync - Deduped History](architecture/connections/incremental-deduped-history.md)
* [High-level View](architecture/high-level-view.md)
* [Workers & Jobs](architecture/jobs.md)
* [Technical Stack](architecture/tech-stack.md)
* [Change Data Capture (CDC)](architecture/cdc.md)
* [Namespaces](architecture/namespaces.md)
* [Contributing to Airbyte](contributing-to-airbyte/README.md)
* [Code of Conduct](contributing-to-airbyte/code-of-conduct.md)
* [Developing Locally](contributing-to-airbyte/developing-locally.md)
@@ -110,7 +98,30 @@
* [Code Style](contributing-to-airbyte/code-style.md)
* [Updating Documentation](contributing-to-airbyte/updating-documentation.md)
* [Templates](contributing-to-airbyte/templates/README.md)
* [Connector Doc Template](contributing-to-airbyte/templates/integration-documentation-template.md)
* [Connector Doc Template](contributing-to-airbyte/templates/integration-documentation-template.md)
* [Understanding Airbyte](understanding-airbyte/README.md)
* [AirbyteCatalog & ConfiguredAirbyteCatalog](understanding-airbyte/catalog.md)
* [Airbyte Specification](understanding-airbyte/airbyte-specification.md)
* [Basic Normalization](understanding-airbyte/basic-normalization.md)
* [Connections](understanding-airbyte/connections/README.md)
* [Full Refresh - Overwrite](understanding-airbyte/connections/full-refresh-overwrite.md)
* [Full Refresh - Append](understanding-airbyte/connections/full-refresh-append.md)
* [Incremental Sync - Append](understanding-airbyte/connections/incremental-append.md)
* [Incremental Sync - Deduped History](understanding-airbyte/connections/incremental-deduped-history.md)
* [High-level View](understanding-airbyte/high-level-view.md)
* [Workers & Jobs](understanding-airbyte/jobs.md)
* [Technical Stack](understanding-airbyte/tech-stack.md)
* [Change Data Capture (CDC)](understanding-airbyte/cdc.md)
* [Namespaces](understanding-airbyte/namespaces.md)
* [API documentation](api-documentation.md)
* [Project Overview](project-overview/README.md)
* [Roadmap](project-overview/roadmap.md)
* [Changelog](project-overview/changelog/README.md)
* [Platform](project-overview/changelog/platform.md)
* [Connectors](project-overview/changelog/connectors.md)
* [License](project-overview/license.md)
* [Careers & Open Positions](career-and-open-positions/README.md)
* [Senior Software Engineer](career-and-open-positions/senior-software-engineer.md)
* [FAQ](faq/README.md)
* [Technical Support](faq/technical-support.md)
* [Getting Started](faq/getting-started.md)
@@ -122,8 +133,4 @@
* [StitchData vs Airbyte](faq/differences-with/stitchdata-vs-airbyte.md)
* [Singer vs Airbyte](faq/differences-with/singer-vs-airbyte.md)
* [Pipelinewise vs Airbyte](faq/differences-with/pipelinewise-vs-airbyte.md)
* [Meltano vs Airbyte](faq/differences-with/meltano-vs-airbyte.md)
* [Career & Open Positions](career-and-open-positions/README.md)
* [Senior Software Engineer](career-and-open-positions/senior-software-engineer.md)
* [License](license.md)
* [Meltano vs Airbyte](faq/differences-with/meltano-vs-airbyte.md)

View File

@@ -1,2 +0,0 @@
# Architecture

View File

@@ -4,7 +4,7 @@
[Airbyte](http://airbyte.io) is the upcoming open-source standard for EL\(T\). We enable data teams to replicate data from applications, APIs, and databases to data warehouses, lakes, and other destinations. We believe only an open-source approach can solve the problem of data integration, as it enables us to cover the long tail of integrations while enabling teams to adapt prebuilt connectors to their needs.
Airbyte is remote-friendly, with most of the team still based in Silicon Valley. We're fully open as a company. Our **[company handbook](https://handbook.airbyte.io)**, **[culture & values](https://handbook.airbyte.io/company/culture-and-values)**, **[strategy](https://handbook.airbyte.io/strategy/strategy)** and **[roadmap](../roadmap.md)** are open to all.
Airbyte is remote-friendly, with most of the team still based in Silicon Valley. We're fully open as a company. Our **[company handbook](https://handbook.airbyte.io)**, **[culture & values](https://handbook.airbyte.io/company/culture-and-values)**, **[strategy](https://handbook.airbyte.io/strategy/strategy)** and **[roadmap](../project-overview/roadmap.md)** are open to all.
We're backed by some of the world's [top investors](./#our-investors) and believe in product-led growth, where we build something awesome and let our product bring the users, rather than an outbound sales engine with cold calls.

View File

@@ -4,7 +4,7 @@
[Airbyte](http://airbyte.io) is the upcoming open-source standard for EL\(T\). We enable data teams to replicate data from applications, APIs, and databases to data warehouses, lakes, and other destinations. We believe only an open-source approach can solve the problem of data integration, as it enables us to cover the long tail of integrations while enabling teams to adapt prebuilt connectors to their needs.
Airbyte is remote-friendly, with most of the team still based in Silicon Valley. We're fully open as a company. Our **[company handbook](https://handbook.airbyte.io)**, **[culture & values](https://handbook.airbyte.io/company/culture-and-values)**, **[strategy](https://handbook.airbyte.io/strategy/strategy)** and **[roadmap](../roadmap.md)** are open to all.
Airbyte is remote-friendly, with most of the team still based in Silicon Valley. We're fully open as a company. Our **[company handbook](https://handbook.airbyte.io)**, **[culture & values](https://handbook.airbyte.io/company/culture-and-values)**, **[strategy](https://handbook.airbyte.io/strategy/strategy)** and **[roadmap](../project-overview/roadmap.md)** are open to all.
We're backed by some of the world's [top investors](./#our-investors) and believe in product-led growth, where we build something awesome and let our product bring the users, rather than an outbound sales engine with cold calls.

View File

@@ -4,7 +4,7 @@
[Airbyte](http://airbyte.io) is the upcoming open-source standard for EL\(T\). We enable data teams to replicate data from applications, APIs, and databases to data warehouses, lakes, and other destinations. We believe only an open-source approach can solve the problem of data integration, as it enables us to cover the long tail of integrations while enabling teams to adapt prebuilt connectors to their needs.
Airbyte is remote-friendly, with most of the team still based in Silicon Valley. We're fully open as a company. Our **[company handbook](https://handbook.airbyte.io)**, **[culture & values](https://handbook.airbyte.io/company/culture-and-values)**, **[strategy](https://handbook.airbyte.io/strategy/strategy)** and **[roadmap](../roadmap.md)** are open to all.
Airbyte is remote-friendly, with most of the team still based in Silicon Valley. We're fully open as a company. Our **[company handbook](https://handbook.airbyte.io)**, **[culture & values](https://handbook.airbyte.io/company/culture-and-values)**, **[strategy](https://handbook.airbyte.io/strategy/strategy)** and **[roadmap](../project-overview/roadmap.md)** are open to all.
We're backed by some of the world's [top investors](./#our-investors) and believe in product-led growth, where we build something awesome and let our product bring the users, rather than an outbound sales engine with cold calls.

View File

@@ -14,7 +14,7 @@ Please follow our [Code of conduct](code-of-conduct.md) in the context of any co
## Airbyte specification
Before you can start contributing, you need to understand [Airbyte's data protocol specification](../architecture/airbyte-specification.md).
Before you can start contributing, you need to understand [Airbyte's data protocol specification](../understanding-airbyte/airbyte-specification.md).
## First-time contributors, welcome!
@@ -29,7 +29,7 @@ Here is a list of easy [good first issues](https://github.com/airbytehq/airbyte/
It's easy to add your own connector to Airbyte! **Since Airbyte connectors are encapsulated within Docker containers, you can use any language you like.** Here are some links on how to add sources and destinations. We haven't built the documentation for all languages yet, so don't hesitate to reach out to us if you'd like help developing connectors in other languages.
* See [Building new connectors](building-new-connector/) to get started.
* Since we frequently build connectors in Python, on top of Singer or in Java, we've created generator libraries to get you started quickly: [Build Python Source Connectors](../tutorials/building-a-python-source.md) and [Build Java Connectors](building-new-connector/java-connectors.md)
* Since we frequently build connectors in Python, on top of Singer or in Java, we've created generator libraries to get you started quickly: [Build Python Source Connectors](tutorials/building-a-python-source.md) and [Build Java Connectors](building-new-connector/java-connectors.md)
* Integration tests \(tests that run a connector's image against an external resource\) can be run one of three ways, as detailed [here](building-new-connector/testing-connectors.md)
**Please note that, at no point in time, we will ask you to maintain your connector.** The goal is that the Airbyte team and the community helps maintain the connector.

View File

@@ -1,6 +1,6 @@
# Developing Connectors
Airbyte supports two types of connectors: Sources and Destinations. A connector takes the form of a Docker image which follows the [Airbyte specification](../../architecture/airbyte-specification.md).
Airbyte supports two types of connectors: Sources and Destinations. A connector takes the form of a Docker image which follows the [Airbyte specification](../../understanding-airbyte/airbyte-specification.md).
To build a new connector in Java or Python, we provide templates so you don't need to start everything from scratch.
@@ -8,7 +8,7 @@ To build a new connector in Java or Python, we provide templates so you don't ne
## The Airbyte specification
Before building a new connector, review [Airbyte's data protocol specification](../../architecture/airbyte-specification.md).
Before building a new connector, review [Airbyte's data protocol specification](../../understanding-airbyte/airbyte-specification.md).
## Adding a new connector
@@ -44,7 +44,7 @@ and choose the relevant template. This will generate a new connector in the `air
Search the generated directory for "TODO"s and follow them to implement your connector.
If you are developing a Python connector, you may find the [building a Python connector tutorial](../../tutorials/building-a-python-source.md) helpful.
If you are developing a Python connector, you may find the [building a Python connector tutorial](../tutorials/building-a-python-source.md) helpful.
### 2. Integration tests

View File

@@ -2,7 +2,7 @@
## Source Acceptance Tests
To ensure a minimum quality bar, Airbyte runs all connectors against the same set of integration tests \(sources & destinations have two different test suites\). Those tests ensure that each connector adheres to the [Airbyte Specification](../../architecture/airbyte-specification.md) and responds correctly to Airbyte commands when provided valid \(or invalid\) inputs.
To ensure a minimum quality bar, Airbyte runs all connectors against the same set of integration tests \(sources & destinations have two different test suites\). Those tests ensure that each connector adheres to the [Airbyte Specification](../../understanding-airbyte/airbyte-specification.md) and responds correctly to Airbyte commands when provided valid \(or invalid\) inputs.
*Note: If you are looking for reference documentation for the deprecated first version of test suites, see [Standard Tests (Legacy)](legacy-standard-source-tests.md).*

View File

@@ -0,0 +1 @@
# Tutorials

View File

@@ -2,7 +2,7 @@
## Overview
This tutorial will assume that you already have a working source. If you do not, feel free to refer to the [Building a Toy Connector](toy-connector.md) tutorial. This tutorial will build directly off the example from that article. We will also assume that you have a basic understanding of how Airbyte's Incremental-Append replication strategy works. We have a brief explanation of it [here](../architecture/connections/incremental-append.md).
This tutorial will assume that you already have a working source. If you do not, feel free to refer to the [Building a Toy Connector](toy-connector.md) tutorial. This tutorial will build directly off the example from that article. We will also assume that you have a basic understanding of how Airbyte's Incremental-Append replication strategy works. We have a brief explanation of it [here](../../understanding-airbyte/connections/incremental-append.md).
## Update Catalog in `discover`
@@ -116,5 +116,5 @@ def to_datetime(date):
return None
```
That's all you need to do to add incremental functionality to the stock ticker Source. Incremental definitely requires more configurability than full refresh, so your implementation may deviate slightly depending on whether your cursor field is source defined or user-defined. If you think you are running into one of those cases, check out our [incremental](../architecture/connections/incremental-append.md) documentation for more information on different types of configuration.
That's all you need to do to add incremental functionality to the stock ticker Source. Incremental definitely requires more configurability than full refresh, so your implementation may deviate slightly depending on whether your cursor field is source defined or user-defined. If you think you are running into one of those cases, check out our [incremental](../../understanding-airbyte/connections/incremental-append.md) documentation for more information on different types of configuration.

View File

@@ -2,7 +2,7 @@
## Overview
The goal of this article is to make the `AirbyteCatalog` approachable to someone contributing to Airbyte for the first time. If you are looking to get deeper into the details of the catalog, you can read our technical specification on it [here](../architecture/catalog.md).
The goal of this article is to make the `AirbyteCatalog` approachable to someone contributing to Airbyte for the first time. If you are looking to get deeper into the details of the catalog, you can read our technical specification on it [here](../../understanding-airbyte/catalog.md).
The goal of the `AirbyteCatalog` is to describe _what_ data is available in a source. The goal of the `ConfiguredAirbyteCatalog` is to, based on an `AirbyteCatalog`, specify _how_ data from the source is replicated.
@@ -16,7 +16,7 @@ This article will illustrate how to use `AirbyteCatalog` via a series of example
* [Dynamic Streams Example](beginners-guide-to-catalog.md#Dynamic-Streams-Example)
* [Nested Schema Example](beginners-guide-to-catalog.md#Nested-Schema-Example)
In order to understand in depth how to configure incremental data replication, head over to the [incremental replication docs](../architecture/connections/incremental-append.md).
In order to understand in depth how to configure incremental data replication, head over to the [incremental replication docs](../../understanding-airbyte/connections/incremental-append.md).
## Database Example
@@ -92,7 +92,7 @@ The catalog is structured as a list of `AirbyteStream`. In the case of a databas
Let's walk through what each field in a stream means.
* `name` - The name of the stream.
* `supported_sync_modes` - This field lists the type of data replication that this source supports. The possible values in this array include `FULL_REFRESH` \([docs](../architecture/connections/full-refresh-overwrite.md)\) and `INCREMENTAL` \([docs](../architecture/connections/incremental-append.md)\).
* `supported_sync_modes` - This field lists the type of data replication that this source supports. The possible values in this array include `FULL_REFRESH` \([docs](../../understanding-airbyte/connections/full-refresh-overwrite.md)\) and `INCREMENTAL` \([docs](../../understanding-airbyte/connections/incremental-append.md)\).
* `source_defined_cursor` - If the stream supports `INCREMENTAL` replication, then this field signals whether the source can figure out how to detect new records on its own or not.
* `json_schema` - This field is a [JsonSchema](https://json-schema.org/understanding-json-schema) object that describes the structure of the data. Notice that each key in the `properties` object corresponds to a column name in our database table.
@@ -137,7 +137,7 @@ Let's walk through each field in the `ConfiguredAirbyteStream`:
* `sync_mode` - This field must be one of the values that was in `supported_sync_modes` in the `AirbyteStream` - Configures which sync mode will be used when data is replicated.
* `stream` - Hopefully this one looks familiar! This field contains an `AirbyteStream`. It should be _identical_ to the one we saw in the `AirbyteCatalog`.
* `cursor_field` - When `sync_mode` is `INCREMENTAL` and `source_defined_cursor = false`, this field configures which field in the stream will be used to determine if a record should be replicated or not. Read more about this concept in our [documentation of incremental replication](../architecture/connections/incremental-append.md).
* `cursor_field` - When `sync_mode` is `INCREMENTAL` and `source_defined_cursor = false`, this field configures which field in the stream will be used to determine if a record should be replicated or not. Read more about this concept in our [documentation of incremental replication](../../understanding-airbyte/connections/incremental-append.md).
### Summary of the Postgres Example

View File

@@ -6,7 +6,7 @@ This article provides a checklist for how to create a python source. Each step i
## Requirements
Docker, Python, and Java with the versions listed in the [tech stack section](../architecture/tech-stack.md).
Docker, Python, and Java with the versions listed in the [tech stack section](../../understanding-airbyte/tech-stack.md).
{% hint style="info" %}
All the commands below assume that `python` points to a version of python >3.7. On some systems, `python` points to a Python2 installation and `python3` points to Python3. If this is the case on your machine, substitute all `python` commands in this guide with `python3` . Otherwise, make sure to install Python 3 before beginning.
@@ -151,7 +151,7 @@ The nice thing about this approach is that you are running your source exactly a
Each source contains a specification that describes what inputs it needs in order for it to pull data. This file can be found in `airbyte-integrations/connectors/source-<source-name>/spec.json`. This is a good place to start when developing your source. Using JsonSchema, define what the inputs are \(e.g. username and password\). Here's [an example](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-postgres/src/main/resources/spec.json) of what the `spec.json` looks like for the postgres source.
For more details on what the spec is, you can read about the Airbyte Protocol [here](../architecture/airbyte-specification.md).
For more details on what the spec is, you can read about the Airbyte Protocol [here](../../understanding-airbyte/airbyte-specification.md).
The generated code that Airbyte provides handles implementing the `spec` method for you. It assumes that there will be a file called `spec.json` in the same directory as `source.py`. If you have declared the necessary JsonSchema in `spec.json`, you should be done with this step.
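As a hedged illustration of what such a file might contain (the `api_key` property and its description are hypothetical, not copied from the postgres example):

```bash
# Write a minimal, illustrative spec.json for a connector that needs one input
cat > spec.json <<'EOF'
{
  "documentationUrl": "https://docs.airbyte.io/integrations",
  "connectionSpecification": {
    "type": "object",
    "required": ["api_key"],
    "properties": {
      "api_key": {
        "type": "string",
        "description": "API key used to authenticate against the source"
      }
    }
  }
}
EOF
```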
@@ -173,7 +173,7 @@ As described in the template code, this method takes in the same config object a
### Step 8: Set up Standard Tests
The Standard Tests are a set of tests that run against all sources. These tests are run in the Airbyte CI to prevent regressions. They can also help you sanity-check that your source works as expected. The following [article](../contributing-to-airbyte/building-new-connector/testing-connectors.md) explains Standard Tests and how to run them.
The Standard Tests are a set of tests that run against all sources. These tests are run in the Airbyte CI to prevent regressions. They can also help you sanity-check that your source works as expected. The following [article](../building-new-connector/testing-connectors.md) explains Standard Tests and how to run them.
You can run the tests using `./gradlew :airbyte-integrations:connectors:source-<source-name>:integrationTest`. Make sure to run this command from the Airbyte repository root.
@@ -205,7 +205,7 @@ The template fills in most of the information for the readme for you. Unless the
Open the following file: `airbyte-config/init/src/main/resources/seed/source_definitions.yaml`. You'll find a list of all the connectors that Airbyte displays in the UI. Pattern match to add your own connector. Make sure to generate a new _unique_ UUIDv4 for the `sourceDefinitionId` field. You can get one [here](https://www.uuidgenerator.net/).
Note that for simple and quick testing use cases, you can also do this step [using the UI](../integrations/custom-connectors.md#adding-your-connectors-in-the-ui).
Note that for simple and quick testing use cases, you can also do this step [using the UI](../../integrations/custom-connectors.md#adding-your-connectors-in-the-ui).
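If you prefer the command line to the website, a local equivalent (assuming the common `uuidgen` utility, which ships with macOS and most Linux distributions):

```bash
# Generate a random UUIDv4 for the sourceDefinitionId field
uuidgen | tr '[:upper:]' '[:lower:]'
```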
#### Step 12: Add docs

View File

@@ -6,10 +6,10 @@ description: Building a toy source connector to illustrate Airbyte's main concep
This tutorial walks you through building a simple Airbyte source to demonstrate the following concepts in action:
* [The Airbyte Specification](../architecture/airbyte-specification.md) and the interface implemented by a source connector
* [The Airbyte Specification](../../understanding-airbyte/airbyte-specification.md) and the interface implemented by a source connector
* [The AirbyteCatalog](beginners-guide-to-catalog.md)
* [Packaging your connector](../contributing-to-airbyte/building-new-connector/#1-implement--package-the-connector)
* [Testing your connector](../contributing-to-airbyte/building-new-connector/testing-connectors.md)
* [Testing your connector](../building-new-connector/testing-connectors.md)
At the end of this tutorial, you will have a working source that you will be able to use in the Airbyte UI.
@@ -21,7 +21,7 @@ This tutorial can be done entirely on your local workstation.
To run this tutorial, you'll need:
* Docker, Python, and Java with the versions listed in the [tech stack section](../architecture/tech-stack.md).
* Docker, Python, and Java with the versions listed in the [tech stack section](../../understanding-airbyte/tech-stack.md).
* The `requests` Python package installed via `pip install requests` \(or `pip3` if `pip` is linked to a Python2 installation on your system\)
**A note on running Python**: all the commands below assume that `python` points to a version of python 3.7. Verify this by running
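The verification command itself is cut off by the hunk boundary; presumably it is a simple version check along these lines:

```bash
# Confirm that `python` resolves to a Python 3.x interpreter
python --version
```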
@@ -51,7 +51,7 @@ Here's the outline of what we'll do to build our connector:
Once we've completed the above steps, we will have built a functioning connector. Then, we'll add some optional functionality:
* Support [incremental sync](../architecture/connections/incremental-append.md)
* Support [incremental sync](../../understanding-airbyte/connections/incremental-append.md)
* Add custom integration tests
### 1. Bootstrap the connector package
@@ -81,7 +81,7 @@ $ npm run generate
We'll select the `generic` template and call the connector `stock-ticker-api`:
![](../.gitbook/assets/newsourcetutorial_plop.gif)
![](../../.gitbook/assets/newsourcetutorial_plop.gif)
Note: The generic template is very bare. If you are planning on developing a python source, we recommend using the `python` template. It provides some convenience code to help reduce boilerplate. This tutorial uses the bare-bones version because it makes it easier to see how all the pieces of a connector work together. You can find a walkthrough on how to build a python connector here \(**coming soon**\).
@@ -359,7 +359,7 @@ Our connector is able to detect valid and invalid configs correctly. Two methods
#### Implementing Discover
The `discover` command outputs a Catalog, a struct that declares the Streams and Fields \(Airbyte's equivalents of tables and columns\) output by the connector. It also includes metadata around which features a connector supports \(e.g. which sync modes\). In other words it describes what data is available in the source. If you'd like to read a bit more about this concept check out our [Beginner's Guide to the Airbyte Catalog](beginners-guide-to-catalog.md) or for a more detailed treatment read the [Airbyte Specification](../architecture/airbyte-specification.md).
The `discover` command outputs a Catalog, a struct that declares the Streams and Fields \(Airbyte's equivalents of tables and columns\) output by the connector. It also includes metadata around which features a connector supports \(e.g. which sync modes\). In other words it describes what data is available in the source. If you'd like to read a bit more about this concept check out our [Beginner's Guide to the Airbyte Catalog](beginners-guide-to-catalog.md) or for a more detailed treatment read the [Airbyte Specification](../../understanding-airbyte/airbyte-specification.md).
The data output by this connector will be structured in a very simple way. This connector outputs records belonging to exactly one Stream \(table\). Each record contains three Fields \(columns\): `date`, `price`, and `stock_ticker`, corresponding to the price of a stock on a given day.
@@ -887,7 +887,7 @@ $ docker run -v $(pwd)/secrets/valid_config.json:/data/config.json -v $(pwd)/ful
{'type': 'RECORD', 'record': {'stream': 'stock_prices', 'data': {'date': '2020-12-21', 'stock_ticker': 'TSLA', 'price': 649.86}, 'emitted_at': 1608628424000}}
```
and with that, we've packaged our connector in a functioning Docker image. The last requirement before we can call this connector finished is to pass the [Airbyte Standard Test suite](../contributing-to-airbyte/building-new-connector/testing-connectors.md).
and with that, we've packaged our connector in a functioning Docker image. The last requirement before we can call this connector finished is to pass the [Airbyte Standard Test suite](../building-new-connector/testing-connectors.md).
### 4. Test the connector
@@ -994,31 +994,31 @@ If this is the first time using the Airbyte UI, then you will be prompted to go
In the UI, click the "Admin" button in the left side bar:
![](../.gitbook/assets/newsourcetutorial_sidebar_admin.png)
![](../../.gitbook/assets/newsourcetutorial_sidebar_admin.png)
Then on the admin page, click "New Connector":
![](../.gitbook/assets/newsourcetutorial_admin_page.png)
![](../../.gitbook/assets/newsourcetutorial_admin_page.png)
On the modal that pops up, enter the following information then click "Add"
![](../.gitbook/assets/newsourcetutorial_new_connector_modal.png)
![](../../.gitbook/assets/newsourcetutorial_new_connector_modal.png)
Now from the "Sources" page \(if not redirected, click "Sources" on the left panel\) , click the "New source" button. You'll be taken to the detail page for adding a new source. Choose the "Stock Ticker API" source and add the following information, then click "Set up source":
![](../.gitbook/assets/newsourcetutorial_source_config.png)
![](../../.gitbook/assets/newsourcetutorial_source_config.png)
On the following page, click the "add destination" button, then "add new destination":
![](../.gitbook/assets/newsourcetutorial_add_destination.png)
![](../../.gitbook/assets/newsourcetutorial_add_destination.png)
Configure a local JSON destination as follows. Note that we set the output directory to `/local/tutorial_json`. When we run syncs, we'll find the output on our local filesystem in `/tmp/airbyte_local/tutorial_json`.
![](../.gitbook/assets/newsourcetutorial_destination_config.png)
![](../../.gitbook/assets/newsourcetutorial_destination_config.png)
Finally, set up the connection configuration:
![](../.gitbook/assets/newsourcetutorial_schema_select.png)
![](../../.gitbook/assets/newsourcetutorial_schema_select.png)
We'll choose the "manual" frequency, meaning we need to launch each sync by hand.
@@ -1028,11 +1028,11 @@ We've set up our connection! Now let's move data.
To launch the sync, click the "sync now" button:
![](../.gitbook/assets/newsourcetutorial_launchsync.png)
![](../../.gitbook/assets/newsourcetutorial_launchsync.png)
If you click on the connector row, you should be taken to the sync detail page. After a few seconds \(refresh the page if needed\), the status of the sync should change to `succeeded` as below:
![](../.gitbook/assets/newsourcetutorial_syncdetail.png)
![](../../.gitbook/assets/newsourcetutorial_syncdetail.png)
Let's verify the output. From your shell, run:
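The command is truncated by the hunk boundary; given the local JSON destination configured above, it is presumably something like this (the exact filename is an assumption based on the `stock_prices` stream name):

```bash
# Hypothetical verification: the local JSON destination writes one file per stream
cat /tmp/airbyte_local/tutorial_json/_airbyte_raw_stock_prices.jsonl
```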

View File

@@ -1,4 +1,4 @@
# On Your Workstation
# Local Deployment
{% hint style="info" %}
These instructions have been tested on MacOS
@@ -7,10 +7,9 @@ These instructions have been tested on MacOS
## Setup & launch Airbyte
* Install Docker on your workstation \(see [instructions](https://www.docker.com/products/docker-desktop)\). Note: There is a known issue with docker-compose 1.27.3. If you are using that version, please upgrade to 1.27.4.
* Clone Airbyte's repository and run `docker compose`
* After Docker is installed, you can immediately get started locally by running:
```bash
# In your workstation terminal
git clone https://github.com/airbytehq/airbyte.git
cd airbyte
docker-compose up

docs/examples/README.md Normal file
View File

@@ -0,0 +1 @@
# Examples

View File

@@ -6,7 +6,7 @@ description: Using Airbyte and MeiliSearch
![](../.gitbook/assets/slack-history-ui-title.png)
The [Slack free tier](https://slack.com/pricing/paid-vs-free) saves only the last 10K messages. For social Slack instances, it may be impractical to upgrade to a paid plan to retain these messages. Similarly, for an open-source project like [Airbyte](../architecture/catalog.md) where we interact with our community through a public Slack instance, the cost of paying for a seat for every Slack member is prohibitive.
The [Slack free tier](https://slack.com/pricing/paid-vs-free) saves only the last 10K messages. For social Slack instances, it may be impractical to upgrade to a paid plan to retain these messages. Similarly, for an open-source project like [Airbyte](../understanding-airbyte/catalog.md) where we interact with our community through a public Slack instance, the cost of paying for a seat for every Slack member is prohibitive.
However, searching through old messages can be really helpful. Losing that history feels like some advanced form of memory loss. What was that joke about Java 8 Streams? This contributor question sounds familiar—haven't we seen it before? But you just can't remember!

View File

@@ -2,7 +2,7 @@
## **What do I need to get started using Airbyte?**
You can deploy Airbyte in several ways, as [documented here](../deploying-airbyte/). Airbyte will then help you replicate data between a source and a destination. Airbyte offers pre-built connectors for both; you can see their list [here](../changelog/connectors.md). If you don't see the connector you need, you can [build your connector yourself](../contributing-to-airbyte/building-new-connector/) and benefit from Airbyte's optional scheduling, orchestration and monitoring modules.
You can deploy Airbyte in several ways, as [documented here](../deploying-airbyte/). Airbyte will then help you replicate data between a source and a destination. Airbyte offers pre-built connectors for both; you can see their list [here](../project-overview/changelog/connectors.md). If you don't see the connector you need, you can [build your connector yourself](../contributing-to-airbyte/building-new-connector/) and benefit from Airbyte's optional scheduling, orchestration and monitoring modules.
## **How long does it take to set up Airbyte?**
@@ -10,7 +10,7 @@ It depends on your source and destination. Check our setup guides to see the tas
## **What data sources does Airbyte offer connectors for?**
We already offer 50+ connectors, and will focus all our effort on ramping up the number of connectors and strengthening them. View the [full list here](../changelog/connectors.md). If you don't see a source you need, you can file a [connector request here](https://github.com/airbytehq/airbyte/issues/new?assignees=&labels=area%2Fintegration%2C+new-integration&template=new-integration-request.md&title=).
We already offer 50+ connectors, and will focus all our effort on ramping up the number of connectors and strengthening them. View the [full list here](../project-overview/changelog/connectors.md). If you don't see a source you need, you can file a [connector request here](https://github.com/airbytehq/airbyte/issues/new?assignees=&labels=area%2Fintegration%2C+new-integration&template=new-integration-request.md&title=).
## **Where can I see my data in Airbyte?**

View File

@@ -45,7 +45,7 @@ If the above workaround does not fix your problem, please report it [here](https
## Your incremental connection is not working
Our current version of incremental is [append](../architecture/connections/incremental-append.md). It works from a cursor field, so you need to check which cursor field you're using and whether it's well populated in every record in your table.
Our current version of incremental is [append](../understanding-airbyte/connections/incremental-append.md). It works from a cursor field, so you need to check which cursor field you're using and whether it's well populated in every record in your table.
If it is, there are still several things to check:
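As a first quick check (a hedged sketch for a Postgres source; the table and cursor column names below are illustrative, not taken from these docs), you can count rows where the cursor column is empty:

```bash
# Rows with a NULL cursor value will never be picked up by incremental-append
# ("my_table" and "updated_at" are placeholder names)
psql -h localhost -U airbyte -d mydb \
  -c "SELECT count(*) FROM my_table WHERE updated_at IS NULL;"
```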

View File

@@ -16,4 +16,4 @@ For now, the schema can only be updated manually in the UI \(by clicking "Update
## **How does Airbyte handle namespaces (or schemas for the DB-inclined)?**
Airbyte respects source-defined namespaces when syncing data with a namespace-supported destination. See [this](../architecture/namespaces.md) for more details.
Airbyte respects source-defined namespaces when syncing data with a namespace-supported destination. See [this](../understanding-airbyte/namespaces.md) for more details.

View File

@@ -10,7 +10,7 @@ Each sheet in the selected spreadsheet will be output as a separate stream. Each
Airbyte only supports replicating Grid sheets. See the [Google Sheets API docs](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/sheets#SheetType) for more info on all available sheet types.
**Note: Sheet names and column headers must contain only alphanumeric characters or `_`, as specified in the** [**Airbyte Protocol**](../../architecture/airbyte-specification.md). If your sheet or column header is named e.g: "the data", you'll need to change it to "the\_data" for it to be synced by Airbyte. This restriction does not apply to non-header cell values: those can contain any unicode characters. This limitation is temporary and future versions of Airbyte will support more permissive naming patterns.
**Note: Sheet names and column headers must contain only alphanumeric characters or `_`, as specified in the** [**Airbyte Protocol**](../../understanding-airbyte/airbyte-specification.md). If your sheet or column header is named e.g: "the data", you'll need to change it to "the\_data" for it to be synced by Airbyte. This restriction does not apply to non-header cell values: those can contain any unicode characters. This limitation is temporary and future versions of Airbyte will support more permissive naming patterns.
### Data type mapping

View File

@@ -111,7 +111,7 @@ We use [logical replication](https://www.postgresql.org/docs/10/logical-replicat
We do not require installing custom plugins like `wal2json` or `test_decoding`. We use `pgoutput`, which is included in Postgres 10+ by default.
Please read the [CDC docs](../../architecture/cdc.md) for an overview of how Airbyte approaches CDC.
Please read the [CDC docs](../../understanding-airbyte/cdc.md) for an overview of how Airbyte approaches CDC.
### Should I use CDC for Postgres?
* If you need a record of deletions and can accept the limitations listed below, you should use CDC for Postgres.
@@ -120,7 +120,7 @@ Please read the [CDC docs](../../architecture/cdc.md) for an overview of how Air
* If your table has a primary key but doesn't have a reasonable cursor field for incremental syncing (e.g. `updated_at`), CDC allows you to sync your table incrementally.
### CDC Limitations
* Make sure to read our [CDC docs](../../architecture/cdc.md) to see limitations that impact all databases using CDC replication.
* Make sure to read our [CDC docs](../../understanding-airbyte/cdc.md) to see limitations that impact all databases using CDC replication.
* CDC is only available for Postgres 10+.
* Airbyte requires a replication slot configured only for its use. Only one source should be configured that uses this replication slot. Instructions on how to set up a replication slot can be found below.
* Log-based replication only works for master instances of Postgres.
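The replication-slot setup mentioned above can be sketched as follows (the slot name `airbyte_slot` is illustrative; `pgoutput` is the plugin named earlier, and superuser access is assumed):

```bash
# Create a dedicated logical replication slot for Airbyte's exclusive use
psql -c "SELECT pg_create_logical_replication_slot('airbyte_slot', 'pgoutput');"
```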

View File

@@ -14,7 +14,7 @@ The Smartsheet Source is written to pull data from a single Smartsheet spreadshe
To replicate multiple spreadsheets, you can create multiple instances of the Smartsheet Source in Airbyte, reusing the API token for all your sheets that you need to sync.
**Note: Column headers must contain only alphanumeric characters or `_` , as specified in the** [**Airbyte Protocol**](../../architecture/airbyte-specification.md).
**Note: Column headers must contain only alphanumeric characters or `_` , as specified in the** [**Airbyte Protocol**](../../understanding-airbyte/airbyte-specification.md).
### Column datatype mapping
The data type mapping adopted by this connector is based on the Smartsheet [documentation](https://smartsheet.redoc.ly/tag/columnsRelated#section/Column-Types).

View File

@@ -0,0 +1 @@
# Project Overview

View File

@@ -115,7 +115,7 @@ Other progress on connectors:
## 01/19/2021
* **Our new** [**Connector Health Status**](../integrations/connector-health.md) **page**
* **Our new** [**Connector Health Status**](../../integrations/connector-health.md) **page**
* **1 new source:** App Store \(thanks to [@Muriloo](https://github.com/Muriloo)\)
* Fixes on connectors:
* Bug fix writing boolean columns to Redshift
@@ -125,14 +125,14 @@ Other progress on connectors:
## 01/12/2021
* **1 new source:** Tempo \(thanks to [@thomasvl](https://github.com/thomasvl)\)
* **Incremental support for 3 new source connectors:** [Salesforce](../integrations/sources/salesforce.md), [Slack](../integrations/sources/slack.md) and [Braintree](../integrations/sources/braintree.md)
* **Incremental support for 3 new source connectors:** [Salesforce](../../integrations/sources/salesforce.md), [Slack](../../integrations/sources/slack.md) and [Braintree](../../integrations/sources/braintree.md)
* Fixes on connectors:
* Fix a bug in MSSQL and Redshift source connectors where custom SQL types weren't being handled correctly.
* Improvement of the Snowflake connector from [@hudsondba](https://github.com/hudsondba) \(batch size and timeout sync\)
## 01/05/2021
* **Incremental support for 2 new source connectors:** [Mixpanel](../integrations/sources/mixpanel.md) and [Hubspot](../integrations/sources/hubspot.md)
* **Incremental support for 2 new source connectors:** [Mixpanel](../../integrations/sources/mixpanel.md) and [Hubspot](../../integrations/sources/hubspot.md)
* Fixes on connectors:
* Fixed a bug in the GitHub connector where the connector didn't verify the provided API token was granted the correct permissions
* Fixed a bug in the Google sheets connector where rate limits were not always respected
@@ -140,68 +140,68 @@ Other progress on connectors:
## 12/30/2020
**New sources:** [Plaid](../integrations/sources/plaid.md) \(contributed by [tgiardina](https://github.com/tgiardina)\), [Looker](../integrations/sources/looker.md)
**New sources:** [Plaid](../../integrations/sources/plaid.md) \(contributed by [tgiardina](https://github.com/tgiardina)\), [Looker](../../integrations/sources/looker.md)
## 12/18/2020
**New sources:** [Drift](../integrations/sources/drift.md), [Microsoft Teams](../integrations/sources/microsoft-teams.md)
**New sources:** [Drift](../../integrations/sources/drift.md), [Microsoft Teams](../../integrations/sources/microsoft-teams.md)
## 12/10/2020
**New sources:** [Intercom](../integrations/sources/intercom.md), [Mixpanel](../integrations/sources/mixpanel.md), [Jira Cloud](../integrations/sources/jira.md), [Zoom](../integrations/sources/zoom.md)
**New sources:** [Intercom](../../integrations/sources/intercom.md), [Mixpanel](../../integrations/sources/mixpanel.md), [Jira Cloud](../../integrations/sources/jira.md), [Zoom](../../integrations/sources/zoom.md)
## 12/07/2020
**New sources:** [Slack](../integrations/sources/slack.md), [Braintree](../integrations/sources/braintree.md), [Zendesk Support](../integrations/sources/zendesk-support.md)
**New sources:** [Slack](../../integrations/sources/slack.md), [Braintree](../../integrations/sources/braintree.md), [Zendesk Support](../../integrations/sources/zendesk-support.md)
## 12/04/2020
**New sources:** [Redshift](../integrations/sources/redshift.md), [Greenhouse](../integrations/sources/greenhouse.md)
**New destination:** [Redshift](../integrations/destinations/redshift.md)
**New sources:** [Redshift](../../integrations/sources/redshift.md), [Greenhouse](../../integrations/sources/greenhouse.md)
**New destination:** [Redshift](../../integrations/destinations/redshift.md)
## 11/30/2020
**New sources:** [Freshdesk](../integrations/sources/freshdesk.md), [Twilio](../integrations/sources/twilio.md)
**New sources:** [Freshdesk](../../integrations/sources/freshdesk.md), [Twilio](../../integrations/sources/twilio.md)
## 11/25/2020
**New source:** [Recurly](../integrations/sources/recurly.md)
**New source:** [Recurly](../../integrations/sources/recurly.md)
## 11/23/2020
**New source:** [Sendgrid](../integrations/sources/sendgrid.md)
**New source:** [Sendgrid](../../integrations/sources/sendgrid.md)
## 11/18/2020
**New source:** [Mailchimp](../integrations/sources/mailchimp.md)
**New source:** [Mailchimp](../../integrations/sources/mailchimp.md)
## 11/13/2020
**New source:** [MSSQL](../integrations/sources/mssql.md)
**New source:** [MSSQL](../../integrations/sources/mssql.md)
## 11/11/2020
**New source:** [Shopify](../integrations/sources/shopify.md)
**New source:** [Shopify](../../integrations/sources/shopify.md)
## 11/09/2020
**New sources:** [Files \(CSV, JSON, HTML...\)](../integrations/sources/file.md)
**New sources:** [Files \(CSV, JSON, HTML...\)](../../integrations/sources/file.md)
## 11/04/2020
**New sources:** [Facebook Ads](connectors.md), [Google Ads](../integrations/sources/google-adwords.md), [Marketo](../integrations/sources/marketo.md)
**New destination:** [Snowflake](../integrations/destinations/snowflake.md)
**New sources:** [Facebook Ads](connectors.md), [Google Ads](../../integrations/sources/google-adwords.md), [Marketo](../../integrations/sources/marketo.md)
**New destination:** [Snowflake](../../integrations/destinations/snowflake.md)
## 10/30/2020
**New sources:** [Salesforce](../integrations/sources/salesforce.md), [Google Analytics](../integrations/sources/googleanalytics.md), [Hubspot](../integrations/sources/hubspot.md), [GitHub](../integrations/sources/github.md), [Google Sheets](../integrations/sources/google-sheets.md), [Rest APIs](connectors.md), and [MySQL](../integrations/sources/mysql.md)
**New sources:** [Salesforce](../../integrations/sources/salesforce.md), [Google Analytics](../../integrations/sources/googleanalytics.md), [Hubspot](../../integrations/sources/hubspot.md), [GitHub](../../integrations/sources/github.md), [Google Sheets](../../integrations/sources/google-sheets.md), [Rest APIs](connectors.md), and [MySQL](../../integrations/sources/mysql.md)
## 10/21/2020
**New destinations:** we built our own connectors for [BigQuery](../integrations/destinations/bigquery.md) and [Postgres](../integrations/destinations/postgres.md), to ensure they are of the highest quality.
**New destinations:** we built our own connectors for [BigQuery](../../integrations/destinations/bigquery.md) and [Postgres](../../integrations/destinations/postgres.md), to ensure they are of the highest quality.
## 09/23/2020
**New sources:** [Stripe](../integrations/sources/stripe.md), [Postgres](../integrations/sources/postgres.md)
**New destinations:** [BigQuery](../integrations/destinations/bigquery.md), [Postgres](../integrations/destinations/postgres.md), [local CSV](../integrations/destinations/local-csv.md)
**New sources:** [Stripe](../../integrations/sources/stripe.md), [Postgres](../../integrations/sources/postgres.md)
**New destinations:** [BigQuery](../../integrations/destinations/bigquery.md), [Postgres](../../integrations/destinations/postgres.md), [local CSV](../../integrations/destinations/local-csv.md)

View File

@@ -162,7 +162,7 @@ If you're interested in our progress on the Airbyte platform, please read below!
* **Incremental - Append**
* We now allow sources to replicate only new or modified data. This avoids re-fetching data that you have already replicated from a source.
* The delta from a sync will be _appended_ to the existing data in the data warehouse.
* Here are [all the details of this feature](../architecture/connections/incremental-append.md).
* Here are [all the details of this feature](../../understanding-airbyte/connections/incremental-append.md).
* It has been released for 15 connectors, including Postgres, MySQL, Intercom, Zendesk, Stripe, Twilio, Marketo, Shopify, GitHub, and all the destination connectors. We will expand it to all the connectors in the next couple of weeks.
* **Other features:**
* Improve interface for writing python sources \(should make writing new python sources easier and clearer\).

View File

@@ -0,0 +1 @@
# Getting Started

View File

@@ -0,0 +1,13 @@
# Add a Destination
The destination we are creating is a simple JSON line file, meaning that it will contain one JSON object per line. Each object will represent data extracted from the source.
The resulting files will be located in `/tmp/airbyte_local/json_data`.
To set it up, just follow the instructions on the screenshot below.
{% hint style="info" %}
You might have to wait ~30 seconds before the fields show up because it is the first time you're using Airbyte.
{% endhint %}
![](../.gitbook/assets/demo_destination.png)

View File

@@ -0,0 +1,11 @@
# Add a Source
Our demo source will pull data from an external API. It will replicate the closing price of currencies compared to USD since the specified start date.
To set it up, just follow the instructions on the screenshot below.
{% hint style="info" %}
You might have to wait ~30 seconds before the fields show up because it is the first time you're using Airbyte.
{% endhint %}
![](../.gitbook/assets/demo_source.png)

View File

@@ -0,0 +1,12 @@
# Deploy Airbyte
* Install Docker on your workstation \(see [instructions](https://www.docker.com/products/docker-desktop)\). Note: There is a known issue with docker-compose 1.27.3. If you are using that version, please upgrade to 1.27.4.
* After Docker is installed, you can immediately get started locally by running:
```bash
git clone https://github.com/airbytehq/airbyte.git
cd airbyte
docker-compose up
```
* Once you see an Airbyte banner, the UI is ready to go at [http://localhost:8000](http://localhost:8000)!

View File

@@ -20,7 +20,7 @@ Once you see an Airbyte banner, the UI is ready to go at [http://localhost:8000/
You should see an onboarding page. Enter your email if you want updates about Airbyte and continue.
![](.gitbook/assets/airbyte_get-started.png)
![](../.gitbook/assets/airbyte_get-started.png)
## Set up your first connection
@@ -34,7 +34,7 @@ To set it up, just follow the instructions on the screenshot below.
You might have to wait ~30 seconds before the fields show up because it is the first time you're using Airbyte.
{% endhint %}
![](.gitbook/assets/demo_source.png)
![](../.gitbook/assets/demo_source.png)
### Create a destination
@@ -48,7 +48,7 @@ To set it up, just follow the instructions on the screenshot below.
You might have to wait ~30 seconds before the fields show up because it is the first time you're using Airbyte.
{% endhint %}
![](.gitbook/assets/demo_destination.png)
![](../.gitbook/assets/demo_destination.png)
### Create connection
@@ -56,7 +56,7 @@ When we create the connection, we can select which data stream we want to replic
To set it up, just follow the instructions on the screenshot below.
![](.gitbook/assets/demo_connection.png)
![](../.gitbook/assets/demo_connection.png)
## Check the logs of your first sync
@@ -64,7 +64,7 @@ After you've completed the onboarding, you will be redirected to the source list
From there, you can look at the logs, download them, force a sync and adjust the configuration of your connection.
![](.gitbook/assets/demo_history.png)
![](../.gitbook/assets/demo_history.png)
## Check the data of your first sync

View File

@@ -0,0 +1,42 @@
# Set up a Connection
When we create the connection, we can select which data stream we want to replicate. We can also select whether we want incremental replication. The replication will run at the specified sync frequency.
To set it up, just follow the instructions on the screenshot below.
![](../.gitbook/assets/demo_connection.png)
## Check the logs of your first sync
After you've completed the onboarding, you will be redirected to the source list, where you will see the source you just added. Click on it to find more information. You will then see all the destinations connected to that source; click on one to see the sync history.
From there, you can look at the logs, download them, force a sync and adjust the configuration of your connection.
![](../.gitbook/assets/demo_history.png)
## Check the data of your first sync
Now let's verify that this worked:
```bash
cat /tmp/airbyte_local/json_data/_airbyte_raw_exchange_rate.jsonl
```
You should see one line for each day that was replicated.
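If you are curious what a raw line looks like, peek at the first one. The `data` wrapper is what the `jq` example below relies on; the other keys shown are illustrative assumptions about the raw format:

```bash
head -n 1 /tmp/airbyte_local/json_data/_airbyte_raw_exchange_rate.jsonl
# e.g. {"data": {"date": "2021-04-01", "EUR": 0.8501, ...}, "emitted_at": 1619827200000}
```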
If you have [`jq`](https://stedolan.github.io/jq/) installed, let's look at the evolution of `EUR`.
```bash
cat /tmp/airbyte_local/json_data/_airbyte_raw_exchange_rate.jsonl |
jq -c '.data | {date: .date, EUR: .EUR }'
```
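If you only want the most recent rate, a small variation does the trick. A sketch: `-s` slurps the whole file into memory, which is fine for a file this size:

```bash
# keep only the latest day's EUR rate
jq -s 'map(.data) | sort_by(.date) | last | {date, EUR}' \
  /tmp/airbyte_local/json_data/_airbyte_raw_exchange_rate.jsonl
```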
And there you have it: you've pulled data from an API directly into a file, and all of the configuration for this replication took place in the UI.
## That's it!
This is just the beginning of using Airbyte. We support a large collection of sources and destinations. You can even contribute your own.
If you have any questions at all, please reach out to us on [Slack](https://slack.airbyte.io/). We're still in alpha, so if you see any rough edges or want to request a connector, please create an issue on our [GitHub](https://github.com/airbytehq/airbyte) or leave a thumbs up on an existing issue.
Thank you and we hope you enjoy using Airbyte.

docs/reference/README.md Normal file
View File

@@ -0,0 +1 @@
# Reference

View File

@@ -1,4 +1,4 @@
# Exploring Airbyte logs folder
# Browsing Output Logs
## Overview
@@ -72,7 +72,7 @@ normalize target_config.json
### Reading the content of the catalog.json file
For example, it is often useful to inspect the content of the [catalog](beginners-guide-to-catalog.md) file. You could do so by running a `cat` command:
For example, it is often useful to inspect the content of the [catalog](../contributing-to-airbyte/tutorials/beginners-guide-to-catalog.md) file. You could do so by running a `cat` command:
```bash
docker run -it --rm --volume airbyte_workspace:/data busybox cat /data/9/2/catalog.json
```
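The same pattern works for any file in the workspace volume, and piping through `jq` on the host makes the catalog easier to read. A sketch; the `9/2` job/attempt IDs come from the example above and will differ on your machine:

```bash
# pretty-print the catalog on the host (drop -it so the output can be piped)
docker run --rm --volume airbyte_workspace:/data busybox \
  cat /data/9/2/catalog.json | jq .
```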

View File

@@ -0,0 +1 @@
# Transformation and Normalization

View File

@@ -1,10 +1,10 @@
# Connecting EL with T using DBT \(part 2/2\)
# Transformations with DBT \(part 2/2\)
## Overview
This tutorial will describe how to integrate SQL-based transformations with Airbyte syncs using a specialized transformation tool: DBT.
This tutorial is the second part of the previous tutorial [Connecting EL with T using SQL](connecting-el-with-t-using-sql.md).
This tutorial is the second part; the first part is [Connecting EL with T using SQL](transformations-with-sql.md).
## Run Transformations with DBT
@@ -68,7 +68,7 @@ Completed successfully
Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
```
As seen in the tutorial on [exploring workspace folder](exploring-workspace-folder.md), it is possible to browse the `normalize` folder and examine further logs if an error occurs.
As seen in the tutorial on [browsing output logs](../browsing-output-logs.md), it is possible to browse the `normalize` folder and examine further logs if an error occurs.
In particular, we can also take a look at the DBT models generated by Airbyte and export them to the local host filesystem:
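The export can reuse the busybox pattern from the tutorial linked above. A minimal sketch, assuming the `airbyte_workspace` volume layout described there and a `5/0` job/attempt folder (yours will differ):

```bash
# copy the generated DBT models out of the workspace volume into ./models
docker run --rm -i -v airbyte_workspace:/data -v $(pwd):/local busybox \
  cp -r /data/5/0/normalize/models /local/models
```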

View File

@@ -1,10 +1,10 @@
# Connecting EL with T using SQL \(part 1/2\)
# Transformations with SQL \(part 1/2\)
## Overview
This tutorial will describe how to integrate SQL-based transformations with Airbyte syncs using plain SQL queries.
This is the first part of ELT tutorial. The second part goes deeper with [connecting EL with T using DBT](connecting-el-with-t-using-dbt.md).
This is the first part of the ELT tutorial. The second part goes deeper with [connecting EL with T using DBT](transformations-with-dbt.md).
## First transformation step: Normalization
@@ -12,13 +12,13 @@ At its core, Airbyte is geared to handle the EL \(Extract Load\) steps of an ELT
However, this actually produces a table in the destination with a JSON blob column... For the typical analytics use case, you probably want this JSON blob normalized so that each field is its own column.
So, after EL, comes the T \(transformation\) and the first T step that Airbyte actually applies on top of the extracted data is called "Normalization". You can find more information about it [here](../architecture/basic-normalization.md).
So, after EL, comes the T \(transformation\) and the first T step that Airbyte actually applies on top of the extracted data is called "Normalization". You can find more information about it [here](../../understanding-airbyte/basic-normalization.md).
Airbyte runs this step before handing the final data over to other tools that will manage further transformation down the line.
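To make the blob concrete: with a Postgres destination, the raw table keeps each record in a JSON column, and flattening a field is a one-line query. A sketch, assuming Airbyte's raw-table convention of an `_airbyte_data` JSON column and a table named after the stream (check your own table first):

```bash
# pull two fields out of the raw JSON blob (table and column names assumed)
psql -h localhost -U postgres -d postgres -c \
  "SELECT _airbyte_data->>'date' AS date, _airbyte_data->>'EUR' AS eur
   FROM _airbyte_raw_exchange_rate LIMIT 5;"
```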
To summarize, we can represent the ELT process in the diagram below. These are the steps that happen between your "Source Database or API" and the final "Replicated Tables", with implementation examples underneath:
![](../.gitbook/assets/connecting-EL-with-T-4.png)
![](../../.gitbook/assets/connecting-EL-with-T-4.png)
That said, it is possible to short-circuit this process \(no vendor lock-in\) and handle it yourself by turning this option off in the destination settings page.
@@ -30,7 +30,7 @@ This could be useful if:
In order to do so, we will now describe how you can leverage the basic normalization outputs that Airbyte generates to build your own transformations if you don't want to start from scratch.
Note: We will rely on docker commands that we've gone over as part of another [Tutorial on Exploring Docker Volumes](exploring-workspace-folder.md).
Note: We will rely on docker commands that we've gone over in the [tutorial on browsing output logs](../browsing-output-logs.md).
## \(Optional\) Configure a Covid \(data\) source and a Postgres destination
@@ -43,15 +43,15 @@ Here are some examples of public API CSV:
https://storage.googleapis.com/covid19-open-data/v2/latest/epidemiology.csv
```
![](../.gitbook/assets/connecting-EL-with-T-1.png)
![](../../.gitbook/assets/connecting-EL-with-T-1.png)
And a local Postgres Database, making sure that "Basic normalization" is enabled:
![](../.gitbook/assets/connecting-EL-with-T-2.png)
![](../../.gitbook/assets/connecting-EL-with-T-2.png)
After setting up the connectors, we can trigger the sync and study the logs:
![](../.gitbook/assets/connecting-EL-with-T-3.png)
![](../../.gitbook/assets/connecting-EL-with-T-3.png)
Notice that the process ran in the `/tmp/workspace/5/0` folder.
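You can peek inside that folder with the same busybox trick used in the output-logs tutorial; the `5/0` job/attempt IDs come from the log line above and will differ per sync:

```bash
docker run -it --rm --volume airbyte_workspace:/data busybox ls /data/5/0
```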
@@ -278,5 +278,5 @@ create view "postgres"."public"."covid_epidemiology" as (
Then you can run it in your preferred SQL editor or tool!
If you are familiar with DBT or want to learn more about it, you can continue with the following [tutorial using DBT](connecting-el-with-t-using-dbt.md)...
If you are familiar with DBT or want to learn more about it, you can continue with the following [tutorial using DBT](transformations-with-dbt.md)...

View File

@@ -16,7 +16,7 @@ First, make sure you have Docker installed. \(We'll be using the `docker-compose
### **Start Airbyte**
If this is your first time using Airbyte, we suggest going through our [Basic Tutorial](../getting-started.md). This tutorial will use the Connection set up in the basic tutorial.
If this is your first time using Airbyte, we suggest going through our [Basic Tutorial](../quickstart/getting-started.md). This tutorial will use the Connection set up in the basic tutorial.
For the purposes of this tutorial, set your Connection's **sync frequency** to **manual**. Airflow will be responsible for triggering the Airbyte job.

View File

@@ -0,0 +1,2 @@
# Understanding Airbyte

View File

@@ -245,6 +245,6 @@ As an example from the hubspot source, we could have the following tables with n
Note that all the naming choices made by Normalization, as described on this documentation page, can be overridden by your own custom choices. To do so, you can follow these tutorials:
* to build a [custom SQL view](../tutorials/connecting-el-with-t-using-sql.md) with your own naming conventions
* to export, edit and run [custom DBT normalization](../tutorials/connecting-el-with-t-using-dbt.md) yourself
* to build a [custom SQL view](../tutorials/transformation-and-normalization/transformations-with-sql.md) with your own naming conventions
* to export, edit and run [custom DBT normalization](../tutorials/transformation-and-normalization/transformations-with-dbt.md) yourself

View File

@@ -138,5 +138,5 @@ Those concerns could be solved by using a different sync mode based on binary lo
The current behavior of **Incremental** cannot yet handle source schema changes, for example, when a column is added, renamed, or deleted from an existing table. It is recommended to trigger a [Full refresh - Overwrite](full-refresh-overwrite.md) to correctly replicate the data to the destination with the new schema.
If you are not satisfied with how transformations are applied on top of the appended data, you can find more relevant SQL transformations you might need to do on your data in the [Connecting EL with T using SQL \(part 1/2\)](../../tutorials/connecting-el-with-t-using-sql.md#simple-sql-query)
If you are not satisfied with how transformations are applied on top of the appended data, you can find relevant SQL transformations for your data in the [Connecting EL with T using SQL \(part 1/2\)](../../tutorials/transformation-and-normalization/transformations-with-sql.md#simple-sql-query)

View File

@@ -168,5 +168,5 @@ Those concerns could be solved by using a different sync mode based on binary lo
The current behavior of **Incremental** cannot yet handle source schema changes, for example, when a column is added, renamed, or deleted from an existing table. It is recommended to trigger a [Full refresh - Overwrite](full-refresh-overwrite.md) to correctly replicate the data to the destination with the new schema.
If you are not satisfied with how transformations are applied on top of the appended data, you can find more relevant SQL transformations you might need to do on your data in the [Connecting EL with T using SQL \(part 1/2\)](../../tutorials/connecting-el-with-t-using-sql.md#simple-sql-query)
If you are not satisfied with how transformations are applied on top of the appended data, you can find relevant SQL transformations for your data in the [Connecting EL with T using SQL \(part 1/2\)](../../tutorials/transformation-and-normalization/transformations-with-sql.md#simple-sql-query)

View File

@@ -13,7 +13,7 @@ The worker has 4 main responsibilities in its lifecycle.
1. Spin up any connector docker containers that are needed for the job.
2. Facilitate message passing to or from a connector docker container (more on this [below](#message-passing)).
3. Shut down any connector docker containers that it started.
4. Return the output of the job. (See [Airbyte Specification](./airbyte-specification.md) to understand the output of each worker type.)
4. Return the output of the job. (See [Airbyte Specification](airbyte-specification.md) to understand the output of each worker type.)
## Message Passing
@@ -23,7 +23,7 @@ There are 2 flavors of workers:
In the first case, the worker is generally extracting data from the connector and reporting it back to the scheduler. It does this by listening to STDOUT of the connector. In the second case, the worker passes data (via record messages) from the source to the destination. It does this by listening on STDOUT of the source and writing to STDIN of the destination.
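Conceptually, that hand-off is just a Unix pipe between the two connector containers. A rough sketch using the `read`/`write` entrypoints from the Airbyte Specification; the image names and config paths are placeholders:

```bash
# the source emits record messages on STDOUT; the destination consumes them on STDIN
docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-exchange-rates \
    read --config /secrets/source.json --catalog /secrets/catalog.json \
  | docker run --rm -i -v $(pwd)/secrets:/secrets airbyte/destination-local-json \
    write --config /secrets/destination.json --catalog /secrets/catalog.json
```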
For more information on the schema of the messages that are passed, refer to [Airbyte Specification](./airbyte-specification.md).
For more information on the schema of the messages that are passed, refer to [Airbyte Specification](airbyte-specification.md).
## Worker Lifecycle