
Single node data plane documentation (#66176)

Ian Alton
2025-09-12 09:47:06 -07:00
committed by GitHub
parent 08d65fba26
commit 1fbd26b697
6 changed files with 413 additions and 69 deletions

View File

@@ -7,6 +7,9 @@
"MD033": false, // Disable the "no-inline-html" rule (allows inline HTML in Markdown)
"MD010": false, // Disable the "no-hard-tabs" rule (allows hard tabs instead of spaces)
"MD041": false, // Disable the "first-line-heading/first-line-h1" rule (allows files without a top-level heading as the first line)
"MD046": {
"style": "fenced" // Enforce fenced code blocks (e.g. ` or ```) and don't permit code sample indentation without fences
},
"MD004": {
"style": "dash" // Enforce "dash" style for unordered list items (e.g., "- item")
}

View File

@@ -0,0 +1,301 @@
---
products: enterprise-flex
sidebar_label: Deploy a data plane with Airbox
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Deploy a data plane with Airbox in Enterprise Flex
Airbox is Airbyte's command line tool for managing Airbyte data planes on Kubernetes. It's the ideal way to deploy and manage data planes for teams that have limited Kubernetes expertise or don't want to deploy with Helm.
At the end of this guide, you'll have an Airbyte workspace that runs connections using a self-managed data plane.
## Prerequisites
Before you begin, ensure you satisfy all of these requirements.
### Subscription and permission requirements
- An active subscription to Airbyte Enterprise Flex
- You must be an organization Admin to manage data planes
### Infrastructure requirements
- A single node on which to deploy your data plane. This can be a virtual machine from a cloud provider, a bare metal server, or even your local computer.
- Minimum specs: 8 CPUs and 16 GB of RAM
- Recommended specs: 8 CPUs and 24 GB of RAM
### Software requirements
- Docker Desktop or Docker Engine (installation is described below)
- To manage and monitor your data plane after installation, you should also install these command line tools. They aren't strictly necessary, but they're helpful.
- [Helm](https://helm.sh/)
- [kind](https://kind.sigs.k8s.io/)
- [kubectl](https://kubernetes.io/docs/tasks/tools/#kubectl)
### Security considerations
- Self-managed data planes require egress from your network to Airbyte's managed control plane.
- Self-managed data planes only send requests to the control plane; the control plane sends responses back, but never initiates requests to the data plane. See the egress check below.
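To spot-check egress from your network, you can verify that the host can reach the control plane API, which this guide later shows as `https://api.airbyte.com` (a minimal sketch):
```bash
# Expect an HTTP status line if egress to the control plane works
curl -sI https://api.airbyte.com | head -n 1
```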
### Workspaces
You should already have considered [what regions and workspaces](getting-started) you need to satisfy your compliance and data sovereignty needs.
## Part 1: Install Airbox
You can install Airbox [as a binary](https://github.com/airbytehq/abctl/releases/tag/airbox-v0.1.0-beta1). Downloads are available for Windows, Mac, and Linux.
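For example, on Linux you might install the binary like this (a sketch; the release asset name below is hypothetical, so check the releases page for the correct file for your platform):
```bash
# Hypothetical asset name; confirm the real file name on the releases page
curl -LO https://github.com/airbytehq/abctl/releases/download/airbox-v0.1.0-beta1/airbox-linux-amd64.tar.gz
tar -xzf airbox-linux-amd64.tar.gz
sudo mv airbox /usr/local/bin/
airbox --help  # confirm the binary runs
```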
## Part 2: Install Docker Desktop
Install Docker Desktop on the machine that will host your data plane. Follow the steps for your operating system in Docker's online help, linked below.
- [Mac](https://docs.docker.com/desktop/install/mac-install/)
- [Windows](https://docs.docker.com/desktop/install/windows-install/)
- [Linux](https://docs.docker.com/desktop/install/linux-install/) - If you're installing on a Linux headless virtual machine, it's easier to use [Docker Engine](https://docs.docker.com/engine/install/) instead of Docker Desktop.
You don't need to do anything with Docker, but you do need to run it in the background. Once it's open, minimize it and proceed to Part 3.
:::info Why do you need Docker?
Airbyte runs on Kubernetes. When you deploy your data plane, Airbyte uses Docker to create a Kubernetes cluster on the computer hosting the data plane.
:::
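If you're installing Docker Engine on a headless Linux machine, Docker's convenience script is a common shortcut (a sketch; review the script and Docker's documentation before using it on production hosts):
```bash
# Download and run Docker's convenience install script
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
# Optionally run Docker without sudo (log out and back in afterward)
sudo usermod -aG docker $USER
```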
## Part 3: Set credentials {#set-credentials}
You need an Airbyte application so Airbox can access your control plane. If you already have one, reuse its credentials. If you don't, create one and note the Client ID and Client Secret.
1. In Airbyte's UI, click your user name > **User Settings** > **Applications** > **Create an application**.
2. Enter a descriptive application name. For example, "Data plane deployment." Airbyte creates your application. Note the Client ID and Client Secret.
3. In your terminal, set the application credentials you created as environment variables.
```bash
export AIRBYTE_CLIENT_ID="<CLIENT_ID>"
export AIRBYTE_CLIENT_SECRET="<CLIENT_SECRET>"
```
## Part 4: Configure Airbox
After you set your client ID and client secret as environment variables, configure Airbox to access your cloud control plane.
1. Configure Airbox to interact with your Airbyte control plane.
```bash
airbox config init
```
2. Select **Enterprise Flex** and press <kbd>Enter</kbd>.
## Part 5: Authenticate with Airbyte
After configuring Airbox, but before you can manage data planes, you must authenticate with Airbyte. You can also log out and, if you work in multiple organizations, switch between them.
### Log in
Run the following command so Airbox can use the client ID and client secret you set earlier to authenticate with your Airbyte environment.
```bash
airbox auth login
```
You see the following result.
```bash
Authenticating with Airbyte
Connecting to: https://api.airbyte.com
Successfully authenticated!
```
Continue to Part 6.
### Log out
If you need to clear the authentication token Airbox uses to access your data plane, log out.
```bash
airbox auth logout
```
This doesn't remove the client ID and client secret from Airbyte. If you need to rotate credentials, you must also delete [your application](../using-airbyte/configuring-api-access).
### Switch organizations
1. If you use multiple Airbyte organizations, you can switch between them with the following command.
```bash
airbox auth switch-organization
```
If you belong to multiple organizations, Airbox shows you the list. If not, Airbox automatically selects your single organization again.
2. Choose the new organization you want to connect to and press <kbd>Enter</kbd>.
## Part 6: Deploy a data plane
After you authenticate with Airbyte, run the install command. This begins a process that creates a new kind cluster in Docker, registers the data plane with Airbyte's managed control plane, and deploys the data plane for use.
1. Install your data plane.
```bash
airbox install dataplane
```
2. Follow the prompts in the terminal.
1. Choose whether you want to create a new region or use an existing one (if you have some).
:::tip
To avoid confusion later, your regions in Airbyte should reflect the actual regions your data planes run in. For example, if you are installing this data plane in the AWS `us-west-1` region, you may wish to call it `us-west-1` or something similar.
:::
2. Name your data plane.
The process looks similar to this.
```bash
$ airbox install dataplane
Starting interactive dataplane installation
Select region option:
Use existing region
> Create new region
Enter new region name:
> us-west-1
Enter dataplane name:
> us-west-1-dataplane-1
Dataplane Credentials:
DataplaneID: <dataplane_ID>
ClientID: <client_ID>
ClientSecret: <client_secret>
Dataplane 'us-west-1-dataplane-1' installed successfully!
```
## Part 7: Assign a workspace to your data plane
If this data plane is in a new region, or you want a workspace to use this region now, follow these steps in Airbyte's UI.
1. Click **Workspace settings** > **General**.
2. Under **Region**, select the region you created that contains your data plane.
## Part 8: Verify your data plane is running correctly
Once you assign your workspace to your data plane, verify that the data plane runs syncs and creates pods correctly.
1. Create a connection.
1. Add the [Sample Data](/integrations/sources/faker) source, which generates non-sensitive sample data.
2. Add the [End-to-End Testing (/dev/null)](/integrations/destinations/dev-null) destination if you don't need to see the data. If you want to see the data in the destination, [Google Sheets](/integrations/destinations/google-sheets) is also a good option that's easy to set up.
3. [Create a connection](../move-data/add-connection) between that source and destination.
2. In your terminal, run `watch kubectl get po` or `kubectl get po -w`. This allows you to watch pods progress in your Kubernetes cluster.
3. In Airbyte's UI, start the sync.
4. Watch pods start, make progress, and complete. You should see something similar to this.
```bash
NAME READY STATUS RESTARTS AGE
us-west-1-airbyte-data-plane-c8858dd77-t55wn 1/1 Running 0 41m
replication-job-49346750-attempt-0 0/3 Completed 0 20m
source-faker-discover-49350414-0-cxrhx 0/2 Pending 0 0s
source-faker-discover-49350414-0-cxrhx 0/2 Pending 0 1s
source-faker-discover-49350414-0-cxrhx 0/2 Init:0/1 0 1s
source-faker-discover-49350414-0-cxrhx 0/2 Init:0/1 0 2s
source-faker-discover-49350414-0-cxrhx 0/2 PodInitializing 0 9s
source-faker-discover-49350414-0-cxrhx 2/2 Running 0 10s
source-faker-discover-49350414-0-cxrhx 1/2 NotReady 0 13s
source-faker-discover-49350414-0-cxrhx 0/2 Completed 0 19s
replication-job-49350414-attempt-0 0/3 Pending 0 0s
replication-job-49350414-attempt-0 0/3 Pending 0 0s
replication-job-49350414-attempt-0 0/3 Init:0/1 0 0s
replication-job-49350414-attempt-0 0/3 Init:0/1 0 1s
source-faker-discover-49350414-0-cxrhx 0/2 Completed 0 20s
replication-job-49350414-attempt-0 0/3 PodInitializing 0 17s
replication-job-49350414-attempt-0 3/3 Running 0 18s
replication-job-49350414-attempt-0 2/3 NotReady 0 31s
replication-job-49346750-attempt-0 0/3 Completed 0 29m
replication-job-49346750-attempt-0 0/3 Completed 0 29m
source-faker-discover-49350414-0-cxrhx 0/2 Completed 0 12m
source-faker-discover-49350414-0-cxrhx 0/2 Completed 0 12m
```
5. In Airbyte's UI, ensure the sync completes and populates the expected number of records, based on your settings for the Sample Data source.
<!-- ## Manage existing data planes
Once a data plane is running, you can manage and delete it if necessary.
### List all data planes
```bash
airbox get dataplane
```
### Delete a data plane
The following command removes the data plane registration from Airbyte.
```bash
airbox delete dataplane <dataplane_id>
```
However, the data plane continues to exist. To complete deletion, use Helm to uninstall the release from Kubernetes.
1. Find your data plane's release and namespace.
```bash
helm list -A
```
This returns something similar to this.
```bash
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
us-west-1-data-plane default 1 2025-09-09 16:36:12.282242 -0700 PDT deployed airbyte-data-plane-1.8.1 1.8.1
```
2. Uninstall the release. From the preceding result, use `NAME` as the `release-name` and `NAMESPACE` as the `namespace`.
```bash
helm uninstall <release-name> -n <namespace>
```
3. If this data plane belonged to a region you no longer need, you can also delete the region with Airbyte's API.
```curl
curl -X DELETE https://api.airbyte.com/v1/regions/{regionId} \
--header "Authorization: Bearer $TOKEN"
```-->
## Where Airbox stores configuration files
Airbox stores configuration data in `~/.airbyte/airbox/config.yaml`. This includes:
- Authentication credentials
- Context settings
- Organization and workspace IDs
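Because this file holds credentials, it's worth restricting access to it. For example (a minimal sketch):
```bash
# View the current Airbox configuration
cat ~/.airbyte/airbox/config.yaml
# Restrict access, since the file contains authentication credentials
chmod 600 ~/.airbyte/airbox/config.yaml
```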
## Restart a data plane
As long as Docker Desktop is running in the background, your data plane remains available. If you quit Docker Desktop or restart your virtual machine and want to restore your data plane, start Docker Desktop again. Once your containers are running, your data plane can resume work.
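To confirm the data plane came back after a restart, you can check that the kind cluster's container and the Airbyte pods are running (assumes `kubectl` is pointed at the kind cluster):
```bash
# The kind cluster runs as a Docker container
docker ps
# Data plane pods should return to Running
kubectl get pods -A
```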
## Values.yaml not currently supported
Airbox doesn't currently support deployment customization with values.yaml files.

View File

@@ -1,21 +1,23 @@
---
products: enterprise-flex
sidebar_label: Deploy a data plane with Helm
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Deploy a data plane with Helm in Enterprise Flex
Airbyte Enterprise Flex customers can use Airbyte's public API to define regions and create independent data planes that operate in those regions. This ensures you're satisfying your data residency and governance requirements with a single Airbyte Cloud deployment, and it can help you reduce data egress costs with cloud providers.
![Stylized diagram showing a control plane above multiple data planes in different global regions](img/data-planes.png)
## How it works
If you're not familiar with Kubernetes, think of the control plane as the brain and data planes as the muscles doing work the brain tells them to do.
- The control plane is responsible for Airbyte's user interface, APIs, Terraform provider, and orchestrating work. Airbyte manages this for you in the cloud, reducing the time and resources it takes to start moving your data.
- The data plane initiates jobs, syncs data, completes jobs, and reports its status back to the control plane. We offer [cloud regions](https://docs.airbyte.com/platform/cloud/managing-airbyte-cloud/manage-data-residency) equipped to do this for you, but you also have the flexibility to deploy your own to keep sensitive data protected or meet local data residency requirements.
This separation of duties is what allows a single Airbyte deployment to ensure your data remains segregated and compliant.
@@ -35,11 +37,73 @@ If you have not already, ensure you have the [required infrastructure](https://d
Before you begin, make sure you've completed the following:
- You must be an Organization Administrator to manage regions and data planes.
- You need a Kubernetes cluster on which your data plane can run. For example, if you want your data plane to run in eu-west-1, create an EKS cluster in eu-west-1 (see the eksctl sketch after this list).
- You need to use a [secrets manager](https://docs.airbyte.com/platform/deploying-airbyte/integrations/secrets) for the connections on your data plane. Modifying the configuration of connector secret storage will cause all existing connectors to fail, so we recommend only using newly created workspaces on the data plane.
- If you haven't already, get access to Airbyte's API by creating an application and generating an access token. For help, see [Configuring API access](https://docs.airbyte.com/platform/using-airbyte/configuring-api-access).
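For example, a minimal EKS cluster for a data plane might look like this with [eksctl](https://eksctl.io/) (a sketch; the cluster name is hypothetical, and a production cluster needs sizing and networking decisions beyond this):
```bash
# Hypothetical cluster name; eu-west-1 matches the region example above
eksctl create cluster \
  --name airbyte-data-plane \
  --region eu-west-1 \
  --nodes 2
```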
### Infrastructure prerequisites
For a production-ready deployment of self-managed data planes, you need the following infrastructure components. Airbyte recommends deploying to Amazon EKS, Google Kubernetes Engine, or Azure Kubernetes Service.
<Tabs>
<TabItem value="Amazon" label="Amazon" default>
| Component | Recommendation |
| ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Kubernetes Cluster | Amazon EKS cluster running on EC2 instances in [2 or more availability zones](https://docs.aws.amazon.com/eks/latest/userguide/disaster-recovery-resiliency.html). |
| External Secrets Manager | [AWS Secrets Manager](/platform/operator-guides/configuring-airbyte#secrets) for storing connector secrets, using a dedicated Airbyte role with a [policy with all required permissions](/platform/enterprise-setup/implementation-guide#aws-secret-manager-policy). |
| Object Storage (Optional)| Amazon S3 bucket with a directory for log storage. |
</TabItem>
</Tabs>
A few notes on Kubernetes cluster provisioning for self-managed data planes and Airbyte Enterprise Flex:
- We support Amazon Elastic Kubernetes Service (EKS) on EC2, Google Kubernetes Engine (GKE) on Google Compute Engine (GCE), or Azure Kubernetes Service (AKS) on Azure.
- While we support GKE Autopilot, we do not support Amazon EKS on Fargate.
We require you to install and configure the following Kubernetes tooling:
1. Install `helm` by following [these instructions](https://helm.sh/docs/intro/install/).
2. Install `kubectl` by following [these instructions](https://kubernetes.io/docs/tasks/tools/).
3. Configure `kubectl` to connect to your cluster by using `kubectl config use-context my-cluster-name`:
<details>
<summary>Configure kubectl to connect to your cluster</summary>
<Tabs>
<TabItem value="Amazon EKS" label="Amazon EKS" default>
1. Configure your [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html) to connect to your project.
2. Install [eksctl](https://eksctl.io/introduction/).
3. Run `eksctl utils write-kubeconfig --cluster=$CLUSTER_NAME` to make the context available to kubectl.
4. Use `kubectl config get-contexts` to show the available contexts.
5. Run `kubectl config use-context $EKS_CONTEXT` to access the cluster with kubectl.
</TabItem>
<TabItem value="GKE" label="GKE">
1. Configure `gcloud` with `gcloud auth login`.
2. On the Google Cloud Console, the cluster page will have a "Connect" button, with a command to run locally: `gcloud container clusters get-credentials $CLUSTER_NAME --zone $ZONE_NAME --project $PROJECT_NAME`.
3. Use `kubectl config get-contexts` to show the available contexts.
4. Run `kubectl config use-context $GKE_CONTEXT` to access the cluster with kubectl.
</TabItem>
</Tabs>
</details>
We also require you to create a Kubernetes namespace for your Airbyte deployment:
```bash
kubectl create namespace airbyte
```
## 1. Create a region {#step-1}
The first step is to create a region. Regions are objects that contain data planes and that you associate with workspaces.
@@ -159,7 +223,7 @@ kind: Secret
metadata:
name: airbyte-config-secrets
type: Opaque
stringData:
# Insert the data plane credentials received in step 2
DATA_PLANE_CLIENT_ID: your-data-plane-client-id
DATA_PLANE_CLIENT_SECRET: your-data-plane-client-secret
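After filling in the credentials, apply the manifest to the namespace you created earlier (a sketch; the file name is hypothetical):
```bash
kubectl apply -f airbyte-config-secrets.yaml -n airbyte
```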

View File

@@ -1,88 +1,62 @@
---
products: enterprise-flex
sidebar_label: Get started
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Get started with Enterprise Flex
Any Airbyte Cloud environment can upgrade to Enterprise Flex. To learn more about upgrading, [talk to sales](https://airbyte.com/company/talk-to-sales).
You'll likely use a combination of managed and self-managed data planes. Since Airbyte sets up managed data planes for you, they're preferable when they're an option. Limit self-managed data planes to connections that require your own infrastructure.
**If you are not using any self-managed data planes, then no additional infrastructure is required to begin creating connections and running syncs.**
## Determine which regions you need
Think about the data you need to sync and your data sovereignty and compliance requirements. Generally, you want to consider the following:
- What data you want to sync, where it's stored today, and where you want it to be stored after syncing.
- Your national, sub-national, industry, and organization compliance and privacy requirements.
- Your data sovereignty needs.
- Your organization's security posture and data handling policies.
Based on this assessment, compile a list of the regions you need.
## Create a workspace for each region
Each workspace uses a single region, so create one workspace for each region. A good starting pattern is to create one managed workspace for non-sensitive data without compliance and sovereignty requirements, and an additional workspace for each region your connections need to run in. For example, create one workspace to handle U.S. data, another to handle Australian data, and so on.
## Managed data planes
Managed data planes need no additional infrastructure. Begin adding sources, destinations, and connections in your workspace at your convenience.
## Self-managed data planes
The following diagram illustrates a typical Enterprise Flex deployment running a self-managed data plane.
![Airbyte Enterprise Flex Architecture Diagram](./img/enterprise-flex-architecture.png)
You can deploy a self-managed data plane in Airbyte in two ways.
- **Deploy with Helm**: A more traditional Kubernetes deployment using the [Helm](https://helm.sh/) package manager. This method deploys your data plane to a Kubernetes cluster like an Amazon EKS cluster. It's the right choice for teams that have in-house Kubernetes expertise. [**Deploy with Helm >**](data-plane)
- **Deploy with Airbox**: Airbox is Airbyte's utility for simplified, single-node data plane deployments, like on a virtual machine. This utility abstracts away most of the nuance of a Kubernetes deployment. It's the right choice for teams with limited Kubernetes expertise. [**Deploy with Airbox >**](data-plane-util)
### Limitations and considerations
- While data planes process data in their respective regions, some metadata remains in the control plane.
- Airbyte stores Cursor and Primary Key data in the control plane regardless of data plane location. If you have data that you can't store in the control plane, don't use it as a cursor or primary key.
- The Connector Builder processes all data through the control plane, regardless of workspace settings. This limitation applies to the development and testing phase only; published connectors respect workspace data residency settings during syncs.
- If you want to run multiple data planes in the same region for higher availability, both must be part of the same region in Airbyte and use the same secrets manager to ensure connection credentials are the same.
- Data planes and the control plane must be configured to use the same secrets manager.
- This ensures that when you enter credentials in the UI, they are written to the secrets manager and available to the data plane when running syncs.
- Data planes must be able to communicate with the control plane.
- Data planes only send requests to the control plane and never require inbound requests.

View File

@@ -10,6 +10,7 @@ Cloud
Cloud Teams
Enterprise Flex
abctl
Airbox
PyAirbyte
[Nn]amespace
Connector Builder

View File

@@ -403,7 +403,8 @@ module.exports = {
},
items: [
"enterprise-flex/getting-started",
"enterprise-flex/data-plane"
"enterprise-flex/data-plane",
"enterprise-flex/data-plane-util",
],
},
{