
GitBook: [master] 186 pages and 77 assets modified

Abhi Vaidyanatha
2021-10-08 21:17:47 +00:00
committed by gitbook-bot
parent c8c9905b35
commit ae32ecbb27
228 changed files with 3139 additions and 2834 deletions


@@ -26,10 +26,9 @@ We recommend following [this guide](https://docs.docker.com/docker-for-windows/i
**I have a Mac with the M1 chip. Is it possible to run Airbyte?**
Some users with Macs on the M1 chip have had problems running Airbyte. The problem is related to the chip and Docker. [Issue #2017](https://github.com/airbytehq/airbyte/issues/2017) tracks the problem; you can subscribe to it to get updates about the resolution. If you can successfully run Airbyte on an M1 MacBook, let us know so that we can share the process with the community!
**Other issues**
If you encounter any issues, just connect to our [Slack](https://slack.airbyte.io). Our community will help! We also have a [troubleshooting](../troubleshooting/on-deploying.md) section in our docs for common problems.


@@ -11,7 +11,7 @@ Airbyte Cloud requires no setup and can be immediately run from your web browser
If you don't have an invite, sign up [here!](https://airbyte.io/cloud-waitlist)
**2. Click on the default workspace.**
You will be provided 1000 credits to get your first few syncs going!
![](../.gitbook/assets/cloud_onboarding.png)
@@ -21,3 +21,4 @@ You will be provided 1000 credits to get your first few syncs going!
![](../.gitbook/assets/cloud_connection_onboarding.png)
**4. You're done!**


@@ -1,14 +1,16 @@
# On Kubernetes (Beta)
## Overview
Airbyte allows scaling sync workloads horizontally using Kubernetes. The core components (API server, scheduler, etc.) run as deployments while the scheduler launches connector-related pods on different nodes.
## Getting Started
### Cluster Setup
For local testing we recommend one of the following setup guides:
* [Docker Desktop (Mac)](https://docs.docker.com/desktop/kubernetes/)
* [Minikube](https://minikube.sigs.k8s.io/docs/start/)
* NOTE: Start Minikube with at least 4GB of RAM using `minikube start --memory=4000`
* [Kind](https://kind.sigs.k8s.io/docs/user/quick-start/)
@@ -17,8 +19,7 @@ For testing on GKE you can [create a cluster with the command line or the Cloud
For testing on EKS you can [install eksctl](https://eksctl.io/introduction/) and run `eksctl create cluster` to create an EKS cluster/VPC/subnets/etc. This process should take 10-15 minutes.
For production, Airbyte should function on most clusters v1.19 and above. We have tested support on GKE and EKS. If you run into a problem starting Airbyte, please reach out on the `#troubleshooting` channel on our [Slack](https://slack.airbyte.io/) or [create an issue on GitHub](https://github.com/airbytehq/airbyte/issues/new?assignees=&labels=type%2Fbug&template=bug-report.md&title=).
### Install `kubectl`
@@ -31,7 +32,9 @@ Configure `kubectl` to connect to your cluster by using `kubectl use-context my-
* For GKE
* Configure `gcloud` with `gcloud auth login`.
* On the Google Cloud Console, the cluster page will have a `Connect` button, which will give a command to run locally that looks like
`gcloud container clusters get-credentials CLUSTER_NAME --zone ZONE_NAME --project PROJECT_NAME`.
* Use `kubectl config get-contexts` to show the contexts available.
* Run `kubectl config use-context <gke context>` to access the cluster from `kubectl` (a worked example follows this list).
* For EKS
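For instance, a GKE context-switch session might look like the following sketch; the cluster, zone, and project names are placeholders:

```bash
# Authenticate and fetch credentials for the cluster (names are placeholders).
gcloud auth login
gcloud container clusters get-credentials my-cluster --zone us-central1-c --project my-project

# List the available contexts and switch to the GKE one.
kubectl config get-contexts
kubectl config use-context gke_my-project_us-central1-c_my-cluster
```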
@@ -43,16 +46,16 @@ Configure `kubectl` to connect to your cluster by using `kubectl use-context my-
### Configure Logs
Both `dev` and `stable` versions of Airbyte include a stand-alone `Minio` deployment. Airbyte publishes logs to this `Minio` deployment by default. This means Airbyte comes as a **self-contained Kubernetes deployment - no other configuration is required**.
Airbyte currently supports logging to `Minio`, `S3` or `GCS`. The following instructions are for users wishing to log to their own `Minio` layer, `S3` bucket or `GCS` bucket.
The provided credentials require both read and write permissions. The logger attempts to create the log bucket if it does not exist.
#### Configuring Custom Minio Log Location
Replace the following variables in the `.env` file in the `kube/overlays/stable` directory:
```text
# The Minio bucket to write logs in.
S3_LOG_BUCKET=
@@ -63,10 +66,13 @@ AWS_SECRET_ACCESS_KEY=
# Endpoint where Minio is deployed at.
S3_MINIO_ENDPOINT=
```
The `S3_PATH_STYLE_ACCESS` variable should remain `true`. The `S3_LOG_BUCKET_REGION` variable should remain empty.
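After editing the `.env` file, re-apply the overlay so the new values take effect. A minimal sketch, assuming the `stable` overlay and that the core pods need a restart to pick up environment changes:

```bash
# Re-apply the overlay so the updated .env values are used.
kubectl apply -k kube/overlays/stable

# Restart the core deployments so they pick up the new environment.
kubectl rollout restart deployment airbyte-server airbyte-scheduler
```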
#### Configuring Custom S3 Log Location
Replace the following variables in the `.env` file in the `kube/overlays/stable` directory:
```text
# The S3 bucket to write logs in.
S3_LOG_BUCKET=
@@ -82,20 +88,22 @@ S3_MINIO_ENDPOINT=
S3_PATH_STYLE_ACCESS=
```
See [here](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html) for instructions on creating an S3 bucket and [here](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys) for instructions on creating AWS credentials.
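As an illustration, a filled-in S3 configuration might look like the following; all values are placeholders, and the assumption here is that the Minio-specific variables are left empty when logging directly to S3:

```text
# Example values for logging to a custom S3 bucket (illustrative only).
S3_LOG_BUCKET=my-airbyte-logs
S3_LOG_BUCKET_REGION=us-west-2
AWS_ACCESS_KEY_ID=<your-access-key-id>
AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
# Left empty because logs go directly to S3, not Minio.
S3_MINIO_ENDPOINT=
S3_PATH_STYLE_ACCESS=
```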
#### Configuring Custom GCS Log Location
Create the GCP service account with read/write permission to the GCS log bucket.
1) Base64 encode the GCP json secret.
```text
# The output of this command will be a Base64 string.
$ cat gcp.json | base64
```
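Note that GNU `base64` wraps its output at 76 characters by default, which can corrupt the secret when pasted; a hedged suggestion (not from the original guide) is to disable wrapping:

```bash
# Produce a single-line Base64 string (GNU coreutils).
base64 -w 0 gcp.json
```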
2) Populate the gcs-log-creds secrets with the Base64-encoded credential. This is as simple as taking the encoded credential from the previous step and adding it to the `secret-gcs-log-creds.yaml` file.
```text
apiVersion: v1
kind: Secret
metadata:
@@ -105,28 +113,28 @@ data:
gcp.json: <base64-encoded-string>
```
3) Replace the following variables in the `.env` file in the `kube/overlays/stable` directory:
```text
# The GCS bucket to write logs in.
GCP_STORAGE_BUCKET=
# The path the GCS creds are written to. Unless you know what you are doing, use the below default value.
GOOGLE_APPLICATION_CREDENTIALS=/secrets/gcs-log-creds/gcp.json
```
See [here](https://cloud.google.com/storage/docs/creating-buckets) for instructions on creating a GCS bucket and [here](https://cloud.google.com/iam/docs/creating-managing-service-account-keys#iam-service-account-keys-create-console) for instructions on creating GCP credentials.
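As an optional sanity check (not part of the original guide), you can verify that the service account can reach the bucket; the bucket name is a placeholder:

```bash
# Authenticate as the service account and list the log bucket.
gcloud auth activate-service-account --key-file=gcp.json
gsutil ls gs://my-airbyte-logs/
```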
### Launch Airbyte
Run the following commands to launch Airbyte:
```text
git clone https://github.com/airbytehq/airbyte.git
cd airbyte
kubectl apply -k kube/overlays/stable
```
After 2-5 minutes, `kubectl get pods | grep airbyte` should show `Running` as the status for all the core Airbyte pods. This may take longer on Kubernetes clusters with slow internet connections.
Run `kubectl port-forward svc/airbyte-webapp-svc 8000:80` to allow access to the UI/API.
@@ -147,50 +155,40 @@ Now visit [http://localhost:8000](http://localhost:8000) in your browser and sta
### Increasing job parallelism
The number of simultaneous jobs (getting specs, checking connections, discovering schemas, and performing syncs) is limited by a few factors. First of all, the `SUBMITTER_NUM_THREADS` variable (set in the `.env` file for your Kustomization overlay) provides a global limit on the number of simultaneous jobs that can run across all worker pods.
The number of worker pods can be changed by increasing the number of replicas for the `airbyte-worker` deployment. An example of a Kustomization patch that increases this number can be seen in `airbyte/kube/overlays/dev-integration-test/kustomization.yaml` and `airbyte/kube/overlays/dev-integration-test/parallelize-worker.yaml`. The number of simultaneous jobs on a specific worker pod is also limited by the number of ports exposed by the worker deployment and set by `TEMPORAL_WORKER_PORTS` in your `.env` file. Without additional ports used to communicate to connector pods, jobs will start to run but will hang until ports become available.
You can also tune the maximum number of simultaneous jobs of each type that can run on the worker pod by setting `MAX_SPEC_WORKERS`, `MAX_CHECK_WORKERS`, `MAX_DISCOVER_WORKERS`, and `MAX_SYNC_WORKERS` on the worker pod deployment (not in the `.env` file). These values are useful if you want to create separate worker deployments for separate types of workers with different resource allocations.
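As a quick experiment, you can bump the worker replica count directly; note that manual scaling is overwritten the next time you apply the overlay, and the replica count here is illustrative:

```bash
# Temporarily scale the worker deployment (overwritten on the next kustomize apply).
kubectl scale deployment airbyte-worker --replicas=3

# Confirm the new worker pods come up.
kubectl get pods | grep airbyte-worker
```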
### Cloud logging
Airbyte writes logs to two directories. App logs, including server and scheduler logs, are written to the `app-logging` directory. Job logs are written to the `job-logging` directory. Both directories live at the top level, e.g., the `app-logging` directory lives at `s3://log-bucket/app-logging`. These paths can change, so we recommend using a dedicated log bucket and not using it for other purposes.
Airbyte publishes logs every minute, so it is normal to see minute-long log delays. Each publish creates its own log file, since cloud storage services do not support append operations. This also means it is normal to see hundreds of files in your log bucket.
Each log file is named `{yyyyMMddHH24mmss}_{podname}_{UUID}` and is not compressed. Users can view logs simply by navigating to the relevant folder and downloading the file for the time period in question.
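For example, with an S3 log bucket you can browse the published files from the AWS CLI; the bucket name is a placeholder:

```bash
# List app log files published by the server and scheduler.
aws s3 ls s3://my-airbyte-logs/app-logging/

# Download a specific job log file for offline inspection.
aws s3 cp s3://my-airbyte-logs/job-logging/<path-to-log-file> .
```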
See the [Known Issues](on-kubernetes.md#known-issues) section for planned logging improvements.
### Using an external DB
After [Issue #3605](https://github.com/airbytehq/airbyte/issues/3605) is completed, users will be able to configure custom databases instead of a simple `postgres` container running directly in Kubernetes. This separate instance (preferably on a managed service like AWS RDS or Google Cloud SQL) should be easier and safer to maintain than Postgres on your cluster.
## Known Issues
As we improve our Kubernetes offering, we would like to point out some common pain points. We are working on improving these. Please let us know if there are any other issues blocking your adoption of Airbyte or if you would like to contribute fixes to address any of these issues.
* Some UI operations have higher latency on Kubernetes than Docker-Compose. ([#4233](https://github.com/airbytehq/airbyte/issues/4233))
* Logging to Azure Storage is not supported. ([#4200](https://github.com/airbytehq/airbyte/issues/4200))
* Large log files might take a while to load. ([#4201](https://github.com/airbytehq/airbyte/issues/4201))
* UI does not include configured buckets in the displayed log path. ([#4204](https://github.com/airbytehq/airbyte/issues/4204))
* Logs are not reset when Airbyte is re-deployed. ([#4235](https://github.com/airbytehq/airbyte/issues/4235))
* File sources reading from and file destinations writing to local mounts are not supported on Kubernetes.
## Customizing Airbyte Manifests
We use [Kustomize](https://kustomize.io/) to allow overrides for different environments. Our shared resources are in the `kube/resources` directory, and we define overlays for each environment. We recommend creating your own overlay if you want to customize your deployments. This overlay can live in your own VCS.
Example `kustomization.yaml` file:
@@ -204,8 +202,7 @@ bases:
### View Raw Manifests
For a specific overlay, you can run `kubectl kustomize kube/overlays/stable` to view the manifests that Kustomize will apply to your Kubernetes cluster. This is useful for debugging because it will show the exact resources you are defining.
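Two related commands can help when debugging an overlay; this is a sketch, and `kubectl diff -k` assumes a reachable cluster:

```bash
# Render the full manifest set for the stable overlay without applying it.
kubectl kustomize kube/overlays/stable

# Show how the rendered manifests differ from what is currently deployed.
kubectl diff -k kube/overlays/stable
```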
### Helm Charts
@@ -214,41 +211,47 @@ Check out the [Helm Chart Readme](https://github.com/airbytehq/airbyte/tree/mast
## Operator Guide
### View API Server Logs
Run `kubectl logs deployments/airbyte-server` to view real-time logs. Logs can also be downloaded as a text file via the Admin tab in the UI.
### View Scheduler or Job Logs
Run `kubectl logs deployments/airbyte-scheduler` to view real-time logs. Logs can also be downloaded as a text file via the Admin tab in the UI.
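To stream logs continuously instead of printing a snapshot, `kubectl logs` supports the `-f` flag:

```bash
# Follow server logs in real time.
kubectl logs -f deployments/airbyte-server

# Follow scheduler logs in real time.
kubectl logs -f deployments/airbyte-scheduler
```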
### Connector Container Logs
Although all logs can be accessed by viewing the scheduler logs, connector container logs may be easier to understand when isolated by accessing them from the Airbyte UI or the [Airbyte API](../api-documentation.md) for a specific job attempt. Connector pods launched by Airbyte will not relay logs directly to Kubernetes logging. You must access these logs through Airbyte.
### Upgrading Airbyte Kube
See [Upgrading K8s](../operator-guides/upgrading-airbyte.md).
### Resizing Volumes
To resize a volume, change the `.spec.resources.requests.storage` value. After re-applying, the mount should be extended if that operation is supported for your type of mount. For a production deployment, it's useful to track the usage of volumes to ensure they don't run out of space.
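One way to track usage is to check disk consumption from inside a pod that mounts the volume; a sketch, with the pod name as a placeholder:

```bash
# Check free space on the workspace mount from inside the scheduler pod.
kubectl exec -it <airbyte-scheduler-pod> -- df -h /tmp/workspace
```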
### Copy Files To/From Volumes
See the documentation for [`kubectl cp`](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#cp).
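For example, copying a job log out of the scheduler pod might look like this; the pod name and path are illustrative:

```bash
# Copy a job log file from the scheduler pod to the current directory.
kubectl cp airbyte-scheduler-6b5747df5c-bj4fx:/tmp/workspace/8/0/logs.log ./logs.log
```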
### Listing Files
```bash
kubectl exec -it airbyte-scheduler-6b5747df5c-bj4fx -- ls /tmp/workspace/8
```
### Reading Files
```bash
kubectl exec -it airbyte-scheduler-6b5747df5c-bj4fx -- cat /tmp/workspace/8/0/logs.log
```
### Persistent storage on GKE regional cluster
Running Airbyte on a GKE regional cluster requires enabling persistent regional storage. To do so, enable the [CSI driver](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/gce-pd-csi-driver) on GKE. After enabling it, add `storageClassName: standard-rwo` to the [volume-configs](https://github.com/airbytehq/airbyte/tree/86ee2ad05bccb4aca91df2fb07c412efde5ba71c/kube/resources/volume-configs.yaml) yaml.
`volume-configs.yaml` example:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
@@ -266,8 +269,10 @@ spec:
```
## Troubleshooting
If you run into any problems operating Airbyte on Kubernetes, please reach out on the `#issues` channel on our [Slack](https://slack.airbyte.io/) or [create an issue on GitHub](https://github.com/airbytehq/airbyte/issues/new?assignees=&labels=type%2Fbug&template=bug-report.md&title=).
## Developing Airbyte on Kubernetes
[Read about the Kubernetes dev cycle!](https://docs.airbyte.io/contributing-to-airbyte/developing-on-kubernetes)


@@ -1,25 +1,26 @@
# On Oracle Cloud Infrastructure VM
This guide walks through installing Airbyte on an Oracle Cloud Infrastructure VM running Oracle Linux 7.
## Create OCI Instance
Go to OCI Console > Compute > Instances > Create Instance
![](../.gitbook/assets/OCIScreen1.png)
![](../.gitbook/assets/OCIScreen2.png)
## Whitelist Port 8000 for a CIDR range in Security List of OCI VM Subnet
Go to OCI Console > Networking > Virtual Cloud Network
Select the Subnet > Security List > Add Ingress Rules
![](../.gitbook/assets/OCIScreen3.png)
## Log in to the Instance/VM with the SSH key and the 'opc' user
```text
chmod 600 private-key-file
ssh -i private-key-file opc@oci-private-instance-ip -p 2200
@@ -37,52 +38,48 @@ sudo service docker start
sudo usermod -a -G docker $USER
### Install Docker Compose
sudo wget https://github.com/docker/compose/releases/download/1.26.2/docker-compose-$(uname -s)-$(uname -m) -O /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
docker-compose --version
### Install Airbyte
mkdir airbyte && cd airbyte
wget https://raw.githubusercontent.com/airbytehq/airbyte/master/{.env,docker-compose.yaml}
which docker-compose
sudo /usr/local/bin/docker-compose up -d
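As a quick sanity check (not part of the original steps), confirm the containers came up:

```bash
# All Airbyte containers should show a status of Up.
sudo docker ps
```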
## Create SSH Tunnel to Login to the Instance
(It is highly recommended not to have a public IP on the instance where you are running Airbyte.)
### SSH Local Port Forward to Airbyte VM
From your local workstation:
```text
ssh opc@bastion-host-public-ip -i <private-key-file.key> -L 2200:oci-private-instance-ip:22
ssh opc@localhost -i <private-key-file.key> -p 2200
```
### Airbyte GUI Local Port Forward to Airbyte VM
```text
ssh opc@bastion-host-public-ip -i <private-key-file.key> -L 8000:oci-private-instance-ip:8000
```
## Access Airbyte
Open the URL in a browser: [http://localhost:8000/](http://localhost:8000/)
![](../.gitbook/assets/OCIScreen4.png)
_Please note: Airbyte currently does not support SSL/TLS certificates._