mirror of synced 2025-12-29 09:03:46 -05:00

Update the documentation with the new configuration. (#38175)

Co-authored-by: perangel <perangel@gmail.com>
Bryce Groff
2024-05-16 14:03:01 -07:00
committed by GitHub
parent 1bc850a635
commit d9792e7b5a


@@ -69,16 +69,17 @@ kubectl create namespace airbyte
### Configure Kubernetes Secrets
Sensitive credentials such as AWS access keys are required to be made available in Kubernetes Secrets during deployment. The Kubernetes secret store and secret keys are referenced in your `values.yml` file. Ensure all required secrets are configured before deploying Airbyte Self-Managed Enterprise.
Sensitive credentials such as AWS access keys are required to be made available in Kubernetes Secrets during deployment. The Kubernetes secret store and secret keys are referenced in your `values.yaml` file. Ensure all required secrets are configured before deploying Airbyte Self-Managed Enterprise.
You may apply your Kubernetes secrets by applying the example manifests below to your cluster, or using `kubectl` directly. If your Kubernetes cluster already has permissions to make requests to an external entity via an instance profile, credentials are not required. For example, if your Amazon EKS cluster has been assigned a sufficient AWS IAM role to make requests to AWS S3, you do not need to specify access keys.
#### External Log Storage
#### Creating a Kubernetes Secret
While you can name the secret whatever you prefer, that name must then be set in several places in your `values.yaml` file. For this reason, we suggest keeping the name `airbyte-config-secrets` unless you have a reason to change it.
For Self-Managed Enterprise deployments, we recommend spinning up standalone log storage for additional reliability using tools such as S3 and GCS instead of using the default internal Minio storage (`airbyte/minio`).
<details>
<summary>Secrets for External Log Storage</summary>
<summary>airbyte-config-secrets</summary>
<Tabs>
<TabItem value="S3" label="S3" default>
@@ -90,85 +91,55 @@ metadata:
name: airbyte-config-secrets
type: Opaque
stringData:
## Storage Secrets
# S3
# Enterprise License Key
license-key: ## e.g. xxxxx.yyyyy.zzzzz
# Database Secrets
database-host: ## e.g. database.internal
database-port: ## e.g. 5432
database-name: ## e.g. airbyte
database-user: ## e.g. airbyte
database-password: ## e.g. password
# Instance Admin
instance-admin-email: ## e.g. admin@company.example
instance-admin-password: ## e.g. password
# AWS S3 Secrets
s3-access-key-id: ## e.g. AKIAIOSFODNN7EXAMPLE
s3-secret-access-key: ## e.g. wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
# AWS Secret Manager
aws-secret-manager-access-key-id: ## e.g. AKIAIOSFODNN7EXAMPLE
aws-secret-manager-secret-access-key: ## e.g. wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```
Overriding `name`, `s3-access-key-id` or `s3-secret-access-key` allows you to store these secrets in the location of your choice. If you do this, you will also need to specify the secret location in the bucket config for your `values.yml` file.
Using `kubectl` to create the secret directly:
You can also use `kubectl` to create the secret directly from the CLI:
```sh
kubectl create secret generic airbyte-config-secrets \
--from-literal=license-key='' \
--from-literal=database-host='' \
--from-literal=database-port='' \
--from-literal=database-name='' \
--from-literal=database-user='' \
--from-literal=database-password='' \
--from-literal=instance-admin-email='' \
--from-literal=instance-admin-password='' \
--from-literal=s3-access-key-id='' \
--from-literal=s3-secret-access-key='' \
--from-literal=aws-secret-manager-access-key-id='' \
--from-literal=aws-secret-manager-secret-access-key='' \
--namespace airbyte
```
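Before deploying, it can help to confirm that the secret contains every key you intend to reference from `values.yaml`. The following is a minimal sketch, not part of the Airbyte tooling: `missing_secret_keys` is a hypothetical helper name, and it assumes `kubectl` already points at the right cluster. It prints each expected key that is absent from the secret.

```sh
# Hypothetical helper (not shipped with Airbyte): print every expected key
# that is absent from a Kubernetes secret.
# Usage: missing_secret_keys <secret-name> <namespace> <key>...
missing_secret_keys() {
  secret_name="$1"
  namespace="$2"
  shift 2

  # List the key names stored under .data, one per line.
  present_keys="$(kubectl get secret "$secret_name" \
    --namespace "$namespace" \
    -o go-template='{{range $key, $value := .data}}{{$key}}{{"\n"}}{{end}}')" || return 1

  for expected in "$@"; do
    # -x matches the whole line, -F treats the key as a literal string.
    printf '%s\n' "$present_keys" | grep -qxF -- "$expected" || printf '%s\n' "$expected"
  done
}
```

For example, `missing_secret_keys airbyte-config-secrets airbyte license-key database-password` prints nothing when both keys are present in the secret.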
Ensure your access key is tied to an IAM user with the [following policies](https://docs.aws.amazon.com/AmazonS3/latest/userguide/example-policies-s3.html#iam-policy-ex0), allowing the cluster to access S3 storage:
```yaml
{
"Version": "2012-10-17",
"Statement":
[
{ "Effect": "Allow", "Action": "s3:ListAllMyBuckets", "Resource": "*" },
{
"Effect": "Allow",
"Action": ["s3:ListBucket", "s3:GetBucketLocation"],
"Resource": "arn:aws:s3:::YOUR-S3-BUCKET-NAME",
},
{
"Effect": "Allow",
"Action":
[
"s3:PutObject",
"s3:PutObjectAcl",
"s3:GetObject",
"s3:GetObjectAcl",
"s3:DeleteObject",
],
"Resource": "arn:aws:s3:::YOUR-S3-BUCKET-NAME/*",
},
],
}
```
</TabItem>
<TabItem value="GCS" label="GCS">
First, create a new file `gcp.json` containing the credentials JSON blob for the service account you are looking to assume.
```yaml
apiVersion: v1
kind: Secret
metadata:
name: gcp-cred-secrets
type: Opaque
stringData:
gcp.json: <CREDENTIALS_JSON_BLOB>
```
Using `kubectl` to create the secret directly from the `gcp.json` file:
```sh
kubectl create secret generic gcp-cred-secrets --from-file=gcp.json --namespace airbyte
```
</TabItem>
</Tabs>
</details>
#### External Connector Secret Management
Airbyte's default behavior is to store encrypted connector secrets on your cluster as Kubernetes secrets. You may opt to instead store connector secrets in an external secret manager of your choosing (AWS Secrets Manager, Google Secrets Manager or HashiCorp Vault).
<details>
<summary>Secrets for External Connector Secret Management</summary>
To store connector secrets in AWS Secrets Manager via a manifest:
```yaml
apiVersion: v1
@@ -177,21 +148,42 @@ metadata:
name: airbyte-config-secrets
type: Opaque
stringData:
aws-secret-manager-access-key-id: ## e.g. AKIAIOSFODNN7EXAMPLE
aws-secret-manager-secret-access-key: ## e.g. wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
# Enterprise License Key
license-key: ## e.g. xxxxx.yyyyy.zzzzz
# Database Secrets
database-host: ## e.g. database.internal
database-port: ## e.g. 5432
database-name: ## e.g. airbyte
database-user: ## e.g. airbyte
database-password: ## e.g. password
# Instance Admin
instance-admin-email: ## e.g. admin@company.example
instance-admin-password: ## e.g. password
# GCP Secrets
gcp.json: <CREDENTIALS_JSON_BLOB>
```
Overriding `name`, `aws-secret-manager-access-key-id` or `aws-secret-manager-secret-access-key` allows you to store these secrets in the location of your choice. If you do this, you will also need to specify the secret location in the secret manager config for your `values.yml` file.
Alternatively, you may choose to use `kubectl` to create the secret directly:
Using `kubectl` to create the secret directly from the `gcp.json` file:
```sh
kubectl create secret generic airbyte-config-secrets \
--from-literal=aws-secret-manager-access-key-id='' \
--from-literal=aws-secret-manager-secret-access-key='' \
--from-literal=license-key='' \
--from-literal=database-host='' \
--from-literal=database-port='' \
--from-literal=database-name='' \
--from-literal=database-user='' \
--from-literal=database-password='' \
--from-literal=instance-admin-email='' \
--from-literal=instance-admin-password='' \
--from-file=gcp.json \
--namespace airbyte
```
</TabItem>
</Tabs>
</details>
## Installation Steps
@@ -204,93 +196,21 @@ Follow these instructions to add the Airbyte helm repository:
2. Perform the repo indexing process, and ensure your helm repository is up-to-date by running `helm repo update`.
3. You can then browse all charts uploaded to your repository by running `helm search repo airbyte`.
### Step 2: Create your Enterprise License File
### Step 2: Configure your Deployment
1. Create a new `airbyte` directory. Inside, create an empty `airbyte.yml` file.
1. Inside your `airbyte` directory, create an empty `values.yaml` file.
2. Paste the following into your newly created `airbyte.yml` file:
<details>
<summary>Template airbyte.yml file</summary>
```yaml
webapp-url: # example: http://localhost:8080
initial-user:
email:
first-name:
last-name:
username: # your existing Airbyte instance username
password: # your existing Airbyte instance password
license-key: # license key provided by Airbyte team
```
</details>
3. Fill in the contents of the `initial-user` block. The credentials grant an initial user with admin permissions. You should store these credentials in a secure location.
4. Add your Airbyte Self-Managed Enterprise license key to your `airbyte.yml` in the `license-key` field.
5. To enable SSO authentication, add [SSO auth details](/access-management/sso) to your `airbyte.yml` file.
<details>
<summary>Configuring auth in your airbyte.yml file</summary>
<Tabs>
<TabItem value="Okta" label="Okta">
To configure SSO with Okta, add the following at the end of your `airbyte.yml` file:
```yaml
auth:
identity-providers:
- type: okta
domain: $OKTA_DOMAIN
app-name: $OKTA_APP_INTEGRATION_NAME
client-id: $OKTA_CLIENT_ID
client-secret: $OKTA_CLIENT_SECRET
```
See the [following guide](/access-management/sso-providers/okta) on how to collect this information for Okta.
</TabItem>
<TabItem value="Other" label="Other">
To configure SSO with any identity provider via [OpenID Connect (OIDC)](https://openid.net/developers/how-connect-works/), such as Azure Entra ID (formerly ActiveDirectory), add the following at the end of your `values.yml` file:
```yaml
auth:
identity-providers:
- type: oidc
domain: $DOMAIN
app-name: $APP_INTEGRATION_NAME
client-id: $CLIENT_ID
client-secret: $CLIENT_SECRET
```
See the [following guide](/access-management/sso-providers/azure-entra-id) on how to collect this information for Azure Entra ID (formerly ActiveDirectory).
</TabItem>
</Tabs>
To modify auth configurations on an existing deployment (after Airbyte has been installed at least once), you will need to `helm upgrade` Airbyte with the additional environment variable `--set keycloak-setup.env_vars.KEYCLOAK_RESET_REALM=true`. As this also resets the list of Airbyte users and permissions, please use this with caution.
To deploy Self-Managed Enterprise without SSO, exclude the entire `auth:` section from your values.yml config file. You will authenticate with the instance admin user and password included in your `airbyte.yml`. Without SSO, you cannot currently have unique logins for multiple users.
</details>
### Step 3: Configure your Deployment
1. Inside your `airbyte` directory, create an empty `values.yml` file.
2. Paste the following into your newly created `values.yml` file. This is required to deploy Airbyte Self-Managed Enterprise:
2. Paste the following into your newly created `values.yaml` file. This is required to deploy Airbyte Self-Managed Enterprise:
```yml
global:
edition: enterprise
# This must be set to the public facing URL of your Airbyte instance.
airbyteUrl: #https://airbyte.company.example
```
3. The following subsections help you customize your deployment to use an external database, log storage, dedicated ingress, and more. To skip this and deploy a minimal, local version of Self-Managed Enterprise, [jump to Step 4](#step-4-deploy-self-managed-enterprise).
3. The following subsections help you customize your deployment to use an external database, log storage, dedicated ingress, and more. To skip this and deploy a minimal, local version of Self-Managed Enterprise, [jump to Step 3](#step-3-deploy-self-managed-enterprise).
#### Configuring the Airbyte Database
@@ -301,35 +221,40 @@ We assume in the following that you've already configured a Postgres instance:
<details>
<summary>External database setup steps</summary>
1. Add external database details to your `values.yml` file. This disables the default internal Postgres database (`airbyte/db`), and configures the external Postgres database:
Add external database details to your `values.yaml` file. This disables the default internal Postgres database (`airbyte/db`) and configures the external Postgres database. You can set each of the values below directly here or supply it through the `airbyte-config-secrets` secret; the database password must be set in `airbyte-config-secrets`. Here is an example configuration:
```yaml
postgresql:
enabled: false
externalDatabase:
host: ## Database host
user: ## Non-root username for the Airbyte database
database: db-airbyte ## Database name
port: 5432 ## Database port number
global:
database:
# -- Secret name where database credentials are stored
secretName: "" # e.g. "airbyte-config-secrets"
# -- The database host
host: ""
# -- The key within `secretName` where host is stored
#hostSecretKey: "" # e.g. "database-host"
# -- The database port
port: ""
# -- The key within `secretName` where port is stored
#portSecretKey: "" # e.g. "database-port"
# -- The database name
database: ""
# -- The key within `secretName` where the database name is stored
#databaseSecretKey: "" # e.g. "database-name"
# -- The database user
user: ""
# -- The key within `secretName` where the user is stored
#userSecretKey: "" # e.g. "database-user"
# -- The key within `secretName` where password is stored
passwordSecretKey: "" # e.g. "database-password"
```
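Because each `...SecretKey` field must name a key that exists in the secret, listing the references in your `values.yaml` makes them easy to compare against the secret's contents. This is a rough sketch under stated assumptions: `list_secret_key_refs` is a hypothetical helper, and it assumes each reference sits on its own line in the `fieldSecretKey: "value"` form used in the example configuration. It is a line-based text filter, not a YAML parser.

```sh
# Hypothetical helper: list uncommented "<field>SecretKey" entries in a Helm
# values file as "<field>SecretKey <value>", one per line.
# Commented-out entries and entries with an empty value are skipped.
list_secret_key_refs() {
  values_file="$1"
  sed -n \
    's/^[[:space:]]*\([A-Za-z]*SecretKey\):[[:space:]]*"\{0,1\}\([^"#[:space:]]\{1,\}\)"\{0,1\}.*$/\1 \2/p' \
    "$values_file"
}
```

Running `list_secret_key_refs ./values.yaml` prints one line per configured reference, which you can then check against the keys stored in the secret named by `secretName`.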
2. For the password of the non-root user with database access, you may use `password`, `existingSecret` or `jdbcUrl`. We recommend using `existingSecret`, or injecting sensitive fields from your own external secret store. These parameters are mutually exclusive:
```yaml
postgresql:
enabled: false
externalDatabase:
...
password: ## Password for non-root database user
existingSecret: ## The name of an existing Kubernetes secret containing the password.
existingSecretPasswordKey: ## The Kubernetes secret key containing the password.
jdbcUrl: "jdbc:postgresql://<user>:<password>@localhost:5432/db-airbyte" ## Full database JDBC URL. You can also add additional arguments.
```
The optional `jdbcUrl` field should be entered in the following format: `jdbc:postgresql://localhost:5432/db-airbyte`. We recommend against using this unless you need to pass additional arguments to the JDBC driver (e.g. to handle SSL).
</details>
#### Configuring External Logging
@@ -339,7 +264,7 @@ For Self-Managed Enterprise deployments, we recommend spinning up standalone log
<details>
<summary>External log storage setup steps</summary>
Add external log storage details to your `values.yml` file. This disables the default internal Minio instance (`airbyte/minio`), and configures the external log database:
Add external log storage details to your `values.yaml` file. This disables the default internal Minio instance (`airbyte/minio`), and configures the external log database:
<Tabs>
<TabItem value="S3" label="S3" default>
@@ -385,6 +310,73 @@ global:
</Tabs>
</details>
#### Configuring External Connector Secret Management
Airbyte's default behavior is to store encrypted connector secrets on your cluster as Kubernetes secrets. You may <b>optionally</b> store connector secrets in an external secret manager instead, such as AWS Secrets Manager, Google Secrets Manager or HashiCorp Vault. Upon creating a new connector, secrets (e.g. OAuth tokens, database passwords) will be written to, then read from, the configured secrets manager.
<details>
<summary>Configuring external connector secret management</summary>
Modifying the configuration of connector secret storage will cause all <i>existing</i> connectors to fail. You will need to recreate these connectors to ensure they are reading from the appropriate secret store.
<Tabs>
<TabItem label="Amazon" value="Amazon">
If authenticating with credentials, ensure you've already created a Kubernetes secret containing both your AWS Secrets Manager access key ID, and secret access key. By default, secrets are expected in the `airbyte-config-secrets` Kubernetes secret, under the `aws-secret-manager-access-key-id` and `aws-secret-manager-secret-access-key` keys. Steps to configure these are in the above [prerequisites](#configure-kubernetes-secrets).
```yaml
secretsManager:
type: awsSecretManager
awsSecretManager:
region: <aws-region>
authenticationType: credentials ## Use "credentials" or "instanceProfile"
tags: ## Optional - You may add tags to new secrets created by Airbyte.
- key: ## e.g. team
value: ## e.g. deployments
- key: business-unit
value: engineering
kms: ## Optional - ARN for KMS Decryption.
```
Set `authenticationType` to `instanceProfile` if the compute infrastructure running Airbyte has pre-existing permissions (e.g. IAM role) to read and write from AWS Secrets Manager.
To decrypt secrets in the secret manager with AWS KMS, configure the `kms` field, and ensure your Kubernetes cluster has pre-existing permissions to read and decrypt secrets.
</TabItem>
<TabItem label="GCP" value="GCP">
Ensure you've already created a Kubernetes secret containing the credentials blob for the service account to be assumed by the cluster. By default, secrets are expected in the `gcp-cred-secrets` Kubernetes secret, under a `gcp.json` file. Steps to configure these are in the above [prerequisites](#configure-kubernetes-secrets). For simplicity, we recommend provisioning a single service account with access to both GCS and GSM.
```yaml
secretsManager:
type: googleSecretManager
storageSecretName: gcp-cred-secrets
googleSecretManager:
projectId: <project-id>
credentialsSecretKey: gcp.json
```
</TabItem>
</Tabs>
</details>
#### Configuring External OIDC Provider (Optional)
To enable SSO authentication, add [SSO auth details](/access-management/sso) to your `values.yaml` file.
```yaml
auth:
identityProvider:
type: oidc
oidc:
domain: #company.example
app-name: #airbyte
client-id: #e83bbc57-1991-417f-8203-3affb47636cf
client-secret: #$OKTA_CLIENT_SECRET
```
See the [following guide](/access-management/sso-providers/okta) on how to collect this information for Okta.
#### Configuring Ingress
To access the Airbyte UI, you will need to manually attach an ingress configuration to your deployment. The following is a slimmed-down definition of an ingress resource you could use for Self-Managed Enterprise:
@@ -492,70 +484,18 @@ The ALB controller will use a `ServiceAccount` that requires the [following IAM
</Tabs>
</details>
Once this is complete, ensure that the value of the `webapp-url` field in your `values.yml` is configured to match the ingress URL.
Once this is complete, ensure that the value of the `webapp-url` field in your `values.yaml` is configured to match the ingress URL.
You may configure ingress using a load balancer or an API Gateway. We do not currently support most service meshes (such as Istio). If you are having networking issues after fully deploying Airbyte, please verify that firewalls or missing permissions are not interfering with pod-to-pod communication. Please also verify that deployed pods have the right permissions to make requests to your external database.
#### Configuring External Connector Secret Management
Airbyte's default behavior is to store encrypted connector secrets on your cluster as Kubernetes secrets. You may <b>optionally</b> store connector secrets in an external secret manager instead, such as AWS Secrets Manager, Google Secrets Manager or HashiCorp Vault. Upon creating a new connector, secrets (e.g. OAuth tokens, database passwords) will be written to, then read from, the configured secrets manager.
<details>
<summary>Configuring external connector secret management</summary>
Modifying the configuration of connector secret storage will cause all <i>existing</i> connectors to fail. You will need to recreate these connectors to ensure they are reading from the appropriate secret store.
<Tabs>
<TabItem label="Amazon" value="Amazon">
If authenticating with credentials, ensure you've already created a Kubernetes secret containing both your AWS Secrets Manager access key ID, and secret access key. By default, secrets are expected in the `airbyte-config-secrets` Kubernetes secret, under the `aws-secret-manager-access-key-id` and `aws-secret-manager-secret-access-key` keys. Steps to configure these are in the above [prerequisites](#configure-kubernetes-secrets).
```yaml
secretsManager:
type: awsSecretManager
awsSecretManager:
region: <aws-region>
authenticationType: credentials ## Use "credentials" or "instanceProfile"
tags: ## Optional - You may add tags to new secrets created by Airbyte.
- key: ## e.g. team
value: ## e.g. deployments
- key: business-unit
value: engineering
kms: ## Optional - ARN for KMS Decryption.
```
Set `authenticationType` to `instanceProfile` if the compute infrastructure running Airbyte has pre-existing permissions (e.g. IAM role) to read and write from AWS Secrets Manager.
To decrypt secrets in the secret manager with AWS KMS, configure the `kms` field, and ensure your Kubernetes cluster has pre-existing permissions to read and decrypt secrets.
</TabItem>
<TabItem label="GCP" value="GCP">
Ensure you've already created a Kubernetes secret containing the credentials blob for the service account to be assumed by the cluster. By default, secrets are expected in the `gcp-cred-secrets` Kubernetes secret, under a `gcp.json` file. Steps to configure these are in the above [prerequisites](#configure-kubernetes-secrets). For simplicity, we recommend provisioning a single service account with access to both GCS and GSM.
```yaml
secretsManager:
type: googleSecretManager
storageSecretName: gcp-cred-secrets
googleSecretManager:
projectId: <project-id>
credentialsSecretKey: gcp.json
```
</TabItem>
</Tabs>
</details>
### Step 4: Deploy Self-Managed Enterprise
### Step 3: Deploy Self-Managed Enterprise
Install Airbyte Self-Managed Enterprise on helm using the following command:
```sh
helm install \
--namespace airbyte \
--values ./values.yml \
--set-file airbyteYml="./airbyte.yml" \
--values ./values.yaml \
airbyte-enterprise \
airbyte/airbyte
```
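After the install command returns, you can check that the release was recorded and that the pods are starting. This is a minimal sketch, assuming the release name `airbyte-enterprise` and the `airbyte` namespace used in the install command; `show_airbyte_state` is a hypothetical wrapper name, and it uses only the standard `helm status` and `kubectl get pods` commands.

```sh
# Hypothetical wrapper: summarize a Helm release and the pods in its namespace.
# Usage: show_airbyte_state [release-name] [namespace]
show_airbyte_state() {
  release="${1:-airbyte-enterprise}"
  namespace="${2:-airbyte}"

  # Release status as recorded by Helm; stop here if the release is unknown.
  helm status "$release" --namespace "$namespace" || return 1

  # Current pod phases; pods may still be starting shortly after an install.
  kubectl get pods --namespace "$namespace"
}
```

Calling `show_airbyte_state` with no arguments uses the defaults above; pass a different release name or namespace if you changed them.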
@@ -572,8 +512,7 @@ Upgrade Airbyte Self-Managed Enterprise by:
```sh
helm upgrade \
--namespace airbyte \
--values ./values.yml \
--set-file airbyteYml="./airbyte.yml" \
--values ./values.yaml \
--install airbyte-enterprise \
airbyte/airbyte
```
@@ -587,9 +526,7 @@ After specifying your own configuration, run the following command:
```sh
helm upgrade \
--namespace airbyte \
--values path/to/values.yaml \
--values ./values.yml \
--set-file airbyteYml="./airbyte.yml" \
--values ./values.yaml \
--install airbyte-enterprise \
airbyte/airbyte
```
@@ -598,9 +535,75 @@ airbyte/airbyte
You may choose to use your own service account instead of the Airbyte default, `airbyte-sa`. This may allow for better audit trails and resource management specific to your organizational policies and requirements.
To do this, add the following to your `values.yml`:
To do this, add the following to your `values.yaml`:
```
serviceAccount:
name:
```
## AWS Policies Appendix
Ensure that your access key is tied to an IAM user with the following policies, or that you are using a role with these policies attached.
### AWS S3 Policy
The [following policies](https://docs.aws.amazon.com/AmazonS3/latest/userguide/example-policies-s3.html#iam-policy-ex0) allow the cluster to communicate with S3 storage:
```yaml
{
"Version": "2012-10-17",
"Statement":
[
{ "Effect": "Allow", "Action": "s3:ListAllMyBuckets", "Resource": "*" },
{
"Effect": "Allow",
"Action": ["s3:ListBucket", "s3:GetBucketLocation"],
"Resource": "arn:aws:s3:::YOUR-S3-BUCKET-NAME"
},
{
"Effect": "Allow",
"Action":
[
"s3:PutObject",
"s3:PutObjectAcl",
"s3:GetObject",
"s3:GetObjectAcl",
"s3:DeleteObject"
],
"Resource": "arn:aws:s3:::YOUR-S3-BUCKET-NAME/*"
}
]
}
```
### AWS Secret Manager Policy
```yaml
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"secretsmanager:GetSecretValue",
"secretsmanager:CreateSecret",
"secretsmanager:ListSecrets",
"secretsmanager:DescribeSecret",
"secretsmanager:TagResource",
"secretsmanager:UpdateSecret"
],
"Resource": [
"*"
],
"Condition": {
"ForAllValues:StringEquals": {
"secretsmanager:ResourceTag/AirbyteManaged": "true"
}
}
}
]
}
```
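One way to apply either policy is with the AWS CLI. This is a sketch under assumptions, not a step from the Airbyte documentation: `attach_airbyte_policy` is a hypothetical helper, the policy name and IAM user name are placeholders you choose, and the policy document is assumed to be saved locally as strict JSON. It relies only on the standard `aws iam create-policy` and `aws iam attach-user-policy` commands.

```sh
# Hypothetical helper: create a managed IAM policy from a local JSON file and
# attach it to an existing IAM user.
# Usage: attach_airbyte_policy <policy-name> <policy-file.json> <iam-user-name>
attach_airbyte_policy() {
  policy_name="$1"
  policy_file="$2"
  user_name="$3"

  # create-policy returns the new policy's metadata; keep only its ARN.
  policy_arn="$(aws iam create-policy \
    --policy-name "$policy_name" \
    --policy-document "file://$policy_file" \
    --query 'Policy.Arn' \
    --output text)" || return 1

  aws iam attach-user-policy \
    --user-name "$user_name" \
    --policy-arn "$policy_arn"
}
```

If you use a role instead of an IAM user, `aws iam attach-role-policy --role-name <role> --policy-arn <arn>` is the equivalent call for the second step.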