---
description: Start triggering Airbyte jobs with Apache Airflow in minutes
products: oss-*
---

# Using the Airbyte Operator to orchestrate Airbyte Core

Airbyte is an official community provider for the Apache Airflow project. The Airbyte operator allows you to trigger Airbyte self-managed synchronization jobs from Apache Airflow, and this article walks you through configuring your Airflow DAG to do so.

:::note
For historical reasons, the Airbyte operator is designed to work with the internal Config API rather than the newer Airbyte API, and is therefore not intended or designed for orchestrating Airbyte Cloud. As an alternative, you can use Airflow's HTTP operators with both Airbyte self-managed and Airbyte Cloud. This approach is described in Using the new Airbyte API to orchestrate Airbyte Cloud with Airflow.
:::

The Airbyte Provider documentation in the Apache Airflow project can be found here.

## 1. Set up the tools

First, make sure you have Docker installed. (We'll be using the `docker-compose` command, so your install should contain `docker-compose`.)

### Start Airbyte

If this is your first time using Airbyte, we suggest going through our Basic Tutorial. This tutorial uses the Connection you set up there.

For the purposes of this tutorial, set your Connection's sync frequency to manual. Airflow will be responsible for manually triggering the Airbyte job.

### Start Apache Airflow

If you don't have an Airflow instance, we recommend following this guide to set one up. You will also need to install the `apache-airflow-providers-airbyte` package to use the Airbyte Operator with Apache Airflow. You can read more about it here.
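If your Airflow environment is managed with pip, you can typically install the provider with `pip install apache-airflow-providers-airbyte`.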

## 2. Create a DAG in Apache Airflow to trigger your Airbyte job

### Create an Airbyte connection in Apache Airflow

Once Airflow starts, navigate to Airflow's Connections page. The Airflow UI can be accessed at http://localhost:8080/.

Airflow will use the Airbyte API to execute our actions. The Airbyte API uses HTTP, so we'll need to create an HTTP Connection. Airbyte is typically hosted at `localhost:8001`, so configure Airflow's HTTP connection accordingly.

Don't forget to click save!
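If you prefer to script this step instead of using the UI, here is a minimal sketch that registers the same HTTP connection through Airflow's metadata database. The connection name `airbyte_conn_example` and the `localhost:8001` host match the DAG examples below; adjust them for your setup.

```python
# Minimal sketch (run once, e.g. from an init script): register the
# Airbyte HTTP connection programmatically instead of via the Airflow UI.
from airflow import settings
from airflow.models import Connection

airbyte_conn = Connection(
    conn_id="airbyte_conn_example",  # referenced by the DAG as airbyte_conn_id
    conn_type="http",
    host="localhost",  # where the Airbyte server is reachable
    port=8001,
)

session = settings.Session()
# Only create the connection if it doesn't already exist.
if not session.query(Connection).filter(
    Connection.conn_id == airbyte_conn.conn_id
).first():
    session.add(airbyte_conn)
    session.commit()
session.close()
```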

### Retrieving the Airbyte Connection ID

Get the Airbyte Connection ID so your Airflow DAG knows which Airbyte Connection to trigger.

1. Open Airbyte.
2. Click **Connections** > your connection.
3. Get the connection ID from the URL. The URL looks like the following example, and your connection ID appears near the end: `https://<YOUR_DOMAIN>/workspaces/<YOUR_WORKSPACE_ID>/connections/<YOUR_CONNECTION_ID>/status`.
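If you want to double-check the value you copied, a small, hypothetical helper like this can extract the ID from the URL:

```python
# Hypothetical helper: pull the connection ID out of a copied Airbyte URL.
import re

url = "https://<YOUR_DOMAIN>/workspaces/<YOUR_WORKSPACE_ID>/connections/1e3b5a72-7bfd-4808-a13c-204505490110/status"
match = re.search(r"/connections/([0-9a-f-]{36})", url)
if match:
    print(match.group(1))  # 1e3b5a72-7bfd-4808-a13c-204505490110
```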

### Creating a simple Airflow DAG to run an Airbyte Sync Job

Place the following file inside the `/dags` directory. Name this file `dag_airbyte_example.py`.

```python
from airflow import DAG
from airflow.utils.dates import days_ago
from airflow.providers.airbyte.operators.airbyte import AirbyteTriggerSyncOperator

with DAG(dag_id='trigger_airbyte_job_example',
         default_args={'owner': 'airflow'},
         schedule_interval='@daily',
         start_date=days_ago(1)
    ) as dag:

    money_to_json = AirbyteTriggerSyncOperator(
        task_id='airbyte_money_json_example',
        airbyte_conn_id='airbyte_conn_example',  # the Airflow HTTP Connection created above
        connection_id='1e3b5a72-7bfd-4808-a13c-204505490110',  # your Airbyte Connection ID
        asynchronous=False,  # block until the sync finishes
        timeout=3600,
        wait_seconds=3
    )
```

The Airbyte Airflow Operator accepts the following parameters:

- `airbyte_conn_id`: Name of the Airflow HTTP Connection pointing at the Airbyte API. Tells Airflow where the Airbyte API is located.
- `connection_id`: The ID of the Airbyte Connection to be triggered by Airflow.
- `asynchronous`: Determines how the Airbyte Operator executes. When true, the operator submits the job and returns immediately, and you monitor the job with a separate AirbyteJobSensor (shown below). Default value is false.
- `timeout`: The maximum time Airflow waits for the Airbyte job to complete. Only valid when `asynchronous=False`. Default value is 3600 seconds.
- `wait_seconds`: The amount of time to wait between status checks. Only valid when `asynchronous=False`. Default value is 3 seconds.

This code will produce the following simple DAG in the Airflow UI: `airbyte_money_json_example`.

Our DAG will show up in the Airflow UI shortly after we place our DAG file, and will be triggered automatically soon afterward.
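If the DAG doesn't run on its own, check that it isn't paused in the Airflow UI; depending on your Airflow configuration, newly added DAGs may be paused by default.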

Check the Timeline tab to see if the job started syncing.

### Using the `asynchronous` parameter

If your Airflow instance has limited resources and/or is under load, setting `asynchronous=True` can help. Sensors do not occupy an Airflow worker slot, so this helps reduce Airflow load.

```python
from airflow import DAG
from airflow.utils.dates import days_ago
from airflow.providers.airbyte.operators.airbyte import AirbyteTriggerSyncOperator
from airflow.providers.airbyte.sensors.airbyte import AirbyteJobSensor

with DAG(dag_id='airbyte_trigger_job_example_async',
         default_args={'owner': 'airflow'},
         schedule_interval='@daily',
         start_date=days_ago(1)
    ) as dag:

    async_money_to_json = AirbyteTriggerSyncOperator(
        task_id='airbyte_async_money_json_example',
        airbyte_conn_id='airbyte_conn_example',
        connection_id='1e3b5a72-7bfd-4808-a13c-204505490110',
        asynchronous=True,  # submit the job and return immediately
    )

    airbyte_sensor = AirbyteJobSensor(
        task_id='airbyte_sensor_money_json_example',
        airbyte_conn_id='airbyte_conn_example',
        airbyte_job_id=async_money_to_json.output  # job ID passed via XCom
    )

    async_money_to_json >> airbyte_sensor
```
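Here, `async_money_to_json.output` is an XCom reference: when `asynchronous=True`, the operator returns the Airbyte job ID, and the sensor pulls that ID at runtime to poll the job's status.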

## That's it!

Don't be fooled by our simple example of only one Airflow task. Airbyte is a powerful data integration platform supporting many sources and destinations. The Airbyte Airflow Operator means Airbyte can now be easily used with the Airflow ecosystem - give it a shot!

For additional information about using Airflow and Airbyte together, see the following: