docs(ai-agents): Add AI connector tutorials and Agent engine positioning (#70461)
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Alexandre Girard <alexandre@airbyte.io>

---
products: embedded
---

import Taxonomy from "@site/static/_taxonomy_of_data_movement.md";

# Agent engine

Airbyte's Agent engine is a set of tools to help you automate, understand, move, and work with your data in coordination with AI agents. Some of these tools are standalone open source solutions, and others are paid solutions built on top of Airbyte Cloud.

- **Agent connectors**: AI-optimized, type-safe connectors, usable with Airbyte's Connector MCP server or your own Python agents. [View the GitHub repo](https://github.com/airbytehq/airbyte-agent-connectors).
- **MCP servers**: Airbyte provides multiple MCP (Model Context Protocol) servers for different use cases:
  - [**PyAirbyte MCP**](#pyairbyte-mcp): A local MCP server for managing Airbyte connectors through AI assistants.
  - [**Connector Builder MCP**](#connector-builder-mcp): AI-assisted connector development - _**coming soon!**_
- **Airbyte Embedded**: Add hundreds of integrations into your product instantly. Your end users can authenticate into their data sources and begin syncing data to your product, so you no longer need to spend engineering cycles on data movement. Focus on what makes your product great rather than on maintaining data integrations.
- **Authentication Proxies**: Connect safely to third-party APIs using Airbyte's Authentication Proxies.

The platform is a solution for all types of audiences, from AI engineers who are deploying agents in large enterprises to individual founders who need real-time context and action in their platforms.

:::info New and growing
The Agent engine is new and growing. Airbyte is actively seeking feedback, design partners, and community involvement. Expect this library of tools to grow and change rapidly.
:::

## Prerequisites

Before using any Airbyte developer tools, ensure you have:

- **Airbyte Cloud account**: Sign up at [cloud.airbyte.com](https://cloud.airbyte.com).
- **Embedded access**: Contact michel@airbyte.io or teo@airbyte.io to enable Airbyte Embedded on your account.
- **API credentials**: Available in your Airbyte Cloud dashboard under Settings > Applications.
## Why Airbyte?

- **Agents hallucinate or fail**: Stale and incomplete data erodes agent effectiveness. AI agents need real-time context from multiple business systems to be fully effective.
- **Custom API integrations are brittle and expensive**: Airbyte's library of agentic connectors is open source. Benefit from economies of scale as third-party APIs evolve, add new endpoints, and deprecate old ones.
- **Agents need to write, not just read**: Airbyte provides the operational backbone needed to make agentic AI work in production. Fetch, write, and reason with live business data through a standardized, open protocol.

:::note
Writes aren't supported in Airbyte yet.
:::
### The use case for agentic data

The Agent engine enables agents to fetch, search, and reason with live business data.

Even if you're not a data expert, you still need to interpret vendor data. That means cleaning, normalizing, stitching fields together, and transforming your and your customers' data into entities your agents can actually use.

The Agent engine is an ideal solution when you:

- Don't want storage
- Care a lot about freshness and latency
- Are working with a small amount of data
- Need to trigger side effects, like sending an email or closing a ticket

The Agent engine _isn't_ ideal when you:

- Need all your data in one place
- Need to join across datasets
- Need larger pipelines and can tolerate slower syncs
- Want storage
- Want to update content, but not trigger side effects
- Rely on APIs that are unreliable or poorly designed

If agentic data isn't what you're looking for and you need complex data aggregation for data analysis, [data replication](/platform) is likely the right solution.

### Taxonomy of data movement

<Taxonomy />
## Airbyte Embedded

[Airbyte Embedded](embedded) equips product and software teams with the tools needed to move customer data and deliver context to AI applications.

### Embedded Workspaces & Widget

Airbyte Embedded creates isolated workspaces for each of your customers, allowing them to configure their own data sources while keeping their data separate and secure. The Embedded Widget provides a pre-built UI component that handles the entire user onboarding flow, from authentication to source configuration.

Once your organization is enabled for Airbyte Embedded, you can begin onboarding customers via the Embedded Widget. You can download the code for the onboarding app [via GitHub](https://github.com/airbytehq/embedded-demoapp).

## Agent connectors

Airbyte's agent connectors are Python packages that equip AI agents to call third-party APIs through strongly typed, well-documented tools. Each connector is ready to use directly in your Python app, in an agent framework, or exposed through an MCP. [Learn more](connectors) or [view the GitHub repo](https://github.com/airbytehq/airbyte-agent-connectors).

## Connector MCP

Airbyte provides multiple MCP (Model Context Protocol) servers to enable AI-assisted data integration workflows:

### PyAirbyte MCP

[The PyAirbyte MCP server](./pyairbyte-mcp.md) is a local MCP server that provides a standardized interface for managing Airbyte connectors through MCP-compatible clients. It allows you to list connectors, validate configurations, and run sync operations using the MCP protocol. This is the recommended MCP server for most use cases.

### Connector Builder MCP

[The Connector Builder MCP server](./connector-builder-mcp.md) (coming soon) will provide AI-assisted capabilities for building and testing Airbyte connectors using the Model Context Protocol.
## Proxy Requests

### API Sources

:::warning
The Airbyte Proxy feature is in alpha, which means it is still in active development and may include backward-incompatible changes. [Share feedback and requests directly with us](mailto:sonar@airbyte.io).
:::

Airbyte's Authentication Proxy enables you to submit authenticated requests to external APIs. You can use it both to fetch and to write data.

Here's an example of how to query an external API with the proxy:
```bash
curl -X POST -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer {AIRBYTE_ACCESS_TOKEN}' \
  -d '{"method": "GET", "url": "https://api.stripe.com/v1/balance", "headers": {"additional_header_key": "value"}}' \
  'https://api.airbyte.ai/api/v1/sonar/apis/{SOURCE_ID}/request'
```
Here's an example of a POST:

```bash
curl -X POST -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer {AIRBYTE_ACCESS_TOKEN}' \
  -d '{"method": "POST", "url": "https://api.stripe.com/v1/balance", "body": {"key": "value"}}' \
  'https://api.airbyte.ai/api/v1/sonar/apis/{SOURCE_ID}/request'
```

Airbyte's Authentication Proxy can also authenticate using a source configured through the Embedded Widget.

The following integrations are currently supported, and more will follow shortly:

- Stripe
### File Storage Sources

Airbyte's File Storage Proxy enables you to submit authenticated requests to file storage sources. You can use it to list or fetch files.

Here's an example of how to list files:

```bash
curl -X GET -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer {AIRBYTE_ACCESS_TOKEN}' \
  'https://api.airbyte.ai/api/v1/sonar/files/{SOURCE_ID}/list/path/to/directory/or/file/prefix'
```
Here's an example of how to fetch a file:

```bash
curl -X GET -H 'Content-Type: application/octet-stream' \
  -H 'Authorization: Bearer {AIRBYTE_ACCESS_TOKEN}' \
  -H 'Range: bytes=0-1048575' \
  'https://api.airbyte.ai/api/v1/sonar/files/{SOURCE_ID}/get/path/to/file'
```
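
The `Range` header above fetches the first 1 MiB of the file (bytes 0 through 1,048,575). To page through a larger file, request successive ranges. A small illustrative helper, not part of the API:

```python
def byte_range_header(chunk_index: int, chunk_size: int = 1_048_576) -> str:
    """Return the HTTP Range header value for the given fixed-size chunk."""
    start = chunk_index * chunk_size
    end = start + chunk_size - 1
    return f"bytes={start}-{end}"


# The first chunk matches the header in the example above.
print(byte_range_header(0))  # bytes=0-1048575
print(byte_range_header(1))  # bytes=1048576-2097151
```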

For small files, you may omit the `Range` header.

The following integrations are currently supported, and more will follow shortly:

- S3

Use [agent connectors](connectors) to interact with your data using natural language.

docs/ai-agents/connectors/readme.md (new file, 61 lines):

import DocCardList from '@theme/DocCardList';

# Agent connectors

Airbyte's agent connectors are Python packages that equip AI agents to call third-party APIs through strongly typed, well-documented tools. Each connector is ready to use directly in your Python app, in an agent framework, or exposed through an MCP.
## How agent connectors differ from data replication connectors

Traditional Airbyte connectors are for data replication. They move large volumes of data from a source into a destination, such as a warehouse or data lake, on a schedule. Agent connectors are lightweight, type-safe Python clients that let AI agents call third-party APIs directly in real time.

The key differences are:

- **Topology**: Data replication connectors are always used in a source-to-destination pairing managed by the Airbyte platform. Agent connectors are standalone library packages that you import into your app or agent and call directly, with no source/destination pairing or sync pipeline.
- **Use cases**: Data replication connectors are for batch ELT/ETL and analytics, building a full, historical dataset in a warehouse. Agent connectors are for operational AI use cases: answering a question, fetching a slice of fresh data, or performing an action in a SaaS tool while an agent is reasoning.
- **Execution model**: Data replication connectors run as jobs orchestrated by the Airbyte platform, with schedules and state tracking. Agent connectors run inside your Python app or AI agent loop, returning results to that process immediately.
- **Data flow**: Data replication connectors write data into destinations and maintain state for incremental sync. Agent connectors stream typed responses back to the caller without creating a replicated copy of the data.

Agent connectors don't replace your existing source and destination connectors. They complement them by providing agentic, real-time access to the same systems. Unlike data replication connectors, you don't need to run the Airbyte platform to use agent connectors. They're regular Python packages you add to your application or agent.
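
To make the "typed responses" point concrete, here is a purely illustrative sketch of the shape such a response can take, using stdlib dataclasses. The class and field names are hypothetical, not the actual airbyte-agent-connectors API:

```python
from dataclasses import dataclass


@dataclass
class Customer:
    """Hypothetical typed record an agent connector might return."""
    id: str
    email: str
    balance: int


@dataclass
class ListResult:
    """Hypothetical typed wrapper around a page of records."""
    data: list[Customer]


result = ListResult(data=[Customer(id="cus_123", email="ada@example.com", balance=0)])
print(result.data[0].email)
```

Because responses are typed rather than raw JSON blobs, an agent framework can validate fields before the LLM ever sees them.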
### Connector structure

Each connector is a standalone Python package in the [Airbyte Agent Connectors repository](https://github.com/airbytehq/airbyte-agent-connectors).

```text
connectors/
├── stripe/
│   ├── airbyte_ai_stripe/
│   ├── pyproject.toml
│   ├── CHANGELOG.md
│   ├── README.md
│   └── REFERENCE.md
├── github/
│   └── ...
└── ...
```
Inside each connector folder, you can find the following.

- The Python client
- Connector-specific documentation with supported operations and authentication requirements
- Typed methods generated from Airbyte's connector definitions
- Validation and error handling

## When to use these connectors

Use Airbyte agent connectors when you want:

- **Agent-friendly data access**: Let LLM agents call real SaaS APIs, like a CRM, billing, or analytics, with guardrails and typed responses.
- **Consistent auth and schemas**: Reuse a uniform configuration and error-handling pattern across many APIs. Use connectors inside frameworks like Pydantic AI, LangChain, or any custom agent loop.
- **Composable building blocks**: Combine multiple connectors in a single agent to orchestrate multi-system workflows. Compared to building ad-hoc API wrappers, these connectors give you a shared structure, generated clients, and alignment with the rest of the Airbyte ecosystem.

## How to work with agent connectors

There are two ways to work with an agent connector: the Model Context Protocol (MCP) and the Python SDK.

<DocCardList />

docs/ai-agents/connectors/tutorial-mcp.md (new file, 240 lines):

---
sidebar_label: "Connector MCP tutorial"
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Get started with agent connectors: Connector MCP

In this tutorial, you'll install and run Airbyte's Connector MCP server locally, connect the server to Claude Code or your preferred agent, and learn to use natural language to explore your data. This tutorial uses Stripe, but if you don't have a Stripe account, you can use one of Airbyte's other agent connectors.

The MCP server is quick and easy to set up, but it affords less control over how you use agent connectors compared to the Python SDK. Data goes directly from the API to your AI agent.
## Overview

This tutorial is for AI engineers and other technical users who work with data and AI tools. It assumes you have basic knowledge of the following.

- Claude Code or the AI agent of your choice
- MCP servers
- Stripe, or a different third-party service you want to connect to

## Before you start

Before you begin this tutorial, ensure you have installed the following software.

- Claude Code or the agent of your choice, and the plan necessary to run it locally
- [Python](https://www.python.org/downloads/) version 3.13.7 or later
- [uv](https://github.com/astral-sh/uv)
- An account with Stripe, or a different third-party service [supported by agent connectors](https://github.com/airbytehq/airbyte-agent-connectors/tree/main/connectors)
## Part 1: Clone the Connector MCP repository

Clone the Connector MCP repository.

```bash
git clone https://github.com/airbytehq/airbyte-agent-connectors
```

Once git finishes cloning, change directory into the MCP server folder within the repo.

```bash
cd airbyte-agent-connectors/airbyte-agent-mcp
```
## Part 2: Configure the connector you want to use

### Create a connector configuration file

The `configured_connectors.yaml` file defines which agent connectors you are making available through the MCP and which secrets you need for authentication.

1. Create a file called `configured_connectors.yaml`. It's easiest to add this file to the root, but if you want to add it somewhere else, you can tell the MCP where to find it later.

2. Add your connector definition to this file. The `connector_name` field specifies which connector to load from the [Airbyte AI Connectors registry](https://connectors.airbyte.ai/registry.json). The keys under `secrets` are logical names that must match environment variables in your `.env` file.

   ```yaml title="configured_connectors.yaml"
   connectors:
     - id: stripe
       type: local
       connector_name: stripe
       description: "My Stripe API connector"
       secrets:
         api_key: STRIPE_API_KEY
   ```

### Define secrets in `.env`

1. Create a new file called `.env`.

2. Populate that file with your secret definitions. For example, if you defined an `api_key`/`STRIPE_API_KEY` key-value pair in `configured_connectors.yaml`, define `STRIPE_API_KEY` in your `.env` file.

   ```text title=".env"
   STRIPE_API_KEY=your_stripe_api_key
   ```
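
Before starting the server, it can save a debugging round trip to confirm that every environment variable referenced under `secrets` is actually set. The sketch below is an illustrative stdlib-only check, not part of the Connector MCP; the config is shown as a dict where in practice you'd parse `configured_connectors.yaml`:

```python
import os


def missing_secrets(config: dict, env: dict) -> list[str]:
    """Return '<connector id>: <env var>' for every secret not present in env."""
    missing = []
    for connector in config.get("connectors", []):
        for env_var in connector.get("secrets", {}).values():
            if env_var not in env:
                missing.append(f"{connector['id']}: {env_var}")
    return missing


# Mirrors the YAML example above.
config = {"connectors": [{"id": "stripe", "secrets": {"api_key": "STRIPE_API_KEY"}}]}
print(missing_secrets(config, dict(os.environ)))
```

An empty list means every referenced secret is defined in your environment.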
## Part 3: Run the Connector MCP

Use your package manager to run the Connector MCP.

1. If your `configured_connectors.yaml` and `.env` files are not in the repository root directory, specify their location with arguments before running the MCP.

   ```bash
   python -m connector_mcp path/to/configured_connectors.yaml path/to/.env
   ```

2. Run the MCP.

   ```bash
   uv run connector_mcp
   ```
## Part 4: Use the Connector MCP with your agent

<Tabs>
<TabItem value="Claude" label="Claude" default>

1. Add the MCP through your command line tool.

   ```bash
   claude mcp add --transport stdio connector-mcp -- \
     uv --directory /path/to/connector-mcp run connector_mcp
   ```

   Alternatively, open `.claude.json` and add the following configuration. Take extra care to get the path to the Connector MCP correct. Claude expects the absolute path from the root of your machine, not a relative path.

   ```json title=".claude.json"
   "mcpServers": {
     "connector-mcp": {
       "type": "stdio",
       "command": "uv",
       "args": [
         "--directory",
         "/path/to/connector-mcp",
         "run",
         "connector_mcp"
       ],
       "env": {}
     }
   },
   ```

2. Run Claude.

   ```bash
   claude
   ```

3. Verify the MCP server is running.

   ```bash
   /mcp
   ```

   You should see something like this.

   ```bash
   Connector-mcp MCP Server

   Status: ✔ connected
   Command: uv
   Args: --directory /path/to/connector-mcp run connector_mcp
   Config location: /path/to/.claude.json [project: /path/to/connector-mcp]
   Capabilities: tools
   Tools: 3 tools

   ❯ 1. View tools
     2. Reconnect
     3. Disable
   ```

4. Press <kbd>Esc</kbd> to go back to the main Claude prompt screen. You're now ready to work.

</TabItem>
<TabItem value="Other" label="Other">

Connector MCP runs as a standard MCP server over stdio. Any MCP-compatible client that supports custom stdio servers can use it by running the same command shown in the Claude tab. Refer to your client's documentation for how to add a custom MCP server.

The key configuration elements are:

- **Transport**: stdio
- **Command**: `uv`
- **Arguments**: `--directory /path/to/connector-mcp run connector_mcp`

</TabItem>
</Tabs>
## Part 5: Work with your data

Once your agent connects to the Connector MCP, you can use natural language to explore and interact with your data. The MCP server exposes three tools to your agent: one to list configured connectors, one to describe what a connector can do, and one to execute operations against your data sources.

### Verify your setup

Start by confirming your connector is properly configured. Ask your agent something like:

"List all configured connectors and tell me which entities and actions are available for the stripe connector."

Your agent discovers the available connectors and describes the Stripe connector's capabilities, showing you entities like `customers` and the actions you can perform on them, like `list` and `get`.

### Explore your data

Once you've verified your setup, you can start exploring your data with natural language queries. Here are some examples using Stripe:

<!-- vale off -->
- "List the 10 most recent Stripe customers and show me their email, name, and account balance."
- "Get the details for customer cus_ABC123 and show me all available fields."
- "How many customers do I have in Stripe? List them grouped by their creation month."
<!-- vale on -->

Your agent translates these requests into the appropriate API calls, fetches the data, and presents it in a readable format.
### Ask analytical questions

You can also ask your agent to analyze and summarize data across multiple records:

<!-- vale off -->
- "Find any Stripe customers who have a negative balance and list them with their balance amounts."
- "Summarize my Stripe customers by showing me the total count and the date range of when they were created."
<!-- vale on -->

The agent can combine multiple API calls and reason over the results to answer more complex questions.
### Tips for effective queries

When working with your data through the MCP, keep these tips in mind:

- Be specific about which connector you want to use if you have multiple configured (for example, "Using the stripe connector, list customers").
- Start with broad queries to understand what data is available, then drill down into specific records.
- If you're unsure what fields are available, ask your agent to describe the connector's entities first.
- For large datasets, specify limits in your queries to avoid overwhelming responses (for example, "Show me the first 20 customers").

## Summary

In this tutorial, you learned how to:

- Clone and set up Airbyte's Connector MCP
- Integrate the MCP with your AI agent
- Use natural language to interact with your data

## Next steps

- Continue adding new connectors to the MCP server by repeating Parts 2, 3, and 4 of this tutorial.

  You can configure multiple connectors in the same file. Here's an example:

  ```yaml title="configured_connectors.yaml"
  connectors:
    - id: stripe
      type: local
      connector_name: stripe
      description: "Stripe connector from Airbyte registry"
      secrets:
        api_key: STRIPE_API_KEY
    - id: github
      type: local
      connector_name: github
      description: "GitHub connector from Airbyte registry"
      secrets:
        token: GITHUB_TOKEN
  ```

- If you need to run more complex processing and trigger effects based on your data, try the [Python SDK tutorial](tutorial-python) to start using agent connectors with the Python SDK.

docs/ai-agents/connectors/tutorial-python.md (new file, 247 lines):

---
sidebar_label: "Python SDK tutorial"
---

# Get started with agent connectors: Python SDK

In this tutorial, you'll create a new Python project with `uv`, add a Pydantic AI agent, equip it to use one of Airbyte's agent connectors, and use natural language to explore your data. This tutorial uses GitHub, but if you don't have a GitHub account, you can use one of Airbyte's other agent connectors and perform different operations.

Using the Python SDK is more time-consuming than the Connector MCP server, but it affords you the most control over the context you send to your agent.
## Overview

This tutorial is for AI engineers and other technical users who work with data and AI tools. You can complete it in about 15 minutes.

The tutorial assumes you have basic knowledge of the following tools, but most software engineers shouldn't struggle with anything that follows.

- Python and package management with uv
- Pydantic AI
- GitHub, or a different third-party service you want to connect to

## Before you start

Before you begin this tutorial, ensure you have the following.

- [Python](https://www.python.org/downloads/) version 3.10 or later
- [uv](https://github.com/astral-sh/uv)
- A [GitHub personal access token](https://github.com/settings/tokens). For this tutorial, a classic token with `repo` scope is sufficient.
- An [OpenAI API key](https://platform.openai.com/api-keys). This tutorial uses OpenAI, but Pydantic AI supports other LLM providers if you prefer.
## Part 1: Create a new Python project

In this tutorial, you initialize a basic Python project to work in. However, if you have an existing project you want to work with, feel free to use that instead.

1. Create a new project using uv:

   ```bash
   uv init my-ai-agent --app
   cd my-ai-agent
   ```

   This creates a project with the following structure:

   ```text
   my-ai-agent/
   ├── .gitignore
   ├── .python-version
   ├── README.md
   ├── main.py
   └── pyproject.toml
   ```

2. Create an `agent.py` file for your agent definition:

   ```bash
   touch agent.py
   ```

   You create `.env` and `uv.lock` files in later steps, so don't worry about them yet.
## Part 2: Install dependencies

Install the GitHub connector and Pydantic AI. This tutorial uses OpenAI as the LLM provider, but Pydantic AI supports many other providers.

```bash
uv add airbyte-ai-github pydantic-ai
```

This command installs:

- `airbyte-ai-github`: The Airbyte agent connector for GitHub, which provides type-safe access to GitHub's API.
- `pydantic-ai`: The AI agent framework, which includes support for multiple LLM providers, including OpenAI, Anthropic, and Google.

The GitHub connector also includes `python-dotenv`, which you can use to load environment variables from a `.env` file.

:::note
If you want a smaller installation with only OpenAI support, you can use `pydantic-ai-slim[openai]` instead of `pydantic-ai`. See the [Pydantic AI installation docs](https://ai.pydantic.dev/install/) for more options.
:::
## Part 3: Import Pydantic AI and the GitHub agent connector

Add the following imports to `agent.py`:

```python title="agent.py"
import os

from dotenv import load_dotenv
from pydantic_ai import Agent
from airbyte_ai_github import GithubConnector
from airbyte_ai_github.models import GithubAuthConfig
```

These imports provide:

- `os`: Access environment variables for your GitHub token and LLM API key.
- `load_dotenv`: Load environment variables from your `.env` file.
- `Agent`: The Pydantic AI agent class that orchestrates LLM interactions and tool calls.
- `GithubConnector`: The Airbyte agent connector that provides type-safe access to GitHub's API.
- `GithubAuthConfig`: The authentication configuration for the GitHub connector.
## Part 4: Add a .env file with your secrets

1. Create a `.env` file in your project root and add your secrets to it. Replace the placeholder values with your actual credentials.

   ```text title=".env"
   GITHUB_ACCESS_TOKEN=your-github-personal-access-token
   OPENAI_API_KEY=your-openai-api-key
   ```

   :::warning
   Never commit your `.env` file to version control. If you do this by mistake, rotate your secrets immediately.
   :::

2. Add the following line to `agent.py` after your imports to load the environment variables:

   ```python title="agent.py"
   load_dotenv()
   ```

   This makes your secrets available via `os.environ`. Pydantic AI automatically reads `OPENAI_API_KEY` from the environment, and you'll use `os.environ["GITHUB_ACCESS_TOKEN"]` to configure the connector in the next section.
## Part 5: Configure your connector and agent

Now that your environment is set up, add the following code to `agent.py` to create the GitHub connector and Pydantic AI agent.

### Define the connector

Define the agent connector for GitHub. It authenticates using your personal access token.

```python title="agent.py"
connector = GithubConnector(
    auth_config=GithubAuthConfig(
        access_token=os.environ["GITHUB_ACCESS_TOKEN"]
    )
)
```

### Create the agent

Create a Pydantic AI agent with a system prompt that describes its purpose:

```python title="agent.py"
agent = Agent(
    "openai:gpt-4o",
    system_prompt=(
        "You are a helpful assistant that can access GitHub repositories, issues, "
        "and pull requests. Use the available tools to answer questions about "
        "GitHub data. Be concise and accurate in your responses."
    ),
)
```

- The `"openai:gpt-4o"` string specifies the model to use. You can use a different model by changing the model string. For example, use `"openai:gpt-4o-mini"` to lower costs, or see the [Pydantic AI models documentation](https://ai.pydantic.dev/models/) for other providers like Anthropic or Google.
- The `system_prompt` parameter tells the LLM what role it should play and how to behave.
## Part 6: Add tools to your agent
|
||||
|
||||
Tools let your agent fetch real data from GitHub using Airbyte's agent connector. Without tools, the agent can only respond based on its training data. By registering connector operations as tools, the agent can decide when to call them based on natural language questions.
|
||||
|
||||
Add the following code to `agent.py`.
|
||||
|
||||
```python title="agent.py"
# Tool to list issues in a repository
@agent.tool_plain
async def list_issues(owner: str, repo: str, limit: int = 10) -> str:
    """List open issues in a GitHub repository."""
    result = await connector.issues.list(owner=owner, repo=repo, states=["OPEN"], per_page=limit)
    return str(result.data)


# Tool to list pull requests in a repository
@agent.tool_plain
async def list_pull_requests(owner: str, repo: str, limit: int = 10) -> str:
    """List open pull requests in a GitHub repository."""
    result = await connector.pull_requests.list(owner=owner, repo=repo, states=["OPEN"], per_page=limit)
    return str(result.data)
```

The `@agent.tool_plain` decorator registers each function as a tool the agent can call. The docstring becomes the tool's description, which helps the LLM understand when to use it. The function parameters become the tool's input schema, so the LLM knows what arguments to provide.

With these two tools, your agent can answer questions about issues, pull requests, or both. For example, it can compare open issues against pending PRs to identify which issues might be resolved soon.

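Both tools return `str(result.data)` verbatim, and for busy repositories that string can be large enough to crowd the model's context window. Here's a minimal sketch of a guard you could apply before returning from a tool; the 4,000-character cap is an arbitrary assumption, not a connector requirement:

```python
def cap_tool_output(text: str, max_chars: int = 4000) -> str:
    """Truncate oversized tool results so they don't overwhelm the model's context."""
    if len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    return text[:max_chars] + f"\n... [truncated {omitted} characters]"
```

Inside a tool, you'd return `cap_tool_output(str(result.data))` instead of the raw string.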
## Part 7: Run your project

Now that your agent is configured with tools, update `main.py` and run your project.

1. Update `main.py`. This code creates a simple chat interface in your terminal and allows your agent to remember your conversation history between prompts.

   ```python title="main.py"
   import asyncio
   from agent import agent

   async def main():
       print("GitHub Agent Ready! Ask questions about GitHub repositories.")
       print("Type 'quit' to exit.\n")

       history = None

       while True:
           prompt = input("You: ")
           if prompt.lower() in ('quit', 'exit', 'q'):
               break
           result = await agent.run(prompt, message_history=history)
           history = result.all_messages()  # Carry the conversation into the next turn
           print(f"\nAgent: {result.output}\n")

   if __name__ == "__main__":
       asyncio.run(main())
   ```

2. Run the project.

   ```bash
   uv run main.py
   ```

||||
### Chat with your agent

The agent waits for your input. Once you prompt it, the agent decides which tools to call based on your question, fetches the data from GitHub, and returns a natural language response. Try prompts like:

- "List the 10 most recent open issues in airbytehq/airbyte"
- "What are the 10 most recent pull requests that are still open in airbytehq/airbyte?"
- "Are there any open issues that might be fixed by a pending PR?"

The agent keeps basic message history within each session, so you can ask follow-up questions based on its responses.

### Troubleshooting

If your agent fails to retrieve GitHub data, check the following:

- **HTTP 401 errors**: Your `GITHUB_ACCESS_TOKEN` is invalid or expired. Generate a new token and update your `.env` file.
- **HTTP 403 errors**: Your token doesn't have the required scopes. Ensure your token has the `repo` scope for accessing repository data.
- **OpenAI errors**: Verify your `OPENAI_API_KEY` is valid, has available credits, and won't exceed rate limits.

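Many of these failures trace back to missing or empty environment variables. A small preflight check you could run at startup (a sketch; the variable names match this tutorial's `.env` file):

```python
import os

REQUIRED_VARS = ("GITHUB_ACCESS_TOKEN", "OPENAI_API_KEY")

def missing_env_vars(env=os.environ) -> list:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Example: warn before the agent starts making API calls.
problems = missing_env_vars()
if problems:
    print(f"Missing environment variables: {', '.join(problems)}")
```

Calling this at the top of `main.py` surfaces configuration problems before the first API call fails.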
## Summary

In this tutorial, you learned how to:

- Set up a new Python project with `uv`
- Add Pydantic AI and Airbyte's GitHub agent connector to your project
- Configure environment variables and authentication
- Add tools to your agent using the GitHub connector
- Run your project and use natural language to interact with GitHub data

## Next steps

- Add more tools and agent connectors to your project. For GitHub, you can wrap additional operations (like search, comments, or commits) as tools. Explore other agent connectors in the [Airbyte agent connectors catalog](https://github.com/airbytehq/airbyte-agent-connectors) to give your agent access to more services.
- Consider how you might expand your agent's capabilities. For example, you might want to trigger effects like sending a Slack message or an email based on the agent's findings. You aren't limited to Airbyte's agent connectors; you can use other libraries and integrations to build an increasingly robust agent ecosystem.
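As one example of triggering a side effect, a tool could post the agent's findings to a Slack incoming webhook. This is a hedged sketch using only the standard library; the webhook URL is a hypothetical placeholder you'd replace with your own, and these helpers aren't part of Airbyte's connectors:

```python
import json
import urllib.request

def build_slack_payload(findings: str) -> bytes:
    """Format the agent's findings as a Slack incoming-webhook message body."""
    return json.dumps({"text": f"GitHub agent findings:\n{findings}"}).encode("utf-8")

def send_to_slack(webhook_url: str, findings: str) -> None:
    """POST the findings to a Slack incoming webhook (hypothetical URL)."""
    request = urllib.request.Request(
        webhook_url,
        data=build_slack_payload(findings),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)
```

You could register `send_to_slack` as another `@agent.tool_plain` function so the agent can decide when to notify your team.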
@@ -9,16 +9,26 @@ There are three components to Airbyte Embedded:

You can read more about how Airbyte Embedded fits in your application [here](https://airbyte.com/blog/how-to-build-ai-apps-with-customer-context).

There are two approaches to set up Airbyte Embedded:
Before using any Airbyte developer tools, ensure you have:

- **Airbyte Cloud account**: Sign up at [cloud.airbyte.com](https://cloud.airbyte.com)
- **Embedded access**: Contact michel@airbyte.io or teo@airbyte.io to enable Airbyte Embedded on your account
- **API credentials**: Available in your Airbyte Cloud dashboard under Settings > Applications

There are two approaches to set up Airbyte Embedded: the widget and the API.

## When to Use the Widget

Use the [Airbyte Embedded Widget](./widget/README.md) if you:

- Want to get started quickly with minimal development effort
- Are comfortable with a pre-built UI that matches Airbyte's design
- Want Airbyte to handle authentication, error states, and validation

## When to Use the API

Use the [Airbyte API](./api/README.md) if you:

- Need complete control over the user experience and UI design
- Want to integrate data source configuration into your existing workflows

@@ -4,9 +4,9 @@ products: embedded

---

# Airbyte API
# Agent Engine API

The Airbyte API allows you to build a fully integrated Airbyte Embedded Experience.
The Agent Engine API allows you to build a fully integrated Airbyte Embedded Experience.

## Implementation Steps

@@ -27,4 +27,4 @@ For each user who wants to connect their data:

This approach separates one-time organizational setup from per-user operations, making your integration more scalable.

The complete API reference can be found at [api.airbyte.ai/api/v1/docs](https://api.airbyte.ai/api/v1/docs).
[Full Agent Engine API reference](/ai-agents/embedded/api-reference/sonar).

@@ -1,7 +1,3 @@
---
products: embedded
---

# Airbyte Embedded Widget

The [Airbyte Embedded Widget](https://github.com/airbytehq/airbyte-embedded-widget) is a JavaScript library you can use in your application to allow your users to sync their data integrations to your data lake.

@@ -1,7 +1,3 @@
---
products: embedded
---

# Managing Airbyte Embedded

## Customer Workspaces

@@ -1,7 +1,3 @@
---
products: embedded
---

# Get started with the Airbyte Embedded Widget

This guide walks you through implementing the Airbyte Embedded Widget into your existing web application. You'll learn how to set up connection templates, authenticate your application, and embed the widget to allow your users to sync their data. This should take approximately 30 minutes to complete.

@@ -1,7 +1,3 @@
---
products: embedded
---

# Template Tags

## Overview

@@ -1,7 +1,3 @@
---
products: embedded
---

# Develop Your Web App

The sample onboarding app is a full-stack React application with support for both local and production (Vercel) deployment architectures:

@@ -1,7 +1,3 @@
---
products: embedded
---

# 2-Minute Quickstart

## Setup (all apps)

@@ -1,7 +1,3 @@
---
products: embedded
---

# Use Airbyte Embedded

With your app up and running, you will be prompted to enter your web password before continuing. After authentication, you will see a screen that allows customers to onboard by adding their email address.

@@ -1,11 +1,12 @@
---
products: embedded
draft: true
---

# Connector Builder MCP Server

> **NOTE:**
> The Connector Builder MCP server is currently in development. This documentation will be updated as the server becomes available.
:::note
The Connector Builder MCP server is currently in development. This documentation will be updated as the server becomes available.
:::

The Connector Builder MCP server provides an AI-driven experience for building and testing Airbyte connectors using the [Model Context Protocol](https://modelcontextprotocol.io/). This enables AI assistants to help developers create, configure, and validate custom connectors through a standardized interface.

@@ -1,11 +1,8 @@
---
products: embedded
---

# PyAirbyte MCP Server

> **NOTE:**
> This MCP server implementation is experimental and may change without notice between minor versions of PyAirbyte. The API may be modified or entirely refactored in future versions.
:::note
This MCP server implementation is experimental and may change without notice between minor versions of PyAirbyte. The API may be modified or entirely refactored in future versions.
:::

The PyAirbyte MCP (Model Context Protocol) server provides a standardized interface for managing Airbyte connectors through MCP-compatible clients. This experimental feature allows you to list connectors, validate configurations, and run sync operations using the MCP protocol.

docs/developers/mcp-servers/readme.md
@@ -0,0 +1,10 @@
import DocCardList from '@theme/DocCardList';

# MCP Servers

Airbyte provides MCP (Model Context Protocol) servers to enable AI-assisted data integration workflows for different use cases.

- The PyAirbyte MCP is a local MCP server for managing Airbyte connectors through AI assistants.
- The Connector Builder MCP (coming soon) provides AI-assisted connector development.

<DocCardList />

@@ -2,27 +2,15 @@ import ConnectorRegistry from '@site/src/components/ConnectorRegistry';

# Connectors

A connector is a tool to pull data from a source or push data to a destination.

Source connectors connect to the APIs, files, databases, or data warehouses from which you want to pull data. Destination connectors are the data warehouses, data lakes, databases, or analytics tools to which you want to push data.

Browse Airbyte's catalog below to see which connectors are available, read their documentation, or review the code and GitHub issues for that connector. Most connectors are available in both Cloud and Self-Managed versions of Airbyte, but some are only available in Self-Managed.
Airbyte's library of connectors is used by the [data replication platform](/platform). A connector is a tool to pull data from a source or push data to a destination. To learn more about connectors, see [Sources, destinations, and connectors](../platform/move-data/sources-destinations-connectors). To learn how to use a specific connector, find the documentation for the connector you want to use, below.

## Contribute to Airbyte's connectors

Don't see the connector you need? Need a connector to do something it doesn't currently do? Airbyte's connectors are open source. You can [build entirely new connectors](../platform/connector-development/) or contribute enhancements, bug fixes, and features to existing connectors. We encourage contributors to [add your changes](/community/contributing-to-airbyte/) to Airbyte's public connector catalog, but you always have the option to publish them privately in your own workspaces.
Don't see the connector you need? Need a connector to do something it doesn't currently do? Airbyte's connectors are open source. You can [build new connectors](../platform/connector-development/) or contribute fixes and features to existing connectors. You can [add your changes](/community/contributing-to-airbyte/) to Airbyte's public connector catalog to help others, or publish changes privately in your own workspaces.

## Connector support levels

Each connector has one of the following support levels. Review [Connector support levels](/integrations/connector-support-levels) for details on each tier.

- **Airbyte**: maintained by Airbyte.

- **Enterprise**: special, premium connectors available to Enterprise and Pro customers **for an additional cost**. To learn more about enterprise connectors, [talk to Sales](https://airbyte.com/company/talk-to-sales).

- **Marketplace**: maintained by the open source community.

- **Custom**: If you create your own custom connector, you alone are responsible for its maintenance.
Connectors have different support levels (Airbyte, Marketplace, Enterprise, and Custom). Review [Connector support levels](/integrations/connector-support-levels) for details.

## All source connectors

@@ -2,18 +2,61 @@
products: all
---

# Airbyte platform
# Data replication platform

import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
import Taxonomy from "@site/static/_taxonomy_of_data_movement.md";

Airbyte is an open source data integration and activation platform. It helps you consolidate data from hundreds of sources into your data warehouses, data lakes, and databases. Then, it helps you move data from those locations into the operational tools where work happens, like CRMs, marketing platforms, and support systems.
Use Airbyte's data replication platform to consolidate data from hundreds of sources into your data warehouses, data lakes, and databases. Then, move data into the operational tools where work happens, like CRMs, marketing platforms, and support systems.

Whether you're part of a large organization managing complex data pipelines or an individual analyst consolidating data, Airbyte works for you. Airbyte offers flexibility and scalability that's easy to tailor to your specific needs, from one-off jobs to enterprise solutions.

## Airbyte plans
## Why Airbyte?

Airbyte is available as a self-managed, hybrid, or fully managed cloud solution. [Compare plans and pricing >](https://airbyte.com/pricing)
Teams and organizations need efficient and timely data access to an ever-growing list of data sources. In-house data pipelines are brittle and costly to build and maintain. Airbyte's unique open source approach enables your data stack to adapt as your data needs evolve.

- **Wide connector availability:** Airbyte's connector catalog comes "out-of-the-box" with over 600 pre-built connectors. These connectors can be used to start replicating data from a source to a destination in just a few minutes.

- **Long-tail connector coverage:** You can easily extend Airbyte's capability to support your custom use cases through Airbyte's [No-Code Connector Builder](/platform/connector-development/connector-builder-ui/overview).

- **Robust platform** provides horizontal scaling required for large-scale data movement operations, available as [Cloud-managed](https://airbyte.com/product/airbyte-cloud) or [Self-managed](https://airbyte.com/product/airbyte-enterprise).

- **Accessible User Interfaces** through the UI, [**PyAirbyte**](/developers/using-pyairbyte) (Python library), [**API**](/developers/api-documentation), and [**Terraform Provider**](/developers/terraform-documentation) to integrate with your preferred tooling and approach to infrastructure management.

Airbyte is suitable for a wide range of data integration use cases, including AI data infrastructure and EL(T) workloads.

### The use case for data replication

Airbyte's data replication platform is an extract, load, and data activation solution. You might know this as ELT/reverse ETL.

Data replication is ideal when you:

- Need all your data in one place
- Need to join across datasets
- Need more pipeline steps, even if they're slow
- Want storage
- Want to update content, but not trigger side effects
- Must work with poor vendor APIs (although good APIs are always preferable)

Data replication _isn't_ ideal when you:

- Don't want storage
- Care a lot about freshness and latency
- Are working with a small amount of data
- Need to trigger side effects, like sending an email or closing a ticket

If data replication isn't what you're looking for, [Agent engine](/ai-agents) might be.

### Taxonomy of data movement

<Taxonomy />

## Plans

Airbyte's data replication platform is available as a self-managed, hybrid, or fully managed cloud solution.

[Compare plans and pricing >](https://airbyte.com/pricing)

### Self-managed plans

@@ -63,17 +106,6 @@ Many people think of Airbyte and its connectors as infrastructure. The [Terrafor

If you want to use Python to move data, our Python library, [PyAirbyte](/developers/using-pyairbyte), might be the best fit for you. It's a good choice if you're using Jupyter Notebook or iterating on an early prototype for a large data project and don't need to run a server. PyAirbyte isn't an SDK for managing Airbyte. If that's what you're looking for, use the [API or Python SDK](#api-sdk).

## Why Airbyte?

Teams and organizations need efficient and timely data access to an ever-growing list of data sources. In-house data pipelines are brittle and costly to build and maintain. Airbyte's unique open source approach enables your data stack to adapt as your data needs evolve.

- **Wide connector availability:** Airbyte's connector catalog comes "out-of-the-box" with over 600 pre-built connectors. These connectors can be used to start replicating data from a source to a destination in just a few minutes.
- **Long-tail connector coverage:** You can easily extend Airbyte's capability to support your custom use cases through Airbyte's [No-Code Connector Builder](/platform/connector-development/connector-builder-ui/overview).
- **Robust platform** provides horizontal scaling required for large-scale data movement operations, available as [Cloud-managed](https://airbyte.com/product/airbyte-cloud) or [Self-managed](https://airbyte.com/product/airbyte-enterprise).
- **Accessible User Interfaces** through the UI, [**PyAirbyte**](/developers/using-pyairbyte) (Python library), [**API**](/developers/api-documentation), and [**Terraform Provider**](/developers/terraform-documentation) to integrate with your preferred tooling and approach to infrastructure management.

Airbyte is suitable for a wide range of data integration use cases, including AI data infrastructure and EL(T) workloads. Airbyte is also [embeddable](https://airbyte.com/product/powered-by-airbyte) within your own app or platform to power your product.

## Contribute

Airbyte is an open source product. This is vital to Airbyte's vision of data movement. The world has seemingly infinite data sources, and only through community collaboration can we address that long tail of data sources.

@@ -48,6 +48,7 @@ ETL
ELT
[Dd]ata activation
ID
[Aa]gent(ic)?

# Common acronyms and initialisms that don't need definitions

@@ -181,7 +181,6 @@ const config: Config = {
remarkPlugins: [
plugins.docsHeaderDecoration,
plugins.enterpriseDocsHeaderInformation,
plugins.productInformation,
plugins.docMetaTags,
plugins.addButtonToTitle,
[plugins.npm2yarn, { sync: true }],
@@ -396,7 +395,7 @@ const config: Config = {
position: "left",
docsPluginId: "platform",
sidebarId: "platform",
label: "Platform",
label: "Data replication",
},
{
type: "docSidebar",

@@ -26,6 +26,18 @@ module.exports = {
label: 'Java SDK',
href: 'https://github.com/airbytehq/airbyte-api-java-sdk',
},
{
type: 'category',
label: 'MCP Servers',
link: {
type: "doc",
id: 'mcp-servers/readme',
},
items: [
'mcp-servers/pyairbyte-mcp',
// 'mcp-servers/connector-builder-mcp',
],
},
],
},
],

@@ -241,7 +241,7 @@ module.exports = {
{
type: "category",
collapsible: false,
label: "Airbyte Platform",
label: "Data replication platform",
link: {
type: "doc",
id: "readme",

@@ -27,8 +27,7 @@
--color-active-nav-item-text: var(--ifm-color-primary-darker);
--ifm-table-background: transparent;
--ifm-table-stripe-background: transparent;
--ifm-table-head-background: var(--ifm-color-primary);
--ifm-table-head-color: var(--color-white);
--ifm-table-head-background: var(--color-blue-30);
--ifm-table-border-color: var(--ifm-color-primary-lightest);
--docusaurus-highlighted-code-line-bg: rgba(0, 0, 0, 0.2);

@@ -327,27 +326,10 @@ The variables for them have been added to :root at the top of this file */

table {
border-spacing: 0;
border-collapse: separate;
border-collapse: collapse;
overflow-x: auto;
}

/* Add these new styles */
table th:first-child {
border-top-left-radius: 10px;
}

table th:last-child {
border-top-right-radius: 10px;
}

table tr:last-child td:first-child {
border-bottom-left-radius: 10px;
}

table tr:last-child td:last-child {
border-bottom-right-radius: 10px;
}

table th code {
color: var(--ifm-color-content);
}
@@ -360,6 +342,10 @@ table td code {
border-radius: 4px;
}

table th, table td {
vertical-align: top;
}

table tr:hover {
background-color: var(--color-grey-40);
transition: background-color 0.2s ease;

File diff suppressed because it is too large
@@ -85,15 +85,15 @@ export default function Home() {

const navLinks = [
{
title: 'Platform',
title: 'Data replication platform',
link: '/platform/',
description: 'Deploy Airbyte locally, to cloud providers, or use Airbyte Cloud. Create connections, build custom connectors, and start syncing data in minutes.',
description: 'Use Airbyte\'s data replication platform to create connections, build custom connectors, and start syncing data in minutes.',
icon: PlatformIcon,
},
{
title: 'Connectors',
link: '/integrations/',
description: 'Browse Airbyte\'s catalog of over 600 sources and destinations, and learn to set them up in Airbyte.',
description: 'Browse Airbyte\'s catalog of over 600 sources and destinations, and learn to set them up in Airbyte\'s data replication platform.',
icon: ConnectorsIcon,
},
{
@@ -103,9 +103,9 @@ export default function Home() {
icon: ReleaseNotesIcon,
},
{
title: 'AI agents',
title: 'Agent engine',
link: '/ai-agents/',
description: 'Explore AI Agent tools and capabilities for building intelligent data pipelines.',
description: 'Use Airbyte\'s Agent engine to build intelligent data pipelines, explore your data, and work with it with help from AI.',
icon: AIAgentsIcon,
},
{
@@ -134,11 +134,10 @@ export default function Home() {
<div className={styles.heroContainer}>
<div className={styles.heroLeft}>
<p className={styles.heroDescription}>
Airbyte is an open source data integration and activation platform.
It helps you consolidate data from hundreds of sources into your data
warehouses, data lakes, and databases. Then, it helps you move data
from those locations into the operational tools where work happens,
like CRMs, marketing platforms, and support systems.
Airbyte is an open source data integration, activation, and agentic data platform.
Use our data replication platform to consolidate data from hundreds of sources into your data warehouses, data lakes, and databases.
Then, move data into the operational tools where work happens, like CRMs, marketing platforms, and support systems.
Or, use our Agent engine to ask questions, explore, and update your data with AI agents.
</p>
</div>
<div className={styles.heroRight}>

docusaurus/static/_taxonomy_of_data_movement.md
@@ -0,0 +1,67 @@
People think about different types of data movement with a lot of nuance. At a high level, Airbyte thinks about them like the table below. Airbyte's data replication platform targets the first row in the table. Airbyte's agentic data platform targets the second row.

While the agentic data platform exists to support AI use cases, it's incorrect to say data replication doesn't support AI. For example, data replication is a core ingredient in Retrieval-Augmented Generation (RAG). Think about your approach to data movement in terms of getting your data into the right shape at the right time. Don't think about the choice as binary. It's safe to assume AI is a stakeholder of some kind in virtually every data movement operation.

<table>
  <tr>
    <th></th>
    <th>In</th>
    <th>Out (data activation)</th>
  </tr>
  <tr>
    <th>Data replication</th>
    <td>
      <strong>ELT/ETL</strong><br /><br />
      For when:
      <ul>
        <li>You need all the data</li>
        <li>You need to join across datasets</li>
        <li>You need more pipeline steps, even if they're slow</li>
      </ul>
      Requires:
      <ul>
        <li>Storage</li>
      </ul>
    </td>
    <td>
      <strong>Reverse ETL</strong><br /><br />
      For when:
      <ul>
        <li>You have a lot of data to update</li>
        <li>You want to update content, not trigger side effects</li>
      </ul>
      Requires:
      <ul>
        <li>Good vendor APIs</li>
      </ul>
    </td>
  </tr>
  <tr>
    <th>Operations</th>
    <td>
      <strong>Get</strong><br /><br />
      For when:
      <ul>
        <li>You don't need all the data</li>
        <li>You don't want storage</li>
        <li>Freshness (latency) matters</li>
      </ul>
      Requires:
      <ul>
        <li>Good vendor APIs</li>
      </ul>
    </td>
    <td>
      <strong>Write</strong><br /><br />
      For when:
      <ul>
        <li>You're updating a small amount of data</li>
        <li>You want to trigger side effects, like sending an email or closing a ticket</li>
      </ul>
      Requires:
      <ul>
        <li>Good vendor APIs</li>
      </ul>
    </td>
  </tr>
</table>