
[python-cdk] README cleanup (#37306)

Natik Gadzhi
2024-04-16 14:57:11 -07:00
committed by GitHub
parent 33235c80d2
commit e4c942e526
2 changed files with 116 additions and 73 deletions


# Airbyte Python CDK and Low-Code CDK

The Airbyte Python CDK is a framework for building Airbyte API source connectors. It provides a set
of classes and helpers that make it easy to build a connector against an HTTP API (REST, GraphQL,
etc.) or a generic Python source connector.

## Usage

If you're looking to build a connector, we highly recommend that you
[start with the Connector Builder](https://docs.airbyte.com/connector-development/connector-builder-ui/overview).
It should be enough for 90% of the connectors out there. For more flexible and complex connectors,
use the
[low-code CDK and `SourceDeclarativeManifest`](https://docs.airbyte.com/connector-development/config-based/low-code-cdk-overview).

If that doesn't work, then consider building on top of the
[lower-level Python CDK itself](https://docs.airbyte.com/connector-development/cdk-python/).
### Quick Start

To get started on a Python CDK-based connector or a low-code connector, you can generate a connector
project from a template:

```bash
# from the repo root
cd airbyte-integrations/connector-templates/generator
./generate.sh
```

This will generate a project with a type and a name of your choice and put it in
`airbyte-integrations/connectors`. Open the directory with your connector in an editor and follow
the `TODO` items.

### Example Connectors

**HTTP Connectors**:

- [Stripe](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-stripe/)
- [Salesforce](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-salesforce/)

**Simple Python connectors using the bare-bones `Source` abstraction**:

- [Google Sheets](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-google-sheets/google_sheets_source/google_sheets_source.py)

## Python CDK Overview

The Airbyte CDK code lives in the `airbyte_cdk` directory. Here's a high-level overview of what's
inside:

- `connector_builder`. Internal wrapper that helps the Connector Builder platform run a declarative
  manifest (low-code connector). You should not use this code directly. If you need to run a
  `SourceDeclarativeManifest`, take a look at the
  [`source-declarative-manifest`](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/source-declarative-manifest)
  connector implementation instead.
- `destinations`. Basic support for Destination connectors. If you're building a Destination
  connector in Python, try it. Some of our vector DB destinations, like `destination-pinecone`, use
  this code.
- `models` exposes `airbyte_protocol.models` as part of the `airbyte_cdk` package.
- `sources/concurrent_source` is the Concurrent CDK implementation. It supports reading data from
  streams concurrently per slice / partition, which is useful for connectors with high throughput
  and a large number of records.
- `sources/declarative` is the low-code CDK. It works on top of the Airbyte Python CDK, but provides
  a declarative manifest language to define streams, operations, etc. This makes it easier to build
  connectors without writing Python code.
- `sources/file_based` is the CDK for file-based sources. Examples include S3, Azure, GCS, etc.
- `sources/singer` is a Singer tap source adapter. Deprecated.

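The `sources/concurrent_source` entry describes reading a stream's slices or partitions concurrently. The sketch below illustrates only the idea and uses Python's standard `concurrent.futures`. It is **not** the Concurrent CDK's API. `read_partitions_concurrently`, `fake_read`, and the month-based partitions are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List

# Hypothetical aliases for illustration; these are not airbyte_cdk types.
Record = Dict[str, object]
Partition = Dict[str, str]


def read_partitions_concurrently(
    partitions: List[Partition],
    read_partition: Callable[[Partition], Iterable[Record]],
    max_workers: int = 4,
) -> List[Record]:
    """Read every partition on a worker thread and return all records.

    executor.map yields results in input order, so records come back grouped
    by partition, in the order the partitions were given.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Materialize each partition inside its worker so the I/O runs concurrently.
        batches = executor.map(lambda p: list(read_partition(p)), partitions)
        return [record for batch in batches for record in batch]


def fake_read(partition: Partition) -> Iterable[Record]:
    """Hypothetical stand-in for one API call per partition (one month of data)."""
    for i in range(2):
        yield {"month": partition["month"], "id": i}


if __name__ == "__main__":
    months = [{"month": m} for m in ("2024-01", "2024-02", "2024-03")]
    print(len(read_partitions_concurrently(months, fake_read)))  # prints 6
```

A thread pool is a reasonable fit for this sketch because each partition read is typically I/O-bound, for example one HTTP request per slice. A real implementation would also need to handle state, errors, and record ordering, and this sketch ignores all three.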
## Contributing

Thank you for your interest in contributing to the Airbyte Python CDK! Here are some guidelines to
get you started:

- We adhere to the [code of conduct](/CODE_OF_CONDUCT.md).
- You can contribute by reporting bugs, posting GitHub discussions, opening issues, improving
  [documentation](/docs/), and submitting pull requests with bug fixes and new features alike.
- If you're changing the code, please add unit tests for your change.
- When submitting issues or PRs, please add a small reproduction project. Using the changes in your
connector and providing that connector code as an example (or a satellite PR) helps!
### First time setup

Install the project dependencies and development tools by running `poetry install --all-extras`.

Installing all extras is required to run the full suite of unit tests.
#### Running tests locally
- Iterate on the CDK code locally
- Run tests via `poetry run poe unit-test-with-cov`, or `python -m pytest -s unit_tests` if you want
to pass pytest options.
- Run `poetry run poe check-local` to lint all code, type-check modified code, and run unit tests
with coverage in one command.
To see all available scripts, run `poetry run poe`.
##### Autogenerated files

Low-code CDK models are generated from `sources/declarative/declarative_component_schema.yaml`. If
your changes touch the models or the connector generator, you may want to regenerate them. To do so,
run:

```bash
poetry run poe build
```

This builds the code generator Docker image and generates the component manifest files from the
schemas and templates.

#### Testing
All tests are located in the `unit_tests` directory. Run `poetry run poe unit-test-with-cov` to run
them. This also presents a test coverage report. For faster iteration with no coverage report and
more options, `python -m pytest -s unit_tests` is a good place to start.
#### Building and testing a connector with your local CDK
When developing a new feature in the CDK, you may find it helpful to run a connector that uses that
new feature. You can test this in one of two ways:
- Running a connector locally
- Building and running a source via Docker
##### Installing your local CDK into a local Python connector

Open the connector's `pyproject.toml` file and replace the `airbyte_cdk` dependency line with the
following:

```toml
# Path to the local CDK project root (the directory that contains its pyproject.toml)
airbyte_cdk = { path = "../../../airbyte-cdk/python", develop = true }
```
Then, running `poetry update` should reinstall `airbyte_cdk` from your local working directory.
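You can confirm that the connector's environment really imports your local checkout by asking Python where it would load `airbyte_cdk` from. The sketch below uses the standard `importlib.util.find_spec` to do this. `module_origin` is a hypothetical helper name. Run the script inside the connector's environment, for example with `poetry run python`.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Run inside the connector's environment, e.g. `poetry run python check_cdk.py`
    # (check_cdk.py is a hypothetical file name for this snippet).
    print(module_origin("airbyte_cdk"))
```

With the local path dependency installed in develop mode, the printed path should point into your local `airbyte-cdk/python` directory rather than into `site-packages`. If the script prints `None`, `airbyte_cdk` is not installed in that environment.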
##### Building a Python connector in Docker with your local CDK installed
_Pre-requisite: Install the
[`airbyte-ci` CLI](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md)_
You can build your connector image with the local CDK using

```bash
airbyte-ci connectors --use-local-cdk --name=<CONNECTOR> build
```
Note that the local CDK is injected at build time, so if you make changes, you will have to run the
build command again to see them reflected.
##### Running Connector Acceptance Tests for a single connector in Docker with your local CDK installed
_Pre-requisite: Install the
[`airbyte-ci` CLI](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md)_

To run acceptance tests for a single connector using the local CDK, run the following from the
connector directory:

```bash
airbyte-ci connectors --use-local-cdk --name=<CONNECTOR> test
```

#### When you don't have access to the API

There may be times when you do not have access to the API (for example, because you lack the
credentials or network access). You will probably still want to run an end-to-end test at least
once. To do so, you can emulate the server you would be calling with a server stubbing tool.

For example, using [mockserver](https://www.mock-server.com/), you can set up an expectation file
like this:
```json
{
  ...
}
```

Assuming this file has been created at `secrets/mock_server_config/expectations.json`, the following
command starts mockserver so that any request on path `/data` returns the response defined in the
expectation file:

```bash
docker run -d --rm \
  -v $(pwd)/secrets/mock_server_config:/config \
  -p 8113:8113 \
  --env MOCKSERVER_LOG_LEVEL=TRACE \
  --env MOCKSERVER_SERVER_PORT=8113 \
  --env MOCKSERVER_WATCH_INITIALIZATION_JSON=true \
  --env MOCKSERVER_PERSISTED_EXPECTATIONS_PATH=/config/expectations.json \
  --env MOCKSERVER_INITIALIZATION_JSON_PATH=/config/expectations.json \
  mockserver/mockserver:5.15.0
```

HTTP requests to `localhost:8113/data` should now return the body defined in the expectations file.
To point your connector at the stub, either change the code that defines the base URL of your Python
source, or update `url_base` in your low-code manifest. If the Connector Builder is running in
Docker, use the domain `host.docker.internal` instead of `localhost`, because the requests are
executed inside Docker.

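If you cannot run the mockserver container, you can sketch a similar stub with Python's standard library. It serves GET requests on a single path. This is a swapped-in alternative and is not part of the CDK or of mockserver. `STUB_BODY`, `StubHandler`, and `start_stub_server` are hypothetical names. The response body is a placeholder, not the one from your expectation file.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical placeholder body; replace it with the response your connector expects.
STUB_BODY = {"data": [{"id": 1}, {"id": 2}]}


class StubHandler(BaseHTTPRequestHandler):
    """Answers GET /data with STUB_BODY and every other path with 404."""

    def do_GET(self) -> None:
        # Ignore any query string so that /data?page=2 still matches.
        if self.path.split("?", 1)[0] != "/data":
            self.send_error(404)
            return
        payload = json.dumps(STUB_BODY).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format: str, *args: object) -> None:
        # Silence the default per-request logging to stderr.
        pass


def start_stub_server(host: str = "127.0.0.1", port: int = 8113) -> HTTPServer:
    """Start the stub on a daemon thread and return the server; port=0 picks a free port."""
    server = HTTPServer((host, port), StubHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, you could call `server = start_stub_server()` to listen on port 8113. Then point your connector's base URL (or low-code `url_base`) at `http://localhost:8113`, and call `server.shutdown()` when you are done. If the connector runs inside Docker and reaches the stub through `host.docker.internal`, you may need to pass `host="0.0.0.0"` so that the server listens beyond the loopback interface.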
#### Publishing a new version to PyPI