| description |
|---|
| Web scraping and automation platform. |
Apify Dataset
Overview
Apify is a web scraping and web automation platform providing both ready-made and custom solutions, an open-source JavaScript SDK and Python SDK for web scraping, proxies, and many other tools to help you build and run web automation jobs at scale.
The results of a scraping job are usually stored in the [Apify Dataset](https://docs.apify.com/storage/dataset). This Airbyte connector provides streams to work with the datasets, including syncing their content to your chosen destination using Airbyte. To sync data from a dataset, all you need to know is your API token and dataset ID.
You can find your personal API token in the Apify Console under [Settings -> Integrations](https://console.apify.com/account/integrations) and the dataset ID under [Storage -> Datasets](https://console.apify.com/storage/datasets).
Running Airbyte sync from Apify webhook
When your Apify job (also known as an Actor run) finishes, it can trigger an Airbyte sync by calling the Airbyte API manual connection trigger (POST /v1/connections/sync). You can make this call from an Apify webhook, which is executed when the run finishes.
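As a rough illustration of the call such a webhook would make, the sketch below builds the POST request in Python. Only the `/v1/connections/sync` path comes from the text above; the base URL, the `connectionId` body field, the optional bearer token, and the helper name `build_sync_request` are assumptions for illustration, so check them against your own Airbyte deployment.

```python
import json
import urllib.request


def build_sync_request(airbyte_base_url: str, connection_id: str, token: str = "") -> urllib.request.Request:
    """Build (but do not send) a POST request to the manual sync trigger.

    Assumptions: the API is served under `airbyte_base_url`, and the
    request body identifies the connection with a `connectionId` field.
    """
    url = airbyte_base_url.rstrip("/") + "/v1/connections/sync"
    body = json.dumps({"connectionId": connection_id}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        # Hypothetical: only needed if your Airbyte instance requires auth.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder values; replace them with your own host and connection ID.
request = build_sync_request("http://localhost:8000/api", "<your-connection-id>")
```

To actually send the request, pass it to `urllib.request.urlopen(request)`. In an Apify webhook you would configure the same URL and JSON payload in the webhook settings rather than run this code.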
Features
| Feature | Supported? |
|---|---|
| Full Refresh Sync | Yes |
| Incremental Sync | Yes |
Performance considerations
The Apify dataset connector uses the Apify Python Client under the hood and should handle any API limitations under normal usage.
Streams
dataset_collection
dataset
- Calls `https://api.apify.com/v2/datasets/{datasetId}` (docs)
- Properties:
item_collection
- Calls `api.apify.com/v2/datasets/{datasetId}/items` (docs)
- Properties:
- Limitations:
  - The stream uses a dynamic schema (all the data are stored under the `"data"` key), so it should support all the Apify Datasets (produced by whatever Actor).
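To make the dynamic schema more concrete, here is a minimal Python sketch of how a dataset item could be addressed and then nested under the `"data"` key. The endpoint path and the `"data"` key come from the description above; the `format`, `offset`, and `limit` query parameters are standard Apify API options, while the helper names `items_url` and `wrap_item` are hypothetical and not part of the connector.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"


def items_url(dataset_id: str, offset: int = 0, limit: int = 1000) -> str:
    """Build the items endpoint URL for one page of a dataset."""
    query = urlencode({"format": "json", "offset": offset, "limit": limit})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"


def wrap_item(item: dict) -> dict:
    """Nest an arbitrary dataset item under "data", so every record has
    the same top-level shape regardless of which Actor produced it."""
    return {"data": item}


# Two items with unrelated fields end up with the same top-level schema.
records = [wrap_item(i) for i in ({"url": "https://example.com"}, {"price": 10})]
```

Because every record has only the single top-level `"data"` field, the stream does not need to know an Actor's output fields in advance.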
item_collection_website_content_crawler
- Calls the same endpoint and uses the same properties as the `item_collection` stream.
- Limitations:
  - The stream uses a static schema that corresponds to the datasets produced by the Website Content Crawler Actor, so only datasets produced by this Actor are supported.
Changelog
| Version | Date | Pull Request | Subject |
|---|---|---|---|
| 2.2.35 | 2025-12-16 | 70793 | Update dependencies |
| 2.2.34 | 2025-11-25 | 69867 | Update dependencies |
| 2.2.33 | 2025-11-18 | 69519 | Update dependencies |
| 2.2.32 | 2025-11-04 | 68843 | Update dependencies |
| 2.2.31 | 2025-10-21 | 68364 | Update dependencies |
| 2.2.30 | 2025-10-14 | 67997 | Update dependencies |
| 2.2.29 | 2025-10-07 | 67163 | Update dependencies |
| 2.2.28 | 2025-09-30 | 66273 | Update dependencies |
| 2.2.27 | 2025-08-09 | 64655 | Update dependencies |
| 2.2.26 | 2025-08-02 | 64432 | Update dependencies |
| 2.2.25 | 2025-07-26 | 63802 | Update dependencies |
| 2.2.24 | 2025-07-05 | 62540 | Update dependencies |
| 2.2.23 | 2025-06-28 | 62139 | Update dependencies |
| 2.2.22 | 2025-06-15 | 61108 | Update dependencies |
| 2.2.21 | 2025-05-17 | 60677 | Update dependencies |
| 2.2.20 | 2025-05-10 | 59857 | Update dependencies |
| 2.2.19 | 2025-05-03 | 59312 | Update dependencies |
| 2.2.18 | 2025-04-26 | 58251 | Update dependencies |
| 2.2.17 | 2025-04-12 | 57599 | Update dependencies |
| 2.2.16 | 2025-04-05 | 57134 | Update dependencies |
| 2.2.15 | 2025-03-29 | 56579 | Update dependencies |
| 2.2.14 | 2025-03-22 | 56107 | Update dependencies |
| 2.2.13 | 2025-03-08 | 55423 | Update dependencies |
| 2.2.12 | 2025-03-01 | 54885 | Update dependencies |
| 2.2.11 | 2025-02-22 | 54235 | Update dependencies |
| 2.2.10 | 2025-02-15 | 53872 | Update dependencies |
| 2.2.9 | 2025-02-08 | 53440 | Update dependencies |
| 2.2.8 | 2025-02-01 | 52904 | Update dependencies |
| 2.2.7 | 2025-01-25 | 52208 | Update dependencies |
| 2.2.6 | 2025-01-18 | 51740 | Update dependencies |
| 2.2.5 | 2025-01-11 | 51257 | Update dependencies |
| 2.2.4 | 2024-12-28 | 50468 | Update dependencies |
| 2.2.3 | 2024-12-21 | 50217 | Update dependencies |
| 2.2.2 | 2024-12-14 | 49553 | Update dependencies |
| 2.2.1 | 2024-12-12 | 48216 | Update dependencies |
| 2.2.0 | 2024-10-29 | 47286 | Migrate to manifest only format |
| 2.1.27 | 2024-10-29 | 47068 | Update dependencies |
| 2.1.26 | 2024-10-12 | 46837 | Update dependencies |
| 2.1.25 | 2024-10-01 | 46373 | Add user-agent header to be able to track Airbyte integration on Apify |
| 2.1.24 | 2024-10-05 | 46430 | Update dependencies |
| 2.1.23 | 2024-09-28 | 46146 | Update dependencies |
| 2.1.22 | 2024-09-21 | 45820 | Update dependencies |
| 2.1.21 | 2024-09-14 | 45479 | Update dependencies |
| 2.1.20 | 2024-09-07 | 45252 | Update dependencies |
| 2.1.19 | 2024-08-31 | 44962 | Update dependencies |
| 2.1.18 | 2024-08-24 | 44734 | Update dependencies |
| 2.1.17 | 2024-08-17 | 44204 | Update dependencies |
| 2.1.16 | 2024-08-10 | 43607 | Update dependencies |
| 2.1.15 | 2024-08-03 | 43071 | Update dependencies |
| 2.1.14 | 2024-07-27 | 42627 | Update dependencies |
| 2.1.13 | 2024-07-20 | 42364 | Update dependencies |
| 2.1.12 | 2024-07-13 | 41893 | Update dependencies |
| 2.1.11 | 2024-07-10 | 41344 | Update dependencies |
| 2.1.10 | 2024-07-09 | 41189 | Update dependencies |
| 2.1.9 | 2024-07-06 | 40813 | Update dependencies |
| 2.1.8 | 2024-06-25 | 40411 | Update dependencies |
| 2.1.7 | 2024-06-22 | 40187 | Update dependencies |
| 2.1.6 | 2024-06-04 | 39010 | [autopull] Upgrade base image to v1.2.1 |
| 2.1.5 | 2024-04-19 | 37115 | Updating to 0.80.0 CDK |
| 2.1.4 | 2024-04-18 | 37115 | Manage dependencies with Poetry. |
| 2.1.3 | 2024-04-15 | 37115 | Base image migration: remove Dockerfile and use the python-connector-base image |
| 2.1.2 | 2024-04-12 | 37115 | schema descriptions |
| 2.1.1 | 2023-12-14 | 33414 | Prepare for airbyte-lib |
| 2.1.0 | 2023-10-13 | 31333 | Add stream for arbitrary datasets |
| 2.0.0 | 2023-09-18 | 30428 | Fix broken stream, manifest refactor |
| 1.0.0 | 2023-08-25 | 29859 | Migrate to lowcode |
| 0.2.0 | 2022-06-20 | 28290 | Make connector work with platform changes not syncing empty stream schemas. |
| 0.1.11 | 2022-04-27 | 12397 | No changes. Used connector to test publish workflow changes. |
| 0.1.9 | 2022-04-05 | PR#11712 | No changes from 0.1.4. Used connector to test publish workflow changes. |
| 0.1.4 | 2021-12-23 | PR#8434 | Update fields in source-connectors specifications |
| 0.1.2 | 2021-11-08 | PR#7499 | Remove base-python dependencies |
| 0.1.0 | 2021-07-29 | PR#5069 | Initial version of the connector |
