airbyte/docs/integrations/sources/apify-dataset.md
Web scraping and automation platform.

Apify Dataset

Overview

Apify is a web scraping and web automation platform providing both ready-made and custom solutions, an open-source SDK for web scraping, proxies, and many other tools to help you build and run web automation jobs at scale.

The results of a scraping job are usually stored in an Apify dataset. This Airbyte connector automatically syncs the contents of a dataset to your chosen destination.

To sync data from a dataset, all you need is its ID. You can find it in the Apify console under Storages.

Running Airbyte sync from Apify webhook

When your Apify job (an actor run) finishes, it can trigger an Airbyte sync by calling the Airbyte API's manual connection trigger (POST /v1/connections/sync). Make this call from an Apify webhook that fires when the run finishes.
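As a rough illustration of the request such a webhook would send, the sketch below builds the POST call in Python without sending it. The base URL (`http://localhost:8000/api`), the `connectionId` value, and the helper name `build_sync_request` are assumptions made for this example; substitute the address of your own Airbyte deployment and the ID of the connection you want to sync.

```python
import json
import urllib.request


def build_sync_request(airbyte_api_url: str, connection_id: str) -> urllib.request.Request:
    """Build (but do not send) the manual sync trigger request.

    airbyte_api_url is the API root of your Airbyte deployment (assumed here).
    """
    url = airbyte_api_url.rstrip("/") + "/v1/connections/sync"
    body = json.dumps({"connectionId": connection_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values, for illustration only.
request = build_sync_request("http://localhost:8000/api", "your-connection-id")
print(request.get_method(), request.full_url)
# To actually trigger the sync: urllib.request.urlopen(request)
```

In an Apify webhook you would normally enter the same URL and JSON payload in the webhook configuration rather than run Python; the code only makes the shape of the request explicit.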

Output schema

Since dataset items do not have a strongly typed schema, they are synced as objects, without any assumptions about their content.
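To make "synced as objects" more concrete, here is a minimal sketch of wrapping an arbitrary dataset item in an Airbyte record message without inspecting its fields. The stream name `dataset_items` and the helper `to_record_message` are hypothetical, and the exact way the connector nests each item inside `data` is not specified in this document, so treat the layout as an assumption.

```python
import time
from typing import Any, Dict


def to_record_message(item: Dict[str, Any], stream: str = "dataset_items") -> Dict[str, Any]:
    """Wrap one dataset item as a RECORD message, passing the item through untouched."""
    return {
        "type": "RECORD",
        "record": {
            "stream": stream,  # hypothetical stream name
            "data": item,  # no assumptions about the item's keys or value types
            "emitted_at": int(time.time() * 1000),  # milliseconds since epoch
        },
    }


# Two items with unrelated shapes are handled identically.
product = {"url": "https://example.com/p/1", "price": 9.99}
article = {"title": "Hello", "tags": ["a", "b"], "meta": {"lang": "en"}}
messages = [to_record_message(product), to_record_message(article)]
```

Because nothing is validated or flattened here, any typing or normalization of the fields has to happen downstream of the sync.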

Features

| Feature | Supported? |
| :--- | :--- |
| Full Refresh Sync | Yes |
| Incremental Sync | No |

Performance considerations

The Apify Dataset connector uses the Apify Python Client under the hood, which should handle any API limitations under normal usage.
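The connector's own paging logic is not shown in this document. Purely as an illustration of how a large dataset can be read in bounded chunks rather than in one request, the sketch below iterates over items page by page using an offset and a limit. `fetch_page` is a hypothetical stand-in for whatever call returns one page of items (for example, a client method that accepts an offset and a limit), and the default page size is an arbitrary choice.

```python
from typing import Callable, Dict, Iterator, List

Item = Dict[str, object]


def iter_dataset_items(
    fetch_page: Callable[[int, int], List[Item]],
    page_size: int = 1000,
) -> Iterator[Item]:
    """Yield every item by requesting pages of at most page_size items."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return  # an empty page means the dataset is exhausted
        yield from page
        offset += len(page)


# Usage with an in-memory stand-in for a real dataset.
fake_dataset = [{"id": i} for i in range(5)]


def fake_fetch(offset: int, limit: int) -> List[Item]:
    return fake_dataset[offset:offset + limit]


items = list(iter_dataset_items(fake_fetch, page_size=2))
```

Keeping the fetch function injectable means the loop can be exercised without network access and swapped to a real client call later.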

Getting started

Requirements

Changelog

| Version | Date | Pull Request | Subject |
| :--- | :--- | :--- | :--- |
| 0.1.2 | 2021-11-08 | PR#7499 | Remove base-python dependencies |
| 0.1.0 | 2021-07-29 | PR#5069 | Initial version of the connector |