diff --git a/docs/.gitbook/assets/zoom-marketplace-build-screen (1).png b/docs/.gitbook/assets/04_zoom-marketplace-build-screen.png similarity index 100% rename from docs/.gitbook/assets/zoom-marketplace-build-screen (1).png rename to docs/.gitbook/assets/04_zoom-marketplace-build-screen.png diff --git a/docs/.gitbook/assets/setup-successful (1).png b/docs/.gitbook/assets/09_setup-successful.png similarity index 100% rename from docs/.gitbook/assets/setup-successful (1).png rename to docs/.gitbook/assets/09_setup-successful.png diff --git a/docs/.gitbook/assets/postgres_credentials (1).png b/docs/.gitbook/assets/12_postgres_credentials.png similarity index 100% rename from docs/.gitbook/assets/postgres_credentials (1).png rename to docs/.gitbook/assets/12_postgres_credentials.png diff --git a/docs/.gitbook/assets/schema (1).png b/docs/.gitbook/assets/13_schema.png similarity index 100% rename from docs/.gitbook/assets/schema (1).png rename to docs/.gitbook/assets/13_schema.png diff --git a/docs/.gitbook/assets/launch (1).png b/docs/.gitbook/assets/14_launch.png similarity index 100% rename from docs/.gitbook/assets/launch (1).png rename to docs/.gitbook/assets/14_launch.png diff --git a/docs/.gitbook/assets/sync-screen (1).png b/docs/.gitbook/assets/15_sync-screen.png similarity index 100% rename from docs/.gitbook/assets/sync-screen (1).png rename to docs/.gitbook/assets/15_sync-screen.png diff --git a/docs/.gitbook/assets/tableau-dashboard (1).png b/docs/.gitbook/assets/16_tableau-dashboard.png similarity index 100% rename from docs/.gitbook/assets/tableau-dashboard (1).png rename to docs/.gitbook/assets/16_tableau-dashboard.png diff --git a/docs/.gitbook/assets/datasources (2).png b/docs/.gitbook/assets/17_datasources.png similarity index 100% rename from docs/.gitbook/assets/datasources (2).png rename to docs/.gitbook/assets/17_datasources.png diff --git a/docs/.gitbook/assets/change-to-per-week (1).png b/docs/.gitbook/assets/23_change-to-per-week.png similarity index 100% rename from docs/.gitbook/assets/change-to-per-week (1).png rename to docs/.gitbook/assets/23_change-to-per-week.png diff --git a/docs/.gitbook/assets/evolution-of-meetings-per-week (1).png b/docs/.gitbook/assets/25_evolution-of-meetings-per-week.png similarity index 100% rename from docs/.gitbook/assets/evolution-of-meetings-per-week (1).png rename to docs/.gitbook/assets/25_evolution-of-meetings-per-week.png diff --git a/docs/.gitbook/assets/number_of_participants_per_weekly_meetings (1).png b/docs/.gitbook/assets/28_number_of_participants_per_weekly_meetings.png similarity index 100% rename from docs/.gitbook/assets/number_of_participants_per_weekly_meetings (1).png rename to docs/.gitbook/assets/28_number_of_participants_per_weekly_meetings.png diff --git a/docs/.gitbook/assets/meetings-participant-ranked (1).png b/docs/.gitbook/assets/29_meetings-participant-ranked.png similarity index 100% rename from docs/.gitbook/assets/meetings-participant-ranked (1).png rename to docs/.gitbook/assets/29_meetings-participant-ranked.png diff --git a/docs/.gitbook/assets/duration-spent-in-weekly-webinars (1).png b/docs/.gitbook/assets/31_time-spent-in-weekly-webinars.png similarity index 100% rename from docs/.gitbook/assets/duration-spent-in-weekly-webinars (1).png rename to docs/.gitbook/assets/31_time-spent-in-weekly-webinars.png diff --git a/docs/.gitbook/assets/activate-webhook.png b/docs/.gitbook/assets/activate-webhook.png new file mode 100644 index 00000000000..ac8f53a5b71 Binary files /dev/null and 
b/docs/.gitbook/assets/activate-webhook.png differ diff --git a/docs/.gitbook/assets/airbyte-dashboard.png b/docs/.gitbook/assets/airbyte-dashboard.png new file mode 100644 index 00000000000..6f160508043 Binary files /dev/null and b/docs/.gitbook/assets/airbyte-dashboard.png differ diff --git a/docs/.gitbook/assets/app-information.png b/docs/.gitbook/assets/app-information.png new file mode 100644 index 00000000000..4e08b4c4ab1 Binary files /dev/null and b/docs/.gitbook/assets/app-information.png differ diff --git a/docs/.gitbook/assets/app-name-modal.png b/docs/.gitbook/assets/app-name-modal.png new file mode 100644 index 00000000000..37cda82d18c Binary files /dev/null and b/docs/.gitbook/assets/app-name-modal.png differ diff --git a/docs/.gitbook/assets/change-to-date-time.png b/docs/.gitbook/assets/change-to-date-time.png new file mode 100644 index 00000000000..4d7da0c0141 Binary files /dev/null and b/docs/.gitbook/assets/change-to-date-time.png differ diff --git a/docs/.gitbook/assets/choose-postgres-destination.png b/docs/.gitbook/assets/choose-postgres-destination.png new file mode 100644 index 00000000000..4cb0f52ef69 Binary files /dev/null and b/docs/.gitbook/assets/choose-postgres-destination.png differ diff --git a/docs/.gitbook/assets/click.png b/docs/.gitbook/assets/click.png new file mode 100644 index 00000000000..9f5eeec1823 Binary files /dev/null and b/docs/.gitbook/assets/click.png differ diff --git a/docs/.gitbook/assets/destination.png b/docs/.gitbook/assets/destination.png new file mode 100644 index 00000000000..5c56437ebfe Binary files /dev/null and b/docs/.gitbook/assets/destination.png differ diff --git a/docs/.gitbook/assets/drag-created-at.png b/docs/.gitbook/assets/drag-created-at.png new file mode 100644 index 00000000000..7e860f7669e Binary files /dev/null and b/docs/.gitbook/assets/drag-created-at.png differ diff --git a/docs/.gitbook/assets/empty-meeting-sheet.png b/docs/.gitbook/assets/empty-meeting-sheet.png new file mode 100644 index 00000000000..291e1e951b3 Binary files /dev/null and b/docs/.gitbook/assets/empty-meeting-sheet.png differ diff --git a/docs/.gitbook/assets/fill-in-connection-details.png b/docs/.gitbook/assets/fill-in-connection-details.png new file mode 100644 index 00000000000..a040fddf0d8 Binary files /dev/null and b/docs/.gitbook/assets/fill-in-connection-details.png differ diff --git a/docs/.gitbook/assets/hours-spent-in-weekly-meetings.png b/docs/.gitbook/assets/hours-spent-in-weekly-meetings.png new file mode 100644 index 00000000000..3515da70940 Binary files /dev/null and b/docs/.gitbook/assets/hours-spent-in-weekly-meetings.png differ diff --git a/docs/.gitbook/assets/meetings-per-week.png b/docs/.gitbook/assets/meetings-per-week.png new file mode 100644 index 00000000000..3b42972a9fa Binary files /dev/null and b/docs/.gitbook/assets/meetings-per-week.png differ diff --git a/docs/.gitbook/assets/number-of-webinars-participants.png b/docs/.gitbook/assets/number-of-webinars-participants.png new file mode 100644 index 00000000000..a20bbaeac35 Binary files /dev/null and b/docs/.gitbook/assets/number-of-webinars-participants.png differ diff --git a/docs/.gitbook/assets/number_of_webinar_attended_per_week.png b/docs/.gitbook/assets/number_of_webinar_attended_per_week.png new file mode 100644 index 00000000000..e7301d3604a Binary files /dev/null and b/docs/.gitbook/assets/number_of_webinar_attended_per_week.png differ diff --git a/docs/.gitbook/assets/setting-zoom-connector-name.png b/docs/.gitbook/assets/setting-zoom-connector-name.png new 
file mode 100644 index 00000000000..b295ee7fb87 Binary files /dev/null and b/docs/.gitbook/assets/setting-zoom-connector-name.png differ diff --git a/docs/.gitbook/assets/tableau-view-with-all-tables.png b/docs/.gitbook/assets/tableau-view-with-all-tables.png new file mode 100644 index 00000000000..ae33bba248d Binary files /dev/null and b/docs/.gitbook/assets/tableau-view-with-all-tables.png differ diff --git a/docs/.gitbook/assets/view-jwt-token.png b/docs/.gitbook/assets/view-jwt-token.png new file mode 100644 index 00000000000..f999d1d8ba9 Binary files /dev/null and b/docs/.gitbook/assets/view-jwt-token.png differ diff --git a/docs/.gitbook/assets/weekly-webinars.png b/docs/.gitbook/assets/weekly-webinars.png new file mode 100644 index 00000000000..7ae89f165e5 Binary files /dev/null and b/docs/.gitbook/assets/weekly-webinars.png differ diff --git a/docs/.gitbook/assets/zoom-dashboard.png b/docs/.gitbook/assets/zoom-dashboard.png new file mode 100644 index 00000000000..7d935168b7a Binary files /dev/null and b/docs/.gitbook/assets/zoom-dashboard.png differ diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md index c05aa61ff64..8c77dbf04e1 100644 --- a/docs/SUMMARY.md +++ b/docs/SUMMARY.md @@ -93,7 +93,7 @@ * [High-level View](architecture/high-level-view.md) * [Workers & Jobs](architecture/jobs.md) * [Technical Stack](architecture/tech-stack.md) - * [Change Data Capture \(CDC\)](architecture/cdc.md) + * [Change Data Capture (CDC)](architecture/cdc.md) * [Contributing to Airbyte](contributing-to-airbyte/README.md) * [Code of Conduct](contributing-to-airbyte/code-of-conduct.md) * [Developing Locally](contributing-to-airbyte/developing-locally.md) diff --git a/docs/architecture/cdc.md b/docs/architecture/cdc.md index cf904420bfb..5639e526946 100644 --- a/docs/architecture/cdc.md +++ b/docs/architecture/cdc.md @@ -1,11 +1,9 @@ -# Change Data Capture \(CDC\) +# Change Data Capture (CDC) ## What is log-based incremental replication? - -Many common databases support writing all record changes to log files for the purpose of replication. A consumer of these log files \(such as Airbyte\) can read these logs while keeping track of the current position within the logs in order to read all record changes coming from `DELETE`/`INSERT`/`UPDATE` statements. +Many common databases support writing all record changes to log files for the purpose of replication. A consumer of these log files (such as Airbyte) can read these logs while keeping track of the current position within the logs in order to read all record changes coming from `DELETE`/`INSERT`/`UPDATE` statements. ## Syncing - The orchestration for syncing is similar to non-CDC database sources. After selecting a sync interval, syncs are launched regularly. We read data from the log up to the time that the sync was started. We do not treat CDC sources as infinite streaming sources. You should ensure that your schedule for running these syncs is frequent enough to consume the logs that are generated. The first time the sync is run, a snapshot of the current state of the data will be taken. This is done using `SELECT` statements and is effectively a Full Refresh. Subsequent syncs will use the logs to determine which changes took place since the last sync and update those. Airbyte keeps track of the current log position between syncs. A single sync might have some tables configured for Full Refresh replication and others for Incremental. If CDC is configured at the source level, all tables with Incremental selected will use CDC. 
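To make the log-based approach concrete, here is a minimal Python sketch of the consume-and-track-position loop using `psycopg2`'s replication support. This is only an illustration, not Airbyte's implementation (Airbyte does this in Java with the `pgoutput` plugin); the slot name, output plugin, and connection settings below are all hypothetical.

```python
import psycopg2
import psycopg2.extras

# Hypothetical connection settings; substitute your own.
conn = psycopg2.connect(
    host="localhost", dbname="demo", user="airbyte", password="password",
    connection_factory=psycopg2.extras.LogicalReplicationConnection,
)
cur = conn.cursor()

# The slot persists the reader's position in the WAL between runs.
# test_decoding ships with Postgres and emits human-readable text.
cur.create_replication_slot("demo_slot", output_plugin="test_decoding")
cur.start_replication(slot_name="demo_slot", decode=True)

def consume(msg):
    # One decoded change produced by an INSERT/UPDATE/DELETE statement.
    print(msg.payload)
    # Acknowledge progress so the server can recycle old WAL segments.
    msg.cursor.send_feedback(flush_lsn=msg.data_start)

cur.consume_stream(consume)
```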
All Full Refresh tables will replicate using the same process as non-CDC sources. However, these tables will still include CDC metadata columns by default. @@ -13,13 +11,11 @@ A single sync might have some tables configured for Full Refresh replication and The Airbyte Protocol outputs records from sources. Records from `UPDATE` statements appear the same way as records from `INSERT` statements. We support different options for how to sync this data into destinations using primary keys, so you can choose to append this data, delete in place, etc. We add some metadata columns for CDC sources: - * `ab_cdc_lsn` is the point in the log where the record was retrieved * `ab_cdc_updated_at` is the timestamp for the database transaction that resulted in this record change and is present for records from `DELETE`/`INSERT`/`UPDATE` statements * `ab_cdc_deleted_at` is the timestamp for the database transaction that resulted in this record change and is only present for records from `DELETE` statements ## Limitations - * CDC incremental is only supported for tables with primary keys. A CDC source can still choose to replicate tables without primary keys as Full Refresh or a non-CDC source can be configured for the same database to replicate the tables without primary keys using standard incremental replication. * Data must be in tables, not views. * The modifications you are trying to capture must be made using `DELETE`/`INSERT`/`UPDATE`. For example, changes made from `TRUNCATE`/`ALTER` won't appear in logs and therefore in your destination. @@ -28,13 +24,10 @@ We add some metadata columns for CDC sources: * The records produced by `DELETE` statements only contain primary keys. All other data fields are unset. ## Current Support - * [Postgres](../integrations/sources/postgres.md) ## Coming Soon - * [MySQL](../integrations/sources/mysql.md) * [SQL Server / MSSQL](../integrations/sources/mssql.md) * Oracle DB -* Please [create a ticket](https://github.com/airbytehq/airbyte/issues/new/choose) if you need CDC support on another database! - +* Please [create a ticket](https://github.com/airbytehq/airbyte/issues/new/choose) if you need CDC support on another database! \ No newline at end of file diff --git a/docs/architecture/jobs.md b/docs/architecture/jobs.md index 9d2582be2f4..94adcb70614 100644 --- a/docs/architecture/jobs.md +++ b/docs/architecture/jobs.md @@ -9,15 +9,21 @@ In Airbyte, all interactions with connectors are run as jobs performed by a Worker. ## Worker Responsibilities -The worker has 4 main responsibilities in its lifecycle. 1. Spin up any connector docker containers that are needed for the job. 2. They facilitate message passing to or from a connector docker container \(more on this [below](jobs.md#message-passing)\). 3. Shut down any connector docker containers that it started. 4. Return the output of the job. \(See [Airbyte Specification](airbyte-specification.md) to understand the output of each worker type.\) +The worker has 4 main responsibilities in its lifecycle. +1. Spin up any connector docker containers that are needed for the job. +2. Facilitate message passing to or from a connector docker container (more on this [below](#message-passing)). +3. Shut down any connector docker containers that it started. +4. Return the output of the job. (See [Airbyte Specification](./airbyte-specification.md) to understand the output of each worker type.) ## Message Passing -There are 2 flavors of workers: 1. There are workers that interact with a single connector \(e.g.
spec, check, discover\) 2. There are workers that interact with 2 connectors \(e.g. sync, reset\) +There are 2 flavors of workers: +1. There are workers that interact with a single connector (e.g. spec, check, discover) +2. There are workers that interact with 2 connectors (e.g. sync, reset) -In the first case, the worker is generally extracting data from the connector and reporting it back to the scheduler. It does this by listening to STDOUT of the connector. In the second case, the worker is facilitating passing data \(via record messages\) from the source to the destination. It does this by listening on STDOUT of the source and writing to STDIN on the destination. +In the first case, the worker is generally extracting data from the connector and reporting it back to the scheduler. It does this by listening to STDOUT of the connector. In the second case, the worker is facilitating passing data (via record messages) from the source to the destination. It does this by listening on STDOUT of the source and writing to STDIN on the destination. -For more information on the schema of the messages that are passed, refer to [Airbyte Specification](airbyte-specification.md). +For more information on the schema of the messages that are passed, refer to [Airbyte Specification](./airbyte-specification.md). ## Worker Lifecycle diff --git a/docs/architecture/tech-stack.md b/docs/architecture/tech-stack.md index f721f89cdc3..43cb81c99c8 100644 --- a/docs/architecture/tech-stack.md +++ b/docs/architecture/tech-stack.md @@ -10,9 +10,7 @@ * Orchestration: [Temporal](https://temporal.io) ## Connectors - -Connectors can be written in any language. However the most common languages are: - +Connectors can be written in any language. However, the most common languages are: * Python 3.7.9 * Java 14 @@ -28,4 +26,3 @@ Connectors can be written in any language. However the most common languages are * Containerization: [Docker](https://www.docker.com/) and [Docker Compose](https://docs.docker.com/compose/) * Linter \(Frontend\): [Prettier](https://prettier.io/) * Formatter \(Backend\): [Spotless](https://github.com/diffplug/spotless) - diff --git a/docs/career-and-open-positions/README.md b/docs/career-and-open-positions/README.md index 6356cc13584..72196077fc1 100644 --- a/docs/career-and-open-positions/README.md +++ b/docs/career-and-open-positions/README.md @@ -4,7 +4,7 @@ [Airbyte](http://airbyte.io) is the upcoming open-source standard for EL\(T\). We enable data teams to replicate data from applications, APIs, and databases to data warehouses, lakes, and other destinations. We believe only an open-source approach can solve the problem of data integration, as it enables us to cover the long tail of integrations while enabling teams to adapt prebuilt connectors to their needs. -Airbyte is remote friendly, with most of the team still based in the Silicon Valley. We’re fully open as a company. Our [**company handbook**](https://handbook.airbyte.io), [**culture & values**](https://handbook.airbyte.io/company/culture-and-values), [**strategy**](https://handbook.airbyte.io/strategy/strategy) and [**roadmap**](../roadmap.md) are open to all. +Airbyte is remote friendly, with most of the team still based in the Silicon Valley. We’re fully open as a company. Our **[company handbook](https://handbook.airbyte.io)**, **[culture & values](https://handbook.airbyte.io/company/culture-and-values)**, **[strategy](https://handbook.airbyte.io/strategy/strategy)** and **[roadmap](../roadmap.md)** are open to all.
We're backed by some of the world's [top investors](./#our-investors) and believe in product-led growth, where we build something awesome and let our product bring the users, rather than an outbound sales engine with cold calls. @@ -50,12 +50,12 @@ If the written interview is a success, we might set you up with one or 2 additio Once all of this is done, we will discuss the process internally and get back to you very fast \(velocity is everything here\)! So about 2-3 calls and one written interview, that's it! -## [**Our Benefits**](https://handbook.airbyte.io/people/benefits) +## **[Our Benefits](https://handbook.airbyte.io/people/benefits)** * **Flexible work environment as fully remote** - we don’t look at when you log in, log out or how much time you work. We trust you, it’s the only way remote can actually work. -* [**Unlimited vacation policy**](https://handbook.airbyte.io/people/time-off) with mandatory minimum time off - so you can fit work around your life. -* [**Co-working space stipend**](https://handbook.airbyte.io/people/expense-policy#work-space) - we provide everyone with $200/month to use on a coworking space of their choice, if any. -* [**Parental leave**](https://handbook.airbyte.io/people/time-off#parental-leave) \(for both parents, after one year spent with the company\) - so those raising families can do so while still working for us. +* **[Unlimited vacation policy](https://handbook.airbyte.io/people/time-off)** with mandatory minimum time off - so you can fit work around your life. +* **[Co-working space stipend](https://handbook.airbyte.io/people/expense-policy#work-space)** - we provide everyone with $200/month to use on a coworking space of their choice, if any. +* **[Parental leave](https://handbook.airbyte.io/people/time-off#parental-leave)** \(for both parents, after one year spent with the company\) - so those raising families can do so while still working for us. * **Open book policy** - we reimburse books that employees want to purchase for their professional and career development. * **Continuous learning / training policy** - we sponsor the conferences and training programs you feel would add to your development in the company. * **Health insurance** for those from countries that do not provide this freely. Through Savvy in the US, which means you can choose the insurance you want and will receive a stipend from the company. diff --git a/docs/career-and-open-positions/founding-developer-advocate.md b/docs/career-and-open-positions/founding-developer-advocate.md index f73897e7072..7ce72cc6f2c 100644 --- a/docs/career-and-open-positions/founding-developer-advocate.md +++ b/docs/career-and-open-positions/founding-developer-advocate.md @@ -1,10 +1,10 @@ -# Founding Developer Advocate +# Senior Developer Advocate ## **About Airbyte** [Airbyte](http://airbyte.io) is the upcoming open-source standard for EL\(T\). We enable data teams to replicate data from applications, APIs, and databases to data warehouses, lakes, and other destinations. We believe only an open-source approach can solve the problem of data integration, as it enables us to cover the long tail of integrations while enabling teams to adapt prebuilt connectors to their needs. -Airbyte is remote friendly, with most of the team still based in the Silicon Valley. We’re fully open as a company.
Our [**company handbook**](https://handbook.airbyte.io), [**culture & values**](https://handbook.airbyte.io/company/culture-and-values), [**strategy**](https://handbook.airbyte.io/strategy/strategy) and [**roadmap**](../roadmap.md) are open to all. +Airbyte is remote friendly, with most of the team still based in the Silicon Valley. We’re fully open as a company. Our **[company handbook](https://handbook.airbyte.io)**, **[culture & values](https://handbook.airbyte.io/company/culture-and-values)**, **[strategy](https://handbook.airbyte.io/strategy/strategy)** and **[roadmap](../roadmap.md)** are open to all. We're backed by some of the world's [top investors](./#our-investors) and believe in product-led growth, where we build something awesome and let our product bring the users, rather than an outbound sales engine with cold calls. @@ -42,9 +42,9 @@ North America. ## **We provide** * **Flexible work environment as fully remote** - we don’t look at when you log in, log out or how much time you work. We trust you, it’s the only way remote can actually work. -* [**Unlimited vacation policy**](https://handbook.airbyte.io/people/time-off) with mandatory minimum time off - so you can fit work around your life. -* [**Co-working space stipend**](https://handbook.airbyte.io/people/expense-policy#work-space) - we provide everyone with $200/month to use on a coworking space of their choice, if any. -* [**Parental leave**](https://handbook.airbyte.io/people/time-off#parental-leave) \(for both parents, after one year spent with the company\) - so those raising families can do so while still working for us. +* **[Unlimited vacation policy](https://handbook.airbyte.io/people/time-off)** with mandatory minimum time off - so you can fit work around your life. +* **[Co-working space stipend](https://handbook.airbyte.io/people/expense-policy#work-space)** - we provide everyone with $200/month to use on a coworking space of their choice, if any. +* **[Parental leave](https://handbook.airbyte.io/people/time-off#parental-leave)** \(for both parents, after one year spent with the company\) - so those raising families can do so while still working for us. * **Open book policy** - we reimburse books that employees want to purchase for their professional and career development. * **Continuous learning / training policy** - we sponsor the conferences and training programs you feel would add to your development in the company. * **Health insurance** for those from countries that do not provide this freely. Through Savvy in the US, which means you can choose the insurance you want and will receive a stipend from the company. diff --git a/docs/career-and-open-positions/operations-manager.md b/docs/career-and-open-positions/operations-manager.md index 84a2b0f4723..83be945c641 100644 --- a/docs/career-and-open-positions/operations-manager.md +++ b/docs/career-and-open-positions/operations-manager.md @@ -4,7 +4,7 @@ [Airbyte](http://airbyte.io) is the upcoming open-source standard for EL\(T\). We enable data teams to replicate data from applications, APIs, and databases to data warehouses, lakes, and other destinations. We believe only an open-source approach can solve the problem of data integration, as it enables us to cover the long tail of integrations while enabling teams to adapt prebuilt connectors to their needs. -Airbyte is remote friendly, with most of the team still based in the Silicon Valley. We’re fully open as a company. 
Our [**company handbook**](https://handbook.airbyte.io), [**culture & values**](https://handbook.airbyte.io/company/culture-and-values), [**strategy**](https://handbook.airbyte.io/strategy/strategy) and [**roadmap**](../roadmap.md) are open to all. +Airbyte is remote friendly, with most of the team still based in the Silicon Valley. We’re fully open as a company. Our **[company handbook](https://handbook.airbyte.io)**, **[culture & values](https://handbook.airbyte.io/company/culture-and-values)**, **[strategy](https://handbook.airbyte.io/strategy/strategy)** and **[roadmap](../roadmap.md)** are open to all. We're backed by some of the world's [top investors](./#our-investors) and believe in product-led growth, where we build something awesome and let our product bring the users, rather than an outbound sales engine with cold calls. @@ -44,9 +44,9 @@ Remote ## **We provide** * **Flexible work environment as fully remote** - we don’t look at when you log in, log out or how much time you work. We trust you, it’s the only way remote can actually work. -* [**Unlimited vacation policy**](https://handbook.airbyte.io/people/time-off) with mandatory minimum time off - so you can fit work around your life. -* [**Co-working space stipend**](https://handbook.airbyte.io/people/expense-policy#work-space) - we provide everyone with $200/month to use on a coworking space of their choice, if any. -* [**Parental leave**](https://handbook.airbyte.io/people/time-off#parental-leave) \(for both parents, after one year spent with the company\) - so those raising families can do so while still working for us. +* **[Unlimited vacation policy](https://handbook.airbyte.io/people/time-off)** with mandatory minimum time off - so you can fit work around your life. +* **[Co-working space stipend](https://handbook.airbyte.io/people/expense-policy#work-space)** - we provide everyone with $200/month to use on a coworking space of their choice, if any. +* **[Parental leave](https://handbook.airbyte.io/people/time-off#parental-leave)** \(for both parents, after one year spent with the company\) - so those raising families can do so while still working for us. * **Open book policy** - we reimburse books that employees want to purchase for their professional and career development. * **Continuous learning / training policy** - we sponsor the conferences and training programs you feel would add to your development in the company. * **Health insurance** for those from countries that do not provide this freely. Through Savvy in the US, which means you can choose the insurance you want and will receive a stipend from the company. diff --git a/docs/career-and-open-positions/senior-software-engineer.md b/docs/career-and-open-positions/senior-software-engineer.md index 99c1bb1ecb6..be243fc6cc6 100644 --- a/docs/career-and-open-positions/senior-software-engineer.md +++ b/docs/career-and-open-positions/senior-software-engineer.md @@ -39,9 +39,9 @@ Wherever you want! ## **Perks!!!** * **Flexible work environment as fully remote** - we don’t look at when you log in, log out or how much time you work. We trust you, it’s the only way remote can actually work. -* [**Unlimited vacation policy**](https://handbook.airbyte.io/people/time-off) with mandatory minimum time off - so you can fit work around your life. -* [**Co-working space stipend**](https://handbook.airbyte.io/people/expense-policy#work-space) - we provide everyone with $200/month to use on a coworking space of their choice, if any. 
-* [**Parental leave**](https://handbook.airbyte.io/people/time-off#parental-leave) \(for both parents, after one year spent with the company\) - so those raising families can do so while still working for us. +* **[Unlimited vacation policy](https://handbook.airbyte.io/people/time-off)** with mandatory minimum time off - so you can fit work around your life. +* **[Co-working space stipend](https://handbook.airbyte.io/people/expense-policy#work-space)** - we provide everyone with $200/month to use on a coworking space of their choice, if any. +* **[Parental leave](https://handbook.airbyte.io/people/time-off#parental-leave)** \(for both parents, after one year spent with the company\) - so those raising families can do so while still working for us. * **Open book policy** - we reimburse books that employees want to purchase for their professional and career development. * **Continuous learning / training policy** - we sponsor the conferences and training programs you feel would add to your development in the company. * **Health insurance** for those from countries that do not provide this freely. Through Savvy in the US, which means you can choose the insurance you want and will receive a stipend from the company. diff --git a/docs/changelog/connectors.md b/docs/changelog/connectors.md index cfb3be8d5da..a08c07e982f 100644 --- a/docs/changelog/connectors.md +++ b/docs/changelog/connectors.md @@ -14,11 +14,11 @@ Check out our [connector roadmap](https://github.com/airbytehq/airbyte/projects/ ## 04/13/2021 -* New connector: [**Oracle DB**](https://docs.airbyte.io/integrations/sources/oracle) +* New connector: **[Oracle DB](https://docs.airbyte.io/integrations/sources/oracle)** ## 04/07/2021 -* New connector: [**Google Workspace Admin Reports**](https://docs.airbyte.io/integrations/sources/google-workspace-admin-reports) \(audit logs\) +* New connector: **[Google Workspace Admin Reports](https://docs.airbyte.io/integrations/sources/google-workspace-admin-reports)** (audit logs) * Bugfix in the base python connector library that caused errors to be silently skipped rather than failing the sync * **Exchangeratesapi.io** bugfix: to point to the updated API URL * **Redshift destination** bugfix: quote keywords “DATETIME” and “TIME” when used as identifiers diff --git a/docs/changelog/platform.md b/docs/changelog/platform.md index ac301e90329..53e64090d2c 100644 --- a/docs/changelog/platform.md +++ b/docs/changelog/platform.md @@ -12,7 +12,7 @@ If you're interested in our progress on the Airbyte platform, please read below! ## [04-12-2021 - 0.20.0](https://github.com/airbytehq/airbyte/releases/tag/v0.20.0-alpha) -* **Change Data Capture \(CDC\)** is now supported for Postgres, thanks to [@jrhizor](https://github.com/jrhizor) and [@cgardens](https://github.com/cgardens). We will now expand it to MySQL and MSSQL in the coming weeks. +* **Change Data Capture (CDC)** is now supported for Postgres, thanks to [@jrhizor](https://github.com/jrhizor) and [@cgardens](https://github.com/cgardens). We will now expand it to MySQL and MSSQL in the coming weeks. 
* When displaying the schema for a source, you can now search for table names, thanks to [@jamakase](https://github.com/jamakase) * Better feedback UX when manually triggering a sync with “Sync now” diff --git a/docs/contributing-to-airbyte/building-new-connector/README.md b/docs/contributing-to-airbyte/building-new-connector/README.md index 614262a35f0..b313cf2808a 100644 --- a/docs/contributing-to-airbyte/building-new-connector/README.md +++ b/docs/contributing-to-airbyte/building-new-connector/README.md @@ -42,7 +42,7 @@ npm run generate and choose the relevant template. This will generate a new connector in the `airbyte-integrations/connectors/` directory. -Search the generated directory for "TODO"s and follow them to implement your connector. +Search the generated directory for "TODO"s and follow them to implement your connector. If you are developing a Python connector, you may find the [building a Python connector tutorial](../../tutorials/building-a-python-source.md) helpful. @@ -54,14 +54,14 @@ At a minimum, your connector must implement the standard tests described in [Tes If you're writing in Python or Java, skip this section -- it is provided automatically. -If you're writing in another language, please document the commands needed to: +If you're writing in another language, please document the commands needed to: 1. Build your connector docker image \(usually this is just `docker build .` but let us know if there are necessary flags, gotchas, etc..\) 2. Run any unit or integration tests _in a Docker image_. Your integration and unit tests must be runnable entirely within a Docker image. This is important to guarantee consistent build environments. -When you submit a PR to Airbyte with your connector, the reviewer will use the commands you provide to integrate your connector into Airbyte's build system as follows: +When you submit a PR to Airbyte with your connector, the reviewer will use the commands you provide to integrate your connector into Airbyte's build system as follows: 1. `:airbyte-integrations:connectors:source-<name>:build` should run unit tests and build the integration's Docker image 2. `:airbyte-integrations:connectors:source-<name>:integrationTest` should run integration tests including Airbyte's Standard test suite. diff --git a/docs/contributing-to-airbyte/building-new-connector/monorepo-python-development.md b/docs/contributing-to-airbyte/building-new-connector/monorepo-python-development.md index cc558bc76db..4615373cfab 100644 --- a/docs/contributing-to-airbyte/building-new-connector/monorepo-python-development.md +++ b/docs/contributing-to-airbyte/building-new-connector/monorepo-python-development.md @@ -4,7 +4,7 @@ This guide contains instructions on how to setup Python with Gradle within the A ## Python Connector Development -Before working with connectors written in Python, we recommend running `./gradlew :airbyte-integrations:connectors:<name>:build` \(e.g. `./gradlew :airbyte-integrations:connectors:source-postgres:build`\) from the root project directory. This will create a `virtualenv` and install dependencies for the connector you want to work on as well as any internal Airbyte python packages it depends on. +Before working with connectors written in Python, we recommend running `./gradlew :airbyte-integrations:connectors:<name>:build` (e.g. `./gradlew :airbyte-integrations:connectors:source-postgres:build`) from the root project directory.
This will create a `virtualenv` and install dependencies for the connector you want to work on as well as any internal Airbyte python packages it depends on. When iterating on a single connector, you will often iterate by running diff --git a/docs/deploying-airbyte/on-aws-ecs.md b/docs/deploying-airbyte/on-aws-ecs.md index 63ccca5c0ee..3aed11571a0 100644 --- a/docs/deploying-airbyte/on-aws-ecs.md +++ b/docs/deploying-airbyte/on-aws-ecs.md @@ -1,8 +1,7 @@ -# On AWS ECS \(Coming Soon\) +# On AWS \(ECS\) -{% hint style="info" %} +{% hint style="warning" %} We do not currently support deployment on ECS. {% endhint %} The current iteration is not compatible with ECS. Airbyte currently relies on docker containers being able to create other docker containers. ECS does not permit containers to do this. We will be revising this strategy soon, so that we can be compatible with ECS and other container services. - diff --git a/docs/faq/differences-with/fivetran-vs-airbyte.md b/docs/faq/differences-with/fivetran-vs-airbyte.md index a5484259345..ed7a358c0e0 100644 --- a/docs/faq/differences-with/fivetran-vs-airbyte.md +++ b/docs/faq/differences-with/fivetran-vs-airbyte.md @@ -13,7 +13,7 @@ We wrote an article, “[Open-source vs. Commercial Software: How to Solve the D ## **Airbyte:** -* **Free, as open source, so no more pricing based on usage**: learn more about our [future business model]() \(connectors will always remain open source\). +* **Free, as open source, so no more pricing based on usage**: learn more about our [future business model](../../company-handbook/business-model.md) \(connectors will always remain open source\). * **Supporting 60 connectors within 8 months from inception**. Our goal is to reach 200+ connectors by the end of 2021. * **Building new connectors made trivial, in the language of your choice:** Airbyte makes it a lot easier to create your own connector, vs. building them yourself in-house \(with Airflow or other tools\). Scheduling, orchestration, and monitoring come out of the box with Airbyte. * **Addressing the long tail of connectors:** with the help of the community, Airbyte aims to support thousands of connectors. diff --git a/docs/faq/differences-with/meltano-vs-airbyte.md b/docs/faq/differences-with/meltano-vs-airbyte.md index 9de15956220..2022ad8c619 100644 --- a/docs/faq/differences-with/meltano-vs-airbyte.md +++ b/docs/faq/differences-with/meltano-vs-airbyte.md @@ -1,6 +1,6 @@ # Meltano vs Airbyte -We wrote an article, “[The State of Open-Source Data Integration and ETL](https://airbyte.io/articles/data-engineering-thoughts/the-state-of-open-source-data-integration-and-etl/),” in which we list and compare all ETL-related open-source projects, including Meltano and Airbyte. Don’t hesitate to check it out for more detailed arguments. As a summary, here are the differences: +We wrote an article, “[The State of Open-Source Data Integration and ETL](https://airbyte.io/articles/data-engineering-thoughts/the-state-of-open-source-data-integration-and-etl/),” in which we list and compare all ETL-related open-source projects, including Meltano and Airbyte. Don’t hesitate to check it out for more detailed arguments. As a summary, here are the differences: ![](https://airbyte.io/wp-content/uploads/2020/10/Landscape-of-open-source-data-integration-platforms-4.png) @@ -16,7 +16,7 @@ Meltano is a Gitlab side project.
Since 2019, they have been iterating on severa ## **Airbyte:** -In contrast, Airbyte is a company fully committed to the open-source MIT project and has a [business model](https://github.com/airbytehq/airbyte/tree/428e10e727c05e5aed4235610ab86f0e5b304864/docs/company-handbook/business-model.md)in mind around this project. Our [team](https://github.com/airbytehq/airbyte/tree/428e10e727c05e5aed4235610ab86f0e5b304864/docs/company-handbook/team.md) are data integration experts that have built more than 1,000 integrations collectively at large scale. The team now counts 20 engineers working full-time on Airbyte. +In contrast, Airbyte is a company fully committed to the open-source MIT project and has a [business model](../../company-handbook/business-model.md) in mind around this project. Our [team](../../company-handbook/team.md) are data integration experts that have built more than 1,000 integrations collectively at large scale. The team now counts 20 engineers working full-time on Airbyte. * **Airbyte supports more than 60 connectors after only 8 months since its inception**, 20% of which were built by the community. Our ambition is to support **200+ connectors by the end of 2021.** * Airbyte’s connectors are **usable out of the box through a UI and API,** with monitoring, scheduling and orchestration. Airbyte was built on the premise that a user, whatever their background, should be able to move data in 2 minutes. Data engineers might want to use raw data and their own transformation processes, or to use Airbyte’s API to include data integration in their workflows. On the other hand, analysts and data scientists might want to use normalized consolidated data in their database or data warehouses. Airbyte supports all these use cases. diff --git a/docs/faq/differences-with/stitchdata-vs-airbyte.md b/docs/faq/differences-with/stitchdata-vs-airbyte.md index 16f5fbc88a9..3f53504cef3 100644 --- a/docs/faq/differences-with/stitchdata-vs-airbyte.md +++ b/docs/faq/differences-with/stitchdata-vs-airbyte.md @@ -14,7 +14,7 @@ We wrote an article, “[Open-source vs. Commercial Software: How to Solve the D ## Airbyte: -* **Free, as open source, so no more pricing based on usage:** learn more about our [future business model]() \(connectors will always remain open-source\). +* **Free, as open source, so no more pricing based on usage:** learn more about our [future business model](../../company-handbook/business-model.md) \(connectors will always remain open-source\). * **Supporting 50+ connectors by the end of 2020** \(so in only 5 months of existence\). Our goal is to reach 300+ connectors by the end of 2021. * **Building new connectors made trivial, in the language of your choice:** Airbyte makes it a lot easier to create your own connector, vs. building them yourself in-house \(with Airflow or other tools\). Scheduling, orchestration, and monitoring come out of the box with Airbyte. * **Maintenance-free connectors you can use in minutes.** Just authenticate your sources and warehouse, and get connectors that adapt to schema and API changes for you. diff --git a/docs/faq/technical-support.md b/docs/faq/technical-support.md index f016d7e6e3c..b5025bbb86f 100644 --- a/docs/faq/technical-support.md +++ b/docs/faq/technical-support.md @@ -71,26 +71,23 @@ Depending on your Docker network configuration, you may not be able to connect t If you are running into connection refused errors when running Airbyte via Docker Compose on Mac, try using `host.docker.internal` as the host.
On Linux, you may have to modify `docker-compose.yml` and add a host that maps to your local machine using [`extra_hosts`](https://docs.docker.com/compose/compose-file/compose-file-v3/#extra_hosts). -## **Do you support change data capture \(CDC\) or logical replication for databases?** +## **Do you support change data capture (CDC) or logical replication for databases?** -We currently support [CDC for Postgres 10+](../integrations/sources/postgres.md). We are adding support for a few other databases April/May 2021. +We currently support [CDC for Postgres 10+](../integrations/sources/postgres.md). We are adding support for a few other databases in April/May 2021. ## **Can I disable analytics in Airbyte?** Yes, you can control what's sent outside of Airbyte for analytics purposes. We instrumented some parts of Airbyte for the following reasons: - -* measure usage of Airbyte -* measure usage of features & connectors -* collect connector telemetry to measure stability -* reach out to our users if they opt-in -* ... +- measure usage of Airbyte +- measure usage of features & connectors +- collect connector telemetry to measure stability +- reach out to our users if they opt-in +- ... To disable telemetry, modify the `.env` file and define the following two environment variables: - -```text +``` TRACKING_STRATEGY=logging PAPERCUPS_STORYTIME=disabled ``` - diff --git a/docs/integrations/destinations/redshift.md b/docs/integrations/destinations/redshift.md index 91393b68ae7..041bf644628 100644 --- a/docs/integrations/destinations/redshift.md +++ b/docs/integrations/destinations/redshift.md @@ -4,7 +4,9 @@ The Airbyte Redshift destination allows you to sync data to Redshift. -This Redshift destination connector has two replication strategies: 1\) INSERT: Replicates data via SQL INSERT queries. This is built on top of the destination-jdbc code base and is configured to rely on JDBC 4.2 standard drivers provided by Amazon via Mulesoft [here](https://mvnrepository.com/artifact/com.amazon.redshift/redshift-jdbc42) as described in Redshift documentation [here](https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-install.html). Not recommended for production workloads as this does not scale well. 2\) COPY: Replicates data by first uploading data to an S3 bucket and issuing a COPY command. This is the recommended loading approach described by Redshift [best practices](https://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html). Requires an S3 bucket and credentials. +This Redshift destination connector has two replication strategies: +1) INSERT: Replicates data via SQL INSERT queries. This is built on top of the destination-jdbc code base and is configured to rely on JDBC 4.2 standard drivers provided by Amazon via Mulesoft [here](https://mvnrepository.com/artifact/com.amazon.redshift/redshift-jdbc42) as described in Redshift documentation [here](https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-install.html). Not recommended for production workloads as this does not scale well. +2) COPY: Replicates data by first uploading data to an S3 bucket and issuing a COPY command. This is the recommended loading approach described by Redshift [best practices](https://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html). Requires an S3 bucket and credentials. Airbyte automatically picks an approach depending on the given configuration - if S3 configuration is present, Airbyte will use the COPY strategy and vice versa.
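As a rough sketch of what the COPY strategy amounts to, the following stages a file in S3 and loads it with a single bulk COPY. Every bucket, table, host, and credential value here is a placeholder; the Airbyte connector performs the equivalent steps for you.

```python
import boto3
import psycopg2

# Placeholder names and credentials; substitute your own.
s3 = boto3.client("s3")
s3.upload_file("users.csv", "my-staging-bucket", "staging/users.csv")

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="airbyte", password="password",
)
with conn, conn.cursor() as cur:
    # One COPY ingests the staged file in bulk, which scales far better
    # than issuing row-by-row INSERT statements.
    cur.execute("""
        COPY public.users
        FROM 's3://my-staging-bucket/staging/users.csv'
        ACCESS_KEY_ID 'AKIA...'
        SECRET_ACCESS_KEY '...'
        CSV;
    """)
```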
@@ -37,7 +39,7 @@ You will need to choose an existing database or create a new database that will 1. Active Redshift cluster 2. Allow connections from Airbyte to your Redshift cluster \(if they exist in separate VPCs\) -3. A staging S3 bucket with credentials \(for the COPY strategy\). +3. A staging S3 bucket with credentials (for the COPY strategy). ### Setup guide @@ -59,7 +61,7 @@ You should have all the requirements needed to configure Redshift as a destinati * **Database** * This database needs to exist within the cluster provided. -#### 2a. Fill up S3 info \(for COPY strategy\) +#### 2a. Fill in S3 info (for COPY strategy) Provide the required S3 info. diff --git a/docs/integrations/sources/google-workspace-admin-reports.md b/docs/integrations/sources/google-workspace-admin-reports.md index bbd6aa8be62..2f01068b875 100644 --- a/docs/integrations/sources/google-workspace-admin-reports.md +++ b/docs/integrations/sources/google-workspace-admin-reports.md @@ -1,4 +1,4 @@ -# Google Workspace Admin Reports +# Google Workspace Admin Reports API ## Overview @@ -12,7 +12,7 @@ This Source is capable of syncing the following Streams: * [drive](https://developers.google.com/admin-sdk/reports/v1/guides/manage-audit-drive) * [logins](https://developers.google.com/admin-sdk/reports/v1/guides/manage-audit-login) * [mobile](https://developers.google.com/admin-sdk/reports/v1/guides/manage-audit-mobile) -* [oauth\_tokens](https://developers.google.com/admin-sdk/reports/v1/guides/manage-audit-tokens) +* [oauth_tokens](https://developers.google.com/admin-sdk/reports/v1/guides/manage-audit-tokens) ### Data type mapping @@ -38,18 +38,16 @@ This connector attempts to back off gracefully when it hits Reports API's rate l ## Getting started ### Requirements - * Credentials to a Google Service Account with delegated Domain Wide Authority * Email address of the workspace admin which created the Service Account ### Create a Service Account with delegated domain wide authority - -Follow the Google Documentation for performing [Domain Wide Delegation of Authority](https://developers.google.com/admin-sdk/reports/v1/guides/delegation) to create a Service account with delegated domain wide authority. This account must be created by an administrator of the Google Workspace. Please make sure to grant the following OAuth scopes to the service user: +Follow the Google Documentation for performing [Domain Wide Delegation of Authority](https://developers.google.com/admin-sdk/reports/v1/guides/delegation) to create a Service account with delegated domain wide authority. This account must be created by an administrator of the Google Workspace. +Please make sure to grant the following OAuth scopes to the service user: 1. `https://www.googleapis.com/auth/admin.reports.audit.readonly` 2. `https://www.googleapis.com/auth/admin.reports.usage.readonly` -At the end of this process, you should have JSON credentials to this Google Service Account. - -You should now be ready to use the Google Workspace Admin Reports API connector in Airbyte. +At the end of this process, you should have JSON credentials to this Google Service Account. +You should now be ready to use the Google Workspace Admin Reports API connector in Airbyte.
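If you want to sanity-check the delegated credentials before configuring the connector, a short script along these lines should return a recent login event. The key file path and admin address are placeholders, and this check is optional rather than part of the official setup.

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = [
    "https://www.googleapis.com/auth/admin.reports.audit.readonly",
    "https://www.googleapis.com/auth/admin.reports.usage.readonly",
]

# Placeholder path and admin address; substitute your own.
creds = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES
).with_subject("admin@example.com")  # impersonate the workspace admin

reports = build("admin", "reports_v1", credentials=creds)
response = reports.activities().list(
    userKey="all", applicationName="login", maxResults=1
).execute()
print(response.get("items", []))
```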
diff --git a/docs/integrations/sources/oracle.md b/docs/integrations/sources/oracle.md index 49e0856a352..7918eca2f4e 100644 --- a/docs/integrations/sources/oracle.md +++ b/docs/integrations/sources/oracle.md @@ -1,12 +1,17 @@ -# Oracle DB +# Oracle ## Overview -The Oracle Database source supports both Full Refresh and Incremental syncs. You can choose if this connector will copy only the new or updated data, or all rows in the tables and columns you set up for replication, every time a sync is run. +The Oracle Database source supports both Full Refresh and Incremental syncs. You can choose if this +connector will copy only the new or updated data, or all rows in the tables and columns you set up +for replication, every time a sync is run. + ### Resulting schema -The Oracle source does not alter the schema present in your database. Depending on the destination connected to this source, however, the schema may be altered. See the destination's documentation for more details. +The Oracle source does not alter the schema present in your database. Depending on the destination +connected to this source, however, the schema may be altered. See the destination's documentation +for more details. ### Data type mapping Oracle data types are mapped to the following data types when synchronizing data | Oracle Type | Resulting Type | Notes | | :--- | :--- | :--- | -| `number` | number | | +| `number` | number | | | `integer` | number | | | `decimal` | number | | | `float` | number | | | everything else | string | | -If you do not see a type in this list, assume that it is coerced into a string. We are happy to take feedback on preferred mappings. +If you do not see a type in this list, assume that it is coerced into a string. We are happy to +take feedback on preferred mappings. ### Features @@ -67,11 +73,9 @@ GRANT SELECT ANY TABLE TO airbyte; ``` Or you can be more granular: - ```sql GRANT SELECT ON "<schema>"."<table_1>" TO airbyte; GRANT SELECT ON "<schema>"."<table_2>" TO airbyte; ``` Your database user should now be ready for use with Airbyte. - diff --git a/docs/integrations/sources/postgres.md b/docs/integrations/sources/postgres.md index 28a2d13f954..b7f057f331a 100644 --- a/docs/integrations/sources/postgres.md +++ b/docs/integrations/sources/postgres.md @@ -99,36 +99,33 @@ ALTER DEFAULT PRIVILEGES IN SCHEMA GRANT SELECT ON TABLES TO airby #### 3. Set up CDC \(Optional\) -Please read [the section on CDC below](postgres.md#setting-up-cdc-for-postgres) for more information. +Please read [the section on CDC below](#setting-up-cdc-for-postgres) for more information. #### 4. That's it! Your database user should now be ready for use with Airbyte. -## Change Data Capture \(CDC\) / Logical Replication / WAL Replication - -We use [logical replication](https://www.postgresql.org/docs/10/logical-replication.html) of the Postgres write-ahead log \(WAL\) to incrementally capture deletes using the `pgoutput` plugin. +## Change Data Capture (CDC) / Logical Replication / WAL Replication +We use [logical replication](https://www.postgresql.org/docs/10/logical-replication.html) of the Postgres write-ahead log (WAL) to incrementally capture deletes using the `pgoutput` plugin. We do not require installing custom plugins like `wal2json` or `test_decoding`. We use `pgoutput`, which is included in Postgres 10+ by default. Please read the [CDC docs](../../architecture/cdc.md) for an overview of how Airbyte approaches CDC. ### Should I use CDC for Postgres?
* If you need a record of deletions and can accept the limitations posted below, you should use CDC for Postgres. * If your data set is small and you just want a snapshot of your table in the destination, consider using Full Refresh replication for your table instead of CDC. * If the limitations prevent you from using CDC and your goal is to maintain a snapshot of your table in the destination, consider using non-CDC incremental and occasionally reset the data and re-sync. -* If your table has a primary key but doesn't have a reasonable cursor field for incremental syncing \(i.e. `updated_at`\), CDC allows you to sync your table incrementally. +* If your table has a primary key but doesn't have a reasonable cursor field for incremental syncing (i.e. `updated_at`), CDC allows you to sync your table incrementally. ### CDC Limitations - * Make sure to read our [CDC docs](../../architecture/cdc.md) to see limitations that impact all databases using CDC replication. * CDC is only available for Postgres 10+. * Airbyte requires a replication slot configured only for its use. Only one source should be configured that uses this replication slot. Instructions on how to set up a replication slot can be found below. * Log-based replication only works for master instances of Postgres. * Using logical replication increases disk space used on the database server. The additional data is stored until it is consumed. - * We recommend setting frequent syncs for CDC in order to ensure that this data doesn't fill up your disk space. - * If you stop syncing a CDC-configured Postgres instance to Airbyte, you should delete the replication slot. Otherwise, it may fill up your disk space. + * We recommend setting frequent syncs for CDC in order to ensure that this data doesn't fill up your disk space. + * If you stop syncing a CDC-configured Postgres instance to Airbyte, you should delete the replication slot. Otherwise, it may fill up your disk space. * Our CDC implementation uses at least once delivery for all change records. ### Setting up CDC for Postgres @@ -136,20 +133,17 @@ Please read the [CDC docs](../../architecture/cdc.md) for an overview of how Air #### Enable logical replication Follow one of these guides to enable logical replication: +* [Bare Metal, VMs (EC2/GCE/etc), Docker, etc.](#setting-up-cdc-on-bare-metal-vms-ec2gceetc-docker-etc) +* [AWS Postgres RDS or Aurora](#setting-up-cdc-on-aws-postgres-rds-or-aurora) +* [Azure Database for Postgres](#setting-up-cdc-on-azure-database-for-postgres) -* [Bare Metal, VMs \(EC2/GCE/etc\), Docker, etc.](postgres.md#setting-up-cdc-on-bare-metal-vms-ec2gceetc-docker-etc) -* [AWS Postgres RDS or Aurora](postgres.md#setting-up-cdc-on-aws-postgres-rds-or-aurora) -* [Azure Database for Postgres](postgres.md#setting-up-cdc-on-azure-database-for-postgres) +#### Add user-level permissions -#### Add user-level permissions - -We recommend using a user specifically for Airbyte's replication so you can minimize access. This Airbyte user for your instance needs to be granted `REPLICATION` and `LOGIN` permissions. You can create a role with `CREATE ROLE <role name> REPLICATION LOGIN;` and grant that role to the user. You still need to make sure the user can connect to the database, use the schema, and to use `SELECT` on tables \(the same are required for non-CDC incremental syncs and all full refreshes\). +We recommend using a user specifically for Airbyte's replication so you can minimize access.
This Airbyte user for your instance needs to be granted `REPLICATION` and `LOGIN` permissions. You can create a role with `CREATE ROLE <role name> REPLICATION LOGIN;` and grant that role to the user. You still need to make sure the user can connect to the database, use the schema, and use `SELECT` on tables (the same are required for non-CDC incremental syncs and all full refreshes). #### Create replication slot - -Next, you will need to create a replication slot. Here is the query used to create a replication slot called `airbyte_slot`: - -```text +Next, you will need to create a replication slot. Here is the query used to create a replication slot called `airbyte_slot`: +``` SELECT pg_create_logical_replication_slot('airbyte_slot', 'pgoutput'); ``` @@ -162,12 +156,10 @@ For each table you want to replicate with CDC, you will need to run `CREATE PUBL The UI currently allows selecting any tables for CDC. If a table is selected that is not part of the publication, it will not replicate even though it is selected. If a table is part of the publication but does not have a replication identity, that replication identity will be created automatically on the first run if the Airbyte user has the necessary permissions. #### Start syncing - When configuring the source, select CDC and provide the replication slot and publication you just created. You should be ready to sync data with CDC! -### Setting up CDC on Bare Metal, VMs \(EC2/GCE/etc\), Docker, etc. - -Some settings must be configured in the `postgresql.conf` file for your database. You can find the location of this file using `psql -U postgres -c 'SHOW config_file'` withe the correct `psql` credentials specified. Alternatively, a custom file can be specified when running postgres with the `-c` flag. For example `postgres -c config_file=/etc/postgresql/postgresql.conf` runs Postgres with the config file at `/etc/postgresql/postgresql.conf`. +### Setting up CDC on Bare Metal, VMs (EC2/GCE/etc), Docker, etc. +Some settings must be configured in the `postgresql.conf` file for your database. You can find the location of this file using `psql -U postgres -c 'SHOW config_file'` with the correct `psql` credentials specified. Alternatively, a custom file can be specified when running postgres with the `-c` flag. For example `postgres -c config_file=/etc/postgresql/postgresql.conf` runs Postgres with the config file at `/etc/postgresql/postgresql.conf`. If you are syncing data from a server using the `postgres` Docker image, you will need to mount a file and change the command to run Postgres with the set config file. If you're just testing CDC behavior, you may want to use a modified version of a [sample `postgresql.conf`](https://github.com/postgres/postgres/blob/master/src/backend/utils/misc/postgresql.conf.sample). @@ -176,8 +168,7 @@ If you are syncing data from a server using the `postgres` Docker image, you wil * `max_replication_slots` is the maximum number of replication slots that are allowed to stream WAL changes. This must be one if Airbyte will be the only service subscribing to WAL changes, or more if other services are also reading from the WAL. Here is what these settings would look like in `postgresql.conf`: - -```text +``` wal_level = logical max_wal_senders = 1 max_replication_slots = 1 @@ -185,32 +176,27 @@ max_replication_slots = 1 After setting these values you will need to restart your instance. -Finally, [follow the rest of steps above](postgres.md#setting-up-cdc-for-postgres).

#### Start syncing
-
When configuring the source, select CDC and provide the replication slot and publication you just created. You should be ready to sync data with CDC!

-### Setting up CDC on Bare Metal, VMs \(EC2/GCE/etc\), Docker, etc.
-
-Some settings must be configured in the `postgresql.conf` file for your database. You can find the location of this file using `psql -U postgres -c 'SHOW config_file'` withe the correct `psql` credentials specified. Alternatively, a custom file can be specified when running postgres with the `-c` flag. For example `postgres -c config_file=/etc/postgresql/postgresql.conf` runs Postgres with the config file at `/etc/postgresql/postgresql.conf`.
+### Setting up CDC on Bare Metal, VMs (EC2/GCE/etc), Docker, etc.
+Some settings must be configured in the `postgresql.conf` file for your database. You can find the location of this file by running `psql -U postgres -c 'SHOW config_file'` with the correct `psql` credentials specified. Alternatively, a custom file can be specified when running Postgres with the `-c` flag. For example, `postgres -c config_file=/etc/postgresql/postgresql.conf` runs Postgres with the config file at `/etc/postgresql/postgresql.conf`.

If you are syncing data from a server using the `postgres` Docker image, you will need to mount a file and change the command to run Postgres with the set config file. If you're just testing CDC behavior, you may want to use a modified version of a [sample `postgresql.conf`](https://github.com/postgres/postgres/blob/master/src/backend/utils/misc/postgresql.conf.sample).
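+
+As a sketch, assuming a `postgresql.conf` in your current directory (the container name and password are placeholders):
+
+```
+docker run -d --name postgres-cdc \
+  -e POSTGRES_PASSWORD=password \
+  -v "$(pwd)/postgresql.conf":/etc/postgresql/postgresql.conf \
+  postgres -c config_file=/etc/postgresql/postgresql.conf
+```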

@@ -176,8 +168,7 @@ If you are syncing data from a server using the `postgres` Docker image, you wil

 * `max_replication_slots` is the maximum number of replication slots that are allowed to stream WAL changes. This must be one if Airbyte will be the only service subscribing to WAL changes, or more if other services are also reading from the WAL.

Here is what these settings would look like in `postgresql.conf`:
-
-```text
+```
wal_level = logical
max_wal_senders = 1
max_replication_slots = 1
@@ -185,32 +176,27 @@ max_replication_slots = 1

After setting these values you will need to restart your instance.
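+
+After the restart, you can sanity-check that the settings took effect, for example:
+
+```
+SHOW wal_level;             -- should return 'logical'
+SHOW max_replication_slots; -- should be at least 1
+```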

-Finally, [follow the rest of steps above](postgres.md#setting-up-cdc-for-postgres).
+Finally, [follow the rest of the steps above](#setting-up-cdc-for-postgres).

### Setting up CDC on AWS Postgres RDS or Aurora
-
 * Go to the `Configuration` tab for your DB cluster.
 * Find your cluster parameter group. You will either edit the parameters for this group or create a copy of this parameter group to edit. If you create a copy, you will need to change your cluster's parameter group to the copy before restarting.
 * Within the parameter group page, search for `rds.logical_replication`. Select this row and click on the `Edit parameters` button. Set this value to `1`.
 * Wait for a maintenance window to automatically restart the instance, or restart it manually.
-* Finally, [follow the rest of steps above](postgres.md#setting-up-cdc-for-postgres).
+* Finally, [follow the rest of the steps above](#setting-up-cdc-for-postgres).

### Setting up CDC on Azure Database for Postgres
-
Use the Azure CLI to:
-
-```text
+```
az postgres server configuration set --resource-group group --server-name server --name azure.replication_support --value logical
az postgres server restart --resource-group group --name server
```

-Finally, [follow the rest of steps above](postgres.md#setting-up-cdc-for-postgres).
+Finally, [follow the rest of the steps above](#setting-up-cdc-for-postgres).

### Setting up CDC on Google CloudSQL

Unfortunately, logical replication is not configurable for Google CloudSQL. You can indicate your support for this feature on the [Google Issue Tracker](https://issuetracker.google.com/issues/120274585).

### Setting up CDC on other platforms
-
If you are on a platform not listed above, please consider [contributing to our docs](https://github.com/airbytehq/airbyte/tree/master/docs) and providing setup instructions.
-
diff --git a/docs/integrations/sources/zendesk-talk.md b/docs/integrations/sources/zendesk-talk.md
index e6a05df4ddc..c10c9776fe5 100644
--- a/docs/integrations/sources/zendesk-talk.md
+++ b/docs/integrations/sources/zendesk-talk.md
@@ -14,8 +14,8 @@ This Source is capable of syncing the following core Streams:
 * [Addresses](https://developer.zendesk.com/rest_api/docs/voice-api/phone_numbers#list-phone-numbers)
 * [Agents Activity](https://developer.zendesk.com/rest_api/docs/voice-api/stats#list-agents-activity)
 * [Agents Overview](https://developer.zendesk.com/rest_api/docs/voice-api/stats#show-agents-overview)
-* [Calls](https://developer.zendesk.com/rest_api/docs/voice-api/incremental_exports#incremental-calls-export) \(Incremental sync\)
-* [Call Legs](https://developer.zendesk.com/rest_api/docs/voice-api/incremental_exports#incremental-call-legs-export) \(Incremental sync\)
+* [Calls](https://developer.zendesk.com/rest_api/docs/voice-api/incremental_exports#incremental-calls-export) (Incremental sync)
+* [Call Legs](https://developer.zendesk.com/rest_api/docs/voice-api/incremental_exports#incremental-call-legs-export) (Incremental sync)
 * [Current Queue Activity](https://developer.zendesk.com/rest_api/docs/voice-api/stats#show-current-queue-activity)
 * [Greeting Categories](https://developer.zendesk.com/rest_api/docs/voice-api/greetings#list-greeting-categories)
 * [Greetings](https://developer.zendesk.com/rest_api/docs/voice-api/greetings#list-greetings)
@@ -59,4 +59,3 @@ The Zendesk connector should not run into Zendesk API limitations under normal u

Generate an API access token as described in the [Zendesk docs](https://support.zendesk.com/hc/en-us/articles/226022787-Generating-a-new-API-token-).

We recommend creating a restricted, read-only key specifically for Airbyte access. This will allow you to control which resources Airbyte should be able to access.
-
diff --git a/docs/roadmap.md b/docs/roadmap.md
index 06767bed0b9..bb248184c83 100644
--- a/docs/roadmap.md
+++ b/docs/roadmap.md
@@ -13,15 +13,14 @@ Check out our [Roadmap for Core](https://github.com/airbytehq/airbyte/milestones
We understand that we're not "production-ready" for a lot of companies yet. In the end, we just got started in July 2020, so we're at the beginning of the journey. Here is a highlight of the main features we are planning on releasing in the next few months:

**April or so:**
-
 * Low-code framework to build new connectors
 * Support of most popular databases as both sources and destinations
-* CDC \(change data capture\) support across most popular databases \(MySQL / SQL Server / Oracle DB\)
+* CDC (change data capture) support across most popular databases (MySQL / SQL Server / Oracle DB)
 * Support of data lakes, starting with Delta Lake
 * Support for custom DBT transformations

-**Coming a bit later:** 
+**Coming a bit later:**
 * Our declarative interface \(CLI\)
 * Credential and secrets vaulting \([\#837](https://github.com/airbytehq/airbyte/issues/837)\)
 * OAuth support for connector configuration \([\#768](https://github.com/airbytehq/airbyte/issues/768)\)
diff --git a/docs/tutorials/building-a-python-source.md b/docs/tutorials/building-a-python-source.md
index a4a886f8e40..08344ca0f5a 100644
--- a/docs/tutorials/building-a-python-source.md
+++ b/docs/tutorials/building-a-python-source.md
@@ -9,7 +9,7 @@ This article provides a checklist for how to create a python source. Each step i
Docker, Python, and Java with the versions listed in the [tech stack section](../architecture/tech-stack.md).

{% hint style="info" %}
-All the commands below assume that `python` points to a version of python >3.7. On some systems, `python` points to a Python2 installation and `python3` points to Python3. If this is the case on your machine, substitute all `python` commands in this guide with `python3` . Otherwise, make sure to install Python 3 before beginning.
+All the commands below assume that `python` points to a version of Python >3.7. On some systems, `python` points to a Python2 installation and `python3` points to Python3. If this is the case on your machine, substitute all `python` commands in this guide with `python3`. Otherwise, make sure to install Python 3 before beginning.
{% endhint %}

## Checklist
diff --git a/docs/tutorials/zoom-activity-dashboard.md b/docs/tutorials/zoom-activity-dashboard.md
index 9d0571c87d1..f7c880acfdf 100644
--- a/docs/tutorials/zoom-activity-dashboard.md
+++ b/docs/tutorials/zoom-activity-dashboard.md
@@ -44,19 +44,19 @@ Choosing Zoom as **source type** will cause Airbyte to display the configuration

![](../.gitbook/assets/02_setting-zoom-connector-name.png)

-The Zoom connector for Airbyte requires you to provide it with a Zoom JWT token. Let’s take a detour and look at how to obtain one from Zoom. 
+The Zoom connector for Airbyte requires you to provide it with a Zoom JWT token. Let’s take a detour and look at how to obtain one from Zoom.

### Obtaining a Zoom JWT Token

-To obtain a Zoom JWT Token, login to your Zoom account and go to the [Zoom Marketplace](https://marketplace.zoom.us/). If this is your first time in the marketplace, you will need to agree to the Zoom’s marketplace terms of use.
+To obtain a Zoom JWT Token, log in to your Zoom account and go to the [Zoom Marketplace](https://marketplace.zoom.us/). If this is your first time in the marketplace, you will need to agree to Zoom’s marketplace terms of use.

-Once you are in, you need to click on the **Develop** dropdown and then click on **Build App.** 
+Once you are in, you need to click on the **Develop** dropdown and then click on **Build App**.

![](../.gitbook/assets/03_click.png)

Clicking on **Build App** for the first time will display a modal for you to accept Zoom’s API license and terms of use. Accept if you agree, and you will be presented with the screen below.

-![](../.gitbook/assets/zoom-marketplace-build-screen.png)
+![](../.gitbook/assets/04_zoom-marketplace-build-screen.png)

Select **JWT** as the app you want to build and click on the **Create** button on the card. You will be presented with a modal to enter the app name; type in `airbyte-zoom`.

@@ -78,15 +78,15 @@ After copying it, click on the **Continue** button.

![](../.gitbook/assets/08_activate-webhook.png)

-You will be taken to a screen to activate **Event Subscriptions**. Just leave it as is, as we won’t be needing Webhooks. Click on **Continue**, and your app should be marked as activated. 
+You will be taken to a screen to activate **Event Subscriptions**. Just leave it as is, as we won’t be needing webhooks. Click on **Continue**, and your app should be marked as activated.

### Connecting Zoom on Airbyte

So let’s go back to the Airbyte web UI and provide it with the JWT token we copied from our Zoom app.

-Now click on the **Set up source** button. You will see the below success message when the connection is made successfully. 
+Now click on the **Set up source** button. You will see the success message below when the connection is made successfully.

-![](../.gitbook/assets/setup-successful%20%281%29.png)
+![](../.gitbook/assets/09_setup-successful.png)

And you will be taken to the page to add your destination.

@@ -94,19 +94,19 @@ And you will be taken to the page to add your destination.

![](../.gitbook/assets/10_destination.png)

-For our destination, we will be using a PostgreSQL database, since Tableau supports PostgreSQL as a data source. Click on the **add destination** button, and then in the drop down click on **+ add a new destination**. In the page that presents itself, add the destination name and choose the Postgres destination. 
+For our destination, we will be using a PostgreSQL database, since Tableau supports PostgreSQL as a data source. Click on the **add destination** button, and then in the dropdown click on **+ add a new destination**. On the page that appears, add the destination name and choose the Postgres destination.

![](../.gitbook/assets/11_choose-postgres-destination.png)

-To supply Airbyte with the PostgreSQL configuration parameters needed to make a PostgreSQL destination, we will spin off a PostgreSQL container with Docker using the following command in our terminal. 
-
+To supply Airbyte with the PostgreSQL configuration parameters needed to make a PostgreSQL destination, we will spin up a PostgreSQL container with Docker using the following command in our terminal:
+
`docker run --rm --name airbyte-zoom-db -e POSTGRES_PASSWORD=password -v airbyte_zoom_data:/var/lib/postgresql/data -p 2000:5432 -d postgres`

-This will spin a docker container and persist the data we will be replicating in the PostgreSQL database in a Docker volume `airbyte_zoom_data`. 
+This will spin up a Docker container and persist the data we will be replicating in the PostgreSQL database in a Docker volume, `airbyte_zoom_data`.
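+
+Based on that command and the defaults of the `postgres` image, the connection parameters to supply are:
+
+```
+host:     localhost
+port:     2000
+database: postgres
+username: postgres
+password: password
+```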

Now, let’s supply the above credentials to the Airbyte UI requesting them.

-![](../.gitbook/assets/postgres_credentials.png)
+![](../.gitbook/assets/12_postgres_credentials.png)

Then click on the **Set up destination** button.

@@ -114,20 +114,20 @@ After the connection has been made to your PostgreSQL database successfully, Air

Leave all the fields checked.

-![](../.gitbook/assets/schema.png)
+![](../.gitbook/assets/13_schema.png)

Select a **Sync frequency** of **manual** and then click on **Set up connection**. After successfully making the connection, you will see your PostgreSQL destination. Click on the **Launch** button to start the data replication.

-![](../.gitbook/assets/launch%20%281%29.png)
+![](../.gitbook/assets/14_launch.png)

-Then click on the **airbyte-zoom-destination** to see the Sync page. 
+Then click on the **airbyte-zoom-destination** to see the Sync page.

-![](../.gitbook/assets/sync-screen%20%281%29.png)
-
-Syncing should take a few minutes or longer depending on the size of the data being replicated. Once Airbyte is done replicating the data, you will get a **succeeded** status. 
+![](../.gitbook/assets/15_sync-screen.png)
+
+Syncing should take a few minutes, or longer depending on the size of the data being replicated. Once Airbyte is done replicating the data, you will get a **succeeded** status.

Then, you can run the following SQL command on the PostgreSQL container to confirm that the sync was done successfully:

`docker exec airbyte-zoom-db psql -U postgres -c "SELECT * FROM public.users;"`
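+
+You can likewise spot-check any of the other synced tables, for example the `meetings` table that the charts below rely on (assuming it landed in the `public` schema like `users`):
+
+```
+docker exec airbyte-zoom-db psql -U postgres -c "SELECT COUNT(*) FROM public.meetings;"
+```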

@@ -144,15 +144,15 @@ Go ahead and install Tableau on your machine. After the installation is complete

Once your activation is successful, you will see your Tableau dashboard.

-![](../.gitbook/assets/tableau-dashboard.png)
+![](../.gitbook/assets/16_tableau-dashboard.png)

On the sidebar menu under the **To a Server** section, click on the **More…** menu. You will see a list of datasource connectors you can connect Tableau with.

-![](../.gitbook/assets/datasources%20%282%29.png)
+![](../.gitbook/assets/17_datasources.png)

Select **PostgreSQL** and you will be presented with a connection credentials modal.

-Fill in the same details of the PostgreSQL database we used as the destination in Airbyte. 
+Fill in the same details of the PostgreSQL database we used as the destination in Airbyte.

![](../.gitbook/assets/18_fill-in-connection-details.png)

@@ -160,6 +160,8 @@ Next, click on the **Sign In** button. If the connection was made successfully,

_Note: If you are having trouble connecting PostgreSQL with Tableau, it might be because the driver Tableau comes with for PostgreSQL might not work for newer versions of PostgreSQL. You can download the JDBC driver for PostgreSQL_ [_here_](https://www.tableau.com/support/drivers?_ga=2.62351404.1800241672.1616922684-1838321730.1615100968) _and follow the setup instructions._

+------
+
Now that we have replicated our Zoom data into a PostgreSQL database using Airbyte’s Zoom connector, and connected Tableau with our PostgreSQL database containing our Zoom data, let’s proceed to creating the charts we need to visualize the time spent by a team in Zoom calls.

## Step 3: Create the charts on Tableau with the Zoom data

@@ -170,23 +172,25 @@ To create this chart, we will need to use the count of the meetings and the **cr

![](../.gitbook/assets/19_tableau-view-with-all-tables.png)

-Drag the **meetings** table from the sidebar onto the space with the prompt. 
-
-Now that we have the meetings table, we can start building out the chart by clicking on **Sheet 1** at the bottom left of Tableau. 
+Drag the **meetings** table from the sidebar onto the space with the prompt.
+
+Now that we have the meetings table, we can start building out the chart by clicking on **Sheet 1** at the bottom left of Tableau.

![](../.gitbook/assets/20_empty-meeting-sheet.png)

-As stated earlier, we need **Created At**, but currently it’s a String data type. Let’s change that by converting it to a data time. So right click on **Created At**, then select `ChangeDataType` and choose Date & Time. And that’s it! That field is now of type **Date** & **Time**. 
+As stated earlier, we need **Created At**, but currently it’s a String data type. Let’s change that by converting it to a date & time. So right-click on **Created At**, then select **Change Data Type** and choose **Date & Time**. And that’s it! That field is now of type **Date & Time**.

![](../.gitbook/assets/21_change-to-date-time.png)

-Next, drag **Created At** to **Columns**. 
+Next, drag **Created At** to **Columns**.

![](../.gitbook/assets/22_drag-created-at.png)

Currently, we get the Created At in **YEAR**, but per our requirement we want it in weeks, so right-click on **YEAR\(Created At\)** and choose **Week Number**.

-![](../.gitbook/assets/change-to-per-week.png)
+![](../.gitbook/assets/23_change-to-per-week.png)

Tableau should now look like this:

@@ -194,7 +198,7 @@ Tableau should now look like this:

Now, to finish up, we need to add the **meetings\(Count\)** measure Tableau already calculated for us to the **Rows** section. So drag **meetings\(Count\)** onto the **Rows** section to complete the chart.

-![](../.gitbook/assets/evolution-of-meetings-per-week%20%281%29.png)
+![](../.gitbook/assets/25_evolution-of-meetings-per-week.png)

And now we are done with the very first chart. Let's save the sheet and create a new Dashboard that we will add this sheet to, as well as the others we will be creating.

@@ -220,19 +224,19 @@ Note: We are adding a filter on the Duration to filter out null values. You can

### Evolution of the number of participants for all meetings per week

-For this chart, we will need to have a calculated field called **\# of meetings attended**, which will be an aggregate of the counts of rows matching a particular user's email in the `report_meeting_participants` table plotted against the **Created At** field of the **meetings** table. To get this done, right click on the **User Email** field. Select **create** and click on **calculatedField**, then enter the title of the field as **\# of meetings attended**. Next, enter the below formula: 
+For this chart, we will need a calculated field called **\# of meetings attended**, which will be an aggregate of the counts of rows matching a particular user's email in the `report_meeting_participants` table, plotted against the **Created At** field of the **meetings** table. To get this done, right-click on the **User Email** field. Select **Create** and click on **Calculated Field**, then enter the title of the field as **\# of meetings attended**. Next, enter the formula below:

`COUNT(IF [User Email] == [User Email] THEN [Id (Report Meeting Participants)] END)`

Then click on **Apply**. Finally, drag the **Created At** fields \(make sure it’s on the **Weekly** number\) and the calculated field you just created to match the screenshot below:

-![](../.gitbook/assets/number_of_participants_per_weekly_meetings.png)
+![](../.gitbook/assets/28_number_of_participants_per_weekly_meetings.png)

### Listing of team members with the number of meetings per week and number of hours spent in meetings, ranked.

To get this chart, we need to create a relationship between the **meetings** table and the `report_meeting_participants` table. You can do this by dragging the `report_meeting_participants` table in as a source alongside the **meetings** table and relating both via the **meeting id**. Then you will be able to create a new worksheet that looks like this:

-![](../.gitbook/assets/meetings-participant-ranked.png)
+![](../.gitbook/assets/29_meetings-participant-ranked.png)

Note: To achieve the ranking, we simply use the sort menu icon on the top menu bar.

@@ -246,7 +250,7 @@ The rest of the charts will be needing the **webinars** and `report_webinar_part

For this chart, as for its meetings counterpart, we will derive a calculated field from the Duration field to get the **Webinar Duration in Hours**, and then plot **Created At** against the **Sum of Webinar Duration in Hours**, as shown in the screenshot below. Note: Make sure you create a new sheet for each of these graphs.

-![](../.gitbook/assets/duration-spent-in-weekly-webinars%20%281%29.png)
+![](../.gitbook/assets/31_time-spent-in-weekly-webinars.png)
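+
+Assuming the Zoom **Duration** field is reported in minutes, the **Webinar Duration in Hours** calculated field can be as simple as this Tableau calculation:
+
+```
+[Duration] / 60
+```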

### Evolution of the number of participants for all webinars per week

@@ -260,7 +264,7 @@ Below is the chart:

![](../.gitbook/assets/32_number_of_webinar_attended_per_week.png)

-#### Listing of team members with the number of webinars per week and number of hours spent in meetings, ranked 
+#### Listing of team members with the number of webinars per week and number of hours spent in webinars, ranked

Below is the chart with these specs:

@@ -268,7 +272,6 @@ Below is the chart with these specs

## Conclusion

-In this article, we see how we can use Airbyte to get data off the Zoom API onto a PostgreSQL database, and then use that data to create some chart visualizations in Tableau.
-
-You can leverage Airbyte and Tableau to produce graphs on any collaboration tool. We just used Zoom to illustrate how it can be done. Hope this is helpful!
+In this article, we saw how we can use Airbyte to get data from the Zoom API into a PostgreSQL database, and then use that data to create chart visualizations in Tableau.
+
+You can leverage Airbyte and Tableau to produce graphs for any collaboration tool. We just used Zoom to illustrate how it can be done. Hope this is helpful!