mirror of synced 2025-12-23 21:03:15 -05:00

📚 Docs Refresh: MySQL Source (#6666)

* Docs Refresh: MySQL Destination

* Add note about TLS.

* Update docs/integrations/sources/mysql.md

Co-authored-by: Sherif A. Nada <snadalive@gmail.com>

* Update docs/integrations/sources/mysql.md

Co-authored-by: Sherif A. Nada <snadalive@gmail.com>

* SSH Tunnel support.

Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
Abhi Vaidyanatha
2021-10-03 22:57:08 -07:00
committed by GitHub
parent bee03dc50f
commit 8a8af6aacb


@@ -1,59 +1,6 @@
# MySQL
## Overview
The MySQL source supports both Full Refresh and Incremental syncs. You can choose if this connector will copy only the new or updated data, or all rows in the tables and columns you set up for replication, every time a sync is run.
### Resulting schema
The MySQL source does not alter the schema present in your database. Depending on the destination connected to this source, however, the schema may be altered. See the destination's documentation for more details.
### Data type mapping
MySQL data types are mapped to the following data types when synchronizing data.
You can check the test values examples [here](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-mysql/src/test-integration/java/io/airbyte/integrations/source/mysql/MySqlSourceComprehensiveTest.java).
If you can't find the data type you are looking for, or run into any problems, feel free to add a new test!
| MySQL Type | Resulting Type | Notes |
| :--- | :--- | :--- |
| `array` | array | |
| `bigint` | number | |
| `binary` | string | |
| `blob` | string | |
| `date` | string | |
| `datetime` | string | |
| `decimal` | number | |
| `decimal(19, 2)` | number | |
| `double` | number | |
| `enum` | string | |
| `float` | number | |
| `int` | number | |
| `int unsigned` | number | |
| `int zerofill` | number | |
| `json` | text | |
| `mediumint` | number | |
| `mediumint zerofill` | number | |
| `mediumint` | number | |
| `numeric` | number | |
| `point` | object | |
| `smallint` | number | |
| `smallint zerofill` | number | |
| `string` | string | |
| `tinyint` | number | |
| `text` | string | |
| `time` | string | |
| `timestamp` | string | |
| `tinytext` | string | |
| `varbinary(256)` | string | |
| `varchar` | string | |
| `varchar(256) character set cp1251` | string | |
| `varchar(256) character set utf16` | string | |
If you do not see a type in this list, assume that it is coerced into a string. We are happy to take feedback on preferred mappings.
**Note:** arrays for all the above types as well as custom types are supported, although they may be de-nested depending on the destination. Byte arrays are currently unsupported.
### Features
## Features
| Feature | Supported | Notes |
| :--- | :--- | :--- |
@@ -62,18 +9,26 @@ If you do not see a type in this list, assume that it is coerced into a string.
| Replicate Incremental Deletes | Yes | |
| CDC | Yes | |
| SSL Support | Yes | |
| SSH Tunnel Connection | Coming soon | |
| SSH Tunnel Connection | Yes | |
| Namespaces | Yes | Enabled by default |
| Arrays | Yes | Byte arrays are not supported yet |
## Getting started
The MySQL source does not alter the schema present in your database. Depending on the destination connected to this source, however, the schema may be altered. See the destination's documentation for more details.
### Requirements
## Troubleshooting
There may be problems with mapping values in MySQL's datetime field to other relational data stores. MySQL permits zero values for date/time instead of NULL, which other data stores may not accept. To work around this problem, pass the following key-value pair in the JDBC parameters of the source settings: `zeroDateTimeBehavior=convertToNull` (spelled `CONVERT_TO_NULL` in Connector/J 8.x).
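As an illustration of where that key-value pair ends up, here is a minimal sketch that appends connection properties to a MySQL JDBC URL. The helper names (`build_jdbc_params`, `build_jdbc_url`), the host, and the database name are hypothetical, and it assumes properties are joined as `key=value` pairs separated by `&`. Connector/J documents the property as `zeroDateTimeBehavior` with the value `convertToNull`.

```python
from urllib.parse import urlencode


def build_jdbc_params(params: dict) -> str:
    """Join connection properties into a key=value&key=value string."""
    return urlencode(params)


def build_jdbc_url(host: str, port: int, database: str, params: dict) -> str:
    """Build a MySQL JDBC URL, appending properties only when there are any."""
    base = f"jdbc:mysql://{host}:{port}/{database}"
    query = build_jdbc_params(params)
    return f"{base}?{query}" if query else base


if __name__ == "__main__":
    # Hypothetical host and database, used only for illustration.
    url = build_jdbc_url(
        "db.example.com", 3306, "shop",
        {"zeroDateTimeBehavior": "convertToNull"},
    )
    print(url)
    # prints: jdbc:mysql://db.example.com:3306/shop?zeroDateTimeBehavior=convertToNull
```

Only the parameter string produced by `build_jdbc_params` would go into the source settings; the full URL is shown here to make clear what the driver eventually sees.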
## Getting Started (Airbyte Cloud)
On Airbyte Cloud, only TLS connections to your MySQL instance are supported. Other than that, you can proceed with the open-source instructions below.
## Getting Started (Airbyte Open-Source)
#### Requirements
1. MySQL Server `8.0`, `5.7`, or `5.6`.
2. A dedicated read-only Airbyte user with access to all tables needed for replication.
### Setup guide
#### 1. Make sure your database is accessible from the machine running Airbyte
This is dependent on your networking setup. The easiest way to verify if Airbyte is able to connect to your MySQL instance is via the check connection tool in the UI.
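If you want a quick check before opening the UI, a plain TCP probe run from the machine that hosts Airbyte can rule out basic networking problems. The sketch below is an assumption-laden illustration: `can_reach` is a hypothetical helper, and `3306` is only the default MySQL port, so adjust it if your server listens elsewhere. It confirms that the port accepts connections; it does not test credentials or TLS.

```python
import socket


def can_reach(host: str, port: int = 3306, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, `can_reach("mysql.internal.example", 3306)` (a hypothetical hostname) returning `False` suggests a firewall, routing, or DNS issue rather than a problem with the database user.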
@@ -117,16 +72,13 @@ Your database user should now be ready for use with Airbyte.
* If the limitations prevent you from using CDC and your goal is to maintain a snapshot of your table in the destination, consider using non-CDC incremental and occasionally reset the data and re-sync.
* If your table has a primary key but doesn't have a reasonable cursor field for incremental syncing \(e.g. `updated_at`\), CDC allows you to sync your table incrementally.
### CDC Limitations
#### CDC Limitations
* Make sure to read our [CDC docs](../../understanding-airbyte/cdc.md) to see limitations that impact all databases using CDC replication.
* Our CDC implementation uses at-least-once delivery for all change records, so the same change record may be delivered more than once.
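Because at-least-once delivery can repeat a change record, anything that consumes the records benefits from being idempotent. The following sketch shows one way to do that under stated assumptions: the record shape (`id`, `position`, `data`) is hypothetical and is not Airbyte's actual record format, and `position` stands in for any value that increases monotonically with the order of changes.

```python
def apply_changes(records, state=None):
    """Apply change records idempotently, keyed by primary key.

    Each record is a dict with hypothetical keys:
      'id'       - primary key of the row
      'position' - monotonically increasing change position
      'data'     - the row contents after the change
    A record is skipped when a change at the same or a later position
    has already been applied for that primary key.
    """
    table = {} if state is None else state
    for record in records:
        current = table.get(record["id"])
        if current is not None and record["position"] <= current["position"]:
            continue  # duplicate or stale delivery, safe to ignore
        table[record["id"]] = record
    return table


if __name__ == "__main__":
    changes = [
        {"id": 1, "position": 10, "data": {"name": "a"}},
        {"id": 1, "position": 10, "data": {"name": "a"}},  # redelivered
        {"id": 1, "position": 11, "data": {"name": "b"}},
    ]
    result = apply_changes(changes)
    print(result[1]["data"])  # prints: {'name': 'b'}
```

Applying the same batch twice leaves the result unchanged, which is the property that makes redelivery harmless.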
### Setting up CDC for MySQL
You must enable binary logging for MySQL replication. The binary logs record transaction updates for replication tools to propagate changes.
#### Enable binary logging
#### 1. Enable binary logging
You must enable binary logging for MySQL replication. The binary logs record transaction updates for replication tools to propagate changes. You can configure your MySQL server configuration file with the following properties, which are described below:
```
@@ -142,7 +94,7 @@ expire_logs_days = 10
* binlog_row_image : The `binlog_row_image` must be set to `FULL`. It determines how row images are written to the binary log. For more information, refer to the [MySQL documentation](https://dev.mysql.com/doc/refman/5.7/en/replication-options-binary-log.html#sysvar_binlog_row_image).
* expire_logs_days : This is the number of days for automatic binlog file removal. We recommend 10 days so that if a sync fails or is paused, enough binlog history is retained to resume the incremental sync from the last point. We also recommend setting frequent syncs for CDC.
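To sanity-check a configuration file against the two settings described above, something like the following sketch could be used. It is illustrative only: `check_binlog_settings` is a hypothetical helper, it assumes the options live in a `[mysqld]` section and are written with underscores, and it parses the file with Python's `configparser`, which does not understand MySQL-specific directives such as `!includedir`.

```python
import configparser


def check_binlog_settings(cnf_text: str, section: str = "mysqld") -> list:
    """Return a list of problems found in the given my.cnf text."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # my.cnf allows bare flags without a value
        strict=False,         # tolerate repeated options
        interpolation=None,   # treat '%' literally
    )
    parser.read_string(cnf_text)
    if not parser.has_section(section):
        return [f"missing [{section}] section"]

    options = parser[section]
    problems = []

    row_image = (options.get("binlog_row_image") or "").strip().upper()
    if row_image != "FULL":
        problems.append("binlog_row_image should be FULL")

    days = (options.get("expire_logs_days") or "").strip()
    if not days.isdigit():
        problems.append("expire_logs_days is not set to a number of days")
    elif int(days) < 10:
        problems.append("expire_logs_days is below the recommended 10 days")

    return problems


if __name__ == "__main__":
    sample = """
[mysqld]
binlog_row_image = FULL
expire_logs_days = 10
"""
    print(check_binlog_settings(sample))  # prints: []
```

An empty list means both settings match the recommendations in this section; the check says nothing about the other properties in the configuration block.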
#### Enable GTIDs \(Optional\)
#### 2. Enable GTIDs \(Optional\)
Global transaction identifiers (GTIDs) uniquely identify transactions that occur on a server within a cluster.
Though not required for the Airbyte MySQL connector, using GTIDs simplifies replication and makes it easier to confirm that primary and replica servers are consistent.
@@ -156,11 +108,8 @@ When a sync runs for the first time using CDC, Airbyte performs an initial consi
Airbyte doesn't acquire any table locks while creating the snapshot, so other database clients can keep writing (tables defined with the MyISAM engine are still locked).
However, for the sync to work without errors or unexpected behaviour, it is assumed that no schema changes happen while the snapshot is running.
## Troubleshooting
There may be problems with mapping values in MySQL's datetime field to other relational data stores. MySQL permits zero values for date/time instead of NULL, which other data stores may not accept. To work around this problem, pass the following key-value pair in the JDBC parameters of the source settings: `zeroDateTimeBehavior=convertToNull` (spelled `CONVERT_TO_NULL` in Connector/J 8.x).
## Connection to MySQL via an SSH Tunnel
## Connection via SSH Tunnel
Airbyte can connect to a MySQL instance via an SSH tunnel. You might want to do this when it is not possible (or against security policy) to connect to the database directly (e.g. it does not have a public IP address).
@@ -178,6 +127,49 @@ Using this feature requires additional configuration, when creating the source.
7. If you are using `SSH Key Authentication`, then `SSH Private Key` should be set to the RSA Private Key that you are using to create the SSH connection. This should be the full contents of the key file starting with `-----BEGIN RSA PRIVATE KEY-----` and ending with `-----END RSA PRIVATE KEY-----`.
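Before pasting a key into the `SSH Private Key` field, it can help to confirm that the text has the shape described in the last step. The sketch below is a hypothetical helper, not part of Airbyte: it only checks the first and last lines of the key text and does not validate the key material itself.

```python
BEGIN_MARKER = "-----BEGIN RSA PRIVATE KEY-----"
END_MARKER = "-----END RSA PRIVATE KEY-----"


def looks_like_rsa_private_key(text: str) -> bool:
    """Check that the text starts and ends with the RSA private key markers."""
    lines = [line.strip() for line in text.strip().splitlines()]
    # Expect at least the two marker lines plus one line of key data.
    return (
        len(lines) >= 3
        and lines[0] == BEGIN_MARKER
        and lines[-1] == END_MARKER
    )
```

A key file that begins with a different header, for example `-----BEGIN OPENSSH PRIVATE KEY-----`, fails this check because it does not match the format described above.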
## Data type mapping
MySQL data types are mapped to the following data types when synchronizing data.
You can check the test values examples [here](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-mysql/src/test-integration/java/io/airbyte/integrations/source/mysql/MySqlSourceComprehensiveTest.java).
If you can't find the data type you are looking for, or run into any problems, feel free to add a new test!
| MySQL Type | Resulting Type | Notes |
| :--- | :--- | :--- |
| `array` | array | |
| `bigint` | number | |
| `binary` | string | |
| `blob` | string | |
| `date` | string | |
| `datetime` | string | |
| `decimal` | number | |
| `decimal(19, 2)` | number | |
| `double` | number | |
| `enum` | string | |
| `float` | number | |
| `int` | number | |
| `int unsigned` | number | |
| `int zerofill` | number | |
| `json` | text | |
| `mediumint` | number | |
| `mediumint zerofill` | number | |
| `mediumint` | number | |
| `numeric` | number | |
| `point` | object | |
| `smallint` | number | |
| `smallint zerofill` | number | |
| `string` | string | |
| `tinyint` | number | |
| `text` | string | |
| `time` | string | |
| `timestamp` | string | |
| `tinytext` | string | |
| `varbinary(256)` | string | |
| `varchar` | string | |
| `varchar(256) character set cp1251` | string | |
| `varchar(256) character set utf16` | string | |
If you do not see a type in this list, assume that it is coerced into a string. We are happy to take feedback on preferred mappings.
## Changelog
| Version | Date | Pull Request | Subject |