# S3
This page contains the setup guide and reference information for the Amazon S3 source connector.
:::info
Please note that using cloud storage may incur egress costs. Egress refers to data that is transferred out of the cloud storage system, such as when you download files or access them from a different location. For detailed information on egress costs, please consult the [Amazon S3 pricing guide](https://aws.amazon.com/s3/pricing/).
:::
## Prerequisites
- Access to the S3 bucket containing the files to replicate.
## Setup guide
### Step 1: Set up Amazon S3
**If you are syncing from a private bucket**, you will need to provide your `AWS Access Key ID` and `AWS Secret Access Key` to authenticate the connection, and ensure that the IAM user associated with the credentials has `read` and `list` permissions for the bucket. If you are unfamiliar with configuring AWS permissions, you can follow these steps to obtain the necessary permissions and credentials:
1. Log in to your Amazon AWS account and open the [IAM console](https://console.aws.amazon.com/iam/home#home).
2. In the IAM dashboard, select **Policies**, then click **Create Policy**.
3. Select the **JSON** tab, then paste the following JSON into the Policy editor (be sure to substitute in your bucket name):
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::{your-bucket-name}/*",
        "arn:aws:s3:::{your-bucket-name}"
      ]
    }
  ]
}
```
4. Give your policy a descriptive name, then click **Create policy**.
5. In the IAM dashboard, click **Users**. Select an existing IAM user or create a new one by clicking **Add users**.
6. If you are using an _existing_ IAM user, click the **Add permissions** dropdown menu and select **Add permissions**. If you are creating a _new_ user, you will be taken to the Permissions screen after selecting a name.
7. Select **Attach policies directly**, then find and check the box for your new policy. Click **Next**, then **Add permissions**.
8. After successfully creating your user, select the **Security credentials** tab and click **Create access key**. You will be prompted to select a use case and add optional tags to your access key. Click **Create access key** to generate the keys.
:::caution
Your `Secret Access Key` will only be visible once upon creation. Be sure to copy and store it securely for future use.
:::
For more information on managing your access keys, please refer to the
[official AWS documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html).
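If you'd like to verify the credentials before moving on, the following is a minimal sketch using [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) (not part of the official setup steps); the credentials and bucket name are placeholders:
```python
import boto3

# Placeholders: substitute the new IAM user's credentials and your bucket name.
s3 = boto3.client(
    "s3",
    aws_access_key_id="YOUR_AWS_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_AWS_SECRET_ACCESS_KEY",
)
bucket = "your-bucket-name"

# s3:ListBucket — list a single key to confirm the list permission.
listing = s3.list_objects_v2(Bucket=bucket, MaxKeys=1)
keys = [obj["Key"] for obj in listing.get("Contents", [])]
print("Listed:", keys)

# s3:GetObject — read the first listed object to confirm the read permission.
if keys:
    obj = s3.get_object(Bucket=bucket, Key=keys[0])
    print("Read", obj["ContentLength"], "bytes from", keys[0])
```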
### Step 2: Set up the Amazon S3 connector in Airbyte
1. [Log in to your Airbyte Cloud](https://cloud.airbyte.com/workspaces) account, or navigate to your Airbyte Open Source dashboard.
2. In the left navigation bar, click **Sources**. In the top-right corner, click **+ New source**.
3. Find and select **S3** from the list of available sources.
4. Enter the **Output Stream Name**. This will be the name of the table in the destination (can contain letters, numbers and underscores).
5. Enter the **Pattern of files to replicate**. This is a regular expression that allows Airbyte to pattern match the specific files to replicate. If you are replicating all the files within your bucket, use `**` as the pattern. For more precise pattern matching options, refer to the [Path Patterns section](#path-patterns) below.
6. Enter the name of the **Bucket** containing your files to replicate.
7. Toggle the **Optional fields** under the **Bucket** field to expand additional configuration options. **If you are syncing from a private bucket**, you must fill the **AWS Access Key ID** and **AWS Secret Access Key** fields with the appropriate credentials to authenticate the connection. All other fields are optional and can be left empty. Refer to the [S3 Provider Settings section](#s3-provider-settings) below for more information on each field.
8. In the **File Format** box, use the dropdown menu to select the format of the files you'd like to replicate. The supported formats are **CSV**, **Parquet**, **Avro** and **JSONL**. Toggling the **Optional fields** button within the **File Format** box will allow you to enter additional configurations based on the selected format. For a detailed breakdown of these settings, refer to the [File Format section](#file-format-settings) below.
9. (Optional) If you want to enforce a specific schema, you can enter a **Manually enforced data schema**. By default, this value is set to `{}`, which means the schema will be automatically inferred from the file\(s\) you are replicating. For details on providing a custom schema, refer to the [User Schema section](#user-schema).
## Supported sync modes
The Amazon S3 source connector supports the following [sync modes](https://docs.airbyte.com/cloud/core-concepts#connection-sync-modes):
| Feature | Supported? |
| :--------------------------------------------- | :--------- |
| Full Refresh Sync | Yes |
| Incremental Sync | Yes |
| Replicate Incremental Deletes | No |
| Replicate Multiple Files \(pattern matching\) | Yes |
| Replicate Multiple Streams \(distinct tables\) | No |
| Namespaces | No |
## File Compressions
| Compression | Supported? |
| :---------- | :--------- |
| Gzip | Yes |
| Zip | No |
| Bzip2 | Yes |
| Lzma | No |
| Xz | No |
| Snappy | No |
Please let us know any specific compressions you'd like to see support for next!
## Path Patterns
\(tl;dr -> path pattern syntax using [wcmatch.glob](https://facelessuser.github.io/wcmatch/glob/). GLOBSTAR and SPLIT flags are enabled.\)
This connector can sync multiple files by using glob-style patterns, rather than requiring a specific path for every file. This enables:
- Referencing many files with just one pattern, e.g. `**` would indicate every file in the bucket.
- Referencing future files that don't exist yet \(and therefore don't have a specific path\).
You must provide a path pattern. You can also provide many patterns split with \| for more complex directory layouts.
Each path pattern is a reference from the _root_ of the bucket, so don't include the bucket name in the pattern\(s\).
Some example patterns:
- `**` : match everything.
- `**/*.csv` : match all files with specific extension.
- `myFolder/**/*.csv` : match all csv files anywhere under myFolder.
- `*/**` : match everything at least one folder deep.
- `*/*/*/**` : match everything at least three folders deep.
- `**/file.*|**/file` : match every file called "file" with any extension \(or no extension\).
- `x/*/y/*` : match all files that sit in folder x -> any folder -> folder y.
- `**/prefix*.csv` : match all csv files with specific prefix.
- `**/prefix*.parquet` : match all parquet files with specific prefix.
Let's look at a specific example, matching the following bucket layout:
```text
myBucket
    -> log_files
    -> some_table_files
       -> part1.csv
       -> part2.csv
    -> images
    -> more_table_files
       -> part3.csv
    -> extras
       -> misc
          -> another_part1.csv
```
We want to pick up part1.csv, part2.csv and part3.csv \(excluding another_part1.csv for now\). We could do this a few different ways:
- We could pick up every csv file called "partX" with the single pattern `**/part*.csv`.
- To be a bit more robust, we could use the dual pattern `some_table_files/*.csv|more_table_files/*.csv` to pick up relevant files only from those exact folders.
- We could achieve the above in a single pattern by using the pattern `*table_files/*.csv`. This could however cause problems in the future if new unexpected folders started being created.
- We can also recursively wildcard, so adding the pattern `extras/**/*.csv` would pick up any csv files nested in folders below "extras", such as "extras/misc/another_part1.csv".
As you can probably tell, there are many ways to achieve the same goal with path patterns. We recommend using a pattern that ensures clarity and is robust against future additions to the directory structure.
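Because the connector evaluates patterns with [wcmatch.glob](https://facelessuser.github.io/wcmatch/glob/) using the GLOBSTAR and SPLIT flags, you can sanity-check a pattern locally before saving it. A small illustrative sketch (the keys mirror the example bucket above):
```python
from wcmatch import glob

# GLOBSTAR enables `**`; SPLIT allows multiple patterns separated by `|`.
FLAGS = glob.GLOBSTAR | glob.SPLIT

pattern = "some_table_files/*.csv|more_table_files/*.csv"

keys = [
    "log_files/2021-07-01.log",  # placeholder log file
    "some_table_files/part1.csv",
    "some_table_files/part2.csv",
    "more_table_files/part3.csv",
    "extras/misc/another_part1.csv",
]

for key in keys:
    matched = glob.globmatch(key, pattern, flags=FLAGS)
    print(f"{key}: {'matched' if matched else 'skipped'}")
```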
## User Schema
Providing a schema allows for more control over the output of this stream. Without a provided schema, columns and datatypes will be inferred from the first created file in the bucket matching your path pattern and suffix. This will probably be fine in most cases but there may be situations you want to enforce a schema instead, e.g.:
- You only care about a specific known subset of the columns. The other columns would all still be included, but packed into the `_ab_additional_properties` map.
- Your initial dataset is quite small \(in terms of number of records\), and you think the automatic type inference from this sample might not be representative of the data in the future.
- You want to purposely define types for every column.
- You know the names of columns that will be added to future data and want to include these in the core schema as columns rather than have them appear in the `_ab_additional_properties` map.
Or any other reason! The schema must be provided as valid JSON as a map of `{"column": "datatype"}` where each datatype is one of:
- string
- number
- integer
- object
- array
- boolean
- null
For example:
- {"id": "integer", "location": "string", "longitude": "number", "latitude": "number"}
- {"username": "string", "friends": "array", "information": "object"}
:::note
Please note that prior to version 2.0.0, the S3 source connector inferred schemas from all available files and merged them into a superset schema. Starting with version 2.0.0, the schema is inferred from the first file found only, where the first file is the oldest one written to the prefix.
:::
## S3 Provider Settings
- **AWS Access Key ID**: One half of the [required credentials](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys) for accessing a private bucket.
- **AWS Secret Access Key**: The other half of the [required credentials](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys) for accessing a private bucket.
- **Path Prefix**: An optional string that limits the files returned by AWS when listing files to only those starting with the specified prefix. This is different from the **Path Pattern**, as the prefix is applied directly to the API call made to S3, rather than being filtered within Airbyte. **This is not a regular expression** and does not accept pattern-style symbols like wildcards (`*`). We recommend using this filter to improve performance if your bucket has many folders and files that are unrelated to the data you want to replicate, and all the relevant files will always reside under the specified prefix.
- Together with the **Path Pattern**, there are multiple ways to specify the files to sync. For example, all the following configurations are equivalent:
  - **Prefix** = `<empty>`, **Pattern** = `path1/path2/myFolder/**/*.csv`
- **Prefix** = `path1/`, **Pattern** = `path2/myFolder/**/*.csv`
- **Prefix** = `path1/path2/`, **Pattern** = `myFolder/**/*.csv`
- **Prefix** = `path1/path2/myFolder/`, **Pattern** = `**/*.csv`
- The ability to individually configure the prefix and pattern has been included to accommodate situations where you do not want to replicate the majority of the files in the bucket. If you are unsure of the best approach, you can safely leave the **Path Prefix** field empty and just [set the Path Pattern](#path-patterns) to meet your requirements.
- **Endpoint**: An optional parameter that enables the use of non-Amazon S3 compatible services. If you are using the default Amazon service, leave this field blank.
- **Start Date**: An optional parameter that marks a starting date and time in UTC for data replication. Any files that have _not_ been modified since this specified date/time will _not_ be replicated. Use the provided datepicker (recommended) or enter the desired date programmatically in the format `YYYY-MM-DDTHH:mm:ssZ`. Leaving this field blank will replicate data from all files that have not been excluded by the **Path Pattern** and **Path Prefix**.
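To make the interplay between **Path Prefix**, **Start Date** and **Path Pattern** concrete, here is an illustrative sketch (not the connector's implementation) that mirrors the behavior described above; the bucket, prefix, pattern and date are placeholders:
```python
from datetime import datetime, timezone

import boto3
from wcmatch import glob

bucket = "your-bucket-name"            # placeholder
path_prefix = "path1/path2/"           # applied directly to the S3 ListObjectsV2 call
path_pattern = "myFolder/**/*.csv"     # evaluated against the listed keys
start_date = datetime(2023, 1, 1, tzinfo=timezone.utc)

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

matched = []
for page in paginator.paginate(Bucket=bucket, Prefix=path_prefix):
    for obj in page.get("Contents", []):
        # Start Date: skip files that have not been modified since the given date/time.
        if obj["LastModified"] < start_date:
            continue
        # Per the equivalent configurations above, Prefix + Pattern together identify the key.
        if glob.globmatch(obj["Key"], f"{path_prefix}{path_pattern}", flags=glob.GLOBSTAR | glob.SPLIT):
            matched.append(obj["Key"])
print(matched)
```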
## File Format Settings
The Reader in charge of loading the file format is currently based on [PyArrow](https://arrow.apache.org/docs/python/generated/pyarrow.csv.open_csv.html) \(Apache Arrow\).
:::note
All files within one stream must adhere to the same read options for every provided format.
:::
### CSV
Since CSV files are effectively plain text, providing specific reader options is often required for correct parsing of the files. These settings should match the conventions used when the CSVs were created or exported, so please ensure that this process happens consistently over time.
- **Delimiter**: Even though CSV is an acronym for Comma Separated Values, it is used more generally as a term for flat file data that may or may not be comma separated. The delimiter field lets you specify which character acts as the separator. To use [tab-delimiters](https://en.wikipedia.org/wiki/Tab-separated_values), you can set this value to `\t`. By default, this value is set to `,`.
- **Infer Datatypes**: This option determines whether a schema for the source should be inferred from the current data. When set to False and a custom schema is provided, the manually enforced schema takes precedence. If no custom schema is set and this option is set to False, all fields will be read as strings. Set to True by default.
- **Quote Character**: In some cases, data values may contain instances of reserved characters \(like a comma, if that's the delimiter\). CSVs can handle this by wrapping a value in defined quote characters so that on read it can parse it correctly. By default, this is set to `"`.
- **Escape Character**: An escape character can be used to prefix a reserved character and ensure correct parsing. A commonly used character is the backslash (`\`). For example, given the following data:
```
Product,Description,Price
Jeans,"Navy Blue, Bootcut, 34\"",49.99
```
The backslash (`\`) is used directly before the second double quote (`"`) to indicate that it is _not_ the closing quote for the field, but rather a literal double quote character that should be included in the value (in this example, denoting the size of the jeans in inches: `34"` ).
Leaving this field blank (default option) will disallow escaping.
- **Encoding**: Some data may use a different character set \(typically when different alphabets are involved\). See the [list of allowable encodings here](https://docs.python.org/3/library/codecs.html#standard-encodings). By default, this is set to `utf8`.
- **Double Quote**: This option determines whether two quotes in a quoted CSV value denote a single quote in the data. Set to True by default.
- **Allow newlines in values**: Also known as _multiline_, this option addresses situations where newline characters occur within text data. Typically, newline characters signify the end of a row in a CSV, but when this option is set to True, parsing correctly handles newlines within values. Set to False by default.
- **Additional Reader Options**: This allows for editing the less commonly used CSV [ConvertOptions](https://arrow.apache.org/docs/python/generated/pyarrow.csv.ConvertOptions.html#pyarrow.csv.ConvertOptions). The value must be a valid JSON string, e.g.:
```json
{
  "timestamp_parsers": [
    "%m/%d/%Y %H:%M", "%Y/%m/%d %H:%M"
  ],
  "strings_can_be_null": true,
  "null_values": [
    "NA", "NULL"
  ]
}
```
- **Advanced Options**: This allows for editing the less commonly used CSV [ReadOptions](https://arrow.apache.org/docs/python/generated/pyarrow.csv.ReadOptions.html#pyarrow.csv.ReadOptions). The value must be a valid JSON string. One use case for this is when your CSV has no header, or if you want to use custom column names. You can specify `column_names` using this option. For example:
```json
{
  "column_names": [
    "column1", "column2", "column3"
  ]
}
```
- **Block Size**: This is the number of bytes to process in memory at a time while reading files. The default value of `10000` is usually suitable, but if your files are particularly wide (lots of columns, or the values in the columns are particularly large), increasing this might help avoid schema detection failures.
:::caution
Be cautious when raising this value too high, as it may result in Out Of Memory issues due to increased memory usage.
:::
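Since the reader is based on PyArrow, the CSV settings above map onto `pyarrow.csv` option classes. The following is an illustrative sketch of that mapping (not the connector's actual code); the file path is a placeholder and the example assumes a headerless CSV:
```python
from pyarrow import csv

# Advanced Options -> ReadOptions (block size, custom column names for a headerless file)
read_options = csv.ReadOptions(block_size=10000, column_names=["column1", "column2", "column3"])

# Delimiter, quoting and escaping settings -> ParseOptions
parse_options = csv.ParseOptions(
    delimiter=",",
    quote_char='"',
    escape_char="\\",        # matches the backslash escape character example above
    double_quote=True,
    newlines_in_values=False,
)

# Additional Reader Options -> ConvertOptions
convert_options = csv.ConvertOptions(
    strings_can_be_null=True,
    null_values=["NA", "NULL"],
    timestamp_parsers=["%m/%d/%Y %H:%M", "%Y/%m/%d %H:%M"],
)

# Stream the file in batches rather than loading it all at once.
reader = csv.open_csv(
    "path/to/your/file.csv",  # placeholder
    read_options=read_options,
    parse_options=parse_options,
    convert_options=convert_options,
)
for batch in reader:
    print(batch.num_rows, "rows read")
```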
### Parquet
Apache Parquet is a column-oriented data storage format of the Apache Hadoop ecosystem. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk. At the moment, partitioned parquet datasets are unsupported. The following settings are available:
- **Selected Columns**: If you only want to sync a subset of the columns from the file(s), enter the desired columns here as a comma-delimited list. Leave this field empty to sync all columns.
- **Record Batch Size**: Sets the maximum number of records per batch. Batches may be smaller if there aren't enough rows in the file. This option can help avoid out-of-memory errors if your data is particularly wide. Set to `65536` by default.
- **Buffer Size**: If set to a positive number, read buffering is performed during the deserializing of individual column chunks. Otherwise I/O calls are unbuffered. Set to `2` by default.
For more information on these fields, please refer to the [Apache documentation](https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetFile.html#pyarrow.parquet.ParquetFile.iter_batches).
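As an illustration, these settings correspond to the PyArrow calls linked above; the file path and column names are placeholders:
```python
import pyarrow.parquet as pq

# Buffer Size: a positive value enables read buffering of individual column chunks.
parquet_file = pq.ParquetFile("path/to/your/file.parquet", buffer_size=2)

# Selected Columns and Record Batch Size map to the iter_batches arguments.
for batch in parquet_file.iter_batches(batch_size=65536, columns=["column1", "column2"]):
    print(batch.num_rows, "rows read")
```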
### Avro
The Avro parser uses the [Fastavro library](https://fastavro.readthedocs.io/en/latest/). Currently, no additional options are supported.
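A minimal illustrative sketch of reading records with Fastavro (the file path is a placeholder):
```python
import fastavro

# fastavro.reader iterates over the records of an Avro container file.
with open("path/to/your/file.avro", "rb") as fo:
    for record in fastavro.reader(fo):
        print(record)
```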
### JSONL
The JSONL parser uses the PyArrow library, which only supports the line-delimited JSON format. For more detailed info, please refer to the [official docs](https://arrow.apache.org/docs/python/json.html).
- **Allow newlines in values**: While JSONL typically has each JSON object on a separate line, there are cases where newline characters may appear within JSON values, such as multiline strings. This option enables the parser to correctly interpret and treat newline characters within values. Please note that setting this option to True may affect performance. Set to False by default.
- **Unexpected field behavior**: This parameter determines how any JSON fields outside of the explicit schema (if defined) are handled. Possible behaviors include:
- `ignore`: Unexpected JSON fields are ignored.
- `error`: Error out on unexpected JSON fields.
- `infer`: Unexpected JSON fields are type-inferred and included in the output.
Set to `infer` by default.
- **Block Size**: This sets the number of bytes to process in memory at a time while reading files. The default value of `10000` is usually suitable, but if your files are particularly wide (lots of columns or the values in the columns are particularly large), increasing this might help avoid schema detection failures.
:::caution
Be cautious when raising this value too high, as it may result in Out Of Memory issues due to increased memory usage.
:::
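These options map onto `pyarrow.json` option classes. An illustrative sketch (not the connector's actual code; the file path is a placeholder):
```python
from pyarrow import json as pa_json

# Block Size -> ReadOptions
read_options = pa_json.ReadOptions(block_size=10000)

# Allow newlines in values, Unexpected field behavior -> ParseOptions
parse_options = pa_json.ParseOptions(
    newlines_in_values=False,
    unexpected_field_behavior="infer",  # "ignore", "error" or "infer"
)

table = pa_json.read_json(
    "path/to/your/file.jsonl",  # placeholder
    read_options=read_options,
    parse_options=parse_options,
)
print(table.num_rows, "rows read")
```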
## Changelog
| Version | Date | Pull Request | Subject |
| :------ | :--------- | :-------------------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------------------------------------------------------- |
| 3.1.0 | 2023-06-26 | [27725](https://github.com/airbytehq/airbyte/pull/27725) | License Update: Elv2 |
| 3.0.3 | 2023-06-23 | [27651](https://github.com/airbytehq/airbyte/pull/27651) | Handle Bucket Access Errors |
| 3.0.2 | 2023-06-22 | [27611](https://github.com/airbytehq/airbyte/pull/27611) | Fix start date |
| 3.0.1 | 2023-06-22 | [27604](https://github.com/airbytehq/airbyte/pull/27604) | Add logging for file reading |
| 3.0.0 | 2023-05-02 | [25127](https://github.com/airbytehq/airbyte/pull/25127) | Remove ab_additional column; Use platform-handled schema evolution |
| 2.2.0 | 2023-05-10 | [25937](https://github.com/airbytehq/airbyte/pull/25937) | Add support for Parquet Dataset |
| 2.1.4 | 2023-05-01 | [25361](https://github.com/airbytehq/airbyte/pull/25361) | Parse nested avro schemas |
| 2.1.3 | 2023-05-01 | [25706](https://github.com/airbytehq/airbyte/pull/25706) | Remove minimum block size for CSV check |
| 2.1.2 | 2023-04-18 | [25067](https://github.com/airbytehq/airbyte/pull/25067) | Handle block size related errors; fix config validator |
| 2.1.1 | 2023-04-18 | [25010](https://github.com/airbytehq/airbyte/pull/25010) | Refactor filter logic |
| 2.1.0 | 2023-04-10 | [25010](https://github.com/airbytehq/airbyte/pull/25010) | Add `start_date` field to filter files based on `LastModified` option |
| 2.0.4 | 2023-03-23 | [24429](https://github.com/airbytehq/airbyte/pull/24429) | Call `check` with a small block size to save time and memory. |
| 2.0.3 | 2023-03-17 | [24178](https://github.com/airbytehq/airbyte/pull/24178) | Support legacy datetime format for the period of migration, fix time-zone conversion. |
| 2.0.2 | 2023-03-16 | [24157](https://github.com/airbytehq/airbyte/pull/24157) | Return empty schema if `discover` finds no files; Do not infer extra data types when user defined schema is applied. |
| 2.0.1 | 2023-03-06 | [23195](https://github.com/airbytehq/airbyte/pull/23195) | Fix datetime format string |
| 2.0.0 | 2023-03-14 | [23189](https://github.com/airbytehq/airbyte/pull/23189) | Infer schema based on one file instead of all the files |
| 1.0.2 | 2023-03-02 | [23669](https://github.com/airbytehq/airbyte/pull/23669) | Made `Advanced Reader Options` and `Advanced Options` truly `optional` for `CSV` format |
| 1.0.1 | 2023-02-27 | [23502](https://github.com/airbytehq/airbyte/pull/23502) | Fix error handling |
| 1.0.0 | 2023-02-17 | [23198](https://github.com/airbytehq/airbyte/pull/23198) | Fix Avro schema discovery |
| 0.1.32 | 2023-02-07 | [22500](https://github.com/airbytehq/airbyte/pull/22500) | Speed up discovery |
| 0.1.31 | 2023-02-08 | [22550](https://github.com/airbytehq/airbyte/pull/22550) | Validate CSV read options and convert options |
| 0.1.30 | 2023-01-25 | [21587](https://github.com/airbytehq/airbyte/pull/21587) | Make sure spec works as expected in UI |
| 0.1.29 | 2023-01-19 | [21604](https://github.com/airbytehq/airbyte/pull/21604) | Handle OSError: skip unreachable keys and keep working on accessible ones. Warn a customer |
| 0.1.28 | 2023-01-10 | [21210](https://github.com/airbytehq/airbyte/pull/21210) | Update block size for json file format |
| 0.1.27 | 2022-12-08 | [20262](https://github.com/airbytehq/airbyte/pull/20262) | Check config settings for CSV file format |
| 0.1.26 | 2022-11-08 | [19006](https://github.com/airbytehq/airbyte/pull/19006) | Add virtual-hosted-style option |
| 0.1.24 | 2022-10-28 | [18602](https://github.com/airbytehq/airbyte/pull/18602) | Wrap errors into AirbyteTracedException pointing to a problem file |
| 0.1.23 | 2022-10-10 | [17991](https://github.com/airbytehq/airbyte/pull/17991) | Fix pyarrow to JSON schema type conversion for arrays |
| 0.1.23 | 2022-10-10 | [17800](https://github.com/airbytehq/airbyte/pull/17800) | Deleted `use_ssl` and `verify_ssl_cert` flags and hardcoded to `True` |
| 0.1.22 | 2022-09-28 | [17304](https://github.com/airbytehq/airbyte/pull/17304) | Migrate to per-stream state |
| 0.1.21 | 2022-09-20 | [16921](https://github.com/airbytehq/airbyte/pull/16921) | Upgrade pyarrow |
| 0.1.20 | 2022-09-12 | [16607](https://github.com/airbytehq/airbyte/pull/16607) | Fix for reading jsonl files containing nested structures |
| 0.1.19 | 2022-09-13 | [16631](https://github.com/airbytehq/airbyte/pull/16631) | Adjust column type to a broadest one when merging two or more json schemas |
| 0.1.18 | 2022-08-01 | [14213](https://github.com/airbytehq/airbyte/pull/14213) | Add support for jsonl format files. |
| 0.1.17 | 2022-07-21 | [14911](https://github.com/airbytehq/airbyte/pull/14911) | "decimal" type added for parquet |
| 0.1.16 | 2022-07-13 | [14669](https://github.com/airbytehq/airbyte/pull/14669) | Fixed bug when extra columns appeared to be non-present in master schema |
| 0.1.15 | 2022-05-31 | [12568](https://github.com/airbytehq/airbyte/pull/12568) | Fixed possible case of files being missed during incremental syncs |
| 0.1.14 | 2022-05-23 | [11967](https://github.com/airbytehq/airbyte/pull/11967) | Increase unit test coverage up to 90% |
| 0.1.13 | 2022-05-11 | [12730](https://github.com/airbytehq/airbyte/pull/12730) | Fixed empty options issue |
| 0.1.12 | 2022-05-11 | [12602](https://github.com/airbytehq/airbyte/pull/12602) | Added support for Avro file format |
| 0.1.11 | 2022-04-30 | [12500](https://github.com/airbytehq/airbyte/pull/12500) | Improve input configuration copy |
| 0.1.10 | 2022-01-28 | [8252](https://github.com/airbytehq/airbyte/pull/8252) | Refactoring of files' metadata |
| 0.1.9 | 2022-01-06 | [9163](https://github.com/airbytehq/airbyte/pull/9163) | Work-around for web-UI, `backslash - t` converts to `tab` for `format.delimiter` field. |
| 0.1.7 | 2021-11-08 | [7499](https://github.com/airbytehq/airbyte/pull/7499) | Remove base-python dependencies |
| 0.1.6 | 2021-10-15 | [6615](https://github.com/airbytehq/airbyte/pull/6615) & [7058](https://github.com/airbytehq/airbyte/pull/7058) | Memory and performance optimisation. Advanced options for CSV parsing. |
| 0.1.5 | 2021-09-24 | [6398](https://github.com/airbytehq/airbyte/pull/6398) | Support custom non Amazon S3 services |
| 0.1.4 | 2021-08-13 | [5305](https://github.com/airbytehq/airbyte/pull/5305) | Support of Parquet format |
| 0.1.3 | 2021-08-04 | [5197](https://github.com/airbytehq/airbyte/pull/5197) | Fixed bug where sync could hang indefinitely on schema inference |
| 0.1.2 | 2021-08-02 | [5135](https://github.com/airbytehq/airbyte/pull/5135) | Fixed bug in spec so it displays in UI correctly |
| 0.1.1 | 2021-07-30 | [4990](https://github.com/airbytehq/airbyte/pull/4990/commits/ff5f70662c5f84eabc03526cddfcc9d73c58c0f4) | Fixed documentation url in source definition |
| 0.1.0 | 2021-07-30 | [4990](https://github.com/airbytehq/airbyte/pull/4990) | Created S3 source connector |