* generate airbyte_type:integer
* normalization accepts `airbyte_type: integer`
* handles ints+longs
* update avro for consistency
* delete long type for now, treat all ints as longs
* update avro type mappings
{type:number, airbyte_type:integer} -> long
{type:number, airbyte_type:big_integer} -> string (i.e. "unbounded integer")
* fix test
* remove long handling
* Revert "remove long handling"
This reverts commit 33ade8d2831e675c3545ac6019d200ec312e54d9.
* Revert "update avro type mappings"
This reverts commit 5b0349badad7545efe8e1191291a628445fe1c84.
* Revert "delete long type for now, treat all ints as longs"
This reverts commit 018efd4a5d0c59f392fd8e3b0d0967c666b72947.
* Revert "update avro for consistency"
This reverts commit bcf47c6799b5906deb4f219d7f6e64ea73b41b74.
* newline@eof
* update test
* slightly better local tests
* fix test
* missed a few cases
* postgres tests use correct hostnames
* fix normalization
* fix int macro
* add test case
* normalization test output
* handle int/long correctly
* fix types for other DBs
* uint32 -> bigint; tests
* add type value assertions
* more test updates
* regenerate output
* reconcile big_integer to match docs
* update comment
* fix type
* fix mysql constructor call
* bigint only has 38 digits
* fix s3 ints, fix DAT test case
* big_integer should be string
* reduce to 28 digit big_ints
* fix test setup, mysql
* kill big_integer tests
* regenerate output
* version bumps
* auto-bump connector version [ci skip]
* auto-bump connector version [ci skip]
* auto-bump connector version [ci skip]
* auto-bump connector version [ci skip]
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
---
description: BigQuery is a serverless, highly scalable, and cost-effective data warehouse offered by Google Cloud Platform.
---
# BigQuery

## Overview
The BigQuery source supports both Full Refresh and Incremental syncs. You can choose whether this connector copies only new or updated data, or all rows in the tables and columns you set up for replication, each time a sync runs.
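To make the distinction concrete, here is a minimal conceptual sketch (not the connector's actual implementation) of the query an incremental sync effectively issues. The table and `updated_at` cursor field are hypothetical; a full refresh would read every row each time instead.

```python
# Conceptual sketch only, not source-bigquery's code: an incremental sync
# remembers a cursor value in its state and reads only rows beyond it.
state = {"cursor": "2022-07-01T00:00:00"}  # hypothetical state saved by the last sync

def incremental_query(table: str, cursor_field: str, state: dict) -> str:
    """Build a query that fetches only rows newer than the saved cursor."""
    return (
        f"SELECT * FROM `{table}` "
        f"WHERE {cursor_field} > '{state['cursor']}' "
        f"ORDER BY {cursor_field}"
    )

print(incremental_query("my-project.my_dataset.users", "updated_at", state))
```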
## Resulting schema
The BigQuery source does not alter the schema present in your database. Depending on the destination connected to this source, however, the schema may be altered. See the destination's documentation for more details.
## Data type mapping

BigQuery data types map to Airbyte types as follows:

| BigQuery Type | Resulting Type | Notes |
|---|---|---|
| BOOL | Boolean | |
| INT64 | Number | |
| FLOAT64 | Number | |
| NUMERIC | Number | |
| BIGNUMERIC | Number | |
| STRING | String | |
| BYTES | String | |
| DATE | String | In ISO8601 format |
| DATETIME | String | In ISO8601 format |
| TIMESTAMP | String | In ISO8601 format |
| TIME | String | |
| ARRAY | Array | |
| STRUCT | Object | |
| GEOGRAPHY | String | |
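As an illustration of the table above (this is not the connector's code, and the field names are made up), the sketch below shows how a row containing several of these types could serialize into a record, with date and time values rendered as ISO8601 strings:

```python
# Hypothetical illustration of the mapping table above, not connector code.
import datetime
import json

row = {
    "is_active": True,                                    # BOOL     -> Boolean
    "user_id": 42,                                        # INT64    -> Number
    "score": 3.14,                                        # FLOAT64  -> Number
    "name": "Octavia",                                    # STRING   -> String
    "signup_date": datetime.date(2022, 7, 26),            # DATE     -> String (ISO8601)
    "created_at": datetime.datetime(2022, 7, 26, 12, 0),  # DATETIME -> String (ISO8601)
}

def to_airbyte_value(value):
    # Temporal values become ISO8601 strings; other scalars pass through.
    if isinstance(value, (datetime.date, datetime.datetime)):
        return value.isoformat()
    return value

print(json.dumps({k: to_airbyte_value(v) for k, v in row.items()}, indent=2))
```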
## Features
| Feature | Supported | Notes |
|---|---|---|
| Full Refresh Sync | Yes | |
| Incremental Sync | Yes | |
| Change Data Capture | No | |
| SSL Support | Yes | |
## Getting started

### Requirements
To use the BigQuery source, you'll need:
- A Google Cloud Project with BigQuery enabled
- A Google Cloud Service Account with the "BigQuery User" and "BigQuery Data Editor" roles in your GCP project
- A Service Account Key to authenticate into your Service Account
See the setup guide for more information about how to create the required resources.
### Service account
In order for Airbyte to sync data from BigQuery, it needs credentials for a Service Account with the "BigQuery User" and "BigQuery Data Editor" roles, which grant permissions to run BigQuery jobs, write to BigQuery Datasets, and read table metadata. We highly recommend that this Service Account be exclusive to Airbyte for ease of permissioning and auditing. However, you can use a pre-existing Service Account if you already have one with the correct permissions.
The easiest way to create a Service Account is to follow GCP's guide for Creating a Service Account. Once you've created the Service Account, make sure to keep its ID handy, as you will need to reference it when granting roles. Service Account IDs typically take the form `<account-name>@<project-name>.iam.gserviceaccount.com`.
Then, add the service account as a Member in your Google Cloud Project with the "BigQuery User" role. To do this, follow the instructions for Granting Access in the Google documentation. The email address of the member you are adding is the same as the Service Account ID you just created.
At this point you should have a service account with the "BigQuery User" project-level permission.
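If you want to verify the roles took effect, here is a small optional check, not part of Airbyte itself. It assumes the `google-cloud-bigquery` package is installed and that you are authenticated as the Service Account (for example, by pointing the `GOOGLE_APPLICATION_CREDENTIALS` environment variable at its key, created in the next step). A dry-run query exercises the `bigquery.jobs.create` permission that the "BigQuery User" role grants, without running anything or incurring cost.

```python
# Optional sanity check, not required by Airbyte. Assumes google-cloud-bigquery
# is installed and default credentials resolve to the Service Account.
from google.cloud import bigquery

client = bigquery.Client()

# A dry run validates the query and the job-creation permission without executing.
job_config = bigquery.QueryJobConfig(dry_run=True)
client.query("SELECT 1", job_config=job_config)
print("Dry run succeeded; the account can create BigQuery jobs.")
```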
### Service account key
Service Account Keys are used to authenticate as Google Service Accounts. For Airbyte to leverage the permissions you granted to the Service Account in the previous step, you'll need to provide its Service Account Keys. See the Google documentation for more information about Keys.
Follow the Creating and Managing Service Account Keys guide to create a key. Airbyte currently supports JSON Keys only, so make sure you create your key in that format. As soon as you've created the key, make sure to download it, as that is the only time Google will allow you to see its contents. Once you've successfully configured BigQuery as a source in Airbyte, delete this key from your computer.
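As a quick, purely illustrative sanity check (the file name below is a hypothetical placeholder), you can confirm the downloaded file is a JSON-format key carrying the fields a Service Account Key always has:

```python
# Illustrative check that the downloaded file is a JSON Service Account Key.
import json

with open("service-account-key.json") as f:  # hypothetical path to your key
    key = json.load(f)

# JSON service account keys always carry these fields.
required = {"type", "project_id", "private_key", "client_email"}
missing = required - key.keys()
assert key.get("type") == "service_account", "not a service account key"
assert not missing, f"key file is missing fields: {missing}"
print(f"JSON key for {key['client_email']} in project {key['project_id']} looks valid.")
```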
### Set up the BigQuery source in Airbyte
You should now have all the requirements needed to configure BigQuery as a source in the UI. You'll need the following information to configure the BigQuery source:
- Project ID
- Default Dataset ID [Optional]: the dataset (schema) name, if you are only interested in a single dataset. Providing it dramatically speeds up the source's discover operation.
- Credentials JSON: the contents of your Service Account Key JSON file
Once you've configured BigQuery as a source, delete the Service Account Key from your computer.
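Before entering these values, you can optionally run a pre-flight check with the same key. This is a sketch under stated assumptions (the `google-cloud-bigquery` and `google-auth` packages are installed; the key path is a placeholder) that confirms the credentials can reach your project and list its datasets:

```python
# Optional pre-flight check before configuring the source in Airbyte's UI.
from google.cloud import bigquery
from google.oauth2 import service_account

creds = service_account.Credentials.from_service_account_file("service-account-key.json")
client = bigquery.Client(project=creds.project_id, credentials=creds)

for dataset in client.list_datasets():  # requires the roles granted above
    print(dataset.dataset_id)
```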
## CHANGELOG

### source-bigquery
| Version | Date | Pull Request | Subject |
|---|---|---|---|
| 0.2.0 | 2022-07-26 | 14362 | Integral columns are now discovered as int64 fields. |
| 0.1.9 | 2022-07-14 | 14574 | Removed additionalProperties:false from JDBC source connectors |
| 0.1.8 | 2022-06-17 | 13864 | Updated stacktrace format for any trace message errors |
| 0.1.7 | 2022-04-11 | 11484 | BigQuery connector now escapes column names |
| 0.1.6 | 2022-02-14 | 10256 | Add -XX:+ExitOnOutOfMemoryError JVM option |
| 0.1.5 | 2021-12-23 | 8434 | Update fields in source-connectors specifications |
| 0.1.4 | 2021-09-30 | 6524 | Allow dataset_id to be null in spec |
| 0.1.3 | 2021-09-16 | 6051 | Handle NPE when dataset_id is not provided |
| 0.1.2 | 2021-09-16 | 6135 | 🐛 BigQuery source: Fix nested structs |
| 0.1.1 | 2021-07-28 | 4981 | 🐛 BigQuery source: Fix nested arrays |
| 0.1.0 | 2021-07-22 | 4457 | 🎉 New Source: BigQuery |