1
0
mirror of synced 2025-12-20 10:32:35 -05:00
Files
airbyte/docs/integrations/sources/google-sheets.md
devin-ai-integration[bot] 9fa1f55620 feat(source-google-sheets): Add optional flag to read columns with empty headers (#69848)
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Alfredo Garcia <freddy.garcia7.fg@gmail.com>
Co-authored-by: Alfredo Garcia <alfredo.garcia@hallmark.edu>
2025-12-03 10:21:33 -06:00

39 KiB

Google Sheets

This page contains the setup guide and reference information for the Google Sheets source connector.

:::info The Google Sheets source connector pulls data from a single Google Sheets spreadsheet. Each sheet within a spreadsheet can be synced. To sync multiple spreadsheets, use our Google Drive connector or set up multiple Google Sheets source connectors in your Airbyte instance. No other files in your Google Drive are accessed. :::

Prerequisites

  • Spreadsheet Link - The link to the Google spreadsheet you want to sync.
  • For Airbyte Cloud A Google Workspace user with access to the spreadsheet
  • For Airbyte Open Source:
  • A GCP project
  • Enable the Google Sheets API in your GCP project
  • Service Account Key with access to the Spreadsheet you want to replicate

Setup guide

The Google Sheets source connector supports authentication via either OAuth or Service Account Key Authentication.

Step 1: Set up Google Sheets

For Airbyte Cloud:

We highly recommend using OAuth, as it significantly simplifies the setup process and allows you to authenticate directly from the Airbyte UI.

For Airbyte Open Source:

We recommend using Service Account Key Authentication. Follow the steps below to create a service account, generate a key, and enable the Google Sheets API.

:::note If you prefer to use OAuth for authentication with Airbyte Open Source, you can follow Google's OAuth instructions to create an authentication app. Be sure to set the scopes to https://www.googleapis.com/auth/spreadsheets.readonly. You will need to obtain your client ID, client secret, and refresh token for the connector setup. :::

Set up the service account key

Create a service account

  1. Open the Service Accounts page in your Google Cloud console.
  2. Select an existing project, or create a new project.
  3. At the top of the page, click + Create service account.
  4. Enter a name and description for the service account, then click Create and Continue.
  5. Under Service account permissions, select the roles to grant to the service account, then click Continue. We recommend the Viewer role.

Generate a key

  1. Go to the API Console/Credentials page and click on the email address of the service account you just created.
  2. In the Keys tab, click + Add key, then click Create new key.
  3. Select JSON as the Key type. This will generate and download the JSON key file that you'll use for authentication. Click Continue.

Enable the Google Sheets API

  1. Go to the API Console/Library page.
  2. Make sure you have selected the correct project from the top.
  3. Find and select the Google Sheets API.
  4. Click ENABLE.

If your spreadsheet is viewable by anyone with its link, no further action is needed. If not, give your Service account access to your spreadsheet.

Set up the Google Sheets connector in Airbyte

For Airbyte Cloud:

  1. Log into your Airbyte Cloud account.
  2. Click Sources and then click + New source.
  3. On the Set up the source page, select Google Sheets from the Source type dropdown.
  4. Enter a name for the Google Sheets connector.

For Airbyte Open Source:

  1. Navigate to the Airbyte Open Source dashboard.
  2. Click Sources and then click + New source.
  3. On the Set up the source page, select Google Sheets from the Source type dropdown.
  4. Enter a name for the Google Sheets connector.
  1. Select your authentication method:
  • For Airbyte Cloud: (Recommended) Select Authenticate via Google (OAuth) from the Authentication dropdown, click Sign in with Google and complete the authentication workflow.
  • For Airbyte Open Source: (Recommended) Select Service Account Key Authentication from the dropdown and enter your Google Cloud service account key in JSON format:
  {
    "type": "service_account",
    "project_id": "YOUR_PROJECT_ID",
    "private_key_id": "YOUR_PRIVATE_KEY",
    ...
  }
  • To authenticate your Google account via OAuth, select Authenticate via Google (OAuth) from the dropdown and enter your Google application's client ID, client secret, and refresh token.
6. For **Spreadsheet Link**, enter the link to the Google spreadsheet. To get the link, go to the Google spreadsheet you want to sync, click **Share** in the top right corner, and click **Copy Link**. 7. For **Batch Size**, enter an integer which represents batch size when processing a Google Sheet. Default value is 1000000. Batch size is an integer representing row batch size for each sent request to Google Sheets API. Row batch size means how many rows are processed from the google sheet, for example default value 1000000 would process rows 2-1000002, then 1000003-2000003 and so on. Based on [Google Sheets API limits documentation](https://developers.google.com/sheets/api/limits), it is possible to send up to 300 requests per minute, but each individual request has to be processed under 180 seconds, otherwise the request returns a timeout error. In regards to this information, consider network speed and number of columns of the google sheet when deciding a batch_size value. 8. (Optional) You may enable the option to **Convert Column Names to SQL-Compliant Format**. Enabling this option will allow the connector to convert column names to a standardized, SQL-friendly format. For example, a column name of `Café Earnings 2022` will be converted to `cafe_earnings_2022`. We recommend enabling this option if your target destination is SQL-based (ie Postgres, MySQL). Set to false by default. 9. Click **Set up source** and wait for the tests to complete.

Configuration Options

Stream Name Overrides (Rename Sheet/Stream Names)

The Google Sheets connector allows you to optionally rename streams (sheet/tab names) as they appear in Airbyte and your destination. This is useful if your sheet names are not descriptive, contain special characters, or you want to standardize naming across sources.

How it works

  • You can provide a list of overrides, each specifying a source_stream_name (the exact name of the sheet/tab in your spreadsheet) and a custom_stream_name (the name you want it to appear as in Airbyte and your destination).
  • If a source_stream_name is not found in your spreadsheet, it will be ignored and the default name will be used.
  • This feature only affects stream (sheet/tab) names, not field/column names.
  • If you want to rename fields or column names, you can do so using the Airbyte Mappings feature after your connection is created. See the Airbyte documentation for more details on how to use Mappings.
  • Renaming occurs before any other name conversion or sanitization options.

Example

Suppose your spreadsheet has sheets named Sheet1, 2024 Q1, and Summary. You want to rename them to sales_data, q1_2024, and leave Summary unchanged. You would configure:

[
  { "source_stream_name": "Sheet1", "custom_stream_name": "sales_data" },
  { "source_stream_name": "2024 Q1", "custom_stream_name": "q1_2024" }
]

After discovery, your streams in Airbyte will be named sales_data, q1_2024, and Summary.

How to configure

  • In the Airbyte UI, add your overrides in the Stream Name Overrides field as an array of objects.
  • If you do not wish to rename any streams, leave this field blank.
  • After adding or changing a stream name override, refresh your schema in Airbyte to see the new stream names take effect.
  • Overridden streams will default to Sync Mode: Full Refresh (Append), which does not support primary keys. If you want to use primary keys and deduplication, update the sync mode to "Full Refresh | Overwrite + Deduped" in your connection settings.

Google Sheets Connector Column Name Conversion

The Google Sheets connector offers options to customize how column names from your spreadsheet are converted to be SQL-compliant. These settings can be configured in the Airbyte UI when setting up the connector.


1. Convert Column Names to SQL-Compliant Format

  • Description: When enabled, this converts column names to a format compatible with SQL databases (e.g., lowercasing, replacing spaces with underscores). This is the primary toggle required to enable any column name conversion.
  • Default: Off

2. Additional Conversion Options

The following options allow you to fine-tune the column name conversion process. They only take effect if "Convert Column Names to SQL-Compliant Format" is enabled.

  • Remove Leading and Trailing Underscores

    • Description: Removes leading and trailing underscores from column names. Note that leading underscores are preserved for column names starting with a number if "Allow Leading Numbers" is disabled.
    • Example:
      • Input: " EXAMPLE Domain "
      • Output: "example_domain"
    • Default: Off
  • Combine Number-Word Pairs

    • Description: Combines adjacent numbers and words into a single token without separators.
    • Example:
      • Input: "50th Percentile"
      • Output: "50th_percentile" (if "Allow Leading Numbers" is enabled)
      • Output: "_50th_percentile" (if "Allow Leading Numbers" is disabled)
    • Default: Off
  • Remove All Special Characters

    • Description: Removes all special characters (e.g., *, ?, !, $, %, (, )) from column names.
    • Example:
      • Input: "Example ID*"
      • Output: "example_id"
    • Default: Off
  • Combine Letter-Number Pairs

    • Description: Combines adjacent letters and numbers into a single token without separators.
    • Example:
      • Input: "Q3 2023"
      • Output: "q3_2023"
    • Default: Off
  • Allow Leading Numbers

    • Description: Allows column names to start with numbers. If disabled, a leading underscore is added to column names that begin with a number.
    • Example:
      • Input: "50th Percentile"
      • Output: "50_th_percentile" (if enabled)
      • Output: "_50_th_percentile" (if disabled)
    • Default: Off

Additional Details

  • All converted column names are lowercased.
  • Multiple spaces or special characters are collapsed or removed, not replaced with multiple underscores.
  • Only single underscores are used to separate tokens.
  • The result is always SQL-friendly and readable.

These options provide flexibility to tailor column name conversions to your specific database requirements. Adjust them as needed in the Airbyte UI when configuring the Google Sheets connector.


Header Deduplication

The Google Sheets connector automatically handles duplicate column headers by appending the cell position to create unique field names. This ensures that all columns are properly synced even when your spreadsheet contains duplicate header names.

How it works

  • When duplicate headers are detected, the connector appends _<cell_position> to each duplicated header name
  • The cell position follows the standard Google Sheet naming convention (e.g., A1, B1, C1, etc.)
  • This creates unique field names while preserving the original header text

Example

If your spreadsheet has a header named stats in both columns C and Q (positions C1 and Q1), the connector will create two distinct fields:

  • stats_C1 (for the column at position C1)
  • stats_Q1 (for the column at position Q1)

This ensures that data from both columns is properly captured and synced to your destination, with clear identification of which column each field represents.


Supported sync modes

The Google Sheets source connector supports the following sync modes:

Supported Streams

Each sheet in the selected spreadsheet is synced as a separate stream. Each selected column in the sheet is synced as a string field.

Airbyte only supports replicating Grid sheets.

Data type map

Each sheet in the selected spreadsheet is synced as a separate stream. Each selected column in the sheet is synced as a string field.

Airbyte only supports replicating Grid sheets.

Integration Type Airbyte Type Notes
any type string

Limitations & Troubleshooting

Expand to see details about Google Sheets connector limitations and troubleshooting.

Connector limitations

Rate limiting

The Google API rate limits are:

  • 300 read requests per minute per project
  • 60 requests per minute per user per project

Airbyte batches requests to the API in order to efficiently pull data and respect these rate limits. We recommend not using the same user or service account for more than 3 instances of the Google Sheets source connector to ensure high transfer speeds.

Troubleshooting

  • If your sheet is completely empty (no header rows) or deleted, Airbyte will not delete the table in the destination. If this happens, the sync logs will contain a message saying the sheet has been skipped when syncing the full spreadsheet.
  • Connector setup will fail if the spreadsheet is not a Google Sheets file. If the file was saved or imported as another file type the setup could fail.
  • Check out common troubleshooting issues for the Google Sheets source connector on our Airbyte Forum.

Changelog

Expand to review
Version Date Pull Request Subject
0.12.15 2025-12-03 69848 Add optional flag to read columns with empty headers
0.12.14 2025-11-25 70029 Update dependencies
0.12.13 2025-11-18 69396 Update dependencies
0.12.12 2025-10-29 68759 Update dependencies
0.12.11 2025-10-21 68254 Update dependencies
0.12.10 2025-10-16 67531 Add error handling for unexpected data in sheets causing 500 responses.
0.12.9 2025-10-14 67876 Update dependencies
0.12.8 2025-10-07 67395 Update dependencies
0.12.7 2025-09-30 65383 Update dependencies
0.12.6 2025-09-10 66012 Update to CDK v7
0.12.5 2025-08-09 64633 Update dependencies
0.12.4 2025-08-02 64179 Update dependencies
0.12.3 2025-07-26 63822 Update dependencies
0.12.2 2025-07-22 63334 Feature: Deduplicate Headers
0.12.1 2025-07-19 55490 Update dependencies
0.12.0 2025-07-15 63305 Promoting release candidate 0.12.0-rc.2 to a main version.
0.12.0-rc.2 2025-07-11 62931 Fix: handle empty propeties_to_match in SchmemaMatchingExtractor
0.12.0-rc.1 2025-07-02 62456 Feature: migrate connector to manifest-only format
0.11.0 2025-06-11 61489 Feature: Added Streeam Name Override Options
0.10.0 2025-06-09 60836 Feature: Added additional sanitization flags when using Convert Column Names to SQL-Compliant Format (names_conversion)
0.9.6 2025-05-22 60874 Use custom backoff policy on 429 errors for single sheets
0.9.5 2025-05-13 60259 Fix whitespaces used for column names when enabling names_conversion
0.9.4 2025-03-01 54989 Update dependencies
0.9.3 2025-02-22 54434 Update dependencies
0.9.2 2025-02-15 53720 Update dependencies
0.9.1 2025-02-08 51696 Update dependencies
0.9.0 2025-02-04 53154 Promoting release candidate 0.9.0-rc.3 to a main version.
0.9.0-rc.3 2025-01-31 52682 Fix stream name typing
0.9.0-rc.2 2025-01-31 52671 Fix sheet id encoding
0.9.0-rc.1 2025-01-30 50843 Migrate to low-code
0.8.5 2025-01-11 44270 Starting with this version, the Docker image is now rootless. Please note that this and future versions will not be compatible with Airbyte versions earlier than 0.64
0.8.4 2024-12-09 48835 Implementing integration tests
0.7.4 2024-09-09 45108 Google Sheets API errors now cause syncs to fail
0.7.3 2024-08-12 43921 Update dependencies
0.7.2 2024-08-10 43544 Update dependencies
0.7.1 2024-08-03 43290 Update dependencies
0.7.0 2024-08-02 42975 Migrate to CDK v4.3.0
0.6.3 2024-07-27 42826 Update dependencies
0.6.2 2024-07-22 41993 Avoid syncs with rate limits being considered successful
0.6.1 2024-07-20 42376 Update dependencies
0.6.0 2024-07-17 42071 Migrate to CDK v3.9.0
0.5.11 2024-07-13 41527 Update dependencies
0.5.10 2024-07-09 41273 Update dependencies
0.5.9 2024-07-06 41005 Update dependencies
0.5.8 2024-06-28 40587 Replaced deprecated AirbyteLogger with logging.Logger
0.5.7 2024-06-25 40560 Catch an auth error during discover and raise a config error
0.5.6 2024-06-26 40533 Update dependencies
0.5.5 2024-06-25 40505 Update dependencies
0.5.4 2024-06-22 40129 Update dependencies
0.5.3 2024-06-06 39225 [autopull] Upgrade base image to v1.2.2
0.5.2 2024-06-02 38851 Emit state message at least once per stream
0.5.1 2024-04-11 35404 Add row_batch_size parameter more granular control read records
0.5.0 2024-03-26 36515 Resolve poetry dependency conflict, add record counts to state messages
0.4.0 2024-03-19 36267 Pin airbyte-cdk version to ^0
0.3.17 2024-02-29 35722 Add logic to emit stream statuses
0.3.16 2024-02-12 35136 Fix license in pyproject.toml.
0.3.15 2024-02-07 34944 Manage dependencies with Poetry.
0.3.14 2024-01-23 34437 Fix header cells filtering
0.3.13 2024-01-19 34376 Fix names conversion
0.3.12 2023-12-14 33414 Prepare for airbyte-lib
0.3.11 2023-10-19 31599 Base image migration: remove Dockerfile and use the python-connector-base image
0.3.10 2023-09-27 30487 Fix bug causing rows to be skipped when batch size increased due to rate limits.
0.3.9 2023-09-25 30749 Performance testing - include socat binary in docker image
0.3.8 2023-09-25 30747 Performance testing - include socat binary in docker image
0.3.7 2023-08-25 29826 Remove row batch size from spec, add auto increase this value when rate limits
0.3.6 2023-08-16 29491 Update to latest CDK
0.3.5 2023-08-16 29427 Add stop reading in case of 429 error
0.3.4 2023-05-15 29453 Update spec descriptions
0.3.3 2023-08-10 29327 Add user-friendly error message for 404 and 403 error while discover
0.3.2 2023-08-09 29246 Add checking while reading to skip modified sheets
0.3.1 2023-07-06 28033 Fixed several reported vulnerabilities (25 total), CVE-2022-37434, CVE-2022-42898
0.3.0 2023-06-26 27738 License Update: Elv2
0.2.39 2023-05-31 26833 Remove authSpecification in favour of advancedAuth in specification
0.2.38 2023-05-16 26097 Refactor config error
0.2.37 2023-02-21 23292 Skip non grid sheets.
0.2.36 2023-02-21 23272 Handle empty sheets gracefully.
0.2.35 2023-02-23 23057 Slugify column names
0.2.34 2023-02-15 23071 Change min spreadsheet id size to 20 symbols
0.2.33 2023-02-13 23278 Handle authentication errors
0.2.32 2023-02-13 22884 Do not consume http spreadsheets.
0.2.31 2022-10-09 19574 Revert 'Add row_id to rows and use as primary key'
0.2.30 2022-10-09 19215 Add row_id to rows and use as primary key
0.2.21 2022-10-04 15591 Clean instantiation of AirbyteStream
0.2.20 2022-10-10 17766 Fix null pointer exception when parsing the spreadsheet id.
0.2.19 2022-09-29 17410 Use latest CDK.
0.2.18 2022-09-28 17326 Migrate to per-stream states.
0.2.17 2022-08-03 15107 Expose Row Batch Size in Connector Specification
0.2.16 2022-07-07 13729 Improve configuration field description
0.2.15 2022-06-02 13446 Retry requests resulting in a server error
0.2.13 2022-05-06 12685 Update CDK to v0.1.56 to emit an AirbyeTraceMessage on uncaught exceptions
0.2.12 2022-04-20 12230 Update connector to use a spec.yaml
0.2.11 2022-04-13 11977 Replace leftover print statement with airbyte logger
0.2.10 2022-03-25 11404 Allow using Spreadsheet Link/URL instead of Spreadsheet ID
0.2.9 2022-01-25 9208 Update title and descriptions
0.2.7 2021-09-27 8470 Migrate to the CDK
0.2.6 2021-09-27 6354 Support connecting via Oauth webflow
0.2.5 2021-09-12 5972 Fix full_refresh test by adding supported_sync_modes to Stream initialization
0.2.4 2021-08-05 5233 Fix error during listing sheets with diagram only
0.2.3 2021-06-09 3973 Add AIRBYTE_ENTRYPOINT for Kubernetes support
0.2.2 2021-04-20 2994 Formatting spec
0.2.1 2021-04-03 2726 Fix base connector versioning
0.2.0 2021-03-09 2238 Protocol allows future/unknown properties
0.1.7 2021-01-21 1762 Fix issue large spreadsheet
0.1.6 2021-01-27 1668 Adopt connector best practices
0.1.5 2020-12-30 1438 Implement backoff
0.1.4 2020-11-30 1046 Add connectors using an index YAML file