# Snowflake

## Overview

The Snowflake source allows you to sync data from Snowflake. It supports both Full Refresh and Incremental syncs. You can choose whether this connector copies only the new or updated data, or all rows in the tables and columns you set up for replication, every time a sync runs.

This Snowflake source connector is built on top of the source-jdbc code base and relies on version 3.14.1 of the Snowflake JDBC driver, as described in the Snowflake documentation.

### Resulting schema

The Snowflake source does not alter the schema present in your warehouse. Depending on the destination connected to this source, however, the result schema may be altered. See the destination's documentation for more details.

### Features

| Feature | Supported? (Yes/No) | Notes |
| :--- | :--- | :--- |
| Full Refresh Sync | Yes | |
| Incremental - Append Sync | Yes | |
| Namespaces | Yes | |

## Getting started

### Requirements

1. You'll need the following information to configure the Snowflake source:
   - Host
   - Role
   - Warehouse
   - Database
   - Schema
   - Username
   - Password
   - JDBC URL Params (Optional)
2. Create a dedicated read-only Airbyte user and role with access to all schemas needed for replication.

### Setup guide

Additional information about Snowflake connection parameters can be found in the Snowflake documentation.

Creating a dedicated user is optional but highly recommended, as it allows for better permission control and auditing. Alternatively, you can use Airbyte with an existing user in your database.

To create a dedicated database user, run the following commands against your database:

```sql
-- set variables (these need to be uppercase)
SET AIRBYTE_ROLE = 'AIRBYTE_ROLE';
SET AIRBYTE_USERNAME = 'AIRBYTE_USER';
SET AIRBYTE_WAREHOUSE = 'AIRBYTE_WAREHOUSE';
SET AIRBYTE_SCHEMA = 'AIRBYTE_SCHEMA';

-- set user password
SET AIRBYTE_PASSWORD = '-password-';

BEGIN;

-- create Airbyte role
CREATE ROLE IF NOT EXISTS identifier($AIRBYTE_ROLE);

-- create Airbyte user
CREATE USER IF NOT EXISTS identifier($AIRBYTE_USERNAME)
PASSWORD = $AIRBYTE_PASSWORD
DEFAULT_ROLE = $AIRBYTE_ROLE
DEFAULT_WAREHOUSE = $AIRBYTE_WAREHOUSE;

-- grant Airbyte schema access
GRANT OWNERSHIP ON SCHEMA identifier($AIRBYTE_SCHEMA) TO ROLE identifier($AIRBYTE_ROLE);

COMMIT;
```

You can limit this grant to specific schemas instead of the whole database. Note that to replicate data from multiple Snowflake databases, you can re-run the commands above to grant access to all the relevant schemas, but you'll need to set up multiple sources connecting to the same database on different schemas.
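If you prefer a strictly read-only setup rather than transferring schema ownership, one possible alternative is to grant usage and select privileges to the role. This is a sketch, not part of the official setup script; the object names (`AIRBYTE_WAREHOUSE`, `AIRBYTE_DATABASE`, `AIRBYTE_SCHEMA`, `AIRBYTE_ROLE`) are placeholders for your own objects:

```sql
-- Placeholder object names; substitute your own warehouse, database, schema, and role.
GRANT USAGE ON WAREHOUSE AIRBYTE_WAREHOUSE TO ROLE AIRBYTE_ROLE;
GRANT USAGE ON DATABASE AIRBYTE_DATABASE TO ROLE AIRBYTE_ROLE;
GRANT USAGE ON SCHEMA AIRBYTE_DATABASE.AIRBYTE_SCHEMA TO ROLE AIRBYTE_ROLE;

-- Allow reading all current tables, plus tables created later in the schema.
GRANT SELECT ON ALL TABLES IN SCHEMA AIRBYTE_DATABASE.AIRBYTE_SCHEMA TO ROLE AIRBYTE_ROLE;
GRANT SELECT ON FUTURE TABLES IN SCHEMA AIRBYTE_DATABASE.AIRBYTE_SCHEMA TO ROLE AIRBYTE_ROLE;
```

The `FUTURE TABLES` grant saves you from re-running the grants every time a new table is added to the schema.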

Your database user should now be ready for use with Airbyte.

## Authentication

### Login and Password

| Field | Description |
| :--- | :--- |
| Host | The host domain of the Snowflake instance (must include the account, region, cloud environment, and end with `snowflakecomputing.com`). Example: `accountname.us-east-2.aws.snowflakecomputing.com` |
| Role | The role you created in Step 1 for Airbyte to access Snowflake. Example: `AIRBYTE_ROLE` |
| Warehouse | The warehouse you created in Step 1 for Airbyte to use when syncing data. Example: `AIRBYTE_WAREHOUSE` |
| Database | The database you created in Step 1 containing the data to replicate. Example: `AIRBYTE_DATABASE` |
| Schema | The schema whose tables this replication is targeting. If no schema is specified, all tables with permission will be presented regardless of their schema. |
| Username | The username you created in Step 2 to allow Airbyte to access the database. Example: `AIRBYTE_USER` |
| Password | The password associated with the username. |
| JDBC URL Params (Optional) | Additional properties to pass to the JDBC URL string when connecting to the database, formatted as key=value pairs separated by `&`. Example: `key1=value1&key2=value2&key3=value3` |

### Key pair authentication

In order to configure key pair authentication, you will need a private/public key pair. If you do not have the key pair yet, you can generate one using the `openssl` command line tool. Use this command to generate an unencrypted private key file:

```bash
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt
```

Alternatively, use this command to generate an encrypted private key file:

```bash
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -v1 PBE-SHA1-RC4-128 -out rsa_key.p8
```

Once you have your private key, you need to generate a matching public key. You can do so with the following command:

```bash
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
```

Finally, you need to add the public key to your Snowflake user account. You can do so with the following SQL command in Snowflake:

```sql
ALTER USER <user_name> SET RSA_PUBLIC_KEY = '<public_key_value>';
```

Replace `<user_name>` with your user name and `<public_key_value>` with your public key.
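To confirm that the key was registered, you can inspect the user's properties; the `RSA_PUBLIC_KEY_FP` property should show the fingerprint of the key you just set. (`AIRBYTE_USER` below is a placeholder for your own user name.)

```sql
-- Lists the user's properties; RSA_PUBLIC_KEY_FP holds the fingerprint
-- of the registered public key (empty if no key is set).
DESC USER AIRBYTE_USER;
```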

## Network policies

By default, Snowflake allows users to connect to the service from any computer or device IP address. A security administrator (i.e., a user with the SECURITYADMIN role or higher) can create a network policy to allow or deny access to a single IP address or a list of addresses.

If you have any issues connecting with Airbyte Cloud, please make sure that Airbyte's IP addresses are on the allowed list.

To determine whether a network policy is set on your account or for a specific user, execute the SHOW PARAMETERS command.

### Account

```sql
SHOW PARAMETERS LIKE 'network_policy' IN ACCOUNT;
```

### User

```sql
SHOW PARAMETERS LIKE 'network_policy' IN USER <username>;
```

To read more, please check the official Snowflake documentation.
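As a sketch of what such a policy might look like, the following creates a network policy and assigns it to a user. The policy name, user name, and IP range are placeholders; run these as a user with the SECURITYADMIN role or higher, and substitute the IP addresses Airbyte connects from:

```sql
-- Placeholder policy name and CIDR range; adjust to your environment.
CREATE NETWORK POLICY airbyte_policy ALLOWED_IP_LIST = ('192.0.2.0/24');

-- Apply the policy to the Airbyte user only, leaving other users unaffected.
ALTER USER airbyte_user SET NETWORK_POLICY = 'airbyte_policy';
```

Applying the policy at the user level, rather than the account level, limits the blast radius if the IP list is ever misconfigured.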

## Changelog

| Version | Date | Pull Request | Subject |
| :--- | :--- | :--- | :--- |
| 0.3.3 | 2024-06-28 | 40424 | Support Snowflake key pair authentication |
| 0.3.2 | 2024-02-13 | 38317 | Hide OAuth option from connector |
| 0.3.1 | 2024-02-13 | 35220 | Adopt CDK 0.20.4 |
| 0.3.1 | 2024-01-24 | 34453 | Bump CDK version |
| 0.3.0 | 2023-12-18 | 33484 | Remove LEGACY state |
| 0.2.2 | 2023-10-20 | 31613 | Fixed handling of TIMESTAMP_TZ columns |
| 0.2.1 | 2023-10-11 | 31252 | Snowflake JDBC version upgrade |
| 0.2.0 | 2023-06-26 | 27737 | License update: Elv2 |
| 0.1.36 | 2023-06-20 | 27212 | Fix silent exception swallowing in StreamingJdbcDatabase |
| 0.1.35 | 2023-06-14 | 27335 | Remove noisy debug logs |
| 0.1.34 | 2023-03-30 | 24693 | Fix failure with TIMESTAMP_WITH_TIMEZONE column being used as cursor |
| 0.1.33 | 2023-03-29 | 24667 | Fix bug which won't allow TIMESTAMP_WITH_TIMEZONE column to be used as a cursor |
| 0.1.32 | 2023-03-22 | 20760 | Removed redundant date-time datatypes formatting |
| 0.1.31 | 2023-03-06 | 23455 | For network isolation, source connector accepts a list of hosts it is allowed to connect to |
| 0.1.30 | 2023-02-21 | 22358 | Improved handling of big integer cursor type values |
| 0.1.29 | 2022-12-14 | 20436 | Consolidate date/time values mapping for JDBC sources |
| 0.1.28 | 2023-01-06 | 20465 | Improve the schema config field to only discover tables from the specified schema and make the field optional |
| 0.1.27 | 2022-12-14 | 20407 | Fix an issue with integer values converted to floats during replication |
| 0.1.26 | 2022-11-10 | 19314 | Set application id in JDBC URL params based on OSS/Cloud environment |
| 0.1.25 | 2022-11-10 | 15535 | Update incremental query to avoid data missing when new data is inserted at the same time as a sync starts under non-CDC incremental mode |
| 0.1.24 | 2022-09-26 | 17144 | Fixed bug with incorrect date-time datatypes handling |
| 0.1.23 | 2022-09-26 | 17116 | Added connection string identifier |
| 0.1.22 | 2022-09-21 | 16766 | Update JDBC driver version to 3.13.22 |
| 0.1.21 | 2022-09-14 | 15668 | Wrap logs in AirbyteLogMessage |
| 0.1.20 | 2022-09-01 | 16258 | Emit state messages more frequently |
| 0.1.19 | 2022-08-19 | 15797 | Allow using role during OAuth |
| 0.1.18 | 2022-08-18 | 14356 | DB sources: only show a table can sync incrementally if at least one column can be used as a cursor field |
| 0.1.17 | 2022-08-09 | 15314 | Discover integer columns as integers rather than floats |
| 0.1.16 | 2022-08-04 | 15314 | (broken, do not use) Discover integer columns as integers rather than floats |
| 0.1.15 | 2022-07-22 | 14828 | Source Snowflake: source/destination doesn't respect DATE data type |
| 0.1.14 | 2022-07-22 | 14714 | Clarified error message when invalid cursor column selected |
| 0.1.13 | 2022-07-14 | 14574 | Removed additionalProperties:false from JDBC source connectors |
| 0.1.12 | 2022-04-29 | 12480 | Query tables with adaptive fetch size to optimize JDBC memory consumption |
| 0.1.11 | 2022-04-27 | 10953 | Implement OAuth flow |
| 0.1.9 | 2022-02-21 | 10242 | Fixed cursor for old connectors that use non-microsecond format; now connectors work with both formats |
| 0.1.8 | 2022-02-18 | 10242 | Updated timestamp transformation with microseconds |
| 0.1.7 | 2022-02-14 | 10256 | Add `-XX:+ExitOnOutOfMemoryError` JVM option |
| 0.1.6 | 2022-01-25 | 9623 | Add jdbc_url_params support for optional JDBC parameters |
| 0.1.5 | 2022-01-19 | 9567 | Added parameter for keeping JDBC session alive |
| 0.1.4 | 2021-12-30 | 9203 | Update connector fields title/description |
| 0.1.3 | 2022-01-11 | 9304 | Upgrade version of JDBC driver |
| 0.1.2 | 2021-10-21 | 7257 | Fixed parsing of extreme values for FLOAT and NUMBER data types |
| 0.1.1 | 2021-08-13 | 4699 | Added JSON config validator |