mirror of synced 2025-12-20 10:32:35 -05:00
airbyte/docs/integrations/sources/github.md
octavia-bot-hoard[bot] 85e9e9bedf 🐙 source-github: run up-to-date pipeline [2025-12-02] (#70286)
Co-authored-by: octavia-bot-hoard[bot] <230633153+octavia-bot-hoard[bot]@users.noreply.github.com>
2025-12-01 22:10:24 -08:00


GitHub

This page contains the setup guide and reference information for the GitHub source connector.

Prerequisites

  • A list of GitHub repositories (and access to them if they are private)

For Airbyte Cloud:

For Airbyte Open Source:

Setup guide

Step 1: Set up GitHub

Create a GitHub Account.

Airbyte Open Source additional setup steps

Log into GitHub and then generate a personal access token. To load balance your API quota consumption across multiple API tokens, enter multiple tokens separated by commas (,).
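As an illustration of how comma-separated tokens can spread API usage, here is a minimal sketch of round-robin token rotation. The function names and the placeholder token values are hypothetical, and the connector's own rotation logic may differ (for example, it may take each token's remaining quota into account).

```python
from itertools import cycle
from typing import Iterator, List


def parse_tokens(raw: str) -> List[str]:
    """Split a comma-separated token string, dropping whitespace and empty entries."""
    return [token.strip() for token in raw.split(",") if token.strip()]


def token_rotation(raw: str) -> Iterator[str]:
    """Return an endless iterator that hands out the tokens in round-robin order."""
    tokens = parse_tokens(raw)
    if not tokens:
        raise ValueError("at least one personal access token is required")
    return cycle(tokens)


if __name__ == "__main__":
    # Placeholder values, not real tokens.
    rotation = token_rotation("token_one, token_two")
    for _ in range(3):
        print(next(rotation))
```

With two tokens, consecutive requests alternate between them, so each token absorbs roughly half of the calls.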

Step 2: Set up the GitHub connector in Airbyte

For Airbyte Cloud:

  1. Log into your Airbyte Cloud account.
  2. Click Sources and then click + New source.
  3. On the Set up the source page, select GitHub from the Source type dropdown.
  4. Enter a name for the GitHub connector.
  5. To authenticate:
  • For Airbyte Cloud: Authenticate your GitHub account to authorize Airbyte to access it. Airbyte uses the GitHub account you are currently logged in to, so make sure you are logged in to the right account.

  • For Airbyte Open Source: Authenticate with a personal access token. Log into GitHub, generate a personal access token, and enter it here. To load balance your API quota consumption across multiple API tokens, enter multiple tokens separated by commas (,).

  6. GitHub Repositories - Enter a list of GitHub organizations/repositories, e.g. airbytehq/airbyte for a single repository, or airbytehq/airbyte airbytehq/another-repo for multiple repositories. To sync data from all repositories in an organization, use a wildcard in place of the repository name, e.g. airbytehq/*.

:::caution Repositories that do not exist, or whose names are wrong or incorrectly formatted, will be skipped with a WARN message in the logs. :::

  7. Start date (Optional) - The date from which you'd like to replicate data for streams. For streams which support this configuration, only data generated on or after the start date will be replicated.
  • These streams will only sync records generated on or after the Start Date: comments, commit_comment_reactions, commit_comments, commits, deployments, events, issue_comment_reactions, issue_events, issue_milestones, issue_reactions, issues, project_cards, project_columns, projects, pull_request_comment_reactions, pull_requests, pull_requeststats, releases, review_comments, reviews, stargazers, workflow_runs, workflows.

  • The Start Date does not apply to the streams below and all data will be synced for these streams: assignees, branches, collaborators, issue_labels, organizations, pull_request_commits, pull_request_stats, repositories, tags, teams, users

  8. Branch (Optional) - List of GitHub repository branches to pull commits from, e.g. airbytehq/airbyte/master for a single branch, or airbytehq/airbyte/master airbytehq/airbyte/my-branch for multiple branches. If no branches are specified for a repository, the default branch will be pulled.
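The repository and branch formats described in the steps above can be sketched as a small parser. This is only an illustration: the regular expression and the function names are assumptions, not the connector's actual validation rules.

```python
import re
from typing import List, Tuple

# Assumed pattern for "owner/repo" or "owner/*"; not taken from the connector's spec.
REPO_PATTERN = re.compile(r"^[A-Za-z0-9_.-]+/(\*|[A-Za-z0-9_.-]+)$")


def split_repositories(entries: List[str]) -> Tuple[List[str], List[str], List[str]]:
    """Return (single repositories, organizations given as wildcards, skipped entries)."""
    repos: List[str] = []
    orgs: List[str] = []
    skipped: List[str] = []
    for entry in entries:
        if not REPO_PATTERN.match(entry):
            skipped.append(entry)      # malformed names are skipped, mirroring the WARN behaviour
        elif entry.endswith("/*"):
            orgs.append(entry[:-2])    # "airbytehq/*" -> "airbytehq"
        else:
            repos.append(entry)
    return repos, orgs, skipped


def split_branch(entry: str) -> Tuple[str, str]:
    """Split "owner/repo/branch" into ("owner/repo", "branch")."""
    owner, repo, branch = entry.split("/", 2)  # branch names may themselves contain "/"
    return f"{owner}/{repo}", branch
```

For example, `split_repositories(["airbytehq/airbyte", "airbytehq/*", "bad name"])` separates the single repository, the organization wildcard, and the malformed entry.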

For Airbyte Open Source:

  1. Navigate to the Airbyte Open Source dashboard. Click Sources and then click + New source.
  2. On the Set up the source page, select GitHub from the Source type dropdown.
  3. Enter a name for the GitHub connector.

Supported sync modes

The GitHub source connector supports the following sync modes:

Supported Streams

This connector outputs the following full refresh streams:

This connector outputs the following incremental streams:

Entity-Relationship Diagram (ERD)

Notes

  1. Only 4 of the streams listed above (comments, commits, issues and review comments) are pure incremental, meaning that they:

    • read only new records;
    • output only new records.
  2. The workflow_runs and workflow_jobs streams are almost pure incremental. They:

    • read new records and some portion of old records (from the past 30 days; see docs);
    • follow the same logic as each other, because workflow_jobs depends on workflow_runs to read its data (see docs);
    • output only new records.
  3. The other 19 incremental streams are incremental with one difference. They:

    • read all records;
    • output only new records. Keep this behaviour in mind when using those 19 incremental streams, because it may affect your API call limits.
  4. For large streams, a start_date far in the past can sometimes cause GitHub to keep returning errors instead of records (a corresponding WARN message is logged). In this case, specifying a more recent start_date may help. The "Start date" configuration option does not apply to the streams below, because the GitHub API does not include dates that can be used for filtering:

  • assignees
  • branches
  • collaborators
  • issue_labels
  • organizations
  • pull_request_commits
  • pull_request_stats
  • repositories
  • tags
  • teams
  • users
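The "read all records, output only new records" behaviour described in the notes above can be sketched as client-side filtering against a saved cursor value. This is a simplified illustration, not the connector's code: the function names are hypothetical, and it assumes the cursor values are ISO 8601 UTC strings in one consistent format, so that string comparison matches chronological order.

```python
from typing import Dict, Iterable, Iterator, List, Optional


def emit_new_records(
    records: Iterable[Dict[str, str]], cursor_field: str, state: Optional[str]
) -> Iterator[Dict[str, str]]:
    """Read every record, but yield only those newer than the saved state."""
    for record in records:  # every record is still fetched, which costs API calls
        if state is None or record[cursor_field] > state:
            yield record


def next_state(
    records: List[Dict[str, str]], cursor_field: str, state: Optional[str]
) -> Optional[str]:
    """Return the largest cursor value seen so far, including the previous state."""
    values = [record[cursor_field] for record in records]
    if state is not None:
        values.append(state)
    return max(values) if values else None
```

Because the filtering happens after the records are fetched, the number of API calls does not shrink between syncs; only the number of emitted records does.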

Limitations & Troubleshooting


Connector limitations

Rate limiting

You can use a personal access token to make API requests. Additionally, you can authorize a GitHub App or OAuth app, which can then make API requests on your behalf. All of these requests count towards your personal rate limit of 5,000 requests per hour (15,000 requests per hour if the app is owned by a GitHub Enterprise Cloud organization).

:::info REST API and GraphQL API rate limits are counted separately. :::

:::tip In the event that limits are reached before all streams have been read, it is recommended to take the following actions:

  1. Utilize Incremental sync mode.
  2. Set a higher sync interval.
  3. Divide the sync into separate connections with a smaller number of streams. :::

For more details, refer to the GitHub article "Rate limits for the REST API".
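To see how close a token is to its limit, a client can inspect the `x-ratelimit-remaining` and `x-ratelimit-reset` headers that GitHub returns with REST API responses. The sketch below computes how long to wait before the next request. The function name is hypothetical, and the code assumes the header names have already been lower-cased into a plain mapping.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how many seconds to wait before the next request (0 if quota remains)."""
    now = time.time() if now is None else now
    # Assumption: a missing header means we are not rate limited.
    remaining = int(headers.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0.0
    # x-ratelimit-reset is the reset time in UTC epoch seconds.
    reset_at = float(headers.get("x-ratelimit-reset", now))
    return max(0.0, reset_at - now)
```

Passing `now` explicitly keeps the function easy to test; in normal use it can be omitted so that the current time is used.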

Permissions and scopes

If you use the OAuth authentication method, the OAuth 2.0 application requests the following scopes: repo, read:org, read:repo_hook, read:user, read:discussion, read:project, workflow. For a personal access token, you need to select the required scopes manually.

Your token should have at least the repo scope. Depending on which streams you want to sync, the user generating the token needs more permissions:

  • To sync Collaborators, the user who generates the personal access token must be a collaborator. To become a collaborator, they must be invited by an owner. If there are no collaborators, no records will be synced. Read more about access permissions here.
  • Syncing Teams is only available to authenticated members of a team's organization. Personal user accounts and repositories belonging to them don't have access to Teams features. In this case no records will be synced.
  • To sync the Projects stream, the repository must have the Projects feature enabled.
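For a classic personal access token, GitHub reports the granted scopes in the `X-OAuth-Scopes` response header as a comma-separated list. The following sketch compares such a header value with a set of required scopes. The function names are hypothetical, and the comparison is an exact match only: it does not model GitHub's scope hierarchy, where a broader scope can imply a narrower one.

```python
from typing import Iterable, Set


def parse_scopes(header_value: str) -> Set[str]:
    """Turn a value such as "repo, read:org" into a set of scope names."""
    return {scope.strip() for scope in header_value.split(",") if scope.strip()}


def missing_scopes(header_value: str, required: Iterable[str]) -> Set[str]:
    """Return the required scopes that are absent from the header value (exact match only)."""
    return set(required) - parse_scopes(header_value)


if __name__ == "__main__":
    # The page states that a token needs at least the repo scope.
    print(missing_scopes("read:user", {"repo"}))
```

An empty result means every required scope name appears in the header; a non-empty result lists the names to add when regenerating the token.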

Troubleshooting

  • Check out common troubleshooting issues for the GitHub source connector on the Airbyte Forum.

Changelog

Version Date Pull Request Subject
2.1.5 2025-12-02 70286 Update dependencies
2.1.4 2025-11-25 69887 Update dependencies
2.1.3 2025-11-18 69421 Update dependencies
2.1.2 2025-11-11 69271 Update dependencies
2.1.1 2025-11-04 69002 Update dependencies
2.1.0 2025-10-24 68637 Update dependencies
2.0.0 2025-10-14 68095 Breaking Change: Renames +1 and -1 fields to plus_one and minus_one respectively.
1.9.2 2025-10-21 68332 Update dependencies
1.9.1 2025-10-20 68197 Promoting release candidate 1.9.1-rc.1 to a main version.
1.9.1-rc.1 2025-10-13 67584 Graceful error handling of invalid credentials when running operations
1.9.0 2025-10-13 67708 Promoting release candidate 1.9.0-rc.3 to a main version.
1.9.0-rc.3 2025-10-09 67589 Fix min time to wait on token rate limits
1.9.0-rc.2 2025-10-03 67026 Fix converting datetime in workflows stream
1.9.0-rc.1 2025-10-02 66736 Update to airbyte-cdk v^7
1.8.42 2025-09-30 66166 Update dependencies
1.8.41 2025-09-09 66065 Update dependencies
1.8.40 2025-08-23 65375 Update dependencies
1.8.39 2025-08-16 64982 Update dependencies
1.8.38 2025-08-09 64580 Update dependencies
1.8.37 2025-08-02 64231 Update dependencies
1.8.36 2025-07-26 63810 Update dependencies
1.8.35 2025-07-19 63520 Update dependencies
1.8.34 2025-07-12 63158 Update dependencies
1.8.33 2025-07-05 62666 Update dependencies
1.8.32 2025-06-28 62166 Update dependencies
1.8.31 2025-06-25 62054 Fix problem with contributor_activity stream when author is not present/none
1.8.30 2025-06-23 61742 Handle conflict when empty repositories, we will ignore
1.8.29 2025-06-21 61857 Update dependencies
1.8.28 2025-06-15 61603 Update dependencies
1.8.27 2025-06-07 54931 Update dependencies
1.8.26 2025-02-22 54404 Update dependencies
1.8.25 2025-02-15 53703 Update dependencies
1.8.24 2025-02-01 52875 Update dependencies
1.8.23 2025-01-25 52364 Update dependencies
1.8.22 2025-01-18 51666 Update dependencies
1.8.21 2025-01-11 51130 Update dependencies
1.8.20 2025-01-04 50517 Update dependencies
1.8.19 2024-12-21 50055 Update dependencies
1.8.18 2024-12-14 49178 Update dependencies
1.8.17 2024-11-25 48631 Starting with this version, the Docker image is now rootless. Please note that this and future versions will not be compatible with Airbyte versions earlier than 0.64
1.8.16 2024-11-05 48318 Update dependencies
1.8.15 2024-10-28 47051 Update dependencies
1.8.14 2024-10-12 46766 Update dependencies
1.8.13 2024-10-05 46415 Update dependencies
1.8.12 2024-09-28 46117 Update dependencies
1.8.11 2024-09-21 45742 Update dependencies
1.8.10 2024-09-14 45557 Update dependencies
1.8.9 2024-09-07 45320 Update dependencies
1.8.8 2024-08-23 44592 Fix state handling for stream WorkflowRuns
1.8.7 2024-08-31 45061 Update dependencies
1.8.6 2024-08-24 44703 Update dependencies
1.8.5 2024-08-17 44227 Update dependencies
1.8.4 2024-08-12 43749 Update dependencies
1.8.3 2024-08-10 42671 Update dependencies
1.8.2 2024-08-20 42966 Bump cdk version and enable RFR for all non-incremental streams
1.8.1 2024-07-20 42342 Update dependencies
1.8.0 2024-07-16 41677 Update to 3.4.0 CDK
1.7.13 2024-07-13 41746 Update dependencies
1.7.12 2024-07-10 41354 Update dependencies
1.7.11 2024-07-09 41221 Update dependencies
1.7.10 2024-07-06 41000 Update dependencies
1.7.9 2024-06-25 40289 Update dependencies
1.7.8 2024-06-22 40128 Update dependencies
1.7.7 2024-06-17 39513 Update deprecated state handling method
1.7.6 2024-06-04 39078 [autopull] Upgrade base image to v1.2.1
1.7.5 2024-05-29 38341 Add max_waiting_time to configuration
1.7.4 2024-05-21 38341 Update CDK authenticator package
1.7.3 2024-05-20 38299 Fixed spec typo
1.7.2 2024-04-19 36636 Updating to 0.80.0 CDK
1.7.1 2024-04-12 36636 schema descriptions
1.7.0 2024-03-19 36267 Pin airbyte-cdk version to ^0
1.6.5 2024-03-12 35986 Handle rate limit exception as config error
1.6.4 2024-03-08 35915 Fix per stream error handler; Make use the latest CDK version
1.6.3 2024-02-15 35271 Update branches schema
1.6.2 2024-02-12 34933 Update Airbyte CDK for integration tests
1.6.1 2024-02-09 35087 Manage dependencies with Poetry.
1.6.0 2024-02-02 34700 Continue Sync on Stream failure
1.5.7 2024-01-29 34598 Fix MultipleToken sleep time
1.5.6 2024-01-26 34503 Fix MultipleToken rotation logic
1.5.5 2023-12-26 33783 Fix retry for 504 error in GraphQL based streams
1.5.4 2023-11-20 32679 Return AirbyteMessage if max retry exceeded for 202 status code
1.5.3 2023-10-23 31702 Base image migration: remove Dockerfile and use the python-connector-base image
1.5.2 2023-10-13 31386 Handle ContributorActivity continuous ACCEPTED response
1.5.1 2023-10-12 31307 Increase backoff_time for stream ContributorActivity
1.5.0 2023-10-11 31300 Update Schemas: Add date-time format to fields
1.4.6 2023-10-04 31056 Migrate spec properties' repository and branch type to <array>
1.4.5 2023-10-02 31023 Increase backoff for stream Contributor Activity
1.4.4 2023-10-02 30971 Mark start_date as optional.
1.4.3 2023-10-02 30979 Fetch archived records in Project Cards
1.4.2 2023-09-30 30927 Provide actionable user error messages
1.4.1 2023-09-30 30839 Update CDK to Latest version
1.4.0 2023-09-29 30823 Add new stream issue Timeline Events
1.3.1 2023-09-28 30824 Handle empty response in stream ContributorActivity
1.3.0 2023-09-25 30731 Add new stream ProjectsV2
1.2.1 2023-09-22 30693 Handle 404 error in TeamMemberShips
1.2.0 2023-09-22 30647 Add support for self-hosted GitHub instances
1.1.1 2023-09-21 30654 Rewrite source connection error messages
1.1.0 2023-08-03 30615 Add new stream Contributor Activity
1.0.4 2023-08-03 29031 Reverted advancedAuth spec changes
1.0.3 2023-08-01 28910 Updated advancedAuth broken references
1.0.2 2023-07-11 28144 Add archived_at property to Organizations schema parameter
1.0.1 2023-05-22 25838 Deprecate "page size" input parameter
1.0.0 2023-05-19 25778 Improve repo(s) name validation on UI
0.5.0 2023-05-16 25793 Implement client-side throttling of requests
0.4.11 2023-05-12 26025 Added more transparent depiction of the personal access token expired
0.4.10 2023-05-15 26075 Add more specific error message description for no repos case.
0.4.9 2023-05-01 24523 Add undeclared columns to spec
0.4.8 2023-04-19 00000 Fix repo name validation
0.4.7 2023-03-24 24457 Add validation and transformation for repositories config
0.4.6 2023-03-24 24398 Fix caching for get_starting_point in stream "Commits"
0.4.5 2023-03-23 24417 Add pattern_descriptors to fields with an expected format
0.4.4 2023-03-17 24255 Add field groups and titles to improve display of connector setup form
0.4.3 2023-03-04 22993 Specified date formatting in specification
0.4.2 2023-03-03 23467 Added user friendly messages, added AirbyteTracedException config_error, updated SAT
0.4.1 2023-01-27 22039 Set AvailabilityStrategy for streams explicitly to None
0.4.0 2023-01-20 21457 Use GraphQL for issue_reactions stream
0.3.12 2023-01-18 21481 Handle 502 Bad Gateway error with proper log message
0.3.11 2023-01-06 21084 Raise Error if no organizations or repos are available during read
0.3.10 2022-12-15 20523 Revert changes from 0.3.9
0.3.9 2022-12-14 19978 Update CDK dependency; move custom HTTPError handling into AvailabilityStrategy classes
0.3.8 2022-11-10 19299 Fix events and workflow_runs datetimes
0.3.7 2022-10-20 18213 Skip retry on HTTP 200
0.3.6 2022-10-11 17852 Use default behaviour, retry on 429 and all 5XX errors
0.3.5 2022-10-07 17715 Improve 502 handling for comments stream
0.3.4 2022-10-04 17555 Skip repository if got HTTP 500 for WorkflowRuns stream
0.3.3 2022-09-28 17287 Fix problem with "null" cursor_field for WorkflowJobs stream
0.3.2 2022-09-28 17304 Migrate to per-stream state.
0.3.1 2022-09-21 16947 Improve error logging when handling HTTP 500 error
0.3.0 2022-09-09 16534 Add new stream WorkflowJobs
0.2.46 2022-08-17 15730 Validate input organizations and repositories
0.2.45 2022-08-11 15420 "User" object can be "null"
0.2.44 2022-08-01 14795 Use GraphQL for pull_request_comment_reactions stream
0.2.43 2022-07-26 15049 Bugfix schemas for streams deployments, workflow_runs, teams
0.2.42 2022-07-12 14613 Improve schema for stream pull_request_commits added "null"
0.2.41 2022-07-03 14376 Add Retry for GraphQL API Resource limitations
0.2.40 2022-07-01 14338 Revert: "Rename field mergeable to is_mergeable"
0.2.39 2022-06-30 14274 Rename field mergeable to is_mergeable
0.2.38 2022-06-27 13989 Use GraphQL for reviews stream
0.2.37 2022-06-21 13955 Fix "secondary rate limit" not retrying
0.2.36 2022-06-20 13926 Break point added for workflows_runs stream
0.2.35 2022-06-16 13763 Use GraphQL for pull_request_stats stream
0.2.34 2022-06-14 13707 Fix API sorting, fix get_starting_point caching
0.2.33 2022-06-08 13558 Enable caching only for parent streams
0.2.32 2022-06-07 13531 Fix different result from get_starting_point when reading by pages
0.2.31 2022-05-24 13115 Add incremental support for streams WorkflowRuns
0.2.30 2022-05-09 12294 Add incremental support for streams CommitCommentReactions, IssueCommentReactions, IssueReactions, PullRequestCommentReactions, Repositories, Workflows
0.2.29 2022-05-04 12482 Update input configuration copy
0.2.28 2022-04-21 11893 Add new streams TeamMembers, TeamMemberships
0.2.27 2022-04-02 11678 Fix "PAT Credentials" in spec
0.2.26 2022-03-31 11623 Re-factored incremental sync for Reviews stream
0.2.25 2022-03-31 11567 Improve code for better error handling
0.2.24 2022-03-30 9251 Add Streams Workflow and WorkflowRuns
0.2.23 2022-03-17 11212 Improve documentation and spec for Beta
0.2.22 2022-03-10 10878 Fix error handling for unavailable streams with 404 status code
0.2.21 2022-03-04 10749 Add new stream ProjectCards
0.2.20 2022-02-16 10385 Add new stream Deployments, ProjectColumns, PullRequestCommits
0.2.19 2022-02-07 10211 Add human-readable error in case of incorrect organization or repo name
0.2.18 2022-02-09 10193 Add handling secondary rate limits
0.2.17 2022-02-02 9999 Remove BAD_GATEWAY code from backoff_time
0.2.16 2022-02-02 9868 Add log message for streams that are restricted for OAuth. Update oauth scopes.
0.2.15 2022-01-26 9802 Add missing fields for auto_merge in pull request stream
0.2.14 2022-01-21 9664 Add custom pagination size for large streams
0.2.13 2022-01-20 9619 Fix logging for function should_retry
0.2.11 2022-01-17 9492 Remove optional parameter Accept for reaction's streams to fix error with 502 HTTP status code in response
0.2.10 2022-01-03 7250 Use CDK caching and convert PR-related streams to incremental
0.2.9 2021-12-29 9179 Use default retry delays on server error responses
0.2.8 2021-12-07 8524 Update connector fields title/description
0.2.7 2021-12-06 8518 Add connection retry with GitHub
0.2.6 2021-11-24 8030 Support start date property for PullRequestStats and Reviews streams
0.2.5 2021-11-21 8170 Fix slow check connection for organizations with a lot of repos
0.2.4 2021-11-11 7856 Resolve $ref fields in some stream schemas
0.2.3 2021-10-06 6833 Fix config backward compatibility
0.2.2 2021-10-05 6761 Add oauth workflow specification
0.2.1 2021-09-22 6223 Add option to pull commits from user-specified branches
0.2.0 2021-09-19 5898 and 6227 Don't minimize any output fields & add better error handling
0.1.11 2021-09-15 5949 Add caching for all streams
0.1.10 2021-09-09 5860 Add reaction streams
0.1.9 2021-09-02 5788 Handling empty repository, check method using RepositoryStats stream
0.1.8 2021-09-01 5757 Add more streams
0.1.7 2021-08-27 5696 Handle negative backoff values
0.1.6 2021-08-18 5456 Add MultipleTokenAuthenticator
0.1.5 2021-08-18 5456 Fix set up validation
0.1.4 2021-08-13 5136 Support syncing multiple repositories/organizations
0.1.3 2021-08-03 5156 Extended existing schemas with users property for certain streams
0.1.2 2021-07-13 4708 Fix bug with IssueEvents stream and add handling for rate limiting
0.1.1 2021-07-07 4590 Fix schema in the pull_request stream
0.1.0 2021-07-06 4174 New Source: GitHub