1
0
mirror of synced 2025-12-19 18:14:56 -05:00

Bulk CDK: add GHA cron to bump CDK version in certified connectors (#66826)

## What
Part of
https://github.com/airbytehq/airbyte-internal-issues/issues/14308.
Example run
https://github.com/airbytehq/airbyte/actions/runs/18284850032, example
PR https://github.com/airbytehq/airbyte/pull/67093/. See also
https://github.com/airbytehq/airbyte-enterprise/pull/257.

The cron runs on the first Monday of every month. Past discussions
wanted every 3 weeks, but that's a PITA to do in GHA.

I added one more commit after that run:
679081a47d.
Just changing the trigger to be on a cron, and marking the PR as ready
for review.

There's no automation on actually merging the PRs, but they'll tag the
appropriate team(s) for review. In principle, reviewers can just click
the merge button on green CI though.

There's some weird behavior where the PR sometimes fails to submit b/c
of push protection rules
([example](https://github.com/airbytehq/airbyte/actions/runs/18235269794/job/51927690655#step:12:119)).
@wennergr is looking into this.

## How
Use the various gradle tasks we've added to upgrade the CDK version.

I wanted to use the [bump-version
command](https://github.com/airbytehq/airbyte/actions/workflows/bump-version-command.yml)
to do the metadata+changelog, but it seems to be
[broken](https://airbytehq-team.slack.com/archives/C02U9R3AF37/p1759499392004199).
Got claude to write a hacky equivalent.

(a lot of the bash stuff is also from claude, but heavily edited for
readability+comments+style. I've done enough testing to be confident in
it though.)

## Can this PR be safely reverted and rolled back?
<!--
* If unsure, leave it blank.
-->
- [x] YES 💚
- [ ] NO 
This commit is contained in:
Edward Gao
2025-10-07 15:15:36 -07:00
committed by GitHub
parent 8eaf55864b
commit 6dbd718fed
4 changed files with 297 additions and 0 deletions

View File

@@ -0,0 +1,163 @@
name: Auto Upgrade CDK for Certified Connectors
on:
schedule:
# Run at 16:37 UTC on the first of every month
- cron: "37 16 1 * *"
workflow_dispatch:
workflow_call:
inputs:
repositories:
description: "Repository name for GitHub App authentication"
required: false
type: string
default: "airbyte"
jobs:
list-certified-connectors:
name: List Certified Connectors
runs-on: ubuntu-24.04
outputs:
connectors: ${{ steps.list-connectors.outputs.connectors }}
steps:
- name: Checkout Airbyte
uses: actions/checkout@08eba0b27e820071cde6df949e0beb9ba4906955 # v4
with:
# Needed for when airbyte-enterprise calls this workflow
submodules: true
- name: Install yq
run: sudo snap install yq
- name: List certified connectors
id: list-connectors
run: |
json_array=$(tools/bin/bulk-cdk-auto-upgrade/list-connectors-to-upgrade.sh)
echo "connectors=$json_array" >> $GITHUB_OUTPUT
echo "Found bulk CDK connectors to upgrade: $json_array"
upgrade-connector-cdk:
name: Upgrade CDK for ${{ matrix.connector }}
needs: list-certified-connectors
runs-on: ubuntu-24.04
strategy:
matrix:
connector: ${{ fromJson(needs.list-certified-connectors.outputs.connectors) }}
fail-fast: false
max-parallel: 5
steps:
- name: Authenticate as GitHub App
uses: actions/create-github-app-token@67018539274d69449ef7c02e8e71183d1719ab42 # v2
id: app-token
with:
owner: "airbytehq"
repositories: ${{ inputs.repositories || 'airbyte' }}
app-id: ${{ secrets.OCTAVIA_BOT_APP_ID }}
private-key: ${{ secrets.OCTAVIA_BOT_PRIVATE_KEY }}
- name: Get GitHub App User ID
id: get-user-id
run: echo "user-id=$(gh api "/users/${{ steps.app-token.outputs.app-slug }}[bot]" --jq .id)" >> "$GITHUB_OUTPUT"
env:
GH_TOKEN: ${{ steps.app-token.outputs.token }}
- run: |
git config --global user.name '${{ steps.app-token.outputs.app-slug }}[bot]'
git config --global user.email '${{ steps.get-user-id.outputs.user-id }}+${{ steps.app-token.outputs.app-slug }}[bot]@users.noreply.github.com'
- name: Checkout Airbyte
uses: actions/checkout@08eba0b27e820071cde6df949e0beb9ba4906955 # v4
with:
token: ${{ secrets.GITHUB_TOKEN }}
# Needed for when airbyte-enterprise calls this workflow
submodules: true
- name: Install yq
run: sudo snap install yq
- name: Setup Java
uses: actions/setup-java@c5195efecf7bdfc987ee8bae7a71cb8b11521c00 # v4
with:
distribution: "zulu"
java-version: "21"
- name: Setup Gradle
uses: gradle/actions/setup-gradle@80e941e61874822d2a89974089c4915748e8f4b7 # v4
- name: Run upgradeCdk for ${{ matrix.connector }}
id: upgrade-cdk
run: |
set -euo pipefail
./gradlew :airbyte-integrations:connectors:${{ matrix.connector }}:upgradeCdk
# --quiet disables all of gradle's normal logging, so we only get the CDK version number
new_cdk_version=$(./gradlew :airbyte-integrations:connectors:${{ matrix.connector }}:getCdkVersion --quiet)
echo "new_cdk_version=$new_cdk_version" >> $GITHUB_OUTPUT
- name: Check for changes
id: check-changes
if: steps.upgrade-cdk.outputs.exit_code == '0'
run: |
if git diff --quiet; then
echo "has_changes=false" >> $GITHUB_OUTPUT
echo "No changes detected for ${{ matrix.connector }}"
else
echo "has_changes=true" >> $GITHUB_OUTPUT
echo "Changes detected for ${{ matrix.connector }}"
fi
- name: Bump connector version
id: bump-version
if: steps.check-changes.outputs.has_changes == 'true'
run: |
set -euo pipefail
new_version=$(tools/bin/bulk-cdk-auto-upgrade/bump-connector-metadata.sh "${{ matrix.connector }}")
echo "Updated dockerImageTag to $new_version"
echo "new_version=$new_version" >> $GITHUB_OUTPUT
- name: Create Pull Request
id: create-pr
if: steps.check-changes.outputs.has_changes == 'true'
uses: peter-evans/create-pull-request@c5a7806660adbe173f04e3e038b0ccdcd758773c # v6
with:
token: ${{ steps.app-token.outputs.token }}
commit-message: "chore: upgrade bulk CDK for ${{ matrix.connector }}"
branch: "auto-upgrade-jvm-bulk-cdk/${{ steps.upgrade-cdk.outputs.new_cdk_version }}/${{ matrix.connector }}"
delete-branch: true
title: "chore: upgrade ${{ matrix.connector }} to bulk CDK ${{ steps.upgrade-cdk.outputs.new_cdk_version }}"
body: |
Upgrade Bulk CDK version for `${{ matrix.connector }}`
🤖 Generated by [automated workflow](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}).
labels: |
area/connectors
auto-cdk-upgrade
assignees: ""
# We'll rely on CODEOWNERS to tag the right people.
reviewers: ""
# Submit as draft: true, because we need to push another commit to this branch to update the changelog.
# The `ready for review` step is where we switch the PR out of draft mode.
draft: true
# Update the connector changelog and push the commit to the new branch
- name: Update connector changelog
# airbyte-enterprise doesn't have changelogs, so only do this in the OSS repo.
if: steps.check-changes.outputs.has_changes == 'true' && inputs.repositories != 'airbyte-enterprise'
run: |
set -euo pipefail
tools/bin/bulk-cdk-auto-upgrade/populate-connector-changelog.sh \
"${{ matrix.connector }}" \
"${{ steps.bump-version.outputs.new_version }}" \
"${{ steps.create-pr.outputs.pull-request-number }}" \
"Upgrade to Bulk CDK ${{ steps.upgrade-cdk.outputs.new_cdk_version }}."
# Commit and push the changelog changes
branch_name="${{ steps.create-pr.outputs.pull-request-branch }}"
git fetch --depth=1 origin "$branch_name"
git checkout "$branch_name"
git add docs
git commit -m "update changelog for ${{ matrix.connector }}"
git push origin "$branch_name"
# We've pushed all the changes to the PR. Mark it ready for review.
- name: Ready for review
if: steps.check-changes.outputs.has_changes == 'true'
run: gh pr ready "${{ steps.create-pr.outputs.pull-request-number }}"
env:
GH_TOKEN: ${{ steps.app-token.outputs.token }}

View File

@@ -0,0 +1,30 @@
#!/usr/bin/env bash
set -euo pipefail
# Performs a patch version bump in the metadata.yaml of the given connector.
# Prints the new version number.
# Usage: tools/bin/bulk-cdk-auto-upgrade/bump-connector-metadata.sh destination-dev-null
connector_name=$1
metadata_file="airbyte-integrations/connectors/$connector_name/metadata.yaml"
# Extract current version
current_version=$(yq '.data.dockerImageTag' "$metadata_file")
# Parse version components
IFS='.' read -r major minor patch <<< "$current_version"
# Increment patch version
new_patch=$((patch + 1))
new_version="${major}.${minor}.${new_patch}"
# Update metadata.yaml
# Don't use `yq` for this. yq introduces unrelated diffs and does not preserve formatting.
# In particular, yq may introduce diffs that cause our formatter to complain.
# Don't use `-i` b/c it's not platform-agnostic (macos requires `-i ''`, but that doesn't work on linux)
sed "s/dockerImageTag: ${current_version}/dockerImageTag: ${new_version}/" "$metadata_file" > "$metadata_file.tmp"
mv "$metadata_file.tmp" "$metadata_file"
echo "$new_version"

View File

@@ -0,0 +1,52 @@
#!/usr/bin/env bash
set -euo pipefail
# List all certified connectors, which are using the bulk CDK, and don't have a weird version number (e.g. -rc suffix).
# And also source-datagen + destination-dev-null.
# Prints the result as a JSON array.
connectors=()
# datagen and dev-null aren't certified, but we should probably keep them on the latest CDK anyway
default_connectors=(destination-dev-null source-datagen)
# check whether these connectors exist. This is to support airbyte-enterprise.
for connector in "${default_connectors[@]}"; do
metadata_file="airbyte-integrations/connectors/${connector}/metadata.yaml"
build_gradle="airbyte-integrations/connectors/${connector}/build.gradle"
build_gradle_kts="airbyte-integrations/connectors/${connector}/build.gradle.kts"
# If metadata.yaml exists
if test -f "$metadata_file"; then
# If we're on a "normal" version (e.g. 12.34.56)
if echo $(yq '.data.dockerImageTag' "$metadata_file") | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+$'; then
connectors+=("$connector")
fi
fi
done
for dir in airbyte-integrations/connectors/*; do
metadata_file="${dir}/metadata.yaml"
build_gradle="${dir}/build.gradle"
build_gradle_kts="${dir}/build.gradle.kts"
# If metadata.yaml exists and says we're certified
if (test -f "$metadata_file") && (test $(yq '.data.supportLevel' "$metadata_file") = 'certified'); then
# If we have a gradle buildscript using the bulk connector plugin
if (test -f "$build_gradle" && grep -q "airbyte-bulk-connector" "$build_gradle") || \
(test -f "$build_gradle_kts" && grep -q "airbyte-bulk-connector" "$build_gradle_kts"); then
# If we're on a "normal" version (e.g. 12.34.56)
if echo $(yq '.data.dockerImageTag' "$metadata_file") | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+$'; then
connector_name=$(basename "$dir")
connectors+=("$connector_name")
fi
fi
fi
done
# Nonobvious `printf | jq | jq` thing here:
# Print each element of the array on a separate line
# -> jq converts each line to a JSON string (i.e. wrap in double quotes)
# -> jq reads those lines and wraps them into a JSON array (compact-output is needed for compatibility with github's output format)
json_array=$(printf '%s\n' "${connectors[@]}" | jq --raw-input . | jq --compact-output --slurp .)
echo "$json_array"

View File

@@ -0,0 +1,52 @@
#!/usr/bin/env bash
set -euo pipefail
# Adds a new changelog entry for the given connector.
# Usage: tools/bin/bulk-cdk-auto-upgrade/populate-connector-changelog.sh <connector> <new_connector_version> <pr_number> <changelog_text>
# E.g.: tools/bin/bulk-cdk-auto-upgrade/populate-connector-changelog.sh destination-dev-null 1.2.3 1234 'upgrade foo'
connector_id="$1"
new_version="$2"
pr_number="$3"
changelog_text="$4"
# Convert connector name to docs path:
# destination-dev-null -> docs/integrations/destinations/dev-null.md
# source-postgres -> docs/integrations/sources/postgres.md
connector_type=$(echo "${connector_id}" | cut -d'-' -f1)
connector_name=$(echo "${connector_id}" | cut -d'-' -f2-)
docs_file="docs/integrations/${connector_type}s/${connector_name}.md"
if ! test -f "$docs_file"; then
echo "Docs file not found at $docs_file. This connector is probably doing something weird." >&2
exit 1
fi
# YYYY-MM-DD format
today=$(date +%Y-%m-%d)
new_entry="| ${new_version} | ${today} | [${pr_number}](https://github.com/airbytehq/airbyte/pull/${pr_number}) | ${changelog_text} |"
# Find the changelog table and insert the new entry after the header row
# The changelog table starts with a header row (`| Version | Date | Pull Request | Subject |`),
# followed by a separator line.
# We want to insert the new entry after the separator line.
# The awk script is doing exactly that:
# When it sees a line matching the changelog header row, awk prints that line and sets the `header` flag.
# If the `header` flag is set and awk sees the separator line, awk prints the separator followed by the new changelog entry.
# Otherwise, awk just prints the line unchanged.
awk -v entry="$new_entry" '
/^\| Version *\| Date *\| Pull Request *\| Subject *\|$/ { print; header=1; next }
header && /^\|:?-+\|:?-+\|:?-+\|:?-+\|$/ { print; print entry; header=0; next }
{ print }
' "$docs_file" > "${docs_file}.tmp"
if cmp -s "$docs_file" "${docs_file}.tmp"; then
echo "Error: awk command made no edits to $docs_file. Changelog table may not exist or has unexpected format." >&2
exit 1
fi
mv "${docs_file}.tmp" "$docs_file"
echo "Added changelog entry to $docs_file"