Source Greenhouse : Migrate to Manifest-only (#47283)

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
Co-authored-by: pnilan <patrick.nilan@airbyte.io>
Tope Folorunso
2025-07-02 18:22:53 +01:00
committed by GitHub
parent b1ac34e005
commit 18e85e714b
18 changed files with 5751 additions and 4701 deletions

@@ -1,3 +0,0 @@
[run]
omit =
source_greenhouse/run.py

@@ -1,49 +1,22 @@
# Greenhouse source connector
This is the repository for the Greenhouse source connector, written in Python.
For information about how to use this connector within Airbyte, see [the documentation](https://docs.airbyte.com/integrations/sources/greenhouse).
This directory contains the manifest-only connector for `source-greenhouse`.
This _manifest-only_ connector is not a Python package on its own, as it runs inside of the base `source-declarative-manifest` image.
For information about how to configure and use this connector within Airbyte, see [the connector's full documentation](https://docs.airbyte.com/integrations/sources/greenhouse).
## Local development
### Prerequisites
We recommend using the Connector Builder to edit this connector.
Using either Airbyte Cloud or your local Airbyte OSS instance, navigate to the **Builder** tab and select **Import a YAML**.
Then select the connector's `manifest.yaml` file to load the connector into the Builder. You're now ready to make changes to the connector!
- Python (~=3.9)
- Poetry (~=1.7) - installation instructions [here](https://python-poetry.org/docs/#installation)
### Installing the connector
From this connector directory, run:
```bash
poetry install --with dev
```
### Create credentials
**If you are a community contributor**, follow the instructions in the [documentation](https://docs.airbyte.com/integrations/sources/greenhouse)
to generate the necessary credentials. Then create a file `secrets/config.json` conforming to the `spec` inside `source_greenhouse/manifest.yaml` file.
Note that any directory named `secrets` is gitignored across the entire Airbyte repo, so there is no danger of accidentally checking in sensitive information.
See `integration_tests/sample_config.json` for a sample config file.
### Locally running the connector
```
poetry run source-greenhouse spec
poetry run source-greenhouse check --config secrets/config.json
poetry run source-greenhouse discover --config secrets/config.json
poetry run source-greenhouse read --config secrets/config.json --catalog integration_tests/configured_catalog.json
```
### Running unit tests
To run unit tests locally, from the connector directory run:
```
poetry run pytest unit_tests
```
If you prefer to develop locally, you can follow the instructions below.
### Building the docker image
You can build any manifest-only connector with `airbyte-ci`:
1. Install [`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md)
2. Run the following command to build the docker image:
@@ -53,18 +26,24 @@ airbyte-ci connectors --name=source-greenhouse build
An image will be available on your host with the tag `airbyte/source-greenhouse:dev`.
### Creating credentials
**If you are a community contributor**, follow the instructions in the [documentation](https://docs.airbyte.com/integrations/sources/greenhouse)
to generate the necessary credentials. Then create a file `secrets/config.json` conforming to the `spec` object in the connector's `manifest.yaml` file.
Note that any directory named `secrets` is gitignored across the entire Airbyte repo, so there is no danger of accidentally checking in sensitive information.
### Running as a docker container
Then run any of the connector commands as follows:
Then run any of the standard source connector commands:
```
```bash
docker run --rm airbyte/source-greenhouse:dev spec
docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-greenhouse:dev check --config /secrets/config.json
docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-greenhouse:dev discover --config /secrets/config.json
docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-greenhouse:dev read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json
```
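If you want to script these invocations, the four commands above differ only in which volumes are mounted and which flags are passed. A minimal sketch of that mapping follows; `docker_argv` is a hypothetical helper written for illustration and is not part of `airbyte-ci` or the connector, and the paths simply mirror the commands shown above.

```python
import os
from typing import List, Optional

IMAGE = "airbyte/source-greenhouse:dev"
VERBS = ("spec", "check", "discover", "read")


def docker_argv(verb: str, cwd: Optional[str] = None) -> List[str]:
    """Build the `docker run` argument list for one connector command."""
    if verb not in VERBS:
        raise ValueError(f"unknown connector command: {verb!r}")
    cwd = cwd or os.getcwd()
    argv = ["docker", "run", "--rm"]
    if verb != "spec":
        # Every command except `spec` needs the config file mounted.
        argv += ["-v", f"{cwd}/secrets:/secrets"]
    if verb == "read":
        # `read` also needs the configured catalog.
        argv += ["-v", f"{cwd}/integration_tests:/integration_tests"]
    argv += [IMAGE, verb]
    if verb != "spec":
        argv += ["--config", "/secrets/config.json"]
    if verb == "read":
        argv += ["--catalog", "/integration_tests/configured_catalog.json"]
    return argv
```

The returned list can be passed to `subprocess.run(argv, check=True)` from the connector directory.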
### Running our CI test suite
### Running the CI test suite
You can run our full test suite locally using [`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md):
@@ -72,33 +51,15 @@ You can run our full test suite locally using [`airbyte-ci`](https://github.com/
airbyte-ci connectors --name=source-greenhouse test
```
### Customizing acceptance Tests
Customize the `acceptance-test-config.yml` file to configure acceptance tests. See [Connector Acceptance Tests](https://docs.airbyte.com/connector-development/testing-connectors/connector-acceptance-tests-reference) for more information.
If your connector needs to create or destroy resources for use during acceptance tests, create fixtures for them and place them in `integration_tests/acceptance.py`.
### Dependency Management
All of your dependencies should be managed via Poetry.
To add a new dependency, run:
```bash
poetry add <package-name>
```
Please commit the changes to `pyproject.toml` and `poetry.lock` files.
## Publishing a new version of the connector
You've checked out the repo, implemented a million dollar feature, and you're ready to share your changes with the world. Now what?
1. Make sure your changes are passing our test suite: `airbyte-ci connectors --name=source-greenhouse test`
2. Bump the connector version (please follow [semantic versioning for connectors](https://docs.airbyte.com/contributing-to-airbyte/resources/pull-requests-handbook/#semantic-versioning-for-connectors)):
- bump the `dockerImageTag` value in `metadata.yaml`
- bump the `version` value in `pyproject.toml`
3. Make sure the `metadata.yaml` content is up to date.
If you want to contribute changes to `source-greenhouse`, here's how you can do that:
1. Make your changes locally, or load the connector's manifest into Connector Builder and make changes there.
2. Make sure your changes are passing our test suite with `airbyte-ci connectors --name=source-greenhouse test`
3. Bump the connector version (please follow [semantic versioning for connectors](https://docs.airbyte.com/contributing-to-airbyte/resources/pull-requests-handbook/#semantic-versioning-for-connectors)):
- bump the `dockerImageTag` value in `metadata.yaml`
4. Make sure the connector documentation and its changelog are up to date (`docs/integrations/sources/greenhouse.md`).
5. Create a Pull Request: use [our PR naming conventions](https://docs.airbyte.com/contributing-to-airbyte/resources/pull-requests-handbook/#pull-request-title-convention).
6. Pat yourself on the back for being an awesome contributor.
7. Someone from Airbyte will take a look at your PR and iterate with you to merge it into master.
8. Once your PR is merged, the new version of the connector will be automatically published to Docker Hub and our connector registry.
8. Once your PR is merged, the new version of the connector will be automatically published to Docker Hub and our connector registry.
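When bumping `dockerImageTag`, keep in mind that under semantic versioning a release-candidate tag such as `0.7.0-rc.1` sorts before the final `0.7.0` but after `0.6.1`. A minimal sketch of that ordering, assuming plain `MAJOR.MINOR.PATCH[-prerelease]` tags without build metadata; `semver_key` is a hypothetical helper for illustration, not something provided by `airbyte-ci`:

```python
import re
from typing import Tuple


def semver_key(tag: str) -> Tuple:
    """Return a sort key for tags like '0.6.1' or '0.7.0-rc.1'."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?", tag)
    if match is None:
        raise ValueError(f"not a semantic version: {tag!r}")
    major, minor, patch, pre = match.groups()
    core = (int(major), int(minor), int(patch))
    if pre is None:
        # A final release sorts after any pre-release of the same core version.
        return core + (1, ())
    # Numeric identifiers compare as integers and rank below alphanumeric ones.
    parts = tuple(
        (0, int(p), "") if p.isdigit() else (1, 0, p) for p in pre.split(".")
    )
    return core + (0, parts)


tags = ["0.7.0-rc.1", "0.6.1", "0.6.0", "0.6.0-rc.1"]
ordered = sorted(tags, key=semver_key)
```

Here `ordered` lists the tags from oldest to newest, the reverse of how the connector changelog presents them.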

@@ -5,7 +5,7 @@ test_strictness_level: "high"
acceptance_tests:
spec:
tests:
- spec_path: "source_greenhouse/spec.json"
- spec_path: "manifest.yaml"
connection:
tests:
- config_path: "secrets/config.json"

@@ -1,9 +0,0 @@
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#
from source_greenhouse.run import run
if __name__ == "__main__":
run()

File diff suppressed because it is too large.

@@ -6,11 +6,11 @@ data:
hosts:
- harvest.greenhouse.io
connectorBuildOptions:
baseImage: docker.io/airbyte/python-connector-base:4.0.0@sha256:d9894b6895923b379f3006fa251147806919c62b7d9021b5cd125bb67d7bbe22
baseImage: docker.io/airbyte/source-declarative-manifest:6.56.7@sha256:41be3ac5f569004b6a25507cd40f5152e3691aecd2a9a3f873eb4c559903412d
connectorSubtype: api
connectorType: source
definitionId: 59f1e50a-331f-4f09-b3e8-2e8d4d355f44
dockerImageTag: 0.6.1
dockerImageTag: 0.7.0-rc.1
dockerRepository: airbyte/source-greenhouse
documentationUrl: https://docs.airbyte.com/integrations/sources/greenhouse
githubIssueLabel: source-greenhouse
@@ -20,11 +20,11 @@ data:
name: Greenhouse
remoteRegistries:
pypi:
enabled: true
enabled: false
packageName: airbyte-source-greenhouse
releases:
rolloutConfiguration:
enableProgressiveRollout: false
enableProgressiveRollout: true
registryOverrides:
cloud:
enabled: true
@@ -33,7 +33,7 @@ data:
releaseStage: generally_available
supportLevel: community
tags:
- language:python
- language:manifest-only
- cdk:low-code
connectorTestSuitesOptions:
- suite: liveTests

@@ -1,36 +0,0 @@
[build-system]
requires = [ "poetry-core>=1.0.0",]
build-backend = "poetry.core.masonry.api"
[tool.poetry]
version = "0.6.1"
name = "source-greenhouse"
description = "Source implementation for Greenhouse."
authors = [ "Airbyte <contact@airbyte.io>",]
license = "MIT"
readme = "README.md"
documentation = "https://docs.airbyte.com/integrations/sources/greenhouse"
homepage = "https://airbyte.com"
repository = "https://github.com/airbytehq/airbyte"
[[tool.poetry.packages]]
include = "source_greenhouse"
[tool.poetry.dependencies]
python = "^3.10,<3.12"
airbyte-cdk = "^6"
[tool.poetry.scripts]
source-greenhouse = "source_greenhouse.run:run"
[tool.poetry.group.dev.dependencies]
pytest = "^8.0.0"
pytest-mock = "^3.6"
requests-mock = "^1.9.3"
[tool.poe]
include = [
# Shared tasks definition file(s) can be imported here.
# Run `poe` or `poe --help` to see the list of available tasks.
"${POE_GIT_DIR}/poe-tasks/poetry-connector-tasks.toml",
]

@@ -1,6 +0,0 @@
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#
from .source import SourceGreenhouse
__all__ = ["SourceGreenhouse"]

@@ -1,52 +0,0 @@
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#
import sys
import time
import traceback
from typing import List
from orjson import orjson
from airbyte_cdk.entrypoint import AirbyteEntrypoint, launch
from airbyte_cdk.models import AirbyteErrorTraceMessage, AirbyteMessage, AirbyteMessageSerializer, AirbyteTraceMessage, TraceType, Type
from source_greenhouse import SourceGreenhouse
def _get_source(args: List[str]):
catalog_path = AirbyteEntrypoint.extract_catalog(args)
config_path = AirbyteEntrypoint.extract_config(args)
state_path = AirbyteEntrypoint.extract_state(args)
try:
return SourceGreenhouse(
SourceGreenhouse.read_catalog(catalog_path) if catalog_path else None,
SourceGreenhouse.read_config(config_path) if config_path else None,
SourceGreenhouse.read_state(state_path) if state_path else None,
)
except Exception as error:
print(
orjson.dumps(
AirbyteMessageSerializer.dump(
AirbyteMessage(
type=Type.TRACE,
trace=AirbyteTraceMessage(
type=TraceType.ERROR,
emitted_at=time.time_ns() // 1_000_000,
error=AirbyteErrorTraceMessage(
message=f"Error starting the sync. This could be due to an invalid configuration or catalog. Please contact Support for assistance. Error: {error}",
stack_trace=traceback.format_exc(),
),
),
)
)
).decode()
)
raise
def run() -> None:
args = sys.argv[1:]
source = _get_source(args)
launch(source, args)

@@ -1,21 +0,0 @@
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#
from typing import Any, Mapping, Optional
from airbyte_cdk import TState
from airbyte_cdk.models import ConfiguredAirbyteCatalog
from airbyte_cdk.sources.declarative.yaml_declarative_source import YamlDeclarativeSource
"""
This file provides the necessary constructs to interpret a provided declarative YAML configuration file into
source connector.
WARNING: Do not modify this file.
"""
# Declarative Source
class SourceGreenhouse(YamlDeclarativeSource):
def __init__(self, catalog: Optional[ConfiguredAirbyteCatalog], config: Optional[Mapping[str, Any]], state: TState, **kwargs):
super().__init__(catalog=catalog, config=config, state=state, **{"path_to_yaml": "manifest.yaml"})

@@ -0,0 +1,3 @@
# Copyright (c) 2024 Airbyte, Inc., all rights reserved.
pytest_plugins = ["airbyte_cdk.test.utils.manifest_only_fixtures"]
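The `pytest_plugins` line above is what makes the `components_module` fixture available to `test_migrate`, replacing the old `from source_greenhouse.components import ...` import. The plugin's real implementation is not part of this diff. The sketch below only illustrates the general idea of loading a connector's `components.py` by file path, under the assumption that the file sits at the connector root; `load_components_module` is a hypothetical name.

```python
import importlib.util
from pathlib import Path
from types import ModuleType
from typing import Optional


def load_components_module(connector_dir: Path) -> Optional[ModuleType]:
    """Import `<connector_dir>/components.py` as a module, if it exists."""
    components_path = connector_dir / "components.py"
    if not components_path.exists():
        # Connectors without custom components have nothing to load.
        return None
    spec = importlib.util.spec_from_file_location("components", components_path)
    if spec is None or spec.loader is None:
        return None
    module = importlib.util.module_from_spec(spec)
    # Execute the file so its classes become attributes of the module object.
    spec.loader.exec_module(module)
    return module
```

A test can then reach custom classes as attributes of the returned module, which is the access pattern `components_module.GreenhouseStateMigration` relies on.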

@@ -0,0 +1,19 @@
[build-system]
requires = [ "poetry-core>=1.0.0",]
build-backend = "poetry.core.masonry.api"
[tool.poetry]
name = "source-greenhouse-tests"
version = "0.6.0"
description = "Unit tests for source-greenhouse"
authors = ["Airbyte <contact@airbyte.io>"]
[tool.poetry.dependencies]
python = "^3.10,<3.13"
airbyte-cdk = "6.10.0"
pytest = "^8"
[tool.pytest.ini_options]
filterwarnings = [
"ignore:This class is experimental*"
]

@@ -3,16 +3,14 @@
#
from unittest.mock import MagicMock
from source_greenhouse.components import GreenhouseStateMigration
def test_migrate():
def test_migrate(components_module):
declarative_stream = MagicMock()
declarative_stream.retriever.partition_router.parent_stream_configs = [
{"partition_field": "parent_id"},
]
config = MagicMock()
state_migrator = GreenhouseStateMigration(declarative_stream, config)
state_migrator = components_module.GreenhouseStateMigration(declarative_stream, config)
stream_state = {
"1111111111": {"updated_at": "2025-01-01T00:00:00.000Z"},

@@ -1,185 +0,0 @@
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#
import json
from unittest.mock import MagicMock
import pytest
import requests
from airbyte_protocol_dataclasses.models import FailureType
from source_greenhouse.source import SourceGreenhouse
@pytest.fixture
def applications_stream():
source = SourceGreenhouse(MagicMock(), {"api_key": "123"}, MagicMock())
streams = source.streams({})
return [s for s in streams if s.name == "applications"][0]
def create_response(headers):
response = requests.Response()
response_body = {"next": "https://airbyte.io/next_url"}
response._content = json.dumps(response_body).encode("utf-8")
response.headers = headers
return response
def test_next_page_token_has_next(applications_stream):
headers = {"link": '<https://harvest.greenhouse.io/v1/applications?per_page=100&since_id=123456789>; rel="next"'}
response = create_response(headers)
next_page_token = applications_stream.retriever._next_page_token(
response=response, last_page_size=100, last_record={"data": "data"}, last_page_token_value=response.json()["next"]
)
assert next_page_token == {"next_page_token": "https://harvest.greenhouse.io/v1/applications?per_page=100&since_id=123456789"}
def test_next_page_token_has_not_next(applications_stream):
response = create_response({})
next_page_token = applications_stream.retriever._next_page_token(
response=response, last_page_size=100, last_record={"data": "data"}, last_page_token_value=response.json()["next"]
)
assert next_page_token is None
def test_request_params_next_page_token_is_not_none(applications_stream):
response = create_response({"link": f'<https://harvest.greenhouse.io/v1/applications?per_page={100}&since_id=123456789>; rel="next"'})
next_page_token = applications_stream.retriever._next_page_token(
response=response, last_page_size=100, last_record={"data": "data"}, last_page_token_value=response.json()["next"]
)
request_params = applications_stream.retriever._request_params(next_page_token=next_page_token, stream_state={})
path = applications_stream.retriever.paginator.path(next_page_token=next_page_token)
assert "applications?per_page=100&since_id=123456789" == path
assert request_params == {"per_page": 100}
def test_request_params_next_page_token_is_none(applications_stream):
request_params = applications_stream.retriever._request_params(stream_state={})
assert request_params == {"per_page": 100}
def test_parse_response_expected_response(applications_stream):
response = requests.Response()
response.status_code = 200
response_content = b"""
[
{
"status": "active",
"source": {
"public_name": "HRMARKET",
"id": 4000067003
},
"rejection_reason": null,
"rejection_details": null,
"rejected_at": null,
"prospective_office": null,
"prospective_department": null,
"prospect_detail": {
"prospect_stage": null,
"prospect_pool": null,
"prospect_owner": {
"name": "John Lafleur",
"id": 4218086003
}
},
"prospect": true,
"location": null,
"last_activity_at": "2020-11-24T23:24:37.049Z",
"jobs": [],
"job_post_id": null,
"id": 19214950003,
"current_stage": null,
"credited_to": {
"name": "John Lafleur",
"last_name": "Lafleur",
"id": 4218086003,
"first_name": "John",
"employee_id": null
},
"candidate_id": 17130511003,
"attachments": [],
"applied_at": "2020-11-24T23:24:37.023Z",
"answers": []
},
{
"status": "active",
"source": {
"public_name": "Jobs page on your website",
"id": 4000177003
},
"rejection_reason": null,
"rejection_details": null,
"rejected_at": null,
"prospective_office": null,
"prospective_department": null,
"prospect_detail": {
"prospect_stage": null,
"prospect_pool": null,
"prospect_owner": {
"name": "John Lafleur",
"id": 4218086003
}
},
"prospect": true,
"location": null,
"last_activity_at": "2020-11-24T23:25:13.804Z",
"jobs": [],
"job_post_id": null,
"id": 19214993003,
"current_stage": null,
"credited_to": {
"name": "John Lafleur",
"last_name": "Lafleur",
"id": 4218086003,
"first_name": "John",
"employee_id": null
},
"candidate_id": 17130554003,
"attachments": [],
"applied_at": "2020-11-24T23:25:13.781Z",
"answers": []
}
]
"""
response._content = response_content
parsed_response = applications_stream.retriever._parse_response(response, stream_state={}, records_schema={})
records = [dict(record) for record in parsed_response]
assert records == json.loads(response_content)
def test_parse_response_empty_content(applications_stream):
response = requests.Response()
response.status_code = 200
response._content = b"[]"
parsed_response = applications_stream.retriever._parse_response(response, stream_state={}, records_schema={})
records = [record for record in parsed_response]
assert records == []
def test_number_of_streams():
source = SourceGreenhouse(MagicMock(), {"api_key": "123"}, MagicMock())
streams = source.streams({})
assert len(streams) == 36
def test_ignore_403(applications_stream):
response = requests.Response()
response.status_code = 403
response._content = b""
parsed_response = applications_stream.retriever._parse_response(response, stream_state={}, records_schema={})
records = [record for record in parsed_response]
assert records == []
def test_retry_429(applications_stream):
response = requests.Response()
response.status_code = 429
response._content = b"{}"
error = applications_stream.retriever.requester.error_handler.interpret_response(response)
assert error.failure_type == FailureType.transient_error

@@ -74,6 +74,7 @@ The Greenhouse connector should not run into Greenhouse API limitations under no
| Version | Date | Pull Request | Subject |
|:-----------|:-----------|:---------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0.7.0-rc.1 | 2025-06-29 | [47283](https://github.com/airbytehq/airbyte/pull/47283) | Migrate to Manifest-only |
| 0.6.1 | 2025-03-22 | [53800](https://github.com/airbytehq/airbyte/pull/53800) | Update dependencies |
| 0.6.0 | 2025-03-14 | [55774](https://github.com/airbytehq/airbyte/pull/55774) | Promoting release candidate 0.6.0-rc.1 to a main version. |
| 0.6.0-rc.1 | 2025-03-14 | [54702](https://github.com/airbytehq/airbyte/pull/54702) | Update to latest airbyte-cdk, remove custom cursors. |