mirror of synced 2026-01-27 07:02:03 -05:00

86 Commits

Author SHA1 Message Date
Anton Karpets
2cfa6ea2c8 File-based CDK: fix schemas merge for nullable object types (#37619) 2024-05-02 10:40:20 +03:00
Ella Rohm-Ensing
b7819d9f6c python: assert actual == expected ordering (#36980) 2024-04-11 15:16:33 +00:00
Anatolii Yatsuk
157be91cb1 File-based CDK: Add skip_wrong_number_of_fields_error parameter for CSV parser (#36237)
Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com>
2024-03-20 22:49:49 +02:00
Tobias Macey
f67938993e [airbyte-cdk] Fix tab delimiter configuration in CSV file type (#35901) 2024-03-13 13:46:32 -03:00
Ella Rohm-Ensing
2ac5248387 Emit record counts in state messages for concurrent streams (#35907)
Co-authored-by: brianjlai <brian.lai@airbyte.io>
Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com>
2024-03-08 19:08:59 -05:00
Ella Rohm-Ensing
a4dca3b45b CDK: assert >0 state messages per read (fix tests) (#35906)
<!--
Thanks for your contribution! 
Before you submit the pull request, 
I'd like to kindly remind you to take a moment and read through our guidelines
to ensure that your contribution aligns with the type of contributions our project accepts.
All the information you need can be found here:
   https://docs.airbyte.com/contributing-to-airbyte/

We truly appreciate your interest in contributing to Airbyte,
and we're excited to see what you have to offer! 

If you have any questions or need any assistance, feel free to reach out in #contributions Slack channel.
-->

## What
* After https://github.com/airbytehq/airbyte/pull/35905, every successful sync should emit a state message. However, a few tests were too lenient and weren't actually _successful_ syncs. This PR fixes those cases and adds validation that at least one state message is emitted per successful sync.

## How
* Add an assertion that a successful sync emits at least one state message
* Fix some tests that previously "output 0 expected records" but actually errored silently; these are no longer run as read tests
* Fix a test that failed silently due to missing multi-format support
* Add a new test for syncs that successfully output 0 records
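The validation described above can be sketched as a single test helper. This is a minimal illustration, not the CDK's actual test code: the helper name `assert_successful_read` and the plain-dict message shape (a `"type"` key loosely modeled on Airbyte protocol `RECORD`, `STATE`, and `TRACE` messages) are hypothetical assumptions.

```python
from typing import Any, Dict, List

# Hypothetical message shape: plain dicts with a "type" key, loosely modeled on
# Airbyte protocol messages ("RECORD", "STATE", "TRACE"). Not the CDK's real types.
Message = Dict[str, Any]


def assert_successful_read(messages: List[Message], expected_records: List[Dict[str, Any]]) -> None:
    """Validate a read: no error traces, expected records in order, >= 1 state message."""
    # A read that "output 0 records" but actually errored would carry an error trace.
    errors = [
        m for m in messages
        if m["type"] == "TRACE" and m.get("trace", {}).get("type") == "ERROR"
    ]
    assert not errors, f"read emitted error traces: {errors}"

    records = [m["record"]["data"] for m in messages if m["type"] == "RECORD"]
    assert records == expected_records, f"expected {expected_records}, got {records}"

    # Every successful sync should emit at least one state message,
    # including syncs that successfully output zero records.
    states = [m for m in messages if m["type"] == "STATE"]
    assert len(states) > 0, "successful read emitted no state messages"
```

Under these assumptions, a zero-record sync such as `assert_successful_read([{"type": "STATE", "state": {}}], [])` passes, while an empty message list fails. Checking for error traces first is what keeps a silently failing read from masquerading as a successful zero-record sync.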

## 🚨 User Impact 🚨
None (test changes only)


## Pre-merge Actions
*Expand the relevant checklist and delete the others.*

<details><summary><strong>New Connector</strong></summary>

### Community member or Airbyter

- **Community member?** Grant edit access to maintainers ([instructions](https://docs.github.com/en/github/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork#enabling-repository-maintainer-permissions-on-existing-pull-requests))
- Unit & integration tests added and passing. Community members, please provide proof of success locally, e.g., a screenshot or copy-pasted unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow the instructions in the README. For Java connectors, run `./gradlew :airbyte-integrations:connectors:<name>:integrationTest`.
- Connector version is set to `0.0.1`
    - `Dockerfile` has version `0.0.1`
- Documentation updated
    - Connector's `README.md`
    - Connector's `bootstrap.md`. See [description and examples](https://docs.google.com/document/d/1ypdgmwmEHWv-TrO4_YOQ7pAJGVrMp5BOkEVh831N260/edit?usp=sharing)
    - `docs/integrations/<source or destination>/<name>.md` including changelog with an entry for the initial version. See changelog [example](https://docs.airbyte.io/integrations/sources/stripe#changelog)
    - `docs/integrations/README.md`

### Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

- Create a non-forked branch based on this PR and test the below items on it
- Build is successful
- If new credentials are required for use in CI, add them to GSM. [Instructions](https://docs.airbyte.io/connector-development#using-credentials-in-ci).

</details>

<details><summary><strong>Updating a connector</strong></summary>

### Community member or Airbyter

- Grant edit access to maintainers ([instructions](https://docs.github.com/en/github/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork#enabling-repository-maintainer-permissions-on-existing-pull-requests))
- Unit & integration tests added


### Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

- Create a non-forked branch based on this PR and test the below items on it
- Build is successful
- If new credentials are required for use in CI, add them to GSM. [Instructions](https://docs.airbyte.io/connector-development#using-credentials-in-ci).

</details>

<details><summary><strong>Connector Generator</strong></summary>

- Issue acceptance criteria met
- PR name follows [PR naming conventions](https://docs.airbyte.com/contributing-to-airbyte/resources/pull-requests-handbook)
- If adding a new generator, add it to the [list of scaffold modules being tested](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connector-templates/generator/build.gradle#L41)
- The generator test modules (all connectors with `-scaffold` in their name) have been updated with the latest scaffold by running `./gradlew :airbyte-integrations:connector-templates:generator:generateScaffolds` then checking in your changes
- Documentation which references the generator is updated as needed

</details>

<details><summary><strong>Updating the Python CDK</strong></summary>

### Airbyter

Before merging:
- Pull Request description explains what problem it is solving
- Code change is unit tested
- Build and mypy check pass
- Smoke test the change on at least one affected connector
   - On GitHub: Run [this workflow](https://github.com/airbytehq/airbyte/actions/workflows/connectors_tests.yml), passing `--use-local-cdk --name=source-<connector>` as options
   - Locally: `airbyte-ci connectors --use-local-cdk --name=source-<connector> test`
- PR is reviewed and approved
      
After merging:
- [Publish the CDK](https://github.com/airbytehq/airbyte/actions/workflows/publish-cdk-command-manually.yml)
   - The CDK does not follow proper semantic versioning. Choose minor if the change has significant user impact or is a breaking change. Choose patch otherwise.
   - Write a thoughtful changelog message so we know what was updated.
- Merge the platform PR that was auto-created for updating the Connector Builder's CDK version
   - This step is optional if the change does not affect the connector builder or declarative connectors.

</details>
2024-03-08 14:21:46 -08:00
Ella Rohm-Ensing
acbdc2d6e1 Introduce FinalStateCursor to emit state messages at the end of full refresh syncs (#35905)
Co-authored-by: brianjlai <brian.lai@airbyte.io>
2024-03-08 16:58:26 -05:00
Ella Rohm-Ensing
a090088594 file cdk: handle scalar values that resolve to None (#35688)

## What
* Closes https://github.com/airbytehq/airbyte/issues/34151
* Closes https://github.com/airbytehq/oncall/issues/4386

## How
Handle cases where the Python value of a PyArrow scalar is `None`. This can be caused by null values in the data, as well as by null-like values such as `NaT` (similar to `NaN`). We previously handled this only for binary values that resolved to `None`; now we handle `None` for values of any type.
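A minimal sketch of the idea, assuming a hypothetical converter named `scalar_to_json_value` (not the CDK's actual function). It accepts any object with an `as_py()` method, such as a `pyarrow.Scalar`, and checks for `None` once, before any type-specific handling, so a null timestamp or integer is treated the same way a null binary value already was. The UTF-8 decoding of bytes is an illustrative assumption.

```python
import datetime
from typing import Any, Protocol


class ArrowLikeScalar(Protocol):
    """Anything exposing as_py(), e.g. a pyarrow.Scalar."""

    def as_py(self) -> Any: ...


def scalar_to_json_value(scalar: ArrowLikeScalar) -> Any:
    """Hypothetical converter from a Parquet cell to a JSON-friendly value."""
    py_value = scalar.as_py()
    # Nulls (and null-like values such as NaT) surface as None regardless of
    # column type, so return early before any type-specific conversion.
    if py_value is None:
        return None
    if isinstance(py_value, bytes):
        # Assumption for illustration: binary columns hold UTF-8 text.
        return py_value.decode("utf-8")
    if isinstance(py_value, (datetime.datetime, datetime.date)):
        return py_value.isoformat()
    return py_value
```

Without the early return, a null timestamp would reach `py_value.isoformat()` and raise `AttributeError`. With pyarrow installed, a call such as `scalar_to_json_value(pa.scalar(None, type=pa.timestamp("ns")))` should return `None`.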

## 🚨 User Impact 🚨
No breaking changes. After this CDK version is released, we should update the CDK dependency in Source S3 and in any other file-based sources that parse Parquet.


2024-03-05 09:07:02 -08:00
Brian Lai
ef98194673 Emit final state message for full refresh syncs and consolidate read flows (#35622) 2024-03-05 01:05:06 -05:00
Danny Tiesling
e671aa320d 🐛 Source S3: fix exception when setting CSV stream delimiter to \t. (#35246)
Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com>
Co-authored-by: marcosmarxm <marcosmarxm@gmail.com>
2024-02-23 14:34:29 -03:00
Artem Inzhyyants
0954ad3d3a Airbyte CDK: add interpolation for request options (#35485)
Signed-off-by: Artem Inzhyyants <artem.inzhyyants@gmail.com>
Co-authored-by: Alexandre Girard <alexandre@airbyte.io>
2024-02-22 19:40:44 +01:00
Catherine Noll
e8910e427a File-based CDK: make incremental syncs concurrent (#34540) 2024-02-07 20:41:04 -05:00
Catherine Noll
7f97f245bc CDK: fix flaky scenario-based tests by sorting on k & v (#34912) 2024-02-06 18:55:39 -05:00
Maxime Carbonneau-Leclerc
ca8590e2b4 Have StateBuilder return our actual state object and not simply a dict (#34625) 2024-01-30 08:46:03 -05:00
Catherine Noll
eb31e4d2ba File-based CDK: make full refresh concurrent (#34411) 2024-01-29 19:33:50 -05:00
Baz
cf7f700bbb 🎉 Airbyte CDK (File-based CDK): Stop the sync if the record could not be parsed (#32589) 2024-01-11 21:26:23 +02:00
Joe Reuter
9065181e77 Unstructured parser: Support txt (#32929)
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-12-15 11:31:45 +01:00
Joe Reuter
c1e428f35c File CDK: Handle 422 errors separately (#33300) 2023-12-13 11:03:36 +00:00
Maxime Carbonneau-Leclerc
0c2d43fdf9 Issue 32871/extract trace message creation (#33227) 2023-12-11 09:20:45 -05:00
Augustin
0b33caecda Revert "[skip ci] formatting: add missing license headers (#33250)" (#33289) 2023-12-11 11:38:37 +01:00
Augustin
60c1cc01ad [skip ci] formatting: add missing license headers (#33250) 2023-12-11 10:15:18 +01:00
Joe Reuter
aa220fc515 Stop sync on traced exception (#33246)
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-12-08 18:07:25 +01:00
Joe Reuter
f5ac5cfd80 File CDK: Add file processing via API to document file type parser (#32781)
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-12-08 15:48:37 +01:00
Joe Reuter
7fd92e2a03 File CDK: Parser defined primary key (#33009)
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-12-08 15:15:33 +01:00
Joe Reuter
5b682ef74f Unstructured parser: Handle parsing errors better (#32700)
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-12-08 11:47:05 +01:00
Catherine Noll
7ed47ee7d9 File-based CDK: hide the primary key field from config (#33172) 2023-12-06 11:12:50 -05:00
Maxime Carbonneau-Leclerc
ba83309bb1 [ISSUE #32870] Adding entrypoint wrapper and migrating file based and… (#33103) 2023-12-06 08:46:38 -05:00
Joe Reuter
f8b0b3e99e File CDK: Improve stream config appearance (#32420) 2023-11-14 11:49:19 +01:00
Joe Reuter
f1a11e1927 File CDK: Allow skipping unparseable file types (#32092)
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-11-09 16:48:24 +01:00
Joe Reuter
e113ff66c5 CDK: Make consts required in Pydantic generated json schemas (#32251) 2023-11-09 16:12:11 +01:00
Joe Reuter
66dd29f764 File CDK unstructured parser: Improve file type detection (#31997) 2023-11-02 12:19:27 +01:00
Martin Hwasser
bc4b7198a9 Add pptx support in file based cdk (#31912)
Co-authored-by: Joe Reuter <joe@airbyte.io>
2023-10-30 14:42:39 +01:00
Joe Reuter
e3793c1491 Move over unstructured parser (#31390)
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-10-26 17:50:57 +02:00
Anatolii Yatsuk
c719137df3 🐛 Airbyte CDK: Fix flake errors in file-based CDK (#31771) 2023-10-24 16:15:11 +03:00
Anatolii Yatsuk
ce2342dde8 🎉 Airbyte CDK: Add CustomFileBasedException for custom errors in file-based CDK (#31704) 2023-10-24 11:09:50 +00:00
Alexandre Girard
7da2822488 Concurrent CDK: catch exceptions from worker thread and add integration test scenarios (#31245)
Co-authored-by: girarda <girarda@users.noreply.github.com>
2023-10-23 08:39:58 -07:00
Joe Reuter
d474827068 File CDK: Don't fetch full file list for availability check (#31651)
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-10-23 16:14:41 +02:00
Joe Reuter
bb07939646 File CDK: Add analytics messages for parser usage (#31498)
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-10-19 15:42:51 +02:00
Alexandre Girard
ef9bd72a7e Parameterize ScenarioBuilder on Source type (#31244)
Co-authored-by: girarda <girarda@users.noreply.github.com>
Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com>
Co-authored-by: Maxime Carbonneau-Leclerc <maxi297@users.noreply.github.com>
2023-10-16 17:12:18 -07:00
Joe Reuter
e35a1f2cd9 File CDK: Allow configuration of parsed records during check and discover from parser (#31281)
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-10-13 09:50:22 +02:00
Roman Yermilov [GL]
e561d5d432 Airbyte CDK: fix none type binary error in parquet parser (#31073) 2023-10-05 15:56:02 +04:00
Anton Karpets
767800d2d7 🐛Airbyte CDK: fix parsing of UUID fields in avro files (#31096) 2023-10-05 10:53:18 +03:00
Marius Posta
7ae97175a6 gradle: fix repo wide behaviour (#30607) 2023-09-28 05:01:13 -07:00
Maxime Carbonneau-Leclerc
b6836ad950 [ISSUE #30353] remove file_type from stream config (#30453) 2023-09-18 08:50:00 -04:00
Maxime Carbonneau-Leclerc
48e8816b6b [oncall #2838] migrate parsing errors as config errors (#30209) 2023-09-06 13:38:48 -04:00
Maxime Carbonneau-Leclerc
5b653676aa Update spec and fix autogenerated headers with skip after (#30123) 2023-09-03 09:26:53 -04:00
Maxime Carbonneau-Leclerc
399b4d1fca File-based CDK: ensure no errors in Sentry given empty CSV (#29944) 2023-09-02 09:40:08 -04:00
Maxime Carbonneau-Leclerc
e2fb04f72d File-based CDK: allow user to provide column names (#29868) 2023-08-28 18:00:19 -04:00
Maxime Carbonneau-Leclerc
82a96e0c69 File-based CDK: allow for extension mismatch (#29835) 2023-08-25 11:44:49 -04:00
Maxime Carbonneau-Leclerc
40b76a7813 Source S3: v4 rollout/feature parity (#29753) 2023-08-23 11:30:08 -04:00