mirror of synced 2025-12-21 02:51:29 -05:00
1266 Commits

Author SHA1 Message Date
Alexandre Girard
97a353d5c5 Run mypy on airbyte-cdk as part of the build pipeline and fix typing issues in the file-based module (#27790)
* Try running only on modified files

* make a change

* return something with the wrong type

* Revert "return something with the wrong type"

This reverts commit 23b828371e.

* fix typing in file-based

* format

* Mypy

* fix

* leave as Mapping

* Revert "leave as Mapping"

This reverts commit 908f063f70.

* Use Dict

* update

* move dict()

* Revert "move dict()"

This reverts commit fa347a8236.

* Revert "Revert "move dict()""

This reverts commit c9237df2e4.

* Revert "Revert "Revert "move dict()"""

This reverts commit 5ac1616414.

* use Mapping

* point to config file

* comment

* strict = False

* remove --

* Revert "comment"

This reverts commit 6000814a82.

* install types

* install types in same command as mypy runs

* non-interactive

* freeze version

* pydantic plugin

* plugins

* update

* ignore missing import

* Revert "ignore missing import"

This reverts commit 1da7930fb7.

* Install pydantic instead

* fix

* this passes locally

* strict = true

* format

* explicitly import models

* Update

* remove old mypy.ini config

* temporarily disable mypy

* format

* any

* format

* fix tests

* format

* Automated Commit - Formatting Changes

* Revert "temporarily disable mypy"

This reverts commit eb8470fa3f.

* implicit reexport

* update test

* fix mypy

* Automated Commit - Formatting Changes

* fix some errors in tests

* more type fixes

* more fixes

* more

* .

* done with tests

* fix last files

* format

* Update gradle

* change source-stripe

* only run mypy on cdk

* remove strict

* Add more rules

* update

* ignore missing imports

* cast to string

* Allow untyped decorator

* reset to master

* move to the cdk

* derp

* move explicit imports around

* Automated Commit - Formatting Changes

* Revert "move explicit imports around"

This reverts commit 56e306b72f.

* move explicit imports around

* Upgrade mypy version

* point to config file

* Update readme

* Ignore errors in the models module

* Automated Commit - Formatting Changes

* move check to gradle build

* Any

* try checking out master too

* Revert "try checking out master too"

This reverts commit 8a8f3e373c.

* fetch master

* install mypy

* try without origin

* fetch from the script

* checkout master

* ls the branches

* remotes/origin/master

* remove some cruft

* comment

* remove pydantic types

* unpin mypy

* fetch from the script

* Update connectors base too

* modify a non-cdk file to confirm it doesn't get checked by mypy

* run mypy after generateComponentManifestClassFiles

* run from the venv

* pass files as arguments

* update

* fix when running without args

* with subdir

* path

* try without /

* ./

* remove filter

* try resetting

* Revert "try resetting"

This reverts commit 3a54c424de.

* exclude autogen file

* do not use the github action

* works locally

* remove extra fetch

* run on connectors base

* try bad  typing

* Revert "try bad  typing"

This reverts commit 33b512a3e4.

* reset stripe

* Revert "reset stripe"

This reverts commit 28f23fc6dd.

* Revert "Revert "reset stripe""

This reverts commit 5bf5dee371.

* missing return type

* do not ignore the autogen file

* remove extra installs

* run from venv

* Only check files modified on current branch

* Revert "Only check files modified on current branch"

This reverts commit b4b728e654.

* use merge-base

* Revert "use merge-base"

This reverts commit 3136670cbf.

* try with updated mypy

* bump

* run other steps after mypy

* reset task ordering

* run mypy though

* looser config

* tests pass

* fix mypy issues

* type: ignore

* optional

* this is always a bool

* ignore

* fix typing issues

* remove ignore

* remove mapping

* Automated Commit - Formatting Changes

* Revert "remove ignore"

This reverts commit 9ffeeb6cb1.

* update config

---------

Co-authored-by: girarda <girarda@users.noreply.github.com>
Co-authored-by: Joe Bell <joseph.bell@airbyte.io>
2023-07-13 16:55:48 -07:00
Brian Lai
8e835963c1 [file-based cdk] spec schema improvements and fixes (#28263)
* fix spec schema incompatibility with ui and improve spec documentation and titles

* fix schema to account for latest changes pulled from main

* tests

* remove duplicate test
2023-07-13 15:14:05 -04:00
Catherine Noll
48843cf807 File-based CDK: handle user-input schema (#28052) 2023-07-13 11:59:42 -04:00
Brian Lai
f0951ffbd8 [file-based cdk] file based spec boilerplate backed by pydantic models (#28139)
* file based spec operation backed by pydantic models

* pr feedback to clean up various config and the test scenarios

* fix tests after rebase
2023-07-12 19:42:50 -04:00
Alexandre Girard
40e62fbcb4 Implement parquet parser (#28064)
* Implement parquet parser

* move comment

* comments

* Automated Commit - Formatting Changes

* cleanup

* Update

* remove superfluous method

* update

* format

---------

Co-authored-by: girarda <girarda@users.noreply.github.com>
2023-07-12 13:40:30 -07:00
maxi297
07da56914f 🤖 Bump patch version of Airbyte CDK 2023-07-11 17:43:50 +00:00
Maxime Carbonneau-Leclerc
df2a6e50bb Issue 21014/oauth requests (#27973)
* [ISSUE #27494] fix type issue caused by connector builder logging

* [ISSUE #21014] log request/response for oauth as 'global_requests'

* formatcdk

* [ISSUE #21014] support DeclarativeOauth2Authenticator as well

* [ISSUE #21014] improving message grouper tests

* formatcdk

* Test solution with logic in MessageRepository (#27990)

* Test solution with logic in MessageRepository

* Solution without creating a new ModelToComponentFactory

* [ISSUE #21014] adding tests

* [ISSUE #21014] add title and description to global requests

* Revert "Solution without creating a new ModelToComponentFactory"

This reverts commit f17799ecff.

* Automated Commit - Formatting Changes

* [ISSUE #21014] code review

* [ISSUE #21014] do not break on log appender conflict

* Automated Commit - Formatting Changes

* [ISSUE #21014] code review

* formatcdk

* [ISSUE #21014] moving is_global to is_auxiliary
2023-07-11 13:37:38 -04:00
Catherine Noll
07286f7069 File-based CDK: implement schemaless option (#28063) 2023-07-11 11:52:47 -04:00
Brian Lai
f79aa72d64 refactor config validation_policy to not store policies on the config (#28097) 2023-07-10 20:27:02 -04:00
maxi297
b058261081 🤖 Bump patch version of Airbyte CDK 2023-07-06 12:09:49 +00:00
Maxime Carbonneau-Leclerc
c609897848 Stream state is not recorded if cursor field is result of transformation (#27915)
* [ISSUE #27494] move transformation for record selection

* formatcdk

* [ISSUE #27494] fix type issue caused by connector builder logging

* formatcdk

* [ISSUE #27494] code review
2023-07-06 08:02:16 -04:00
Catherine Noll
cfec41b1e5 File-based CDK: implement schema validation policy options (#27816) 2023-07-06 03:35:48 -04:00
Brian Lai
aa57cc21ba [26763] csv config options validation and use by reader (#27850)
* csv options validation applying dialect to reader and refactoring parser interfaces a bit

* fix tests

* pr feedback

* add quoting behavior config format
2023-07-05 18:33:23 -04:00
Lake Mossman
b78762f641 Update the request_body_data description to remove typos and be more readable (#27783)
* fix typo in schema

* remove hyphen
2023-07-03 15:45:59 -07:00
maxi297
f472bc0667 🤖 Bump patch version of Airbyte CDK 2023-06-29 17:31:02 +00:00
Maxime Carbonneau-Leclerc
4376527266 Fixing an issue with square as CATs only compare string and not datetime (#27840)
* Fixing an issue with square as CATs only compare string and not datetime

* formatcdk
2023-06-29 13:23:06 -04:00
maxi297
95bd388f9b 🤖 Bump patch version of Airbyte CDK 2023-06-29 14:22:36 +00:00
Maxime Carbonneau-Leclerc
91a56171a0 [ISSUE #26343] update close_slice to use the greater record (#27818)
* [ISSUE #26343] update close_slice to use the greater record

* Renaming parameter for close_slice

* code review
2023-06-29 10:09:12 -04:00
Catherine Noll
6fb53c65ee File-based CDK: implement AvailabilityStrategy.check_availability (#27609) 2023-06-28 17:32:07 -04:00
maxi297
59f6acf2f4 🤖 Bump minor version of Airbyte CDK 2023-06-28 20:58:41 +00:00
Maxime Carbonneau-Leclerc
a013fad5a9 [ISSUE-26343] data feed (#27475)
* [ISSUE #26581] per partition cursor

* [ISSUE #26581] format

* [ISSUE #26581] clean up state management

* [ISSUE #26581] improving Hashabledict

* [ISSUE #26581] format cdk

* [ISSUE #26581] fix tests

* [ISSUE #26581] code review from girarda

* Retrigger pipeline

* Decouple cursor and stream slicer and pushing state management as far up cursor as possible

* Format cdk

* Small fixes/comments

* DatetimeBasedCursor should not update state based on slice (for now at least since it wasn't doing this before)

* [ISSUE #26581] code review

* Automated Commit - Formatting Changes

* [ISSUE #26581] validation overlapping keys

* [ISSUE #26581] add typing

* [ISSUE #26581] code review

* Remove SyncMode from stream_slices

* Removing SyncMode from stream_slices up until SimpleRetriever and fixing typing

* [ISSUE-26434] replacing Record primitive by class

* [ISSUE-26434] update Cursor.update_state to use new record object

* Issue 26343/data feed incremental sync solution 2 (#27481)

* TMP [ISSUE-26434] first solution to enable stop condition on pagination

* TMP [ISSUE-26434] second solution to enable stop condition on pagination

* TMP [ISSUE-26434] second solution fix

* [ISSUE #26343] fixing behavior and adding tests

* [ISSUE #26343] only updating state once a slice to allow for data feed

* [ISSUE #26343] removing freezing of cursor

* format cdk

* [ISSUE #26343] ensure data_feed doesn't have end_datetime

* [ISSUE #26343] self review

* [ISSUE #26343] code review

* [ISSUE #26343] code review clean up

* [ISSUE #26343] code review clean up

* Code review

* [ISSUE #26343] add warn log message in DatetimeBasedCursor

* format

* Format
2023-06-28 16:53:00 -04:00
girarda
8f8cbd80a7 🤖 Bump patch version of Airbyte CDK 2023-06-28 16:38:56 +00:00
Alexandre Girard
4d08781d04 Revert "Low-Code CDK: make RecordFilter.filter_records as generator (#24772)" (#27789)
This reverts commit 032f9b8045.
2023-06-28 09:23:05 -07:00
Alexandre Girard
6ebabdc2fa File-based CDK: Support for incremental syncs (#27382)
* New file-based CDK module scaffolding

* Address code review comments

* Formatting

* Automated Commit - Formatting Changes

* Apply suggestions from code review

Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
Co-authored-by: Alexandre Girard <alexandre@airbyte.io>

* Automated Commit - Formatting Changes

* address CR comments

* Update tests to use builder pattern

* Move max files for schema inference onto the discovery policy

* Reorganize stream & its dependencies

* File CDK: error handling for CSV parser (#27176)

* file url and updated_at timestamp is added to state's history field

* Address CR comments

* Address CR comments

* Use stream_slice to determine which files to sync

* fix

* test with no input state

* test with multiple files

* filter out older files

* group by timestamp

* Add another test

* comment

* use min time

* skip files that are already in the history

* move the code around

* include files that are not in the history

* remove start_timestamp

* cleanup

* sync missing recent files even if history is more recent

* remove old files if history is full

* resync files if history is incomplete

* sync recent files

* comment

* configurable history size

* configurable days to sync if history is full

* move to a stateful object

* Only update state once per file

* two unit tests

* Unit tests

* missing files

* remove inner state

* fix tests

* fix interface

* fix constructor

* Update interface

* cleanup

* format

* Update

* cleanup

* Add timestamp and source file to schema

* set file uri on record

* format

* comment

* reset

* notes

* delete dead code

* format

* remove dead code

* remove dead code

* warning if history is not complete

* always set is_history_partial in the state

* rename

* Add a readme

* format

* Update

* rename

* rename

* missing files

* get instead of compute

* sort alphabetically, and sync everything if the history is not partial

* unit tests

* Update airbyte-cdk/python/airbyte_cdk/sources/file_based/README.md

Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com>

* Update docs

* reset

* Test to verify we remove files sorted (datetime, alphabetically)

* comment

* Update scenario

* Rename method to get_state

* If the file's ts is equal to the earliest ts, only sync it if it's alphabetically greater than the file

* add missing test

* rename

* rename and update comments

* Update comment for clarity

* inject the cursor

* add interface

* comment

* Handle the case where the file has been modified since it was synced

* Only inject from AbstractFileSource

* keep the remote files in the stream slices

* Use file_based typedefs

* format

* Update the comment

* simplify the logic, update comment, and add a test

* Add a comment

* slightly cleaner

* clean up

* typing

* comment

* I think this is simpler to reason about

* create the cursor in the source

* update

* Remove methods from FiledBasedStreamReader and AbstractFileBasedStream interface (#27736)

* update the interface

* Add a comment

* rename

---------

Co-authored-by: Catherine Noll <noll.catherine@gmail.com>
Co-authored-by: clnoll <clnoll@users.noreply.github.com>
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
2023-06-27 15:58:26 -07:00
flash1293
d0d906c0a7 🤖 Bump patch version of Airbyte CDK 2023-06-27 06:33:44 +00:00
Joe Reuter
8aba48810c Low-code CDK: Serialize request body as string for connector builder module (#27657)
* serialize request body as string

* fix some bugs
2023-06-27 08:27:16 +02:00
brianjlai
4cd2cbbea5 🤖 Bump patch version of Airbyte CDK 2023-06-23 17:55:25 +00:00
midavadim
c44c3eae48 CDK: availability check - handle HttpErrors which happen during slice extraction (#26630)
* for the availability check, handle HttpError raised during slice extraction (reading of the parent stream),
updated reason messages,
moved the check availability call under the common try/except that handles errors during the usual stream read,
moved the log messages that indicate the start of the stream sync before the availability check, to make it easier to see which stream is the source of errors

* why do we return here and not try next stream?

* fixed bug in CheckStream, now we try to check availability for all streams
2023-06-23 13:15:25 -04:00
Joe Reuter
c53d1fa29d Datetime inferrer: Improve detected formats (#27546)
* consolidate formats

* Automated Commit - Formatting Changes

* consolidate formats

* consolidate formats

---------

Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-06-23 05:23:33 -04:00
maxi297
516a8a59c4 🤖 Bump minor version of Airbyte CDK 2023-06-22 17:01:00 +00:00
Maxime Carbonneau-Leclerc
a45a1e3341 Maxi297/refactoring declarative state management (#27445)
* [ISSUE #26581] per partition cursor

* [ISSUE #26581] format

* [ISSUE #26581] clean up state management

* [ISSUE #26581] improving Hashabledict

* [ISSUE #26581] format cdk

* [ISSUE #26581] fix tests

* [ISSUE #26581] code review from girarda

* Retrigger pipeline

* Decouple cursor and stream slicer and pushing state management as far up cursor as possible

* Format cdk

* Small fixes/comments

* DatetimeBasedCursor should not update state based on slice (for now at least since it wasn't doing this before)

* [ISSUE #26581] code review

* Automated Commit - Formatting Changes

* [ISSUE #26581] validation overlapping keys

* [ISSUE #26581] add typing

* [ISSUE #26581] code review

* Remove SyncMode from stream_slices

* Removing SyncMode from stream_slices up until SimpleRetriever and fixing typing

* format cdk
2023-06-22 12:54:36 -04:00
Alexandre Girard
d548587161 [ISSUE #27289] Document macros output in the manifest schema (#27600)
* Add example for macros

* Update changelog

* Revert "Update changelog"

This reverts commit 2993e5820e.
2023-06-22 09:14:17 -07:00
Brian Lai
02e4bd07f7 [26989] Add request filter for cloud and integration test fixtures for e2e sync testing (#27534)
* add the request filters and integration test fixtures

* pr feedback and some tweaks to the testing framework

* optimize the cache for more hits

* formatting

* remove cache
2023-06-22 12:14:07 -04:00
Catherine Noll
a8e99a46e6 File CDK: define streams via glob list (#27476) 2023-06-22 11:50:35 -04:00
maxi297
49a60ef735 🤖 Bump patch version of Airbyte CDK 2023-06-22 14:04:07 +00:00
Maxime Carbonneau-Leclerc
d9a5e2d873 🐛 Source zenloop: update to state per partition (#27556)
* [ISSUE #26581] per partition cursor

* [ISSUE #26581] format

* [ISSUE #26581] clean up state management

* [ISSUE #26581] improving Hashabledict

* [ISSUE #26581] format cdk

* [ISSUE #26581] fix tests

* [ISSUE #26581] code review from girarda

* Retrigger pipeline

* [ISSUE #26581] code review

* Automated Commit - Formatting Changes

* [ISSUE #26581] validation overlapping keys

* [ISSUE #26581] add typing

* [ISSUE #26581] code review

* [ISSUE #26607] zenloop migration (#27243)

* [ISSUE #26607] zenloop migration implementation without tests

* [ISSUE #26607] zenloop migration adding edge cases

* [ISSUE #26607] add cursor field for state

* [ISSUE #26607] update abnormal state

* [ISSUE #26607] ensure default state

* [ISSUE #26607] updating CATs state

* [ISSUE #26607] revert migrating cursor

* [ISSUE #26607] remove default cursor value

* [ISSUE #26607] improve error message

* [ISSUE #26607] changelog

---------

Co-authored-by: Augustin <augustin@airbyte.io>

* 🤖 Auto format source-zenloop code [skip ci]

* Automated Commit - Formatting Changes

* [ISSUE #26581] move partition serialization to JSON

* Revert "[ISSUE #26607] zenloop migration (#27243)"

This reverts commit 5c6f19b775.

* Revert "Revert "[ISSUE #26607] zenloop migration (#27243)""

This reverts commit e363fd6cb8.

* [ISSUE #26607] update zenloop version

* TMP specify cdk version

* [ISSUE #26607] do not lock zenloop airbyte_cdk version

* trigger pipeline

* Automated Commit - Formatting Changes

* trigger pipeline

---------

Co-authored-by: Augustin <augustin@airbyte.io>
Co-authored-by: octavia-squidington-iii <octavia-squidington-iii@users.noreply.github.com>
2023-06-22 08:42:20 -05:00
maxi297
59ff7b1c6c 🤖 Bump minor version of Airbyte CDK 2023-06-21 17:06:04 +00:00
Maxime Carbonneau-Leclerc
8926970c86 Issue 26581/per partition cursor (#27223)
* [ISSUE #26581] per partition cursor

* [ISSUE #26581] format

* [ISSUE #26581] clean up state management

* [ISSUE #26581] improving Hashabledict

* [ISSUE #26581] format cdk

* [ISSUE #26581] fix tests

* [ISSUE #26581] code review from girarda

* Retrigger pipeline

* [ISSUE #26581] code review

* Automated Commit - Formatting Changes

* [ISSUE #26581] validation overlapping keys

* [ISSUE #26581] add typing

* [ISSUE #26581] code review

* [ISSUE #26607] zenloop migration (#27243)

* [ISSUE #26607] zenloop migration implementation without tests

* [ISSUE #26607] zenloop migration adding edge cases

* [ISSUE #26607] add cursor field for state

* [ISSUE #26607] update abnormal state

* [ISSUE #26607] ensure default state

* [ISSUE #26607] updating CATs state

* [ISSUE #26607] revert migrating cursor

* [ISSUE #26607] remove default cursor value

* [ISSUE #26607] improve error message

* [ISSUE #26607] changelog

---------

Co-authored-by: Augustin <augustin@airbyte.io>

* 🤖 Auto format source-zenloop code [skip ci]

* Automated Commit - Formatting Changes

* [ISSUE #26581] move partition serialization to JSON

* Revert "[ISSUE #26607] zenloop migration (#27243)"

This reverts commit 5c6f19b775.

* [ISSUE #26607] revert zenloop

---------

Co-authored-by: Augustin <augustin@airbyte.io>
Co-authored-by: octavia-squidington-iii <octavia-squidington-iii@users.noreply.github.com>
2023-06-21 12:59:11 -04:00
girarda
3a8601eee8 🤖 Bump minor version of Airbyte CDK 2023-06-20 21:49:37 +00:00
Alexandre Girard
dee2e9a905 🐛 Use url encoding in oauth refresh request (#27523)
* Revert "🐛 CDK: replace `data` with `json` when making OAuth calls (#27350)"

This reverts commit 780f4415d9.

* Revert "Set content-type header on oauth request (#27225)"

This reverts commit 2864f72ff4.
2023-06-20 14:41:06 -07:00
Catherine Noll
f464a330f8 File-based CDK module scaffolding (#27122)
Includes CSV schema inference & record parser (#27176)

---------

Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
Co-authored-by: Alexandre Girard <alexandre@airbyte.io>
2023-06-19 11:01:11 -04:00
Lake Mossman
8be753df02 add link to record selector docs (#27291)
* add link to record selector docs

* Automated Commit - Formatting Changes

---------

Co-authored-by: lmossman <lmossman@users.noreply.github.com>
2023-06-16 12:09:59 -07:00
davydov-d
33b627d543 🤖 Bump patch version of Airbyte CDK 2023-06-14 19:59:56 +00:00
Denys Davydov
780f4415d9 🐛 CDK: replace data with json when making OAuth calls (#27350)
* Connector health: source hubspot, gitlab, snapchat-marketing: fix builds

* Airbyte CDK: replace data with json when making OAuth calls
2023-06-14 22:51:37 +03:00
maxi297
cdbedca86f 🤖 Bump patch version of Airbyte CDK 2023-06-13 12:46:50 +00:00
Maxime Carbonneau-Leclerc
f48849fdb4 [ISSUE #26909] adding message repository (#27158)
* [ISSUE #26909] adding message repository

* Automated Commit - Formatting Changes

* [ISSUE #26909] improve entrypoint error handling

* format CDK

* [ISSUE #26909] adding an integration test
2023-06-13 08:40:55 -04:00
Joe Reuter
27635ba26a Low code CDK: Datetime format documentation (#27149)
* add format documentation

* fix

* improve
2023-06-12 17:39:12 -07:00
Alexandre Girard
2864f72ff4 Set content-type header on oauth request (#27225)
* Set content-type header on oauth authenticator

* Revert "Set content-type header on oauth authenticator"

This reverts commit 1e6815e9bb.

* Set header on oauth request

* Fix test

* Verify header is set

* Automated Commit - Formatting Changes

---------

Co-authored-by: girarda <girarda@users.noreply.github.com>
2023-06-12 13:29:53 -04:00
flash1293
52ee42ccd5 🤖 Bump patch version of Airbyte CDK 2023-06-09 08:50:33 +00:00
Joe Reuter
d6512dea2c CDK: Datetime format inferrer (#27071)
* datetime inferrer class

* format

* pass inferred date formats along

* review comments
2023-06-09 10:33:54 +02:00