1
0
mirror of synced 2026-02-03 01:02:02 -05:00
Commit Graph

35 Commits

Author SHA1 Message Date
Alexandre Girard
df01616951 [Issue #23497] Deduplicate query parameters for declarative connectors (#28550)
* remove duplicate param

* remove duplicate params

* fix some of the typing issues

* fix typing issues

* newline

* format

* Enable by default

* Add missing file

* refactor and remove flag

* none check

* move line of code

* fix typing in rate_limiting

* comment

* use typedef

* else branch

* format

* gate the feature

* rename test

* fix the test

* only dedupe if the values are the same

* Add some tests

* convert values to strings

* Document the change

* implement in requester too
2023-07-25 14:22:25 -07:00
Alexandre Girard
02f771b422 Do not remove trailing slash from path (#24003)
* Do not remove trailing slash from path

* Add a breaking test

* Add some tests on HttpStream
2023-03-13 23:59:24 +00:00
Cole Snodgrass
2e099acc52 update headers from 2022 -> 2023 (#22594)
* It's 2023!

* 2022 -> 2023

---------

Co-authored-by: evantahler <evan@airbyte.io>
2023-02-08 13:01:16 -08:00
Ella Rohm-Ensing
19c05f7653 Restore HttpAvailabilityStrategy as default (revert https://github.com/airbytehq/airbyte/pull/21488) (#21924) 2023-02-01 17:56:13 +00:00
Ella Rohm-Ensing
8bb41282c5 CDK 0.15.0 and source-github 0.3.10 -- revert AvailabillityStrategy changes (#20523)
* Revert "source-github: move known error handling to GithubAvailabilityStrategy (#19978)"

This reverts commit f97db17ccc.

* Revert "🐛 Python CDK: fix `StopIteration` error for `check_availability` (#20429)"

This reverts commit 4e9b014277.

* Revert "CDK: `AbstractSource.read()` skips syncing stream if its unavailable (add `AvailabilityStrategy` concept) (#19977)"

This reverts commit 55a32886a3.

* Restore changelog entries

* bump CDK version

* Bump Github version

* Re-add removed dependencies

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
2022-12-15 14:24:24 -05:00
Ella Rohm-Ensing
55a32886a3 CDK: AbstractSource.read() skips syncing stream if its unavailable (add AvailabilityStrategy concept) (#19977)
* Rough first implememtation of AvailabilityStrategy s

* Basic unit tests for AvailabilityStrategy and ScopedAvailabilityStrategy

* Make availability_strategy a property, separate out tests

* Remove from DeclarativeSource, remove Source parameter from methods, make default no AvailabilityStrategy

* Add skip stream if not available to read()

* Changes to CDK to get source-github working using AvailabilityStrategy, flakecheck

* reorganize cdk class, add HTTPAvailabilityStrategy test

* cleanup, docstrings

* pull out error handling into separate method

* Pass source and logger to check_connection method

* Add documentation links, handle 403 specifically

* Fix circular import

* Add AvailabilityStrategy to Stream and HTTPStream classes

* Remove AS from abstract_source, add to Stream, HTTPStream, AvailabilityStrategy unit tests passing for per-stream strategies

* Modify MockHttpStream to set no AvailabilityStrategy since source test mocking doesn't support this

* Move AvailabilityStrategy class to sources.streams

* Move HTTPAvailabilityStrategy to http module

* Use pascal case for HttpAvailabilityStrategy

* Remove docs message method :( and default to True availability on unhandled HTTPErrors

* add check_availability method to stream class

* Add optional source parameter

* Add test for connector-specific documentation, small tests refactor

* Add test that performs the read() function for stream with default availability strategy

* Add test for read function behavior when stream is unavailable

* Add 403 info in logger message

* Don't return error for other HTTPErrors

* Split up error handling into methods 'unavailable_error_codes' and 'get_reason_for_error'

* rework overrideable list of status codes to be a dict with reasons, to enforce that users provide reasons for all listed errors

* Fix incorrect typing

* Move HttpAvailability to its own module, fix flake errors

* Fix ScopedAvailabilityStrategy, docstrings and types for streams/availability_strategy.py

* Docstrings and types for core.py and http/availability_strategy.py

* Move _get_stream_slices to a StreamHelper class

* Docstrings + types for stream_helpers.py, cleanup test_availability.py

* Clean up test_source.py

* Move logic of getting the initial record from a stream to StreamHelper class

* Add changelog and bump minor version

* change 'is True' and 'is False' behavior

* use mocker.MagicMock

* Remove ScopedAvailabilityStrategy

* Don't except non-403 errors, check_stream uses availability_strategy if possible

* CDK: pass error to reasons_for_error_codes

* make get_stream_slice public

* Add tests for raising unhandled errors and retries are handled

* Add tests for CheckStream via AvailabilityStrategy

* Add documentation for stream availability of http streams

* Move availability unit tests to correct modules, report error message if possible

* Add test for reporting specific error if available
2022-12-12 14:32:34 -05:00
Alexandre Girard
4dcc68ade9 SimpleRetriever raises exception with error message extracted from the response (#20032)
* Raise exception with error message

* read from detail field if no other value was found

* format

* dont test the error message

* format

* error code

* bump
2022-12-02 17:56:23 -08:00
Alexandre Girard
e029353b9f SimpleRetriever yield request and response as log messages (#18644)
* method yielding airbytemessage

* move to Stream

* update abstract source

* reset

* missing file

* Yield request and response as log messages

* only emit request and responses if the debug flag is on

* add test docker image

* script to run acceptance tests with local cdk

* Update conftest to use a different image

* extract to method

* dont use a different image tag

* Always install local cdk

* break the cdk

* get path from current working directory

* or

* ignore unit test

* debug log

* Revert "AMI change: ami-0f23be2f917510c26 -> ami-005924fb76f7477ce (#18689)"

This reverts commit bf06decf73.

* build from the top

* Update source-acceptance-test

* fix

* copy setup

* some work on the gradle plugin

* reset to master

* delete unused file

* delete unused file

* reset to master

* optional argument

* delete dead code

* use latest cdk with sendgrid

* fix sendgrid dockerfile

* break the cdk

* use local file

* Revert "break the cdk"

This reverts commit 600c195541.

* dont raise an exception

* reset to master

* unit tests

* missing test

* more unit tests

* remove deprecated comment

* newline

* reset to master

* remove files

* reset

* Update abstract source

* remove method from stream

* convert to airbytemessage

* unittests

* Update

* unit test

* remove debug logs

* Revert "remove debug logs"

This reverts commit a1a139ef37.

* Revert "Revert "remove debug logs""

This reverts commit b1d62cdb60.

* Revert "reset to master"

This reverts commit 3fa6a004c1.

* fix

* slightly better test

* typing

* extract method

* Revert "Revert "reset to master""

This reverts commit 5dac7c2804.

* reset to master

* reset to master

* Revert "reset to master"

This reverts commit 3fa6a004c1.

* Comment

* operate on the message

* Revert "Revert "reset to master""

This reverts commit 5833c84d0a.

* comment

* test

* Revert "test"

This reverts commit 2f91b803b0.

* test

* Revert "test"

This reverts commit 62d95ebbb5.

* test

* Revert "test"

This reverts commit 27150ba341.

* format

* format

* symlink

* Update setup

* update path

* reset to master

* update

* Add local files

* greenhouse

* format

* symlink

* try reordering

* better error message

* better log message

* reset to master

* Revert "merge for qa"

This reverts commit ad7128f2c5, reversing
changes made to 7196c22a73.

* reset to master

* reset to master

* reset to master

* format

* gradlew format

* right type hints

* reset to master

* reset to master

* gradlew format

* a bunch of small fixes

* Update output format

* fixes from feedback

* fixme comment

* streams cannot return AirbyteRecordMessage

* fix

* format

* only return logs when running on debug mode

* move branching

* update typing

* remove dead code

* fix simpleretriever name

* i think this is better

* log response.text

* debug flag

* comment

* pass config

* comments

* run SATs

* fix most of the unit tests

* fix unit test

* reset to master

* runFromPath

* Revert "runFromPath"

This reverts commit 85979a801a.

* Revert "run SATs"

This reverts commit a8a8a2da95.

* no need to convert to dict

* fix test
2022-11-09 13:26:50 -08:00
Brian Lai
5222093b54 support custom error messaging for error response + retryable errors (#18204)
* support custom error messaging for error response + retryable errors

* remove changed backoff i was using for testing

* refactor filter to construct response status internally

* pr feedback

* bump version and update changelog
2022-10-26 15:39:36 -04:00
Serhii Chvaliuk
258b23cf79 CDK: VCR -> requests_cache + SQLite (#17990)
Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
2022-10-19 19:31:28 +03:00
Serhii Chvaliuk
a82f59ec1e CDK: Evaluate response.text only in debug mode (#16809)
Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
2022-09-17 12:12:03 +03:00
Alexandre Girard
09aa685aad Alex/configurable retrier (#14330)
* checkout files from test branch

* read_incremental works

* reset to master

* remove dead code

* comment

* fix

* Add test

* comments

* utc

* format

* small fix

* Add test with rfc3339

* remove unused param

* fix test

* configurable state checkpointing

* update test

* start working on retrier

* retry predicate

* return response status

* look in error message

* cleanup test

* constant backoff strategy

* chain backoff strategy

* chain retrier

* Add to class types registry

* extract backoff time from header

* wait until

* update

* split file

* parse_records

* classmethod

* delete dead code

* comment

* comment

* comments

* fix

* test for instantiating chain retrier

* fix parsing

* cleanup

* fix

* reset

* never raise on http error

* remove print

* comment

* comment

* comment

* comment

* remove prints

* add declarative stream to registry

* Delete dead code

* Add docstrings

* quick fix

* exponential backoff

* fix test

* fix

* delete unused properties

* fix

* missing unit tests

* uppercase

* docstrings

* rename to success

* compare full request instead of just url

* renmae module

* rename test file

* rename interface

* rename default retrier

* rename to compositeerrorhandler

* fix missing renames

* move action to filter

* str -> minmaxdatetime

* small fixes

* plural

* add example

* handle header variations

* also fix wait time from

* allow using a regex to extract the value

* group()

* docstring

* add docs

* update comment

* docstrings

* update comment

* Update airbyte-cdk/python/airbyte_cdk/sources/declarative/requesters/http_requester.py

Co-authored-by: Sherif A. Nada <snadalive@gmail.com>

* version: Update Parquet library to latest release (#14502)

The upstream Parquet library that is currently pinned for use in the S3 destination plugin is over a year old. The current version is generating invalid schemas for date-time with time-zone fields which appears to be addressed in the `1.12.3` release of the library in commit c72862b613

* merge

* 🎉 Source Github: improve schema for stream `pull_request_commits` added "null" (#14613)

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>

* Docs: Fixed broken links (#14622)

* fixing broken links

* more broken links

* source-hubspot: change mentioning of Mailchimp into HubSpot  doc (#14620)

* Helm Chart: Add external temporal option (#14597)

* conflict env configmap and chart lock

* reverting lock

* add eof lines and documentation on values yaml

* conflict json file

* rollback json

* solve conflict

* correct minio with new version

Co-authored-by: Guy Feldman <gfeldman@86labs.com>

* 🎉 Add YAML format to source-file reader (#14588)

* Add yaml reader

* Update docs

* Bumpversion of connector

* bump docs

* Update pyarrow dependency

* Upgrade pandas dependency

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* 🎉 Source Okta: add GroupMembers stream (#14380)

* add Group_Members stream to okta source

- Group_Members return a list of users, the same schema of Users stream.
- Create a shared schema users, and both group_members and users sechema use it as a reference.
- Add Group_Members stream to source connector

* add tests and fix logs schema

- fix the test error: None is not one of enums though the enum type includes both string and null, it comes from json schema validator
ddb87afad8/jsonschema/_validators.py (L279-L285)
- change grouop_members to use id as the cursor field since `filter` is not supported in the query string
- fix the abnormal state test on logs stream, when since is abnormally large, until has to defined, an equal or a larger value
- remove logs stream from full sync test, because 2 full sync always has a gap -- at least a new log about users or groups api.

* last polish before submit the PR

- bump docker version
- update changelog
- add the right abnormal value for logs stream
- correct the sample catalog

* address comments::

- improve comments for until parameter under the logs stream
- add use_cache on groupMembers

* add use_cache to Group_Members

* change configured_catalog to test

* auto-bump connector version

Co-authored-by: marcosmarxm <marcosmarxm@gmail.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* split test files

* renames

* missing unit test

* add missing unit tests

* rename

* assert isinstance

* start extracting to their own files

* use final instead of classmethod

* assert we retry 429 errors

* Add log

* replace asserts with valueexceptions

* delete superfluous print statement

* fix factory so we don't need to union everything with strings

* get class_name from type

* remove from class types registry

* process error handlers one at a time

* sort

* delete print statement

* comment

* comment

* format

* delete unused file

Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
Co-authored-by: Tobias Macey <tmacey@boundlessnotions.com>
Co-authored-by: Serhii Chvaliuk <grubberr@gmail.com>
Co-authored-by: Amruta Ranade <11484018+Amruta-Ranade@users.noreply.github.com>
Co-authored-by: Bas Beelen <bjgbeelen@gmail.com>
Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com>
Co-authored-by: Guy Feldman <gfeldman@86labs.com>
Co-authored-by: Christophe Duong <christophe.duong@gmail.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
Co-authored-by: Yiyang Li <yiyangli2010@gmail.com>
Co-authored-by: marcosmarxm <marcosmarxm@gmail.com>
2022-07-14 08:24:37 -07:00
Brian Lai
7bff12aea5 [#3078] [CDK] Add support for enabling debug from command line and some basic general debug logs (#14521)
* allow for command line debug option and basic debug statements + declarative

* feedback from pr comments

* fix some tests w/ req/res mixed up and fixing logging tests

* formatting

* pr feedback: cleaning up traces in logger.py and update docs with debug configuration

* remove unneeded trace logger test

* remove extra print statement
2022-07-13 18:01:07 -04:00
George Claireaux
4d279f8238 Remove legacy sentry code from cdk (#14016)
* rip sentry out of cdk

* remove sentry dsn from gsc
2022-06-23 12:14:09 +01:00
Brian Lai
be01b476ce Add new InterpolatedRequestOptionsProvider that encapsulates all variations of request arguments (#13472)
* write out new request options provider and refactor components and parts of the YAML config

* fix formatting

* pr feedback to consolidate body_data_provider to simplify the code

* pr feedback get rid of extraneous optional
2022-06-21 16:01:05 -04:00
Alexandre Girard
3894134d11 Bump year in license short to 2022 (#13191)
* Bump to 2022

* format
2022-05-25 17:56:49 -07:00
Pedro S. Lopez
41dc82f056 CDK: provide better user-friendly error messages (#12944)
* initial implementation

* add parse_response_error_message tests

* move error handler to existing try/catch in AbstractSource

* formatting

* var rename

* use isinstance for type checking

* add docstrings

* add more abstract_source and httpstream tests

* fix wrong httperror usage

* more test cases

* bump version, update changelog
2022-05-19 17:28:24 -04:00
Augustin
2d277747be cdk: remove false error logging in _send (#11650) 2022-04-04 10:36:05 -07:00
Marcos Marx
78cafc58ab CDK improve error handling responses (#11629)
* add response text and message to backoff error handling

* add response text and message to backoff error handling

* add response text case http error

* change response text to before raise error

* apply suggestions

* bump cdk version
2022-03-31 16:41:10 -03:00
Eugene Kulak
f83eca58ea CDK: Fix typing errors (#9037)
* fix typing, drop AirbyteLogger

* format

* bump the version

* use logger instead of fixture logger

Co-authored-by: Eugene Kulak <kulak.eugene@gmail.com>
Co-authored-by: auganbay <auganenu@gmail.com>
2022-01-14 10:29:34 +06:00
Dmytro
fe954c1a16 Integrate Sentry for performance and errors tracking. (#8248)
* Integrate Sentry for performance and errors tracking.

* Add sentry sensitive data scrubbing.

* updated cdk version and changelog

* Integrate Sentry for performance and errors tracking

Add `SENTRY_DSN` environment variable

* Integrate Sentry for performance and errors tracking

Add `sentry_sdk` to install requirements

* format cdk

* enable Sentry for google-search-console

* updated connector version

* update spec and source yamls

Co-authored-by: auganbay <auganenu@gmail.com>
Co-authored-by: Sergei Solonitcyn <sergei.solonitcyn@zazmic.com>
2021-12-21 20:02:02 +06:00
Sergei Solonitcyn
6ee922cef2 Improve URL creation in the CDK (#8513)
Signed-off-by: Sergei Solonitcyn <sergei.solonitcyn@zazmic.com>
2021-12-05 22:30:16 +02:00
Yevhenii
5eb2af6c08 CDK: Autogenerate reference docs (#4759)
* CDK Autogenerated reference docs: base version

* Update docs config.
Add .readthedocs.yaml file.
Update build html files.

* Update .gitignore.
Remove sphinx build files.

* Add newline at the end of .gitignore

* Update setup.py requirements.
Update .readthedocs.yaml config.

* Update rst files.
Add Makefile rst config.

* Update CDK docstring format.
Change rst layout.
Update sphinx config.

* Add Sphinx docs.
Update index.rst.
Update abstract_source.py docstrings.

* Override master_doc and package templates.
Add docs schema enerator script.
Update sphinx docs.

* Add `Publishing to Read the Docs` section to sphinx-docs.md".
Replace sphinx-docs.md to `airbyte-cdk` module.

* Update sphinx-docs.md section name

* Bump airbyte-cdk version.
Update CHANGELOG.md.

Co-authored-by: ykurochkin <y.kurochkin@zazmic.com>
Co-authored-by: Vadym Hevlich <vege1wgw@gmail.com>
2021-10-22 20:47:48 +03:00
Michel Tricot
1773e41e47 Shorten our headers + adds contributors file (#6478) 2021-09-27 10:45:50 -07:00
Marcos Marx
5861b6faf7 correct CI applying gradlew format (#6390)
Co-authored-by: Marcos Marx <marcosmarx@MacBook-Pro-de-Marcos.local>
2021-09-23 01:55:06 -03:00
Arthur Galuza
9aa5a5a52d 🎉 CDK: Added support for efficient parent/child streams using cache (#6057)
* Add caching

* Upd cache file handling

* Upd slices, sync mode, docs

* Bump version

* Use SyncMode.full_refresh for parent stream_slices

* Refactor
2021-09-22 20:23:27 +03:00
Vadym
4a0d364ea1 🎉 CDK: Add requests native authenticator support (#5731)
* Add requests native auth class

* Update init file.
Update type annotations.
Bump version.

* Update TokenAuthenticator implementation.
Update Oauth2Authenticator implemetation.
Add CHANGELOG.md record.

* Update Oauth2Authenticator default value setting.
Update CHANGELOG.md

* Add requests native authenticator tests

* Add CDK requests native __call__ method tests.
Update CHANGELOG.md

* Add outdated auth deprication messages

* Update requests native auth __call__ method tests

* Bump CDK version to 0.1.20
2021-09-15 19:23:31 +03:00
Dmytro
7584440515 CDK: private configuration option _limit and _page_size (#5617)
* CDK: private configuration option _limit and _page_size
2021-08-31 12:16:48 +03:00
Dmytro
eed2e107fd Python CDK: fix retry attempts in case of user defined backoff time (#5707)
* Python CDK: fix retry attempts in case of user defined backoff time
2021-08-31 09:55:45 +03:00
Yaroslav Dudar
8ddce6f355 🎉 Python CDK: Allow to ignore http status errors and override retry parameters (#5363)
added auto_fail_on_errors, max_retries, retry_factor properties to python cdk
2021-08-25 10:31:24 +03:00
Maksym Pavlenok
106c3cd641 🎉 CDK: Allow setting request non-JSON data (#5161)
* add the function request_body_data

* gradlew format

* Update airbyte-cdk/python/airbyte_cdk/sources/streams/http/http.py

Co-authored-by: Sherif A. Nada <snadalive@gmail.com>

* Update airbyte-cdk/python/airbyte_cdk/sources/streams/http/exceptions.py

Co-authored-by: Sherif A. Nada <snadalive@gmail.com>

* add test for application/x-www-form-urlencoded

Co-authored-by: Maksym Pavlenok <maksym.pavlenok@globallogic.com>
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
2021-08-06 15:32:25 +03:00
Sherif A. Nada
3aebe55cd5 🎉 Python CDK: Allow setting network adapter args on outgoing HTTP requests (#4493) 2021-07-07 15:17:58 -07:00
Charles
0df53170c9 Stop formatting python with spotless (#3388) 2021-05-13 17:46:34 -07:00
Sherif A. Nada
535d83e0af bugfix infinite pagination in CDK (#3366) 2021-05-11 14:45:27 -07:00
Sherif A. Nada
184dab77eb CDK: overhaul directory structure (#3295) 2021-05-09 15:27:38 -07:00