* [ISSUE #19410] remove request_options_provider from the … (#21403)
* [ISSUE #19410] (incomplete) remove request_options_provider from the manifest
* [ISSUE #19410] (incomplete) incomplete cleanup config_component_schema.json as well
* [ISSUE #19410] update source-monday
* [ISSUE #19410] code review
* [ISSUE #19410] formatting files
* [Low-Code CDK] Replace the $options keyword with $parameters (#21632)
* refactor flows and tests to use parameters instead of options
* update documentation to reflect the change from options to parameters
* create migration script to replace options with parameters in existing manifests
* update template to use parameters instead of options
* fix tests after rebasing from the branch
* address pr feedback and extra uses of options that I missed
* additional changes needed after rebasing from master
* migrate low-code connectors to use parameters instead of options
* 🚨🚨 [Low Code CDK] Update `*ref` format to `#/` (#21434)
* [Low-Code CDK] Remove JsonSchema type in favor of JsonSchemaFileLoader (#21832)
* fully deprecate JsonSchema in favor of JsonFileSchemaLoader
* remove usage in the legacy registry
* Update migration scripts according to manifest file rename (#21920)
* Issue 21866 remove legacy factory and validation flow (#21878)
* [ISSUE #21866] clean ManifestDeclarativeSource validation
* [ISSUE #21866] remove dataclasses-jsonschema
* [ISSUE #21866] code review
* [ISSUE-21866] flake8
* [ISSUE #21559] remove DefaultPaginator.url_base (#21823)
* [ISSUE #21559] remove DefaultPaginator.url_base
* [ISSUE #21559] code review
* [ISSUE #21559] update migration script
* [ISSUE #21559] code review
* [ISSUE #21559] update documentation
* [ISSUE #21559] run migration (#21824)
* [ISSUE #21559] remove DefaultPaginator.url_base (#21823)
* [ISSUE #21559] remove DefaultPaginator.url_base
* [ISSUE #21559] code review
* [ISSUE #21559] update migration script
* [ISSUE #21559] code review
* [ISSUE #21559] update documentation
* [ISSUE #21559] run migration (#21824)
* [ISSUE #21559] fix manifests
* [ISSUE #21926] setup server to allow for local tests (#21974)
* [Low Code CDK] remove checkpoint_interval from DeclarativeStream component (#22120)
* Issue #21576 rename dpathextractor fieldpointer (#21990)
* [ISSUE #21926] setup server to allow for local tests
* [ISSUE #21576] Rename DpathExtractor.field_pointer to field_path
* [ISSUE #21576] migration script
* [ISSUE #21576] update source-monday and source-pocket as well
* [ISSUE #21576] migration (#21997)
* [ISSUE #21576] code review
* Remove checkpoint_interval from source-prestashop manifest (#22141)
* replacing options with parameters for a few connectors I missed or were newly added
* [Low-Code CDK] Remove stream_cursor_field from stream and derive it from stream_slicer (#22294)
* update schema to derive cursor_field from a stream slicer if it exists
* remove usage of stream_cursor_field on simple connector use cases
* fixing some of the more complex usage of stream_cursor_field that rely on cartesian product stream slicers
* fix documentation to replace references to stream_cursor_field
* Low Code CDK: Remove `name` and `primary_key` from non-DeclarativeStream components (#21891)
* fix eslint issues for webapp (#22462)
* 🪟 🔧 Connector Builder frontend fixes for low_code_cdk_to_beta (#22375)
* bump connector builder server to latest CDK version
* fix breaking CDK changes in connector builder FE
* [Low-Code CDK] Separate request path from RequestOption component (#22398)
* split apart path from RequestOption and fix usages and cleanup the code
* replace usage of path with RequestPath and get rid of default to RequestOption
* fix bug where stream_slice_field was used in outbound request instead of request_option field_name
* organize yaml schema names and update documentation for RequestOption and RequestPath
* clean up tests
* regenerate models
* [ISSUE #19961] refactor stream slices (#22225)
* [ISSUE #19961] add 'incremental' and partially remove CartesianProductStreamSlicer - Google PageSpeed Insights not working yet
* [ISSUE #19961] fixing Google PageSpeed Insights
* move incremental_sync field to the stream level and perform merging into one stream slicer at that level
* add tests to merging incremental and iterable into cartesian
* rewrite documentation to separate incremental sync and iterator concepts
* update documentation to use partition router and revise the tutorial to reflect the new changes to the components
* [ISSUE #19961] update code to newest CDK version and clean autogenerated files (#22670)
* [ISSUE #19961] rename stream_slicer to partition_router and update ma… (#22590)
* [ISSUE #19961] rename stream_slicer to partition_router and update manifests (for incremental_sync as well)
* [ISSUE 19961] rename CustomStreamSlicer (#22598)
* [ISSUE 19961] rename CustomStreamSlicer
* [ISSUE #19961] code review CustomStreamSlicer
* [ISSUE #19961] fix source_square incremental sync
* [ISSUE #19961] rename SingleSlice to SinglePartitionRouter (#22591)
* [ISSUE #19961] rename SingleSlice to SinglePartitionRouter
* remove SinglePartitionRouter from the schema
---------
Co-authored-by: brianjlai <brian.lai@airbyte.io>
* [ISSUE #19961] rename SubstreamSlicer to SubstreamPartitionRouter (#22596)
* [ISSUE #19961] TMP rename SubstreamSlicer to SubstreamPartitionRouter
* [ISSUE #19961] revert DatetimeStreamSlicer.stream_state_field_start and DatetimeStreamSlicer.stream_state_field_end
* [ISSUE #19961] rename ListStreamSlicer to ListPartitionRouter (#22593)
---------
Co-authored-by: brianjlai <brian.lai@airbyte.io>
* [ISSUE #19961] clean faulty merge
* [ISSUE #19961] rename DatetimeStreamSlicer (#22617)
* [ISSUE #19961] rename stream_slicer to partition_router and update manifests (for incremental_sync as well)
* [ISSUE 19961] rename CustomStreamSlicer (#22598)
* [ISSUE 19961] rename CustomStreamSlicer
* [ISSUE #19961] code review CustomStreamSlicer
* [ISSUE #19961] fix source_square incremental sync
* [ISSUE #19961] rename SingleSlice to SinglePartitionRouter (#22591)
* [ISSUE #19961] rename SingleSlice to SinglePartitionRouter
* remove SinglePartitionRouter from the schema
---------
Co-authored-by: brianjlai <brian.lai@airbyte.io>
* [ISSUE #19961] rename DatetimeStreamSlicer
* [ISSUE #19961] rename SubstreamSlicer to SubstreamPartitionRouter (#22596)
* [ISSUE #19961] TMP rename SubstreamSlicer to SubstreamPartitionRouter
* [ISSUE #19961] revert DatetimeStreamSlicer.stream_state_field_start and DatetimeStreamSlicer.stream_state_field_end
* [ISSUE #19961] rename ListStreamSlicer to ListPartitionRouter (#22593)
---------
Co-authored-by: brianjlai <brian.lai@airbyte.io>
* Update docs/connector-development/config-based/understanding-the-yaml-file/partition-router.md
Co-authored-by: Maxime Carbonneau-Leclerc <maxi297@users.noreply.github.com>
* Update docs/connector-development/config-based/understanding-the-yaml-file/partition-router.md
Co-authored-by: Maxime Carbonneau-Leclerc <maxi297@users.noreply.github.com>
* Update docs/connector-development/config-based/understanding-the-yaml-file/yaml-overview.md
Co-authored-by: Maxime Carbonneau-Leclerc <maxi297@users.noreply.github.com>
* Update docs/connector-development/config-based/understanding-the-yaml-file/partition-router.md
Co-authored-by: Maxime Carbonneau-Leclerc <maxi297@users.noreply.github.com>
* Update docs/connector-development/config-based/understanding-the-yaml-file/partition-router.md
Co-authored-by: Maxime Carbonneau-Leclerc <maxi297@users.noreply.github.com>
* Update docs/connector-development/config-based/understanding-the-yaml-file/partition-router.md
Co-authored-by: Maxime Carbonneau-Leclerc <maxi297@users.noreply.github.com>
* Update docs/connector-development/config-based/understanding-the-yaml-file/incremental-syncs.md
Co-authored-by: Maxime Carbonneau-Leclerc <maxi297@users.noreply.github.com>
* update docs
* [ISSUE #19961] clean unit tests files
* [ISSUE #19961] code review
---------
Co-authored-by: brianjlai <brian.lai@airbyte.io>
Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com>
* [Low-Code CDK] Allow for children of custom components to specify parameters that are normally derived (#22379)
* Fix a bug where child components of a custom component cannot receive fields from other components
* add tests, documentation and commenting
* fix test from merge
* add better error message for nested initialization failures
* 🪟 🔧 Connector Builder frontend fixes for low_code_cdk_to_beta (#22880)
* restrict name to stream level
* remove checkpoint interval
* adjust logic for new request options
* refactor slicers
* wording
* review comments
* make oldest supported version explicit
* separate the frontend and connector builder changes from the low-code to beta release
* [Low-Code CDK] Add script to run low code unit tests and address issues with a few connectors (#23123)
* consolidate all the changes into a new PR after I messed up the merge on the side branch
* add set to allow this to be called externally if necessary later
* remove last few extra fields i found and fix docs links
* fix docs one more time
---------
Co-authored-by: Maxime Carbonneau-Leclerc <maxi297@users.noreply.github.com>
Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com>
Co-authored-by: maxi297 <maxime@airbyte.io>
Co-authored-by: Lake Mossman <lake@airbyte.io>
Co-authored-by: Joe Reuter <joe@airbyte.io>
134 lines
6.2 KiB
Python
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#

import logging
import typing
from typing import Dict, Optional, Tuple

import requests
from airbyte_cdk.sources.streams import Stream
from airbyte_cdk.sources.streams.availability_strategy import AvailabilityStrategy
from airbyte_cdk.sources.streams.utils.stream_helper import get_first_record_for_slice, get_first_stream_slice
from requests import HTTPError

if typing.TYPE_CHECKING:
    from airbyte_cdk.sources import Source


class HttpAvailabilityStrategy(AvailabilityStrategy):
    def check_availability(self, stream: Stream, logger: logging.Logger, source: Optional["Source"]) -> Tuple[bool, Optional[str]]:
        """
        Check stream availability by attempting to read the first record of the
        stream.

        :param stream: stream
        :param logger: source logger
        :param source: (optional) source
        :return: A tuple of (boolean, str). If boolean is true, then the stream
          is available, and no str is required. Otherwise, the stream is unavailable
          for some reason and the str should describe what went wrong and how to
          resolve the unavailability, if possible.
        """
        try:
            # Some streams need a stream slice to read records (e.g. if they have a SubstreamPartitionRouter)
            # Streams that don't need a stream slice will return `None` as their first stream slice.
            stream_slice = get_first_stream_slice(stream)
        except StopIteration:
            # If stream_slices has no `next()` item (Note - this is different from stream_slices returning [None]!)
            # This can happen when a substream's `stream_slices` method does a `for record in parent_records: yield <something>`
            # without accounting for the case in which the parent stream is empty.
            reason = f"Cannot attempt to connect to stream {stream.name} - no stream slices were found, likely because the parent stream is empty."
            return False, reason

        try:
            get_first_record_for_slice(stream, stream_slice)
            return True, None
        except StopIteration:
            logger.info(f"Successfully connected to stream {stream.name}, but got 0 records.")
            return True, None
        except HTTPError as error:
            return self.handle_http_error(stream, logger, source, error)

    def handle_http_error(
        self, stream: Stream, logger: logging.Logger, source: Optional["Source"], error: HTTPError
    ) -> Tuple[bool, Optional[str]]:
        """
        Override this method to define error handling for various `HTTPError`s
        that are raised while attempting to check a stream's availability.

        Checks whether an error's status_code is in a list of unavailable_error_codes,
        and gets the associated reason for that error.

        :param stream: stream
        :param logger: source logger
        :param source: (optional) source
        :param error: HTTPError raised while checking stream's availability.
        :return: A tuple of (boolean, str). If boolean is true, then the stream
          is available, and no str is required. Otherwise, the stream is unavailable
          for some reason and the str should describe what went wrong and how to
          resolve the unavailability, if possible.
        """
        try:
            status_code = error.response.status_code
            reason = self.reasons_for_unavailable_status_codes(stream, logger, source, error)[status_code]
            response_error_message = stream.parse_response_error_message(error.response)
            if response_error_message:
                reason += response_error_message
            return False, reason
        except KeyError:
            # If the status code is not in the dictionary of errors we know how to handle, re-raise the HTTPError
            raise error

    def reasons_for_unavailable_status_codes(
        self, stream: Stream, logger: logging.Logger, source: Optional["Source"], error: HTTPError
    ) -> Dict[int, str]:
        """
        Returns a dictionary of HTTP status codes that indicate stream
        unavailability and reasons explaining why a given status code may
        have occurred and how the user can resolve that error, if applicable.

        :param stream: stream
        :param logger: source logger
        :param source: (optional) source
        :return: A dictionary of (status code, reason) where the 'reason' explains
          why 'status code' may have occurred and how the user can resolve that
          error, if applicable.
        """
        forbidden_error_message = f"The endpoint to access stream '{stream.name}' returned 403: Forbidden. "
        forbidden_error_message += "This is most likely due to insufficient permissions on the credentials in use. "
        forbidden_error_message += self._visit_docs_message(logger, source)

        reasons_for_codes: Dict[int, str] = {requests.codes.FORBIDDEN: forbidden_error_message}
        return reasons_for_codes

    @staticmethod
    def _visit_docs_message(logger: logging.Logger, source: Optional["Source"]) -> str:
        """
        Creates a message indicating where to look in the documentation for
        more information on a given source by checking the spec of that source
        (if provided) for a 'documentationUrl'.

        :param logger: source logger
        :param source: (optional) source
        :return: A message telling the user where to go to learn more about the source.
        """
        if not source:
            return "Please visit the connector's documentation to learn more. "

        try:
            connector_spec = source.spec(logger)
            docs_url = connector_spec.documentationUrl
            if docs_url:
                return f"Please visit {docs_url} to learn more. "
            else:
                return "Please visit the connector's documentation to learn more. "

        except FileNotFoundError:  # If we are unit testing without implementing spec() method in source
            if source:
                docs_url = f"https://docs.airbyte.com/integrations/sources/{source.name}"
            else:
                docs_url = "https://docs.airbyte.com/integrations/sources/test"

            return f"Please visit {docs_url} to learn more."
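The docstring of `handle_http_error` invites overriding, and `reasons_for_unavailable_status_codes` is the natural extension point for that. The following is a minimal, dependency-free sketch of the same lookup-then-re-raise pattern. Every name in it (`StubHTTPError`, `StubAvailabilityStrategy`, `NotFoundAwareStrategy`, `check`) is a hypothetical stand-in invented for illustration and is not part of `airbyte_cdk`; the real methods also take `logger`, `source`, and a `Stream` object, which are omitted here.

```python
from typing import Dict, Optional, Tuple


class StubHTTPError(Exception):
    """Hypothetical stand-in for requests.HTTPError; carries only a status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


class StubAvailabilityStrategy:
    """Simplified stand-in that mirrors the lookup logic shown in the file above."""

    def reasons_for_unavailable_status_codes(self, stream_name: str) -> Dict[int, str]:
        # Only 403 is treated as "stream unavailable" by default.
        return {403: f"The endpoint to access stream '{stream_name}' returned 403: Forbidden. "}

    def handle_http_error(self, stream_name: str, error: StubHTTPError) -> Tuple[bool, Optional[str]]:
        reasons = self.reasons_for_unavailable_status_codes(stream_name)
        if error.status_code not in reasons:
            # Unknown status codes are not swallowed; the caller sees the original error.
            raise error
        return False, reasons[error.status_code]


class NotFoundAwareStrategy(StubAvailabilityStrategy):
    """Hypothetical subclass that also treats 404 as 'stream unavailable'."""

    def reasons_for_unavailable_status_codes(self, stream_name: str) -> Dict[int, str]:
        reasons = super().reasons_for_unavailable_status_codes(stream_name)
        reasons[404] = f"The endpoint to access stream '{stream_name}' returned 404: Not Found. "
        return reasons


def check(strategy: StubAvailabilityStrategy, status_code: int) -> Tuple[bool, Optional[str]]:
    """Simulate an availability check that failed with the given status code."""
    return strategy.handle_http_error("users", StubHTTPError(status_code))
```

With these stand-ins, `check(NotFoundAwareStrategy(), 404)` reports the stream as unavailable with a reason string, while `check(StubAvailabilityStrategy(), 404)` re-raises the error because 404 is not in the default mapping. Extending the dictionary rather than rewriting the handler keeps the re-raise behaviour for every code the subclass does not explicitly list.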