* [ISSUE #26764] support brute force multiline json objects for JSONL
* [ISSUE #26764] infer_schema to support multiline json objects as well
* [ISSUE #26764] code review
* Add encoding to open_file interface
* pass the encoding set in the config
* cleanup
* cleanup
* Automated Commit - Formatting Changes
* Add missing test
* Automated Commit - Formatting Changes
* Update infer_schema too
* Automated Commit - Formatting Changes
* Update unit test
* add a unit test
* fix
* format
* format
* remove newline
* use a mock
* fix
* format
---------
Co-authored-by: girarda <girarda@users.noreply.github.com>
* remove version, make validation_policy enum, fix input_schema for s3 and abstract file based configs
* remove multiple file format options from stream config
* pr feedback
* fix tests after rebase
* additional spec changes to work with the UI
* fix tests post-rebase
* fix tests post-rebase and cleanup
* formatting
* add start_date config to abstract spec and apply it in the cursor
* rollback start date cursor changes
* revert back to filtering in the reader and pr feedback
* fix tests post-rebase and pr feedback
* remove invalid legacy option
* remove unused option
* the tests pass but this is quite messy
* very slight clean up
* Add skip options to csv format
* fix some of the typing issues
* fixme comment
* remove extra log message
* fix typing issues
* skip before header
* skip after header
* format
* add another test
* Automated Commit - Formatting Changes
* auto generate column names
* delete dead code
* update title and description
* true and false values
* Update the tests
* Add comment
* missing test
* rename
* update expected spec
* move to method
* Update comment
* fix typo
* remove unused import
* Add a comment
* None records do not pass the WaitForDiscoverPolicy
* format
* remove second branch to ensure we always go through the same processing
* Raise an exception if the record is None
* reset
* Update tests
* handle unquoted newlines
* Automated Commit - Formatting Changes
* Update test case so the quoting is explicit
* Update comment
* Automated Commit - Formatting Changes
* Fail validation if skipping rows before header and header is autogenerated
* always fail if a record cannot be parsed
* format
* write line_no in error message
* remove none check
* Automated Commit - Formatting Changes
* enable autogenerate test
* remove duplicate test
* missing unit tests
* Update
* remove branching
* remove unused none check
* Update tests
* remove branching
* format
* extract to function
* comment
* missing type
* type annotation
* use set
* Document that the strings are case-sensitive
* public -> private
* add unit test
* newline
---------
Co-authored-by: girarda <girarda@users.noreply.github.com>
* relax pydantic dep
* Automated Commit - Format and Process Resources Changes
* wip
* wrap up base integration
* add init file
* introduce CDK runner and improve error message
* make state param optional
* update protocol models
* review comments
* always run incremental if possible
* fix
---------
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
* relax pydantic dep
* Automated Commit - Format and Process Resources Changes
* update protocol models
* format change
---------
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
* remove duplicate param
* remove duplicate params
* fix some of the typing issues
* fix typing issues
* newline
* format
* Enable by default
* Add missing file
* refactor and remove flag
* none check
* move line of code
* fix typing in rate_limiting
* comment
* use typedef
* else branch
* format
* gate the feature
* rename test
* fix the test
* only dedupe if the values are the same
* Add some tests
* convert values to strings
* Document the change
* implement in requester too
* add avro parser for inferring schema and reading records
* fix mypy check not caught locally
* pr feedback and some additional types
* add decimal_as_float for avro
* formatting + mypy
This PR adds a new authenticator, the SessionTokenAuthenticator. The existing authenticator of the same name is renamed to LegacySessionTokenAuthenticator.
* ignore unit tests in mypy check
* Update airbyte-cdk/python/bin/run-mypy-on-modified-files.sh
Co-authored-by: Alexandre Girard <alexandre@airbyte.io>
* work through mypy errors
* fix a bunch of stuff
* fix more type hints
* fix model_to_component_factory types
* format
* ignore list instead of allow list
---------
Co-authored-by: Alexandre Girard <alexandre@airbyte.io>
* tests pass
* everything except parquet config seems to work
* the file format needs a literal
* Add a comment
* Update
* comment
* Ensure only one file type is specified
* Add a test
* add test
* update
* Automated Commit - Formatting Changes
* extract formats
* Automated Commit - Formatting Changes
* fix typo
* Update tests
* Also test jsonl
* Update airbyte-cdk/python/airbyte_cdk/sources/file_based/config/abstract_file_based_spec.py
Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com>
* Update the spec
* update to new config format
* set decimal_as_float to True on legacy configs for backward compatibility
* comments
* Update airbyte-cdk/python/airbyte_cdk/sources/file_based/config/file_based_stream_config.py
Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com>
* format
---------
Co-authored-by: girarda <girarda@users.noreply.github.com>
Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com>
* set test_read_limit_reached to true if we hit the max records limit
* rename slice to _slice to avoid shadowing the builtin
* newline
* fix some of the typing issues
* fix some more typing issues
* another fix
* fix last typing issue
* format
* Automated Commit - Formatting Changes
* reset type
* fix the type
* Update for clarity
* Update types
---------
Co-authored-by: girarda <girarda@users.noreply.github.com>