airbyte

mirror of synced 2025-12-21 19:11:14 -05:00

Author	SHA1	Message	Date
Maxime Carbonneau-Leclerc	cfbd0b8219	[ISSUE #26764 ] support brute force multiline json objects for JSONL (#29331 ) * [ISSUE #26764] support brute force multiline json objects for JSONL * [ISSUE #26764] infer_schema to support multiline json objects as well * [ISSUE #26764] code review	2023-08-10 15:54:46 -04:00
Alexandre Girard	0aa86cf156	File-based CDK + Source S3 (v4): Pass configured file encoding to stream reader (#29110 ) * Add encoding to open_file interface * pass the encoding set in the config * cleanup * cleanup * Automated Commit - Formatting Changes * Add missing test * Automated Commit - Formatting Changes * Update infer_schema too * Automated Commit - Formatting Changes * Update unit test * add a unit test * fix * format * format * remove newline * use a mock * fix * format --------- Co-authored-by: girarda <girarda@users.noreply.github.com>	2023-08-09 09:05:06 -05:00
Brian Lai	b8d5ca77db	🐛 [file based cdk] Fix S3 and abstract spec to be compatible with Airbyte UI and CAT (#29075 ) * remove version, make validation_policy enum, fix input_schema for s3 and abstract file based configs * remove multiple file format options from stream config * pr feedback * fix tests after rebase * additional spec changes to work with the UI * fix tests post-rebase * fix tests post-rebase and cleanup * formatting	2023-08-08 18:10:05 -04:00
Alexandre Girard	78b00e088b	Parquet parser return Decimal fields as strings (#29191 ) * Update the test so it fails if the type is different * Update to convert values * Add columns from file partitions * update	2023-08-08 11:38:16 -07:00
Alexandre Girard	1b6428877d	Avro parser: return Decimal fields as strings (#29182 ) * update avro parsing * rename field * output as iso strings	2023-08-08 11:34:25 -07:00
Brian Lai	01045d674d	Add start_date to all file-based configs (#28845 ) * add start_date config to abstract spec and apply it in the cursor * rollback start date cursor changes * revert back to filtering in the reader and pr feedback * fix tests post-rebase and pr feedback	2023-08-07 20:43:07 -04:00
lmossman	41b384d708	🤖 Bump patch version of Airbyte CDK	2023-08-04 21:15:14 +00:00
Lake Mossman	4bf6b8e15a	Fix title & description of datetime_format field (#29025 ) * fix description of datetime_format field * Automated Commit - Formatting Changes * improve description of cursor datetime formats field * Automated Commit - Formatting Changes --------- Co-authored-by: lmossman <lmossman@users.noreply.github.com>	2023-08-04 14:04:24 -07:00
clnoll	a65055f0a2	🤖 Bump patch version of Airbyte CDK	2023-08-04 16:20:56 +00:00
Catherine Noll	53d8450ec2	File-based CDK: allow FileBasedSource to take a cursor_cls (#29027 )	2023-08-04 09:49:03 -04:00
Catherine Noll	8ced5ff1db	airbyte-cdk: allow Entrypoint to extract config (#28980 )	2023-08-03 22:48:06 -04:00
Alexandre Girard	641a65a1e3	Add CSV options to the CSV parser (#28491 ) * remove invalid legacy option * remove unused option * the tests pass but this is quite messy * very slight clean up * Add skip options to csv format * fix some of the typing issues * fixme comment * remove extra log message * fix typing issues * skip before header * skip after header * format * add another test * Automated Commit - Formatting Changes * auto generate column names * delete dead code * update title and description * true and false values * Update the tests * Add comment * missing test * rename * update expected spec * move to method * Update comment * fix typo * remove unused import * Add a comment * None records do not pass the WaitForDiscoverPolicy * format * remove second branch to ensure we always go through the same processing * Raise an exception if the record is None * reset * Update tests * handle unquoted newlines * Automated Commit - Formatting Changes * Update test case so the quoting is explicit * Update comment * Automated Commit - Formatting Changes * Fail validation if skipping rows before header and header is autogenerated * always fail if a record cannot be parsed * format * set write line_no in error message * remove none check * Automated Commit - Formatting Changes * enable autogenerate test * remove duplicate test * missing unit tests * Update * remove branching * remove unused none check * Update tests * remove branching * format * extract to function * comment * missing type * type annotation * use set * Document that the strings are case-sensitive * public -> private * add unit test * newline --------- Co-authored-by: girarda <girarda@users.noreply.github.com>	2023-08-03 08:59:55 -07:00
flash1293	4b4de02abd	🤖 Bump minor version of Airbyte CDK	2023-08-03 10:40:10 +00:00
Joe Reuter	df3b1d9c8d	🚨🚨 Low code CDK: Decouple SimpleRetriever and HttpStream (#28657 ) * fix tests * format * review comments * Automated Commit - Formatting Changes * review comments * review comments * review comments * log all messages * log all message * review comments * review comments * Automated Commit - Formatting Changes * add comment --------- Co-authored-by: flash1293 <flash1293@users.noreply.github.com>	2023-08-03 12:30:59 +02:00
flash1293	58b4a64a37	🤖 Bump minor version of Airbyte CDK	2023-08-03 10:09:13 +00:00
Joe Reuter	1ee4c04203	CDK: Embedded reader utils (#28873 ) * relax pydantic dep * Automated Commit - Format and Process Resources Changes * wip * wrap up base integration * add init file * introduce CDK runner and improve error message * make state param optional * update protocol models * review comments * always run incremental if possible * fix --------- Co-authored-by: flash1293 <flash1293@users.noreply.github.com>	2023-08-03 12:02:31 +02:00
flash1293	c32cc25ca7	🤖 Bump minor version of Airbyte CDK	2023-08-02 11:10:57 +00:00
Joe Reuter	60e1d72b42	Python CDK: Relax pydantic version requirement (#28854 ) * relax pydantic dep * Automated Commit - Format and Process Resources Changes * update protocol models * format change --------- Co-authored-by: flash1293 <flash1293@users.noreply.github.com>	2023-08-02 13:03:03 +02:00
Catherine Noll	09ebb47b24	File cdk parser and cursor updates (#28900 ) * File-based CDK: update parquet parser to handle partitions * File-based CDK: make the record output & cursor date time format consistent	2023-08-01 21:47:58 -04:00
Maxime Carbonneau-Leclerc	48e451703f	Improve file identification for mypy (#28780 ) * Improve file identification for mypy * Catch files that are not yet commited as well * add staged files	2023-08-01 18:14:14 -04:00
maxi297	6df0709dea	🤖 Bump patch version of Airbyte CDK	2023-08-01 22:05:38 +00:00
Maxime Carbonneau-Leclerc	e158bec4b2	[ISSUE #28782 ] support multiple cursor field datetime formats (#28936 ) * [ISSUE #28782] support multiple cursor field datetime formats * Making sure we use the proper format for creating slices * Code review	2023-08-01 17:59:17 -04:00
clnoll	5cf912a27b	🤖 Bump patch version of Airbyte CDK	2023-08-01 15:27:34 +00:00
Catherine Noll	22ff7e0fae	File-based CDK: reorganize FileReadMode to fix circular import (#28885 )	2023-07-31 17:55:29 -04:00
Catherine Noll	642e7680b4	File-based CDK: add read mode to stream reader interface & parsers (#28862 )	2023-07-31 16:55:00 -04:00
Catherine Noll	73395a187a	File-based CDK: allow null values for all inferred columns (#28847 )	2023-07-31 15:10:21 -04:00
maxi297	292530c536	🤖 Bump patch version of Airbyte CDK	2023-07-27 14:13:49 +00:00
Maxime Carbonneau-Leclerc	48bf520d87	Fix stream read given stream doesn't have any slice (#28746 ) * Fix stream read given stream doesn't have any slice * Not return slices if there are none * Fix test	2023-07-27 10:05:35 -04:00
Ben Church	fb7258e2bd	Move tools/ci_* projects to airbyte-ci, update to use Poetry, bump to python 3.10 (#27957 ) * Move ci_connector_ops * Move ci_credentials * Move tools/ci_common_utils * Rename tools to airbyte-ci * Move to ci * Convert ci_credentials * Convert ci_common_utls * Convert ci_connector_ops * Get pipelines running * Move pipelines to own poetry project * Update readme * Delete * Add ci_code_validator * Use pipx to install gha deps * Fix' * Ensure every thing is running * Automated Commit - Formatting Changes * Gitignore miss * Add pipx installer * Get local pipx dependencies * Fix paths * Install pipx * ceremonial source-faker change * Add installation step for ci_code_validator * Add comment * remove ci_code_validator * Address code review comments * add pipx install to acceptance-test-docker.sh * Run formater * Revert "ceremonial source-faker change" This reverts commit `26884cd0db`. * gitignore lecacy pipeline report path * update poetry.lock * skip upload if logs do not exist --------- Co-authored-by: bnchrch <bnchrch@users.noreply.github.com> Co-authored-by: alafanechere <augustin.lafanechere@gmail.com>	2023-07-26 15:49:59 +00:00
Lake Mossman	f66079cdb1	Capitalize title of request_authentication field (#28585 ) * capitalize title of request_authentication field * Automated Commit - Formatting Changes --------- Co-authored-by: lmossman <lmossman@users.noreply.github.com>	2023-07-25 17:17:27 -06:00
girarda	6e75a7812e	🤖 Bump patch version of Airbyte CDK	2023-07-25 21:48:36 +00:00
Alexandre Girard	df01616951	[Issue #23497 ] Deduplicate query parameters for declarative connectors (#28550 ) * remove duplicate param * remove duplicate params * fix some of the typing issues * fix typing issues * newline * format * Enable by default * Add missing file * refactor and remove flag * none check * move line of code * fix typing in rate_limiting * comment * use typedef * else branch * format * gate the feature * rename test * fix the test * only dedupe if the values are the same * Add some tests * convert values to strings * Document the change * implement in requester too	2023-07-25 14:22:25 -07:00
Brian Lai	59300093b1	[file-based cdk] Add avro parser for inferring schema and reading records (#28500 ) * add avro parser for inferring schema and reading records * fix mypy check not caught locally * pr feedback and some additional types * add decimal_as_float for avro * formatting + mypy	2023-07-25 12:54:16 -04:00
maxi297	62ca5b82ea	🤖 Bump patch version of Airbyte CDK	2023-07-20 14:50:12 +00:00
Maxime Carbonneau-Leclerc	b1a5f270ae	Fix remove field transform (#28518 ) * Fix remove field transform * mypy	2023-07-20 10:15:42 -04:00
flash1293	05b00303ba	🤖 Bump minor version of Airbyte CDK	2023-07-19 15:18:35 +00:00
Joe Reuter	58cc540c6b	🚨 Low code CDK: Add session token authenticator (#28050 ) This PR adds a new authenticator: The SessionTokenAuthenticator. The existing authenticator under the same name is renamed to LegacySessionTokenAuthenticator.	2023-07-19 17:10:24 +02:00
Joe Reuter	78728410f4	Low code CDK: Fix mypy errors (#28386 ) * ingore unit tests in mypy check * Update airbyte-cdk/python/bin/run-mypy-on-modified-files.sh Co-authored-by: Alexandre Girard <alexandre@airbyte.io> * work through mypy errors * fix a bunch of stuff * fix more type hints * fix model_to_component_factory types * format * ignore list instead of allow list --------- Co-authored-by: Alexandre Girard <alexandre@airbyte.io>	2023-07-19 15:08:35 +02:00
Joe Reuter	db16853fd8	Ingore unit tests in mypy check (#28359 ) * ingore unit tests in mypy check * Update airbyte-cdk/python/bin/run-mypy-on-modified-files.sh Co-authored-by: Alexandre Girard <alexandre@airbyte.io> * ignore list instead of allow list --------- Co-authored-by: Alexandre Girard <alexandre@airbyte.io>	2023-07-19 10:41:34 +02:00
girarda	8099c254f8	🤖 Bump patch version of Airbyte CDK	2023-07-19 01:56:49 +00:00
Alexandre Girard	dcf35701f4	Fix cdk unit test (#28447 ) * Update spec * fix test_create_custom_components * ignore errors	2023-07-18 18:40:32 -07:00
Alexandre Girard	9de707fbf0	Parquet files: support decimal as floats, map, null, and fixed sized binary types (#28320 ) * tests pass * everything except parquet config seems to work * the file fortmat needs a literal * Add a comment * Update * comment * Ensure only one file type is specified * Add a test * add test * update * Automated Commit - Formatting Changes * extract formats * Automated Commit - Formatting Changes * fix typo * Update tests * Also test jsonl * Update airbyte-cdk/python/airbyte_cdk/sources/file_based/config/abstract_file_based_spec.py Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com> * Update the spec * update to new config format * set decimal_as_float to True on legacy configs for backward compatibility * comments * Update airbyte-cdk/python/airbyte_cdk/sources/file_based/config/file_based_stream_config.py Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com> * format --------- Co-authored-by: girarda <girarda@users.noreply.github.com> Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com>	2023-07-18 18:40:51 -05:00
Alexandre Girard	3ae73fb0ff	connector builder: Set test_read_limit_reached to true if we hit the max records limit (#28293 ) * set test_read_limit_reached to true if we hit the max records limit * rename slice to _slice to avoid shadowing a builtin keyword * newline * fix some of the typing issues * fix some more typing issues * another fix * fix last typing issue * format * Automated Commit - Formatting Changes * reset type * fix the type * Update for clarity * Update types --------- Co-authored-by: girarda <girarda@users.noreply.github.com>	2023-07-18 15:53:53 -07:00
girarda	ef26793fac	🤖 Bump minor version of Airbyte CDK	2023-07-18 21:52:18 +00:00
Alexandre Girard	5ca1b41eb7	Move pyarrow to CDK extra (#28413 ) * move pyarrow to extra * Automated Commit - Formatting Changes * remove parquet tests * delete the import * missing space * Automated Commit - Formatting Changes * comment parquet_parser too * optimize imports * comment out temporary file source * add pyarrow to dev extra * reset files * share pyarrow dependency * use alpine for declarative_source * Automated Commit - Formatting Changes * Revert "use alpine for declarative_source" This reverts commit `a3ad47ccca`. * pin cdk version * reset the cdk version --------- Co-authored-by: girarda <girarda@users.noreply.github.com>	2023-07-18 16:20:53 -05:00
Maxime Carbonneau-Leclerc	250e508f8c	Remove pyyaml/cython tmp fix in airbyte-cdk docker (#28396 )	2023-07-18 10:38:53 -04:00
maxi297	25fee2206a	🤖 Bump minor version of Airbyte CDK	2023-07-18 14:16:25 +00:00
Maxime Carbonneau-Leclerc	21d1a3bd62	Fix pyyaml/cython issue (#28393 )	2023-07-18 10:09:58 -04:00
Catherine Noll	e2bb01838e	File-based CDK: implement JSONL parser (#28259 )	2023-07-17 22:46:58 -06:00
Joe Reuter	0d185a2b40	fix date format detection (#28268 )	2023-07-14 13:16:15 +02:00

1266 Commits