mirror of synced 2025-12-26 14:02:10 -05:00

370 Commits

Author SHA1 Message Date
Maxime Carbonneau-Leclerc
71d50635cc [ISSUE #32070] concurrent cdk improve futures handling (#32277) 2023-11-08 09:16:39 -05:00
Alexandre Girard
139deeb081 Implement max_time on error handler (#32272) 2023-11-08 00:46:26 +00:00
Eugene Kulak
6c7ba28d75 API Call Rate limiter (#31276)
Co-authored-by: Eugene Kulak <kulak.eugene@gmail.com>
Co-authored-by: keu <keu@users.noreply.github.com>
Co-authored-by: Alexandre Girard <alexandre@airbyte.io>
2023-11-07 23:32:53 +02:00
Catherine Noll
4f44e33f5c Concurrent CDK: handle legacy state messages (#31964) 2023-11-02 08:21:08 -04:00
Joe Reuter
66dd29f764 File CDK unstructured parser: Improve file type detection (#31997) 2023-11-02 12:19:27 +01:00
Maxime Carbonneau-Leclerc
32fdd7fd72 [ISSUE #29573] Concurrent CDK: incremental syncs (#31466)
Co-authored-by: Alexandre Girard <alexandre@airbyte.io>
Co-authored-by: girarda <girarda@users.noreply.github.com>
Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com>
2023-11-01 12:00:25 -04:00
Martin Hwasser
bc4b7198a9 Add pptx support in file based cdk (#31912)
Co-authored-by: Joe Reuter <joe@airbyte.io>
2023-10-30 14:42:39 +01:00
Artem Inzhyyants
ecd6d89b9a Airbyte CDK: make max_time optional for backoff handler external usage (#31889) 2023-10-27 13:56:36 +02:00
Alexandre Girard
b8ad0c6a91 🐛 CDK: use in memory caching if ENV_REQUEST_CACHE_PATH is not set (#31887)
Co-authored-by: Eugene Kulak <widowmakerreborn@gmail.com>
Co-authored-by: girarda <girarda@users.noreply.github.com>
2023-10-26 19:28:39 -07:00
Joe Reuter
e3793c1491 Move over unstructured parser (#31390)
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-10-26 17:50:57 +02:00
Anatolii Yatsuk
c719137df3 🐛 Airbyte CDK: Fix flake errors in file-based CDK (#31771) 2023-10-24 16:15:11 +03:00
Anatolii Yatsuk
ce2342dde8 🎉 Airbyte CDK: Add CustomFileBasedException for custom errors in file-based CDK (#31704) 2023-10-24 11:09:50 +00:00
Alexandre Girard
7a764f8bbc low-code CDK: Allow connector developers to specify the type of an added field (#31638)
Co-authored-by: girarda <girarda@users.noreply.github.com>
Co-authored-by: erohmensing <erohmensing@gmail.com>
2023-10-23 14:12:59 -07:00
Alexandre Girard
7da2822488 Concurrent CDK: catch exceptions from worker thread and add integration test scenarios (#31245)
Co-authored-by: girarda <girarda@users.noreply.github.com>
2023-10-23 08:39:58 -07:00
Joe Reuter
d474827068 File CDK: Don't fetch full file list for availability check (#31651)
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-10-23 16:14:41 +02:00
Joe Reuter
bb07939646 File CDK: Add analytics messages for parser usage (#31498)
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-10-19 15:42:51 +02:00
Martin Hwasser
40b0e05526 vector_based_cdk: Add option to rename field names (#31524)
Co-authored-by: Joe Reuter <joe@airbyte.io>
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-10-19 15:37:47 +02:00
Yevhenii
b951898c20 CDK: Support base64 encode and decode in Jinja Interpolation (#31387) 2023-10-19 13:55:45 +03:00
Alexandre Girard
ef9bd72a7e Parameterize ScenarioBuilder on Source type (#31244)
Co-authored-by: girarda <girarda@users.noreply.github.com>
Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com>
Co-authored-by: Maxime Carbonneau-Leclerc <maxi297@users.noreply.github.com>
2023-10-16 17:12:18 -07:00
Alexandre Girard
04c4fea5cc 🐛 Concurrent CDK bug fixes (#31402) 2023-10-16 12:06:35 -07:00
Anton Karpets
51fa2b3c31 🐛Airbyte CDK: wrap HTTP error with status code 400 in AirbyteTracedException (#31207) 2023-10-16 11:15:04 +03:00
Joe Reuter
e35a1f2cd9 File CDK: Allow configuration of parsed records during check and discover from parser (#31281)
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-10-13 09:50:22 +02:00
Catherine Noll
8536725944 CDK: URL-encode query parameters and request body (#30407) 2023-10-12 09:56:55 -04:00
Joe Reuter
67324a4b5b Vector DB CDK: Batch by documents separately for each stream and namespace (#31158) 2023-10-12 13:47:27 +00:00
Alexandre Girard
25fc396cdf CDK: ThreadBasedConcurrentStream skeleton and top-level AbstractStream (#30111)
Co-authored-by: girarda <girarda@users.noreply.github.com>
Co-authored-by: Maxime Carbonneau-Leclerc <maxi297@users.noreply.github.com>
Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com>
2023-10-11 16:46:02 -07:00
Yevhenii
17136a0c8a CDK: Fix initialize of token_expiry_is_time_of_expiration field (#31279) 2023-10-11 16:35:56 +00:00
Yevhenii
c17fae5855 CDK: create new method for parsing refresh token lifespan (#30698)
Co-authored-by: yevhenii-ldv <yevhenii-ldv@users.noreply.github.com>
2023-10-10 17:08:41 +03:00
Ben Church
4c97b2994a CDK: coerce read records to an iterator (#31122)
Co-authored-by: bnchrch <bnchrch@users.noreply.github.com>
2023-10-06 10:01:29 -07:00
Yevhenii
00452c9bd3 CDK: Enable Page Number/Offset to be set on the first request (#30978)
Co-authored-by: yevhenii-ldv <yevhenii-ldv@users.noreply.github.com>
2023-10-05 15:31:30 +03:00
Roman Yermilov [GL]
e561d5d432 Airbyte CDK: fix none type binary error in parquet parser (#31073) 2023-10-05 15:56:02 +04:00
Anton Karpets
767800d2d7 🐛Airbyte CDK: fix parsing of UUID fields in avro files (#31096) 2023-10-05 10:53:18 +03:00
Joe Reuter
5ab372170b Vector DB CDK: Add embedding option for openai-compatible embedding services (#30137) 2023-10-02 16:21:44 +00:00
Eugene Kulak
5eba3c3b57 CDK: Fix request_cache clearing and move it to tmp folder (#30719)
Co-authored-by: Eugene Kulak <kulak.eugene@gmail.com>
2023-09-28 21:27:40 +03:00
Marius Posta
7ae97175a6 gradle: fix repo wide behaviour (#30607) 2023-09-28 05:01:13 -07:00
Yevhenii
8cdafabd82 Airbyte CDK: Change Error message if stream is not found (#30723)
Co-authored-by: Yevhenii Kurochkin <ykurochkin@flyaps.com>
2023-09-25 18:13:19 +03:00
Joe Reuter
7e3437f05b Add chunking options to vector_db CDK (#30305)
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-09-25 10:09:37 +00:00
Maxime Carbonneau-Leclerc
b335880fda jira invalid user-provided urls generating sentry issues (#30672) 2023-09-21 15:01:17 -04:00
Joe Reuter
a609902106 Vector DB CDK: Split openai embedding calls (#30512) 2023-09-19 14:21:13 +00:00
Maxime Carbonneau-Leclerc
b6836ad950 [ISSUE #30353] remove file_type from stream config (#30453) 2023-09-18 08:50:00 -04:00
Maxime Carbonneau-Leclerc
3e41ce7cd6 Maxi297/fix datetime format inference issue (#30442) 2023-09-15 09:40:47 -04:00
Joe Reuter
da5b432255 Vector DB CDK: AzureOpenAIEmbedder (#30136) 2023-09-14 12:41:00 +02:00
Maxime Carbonneau-Leclerc
48e8816b6b [oncall #2838] migrate parsing errors as config errors (#30209) 2023-09-06 13:38:48 -04:00
Joe Reuter
f2a8bebdc5 Vector DB CDK: Add "from field" embedding strategy (#30140) 2023-09-06 14:54:17 +02:00
Joe Reuter
56580b70c3 Vector DB CDK: Better error message for misconfigured text fields (#30129)
Co-authored-by: Pedro S. Lopez <pedroslopez@me.com>
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-09-06 10:11:56 +02:00
Maxime Carbonneau-Leclerc
5b653676aa Update spec and fix autogenerated headers with skip after (#30123) 2023-09-03 09:26:53 -04:00
Maxime Carbonneau-Leclerc
399b4d1fca File-based CDK: ensure no errors in Sentry given empty CSV (#29944) 2023-09-02 09:40:08 -04:00
Joe Reuter
7966a4e8f6 Vector DB CDK: Fix id generation, improve config spec, add base test case (#30081) 2023-09-01 15:04:10 +02:00
Alexandre Girard
7264b3e1d7 Fix mypy issues in AbstractSource + minor refactoring (#29927)
Co-authored-by: girarda <girarda@users.noreply.github.com>
2023-08-31 07:35:17 -07:00
Joe Reuter
a6547456b9 Vector based CDK (#29703)
Co-authored-by: flash1293 <flash1293@users.noreply.github.com>
2023-08-29 16:04:32 +02:00
Maxime Carbonneau-Leclerc
e2fb04f72d File-based CDK: allow user to provided column names (#29868) 2023-08-28 18:00:19 -04:00