* New file-based CDK module scaffolding
* Address code review comments
* Formatting
* Automated Commit - Formatting Changes
* Apply suggestions from code review
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
Co-authored-by: Alexandre Girard <alexandre@airbyte.io>
* Automated Commit - Formatting Changes
* address CR comments
* Update tests to use builder pattern
* Move max files for schema inference onto the discovery policy
* Reorganize stream & its dependencies
* File CDK: error handling for CSV parser (#27176)
* file url and updated_at timestamp are added to the state's history field
* Address CR comments
* Address CR comments
* Use stream_slice to determine which files to sync
* fix
* test with no input state
* test with multiple files
* filter out older files
* group by timestamp
* Add another test
* comment
* use min time
* skip files that are already in the history
* move the code around
* include files that are not in the history
* remove start_timestamp
* cleanup
* sync missing recent files even if history is more recent
* remove old files if history is full
* resync files if history is incomplete
* sync recent files
* comment
* configurable history size
* configurable days to sync if history is full
* move to a stateful object
* Only update state once per file
* two unit tests
* Unit tests
* missing files
* remove inner state
* fix tests
* fix interface
* fix constructor
* Update interface
* cleanup
* format
* Update
* cleanup
* Add timestamp and source file to schema
* set file uri on record
* format
* comment
* reset
* notes
* delete dead code
* format
* remove dead code
* remove dead code
* warning if history is not complete
* always set is_history_partial in the state
* rename
* Add a readme
* format
* Update
* rename
* rename
* missing files
* get instead of compute
* sort alphabetically, and sync everything if the history is not partial
* unit tests
* Update airbyte-cdk/python/airbyte_cdk/sources/file_based/README.md
Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com>
* Update docs
* reset
* Test to verify we remove files sorted (datetime, alphabetically)
* comment
* Update scenario
* Rename method to get_state
* If the file's ts is equal to the earliest ts, only sync it if it's alphabetically greater than the file
* add missing test
* rename
* rename and update comments
* Update comment for clarity
* inject the cursor
* add interface
* comment
* Handle the case where the file has been modified since it was synced
* Only inject from AbstractFileSource
* keep the remote files in the stream slices
* Use file_based typedefs
* format
* Update the comment
* simplify the logic, update comment, and add a test
* Add a comment
* slightly cleaner
* clean up
* typing
* comment
* I think this is simpler to reason about
* create the cursor in the source
* update
* Remove methods from FileBasedStreamReader and AbstractFileBasedStream interface (#27736)
* update the interface
* Add a comment
* rename
---------
Co-authored-by: Catherine Noll <noll.catherine@gmail.com>
Co-authored-by: clnoll <clnoll@users.noreply.github.com>
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
* availability check: handle HttpError raised during slice extraction (reading of the parent stream),
updated reason messages,
moved the availability check call under the common try/except that handles errors during a regular stream read,
moved the log messages that indicate the start of a stream sync before the availability check, to make it easier to understand which stream is the source of errors
* why do we return here and not try next stream?
* fixed bug in CheckStream; now we check availability for all streams
* [ISSUE #26581] per partition cursor
* [ISSUE #26581] format
* [ISSUE #26581] clean up state management
* [ISSUE #26581] improving Hashabledict
* [ISSUE #26581] format cdk
* [ISSUE #26581] fix tests
* [ISSUE #26581] code review from girarda
* Retrigger pipeline
* Decouple cursor and stream slicer and push state management as far up the cursor as possible
* Format cdk
* Small fixes/comments
* DatetimeBasedCursor should not update state based on slice (for now at least since it wasn't doing this before)
* [ISSUE #26581] code review
* Automated Commit - Formatting Changes
* [ISSUE #26581] validation overlapping keys
* [ISSUE #26581] add typing
* [ISSUE #26581] code review
* Remove SyncMode from stream_slices
* Removing SyncMode from stream_slices up until SimpleRetriever and fixing typing
* format cdk
* add the request filters and integration test fixtures
* pr feedback and some tweaks to the testing framework
* optimize the cache for more hits
* formatting
* remove cache
* Revert "🐛 CDK: replace `data` with `json` when making OAuth calls (#27350)"
This reverts commit 780f4415d9.
* Revert "Set content-type header on oauth request (#27225)"
This reverts commit 2864f72ff4.
* Set content-type header on oauth authenticator
* Revert "Set content-type header on oauth authenticator"
This reverts commit 1e6815e9bb.
* Set header on oauth request
* Fix test
* Verify header is set
* Automated Commit - Formatting Changes
---------
Co-authored-by: girarda <girarda@users.noreply.github.com>
* secure the jinja environment
* format
* Update comment
* remove extra test
* remove lambda
* Update
* Raise an error on undefined variables
* remove unused import
* add unit tests for missing context vars and adjust error message
---------
Co-authored-by: brianjlai <brian.lai@airbyte.io>
Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com>
* add aliases
* Raise error if the alias is found in the context
* format
* Comment
* Automated Commit - Formatting Changes
* rename to stream partition in greenhouse manifest
* Revert "rename to stream partition in greenhouse manifest"
This reverts commit d513ef418f.
* Clean up test
* Other test
* last test
---------
Co-authored-by: girarda <girarda@users.noreply.github.com>
* Fix and document macros
* cleanup
* dots
* Add tests and refactor
* Update
* Add an example
* Document variables
* Mention now_local is not recommended
* Fix + unit test
* Add a test with pagination
* Add a test with partition router
* Make sure _fetch_next_page is called with the right arguments
* Automated Commit - Formatting Changes
* pagination with partitions
* refactor
* clean up
* format
---------
Co-authored-by: girarda <girarda@users.noreply.github.com>
* Move condition for yielding the slice message to an overwritable method
* Automated Commit - Formatting Changes
* yield the slice log messages
* same for incremental
* refactor
* Revert "refactor"
This reverts commit c594365bd8.
* move flag from factory to source
* set the flag
* remove debug print
* halfmock
* clean up
* Add a test for a single page
* Add another test
* Pass the flag
* rename
---------
Co-authored-by: girarda <girarda@users.noreply.github.com>
* wip
* fix unit test
* fix other unit test
* format
* reset
* format
* missing unit test
* yield a LogMessage on error
* format
* format
* fix unit tests
* yield a trace message instead of a log message
* format
* fix bad merge