* New file-based CDK module scaffolding
* Address code review comments
* Formatting
* Automated Commit - Formatting Changes
* Apply suggestions from code review
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
Co-authored-by: Alexandre Girard <alexandre@airbyte.io>
* Automated Commit - Formatting Changes
* address CR comments
* Update tests to use builder pattern
* Move max files for schema inference onto the discovery policy
* Reorganize stream & its dependencies
* File CDK: error handling for CSV parser (#27176)
* Add file URL and updated_at timestamp to the state's history field
* Address CR comments
* Address CR comments
* Use stream_slice to determine which files to sync
* fix
* test with no input state
* test with multiple files
* filter out older files
* group by timestamp
* Add another test
* comment
* use min time
* skip files that are already in the history
* move the code around
* include files that are not in the history
* remove start_timestamp
* cleanup
* sync missing recent files even if history is more recent
* remove old files if history is full
* resync files if history is incomplete
* sync recent files
* comment
* configurable history size
* configurable days to sync if history is full
* move to a stateful object
* Only update state once per file
* two unit tests
* Unit tests
* missing files
* remove inner state
* fix tests
* fix interface
* fix constructor
* Update interface
* cleanup
* format
* Update
* cleanup
* Add timestamp and source file to schema
* set file uri on record
* format
* comment
* reset
* notes
* delete dead code
* format
* remove dead code
* remove dead code
* warning if history is not complete
* always set is_history_partial in the state
* rename
* Add a readme
* format
* Update
* rename
* rename
* missing files
* get instead of compute
* sort alphabetically, and sync everything if the history is not partial
* unit tests
* Update airbyte-cdk/python/airbyte_cdk/sources/file_based/README.md
Co-authored-by: Catherine Noll <clnoll@users.noreply.github.com>
* Update docs
* reset
* Test to verify files are removed in sorted order (datetime, then alphabetically)
* comment
* Update scenario
* Rename method to get_state
* If the file's ts is equal to the earliest ts, only sync it if it is alphabetically greater than the file
* add missing test
* rename
* rename and update comments
* Update comment for clarity
* inject the cursor
* add interface
* comment
* Handle the case where the file has been modified since it was synced
* Only inject from AbstractFileSource
* keep the remote files in the stream slices
* Use file_based typedefs
* format
* Update the comment
* simplify the logic, update comment, and add a test
* Add a comment
* slightly cleaner
* clean up
* typing
* comment
* I think this is simpler to reason about
* create the cursor in the source
* update
* Remove methods from FiledBasedStreamReader and AbstractFileBasedStream interface (#27736)
* update the interface
* Add a comment
* rename
---------
Co-authored-by: Catherine Noll <noll.catherine@gmail.com>
Co-authored-by: clnoll <clnoll@users.noreply.github.com>
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>