Commit Graph

133 Commits

Author SHA1 Message Date
Rhys
a5309bee25 fix: handle missing credential_id (#30051)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-24 11:21:51 +08:00
wangxiaolei
111a39b549 fix: fix firecrawl url concat (#30008) 2025-12-24 09:40:32 +08:00
wangxiaolei
32605181bd feat: first use INTERNAL_FILES_URL first, then FILES_URL (#29962) 2025-12-21 16:53:37 +08:00
zhaobingshuang
8d1e36540a fix: detect_file_encodings TypeError: tuple indices must be integers or slices, not str (#29595)
Co-authored-by: crazywoola <100913391+crazywoola@users.noreply.github.com>
2025-12-17 13:58:05 +08:00
Jyong
ae4a9040df Feat/update notion preview (#29345)
Co-authored-by: twwu <twwu@dify.ai>
2025-12-16 16:43:45 +08:00
wangxiaolei
8f3fd9a728 perf: commit once (#29590) 2025-12-15 11:40:26 +08:00
Jyong
db42f467c8 fix: docx extractor external image failed (#29558) 2025-12-12 13:41:51 +08:00
Nie Ronghua
12e39365fa perf(core/rag): optimize Excel extractor performance and memory usage (#29551)
Co-authored-by: 01393547 <nieronghua@sf-express.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2025-12-12 12:15:03 +08:00
wangxiaolei
45911ab0af feat: using charset_normalizer instead of chardet (#29022) 2025-12-05 11:19:19 +08:00
李龙飞
81832c14ee Fix: Correctly handle merged cells in DOCX tables to prevent content duplication and loss (#27871)
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: crazywoola <100913391+crazywoola@users.noreply.github.com>
2025-11-13 15:56:24 +08:00
Ademílson Tonato
a16ef7e73c refactor: Update Firecrawl to use v2 API (#24734)
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2025-10-15 10:48:54 +08:00
Guangdong Liu
d299e75e1b refactor: use dynamic max characters for chunking in extractors (#26782) 2025-10-13 10:22:59 +08:00
Asuka Minato
1bd621f819 remove .value (#26633)
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-11 09:08:29 +08:00
Asuka Minato
bb6a331490 change all to httpx (#26119)
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: crazywoola <100913391+crazywoola@users.noreply.github.com>
2025-10-10 23:41:16 +08:00
Asuka Minato
ab2eacb6c1 use model_validate (#26182)
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: crazywoola <100913391+crazywoola@users.noreply.github.com>
2025-10-10 17:30:13 +09:00
-LAN-
85cda47c70 feat: knowledge pipeline (#25360)
Signed-off-by: -LAN- <laipz8200@outlook.com>
Co-authored-by: twwu <twwu@dify.ai>
Co-authored-by: crazywoola <100913391+crazywoola@users.noreply.github.com>
Co-authored-by: jyong <718720800@qq.com>
Co-authored-by: Wu Tianwei <30284043+WTW0313@users.noreply.github.com>
Co-authored-by: QuantumGhost <obelisk.reg+git@gmail.com>
Co-authored-by: lyzno1 <yuanyouhuilyz@gmail.com>
Co-authored-by: quicksand <quicksandzn@gmail.com>
Co-authored-by: Jyong <76649700+JohnJyong@users.noreply.github.com>
Co-authored-by: lyzno1 <92089059+lyzno1@users.noreply.github.com>
Co-authored-by: zxhlyh <jasonapring2015@outlook.com>
Co-authored-by: Yongtao Huang <yongtaoh2022@gmail.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Joel <iamjoel007@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: nite-knite <nkCoding@gmail.com>
Co-authored-by: Hanqing Zhao <sherry9277@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry <xh001x@hotmail.com>
2025-09-18 12:49:10 +08:00
-LAN-
bab4975809 chore: add ast-grep rule to convert Optional[T] to T | None (#25560)
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2025-09-15 13:06:33 +08:00
Ritoban Dutta
67a686cf98 [Chore/Refactor] use __all__ to specify export member. (#25681) 2025-09-15 09:45:35 +08:00
Krito.
a13d7987e0 chore: adopt StrEnum and auto() for some string-typed enums (#25129)
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: crazywoola <100913391+crazywoola@users.noreply.github.com>
2025-09-12 21:14:26 +08:00
-LAN-
9b8a03b53b [Chore/Refactor] Improve type annotations in models module (#25281)
Signed-off-by: -LAN- <laipz8200@outlook.com>
Co-authored-by: crazywoola <100913391+crazywoola@users.noreply.github.com>
2025-09-08 09:42:27 +08:00
Asuka Minato
a78339a040 remove bare list, dict, Sequence, None, Any (#25058)
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: -LAN- <laipz8200@outlook.com>
2025-09-06 03:32:23 +08:00
Yoshio Sugiyama
4966e4e1fb fix: Remove invalid key from firecrawl request payload. (#25190)
Signed-off-by: SUGIYAMA Yoshio <nenegi.01mo@gmail.com>
2025-09-05 10:10:56 +08:00
-LAN-
9d5956cef8 [Chore/Refactor] Switch from MyPy to Basedpyright for type checking (#25047)
Signed-off-by: -LAN- <laipz8200@outlook.com>
2025-09-03 11:52:26 +08:00
湛露先生
1fff4620e6 clean console apis and rag cleans. (#25042)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2025-09-03 11:25:18 +08:00
Yongtao Huang
bc9efa7ea8 Refactor: use DatasourceType.XX.value instead of hardcoded (#25015)
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2025-09-03 08:56:48 +08:00
Yongtao Huang
be3af1e234 Migrate SQLAlchemy from 1.x to 2.0 with automated and manual adjustments (#23224)
Co-authored-by: Yongtao Huang <99629139+hyongtao-db@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2025-09-02 10:30:19 +08:00
willzhao
ffba341258 [CHORE]: remove redundant-cast (#24807) 2025-09-01 14:05:32 +08:00
Bowen Liang
39064197da chore: cleanup unnecessary mypy suppressions on imports (#24712) 2025-08-28 23:17:25 +08:00
Asuka Minato
d2f234757b example try rm ignore (#24649) 2025-08-28 09:36:16 +08:00
-LAN-
da9af7b547 [Chore/Refactor] Use centralized naive_utc_now for UTC datetime operations (#24352)
Signed-off-by: -LAN- <laipz8200@outlook.com>
2025-08-22 23:53:05 +08:00
Asuka Minato
51cc2bf429 example of next(, None) (#24345) 2025-08-22 18:32:22 +08:00
willzhao
5ab6bc283c [CHORE]: x: T = None to x: Optional[T] = None (#24217) 2025-08-21 21:58:39 +08:00
Guangdong Liu
1abf1240b2 refactor: replace try-except blocks with contextlib.suppress for cleaner exception handling (#24284) 2025-08-21 18:18:49 +08:00
湛露先生
fd536a943a word extractor cleans. (#20926)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-08-08 09:37:51 +08:00
Aurelius Huang
ffddabde43 feat(notion): Notion Database extracts Rows content in row order and appends Row Page URL (#22646)
Co-authored-by: Aurelius Huang <cm.huang@aftership.com>
2025-07-30 21:35:20 +08:00
Asuka Minato
ef51678c73 orm filter -> where (#22801)
Signed-off-by: -LAN- <laipz8200@outlook.com>
Co-authored-by: -LAN- <laipz8200@outlook.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-07-24 00:57:45 +08:00
Asuka Minato
6d3e198c3c Mapped column (#22644)
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-23 00:39:59 +08:00
helojo
e7d80bf7bf Fix: the pict type picture was not processed in the docx (#19305)
Co-authored-by: zqgame <zqgame@zqgame.local>
2025-07-17 22:53:35 +08:00
yihong
d2933c2bfe fix: drop dead code phase2 unused class (#22042)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-07-17 09:33:07 +08:00
baonudesifeizhai
1c7404099d fix: prevent timeout in file encoding detection for large files (#21453)
Co-authored-by: crazywoola <427733928@qq.com>
2025-07-03 17:06:49 +08:00
Jin
3e7f8bad56 fix: markdown_extractor lost chunks if it starts without a header(#21308) (#21309) 2025-06-21 23:10:00 +08:00
Ademílson Tonato
9e73e8b9e8 feat: add search endpoint for Firecrawl Integration (#20521)
Co-authored-by: crazywoola <427733928@qq.com>
2025-06-18 14:37:03 +08:00
kazuya-awano
45c89bd6de feat: add pagenation to notion extractor (#20919) 2025-06-18 11:30:55 +08:00
-LAN-
482e50aae9 Refactor/remove db from cycle manager (#20455)
Signed-off-by: -LAN- <laipz8200@outlook.com>
2025-05-30 04:34:13 +08:00
Amir Mohsen Asaran
c9ee60e197 Feat(WaterCrawl error handling): add custom exceptions and error handling (#19948) 2025-05-20 10:25:16 +08:00
-LAN-
4977bb21ec feat(workflow): domain model for workflow node execution (#19430)
Signed-off-by: -LAN- <laipz8200@outlook.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-05-17 00:56:16 +08:00
非法操作
085bd1aa93 chore: model.query change to db.session.query (#19551)
Co-authored-by: QuantumGhost <obelisk.reg+git@gmail.com>
2025-05-13 09:13:12 +08:00
非法操作
14cd71ed0a chore: all model.query replace to db.session.query (#19521) 2025-05-12 15:19:41 +08:00
湛露先生
1119790b02 clean rag word_extractor. (#19397)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-05-09 16:39:16 +08:00
Wesley
b62eb61400 fix depth param issue for WaterCrawl (#18839) 2025-04-27 11:04:56 +08:00