Commit Graph

5271 Commits

Author SHA1 Message Date
L1nSn0w
2bbe74be23 fix: make e-1.12.1 enterprise migrations database-agnostic for MySQL/TiDB (#32269)
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-12 15:57:38 +08:00
GareArc
76471821d7 Merge branch 'release/e-1.12.1' into deploy/enterprise 2026-02-11 21:43:42 -08:00
NFish
111c76b71f Merge remote-tracking branch 'origin/hotfix/1.12.1-fix.6' into release/e-1.12.1 2026-02-12 13:26:12 +08:00
GareArc
25c457e2ed Merge branch '1.12.1-otel-ee' into deploy/enterprise 2026-02-10 20:12:16 -08:00
GareArc
262b7d4d08 docs(enterprise): add telemetry data dictionary for OTEL signals
- Comprehensive reference for all enterprise telemetry signals
- Documents 3 span types, 10 counters, 6 histograms, 13 log events
- Includes trace correlation model with ASCII diagrams
- Configuration reference for all 8 ENTERPRISE_* variables
- Per-emission-site label tables for metrics
- Full JSON schemas for structured log events
- Content gating behavior and token double-counting warnings
2026-02-10 19:51:14 -08:00
wangxiaolei
793d22754e fix: fix get_message_event_type return wrong message type (#32019)
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2026-02-11 11:00:40 +08:00
GareArc
efeae4c46f Merge branch '1.12.1-otel-ee' into deploy/enterprise 2026-02-10 00:31:34 -08:00
GareArc
b5dbabf5d0 feat(telemetry): add missing ID fields for name attributes
- Add dify.credential.id to node execution events
- Add dify.event.id to all telemetry events (APP_CREATED, APP_UPDATED, APP_DELETED, FEEDBACK_CREATED)

This ensures all .name fields have corresponding .id fields for reliable aggregation and deduplication.
2026-02-10 00:09:41 -08:00
GareArc
d207ca3f1e Merge branch 'deploy/enterprise' of https://github.com/langgenius/dify into deploy/enterprise 2026-02-09 01:57:13 -08:00
GareArc
aa34ec0d25 test(enterprise-telemetry): add unit tests for OTEL bearer auth and insecure flag 2026-02-09 01:44:21 -08:00
GareArc
ffa8aedc48 feat(enterprise-telemetry): wire bearer token auth and configurable insecure flag into OTEL exporter 2026-02-09 01:44:21 -08:00
GareArc
f78b0f1f36 feat(enterprise-telemetry): add ENTERPRISE_OTLP_API_KEY config field 2026-02-09 01:44:21 -08:00
GareArc
f85275e5f9 test(enterprise-telemetry): add unit tests for OTEL bearer auth and insecure flag 2026-02-09 01:35:17 -08:00
GareArc
f1b5863bb5 feat(enterprise-telemetry): wire bearer token auth and configurable insecure flag into OTEL exporter 2026-02-09 01:29:40 -08:00
GareArc
f2d07f3ec5 feat(enterprise-telemetry): add ENTERPRISE_OTLP_API_KEY config field 2026-02-09 01:26:26 -08:00
wangxiaolei
b62965034e refactor: document_indexing_sync_task split db session (#32129)
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2026-02-09 17:16:17 +08:00
wangxiaolei
284c5f40f1 refactor: document_indexing_update_task split database session (#32105)
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2026-02-09 15:57:42 +08:00
wangxiaolei
55de893984 refactor: partition Celery task sessions into smaller, discrete execu… (#32085)
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2026-02-09 15:57:42 +08:00
QuantumGhost
b035b091fa perf: use batch delete method instead of single delete (#32036)
Co-authored-by: fatelei <fatelei@gmail.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: FFXN <lizy@dify.ai>
2026-02-09 15:57:42 +08:00
wangxiaolei
9898df5ed5 fix: fix tool type is miss (#32042) 2026-02-09 15:57:42 +08:00
wangxiaolei
075e90a253 fix: fix agent node tool type is not right (#32008)
Infer real tool type via querying relevant database tables.

The root cause for incorrect `type` field is still not clear.
2026-02-09 15:57:42 +08:00
QuantumGhost
9742185e6b perf(api): Optimize the response time of AppListApi endpoint (#31999) 2026-02-09 15:57:42 +08:00
wangxiaolei
51946a734a fix: fix miss use db.session (#31971) 2026-02-09 15:57:42 +08:00
NFish
08b8eff933 Merge remote-tracking branch 'origin/hotfix/1.12.1-fix.4' into release/e-1.12.1 2026-02-09 15:54:32 +08:00
wangxiaolei
125f7e3ab4 refactor: document_indexing_update_task split database session (#32105)
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2026-02-09 10:51:45 +08:00
wangxiaolei
400ed2fd72 refactor: partition Celery task sessions into smaller, discrete execu… (#32085)
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2026-02-08 21:05:03 +08:00
GareArc
1b3a21e6f8 feat(telemetry): unify token metric label structure with Pydantic enforcement
- Add TokenMetricLabels BaseModel to enforce consistent label structure
- All dify.token.* metrics now use identical 6-label structure:
  * tenant_id, app_id, operation_type, model_provider, model_name, node_type
- Pydantic validation ensures runtime enforcement (extra='forbid', frozen=True)
- Enables filtering by operation_type to avoid double-counting:
  * workflow: aggregated workflow-level tokens
  * node_execution: individual node-level tokens
  * message: direct message tokens
  * rule_generate/code_generate: prompt generation tokens

Previously, inconsistent label cardinality made aggregation impossible:
- WORKFLOW: 3 labels
- NODE_EXECUTION: 6 labels
- MESSAGE: 5 labels
- PROMPT_GENERATION: 5 labels

Now all use the same 6-label structure for consistent querying.
2026-02-06 03:10:20 -08:00
GareArc
944eb28486 feat(telemetry): unify token metric label structure with Pydantic enforcement
- Add TokenMetricLabels BaseModel to enforce consistent label structure
- All dify.token.* metrics now use identical 6-label structure:
  * tenant_id, app_id, operation_type, model_provider, model_name, node_type
- Pydantic validation ensures runtime enforcement (extra='forbid', frozen=True)
- Enables filtering by operation_type to avoid double-counting:
  * workflow: aggregated workflow-level tokens
  * node_execution: individual node-level tokens
  * message: direct message tokens
  * rule_generate/code_generate: prompt generation tokens

Previously, inconsistent label cardinality made aggregation impossible:
- WORKFLOW: 3 labels
- NODE_EXECUTION: 6 labels
- MESSAGE: 5 labels
- PROMPT_GENERATION: 5 labels

Now all use the same 6-label structure for consistent querying.
2026-02-06 03:06:06 -08:00
GareArc
4e624af5e0 Merge branch '1.12.1-otel-ee' into deploy/enterprise 2026-02-06 02:41:58 -08:00
GareArc
11c74d741a feat: add dedicated app event counters and convert event names to StrEnum
- Add APP_CREATED, APP_UPDATED, APP_DELETED counters to EnterpriseTelemetryCounter
- Create EnterpriseTelemetryEvent StrEnum for type-safe event names
- Update metric_handler to use new app-specific counters with labels (tenant_id, app_id, mode)
- Convert all event_name strings to EnterpriseTelemetryEvent enum values
- Update exporter to create OTEL meters for new app counters (dify.app.created.total, etc.)
- Update tests to verify new counter behavior and enum usage
2026-02-06 02:38:19 -08:00
GareArc
ea9081f22d feat(telemetry): add operation_type labels for token metrics
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-02-06 01:06:07 -08:00
GareArc
4e3112bd7f feat(telemetry): add enterprise OTEL telemetry with gateway, traces, metrics, and logs 2026-02-06 01:02:19 -08:00
QuantumGhost
840a8f3fc2 perf: use batch delete method instead of single delete (#32036)
Co-authored-by: fatelei <fatelei@gmail.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: FFXN <lizy@dify.ai>
2026-02-06 15:13:17 +08:00
GareArc
ac8e96bd9d docs(telemetry): clarify enterprise_telemetry queue is EE-only 2026-02-05 23:10:37 -08:00
GareArc
91a6fe25d1 feat(telemetry): add enterprise OTEL telemetry with gateway, traces, metrics, and logs 2026-02-05 23:10:30 -08:00
GareArc
576eca2113 Merge branch '1.12.1-otel-ee' into deploy/enterprise 2026-02-05 23:07:48 -08:00
GareArc
8ded2d73f0 fix(telemetry): move EE guard to gateway routing level
Prevents CE users from enqueueing EE-only events (all METRIC_LOG cases)
to non-existent enterprise_telemetry Celery queue.

- Add _should_drop_ee_only_event() check in emit() before routing
- Remove redundant check from _emit_trace()
- Single guard at gateway level protects both trace and metric/log paths
2026-02-05 22:58:40 -08:00
GareArc
4a9b74f86b refactor(telemetry): simplify by eliminating TelemetryFacade
**Problem:**
The telemetry system had unnecessary abstraction layers and bad practices
from the last 3 commits introducing the gateway implementation:
- TelemetryFacade class wrapper around emit() function
- String literals instead of SignalType enum
- Dictionary mapping enum → string instead of enum → enum
- Unnecessary ENTERPRISE_TELEMETRY_GATEWAY_ENABLED feature flag
- Duplicate guard checks scattered across files
- Non-thread-safe TelemetryGateway singleton pattern
- Missing guard in ops_trace_task.py causing RuntimeError spam

**Solution:**
1. Deleted TelemetryFacade - replaced with thin emit() function in core/telemetry/__init__.py
2. Added SignalType enum ('trace' | 'metric_log') to enterprise/telemetry/contracts.py
3. Replaced CASE_TO_TRACE_TASK_NAME dict with CASE_TO_TRACE_TASK: dict[TelemetryCase, TraceTaskName]
4. Deleted is_gateway_enabled() and _emit_legacy() - using existing ENTERPRISE_ENABLED + ENTERPRISE_TELEMETRY_ENABLED instead
5. Extracted _should_drop_ee_only_event() helper to eliminate duplicate checks
6. Moved TelemetryGateway singleton to ext_enterprise_telemetry.py:
   - Init once in init_app() for thread-safety
   - Access via get_gateway() function
7. Re-added guard to ops_trace_task.py to prevent RuntimeError when EE=OFF but CE tracing enabled
8. Updated 11 caller files to import 'emit as telemetry_emit' instead of 'TelemetryFacade'

**Result:**
- 322 net lines deleted (533 removed, 211 added)
- All 91 tests pass
- Thread-safe singleton pattern
- Cleaner API surface: from TelemetryFacade.emit() to telemetry_emit()
- Proper enum usage throughout
- No RuntimeError spam in EE=OFF + CE=ON scenario
2026-02-05 22:41:09 -08:00
wangxiaolei
b4a5296fd1 fix: fix tool type is miss (#32042) 2026-02-06 14:38:54 +08:00
Xiyuan Chen
d7c3ae50dc Update api/services/tools/builtin_tools_manage_service.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-02-06 13:37:37 +08:00
yunlu.wen
fb38ad84e1 chore: upgrade deps, see pull #30976 2026-02-06 13:37:33 +08:00
GareArc
849b4b8c40 fix: add TYPE_CHECKING import for Account type annotation 2026-02-06 13:32:20 +08:00
GareArc
990e8feee8 security: fix IDOR and privilege escalation in set_default_provider
- Add tenant_id verification to prevent IDOR attacks
- Add admin check for enterprise tenant-wide default changes
- Preserve non-enterprise behavior (users can set own defaults)
2026-02-06 13:32:18 +08:00
GareArc
53641019b1 fix: remove user_id filter when clearing default provider (enterprise only)
When setting a new default credential in enterprise mode, the code was
only clearing is_default for credentials matching the current user_id.
This caused issues when:
1. Enterprise credential A (synced with system user_id) was default
2. User sets local credential B as default
3. A still had is_default=true (different user_id)
4. Both A and B were considered defaults

The fix removes user_id from the filter only for enterprise deployments,
since enterprise credentials may have different user_id than local ones.
Non-enterprise behavior is unchanged to avoid breaking existing setups.

Fixes EE-1511
2026-02-06 13:31:50 +08:00
GareArc
d1f10ff301 feat: add redis mq for account deletion cleanup 2026-02-06 13:31:50 +08:00
GareArc
4d47339ce6 feat: Add parent trace context propagation for workflow-as-tool hierarchy
Enables distributed tracing for nested workflows across all trace providers
(Langfuse, LangSmith, community providers). When a workflow invokes another
workflow via workflow-as-tool, the child workflow now includes parent context
attributes that allow trace systems to reconstruct the full execution tree.

Changes:
- Add parent_trace_context field to WorkflowTool
- Set parent context in tool node when invoking workflow-as-tool
- Extract and pass parent context through app generator

This is a community enhancement (ungated) that improves distributed tracing
for all users. Parent context includes: trace_id, node_execution_id,
workflow_run_id, and app_id.
2026-02-05 20:19:29 -08:00
GareArc
6e47e163b8 fix(telemetry): use atomic Redis SET NX for idempotency and register Celery queue 2026-02-05 20:15:34 -08:00
GareArc
1663a7ab4c feat(telemetry): add gateway diagnostics and verify integration
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-Claude)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-02-05 20:15:13 -08:00
GareArc
51b0c5c89c feat(telemetry): implement gateway routing and enqueue logic
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-Claude)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-02-05 20:15:13 -08:00
GareArc
752b01ae91 refactor(telemetry): migrate event handlers to gateway-only producers
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-Claude)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-02-05 20:15:12 -08:00