docs: HITL Service API polish and 1.14 release follow-ups (#760)
* feat: add env var ignore-list consumed by the verifier
* docs: add 1.14 env vars and correct drifted defaults
* fix: correct knowledge pipeline datasource-plugins response in api references
* chore: record 1.14 env var traces in deep-dive
* docs: update HITL Service API documentation
* docs: clarify /api-reference/ link convention in formatting guide
* docs: update marketplace publish flow
* fix: drop duplicate workflow_run_id from HITL event data schemas
The three HITL event schemas (StreamEventHumanInputRequired,
StreamEventHumanInputFormFilled, StreamEventHumanInputFormTimeout)
declared `workflow_run_id` inside `data.properties`, but
`HumanInputRequiredResponse` in `api/core/app/entities/task_entities.py`
defines `workflow_run_id` only as a top-level field on the response
(alongside `event` and `task_id`); its inner `Data` class doesn't carry
one. The OpenAPI spec already provides top-level `workflow_run_id` via
the `$ref: StreamEventBase` in the `allOf` composition, so the inline
duplicate was a phantom field that doesn't exist in the actual payload.
Remove the inline `workflow_run_id` from `data.properties` in all three
HITL event schemas across all six spec files. This relies on
`StreamEventBase` to provide `workflow_run_id` at the top level via
composition, matching how every other event schema in this spec handles
it (e.g., `StreamEventWorkflowStarted`).
Reported by Copilot on PR #756.
* fix: address Copilot's review comments
* feat: skip api-reference paths in internal link checker
The /api-reference/... URL convention (no language prefix, derived
from OpenAPI tag/summary) generates pages that Mintlify auto-builds
rather than from filesystem MDX files, so the existing filesystem
resolution logic flags every such link as broken.
Skip both filesystem-existence and anchor validation for any URL
containing /api-reference/, mirroring the existing
anchor_check_skipped behaviour. The convention is documented in
writing-guides/formatting-guide.md.
Caveat: typos in tag or summary slugs will now pass silently. A
follow-up could parse the OpenAPI specs to validate against the
real tag/summary kebab pairs if hand-written api-reference links
become common enough to warrant it.
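
The skip rule above can be sketched roughly as follows. This is a minimal illustration, not the checker's actual code: the helper name `should_skip_link_validation` and the example paths are hypothetical.

```python
API_REFERENCE_MARKER = "/api-reference/"


def should_skip_link_validation(url: str) -> bool:
    """Return True when filesystem and anchor checks should both be skipped.

    Hypothetical helper: any URL containing /api-reference/ is assumed to
    point at a page generated from the OpenAPI spec, so there is no MDX
    file on disk to resolve it against.
    """
    return API_REFERENCE_MARKER in url
```

As the caveat notes, a rule this broad also lets a misspelled slug through, since nothing after the marker is inspected.
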
* fix: correct CORS anchor in ja environments page
Inline link [CORS設定](#cors 設定) used a literal space, but the
heading "CORS 設定" slugifies to "cors-設定" (whitespace replaced
with hyphen). Update the anchor to #cors-設定.
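
A minimal sketch of the slug rule this fix relies on, assuming only the two behaviours visible in the example (lowercasing and whitespace replaced with a hyphen). The function name `heading_to_anchor` is hypothetical, and the real slugger may treat punctuation differently.

```python
import re


def heading_to_anchor(heading: str) -> str:
    """Lowercase the heading and collapse each whitespace run into one hyphen."""
    return re.sub(r"\s+", "-", heading.strip().lower())


print(heading_to_anchor("CORS 設定"))  # prints: cors-設定
```
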
* fix: align datasource-plugins is_published and 200 description with node framing
* fix: clarify chatflow stream workflow events task_id origin
* docs: address HITL Service API reader-test feedback
File: .claude/skills/dify-docs-env-vars/SKILL.md (10 lines changed: 10 additions & 0 deletions)
@@ -137,6 +137,16 @@ The script reports:

Use `.env.example` defaults (what Docker Compose users actually get), not Pydantic code defaults.

### Intentionally ignored variables

Some variables in `.env.example` are deliberately not documented (Cloud-only, experimental, or verifier false positives). The verifier reads these from `ignored-vars.md` (same directory) and filters them out. When you:

- Remove a variable from the docs as Cloud-only → add it under **Cloud-only (SaaS)** in `ignored-vars.md`.
- Skip documenting an experimental or internal flag → add it under **Experimental / internal**.
- Document a supported variable whose `.env.example` entry is commented out → add it under **Verifier false positives**.

Every entry must include a source reference (PR, commit, or audit date).

## Translation

The automated translation pipeline does not cover `en/self-host/configuration/environments.mdx`. After editing that English file, manually update `zh/self-host/configuration/environments.mdx` and `ja/self-host/configuration/environments.mdx` to match.

File: .claude/skills/dify-docs-env-vars/deep-dive.md (202 lines changed: 202 additions & 0 deletions)
@@ -1277,3 +1277,205 @@ Two modes: `basic` (username/password via http_auth) and `aws_managed_iam` (SigV
### OCEANBASE_ENABLE_HYBRID_SEARCH

Similar to Milvus: enables fulltext index creation for BM25 queries alongside vector search. Requires OceanBase >= 4.3.5.1. Collections must be recreated after enabling.

---

## 1.14 Additions (traced 2026-04-22)

### REDIS_KEY_PREFIX

**Default:** `""` (empty)

**What it actually does:** Prepends a string namespace to every Redis key that Dify writes, so multiple Dify deployments can safely share one Redis server. When set to `staging:`, a `get("session_token:abc")` call becomes `GET staging:session_token:abc` on the wire.

The prefix is threaded through `RedisClientWrapper` in `api/extensions/ext_redis.py` via helpers in `api/extensions/redis_names.py` (`serialize_redis_name`, `serialize_redis_name_arg`, `serialize_redis_name_args`, `normalize_redis_key_prefix`). Every wrapper method (`get`, `set`, `setex`, `delete`, `incr`, `expire`, `exists`, `ttl`, `lock`, `hset`, `zadd`, and so on) prefixes its name argument before forwarding. `delete(*names)` and `exists(*names)` prefix every name.

Beyond direct key operations, the prefix is also applied to:

- **Celery Redis transport**: applied as Celery's `global_keyprefix` transport option in `api/extensions/ext_celery.py`, so broker queues and result-backend keys follow the same namespace

`normalize_redis_key_prefix()` strips whitespace; whitespace-only values are treated as empty (no prefixing).
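
The prefixing behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the code in `redis_names.py`: the function names `normalize_prefix`, `prefixed`, and `prefixed_all` are hypothetical, and "strips whitespace" is assumed to mean leading and trailing whitespace.

```python
from typing import List, Optional


def normalize_prefix(raw: Optional[str]) -> str:
    """Strip surrounding whitespace; None or whitespace-only means no prefix."""
    return (raw or "").strip()


def prefixed(prefix: str, name: str) -> str:
    """Prepend the namespace to one key name; an empty prefix changes nothing."""
    return f"{prefix}{name}" if prefix else name


def prefixed_all(prefix: str, *names: str) -> List[str]:
    """Multi-key calls such as delete(*names) prefix every name."""
    return [prefixed(prefix, name) for name in names]


prefix = normalize_prefix(" staging: ")
print(prefixed(prefix, "session_token:abc"))  # prints: staging:session_token:abc
```

Keeping normalization separate from prefixing means a whitespace-only setting falls back to unprefixed keys instead of producing keys that start with spaces.
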

**If left empty:** Keys are written unprefixed (backward-compatible with existing deployments). This is the correct choice when Dify has Redis to itself.

**If set:** Every key, channel, stream, and Celery artifact is namespaced. Existing data written without the prefix becomes invisible to the new client, so plan a wipe or dual-run when switching.

**What they actually do:** `_get_retry_policy()` in `api/extensions/ext_redis.py` constructs a shared `redis.retry.Retry` object with `ExponentialWithJitterBackoff(base=BACKOFF_BASE, cap=BACKOFF_CAP)` and `retries=RETRIES`. The policy is attached to every standalone, Sentinel, and Cluster client (via `_get_connection_health_params()` / `_get_cluster_connection_health_params()`), and also to pub/sub clients built by `_create_pubsub_client()`.

When `redis-py` encounters transient failures (`ConnectionError`, `TimeoutError`, `socket.timeout`), it calls `Retry.call_with_retry()`, which sleeps `min(base * (2^attempt) + jitter, cap)` seconds between attempts, up to `retries` attempts. With the defaults, worst-case wait before surfacing the error is roughly `1s + 2s + 4s = 7s` plus jitter, capped at 10s per sleep.
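
The arithmetic in the paragraph above can be checked with a small sketch. It follows the formula exactly as stated there and does not call `redis-py`, so the library's real jitter computation may differ. The helper name `backoff_delays` is hypothetical, and the values base=1s, cap=10s, and 3 retries are inferred from the worked example, not read from code.

```python
import random
from typing import Callable, List


def backoff_delays(
    base: float,
    cap: float,
    retries: int,
    jitter: Callable[[], float] = random.random,
) -> List[float]:
    """Sleep before each retry: min(base * 2**attempt + jitter, cap)."""
    return [min(base * (2 ** attempt) + jitter(), cap) for attempt in range(retries)]


# Jitter-free schedule, to reproduce the "1s + 2s + 4s = 7s" figure.
no_jitter = backoff_delays(base=1.0, cap=10.0, retries=3, jitter=lambda: 0.0)
print(no_jitter, sum(no_jitter))  # prints: [1.0, 2.0, 4.0] 7.0
```

Raising `retries` past 4 in this sketch shows the cap taking over: the uncapped fifth sleep would be 16s, but it is clamped to 10s.
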

**If left at default:** Most transient hiccups (master failover, brief DNS blip, half-open socket) are invisible to callers. Worst-case latency cost on a bad command is bounded.

**If `REDIS_RETRY_RETRIES=0`:** No retry; every transient error propagates immediately. Matches pre-1.14 behavior.

**If backoff values are raised:** Longer latency tails but more patience for slow failovers. **If lowered:** Faster failure but less resilience.

**What they actually do:** `socket_timeout` bounds how long each Redis command waits on a read/write on an already-established connection; `socket_connect_timeout` bounds how long the TCP handshake phase can take. Both are part of `RedisBaseParamsDict` in `_get_base_redis_params()` and flow into every client type: `redis.ConnectionPool`, `Sentinel.master_for()`, `RedisCluster`, and pub/sub clients all receive them.

Before PR #34566, the main backend clients built through `ConnectionPool(**redis_params)` / `sentinel.master_for(...)` / `RedisCluster.from_url(...)` used `redis-py`'s internal default (no socket timeout on standalone), which meant commands could block indefinitely on a silently-dropped connection.

**If left at default:** Stuck connections surface as timeouts after 5 seconds. Appropriate for most local or same-region deployments.

**If increased:** Necessary for cloud or WAN deployments where p99 network latency exceeds 5s under load. The existing `REDIS_SENTINEL_SOCKET_TIMEOUT` doc already notes this pattern for Sentinel; the same reasoning applies to the main client.

- Used in: `_get_connection_health_params()`, `_get_cluster_connection_health_params()` in `api/extensions/ext_redis.py`

**Source:** PR #34566, merged 2026-04-09.

---

### REDIS_HEALTH_CHECK_INTERVAL

**Default:** `30` (seconds)

**What it actually does:** `redis-py`'s `Connection` class sends a PING on a connection if it has been idle longer than this many seconds before reusing it. Catches half-open sockets that the kernel hasn't noticed yet (e.g., after a NAT rebind or a silent LB timeout). Set to `0` to disable.

**Important asymmetry:** The parameter is passed only in `_get_connection_health_params()` (standalone + Sentinel). `_get_cluster_connection_health_params()` explicitly drops it; see the inline comment in `ext_redis.py`:

> "RedisCluster does not support `health_check_interval` as a constructor keyword (it is silently stripped by `cleanup_kwargs`), so it is excluded here. Only `retry`, `socket_timeout`, and `socket_connect_timeout` are passed through."

This is a known `redis-py` quirk. The doc row explicitly flags it so cluster users don't waste time tuning a no-op.
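
The asymmetry can be pictured with a short sketch. The two functions below are hypothetical stand-ins modelled on the description above, not the bodies of the real helpers in `ext_redis.py`, and the numeric values are illustrative.

```python
from typing import Any, Dict


def standalone_health_params(
    retry: Any,
    socket_timeout: float,
    socket_connect_timeout: float,
    health_check_interval: int,
) -> Dict[str, Any]:
    """Keyword arguments for standalone and Sentinel clients."""
    return {
        "retry": retry,
        "socket_timeout": socket_timeout,
        "socket_connect_timeout": socket_connect_timeout,
        "health_check_interval": health_check_interval,
    }


def cluster_health_params(
    retry: Any,
    socket_timeout: float,
    socket_connect_timeout: float,
    health_check_interval: int,
) -> Dict[str, Any]:
    """Same settings for a cluster client, minus health_check_interval."""
    params = standalone_health_params(
        retry, socket_timeout, socket_connect_timeout, health_check_interval
    )
    params.pop("health_check_interval")  # would be a no-op on RedisCluster
    return params


# Illustrative values only; retry=None stands in for a real Retry object.
standalone = standalone_health_params(None, 5.0, 5.0, 30)
cluster = cluster_health_params(None, 5.0, 5.0, 30)
print(sorted(set(standalone) - set(cluster)))  # prints: ['health_check_interval']
```
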

**If left at default:** Connections that have been idle for more than 30s are PINGed before reuse, which prevents stale-connection errors.

**If set to 0:** No health-check PINGs. Saves a tiny bit of traffic; acceptable if load is high enough that every connection is used constantly.

**What they actually do:** Control when the Baidu Vector DB backend rebuilds its ANN index automatically. The Baidu SDK treats them as the "absolute row increase" and "relative row increase" thresholds; when either is exceeded, the index is rebuilt in the background.

Defined in `api/configs/middleware/vdb/baidu_vector_config.py` on `BaiduVectorDBConfig`; passed to the Baidu backend factory when initializing a collection. Only meaningful when `VECTOR_STORE=baidu`.

**If left at default:** Index rebuilds are triggered by 500 new rows OR a 5% increase, whichever happens first. Keeps search quality high for typical workloads.

**If raised:** Fewer rebuilds, lower CPU churn, but search quality degrades between rebuilds.

**If lowered:** More frequent rebuilds, higher background load, freshest index.
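
The "whichever happens first" rule can be sketched as below. The actual decision is made by the Baidu backend, not by Dify, so this only illustrates the two-threshold logic. The function name is hypothetical, and the inclusive comparison (`>=`) is an assumption.

```python
def should_rebuild_index(
    rows_at_last_build: int,
    current_rows: int,
    row_threshold: int = 500,
    ratio_threshold: float = 0.05,
) -> bool:
    """Return True when either the absolute or the relative growth threshold is met."""
    added = current_rows - rows_at_last_build
    if added >= row_threshold:
        return True  # absolute row increase reached
    if rows_at_last_build > 0 and added / rows_at_last_build >= ratio_threshold:
        return True  # relative row increase reached
    return False


# Small collection: 60 new rows on 1,000 is 6%, so the ratio trips first.
print(should_rebuild_index(1_000, 1_060))      # prints: True
# Large collection: 400 new rows on 100,000 is 0.4%, so neither trips.
print(should_rebuild_index(100_000, 100_400))  # prints: False
```

The two thresholds complement each other: the ratio governs small collections, while the absolute count governs large ones, where 5% growth would be thousands of rows.
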
**Code inconsistency to flag:** The Pydantic `Field` description in `baidu_vector_config.py` reads "default is 3600 seconds" but the actual value is `default=300`. `docker/.env.example` also uses 300. Document 300 (what users actually get); the description string is stale and should be flagged upstream.

**What it actually does:** Maximum wall-clock time the client waits for a Baidu VDB index rebuild before raising a timeout. 300 seconds (5 minutes) is adequate for small-to-medium collections; large collections (millions of rows) may need more.

**If it times out:** The client-side call fails, but the rebuild may still complete on the server. Re-querying after the rebuild succeeds typically resolves the error.

`celery inspect ping` is a synchronous command that round-trips through the broker to ask every worker "are you alive?" and waits for replies. Under heavy load it can itself take significant time and contribute to broker contention, which is why the health check is **disabled by default**.

**If disabled (default):** Compose marks the worker container healthy based on process liveness only (PID 1 running). This is lighter, but it won't detect a hung worker that's still alive at the process level.

**If enabled (`COMPOSE_WORKER_HEALTHCHECK_DISABLED=false`):** Compose runs `celery inspect ping` every `INTERVAL` with a `TIMEOUT` per attempt. Three consecutive failures mark the container unhealthy, which triggers Compose restart policies or orchestration reactions. Useful when operators have observed hung-worker incidents and the added broker traffic is acceptable.

- No Pydantic config; these are Compose-only, not read by Python code.

---

### ALLOW_INLINE_STYLES

**Default:** `false`

**What it actually does:** Frontend-only security toggle. `web/docker/entrypoint.sh` maps the operator-facing `ALLOW_INLINE_STYLES` (set in `docker/.env`) to `NEXT_PUBLIC_ALLOW_INLINE_STYLES` for the Next.js runtime:

The frontend's Markdown sanitizer reads `NEXT_PUBLIC_ALLOW_INLINE_STYLES` to decide whether to allow inline `style="..."` attributes and `<style>` tags in user-generated Markdown (chat responses, knowledge base content, and so on). Disabled by default because inline styles can be abused for phishing (e.g., hiding a malicious link behind a styled block that overlays trusted UI).

**If disabled (default):** Markdown rendering strips inline styles. User-authored content still renders, just without custom styling.

**If enabled:** Inline styles pass through. Enable only if your content pipeline is trusted and you need rich visual control from Markdown authors.

**Key code locations:**

- Mapping: `web/docker/entrypoint.sh`
- Default: `docker/.env.example` (root) and `web/.env.example` (source-code deployments)

---

### CELERY_WORKER_AMOUNT: default correction

The existing entry near line 937 describes behavior correctly, but the stated default ("1") no longer matches `docker/.env.example`, which sets `CELERY_WORKER_AMOUNT=4` (consumed by `docker-compose.yaml` via `${CELERY_WORKER_AMOUNT:-4}`). Docs updated to `4`.

**Why the change matters:** 4 is a better out-of-the-box baseline for a machine with a few cores; users with lighter workloads can still set it lower, and `CELERY_AUTO_SCALE=true` overrides it entirely.
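
For readers unfamiliar with the `${NAME:-default}` form, the sketch below emulates its semantics in Python: the default applies when the variable is unset or empty. The helper name `compose_default` is hypothetical and exists only for illustration.

```python
from typing import Mapping


def compose_default(env: Mapping[str, str], name: str, default: str) -> str:
    """Emulate ${NAME:-default}: fall back when NAME is unset or empty."""
    value = env.get(name, "")
    return value if value else default


print(compose_default({}, "CELERY_WORKER_AMOUNT", "4"))                             # prints: 4
print(compose_default({"CELERY_WORKER_AMOUNT": ""}, "CELERY_WORKER_AMOUNT", "4"))   # prints: 4
print(compose_default({"CELERY_WORKER_AMOUNT": "2"}, "CELERY_WORKER_AMOUNT", "4"))  # prints: 2
```

This is why a user who deletes the line from `.env`, or leaves it blank, still gets 4 workers.
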

---

### POSTGRES_MAX_CONNECTIONS: default correction

Covered in the "PostgreSQL / MySQL Performance Tuning Variables" section. `docker/.env.example` bumped the default from `100` to `200` upstream (`docker-compose.yaml` passes it as `-c max_connections=${POSTGRES_MAX_CONNECTIONS:-200}` to the Postgres container). The higher default is safer for Dify's multi-worker + Celery + async-task traffic shape; operators can still lower it on constrained hosts.

Variables listed here appear in Dify's `docker/.env.example` or `api/configs/`, but are deliberately **not** documented in `en/self-host/configuration/environments.mdx`. The verifier script reads this file and skips matching variables when comparing docs against `.env.example`.

## When to update this list

Add an entry when you:

- Remove a variable from the docs because it only applies to Dify Cloud.
- Skip documenting a new variable because it's experimental, internal, or not user-tunable.
- Identify a verifier false positive (e.g., the variable is commented out in `.env.example` but documented because the code supports it).

Remove an entry when the reason no longer holds (e.g., an experimental flag graduates to a stable, user-facing feature).

Every entry requires: variable name, category, reason, and a source reference (commit, PR, or issue). This enforces traceability so later maintainers can audit the decision.

## Format

The verifier parses the tables below. A line is treated as an ignore entry when it matches `| \`VARIABLE_NAME\` | ...`. Additional columns are informational.
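
A minimal sketch of how such rows could be recognised, assuming the first cell is a backticked upper-case name. This is not the verifier's actual parser: the regular expression, the function name `parse_ignored_vars`, and the tolerance for missing spaces around the pipes are all assumptions.

```python
import re
from typing import Set

# First table cell must be a backticked name; header and separator rows
# ("| Variable |", "|---|") contain no backticks and therefore never match.
IGNORE_ROW = re.compile(r"^\|\s*`([A-Z][A-Z0-9_]*)`\s*\|")


def parse_ignored_vars(markdown: str) -> Set[str]:
    """Collect variable names from table rows shaped like | `NAME` | ... |."""
    names = set()
    for line in markdown.splitlines():
        match = IGNORE_ROW.match(line.strip())
        if match:
            names.add(match.group(1))
    return names


sample = """\
| Variable | Reason | Source |
|---|---|---|
| `ENABLE_WEBSITE_FIRECRAWL` | Cloud UI feature flag for Firecrawl. | PR #721, commit 9248032 |
|`ALIYUN_CLOUDBOX_ID`| Commented-out entry. | 1.14 sync audit, 2026-04-22 |
"""
print(sorted(parse_ignored_vars(sample)))
# prints: ['ALIYUN_CLOUDBOX_ID', 'ENABLE_WEBSITE_FIRECRAWL']
```

Because only the first cell is inspected, the Reason and Source columns can hold free-form text without affecting the parse.
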

---

## Cloud-only (SaaS)

These variables are meaningful only on the hosted Dify Cloud deployment; self-hosted users cannot use or benefit from them. Removing these from the self-host docs prevents confusion.

| Variable | Reason | Source |
|---|---|---|
| `ENABLE_WEBSITE_JINAREADER` | Cloud UI feature flag for Jina Reader crawler. | PR #721, commit 9248032 |
| `ENABLE_WEBSITE_FIRECRAWL` | Cloud UI feature flag for Firecrawl. | PR #721, commit 9248032 |
| `ENABLE_WEBSITE_WATERCRAWL` | Cloud UI feature flag for WaterCrawl. | PR #721, commit 9248032 |

Feature flags for unfinished or staff-only features. Not yet meant for self-hosted tuning.

| Variable | Reason | Source |
|---|---|---|
| `EXPERIMENTAL_ENABLE_VINEXT` | Switches the web container to an experimental Vite-based server (`web/docker/entrypoint.sh`). Not a supported user-facing knob. | 1.14 sync audit, 2026-04-22 |

## Verifier false positives

The variable is documented in `environments.mdx` and supported by the backend, but the verifier reports it as missing from `.env.example` because the example entry is commented out.

| Variable | Reason | Source |
|---|---|---|
| `ALIYUN_CLOUDBOX_ID` | Commented-out `#ALIYUN_CLOUDBOX_ID=your-cloudbox-id` in `docker/.env.example`; backend field exists in `api/configs/middleware/storage/aliyun_oss_storage_config.py`. | 1.14 sync audit, 2026-04-22 |