Commit af83fa1

docs(enterprise): clarify Processing Engine in a multi-node cluster (#7185)
* docs(enterprise): clarify Processing Engine in a multi-node cluster

  Verified against InfluxDB 3 Enterprise 3.9.1 and the influxdb3-ref-network-telemetry reference architecture.

  - Reframe "process nodes" as "process-capable nodes" in clustering.md and document that `--plugin-dir` + `--node-spec` — not `--mode` — gate trigger execution.
  - Replace the "Dedicated process-only node" example with a `--mode=process,query` example so plugins can call `influxdb3_local.query()` locally; link the cross-node write-back pattern.
  - Document that every cluster node needs `--plugin-dir` configured because the catalog validates registered triggers cluster-wide.
  - Expand the `--mode=process` bullet in config-options.md and influxdb3-processing-engine.md to describe its actual semantics (no API surface; activates the engine; combine with another mode).
  - Add a "Pin a trigger to specific nodes in a cluster" section to the create trigger reference, including the verified error surfaces: invalid-node-name (HTTP 500) at create time, mode-mismatch failures at execution time, request-trigger 404 from unpinned nodes.
  - Rewrite the "Distributed cluster considerations" section in shared/influxdb3-plugins/_index.md with the verified WAL fan-out, schedule write-back, and request routing semantics.
  - Add a new admin page covering how to start a cluster and troubleshoot common Processing Engine misconfigurations.

  Closes influxdata/DAR#685

* fix(links): correct --plugin-dir and WAL anchor targets

  The `--plugin-dir` option is documented at config-options/#plugin-dir, not at cli/influxdb3/serve/#plugin-dir. The WAL section anchor is #write-ahead-log-wal-persistence, not #write-ahead-log-wal.
1 parent e9498d9 commit af83fa1

6 files changed

Lines changed: 348 additions & 35 deletions

content/influxdb3/enterprise/admin/clustering.md

Lines changed: 31 additions & 13 deletions
@@ -28,7 +28,7 @@ cluster efficiency.
 - [Configure ingest nodes](#configure-ingest-nodes)
 - [Configure query nodes](#configure-query-nodes)
 - [Configure compactor nodes](#configure-compactor-nodes)
-- [Configure process nodes](#configure-process-nodes)
+- [Configure process-capable nodes](#configure-process-capable-nodes)
 - [Multi-mode configurations](#multi-mode-configurations)
 - [Cluster architecture examples](#cluster-architecture-examples)
 - [Scale your cluster](#scale-your-cluster)
@@ -45,8 +45,8 @@ In an {{% product-name %}} cluster, you can dedicate nodes to specific tasks:
 - **Ingest nodes**: Optimized for high-throughput data ingestion
 - **Query nodes**: Maximized for complex analytical queries
 - **Compactor nodes**: Dedicated to data compaction and optimization
-- **Process nodes**: Focused on data processing and transformations
-- **All-in-one nodes**: Balanced for mixed workloads
+- **Process-capable nodes**: Any node with `--plugin-dir` configured can execute Processing Engine plugins. Use [`--node-spec`](/influxdb3/enterprise/reference/cli/influxdb3/create/trigger/#options) when creating a trigger to pin its execution to specific nodes.
+- **All-in-one nodes**: Balanced for mixed workloads (single-node deployments only)
 
 ## Configure node modes
 
@@ -69,7 +69,7 @@ Available modes:
 - `ingest`: Data ingestion and line protocol parsing
 - `query`: Query execution and data retrieval
 - `compact`: Background compaction and optimization
-- `process`: Data processing and transformations
+- `process`: Activates the Processing Engine. `process` has no API surface of its own — it activates the Python virtual machine that runs trigger plugins. Setting [`--plugin-dir`](/influxdb3/enterprise/reference/config-options/#plugin-dir) implies `process` mode, so you rarely need to set `process` explicitly. In a multi-node cluster, combine `process` with another mode (typically `query`, so plugins can call `influxdb3_local.query()` against the local engine) — see [Configure process-capable nodes](#configure-process-capable-nodes).
 
 > [!Warning]
 > #### Don't use all mode in a multi-node cluster
@@ -87,11 +87,11 @@ Available modes:
 Every node has two thread pools that must be properly configured:
 
 1. **IO threads**: Parse line protocol, handle HTTP requests
-2. **DataFusion threads**: Execute queries, create data snapshots (convert [WAL data](/influxdb3/enterprise/reference/internals/durability/#write-ahead-log-wal) to Parquet files), perform compaction
+2. **DataFusion threads**: Execute queries, create data snapshots (convert [WAL data](/influxdb3/enterprise/reference/internals/durability/#write-ahead-log-wal-persistence) to Parquet files), perform compaction
 
 > [!Note]
 > Even specialized nodes need both thread types. Ingest nodes use DataFusion threads
-> for creating data snapshots that convert [WAL data](/influxdb3/enterprise/reference/internals/durability/#write-ahead-log-wal) to Parquet files, and query nodes use IO threads for handling requests.
+> for creating data snapshots that convert [WAL data](/influxdb3/enterprise/reference/internals/durability/#write-ahead-log-wal-persistence) to Parquet files, and query nodes use IO threads for handling requests.
 
 ## Configure ingest nodes
 
@@ -244,11 +244,20 @@ You can adjust compaction strategies to balance performance and resource usage:
   --compaction-cleanup-wait=10m
 ```
 
-## Configure process nodes
+## Configure process-capable nodes
 
-Process nodes handle data transformations and processing plugins.
-Setting `--plugin-dir` automatically adds `process` mode to any node, so you don't need to explicitly set `--mode=process`.
-If you do set `--mode=process`, you must also set `--plugin-dir`.
+Any node with [`--plugin-dir`](/influxdb3/enterprise/reference/config-options/#plugin-dir) configured can execute Processing Engine plugins.
+Setting `--plugin-dir` implicitly adds `process` mode regardless of the node's other modes; explicit `--mode=process` requires `--plugin-dir` to be set.
+
+> [!Important]
+> #### Configure `--plugin-dir` on every cluster node
+>
+> The Enterprise catalog registers triggers cluster-wide.
+> Every node validates the registered triggers at startup, even nodes that don't execute them — for example, ingest-only and compact-only nodes.
+> If a plugin file referenced by a registered trigger is missing on a node, the engine panics on startup.
+>
+> Configure `--plugin-dir` on every node and make the same plugin files available to each one (for example, by mounting a shared directory in your container or pod spec).
+> Use [`--node-spec`](/influxdb3/enterprise/reference/cli/influxdb3/create/trigger/#options) on each trigger to control which nodes actually execute it.
 
 ### Enable the Processing Engine on any node
 
@@ -263,9 +272,10 @@ influxdb3 \
   --cluster-id=prod-cluster
 ```
 
-### Dedicated process-only node (16 cores)
+### Process + query node (16 cores)
 
-To create a node that only handles processing (no ingest, query, or compaction), set `--mode=process`:
+This is the recommended pattern for a node that hosts schedule plugins.
+Combining `process` with `query` lets plugins call `influxdb3_local.query()` against the local engine without an extra network hop:
 
 ```bash
 influxdb3 \
@@ -274,11 +284,19 @@ influxdb3 \
   --num-cores=16 \
   --datafusion-num-threads=12 \
   --plugin-dir=/path/to/plugins \
-  --mode=process \
+  --mode=process,query \
   --node-id=processor-01 \
   --cluster-id=prod-cluster
 ```
 
+A node in `process,query` mode doesn't accept writes locally.
+Schedule plugins running on it that need to write results back to the cluster must POST line protocol to an ingest node.
+
+> [!Note]
+> #### Cross-node write-back example
+>
+> The [`influxdb3-ref-network-telemetry`](https://github.com/influxdata/influxdb3-ref-network-telemetry) reference architecture's [`plugins/_writeback.py`](https://github.com/influxdata/influxdb3-ref-network-telemetry/blob/main/plugins/_writeback.py) helper round-robins writes across configured ingest URLs with one fallback hop on connection error.
+
 ## Multi-mode configurations
 
 Some deployments benefit from nodes handling multiple responsibilities.
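The round-robin write-back pattern mentioned in the note above can be sketched in a few lines of plugin-side Python. This is a hypothetical illustration, not the actual `plugins/_writeback.py` helper: the `RoundRobinWriter` class and the injected `post` callable are invented names, and the transport is abstracted so the sketch runs without a cluster.

```python
import itertools

class RoundRobinWriter:
    """Round-robin line-protocol writes across ingest URLs,
    with one fallback hop to the next URL on connection error.

    Hypothetical sketch modeled on the behavior described above;
    not the reference implementation.
    """

    def __init__(self, ingest_urls, post):
        # post(url, payload) performs the HTTP write; injected so the
        # helper stays transport-agnostic and testable.
        self._urls = list(ingest_urls)
        self._cycle = itertools.cycle(range(len(self._urls)))
        self._post = post

    def write(self, payload):
        first = next(self._cycle)
        fallback = (first + 1) % len(self._urls)
        for attempt, idx in enumerate((first, fallback)):
            try:
                return self._post(self._urls[idx], payload)
            except ConnectionError:
                # One fallback hop only; re-raise after the second failure.
                if attempt == 1:
                    raise
```

A schedule plugin would call `write()` with the line protocol it produces; each call starts from the next ingest URL in rotation.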
Lines changed: 206 additions & 0 deletions
@@ -0,0 +1,206 @@
+---
+title: Run the Processing Engine in a cluster
+description: >
+  Configure, start, and troubleshoot Processing Engine plugins in a multi-node
+  InfluxDB 3 Enterprise cluster — including how triggers fan out across nodes,
+  how to pin triggers, and how to recognize common misconfiguration errors.
+menu:
+  influxdb3_enterprise:
+    name: Run the Processing Engine in a cluster
+    parent: Administer InfluxDB
+weight: 102
+related:
+  - /influxdb3/enterprise/admin/clustering/
+  - /influxdb3/enterprise/plugins/
+  - /influxdb3/enterprise/reference/processing-engine/
+  - /influxdb3/enterprise/reference/cli/influxdb3/create/trigger/
+  - /influxdb3/enterprise/reference/cli/influxdb3/serve/
+influxdb3/enterprise/tags: [processing engine, plugins, clustering, triggers, troubleshooting]
+---
+
+This guide covers how the Processing Engine behaves in a multi-node {{% product-name %}} cluster and how to troubleshoot common misconfigurations.
+
+For single-node deployments, defaults work as documented in [Set up the Processing Engine](/influxdb3/enterprise/plugins/#set-up-the-processing-engine).
+The cluster-specific behavior described here applies when you run more than one `influxdb3 serve` process against a shared catalog and object store.
+
+- [How trigger execution works in a cluster](#how-trigger-execution-works-in-a-cluster)
+- [Start the cluster](#start-the-cluster)
+- [Worked example: 5-node reference architecture](#worked-example-5-node-reference-architecture)
+- [Troubleshoot misconfigurations](#troubleshoot-misconfigurations)
+
+## How trigger execution works in a cluster
+
+Three independent factors determine whether a Processing Engine trigger runs on a given node:
+
+1. The node has [`--plugin-dir`](/influxdb3/enterprise/reference/config-options/#plugin-dir) configured.
+2. The trigger's [`--node-spec`](/influxdb3/enterprise/reference/cli/influxdb3/create/trigger/#options) includes the node — by default, `all` (every node with `--plugin-dir`).
+3. The trigger's plugin imports modules that are available in that node's per-node Python virtual environment.
+
+`--mode` controls which APIs the node serves (writes, queries, compaction).
+**`--mode` does not gate trigger execution.**
+A trigger pinned to a `compact`-only node still fires on that node — it just fails per tick if the plugin needs APIs the node doesn't serve.
+
+| Trigger type                   | Pin to                                           | Why                                                                                                   |
+|--------------------------------|--------------------------------------------------|-------------------------------------------------------------------------------------------------------|
+| WAL (`table:`)                 | An ingest-capable node                           | Each ingester owns its own WAL; a trigger pinned to one ingester sees only that node's writes.        |
+| Schedule (`every:` or `cron:`) | A node with `process,query` mode                 | The plugin reads via `influxdb3_local.query()` locally; results write back to an ingester via HTTP.   |
+| Request (`request:`)           | A node with `query` mode (the host-exposed port) | The HTTP route exists only on pinned nodes; unpinned nodes return `404 not found`.                    |
+
+## Start the cluster
+
+Each cluster node runs `influxdb3 serve` with a unique `--node-id`, the same `--cluster-id`, and a shared object store and catalog.
+Configure `--plugin-dir` on **every** node, even nodes that don't execute plugins — see [Configure `--plugin-dir` on every cluster node](/influxdb3/enterprise/admin/clustering/#configure-process-capable-nodes).
+
+```bash { placeholders="CLUSTER_ID|DATA_DIR|PLUGINS_DIR|NODE_ID" }
+# Ingest node
+influxdb3 serve \
+  --cluster-id CLUSTER_ID \
+  --node-id NODE_ID \
+  --mode ingest \
+  --object-store file \
+  --data-dir DATA_DIR \
+  --plugin-dir PLUGINS_DIR
+
+# Query node (host-exposed)
+influxdb3 serve \
+  --cluster-id CLUSTER_ID \
+  --node-id NODE_ID \
+  --mode query \
+  --object-store file \
+  --data-dir DATA_DIR \
+  --plugin-dir PLUGINS_DIR
+
+# Compact node (one per cluster)
+influxdb3 serve \
+  --cluster-id CLUSTER_ID \
+  --node-id NODE_ID \
+  --mode compact \
+  --object-store file \
+  --data-dir DATA_DIR \
+  --plugin-dir PLUGINS_DIR
+
+# Process,query node (hosts schedule plugins)
+influxdb3 serve \
+  --cluster-id CLUSTER_ID \
+  --node-id NODE_ID \
+  --mode process,query \
+  --object-store file \
+  --data-dir DATA_DIR \
+  --plugin-dir PLUGINS_DIR
+```
+
+After all nodes are up, register triggers from any node and pin them with `--node-spec`:
+
+```bash { placeholders="AUTH_TOKEN|DATABASE_NAME|NODE_ID" }
+# Schedule trigger pinned to the process,query node
+influxdb3 create trigger \
+  --database DATABASE_NAME \
+  --token AUTH_TOKEN \
+  --path schedule_rollup.py \
+  --trigger-spec "every:5s" \
+  --node-spec "nodes:NODE_ID" \
+  hourly_rollup
+```
+
+## Worked example: 5-node reference architecture
+
+The [`influxdata/influxdb3-ref-network-telemetry`](https://github.com/influxdata/influxdb3-ref-network-telemetry) repo provides a complete 5-node {{% product-name %}} cluster you can run locally with `docker compose`:
+
+- 2 ingest nodes (`--mode=ingest`)
+- 1 query node (`--mode=query`, host-exposed on port 8181)
+- 1 compact node (`--mode=compact`)
+- 1 process,query node (`--mode=process,query`, hosts schedule plugins)
+
+The repo demonstrates:
+
+- Pinning schedule triggers to the process node and request triggers to the query node with `--node-spec nodes:<id>`.
+- Cross-node write-back from schedule plugins via HTTP — see [`plugins/_writeback.py`](https://github.com/influxdata/influxdb3-ref-network-telemetry/blob/main/plugins/_writeback.py).
+- Mounting the same plugin directory on every node (including ingest and compact) for catalog validation at startup.
+
+Use this repo as a template when designing your own cluster.
+
+## Troubleshoot misconfigurations
+
+### `invalid node name (<id>)` when creating a trigger
+
+The cluster validates `--node-spec nodes:<id>` against current cluster membership at create time.
+A typo or unknown node ID returns `HTTP 500: invalid node name (<id>)`.
+
+To fix:
+
+1. List current cluster members and their node IDs:
+
+   ```bash { placeholders="AUTH_TOKEN" }
+   influxdb3 show nodes --token AUTH_TOKEN
+   ```
+
+   The `mode` column shows the node's runtime modes — `process` is included automatically on any node that has `--plugin-dir` configured.
+
+2. Reissue `influxdb3 create trigger` with the correct `--node-spec`.
+
+### `HTTP 404 {error: "not found"}` when calling a request trigger
+
+The `/api/v3/engine/<trigger_name>` route exists only on the node(s) the trigger is pinned to.
+There is no internal cross-node routing for request triggers.
+
+To fix:
+
+- Verify the node-spec on the trigger:
+
+  ```bash { placeholders="AUTH_TOKEN|DATABASE_NAME" }
+  influxdb3 query \
+    --database DATABASE_NAME \
+    --token AUTH_TOKEN \
+    "SELECT trigger_name, trigger_specification FROM system.processing_engine_triggers"
+  ```
+
+- Either pin the trigger to the node receiving the HTTP request (typically a `query`-mode node), or route the request to a node where the trigger is already pinned.
+
+### Schedule trigger logs `ModuleNotFoundError` per tick
+
+The trigger fired on its pinned node, but the plugin imports a module that's not in that node's per-node Python virtual environment.
+
+To fix:
+
+- Install the missing package on the pinned node:
+
+  ```bash { placeholders="PACKAGE_NAME" }
+  influxdb3 install package PACKAGE_NAME
+  ```
+
+- Or pin the trigger to a node that has the required module already installed.
+
+### Engine panics on startup with a missing plugin file
+
+The Enterprise catalog registers triggers cluster-wide.
+Every node validates the registered triggers at startup, including nodes that don't execute them.
+If a plugin file referenced by a registered trigger is missing on a node, the engine panics on startup.
+
+To fix:
+
+- Configure `--plugin-dir` on every cluster node and copy or mount the same plugin files to each one.
+- If a plugin was deleted but the trigger still references it, drop the orphaned trigger:
+
+  ```bash { placeholders="AUTH_TOKEN|DATABASE_NAME|TRIGGER_NAME" }
+  influxdb3 delete trigger \
+    --database DATABASE_NAME \
+    --token AUTH_TOKEN \
+    --force TRIGGER_NAME
+  ```
+
+### Plugin operations fail in administrative tools
+
+If an administrative tool reports a generic plugin error against your cluster, check whether any node satisfies the request:
+
+1. Confirm at least one node has `--plugin-dir` configured and runs the plugin's required mode (typically `process,query` for schedule plugins, `query` for request plugins).
+2. Confirm the trigger's `--node-spec` includes a running, healthy node.
+3. Inspect the `system.processing_engine_logs` table on the pinned node for execution errors:
+
+   ```bash { placeholders="AUTH_TOKEN|DATABASE_NAME" }
+   influxdb3 query \
+     --database DATABASE_NAME \
+     --token AUTH_TOKEN \
+     "SELECT event_time, trigger_name, log_level, log_text \
+      FROM system.processing_engine_logs \
+      ORDER BY event_time DESC LIMIT 20"
+   ```
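The three execution factors described in the new page above can be condensed into a small predicate. The sketch below is an illustration, not influxdb3 source code: the dict-based `node` shape and the `trigger_fires_on` name are invented. The key point it encodes is that `--mode` is deliberately absent, because mode does not gate trigger execution.

```python
def trigger_fires_on(node, node_spec="all"):
    """Return True if a trigger with the given --node-spec would fire on node.

    Hypothetical model of the documented rules:
    - the node must have --plugin-dir configured;
    - the node-spec must include the node (default "all");
    - --mode is never consulted.
    """
    if not node.get("plugin_dir"):
        return False  # no Processing Engine without --plugin-dir
    if node_spec == "all":
        return True   # default: every plugin-capable node
    # "nodes:<id>[,<id>...]" pins the trigger to an explicit list
    return node["node_id"] in node_spec.removeprefix("nodes:").split(",")
```

For example, a `compact`-only node with `--plugin-dir` set still satisfies the predicate, matching the doc's warning that such a trigger fires there and only fails per tick if it needs APIs the node doesn't serve.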

content/shared/influxdb3-cli/config-options.md

Lines changed: 1 addition & 1 deletion
@@ -199,7 +199,7 @@ This option supports the following values:
 - `ingest`: Enables only data ingest capabilities
 - `query`: Enables only query capabilities
 - `compact`: Enables only compaction processes
-- `process`: Enables only data processing capabilities
+- `process`: Activates the [Processing Engine](/influxdb3/enterprise/reference/processing-engine/) so the node can execute trigger plugins. `process` has no API surface of its own — it doesn't accept writes or serve queries. Setting [`--plugin-dir`](#plugin-dir) implicitly adds `process` mode regardless of `--mode`. Conversely, `--mode=process` requires `--plugin-dir`. In a multi-node cluster, combine `process` with another mode (typically `query`) so plugins can call `influxdb3_local.query()` locally.
 
 You can specify multiple modes using a comma-delimited list (for example, `ingest,query`).
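The mode-implication rules in the updated `process` bullet can be modeled as a tiny function. This is a sketch with invented names (`effective_modes`), not influxdb3 source: it only encodes the two documented rules, that `--plugin-dir` implicitly adds `process`, and that a bare `--mode=process` without `--plugin-dir` is invalid.

```python
def effective_modes(modes, plugin_dir=None):
    """Model of the documented rules for --mode and --plugin-dir.

    Hypothetical sketch:
    - --plugin-dir implicitly adds `process` to whatever modes are set;
    - --mode=process without --plugin-dir is a configuration error.
    """
    modes = set(modes)
    if "process" in modes and plugin_dir is None:
        raise ValueError("--mode=process requires --plugin-dir")
    if plugin_dir is not None:
        modes.add("process")  # --plugin-dir implies process mode
    return sorted(modes)
```

So a node started with `--mode query --plugin-dir /plugins` effectively runs in `process,query` mode, which is why the docs say you rarely need to set `process` explicitly.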

content/shared/influxdb3-cli/create/trigger.md

Lines changed: 42 additions & 0 deletions
@@ -303,3 +303,45 @@ influxdb3 create trigger \
   --error-behavior disable \
   TRIGGER_NAME
 ```
+
+{{% show-in "enterprise" %}}
+
+### Pin a trigger to specific nodes in a cluster
+
+In a multi-node {{% product-name %}} cluster, the default `--node-spec all` makes every node with [`--plugin-dir`](/influxdb3/enterprise/reference/config-options/#plugin-dir) configured try to execute the trigger.
+For schedule triggers, this causes duplicate execution on every plugin-capable node.
+For request triggers, the route exists only on the node receiving the HTTP request, and other nodes return `404 not found` — there's no internal cross-node routing.
+
+To pin a trigger to specific node(s), pass `--node-spec nodes:<node-id>[,<node-id>...]`:
+
+```bash { placeholders="AUTH_TOKEN|DATABASE_NAME|NODE_ID" }
+# Pin a schedule trigger to a process-capable node
+influxdb3 create trigger \
+  --database DATABASE_NAME \
+  --token AUTH_TOKEN \
+  --path schedule_rollup.py \
+  --trigger-spec "every:5s" \
+  --node-spec "nodes:NODE_ID" \
+  hourly_rollup
+
+# Pin a request trigger to a query-capable node (the node that serves HTTP)
+influxdb3 create trigger \
+  --database DATABASE_NAME \
+  --token AUTH_TOKEN \
+  --path request_top_n.py \
+  --trigger-spec "request:top_n" \
+  --node-spec "nodes:NODE_ID" \
+  top_n
+```
+
+The cluster validates the node IDs in `--node-spec` against current cluster membership at create time.
+A typo or unknown node ID is rejected with `HTTP 500: invalid node name (<id>)`.
+
+The cluster doesn't validate the trigger type against the pinned node's mode at create time.
+Pinning a schedule trigger to a `compact`-only node, or a request trigger to an `ingest`-only node, succeeds — but the trigger fails or returns `404` at execution time.
+Choose the pinned node by what the trigger needs at execution:
+
+- **Schedule trigger** — pin to a node with `process,query` mode if the plugin reads with `influxdb3_local.query()`; otherwise the call HTTP-hops to another query node.
+- **Request trigger** — pin to the node(s) you want to serve external HTTP traffic. The `/api/v3/engine/<trigger_name>` route only exists on pinned nodes; clients hitting any other node receive `404 not found`.
+
+{{% /show-in %}}
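The create-time versus execution-time validation split documented in this new section can be sketched as a small check. The function name `validate_create` and the membership-set shape are invented for illustration; where this sketch raises `ValueError`, the real server surfaces `HTTP 500: invalid node name (<id>)`. Note what the sketch does not check: trigger type against node mode, which the docs say is only caught at execution time.

```python
def validate_create(node_spec, cluster_members):
    """Create-time validation as documented: only cluster membership
    is checked. Trigger type vs. pinned-node mode is NOT validated here.

    Hypothetical sketch; the real server returns HTTP 500 where
    this raises ValueError.
    """
    if node_spec == "all":
        return  # default spec is always valid
    for node_id in node_spec.removeprefix("nodes:").split(","):
        if node_id not in cluster_members:
            raise ValueError(f"invalid node name ({node_id})")
```

Under this model, pinning a schedule trigger to a known `compact`-only node passes validation, consistent with the documented behavior that the misconfiguration only shows up when the trigger fires.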
