Commit 7e1cf84

docs(middleware): expand tool-calling security considerations (#1716)
1 parent 8cbe943 commit 7e1cf84

1 file changed

Lines changed: 8 additions & 6 deletions

docs/integration/langchain/agent-middleware.md

@@ -309,11 +309,15 @@ result2 = agent.invoke(
 
 Be aware of the following constraints when using `GuardrailsMiddleware` with tool-calling agents.
 
-### Output Rails and Tool-Calling Responses
+### Security Considerations for Tool-Calling Agents
 
-LLM-based output rails (such as `self_check_output`) evaluate the `content` field of the model's response. Intermediate tool-calling responses often have **empty content** (the actual instructions are in the `tool_calls` field). Depending on the LLM used for the self-check, an empty content field may be flagged as a violation.
+Rails evaluate the `content` field of messages only. This has two implications for tool-calling agents:
 
-To work around this, disable output rails and rely on input rails for tool-calling agents:
+**Tool call arguments are not inspected.** When the LLM generates a tool call, the arguments (e.g., `send_email(body="SSN: 123-45-6789")`) are in the `tool_calls` field, not `content`. Input and output rails do not see or validate these arguments.
+
+**Tool results bypass input rails.** When a tool returns its result as a `ToolMessage`, that message is not subject to input rail validation. Malicious or unexpected tool outputs can influence subsequent model responses without being checked.
+
+To mitigate these risks, enable output rails to validate the final LLM response before it reaches the user. This ensures that even if unsafe content enters through tool calls or tool results, the model's response is still checked. However, note that intermediate tool-calling responses often have **empty content** (the instructions are in the `tool_calls` field), and some LLM-based output rails (such as `self_check_output`) may flag empty content as a false positive. If you encounter this, you can disable output rails as a workaround — but be aware this also removes the safety net for tool result content:
 
 ```python
 guardrails = GuardrailsMiddleware(
@@ -322,9 +326,7 @@ guardrails = GuardrailsMiddleware(
 )
 ```
 
-### Tool Call Arguments Are Not Inspected
-
-Rails evaluate the `content` field of messages, not the `tool_calls` arguments. Content-based rails do not inspect PII or harmful content passed through tool call arguments (e.g., `send_email(body="SSN: 123-45-6789")`).
+For more details, see [Security Considerations](https://docs.nvidia.com/nemo/guardrails/latest/integration/tools-integration.html#security-considerations) in the tools integration guide.
 
 ### MODIFIED Status Replaces Message Content
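The gap described in the added text ("Tool call arguments are not inspected") can be illustrated with a small sketch. It is not part of this commit and does not use the NeMo Guardrails or LangChain APIs. The message is a plain dict that only mirrors the `content` / `tool_calls` shape, and the helper names `content_only_check` and `tool_call_args_check` are hypothetical. A check that reads `content` alone misses the SSN from the commit's own example, while a separate scan over tool call arguments finds it.

```python
import re

# Simple US SSN pattern, used purely for illustration.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def content_only_check(message: dict) -> bool:
    """Hypothetical stand-in for a content-based rail: reads `content` only."""
    return bool(SSN_PATTERN.search(message.get("content") or ""))


def tool_call_args_check(message: dict) -> bool:
    """Hypothetical extra check that scans string arguments of each tool call."""
    for call in message.get("tool_calls") or []:
        for value in (call.get("args") or {}).values():
            if isinstance(value, str) and SSN_PATTERN.search(value):
                return True
    return False


# Assumed message shape: empty content, instructions carried in `tool_calls`.
ai_message = {
    "content": "",
    "tool_calls": [
        {"name": "send_email", "args": {"body": "SSN: 123-45-6789"}},
    ],
}

print(content_only_check(ai_message))    # content is empty, so nothing is flagged
print(tool_call_args_check(ai_message))  # the argument scan sees the SSN
```

A check of this kind would have to run outside the rails, for example inside the tool itself or in a wrapper around it, since the commit states that rails do not see these arguments.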
