You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: docs/integration/langchain/agent-middleware.md
+8-6Lines changed: 8 additions & 6 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -309,11 +309,15 @@ result2 = agent.invoke(
309
309
310
310
Be aware of the following constraints when using `GuardrailsMiddleware` with tool-calling agents.
311
311
312
-
### Output Rails and Tool-Calling Responses
312
+
### Security Considerations for Tool-Calling Agents
313
313
314
-
LLM-based output rails (such as `self_check_output`) evaluate the `content` field of the model's response. Intermediate tool-calling responses often have **empty content** (the actual instructions are in the `tool_calls` field). Depending on the LLM used for the self-check, an empty content field may be flagged as a violation.
314
+
Rails evaluate the `content` field of messages only. This has two implications for tool-calling agents:
315
315
316
-
To work around this, disable output rails and rely on input rails for tool-calling agents:
316
+
**Tool call arguments are not inspected.** When the LLM generates a tool call, the arguments (e.g., `send_email(body="SSN: 123-45-6789")`) are in the `tool_calls` field, not `content`. Input and output rails do not see or validate these arguments.
317
+
318
+
**Tool results bypass input rails.** When a tool returns its result as a `ToolMessage`, that message is not subject to input rail validation. Malicious or unexpected tool outputs can influence subsequent model responses without being checked.
319
+
320
+
To mitigate these risks, enable output rails to validate the final LLM response before it reaches the user. This ensures that even if unsafe content enters through tool calls or tool results, the model's response is still checked. However, note that intermediate tool-calling responses often have **empty content** (the instructions are in the `tool_calls` field), and some LLM-based output rails (such as `self_check_output`) may flag empty content as a false positive. If you encounter this, you can disable output rails as a workaround — but be aware this also removes the safety net for tool result content:
Rails evaluate the `content` field of messages, not the `tool_calls` arguments. Content-based rails do not inspect PII or harmful content passed through tool call arguments (e.g., `send_email(body="SSN: 123-45-6789")`).
329
+
For more details, see [Security Considerations](https://docs.nvidia.com/nemo/guardrails/latest/integration/tools-integration.html#security-considerations) in the tools integration guide.
0 commit comments