Skip to content

fix: remove system role filtering, TrustyAI now supports system messages#232

Merged
openshift-merge-bot[bot] merged 1 commit into
opendatahub-io:mainfrom
liavweiss:fix/remove-system-role-filtering
May 5, 2026
Merged

fix: remove system role filtering, TrustyAI now supports system messages#232
openshift-merge-bot[bot] merged 1 commit into
opendatahub-io:mainfrom
liavweiss:fix/remove-system-role-filtering

Conversation

@liavweiss
Copy link
Copy Markdown
Contributor

@liavweiss liavweiss commented May 4, 2026

Summary

  • Remove system message filtering from nemo-request-guard. Previously, messages with role: "system" were filtered out before sending to NeMo because the TrustyAI /v1/guardrail/checks endpoint rejected them with a hard error (Unsupported message role: 'system').
  • TrustyAI has since merged feat(api): Support input guardrail checks when role: 'system' (#47).
  • All messages (including system) are now forwarded to NeMo for full conversation context evaluation.

Closes #160. #205

Changes

  • request_guard.go: Removed the if role == "system" { continue } filter in extractOpenAIMessages. Updated comments.
  • request_guard_test.go: Updated tests to expect system messages in the forwarded payload.

E2E verification

Verified end-to-end by building the TrustyAI NeMo image locally from the develop branch (Dockerfile.server) which includes PR #47, and deploying it on a Kind cluster with detect sensitive data on input rails:

Test Messages Result
Safe system + user system: "You are a helpful assistant" + user: "Hello" success (both checked)
PII in system msg system: "card 4111-1111-1111-1111" + user: "Hello" blocked (system detected)
System-only with PII system: "Contact John Smith at john@example.com" blocked (system detected)

cc @nirrozenbaum @christinaexyou

Summary by CodeRabbit

  • Bug Fixes
    • System messages are now included when forwarding conversations to NeMo, ensuring complete message history is transmitted.
  • Tests
    • Unit tests updated to expect and validate all message roles (including system) and adjusted cases for single-system vs. system-only conversations.

@openshift-ci openshift-ci Bot requested review from noyitz and yossiovadia May 4, 2026 09:53
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 4, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: b014172d-6e62-4c78-b15c-502da912af4b

📥 Commits

Reviewing files that changed from the base of the PR and between c704557 and 32c73fc.

📒 Files selected for processing (2)
  • pkg/plugins/nemo/request_guard.go
  • pkg/plugins/nemo/request_guard_test.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/plugins/nemo/request_guard.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/plugins/nemo/request_guard_test.go

📝 Walkthrough

Walkthrough

The change removes the special-case filtering of OpenAI system messages in extractOpenAIMessages; the function now forwards the full messages slice (all roles: system, user, assistant, tool) to NeMo and returns nil only for an empty slice. Inline documentation was updated to reflect forwarding all conversation messages. Tests were modified to expect preserved ordering and inclusion of system messages and to add a single-system-message case.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Security Assessment

  • CWE-20 (Improper Input Validation): The previous behavior allowed role == "system" messages to bypass the guard. This change correctly forwards system messages for validation; verify NeMo actually validates them and does not silently ignore or drop roles.
  • CWE-94 (Code Injection / Injection via tool responses): Forwarding tool role content to NeMo reduces the risk of unvalidated tool-response prompt injection. Confirm NeMo treats tool content as untrusted input.
  • Remaining risk — false confidence: If NeMo’s contract filters roles or only validates the last user message, forwarding here may create a false sense of protection. Action: add a runtime assertion or integration test that NeMo acknowledges/returns validation results per-role or per-message count.
  • Actionable checks:
    1. Confirm NeMo API/schema accepts and validates all roles (system, user, assistant, tool) and returns per-message validation results.
    2. Add an integration test that sends mixed-role messages and asserts NeMo processed the same number/order of messages.
    3. Ensure logging/metrics capture when any message is dropped or rejected by NeMo so failures are detectable.
    4. Limit and/or validate message sizes and attachment types before forwarding to avoid resource exhaustion or payload-based bypasses (rate/size checks).
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: removing system role filtering from NeMo request guard to enable forwarding of system messages now that TrustyAI supports them.
Linked Issues check ✅ Passed Code changes directly address issue #160 by forwarding all message roles (system, user, assistant, tool) to NeMo instead of filtering out system messages, enabling comprehensive guardrail validation across full conversation context.
Out of Scope Changes check ✅ Passed All changes are scoped to removing system message filtering and updating corresponding tests; no unrelated modifications to other functionality or components detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/plugins/nemo/request_guard.go (1)

160-167: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Block non-string content instead of silently turning it into "".

Severity: critical. Exploit: send a chat message with OpenAI array-form content parts, e.g. a system or user message whose content is [{type:"text",text:"..."}]. This code forwards {"content": ""} to NeMo, while the original request still carries the real instruction to the downstream model, creating a guardrail bypass on the exact path this PR is trying to harden. CWE-20.

Suggested fix
 func extractOpenAIMessages(raw any) ([]map[string]string, error) {
 	slice, ok := raw.([]any)
 	if !ok {
 		return nil, fmt.Errorf("messages is not an array")
 	}
-	var messages []map[string]string
+	messages := make([]map[string]string, 0, len(slice))
 
-	for _, m := range slice {
+	for i, m := range slice {
 		msg, ok := m.(map[string]any)
 		if !ok {
-			continue
+			return nil, fmt.Errorf("message at index %d is not an object", i)
 		}
 		role, _ := msg["role"].(string)
-		content, _ := msg["content"].(string)
+		if role == "" {
+			return nil, fmt.Errorf("message at index %d has invalid role", i)
+		}
+		content, err := extractOpenAIContent(msg["content"])
+		if err != nil {
+			return nil, fmt.Errorf("message at index %d: %w", i, err)
+		}
 		messages = append(messages, map[string]string{"role": role, "content": content})
 	}
 	return messages, nil
 }
+
+func extractOpenAIContent(raw any) (string, error) {
+	switch v := raw.(type) {
+	case string:
+		return v, nil
+	case []any:
+		parts := make([]string, 0, len(v))
+		for i, p := range v {
+			part, ok := p.(map[string]any)
+			if !ok {
+				return "", fmt.Errorf("content part %d is not an object", i)
+			}
+			if part["type"] != "text" {
+				continue
+			}
+			text, ok := part["text"].(string)
+			if !ok {
+				return "", fmt.Errorf("content part %d has invalid text", i)
+			}
+			parts = append(parts, text)
+		}
+		if len(parts) == 0 {
+			return "", fmt.Errorf("unsupported content format")
+		}
+		return strings.Join(parts, "\n"), nil
+	default:
+		return "", fmt.Errorf("unsupported content format")
+	}
+}

As per coding guidelines, **: REVIEW PRIORITIES: 1. Security vulnerabilities (provide severity, exploit scenario, and remediation code).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/plugins/nemo/request_guard.go` around lines 160 - 167, The loop currently
coerces non-string message content into an empty string (content, _ :=
msg["content"].(string)) which allows array-form or structured contents to be
forwarded; instead, detect when msg["content"] is not a string and reject the
request/path. Change the logic in the loop that iterates over slice (the
variables m, msg, role, content, messages) to validate content using a strict
type assertion (e.g., v, ok := msg["content"]; s, isString := v.(string)) and if
not isString return or mark the request invalid (do not append a map with ""),
logging or surfacing an error so the guard blocks any messages whose content is
not a plain string. Ensure the same strict behavior for role if applicable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@pkg/plugins/nemo/request_guard.go`:
- Around line 160-167: The loop currently coerces non-string message content
into an empty string (content, _ := msg["content"].(string)) which allows
array-form or structured contents to be forwarded; instead, detect when
msg["content"] is not a string and reject the request/path. Change the logic in
the loop that iterates over slice (the variables m, msg, role, content,
messages) to validate content using a strict type assertion (e.g., v, ok :=
msg["content"]; s, isString := v.(string)) and if not isString return or mark
the request invalid (do not append a map with ""), logging or surfacing an error
so the guard blocks any messages whose content is not a plain string. Ensure the
same strict behavior for role if applicable.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: e66b650b-7947-45f5-9725-7564e910c72a

📥 Commits

Reviewing files that changed from the base of the PR and between 74db5d3 and c704557.

📒 Files selected for processing (2)
  • pkg/plugins/nemo/request_guard.go
  • pkg/plugins/nemo/request_guard_test.go

@nirrozenbaum
Copy link
Copy Markdown
Contributor

@liavweiss making sure - was this change tested with latest trusty version?

continue
}
role, _ := msg["role"].(string)
if role == "system" {
Copy link
Copy Markdown
Contributor

@nirrozenbaum nirrozenbaum May 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be correct to just return “slice” without copying?
in current code we copy the whole messages array on every request.
so if we have a long multi turn chat, memory allocation for each turn is expensive.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I added a pre-allocation and an early return for empty slices to avoid unnecessary allocations.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my question was actually about the messages themselves, not only about the case of empty slice.
my intention was - if we have something along these lines:

func extractOpenAIMessages(raw any) ([]map[string]string, error) {
	slice, ok := raw.([]any)
	if !ok {
		return nil, fmt.Errorf("messages is not an array")
	}

    if len(slice) == 0 {
		return nil, nil
	}

    return slice

would it work?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should work with some extra changes, but returning the slice directly means forwarding all original message fields to NeMo (e.g., tool_calls, function_call) instead of only role and content. NeMo ignores extra fields, but it sends more data over the wire for no reason. It also requires changing the return type from []map[string]string to []any and updating all callers and tests accordingly. If we want to go in that direction, it would be better as a follow-up PR.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. I'm just thinking about the tradeoffs.
do we prefer allocating memory for every request and copying the whole chat? (that could be very large) or do we prefer sending the same request that we received over the wire to yet another http hop? (but at least we don't do both - allocate memory + sending http).

there are pros/cons for both sides.
my gut feeling says sending the original request as is would probably be "cheaper".
we can handle in a follow up, for sure.

Signed-off-by: Liav Weiss <lweiss@lweiss-thinkpadx1carbongen11.raanaii.csb>
@liavweiss liavweiss force-pushed the fix/remove-system-role-filtering branch from c704557 to 32c73fc Compare May 5, 2026 06:14
@liavweiss
Copy link
Copy Markdown
Contributor Author

@liavweiss making sure - was this change tested with latest trusty version?

Yes, I built the TrustyAI NeMo image locally that includes the new code #47

Copy link
Copy Markdown
Contributor

@nirrozenbaum nirrozenbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@openshift-ci openshift-ci Bot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 5, 2026
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented May 5, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liavweiss, nirrozenbaum

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 5, 2026
@openshift-merge-bot openshift-merge-bot Bot merged commit 1d7b369 into opendatahub-io:main May 5, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm "Looks good to me", indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

security issue in nemo guard plugin

2 participants