
fix(nodes): add input validation/sanitization to LLM chat drivers#559

Merged
stepmikhaylov merged 8 commits into rocketride-org:develop from charliegillet:fix/llm-input-validation on Apr 7, 2026

Conversation

@charliegillet
Contributor

@charliegillet charliegillet commented Mar 31, 2026

Summary

  • Add shared ai.common.validation module with sanitize_prompt, validate_prompt, validate_model_name, and validate_max_tokens utilities
  • Integrate validation into ChatBase.__init__ (model name + output token validation) and ChatBase.chat_string (prompt sanitization + empty-check)
  • Add sanitize_prompt safety net in ChatBase._chat() and in the 4 LLM nodes that override chat()/_chat() and bypass chat_string(): Mistral, Perplexity, Gemini, and IBM Watson

Type

Security Fix

Why this bug happens in this codebase

All LLM nodes inherit from ChatBase (packages/ai/src/ai/common/chat.py), which acts as the single chokepoint for prompt dispatch. Before this change, none of these paths validated or sanitized user input:

  1. ChatBase.__init__ accepted any value from config.get('model') — including empty strings, strings with control characters, or whitespace — and passed it directly to provider SDKs (ChatOpenAI, ChatAnthropic, ChatVertexAI, etc.). A malformed model name would only fail deep inside the provider SDK with a cryptic error.

  2. ChatBase.chat_string (the main prompt entry point) passed prompts directly through _chat_with_retries() and _chat() to self._llm.invoke() with zero sanitization. C0/C1 control characters (NUL bytes, escape sequences, form feeds, etc.) embedded in user input flowed straight to the OpenAI, Anthropic, DeepSeek, etc. APIs, which can cause 400 errors or undefined behavior.

  3. ChatBase.__init__ accepted modelOutputTokens values without upper-bound validation — a misconfigured node could request billions of output tokens.

  4. Nodes that override chat() entirely (llm_mistral/mistral.py, llm_perplexity/perplexity.py) call the provider API directly via self._client.chat.complete() or self._llm.invoke(), completely bypassing chat_string() and its (previously nonexistent) validation.

  5. Nodes that override _chat() (llm_gemini/gemini.py, llm_ibm_watson/ibm_watson.py) also bypass the base _chat() method, so any safety net there wouldn't apply without explicit integration.

What changed

New file: packages/ai/src/ai/common/validation.py

Shared validation utilities used by all LLM nodes:

  • sanitize_prompt(prompt) — strips C0/C1 control chars (except tab, newline, and carriage return) that cause API errors
  • validate_prompt(prompt, max_tokens, token_counter) — rejects empty prompts, sanitizes, warns on context overflow
  • validate_model_name(model) — rejects empty/malformed model names via regex (^[a-zA-Z0-9][a-zA-Z0-9._:/-]*$)
  • validate_max_tokens(output_tokens, total_tokens) — clamps output tokens to min(output_tokens, total_tokens, 1_000_000)
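
The merged source isn't quoted in this description, but as orientation only, a minimal sketch of these four utilities could look roughly like the following, folding in behavior settled later in the thread (None passthrough for unset models, @ allowed in model names, explicit type/bool guards, and logging instead of a silent except). The regex patterns, error messages, and the exact MAX_OUTPUT_TOKENS value are assumptions, not the actual code.

import re

# Hard ceiling on requested output tokens (assumed value, per the summary above)
MAX_OUTPUT_TOKENS = 1_000_000

# C0/C1 control characters except tab (\t), newline (\n), and carriage return (\r)
_CONTROL_CHARS_RE = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f]')
# Alphanumeric start, then letters/digits/._:/@- (the @ comes from review feedback below)
_MODEL_NAME_RE = re.compile(r'^[a-zA-Z0-9][a-zA-Z0-9._:/@-]*$')


def sanitize_prompt(prompt: str) -> str:
    """Strip disallowed control characters that can cause provider API errors."""
    return _CONTROL_CHARS_RE.sub('', prompt)


def validate_model_name(model):
    """Reject malformed model names; pass None through for lazily configured models."""
    if model is None:
        return None
    if not isinstance(model, str) or not model.strip():
        raise ValueError('Model name was provided but is empty.')
    model = model.strip()
    if not _MODEL_NAME_RE.match(model):
        raise ValueError(f'Model name {model!r} contains unsupported characters.')
    return model


def validate_max_tokens(output_tokens, total_tokens):
    """Clamp requested output tokens to the context window and a hard ceiling."""
    for value in (output_tokens, total_tokens):
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f'Token limit must be a positive integer, got {value!r}.')
    return min(output_tokens, total_tokens, MAX_OUTPUT_TOKENS)


def validate_prompt(prompt, max_tokens=None, token_counter=None):
    """Reject empty prompts, sanitize them, and warn on possible context overflow."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError('Prompt is empty or contains only whitespace.')
    prompt = sanitize_prompt(prompt)
    if not prompt.strip():
        raise ValueError('Prompt is empty or contains only whitespace.')
    if max_tokens is not None and token_counter is not None:
        try:
            if token_counter(prompt) > max_tokens:
                print(f'Warning: prompt may exceed the {max_tokens}-token context window.')
        except Exception as exc:
            # Token counting failures should not block the request
            print(f'Warning: prompt token counting failed: {exc}')
    return prompt

The point of a single shared module is that every driver uses one definition of "clean input" instead of each node re-deriving its own.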

Modified: packages/ai/src/ai/common/chat.py (ChatBase)

  • __init__: model name now validated via validate_model_name(); output tokens validated via validate_max_tokens()
  • chat_string(): prompts validated + sanitized via validate_prompt() before token counting and API dispatch
  • _chat(): sanitize_prompt() added as defense-in-depth for any subclass path
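
As a rough sketch of how those hooks could sit in ChatBase (a simplified skeleton, not the real class: the config keys, defaults, and the _count_tokens placeholder are assumptions; only the validator names and the split between __init__ and chat_string() come from the list above):

from ai.common.validation import validate_max_tokens, validate_model_name, validate_prompt


class ChatBase:
    def __init__(self, config: dict):
        # Fail fast on malformed model names instead of deep inside the provider SDK
        self._model = validate_model_name(config.get('model'))
        self._modelTotalTokens = config.get('modelTotalTokens', 128_000)
        # Clamp requested output tokens against the context window and the hard ceiling
        self._modelOutputTokens = validate_max_tokens(
            config.get('modelOutputTokens', 4_096), self._modelTotalTokens
        )

    def chat_string(self, prompt: str) -> str:
        # Single validation point: empty-check, sanitization, and context-overflow warning
        prompt = validate_prompt(prompt, self._modelTotalTokens, self._count_tokens)
        return self._chat(prompt)

    def _count_tokens(self, text: str) -> int:
        return len(text) // 4  # crude stand-in for the provider tokenizer

    def _chat(self, prompt: str) -> str:
        raise NotImplementedError  # provider nodes override this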

Modified: 4 LLM nodes with custom chat paths

  • llm_mistral/mistral.py — sanitizes prompt in overridden chat()
  • llm_perplexity/perplexity.py — sanitizes prompt in overridden chat()
  • llm_gemini/gemini.py — sanitizes prompt in overridden _chat()
  • llm_ibm_watson/ibm_watson.py — sanitizes prompt in overridden _chat()

Validation

  • ruff check passes on all 6 changed files
  • ruff format --check passes on all 6 changed files
  • No changes to node interfaces or return types — fully backward compatible

How to extend

To add validation to a new LLM node:

  1. If it inherits from ChatBase and doesn't override chat() or _chat(), it gets validation automatically
  2. If it overrides chat() or _chat(), import sanitize_prompt from ai.common.validation and call it on the prompt before the API call
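
For step 2, a hypothetical override might look like the sketch below. Only the ai.common.validation import path comes from this PR; the node class, the question.getPrompt() accessor, and the self._client.chat.complete() call are illustrative (the latter two are mentioned in the Mistral review comments further down).

from ai.common.chat import ChatBase
from ai.common.validation import sanitize_prompt


class MyProviderChat(ChatBase):
    def chat(self, question):
        # This path bypasses chat_string(), so sanitize before calling the provider
        prompt = sanitize_prompt(question.getPrompt())
        messages = [{'role': 'user', 'content': prompt}]
        # self._client would be the provider SDK client set up elsewhere in the node
        return self._client.chat.complete(model=self._model, messages=messages)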

Closes: N/A — new contribution, no pre-existing issue

#Hack-with-bay-2

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added input validation and sanitization for prompts, including automatic removal of disallowed control characters.
    • Implemented model name validation and improved token-limit handling with clamping and warnings.
  • Bug Fixes
    • Relaxed empty-prompt rejection to allow legitimate edge cases.
    • Streamlined validation error messaging to be clearer and more concise.

charliegillet and others added 2 commits March 30, 2026 17:11
feat(vscode): improve stop button feedback in Pipeline Observability screen

Handle TASK_STATE.STOPPING in the control button to show "Stopping..."
with a disabled state and distinct orange styling, preventing duplicate
clicks and giving immediate visual feedback during pipeline shutdown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix(nodes): add input validation and sanitization to LLM chat drivers

Add a shared validation module and integrate it into ChatBase and all
LLM nodes that bypass the base chat path, preventing control character
injection, empty prompts, malformed model names, and unsafe token limits
from reaching provider APIs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented Mar 31, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Adds a new LLM input validation module and integrates its checks into ChatBase; updates IBM Watson node by reformatting location constants, simplifying location validation error text, and removing an empty-prompt guard so prompts are validated via the new flow.

Changes

  • IBM Watson Node (nodes/src/nodes/llm_ibm_watson/ibm_watson.py): Reformatted _VALID_LOCATIONS, simplified _validate_location() error message construction, and removed the early empty-prompt guard in _chat() so prompts proceed to downstream handling.
  • ChatBase integration (packages/ai/src/ai/common/chat.py): Imported and used validate_model_name, validate_max_tokens, and validate_prompt; set model via validation, replaced the manual token clamp with validate_max_tokens(), and sanitize/validate prompts before token counting.
  • Validation utilities (packages/ai/src/ai/common/validation.py): New module that sanitizes prompts (removes disallowed control chars), validates prompts with token checks, validates model names, and clamps/validates max output token values (introduces MAX_OUTPUT_TOKENS).

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Client
    participant ChatBase
    participant Validation as ValidationModule
    participant LLM as IBM_Watson/LLM

    Client->>ChatBase: send prompt + config
    ChatBase->>Validation: validate_model_name(config.model)
    ChatBase->>Validation: validate_max_tokens(outputTokens,totalTokens)
    ChatBase->>Validation: validate_prompt(prompt,maxTokens,token_counter)
    Validation-->>ChatBase: sanitized/validated prompt & token caps
    ChatBase->>LLM: construct messages payload & call chat()
    LLM-->>ChatBase: response
    ChatBase-->>Client: return response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • jmaionchi
  • stepmikhaylov
  • Rod-Christensen
  • ryan-t-christensen

Poem

🐰 I hopped through code to clean the stream,
Removing ghosts and trimming seams,
Prompts now neat, tokens in line,
Watson speaks when signals shine,
A joyful hop — the tests all beam! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly and specifically summarizes the main change (adding input validation/sanitization to LLM chat drivers) and directly reflects the primary objective of the PR.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.


@github-actions github-actions Bot added module:nodes Python pipeline nodes module:vscode VS Code extension module:ai AI/ML modules labels Mar 31, 2026
fix: remove unrelated PageStatus "Stopping..." changes from LLM validation PR

The PageStatus changes belong in a separate PR (rocketride-org#549) and were
accidentally included here.
@github-actions github-actions Bot removed the module:vscode VS Code extension label Mar 31, 2026
@ryan-t-christensen
Collaborator

@coderabbitai review

@coderabbitai
Contributor

coderabbitai Bot commented Mar 31, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/llm_mistral/mistral.py`:
- Around line 247-251: The override in Mistral.chat currently only sanitizes
control characters via sanitize_prompt(question.getPrompt()) and builds
messages, skipping the shared preflight validation implemented in
ChatBase.chat_string(); update Mistral.chat to run the same shared prompt
validation (the ChatBase chat_string preflight checks) on question.getPrompt()
before proceeding, ensure empty/whitespace-only prompts are rejected the same
way as other backends, and then use the sanitized/validated prompt for messages
(referencing the methods chat(), ChatBase.chat_string(), sanitize_prompt(), and
question.getPrompt() to locate the changes).

In `@nodes/src/nodes/llm_perplexity/perplexity.py`:
- Around line 189-193: This override of chat() only calls sanitize_prompt and
therefore skips the PR-wide input guarantees (empty/whitespace rejection and
preflight token check) provided by ChatBase.chat_string(); change the prompt
handling to use validate_prompt(question.getPrompt()) (and then sanitize if
needed) before invoking the model so the empty/whitespace validation and token
preflight run; update the code around question.getPrompt(), sanitize_prompt,
validate_prompt, and the call to self._llm.invoke accordingly in this class's
chat() implementation.

In `@packages/ai/src/ai/common/chat.py`:
- Around line 71-75: The check that _modelOutputTokens >= 1024 runs before
clamping, so validate_max_tokens(self._modelOutputTokens,
self._modelTotalTokens) can reduce _modelOutputTokens below 1024 without
raising; move or repeat the minimum check after clamping: call
validate_max_tokens(...) first (or call it then re-check) and raise ValueError
if the resulting self._modelOutputTokens is < 1024, referencing the fields
_modelOutputTokens and _modelTotalTokens and the helper validate_max_tokens to
locate the logic.

In `@packages/ai/src/ai/common/validation.py`:
- Around line 69-74: The current validation checks prompt emptiness before
calling sanitize_prompt, allowing control-only prompts (e.g. '\x00\x01') to
bypass rejection; update the logic in the validation function so that after
calling sanitize_prompt(prompt) you re-check whether the sanitized prompt is
empty or whitespace (using the same check as before) and raise
ValueError('Prompt is empty or contains only whitespace.') if it is; ensure you
use the sanitized value for any subsequent processing.
- Around line 124-133: Validate total_tokens before using it in comparisons:
ensure total_tokens is an int and >= 1 (similar to the check for output_tokens)
inside the same validation function in packages/ai/src/ai/common/validation.py;
if invalid raise ValueError describing the bad total_tokens, then perform the
clamping logic that compares output_tokens to MAX_OUTPUT_TOKENS and to
total_tokens so the comparisons cannot raise TypeError or produce negative/zero
output limits.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 027f0a71-2d3c-4bc9-b852-19f5f045971c

📥 Commits

Reviewing files that changed from the base of the PR and between b6b2dc1 and 993d35d.

📒 Files selected for processing (6)
  • nodes/src/nodes/llm_gemini/gemini.py
  • nodes/src/nodes/llm_ibm_watson/ibm_watson.py
  • nodes/src/nodes/llm_mistral/mistral.py
  • nodes/src/nodes/llm_perplexity/perplexity.py
  • packages/ai/src/ai/common/chat.py
  • packages/ai/src/ai/common/validation.py

- Re-check prompt emptiness after sanitization to catch control-only prompts
- Validate total_tokens param before clamping output tokens
- Move min 1024 output tokens check after validate_max_tokens clamping
- Use validate_prompt() in Mistral/Perplexity chat() overrides to match
  shared ChatBase validation behavior

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/llm_mistral/mistral.py`:
- Around line 247-248: The current retry path is catching all exceptions and
converting them to provider errors, which hides input validation failures from
validate_prompt called with question.getPrompt(), self.getTotalTokens(), and
self.getTokens; update the error handling in the retry/except block (the logic
around the retry loop and provider error construction) to detect and re-raise
validation errors from validate_prompt (or catch only provider/network
exceptions) so validation exceptions propagate unchanged—e.g., check the
exception type or sentinel used by validate_prompt and rethrow it immediately
instead of converting it to provider error text.

In `@nodes/src/nodes/llm_perplexity/perplexity.py`:
- Around line 189-190: The prompt validation call using
validate_prompt(question.getPrompt(), self.getTotalTokens(), self.getTokens) is
currently inside a broad retry try-block and its ValueError is being caught and
rethrown as a generic provider error; move the validate_prompt call out of that
try (perform validation before entering the provider retry/try) so validation
ValueError can propagate normally, or alternatively add an explicit except
ValueError: raise to the existing exception handling so ValueError is not
wrapped by the provider/error conversion logic in the retry block that handles
provider exceptions.

In `@packages/ai/src/ai/common/validation.py`:
- Around line 104-110: Guard the type of the model parameter before calling
.strip(): ensure model is an instance of str (e.g., using isinstance(model,
str)) and raise a ValueError with the existing message if it is not a string or
is empty/whitespace; then call model = model.strip() and validate with
_MODEL_NAME_RE.match(model) as before. This change should be applied around the
existing checks that reference model and _MODEL_NAME_RE.match to prevent
AttributeError when non-string values (like integers) are passed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 71cce18e-ee15-4b1e-a58f-a61dee86ab48

📥 Commits

Reviewing files that changed from the base of the PR and between 993d35d and 96e93f4.

📒 Files selected for processing (4)
  • nodes/src/nodes/llm_mistral/mistral.py
  • nodes/src/nodes/llm_perplexity/perplexity.py
  • packages/ai/src/ai/common/chat.py
  • packages/ai/src/ai/common/validation.py

Collaborator

@kwit75 kwit75 left a comment


good direction — input validation for LLM drivers is needed. few things to address:

double/triple validation: chat_string() calls validate_prompt(), then _chat() also calls sanitize_prompt(). every prompt gets sanitized twice minimum. mistral and perplexity add a third pass in their own chat(). pick one validation point instead of layering them

validate_model_name(None) will break existing setups: config.get('model') returns None when model isn't set yet. old code accepted that. new code throws ValueError. this is a breaking change for drivers that set model lazily or have it optional

model name regex too strict: [a-zA-Z0-9._:/-] doesn't allow @ (some providers use org@model), and rejects model names starting with non-alphanumeric. may break edge cases

cosmetic changes mixed with security PR — IBM Watson formatting (line breaks in _VALID_LOCATIONS), whitespace changes in chat.py — makes the diff harder to review. split these out

sanitize_prompt regex is correct, token clamping logic is solid

- Remove double/triple validation: keep validate_prompt() only in
  chat_string() (the main entry point), remove redundant sanitize_prompt()
  from _chat() and validate_prompt() from Mistral/Perplexity chat() overrides
- Fix validate_model_name(None) breaking existing setups: return None
  gracefully when model is not yet configured instead of raising ValueError
- Add @ to model name regex to support org@model provider formats
- IBM Watson formatting changes kept as-is since they are ruff format
  enforced style, not hand-made cosmetic changes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
packages/ai/src/ai/common/validation.py (1)

109-110: ⚠️ Potential issue | 🟠 Major

Non-string model values currently raise AttributeError.

At Line 109, truthy non-string values (e.g., 123) hit .strip() and fail with AttributeError instead of controlled validation.

🔧 Proposed fix
-    if not model.strip():
+    if not isinstance(model, str) or not model.strip():
         raise ValueError('Model name was provided but is empty.')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/ai/src/ai/common/validation.py` around lines 109 - 110, The current
check uses model.strip() which raises AttributeError for non-string truthy
values; update the validation around the model variable (the code that currently
calls model.strip()) to first verify that model is an instance of str (and not
None), and if not raise a ValueError indicating an invalid model type, then
perform the empty-string check with model.strip() and raise the existing
ValueError if it's empty—this ensures non-string inputs (e.g., 123) produce
controlled validation errors rather than AttributeError.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/ai/src/ai/common/validation.py`:
- Around line 133-137: The token validation incorrectly accepts booleans because
isinstance(True, int) is True; update the checks in validation.py for
output_tokens and total_tokens to explicitly reject booleans (e.g., treat bool
as invalid) before accepting ints — modify the conditions around output_tokens
and total_tokens (the two ValueError checks) to fail when the value is a bool so
only real integers >=1 pass validation and raise the same ValueError message
otherwise.
- Around line 85-87: Replace the silent bare except in the token-counting block
with a non-blocking logged failure: inside the except block for token counting,
call the already-imported debug(...) to emit a descriptive message that includes
the exception details (e.g., str(e) or formatted traceback) and context (which
tokenizer/provider and input size), then continue without re-raising so requests
are not blocked; locate the except block shown in the diff and update it to log
via debug rather than using "pass".
- Around line 69-70: Guard the prompt's type before calling .strip(): update the
existing check around the condition "if not prompt or not prompt.strip()" to
first assert isinstance(prompt, str) and raise a ValueError (e.g., "Prompt must
be a non-empty string") for non-str inputs, then perform the empty/whitespace
check on prompt.strip() so all invalid inputs consistently raise ValueError.

---

Duplicate comments:
In `@packages/ai/src/ai/common/validation.py`:
- Around line 109-110: The current check uses model.strip() which raises
AttributeError for non-string truthy values; update the validation around the
model variable (the code that currently calls model.strip()) to first verify
that model is an instance of str (and not None), and if not raise a ValueError
indicating an invalid model type, then perform the empty-string check with
model.strip() and raise the existing ValueError if it's empty—this ensures
non-string inputs (e.g., 123) produce controlled validation errors rather than
AttributeError.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: fed22d32-4f99-47e6-b10f-f67c3d19c954

📥 Commits

Reviewing files that changed from the base of the PR and between 96e93f4 and 85f82cb.

📒 Files selected for processing (2)
  • packages/ai/src/ai/common/chat.py
  • packages/ai/src/ai/common/validation.py

fix(nodes): remove redundant sanitize_prompt from gemini and ibm_watson drivers

Validation is now centralized in ChatBase.chat_string() — individual
drivers should not duplicate sanitization. Removes the redundant calls
and imports per kwit75's review feedback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/llm_ibm_watson/ibm_watson.py`:
- Line 71: Refactor the long inline ValueError by extracting the message
formatting into a small helper or custom exception: add a helper function (e.g.,
format_invalid_location_error(location)) or a custom exception class (e.g.,
InvalidIBMCloudLocation) in the same module that uses _VALID_LOCATIONS to build
the detailed string, and replace the inline raise at the current raise site with
a short raise that calls the helper (raise
ValueError(format_invalid_location_error(location))) or raises the custom
exception (raise InvalidIBMCloudLocation(location)); keep _VALID_LOCATIONS usage
inside the helper/exception so the raise site stays lean.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6a0e8ef5-b275-493a-967d-47578b1b67a1

📥 Commits

Reviewing files that changed from the base of the PR and between 85f82cb and a0fbc04.

📒 Files selected for processing (1)
  • nodes/src/nodes/llm_ibm_watson/ibm_watson.py

@charliegillet
Contributor Author

Addressed all review feedback:

  • Double/triple validation: Consolidated to a single validation point — validate_prompt() is called only in chat_string() (L358 in chat.py). Neither _chat() nor any other base method calls sanitize_prompt() separately. Mistral and Perplexity chat() overrides use question.getPrompt() directly without additional sanitization passes.

  • validate_model_name(None) breaking change: Fixed — the function now returns None when model is None (L106-107 in validation.py), so lazy/optional model configuration continues to work.

  • Model name regex too strict: Fixed — the regex (L24 in validation.py) now includes @ in the allowed character set: [a-zA-Z0-9._:/@-], supporting patterns like org@model.

Thanks for the thorough review!

Address CodeRabbit suggestions:
- Guard validate_model_name against non-string inputs
- Guard validate_max_tokens against bool values (isinstance(True, int) is True in Python)
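
The bool guard matters because bool is a subclass of int in Python, so isinstance(True, int) is True and a bare integer check would let booleans through. A minimal illustration (helper name hypothetical):

def _require_positive_int(value, name):
    # Reject booleans explicitly: isinstance(True, int) is True in Python
    if isinstance(value, bool) or not isinstance(value, int) or value < 1:
        raise ValueError(f'{name} must be a positive integer, got {value!r}')
    return value


_require_positive_int(4096, 'output_tokens')    # passes
# _require_positive_int(True, 'output_tokens')  # would raise ValueError
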
Contributor

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (2)
packages/ai/src/ai/common/validation.py (2)

69-70: ⚠️ Potential issue | 🟡 Minor

Add a runtime type guard for prompt before calling .strip().

At Line 69, truthy non-string inputs can throw AttributeError instead of returning a controlled validation error.

🔧 Proposed fix
-    if not prompt or not prompt.strip():
+    if not isinstance(prompt, str) or not prompt.strip():
         raise ValueError('Prompt is empty or contains only whitespace.')
#!/bin/bash
# Verify whether all validate_prompt callers can ever pass non-str values.
# Expected: callers should pass str; if uncertain, keep the defensive guard above.

rg -n --type=py -C3 '\bvalidate_prompt\s*\('
rg -n --type=py -C3 'def\s+getPrompt\s*\('
rg -n --type=py -C3 'getPrompt\s*\('
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/ai/src/ai/common/validation.py` around lines 69 - 70, The validation
currently calls prompt.strip() which will raise AttributeError for truthy
non-string inputs; update the validate_prompt function to first perform a
runtime type guard (e.g., check isinstance(prompt, str)) and raise a ValueError
with the existing message if the check fails or if prompt.strip() is empty;
ensure references to validate_prompt remain unchanged so callers still get the
same error type and message for invalid inputs.

85-87: ⚠️ Potential issue | 🟡 Minor

Do not swallow token-counter exceptions silently.

At Line 85, except Exception: pass removes diagnostics for tokenizer/provider regressions. Log and continue instead.

🔧 Proposed fix
-    except Exception:
-        # Token counting failures should not block the request
-        pass
+    except Exception as exc:
+        # Token counting failures should not block the request
+        debug(f'Warning: Prompt token counting failed: {exc}')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/ai/src/ai/common/validation.py` around lines 85 - 87, Replace the
silent "except Exception: pass" in the token-counting block with a catch that
captures the exception (e.g., "except Exception as e") and logs it (using the
module/logger available in this file) including context like "Token counting
failed, continuing" and the exception/stacktrace (logger.exception or
logger.error with exc_info=True), then continue without raising; keep the
behavior of not blocking the request but ensure diagnostics are recorded for
tokenizer/provider regressions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@packages/ai/src/ai/common/validation.py`:
- Around line 69-70: The validation currently calls prompt.strip() which will
raise AttributeError for truthy non-string inputs; update the validate_prompt
function to first perform a runtime type guard (e.g., check isinstance(prompt,
str)) and raise a ValueError with the existing message if the check fails or if
prompt.strip() is empty; ensure references to validate_prompt remain unchanged
so callers still get the same error type and message for invalid inputs.
- Around line 85-87: Replace the silent "except Exception: pass" in the
token-counting block with a catch that captures the exception (e.g., "except
Exception as e") and logs it (using the module/logger available in this file)
including context like "Token counting failed, continuing" and the
exception/stacktrace (logger.exception or logger.error with exc_info=True), then
continue without raising; keep the behavior of not blocking the request but
ensure diagnostics are recorded for tokenizer/provider regressions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 573a42bf-f890-4a9a-8a96-7265dd70acdd

📥 Commits

Reviewing files that changed from the base of the PR and between a0fbc04 and 8fb1b02.

📒 Files selected for processing (1)
  • packages/ai/src/ai/common/validation.py

@github-actions

github-actions Bot commented Apr 6, 2026

No description provided.

@ryan-t-christensen
Collaborator

@charliegillet - added mistral and perplexity validation, tested it, and the PR adds some nice fallback. LGTM!

@stepmikhaylov stepmikhaylov enabled auto-merge (squash) April 7, 2026 09:08
@stepmikhaylov stepmikhaylov merged commit 88ba108 into rocketride-org:develop Apr 7, 2026
15 checks passed
shashidharbabu pushed a commit that referenced this pull request Apr 7, 2026
* feat(vscode): improve stop button feedback in Pipeline Observability screen

Handle TASK_STATE.STOPPING in the control button to show "Stopping..."
with a disabled state and distinct orange styling, preventing duplicate
clicks and giving immediate visual feedback during pipeline shutdown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(nodes): add input validation and sanitization to LLM chat drivers

Add a shared validation module and integrate it into ChatBase and all
LLM nodes that bypass the base chat path, preventing control character
injection, empty prompts, malformed model names, and unsafe token limits
from reaching provider APIs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove unrelated PageStatus "Stopping..." changes from LLM validation PR

The PageStatus changes belong in a separate PR (#549) and were
accidentally included here.

* fix: address CodeRabbit review findings on input validation

- Re-check prompt emptiness after sanitization to catch control-only prompts
- Validate total_tokens param before clamping output tokens
- Move min 1024 output tokens check after validate_max_tokens clamping
- Use validate_prompt() in Mistral/Perplexity chat() overrides to match
  shared ChatBase validation behavior

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(nodes): address kwit75's review feedback on LLM input validation

- Remove double/triple validation: keep validate_prompt() only in
  chat_string() (the main entry point), remove redundant sanitize_prompt()
  from _chat() and validate_prompt() from Mistral/Perplexity chat() overrides
- Fix validate_model_name(None) breaking existing setups: return None
  gracefully when model is not yet configured instead of raising ValueError
- Add @ to model name regex to support org@model provider formats
- IBM Watson formatting changes kept as-is since they are ruff format
  enforced style, not hand-made cosmetic changes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(nodes): remove redundant sanitize_prompt from gemini and ibm_watson drivers

Validation is now centralized in ChatBase.chat_string() — individual
drivers should not duplicate sanitization. Removes the redundant calls
and imports per kwit75's review feedback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(nodes): add type guards for model name and token validation

Address CodeRabbit suggestions:
- Guard validate_model_name against non-string inputs
- Guard validate_max_tokens against bool values (isinstance(True, int) is True in Python)

* add mistral and perplexity validation

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: ryan-t-christensen <ryan.christensen@rocketride.ai>
ryan-t-christensen added a commit that referenced this pull request Apr 8, 2026

Labels

module:ai AI/ML modules module:nodes Python pipeline nodes

4 participants