fix(security): validate untrusted input in sandbox pip install and langchain separator parsing by charliegillet · Pull Request #588 · rocketride-org/rocketride-server

charliegillet · 2026-04-01T00:32:43Z

Split from #581 per @asclearuc's review.

Summary

sandbox.py: Add regex validation for pip package names to reject path traversal, URLs, and shell metacharacters before passing to subprocess. Add threading.Lock around pip install to prevent concurrent conflicts.
langchain.py: Add input length validation (_MAX_SEPARATOR_INPUT_LEN = 1024) before ast.literal_eval in the langchain preprocessor to prevent resource exhaustion from oversized separator input.

Why this bug happens in this codebase

sandbox.py passes user-provided package names directly to pip install via subprocess without validation. A malicious or malformed package name containing path traversal sequences (../), URLs, or shell metacharacters could be passed through to pip, which would attempt to install from arbitrary sources. The regex allowlist ensures only valid PyPI package name characters are accepted.

langchain.py calls ast.literal_eval on a user-provided separator string. While ast.literal_eval is safe against code execution, it still needs to parse the input — an extremely long string could cause unnecessary resource consumption. A 1024-character limit is generous for separator values (which are typically 1-10 characters) while preventing abuse.

Validation

Both changes are minimal and focused — 2 files, +24/-8 lines total
ruff check and ruff format pass clean
The package name regex (^[a-zA-Z0-9]([a-zA-Z0-9._-]*[a-zA-Z0-9])?$) matches the PyPI naming convention

Type

Bug fix / Security hardening

#Hack-with-bay-2

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- Enforced input size limits to reject oversized or malformed inputs before processing, preventing errors.
- Rejected invalid package names early (type, emptiness, length and format checks) to avoid failed installs.
- Serialized package installation and cache updates to improve reliability under concurrent operations.

@asclearuc

…ngchain separator parsing Split from rocketride-org#581 per @asclearuc's review. - Add regex validation for pip package names to reject path traversal, URLs, and shell metacharacters before passing to subprocess - Add threading lock around pip install to prevent concurrent conflicts - Add input length validation for ast.literal_eval in langchain preprocessor Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-04-01T00:33:02Z

Warning

Rate limit exceeded

@charliegillet has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 4 minutes and 16 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 4 minutes and 16 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 46645901-73ae-4152-9875-172ae71319ac

📥 Commits

Reviewing files that changed from the base of the PR and between bddcd8f and 7663810.

📒 Files selected for processing (1)

nodes/src/nodes/preprocessor_langchain/langchain.py

📝 Walkthrough

Walkthrough

Added input validation and safety controls: the langchain preprocessor enforces a 1024-character maximum for separator parsing, and the sandbox pip helper validates package names, restricts install subprocess stderr, and serializes pip installs with a module-level lock.

Changes

Cohort / File(s)	Summary
Langchain Input Validation `nodes/src/nodes/preprocessor_langchain/langchain.py`	Introduce `_MAX_SEPARATOR_INPUT_LEN = 1024`; `_parseSeparators(user_input)` now raises `ValueError` if `len(user_input)` > 1024; existing parsing and error wrapping retained for valid lengths.
Pip Installation Safety `packages/ai/src/ai/common/sandbox.py`	Add `_pip_lock` to serialize installs; `_pip_install(package)` now validates `package` (type, non-empty, ≤128 chars, regex for allowed chars/format), raises `ValueError` on invalid names; pip subprocess call discards `stderr` and `importlib.invalidate_caches()` is performed under the lock.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐇 I nibble code and tidy threads,
Count to 1024 before I tread,
I lock the pip, I check the name,
Small safeguards, a cautious game.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and accurately describes the main security-focused changes across both modified files (input validation in sandbox pip install and langchain separator parsing).
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/preprocessor_langchain/langchain.py`:
- Around line 77-79: The constant _MAX_SEPARATOR_INPUT_LEN is defined inside
_parseSeparators, causing it to be recreated on each call and inconsistent with
other class-level constants; move _MAX_SEPARATOR_INPUT_LEN = 1024 up to the
class scope alongside the other constants (use the same private naming
convention) and update _parseSeparators to reference the class-level constant
(via self._MAX_SEPARATOR_INPUT_LEN or ClassName._MAX_SEPARATOR_INPUT_LEN) so the
value is reused and kept with the class configuration.

In `@packages/ai/src/ai/common/sandbox.py`:
- Around line 316-322: The pip install call inside the _pip_lock block uses
subprocess.check_call with stderr=subprocess.PIPE which can block if not read;
change it to redirect stderr to subprocess.DEVNULL (matching stdout) or use
subprocess.run(..., capture_output=True, check=True) if you want to capture
errors. Locate the call that invokes subprocess.check_call([sys.executable,
'-m', 'pip', 'install', '--quiet', package], stdout=subprocess.DEVNULL,
stderr=subprocess.PIPE, timeout=60) and replace stderr=subprocess.PIPE with
stderr=subprocess.DEVNULL (or refactor to subprocess.run with
capture_output=True and check=True and handle the result) to avoid pipe buffer
deadlocks.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ec7be14e-d215-487d-907d-76bd927c0222

📥 Commits

Reviewing files that changed from the base of the PR and between 6f1c58c and e2b34ff.

📒 Files selected for processing (2)

nodes/src/nodes/preprocessor_langchain/langchain.py
packages/ai/src/ai/common/sandbox.py

…ketride-org#588 - Move _MAX_SEPARATOR_INPUT_LEN to class-level constant for consistency - Use subprocess.DEVNULL for stderr in pip install to prevent pipe buffer deadlocks Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

charliegillet · 2026-04-03T18:51:18Z

Addressed CodeRabbit feedback:

Moved _MAX_SEPARATOR_INPUT_LEN to class-level constant for consistency
Changed stderr=subprocess.PIPE to subprocess.DEVNULL to prevent pipe buffer deadlocks

Thanks for the review!

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/preprocessor_langchain/langchain.py`:
- Around line 78-79: The length check on user_input can raise TypeError for
non-string truthy values; add an explicit type guard before the len(...) call:
validate that user_input is an instance of str (or coerce to str if intended)
and if not raise a ValueError with a clear message, so the code path using
self._MAX_SEPARATOR_INPUT_LEN always fails deterministically for invalid
separator input; update the check around user_input and
self._MAX_SEPARATOR_INPUT_LEN accordingly (e.g., verify isinstance(user_input,
str) then perform the len check and raise ValueError on failure).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6855d57b-10ab-4226-9722-6a7fedb1dbf0

📥 Commits

Reviewing files that changed from the base of the PR and between e2b34ff and bddcd8f.

📒 Files selected for processing (2)

nodes/src/nodes/preprocessor_langchain/langchain.py
packages/ai/src/ai/common/sandbox.py

…in preprocessor

asclearuc · 2026-04-07T11:48:15Z

Closing this in favour of keeping the status quo.

_pip_lock is not needed — pip has used its own per-environment filelock since v10, so concurrent pip install subprocesses are already serialised at the pip level. Adding a process-level threading lock on top serialises all installs globally, meaning a slow torch install would block an unrelated boto3 install in another thread — a net regression for concurrent sandbox use.

stderr=subprocess.DEVNULL is a step backwards from the original stderr=subprocess.PIPE. Swallowing pip's stderr means install failures (network errors, version conflicts, missing build deps) produce no actionable output, making sandbox errors much harder to diagnose.

langchain.py input validation is the one genuinely useful change here, but it's small enough to land as part of a broader input-validation pass rather than its own PR.

Thanks for the work, @charliegillet — the intent is right, the execution just introduces more problems than it solves.

charliegillet requested review from Rod-Christensen, jmaionchi and stepmikhaylov as code owners April 1, 2026 00:32

github-actions Bot added module:nodes Python pipeline nodes module:ai AI/ML modules labels Apr 1, 2026

coderabbitai Bot reviewed Apr 1, 2026

View reviewed changes

Comment thread nodes/src/nodes/preprocessor_langchain/langchain.py Outdated

Comment thread packages/ai/src/ai/common/sandbox.py

charliegillet mentioned this pull request Apr 1, 2026

fix(cli): implement signal handlers for graceful shutdown #581

Open

4 tasks

coderabbitai Bot reviewed Apr 3, 2026

View reviewed changes

Comment thread nodes/src/nodes/preprocessor_langchain/langchain.py

fix(nodes): guard separator input type before length check in langcha…

7663810

…in preprocessor

asclearuc closed this Apr 7, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(security): validate untrusted input in sandbox pip install and langchain separator parsing#588

fix(security): validate untrusted input in sandbox pip install and langchain separator parsing#588
charliegillet wants to merge 3 commits into
rocketride-org:developfrom
charliegillet:fix/security-input-validation-split

charliegillet commented Apr 1, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 1, 2026 •

edited

Loading

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

charliegillet commented Apr 3, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

asclearuc commented Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

charliegillet commented Apr 1, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why this bug happens in this codebase

Validation

Type

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

charliegillet commented Apr 3, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

asclearuc commented Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

charliegillet commented Apr 1, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 1, 2026 •

edited

Loading