fix(security): validate untrusted input in sandbox pip install and langchain separator parsing#588
Conversation
…ngchain separator parsing Split from rocketride-org#581 per @asclearuc's review. - Add regex validation for pip package names to reject path traversal, URLs, and shell metacharacters before passing to subprocess - Add threading lock around pip install to prevent concurrent conflicts - Add input length validation for ast.literal_eval in langchain preprocessor Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 4 minutes and 16 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdded input validation and safety controls: the langchain preprocessor enforces a 1024-character maximum for separator parsing, and the sandbox pip helper validates package names, restricts install subprocess stderr, and serializes pip installs with a module-level lock. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/preprocessor_langchain/langchain.py`:
- Around line 77-79: The constant _MAX_SEPARATOR_INPUT_LEN is defined inside
_parseSeparators, causing it to be recreated on each call and inconsistent with
other class-level constants; move _MAX_SEPARATOR_INPUT_LEN = 1024 up to the
class scope alongside the other constants (use the same private naming
convention) and update _parseSeparators to reference the class-level constant
(via self._MAX_SEPARATOR_INPUT_LEN or ClassName._MAX_SEPARATOR_INPUT_LEN) so the
value is reused and kept with the class configuration.
In `@packages/ai/src/ai/common/sandbox.py`:
- Around line 316-322: The pip install call inside the _pip_lock block uses
subprocess.check_call with stderr=subprocess.PIPE which can block if not read;
change it to redirect stderr to subprocess.DEVNULL (matching stdout) or use
subprocess.run(..., capture_output=True, check=True) if you want to capture
errors. Locate the call that invokes subprocess.check_call([sys.executable,
'-m', 'pip', 'install', '--quiet', package], stdout=subprocess.DEVNULL,
stderr=subprocess.PIPE, timeout=60) and replace stderr=subprocess.PIPE with
stderr=subprocess.DEVNULL (or refactor to subprocess.run with
capture_output=True and check=True and handle the result) to avoid pipe buffer
deadlocks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: ec7be14e-d215-487d-907d-76bd927c0222
📒 Files selected for processing (2)
nodes/src/nodes/preprocessor_langchain/langchain.pypackages/ai/src/ai/common/sandbox.py
…ketride-org#588 - Move _MAX_SEPARATOR_INPUT_LEN to class-level constant for consistency - Use subprocess.DEVNULL for stderr in pip install to prevent pipe buffer deadlocks Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Addressed CodeRabbit feedback:
Thanks for the review! |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/preprocessor_langchain/langchain.py`:
- Around line 78-79: The length check on user_input can raise TypeError for
non-string truthy values; add an explicit type guard before the len(...) call:
validate that user_input is an instance of str (or coerce to str if intended)
and if not raise a ValueError with a clear message, so the code path using
self._MAX_SEPARATOR_INPUT_LEN always fails deterministically for invalid
separator input; update the check around user_input and
self._MAX_SEPARATOR_INPUT_LEN accordingly (e.g., verify isinstance(user_input,
str) then perform the len check and raise ValueError on failure).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6855d57b-10ab-4226-9722-6a7fedb1dbf0
📒 Files selected for processing (2)
nodes/src/nodes/preprocessor_langchain/langchain.pypackages/ai/src/ai/common/sandbox.py
|
Closing this in favour of keeping the status quo.
Thanks for the work, @charliegillet — the intent is right, the execution just introduces more problems than it solves. |
Split from #581 per @asclearuc's review.
Summary
threading.Lockaround pip install to prevent concurrent conflicts._MAX_SEPARATOR_INPUT_LEN = 1024) beforeast.literal_evalin the langchain preprocessor to prevent resource exhaustion from oversized separator input.Why this bug happens in this codebase
sandbox.pypasses user-provided package names directly topip installvia subprocess without validation. A malicious or malformed package name containing path traversal sequences (../), URLs, or shell metacharacters could be passed through to pip, which would attempt to install from arbitrary sources. The regex allowlist ensures only valid PyPI package name characters are accepted.langchain.pycallsast.literal_evalon a user-provided separator string. Whileast.literal_evalis safe against code execution, it still needs to parse the input — an extremely long string could cause unnecessary resource consumption. A 1024-character limit is generous for separator values (which are typically 1-10 characters) while preventing abuse.Validation
ruff checkandruff formatpass clean^[a-zA-Z0-9]([a-zA-Z0-9._-]*[a-zA-Z0-9])?$) matches the PyPI naming conventionType
#Hack-with-bay-2
🤖 Generated with Claude Code
Summary by CodeRabbit