fix: prevent path traversal in backup importer (CWE-22)#7681
fix: prevent path traversal in backup importer (CWE-22)#7681sebastiondev wants to merge 4 commits intoAstrBotDevs:masterfrom
Conversation
Validate that all file write targets resolve within their expected base directories before writing. This prevents crafted backup ZIP files from writing to arbitrary filesystem locations via malicious path values in attachment records, media file paths, or directory entries.
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- In
_validate_path_within, relying onstr(resolved).startswith(str(base_resolved) + os.sep)is a bit brittle (case sensitivity, path normalization) — consider usingPath.is_relative_to(Python 3.9+) oros.path.commonpath/PurePath.relative_tofor more robust containment checks. - The equality check
resolved == base_resolvedin_validate_path_withinappears redundant given thestartswithcondition and the fact thattarget_pathshould normally be a file path; you might simplify this or add a short comment explaining when equality is expected to occur.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `_validate_path_within`, relying on `str(resolved).startswith(str(base_resolved) + os.sep)` is a bit brittle (case sensitivity, path normalization) — consider using `Path.is_relative_to` (Python 3.9+) or `os.path.commonpath`/`PurePath.relative_to` for more robust containment checks.
- The equality check `resolved == base_resolved` in `_validate_path_within` appears redundant given the `startswith` condition and the fact that `target_path` should normally be a file path; you might simplify this or add a short comment explaining when equality is expected to occur.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Code Review
This pull request implements path traversal protection (CWE-22) by adding a _validate_path_within helper and applying it to knowledge base, attachment, and directory imports. The review feedback recommends using pathlib.Path.relative_to for a more robust and idiomatic implementation compared to string manipulation, while also suggesting performance optimizations for directory resolution.
| def _validate_path_within(target_path: Path, base_dir: Path) -> bool: | ||
| """Validate that target_path is within base_dir after resolving symlinks. | ||
|
|
||
| Prevents path traversal attacks (CWE-22) by ensuring the resolved | ||
| target path starts with the resolved base directory. | ||
| """ | ||
| try: | ||
| resolved = target_path.resolve() | ||
| base_resolved = base_dir.resolve() | ||
| return resolved == base_resolved or str(resolved).startswith(str(base_resolved) + os.sep) | ||
| except (OSError, ValueError): | ||
| return False |
There was a problem hiding this comment.
The current implementation of _validate_path_within uses string manipulation and os.sep to verify path boundaries. While functional, using pathlib.Path.relative_to is more idiomatic and robust. It correctly handles edge cases like the root directory (where str(base_resolved) + os.sep might result in // on Unix, causing startswith to fail) and is more readable.
Additionally, note that base_dir.resolve() is called every time this function is invoked. Since the base directory is constant within each import loop (e.g., kb_dir, attachments_dir), you could further optimize performance by resolving the base directory once in the caller and passing it to this validator.
| def _validate_path_within(target_path: Path, base_dir: Path) -> bool: | |
| """Validate that target_path is within base_dir after resolving symlinks. | |
| Prevents path traversal attacks (CWE-22) by ensuring the resolved | |
| target path starts with the resolved base directory. | |
| """ | |
| try: | |
| resolved = target_path.resolve() | |
| base_resolved = base_dir.resolve() | |
| return resolved == base_resolved or str(resolved).startswith(str(base_resolved) + os.sep) | |
| except (OSError, ValueError): | |
| return False | |
| def _validate_path_within(target_path: Path, base_dir: Path) -> bool: | |
| """Validate that target_path is within base_dir after resolving symlinks. | |
| Prevents path traversal attacks (CWE-22) by ensuring the resolved | |
| target path is a subpath of the resolved base directory. | |
| """ | |
| try: | |
| resolved = target_path.resolve() | |
| base_resolved = base_dir.resolve() | |
| resolved.relative_to(base_resolved) | |
| return True | |
| except (OSError, ValueError): | |
| return False |
|
standard library pathlib Path.is_relative_to() https://pytutorial.com/python-pathlibis_relative_to-explained-relative-paths/ |
|
Thanks for the review feedback @sourcery-ai and @gemini-code-assist! Both points are valid — I've updated
The function is now simpler and more idiomatic: return resolved.is_relative_to(base_resolved) |
|
Thank you for the update, @sebastiondev. Using |
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 1 issue
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="astrbot/core/backup/importer.py" line_range="64-73" />
<code_context>
+
+
+def _validate_path_within(target_path: Path, base_dir: Path) -> bool:
+ """Validate that target_path is within base_dir after resolving symlinks.
+
+ Prevents path traversal attacks (CWE-22) by ensuring the resolved
+ target path is relative to the resolved base directory.
+ """
+ try:
+ resolved = target_path.resolve()
+ base_resolved = base_dir.resolve()
+ return resolved.is_relative_to(base_resolved)
+ except (OSError, ValueError):
+ return False
</code_context>
<issue_to_address>
**issue (bug_risk):** Consider Python version compatibility and explicit `strict` behavior in `_validate_path_within`.
`Path.is_relative_to` was added in Python 3.9; if we still support 3.8 or earlier, this will raise at runtime. Also, `Path.resolve()` currently relies on the implicit `strict=False` default—please pass `strict=False` explicitly so the behavior is clear and stable if the default ever changes.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| def _validate_path_within(target_path: Path, base_dir: Path) -> bool: | ||
| """Validate that target_path is within base_dir after resolving symlinks. | ||
|
|
||
| Prevents path traversal attacks (CWE-22) by ensuring the resolved | ||
| target path is relative to the resolved base directory. | ||
| """ | ||
| try: | ||
| resolved = target_path.resolve() | ||
| base_resolved = base_dir.resolve() | ||
| return resolved.is_relative_to(base_resolved) |
There was a problem hiding this comment.
issue (bug_risk): Consider Python version compatibility and explicit strict behavior in _validate_path_within.
Path.is_relative_to was added in Python 3.9; if we still support 3.8 or earlier, this will raise at runtime. Also, Path.resolve() currently relies on the implicit strict=False default—please pass strict=False explicitly so the behavior is clear and stable if the default ever changes.
|
Thanks for the follow-up review @sourcery-ai! Re: explicit Re: Python version compatibility — AstrBot requires Python 3.10+ (per |
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The
_validate_path_withinhelper relies onPath.is_relative_to, which is only available from Python 3.9 onward; if this project still targets older Python versions you may want to reimplement this check usingPath.relative_toin a try/except instead. - For consistency of behavior and observability, consider aligning how out-of-bounds paths are reported across the three call sites (logger vs
result.add_warning) so callers can handle these cases uniformly.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `_validate_path_within` helper relies on `Path.is_relative_to`, which is only available from Python 3.9 onward; if this project still targets older Python versions you may want to reimplement this check using `Path.relative_to` in a try/except instead.
- For consistency of behavior and observability, consider aligning how out-of-bounds paths are reported across the three call sites (logger vs `result.add_warning`) so callers can handle these cases uniformly.
## Individual Comments
### Comment 1
<location path="astrbot/core/backup/importer.py" line_range="68-72" />
<code_context>
+ Prevents path traversal attacks (CWE-22) by ensuring the resolved
+ target path is relative to the resolved base directory.
+ """
+ try:
+ resolved = target_path.resolve(strict=False)
+ base_resolved = base_dir.resolve(strict=False)
+ return resolved.is_relative_to(base_resolved)
+ except (OSError, ValueError):
+ return False
+
</code_context>
<issue_to_address>
**🚨 suggestion (security):** Broaden exception handling here to cover potential symlink loop errors
`Path.resolve` can raise `RuntimeError` on symlink loops. Since this helper is meant to harden against malicious paths, consider adding `RuntimeError` to the caught exceptions so these cases return `False` instead of propagating and potentially aborting the import.
Suggested implementation:
```python
```
```python
try:
resolved = target_path.resolve(strict=False)
base_resolved = base_dir.resolve(strict=False)
return resolved.is_relative_to(base_resolved)
except (OSError, ValueError, RuntimeError):
return False
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| try: | ||
| resolved = target_path.resolve(strict=False) | ||
| base_resolved = base_dir.resolve(strict=False) | ||
| return resolved.is_relative_to(base_resolved) | ||
| except (OSError, ValueError): |
There was a problem hiding this comment.
🚨 suggestion (security): Broaden exception handling here to cover potential symlink loop errors
Path.resolve can raise RuntimeError on symlink loops. Since this helper is meant to harden against malicious paths, consider adding RuntimeError to the caught exceptions so these cases return False instead of propagating and potentially aborting the import.
Suggested implementation:
try:
resolved = target_path.resolve(strict=False)
base_resolved = base_dir.resolve(strict=False)
return resolved.is_relative_to(base_resolved)
except (OSError, ValueError, RuntimeError):
return False
Vulnerability Summary
CWE-22: Path Traversal (Zip Slip)
Severity: High
File:
astrbot/core/backup/importer.pyDescription
The backup import functionality extracts files from user-uploaded zip archives without validating that entry names resolve to paths within the intended target directories. An attacker who crafts a zip file with entries containing path traversal sequences (e.g.,
../../etc/crontab) can write arbitrary files anywhere the process has write access.Data Flow
AstrBotImporterextracts zip entries usingzf.open(name)kb_dir,attachments_dir,target_dir) without validationThree extraction points are affected (lines ~780, ~827, ~904 in the original file).
Preconditions
Fix Description
Added a
_validate_path_within()helper that:Path.resolve()os.sepFalseon any OS/value errorsThis check is applied at all three zip extraction points. Entries that fail validation are skipped with a warning log.
Why this approach
Path.resolve()canonicalizes symlinks and..sequences, preventing bypass via symlinks or encoding tricks+ os.sepsuffix prevents prefix confusion (e.g.,/app/data2matching/app/data)Test Results
The fix was verified against:
../traversal entries (correctly rejected)Disprove Analysis
Verdict: CONFIRMED_VALID (high confidence)
We attempted to disprove this finding by examining:
The only barrier is JWT authentication, which does not mitigate the vulnerability for authenticated users. The finding is valid and the fix is recommended.
Summary by Sourcery
Prevent path traversal during backup import by validating extracted file paths against their target directories.
Bug Fixes: