Skip to content

fix: prevent path traversal in backup importer (CWE-22)#7681

Open
sebastiondev wants to merge 4 commits intoAstrBotDevs:masterfrom
sebastiondev:fix/cwe22-importer-backup-1ad9
Open

fix: prevent path traversal in backup importer (CWE-22)#7681
sebastiondev wants to merge 4 commits intoAstrBotDevs:masterfrom
sebastiondev:fix/cwe22-importer-backup-1ad9

Conversation

@sebastiondev
Copy link
Copy Markdown

@sebastiondev sebastiondev commented Apr 19, 2026

Vulnerability Summary

CWE-22: Path Traversal (Zip Slip)
Severity: High
File: astrbot/core/backup/importer.py

Description

The backup import functionality extracts files from user-uploaded zip archives without validating that entry names resolve to paths within the intended target directories. An attacker who crafts a zip file with entries containing path traversal sequences (e.g., ../../etc/crontab) can write arbitrary files anywhere the process has write access.

Data Flow

  1. Authenticated user uploads a backup zip via the dashboard API (JWT required)
  2. AstrBotImporter extracts zip entries using zf.open(name)
  3. Entry names are joined to base directories (kb_dir, attachments_dir, target_dir) without validation
  4. Files are written to the resulting path — which may be outside the intended directory

Three extraction points are affected (lines ~780, ~827, ~904 in the original file).

Preconditions

  • Attacker must have valid JWT authentication to the dashboard

Fix Description

Added a _validate_path_within() helper that:

  1. Resolves the target path and base directory to their canonical forms via Path.resolve()
  2. Verifies the resolved target starts with the resolved base directory + os.sep
  3. Returns False on any OS/value errors

This check is applied at all three zip extraction points. Entries that fail validation are skipped with a warning log.

Why this approach

  • Path.resolve() canonicalizes symlinks and .. sequences, preventing bypass via symlinks or encoding tricks
  • The + os.sep suffix prevents prefix confusion (e.g., /app/data2 matching /app/data)
  • Minimal change — only adds validation, no refactoring

Test Results

The fix was verified against:

  • Normal backup import (entries within target dirs pass validation)
  • Crafted zip with ../ traversal entries (correctly rejected)
  • Edge cases: symlink targets, entries matching directory prefix without separator

Disprove Analysis

Verdict: CONFIRMED_VALID (high confidence)

We attempted to disprove this finding by examining:

  • Preconditions: JWT authentication is required, but any authenticated dashboard user can trigger the import
  • Existing mitigations: No path validation exists in the current code at any extraction point
  • Exploitability: Confirmed feasible — standard zip libraries preserve crafted entry names

The only barrier is JWT authentication, which does not mitigate the vulnerability for authenticated users. The finding is valid and the fix is recommended.

Summary by Sourcery

Prevent path traversal during backup import by validating extracted file paths against their target directories.

Bug Fixes:

  • Add centralized path validation helper to ensure extracted backup files remain within their intended base directories, mitigating CWE-22 path traversal risk.
  • Apply path validation to knowledge base media, attachments, and directory imports, skipping and logging any out-of-bounds paths.

Validate that all file write targets resolve within their expected
base directories before writing. This prevents crafted backup ZIP
files from writing to arbitrary filesystem locations via malicious
path values in attachment records, media file paths, or directory
entries.
@auto-assign auto-assign bot requested review from LIghtJUNction and anka-afk April 19, 2026 20:09
@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. area:core The bug / feature is about astrbot's core, backend labels Apr 19, 2026
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • In _validate_path_within, relying on str(resolved).startswith(str(base_resolved) + os.sep) is a bit brittle (case sensitivity, path normalization) — consider using Path.is_relative_to (Python 3.9+) or os.path.commonpath/PurePath.relative_to for more robust containment checks.
  • The equality check resolved == base_resolved in _validate_path_within appears redundant given the startswith condition and the fact that target_path should normally be a file path; you might simplify this or add a short comment explaining when equality is expected to occur.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `_validate_path_within`, relying on `str(resolved).startswith(str(base_resolved) + os.sep)` is a bit brittle (case sensitivity, path normalization) — consider using `Path.is_relative_to` (Python 3.9+) or `os.path.commonpath`/`PurePath.relative_to` for more robust containment checks.
- The equality check `resolved == base_resolved` in `_validate_path_within` appears redundant given the `startswith` condition and the fact that `target_path` should normally be a file path; you might simplify this or add a short comment explaining when equality is expected to occur.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements path traversal protection (CWE-22) by adding a _validate_path_within helper and applying it to knowledge base, attachment, and directory imports. The review feedback recommends using pathlib.Path.relative_to for a more robust and idiomatic implementation compared to string manipulation, while also suggesting performance optimizations for directory resolution.

Comment on lines +64 to +75
def _validate_path_within(target_path: Path, base_dir: Path) -> bool:
"""Validate that target_path is within base_dir after resolving symlinks.

Prevents path traversal attacks (CWE-22) by ensuring the resolved
target path starts with the resolved base directory.
"""
try:
resolved = target_path.resolve()
base_resolved = base_dir.resolve()
return resolved == base_resolved or str(resolved).startswith(str(base_resolved) + os.sep)
except (OSError, ValueError):
return False
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The current implementation of _validate_path_within uses string manipulation and os.sep to verify path boundaries. While functional, using pathlib.Path.relative_to is more idiomatic and robust. It correctly handles edge cases like the root directory (where str(base_resolved) + os.sep might result in // on Unix, causing startswith to fail) and is more readable.

Additionally, note that base_dir.resolve() is called every time this function is invoked. Since the base directory is constant within each import loop (e.g., kb_dir, attachments_dir), you could further optimize performance by resolving the base directory once in the caller and passing it to this validator.

Suggested change
def _validate_path_within(target_path: Path, base_dir: Path) -> bool:
"""Validate that target_path is within base_dir after resolving symlinks.
Prevents path traversal attacks (CWE-22) by ensuring the resolved
target path starts with the resolved base directory.
"""
try:
resolved = target_path.resolve()
base_resolved = base_dir.resolve()
return resolved == base_resolved or str(resolved).startswith(str(base_resolved) + os.sep)
except (OSError, ValueError):
return False
def _validate_path_within(target_path: Path, base_dir: Path) -> bool:
"""Validate that target_path is within base_dir after resolving symlinks.
Prevents path traversal attacks (CWE-22) by ensuring the resolved
target path is a subpath of the resolved base directory.
"""
try:
resolved = target_path.resolve()
base_resolved = base_dir.resolve()
resolved.relative_to(base_resolved)
return True
except (OSError, ValueError):
return False

@LIghtJUNction
Copy link
Copy Markdown
Member

standard library pathlib

Path.is_relative_to()

https://pytutorial.com/python-pathlibis_relative_to-explained-relative-paths/

@sebastiondev
Copy link
Copy Markdown
Author

Thanks for the review feedback @sourcery-ai and @gemini-code-assist!

Both points are valid — I've updated _validate_path_within in commit 0f135d6:

  1. Replaced string manipulation with Path.is_relative_to() — this is more robust against case sensitivity and normalization edge cases, and is available since Python 3.9 which the project already requires.

  2. Removed the redundant equality checkis_relative_to() already returns True when the paths are equal, so the separate resolved == base_resolved check is no longer needed.

The function is now simpler and more idiomatic:

return resolved.is_relative_to(base_resolved)

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Thank you for the update, @sebastiondev. Using Path.is_relative_to() is indeed the cleaner, more idiomatic approach for this validation, and it correctly handles the path normalization requirements. The updated implementation looks solid.

@zouyonghe
Copy link
Copy Markdown
Member

@sourcery-ai review

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="astrbot/core/backup/importer.py" line_range="64-73" />
<code_context>

+
+
+def _validate_path_within(target_path: Path, base_dir: Path) -> bool:
+    """Validate that target_path is within base_dir after resolving symlinks.
+
+    Prevents path traversal attacks (CWE-22) by ensuring the resolved
+    target path is relative to the resolved base directory.
+    """
+    try:
+        resolved = target_path.resolve()
+        base_resolved = base_dir.resolve()
+        return resolved.is_relative_to(base_resolved)
+    except (OSError, ValueError):
+        return False
</code_context>
<issue_to_address>
**issue (bug_risk):** Consider Python version compatibility and explicit `strict` behavior in `_validate_path_within`.

`Path.is_relative_to` was added in Python 3.9; if we still support 3.8 or earlier, this will raise at runtime. Also, `Path.resolve()` currently relies on the implicit `strict=False` default—please pass `strict=False` explicitly so the behavior is clear and stable if the default ever changes.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +64 to +73
def _validate_path_within(target_path: Path, base_dir: Path) -> bool:
"""Validate that target_path is within base_dir after resolving symlinks.

Prevents path traversal attacks (CWE-22) by ensuring the resolved
target path is relative to the resolved base directory.
"""
try:
resolved = target_path.resolve()
base_resolved = base_dir.resolve()
return resolved.is_relative_to(base_resolved)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Consider Python version compatibility and explicit strict behavior in _validate_path_within.

Path.is_relative_to was added in Python 3.9; if we still support 3.8 or earlier, this will raise at runtime. Also, Path.resolve() currently relies on the implicit strict=False default—please pass strict=False explicitly so the behavior is clear and stable if the default ever changes.

@sebastiondev
Copy link
Copy Markdown
Author

Thanks for the follow-up review @sourcery-ai!

Re: explicit strict=False in Path.resolve() — Good catch. Added strict=False explicitly in commit 59884be so the intent is clear and future-proof.

Re: Python version compatibility — AstrBot requires Python 3.10+ (per pyproject.toml), so Path.is_relative_to() (added in 3.9) is safe to use here. No compatibility concern.

@zouyonghe
Copy link
Copy Markdown
Member

@sourcery-ai review

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The _validate_path_within helper relies on Path.is_relative_to, which is only available from Python 3.9 onward; if this project still targets older Python versions you may want to reimplement this check using Path.relative_to in a try/except instead.
  • For consistency of behavior and observability, consider aligning how out-of-bounds paths are reported across the three call sites (logger vs result.add_warning) so callers can handle these cases uniformly.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `_validate_path_within` helper relies on `Path.is_relative_to`, which is only available from Python 3.9 onward; if this project still targets older Python versions you may want to reimplement this check using `Path.relative_to` in a try/except instead.
- For consistency of behavior and observability, consider aligning how out-of-bounds paths are reported across the three call sites (logger vs `result.add_warning`) so callers can handle these cases uniformly.

## Individual Comments

### Comment 1
<location path="astrbot/core/backup/importer.py" line_range="68-72" />
<code_context>
+    Prevents path traversal attacks (CWE-22) by ensuring the resolved
+    target path is relative to the resolved base directory.
+    """
+    try:
+        resolved = target_path.resolve(strict=False)
+        base_resolved = base_dir.resolve(strict=False)
+        return resolved.is_relative_to(base_resolved)
+    except (OSError, ValueError):
+        return False
+
</code_context>
<issue_to_address>
**🚨 suggestion (security):** Broaden exception handling here to cover potential symlink loop errors

`Path.resolve` can raise `RuntimeError` on symlink loops. Since this helper is meant to harden against malicious paths, consider adding `RuntimeError` to the caught exceptions so these cases return `False` instead of propagating and potentially aborting the import.

Suggested implementation:

```python

```

```python
    try:
        resolved = target_path.resolve(strict=False)
        base_resolved = base_dir.resolve(strict=False)
        return resolved.is_relative_to(base_resolved)
    except (OSError, ValueError, RuntimeError):
        return False

```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +68 to +72
try:
resolved = target_path.resolve(strict=False)
base_resolved = base_dir.resolve(strict=False)
return resolved.is_relative_to(base_resolved)
except (OSError, ValueError):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚨 suggestion (security): Broaden exception handling here to cover potential symlink loop errors

Path.resolve can raise RuntimeError on symlink loops. Since this helper is meant to harden against malicious paths, consider adding RuntimeError to the caught exceptions so these cases return False instead of propagating and potentially aborting the import.

Suggested implementation:

    try:
        resolved = target_path.resolve(strict=False)
        base_resolved = base_dir.resolve(strict=False)
        return resolved.is_relative_to(base_resolved)
    except (OSError, ValueError, RuntimeError):
        return False

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:core The bug / feature is about astrbot's core, backend size:S This PR changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants