Skip to content

fix: sanitize user_id to prevent path traversal in Flask web app (CWE-22)#64

Open
sebastiondev wants to merge 1 commit intoBAI-LAB:mainfrom
sebastiondev:fix/cwe22-app-flask-d5e2
Open

fix: sanitize user_id to prevent path traversal in Flask web app (CWE-22)#64
sebastiondev wants to merge 1 commit intoBAI-LAB:mainfrom
sebastiondev:fix/cwe22-app-flask-d5e2

Conversation

@sebastiondev
Copy link
Copy Markdown

Vulnerability Summary

CWE-22: Improper Limitation of a Pathname to a Restricted Directory (Path Traversal)
Severity: High — unauthenticated arbitrary directory deletion via shutil.rmtree

Data Flow

  1. An unauthenticated HTTP client sends a POST to /init_memory with a JSON body containing user_id.
  2. user_id is read from request.json with zero validation (only an empty-string check exists).
  3. The value flows into Memoryos(user_id=...)os.path.join(data_storage_path, "users", user_id) → stored as self.user_data_dir.
  4. On /clear_memory, the stored memory_system.user_data_dir is passed directly to shutil.rmtree().
  5. A crafted user_id such as ../../important_data causes shutil.rmtree to delete arbitrary directories on the filesystem.

Preconditions

  • Network access to port 5019 — the app binds to 0.0.0.0 (all interfaces)
  • No authentication required — no login_required, no API keys, no Bearer tokens
  • The process has filesystem write permissions to the target path
  • Memoryos.__init__ completes with fake credentials (no API calls happen during construction)

Exploit Sketch

# Step 1: Initialize with traversal payload (no auth needed)
curl -X POST http://target:5019/init_memory \
  -H "Content-Type: application/json" \
  -d '{"user_id":"../../important_data","api_key":"fake","base_url":"http://localhost","model_name":"gpt-4o-mini"}'

# Step 2: Delete arbitrary directory via shutil.rmtree
curl -X POST http://target:5019/clear_memory \
  -H "Content-Type: application/json" \
  -b "<session_cookie_from_step_1>"
# shutil.rmtree("./data/users/../../important_data") → deletes ./important_data/

Fix Description & Rationale

Changes (2 files, minimal diff)

File Change
memoryos-playground/memdemo/app.py Add _is_safe_id() validator + apply it at /init_memory input boundary
tests/test_cwe22_app_flask.py New — comprehensive test coverage for both attack and safe payloads

Defense Strategy

  1. Primary defense — strict allowlist regex on user_id at the only input point (/init_memory):

    • Must start with an alphanumeric character
    • May contain A-Za-z0-9, ., _, - only
    • Rejects /, \, .., null bytes, and empty strings
    • assistant_id is derived as f"assistant_{user_id}" so it is safe by construction after validation
  2. Defense-in-depth checks within _is_safe_id():

    • Explicitly rejects . and .. path components
    • Rejects any value containing /, \, or .. as a belt-and-suspenders guard
    • Rejects null bytes (\x00)

Why this approach

  • The fix is applied at the single input boundary where user_id enters the system
  • An allowlist approach is safer than a denylist — only known-safe characters are permitted
  • No changes to the Memoryos library itself are needed for the web-facing fix (though library-level validation would be a good follow-up for defense in depth)

Test Results

All tests pass. Run via python3 tests/test_cwe22_app_flask.py:

--- Testing path traversal payloads are rejected ---
  PASS rejected: '../etc/cron.d' -> 400
  PASS rejected: '..\\windows\\system32' -> 400
  PASS rejected: 'foo/../../etc/passwd' -> 400
  PASS rejected: 'foo/../../../tmp/evil' -> 400
  PASS rejected: '....//....//etc' -> 400
  PASS rejected: '/absolute/path' -> 400
  PASS rejected: '.' -> 400
  PASS rejected: '..' -> 400

--- Testing safe payloads are accepted ---
  PASS accepted: 'alice' -> 200
  PASS accepted: 'bob_123' -> 200
  PASS accepted: 'user-2024' -> 200
  PASS accepted: 'CamelCaseUser' -> 200

All tests passed!

Attack payloads tested: ../etc/cron.d, ..\\windows\\system32, foo/../../etc/passwd, foo/../../../tmp/evil, ....//....//etc, /absolute/path, ., ..

Safe payloads verified (no false positives): alice, bob_123, user-2024, CamelCaseUser


Disprove Analysis

We attempted to disprove this finding through multiple angles:

Check Result
Authentication None. No login_required, no API keys, no Bearer tokens. Endpoints are fully unauthenticated.
Network exposure App binds to 0.0.0.0:5019. No CORS restrictions, no reverse proxy config found.
Deployment context Dockerfile installs deps only. No docker-compose, nginx, or restrictive infrastructure evidence.
Input validation Zero validation on user_id in original code — only an empty-string check.
Caller trace request.jsondata.get("user_id")Memoryos(user_id=...)os.path.join(...)shutil.rmtree(). No sanitization anywhere in the chain.
Prior reports No prior security issues in GitHub issue tracker.
Security policy No SECURITY.md exists.
Fix bypass attempts The regex allowlist is strict; no bypass found. Defense-in-depth checks add redundancy.

Verdict: CONFIRMED_VALID — High confidence


Additional Notes

The same vulnerable os.path.join(data_storage_path, "users", user_id) pattern exists in the library modules:

  • memoryos-pypi/memoryos.py:71
  • memoryos-chromadb/storage_provider.py:35
  • memoryos-mcp/memoryos/memoryos.py:71

These are not directly web-exposed, but any application passing untrusted input as user_id to the Memoryos class would be vulnerable at the library level. Adding validation in the library constructor would be a valuable follow-up.

This PR focuses on the immediately exploitable web-facing endpoint, which is the highest priority.

Thank you for your time reviewing this. Happy to address any feedback.

…-22)

Add _is_safe_id() validation to the /init_memory endpoint to reject
user_id values containing path separators, ".." components, null bytes,
or other characters that could escape the intended data directory.

Without this check, an attacker can POST user_id="../../etc/cron.d" to
create arbitrary directories (via os.makedirs in the Memoryos constructor)
or delete arbitrary directory trees (via shutil.rmtree in /clear_memory).

The fix uses an allowlist regex (alphanumeric, hyphens, underscores, dots)
applied at the HTTP boundary before the value reaches any filesystem operation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant