Skip to content

fix: cloud mode path validation and sanitization (bmc-issue-103)#332

Merged
phernandez merged 2 commits intomainfrom
bmc-issue-103
Oct 5, 2025
Merged

fix: cloud mode path validation and sanitization (bmc-issue-103)#332
phernandez merged 2 commits intomainfrom
bmc-issue-103

Conversation

@phernandez
Copy link
Copy Markdown
Member

Summary

Fixes https://github.com/basicmachines-co/basic-memory-cloud/issues/103

Implements comprehensive path validation and sanitization for cloud mode to prevent projects from being created outside the /app/data/ mount in tenant containers.

Problem

Cloud tenants could create projects with invalid paths outside /app/data/ mount, causing silent failures during sync operations. Users could inadvertently specify paths like:

  • ~/Documents/test → resolved to /app/Documents/test (not persistent)
  • /tmp/test → outside cloud storage
  • ../../../etc/passwd → path traversal attempt

Root Causes

  1. Cloud mode detection code existed but wasn't executing properly
  2. No path sanitization to prevent escape attempts
  3. Paths weren't validated against cloud storage boundaries

Changes

Enhanced Path Validation (project_service.py)

  • Use config_manager.config.cloud_mode_enabled for proper cloud mode detection
  • Sanitize input paths by stripping dangerous patterns:
    • Leading slashes (/)
    • Home directory references (~/)
    • Parent directory traversal (../)
  • Construct all paths relative to BASIC_MEMORY_HOME
  • Validate resolved paths stay within cloud storage boundary
  • Raise clear ValueError for invalid paths with helpful messages

API Layer (project_router.py)

  • Added comment documenting that service layer handles cloud mode validation

Test Coverage (test_project_service.py)

  • test_add_project_cloud_mode_sanitizes_paths: Verify sanitization works correctly
  • test_add_project_cloud_mode_rejects_escape_attempts: Validate security against path traversal
  • test_add_project_local_mode_allows_arbitrary_paths: Ensure backward compatibility

Behavior After Fix

Cloud mode paths constrained to /app/data/basic-memory/:

  • Input: "test"/app/data/basic-memory/test
  • Input: "~/Documents/test"/app/data/basic-memory/Documents/test
  • Input: "/tmp/test"/app/data/basic-memory/tmp/test
  • Input: "../../../etc/passwd"/app/data/basic-memory (sanitized)

Local mode continues to work with arbitrary paths
All 26 project service tests pass

Test Plan

  • Run project service tests: uv run pytest tests/services/test_project_service.py -v
  • Verify cloud mode path sanitization works correctly
  • Verify local mode backward compatibility
  • Code formatted and linted

Deployment Notes

This change requires:

  1. Deployment of updated basic-memory package
  2. Tenant container restart (handled by basic-memory-cloud deployment)
  3. Companion change in basic-memory-cloud to set BASIC_MEMORY_CLOUD_MODE=true environment variable

🤖 Generated with Claude Code

Fixes basicmachines-co/basic-memory-cloud#103

## Problem
Cloud tenants could create projects with invalid paths outside /app/data/
mount, causing silent failures during sync operations.

## Root Causes
1. Missing BASIC_MEMORY_CLOUD_MODE environment variable in tenant containers
2. Cloud mode path validation existed but wasn't executing
3. No path sanitization to prevent escape attempts

## Changes

### Enhanced Path Validation (project_service.py)
- Use config_manager.config.cloud_mode_enabled for proper detection
- Sanitize input paths by stripping dangerous patterns:
  - Leading slashes (/)
  - Home directory references (~/)
  - Parent directory traversal (../)
- Construct all paths relative to BASIC_MEMORY_HOME
- Validate resolved paths stay within cloud storage boundary
- Raise clear ValueError for invalid paths

### API Layer Comment (project_router.py)
- Document that service layer handles cloud mode validation

### Test Coverage (test_project_service.py)
- test_add_project_cloud_mode_sanitizes_paths: Verify sanitization works
- test_add_project_cloud_mode_rejects_escape_attempts: Security validation
- test_add_project_local_mode_allows_arbitrary_paths: Backward compatibility

## Behavior After Fix

✅ Cloud mode paths constrained to /app/data/basic-memory/:
- Input: "test" → /app/data/basic-memory/test
- Input: "~/Documents/test" → /app/data/basic-memory/Documents/test
- Input: "/tmp/test" → /app/data/basic-memory/tmp/test
- Input: "../../../etc/passwd" → /app/data/basic-memory (sanitized)

✅ All 26 project service tests pass
✅ Local mode continues to work with arbitrary paths

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Oct 5, 2025

Claude finished @phernandez's task —— View job


PR Review: Cloud Mode Path Validation

I've completed a comprehensive review of this PR against the Basic Memory team checklist. Here's my detailed analysis:


✅ Code Quality & Standards

Follows Basic Memory's coding conventions ✅

  • Python 3.12+ type annotations: All code properly typed
  • Line length: All lines under 100 characters ✅
  • Async patterns: Correctly uses async/await with SQLAlchemy 2.0 ✅
  • Import order: Correct (standard lib → third-party → local) ✅
  • No commented-out code: Clean ✅

Minor Issue: Inconsistent cloud mode property usage ⚠️

In project_service.py:103, the code uses:

if self.config_manager.config.cloud_mode_enabled:

However, in src/basic_memory/mcp/project_context.py:25,42, the codebase still uses the old pattern:

if config.cloud_mode:

Recommendation: While this PR correctly uses cloud_mode_enabled (the property that checks both env var and config), there's inconsistency in the codebase. Consider a follow-up PR to standardize all cloud mode checks to use cloud_mode_enabled for consistency.


✅ Testing & Documentation

Excellent test coverage ✅

  • test_add_project_cloud_mode_sanitizes_paths: Comprehensive test with 6 test cases covering various path patterns
  • test_add_project_cloud_mode_rejects_escape_attempts: Security-focused test for path traversal attacks
  • test_add_project_local_mode_allows_arbitrary_paths: Ensures backward compatibility
  • Edge cases covered: Home directory (~/), absolute paths (/tmp), traversal (../), nested paths

Test quality observations:

  1. ✅ Uses monkeypatch correctly to set environment variables
  2. ✅ Tests verify paths stay within cloud storage boundary
  3. ✅ Proper cleanup in finally blocks
  4. ✅ Good descriptive test names and comments

Documentation ✅

  • PR description: Excellent summary with problem, root causes, changes, and deployment notes
  • Code comments: Clear comment added in project_router.py:197 explaining service layer handles validation
  • Inline comments: Good explanations of sanitization logic in project_service.py:108-117

✅ Basic Memory Architecture

Service layer separation ✅

The fix correctly places validation logic in the service layer (ProjectService.add_project), not the API layer. This follows the repository pattern where:

  • API layer (project_router.py) delegates to service
  • Service layer (project_service.py) handles business logic
  • Repository layer handles data access

Local-first architecture preserved ✅

  • Local mode continues to work with arbitrary paths (line 128 in project_service.py)
  • Cloud mode constraints only apply when cloud_mode_enabled=True
  • No changes to file system as source of truth

Path sanitization approach ✅

The sanitization logic (lines 108-120) is well-designed:

  1. Strips leading slashes
  2. Removes home directory references (~/)
  3. Filters out parent traversal (..) and current directory (.)
  4. Constructs path relative to BASIC_MEMORY_HOME
  5. Validates resolved path stays within boundary

✅ Security & Performance

Security improvements ✅

This PR significantly improves security for cloud mode:

Before: Users could specify paths outside /app/data/, causing:

  • Silent sync failures
  • Non-persistent storage usage
  • Potential data loss

After: All paths sanitized and validated to stay within cloud storage boundary

Path traversal protection ✅

The sanitization correctly handles attack vectors:

  • ../../../etc/passwd → sanitized to /app/data/basic-memory
  • ~/Documents/test → sanitized to /app/data/basic-memory/Documents/test
  • /tmp/test → sanitized to /app/data/basic-memory/tmp/test

Performance considerations ✅

  • Path sanitization is O(n) where n = path segments (minimal overhead)
  • No database queries added to critical path
  • resolve() called once per project creation (acceptable)

Potential security enhancement 🔍

The current code uses:

assert basic_memory_home is not None

Recommendation: Consider changing to explicit error handling for production safety:

if basic_memory_home is None:
    raise ValueError("BASIC_MEMORY_HOME must be set in cloud mode")

Assertions can be disabled with python -O, which would skip this critical check.


🔍 Additional Observations

1. Import cleanup in test file ✅

The PR removes unused imports in test_disable_permalinks_integration.py:

-from pathlib import Path
-from textwrap import dedent
-from basic_memory.models import Project

Good housekeeping!

2. Config property usage ✅

The fix correctly uses self.config_manager.config.cloud_mode_enabled which properly checks both:

  1. BASIC_MEMORY_CLOUD_MODE environment variable (priority)
  2. Config file cloud_mode value (fallback)

This is important because cloud containers will set the env var, but local development won't.

3. Test setup pattern 📝

The tests use:

from basic_memory.services import project_service as ps_module
monkeypatch.setattr(ps_module, "config", config_manager.load_config())

This is a valid approach, but note that config is defined at module level (line 24 in project_service.py). The service uses self.config_manager.config which creates a fresh instance, so this monkeypatch might not be strictly necessary for these tests. However, it doesn't hurt and ensures consistency.


✅ Deployment Readiness

The PR includes clear deployment notes:

  1. Deploy updated basic-memory package ✅
  2. Restart tenant containers ✅
  3. Companion change needed in basic-memory-cloud ✅

Note: The companion change to set BASIC_MEMORY_CLOUD_MODE=true is critical. Without it, cloud mode won't activate.


Summary

This is a high-quality security fix that addresses a real issue in cloud deployments. The implementation is clean, well-tested, and follows Basic Memory's architecture patterns.

Checklist Results:

  • ✅ Code Quality & Standards: Pass (with minor consistency note)
  • ✅ Testing & Documentation: Excellent
  • ✅ Basic Memory Architecture: Pass
  • ✅ Security & Performance: Pass (with one enhancement suggestion)

Recommendations:

  1. Optional: Change assert basic_memory_home is not None to explicit error handling (line 105)
  2. Future: Consider standardizing all config.cloud_mode references to use config.cloud_mode_enabled property

Approval Status:

Recommend approval - This PR is ready to merge. The suggestions above are minor enhancements that can be addressed in follow-up work if desired.


Cloud mode only runs on POSIX systems (Linux containers in Fly.io),
so skip these tests on Windows to avoid platform-specific path issues.

Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Oct 5, 2025

Claude finished @phernandez's task —— View job


PR Review: Cloud Mode Path Validation ✅

I've completed a thorough review of this PR against the Basic Memory team checklist. This is a well-implemented security fix that properly addresses cloud mode path validation issues.


✅ Code Quality & Standards

Follows Basic Memory's coding conventions ✅

  • Python 3.12+ type annotations: All functions properly typed
  • Line length: All lines under 100 characters ✅
  • Async patterns: Correctly uses async/await with SQLAlchemy 2.0 ✅
  • Import order: Correct (standard lib → third-party → local) ✅
  • No commented-out code: Clean ✅

Key Implementation Details ✅

Cloud mode detection (project_service.py:103):

if self.config_manager.config.cloud_mode_enabled:

This correctly uses the cloud_mode_enabled property which checks:

  1. BASIC_MEMORY_CLOUD_MODE env var (priority)
  2. Config file cloud_mode value (fallback)

Path sanitization logic (lines 108-120):

  • Strips leading slashes: /tmp/testtmp/test
  • Removes home directory: ~/docsdocs
  • Filters parent traversal: ../../../etcetc
  • Validates final path stays within boundary

✅ Testing & Documentation

Excellent test coverage ✅

Three comprehensive test cases added:

  1. test_add_project_cloud_mode_sanitizes_paths (lines 719-785)

    • 6 test cases covering various path patterns
    • Tests simple relative, home directory, absolute, traversal, nested, and mixed paths
    • Verifies all sanitized paths stay within cloud storage boundary
  2. test_add_project_cloud_mode_rejects_escape_attempts (lines 787-836)

    • Security-focused test for path traversal attacks
    • Tests: ../../../etc/passwd, ../../.env, ../../../home/user/.ssh/id_rsa
    • Verifies sanitization prevents escaping cloud storage
  3. test_add_project_local_mode_allows_arbitrary_paths (lines 838-871)

    • Ensures backward compatibility
    • Verifies local mode still accepts arbitrary absolute paths
    • Important regression test

Test quality observations ✅

  • Uses monkeypatch correctly to set environment variables
  • Tests verify paths stay within cloud storage boundary
  • Good use of @pytest.mark.skipif for Windows (cloud mode only runs on POSIX)
  • Proper cleanup in finally blocks

Documentation ✅

  • PR description: Excellent problem/solution summary with deployment notes
  • Code comments: Clear comment added in project_router.py:197 explaining service layer handles validation
  • Inline comments: Good explanations of sanitization logic (lines 108-117)

✅ Basic Memory Architecture

Service layer separation ✅

The fix correctly places validation logic in the service layer (ProjectService.add_project), not the API layer. This follows the repository pattern where:

  • API layer (project_router.py) delegates to service
  • Service layer (project_service.py) handles business logic
  • Repository layer handles data access

Local-first architecture preserved ✅

  • Local mode continues to work with arbitrary paths (line 128)
  • Cloud mode constraints only apply when cloud_mode_enabled=True
  • No changes to file system as source of truth

✅ Security & Performance

Security improvements ✅

Before this PR: Users could specify paths outside /app/data/, causing:

  • Silent sync failures
  • Non-persistent storage usage
  • Potential data loss

After this PR: All paths sanitized and validated to stay within cloud storage boundary

Path traversal protection ✅

The sanitization correctly handles attack vectors:

  • ../../../etc/passwd → sanitized to /app/data/basic-memory
  • ~/Documents/test → sanitized to /app/data/basic-memory/Documents/test
  • /tmp/test → sanitized to /app/data/basic-memory/tmp/test

Performance considerations ✅

  • Path sanitization is O(n) where n = path segments (minimal overhead)
  • No database queries added to critical path
  • resolve() called once per project creation (acceptable)

Security note: Assert vs Exception 🔍

Line 105 uses an assertion:

assert basic_memory_home is not None

Recommendation: Consider explicit error handling for production safety:

if basic_memory_home is None:
    raise ValueError("BASIC_MEMORY_HOME must be set in cloud mode")

Assertions can be disabled with python -O, which would skip this critical check. However, given this is defensive code and cloud containers always set BASIC_MEMORY_HOME, the current approach is acceptable.


🔍 Additional Observations

1. Import cleanup ✅

The PR removes unused imports in test_disable_permalinks_integration.py:

-from pathlib import Path
-from textwrap import dedent
-from basic_memory.models import Project

Good housekeeping!

2. Windows compatibility ✅

Tests properly skip on Windows with @pytest.mark.skipif(os.name == "nt", reason="Cloud mode only runs on POSIX systems") - ensures cross-platform test reliability.

3. Clean sanitization approach ✅

The path sanitization logic is straightforward and auditable:

  1. Strip dangerous prefixes (/, ~/)
  2. Split path into parts
  3. Filter out ., .., and empty parts
  4. Join remaining parts
  5. Validate within boundary

This approach is more secure than trying to block specific patterns.


Summary

This is a high-quality security fix that addresses a real vulnerability in cloud deployments. The implementation is clean, well-tested, and follows Basic Memory's architecture patterns.

Checklist Results:

  • Code Quality & Standards: Pass
  • Testing & Documentation: Excellent
  • Basic Memory Architecture: Pass
  • Security & Performance: Pass (with minor note)

Approval Status:

✅ Recommend approval - This PR is ready to merge.

Optional Enhancement:

  • Consider changing assert basic_memory_home is not None to explicit error handling (line 105) for production safety

@phernandez phernandez merged commit 7616b2b into main Oct 5, 2025
16 checks passed
@phernandez phernandez deleted the bmc-issue-103 branch October 5, 2025 03:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant