jcodemunch-mcp indexes source code from local folders and GitHub repositories. This document describes the security controls that protect against common risks when handling arbitrary codebases.
All user-supplied paths are validated before any file is read or written.
- `validate_path(root, target)` resolves both paths to absolute form and verifies the target is a descendant of `root` using `os.path.commonpath()`.
- Applied during file discovery and again before each file read (defense in depth).
- Paths such as `../../etc/passwd` or absolute paths outside the repository root are rejected.

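
The containment check described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the boolean return value, the handling of relative targets, and the `ValueError` fallback are assumptions.

```python
import os


def validate_path(root: str, target: str) -> bool:
    """Return True only if target resolves to a location inside root.

    Sketch only: the real function's return type and error handling
    are not documented here and may differ.
    """
    root_abs = os.path.abspath(root)
    # Assumption: relative targets are interpreted relative to the root.
    # os.path.join discards root_abs when target is already absolute.
    target_abs = os.path.abspath(os.path.join(root_abs, target))
    try:
        # commonpath compares whole path components, so "/srv/repo-evil"
        # is not mistaken for a child of "/srv/repo".
        return os.path.commonpath([root_abs, target_abs]) == root_abs
    except ValueError:
        # Raised when the paths cannot be compared, for example when they
        # live on different Windows drives.
        return False
```

Comparing with `os.path.commonpath()` rather than a plain string prefix check avoids accepting sibling directories whose names merely start with the root's name.
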
Symlinks can be used to escape the repository root and read arbitrary files.
- Default: `follow_symlinks=False`. Symlinks are skipped during file discovery.
- When symlinks are followed (`follow_symlinks=True`), each symlink target is resolved and validated against the repository root. Escaping symlinks are skipped with a warning.
- `is_symlink_escape(root, path)` checks whether a symlink resolves outside the root.
- On Windows, environments without symlink support automatically skip symlink traversal.

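
A symlink escape check along these lines might look like the sketch below. Only the function name and arguments come from this document; the use of `os.path.realpath()` and the treatment of non-symlinks are assumptions.

```python
import os


def is_symlink_escape(root: str, path: str) -> bool:
    """Return True if path is a symlink whose real target lies outside root.

    Sketch only: the real implementation may resolve paths differently.
    """
    if not os.path.islink(path):
        # Assumption: regular files and directories are never "escapes".
        return False
    # realpath follows every symlink in the chain, including the root itself.
    root_real = os.path.realpath(root)
    target_real = os.path.realpath(path)
    try:
        return os.path.commonpath([root_real, target_real]) != root_real
    except ValueError:
        # Paths that cannot be compared are treated as escaping.
        return True
```

Resolving the root as well as the link keeps the comparison consistent when the repository itself is reached through a symlink.
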
Files are filtered through multiple layers:
- `SKIP_PATTERNS`: directories and files always excluded (e.g., `node_modules/`, `vendor/`, `.git/`, `build/`, `dist/`, generated files, lock files).
- `.gitignore`: respected by default for both local folders and GitHub repositories (via the `pathspec` library).
- `extra_ignore_patterns`: user-configurable additional gitignore-style patterns passed to indexing tools.

Files matching known secret patterns are excluded during indexing.
Excluded patterns include:
- Environment files: `.env`, `.env.*`, `*.env`
- Certificates / keys: `*.pem`, `*.key`, `*.p12`, `*.pfx`, `*.keystore`, `*.jks`
- SSH keys: `id_rsa*`, `id_ed25519*`, `id_dsa*`, `id_ecdsa*`
- Credentials: `credentials.json`, `service-account*.json`, `*.credentials`
- Auth files: `.htpasswd`, `.netrc`, `.npmrc`, `.pypirc`
- Generic secret indicators: `*secret*`, `*.secrets`, `*.token`

When a secret file is detected, a warning is included in the indexing response. Secret files are never stored in the index or cached content directory.
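
To illustrate how glob patterns like those listed above could be applied, here is a minimal sketch using the standard-library `fnmatch` module. The pattern list is copied from this document; matching against the lowercased file name only, and the `SECRET_PATTERNS` constant name, are assumptions rather than the project's confirmed behavior.

```python
import os
from fnmatch import fnmatchcase

# Patterns copied from the list in this document.
SECRET_PATTERNS = [
    ".env", ".env.*", "*.env",
    "*.pem", "*.key", "*.p12", "*.pfx", "*.keystore", "*.jks",
    "id_rsa*", "id_ed25519*", "id_dsa*", "id_ecdsa*",
    "credentials.json", "service-account*.json", "*.credentials",
    ".htpasswd", ".netrc", ".npmrc", ".pypirc",
    "*secret*", "*.secrets", "*.token",
]


def is_secret_file(path: str) -> bool:
    """Return True if the file name matches any known secret pattern."""
    # Assumption: only the base name is checked, case-insensitively.
    name = os.path.basename(path).lower()
    return any(fnmatchcase(name, pattern) for pattern in SECRET_PATTERNS)
```

In this sketch the broad `*secret*` pattern also matches ordinary source files such as `secret_utils.py`, which is the conservative side of the trade-off.
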
- Default maximum: 500 KB per file (configurable via `max_file_size`).
- Files exceeding the limit are skipped during discovery.
- A configurable file count limit (default: 500 files) prevents runaway indexing of extremely large repositories. It can be overridden using the `JCODEMUNCH_MAX_INDEX_FILES` environment variable.

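
The two limits above could be enforced roughly as shown below. This is a hypothetical sketch: `apply_limits` and `max_index_files` are illustrative names, 500 KB is assumed to mean 500 × 1024 bytes, and falling back to the default on an invalid environment value is an assumption.

```python
import os

DEFAULT_MAX_FILE_SIZE = 500 * 1024  # assumption: 500 KB means 512,000 bytes
DEFAULT_MAX_INDEX_FILES = 500


def max_index_files() -> int:
    """Read the file count limit, honoring JCODEMUNCH_MAX_INDEX_FILES."""
    raw = os.environ.get("JCODEMUNCH_MAX_INDEX_FILES", "")
    try:
        value = int(raw)
    except ValueError:
        # Assumption: unset or malformed values fall back to the default.
        return DEFAULT_MAX_INDEX_FILES
    return value if value > 0 else DEFAULT_MAX_INDEX_FILES


def apply_limits(files, max_file_size=DEFAULT_MAX_FILE_SIZE, max_files=None):
    """Filter (path, size_in_bytes) pairs by size, then cap the count."""
    limit = max_files if max_files is not None else max_index_files()
    kept = []
    for path, size in files:
        if size > max_file_size:
            continue  # oversized files are skipped, not truncated
        if len(kept) >= limit:
            break  # stop discovery once the count limit is reached
        kept.append(path)
    return kept
```
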
Binary files are excluded using a two-stage check:
- Extension-based detection: common binary extensions (`.exe`, `.dll`, `.so`, `.png`, `.jpg`, `.zip`, `.wasm`, `.pyc`, `.class`, `.pdf`, `.db`, `.sqlite`, etc.).
- Content-based detection: files containing null bytes within the first 8 KB are treated as binary and skipped, even if the extension suggests source code.

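
The two-stage check can be sketched as follows. The extension set contains only the examples named above (the real list is longer), and the constant names are assumptions.

```python
import os

# Only the extensions listed in this document; the real set is larger.
BINARY_EXTENSIONS = {
    ".exe", ".dll", ".so", ".png", ".jpg", ".zip",
    ".wasm", ".pyc", ".class", ".pdf", ".db", ".sqlite",
}
SNIFF_BYTES = 8 * 1024  # inspect the first 8 KB


def is_binary_file(path: str) -> bool:
    """Return True if the file looks binary by extension or content."""
    # Stage 1: cheap extension lookup, no file I/O required.
    ext = os.path.splitext(path)[1].lower()
    if ext in BINARY_EXTENSIONS:
        return True
    # Stage 2: a null byte in the leading chunk marks the file as binary.
    with open(path, "rb") as fh:
        return b"\x00" in fh.read(SNIFF_BYTES)
```

Running the extension check first avoids opening files that are already known to be binary.
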
- All file reads use `errors="replace"` to substitute invalid UTF-8 bytes with the Unicode replacement character (U+FFFD) instead of raising decode errors.
- Symbol content retrieval also uses `errors="replace"` to ensure safe decoding.
- Cached raw files are stored using UTF-8 encoding.

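
As a small illustration of the decoding behavior described above, the sketch below reads a file with `errors="replace"`. The helper name `read_text_safely` is hypothetical.

```python
def read_text_safely(path: str) -> str:
    """Read a file as UTF-8, replacing undecodable bytes with U+FFFD."""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        return fh.read()


# The same error handler works on raw bytes: 0xFF is never valid UTF-8,
# so it is replaced rather than raising UnicodeDecodeError.
decoded = b"ok\xff".decode("utf-8", errors="replace")
```

Here `decoded` ends with the replacement character, so downstream code always receives a valid string even for files with mixed or legacy encodings.
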
- Index storage defaults to `~/.code-index/`.
- The storage path can be overridden using the `CODE_INDEX_PATH` environment variable.
- Repository identifiers are derived from `{owner}-{name}`, preventing path injection in storage locations.
- Index files are stored as JSON and validated during load to ensure schema integrity.

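
The storage rules above might translate into something like the sketch below. The helpers `index_root` and `repo_id` are hypothetical names, and the character whitelist is an assumption: this document only states that identifiers follow `{owner}-{name}`, not how the components are validated.

```python
import os
import re
from pathlib import Path

# Assumption: components are restricted to characters that cannot form
# a path separator or a parent-directory reference.
_SAFE_PART = re.compile(r"[A-Za-z0-9._-]+")


def index_root() -> Path:
    """Return the storage directory, honoring CODE_INDEX_PATH if set."""
    override = os.environ.get("CODE_INDEX_PATH")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".code-index"


def repo_id(owner: str, name: str) -> str:
    """Build the {owner}-{name} identifier, rejecting unsafe components."""
    for part in (owner, name):
        if part in {".", ".."} or not _SAFE_PART.fullmatch(part):
            raise ValueError(f"unsafe repository component: {part!r}")
    return f"{owner}-{name}"
```

Because the identifier is a single flat name with no separators, joining it onto the storage root cannot produce a path outside that root.
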
The performance and ranking telemetry introduced in v1.74.0–v1.80.0 is local-only and opt-in:
- `~/.code-index/telemetry.db` (`tool_calls`, `ranking_events`) is written only when `perf_telemetry_enabled: true` (or `JCODEMUNCH_PERF_TELEMETRY=1`). It is disabled by default: the in-memory latency ring is always tracked, but no row touches disk.
- `~/.code-index/tuning.jsonc` (per-repo retrieval-weight overrides) is written only by an explicit `tune_weights` invocation.
- `~/.code-index/embed_canary.json` (16-string drift canary) is written only by an explicit `check_embedding_drift(capture=true)` invocation.
- No telemetry is sent over the network. The community token-savings counter (`share_savings`) is unrelated and only sends an integer delta plus an anonymous UUID, never query strings, paths, or repo names. Disable with `JCODEMUNCH_SHARE_SAVINGS=0`.
- Stored ranking events include the literal query string (truncated result-id list, no source code). Treat the storage path with the same care as any local source you index.

| Control | Location | Default |
|---|---|---|
| Path traversal validation | `security.validate_path()` | Always enabled |
| Symlink escape protection | `security.is_symlink_escape()` | Symlinks skipped by default |
| Secret file exclusion | `security.is_secret_file()` | Always enabled |
| Binary file detection | `security.is_binary_file()` | Always enabled |
| File size limit | File discovery pipeline | 500 KB |
| File count limit | File discovery pipeline | 500 files |
| `.gitignore` respect | Indexing pipeline | Enabled |
| UTF-8 safe decode | All file reads | `errors="replace"` |
| Perf telemetry sink | `perf_telemetry_enabled` | Disabled (opt-in) |
| Ranking ledger storage | `perf_telemetry_enabled` | Disabled (opt-in) |
| Tuning overrides | Explicit `tune_weights` call | None until invoked |
| Embedding canary | Explicit `check_embedding_drift` call | None until invoked |