This report summarizes practical strategies to scale CortexGraph for heavier workloads and harden it for production use, while respecting its design constraints:
- Local-first, JSONL-only storage (no databases)
- MCP server model (short-lived tool calls, client-controlled lifecycle)
- Strict security posture (no implicit network access, least privilege on disk)
It is organized as:
- Deployment and topology options
- Storage layout and scaling patterns
- Performance tuning and concurrency
- Security hardening and operational controls
- Observability, testing, and upgrade strategy
Where relevant, file paths and configuration options refer to the current codebase and docs (e.g., docs/deployment.md, docs/configuration.md, PERFORMANCE_OPTIMIZATIONS.md).
For a single user or small team:
- Install via UV tool (preferred for isolation and updates):
uv tool install cortexgraph
- Configuration location:
- Primary:
~/.config/cortexgraph/.env - Fallback (dev only):
./.envin the project directory
- Primary:
- Integrate as an MCP server:
- Claude Desktop / VS Code:
"command": "cortexgraph"for tool installs - Development:
"command": "uv", "args": ["--directory", "/path/to/cortexgraph", "run", "cortexgraph"]
- Claude Desktop / VS Code:
This baseline already isolates CortexGraph in a dedicated environment and keeps all user data local (~/.config/cortexgraph/jsonl and your Obsidian vault).
Because CortexGraph uses local JSONL and Markdown files rather than a central database, scaling is primarily a matter of topology and filesystem layout.
Pattern A: Per-User Instances (Recommended)
- Each human user has:
- Their own MCP client (Claude Desktop, VS Code, etc.)
- Their own CortexGraph configuration and storage directory:
~/.config/cortexgraph/jsonlfor STMLTM_VAULT_PATHpointing to a personal Obsidian vault
- Benefits:
- Strong isolation between users
- No cross-user contention on JSONL files
- Simple backup and restore per user
- Use when:
- You have multiple people on the same machine or shared environment
- You do not need shared memory between users
Pattern B: Shared “Team Memory” Instance
- One CortexGraph instance with:
- Shared
CORTEXGRAPH_STORAGE_PATHon a team-accessible filesystem - Shared
LTM_VAULT_PATHpointing to a team Obsidian vault
- Shared
- Accessed by multiple MCP clients using the same command path.
- Requirements:
- Underlying filesystem supports safe concurrent access (e.g., local disk or properly configured network storage with file locking semantics).
- Cooperative usage model (avoid running multiple long-lived CortexGraph processes writing to the same JSONL directory simultaneously).
- Use when:
- You want a shared memory graph for a team or project
- Trust boundaries among users are relatively strong
Pattern C: Multi-Instance with Sharded Storage
- You run N CortexGraph instances, each with its own storage root:
- Example shard keys: user ID, project name, or tenant ID.
- The MCP client or a higher-level orchestration layer chooses which CortexGraph instance to talk to based on the shard key.
- Benefits:
- Horizontal scaling without modifying CortexGraph’s storage layer
- Clear boundaries between tenants/projects
- Use when:
- You have many independent tenants
- Storage volumes per tenant may grow large
Note: Because CortexGraph is designed as a local MCP server with JSONL storage, do not introduce SQLite or other external databases. Scale out by topology (more instances, separate storage paths), not by replacing the storage backend.
Key paths (see docs/configuration.md and docs/deployment.md):
- STM JSONL directory:
CORTEXGRAPH_STORAGE_PATH=~/.config/cortexgraph/jsonl
- LTM vault:
LTM_VAULT_PATH=~/Documents/Obsidian/Vault
- LTM index:
LTM_INDEX_PATH=~/.config/cortexgraph/ltm_index.jsonl
For scaling and safety:
- Use absolute paths in
.envfor shared or non-default locations. - Keep STM and LTM on the same physical disk for simplicity unless you have a strong reason to separate them (e.g., SSD for STM, larger HDD for LTM).
JSONL STM files (memories.jsonl, relations.jsonl):
- Growth is controlled by:
- Decay and garbage collection thresholds:
CORTEXGRAPH_FORGET_THRESHOLD- Decay model parameters (
CORTEXGRAPH_DECAY_MODEL,CORTEXGRAPH_*).
- Promotion rules:
CORTEXGRAPH_PROMOTE_THRESHOLDCORTEXGRAPH_PROMOTE_USE_COUNTCORTEXGRAPH_PROMOTE_TIME_WINDOW
- Decay and garbage collection thresholds:
- For heavy usage or long-lived systems:
- Prefer slightly higher forget thresholds (e.g., 0.06–0.08) to keep STM manageable.
- Run compaction and maintenance regularly (see
cortexgraph-maintenance).
LTM Vault:
- Markdown files accumulate as memories are promoted.
- Scaling mainly depends on:
- Git backups and pruning strategy
- Obsidian vault organization (folders, tags)
Use the maintenance CLI (see docs/deployment.md):
cortexgraph-maintenance stats- Inspect JSONL file sizes, active counts, and compaction hints.
cortexgraph-maintenance compact- Rewrite JSONL files without tombstones/duplicates.
Recommended schedule:
- Daily or Weekly:
- Run stats and compact if file sizes or tombstones grow significantly.
- Monthly:
- Review archived memories and promotion behavior.
- Adjust thresholds if STM is growing too fast or too slow.
For large stores (thousands of memories or more):
- Increase compaction frequency.
- Consider per-user or per-tenant shard directories to limit individual file sizes.
PERFORMANCE_OPTIMIZATIONS.md describes several optimizations already implemented:
- Embedding model caching
- Lazy loading of LTM index
- Buffered JSONL I/O and batch operations
- Tag-based indexing and pagination
- Async I/O support
- Background task manager
- Performance monitoring utilities
The following recommendations focus on how to use and configure these capabilities in a scalable deployment.
Key performance-related configuration options (see PERFORMANCE_OPTIMIZATIONS.md and config.py):
batch_size— batch size for bulk operations (default: 100)cache_size— in-memory cache size (default: 1000)enable_async_io— enable async I/O for better concurrencysearch_timeout— search timeout in seconds (default: 5.0)
Guidelines:
- For heavier search workloads:
- Increase
cache_sizeto keep more frequently accessed memories and tags in memory. - Ensure
enable_async_iois true to avoid blocking operations. - Tune
search_timeoutto a value that balances responsiveness with completeness.
- Increase
- For heavy write workloads (many memories per session):
- Use batch operations where possible (e.g., batch saving in tools/clients).
- Keep
batch_sizemoderate (50–200) to avoid large spikes in memory usage.
In MCP usage, CortexGraph is invoked via tools that run requests in isolation:
- Each request should be treated as short-lived; the server’s internal caching smooths repeated operations within the same process.
- To avoid JSONL write contention:
- Prefer a single long-lived CortexGraph process per storage directory, rather than many short-lived processes writing concurrently.
- If multiple MCP clients share a storage path, route them to the same
cortexgraphcommand, not multiple independent processes each writing to the same files.
If you need true parallelism with separate processes:
- Use sharding or per-user paths so each process writes to its own JSONL files.
- Avoid multiple processes writing to the same JSONL directory unless you have verified filesystem-level locking semantics and are comfortable with the risk.
Embeddings are optional and can be resource-intensive:
- Enable:
CORTEXGRAPH_ENABLE_EMBEDDINGS=trueCORTEXGRAPH_EMBED_MODEL=all-MiniLM-L6-v2(recommended lightweight model)
- Considerations:
- First run will download the model (~50MB).
- Embedding computations can be CPU-heavy; on constrained machines, limit embedding usage or batch embedding operations.
- Scaling strategies:
- Use embeddings only where they add clear value (semantic search, clustering).
- For low-end systems or large multi-tenant setups, consider disabling embeddings or limiting them to specific shards/users.
For expensive operations (index building, compaction, clustering):
- Use the background task manager (see
background.py) to avoid blocking active requests. - Schedule heavy jobs during off-peak hours:
- LTM index rebuild
- Large-scale clustering and consolidation
- Full compaction of JSONL storage
Monitor performance metrics via:
get_performance_metrics()andreset_performance_metrics()(seeperformance.pyandtools/performance.py).- Track average latencies and operation counts over time to validate tuning changes.
CortexGraph already follows a strong security posture (see docs/security.md and SECURITY.md). The recommendations below focus on operationalizing and extending that posture.
- Run CortexGraph under a dedicated OS user where feasible:
- Only this user should have read/write access to:
~/.config/cortexgraph/- The configured
LTM_VAULT_PATH(or a subdirectory).
- Avoid running under privileged accounts (e.g., root) unless absolutely necessary.
- Only this user should have read/write access to:
- For containerized deployments:
- Use non-root containers.
- Mount
~/.config/cortexgraph/and the Obsidian vault as volumes with minimal necessary permissions.
Enforce restrictive permissions consistently:
- Config:
.envfiles should be0600(owner read/write only).
- Storage directories:
- JSONL storage (
CORTEXGRAPH_STORAGE_PATH) should be0700. - LTM vault should match your Obsidian practices; at minimum, only trusted users should have access.
- JSONL storage (
On shared systems:
- Avoid placing
CORTEXGRAPH_STORAGE_PATHorLTM_VAULT_PATHin world-readable locations. - If using a shared “team memory” instance, treat the storage directory as containing sensitive personal data and restrict access accordingly.
- Never commit
.envfiles or storage data to version control. - When sharing diagnostics:
- Redact file paths, personal data, and any sensitive content.
- Prefer synthetic or anonymized examples.
- For Git-based backups of the Obsidian vault:
- Use private repositories.
- Consider client-side encryption if storing sensitive data off-device.
- CortexGraph is designed to run locally with no outbound network access by default:
- Do not add remote API calls inside core modules or tools.
- If you need network functionality (e.g., downloading models), confine it to explicit setup steps or separate scripts.
- Integration with MCP clients:
- Limit which tools are exposed to the assistant if you have stricter security requirements.
- Review tool definitions for potential data exfiltration paths (e.g., anything that can read arbitrary files).
Current measures (see docs/security.md):
- Dependabot, pip-audit, Bandit, CodeQL, SBOM generation roadmap.
Additional hardening steps:
- Pin dependencies more tightly in
pyproject.tomlonce the ecosystem stabilizes. - Enable SBOM-based scanning in CI (using
snyk_sbom_scanor similar). - Periodically review:
- GitHub security advisories
- CI logs for Bandit, CodeQL, and pip-audit
-
Capture server logs:
cortexgraph 2>&1 | tee ~/.config/cortexgraph/server.log
-
Use performance metrics APIs:
- Track operation latencies (search, save, clustering).
- Watch for regressions after configuration changes or upgrades.
For multi-instance setups:
- Standardize log locations and formats so you can aggregate or inspect logs across instances.
Before rolling out configuration or topology changes:
- Generate synthetic workloads:
- Scripts that save many memories, run diverse searches, and exercise consolidation/GC.
- Use realistic tags and content sizes.
- Measure:
- End-to-end latencies for key operations.
- Growth in JSONL file sizes.
- Memory usage of the
cortexgraphprocess.
Because storage is JSONL and Markdown:
- Upgrades are primarily schema-additive and backward compatible.
- Good practice for upgrades:
- Backup
~/.config/cortexgraph/jsonl/and relevant parts of your Obsidian vault. - Upgrade CortexGraph (via
uv tool installorpip). - Run tests if you are on a dev install (
PYTHONPATH=src pytest). - Check
cortexgraph-maintenance statsfor inconsistencies.
- Backup
For multi-instance deployments:
- Roll out updates gradually:
- Upgrade one instance.
- Verify behavior under typical workload.
- Then upgrade remaining instances.
Topology and Storage
- Choose per-user, team, or sharded instances.
- Set explicit
CORTEXGRAPH_STORAGE_PATH,LTM_VAULT_PATH,LTM_INDEX_PATH. - Tune decay and promotion thresholds for expected load.
- Schedule regular
cortexgraph-maintenance statsandcompact.
Performance
- Enable async I/O and tune
batch_size,cache_size,search_timeout. - Use a single long-lived process per storage directory.
- Decide whether embeddings are necessary and configure accordingly.
- Run heavy operations via background tasks during off-peak hours.
Security
- Run under a dedicated OS user with minimal privileges.
- Enforce
0600on.envand0700on storage directories. - Keep all data local and avoid adding implicit network calls.
- Review integration/tool exposure for potential data leaks.
Operations
- Capture logs and monitor performance metrics.
- Test configuration changes under synthetic load.
- Back up JSONL and vault content before upgrades.
- Roll out upgrades gradually, especially in multi-instance deployments.
This checklist, combined with the detailed sections above, provides a concrete path to scaling CortexGraph responsibly while maintaining its privacy-preserving, local-first design and strong security posture.