Contributions are welcome. This guide covers setup, testing, and PR guidelines.

> **Important**: This project is a **pure C binary** (rewritten from Go in v0.5.0). Please submit C code, not Go. Go PRs may be ported but cannot be merged directly.

## Build from Source

**Prerequisites**: a C compiler (gcc or clang), make, zlib, and Git. Optional: Node.js 22+ (for the graph UI).

```bash
git clone https://github.com/DeusData/codebase-memory-mcp.git
cd codebase-memory-mcp
scripts/build.sh
```

macOS: `xcode-select --install` provides clang.
Linux: `sudo apt install build-essential zlib1g-dev` (Debian/Ubuntu) or `sudo dnf install gcc make zlib-devel` (Fedora).

The build writes the binary to `build/c/codebase-memory-mcp`.

## Run Tests

```bash
scripts/test.sh
```

This builds the project with AddressSanitizer (ASan) and UndefinedBehaviorSanitizer (UBSan) enabled and runs the full test suite (~2040 cases). Key test files:
- `tests/test_pipeline.c` — pipeline integration tests
- `tests/test_httplink.c` — HTTP route extraction and linking
- `tests/test_mcp.c` — MCP protocol and tool handler tests
- `tests/test_store_*.c` — SQLite graph store tests
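
This guide does not describe the project's own test macros, so the following is a hypothetical sketch of the usual shape of a self-contained C test. It uses small test functions that return a failure count, a check macro that reports the failing file and line, and a runner that totals the results. `CHECK`, `split_count`, and `run_all_tests` are illustrative names, not the project's API. Look at an existing file under `tests/` before adding a case.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical check macro: prints the location of a failed condition and
 * increments the caller's local `failures` counter instead of aborting,
 * so one run reports every failing check. */
#define CHECK(cond)                                                    \
    do {                                                               \
        if (!(cond)) {                                                 \
            fprintf(stderr, "%s:%d: check failed: %s\n", __FILE__,     \
                    __LINE__, #cond);                                  \
            failures++;                                                \
        }                                                              \
    } while (0)

/* Hypothetical function under test: counts the non-empty segments of a
 * '/'-separated path, e.g. "/users/42" has two. */
static int split_count(const char *path) {
    int count = 0;
    int in_segment = 0;
    for (const char *p = path; *p != '\0'; p++) {
        if (*p == '/') {
            in_segment = 0;
        } else if (!in_segment) {
            in_segment = 1;
            count++;
        }
    }
    return count;
}

static int test_split_count_basic(void) {
    int failures = 0;
    CHECK(split_count("/users/42") == 2);
    CHECK(split_count("") == 0);
    CHECK(split_count("//a//b/") == 2);
    return failures;
}

/* Runs every test and returns the total number of failed checks. */
static int run_all_tests(void) {
    int failures = 0;
    failures += test_split_count_basic();
    printf("%s\n", failures == 0 ? "all tests passed" : "some tests failed");
    return failures;
}
```

Under a sanitizer build like the one `scripts/test.sh` uses, a check of this kind can also surface out-of-bounds reads or undefined behavior inside the function under test, not just wrong return values.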

## Run Linter

```bash
scripts/lint.sh
```

Runs clang-tidy, cppcheck, and clang-format. All three must pass before you commit; a pre-commit hook also enforces this.

## Run Security Audit

```bash
make -f Makefile.cbm security
```

This runs eight security layers: static allow-list audit, binary string scan, UI audit, install audit, network egress test, MCP robustness (fuzz), vendored dependency integrity, and frontend integrity.
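
This guide does not describe how the audit layers are implemented. As a rough illustration of what a robustness (fuzz) layer checks, the sketch below feeds thousands of pseudo-random byte buffers into a message handler. It counts any result outside the handler's defined return values. `handle_message` is a hypothetical stand-in for a real MCP request handler, and `fuzz_handler` is likewise illustrative. The xorshift generator is used here only because it is deterministic and dependency-free; it is not the project's fuzzer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handler: accepts a buffer only if its braces are balanced.
 * It never reads past `len` and returns 0 (accepted) or -1 (rejected). */
static int handle_message(const unsigned char *buf, size_t len) {
    long depth = 0;
    if (buf == NULL || len == 0) {
        return -1;
    }
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '{') {
            depth++;
        } else if (buf[i] == '}') {
            if (--depth < 0) {
                return -1;
            }
        }
    }
    return depth == 0 ? 0 : -1;
}

/* Deterministic xorshift32 PRNG, so any failing input is reproducible. */
static uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Feeds `iterations` random buffers (length 0..255, about half braces) to
 * the handler. Returns how many results fell outside {0, -1}. */
static int fuzz_handler(uint32_t seed, int iterations) {
    unsigned char buf[256];
    uint32_t state = seed != 0 ? seed : 1; /* xorshift must not start at 0 */
    int bad = 0;
    for (int it = 0; it < iterations; it++) {
        size_t len = xorshift32(&state) % sizeof buf;
        for (size_t i = 0; i < len; i++) {
            uint32_t r = xorshift32(&state);
            buf[i] = (r & 1u) ? (unsigned char)((r & 2u) ? '{' : '}')
                              : (unsigned char)(r >> 8);
        }
        int rc = handle_message(buf, len);
        if (rc != 0 && rc != -1) {
            bad++;
        }
    }
    return bad;
}
```

Return values are only half of the check. Built with ASan, an out-of-bounds read inside the handler would abort the run even if every return value looked correct, which is why fuzzing and sanitizer builds are usually combined.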

## Project Structure

```
src/
  foundation/     Arena allocator, hash table, string utils, platform compat
  store/          SQLite graph storage (WAL mode, FTS5)
  cypher/         Cypher query → SQL translation
  mcp/            MCP server (JSON-RPC 2.0 over stdio, 14 tools)
  pipeline/       Multi-pass indexing pipeline
    pass_*.c      Individual pipeline passes (definitions, calls, usages, etc.)
    httplink.c    HTTP route extraction (Go/Express/Laravel/Ktor/Python)
  discover/       File discovery with gitignore support
  watcher/        Git-based background auto-sync
  cli/            CLI subcommands (install, update, uninstall, config)
  ui/             Graph visualization HTTP server (mongoose)
internal/cbm/     Tree-sitter AST extraction (64 languages, vendored C grammars)
vendored/         sqlite3, yyjson, mongoose, mimalloc, xxhash, tre
graph-ui/         React/Three.js frontend for graph visualization
scripts/          Build, test, lint, security audit scripts
tests/            All C test files
```
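
This note is for contributors new to the arena allocator listed under `foundation/`. An arena hands out memory by bumping an offset inside one large block and releases everything at once by resetting or freeing that block. The sketch below is a generic, hypothetical illustration. `Arena`, `arena_init`, `arena_alloc`, `arena_reset`, `arena_free`, and `arena_strdup` are not the project's actual names or signatures; check `src/foundation/` for the real interface.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-capacity bump allocator. */
typedef struct {
    unsigned char *base; /* start of the backing block */
    size_t cap;          /* total bytes in the block */
    size_t used;         /* bytes handed out so far, including padding */
} Arena;

static int arena_init(Arena *a, size_t cap) {
    a->base = malloc(cap);
    a->cap = a->base != NULL ? cap : 0;
    a->used = 0;
    return a->base != NULL ? 0 : -1;
}

/* Returns `size` bytes aligned to max_align_t, or NULL if the arena is
 * full. Individual allocations are never freed on their own. */
static void *arena_alloc(Arena *a, size_t size) {
    const size_t align = alignof(max_align_t);
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->cap || size > a->cap - start) {
        return NULL;
    }
    a->used = start + size;
    return a->base + start;
}

/* Releases every allocation at once; the block is kept for reuse. */
static void arena_reset(Arena *a) { a->used = 0; }

static void arena_free(Arena *a) {
    free(a->base);
    a->base = NULL;
    a->cap = a->used = 0;
}

/* Copies a NUL-terminated string into the arena. */
static char *arena_strdup(Arena *a, const char *s) {
    size_t n = strlen(s) + 1;
    char *p = arena_alloc(a, n);
    if (p != NULL) {
        memcpy(p, s, n);
    }
    return p;
}
```

The trade-off is giving up per-object `free` in exchange for cheap allocation and one-step cleanup. This suits workloads that create many short-lived objects together and discard them at the same time.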

## Adding or Fixing Language Support

Language support is split between two layers:

1. **Tree-sitter extraction** (`internal/cbm/`): grammar loading, AST node type configuration in `lang_specs.c`, and function/call/import extraction in `extract_*.c`
2. **Pipeline passes** (`src/pipeline/`): call resolution, usage tracking, and HTTP route linking
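
To make "AST node type configuration" concrete, here is a hypothetical sketch of a per-language table that maps tree-sitter node type names to the roles an extractor cares about. The struct layout and field names are assumptions for illustration, as are `type_in` and `is_function_node`; the real layout lives in `internal/cbm/lang_specs.c`. The Python node types shown (`function_definition`, `class_definition`, `call`, `import_statement`, `import_from_statement`) come from the tree-sitter Python grammar. Still, confirm any node type against an actual parse tree before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-language spec: NULL-terminated lists of tree-sitter
 * node type names, grouped by what the extractor should do with them. */
typedef struct {
    const char *name;
    const char *const *function_types;
    const char *const *class_types;
    const char *const *call_types;
    const char *const *import_types;
} LangSpec;

static const char *const py_functions[] = {"function_definition", NULL};
static const char *const py_classes[] = {"class_definition", NULL};
static const char *const py_calls[] = {"call", NULL};
static const char *const py_imports[] = {"import_statement",
                                         "import_from_statement", NULL};

static const LangSpec python_spec = {
    "python", py_functions, py_classes, py_calls, py_imports,
};

/* Returns 1 if `node_type` appears in the NULL-terminated list. */
static int type_in(const char *const *types, const char *node_type) {
    for (size_t i = 0; types != NULL && types[i] != NULL; i++) {
        if (strcmp(types[i], node_type) == 0) {
            return 1;
        }
    }
    return 0;
}

/* In a table-driven design like this, a node type missing from the spec
 * is silently skipped, so the table is the first thing to check. */
static int is_function_node(const LangSpec *spec, const char *node_type) {
    return type_in(spec->function_types, node_type);
}
```

With this kind of design, fixing a language often means editing data (adding a missing node type name) rather than extraction logic. This is why comparing the configured types against the real parse tree is the natural first debugging step.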

**Workflow for language fixes:**

1. Check the language spec in `internal/cbm/lang_specs.c`.
2. Verify extraction with the regression tests in `tests/test_extraction.c`.
3. Check the parity tests in `internal/cbm/regression_test.go` (legacy Go tests, being migrated).
4. For integration-level fixes, add a test case in `tests/test_pipeline.c`.
5. Verify the fix against a real open-source repository.

## Pull Request Guidelines

- **C code only** — this project was rewritten from Go to pure C in v0.5.0. Go PRs will be acknowledged and potentially ported, but cannot be merged directly.
- One logical change per PR — don't bundle unrelated features
- Include tests for new functionality
- Run `scripts/test.sh` and `scripts/lint.sh` before submitting
- Keep PRs focused — avoid unrelated reformatting or refactoring
- Reference the issue number in your PR description

## Security

We take security seriously. Every PR goes through:
- Manual security review (dangerous calls, network access, file writes, prompt injection)
- An automated eight-layer security audit in CI
- Vendored dependency integrity checks

If you add a new `system()`, `popen()`, `fork()`, or network call, you must justify it and add it to `scripts/security-allowlist.txt`.
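
This guide does not document the format of `scripts/security-allowlist.txt`, so the following is only a hypothetical illustration of how an allow-list audit works. Every dangerous call site found in the source must match an entry, and anything unlisted fails the audit. The sketch assumes one `file:function` entry per line, which may not match the real file. `allowlisted` and `audit_call` are illustrative names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Calls treated as dangerous here (a subset of the list above). */
static const char *const dangerous_calls[] = {"system", "popen", "fork"};

/* Returns 1 if `entry` ("file:function") appears as a whole line in the
 * allow-list text. Entries are assumed to be newline-separated. */
static int allowlisted(const char *allowlist, const char *entry) {
    size_t n = strlen(entry);
    const char *p = allowlist;
    while ((p = strstr(p, entry)) != NULL) {
        int starts_line = (p == allowlist) || (p[-1] == '\n');
        int ends_line = (p[n] == '\n') || (p[n] == '\0');
        if (starts_line && ends_line) {
            return 1;
        }
        p += 1;
    }
    return 0;
}

/* Checks one call site reported by a scanner. Returns 0 if the call is
 * harmless or allow-listed, -1 (and prints why) if it fails the audit. */
static int audit_call(const char *allowlist, const char *file,
                      const char *func) {
    char entry[256];
    int dangerous = 0;
    size_t count = sizeof dangerous_calls / sizeof *dangerous_calls;
    for (size_t i = 0; i < count; i++) {
        if (strcmp(func, dangerous_calls[i]) == 0) {
            dangerous = 1;
        }
    }
    if (!dangerous) {
        return 0;
    }
    if (snprintf(entry, sizeof entry, "%s:%s", file, func) >=
        (int)sizeof entry) {
        return -1; /* entry too long to check: fail closed */
    }
    if (allowlisted(allowlist, entry)) {
        return 0;
    }
    fprintf(stderr, "unlisted dangerous call: %s\n", entry);
    return -1;
}
```

Matching whole lines rather than substrings keeps an entry for one file from accidentally approving the same call in a file whose path merely contains it.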

## Good First Issues

Check [issues labeled `good first issue`](https://github.com/DeusData/codebase-memory-mcp/labels/good%20first%20issue) for beginner-friendly tasks with clear scope and guidance.