CONTRIBUTING.md: 33 additions & 2 deletions
@@ -85,14 +85,45 @@ Language support is split between two layers:
 4. Add a test case in `tests/test_pipeline.c` for integration-level fixes
 5. Verify with a real open-source repo
 
+### Infrastructure Languages (Infra-Pass Pattern)
+
+Languages like **Dockerfile**, **docker-compose**, **Kubernetes manifests**, and **Kustomize** do not require a new tree-sitter grammar. Instead they follow an *infra-pass* pattern, reusing the existing tree-sitter YAML grammar where applicable:
+
+1. **Detection helpers** in `src/pipeline/pass_infrascan.c` — functions like `cbm_is_dockerfile()`, `cbm_is_k8s_manifest()`, `cbm_is_kustomize_file()` identify files by name and/or content heuristics (e.g., presence of `apiVersion:`).
+2. **Custom extractors** in `internal/cbm/extract_k8s.c` — tree-sitter-based parsers that walk the YAML AST (using the tree-sitter YAML grammar) and populate `CBMFileResult` with imports and definitions.
+3. **Pipeline pass** (`pass_k8s.c`, `pass_infrascan.c`) — calls the extractor and emits graph nodes/edges. K8s manifests emit `Resource` nodes; Kustomize files emit `Module` nodes with `IMPORTS` edges to referenced resource files.
+
+**When adding a new infrastructure language:**
+- Add a detection helper (`cbm_is_<lang>_file()`) in `pass_infrascan.c` or a new `pass_<lang>.c`.
+- Add the `CBM_LANG_<LANG>` enum value in `internal/cbm/cbm.h` and a row in the language table in `lang_specs.c`.
+- Write a custom extractor that returns `CBMFileResult*` — do not add a tree-sitter grammar.
+- Register the pass in `pipeline.c`.
+- Add tests in `tests/test_pipeline.c` following the `TEST(infra_is_dockerfile)` and `TEST(k8s_extract_manifest)` patterns.
+
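The detection step of the infra-pass pattern described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the project's code: the `sketch_*` names are hypothetical stand-ins for helpers like `cbm_is_dockerfile()` and `cbm_is_k8s_manifest()`, and the exact rules (the file-name patterns, and requiring both `apiVersion:` and `kind:` at the start of a line) are assumed rather than taken from `pass_infrascan.c`.

```c
#include <stdbool.h>
#include <string.h>

/* Return the final path component ("a/b/Dockerfile" -> "Dockerfile"). */
static const char *sketch_basename(const char *path) {
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* True if `s` ends with `suffix`. */
static bool sketch_has_suffix(const char *s, const char *suffix) {
    size_t n = strlen(s), m = strlen(suffix);
    return n >= m && strcmp(s + (n - m), suffix) == 0;
}

/* Hypothetical name-based heuristic (assumed patterns):
 * "Dockerfile", "Dockerfile.<tag>", or "<name>.dockerfile". */
bool sketch_is_dockerfile(const char *path) {
    const char *base = sketch_basename(path);
    if (strcmp(base, "Dockerfile") == 0) return true;
    if (strncmp(base, "Dockerfile.", 11) == 0) return true;
    return sketch_has_suffix(base, ".dockerfile");
}

/* True if some line of `content` starts with `key` (no leading indent). */
static bool sketch_has_top_level_key(const char *content, const char *key) {
    size_t klen = strlen(key);
    const char *line = content;
    while (line && *line) {
        if (strncmp(line, key, klen) == 0) return true;
        line = strchr(line, '\n');   /* advance to the next line, if any */
        if (line) line++;
    }
    return false;
}

/* Hypothetical name + content heuristic (assumed rule): a YAML extension
 * plus top-level `apiVersion:` and `kind:` keys. */
bool sketch_is_k8s_manifest(const char *path, const char *content) {
    if (!sketch_has_suffix(path, ".yaml") && !sketch_has_suffix(path, ".yml")) {
        return false;
    }
    return sketch_has_top_level_key(content, "apiVersion:") &&
           sketch_has_top_level_key(content, "kind:");
}
```

A caller would typically run the name check first, since it needs no file read, and fall back to the content check only for YAML files. A Kustomize file that declares `apiVersion:` and `kind:` would also pass this sketch, so a real pass would need to rule that case out before treating the file as a manifest.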
+## Commit Format
+
+Use conventional commits: `type(scope): description`
+
+| Type | When to use |
+|------|-------------|
+| `feat` | New feature or capability |
+| `fix` | Bug fix |
+| `test` | Adding or updating tests |
+| `refactor` | Code change that neither fixes a bug nor adds a feature |
+| `perf` | Performance improvement |
+| `docs` | Documentation only |
+| `chore` | Build scripts, CI, dependency updates |
+
+Examples: `fix(store): set busy_timeout before WAL`, `feat(cli): add --progress flag`
+
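A check for the subject format above might look like the following C sketch. The function name `sketch_is_conventional_subject` is hypothetical, and the rules are assumptions read off the format string and the table: a type from the table, a non-empty scope in parentheses, then `: ` and a non-empty description. The project may accept more than this (for example, subjects without a scope).

```c
#include <stdbool.h>
#include <string.h>

/* Commit types listed in the table above. */
static const char *const SKETCH_TYPES[] = {
    "feat", "fix", "test", "refactor", "perf", "docs", "chore"
};

/* Hypothetical helper: true if `subject` looks like
 * `type(scope): description` with a type from SKETCH_TYPES. */
bool sketch_is_conventional_subject(const char *subject) {
    const char *lparen = strchr(subject, '(');
    if (!lparen) return false;

    /* The text before "(" must match one known type exactly. */
    size_t type_len = (size_t)(lparen - subject);
    bool known = false;
    for (size_t i = 0; i < sizeof SKETCH_TYPES / sizeof SKETCH_TYPES[0]; i++) {
        if (strlen(SKETCH_TYPES[i]) == type_len &&
            strncmp(subject, SKETCH_TYPES[i], type_len) == 0) {
            known = true;
            break;
        }
    }
    if (!known) return false;

    /* Assumed rule: the scope between the parentheses must be non-empty. */
    const char *rparen = strchr(lparen, ')');
    if (!rparen || rparen == lparen + 1) return false;

    /* After ")" expect ": " followed by at least one description character. */
    return strncmp(rparen + 1, ": ", 2) == 0 && rparen[3] != '\0';
}
```

Both examples from the text pass this check, while a subject such as `update readme` or one with an unlisted type such as `feature(cli): add flag` is rejected.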
 ## Pull Request Guidelines
 
+- **One issue per PR.** Each PR must address exactly one bug, one feature, or one refactor. Do not bundle multiple fixes or feature additions into a single PR. If your change touches multiple areas, split it into separate PRs.
+- **Open an issue first.** Every PR should reference a tracking issue (`Fixes #N` or `Closes #N`). This ensures the change is discussed before code is written.
 - **C code only** — this project was rewritten from Go to pure C in v0.5.0. Go PRs will be acknowledged and potentially ported, but cannot be merged directly.
-- One logical change per PR — don't bundle unrelated features
 - Include tests for new functionality
 - Run `scripts/test.sh` and `scripts/lint.sh` before submitting
 - Keep PRs focused — avoid unrelated reformatting or refactoring
-- Reference the issue number in your PR description
 **The fastest and most efficient code intelligence engine for AI coding agents.** Full-indexes an average repository in milliseconds, the Linux kernel (28M LOC, 75K files) in 3 minutes. Answers structural queries in under 1ms. Ships as a single static binary for macOS, Linux, and Windows — download, run `install`, done.
 
-High-quality parsing through [tree-sitter](https://tree-sitter.github.io/tree-sitter/) AST analysis across all 64 languages, enhanced with LSP-style hybrid type resolution for Go, C, and C++ (more languages coming soon) — producing a persistent knowledge graph of functions, classes, call chains, HTTP routes, and cross-service links. 14 MCP tools. Zero dependencies. Plug and play across 10 coding agents.
+High-quality parsing through [tree-sitter](https://tree-sitter.github.io/tree-sitter/) AST analysis across all 66 languages, enhanced with LSP-style hybrid type resolution for Go, C, and C++ (more languages coming soon) — producing a persistent knowledge graph of functions, classes, call chains, HTTP routes, and cross-service links. 14 MCP tools. Zero dependencies. Plug and play across 10 coding agents.
 
 <p align="center">
 <img src="docs/graph-ui-screenshot.png" alt="Graph visualization UI showing the codebase-memory-mcp knowledge graph" width="800">
@@ -24,10 +24,11 @@ High-quality parsing through [tree-sitter](https://tree-sitter.github.io/tree-si
 
 - **Extreme indexing speed** — Linux kernel (28M LOC, 75K files) in 3 minutes. RAM-first pipeline: LZ4 compression, in-memory SQLite, fused Aho-Corasick pattern matching. Memory released after indexing.
 - **Plug and play** — single static binary for macOS (arm64/amd64), Linux (arm64/amd64), and Windows (amd64). No Docker, no runtime dependencies, no API keys. Download → `install` → restart agent → done.
-- **64 languages** — vendored tree-sitter grammars compiled into the binary. Nothing to install, nothing that breaks.
+- **66 languages** — vendored tree-sitter grammars compiled into the binary. Nothing to install, nothing that breaks.
 - **120x fewer tokens** — 5 structural queries: ~3,400 tokens vs ~412,000 via file-by-file search. One graph query replaces dozens of grep/read cycles.
 - **10 agents, one command** — `install` auto-detects Claude Code, Codex CLI, Gemini CLI, Zed, OpenCode, Antigravity, Aider, KiloCode, VS Code, and OpenClaw — configures MCP entries, instruction files, and pre-tool hooks for each.
 - **Built-in graph visualization** — 3D interactive UI at `localhost:9749` (optional UI binary variant).
+- **Infrastructure-as-code indexing** — Dockerfiles, Kubernetes manifests, and Kustomize overlays indexed as graph nodes with cross-references. `Resource` nodes for K8s kinds, `Module` nodes for Kustomize overlays with `IMPORTS` edges to referenced resources.
 - **14 MCP tools** — search, trace, architecture, impact analysis, Cypher queries, dead code detection, cross-service HTTP linking, ADR management, and more.
@@ -306,6 +307,24 @@ codebase-memory-mcp config set auto_index_limit 50000 # max files for auto-in
 codebase-memory-mcp config reset auto_index # reset to default
 ```
 
+## Custom File Extensions
+
+Map additional file extensions to supported languages via JSON config files. Useful for framework-specific extensions like `.blade.php` (Laravel) or `.mjs` (ES modules).