Commit e4763a1

Merge branch 'main' into fix/add-gnu-source-for-strcasestr
2 parents 43aca64 + de322d9 commit e4763a1

32 files changed: +2628 -117 lines changed

CONTRIBUTING.md

Lines changed: 33 additions & 2 deletions
@@ -85,14 +85,45 @@ Language support is split between two layers:
8585
4. Add a test case in `tests/test_pipeline.c` for integration-level fixes
8686
5. Verify with a real open-source repo
8787

88+
### Infrastructure Languages (Infra-Pass Pattern)
89+
90+
Languages like **Dockerfile**, **docker-compose**, **Kubernetes manifests**, and **Kustomize** do not require a new tree-sitter grammar. Instead they follow an *infra-pass* pattern, reusing the existing tree-sitter YAML grammar where applicable:
91+
92+
1. **Detection helpers** in `src/pipeline/pass_infrascan.c` — functions like `cbm_is_dockerfile()`, `cbm_is_k8s_manifest()`, `cbm_is_kustomize_file()` identify files by name and/or content heuristics (e.g., presence of `apiVersion:`).
93+
2. **Custom extractors** in `internal/cbm/extract_k8s.c` — tree-sitter-based parsers that walk the YAML AST (using the tree-sitter YAML grammar) and populate `CBMFileResult` with imports and definitions.
94+
3. **Pipeline pass** (`pass_k8s.c`, `pass_infrascan.c`) — calls the extractor and emits graph nodes/edges. K8s manifests emit `Resource` nodes; Kustomize files emit `Module` nodes with `IMPORTS` edges to referenced resource files.
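
As a rough illustration of step 1, the sketch below shows what name-based and content-based detection could look like. The `sketch_*` names are hypothetical stand-ins, not the actual `cbm_is_*()` implementations in `pass_infrascan.c`, and the exact filename rules (`Dockerfile.<suffix>`, `kustomization.yml`) are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return the final path component. */
static const char *sketch_basename(const char *path) {
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Name heuristic (assumed): "Dockerfile" or "Dockerfile.<suffix>". */
bool sketch_is_dockerfile(const char *path) {
    assert(path != NULL);
    const char *base = sketch_basename(path);
    if (strncmp(base, "Dockerfile", 10) != 0) {
        return false;
    }
    return base[10] == '\0' || base[10] == '.';
}

/* Name heuristic (assumed): kustomization.yaml or kustomization.yml. */
bool sketch_is_kustomize_file(const char *path) {
    assert(path != NULL);
    const char *base = sketch_basename(path);
    return strcmp(base, "kustomization.yaml") == 0 ||
           strcmp(base, "kustomization.yml") == 0;
}

/* Content heuristic: some line starts with a top-level "apiVersion:" key. */
bool sketch_is_k8s_manifest(const char *source) {
    assert(source != NULL);
    const char *key = "apiVersion:";
    size_t key_len = strlen(key);
    const char *line = source;
    while (*line != '\0') {
        if (strncmp(line, key, key_len) == 0) {
            return true;
        }
        const char *newline = strchr(line, '\n');
        if (newline == NULL) {
            break;
        }
        line = newline + 1;
    }
    return false;
}
```

Only unindented `apiVersion:` keys count in this sketch, so a nested `apiVersion:` inside another YAML document (for example a docker-compose file) is not treated as a manifest. The real helpers may apply different or additional checks.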
95+
96+
**When adding a new infrastructure language:**
97+
- Add a detection helper (`cbm_is_<lang>_file()`) in `pass_infrascan.c` or a new `pass_<lang>.c`.
98+
- Add the `CBM_LANG_<LANG>` enum value in `internal/cbm/cbm.h` and a row in the language table in `lang_specs.c`.
99+
- Write a custom extractor that returns `CBMFileResult*`. Reuse the existing tree-sitter YAML grammar where it applies instead of adding a new tree-sitter grammar.
100+
- Register the pass in `pipeline.c`.
101+
- Add tests in `tests/test_pipeline.c` following the `TEST(infra_is_dockerfile)` and `TEST(k8s_extract_manifest)` patterns.
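
The checklist above can be pictured as a small table that ties an enum value to its detection helper. The sketch below uses docker-compose purely as an illustration. `SketchLang`, `SketchLangRow`, and every `sketch_*` function are hypothetical names, and the real `CBM_LANG_*` enum and the table in `lang_specs.c` may be laid out quite differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the CBM_LANG_<LANG> enum value. */
typedef enum {
    SKETCH_LANG_NONE = 0,
    SKETCH_LANG_COMPOSE,
    SKETCH_LANG_COUNT
} SketchLang;

typedef bool (*SketchDetectFn)(const char *path);

/* Hypothetical language-table row: enum value, display name, detector. */
typedef struct {
    SketchLang lang;
    const char *name;
    SketchDetectFn detect;
} SketchLangRow;

/* Detection helper in the cbm_is_<lang>_file() style (filenames assumed). */
static bool sketch_is_compose_file(const char *path) {
    static const char *const names[] = {
        "docker-compose.yml", "docker-compose.yaml",
        "compose.yml", "compose.yaml",
    };
    const char *slash = strrchr(path, '/');
    const char *base = slash ? slash + 1 : path;
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (strcmp(base, names[i]) == 0) {
            return true;
        }
    }
    return false;
}

/* Adding a language means adding one row here. */
static const SketchLangRow sketch_lang_table[] = {
    {SKETCH_LANG_COMPOSE, "docker-compose", sketch_is_compose_file},
};

/* Walk the table and return the first language whose detector matches. */
SketchLang sketch_detect_infra_lang(const char *path) {
    assert(path != NULL);
    size_t rows = sizeof sketch_lang_table / sizeof sketch_lang_table[0];
    for (size_t i = 0; i < rows; i++) {
        if (sketch_lang_table[i].detect(path)) {
            return sketch_lang_table[i].lang;
        }
    }
    return SKETCH_LANG_NONE;
}
```

Keeping detection behind a function pointer means a test can exercise each helper on its own, in the spirit of the `TEST(infra_is_dockerfile)` pattern mentioned above.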
102+
103+
## Commit Format
104+
105+
Use conventional commits: `type(scope): description`
106+
107+
| Type | When to use |
108+
|------|-------------|
109+
| `feat` | New feature or capability |
110+
| `fix` | Bug fix |
111+
| `test` | Adding or updating tests |
112+
| `refactor` | Code change that neither fixes a bug nor adds a feature |
113+
| `perf` | Performance improvement |
114+
| `docs` | Documentation only |
115+
| `chore` | Build scripts, CI, dependency updates |
116+
117+
Examples: `fix(store): set busy_timeout before WAL`, `feat(cli): add --progress flag`
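
For contributors who want to check a subject line locally, here is a minimal sketch of a validator for the `type(scope): description` shape. `sketch_is_conventional_subject()` is a hypothetical helper, not something this repository ships, and treating the scope as optional is an assumption borrowed from the general Conventional Commits convention.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if subject looks like "type(scope): description".
 * The scope is treated as optional (assumption); an empty "()" is rejected. */
bool sketch_is_conventional_subject(const char *subject) {
    static const char *const types[] = {
        "feat", "fix", "test", "refactor", "perf", "docs", "chore",
    };
    assert(subject != NULL);
    for (size_t i = 0; i < sizeof types / sizeof types[0]; i++) {
        size_t type_len = strlen(types[i]);
        if (strncmp(subject, types[i], type_len) != 0) {
            continue;
        }
        const char *rest = subject + type_len;
        if (*rest == '(') {
            const char *close = strchr(rest, ')');
            if (close == NULL || close == rest + 1) {
                continue; /* unterminated or empty scope */
            }
            rest = close + 1;
        }
        /* Require ": " followed by a non-empty description. */
        if (rest[0] == ':' && rest[1] == ' ' && rest[2] != '\0') {
            return true;
        }
    }
    return false;
}
```

Both examples above pass this check, while a subject such as `update readme` or `fix(): typo` does not.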
118+
88119
## Pull Request Guidelines
89120

121+
- **One issue per PR.** Each PR must address exactly one bug, one feature, or one refactor. Do not bundle multiple fixes or feature additions into a single PR. If your change touches multiple areas, split it into separate PRs.
122+
- **Open an issue first.** Every PR should reference a tracking issue (`Fixes #N` or `Closes #N`). This ensures the change is discussed before code is written.
90123
- **C code only** — this project was rewritten from Go to pure C in v0.5.0. Go PRs will be acknowledged and potentially ported, but cannot be merged directly.
91-
- One logical change per PR — don't bundle unrelated features
92124
- Include tests for new functionality
93125
- Run `scripts/test.sh` and `scripts/lint.sh` before submitting
94126
- Keep PRs focused — avoid unrelated reformatting or refactoring
95-
- Reference the issue number in your PR description
96127

97128
## Security
98129

Makefile.cbm

Lines changed: 4 additions & 0 deletions
@@ -115,6 +115,7 @@ EXTRACTION_SRCS = \
115115
$(CBM_DIR)/extract_type_refs.c \
116116
$(CBM_DIR)/extract_type_assigns.c \
117117
$(CBM_DIR)/extract_env_accesses.c \
118+
$(CBM_DIR)/extract_k8s.c \
118119
$(CBM_DIR)/helpers.c \
119120
$(CBM_DIR)/lang_specs.c
120121

@@ -148,6 +149,7 @@ MCP_SRCS = src/mcp/mcp.c
148149
# Discover module (new)
149150
DISCOVER_SRCS = \
150151
src/discover/language.c \
152+
src/discover/userconfig.c \
151153
src/discover/gitignore.c \
152154
src/discover/discover.c
153155

@@ -176,6 +178,7 @@ PIPELINE_SRCS = \
176178
src/pipeline/pass_envscan.c \
177179
src/pipeline/pass_compile_commands.c \
178180
src/pipeline/pass_infrascan.c \
181+
src/pipeline/pass_k8s.c \
179182
src/pipeline/httplink.c
180183

181184
# Traces module (new)
@@ -259,6 +262,7 @@ TEST_MCP_SRCS = \
259262

260263
TEST_DISCOVER_SRCS = \
261264
tests/test_language.c \
265+
tests/test_userconfig.c \
262266
tests/test_gitignore.c \
263267
tests/test_discover.c
264268

README.md

Lines changed: 25 additions & 6 deletions
@@ -4,15 +4,15 @@
44
[![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
55
[![CI](https://img.shields.io/github/actions/workflow/status/DeusData/codebase-memory-mcp/dry-run.yml?label=CI)](https://github.com/DeusData/codebase-memory-mcp/actions/workflows/dry-run.yml)
66
[![Tests](https://img.shields.io/badge/tests-2042_passing-brightgreen)](https://github.com/DeusData/codebase-memory-mcp)
7-
[![Languages](https://img.shields.io/badge/languages-64-orange)](https://github.com/DeusData/codebase-memory-mcp)
7+
[![Languages](https://img.shields.io/badge/languages-66-orange)](https://github.com/DeusData/codebase-memory-mcp)
88
[![Agents](https://img.shields.io/badge/agents-10-purple)](https://github.com/DeusData/codebase-memory-mcp)
99
[![Pure C](https://img.shields.io/badge/pure_C-zero_dependencies-blue)](https://github.com/DeusData/codebase-memory-mcp)
1010
[![Platform](https://img.shields.io/badge/macOS_%7C_Linux_%7C_Windows-supported-lightgrey)](https://github.com/DeusData/codebase-memory-mcp/releases/latest)
1111
[![OpenSSF Scorecard](https://api.scorecard.dev/projects/github.com/DeusData/codebase-memory-mcp/badge)](https://scorecard.dev/viewer/?uri=github.com/DeusData/codebase-memory-mcp)
1212

1313
**The fastest and most efficient code intelligence engine for AI coding agents.** Full-indexes an average repository in milliseconds, the Linux kernel (28M LOC, 75K files) in 3 minutes. Answers structural queries in under 1ms. Ships as a single static binary for macOS, Linux, and Windows — download, run `install`, done.
1414

15-
High-quality parsing through [tree-sitter](https://tree-sitter.github.io/tree-sitter/) AST analysis across all 64 languages, enhanced with LSP-style hybrid type resolution for Go, C, and C++ (more languages coming soon) — producing a persistent knowledge graph of functions, classes, call chains, HTTP routes, and cross-service links. 14 MCP tools. Zero dependencies. Plug and play across 10 coding agents.
15+
High-quality parsing through [tree-sitter](https://tree-sitter.github.io/tree-sitter/) AST analysis across all 66 languages, enhanced with LSP-style hybrid type resolution for Go, C, and C++ (more languages coming soon) — producing a persistent knowledge graph of functions, classes, call chains, HTTP routes, and cross-service links. 14 MCP tools. Zero dependencies. Plug and play across 10 coding agents.
1616

1717
<p align="center">
1818
<img src="docs/graph-ui-screenshot.png" alt="Graph visualization UI showing the codebase-memory-mcp knowledge graph" width="800">
@@ -24,10 +24,11 @@ High-quality parsing through [tree-sitter](https://tree-sitter.github.io/tree-si
2424

2525
- **Extreme indexing speed** — Linux kernel (28M LOC, 75K files) in 3 minutes. RAM-first pipeline: LZ4 compression, in-memory SQLite, fused Aho-Corasick pattern matching. Memory released after indexing.
2626
- **Plug and play** — single static binary for macOS (arm64/amd64), Linux (arm64/amd64), and Windows (amd64). No Docker, no runtime dependencies, no API keys. Download → `install` → restart agent → done.
27-
- **64 languages** — vendored tree-sitter grammars compiled into the binary. Nothing to install, nothing that breaks.
27+
- **66 languages** — vendored tree-sitter grammars compiled into the binary. Nothing to install, nothing that breaks.
2828
- **120x fewer tokens** — 5 structural queries: ~3,400 tokens vs ~412,000 via file-by-file search. One graph query replaces dozens of grep/read cycles.
2929
- **10 agents, one command**: `install` auto-detects Claude Code, Codex CLI, Gemini CLI, Zed, OpenCode, Antigravity, Aider, KiloCode, VS Code, and OpenClaw, then configures MCP entries, instruction files, and pre-tool hooks for each.
3030
- **Built-in graph visualization** — 3D interactive UI at `localhost:9749` (optional UI binary variant).
31+
- **Infrastructure-as-code indexing** — Dockerfiles, Kubernetes manifests, and Kustomize overlays indexed as graph nodes with cross-references. `Resource` nodes for K8s kinds, `Module` nodes for Kustomize overlays with `IMPORTS` edges to referenced resources.
3132
- **14 MCP tools** — search, trace, architecture, impact analysis, Cypher queries, dead code detection, cross-service HTTP linking, ADR management, and more.
3233

3334
## Quick Start
@@ -279,7 +280,7 @@ codebase-memory-mcp cli --raw search_graph '{"label": "Function"}' | jq '.result
279280

280281
### Node Labels
281282

282-
`Project`, `Package`, `Folder`, `File`, `Module`, `Class`, `Function`, `Method`, `Interface`, `Enum`, `Type`, `Route`
283+
`Project`, `Package`, `Folder`, `File`, `Module`, `Class`, `Function`, `Method`, `Interface`, `Enum`, `Type`, `Route`, `Resource`
283284

284285
### Edge Types
285286

@@ -306,6 +307,24 @@ codebase-memory-mcp config set auto_index_limit 50000 # max files for auto-in
306307
codebase-memory-mcp config reset auto_index # reset to default
307308
```
308309

310+
## Custom File Extensions
311+
312+
Map additional file extensions to supported languages via JSON config files. Useful for framework-specific extensions like `.blade.php` (Laravel) or `.mjs` (ES modules).
313+
314+
**Per-project** (in your repo root):
315+
```json
316+
// .codebase-memory.json
317+
{"extra_extensions": {".blade.php": "php", ".mjs": "javascript"}}
318+
```
319+
320+
**Global** (applies to all projects):
321+
```json
322+
// ~/.config/codebase-memory-mcp/config.json (or $XDG_CONFIG_HOME/...)
323+
{"extra_extensions": {".twig": "html", ".phtml": "php"}}
324+
```
325+
326+
Project config overrides global for conflicting extensions. Unknown language values are silently skipped. Missing config files are ignored.
327+
309328
## Persistence
310329

311330
SQLite databases stored at `~/.cache/codebase-memory-mcp/`. Persists across restarts (WAL mode, ACID-safe). To reset: `rm -rf ~/.cache/codebase-memory-mcp/`.
@@ -323,7 +342,7 @@ SQLite databases stored at `~/.cache/codebase-memory-mcp/`. Persists across rest
323342

324343
## Language Support
325344

326-
64 languages. Benchmarked against 64 real open-source repositories (78 to 49K nodes):
345+
66 languages. Benchmarked against 64 real open-source repositories (78 to 49K nodes):
327346

328347
| Tier | Score | Languages |
329348
|------|-------|-----------|
@@ -348,7 +367,7 @@ src/
348367
traces/ Runtime trace ingestion
349368
ui/ Embedded HTTP server + 3D graph visualization
350369
foundation/ Platform abstractions (threads, filesystem, logging, memory)
351-
internal/cbm/ Vendored tree-sitter grammars (64 languages) + AST extraction engine
370+
internal/cbm/ Vendored tree-sitter grammars (66 languages) + AST extraction engine
352371
```
353372

354373
## License

internal/cbm/cbm.c

Lines changed: 5 additions & 0 deletions
@@ -316,6 +316,11 @@ CBMFileResult *cbm_extract_file(const char *source, int source_len, CBMLanguage
316316
cbm_extract_imports(&ctx);
317317
cbm_extract_unified(&ctx);
318318

319+
// K8s / Kustomize semantic pass (additional structured extraction for YAML-based infra files).
320+
if (ctx.language == CBM_LANG_KUSTOMIZE || ctx.language == CBM_LANG_K8S) {
321+
cbm_extract_k8s(&ctx);
322+
}
323+
319324
// LSP type-aware call resolution
320325
uint64_t lsp_start = now_ns();
321326
if (language == CBM_LANG_GO) {

internal/cbm/cbm.h

Lines changed: 5 additions & 0 deletions
@@ -75,6 +75,8 @@ typedef enum {
7575
CBM_LANG_FORM,
7676
CBM_LANG_MAGMA,
7777
CBM_LANG_WOLFRAM,
78+
CBM_LANG_KUSTOMIZE, // kustomization.yaml — Kubernetes overlay tool
79+
CBM_LANG_K8S, // Generic Kubernetes manifest (apiVersion: detected)
7880
CBM_LANG_COUNT
7981
} CBMLanguage;
8082

@@ -361,4 +363,7 @@ void cbm_extract_type_assigns(CBMExtractCtx *ctx);
361363
// Single-pass unified extraction (replaces the 7 calls above except defs+imports).
362364
void cbm_extract_unified(CBMExtractCtx *ctx);
363365

366+
// K8s / Kustomize semantic extractor (called when language is CBM_LANG_K8S or CBM_LANG_KUSTOMIZE).
367+
void cbm_extract_k8s(CBMExtractCtx *ctx);
368+
364369
#endif // CBM_H
