Commit b7eabf7

Shidfar and claude committed
docs: add cross-repo protocol linking section
Documents the full cross-repo linking feature: endpoint persistence, cross-project matching algorithm, _crosslinks.db schema, MCP tool usage, pipeline integration, and verified results (148 links across 2 Anyfin repos).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 28a22ea commit b7eabf7

1 file changed

Lines changed: 106 additions & 1 deletion

docs/CROSS-SERVICE-PROTOCOL-LINKING.md

@@ -309,10 +309,115 @@ The rebase onto upstream required fixing several build issues (all resolved):

4. **Test files** — All 14 `test_servicelink_*.c` files included the deleted `httplink.h`. Fixed by removing the include (functions already available via `servicelink.h`).

## Cross-Repo Protocol Linking

**Status**: Complete. Implemented 2026-04-08.

Each linker matches producers and consumers only within a single repo. Cross-repo linking removes that limit by persisting every discovered endpoint and then matching across project boundaries.

### How It Works

**Phase 1: Endpoint Persistence** — During indexing, each of the 14 linkers registers every discovered producer and consumer via `sl_register_endpoint()`. After the graph dump, `cbm_persist_endpoints()` writes these to a `protocol_endpoints` table in the project's `.db`:

```sql
CREATE TABLE protocol_endpoints (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    project TEXT NOT NULL,
    protocol TEXT NOT NULL,    -- "graphql", "kafka", "pubsub", etc.
    role TEXT NOT NULL,        -- "producer" or "consumer"
    identifier TEXT NOT NULL,  -- topic name, operation name, etc.
    node_qn TEXT NOT NULL,     -- function qualified name
    file_path TEXT NOT NULL,
    extra TEXT DEFAULT '{}',
    UNIQUE(project, protocol, role, identifier, node_qn)
);
```
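
As an illustration of the registration step, the sketch below keeps endpoints in a growable in-memory list and skips entries that repeat the five columns of the table's `UNIQUE` constraint. Every name here (`demo_endpoint_t`, `demo_register_endpoint()`, and so on) is hypothetical; the real `sl_register_endpoint()` signature is not shown in this document, and whether the real code deduplicates in memory or leaves that to SQLite is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory mirror of one protocol_endpoints row.
 * Strings are borrowed, not copied: the caller keeps them alive. */
typedef struct {
    const char *project;
    const char *protocol;
    const char *role;        /* "producer" or "consumer" */
    const char *identifier;
    const char *node_qn;
    const char *file_path;
} demo_endpoint_t;

typedef struct {
    demo_endpoint_t *items;
    size_t count;
    size_t cap;
} demo_endpoint_list_t;

/* Compares the same five columns as the table's UNIQUE constraint. */
int demo_same_key(const demo_endpoint_t *a, const demo_endpoint_t *b) {
    return strcmp(a->project, b->project) == 0 &&
           strcmp(a->protocol, b->protocol) == 0 &&
           strcmp(a->role, b->role) == 0 &&
           strcmp(a->identifier, b->identifier) == 0 &&
           strcmp(a->node_qn, b->node_qn) == 0;
}

/* Returns 1 if appended, 0 if a duplicate was skipped, -1 on allocation failure. */
int demo_register_endpoint(demo_endpoint_list_t *list, demo_endpoint_t ep) {
    for (size_t i = 0; i < list->count; i++) {
        if (demo_same_key(&list->items[i], &ep)) return 0;
    }
    if (list->count == list->cap) {
        size_t new_cap = list->cap ? list->cap * 2 : 8;
        demo_endpoint_t *grown = realloc(list->items, new_cap * sizeof *grown);
        if (!grown) return -1;
        list->items = grown;
        list->cap = new_cap;
    }
    list->items[list->count++] = ep;
    return 1;
}

/* Releases the array and resets the list to its empty state. */
void demo_endpoint_list_free(demo_endpoint_list_t *list) {
    free(list->items);
    list->items = NULL;
    list->count = 0;
    list->cap = 0;
}
```

A zero-initialized `demo_endpoint_list_t` is a valid empty list, so a caller can declare one, register endpoints during a pass, and call `demo_endpoint_list_free()` at the end.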

**Phase 2: Cross-Project Matching** — After persistence, `cbm_cross_project_link()` scans all `.db` files in the cache directory, collects endpoints, and matches producers in project A with consumers in project B:

- **Exact match** (confidence 0.95): identical identifier strings
- **Normalized match** (confidence 0.85): identifiers match after lowercasing and stripping `-`, `_`, `.` separators (e.g., `orderCreated` matches `order_created`)
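
The two matching rules above can be sketched in a few lines of C. This is a minimal illustration, not the project's code: `demo_normalize()` and `demo_match_confidence()` are hypothetical names, and the fixed 256-byte buffers (which silently truncate longer identifiers) are an assumption made to keep the example short.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Lowercases `src` and drops '-', '_' and '.'.
 * Writes at most cap - 1 characters plus a terminating NUL into `dst`. */
void demo_normalize(const char *src, char *dst, size_t cap) {
    size_t n = 0;
    if (cap == 0) return;
    for (; *src != '\0' && n + 1 < cap; src++) {
        if (*src == '-' || *src == '_' || *src == '.') continue;
        dst[n++] = (char)tolower((unsigned char)*src);
    }
    dst[n] = '\0';
}

/* Returns 0.95 for identical identifiers, 0.85 when they are equal only
 * after normalization, and 0.0 when they do not match at all. */
double demo_match_confidence(const char *producer_id, const char *consumer_id) {
    char a[256];
    char b[256];
    if (strcmp(producer_id, consumer_id) == 0) return 0.95;
    demo_normalize(producer_id, a, sizeof a);
    demo_normalize(consumer_id, b, sizeof b);
    /* Guard against identifiers made only of separators normalizing to "". */
    if (a[0] != '\0' && strcmp(a, b) == 0) return 0.85;
    return 0.0;
}
```

Checking the exact case first means an identifier pair never receives the lower score when the strings are already identical.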

Results are written to `~/.cache/codebase-memory-mcp/_crosslinks.db`:

```sql
CREATE TABLE cross_links (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    protocol TEXT NOT NULL,
    identifier TEXT NOT NULL,
    producer_project TEXT NOT NULL,
    producer_qn TEXT NOT NULL,
    producer_file TEXT NOT NULL,
    consumer_project TEXT NOT NULL,
    consumer_qn TEXT NOT NULL,
    consumer_file TEXT NOT NULL,
    confidence REAL NOT NULL,
    updated_at TEXT NOT NULL,
    UNIQUE(protocol, identifier, producer_qn, consumer_qn)
);
```
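
To make the shape of a `cross_links` row concrete, here is a rough sketch of the pairing loop that could produce such rows from a flat array of endpoints. It is deliberately simplified: only exact identifier matches are considered, the same-protocol requirement is inferred from the single `protocol` column rather than stated in the text, and all names (`demo_ep_t`, `demo_link_t`, `demo_cross_link()`) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of a protocol_endpoints row. */
typedef struct {
    const char *project;
    const char *protocol;
    const char *role;        /* "producer" or "consumer" */
    const char *identifier;
    const char *node_qn;
} demo_ep_t;

/* One candidate cross_links row, pointing back into the endpoint array. */
typedef struct {
    const demo_ep_t *producer;
    const demo_ep_t *consumer;
    double confidence;
} demo_link_t;

/* Pairs each producer with every consumer that shares its protocol and
 * identifier but belongs to a different project. Quadratic in n, which is
 * fine for a sketch. Returns the number of links written (at most max_out). */
size_t demo_cross_link(const demo_ep_t *eps, size_t n,
                       demo_link_t *out, size_t max_out) {
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(eps[i].role, "producer") != 0) continue;
        for (size_t j = 0; j < n && found < max_out; j++) {
            if (strcmp(eps[j].role, "consumer") != 0) continue;
            if (strcmp(eps[i].project, eps[j].project) == 0) continue;   /* same repo */
            if (strcmp(eps[i].protocol, eps[j].protocol) != 0) continue;
            if (strcmp(eps[i].identifier, eps[j].identifier) != 0) continue;
            out[found].producer = &eps[i];
            out[found].consumer = &eps[j];
            out[found].confidence = 0.95;   /* exact match only in this sketch */
            found++;
        }
    }
    return found;
}
```

Skipping pairs from the same project keeps the output limited to links that the per-repo linkers cannot see.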

Cross-links are rebuilt automatically after every `index_repository` call.

### MCP Tool: cross_project_links

Query cross-project links with optional filters:

```json
{
  "name": "cross_project_links",
  "inputSchema": {
    "properties": {
      "protocol": { "type": "string" },
      "project": { "type": "string" },
      "identifier": { "type": "string" }
    }
  }
}
```

Example output:
```
## pubsub

order.created (confidence: 0.95)
  producer: anyfin-api :: OrderService.create (src/orders/service.ts)
  consumer: anyfin-pubsub :: terraform.production.order-created (terraform/production/order-created.tf)
```
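
One plausible reading of the three optional filters is sketched below. The semantics are assumptions, not documented behavior: an omitted filter (passed as `NULL`) matches everything, comparisons are exact string equality, and `project` is taken to match either side of the link. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of a cross_links row, enough for filtering. */
typedef struct {
    const char *protocol;
    const char *identifier;
    const char *producer_project;
    const char *consumer_project;
} demo_cross_link_row_t;

/* Returns 1 if the row passes every supplied filter, 0 otherwise.
 * A NULL filter means "not supplied". ASSUMPTION: `project` matches when it
 * equals either the producer project or the consumer project. */
int demo_link_matches(const demo_cross_link_row_t *row,
                      const char *protocol,
                      const char *project,
                      const char *identifier) {
    if (protocol && strcmp(row->protocol, protocol) != 0) return 0;
    if (identifier && strcmp(row->identifier, identifier) != 0) return 0;
    if (project &&
        strcmp(row->producer_project, project) != 0 &&
        strcmp(row->consumer_project, project) != 0) {
        return 0;
    }
    return 1;
}
```

Under these assumptions, filtering the example row above by `project = "anyfin-pubsub"` would keep it, because that project appears on the consumer side.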

### Pipeline Integration

```
pass_servicelinks()        ← each linker calls sl_register_endpoint()

dump_and_persist_hashes()  ← writes .db

cbm_persist_endpoints()    ← writes protocol_endpoints table

cbm_cross_project_link()   ← reads all DBs, writes _crosslinks.db
```

### Files

| File | Purpose |
|------|---------|
| `src/pipeline/servicelink.h` | `cbm_sl_endpoint_t`, `cbm_sl_endpoint_list_t`, `sl_register_endpoint()` |
| `src/pipeline/pass_crossrepolinks.c` | `cbm_persist_endpoints()`, `cbm_cross_project_link()` |
| `src/pipeline/pipeline.c` | Lifecycle: allocate → persist → crosslink → free |
| `src/store/store.c` | `protocol_endpoints` table in schema |
| `src/mcp/mcp.c` | `cross_project_links` tool |

### Verified Results (18 Anyfin Repos)

- **anyfin-pubsub**: 488 endpoints (186 Pub/Sub producers, 294 consumers)
- **anyfin-api**: 1543 endpoints (1423 GraphQL producers, 56 Pub/Sub producers)
- **148 cross-project links** created between these two repos alone
- Previously: 0 cross-repo visibility

## Potential Improvements

1. **Targeted scanning via upstream's broker tags** (see "Relationship to Upstream" above)
2. **Parallel linker execution** — linkers are independent and could run on separate threads
3. **Incremental linking** — skip linkers for files that haven't changed
-4. **Cross-repo linking** — match producers in repo A with consumers in repo B (requires multi-project graph)
+4. ~~**Cross-repo linking**~~ ✅ Implemented (see above)
5. **Confidence calibration** — tune thresholds per-protocol based on real-world false positive rates
