Commit d16119e

1 parent efedacf commit d16119e

46 files changed

Lines changed: 2232 additions & 1991 deletions

AGENTS.md

Lines changed: 7 additions & 3 deletions
@@ -148,9 +148,12 @@ If no new rule is detected -> do not update the file.
 - Implement code and tests together for every behavior change.
 - Keep the gateway reusable as a NuGet library, not as an app-specific host.
 - Preserve one public execution surface for local `AITool` instances and MCP tools.
-- Preserve one searchable catalog that supports vector ranking when embeddings are available and lexical fallback when they are not.
+- Preserve one searchable catalog that uses Markdown-LD graph ranking by default and supports vector ranking only when embeddings are explicitly selected.
+- Tool search must support sparse high-confidence selection plus an explicit related/next-step expansion path; do not make consumers pass the full tool catalog when a smaller capability set can answer the request.
 - For multilingual or noisy search inputs, prefer a generic English-normalization step before ranking when an AI/query-rewrite component is available, because the user wants the searchable representation to converge to English instead of relying only on language-specific token overlap.
 - Keep meta-tools available through `McpGatewayToolSet` and `IMcpGateway.CreateMetaTools(...)`.
+- When Markdown-LD graph search is selected, startup or explicit index initialization must build and validate the tool graph before search/tool discovery so LLM-facing MCP tool selection is based on the correct focused graph.
+- Markdown-LD graph search must support both startup-generated graphs and filesystem-provided graph files; tests for file-backed graph mode must generate the graph fixture through the package flow rather than relying on a hand-authored static artifact.
 - If a user adds or corrects a persistent workflow rule, update `AGENTS.md` first and only then continue with the task.
 
 ### Repository Layout
@@ -209,7 +212,7 @@ If no new rule is detected -> do not update the file.
 - local tool indexing and invocation
 - MCP tool indexing and invocation
 - vector search behavior
-- lexical fallback behavior
+- Markdown-LD graph search and vector-to-graph fallback behavior
 - Keep embedding-based search covered with deterministic local tests by using a fake or test-only embedding generator.
 - Keep request context behavior covered when search or invocation consumes contextual inputs.
 - Do not remove tests to get green builds.
@@ -252,7 +255,8 @@ If no new rule is detected -> do not update the file.
 - Prefer direct generic DI registrations such as `services.TryAddSingleton<IService, Implementation>()` over lambda alias registrations when wiring package services, because the lambda style has already been called out as unreadable and error-prone in this repository.
 - Keep runtime services DI-native from their public/internal constructors; types such as `McpGatewayRegistry` must be creatable through `IOptions<McpGatewayOptions>` and other DI-managed dependencies rather than ad-hoc state-only constructors, because the package design requires services to live fully inside the container.
 - When emitting package identity to external protocols such as MCP client info, never hardcode a fake version string; use the actual assembly/build version so runtime metadata stays aligned with the package being shipped.
-- For search-quality improvements, prefer mathematical or statistical ranking changes over hardcoded phrase lists or ad-hoc query text hacks, because the user explicitly wants tokenizer search to improve through general scoring behavior rather than manual exceptions.
+- For search-quality improvements, prefer mathematical, statistical, or graph-ranking changes over hardcoded phrase lists or ad-hoc query text hacks, because the user explicitly wants token-distance search to improve through general scoring behavior rather than manual exceptions.
+- Do not keep a separate local tokenizer search path when `ManagedCode.MarkdownLd.Kb` already provides token-based graph search; route tokenizer-backed retrieval through Markdown-LD so the package does not carry duplicate ranking implementations.
 - Prefer framework-provided in-memory caching primitives such as `IMemoryCache` over custom process-local storage implementations when they cover the lifecycle and lookup needs, because self-rolled memory stores age poorly and make scaling/concurrency behavior harder to trust.
 - Never keep legacy compatibility shims, obsolete paths, or lingering documentation references to removed implementations when a replacement is accepted, because this repository should converge on the current design instead of carrying dead historical baggage.
 - Never leave `ManagedCode`-prefixed DI/setup extension method names such as `AddManagedCodeMcpGateway(...)` in the public API once concise `McpGateway` naming is available, because these branded leftovers make the package surface inconsistent and read like stale legacy.
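
Reviewer note: the rules in this file (graph ranking by default, vector ranking only when explicitly selected, vector-to-graph fallback) together with the README wording for `Auto` can be read as a small decision table. The sketch below is one possible reading, not the package's implementation. `SearchStrategySketch` and `RankingPolicySketch` are hypothetical stand-ins, and returning `null` when no ranker is available is an assumption, since the diff does not say what happens in that case.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for the three strategy names mentioned in this commit.
public enum SearchStrategySketch { Graph, Embeddings, Auto }

public static class RankingPolicySketch
{
    // Returns the ranking mode label a request would end up with,
    // or null when neither ranker can run (an assumption of this sketch).
    public static string? Resolve(SearchStrategySketch strategy, bool graphReady, bool vectorReady) =>
        strategy switch
        {
            // Default: graph ranking; a registered embedding generator is not used.
            SearchStrategySketch.Graph => graphReady ? "graph" : null,

            // Explicit opt-in: vector ranking first, graph index as the fallback.
            SearchStrategySketch.Embeddings => vectorReady ? "vector" : graphReady ? "graph" : null,

            // Policy mode: prefer the graph, use embeddings only when the graph is unavailable.
            SearchStrategySketch.Auto => graphReady ? "graph" : vectorReady ? "vector" : null,

            _ => throw new ArgumentOutOfRangeException(nameof(strategy)),
        };
}
```

A table-shaped function like this is easy to cover with the deterministic tests the file asks for, because every combination of strategy and availability is a single call.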

Directory.Build.props

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@
1111
<AnalysisLevel>latest-recommended</AnalysisLevel>
1212
<TreatWarningsAsErrors>true</TreatWarningsAsErrors>
1313
<NoWarn>$(NoWarn);CS1591;CA1707;CA1848;CA1859;CA1873</NoWarn>
14-
<Version>0.3.1</Version>
14+
<Version>0.3.2</Version>
1515
<PackageVersion>$(Version)</PackageVersion>
1616
</PropertyGroup>
1717

Directory.Packages.props

Lines changed: 14 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -4,20 +4,19 @@
44
</PropertyGroup>
55
<ItemGroup>
66
<PackageVersion Include="DotNet.ReproducibleBuilds" Version="2.0.2" />
7-
<PackageVersion Include="Microsoft.Agents.AI" Version="1.0.0-rc3" />
8-
<PackageVersion Include="Microsoft.Extensions.AI" Version="10.3.0" />
9-
<PackageVersion Include="Microsoft.Extensions.Caching.Memory" Version="10.0.3" />
10-
<PackageVersion Include="Microsoft.Extensions.DependencyInjection" Version="10.0.3" />
11-
<PackageVersion Include="Microsoft.Extensions.DependencyInjection.Abstractions" Version="10.0.3" />
12-
<PackageVersion Include="Microsoft.Extensions.Hosting.Abstractions" Version="10.0.3" />
13-
<PackageVersion Include="Microsoft.Extensions.Logging" Version="10.0.3" />
14-
<PackageVersion Include="Microsoft.Extensions.Logging.Abstractions" Version="10.0.3" />
15-
<PackageVersion Include="Microsoft.ML.Tokenizers" Version="2.0.0" />
16-
<PackageVersion Include="Microsoft.ML.Tokenizers.Data.O200kBase" Version="2.0.0" />
17-
<PackageVersion Include="Microsoft.Extensions.Options" Version="10.0.3" />
18-
<PackageVersion Include="Microsoft.NET.Test.Sdk" Version="18.3.0" />
19-
<PackageVersion Include="Microsoft.SourceLink.GitHub" Version="10.0.103" />
20-
<PackageVersion Include="ModelContextProtocol" Version="1.1.0" />
21-
<PackageVersion Include="TUnit" Version="1.19.0" />
7+
<PackageVersion Include="ManagedCode.MarkdownLd.Kb" Version="0.1.1" />
8+
<PackageVersion Include="Microsoft.Agents.AI" Version="1.1.0" />
9+
<PackageVersion Include="Microsoft.Extensions.AI" Version="10.5.0" />
10+
<PackageVersion Include="Microsoft.Extensions.Caching.Memory" Version="10.0.6" />
11+
<PackageVersion Include="Microsoft.Extensions.DependencyInjection" Version="10.0.6" />
12+
<PackageVersion Include="Microsoft.Extensions.DependencyInjection.Abstractions" Version="10.0.6" />
13+
<PackageVersion Include="Microsoft.Extensions.Hosting.Abstractions" Version="10.0.6" />
14+
<PackageVersion Include="Microsoft.Extensions.Logging" Version="10.0.6" />
15+
<PackageVersion Include="Microsoft.Extensions.Logging.Abstractions" Version="10.0.6" />
16+
<PackageVersion Include="Microsoft.Extensions.Options" Version="10.0.6" />
17+
<PackageVersion Include="Microsoft.NET.Test.Sdk" Version="18.4.0" />
18+
<PackageVersion Include="Microsoft.SourceLink.GitHub" Version="10.0.202" />
19+
<PackageVersion Include="ModelContextProtocol" Version="1.2.0" />
20+
<PackageVersion Include="TUnit" Version="1.34.0" />
2221
</ItemGroup>
2322
</Project>

README.md

Lines changed: 147 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -22,7 +22,7 @@ dotnet add package ManagedCode.MCPGateway
2222
## What It Gives You
2323

2424
- one gateway for local `AITool` instances and MCP tools
25-
- one search surface with vector ranking when embeddings are available and lexical fallback when they are not
25+
- one search surface with default Markdown-LD graph ranking and opt-in vector ranking
2626
- one invoke surface for both local tools and MCP tools
2727
- runtime registration through `IMcpGatewayRegistry`
2828
- reusable gateway meta-tools for chat clients and agents
@@ -75,8 +75,9 @@ var invoke = await gateway.InvokeAsync(new McpGatewayInvokeRequest(
 
 Important defaults:
 
-- search is `Auto` by default
-- `Auto` uses embeddings when available and lexical fallback otherwise
+- search is `Graph` by default
+- graph search uses `ManagedCode.MarkdownLd.Kb` and does not require embeddings
+- embeddings are opt-in through `McpGatewaySearchStrategy.Embeddings` or `McpGatewaySearchStrategy.Auto`
 - the default result size is `5`
 - the maximum result size is `15`
 - the index is built lazily on first list, search, or invoke
@@ -350,7 +351,7 @@ var response = await agent.RunAsync(
 
 ## Optional Warmup
 
-The gateway works without explicit initialization, but you can warm the index eagerly when you want startup validation or a pre-built cache.
+The gateway works without explicit initialization, but you can warm the index eagerly when you want startup validation or a pre-built cache. When Markdown-LD graph search is selected, warmup builds the graph during startup instead of waiting for the first search.
 
 Manual warmup:
 
@@ -393,6 +394,8 @@ services.AddKeyedSingleton<IEmbeddingGenerator<string, Embedding<float>>, MyEmbe
 
 services.AddMcpGateway(options =>
 {
+    options.SearchStrategy = McpGatewaySearchStrategy.Embeddings;
+
     options.AddTool(
         "local",
         AIFunctionFactory.Create(
@@ -405,7 +408,7 @@ services.AddMcpGateway(options =>
 });
 ```
 
-If no embedding generator is registered, the same gateway still works and falls back to lexical search automatically.
+If vector search cannot run for a request, the gateway falls back to the same Markdown-LD graph index used by the default mode and reports a diagnostic. If you register an embedding generator but leave the default `Graph` strategy in place, the generator is not used.
 
 ## Optional Query Normalization
 
@@ -420,8 +423,17 @@ services.AddKeyedSingleton<IChatClient>(
 
 services.AddMcpGateway(options =>
 {
-    options.SearchStrategy = McpGatewaySearchStrategy.Auto;
     options.SearchQueryNormalization = McpGatewaySearchQueryNormalization.TranslateToEnglishWhenAvailable;
+
+    options.AddTool(
+        "local",
+        AIFunctionFactory.Create(
+            static (string query) => $"github:{query}",
+            new AIFunctionFactoryOptions
+            {
+                Name = "github_search_repositories",
+                Description = "Search GitHub repositories by user query."
+            }));
 });
 ```
 
@@ -435,6 +447,20 @@ For process-local caching, use the built-in `IMemoryCache`-backed store:
 services.AddKeyedSingleton<IEmbeddingGenerator<string, Embedding<float>>, MyEmbeddingGenerator>(
     McpGatewayServiceKeys.EmbeddingGenerator);
 services.AddMcpGatewayInMemoryToolEmbeddingStore();
+services.AddMcpGateway(options =>
+{
+    options.SearchStrategy = McpGatewaySearchStrategy.Embeddings;
+
+    options.AddTool(
+        "local",
+        AIFunctionFactory.Create(
+            static (string query) => $"github:{query}",
+            new AIFunctionFactoryOptions
+            {
+                Name = "github_search_repositories",
+                Description = "Search GitHub repositories by user query."
+            }));
+});
 ```
 
 This built-in store reuses the application's shared `IMemoryCache` and only caches embeddings inside the current process. It is useful for local reuse, but it is not durable and does not synchronize across replicas.
@@ -447,25 +473,117 @@ For multi-instance or durable caching, register your own `IMcpGatewayToolEmbeddi
 services.AddKeyedSingleton<IEmbeddingGenerator<string, Embedding<float>>, MyEmbeddingGenerator>(
     McpGatewayServiceKeys.EmbeddingGenerator);
 services.AddSingleton<IMcpGatewayToolEmbeddingStore, MyToolEmbeddingStore>();
+services.AddMcpGateway(options =>
+{
+    options.SearchStrategy = McpGatewaySearchStrategy.Embeddings;
+
+    options.AddTool(
+        "local",
+        AIFunctionFactory.Create(
+            static (string query) => $"github:{query}",
+            new AIFunctionFactoryOptions
+            {
+                Name = "github_search_repositories",
+                Description = "Search GitHub repositories by user query."
+            }));
+});
+```
+
+## Markdown-LD Graph Sources
+
+By default the gateway generates Markdown-LD tool documents from the current local `AITool` and MCP catalog during index build:
+
+```csharp
+services.AddMcpGateway(options =>
+{
+    options.SearchStrategy = McpGatewaySearchStrategy.Graph;
+    options.UseGeneratedMarkdownLdGraph();
+
+    options.AddTool(
+        "local",
+        AIFunctionFactory.Create(
+            static (string query) => $"github:{query}",
+            new AIFunctionFactoryOptions
+            {
+                Name = "github_search_repositories",
+                Description = "Search GitHub repositories by user query."
+            }));
+});
+```
+
+You can also build the same Markdown-LD source documents ahead of time and point the gateway at a file or directory. This is useful when the graph should be generated in a separate step and loaded by the runtime:
+
+```csharp
+var authoringServices = new ServiceCollection();
+authoringServices.AddMcpGateway(options =>
+{
+    options.AddTool(
+        "local",
+        AIFunctionFactory.Create(
+            static (string query) => $"github:{query}",
+            new AIFunctionFactoryOptions
+            {
+                Name = "github_search_repositories",
+                Description = "Search GitHub repositories by user query."
+            }));
+});
+
+await using (var authoringProvider = authoringServices.BuildServiceProvider())
+{
+    var authoringGateway = authoringProvider.GetRequiredService<IMcpGateway>();
+    var descriptors = await authoringGateway.ListToolsAsync();
+    var documents = McpGatewayMarkdownLdGraphFile.CreateDocuments(descriptors);
+
+    await McpGatewayMarkdownLdGraphFile.WriteAsync(
+        "artifacts/mcp-tools.graph.json",
+        documents);
+}
+
+var runtimeServices = new ServiceCollection();
+runtimeServices.AddMcpGateway(options =>
+{
+    options.SearchStrategy = McpGatewaySearchStrategy.Graph;
+    options.UseMarkdownLdGraphFile("artifacts/mcp-tools.graph.json");
+
+    options.AddTool(
+        "local",
+        AIFunctionFactory.Create(
+            static (string query) => $"github:{query}",
+            new AIFunctionFactoryOptions
+            {
+                Name = "github_search_repositories",
+                Description = "Search GitHub repositories by user query."
+            }));
+});
 ```
 
+`UseMarkdownLdGraphFile(...)` accepts:
+
+- a gateway graph bundle JSON file created by `McpGatewayMarkdownLdGraphFile.WriteAsync(...)`
+- a directory containing Markdown-LD source documents
+- a single Markdown-LD source file supported by `ManagedCode.MarkdownLd.Kb`
+
+The bundle is a portable set of Markdown-LD source documents, not a serialized RDF store. The runtime still builds the in-memory `ManagedCode.MarkdownLd.Kb` graph from those documents so focused graph search, related matches, and next-step matches behave the same way as generated startup mode.
+
 ## Search Modes
 
-`McpGatewaySearchStrategy.Auto` is the default and usually the right choice:
+`McpGatewaySearchStrategy.Graph` is the default and usually the right choice for zero-cost local retrieval:
 
-- use vector ranking when embeddings are available
-- fall back to lexical ranking when they are not
+- build or load a Markdown-LD graph during index build
+- use deterministic token-distance search from `ManagedCode.MarkdownLd.Kb`
+- return primary matches, related matches, next-step matches, and focused graph counts
+- keep invocation on the same `ToolId` flow
 
-You can also force a mode:
+You can force graph mode explicitly:
 
 ```csharp
 services.AddMcpGateway(options =>
 {
-    options.SearchStrategy = McpGatewaySearchStrategy.Tokenizer;
+    options.SearchStrategy = McpGatewaySearchStrategy.Graph;
 });
 ```
 
-Or:
+Use embedding mode when the host has an embedding generator and wants vector ranking first:
 
 ```csharp
 services.AddMcpGateway(options =>
@@ -474,13 +592,28 @@ services.AddMcpGateway(options =>
 });
 ```
 
+Use `Auto` only when the host wants a policy mode that can use embeddings when the graph is unavailable and otherwise prefer the graph path:
+
+```csharp
+services.AddMcpGateway(options =>
+{
+    options.SearchStrategy = McpGatewaySearchStrategy.Auto;
+});
+```
+
+Graph mode uses `ManagedCode.MarkdownLd.Kb` to convert every local `AITool` and MCP tool descriptor into an in-memory Markdown-LD knowledge graph. Each tool becomes a Markdown document with structured front matter, source metadata, required arguments, input schema text, graph groups, related-tool hints, and next-step hints. Search uses the graph's deterministic Tiktoken token-distance focused search to rank tool documents and returns normal `McpGatewaySearchMatch` results, so invocation still uses the same `ToolId` flow.
+
+The old separate local tokenizer strategy is intentionally not exposed. Token-based search is provided by `ManagedCode.MarkdownLd.Kb` inside the Markdown-LD graph path.
+
 `McpGatewaySearchResult.RankingMode` reports:
 
 - `vector`
-- `lexical`
+- `graph`
 - `browse`
 - `empty`
 
+`McpGatewayIndexBuildResult` also reports graph index state through `IsGraphSearchEnabled`, `GraphNodeCount`, and `GraphEdgeCount`. These values are useful for startup validation and tests when a host requires graph-backed search to be available.
+
 ## Deeper Docs
 
 Use these when you need design details rather than package onboarding:
@@ -489,6 +622,7 @@ Use these when you need design details rather than package onboarding:
 - [ADR-0001: Runtime boundaries and index lifecycle](docs/ADR/ADR-0001-runtime-boundaries-and-index-lifecycle.md)
 - [ADR-0002: Search ranking and query normalization](docs/ADR/ADR-0002-search-ranking-and-query-normalization.md)
 - [ADR-0003: Reusable chat-client and agent auto-discovery modules](docs/ADR/ADR-0003-reusable-chat-client-and-agent-tool-modules.md)
+- [ADR-0005: Markdown-LD graph search for tool retrieval](docs/ADR/ADR-0005-markdown-ld-graph-search-for-tool-retrieval.md)
 - [Feature spec: Search query normalization and ranking](docs/Features/SearchQueryNormalizationAndRanking.md)
 
 ## Local Development
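
Reviewer note: the README change says `McpGatewayIndexBuildResult` exposes `IsGraphSearchEnabled`, `GraphNodeCount`, and `GraphEdgeCount` for startup validation. The sketch below shows one way a host might fail fast on those values. `GraphIndexSnapshot` and `GraphStartupGuard` are hypothetical stand-ins so the example compiles on its own; the `int` property types and the "at least one node when tools are registered" check are assumptions, not behavior confirmed by this diff.

```csharp
using System;

// Hypothetical stand-in that mirrors the three property names the README lists
// on McpGatewayIndexBuildResult. The int types are an assumption.
public sealed record GraphIndexSnapshot(bool IsGraphSearchEnabled, int GraphNodeCount, int GraphEdgeCount);

public static class GraphStartupGuard
{
    // Throws when a host that requires graph-backed search starts without a usable graph.
    public static void EnsureGraphReady(GraphIndexSnapshot snapshot, int registeredToolCount)
    {
        if (!snapshot.IsGraphSearchEnabled)
        {
            throw new InvalidOperationException(
                "Graph search is required, but the index build reported it as disabled.");
        }

        // Assumption: a non-empty tool catalog should produce at least one graph node.
        if (registeredToolCount > 0 && snapshot.GraphNodeCount == 0)
        {
            throw new InvalidOperationException(
                $"Graph index is empty for {registeredToolCount} registered tools " +
                $"(nodes: {snapshot.GraphNodeCount}, edges: {snapshot.GraphEdgeCount}).");
        }
    }
}
```

In a real host the snapshot values would be copied from the result of the warmup step described under "Optional Warmup"; the same guard can be reused in tests that assert graph-backed search is available.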
