Commit 71a384c

1 parent 4e641de commit 71a384c

14 files changed

Lines changed: 1073 additions & 3 deletions

AGENTS.md

Lines changed: 2 additions & 0 deletions
@@ -49,6 +49,8 @@ Target capabilities:
 
 - Keep the core Markdown-to-graph pipeline deterministic and testable without network access.
 - Keep the core runtime in-memory. Do not introduce localhost, HTTP server, background service, database server, or hosted API dependencies into the production library.
+- Graph construction must support caller-supplied build rules so applications can turn Markdown corpora into structured capability/workflow graphs with groups, typed relationships, related-node expansion, and focused subgraphs instead of only flat document/topic graphs.
+- Graph search APIs must support sparse, high-precision retrieval and explainable related/next-step candidates so callers can select the smallest useful result set and request additional graph-neighbor results later.
 - Treat LLM/entity extraction as an adapter behind a small interface and implement that adapter through `Microsoft.Extensions.AI.IChatClient` from the start.
 - Do not add an embedding dependency to the core graph pipeline. If vector/semantic indexing is added later, expose it as an optional adapter boundary through `Microsoft.Extensions.AI.IEmbeddingGenerator<,>` or a similarly small port, with the concrete provider owned by the host app.
 - It is allowed for the production library to reference `Microsoft.Extensions.AI.Abstractions`; concrete OpenAI/Azure/Foundry providers must remain app-level dependencies unless an ADR says otherwise.

Directory.Build.props

Lines changed: 2 additions & 2 deletions
@@ -25,8 +25,8 @@
     <PackageReadmeFile>README.md</PackageReadmeFile>
     <EnablePackageValidation>true</EnablePackageValidation>
     <Product>Markdown-LD Knowledge Bank</Product>
-    <Version>0.1.0</Version>
-    <PackageVersion>0.1.0</PackageVersion>
+    <Version>0.1.1</Version>
+    <PackageVersion>0.1.1</PackageVersion>
   </PropertyGroup>
 
   <PropertyGroup Condition="'$(GITHUB_ACTIONS)' == 'true'">

README.md

Lines changed: 56 additions & 0 deletions
@@ -55,6 +55,7 @@ Tiktoken mode is deterministic and network-free. It uses lexical token-distance
 - `ExecuteSelectAsync(sparql)` — read-only SPARQL SELECT returning `SparqlQueryResult`
 - `ExecuteAskAsync(sparql)` — read-only SPARQL ASK returning `bool`
 - `SearchAsync(term)` — case-insensitive search across `schema:name`, `schema:description`, and `schema:keywords`, returning matching graph subjects as `SparqlQueryResult`
+- `SearchFocusedAsync(term)` — sparse graph search that returns primary, related, and next-step matches plus a bounded focused graph snapshot
 
 All async methods accept an optional `CancellationToken`.
 
@@ -144,6 +145,61 @@ You do not need to pass a base URI for normal use. Document identity is resolved
 
 The library uses `urn:managedcode:markdown-ld-kb:/` as an internal default base URI only to create valid RDF IRIs when the source does not provide `KnowledgeDocumentConversionOptions.CanonicalUri`. Pass `new MarkdownKnowledgePipeline(new Uri("https://your-domain/"))` only when you want generated document/entity IRIs to live under your own domain.
 
+## Capability Graph Rules
+
+Markdown can include deterministic graph rules in front matter. These rules are useful for capability catalogs, tool catalogs, workflow graphs, and any corpus where related and next-step nodes matter more than broad top-N search.
+
+```markdown
+---
+title: Story Delete Tool
+summary: Delete a story after the caller identifies the exact story item.
+graph_groups:
+  - Story tools
+  - Delete operation
+graph_related:
+  - https://kb.example/tools/story-feed-detail/
+graph_next_steps:
+  - https://kb.example/tools/story-comments/
+---
+# Story Delete Tool
+
+Use this capability to remove an existing story.
+```
+
+`graph_groups` creates `kb:memberOf` edges. `graph_related` creates `kb:relatedTo` edges. `graph_next_steps` creates `kb:nextStep` edges. For advanced graphs, use `graph_entities` and `graph_edges` to add explicit nodes and predicates. Absolute IRIs are preserved; plain labels become stable entity IRIs under the pipeline base URI.
+
+```csharp
+using ManagedCode.MarkdownLd.Kb.Pipeline;
+
+internal static class CapabilityGraphDemo
+{
+    public static async Task RunAsync(IReadOnlyList<MarkdownSourceDocument> documents)
+    {
+        var pipeline = new MarkdownKnowledgePipeline(
+            new Uri("https://kb.example/"),
+            extractionMode: MarkdownKnowledgeExtractionMode.Tiktoken);
+
+        var result = await pipeline.BuildAsync(documents);
+        var focused = await result.Graph.SearchFocusedAsync(
+            "remove the selected story from the feed",
+            new KnowledgeGraphFocusedSearchOptions
+            {
+                MaxPrimaryResults = 1,
+                MaxRelatedResults = 3,
+                MaxNextStepResults = 3,
+            });
+
+        var primary = focused.PrimaryMatches[0];
+        var mermaid = KnowledgeGraph.SerializeMermaidFlowchart(focused.FocusedGraph);
+
+        Console.WriteLine(primary.Label);
+        Console.WriteLine(mermaid);
+    }
+}
+```
+
+Use `BuildAsync(documents, KnowledgeGraphBuildOptions)` when graph rules are assembled by the host application instead of authored in Markdown front matter.
+
 ## Optional AI Extraction
 
 AI extraction builds graph facts from entities and assertions returned by an injected `Microsoft.Extensions.AI.IChatClient`. The package stays provider-neutral: it does not reference OpenAI, Azure OpenAI, Anthropic, or any other model-specific SDK. If no chat client is provided, `Auto` mode extracts no facts and reports a diagnostic; choose `Tiktoken` mode explicitly for local token-distance extraction.
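
Reviewer sketch for the README change in this file: the new section states that `graph_groups`, `graph_related`, and `graph_next_steps` produce `kb:memberOf`, `kb:relatedTo`, and `kb:nextStep` edges from the current document. The block below is a minimal, self-contained illustration of that key-to-predicate mapping, not the package's implementation. `FrontMatterRuleSketch`, `ToEdges`, and the tuple shape are hypothetical names introduced here, and the sketch passes values through unchanged instead of minting entity IRIs for plain labels.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; none of these names come from the package.
public static class FrontMatterRuleSketch
{
    // Front matter key -> predicate, in the order the README lists them.
    private static readonly (string Key, string Predicate)[] Rules =
    {
        ("graph_groups", "kb:memberOf"),
        ("graph_related", "kb:relatedTo"),
        ("graph_next_steps", "kb:nextStep"),
    };

    // Emits one (subject, predicate, object) edge per front matter value.
    // Assumption: values are used as-is; a real pipeline would turn plain
    // labels into stable IRIs under its base URI.
    public static List<(string Subject, string Predicate, string Object)> ToEdges(
        string documentIri,
        IReadOnlyDictionary<string, IReadOnlyList<string>> frontMatter)
    {
        var edges = new List<(string Subject, string Predicate, string Object)>();
        foreach (var (key, predicate) in Rules)
        {
            if (!frontMatter.TryGetValue(key, out var values))
            {
                continue; // key not present in this document
            }

            foreach (var value in values)
            {
                edges.Add((documentIri, predicate, value));
            }
        }

        return edges;
    }
}
```

Iterating a fixed array rather than a dictionary keeps the edge order stable across runs, which fits the determinism requirement stated in AGENTS.md.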
docs/ADR/ADR-0004-capability-graph-rules.md

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
+# ADR-0004: Add Deterministic Capability Graph Rules
+
+Status: Accepted
+Date: 2026-04-15
+Related Features: `docs/Features/CapabilityGraphRules.md`
+
+---
+
+## Context
+
+The library can already build document metadata, AI-extracted facts, and Tiktoken token-distance graph structure. That is useful for document knowledge graphs, but capability catalogs need more explicit topology. A tool catalog should expose domain groups, operation groups, related tools, and next-step tools without relying on broad semantic top-N retrieval.
+
+Constraints:
+
+- The core library must remain in-memory and network-free.
+- Graph construction must be deterministic and testable.
+- Applications must be able to provide graph rules without hard-coding their domain into the package.
+- Search must support sparse high-confidence results and explainable expansion.
+
+## Decision
+
+Add deterministic capability graph rules to the pipeline.
+
+Rules can come from Markdown front matter or `KnowledgeGraphBuildOptions`. The first shipped front matter keys are:
+
+- `graph_entities`
+- `graph_edges`
+- `graph_groups`
+- `graph_related`
+- `graph_next_steps`
+
+The pipeline merges rule-derived facts with extraction-derived facts before graph construction. The graph API also exposes `SearchFocusedAsync`, which returns primary matches, related matches, next-step matches, and a bounded focused graph snapshot.
+
+## Diagram
+
+```mermaid
+flowchart LR
+    Markdown["Markdown"] --> Parser["Parser"]
+    Parser --> RuleExtractor["Capability rule extractor"]
+    Parser --> Extractor["Existing extraction mode"]
+    RuleExtractor --> RuleFacts["Rule facts"]
+    Extractor --> ExtractedFacts["Extracted facts"]
+    RuleFacts --> Merge["Fact merge"]
+    ExtractedFacts --> Merge
+    Merge --> Graph["RDF graph"]
+    Graph --> FocusedSearch["Focused search"]
+    FocusedSearch --> Primary["Primary"]
+    FocusedSearch --> Related["Related"]
+    FocusedSearch --> NextStep["Next step"]
+```
+
+## Consequences
+
+### Positive
+
+- Applications can build capability/workflow graphs directly from Markdown.
+- Tool catalogs can retrieve fewer, more relevant primary tools.
+- Related and next-step candidates are explicit and explainable.
+- Focused graph snapshots make graph debugging readable.
+- The library remains provider-neutral and deterministic.
+
+### Negative / Risks
+
+- Capability graph rules add a public API surface that must stay stable.
+- Poor caller-authored rules can still create noisy graphs.
+- Focused search is not a planner; it exposes graph neighborhood candidates for the caller to decide how to use.
+
+## Verification
+
+Testing methodology:
+
+- Build a realistic Markdown tool corpus with capability front matter.
+- Run the real `MarkdownKnowledgePipeline` in Tiktoken mode.
+- Assert primary, related, and next-step matches.
+- Assert focused graph export contains group and edge labels and excludes unrelated nodes.
+
+Commands:
+
+- `dotnet test --solution MarkdownLd.Kb.slnx --configuration Release -- --treenode-filter "/*/*/*/Capability_graph_front_matter_builds_focused_search_with_related_and_next_step_results" --no-progress`
+- `dotnet test --solution MarkdownLd.Kb.slnx --configuration Release`
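
Reviewer sketch for the ADR's Decision section: it says the pipeline merges rule-derived facts with extraction-derived facts before graph construction, but this page does not show the merge code. The block below is one possible deterministic merge, assuming facts can be compared by value. The `Fact` record and `FactMergeSketch` are hypothetical names for illustration and are not taken from the package.

```csharp
using System.Collections.Generic;

// Hypothetical fact shape; the package's own fact types are not shown on this page.
public readonly record struct Fact(string Subject, string Predicate, string Object);

public static class FactMergeSketch
{
    // Keeps the first occurrence of each fact so the result order is stable:
    // rule facts first, then extraction facts that were not already present.
    public static List<Fact> Merge(IEnumerable<Fact> ruleFacts, IEnumerable<Fact> extractedFacts)
    {
        var seen = new HashSet<Fact>();
        var merged = new List<Fact>();

        foreach (var source in new[] { ruleFacts, extractedFacts })
        {
            foreach (var fact in source)
            {
                if (seen.Add(fact)) // Add returns false for duplicates
                {
                    merged.Add(fact);
                }
            }
        }

        return merged;
    }
}
```

Placing rule facts first is a design choice of this sketch: a caller-authored edge wins its position in the output even when an extractor emits the same triple.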

docs/Architecture.md

Lines changed: 10 additions & 1 deletion
@@ -10,7 +10,7 @@ The upstream reference repository is kept as a read-only submodule at `external/
 
 The core runtime has no localhost, HTTP server, background service, database server, or hosted API dependency. Callers pass files, directories, or in-memory document content into the library, and the library returns in-memory graph/search/query results.
 
-The graph/search model does not require semantic embeddings. The AI boundary in the core pipeline is `Microsoft.Extensions.AI.IChatClient` for entity/assertion extraction. The library also exposes an explicit experimental Tiktoken mode that creates lexical sparse vectors from `Microsoft.ML.Tokenizers` token IDs and builds a local corpus graph. Its default weighting is corpus-fitted subword TF-IDF, with raw term frequency and binary presence kept as experimental baselines. Tiktoken mode also creates section/segment structure, local TF-IDF keyphrase topics, and explicit front matter entity hint nodes, but it is not a semantic embedding model. If semantic vector search is added later, it should be a separate optional adapter over `Microsoft.Extensions.AI.IEmbeddingGenerator<,>` or an equivalent small port, with the concrete provider owned by the host app.
+The graph/search model does not require semantic embeddings. The AI boundary in the core pipeline is `Microsoft.Extensions.AI.IChatClient` for entity/assertion extraction. The library also exposes an explicit experimental Tiktoken mode that creates lexical sparse vectors from `Microsoft.ML.Tokenizers` token IDs and builds a local corpus graph. Its default weighting is corpus-fitted subword TF-IDF, with raw term frequency and binary presence kept as experimental baselines. Tiktoken mode also creates section/segment structure, local TF-IDF keyphrase topics, and explicit front matter entity hint nodes, but it is not a semantic embedding model. Capability graph rules add deterministic caller-authored entities and edges for groups, related nodes, and next-step nodes so applications can build workflow/capability graphs without relying on a flat document-topic graph. If semantic vector search is added later, it should be a separate optional adapter over `Microsoft.Extensions.AI.IEmbeddingGenerator<,>` or an equivalent small port, with the concrete provider owned by the host app.
 
 ## System Boundaries
 
@@ -20,15 +20,18 @@ flowchart LR
     MarkdownFiles --> Loader["In-memory document converter and loader"]
     Loader --> Parser["Markdown parser and chunker"]
     Parser --> Router["Extraction mode router"]
+    Parser --> Rules["Capability graph rules"]
     Router --> ChatExtractor["IChatClient extractor"]
     Router --> TokenExtractor["Tiktoken token-distance extractor"]
     Router --> NoExtractor["No fact extractor"]
+    Rules --> Builder
     ChatExtractor --> Builder["RDF graph builder"]
     TokenExtractor --> Builder
     NoExtractor --> Builder
     Builder --> Graph["In-memory knowledge graph"]
     Graph --> Sparql["In-memory SPARQL executor API"]
     Graph --> Search["In-memory graph search API"]
+    Graph --> Focused["Focused graph search API"]
     Graph --> Serializers["Turtle and JSON-LD serializers"]
     Graph --> Merge["Thread-safe graph merge API"]
     IChatClient["Microsoft.Extensions.AI IChatClient"] --> ChatExtractor
@@ -53,6 +56,7 @@ sequenceDiagram
     Pipeline->>Parser: Parse Markdown and front matter
     Parser-->>Pipeline: Parsed document and sections
     Pipeline->>Router: Resolve Auto / None / ChatClient / Tiktoken
+    Pipeline->>Graph: Add deterministic capability graph rules
     alt ChatClient
         Router->>Chat: Structured LLM extraction
         Chat-->>Router: Knowledge extraction result
@@ -78,6 +82,7 @@ flowchart TB
         Parsing["Parsing: front matter, heading sections, wikilinks"]
         Ai["AI: IChatClient extraction port"]
         Tokens["Tiktoken: subword TF-IDF vectors, keyphrase topics, explicit entity hints, and token-distance search"]
+        Rules["Capability rules: graph_entities, graph_edges, graph_groups, graph_related, graph_next_steps"]
         Rdf["RDF: graph construction, namespaces, serialization"]
         Query["Query: SPARQL and graph search"]
     end
@@ -92,6 +97,7 @@ flowchart TB
     FlowTests --> Parsing
     FlowTests --> Ai
     FlowTests --> Tokens
+    FlowTests --> Rules
     FlowTests --> Rdf
     FlowTests --> Query
 ```
@@ -147,6 +153,7 @@ Required first-slice scenarios:
 - Markdown with front matter and headings builds a queryable document metadata graph without requiring fact extraction.
 - Empty Markdown input produces an empty graph without throwing.
 - Explicit Tiktoken mode builds section/segment/topic/entity-hint nodes plus `schema:hasPart`, `schema:about`, `schema:mentions`, and token-distance `kb:relatedTo` edges without network access.
+- Capability graph rules build `kb:memberOf`, `kb:relatedTo`, and `kb:nextStep` workflow edges from Markdown front matter or caller options, and focused search returns primary, related, and next-step result groups.
 - English, Ukrainian, French, and German queries over same-language token graphs produce a higher hit rate than cross-language translated-topic queries.
 - Term frequency, binary presence, and subword TF-IDF token weighting modes are covered by focused and flow tests.
 - SPARQL mutating queries are rejected before execution.
@@ -174,3 +181,5 @@ Coverage requirement: 95%+ line coverage for changed production code.
 - TextRank: `https://aclanthology.org/W04-3252/`
 - RDF/SPARQL dependency decision: `docs/ADR/ADR-0001-rdf-sparql-library.md`
 - LLM extraction dependency decision: `docs/ADR/ADR-0002-llm-extraction-ichatclient.md`
+- Capability graph rules decision: `docs/ADR/ADR-0004-capability-graph-rules.md`
+- Capability graph rules feature: `docs/Features/CapabilityGraphRules.md`

docs/Features/CapabilityGraphRules.md

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+# Capability Graph Rules
+
+## Purpose
+
+Capability graph rules let callers build structured, sparse graphs from Markdown documents. They are intended for tool catalogs, workflow catalogs, and other corpora where a caller needs a small primary result set plus explainable related and next-step candidates.
+
+## Flow
+
+```mermaid
+flowchart LR
+    Source["Markdown documents"] --> Parser["MarkdownDocumentParser"]
+    Parser --> FrontMatter["graph_* front matter"]
+    Parser --> Extraction["None / ChatClient / Tiktoken extraction"]
+    FrontMatter --> Rules["KnowledgeGraphRuleExtractor"]
+    Rules --> RuleFacts["Entity and edge facts"]
+    Extraction --> Facts["Extraction facts"]
+    RuleFacts --> Merge["KnowledgeFactMerger"]
+    Facts --> Merge
+    Merge --> Graph["KnowledgeGraph"]
+    Graph --> Focused["SearchFocusedAsync"]
+    Focused --> Primary["Primary matches"]
+    Focused --> Related["Related matches"]
+    Focused --> Next["Next-step matches"]
+    Focused --> Snapshot["Focused graph snapshot"]
+```
+
+## Front Matter
+
+- `graph_entities` / `graphEntities` adds explicit graph entities.
+- `graph_edges` / `graphEdges` adds explicit assertions.
+- `graph_groups` / `graphGroups` adds group entities and `kb:memberOf` edges from the current document.
+- `graph_related` / `graphRelated` adds `kb:relatedTo` edges from the current document.
+- `graph_next_steps` / `graphNextSteps` adds `kb:nextStep` edges from the current document.
+
+Rule values can be strings or maps. Strings become node labels. Maps can use `id`, `label`, `name`, `type`, `sameAs`, `subject`, `predicate`, `object`, and `target` fields. Absolute IRIs are preserved, and labels become stable entity IRIs under the pipeline base URI.
+
+## Search Behavior
+
+`SearchFocusedAsync` returns:
+
+- primary matches from token-distance search when the graph was built in Tiktoken mode
+- primary matches from graph metadata search when no token index is present
+- related matches from direct `kb:relatedTo` edges and shared `kb:memberOf` groups
+- next-step matches from direct `kb:nextStep` edges
+- a bounded focused graph snapshot containing the selected neighborhood
+
+## Test Matrix
+
+| Case | Expected behavior |
+| --- | --- |
+| Capability front matter | Builds `kb:memberOf`, `kb:relatedTo`, and `kb:nextStep` edges |
+| Focused search | Returns a small primary set before related or next-step candidates |
+| Related expansion | Includes same-group and explicit related nodes |
+| Next-step expansion | Includes explicit `kb:nextStep` nodes |
+| Focused export | Mermaid/DOT export includes only selected graph neighborhood |
+
+## Verification
+
+- `dotnet test --solution MarkdownLd.Kb.slnx --configuration Release -- --treenode-filter "/*/*/*/Capability_graph_front_matter_builds_focused_search_with_related_and_next_step_results" --no-progress`
+- `dotnet test --solution MarkdownLd.Kb.slnx --configuration Release`
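
Reviewer sketch for the Search Behavior section of the feature doc: related matches come from direct `kb:relatedTo` edges and shared `kb:memberOf` groups, and next-step matches come from direct `kb:nextStep` edges. The block below illustrates that expansion over a plain edge list. It is a sketch under assumptions, not the body of `SearchFocusedAsync`: `FocusedExpansionSketch`, `Expand`, and the edge tuple are hypothetical, only outgoing edges from the primary node are followed, and ranking is reduced to "direct edges before same-group members".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only; not the package's search implementation.
public static class FocusedExpansionSketch
{
    public static (List<string> Related, List<string> NextSteps) Expand(
        string primary,
        IReadOnlyList<(string Subject, string Predicate, string Object)> edges,
        int maxRelated,
        int maxNextSteps)
    {
        // Groups the primary node belongs to.
        var groups = edges
            .Where(e => e.Subject == primary && e.Predicate == "kb:memberOf")
            .Select(e => e.Object)
            .ToHashSet(StringComparer.Ordinal);

        // Explicit related targets first, then other members of the same groups.
        var direct = edges
            .Where(e => e.Subject == primary && e.Predicate == "kb:relatedTo")
            .Select(e => e.Object);
        var sameGroup = edges
            .Where(e => e.Predicate == "kb:memberOf"
                && e.Subject != primary
                && groups.Contains(e.Object))
            .Select(e => e.Subject);

        var nextSteps = edges
            .Where(e => e.Subject == primary && e.Predicate == "kb:nextStep")
            .Select(e => e.Object);

        return (TakeDistinct(direct.Concat(sameGroup), maxRelated),
                TakeDistinct(nextSteps, maxNextSteps));
    }

    // Keeps first occurrences in order and stops at the caller's bound.
    private static List<string> TakeDistinct(IEnumerable<string> candidates, int limit)
    {
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var result = new List<string>();
        foreach (var candidate in candidates)
        {
            if (result.Count >= limit)
            {
                break;
            }

            if (seen.Add(candidate))
            {
                result.Add(candidate);
            }
        }

        return result;
    }
}
```

The explicit limits mirror the role that `MaxRelatedResults` and `MaxNextStepResults` play in the README example: the caller decides how much of the neighborhood to pull in.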

src/MarkdownLd.Kb/Pipeline/KnowledgeGraph.Export.cs

Lines changed: 12 additions & 0 deletions
@@ -7,6 +7,18 @@ namespace ManagedCode.MarkdownLd.Kb.Pipeline;
 
 public sealed partial class KnowledgeGraph
 {
+    public static string SerializeMermaidFlowchart(KnowledgeGraphSnapshot snapshot)
+    {
+        ArgumentNullException.ThrowIfNull(snapshot);
+        return BuildMermaidFlowchart(snapshot);
+    }
+
+    public static string SerializeDotGraph(KnowledgeGraphSnapshot snapshot)
+    {
+        ArgumentNullException.ThrowIfNull(snapshot);
+        return BuildDotGraph(snapshot);
+    }
+
     private static KnowledgeGraphSnapshot CreateGraphSnapshot(IEnumerable<Triple> triples)
     {
         var nodes = new Dictionary<string, KnowledgeGraphNode>(StringComparer.Ordinal);
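
Reviewer sketch for the export change: the new public methods delegate to `BuildMermaidFlowchart` and `BuildDotGraph`, whose bodies are not part of this hunk. To make the intended output shape concrete, the block below shows one way an edge list could be rendered as Mermaid flowchart text. It is an assumption-laden illustration, not the library's serializer: `MermaidSketch`, `ToFlowchart`, the `n0`/`n1` node ids, and the edge tuple are all hypothetical, and only double quotes are escaped.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical illustration only; the real serializer works on KnowledgeGraphSnapshot.
public static class MermaidSketch
{
    public static string ToFlowchart(IEnumerable<(string Source, string Label, string Target)> edges)
    {
        var ids = new Dictionary<string, string>();
        var builder = new StringBuilder("flowchart LR\n");

        // Assigns a short, stable id to each distinct node label in first-seen order.
        string IdFor(string label)
        {
            if (!ids.TryGetValue(label, out var id))
            {
                id = "n" + ids.Count;
                ids[label] = id;
            }

            return id;
        }

        foreach (var (source, label, target) in edges)
        {
            builder.Append("    ")
                .Append(Node(IdFor(source), source))
                .Append(" -->|\"").Append(Escape(label)).Append("\"| ")
                .Append(Node(IdFor(target), target))
                .Append('\n');
        }

        return builder.ToString();
    }

    private static string Node(string id, string text) => id + "[\"" + Escape(text) + "\"]";

    // Mermaid accepts #quot; as an entity code for a double quote inside quoted text.
    private static string Escape(string text) => text.Replace("\"", "#quot;");
}
```

Because a focused snapshot is bounded, a simple edge-per-line rendering like this stays readable, which matches the ADR's point that focused graph snapshots make graph debugging readable.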
