AGENTS.md
+2 Lines changed: 2 additions & 0 deletions
@@ -49,6 +49,8 @@ Target capabilities:
 - Keep the core Markdown-to-graph pipeline deterministic and testable without network access.
 - Keep the core runtime in-memory. Do not introduce localhost, HTTP server, background service, database server, or hosted API dependencies into the production library.
+- Graph construction must support caller-supplied build rules so applications can turn Markdown corpora into structured capability/workflow graphs with groups, typed relationships, related-node expansion, and focused subgraphs instead of only flat document/topic graphs.
+- Graph search APIs must support sparse, high-precision retrieval and explainable related/next-step candidates so callers can select the smallest useful result set and request additional graph-neighbor results later.
 - Treat LLM/entity extraction as an adapter behind a small interface and implement that adapter through `Microsoft.Extensions.AI.IChatClient` from the start.
 - Do not add an embedding dependency to the core graph pipeline. If vector/semantic indexing is added later, expose it as an optional adapter boundary through `Microsoft.Extensions.AI.IEmbeddingGenerator<,>` or a similarly small port, with the concrete provider owned by the host app.
 - It is allowed for the production library to reference `Microsoft.Extensions.AI.Abstractions`; concrete OpenAI/Azure/Foundry providers must remain app-level dependencies unless an ADR says otherwise.
 - `SearchAsync(term)`: case-insensitive search across `schema:name`, `schema:description`, and `schema:keywords`, returning matching graph subjects as `SparqlQueryResult`
+- `SearchFocusedAsync(term)`: sparse graph search that returns primary, related, and next-step matches plus a bounded focused graph snapshot

 All async methods accept an optional `CancellationToken`.
@@ -144,6 +145,61 @@ You do not need to pass a base URI for normal use. Document identity is resolved
 The library uses `urn:managedcode:markdown-ld-kb:/` as an internal default base URI only to create valid RDF IRIs when the source does not provide `KnowledgeDocumentConversionOptions.CanonicalUri`. Pass `new MarkdownKnowledgePipeline(new Uri("https://your-domain/"))` only when you want generated document/entity IRIs to live under your own domain.

+## Capability Graph Rules
+
+Markdown can include deterministic graph rules in front matter. These rules are useful for capability catalogs, tool catalogs, workflow graphs, and any corpus where related and next-step nodes matter more than broad top-N search.
+
+```markdown
+---
+title: Story Delete Tool
+summary: Delete a story after the caller identifies the exact story item.
+graph_groups:
+  - Story tools
+  - Delete operation
+graph_related:
+  - https://kb.example/tools/story-feed-detail/
+graph_next_steps:
+  - https://kb.example/tools/story-comments/
+---
+# Story Delete Tool
+
+Use this capability to remove an existing story.
+```
+
+`graph_groups` creates `kb:memberOf` edges. `graph_related` creates `kb:relatedTo` edges. `graph_next_steps` creates `kb:nextStep` edges. For advanced graphs, use `graph_entities` and `graph_edges` to add explicit nodes and predicates. Absolute IRIs are preserved; plain labels become stable entity IRIs under the pipeline base URI.
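The key-to-predicate mapping described above can be illustrated with a short, language-neutral sketch. It is written in Python rather than the library's .NET code and is not the package's implementation: the helpers `to_iri` and `rules_to_edges`, the `entity/<slug>` IRI layout, and the example document IRI are assumptions made for illustration. Only the three predicates and the default base URI come from the text.

```python
import re
from urllib.parse import urlparse

# Default base URI quoted in the README text.
BASE_URI = "urn:managedcode:markdown-ld-kb:/"

# Front matter key -> predicate, as stated in the README text.
PREDICATES = {
    "graph_groups": "kb:memberOf",
    "graph_related": "kb:relatedTo",
    "graph_next_steps": "kb:nextStep",
}


def to_iri(value, base=BASE_URI):
    """Keep absolute IRIs; turn plain labels into a stable IRI under the base.

    The "entity/<slug>" layout is hypothetical, chosen only for this sketch.
    """
    if urlparse(value).scheme in ("http", "https", "urn"):
        return value
    slug = re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")
    return f"{base}entity/{slug}"


def rules_to_edges(doc_iri, front_matter, base=BASE_URI):
    """Emit (subject, predicate, object) triples from the current document."""
    edges = []
    for key, predicate in PREDICATES.items():
        for value in front_matter.get(key, []):
            edges.append((doc_iri, predicate, to_iri(value, base)))
    return edges


# Front matter from the example above, already parsed into a dict.
front_matter = {
    "graph_groups": ["Story tools", "Delete operation"],
    "graph_related": ["https://kb.example/tools/story-feed-detail/"],
    "graph_next_steps": ["https://kb.example/tools/story-comments/"],
}
# The document IRI below is a made-up value for the example.
edges = rules_to_edges("https://kb.example/tools/story-delete/", front_matter)
```

Because the slug depends only on the label text, the same label always maps to the same IRI, which is one simple way to get the "stable entity IRI" property the text describes.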
+Use `BuildAsync(documents, KnowledgeGraphBuildOptions)` when graph rules are assembled by the host application instead of authored in Markdown front matter.
+
 ## Optional AI Extraction

 AI extraction builds graph facts from entities and assertions returned by an injected `Microsoft.Extensions.AI.IChatClient`. The package stays provider-neutral: it does not reference OpenAI, Azure OpenAI, Anthropic, or any other model-specific SDK. If no chat client is provided, `Auto` mode extracts no facts and reports a diagnostic; choose `Tiktoken` mode explicitly for local token-distance extraction.
+Related Features: `docs/Features/CapabilityGraphRules.md`
+
+---
+
+## Context
+
+The library can already build document metadata, AI-extracted facts, and Tiktoken token-distance graph structure. That is useful for document knowledge graphs, but capability catalogs need more explicit topology. A tool catalog should expose domain groups, operation groups, related tools, and next-step tools without relying on broad semantic top-N retrieval.
+
+Constraints:
+
+- The core library must remain in-memory and network-free.
+- Graph construction must be deterministic and testable.
+- Applications must be able to provide graph rules without hard-coding their domain into the package.
+- Search must support sparse high-confidence results and explainable expansion.
+
+## Decision
+
+Add deterministic capability graph rules to the pipeline.
+
+Rules can come from Markdown front matter or `KnowledgeGraphBuildOptions`. The first shipped front matter keys are:
+
+- `graph_entities`
+- `graph_edges`
+- `graph_groups`
+- `graph_related`
+- `graph_next_steps`
+
+The pipeline merges rule-derived facts with extraction-derived facts before graph construction. The graph API also exposes `SearchFocusedAsync`, which returns primary matches, related matches, next-step matches, and a bounded focused graph snapshot.
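One way to keep such a merge deterministic is to treat both fact sources as sets of triples, union them, and sort the result. The following Python sketch shows that idea only; `merge_facts` is a hypothetical helper, and the ADR does not say how the pipeline deduplicates or orders facts.

```python
def merge_facts(rule_facts, extracted_facts):
    """Union two collections of (subject, predicate, object) triples.

    Duplicates collapse and the output order is stable, so the same inputs
    always produce the same list regardless of argument order.
    """
    return sorted(set(rule_facts) | set(extracted_facts))


# Illustrative identifiers, not real IRIs from the library.
rule_facts = [("doc:a", "kb:nextStep", "doc:b")]
extracted_facts = [
    ("doc:a", "schema:mentions", "ent:x"),
    ("doc:a", "kb:nextStep", "doc:b"),  # duplicate of the rule-derived fact
]
merged = merge_facts(rule_facts, extracted_facts)
```

Sorting before graph construction means a test can compare the merged list directly, without depending on the order in which extractors returned their facts.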
docs/Architecture.md
+10 -1 Lines changed: 10 additions & 1 deletion
@@ -10,7 +10,7 @@ The upstream reference repository is kept as a read-only submodule at `external/
 The core runtime has no localhost, HTTP server, background service, database server, or hosted API dependency. Callers pass files, directories, or in-memory document content into the library, and the library returns in-memory graph/search/query results.

-The graph/search model does not require semantic embeddings. The AI boundary in the core pipeline is `Microsoft.Extensions.AI.IChatClient` for entity/assertion extraction. The library also exposes an explicit experimental Tiktoken mode that creates lexical sparse vectors from `Microsoft.ML.Tokenizers` token IDs and builds a local corpus graph. Its default weighting is corpus-fitted subword TF-IDF, with raw term frequency and binary presence kept as experimental baselines. Tiktoken mode also creates section/segment structure, local TF-IDF keyphrase topics, and explicit front matter entity hint nodes, but it is not a semantic embedding model. If semantic vector search is added later, it should be a separate optional adapter over `Microsoft.Extensions.AI.IEmbeddingGenerator<,>` or an equivalent small port, with the concrete provider owned by the host app.
+The graph/search model does not require semantic embeddings. The AI boundary in the core pipeline is `Microsoft.Extensions.AI.IChatClient` for entity/assertion extraction. The library also exposes an explicit experimental Tiktoken mode that creates lexical sparse vectors from `Microsoft.ML.Tokenizers` token IDs and builds a local corpus graph. Its default weighting is corpus-fitted subword TF-IDF, with raw term frequency and binary presence kept as experimental baselines. Tiktoken mode also creates section/segment structure, local TF-IDF keyphrase topics, and explicit front matter entity hint nodes, but it is not a semantic embedding model. Capability graph rules add deterministic caller-authored entities and edges for groups, related nodes, and next-step nodes so applications can build workflow/capability graphs without relying on a flat document-topic graph. If semantic vector search is added later, it should be a separate optional adapter over `Microsoft.Extensions.AI.IEmbeddingGenerator<,>` or an equivalent small port, with the concrete provider owned by the host app.

 ## System Boundaries
@@ -20,15 +20,18 @@ flowchart LR
 MarkdownFiles --> Loader["In-memory document converter and loader"]
 - Markdown with front matter and headings builds a queryable document metadata graph without requiring fact extraction.
 - Empty Markdown input produces an empty graph without throwing.
 - Explicit Tiktoken mode builds section/segment/topic/entity-hint nodes plus `schema:hasPart`, `schema:about`, `schema:mentions`, and token-distance `kb:relatedTo` edges without network access.
+- Capability graph rules build `kb:memberOf`, `kb:relatedTo`, and `kb:nextStep` workflow edges from Markdown front matter or caller options, and focused search returns primary, related, and next-step result groups.
 - English, Ukrainian, French, and German queries over same-language token graphs produce a higher hit rate than cross-language translated-topic queries.
 - Term frequency, binary presence, and subword TF-IDF token weighting modes are covered by focused and flow tests.
 - SPARQL mutating queries are rejected before execution.
@@ -174,3 +181,5 @@ Coverage requirement: 95%+ line coverage for changed production code.
+Capability graph rules let callers build structured, sparse graphs from Markdown documents. They are intended for tool catalogs, workflow catalogs, and other corpora where a caller needs a small primary result set plus explainable related and next-step candidates.
+- `graph_groups` / `graphGroups` adds group entities and `kb:memberOf` edges from the current document.
+- `graph_related` / `graphRelated` adds `kb:relatedTo` edges from the current document.
+- `graph_next_steps` / `graphNextSteps` adds `kb:nextStep` edges from the current document.
+
+Rule values can be strings or maps. Strings become node labels. Maps can use `id`, `label`, `name`, `type`, `sameAs`, `subject`, `predicate`, `object`, and `target` fields. Absolute IRIs are preserved, and labels become stable entity IRIs under the pipeline base URI.
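The string-or-map distinction can be sketched as a small normalization step. This Python example is illustrative only: `normalize_node` is a hypothetical helper, it reads just the `id`, `label`, and `name` fields, and the fallback order between them is an assumption, since the feature doc lists the accepted fields without stating precedence.

```python
def normalize_node(value):
    """Return (identifier_or_None, label) for a string or map rule value."""
    if isinstance(value, str):
        # Strings become node labels, as the text above states.
        return (None, value)
    if isinstance(value, dict):
        # Assumed precedence for this sketch: label, then name, then id.
        label = value.get("label") or value.get("name") or value.get("id")
        if label is None:
            raise ValueError("map rule value needs an id, label, or name")
        return (value.get("id"), label)
    raise TypeError(f"unsupported rule value: {value!r}")


# The IRI below is a made-up value for the example.
from_string = normalize_node("Story tools")
from_map = normalize_node({"id": "https://kb.example/groups/story", "label": "Story tools"})
```

Normalizing early keeps the later edge-building code free of type checks, whichever concrete field precedence the library actually uses.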
+
+## Search Behavior
+
+`SearchFocusedAsync` returns:
+
+- primary matches from token-distance search when the graph was built in Tiktoken mode
+- primary matches from graph metadata search when no token index is present
+- related matches from direct `kb:relatedTo` edges and shared `kb:memberOf` groups
+- next-step matches from direct `kb:nextStep` edges
+- a bounded focused graph snapshot containing the selected neighborhood
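The expansion from primary matches to related and next-step candidates can be sketched over plain triples. This Python example is a simplified illustration, not the library's algorithm: `focused_expand`, the `limit` parameter, following `kb:relatedTo` only in the outgoing direction, and the node names are all assumptions. Primary matching itself (token-distance or metadata search) is taken as given.

```python
def focused_expand(edges, primary, limit=5):
    """Expand primary matches into related and next-step candidates."""
    primary_set = set(primary)
    # Groups that any primary match belongs to.
    groups = {o for s, p, o in edges if p == "kb:memberOf" and s in primary_set}

    related, next_steps = set(), set()
    for s, p, o in edges:
        if p == "kb:relatedTo" and s in primary_set and o not in primary_set:
            related.add(o)        # direct related edge
        elif p == "kb:memberOf" and o in groups and s not in primary_set:
            related.add(s)        # shares a group with a primary match
        elif p == "kb:nextStep" and s in primary_set and o not in primary_set:
            next_steps.add(o)     # direct next-step edge

    related_list = sorted(related)[:limit]
    next_list = sorted(next_steps)[:limit]
    # Bounded snapshot: only edges whose endpoints are both selected.
    nodes = primary_set | groups | set(related_list) | set(next_list)
    snapshot = sorted(e for e in edges if e[0] in nodes and e[2] in nodes)
    return {
        "primary": list(primary),
        "related": related_list,
        "next_steps": next_list,
        "snapshot": snapshot,
    }


edges = [
    ("delete", "kb:memberOf", "story-tools"),
    ("detail", "kb:memberOf", "story-tools"),
    ("delete", "kb:relatedTo", "feed"),
    ("delete", "kb:nextStep", "comments"),
    ("unrelated", "kb:memberOf", "other"),
]
result = focused_expand(edges, ["delete"])
```

Each candidate is reachable through a named edge or a shared group, so a caller can explain why it was suggested, and the snapshot never grows beyond the selected neighborhood.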
+
+## Test Matrix
+
+| Case | Expected behavior |
+| --- | --- |
+| Capability front matter | Builds `kb:memberOf`, `kb:relatedTo`, and `kb:nextStep` edges |
+| Focused search | Returns a small primary set before related or next-step candidates |
+| Related expansion | Includes same-group and explicit related nodes |
+| Next-step expansion | Includes explicit `kb:nextStep` nodes |
+| Focused export | Mermaid/DOT export includes only selected graph neighborhood |
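The focused export row can be pictured with a minimal Mermaid writer that receives only the selected edges. This Python sketch is an assumption-laden illustration: `to_mermaid` is a hypothetical helper, the generated `n0`, `n1` node ids are arbitrary, and labels containing double quotes are not escaped. It does not reflect the library's actual exporter.

```python
def to_mermaid(edges):
    """Render (subject, predicate, object) triples as a Mermaid flowchart."""
    ids = {}
    lines = ["flowchart LR"]

    def node(name):
        # Declare each node once, with a generated id and a quoted label.
        if name not in ids:
            ids[name] = f"n{len(ids)}"
            lines.append(f'    {ids[name]}["{name}"]')
        return ids[name]

    for s, p, o in edges:
        a, b = node(s), node(o)
        lines.append(f'    {a} -->|"{p}"| {b}')
    return "\n".join(lines)


# Only the focused neighborhood is passed in, so nothing else can appear.
diagram = to_mermaid([("delete", "kb:nextStep", "comments")])
```

Because the writer sees only the edges it is given, restricting the export to the focused neighborhood is a matter of filtering the edge list before rendering.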
+
+## Verification
+
+- `dotnet test --solution MarkdownLd.Kb.slnx --configuration Release -- --treenode-filter "/*/*/*/Capability_graph_front_matter_builds_focused_search_with_related_and_next_step_results" --no-progress`
+- `dotnet test --solution MarkdownLd.Kb.slnx --configuration Release`