Markdown-LD Knowledge Bank is a .NET 10 library for turning Markdown knowledge-base files into an in-memory RDF graph that can be searched, queried with read-only SPARQL, validated with SHACL, exported as RDF, and rendered as a diagram.
The package implements the Markdown-LD knowledge graph workflow as a C# library. The runtime is local and in-memory: no localhost server, Azure Functions host, database server, or hosted graph service is required.
- `ValidateShacl()` — SHACL validation against the built-in Markdown-LD Knowledge Bank shapes
- `ValidateShacl(shapesTurtle)` — SHACL validation against caller-supplied Turtle shapes
- `SearchAsync(term)` — case-insensitive search across `schema:name`, `schema:description`, and `schema:keywords`, returning matching graph subjects as `SparqlQueryResult`
- `SearchFocusedAsync(term)` — sparse graph search that returns primary, related, and next-step matches plus a bounded focused graph snapshot
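
The `SearchAsync(term)` matching rule can be pictured with a minimal, BCL-only sketch: a case-insensitive substring match over the three searchable properties, collecting distinct subjects. The tuple-based triple list, the hypothetical `SearchSubjects` helper, the substring (`OrdinalIgnoreCase`) semantics, and the sample data are assumptions for illustration; the package itself searches its dotNetRDF graph and returns a `SparqlQueryResult`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative triples as (subject, predicate, object) tuples; not the package's data model.
var triples = new List<(string S, string P, string O)>
{
    ("urn:article:rdf-intro", "schema:name", "Intro to RDF"),
    ("urn:article:rdf-intro", "schema:keywords", "graphs, linked data"),
    ("urn:article:shacl", "schema:name", "SHACL Basics"),
    ("urn:article:shacl", "schema:description", "Validating RDF graphs with shapes"),
    ("urn:article:yaml", "schema:name", "Front Matter Tips"),
};

// Hypothetical helper: distinct subjects whose searchable literals contain the term, ignoring case.
static IReadOnlyList<string> SearchSubjects(IEnumerable<(string S, string P, string O)> graph, string term)
{
    // Only these three predicates are searched, mirroring the SearchAsync description.
    var searchable = new HashSet<string> { "schema:name", "schema:description", "schema:keywords" };
    return graph
        .Where(t => searchable.Contains(t.P) && t.O.Contains(term, StringComparison.OrdinalIgnoreCase))
        .Select(t => t.S)
        .Distinct()
        .ToList();
}

foreach (var subject in SearchSubjects(triples, "rdf"))
{
    Console.WriteLine(subject);
}
// Prints urn:article:rdf-intro, then urn:article:shacl (its description mentions "RDF").
```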
Use `BuildAsync(documents, KnowledgeGraphBuildOptions)` when graph rules are assembled by the host application instead of authored in Markdown front matter.
Entities with the same `schema:sameAs` target are merged before assertions are emitted, and assertion endpoints are rewritten to the chosen canonical entity IRI. This keeps the graph sparse and avoids duplicated workflow edges when callers provide multiple labels or IDs for the same outside resource.
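
The merge rule can be sketched in plain C#: group entities by their `schema:sameAs` target, pick one canonical IRI per group, rewrite assertion endpoints, and drop edges that become identical. This is not the package's implementation; choosing the ordinally smallest IRI as canonical, the tuple shapes, and the sample IRIs are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch, not the package's merge code.
// Two local IDs point at the same outside resource.
var entities = new List<(string Iri, string SameAs)>
{
    ("urn:entity:dotnetrdf", "https://github.com/dotnetrdf/dotnetrdf"),
    ("urn:entity:dotnetrdf-alias", "https://github.com/dotnetrdf/dotnetrdf"),
    ("urn:entity:markdig", "https://github.com/xoofx/markdig"),
};
var assertions = new List<(string S, string P, string O)>
{
    ("urn:article:export", "schema:mentions", "urn:entity:dotnetrdf"),
    ("urn:article:export", "schema:mentions", "urn:entity:dotnetrdf-alias"),
    ("urn:article:parse", "schema:mentions", "urn:entity:markdig"),
};

// Map every entity IRI to the canonical IRI of its sameAs group.
// Assumption: the ordinally smallest IRI in each group is canonical.
var canonical = entities
    .GroupBy(e => e.SameAs)
    .SelectMany(g =>
    {
        var chosen = g.Select(e => e.Iri).OrderBy(i => i, StringComparer.Ordinal).First();
        return g.Select(e => (Iri: e.Iri, Canonical: chosen));
    })
    .ToDictionary(x => x.Iri, x => x.Canonical);

string Rewrite(string iri) => canonical.TryGetValue(iri, out var c) ? c : iri;

// Rewrite both endpoints, then collapse edges that became duplicates.
var merged = assertions
    .Select(a => (S: Rewrite(a.S), a.P, O: Rewrite(a.O)))
    .Distinct()
    .ToList();

foreach (var (s, p, o) in merged)
{
    Console.WriteLine($"{s} {p} {o}");
}
// Two edges remain: both export mentions collapse onto urn:entity:dotnetrdf.
```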
## Optional AI Extraction
AI extraction builds graph facts from entities and assertions returned by an injected `Microsoft.Extensions.AI.IChatClient`. The package stays provider-neutral: it does not reference OpenAI, Azure OpenAI, Anthropic, or any other model-specific SDK. If no chat client is provided, `Auto` mode extracts no facts and reports a diagnostic; choose `Tiktoken` mode explicitly for local token-distance extraction.
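
The mode rules above amount to a small decision table. The sketch below encodes it with plain strings; the `ResolveExtraction` name, the `"ChatClient"` and `"None"` strategy labels, the assumption that `Auto` uses the chat client when one is supplied, and the diagnostic text are all hypothetical and do not reflect the package's actual enum or diagnostic API.

```csharp
using System;

// Hypothetical resolver: maps a requested mode plus chat-client availability
// to the extraction strategy that would run and an optional diagnostic.
static (string Strategy, string? Diagnostic) ResolveExtraction(string mode, bool hasChatClient) =>
    (mode, hasChatClient) switch
    {
        // Assumption: Auto uses the injected IChatClient when one is available.
        ("Auto", true) => ("ChatClient", null),
        // Documented behavior: Auto without a chat client extracts nothing and reports a diagnostic.
        ("Auto", false) => ("None", "No IChatClient was provided; Auto mode extracted no facts."),
        // Tiktoken is explicit, local token-distance extraction and needs no chat client.
        ("Tiktoken", _) => ("Tiktoken", null),
        _ => throw new ArgumentOutOfRangeException(nameof(mode), mode, "Unknown extraction mode."),
    };

Console.WriteLine(ResolveExtraction("Auto", hasChatClient: false));
Console.WriteLine(ResolveExtraction("Tiktoken", hasChatClient: false));
```

The point of the table is that `Auto` never silently falls back to local extraction; Tiktoken only runs when a caller asks for it.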
SPARQL execution is intentionally read-only. `SELECT` and `ASK` are allowed; mutation forms such as `INSERT`, `DELETE`, `LOAD`, `CLEAR`, `DROP`, and `CREATE` are rejected before execution.
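
An allow-list by query form is one way to reject mutations before execution. The sketch assumes a simple textual check: skip whole-line comments and the `PREFIX`/`BASE` prologue, read the first keyword, and accept only `SELECT` or `ASK`. `IsReadOnlySparql` is a hypothetical name and this is not claimed to be how the package works; a production guard should rely on a real SPARQL parser (dotNetRDF ships one), because this regex does not handle comments that share a line with other tokens.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the package API.
static bool IsReadOnlySparql(string query)
{
    // Drop whole-line comments so a leading "# note" line is not read as the query form.
    var body = string.Join("\n", query.Split('\n').Where(line => !line.TrimStart().StartsWith('#')));

    // Skip the prologue (BASE / PREFIX declarations) and capture the first keyword after it.
    var match = Regex.Match(
        body,
        @"\A\s*(?:(?:PREFIX\s+[^\s:]*:\s*<[^>]*>|BASE\s*<[^>]*>)\s*)*(?<form>[A-Za-z]+)",
        RegexOptions.IgnoreCase);

    if (!match.Success) return false;

    // Allow-list: only the two forms the README names pass. INSERT, DELETE, LOAD, CLEAR,
    // DROP, CREATE, WITH, and any other keyword are rejected in this sketch.
    var form = match.Groups["form"].Value.ToUpperInvariant();
    return form is "SELECT" or "ASK";
}

Console.WriteLine(IsReadOnlySparql("SELECT ?s WHERE { ?s ?p ?o } LIMIT 10")); // True
Console.WriteLine(IsReadOnlySparql("INSERT DATA { <urn:a> <urn:b> <urn:c> }")); // False
```

An allow-list is safer than a deny-list here: a new or unexpected update form is rejected by default instead of slipping through.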
`ValidateShacl()` runs the built-in Markdown-LD Knowledge Bank shapes through `dotNetRdf.Shacl`. The default shapes check article names, entity names, `schema:sameAs` IRIs, provenance IRIs, and assertion confidence metadata.
Graph assertions remain direct RDF edges for existing SPARQL and search callers. Each assertion also gets RDF reification metadata as an `rdf:Statement` with `rdf:subject`, `rdf:predicate`, `rdf:object`, `kb:confidence`, and optional `prov:wasDerivedFrom`, so SHACL can validate assertion metadata without changing the query shape of the main graph.
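
To show the shape this produces, the following sketch emits Turtle-style lines for one direct edge plus its `rdf:Statement` reification. It is string formatting only: the hypothetical `ReifyAssertion` helper, the blank-node label `_:stmt1`, the `schema:mentions` predicate, the sample IRIs, and writing confidence as a bare Turtle decimal are assumptions, and `kb:` stands for whatever namespace the package actually binds (prefix declarations are omitted).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: returns the direct edge followed by its reification triples.
static List<string> ReifyAssertion(
    string subject, string predicate, string obj, double confidence, string? source, string statementNode)
{
    var lines = new List<string>
    {
        // The direct edge stays unchanged, so existing SPARQL and search callers still match it.
        $"{subject} {predicate} {obj} .",
        // Reification metadata describing the same edge.
        $"{statementNode} a rdf:Statement .",
        $"{statementNode} rdf:subject {subject} .",
        $"{statementNode} rdf:predicate {predicate} .",
        $"{statementNode} rdf:object {obj} .",
        $"{statementNode} kb:confidence {confidence.ToString("0.0##", CultureInfo.InvariantCulture)} .",
    };

    // Provenance is optional, so prov:wasDerivedFrom is only added when a source is known.
    if (source is not null)
    {
        lines.Add($"{statementNode} prov:wasDerivedFrom {source} .");
    }

    return lines;
}

foreach (var line in ReifyAssertion(
    "<urn:article:export>", "schema:mentions", "<urn:entity:dotnetrdf>", 0.9, "<urn:doc:export.md>", "_:stmt1"))
{
    Console.WriteLine(line);
}
```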
Pass custom Turtle shapes when the host application needs stricter rules:
```csharp
// `result` is the MarkdownKnowledgeBuildResult from an earlier build step.
const string Shapes = """
    @prefix sh: <http://www.w3.org/ns/shacl#> .
    @prefix schema: <https://schema.org/> .

    <urn:shape:ArticleDatePublished> a sh:NodeShape ;
        sh:targetClass schema:Article ;
        sh:property [
            sh:path schema:datePublished ;
            sh:minCount 1 ;
            sh:message "Every Article must have a schema:datePublished." ;
        ] .
    """;

// Validate the built graph against the caller-supplied Turtle shapes.
var report = result.Graph.ValidateShacl(Shapes);
```
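
For intuition about what that custom shape enforces, this BCL-only sketch applies the same rule by hand: every subject typed `schema:Article` needs at least one `schema:datePublished` value, and each offender yields the shape's message. It is a simplified stand-in for `sh:targetClass` plus `sh:minCount 1` (for example, it ignores `rdfs:subClassOf` inference) and is not how `dotNetRdf.Shacl` evaluates shapes; `CheckDatePublished` and the sample triples are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var triples = new List<(string S, string P, string O)>
{
    ("urn:article:a", "rdf:type", "schema:Article"),
    ("urn:article:a", "schema:datePublished", "2025-01-15"),
    ("urn:article:b", "rdf:type", "schema:Article"),
    ("urn:entity:x", "rdf:type", "schema:Thing"),
};

// Hypothetical checker mirroring sh:targetClass schema:Article,
// sh:path schema:datePublished, and sh:minCount 1.
static List<(string FocusNode, string Message)> CheckDatePublished(List<(string S, string P, string O)> graph)
{
    // Focus nodes: every subject explicitly typed schema:Article.
    var articles = graph.Where(t => t.P == "rdf:type" && t.O == "schema:Article").Select(t => t.S).Distinct();

    // A violation is any focus node with fewer than one datePublished value.
    return articles
        .Where(a => graph.Count(t => t.S == a && t.P == "schema:datePublished") < 1)
        .Select(a => (FocusNode: a, Message: "Every Article must have a schema:datePublished."))
        .ToList();
}

foreach (var (focus, message) in CheckDatePublished(triples))
{
    Console.WriteLine($"{focus}: {message}");
}
// Prints: urn:article:b: Every Article must have a schema:datePublished.
```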
Invalid caller-authored `sameAs` or provenance values are kept as RDF literals so the SHACL report can expose the exact violation instead of silently dropping the malformed fact.
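
One way to picture that rule: a value that parses as an absolute IRI becomes an IRI node, and anything else is written as an escaped string literal that remains visible to SHACL. This sketch uses `System.Uri.TryCreate` as the validity test, which is an assumption; the package's actual IRI check, and the exact shape constraint that flags the literal, are not specified here. `ToTurtleTerm` is a hypothetical name.

```csharp
using System;

// Hypothetical helper: render a caller-supplied sameAs/provenance value as a Turtle term.
// Assumption: "valid" means System.Uri accepts the value as an absolute URI.
static string ToTurtleTerm(string value)
{
    if (Uri.TryCreate(value, UriKind.Absolute, out var iri))
    {
        return $"<{iri.AbsoluteUri}>";
    }

    // Keep the malformed value as a literal so a validation report can show it verbatim.
    var escaped = value
        .Replace("\\", "\\\\")
        .Replace("\"", "\\\"")
        .Replace("\n", "\\n")
        .Replace("\r", "\\r");
    return $"\"{escaped}\"";
}

Console.WriteLine(ToTurtleTerm("https://schema.org/Article"));
Console.WriteLine(ToTurtleTerm("not a valid iri"));
```

Keeping the bad value as a literal, rather than dropping the triple, is what lets the report name the exact offending value.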
## Export The Graph
```csharp
| `MarkdownKnowledgeBuildResult` | Holds `Documents`, `Facts`, and the built `Graph`. |
| `KnowledgeGraph` | In-memory dotNetRDF graph with query, search, SHACL validation, export, and merge. |
| `KnowledgeGraphSnapshot` | Immutable view with `Nodes` (`KnowledgeGraphNode`) and `Edges` (`KnowledgeGraphEdge`). |
| `KnowledgeGraphShaclValidationReport` | SHACL conformance result with flattened issues and Turtle report output. |
| `KnowledgeGraphShaclValidationIssue` | Caller-readable SHACL result fields such as focus node, path, value, severity, and message. |
| `MarkdownDocument` | Parsed pipeline document: `FrontMatter`, `Body`, and `Sections`. |
| `MarkdownFrontMatter` | Typed front matter model used by the low-level Markdown parser. |
| `KnowledgeExtractionResult` | Merged collection of `KnowledgeEntityFact` and `KnowledgeAssertionFact`. |
- `Markdig` parses Markdown structure.
- `YamlDotNet` parses front matter.
- `dotNetRDF` builds the RDF graph, runs local SPARQL, and serializes Turtle/JSON-LD.
- `dotNetRdf.Shacl` validates built graphs with default or caller-supplied SHACL shapes.
- `Microsoft.Extensions.AI.IChatClient` is the only AI boundary in the core pipeline.
- `Microsoft.ML.Tokenizers` powers the explicit Tiktoken token-distance mode.
- Subword TF-IDF is the default local token weighting because it downweights corpus-common tokens without adding language-specific preprocessing or model runtime dependencies.
- Local topic graph construction uses Unicode word n-gram keyphrases and RDF `schema:DefinedTerm`, `schema:hasPart`, and `schema:about` edges.
- Embeddings are not required for the current graph/search flow; Tiktoken mode uses token IDs, not embedding vectors.
- Microsoft Agent Framework is treated as host-level orchestration, not a core package dependency.
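
To illustrate why TF-IDF downweights corpus-common tokens, here is a small sketch over pre-tokenized documents. It multiplies relative term frequency by a smoothed inverse document frequency, `ln((1 + N) / (1 + df)) + 1`; that smoothing variant, the hypothetical `TfIdf` helper, the sample tokens, and treating each string as one subword token are assumptions, not the package's exact weighting.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative pre-tokenized documents; real input would come from a subword tokenizer.
var documents = new List<string[]>
{
    new[] { "graph", "rdf", "the" },
    new[] { "graph", "shacl", "the" },
    new[] { "graph", "yaml", "the" },
};

// Hypothetical helper: TF-IDF weight for each distinct token of one document.
static Dictionary<string, double> TfIdf(string[] doc, IReadOnlyList<string[]> corpus)
{
    int n = corpus.Count;
    return doc
        .GroupBy(token => token)
        .ToDictionary(
            g => g.Key,
            g =>
            {
                double tf = (double)g.Count() / doc.Length;
                int df = corpus.Count(d => d.Contains(g.Key));
                // Smoothed IDF: a token present in every document gets the minimum factor of 1.
                double idf = Math.Log((1.0 + n) / (1.0 + df)) + 1.0;
                return tf * idf;
            });
}

foreach (var (token, weight) in TfIdf(documents[0], documents).OrderByDescending(kv => kv.Value))
{
    Console.WriteLine($"{token}: {weight:F3}");
}
// "rdf" ranks above "graph" and "the", which appear in every document.
```

All three tokens occur once in the first document, so the difference in weight comes entirely from document frequency.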
See [docs/Architecture.md](docs/Architecture.md), [ADR-0001](docs/ADR/ADR-0001-rdf-sparql-library.md), [ADR-0002](docs/ADR/ADR-0002-llm-extraction-ichatclient.md), [ADR-0003](docs/ADR/ADR-0003-tiktoken-extraction-mode.md), and [Graph SHACL Validation](docs/Features/GraphShaclValidation.md).
## Inspiration And Attribution
This project is inspired by Luis Quintanilla's Markdown-LD / AI Memex work:
- [Zero-Cost Knowledge Graph from Markdown](https://lqdev.me/resources/ai-memex/blog-post-zero-cost-knowledge-graph-from-markdown/) - core idea for using Markdown, YAML front matter, LLM extraction, RDF, JSON-LD, Turtle, and SPARQL