Commit 11ad756

readme
1 parent 8def203 commit 11ad756

1 file changed

Lines changed: 65 additions & 7 deletions

README.md

@@ -9,7 +9,7 @@
 [![.NET 10](https://img.shields.io/badge/.NET-10.0-512BD4?logo=dotnet)](https://dotnet.microsoft.com/)
 [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)

-Markdown-LD Knowledge Bank is a .NET 10 library for turning Markdown knowledge-base files into an in-memory RDF graph that can be searched, queried with read-only SPARQL, exported as RDF, and rendered as a diagram.
+Markdown-LD Knowledge Bank is a .NET 10 library for turning Markdown knowledge-base files into an in-memory RDF graph that can be searched, queried with read-only SPARQL, validated with SHACL, exported as RDF, and rendered as a diagram.

 The package is a C# library implementation of the Markdown-LD knowledge graph workflow. The runtime is local and in-memory: no localhost server, no Azure Functions host, no database server, and no hosted graph service are required.

@@ -31,6 +31,7 @@ flowchart LR
 Merge --> Builder["KnowledgeGraphBuilder\n→ dotNetRDF in-memory graph"]
 Builder --> Search["SearchAsync"]
 Builder --> Sparql["ExecuteSelectAsync\nExecuteAskAsync"]
+Builder --> Shacl["ValidateShacl\nSHACL report"]
 Builder --> Snap["ToSnapshot"]
 Builder --> Diagram["SerializeMermaidFlowchart\nSerializeDotGraph"]
 Builder --> Export["SerializeTurtle\nSerializeJsonLd"]
@@ -54,6 +55,8 @@ Tiktoken mode is deterministic and network-free. It uses lexical token-distance
 - `SerializeJsonLd()` — JSON-LD serialization
 - `ExecuteSelectAsync(sparql)` — read-only SPARQL SELECT returning `SparqlQueryResult`
 - `ExecuteAskAsync(sparql)` — read-only SPARQL ASK returning `bool`
+- `ValidateShacl()` — SHACL validation against the built-in Markdown-LD Knowledge Bank shapes
+- `ValidateShacl(shapesTurtle)` — SHACL validation against caller-supplied Turtle shapes
 - `SearchAsync(term)` — case-insensitive search across `schema:name`, `schema:description`, and `schema:keywords`, returning matching graph subjects as `SparqlQueryResult`
 - `SearchFocusedAsync(term)` — sparse graph search that returns primary, related, and next-step matches plus a bounded focused graph snapshot

@@ -200,6 +203,8 @@ internal static class CapabilityGraphDemo

 Use `BuildAsync(documents, KnowledgeGraphBuildOptions)` when graph rules are assembled by the host application instead of authored in Markdown front matter.

+Entities with the same `schema:sameAs` target are merged before assertions are emitted, and assertion endpoints are rewritten to the chosen canonical entity IRI. This keeps the graph sparse and avoids duplicated workflow edges when callers provide multiple labels or IDs for the same outside resource.
+
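Editor's sketch (not part of the commit): the `schema:sameAs` merge described in the added paragraph can be pictured roughly as below. The `Entity` and `Assertion` records, the `SameAsMerge` class, and the rule for picking the canonical entity (ordinally smallest ID in each group) are all assumptions made for illustration; the package's own fact types and selection rule may differ.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes for illustration only; not the package's fact types.
public sealed record Entity(string Id, string Label, string? SameAs);
public sealed record Assertion(string SubjectId, string Predicate, string ObjectId);

public static class SameAsMerge
{
    // Maps every entity ID to the canonical ID of its sameAs group.
    // Assumption: the canonical entity is the ordinally smallest ID in the group.
    public static Dictionary<string, string> BuildCanonicalMap(IEnumerable<Entity> entities)
    {
        var map = new Dictionary<string, string>(StringComparer.Ordinal);
        var groups = entities.GroupBy(
            // Entities without a sameAs target form their own single-member group.
            e => e.SameAs is null ? "id:" + e.Id : "sameAs:" + e.SameAs,
            StringComparer.Ordinal);

        foreach (var group in groups)
        {
            var canonical = group
                .Select(e => e.Id)
                .OrderBy(id => id, StringComparer.Ordinal)
                .First();

            foreach (var entity in group)
            {
                map[entity.Id] = canonical;
            }
        }

        return map;
    }

    // Rewrites assertion endpoints to canonical IDs and drops exact duplicates.
    public static List<Assertion> Rewrite(
        IEnumerable<Assertion> assertions,
        IReadOnlyDictionary<string, string> canonical)
    {
        string Resolve(string id) => canonical.TryGetValue(id, out var c) ? c : id;

        return assertions
            .Select(a => a with { SubjectId = Resolve(a.SubjectId), ObjectId = Resolve(a.ObjectId) })
            .Distinct() // records compare by value, so rewritten duplicates collapse
            .ToList();
    }
}
```

With two entities that share one `sameAs` target, two assertions that differ only in which of those entities they point at collapse into a single edge after `Rewrite`.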
 ## Optional AI Extraction

 AI extraction builds graph facts from entities and assertions returned by an injected `Microsoft.Extensions.AI.IChatClient`. The package stays provider-neutral: it does not reference OpenAI, Azure OpenAI, Anthropic, or any other model-specific SDK. If no chat client is provided, `Auto` mode extracts no facts and reports a diagnostic; choose `Tiktoken` mode explicitly for local token-distance extraction.
@@ -328,6 +333,56 @@ LIMIT 100

 SPARQL execution is intentionally read-only. `SELECT` and `ASK` are allowed; mutation forms such as `INSERT`, `DELETE`, `LOAD`, `CLEAR`, `DROP`, and `CREATE` are rejected before execution.

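Editor's sketch (not part of the commit): a host application could pre-screen query text along the lines of the read-only rule in the context paragraph above. This is not the package's implementation; `ReadOnlySparqlGuard` and `IsReadOnly` are hypothetical names. The keyword scan is deliberately conservative: it also rejects a read-only query that merely mentions an update keyword inside a string literal or IRI, so a parser-based check would likely be more precise.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the package API.
public static class ReadOnlySparqlGuard
{
    // Update keywords named in the README paragraph.
    private static readonly Regex MutationKeyword = new(
        @"\b(INSERT|DELETE|LOAD|CLEAR|DROP|CREATE)\b",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Query forms the README describes as allowed.
    private static readonly Regex AllowedForm = new(
        @"\b(SELECT|ASK)\b",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns true only when the text contains SELECT or ASK and none of the
    // update keywords anywhere in the query string.
    public static bool IsReadOnly(string sparql)
    {
        if (string.IsNullOrWhiteSpace(sparql))
        {
            return false;
        }

        return AllowedForm.IsMatch(sparql) && !MutationKeyword.IsMatch(sparql);
    }
}
```

A caller would run `ReadOnlySparqlGuard.IsReadOnly(query)` before `ExecuteSelectAsync` or `ExecuteAskAsync` and surface its own error when the check fails.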
+## Validate With SHACL
+
+```csharp
+using ManagedCode.MarkdownLd.Kb.Pipeline;
+
+internal static class ShaclValidationDemo
+{
+    public static void Run(MarkdownKnowledgeBuildResult result)
+    {
+        KnowledgeGraphShaclValidationReport report = result.ValidateShacl();
+
+        if (!report.Conforms)
+        {
+            foreach (var issue in report.Results)
+            {
+                Console.WriteLine(issue.FocusNode);
+                Console.WriteLine(issue.Message);
+            }
+        }
+
+        Console.WriteLine(report.ReportTurtle);
+    }
+}
+```
+
+`ValidateShacl()` uses default Markdown-LD Knowledge Bank shapes backed by `dotNetRdf.Shacl`. The default shapes validate article names, entity names, `schema:sameAs` IRIs, provenance IRIs, and assertion confidence metadata.
+
+Graph assertions remain direct RDF edges for existing SPARQL and search callers. Each assertion also gets RDF reification metadata as an `rdf:Statement` with `rdf:subject`, `rdf:predicate`, `rdf:object`, `kb:confidence`, and optional `prov:wasDerivedFrom`, so SHACL can validate assertion metadata without changing the query shape of the main graph.
+
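Editor's sketch (not part of the commit): the "direct edge plus reification" layout described in the added paragraph can be written out as N-Triples lines. The `ReifiedAssertionSketch` class is hypothetical, the `kb:` namespace IRI is a placeholder because the real one is not shown in this diff, and typing the confidence as `xsd:decimal` is an assumption.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical emitter for illustration; not the package's graph builder.
public static class ReifiedAssertionSketch
{
    private const string Rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    private const string Prov = "http://www.w3.org/ns/prov#";
    private const string Xsd = "http://www.w3.org/2001/XMLSchema#";

    // Assumption: placeholder namespace standing in for the package's kb: prefix.
    private const string Kb = "urn:example:kb#";

    public static List<string> ToNTriples(
        string subject,
        string predicate,
        string obj,
        decimal confidence,
        string? derivedFrom,
        string statementLabel)
    {
        var stmt = "_:" + statementLabel;
        var score = confidence.ToString(CultureInfo.InvariantCulture);

        var lines = new List<string>
        {
            // The direct edge that existing SPARQL and search callers keep using.
            $"<{subject}> <{predicate}> <{obj}> .",
            // The reification node that carries the metadata SHACL can inspect.
            $"{stmt} <{Rdf}type> <{Rdf}Statement> .",
            $"{stmt} <{Rdf}subject> <{subject}> .",
            $"{stmt} <{Rdf}predicate> <{predicate}> .",
            $"{stmt} <{Rdf}object> <{obj}> .",
            $"{stmt} <{Kb}confidence> \"{score}\"^^<{Xsd}decimal> .",
        };

        // Provenance is optional, matching the README wording.
        if (derivedFrom is not null)
        {
            lines.Add($"{stmt} <{Prov}wasDerivedFrom> <{derivedFrom}> .");
        }

        return lines;
    }
}
```

Because the metadata hangs off a separate statement node, a query that only matches the first line keeps returning the same rows whether or not the reification triples are present.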
+Pass custom Turtle shapes when the host application needs stricter rules:
+
+```csharp
+const string Shapes = """
+@prefix sh: <http://www.w3.org/ns/shacl#> .
+@prefix schema: <https://schema.org/> .
+
+<urn:shape:ArticleDatePublished> a sh:NodeShape ;
+sh:targetClass schema:Article ;
+sh:property [
+sh:path schema:datePublished ;
+sh:minCount 1 ;
+sh:message "Every Article must have a schema:datePublished." ;
+] .
+""";
+
+var report = result.Graph.ValidateShacl(Shapes);
+```
+
+Invalid caller-authored `sameAs` or provenance values are kept as RDF literals so the SHACL report can expose the exact violation instead of silently dropping the malformed fact.
+
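Editor's sketch (not part of the commit): one way to picture "kept as RDF literals" is a term chooser that emits an IRI when the caller's value parses as one and a plain literal otherwise. `SameAsTermSketch` is a hypothetical name, and treating only absolute `http`/`https` values as valid IRIs is an assumption; the package's actual validity rule is not shown in this diff.

```csharp
using System;

// Hypothetical helper for illustration; not the package's graph builder.
public static class SameAsTermSketch
{
    // Assumption: only absolute http/https values count as valid IRIs here.
    public static string ToObjectTerm(string value)
    {
        if (Uri.TryCreate(value, UriKind.Absolute, out var uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps))
        {
            return $"<{uri.AbsoluteUri}>";
        }

        // Malformed input survives as a literal instead of being dropped,
        // so a shape that requires an IRI node can report it.
        return "\"" + Escape(value) + "\"";
    }

    // Builds one N-Triples line for a schema:sameAs fact.
    public static string ToSameAsTriple(string subjectIri, string value) =>
        $"<{subjectIri}> <https://schema.org/sameAs> {ToObjectTerm(value)} .";

    // Escapes the characters that N-Triples string literals require escaping.
    private static string Escape(string text) =>
        text.Replace("\\", "\\\\")
            .Replace("\"", "\\\"")
            .Replace("\n", "\\n")
            .Replace("\r", "\\r");
}
```

A SHACL property shape on `schema:sameAs` with `sh:nodeKind sh:IRI` would then flag the literal form and report the offending value.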
 ## Export The Graph

 ```csharp
@@ -395,8 +450,10 @@ var rows = await shared.Graph.SearchAsync("rdf");
 |---|---|
 | `MarkdownKnowledgePipeline` | Entry point. Orchestrates parsing, extraction, merge, and graph build. |
 | `MarkdownKnowledgeBuildResult` | Holds `Documents`, `Facts`, and the built `Graph`. |
-| `KnowledgeGraph` | In-memory dotNetRDF graph with query, search, export, and merge. |
+| `KnowledgeGraph` | In-memory dotNetRDF graph with query, search, SHACL validation, export, and merge. |
 | `KnowledgeGraphSnapshot` | Immutable view with `Nodes` (`KnowledgeGraphNode`) and `Edges` (`KnowledgeGraphEdge`). |
+| `KnowledgeGraphShaclValidationReport` | SHACL conformance result with flattened issues and Turtle report output. |
+| `KnowledgeGraphShaclValidationIssue` | Caller-readable SHACL result fields such as focus node, path, value, severity, and message. |
 | `MarkdownDocument` | Pipeline parsed document: `FrontMatter`, `Body`, and `Sections`. |
 | `MarkdownFrontMatter` | Typed front matter model used by the low-level Markdown parser. |
 | `KnowledgeExtractionResult` | Merged collection of `KnowledgeEntityFact` and `KnowledgeAssertionFact`. |
@@ -459,14 +516,15 @@ Markdown links, wikilinks, and arrow assertions are not implicitly converted int
 - `Markdig` parses Markdown structure.
 - `YamlDotNet` parses front matter.
 - `dotNetRDF` builds the RDF graph, runs local SPARQL, and serializes Turtle/JSON-LD.
+- `dotNetRdf.Shacl` validates built graphs with default or caller-supplied SHACL shapes.
 - `Microsoft.Extensions.AI.IChatClient` is the only AI boundary in the core pipeline.
 - `Microsoft.ML.Tokenizers` powers the explicit Tiktoken token-distance mode.
 - Subword TF-IDF is the default local token weighting because it downweights corpus-common tokens without adding language-specific preprocessing or model runtime dependencies.
 - Local topic graph construction uses Unicode word n-gram keyphrases and RDF `schema:DefinedTerm`, `schema:hasPart`, and `schema:about` edges.
 - Embeddings are not required for the current graph/search flow; Tiktoken mode uses token IDs, not embedding vectors.
 - Microsoft Agent Framework is treated as host-level orchestration, not a core package dependency.

-See [docs/Architecture.md](docs/Architecture.md), [ADR-0001](docs/ADR/ADR-0001-rdf-sparql-library.md), [ADR-0002](docs/ADR/ADR-0002-llm-extraction-ichatclient.md), and [ADR-0003](docs/ADR/ADR-0003-tiktoken-extraction-mode.md).
+See [docs/Architecture.md](docs/Architecture.md), [ADR-0001](docs/ADR/ADR-0001-rdf-sparql-library.md), [ADR-0002](docs/ADR/ADR-0002-llm-extraction-ichatclient.md), [ADR-0003](docs/ADR/ADR-0003-tiktoken-extraction-mode.md), and [Graph SHACL Validation](docs/Features/GraphShaclValidation.md).

 ## Inspiration And Attribution

@@ -476,7 +534,7 @@ This project is inspired by Luis Quintanilla's Markdown-LD / AI Memex work:
 - [Zero-Cost Knowledge Graph from Markdown](https://lqdev.me/resources/ai-memex/blog-post-zero-cost-knowledge-graph-from-markdown/) - core idea for using Markdown, YAML front matter, LLM extraction, RDF, JSON-LD, Turtle, and SPARQL
 - [Project Report: Entity Extraction & RDF Pipeline](https://lqdev.me/resources/ai-memex/project-report-entity-extraction-rdf-pipeline/) - extraction and RDF pipeline context
 - [W3C SPARQL Federated Query](https://github.com/w3c/sparql-federated-query) - SPARQL federation reference material
-- [dotNetRDF](https://github.com/dotnetrdf/dotnetrdf) - RDF/SPARQL engine used by this C# implementation
+- [dotNetRDF](https://github.com/dotnetrdf/dotnetrdf) - RDF/SPARQL/SHACL engine used by this C# implementation

 The upstream reference repository is kept as a read-only submodule under `external/lqdev-markdown-ld-kb`.

@@ -494,8 +552,8 @@ Coverage is collected through `Microsoft.Testing.Extensions.CodeCoverage`. Cober

 Current verification:

-- tests: 77 passed, 0 failed
-- line coverage: 96.30%
-- branch coverage: 85.23%
+- tests: 87 passed, 0 failed
+- line coverage: 96.76%
+- branch coverage: 87.12%
 - target framework: .NET 10
 - package version: 0.0.1
