Commit b1fc4e4

federation
1 parent 9c32cc1 commit b1fc4e4

55 files changed

Lines changed: 4931 additions & 93 deletions

Directory.Build.props

Lines changed: 2 additions & 2 deletions
@@ -25,8 +25,8 @@
     <PackageReadmeFile>README.md</PackageReadmeFile>
     <EnablePackageValidation>true</EnablePackageValidation>
     <Product>Markdown-LD Knowledge Bank</Product>
-    <Version>0.1.6</Version>
-    <PackageVersion>0.1.6</PackageVersion>
+    <Version>0.1.7</Version>
+    <PackageVersion>$(Version)</PackageVersion>
   </PropertyGroup>

   <PropertyGroup Condition="'$(GITHUB_ACTIONS)' == 'true'">

Directory.Packages.props

Lines changed: 7 additions & 0 deletions
@@ -4,11 +4,18 @@
   </PropertyGroup>
   <ItemGroup>
     <PackageVersion Include="dotNetRdf" Version="3.5.1" />
+    <PackageVersion Include="dotNetRdf.Dynamic" Version="3.5.1" />
+    <PackageVersion Include="dotNetRdf.Inferencing" Version="3.5.1" />
+    <PackageVersion Include="dotNetRdf.Ldf" Version="3.5.1" />
     <PackageVersion Include="dotNetRdf.Ontology" Version="3.5.1" />
+    <PackageVersion Include="dotNetRdf.Query.FullText" Version="3.5.1" />
     <PackageVersion Include="dotNetRdf.Skos" Version="3.5.1" />
     <PackageVersion Include="dotNetRdf.Shacl" Version="3.5.1" />
     <PackageVersion Include="DotNet.ReproducibleBuilds" Version="2.0.2" />
     <PackageVersion Include="Markdig" Version="1.1.3" />
+    <PackageVersion Include="ManagedCode.Storage.Core" Version="10.0.5" />
+    <PackageVersion Include="ManagedCode.Storage.FileSystem" Version="10.0.5" />
+    <PackageVersion Include="ManagedCode.Storage.VirtualFileSystem" Version="10.0.5" />
     <PackageVersion Include="Microsoft.Extensions.AI" Version="10.5.0" />
     <PackageVersion Include="Microsoft.Extensions.AI.Abstractions" Version="10.5.0" />
     <PackageVersion Include="Microsoft.Bcl.Memory" Version="10.0.6" />

README.md

Lines changed: 163 additions & 2 deletions
@@ -53,6 +53,12 @@ Tiktoken mode is deterministic and network-free. It uses lexical token-distance
 - `SerializeDotGraph()` — Graphviz DOT diagram
 - `SerializeTurtle()` — Turtle RDF serialization
 - `SerializeJsonLd()` — JSON-LD serialization
+- `SaveToStoreAsync(store, location, options)` — persist the graph through a graph-store abstraction
+- `SaveToFileAsync(path, options)` — persist the graph as RDF
+- `LoadFromStoreAsync(store, location, options)` — load a graph from a graph-store abstraction
+- `LoadFromFileAsync(path, options)` — load a graph from one RDF file
+- `LoadFromDirectoryAsync(path, options)` — load and merge RDF files from a directory
+- `LoadFromLinkedDataFragmentsAsync(endpoint, options)` — materialize a Linked Data Fragments source into a local graph
 - `ExecuteSelectAsync(sparql)` — read-only SPARQL SELECT returning `SparqlQueryResult`
 - `ExecuteAskAsync(sparql)` — read-only SPARQL ASK returning `bool`
 - `ExecuteFederatedSelectAsync(sparql, options)` — explicit federated read-only SPARQL SELECT with endpoint diagnostics
@@ -61,6 +67,9 @@ Tiktoken mode is deterministic and network-free. It uses lexical token-distance
 - `ValidateShacl(shapesTurtle)` — SHACL validation against caller-supplied Turtle shapes
 - `SearchAsync(term)` — case-insensitive search across `schema:name`, `schema:description`, and `schema:keywords`, returning matching graph subjects as `SparqlQueryResult`
 - `SearchFocusedAsync(term)` — sparse graph search that returns primary, related, and next-step matches plus a bounded focused graph snapshot
+- `MaterializeInferenceAsync(options)` — explicit RDFS / SKOS / N3-rule materialization
+- `BuildFullTextIndexAsync(options)` — optional Lucene-backed graph full-text index
+- `ToDynamicSnapshot()` — optional dynamic graph access over dotNetRDF dynamic types

 All async methods accept an optional `CancellationToken`.

@@ -161,7 +170,7 @@ internal static class FileGraphDemo
 }
 ```

-`KnowledgeSourceDocumentConverter` supports Markdown and other text-like knowledge inputs: `.md`, `.markdown`, `.mdx`, `.txt`, `.text`, `.log`, `.csv`, `.json`, `.jsonl`, `.yaml`, and `.yml`. Non-Markdown files are accepted as text sources and run through the same parsing, extraction, and graph build pipeline.
+`KnowledgeSourceDocumentConverter` supports Markdown and other text-like knowledge inputs: `.md`, `.markdown`, `.mdx`, `.txt`, `.text`, `.log`, `.csv`, `.json`, `.jsonl`, `.yaml`, and `.yml`. Files with unknown or missing extensions are still accepted when their bytes decode as text, and they are treated as `text/plain`. Truly unreadable binary files are either skipped during directory loads or fail explicitly with `InvalidDataException` when the caller disables skipping.

 You do not need to pass a base URI for normal use. Document identity is resolved in this order:

@@ -256,6 +265,99 @@ var result = await pipeline.BuildAsync(
     });
 ```

+## Graph Runtime Lifecycle
+
+Once a Markdown file or directory has been built into a `KnowledgeGraph`, the same public runtime can persist it through a graph-store abstraction, reload it, materialize inference, expose a full-text index, expose a dynamic snapshot, or materialize a Linked Data Fragments source into the same local graph model.
+
+The runtime now uses `dotNetRdf`, `dotNetRdf.Ontology`, `dotNetRdf.Skos`, `dotNetRdf.Inferencing`, `dotNetRdf.Dynamic`, `dotNetRdf.Query.FullText`, and `dotNetRdf.Ldf` through repository-owned adapters instead of a hand-rolled RDF stack. RDF serialization remains repository-owned; filesystem/blob access is delegated to `ManagedCode.Storage`.
+
+```csharp
+using ManagedCode.MarkdownLd.Kb.Pipeline;
+
+internal static class GraphRuntimeLifecycleDemo
+{
+    private const string FilePath = "/absolute/path/to/content/query-federation-runbook.md";
+    private const string TurtlePath = "/absolute/path/to/output/runtime-graph.ttl";
+    private const string StorageLocation = "graphs/runtime/runtime-graph.ttl";
+    private const string SchemaPath = "/absolute/path/to/runtime-schema.ttl";
+    private const string RulesPath = "/absolute/path/to/runtime-rules.n3";
+
+    public static async Task RunAsync()
+    {
+        var pipeline = new MarkdownKnowledgePipeline(new Uri("https://kb.example/"));
+        var built = await pipeline.BuildFromFileAsync(FilePath);
+        var memoryStore = new InMemoryKnowledgeGraphStore();
+
+        await built.Graph.SaveToStoreAsync(memoryStore, StorageLocation);
+        var fromMemory = await KnowledgeGraph.LoadFromStoreAsync(memoryStore, StorageLocation);
+        await built.Graph.SaveToFileAsync(TurtlePath);
+        var reloaded = await KnowledgeGraph.LoadFromFileAsync(TurtlePath);
+
+        var inference = await fromMemory.MaterializeInferenceAsync(new KnowledgeGraphInferenceOptions
+        {
+            AdditionalSchemaFilePaths = [SchemaPath],
+            AdditionalN3RuleFilePaths = [RulesPath],
+        });
+
+        using var fullText = await inference.Graph.BuildFullTextIndexAsync();
+        var matches = await fullText.SearchAsync("federated wikidata workflow");
+
+        dynamic dynamicGraph = inference.Graph.ToDynamicSnapshot();
+        dynamic dynamicDocument = dynamicGraph["https://kb.example/query-federation-runbook/"];
+
+        Console.WriteLine(inference.InferredTripleCount);
+        Console.WriteLine(matches.Count);
+        Console.WriteLine(dynamicDocument["https://schema.org/name"].Count);
+        Console.WriteLine(reloaded.TripleCount);
+    }
+}
+```
+
+The built-in graph-store implementations are:
+
+- `FileSystemKnowledgeGraphStore` — local file paths, internally backed by `ManagedCode.Storage.FileSystem`
+- `StorageKnowledgeGraphStore` — any configured `ManagedCode.Storage.Core.IStorage` backend, including blob/object providers
+- `InMemoryKnowledgeGraphStore` — process-local graph persistence without files
+
+DI helpers are available for hosts that want one or more configured stores:
+
+```csharp
+using ManagedCode.MarkdownLd.Kb.Pipeline;
+using ManagedCode.Storage.FileSystem;
+using ManagedCode.Storage.FileSystem.Extensions;
+using Microsoft.Extensions.DependencyInjection;
+
+var services = new ServiceCollection();
+
+services.AddFileSystemKnowledgeGraphStoreAsDefault(options =>
+{
+    options.BaseFolder = "/absolute/path/to/storage-root";
+    options.CreateContainerIfNotExists = true;
+});
+
+services.AddFileSystemStorage("archive", options =>
+{
+    options.BaseFolder = "/absolute/path/to/archive-root";
+    options.CreateContainerIfNotExists = true;
+});
+services.AddKeyedStorageBackedKnowledgeGraphStore<IFileSystemStorage>("archive");
+```
+
+Use `new InMemoryKnowledgeGraphStore()` for process-local persistence, or `AddVirtualFileSystemKnowledgeGraphStore()` after `AddVirtualFileSystem(...)` when the host already standardizes on a VFS overlay.
+
+The same runtime can also materialize a read-only Triple Pattern Fragments source into a local graph:
+
+```csharp
+using ManagedCode.MarkdownLd.Kb.Pipeline;
+
+var ldfGraph = await KnowledgeGraph.LoadFromLinkedDataFragmentsAsync(
+    new Uri("https://example.org/tpf"));
+```
+
+If the host needs custom transport settings, pass a caller-owned `HttpClient` through `KnowledgeGraphLinkedDataFragmentsOptions`. Host apps may source that client from `IHttpClientFactory`; the core library intentionally accepts the configured client instance instead of depending on `IHttpClientFactory`.
+
+After materialization, callers use the normal local `ExecuteSelectAsync`, `ExecuteAskAsync`, `SearchAsync`, `ValidateShacl`, persistence, and inference APIs.
+
 ## Optional AI Extraction

 AI extraction builds graph facts from entities and assertions returned by an injected `Microsoft.Extensions.AI.IChatClient`. The package stays provider-neutral: it does not reference OpenAI, Azure OpenAI, Anthropic, or any other model-specific SDK. If no chat client is provided, `Auto` mode extracts no facts and reports a diagnostic; choose `Tiktoken` mode explicitly for local token-distance extraction.
@@ -386,8 +488,20 @@ LIMIT 100

 SPARQL execution is intentionally read-only. `SELECT` and `ASK` are allowed; mutation forms such as `INSERT`, `DELETE`, `LOAD`, `CLEAR`, `DROP`, and `CREATE` are rejected before execution.
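
As a rough illustration of rejecting update forms before execution, the sketch below scans the query text for the keywords named above. `ReadOnlySparqlGuard` and its members are hypothetical names, not part of the package, and the keyword scan is only an assumption about one simple way to do this. The library's actual check is not shown in this commit and may rely on a real SPARQL parser instead.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of the library's public API.
internal static class ReadOnlySparqlGuard
{
    // The update keywords that the README lists as rejected.
    private static readonly Regex MutationKeyword = new(
        @"\b(INSERT|DELETE|LOAD|CLEAR|DROP|CREATE)\b",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns true when no update keyword appears anywhere in the query text.
    // Deliberately conservative: a keyword inside a literal, IRI, or variable name also counts.
    public static bool IsReadOnly(string sparql)
    {
        ArgumentNullException.ThrowIfNull(sparql);
        return !MutationKeyword.IsMatch(sparql);
    }

    // Throws before any execution would start when the query looks like an update.
    public static void EnsureReadOnly(string sparql)
    {
        if (!IsReadOnly(sparql))
        {
            throw new InvalidOperationException("Only read-only SPARQL (SELECT / ASK) is allowed.");
        }
    }
}
```

Under these assumptions, a caller would run `ReadOnlySparqlGuard.EnsureReadOnly(query)` before handing the text to any executor. A text scan like this can reject harmless queries (for example one that uses a variable named `?load`), so a parser-based check is the more precise option where one is available.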

+The supported query surface is intentionally narrow:
+
+- local read-only queries: `ExecuteSelectAsync` for `SELECT` and `ExecuteAskAsync` for `ASK`
+- explicit federated read-only queries: `ExecuteFederatedSelectAsync` for `SELECT` and `ExecuteFederatedAskAsync` for `ASK`
+- unsupported query types: `CONSTRUCT`, `DESCRIBE`, and all mutation/update forms
+
 The default public SPARQL contract remains local and in-memory. Local `ExecuteSelectAsync` / `ExecuteAskAsync` reject top-level `SERVICE` clauses. Federated queries are explicit through `ExecuteFederatedSelectAsync` / `ExecuteFederatedAskAsync`, require an allowlist or named profile, and currently ship caller-visible endpoint diagnostics through `FederatedSparqlSelectResult` / `FederatedSparqlAskResult`.

+This follows the official Wikidata Query Service federation model, where cross-endpoint access is expressed with SPARQL `SERVICE` clauses and endpoint policy stays explicit at the caller boundary. The library ships ready-made profiles for the WDQS main/scholarly split introduced on 9 May 2025:
+
+- `FederatedSparqlProfiles.WikidataMain` allowlists `https://query.wikidata.org/sparql`
+- `FederatedSparqlProfiles.WikidataScholarly` allowlists `https://query-scholarly.wikidata.org/sparql`
+- `FederatedSparqlProfiles.WikidataMainAndScholarly` allowlists both endpoints for multi-endpoint federated queries
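
To make the allowlist idea concrete, here is a minimal sketch that pulls `SERVICE` endpoint IRIs out of a query and reports any that are not on a caller-supplied list. `ServiceAllowlistSketch` and `FindDisallowedEndpoints` are hypothetical names introduced for illustration, and the regex-based extraction is an assumption. The library's own check is not shown in this commit and may work differently, for example by inspecting a parsed query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; the library's own check may work differently.
internal static class ServiceAllowlistSketch
{
    // Matches SERVICE <iri> and SERVICE SILENT <iri>.
    // Variable endpoints (SERVICE ?x) are not handled by this sketch.
    private static readonly Regex ServiceClause = new(
        @"\bSERVICE\s+(?:SILENT\s+)?<([^>\s]+)>",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns every distinct SERVICE endpoint in the query that is not on the allowlist.
    public static IReadOnlyList<Uri> FindDisallowedEndpoints(string sparql, IEnumerable<Uri> allowed)
    {
        ArgumentNullException.ThrowIfNull(sparql);
        ArgumentNullException.ThrowIfNull(allowed);

        var allowedSet = new HashSet<Uri>(allowed);
        return ServiceClause.Matches(sparql)
            .Select(match => new Uri(match.Groups[1].Value, UriKind.Absolute))
            .Where(endpoint => !allowedSet.Contains(endpoint))
            .Distinct()
            .ToList();
    }
}
```

A caller following this sketch would refuse to execute the query whenever the returned list is non-empty, which keeps endpoint policy at the caller boundary as described above.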
+
 ```csharp
 using ManagedCode.MarkdownLd.Kb.Pipeline;

@@ -404,6 +518,50 @@ var federated = await result.Graph.ExecuteFederatedSelectAsync(
 Console.WriteLine(federated.ServiceEndpointSpecifiers[0]);
 ```

+Use `ExecuteFederatedAskAsync` the same way when the caller needs a read-only federated `ASK` query instead of a result set.
+
+For deterministic multi-graph federation inside the same process, bind allowlisted endpoint URIs to other in-memory `KnowledgeGraph` instances:
+
+```csharp
+using ManagedCode.MarkdownLd.Kb.Pipeline;
+
+var localOptions = new FederatedSparqlExecutionOptions
+{
+    AllowedServiceEndpoints =
+    [
+        new Uri("https://kb.example/services/policy"),
+        new Uri("https://kb.example/services/runbook"),
+    ],
+    LocalServiceBindings =
+    [
+        new FederatedSparqlLocalServiceBinding(
+            new Uri("https://kb.example/services/policy"),
+            policyGraph),
+        new FederatedSparqlLocalServiceBinding(
+            new Uri("https://kb.example/services/runbook"),
+            runbookGraph),
+    ],
+};
+
+var result = await rootGraph.ExecuteFederatedSelectAsync(
+    """
+    PREFIX schema: <https://schema.org/>
+    SELECT ?policyTitle ?runbookTitle WHERE {
+      SERVICE <https://kb.example/services/policy> {
+        ?policy schema:name ?policyTitle .
+      }
+      SERVICE <https://kb.example/services/runbook> {
+        ?runbook schema:name ?runbookTitle .
+      }
+    }
+    """,
+    localOptions);
+```
+
+This path still uses SPARQL `SERVICE` and the same allowlist checks, but it stays fully in-memory and network-free for test fixtures or host-managed multi-graph workflows.
+
+For the external federation model and current WDQS endpoint split, see the official [Wikidata federated queries guide](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/Federated_queries), the [WDQS graph split note](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split), and the [Wikidata Query Service user manual](https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/en).
+
 ## Validate With SHACL

 ```csharp
@@ -626,7 +784,7 @@ Markdown links, wikilinks, and arrow assertions are not implicitly converted int
 - Embeddings are not required for the current graph/search flow; Tiktoken mode uses token IDs, not embedding vectors.
 - Microsoft Agent Framework is treated as host-level orchestration, not a core package dependency.

-See [docs/Architecture.md](docs/Architecture.md), [ADR-0001](docs/ADR/ADR-0001-rdf-sparql-library.md), [ADR-0002](docs/ADR/ADR-0002-llm-extraction-ichatclient.md), [ADR-0003](docs/ADR/ADR-0003-tiktoken-extraction-mode.md), [ADR-0006](docs/ADR/ADR-0006-federated-sparql-adapter.md), [Graph SHACL Validation](docs/Features/GraphShaclValidation.md), and [Federated SPARQL Execution](docs/Features/FederatedSparqlExecution.md).
+See [docs/Architecture.md](docs/Architecture.md), [ADR-0001](docs/ADR/ADR-0001-rdf-sparql-library.md), [ADR-0002](docs/ADR/ADR-0002-llm-extraction-ichatclient.md), [ADR-0003](docs/ADR/ADR-0003-tiktoken-extraction-mode.md), [ADR-0006](docs/ADR/ADR-0006-federated-sparql-adapter.md), [Graph Runtime Lifecycle](docs/Features/GraphRuntimeLifecycle.md), [Graph SHACL Validation](docs/Features/GraphShaclValidation.md), and [Federated SPARQL Execution](docs/Features/FederatedSparqlExecution.md).

 ## Inspiration And Attribution

@@ -636,6 +794,9 @@ This project is inspired by Luis Quintanilla's Markdown-LD / AI Memex work:
 - [Zero-Cost Knowledge Graph from Markdown](https://lqdev.me/resources/ai-memex/blog-post-zero-cost-knowledge-graph-from-markdown/) - core idea for using Markdown, YAML front matter, LLM extraction, RDF, JSON-LD, Turtle, and SPARQL
 - [Project Report: Entity Extraction & RDF Pipeline](https://lqdev.me/resources/ai-memex/project-report-entity-extraction-rdf-pipeline/) - extraction and RDF pipeline context
 - [W3C SPARQL Federated Query](https://github.com/w3c/sparql-federated-query) - SPARQL federation reference material
+- [Wikidata Federated Queries](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/Federated_queries) - official WDQS `SERVICE` federation guide and examples
+- [Wikidata Query Service User Manual](https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/en) - official WDQS operational and usage guidance
+- [WDQS Graph Split](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split) - official main/scholarly endpoint split and migration guidance
 - [dotNetRDF](https://github.com/dotnetrdf/dotnetrdf) - RDF/SPARQL/SHACL engine used by this C# implementation

 The upstream reference repository is kept as a read-only submodule under `external/lqdev-markdown-ld-kb`.
