
Commit 7812bdf

Add schema-aware graph production pipeline
Add JSON-LD round-trip helpers, graph contracts, generated SHACL, graph diffing, incremental manifests, and build profile presets. Add schema-aware local and federated SPARQL search with explainable evidence, source context, focused graph export, and production documentation. Cover the new graph production flows with integration tests.
1 parent ae6503c commit 7812bdf

38 files changed

Lines changed: 4978 additions & 34 deletions

Directory.Build.props

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@
 <PackageReadmeFile>README.md</PackageReadmeFile>
 <EnablePackageValidation>true</EnablePackageValidation>
 <Product>Markdown-LD Knowledge Bank</Product>
-<Version>0.1.7</Version>
+<Version>0.2.0</Version>
 <PackageVersion>$(Version)</PackageVersion>
 </PropertyGroup>

README.md

Lines changed: 301 additions & 14 deletions
Large diffs are not rendered by default.

docs/Architecture.md

Lines changed: 73 additions & 3 deletions
Large diffs are not rendered by default.

docs/Features/FederatedSparqlExecution.md

Lines changed: 232 additions & 0 deletions
@@ -11,6 +11,7 @@ The canonical graph remains the local in-memory `KnowledgeGraph`. Federation is
 In scope:
 
 - explicit read-only federated query execution through `ExecuteFederatedSelectAsync` and `ExecuteFederatedAskAsync`
+- schema-aware federated search through `SearchBySchemaFederatedAsync`, which compiles caller profiles into `SERVICE` queries
 - endpoint allowlists and endpoint profiles
 - deterministic local service bindings for multi-graph in-memory federation
 - caller-visible endpoint diagnostics
@@ -90,6 +91,235 @@ flowchart LR
- Intended use: explicit caller-selected federation across the split WDQS graphs
- Default behavior: require the caller to choose this profile or enumerate both endpoints explicitly

## Which API To Use

```mermaid
flowchart LR
    Need["Need cross-graph data?"] --> Raw{"Do you already have SPARQL?"}
    Raw -->|"Yes"| Execute["ExecuteFederatedSelectAsync / ExecuteFederatedAskAsync"]
    Raw -->|"No"| Schema{"Can the query be described by predicates?"}
    Schema -->|"Yes"| Search["SearchBySchemaFederatedAsync"]
    Schema -->|"No"| Local["Use local SearchBySchemaAsync or author raw SPARQL"]
    Execute --> Allow["AllowedServiceEndpoints"]
    Search --> Allow
    Allow --> Bindings["Optional LocalServiceBindings"]
    Allow --> Remote["Optional remote endpoints"]
```

Use raw federated SPARQL when the caller knows the exact cross-service join. Use schema-aware federated search when the caller wants the library to compile a search profile into `SERVICE` blocks. Use local schema-aware search when all required data is already in one `KnowledgeGraph`.

## Raw Local Multi-Graph Example

This example federates across two in-memory graphs without network access. The endpoint URIs are logical service names owned by the host application.

```csharp
var policyGraph = (await pipeline.BuildAsync(
[
    new MarkdownSourceDocument("policy/federation.md", policyMarkdown),
])).Graph;

var runbookGraph = (await pipeline.BuildAsync(
[
    new MarkdownSourceDocument("runbooks/federation.md", runbookMarkdown),
])).Graph;

var rootGraph = (await pipeline.BuildAsync(
[
    new MarkdownSourceDocument("scratch/root.md", string.Empty),
])).Graph;

var policyEndpoint = new Uri("https://kb.example/services/policy");
var runbookEndpoint = new Uri("https://kb.example/services/runbook");

var options = new FederatedSparqlExecutionOptions
{
    AllowedServiceEndpoints =
    [
        policyEndpoint,
        runbookEndpoint,
    ],
    LocalServiceBindings =
    [
        new FederatedSparqlLocalServiceBinding(policyEndpoint, policyGraph),
        new FederatedSparqlLocalServiceBinding(runbookEndpoint, runbookGraph),
    ],
};

var sparql = """
PREFIX schema: <https://schema.org/>
SELECT ?policyTitle ?runbookTitle WHERE {
  SERVICE <https://kb.example/services/policy> {
    ?policy a schema:Article ;
            schema:name ?policyTitle ;
            schema:about ?topic .
  }

  SERVICE <https://kb.example/services/runbook> {
    ?runbook a schema:HowTo ;
             schema:name ?runbookTitle ;
             schema:about ?topic .
  }
}
""";

var result = await rootGraph.ExecuteFederatedSelectAsync(sparql, options);

Console.WriteLine(result.ServiceEndpointSpecifiers[0]);
Console.WriteLine(result.Result.Rows[0].Values["policyTitle"]);
```

The root graph does not need to contain the data being joined. It provides the execution boundary. Each `SERVICE` block is routed either to an allowlisted local binding or to a remote SPARQL endpoint.
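
The routing rule described above can be sketched as a small decision function. This is a self-contained illustration, not the library's code: `Route`, the string labels it returns, and the graph names in the binding map are hypothetical. It assumes the allowlist is consulted before any local binding, which matches the stated policy that local bindings must be allowlisted too.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical routing rule (not library code): the allowlist is checked first,
// so a local binding can never make an unlisted endpoint reachable.
static string Route(Uri endpoint, ISet<Uri> allowed, IDictionary<Uri, string> bindings)
{
    if (!allowed.Contains(endpoint))
    {
        return "rejected";
    }

    // An allowlisted endpoint with a local binding stays in memory; otherwise it is remote.
    return bindings.TryGetValue(endpoint, out var graphName) ? $"local:{graphName}" : "remote";
}

var policy = new Uri("https://kb.example/services/policy");
var wikidata = new Uri("https://query.wikidata.org/sparql");
var unknown = new Uri("https://unknown.example/sparql");

var allowlist = new HashSet<Uri> { policy, wikidata };
var localBindings = new Dictionary<Uri, string>
{
    [policy] = "policyGraph",
    [unknown] = "strayGraph", // bound locally but not allowlisted
};

Console.WriteLine(Route(policy, allowlist, localBindings));   // local:policyGraph
Console.WriteLine(Route(wikidata, allowlist, localBindings)); // remote
Console.WriteLine(Route(unknown, allowlist, localBindings));  // rejected
```

The third call shows why the order of checks matters: a binding on its own does not grant access.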

## Raw Federated ASK Example

Use `ExecuteFederatedAskAsync` when the caller needs a boolean policy or readiness check across graph slices.

```csharp
var ask = """
PREFIX schema: <https://schema.org/>
PREFIX kb: <urn:managedcode:markdown-ld-kb:vocab:>
ASK WHERE {
  SERVICE <https://kb.example/services/policy> {
    ?policy schema:about ?topic .
  }

  SERVICE <https://kb.example/services/runbook> {
    ?runbook schema:about ?topic ;
             kb:nextStep ?nextStep .
  }
}
""";

var decision = await rootGraph.ExecuteFederatedAskAsync(ask, options);

if (decision.Result)
{
    Console.WriteLine(decision.ServiceEndpointSpecifiers.Count);
}
```

## Schema-Aware Federated Search Example

`SearchBySchemaFederatedAsync` compiles a `KnowledgeGraphSchemaSearchProfile` into one `SERVICE` block per configured endpoint. It is the right path when callers want SPARQL federation but do not want to hand-author the full query string.

```csharp
var profile = new KnowledgeGraphSchemaSearchProfile
{
    Prefixes = new Dictionary<string, string>(StringComparer.Ordinal)
    {
        ["ex"] = "https://kb.example/vocab/",
    },
    FederatedServiceEndpoints =
    [
        new Uri("https://kb.example/services/policy"),
        new Uri("https://kb.example/services/runbook"),
    ],
    TypeFilters = ["ex:Capability"],
    TextPredicates =
    [
        new KnowledgeGraphSchemaTextPredicate("schema:name", Weight: 1.2d),
        new KnowledgeGraphSchemaTextPredicate("ex:intent", Weight: 1.5d),
    ],
    RelationshipPredicates =
    [
        new KnowledgeGraphSchemaRelationshipPredicate(
            "ex:requires",
            ["ex:symptom", "skos:prefLabel"],
            Weight: 0.9d),
    ],
    TermMode = KnowledgeGraphSchemaSearchTermMode.AllTerms,
};

var search = await rootGraph.SearchBySchemaFederatedAsync(
    "restore cache",
    profile,
    options);

Console.WriteLine(search.Explain.GeneratedSparql);
Console.WriteLine(search.ServiceEndpointSpecifiers[0]);
Console.WriteLine(search.Matches[0].Evidence[0].ServiceEndpoint);
```

Federated schema search returns primary matches and predicate evidence from service endpoints. It does not create a focused local graph because the related graph neighborhood may live only behind the remote service boundary.

## Remote Endpoint Example

Remote endpoints are allowed only when explicitly configured. Use a named profile when it matches the endpoint set, or construct an options object yourself.

```csharp
var wikidataQuery = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?item ?itemLabel WHERE {
  SERVICE <https://query.wikidata.org/sparql> {
    ?item rdfs:label ?itemLabel .
    FILTER(LANG(?itemLabel) = "en")
  }
}
LIMIT 10
""";

var remote = await graph.ExecuteFederatedSelectAsync(
    wikidataQuery,
    FederatedSparqlProfiles.WikidataMain);
```

Remote federation is query-time access only. It does not import remote triples into the local `KnowledgeGraph`; use JSON-LD/Turtle loading or a separate preprocessing step when the local graph needs to keep those facts.

## Allowlist Patterns

Recommended host policy:

- use stable logical service URIs for local graph slices, such as `https://kb.example/services/runbooks`
- allowlist every `SERVICE` endpoint, including local bindings
- bind local service endpoints with `FederatedSparqlLocalServiceBinding`
- keep remote endpoint options separate from local-only test options
- set `QueryExecutionTimeoutMilliseconds` for remote endpoints
- inspect `ServiceEndpointSpecifiers` on success and on `FederatedSparqlQueryException`

Avoid:

- passing user-authored arbitrary endpoint URIs directly into `AllowedServiceEndpoints`
- relying on variable `SERVICE ?endpoint` at the library boundary
- expecting local `ExecuteSelectAsync` or `ExecuteAskAsync` to run top-level `SERVICE`
- using federation as a hidden fallback when local schema search returns no matches
- treating remote federation as graph ingestion

## Failure Example

Unallowlisted endpoints fail before execution:

```csharp
var unsafeQuery = """
SELECT ?s WHERE {
  SERVICE <https://unknown.example/sparql> {
    ?s ?p ?o .
  }
}
""";

try
{
    await graph.ExecuteFederatedSelectAsync(unsafeQuery, FederatedSparqlProfiles.WikidataMain);
}
catch (FederatedSparqlQueryException exception)
{
    Console.WriteLine(exception.ServiceEndpointSpecifiers[0]);
}
```

Variable service specifiers also fail before execution:

```sparql
SELECT ?s WHERE {
  VALUES ?endpoint { <https://query.wikidata.org/sparql> }
  SERVICE ?endpoint {
    ?s ?p ?o .
  }
}
```

The library requires absolute endpoint IRIs in `SERVICE <...>` clauses so the allowlist can be evaluated before dotNetRDF executes the query.
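
To make the "evaluate before execution" idea concrete, here is a minimal self-contained sketch of a pre-flight check over the query text. It is an assumption-laden illustration rather than the library's validation: `FindServiceViolations` is a hypothetical helper, and it scans with a regular expression where a real implementation would more likely inspect the parsed query. The regex does not handle prefixed-name specifiers, comments, or string literals that happen to contain the word `SERVICE`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical pre-flight check, not the library's implementation.
// Returns one message per SERVICE clause that should be rejected before execution.
static List<string> FindServiceViolations(string sparql, HashSet<Uri> allowedEndpoints)
{
    var violations = new List<string>();

    // Captures the specifier after SERVICE [SILENT]: either <iri> or a ?var / $var.
    var specifiers = Regex.Matches(
        sparql,
        @"\bSERVICE\s+(?:SILENT\s+)?(<[^>]*>|[?$]\w+)",
        RegexOptions.IgnoreCase);

    foreach (Match match in specifiers)
    {
        var specifier = match.Groups[1].Value;

        if (!specifier.StartsWith('<'))
        {
            // A variable cannot be compared against the allowlist ahead of time.
            violations.Add($"variable service specifier: {specifier}");
            continue;
        }

        var iri = specifier[1..^1];
        if (!Uri.TryCreate(iri, UriKind.Absolute, out var endpoint))
        {
            violations.Add($"not an absolute IRI: {iri}");
        }
        else if (!allowedEndpoints.Contains(endpoint))
        {
            violations.Add($"endpoint not allowlisted: {iri}");
        }
    }

    return violations;
}

var allowlist = new HashSet<Uri> { new Uri("https://query.wikidata.org/sparql") };

var fromAllowed = FindServiceViolations(
    "SELECT ?s WHERE { SERVICE <https://query.wikidata.org/sparql> { ?s ?p ?o } }",
    allowlist);
var fromUnknown = FindServiceViolations(
    "SELECT ?s WHERE { SERVICE <https://unknown.example/sparql> { ?s ?p ?o } }",
    allowlist);
var fromVariable = FindServiceViolations(
    "SELECT ?s WHERE { VALUES ?endpoint { <https://query.wikidata.org/sparql> } SERVICE ?endpoint { ?s ?p ?o } }",
    allowlist);

Console.WriteLine(fromAllowed.Count); // 0
Console.WriteLine(fromUnknown[0]);    // endpoint not allowlisted: https://unknown.example/sparql
Console.WriteLine(fromVariable[0]);   // variable service specifier: ?endpoint
```

Note that the variable case is rejected even though its only `VALUES` binding is an allowlisted endpoint: the check is syntactic, so nothing has to be evaluated to reach a decision.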

## Main Flow

```mermaid
@@ -146,6 +376,7 @@ sequenceDiagram
 
 - The local graph remains authoritative for Markdown-derived knowledge.
 - Federation supplements query-time access; it does not mutate the local graph automatically.
+- Schema-aware federated search uses the same `SERVICE` allowlist and local binding policy as raw federated SPARQL.
 - Local service bindings give hosts and tests a deterministic way to federate across multiple in-memory graphs without network access.
 - The adapter may expose endpoint profiles, but it does not own remote dataset semantics.
 - Wikidata federation often needs explicit graph-shape knowledge and endpoint selection because WDQS split the main and scholarly graphs in 2025.
@@ -165,6 +396,7 @@ Current verification focus:
 - deterministic tests for one-query multi-graph federation across five local graphs
 - deterministic tests for federated `ASK` across multiple local graphs
 - deterministic tests that local service bindings do not bypass the allowlist
+- deterministic tests for schema-aware federated search over local JSON-LD service bindings
 
 ## Definition Of Done

Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
# Graph Creation Contracts

## Purpose

Graph creation contracts connect the graph build pipeline with schema-aware SPARQL search. A build can now carry a `KnowledgeGraphBuildProfile` that bundles graph build options, the recommended `KnowledgeGraphSchemaSearchProfile`, and optional SHACL shapes. The resulting `MarkdownKnowledgeBuildResult.Contract` describes the RDF shape that was actually produced and validates the bundled search profile against that graph.

This is what "self-describing graph" means in this library: the graph can expose its RDF types, predicates, literal predicates, resource predicates, and profile mismatch diagnostics without requiring the caller to guess the schema from documentation.

## Flow

```mermaid
flowchart LR
    Profile["KnowledgeGraphBuildProfile"] --> Pipeline["MarkdownKnowledgePipeline"]
    Profile --> SearchProfile["KnowledgeGraphSchemaSearchProfile"]
    Pipeline --> Graph["KnowledgeGraph"]
    Graph --> Schema["DescribeSchema"]
    SearchProfile --> Validation["ValidateSchemaSearchProfile"]
    Schema --> Contract["KnowledgeGraphContract"]
    Validation --> Contract
    Contract --> Search["SearchBySchemaAsync"]
    Search --> Focused["Focused graph snapshot"]
    Focused --> Export["JSON-LD / Turtle / Mermaid / DOT"]
```

## Public API

- `KnowledgeGraphBuildProfile`
- `MarkdownKnowledgePipelineOptions.BuildProfile`
- `MarkdownKnowledgeBuildResult.Contract`
- `KnowledgeGraph.DescribeSchema(...)`
- `KnowledgeGraph.ValidateSchemaSearchProfile(...)`
- `KnowledgeGraphContract.SerializeJson()`
- `KnowledgeGraphContract.SerializeYaml()`
- `KnowledgeGraphContract.LoadJson(...)`
- `KnowledgeGraphContract.LoadYaml(...)`
- `KnowledgeGraphContract.GenerateShacl()`
- `KnowledgeGraphSnapshot.SerializeJsonLd()`
- `KnowledgeGraphSnapshot.SerializeTurtle()`
- `KnowledgeGraphSnapshot.SerializeMermaidFlowchart()`
- `KnowledgeGraphSnapshot.SerializeDotGraph()`

## Build Profile Example

```csharp
var searchProfile = new KnowledgeGraphSchemaSearchProfile
{
    Prefixes = new Dictionary<string, string>(StringComparer.Ordinal)
    {
        ["ex"] = "https://kb.example/vocab/",
    },
    TypeFilters = ["ex:Capability"],
    TextPredicates =
    [
        new KnowledgeGraphSchemaTextPredicate("schema:name"),
        new KnowledgeGraphSchemaTextPredicate("ex:intent"),
    ],
};

var pipeline = new MarkdownKnowledgePipeline(new MarkdownKnowledgePipelineOptions
{
    ExtractionMode = MarkdownKnowledgeExtractionMode.None,
    BuildProfile = new KnowledgeGraphBuildProfile
    {
        Name = "capability-workflow",
        BuildOptions = new KnowledgeGraphBuildOptions(),
        SearchProfile = searchProfile,
    },
});

var result = await pipeline.BuildFromMarkdownAsync(markdown);

if (result.Contract.Validation.IsValid)
{
    var search = await result.Graph.SearchBySchemaAsync("restore cache", result.Contract.SearchProfile);
}
```

## Schema Introspection

`DescribeSchema` reads the actual in-memory RDF graph and returns:

- `RdfTypes`
- `Predicates`
- `LiteralPredicates`
- `ResourcePredicates`

`ValidateSchemaSearchProfile` checks that profile terms resolve and exist in the expected graph role. It reports missing type filters, missing literal predicates, missing resource relationship predicates, missing relationship target predicates, missing expansion predicates, missing facet predicates, and unknown prefixes.
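
The two-step check described above (resolve the prefix, then look for the expanded IRI in the set that matches the term's role) can be illustrated with stand-in data. This sketch does not call the library: `Validate`, the diagnostic strings, and the sample sets are hypothetical, and it covers only type filters and literal predicates. The prefix map declares `schema` explicitly; whether the library predeclares common prefixes is not stated in this document.

```csharp
using System;
using System.Collections.Generic;

// Stand-in data for illustration only; these are not library types.
var prefixes = new Dictionary<string, string>(StringComparer.Ordinal)
{
    ["ex"] = "https://kb.example/vocab/",
    ["schema"] = "https://schema.org/",
};
var rdfTypes = new HashSet<string> { "https://kb.example/vocab/Capability" };
var literalPredicates = new HashSet<string> { "https://schema.org/name" };

// Checks each prefixed term: the prefix must be declared, and the expanded IRI
// must appear in the set that corresponds to the term's role in the profile.
List<string> Validate(string role, IEnumerable<string> terms, HashSet<string> known)
{
    var found = new List<string>();
    foreach (var term in terms)
    {
        var parts = term.Split(':', 2);
        if (parts.Length != 2 || !prefixes.TryGetValue(parts[0], out var ns))
        {
            found.Add($"unknown prefix: {term}");
        }
        else if (!known.Contains(ns + parts[1]))
        {
            found.Add($"missing {role}: {term}");
        }
    }

    return found;
}

var issues = new List<string>();
issues.AddRange(Validate("type filter", new[] { "ex:Capability" }, rdfTypes));
issues.AddRange(Validate(
    "literal predicate",
    new[] { "schema:name", "ex:intent", "skos:prefLabel" },
    literalPredicates));

foreach (var issue in issues)
{
    Console.WriteLine(issue);
}
// missing literal predicate: ex:intent
// unknown prefix: skos:prefLabel
```

Keeping the role in the diagnostic matters because the same predicate can be valid in one role and absent in another, for example present as a resource predicate but never used with a literal object.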

## Focused Graph Export

Schema-aware search can return a focused graph. That snapshot can now be exported directly:

```csharp
var result = await graph.SearchBySchemaAsync("cache recovery", profile);

string jsonLd = result.FocusedGraph.SerializeJsonLd();
string turtle = result.FocusedGraph.SerializeTurtle();
string mermaid = result.FocusedGraph.SerializeMermaidFlowchart();
string dot = result.FocusedGraph.SerializeDotGraph();
```

The focused graph export is intended for result handoff to UI, agents, follow-up SPARQL, and external preprocessing steps.

## Production Handoff

Contract artifacts are the durable companion to generated JSON-LD. Store the graph JSON-LD together with `KnowledgeGraphContract.SerializeJson()` or `SerializeYaml()`. A later process can reload both, validate the graph with `GenerateShacl()`, and run `SearchBySchemaAsync` through the contract profile without repeating Markdown parsing or AI extraction.

For the full production flow, including generated JSON-LD, source-backed evidence, graph diffing, presets, and incremental manifests, see [Graph Production Pipeline](GraphProductionPipeline.md).

## Verification

```bash
dotnet test --solution MarkdownLd.Kb.slnx --configuration Release -- --treenode-filter "/*/*/GraphContractAndAdvancedSearchFlowTests/*" --no-progress
```

Covered scenarios:

- pipeline build profile returns a search-ready contract
- schema introspection describes actual RDF types and predicates
- profile validation reports missing terms
- advanced search profiles support all-terms mode, inbound relationships, property paths, and facets
- focused graph snapshots export to JSON-LD, Turtle, Mermaid, and DOT
