docs/ADR/ADR-0001-rdf-sparql-library.md
Lines changed: 9 additions & 1 deletion
@@ -11,6 +11,7 @@ Related Features: `docs/Architecture.md`
 - [x] Analyze upstream graph stack and .NET options.
 - [x] Choose the RDF/SPARQL dependency.
 - [x] Add dotNetRDF to the production project.
+- [x] Add `dotNetRdf.Shacl` when graph validation became a first-class library boundary.
 - [x] Add flow tests that query generated graphs through SPARQL.
 - [x] Run build, test, format, and coverage commands.
 - [x] Update `docs/Architecture.md` if dependency boundaries change.
@@ -41,12 +42,13 @@ Non-goals:

 ## Decision

-Use dotNetRDF as the RDF graph, serialization, and SPARQL engine for the first .NET implementation slice.
+Use dotNetRDF as the RDF graph, serialization, SPARQL, and SHACL validation engine for the .NET implementation.

 Key points:

 - dotNetRDF replaces Python RDFLib for the C# port.
 - The selected package supports RDF/SPARQL in .NET and the user guide documents in-memory RDF data and in-memory SPARQL querying, which matches the no-server core runtime boundary.
+- The `dotNetRdf.Shacl` package provides a SHACL processor over in-memory RDF graphs, which keeps validation standards-based and local.
 - Markdig and YamlDotNet will handle Markdown/front matter parsing separately.
 - AI extraction remains behind an extraction port that uses `Microsoft.Extensions.AI.IChatClient`; provider/orchestration packages are not part of this RDF dependency decision.

@@ -64,6 +66,7 @@ flowchart LR
 None --> Builder
 Builder --> DotNetRdf["dotNetRDF graph"]
 DotNetRdf --> Sparql["Local SPARQL execution"]
+DotNetRdf --> Shacl["Local SHACL validation"]
 DotNetRdf --> Turtle["Turtle writer"]
 DotNetRdf --> JsonLd["JSON-LD writer"]
```
@@ -99,13 +102,15 @@ flowchart LR
 ### Negative / risks

 - The core library takes a dependency on dotNetRDF APIs.
+- SHACL validation uses dotNetRDF report objects internally, but public results stay in repository-owned models.
 - JSON-LD support may require a specific package shape or writer availability in the selected version.
 - Performance characteristics are inherited from dotNetRDF and must be measured before promising large-scale query throughput.

 Mitigations:

 - Hide dependency details behind `KnowledgeGraph` query methods, `KnowledgeSearchService`, and serialization methods where practical.
 - Add tests for serialization and SPARQL query paths.
+- Add tests for SHACL conformance and violation report paths.
 - Keep remote/federated SPARQL out of the first slice.

 ## Impact
@@ -151,6 +156,7 @@ Mitigations:
   - Serialize the graph and parse/inspect the output.
 - Negative flows:
   - Reject mutating SPARQL operations.
+  - Validate malformed graph metadata through SHACL reports.

 - RDF graph building and SPARQL execution depend on dotNetRDF.
+- SHACL validation depends on `dotNetRdf.Shacl` and runs against the in-memory graph through `VDS.RDF.Shacl.ShapesGraph`.
 - LLM extraction depends on `Microsoft.Extensions.AI.Abstractions` and accepts `IChatClient`.
 - Tiktoken extraction depends on `Microsoft.ML.Tokenizers` and the O200k data package. It uses tokenizer IDs and Unicode word n-gram keyphrase candidates only, and does not add an embedding provider. The default vector weighting is subword TF-IDF fitted over the current build corpus.
 - Embeddings are not required for the core graph build/query flow.

 - Empty Markdown input produces an empty graph without throwing.
 - Explicit Tiktoken mode builds section/segment/topic/entity-hint nodes plus `schema:hasPart`, `schema:about`, `schema:mentions`, and token-distance `kb:relatedTo` edges without network access.
 - Capability graph rules build `kb:memberOf`, `kb:relatedTo`, and `kb:nextStep` workflow edges from Markdown front matter or caller options, and focused search returns primary, related, and next-step result groups.
+- SHACL validation uses default Markdown-LD Knowledge Bank shapes or caller-supplied shapes, and assertion confidence/provenance metadata is represented as RDF statements so validation remains RDF-native.
 - English, Ukrainian, French, and German queries over same-language token graphs produce a higher hit rate than cross-language translated-topic queries.
 - Term frequency, binary presence, and subword TF-IDF token weighting modes are covered by focused and flow tests.
 - SPARQL mutating queries are rejected before execution.
@@ -183,3 +192,4 @@ Coverage requirement: 95%+ line coverage for changed production code.
+Markdown-LD Knowledge Bank validates built RDF graphs with SHACL so callers can detect malformed graph construction through a standards-based report instead of custom post-processing.
+
+The feature uses `dotNetRdf.Shacl` over the in-memory `KnowledgeGraph`. It does not add a server, database, cache, provider SDK, or Python runtime.

+- `schema:Article` nodes have `schema:name` and IRI provenance.
+- common entity classes have `schema:name`.
+- `schema:sameAs` values are IRIs.
+- `prov:wasDerivedFrom` values are IRIs.
+- reified `rdf:Statement` assertion metadata has one IRI subject, predicate, object, and a decimal `kb:confidence` from 0 through 1.
+
+Callers can pass custom Turtle SHACL shapes to `KnowledgeGraph.ValidateShacl(shapesTurtle)` or `MarkdownKnowledgeBuildResult.ValidateShacl(shapesTurtle)`.
+
+## Assertion Metadata
+
+Graph assertions remain direct RDF edges for existing SPARQL/search callers. Each assertion also receives RDF reification metadata:
Statement -->|"prov:wasDerivedFrom"| Source["source IRI or invalid literal"]
```
+
+Invalid caller-authored `sameAs` and provenance values are represented as literals so SHACL can report node-kind violations instead of silently dropping them.
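
To illustrate the node-kind reporting described above, the following is a minimal Turtle sketch of a shape that would flag a literal `schema:sameAs` value. It is not the repository's default shape set: the `ex:` namespace, the shape name, and the sample entity are hypothetical, and binding `schema:` to `https://schema.org/` is an assumption. Shapes and data are shown together only for brevity; they would normally be loaded as separate graphs.

```turtle
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix schema: <https://schema.org/> .      # assumed schema.org binding
@prefix ex:     <https://example.org/kb/> .  # hypothetical namespace

# Hypothetical shape: any node that has schema:sameAs must use IRI values.
ex:SameAsShape
    a sh:NodeShape ;
    sh:targetSubjectsOf schema:sameAs ;
    sh:property [
        sh:path schema:sameAs ;
        sh:nodeKind sh:IRI ;
        sh:message "schema:sameAs values must be IRIs." ;
    ] .

# Hypothetical data: the literal value is kept in the graph,
# so the sh:nodeKind constraint above has something to report on.
ex:entity-1
    schema:name "Example entity" ;
    schema:sameAs "not an iri" .
```

Keeping the invalid value as a literal means the problem surfaces as a standard `sh:NodeKindConstraintComponent` result rather than as missing data.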
+
+## Testing Methodology
+
+Flow tests cover:
+
+- valid Markdown and configured graph rules conform to the default shapes;
+- invalid `schema:sameAs`, provenance, and assertion confidence produce SHACL results;
+- caller-supplied shapes validate the same built graph;
+- sameAs-first entity merge rewrites assertion endpoints before validation.
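
As a sketch of what a caller-supplied shape for the assertion-confidence case in the list above might look like, the Turtle below constrains reified `rdf:Statement` nodes. The `kb:` namespace IRI and the shape name are placeholders, since the real namespace is not shown here, and the repository's default shapes may be structured differently.

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix kb:  <https://example.org/kb#> .   # placeholder, not the project's real IRI

# Hypothetical shape for reified assertion metadata.
kb:AssertionShape
    a sh:NodeShape ;
    sh:targetClass rdf:Statement ;
    # Exactly one IRI subject, predicate, and object per statement.
    sh:property [ sh:path rdf:subject ;   sh:nodeKind sh:IRI ; sh:minCount 1 ; sh:maxCount 1 ] ;
    sh:property [ sh:path rdf:predicate ; sh:nodeKind sh:IRI ; sh:minCount 1 ; sh:maxCount 1 ] ;
    sh:property [ sh:path rdf:object ;    sh:nodeKind sh:IRI ; sh:minCount 1 ; sh:maxCount 1 ] ;
    # Confidence must be an xsd:decimal from 0 through 1 inclusive.
    sh:property [
        sh:path kb:confidence ;
        sh:datatype xsd:decimal ;
        sh:minInclusive 0.0 ;
        sh:maxInclusive 1.0 ;
    ] .
```

A string of this form could be passed as the `shapesTurtle` argument to `KnowledgeGraph.ValidateShacl(shapesTurtle)`; a confidence of `1.5`, or one typed as `xsd:string`, would then be expected to appear as a validation result.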
+
+Verification commands:
+
+- `dotnet build MarkdownLd.Kb.slnx --no-restore`
+- `dotnet test --solution MarkdownLd.Kb.slnx --configuration Release`
+- `dotnet format MarkdownLd.Kb.slnx --verify-no-changes`