This repo now includes a standalone ingestion surface at `reqif_ingest_cli/`.
It keeps source-document intake separate from the existing `reqif_mcp` ReqIF parser and policy server.

+ Executive view:
+
+ - this is the deterministic artifact-to-ReqIF derivation pipeline
+ - its purpose is traceable extraction, not policy judgement
+
+ Engineer view:
+
+ - use this surface when the starting point is an artifact such as XLSX, PDF, DOCX, or Markdown
+ - use `reqif_mcp` when the starting point is already ReqIF
+
+ ```mermaid
+ flowchart LR
+ ART[Source artifact] --> EXT[Deterministic extraction]
+ EXT --> DG[document_graph]
+ DG --> CAND[requirement_candidate]
+ CAND --> REQIF[Derived ReqIF]
+ ```
+
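The engineer-view rule can be sketched as a small routing helper. This is a minimal sketch, not code from the repo: the function name `choose_surface` and the suffix sets are assumptions for illustration. The only facts taken from the text are that artifacts such as XLSX, PDF, DOCX, or Markdown go to `reqif_ingest_cli` and existing ReqIF goes to `reqif_mcp`.

```python
from pathlib import Path

# Assumed suffix sets; the README names the formats, not the file suffixes.
ARTIFACT_SUFFIXES = {".xlsx", ".pdf", ".docx", ".md"}
REQIF_SUFFIXES = {".reqif"}


def choose_surface(path: str) -> str:
    """Return the surface suggested by the starting point (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix in REQIF_SUFFIXES:
        return "reqif_mcp"
    if suffix in ARTIFACT_SUFFIXES:
        return "reqif_ingest_cli"
    raise ValueError(f"no surface known for {suffix!r}")


print(choose_surface("samples/aemo/The AESCSF v2 Core.xlsx"))
print(choose_surface("evidence_store/toolkits/aemo/aescsf-core.reqif"))
```

Raising on an unknown suffix, rather than guessing, keeps the routing as deterministic as the pipeline it fronts.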
## Scope

- Deterministic first pass only.
@@ -12,6 +30,15 @@ It keeps source-document intake separate from the existing `reqif_mcp` ReqIF par
- ReqIF stays derived: `artifact -> document_graph -> requirement_candidate -> reqif`.
- Optional LLM quality-eval hooks use an Azure Foundry/OpenAI-compatible adapter and are disabled by default.

+ ```mermaid
+ flowchart TD
+ INPUT[Artifact input] --> HASH[Register hash and metadata]
+ HASH --> STRUCTURE[Extract structure]
+ STRUCTURE --> DISTILL[Deterministic distillation]
+ DISTILL --> OUTPUT[Emit derived ReqIF]
+ OUTPUT -. optional review .-> LLM[Foundry quality hook]
+ ```
+
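The staged flow, with its optional review step, can be sketched as plain function composition. Every name here (`run_pipeline`, the stage functions, the dict shapes) is a hypothetical stand-in rather than the module's real API, and the stage bodies are toy placeholders so the example runs. What it illustrates is the shape of the design: deterministic stages chained in order, with the review hook off unless one is passed in.

```python
import hashlib
from typing import Callable, Optional

Reviewer = Callable[[list[dict]], list[dict]]


def register(data: bytes) -> dict:
    # Stage 1: hash the raw bytes so later records can cite an immutable source.
    return {"sha256": hashlib.sha256(data).hexdigest(), "text": data.decode("utf-8")}


def extract(artifact: dict) -> dict:
    # Stage 2: toy structure extraction, one node per non-empty line.
    lines = [ln.strip() for ln in artifact["text"].splitlines() if ln.strip()]
    return {"source": artifact["sha256"], "nodes": lines}


def distill(graph: dict) -> list[dict]:
    # Stage 3: toy deterministic rule, keep lines that contain "shall".
    return [
        {"id": f"{graph['source'][:8]}-{i}", "text": node}
        for i, node in enumerate(graph["nodes"])
        if "shall" in node.lower()
    ]


def run_pipeline(data: bytes, reviewer: Optional[Reviewer] = None) -> list[dict]:
    candidates = distill(extract(register(data)))
    # The review hook is optional and skipped unless a reviewer is supplied.
    return reviewer(candidates) if reviewer is not None else candidates


doc = b"Scope\nThe operator shall log access.\nNotes\n"
first = run_pipeline(doc)
second = run_pipeline(doc)
print(first == second)  # same bytes in, same candidates out
```

Because candidate IDs are derived from the source hash and node position, rerunning on identical bytes yields identical output, which is the property a deterministic first pass needs.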
## Commands

Use the local justfile:
@@ -20,18 +47,28 @@ Use the local justfile:
just -f reqif_ingest_cli/justfile --list
```

+ ```mermaid
+ flowchart LR
+ TEST[test] --> LINT[lint]
+ LINT --> TYPE[typecheck]
+ TYPE --> ARTIFACT[artifact]
+ ARTIFACT --> EXTRACT[extract]
+ EXTRACT --> DISTILL[distill]
+ DISTILL --> EMIT[emit]
+ ```
+
Common commands:

```bash
just -f reqif_ingest_cli/justfile test
just -f reqif_ingest_cli/justfile lint
just -f reqif_ingest_cli/justfile typecheck

- just -f reqif_ingest_cli/justfile artifact "The AESCSF v2 Core.xlsx"
- just -f reqif_ingest_cli/justfile extract "The AESCSF v2 Core.xlsx"
- just -f reqif_ingest_cli/justfile distill "The AESCSF v2 Core.xlsx"
+ just -f reqif_ingest_cli/justfile artifact "samples/aemo/The AESCSF v2 Core.xlsx"
+ just -f reqif_ingest_cli/justfile extract "samples/aemo/The AESCSF v2 Core.xlsx"
+ just -f reqif_ingest_cli/justfile distill "samples/aemo/The AESCSF v2 Core.xlsx"
just -f reqif_ingest_cli/justfile emit \
-   "The AESCSF v2 Core.xlsx" \
+   "samples/aemo/The AESCSF v2 Core.xlsx" \
  "evidence_store/toolkits/aemo/aescsf-core.reqif" \
  auto \
  "AESCSF Core Derived Baseline"
@@ -49,26 +86,66 @@ just -f reqif_ingest_cli/justfile smoke-aemo-toolkit
The standalone module runs directly with `uv`:

```bash
- uv run python -m reqif_ingest_cli register-artifact "The AESCSF v2 Core.xlsx" --pretty
- uv run python -m reqif_ingest_cli extract "The AESCSF v2 Core.xlsx" --pretty
- uv run python -m reqif_ingest_cli distill "The AESCSF v2 Core.xlsx" --pretty
+ uv run python -m reqif_ingest_cli register-artifact "samples/aemo/The AESCSF v2 Core.xlsx" --pretty
+ uv run python -m reqif_ingest_cli extract "samples/aemo/The AESCSF v2 Core.xlsx" --pretty
+ uv run python -m reqif_ingest_cli distill "samples/aemo/The AESCSF v2 Core.xlsx" --pretty
uv run python -m reqif_ingest_cli emit-reqif \
-   "The AESCSF v2 Core.xlsx" \
+   "samples/aemo/The AESCSF v2 Core.xlsx" \
  --title "AESCSF Core Derived Baseline" \
  --output "evidence_store/toolkits/aemo/aescsf-core.reqif" \
  --pretty
uv run python -m reqif_ingest_cli foundry-config --pretty
```

+ ```mermaid
+ sequenceDiagram
+ participant User
+ participant CLI as reqif_ingest_cli
+ participant FS as source artifact
+ participant Out as JSON / ReqIF output
+
+ User->>CLI: register-artifact
+ CLI->>FS: hash + inspect metadata
+ User->>CLI: extract
+ CLI->>Out: document_graph
+ User->>CLI: distill
+ CLI->>Out: requirement_candidate
+ User->>CLI: emit-reqif
+ CLI->>Out: derived ReqIF
+ ```
+
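The four-step sequence can also be scripted. The sketch below only builds the argument lists for the subcommands shown in this README and prints them; it does not execute anything. `build_steps` is a hypothetical helper, and running the steps in this fixed order is an assumption drawn from the sequence diagram.

```python
import shlex

BASE = ["uv", "run", "python", "-m", "reqif_ingest_cli"]


def build_steps(source: str, output: str, title: str) -> list[list[str]]:
    """Build argv lists for the four subcommands shown in this README."""
    return [
        BASE + ["register-artifact", source, "--pretty"],
        BASE + ["extract", source, "--pretty"],
        BASE + ["distill", source, "--pretty"],
        BASE + ["emit-reqif", source, "--title", title, "--output", output, "--pretty"],
    ]


steps = build_steps(
    "samples/aemo/The AESCSF v2 Core.xlsx",
    "evidence_store/toolkits/aemo/aescsf-core.reqif",
    "AESCSF Core Derived Baseline",
)
for argv in steps:
    # shlex.join quotes arguments containing spaces so each line is shell-safe.
    print(shlex.join(argv))
```

Passing each list to `subprocess.run(argv, check=True)` would execute the steps. List-form arguments avoid shell quoting problems with the spaces in the workbook name.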
## What It Emits

- `artifact/1`: immutable source hash, media type, file format, source path, profile
- `document_graph/1`: sections, rows, paragraphs, anchors, semantic IDs
- `requirement_candidate/1`: deterministic candidate text, rationale, rule ID, provenance
- ReqIF XML: minimal derived baseline that round-trips through the current parser

+ ```mermaid
+ flowchart LR
+ A[artifact/1] --> G[document_graph/1]
+ G --> C[requirement_candidate/1]
+ C --> R[ReqIF XML]
+ ```
+
+ See also:
+
+ - `samples/README.md`
+ - `samples/aemo/README.md`
+ - `samples/contracts/README.md`
+
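To make the first record type concrete, here is a minimal sketch of what building an `artifact/1`-style record could involve. The field names, the `schema` key, and the `register_artifact` function are assumptions inferred from the `artifact/1` entry in the list above, not the module's actual schema.

```python
import hashlib
import mimetypes
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)  # frozen mirrors the "immutable" wording for artifact/1
class ArtifactRecord:
    schema: str
    sha256: str
    media_type: Optional[str]
    file_format: str
    source_path: str
    profile: Optional[str]


def register_artifact(path: Path, profile: Optional[str] = None) -> ArtifactRecord:
    """Hash the file bytes and collect basic metadata (hypothetical helper)."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    media_type, _ = mimetypes.guess_type(path.name)  # None for unknown suffixes
    return ArtifactRecord(
        schema="artifact/1",
        sha256=digest,
        media_type=media_type,
        file_format=path.suffix.lstrip(".").lower(),
        source_path=str(path),
        profile=profile,
    )


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "notes.md"
    sample.write_bytes(b"# Requirements\n")
    record = register_artifact(sample, profile="markdown_docling_v1")
    print(asdict(record)["file_format"])
```

Hashing the raw bytes rather than parsed content means the record identifies the exact file that was ingested, which is what downstream provenance fields would need to point back to.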
## Current Profiles

+ ```mermaid
+ flowchart LR
+ XLSX[XLSX] --> CORE[aescsf_core_v2]
+ XLSX --> TOOLKIT[aescsf_toolkit_v1_1]
+ XLSX --> GENERIC[generic_xlsx_table]
+ PDF[PDF] --> PDFP[pdf_docling_v1]
+ DOCX[DOCX] --> DOCXP[docx_docling_v1]
+ MD[Markdown] --> MDP[markdown_docling_v1]
+ ```
+
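Read as a lookup table, the diagram maps each input format to one or more candidate profiles. A minimal sketch, assuming file suffixes identify the format; `candidate_profiles` is a hypothetical helper, and how the real module chooses among the three XLSX profiles is not reproduced here.

```python
from pathlib import Path

# Profile names come from the diagram; the suffix keys are an assumption.
PROFILES_BY_SUFFIX: dict[str, tuple[str, ...]] = {
    ".xlsx": ("aescsf_core_v2", "aescsf_toolkit_v1_1", "generic_xlsx_table"),
    ".pdf": ("pdf_docling_v1",),
    ".docx": ("docx_docling_v1",),
    ".md": ("markdown_docling_v1",),
}


def candidate_profiles(path: str) -> tuple[str, ...]:
    """Return the profiles that could apply to a file, or an empty tuple."""
    return PROFILES_BY_SUFFIX.get(Path(path).suffix.lower(), ())


print(candidate_profiles("samples/aemo/The AESCSF v2 Core.xlsx"))
print(candidate_profiles("notes.txt"))
```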
- `aescsf_core_v2`
  - Detects the flat AESCSF core workbook layout.
  - Preserves paragraph chunks inside `Context and Guidance`.
@@ -102,3 +179,26 @@ uv run python -m reqif_ingest_cli foundry-config --pretty
```

The adapter is for review and remapping only. It is not part of the deterministic first pass.
+
+ ```mermaid
+ flowchart LR
+ DISTILL[Deterministic distillation] --> REVIEW{Foundry configured?}
+ REVIEW -- no --> DONE[Use deterministic output]
+ REVIEW -- yes --> QA[Quality review / remap hints]
+ QA --> DONE
+ ```
+
+ ## Current Gaps
+
+ - no baseline diff command yet
+ - no ingest MCP tool surface yet
+ - AESCSF mappings are still code-first rather than externalized config
+ - rich PDF structure extraction still depends on offline Docling model availability
+
+ ```mermaid
+ flowchart LR
+ NOW[Current CLI] --> NEXT1[MCP tool surface]
+ NOW --> NEXT2[Baseline diffing]
+ NOW --> NEXT3[Externalized profile config]
+ NOW --> NEXT4[Richer offline PDF structure]
+ ```