Commit 8945b7f
feat(xsd): transcode extract_classes.py to Rust — byte-faithful, closes XSD↔TTL bijection, drops the Python oracle dep
The MARS XSD classification extractor (arago/MARS-Schema/tools/extract_classes.py,
~360 lines, of which ~140 are extraction logic and ~150 are table formatting) is
now a faithful Rust transcode at crates/ogar-from-schema/src/xsd.rs, behind
an optional `xsd` feature. The feature pulls in roxmltree, a pure-Rust read-only
XML DOM; the default TTL path keeps zero parser dependencies.
The transcode is modest: ~350 lines of Rust, including tests. It also
doubles as the seed of the broader XSD → Class front-end, because the same
walk that extracts classifications is the structural-arm lift for any
XSD schema.
What it lands:
* BYTE-FOR-BYTE transcode proof. xsd::to_asciidoc() reproduces the
Python `-F asciidoc` output exactly — 628 lines, including verbatim
XSD-documentation whitespace and the printAsciiDocFooter trailing
newline. Test xsd::tests::asciidoc_matches_python_oracle diffs
against the cached _oracle/classifications.adoc.
* XSD↔TTL BIJECTION CLOSED (was "queued" in MARS-TRANSCODING.md §2).
xsd::tests::xsd_classes_match_ttl_enum asserts full bidirectional
set-equality between the XSD-extracted Application value set and the
TTL validation-parameter enum — not just one-directional membership.
Two independent encodings of one taxonomy, provably equal both ways.
* PYTHON DEPENDENCY REMOVED from the calibration path. `cargo test
--features xsd` is the whole oracle now; no python3 interpreter
needed. extract_classes.py stays vendored in _oracle/ as the
provenance witness (what the transcode was proven against), not a
runtime dep.
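The bidirectional check in xsd_classes_match_ttl_enum amounts to two set differences that must both be empty; one-directional membership only checks one of them. The following is a minimal std-only sketch of that idea. `compare_sets`, `SetDiff` and the sample values are hypothetical, and the real test's extraction of each value set from the XSD and the TTL is not shown.

```rust
use std::collections::BTreeSet;

/// Outcome of comparing two encodings of one taxonomy (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct SetDiff {
    /// Values found in the XSD-derived set but missing from the TTL enum.
    pub only_in_xsd: Vec<String>,
    /// Values found in the TTL enum but missing from the XSD-derived set.
    pub only_in_ttl: Vec<String>,
}

impl SetDiff {
    /// Set equality holds only if BOTH directions are empty.
    pub fn is_bijective(&self) -> bool {
        self.only_in_xsd.is_empty() && self.only_in_ttl.is_empty()
    }
}

/// Compare two value sets in both directions (sorted output via BTreeSet).
pub fn compare_sets<'a, I, J>(xsd: I, ttl: J) -> SetDiff
where
    I: IntoIterator<Item = &'a str>,
    J: IntoIterator<Item = &'a str>,
{
    let xsd: BTreeSet<&str> = xsd.into_iter().collect();
    let ttl: BTreeSet<&str> = ttl.into_iter().collect();
    SetDiff {
        only_in_xsd: xsd.difference(&ttl).map(|s| s.to_string()).collect(),
        only_in_ttl: ttl.difference(&xsd).map(|s| s.to_string()).collect(),
    }
}
```

Reporting both difference lists, rather than a single boolean, makes a failing assertion show which side drifted.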
Transcode discipline (faithful to the Python semantics):
* getAttribute("xml:lang") returns "" when the attribute is absent (not
None), so the lang filter is "absent OR en". roxmltree resolves the xml:
prefix to the XML namespace, so the Rust side matches on
attribute.name() == "lang".
* getXMLText concatenates DIRECT text-node children only (not
recursive); the documentation's internal whitespace is load-bearing
for the byte-match.
* :revdate: is datetime.now() in Python (non-deterministic); the Rust
to_asciidoc(c, revdate) takes it as a parameter so output is
reproducible and testable.
* The two-level extension chain (master complexType carries NodeType,
intermediate complexType carries Class, leaf element carries
SubClass) and the post-process phase that stitches base→element are
reproduced exactly, including the master_types gate.
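The lang filter, the direct-text rule and the injected revdate can be illustrated without an XML parser. This is a hypothetical sketch: the `Node` type stands in for the roxmltree DOM the commit actually uses, `get_attribute`, `direct_text`, `lang_ok` and `header` are illustrative names rather than the crate's API, and the header layout is assumed.

```rust
/// Minimal stand-in for an XML DOM node (hypothetical; the real code uses roxmltree).
#[derive(Debug)]
pub enum Node {
    Text(String),
    Element {
        name: String,
        /// (local_name, value) pairs; xml:lang is stored under local name "lang".
        attrs: Vec<(String, String)>,
        children: Vec<Node>,
    },
}

impl Node {
    /// Mirrors the Python getAttribute semantics: "" (not None) when absent.
    pub fn get_attribute(&self, local: &str) -> &str {
        match self {
            Node::Element { attrs, .. } => attrs
                .iter()
                .find(|(k, _)| k == local)
                .map(|(_, v)| v.as_str())
                .unwrap_or(""),
            Node::Text(_) => "",
        }
    }

    /// Mirrors getXMLText: concatenate DIRECT text children only, whitespace intact.
    /// Text inside nested elements is deliberately skipped (no recursion).
    pub fn direct_text(&self) -> String {
        match self {
            Node::Element { children, .. } => children
                .iter()
                .filter_map(|c| match c {
                    Node::Text(t) => Some(t.as_str()),
                    Node::Element { .. } => None,
                })
                .collect(),
            Node::Text(t) => t.clone(),
        }
    }
}

/// Lang filter: keep documentation whose xml:lang is absent OR "en".
pub fn lang_ok(doc: &Node) -> bool {
    matches!(doc.get_attribute("lang"), "" | "en")
}

/// Header with revdate passed in instead of read from the clock,
/// so the output is reproducible (layout assumed for illustration).
pub fn header(title: &str, revdate: &str) -> String {
    format!("= {title}\n:revdate: {revdate}\n")
}
```

Because `direct_text` never recurses, the text of a nested child such as `<b>bold</b>` is dropped, while the surrounding whitespace survives unchanged. That whitespace is what the byte-for-byte match depends on.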
Tests: 20/20 with --features xsd (16 default + 4 new xsd); 16/16 on
default (xsd code fully feature-gated). Clippy-clean (--no-deps),
fmt-clean.
Docs:
* docs/MARS-TRANSCODING.md §2: bijection marked closed; Python-dep
removal noted; the Rust extractor added to the oracle-direction table.
* .claude/board/EPIPHANIES.md: FINDING with the transcode-discipline
notes for the next source→Rust port.
1 parent 7d68042, commit 8945b7f
5 files changed
Lines changed: 552 additions & 11 deletions
File tree
- .claude/board
- crates/ogar-from-schema/src
- docs