Commit 23bdb31

avrabe and claude authored
feat(dwarf): make --dwarf remap work on real DWARF + witness oracle (v0.20.0) (#209)
* feat(dwarf): make --dwarf remap work on real DWARF + witness oracle (#130)

  Falsifying the v0.19.0 single-source remap against a real Rust component (hello_rust.wasm) surfaced four address classes the synthetic oracle missed, plus gimli's all-or-nothing failure mode. All fixed:

  - low_pc is the function BODY START (locals-decl byte), not the first instruction; the locals/prefix region now maps linearly (locals are preserved verbatim). Previously every low_pc underflowed to a miss.
  - Exclusive-end addresses (high_pc-as-addr, range ends, line end_sequence) map to output_body_end.
  - LLVM emits fixed-width zero-padded LEBs for relocatable indices, so DWARF rows can land inside operand bytes; these resolve to the CONTAINING operator (exact on boundaries) for correct line attribution.
  - Linear-memory/data addresses (DW_OP_addr) at/beyond the code extent pass through unchanged (correct under multi-memory fusion).
  - Per-address tombstoning replaces all-or-nothing strip: gimli's Dwarf::from aborts on a single unmappable address, which on real modules (dead-code gaps) stripped ALL DWARF. convert_address is now total: correct offset where mappable, identity for data/tombstones, 0xFFFF_FFFF for unmappable code. Guarantee strengthens to "every emitted address is correct or an ignored tombstone, never wrong".

  Witness oracle (tests/dwarf_remap_witness.rs, gimli dev-dep): fuses a real Rust component with --dwarf remap and asserts every non-tombstone subprogram low_pc equals some fused function's component-provenance code_range.start (the pulseengine/witness contract, in-tree). On the fixture all but 9 of 225 functions map exactly; the 9 are dead code.

  LS-D-1 amended with the v0.20.0 hardening. Multi-source still strips (gimli writer-API blocker, see #208).

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(dwarf): tombstone ambiguous function-boundary addresses (Mythos #209)

  The Mythos auto-scan flagged that `AddressRemap::translate` could emit a plausible-but-wrong address at a function boundary. Input bodies are contiguous, so one address is both function A's exclusive end (A.input_end) and the next function B's start (B.input_start); the BTreeMap range lookup selects B and returned B.output_body_start, which differs from A.output_body_end when A and B are not adjacent in the fused output (interleaved / reordered functions). A's high_pc-as-addr, range-list end, and end_sequence then mapped to a wrong offset.

  `translate` now detects the divergent-boundary case (prev.output_body_end != span.output_body_start) and returns None so the caller tombstones the address rather than emit a wrong one. When the two interpretations coincide (input order preserved, the single-source case meld emits) the boundary resolves cleanly.

  Pinned by `translate_ambiguous_boundary_tombstones_when_reordered`; LS-D-1 amended.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(dwarf): code_extent covers dropped functions, not just survivors (Mythos #209)

  Second Mythos auto-scan finding: AddressRemap::code_extent used the max input_end over REGISTERED spans, so a source module's dropped trailing function (DCE'd, no span) has input addresses exceeding code_extent. convert_address then classified them as linear-memory data and passed them through as stale input-module offsets instead of tombstoning.

  code_extent is now the input module's full code-section extent (every input function, via module_function_layouts which parses them all), so a dropped function's address stays in the code range and tombstones.

  Pinned by `code_extent_covers_dropped_trailing_function`; LS-D-1 amended.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent b028e43 commit 23bdb31
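
The boundary and `code_extent` rules described in the commit message can be illustrated with a small model. This is a minimal sketch, not meld's code: the `Span` layout, the `demo_remap` helper, and the order of checks inside `convert_address` are assumptions made for illustration. Interior addresses are mapped by a plain linear offset, as if body bytes were copied unchanged, whereas the commit resolves instruction addresses per operator.

```rust
use std::collections::BTreeMap;

/// DWARF tombstone for unmappable code addresses (value from the commit message).
const TOMBSTONE: u64 = 0xFFFF_FFFF;

/// Hypothetical record for one surviving function body.
#[derive(Clone, Copy, Debug)]
struct Span {
    input_end: u64,         // exclusive end of the body in the input module
    output_body_start: u64, // body start (locals-declaration byte) in the output
    output_body_end: u64,   // exclusive end of the body in the output
}

struct AddressRemap {
    spans: BTreeMap<u64, Span>, // keyed by input body start
    code_extent: u64,           // full input code extent, dropped functions included
}

impl AddressRemap {
    /// Returns None when the address is unmappable or ambiguous.
    fn translate(&self, addr: u64) -> Option<u64> {
        // Last span whose input start is at or before `addr`.
        let (&start, span) = self.spans.range(..=addr).next_back()?;
        if addr == start {
            // The same address may be the previous function's exclusive end.
            if let Some((_, prev)) = self.spans.range(..start).next_back() {
                if prev.input_end == addr
                    && prev.output_body_end != span.output_body_start
                {
                    return None; // divergent boundary: refuse to guess
                }
            }
            return Some(span.output_body_start);
        }
        if addr < span.input_end {
            // Assumption of this sketch: offsets inside the body are preserved.
            return Some(span.output_body_start + (addr - start));
        }
        if addr == span.input_end {
            return Some(span.output_body_end); // exclusive-end address
        }
        None // gap between surviving functions (dropped code)
    }

    /// Total conversion: never fails, never emits a guessed address.
    fn convert_address(&self, addr: u64) -> u64 {
        if addr == TOMBSTONE {
            return addr; // existing tombstones pass through
        }
        match self.translate(addr) {
            Some(out) => out,
            None if addr >= self.code_extent => addr, // data address: identity
            None => TOMBSTONE,                        // unmappable code
        }
    }
}

/// Hypothetical layout: A = input 10..20, B = input 20..30, and a dropped
/// function at input 30..40. A and B are reordered in the output.
fn demo_remap() -> AddressRemap {
    let mut spans = BTreeMap::new();
    spans.insert(10, Span { input_end: 20, output_body_start: 100, output_body_end: 110 });
    spans.insert(20, Span { input_end: 30, output_body_start: 50, output_body_end: 60 });
    AddressRemap { spans, code_extent: 40 }
}
```

With this layout, `demo_remap().convert_address(20)` yields the tombstone because 20 is both A's end (output 110) and B's start (output 50). Address 35 lies inside the dropped function and also tombstones, since `code_extent` covers it, while 4096 is treated as a data address and passes through unchanged.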

7 files changed

Lines changed: 564 additions & 28 deletions

CHANGELOG.md

Lines changed: 52 additions & 0 deletions
@@ -4,6 +4,58 @@ All notable changes to this project will be documented in this file.
 
 ## [Unreleased]
 
+## [0.20.0] - 2026-05-29
+
+### Changed
+
+- **`DwarfHandling::Remap` now works on real compiler-emitted DWARF**
+(#143 / #130, witness integration). Falsifying the v0.19.0 remap
+against a real Rust component (`hello_rust.wasm`) surfaced four
+address classes the synthetic oracle missed; all are now handled, and
+the all-or-nothing gimli failure mode is replaced with per-address
+tombstoning.
+- **`low_pc` is the function body start.** WebAssembly DWARF measures
+`DW_AT_low_pc` from the locals-declaration byte, not the first
+instruction. The locals/prefix region now maps linearly (locals are
+preserved verbatim during fusion). Previously every function's
+`low_pc` underflowed to a miss.
+- **Exclusive-end addresses** (`high_pc`-as-address, range-list ends,
+line-program `end_sequence`) map to the output body end.
+- **Padded LEBs.** LLVM emits fixed-width zero-padded LEBs for
+relocatable indices, so DWARF rows can land inside an operator's
+operand bytes; these now resolve to the **containing** operator
+(exact on instruction boundaries), giving correct source-line
+attribution.
+- **Data addresses** (`DW_OP_addr` linear-memory locations) at or
+beyond the code extent pass through unchanged (correct under
+multi-memory fusion).
+- **Per-address tombstoning replaces all-or-nothing strip.**
+`gimli::write::Dwarf::from` aborts the entire conversion on a single
+unmappable address, so on real modules (hundreds of functions,
+dead-code gaps) the strict gate stripped *all* DWARF. The
+`convert_address` closure is now total: correct fused offset where
+mappable, identity for data/existing tombstones, and the DWARF
+tombstone `0xFFFF_FFFF` for unmappable code (functions meld dropped).
+The LS-D-1 guarantee strengthens to *"every emitted address is
+correct or an ignored tombstone — never a plausible-but-wrong
+address."*
+
+### Added
+
+- **DWARF remap falsification oracle** (`tests/dwarf_remap_witness.rs`,
+#130). Fuses a real Rust component with `--dwarf remap` and asserts
+every non-tombstone `DW_TAG_subprogram` `low_pc` in the *output* DWARF
+equals some fused function's component-provenance `code_range.start`
+the `pulseengine/witness` code-offset → source-line contract, checked
+in-tree with `gimli`. On the fixture all but 9 of 225 functions map
+exactly (the 9 are dead code, tombstoned). Updates LS-D-1.
+
+### Notes
+
+- Multi-DWARF-source fusion still falls back to `strip` (merging
+independent DWARF unit sets is blocked by gimli's writer API — see
+issue #208 for the analysis and the relocator design).
+
 ## [0.19.0] - 2026-05-29
 
 ### Added
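
The "Padded LEBs" entry in the changelog diff above can be made concrete with a short sketch. Both function names and the shape of the operator table (sorted pairs of input and output operator start offsets) are hypothetical, and the assumption that the fused output re-encodes the index in fewer bytes is for illustration only. Five bytes is the usual padded width for a relocatable 32-bit index.

```rust
/// Encode `value` as unsigned LEB128 padded with zero groups to exactly
/// `width` bytes. A 5-byte width always holds a full u32.
fn encode_padded_uleb(mut value: u32, width: usize) -> Vec<u8> {
    let mut out = Vec::with_capacity(width);
    for i in 0..width {
        let mut byte = (value & 0x7f) as u8;
        value >>= 7;
        if i + 1 < width {
            byte |= 0x80; // continuation bit on every byte except the last
        }
        out.push(byte);
    }
    out
}

/// Resolve an input address to the output offset of the operator that
/// contains it. `ops` holds (input_start, output_start) per operator,
/// sorted by input_start. Bounding `addr` by the body end is left to the
/// caller in this sketch.
fn containing_operator(ops: &[(u64, u64)], addr: u64) -> Option<u64> {
    // Number of operators that start at or before `addr`.
    let n = ops.partition_point(|&(input, _)| input <= addr);
    n.checked_sub(1).map(|i| ops[i].1)
}
```

For example, a `call` whose index is padded to five bytes spans six input bytes. With a table of `[(0, 0), (6, 2), (7, 3)]`, an address of 3 falls inside the padded operand and resolves to the `call` at output offset 0, while an address of 6 sits exactly on the next operator and resolves to output offset 2.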

Cargo.lock

Lines changed: 2 additions & 2 deletions

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ exclude = [
 ]
 
 [workspace.package]
-version = "0.19.0"
+version = "0.20.0"
 authors = ["PulseEngine <https://github.com/pulseengine>"]
 edition = "2024"
 license = "Apache-2.0"

meld-core/Cargo.toml

Lines changed: 3 additions & 0 deletions
@@ -43,6 +43,9 @@ petgraph = "0.8.3"
 [dev-dependencies]
 proptest.workspace = true
 wasmprinter.workspace = true
+# Re-declared for integration tests (tests/dwarf_remap_witness.rs) to
+# parse and validate the remapped DWARF in the fused output.
+gimli.workspace = true
 wat = "1.219"
 wasmtime = "41.0.1"
 wasmtime-wasi = "41.0.4"
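
The `gimli` dev-dependency above serves the witness test. Its central assertion, as the changelog describes it, can be sketched without any DWARF parsing. The function name and the plain-slice inputs are assumptions for illustration; the real test reads `low_pc` values from the output DWARF with `gimli` and the `code_range.start` values from the fused component's provenance.

```rust
use std::collections::BTreeSet;

/// DWARF tombstone for unmappable code addresses (value from the changelog).
const TOMBSTONE: u64 = 0xFFFF_FFFF;

/// Return every subprogram `low_pc` that is neither a tombstone nor the
/// start of some fused function's code range. An empty result means the
/// contract holds.
fn witness_violations(low_pcs: &[u64], code_range_starts: &[u64]) -> Vec<u64> {
    let starts: BTreeSet<u64> = code_range_starts.iter().copied().collect();
    low_pcs
        .iter()
        .copied()
        .filter(|&pc| pc != TOMBSTONE && !starts.contains(&pc))
        .collect()
}
```

Returning the offending addresses instead of a boolean keeps a failing run informative: the test can print exactly which `low_pc` values landed on a plausible but wrong offset.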
