Commit 58e2c2d

avrabe and claude authored
feat(dwarf): AddressRemap engine (#143 DWARF Phase 2 inc 3a) (#204)
The mathematical core of DWARF address remapping: composes the two anchors from v0.16.0 (per-function base via component-provenance v2 code_range) and v0.17.0 (intra-function InstrOffsetMap) into one input→output code-address translation.

- meld-core/src/dwarf.rs: FunctionSpan + AddressRemap with translate(input_addr) -> Option<output_addr>
- reconciles the three byte-offset spaces (input DWARF address, instruction-stream offset, output DWARF address), accounting for the locals-prefix length that cancels between input and output
- BTreeMap range lookup picks the containing function span; misses (outside any function, off an instruction boundary, inside the locals prefix) return None so the gimli converter drops the address rather than emitting a wrong one

6 unit tests: identity offsets, instruction-offset shift (LEB growth), locals-prefix handling, multi-function selection, miss cases, and locals-prefix underflow guard.

This is increment 3a (the engine). Increment 3b wires it into a gimli write::Dwarf::from(convert_address) rewrite of the .debug_* sections behind a new DwarfHandling::Remap mode. gimli was added then removed here to avoid shipping an unused dependency; it returns in 3b where it is actually used.

301 lib tests green, clippy + fmt clean.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
1 parent b57bb92 commit 58e2c2d
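
The offset arithmetic described in the commit message can be sketched in isolation. This is a minimal illustration rather than the code from the diff: the `Span` struct and the helper names `input_to_stream` and `stream_to_output` are hypothetical, and the instruction-offset lookup that sits between the two steps is left out.

```rust
/// Hypothetical flat view of one function's remap inputs. The field
/// names loosely mirror `FunctionSpan`, but this type is illustrative.
struct Span {
    /// Start of the function body in the input code section.
    input_start: u32,
    /// Start of the function body in the merged code section.
    output_body_start: u32,
    /// Length of the locals vector, assumed equal on input and output.
    locals_prefix_len: u32,
}

/// Step 1: input DWARF address -> instruction-stream offset.
/// Returns `None` if `addr` lies before the body or inside the locals
/// prefix (either subtraction would underflow).
fn input_to_stream(span: &Span, addr: u32) -> Option<u32> {
    addr.checked_sub(span.input_start)?
        .checked_sub(span.locals_prefix_len)
}

/// Step 3: new instruction-stream offset -> output DWARF address.
/// Step 2 (old -> new stream offset) is the rewriter's job and is
/// assumed to have happened already. Returns `None` on u32 overflow.
fn stream_to_output(span: &Span, instr_stream_new: u32) -> Option<u32> {
    span.output_body_start
        .checked_add(span.locals_prefix_len)?
        .checked_add(instr_stream_new)
}
```

With `input_start = 10`, `output_body_start = 50` and a 3-byte locals prefix, input address 17 becomes stream offset 4. If the rewriter moved that instruction to stream offset 5, the output address is 50 + 3 + 5 = 58. Input address 12 falls inside the locals prefix and yields `None`.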

3 files changed

Lines changed: 289 additions & 0 deletions

CHANGELOG.md

Lines changed: 22 additions & 0 deletions
@@ -4,6 +4,28 @@ All notable changes to this project will be documented in this file.
 
 ## [Unreleased]
 
+### Added
+
+- **DWARF `AddressRemap` engine** (#143 DWARF Phase 2 increment 3a,
+  `meld-core/src/dwarf.rs`). Composes the two anchors released in
+  v0.16.0 (per-function base via component-provenance v2 `code_range`)
+  and v0.17.0 (intra-function `InstrOffsetMap`) into a single
+  input→output code-address translation: `AddressRemap::translate`
+  takes an input code-section-relative DWARF address and returns the
+  fused-output address, or `None` when the address is outside any
+  function or off an instruction boundary (the signal for the gimli
+  converter to drop it rather than emit a wrong address). The module
+  reconciles the three byte-offset spaces (input DWARF address,
+  instruction-stream offset, output DWARF address), accounting for
+  the shared locals-prefix length that cancels between input and
+  output. 6 unit tests pin identity mapping, instruction-offset
+  shift (LEB growth), locals-prefix handling, multi-function
+  selection, and the miss cases (outside functions, mid-instruction,
+  inside-locals-prefix underflow). This is the mathematical core of
+  DWARF Phase 2; increment 3b wires it into a `gimli`-driven
+  `write::Dwarf::from(convert_address)` rewrite of the `.debug_*`
+  sections behind a new `DwarfHandling::Remap` mode.
+
 ## [0.17.0] - 2026-05-28
 
 ### Added
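
The "LEB growth" mentioned in the changelog entry comes from WebAssembly encoding index operands as unsigned LEB128, 7 payload bits per byte. A minimal sketch of why renumbering a function index can lengthen an instruction follows; the helper names `uleb128_len` and `call_len` are hypothetical and are not part of meld.

```rust
/// Number of bytes the unsigned LEB128 encoding of `value` occupies.
/// Each byte carries 7 payload bits, so the length grows at 2^7,
/// 2^14, 2^21 and 2^28.
fn uleb128_len(mut value: u32) -> u32 {
    let mut len = 1;
    while value >= 0x80 {
        value >>= 7;
        len += 1;
    }
    len
}

/// Encoded size of a WebAssembly `call` instruction: one opcode byte
/// followed by the LEB128-encoded function index.
fn call_len(func_index: u32) -> u32 {
    1 + uleb128_len(func_index)
}
```

Under this sketch, a `call` whose index is renumbered from 5 to 200 during fusion grows from 2 bytes to 3, so every later instruction in that body shifts by one byte. That is the kind of shift an intra-function offset map has to record.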

meld-core/src/dwarf.rs

Lines changed: 266 additions & 0 deletions
@@ -0,0 +1,266 @@
+//! DWARF address remapping for fused output (issue #143 Phase 2).
+//!
+//! When meld fuses N components into one core module, each function
+//! body moves to a new offset in the merged code section AND its
+//! internal byte layout shifts (the rewriter changes operand index
+//! values whose LEB128 encodings change length; see
+//! [`crate::rewriter::InstrOffsetMap`]). WebAssembly DWARF encodes
+//! code addresses as offsets relative to the start of the code
+//! section's contents, so every `DW_AT_low_pc`, line-number-program
+//! address, and range entry in the input DWARF is wrong for the
+//! fused output unless remapped.
+//!
+//! This module composes the two anchors built in increments 1 and 2:
+//!
+//! - **Per-function base** (increment 1): where each function's body
+//!   lands in the *merged* code section, from the component-provenance
+//!   v2 `code_range` ([`crate::provenance::CodeRange`]).
+//! - **Intra-function instruction offsets** (increment 2): how byte
+//!   offsets shift *within* a rewritten function body, from
+//!   [`crate::rewriter::InstrOffsetMap`].
+//!
+//! into an [`AddressRemap`]: a function from an input code-section-relative
+//! address to its fused-output code-section-relative address. Increment
+//! 3b (the gimli section rewrite) uses this as the `convert_address`
+//! closure for `gimli::write::Dwarf::from`.
+//!
+//! ## Offset-convention reconciliation
+//!
+//! Three byte-offset spaces meet here, and getting their bases
+//! aligned is the whole game:
+//!
+//! 1. **Input DWARF address** `A`: code-section-relative offset in the
+//!    *source* component. Points at an instruction. To locate which
+//!    function `A` is in, we need each input body's code-section-relative
+//!    span (`FunctionSpan::input_start` to `FunctionSpan::input_end`).
+//! 2. **Instruction-stream offset**: relative to the first instruction
+//!    of a function body (after the locals-declaration vector). The
+//!    [`InstrOffsetMap`](crate::rewriter::InstrOffsetMap) keys on this.
+//!    Converting `A` to this space means subtracting the input
+//!    function body's start AND the locals-prefix length.
+//! 3. **Output DWARF address** `A'`: code-section-relative offset in
+//!    the *merged* module = merged function body start
+//!    (`FunctionSpan::output_body_start`) + output locals-prefix
+//!    length + new instruction-stream offset.
+//!
+//! Because meld preserves a function's locals declarations verbatim
+//! (the rewriter only converts val-types, never adds/removes locals
+//! except the address-rebasing scratch locals, which are off in the
+//! DWARF-remap path), the locals-prefix length is identical on input
+//! and output. So the prefix cancels between input and output, and the
+//! [`FunctionSpan`] records it once as `locals_prefix_len`.
+
+use crate::rewriter::InstrOffsetMap;
+use std::collections::BTreeMap;
+
+/// One fused function's mapping data: where it was in the input code
+/// section, where it landed in the output, the shared locals-prefix
+/// length, and the per-instruction offset shift.
+#[derive(Debug, Clone)]
+pub struct FunctionSpan {
+    /// `[start, end)` of this function body in the **input** code
+    /// section (code-section-relative), including the locals prefix.
+    pub input_start: u32,
+    pub input_end: u32,
+    /// Start of this function body in the **output** (merged) code
+    /// section (code-section-relative), including the locals prefix.
+    /// This is the v2 provenance `code_range.start`.
+    pub output_body_start: u32,
+    /// Byte length of the locals-declaration vector at the head of the
+    /// body, identical on input and output (locals are preserved).
+    /// The instruction stream begins `locals_prefix_len` bytes past
+    /// each body start.
+    pub locals_prefix_len: u32,
+    /// Per-instruction old→new offset map (instruction-stream-relative).
+    pub instr_offsets: InstrOffsetMap,
+}
+
+impl FunctionSpan {
+    /// `true` if the input code address `addr` falls within this
+    /// function body's input span.
+    fn contains_input(&self, addr: u32) -> bool {
+        addr >= self.input_start && addr < self.input_end
+    }
+}
+
+/// Composed input→output code-address remapper for fused DWARF.
+///
+/// Built from the per-function [`FunctionSpan`]s collected during
+/// fusion. Lookups are by input code-section-relative address; the
+/// result is the output code-section-relative address, or `None` when
+/// the address can't be mapped (outside any known function, or not on
+/// a recorded instruction boundary; DWARF code addresses always sit
+/// at instruction starts, so a miss is a genuine "don't emit this
+/// address" signal for the gimli converter).
+#[derive(Debug, Clone, Default)]
+pub struct AddressRemap {
+    /// Indexed by input_start for an O(log n) containing-function
+    /// lookup. Spans are non-overlapping (function bodies are laid
+    /// out sequentially), so the greatest key ≤ addr is the candidate.
+    by_input_start: BTreeMap<u32, FunctionSpan>,
+}
+
+impl AddressRemap {
+    pub fn new() -> Self {
+        Self::default()
+    }
+
+    /// Register a function's span. Panics in debug builds if two
+    /// spans share an input_start (would indicate a merger bug,
+    /// since function bodies are distinct).
+    pub fn insert(&mut self, span: FunctionSpan) {
+        debug_assert!(
+            !self.by_input_start.contains_key(&span.input_start),
+            "duplicate input_start {} in AddressRemap",
+            span.input_start
+        );
+        self.by_input_start.insert(span.input_start, span);
+    }
+
+    /// Translate an input code-section-relative address to the fused
+    /// output code-section-relative address.
+    ///
+    /// Returns `None` if `addr` is not inside any registered function
+    /// or does not land on a recorded instruction boundary.
+    pub fn translate(&self, addr: u32) -> Option<u32> {
+        // Greatest span whose input_start ≤ addr.
+        let (_, span) = self.by_input_start.range(..=addr).next_back()?;
+        if !span.contains_input(addr) {
+            return None;
+        }
+        // Convert the input address to an instruction-stream offset:
+        // subtract the body start and the locals prefix.
+        let body_rel = addr - span.input_start;
+        let instr_stream_old = body_rel.checked_sub(span.locals_prefix_len)?;
+        // Map through the rewriter's instruction offset map.
+        let instr_stream_new = span.instr_offsets.translate(instr_stream_old)?;
+        // Reassemble the output address: merged body start + locals
+        // prefix (unchanged) + new instruction-stream offset.
+        Some(span.output_body_start + span.locals_prefix_len + instr_stream_new)
+    }
+
+    /// Number of registered function spans.
+    pub fn len(&self) -> usize {
+        self.by_input_start.len()
+    }
+
+    pub fn is_empty(&self) -> bool {
+        self.by_input_start.is_empty()
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::rewriter::{InstrOffset, InstrOffsetMap};
+
+    fn span(
+        input_start: u32,
+        input_end: u32,
+        output_body_start: u32,
+        locals_prefix_len: u32,
+        offsets: &[(u32, u32)],
+    ) -> FunctionSpan {
+        FunctionSpan {
+            input_start,
+            input_end,
+            output_body_start,
+            locals_prefix_len,
+            instr_offsets: InstrOffsetMap {
+                entries: offsets
+                    .iter()
+                    .map(|&(old, new)| InstrOffset { old, new })
+                    .collect(),
+            },
+        }
+    }
+
+    /// Single function, no locals prefix, identity instruction map:
+    /// an input address maps to output_body_start + same offset.
+    #[test]
+    fn translate_single_function_identity_offsets() {
+        let mut remap = AddressRemap::new();
+        // Input body [10, 20), output body starts at 100, no locals,
+        // instructions at stream offsets 0,2,3.
+        remap.insert(span(10, 20, 100, 0, &[(0, 0), (2, 2), (3, 3)]));
+
+        // Input addr 10 = body start = instr-stream 0 → output 100.
+        assert_eq!(remap.translate(10), Some(100));
+        // Input addr 12 = instr-stream 2 → output 100 + 2 = 102.
+        assert_eq!(remap.translate(12), Some(102));
+        assert_eq!(remap.translate(13), Some(103));
+    }
+
+    /// LEB-growth case: the instruction map shifts offsets, and the
+    /// output base differs. Input body [10,20), output body at 200,
+    /// instr map shows divergence (drop moved +1 after a remapped call).
+    #[test]
+    fn translate_applies_instruction_offset_shift() {
+        let mut remap = AddressRemap::new();
+        // instr stream: call@0→0, drop@2→3 (call grew 1 byte).
+        remap.insert(span(10, 20, 200, 0, &[(0, 0), (2, 3)]));
+
+        // call at input 10 → output 200.
+        assert_eq!(remap.translate(10), Some(200));
+        // drop at input 12 (stream offset 2) → output 200 + 3 = 203.
+        assert_eq!(remap.translate(12), Some(203));
+    }
+
+    /// Locals prefix is subtracted on the way in and re-added on the
+    /// way out (it's preserved verbatim), so a function whose body
+    /// starts with a 3-byte locals vector still maps instructions
+    /// correctly.
+    #[test]
+    fn translate_accounts_for_locals_prefix() {
+        let mut remap = AddressRemap::new();
+        // Input body [10, 30), 3-byte locals prefix, output body at 50.
+        // First instruction is at body_rel 3 = instr-stream 0.
+        remap.insert(span(10, 30, 50, 3, &[(0, 0), (4, 5)]));
+
+        // Input addr 13 = body_rel 3 = instr-stream 0 → 50 + 3 + 0 = 53.
+        assert_eq!(remap.translate(13), Some(53));
+        // Input addr 17 = body_rel 7 = instr-stream 4 → 50 + 3 + 5 = 58.
+        assert_eq!(remap.translate(17), Some(58));
+    }
+
+    /// Multiple functions: the BTreeMap range lookup picks the right
+    /// containing span.
+    #[test]
+    fn translate_selects_correct_function_among_many() {
+        let mut remap = AddressRemap::new();
+        remap.insert(span(0, 10, 1000, 0, &[(0, 0), (5, 5)]));
+        remap.insert(span(10, 25, 2000, 0, &[(0, 0), (8, 9)]));
+        remap.insert(span(25, 40, 3000, 0, &[(0, 0), (3, 3)]));
+
+        assert_eq!(remap.translate(5), Some(1005)); // func 0
+        assert_eq!(remap.translate(18), Some(2009)); // func 1, shifted
+        assert_eq!(remap.translate(28), Some(3003)); // func 2
+    }
+
+    /// Addresses outside any function, or not on an instruction
+    /// boundary, return None, so gimli will then drop them rather than
+    /// emit a wrong address.
+    #[test]
+    fn translate_misses_return_none() {
+        let mut remap = AddressRemap::new();
+        remap.insert(span(10, 20, 100, 0, &[(0, 0), (2, 2)]));
+
+        assert_eq!(remap.translate(5), None, "before any function");
+        assert_eq!(remap.translate(20), None, "at end (exclusive)");
+        assert_eq!(remap.translate(50), None, "past all functions");
+        assert_eq!(remap.translate(1), None, "below first function");
+        // Inside the function but not on a recorded instruction start.
+        assert_eq!(remap.translate(11), None, "mid-instruction offset");
+    }
+
+    /// An address whose body-relative offset is *less than* the locals
+    /// prefix (i.e. pointing into the locals declarations, not the
+    /// instruction stream) returns None rather than underflowing.
+    #[test]
+    fn translate_address_inside_locals_prefix_is_none() {
+        let mut remap = AddressRemap::new();
+        remap.insert(span(10, 30, 50, 5, &[(0, 0)]));
+        // Input addr 12 = body_rel 2 < locals_prefix_len 5 → None.
+        assert_eq!(remap.translate(12), None);
+    }
+}
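
Two pieces of the engine can be looked at on their own: the "greatest key ≤ addr" span lookup, and the width adaptation a caller would need if the consumer hands over 64-bit addresses. The sketch below is illustrative only. `containing_span` and `widen` are hypothetical names, the map stores just an exclusive end instead of a full `FunctionSpan`, and the closure shape is a simplification: it assumes the increment 3b converter receives a `u64` address, and it returns a plain `Option<u64>` rather than gimli's own address type.

```rust
use std::collections::BTreeMap;

/// Find the half-open span `[start, end)` that contains `addr`.
/// `spans` maps each span's start to its exclusive end and is assumed
/// to hold non-overlapping spans.
fn containing_span(spans: &BTreeMap<u32, u32>, addr: u32) -> Option<(u32, u32)> {
    // The greatest start <= addr is the only possible candidate.
    let (&start, &end) = spans.range(..=addr).next_back()?;
    // The candidate may still end before `addr` (a gap between spans).
    (addr < end).then_some((start, end))
}

/// Wrap a u32 -> Option<u32> translator so it accepts and returns u64.
/// Addresses that do not fit in u32 are treated as misses.
fn widen<F>(translate: F) -> impl Fn(u64) -> Option<u64>
where
    F: Fn(u32) -> Option<u32>,
{
    move |addr| {
        let narrow = u32::try_from(addr).ok()?;
        translate(narrow).map(u64::from)
    }
}
```

The explicit `addr < end` check matters because the range lookup alone would happily return the previous function for an address that sits in a gap or past the last body. Treating out-of-range 64-bit addresses as misses keeps the "drop rather than emit a wrong address" behaviour described in the diff.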

meld-core/src/lib.rs

Lines changed: 1 addition & 0 deletions
@@ -44,6 +44,7 @@
 pub mod adapter;
 pub mod attestation;
 pub mod component_wrap;
+pub mod dwarf;
 mod error;
 pub mod merger;
 pub mod p3_async;
