Skip to content

Commit eed01b7

Browse files
authored
dyninst/symdb: fix cross-CU inlined functions (#49860)
### What does this PR do? Extends the `symdb` cross-compile-unit indexing scheme introduced in #48822 to cover **non-generic** abstract inline functions and every inline instance regardless of which compile unit holds it. Along the way, fixes a streaming-iteration bug where a package could be yielded more than once, with the trailing yield containing only "leftover" functions that symdb couldn't attribute to the first yield in time. The patch series is: 1. **`dyninst/symdb: drive TestSymDBSnapshot through streaming PackagesIterator`** Replace the existing `TestSymDBSnapshot` (which called `ExtractSymbols` and aggregated all yields into a single `Symbols` value before serialization) with a test that drives the streaming `PackagesIterator` API dire 2. **`dyninst/testprogs: add cross-CU inline test cases`** Add `lib.InlinedInLaterCU`, a function defined in `lib` but called only from `lib.v2.FooV2`. Because `lib.v2`'s compile unit is emitted *after* `lib`'s CU in the binary, the Go compiler places the abstract-definition DIE in `lib.v2`'s CU. Regenerate the snapshot goldens with the existing (pre-fix) `symdb` code — the new streaming goldens clearly capture the bug described below. 3. **`dyninst/symdb: rename genericFuncIndex to funcOffsetByNameIndex`** Pure rename of the sorted name-keyed index type to decouple it from its initial use for generic shape functions. No behaviour change. 4. **`dyninst/symdb: index inline instances across compile units`** The actual fix. Adds two pre-pass indexes: - `inlineDefIndex` — canonical qualified name → abstract-definition DWARF offset, for every `DW_TAG_subprogram` with `DW_AT_inline = DW_INL_inlined` (both generic and non-generic). - `inlineInstanceIndex` — abstract-origin DWARF offset → instance DWARF offset, for every inline instance (both `DW_TAG_inlined_subroutine` and out-of-line `DW_TAG_subprogram` with `DW_AT_abstract_origin`). At each package's CU emission, every instance of every abstract function owned by that package is replayed from `inlineInstanceIndex` so that `file`/`endLine`/`injectibleLines` and per-variable availability are populated regardless of which CU the abstract DIE or its instances live in. Emission is scoped to the defining CU: `b.abstractFunctions` is no longer cleared per CU, so instance data accumulates across CUs, and `addAbstractFunctions` only emits entries whose owning package matches. The `displacedFunctions` fallback is removed from the abstract-function path (and `addFunctionToPackage`) because the indexes cover that case by construction — verified with an empirical check that the fallback never fires on any of the sample programs. ### Motivation A pre-existing bug in the streaming `PackagesIterator` API: a package can be yielded **more than once**, with the second yield containing only the "leftover" functions that `symdb` couldn't attribute to the first yield in time. The trailing yields are assembled from `displacedFunctions`, a side channel that accumulates functions parsed from non-owner CUs for later merging. Two concrete cases in the sample binary: - **Non-generic, cross-CU abstract.** `lib.InlinedInLaterCU` is defined in `lib` and inlined only from `lib.v2.FooV2`. The Go compiler emits the abstract-definition DIE in `lib.v2`'s CU, which comes *after* `lib`'s CU in the binary. When `symdb` iterates CUs: - `lib`'s CU is processed first and yields a `lib` package without `InlinedInLaterCU`. - `lib.v2`'s CU is processed later; its `exploreSubprogram` parses `InlinedInLaterCU`'s abstract DIE and, because it belongs to `lib` not `lib.v2`, routes it through `displacedFunctions["lib"]`. - At end-of-iteration, `displacedFunctions["lib"]` is yielded as a new `lib` package containing just this function. `ExtractSymbols` hid this by aggregating both yields into one `Symbols` value before serialization. `PackagesIterator` (used by the production uploader, `pkg/dyninst/module/symdb.go`) does not — it passes both yields through as two separate uploads, so the server sees the same package twice with disjoint contents. Switching the test to the streaming API (patch 1) makes this visible. - **Generic shape, cross-CU abstract.** `lib.NewImmutableSet[go.shape.string]`'s abstract DIE lands in `main`'s CU (the compiler's generic-shape placement behaviour). The existing `augmentWithDisplacedGenerics` path re-parses the abstract DIE while processing `lib`'s CU, but never replays its inline instance, so it produces a broken stub entry with empty `file`/`endLine`/`injectibleLines` alongside the correctly-populated entry. Both entries end up in the output. Both problems have the same shape: `symdb`'s abstract-function handling was scoped to the CU currently being walked, with a best-effort side channel for everything that didn't fit. The fix is to precompute the two indexes (abstract definitions, inline instances) in the existing pre-pass and replay instances at each package's emission point, so every abstract function is emitted exactly once, on its owning package's yield, with complete per-instance data — regardless of the compiler's DIE placement choices. ### Describe how you validated your changes - New unit tests for the origin-keyed index (`TestFuncOffsetByOriginIndex`) covering empty / single / multi / duplicate origins / early-break iteration across both in-memory and on-disk (mmap-backed) variants. - `TestSymDBSnapshot` goldens across 4 Go toolchains and 2 architectures. The golden diff across the commit chain visibly demonstrates: - pre-fix streaming output: 5 `lib` yields for `sample`, with a trailing duplicate yield containing `InlinedInLaterCU` + a broken `NewImmutableSet` stub. - post-fix: 4 yields, with `InlinedInLaterCU` attached to `lib`'s primary yield and the broken stub removed. - `TestIRGenAndCompileSymDBProbes` passes across all toolchain/arch combinations. - Dyninst E2E tests (`Dyn/sample/1.25`) pass. ### Additional Notes https://datadoghq.atlassian.net/browse/DEBUG-5497 Co-authored-by: andrew.werner <andrew.werner@datadoghq.com>
1 parent 4b1f0fd commit eed01b7

174 files changed

Lines changed: 18289 additions & 17195 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

pkg/dyninst/gosymname/BUILD.bazel

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,7 @@ go_test(
2121
srcs = [
2222
"benchmark_test.go",
2323
"comparison_test.go",
24+
"escape_test.go",
2425
"fuzz_test.go",
2526
"generic_test.go",
2627
"symbol_test.go",

pkg/dyninst/gosymname/escape.go

Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -37,6 +37,40 @@ func splitPkg(name string) (pkgpath, sym string) {
3737
return "", name
3838
}
3939

40+
// EscapePkg escapes a Go package import path the same way the Go toolchain
41+
// does when emitting linker symbol names. Specifically: control bytes,
42+
// space, '%', '"', non-7-bit-clean bytes, and '.' bytes that appear after
43+
// the last '/' are encoded as %XX (lowercase hex). Other bytes are left
44+
// untouched. This mirrors cmd/internal/objabi.PathToPrefix in the Go
45+
// source tree.
46+
func EscapePkg(s string) string {
47+
slash := strings.LastIndex(s, "/")
48+
n := 0
49+
for i := 0; i < len(s); i++ {
50+
if needsEscape(s[i], i, slash) {
51+
n++
52+
}
53+
}
54+
if n == 0 {
55+
return s
56+
}
57+
const hex = "0123456789abcdef"
58+
p := make([]byte, 0, len(s)+2*n)
59+
for i := 0; i < len(s); i++ {
60+
c := s[i]
61+
if needsEscape(c, i, slash) {
62+
p = append(p, '%', hex[c>>4], hex[c&0xF])
63+
} else {
64+
p = append(p, c)
65+
}
66+
}
67+
return string(p)
68+
}
69+
70+
func needsEscape(c byte, i, lastSlash int) bool {
71+
return c <= ' ' || (c == '.' && i > lastSlash) || c == '%' || c == '"' || c >= 0x7F
72+
}
73+
4074
// unescapePkg unescapes a package import path, replacing %XX escape sequences
4175
// with the original characters. Returns the unescaped path.
4276
func unescapePkg(s string) (string, error) {
Lines changed: 79 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,79 @@
1+
// Unless explicitly stated otherwise all files in this repository are licensed
2+
// under the Apache License Version 2.0.
3+
// This product includes software developed at Datadog (https://www.datadoghq.com/).
4+
// Copyright 2025-present Datadog, Inc.
5+
6+
package gosymname
7+
8+
import (
9+
"testing"
10+
11+
"github.com/stretchr/testify/require"
12+
)
13+
14+
func TestEscapePkg(t *testing.T) {
15+
cases := []struct {
16+
name string
17+
in string
18+
out string
19+
}{
20+
{"plain", "main", "main"},
21+
{"single_dot_in_segment", "lib.v2", "lib%2ev2"},
22+
{"slash_then_dot", "gopkg.in/ini.v1", "gopkg.in/ini%2ev1"},
23+
{"multi_segment_no_dot_last", "github.com/DataDog/datadog-agent/pkg/foo", "github.com/DataDog/datadog-agent/pkg/foo"},
24+
{"multi_segment_dot_last", "github.com/DataDog/datadog-agent/pkg/foo.v2", "github.com/DataDog/datadog-agent/pkg/foo%2ev2"},
25+
{"multiple_dots_last", "a.b.c", "a%2eb%2ec"},
26+
{"percent", "weird%name", "weird%25name"},
27+
{"quote", `q"x`, `q%22x`},
28+
{"empty", "", ""},
29+
{"trailing_dot", "pkg.", "pkg%2e"},
30+
}
31+
for _, c := range cases {
32+
t.Run(c.name, func(t *testing.T) {
33+
got := EscapePkg(c.in)
34+
require.Equal(t, c.out, got)
35+
// Round-trip through unescapePkg.
36+
back, err := unescapePkg(got)
37+
require.NoError(t, err)
38+
require.Equal(t, c.in, back)
39+
})
40+
}
41+
}
42+
43+
// FuzzEscapePkg asserts that unescapePkg ∘ EscapePkg is the identity:
44+
// every input round-trips through escape and unescape unchanged. Seeds
45+
// cover the kinds of paths we observe in real Go binaries: plain import
46+
// paths, paths with dotted last segments, the corner cases EscapePkg
47+
// specifically handles (control bytes, quotes, percent, high bytes),
48+
// and a few empty/edge inputs.
49+
func FuzzEscapePkg(f *testing.F) {
50+
seeds := []string{
51+
"",
52+
"main",
53+
"lib.v2",
54+
"gopkg.in/ini.v1",
55+
"github.com/DataDog/datadog-agent/pkg/dyninst/symdb",
56+
"github.com/DataDog/datadog-agent/pkg/dyninst/testprogs/progs/sample/lib.v2",
57+
"a.b.c",
58+
"a/b.c",
59+
"a/b.c/d",
60+
"weird%name",
61+
`q"x`,
62+
"pkg.",
63+
"\x00\x01 \x7f",
64+
"\xc3\xa9", // multi-byte UTF-8 (é)
65+
"go.opencensus.io", // dotted top-level
66+
"k8s.io/api/core/v1", // dotted top, plain last
67+
"sigs.k8s.io/yaml.v1", // dotted top + dotted last
68+
}
69+
for _, s := range seeds {
70+
f.Add(s)
71+
}
72+
73+
f.Fuzz(func(t *testing.T, s string) {
74+
got := EscapePkg(s)
75+
back, err := unescapePkg(got)
76+
require.NoError(t, err, "unescapePkg failed on EscapePkg(%q)=%q", s, got)
77+
require.Equal(t, s, back, "round-trip mismatch for %q (escaped=%q)", s, got)
78+
})
79+
}

0 commit comments

Comments
 (0)