EmitClassTableJob: surface silently dropped class buckets by sj-i · Pull Request #646 · reliforp/reli-prof

sj-i · 2026-04-25T13:36:36Z

Summary

Diagnostic-only PR to chase down a reported asymmetry: two
ObjectContext rows hanging off the same parent in rmem:explore,
one shows the ⇒ class → ClassDefinitionContext pseudo-edge that
lets users jump to the class definition, the other doesn't, even
though both clearly have a class.

RmemModel::findClassDef() is fed by classDefIndex, which is
built from the children of the class_table root in the
substrate. If a particular class doesn't end up under
class_table in the dump, the index misses, the pseudo-edge
isn't added, and the user sees the asymmetry.

EmitClassTableJob wraps every bucket in a per-bucket
try { … } catch (\Throwable) {} for resilience — a single
unreadable class shouldn't abort the whole walk. The unfortunate
side-effect is that whatever class fails to emit disappears
silently. Make the failure visible without changing the
resilience:

Reserve the class_table root id up front via
assignNodeId() and defer the actual emit to the end of the
walk, so the root's attributes pick up counters populated
during bucket iteration.
DefinedClassesContext gains recordDroppedException() and
recordSkippedDedup() and exposes the totals as
#dropped_exceptions / #skipped_dedup attributes alongside
the existing #count. Both surface in the rmem:explore detail
pane — users can tell at a glance whether their snapshot is
missing classes.
The catch arm also samples the first 8 dropped buckets
(class name when readable + exception class + message) and
prints a STDERR warning at the end of analyze. Enough to
point at which class is failing and what is throwing,
without threading a logger through the collector.

How to use

After re-running inspector:memory:analyze against the same
target with this branch:

Watch stderr — if the WARNING: EmitClassTableJob silently dropped class buckets line shows up, the listed classes are the ones that won't appear under class_table in the explore TUI.
Open the resulting .rmem in rmem:explore, navigate to the class_table root, and check the detail pane: #dropped_exceptions / #skipped_dedup attribute values give a count even when stderr was missed.
Compare the dropped class names with the ObjectContext rows that are missing the ⇒ class pseudo-edge — if they line up, this PR explains the bug; if they don't, the asymmetry has a different cause and we have a narrower hypothesis to chase.

Not a fix

The catch is still silent at the resilience layer (a single
unreadable class still doesn't crash analyze) — this PR only
makes the failure observable. Once the user reports back what
the warnings say, a follow-up PR can target the actual root
cause (e.g. fixing the FFI read that's throwing, or making
findClassDef resilient to gaps).

Test plan

phpcs (CI-equivalent: --standard=./phpcs.xml -n -q ./src ./tests) — exit 0
psalm.phar --no-progress --show-info=false — no errors
phpunit tests/Rmem tests/Lib/PhpProcessReader --exclude-group target-version — 192 tests, 847 assertions, only the pre-existing 2 env-dependent failures (mod_php, FrankenPHP)
Manual: run analyze against a real target with classes likely to fail (e.g. heavy classes / closures / classes with unusual ZE state) and confirm stderr surfaces them.

https://claude.ai/code/session_01XXo4QACibyKMdC8CvbBijw

Generated by Claude Code

The class_table walk wraps every bucket in a per-bucket `try { … } catch (\Throwable) {}` so a single unreadable class doesn't kill the whole walk. That's the right design, but it also means a class that fails to emit silently disappears from the class_table tree — `RmemModel::findClassDef()` then returns null for that class, the `⇒ class` pseudo-edge that wires ObjectContext rows to their ClassDefinitionContext on the explore TUI doesn't fire, and the user sees an apparently arbitrary asymmetry: two ObjectContext children of the same parent, one resolves its class def via the pseudo-edge and the other doesn't. Make the failure visible without changing the resilience: - Reserve the class_table root id up front via `assignNodeId()`, defer the actual emit to the end of the walk so attributes pick up counters populated during the loop. - DefinedClassesContext gains `recordDroppedException()` / `recordSkippedDedup()` setters, and exposes the totals as `#dropped_exceptions` / `#skipped_dedup` attributes (alongside the existing `#count`). Both are visible from rmem:explore's detail pane on the class_table root, so users can tell at a glance whether their snapshot is missing classes. - The catch arm also samples the first 8 dropped buckets (class name when readable + exception class + message) and prints a STDERR warning at the end of analyze. That's enough to point at which class is failing and what's going wrong without threading a logger through the collector. This is a diagnostic patch, not a fix — the `try/catch` swallowing is still resilient as before. The intent is to narrow down whether the asymmetry is genuinely "bucket throws on read" versus "bucket isn't where we think it is" before proposing a real fix. https://claude.ai/code/session_01XXo4QACibyKMdC8CvbBijw

Followup to the previous diagnostic: a user reported that the class_table root in their freshly-analyzed rmem shows up as type=ClassDefinitionContext (with #is_internal=1, a class-level attribute) instead of DefinedClassesContext. Isolated repros of the assignNodeId-then-late-emit pattern produce the right type; something in the real path is over-writing node #N's type after late-emit. Add a STDERR line that logs both the id reserved up front via `assignNodeId()` and the id memoised on `$defined_classes_context` after the late `emitNode()` call. If the two match, the late-emit reached the analyzer + sink — the bug is then upstream of us (another emit on the same id). If they don't match, the late emit short-circuited via the "fully emitted, just emit a reference edge" branch and the root genuinely never landed. https://claude.ai/code/session_01XXo4QACibyKMdC8CvbBijw

… late-emit The previous diagnostic confirmed that the reserved id and the final memoised id match, but the user's rmem still has node #(reserved+1) showing as `class_table → ClassDefinitionContext` in the explore Roots view, which is impossible if late-emit actually wrote the tree edge from the sentinel to #reserved. The remaining failure modes both flow through the same branch in analyzeContext: if ($existing_node_id !== null && $existing_node_id >= 0) { $sink->emitReference(...); // reference edge, NOT tree edge return; } i.e. if `defined_classes_context`'s memo gets flipped from `-(reserved+1)` to a positive number sometime between the `assignNodeId()` reservation and the late `emitNode()` call, late-emit short-circuits to a reference edge and the tree edge from `-1 → reserved` is never written. The substrate then doesn't see #reserved as a root, which would explain why the user's first class_def (at #reserved+1) appears with the `class_table` link instead. Snapshot the memo right before the late `emitNode()` so we can tell which case we're in: - memo == `-(reserved+1)` (negative sentinel): late-emit took the normal path; the bug is elsewhere. - memo == positive: someone flipped it during the loop, late- emit took the `emitReference` branch, and we need to find who's mutating the context's memo. https://claude.ai/code/session_01XXo4QACibyKMdC8CvbBijw

coveralls · 2026-04-25T21:04:57Z

Coverage Report for CI Build 24940141295

Warning

Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes.
Quick fix: rebase this PR. Learn more →

Coverage decreased (-23.2%) to 59.071%

Details

Coverage decreased (-23.2%) from the base build.
Patch coverage: 40 uncovered changes across 2 files (0 of 40 lines covered, 0.0%).
9158 coverage regressions across 282 files.

Uncovered Changes

File	Changed	Covered	%
src/Lib/PhpProcessReader/PhpMemoryReader/Collector/Job/EmitClassTableJob.php	30	0	0.0%
src/Lib/PhpProcessReader/PhpMemoryReader/ReferenceContext/DefinedClassesContext.php	10	0	0.0%

Coverage Regressions

9158 previously-covered lines in 282 files lost coverage.

Top 10 Files by Coverage Loss	Lines Losing Coverage	Coverage
src/Inspector/MemoryDump/MemoryDumper.php	393	0.44%
src/Inspector/Output/MemoryOutput/Report/Substrate/FfiCsrGraphSubstrate.php	393	49.41%
src/Inspector/Watch/VariableReader.php	389	9.41%
src/Lib/PhpProcessReader/PhpMemoryReader/MetadataPeekWalker.php	322	0.0%
src/Lib/PhpProcessReader/PhpMemoryReader/Collector/Job/PdoHelper.php	245	0.0%
src/Lib/PhpInternals/Types/Zend/ZendExecuteData.php	236	0.0%
src/Lib/PhpProcessReader/PhpMemoryReader/MemoryLocationsCollector.php	221	0.77%
src/Lib/PhpProcessReader/PhpMemoryReader/Collector/Job/EmitObjectJob.php	218	0.0%
src/Lib/PhpProcessReader/PhpMemoryReader/Collector/Job/EmitClassTableJob.php	213	0.0%
src/Inspector/Output/MemoryOutput/Report/BinaryReportDataProvider.php	192	45.29%

Coverage Stats


Relevant Lines:	37787
Covered Lines:	22321
Line Coverage:	59.07%
Coverage Strength:	18.36 hits per line

💛 - Coveralls

BinaryMemoryOutput.buildCsrSections writes per-node arrays (treeParents, tree CSR rowptr/colidx/linknames, etc.) keyed by node_id directly — relying on the implicit assumption that csr_idx == node_id at load time, with the synthetic -1 sentinel at csr_idx == nodeCount. FfiCsrGraphSubstrate.loadFromBinary built indexToNodeFfi by iterating $all_node_ids in INSERTION order, which only happened to satisfy that contract when nodes were emitted strictly in id order. EmitClassTableJob's diagnostic patch (147b451) reserves a node_id for the class_table root via assignNodeId and defers the actual emit until after the bucket walk, so dropped/skipped counters land in the root's attributes. That writes rows out of order: row N+1 (first ClassDefinitionContext) lands before row N (the late-emit DefinedClassesContext). all_node_ids consequently mapped node_id N to the LAST csr slot, shifting every per-node array index by one from row N onward — manifesting as an explore TUI where the class_table root appeared as a leaf parented under some ClassEntryContext.constants while a downstream ClassDefinitionContext mis-claimed the class_table root slot. Sort all_node_ids before assigning csr indices and re-append -1 afterwards so the sentinel keeps the last slot. Also drop the late-emit STDERR trace from EmitClassTableJob now that the csr-vs-node-id mismatch is understood and fixed; keep the silently-dropped-bucket warning, which is the user-actionable part.

The previous commit (d0f1077) fixes csr_idx assignment in FfiCsrGraphSubstrate.loadFromBinary. Caches written before that fix contain srcloc_refs computed against the broken tree topology — held_by node_ids point at whatever the shifted parent walk landed on (e.g. an unrelated ClassEntryContext for the class_table root), so reli:explore keeps surfacing nonsense source locations even after the substrate is fixed because the sidecar still validates by mtime/size/fingerprint (the rmem itself is unchanged). Bumping VERSION 2 → 3 forces recomputation on first open after upgrade.

When a class bucket walk throws (typically MemoryAddressNotInDumpException on opcache-cached vendor classes), the existing diagnostic prints the class name + exception kind but not the address that failed. Without the hex address it is hard to tell whether the missing memory is in opcache SHM (0x55…/0x56… /dev/zero VMA), in a library .data segment (0x7f… file-backed mmap), or somewhere else entirely — which decides whether --include-binary, the pagemap residency filter, or a different dump strategy is the lever to pull. Capture ce->address right after deref'ing the bucket so the catch can print it as `<class> @ 0x… : <ExceptionClass>: <message>`. Raise the sample cap from 8 → 32 so a real-world target with dozens of drops gets a useful spread instead of just the first eight.

claude added 3 commits April 25, 2026 13:36

claude added 3 commits April 25, 2026 21:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

EmitClassTableJob: surface silently dropped class buckets#646

EmitClassTableJob: surface silently dropped class buckets#646
sj-i wants to merge 6 commits into
0.12.xfrom
claude/class-table-drop-diag-pT8jW

sj-i commented Apr 25, 2026

Uh oh!

coveralls commented Apr 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

sj-i commented Apr 25, 2026

Summary

How to use

Not a fix

Test plan

Uh oh!

coveralls commented Apr 25, 2026

Coverage Report for CI Build 24940141295

Coverage decreased (-23.2%) to 59.071%

Details

Uncovered Changes

Coverage Regressions

Coverage Stats

💛 - Coveralls

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants