Tooling for support-email log processing and triage#1670
Merged
Adds batch processing over a directory of forwarded support emails and
emits four CSVs for spreadsheet-driven triage:
- emails.csv: one row per email, with a corpus-level incident_id
  that collapses resubmissions sharing a 1-KB log prefix
- traces.csv: one row per (email, stack-trace), with four cluster
  keys at decreasing granularity (top-5, top-2, top-1,
  innermost-VCell), plus parsed file/class/method
- file_pivot.csv: primary triage view, one row per innermost-VCell
  source file, with a code-presence check that confirms
  whether the file and method still exist in the live
  codebase
- version_pivot.csv: secondary view, by signature, pivoted by version
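The resubmission collapse behind incident_id can be sketched as follows. This is a minimal illustration, not the code in tools/parse_support_email.py: the SHA-1 digest, the 12-character truncation, and the function names are assumptions; only the 1024-character prefix comes from the PR.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict

PREFIX_CHARS = 1024  # the PR groups emails on the first 1024 characters of the log


def incident_id(log_text: str) -> str:
    """Hash the leading log prefix (hypothetical helper; SHA-1 is an assumption)."""
    prefix = log_text[:PREFIX_CHARS]
    return hashlib.sha1(prefix.encode("utf-8")).hexdigest()[:12]


def group_by_incident(logs: dict[str, str]) -> dict[str, list[str]]:
    """Map each incident id to the email names whose logs share that prefix."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, text in logs.items():
        groups[incident_id(text)].append(name)
    return dict(groups)
```

A log that has only grown since the previous "Send" keeps the same id, as long as it was already at least 1024 characters long the first time; shorter logs would hash differently after growing, which this sketch does not try to handle.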
Other notable changes:
- Brace-balanced JSON extraction now tracks string-literal state so
  literal '{' / '}' inside log content don't throw off the depth counter.
- Stack-trace classification uses the innermost 'Caused by:' class as
  the canonical exception, so RuntimeException wrappers don't fragment
  NPE clusters.
- Frame normalization strips line numbers, lambda-suffix digits, and
  synthetic accessors before signing, so trivial refactors don't
  fragment a cluster.
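A brace-balanced scan that tracks string-literal state might look like the sketch below. The function name and the choice to validate each candidate with json.loads are assumptions for illustration; the parser in this PR may differ in detail.

```python
from __future__ import annotations

import json


def extract_json_objects(text: str) -> list[dict]:
    """Return every top-level {...} span in text that parses as JSON."""
    objects = []
    depth = 0
    start = 0
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if in_string:
            # Inside a JSON string: braces are content, not structure.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"' and depth > 0:
            in_string = True  # quotes in surrounding prose (depth 0) are ignored
        elif ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                try:
                    objects.append(json.loads(text[start:i + 1]))
                except json.JSONDecodeError:
                    pass  # balanced braces that are not JSON are skipped
    return objects
```

Without the in_string flag, a log line such as "map = {a=1" stored inside a JSON string would leave the depth counter unbalanced and swallow the rest of the email.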
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
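As an illustration of the frame normalization described in this commit, the sketch below strips line numbers and lambda-suffix digits and drops synthetic accessor frames before hashing. The regular expressions, the decision to drop access$NNN frames entirely, the SHA-1 digest, and the org.example class names in the usage note are all assumptions, not the PR's actual rules.

```python
from __future__ import annotations

import hashlib
import re
from typing import Optional

# Matches "at pkg.Class.method(File.java:123)", optionally with a module prefix.
FRAME_RE = re.compile(r"^\s*at\s+([\w.$<>/]+)\((.*?)\)\s*$")


def normalize_frame(line: str) -> Optional[str]:
    """Return a refactor-stable form of one stack frame, or None to drop it."""
    m = FRAME_RE.match(line)
    if m is None:
        return None
    method, location = m.group(1), m.group(2)
    if re.search(r"\.access\$\d+$", method):
        return None  # synthetic accessor generated by javac
    method = re.sub(r"(lambda\$\w+)\$\d+", r"\1", method)  # lambda$run$3 -> lambda$run
    location = re.sub(r":\d+$", "", location)  # Foo.java:120 -> Foo.java
    return f"{method}({location})"


def signature(frame_lines: list[str], depth: int) -> str:
    """Hash the first `depth` normalized frames into a short cluster key."""
    frames = [f for f in map(normalize_frame, frame_lines) if f is not None]
    return hashlib.sha1("\n".join(frames[:depth]).encode("utf-8")).hexdigest()[:12]
```

With this normalization, "at org.example.Foo.lambda$run$3(Foo.java:120)" and "at org.example.Foo.lambda$run$7(Foo.java:131)" produce the same key, so a trace taken before and after an unrelated edit to Foo.java stays in one cluster.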
Adds docs/support_emails/README.md describing the input format, output schemas, design decisions (especially: code-presence verification beats version-diff inference at this corpus size), and learnings (notably: absence of a bug in a newer version is not evidence of a fix).
Untracks the existing sample emails and JSON so the repo stops carrying user-submitted log content. The new .gitignore excludes both the raw .eml/.json inputs and the derived analysis CSVs (which contain the same log content).
Drop new emails into docs/support_emails/ locally and re-run the parser; nothing in that directory should land in the repo.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
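The code-presence verification mentioned above could be approximated as in the sketch below. It is only an illustration under stated assumptions: the function name, the single source_root argument, and the "present" and "file_missing" labels are hypothetical (only method_missing appears in this PR), and the method check is a plain text search, not a Java parse.

```python
from __future__ import annotations

import re
from pathlib import Path


def code_presence(source_root: Path, file_name: str, method: str) -> str:
    """Classify whether file_name, and a call or definition of method, still exist."""
    matches = list(source_root.rglob(file_name))
    if not matches:
        return "file_missing"  # hypothetical label
    # Look for the method name followed by an opening parenthesis.
    pattern = re.compile(r"\b" + re.escape(method) + r"\s*\(")
    for path in matches:
        if pattern.search(path.read_text(encoding="utf-8", errors="replace")):
            return "present"  # hypothetical label
    return "method_missing"
```

A check like this answers "does the code a trace points at still exist?" directly from the live tree, which is the property the README argues is more reliable than inferring fixes from version differences in a small corpus. Constructors reported as <init> would need separate handling.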
The trace clustering logic walked frames top-down across the entire trace text and picked the first VCell frame found. That frame is the catch/wrap site, not the throw site, so traces wrapped by a catch-all (e.g., the metadata-resolver wrapping all math-mapping failures as "Failed to determine metadata for data symbols") all clustered into the wrapper file rather than the actual buggy file.
Add deepest_cause_frames() that returns frames from the LAST 'Caused by:' chain only, and route signature/cluster computation through it. The full_trace column is unchanged (all data still present); only cluster keys and the top_frames preview move.
Effect on the current corpus:
- SimulationWorkspaceModelInfo.java drops out of the top-15 (it was inc=11 at the top before; it's just a wrapper).
- InternalUnitDefinition.java surfaces with inc=7 (52 round() traces that were previously hidden under the wrapper).
- StochMathMapping.java's NumberFormatExceptions correctly migrate to InternalUnitDefinition; the cluster now shows only NPEs.
- A new method_missing finding (UserCancelException.java) appears that the wrapper-based clustering hid.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
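The idea behind deepest_cause_frames() can be sketched as below. The function name comes from this commit, but the body is an assumption about how such a helper might work, and first_frame_in_package with its package-prefix argument is a hypothetical companion.

```python
from __future__ import annotations

from typing import Optional


def deepest_cause_frames(trace_text: str) -> list[str]:
    """Return the 'at ...' frames that follow the LAST 'Caused by:' line.

    If the trace has no 'Caused by:' line, all of its frames are returned.
    """
    lines = trace_text.splitlines()
    last_cause = -1
    for i, line in enumerate(lines):
        if line.lstrip().startswith("Caused by:"):
            last_cause = i
    return [ln.strip() for ln in lines[last_cause + 1:] if ln.lstrip().startswith("at ")]


def first_frame_in_package(frames: list[str], package_prefix: str) -> Optional[str]:
    """Pick the first frame whose class lives under package_prefix (hypothetical helper)."""
    return next((f for f in frames if f.startswith("at " + package_prefix)), None)
```

Scanning only the frames after the last 'Caused by:' means the first application frame found is near the throw site, so a catch-all wrapper higher in the trace no longer decides the cluster.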
Two improvements that materially de-noise the file_pivot.csv triage view:
1. Thread.dumpStack() output is no longer counted as an exception.
Java's Thread.dumpStack constructs a synthetic Exception ('Stack
trace') purely to print the current call stack and the program
continues. VCellThreadChecker uses this to log threading-hygiene
advisories. The parser now detects the pattern (header
'java.lang.Exception: Stack trace' followed by a Thread.dumpStack
frame) and skips those traces. In the current corpus this removes
VCellThreadChecker.java from the top-15 (8 false-positive
incidents).
2. Each file_pivot row gets recent_commit_count and recent_commits
columns showing git activity on the source file since 1 year
before the cluster's earliest incident date. The 1-year lookback
is deliberate: fixes often ship before the users on older builds
start reporting, so 'since first_seen' would miss them. The
MathSymbolMapping cluster, for example, now correctly surfaces
the existing 'NullPointerException in TreeMap because of
concurrent modification' fix from March 2025, even though all 14
corpus traces of it are dated November 2025+.
ExceptionHandler.java similarly now flags 'Catch Server Rejected
Save Exception' as a candidate already-applied fix worth checking
before investigating its 6-incident cluster.
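The Thread.dumpStack() filter in item 1 can be illustrated with a small predicate. This is a sketch, assuming the trace text starts at its header line; the function name is hypothetical and the actual detection in the parser may be stricter or looser.

```python
DUMP_STACK_HEADER = "java.lang.Exception: Stack trace"


def is_dump_stack(trace_text: str) -> bool:
    """True if the trace looks like Thread.dumpStack() output, not a real exception."""
    lines = [ln.strip() for ln in trace_text.splitlines() if ln.strip()]
    if len(lines) < 2:
        return False
    # endswith() tolerates a logger prefix (timestamp, level) before the header.
    if not lines[0].endswith(DUMP_STACK_HEADER):
        return False
    # The first frame must be Thread.dumpStack itself; newer JDKs prefix it
    # with the module name, e.g. "java.base/java.lang.Thread.dumpStack(".
    return lines[1].startswith("at ") and "java.lang.Thread.dumpStack(" in lines[1]
```

Requiring both the header and the dumpStack frame keeps genuine java.lang.Exception traces, which share the class name but not the first frame, in the corpus.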
README updated for the new columns and two new learnings (L7, L8).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
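The 1-year lookback in item 2 amounts to shifting the git log window back from the earliest incident date. A minimal sketch follows, assuming a fixed 365-day window and hypothetical function names; the git options used (--since, --oneline, and a path after --) are standard, but how the parser actually invokes git is not shown in this PR.

```python
from __future__ import annotations

import subprocess
from datetime import date, timedelta

LOOKBACK = timedelta(days=365)  # "1 year" approximated as 365 days (assumption)


def lookback_start(first_seen: date) -> date:
    """Start of the commit window: one year before the earliest incident."""
    return first_seen - LOOKBACK


def git_log_command(path: str, first_seen: date) -> list[str]:
    """Build the git log invocation for one source file."""
    since = lookback_start(first_seen).isoformat()
    return ["git", "log", f"--since={since}", "--oneline", "--", path]


def recent_commits(repo_dir: str, path: str, first_seen: date) -> list[str]:
    """Run git log in repo_dir and return one line per commit in the window."""
    result = subprocess.run(
        git_log_command(path, first_seen),
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

For a cluster first seen in November 2025, the window opens in November 2024, so a fix committed in March 2025 is listed, whereas a window starting at first_seen would have excluded it.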
Summary
Adds a Python pipeline that turns a directory of forwarded VCell support emails (.eml containing embedded JSON error reports) into structured CSVs for spreadsheet-driven bug triage. Found one real concurrency bug already during initial use; see #1669, a separate PR fixing the bug this tooling uncovered.
What's in the box
- tools/parse_support_email.py: single Python script, batch-mode entry point. Emits four CSVs:
  - emails.csv: one row per email, with an incident_id that collapses resubmissions (users who hit "Send" repeatedly as their log grows produce multiple emails sharing a long log prefix; we hash the first 1024 chars and group by it).
  - traces.csv: one row per (email, stack-trace), with four cluster keys at decreasing granularity (top-5, top-2, top-1, innermost-VCell), plus parsed file/class/method, plus the full trace text for diagnosis.
  - file_pivot.csv: primary triage view, one row per innermost-VCell source file, with a code-presence check that greps vcell-*/src/main/java to confirm whether the file/method still exists in the live tree.
  - version_pivot.csv: secondary, by signature.
- docs/support_emails/README.md: full pipeline documentation: input format, output schemas, design decisions, learnings (notably "absence of a bug in a newer version is not evidence of a fix at small N").
- docs/support_emails/.gitignore: excludes both raw .eml/.json inputs and the derived analysis CSVs (which carry the same user-submitted log content).
- .claude/commands/analyze-support-email.md: slash command for single-file analysis (preserved from initial WIP).
Design decisions
- … {/}.
- Caused by: chains are walked to find the actual throw site; otherwise traces wrapped by a catch-all (e.g. populateDataSymbolMetadata wrapping all math-mapping failures) all collapse into the wrapper file. The clustering fix landed mid-PR after this exact bug was caught; see commit "Cluster traces by deepest cause, not outer wrapper".
- … .eml and derived CSVs are gitignored.
Validation it works
Used on a 75-email corpus (4 software versions, all build_47 dominant). After resubmission collapse: 33 incidents across 50 distinct innermost-VCell source files. Top result was a thread-safety bug in InternalUnitDefinition.round (52 traces, 7 incidents) that's fixed in the companion PR #1669. The tool's value is mostly in producing that kind of focused, evidence-backed bug list from messy raw reports.
Workflow
    # Drop new emails into docs/support_emails/, then:
    python3 tools/parse_support_email.py --batch docs/support_emails \
      --out-emails docs/support_emails/emails.csv \
      --out-traces docs/support_emails/traces.csv \
      --out-file-pivot docs/support_emails/file_pivot.csv \
      --out-version-pivot docs/support_emails/version_pivot.csv
Open file_pivot.csv first.
Test plan
- … UserCancelException.java as method_missing (a stale code path the old wrapper-clustering hid)
Notes for review
- … WIP commit (Jim's original parser). Squashing on merge is fine; the three follow-up commits are logically distinct.
🤖 Generated with Claude Code