Tooling for support-email log processing and triage#1670

Merged
jcschaff merged 5 commits into master from user-log-processing
May 4, 2026

@jcschaff
Member

Summary

Adds a Python pipeline that turns a directory of forwarded VCell support emails (.eml containing embedded JSON error reports) into structured CSVs for spreadsheet-driven bug triage. Found one real concurrency bug already during initial use — see #1669, a separate PR fixing the bug this tooling uncovered.

What's in the box

  • tools/parse_support_email.py — single Python script, batch-mode entry point. Emits four CSVs:
    • emails.csv — one row per email, with an incident_id that collapses resubmissions (users who hit "Send" repeatedly as their log grows produce multiple emails sharing a long log prefix; we hash the first 1024 chars and group by it).
    • traces.csv — one row per (email, stack-trace), with four cluster keys at decreasing granularity (top-5, top-2, top-1, innermost-VCell), plus parsed file/class/method, plus the full trace text for diagnosis.
    • file_pivot.csv — primary triage view, one row per innermost-VCell source file, with a code-presence check that greps vcell-*/src/main/java to confirm whether the file/method still exists in the live tree.
    • version_pivot.csv — secondary, by signature.
  • docs/support_emails/README.md — full pipeline documentation: input format, output schemas, design decisions, learnings (notably "absence of a bug in a newer version is not evidence of a fix at small N").
  • docs/support_emails/.gitignore — excludes both raw .eml/.json inputs and the derived analysis CSVs (which carry the same user-submitted log content).
  • .claude/commands/analyze-support-email.md — slash command for single-file analysis (preserved from initial WIP).
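
The resubmission collapse described for emails.csv can be sketched as follows. This is a minimal illustration, not the script's actual code: the 1024-character prefix comes from the description above, while the function names, the choice of SHA-1, and the 12-character truncation are assumptions made for the example.

```python
import hashlib
from collections import defaultdict

PREFIX_CHARS = 1024  # prefix length named in the PR description


def incident_id(log_text: str, prefix_chars: int = PREFIX_CHARS) -> str:
    """Hash the leading characters of a log so resubmissions share an ID."""
    prefix = log_text[:prefix_chars]
    # SHA-1 and the 12-character truncation are illustrative choices only.
    digest = hashlib.sha1(prefix.encode("utf-8", errors="replace"))
    return digest.hexdigest()[:12]


def group_by_incident(logs: dict[str, str]) -> dict[str, list[str]]:
    """Map incident_id -> sorted list of email names that share it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, text in logs.items():
        groups[incident_id(text)].append(name)
    return {key: sorted(names) for key, names in groups.items()}
```

One caveat of any prefix-hash scheme like this: two submissions only collapse if the earlier log is already at least as long as the prefix, since a shorter log that later grows would hash differently.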

Design decisions

  • Brace-balanced JSON extraction with string-state tracking. A naïve depth-counter fails when log content has literal {/}.
  • Multiple cluster keys per trace. Top-5 is too strict (lambda numbering / AWT chains differ across submissions); innermost-VCell is the most useful headline. All four keys stored so re-clustering needs no re-parsing.
  • Cluster by deepest cause, not outer wrapper. Caused by: chains are walked to find the actual throw site; otherwise traces wrapped by a catch-all (e.g. populateDataSymbolMetadata wrapping all math-mapping failures) all collapse into the wrapper file. The clustering fix landed mid-PR after this exact bug was caught — see commit Cluster traces by deepest cause, not outer wrapper.
  • Code-presence verification. With small N and no systematic fixing culture, "absent in newer version" isn't evidence of a fix. Whether the file+method still exists in the current tree is a binary fact you can verify in seconds, and is a more reliable filter for "this cluster is worth investigating now."
  • No raw email content in VCS. Reports include user identifiers, file paths, and model names. Both raw .eml and derived CSVs are gitignored.
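
To illustrate the first design decision, here is a hedged sketch of brace-balanced extraction that tracks string-literal state. The function name and return shape are assumptions for the example and may not match the script. The idea is that braces inside a JSON string (for instance log text containing a literal { or }) must not change the depth counter.

```python
import json


def extract_json_objects(text: str) -> list[dict]:
    """Return every top-level JSON object embedded in free-form text."""
    objects = []
    depth = 0          # current brace nesting depth
    start = None       # index of the '{' that opened the current object
    in_string = False  # True while inside a JSON string literal
    escaped = False    # True if the previous character was a backslash
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue  # braces inside strings are ignored
        if ch == '"' and depth > 0:
            # Only track strings inside an object; quotes in the
            # surrounding email prose are irrelevant.
            in_string = True
        elif ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                try:
                    objects.append(json.loads(text[start:i + 1]))
                except json.JSONDecodeError:
                    pass  # balanced braces, but not valid JSON
    return objects
```

A plain depth counter applied to `{"log": "error at {foo} and } stray"}` would close the object at the stray brace inside the string and hand a truncated slice to the JSON parser; the string-state flag avoids that.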

Validation

Used on a 75-email corpus (4 software versions, all build_47 dominant). After resubmission collapse: 33 incidents across 50 distinct innermost-VCell source files. Top result was a thread-safety bug in InternalUnitDefinition.round (52 traces, 7 incidents) that's fixed in the companion PR #1669. The tool's value is mostly in producing that kind of focused, evidence-backed bug list from messy raw reports.

Workflow

# Drop new emails into docs/support_emails/, then:
python3 tools/parse_support_email.py --batch docs/support_emails \
  --out-emails        docs/support_emails/emails.csv \
  --out-traces        docs/support_emails/traces.csv \
  --out-file-pivot    docs/support_emails/file_pivot.csv \
  --out-version-pivot docs/support_emails/version_pivot.csv

Open file_pivot.csv first.

Test plan

  • Parser handles 75-email corpus without parse failures
  • Code-presence check correctly flags UserCancelException.java as method_missing (a stale code path the old wrapper-clustering hid)
  • Sanity-check the README's documented columns match the output (skim review)
  • Try dropping a fresh email after merge and re-running
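
The code-presence check exercised in the test plan can be approximated with a short sketch. The vcell-*/src/main/java glob and the method_missing status come from this PR; the function name, the other two status strings, and the regex used to spot a method are assumptions for illustration.

```python
import re
from pathlib import Path


def code_presence(repo_root: Path, file_name: str, method: str) -> str:
    """Classify a (file, method) pair against the live source tree."""
    # Collect every file with the given name under vcell-*/src/main/java.
    candidates = [
        path
        for src_root in repo_root.glob("vcell-*/src/main/java")
        for path in src_root.rglob(file_name)
    ]
    if not candidates:
        return "file_missing"  # hypothetical status name
    # Crude textual check: the method name followed by an opening parenthesis.
    pattern = re.compile(r"\b" + re.escape(method) + r"\s*\(")
    for path in candidates:
        if pattern.search(path.read_text(encoding="utf-8", errors="replace")):
            return "present"  # hypothetical status name
    return "method_missing"
```

This is a textual check only. It cannot tell a declaration from a call site or a comment, so it works as a quick filter and does not prove that the code path is still live.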

Notes for review

  • The branch starts from a WIP commit (Jim's original parser). Squashing on merge is fine; the three follow-up commits are logically distinct.
  • Pure tooling/docs change; no Java touched. CI should be a no-op.

🤖 Generated with Claude Code

jcschaff and others added 5 commits April 29, 2026 14:29
…ation

Adds batch processing over a directory of forwarded support emails and
emits four CSVs for spreadsheet-driven triage:

- emails.csv      one row per email, with a corpus-level incident_id
                  that collapses resubmissions sharing a 1-KB log prefix
- traces.csv      one row per (email, stack-trace), with four cluster
                  keys at decreasing granularity (top-5, top-2, top-1,
                  innermost-VCell), plus parsed file/class/method
- file_pivot.csv  primary triage view, one row per innermost-VCell
                  source file, with a code-presence check that confirms
                  whether the file and method still exist in the live
                  codebase
- version_pivot.csv  secondary view, by signature, pivoted by version

Other notable changes:
- Brace-balanced JSON extraction now tracks string-literal state so
  literal '{' / '}' inside log content don't throw off the depth counter.
- Stack-trace classification uses the innermost 'Caused by:' class as
  the canonical exception, so RuntimeException wrappers don't fragment
  NPE clusters.
- Frame normalization strips line numbers, lambda-suffix digits, and
  synthetic accessors before signing, so trivial refactors don't
  fragment a cluster.
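
A rough sketch of the frame normalization described above, under the assumption that frames look like standard JVM output ("at pkg.Class.method(File.java:123)"). The regexes, function names, and the hashing step are illustrative and not taken from the script.

```python
import hashlib
import re
from typing import Optional

# Matches "at <qualified.method>(<source>)" in a Java stack trace line.
_FRAME = re.compile(r"^\s*at\s+(?P<method>[\w.$<>/]+)\((?P<source>[^)]*)\)")


def normalize_frame(line: str) -> Optional[str]:
    """Strip volatile details from one frame; return None for non-frame lines."""
    match = _FRAME.match(line)
    if match is None:
        return None
    method = match.group("method")
    # lambda$run$3 -> lambda$run (compiler-assigned counter)
    method = re.sub(r"(lambda\$\w+)\$\d+", r"\1", method)
    # access$000 -> access$ (synthetic accessor numbering)
    method = re.sub(r"access\$\d+", "access$", method)
    # Foo.java:123 -> Foo.java
    source = re.sub(r":\d+$", "", match.group("source"))
    return f"{method}({source})"


def signature(trace_lines: list[str], depth: int) -> str:
    """Hash the top `depth` normalized frames into a short cluster key."""
    frames = [f for f in map(normalize_frame, trace_lines) if f is not None]
    joined = "\n".join(frames[:depth])
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()[:12]
```

Calling signature() with depths 5, 2, and 1 would give keys at decreasing granularity, in the spirit of the cluster keys stored in traces.csv.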

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds docs/support_emails/README.md describing the input format, output
schemas, design decisions (especially: code-presence verification beats
version-diff inference at this corpus size), and learnings (notably:
absence of a bug in a newer version is not evidence of a fix).

Untracks the existing sample emails and JSON so the repo stops carrying
user-submitted log content. The new .gitignore excludes both the raw
.eml/.json inputs and the derived analysis CSVs (which contain the same
log content). Drop new emails into docs/support_emails/ locally and
re-run the parser; nothing in that directory should land in the repo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The trace clustering logic walked frames top-down across the entire
trace text and picked the first VCell frame found. That frame is the
catch/wrap site, not the throw site — so traces wrapped by a
catch-all (e.g., the metadata-resolver wrapping all math-mapping
failures as "Failed to determine metadata for data symbols") all
clustered into the wrapper file rather than the actual buggy file.

Add deepest_cause_frames() that returns frames from the LAST
'Caused by:' chain only, and route signature/cluster computation
through it. The full_trace column is unchanged (all data still
present); only cluster keys and the top_frames preview move.
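
The behaviour described for deepest_cause_frames() might look roughly like the sketch below. Only the function name comes from the commit message; its signature, the helper innermost_vcell_frame, and the package prefixes used to recognise VCell frames are assumptions for the example.

```python
from typing import Optional

# Assumed package prefixes for VCell code; adjust to the real tree.
VCELL_PREFIXES = ("at cbit.", "at org.vcell.")


def deepest_cause_frames(trace_text: str) -> list[str]:
    """Return the frames that follow the LAST 'Caused by:' header.

    If the trace has no 'Caused by:' chain, all frames are returned.
    """
    lines = trace_text.splitlines()
    start = 0
    for index, line in enumerate(lines):
        if line.lstrip().startswith("Caused by:"):
            start = index + 1  # keep overwriting: the last header wins
    return [ln.strip() for ln in lines[start:] if ln.lstrip().startswith("at ")]


def innermost_vcell_frame(frames: list[str]) -> Optional[str]:
    """First frame (closest to the throw site) that belongs to VCell code."""
    for frame in frames:
        if frame.startswith(VCELL_PREFIXES):
            return frame
    return None
```

Because frames in the wrapper section never enter the returned list, a catch-all wrapper can no longer win the innermost-VCell key unless the trace has no cause chain at all.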

Effect on the current corpus:
- SimulationWorkspaceModelInfo.java drops out of the top-15 (it was
  inc=11 at the top before; it's just a wrapper).
- InternalUnitDefinition.java surfaces with inc=7 (52 round() traces
  that were previously hidden under the wrapper).
- StochMathMapping.java's NumberFormatExceptions correctly migrate
  to InternalUnitDefinition; the cluster now shows only NPEs.
- A new method_missing finding (UserCancelException.java) appears
  that the wrapper-based clustering hid.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two improvements that materially de-noise the file_pivot.csv triage view:

1. Thread.dumpStack() output is no longer counted as an exception.
   Java's Thread.dumpStack constructs a synthetic Exception ('Stack
   trace') purely to print the current call stack; the program then
   continues. VCellThreadChecker uses this to log threading-hygiene
   advisories. The parser now detects the pattern (header
   'java.lang.Exception: Stack trace' followed by a Thread.dumpStack
   frame) and skips those traces. In the current corpus this removes
   VCellThreadChecker.java from the top-15 (8 false-positive
   incidents).

2. Each file_pivot row gets recent_commit_count and recent_commits
   columns showing git activity on the source files since 1 year
   before the cluster's earliest incident date. The 1-year lookback
   is deliberate: fixes often ship before the users on older builds
   start reporting, so 'since first_seen' would miss them. The
   MathSymbolMapping cluster, for example, now correctly surfaces
   the existing 'NullPointerException in TreeMap because of
   concurrent modification' fix from March 2025, even though all 14
   corpus traces of it are dated November 2025+.

   ExceptionHandler.java similarly now flags 'Catch Server Rejected
   Save Exception' as a candidate already-applied fix worth checking
   before investigating its 6-incident cluster.
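
The one-year lookback in item 2 can be sketched as below. The function names and the output format string are assumptions; the git options used (-C, --since, --format, --date, and the -- path separator) are standard git log options. The real script may gather this differently.

```python
import subprocess
from datetime import date, timedelta


def lookback_start(first_seen: date, days: int = 365) -> date:
    """Start of the commit window: one year before the earliest incident."""
    return first_seen - timedelta(days=days)


def recent_commits(repo_dir: str, paths: list[str], first_seen: date) -> list[str]:
    """List commits touching `paths` since the lookback start (one per line)."""
    since = lookback_start(first_seen).isoformat()
    result = subprocess.run(
        [
            "git", "-C", repo_dir, "log",
            f"--since={since}",
            "--format=%h %ad %s",  # short hash, author date, subject
            "--date=short",
            "--", *paths,
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]
```

With a first incident in November 2025, the window opens in November 2024, so a fix committed in March 2025 falls inside it. A window starting at first_seen would have excluded that commit.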

README updated for the new columns and two new learnings (L7, L8).
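
For item 1 of this commit, a minimal sketch of the Thread.dumpStack() filter is shown here. The header text and the presence of a Thread.dumpStack frame are the pattern named in the commit message; the function name and the exact matching rules are assumptions.

```python
DUMP_STACK_HEADER = "java.lang.Exception: Stack trace"


def is_dump_stack_trace(trace_text: str) -> bool:
    """True if the trace is Thread.dumpStack() output, not a real exception."""
    lines = [ln.strip() for ln in trace_text.splitlines() if ln.strip()]
    if len(lines) < 2:
        return False
    header_ok = lines[0].startswith(DUMP_STACK_HEADER)
    # The substring test also accepts module-qualified frames such as
    # "at java.base/java.lang.Thread.dumpStack(Thread.java:...)".
    first_frame_ok = (
        lines[1].startswith("at ")
        and "java.lang.Thread.dumpStack(" in lines[1]
    )
    return header_ok and first_frame_ok
```

Traces for which this returns True would be skipped before clustering, so advisory stack dumps do not count as incidents.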

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jcschaff jcschaff merged commit b24ed5c into master May 4, 2026
13 checks passed
@jcschaff jcschaff deleted the user-log-processing branch May 4, 2026 17:26