chore(gold-traces): AppMap gold-trace baselines + behavioral review workflow #2371
AppMap Behavioral Review — Gold-Trace Baseline (first-ever run)
Revisions:
Feature List
The skill's Feature List enumerates application-code features in the source diff.
None of the above produced runtime drift, because none of it is reachable application code.
Coverage Matrix
Every entry below is newly established at head (there was no baseline trace before).
| Feature | Covered by | Status |
|---|---|---|
| stats — aggregates event statistics over a directory | stats.spec.ts › analyzes a directory | ✅ |
| sequence-diagram — generates valid JSON diagram | sequenceDiagram.spec.ts › is valid | ✅ |
| openapi — builds OpenAPI doc from req/resp | openapi.spec.ts › handles valid and malformed HTTP server requests | ✅ |
| prune — shrinks AppMap via base64url filter | prune.spec.ts › correctly reduces the size… | ✅ |
| index-fingerprinter — per-file on-disk index | fingerprint/fingerprinter.spec.ts › fingerprints a recording… | ✅ |
| index-querydb — import into query.db SQLite | db/import/importAppmap.spec.ts › imports an end-to-end recording… | ✅ |
| index-traversal — directory walk / queue orchestration | fingerprintDirectoryCommand.spec.ts › fingerprints AppMaps and writes an index | ✅ |
| query-find — label-filtered call search (SQL-backed) | queries/find.spec.ts › --label filters via the labels table | ✅ |
| query-tree — call tree, filter=sql | queries/tree.spec.ts › returns only sql events when filter=sql | ✅ |
| query-hotspots — hotspots by class | queries/hotspots.spec.ts › --class filters by defined class | ✅ |
| query-endpoints — list/sort HTTP endpoints | queries/endpoints.spec.ts › sorts by the requested key | ✅ |
| query-mcp — get_call_tree MCP handler | queries/mcp.spec.ts › get_call_tree resolves the canonical appmap path… | ✅ |
| trim — value truncation across every slot type | trim.spec.ts › truncates values across every captured slot… | ✅ |
@appland/sequence-diagram — 5 traces
| Feature | Covered by | Status |
|---|---|---|
| specification — actor/package selection | integration/specification.spec.ts › includes all relevant actors | ✅ |
| labels — AppMap labels carried onto diagram actions (the branch feature) | sequenceDiagram.spec.ts › are reported on a labeled function action | ✅ |
| protocols-http — HTTP server request as a diagram action | http.spec.ts › is recorded | ✅ |
| protocols-sql — SQL query as a diagram action | sql.spec.ts › is recorded | ✅ |
| diff — two-diagram diff → PlantUML | integration/sequenceDiagramDiff.spec.ts › UML matches expectation | ✅ |
Deliberately uncovered (declared in the manifests)
| Subsystem | Covered by | Status |
|---|---|---|
| compare, inventory commands | — | ❌ uncovered (intentional) — handler tests balloon to multi-MB AppMaps the diagram collapses anyway; excluded for leanness. |
| navie | — | ❌ uncovered (intentional) — out of scope for this baseline. |
| sequence-diagram loop rendering / list-vs-prefetch diff | — | ❌ uncovered (intentional) — repeated helper frames the diagram collapses; MB weight, no digest signal. |
These are documented, reasoned exclusions — not accidental gaps. None is a
security-sensitive path lacking a negative test, so none rises to a blocking ❌.
Suggested Labels
No labels are suggested from this run. Label suggestions come from functions that
changed in the compare but carry no label; with no compare and no application-code
diff, there is no changed-but-unlabeled function to flag. The one label-bearing behavior
in scope — AppMap labels carried onto sequence-diagram actions — is the branch feature
itself and is already exercised by the labels gold trace above.
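The label-suggestion rule described above can be sketched as a simple filter. This is a minimal illustration, not the review skill's actual implementation: the `ChangedFunction` shape, its `id` format, and the `unlabeledChanges` helper are all hypothetical names introduced here.

```typescript
// Hypothetical shape for one function that a compare run reports as changed.
interface ChangedFunction {
  id: string;       // identifier format is an assumption, e.g. "package/Class#method"
  labels: string[]; // labels already attached to the function
}

// Return the ids of changed functions that carry no label.
// These are the only candidates for a label suggestion.
function unlabeledChanges(changed: ChangedFunction[]): string[] {
  return changed
    .filter((fn) => fn.labels.length === 0)
    .map((fn) => fn.id);
}

// With no compare, the changed set is empty, so nothing is suggested.
const suggestions = unlabeledChanges([]);
console.log(suggestions.length);
```

Under this sketch an empty input always yields an empty result, which matches the outcome reported for this run.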
Behavioral Drift
None — and none was measurable. The baseline carries no gold traces, so appmap compare had no base/ set to diff against; the pipeline stopped at establishing the
head baseline. Independently, the source diff changes no application code
(git diff --name-only 44cf1888..f311a533 is entirely gold traces, manifests,
appmap.yml, .gitignore, and CI), so there is no code path that could have moved a
trace even if a baseline existed.
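As a rough illustration of the "no application code changed" check, the sketch below classifies a list of changed paths such as the output of `git diff --name-only`. The patterns and function names are assumptions chosen to mirror the categories named above (gold traces and manifests, appmap.yml, .gitignore, CI); they are not taken from the review tooling.

```typescript
// Hypothetical patterns for paths that are recording infrastructure or config,
// not application code.
const NON_APPLICATION_PATTERNS: RegExp[] = [
  /(^|\/)gold_traces\//, // gold traces and their manifests
  /(^|\/)appmap\.yml$/,  // recording configuration
  /(^|\/)\.gitignore$/,
  /^\.github\//,         // CI workflows
];

// A path counts as application code when no infrastructure pattern matches it.
function isApplicationCode(path: string): boolean {
  return !NON_APPLICATION_PATTERNS.some((pattern) => pattern.test(path));
}

// Keep only the changed paths that could affect runtime behavior.
function applicationCodeChanges(changedPaths: string[]): string[] {
  return changedPaths.filter(isApplicationCode);
}

const changed = [
  "packages/cli/gold_traces/manifest.yaml",
  "packages/sequence-diagram/gold_traces/manifest.yaml",
  "appmap.yml",
  ".gitignore",
  ".github/workflows/appmap-review.yml",
];
console.log(applicationCodeChanges(changed).length);
```

If the filtered list is empty, no recorded trace can have moved because of this diff, whether or not a baseline exists.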
Note for future runs: the compare digest excludes volatile data (elapsed time,
object ids, parameter/return values) by construction, so once a real baseline exists a
changed entry will be a genuine structural change, not timing or value jitter. The
appmap.yml exclusions added here (sanitizeURL/parseURL) further stabilize the
baseline against ambient git state, which is the right call for determinism.
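To make the idea of a digest that ignores volatile data concrete, here is a minimal sketch. It assumes a simplified, hypothetical `TraceNode` tree rather than the real AppMap event format, and the digest is a plain string rather than whatever `appmap compare` actually computes.

```typescript
// Hypothetical, simplified call-tree node. Real AppMap events carry more fields.
interface TraceNode {
  name: string;          // structural: which function was called
  elapsedMs?: number;    // volatile: timing
  objectId?: number;     // volatile: object identity
  parameters?: string[]; // volatile: argument values
  returnValue?: string;  // volatile: return value
  children: TraceNode[]; // structural: nested calls, in order
}

// Build a digest from names and nesting only, ignoring every volatile field.
function structuralDigest(node: TraceNode): string {
  const children = node.children.map((child) => structuralDigest(child)).join(",");
  return node.name + "(" + children + ")";
}

// Two recordings of the same behavior with different timing and object ids.
const firstRun: TraceNode = {
  name: "stats",
  elapsedMs: 12,
  children: [{ name: "readDir", objectId: 7, children: [] }],
};
const secondRun: TraceNode = {
  name: "stats",
  elapsedMs: 48,
  children: [{ name: "readDir", objectId: 99, children: [] }],
};

console.log(structuralDigest(firstRun) === structuralDigest(secondRun)); // true
```

Because only names and nesting feed the digest, a difference between two digests in this sketch always corresponds to an added, removed, or reordered call.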
Unintended Side Effects
No behavior changed outside the change's stated scope. This section is where a
behavioral diff normally earns its keep, but two independent facts make it empty here:
(1) there is no baseline trace set to diff against, and (2) the diff touches no
application code — only recording configuration and CI. There is no footprint-minus-intent
residue to grade. Nothing to confirm or flag.
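The footprint-minus-intent idea can be read as a set difference: everything the change touched at runtime, minus what it was meant to touch. The sketch below is an illustration under that reading; `unintendedResidue` and the string identifiers are hypothetical.

```typescript
// Return the items in the observed footprint that are not part of the stated intent.
// Duplicates in the footprint are reported once, in first-seen order.
function unintendedResidue(footprint: string[], intent: string[]): string[] {
  const residue: string[] = [];
  footprint.forEach((item) => {
    const isIntended = intent.indexOf(item) !== -1;
    const alreadyReported = residue.indexOf(item) !== -1;
    if (!isIntended && !alreadyReported) {
      residue.push(item);
    }
  });
  return residue;
}

// With no baseline and no application-code diff, the footprint is empty,
// so there is no residue to grade.
console.log(unintendedResidue([], []).length);
```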
Suggestions
🟢 INFO — Initial baseline is coherent and lean; bless it
File: packages/cli/gold_traces/manifest.yaml,
packages/sequence-diagram/gold_traces/manifest.yaml
Context: the 18 recorded gold traces
The baseline picks one deterministic, single-behavior trace per distinct subsystem
(commands, query engine, indexer's three responsibilities split apart, the sequence-diagram
protocols, and the label feature that motivated the branch). The exclusions are documented
and defensible. There is no runtime evidence to act on — this is the seed baseline that
future reviews will diff against.
Risk: none. Non-application, additive infrastructure.
Recommended remediation: none required. Bless these traces as the initial behavioral
baseline. Follow-up coverage suggestions are recorded in Tests to Synthesize.
🟢 LOW — appmap.yml determinism exclusions are appropriate
File: appmap.yml
Context: packages/cli exclude list
Excluding sanitizeURL/parseURL (git-remote-dependent telemetry leaves) and scoping the
new packages/sequence-diagram block keeps the baseline reproducible across machines. No
functional behavior is affected; this only narrows what is recorded. Included so the
reader sees it was considered, not missed.
Risk: the only downside is reduced recording of those two leaf functions, which are
incidental telemetry, not app logic. Acceptable.
Recommended remediation: none. If future features route real logic through those
helpers, revisit the exclusion.
Tests to Synthesize
No security or correctness gap forces a new test this run — the uncovered subsystems
are intentional, documented exclusions rather than unguarded behavior. The rows below are
optional follow-ups a maintainer may choose to add later; none is blocking.
| Target | Test name | Priority |
|---|---|---|
| compare command handler (currently excluded for size) | a lean, small-fixture compare gold trace if size can be controlled | low |
| inventory command handler (currently excluded for size) | a lean inventory gold trace if size can be controlled | low |
| sequence-diagram loop/prefetch rendering (excluded) | a minimal loop-rendering trace, only if a small deterministic fixture exists | low |
SQL Pass
No new or changed queries — there is no application-code diff and no baseline to compare
against. For future reference, the baseline does establish SQL-relevant coverage: the
query-find trace exercises the labels table via the SQL-backed query engine, query-tree
filters to SQL events, and the protocols-sql sequence-diagram trace represents a SQL query
as a diagram action. These are the trace shapes a later change to the query engine or query.db
import path would be diffed against; nothing to review today.
HTTP Pass
No new or changed endpoints. The baseline establishes HTTP-relevant coverage for future
diffs: openapi (builds an OpenAPI document from request/response data, incl. malformed
requests), query-endpoints (lists and sorts HTTP endpoints), and the protocols-http
sequence-diagram trace (HTTP server request as a diagram action). No auth-gated mutation or
request-handling path changed here; nothing to review today.
Summary
| Severity | Findings | Action required |
|---|---|---|
| 🔴 High | 0 | — |
| 🟡 Medium | 0 | — |
| 🟢 Low / Info | 2 | Bless the initial baseline; determinism exclusions are appropriate. |
Not merge-blocking. This is a clean, first-ever gold-trace baseline: the baseline
revision had no committed traces, the diff changes no application code, and there is
consequently no behavioral drift and no unintended side effect to report. The single
most important action is to accept these 18 traces as the initial behavioral baseline
so the next review has a real base to diff against — at which point the compare-driven
drift, side-effect, and absence checks become meaningful.
…rkflow Adds the committed AppMap gold-trace baselines for @appland/appmap (cli) and @appland/sequence-diagram (manifest.yaml + baseline/), the AppMap Behavioral Review GitHub workflow, and the supporting config: root appmap.yml lists packages/sequence-diagram and excludes non-deterministic telemetry leaves; .gitignore ignores .appmap/. Branches off feat/sequence-diagram-labels. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
0f61131 to 8a91c40
AppMap Behavioral Review — establish gold-trace baseline
Revisions:
Summary
Not merge-blocking. This change adds gold-trace recording infrastructure only — it touches no application code, so there is no runtime behavior to regress. The single action is to treat the 19 recorded traces as the accepted baseline; the next review will diff against them.
Findings
None. The diff between
Checks performed
Review detail — features, coverage, labels, drift
Feature List
No application-code features. Every change in this diff is recording infrastructure or config, which Step 1 excludes from the feature list:
Coverage Matrix
This is the baseline being established, not a feature under review, so every entry is a newly-recorded trace rather than a gap. No behavior is left uncovered because no behavior changed.
No record commands are emitted here — the manifests above already capture how each trace is recorded, and there are no ❌ gaps to close.
Suggested Labels
None. Label suggestions apply to functions that changed in the compare but carry no label; with no compare and no application-code change, there is nothing to label. (The sequence-diagram
Behavioral Drift
None. There is no base-side recording to diff against, and no application code changed between the two revisions, so no trace could move. The recorded traces stand as the accepted starting point; subsequent reviews on this baseline will report drift relative to them.
AppMap Behavioral Review — sequence-diagram gold-trace baseline
Revisions:
Summary
Nothing to block on. No application behavior changed between these revisions; the head commit only lays down the gold-trace baseline and its supporting config. Safe to merge. The single thing to be aware of is the recording-scope change in
Findings
None. No application code changed and there is no base recording to compare against, so there is no behavioral drift, no side effect, and no missing-guard finding to raise.
Checks performed
Review detail — features, coverage, labels, drift
Feature List
No application-code features changed on this branch.
Coverage Matrix
The head commit introduces the baseline recordings below (from
No ❌ gaps to close: there is no changed behavior in this commit that a missing negative trace would leave unguarded. The manifest deliberately omits loop-/large-fixture-heavy tests because their extra events collapse to repeated helper frames that add size without digest signal — a reasonable determinism trade-off, not a coverage hole.
Suggested Labels
None. No function changed in a compare (none ran), so there is no unlabeled changed function to annotate.
Behavioral Drift
No behavioral drift is possible or observed: the baseline carries no recordings and no application code differs between the revisions, so nothing moved at runtime. The one non-code change worth confirming is in
Both are intended and consistent with the stated goal (deterministic, lean gold traces). Confirm the excluded telemetry leaves stay excluded on the next re-record so the baseline digest remains stable.
Splits the gold-traces + review tooling out of #2369 (the sequence-diagram labels feature) into its own branch, stacked on feat/sequence-diagram-labels.
What's here
@appland/appmap (cli, 13 traces) and @appland/sequence-diagram (5 traces): a single manifest.yaml per package plus committed, auto-trimmed baseline/appmaps/**.
.github/workflows/appmap-review.yml): on PRs touching either package, re-records + blesses that package's baseline and posts an interpreted behavioral review (sticky comment + job summary) via getappmap/review-action.
appmap.yml now lists packages/sequence-diagram and excludes non-deterministic telemetry leaves (sanitizeURL/parseURL); .gitignore ignores .appmap/.
Notes
appmap trim CLI command, the shared truncation lib, and the gold-trace-supporting tests/labels live on the code branch (feat(sequence-diagram): carry AppMap labels on diagram actions #2369), since they're src/ / tests/ changes.
feat/sequence-diagram-labels; GitHub will retarget this to main once feat(sequence-diagram): carry AppMap labels on diagram actions #2369 merges.
🤖 Generated with Claude Code