chore(gold-traces): AppMap gold-trace baselines + behavioral review workflow #2371

Merged: kgilpin merged 4 commits into main from feat/appmap-gold-traces on Jul 3, 2026

kgilpin commented Jul 2, 2026 (Contributor)

Splits the gold-traces + review tooling out of #2369 (the sequence-diagram
labels feature) into its own branch, stacked on feat/sequence-diagram-labels.

What's here

  • Gold-trace baselines for @appland/appmap (cli, 13 traces) and
    @appland/sequence-diagram (5 traces): a single manifest.yaml per package
    plus committed, auto-trimmed baseline/appmaps/**.
  • AppMap Behavioral Review workflow (.github/workflows/appmap-review.yml):
    on PRs touching either package, re-records + blesses that package's baseline
    and posts an interpreted behavioral review (sticky comment + job summary) via
    getappmap/review-action.
  • Supporting config: root appmap.yml now lists packages/sequence-diagram
    and excludes non-deterministic telemetry leaves (sanitizeURL/parseURL);
    .gitignore ignores .appmap/.

Notes

🤖 Generated with Claude Code

github-actions bot commented Jul 2, 2026 (Contributor)

AppMap Behavioral Review — Gold-Trace Baseline (first-ever run)

Revisions: f311a53 vs feat/sequence-diagram-labels (44cf1888)
Date: 2026-07-02
Commits reviewed:

  • f311a533 Merge a734c9c4 into 44cf1888d8ca37090dfe26fab757b01ef56611e8
  • a734c9c4 chore(gold-traces): add gold-trace baselines and behavioral-review workflow

⚠️ First-ever baseline — no behavioral compare was run

The baseline revision feat/sequence-diagram-labels (44cf1888) has no committed
gold traces (git ls-tree -r 44cf1888 | grep gold_traces → 0 matches). There is no
prior recording to diff against, so appmap compare was not run and there is, by
construction, no behavioral drift to report. This is the expected "so-far baseline"
case: the head revision is the first commit to carry gold traces, and this report
reviews the coverage that now exists rather than a change between two trace sets.

Separately, the diff between the two revisions touches no application code at all
(verified: git diff --name-only 44cf1888..f311a533 contains only gold traces,
manifests, appmap.yml, .gitignore, and the new CI workflow). So even a populated
baseline would surface zero application-behavior drift here — there is no source change
that could move a trace.
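
The two checks described above (no gold traces at the base revision, and a diff limited to recording infrastructure) can be sketched as a small path classifier. This is only an illustration under stated assumptions, not the review action's implementation: the function names, the pattern list, and the sample path packages/cli/src/cli.ts are hypothetical, and the patterns are taken from the file list in this report.

```python
from fnmatch import fnmatchcase

# Assumed patterns, taken from the file list in this report; they are not
# read from the review action's configuration.
INFRA_PATTERNS = (
    "*/gold_traces/*",      # committed baselines and manifests
    "appmap.yml",           # root recording config
    ".gitignore",
    ".github/workflows/*",  # CI workflow
)


def is_infra(path: str) -> bool:
    """True if the path matches one of the recording-infrastructure patterns."""
    return any(fnmatchcase(path, pattern) for pattern in INFRA_PATTERNS)


def has_gold_traces(tree_paths) -> bool:
    """Same idea as `git ls-tree -r <rev> | grep gold_traces`: any match means a baseline exists."""
    return any("gold_traces" in path for path in tree_paths)


def application_paths(changed_paths) -> list:
    """Changed paths that are not recording infrastructure."""
    return [path for path in changed_paths if not is_infra(path)]


# Hypothetical inputs standing in for `git ls-tree` and `git diff --name-only` output.
base_tree = ["packages/cli/src/cli.ts", "appmap.yml"]
changed = [
    "packages/cli/gold_traces/manifest.yaml",
    "packages/sequence-diagram/gold_traces/manifest.yaml",
    "appmap.yml",
    ".gitignore",
    ".github/workflows/appmap-review.yml",
]

print(has_gold_traces(base_tree))   # False: no baseline to compare against
print(application_paths(changed))   # []: no application code in the diff
```

Either condition alone is enough to leave the drift sections empty; this report happens to satisfy both.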


Feature List

The skill's Feature List enumerates application-code features in the source diff.
The 44cf1888..f311a533 diff contains none — it is pure gold-trace/CI infrastructure.
For orientation, the changes are:

  1. Gold-trace baseline recorded (infra, not app code). 18 blessed AppMaps added —
    13 under packages/cli/gold_traces/baseline/appmaps/** and 5 under
    packages/sequence-diagram/gold_traces/baseline/appmaps/** — plus a
    manifest.yaml in each package pinning each trace to an existing Jest test.
  2. Recording scope tightened for determinism (appmap.yml). Excludes
    sanitizeURL / parseURL under packages/cli (they only fire when the checkout has a
    git remote, so they appear/vanish with ambient git state), and adds a
    packages/sequence-diagram package block (exclude: node_modules, .yarn, dist, tests).
    Scoping-only; changes what is recorded, not what the app does.
  3. .gitignore now ignores the .appmap/ working dir (derived exports/archives).
  4. CI workflow added: .github/workflows/appmap-review.yml (119 lines) — drives the
    record→bless→review pipeline that produced this report.

None of the above produced runtime drift, because none of it is reachable application code.


Coverage Matrix

Every entry below is newly established at head (there was no baseline trace before).
"Covered by" is the manifest's pinned Jest test; Status ✅ means a gold trace now exists.

@appland/appmap (CLI) — 13 traces

| Feature | Covered by | Status |
| --- | --- | --- |
| stats — aggregates event statistics over a directory | stats.spec.ts: analyzes a directory | |
| sequence-diagram — generates valid JSON diagram | sequenceDiagram.spec.ts: is valid | |
| openapi — builds OpenAPI doc from req/resp | openapi.spec.ts: handles valid and malformed HTTP server requests | |
| prune — shrinks AppMap via base64url filter | prune.spec.ts: correctly reduces the size… | |
| index-fingerprinter — per-file on-disk index | fingerprint/fingerprinter.spec.ts: fingerprints a recording… | |
| index-querydb — import into query.db SQLite | db/import/importAppmap.spec.ts: imports an end-to-end recording… | |
| index-traversal — directory walk / queue orchestration | fingerprintDirectoryCommand.spec.ts: fingerprints AppMaps and writes an index | |
| query-find — label-filtered call search (SQL-backed) | queries/find.spec.ts: --label filters via the labels table | |
| query-tree — call tree, filter=sql | queries/tree.spec.ts: returns only sql events when filter=sql | |
| query-hotspots — hotspots by class | queries/hotspots.spec.ts: --class filters by defined class | |
| query-endpoints — list/sort HTTP endpoints | queries/endpoints.spec.ts: sorts by the requested key | |
| query-mcp — get_call_tree MCP handler | queries/mcp.spec.ts: get_call_tree resolves the canonical appmap path… | |
| trim — value truncation across every slot type | trim.spec.ts: truncates values across every captured slot… | |

@appland/sequence-diagram — 5 traces

| Feature | Covered by | Status |
| --- | --- | --- |
| specification — actor/package selection | integration/specification.spec.ts: includes all relevant actors | |
| labels — AppMap labels carried onto diagram actions (the branch feature) | sequenceDiagram.spec.ts: are reported on a labeled function action | |
| protocols-http — HTTP server request as a diagram action | http.spec.ts: is recorded | |
| protocols-sql — SQL query as a diagram action | sql.spec.ts: is recorded | |
| diff — two-diagram diff → PlantUML | integration/sequenceDiagramDiff.spec.ts: UML matches expectation | |

Deliberately uncovered (declared in the manifests)

| Subsystem | Covered by | Status |
| --- | --- | --- |
| compare, inventory commands | uncovered (intentional) — handler tests balloon to multi-MB AppMaps the diagram collapses anyway; excluded for leanness. | |
| navie | uncovered (intentional) — out of scope for this baseline. | |
| sequence-diagram loop rendering / list-vs-prefetch diff | uncovered (intentional) — repeated helper frames the diagram collapses; MB weight, no digest signal. | |

These are documented, reasoned exclusions — not accidental gaps. None is a
security-sensitive path lacking a negative test, so none rises to a blocking ❌.


Suggested Labels

No labels are suggested from this run. Label suggestions come from functions that
changed in the compare but carry no label; with no compare and no application-code
diff, there is no changed-but-unlabeled function to flag. The one label-bearing behavior
in scope — AppMap labels carried onto sequence-diagram actions — is the branch feature
itself and is already exercised by the labels gold trace above.


Behavioral Drift

None — and none was measurable. The baseline carries no gold traces, so appmap compare had no base/ set to diff against; the pipeline stopped at establishing the
head baseline. Independently, the source diff changes no application code
(git diff --name-only 44cf1888..f311a533 is entirely gold traces, manifests,
appmap.yml, .gitignore, and CI), so there is no code path that could have moved a
trace even if a baseline existed.

Note for future runs: the compare digest excludes volatile data (elapsed time,
object ids, parameter/return values) by construction, so once a real baseline exists a
changed entry will be a genuine structural change, not timing or value jitter. The
appmap.yml exclusions added here (sanitizeURL/parseURL) further stabilize the
baseline against ambient git state, which is the right call for determinism.
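
To illustrate why leaving volatile data out of a digest makes it stable, here is a minimal sketch that hashes only structural fields of AppMap-style events. It is not the algorithm appmap compare uses: the structural_digest name and the choice of structural keys are assumptions for illustration, the event keys are assumed to follow the AppMap event format, and the Stats/analyze/summarize names are hypothetical.

```python
import hashlib
import json

# Assumed structural keys: they identify which function ran and in what order,
# not how long it took or which values it saw.
STRUCTURAL_KEYS = ("event", "defined_class", "method_id")


def structural_digest(events) -> str:
    """Hash the ordered call/return skeleton, ignoring ids, timings, and values."""
    skeleton = [
        {key: event[key] for key in STRUCTURAL_KEYS if key in event}
        for event in events
    ]
    payload = json.dumps(skeleton, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Two hypothetical recordings of the same call with different ids, timings, and values.
run_a = [
    {"id": 1, "event": "call", "defined_class": "Stats", "method_id": "analyze",
     "parameters": [{"name": "dir", "value": "/tmp/a"}]},
    {"id": 2, "event": "return", "parent_id": 1, "elapsed": 0.012,
     "return_value": {"value": "3"}},
]
run_b = [
    {"id": 7, "event": "call", "defined_class": "Stats", "method_id": "analyze",
     "parameters": [{"name": "dir", "value": "/tmp/b"}]},
    {"id": 8, "event": "return", "parent_id": 7, "elapsed": 0.250,
     "return_value": {"value": "9"}},
]
# A structurally different recording: another method is called.
run_c = [
    {"id": 1, "event": "call", "defined_class": "Stats", "method_id": "summarize"},
    {"id": 2, "event": "return", "parent_id": 1, "elapsed": 0.012},
]

print(structural_digest(run_a) == structural_digest(run_b))  # True
print(structural_digest(run_a) == structural_digest(run_c))  # False
```

Under this scheme only a change in which functions are called, or in their order, moves the digest, which is the property the note above relies on.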


Unintended Side Effects

No behavior changed outside the change's stated scope. This section is where a
behavioral diff normally earns its keep, but two independent facts make it empty here:
(1) there is no baseline trace set to diff against, and (2) the diff touches no
application code — only recording configuration and CI. There is no footprint-minus-intent
residue to grade. Nothing to confirm or flag.


Suggestions

🟢 INFO — Initial baseline is coherent and lean; bless it

File: packages/cli/gold_traces/manifest.yaml,
packages/sequence-diagram/gold_traces/manifest.yaml
Context: the 18 recorded gold traces

The baseline picks one deterministic, single-behavior trace per distinct subsystem
(commands, query engine, indexer's three responsibilities split apart, the sequence-diagram
protocols, and the label feature that motivated the branch). The exclusions are documented
and defensible. There is no runtime evidence to act on — this is the seed baseline that
future reviews will diff against.

Risk: none. Non-application, additive infrastructure.

Recommended remediation: none required. Bless these traces as the initial behavioral
baseline. Follow-up coverage suggestions are recorded in Tests to Synthesize.

🟢 LOW — appmap.yml determinism exclusions are appropriate

File: appmap.yml Context: packages/cli exclude list

Excluding sanitizeURL/parseURL (git-remote-dependent telemetry leaves) and scoping the
new packages/sequence-diagram block keeps the baseline reproducible across machines. No
functional behavior is affected; this only narrows what is recorded. Included so the
reader sees it was considered, not missed.

Risk: the only downside is reduced recording of those two leaf functions, which are
incidental telemetry, not app logic. Acceptable.

Recommended remediation: none. If future features route real logic through those
helpers, revisit the exclusion.


Tests to Synthesize

No security or correctness gap forces a new test this run — the uncovered subsystems
are intentional, documented exclusions rather than unguarded behavior. The rows below are
optional follow-ups a maintainer may choose to add later; none is blocking.

| Target | Test name | Priority |
| --- | --- | --- |
| compare command handler (currently excluded for size) | a lean, small-fixture compare gold trace if size can be controlled | low |
| inventory command handler (currently excluded for size) | a lean inventory gold trace if size can be controlled | low |
| sequence-diagram loop/prefetch rendering (excluded) | a minimal loop-rendering trace, only if a small deterministic fixture exists | low |

SQL Pass

No new or changed queries — there is no application-code diff and no baseline to compare
against. For future reference, the baseline does establish SQL-relevant coverage: the
query-find trace exercises the labels table via the SQL-backed query engine, query-tree
filters to SQL events, and the protocols-sql sequence-diagram trace represents a SQL query
as a diagram action. These are the trace shapes a later change to the query engine or query.db
import path would be diffed against; nothing to review today.

HTTP Pass

No new or changed endpoints. The baseline establishes HTTP-relevant coverage for future
diffs: openapi (builds an OpenAPI document from request/response data, incl. malformed
requests), query-endpoints (lists and sorts HTTP endpoints), and the protocols-http
sequence-diagram trace (HTTP server request as a diagram action). No auth-gated mutation or
request-handling path changed here; nothing to review today.


Summary

| Severity | Findings | Action required |
| --- | --- | --- |
| 🔴 High | 0 | |
| 🟡 Medium | 0 | |
| 🟢 Low / Info | 2 | Bless the initial baseline; determinism exclusions are appropriate. |

Not merge-blocking. This is a clean, first-ever gold-trace baseline: the baseline
revision had no committed traces, the diff changes no application code, and there is
consequently no behavioral drift and no unintended side effect to report. The single
most important action is to accept these 18 traces as the initial behavioral baseline
so the next review has a real base to diff against — at which point the compare-driven
drift, side-effect, and absence checks become meaningful.

kgilpin and others added 3 commits July 3, 2026 13:43
…rkflow

Adds the committed AppMap gold-trace baselines for @appland/appmap (cli)
and @appland/sequence-diagram (manifest.yaml + baseline/), the
AppMap Behavioral Review GitHub workflow, and the supporting config:
root appmap.yml lists packages/sequence-diagram and excludes
non-deterministic telemetry leaves; .gitignore ignores .appmap/.

Branches off feat/sequence-diagram-labels.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@kgilpin kgilpin force-pushed the feat/appmap-gold-traces branch from 0f61131 to 8a91c40 Compare July 3, 2026 17:43

github-actions bot commented Jul 3, 2026 (Contributor)

AppMap Behavioral Review — establish gold-trace baseline

Revisions: 8bf8feda vs d37c53e8 (feat/sequence-diagram-labels) · Date: 2026-07-03 ·
Commits: 47092b3 add gold-trace baselines and behavioral-review workflow · 3cdf3f7 update behavioral baseline · 8a91c40 re-run behavioral review over the updated code (merged as 8bf8feda)

⚠️ First-ever run. The baseline revision d37c53e8 has no committed gold traces, so there is no prior recording to compare against — no behavioral compare could run. This review establishes the baseline and reviews the source diff on its own; it is a valid "so-far baseline" report, not a failure.

Summary

| Severity | Findings | Action required |
| --- | --- | --- |
| 🔴 High | 0 | none |
| 🟡 Medium | 0 | none |
| 🟢 Low | 0 | none |

Not merge-blocking. This change adds gold-trace recording infrastructure only — it touches no application code, so there is no runtime behavior to regress. The single action is to treat the 19 recorded traces as the accepted baseline; the next review will diff against them.

Findings

None. The diff between d37c53e8 and 8bf8feda contains no application source changes — only committed gold-trace AppMaps, an appmap.yml recording-scope config, a .github/workflows/appmap-review.yml CI workflow, and two .gitignore entries. There is no changed runtime behavior to interpret and no prior baseline to diff against.

Checks performed

| Check | Result | Note |
| --- | --- | --- |
| Behavioral compare | — not run | No gold traces exist at the baseline revision, so there is nothing to compare; this is the first baseline being laid down. |
| Changes outside the PR's scope (Step 5) | ✅ none | No application code changed, so no trace could drift; nothing to reconcile. |
| Missing guards (Step 6) | | No security-sensitive application code changed, so no guard could go missing. |
| Test/recording coverage (Step 2) | | 19 gold traces recorded across 14 CLI subsystems and 5 sequence-diagram features (detail ↓). |
| SQL (Step 4b) | ✅ clean | No SQL-bearing application code changed; no query diff to inspect. |
| HTTP (Step 4c) | ✅ clean | No request-handling application code changed; no server/client request diff to inspect. |
| Intended changes verified | | The gold-trace baselines, appmap.yml scope config, CI workflow, and .gitignore entries — all recording infrastructure, no behavior. |

Review detail — features, coverage, labels, drift

Feature List

No application-code features. Every change in this diff is recording infrastructure or config, which Step 1 excludes from the feature list:

  1. Gold-trace baselines — commits 19 curated AppMaps (14 CLI, 5 sequence-diagram) plus two manifest.yaml specs as the behavioral baseline.
  2. Recording scope (appmap.yml) — excludes incidental sanitizeURL/parseURL leaves (they fire only when a git remote is present, so they vary by machine) and adds a packages/sequence-diagram recording block; a determinism config, not a behavior change.
  3. CI workflow (appmap-review.yml) — adds the behavioral-review job that records, blesses, and publishes this report.
  4. .gitignore — ignores the .appmap/ working directory used for derived sequence exports and archives.

Coverage Matrix

This is the baseline being established, not a feature under review, so every entry is a newly-recorded trace rather than a gap. No behavior is left uncovered because no behavior changed.

| Subsystem / feature | Covered by (gold trace) | Status |
| --- | --- | --- |
| stats | analyzes a directory | |
| sequence-diagram (CLI) | JSON format / is valid | |
| openapi | handles valid and malformed HTTP server requests | |
| prune | reduces the size of an appmap from a base64url filter | |
| index-fingerprinter | fingerprints a recording and writes the on-disk index | |
| index-querydb | imports an end-to-end recording into all tables | |
| index-traversal | fingerprints AppMaps and writes an index | |
| query-find | --label filters via the labels table | |
| query-tree | returns only sql events when filter=sql | |
| query-hotspots | --class filters by defined class | |
| query-endpoints | sorts by the requested key | |
| query-mcp | get_call_tree resolves the canonical path and applies focus type | |
| query-mcp-path-resolution | only accepts the canonical path; name / numeric id return "not found" | |
| trim | truncates values across every captured slot while preserving structure | |
| seq-diag specification | includes all relevant actors | |
| seq-diag labels | are reported on a labeled function action | |
| seq-diag protocols-http | HTTP server request / is recorded | |
| seq-diag protocols-sql | SQL query / is recorded | |
| seq-diag diff | UML matches expectation (user found vs not found) | |

No record commands are emitted here — the manifests above already capture how each trace is recorded, and there are no ❌ gaps to close.

Suggested Labels

None. Label suggestions apply to functions that changed in the compare but carry no label; with no compare and no application-code change, there is nothing to label. (The sequence-diagram labels gold trace confirms label propagation is already exercised for future reviews.)

Behavioral Drift

None. There is no base-side recording to diff against, and no application code changed between the two revisions, so no trace could move. The recorded traces stand as the accepted starting point; subsequent reviews on this baseline will report drift relative to them.

github-actions bot commented Jul 3, 2026 (Contributor)

AppMap Behavioral Review — sequence-diagram gold-trace baseline

Revisions: 8bf8feda vs d37c53e8 (feat/sequence-diagram-labels) · Date: 2026-07-03 ·
Commits: 47092b3 chore(gold-traces): add gold-trace baselines and behavioral-review workflow · 3cdf3f7 chore(gold-traces): update behavioral baseline · 8a91c40 ci: re-run AppMap behavioral review · 8bf8feda merge

⚠️ First-ever baseline. The baseline revision feat/sequence-diagram-labels (d37c53e8) has no committed gold traces, so there is no prior recording to diff against — appmap compare could not run. There is also no application-code change between the two revisions: the only files that differ are the newly-added gold traces, the CI workflow (.github/workflows/appmap-review.yml), .gitignore, and the AppMap recording config (appmap.yml). This report establishes the baseline and reviews what can be reviewed; it is a clean "so-far" report, not a failure.

Summary

| Severity | Findings | Action required |
| --- | --- | --- |
| 🔴 High | 0 | |
| 🟡 Medium | 0 | |
| 🟢 Low | 0 | |

Nothing to block on. No application behavior changed between these revisions; the head commit only lays down the gold-trace baseline and its supporting config. Safe to merge. The single thing to be aware of is the recording-scope change in appmap.yml (see the drift note below) — it shapes what future reviews will capture, not what this code does.

Findings

None. No application code changed and there is no base recording to compare against, so there is no behavioral drift, no side effect, and no missing-guard finding to raise.

Checks performed

| Check | Result | Note |
| --- | --- | --- |
| Behavioral compare | — not run | Baseline has no committed gold traces (first-ever run) and no application code differs, so there is no base side to diff — nothing to compare. |
| Changes outside the PR's scope (Step 5) | ✅ none | No application code changed; the only edits are gold traces, CI workflow, .gitignore, and appmap.yml recording scope — no runtime behavior moved. |
| Missing guards (Step 6) | | No security-sensitive code changed, so nothing could have lost a guard. |
| Test/recording coverage (Step 2) | | This commit adds the coverage: five gold traces now exercise the sequence-diagram subsystems (detail ↓). |
| SQL (Step 4b) | ✅ clean | No query behavior changed; no source touches SQL. |
| HTTP (Step 4c) | ✅ clean | No request-handling code changed. |
| Intended changes verified | | The intended change — establish the sequence-diagram gold-trace baseline and wire the review workflow — is present and consistent with the manifest. |

Review detail — features, coverage, labels, drift

Feature List

No application-code features changed on this branch. git diff --name-only d37c53e8..8bf8feda (excluding gold_traces/) touches only appmap.yml, .gitignore, and .github/workflows/appmap-review.yml — all infrastructure/config, none of it src/. The commit's purpose is to create the first gold-trace baseline for @appland/sequence-diagram and add the behavioral-review workflow that consumes it.

Coverage Matrix

The head commit introduces the baseline recordings below (from packages/sequence-diagram/gold_traces/manifest.yaml). They define what future reviews will diff against; there is no behavior to guard in this change, so every row is "newly established" rather than "verified against a prior run".

| Feature | Covered by | Status |
| --- | --- | --- |
| Code-object selection (actors/packages in the diagram) | tests/integration/specification.spec.ts: includes all relevant actors | ✅ new baseline |
| AppMap labels carried onto a diagram action | tests/unit/sequenceDiagram.spec.ts: are reported on a labeled function action | ✅ new baseline |
| HTTP server request rendered as a diagram action | tests/unit/http.spec.ts: is recorded | ✅ new baseline |
| SQL query rendered as a diagram action | tests/unit/sql.spec.ts: is recorded | ✅ new baseline |
| Diagram diff (user found vs not found) → PlantUML | tests/integration/sequenceDiagramDiff.spec.ts: UML matches expectation | ✅ new baseline |

No ❌ gaps to close: there is no changed behavior in this commit that a missing negative trace would leave unguarded. The manifest deliberately omits loop-/large-fixture-heavy tests because their extra events collapse to repeated helper frames that add size without digest signal — a reasonable determinism trade-off, not a coverage hole.

Suggested Labels

None. No function changed in a compare (none ran), so there is no unlabeled changed function to annotate.

Behavioral Drift

No behavioral drift is possible or observed: the baseline carries no recordings and no application code differs between the revisions, so nothing moved at runtime.

The one non-code change worth confirming is in appmap.yml, which shapes future recordings rather than this code's behavior:

  • New exclusions sanitizeURL / parseURL under the CLI package. These only fire when the checkout has a git remote (the buildMetadata if (repository) branch), so they would appear or vanish with ambient git state rather than app behavior. Excluding them keeps gold traces deterministic across machines. Acceptable and self-documented in the config comment.
  • New packages/sequence-diagram recording scope (excluding node_modules, .yarn, dist, tests). This is what lets this package be recorded at all; it is the mechanism behind the new baseline above.

Both are intended and consistent with the stated goal (deterministic, lean gold traces). Confirm the excluded telemetry leaves stay excluded on the next re-record so the baseline digest remains stable.
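
One way to confirm on a later re-record that the excluded leaves stay excluded is to scan the recorded call events for their names. The sketch below assumes AppMap-style events with event and method_id keys; the leaked_exclusions helper and the sample events (including the buildMetadata call shown as a recorded function) are hypothetical and are not part of the review workflow.

```python
# Function names the report says are excluded from recording.
EXCLUDED_LEAVES = frozenset({"sanitizeURL", "parseURL"})


def leaked_exclusions(events, excluded=EXCLUDED_LEAVES) -> list:
    """Return the excluded function names that still appear as call events."""
    return sorted(
        {
            event["method_id"]
            for event in events
            if event.get("event") == "call" and event.get("method_id") in excluded
        }
    )


# Hypothetical traces: one clean, one recorded on a checkout where the exclusion was lost.
clean_trace = [
    {"id": 1, "event": "call", "method_id": "buildMetadata"},
    {"id": 2, "event": "return", "parent_id": 1},
]
leaky_trace = [
    {"id": 1, "event": "call", "method_id": "buildMetadata"},
    {"id": 2, "event": "call", "method_id": "sanitizeURL"},
    {"id": 3, "event": "return", "parent_id": 2},
    {"id": 4, "event": "return", "parent_id": 1},
]

print(leaked_exclusions(clean_trace))  # []
print(leaked_exclusions(leaky_trace))  # ['sanitizeURL']
```

A non-empty result on a freshly recorded trace would suggest the appmap.yml exclusion is no longer taking effect.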

Base automatically changed from feat/sequence-diagram-labels to main July 3, 2026 19:34
@kgilpin kgilpin merged commit 4d80a1f into main Jul 3, 2026
@kgilpin kgilpin deleted the feat/appmap-gold-traces branch July 3, 2026 19:34