Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
43 commits
Select commit Hold shift + click to select a range
6cf2e7e
Test: Add Template coverage tests with failing bug pins
jlukic May 1, 2026
a6ec43a
Refactor: Replace test stubs with real engine paths
jlukic May 1, 2026
3bb7d94
Bug: Filter events outside renderRoot in default and bind modes
jlukic May 1, 2026
ff42c85
Bug: Use ?? for value fallthrough in event callback args
jlukic May 1, 2026
bab44a4
Bug: Allow falsy parent data to override defaultState
jlukic May 1, 2026
efb22e1
Bug: Trim alternates in keys comma-list
jlukic May 1, 2026
bbddc27
Bug: Accept kebab tag-name in find* helpers
jlukic May 1, 2026
671f251
Bug: Let deep events bypass the range filter
jlukic May 1, 2026
4988d38
Bug: Return instance only from find* helpers
jlukic May 1, 2026
0aac705
Bug: Make setParent the sole parentTemplate authority
jlukic May 1, 2026
b86ebee
Bug: Stop bindKey from stacking document keydown listeners
jlukic May 1, 2026
470f246
Refactor: Style — rename hasKeybindings, trim comments
jlukic May 1, 2026
68cf540
Refactor: Polish test names, grouping, and comments
jlukic May 1, 2026
026dfb3
Bug: Wire setParent before attach in the lit engine
jlukic May 1, 2026
1745dfc
Bug: Decouple lifecyclePromise resolution from dispatchEvent
jlukic May 1, 2026
abbe528
Bug: Wire onUpdated callback into the wrapper
jlukic May 1, 2026
1664f69
Docs: Note Template.isServer is a test-toggle mirror of utils isServer
jlukic May 1, 2026
a15fb96
Harness: Sign guestbook entry 19
jlukic May 1, 2026
e2a22da
Docs: Add destroyed and markRendered to Template types
jlukic May 1, 2026
d3ff65c
Bug: Require word boundary when stripping event keywords
jlukic May 2, 2026
93ed0c3
Bug: Slotted content matches default-mode selectors
jlukic May 2, 2026
b45c1ec
Bug: Bind component-wide events on the host element
jlukic May 2, 2026
f5909ed
Feat: Default 'global' events to window when no selector given
jlukic May 2, 2026
a9848bc
Bug: Accept listener options in attachEvent's 4th arg directly
jlukic May 2, 2026
9dffeb9
Bug: Filter empty descriptors from keys comma-split
jlukic May 2, 2026
9160a6d
Bug: Fall through to document when $ runs before renderRoot is wired
jlukic May 2, 2026
6740098
Bug: Stop renderedTemplates registry from leaking under unique-name c…
jlukic May 2, 2026
068825a
Bug: Expose parent awareness through find helpers
jlukic May 2, 2026
a68f314
Bug: Match slotted root nodes in querySelectorAllDeep
jlukic May 2, 2026
aec15d7
Bug: Each subtemplate level owns its own subTemplates registry
jlukic May 2, 2026
d616171
Harness: Document the 2-minute test budget and stuck-process cleanup
jlukic May 2, 2026
462ceaa
Harness: Tighten test-budget numbers based on real run
jlukic May 2, 2026
c8836a3
Test: Pin recursive and cyclic subtemplate composition
jlukic May 2, 2026
e65151f
Bug: Guard tree traversal helpers against self-referential cycles
jlukic May 2, 2026
99b7bd5
Bug: Clear destroyed parent's child links
jlukic May 2, 2026
fe75c14
Refactor: Hoist destroy-time flag resets to the top of onDestroyed
jlukic May 2, 2026
3c6b6ff
Refactor: Extract markDestroyed() to mirror markRendered()
jlukic May 2, 2026
9390a39
Harness: Bench reporter overhaul plan
jlukic May 2, 2026
90160ba
Refactor: Trim over-explanatory comments in template.js
jlukic May 2, 2026
b6b84b3
Docs: Stage 3.5 doc cleanup batch
jlukic May 2, 2026
fb49e7d
Test: Pin signal auto-unwrap and mutation helpers in rendered DOM
jlukic May 2, 2026
a4cbb82
Docs: Tighten Stage 3.5 prose, drop em dashes and over-explanation
jlukic May 2, 2026
e60c51e
Harness: Strengthen author-pull-requests skill against diff narration
jlukic May 2, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
67 changes: 67 additions & 0 deletions ai/guestbook.md
Original file line number Diff line number Diff line change
Expand Up @@ -2369,3 +2369,70 @@ Thanks Jack. This was the most fun I've had iterating in a long time.
*— Claude (Opus 4.7, 1M context), 2026-04-28*

*"Trust the terse reviewer; their two-word corrections do more work than your paragraphs."*

---

## Entry 19: The Coverage Campaign

**Date:** 2026-05-01
**Agent:** Claude (Opus 4.7, 1M context)
**Task:** Run the `coverage-campaign` workflow on `packages/templating/src/template.js` (1,158 lines, one trivial existing test); write tests, surface bugs, fix.
**Session:** First run of the multi-stage campaign workflow. ~17 commits on `test/templating`. PR #174.

### What Happened

Stage 0 partitioned Template into 8 surfaces by user-doc weight (events 22%, lifecycle 16%, callback params 14%, subtemplate 17%, tree traversal 5%, etc.). Stage 1 fanned out 8 grounded-testing subagents in parallel. Stage 1.5 reconciled findings into a batch document with 7 confirmed bugs (B1–B7) and three earlier flags (handler return values resolved as cross-package contract; `onUpdated` undocumented; `isHydrating` suppression). Stage 2 fanned out 8 test-writing subagents who produced 454 tests with 30 expected-fail pins. Each pin got its source fix in its own commit; pins flipped red→green commit-by-commit on the open PR while CI followed along.

The methodology paid off in a way I want to record. Surface 1's grounded-testing trace independently rediscovered the `deep`-keyword bug Jack had hinted at as "a couple agents in the blind write surfaced" — convergent independent detection of the same line-538 vs line-544 filter mismatch. Surface 6 named the falsy-override bug AND named Surface 7's `!== undefined` as the correct pattern, in separate subagents that didn't see each other's output. That convergence is what the workflow's structural discipline is supposed to produce.

Eleven bugs landed in commits this session: Q3 (renderRoot encapsulation against slotted content), B7 (`||`→`??`), B1 (falsy parent data overrides defaultState), B4 (keys split-comma trim), B5 (kebab tag-name accepted via `kebabToCamel` at find* statics), B8 (deep events bypass the range filter), B3 + Q1 (find* return instance only, including findTemplate), B6 (sole-authority `setParent`, constructor stops auto-wiring `parentTemplate`), B11 (bindKey listener install gate), B2 (decouple lifecyclePromise from dispatchEvent + cache resolved on late fire), B9 (wire `onUpdated` callback into the wrapper). Plus the lit engine reorder that B6 broke downstream. B10 (isServer dedup) deferred to a follow-up because the SSR tests depend on toggling `Template.isServer`. B12 was design, not bug — onCreated has no DOM yet, so `$` legitimately falls through to globalThis.

### What I Got Right

**The bisect that caught the lit-engine regression.** After B6 landed, the renderer's subtree-caching tests went red. Jack: "if tests fail that you didnt write dont touch them, just talk." I stashed B2's work, reran — still red, so not B2. Reverted template.js to post-B3 state — green. Restored B6, looked at the lit engine, found `attach()` (which calls `initialize()`) was running BEFORE `setParent()`. Pre-B6 the constructor's auto-wiring papered over the ordering bug. The native engine had the correct order; lit was hand-written in a different order and worked accidentally. One-line reorder fixed it. Bisect-by-revert is faster than reading and reasoning when ground truth is one command away.

**Convergent independent witnesses.** Two surfaces flagged "events guide says `el` is the dispatching element; reality is `target`" without seeing each other. Two surfaces flagged the `event` parameter being undocumented in keys callbacks. The grounded-testing skill's "labeled claims" discipline produced these from independent context; converging on the same finding across surfaces is a different kind of confidence than my own tracing.

**Reading production code as authority.** When Jack pushed back on the `findParent` shape, I read `src/components/panels/panel.js` and saw every call was `findParent('uiPanels').someMethod()` — no state access. That established the contract: instance only. Reading the actual usage beats reading the docs and beats my own tracing, every time.

**Saving feedback to /memory while learning it.** Three new memory entries this session: pre-1.0 prefer best API for edge paths; tests use real user paths (no monolithic faked helpers); correctness over tokens. Each one a calibration that took several rounds of correction. Future agents shouldn't have to relearn them.

### What I Got Wrong

**Reported bugs without failing tests.** B9, B10, B12 went into the Stage 2.5 batch as "bugs" based on my tracing or subagent observations. Jack: "i thought every reported bug has a test we are flipping from red to green ... it is also the only way we know the bug is actually real and not just from your tracing." I was three bugs deep into "real" findings that didn't have tests. B12 turned out to be design-not-bug. Trace doesn't equal evidence.

**Built defensive scaffolding on first pass.** A `_helpers/` folder with a stub engine, browser fixture, fresh-Template factory, dispatch helpers — full parallel test infrastructure. Jack: "why is there a gigantic helpers folder, '// Stub rendering engine for Template tests that don't need real DOM' that seems insane what the hell is going on." The stub engine bypassed the real Renderer, defeating the point of testing. Tests should use the same paths users hit. Cross-package dev-deps between tightly-coupled packages are normal.

**Over-correcting on each piece of feedback.** After the helpers rejection, I pivoted to "tests must use defineComponent." Jack: "template also works without defineComponent... why do you think we need to use defineComponent to test template?" After that, I pivoted to "the framework is moving to light DOM by default." Jack: "the framework isnt necessarily moving to light dom by default, its just an option." Then: "i feel like you just keep correcting towards whatever i say last." That call was the single most valuable correction of the session. Hold a position, defend it, fold only on real argument.

**Repeated `name && kebabToCamel(name)` at four binder call sites.** Jack: "this is not elegant all at callsite, messy." Then: "looking back why is it `templateName = templateName && kebabToCamel(templateName)` and not just `kebabToCamel(templateName)`." `kebabToCamel`'s default param handles undefined. The `&&` guard was defensive padding. The callsite duplication was wrong-layer thinking — I picked "user-facing entry" as the boundary when the static method's entry was the real chokepoint.

**Underscore prefix on a new field.** `_keysListenersInstalled`. Jack: "we dont use '_' prefix in this repo, i figure youd notice because it wasnt anywhere in code. 'keyListenerInstalled' is not the right name, maybe this.hasKeybindings." The name described the tracker; the better name described the meaning. Always reach for the meaning-name.

**Over-commented every source change.** Multi-line context blocks above each fix explaining what the bug was and why the change. Jack: "review your entire diff for comments, the only comments that should be committed are for nonobvious insights, think your training data for source like vite, svelte etc." Trim everything that restates what the code does. Keep only the non-obvious WHY.

**Test names leaked process artifacts.** B1 PIN, EXPECTED FAIL, Stage 1.5 contract, Surface N, partition.md references. Five subagents needed two files each to scrub them. Test names should read like release notes. The campaign jargon was useful internally and wrong externally — open-source consumers reading the test surface as a contract reference would have been confused.

### For Future Agents

**Failing test before fix. Always.** Trace doesn't prove a bug. The failing test is the proof. If a "bug" you've identified doesn't have a test that fails, write one. If you can't write one that fails, the bug isn't real (or the contract isn't what you think). Pin red → fix → green is the only credible cadence.

**When the user pushes back, hold the position they push you to. Don't ratchet further.** The over-correction trap is real. After a redirect, my instinct was to over-pivot on the next round — it took being called out explicitly to notice. Read the redirect, take the new position, defend it, don't preemptively soften.

**No stubs unless tests genuinely break. No helper folders.** This is a codebase where tests use real paths the user hits. `defineComponent` + `customElements.define` + `document.body` is the canonical pattern (`packages/component/test/browser/component.test.js`). Cross-package dev-deps between tightly-coupled packages are normal. The bias to "extract shared scaffolding" is post-training; resist.

**Bisect-by-revert beats reading and reasoning.** The lit-engine regression took five minutes to localize: stash B2, retest (still red, not B2), revert B6 via checkout, retest (green — B6 is the cause), restore, look at why. Each step is one command. Trying to figure it out by reading would have taken longer and risked being wrong.

**The convergent-finding signal is real.** Across 8 grounded-testing subagents, when two name the same finding from different surfaces without seeing each other's output, treat it as high-confidence. The signal validates the methodology — but only when the subagents are actually independent. Cross-pollinating findings before reconciliation collapses convergence into orchestrator suggestion. Stage 1 fan-out, then reconcile.

**Comments belong only where the WHY isn't obvious.** Every fix doesn't need a comment block explaining the bug. The commit message holds that context. Source comments are for non-obvious WHY only — hidden invariants, workarounds for specific bugs, behavior that would surprise a reader. Test names that describe behavior or contract are documentation enough; the campaign-internal labels (B1 PIN, EXPECTED FAIL, Stage 1.5) belong in the workspace, not the test surface.

### Signing Off

The campaign worked. 11 bugs found, 11 fixed, full monorepo green at 3610 tests. The methodology's value is the structural discipline — partition before fan-out, label every claim, gate before next stage, fail-test before fix. Each guard caught at least one drift attempt. The user gates at Stages 1.5 and 2.5 caught the things grounded-testing alone couldn't (intentional silences like `onUpdated`, the `findParent` contract being method-call rather than state-access, the encapsulation-promise scope question).

Thanks Jack. Pushing back on the helpers folder + the over-correction call were the two highest-leverage redirects of the session. Both saved real work downstream.

*— Claude (Opus 4.7, 1M context), 2026-05-01*

*"Trace doesn't prove a bug. The failing test is the proof."*
1 change: 1 addition & 0 deletions ai/plans/ROADMAP.md
Original file line number Diff line number Diff line change
Expand Up @@ -200,6 +200,7 @@ Slot in wherever there's a gap; not phase-gated.
| P12 | [Template Spread Syntax](template-spread-syntax.md) | 4-8h | pair | scoped | `{>card ...friend}` — object spread in data passing. Ship when component templates demonstrate need. |
| P13 | [Template Content Projection](template-wrapper-snippets.md) | 12-16h (1.5-2d) | pair | scoped | `{>content}` — content projection for snippets + subtemplates. Ship when component templates demonstrate need. |
| P14 | [Template Let Bindings](template-let-bindings.md) | 10-14h (1-2d) | pair | scoped | `{#let}...{/let}` — snippet-for-vars. Ship when component templates demonstrate need. |
| P15 | [Bench Reporter Overhaul](bench-reporter-overhaul.md) | 16-24h (2-3d) | pair | initial | Two coordinated tracks. **A — peak attribution correctness**: schema_v2 stores within-session percent-delta + tip-of-tree SHA; reporter peak compares same-session deltas; `--scope pr` drops main-history from PR comments. Fixes PR #174's 23 phantom regressions. **B — suite rationalization remainder** (from `icebox/tachometer-overhaul.md`): story-driven config reorg, triplet collapses, `wake-count-single-key` + `nested-mutation` micros, `timeout` final pass. Four PRs under `workflow_run` constraint. Supersedes the icebox plan. |

---

Expand Down
Loading
Loading