Conversation
Capture two failure modes hit during recent reviews so future agent runs don't repeat them: (1) refactor PRs need a side-by-side compare against main's pre-refactor code, not just trust in the commit message and a green test suite; (2) PRs that add a new constant/list/map need a whole-file read and a future-change-cost check to avoid creating yet another implicit duplicate of an existing list.
📝 Walkthrough: Documentation update to AGENTS.md that introduces a new "Reviewing PRs" section, adding structured code review guidelines for refactor/extraction PRs, PRs introducing constants/lists/maps, and general hygiene practices for comprehensive verification.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~2 minutes
🧹 Nitpick comments (4)
AGENTS.md (4)
368-368: ⚡ Quick win: Shorten the “future-change cost” rule for readability.
LanguageTool flagged Line [368] as potentially wordy. The rule is good, but you can compress the sentence while keeping the core: (a) read whole file, (b) grep for duplicates, (c) future-change cost > 1–2 places implies duplication, (d) use TS types to make drift a compile error.
✏️ Proposed rewording
```diff
-- Run a **future-change cost** check: "If a new $THING is added next month, how many files need to be touched?" If the answer is more than 1–2, flag the duplication and ask whether TypeScript types can constrain the other locations to derive from the new constant (e.g. `Record<Exclude<SupportedThing, 'default'>, Loader>` makes drift a compile error).
+- Run a **future-change cost** check: "If a new $THING is added next month, how many files would need updates?" If it’s more than 1–2, treat it as duplication risk and ask whether TS types can force the other places to derive from the new constant (so drift becomes a compile error).
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@AGENTS.md` at line 368, Shorten the “future-change cost” rule sentence by replacing the long sentence with a concise checklist: read the file, grep for duplicates, if a future change touches more than 1–2 places treat it as duplication, and suggest using TypeScript types (e.g., Record<Exclude<SupportedThing, 'default'>, Loader>) to make schema drift a compile-time error; update the text at the current rule to this compressed form preserving the TS example.
373-375: ⚡ Quick win: Re-check Line [373] per LanguageTool (spelling/grammar).
Static analysis flagged Line [373] (“Ensure spelling is correct”). The current sentence is understandable, but the flag suggests a small textual issue (likely around “Copilot review” phrasing or similar). Please re-run the linter/spellcheck for this specific line and adjust wording if needed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@AGENTS.md` around lines 373 - 375, Re-run the spelling/grammar check for the sentence starting "Static lints, `tsc`, and bot reviewers (CodeRabbit, Copilot review) share the same blind spots..." in AGENTS.md (line containing "Ensure spelling is correct") and correct the phrasing to be idiomatic and unambiguous (e.g., change "Copilot review" to "Copilot Review" or "Copilot's review" and ensure punctuation around the parenthetical is consistent); update the sentence so it reads smoothly and passes LanguageTool (keep the meaning intact and run the linter again to confirm the flag is resolved).
354-356: ⚡ Quick win: Avoid hardcoding `git show` in the agent guideline (make it tool-agnostic).
Line [354] suggests using `git show main:<old-path>`. That’s precise, but it assumes a specific local git workflow and may not reflect how the agent tooling will view “main” pre-refactor code (GitHub UI, compare view, stored snapshots, etc.). Keep the intent (side-by-side main vs branch + compare key expressions) while removing the brittle command reference.
✏️ Proposed text tweak
```diff
-- Pull the pre-refactor version of the equivalent code with `git show main:<old-path>` and put it side-by-side with the new code. Compare the **key expressions** (formulas, base values, clamps, branch conditions), not just the structure.
+- Pull up the pre-refactor version of the equivalent code from `main` and put it side-by-side with the new code. Compare the **key expressions** (formulas, base values, clamps, branch conditions), not just the structure.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@AGENTS.md` around lines 354 - 356, Replace the hardcoded command string "git show main:<old-path>" with a tool-agnostic instruction that conveys the intent: obtain the pre-refactor version of the equivalent code from the primary branch or snapshot and place it side-by-side with the new code for comparison; keep the rest of the guidance about comparing key expressions (formulas, base values, clamps, branch conditions), treating "fix:" commits as claims, and checking non-default test parameters unchanged.
358-359: ⚡ Quick win: Clarify/avoid potential confusion around “Per CLAUDE.md”.
Line [358] says: “Per CLAUDE.md, dev-server browser testing is required…”. If `CLAUDE.md` is truly a repo-standard doc name, it’s fine—but as written, it reads like an external/tool dependency reference. Consider phrasing that ties to “browser testing requirements in the repo” (or explicitly links to the file by path) so readers don’t wonder what/where “CLAUDE.md” is.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@AGENTS.md` around lines 358 - 359, The phrase "Per CLAUDE.md, dev-server browser testing is required…" in AGENTS.md is ambiguous; update that sentence to explicitly tie it to the repository (e.g., "Per the repository's CLAUDE.md (see CLAUDE.md), dev-server browser testing is required…" or "Per the repo's browser-testing requirements in CLAUDE.md, dev-server browser testing is required…") or add a relative link/path to the CLAUDE.md file so readers know it’s an in-repo document; edit the exact string "Per CLAUDE.md, dev-server browser testing is required for UI changes." accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@AGENTS.md`:
- Line 368: Shorten the “future-change cost” rule sentence by replacing the long
sentence with a concise checklist: read the file, grep for duplicates, if a
future change touches more than 1–2 places treat it as duplication, and suggest
using TypeScript types (e.g., Record<Exclude<SupportedThing, 'default'>,
Loader>) to make schema drift a compile-time error; update the text at the
current rule to this compressed form preserving the TS example.
- Around line 373-375: Re-run the spelling/grammar check for the sentence
starting "Static lints, `tsc`, and bot reviewers (CodeRabbit, Copilot review)
share the same blind spots..." in AGENTS.md (line containing "Ensure spelling is
correct") and correct the phrasing to be idiomatic and unambiguous (e.g., change
"Copilot review" to "Copilot Review" or "Copilot's review" and ensure
punctuation around the parenthetical is consistent); update the sentence so it
reads smoothly and passes LanguageTool (keep the meaning intact and run the
linter again to confirm the flag is resolved).
- Around line 354-356: Replace the hardcoded command string "git show
main:<old-path>" with a tool-agnostic instruction that conveys the intent:
obtain the pre-refactor version of the equivalent code from the primary branch
or snapshot and place it side-by-side with the new code for comparison; keep the
rest of the guidance about comparing key expressions (formulas, base values,
clamps, branch conditions), treating "fix:" commits as claims, and checking
non-default test parameters unchanged.
- Around line 358-359: The phrase "Per CLAUDE.md, dev-server browser testing is
required…" in AGENTS.md is ambiguous; update that sentence to explicitly tie it
to the repository (e.g., "Per the repository's CLAUDE.md (see CLAUDE.md),
dev-server browser testing is required…" or "Per the repo's browser-testing
requirements in CLAUDE.md, dev-server browser testing is required…") or add a
relative link/path to the CLAUDE.md file so readers know it’s an in-repo
document; edit the exact string "Per CLAUDE.md, dev-server browser testing is
required for UI changes." accordingly.
We might want to promote this to a full Skill.
DrJKL left a comment:
Worth a shot. @christian-byrne wdyt?
This is also something fallow might help with.
Reading The Review Record
For context: I had already started a parallel dig through the recent FE review strata, working at this very seam, when this PR opened a cut at the same site. It made me resume that work today, because we seemed to be uncovering the same creature from different angles: review misses that leave tracks in the PR record before anyone names the pattern.
What sent me digging was a layer in my own recent PR history. Some of my PRs got review feedback that probably should have been incorporated before the formal review cycle. Some of it surfaced only when humans read the PR, even after CI or automated review had already passed over the ground without registering the trail.
What turned up in the cut
I widened the sample to the most recent 50 human-authored PRs touching production FE code, then read the human review comments as trace evidence. The recurring tracks were:
- async/state freshness bugs after load/drop/refresh/teardown flows
- tests tracking a nearby animal rather than the one being changed
- similar-looking ids, counts, or payload shapes carrying different meanings
- new constants/state/helpers building another nest beside an existing one
- extension-facing or public callback/API changes (tracks leaving the preserve) needing explicit compatibility review
- UI fixes needing screenshot/video/manual browser verification, not just green tests
- smaller scuff marks: shared mutable test state, helper placement, unrelated diff noise
Where I'd dig next
My read: this PR reads two of the tracks deeper than I have, but the trail keeps going. The evidence suggests these misses are not isolated accidents; the same species keep turning up across the record. I would keep AGENTS.md compact, but the broader pattern probably belongs in a fuller review Skill — closer to a bestiary than a checklist, with entries for refactors, constants/maps, async state, schema boundaries, extension surface, and UI/manual verification [open question: is this an antipattern in the coding-agent context?], so an agent or reviewer can look up the species they suspect they're seeing and read what it usually does.
This was only a small surface dig — five days of stratigraphy, with a Load3D seam running through it. The 50-PR sample is enough to show a minimum population of recurring review misses, not a complete map of the ecosystem. If this seems useful, we could excavate further along the same line and turn the larger pattern into a proper field guide.
Full field notes: recent FE PR review traces
Field Method
Excavation site:
- Repo: `Comfy-Org/ComfyUI_frontend`
- Specimens collected: most recent 50 human-authored PRs, by creation time, touching production FE code under `src/` or `public/`
- Discarded as non-diagnostic: bot-authored PRs, backports, docs-only PRs, test-only PRs, locale files as the only production-path signal
- Stratigraphic window: PRs created from 2026-04-25 to 2026-04-29
- Trace material read: human review bodies and inline review comments
- Bot/app review comments were collected, but not used as the primary evidence
Specimen counts:
- 50 PRs sampled
- 33 had human review feedback
- 14 had at least one human `CHANGES_REQUESTED` review
- 148 human inline comments
- 51 human review bodies
Raw collection artifacts:
- `temp/in_progress/recent_human_fe_pr_reviews_50.json`
- `temp/in_progress/recent_human_fe_pr_review_digest_50.md`
The Tracks
1. Async State That Went Stale After The First Pass
Reviewers repeatedly caught state that was correct on the first happy path but stale after replacement, failure, cancellation, teardown, or a second cycle.
Examples:
- #11711: Load3D capability refs were refreshed on initialize/reload, but not after drag/drop model replacement.
- #11724: a shared `AbortController` outlived a DOM widget; after one removal, later video previews reused an already-aborted signal.
- #11660 and #11627: Load3D adapter state could be published before load success, then remain stale after rejected/null loads.
- #11626: Preview3D camera matrices were applied on the wrong async load continuation, so overlapping loads could apply old camera state to a newer model.
- #11734: capture-time renderer/camera/material state needed explicit snapshot, restore, disposal, and tests for those cleanup paths.
Field checks:
- For every async path, ask what happens on success, failure, abort, replacement, teardown, and repeat invocation.
- Bind delayed continuations to the specific request/load/generation that produced them.
- Add tests for the second cycle, stale continuation, failed load, or cleanup path, not only the first successful path.
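A minimal sketch of the generation-binding check, with hypothetical names (`loadModel`, `fetchModel`, `currentLoadId`); the real Load3D code differs, but this is the guard pattern reviewers kept asking for:

```ts
// Sketch: bind each async load to a generation token so a stale continuation
// can never overwrite state produced by a newer load (or survive teardown).
let currentLoadId = 0;

interface ModelState {
  model: object | null;
  error: Error | null;
}

const state: ModelState = { model: null, error: null };

async function loadModel(
  url: string,
  fetchModel: (u: string) => Promise<object>,
): Promise<void> {
  const loadId = ++currentLoadId; // this call's generation

  try {
    const model = await fetchModel(url);
    if (loadId !== currentLoadId) return; // a newer load started: drop this result
    state.model = model;
    state.error = null;
  } catch (err) {
    if (loadId !== currentLoadId) return; // stale failure: don't clobber newer state
    state.model = null;
    state.error = err as Error;
  }
}

// Teardown invalidates all in-flight continuations the same way:
function teardown(): void {
  currentLoadId++;
  state.model = null;
  state.error = null;
}
```

The same token also answers the `AbortController` case: a fresh controller per load (or per widget), invalidated on replacement, rather than one shared signal that outlives its owner.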
2. Tests That Missed The Animal They Were Named After
Several PRs had tests, but review feedback showed the tests were tracking a nearby animal, not the one named on the label.
Examples:
- #11691: a new Playwright test covered one happy path while deleted unit tests left drag/drop error branches and payload validation uncovered.
- #11751: cycle tests asserted only `not.toThrow()`, so a quiet behavior drift in the exact path map shape could pass.
- #11734: the browser regression test for Load3D zoom resolution failed on the PR head; later cleanup fixes also needed direct tests so they would not regress.
- #11636: browser autofill depended on real rendered DOM attributes; reviewers asked for component tests asserting those attributes directly.
- #11651 and #11692: tests were called out for asserting brittle utility classes or screenshots instead of behavior.
Field checks:
- State the exact risky branch before looking at the test.
- Ask whether the test would fail if the reviewed bug reappeared.
- Avoid change-detector tests that only assert classes, screenshots, or defaults.
- For refactors, compare behavior against main with concrete inputs, not just structure and green tests.
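A Vitest-style sketch of the difference, with a hypothetical `resolvePath` helper standing in for the code under review:

```ts
import { describe, expect, it } from 'vitest';

// Hypothetical helper: walks a successor map, breaking on cycles.
function resolvePath(nodes: Map<string, string[]>, start: string): string[] {
  const seen = new Set<string>();
  const path: string[] = [];
  let current: string | undefined = start;
  while (current !== undefined && !seen.has(current)) {
    seen.add(current);
    path.push(current);
    current = nodes.get(current)?.[0];
  }
  return path;
}

describe('resolvePath', () => {
  it('breaks cycles and preserves the exact visited path', () => {
    const nodes = new Map([
      ['a', ['b']],
      ['b', ['a']], // cycle back to 'a'
    ]);

    // Weak: only proves termination; the path shape could silently drift.
    expect(() => resolvePath(nodes, 'a')).not.toThrow();

    // Stronger: pins the exact output, so a regression in the risky branch fails.
    expect(resolvePath(nodes, 'a')).toEqual(['a', 'b']);
  });
});
```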
3. Different Species Wearing The Same Shape
Reviewers caught cases where values looked structurally similar but meant different things.
Examples:
- #11691: a filename string was treated like an asset root/type, corrupting file combo values.
- #11691: `{}` passed a loose asset-drop schema and short-circuited the real upload fallback.
- #11699: `LinkId` and `RerouteId` were mixed in `linkExtensions.parentId`, weakening the intent of the new type aliases.
- #11628: survey `label` strings and i18n `labelKey`s were conflated, leaving rendered option text untranslatable.
- #11737: output-file count semantics drifted between two helpers, including different thresholds around `outputCount`.
Field checks:
- Name the domain meaning of each id, key, payload, and count.
- Prefer required discriminants and schemas at boundaries.
- Use branded or named types when structurally similar values represent different domains.
- Verify fallback behavior for malformed, missing, empty, and partially valid payloads.
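A branded-type sketch reusing the `LinkId`/`RerouteId` names from #11699; the brands and cast helpers here are illustrative, not the repo's actual definitions:

```ts
// Structurally both are strings, but the brand makes them incompatible types.
type LinkId = string & { readonly __brand: 'LinkId' };
type RerouteId = string & { readonly __brand: 'RerouteId' };

const asLinkId = (s: string): LinkId => s as LinkId;
const asRerouteId = (s: string): RerouteId => s as RerouteId;

interface LinkExtension {
  parentId: RerouteId; // the declared domain meaning, enforced by the type
}

const ext: LinkExtension = {
  parentId: asRerouteId('42'), // OK
};

// Compile error if uncommented: a LinkId no longer passes where a RerouteId
// is required, even though both are plain strings at runtime.
// ext.parentId = asLinkId('42');
```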
4. Another Nest Where There Was Already One
Reviewers frequently pushed back when a fix introduced a parallel mechanism instead of consolidating existing ownership.
Examples:
- #11712: adding `SUPPORTED_LOCALES` would have created another hand-maintained locale list in a file that already had several loader maps plus settings options.
- #11737: ZIP export file counting duplicated existing output-count helpers and already disagreed on threshold semantics.
- #11646: a new missing-model refresh composable reimplemented `runMissingModelPipeline`, duplicating pending-warning reconciliation, cloud verification gaps, and concurrency state.
- #11661: the follow-up moved refresh behavior back through the existing pipeline and into store-owned concurrency, which reviewers explicitly called out as the cleaner shape.
- #11720: a large temporary missing-model download-progress solution was flagged as duplicated work relative to planned fixes.
Field checks:
- Read the whole file, not just the diff hunk.
- Grep for one or two representative values or helper names before accepting a new list/map/helper.
- Ask: if a new locale/model/status/output shape is added next month, how many places need edits?
- Prefer delegating to the owner of the behavior over adding a caller-local parallel path.
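A sketch of the type-driven version of the locale case, with hypothetical locale names and import paths:

```ts
// Single source of truth for the union.
type SupportedLocale = 'en' | 'zh' | 'ja' | 'default';

type LocaleLoader = () => Promise<unknown>;

// Exclude 'default' (no loader needed); every other member is then required.
// Adding 'ko' to SupportedLocale makes this object fail to type-check until a
// ko loader is added here, so drift becomes a compile error, not a silent gap.
const localeLoaders: Record<Exclude<SupportedLocale, 'default'>, LocaleLoader> = {
  en: () => import('./locales/en.json'),
  zh: () => import('./locales/zh.json'),
  ja: () => import('./locales/ja.json'),
};

export function getLoader(locale: Exclude<SupportedLocale, 'default'>): LocaleLoader {
  return localeLoaders[locale];
}
```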
5. Tracks Leaving The Preserve
Some risks were not internal correctness bugs, but compatibility or surface-area changes.
Examples:
- #11691: removing `claimEvent` from `node.onDragDrop` was flagged as a public callback signature change that custom nodes might rely on.
- #11698: adding `incrementVersion()` to `LGraph` required checking the ECS migration plan and ADR staging, not just whether the code was simple.
- #11717: the `downloadAsset`/`downloadMultipleAssets` to `downloadAssets` rename prompted discussion about whether to export an explicit return-type alias or provider type.
- #11646: adding a `silent` option to a public refresh API was questioned because it coupled a public extension API to one caller's toast preference.
Field checks:
- Identify whether the changed function/type is extension-facing.
- Code-search call sites, including likely custom-node/extension surfaces when relevant.
- Prefer internal primitives over caller-specific flags on public APIs.
- If a breaking or staged API change is intentional, include migration rationale in the PR description.
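A sketch of the third check, with hypothetical names loosely modeled on the #11646 `silent`-flag discussion:

```ts
// Public, extension-facing API keeps one stable behavior (no `silent` flag).
export async function refreshMissingModels(): Promise<void> {
  await refreshCore();
  showRefreshToast();
}

// Internal primitive without the side effect; shared logic lives here.
async function refreshCore(): Promise<void> {
  // ...fetch, reconcile pending warnings, update store state...
}

function showRefreshToast(): void {
  // ...toast notification...
}

// The one caller that wants silence composes the primitive directly,
// instead of threading its toast preference through the public API:
export async function refreshQuietlyForStartup(): Promise<void> {
  await refreshCore();
}
```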
6. UI Changes That Needed Eyes On The Ground
For user-visible behavior, reviewers often wanted screenshots, video, or manual browser validation in addition to tests.
Examples:
- #11713: Desktop modal/search clipping and drag-region behavior needed video or screenshots, and the drag-region fix had a real pointer-events bug.
- #11668: even a test-focused link-fixer PR got a request for a behavior video.
- #11734: the Load3D zoom-resolution bug depended on the actual browser canvas backing size, and the PR's own browser test caught the issue.
- #11687: responsive node-search behavior was approved partly because screenshots and E2E coverage matched the intended UX.
- #11628: survey flow/payload/required-field behavior needed component coverage because the UI is data-driven.
Field checks:
- For UI/interaction PRs, require at least one human-readable validation path: screenshot, video, Storybook, browser test, or explicit manual test notes.
- Include adversarial cases, not just a happy path: narrow viewport, repeated open/close, replacement while loading, optional fields, malformed input.
- Do not treat unit tests as proof of browser-only behavior like hit testing, canvas backing dimensions, password-manager attributes, or Desktop drag regions.
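A minimal Playwright sketch of a canvas-backing check in that spirit (the route and selector are assumptions, not the repo's actual test setup):

```ts
import { expect, test } from '@playwright/test';

test('canvas backing size tracks devicePixelRatio', async ({ page }) => {
  await page.goto('/'); // assumed dev-server route

  const canvas = page.locator('canvas').first();
  await expect(canvas).toBeVisible();

  const { backingWidth, cssWidth, dpr } = await canvas.evaluate((el) => {
    const c = el as HTMLCanvasElement;
    return {
      backingWidth: c.width, // backing-store pixels
      cssWidth: c.getBoundingClientRect().width, // CSS pixels
      dpr: window.devicePixelRatio,
    };
  });

  // A backing store sized in CSS pixels on a HiDPI display renders blurry and
  // breaks zoom-resolution math; only a real browser can observe this.
  expect(backingWidth).toBeCloseTo(cssWidth * dpr, 0);
});
```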
7. Trace Marks Without A Body
There were also recurring smaller issues: test helper placement, mutable describe-scoped state, unrelated import diffs, brittle helper types, and formatting/style nits. These matter because they add review surface and create future flakes, but they were less central than the behavior and architecture misses above.
Examples:
- #11751 and #11717: shared mutable test state and manual reset patterns were flagged as future flake risks.
- #11724: test helper placement and mutable fixture setup drew cleanup comments.
- #11691: unrelated import reordering was called out as diff noise.
- #11651: utility-class assertions were removed because they tested styling mechanics instead of behavior.
Field checks:
- Keep diffs scoped to the behavior under review.
- Use per-test setup factories instead of shared mutable fixtures where possible.
- Prefer named helper types over nested `ReturnType<ReturnType<...>>` puzzles.
- Treat style and formatting as lower priority than correctness, but do not let them obscure the review.
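A Vitest sketch contrasting the two fixture patterns (the store shape is hypothetical):

```ts
import { beforeEach, describe, expect, it } from 'vitest';

interface FixtureStore {
  items: string[];
}

describe('shared mutable fixture (flake-prone)', () => {
  const store: FixtureStore = { items: [] }; // mutated across tests

  it('passes only if nothing pushed before it ran', () => {
    store.items.push('a');
    expect(store.items).toHaveLength(1); // order-dependent
  });
});

describe('per-test factory (isolated)', () => {
  const makeStore = (): FixtureStore => ({ items: [] });
  let store: FixtureStore;

  beforeEach(() => {
    store = makeStore(); // fresh state every test, no manual reset list
  });

  it('holds regardless of test order', () => {
    store.items.push('a');
    expect(store.items).toHaveLength(1);
  });
});
```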
How This Relates To #11770
#11770 reads two of the tracks deeper than I have, but the trail keeps going. It is directionally right and procedurally sharper than my notes on those two patterns; the gap is breadth, not depth.
It already targets two important blind spots:
- Refactor reviews need explicit behavior-equivalence checks against main.
- New constants/lists/maps need whole-file and cross-file duplicate-source checks.
The wider sample suggests the same guidance could be expanded slightly, or used as the seed for a fuller review Skill/checklist. The extra sections I would consider are:
- async state freshness and cleanup paths
- schema/data boundary checks
- extension/public API compatibility
- test coverage of risky branches, not only happy paths
- UI/browser/manual verification expectations
I would keep AGENTS.md compact. The fuller version probably belongs in a review Skill — closer to a bestiary than a checklist, where an agent or reviewer can look up the species they suspect they're seeing and read what it usually does. The checks are procedural and context-dependent, which is exactly what a field guide is for.
Compact Text That Could Fit In AGENTS.md
If this is folded into #11770, I would avoid a long taxonomy and add something closer to this:
#### Common review misses to check before approval
- For async or lifecycle code, trace success, failure, abort, replacement, teardown, and second-run paths. Stale refs, reused abort signals, and delayed continuations are common sources of regressions.
- For tests, identify the risky branch first, then verify the test would fail if that branch regressed. Avoid tests that only assert defaults, classes, screenshots, or helper mocks.
- For data boundaries, check that similar-looking ids, keys, counts, and payloads are not being conflated. Prefer required schemas, discriminants, and named or branded types where drift would be costly.
- For new constants, maps, helpers, or stores, grep for existing sources of truth and run a future-change-cost check before accepting another list or parallel mechanism.
- For extension-facing callbacks/types or core graph APIs, explicitly check compatibility and staging. Public API changes need migration rationale, not just green tests.
- For UI and interaction changes, ask for browser/manual validation with adversarial cases. Unit tests alone do not prove hit testing, canvas sizing, Desktop drag regions, password-manager attributes, or visual clipping.
Agreed, promoting this to a full Skill makes sense. One thing I want to emphasize about the principle behind it: this review checklist should be sourced from human developer experience, specifically real cases where Claude or other bot reviewers approved a PR and a human reviewer subsequently caught a bug the AI missed. The reason: AI-generated review heuristics tend to be plausible-sounding but unmoored from how bugs actually slip through here. If we let Claude "imagine" what reviewers should check, the Skill becomes a self-referential loop. If we anchor it to documented misses, the Skill encodes our team's actual blind spots and gets sharper over time. Concretely, the two pitfalls already in this PR fit that pattern: pitfall 1 came from a real refactor PR where the AI missed a saturation regression that surfaced only on manual testing; pitfall 2 came from an i18n PR where the AI didn't flag a 5th …
Absolutely agree. If you expand the "Full field notes" you'll find my agent's attempt to extract this. I would be curious whether you think it did a good job: if it did, this is evidence that this process can be automated and we should expand the dig; if not, then perhaps we need an …