scripts/loop-data.mjs: 22 additions & 18 deletions
@@ -407,37 +407,41 @@ export const loops = [
  slug: "full-product-evaluation-loop",
  title: "The full product evaluation loop",
  summary:
- "Tests every major product capability and fixes outcomes below the quality bar.",
- seoTitle: "Full Product Evaluation Loop for AI Systems | Loop Library",
+ "Recreates production locally, tests every product surface, and fixes all verified bugs holistically.",
+ seoTitle: "Production-Grade Full Product Evaluation Loop | Loop Library",
  description:
  "A comprehensive product-quality workflow that evaluates realistic scenarios across every major capability, fixes weak outcomes, and reruns them to the defined bar.",
  categoryLabel: "AI product evaluation workflow",
  author: "Matthew Berman",
  published: "2026-06-16",
- modified: "2026-06-17",
+ modified: "2026-06-21",
  prompt:
- "Create [N] realistic scenarios covering every major capability. Before testing, define clear success criteria and choose a consistent evaluation method, such as pass/fail checks or a scoring rubric. Run every scenario under the same conditions and record evidence for each outcome. Fix the underlying cause of anything that does not meet the criteria, rerun the affected scenarios, and then rerun the complete set. Continue until every scenario meets the original quality bar.",
- verifyTitle: "Every one of the [N] scenarios meets the defined quality bar.",
+ "Build sanitized, production-scale local data under production-like settings. Inventory every user-facing feature, role, route, button, input, modal, state, and workflow; define documented acceptance criteria and finite risk-based edge cases for each. Test as a real user, logging every bug with reproduction evidence. Review findings for shared causes and dependencies; implement coherent fixes with regression tests, then rerun the full inventory. Stop at a clean pass or blocked handoff. Ask before production, sensitive data, or destructive actions.",
+ verifyTitle: "Every inventoried product surface meets its documented acceptance criteria.",
  verifyDetail:
- "The final evaluated run covers every major capability under the original conditions.",
+ "The final full regression run covers every inventoried surface and its finite risk-based edge cases in the production-like local environment, with each reproducible bug fixed and backed by evidence.",
  useWhen:
- "Use this for an end-to-end product evaluation when quality must be measured across the full feature set rather than a narrow regression or a few hand-picked examples.",
+ "Use this for an exhaustive, end-to-end application QA pass when a production-like local environment and complete interactive-surface coverage matter more than a narrow regression or sample of major features.",
  steps: [
- "List every major capability, define the success criteria and evaluation method, choose [N], and allocate realistic scenarios across the product surface.",
- "Run the full set under consistent conditions and evaluate every outcome with evidence.",
- "Document each scenario that misses the criteria, fix the underlying issue, and add focused regression coverage where appropriate.",
- "Rerun affected scenarios and then the complete set until every outcome meets the original quality bar.",
+ "Build a sanitized or synthetic production-scale local dataset, mirror safe production settings, and record unavoidable differences.",
+ "Inventory every user-facing feature, role, route, control, state, and workflow; define documented acceptance criteria and a finite risk-based edge-case set for each item.",
+ "Exercise every inventory item as a real user under its normal and defined edge-case conditions, logging each bug immediately with reproducible evidence.",
+ "Review the complete bug set for shared causes, dependencies, and conflicting fixes, then implement the smallest coherent solution with regression coverage.",
+ "Rerun affected paths and the complete inventory; stop only at a clean full pass or an explicit blocked handoff.",
  ],
  why:
- "A fixed capability map and consistent evaluation method make product quality visible across the whole system. Requiring a final complete run catches fixes that improve one scenario while weakening another.",
+ "A finite surface inventory prevents major controls and states from disappearing behind a few happy-path scenarios. Reviewing all findings before fixing them exposes shared causes and interactions, while the final full run catches changes that repair one path but weaken another.",
  note:
- "Keep the scenario set representative and preserve failed examples. Aggregate results can hide severe misses, so require every scenario to clear the bar.",
+ "Do not copy secrets or sensitive production data into the local environment, touch production without approval, or count an untested or blocked surface as passing. Preserve the inventory, bug log, environment differences, and final evidence for review.",
site/catalog.json: 21 additions & 17 deletions
@@ -478,28 +478,32 @@
  },
  "author": "Matthew Berman",
  "published": "2026-06-16",
- "modified": "2026-06-17",
+ "modified": "2026-06-21",
  "description": "A comprehensive product-quality workflow that evaluates realistic scenarios across every major capability, fixes weak outcomes, and reruns them to the defined bar.",
- "useWhen": "Use this for an end-to-end product evaluation when quality must be measured across the full feature set rather than a narrow regression or a few hand-picked examples.",
- "prompt": "Create [N] realistic scenarios covering every major capability. Before testing, define clear success criteria and choose a consistent evaluation method, such as pass/fail checks or a scoring rubric. Run every scenario under the same conditions and record evidence for each outcome. Fix the underlying cause of anything that does not meet the criteria, rerun the affected scenarios, and then rerun the complete set. Continue until every scenario meets the original quality bar.",
+ "useWhen": "Use this for an exhaustive, end-to-end application QA pass when a production-like local environment and complete interactive-surface coverage matter more than a narrow regression or sample of major features.",
+ "prompt": "Build sanitized, production-scale local data under production-like settings. Inventory every user-facing feature, role, route, button, input, modal, state, and workflow; define documented acceptance criteria and finite risk-based edge cases for each. Test as a real user, logging every bug with reproduction evidence. Review findings for shared causes and dependencies; implement coherent fixes with regression tests, then rerun the full inventory. Stop at a clean pass or blocked handoff. Ask before production, sensitive data, or destructive actions.",
  "verification": {
- "title": "Every one of the [N] scenarios meets the defined quality bar.",
- "detail": "The final evaluated run covers every major capability under the original conditions."
+ "title": "Every inventoried product surface meets its documented acceptance criteria.",
+ "detail": "The final full regression run covers every inventoried surface and its finite risk-based edge cases in the production-like local environment, with each reproducible bug fixed and backed by evidence."
  },
  "steps": [
- "List every major capability, define the success criteria and evaluation method, choose [N], and allocate realistic scenarios across the product surface.",
- "Run the full set under consistent conditions and evaluate every outcome with evidence.",
- "Document each scenario that misses the criteria, fix the underlying issue, and add focused regression coverage where appropriate.",
- "Rerun affected scenarios and then the complete set until every outcome meets the original quality bar."
- ],
- "why": "A fixed capability map and consistent evaluation method make product quality visible across the whole system. Requiring a final complete run catches fixes that improve one scenario while weakening another.",
- "implementationNote": "Keep the scenario set representative and preserve failed examples. Aggregate results can hide severe misses, so require every scenario to clear the bar.",
+ "Build a sanitized or synthetic production-scale local dataset, mirror safe production settings, and record unavoidable differences.",
+ "Inventory every user-facing feature, role, route, control, state, and workflow; define documented acceptance criteria and a finite risk-based edge-case set for each item.",
+ "Exercise every inventory item as a real user under its normal and defined edge-case conditions, logging each bug immediately with reproducible evidence.",
+ "Review the complete bug set for shared causes, dependencies, and conflicting fixes, then implement the smallest coherent solution with regression coverage.",
+ "Rerun affected paths and the complete inventory; stop only at a clean full pass or an explicit blocked handoff."
+ ],
+ "why": "A finite surface inventory prevents major controls and states from disappearing behind a few happy-path scenarios. Reviewing all findings before fixing them exposes shared causes and interactions, while the final full run catches changes that repair one path but weaken another.",
+ "implementationNote": "Do not copy secrets or sensitive production data into the local environment, touch production without approval, or count an untested or blocked surface as passing. Preserve the inventory, bug log, environment differences, and final evidence for review.",
site/catalog.md: 4 additions & 4 deletions
@@ -94,10 +94,10 @@ URL above.
  ## 010 — [The full product evaluation loop](https://signals.forwardfuture.ai/loop-library/loops/full-product-evaluation-loop/)

  - Category: Evaluation
- - Use when: Use this for an end-to-end product evaluation when quality must be measured across the full feature set rather than a narrow regression or a few hand-picked examples.
- - Prompt: Create [N] realistic scenarios covering every major capability. Before testing, define clear success criteria and choose a consistent evaluation method, such as pass/fail checks or a scoring rubric. Run every scenario under the same conditions and record evidence for each outcome. Fix the underlying cause of anything that does not meet the criteria, rerun the affected scenarios, and then rerun the complete set. Continue until every scenario meets the original quality bar.
- - Verify: Every one of the [N] scenarios meets the defined quality bar. The final evaluated run covers every major capability under the original conditions.
- - Keywords: AI product evaluation, full product testing, response scoring, quality benchmark, feature coverage
+ - Use when: Use this for an exhaustive, end-to-end application QA pass when a production-like local environment and complete interactive-surface coverage matter more than a narrow regression or sample of major features.
+ - Prompt: Build sanitized, production-scale local data under production-like settings. Inventory every user-facing feature, role, route, button, input, modal, state, and workflow; define documented acceptance criteria and finite risk-based edge cases for each. Test as a real user, logging every bug with reproduction evidence. Review findings for shared causes and dependencies; implement coherent fixes with regression tests, then rerun the full inventory. Stop at a clean pass or blocked handoff. Ask before production, sensitive data, or destructive actions.
+ - Verify: Every inventoried product surface meets its documented acceptance criteria. The final full regression run covers every inventoried surface and its finite risk-based edge cases in the production-like local environment, with each reproducible bug fixed and backed by evidence.
+ - Keywords: production-grade QA, production-like local testing, exhaustive product testing, real user testing, UI control coverage, edge case testing, bug documentation, full regression testing
  - Related: [The quality streak loop](https://signals.forwardfuture.ai/loop-library/loops/quality-streak-loop/), [The production data cleanup loop](https://signals.forwardfuture.ai/loop-library/loops/production-data-cleanup-loop/)
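The new prompt's stop rule ("Stop at a clean pass or blocked handoff", with untested or blocked surfaces never counted as passing) could be checked mechanically. The following is only a rough sketch of that rule: the inventory item shape, the status names, and the `evaluateRun` helper are all hypothetical and do not appear in this change.

```javascript
// Sketch only: the item shape and the status names are assumptions,
// not fields defined in scripts/loop-data.mjs.
const KNOWN_STATUSES = ["pass", "fail", "blocked"];

function evaluateRun(inventory) {
  const idsWhere = (predicate) =>
    inventory.filter(predicate).map((item) => item.id);

  const failing = idsWhere((item) => item.status === "fail");
  const blocked = idsWhere((item) => item.status === "blocked");
  // Anything without a known status counts as untested, never as passing.
  const untested = idsWhere((item) => !KNOWN_STATUSES.includes(item.status));

  let outcome;
  if (inventory.length === 0 || failing.length > 0 || untested.length > 0) {
    outcome = "continue"; // keep fixing and rerunning the inventory
  } else if (blocked.length > 0) {
    outcome = "blocked-handoff"; // nothing left to fix, but not a clean pass
  } else {
    outcome = "clean-pass"; // every inventoried surface passed
  }
  return { outcome, failing, blocked, untested };
}

const result = evaluateRun([
  { id: "login-form", status: "pass" },
  { id: "export-modal", status: "blocked" },
]);
console.log(result.outcome); // blocked-handoff
```

Treating an empty inventory as "continue" is a design choice in this sketch, so that a run with no inventoried surfaces cannot be reported as a clean pass.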
site/catalog.txt: 4 additions & 4 deletions
@@ -94,10 +94,10 @@ URL above.
  ## 010 — [The full product evaluation loop](https://signals.forwardfuture.ai/loop-library/loops/full-product-evaluation-loop/)

  - Category: Evaluation
- - Use when: Use this for an end-to-end product evaluation when quality must be measured across the full feature set rather than a narrow regression or a few hand-picked examples.
- - Prompt: Create [N] realistic scenarios covering every major capability. Before testing, define clear success criteria and choose a consistent evaluation method, such as pass/fail checks or a scoring rubric. Run every scenario under the same conditions and record evidence for each outcome. Fix the underlying cause of anything that does not meet the criteria, rerun the affected scenarios, and then rerun the complete set. Continue until every scenario meets the original quality bar.
- - Verify: Every one of the [N] scenarios meets the defined quality bar. The final evaluated run covers every major capability under the original conditions.
- - Keywords: AI product evaluation, full product testing, response scoring, quality benchmark, feature coverage
+ - Use when: Use this for an exhaustive, end-to-end application QA pass when a production-like local environment and complete interactive-surface coverage matter more than a narrow regression or sample of major features.
+ - Prompt: Build sanitized, production-scale local data under production-like settings. Inventory every user-facing feature, role, route, button, input, modal, state, and workflow; define documented acceptance criteria and finite risk-based edge cases for each. Test as a real user, logging every bug with reproduction evidence. Review findings for shared causes and dependencies; implement coherent fixes with regression tests, then rerun the full inventory. Stop at a clean pass or blocked handoff. Ask before production, sensitive data, or destructive actions.
+ - Verify: Every inventoried product surface meets its documented acceptance criteria. The final full regression run covers every inventoried surface and its finite risk-based edge cases in the production-like local environment, with each reproducible bug fixed and backed by evidence.
+ - Keywords: production-grade QA, production-like local testing, exhaustive product testing, real user testing, UI control coverage, edge case testing, bug documentation, full regression testing
  - Related: [The quality streak loop](https://signals.forwardfuture.ai/loop-library/loops/quality-streak-loop/), [The production data cleanup loop](https://signals.forwardfuture.ai/loop-library/loops/production-data-cleanup-loop/)
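The prompt's "Review findings for shared causes and dependencies" step is described only in prose. One way to picture it, as a minimal sketch, is to group the bug log by a suspected cause before fixing anything. The `suspectedCause` field and the `groupBySharedCause` helper are hypothetical names introduced here for illustration.

```javascript
// Sketch only: bug entries and the suspectedCause field are assumptions.
function groupBySharedCause(bugs) {
  const groups = new Map();
  for (const bug of bugs) {
    // Bugs with no recorded cause are kept together for later triage.
    const cause = bug.suspectedCause ?? "unknown";
    if (!groups.has(cause)) groups.set(cause, []);
    groups.get(cause).push(bug.id);
  }
  // Largest groups first: one fix there may resolve several findings.
  return [...groups.entries()]
    .map(([cause, bugIds]) => ({ cause, bugIds }))
    .sort((a, b) => b.bugIds.length - a.bugIds.length);
}

const grouped = groupBySharedCause([
  { id: "BUG-1", suspectedCause: "stale-session" },
  { id: "BUG-2", suspectedCause: "date-parsing" },
  { id: "BUG-3", suspectedCause: "stale-session" },
]);
console.log(grouped[0]); // { cause: 'stale-session', bugIds: [ 'BUG-1', 'BUG-3' ] }
```

Grouping before fixing is what lets a single coherent change replace several narrow patches that might otherwise conflict.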
site/index.html: 12 additions & 11 deletions
@@ -1025,7 +1025,7 @@ <h3>
  data-category="evaluation"
  data-published="2026-06-16"
  data-featured="true"
- data-search="full product evaluation realistic tests test cases scenarios major features capabilities score responses results outcomes success criteria pass fail scoring rubric evidence quality bar rerun matthew berman"
+ data-search="production grade product qa full app testing production like local dataset real user every feature role route button input modal state workflow edge case bug documentation regression fix matthew berman"
  >
  <td class="cell-loop">
  <div class="loop-meta">
@@ -1038,17 +1038,18 @@ <h3>
  The full product evaluation loop
  </a>
  </h3>
- <p class="loop-summary">Tests every major product capability and fixes outcomes below the quality bar.</p>
+ <p class="loop-summary">Recreates production locally, tests every product surface, and fixes all verified bugs holistically.</p>
  <p data-prompt>
- Create [N] realistic scenarios covering every major
- capability. Before testing, define clear success criteria
- and choose a consistent evaluation method, such as pass/fail
- checks or a scoring rubric. Run every scenario under the
- same conditions and record evidence for each outcome. Fix
- the underlying cause of anything that does not meet the
- criteria, rerun the affected scenarios, and then rerun the
- complete set. Continue until every scenario meets the
- original quality bar.
+ Build sanitized, production-scale local data under
+ production-like settings. Inventory every user-facing
+ feature, role, route, button, input, modal, state, and
+ workflow; define documented acceptance criteria and finite
+ risk-based edge cases for each. Test as a real user,
+ logging every bug with reproduction evidence. Review
+ findings for shared causes and dependencies; implement
+ coherent fixes with regression tests, then rerun the full
+ inventory. Stop at a clean pass or blocked handoff. Ask
+ before production, sensitive data, or destructive actions.
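The prompt's first instruction ("Build sanitized, production-scale local data") leaves the sanitization itself unspecified. A minimal sketch of one possible reading follows: keep the record count and the non-sensitive fields so the dataset stays production-scale, and overwrite sensitive fields with synthetic placeholders. The field list and helper names are hypothetical, and real sanitization would also need to cover free-text fields, identifiers, and secrets.

```javascript
// Sketch only: which fields count as sensitive is an assumption here.
const SENSITIVE_FIELDS = ["email", "name", "phone"];

function sanitizeRecord(record, index) {
  const clean = { ...record }; // copy so the source record is not mutated
  for (const field of SENSITIVE_FIELDS) {
    if (field in clean) {
      // Deterministic placeholder keeps records distinguishable in bug reports.
      clean[field] = `synthetic-${field}-${index}`;
    }
  }
  return clean;
}

function sanitizeDataset(records) {
  // Same number of records out as in, so scale-dependent behavior is preserved.
  return records.map((record, index) => sanitizeRecord(record, index));
}

const local = sanitizeDataset([
  { id: 1, email: "real.person@example.com", plan: "pro" },
]);
console.log(local[0]); // { id: 1, email: 'synthetic-email-0', plan: 'pro' }
```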