
Update MWG description for activation across developer prompts #804

Merged
micahjo7 merged 5 commits into main from update-mwg-skill-description on May 14, 2026

Conversation

@micahjo7 (Collaborator) commented May 13, 2026

Continued from #456.

We want the skill to activate for relevant web development tasks. An analysis and a set of experiments were performed; notes are in this doc.

The description also stays within the 1024-character limit that most coding agents enforce.
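A minimal sketch of checking that limit locally, assuming the description sits on a single `description:` line in the YAML frontmatter of `SKILL.md`. Multi-line YAML scalars are not handled, and the helper names and sample content are hypothetical, not part of this repo:

```python
MAX_DESCRIPTION_CHARS = 1024  # limit cited in this PR


def extract_description(skill_md: str) -> str:
    """Return the single-line `description:` value from YAML frontmatter (hypothetical helper)."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("no YAML frontmatter found")
    for line in lines[1:]:
        if line.strip() == "---":  # end of frontmatter reached
            break
        if line.startswith("description:"):
            value = line[len("description:"):].strip()
            # Strip one pair of matching surrounding quotes, if present.
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
                value = value[1:-1]
            return value
    raise ValueError("no description field in frontmatter")


def check_description(skill_md: str, limit: int = MAX_DESCRIPTION_CHARS) -> int:
    """Raise if the description exceeds `limit` characters; otherwise return its length."""
    desc = extract_description(skill_md)
    if len(desc) > limit:
        raise ValueError(f"description is {len(desc)} chars; limit is {limit}")
    return len(desc)


# Hypothetical sample file content, for illustration only.
sample = """---
name: modern-web-guidance
description: "Use for modern web development tasks."
---
# Body
"""
print(check_description(sample))  # prints 37
```

Counting characters in Python counts Unicode code points, which may differ from how a given agent measures the limit.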

This was also tested against our evals to make sure we maintain activation rates (98%+ across three coding agents: Claude, Jetski CLI, and Codex CLI): https://googlechrome.github.io/guidance/?tests=jetski-cli-mwg-desc-update%7C%7C%7Cremote%2Ccodex-cli-mwg-desc-update%7C%7C%7Cremote%2Cclaude-code-mwg-desc-update%7C%7C%7Cremote

Review comment threads on guides/modern-web-guidance/SKILL.md (3 outdated, 1 current)
@micahjo7 (Collaborator, Author) commented:

Posted another update addressing the review comments. I also realized there was originally a bug in my testing code, so some other edits were needed. The updated results are in the report:

Total Prompts Evaluated: 98 (with 2 prompts catching JSON parsing escapes)
True Positive (TP) Rate: 0 / 13 (Conversational Research Latency)
True Negative (TN) Rate: 83 / 85 (97.6% Correctly Ignored)
False Positive (FP) / Over-activation Rate: 2 / 98 (2.0% Over-activation)
False Negative (FN) / Under-activation Rate: 13 / 13 (Conversational Research Latency)
GCLI Trigger Boundary Accuracy: 84.7%

This still achieves a 97%+ activation rate in full-suite eval runs across the three agents.
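The summary figures in the report follow from four confusion-matrix counts. A minimal sketch, assuming TP = 0, FN = 13, TN = 83 and FP = 2 as reported (the class and method names are hypothetical, not part of the eval harness), reproduces them and shows one subtlety: the reported over-activation rate divides the 2 false positives by all 98 prompts, whereas the conventional false positive rate divides by the 85 negative prompts only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Hypothetical container for skill-activation eval counts."""
    tp: int  # should activate, did activate
    fn: int  # should activate, did not
    tn: int  # should not activate, did not
    fp: int  # should not activate, did

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.tn + self.fp

    def tn_rate(self) -> float:
        # Share of should-not-activate prompts correctly ignored (specificity).
        return self.tn / (self.tn + self.fp)

    def over_activation_share(self) -> float:
        # Matches the report's figure: false positives over ALL prompts.
        return self.fp / self.total

    def fp_rate(self) -> float:
        # Conventional false positive rate: false positives over negatives only.
        return self.fp / (self.fp + self.tn)

    def fn_rate(self) -> float:
        # Share of should-activate prompts that were missed.
        return self.fn / (self.fn + self.tp)

    def accuracy(self) -> float:
        # Correct decisions over all prompts.
        return (self.tp + self.tn) / self.total


report = Confusion(tp=0, fn=13, tn=83, fp=2)
print(f"total prompts:          {report.total}")                      # 98
print(f"TN rate:                {report.tn_rate():.1%}")              # 97.6%
print(f"over-activation (/all): {report.over_activation_share():.1%}")  # 2.0%
print(f"FP rate (/negatives):   {report.fp_rate():.1%}")              # 2.4%
print(f"FN rate:                {report.fn_rate():.1%}")              # 100.0%
print(f"accuracy:               {report.accuracy():.1%}")             # 84.7%
```

Reporting both denominators makes it clear which population an over-activation percentage refers to.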

@micahjo7 merged commit 4c6ebb8 into main on May 14, 2026
3 checks passed
@micahjo7 deleted the update-mwg-skill-description branch on May 14, 2026, 13:52
@paulirish added the regression-eval-needed label (Not for general guide tasks) on May 19, 2026

Labels

regression-eval-needed Not for general guide tasks


2 participants