Trim tier 2+3 skills to eval-proven content (83% reduction)#1245
Conversation
Based on iteration-4 eval data (20 runs, independent grading), removed content the model already knows and kept only what it actually gets wrong without the skill: - eip-712 (230→60 lines): Keep shared utility module, as const pattern - siwe (547→100 lines): Keep viem-not-siwe-pkg, domain validation, lazy session getter, hasSeenWalletConnected ref - erc-721 (211→65 lines): Keep _safeMint reentrancy, SVG stack-too-deep, attributes array, IPFS trailing slash - erc-20 (159→65 lines): Keep ERC20Capped, SafeERC20, fee-on-transfer - defi-protocol-templates (443→40 lines): Convert to checklist (0pp delta) - solidity-security (533→35 lines): Convert to checklist (-10pp delta) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
0f53c21 to
638a2a8
Compare
|
Great analysis! When I tested the ERC-20 skill, I checked the resulting code with and without the skill, and using the skill was better. You can find the comment here scaffold-eth/create-eth-extensions#118 (comment) Now it seems like the ERC-20 skill is not needed anymore. Is this for something that changed from when I did the test or something else? |
|
@damianmarti Great catch! You were right, the ERC-20 skill was providing real value that our eval wasn't measuring. I re-ran the eval with assertions based on your specific findings (named imports,
So instead of restoring the ERC-20 skill, we're adding a unified OpenZeppelin skill inspired by OZ's official skill. It teaches:
This covers ERC-20, ERC-721, and any future OZ contract in one skill. The ERC-721 skill stays for NFT-specific pitfalls ( Eval data for iterations 6-7 pushed to PR #1241. |
|
@technophile-04 great! thanks!! |
|
@technophile-04 maybe add something for ERC20Permit? (with skill is 0/3 now) |
|
@damianmarti Good point — I shouldn't have included ERC20Permit in that table since it's 0/3 on both sides (not discriminating). The table should have only shown the assertions that actually differentiated. On whether to add ERC20Permit guidance: when we tested with a prompt that explicitly said "gasless approvals", both with-skill and without-skill got it right (5/5). So the model knows ERC20Permit — it just doesn't add it unprompted. The question is: should the OZ skill teach "always include ERC20Permit for new ERC-20 tokens"? Happy to add that if you think it's the right default. Really sorry if it feel a bit AI'E my agent just directly posted a comment, but lol keeping it since look good |
mmmmm, I don´t know really. Maybe leaving as an explicit feature to add is ok.
:-) |
Yup then will keep it as is 🙌 |
rin-st
left a comment
There was a problem hiding this comment.
Looking good to me, but see the comment regarding erc-721
The model reads "mirror import style from source" but doesn't actually check the source files. Adding a concrete example fixes this — the model needs to SEE the pattern, not be told to discover it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Thanks @rin-st and @damianmarti for testing!! |
Summary
Iteration 4 Findings
Tier 2+3 skills show dramatically smaller uplift than tier 1:
Per-skill results
as constpatternsiwenpm pkg, domain validationKey insight
The model already knows ERC-20, ERC-721, DeFi staking, and Solidity security well enough. These skills encode preference, not capability. Only eip-712 and siwe show real discriminating assertions (shared utility module,
as const, viem SIWE, domain validation).What was kept (trimmed)
as constrequirementWhat was removed
Test plan
iteration-4/ANALYSIS.mdon the base branch🤖 Generated with Claude Code