Skip to content

Commit 0e1b9cd

Browse files
christsoclaude
andauthored
skill-quality: detect and fix SKILL.md overspecification (#1266)
* skill-quality: add overspecification anti-pattern + coverage-contract guidance Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test(studio): fix tests broken by default-to-projects-dashboard change Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(lint): resolve pre-existing biome format/type errors Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * skill-quality: remove verbose observed-case example from checklist Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 062164f commit 0e1b9cd

8 files changed

Lines changed: 36 additions & 25 deletions

File tree

apps/cli/src/commands/results/remote.ts

Lines changed: 10 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,7 @@ import path from 'node:path';
44
import {
55
DEFAULT_THRESHOLD,
66
type EvaluationResult,
7+
type GitListedRun,
78
type ResultsConfig,
89
type ResultsRepoStatus,
910
directPushResults,
@@ -24,11 +25,10 @@ import {
2425
listResultFilesFromRunsDir,
2526
} from '../inspect/utils.js';
2627

27-
2828
// ── In-memory TTL cache for listGitRuns ────────────────────────────
2929
// Avoids repeated expensive git ls-tree + git cat-file --batch operations
3030
// on every API request. Cache key is repoDir, TTL is 60 seconds.
31-
const gitRunsCache = new Map<string, { data: any; expiresAt: number }>();
31+
const gitRunsCache = new Map<string, { data: Promise<GitListedRun[]>; expiresAt: number }>();
3232
const GIT_RUNS_CACHE_TTL_MS = 60_000;
3333

3434
function cachedListGitRuns(repoDir: string) {
@@ -40,12 +40,14 @@ function cachedListGitRuns(repoDir: string) {
4040
const promise = listGitRuns(repoDir);
4141
gitRunsCache.set(repoDir, { data: promise, expiresAt: now + GIT_RUNS_CACHE_TTL_MS });
4242
// Evict stale entry once the promise settles so a fresh fetch replaces it
43-
promise.catch(() => {}).finally(() => {
44-
const entry = gitRunsCache.get(repoDir);
45-
if (entry && entry.expiresAt <= Date.now()) {
46-
gitRunsCache.delete(repoDir);
47-
}
48-
});
43+
promise
44+
.catch(() => {})
45+
.finally(() => {
46+
const entry = gitRunsCache.get(repoDir);
47+
if (entry && entry.expiresAt <= Date.now()) {
48+
gitRunsCache.delete(repoDir);
49+
}
50+
});
4951
return promise;
5052
}
5153

apps/cli/src/commands/results/serve.ts

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1633,7 +1633,9 @@ export const resultsServeCommand = command({
16331633
// Clone or pull any project entries that declare a source.
16341634
// Non-blocking: fire-and-forget so startup is instant even when some
16351635
// project paths are missing or slow (e.g. /tmp paths that timeout).
1636-
syncProjects(registry.projects).catch((err) => console.error("Background project sync failed:", err));
1636+
syncProjects(registry.projects).catch((err) =>
1637+
console.error('Background project sync failed:', err),
1638+
);
16371639

16381640
try {
16391641
let results: EvaluationResult[] = [];

apps/cli/test/commands/results/serve.test.ts

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -183,9 +183,9 @@ describe('loadResults', () => {
183183
// ── resolveDashboardMode ───────────────────────────────────────────────
184184

185185
describe('resolveDashboardMode', () => {
186-
it('defaults to single-project mode when no projects are registered', () => {
186+
it('defaults to project dashboard mode when no projects are registered', () => {
187187
expect(resolveDashboardMode(0, {})).toEqual({
188-
projectDashboard: false,
188+
projectDashboard: true,
189189
});
190190
});
191191

apps/cli/test/unit/studio-navigation.test.ts

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -13,8 +13,8 @@ import {
1313
} from '../../../studio/src/lib/navigation.ts';
1414

1515
describe('studio navigation helpers', () => {
16-
it('redirects the root entrypoint to the only registered project', () => {
17-
expect(resolveIndexRoute(['demo-project'], undefined, 'analytics')).toEqual({
16+
it('redirects when the preferred project id matches a registered project', () => {
17+
expect(resolveIndexRoute(['demo-project'], undefined, 'demo-project', 'analytics')).toEqual({
1818
kind: 'redirect',
1919
redirectPath: '/projects/demo-project?tab=analytics',
2020
});

apps/studio/src/components/EvalDetail.tsx

Lines changed: 2 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -287,11 +287,7 @@ function FilesTab({
287287
return (
288288
<div className="relative flex h-full min-h-[400px] gap-4">
289289
{/* FileTree panel — desktop: side-by-side, mobile: full-width slide-over */}
290-
<div
291-
className={`${
292-
mobileShowTree ? 'block' : 'hidden'
293-
} md:block w-full md:w-auto`}
294-
>
290+
<div className={`${mobileShowTree ? 'block' : 'hidden'} md:block w-full md:w-auto`}>
295291
<FileTree
296292
files={files}
297293
selectedPath={effectivePath}
@@ -304,11 +300,7 @@ function FilesTab({
304300
</div>
305301

306302
{/* MonacoViewer panel — desktop: side-by-side, mobile: full-width */}
307-
<div
308-
className={`${
309-
!mobileShowTree ? 'block' : 'hidden'
310-
} md:block flex-1 h-full`}
311-
>
303+
<div className={`${!mobileShowTree ? 'block' : 'hidden'} md:block flex-1 h-full`}>
312304
<MonacoViewer value={displayValue} language={displayLanguage} height="100%" />
313305
</div>
314306

apps/studio/src/components/StopRunButton.tsx

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -59,7 +59,10 @@ export function StopRunButton({ runId, status, isReadOnly, projectId }: StopRunB
5959
'Stopping…'
6060
) : (
6161
<>
62-
<span aria-hidden="true" className="inline-block h-2.5 w-2.5 rounded-[1px] bg-current" />
62+
<span
63+
aria-hidden="true"
64+
className="inline-block h-2.5 w-2.5 rounded-[1px] bg-current"
65+
/>
6366
Stop
6467
</>
6568
)}

apps/studio/src/routes/index.tsx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -28,10 +28,10 @@ import {
2828
useStudioConfig,
2929
} from '~/lib/api';
3030
import {
31+
type StudioTabId,
3132
initialProjectRedirectStorageKey,
3233
resolveIndexRoute,
3334
resolveInitialProjectRedirect,
34-
type StudioTabId,
3535
} from '~/lib/navigation';
3636
import type { RunMeta } from '~/lib/types';
3737
type TabId = StudioTabId;

plugins/agentic-engineering/skills/agent-plugin-review/references/skill-quality-checklist.md

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -56,6 +56,14 @@ description: Use when tests have race conditions, timing dependencies, or pass/f
5656
- [ ] Move heavy reference (100+ lines) to separate files
5757
- [ ] Use cross-references instead of repeating content from other skills
5858
- [ ] Compress examples — one excellent example beats many mediocre ones
59+
- [ ] When SKILL.md exceeds ~500 words for a standard skill, the heaviest section is almost always inlined reference material — extract it
60+
61+
### Coverage Contracts vs. Rule Restatement
62+
63+
When a skill author wants to enforce that the agent doesn't skip rules, the temptation is to inline each rule with its full rationale. Don't.
64+
65+
- **Coverage contract pattern:** Keep one-line checklist items in SKILL.md naming each rule and citing the reference file (e.g., `"Lifecycle choice — apply large-table rule in references/schema-rules.md"`). Add one sentence: "Silence on any item is itself a review gap." Close the silent-skip loophole with: "If a reference file is unavailable, say so explicitly rather than skipping it."
66+
- **Anti-pattern:** Multi-paragraph items that restate rules and rationale already in `references/`. The fix is structural — the prose is in the wrong file, not the wrong shape. Move operational procedures (how to locate files, `find` syntax, what to record) and output-format meta (citation discipline worked examples) into `references/`. Mark that file as always-load.
5967

6068
### Structure
6169

@@ -115,6 +123,8 @@ Match specificity to the task's fragility:
115123
| Version printing instructions | Fragile, rely on git history |
116124
| Hardcoded local paths | Machine-specific, not portable |
117125
| Description summarizes workflow | the agent follows description, skips SKILL.md body |
126+
| SKILL.md inlines rule prose that also lives in `references/` | Two sources of truth — the inline copy drifts from the canonical reference; agent applies the SKILL.md version and ignores the more detailed reference |
127+
| SKILL.md embeds operational procedures or worked-example pairs | Procedures (how to locate files, `find` syntax, what to record) and output-format meta (citation discipline examples) belong in `references/` per progressive disclosure |
118128

119129
## Discipline-Enforcing Skills (Additional Checks)
120130

@@ -125,3 +135,5 @@ For skills that enforce rules (TDD, verification, coding standards):
125135
- [ ] Red flags list for self-checking
126136
- [ ] "Spirit vs letter" addressed: "Violating the letter IS violating the spirit"
127137
- [ ] Hard gates at critical decision points
138+
- [ ] Discipline patterns (output-format meta, citation examples, verification procedures) live in `references/` — SKILL.md names them in one line and cites the file
139+
- [ ] Discipline reference file is marked as always-load so the agent cannot bypass it (don't inline to guarantee coverage — mark as unmissable instead)

0 commit comments

Comments
 (0)