Commit e1a3a05

Add CLI support for task prompt synchronization
1 parent 83c7820 commit e1a3a05

10 files changed

Lines changed: 124 additions & 23 deletions

bin/gd.ts

Lines changed: 13 additions & 1 deletion
@@ -41,7 +41,7 @@ function listGuideDirs(): string[] {
 const completion = omelette('gd <command> <arg1> <arg2>');

 completion.on('command', ({ reply }) => {
-  reply(['dev', 'dev-all', 'grade', 'test', 'gen', 'audit', 'eval', 'run', 'dashboard', 'deploy', 'upload', 'baselinestatus', 'setup-completion', 'gen-negative-suite']);
+  reply(['dev', 'dev-all', 'grade', 'test', 'gen', 'audit', 'eval', 'run', 'dashboard', 'deploy', 'upload', 'baselinestatus', 'setup-completion', 'gen-negative-suite', 'gen-task-suite']);
 });

 completion.on('arg1', ({ before, reply }) => {
@@ -81,6 +81,8 @@ const { positionals, values } = parseArgs({
     'gen-grader': { type: 'boolean' },
     'gen-negative': { type: 'boolean' },
     guided: { type: 'boolean' },
+    'sync-task': { type: 'boolean' },
+    'no-test': { type: 'boolean' },
     verbose: { type: 'boolean' },
     usecases: { type: 'boolean' },
   },
@@ -128,6 +130,7 @@ ${cBold('Guide Development:')}
 ${"Piece-wise options for `dev`:"}
 ${cDim('--grade')} Run/calibrate grader
 ${cDim('--test-grader')} Check grader calibration (demo + negative-demo)
+${cDim('--sync-task')} Force update task prompt from prompts.md
 ${cDim('--gen-grader')} Generate a new grader script
 ${cDim('--gen-negative')} Generate negative examples
 ${cDim('--guided')} Skip calibration, run guided agent test only
@@ -141,6 +144,7 @@ ${cBold('Evaluation:')}
 ${cCyan('deploy')} Deploy the dashboard to GitHub Pages
 ${cCyan('upload')} <suite> Upload generated evaluation suite to GCS
 ${cCyan('gen-negative-suite')} Generate resources for negative suite
+${cCyan('gen-task-suite')} Update regular tasks with latest prompts

 ${cBold('Other:')}
 ${cCyan('baselinestatus')} <query> Check browser support and Baseline status
@@ -188,6 +192,8 @@ ${cBold('Options:')}
     const success = await devGuide(dir, {
       guidedOnly: !!values.guided,
       verbose: !!values.verbose,
+      syncTask: !!values['sync-task'],
+      test: !values['no-test'],
     });
     process.exit(success ? 0 : 1);
   }
@@ -263,6 +269,12 @@ ${cBold('Options:')}
     break;
   }

+  case 'gen-task-suite': {
+    const { generateTaskSuite } = await import('../guides/task-suite-gen.ts');
+    await generateTaskSuite();
+    break;
+  }
+
   default: {
     // Legacy fallbacks — guide namespace was flattened
     if (command === 'guide') {
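
The two new flags can be exercised in isolation. The sketch below assumes the `parseArgs` seen in the hunk header is Node's built-in from `node:util` (the import is not shown in this diff); `parseDevFlags` and `DevFlags` are hypothetical names used only for illustration, not part of the commit.

```typescript
import { parseArgs } from 'node:util';

// Hypothetical shape mirroring the two options this commit passes to devGuide.
interface DevFlags {
  syncTask: boolean;
  test: boolean;
  positionals: string[];
}

// Hypothetical helper: parses an argv-style array with the same option
// definitions the diff adds ('sync-task' and 'no-test', both booleans).
function parseDevFlags(argv: string[]): DevFlags {
  const { values, positionals } = parseArgs({
    args: argv,
    allowPositionals: true,
    options: {
      'sync-task': { type: 'boolean' },
      'no-test': { type: 'boolean' },
    },
  });
  return {
    // An absent boolean flag is undefined, so !! coerces it to false.
    syncTask: !!values['sync-task'],
    // --no-test inverts: the agent test runs unless the flag is present.
    test: !values['no-test'],
    positionals,
  };
}

console.log(parseDevFlags(['dev', 'guides/my-guide', '--sync-task']));
```

Note the asymmetry: `syncTask` defaults to off, while `test` defaults to on and is only disabled by `--no-test`.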

guides/README.md

Lines changed: 11 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -108,7 +108,7 @@ gd dev <path/to/guide_dir>
108108
This runs the following pipeline after the grader calibrates successfully:
109109

110110
1. **Generate `prompts.md`** if missing — uses Gemini CLI to create a set of developer-facing prompts derived from the guide
111-
2. **Find or create a task file** in `harness/tasks/` — scans existing tasks for a matching `grader:` field, or creates `<guideName>-task.md` using the first prompt from `prompts.md` (defaults to `daily-grind` base app)
111+
2. **Find or create a task file** in `harness/tasks/` — scans existing tasks for a matching `grader:` field, or creates `<guideName>-task.md` using the first prompt from `prompts.md` (defaults to `daily-grind` base app). If the task file already exists but its prompt has drifted from `prompts.md`, `gd dev` will warn you. Run with `--sync-task` to force it to synchronize.
112112
3. **Grade the base app as-is** (pre-score) — establishes a baseline before any agent runs
113113
4. **Run the agent** in both `unguided` (no guide access) and `guided` (with MCP guide access) modes against the base app
114114
5. **Grade both outputs** and print a comparison:
@@ -125,6 +125,16 @@ The agent and base app are selected from the [harness config](../harness/config.

 The generated task file is automatically included in future `gd eval suite` runs — the suite discovers all task files in `harness/tasks/` by default.

+### Synchronizing All Regular Tasks
+
+If you update multiple `prompts.md` files or want to ensure all regular tasks are in sync with their respective guides, you can run:
+
+```bash
+gd gen-task-suite
+```
+
+This script scans for "eval-ready" guides, reads their `prompts.md`, and updates the corresponding `<guideName>-task.md` in `harness/tasks/` while preserving any custom `base_app` configuration in the task file.
+
 ### Negative Suite

 To verify that guides improve agent performance starting from a "bad" implementation, you can run a **Negative Suite**.
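
The README addition above says a sync rewrites the prompt while preserving a custom `base_app`. A minimal sketch of that behavior on in-memory strings follows. It is an illustration under assumptions: the commit itself reads front matter with `matter(...)`, whereas this sketch substitutes a small regex-based reader so it has no dependencies, and `readBaseApp` and `renderTask` are hypothetical names.

```typescript
// Stand-in for a real front-matter parser (assumption: a leading block
// delimited by `---` lines containing a plain `base_app: value` entry).
function readBaseApp(taskFile: string | null, fallback = 'daily-grind'): string {
  if (!taskFile) return fallback;
  const frontMatter = /^---\n([\s\S]*?)\n---/.exec(taskFile);
  if (!frontMatter) return fallback;
  const body = frontMatter[1] ?? '';
  const line = body.split('\n').find(l => l.startsWith('base_app:'));
  const value = line ? line.slice('base_app:'.length).trim() : '';
  return value !== '' ? value : fallback;
}

// Rebuilds the task file text: new prompt, same grader, preserved base_app.
function renderTask(guideName: string, prompt: string, existing: string | null): string {
  const baseApp = readBaseApp(existing);
  return `---\nbase_app: ${baseApp}\ngrader: ${guideName}\n---\n${prompt}\n`;
}

const existingTask = '---\nbase_app: empty-app\ngrader: optimize-script-priority\n---\nold prompt\n';
console.log(renderTask('optimize-script-priority', 'new prompt', existingTask));
```

When no task file exists yet, `existing` is `null` and the sketch falls back to `daily-grind`, matching the default described in the README.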

guides/dev-guide.ts

Lines changed: 61 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -33,6 +33,7 @@ export interface DevGuideOptions {
3333
maxRetries?: number; // default: 2
3434
test?: boolean; // default: true — run agent test after calibration
3535
guidedOnly?: boolean; // skip calibration and only run the guided agent test
36+
syncTask?: boolean; // update task with latest prompt from prompts.md
3637
verbose?: boolean;
3738
}
3839

@@ -298,9 +299,28 @@ export async function devGuide(targetDirRaw: string, options: DevGuideOptions =
       }
     }

-    if (!existingTask && fs.existsSync(promptsPath)) {
-      const taskInfo = createTask(targetDir, currentInv.name);
-      taskMap.set(currentInv.name, taskInfo);
+    if (fs.existsSync(promptsPath)) {
+      const latestPrompt = getLatestPrompt(targetDir, currentInv.name);
+
+      if (existingTask) {
+        if (existingTask.prompt.trim() !== latestPrompt.trim()) {
+          if (options.syncTask) {
+            console.log(cYellow(`\nSyncing prompt for ${currentInv.name}-task.md...`));
+            const taskInfo = createTask(targetDir, currentInv.name);
+            taskMap.set(currentInv.name, taskInfo);
+          } else {
+            console.log(cYellow(`\n\u26a0\ufe0f Task prompt is outdated!`));
+            console.log(` Current: "${cDim(existingTask.prompt.substring(0, 100))}${existingTask.prompt.length > 100 ? '...' : ''}"`);
+            console.log(` Latest: "${cDim(latestPrompt.substring(0, 100))}${latestPrompt.length > 100 ? '...' : ''}"`);
+            console.log(` Run with ${cBold('--sync-task')} to update.`);
+          }
+        } else {
+          console.log(cGreen(`\n\u2705 Task prompt is up-to-date`));
+        }
+      } else {
+        const taskInfo = createTask(targetDir, currentInv.name);
+        taskMap.set(currentInv.name, taskInfo);
+      }
     }
   }

@@ -399,24 +419,50 @@ Only create the ${PROMPTS_FILE} file. Do not modify any other files.`;
   }
 }

-function createTask(targetDir: string, guideName: string): TaskInfo {
-  const promptsContent = readFileSafe(path.join(targetDir, PROMPTS_FILE));
-  const firstLine = promptsContent.split('\n').find(l => l.trim().startsWith('- '));
-  const prompt = firstLine ? firstLine.replace(/^-\s*/, '').trim() : `Implement the guidance from ${guideName}`;
+export function getLatestPrompt(targetDir: string, guideName: string): string {
+  const promptsPath = path.join(targetDir, PROMPTS_FILE);
+  if (fs.existsSync(promptsPath)) {
+    const promptsContent = readFileSafe(promptsPath);
+    const firstLine = promptsContent.split('\n').find(l => l.trim().startsWith('- '));
+    if (firstLine) {
+      return firstLine.replace(/^-\s*/, '').trim();
+    }
+  } else {
+    console.warn(cYellow(` ⚠️ Missing ${PROMPTS_FILE} for ${guideName}, using default prompt.`));
+  }
+  return `Implement the guidance from ${guideName}`;
+}
+
+export function createTask(targetDir: string, guideName: string): TaskInfo {
+  const prompt = getLatestPrompt(targetDir, guideName);

   const taskName = `${guideName}-task`;
+  const taskFilePath = path.join(TASKS_DIR, `${taskName}.md`);
+
+  // Preserve existing base_app if task file already exists
+  let baseApp = 'daily-grind';
+  if (fs.existsSync(taskFilePath)) {
+    const rawContent = readFileSafe(taskFilePath);
+    if (rawContent) {
+      const { data } = matter(rawContent);
+      if (data?.base_app) {
+        baseApp = data.base_app;
+      }
+    }
+  }
+
   const taskContent = `---
-base_app: daily-grind
+base_app: ${baseApp}
 grader: ${guideName}
 ---
 ${prompt}
 `;

   fs.mkdirSync(TASKS_DIR, { recursive: true });
-  fs.writeFileSync(path.join(TASKS_DIR, `${taskName}.md`), taskContent);
-  console.log(cGreen(`✅ Created task: harness/tasks/${taskName}.md`));
+  fs.writeFileSync(taskFilePath, taskContent);
+  console.log(cGreen(`✅ Created/Updated task: harness/tasks/${taskName}.md (base_app: ${baseApp})`));

-  return { taskName, baseApp: 'daily-grind', prompt };
+  return { taskName, baseApp, prompt };
 }

 async function runAgentTest(targetDir: string, guideName: string, taskMap: Map<string, TaskInfo>, guidedOnly = false): Promise<void> {
@@ -828,11 +874,13 @@ if (import.meta.url.startsWith('file:') && process.argv[1] === fileURLToPath(imp
   const isTest = !args.includes('--no-test');

   if (!dir) {
-    console.error('Usage: node --experimental-strip-types guides/dev-guide.ts <path/to/guide> [--no-test]');
+    console.error('Usage: node --experimental-strip-types guides/dev-guide.ts <path/to/guide> [--no-test] [--sync-task]');
     process.exit(1);
   }

-  devGuide(dir, { test: isTest }).then(success => {
+  const syncTask = args.includes('--sync-task');
+
+  devGuide(dir, { test: isTest, syncTask }).then(success => {
     process.exit(success ? 0 : 1);
   }).catch(err => {
     console.error(err);
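
The branching that `devGuide` gains in this file reduces to four outcomes: create, up-to-date, sync, or warn. The sketch below restates that decision as a pure function so it can be checked without touching the file system. `decideSyncAction`, `SyncAction`, and `preview` are hypothetical names for illustration; the committed code performs the same comparisons inline and logs instead of returning a value.

```typescript
type SyncAction = 'create' | 'up-to-date' | 'sync' | 'warn';

// Hypothetical pure restatement of the inline branching in devGuide:
// prompts are compared after trimming, and --sync-task decides whether
// a drifted prompt is rewritten or only reported.
function decideSyncAction(
  existingPrompt: string | undefined,
  latestPrompt: string,
  syncTask: boolean,
): SyncAction {
  if (existingPrompt === undefined) return 'create';
  if (existingPrompt.trim() === latestPrompt.trim()) return 'up-to-date';
  return syncTask ? 'sync' : 'warn';
}

// Mirrors the 100-character truncation used in the warning output.
function preview(text: string, max = 100): string {
  return text.length > max ? `${text.substring(0, max)}...` : text;
}

console.log(decideSyncAction('Improve the speed of this website', 'Improve the speed of my website', false));
console.log(preview('x'.repeat(120)).length);
```

Because the comparison trims both sides, a trailing newline in the task file alone does not count as drift.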

guides/task-suite-gen.ts

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
1+
import path from 'path';
2+
import { fileURLToPath } from 'url';
3+
4+
const __filename = fileURLToPath(import.meta.url);
5+
const __dirname = path.dirname(__filename);
6+
7+
// @ts-ignore - dev-guide.ts might not have types in this setup but node handles it
8+
import { scanAllGuides, classifyGuide, getTaskMap, createTask } from './dev-guide.ts';
9+
10+
11+
export async function generateTaskSuite() {
12+
console.log('Scanning guides...');
13+
const taskMap = getTaskMap();
14+
const allGuides = scanAllGuides(taskMap);
15+
16+
const evalReadyGuides = allGuides.filter(inv => classifyGuide(inv) === 'eval-ready');
17+
18+
if (evalReadyGuides.length === 0) {
19+
console.log('No eval-ready guides found.');
20+
return;
21+
}
22+
23+
console.log(`Found ${evalReadyGuides.length} eval-ready guides.`);
24+
25+
let updatedCount = 0;
26+
27+
for (const inv of evalReadyGuides) {
28+
createTask(inv.dir, inv.name);
29+
updatedCount++;
30+
}
31+
32+
console.log(`\nTask suite generation complete! Updated/Synced ${updatedCount} tasks.`);
33+
}
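
Each `createTask` call in the loop resolves its prompt through `getLatestPrompt`, which takes the first `- ` bullet found in `prompts.md`. A minimal sketch of that extraction on in-memory text follows. `extractFirstPrompt` is a hypothetical name, and the sketch deliberately differs from the committed code in one respect: it trims each line before stripping the dash, so an indented bullet does not keep its leading `- `.

```typescript
// Hypothetical helper operating on file contents rather than a path.
function extractFirstPrompt(promptsMd: string, guideName: string): string {
  const bullet = promptsMd
    .split('\n')
    .map(line => line.trim()) // also drops a trailing \r from CRLF files
    .find(line => line.startsWith('- '));
  // Same fallback text the commit uses when no bullet is found.
  return bullet
    ? bullet.replace(/^-\s*/, '').trim()
    : `Implement the guidance from ${guideName}`;
}

const sample = '# Prompts\n\n  - Improve the speed of my website\n- A second prompt\n';
console.log(extractFirstPrompt(sample, 'improve-next-page-load-performance'));
```

Only the first bullet is ever used, so reordering the bullets in `prompts.md` changes which prompt a sync writes into the task file.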

harness/tasks/animate-scrollbar-color-on-scroll-task.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,4 +2,4 @@
22
base_app: daily-grind
33
grader: animate-scrollbar-color-on-scroll
44
---
5-
hey can u make the scrollbar color change as i scroll down the page? like it should start as one color and shift to another as you get to the bottom of the coffee site. make it look smooth.
5+
hey can u make the scrollbar color change as i scroll down the page? like it should start as one color and shift to another as you get to the bottom. make it look smooth.

harness/tasks/deprioritize-background-fetches-task.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,4 +2,4 @@
22
base_app: empty-app
33
grader: deprioritize-background-fetches
44
---
5-
Create an extremely minimal web page with a single button that triggers two concurrent fetch requests: one request to '/api/data' for mission-critical data that must be loaded as quickly as possible, and another to '/api/analytics' that POSTs a `{click: 1}` payload. Write the page to index.html.
5+
Create an extremely minimal web page with a single button that triggers two concurrent fetch requests: one request to '/api/data' for mission-critical data that must be loaded as quickly as possible, and another to '/api/analytics' that POSTs a `{click: 1}` payload.

harness/tasks/improve-next-page-load-performance-task.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,4 +2,4 @@
22
base_app: daily-grind
33
grader: improve-next-page-load-performance
44
---
5-
Improve the speed of this website
5+
Improve the speed of my website

harness/tasks/optimize-image-priority-task.md

Lines changed: 1 addition & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,4 @@
22
base_app: empty-app
33
grader: optimize-image-priority
44
---
5-
Create an extremely minimal product landing page optimized for a main hero image 'hero-lcp.jpg' which is the largest contentful element. The page also contains a product image gallery where the first image is visible but the second image 'gallery-alt.jpg' is currently hidden behind a toggle. There is also a secondary 'mega-menu-promo.jpg' image that is part of a navigation menu and initially hidden. Finally, include a 'footer-logo.png' much further down the page below the fold.
6-
7-
MANDATORY: Write the page to index.html and ensure that all image sources exactly match the filenames provided. Do not bother downloading stock images, just use the filenames as the src attributes, it's ok if they don't exist.
5+
Create an extremely minimal product landing page that features a main hero image 'hero-lcp.jpg' which is the largest contentful element. The page also contains a product image gallery where the first image is visible but the second image 'gallery-alt.jpg' is currently hidden behind a toggle. There is also a secondary 'mega-menu-promo.jpg' image that is part of a navigation menu and initially hidden. Finally, include a 'footer-logo.png' much further down the page below the fold.

harness/tasks/optimize-preload-priority-task.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,4 +2,4 @@
22
base_app: empty-app
33
grader: optimize-preload-priority
44
---
5-
Create an extremely minimal video landing page that optimizes for LCP. This includes a video poster image 'poster.jpg' (the LCP element), a custom web font 'brand-font.woff2' that is critical for the header rendering, and a secondary font 'secondary-font.woff2' for less critical UI elements. Write the page to index.html.
5+
Create an extremely minimal video landing page that optimizes for LCP. This includes a video poster image 'poster.jpg' (the LCP element), a custom web font 'brand-font.woff2' that is critical for the header rendering, and a secondary font 'secondary-font.woff2' for less critical UI elements.

harness/tasks/optimize-script-priority-task.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,4 +2,4 @@
22
base_app: empty-app
33
grader: optimize-script-priority
44
---
5-
Create an extremely minimal web page for a dashboard. It requires a critical interactivity script at '/js/app.js' that should be loaded asynchronously. It also includes an older '/js/legacy-widgets.js' script that is normally parser-blocking. Finally, include an analytics script '/js/tracker.js'. Write the page to index.html.
5+
Create an extremely minimal web page for a dashboard. It requires a critical interactivity script at '/js/app.js' that should be loaded asynchronously. It also includes an older '/js/legacy-widgets.js' script that is normally parser-blocking. Finally, include an analytics script '/js/tracker.js'.
