
Commit 85a35a0

test(experiments): add skill-from-trajectory experiment runner
Add a three-way comparison driver (no_recall / guidelines / skill) that runs the seed → synthesize → measure flow for each trial. It reuses helpers from experiments/token_savings.py and captures token usage from --output-format json, plus per-turn usage and tool-call summaries from saved transcripts.

Supports --seed-utterances for multi-utterance seeding (e.g. seed on gps + focal_length, then measure on lens) to test skill generalization.

Also resolves /tmp -> /private/tmp on macOS: a Docker bind mount of a /tmp subdirectory doesn't follow the symlink, which broke the prior plumbing for hidden subdirs.
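The /tmp fix described in the message can be illustrated with a minimal sketch. The diff itself is not shown here, so this is not the commit's code: the helper names `resolve_for_bind_mount` and `bind_mount_arg` are hypothetical, and the only assumption is that the host path is canonicalized before it is handed to Docker's `-v` flag.

```python
from pathlib import Path
from typing import List


def resolve_for_bind_mount(host_path: str) -> str:
    """Return an absolute, symlink-free version of host_path.

    Hypothetical helper. Path.resolve() follows symlinks, so on macOS
    a path under /tmp comes back under /private/tmp.
    """
    return str(Path(host_path).resolve())


def bind_mount_arg(host_dir: str, container_dir: str) -> List[str]:
    """Build the `-v host:container` argument pair for `docker run`.

    Hypothetical helper; the host side is canonicalized first so the
    mount source is never a symlinked path.
    """
    return ["-v", f"{resolve_for_bind_mount(host_dir)}:{container_dir}"]
```

On Linux, where /tmp is normally a real directory, the call returns the path unchanged, so the same code path is safe on both platforms.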
1 parent e7876d4 commit 85a35a0

1 file changed

Lines changed: 540 additions & 0 deletions
