Commit 85a35a0
test(experiments): add skill-from-trajectory experiment runner

Three-way comparison driver (no_recall / guidelines / skill) that runs
the seed → synthesize → measure flow per trial. Reuses helpers from
experiments/token_savings.py. Captures token usage from
--output-format json plus per-turn usage and tool-call summaries from
saved transcripts. Supports --seed-utterances to test multi-utterance
seeding (e.g. gps + focal_length, then measure on lens) for skill
generalization.

Also resolves /tmp -> /private/tmp on macOS (Docker bind-mount of /tmp
subdirs doesn't follow the symlink, breaking the prior plumbing for
hidden subdirs).

1 parent e7876d4 commit 85a35a0
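
The /tmp -> /private/tmp resolution mentioned in the commit message can be sketched as follows. This is a minimal illustration, not the code from the commit: the helper names `resolve_for_bind_mount` and `bind_mount_arg` are hypothetical, and it assumes the fix amounts to resolving symlinks on the host path before passing it to Docker as a bind-mount source.

```python
import os


def resolve_for_bind_mount(path: str) -> str:
    """Return the host path with every symlink resolved.

    On macOS, /tmp is a symlink to /private/tmp, so a path such as
    /tmp/run/.hidden becomes /private/tmp/run/.hidden.
    (Hypothetical helper; the name is not taken from the commit.)
    """
    return os.path.realpath(path)


def bind_mount_arg(host_dir: str, container_dir: str) -> str:
    """Build the value for `docker run -v` from a symlink-free host path.

    (Hypothetical helper; assumes the plain host:container form of -v.)
    """
    return f"{resolve_for_bind_mount(host_dir)}:{container_dir}"
```

On a system where the path contains no symlinks, `os.path.realpath` returns it unchanged apart from normalization, so the same call is safe on Linux hosts as well.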
1 file changed
Lines changed: 540 additions & 0 deletions