Commit 6ff12b0
committed
fix: video_analyze benchmark test - add multi-turn context
The LLM was declining to call video_analyze because the prompt
referenced clips without prior search context. Added a realistic
multi-turn history with video_search results containing clip IDs.
Also updated test runner to support scenario history arrays.
Result: 27/28 passed, 0 failed, 1 skipped (VLM disabled)1 parent 3adf13c commit 6ff12b0
File tree
2 files changed
+36
-5
lines changed- skills/analysis/home-security-benchmark
- fixtures
- scripts
2 files changed
+36
-5
lines changedLines changed: 31 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
14 | 14 | | |
15 | 15 | | |
16 | 16 | | |
17 | | - | |
| 17 | + | |
18 | 18 | | |
19 | 19 | | |
20 | 20 | | |
21 | 21 | | |
22 | | - | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
23 | 52 | | |
24 | 53 | | |
25 | 54 | | |
| |||
Lines changed: 5 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
503 | 503 | | |
504 | 504 | | |
505 | 505 | | |
506 | | - | |
507 | | - | |
| 506 | + | |
| 507 | + | |
| 508 | + | |
508 | 509 | | |
509 | | - | |
| 510 | + | |
| 511 | + | |
510 | 512 | | |
511 | 513 | | |
512 | 514 | | |
| |||
0 commit comments