Commit a1ba778
fix(jumpscore): align message format and video lookup (#1330)
* feat: add jump rope evaluation task
* fix(mmmu): lazy-load judge server to avoid OpenAI API key error on module import
The judge server was initialized at module import time, causing
OpenAI API errors in CI environments where OPENAI_API_KEY is not set.
Now the server is created on first use via _get_judge_server() instead.
* Revert "fix(mmmu): lazy-load judge server to avoid OpenAI API key error on module import"
This reverts commit 18dd0c3.
* fix(jump_rope): lazy-load HF dataset snapshot to avoid import-time download
snapshot_download was called at module level, causing CI to fail when
loading task configs without HF credentials. Moved to _get_cache_dir()
which is called on first actual use, following the same pattern as
other tasks (e.g. vbvr/utils.py).
* fix(mmmu): lazy-load judge server to avoid OpenAI API key error on module import
The judge server was initialized at module level, causing an OpenAIError
in CI environments where OPENAI_API_KEY is not set. Replaced the top-level
initialization with _get_judge_server(), which creates the server on first
actual use, consistent with how jump_rope/utils.py handles its HF download.
* ci(task-input-ab): gracefully skip comparison when BASE snapshot fails
The BASE worktree may contain pre-existing import-time errors (e.g.
module-level OpenAI client init requiring OPENAI_API_KEY, or network
calls at import time). These cause the BASE capture step to fail, blocking
all PRs even when the PR itself introduces no regression.
Changes:
- Add continue-on-error: true to 'Capture BASE snapshot' step
- Update 'Compare snapshots' to skip diff when base.json is absent,
printing a clear warning instead of failing the workflow
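Below is a sketch of the 'Compare snapshots' logic described above. It assumes each snapshot is a JSON object keyed by task name, which is an assumption because the real snapshot format is not shown here. The name `compare_snapshots` is hypothetical. A missing `base.json` emits a GitHub Actions `::warning::` annotation and returns 0 instead of failing. Like the mmmu fix, this CI change was reverted later in the squash.

```python
import json
from pathlib import Path


def compare_snapshots(base_path: Path, pr_path: Path) -> int:
    """Return a process exit code: 0 if no diff (or base missing), 1 otherwise."""
    if not base_path.exists():
        # BASE capture ran with continue-on-error and produced nothing:
        # warn and skip instead of failing the workflow.
        print(f"::warning::{base_path} not found; skipping snapshot comparison.")
        return 0

    base = json.loads(base_path.read_text())
    pr = json.loads(pr_path.read_text())

    # Keys present in either snapshot whose values differ (or are missing).
    changed = sorted(k for k in base.keys() | pr.keys() if base.get(k) != pr.get(k))
    for key in changed:
        print(f"changed: {key}")
    return 1 if changed else 0
```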
* refactor(jump_rope): rename task directory from jump_rope to jumpscore
* Revert "fix(mmmu): lazy-load judge server to avoid OpenAI API key error on module import"
This reverts commit 917a3ed.
* Revert "ci(task-input-ab): gracefully skip comparison when BASE snapshot fails"
This reverts commit 86f7f9a.
* fix(jumpscore): configure video cache in yaml
* fix(jumpscore): expose map metric
* fix(jumpscore): align message format and video lookup
* fix(jumpscore): remove snapshot cache fallback
* fix(jumpscore): support zipped video cache
* style: auto-fix lint (black + isort)
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
1 parent 4510f3e
1 file changed
Lines changed: 44 additions & 7 deletions
(Diff content not preserved; only hunk ranges survive.)
@@ -1,11 +1,51 @@ (40 lines added)
@@ -24,6 +64,8 @@ (2 lines added)
@@ -48,26 +90,21 @@ (2 lines added, 7 removed)