feat: add HD-EPIC VQA benchmark (CVPR 2025)#1316
aliazani wants to merge 6 commits into EvolvingLMMs-Lab:main
- 30 question prototypes across 7 categories (Recipe, Ingredient, Nutrition, Fine-grained Actions, 3D Perception, Object Motion, Gaze)
- 26,550 multiple-choice questions from 41 hours of egocentric video
- Runnable per-prototype, per-category, or full benchmark
- Validated: Qwen2.5-VL-7B 26% on Ingredient Weight (R3 report: ~28%)
`-c copy` snaps to the nearest keyframe, causing clips to start early. Replace it with `-c:v libx264 -preset ultrafast -crf 23` for exact cuts.
kcz358 left a comment:
Hi, the tasks LGTM. It would be better if you could split the YAMLs into subfolders so that the categories are clearer for the user. Thanks.
Split the 30 per-prototype YAMLs and the 7 category-group YAMLs into one subfolder per HD-EPIC category (recipe/, ingredient/, nutrition/, fine_grained/, 3d_perception/, object_motion/, gaze/) so the category structure is visible at a glance.

_hd_epic_base.yaml and the master group YAML stay at the top level. Per-prototype YAMLs now use `include: ../_hd_epic_base.yaml`. Task names, group wiring, and --tasks invocations are unchanged.

Also:
- update generate_task_yamls.py to emit the new layout
- add a Repository layout section to the README
- fix stale clip extraction note (-c copy → libx264/ultrafast/crf23)
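A minimal sketch of how a generator could emit this layout, assuming each per-prototype YAML needs only an `include` line and a `task` name. The file naming (`hd_epic_<prototype>.yaml`), the helper names, and the two-key YAML body are illustrative assumptions; the real `generate_task_yamls.py` is not shown in this thread and likely writes more keys.

```python
from pathlib import Path

BASE_NAME = "_hd_epic_base.yaml"  # top-level base config named in the commit message


def render_task_yaml(prototype):
    """Return a minimal per-prototype YAML body (assumed shape, two keys only)."""
    # The base file lives one level up, so the include path starts with "../".
    return f"include: ../{BASE_NAME}\ntask: hd_epic_{prototype}\n"


def write_layout(root, prototypes_by_category):
    """Write one subfolder per category with one YAML per prototype."""
    written = []
    for category, prototypes in prototypes_by_category.items():
        folder = Path(root) / category
        folder.mkdir(parents=True, exist_ok=True)
        for prototype in prototypes:
            # Hypothetical file name; the PR's actual naming may differ.
            path = folder / f"hd_epic_{prototype}.yaml"
            path.write_text(render_task_yaml(prototype))
            written.append(path)
    return written
```

Calling `write_layout(".", {"ingredient": ["ingredient_ingredient_weight"]})` would create `ingredient/hd_epic_ingredient_ingredient_weight.yaml`, whose task name matches the one used in the validation command.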
lmms-eval resolves !function module names relative to each YAML's own directory. After moving per-prototype YAMLs into <category>/ subfolders, references like `!function utils.filter_X` could no longer find the top-level utils.py and threw ImportError on task load.

Each category subfolder now contains a small utils.py shim that prepends the parent directory to sys.path and re-exports the shared helpers, so the existing !function references resolve transparently. No YAML or top-level utils.py changes required.

generate_task_yamls.py also writes the shim alongside each subfolder so regenerations stay consistent.
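The shim idea can be sketched in a self-contained demo. This is an illustration under assumptions, not the PR's actual shim: the shared module is named `hd_epic_shared_demo.py` here (in the PR both files are `utils.py`) to keep the demo free of module-name clashes, the `doc_to_target` body and the `answer` field are stand-ins, and `load_by_path` only approximates how a YAML loader might import a module from a file path.

```python
import importlib.util
import tempfile
from pathlib import Path

# Hypothetical shim body written into each category folder as utils.py.
SHIM_SOURCE = '''\
import sys
from pathlib import Path

# Make the task root (the parent of this category folder) importable.
_TASK_ROOT = str(Path(__file__).resolve().parent.parent)
if _TASK_ROOT not in sys.path:
    sys.path.insert(0, _TASK_ROOT)

# Re-export the shared helpers so references to this module still resolve.
from hd_epic_shared_demo import *  # noqa: E402,F401,F403
'''


def load_by_path(name, path):
    """Load a module from an explicit file path (assumed loader behaviour)."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def demo():
    """Build a throwaway task tree and resolve a helper through the shim."""
    with tempfile.TemporaryDirectory() as root:
        root_path = Path(root)
        # Stand-in for the shared top-level helpers.
        (root_path / "hd_epic_shared_demo.py").write_text(
            "def doc_to_target(doc):\n    return doc['answer']\n"
        )
        category = root_path / "ingredient"
        category.mkdir()
        shim = category / "utils.py"
        shim.write_text(SHIM_SOURCE)
        # Loading the shim from the subfolder exposes the shared helper.
        module = load_by_path("utils", shim)
        return module.doc_to_target({"answer": "B"})


result = demo()
```

The point of the design is that the YAMLs keep referring to a `utils` module next to them, while the helper code itself stays in one place at the top level.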
Pure input seek (-ss before -i) snaps to the nearest keyframe, which can start a clip several seconds early. This caused the model to see a different time window than the question intended, dropping accuracy on ingredient_ingredient_weight from 26% to 22% (~1 question per 12).

Pure output seek (-ss after -i) is frame-accurate but decodes from the start of the file, making extraction 10-20x slower on the long HD-EPIC recordings (36 min for 50 questions vs ~2 min with input seek).

Switch to two-pass seek: fast keyframe-aligned input seek to ~2s before the target, then a short precise output seek for the remaining offset. Frame accuracy is equivalent to pure output seek; extraction time is equivalent to input seek.

Validated: accuracy returns to 26% (matching R3 baseline, within SE) at 6:38 total for 50 questions.

Also update the docstring for _extract_clip to document the strategy, caching behaviour, and fallback path.
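The two-pass seek described above can be sketched as a command builder. This is an illustration under assumptions, not the PR's `_extract_clip`: the function name `build_two_pass_cmd`, the `margin` parameter, and the `-y` overwrite flag are hypothetical, while the 2-second margin and the `libx264`/`ultrafast`/`crf 23` encoder settings come from the commit messages in this thread.

```python
def build_two_pass_cmd(src, dst, start, end, margin=2.0):
    """Build an ffmpeg argv list that cuts [start, end] seconds from src."""
    if end <= start:
        raise ValueError("end must be greater than start")
    # Pass 1: fast input seek (-ss before -i) to shortly before the target.
    coarse = max(0.0, start - margin)
    # Pass 2: precise output seek (-ss after -i) for the remaining offset.
    fine = start - coarse
    duration = end - start
    return [
        "ffmpeg",
        "-y",  # assumption: overwrite an existing output file
        "-ss", f"{coarse:.3f}",
        "-i", src,
        "-ss", f"{fine:.3f}",
        "-t", f"{duration:.3f}",
        "-c:v", "libx264",
        "-preset", "ultrafast",
        "-crf", "23",
        dst,
    ]


# Example: a clip from 125.5 s to 130.0 s of a long recording.
cmd = build_two_pass_cmd("video.mp4", "clip.mp4", 125.5, 130.0)
```

Because the input seek resets timestamps to the seek point, the output `-ss` is measured relative to it, so only about `margin` seconds are decoded and discarded. The list can be passed to `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed.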
@kcz358 One thing worth noting: lmms-eval resolves [...]
Also took the opportunity to add a "Repository layout" section to the README documenting the new structure, and fixed a stale note in the Notes section (clip extraction now uses two-pass ffmpeg seek rather than [...]).
Task names and [...]
Original Benchmark link: https://github.com/hd-epic/hd-epic-vqa-eval/tree/main
Summary
`--tasks hd_epic_<prototype>`, `--tasks hd_epic_<category>`, or `--tasks hd_epic`
In scope
hd_epic)
- `utils.py` with `doc_to_messages`, `doc_to_visual`, `doc_to_text`, `process_results`, `doc_to_target`, and `<TIME>`/`<BBOX>` tag resolution
- `hd_epic_to_hf.py` converter from official annotation JSONs to lmms-eval JSONL format
- `generate_task_yamls.py` script for regenerating all YAMLs
- `README.md` with setup instructions and validation results
Out of scope
Validation
`python -m lmms_eval --model qwen2_5_vl --model_args pretrained=Qwen/Qwen2.5-VL-7B-Instruct,fps=1,max_num_frames=32,min_pixels=50176,max_pixels=50176 --tasks hd_epic_ingredient_ingredient_weight --batch_size 1` | sample size: `N=50` | key metrics: `accuracy` | result: pass (26%, R3 community report baseline: ~28%, within SE of 6pp)
Risk / Compatibility
`ffmpeg` in PATH and `HD_EPIC_VIDEO_DIR` env var set at eval time; gracefully falls back to full video if clip extraction fails
Type of Change