fix: count_input_cfg_levels now resolves string file references#15646
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes incorrect reweight_temperature broadcasting when input_cfg is overridden to a YAML filepath by making count_input_cfg_levels() resolve and traverse referenced YAML configs (matching parse_and_combine_datasets behavior).
Changes:
- Update
count_input_cfg_levels()to treat string/Pathinput_cfgvalues as YAML file references and traverse nestedinput_cfgkeys. - Add unit tests covering
input_cfgspecified as YAML file paths, including nested file references and unresolvable strings.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
nemo/collections/common/data/lhotse/cutset.py |
Resolve input_cfg string/Path YAML references during depth counting to align with runtime dataset parsing. |
tests/collections/common/test_lhotse_temperature_reweighting.py |
Add coverage for YAML-file-based input_cfg depth counting, including nested references. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
2f1618d to
231f656
Compare
|
/ok to test 231f656 |
|
[🤖]: Hi @XuesongYang 👋, We wanted to let you know that a CICD pipeline for this PR just finished successfully. So it might be time to merge this PR or get some approvals. |
| if key not in _cache: | ||
| try: | ||
| _cache[key] = load_yaml(key) | ||
| except Exception: |
There was a problem hiding this comment.
Can you make this Exception tighter?
When input_cfg is overridden via CLI to a YAML file path (e.g. model.train_ds.input_cfg=train_all.yaml), the level counter only saw the top-level string and reported 1 level, causing reweight_temperature to broadcast incorrectly. Now it loads referenced YAML files to discover nested input_cfg keys, matching the runtime behavior of parse_and_combine_datasets. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
…st name - Cache resolved YAML files in _resolve_if_path to avoid redundant disk/network I/O for sibling groups referencing the same file - Add logging.debug when a path cannot be resolved, so failures are never fully silent - Rename misleading test to match its assertion (total depth is 2, not 1; the unresolvable string is treated as a leaf) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
The broad silently swallowed all errors when _resolve_if_path could not load a YAML file, masking real problems (permission errors, malformed YAML, etc.). The only legitimate failure case is FileNotFoundError: when a nested input_cfg YAML contains OmegaConf interpolations like , raw yaml.load returns them as literal strings that don't exist on disk. parse_and_combine_datasets resolves these later via OmegaConf.create(). All other errors should propagate immediately since they would also crash parse_and_combine_datasets. Also documents file-path input_cfg depth counting in the RST docs. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
231f656 to
296def8
Compare
|
/ok to test 296def8 |
|
[🤖]: Hi @XuesongYang 👋, We wanted to let you know that a CICD pipeline for this PR just finished successfully. So it might be time to merge this PR or get some approvals. |
When
input_cfgis overridden via CLI to a YAML file path (e.g.model.train_ds.input_cfg=train_all.yaml), the level counter only saw the top-level string and reported 1 level, causingreweight_temperatureto broadcast incorrectly. Now it loads referenced YAML files to discover nestedinput_cfgkeys, matching the runtime behavior ofparse_and_combine_datasets.related: #15200