Commit d3748c2
Allow basename of dataset paths to match registered names (#997)
### What does this PR do?
Allow local dataset paths to match registered dataset configs
Type of change: Bug fix
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
  * Added a small sample dataset entry (minipile_100_samples) and support for loading datasets from local filesystem paths with automatic detection and config override.
* **Chores**
  * Improved local-path resolution and substring-based matching against registered dataset keys for consistent behavior.
* **Tests**
  * Added a unit test to verify loading samples from a local dataset snapshot.
* **Documentation**
  * Updated docs to describe local-path support and matching behavior, and updated the function docstring.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
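The matching behavior summarized above can be illustrated with a minimal sketch. This is not the code from this commit: the registry contents, the config fields, and the helper name `match_registered_key` are all hypothetical, and the fallback order (exact name, then basename, then substring) is an assumption based only on the PR title and summary.

```python
import os

# Hypothetical registry for illustration only. The real registered keys and
# config fields in the repository may differ.
REGISTERED_DATASETS = {
    "minipile_100_samples": {"split": "train", "text_field": "text"},
    "example_corpus": {"split": "train", "text_field": "content"},
}


def match_registered_key(name_or_path, registry):
    """Return the registered key matching a dataset name or local path, or None."""
    # An exact registered name wins outright.
    if name_or_path in registry:
        return name_or_path
    # Normalize away trailing separators so "/data/foo/" yields "foo".
    basename = os.path.basename(os.path.normpath(name_or_path))
    if basename in registry:
        return basename
    # Assumed fallback: substring match, preferring the longest key
    # to reduce ambiguity between overlapping names.
    candidates = [key for key in registry if key in basename]
    return max(candidates, key=len) if candidates else None
```

Under these assumptions, a local snapshot directory such as `/data/snapshots/minipile_100_samples/` would resolve to the `minipile_100_samples` entry, while a path whose basename contains no registered key would return `None` and leave the caller to handle the unregistered case.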
---------
Signed-off-by: Chenhan Yu <chenhany@nvidia.com>
Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
Co-authored-by: Chenhan Yu <chenhany@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 6d77ce7 commit d3748c2
5 files changed
Lines changed: 153 additions & 127 deletions
File tree
- examples/megatron_bridge
- modelopt/torch/utils
  - plugins
- tests
  - gpu_megatron/torch/utils/plugins
  - unit/torch/utils