Commit 2a5d4b0
Fix waiting for lock acquisition during online training (#566)
## Purpose
#424 added lock-waiting logic, but only for the first load attempt of the
hidden states file. When that first load (intended for pre-existing data)
fails, hidden state generation is triggered and new hidden states are
written. The newly generated hidden states were then loaded without
waiting for the lock.
## Description
Make `_maybe_load_hs_file` a helper function so that it can be used by
both loads.
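
The pattern described above can be sketched as follows. This is a minimal illustration, not the code from this commit: it assumes the lock is a sibling `<file>.lock` path that the writer removes when it finishes, and the names `wait_for_unlock`, `maybe_load_file`, and `get_sample` are hypothetical.

```python
import time
from pathlib import Path
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for_unlock(lock_path: Path, timeout: float = 5.0, poll: float = 0.05) -> bool:
    """Return True once lock_path no longer exists, False if the timeout elapses first."""
    deadline = time.monotonic() + timeout
    while lock_path.exists():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True


def maybe_load_file(
    path: Path, loader: Callable[[Path], T], timeout: float = 5.0
) -> Optional[T]:
    """Wait for any writer's lock to clear, then load path; return None if unavailable."""
    # Assumption: the writer holds a sibling "<name>.lock" file while writing.
    lock_path = path.with_name(path.name + ".lock")
    if not wait_for_unlock(lock_path, timeout):
        return None
    if not path.exists():
        return None
    return loader(path)


def get_sample(
    path: Path,
    loader: Callable[[Path], T],
    generate: Callable[[Path], None],
    timeout: float = 5.0,
) -> Optional[T]:
    """Load pre-existing data if present; otherwise generate it and load again."""
    # First attempt: intended for pre-existing data.
    data = maybe_load_file(path, loader, timeout)
    if data is None:
        generate(path)
        # Second attempt goes through the same helper, so it also waits for the lock.
        data = maybe_load_file(path, loader, timeout)
    return data
```

Routing both load attempts through one helper means the lock wait cannot be skipped on the second path, which is the failure mode described under Purpose.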
## Tests
Verified locally that this fixes the following warning, which subsequently
caused training to crash:
```
speculators/src/speculators/train/data.py:300: UserWarning: Failed to load/cache hidden states for sample 29: No such file or directory:
/tmp/pytest-of-fynnsu/pytest-55/test_online_smoke_Qwen_Qwen3_V0/hidden_states/chatcmpl-b549e54beee9b9c2-93723e8e.safetensors
```
I have filled in:
- [x] The purpose of the PR, such as "Fix some issue (link existing
issues this PR will resolve)".
- [x] The test plan/results, such as providing test command and pasting
the results.
- [ ] (Optional) The necessary documentation update.
- [x] I (a human) have written or reviewed the code in this PR to the
best of my ability.
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>

1 parent 1ad7048 commit 2a5d4b0
1 file changed
Lines changed: 15 additions & 15 deletions
| Original file lines | New file lines | Change |
|---|---|---|
| | 181-191 | 11 lines added |
| 261-273 | | 13 lines removed |
| 290 | 288 | 1 line replaced |
| 309 | 307-309 | 1 line replaced by 3 |