Commit 0d162d5
Optimize load() by avoiding redundant hash checks and unpacking (#1081)
### Description
Pooch by default revalidates file hashes and re-unpacks archives on
every call, which is very slow for large checkpoints. This change
introduces a `.checked` marker file that stores the resolved resource
path once verification succeeds. Subsequent calls reuse this cached path
instead of repeating the expensive validation and extraction steps.
Key changes:
- Use a `.checked` file alongside the cached resource to record the
verified path.
- Read the resolved path from the `.checked` file if it exists, bypassing re-validation.
- Ensure `.checked` is written after successful retrieval/unpacking.
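The marker-file pattern described above can be sketched as follows. This is a minimal, hypothetical illustration, not the code from this commit: `load_with_checked_marker` and the `fetch_and_verify` callable (standing in for the Pooch hash check plus unpacking) are assumed names. The check that the cached path still exists and the atomic write through a temporary file are extra safeguards added for the sketch.

```python
import logging
from pathlib import Path
from typing import Callable

logger = logging.getLogger(__name__)


def load_with_checked_marker(cache_file: Path, fetch_and_verify: Callable[[], Path]) -> Path:
    """Hypothetical helper: reuse a `.checked` marker instead of re-verifying.

    `cache_file` is where the downloaded resource lives; `fetch_and_verify`
    stands in for the expensive hash check + unpack step and returns the
    final (possibly extracted) path.
    """
    marker = cache_file.with_name(cache_file.name + ".checked")

    # Fast path: an earlier call already verified and unpacked the resource.
    if marker.exists():
        cached = Path(marker.read_text().strip())
        if cached.exists():
            logger.debug("Using cached path %s from %s", cached, marker)
            return cached
        # The recorded path was deleted; fall through and redo the work.

    resolved = fetch_and_verify()

    # Write the marker only after retrieval/unpacking succeeded. Writing to a
    # temporary file and renaming it means readers never see a partial marker.
    tmp = marker.with_name(marker.name + ".tmp")
    tmp.write_text(str(resolved))
    tmp.replace(marker)
    return resolved
```

Because the marker is written only after `fetch_and_verify` returns, a failed or interrupted download leaves no marker, and the next call simply repeats the full verification path.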
### Type of changes
<!-- Mark the relevant option with an [x] -->
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Refactor
- [ ] Documentation update
- [ ] Other (please describe):
### CI Pipeline Configuration
Configure CI behavior by applying the relevant labels:
- [SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci) - Skip all continuous integration tests
- [INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests) - Execute notebook validation tests in pytest
- [INCLUDE_SLOW_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_slow_tests) - Execute tests labelled as slow in pytest for extensive testing
> [!NOTE]
> By default, notebook validation tests are skipped unless explicitly enabled.
#### Authorizing CI Runs
We use
[copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation)
to manage authorization of CI
runs on NVIDIA's compute resources.
- If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will automatically be copied to a `pull-request/`-prefixed branch in the source repository (e.g. `pull-request/123`).
- If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an `/ok to test` comment on the pull request to trigger CI. This must be repeated for each new commit.
### Usage
<!--- How does a user interact with the changed code -->
```python
# TODO: Add code snippet
```
### Pre-submit Checklist
<!--- Ensure all items are completed before submitting -->
- [ ] I have tested these changes locally
- [ ] I have updated the documentation accordingly
- [ ] I have added/updated tests as needed
- [ ] All existing tests pass successfully
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- New Features
  - Cache-based early exit to reuse previously verified and unpacked checkpoints, avoiding redundant downloads.
  - Automatic unpacking/decompression during retrieval based on file type.
- Performance
  - Faster subsequent loads by skipping repeated integrity checks and extraction on cache hits.
- Refactor
  - Unified post-retrieval path handling across flows; no public API changes.
- Chores
  - Added debug logs to indicate when cached paths are used, for improved traceability.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
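The summary's mention of unpacking based on file type could look roughly like the helper below. It is a hypothetical sketch: `pick_unpack_step` and its return labels are invented for illustration, and it only maps a file name to the Pooch processor that would apply (`pooch.Untar`, `pooch.Unzip`, or `pooch.Decompress`) instead of calling Pooch, so it runs with the standard library alone.

```python
from typing import Optional


def pick_unpack_step(filename: str) -> Optional[str]:
    """Hypothetical helper: map a downloaded file name to its unpack step.

    Labels correspond to Pooch processors: "untar" -> pooch.Untar,
    "unzip" -> pooch.Unzip, "decompress" -> pooch.Decompress.
    None means the file is used as-is.
    """
    name = filename.lower()
    # Check tar archives first, because "x.tar.gz" also ends with ".gz".
    if name.endswith((".tar", ".tar.gz", ".tgz", ".tar.bz2", ".tar.xz")):
        return "untar"
    if name.endswith(".zip"):
        return "unzip"
    # Single compressed files (not archives) only need decompression.
    if name.endswith((".gz", ".bz2", ".xz")):
        return "decompress"
    return None
```

Tar suffixes are tested before bare `.gz`/`.bz2`/`.xz` because a compressed tarball such as `ckpt.tar.gz` needs extraction, not just decompression.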
---------
Signed-off-by: Anton Vorontsov <avorontsov@nvidia.com>

Parent commit: a114094
1 file changed, 16 additions and 4 deletions.
Diff content was not captured. Hunks in the changed file (original line numbers): 10 lines added after line 197; line 200 replaced by one line; lines 210 and 211 replaced by one line; line 213 replaced by four lines.