Commit 763998f
feat(prepro): speed up Nextclade metadata lookup (#6453)
## Summary
This draft PR replaces `dpath.get()` in the Nextclade preprocessing
metadata path with a simple direct lookup for the dot-separated paths
used by Loculus metadata mappings. It also removes the now-unused
`dpath` Conda dependency and adds a focused unit test for the lookup
behavior.
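For illustration, a direct lookup over dot-separated paths can be very small. The sketch below assumes plain nested dicts and that a missing key should yield `None`. The helper name `get_nested` and the sample result object are hypothetical and are not the code added in this PR.

```python
from typing import Any


def get_nested(data: dict[str, Any], path: str) -> Any:
    """Return the value at a dot-separated path, or None if any key is missing.

    Hypothetical helper: walks nested dicts one key at a time, with no
    glob/wildcard matching (assumes wildcards were already stripped upstream).
    """
    current: Any = data
    for key in path.split("."):
        # Stop as soon as we hit a non-dict value or a missing key.
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Hypothetical Nextclade-like result object.
result = {"clade": "1A", "qc": {"overallScore": 3.2, "overallStatus": "good"}}
print(get_nested(result, "qc.overallScore"))  # 3.2
print(get_nested(result, "qc.missingField"))  # None
print(get_nested(result, "clade.sub"))        # None
```

Each path segment costs one membership test and one index, so the work grows with path depth rather than with the width of the result object.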
## Why
Investigation of west-nile preprocessing showed that the slow phase after
`Nextclade results available` was dominated by per-entry metadata
processing rather than by upload or taxonomy calls. The hot path was
`process_single -> get_output_metadata -> add_input_metadata -> add_nextclade_metadata -> dpath.get`.
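A call chain like this can be surfaced with Python's standard-library profiler. The notes do not say which tool was used, so the sketch below is only one possible approach; `process_batch` and `lookup` are hypothetical stand-ins for the real pipeline functions.

```python
import cProfile
import io
import pstats


def lookup(data: dict, path: str):
    # Hypothetical stand-in for the per-entry metadata lookup being profiled.
    for key in path.split("."):
        data = data[key]
    return data


def process_batch(entries: list[dict]) -> list:
    # Hypothetical batch loop: one lookup per entry, mirroring process_single.
    return [lookup(entry, "qc.overallScore") for entry in entries]


entries = [{"qc": {"overallScore": float(i)}} for i in range(1000)]

profiler = cProfile.Profile()
profiler.enable()
results = process_batch(entries)
profiler.disable()

# Sort by cumulative time so the most expensive call chain appears first.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(10)
print(buffer.getvalue())
```

Sorting by cumulative time lists callers above the callees they spend time in, which is how a chain such as `process_single -> ... -> dpath.get` becomes visible.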
Nextclade result objects can be large, and after the existing wildcard
truncation step Loculus only needs simple nested-dictionary paths here.
Direct traversal avoids the overhead of the general-purpose `dpath`
lookup.
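To show why a general-purpose lookup costs more, the sketch below compares a hypothetical glob-aware lookup with direct traversal on a wide, synthetic result object. `glob_get` is a simplified stand-in that pattern-matches every key at each level; it is not dpath's actual implementation. Absolute timings will vary by machine.

```python
import fnmatch
import timeit
from typing import Any


def glob_get(data: Any, path: str) -> Any:
    """Generic lookup that treats every path segment as a glob pattern.

    Hypothetical stand-in for a general-purpose library: at each level it
    scans *all* keys of each matched node and pattern-matches them.
    """
    matches = [data]
    for pattern in path.split("."):
        next_matches = []
        for node in matches:
            if isinstance(node, dict):
                next_matches.extend(
                    value
                    for key, value in node.items()
                    if fnmatch.fnmatchcase(str(key), pattern)
                )
        matches = next_matches
    # Mimic a single-value get: fail unless exactly one node matched.
    if len(matches) != 1:
        raise KeyError(path)
    return matches[0]


def direct_get(data: Any, path: str) -> Any:
    """Direct traversal: one dict lookup per segment, no pattern matching."""
    for key in path.split("."):
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


# Hypothetical wide object: 200 sibling keys at the top level, 50 below each.
result = {f"field{i}": {f"sub{j}": j for j in range(50)} for i in range(200)}
result["qc"] = {"overallScore": 3.2}

glob_time = timeit.timeit(lambda: glob_get(result, "qc.overallScore"), number=200)
direct_time = timeit.timeit(lambda: direct_get(result, "qc.overallScore"), number=200)
print(f"glob: {glob_time:.4f}s  direct: {direct_time:.4f}s")
```

On non-wildcard paths both functions return the same value, but the glob version touches every sibling key at each level, while direct traversal touches exactly one.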
Full notes:
https://gist.github.com/theosanderson-agent/657269613739be0d318f64a08d37bfa9
## Local timing
For a synthetic 100-entry west-nile batch using saved unprocessed data:
- Before: metadata/process phase around `26.348s`
- After this branch: `process_single_total=1.183s`; per entry, mean
`0.012s`, median `0.002s`, max `0.145s`
- Same run: `nextclade_enrich=1.884s`
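The following calculation works out the headline numbers. It assumes that the "before" and "after" figures cover the same phase, although the notes label them differently.

```python
# Figures copied from the timing notes above (100-entry west-nile batch).
entries = 100
before_s = 26.348       # metadata/process phase before this branch
after_total_s = 1.183   # process_single_total after this branch

speedup = before_s / after_total_s
per_entry_before_s = before_s / entries
per_entry_after_s = after_total_s / entries

print(f"speedup ~{speedup:.1f}x")  # speedup ~22.3x
print(f"per entry: {per_entry_before_s:.3f}s -> {per_entry_after_s:.3f}s")
# per entry: 0.263s -> 0.012s
```

The computed per-entry "after" value matches the reported mean of `0.012s`.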
## Validation
- `ruff format --diff .`
- `ruff check --diff .`
- `PYTHONPATH=src /usr/bin/python3.12 -m pytest tests/test_nextclade_preprocessing.py::test_get_nested_metadata_uses_simple_dot_paths`
🚀 Preview: https://codex-fast-nextclade-meta.loculus.org
Parent commit: 01ec896
3 files changed (36 additions & 8 deletions), under:
- preprocessing/nextclade
- src/loculus_preprocessing
- tests
Per-file line counts (diff contents not captured): 0 additions & 1 deletion; 13 additions & 6 deletions; 23 additions & 1 deletion.