Commit d32128a
Mypy green + 5 community backfills + keyword_in_text consolidation (#81)
* Mypy green + 5 community backfills + 2-line helper consolidation
Five threads are bundled because they share test surface: mypy is newly
green, and the new community YAML edits and the metals_present additions
all touch the same metal_extraction pipeline.
1. Consolidate `keyword_in_text` (was task 5)
Renamed `_keyword_in_text` -> `keyword_in_text` in metal_extraction
and made it the single source of word-boundary keyword matching.
`scripts/clean_metals_inplace.py` imports it instead of carrying its
own copy. Test fixture updated to the new name.
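A minimal sketch of what a single shared word-boundary matcher can look like. The `(keyword, text)` argument order and the case-insensitive matching are assumptions; this commit does not show the real signature. The sketch uses `(?<!\w)` / `(?!\w)` lookarounds rather than `\b`, so that keywords ending in punctuation, such as `iron(2+)`, still match.

```python
import re


def keyword_in_text(keyword: str, text: str) -> bool:
    """Return True if `keyword` occurs in `text` as a whole word (sketch)."""
    # re.escape keeps characters like "(" and "+" in "iron(2+)" literal.
    # The lookarounds require that no word character sits directly before
    # or after the match, so "iron" does not match inside "environment".
    pattern = rf"(?<!\w){re.escape(keyword)}(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```

A plain substring check (`"iron" in "a soil environment"`) returns True, which is the kind of false positive a word-boundary matcher is meant to rule out.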
2. Apply deferred metals additions (was task 4)
The fixed extractor would now produce three legitimate additions the
prior buggy code missed (CHEBI tier-1 / strong-context tier-3
matches). Each was hand-verified against the source description and
added by targeted Edit (not script run) to preserve curator
formatting:
- Methane_Oxidation_CrVI_Reduction_SynCom: + CHROMIUM (community
name + description are explicitly about Cr(VI) reduction).
- Ngawha_Geothermal_Mercury_Cycling: + IRON alongside MERCURY (the
description names "sulfur- and iron-cycling bacteria").
- Trichodesmium_Alteromonas_Marine_Consortium: + IRON (description
names "iron and phosphorus acquisition"; metabolites include
CHEBI iron(2+)).
3. Mypy green (was task 3)
`just lint` now passes mypy with 0 errors (down from 257 pre-existing errors).
Mostly mechanical:
- Added type stubs to dev extras: types-PyYAML, types-requests,
types-tqdm, pandas-stubs (cut 23 import-untyped errors).
- Excluded the auto-generated LinkML datamodel from mypy
(`exclude = ["src/communitymech/datamodel/communitymech\\.py"]`),
same as ruff already does. That removed 147 attr-defined reports
against the generated file.
- Added module overrides for anthropic, kg_microbe_browser, umap,
communitymech.datamodel.* (silences import-not-found / -untyped
for opt-in deps and sibling-repo modules).
- Disabled warn_return_any: most reports were requests `Response.json()` /
`.get(...)` chains feeding back into annotated return types;
casting at every site is churn without improving safety. A WHY-comment
in pyproject records this.
- cli.py: removed Console = None reassignment via `# type: ignore`,
made repair handlers guard `console is not None` before calling
into Console-typed helpers, fixed implicit-Optional default on
`report: Path | None = None`.
- kgx_export.py: renamed loop variable `for e in ...` to
`for edge in ...` so it doesn't conflict with the prior
`except Exception as e:` scope (Python's `except as e` deletes
the name; mypy's flow analysis was reading the for-loop as a
reference to the deleted name and reporting 12 "deleted variable"
misc errors).
- batch_reporter.py: typed the report dict as `dict[str, Any]` so
`+=` and `.append()` calls type-check.
- metal_extraction.py: fixed the return annotation of
`extract_all_metals_summary` (the function returns a nested
`dict[str, dict[str, int]]`, not `dict[str, int]`).
- literature.py / uniprot_reference_proteomes.py: typed `pmid` /
`url` as `str | None` so reassignments to fetch-returns
type-check.
- Smaller: `interaction_types: Counter[str]`,
`_requests_this_minute: list[float]`, a couple of small noqas with
rationale.
4. AMD/biomining/REE related_ingredients backfill (was task 1)
Scope-limited to a representative subset (3 of 19 candidates) to keep the
PR reviewable. Each entry uses CHEBI terms with snippets taken
verbatim from already-cached PMID/DOI abstracts:
- Tinto_River_Iron_Cycling_Community: iron(3+), iron(2+), sulfide
(anchored to "all related to the iron cycle" and "metabolic
activity of chemolithotrophic microorganisms thriving in the
rich complex sulfides of the Iberian Pyrite Belt"). PMID:25369810
/ doi:10.1128/aem.69.8.4853.
- Oak_Ridge_FRC_Uranium_Nitrate_Groundwater_Community: uranyl ion,
nitrate, iron(3+) (anchored to PMID:22988623 reports of stimulated
U-reducers, the nitrate 44 to 23,400 mg/L gradient, and selectively
stimulated iron reducers like Stenotrophomonas).
- AMD_Acidophile_Heterotroph_Network: iron(2+), iron(3+) (anchored
to doi:10.1007/s11356-014-3789-4 qPCR data on iron-oxidizing
acidophiles and the heterotroph dominance over chemolithotrophs).
The remaining 16 AMD/biomining/REE candidates and the deferred
broader backfill can be a follow-up round.
5. Gut/rhizosphere related_ingredients backfill (was task 2)
Two representative communities to broaden ENVO coverage:
- Bifidobacterium_Ruminococcus_Infant_HMO_CrossFeeding:
2'-fucosyllactose (CHEBI:140503), lactose (CHEBI:36219).
PMID:37973815 supports 2'FL as the curated substrate and lactose
as the R. gnavus -> B. breve cross-feeding currency.
- Avena_Rhizosphere_Detritusphere_Niche_Succession: polysaccharide
(root polysaccharides), organic matter (root detritus).
PMID:31953507 supports both as substrate classes structuring the
succession guilds.
Test plan: just test (136 passed), just validate-all (all 265 clean),
just lint (ruff + black + mypy all green for the first time).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Address Copilot review: doc + noqa placement
- scripts/clean_metals_inplace.py: drop the `PYTHONPATH=src` prefix
from the usage docstring. The script self-bootstraps via
`sys.path.insert(0, .../src)`, so the env-var was misleading. Added
a one-line note explaining the bootstrap.
- src/communitymech/render_community_pages.py: moved the
`# type: ignore[import-not-found]` from the symbol line to the
`from kg_microbe_browser import (` line, where mypy actually emits
the diagnostic. The mypy override in pyproject already covers this,
so the ignore is belt-and-suspenders, but it's now on the line
mypy reports if the override is ever removed.
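For a multi-line import, the placement described above looks roughly like this. The `try`/`except ImportError` guard and the imported name `render_page` are hypothetical. They are there only so the sketch runs whether or not the sibling-repo package is installed; the commit does not say render_community_pages.py uses such a guard.

```python
# Sketch only: `render_page` is a hypothetical symbol name.
try:
    # mypy attaches import-not-found to the line where the import statement
    # begins, so the ignore comment sits on the `from ... import (` line
    # rather than on the line naming the imported symbol.
    from kg_microbe_browser import (  # type: ignore[import-not-found]
        render_page,
    )
except ImportError:
    render_page = None
```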
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 7fed895 commit d32128a
22 files changed
Lines changed: 397 additions & 69 deletions
File tree
- kb/communities
- scripts
- src/communitymech
- embedding
- export
- llm
- network
- tests