Commit 7fed895
* Backfill #30 wetlands, fix metals extractor bug, lint cleanup, cross-repo validator
Combines three follow-ups against #30 (cross-repo environmental linking)
with a separate lint cleanup. All four changes build on each other and
share the same test surface.
1. Wetland backfill (#30 Phase 5)
Apply the SPRUCE related_ingredients pattern to 6 more peatland and
wetland communities (Stordalen Mire, Prairie Pothole, MUCC Freshwater
Wetland, Asgard Wetland Soil, Coastal Forested Wetland, Wetland
Oxygen-Sulfate GHG). Each entry uses CHEBI terms and evidence
anchored to already-cached PubMed abstracts; no MediaIngredientMech
IDs are minted yet.
2. Metals extractor bug fix + 65-file cleanup
metal_extraction.py used plain substring matching against 2-letter
element symbols ('ti' for TITANIUM, 'au' for GOLD), which matched
inside unrelated words ('characteristic', 'australia'). This wrongly
added TITANIUM to metals_present in 56/67 metal-annotated YAMLs and
GOLD in several more. Switched to case-insensitive regex matching
that requires a non-alphanumeric boundary on both sides of the
symbol, with tests pinning the behavior.
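A minimal sketch of the boundary matching described above. The keyword table and function names here are hypothetical, since the real contents of metal_extraction.py are not shown in this commit; only the 'ti'/'au' symbols and the two false-positive words come from the message.

```python
from __future__ import annotations

import re

# Hypothetical keyword table (assumption): the real extractor's lists are not shown.
METAL_KEYWORDS = {
    "TITANIUM": ["titanium", "ti"],
    "GOLD": ["gold", "au"],
}


def boundary_pattern(keyword: str) -> re.Pattern[str]:
    """Match the keyword only when it is not flanked by a letter or digit."""
    return re.compile(
        rf"(?<![A-Za-z0-9]){re.escape(keyword)}(?![A-Za-z0-9])",
        re.IGNORECASE,
    )


def find_metals(text: str) -> set[str]:
    """Return the metals whose keywords occur as boundary-delimited tokens."""
    found = set()
    for metal, keywords in METAL_KEYWORDS.items():
        if any(boundary_pattern(k).search(text) for k in keywords):
            found.add(metal)
    return found
```

With plain substring matching, 'characteristic' and 'australia' would both trigger a hit; with the lookarounds they do not, while forms such as "Ti-rich" or "Au(III)" still match because the neighboring characters are not alphanumeric.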
scripts/clean_metals_inplace.py re-runs extraction and rewrites only
the metals_present / rare_earth_elements_present / metal_relevance /
metal_notes blocks via line-based replacement, preserving comments
and unrelated formatting (unlike backfill_metals.py's yaml.dump
path). Applied once across the corpus: 65 community YAMLs corrected.
3. Lint cleanup (`just lint`: ruff/black)
178 pre-existing ruff errors -> 0. Removed T20 (print) from the
ruff selection with rationale: src/communitymech/ ships CLI entry
points that legitimately use print. The remaining 44 non-print
errors were fixed inline (unused imports, raise-from chains,
collapsible ifs, redundant list() calls, zip strict, line splits,
import order in batch_reporter.py) or suppressed with a per-file
E501 ignore for llm/prompts.py (long prompt strings) and targeted
`# noqa` lines with comments for S301/S701/S704/S112 cases that
are intentional within their internal-only contexts. mypy still
reports 256 pre-existing errors and is out of scope here.
4. Cross-repo ID validator (#30 Phase 3, local half)
New module communitymech.validators.cross_repo_ids with a
pattern + existence checker, plus a CLI
(scripts/validate_cross_repo_ids.py) and justfile entries
(validate-cross-repo-ids, validate-cross-repo-ids-all). Sibling
repo paths are opt-in via env or flags; when omitted, the
validator emits info-level skip notices rather than silently
passing. 10 new tests cover pattern, existence, and edge cases.
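A rough sketch of the pattern + existence flow with opt-in sibling data. Everything named here is an assumption for illustration: the ID shape (`PREFIX:digits`), the function name `check_ids`, and the idea of passing known IDs as a set. The actual communitymech.validators.cross_repo_ids API is not shown in this commit.

```python
from __future__ import annotations

import re

# Assumed ID shape (hypothetical): an alphabetic prefix, a colon, then digits.
ID_PATTERN = re.compile(r"^[A-Za-z]+:\d+$")


def check_ids(ids: list[str], known_ids: set[str] | None = None) -> list[tuple[str, str]]:
    """Return (level, message) findings for each ID.

    known_ids is None when no sibling repo was configured; in that case the
    existence check is skipped with an info notice instead of silently passing.
    """
    findings = []
    for identifier in ids:
        if not ID_PATTERN.match(identifier):
            findings.append(("error", f"{identifier}: does not match the expected pattern"))
        elif known_ids is None:
            findings.append(("info", f"{identifier}: existence check skipped (no sibling repo)"))
        elif identifier not in known_ids:
            findings.append(("error", f"{identifier}: not found in sibling repo"))
    return findings
```

Returning an explicit info-level finding for the skipped case keeps a run without sibling paths distinguishable from a run in which every ID was actually verified.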
Test plan: just test (136 passed, 9 skipped), just validate-all (all
265 communities clean), ruff/black green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Address Copilot review: surgical removal-only metals cleanup
Copilot flagged two serious bugs in scripts/clean_metals_inplace.py
from the prior commit:
1. _replace_scalar rewrote metal_notes by substituting only the first
physical line of the YAML key. When the existing value spanned
multiple lines (as PyYAML's folded scalars often do), the indented
continuation lines were left orphaned and silently re-folded by
the parser into the new value — producing strings like
"...(context-validated) measurements; ...(context-validated)" and,
on Ngawha, merging curator prose about mercury cycling into the
auto-generated note.
2. The script unconditionally overwrote metal_notes and metal_relevance
and removed any metals_present entries the (newly fixed) extractor
wouldn't infer. That clobbered curator-authored values (Ngawha's
MERCURY + curator note, Oak Ridge's NICKEL/COBALT/ZINC, Bayan Obo
notes, etc.) — entries the extractor cannot derive but that are
curator decisions to keep.
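The first bug can be illustrated with a small, self-contained sketch. This is not the script's code: the function names and the sample document (including the metal_relevance value) are made up, and the sketch assumes top-level keys whose continuation lines are indented.

```python
def replace_first_line_only(lines, key, new_value):
    """Buggy variant: rewrites the key's first physical line and nothing else."""
    out = []
    for line in lines:
        if line.startswith(f"{key}:"):
            out.append(f"{key}: {new_value}")
        else:
            out.append(line)  # old continuation lines survive untouched
    return out


def replace_whole_scalar(lines, key, new_value):
    """Variant that also drops the indented continuation lines of the old value."""
    out = []
    skipping = False
    for line in lines:
        if line.startswith(f"{key}:"):
            out.append(f"{key}: {new_value}")
            skipping = True
            continue
        if skipping and line.startswith((" ", "\t")):
            continue  # still inside the old multi-line value
        skipping = False
        out.append(line)
    return out


# Made-up sample data for illustration only.
doc = [
    "metal_notes: Curator prose about mercury cycling that",
    "  wraps onto a second physical line.",
    "metal_relevance: HIGH",
]
```

In the buggy variant the indented line remains directly under the new value, so a YAML parser would fold it into that value. The rewritten script avoids the problem differently, by never touching metal_notes at all.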
Reverted all 65 YAMLs the prior commit touched, then rewrote the script
to be surgical:
- Touches only metals_present. Never reads or writes metal_relevance
or metal_notes, which sidesteps the multi-line scalar bug entirely
and preserves curator metadata.
- Removes only entries whose extractor keyword list contains a known
ambiguous short symbol (`ti`/`au`/`pd`) AND whose unambiguous tokens
(full element name, charged ionic forms) do not appear anywhere
else in the file as word-bounded tokens. Anything else is kept,
including curator-added entries the extractor couldn't have inferred.
- Never adds metals. Surprising additions (e.g., Trichodesmium IRON
via newly-correct CHEBI tier-1 matching) are out of scope; running
`scripts/backfill_metals.py --dry-run` surfaces them for separate
curator review.
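The removal rule in the list above can be sketched roughly as follows. The keyword table, the ionic forms, and the function names are hypothetical; only the `ti`/`au`/`pd` symbols and the keep-by-default behavior come from the message. The sketch assumes `other_text` is the file content outside the metals_present block.

```python
from __future__ import annotations

import re

AMBIGUOUS_SYMBOLS = {"ti", "au", "pd"}

# Hypothetical extractor keyword lists (assumption).
KEYWORDS = {
    "TITANIUM": ["ti", "titanium", "ti4+"],
    "GOLD": ["au", "gold", "au3+"],
    "MERCURY": ["mercury", "hg2+"],
}


def has_token(text: str, token: str) -> bool:
    """True if token occurs with no letter or digit directly on either side."""
    pattern = rf"(?<![A-Za-z0-9]){re.escape(token)}(?![A-Za-z0-9])"
    return re.search(pattern, text, re.IGNORECASE) is not None


def should_remove(metal: str, other_text: str) -> bool:
    keywords = KEYWORDS.get(metal, [])
    if not any(k in AMBIGUOUS_SYMBOLS for k in keywords):
        return False  # no ambiguous symbol involved (or unknown metal): keep
    unambiguous = [k for k in keywords if k not in AMBIGUOUS_SYMBOLS]
    # Remove only when no unambiguous token backs the entry up.
    return not any(has_token(other_text, k) for k in unambiguous)


def prune(metals_present: list[str], other_text: str) -> list[str]:
    """Removal-only cleanup: never adds metals, keeps anything not provably spurious."""
    return [m for m in metals_present if not should_remove(m, other_text)]
```

Because unknown or curator-added metals fall through to "keep", the function can only shrink the list, which matches the removal-only intent.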
Result: 56 files changed (down from 65); each diff is a 1-2 line
removal of TITANIUM and/or GOLD. Ngawha MERCURY, Oak Ridge metals, and
all curator metal_notes are preserved verbatim. 136 tests pass, and all
265 communities validate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 84af7bd commit 7fed895
83 files changed
Lines changed: 1080 additions & 142 deletions
File tree
- docs
- kb/communities
- scripts
- src/communitymech
  - embedding
  - llm
  - network
  - utils
  - validators
  - visualization
- tests