Commit 85f6b2e
OpenAlex/Semantic Scholar/Europe PMC fallbacks + snippet fixes (62 -> 43) (#58)
* Add OpenAlex / Semantic Scholar / Europe PMC fallbacks + snippet fixes
Drive `just validate-references-all` errors from 62 to 43 by extending
the literature fetcher's fallback chain past PubMed/PMC and repairing
the snippets that newly-available abstracts surfaced.
Fetcher (src/communitymech/literature.py):
- fetch_openalex_abstract(): reconstruct linear text from OpenAlex's
abstract_inverted_index (term -> [positions]). Covers older non-OA
titles (pre-1995 IJSEM, Springer, Elsevier) that Crossref/DataCite
do not have abstracts for.
- fetch_semantic_scholar_abstract(): GET /graph/v1/paper/DOI:<doi>
?fields=abstract. Sometimes carries recent Elsevier abstracts that
Crossref hides.
- fetch_europepmc_abstract(): EPMC core search by DOI. Broader life-
science coverage than US PMC; mirrors abstracts for some Springer/
Wiley records.
- fetch_paper() fallback chain extended to:
Crossref -> PMID/PubMed -> PMCID/PMC -> OpenAlex
-> Semantic Scholar -> Europe PMC
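A minimal sketch of the two ideas in the list above: rebuilding linear text from an inverted index of the `term -> [positions]` shape, and trying sources in order until one yields an abstract. The names `abstract_from_inverted_index` and `first_abstract` are hypothetical and are not the functions in `literature.py`; skipping a source that raises is an assumption, not confirmed behaviour.

```python
from typing import Callable, Dict, Iterable, List, Optional


def abstract_from_inverted_index(index: Dict[str, List[int]]) -> str:
    """Rebuild linear text from a term -> [positions] mapping."""
    by_position: Dict[int, str] = {}
    for term, positions in index.items():
        for pos in positions:
            by_position[pos] = term
    # Emit terms in ascending position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


def first_abstract(
    doi: str,
    fetchers: Iterable[Callable[[str], Optional[str]]],
) -> Optional[str]:
    """Try each fetcher in order; return the first non-empty abstract."""
    for fetch in fetchers:
        try:
            text = fetch(doi)
        except Exception:
            # Assumption: one failing source should not abort the chain.
            continue
        if text and text.strip():
            return text
    return None
```

Passing the fetchers as an ordered list keeps the chain order (Crossref first, Europe PMC last) visible in one place at the call site.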
Cache refresh (recovers 17 ERROR rows across 6 communities):
- DOI_10.1099_00207713-36-2-197 (Wichlacz 1986 Acidiphilium taxonomy,
OpenAlex)
- DOI_10.1016_j.hydromet.2020.105484 (NEMO Terrafame bioleaching,
OpenAlex; cited 10x)
- DOI_10.1186_s12302-025-01103-y (OpenAlex)
- DOI_10.1134_S0026261716060059 (OpenAlex)
- DOI_10.1007_s13213-019-01453-y (OpenAlex)
- DOI_10.1007_s11274-009-0047-x (Europe PMC)
- DOI_10.1016_j.cej.2020.125159 -> 10.1016/j.cej.2020.124801 (typo
in original DOI; the wrong DOI points to an OLED chemistry paper.
Correct DOI located via Crossref title search; cache written via
Semantic Scholar for the right paper.)
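The cache keys above follow a visible pattern: the DOI with `/` replaced by `_`, prefixed with `DOI_`, with a `.md` extension on disk. A sketch of that mapping, inferred only from the names in this message; `cache_filename` is a hypothetical helper, and how the real code treats other special characters is not known from this commit.

```python
def cache_filename(doi: str) -> str:
    """Map a DOI to a cache file name of the form seen in references_cache/."""
    # Assumption: only "/" is rewritten; other characters pass through as-is.
    return "DOI_" + doi.replace("/", "_") + ".md"
```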
Snippet repairs on the 7 newly-surfaced "Text part not found" errors:
- AMD_Acidophile_Heterotroph_Network: three snippets were
paraphrases rather than quotes from the 1986 abstract. The paper
describes A. rubrum/angustum/facilis, not A. multivorum, and does
not state pH 2.5-3.5 or 30-35 deg C optima. Replaced with verbatim
substrings; downgraded to PARTIAL/WRONG_STATEMENT as appropriate.
- Chromium_Sulfur_Reduction_Enrichment: DOI typo fixed and snippet
rewritten; second snippet downgraded to PARTIAL where the review
abstract supports the bioremediation method but not the specific
numeric figure.
- Industrial_Bioreactor_Consortium / Rammelsberg_Cobalt_Nickel_
Tailings: replaced paraphrases with verbatim substrings.
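A sketch of the kind of check that produces "Text part not found": every part of a snippet must occur verbatim in the cached abstract. The function `missing_parts` is hypothetical, and both the whitespace/case normalization and the `..` part separator are assumptions about the validator, not its confirmed behaviour.

```python
import re
from typing import List


def _normalize(text: str) -> str:
    """Collapse whitespace runs and lowercase, so line wraps do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def missing_parts(snippet: str, abstract: str, splitter: str = "..") -> List[str]:
    """Return the snippet parts that are not substrings of the abstract."""
    haystack = _normalize(abstract)
    parts = [part.strip() for part in snippet.split(splitter)]
    # An empty result means every part was found verbatim.
    return [part for part in parts if part and _normalize(part) not in haystack]
```

A paraphrase fails this check even when it is factually faithful, which is why the repairs above swap paraphrases for verbatim substrings.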
Remaining 43 "No content available" errors all map to papers whose
abstracts are not available from any source we can query (Springer
book chapters, paywalled older PNAS/AEM/FEMS Microbiology Ecology
articles without PubMed records, recent 2024-2025 Elsevier titles
that none of Crossref/OpenAlex/Semantic Scholar/Europe PMC has
indexed yet).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Address Copilot review on PR #58
- Rammelsberg_Cobalt_Nickel_Tailings snippet: replace the
`..`-split snippet with the full verbatim run from the abstract:
"Cobalt dissolution kinetics were highly improved by the bacterial
activity, whatever the consortium. This is consistent with the
presence of Co in the pyrite in the secondary ore". One contiguous
substring is cleaner than the double-period splitter trick when
the full phrase is available verbatim.
- Industrial_Bioreactor_Consortium snippet: replace the pH/gene-
expression methodology line with the abstract's verbatim taxonomic
finding that "Sulfobacillus thermosulfidooxidans and
Acidithiobacillus caldus were the dominant species during the
early stage". That directly supports A. caldus' early-stage
sulfur-oxidizing role in the consortium, which is what the
evidence is being used to back.
- references_cache/DOI_10.1016_j.cej.2020.124801.md: replace the
`Anonymous` author stub with the actual Crossref author list
(Zhao, Sun, Li, Yu, Jin, Wang, Liang, Zhang).
- src/communitymech/literature.py: add on-disk caching to
fetch_openalex_abstract, fetch_semantic_scholar_abstract, and
fetch_europepmc_abstract via a shared `_abstract_cache_path`
helper. Mirrors the cache pattern already used by
fetch_pubmed_abstract and fetch_pmc_abstract, so repeated runs
(validator, refresh scripts, smoke tests) do not re-hit the
rate-limited external APIs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 99f1c97 commit 85f6b2e
15 files changed
Lines changed: 1072 additions & 35 deletions
File tree
- kb/communities
- references_cache
- src/communitymech