Convert 3 more CommunityMech writers (intelligent_snippet_fixer + enhance_strain_data + add_evidence_source) by realmarcin · Pull Request #87 · CultureBotAI/CommunityMech

realmarcin · 2026-05-26T02:14:53Z

Summary

Continues the audit-machinery rollout from #84 (initial port: 5 writers) and #85 (audit-detector blind-spot fix + restore-on-failure pattern). Raises the number of CommunityMech writers gated through record_curation_event + write_validated_community from 5/16 to 8/16.

Each converted script now:

Loads the community YAML via yaml.safe_load
Mutates the in-memory dict
Calls record_curation_event(doc, curator=..., action=..., changes=..., llm_assisted=...)
Writes via write_validated_community(doc, path) — closed-schema LinkML gate refuses any drift
Catches ValidationFailedError per-record so one bad doc can't kill a batch

Conversions

Script	Action label	LLM	Notes
`scripts/intelligent_snippet_fixer.py`	`FIX_SNIPPETS_LLM`	yes	Uses `skip_if_recent=True` so a session of auto-approved snippet fixes collapses into one curation_history entry. Pre-existing `.yaml.bak_intelligent` backup (from `shutil.copy2` at session start) is unchanged — still the user-visible safety net.
`scripts/enhance_strain_data.py`	`ENHANCE_STRAIN_DATA`	no	Previously extract-only; added an `--apply` mode that writes `strain_designation` into matching `taxonomy[*]` entries via the validated writer. Default behavior preserves the historical extract-only flow (no `kb/communities` writes without `--apply`). `--overwrite` controls whether to replace existing curator-authored values.
`scripts/add_evidence_source.py`	`BACKFILL_EVIDENCE_SOURCE`	no	Backup-then-rename pattern from #85: original renamed to `.yaml.bak_source` before the validated write; on `ValidationFailedError` the backup is renamed back into place so the batch loop continues cleanly.

All existing CLI flags preserved (--auto, --interactive, --dry-run, --auto-approve, --only-invalid, --file, --verbose, --relaxed).
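Alongside those preserved flags, enhance_strain_data.py gains --apply and --overwrite. Their described behaviour can be sketched as below. This is a hypothetical illustration, not the script's code. apply_strains and build_parser are invented names. Taxonomy entries are assumed to be matched on a "name" key, and the --kb-dir default of kb/communities is an assumption.

```python
import argparse


def apply_strains(community, strains, overwrite=False):
    """Set strain_designation on matching taxonomy entries.

    Hypothetical: entries are matched on a 'name' key. Existing values are
    kept unless overwrite=True. Returns human-readable change descriptions.
    """
    changes = []
    for entry in community.get("taxonomy", []):
        name = entry.get("name")
        if name not in strains:
            continue
        existing = entry.get("strain_designation")
        if existing == strains[name]:
            continue  # already up to date
        if existing and not overwrite:
            continue  # preserve the curator-authored value
        entry["strain_designation"] = strains[name]
        changes.append(f"{name}: {existing!r} -> {strains[name]!r}")
    return changes


def build_parser():
    parser = argparse.ArgumentParser(description="Extract (and optionally apply) strain data.")
    parser.add_argument("--apply", action="store_true",
                        help="write into community YAMLs (default: extract only)")
    parser.add_argument("--overwrite", action="store_true",
                        help="replace existing strain_designation values")
    parser.add_argument("--kb-dir", default="kb/communities",
                        help="directory of community YAMLs (assumed default)")
    return parser
```

Keeping --apply off by default means a run without it cannot touch the knowledge base, matching the extract-only default the PR describes.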

After-state audit

=== writers audit summary (16 writers) ===
  appends curation_history:   8 / 16   (was 5/16)
  has write safeguard:        11 / 16
  validates before write:     9 / 16   (was 6/16)
  wired into justfile:        1 / 16
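Each line of the summary above is a count of "yes" cells in one column of the audit report. Below is a minimal sketch of that tally, not the actual scripts/audit_writers.py. It assumes the report is a TSV with one row per writer and yes/no cells. Apart from appends_curation_history and validates_before_write, which this PR names, the column layout, the sample rows and the helper names are hypothetical.

```python
import csv
import io


def summarize(tsv_text, columns):
    """Return {column: (yes_count, total_rows)} for a yes/no TSV report."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    total = len(rows)
    return {
        col: (sum(row[col].strip().lower() == "yes" for row in rows), total)
        for col in columns
    }


def format_summary(counts):
    """Render counts in the 'column:   yes / total' style of the summary."""
    width = max(len(col) for col in counts) + 1
    return "\n".join(
        f"  {(col + ':').ljust(width)}  {yes} / {total}"
        for col, (yes, total) in counts.items()
    )


# Hypothetical three-writer report.
REPORT = (
    "script\tappends_curation_history\tvalidates_before_write\n"
    "scripts/a.py\tyes\tyes\n"
    "scripts/b.py\tno\tyes\n"
    "scripts/c.py\tno\tno\n"
)
```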

The remaining un-converted writers (apply_strain_designations.py, apply_taxonomy_corrections.py, backfill_metals.py, clean_metals_inplace.py [in flight on a sibling PR], fix_reference_formats.py, plus a couple of src/ writers) follow the same pattern; left as future work to keep this PR focused.

Heads-up: pre-existing import errors (not changed by this PR)

scripts/add_evidence_source.py and scripts/intelligent_snippet_fixer.py import communitymech.literature_enhanced, a module that does not exist in this repo and never has, at module top level. Both scripts have therefore always failed at the import step when invoked from the CLI. This PR neither introduces nor fixes that; it's tracked separately. The conversion itself (imports plus writer routing) can be checked without importing the scripts, using ruff and ast.parse, and those are the checks this PR runs.
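Because these scripts fail at import time, a static check can spot the problem without executing them. Below is a minimal sketch, not part of this PR. It parses a script's source with ast and asks importlib.util.find_spec whether each module-level absolute import resolves. find_spec imports the parent packages of dotted names, so it may run package __init__ code.

```python
import ast
import importlib.util


def unresolved_top_level_imports(source):
    """List absolute imports in module-level statements that cannot be found."""
    missing = []
    for node in ast.parse(source).body:  # top-level statements only
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue  # relative imports, try/except blocks, etc. are skipped
        for name in names:
            try:
                found = importlib.util.find_spec(name) is not None
            except (ImportError, ValueError):
                found = False  # e.g. the parent package itself is missing
            if not found:
                missing.append(name)
    return missing
```

Pointed at the source of either script, this would report communitymech.literature_enhanced in an environment where that module is genuinely absent.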

Baseline (unchanged)

check	result
`uv run python scripts/validate_strict.py --quiet`	0 ERROR rows / 265 files
`uv run pytest tests/ -q`	136 passed, 9 skipped, 7 deselected
`uv run ruff check scripts/{add_evidence_source,enhance_strain_data,intelligent_snippet_fixer}.py`	net -4 errors (all I001 / F401 from removing unused imports); no new findings

Test plan

uv run python -c "import ast; ast.parse(open('scripts/<name>.py').read())" — all three parse
uv run ruff check scripts/<name>.py — net error reduction (removed pre-existing unused imports); no new findings
uv run python scripts/audit_writers.py — all 3 now show appends_curation_history=yes AND validates_before_write=yes
uv run python scripts/validate_strict.py --quiet — 0 ERROR rows / 265 files
uv run pytest tests/ — 136 passed, 9 skipped
uv run python scripts/enhance_strain_data.py --help — new --apply / --overwrite / --kb-dir flags surface cleanly

🤖 Generated with Claude Code

…d helpers

Brings CommunityMech writer coverage from 5/16 to 8/16. Continues the pattern established in PR #84 and refined by PR #85 (restore-on-failure backup handling): every script that mutates a community YAML loads it through yaml.safe_load, mutates the dict, records a CurationEvent via record_curation_event(), and writes it back via write_validated_community(), which gates on closed-schema LinkML validation.

Converted scripts:

- scripts/intelligent_snippet_fixer.py (LLM-driven snippet repair; llm_assisted=True; action=FIX_SNIPPETS_LLM). Uses skip_if_recent=True on the curation event so a session of auto-approved fixes collapses into a single trail entry instead of one per snippet. The existing .yaml.bak_intelligent backup created at session start by shutil.copy2 remains the user-visible safety net.
- scripts/enhance_strain_data.py (strain-ID enrichment; action=ENHANCE_STRAIN_DATA). Previously the script extracted strain data but only emitted a copy-paste snippets file; this PR adds an --apply mode that writes strain_designation entries back into matching taxonomy[*] entries via write_validated_community(). Default behavior preserves the historical extract-only flow (no kb/communities writes without --apply). --overwrite controls whether to replace existing curator-authored strain_designation values.
- scripts/add_evidence_source.py (evidence_source enum backfill; action=BACKFILL_EVIDENCE_SOURCE). Uses the backup-then-rename pattern from PR #85: the original is moved to .yaml.bak_source before the validated write; on ValidationFailedError the backup is renamed back in place so the batch loop can continue without leaving a half-written community on disk.

Each per-record loop continues on ValidationFailedError so one bad file can't kill the batch. CLI surfaces (--auto, --interactive, --dry-run, --auto-approve, --file, --apply, etc.) are preserved.

After-state: scripts/audit_writers.py reports 8/16 writers gated (was 5/16).

The remaining un-converted writers (apply_strain_designations, apply_taxonomy_corrections, backfill_metals, clean_metals_inplace, fix_reference_formats, plus a handful of smaller src/ writers) follow the same conversion pattern; left as future work to keep this PR focused.

Note: scripts/add_evidence_source.py and scripts/intelligent_snippet_fixer.py import communitymech.literature_enhanced, a module that does not exist in the repo, at module top level. This PR does not introduce or fix that; the scripts have always failed at the import step when invoked from the CLI. Out of scope for this conversion; tracked separately.

Baseline (unchanged):
- validate_strict: 0 ERROR rows / 265 files
- pytest tests/: 136 passed, 9 skipped

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR #86 (Convert clean_metals_inplace.py) just merged into main. After rebasing, scripts/clean_metals_inplace.py is gated, so its appends_curation_history / validates_before_write columns flip to yes. Re-running scripts/audit_writers.py produces a 1-row delta; commit it so the report reflects the actual post-rebase state.

Combined post-merge: 9/16 appends_curation_history, 10/16 validates_before_write (was 5/16 and 6/16 respectively at the start of this PR series).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Continues the audit-machinery rollout from PR #84/#85 by converting three more writers (intelligent_snippet_fixer.py, enhance_strain_data.py, add_evidence_source.py) to route through record_curation_event + write_validated_community, bringing coverage from 5/16 to 8/16.

Changes:

Convert three writers to use the curation-event + closed-schema-validated writer pattern, each with a script-specific action label
Add --apply / --overwrite / --kb-dir flags to enhance_strain_data.py (previously extract-only); the historical default still writes nothing to kb/communities/
Refresh reports/pipeline_writers_audit.tsv so the three converted scripts now show appends_curation_history=yes and validates_before_write=yes

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File	Description
scripts/intelligent_snippet_fixer.py	Wrap snippet-fix write in `record_curation_event(skip_if_recent=True, llm_assisted=True)` + `write_validated_community`; pre-existing `.yaml.bak_intelligent` backup remains the user-facing safety net
scripts/enhance_strain_data.py	Add `apply_strain_data_to_community` method and `--apply` / `--overwrite` / `--kb-dir` CLI flags; preserves curator-authored `strain_designation` values unless `--overwrite` is set
scripts/add_evidence_source.py	Summarize per-file changes in a curation event, rename original to `.yaml.bak_source`, write via validated writer, restore backup on `ValidationFailedError`
reports/pipeline_writers_audit.tsv	Update three rows to reflect new `appends_curation_history` / `validates_before_write` status
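The skip_if_recent=True option is what collapses a snippet-fixing session into one curation_history entry. The PR does not spell out its rule, so the sketch below assumes one. The append is skipped when the most recent entry has the same curator and action and is younger than a hypothetical 30-minute window. The function is a stand-in, not the CommunityMech implementation.

```python
from datetime import datetime, timedelta, timezone

RECENT_WINDOW = timedelta(minutes=30)  # hypothetical window length


def record_curation_event(doc, curator, action, changes,
                          llm_assisted=False, skip_if_recent=False, now=None):
    """Append a curation event; return False if skipped as a recent duplicate."""
    now = now or datetime.now(timezone.utc)
    history = doc.setdefault("curation_history", [])
    if skip_if_recent and history:
        last = history[-1]
        same_kind = last["curator"] == curator and last["action"] == action
        age = now - datetime.fromisoformat(last["timestamp"])
        if same_kind and age < RECENT_WINDOW:
            return False  # fold into the existing entry instead of appending
    history.append({
        "timestamp": now.isoformat(),
        "curator": curator,
        "action": action,
        "changes": changes,
        "llm_assisted": llm_assisted,
    })
    return True
```

Under this rule the skipped events' change descriptions are not recorded. A real implementation could instead merge them into the surviving entry, and the PR does not say which behaviour CommunityMech uses.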


* Fix broken literature_enhanced imports in two writer scripts

scripts/add_evidence_source.py and scripts/intelligent_snippet_fixer.py both import EnhancedLiteratureFetcher from communitymech.literature_enhanced, a module that was never committed to git (only a stale .pyc was shadowing the missing source locally). Both scripts have raised ModuleNotFoundError on import for as long as anyone has tried to run them; this was surfaced as a pre-existing-state heads-up by the recent writer-conversion PR #87.

Swap to LiteratureFetcher from communitymech.literature, which exposes the same fetch_pubmed_abstract + fetch_paper surface plus a richer DOI fallback chain (CrossRef → PubMed via DOI lookup → PMC full-text → OpenAlex → Semantic Scholar → Europe PMC → publisher meta-tag scrape) that subsumes what fetch_abstract_for_doi did.

API differences:
- fetch_paper returns (abstract, pdf_url), not a dict; call sites tuple-unpack it.
- LiteratureFetcher.fetch_paper has no download_pdf kwarg (the older version's flag was a no-op in the LiteratureFetcher pipeline; the PDF URL is simply returned alongside the abstract).
- The title field is no longer available separately. In add_evidence_source.py's guess_evidence_source classifier, the title was merged with the snippet and abstract via filter(None, …) anyway; losing it degrades classification marginally (PubMed abstracts include the title in the abstract text, so PMID references are unaffected). If richer DOI classification is needed later, LiteratureFetcher.fetch_doi_metadata() returns CrossRef metadata with a title field.

After-state: both scripts now import and run their initialization paths cleanly. pytest tests/ still passes (136 passed, 9 skipped).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address Copilot review: drop dead title param from guess_evidence_source

Copilot flagged that title was assigned None and then passed to guess_evidence_source as a parameter that the classifier merged into its keyword-matching text via filter(None, ...). With title always None, the parameter was dead code that only cluttered the call sites.

Remove the title parameter from guess_evidence_source and from both caller blocks. PubMed abstracts already embed the title in the abstract text (so PMID-driven classification is unchanged), and CrossRef titles for DOI references are available via LiteratureFetcher.fetch_doi_metadata() if richer classification is wanted later. That is now a clear future-work hook rather than a hard-coded-None pretense.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
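The API difference noted above changes call sites roughly as sketched below. fetch_paper now returns an (abstract, pdf_url) tuple, and title is gone from the classifier input. StubFetcher is a hypothetical in-memory stand-in for LiteratureFetcher, and the reference-string argument to fetch_paper is an assumption. classification_text and the reference strings are also invented. Only the tuple shape and the filter(None, ...) merge come from the commit message.

```python
from typing import Dict, Optional, Tuple

Paper = Tuple[Optional[str], Optional[str]]  # (abstract, pdf_url)


class StubFetcher:
    """Hypothetical stand-in for LiteratureFetcher; no network access."""

    def __init__(self, corpus: Dict[str, Paper]):
        self.corpus = corpus

    def fetch_paper(self, reference: str) -> Paper:
        return self.corpus.get(reference, (None, None))


def classification_text(fetcher, reference, snippet):
    """Build the keyword-matching text from snippet + abstract (no title)."""
    abstract, _pdf_url = fetcher.fetch_paper(reference)  # a tuple, not a dict
    return " ".join(filter(None, [snippet, abstract]))  # drops None/empty parts
```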

Copilot AI review requested due to automatic review settings May 26, 2026 02:14

Copilot started reviewing on behalf of realmarcin May 26, 2026 02:15

realmarcin and others added 2 commits May 25, 2026 19:15

realmarcin force-pushed the convert/three-more-writers branch from 7aa14dc to a9f062a Compare May 26, 2026 02:16

Copilot AI reviewed May 26, 2026


realmarcin merged commit a49f889 into main May 26, 2026

realmarcin deleted the convert/three-more-writers branch May 26, 2026 02:17

realmarcin mentioned this pull request May 26, 2026 in #88, Fix broken literature_enhanced imports in two writer scripts (merged)

realmarcin merged 2 commits into main from convert/three-more-writers