Add hra muscular ntr #3700
Four-stage pipeline for generating UBERON new term request ROBOT templates from HRA ASCTB unmapped term tables:
- Stage 1 (generate_template.py): reads xlsx/csv input, classifies parent IDs (UBERON/FMA/ASCTB-TEMP), assigns UBERON:99xxxxx provisional IDs, writes initial ROBOT template TSV + error and candidate reports.
- Stage 2 (group_terms_by_parent.py): groups template rows by parent and writes per-group JSON files for parallel subagent processing.
- Stage 3 (ntr-term-researcher agent): resolves FMA/ASCTB-TEMP parents via OLS4, checks for existing UBERON matches, writes Aristotelian definitions from Wikipedia, resolves is_a vs part_of relationship types.
- Stage 4 (merge_definitions.py): merges subagent outputs back into the template; appends confirmed/possible OLS4 matches to candidates report.
Template columns: ID, LABEL, Definition, def_xref (definition annotation), is_a, part_of, In_subset, Date, Contributor, Present_in_taxon, Wikipedia_image (foaf:depiction), xref (direct oboInOwl:hasDbXref for Wikipedia article URL + FMA ID).
Supporting agents/skills:
- ntr-term-researcher: Stage 3 subagent spec
- ontology-term-lookup: OLS4 structured search
- fetch-wiki-info: Wikidata + Wikipedia lookup
- .mcp.json: ols4, artl-mcp, playwright MCP servers
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… plans
Phases covered:
- Phase 2: grouping vs leaf-node term distinction (linguistic rules, subagent behaviour)
- Phase 3: detect UBERON label-ID mismatches in Stage 1; new WRONG_PARENT: placeholder; multi-valued parent column splitting; subagent protocol for mismatch correction (informed by ovary run where 7/13 terms had wrong-domain UBERON parent IDs silently accepted)
- Phase 4: scale to full muscular-system table
- Phase 5: generalise to other ASCTB anatomical systems
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dragon-ai-agent
merge_definitions.py:
- Fallback path (parent resolved but rel type unknown) now leaves both is_a and part_of blank rather than double-setting them, and lists affected term labels in the summary output under 'Relationship unresolved' for curator attention
- Remove dead 'if jf.parent.name == "input"' guard: glob never matches files in subdirectories
generate_template.py:
- Remove dead write_tsv call with doubled headers that was immediately overwritten by the block below it
- Fix counter order: use counter for ID, then increment (was: increment then use counter-1)
- Remove hardcoded CONTRIBUTOR_IRI constant; add --contributor CLI arg with ORCID format validation; prompts interactively if not supplied
group_terms_by_parent.py:
- Remove derive_wikipedia_urls call and wikipedia_urls field from output JSON: parent_label is always "" so the call always returned []; the subagent derives Wikipedia URLs independently during lookup
ntr-term-researcher.md:
- Clarify that Wikipedia article page URL (not image URL) goes in xrefs at point of successful lookup, as Wikipedia:Article_Title
- Add image relevance check: verify caption/alt text confirms the image illustrates the target structure before storing it
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
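The two generate_template.py fixes in this commit (counter order and --contributor validation) might look roughly like the sketch below. It is not the script's actual code: `ORCID_RE`, `is_valid_orcid`, `assign_provisional_ids`, and the starting counter value are hypothetical names and values, and accepting only a full https://orcid.org/ IRI is an assumption about what "ORCID format validation" means here.

```python
import re

# Assumed format: full ORCID IRI, four dash-separated groups,
# where the final character may be a digit or the checksum letter X.
ORCID_RE = re.compile(r"^https://orcid\.org/\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def is_valid_orcid(value):
    """Return True if value looks like an ORCID IRI (format check only)."""
    return bool(ORCID_RE.match(value.strip()))


def assign_provisional_ids(labels, start=9900001):
    """Assign UBERON:99xxxxx IDs: use the counter first, then increment."""
    counter = start  # hypothetical starting value
    assigned = {}
    for label in labels:
        assigned[label] = f"UBERON:{counter}"
        counter += 1
    return assigned
```

Using the counter before incrementing it avoids the off-by-one bookkeeping (counter-1) that the commit removes.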
…name flagging
Addresses issues found in the ovary branch test run where the agent:
- classified layers (corpus luteum granulosa lutein/theca) as is_a parents (should be part_of)
- accepted source-provided broad parents instead of finding more specific ones
- left ASCTB-TEMP placeholders as the only def_xref (no real PMIDs)
- did not flag pathological terms (hemorrhagic, luteinized unruptured) as out of scope
- did not normalise non-standard names ('dominance' instead of 'dominant')
ntr-term-researcher.md changes:
- Step 1 expanded: after confirming source parent, agent must search OLS4 for a more
specific parent (e.g. primary/secondary ovarian follicle vs generic ovarian follicle)
- New Step 3: scope check (pathological/dysfunctional → out_of_scope) and name check
(non-standard → name_corrections with curator-reviewable suggestion)
- New Step 5: literature search — must find at least one real PMID/DOI for def_xref;
ASCTB-TEMP placeholders explicitly disallowed as the only reference
- Step 7 (relationship resolution) rewritten with explicit structural vocabulary:
layers, zones, heads, bellies, parts, compartments, walls → ALWAYS part_of
subtypes/stages/members of grouping classes → is_a
Quick test ('is a kind of' vs 'is part of') with worked examples
- Output JSON adds: def_xrefs_to_add, out_of_scope, name_corrections keys
- Quality checks expanded with explicit rules for layers, pathology, naming
merge_definitions.py changes:
- Refactored load_subagent_outputs to return single dict (less argument tuple churn)
- New behaviour: out_of_scope terms excluded from template (not just confirmed_matches);
written to <name>-reports/out_of_scope.tsv for curator review
- New behaviour: name_corrections applied to LABEL column; original-source mapping
written to <name>-reports/name_corrections.tsv
- New behaviour: def_xrefs_to_add appended to def_xref column with deduplication
- Lookup helper accepts both source and corrected labels (agent may key by either)
- Summary output extended with new counters
CLAUDE.md changes:
- Stage 3 description updated to enumerate the new agent responsibilities
- QC checklist extended: real def_xref required, layer/part_of rule, out_of_scope
and name_corrections review steps
- Output Files Reference adds the two new report files
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
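As an illustration of the "def_xrefs_to_add appended to def_xref column with deduplication" behaviour and the rule that ASCTB-TEMP placeholders may not be the only reference, here is a minimal sketch. The function names, the `|` separator for multi-valued cells, and matching placeholders by the substring "ASCTB-TEMP" are assumptions, not details taken from merge_definitions.py.

```python
def append_def_xrefs(existing, to_add, sep="|"):
    """Append new xrefs to a multi-valued cell, keeping first-seen order."""
    merged = []
    for xref in existing.split(sep) + list(to_add):
        xref = xref.strip()
        if xref and xref not in merged:
            merged.append(xref)
    return sep.join(merged)


def only_placeholder_refs(cell, sep="|", placeholder_marker="ASCTB-TEMP"):
    """True if the cell holds no reference other than ASCTB-TEMP placeholders.

    An empty cell also returns True, since it contains no real reference.
    """
    refs = [r.strip() for r in cell.split(sep) if r.strip()]
    return all(placeholder_marker in r for r in refs)
```

A QC step could call `only_placeholder_refs` on each merged def_xref cell and report the rows that still lack a real PMID or DOI.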
Surveyed 19 existing UBERON 'muscle of X' terms. 14 (74%) use the simple 'genus + part_of some Y' pattern with UBERON:0014892 (skeletal muscle organ, vertebrate) as genus. 3 use attaches_to_part_of, 2 lack logical definition. Decision gate passed: simple part_of pattern covers majority of existing convention. Phase 2 implementation will support genus + part_of only; attaches_to_part_of, innervated_by, and multi-axiom patterns deferred to future phases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ates
generate_template.py now classifies each input row as 'leaf' or 'group' using linguistic regex rules (GROUP_PATTERNS / LEAF_PART_PATTERNS in classify_term_type).
- Leaf rows go to <name>.template.tsv with SC/part_of directives (existing)
- Group rows go to <name>-groups.template.tsv with EC genus + EC part_of some location directives (new); genus and location columns left blank for the agent to fill
input.tsv gains a term_type column so curators can see the classification.
Smoke-tested on muscular-system: 20 group / 55 leaf rows out of 75 input terms, matching ROADMAP prediction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
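A regex-based leaf/group classifier of the kind described in this commit could be sketched as follows. Only the names `GROUP_PATTERNS`, `LEAF_PART_PATTERNS`, and `classify_term_type` come from the commit message; the patterns themselves are illustrative guesses, and the real rules are not shown in this PR text and will differ.

```python
import re

# Illustrative patterns only; the actual rule set is not reproduced here.
LEAF_PART_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"\b(head|belly|part|layer) of\b",)
]
GROUP_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"\bmuscles\b", r"\bmuscle of\b", r"\bgroup\b")
]


def classify_term_type(label):
    """Return 'leaf' or 'group' for an input term label."""
    # Part-like labels are checked first so 'X head of Y muscle' stays a leaf.
    if any(p.search(label) for p in LEAF_PART_PATTERNS):
        return "leaf"
    if any(p.search(label) for p in GROUP_PATTERNS):
        return "group"
    return "leaf"  # default when no pattern matches
```

Checking the part patterns before the group patterns is a design choice in this sketch; it keeps sub-part labels out of the groups template even when they also contain a group-like phrase.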
group_terms_by_parent.py now reads both template_initial.tsv and
template_groups_initial.tsv. Leaf rows are grouped by parent UBERON ID as
before. Grouping rows are pooled into a single 'grouping_terms' bucket since
their genus + location values are agent-determined per term, not shared by a
common parent.
Each per-term entry includes term_type ('leaf' or 'group'). Each per-group
JSON has a term_counts summary so curators can see the leaf/group split.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
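The bucketing described in this commit (leaf rows keyed by parent ID, all group rows pooled into one 'grouping_terms' bucket, plus a term_counts summary) can be sketched like this. The row field names `term_type` and `parent_id`, the output shape, and the function name are assumptions for illustration.

```python
from collections import defaultdict

GROUPING_BUCKET = "grouping_terms"


def group_rows(rows):
    """Bucket leaf rows by parent ID; pool all group rows in one bucket."""
    buckets = defaultdict(list)
    for row in rows:
        if row["term_type"] == "group":
            key = GROUPING_BUCKET
        else:
            key = row["parent_id"]
        buckets[key].append(row)

    result = {}
    for key, terms in buckets.items():
        counts = {"leaf": 0, "group": 0}
        for term in terms:
            counts[term["term_type"]] += 1
        # term_counts lets a curator see the leaf/group split per JSON file.
        result[key] = {"terms": terms, "term_counts": counts}
    return result
```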
merge_definitions.py now merges subagent outputs into both the leaf and groups
templates. Common fields (definitions, images, xrefs, def_xrefs) are applied
identically; logic columns differ:
- Leaf template: resolved_relationships -> is_a/part_of (existing)
- Groups template: group_template_rows[label] -> {genus, location} populates
the EC genus and EC part_of some location columns
Group rows missing the agent's genus+location output are flagged 'EC
incomplete' in the summary so curators can investigate.
New report: manual_curation.tsv lists group terms the agent punted (couldn't
fit the simple genus + part_of some Y pattern); includes proposed definition,
reason, and similar UBERON terms found via obo-grep for curator reference.
Refactored row processing into _apply_common_fields helper plus per-template
merge functions (merge_leaf_template, merge_groups_template) so the two
templates share definition/xref/image logic without duplication.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ia obo-grep
ntr-term-researcher.md updated to handle the leaf/group split introduced by
Stage 1 pre-classification:
- New top-of-file 'Term types' paragraph explaining the leaf vs group split
- Input section documents term_type field, term_counts, GROUPING_TERMS bucket
- Step 6 (Write Definitions) now branches: leaf gets Aristotelian form,
group gets collective form ('A group of muscles that...')
- Step 7 (Resolve Relationship Types) explicitly LEAF-only
- New Step 8 for GROUP terms: use awk over uberon-edit.obo to find similar
group terms; if they use 'genus + part_of some Y' pattern, populate
group_template_rows[label] with {genus, location}; otherwise punt to
manual_curation with similar UBERON stanzas as curator reference
- Output JSON gains group_template_rows and manual_curation keys
- Quality checks updated: every group term must end up in either
group_template_rows OR manual_curation
- Tools section notes obo-grep.pl may not be in PATH; awk fallback documented
CLAUDE.md updated with the dual-template flow:
- Stage 1 documents the term_type pre-classification
- Stage 3 enumerates the new agent responsibilities (steps 8 and 9)
- QC checklist split: shared / leaf-template / groups-template / reports
- Final Delivery registers both templates in uberon-odk.yaml
- Output Files Reference includes new groups template + manual_curation.tsv
- Column reference table now has separate sections for leaf and groups
ROADMAP marks Phase 2 implementation complete (pending end-to-end agent test).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
10 input terms processed by the new dual-template flow:
- Stage 1: pre-classified into 8 leaf + 2 group rows
- Stage 2: 8 group JSONs (7 leaf parent groups + 1 grouping_terms bucket)
- Stage 3: 8 ntr-term-researcher agents (3 needed retry due to API stream timeouts; grouping_terms handled inline after retry stalled)
- Stage 4: dual-template merge
Final output:
- 4 NEW leaf terms with Aristotelian definitions, real PMID/DOI refs, resolved is_a/part_of, Wikipedia + FMA xrefs
- 0 NEW group terms: both 'anterior vertebral muscle' and 'circular pharyngeal muscle' resolved to existing UBERON terms via Step 2 OLS4 check, validating that confirmed_match takes precedence over Step 8 EC pattern detection
- 5 confirmed_match (5 of 10 input terms already exist in UBERON):
  - ceratoglossus → UBERON:0008428
  - chondroglossus muscle → UBERON:0011535
  - costal diaphragm → UBERON:0035831
  - aryepiglotticus muscle → UBERON:0011025
  - constrictor muscle of pharynx → UBERON:0001569 (NEW from group flow)
- 1 possible_match: anterior vertebral muscle ≈ UBERON:0008549 (prevertebral muscle), flagged for curator decision (synonym vs new term)
- 1 manual_curation: anterior vertebral muscle. UBERON's similar term uses attaches_to + innervated_by axioms, not the simple genus + part_of pattern, so out of EC template scope
The test exercises: leaf+INFER parent, leaf+NEEDS_MAPPING:FMA parent, leaf+UNRESOLVABLE parent (including wrong-domain), group→confirmed_match, group→possible_match+manual_curation. All paths working as designed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Claude finished @dosumis's task in 5m 44s
PR Review: Add HRA Muscular NTR
Review checklist:
Summary
This PR adds ROBOT template infrastructure and four new NTR (New Term Request) leaf terms for HRA (Human Reference Atlas) muscular-system structures, along with a bulk NTR workflow toolkit. The term-level research appears careful and the definition quality is high. However, there are two issues that should be addressed before merge, and several important follow-on items.
🔴 CRITICAL
1. New templates not registered in ODK config: they will never compile
The precedent is
2. The definition research output (
and sets
🟡 IMPORTANT
3. Row 5 of
4. ASCTB-TEMP IRIs in the
All four terms carry For example, The ASCTB-TEMP IRI is the third token in the definition xref list.
5. Missing
Per
6.
🔵 SUGGESTIONS
7. Binary Excel file in repo root
8. ISBN references are non-standard in OBO/UBERON;
9. FMA parent for articularis genu (UBERON:0004252) is very broad
FMA:22424 "Muscle of anterior compartment of thigh" has no UBERON equivalent, so the workflow fell back to UBERON:0004252 "hindlimb stylopod muscle". The fallback is documented in the definition output and the errors report, but UBERON:0004252 is a very coarse grouping. The definition output itself suggests considering UBERON:0001377 (quadriceps femoris) and explicitly notes that "articularis genu is anatomically discrete from the quadriceps". Curator should confirm the fallback is acceptable or note it for a follow-up parent term request.
10. Using
11. The PR adds 1,700+ lines of Python scripts, JSON outputs, and documentation for the bulk NTR pipeline. This infrastructure is valuable but may belong in a separate utilities repository or be
Per-term verdict
Confirmed matches correctly excluded from template:
The term research quality and definition writing are solid. The main blockers are the missing ODK registration (templates won't compile without it) and the relationship-type inconsistency on UBERON:9900010. Addressing the missing
Two blocking issues before merge:
- Templates not wired into the build: hra-muscular.template.tsv and hra-muscular-groups.template.tsv are not registered in uberon-odk.yaml or uberon.Makefile. Without equivalent entries to the hra-skeleton precedent these terms will never compile into the ontology.
- Relationship-type conflict on UBERON:9900010: the definition workflow output (unresolvable_meningohypophyseal_artery.json) explicitly recommends part_of for dorsal part of intertransversarii laterales lumborum muscle but the template uses is_a. These need to be reconciled.
Also flagged as important (see full comment):
- UBERON:9900008 (clavicular head) has no is_a, only part_of
- ASCTB-TEMP IRIs in the def_xref column should be in xref
- term_tracker_item missing from both templates
- Report files under src/templates/hra-muscular-reports/ should not live inside the template directory
Two blocking issues before merge. 1) Templates not wired into the build: hra-muscular.template.tsv and hra-muscular-groups.template.tsv are not registered in uberon-odk.yaml or uberon.Makefile. 2) Relationship-type conflict on UBERON:9900010: workflow output says part_of but template uses is_a. See full review comment for details.
…umn assignment
Previously the agent picked is_a OR part_of and put the resolved parent in one
column, leaving the other empty. UBERON convention for specific named structures
is to populate BOTH — e.g. vastus lateralis (UBERON:0001379) has
'is_a UBERON:0001630 (muscle organ)' AND 'relationship: part_of UBERON:0001377
(quadriceps femoris)'.
ntr-term-researcher.md changes:
- Step 7 rewritten: agent now uses awk over uberon-edit.obo to find similar
specific UBERON terms, identifies the genus class (typically muscle organ
UBERON:0001630, muscle head UBERON:0011906) AND the part_of containing
structure, then populates leaf_template_rows[label] = {is_a, part_of}
- Worked examples included (clavicular head, articularis genu, dominant
follicle subtypes)
- Explicit warning: do NOT just take the source parent and assign it to one
column; the source parent is often too broad to serve as the genus
- Output JSON gains leaf_template_rows key (analogous to group_template_rows)
- Quality checks updated: prefer leaf_template_rows; both is_a and part_of
should be populated when applicable
merge_definitions.py changes:
- load_subagent_outputs reads leaf_template_rows
- merge_leaf_template uses leaf_template_rows first; falls back to legacy
resolved_relationships + resolved_parents if absent (backward compatible)
- New counter 'leaf_template_rows used' in summary output
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
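The lookup order described for merge_leaf_template (prefer leaf_template_rows, fall back to the legacy single-column assignment, leave both columns blank when the relationship type is unknown) might be sketched as below. The function name and the assumed shapes of the three dictionaries, each keyed by term label, are hypothetical.

```python
def resolve_leaf_logic(label, outputs):
    """Return (is_a, part_of, path_used) for one leaf term."""
    row = outputs.get("leaf_template_rows", {}).get(label)
    if row:
        # New path: the agent supplies genus and container separately.
        return row.get("is_a", ""), row.get("part_of", ""), "leaf_template_rows"

    # Legacy path: one resolved parent plus a relationship type.
    parent = outputs.get("resolved_parents", {}).get(label, "")
    rel = outputs.get("resolved_relationships", {}).get(label, "")
    if parent and rel == "is_a":
        return parent, "", "legacy"
    if parent and rel == "part_of":
        return "", parent, "legacy"

    # Relationship type unknown: leave both blank for curator attention.
    return "", "", "unresolved"
```

Returning the path used alongside the values makes it straightforward to keep counters such as 'leaf_template_rows used' in the summary output.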
…lated
After updating Step 7 of the agent spec to require obo-grep lookup of similar UBERON terms before assigning the parent, re-ran the 4 leaf-term agents on the muscular-system test set. All 4 now populate both is_a and part_of columns:
| Term | is_a | part_of |
|------------------------------------------------------|--------------|-------------------|
| anterior cervical intertransversarii muscle | muscle organ | neck |
| articularis genu muscle | muscle organ | hindlimb stylopod |
| clavicular head of pectoralis major muscle | muscle head | pectoralis major |
| dorsal part of intertransversarii laterales lumborum | muscle organ | lower back muscle |
Notable: clavicular head correctly resolved to is_a UBERON:0011906 (muscle head), matching the long head of biceps brachii (UBERON:0007168) precedent. articularis genu correctly distinguished UBERON:0004252 (sibling grouping class, not container) from UBERON:0000376 (the actual containing region) by following the pectineus precedent.
Stage 4 reports leaf_template_rows used=4, legacy resolved_relationships=0, which confirms the new path is exercising the proper genus+location lookup rather than the legacy single-column assignment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Standalone experiment in bulk_ntr_workflow/experiments/. Workflow scripts NOT modified.
Tests whether an agent can extract origin/insertion/innervation/action from Wikipedia + uberon-edit.obo with UBERON ID resolution and verbatim evidence quotes per field.
Test set (well-known whole muscle → obscure sub-part):
- internal abdominal oblique muscle (existing UBERON:0005454)
- tensor fascia latae muscle (existing UBERON:0001376)
- iliocostalis cervicalis muscle (existing UBERON:0008546)
- articularis genu muscle (NEW)
- clavicular head of pectoralis major muscle (NEW, muscle head)
- dorsal part of intertransversarii laterales lumborum (NEW, obscure)
Findings (full report in SUMMARY.md):
1. All 6 terms got 5-6 of 6 enrichment fields populated. Where UBERON IDs couldn't be resolved (named attachments, specific nerves, specific bone landmarks), the agent gracefully fell back to free-text quotes plus parent-class UBERON IDs.
2. The hypothesis that 'muscle parts are poorly axiomatised' is partly confirmed: parent muscle classes for sub-parts are missing (intertransversarii laterales lumborum), but the bigger gap is in UBERON's coverage of related anatomical entities. Superior gluteal nerve, lateral pectoral nerve, iliotibial tract, suprapatellar bursa, ilioinguinal nerve, linea alba, and accessory process of lumbar vertebra are all missing. A famous muscle like tensor fasciae latae has 2 such gaps; the obscure dorsal sub-part has 3, so gaps are not strongly correlated with term obscurity.
3. The verbatim-quote design works well for review. Each enrichment field carries 1-3 sentences of evidence + a source URL, making the enrichment auditable in seconds per field.
4. 3 of 6 picks turned out to be already in UBERON despite being plausible NTR candidates, so Step 2 (existing-term check) continues to do real work.
Existing UBERON stanzas often have surprisingly light axiomatisation (tensor fasciae latae has only 1 origin axiom), so enrichment could also improve existing terms, not just new ones.
No workflow changes; results are reference material for a future enrichment phase.
Roadmap candidates: (a) system-specific templates with pre-extracted fields per system, (b) standardised evidence-quote design across all fields, (c) cascade detection: flag missing UBERON entities as candidate NTRs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User hypothesis: simple is_a + part_of would be sufficient for ovary terms (unlike muscles where origin/insertion/innervation are needed).
Result: hypothesis NOT confirmed. 5 of 6 ovary terms tested require relations beyond is_a + part_of:
- Layers (corona radiata, CL granulosa lutein, CL theca): need composed_primarily_of (CL:cell type) and/or bounding_layer_of, has_part
- Compositional complex (cumulus oophorus oocyte complex): needs has_part to distinguish from cumulus oophorus alone
- Follicle stages (early antral, transitional primary): need develops_from PLUS has_component with cardinality constraints PLUS has_potential_to_develop_into. UBERON's existing precedent (UBERON:0000035/36/37) uses all of these
Why ovary is harder for simple is_a + part_of than expected:
- Sibling layers share part_of (both lutein + theca layers part_of corpus luteum), so part_of alone doesn't differentiate
- Sibling follicle stages share is_a (all primary/secondary/tertiary is_a ovarian follicle AND part_of ovary), so neither relation distinguishes
- The defining property is cellular composition or developmental position, neither captured by spatial part_of
Cross-experiment comparison:
- Muscle group: simple genus + part_of EC sufficient (74% precedent)
- Muscle individual: needs muscle origin/insertion/innervation
- Muscle head/sub-part: simple is_a + part_of works (sparse precedent)
- Ovary layer/complex/stage: needs composed_primarily_of, has_part, develops_from, cardinality
Conclusion: per-system templates are warranted. A single one-size-fits-all leaf template either over-fits one domain or under-serves both. The evidence-quote JSON design transferred cleanly between domains, confirming it as a generalisable pattern.
Output: bulk_ntr_workflow/experiments/SUMMARY_OVARY.md and 6 enriched JSONs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stage 1 now partitions input rows by source `tables` value into system overlays. Each overlay produces its own clean leaf template with system-specific columns; unmapped tables go to the default template.
Phase 6: develops_from on default leaf template
- New optional column with directive `SC RO:0002202 some %`
- Empty cell → no axiom (standard ROBOT pattern; do NOT alter directive to work around empty cells)
- Populated by agent for stage series (follicle stages, embryonic stages, etc.)
Phase 7: skeletal-muscle overlay
- New muscle template variant: <name>-muscle.template.tsv
- Adds has_muscle_origin (RO:0002372), has_muscle_insertion (RO:0002373), innervated_by (RO:0002005)
- All optional; populated by agent only with evidence-quoted UBERON IDs
- Triggered by source table value `muscular-system` (SYSTEM_OVERLAYS map)
Implementation notes:
- generate_template.py: SYSTEM_OVERLAYS, classify_system(), overlay_paths(); per-row routing builds leaf_rows_by_overlay dict; one template TSV written per overlay
- group_terms_by_parent.py: reads ALL leaf templates (default + system overlays) via discover_leaf_templates(); each per-term JSON entry now carries `system` field
- merge_definitions.py: REFACTORED to use header-name lookup (header_indices()) instead of hardcoded column indices. Each leaf template variant's columns are looked up at merge time. Optional logic columns (develops_from, has_muscle_*) populate from leaf_template_rows[label] when both column and value exist.
- agent spec: documents system field, develops_from + muscle-overlay guidance; output JSON example shows the optional fields
- CLAUDE.md: column reference splits leaf table into default + muscle overlay; new partitioning subsection
Smoke-tested with --table muscular-system --limit 10:
- Step 0 routing correctly outputs muscle=8, group=2 (no default partition since all 10 rows are muscular-system)
- Output template has 16 columns with the 6 expected logic relations (is_a + part_of + develops_from + 3 muscle)
- Merge with existing leaf_template_rows from previous run still works via legacy is_a + part_of fallback (Optional cols filled: 0 because agents haven't been re-run with new spec)
Phase 8 (term promotion) and overlays for skeleton/vasculature/nervous documented in ROADMAP only, not implemented.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
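The header-name lookup that replaces hardcoded column indices in merge_definitions.py could work along these lines. Only the name `header_indices` appears in the commit message; `fill_optional_columns`, the exact header strings, and the returned count are assumptions made for this sketch.

```python
OPTIONAL_LOGIC_COLUMNS = (
    "develops_from",
    "has_muscle_origin",
    "has_muscle_insertion",
    "innervated_by",
)


def header_indices(header):
    """Map each column name to its position in this template variant."""
    return {name: i for i, name in enumerate(header)}


def fill_optional_columns(row, header, values):
    """Fill optional logic cells in place; return how many were filled.

    A cell is written only when the column exists in this template
    AND the agent supplied a non-empty value for it.
    """
    idx = header_indices(header)
    filled = 0
    for col in OPTIONAL_LOGIC_COLUMNS:
        value = values.get(col)
        if col in idx and value:
            row[idx[col]] = value
            filled += 1
    return filled
```

Because the indices are computed per template, the same merge code can handle the default leaf template and the muscle overlay even though their column sets differ.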
Re-ran Stage 1 with --table muscular-system --limit 10. Output:
- Step 0 routing: muscle=8, group=2 (correctly partitioned by source table)
- <name>-muscle.template.tsv has 16 columns: 6 logic relations (is_a, part_of, develops_from, has_muscle_origin, has_muscle_insertion, innervated_by) + 10 metadata
- Default leaf template absent (no rows for it, since all are muscular-system)
Re-ran one agent (articularis genu) with the updated spec. Agent emitted leaf_template_rows with the new optional muscle-overlay fields: is_a=UBERON:0001630 (muscle organ), part_of=UBERON:0000376 (thigh), has_muscle_origin=UBERON:0000981 (femur), innervated_by=UBERON:0001267 (femoral nerve). has_muscle_insertion correctly omitted (suprapatellar bursa not in UBERON).
Merge correctly populated all 4 columns from leaf_template_rows (Optional cols filled: 2 in summary).
The other 3 leaf JSONs were generated under the older agent spec, so they fall back to legacy resolved_relationships and only get is_a OR part_of. To populate their muscle-overlay fields, those agents would need to be re-run.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stage 1 partitioning: muscle=55, group=20 (no default partition)
Stage 2 grouping: 45 groups (44 leaf + 1 grouping_terms bucket of 20 group terms)
Stage 3: 45 agents in 6 parallel batches; all completed without API timeouts
Stage 4 merge results:
Leaf template <name>-muscle.template.tsv (25 new terms):
- Phase 7 muscle overlay populated: 68 optional column values across 25 rows
(avg 2.7 muscle-specific columns per term — origin/insertion/innervated_by)
- All 25 use leaf_template_rows (no legacy fallback path triggered)
- 0 PENDING definitions, 0 INFER relationships, 0 unresolved
- 24 of 25 have real PMID/DOI/ISBN def_xrefs added
Groups template <name>-groups.template.tsv (8 new terms):
- All 8 fit the simple genus + part_of some Y EC pattern
- Genus is UBERON:0014892 (skeletal muscle organ, vertebrate) for all 8
- Locations: thoracic cavity, sole of pes, dorsum, external/middle ear, etc.
Reports:
- candidates.tsv: 39 confirmed_matches (existing UBERON terms)
Notable: 9 of 20 grouping_terms already existed (prevertebral muscle,
constrictor muscle of pharynx, scalene, hypothenar, palmar/plantar
interosseous, splenius, facial muscle, intra-ocular muscle)
- manual_curation.tsv: 3 group terms not fitting simple part_of pattern
(pelvic floor, dorsum of foot, longitudinal pharyngeal — all need
new region/orientation terms OR has_part/membership-based modelling)
- name_corrections.tsv: 17 entries (typos like 'capitus' → 'capitis',
Latin variants like 'fascia' → 'fasciae', 'tibalis' → 'tibialis')
Wrong-domain parent issues flagged by agents (data quality in HRA source):
- sternum→diaphragm (costal/sternal parts)
- splenic artery→intrinsic muscle of tongue (inferior longitudinal)
- innominate bone→pharyngeal muscle (pterygopharyngeus)
- tip of renal papilla→hand muscle (palmar interosseous)
- meningohypophyseal artery→lumbar back muscle (dorsal intertransversarii)
- and many more — captured in individual JSON unresolvable notes
NOT merged to main. Branch retains everything for review.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Claude finished @dosumis's task in 5m 44s
PR Review: Add HRA Muscular NTR (Re-review after sync)
Review checklist:
Tasks:
Summary
The PR has grown substantially since the first review: the original four-term
✅ Fixed since previous review
🔴 CRITICAL
1. Templates still not registered in ODK config or Makefile
Neither In
- filename: hra_muscular.owl
  templates:
  - hra-muscular-muscle.template.tsv
  - hra-muscular-groups.template.tsv
And a corresponding build rule in
2. Groups template: three back-muscle terms have identical EC axiom definitions, so the reasoner will infer equivalence
The OWL reasoner will immediately conclude all three are equivalent classes. Grouping terms like these are usually defined via
🟡 IMPORTANT
3. ASCTB-TEMP IRIs still in the
All 33 terms still carry
4.
Per
5. Pipeline outputs (
6. Three leaf terms use
All other skeletal muscle leaf terms in the same template correctly use
7. UBERON:9900025 (lateral compartment of leg muscle)
The definition explicitly states the structure is "part of the lateral (fibular) fascial compartment of the leg". The
🔵 SUGGESTIONS
8. ISBN citations are not standard OBO/UBERON practice. The TA2 URL (
9. UBERON:9900047 (posterior abdominal wall muscle): location is generic abdominal wall
The EC axiom uses
Per-template verdict
Leaf template (25 terms): most terms look anatomically sound with good definitions and PMIDs. Specific concerns:
Groups template (8 terms): logical-definition structure is sound in principle, but three back-layer groupings share identical EC axioms (see critical issue 2 above). The remaining five groupings (ear, abdominal wall, sole, thoracic wall) each have unique location values and are fine.
Two prior blockers are resolved. The remaining critical items (build registration and EC axiom collision) must be addressed before merge.
This PR modifies some logical definitions. See details below:
Number of classes with modified logical definitions: 2
trabecular sinus of lymph node
No changes in inferred subclasses.
subcapsular sinus of lymph node
No changes in inferred subclasses.
Two new critical issues must be fixed before merge:
- Templates still not registered in ODK config/Makefile: both hra-muscular-muscle.template.tsv and hra-muscular-groups.template.tsv are absent from uberon-odk.yaml and uberon.Makefile. No component build rule exists, so all 33 new terms will never compile.
- Groups template EC axiom collision: intermediate back muscle (UBERON:9900020), segmental back muscle (UBERON:9900055), and superficial back muscle (UBERON:9900063) all produce identical EquivalentClass axioms (skeletal muscle organ AND part_of dorsum). The reasoner will infer these three terms are equivalent. These grouping terms should use SC (SubClassOf) columns, not EC (EquivalentClass).
Fixed from prior review: UBERON:9900008 now has is_a (muscle head) ✅; UBERON:9900010 relationship conflict resolved ✅.
Still outstanding from prior review (IMPORTANT): ASCTB-TEMP IRIs remain in the def_xref column; term_tracker_item columns missing from both templates; hra-muscular-reports/ still under src/templates/.
See the full review comment for details and per-term verdict.
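The EC axiom collision flagged in this review (several group terms sharing the same genus and location, which a reasoner would then treat as equivalent) is the kind of problem a pre-merge check could catch. A minimal sketch follows; the function name and the row keys `id`, `genus`, and `location` are hypothetical, and the location is shown as a label because the review text does not give its UBERON ID.

```python
from collections import defaultdict


def find_ec_collisions(group_rows):
    """Return {(genus, location): [ids]} for axioms shared by 2+ terms."""
    by_axiom = defaultdict(list)
    for row in group_rows:
        genus = row.get("genus", "")
        location = row.get("location", "")
        # Rows missing either value yield no EC axiom, so skip them.
        if genus and location:
            by_axiom[(genus, location)].append(row["id"])
    return {axiom: ids for axiom, ids in by_axiom.items() if len(ids) > 1}


rows = [
    {"id": "UBERON:9900020", "genus": "UBERON:0014892", "location": "dorsum"},
    {"id": "UBERON:9900055", "genus": "UBERON:0014892", "location": "dorsum"},
    {"id": "UBERON:9900063", "genus": "UBERON:0014892", "location": "dorsum"},
    {"id": "UBERON:9900047", "genus": "UBERON:0014892", "location": "abdominal wall"},
]
collisions = find_ec_collisions(rows)
```

Any non-empty result would indicate group terms that need a more specific location, a switch to SubClassOf columns, or manual curation.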
…m run
26 unresolvable notes extracted from per-term JSON outputs in bulk_ntr_workflow/outputs/definitions/. Most are wrong-domain parent issues in HRA source data (sternum→diaphragm, splenic artery→tongue muscle, innominate bone→pharyngeal muscle, meningohypophyseal artery→lumbar muscle, anterior cerebral artery→hand muscle, fused sacrum→abdominal muscle, etc.).
Workflow gap noted: merge_definitions.py currently writes aggregated reports for confirmed_matches, name_corrections, out_of_scope, and manual_curation, but NOT for unresolvable. Adding this aggregation to the merge script would surface these systematically without manual extraction. Tracked for follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces standalone unresolvable.tsv with a single curator-facing review file. Each row of the original input gets:

| Column                  | Purpose                                       |
|-------------------------|-----------------------------------------------|
| table, as_iri, label    | Input source identifiers                      |
| term_type               | leaf | group                                  |
| source_parent_id        | Parent ID from source data                    |
| source_parent_label     | Parent label from source                      |
| status                  | confirmed_match | new_term_leaf |             |
|                         | new_term_group | manual_curation              |
| mapped_uberon_id        | Either existing UBERON ID (if confirmed)      |
|                         | OR new UBERON:99xxxxx ID (if new term)        |
| label_correction        | Suggested corrected label (typos etc.)        |
| label_correction_reason | Why the label was corrected                   |
| parent_correction       | Corrected parent UBERON ID (when source was   |
|                         | wrong-domain or missing)                      |
| curator_notes           | Pipe-separated unresolvable notes from agents |
|                         | (wrong-domain parents, missing UBERON         |
|                         | entities, modelling caveats)                  |

Aggregated stats for the muscular-system run:

- 75 input rows
- Status: 39 confirmed_match, 25 new_term_leaf, 8 new_term_group, 3 manual_curation
- 17 with label corrections
- 34 with parent corrections (parent_correction populated)
- 20 with curator notes (mostly wrong-domain source parents)

This is the file curators should review first: one row per input term, all findings consolidated, no need to dig into per-term JSONs.

Built one-off via Python script joining input.tsv + candidates.tsv + name_corrections.tsv + manual_curation.tsv + per-term unresolvable notes from outputs/definitions/*.json. Should be moved into merge_definitions.py as an aggregated report writer (follow-up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
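As a rough illustration of how one consolidated row might be assembled once this moves into merge_definitions.py, here is a sketch. The function names, the argument names, and the precedence between statuses are assumptions; only the status values, the column names, and the reported counts come from the commit message.

```python
from collections import Counter


def build_review_row(source, confirmed_id=None, new_id=None,
                     needs_manual=False, notes=()):
    """Assemble one consolidated review row for a single input term.

    `source` needs at least `label` and `term_type` ("leaf" or "group").
    The precedence used here (confirmed match, then manual curation, then
    new term) is an assumption made for illustration.
    """
    if confirmed_id:
        status, mapped = "confirmed_match", confirmed_id
    elif needs_manual:
        status, mapped = "manual_curation", ""
    else:
        status, mapped = "new_term_" + source["term_type"], new_id or ""
    return {
        "label": source["label"],
        "term_type": source["term_type"],
        "status": status,
        "mapped_uberon_id": mapped,
        "curator_notes": " | ".join(notes),  # pipe-separated agent notes
    }


def status_counts(rows):
    """Tally rows per status so the totals can be checked against the input size."""
    return Counter(row["status"] for row in rows)


# The counts reported for the muscular-system run add up to the 75 input rows.
reported = {"confirmed_match": 39, "new_term_leaf": 25,
            "new_term_group": 8, "manual_curation": 3}
assert sum(reported.values()) == 75
```

Comparing `status_counts` against the number of input rows is a cheap guard against terms being dropped or duplicated during the join.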
Three new columns make curator review substantially faster:

- mapped_label: human-readable label of the mapped UBERON term (looked up from src/ontology/uberon-edit.obo). For new terms, shows '(new term: <label>)'. Saves the curator from having to look up each UBERON ID separately.
- parent_correction_label: human-readable label for parent_correction. Same rationale.
- mapping_evidence: synthesised evidence from per-term JSON outputs explaining WHY each source term maps to its existing UBERON match. Pulls from various agent field names (note, match_notes, match_evidence, web_evidence, rationale, matched_synonyms). Example: 'UBERON:0035831 has EXACT synonym "costal part of diaphragm" (FMA:58277). The source label "respiratory diaphragm muscle" resolves to UBERON:0001103 (diaphragm)...'

Stats:

- 71/75 rows have mapped_label populated
- 29 confirmed_match rows have mapping_evidence (the other 10 had agents that didn't emit the optional evidence fields; a minor gap)
- 34 with parent_correction (so 34 with parent_correction_label too)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
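One way the mapping_evidence column could be synthesised from the optional agent fields is sketched below. The field names are the ones listed in the commit message; the order of precedence, the de-duplication, and the function name `synthesise_evidence` are assumptions.

```python
# Optional evidence fields named in the commit message, in an assumed order.
EVIDENCE_FIELDS = (
    "note",
    "match_notes",
    "match_evidence",
    "web_evidence",
    "rationale",
    "matched_synonyms",
)


def synthesise_evidence(term):
    """Join whatever evidence fields a per-term JSON object carries.

    Returns an empty string when the agent emitted none of the optional
    fields, which is how the rows without mapping_evidence would arise.
    """
    parts = []
    for field in EVIDENCE_FIELDS:
        value = term.get(field)
        if not value:
            continue
        if isinstance(value, (list, tuple)):  # e.g. a list of matched synonyms
            value = ", ".join(str(item) for item in value)
        text = str(value).strip()
        if text and text not in parts:  # skip text repeated across fields
            parts.append(text)
    return " ".join(parts)
```

Returning an empty string rather than raising keeps the report writer tolerant of agents that skip the optional fields.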
Claude finished @dosumis's task in 6m 48s

PR Review: Add HRA Muscular NTR (Third review)
Summary

No issues from the previous review have been fixed. The three most recent commits […]. Additionally, the new NTR terms are not present in […]. The uberon-edit.obo changes in this PR (removing UBERON:1200004 and updating lymph node sinus axioms) appear correct; CI confirms no regression in inferred subclasses for the two affected terms.

🔴 CRITICAL (same as previous review, still unresolved)

1. Templates not registered in ODK config or Makefile

In uberon-odk.yaml:

    - filename: hra_muscular.owl
      templates:
        - hra-muscular-muscle.template.tsv
        - hra-muscular-groups.template.tsv

And a corresponding build rule in uberon.Makefile.

2. EC axiom collision in groups template: three back-layer groupings will be inferred equivalent

All three produce the identical EquivalentClass axiom (skeletal muscle organ AND part_of dorsum).

🟡 IMPORTANT (same as previous review, still unresolved)

3. ASCTB-TEMP IRIs still in the def_xref column

All 33 terms still carry […]

4. Neither template has a term_tracker_item column

5. hra-muscular-reports/ still under src/templates/

The reports directory now has 6 files ([…])

6. Three leaf terms (UBERON:9900003, 9900010, 9900026) use the too-broad is_a UBERON:0001630

All other skeletal muscle leaf terms in the same template use […]

7. UBERON:9900025 (lateral compartment of leg muscle): part_of column empty

The definition says "A skeletal muscle organ that is part of the lateral (fibular) fascial compartment of the leg", but the part_of column is blank.

🔵 SUGGESTIONS (carried from previous review)

8. ISBN:9780323393225 in def_xref is non-standard: the TA2 URL already covers the Terminologia Anatomica reference; the ISBN could be dropped or moved to a comment.

9. UBERON:9900047 (posterior abdominal wall muscle): uses […]

What changed since the last review

The three newest commits add pipeline output enhancements:

[…]
These are valuable workflow additions, but do not address any of the open issues. The uberon-edit.obo changes (removal of UBERON:1200004 and related axiom updates) look correct; CI reports no inferred-subclass regressions for the two affected lymph node sinus terms.

Per-template verdict (unchanged)

Leaf template (25 terms): Definition quality is high with good PMID support. Specific concerns remain on UBERON:9900003, 9900010, 9900026 (is_a too broad) and UBERON:9900025 (part_of blank).

Groups template (8 terms): The critical EC axiom collision on the three back-layer groupings (UBERON:9900020, 9900055, 9900063) must be fixed. The other five groupings (ear, abdominal wall, sole, thoracic wall, middle ear) each have unique locations and are logically sound.

The two critical blockers (build registration; EC axiom collision) and the important issues must be addressed before merge.
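The build-registration blocker lends itself to a simple pre-merge check. The sketch below assumes that a plain substring search of the ODK config and the Makefile is a good enough signal that a template is referenced; it does not parse YAML or Make syntax, and the function name `unregistered_templates` is hypothetical.

```python
from pathlib import Path


def unregistered_templates(template_names, config_paths):
    """Map each config file to the template names it never mentions.

    Files that mention every template are left out of the result, so an
    empty dict means all templates are referenced everywhere.
    """
    missing = {}
    for config_path in config_paths:
        text = Path(config_path).read_text(encoding="utf-8")
        absent = [name for name in template_names if name not in text]
        if absent:
            missing[str(config_path)] = absent
    return missing
```

Run against uberon-odk.yaml and uberon.Makefile with the two template file names from this PR, a non-empty result would reproduce the finding above; an empty one would indicate the registration has been added.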
Two critical blockers remain unresolved since the previous review:
- Templates not registered in ODK config/Makefile: hra-muscular-muscle.template.tsv and hra-muscular-groups.template.tsv are absent from uberon-odk.yaml and uberon.Makefile. All 33 new NTR terms exist only in unregistered template files and will never compile into the ontology.
- EC axiom collision in groups template: intermediate back muscle (UBERON:9900020), segmental back muscle (UBERON:9900055), and superficial back muscle (UBERON:9900063) produce identical EquivalentClass axioms (skeletal muscle organ AND part_of dorsum). The reasoner will infer all three are equivalent classes. Change EC to SC for grouping rows.
Also still outstanding (flagged in both prior reviews): ASCTB-TEMP IRIs in def_xref; term_tracker_item missing from both templates; hra-muscular-reports/ under src/templates/; three leaf terms (9900003, 9900010, 9900026) using too-broad is_a UBERON:0001630; UBERON:9900025 part_of column empty.
No issues from the previous review were addressed in the three most recent commits. See the full review comment for details.