Add hra muscular ntr #3700

Open: dosumis wants to merge 21 commits into master from add-hra-muscular-ntr

Contributor

@dosumis dosumis commented Apr 28, 2026

No description provided.

dosumis and others added 10 commits April 27, 2026 15:01
Four-stage pipeline for generating UBERON new term request ROBOT
templates from HRA ASCTB unmapped term tables:

Stage 1 (generate_template.py): reads xlsx/csv input, classifies parent
IDs (UBERON/FMA/ASCTB-TEMP), assigns UBERON:99xxxxx provisional IDs,
writes initial ROBOT template TSV + error and candidate reports.

Stage 2 (group_terms_by_parent.py): groups template rows by parent and
writes per-group JSON files for parallel subagent processing.

Stage 3 (ntr-term-researcher agent): resolves FMA/ASCTB-TEMP parents via
OLS4, checks for existing UBERON matches, writes Aristotelian definitions
from Wikipedia, resolves is_a vs part_of relationship types.

Stage 4 (merge_definitions.py): merges subagent outputs back into the
template; appends confirmed/possible OLS4 matches to candidates report.

Template columns: ID, LABEL, Definition, def_xref (definition annotation),
is_a, part_of, In_subset, Date, Contributor, Present_in_taxon,
Wikipedia_image (foaf:depiction), xref (direct oboInOwl:hasDbXref for
Wikipedia article URL + FMA ID).

Supporting agents/skills:
- ntr-term-researcher: Stage 3 subagent spec
- ontology-term-lookup: OLS4 structured search
- fetch-wiki-info: Wikidata + Wikipedia lookup
- .mcp.json: ols4, artl-mcp, playwright MCP servers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
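
The Stage 1 behaviour described in the commit above (classifying parent IDs and assigning `UBERON:99xxxxx` provisional IDs) could look roughly like the sketch below. The regular expressions, the `UNKNOWN` fallback, and the function names are assumptions for illustration; the real `generate_template.py` is not shown in this conversation.

```python
import re

# Hypothetical patterns; the real script may recognise prefixes differently.
UBERON_RE = re.compile(r"^UBERON:\d{7}$")
FMA_RE = re.compile(r"^FMA:\d+$")


def classify_parent(parent_id: str) -> str:
    """Return 'UBERON', 'FMA', 'ASCTB-TEMP', or 'UNKNOWN' for a parent ID."""
    pid = parent_id.strip()
    if UBERON_RE.match(pid):
        return "UBERON"
    if FMA_RE.match(pid):
        return "FMA"
    if "ASCTB-TEMP" in pid:
        return "ASCTB-TEMP"
    return "UNKNOWN"


def provisional_ids(start: int = 1):
    """Yield UBERON:99xxxxx IDs in order, stopping when the 5-digit range is used up."""
    for n in range(start, 100000):
        yield f"UBERON:99{n:05d}"


ids = provisional_ids()
first_id = next(ids)   # the counter value is used first, then advanced
second_id = next(ids)
```

Using a generator keeps "use the counter, then increment" in one place, so callers cannot get the order wrong.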
… plans

Phases covered:
- Phase 2: grouping vs leaf-node term distinction (linguistic rules, subagent behaviour)
- Phase 3: detect UBERON label-ID mismatches in Stage 1; new WRONG_PARENT: placeholder;
  multi-valued parent column splitting; subagent protocol for mismatch correction
  (informed by ovary run where 7/13 terms had wrong-domain UBERON parent IDs silently accepted)
- Phase 4: scale to full muscular-system table
- Phase 5: generalise to other ASCTB anatomical systems

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dragon-ai-agent
merge_definitions.py:
- Fallback path (parent resolved but rel type unknown) now leaves both
  is_a and part_of blank rather than double-setting them, and lists
  affected term labels in the summary output under 'Relationship
  unresolved' for curator attention
- Remove dead 'if jf.parent.name == "input"' guard — glob never matches
  files in subdirectories

generate_template.py:
- Remove dead write_tsv call with doubled headers that was immediately
  overwritten by the block below it
- Fix counter order: use counter for ID, then increment (was: increment
  then use counter-1)
- Remove hardcoded CONTRIBUTOR_IRI constant; add --contributor CLI arg
  with ORCID format validation; prompts interactively if not supplied

group_terms_by_parent.py:
- Remove derive_wikipedia_urls call and wikipedia_urls field from output
  JSON — parent_label is always "" so the call always returned []; the
  subagent derives Wikipedia URLs independently during lookup

ntr-term-researcher.md:
- Clarify that Wikipedia article page URL (not image URL) goes in xrefs
  at point of successful lookup, as Wikipedia:Article_Title
- Add image relevance check: verify caption/alt text confirms the image
  illustrates the target structure before storing it

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
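
As a rough illustration of the `--contributor` argument with ORCID format validation mentioned above, the following sketch uses an `argparse` type function. The accepted input forms, the function names, and the decision to normalise to a full IRI are assumptions; the interactive prompt is left out. The sample identifier is the fictitious one used in ORCID's own documentation.

```python
import argparse
import re

# Accepts a bare ORCID or the full https://orcid.org/ IRI (assumed input forms).
ORCID_RE = re.compile(r"^(?:https://orcid\.org/)?(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


def orcid_iri(value: str) -> str:
    """Validate the ORCID format and return it as a full IRI."""
    match = ORCID_RE.match(value.strip())
    if not match:
        raise argparse.ArgumentTypeError(f"not a valid ORCID: {value!r}")
    return f"https://orcid.org/{match.group(1)}"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Stage 1 sketch")
    # default=None lets the caller detect a missing value and prompt for it.
    parser.add_argument("--contributor", type=orcid_iri, default=None)
    return parser


args = build_parser().parse_args(["--contributor", "0000-0002-1825-0097"])
```

This checks the format only; it does not verify the ORCID checksum digit or that the identifier exists.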
…name flagging

Addresses issues found in the ovary branch test run where the agent:
- classified layers (corpus luteum granulosa lutein/theca) as is_a parents (should be part_of)
- accepted source-provided broad parents instead of finding more specific ones
- left ASCTB-TEMP placeholders as the only def_xref (no real PMIDs)
- did not flag pathological terms (hemorrhagic, luteinized unruptured) as out of scope
- did not normalise non-standard names ('dominance' instead of 'dominant')

ntr-term-researcher.md changes:
- Step 1 expanded: after confirming source parent, agent must search OLS4 for a more
  specific parent (e.g. primary/secondary ovarian follicle vs generic ovarian follicle)
- New Step 3: scope check (pathological/dysfunctional → out_of_scope) and name check
  (non-standard → name_corrections with curator-reviewable suggestion)
- New Step 5: literature search — must find at least one real PMID/DOI for def_xref;
  ASCTB-TEMP placeholders explicitly disallowed as the only reference
- Step 7 (relationship resolution) rewritten with explicit structural vocabulary:
  layers, zones, heads, bellies, parts, compartments, walls → ALWAYS part_of
  subtypes/stages/members of grouping classes → is_a
  Quick test ('is a kind of' vs 'is part of') with worked examples
- Output JSON adds: def_xrefs_to_add, out_of_scope, name_corrections keys
- Quality checks expanded with explicit rules for layers, pathology, naming

merge_definitions.py changes:
- Refactored load_subagent_outputs to return single dict (less argument tuple churn)
- New behaviour: out_of_scope terms excluded from template (not just confirmed_matches);
  written to <name>-reports/out_of_scope.tsv for curator review
- New behaviour: name_corrections applied to LABEL column; original-source mapping
  written to <name>-reports/name_corrections.tsv
- New behaviour: def_xrefs_to_add appended to def_xref column with deduplication
- Lookup helper accepts both source and corrected labels (agent may key by either)
- Summary output extended with new counters

CLAUDE.md changes:
- Stage 3 description updated to enumerate the new agent responsibilities
- QC checklist extended: real def_xref required, layer/part_of rule, out_of_scope
  and name_corrections review steps
- Output Files Reference adds the two new report files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
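
The "appended to def_xref column with deduplication" behaviour above might be implemented along these lines. The pipe separator matches the def_xref value quoted later in this conversation, but the function name and signature are hypothetical.

```python
def append_def_xrefs(existing: str, to_add: list[str], sep: str = "|") -> str:
    """Append new xrefs to a separator-joined cell, keeping first-seen order."""
    seen: list[str] = []
    for ref in [*existing.split(sep), *to_add]:
        ref = ref.strip()
        if ref and ref not in seen:
            seen.append(ref)
    return sep.join(seen)


merged = append_def_xrefs(
    "PMID:12650404",
    ["PMID:12650404", "ISBN:9780323393225"],
)
```

Preserving order rather than sorting keeps the curator-visible cell stable between merge runs.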
Surveyed 19 existing UBERON 'muscle of X' terms. 14 (74%) use the simple
'genus + part_of some Y' pattern with UBERON:0014892 (skeletal muscle organ,
vertebrate) as genus. 3 use attaches_to_part_of, 2 lack logical definition.

Decision gate passed: simple part_of pattern covers majority of existing
convention. Phase 2 implementation will support genus + part_of only;
attaches_to_part_of, innervated_by, and multi-axiom patterns deferred to
future phases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ates

generate_template.py now classifies each input row as 'leaf' or 'group' using
linguistic regex rules (GROUP_PATTERNS / LEAF_PART_PATTERNS in classify_term_type).

- Leaf rows go to <name>.template.tsv with SC/part_of directives (existing)
- Group rows go to <name>-groups.template.tsv with EC genus + EC part_of some
  location directives (new) — genus and location columns left blank for the
  agent to fill

input.tsv gains a term_type column so curators can see the classification.

Smoke-tested on muscular-system: 20 group / 55 leaf rows out of 75 input terms,
matching ROADMAP prediction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
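
A minimal sketch of regex-based leaf/group classification follows. Only the names `GROUP_PATTERNS`, `LEAF_PART_PATTERNS`, and `classify_term_type` come from the commit message; the patterns themselves and the precedence order are invented for illustration and will not match the real rules.

```python
import re

# Hypothetical rules; the actual patterns are not shown in this PR text.
GROUP_PATTERNS = [
    re.compile(r"\bmuscles\b", re.IGNORECASE),
    re.compile(r"\bmuscle of\b", re.IGNORECASE),
]
LEAF_PART_PATTERNS = [
    re.compile(r"\b(?:head|part|belly|layer) of\b", re.IGNORECASE),
]


def classify_term_type(label: str) -> str:
    """Return 'leaf' or 'group'; sub-part wording wins over group wording."""
    if any(p.search(label) for p in LEAF_PART_PATTERNS):
        return "leaf"
    if any(p.search(label) for p in GROUP_PATTERNS):
        return "group"
    return "leaf"  # default when no rule fires
```

Checking the sub-part patterns first means a label such as "clavicular head of pectoralis major muscle" is not misrouted to the groups template.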
group_terms_by_parent.py now reads both template_initial.tsv and
template_groups_initial.tsv. Leaf rows are grouped by parent UBERON ID as
before. Grouping rows are pooled into a single 'grouping_terms' bucket since
their genus + location values are agent-determined per term, not shared by a
common parent.

Each per-term entry includes term_type ('leaf' or 'group'). Each per-group
JSON has a term_counts summary so curators can see the leaf/group split.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
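
The grouping described above (leaf rows keyed by parent, group rows pooled into one `grouping_terms` bucket, each with a `term_counts` summary) can be sketched as below. The row keys `label` and `parent` and the output layout are assumptions; `term_type`, `term_counts`, and `grouping_terms` are the names used in the commit.

```python
from collections import defaultdict


def group_rows(rows: list[dict]) -> dict:
    """Bucket rows by parent ID, pooling all group-type rows together."""
    buckets = defaultdict(list)
    for row in rows:
        key = "grouping_terms" if row["term_type"] == "group" else row["parent"]
        buckets[key].append(row)

    grouped = {}
    for key, terms in buckets.items():
        counts = {"leaf": 0, "group": 0}
        for term in terms:
            counts[term["term_type"]] += 1
        grouped[key] = {"terms": terms, "term_counts": counts}
    return grouped


example = group_rows([
    {"label": "articularis genu muscle", "parent": "UBERON:0004252", "term_type": "leaf"},
    {"label": "anterior vertebral muscle", "parent": "", "term_type": "group"},
    {"label": "circular pharyngeal muscle", "parent": "", "term_type": "group"},
])
```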
merge_definitions.py now merges subagent outputs into both the leaf and groups
templates. Common fields (definitions, images, xrefs, def_xrefs) are applied
identically; logic columns differ:

- Leaf template: resolved_relationships -> is_a/part_of (existing)
- Groups template: group_template_rows[label] -> {genus, location} populates
  the EC genus and EC part_of some location columns

Group rows missing the agent's genus+location output are flagged 'EC
incomplete' in the summary so curators can investigate.

New report: manual_curation.tsv lists group terms the agent punted (couldn't
fit the simple genus + part_of some Y pattern); includes proposed definition,
reason, and similar UBERON terms found via obo-grep for curator reference.

Refactored row processing into _apply_common_fields helper plus per-template
merge functions (merge_leaf_template, merge_groups_template) so the two
templates share definition/xref/image logic without duplication.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ia obo-grep

ntr-term-researcher.md updated to handle the leaf/group split introduced by
Stage 1 pre-classification:

- New top-of-file 'Term types' paragraph explaining the leaf vs group split
- Input section documents term_type field, term_counts, GROUPING_TERMS bucket
- Step 6 (Write Definitions) now branches: leaf gets Aristotelian form,
  group gets collective form ('A group of muscles that...')
- Step 7 (Resolve Relationship Types) explicitly LEAF-only
- New Step 8 for GROUP terms: use awk over uberon-edit.obo to find similar
  group terms; if they use 'genus + part_of some Y' pattern, populate
  group_template_rows[label] with {genus, location}; otherwise punt to
  manual_curation with similar UBERON stanzas as curator reference
- Output JSON gains group_template_rows and manual_curation keys
- Quality checks updated: every group term must end up in either
  group_template_rows OR manual_curation
- Tools section notes obo-grep.pl may not be in PATH; awk fallback documented

CLAUDE.md updated with the dual-template flow:
- Stage 1 documents the term_type pre-classification
- Stage 3 enumerates the new agent responsibilities (steps 8 and 9)
- QC checklist split: shared / leaf-template / groups-template / reports
- Final Delivery registers both templates in uberon-odk.yaml
- Output Files Reference includes new groups template + manual_curation.tsv
- Column reference table now has separate sections for leaf and groups

ROADMAP marks Phase 2 implementation complete (pending end-to-end agent test).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
10 input terms processed by the new dual-template flow:
- Stage 1: pre-classified into 8 leaf + 2 group rows
- Stage 2: 8 group JSONs (7 leaf parent groups + 1 grouping_terms bucket)
- Stage 3: 8 ntr-term-researcher agents (3 needed retry due to API stream
  timeouts; grouping_terms handled inline after retry stalled)
- Stage 4: dual-template merge

Final output:
- 4 NEW leaf terms with Aristotelian definitions, real PMID/DOI refs,
  resolved is_a/part_of, Wikipedia + FMA xrefs
- 0 NEW group terms — both 'anterior vertebral muscle' and
  'circular pharyngeal muscle' resolved to existing UBERON terms via
  Step 2 OLS4 check, validating that confirmed_match takes precedence
  over Step 8 EC pattern detection
- 5 confirmed_match (5 of 10 input terms already exist in UBERON):
  - ceratoglossus → UBERON:0008428
  - chondroglossus muscle → UBERON:0011535
  - costal diaphragm → UBERON:0035831
  - aryepiglotticus muscle → UBERON:0011025
  - constrictor muscle of pharynx → UBERON:0001569 (NEW from group flow)
- 1 possible_match: anterior vertebral muscle ≈ UBERON:0008549 (prevertebral
  muscle) — flagged for curator decision (synonym vs new term)
- 1 manual_curation: anterior vertebral muscle — UBERON's similar term uses
  attaches_to + innervated_by axioms, not the simple genus + part_of
  pattern, so out of EC template scope

The test exercises: leaf+INFER parent, leaf+NEEDS_MAPPING:FMA parent,
leaf+UNRESOLVABLE parent (including wrong-domain), group→confirmed_match,
group→possible_match+manual_curation. All paths working as designed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ai4c-agent Bot commented Apr 28, 2026

Claude finished @dosumis's task in 5m 44s.


PR Review: Add HRA Muscular NTR

Review checklist:

  • Hierarchy and parents are consistent
  • Definitions and supporting references look adequate
  • Obsolete or replacement handling is safe
  • OBO and ODK conventions appear preserved
  • No obvious CI, auth, or workflow regression

Summary

This PR adds ROBOT template infrastructure and four new NTR (New Term Request) leaf terms for HRA (Human Reference Atlas) muscular-system structures, along with a bulk NTR workflow toolkit. The term-level research appears careful and the definition quality is high. However, there are two issues that should be addressed before merge, and several important follow-on items.


🔴 CRITICAL

1. New templates not registered in ODK config — they will never compile

src/templates/hra-muscular.template.tsv and src/templates/hra-muscular-groups.template.tsv are added, but neither appears in src/ontology/uberon-odk.yaml nor has a corresponding rule in src/ontology/uberon.Makefile.

The precedent is hra-skeleton.template.tsv, which is listed under use_template: true in uberon-odk.yaml and has a custom $(COMPONENTSDIR)/hra_skeleton.owl rule in uberon.Makefile. Without equivalent registration the four NTR terms will never be built into the ontology.

2. dorsal part of intertransversarii laterales lumborum muscle — relationship type mismatch between workflow output and template

The definition research output (bulk_ntr_workflow/outputs/definitions/unresolvable_meningohypophyseal_artery.json) explicitly says:

"Note that 'dorsal part of' implies a subdivision, hence part_of relationship rather than is_a."

and sets resolved_relationships: "part_of". Yet src/templates/hra-muscular.template.tsv (row 6, UBERON:9900010) places UBERON:0008242 in the is_a column, not the part_of column. The research output and the final template are in direct conflict. Curator should decide the correct relationship and make the template consistent.


🟡 IMPORTANT

3. clavicular head of pectoralis major muscle (UBERON:9900008) — no is_a in template

Row 5 of hra-muscular.template.tsv has an empty is_a column and only part_of UBERON:0002381. In OBO format every term needs at least one explicit is_a (or a logical definition from which the reasoner can infer one). The template currently produces a term with only a BFO:0000050 some pectoralis_major SubClassOf axiom and no direct type. A parent such as "muscle head" (if such a class exists or should be created) or at minimum a generic skeletal muscle parent should be added.
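
A check for this kind of gap could be scripted before review. The sketch below assumes the template has a header row, then a ROBOT directive row, then data rows, and that the columns are named `ID` and `is_a` as in the column list given earlier; the function name is hypothetical. It only flags candidates: a row with a logical definition elsewhere may still be acceptable.

```python
import csv
import io


def rows_missing_is_a(tsv_text: str) -> list[str]:
    """Return the IDs of data rows whose is_a cell is empty."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    data_rows = list(reader)[1:]  # skip the ROBOT directive row
    return [r["ID"] for r in data_rows if not (r.get("is_a") or "").strip()]


sample = (
    "ID\tLABEL\tis_a\tpart_of\n"
    "ID\tLABEL\tSC %\tSC BFO:0000050 some %\n"
    "UBERON:9900003\tarticularis genu muscle\tUBERON:0004252\t\n"
    "UBERON:9900008\tclavicular head of pectoralis major muscle\t\tUBERON:0002381\n"
)
flagged = rows_missing_is_a(sample)
```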

4. ASCTB-TEMP IRIs in the def_xref column

All four terms carry https://purl.org/ccf/ASCTB-TEMP_<label> in the def_xref column (the column mapped to oboInOwl:hasDbXref on the definition). These are ephemeral provenance tracking IRIs from the HRA ASCT+B tables, not stable literature citations. They should be moved to the general xref column (or dropped) so they are not asserted as definition sources.

For example, anterior cervical intertransversarii muscle cites:

https://fipat.library.dal.ca/ta2/|ISBN:9780323393225|https://purl.org/ccf/ASCTB-TEMP_anterior-cervical-intertransversarii-muscle|PMID:12650404

The ASCTB-TEMP IRI is the third token in the definition xref list.
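
One possible way to separate these IRIs from the definition xrefs is sketched below. It assumes the pipe-separated cell format shown in the example; the function name is hypothetical, and whether the moved values go to the xref column or are dropped remains a curator decision.

```python
def split_asctb_temp(def_xref: str, sep: str = "|") -> tuple[str, list[str]]:
    """Return (def_xref without ASCTB-TEMP IRIs, list of removed IRIs)."""
    keep: list[str] = []
    moved: list[str] = []
    for ref in (r.strip() for r in def_xref.split(sep)):
        if not ref:
            continue
        (moved if "ASCTB-TEMP" in ref else keep).append(ref)
    return sep.join(keep), moved


cleaned, moved = split_asctb_temp(
    "https://fipat.library.dal.ca/ta2/|ISBN:9780323393225|"
    "https://purl.org/ccf/ASCTB-TEMP_anterior-cervical-intertransversarii-muscle|"
    "PMID:12650404"
)
```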

5. Missing term_tracker_item in both templates

Per CLAUDE.md and Uberon conventions, terms should link back to the originating GitHub issue with a term_tracker_item property. Neither template has a column for this. The hra-skeleton precedent and general UBERON practice require it for traceable provenance.

6. hra-muscular-reports/ placed under src/templates/

src/templates/hra-muscular-reports/ contains pipeline outputs (candidates.tsv, errors.tsv, input.tsv, manual_curation.tsv), not template files. These will confuse any tool that iterates src/templates/*.tsv or src/templates/**/*.tsv. Reports should live in bulk_ntr_workflow/outputs/ or similar, not inside the template directory.


🔵 SUGGESTIONS

7. Binary Excel file in repo root

hra_unmapped-asct-term-list-with-refs.xlsx (277 KB) is added to the repository root. Binary blobs in a git repo grow the clone size permanently. If this is source data, consider storing it under src/datasets/ with a brief README, or keeping it outside git (linked from the issue tracker).

8. ISBN:9780323393225 as a def_xref

ISBN references are non-standard in OBO/UBERON; PMID: and doi: are the expected citation forms. The TA2 URL (https://fipat.library.dal.ca/ta2/) is already included and serves as the Terminologia Anatomica reference. The ISBN could be removed from def_xref or moved to a comment.

9. FMA parent for articularis genu (UBERON:0004252) is very broad

FMA:22424 "Muscle of anterior compartment of thigh" has no UBERON equivalent, so the workflow fell back to UBERON:0004252 "hindlimb stylopod muscle". The fallback is documented in the definition output and the errors report, but UBERON:0004252 is a very coarse grouping. The definition output itself suggests considering UBERON:0001377 (quadriceps femoris) and explicitly notes that "articularis genu is anatomically discrete from the quadriceps". Curator should confirm the fallback is acceptable or note it for a follow-up parent term request.

10. @playwright/mcp@latest in .mcp.json is a floating version

Using @latest means the Playwright MCP server version is not pinned. Different contributors will pull different versions, which can affect reproducibility of the research workflow. Pinning to a specific semver is safer for shared infrastructure.

11. bulk_ntr_workflow/ adds substantial non-ontology content

The PR adds 1,700+ lines of Python scripts, JSON outputs, and documentation for the bulk NTR pipeline. This infrastructure is valuable but may belong in a separate utilities repository or be .gitignored when done. If it stays, the bulk_ntr_workflow/CLAUDE.md should be moved to somewhere that does not accidentally extend the project-level CLAUDE.md scope.


Per-term verdict

| NTR ID | Label | Parent | Status |
|--------|-------|--------|--------|
| UBERON:9900001 | anterior cervical intertransversarii muscle | is_a UBERON:0002324 (muscle of back) | ✅ Reasonable; parent broad but defensible |
| UBERON:9900003 | articularis genu muscle | is_a UBERON:0004252 (hindlimb stylopod muscle) | 🟡 Fallback parent; FMA:22437 xref looks correct |
| UBERON:9900008 | clavicular head of pectoralis major muscle | part_of UBERON:0002381 (pectoralis major) | 🔴 No is_a |
| UBERON:9900010 | dorsal part of intertransversarii laterales lumborum muscle | template: is_a UBERON:0008242; workflow output: part_of UBERON:0008242 | 🔴 Relationship type conflict |

Confirmed matches correctly excluded from template: aryepiglottic part of oblique arytenoid muscle (→ UBERON:0011025), costal part of respiratory diaphragm muscle (→ UBERON:0035831), ceratoglossus of hyoglossus muscle (→ UBERON:0008428), chondroglossus of hyoglossus muscle (→ UBERON:0011535), circular pharyngeal muscle (→ UBERON:0001569).


The term research quality and definition writing are solid. The main blockers are the missing ODK registration (templates won't compile without it) and the relationship-type inconsistency on UBERON:9900010. Addressing the missing is_a on UBERON:9900008 and the ASCTB-TEMP def_xrefs would bring the templates into line with Uberon conventions.

@ai4c-agent ai4c-agent Bot left a comment

Two blocking issues before merge:

  1. Templates not wired into the build: hra-muscular.template.tsv and hra-muscular-groups.template.tsv are not registered in uberon-odk.yaml or uberon.Makefile. Without equivalent entries to the hra-skeleton precedent these terms will never compile into the ontology.

  2. Relationship-type conflict on UBERON:9900010 — the definition workflow output (unresolvable_meningohypophyseal_artery.json) explicitly recommends part_of for dorsal part of intertransversarii laterales lumborum muscle but the template uses is_a. These need to be reconciled.

Also flagged as important (see full comment):

  • UBERON:9900008 (clavicular head) has no is_a, only part_of
  • ASCTB-TEMP IRIs in the def_xref column should be in xref
  • term_tracker_item missing from both templates
  • Report files under src/templates/hra-muscular-reports/ should not live inside the template directory

@ai4c-agent ai4c-agent Bot left a comment

Two blocking issues before merge. 1) Templates not wired into the build: hra-muscular.template.tsv and hra-muscular-groups.template.tsv are not registered in uberon-odk.yaml or uberon.Makefile. 2) Relationship-type conflict on UBERON:9900010: workflow output says part_of but template uses is_a. See full review comment for details.

dosumis and others added 8 commits April 28, 2026 11:17
…umn assignment

Previously the agent picked is_a OR part_of and put the resolved parent in one
column, leaving the other empty. UBERON convention for specific named structures
is to populate BOTH — e.g. vastus lateralis (UBERON:0001379) has
'is_a UBERON:0001630 (muscle organ)' AND 'relationship: part_of UBERON:0001377
(quadriceps femoris)'.

ntr-term-researcher.md changes:
- Step 7 rewritten: agent now uses awk over uberon-edit.obo to find similar
  specific UBERON terms, identifies the genus class (typically muscle organ
  UBERON:0001630, muscle head UBERON:0011906) AND the part_of containing
  structure, then populates leaf_template_rows[label] = {is_a, part_of}
- Worked examples included (clavicular head, articularis genu, dominant
  follicle subtypes)
- Explicit warning: do NOT just take the source parent and assign it to one
  column; the source parent is often too broad to serve as the genus
- Output JSON gains leaf_template_rows key (analogous to group_template_rows)
- Quality checks updated: prefer leaf_template_rows; both is_a and part_of
  should be populated when applicable

merge_definitions.py changes:
- load_subagent_outputs reads leaf_template_rows
- merge_leaf_template uses leaf_template_rows first; falls back to legacy
  resolved_relationships + resolved_parents if absent (backward compatible)
- New counter 'leaf_template_rows used' in summary output

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
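
The precedence described above (use `leaf_template_rows` first, fall back to the legacy `resolved_relationships` plus `resolved_parents`) might look like the sketch below. The per-label dictionary shapes and the function name are assumptions; the key names come from the commit messages. It also keeps the earlier rule that an unknown relationship type leaves both columns blank.

```python
def resolve_logic_columns(label: str, output: dict) -> tuple[dict, str]:
    """Return ({'is_a': ..., 'part_of': ...}, name of the path that was used)."""
    row = output.get("leaf_template_rows", {}).get(label)
    if row:
        cols = {"is_a": row.get("is_a", ""), "part_of": row.get("part_of", "")}
        return cols, "leaf_template_rows"

    # Legacy path: one relationship type plus one resolved parent per label.
    cols = {"is_a": "", "part_of": ""}
    rel = output.get("resolved_relationships", {}).get(label)
    parent = output.get("resolved_parents", {}).get(label, "")
    if rel in cols and parent:
        cols[rel] = parent  # otherwise both stay blank for curator attention
    return cols, "legacy"


new_style = {"leaf_template_rows": {
    "clavicular head of pectoralis major muscle":
        {"is_a": "UBERON:0011906", "part_of": "UBERON:0002381"}}}
cols, path = resolve_logic_columns("clavicular head of pectoralis major muscle", new_style)
```

Returning the path name alongside the columns makes it straightforward to maintain a counter such as "leaf_template_rows used" in the summary.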
…lated

After updating Step 7 of the agent spec to require obo-grep lookup of similar
UBERON terms before assigning the parent, re-ran the 4 leaf-term agents on the
muscular-system test set. All 4 now populate both is_a and part_of columns:

| Term                                                | is_a               | part_of           |
|-----------------------------------------------------|--------------------|-------------------|
| anterior cervical intertransversarii muscle         | muscle organ       | neck              |
| articularis genu muscle                             | muscle organ       | hindlimb stylopod |
| clavicular head of pectoralis major muscle          | muscle head        | pectoralis major  |
| dorsal part of intertransversarii laterales lumborum| muscle organ       | lower back muscle |

Notable: clavicular head correctly resolved to is_a UBERON:0011906 (muscle
head) — matching the long head of biceps brachii (UBERON:0007168) precedent.
articularis genu correctly distinguished UBERON:0004252 (sibling grouping
class, not container) from UBERON:0000376 (the actual containing region) by
following the pectineus precedent.

Stage 4 reports leaf_template_rows used=4, legacy resolved_relationships=0
— confirms the new path is exercising the proper genus+location lookup
rather than the legacy single-column assignment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Standalone experiment in bulk_ntr_workflow/experiments/. Workflow scripts
NOT modified. Tests whether an agent can extract origin/insertion/
innervation/action from Wikipedia + uberon-edit.obo with UBERON ID
resolution and verbatim evidence quotes per field.

Test set (well-known whole muscle → obscure sub-part):
- internal abdominal oblique muscle (existing UBERON:0005454)
- tensor fasciae latae muscle (existing UBERON:0001376)
- iliocostalis cervicalis muscle (existing UBERON:0008546)
- articularis genu muscle (NEW)
- clavicular head of pectoralis major muscle (NEW, muscle head)
- dorsal part of intertransversarii laterales lumborum (NEW, obscure)

Findings (full report in SUMMARY.md):

1. All 6 terms got 5-6 of 6 enrichment fields populated. Where UBERON IDs
   couldn't be resolved (named attachments, specific nerves, specific bone
   landmarks), the agent gracefully fell back to free-text quotes plus
   parent-class UBERON IDs.

2. The hypothesis that 'muscle parts are poorly axiomatised' is partly
   confirmed: parent muscle classes for sub-parts are missing
   (intertransversarii laterales lumborum), but the bigger gap is in
   UBERON's coverage of related anatomical entities — superior gluteal
   nerve, lateral pectoral nerve, iliotibial tract, suprapatellar bursa,
   ilioinguinal nerve, linea alba, accessory process of lumbar vertebra
   are all missing. A famous muscle like tensor fasciae latae has 2 such
   gaps; the obscure dorsal sub-part has 3 — gaps are not strongly
   correlated with term obscurity.

3. The verbatim-quote design works well for review. Each enrichment field
   carries 1-3 sentences of evidence + a source URL, making the
   enrichment auditable in seconds per field.

4. 3 of 6 picks turned out to be already in UBERON despite being plausible
   NTR candidates — Step 2 (existing-term check) continues to do real work.
   Existing UBERON stanzas often have surprisingly light axiomatisation
   (tensor fasciae latae has only 1 origin axiom), so enrichment could
   also improve existing terms, not just new ones.

No workflow changes; results are reference material for a future enrichment
phase. Roadmap candidates: (a) system-specific templates with pre-extracted
fields per system, (b) standardised evidence-quote design across all fields,
(c) cascade detection — flag missing UBERON entities as candidate NTRs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User hypothesis: simple is_a + part_of would be sufficient for ovary terms
(unlike muscles where origin/insertion/innervation are needed).

Result: hypothesis NOT confirmed. 5 of 6 ovary terms tested require
relations beyond is_a + part_of:

- Layers (corona radiata, CL granulosa lutein, CL theca): need
  composed_primarily_of (CL:cell type) and/or bounding_layer_of, has_part
- Compositional complex (cumulus oophorus oocyte complex): needs has_part
  to distinguish from cumulus oophorus alone
- Follicle stages (early antral, transitional primary): need develops_from
  PLUS has_component with cardinality constraints PLUS
  has_potential_to_develop_into — UBERON's existing precedent
  (UBERON:0000035/36/37) uses all of these

Why ovary is harder for simple is_a + part_of than expected:
- Sibling layers share part_of (both lutein + theca layers part_of corpus
  luteum) — part_of alone doesn't differentiate
- Sibling follicle stages share is_a (all primary/secondary/tertiary
  is_a ovarian follicle AND part_of ovary) — neither relation distinguishes
- The defining property is cellular composition or developmental position,
  neither captured by spatial part_of

Cross-experiment comparison:
- Muscle group: simple genus + part_of EC sufficient (74% precedent)
- Muscle individual: needs muscle origin/insertion/innervation
- Muscle head/sub-part: simple is_a + part_of works (sparse precedent)
- Ovary layer/complex/stage: needs composed_primarily_of, has_part,
  develops_from, cardinality

Conclusion: per-system templates are warranted. A single one-size-fits-all
leaf template either over-fits one domain or under-serves both.

The evidence-quote JSON design transferred cleanly between domains —
confirming it as a generalisable pattern.

Output: bulk_ntr_workflow/experiments/SUMMARY_OVARY.md and 6 enriched JSONs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stage 1 now partitions input rows by source `tables` value into system overlays.
Each overlay produces its own clean leaf template with system-specific columns;
unmapped tables go to the default template.

Phase 6 — develops_from on default leaf template
- New optional column with directive `SC RO:0002202 some %`
- Empty cell → no axiom (standard ROBOT pattern; do NOT alter directive to
  work around empty cells)
- Populated by agent for stage series (follicle stages, embryonic stages, etc.)

Phase 7 — skeletal-muscle overlay
- New muscle template variant: <name>-muscle.template.tsv
- Adds has_muscle_origin (RO:0002372), has_muscle_insertion (RO:0002373),
  innervated_by (RO:0002005)
- All optional; populated by agent only with evidence-quoted UBERON IDs
- Triggered by source table value `muscular-system` (SYSTEM_OVERLAYS map)
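
The routing by source table can be sketched as follows. `SYSTEM_OVERLAYS`, `classify_system`, the `tables` column, and the `muscular-system` trigger are named in this commit; the overlay key `muscle`, the `default` fallback name, the case-insensitive match, and the `route_rows` helper are assumptions. Group rows are routed separately and are not handled here.

```python
from collections import defaultdict

# Only the muscular-system mapping is named in the commit message.
SYSTEM_OVERLAYS = {"muscular-system": "muscle"}


def classify_system(table: str) -> str:
    """Map a source table value to an overlay name, or 'default' if unmapped."""
    return SYSTEM_OVERLAYS.get(table.strip().lower(), "default")


def route_rows(rows: list[dict]) -> dict[str, list[dict]]:
    """Partition leaf rows into one list per overlay."""
    leaf_rows_by_overlay = defaultdict(list)
    for row in rows:
        leaf_rows_by_overlay[classify_system(row["tables"])].append(row)
    return dict(leaf_rows_by_overlay)


routed = route_rows([
    {"label": "articularis genu muscle", "tables": "muscular-system"},
    {"label": "corona radiata", "tables": "ovary"},
])
```

One template TSV would then be written per key of the returned dictionary.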

Implementation notes:
- generate_template.py: SYSTEM_OVERLAYS, classify_system(), overlay_paths();
  per-row routing builds leaf_rows_by_overlay dict; one template TSV written
  per overlay
- group_terms_by_parent.py: reads ALL leaf templates (default + system
  overlays) via discover_leaf_templates(); each per-term JSON entry now
  carries `system` field
- merge_definitions.py: REFACTORED to use header-name lookup
  (header_indices()) instead of hardcoded column indices. Each leaf
  template variant's columns are looked up at merge time. Optional logic
  columns (develops_from, has_muscle_*) populate from
  leaf_template_rows[label] when both column and value exist.
- agent spec: documents system field, develops_from + muscle-overlay
  guidance; output JSON example shows the optional fields
- CLAUDE.md: column reference splits leaf table into default + muscle
  overlay; new partitioning subsection

Smoke-tested with --table muscular-system --limit 10:
- Step 0 routing correctly outputs muscle=8, group=2 (no default partition
  since all 10 rows are muscular-system)
- Output template has 16 columns with the 6 expected logic relations
  (is_a + part_of + develops_from + 3 muscle)
- Merge with existing leaf_template_rows from previous run still works
  via legacy is_a + part_of fallback (Optional cols filled: 0 because
  agents haven't been re-run with new spec)

Phase 8 (term promotion) and overlays for skeleton/vasculature/nervous
documented in ROADMAP only — not implemented.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
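
The header-name lookup that replaced hardcoded column indices could be as small as the sketch below. `header_indices` is the name used in the implementation notes above; its signature and the `set_if_present` helper are hypothetical.

```python
def header_indices(header: list[str]) -> dict[str, int]:
    """Map each column name to its position in this template variant."""
    return {name: i for i, name in enumerate(header)}


def set_if_present(row: list[str], idx: dict[str, int], column: str, value: str) -> bool:
    """Write value only when the template has the column and value is non-empty."""
    if value and column in idx:
        row[idx[column]] = value
        return True
    return False


header = ["ID", "LABEL", "is_a", "part_of", "has_muscle_origin"]
idx = header_indices(header)
row = [""] * len(header)
filled = set_if_present(row, idx, "has_muscle_origin", "UBERON:0000981")
skipped = set_if_present(row, idx, "innervated_by", "UBERON:0001267")  # column absent
```

Because optional columns are written only when both the column and a value exist, the same merge code can serve the default template and the muscle overlay.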
Re-ran Stage 1 with --table muscular-system --limit 10. Output:
- Step 0 routing: muscle=8, group=2 (correctly partitioned by source table)
- <name>-muscle.template.tsv has 16 columns: 6 logic relations
  (is_a, part_of, develops_from, has_muscle_origin, has_muscle_insertion,
  innervated_by) + 10 metadata
- Default leaf template absent (no rows for it — all are muscular-system)

Re-ran one agent (articularis genu) with the updated spec. Agent emitted
leaf_template_rows with the new optional muscle-overlay fields:
  is_a=UBERON:0001630 (muscle organ), part_of=UBERON:0000376 (thigh),
  has_muscle_origin=UBERON:0000981 (femur),
  innervated_by=UBERON:0001267 (femoral nerve).
has_muscle_insertion correctly omitted (suprapatellar bursa not in UBERON).

Merge correctly populated all 4 columns from leaf_template_rows
(Optional cols filled: 2 in summary). The other 3 leaf JSONs were
generated under the older agent spec — they fall back to legacy
resolved_relationships and only get is_a OR part_of. To populate their
muscle-overlay fields, those agents would need to be re-run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stage 1 partitioning: muscle=55, group=20 (no default partition)
Stage 2 grouping: 45 groups (44 leaf + 1 grouping_terms bucket of 20 group terms)
Stage 3: 45 agents in 6 parallel batches; all completed without API timeouts
Stage 4 merge results:

Leaf template <name>-muscle.template.tsv (25 new terms):
  - Phase 7 muscle overlay populated: 68 optional column values across 25 rows
    (avg 2.7 muscle-specific columns per term — origin/insertion/innervated_by)
  - All 25 use leaf_template_rows (no legacy fallback path triggered)
  - 0 PENDING definitions, 0 INFER relationships, 0 unresolved
  - 24 of 25 have real PMID/DOI/ISBN def_xrefs added

Groups template <name>-groups.template.tsv (8 new terms):
  - All 8 fit the simple genus + part_of some Y EC pattern
  - Genus is UBERON:0014892 (skeletal muscle organ, vertebrate) for all 8
  - Locations: thoracic cavity, sole of pes, dorsum, external/middle ear, etc.

Reports:
  - candidates.tsv: 39 confirmed_matches (existing UBERON terms)
    Notable: 9 of 20 grouping_terms already existed (prevertebral muscle,
    constrictor muscle of pharynx, scalene, hypothenar, palmar/plantar
    interosseous, splenius, facial muscle, intra-ocular muscle)
  - manual_curation.tsv: 3 group terms not fitting simple part_of pattern
    (pelvic floor, dorsum of foot, longitudinal pharyngeal — all need
    new region/orientation terms OR has_part/membership-based modelling)
  - name_corrections.tsv: 17 entries (typos like 'capitus' → 'capitis',
    Latin variants like 'fascia' → 'fasciae', 'tibalis' → 'tibialis')

Wrong-domain parent issues flagged by agents (data quality in HRA source):
  - sternum→diaphragm (costal/sternal parts)
  - splenic artery→intrinsic muscle of tongue (inferior longitudinal)
  - innominate bone→pharyngeal muscle (pterygopharyngeus)
  - tip of renal papilla→hand muscle (palmar interosseous)
  - meningohypophyseal artery→lumbar back muscle (dorsal intertransversarii)
  - and many more — captured in individual JSON unresolvable notes

NOT merged to main. Branch retains everything for review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ai4c-agent Bot commented May 11, 2026

Claude finished @dosumis's task in 5m 44s


PR Review: Add HRA Muscular NTR (Re-review after sync)

Review checklist:

  • Hierarchy and parents are consistent
  • Definitions and supporting references look adequate
  • Obsolete or replacement handling is safe
  • OBO and ODK conventions appear preserved
  • No obvious CI, auth, or workflow regression

Tasks:

  • Load context and begin review
  • Inspect current template files and term content
  • Check ODK config and Makefile registration
  • Verify relationship types and is_a completeness
  • Check xrefs and ASCTB-TEMP handling
  • Review new/changed files vs previous issues
  • Post final review

Summary

The PR has grown substantially since the first review: the original four-term hra-muscular.template.tsv has been replaced by hra-muscular-muscle.template.tsv (25 leaf terms) and hra-muscular-groups.template.tsv (8 grouping terms), for 33 new NTRs total. Two prior blockers were resolved: UBERON:9900008 (clavicular head) now correctly has is_a UBERON:0011906 (muscle head), and the UBERON:9900010 relationship-type conflict was reconciled. However, the build-registration blocker remains, and a new critical logical conflict has been introduced in the groups template.


✅ Fixed since previous review

| Issue | Status |
|---|---|
| UBERON:9900008 (clavicular head) missing is_a | ✅ Fixed: now is_a UBERON:0011906 (muscle head) |
| UBERON:9900010 relationship-type conflict (is_a vs part_of) | ✅ Resolved: now is_a UBERON:0001630 + part_of UBERON:0001137 |

🔴 CRITICAL

1. Templates still not registered in ODK config or Makefile

Neither hra-muscular-muscle.template.tsv nor hra-muscular-groups.template.tsv appears in src/ontology/uberon-odk.yaml or src/ontology/uberon.Makefile. The precedent set by hra-skeleton.template.tsv requires entries in both files:

In uberon-odk.yaml:

- filename: hra_muscular.owl
  templates:
    - hra-muscular-muscle.template.tsv
    - hra-muscular-groups.template.tsv

And a corresponding build rule in uberon.Makefile (mirroring the hra_skeleton.owl rule). Without these, all 33 new terms will never compile into the ontology.

2. Groups template: three back-muscle terms have identical EC axiom definitions — reasoner will infer equivalence

intermediate back muscle (UBERON:9900020), segmental back muscle (UBERON:9900055), and superficial back muscle (UBERON:9900063) all have the same genus (UBERON:0014892) and location (UBERON:0001137), producing:

EquivalentClass(UBERON:9900020, UBERON:0014892 AND (BFO:0000050 some UBERON:0001137))
EquivalentClass(UBERON:9900055, UBERON:0014892 AND (BFO:0000050 some UBERON:0001137))
EquivalentClass(UBERON:9900063, UBERON:0014892 AND (BFO:0000050 some UBERON:0001137))

The OWL reasoner will immediately conclude all three are equivalent classes. Grouping terms like these are usually defined via SC (SubClassOf), not EC (EquivalentClass), precisely because equivalent-class axioms require fully differentiating conditions. Either change the template column from EC % / EC BFO:0000050 some % to SC % / SC BFO:0000050 some % for all grouping rows, or provide distinct location terms for each of the three back layers.
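
A collision of this kind can be caught before running a reasoner by grouping rows on their (genus, location) pair. The sketch below is only an illustration: `find_ec_collisions()` is a hypothetical helper, and the rows are hand-copied from the IDs cited in this review rather than read from the template file.

```python
from collections import defaultdict

def find_ec_collisions(rows):
    """Return each (genus, location) pair that is shared by more than one term."""
    by_definition = defaultdict(list)
    for term_id, genus, location in rows:
        by_definition[(genus, location)].append(term_id)
    return {key: ids for key, ids in by_definition.items() if len(ids) > 1}

# (term ID, EC genus, EC part_of location) as cited in this review.
rows = [
    ("UBERON:9900020", "UBERON:0014892", "UBERON:0001137"),  # intermediate back muscle
    ("UBERON:9900055", "UBERON:0014892", "UBERON:0001137"),  # segmental back muscle
    ("UBERON:9900063", "UBERON:0014892", "UBERON:0001137"),  # superficial back muscle
    ("UBERON:9900047", "UBERON:0014892", "UBERON:0003697"),  # posterior abdominal wall muscle
]

collisions = find_ec_collisions(rows)
for (genus, location), ids in collisions.items():
    print(f"{genus} AND part_of some {location}: {', '.join(ids)}")
```

Any pair reported here would yield identical EquivalentClass axioms if the columns stay tagged EC.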


🟡 IMPORTANT

3. ASCTB-TEMP IRIs still in the def_xref column

All 33 terms still carry https://purl.org/ccf/ASCTB-TEMP_<label> in the def_xref column (mapped to oboInOwl:hasDbXref on the definition annotation). These are ephemeral workflow-tracking IRIs, not stable literature citations. They should be moved to the general xref column or dropped entirely so they are not asserted as definition sources. Flagged in the previous review; still unfixed.
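
One way to relocate these IRIs is sketched below, assuming that both columns hold pipe-separated values. `move_temp_iris()` is a hypothetical helper, and the PMID, the FMA ID and the `example-label` suffix are placeholders, not values from the templates.

```python
ASCTB_PREFIX = "https://purl.org/ccf/ASCTB-TEMP_"

def move_temp_iris(def_xref, xref=""):
    """Move ASCTB-TEMP IRIs from def_xref to xref, keeping all other values in place."""
    parts = [p.strip() for p in def_xref.split("|") if p.strip()]
    keep = [p for p in parts if not p.startswith(ASCTB_PREFIX)]
    moved = [p for p in parts if p.startswith(ASCTB_PREFIX)]
    existing = [p.strip() for p in xref.split("|") if p.strip()]
    # Append moved IRIs without duplicating anything already in xref.
    merged = existing + [m for m in moved if m not in existing]
    return "|".join(keep), "|".join(merged)

# Placeholder values for illustration only.
new_def_xref, new_xref = move_temp_iris(
    "PMID:0000000|https://purl.org/ccf/ASCTB-TEMP_example-label",
    "FMA:0000000",
)
print(new_def_xref)
print(new_xref)
```

Dropping the IRIs instead of moving them would simply mean discarding the second return value.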

4. term_tracker_item missing from both templates

Per CLAUDE.md and Uberon conventions, terms should link back to the originating GitHub issue. Neither template has a term_tracker_item column. Flagged in the previous review; still absent.

5. hra-muscular-reports/ still under src/templates/

Pipeline outputs (candidates.tsv, errors.tsv, input.tsv, manual_curation.tsv) live in src/templates/hra-muscular-reports/. This is not a template artefact and will confuse tools that iterate src/templates/**/*.tsv. Flagged in the previous review; still in place.

6. Three leaf terms use is_a UBERON:0001630 (muscle organ) instead of the more specific UBERON:0014892 (skeletal muscle organ, vertebrate)

  • UBERON:9900003 (articularis genu muscle)
  • UBERON:9900010 (dorsal part of intertransversarii laterales lumborum muscle)
  • UBERON:9900026 (lateral posterior cervical intertransversarii muscle)

All other skeletal muscle leaf terms in the same template correctly use UBERON:0014892. These three are clearly skeletal muscles and should be consistent.

7. UBERON:9900025 (lateral compartment of leg muscle) — part_of column is empty

The definition explicitly states the structure is "part of the lateral (fibular) fascial compartment of the leg". The part_of column in the template row has no value. If a lateral compartment region term exists or is being added, it should be set here. Otherwise the logical and textual definitions are misaligned.


🔵 SUGGESTIONS

8. ISBN:9780323393225 in def_xref is non-standard

ISBN citations are not standard OBO/UBERON practice. The TA2 URL (https://fipat.library.dal.ca/ta2/) already covers the Terminologia Anatomica source. The ISBN could be dropped from def_xref or moved to a comment.

9. UBERON:9900047 (posterior abdominal wall muscle) — location is generic abdominal wall

The EC axiom uses UBERON:0003697 (abdominal wall), but the term name specifies the posterior abdominal wall. If a posterior abdominal wall region term exists, it would be a more precise location.


Per-template verdict

Leaf template (25 terms) — most terms look anatomically sound with good definitions and PMIDs. Specific concerns:

| NTR | Issue |
|---|---|
| UBERON:9900003 (articularis genu) | is_a too broad (muscle organ instead of skeletal muscle organ, vertebrate) |
| UBERON:9900010 (dorsal part lumborum) | is_a too broad (same issue) |
| UBERON:9900025 (lateral compartment leg) | part_of column blank despite definition mentioning compartment |
| UBERON:9900026 (lateral post. cervical intertransversarii) | is_a too broad (same issue) |

Groups template (8 terms): the logical-definition structure is sound in principle, but three back-layer groupings share identical EC axioms (see critical issue 2 above). The remaining five groupings (ear, abdominal wall, sole, thoracic wall, middle ear) each have unique location values and are fine.

Two prior blockers are resolved. The remaining critical items (build registration and EC axiom collision) must be addressed before merge.

@github-actions

This PR modifies some logical definitions. See details below:

Number of classes with modified logical definitions: 2

trabecular sinus of lymph node

No changes in inferred subclasses.

subcapsular sinus of lymph node

No changes in inferred subclasses.


@ai4c-agent ai4c-agent Bot left a comment

Two critical issues must be fixed before merge:

  1. Templates still not registered in ODK config/Makefile — both hra-muscular-muscle.template.tsv and hra-muscular-groups.template.tsv are absent from uberon-odk.yaml and uberon.Makefile. No component build rule exists, so all 33 new terms will never compile.

  2. Groups template EC axiom collision — intermediate back muscle (UBERON:9900020), segmental back muscle (UBERON:9900055), and superficial back muscle (UBERON:9900063) all produce identical EquivalentClass axioms (skeletal muscle organ AND part_of dorsum). The reasoner will infer these three terms are equivalent. These grouping terms should use SC (SubClassOf) columns, not EC (EquivalentClass).

Fixed from prior review: UBERON:9900008 now has is_a (muscle head) ✅; UBERON:9900010 relationship conflict resolved ✅.

Still outstanding from prior review (IMPORTANT): ASCTB-TEMP IRIs remain in the def_xref column; term_tracker_item columns missing from both templates; hra-muscular-reports/ still under src/templates/.

See the full review comment for details and per-term verdict.

dosumis and others added 3 commits May 11, 2026 17:35
…m run

26 unresolvable notes extracted from per-term JSON outputs in
bulk_ntr_workflow/outputs/definitions/. Most are wrong-domain parent
issues in HRA source data (sternum→diaphragm, splenic artery→tongue
muscle, innominate bone→pharyngeal muscle, meningohypophyseal artery→
lumbar muscle, anterior cerebral artery→hand muscle, fused sacrum→
abdominal muscle, etc.).

Workflow gap noted: merge_definitions.py currently writes aggregated
reports for confirmed_matches, name_corrections, out_of_scope, and
manual_curation, but NOT for unresolvable. Adding this aggregation to
the merge script would surface these systematically without manual
extraction. Tracked for follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces standalone unresolvable.tsv with a single curator-facing review
file. Each row of the original input gets:

| Column                  | Purpose                                        |
|-------------------------|------------------------------------------------|
| table, as_iri, label    | Input source identifiers                       |
| term_type               | leaf or group                                  |
| source_parent_id        | Parent ID from source data                     |
| source_parent_label     | Parent label from source                       |
| status                  | confirmed_match, new_term_leaf,                |
|                         | new_term_group, or manual_curation             |
| mapped_uberon_id        | Either existing UBERON ID (if confirmed)       |
|                         | OR new UBERON:99xxxxx ID (if new term)         |
| label_correction        | Suggested corrected label (typos etc.)         |
| label_correction_reason | Why the label was corrected                    |
| parent_correction       | Corrected parent UBERON ID (when source was    |
|                         | wrong-domain or missing)                       |
| curator_notes           | Pipe-separated unresolvable notes from agents  |
|                         | (wrong-domain parents, missing UBERON          |
|                         | entities, modelling caveats)                   |

Aggregated stats for the muscular-system run:
- 75 input rows
- Status: 39 confirmed_match, 25 new_term_leaf, 8 new_term_group,
  3 manual_curation
- 17 with label corrections
- 34 with parent corrections (parent_correction populated)
- 20 with curator notes (mostly wrong-domain source parents)

This is the file curators should review first — one row per input
term, all findings consolidated, no need to dig into per-term JSONs.

Built one-off via Python script joining input.tsv + candidates.tsv +
name_corrections.tsv + manual_curation.tsv + per-term unresolvable
notes from outputs/definitions/*.json. Should be moved into
merge_definitions.py as an aggregated report writer (follow-up).
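
As a rough idea of what such an aggregated report writer might look like, the sketch below left-joins per-report lookups onto the input rows. It assumes `as_iri` is the join key and that each report has already been loaded into a dict; `build_review_rows()`, the example IRIs and the sample values are all hypothetical.

```python
def build_review_rows(input_rows, candidates, name_corrections, notes):
    """Left-join per-report lookups onto the input rows, keyed by as_iri."""
    review = []
    for row in input_rows:
        key = row["as_iri"]
        merged = dict(row)
        merged["mapped_uberon_id"] = candidates.get(key, "")
        merged["label_correction"] = name_corrections.get(key, "")
        # Curator notes from several agents are pipe-separated in one cell.
        merged["curator_notes"] = " | ".join(notes.get(key, []))
        review.append(merged)
    return review

# Hypothetical inputs; IRIs and labels are illustrative only.
input_rows = [
    {"table": "muscular-system", "as_iri": "ex:term-1", "label": "splenius capitus"},
    {"table": "muscular-system", "as_iri": "ex:term-2", "label": "articularis genu"},
]
candidates = {"ex:term-2": "UBERON:9900003"}
name_corrections = {"ex:term-1": "splenius capitis"}
notes = {"ex:term-1": ["wrong-domain parent", "missing UBERON entity"]}

review_rows = build_review_rows(input_rows, candidates, name_corrections, notes)
for r in review_rows:
    print(r)
```

Every input row appears exactly once in the output, which is the one-row-per-input-term property the review file relies on.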

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three new columns make curator review substantially faster:

- mapped_label: human-readable label of the mapped UBERON term
  (looked up from src/ontology/uberon-edit.obo). For new terms, shows
  '(new term: <label>)'. Saves the curator from having to look up each
  UBERON ID separately.

- parent_correction_label: human-readable label for parent_correction.
  Same rationale.

- mapping_evidence: synthesised evidence from per-term JSON outputs
  explaining WHY each source term maps to its existing UBERON match.
  Pulls from various agent field names (note, match_notes,
  match_evidence, web_evidence, rationale, matched_synonyms).
  Example: 'UBERON:0035831 has EXACT synonym "costal part of diaphragm"
  (FMA:58277). The source label "respiratory diaphragm muscle"
  resolves to UBERON:0001103 (diaphragm)...'

Stats:
- 71/75 rows have mapped_label populated
- 29 confirmed_match rows have mapping_evidence (the other 10 had
  agents that didn't emit the optional evidence fields — minor gap)
- 34 with parent_correction (so 34 with parent_correction_label too)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ai4c-agent Bot commented May 11, 2026

Claude finished @dosumis's task in 6m 48s


PR Review: Add HRA Muscular NTR (Third review)

Tasks:

  • Load context and begin review
  • Inspect current template files and term content
  • Check ODK config and Makefile registration
  • Verify EC axiom issue in groups template
  • Check is_a parents for leaf terms
  • Check xrefs and ASCTB-TEMP handling
  • Check term_tracker_item and report file placement
  • Post final review

Review checklist:

  • Hierarchy and parents are consistent (with exceptions noted below)
  • Definitions and supporting references look adequate
  • Obsolete or replacement handling is safe
  • OBO and ODK conventions appear preserved — FAILS (templates not registered; EC collision; missing term_tracker_item)
  • No obvious CI, auth, or workflow regression

Summary

No issues from the previous review have been fixed. The three most recent commits (bb73ff6, 77fd128, 0fdd84d) only add columns and a report to the pipeline output files in src/templates/hra-muscular-reports/. The term templates themselves are unchanged since the second review. Both critical blockers remain open, as do all the important issues.

Additionally, the new NTR terms are not present in src/ontology/uberon-edit.obo at all — they exist only in the two unregistered template TSV files and will have no effect on the ontology until the build-registration blocker is resolved.

The uberon-edit.obo changes in this PR (removing UBERON:1200004 and updating lymph node sinus axioms) appear correct; CI confirms no regression in inferred subclasses for the two affected terms.


🔴 CRITICAL (same as previous review — still unresolved)

1. Templates not registered in ODK config or Makefile

hra-muscular-muscle.template.tsv (25 terms) and hra-muscular-groups.template.tsv (8 terms) are absent from both src/ontology/uberon-odk.yaml and src/ontology/uberon.Makefile. The hra-skeleton precedent requires:

In uberon-odk.yaml:

- filename: hra_muscular.owl
  templates:
    - hra-muscular-muscle.template.tsv
    - hra-muscular-groups.template.tsv

And a corresponding build rule in uberon.Makefile mirroring the hra_skeleton.owl rule at line 1364. Without these, all 33 new NTRs will never compile into the ontology.

2. EC axiom collision in groups template — three back-layer groupings will be inferred equivalent

intermediate back muscle (UBERON:9900020), segmental back muscle (UBERON:9900055), and superficial back muscle (UBERON:9900063) all use:

EC UBERON:0014892
EC BFO:0000050 some UBERON:0001137

All three produce the identical EquivalentClass axiom skeletal_muscle_organ AND part_of dorsum. An OWL reasoner will immediately conclude these three classes are mutually equivalent. Change the groups template row-2 column tags from EC % / EC BFO:0000050 some % to SC % / SC BFO:0000050 some % for all grouping rows, or provide distinct location terms for the three back layers.


🟡 IMPORTANT (same as previous review — still unresolved)

3. ASCTB-TEMP IRIs still in the def_xref column

All 33 terms still carry https://purl.org/ccf/ASCTB-TEMP_<label> in the def_xref column (row-2: >A oboInOwl:hasDbXref SPLIT=|). These ephemeral workflow-tracking IRIs are asserted as definition sources. Move them to the general xref column or drop them.

4. term_tracker_item missing from both templates

Neither template has a term_tracker_item column. Per CLAUDE.md and Uberon convention, terms must link back to the originating GitHub issue.

5. hra-muscular-reports/ still under src/templates/

The reports directory now has 6 files (candidates.tsv, errors.tsv, input.tsv, manual_curation.tsv, name_corrections.tsv, review.tsv). These are pipeline outputs and should not live under src/templates/, where any tool iterating src/templates/**/*.tsv will treat them as ROBOT templates.

6. Three leaf terms use is_a UBERON:0001630 (muscle organ) instead of UBERON:0014892 (skeletal muscle organ, vertebrate)

| NTR ID | Label |
|---|---|
| UBERON:9900003 | articularis genu muscle |
| UBERON:9900010 | dorsal part of intertransversarii laterales lumborum muscle |
| UBERON:9900026 | lateral posterior cervical intertransversarii muscle |

All other skeletal muscle leaf terms in the same template use UBERON:0014892. These three are clearly skeletal muscles and should be consistent.

7. UBERON:9900025 (lateral compartment of leg muscle) — part_of column is empty

The definition says "A skeletal muscle organ that is part of the lateral (fibular) fascial compartment of the leg". The part_of column (column 6) contains no value, leaving the textual and logical definitions misaligned.


🔵 SUGGESTIONS (carried from previous review)

8. ISBN:9780323393225 in def_xref is non-standard — the TA2 URL already covers the Terminologia Anatomica reference; the ISBN could be dropped or moved to a comment.

9. UBERON:9900047 (posterior abdominal wall muscle) — uses UBERON:0003697 (abdominal wall, generic) as location; if a posterior abdominal wall region term exists, it would be more precise.


What changed since the last review

The three newest commits add pipeline output enhancements:

  • name_corrections.tsv (new) — captures source label corrections; useful for QC
  • review.tsv — three new columns (mapped_label, parent_correction_label, mapping_evidence) for easier curator review

These are valuable workflow additions, but do not address any of the open issues.

The uberon-edit.obo changes (removal of UBERON:1200004 and related axiom updates) look correct; CI reports no inferred-subclass regressions for the two affected lymph node sinus terms.


Per-template verdict (unchanged)

Leaf template (25 terms): Definition quality is high with good PMID support. Specific concerns remain on UBERON:9900003, 9900010, 9900026 (is_a too broad) and UBERON:9900025 (part_of blank).

Groups template (8 terms): The critical EC axiom collision on the three back-layer groupings (UBERON:9900020, 9900055, 9900063) must be fixed. The other five groupings (ear, abdominal wall, sole, thoracic wall, middle ear) each have unique locations and are logically sound.

The two critical blockers (build registration; EC axiom collision) and the important issues must be addressed before merge.


@ai4c-agent ai4c-agent Bot left a comment

Two critical blockers remain unresolved since the previous review:

  1. Templates not registered in ODK config/Makefile — hra-muscular-muscle.template.tsv and hra-muscular-groups.template.tsv are absent from uberon-odk.yaml and uberon.Makefile. All 33 new NTR terms exist only in unregistered template files and will never compile into the ontology.

  2. EC axiom collision in groups template — intermediate back muscle (UBERON:9900020), segmental back muscle (UBERON:9900055), and superficial back muscle (UBERON:9900063) produce identical EquivalentClass axioms (skeletal muscle organ AND part_of dorsum). The reasoner will infer all three are equivalent classes. Change EC to SC for grouping rows.

Also still outstanding (flagged in both prior reviews): ASCTB-TEMP IRIs in def_xref; term_tracker_item missing from both templates; hra-muscular-reports/ under src/templates/; three leaf terms (9900003, 9900010, 9900026) using too-broad is_a UBERON:0001630; UBERON:9900025 part_of column empty.

No issues from the previous review were addressed in the three most recent commits. See the full review comment for details.
