From c67407f2c37917da9665f04151ed49a198f1062a Mon Sep 17 00:00:00 2001
From: dragon
Date: Mon, 22 Dec 2025 12:00:52 -0800
Subject: [PATCH] Add tutorials: batch term addition and literature-assisted curation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Addresses #56 - Adds two new practical tutorials:

1. Batch Term Addition with AI Assistance
   - Preparing term lists for AI processing
   - Providing context and patterns
   - Review checklists and iteration strategies
   - Handling large batches and common issues

2. Literature-Assisted Curation
   - Finding citations with AI assistance
   - Validating citations (critical for avoiding hallucinations)
   - Extracting supporting text from papers
   - Incorporating provenance into ontology terms

Both tutorials follow practical, step-by-step format aligned with the
site's mission of immediately actionable guides.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5
---
 docs/tutorials/batch-term-addition.md         | 292 +++++++++++++++
 .../tutorials/literature-assisted-curation.md | 344 ++++++++++++++++++
 mkdocs.yml                                    |   2 +
 3 files changed, 638 insertions(+)
 create mode 100644 docs/tutorials/batch-term-addition.md
 create mode 100644 docs/tutorials/literature-assisted-curation.md

diff --git a/docs/tutorials/batch-term-addition.md b/docs/tutorials/batch-term-addition.md
new file mode 100644
index 0000000..e8f2f6f
--- /dev/null
+++ b/docs/tutorials/batch-term-addition.md
@@ -0,0 +1,292 @@
+# Batch Term Addition with AI Assistance
+
+This tutorial walks through using AI to add multiple terms to an ontology efficiently. Instead of adding terms one by one, you'll learn to batch requests and let the AI handle the mechanical work while you focus on review and quality control.
+
+## Prerequisites
+
+- Familiarity with ontology structure and your target ontology
+- Access to an AI tool (Claude Code, Goose, or GitHub agent)
+- A list of terms to add (from literature, user requests, or gap analysis)
+
+## Overview
+
+Adding terms in batch is one of the most common curation tasks where AI excels. The workflow is:
+
+1. Prepare your term list with key information
+2. Provide context about placement and patterns
+3. Let AI generate the additions
+4. Review and refine
+
+## Step 1: Prepare Your Term List
+
+Create a structured list of terms to add. The more information you provide, the better the results.
+
+### Minimal Format
+
+```
+- apoptotic cell
+- necrotic cell
+- senescent cell
+```
+
+### Better Format
+
+```
+Terms to add under "cell" (CL:0000000):
+
+1. apoptotic cell
+   - Definition hint: A cell undergoing programmed cell death
+   - Related: caspase activation, DNA fragmentation
+
+2. necrotic cell
+   - Definition hint: A cell that has died through necrosis
+   - Related: membrane rupture, inflammation
+
+3. senescent cell
+   - Definition hint: A cell that has ceased dividing but remains metabolically active
+   - Related: SASP, p16, cell cycle arrest
+```
+
+### Best Format (with PMIDs)
+
+```
+Terms to add under "cell" (CL:0000000):
+
+1. apoptotic cell
+   - Parent: cell (CL:0000000)
+   - Definition: A cell that is undergoing apoptosis, characterized by
+     caspase activation, chromatin condensation, and membrane blebbing.
+   - PMID: 12345678 (for definition support)
+   - Synonyms: dying cell (broad)
+
+2. necrotic cell
+   - Parent: cell (CL:0000000)
+   - Definition: A cell that has died through necrosis, typically
+     characterized by membrane rupture and release of cellular contents.
+   - PMID: 23456789
+   - Note: Distinguish from necroptotic cell (programmed necrosis)
+```
+
+## Step 2: Provide Context
+
+Before asking AI to add terms, establish the context:
+
+### Pattern Examples
+
+Point the AI to existing terms as templates:
+
+```
+Look at how "B cell" (CL:0000236) is defined in the Cell Ontology.
+Use this as a pattern for the new immune cell types.
+The definition should follow genus-differentia form:
+"A [parent type] that [distinguishing characteristics]."
+```
+
+### Relationship Guidance
+
+Specify the relationships you expect:
+
+```
+For each cell type:
+- Add is_a relationship to the appropriate parent
+- Add part_of relationship to the tissue where applicable
+- Include has_part for key organelles if distinctive
+```
+
+### Ontology-Specific Conventions
+
+Reference your ontology's style guide:
+
+```
+Follow CL conventions:
+- Use lowercase for term labels
+- Include exactly one textual definition
+- Add database_cross_reference for PMIDs
+- Use 'exact', 'broad', 'narrow', or 'related' for synonym types
+```
+
+## Step 3: Execute the Batch Addition
+
+### Using Claude Code or Goose
+
+Navigate to your ontology repository and use a prompt like:
+
+```
+I need to add the following cell types to the Cell Ontology.
+Use the existing term "natural killer cell" (CL:0000623) as a pattern.
+
+Terms to add:
+1. invariant natural killer T cell
+   - Parent: natural killer cell
+   - Definition: A natural killer cell that expresses an invariant
+     T cell receptor and responds to lipid antigens presented by CD1d.
+   - PMID: 16034096
+
+2. cytokine-induced killer cell
+   - Parent: natural killer cell
+   - Definition: A natural killer cell that has been activated ex vivo
+     with cytokines and exhibits enhanced cytotoxic activity.
+   - PMID: 19056535
+
+Please:
+1. Look at the existing pattern for natural killer cell
+2. Add these terms following that pattern
+3. Include proper provenance using the PMIDs
+4. Show me the changes before committing
+```
+
+### Using GitHub Agent
+
+Create an issue with your request:
+
+```markdown
+## Request: Add immune cell subtypes
+
+Please add the following terms to CL:
+
+### Terms
+
+1. **invariant natural killer T cell**
+   - Parent: natural killer cell (CL:0000623)
+   - PMID: 16034096 for definition
+
+2. **cytokine-induced killer cell**
+   - Parent: natural killer cell (CL:0000623)
+   - PMID: 19056535 for definition
+
+### Instructions
+
+- Follow existing patterns in CL for cell type definitions
+- Use genus-differentia definitions
+- Add database_cross_reference for PMIDs
+```
+
+Then invoke:
+```
+@dragon-ai-agent please resolve this issue
+```
+
+## Step 4: Review the Changes
+
+AI will typically produce changes like this (OBO format example):
+
+```
+[Term]
+id: CL:0001234
+name: invariant natural killer T cell
+def: "A natural killer cell that expresses an invariant T cell receptor
+alpha chain and responds to lipid antigens presented by CD1d molecules."
+[PMID:16034096]
+is_a: CL:0000623 ! natural killer cell
+synonym: "iNKT cell" EXACT []
+synonym: "type I NKT cell" RELATED []
+```
+
+### Review Checklist
+
+- [ ] IDs are correctly formatted (or placeholders for your ID scheme)
+- [ ] Definitions follow genus-differentia pattern
+- [ ] Parent relationships are correct
+- [ ] PMIDs are valid (check they exist)
+- [ ] Synonyms are appropriately typed
+- [ ] No duplicate terms created
+- [ ] Formatting matches ontology conventions
+
+## Step 5: Iterate and Refine
+
+If changes need adjustment:
+
+```
+The definition for "invariant natural killer T cell" should mention
+that these cells bridge innate and adaptive immunity. Please update
+the definition to include this characteristic.
+```
+
+Or for bulk refinements:
+
+```
+Please update all the new definitions to explicitly state the
+cell's primary function or role, not just its markers.
+```
+
+## Tips for Large Batches
+
+### Chunk Your Requests
+
+For 50+ terms, break into logical groups:
+
+```
+Let's add these in phases:
+1. First, add the 10 T cell subtypes
+2. Then I'll review
+3. Next, add the 15 B cell subtypes
+4. And so on...
+```
+
+### Use Spreadsheets for Input
+
+For very large batches, prepare a TSV/CSV:
+
+```
+label	parent_id	definition	pmid	synonyms
+invariant natural killer T cell	CL:0000623	A natural killer cell that...	16034096	iNKT cell|type I NKT cell
+```
+
+Then:
+```
+Please read the file new_terms.tsv and add each row as a new term
+to the ontology, following CL conventions.
+```
+
+### Validate Before Committing
+
+Ask for validation:
+```
+Before committing, please:
+1. Run the reasoner to check for logical errors
+2. Verify no terms with these labels already exist
+3. Check that all referenced parent IDs exist
+```
+
+## Common Issues and Solutions
+
+### AI Invents Non-Existent Parents
+
+**Problem:** AI creates relationships to terms that don't exist.
+
+**Solution:** Explicitly list valid parent options:
+```
+Only use these as parents:
+- cell (CL:0000000)
+- native cell (CL:0000003)
+- lymphocyte (CL:0000542)
+Do NOT create new intermediate classes.
+```
+
+### Inconsistent Definition Style
+
+**Problem:** Each definition has different structure.
+
+**Solution:** Provide a template:
+```
+All definitions must follow this pattern:
+"A [parent type] that [key characteristic], [additional features].
+[Optional: functional description]."
+```
+
+### Missing Provenance
+
+**Problem:** AI doesn't include PMIDs.
+
+**Solution:** Make it explicit:
+```
+Every definition MUST have a database_cross_reference to a PMID.
+If I didn't provide a PMID, please note that the term needs
+literature support before it can be merged.
+```
+
+## Next Steps
+
+- [Literature-Assisted Curation](literature-assisted-curation.md) - Find supporting citations
+- [AI-Powered Quality Control](quality-control-with-ai.md) - Validate your additions
+- [Ontology Editing with AI](ontology-editing-with-ai.md) - General AI workflow tutorial
diff --git a/docs/tutorials/literature-assisted-curation.md b/docs/tutorials/literature-assisted-curation.md
new file mode 100644
index 0000000..9b0c6b3
--- /dev/null
+++ b/docs/tutorials/literature-assisted-curation.md
@@ -0,0 +1,344 @@
+# Literature-Assisted Curation with AI
+
+This tutorial shows how to use AI to find, validate, and incorporate literature references into ontology curation. You'll learn to leverage AI for literature review while maintaining rigorous provenance standards.
+
+## Prerequisites
+
+- Basic familiarity with ontology curation
+- Access to an AI tool (Claude Code, Goose, or GitHub agent)
+- Understanding of citation requirements for your ontology
+
+## The Challenge
+
+Ontology terms need literature support for:
+- Definitions
+- Synonym assertions
+- Relationship claims
+- Comments and notes
+
+Finding appropriate citations is time-consuming. AI can help by:
+- Searching literature for relevant papers
+- Extracting key information from papers
+- Suggesting definitions based on literature
+- Validating that citations support claims
+
+## Workflow Overview
+
+```
+1. Identify what needs citations
+2. Use AI for literature search
+3. Validate citations (AI can hallucinate!)
+4. Extract supporting text
+5. Incorporate into ontology with provenance
+```
+
+## Step 1: Identify Citation Needs
+
+Start by identifying terms that need literature support:
+
+```
+Please review the following terms and identify which ones lack
+proper literature citations in their definitions:
+
+- regulatory T cell (CL:0000815)
+- T follicular helper cell (CL:0002038)
+- memory B cell (CL:0000787)
+
+For each, note what kind of citation would be helpful.
+```
+
+AI might respond:
+
+```
+Citation needs:
+
+1. regulatory T cell - Has PMID but from 2005, may benefit from
+   updated review citation
+
+2. T follicular helper cell - Definition lacks PMID, needs citation
+   for the defining characteristics (CXCR5 expression, B cell help)
+
+3. memory B cell - Has PMID:12345678, adequate
+```
+
+## Step 2: Literature Search
+
+Ask AI to find relevant literature:
+
+### Basic Search
+
+```
+Find recent review articles (2020-2024) that describe T follicular
+helper cells and their key characteristics. I need citations that
+support a definition emphasizing:
+- CXCR5 expression
+- Localization to B cell follicles
+- Role in B cell help and antibody responses
+```
+
+### Targeted Search
+
+```
+I need a PMID that specifically supports the claim that regulatory
+T cells express FOXP3 constitutively. Please find a primary research
+paper or authoritative review that makes this claim explicitly.
+```
+
+### Deep Research (for complex topics)
+
+For comprehensive literature review, use deep research capabilities:
+
+```
+@deep-research please provide a comprehensive review of the current
+understanding of T follicular helper cell biology, including:
+1. Defining markers and transcription factors
+2. Development pathway
+3. Functional subsets
+4. Role in disease
+
+Focus on finding authoritative reviews from the past 3 years.
+```
+
+## Step 3: Validate Citations
+
+**Critical:** AI can hallucinate citations. Always validate:
+
+### Manual Validation
+
+For important citations:
+1. Check the PMID exists on PubMed
+2. Verify the paper's title and authors match
+3. Confirm the paper actually supports the claim
+
+### AI-Assisted Validation
+
+Ask AI to validate its own citations:
+
+```
+For each PMID you suggested, please:
+1. Confirm the paper exists by providing its exact title
+2. Quote the specific passage that supports our definition
+3. Note the year and journal
+```
+
+### Batch Validation
+
+For many citations:
+
+```
+I have the following PMIDs that need validation. For each, confirm
+it exists and briefly state what the paper is about:
+
+PMID:35123456
+PMID:34567890
+PMID:33456789
+```
+
+## Step 4: Extract Supporting Text
+
+Once citations are validated, extract relevant passages:
+
+```
+Please read PMID:35123456 and extract:
+1. Any sentences that define what a T follicular helper cell is
+2. Key characteristics mentioned (markers, location, function)
+3. Specific claims that could support ontology assertions
+```
+
+### For Definition Writing
+
+```
+Based on PMID:35123456, draft a genus-differentia definition for
+"T follicular helper cell" that:
+- States the parent type (T cell)
+- Lists defining characteristics
+- Is supported by claims in the paper
+- Follows Cell Ontology style guidelines
+```
+
+### For Relationship Support
+
+```
+Does PMID:35123456 support the assertion that T follicular helper
+cells are part_of germinal centers? Quote any relevant text.
+```
+
+## Step 5: Incorporate with Provenance
+
+Add citations to your ontology with proper attribution:
+
+### OBO Format Example
+
+```
+[Term]
+id: CL:0002038
+name: T follicular helper cell
+def: "A CD4-positive, alpha-beta T cell that expresses CXCR5, is
+located in B cell follicles, and provides help to B cells for
+antibody production and class switching." [PMID:35123456]
+synonym: "Tfh cell" EXACT []
+is_a: CL:0000624 ! CD4-positive, alpha-beta T cell
+relationship: part_of UBERON:0001744 ! lymphoid follicle
+```
+
+### Multiple Citations
+
+When multiple papers support a definition:
+
+```
+def: "A CD4-positive T cell that..." [PMID:35123456, PMID:34567890]
+```
+
+### Citation for Specific Claims
+
+```
+comment: "Recent evidence suggests Tfh cells can also arise from
+regulatory T cell precursors (PMID:36789012)."
+```
+
+## Practical Examples
+
+### Example 1: Updating an Outdated Definition
+
+```
+The definition of "plasma cell" references a 2003 paper. Please:
+
+1. Find a current (2020+) review on plasma cell biology
+2. Check if the existing definition is still accurate
+3. Suggest any updates based on current understanding
+4. Provide the new PMID for the updated definition
+```
+
+### Example 2: Adding a New Term with Literature Support
+
+```
+I need to add "tissue-resident memory T cell" to the Cell Ontology.
+
+Please:
+1. Find 2-3 key papers that define this cell type
+2. Identify the consensus defining characteristics
+3. Draft a definition with proper genus-differentia structure
+4. Suggest appropriate parent terms and relationships
+5. List synonyms mentioned in the literature
+```
+
+### Example 3: Validating Existing Citations
+
+```
+Please audit the following term for citation accuracy:
+
+[Term]
+id: CL:0000815
+name: regulatory T cell
+def: "A T cell that regulates..." [PMID:12034567]
+
+Check:
+1. Does this PMID exist?
+2. Does it support the current definition?
+3. Is there a better (more recent/authoritative) citation?
+```
+
+## Common Patterns
+
+### Pattern: Definition with Recent Review
+
+```
+Please find the most recent (2022-2024) comprehensive review of
+[cell type] and use it as the primary definition source.
+```
+
+### Pattern: Original Research Citation
+
+```
+For the claim that [specific characteristic], find the original
+research paper that first described this, not a review.
+```
+
+### Pattern: Multiple Supporting Citations
+
+```
+This definition makes 3 claims:
+1. [Claim A]
+2. [Claim B]
+3. [Claim C]
+
+Find a PMID that supports each claim. They can be different papers.
+```
+
+## Avoiding Common Pitfalls
+
+### Hallucinated Citations
+
+**Problem:** AI invents PMIDs that don't exist.
+
+**Solution:** Always validate. Use this prompt:
+```
+For PMID:XXXXXXXX, please confirm:
+- The exact paper title
+- First author's last name
+- Publication year
+- Journal name
+
+If you cannot confirm all of these, say "UNABLE TO VERIFY".
+```
+
+### Outdated Citations
+
+**Problem:** AI suggests old papers when newer evidence exists.
+
+**Solution:** Specify recency:
+```
+Find citations from 2020-2024 only. If the field has evolved,
+note any changes from earlier understanding.
+```
+
+### Unsupported Claims
+
+**Problem:** Citation doesn't actually support the claim.
+
+**Solution:** Request quotes:
+```
+Quote the specific sentence from the paper that supports this
+definition. If no such sentence exists, say so.
+```
+
+### Over-Reliance on Reviews
+
+**Problem:** Only using review articles, missing primary sources.
+
+**Solution:** Request mix:
+```
+Provide one primary research paper and one review article to
+support this term.
+```
+
+## Tools and Resources
+
+### PubMed Integration
+
+Some AI setups can search PubMed directly:
+```
+Search PubMed for: "T follicular helper" AND "definition" AND review[pt]
+Filter: 2020-2024
+```
+
+### Full-Text Access
+
+For papers with full text available:
+```
+Please read the full text of PMID:35123456 (it should be on PMC)
+and summarize the methodology section.
+```
+
+### Reference Tracking
+
+```
+What papers does PMID:35123456 cite for its definition of Tfh cells?
+These might be good primary sources.
+```
+
+## Next Steps
+
+- [Batch Term Addition](batch-term-addition.md) - Apply literature findings to add terms
+- [Make IDs Hallucination-Resistant](../how-tos/make-ids-hallucination-resistant.md) - Validation techniques
+- [Deep Research Integration](../how-tos/deep-research-integration.md) - Advanced literature review
diff --git a/mkdocs.yml b/mkdocs.yml
index 159af61..6905890 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -45,6 +45,8 @@ nav:
     - Integrate AI into your KB: how-tos/integrate-ai-into-your-kb.md
   - Tutorials:
     - Ontology editing with AI: tutorials/ontology-editing-with-ai.md
+    - Batch term addition: tutorials/batch-term-addition.md
+    - Literature-assisted curation: tutorials/literature-assisted-curation.md
   - Reference:
     - Client apps: reference/client-apps.md
     - Claude Skills: reference/claude-skills.md