From 0359f75a6d960e5f93206ebf4516ddf1305cc8df Mon Sep 17 00:00:00 2001 From: sixiang <1210552219@qq.com> Date: Mon, 16 Mar 2026 23:21:51 +0800 Subject: [PATCH 1/2] feat: update research skills part 9 (16 folders) --- skills/Research/tides/SKILL.md | 26 +- skills/Research/torchdrug/SKILL.md | 182 +------ .../tracking-regression-tests/SKILL.md | 9 +- .../Research/transit-least-squares/SKILL.md | 26 +- skills/Research/uniprot-database/SKILL.md | 253 ++------- skills/Research/units/SKILL.md | 26 +- skills/Research/update-flaky-tests/SKILL.md | 33 +- skills/Research/usgs-data-download/SKILL.md | 26 +- skills/Research/uspto-database/SKILL.md | 28 +- skills/Research/uv-global/SKILL.md | 26 +- skills/Research/venue-templates/SKILL.md | 29 +- .../Research/wandb-experiment-logger/SKILL.md | 15 +- skills/Research/weather-query/SKILL.md | 26 +- skills/Research/wiring/SKILL.md | 481 +----------------- .../Research/zemax-optical-designer/SKILL.md | 26 +- skills/Research/zinc-database/SKILL.md | 337 +----------- 16 files changed, 305 insertions(+), 1244 deletions(-) diff --git a/skills/Research/tides/SKILL.md b/skills/Research/tides/SKILL.md index ac03612e..d2713a3b 100644 --- a/skills/Research/tides/SKILL.md +++ b/skills/Research/tides/SKILL.md @@ -1,20 +1,32 @@ --- id: tides name: Tides -description: Step-by-step guidance for tides. +description: Step-by-step guidance for tidal data analysis, prediction workflows, and coastal research practices. category: Research +requires: [] +examples: + - When is the next high tide in San Francisco Bay? + - Provide a tide chart for the coast of Maine for the upcoming week. --- # Tides Support tides workflows with clear steps and best practices. -## When to Use +## Instruction +- Identify the target geographic location or specific tidal station ID for analysis. +- Retrieve tidal height predictions and historical observations from authoritative maritime databases. 
+- Analyze tidal cycles to identify the exact times and heights of high tide, low tide, and slack water. +- Account for local datum shifts (e.g., MLLW, MSL) to ensure consistency in depth measurements. +- Correlate tidal data with lunar phases and meteorological factors (e.g., storm surges) to assess potential flooding risks. +- Provide step-by-step guidance for integrating tidal charts into coastal research or navigation planning. -- You need help with tides. -- You want a clear, actionable next step. +## When to Use +- When planning coastal field research, maritime navigation, or coastal engineering projects. +- When needing precise high/low tide schedules for specific global ports or coastal segments. +- When assessing environmental impacts and sediment transport patterns influenced by tidal cycles. ## Output - -- Summary of goals and plan -- Key tips and precautions +- A comprehensive tidal forecast report including peak times and height estimates. +- Visualized tidal curves or charts for the requested time window. +- Actionable safety precautions and optimal time windows for coastal activities. \ No newline at end of file diff --git a/skills/Research/torchdrug/SKILL.md b/skills/Research/torchdrug/SKILL.md index c843cf1b..e2599d0c 100644 --- a/skills/Research/torchdrug/SKILL.md +++ b/skills/Research/torchdrug/SKILL.md @@ -1,8 +1,12 @@ --- -category: Research id: torchdrug name: Torchdrug -description: Graph-based drug discovery toolkit. Molecular property prediction (ADMET), protein modeling, knowledge graph reasoning, molecular generation, retrosynthesis, GNNs (GIN, GAT, SchNet), 40+ datasets, for PyTorch-based ML on molecules, proteins, and biomedical graphs. +description: PyTorch-based drug discovery toolkit for molecular property prediction, protein modeling, and retrosynthesis using GNNs. +category: Research +requires: [] +examples: + - Predict the ADMET properties for this SMILES string using TorchDrug. 
+ - Build a protein-protein interaction network from this biological dataset. --- # TorchDrug @@ -37,54 +41,10 @@ This skill should be used when working with: - Compatible with PyTorch and PyTorch Lightning - Integrates with AlphaFold and ESM for proteins -## Getting Started - -### Installation - -```bash -uv pip install torchdrug -# Or with optional dependencies -uv pip install torchdrug[full] -``` - -### Quick Example - -```python -from torchdrug import datasets, models, tasks -from torch.utils.data import DataLoader - -# Load molecular dataset -dataset = datasets.BBBP("~/molecule-datasets/") -train_set, valid_set, test_set = dataset.split() - -# Define GNN model -model = models.GIN( - input_dim=dataset.node_feature_dim, - hidden_dims=[256, 256, 256], - edge_input_dim=dataset.edge_feature_dim, - batch_norm=True, - readout="mean" -) - -# Create property prediction task -task = tasks.PropertyPrediction( - model, - task=dataset.tasks, - criterion="bce", - metric=["auroc", "auprc"] -) - -# Train with PyTorch -optimizer = torch.optim.Adam(task.parameters(), lr=1e-3) -train_loader = DataLoader(train_set, batch_size=32, shuffle=True) - -for epoch in range(100): - for batch in train_loader: - loss = task(batch) - optimizer.zero_grad() - loss.backward() - optimizer.step() -``` +## Output +- Trained GNN models and associated performance metric reports. +- Predicted molecular properties or optimized molecular structures in SMILES format. +- Step-by-step guidance for configuring complex drug discovery pipelines in PyTorch. 
## Core Capabilities @@ -103,7 +63,7 @@ Predict chemical, physical, and biological properties of molecules from structur - GNN models (GIN, GAT, SchNet) - PropertyPrediction and MultipleBinaryClassification tasks -**Reference:** See `references/molecular_property_prediction.md` for: +**Reference:** - Complete dataset catalog - Model selection guide - Training workflows and best practices @@ -126,7 +86,7 @@ Work with protein sequences, structures, and properties. - Structure models (GearNet, SchNet) - Multiple task types for different prediction levels -**Reference:** See `references/protein_modeling.md` for: +**Reference:** - Protein-specific datasets - Sequence vs structure models - Pre-training strategies @@ -147,7 +107,7 @@ Predict missing links and relationships in biological knowledge graphs. - Embedding models (TransE, RotatE, ComplEx) - KnowledgeGraphCompletion task -**Reference:** See `references/knowledge_graphs.md` for: +**Reference:** - Knowledge graph datasets (including Hetionet with 45k biomedical entities) - Embedding model comparison - Evaluation metrics and protocols @@ -169,7 +129,7 @@ Generate novel molecular structures with desired properties. - GraphAutoregressiveFlow - Property optimization workflows -**Reference:** See `references/molecular_generation.md` for: +**Reference:** - Generation strategies (unconditional, conditional, scaffold-based) - Multi-objective optimization - Validation and filtering @@ -191,7 +151,7 @@ Predict synthetic routes from target molecules to starting materials. - SynthonCompletion (reactant prediction) - End-to-end Retrosynthesis pipeline -**Reference:** See `references/retrosynthesis.md` for: +**Reference:** - Task decomposition (center ID → synthon completion) - Multi-step synthesis planning - Commercial availability checking @@ -208,7 +168,7 @@ Comprehensive catalog of GNN architectures for different data types and tasks. 
- Knowledge graph: TransE, RotatE, ComplEx, SimplE - Generative: GraphAutoregressiveFlow -**Reference:** See `references/models_architectures.md` for: +**Reference:** - Detailed model descriptions - Model selection guide by task and dataset - Architecture comparisons @@ -224,7 +184,7 @@ Comprehensive catalog of GNN architectures for different data types and tasks. - Knowledge graphs (general and biomedical) - Retrosynthesis reactions -**Reference:** See `references/datasets.md` for: +**Reference:** - Complete dataset catalog with sizes and tasks - Dataset selection guide - Loading and preprocessing @@ -243,7 +203,6 @@ Comprehensive catalog of GNN architectures for different data types and tasks. 4. Train with scaffold split for realistic evaluation 5. Evaluate using AUROC and AUPRC -**Navigation:** `references/molecular_property_prediction.md` → Dataset selection → Model selection → Training ### Workflow 2: Protein Function Prediction @@ -256,7 +215,6 @@ Comprehensive catalog of GNN architectures for different data types and tasks. 4. Fine-tune pre-trained model or train from scratch 5. Evaluate using accuracy and per-class metrics -**Navigation:** `references/protein_modeling.md` → Model selection (sequence vs structure) → Pre-training strategies ### Workflow 3: Drug Repurposing via Knowledge Graphs @@ -270,7 +228,6 @@ Comprehensive catalog of GNN architectures for different data types and tasks. 5. Query for "Compound-treats-Disease" predictions 6. Filter by plausibility and mechanism -**Navigation:** `references/knowledge_graphs.md` → Hetionet dataset → Model selection → Biomedical applications ### Workflow 4: De Novo Molecule Generation @@ -284,7 +241,6 @@ Comprehensive catalog of GNN architectures for different data types and tasks. 5. Validate chemistry and filter by drug-likeness 6. 
Rank by multi-objective scoring -**Navigation:** `references/molecular_generation.md` → Conditional generation → Multi-objective optimization ### Workflow 5: Retrosynthesis Planning @@ -298,74 +254,26 @@ Comprehensive catalog of GNN architectures for different data types and tasks. 5. Apply recursively for multi-step planning 6. Check commercial availability of building blocks -**Navigation:** `references/retrosynthesis.md` → Task types → Multi-step planning ## Integration Patterns ### With RDKit -Convert between TorchDrug molecules and RDKit: -```python -from torchdrug import data -from rdkit import Chem - -# SMILES → TorchDrug molecule -smiles = "CCO" -mol = data.Molecule.from_smiles(smiles) - -# TorchDrug → RDKit -rdkit_mol = mol.to_molecule() - -# RDKit → TorchDrug -rdkit_mol = Chem.MolFromSmiles(smiles) -mol = data.Molecule.from_molecule(rdkit_mol) -``` +Convert between TorchDrug molecules and RDKit ### With AlphaFold/ESM -Use predicted structures: -```python -from torchdrug import data - -# Load AlphaFold predicted structure -protein = data.Protein.from_pdb("AF-P12345-F1-model_v4.pdb") - -# Build graph with spatial edges -graph = protein.residue_graph( - node_position="ca", - edge_types=["sequential", "radius"], - radius_cutoff=10.0 -) -``` +Use predicted structures ### With PyTorch Lightning -Wrap tasks for Lightning training: -```python -import pytorch_lightning as pl - -class LightningTask(pl.LightningModule): - def __init__(self, torchdrug_task): - super().__init__() - self.task = torchdrug_task - - def training_step(self, batch, batch_idx): - return self.task(batch) - - def validation_step(self, batch, batch_idx): - pred = self.task.predict(batch) - target = self.task.target(batch) - return {"pred": pred, "target": target} - - def configure_optimizers(self): - return torch.optim.Adam(self.parameters(), lr=1e-3) -``` +Wrap tasks for Lightning training ## Technical Details For deep dives into TorchDrug's architecture: -**Core Concepts:** See 
`references/core_concepts.md` for: +**Core Concepts:** - Architecture philosophy (modular, configurable) - Data structures (Graph, Molecule, Protein, PackedGraph) - Model interface and forward function signature @@ -374,73 +282,25 @@ For deep dives into TorchDrug's architecture: - Loss functions and metrics - Common pitfalls and debugging -## Quick Reference Cheat Sheet - -**Choose Dataset:** -- Molecular property → `references/datasets.md` → Molecular section -- Protein task → `references/datasets.md` → Protein section -- Knowledge graph → `references/datasets.md` → Knowledge graph section - -**Choose Model:** -- Molecules → `references/models_architectures.md` → GNN section → GIN/GAT/SchNet -- Proteins (sequence) → `references/models_architectures.md` → Protein section → ESM -- Proteins (structure) → `references/models_architectures.md` → Protein section → GearNet -- Knowledge graph → `references/models_architectures.md` → KG section → RotatE/ComplEx - -**Common Tasks:** -- Property prediction → `references/molecular_property_prediction.md` or `references/protein_modeling.md` -- Generation → `references/molecular_generation.md` -- Retrosynthesis → `references/retrosynthesis.md` -- KG reasoning → `references/knowledge_graphs.md` - -**Understand Architecture:** -- Data structures → `references/core_concepts.md` → Data Structures -- Model design → `references/core_concepts.md` → Model Interface -- Task design → `references/core_concepts.md` → Task Interface ## Troubleshooting Common Issues **Issue: Dimension mismatch errors** → Check `model.input_dim` matches `dataset.node_feature_dim` -→ See `references/core_concepts.md` → Essential Attributes **Issue: Poor performance on molecular tasks** → Use scaffold splitting, not random → Try GIN instead of GCN -→ See `references/molecular_property_prediction.md` → Best Practices **Issue: Protein model not learning** → Use pre-trained ESM for sequence tasks → Check edge construction for structure models -→ See 
`references/protein_modeling.md` → Training Workflows **Issue: Memory errors with large graphs** → Reduce batch size → Use gradient accumulation -→ See `references/core_concepts.md` → Memory Efficiency **Issue: Generated molecules are invalid** → Add validity constraints → Post-process with RDKit validation -→ See `references/molecular_generation.md` → Validation and Filtering - -## Resources - -**Official Documentation:** https://torchdrug.ai/docs/ -**GitHub:** https://github.com/DeepGraphLearning/torchdrug -**Paper:** TorchDrug: A Powerful and Flexible Machine Learning Platform for Drug Discovery - -## Summary - -Navigate to the appropriate reference file based on your task: - -1. **Molecular property prediction** → `molecular_property_prediction.md` -2. **Protein modeling** → `protein_modeling.md` -3. **Knowledge graphs** → `knowledge_graphs.md` -4. **Molecular generation** → `molecular_generation.md` -5. **Retrosynthesis** → `retrosynthesis.md` -6. **Model selection** → `models_architectures.md` -7. **Dataset selection** → `datasets.md` -8. **Technical details** → `core_concepts.md` -Each reference provides comprehensive coverage of its domain with examples, best practices, and common use cases. diff --git a/skills/Research/tracking-regression-tests/SKILL.md b/skills/Research/tracking-regression-tests/SKILL.md index a247c67e..7254efdf 100644 --- a/skills/Research/tracking-regression-tests/SKILL.md +++ b/skills/Research/tracking-regression-tests/SKILL.md @@ -1,9 +1,12 @@ --- -category: Research id: tracking-regression-tests name: Tracking Regression Tests -description: This skill enables Claude to track and run regression tests, ensuring new changes don't break existing functionality. It is triggered when the user asks to "track regression", "run regression tests", or uses the shortcut "reg". The skill helps in maintaining code stability by identifying critical tests, automating their execution, and analyzing the impact of changes. 
It also provides insights into test history and identifies flaky tests. The skill uses the `regression-test-tracker` plugin. - This skill enables Claude to track and run regression tests, ensuring new changes don't break existing functionality. It is triggered when the user asks to "track regression", "run regression tests", or uses the shortcut "reg". The skill helps in maintaining code stability by identifying critical tests, automating their execution, and analyzing the impact of changes. It also provides insights into test history and identifies flaky tests. The skill uses the `regression-test-tracker` plugin. +description: Track and run regression tests using the regression-test-tracker to ensure code stability and identify failures. +category: Research +requires: [] +examples: + - Run the regression test suite for the current development branch. + - Mark this specific unit test as a regression test for automated tracking. --- ## Overview diff --git a/skills/Research/transit-least-squares/SKILL.md b/skills/Research/transit-least-squares/SKILL.md index 436fb525..c7cef96f 100644 --- a/skills/Research/transit-least-squares/SKILL.md +++ b/skills/Research/transit-least-squares/SKILL.md @@ -1,20 +1,32 @@ --- id: transit-least-squares name: Transit Least Squares -description: Step-by-step guidance for transit least squares. +description: Step-by-step guidance for Transit Least Squares (TLS) analysis to detect periodic signals in photometric data. category: Research +requires: [] +examples: + - Apply the Transit Least Squares algorithm to this astronomical light curve. + - Identify periodic planet signals in my photometric data using TLS. --- # Transit Least Squares Support transit least squares workflows with clear steps and best practices. -## When to Use +## Instruction +- Load time-series photometric light curves, ensuring the data is properly cleaned and detrended. +- Define the search grid parameters, including the minimum and maximum period to be searched. 
+- Execute the TLS algorithm to detect periodic dips in stellar brightness, accounting for the realistic transit shape. +- Evaluate the Signal Detection Efficiency (SDE) and Signal-to-Noise Ratio (SNR) to determine the statistical significance of detected peaks. +- Identify candidate planetary transits and extract their orbital period, transit duration, and depth. +- Run a secondary optimization to refine the transit parameters and check for harmonic signals or eclipsing binary patterns. -- You need help with transit least squares. -- You want a clear, actionable next step. +## When to Use +- When searching for shallow planetary transit signals in light curves from TESS, Kepler, or ground-based surveys. +- When traditional Box Least Squares (BLS) is insufficient for detecting signals with realistic transit profiles. +- When needing high-precision periodogram analysis for periodic signal discovery in astronomical photometry. ## Output - -- Summary of goals and plan -- Key tips and precautions +- TLS periodograms showing power peaks and identified period candidates. +- Summaries of planetary parameters (period, depth, duration) with associated confidence scores. +- Actionable next steps for signal validation and transit-fitting refinement. \ No newline at end of file diff --git a/skills/Research/uniprot-database/SKILL.md b/skills/Research/uniprot-database/SKILL.md index c91ccacf..96c08004 100644 --- a/skills/Research/uniprot-database/SKILL.md +++ b/skills/Research/uniprot-database/SKILL.md @@ -1,8 +1,12 @@ --- -category: Research id: uniprot-database name: UniProt Database -description: Step-by-step guidance for uniprot database. +description: Access UniProt via BioServices to retrieve protein sequences, functional annotations, and structural data. +category: Research +requires: [] +examples: + - Retrieve the protein sequence and functional annotations for the human TP53 gene (p53 protein). + - Search UniProt for proteins related to insulin signaling in mammals. 
--- # BioServices @@ -24,92 +28,31 @@ This skill should be used when: - Mining genomic data (BioMart, ArrayExpress, ENA) - Integrating data from multiple bioinformatics resources in a single workflow -## Core Capabilities - -### 1. Protein Analysis - -Retrieve protein information, sequences, and functional annotations: - -```python -from bioservices import UniProt - -u = UniProt(verbose=False) - -# Search for protein by name -results = u.search("ZAP70_HUMAN", frmt="tab", columns="id,genes,organism") +## Instruction +- Formulate precise queries to retrieve protein sequences, functional descriptions, and structural metadata from UniProtKB. +- Access detailed functional annotations, including GO terms, enzyme classifications (EC numbers), and post-translational modifications. +- Perform identifier mapping (e.g., UniProt ID to PDB or Ensembl) to integrate protein data into multi-omics pipelines. +- Retrieve protein-protein interaction evidence and biological pathway memberships to support systems biology research. +- Coordinate bulk downloads of protein datasets for specific taxonomic lineages or functional families. +- Adhere to UniProt API rate limits and best practices for large-scale data retrieval in Python. -# Retrieve FASTA sequence -sequence = u.retrieve("P43403", "fasta") +## Output +- Formatted protein reports containing sequences, functional annotations, and cross-references. +- Mapped identifier lists and structured protein interaction tables. +- Actionable next steps for protein sequence analysis or structure visualization. -# Map identifiers between databases -kegg_ids = u.mapping(fr="UniProtKB_AC-ID", to="KEGG", query="P43403") -``` +## Core Capabilities -**Key methods:** -- `search()`: Query UniProt with flexible search terms -- `retrieve()`: Get protein entries in various formats (FASTA, XML, tab) -- `mapping()`: Convert identifiers between databases +### 1. 
Protein Analysis -Reference: `references/services_reference.md` for complete UniProt API details. +Retrieve protein information, sequences, and functional annotations ### 2. Pathway Discovery and Analysis -Access KEGG pathway information for genes and organisms: - -```python -from bioservices import KEGG - -k = KEGG() -k.organism = "hsa" # Set to human - -# Search for organisms -k.lookfor_organism("droso") # Find Drosophila species - -# Find pathways by name -k.lookfor_pathway("B cell") # Returns matching pathway IDs - -# Get pathways containing specific genes -pathways = k.get_pathway_by_gene("7535", "hsa") # ZAP70 gene - -# Retrieve and parse pathway data -data = k.get("hsa04660") -parsed = k.parse(data) - -# Extract pathway interactions -interactions = k.parse_kgml_pathway("hsa04660") -relations = interactions['relations'] # Protein-protein interactions - -# Convert to Simple Interaction Format -sif_data = k.pathway2sif("hsa04660") -``` - -**Key methods:** -- `lookfor_organism()`, `lookfor_pathway()`: Search by name -- `get_pathway_by_gene()`: Find pathways containing genes -- `parse_kgml_pathway()`: Extract structured pathway data -- `pathway2sif()`: Get protein interaction networks - -Reference: `references/workflow_patterns.md` for complete pathway analysis workflows. - +Access KEGG pathway information for genes and organisms ### 3. Compound Database Searches -Search and cross-reference compounds across multiple databases: - -```python -from bioservices import KEGG, UniChem - -k = KEGG() - -# Search compounds by name -results = k.find("compound", "Geldanamycin") # Returns cpd:C11222 - -# Get compound information with database links -compound_info = k.get("cpd:C11222") # Includes ChEBI links - -# Cross-reference KEGG → ChEMBL using UniChem -u = UniChem() -chembl_id = u.get_compound_id_from_kegg("C11222") # Returns CHEMBL278315 -``` +Search and cross-reference compounds across multiple databases **Common workflow:** 1. 
Search compound by name in KEGG @@ -117,95 +60,28 @@ chembl_id = u.get_compound_id_from_kegg("C11222") # Returns CHEMBL278315 3. Use UniChem for KEGG → ChEMBL mapping 4. ChEBI IDs are often provided in KEGG entries -Reference: `references/identifier_mapping.md` for complete cross-database mapping guide. ### 4. Sequence Analysis -Run BLAST searches and sequence alignments: - -```python -from bioservices import NCBIblast - -s = NCBIblast(verbose=False) - -# Run BLASTP against UniProtKB -jobid = s.run( - program="blastp", - sequence=protein_sequence, - stype="protein", - database="uniprotkb", - email="your.email@example.com" # Required by NCBI -) - -# Check job status and retrieve results -s.getStatus(jobid) -results = s.getResult(jobid, "out") -``` - -**Note:** BLAST jobs are asynchronous. Check status before retrieving results. - +Run BLAST searches and sequence alignments ### 5. Identifier Mapping -Convert identifiers between different biological databases: - -```python -from bioservices import UniProt, KEGG - -# UniProt mapping (many database pairs supported) -u = UniProt() -results = u.mapping( - fr="UniProtKB_AC-ID", # Source database - to="KEGG", # Target database - query="P43403" # Identifier(s) to convert -) - -# KEGG gene ID → UniProt -kegg_to_uniprot = u.mapping(fr="KEGG", to="UniProtKB_AC-ID", query="hsa:7535") - -# For compounds, use UniChem -from bioservices import UniChem -u = UniChem() -chembl_from_kegg = u.get_compound_id_from_kegg("C11222") -``` +Convert identifiers between different biological databases **Supported mappings (UniProt):** - UniProtKB ↔ KEGG - UniProtKB ↔ Ensembl - UniProtKB ↔ PDB - UniProtKB ↔ RefSeq -- And many more (see `references/identifier_mapping.md`) - -### 6. Gene Ontology Queries - -Access GO terms and annotations: - -```python -from bioservices import QuickGO -g = QuickGO(verbose=False) -# Retrieve GO term information -term_info = g.Term("GO:0003824", frmt="obo") +### 6. 
Gene Ontology Queries -# Search annotations -annotations = g.Annotation(protein="P43403", format="tsv") -``` +Access GO terms and annotations ### 7. Protein-Protein Interactions -Query interaction databases via PSICQUIC: - -```python -from bioservices import PSICQUIC - -s = PSICQUIC(verbose=False) - -# Query specific database (e.g., MINT) -interactions = s.query("mint", "ZAP70 AND species:9606") - -# List available interaction databases -databases = s.activeDBs -``` +Query interaction databases via PSICQUIC **Available databases:** MINT, IntAct, BioGRID, DIP, and 30+ others. @@ -215,11 +91,7 @@ BioServices excels at combining multiple services for comprehensive analysis. Co ### Complete Protein Analysis Pipeline -Execute a full protein characterization workflow: - -```bash -python scripts/protein_analysis_workflow.py ZAP70_HUMAN your.email@example.com -``` +Execute a full protein characterization workflow This script demonstrates: 1. UniProt search for protein entry @@ -232,10 +104,6 @@ This script demonstrates: Analyze all pathways for an organism: -```bash -python scripts/pathway_analysis.py hsa output_directory/ -``` - Extracts and analyzes: - All pathway IDs for organism - Protein-protein interactions per pathway @@ -246,9 +114,6 @@ Extracts and analyzes: Map compound identifiers across databases: -```bash -python scripts/compound_cross_reference.py Geldanamycin -``` Retrieves: - KEGG compound ID @@ -260,9 +125,6 @@ Retrieves: Convert multiple identifiers at once: -```bash -python scripts/batch_id_converter.py input_ids.txt --from UniProtKB_AC-ID --to KEGG -``` ## Best Practices @@ -276,28 +138,11 @@ Different services return data in various formats: ### Rate Limiting and Verbosity -Control API request behavior: - -```python -from bioservices import KEGG - -k = KEGG(verbose=False) # Suppress HTTP request details -k.TIMEOUT = 30 # Adjust timeout for slow connections -``` +Control API request behavior ### Error Handling -Wrap service calls in try-except blocks: - 
-```python -try: - results = u.search("ambiguous_query") - if results: - # Process results - pass -except Exception as e: - print(f"Search failed: {e}") -``` +Wrap service calls in try-except blocks ### Organism Codes @@ -317,41 +162,3 @@ BioServices works well with: - **PyMOL**: 3D structure visualization (retrieve PDB IDs) - **NetworkX**: Network analysis of pathway interactions - **Galaxy**: Custom tool wrappers for workflow platforms - -## Resources - -### scripts/ - -Executable Python scripts demonstrating complete workflows: - -- `protein_analysis_workflow.py`: End-to-end protein characterization -- `pathway_analysis.py`: KEGG pathway discovery and network extraction -- `compound_cross_reference.py`: Multi-database compound searching -- `batch_id_converter.py`: Bulk identifier mapping utility - -Scripts can be executed directly or adapted for specific use cases. - -### references/ - -Detailed documentation loaded as needed: - -- `services_reference.md`: Comprehensive list of all 40+ services with methods -- `workflow_patterns.md`: Detailed multi-step analysis workflows -- `identifier_mapping.md`: Complete guide to cross-database ID conversion - -Load references when working with specific services or complex integration tasks. - -## Installation - -```bash -uv pip install bioservices -``` - -Dependencies are automatically managed. Package is tested on Python 3.9-3.12. - -## Additional Information - -For detailed API documentation and advanced features, refer to: -- Official documentation: https://bioservices.readthedocs.io/ -- Source code: https://github.com/cokelaer/bioservices -- Service-specific references in `references/services_reference.md` diff --git a/skills/Research/units/SKILL.md b/skills/Research/units/SKILL.md index 6acf7d37..4d5fc0ad 100644 --- a/skills/Research/units/SKILL.md +++ b/skills/Research/units/SKILL.md @@ -1,20 +1,32 @@ --- id: units name: Units -description: Step-by-step guidance for units. 
+description: Step-by-step guidance for unit conversion workflows and best practices in scientific data standardization. category: Research +requires: [] +examples: + - Convert 500 nautical miles to kilometers for this navigation report. + - What are the standard units for measuring cosmic distance in this analysis? --- # Units Support units workflows with clear steps and best practices. -## When to Use +## Instruction +- Identify the source value and its physical unit, determining the target unit for conversion. +- Apply appropriate physical constants and conversion factors for diverse measurement domains (e.g., length, mass, energy, pressure). +- Execute multi-step conversions for derived units, such as transforming velocity from nautical miles per hour to meters per second. +- Ensure dimensional consistency across complex scientific formulas to prevent calculation errors. +- Standardize data to SI (International System of Units) or specific domain standards (e.g., astronomical units for cosmic distance). +- Verify that converted values maintain appropriate significant figures and reflect the precision of the original measurement. -- You need help with units. -- You want a clear, actionable next step. +## When to Use +- When standardizing heterogeneous scientific datasets into a unified measurement system. +- When performing complex engineering or physics calculations that involve disparate unit systems (Imperial vs. Metric). +- When generating reports or documentation that require precise unit accuracy for global accessibility. ## Output - -- Summary of goals and plan -- Key tips and precautions +- Converted values with clearly labeled target units. +- A summarized calculation plan showing the conversion factors and intermediate steps used. +- Technical precautions regarding precision loss or dimensional mismatches. 
\ No newline at end of file diff --git a/skills/Research/update-flaky-tests/SKILL.md b/skills/Research/update-flaky-tests/SKILL.md index e9b4f091..34bd2360 100644 --- a/skills/Research/update-flaky-tests/SKILL.md +++ b/skills/Research/update-flaky-tests/SKILL.md @@ -1,18 +1,39 @@ --- -category: Research id: update-flaky-tests name: Update Flaky Tests -description: Update the flaky test tracker. -allowed-tools: Read, Write, Bash, Glob +description: Update and manage the flaky E2E test tracker to systematically log test failures unrelated to current work. +category: Research +requires: [] +examples: + - Log a new flaky E2E test failure for the user management flow. + - Show me the current status of all active flaky tests in the database. --- # Update Flaky Tests Track and manage flaky E2E test observations over time. This skill helps systematically log test failures that are unrelated to the current work, preserving error artifacts for later analysis. +## Instruction +- Read and initialize the flaky tests database using the standardized JSON schema. +- Log new test failures by capturing critical context: test file, test name, specific step, browser environment, and the error pattern. +- Attach and manage error artifacts (e.g., logs, screenshots) in the designated artifacts folder for future debugging. +- Utilize local ISO 8601 timestamps for all log entries to ensure accurate cross-referencing with CI/CD events. +- Update the status of existing entries from "active" to "archived" once a fix has been verified and applied. +- Generate status reports summarizing the most frequent flaky tests and their impact on test suite reliability. + +## When to Use +- When identifying non-deterministic E2E test failures that are unrelated to current code changes. +- When maintaining a systematic historical record of test instability for long-term suite health analysis. +- When collaborating with team members to prioritize and resolve recurring test bottlenecks. 
+ +## Output +- Updated flaky-tests.json database with new or modified entries. +- Summarized test failure reports including error patterns and frequency metrics. +- Structured data logs ready for visualization in engineering dashboards. + ## STEP 1: Load Database -Read the flaky tests database from `.workspace/flaky-tests/flaky-tests.json`. +Read the flaky tests database If the file or folder doesn't exist: 1. Create the folder structure: `.workspace/flaky-tests/` and `.workspace/flaky-tests/artifacts/` @@ -96,10 +117,6 @@ For each test failure you observed: - `description`: Brief description of what was fixed - `appliedBy`: Your agent type -### Status Mode (standalone check) - -Read `/.claude/skills/update-flaky-tests/status-output-sample.md` first. Output status as a markdown table matching that format. Sort by Count descending. Omit Archived section if empty. End with legend line, nothing after. - ## STEP 5: Save Database 1. Update `lastUpdated` to current UTC timestamp diff --git a/skills/Research/usgs-data-download/SKILL.md b/skills/Research/usgs-data-download/SKILL.md index b1cca561..202df8ac 100644 --- a/skills/Research/usgs-data-download/SKILL.md +++ b/skills/Research/usgs-data-download/SKILL.md @@ -1,20 +1,32 @@ --- id: usgs-data-download name: USGS Data Download -description: Step-by-step guidance for usgs data download. +description: Step-by-step guidance for USGS data download workflows and EarthExplorer best practices. category: Research +requires: [] +examples: + - How do I download earthquake data from the USGS API? + - Provide a step-by-step guide for fetching satellite imagery from EarthExplorer. --- # USGS Data Download Support usgs data download workflows with clear steps and best practices. -## When to Use +## Instruction +- Define the research area using geographic coordinates (bounding box or point radius) and specify the temporal range. 
+- Select the appropriate USGS dataset for the task, such as Earthquake catalogs, Landsat satellite imagery, or hydrologic data. +- Utilize the USGS API or EarthExplorer machine-to-machine interface to query available records based on metadata filters. +- Coordinate bulk download requests, managing authentication and handling large data transfers via specialized download managers. +- Extract and validate file metadata to ensure data integrity and proper versioning for scientific analysis. +- Implement automated workflows for periodic data fetching to monitor real-time geological or environmental changes. -- You need help with usgs data download. -- You want a clear, actionable next step. +## When to Use +- When requiring official geological, hydrological, or remote sensing data for scientific research. +- When automating the retrieval of global earthquake alerts or historical seismic catalogs. +- When performing environmental monitoring or land-use analysis using standardized satellite imagery. ## Output - -- Summary of goals and plan -- Key tips and precautions +- A curated download manifest with direct links to the requested USGS data products. +- Structured summaries of identified records including spatial bounds and data quality indicators. +- Actionable steps for data decompression and initial processing in GIS or analysis tools. \ No newline at end of file diff --git a/skills/Research/uspto-database/SKILL.md b/skills/Research/uspto-database/SKILL.md index 16b8a1b3..227841c3 100644 --- a/skills/Research/uspto-database/SKILL.md +++ b/skills/Research/uspto-database/SKILL.md @@ -1,20 +1,32 @@ --- -category: Research id: uspto-database name: USPTO Database -description: Access USPTO APIs for patent/trademark searches, examination history (PEDS), assignments, citations, office actions, TSDR, for IP analysis and prior art searches. +description: Access USPTO APIs for patent/trademark searches, examination history (PEDS), assignments, and IP analysis. 
+category: Research +requires: [] +examples: + - Search for recent patent assignments in the USPTO database. + - Retrieve the examination history for a specific patent application ID. --- # USPTO Database Access USPTO APIs for patent/trademark searches, examination history (PEDS), assignments, citations, office actions, TSDR, for IP analysis and prior art searches. -## When to Use +## Instruction +- Identify the research target: determine whether the user is searching for Patents (utility/design) or Trademarks. +- Formulate precise search queries using Boolean logic, application IDs, or international classification codes. +- Utilize the Patent Examination Data System (PEDS) API to retrieve the full lifecycle history of a patent application. +- Analyze assignment data to track changes in IP ownership, mergers, or acquisitions. +- Retrieve specific legal artifacts including Office Actions, citations, and maintenance fee statuses. +- Perform prior art searches by mapping citations and related patent families to assess patentability. -- You need help with uspto database. -- You want a clear, actionable next step. +## When to Use +- When searching for existing patents and trademarks to support IP analysis or innovation research. +- When needing to verify the current legal status or examination history of a specific patent application. +- When auditing patent assignments or performing prior art searches for patent filing. ## Output - -- Summary of goals and plan -- Key tips and precautions +- Detailed IP search reports containing application summaries, key dates, and current legal status. +- Structured examination history tables showing office actions and applicant responses. +- Actionable briefings on IP ownership trends and prior art landscapes. 
\ No newline at end of file diff --git a/skills/Research/uv-global/SKILL.md b/skills/Research/uv-global/SKILL.md index 33d4816c..be59eb2b 100644 --- a/skills/Research/uv-global/SKILL.md +++ b/skills/Research/uv-global/SKILL.md @@ -1,20 +1,32 @@ --- id: uv-global name: UV Global -description: Step-by-step guidance for uv global. +description: Step-by-step guidance for using uv to manage global Python tool installations and environments. category: Research +requires: [] +examples: + - How do I use uv to manage global Python tool installations? + - Explain the difference between uv tool run and uv tool install. --- # UV Global Support uv global workflows with clear steps and best practices. -## When to Use +## Instruction +- Determine if the user needs a persistent tool installation or a one-off execution of a Python package. +- Utilize `uv tool install` to set up persistent, isolated command-line tools with their own environments. +- Use `uv tool run` (equivalent to `npx` or `pipx run`) for temporary execution of tools without permanent installation. +- Manage the global tool list by showing, updating, or uninstalling existing tool environments. +- Explain the mechanism of isolated environments to ensure the user understands why global tools do not conflict with local project dependencies. +- Configure path variables or shell completions to integrate `uv`-managed tools into the user's terminal workflow. -- You need help with uv global. -- You want a clear, actionable next step. +## When to Use +- When installing global developer tools like `black`, `ruff`, or `awscli` while maintaining environment isolation. +- When needing to run a specific Python script or CLI tool once without creating a project-specific virtual environment. +- When upgrading from legacy global `pip` installations to a faster, more reliable tool management system. 
## Output - -- Summary of goals and plan -- Key tips and precautions +- Step-by-step installation or execution commands using the `uv` tool syntax. +- Summaries of installed tools and their corresponding virtual environment locations. +- Technical precautions regarding path configurations and tool versioning. \ No newline at end of file diff --git a/skills/Research/venue-templates/SKILL.md b/skills/Research/venue-templates/SKILL.md index 7a9050e1..5c26e77e 100644 --- a/skills/Research/venue-templates/SKILL.md +++ b/skills/Research/venue-templates/SKILL.md @@ -1,21 +1,32 @@ --- -category: Research id: venue-templates name: Venue Templates -description: Access comprehensive LaTeX templates, formatting requirements, and submission guidelines for major scientific publication venues (Nature, Science, PLOS, IEEE, ACM), academic conferences (NeurIPS, ICML, CVPR, CHI), research posters, and grant proposals (NSF, NIH, DOE, DARPA). This skill should be used when preparing manuscripts for journal submission, conference papers, research posters, or grant proposals and need venue-specific formatting requirements and templates. -allowed-tools: [Read, Write, Edit, Bash] +description: Access LaTeX templates and submission guidelines for major scientific journals (Nature, Science) and conferences (NeurIPS, ICML). +category: Research +requires: [] +examples: + - Find the LaTeX template for a NeurIPS conference paper. + - What are the formatting requirements for a Nature journal submission? --- # Venue Templates Access comprehensive LaTeX templates, formatting requirements, and submission guidelines for major scientific publication venues (Nature, Science, PLOS, IEEE, ACM), academic conferences (NeurIPS, ICML, CVPR, CHI), research posters, and grant proposals (NSF, NIH, DOE, DARPA). This skill should be used when preparing manuscripts for journal submission, conference papers, research posters, or grant proposals and need venue-specific formatting requirements and templates. 
-## When to Use +## Instruction +- Identify the target publication venue, such as high-impact journals (Nature, Science, PLOS) or major conferences (NeurIPS, ICML, CVPR). +- Retrieve the specific official LaTeX template package and identify key class files (.cls) and style files (.sty). +- Outline the formatting constraints for the selected venue, including page limits, font sizes, and mandatory sections. +- Determine the required citation style (e.g., BibTeX, biblatex) and verify that the bibliography format matches venue guidelines. +- Guide the user in setting up specific environments for research posters or grant proposals (NSF, NIH, DARPA). +- Provide a pre-submission checklist focused on figure resolution, math formatting, and metadata accuracy. -- You need help with venue templates. -- You want a clear, actionable next step. +## When to Use +- When preparing a scientific manuscript for submission to a peer-reviewed journal or academic conference. +- When designing research posters or formatting complex grant proposals according to agency-specific rules. +- When needing to standardize LaTeX environments for professional scientific documentation. ## Output - -- Summary of goals and plan -- Key tips and precautions +- Access links or directory locations for the requested official venue templates. +- A summarized checklist of venue-specific formatting and submission requirements. +- Actionable steps for initializing and compiling the template in a local or cloud LaTeX editor. \ No newline at end of file diff --git a/skills/Research/wandb-experiment-logger/SKILL.md b/skills/Research/wandb-experiment-logger/SKILL.md index 7ba88085..ea915564 100644 --- a/skills/Research/wandb-experiment-logger/SKILL.md +++ b/skills/Research/wandb-experiment-logger/SKILL.md @@ -1,15 +1,14 @@ --- -category: Research id: wandb-experiment-logger name: WandB Experiment Logger -description: Execute wandb experiment logger operations. Auto-activating skill for ML Training. 
Triggers on: wandb experiment logger, wandb experiment logger Part of the ML Training skill category. - Execute wandb experiment logger operations. Auto-activating skill for ML Training. - Triggers on: wandb experiment logger, wandb experiment logger - Part of the ML Training skill category. Use when working with wandb experiment logger functionality. Trigger with phrases like "wandb experiment logger", "wandb logger", "wandb". -allowed-tools: "Read, Write, Edit, Bash(python:*), Bash(pip:*)" +description: Execute Weights & Biases operations for ML training, covering data preparation, model training, and experiment tracking. +category: Research +author: Jeremy Longshore version: 1.0.0 -license: MIT -author: "Jeremy Longshore " +requires: [] +examples: + - Initialize a Weights & Biases run for my training script. + - How do I log model artifacts and metrics to W&B? --- # Wandb Experiment Logger diff --git a/skills/Research/weather-query/SKILL.md b/skills/Research/weather-query/SKILL.md index 78c97133..bd597ccc 100644 --- a/skills/Research/weather-query/SKILL.md +++ b/skills/Research/weather-query/SKILL.md @@ -1,20 +1,32 @@ --- id: weather-query name: Weather Query -description: Step-by-step guidance for weather query. +description: Step-by-step guidance for weather query workflows and fetching real-time meteorological data. category: Research +requires: [] +examples: + - What is the weather forecast for London over the next three days? + - Show me the current temperature and humidity in Tokyo. --- # Weather Query Support weather query workflows with clear steps and best practices. -## When to Use +## Instruction +- Resolve the target geographic location by mapping city names or addresses to latitude and longitude coordinates. +- Select the specific meteorological parameters required, such as temperature, humidity, wind speed, UV index, or precipitation levels. 
+- Define the temporal scope of the query, distinguishing between real-time current conditions, hourly forecasts, or 7-day outlooks. +- Retrieve data from authoritative global weather APIs (e.g., OpenMeteo, NOAA) and handle unit conversions where necessary. +- Analyze weather alerts or extreme conditions that might impact specific research or logistics activities. +- Synthesize the retrieved data into a structured briefing, highlighting critical changes or trends over the requested window. -- You need help with weather query. -- You want a clear, actionable next step. +## When to Use +- When needing real-time weather status or forecasts to support operational planning and field research. +- When performing research on historical weather patterns or environmental impacts on specific locations. +- When integrating meteorological data into automated alerting systems or project dashboards. ## Output - -- Summary of goals and plan -- Key tips and precautions +- Structured weather reports containing current conditions and multi-day forecast summaries. +- Visualized weather trends (e.g., temperature curves or precipitation probability charts). +- Actionable steps for responding to adverse weather warnings or specific meteorological conditions. \ No newline at end of file diff --git a/skills/Research/wiring/SKILL.md b/skills/Research/wiring/SKILL.md index 5fb43512..2e9e514c 100644 --- a/skills/Research/wiring/SKILL.md +++ b/skills/Research/wiring/SKILL.md @@ -1,12 +1,35 @@ --- -category: Research id: wiring name: Wiring -description: +description: Developer guide for wiring headless features to presentation views using MobX and specific code conventions. +category: Research +requires: [] +examples: + - How do I wire a headless feature to a presentation view using MobX? + - Explain the code conventions for creating implementations in this admin project. 
--- # Admin Developer Guide +## Instruction +- Categorize the new feature as either Headless (reusable business logic) or Presentation (UI-bound logic). +- Create abstractions for Use Cases, Services, and Repositories using `createAbstraction` within the appropriate feature directory. +- Implement features using `createImplementation`, ensuring that the `dependencies` array matches the constructor's injection order. +- Set up stateful classes (Presenters, Repositories) by applying `makeAutoObservable(this)` in the constructor for MobX reactive tracking. +- Wrap all asynchronous state mutations in `runInAction` to comply with strict MobX mutation rules. +- Connect React views to Presenters using the `observer` HOC from `mobx-react-lite` and ensure views only read from the ViewModel (`presenter.vm`). +- Adhere to naming and export conventions, ensuring classes are not exported directly, only the implementation result. + +## When to Use +- When developing or refactoring features in a project utilizing the Webiny/MobX architectural pattern. +- When wiring business logic (headless features) to UI components (presentation features) to maintain separation of concerns. +- When needing to ensure consistent dependency injection and reactive state management across a complex admin application. + +## Output +- A structured implementation plan following the specific directory and naming conventions. +- Functional code snippets for abstractions, implementations, and MobX-reactive views. +- A checklist of architectural precautions regarding scoping, exports, and state mutation patterns. + ## Two Types of Features ### Headless Features @@ -57,461 +80,7 @@ UseCase → Repository → Gateway → External API - **Repository**: Owns domain data and cache. Singleton scope. - **Gateway**: Handles external I/O (GraphQL, REST). Singleton scope. 
-### Abstractions (`abstractions.ts`) - -```typescript -import { createAbstraction } from "@webiny/feature/admin"; -import type { Folder } from "~/domain/folder/Folder.js"; - -// Use Case. -export interface CreateFolderParams { - title: string; - slug: string; - parentId: string | null; -} - -export interface ICreateFolderUseCase { - execute: (params: CreateFolderParams) => Promise; -} - -export const CreateFolderUseCase = - createAbstraction("CreateFolderUseCase"); - -export namespace CreateFolderUseCase { - export type Interface = ICreateFolderUseCase; - export type Params = CreateFolderParams; -} - -// Repository. -export interface ICreateFolderRepository { - execute: (folder: Folder) => Promise; -} - -export const CreateFolderRepository = - createAbstraction("CreateFolderRepository"); - -export namespace CreateFolderRepository { - export type Interface = ICreateFolderRepository; -} - -// Gateway. -export interface ICreateFolderGateway { - execute: (dto: FolderGatewayDto) => Promise; -} - -export const CreateFolderGateway = - createAbstraction("CreateFolderGateway"); - -export namespace CreateFolderGateway { - export type Interface = ICreateFolderGateway; -} -``` - -### Use Case Implementation - -```typescript -import { Folder } from "~/domain/folder/Folder.js"; -import { - CreateFolderUseCase as UseCaseAbstraction, - CreateFolderRepository -} from "./abstractions.js"; - -class CreateFolderUseCaseImpl implements UseCaseAbstraction.Interface { - constructor(private repository: CreateFolderRepository.Interface) {} - - async execute(params: UseCaseAbstraction.Params) { - await this.repository.execute( - Folder.create({ - title: params.title, - slug: params.slug, - parentId: params.parentId - }) - ); - } -} - -export const CreateFolderUseCase = UseCaseAbstraction.createImplementation({ - implementation: CreateFolderUseCaseImpl, - dependencies: [CreateFolderRepository] -}); -``` - -### Service Implementation (Stateful, Observable) - -For long-lived services that hold 
observable state (e.g., WcpService, TelemetryService): - -```typescript -import { makeAutoObservable, runInAction } from "mobx"; -import { - WcpService as ServiceAbstraction, - WcpGateway -} from "./abstractions.js"; - -class WcpServiceImpl implements ServiceAbstraction.Interface { - private project: ILicense | null = null; - - constructor(private gateway: WcpGateway.Interface) { - makeAutoObservable(this); - } - - getProject(): ILicense { - return this.project; - } - - async loadProject(): Promise { - const data = await this.gateway.fetchProject(); - runInAction(() => { - this.project = data; - }); - } -} - -export const WcpService = ServiceAbstraction.createImplementation({ - implementation: WcpServiceImpl, - dependencies: [WcpGateway] -}); -``` - -### Feature Registration (`feature.ts`) - -```typescript -import { createFeature } from "@webiny/feature/admin"; -import { CreateFolderUseCase as UseCase } from "./abstractions.js"; -import { CreateFolderUseCase } from "./CreateFolderUseCase.js"; -import { CreateFolderRepository } from "./CreateFolderRepository.js"; -import { CreateFolderGqlGateway } from "./CreateFolderGqlGateway.js"; - -export const CreateFolderFeature = createFeature({ - name: "CreateFolder", - register(container) { - container.register(CreateFolderUseCase); - container.register(CreateFolderRepository).inSingletonScope(); - container.register(CreateFolderGqlGateway); - }, - resolve(container) { - return { - useCase: container.resolve(UseCase) - }; - } -}); -``` - -### Composite Features (Aggregating Child Features) - -When grouping related features, create a composite with no `resolve`: - -```typescript -import { createFeature } from "@webiny/feature/admin"; - -export const FoldersFeature = createFeature({ - name: "Folders", - register(container) { - CreateFolderFeature.register(container); - UpdateFolderFeature.register(container); - DeleteFolderFeature.register(container); - } -}); -``` - -### Consuming Headless Features in React - -Always go 
through a hook or presentation feature — never use `useFeature(HeadlessFeature)` directly in a component's render body without wrapping it: - -```typescript -// Hook wrapping a headless feature. -export const useCreateFolder = () => { - const { useCase } = useFeature(CreateFolderFeature); - - return { - createFolder: (params: CreateFolderUseCase.Params) => { - return useCase.execute(params); - } - }; -}; -``` - ---- - -## Presentation Features - -### Architecture - -``` -View (React) → Presenter → Repository → Gateway - ↑ - (or composes a headless feature) -``` - -- **Presenter**: Owns the ViewModel (`vm` getter). Orchestrates loading state. Uses MobX. Transient scope by default. -- **Repository**: Owns domain data. Singleton scope. -- **Gateway**: External I/O. Singleton scope. -- **View**: React component wrapped with `observer`. Reads only from `presenter.vm`. - -### Abstractions (`abstractions.ts`) - -```typescript -import { createAbstraction } from "@webiny/feature/admin"; - -export type NextjsConfig = string; - -// Presenter. -export interface INextjsConfigVm { - loading: boolean; - config: NextjsConfig | undefined; -} - -export interface INextjsConfigPresenter { - vm: INextjsConfigVm; - init(): void; -} - -export const NextjsConfigPresenter = - createAbstraction("NextjsConfigPresenter"); -export namespace NextjsConfigPresenter { - export type Interface = INextjsConfigPresenter; - export type ViewModel = INextjsConfigVm; -} - -// Repository. -export interface INextjsConfigRepository { - getConfig(): NextjsConfig | undefined; - loadConfig(): Promise; -} - -export const NextjsConfigRepository = - createAbstraction("NextjsConfigRepository"); - -export namespace NextjsConfigRepository { - export type Interface = INextjsConfigRepository; -} - -// Gateway. 
-export interface INextjsConfigGateway { - getConfig(): Promise; -} - -export const NextjsConfigGateway = - createAbstraction("NextjsConfigGateway"); - -export namespace NextjsConfigGateway { - export type Interface = INextjsConfigGateway; -} -``` - -### Presenter Implementation - -```typescript -import { makeAutoObservable, runInAction } from "mobx"; -import { - NextjsConfigPresenter as PresenterAbstraction, - NextjsConfigRepository -} from "./abstractions.js"; - -class NextjsConfigPresenterImpl implements PresenterAbstraction.Interface { - private loading = false; - - constructor(private repository: NextjsConfigRepository.Interface) { - makeAutoObservable(this); - } - - get vm(): PresenterAbstraction.ViewModel { - return { - loading: this.loading, - config: this.repository.getConfig() - }; - } - - init(): void { - this.loading = true; - this.repository.loadConfig().then(() => { - runInAction(() => { - this.loading = false; - }); - }); - } -} - -export const NextjsConfigPresenter = PresenterAbstraction.createImplementation({ - implementation: NextjsConfigPresenterImpl, - dependencies: [NextjsConfigRepository] -}); -``` - -### Repository Implementation - -```typescript -import { makeAutoObservable, runInAction } from "mobx"; -import { - NextjsConfigRepository as RepositoryAbstraction, - NextjsConfigGateway, - NextjsConfig -} from "./abstractions.js"; - -class NextjsConfigRepositoryImpl implements RepositoryAbstraction.Interface { - private config: NextjsConfig | undefined = undefined; - - constructor(private gateway: NextjsConfigGateway.Interface) { - makeAutoObservable(this); - } - - getConfig(): NextjsConfig | undefined { - return this.config; - } - - async loadConfig(): Promise { - if (this.config) { - return; - } - - const config = await this.gateway.getConfig(); - runInAction(() => { - this.config = config; - }); - } -} - -export const NextjsConfigRepository = RepositoryAbstraction.createImplementation({ - implementation: NextjsConfigRepositoryImpl, - 
dependencies: [NextjsConfigGateway] -}); -``` - -### Gateway Implementation (GraphQL) - -```typescript -import { NextjsConfigGateway as GatewayAbstraction } from "./abstractions.js"; -import { GraphQLClient } from "@webiny/app/features/graphqlClient"; - -const GET_NEXTJS_CONFIG = /* GraphQL */ ` - query GetNextjsConfig { - websiteBuilder { - getNextjsConfig { - data - error { - code - message - data - } - } - } - } -`; - -type GetNextjsConfigResponse = { - websiteBuilder: { - getNextjsConfig: - | { data: string; error: null } - | { data: null; error: { code: string; message: string; data: any } }; - }; -}; - -class NextjsGraphQLGateway implements GatewayAbstraction.Interface { - constructor(private client: GraphQLClient.Interface) {} - - async getConfig(): Promise { - const response = await this.client.execute({ - query: GET_NEXTJS_CONFIG - }); - - const envelope = response.websiteBuilder.getNextjsConfig; - if (envelope.error) { - throw new Error(envelope.error.message); - } - - return envelope.data; - } -} - -export const NextjsConfigGateway = GatewayAbstraction.createImplementation({ - implementation: NextjsGraphQLGateway, - dependencies: [GraphQLClient] -}); -``` - -### Feature Registration (`feature.ts`) - -```typescript -import { createFeature } from "@webiny/feature/admin"; -import { NextjsConfigPresenter as PresenterAbstraction } from "./abstractions.js"; -import { NextjsConfigPresenter } from "./NextjsConfigPresenter.js"; -import { NextjsConfigRepository } from "./NextjsConfigRepository.js"; -import { NextjsConfigGateway } from "./NextjsConfigGateway.js"; - -export const NextjsConfigFeature = createFeature({ - name: "NextjsConfig", - register(container) { - container.register(NextjsConfigPresenter); - container.register(NextjsConfigRepository).inSingletonScope(); - container.register(NextjsConfigGateway).inSingletonScope(); - }, - resolve(container) { - return { - presenter: container.resolve(PresenterAbstraction) - }; - } -}); -``` - -### React View 
Component - -```typescript -import React, { useEffect } from "react"; -import { observer } from "mobx-react-lite"; -import { useFeature } from "@webiny/app"; -import { NextjsConfigFeature } from "./feature.js"; - -export const NextjsConfigView = observer(() => { - const { presenter } = useFeature(NextjsConfigFeature); - - useEffect(() => { - presenter.init(); - }, []); - - const { loading, config } = presenter.vm; - - if (loading) { - return
Loading...
; - } - - return
{config}
; -}); -``` - ---- - -## Extending Features (Decorators) - -### Use Case Decorator (Cross-cutting Concerns) - -```typescript -class ListFoldersUseCaseWithLoading implements UseCaseAbstraction.Interface { - constructor( - private loadingRepository: FoldersLoadingRepository.Interface, - private decoratee: UseCaseAbstraction.Interface - ) {} - - async execute() { - await this.loadingRepository.runCallBack( - this.decoratee.execute(), - LoadingActionsEnum.list - ); - } -} -``` - -### Registering a Decorator - -```typescript -export const MyExtensionFeature = createFeature({ - name: "MyExtension", - register(container) { - container.registerDecorator(MyPresenterDecorator); - } -}); -``` - ---- ## Scoping Rules diff --git a/skills/Research/zemax-optical-designer/SKILL.md b/skills/Research/zemax-optical-designer/SKILL.md index 868d2da3..f3881462 100644 --- a/skills/Research/zemax-optical-designer/SKILL.md +++ b/skills/Research/zemax-optical-designer/SKILL.md @@ -1,20 +1,32 @@ --- id: zemax-optical-designer name: Zemax Optical Designer -description: Step-by-step guidance for zemax optical designer. +description: Step-by-step guidance for Zemax optical design workflows, including lens optimization and tolerance analysis. category: Research +requires: [] +examples: + - How do I set up a lens optimization workflow in Zemax? + - What are the best practices for tolerance analysis in optical design? --- # Zemax Optical Designer Support zemax optical designer workflows with clear steps and best practices. -## When to Use +## Instruction +- Define the optical system parameters by specifying the entrance pupil diameter, fields of view, and target wavelengths. +- Configure the Lens Data Editor by setting up surface types (e.g., Standard, Aspheric, Diffractive) and selecting appropriate materials from glass catalogs. +- Construct a robust Merit Function using operands to define optimization targets such as spot size, MTF, or wavefront error. 
+- Execute optimization routines (Local or Global) to find the best design configuration while respecting physical constraints like thickness and weight. +- Perform comprehensive tolerance analysis to evaluate the sensitivity of the design to manufacturing and assembly errors. +- Utilize diagnostic tools such as Ray Fan plots and Aberration coefficients to identify and correct specific image defects. -- You need help with zemax optical designer. -- You want a clear, actionable next step. +## When to Use +- When designing or optimizing precision optical systems including lenses, telescopes, and imaging sensors. +- When conducting feasibility studies or tolerance analysis for optical manufacturing projects. +- When documenting optical performance specifications for research reports or technical reviews. ## Output - -- Summary of goals and plan -- Key tips and precautions +- A summarized optical design plan including system specifications and optimization goals. +- Step-by-step guidance for configuring surfaces, merit functions, and optimization runs. +- Technical tips on tolerance management and performance verification through MTF and Ray Trace analysis. \ No newline at end of file diff --git a/skills/Research/zinc-database/SKILL.md b/skills/Research/zinc-database/SKILL.md index e109f126..e5c08dd3 100644 --- a/skills/Research/zinc-database/SKILL.md +++ b/skills/Research/zinc-database/SKILL.md @@ -1,8 +1,12 @@ --- -category: Research id: zinc-database name: Zinc Database -description: Access ZINC (230M+ purchasable compounds). Search by ZINC ID/SMILES, similarity searches, 3D-ready structures for docking, analog discovery, for virtual screening and drug discovery. +description: Access ZINC database for purchasable compounds, similarity searches, and 3D-ready structures for virtual screening. +category: Research +requires: [] +examples: + - Search the ZINC database for analogs of this SMILES string. + - Download 3D-ready structures for virtual screening from ZINC22. 
--- # ZINC Database @@ -25,302 +29,19 @@ This skill should be used when: - **Supplier queries**: Identifying compounds from specific chemical vendors - **Random sampling**: Obtaining random compound sets for screening -## Database Versions - -ZINC has evolved through multiple versions: - -- **ZINC22** (Current): Largest version with 230+ million purchasable compounds and multi-billion scale make-on-demand compounds -- **ZINC20**: Still maintained, focused on lead-like and drug-like compounds -- **ZINC15**: Predecessor version, legacy but still documented - -This skill primarily focuses on ZINC22, the most current and comprehensive version. - -## Access Methods - -### Web Interface - -Primary access point: https://zinc.docking.org/ -Interactive searching: https://cartblanche22.docking.org/ - -### API Access - -All ZINC22 searches can be performed programmatically via the CartBlanche22 API: - -**Base URL**: `https://cartblanche22.docking.org/` - -All API endpoints return data in text or JSON format with customizable fields. - -## Core Capabilities - -### 1. Search by ZINC ID - -Retrieve specific compounds using their ZINC identifiers. - -**Web interface**: https://cartblanche22.docking.org/search/zincid - -**API endpoint**: -```bash -curl "https://cartblanche22.docking.org/[email protected]_fields=smiles,zinc_id" -``` - -**Multiple IDs**: -```bash -curl "https://cartblanche22.docking.org/substances.txt:zinc_id=ZINC000000000001,ZINC000000000002&output_fields=smiles,zinc_id,tranche" -``` - -**Response fields**: `zinc_id`, `smiles`, `sub_id`, `supplier_code`, `catalogs`, `tranche` (includes H-count, LogP, MW, phase) - -### 2. Search by SMILES - -Find compounds by chemical structure using SMILES notation, with optional distance parameters for analog searching. 
- -**Web interface**: https://cartblanche22.docking.org/search/smiles - -**API endpoint**: -```bash -curl "https://cartblanche22.docking.org/[email protected]=4-Fadist=4" -``` - -**Parameters**: -- `smiles`: Query SMILES string (URL-encoded if necessary) -- `dist`: Tanimoto distance threshold (default: 0 for exact match) -- `adist`: Alternative distance parameter for broader searches (default: 0) -- `output_fields`: Comma-separated list of desired output fields - -**Example - Exact match**: -```bash -curl "https://cartblanche22.docking.org/smiles.txt:smiles=c1ccccc1" -``` - -**Example - Similarity search**: -```bash -curl "https://cartblanche22.docking.org/smiles.txt:smiles=c1ccccc1&dist=3&output_fields=zinc_id,smiles,tranche" -``` - -### 3. Search by Supplier Codes - -Query compounds from specific chemical suppliers or retrieve all molecules from particular catalogs. - -**Web interface**: https://cartblanche22.docking.org/search/catitems - -**API endpoint**: -```bash -curl "https://cartblanche22.docking.org/catitems.txt:catitem_id=SUPPLIER-CODE-123" -``` - -**Use cases**: -- Verify compound availability from specific vendors -- Retrieve all compounds from a catalog -- Cross-reference supplier codes with ZINC IDs - -### 4. Random Compound Sampling - -Generate random compound sets for screening or benchmarking purposes. - -**Web interface**: https://cartblanche22.docking.org/search/random - -**API endpoint**: -```bash -curl "https://cartblanche22.docking.org/substance/random.txt:count=100" -``` - -**Parameters**: -- `count`: Number of random compounds to retrieve (default: 100) -- `subset`: Filter by subset (e.g., 'lead-like', 'drug-like', 'fragment') -- `output_fields`: Customize returned data fields - -**Example - Random lead-like molecules**: -```bash -curl "https://cartblanche22.docking.org/substance/random.txt:count=1000&subset=lead-like&output_fields=zinc_id,smiles,tranche" -``` - -## Common Workflows - -### Workflow 1: Preparing a Docking Library - -1. 
**Define search criteria** based on target properties or desired chemical space - -2. **Query ZINC22** using appropriate search method: - ```bash - # Example: Get drug-like compounds with specific LogP and MW - curl "https://cartblanche22.docking.org/substance/random.txt:count=10000&subset=drug-like&output_fields=zinc_id,smiles,tranche" > docking_library.txt - ``` - -3. **Parse results** to extract ZINC IDs and SMILES: - ```python - import pandas as pd - - # Load results - df = pd.read_csv('docking_library.txt', sep='\t') - - # Filter by properties in tranche data - # Tranche format: H##P###M###-phase - # H = H-bond donors, P = LogP*10, M = MW - ``` - -4. **Download 3D structures** for docking using ZINC ID or download from file repositories - -### Workflow 2: Finding Analogs of a Hit Compound +## Instruction +- Identify the target molecules for search using ZINC IDs, supplier codes, or SMILES strings. +- Select the appropriate database version, prioritizing ZINC22 for multi-billion scale tangible compound discovery. +- Perform structure-based searches to find analogs or similar compounds for lead discovery and virtual screening. +- Download 3D-ready molecular structures in SDF or MOL2 formats, ensuring they are prepared for docking workflows. +- Apply filters based on purchasability, molecular weight, and logP to identify leads that meet specific medicinal chemistry criteria. +- Coordinate random sampling of compound sets to build diverse screening libraries for experimental validation. +- Verify supplier information and compound availability through the CartBlanche22 interface or API links. -1. **Obtain SMILES** of the hit compound: - ```python - hit_smiles = "CC(C)Cc1ccc(cc1)C(C)C(=O)O" # Example: Ibuprofen - ``` - -2. **Perform similarity search** with distance threshold: - ```bash - curl "https://cartblanche22.docking.org/smiles.txt:smiles=CC(C)Cc1ccc(cc1)C(C)C(=O)O&dist=5&output_fields=zinc_id,smiles,catalogs" > analogs.txt - ``` - -3. 
**Analyze results** to identify purchasable analogs: - ```python - import pandas as pd - - analogs = pd.read_csv('analogs.txt', sep='\t') - print(f"Found {len(analogs)} analogs") - print(analogs[['zinc_id', 'smiles', 'catalogs']].head(10)) - ``` - -4. **Retrieve 3D structures** for the most promising analogs - -### Workflow 3: Batch Compound Retrieval - -1. **Compile list of ZINC IDs** from literature, databases, or previous screens: - ```python - zinc_ids = [ - "ZINC000000000001", - "ZINC000000000002", - "ZINC000000000003" - ] - zinc_ids_str = ",".join(zinc_ids) - ``` - -2. **Query ZINC22 API**: - ```bash - curl "https://cartblanche22.docking.org/substances.txt:zinc_id=ZINC000000000001,ZINC000000000002&output_fields=zinc_id,smiles,supplier_code,catalogs" - ``` - -3. **Process results** for downstream analysis or purchasing - -### Workflow 4: Chemical Space Sampling - -1. **Select subset parameters** based on screening goals: - - Fragment: MW < 250, good for fragment-based drug discovery - - Lead-like: MW 250-350, LogP ≤ 3.5 - - Drug-like: MW 350-500, follows Lipinski's Rule of Five - -2. **Generate random sample**: - ```bash - curl "https://cartblanche22.docking.org/substance/random.txt:count=5000&subset=lead-like&output_fields=zinc_id,smiles,tranche" > chemical_space_sample.txt - ``` - -3. 
**Analyze chemical diversity** and prepare for virtual screening - -## Output Fields - -Customize API responses with the `output_fields` parameter: - -**Available fields**: -- `zinc_id`: ZINC identifier -- `smiles`: SMILES string representation -- `sub_id`: Internal substance ID -- `supplier_code`: Vendor catalog number -- `catalogs`: List of suppliers offering the compound -- `tranche`: Encoded molecular properties (H-count, LogP, MW, reactivity phase) - -**Example**: -```bash -curl "https://cartblanche22.docking.org/substances.txt:zinc_id=ZINC000000000001&output_fields=zinc_id,smiles,catalogs,tranche" -``` - -## Tranche System - -ZINC organizes compounds into "tranches" based on molecular properties: - -**Format**: `H##P###M###-phase` - -- **H##**: Number of hydrogen bond donors (00-99) -- **P###**: LogP × 10 (e.g., P035 = LogP 3.5) -- **M###**: Molecular weight in Daltons (e.g., M400 = 400 Da) -- **phase**: Reactivity classification - -**Example tranche**: `H05P035M400-0` -- 5 H-bond donors -- LogP = 3.5 -- MW = 400 Da -- Reactivity phase 0 - -Use tranche data to filter compounds by drug-likeness criteria. - -## Downloading 3D Structures - -For molecular docking, 3D structures are available via file repositories: - -**File repository**: https://files.docking.org/zinc22/ - -Structures are organized by tranches and available in multiple formats: -- MOL2: Multi-molecule format with 3D coordinates -- SDF: Structure-data file format -- DB2.GZ: Compressed database format for DOCK - -Refer to ZINC documentation at https://wiki.docking.org for downloading protocols and batch access methods. 
- -## Python Integration - -### Using curl with Python - -```python -import subprocess -import json - -def query_zinc_by_id(zinc_id, output_fields="zinc_id,smiles,catalogs"): - """Query ZINC22 by ZINC ID.""" - url = f"https://cartblanche22.docking.org/[email protected]_id={zinc_id}&output_fields={output_fields}" - result = subprocess.run(['curl', url], capture_output=True, text=True) - return result.stdout - -def search_by_smiles(smiles, dist=0, adist=0, output_fields="zinc_id,smiles"): - """Search ZINC22 by SMILES with optional distance parameters.""" - url = f"https://cartblanche22.docking.org/smiles.txt:smiles={smiles}&dist={dist}&adist={adist}&output_fields={output_fields}" - result = subprocess.run(['curl', url], capture_output=True, text=True) - return result.stdout - -def get_random_compounds(count=100, subset=None, output_fields="zinc_id,smiles,tranche"): - """Get random compounds from ZINC22.""" - url = f"https://cartblanche22.docking.org/substance/random.txt:count={count}&output_fields={output_fields}" - if subset: - url += f"&subset={subset}" - result = subprocess.run(['curl', url], capture_output=True, text=True) - return result.stdout -``` - -### Parsing Results - -```python -import pandas as pd -from io import StringIO - -# Query ZINC and parse as DataFrame -result = query_zinc_by_id("ZINC000000000001") -df = pd.read_csv(StringIO(result), sep='\t') - -# Extract tranche properties -def parse_tranche(tranche_str): - """Parse ZINC tranche code to extract properties.""" - # Format: H##P###M###-phase - import re - match = re.match(r'H(\d+)P(\d+)M(\d+)-(\d+)', tranche_str) - if match: - return { - 'h_donors': int(match.group(1)), - 'logP': int(match.group(2)) / 10.0, - 'mw': int(match.group(3)), - 'phase': int(match.group(4)) - } - return None - -df['tranche_props'] = df['tranche'].apply(parse_tranche) -``` +## Output +- Filtered lists of purchasable compounds with associated ZINC IDs and structural data. 
+- Download manifests for 3D-ready structures tailored to molecular docking pipelines. +- Summaries of chemical space coverage and identified supplier networks for target leads. ## Best Practices @@ -347,8 +68,6 @@ df['tranche_props'] = df['tranche'].apply(parse_tranche) ## Resources -### references/api_reference.md - Comprehensive documentation including: - Complete API endpoint reference @@ -378,23 +97,3 @@ ZINC explicitly states: **"We do not guarantee the quality of any molecule for a - Verify licensing terms for commercial use - Respect intellectual property when working with patented compounds - Follow your institution's guidelines for compound procurement - -## Additional Resources - -- **ZINC Website**: https://zinc.docking.org/ -- **CartBlanche22 Interface**: https://cartblanche22.docking.org/ -- **ZINC Wiki**: https://wiki.docking.org/ -- **File Repository**: https://files.docking.org/zinc22/ -- **GitHub**: https://github.com/docking-org/ -- **Primary Publication**: Irwin et al., J. Chem. Inf. Model 2020 (ZINC15) -- **ZINC22 Publication**: Irwin et al., J. Chem. Inf. Model 2023 - -## Citations - -When using ZINC in publications, cite the appropriate version: - -**ZINC22**: -Irwin, J. J., et al. "ZINC22—A Free Multi-Billion-Scale Database of Tangible Compounds for Ligand Discovery." *Journal of Chemical Information and Modeling* 2023. - -**ZINC15**: -Irwin, J. J., et al. "ZINC15 – Ligand Discovery for Everyone." *Journal of Chemical Information and Modeling* 2020, 60, 6065–6073. 
From 64f9e7faaaeede99a3adcfd5a1f73dd8cda780ba Mon Sep 17 00:00:00 2001 From: sixiang-svg Date: Sat, 21 Mar 2026 23:11:12 +0800 Subject: [PATCH 2/2] remove skill: body mismatches frontmatter (BioServices duplicate) --- skills/Research/uniprot-database/SKILL.md | 164 ---------------------- 1 file changed, 164 deletions(-) delete mode 100644 skills/Research/uniprot-database/SKILL.md diff --git a/skills/Research/uniprot-database/SKILL.md b/skills/Research/uniprot-database/SKILL.md deleted file mode 100644 index 96c08004..00000000 --- a/skills/Research/uniprot-database/SKILL.md +++ /dev/null @@ -1,164 +0,0 @@ ---- -id: uniprot-database -name: UniProt Database -description: Access UniProt via BioServices to retrieve protein sequences, functional annotations, and structural data. -category: Research -requires: [] -examples: - - Retrieve the protein sequence and functional annotations for the human P53 gene. - - Search UniProt for proteins related to insulin signaling in mammals. ---- - -# BioServices - -## Overview - -BioServices is a Python package providing programmatic access to approximately 40 bioinformatics web services and databases. Retrieve biological data, perform cross-database queries, map identifiers, analyze sequences, and integrate multiple biological resources in Python workflows. The package handles both REST and SOAP/WSDL protocols transparently. 
- -## When to Use This Skill - -This skill should be used when: -- Retrieving protein sequences, annotations, or structures from UniProt, PDB, Pfam -- Analyzing metabolic pathways and gene functions via KEGG or Reactome -- Searching compound databases (ChEBI, ChEMBL, PubChem) for chemical information -- Converting identifiers between different biological databases (KEGG↔UniProt, compound IDs) -- Running sequence similarity searches (BLAST, MUSCLE alignment) -- Querying gene ontology terms (QuickGO, GO annotations) -- Accessing protein-protein interaction data (PSICQUIC, IntactComplex) -- Mining genomic data (BioMart, ArrayExpress, ENA) -- Integrating data from multiple bioinformatics resources in a single workflow - -## Instruction -- Formulate precise queries to retrieve protein sequences, functional descriptions, and structural metadata from UniProtKB. -- Access detailed functional annotations, including GO terms, enzyme classifications (EC numbers), and post-translational modifications. -- Perform identifier mapping (e.g., UniProt ID to PDB or Ensembl) to integrate protein data into multi-omics pipelines. -- Retrieve protein-protein interaction evidence and biological pathway memberships to support systems biology research. -- Coordinate bulk downloads of protein datasets for specific taxonomic lineages or functional families. -- Adhere to UniProt API rate limits and best practices for large-scale data retrieval in Python. - -## Output -- Formatted protein reports containing sequences, functional annotations, and cross-references. -- Mapped identifier lists and structured protein interaction tables. -- Actionable next steps for protein sequence analysis or structure visualization. - -## Core Capabilities - -### 1. Protein Analysis - -Retrieve protein information, sequences, and functional annotations - -### 2. Pathway Discovery and Analysis - -Access KEGG pathway information for genes and organisms -### 3. 
Compound Database Searches - -Search and cross-reference compounds across multiple databases - -**Common workflow:** -1. Search compound by name in KEGG -2. Extract KEGG compound ID -3. Use UniChem for KEGG → ChEMBL mapping -4. ChEBI IDs are often provided in KEGG entries - - -### 4. Sequence Analysis - -Run BLAST searches and sequence alignments -### 5. Identifier Mapping - -Convert identifiers between different biological databases - -**Supported mappings (UniProt):** -- UniProtKB ↔ KEGG -- UniProtKB ↔ Ensembl -- UniProtKB ↔ PDB -- UniProtKB ↔ RefSeq - - -### 6. Gene Ontology Queries - -Access GO terms and annotations - -### 7. Protein-Protein Interactions - -Query interaction databases via PSICQUIC - -**Available databases:** MINT, IntAct, BioGRID, DIP, and 30+ others. - -## Multi-Service Integration Workflows - -BioServices excels at combining multiple services for comprehensive analysis. Common integration patterns: - -### Complete Protein Analysis Pipeline - -Execute a full protein characterization workflow - -This script demonstrates: -1. UniProt search for protein entry -2. FASTA sequence retrieval -3. BLAST similarity search -4. KEGG pathway discovery -5. 
PSICQUIC interaction mapping - -### Pathway Network Analysis - -Analyze all pathways for an organism: - -Extracts and analyzes: -- All pathway IDs for organism -- Protein-protein interactions per pathway -- Interaction type distributions -- Exports to CSV/SIF formats - -### Cross-Database Compound Search - -Map compound identifiers across databases: - - -Retrieves: -- KEGG compound ID -- ChEBI identifier -- ChEMBL identifier -- Basic compound properties - -### Batch Identifier Conversion - -Convert multiple identifiers at once: - - -## Best Practices - -### Output Format Handling - -Different services return data in various formats: -- **XML**: Parse using BeautifulSoup (most SOAP services) -- **Tab-separated (TSV)**: Pandas DataFrames for tabular data -- **Dictionary/JSON**: Direct Python manipulation -- **FASTA**: BioPython integration for sequence analysis - -### Rate Limiting and Verbosity - -Control API request behavior - -### Error Handling - -Wrap service calls in try-except blocks - -### Organism Codes - -Use standard organism abbreviations: -- `hsa`: Homo sapiens (human) -- `mmu`: Mus musculus (mouse) -- `dme`: Drosophila melanogaster -- `sce`: Saccharomyces cerevisiae (yeast) - -List all organisms: `k.list("organism")` or `k.organismIds` - -### Integration with Other Tools - -BioServices works well with: -- **BioPython**: Sequence analysis on retrieved FASTA data -- **Pandas**: Tabular data manipulation -- **PyMOL**: 3D structure visualization (retrieve PDB IDs) -- **NetworkX**: Network analysis of pathway interactions -- **Galaxy**: Custom tool wrappers for workflow platforms