26 changes: 19 additions & 7 deletions skills/Research/tides/SKILL.md
@@ -1,20 +1,32 @@
---
id: tides
name: Tides
description: Step-by-step guidance for tides.
description: Step-by-step guidance for tidal data analysis, prediction workflows, and coastal research practices.
category: Research
requires: []
examples:
- When is the next high tide in San Francisco Bay?
- Provide a tide chart for the coast of Maine for the upcoming week.
---

# Tides

Support tides workflows with clear steps and best practices.

## When to Use
## Instruction
- Identify the target geographic location or specific tidal station ID for analysis.
- Retrieve tidal height predictions and historical observations from authoritative maritime databases.
- Analyze tidal cycles to identify the exact times and heights of high tide, low tide, and slack water.
- Account for local datum shifts (e.g., MLLW, MSL) to ensure consistency in depth measurements.
- Correlate tidal data with lunar phases and meteorological factors (e.g., storm surges) to assess potential flooding risks.
- Provide step-by-step guidance for integrating tidal charts into coastal research or navigation planning.
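
Two of the instruction steps above (finding high and low tide times, and shifting between datums) can be sketched in plain Python. This is only an illustration under stated assumptions: the height series is synthetic (a single cosine with a 12.42-hour period), the 0.9 m MSL-above-MLLW offset is a made-up value, and the function names `find_tide_extremes` and `mllw_to_msl` are hypothetical. Real predictions and datum offsets should come from an authoritative tide station record.

```python
import math

def find_tide_extremes(times, heights):
    """Return (time, height, kind) for each interior local maximum or minimum."""
    extremes = []
    for i in range(1, len(heights) - 1):
        prev, cur, nxt = heights[i - 1], heights[i], heights[i + 1]
        if cur > prev and cur >= nxt:
            extremes.append((times[i], cur, "high"))
        elif cur < prev and cur <= nxt:
            extremes.append((times[i], cur, "low"))
    return extremes

def mllw_to_msl(height_mllw, msl_above_mllw):
    """Convert a height referenced to MLLW into a height referenced to MSL.

    msl_above_mllw is the elevation of MSL above MLLW at the station.
    """
    return height_mllw - msl_above_mllw

# Synthetic series (assumption): 25 hours sampled every 0.1 h,
# mean level 1.0 m above MLLW, amplitude 0.8 m, period 12.42 h.
PERIOD_HOURS = 12.42
times = [i * 0.1 for i in range(251)]
heights = [1.0 + 0.8 * math.cos(2 * math.pi * t / PERIOD_HOURS) for t in times]

extremes = find_tide_extremes(times, heights)
for t, h, kind in extremes:
    # Report each extreme in both datums; 0.9 m is an illustrative offset.
    print(f"{kind:>4} tide at t={t:5.1f} h: {h:.2f} m MLLW, {mllw_to_msl(h, 0.9):.2f} m MSL")
```

The neighbour comparison is only reliable on smooth predicted curves; observed water levels are noisy and would normally be smoothed or fitted before picking extremes.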

- You need help with tides.
- You want a clear, actionable next step.
## When to Use
- When planning coastal field research, maritime navigation, or coastal engineering projects.
- When needing precise high/low tide schedules for specific global ports or coastal segments.
- When assessing environmental impacts and sediment transport patterns influenced by tidal cycles.

## Output

- Summary of goals and plan
- Key tips and precautions
- A comprehensive tidal forecast report including peak times and height estimates.
- Visualized tidal curves or charts for the requested time window.
- Actionable safety precautions and optimal time windows for coastal activities.
182 changes: 21 additions & 161 deletions skills/Research/torchdrug/SKILL.md
@@ -1,8 +1,12 @@
---
category: Research
id: torchdrug
name: Torchdrug
description: Graph-based drug discovery toolkit. Molecular property prediction (ADMET), protein modeling, knowledge graph reasoning, molecular generation, retrosynthesis, GNNs (GIN, GAT, SchNet), 40+ datasets, for PyTorch-based ML on molecules, proteins, and biomedical graphs.
description: PyTorch-based drug discovery toolkit for molecular property prediction, protein modeling, and retrosynthesis using GNNs.
category: Research
requires: []
examples:
- Predict the ADMET properties for this SMILES string using TorchDrug.
- Build a protein-protein interaction network from this biological dataset.
---

# TorchDrug
@@ -37,54 +41,10 @@ This skill should be used when working with:
- Compatible with PyTorch and PyTorch Lightning
- Integrates with AlphaFold and ESM for proteins

## Getting Started

### Installation

```bash
uv pip install torchdrug
# Or with optional dependencies
uv pip install torchdrug[full]
```

### Quick Example

```python
from torchdrug import datasets, models, tasks
from torch.utils.data import DataLoader

# Load molecular dataset
dataset = datasets.BBBP("~/molecule-datasets/")
train_set, valid_set, test_set = dataset.split()

# Define GNN model
model = models.GIN(
input_dim=dataset.node_feature_dim,
hidden_dims=[256, 256, 256],
edge_input_dim=dataset.edge_feature_dim,
batch_norm=True,
readout="mean"
)

# Create property prediction task
task = tasks.PropertyPrediction(
model,
task=dataset.tasks,
criterion="bce",
metric=["auroc", "auprc"]
)

# Train with PyTorch
optimizer = torch.optim.Adam(task.parameters(), lr=1e-3)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

for epoch in range(100):
for batch in train_loader:
loss = task(batch)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```
## Output
- Trained GNN models and associated performance metric reports.
- Predicted molecular properties or optimized molecular structures in SMILES format.
- Step-by-step guidance for configuring complex drug discovery pipelines in PyTorch.

## Core Capabilities

@@ -103,7 +63,7 @@ Predict chemical, physical, and biological properties of molecules from structur
- GNN models (GIN, GAT, SchNet)
- PropertyPrediction and MultipleBinaryClassification tasks

**Reference:** See `references/molecular_property_prediction.md` for:
**Reference:**
- Complete dataset catalog
- Model selection guide
- Training workflows and best practices
@@ -126,7 +86,7 @@ Work with protein sequences, structures, and properties.
- Structure models (GearNet, SchNet)
- Multiple task types for different prediction levels

**Reference:** See `references/protein_modeling.md` for:
**Reference:**
- Protein-specific datasets
- Sequence vs structure models
- Pre-training strategies
@@ -147,7 +107,7 @@ Predict missing links and relationships in biological knowledge graphs.
- Embedding models (TransE, RotatE, ComplEx)
- KnowledgeGraphCompletion task

**Reference:** See `references/knowledge_graphs.md` for:
**Reference:**
- Knowledge graph datasets (including Hetionet with 45k biomedical entities)
- Embedding model comparison
- Evaluation metrics and protocols
@@ -169,7 +129,7 @@ Generate novel molecular structures with desired properties.
- GraphAutoregressiveFlow
- Property optimization workflows

**Reference:** See `references/molecular_generation.md` for:
**Reference:**
- Generation strategies (unconditional, conditional, scaffold-based)
- Multi-objective optimization
- Validation and filtering
@@ -191,7 +151,7 @@ Predict synthetic routes from target molecules to starting materials.
- SynthonCompletion (reactant prediction)
- End-to-end Retrosynthesis pipeline

**Reference:** See `references/retrosynthesis.md` for:
**Reference:**
- Task decomposition (center ID → synthon completion)
- Multi-step synthesis planning
- Commercial availability checking
@@ -208,7 +168,7 @@ Comprehensive catalog of GNN architectures for different data types and tasks.
- Knowledge graph: TransE, RotatE, ComplEx, SimplE
- Generative: GraphAutoregressiveFlow

**Reference:** See `references/models_architectures.md` for:
**Reference:**
- Detailed model descriptions
- Model selection guide by task and dataset
- Architecture comparisons
@@ -224,7 +184,7 @@ Comprehensive catalog of GNN architectures for different data types and tasks.
- Knowledge graphs (general and biomedical)
- Retrosynthesis reactions

**Reference:** See `references/datasets.md` for:
**Reference:**
- Complete dataset catalog with sizes and tasks
- Dataset selection guide
- Loading and preprocessing
@@ -243,7 +203,6 @@ Comprehensive catalog of GNN architectures for different data types and tasks.
4. Train with scaffold split for realistic evaluation
5. Evaluate using AUROC and AUPRC
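
Step 5 names AUROC as an evaluation metric. As a rough illustration of what that number measures, the sketch below computes AUROC with the pairwise (Mann-Whitney) definition in plain Python. It is not TorchDrug's implementation, the function name `auroc` is hypothetical, and the labels and scores are invented for the example.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Invented predictions for six molecules (1 = active, 0 = inactive).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

# 8 of the 9 positive/negative pairs are ordered correctly.
result = auroc(labels, scores)
print(f"AUROC = {result:.3f}")
```

The pairwise loop is quadratic in the number of examples, which is fine for a sanity check but too slow for large test sets; a sort-based computation is the usual choice there.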

**Navigation:** `references/molecular_property_prediction.md` → Dataset selection → Model selection → Training

### Workflow 2: Protein Function Prediction

@@ -256,7 +215,6 @@ Comprehensive catalog of GNN architectures for different data types and tasks.
4. Fine-tune pre-trained model or train from scratch
5. Evaluate using accuracy and per-class metrics

**Navigation:** `references/protein_modeling.md` → Model selection (sequence vs structure) → Pre-training strategies

### Workflow 3: Drug Repurposing via Knowledge Graphs

@@ -270,7 +228,6 @@ Comprehensive catalog of GNN architectures for different data types and tasks.
5. Query for "Compound-treats-Disease" predictions
6. Filter by plausibility and mechanism
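
Steps 5 and 6 amount to scoring candidate triples and ranking them. The sketch below shows the idea with the TransE scoring rule (a triple is plausible when head + relation lands close to tail), written in plain Python rather than with TorchDrug. The two-dimensional embeddings, the entity names, and the helpers `transe_score` and `rank_tails` are all hypothetical; a trained model would supply real vectors.

```python
import math

def transe_score(head, relation, tail):
    """Negative L2 distance between (head + relation) and tail; higher is more plausible."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

# Hand-picked toy embeddings (assumption, not learned values).
ENTITY = {
    "compound_A": [0.0, 0.0],
    "disease_X": [1.0, 0.1],
    "disease_Y": [-1.0, 2.0],
}
RELATION = {"treats": [1.0, 0.0]}

def rank_tails(head_name, relation_name, candidates):
    """Rank candidate tail entities for the query (head, relation, ?)."""
    scored = [
        (name, transe_score(ENTITY[head_name], RELATION[relation_name], ENTITY[name]))
        for name in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

ranking = rank_tails("compound_A", "treats", ["disease_X", "disease_Y"])
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A high score only says the triple fits the embedding geometry, which is why the workflow follows ranking with a plausibility and mechanism filter.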

**Navigation:** `references/knowledge_graphs.md` → Hetionet dataset → Model selection → Biomedical applications

### Workflow 4: De Novo Molecule Generation

@@ -284,7 +241,6 @@ Comprehensive catalog of GNN architectures for different data types and tasks.
5. Validate chemistry and filter by drug-likeness
6. Rank by multi-objective scoring

**Navigation:** `references/molecular_generation.md` → Conditional generation → Multi-objective optimization

### Workflow 5: Retrosynthesis Planning

@@ -298,74 +254,26 @@ Comprehensive catalog of GNN architectures for different data types and tasks.
5. Apply recursively for multi-step planning
6. Check commercial availability of building blocks
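
The recursion in steps 5 and 6 can be illustrated without any model: expand a target into reactants, stop at purchasable building blocks, and give up when a branch cannot be resolved. In this sketch a lookup table stands in for the single-step predictor, and every name (`plan_route`, `ONE_STEP`, `PURCHASABLE`, the molecule labels) is hypothetical.

```python
def plan_route(target, one_step, purchasable, max_depth=5):
    """Return a nested route for target, or None if no complete route is found.

    Purchasable molecules are returned as plain strings; intermediates are
    returned as {molecule: [routes for each reactant]}.
    """
    if target in purchasable:
        return target
    if max_depth == 0 or target not in one_step:
        return None
    subroutes = []
    for reactant in one_step[target]:
        route = plan_route(reactant, one_step, purchasable, max_depth - 1)
        if route is None:
            return None  # one unresolved reactant invalidates the whole step
        subroutes.append(route)
    return {target: subroutes}

# Stand-in for a single-step retrosynthesis model (assumption): product -> reactants.
ONE_STEP = {
    "target": ["intermediate", "block_1"],
    "intermediate": ["block_2", "block_3"],
}
# Stand-in for a commercial availability check (assumption).
PURCHASABLE = {"block_1", "block_2", "block_3"}

route = plan_route("target", ONE_STEP, PURCHASABLE)
print(route)
```

The depth limit keeps the search from running away on cyclic or very long routes; a real planner would also score alternative disconnections instead of accepting the first one.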

**Navigation:** `references/retrosynthesis.md` → Task types → Multi-step planning

## Integration Patterns

### With RDKit

Convert between TorchDrug molecules and RDKit:
```python
from torchdrug import data
from rdkit import Chem

# SMILES → TorchDrug molecule
smiles = "CCO"
mol = data.Molecule.from_smiles(smiles)

# TorchDrug → RDKit
rdkit_mol = mol.to_molecule()

# RDKit → TorchDrug
rdkit_mol = Chem.MolFromSmiles(smiles)
mol = data.Molecule.from_molecule(rdkit_mol)
```
Convert between TorchDrug molecules and RDKit

### With AlphaFold/ESM

Use predicted structures:
```python
from torchdrug import data

# Load AlphaFold predicted structure
protein = data.Protein.from_pdb("AF-P12345-F1-model_v4.pdb")

# Build graph with spatial edges
graph = protein.residue_graph(
node_position="ca",
edge_types=["sequential", "radius"],
radius_cutoff=10.0
)
```
Use predicted structures

### With PyTorch Lightning

Wrap tasks for Lightning training:
```python
import pytorch_lightning as pl

class LightningTask(pl.LightningModule):
def __init__(self, torchdrug_task):
super().__init__()
self.task = torchdrug_task

def training_step(self, batch, batch_idx):
return self.task(batch)

def validation_step(self, batch, batch_idx):
pred = self.task.predict(batch)
target = self.task.target(batch)
return {"pred": pred, "target": target}

def configure_optimizers(self):
return torch.optim.Adam(self.parameters(), lr=1e-3)
```
Wrap tasks for Lightning training

## Technical Details

For deep dives into TorchDrug's architecture:

**Core Concepts:** See `references/core_concepts.md` for:
**Core Concepts:**
- Architecture philosophy (modular, configurable)
- Data structures (Graph, Molecule, Protein, PackedGraph)
- Model interface and forward function signature
@@ -374,73 +282,25 @@ For deep dives into TorchDrug's architecture:
- Loss functions and metrics
- Common pitfalls and debugging

## Quick Reference Cheat Sheet

**Choose Dataset:**
- Molecular property → `references/datasets.md` → Molecular section
- Protein task → `references/datasets.md` → Protein section
- Knowledge graph → `references/datasets.md` → Knowledge graph section

**Choose Model:**
- Molecules → `references/models_architectures.md` → GNN section → GIN/GAT/SchNet
- Proteins (sequence) → `references/models_architectures.md` → Protein section → ESM
- Proteins (structure) → `references/models_architectures.md` → Protein section → GearNet
- Knowledge graph → `references/models_architectures.md` → KG section → RotatE/ComplEx

**Common Tasks:**
- Property prediction → `references/molecular_property_prediction.md` or `references/protein_modeling.md`
- Generation → `references/molecular_generation.md`
- Retrosynthesis → `references/retrosynthesis.md`
- KG reasoning → `references/knowledge_graphs.md`

**Understand Architecture:**
- Data structures → `references/core_concepts.md` → Data Structures
- Model design → `references/core_concepts.md` → Model Interface
- Task design → `references/core_concepts.md` → Task Interface

## Troubleshooting Common Issues

**Issue: Dimension mismatch errors**
→ Check `model.input_dim` matches `dataset.node_feature_dim`
→ See `references/core_concepts.md` → Essential Attributes

**Issue: Poor performance on molecular tasks**
→ Use scaffold splitting, not random
→ Try GIN instead of GCN
→ See `references/molecular_property_prediction.md` → Best Practices

**Issue: Protein model not learning**
→ Use pre-trained ESM for sequence tasks
→ Check edge construction for structure models
→ See `references/protein_modeling.md` → Training Workflows

**Issue: Memory errors with large graphs**
→ Reduce batch size
→ Use gradient accumulation
→ See `references/core_concepts.md` → Memory Efficiency

**Issue: Generated molecules are invalid**
→ Add validity constraints
→ Post-process with RDKit validation
→ See `references/molecular_generation.md` → Validation and Filtering

## Resources

**Official Documentation:** https://torchdrug.ai/docs/
**GitHub:** https://github.com/DeepGraphLearning/torchdrug
**Paper:** TorchDrug: A Powerful and Flexible Machine Learning Platform for Drug Discovery

## Summary

Navigate to the appropriate reference file based on your task:

1. **Molecular property prediction** → `molecular_property_prediction.md`
2. **Protein modeling** → `protein_modeling.md`
3. **Knowledge graphs** → `knowledge_graphs.md`
4. **Molecular generation** → `molecular_generation.md`
5. **Retrosynthesis** → `retrosynthesis.md`
6. **Model selection** → `models_architectures.md`
7. **Dataset selection** → `datasets.md`
8. **Technical details** → `core_concepts.md`

Each reference provides comprehensive coverage of its domain with examples, best practices, and common use cases.
9 changes: 6 additions & 3 deletions skills/Research/tracking-regression-tests/SKILL.md
@@ -1,9 +1,12 @@
---
category: Research
id: tracking-regression-tests
name: Tracking Regression Tests
description: This skill enables Claude to track and run regression tests, ensuring new changes don't break existing functionality. It is triggered when the user asks to "track regression", "run regression tests", or uses the shortcut "reg". The skill helps in maintaining code stability by identifying critical tests, automating their execution, and analyzing the impact of changes. It also provides insights into test history and identifies flaky tests. The skill uses the `regression-test-tracker` plugin.
This skill enables Claude to track and run regression tests, ensuring new changes don't break existing functionality. It is triggered when the user asks to "track regression", "run regression tests", or uses the shortcut "reg". The skill helps in maintaining code stability by identifying critical tests, automating their execution, and analyzing the impact of changes. It also provides insights into test history and identifies flaky tests. The skill uses the `regression-test-tracker` plugin.
description: Track and run regression tests using the regression-test-tracker to ensure code stability and identify failures.
category: Research
requires: []


P2 Incorrect category for development/testing skill

This skill covers software regression testing workflows, which is clearly a development category skill — not research. The same misclassification applies to several other skills in this PR:

  • skills/Research/tracking-regression-tests/SKILL.md:6: `category: Research` → should be `development`
  • skills/Research/update-flaky-tests/SKILL.md:6: `category: Research` → should be `development`
  • skills/Research/uv-global/SKILL.md:6: `category: Research` → should be `development`
  • skills/Research/wiring/SKILL.md:6: `category: Research` → should be `development`

All four skills describe developer tooling, test management, package management, or frontend architecture patterns — none of which belong in the research category. Using the wrong category will cause these skills to surface incorrectly in category-filtered searches.
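
To make the search concern concrete, here is a small sketch of a category-filtered lookup over SKILL.md front matter. It is an assumption about how such a filter might work, not the repository's actual loader: the parser only handles flat `key: value` lines, the comparison is case-insensitive by choice, and `parse_front_matter`, `skills_in_category`, and the shortened `SKILLS` contents are hypothetical.

```python
def parse_front_matter(text):
    """Parse flat `key: value` pairs between the opening and closing `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if line.startswith((" ", "-")):
            continue  # skip list items and nested values
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta

def skills_in_category(skills, category):
    """Return the sorted names of skills whose front matter matches the category."""
    wanted = category.lower()
    return sorted(
        name
        for name, text in skills.items()
        if parse_front_matter(text).get("category", "").lower() == wanted
    )

# Shortened stand-ins for two SKILL.md files as they appear in this PR.
SKILLS = {
    "tides": "---\nid: tides\ncategory: Research\n---\n# Tides\n",
    "tracking-regression-tests": (
        "---\nid: tracking-regression-tests\ncategory: Research\nrequires: []\n---\n"
    ),
}

research_hits = skills_in_category(SKILLS, "research")
development_hits = skills_in_category(SKILLS, "development")
print("research:", research_hits)
print("development:", development_hits)
```

With the category as submitted, the regression-testing skill is listed under research and a development search returns nothing, which is the behavior the comment warns about.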

Suggested change
requires: []
category: development

examples:
- Run the regression test suite for the current development branch.
- Mark this specific unit test as a regression test for automated tracking.
---

## Overview