From 5a5f122bb254f01dda4e8b5f94a182568eec337f Mon Sep 17 00:00:00 2001 From: sixiang <1210552219@qq.com> Date: Mon, 16 Mar 2026 23:21:41 +0800 Subject: [PATCH 1/3] feat: update research skills part 7 (20 folders) --- skills/Research/pubmed-database/SKILL.md | 239 +---------- skills/Research/pycse/SKILL.md | 26 +- skills/Research/pylabrobot/SKILL.md | 83 +--- skills/Research/pymatgen/SKILL.md | 28 +- skills/Research/pyopenms/SKILL.md | 143 +------ skills/Research/pysam/SKILL.md | 84 +--- skills/Research/pytdc/SKILL.md | 238 +---------- .../Research/pythia-event-generator/SKILL.md | 27 +- skills/Research/qlm-calibration/SKILL.md | 26 +- skills/Research/qms-audit-expert/SKILL.md | 33 +- skills/Research/qutip/SKILL.md | 266 ++----------- skills/Research/radio-copilot/SKILL.md | 26 +- skills/Research/rag-implementation/SKILL.md | 29 +- skills/Research/rdkit/SKILL.md | 28 +- skills/Research/research-engineer/SKILL.md | 344 ++-------------- skills/Research/research-grants/SKILL.md | 29 +- skills/Research/research-lookup/SKILL.md | 29 +- skills/Research/safety-interlocks/SKILL.md | 115 ++---- skills/Research/safety-system-skill/SKILL.md | 32 +- skills/Research/scanpy/SKILL.md | 374 ++---------------- 20 files changed, 365 insertions(+), 1834 deletions(-) diff --git a/skills/Research/pubmed-database/SKILL.md b/skills/Research/pubmed-database/SKILL.md index e99d0650..80a1cf9a 100644 --- a/skills/Research/pubmed-database/SKILL.md +++ b/skills/Research/pubmed-database/SKILL.md @@ -1,8 +1,12 @@ --- -category: Research id: pubmed-database name: PubMed Database -description: Guidance and answers for pubmed database. +description: Search and retrieve metadata, citations, and abstracts from the PubMed database for biomedical and life sciences research. +category: Research +requires: [] +examples: + - Search PubMed for recent clinical trials on immunotherapy for cancer. + - Retrieve metadata and citations for a specific PMID from PubMed. 
--- # Biopython: Computational Molecular Biology in Python @@ -44,19 +48,8 @@ Biopython is organized into modular sub-packages, each addressing specific bioin Install Biopython using pip (requires Python 3 and NumPy): -```python -uv pip install biopython -``` - For NCBI database access, always set your email address (required by NCBI): -```python -from Bio import Entrez -Entrez.email = "your.email@example.com" - -# Optional: API key for higher rate limits (10 req/s instead of 3 req/s) -Entrez.api_key = "your_api_key_here" -``` ## Using This Skill @@ -64,8 +57,6 @@ This skill provides comprehensive documentation organized by functionality area. ### 1. Sequence Handling (Bio.Seq & Bio.SeqIO) -**Reference:** `references/sequence_io.md` - Use for: - Creating and manipulating biological sequences - Reading and writing sequence files (FASTA, GenBank, FASTQ, etc.) @@ -74,22 +65,8 @@ Use for: - Sequence translation, transcription, and reverse complement - Working with SeqRecord objects -**Quick example:** -```python -from Bio import SeqIO - -# Read sequences from FASTA file -for record in SeqIO.parse("sequences.fasta", "fasta"): - print(f"{record.id}: {len(record.seq)} bp") - -# Convert GenBank to FASTA -SeqIO.convert("input.gb", "genbank", "output.fasta", "fasta") -``` - ### 2. Alignment Analysis (Bio.Align & Bio.AlignIO) -**Reference:** `references/alignment.md` - Use for: - Pairwise sequence alignment (global and local) - Reading and writing multiple sequence alignments @@ -97,21 +74,8 @@ Use for: - Calculating alignment statistics - Customizing alignment parameters -**Quick example:** -```python -from Bio import Align - -# Pairwise alignment -aligner = Align.PairwiseAligner() -aligner.mode = 'global' -alignments = aligner.align("ACCGGT", "ACGGT") -print(alignments[0]) -``` - ### 3. Database Access (Bio.Entrez) -**Reference:** `references/databases.md` - Use for: - Searching NCBI databases (PubMed, GenBank, Protein, Gene, etc.) 
- Downloading sequences and records @@ -119,22 +83,8 @@ Use for: - Finding related records across databases - Batch downloading with proper rate limiting -**Quick example:** -```python -from Bio import Entrez -Entrez.email = "your.email@example.com" - -# Search PubMed -handle = Entrez.esearch(db="pubmed", term="biopython", retmax=10) -results = Entrez.read(handle) -handle.close() -print(f"Found {results['Count']} results") -``` - ### 4. BLAST Operations (Bio.Blast) -**Reference:** `references/blast.md` - Use for: - Running BLAST searches via NCBI web services - Running local BLAST searches @@ -142,22 +92,8 @@ Use for: - Filtering results by E-value or identity - Extracting hit sequences -**Quick example:** -```python -from Bio.Blast import NCBIWWW, NCBIXML - -# Run BLAST search -result_handle = NCBIWWW.qblast("blastn", "nt", "ATCGATCGATCG") -blast_record = NCBIXML.read(result_handle) - -# Display top hits -for alignment in blast_record.alignments[:5]: - print(f"{alignment.title}: E-value={alignment.hsps[0].expect}") -``` - ### 5. Structural Bioinformatics (Bio.PDB) -**Reference:** `references/structure.md` Use for: - Parsing PDB and mmCIF structure files @@ -167,24 +103,9 @@ Use for: - Structure superimposition and RMSD calculation - Extracting sequences from structures -**Quick example:** -```python -from Bio.PDB import PDBParser - -# Parse structure -parser = PDBParser(QUIET=True) -structure = parser.get_structure("1crn", "1crn.pdb") - -# Calculate distance between alpha carbons -chain = structure[0]["A"] -distance = chain[10]["CA"] - chain[20]["CA"] -print(f"Distance: {distance:.2f} Å") -``` ### 6. 
Phylogenetics (Bio.Phylo) -**Reference:** `references/phylogenetics.md` - Use for: - Reading and writing phylogenetic trees (Newick, NEXUS, phyloXML) - Building trees from distance matrices or alignments @@ -193,23 +114,9 @@ Use for: - Creating consensus trees - Visualizing trees -**Quick example:** -```python -from Bio import Phylo - -# Read and visualize tree -tree = Phylo.read("tree.nwk", "newick") -Phylo.draw_ascii(tree) - -# Calculate distance -distance = tree.distance("Species_A", "Species_B") -print(f"Distance: {distance:.3f}") -``` ### 7. Advanced Features -**Reference:** `references/advanced.md` - Use for: - **Sequence motifs** (Bio.motifs) - Finding and analyzing motif patterns - **Population genetics** (Bio.PopGen) - GenePop files, Fst calculations, Hardy-Weinberg tests @@ -218,15 +125,6 @@ Use for: - **Clustering** (Bio.Cluster) - K-means and hierarchical clustering - **Genome diagrams** (GenomeDiagram) - Visualizing genomic features -**Quick example:** -```python -from Bio.SeqUtils import gc_fraction, molecular_weight -from Bio.Seq import Seq - -seq = Seq("ATCGATCGATCG") -print(f"GC content: {gc_fraction(seq):.2%}") -print(f"Molecular weight: {molecular_weight(seq, seq_type='DNA'):.2f} g/mol") -``` ## General Workflow Guidelines @@ -239,136 +137,33 @@ When a user asks about a specific Biopython task: 3. **Extract relevant code patterns** and adapt them to the user's specific needs 4. **Combine multiple modules** when the task requires it -Example search patterns for reference files: -```bash -# Find information about specific functions -grep -n "SeqIO.parse" references/sequence_io.md - -# Find examples of specific tasks -grep -n "BLAST" references/blast.md - -# Find information about specific concepts -grep -n "alignment" references/alignment.md -``` - ### Writing Biopython Code Follow these principles when writing Biopython code: 1. **Import modules explicitly** - ```python - from Bio import SeqIO, Entrez - from Bio.Seq import Seq - ``` 2. 
**Set Entrez email** when using NCBI databases - ```python - Entrez.email = "your.email@example.com" - ``` 3. **Use appropriate file formats** - Check which format best suits the task - ```python - # Common formats: "fasta", "genbank", "fastq", "clustal", "phylip" - ``` 4. **Handle files properly** - Close handles after use or use context managers - ```python - with open("file.fasta") as handle: - records = SeqIO.parse(handle, "fasta") - ``` 5. **Use iterators for large files** - Avoid loading everything into memory - ```python - for record in SeqIO.parse("large_file.fasta", "fasta"): - # Process one record at a time - ``` 6. **Handle errors gracefully** - Network operations and file parsing can fail - ```python - try: - handle = Entrez.efetch(db="nucleotide", id=accession) - except HTTPError as e: - print(f"Error: {e}") - ``` ## Common Patterns ### Pattern 1: Fetch Sequence from GenBank -```python -from Bio import Entrez, SeqIO - -Entrez.email = "your.email@example.com" - -# Fetch sequence -handle = Entrez.efetch(db="nucleotide", id="EU490707", rettype="gb", retmode="text") -record = SeqIO.read(handle, "genbank") -handle.close() - -print(f"Description: {record.description}") -print(f"Sequence length: {len(record.seq)}") -``` ### Pattern 2: Sequence Analysis Pipeline -```python -from Bio import SeqIO -from Bio.SeqUtils import gc_fraction - -for record in SeqIO.parse("sequences.fasta", "fasta"): - # Calculate statistics - gc = gc_fraction(record.seq) - length = len(record.seq) - - # Find ORFs, translate, etc. 
- protein = record.seq.translate() - - print(f"{record.id}: {length} bp, GC={gc:.2%}") -``` - ### Pattern 3: BLAST and Fetch Top Hits -```python -from Bio.Blast import NCBIWWW, NCBIXML -from Bio import Entrez, SeqIO - -Entrez.email = "your.email@example.com" - -# Run BLAST -result_handle = NCBIWWW.qblast("blastn", "nt", sequence) -blast_record = NCBIXML.read(result_handle) - -# Get top hit accessions -accessions = [aln.accession for aln in blast_record.alignments[:5]] - -# Fetch sequences -for acc in accessions: - handle = Entrez.efetch(db="nucleotide", id=acc, rettype="fasta", retmode="text") - record = SeqIO.read(handle, "fasta") - handle.close() - print(f">{record.description}") -``` - ### Pattern 4: Build Phylogenetic Tree from Sequences -```python -from Bio import AlignIO, Phylo -from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor - -# Read alignment -alignment = AlignIO.read("alignment.fasta", "fasta") - -# Calculate distances -calculator = DistanceCalculator("identity") -dm = calculator.get_distance(alignment) - -# Build tree -constructor = DistanceTreeConstructor() -tree = constructor.nj(dm) - -# Visualize -Phylo.draw_ascii(tree) -``` ## Best Practices @@ -403,28 +198,6 @@ Phylo.draw_ascii(tree) ### Issue: PDB parser warnings **Solution:** Use `PDBParser(QUIET=True)` to suppress warnings, or investigate structure quality. 
-## Additional Resources - -- **Official Documentation**: https://biopython.org/docs/latest/ -- **Tutorial**: https://biopython.org/docs/latest/Tutorial/ -- **Cookbook**: https://biopython.org/docs/latest/Tutorial/ (advanced examples) -- **GitHub**: https://github.com/biopython/biopython -- **Mailing List**: biopython@biopython.org - -## Quick Reference - -To locate information in reference files, use these search patterns: - -```bash -# Search for specific functions -grep -n "function_name" references/*.md - -# Find examples of specific tasks -grep -n "example" references/sequence_io.md - -# Find all occurrences of a module -grep -n "Bio.Seq" references/*.md -``` ## Summary diff --git a/skills/Research/pycse/SKILL.md b/skills/Research/pycse/SKILL.md index 5ac14fec..f619a206 100644 --- a/skills/Research/pycse/SKILL.md +++ b/skills/Research/pycse/SKILL.md @@ -1,20 +1,32 @@ --- id: pycse name: PyCSE -description: Step-by-step guidance for pycse. +description: Python tools for chemical science and engineering, including numerical solutions for ODEs, regression, and data analysis. category: Research +requires: [] +examples: + - Help me solve an ordinary differential equation (ODE) using PyCSE. + - Apply regression analysis to my experimental data using PyCSE tools. --- # PyCSE Support pycse workflows with clear steps and best practices. -## When to Use +## Instruction +- Identify the engineering problem type, such as reactor modeling, heat transfer, or kinetic parameter estimation. +- Utilize the numerical solvers to integrate systems of ordinary differential equations (ODEs) for time-dependent chemical processes. +- Perform non-linear regression analysis to fit experimental data to theoretical models, ensuring proper error estimation. +- Implement boundary value problem (BVP) solvers for steady-state transport phenomena and diffusion-reaction systems. +- Use specialized utilities for physical property lookups and unit conversions relevant to chemical engineering. 
+- Generate high-quality plots to visualize concentration profiles, temperature gradients, or regression residuals. -- You need help with pycse. -- You want a clear, actionable next step. +## When to Use +- When solving complex mathematical models in chemical kinetics, thermodynamics, or transport phenomena. +- When performing statistical data fitting and regression for experimental laboratory results. +- When needing a Pythonic interface for traditional engineering numerical methods. ## Output - -- Summary of goals and plan -- Key tips and precautions +- Numerical solutions for differential equations and identified system steady-states. +- Regression reports including optimized parameters, confidence intervals, and R-squared values. +- Visualized engineering data and summary reports on model validity. \ No newline at end of file diff --git a/skills/Research/pylabrobot/SKILL.md b/skills/Research/pylabrobot/SKILL.md index b24af3c3..c79e0ad7 100644 --- a/skills/Research/pylabrobot/SKILL.md +++ b/skills/Research/pylabrobot/SKILL.md @@ -1,8 +1,12 @@ --- -category: Research id: pylabrobot name: PyLabRobot -description: Step-by-step guidance for pylabrobot. +description: Hardware-agnostic Python SDK for controlling liquid handlers, plate readers, and other lab automation equipment through a unified interface. +category: Research +requires: [] +examples: + - Initialize a Hamilton STAR liquid handler and pick up tips. + - Create a plate reading protocol using the BMG CLARIOstar backend. 
--- # PyLabRobot @@ -84,26 +88,6 @@ Visualize and simulate laboratory protocols: To get started with PyLabRobot, install the package and initialize a liquid handler: -```python -# Install PyLabRobot -# uv pip install pylabrobot - -# Basic liquid handling setup -from pylabrobot.liquid_handling import LiquidHandler -from pylabrobot.liquid_handling.backends import STAR -from pylabrobot.resources import STARLetDeck - -# Initialize liquid handler -lh = LiquidHandler(backend=STAR(), deck=STARLetDeck()) -await lh.setup() - -# Basic operations -await lh.pick_up_tips(tip_rack["A1:H1"]) -await lh.aspirate(plate["A1"], vols=100) -await lh.dispense(plate["A2"], vols=100) -await lh.drop_tips() -``` - ## Working with References This skill organizes detailed information across multiple reference files. Load the relevant reference when: @@ -133,49 +117,18 @@ When creating laboratory automation protocols with PyLabRobot: ### Liquid Transfer Protocol -```python -# Setup -lh = LiquidHandler(backend=STAR(), deck=STARLetDeck()) -await lh.setup() - -# Define resources -tip_rack = TIP_CAR_480_A00(name="tip_rack") -source_plate = Cos_96_DW_1mL(name="source") -dest_plate = Cos_96_DW_1mL(name="dest") - -lh.deck.assign_child_resource(tip_rack, rails=1) -lh.deck.assign_child_resource(source_plate, rails=10) -lh.deck.assign_child_resource(dest_plate, rails=15) - -# Transfer protocol -await lh.pick_up_tips(tip_rack["A1:H1"]) -await lh.transfer(source_plate["A1:H12"], dest_plate["A1:H12"], vols=100) -await lh.drop_tips() -``` ### Plate Reading Workflow -```python -# Setup plate reader -from pylabrobot.plate_reading import PlateReader -from pylabrobot.plate_reading.clario_star_backend import CLARIOstarBackend - -pr = PlateReader(name="CLARIOstar", backend=CLARIOstarBackend()) -await pr.setup() - -# Set temperature and read -await pr.set_temperature(37) -await pr.open() -# (manually or robotically load plate) -await pr.close() -data = await pr.read_absorbance(wavelength=450) -``` - -## Additional 
Resources - -- **Official Documentation**: https://docs.pylabrobot.org -- **GitHub Repository**: https://github.com/PyLabRobot/pylabrobot -- **Community Forum**: https://discuss.pylabrobot.org -- **PyPI Package**: https://pypi.org/project/PyLabRobot/ - -For detailed usage of specific capabilities, refer to the corresponding reference file in the `references/` directory. +## Instruction +- Initialize the target liquid handling system (e.g., Hamilton, Opentrons, Tecan) using the appropriate backend configuration. +- Define the laboratory deck layout by assigning child resources such as tip racks, source plates, and destination reservoirs to specific rails or slots. +- Program complex liquid handling sequences, including tip pickup, aspiration, dispensing, and tip disposal with specific volume controls. +- Implement multi-device workflows by integrating plate readers, heater-shakers, or centrifuges into the unified Python interface. +- Execute protocol simulations to verify deck coordinates and movement paths before running on physical hardware. +- Handle asynchronous operations for high-throughput batch processing to maximize robot utilization. + +## Output +- Executable Python automation scripts and validated deck configuration files. +- Real-time status logs for robot movements and liquid transfer completion. +- Simulation reports and visual summaries of the automated experimental workflow. \ No newline at end of file diff --git a/skills/Research/pymatgen/SKILL.md b/skills/Research/pymatgen/SKILL.md index 0ca18cde..4f98e16e 100644 --- a/skills/Research/pymatgen/SKILL.md +++ b/skills/Research/pymatgen/SKILL.md @@ -1,20 +1,32 @@ --- -category: Research id: pymatgen name: Pymatgen -description: Materials science toolkit. Crystal structures (CIF, POSCAR), phase diagrams, band structure, DOS, Materials Project integration, format conversion, for computational materials science. 
+description: Materials science toolkit for crystal structures, phase diagrams, band structure analysis, and Materials Project integration. +category: Research +requires: [] +examples: + - Generate a phase diagram for a multi-component materials system. + - Convert a CIF file to a POSCAR format for VASP calculations. --- # Pymatgen Materials science toolkit. Crystal structures (CIF, POSCAR), phase diagrams, band structure, DOS, Materials Project integration, format conversion, for computational materials science. -## When to Use +## Instruction +- Load crystal structures from various formats (CIF, POSCAR, MPRester) and perform structural symmetry analysis. +- Access the Materials Project API to retrieve computed properties, such as band gaps, formation energies, and elastic constants. +- Construct phase diagrams for multi-component materials systems to predict thermodynamic stability and decomposition pathways. +- Analyze electronic structures by generating Band Structure and Density of States (DOS) plots from simulation outputs. +- Perform geometric transformations, including creating supercells, slab models for surfaces, and point defects. +- Coordinate the setup of high-throughput Ab Initio calculations by generating standardized input files for VASP or Gaussian. -- You need help with pymatgen. -- You want a clear, actionable next step. +## When to Use +- When conducting computational materials design and thermodynamic stability analysis. +- When retrieving high-fidelity materials property data from the Materials Project database. +- When automating the preprocessing of structures for density functional theory (DFT) simulations. ## Output - -- Summary of goals and plan -- Key tips and precautions +- Processed crystal structure files and structural analysis reports. +- Thermodynamic phase diagrams and electronic structure visualizations (DOS/Band Structure). +- Automated input sets for quantum chemistry and solid-state physics software. 
\ No newline at end of file diff --git a/skills/Research/pyopenms/SKILL.md b/skills/Research/pyopenms/SKILL.md index 0110ad35..745a7168 100644 --- a/skills/Research/pyopenms/SKILL.md +++ b/skills/Research/pyopenms/SKILL.md @@ -1,8 +1,12 @@ --- -category: Research id: pyopenms name: pyOpenMS -description: Guidance and answers for pyopenms. +description: Python bindings for OpenMS to process mass spectrometry data, including file handling, feature detection, and peptide identification. +category: Research +requires: [] +examples: + - Read an mzML file and iterate through its mass spectra. + - Perform feature detection on mass spectrometry data using pyOpenMS. --- # PyOpenMS @@ -13,18 +17,9 @@ PyOpenMS provides Python bindings to the OpenMS library for computational mass s ## Installation -Install using uv: - -```bash -uv uv pip install pyopenms -``` - -Verify installation: +Install using uv -```python -import pyopenms -print(pyopenms.__version__) -``` +Verify installation ## Core Capabilities @@ -38,50 +33,17 @@ Handle mass spectrometry file formats and convert between representations. Basic file reading: -```python -import pyopenms as ms - -# Read mzML file -exp = ms.MSExperiment() -ms.MzMLFile().load("data.mzML", exp) - -# Access spectra -for spectrum in exp: - mz, intensity = spectrum.get_peaks() - print(f"Spectrum: {len(mz)} peaks") -``` - -**For detailed file handling**: See `references/file_io.md` - ### 2. Signal Processing Process raw spectral data with smoothing, filtering, centroiding, and normalization. Basic spectrum processing: -```python -# Smooth spectrum with Gaussian filter -gaussian = ms.GaussFilter() -params = gaussian.getParameters() -params.setValue("gaussian_width", 0.1) -gaussian.setParameters(params) -gaussian.filterExperiment(exp) -``` - -**For algorithm details**: See `references/signal_processing.md` ### 3. Feature Detection Detect and link features across spectra and samples for quantitative analysis. 
-```python -# Detect features -ff = ms.FeatureFinder() -ff.run("centroided", exp, features, params, ms.FeatureMap()) -``` - -**For complete workflows**: See `references/feature_detection.md` - ### 4. Peptide and Protein Identification Integrate with search engines and process identification results. @@ -90,19 +52,6 @@ Integrate with search engines and process identification results. Basic identification workflow: -```python -# Load identification data -protein_ids = [] -peptide_ids = [] -ms.IdXMLFile().load("identifications.idXML", protein_ids, peptide_ids) - -# Apply FDR filtering -fdr = ms.FalseDiscoveryRate() -fdr.apply(peptide_ids) -``` - -**For detailed workflows**: See `references/identification.md` - ### 5. Metabolomics Analysis Perform untargeted metabolomics preprocessing and analysis. @@ -114,8 +63,6 @@ Typical workflow: 4. Link features to consensus map 5. Annotate with compound databases -**For complete metabolomics workflows**: See `references/metabolomics.md` - ## Data Structures PyOpenMS uses these primary objects: @@ -128,65 +75,16 @@ PyOpenMS uses these primary objects: - **PeptideIdentification**: Search results for peptides - **ProteinIdentification**: Search results for proteins -**For detailed documentation**: See `references/data_structures.md` - ## Common Workflows -### Quick Start: Load and Explore Data - -```python -import pyopenms as ms - -# Load mzML file -exp = ms.MSExperiment() -ms.MzMLFile().load("sample.mzML", exp) - -# Get basic statistics -print(f"Number of spectra: {exp.getNrSpectra()}") -print(f"Number of chromatograms: {exp.getNrChromatograms()}") - -# Examine first spectrum -spec = exp.getSpectrum(0) -print(f"MS level: {spec.getMSLevel()}") -print(f"Retention time: {spec.getRT()}") -mz, intensity = spec.get_peaks() -print(f"Peaks: {len(mz)}") -``` ### Parameter Management -Most algorithms use a parameter system: - -```python -# Get algorithm parameters -algo = ms.GaussFilter() -params = algo.getParameters() - -# View available 
parameters -for param in params.keys(): - print(f"{param}: {params.getValue(param)}") - -# Modify parameters -params.setValue("gaussian_width", 0.2) -algo.setParameters(params) -``` +Most algorithms use a parameter system ### Export to Pandas -Convert data to pandas DataFrames for analysis: - -```python -import pyopenms as ms -import pandas as pd - -# Load feature map -fm = ms.FeatureMap() -ms.FeatureXMLFile().load("features.featureXML", fm) - -# Convert to DataFrame -df = fm.get_df() -print(df.head()) -``` +Convert data to pandas DataFrames for analysis ## Integration with Other Tools @@ -197,17 +95,12 @@ PyOpenMS integrates with: - **Matplotlib/Seaborn**: Visualization - **R**: Via rpy2 bridge -## Resources - -- **Official documentation**: https://pyopenms.readthedocs.io -- **OpenMS documentation**: https://www.openms.org -- **GitHub**: https://github.com/OpenMS/OpenMS - -## References +## When to Use +- When processing large-scale LC-MS/MS datasets for proteomics or metabolomics research. +- When identifying peptides and quantifying their abundance in biological samples. +- When building custom computational mass spectrometry pipelines in Python. -- `references/file_io.md` - Comprehensive file format handling -- `references/signal_processing.md` - Signal processing algorithms -- `references/feature_detection.md` - Feature detection and linking -- `references/identification.md` - Peptide and protein identification -- `references/metabolomics.md` - Metabolomics-specific workflows -- `references/data_structures.md` - Core objects and data structures +## Output +- Picked peaks and identified molecular features with associated intensities. +- Peptide and protein identification lists with scoring and validation metrics. +- Quantified protein expression tables ready for bioinformatics interpretation. 
\ No newline at end of file diff --git a/skills/Research/pysam/SKILL.md b/skills/Research/pysam/SKILL.md index 510e89d9..768433f3 100644 --- a/skills/Research/pysam/SKILL.md +++ b/skills/Research/pysam/SKILL.md @@ -1,8 +1,12 @@ --- -category: Research id: pysam name: pysam -description: Genomic file toolkit. Read/write SAM/BAM/CRAM alignments, VCF/BCF variants, FASTA/FASTQ sequences, extract regions, calculate coverage, for NGS data processing pipelines. +description: Python interface for htslib to read, manipulate, and write SAM/BAM/CRAM alignments, VCF/BCF variants, and FASTA/FASTQ sequences. +category: Research +requires: [] +examples: + - Calculate read depth for a specific region in a BAM file. + - Filter variants in a VCF file based on quality scores. --- # Pysam @@ -25,41 +29,15 @@ This skill should be used when: ## Quick Start -### Installation -```bash -uv pip install pysam -``` ### Basic Examples **Read alignment file:** -```python -import pysam -# Open BAM file and fetch reads in region -samfile = pysam.AlignmentFile("example.bam", "rb") -for read in samfile.fetch("chr1", 1000, 2000): - print(f"{read.query_name}: {read.reference_start}") -samfile.close() -``` **Read variant file:** -```python -# Open VCF file and iterate variants -vcf = pysam.VariantFile("variants.vcf") -for variant in vcf: - print(f"{variant.chrom}:{variant.pos} {variant.ref}>{variant.alts}") -vcf.close() -``` **Query reference sequence:** -```python -# Open FASTA and extract sequence -fasta = pysam.FastaFile("reference.fasta") -sequence = fasta.fetch("chr1", 1000, 2000) -print(sequence) -fasta.close() -``` ## Core Capabilities @@ -209,53 +187,3 @@ Specify format when opening files: 6. **Stream limitations:** Only stdin/stdout are supported for streaming, not arbitrary Python file objects 7. 
**Thread safety:** While GIL is released during I/O, comprehensive thread-safety hasn't been fully validated -## Command-Line Tools - -Pysam provides access to samtools and bcftools commands: - -```python -# Sort BAM file -pysam.samtools.sort("-o", "sorted.bam", "input.bam") - -# Index BAM -pysam.samtools.index("sorted.bam") - -# View specific region -pysam.samtools.view("-b", "-o", "region.bam", "input.bam", "chr1:1000-2000") - -# BCF tools -pysam.bcftools.view("-O", "z", "-o", "output.vcf.gz", "input.vcf") -``` - -**Error handling:** -```python -try: - pysam.samtools.sort("-o", "output.bam", "input.bam") -except pysam.SamtoolsError as e: - print(f"Error: {e}") -``` - -## Resources - -### references/ - -Detailed documentation for each major capability: - -- **alignment_files.md** - Complete guide to SAM/BAM/CRAM operations, including AlignmentFile class, AlignedSegment attributes, fetch operations, pileup analysis, and writing alignments - -- **variant_files.md** - Complete guide to VCF/BCF operations, including VariantFile class, VariantRecord attributes, genotype handling, INFO/FORMAT fields, and multi-sample operations - -- **sequence_files.md** - Complete guide to FASTA/FASTQ operations, including FastaFile and FastxFile classes, sequence extraction, quality score handling, and tabix-indexed file access - -- **common_workflows.md** - Practical examples of integrated bioinformatics workflows combining multiple file types, including quality control, coverage analysis, variant validation, and sequence extraction - -## Getting Help - -For detailed information on specific operations, refer to the appropriate reference document: - -- Working with BAM files or calculating coverage → `alignment_files.md` -- Analyzing variants or genotypes → `variant_files.md` -- Extracting sequences or processing FASTQ → `sequence_files.md` -- Complex workflows integrating multiple file types → `common_workflows.md` - -Official documentation: https://pysam.readthedocs.io/ diff --git 
a/skills/Research/pytdc/SKILL.md b/skills/Research/pytdc/SKILL.md index 94666bf8..2b399a92 100644 --- a/skills/Research/pytdc/SKILL.md +++ b/skills/Research/pytdc/SKILL.md @@ -1,8 +1,12 @@ --- -category: Research id: pytdc name: PyTDC -description: Therapeutics Data Commons. AI-ready drug discovery datasets (ADME, toxicity, DTI), benchmarks, scaffold splits, molecular oracles, for therapeutic ML and pharmacological prediction. +description: Therapeutics Data Commons for AI-ready drug discovery datasets, benchmarks, scaffold splits, and molecular oracles. +category: Research +requires: [] +examples: + - Load an ADME dataset from TDC for property prediction. + - Evaluate model predictions using the TDC MAE evaluator. --- # PyTDC (Therapeutics Data Commons) @@ -26,46 +30,15 @@ This skill should be used when: Install PyTDC using pip: -```bash -uv pip install PyTDC -``` To upgrade to the latest version: -```bash -uv pip install PyTDC --upgrade -``` Core dependencies (automatically installed): - numpy, pandas, tqdm, seaborn, scikit_learn, fuzzywuzzy Additional packages are installed automatically as needed for specific features. -## Quick Start - -The basic pattern for accessing any TDC dataset follows this structure: - -```python -from tdc. import -data = (name='') -split = data.get_split(method='scaffold', seed=1, frac=[0.7, 0.1, 0.2]) -df = data.get_data(format='df') -``` - -Where: -- ``: One of `single_pred`, `multi_pred`, or `generation` -- ``: Specific task category (e.g., ADME, DTI, MolGen) -- ``: Dataset name within that task - -**Example - Loading ADME data:** - -```python -from tdc.single_pred import ADME -data = ADME(name='Caco2_Wang') -split = data.get_split(method='scaffold') -# Returns dict with 'train', 'valid', 'test' DataFrames -``` - ## Single-Instance Prediction Tasks Single-instance prediction involves forecasting properties of individual biomedical entities (molecules, proteins, etc.). 
@@ -76,11 +49,6 @@ Single-instance prediction involves forecasting properties of individual biomedi Predict pharmacokinetic properties of drug molecules. -```python -from tdc.single_pred import ADME -data = ADME(name='Caco2_Wang') # Intestinal permeability -# Other datasets: HIA_Hou, Bioavailability_Ma, Lipophilicity_AstraZeneca, etc. -``` **Common ADME datasets:** - Caco2 - Intestinal permeability @@ -95,11 +63,7 @@ data = ADME(name='Caco2_Wang') # Intestinal permeability Predict toxicity and adverse effects of compounds. -```python -from tdc.single_pred import Tox -data = Tox(name='hERG') # Cardiotoxicity -# Other datasets: AMES, DILI, Carcinogens_Lagunin, etc. -``` + **Common toxicity datasets:** - hERG - Cardiac toxicity @@ -112,19 +76,11 @@ data = Tox(name='hERG') # Cardiotoxicity Bioactivity predictions from screening data. -```python -from tdc.single_pred import HTS -data = HTS(name='SARSCoV2_Vitro_Touret') -``` #### 4. QM (Quantum Mechanics) Quantum mechanical properties of molecules. -```python -from tdc.single_pred import QM -data = QM(name='QM7') -``` #### 5. Other Single Prediction Tasks @@ -150,12 +106,6 @@ Multi-instance prediction involves forecasting properties of interactions betwee Predict binding affinity between drugs and protein targets. -```python -from tdc.multi_pred import DTI -data = DTI(name='BindingDB_Kd') -split = data.get_split() -``` - **Available datasets:** - BindingDB_Kd - Dissociation constant (52,284 pairs) - BindingDB_IC50 - Half-maximal inhibitory concentration (991,486 pairs) @@ -168,23 +118,12 @@ split = data.get_split() Predict interactions between drug pairs. -```python -from tdc.multi_pred import DDI -data = DDI(name='DrugBank') -split = data.get_split() -``` - Multi-class classification task predicting interaction types. Dataset contains 191,808 DDI pairs with 1,706 drugs. #### 3. PPI (Protein-Protein Interaction) Predict protein-protein interactions. 
-```python -from tdc.multi_pred import PPI -data = PPI(name='HuRI') -``` - #### 4. Other Multi-Prediction Tasks - **GDA**: Gene-disease associations @@ -204,31 +143,13 @@ Generation tasks involve creating novel biomedical entities with desired propert Generate diverse, novel molecules with desirable chemical properties. -```python -from tdc.generation import MolGen -data = MolGen(name='ChEMBL_V29') -split = data.get_split() -``` Use with oracles to optimize for specific properties: -```python -from tdc import Oracle -oracle = Oracle(name='GSK3B') -score = oracle('CC(C)Cc1ccc(cc1)C(C)C(O)=O') # Evaluate SMILES -``` - -See `references/oracles.md` for all available oracle functions. - ### 2. Retrosynthesis (RetroSyn) Predict reactants needed to synthesize a target molecule. -```python -from tdc.generation import RetroSyn -data = RetroSyn(name='USPTO') -split = data.get_split() -``` Dataset contains 1,939,253 reactions from USPTO database. @@ -236,11 +157,6 @@ Dataset contains 1,939,253 reactions from USPTO database. Generate molecule pairs (e.g., prodrug-drug pairs). -```python -from tdc.generation import PairMolGen -data = PairMolGen(name='Prodrug') -``` - For detailed oracle documentation and molecular generation workflows, refer to `references/oracles.md` and `scripts/molecular_generation.py`. ## Benchmark Groups @@ -249,23 +165,6 @@ Benchmark groups provide curated collections of related datasets for systematic ### ADMET Benchmark Group -```python -from tdc.benchmark_group import admet_group -group = admet_group(path='data/') - -# Get benchmark datasets -benchmark = group.get('Caco2_Wang') -predictions = {} - -for seed in [1, 2, 3, 4, 5]: - train, valid = benchmark['train'], benchmark['valid'] - # Train model here - predictions[seed] = model.predict(benchmark['test']) - -# Evaluate with required 5 seeds -results = group.evaluate(predictions) -``` - **ADMET Group includes 22 datasets** covering absorption, distribution, metabolism, excretion, and toxicity. 
### Other Benchmark Groups @@ -276,8 +175,6 @@ Available benchmark groups include collections for: - Drug combination prediction - And more specialized therapeutic tasks -For benchmark evaluation workflows, see `scripts/benchmark_evaluation.py`. - ## Data Functions TDC provides comprehensive data processing utilities organized into four categories. @@ -286,18 +183,6 @@ TDC provides comprehensive data processing utilities organized into four categor Retrieve train/validation/test partitions with various strategies: -```python -# Scaffold split (default for most tasks) -split = data.get_split(method='scaffold', seed=1, frac=[0.7, 0.1, 0.2]) - -# Random split -split = data.get_split(method='random', seed=42, frac=[0.8, 0.1, 0.1]) - -# Cold split (for DTI/DDI tasks) -split = data.get_split(method='cold_drug', seed=1) # Unseen drugs in test -split = data.get_split(method='cold_target', seed=1) # Unseen targets in test -``` - **Available split strategies:** - `random`: Random shuffling - `scaffold`: Scaffold-based (for chemical diversity) @@ -308,32 +193,12 @@ split = data.get_split(method='cold_target', seed=1) # Unseen targets in test Use standardized metrics for evaluation: -```python -from tdc import Evaluator - -# For binary classification -evaluator = Evaluator(name='ROC-AUC') -score = evaluator(y_true, y_pred) - -# For regression -evaluator = Evaluator(name='RMSE') -score = evaluator(y_true, y_pred) -``` - **Available metrics:** ROC-AUC, PR-AUC, F1, Accuracy, RMSE, MAE, R2, Spearman, Pearson, and more. ### 3. Data Processing TDC provides 11 key processing utilities: -```python -from tdc.chem_utils import MolConvert - -# Molecule format conversion -converter = MolConvert(src='SMILES', dst='PyG') -pyg_graph = converter('CC(C)Cc1ccc(cc1)C(C)C(O)=O') -``` - **Processing utilities include:** - Molecule format conversion (SMILES, SELFIES, PyG, DGL, ECFP, etc.) 
- Molecule filters (PAINS, drug-likeness) @@ -343,114 +208,25 @@ pyg_graph = converter('CC(C)Cc1ccc(cc1)C(C)C(O)=O') - Graph transformation - Entity retrieval (CID to SMILES, UniProt to sequence) -For comprehensive utilities documentation, see `references/utilities.md`. ### 4. Molecule Generation Oracles TDC provides 17+ oracle functions for molecular optimization: -```python -from tdc import Oracle - -# Single oracle -oracle = Oracle(name='DRD2') -score = oracle('CC(C)Cc1ccc(cc1)C(C)C(O)=O') - -# Multiple oracles -oracle = Oracle(name='JNK3') -scores = oracle(['SMILES1', 'SMILES2', 'SMILES3']) -``` - -For complete oracle documentation, see `references/oracles.md`. ## Advanced Features ### Retrieve Available Datasets -```python -from tdc.utils import retrieve_dataset_names - -# Get all ADME datasets -adme_datasets = retrieve_dataset_names('ADME') - -# Get all DTI datasets -dti_datasets = retrieve_dataset_names('DTI') -``` ### Label Transformations -```python -# Get label mapping -label_map = data.get_label_map(name='DrugBank') - -# Convert labels -from tdc.chem_utils import label_transform -transformed = label_transform(y, from_unit='nM', to_unit='p') -``` - ### Database Queries -```python -from tdc.utils import cid2smiles, uniprot2seq - -# Convert PubChem CID to SMILES -smiles = cid2smiles(2244) - -# Convert UniProt ID to amino acid sequence -sequence = uniprot2seq('P12345') -``` - ## Common Workflows ### Workflow 1: Train a Single Prediction Model -See `scripts/load_and_split_data.py` for a complete example: - -```python -from tdc.single_pred import ADME -from tdc import Evaluator - -# Load data -data = ADME(name='Caco2_Wang') -split = data.get_split(method='scaffold', seed=42) - -train, valid, test = split['train'], split['valid'], split['test'] - -# Train model (user implements) -# model.fit(train['Drug'], train['Y']) - -# Evaluate -evaluator = Evaluator(name='MAE') -# score = evaluator(test['Y'], predictions) -``` - ### Workflow 2: Benchmark Evaluation -See 
`scripts/benchmark_evaluation.py` for a complete example with multiple seeds and proper evaluation protocol. - ### Workflow 3: Molecular Generation with Oracles - -See `scripts/molecular_generation.py` for an example of goal-directed generation using oracle functions. - -## Resources - -This skill includes bundled resources for common TDC workflows: - -### scripts/ - -- `load_and_split_data.py`: Template for loading and splitting TDC datasets with various strategies -- `benchmark_evaluation.py`: Template for running benchmark group evaluations with proper 5-seed protocol -- `molecular_generation.py`: Template for molecular generation using oracle functions - -### references/ - -- `datasets.md`: Comprehensive catalog of all available datasets organized by task type -- `oracles.md`: Complete documentation of all 17+ molecule generation oracles -- `utilities.md`: Detailed guide to data processing, splitting, and evaluation utilities - -## Additional Resources - -- **Official Website**: https://tdcommons.ai -- **Documentation**: https://tdc.readthedocs.io -- **GitHub**: https://github.com/mims-harvard/TDC -- **Paper**: NeurIPS 2021 - "Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development" diff --git a/skills/Research/pythia-event-generator/SKILL.md b/skills/Research/pythia-event-generator/SKILL.md index ab441f64..900645b8 100644 --- a/skills/Research/pythia-event-generator/SKILL.md +++ b/skills/Research/pythia-event-generator/SKILL.md @@ -1,20 +1,31 @@ --- id: pythia-event-generator name: Pythia Event Generator -description: Step-by-step guidance for pythia event. +description: Guidance for high-energy physics event simulation and particle collision workflows using Pythia. category: Research +requires: [] +examples: + - Set up a particle collision simulation in Pythia. + - What are the best practices for generating event catalogs? 
--- # Pythia Event Generator -Support pythia event workflows with clear steps and best practices. +## Instruction +- Define the collision physics environment, specifying beam particles, energy (COM energy), and target process (e.g., HardQCD, WeakBosonExchange). +- Configure specific physical parameters such as parton distribution functions (PDFs) and hadronization models. +- Execute the event generation loop to produce large-scale simulated particle collision catalogs. +- Analyze the event record to extract final-state particles, including their momentum, energy, and flavor identifiers. +- Implement jet clustering or particle decay chain analysis to study specific physics signatures. +- Monitor event generation logs for convergence issues or physical instabilities in the simulation. +- Export the generated event data in HepMC or ROOT formats for downstream detector simulation and analysis. ## When to Use - -- You need help with pythia event generator. -- You want a clear, actionable next step. +- When simulating high-energy particle physics experiments (e.g., LHC-style collisions) for theoretical validation. +- When generating background or signal samples for new physics searches in collider experiments. +- When studying hadronization and particle decay patterns in vacuum or nuclear media. ## Output - -- Summary of goals and plan -- Key tips and precautions +- High-fidelity event catalogs containing simulated particle collision data. +- Physics summary reports including cross-section estimates and multiplicity distributions. +- Visualizations of event topologies and particle decay chains. \ No newline at end of file diff --git a/skills/Research/qlm-calibration/SKILL.md b/skills/Research/qlm-calibration/SKILL.md index b15b58fc..9d9d3bf4 100644 --- a/skills/Research/qlm-calibration/SKILL.md +++ b/skills/Research/qlm-calibration/SKILL.md @@ -1,20 +1,32 @@ --- id: qlm-calibration name: QLM Calibration -description: Step-by-step guidance for QLM calibration. 
+description: Step-by-step guidance for quantum learning machine (QLM) calibration workflows and best practices. category: Research +requires: [] +examples: + - Guide me through the QLM calibration process. + - What are the recommended steps for qubit gate calibration? --- # QLM Calibration Support QLM calibration workflows with clear steps and best practices. -## When to Use +## Instruction +- Initialize the calibration routine by identifying the target QPU and the specific quantum gates to be tuned. +- Perform Rabi oscillation measurements to determine the precise pulse amplitude and duration for single-qubit rotations. +- Conduct Ramsey and T1/T2 measurements to characterize qubit decoherence times and dephasing rates. +- Execute Randomized Benchmarking (RB) to quantify the average fidelity of the gate set and identify error sources. +- Implement crosstalk characterization to measure and mitigate the impact of neighboring qubit operations. +- Document the optimized pulse parameters and update the quantum compiler's hardware model with the new calibration data. -- You need help with qlm calibration. -- You want a clear, actionable next step. +## When to Use +- When performing routine maintenance and tuning of superconducting or trapped-ion quantum processors. +- When needing to optimize gate fidelities for complex quantum algorithm execution. +- When benchmarking the performance and noise characteristics of quantum learning machine hardware. ## Output - -- Summary of goals and plan -- Key tips and precautions +- Updated calibration parameters for quantum gates and pulse sequences. +- Qubit characterization reports including T1, T2, and gate fidelity scores. +- Actionable advice for mitigating hardware-specific noise patterns. 
\ No newline at end of file diff --git a/skills/Research/qms-audit-expert/SKILL.md b/skills/Research/qms-audit-expert/SKILL.md index 7cd0c5e0..716b93e3 100644 --- a/skills/Research/qms-audit-expert/SKILL.md +++ b/skills/Research/qms-audit-expert/SKILL.md @@ -1,14 +1,23 @@ --- -category: Research id: qms-audit-expert name: QMS Audit Expert -description: Senior QMS Audit Expert for internal and external quality management system auditing. Provides ISO 13485 audit expertise, audit program management, nonconformity identification, and corrective action verification. Use for internal audit planning, external audit preparation, audit execution, and audit follow-up activities. +description: Senior QMS audit guidance for ISO 13485 compliance, internal audit planning, and nonconformity management. +category: Research +requires: [] +examples: + - Develop an internal audit plan for ISO 13485 compliance. + - Identify and document a nonconformity found during a quality audit. --- # Senior QMS Audit Expert Expert-level quality management system auditing with comprehensive knowledge of ISO 13485, audit methodologies, nonconformity management, and audit program optimization for medical device organizations. +## When to Use +- When planning or executing internal QMS audits in a medical device manufacturing environment. +- When documenting audit findings or managing nonconformity resolution processes. +- When preparing for externa + ## Core QMS Auditing Competencies ### 1. ISO 13485 Audit Program Management @@ -218,23 +227,3 @@ Maintain awareness of industry audit best practices and regulatory expectations. 
- **Professional Development**: Auditor certification and continuing education - **Peer Learning**: Industry audit community participation and knowledge sharing -## Resources - -### scripts/ -- `audit-schedule-optimizer.py`: Risk-based audit planning and schedule optimization -- `audit-prep-checklist.py`: Comprehensive audit preparation automation -- `nonconformity-tracker.py`: Audit finding and CAPA integration management -- `audit-performance-analyzer.py`: Audit program effectiveness monitoring - -### references/ -- `iso13485-audit-guide.md`: Complete ISO 13485 audit methodology and checklists -- `process-audit-procedures.md`: Process-based audit execution frameworks -- `regulatory-inspection-guide.md`: Regulatory audit preparation and response -- `certification-audit-guide.md`: Certification body audit coordination -- `auditor-competency-framework.md`: Auditor development and assessment criteria - -### assets/ -- `audit-templates/`: Audit plan, checklist, and report templates -- `audit-checklists/`: ISO 13485 clause-specific audit checklists -- `training-materials/`: Auditor training and competency development programs -- `nonconformity-forms/`: Standardized nonconformity documentation templates diff --git a/skills/Research/qutip/SKILL.md b/skills/Research/qutip/SKILL.md index a25a2a21..5948370a 100644 --- a/skills/Research/qutip/SKILL.md +++ b/skills/Research/qutip/SKILL.md @@ -1,8 +1,12 @@ --- -category: Research id: qutip name: Qutip -description: Quantum mechanics simulations and analysis using QuTiP (Quantum Toolbox in Python). +description: Quantum mechanical simulations and analysis using the Quantum Toolbox in Python (QuTiP) for closed and open systems. +category: Research +requires: [] +examples: + - Simulate the time evolution of a two-level quantum system. + - Plot the Wigner function for a Fock state in QuTiP. 
--- # QuTiP: Quantum Toolbox in Python @@ -11,45 +15,24 @@ description: Quantum mechanics simulations and analysis using QuTiP (Quantum Too QuTiP provides comprehensive tools for simulating and analyzing quantum mechanical systems. It handles both closed (unitary) and open (dissipative) quantum systems with multiple solvers optimized for different scenarios. -## Installation - -```bash -uv pip install qutip -``` - -Optional packages for additional functionality: - -```bash -# Quantum information processing (circuits, gates) -uv pip install qutip-qip - -# Quantum trajectory viewer -uv pip install qutip-qtrl -``` - -## Quick Start - -```python -from qutip import * -import numpy as np -import matplotlib.pyplot as plt - -# Create quantum state -psi = basis(2, 0) # |0⟩ state - -# Create operator -H = sigmaz() # Hamiltonian - -# Time evolution -tlist = np.linspace(0, 10, 100) -result = sesolve(H, psi, tlist, e_ops=[sigmaz()]) - -# Plot results -plt.plot(tlist, result.expect[0]) -plt.xlabel('Time') -plt.ylabel('⟨σz⟩') -plt.show() -``` +## Instruction +- Define the quantum state using basis vectors or density matrices and construct the Hamiltonian operator. +- Set up the time evolution environment by choosing the appropriate solver: `sesolve` for closed systems or `mesolve` for open systems with dissipation. +- Implement collapse operators to model environmental noise and decoherence in open quantum systems. +- Execute the simulation over a defined time grid and calculate expectation values for specific observables. +- Utilize visualization tools to plot the Wigner function, Q-function, or Bloch sphere trajectories of the quantum system. +- Perform advanced analysis including steady-state calculations, entanglement measures, and fidelity assessments. +- Optimize simulation performance by using string-based time dependence or parallelized solvers for large Hilbert spaces. 
+ +## When to Use +- When simulating the dynamics of atoms, photons, and superconducting qubits in quantum optics and information tasks. +- When analyzing the impact of noise and dissipation on quantum algorithms or experimental setups. +- When visualizing phase-space distributions or state trajectories in quantum mechanical research. + +## Output +- Time-series data of quantum state evolution and expectation values. +- Phase-space visualizations (Wigner functions) and state-fidelity reports. +- Steady-state analysis results and entanglement characterization summaries. ## Core Capabilities @@ -57,39 +40,10 @@ plt.show() Create and manipulate quantum states and operators: -```python -# States -psi = basis(N, n) # Fock state |n⟩ -psi = coherent(N, alpha) # Coherent state |α⟩ -rho = thermal_dm(N, n_avg) # Thermal density matrix - -# Operators -a = destroy(N) # Annihilation operator -H = num(N) # Number operator -sx, sy, sz = sigmax(), sigmay(), sigmaz() # Pauli matrices - -# Composite systems -psi_AB = tensor(psi_A, psi_B) # Tensor product -``` - -**See** `references/core_concepts.md` for comprehensive coverage of quantum objects, states, operators, and tensor products. - ### 2. 
Time Evolution and Dynamics Multiple solvers for different scenarios: -```python -# Closed systems (unitary evolution) -result = sesolve(H, psi0, tlist, e_ops=[num(N)]) - -# Open systems (dissipation) -c_ops = [np.sqrt(0.1) * destroy(N)] # Collapse operators -result = mesolve(H, psi0, tlist, c_ops, e_ops=[num(N)]) - -# Quantum trajectories (Monte Carlo) -result = mcsolve(H, psi0, tlist, c_ops, ntraj=500, e_ops=[num(N)]) -``` - **Solver selection guide:** - `sesolve`: Pure states, unitary evolution - `mesolve`: Mixed states, dissipation, general open systems @@ -97,184 +51,28 @@ result = mcsolve(H, psi0, tlist, c_ops, ntraj=500, e_ops=[num(N)]) - `brmesolve`: Weak system-bath coupling - `fmmesolve`: Time-periodic Hamiltonians (Floquet) -**See** `references/time_evolution.md` for detailed solver documentation, time-dependent Hamiltonians, and advanced options. - ### 3. Analysis and Measurement Compute physical quantities: -```python -# Expectation values -n_avg = expect(num(N), psi) - -# Entropy measures -S = entropy_vn(rho) # Von Neumann entropy -C = concurrence(rho) # Entanglement (two qubits) - -# Fidelity and distance -F = fidelity(psi1, psi2) -D = tracedist(rho1, rho2) - -# Correlation functions -corr = correlation_2op_1t(H, rho0, taulist, c_ops, A, B) -w, S = spectrum_correlation_fft(taulist, corr) - -# Steady states -rho_ss = steadystate(H, c_ops) -``` - -**See** `references/analysis.md` for entropy, fidelity, measurements, correlation functions, and steady state calculations. - ### 4. 
Visualization Visualize quantum states and dynamics: -```python -# Bloch sphere -b = Bloch() -b.add_states(psi) -b.show() - -# Wigner function (phase space) -xvec = np.linspace(-5, 5, 200) -W = wigner(psi, xvec, xvec) -plt.contourf(xvec, xvec, W, 100, cmap='RdBu') - -# Fock distribution -plot_fock_distribution(psi) - -# Matrix visualization -hinton(rho) # Hinton diagram -matrix_histogram(H.full()) # 3D bars -``` - -**See** `references/visualization.md` for Bloch sphere animations, Wigner functions, Q-functions, and matrix visualizations. - ### 5. Advanced Methods Specialized techniques for complex scenarios: -```python -# Floquet theory (periodic Hamiltonians) -T = 2 * np.pi / w_drive -f_modes, f_energies = floquet_modes(H, T, args) -result = fmmesolve(H, psi0, tlist, c_ops, T=T, args=args) - -# HEOM (non-Markovian, strong coupling) -from qutip.nonmarkov.heom import HEOMSolver, BosonicBath -bath = BosonicBath(Q, ck_real, vk_real) -hsolver = HEOMSolver(H_sys, [bath], max_depth=5) -result = hsolver.run(rho0, tlist) - -# Permutational invariance (identical particles) -psi = dicke(N, j, m) # Dicke states -Jz = jspin(N, 'z') # Collective operators -``` - -**See** `references/advanced.md` for Floquet theory, HEOM, permutational invariance, stochastic solvers, superoperators, and performance optimization. 
- ## Common Workflows ### Simulating a Damped Harmonic Oscillator -```python -# System parameters -N = 20 # Hilbert space dimension -omega = 1.0 # Oscillator frequency -kappa = 0.1 # Decay rate - -# Hamiltonian and collapse operators -H = omega * num(N) -c_ops = [np.sqrt(kappa) * destroy(N)] - -# Initial state -psi0 = coherent(N, 3.0) - -# Time evolution -tlist = np.linspace(0, 50, 200) -result = mesolve(H, psi0, tlist, c_ops, e_ops=[num(N)]) - -# Visualize -plt.plot(tlist, result.expect[0]) -plt.xlabel('Time') -plt.ylabel('⟨n⟩') -plt.title('Photon Number Decay') -plt.show() -``` ### Two-Qubit Entanglement Dynamics -```python -# Create Bell state -psi0 = bell_state('00') - -# Local dephasing on each qubit -gamma = 0.1 -c_ops = [ - np.sqrt(gamma) * tensor(sigmaz(), qeye(2)), - np.sqrt(gamma) * tensor(qeye(2), sigmaz()) -] - -# Track entanglement -def compute_concurrence(t, psi): - rho = ket2dm(psi) if psi.isket else psi - return concurrence(rho) - -tlist = np.linspace(0, 10, 100) -result = mesolve(qeye([2, 2]), psi0, tlist, c_ops) - -# Compute concurrence for each state -C_t = [concurrence(state.proj()) for state in result.states] - -plt.plot(tlist, C_t) -plt.xlabel('Time') -plt.ylabel('Concurrence') -plt.title('Entanglement Decay') -plt.show() -``` ### Jaynes-Cummings Model -```python -# System parameters -N = 10 # Cavity Fock space -wc = 1.0 # Cavity frequency -wa = 1.0 # Atom frequency -g = 0.05 # Coupling strength - -# Operators -a = tensor(destroy(N), qeye(2)) # Cavity -sm = tensor(qeye(N), sigmam()) # Atom - -# Hamiltonian (RWA) -H = wc * a.dag() * a + wa * sm.dag() * sm + g * (a.dag() * sm + a * sm.dag()) - -# Initial state: cavity in coherent state, atom in ground state -psi0 = tensor(coherent(N, 2), basis(2, 0)) - -# Dissipation -kappa = 0.1 # Cavity decay -gamma = 0.05 # Atomic decay -c_ops = [np.sqrt(kappa) * a, np.sqrt(gamma) * sm] - -# Observables -n_cav = a.dag() * a -n_atom = sm.dag() * sm - -# Evolve -tlist = np.linspace(0, 50, 200) -result = 
mesolve(H, psi0, tlist, c_ops, e_ops=[n_cav, n_atom]) - -# Plot -fig, axes = plt.subplots(2, 1, figsize=(8, 6), sharex=True) -axes[0].plot(tlist, result.expect[0]) -axes[0].set_ylabel('⟨n_cavity⟩') -axes[1].plot(tlist, result.expect[1]) -axes[1].set_ylabel('⟨n_atom⟩') -axes[1].set_xlabel('Time') -plt.tight_layout() -plt.show() -``` ## Tips for Efficient Simulations @@ -296,19 +94,3 @@ plt.show() **Import errors**: Ensure QuTiP is installed correctly; quantum gates require `qutip-qip` package -## References - -This skill includes detailed reference documentation: - -- **`references/core_concepts.md`**: Quantum objects, states, operators, tensor products, composite systems -- **`references/time_evolution.md`**: All solvers (sesolve, mesolve, mcsolve, brmesolve, etc.), time-dependent Hamiltonians, solver options -- **`references/visualization.md`**: Bloch sphere, Wigner functions, Q-functions, Fock distributions, matrix plots -- **`references/analysis.md`**: Expectation values, entropy, fidelity, entanglement measures, correlation functions, steady states -- **`references/advanced.md`**: Floquet theory, HEOM, permutational invariance, stochastic methods, superoperators, performance tips - -## External Resources - -- Documentation: https://qutip.readthedocs.io/ -- Tutorials: https://qutip.org/qutip-tutorials/ -- API Reference: https://qutip.readthedocs.io/en/stable/apidoc/apidoc.html -- GitHub: https://github.com/qutip/qutip diff --git a/skills/Research/radio-copilot/SKILL.md b/skills/Research/radio-copilot/SKILL.md index ac0dafae..78d775ee 100644 --- a/skills/Research/radio-copilot/SKILL.md +++ b/skills/Research/radio-copilot/SKILL.md @@ -1,20 +1,32 @@ --- id: radio-copilot name: Radio Copilot -description: Step-by-step guidance for radio copilot. +description: Step-by-step guidance for radio copilot workflows and best practices. category: Research +requires: [] +examples: + - Help me set up a Radio Copilot workflow for my project. 
+ - What are the best practices for using Radio Copilot in research? --- # Radio Copilot Support radio copilot workflows with clear steps and best practices. -## When to Use +## Instruction +- Analyze the user's specific radio communication requirements, including target frequency bands and modulation schemes. +- Perform path loss modeling and link budget calculations to estimate signal coverage and reliability. +- Consult regulatory databases to ensure the proposed radio configurations comply with local frequency allocation standards. +- Identify potential interference sources within the operating environment and suggest mitigation strategies. +- Guide the user through the configuration of Software Defined Radio (SDR) hardware or simulation software. +- Establish a step-by-step testing protocol to validate the performance of the radio system in real-world scenarios. -- You need help with radio copilot. -- You want a clear, actionable next step. +## When to Use +- When planning or optimizing wireless communication links and network topologies. +- When performing research on RF interference, spectrum management, or regulatory compliance. +- When setting up or troubleshooting Software Defined Radio (SDR) and experimental telemetry systems. ## Output - -- Summary of goals and plan -- Key tips and precautions +- A summarized RF planning report including link budget details and coverage estimates. +- Actionable steps for hardware configuration and regulatory submission checklists. +- Technical precautions regarding signal interference and antenna placement. 
\ No newline at end of file diff --git a/skills/Research/rag-implementation/SKILL.md b/skills/Research/rag-implementation/SKILL.md index eb1fbbbb..c532586d 100644 --- a/skills/Research/rag-implementation/SKILL.md +++ b/skills/Research/rag-implementation/SKILL.md @@ -1,20 +1,31 @@ --- -category: Research id: rag-implementation name: RAG Implementation -description: Combine vector and keyword search for improved retrieval. +description: Combine vector and keyword search for improved information retrieval and search relevance. +category: Research +requires: [] +examples: + - Help me implement a RAG system using hybrid search. + - How can I improve retrieval accuracy in my RAG implementation? --- - # RAG Implementation Combine vector and keyword search for improved retrieval. -## When to Use +## Instruction +- Define the document processing strategy, including optimal chunk sizes and overlap parameters for the target knowledge base. +- Select appropriate embedding models and vector database architectures based on retrieval speed and semantic accuracy needs. +- Implement a hybrid search pipeline that combines dense vector retrieval with sparse keyword matching (e.g., BM25). +- Apply reranking algorithms to the initial search results to improve the relevance of the context provided to the LLM. +- Configure the prompt template to ensure the LLM grounds its answers strictly in the retrieved context with proper citations. +- Conduct retrieval evaluation using metrics like faithfulness and relevancy to iteratively refine the system performance. -- You need help with rag implementation. -- You want a clear, actionable next step. +## When to Use +- When building or optimizing AI systems that require access to private or constantly updating documentation. +- When improving the accuracy and citation reliability of search-based AI agents. +- When designing high-performance retrieval architectures for large-scale enterprise knowledge bases. 
## Output - -- Summary of goals and plan -- Key tips and precautions +- A comprehensive RAG implementation roadmap covering ingestion, indexing, and retrieval. +- Step-by-step guidance for configuring hybrid search and reranking components. +- Evaluation summaries and technical tips for reducing hallucinations in grounded responses. \ No newline at end of file diff --git a/skills/Research/rdkit/SKILL.md b/skills/Research/rdkit/SKILL.md index 7c9e03b8..2e8ae290 100644 --- a/skills/Research/rdkit/SKILL.md +++ b/skills/Research/rdkit/SKILL.md @@ -1,20 +1,32 @@ --- -category: Research id: rdkit name: RDKit -description: Pythonic wrapper around RDKit with simplified interface and sensible defaults. Preferred for standard drug discovery: SMILES parsing, standardization, descriptors, fingerprints, clustering, 3D conformers, parallel processing. Returns native rdkit.Chem.Mol objects. For advanced control or custom parameters, use rdkit directly. +description: "Pythonic wrapper for RDKit simplified for drug discovery: SMILES parsing, standardization, descriptors, 3D conformers, and parallel processing." +category: Research +requires: [] +examples: + - Standardize these SMILES strings using the RDKit library. + - Generate 3D conformers for this molecule using RDKit in Python. --- # RDKit Pythonic wrapper around RDKit with simplified interface and sensible defaults. Preferred for standard drug discovery: SMILES parsing, standardization, descriptors, fingerprints, clustering, 3D conformers, parallel processing. Returns native rdkit.Chem.Mol objects. For advanced control or custom parameters, use rdkit directly. -## When to Use +## Instruction +- Parse chemical identifiers (SMILES, InChI) into native RDKit molecule objects for programmatic manipulation. +- Apply molecular standardization protocols to handle salts, formal charges, and tautomers for consistent data representation. 
+- Utilize featurization methods to generate molecular fingerprints (e.g., Morgan, ECFP) or calculate physicochemical descriptors. +- Execute substructure searching or similarity calculations to identify specific chemical scaffolds within a library. +- Conduct 3D structure generation and energy minimization for molecules to prepare them for docking analysis. +- Implement parallel processing for large-scale chemical databases to maximize computational efficiency. -- You need help with rdkit. -- You want a clear, actionable next step. +## When to Use +- When performing molecular property prediction or standardizing chemical datasets for machine learning. +- When searching for structural analogs or calculating molecular similarity in drug discovery workflows. +- When preparing 3D molecular conformations for structure-based drug design. ## Output - -- Summary of goals and plan -- Key tips and precautions +- Cleaned and standardized chemical data formatted as SMILES or SDF files. +- Molecular property reports and feature matrices (fingerprints) for downstream analysis. +- Visualizations of chemical structures and conformer generation logs. \ No newline at end of file diff --git a/skills/Research/research-engineer/SKILL.md b/skills/Research/research-engineer/SKILL.md index d4642bad..a8d0cd67 100644 --- a/skills/Research/research-engineer/SKILL.md +++ b/skills/Research/research-engineer/SKILL.md @@ -1,13 +1,14 @@ --- -category: Research id: research-engineer name: Research Engineer -description: Provides guidance for mechanistic interpretability research using TransformerLens to inspect and manipulate transformer internals via HookPoints and activation caching. -version: 1.0.0 +description: Provides guidance for mechanistic interpretability research using TransformerLens to inspect and manipulate transformer internals via HookPoints. 
+category: Research author: Orchestra Research -license: MIT -tags: [Mechanistic Interpretability, TransformerLens, Activation Patching, Circuit Analysis] -dependencies: [transformer-lens>=2.0.0, torch>=2.0.0] +version: 1.0.0 +requires: [] +examples: + - Help me perform activation patching on a transformer model using TransformerLens. + - Explain how to cache intermediate activations for circuit analysis. --- # TransformerLens: Mechanistic Interpretability for Transformers @@ -32,317 +33,20 @@ TransformerLens is the de facto standard library for mechanistic interpretabilit - You need remote execution on massive models → Use **nnsight** with NDIF - You want higher-level causal intervention abstractions → Use **pyvene** -## Installation - -```bash -pip install transformer-lens -``` - -For development version: -```bash -pip install git+https://github.com/TransformerLensOrg/TransformerLens -``` - -## Core Concepts - -### HookedTransformer - -The main class that wraps transformer models with HookPoints on every activation: - -```python -from transformer_lens import HookedTransformer - -# Load a model -model = HookedTransformer.from_pretrained("gpt2-small") - -# For gated models (LLaMA, Mistral) -import os -os.environ["HF_TOKEN"] = "your_token" -model = HookedTransformer.from_pretrained("meta-llama/Llama-2-7b-hf") -``` - -### Supported Models (50+) - -| Family | Models | -|--------|--------| -| GPT-2 | gpt2, gpt2-medium, gpt2-large, gpt2-xl | -| LLaMA | llama-7b, llama-13b, llama-2-7b, llama-2-13b | -| EleutherAI | pythia-70m to pythia-12b, gpt-neo, gpt-j-6b | -| Mistral | mistral-7b, mixtral-8x7b | -| Others | phi, qwen, opt, gemma | - -### Activation Caching - -Run the model and cache all intermediate activations: - -```python -# Get all activations -tokens = model.to_tokens("The Eiffel Tower is in") -logits, cache = model.run_with_cache(tokens) - -# Access specific activations -residual = cache["resid_post", 5] # Layer 5 residual stream -attn_pattern = 
cache["pattern", 3] # Layer 3 attention pattern -mlp_out = cache["mlp_out", 7] # Layer 7 MLP output - -# Filter which activations to cache (saves memory) -logits, cache = model.run_with_cache( - tokens, - names_filter=lambda name: "resid_post" in name -) -``` - -### ActivationCache Keys - -| Key Pattern | Shape | Description | -|-------------|-------|-------------| -| `resid_pre, layer` | [batch, pos, d_model] | Residual before attention | -| `resid_mid, layer` | [batch, pos, d_model] | Residual after attention | -| `resid_post, layer` | [batch, pos, d_model] | Residual after MLP | -| `attn_out, layer` | [batch, pos, d_model] | Attention output | -| `mlp_out, layer` | [batch, pos, d_model] | MLP output | -| `pattern, layer` | [batch, head, q_pos, k_pos] | Attention pattern (post-softmax) | -| `q, layer` | [batch, pos, head, d_head] | Query vectors | -| `k, layer` | [batch, pos, head, d_head] | Key vectors | -| `v, layer` | [batch, pos, head, d_head] | Value vectors | - -## Workflow 1: Activation Patching (Causal Tracing) - -Identify which activations causally affect model output by patching clean activations into corrupted runs. - -### Step-by-Step - -```python -from transformer_lens import HookedTransformer, patching -import torch - -model = HookedTransformer.from_pretrained("gpt2-small") - -# 1. Define clean and corrupted prompts -clean_prompt = "The Eiffel Tower is in the city of" -corrupted_prompt = "The Colosseum is in the city of" - -clean_tokens = model.to_tokens(clean_prompt) -corrupted_tokens = model.to_tokens(corrupted_prompt) - -# 2. Get clean activations -_, clean_cache = model.run_with_cache(clean_tokens) - -# 3. Define metric (e.g., logit difference) -paris_token = model.to_single_token(" Paris") -rome_token = model.to_single_token(" Rome") - -def metric(logits): - return logits[0, -1, paris_token] - logits[0, -1, rome_token] - -# 4. 
Patch each position and layer -results = torch.zeros(model.cfg.n_layers, clean_tokens.shape[1]) - -for layer in range(model.cfg.n_layers): - for pos in range(clean_tokens.shape[1]): - def patch_hook(activation, hook): - activation[0, pos] = clean_cache[hook.name][0, pos] - return activation - - patched_logits = model.run_with_hooks( - corrupted_tokens, - fwd_hooks=[(f"blocks.{layer}.hook_resid_post", patch_hook)] - ) - results[layer, pos] = metric(patched_logits) - -# 5. Visualize results (layer x position heatmap) -``` - -### Checklist -- [ ] Define clean and corrupted inputs that differ minimally -- [ ] Choose metric that captures behavior difference -- [ ] Cache clean activations -- [ ] Systematically patch each (layer, position) combination -- [ ] Visualize results as heatmap -- [ ] Identify causal hotspots - -## Workflow 2: Circuit Analysis (Indirect Object Identification) - -Replicate the IOI circuit discovery from "Interpretability in the Wild". - -### Step-by-Step - -```python -from transformer_lens import HookedTransformer -import torch - -model = HookedTransformer.from_pretrained("gpt2-small") - -# IOI task: "When John and Mary went to the store, Mary gave a bottle to" -# Model should predict "John" (indirect object) - -prompt = "When John and Mary went to the store, Mary gave a bottle to" -tokens = model.to_tokens(prompt) - -# 1. Get baseline logits -logits, cache = model.run_with_cache(tokens) - -john_token = model.to_single_token(" John") -mary_token = model.to_single_token(" Mary") - -# 2. Compute logit difference (IO - S) -logit_diff = logits[0, -1, john_token] - logits[0, -1, mary_token] -print(f"Logit difference: {logit_diff.item():.3f}") - -# 3. 
Direct logit attribution by head -def get_head_contribution(layer, head): - # Project head output to logits - head_out = cache["z", layer][0, :, head, :] # [pos, d_head] - W_O = model.W_O[layer, head] # [d_head, d_model] - W_U = model.W_U # [d_model, vocab] - - # Head contribution to logits at final position - contribution = head_out[-1] @ W_O @ W_U - return contribution[john_token] - contribution[mary_token] - -# 4. Map all heads -head_contributions = torch.zeros(model.cfg.n_layers, model.cfg.n_heads) -for layer in range(model.cfg.n_layers): - for head in range(model.cfg.n_heads): - head_contributions[layer, head] = get_head_contribution(layer, head) - -# 5. Identify top contributing heads (name movers, backup name movers) -``` - -### Checklist -- [ ] Set up task with clear IO/S tokens -- [ ] Compute baseline logit difference -- [ ] Decompose by attention head contributions -- [ ] Identify key circuit components (name movers, S-inhibition, induction) -- [ ] Validate with ablation experiments - -## Workflow 3: Induction Head Detection - -Find induction heads that implement [A][B]...[A] → [B] pattern. 
- -```python -from transformer_lens import HookedTransformer -import torch - -model = HookedTransformer.from_pretrained("gpt2-small") - -# Create repeated sequence: [A][B][A] should predict [B] -repeated_tokens = torch.tensor([[1000, 2000, 1000]]) # Arbitrary tokens - -_, cache = model.run_with_cache(repeated_tokens) - -# Induction heads attend from final [A] back to first [B] -# Check attention from position 2 to position 1 -induction_scores = torch.zeros(model.cfg.n_layers, model.cfg.n_heads) - -for layer in range(model.cfg.n_layers): - pattern = cache["pattern", layer][0] # [head, q_pos, k_pos] - # Attention from pos 2 to pos 1 - induction_scores[layer] = pattern[:, 2, 1] - -# Heads with high scores are induction heads -top_heads = torch.topk(induction_scores.flatten(), k=5) -``` - -## Common Issues & Solutions - -### Issue: Hooks persist after debugging -```python -# WRONG: Old hooks remain active -model.run_with_hooks(tokens, fwd_hooks=[...]) # Debug, add new hooks -model.run_with_hooks(tokens, fwd_hooks=[...]) # Old hooks still there! - -# RIGHT: Always reset hooks -model.reset_hooks() -model.run_with_hooks(tokens, fwd_hooks=[...]) -``` - -### Issue: Tokenization gotchas -```python -# WRONG: Assuming consistent tokenization -model.to_tokens("Tim") # Single token -model.to_tokens("Neel") # Becomes "Ne" + "el" (two tokens!) 
- -# RIGHT: Check tokenization explicitly -tokens = model.to_tokens("Neel", prepend_bos=False) -print(model.to_str_tokens(tokens)) # ['Ne', 'el'] -``` - -### Issue: LayerNorm ignored in analysis -```python -# WRONG: Ignoring LayerNorm -pre_activation = residual @ model.W_in[layer] - -# RIGHT: Include LayerNorm -ln_scale = model.blocks[layer].ln2.w -ln_out = model.blocks[layer].ln2(residual) -pre_activation = ln_out @ model.W_in[layer] -``` - -### Issue: Memory explosion with large models -```python -# Use selective caching -logits, cache = model.run_with_cache( - tokens, - names_filter=lambda n: "resid_post" in n or "pattern" in n, - device="cpu" # Cache on CPU -) -``` - -## Key Classes Reference - -| Class | Purpose | -|-------|---------| -| `HookedTransformer` | Main model wrapper with hooks | -| `ActivationCache` | Dictionary-like cache of activations | -| `HookedTransformerConfig` | Model configuration | -| `FactoredMatrix` | Efficient factored matrix operations | - -## Integration with SAELens - -TransformerLens integrates with SAELens for Sparse Autoencoder analysis: - -```python -from transformer_lens import HookedTransformer -from sae_lens import SAE - -model = HookedTransformer.from_pretrained("gpt2-small") -sae = SAE.from_pretrained("gpt2-small-res-jb", "blocks.8.hook_resid_pre") - -# Run with SAE -tokens = model.to_tokens("Hello world") -_, cache = model.run_with_cache(tokens) -sae_acts = sae.encode(cache["resid_pre", 8]) -``` - -## Reference Documentation - -For detailed API documentation, tutorials, and advanced usage, see the `references/` folder: - -| File | Contents | -|------|----------| -| [references/README.md](references/README.md) | Overview and quick start guide | -| [references/api.md](references/api.md) | Complete API reference for HookedTransformer, ActivationCache, HookPoints | -| [references/tutorials.md](references/tutorials.md) | Step-by-step tutorials for activation patching, circuit analysis, logit lens | - -## External Resources - 
-### Tutorials -- [Main Demo Notebook](https://transformerlensorg.github.io/TransformerLens/generated/demos/Main_Demo.html) -- [Activation Patching Demo](https://colab.research.google.com/github/TransformerLensOrg/TransformerLens/blob/main/demos/Activation_Patching_in_TL_Demo.ipynb) -- [ARENA Mech Interp Course](https://arena-foundation.github.io/ARENA/) - 200+ hours of tutorials - -### Papers -- [A Mathematical Framework for Transformer Circuits](https://transformer-circuits.pub/2021/framework/index.html) -- [In-context Learning and Induction Heads](https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html) -- [Interpretability in the Wild (IOI)](https://arxiv.org/abs/2211.00593) - -### Official Documentation -- [Official Docs](https://transformerlensorg.github.io/TransformerLens/) -- [Model Properties Table](https://transformerlensorg.github.io/TransformerLens/generated/model_properties_table.html) -- [Neel Nanda's Glossary](https://www.neelnanda.io/mechanistic-interpretability/glossary) - -## Version Notes - -- **v2.0**: Removed HookedSAE (moved to SAELens) -- **v3.0 (alpha)**: TransformerBridge for loading any nn.Module +## Instruction +- Load GPT-style transformer models using the `HookedTransformer` class to enable internal activation access. +- Utilize `HookPoints` to intercept, cache, and inspect intermediate tensor activations across all layers. +- Perform activation patching and causal tracing experiments to identify the specific components responsible for model behaviors. +- Analyze attention patterns and information flow to reverse-engineer learned algorithms like induction heads or IOI circuits. +- Apply direct logit attribution to determine which heads or layers contribute most significantly to the final token prediction. +- Guide the user through the training or analysis of Sparse Autoencoders (SAEs) for more interpretable feature discovery. 
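The caching and patching steps listed in this Instruction section can be illustrated without downloading a model. What follows is a toy sketch in plain Python, not TransformerLens code: `ToyModel`, its three scalar "layers", the hook names, and the hook signature are all hypothetical stand-ins, chosen only to mirror the idea of named hook points that can cache or overwrite an intermediate value.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical hook signature for this toy: (value, hook_name) -> new value.
Hook = Callable[[float, str], float]


class ToyModel:
    """A chain of scalar 'layers' with a named hook point after each one."""

    def __init__(self) -> None:
        self.layers: List[Callable[[float], float]] = [
            lambda x: x + 1.0,  # layer 0
            lambda x: x * 2.0,  # layer 1
            lambda x: x - 3.0,  # layer 2
        ]

    def run(
        self,
        x: float,
        hooks: Optional[Dict[str, Hook]] = None,
        cache: Optional[Dict[str, float]] = None,
    ) -> float:
        hooks = hooks or {}
        for i, layer in enumerate(self.layers):
            name = f"layer{i}.out"
            x = layer(x)
            if name in hooks:
                # A hook may replace the intermediate value (patching).
                x = hooks[name](x, name)
            if cache is not None:
                # Record the (possibly patched) value under its hook name.
                cache[name] = x
        return x


model = ToyModel()

# 1. Cache every intermediate value on the clean input.
clean_cache: Dict[str, float] = {}
clean_out = model.run(1.0, cache=clean_cache)

# 2. Run the corrupted input with no intervention.
corrupted_out = model.run(5.0)


# 3. Patch the clean value into the corrupted run, one hook point at a time.
def patch_from_clean(value: float, name: str) -> float:
    return clean_cache[name]


patched = {
    name: model.run(5.0, hooks={name: patch_from_clean})
    for name in clean_cache
}
```

Because this toy is a purely sequential chain, patching any single hook point restores the clean output. In a real transformer, a patch at one layer and position typically restores only part of the behavior, which is what makes the per-component comparison informative.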
+ +## When to Use +- When conducting mechanistic interpretability research to understand how transformers process information internally. +- When performing causal intervention experiments to debug or control model behaviors. +- When analyzing attention circuits and information flow in large language models. + +## Output +- Structured activation caches and circuit analysis reports highlighting key model components. +- Step-by-step guidance for patching experiments and causality verification. +- Visualizations and technical insights regarding internal model representations and feature logic. \ No newline at end of file diff --git a/skills/Research/research-grants/SKILL.md b/skills/Research/research-grants/SKILL.md index 5adb91c6..3ce90b85 100644 --- a/skills/Research/research-grants/SKILL.md +++ b/skills/Research/research-grants/SKILL.md @@ -1,21 +1,32 @@ --- -category: Research id: research-grants name: Research Grants -description: Write competitive research proposals for NSF, NIH, DOE, and DARPA. Agency-specific formatting, review criteria, budget preparation, broader impacts, significance statements, innovation narratives, and compliance with submission requirements. -allowed-tools: [Read, Write, Edit, Bash] +description: Draft research proposals for agencies like NSF and NIH, including formatting, budget preparation, significance statements, and compliance. +category: Research +requires: [] +examples: + - Draft a significance statement for an NIH R01 grant proposal. + - Help me format a budget justification for an NSF research grant. --- # Research Grants Write competitive research proposals for NSF, NIH, DOE, and DARPA. Agency-specific formatting, review criteria, budget preparation, broader impacts, significance statements, innovation narratives, and compliance with submission requirements. -## When to Use +## Instruction +- Identify the target funding agency (NSF, NIH, DOE, DARPA) and review their specific solicitation requirements. 
+- Craft a compelling significance and innovation narrative that highlights the research's potential impact and technical novelty. +- Develop a structured budget justification, ensuring all requested funds align with project tasks and agency-specific cost principles. +- Outline the "Broader Impacts" or "Public Health Relevance" sections to meet the non-technical review criteria of the proposal. +- Implement a compliance checklist for formatting (e.g., page limits, font sizes) and required supplementary documents. +- Synthesize feedback from pre-submission reviews to refine the proposal's technical clarity and alignment with agency goals. -- You need help with research grants. -- You want a clear, actionable next step. +## When to Use +- When drafting or editing technical research proposals for federal or private funding agencies. +- When preparing budget justifications and innovation statements for grant submissions. +- When verifying that a grant proposal meets all structural and administrative compliance standards. ## Output - -- Summary of goals and plan -- Key tips and precautions +- Draft sections for grant proposals (Significance, Innovation, Budget Justification) formatted for submission. +- Comprehensive compliance checklists and timelines for the grant application process. +- Strategic recommendations for enhancing the proposal's competitiveness based on agency review criteria. \ No newline at end of file diff --git a/skills/Research/research-lookup/SKILL.md b/skills/Research/research-lookup/SKILL.md index fe6082fb..1237ab7b 100644 --- a/skills/Research/research-lookup/SKILL.md +++ b/skills/Research/research-lookup/SKILL.md @@ -1,21 +1,32 @@ --- -category: Research id: research-lookup name: Research Lookup -description: Look up current research information using Perplexity's Sonar Pro Search or Sonar Reasoning Pro models through OpenRouter. Automatically selects the best model based on query complexity. 
Search academic papers, recent studies, technical documentation, and general research information with citations. -allowed-tools: [Read, Write, Edit, Bash] +description: Search for current academic papers, studies, and technical documentation using Perplexity models with automated model selection. +category: Research +requires: [] +examples: + - Search for recent studies on solid-state battery technology. + - Look up technical documentation for the latest Transformer architectures. --- # Research Lookup Look up current research information using Perplexity's Sonar Pro Search or Sonar Reasoning Pro models through OpenRouter. Automatically selects the best model based on query complexity. Search academic papers, recent studies, technical documentation, and general research information with citations. -## When to Use +## Instruction +- Formulate complex research queries that leverage the full search capability of Perplexity Sonar models. +- Automatically select the most appropriate model (e.g., Sonar Pro Search) based on the depth and complexity of the user's research question. +- Retrieve the latest academic papers, preprints, and technical documentation, focusing on developments within the last 24 months. +- Synthesize search results into a cohesive briefing, ensuring every technical claim is backed by a verifiable citation. +- Verify citation accuracy by cross-referencing information across multiple sources retrieved during the search session. +- Provide direct links to primary sources to facilitate the user's own verification and deeper reading. -- You need help with research lookup. -- You want a clear, actionable next step. +## When to Use +- When searching for the absolute latest scientific papers and technical breakthroughs beyond LLM training data. +- When requiring a synthesized research briefing with explicit citations for grounded academic work. +- When performing initial literature reviews or technology landscape analysis for new research projects. 
## Output - -- Summary of goals and plan -- Key tips and precautions +- A structured research report containing current findings, trends, and future directions. +- Lists of relevant primary sources and technical documentation with direct links. +- Summary of identified gaps in current research and proposed next steps for investigation. \ No newline at end of file diff --git a/skills/Research/safety-interlocks/SKILL.md b/skills/Research/safety-interlocks/SKILL.md index b33b6c37..986c221b 100644 --- a/skills/Research/safety-interlocks/SKILL.md +++ b/skills/Research/safety-interlocks/SKILL.md @@ -1,8 +1,12 @@ --- -category: Research id: safety-interlocks name: Safety Interlocks description: Implement safety interlocks and protective mechanisms to prevent equipment damage and ensure safe control system operation. +category: Research +requires: [] +examples: + - How do I implement a safety cutoff for my control loop? + - Show me a pattern for applying safety limits to sensor data. --- # Safety Interlocks for Control Systems @@ -11,107 +15,39 @@ description: Implement safety interlocks and protective mechanisms to prevent eq Safety interlocks are protective mechanisms that prevent equipment damage and ensure safe operation. In control systems, the primary risks are output saturation and exceeding safe operating limits. -## Implementation Pattern - -Always check safety conditions BEFORE applying control outputs: - -```python -def apply_safety_limits(measurement, command, max_limit, min_limit, max_output, min_output): - """ - Apply safety checks and return safe command. - - Args: - measurement: Current sensor reading - command: Requested control output - max_limit: Maximum safe measurement value - min_limit: Minimum safe measurement value - max_output: Maximum output command - min_output: Minimum output command +## Instruction +- Perform a critical pre-control check to verify that sensor readings are within physical bounds and not NaN or infinite. 
+- Identify and define the absolute safety limits (Maximum and Minimum) for both measurements and control outputs. +- Implement a "Check BEFORE Output" logic where safety conditions are evaluated immediately before any control command is sent. +- Trigger an emergency cutoff (setting output to a safe minimum) whenever a critical threshold is breached. +- Maintain a structured safety log recording the timestamp, measurement, and the specific event that triggered an interlock. +- Utilize output clamping to ensure that manual or automated commands never exceed the hardware's safe operating range. - Returns: - tuple: (safe_command, safety_triggered) - """ - safety_triggered = False +## When to Use +- When designing or implementing control logic for laboratory hardware, heating systems, or mechanical actuators. +- When establishing protective mechanisms to prevent system saturation or catastrophic equipment damage. +- When performing risk assessments and configuring automated safety monitoring for industrial processes. - # Check for over-limit - HIGHEST PRIORITY - if measurement >= max_limit: - command = min_output # Emergency cutoff - safety_triggered = True +## Output +- Functional safety logic patterns and code snippets for interlock implementation. +- Configuration summaries for safety thresholds and emergency response behaviors. +- A checklist of precautions for pre-control sensor validation and safety event logging. 
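The Instruction bullets for this skill (pre-control validation, check before output, emergency cutoff, clamping, and logging) can be combined in one function. The sketch below is a minimal illustration under stated assumptions: it reuses the names `apply_safety_limits`, `max_limit`, `min_output`, and `max_output` from the removed example, while the `safety_log` list argument, the `invalid_measurement` event label, and the numeric thresholds in the usage lines are assumptions made for illustration.

```python
import math
import time
from typing import Dict, List, Tuple


def apply_safety_limits(
    measurement: float,
    command: float,
    max_limit: float,
    min_output: float,
    max_output: float,
    safety_log: List[Dict[str, object]],
) -> Tuple[float, bool]:
    """Return (safe_command, safety_triggered), checking safety before output."""
    # Pre-control check: a NaN or infinite reading cannot be trusted, so fail safe.
    if not math.isfinite(measurement):
        safety_log.append({
            "time": time.time(),
            "measurement": measurement,
            "command_before": command,
            "command_after": min_output,
            "event_type": "invalid_measurement",  # assumed label
        })
        return min_output, True

    # Over-limit check has priority over the requested command.
    if measurement >= max_limit:
        safety_log.append({
            "time": time.time(),
            "measurement": measurement,
            "command_before": command,
            "command_after": min_output,
            "event_type": "limit_exceeded",
        })
        return min_output, True

    # Normal path: clamp the command to the valid output range.
    return max(min_output, min(max_output, command)), False


# Usage with illustrative thresholds (assumed values, not from any real system).
log: List[Dict[str, object]] = []
normal = apply_safety_limits(50.0, 120.0, 80.0, 0.0, 100.0, log)
tripped = apply_safety_limits(85.0, 60.0, 80.0, 0.0, 100.0, log)
invalid = apply_safety_limits(float("nan"), 60.0, 80.0, 0.0, 100.0, log)
```

Returning early on an invalid or over-limit reading keeps the cutoff path independent of the controller's requested command, so a bad command can never override the interlock.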
- # Clamp output to valid range - command = max(min_output, min(max_output, command)) +## Implementation Pattern - return command, safety_triggered -``` +Always check safety conditions BEFORE applying control outputs: ## Integration with Control Loop -```python -class SafeController: - def __init__(self, controller, max_limit, min_output=0.0, max_output=100.0): - self.controller = controller - self.max_limit = max_limit - self.min_output = min_output - self.max_output = max_output - self.safety_events = [] - - def compute(self, measurement, dt): - """Compute safe control output.""" - # Check safety FIRST - if measurement >= self.max_limit: - self.safety_events.append({ - "measurement": measurement, - "action": "emergency_cutoff" - }) - return self.min_output - - # Normal control - output = self.controller.compute(measurement, dt) - - # Clamp to valid range - return max(self.min_output, min(self.max_output, output)) -``` ## Safety During Open-Loop Testing During calibration/excitation, safety is especially important because there's no feedback control: -```python -def run_test_with_safety(system, input_value, duration, dt, max_limit): - """Run open-loop test while monitoring safety limits.""" - data = [] - current_input = input_value - - for step in range(int(duration / dt)): - result = system.step(current_input) - data.append(result) - - # Safety check - if result["output"] >= max_limit: - current_input = 0.0 # Cut input - - return data -``` - ## Logging Safety Events Always log safety events for analysis: -```python -safety_log = { - "limit": max_limit, - "events": [] -} - -if measurement >= max_limit: - safety_log["events"].append({ - "time": current_time, - "measurement": measurement, - "command_before": command, - "command_after": 0.0, - "event_type": "limit_exceeded" - }) -``` ## Pre-Control Checklist @@ -129,13 +65,6 @@ Before starting any control operation: - Maximum limit threshold set - Output clamping enabled -```python -def 
pre_control_checks(measurement, config): - """Run pre-control safety verification.""" - assert not np.isnan(measurement), "Measurement is NaN" - assert config.get("max_limit") is not None, "Safety limit not configured" - return True -``` ## Best Practices diff --git a/skills/Research/safety-system-skill/SKILL.md b/skills/Research/safety-system-skill/SKILL.md index 85d166a3..d89541ce 100644 --- a/skills/Research/safety-system-skill/SKILL.md +++ b/skills/Research/safety-system-skill/SKILL.md @@ -1,14 +1,12 @@ --- -category: Research id: safety-system-skill name: Safety System Skill -description: System safety and control-plane skill that prevents agent deadlocks and freezes. Provides non-LLM control commands to inspect task state, flush message queues, cancel long-running work, and recover safely without restarting the container. - System safety and control-plane skill that prevents agent deadlocks and freezes. - Provides non-LLM control commands to inspect task state, flush message queues, - cancel long-running work, and recover safely without restarting the container. - Use when implementing or operating long-running tasks, sub-agents, benchmarks, - background monitors (e.g., Moltbook, PNR checks), or when the system becomes - unresponsive and needs immediate recovery controls. +description: System safety skill that prevents agent deadlocks via non-LLM control commands to inspect task state and flush message queues. +category: Research +requires: [] +examples: + - Help me recover the system from a deadlock using control commands. + - How do I inspect the current task registry state and system health? --- # error-guard @@ -89,3 +87,21 @@ Steps: - No LLM reasoning paths This skill is the **last line of defense**. Keep it small, fast, and reliable. + +## Instruction +- Utilize non-LLM control commands to inspect the agent's internal task registry and system health without blocking execution. 
+- Run `/status` periodically to identify stalled or overdue tasks and report the current heartbeat of the control plane. +- Execute the `/flush` command as an emergency stop to immediately cancel all active tasks and clear pending message queues. +- Perform the `/recover` sequence to reset the agent's state and reload necessary skills after a system freeze or deadlock. +- Strictly adhere to constant-time constraints for safety commands, ensuring they never call external APIs or LLM models. +- Monitor minimal task metadata (ID, timestamps) to maintain a fail-safe environment for long-lived or high-risk workloads. + +## When to Use +- When running high-risk, long-duration tasks that are prone to agent freezes or infinite loops. +- When requiring an emergency "kill switch" to stop active code execution or message processing. +- When performing automated system health monitoring for complex multi-agent orchestration. + +## Output +- Real-time system health reports including task IDs and start times. +- Detailed recovery logs summarizing the outcome of flush and reset operations. +- Actionable status flags for stalled processes and suggested recovery paths. \ No newline at end of file diff --git a/skills/Research/scanpy/SKILL.md b/skills/Research/scanpy/SKILL.md index cf259543..a5db2782 100644 --- a/skills/Research/scanpy/SKILL.md +++ b/skills/Research/scanpy/SKILL.md @@ -1,8 +1,12 @@ --- -category: Research id: scanpy name: Scanpy -description: Query CZ CELLxGENE Census (61M+ cells). Filter by cell type/tissue/disease, retrieve expression data, integrate with scanpy/PyTorch, for population-scale single-cell analysis. +description: Query CZ CELLxGENE Census (61M+ cells) for single-cell genomics data filtered by cell type, tissue, or disease. +category: Research +requires: [] +examples: + - Retrieve human macrophage expression data from the CELLxGENE Census. + - Filter for cells with a specific disease state in the CZ CELLxGENE database. 
--- # CZ CELLxGENE Census @@ -29,17 +33,25 @@ This skill should be used when: - Computing statistics across millions of cells - Accessing pre-calculated embeddings or model predictions +## Instruction +- Initialize the CELLxGENE Census connection by selecting the appropriate SOMA environment for large-scale data access. +- Formulate metadata filters using `obs_value_filter` to target specific cell types, tissues, or disease states (e.g., "is_primary_data == True"). +- Retrieve high-dimensional gene expression matrices and convert them into AnnData objects for integration with scanpy analysis. +- Implement out-of-core processing or batch loading for queries returning millions of cells to prevent memory errors. +- Conduct cross-tissue or cross-disease analysis by aggregating standardized data from multiple studies within the Census. +- Utilize pre-calculated embeddings and statistics from the Census to accelerate cell clustering and visualization tasks. + +## Output +- Filtered AnnData objects ready for downstream scanpy analysis and clustering. +- Summaries of retrieved datasets including cell counts, tissue distribution, and metadata coverage. +- Step-by-step guidance for efficient data querying and memory management during large-scale retrieval. 
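The metadata filters mentioned in the Instruction bullets are plain strings, so they can be assembled programmatically before being passed as `obs_value_filter`. The helper below is a hypothetical convenience function, not part of the `cellxgene_census` package; it assumes only the filter syntax shown elsewhere in this skill (`==` for single values, `in [...]` for multiple values, conditions joined with `and`).

```python
from typing import Dict, Sequence, Union

FilterValue = Union[str, bool, Sequence[str]]


def build_value_filter(
    conditions: Dict[str, FilterValue],
    primary_only: bool = True,
) -> str:
    """Join column conditions into a single value-filter expression string."""
    parts = []
    for column, value in conditions.items():
        if isinstance(value, bool):
            # Booleans are written bare, e.g. is_primary_data == True
            parts.append(f"{column} == {value}")
        elif isinstance(value, str):
            # Single string values are quoted, e.g. cell_type == 'B cell'
            parts.append(f"{column} == {value!r}")
        else:
            # Multiple values use membership, e.g. tissue_general in ['lung', 'liver']
            parts.append(f"{column} in {list(value)!r}")
    # Exclude duplicate cells by default unless the caller set the flag explicitly.
    if primary_only and "is_primary_data" not in conditions:
        parts.append("is_primary_data == True")
    return " and ".join(parts)


obs_filter = build_value_filter(
    {"cell_type": "B cell", "tissue_general": ["lung", "liver"]}
)
```

The resulting string is intended for the `obs_value_filter` argument described in this skill; appending `is_primary_data == True` by default follows the skill's advice to avoid counting duplicate cells.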
+
 ## Installation and Setup

 Install the Census API:
-```bash
-uv pip install cellxgene-census
-```

 For machine learning workflows, install additional dependencies:
-```bash
-uv pip install cellxgene-census[experimental]
-```
+
 ## Core Workflow Patterns

@@ -47,317 +59,59 @@ uv pip install cellxgene-census[experimental]

 Always use the context manager to ensure proper resource cleanup:

-```python
-import cellxgene_census
-
-# Open latest stable version
-with cellxgene_census.open_soma() as census:
-    # Work with census data
-
-# Open specific version for reproducibility
-with cellxgene_census.open_soma(census_version="2023-07-25") as census:
-    # Work with census data
-```
-
-**Key points:**
-- Use context manager (`with` statement) for automatic cleanup
-- Specify `census_version` for reproducible analyses
-- Default opens latest "stable" release
-
 ### 2. Exploring Census Information

 Before querying expression data, explore available datasets and metadata.

-**Access summary information:**
-```python
-# Get summary statistics
-summary = census["census_info"]["summary"].read().concat().to_pandas()
-print(f"Total cells: {summary['total_cell_count'][0]}")
-
-# Get all datasets
-datasets = census["census_info"]["datasets"].read().concat().to_pandas()
-
-# Filter datasets by criteria
-covid_datasets = datasets[datasets["disease"].str.contains("COVID", na=False)]
-```
-
-**Query cell metadata to understand available data:**
-```python
-# Get unique cell types in a tissue
-cell_metadata = cellxgene_census.get_obs(
-    census,
-    "homo_sapiens",
-    value_filter="tissue_general == 'brain' and is_primary_data == True",
-    column_names=["cell_type"]
-)
-unique_cell_types = cell_metadata["cell_type"].unique()
-print(f"Found {len(unique_cell_types)} cell types in brain")
-
-# Count cells by tissue
-tissue_counts = cell_metadata.groupby("tissue_general").size()
-```
-
-**Important:** Always filter for `is_primary_data == True` to avoid counting duplicate cells unless specifically analyzing duplicates.
-
 ### 3. Querying Expression Data (Small to Medium Scale)

 For queries returning < 100k cells that fit in memory, use `get_anndata()`:

-```python
-# Basic query with cell type and tissue filters
-adata = cellxgene_census.get_anndata(
-    census=census,
-    organism="Homo sapiens",  # or "Mus musculus"
-    obs_value_filter="cell_type == 'B cell' and tissue_general == 'lung' and is_primary_data == True",
-    obs_column_names=["assay", "disease", "sex", "donor_id"],
-)
-
-# Query specific genes with multiple filters
-adata = cellxgene_census.get_anndata(
-    census=census,
-    organism="Homo sapiens",
-    var_value_filter="feature_name in ['CD4', 'CD8A', 'CD19', 'FOXP3']",
-    obs_value_filter="cell_type == 'T cell' and disease == 'COVID-19' and is_primary_data == True",
-    obs_column_names=["cell_type", "tissue_general", "donor_id"],
-)
-```
-
-**Filter syntax:**
-- Use `obs_value_filter` for cell filtering
-- Use `var_value_filter` for gene filtering
-- Combine conditions with `and`, `or`
-- Use `in` for multiple values: `tissue in ['lung', 'liver']`
-- Select only needed columns with `obs_column_names`
-
-**Getting metadata separately:**
-```python
-# Query cell metadata
-cell_metadata = cellxgene_census.get_obs(
-    census, "homo_sapiens",
-    value_filter="disease == 'COVID-19' and is_primary_data == True",
-    column_names=["cell_type", "tissue_general", "donor_id"]
-)
-
-# Query gene metadata
-gene_metadata = cellxgene_census.get_var(
-    census, "homo_sapiens",
-    value_filter="feature_name in ['CD4', 'CD8A']",
-    column_names=["feature_id", "feature_name", "feature_length"]
-)
-```

 ### 4. Large-Scale Queries (Out-of-Core Processing)

 For queries exceeding available RAM, use `axis_query()` with iterative processing:

-```python
-import tiledbsoma as soma
-
-# Create axis query
-query = census["census_data"]["homo_sapiens"].axis_query(
-    measurement_name="RNA",
-    obs_query=soma.AxisQuery(
-        value_filter="tissue_general == 'brain' and is_primary_data == True"
-    ),
-    var_query=soma.AxisQuery(
-        value_filter="feature_name in ['FOXP2', 'TBR1', 'SATB2']"
-    )
-)
-
-# Iterate through expression matrix in chunks
-iterator = query.X("raw").tables()
-for batch in iterator:
-    # batch is a pyarrow.Table with columns:
-    # - soma_data: expression value
-    # - soma_dim_0: cell (obs) coordinate
-    # - soma_dim_1: gene (var) coordinate
-    process_batch(batch)
-```
-
-**Computing incremental statistics:**
-```python
-# Example: Calculate mean expression
-n_observations = 0
-sum_values = 0.0
-
-iterator = query.X("raw").tables()
-for batch in iterator:
-    values = batch["soma_data"].to_numpy()
-    n_observations += len(values)
-    sum_values += values.sum()
-
-mean_expression = sum_values / n_observations
-```

 ### 5. Machine Learning with PyTorch

 For training models, use the experimental PyTorch integration:

-```python
-from cellxgene_census.experimental.ml import experiment_dataloader
-
-with cellxgene_census.open_soma() as census:
-    # Create dataloader
-    dataloader = experiment_dataloader(
-        census["census_data"]["homo_sapiens"],
-        measurement_name="RNA",
-        X_name="raw",
-        obs_value_filter="tissue_general == 'liver' and is_primary_data == True",
-        obs_column_names=["cell_type"],
-        batch_size=128,
-        shuffle=True,
-    )
-
-    # Training loop
-    for epoch in range(num_epochs):
-        for batch in dataloader:
-            X = batch["X"]  # Gene expression tensor
-            labels = batch["obs"]["cell_type"]  # Cell type labels
-
-            # Forward pass
-            outputs = model(X)
-            loss = criterion(outputs, labels)
-
-            # Backward pass
-            optimizer.zero_grad()
-            loss.backward()
-            optimizer.step()
-```
-
-**Train/test splitting:**
-```python
-from cellxgene_census.experimental.ml import ExperimentDataset
-
-# Create dataset from experiment
-dataset = ExperimentDataset(
-    experiment_axis_query,
-    layer_name="raw",
-    obs_column_names=["cell_type"],
-    batch_size=128,
-)
-
-# Split into train and test
-train_dataset, test_dataset = dataset.random_split(
-    split=[0.8, 0.2],
-    seed=42
-)
-```

 ### 6. Integration with Scanpy

 Seamlessly integrate Census data with scanpy workflows:

-```python
-import scanpy as sc
-
-# Load data from Census
-adata = cellxgene_census.get_anndata(
-    census=census,
-    organism="Homo sapiens",
-    obs_value_filter="cell_type == 'neuron' and tissue_general == 'cortex' and is_primary_data == True",
-)
-
-# Standard scanpy workflow
-sc.pp.normalize_total(adata, target_sum=1e4)
-sc.pp.log1p(adata)
-sc.pp.highly_variable_genes(adata, n_top_genes=2000)
-
-# Dimensionality reduction
-sc.pp.pca(adata, n_comps=50)
-sc.pp.neighbors(adata)
-sc.tl.umap(adata)
-
-# Visualization
-sc.pl.umap(adata, color=["cell_type", "tissue", "disease"])
-```
-
 ### 7. Multi-Dataset Integration

 Query and integrate multiple datasets:

-```python
-# Strategy 1: Query multiple tissues separately
-tissues = ["lung", "liver", "kidney"]
-adatas = []
-
-for tissue in tissues:
-    adata = cellxgene_census.get_anndata(
-        census=census,
-        organism="Homo sapiens",
-        obs_value_filter=f"tissue_general == '{tissue}' and is_primary_data == True",
-    )
-    adata.obs["tissue"] = tissue
-    adatas.append(adata)
-
-# Concatenate
-combined = adatas[0].concatenate(adatas[1:])
-
-# Strategy 2: Query multiple datasets directly
-adata = cellxgene_census.get_anndata(
-    census=census,
-    organism="Homo sapiens",
-    obs_value_filter="tissue_general in ['lung', 'liver', 'kidney'] and is_primary_data == True",
-)
-```

 ## Key Concepts and Best Practices

 ### Always Filter for Primary Data
 Unless analyzing duplicates, always include `is_primary_data == True` in queries to avoid counting cells multiple times:
-```python
-obs_value_filter="cell_type == 'B cell' and is_primary_data == True"
-```

 ### Specify Census Version for Reproducibility
 Always specify the Census version in production analyses:
-```python
-census = cellxgene_census.open_soma(census_version="2023-07-25")
-```
+
 ### Estimate Query Size Before Loading
 For large queries, first check the number of cells to avoid memory issues:
-```python
-# Get cell count
-metadata = cellxgene_census.get_obs(
-    census, "homo_sapiens",
-    value_filter="tissue_general == 'brain' and is_primary_data == True",
-    column_names=["soma_joinid"]
-)
-n_cells = len(metadata)
-print(f"Query will return {n_cells:,} cells")
-# If too large (>100k), use out-of-core processing
-```

 ### Use tissue_general for Broader Groupings
-The `tissue_general` field provides coarser categories than `tissue`, useful for cross-tissue analyses:
-```python
-# Broader grouping
-obs_value_filter="tissue_general == 'immune system'"
-
-# Specific tissue
-obs_value_filter="tissue == 'peripheral blood mononuclear cell'"
-```

 ### Select Only Needed Columns
-Minimize data transfer by specifying only required metadata columns:
-```python
-obs_column_names=["cell_type", "tissue_general", "disease"]  # Not all columns
-```
+Minimize data transfer by specifying only required metadata columns

 ### Check Dataset Presence for Gene-Specific Queries
-When analyzing specific genes, verify which datasets measured them:
-```python
-presence = cellxgene_census.get_presence_matrix(
-    census,
-    "homo_sapiens",
-    var_value_filter="feature_name in ['CD4', 'CD8A']"
-)
-```
+When analyzing specific genes, verify which datasets measured them
+
 ### Two-Step Workflow: Explore Then Query
 First explore metadata to understand available data, then query expression:
-```python
+
 # Step 1: Explore what's available
 metadata = cellxgene_census.get_obs(
     census, "homo_sapiens",
@@ -372,25 +126,6 @@ adata = cellxgene_census.get_anndata(
     organism="Homo sapiens",
     obs_value_filter="disease == 'COVID-19' and cell_type == 'T cell' and is_primary_data == True",
 )
-```
-
-## Available Metadata Fields
-
-### Cell Metadata (obs)
-Key fields for filtering:
-- `cell_type`, `cell_type_ontology_term_id`
-- `tissue`, `tissue_general`, `tissue_ontology_term_id`
-- `disease`, `disease_ontology_term_id`
-- `assay`, `assay_ontology_term_id`
-- `donor_id`, `sex`, `self_reported_ethnicity`
-- `development_stage`, `development_stage_ontology_term_id`
-- `dataset_id`
-- `is_primary_data` (Boolean: True = unique cell)
-
-### Gene Metadata (var)
-- `feature_id` (Ensembl gene ID, e.g., "ENSG00000161798")
-- `feature_name` (Gene symbol, e.g., "FOXP2")
-- `feature_length` (Gene length in base pairs)

 ## Reference Documentation

@@ -421,61 +156,10 @@ Examples and patterns for:
 ## Common Use Cases

 ### Use Case 1: Explore Cell Types in a Tissue
-```python
-with cellxgene_census.open_soma() as census:
-    cells = cellxgene_census.get_obs(
-        census, "homo_sapiens",
-        value_filter="tissue_general == 'lung' and is_primary_data == True",
-        column_names=["cell_type"]
-    )
-    print(cells["cell_type"].value_counts())
-```
-
-### Use Case 2: Query Marker Gene Expression
-```python
-with cellxgene_census.open_soma() as census:
-    adata = cellxgene_census.get_anndata(
-        census=census,
-        organism="Homo sapiens",
-        var_value_filter="feature_name in ['CD4', 'CD8A', 'CD19']",
-        obs_value_filter="cell_type in ['T cell', 'B cell'] and is_primary_data == True",
-    )
-```
-
-### Use Case 3: Train Cell Type Classifier
-```python
-from cellxgene_census.experimental.ml import experiment_dataloader
-
-with cellxgene_census.open_soma() as census:
-    dataloader = experiment_dataloader(
-        census["census_data"]["homo_sapiens"],
-        measurement_name="RNA",
-        X_name="raw",
-        obs_value_filter="is_primary_data == True",
-        obs_column_names=["cell_type"],
-        batch_size=128,
-        shuffle=True,
-    )
-
-    # Train model
-    for epoch in range(epochs):
-        for batch in dataloader:
-            # Training logic
-            pass
-```
-
-### Use Case 4: Cross-Tissue Analysis
-```python
-with cellxgene_census.open_soma() as census:
-    adata = cellxgene_census.get_anndata(
-        census=census,
-        organism="Homo sapiens",
-        obs_value_filter="cell_type == 'macrophage' and tissue_general in ['lung', 'liver', 'brain'] and is_primary_data == True",
-    )
-
-    # Analyze macrophage differences across tissues
-    sc.tl.rank_genes_groups(adata, groupby="tissue_general")
-```
+
+### Use Case 2: Train Cell Type Classifier
+
+### Use Case 3: Cross-Tissue Analysis

 ## Troubleshooting

From 2ccc724232219fd04e75bade1b7eb19352ed3af1 Mon Sep 17 00:00:00 2001
From: sixiang-svg
Date: Sat, 21 Mar 2026 22:38:12 +0800
Subject: [PATCH 2/3] Update external audit preparation guidance in SKILL.md

---
 skills/Research/qms-audit-expert/SKILL.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/skills/Research/qms-audit-expert/SKILL.md b/skills/Research/qms-audit-expert/SKILL.md
index 716b93e3..ac1db5fe 100644
--- a/skills/Research/qms-audit-expert/SKILL.md
+++ b/skills/Research/qms-audit-expert/SKILL.md
@@ -16,7 +16,7 @@ Expert-level quality management system auditing with comprehensive knowledge of
 ## When to Use
 - When planning or executing internal QMS audits in a medical device manufacturing environment.
 - When documenting audit findings or managing nonconformity resolution processes.
-- When preparing for externa
+- When preparing for external audits, regulatory inspections, or certification body assessments.

 ## Core QMS Auditing Competencies

From 8aeafe7207a0088b49124528f5776a02e4fdb889 Mon Sep 17 00:00:00 2001
From: sixiang-svg
Date: Sat, 21 Mar 2026 22:40:32 +0800
Subject: [PATCH 3/3] Remove redundant safety implementation sections

Removed sections on implementation patterns, integration with control loops, safety during open-loop testing, and logging safety events from SKILL.md.
---
 skills/Research/safety-interlocks/SKILL.md | 16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/skills/Research/safety-interlocks/SKILL.md b/skills/Research/safety-interlocks/SKILL.md
index 986c221b..e25562d1 100644
--- a/skills/Research/safety-interlocks/SKILL.md
+++ b/skills/Research/safety-interlocks/SKILL.md
@@ -33,22 +33,6 @@ Safety interlocks are protective mechanisms that prevent equipment damage and en
 - Configuration summaries for safety thresholds and emergency response behaviors.
 - A checklist of precautions for pre-control sensor validation and safety event logging.

-## Implementation Pattern
-
-Always check safety conditions BEFORE applying control outputs:
-
-## Integration with Control Loop
-
-
-## Safety During Open-Loop Testing
-
-During calibration/excitation, safety is especially important because there's no feedback control:
-
-## Logging Safety Events
-
-Always log safety events for analysis:
-
-
 ## Pre-Control Checklist

 Before starting any control operation: