Commit 5237e4a

ammawla and claude committed
Marketplace plugin with 47 skills + MCP connector, bump to v0.3.0-beta.8
- Add plugin/ directory with real files (not symlinks) for marketplace install
- Add plugin/.mcp.json for MCP connector (shows under Connectors in UI)
- Add plugin/.claude-plugin/plugin.json (minimal manifest)
- Add plugin/CLAUDE.md and plugin/skills/ (47 skills, real copies)
- Fix YAML >- block scalar descriptions in 18 skills (now display properly)
- Update README Quick Start with marketplace install instructions
- Fix description count: 48 → 47 (tabaseq-deconvolution is unreleased)
- Bump version 0.3.0-beta.7 → 0.3.0-beta.8 across all 8 manifest files

Install via:
  /plugin marketplace add ammawla/encode-toolkit
  /plugin install encode-toolkit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 2971bf3 commit 5237e4a

198 files changed

Lines changed: 43143 additions & 167 deletions

.claude-plugin/marketplace.json

Lines changed: 3 additions & 3 deletions
@@ -1,5 +1,5 @@
 {
-  "name": "encode-toolkit",
+  "name": "ammawla",
   "owner": {
     "name": "Dr. Alex M. Mawla, PhD"
   },
@@ -10,9 +10,9 @@
   "plugins": [
     {
       "name": "encode-toolkit",
-      "version": "0.3.0",
+      "version": "0.3.0-beta.8",
       "source": "./plugin",
-      "description": "20 ENCODE API tools + 48 expert skills for genomics research. Search experiments, download files with MD5 verification, run pipelines, and cross-reference 14 databases."
+      "description": "20 ENCODE API tools + 47 expert skills for genomics research. Search experiments, download files with MD5 verification, run pipelines, and cross-reference 14 databases."
     }
   ]
 }

.claude-plugin/plugin.json

Lines changed: 8 additions & 3 deletions
@@ -1,7 +1,7 @@
 {
   "name": "encode-toolkit",
-  "description": "20 ENCODE API tools + 48 expert skills for genomics research. Search experiments, download files with MD5 verification, run pipelines, and cross-reference 14 databases.",
-  "version": "0.3.0-beta.7",
+  "description": "20 ENCODE API tools + 47 expert skills for genomics research. Search experiments, download files with MD5 verification, run pipelines, and cross-reference 14 databases.",
+  "version": "0.3.0-beta.8",
   "author": {
     "name": "Dr. Alex M. Mawla, PhD",
     "email": "ammawla@ucdavis.edu"
@@ -35,7 +35,12 @@
     "biology",
     "science"
   ],
-  "skills": "skills/*",
+  "mcpServers": {
+    "encode-toolkit": {
+      "command": "npx",
+      "args": ["-y", "encode-toolkit@latest"]
+    }
+  },
   "tools": [
     {
       "name": "encode_search_experiments",

README.md

Lines changed: 20 additions & 19 deletions
@@ -4,7 +4,7 @@

 [![License: CC BY-NC-ND 4.0](https://img.shields.io/badge/License-CC_BY--NC--ND_4.0-red.svg)](LICENSE)
 [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
-[![Version](https://img.shields.io/badge/version-0.3.0--beta-yellow)](CHANGELOG.md)
+[![Version](https://img.shields.io/badge/version-0.3.0--beta.8-yellow)](CHANGELOG.md)
 [![Status](https://img.shields.io/badge/status-beta-yellow)]()
 [![Skills](https://img.shields.io/badge/skills-47-orange)](docs/skill-vignettes/)
 [![Tools](https://img.shields.io/badge/MCP_tools-20-purple)](src/encode_connector/server/main.py)
@@ -32,16 +32,31 @@ Search ENCODE, cross-reference 14 databases, run 7 analysis pipelines, and gener

 ## Quick Start

-### Claude Code (recommended)
+### Claude Code Plugin (recommended)
+
+Start a new Claude Code session and enter:
+
+```
+/plugin marketplace add ammawla/encode-toolkit
+
+/plugin install encode-toolkit
+```
+
+That's it. All 20 tools, 47 skills, and the MCP connector are now available.
+
+<details>
+<summary><strong>MCP-only install (tools only, no skills)</strong></summary>
+
+If you only need the 20 MCP tools without the 47 workflow skills:

 ```bash
 claude mcp add encode -- uvx encode-toolkit
 ```

-That's it. All 20 tools and 47 skills are now available in Claude Code.
+</details>

 <details>
-<summary><strong>Other installation methods</strong></summary>
+<summary><strong>Other editors and platforms</strong></summary>

 #### npx (Node.js)

@@ -69,24 +84,10 @@ Then use `encode-toolkit` as the command in any MCP client configuration:
 }
 ```

-#### Claude Code — Plugin Install
-
-For the full experience (20 tools + 47 skills), install as a Claude Code plugin:
-
-```bash
-claude plugin add /path/to/encode-toolkit
-```
-
-Or install from a marketplace:
-
-```
-/plugin install encode-toolkit
-```
-
 </details>

 <details>
-<summary><strong>Claude Desktop</strong></summary>
+<summary><strong>Claude Desktop (MCP only)</strong></summary>

 Add to your `claude_desktop_config.json`:

package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "encode-toolkit",
-  "version": "0.3.0-beta.7",
+  "version": "0.3.0-beta.8",
   "mcpName": "io.github.ammawla/encode-toolkit",
   "description": "ENCODE Toolkit — Genomics research infrastructure with 20 MCP tools, 47 skills, 14 database integrations, and 7 pipelines for Claude Code",
   "main": "index.js",
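
The commit message says the version was bumped across eight manifest files, and the hunks above show that at least one of them (`.claude-plugin/marketplace.json`) had previously drifted to `0.3.0`. A minimal sketch of a drift check follows, assuming the version lives either at the top level of a manifest or on entries under `plugins`. The `MANIFESTS` dictionary and both function names are hypothetical stand-ins written for illustration; they are not part of the repository.

```python
import json

# Hypothetical in-memory stand-ins for three of the manifest files.
# A real check would read the files from disk instead.
MANIFESTS = {
    "package.json": '{"name": "encode-toolkit", "version": "0.3.0-beta.8"}',
    ".claude-plugin/plugin.json": '{"name": "encode-toolkit", "version": "0.3.0-beta.8"}',
    ".claude-plugin/marketplace.json": (
        '{"name": "ammawla", "plugins": '
        '[{"name": "encode-toolkit", "version": "0.3.0-beta.8"}]}'
    ),
}


def extract_versions(text):
    """Return version strings found at the top level or on "plugins" entries."""
    data = json.loads(text)
    versions = []
    if "version" in data:
        versions.append(data["version"])
    for plugin in data.get("plugins", []):
        if "version" in plugin:
            versions.append(plugin["version"])
    return versions


def find_version_drift(manifests, expected):
    """Map each manifest path to the versions in it that differ from `expected`."""
    drift = {}
    for path, text in manifests.items():
        wrong = [v for v in extract_versions(text) if v != expected]
        if wrong:
            drift[path] = wrong
    return drift


if __name__ == "__main__":
    print(find_version_drift(MANIFESTS, "0.3.0-beta.8"))
```

An empty result means every version string that was found matches the expected one. A manifest with no version field at all is silently skipped, which may or may not be the desired behavior.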

plugin/.claude-plugin/plugin.json

Lines changed: 4 additions & 12 deletions
@@ -1,12 +1,10 @@
 {
   "name": "encode-toolkit",
-  "version": "0.3.0",
-  "description": "20 ENCODE API tools + 48 expert skills for genomics research. Search experiments, download files with MD5 verification, run pipelines, and cross-reference 14 databases.",
+  "version": "0.3.0-beta.8",
+  "description": "20 ENCODE API tools + 47 expert skills for genomics research. Search experiments, download files with MD5 verification, run pipelines, and cross-reference 14 databases.",
   "author": {
-    "name": "Dr. Alex M. Mawla, PhD",
-    "email": "ammawla@ucdavis.edu"
+    "name": "Dr. Alex M. Mawla, PhD"
   },
-  "homepage": "https://github.com/ammawla/encode-toolkit",
   "repository": "https://github.com/ammawla/encode-toolkit",
   "license": "CC-BY-NC-ND-4.0",
   "keywords": [
@@ -19,11 +17,5 @@
     "rna-seq",
     "pipelines",
     "science"
-  ],
-  "mcpServers": {
-    "encode-toolkit": {
-      "command": "npx",
-      "args": ["-y", "encode-toolkit@latest"]
-    }
-  }
+  ]
 }

plugin/.mcp.json

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+{
+  "mcpServers": {
+    "encode-toolkit": {
+      "type": "stdio",
+      "command": "npx",
+      "args": ["-y", "encode-toolkit@latest"]
+    }
+  }
+}
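
To make the shape of the new `plugin/.mcp.json` concrete, here is a small sketch that parses the same JSON and derives the argument list a stdio client would presumably spawn. How Claude Code actually consumes this file is not shown in the commit, so the `launch_command` helper is hypothetical and only illustrates the structure of the config.

```python
import json

# Same content as the plugin/.mcp.json added in this commit.
MCP_JSON = """
{
  "mcpServers": {
    "encode-toolkit": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "encode-toolkit@latest"]
    }
  }
}
"""


def launch_command(config_text, server_name):
    """Return the argv list for `server_name` (hypothetical helper)."""
    server = json.loads(config_text)["mcpServers"][server_name]
    # Assumption: a missing "type" is treated as stdio in this sketch.
    if server.get("type", "stdio") != "stdio":
        raise ValueError(f"{server_name} is not a stdio server")
    return [server["command"], *server.get("args", [])]


if __name__ == "__main__":
    print(launch_command(MCP_JSON, "encode-toolkit"))
```

The resulting list could be handed to `subprocess.Popen` by a client that speaks MCP over stdin and stdout. That wiring is omitted here.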

plugin/CLAUDE.md

Lines changed: 0 additions & 1 deletion
This file was deleted.

plugin/CLAUDE.md

Lines changed: 168 additions & 0 deletions
@@ -0,0 +1,168 @@
+# ENCODE Toolkit
+
+MCP server for the ENCODE Project (encodeproject.org) — the largest public catalog of functional genomic elements. Version 0.3.0-beta.
+
+## Quick Start
+
+```bash
+python -m venv .venv && source .venv/bin/activate
+pip install -e ".[dev]"
+pytest # 506 tests, 98% coverage
+ruff check src/ # lint
+ruff format src/ # auto-format
+encode-toolkit # run MCP server
+```
+
+## Source Architecture
+
+```
+src/encode_connector/
+server/main.py # MCP server — 20 tools, ~1500 lines (entry point)
+client/encode_client.py # Async ENCODE API client, ~585 lines, 1-hour TTL cache
+client/downloader.py # File download manager, ~305 lines, MD5 verification
+client/auth.py # OS keyring + Fernet credential storage, ~262 lines
+client/models.py # Pydantic models for API responses, ~332 lines
+client/constants.py # API URLs, filter values, ~348 lines
+client/tracker.py # SQLite experiment tracker, ~1129 lines
+client/validation.py # Input validation, ~188 lines
+skills/ # 47 skills, each with SKILL.md + references/ + scripts/
+tests/ # 506 tests (pytest-asyncio, asyncio_mode=auto), 98% coverage
+```
+
+## Package Identity
+
+- **PyPI / console command**: `encode-toolkit`
+- **npm**: `encode-toolkit` (thin wrapper → uvx encode-toolkit)
+- **Plugin marketplace**: `encode-toolkit`
+- **Python module**: `encode_connector`
+
+## Development Gotchas
+
+- Use `.venv/bin/python` on macOS (`python` may not exist)
+- `asyncio_mode = "auto"` in pytest — no need for `@pytest.mark.asyncio`
+- `main.py` is ~53KB — don't send full contents to subagents (causes timeouts)
+- MCP SDK: use `instructions=` parameter, not `description=`
+- Integration tests hit live ENCODE API — deselect with `-m "not integration"`
+- `server.json` is for MCP registry; `.claude-plugin/plugin.json` is for Claude marketplace — keep both in sync
+- `conftest.py` sets up shared fixtures (tmp_path, mock tracker) — read before adding tests
+- npm `package.json` + `index.js` are thin wrappers that call `uvx encode-toolkit` — no JS logic
+
+## What This Server Does
+
+Provides 20 tools to search, download, and track ENCODE data:
+- **Search**: Find experiments by assay, organ, biosample, target, organism
+- **Download**: Get BED, FASTQ, BAM, bigWig files with MD5 verification
+- **Track**: Local experiment tracking with publications, citations, provenance
+- **Cross-reference**: Link to PubMed, bioRxiv, ClinicalTrials, GEO
+
+## Key Concepts
+
+**Assay types**: Histone ChIP-seq, TF ChIP-seq, ATAC-seq, DNase-seq, RNA-seq, WGBS, Hi-C, scRNA-seq, scATAC-seq, CRISPR screen, STARR-seq, MPRA, eCLIP, CUT&RUN, CUT&Tag
+
+**Biosample hierarchy**: tissue > primary cell > cell line > in vitro differentiated > organoid
+
+**Tier 1 cell lines** (most data): K562, GM12878, H1-hESC
+
+**File selection priority**: preferred_default=True > IDR thresholded peaks > fold change over control
+
+**Assembly**: Use GRCh38 for human, mm10 for mouse. Never mix assemblies.
+
+## Tool Selection Guide
+
+| User wants to... | Use tool |
+|---|---|
+| Find experiments | `encode_search_experiments` |
+| Explore what data exists (live counts) | `encode_get_facets` |
+| Get valid filter strings (static list) | `encode_get_metadata` |
+| Get experiment details | `encode_get_experiment` |
+| Find specific file types | `encode_search_files` |
+| List files for experiment | `encode_list_files` |
+| Get file details | `encode_get_file_info` |
+| Download specific files by accession | `encode_download_files` |
+| Search + download in one step | `encode_batch_download` |
+| Track experiments locally | `encode_track_experiment` |
+| Compare experiments | `encode_compare_experiments` |
+| Get citations | `encode_get_citations` |
+| Log derived files | `encode_log_derived_file` |
+| Link to PubMed/GEO | `encode_link_reference` |
+| List tracked experiments | `encode_list_tracked` |
+| Export tracking data | `encode_export_data` |
+| View file provenance | `encode_get_provenance` |
+| View linked references | `encode_get_references` |
+| Get collection summary | `encode_summarize_collection` |
+| Manage API credentials | `encode_manage_credentials` |
+
+### Example Queries
+
+**Search**: `encode_search_experiments(assay_title="Histone ChIP-seq", organ="pancreas", target="H3K27ac")` → finds all H3K27ac ChIP-seq in pancreas tissue
+
+**Download**: `encode_download_files(file_accessions=["ENCFF123ABC"], download_dir="/data/encode")` → downloads with MD5 verification
+
+**Track**: `encode_track_experiment(accession="ENCSR000ABC", notes="Liver H3K4me3 for enhancer analysis")` → saves to local SQLite with publications
+
+**Explore**: `encode_get_facets(assay_title="Histone ChIP-seq", organ="pancreas")` → shows available targets, labs, biosample types
+
+**Batch download**: `encode_batch_download(assay_title="ATAC-seq", organ="liver", file_format="bed", output_type="IDR thresholded peaks", dry_run=True)` → previews matching files before download
+
+**Compare**: `encode_compare_experiments(accession1="ENCSR123ABC", accession2="ENCSR456DEF")` → checks compatibility for combined analysis
+
+## 47 Skills Available
+
+**Core**: setup, search-encode, download-encode, track-experiments, cross-reference
+
+**Analysis**: quality-assessment, integrative-analysis, regulatory-elements, epigenome-profiling, compare-biosamples, visualization-workflow, motif-analysis, peak-annotation, batch-analysis
+
+**Functional Genomics**: functional-screen-analysis
+
+**Data Aggregation**: histone-aggregation, accessibility-aggregation, hic-aggregation, methylation-aggregation
+
+**External Databases**: gtex-expression, clinvar-annotation, cellxgene-context, gwas-catalog, jaspar-motifs, ensembl-annotation, geo-connector, gnomad-variants, ucsc-browser
+
+**Workflows**: data-provenance, cite-encode, variant-annotation, pipeline-guide, single-cell-encode, disease-research, publication-trust, bioinformatics-installer, scientific-writing, liftover-coordinates
+
+**Pipeline Execution**: pipeline-chipseq, pipeline-atacseq, pipeline-rnaseq, pipeline-wgbs, pipeline-hic, pipeline-dnaseseq, pipeline-cutandrun
+
+**Meta-Analysis**: scrna-meta-analysis, multi-omics-integration
+
+## Reference Files
+
+- `skills/histone-aggregation/references/histone-marks-reference.md` — Comprehensive chromatin biology catalog (1,442 lines, 74 references, 12 sections: histone marks, ChromHMM states, functional categories, contradictions, TF combinations, chromatin remodeling, DNA methylation interplay, nucleosome dynamics, 3D genome organization, chromatin in disease)
+- `skills/*/references/literature.md` — 34 literature reference documents (33 per-skill + 1 chromatin biology catalog, ~320 papers with DOI, PMID, citation counts, key findings)
+
+## Quality Awareness
+
+- ENCODE audits: ERROR > NOT_COMPLIANT > WARNING > INTERNAL_ACTION
+- ChIP-seq metrics: FRiP ≥1%, NSC >1.05, RSC >0.8, NRF ≥0.8 (Landt et al. 2012)
+- ATAC-seq metrics: TSS enrichment ≥6, fragment size nucleosomal ladder (Buenrostro et al. 2013)
+- RNA-seq: Mapping rate >80%, rRNA <10%, replicate correlation ≥0.9 (Conesa et al. 2016)
+- WGBS: Bisulfite conversion >99%, CpG coverage ≥10× for DMRs (Foox et al. 2021)
+- Hi-C: Cis/trans ratio >60%, long-range cis >40% (Yardimci et al. 2019)
+- CUT&RUN/CUT&Tag: Different QC profiles from ChIP-seq; use suspect list (Nordin et al. 2023)
+- Always use 2+ biological replicates
+- Always apply ENCODE Blacklist v2 (Amemiya et al. 2019)
+- No single metric is sufficient — interpret collectively
+
+## Provenance Standard
+
+Every operation should log: tool name + version, exact command, input accessions + MD5, reference files + source + MD5, output descriptions + counts, and statistics. Scripts stored with sequential numbering. Enables auto-generation of publication-ready methods sections.
+
+## Cross-Database Integration
+
+This plugin works with MCP servers:
+- **PubMed** (search_articles) — Literature search and citation
+- **bioRxiv** (search_preprints) — Preprint discovery
+- **ClinicalTrials.gov** (search_trials) — Clinical trial cross-reference
+- **Open Targets** (query_open_targets_graphql) — Drug target identification
+- **Consensus** (search) — Academic paper search across 200M+ papers
+
+And via skills (REST API/CLI):
+- **UCSC Genome Browser** — cCRE tracks, TF binding, sequence retrieval via REST API
+- **NCBI GEO** — Complementary expression/epigenomic datasets via E-utilities
+- **gnomAD** — Population allele frequencies and gene constraint via GraphQL
+- **Ensembl** — VEP variant annotation, Regulatory Build, coordinate liftover via REST API
+- **NCBI SRA** — Raw sequencing reads linked from GEO (via E-utilities elink)
+- **GTEx** — Tissue-specific gene expression for ENCODE regulatory element interpretation via REST API
+- **ClinVar** — Clinical variant significance for ENCODE-identified regulatory variants via E-utilities
+- **CELLxGENE** — Single-cell expression context for ENCODE bulk data via REST API
+- **GWAS Catalog** — GWAS associations in ENCODE regulatory regions via REST API
+- **JASPAR** — Transcription factor binding motifs for ENCODE ChIP-seq peak analysis via REST API
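
The ChIP-seq thresholds listed under "Quality Awareness" in the added `plugin/CLAUDE.md` can be expressed as a simple lookup. The sketch below is illustrative only: the metric keys, the `CHIP_SEQ_THRESHOLDS` table, and the `failing_metrics` function are hypothetical names, FRiP is assumed to be given as a fraction (so 1% is `0.01`), and a missing metric is treated as a failure. None of this is taken from the toolkit's own code.

```python
# Thresholds copied from the "Quality Awareness" list (Landt et al. 2012).
# Each entry maps a metric name to (threshold, inclusive).
CHIP_SEQ_THRESHOLDS = {
    "frip": (0.01, True),   # FRiP >= 1%, expressed as a fraction
    "nsc": (1.05, False),   # NSC > 1.05
    "rsc": (0.8, False),    # RSC > 0.8
    "nrf": (0.8, True),     # NRF >= 0.8
}


def failing_metrics(metrics):
    """Return the names of metrics that are missing or below threshold."""
    failures = []
    for name, (threshold, inclusive) in CHIP_SEQ_THRESHOLDS.items():
        value = metrics.get(name)
        if value is None:
            # Assumption: an unreported metric counts as a failure.
            failures.append(name)
            continue
        passed = value >= threshold if inclusive else value > threshold
        if not passed:
            failures.append(name)
    return failures


if __name__ == "__main__":
    sample = {"frip": 0.02, "nsc": 1.03, "rsc": 1.1, "nrf": 0.9}
    print(failing_metrics(sample))
```

As the same list notes, no single metric is sufficient, so a result like this is better read as a prompt for closer inspection than as a pass/fail verdict.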

plugin/skills

Lines changed: 0 additions & 1 deletion
This file was deleted.
