Commit 5cfbcfd

Release v0.3.0: Bioinformatics audit, ENCODE API corrections, filter validation
- Fix 13 wrong ENCODE assay_title values, add 30+ missing (verified against live API)
- Add 19 missing organ_slims including blood (6,558 experiments)
- Fix biosample classifications (remove 3 phantom types, add 2 real)
- Remove 5 wrong output_types, add 50+ correct ones
- Extract _extract_assemblies() and _extract_audit_counts() helpers
- Add assembly field to ExperimentDetail model
- Extract all 4 ENCODE audit levels (ERROR, NOT_COMPLIANT, WARNING, INTERNAL_ACTION)
- Add check_filter_value() with cached case-insensitive matching
- Wire filter validation into all 4 search tools with warnings
- Fix search_term forwarding bug in search_files
- Add None guards on related_series/documents list comprehensions
- Fix scATAC-seq -> snATAC-seq across all skill files (0 API results for scATAC-seq)
- Fix assay_title="RNA-seq" -> "total RNA-seq" across 40+ files
- Update license CC-BY-NC-4.0 -> AGPL-3.0-only
- Version bump to 0.3.0 across all manifests
- Add CITATION.cff
- 568 tests, 98% coverage, all lint clean
1 parent d931f60 commit 5cfbcfd

91 files changed

Lines changed: 1416 additions & 561 deletions

.claude-plugin/marketplace.json

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@
 "plugins": [
 {
 "name": "encode-toolkit",
-"version": "0.3.0-beta.10",
+"version": "0.3.0",
 "source": "./plugin",
 "description": "20 ENCODE API tools + 47 expert skills for genomics research. Search experiments, download files with MD5 verification, run pipelines, and cross-reference 14 databases."
 }

.claude-plugin/plugin.json

Lines changed: 2 additions & 2 deletions
@@ -1,15 +1,15 @@
 {
 "name": "encode-toolkit",
 "description": "20 ENCODE API tools + 47 expert skills for genomics research. Search experiments, download files with MD5 verification, run pipelines, and cross-reference 14 databases.",
-"version": "0.3.0-beta.10",
+"version": "0.3.0",
 "author": {
 "name": "Dr. Alex M. Mawla, PhD",
 "email": "ammawla@ucdavis.edu"
 },
 "homepage": "https://github.com/ammawla/encode-toolkit",
 "repository": "https://github.com/ammawla/encode-toolkit",
 "icon": "docs/icon.svg",
-"license": "CC-BY-NC-4.0",
+"license": "AGPL-3.0-only",
 "keywords": [
 "genomics",
 "encode",

.cursor-plugin/marketplace.json

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
 },
 "metadata": {
 "description": "ENCODE Project genomics research infrastructure for Cursor",
-"version": "0.3.0-beta.10",
+"version": "0.3.0",
 "homepage": "https://github.com/ammawla/encode-toolkit"
 },
 "plugins": [

.cursor-plugin/plugin.json

Lines changed: 2 additions & 2 deletions
@@ -1,14 +1,14 @@
 {
 "name": "encode-toolkit",
 "description": "20 ENCODE API tools + 47 expert skills for genomics research. Search experiments, download files with MD5 verification, run pipelines, and cross-reference 14 databases.",
-"version": "0.3.0-beta.10",
+"version": "0.3.0",
 "author": {
 "name": "Dr. Alex M. Mawla, PhD",
 "email": "ammawla@ucdavis.edu"
 },
 "homepage": "https://github.com/ammawla/encode-toolkit",
 "repository": "https://github.com/ammawla/encode-toolkit",
-"license": "CC-BY-NC-4.0",
+"license": "AGPL-3.0-only",
 "logo": "docs/icon.svg",
 "keywords": [
 "genomics",

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@

 <!-- How was this tested? -->

-- [ ] All 506 existing tests pass (`pytest tests/ -v`)
+- [ ] All 540 existing tests pass (`pytest tests/ -v`)
 - [ ] New tests added for new functionality
 - [ ] Lint passes (`ruff check src/`)
 - [ ] Format passes (`ruff format --check src/`)

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ Initial public beta release.
 - **OS keyring credential management** with Fernet-encrypted file fallback
 - **Thread-safe SQLite tracker** with full transaction safety
 - **Streaming downloads** with 64KB chunks and SSRF-safe redirect validation
-- **506 tests** with 98% code coverage
+- **568 tests** with 98% code coverage
 - **34 literature reference documents** (~320 papers cataloged with DOI, PMID, key findings)
 - **9 scientist-facing vignettes** with real ENCODE API output
 - **GitHub Actions CI/CD** (pytest across Python 3.10–3.13, ruff lint, plugin validation)

CITATION.cff

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+cff-version: 1.2.0
+message: "If you use this software, please cite it as below."
+type: software
+title: "ENCODE Toolkit"
+abstract: "A Model Context Protocol server for programmatic access to ENCODE functional genomics data."
+authors:
+- family-names: Mawla
+given-names: Alex M.
+orcid: "https://orcid.org/0000-0003-0907-464X"
+affiliation: "Independent Researcher"
+version: 0.3.0
+date-released: "2026-03-08"
+license: AGPL-3.0-only
+repository-code: "https://github.com/ammawla/encode-toolkit"
+url: "https://github.com/ammawla/encode-toolkit"
+doi: "10.5281/zenodo.18917511"
+keywords:
+- bioinformatics
+- genomics
+- ENCODE
+- epigenomics
+- MCP
+- functional genomics

CLAUDE.md

Lines changed: 9 additions & 9 deletions
@@ -7,7 +7,7 @@ MCP server for the ENCODE Project (encodeproject.org) — the largest public cat
 ```bash
 python -m venv .venv && source .venv/bin/activate
 pip install -e ".[dev]"
-pytest # 506 tests, 98% coverage
+pytest # 568 tests, 98% coverage
 ruff check src/ # lint
 ruff format src/ # auto-format
 encode-toolkit # run MCP server
@@ -21,12 +21,12 @@ src/encode_connector/
 client/encode_client.py # Async ENCODE API client, ~585 lines, 1-hour TTL cache
 client/downloader.py # File download manager, ~305 lines, MD5 verification
 client/auth.py # OS keyring + Fernet credential storage, ~262 lines
-client/models.py # Pydantic models for API responses, ~332 lines
-client/constants.py # API URLs, filter values, ~348 lines
+client/models.py # Pydantic models for API responses, ~367 lines
+client/constants.py # API URLs, filter values, ~450 lines
 client/tracker.py # SQLite experiment tracker, ~1129 lines
-client/validation.py # Input validation, ~188 lines
+client/validation.py # Input validation, ~226 lines
 skills/ # 47 skills, each with SKILL.md + references/ + scripts/
-tests/ # 506 tests (pytest-asyncio, asyncio_mode=auto), 98% coverage
+tests/ # 568 tests (pytest-asyncio, asyncio_mode=auto), 98% coverage
 ```

 ## Package Identity
@@ -57,7 +57,7 @@ Provides 20 tools to search, download, and track ENCODE data:

 ## Key Concepts

-**Assay types**: Histone ChIP-seq, TF ChIP-seq, ATAC-seq, DNase-seq, RNA-seq, WGBS, Hi-C, scRNA-seq, scATAC-seq, CRISPR screen, STARR-seq, MPRA, eCLIP, CUT&RUN, CUT&Tag
+**Assay types**: Histone ChIP-seq, TF ChIP-seq, ATAC-seq, DNase-seq, total RNA-seq, polyA plus RNA-seq, WGBS, intact Hi-C, scRNA-seq, snATAC-seq, snRNA-seq, CRISPR screen, STARR-seq, MPRA, eCLIP, CUT&RUN, CUT&Tag

 **Biosample hierarchy**: tissue > primary cell > cell line > in vitro differentiated > organoid

@@ -133,9 +133,9 @@ Provides 20 tools to search, download, and track ENCODE data:

 - ENCODE audits: ERROR > NOT_COMPLIANT > WARNING > INTERNAL_ACTION
 - ChIP-seq metrics: FRiP ≥1%, NSC >1.05, RSC >0.8, NRF ≥0.8 (Landt et al. 2012)
-- ATAC-seq metrics: TSS enrichment ≥6, fragment size nucleosomal ladder (Buenrostro et al. 2013)
-- RNA-seq: Mapping rate >80%, rRNA <10%, replicate correlation ≥0.9 (Conesa et al. 2016)
-- WGBS: Bisulfite conversion >99%, CpG coverage ≥10× for DMRs (Foox et al. 2021)
+- ATAC-seq metrics: TSS enrichment ≥5 (GRCh38), ≥6 (hg19), ≥10 (mm10), fragment size nucleosomal ladder (ENCODE data standards)
+- RNA-seq: Mapping rate 70-90% expected (Conesa et al. 2016), rRNA <10% (community standard), replicate correlation ≥0.9 isogenic / ≥0.8 anisogenic (ENCODE data standards)
+- WGBS: Bisulfite conversion ≥98%, CpG coverage ≥10× for DMRs (ENCODE data standards)
 - Hi-C: Cis/trans ratio >60%, long-range cis >40% (Yardimci et al. 2019)
 - CUT&RUN/CUT&Tag: Different QC profiles from ChIP-seq; use suspect list (Nordin et al. 2023)
 - Always use 2+ biological replicates

README.md

Lines changed: 3 additions & 3 deletions
@@ -4,13 +4,13 @@

 [![License: AGPL-3.0](https://img.shields.io/badge/license-AGPL--3.0-green.svg)](LICENSE)
 [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
-[![Version](https://img.shields.io/badge/version-0.3.0--beta.10-yellow)](CHANGELOG.md)
+[![Version](https://img.shields.io/badge/version-0.3.0-green)](CHANGELOG.md)
 [![Status](https://img.shields.io/badge/status-beta-yellow)]()
 [![Skills](https://img.shields.io/badge/skills-47-orange)](docs/skill-vignettes/)
 [![Tools](https://img.shields.io/badge/MCP_tools-20-purple)](src/encode_connector/server/main.py)
 [![Pipelines](https://img.shields.io/badge/pipelines-7-green)](skills/pipeline-chipseq/)
 [![Databases](https://img.shields.io/badge/databases-14-teal)](docs/SHOWCASE.md)<br>
-[![Tests](https://img.shields.io/badge/tests-506_passing-brightgreen)](tests/)
+[![Tests](https://img.shields.io/badge/tests-568_passing-brightgreen)](tests/)
 [![Coverage](https://img.shields.io/badge/coverage-98%25-brightgreen)](tests/)
 [![Security](https://img.shields.io/badge/security-no_telemetry-blue)]()
 [![Claude Code](https://img.shields.io/badge/Claude_Code-plugin-blueviolet)](https://claude.com/claude-code)
@@ -805,6 +805,6 @@ pytest

 ## License

-**Restrictive Non-Commercial License.** Free for personal, educational, and academic research. No derivative works without written permission. Commercial use requires a separate license. See [LICENSE](LICENSE) for full terms.
+AGPL-3.0. See [LICENSE](LICENSE) for full terms.

 For commercial licensing inquiries: ammawla@ucdavis.edu

agents/rnaseq-pipeline.md

Lines changed: 2 additions & 2 deletions
@@ -15,7 +15,7 @@ You are an ENCODE RNA-seq processing specialist. Guide users through the complet
 5. **QC Metrics**: RNA-SeQC for comprehensive quality assessment

 ## Quality Thresholds
-- Mapping rate > 80%
+- Mapping rate 70-90%
 - rRNA contamination < 10%
 - Replicate correlation (Spearman) >= 0.9
 - Strandedness verified
@@ -27,6 +27,6 @@ You are an ENCODE RNA-seq processing specialist. Guide users through the complet
 - Junction files (novel splice junctions)

 ## Tools
-Use `encode_search_experiments` with assay_title="RNA-seq" to find data.
+Use `encode_search_experiments` with assay_title="total RNA-seq" to find data.

 Refer to the pipeline-rnaseq skill for full Nextflow implementation.
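
To illustrate why the exact `assay_title` string matters, here is a small sketch that builds an ENCODE search URL by hand. It assumes the public `https://www.encodeproject.org/search/` endpoint with `type`, `assay_title`, `format`, and `limit` query parameters; the `build_search_url` helper is hypothetical and is not how `encode_search_experiments` is implemented in this repository.

```python
from urllib.parse import urlencode

# Assumed public search endpoint of the ENCODE portal.
ENCODE_SEARCH = "https://www.encodeproject.org/search/"


def build_search_url(assay_title: str, limit: int = 5) -> str:
    """Build an experiment search URL for one assay_title filter value."""
    params = {
        "type": "Experiment",
        "assay_title": assay_title,  # sent as given, so spelling must be right
        "format": "json",
        "limit": str(limit),
    }
    return f"{ENCODE_SEARCH}?{urlencode(params)}"


url = build_search_url("total RNA-seq")
```

The filter value is forwarded verbatim in the query string, which is why the commit replaces values such as "RNA-seq" and "scATAC-seq" that, per the commit message, did not match what the live API uses.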