VarNova Benchmarks

Reproducible benchmarks comparing VarNova against ANNOVAR, VEP, and SnpEff for genomic variant annotation.

Paper: VarNova: a high-performance Rust-based genomic variant annotator (in preparation)
Download VarNova binary: Releases →

Results (48-core server, 62 GB RAM, NVMe + HDD)

Tier 1 - Gene Annotation Only

Tool	Wall Time	Speed (variants/s)	vs VarNova
VarNova 0.1	0.8 s	111,315 v/s	-
ANNOVAR 2020	7.3 s	12,198 v/s	9.1× slower
SnpEff 5.2a	53.6 s	1,661 v/s	67× slower
VEP 115.2	69.5 s	1,281 v/s	87× slower

Tier 2 - Full Pipeline (gene + gnomAD + ClinVar + dbSNP)

Tool	Wall Time	Speed (variants/s)	Output Variants	vs VarNova
VarNova 0.1	18.9 s	4,711 v/s	90,188	-
ANNOVAR 2020	263.8 s	337 v/s	90,188	14× slower
VEP 115.2	202.4 s	439 v/s	89,052	10.7× slower
SnpEff 5.2a	1,619 s	54 v/s	-	86× slower

Input: 89,052 variants (GATK HaplotypeCaller, exome, hg38)
Databases: gnomAD 4.1 exome (18 GB) + ClinVar 2024 (1 GB) + dbSNP 151 (29 GB)
VarNova used .vnidx binary databases on NVMe. ANNOVAR/VEP/SnpEff used standard formats on HDD.
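
The Speed and vs VarNova columns follow directly from the wall times and the 89,052-variant input. A minimal sketch of that arithmetic, using the Tier 2 wall times from the table; the function names are illustrative and not part of the benchmark scripts:

```python
# Derive throughput and relative slowdown from wall-clock times.
INPUT_VARIANTS = 89_052  # input size stated in this README

# Tier 2 wall times in seconds, copied from the results table.
TIER2_WALL_S = {
    "VarNova 0.1": 18.9,
    "ANNOVAR 2020": 263.8,
    "VEP 115.2": 202.4,
    "SnpEff 5.2a": 1619.0,
}


def throughput(variants: int, wall_s: float) -> float:
    """Variants annotated per second of wall time."""
    return variants / wall_s


def slowdown(wall_s: float, baseline_s: float) -> float:
    """How many times longer a run took than the baseline run."""
    return wall_s / baseline_s


if __name__ == "__main__":
    base = TIER2_WALL_S["VarNova 0.1"]
    for tool, wall in TIER2_WALL_S.items():
        print(f"{tool:<14} {throughput(INPUT_VARIANTS, wall):>8,.0f} v/s  "
              f"{slowdown(wall, base):5.1f}x")
```

The printed values are rounded, so the last digit can differ by one from the table.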

Run It Yourself

Requirements

Tool	Version	Required for
VarNova	≥ 0.1.0	All benchmarks
ANNOVAR	2020+	ANNOVAR comparison
VEP	≥ 110	VEP comparison
SnpEff	≥ 5.0	SnpEff comparison
GNU time	any	Timing
Python 3	≥ 3.8	Test VCF generation

Quick start (VarNova only, 5 minutes)

# 1. Clone benchmark repo
git clone https://github.com/imrobintomar/VarNova_Benchmarks.git
cd VarNova_Benchmarks

# 2. Download VarNova binary (Linux x86_64)
wget https://github.com/imrobintomar/VarNova_Benchmarks/releases/latest/download/varnova-linux-x86_64.tar.gz
tar -xzf varnova-linux-x86_64.tar.gz
mkdir -p ~/.local/bin
mv varnova ~/.local/bin/ && chmod +x ~/.local/bin/varnova
export PATH="$HOME/.local/bin:$PATH"
varnova --version

# 3. Generate test VCF (1000 synthetic variants - no real data needed)
python3 scripts/generate_test_vcf.py --output testdata/test.vcf --variants 1000

# 4. Copy config.sh and set DB to the path of your humandb/ directory
cp config.sh my_config.sh
# nano my_config.sh

# 5. Run (only VarNova is benchmarked if ANNOVAR/VEP/SnpEff are not installed)
bash scripts/run_benchmark.sh my_config.sh
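
For readers curious what a synthetic test VCF involves, here is a rough sketch of a generator for random SNVs. It is not the repository's scripts/generate_test_vcf.py; the function names, chromosome list, and position range are assumptions chosen for illustration, and the REF bases are random rather than taken from the hg38 reference.

```python
import random

BASES = "ACGT"
# Assumed chromosome set for the sketch: autosomes plus chrX.
CHROMS = [f"chr{i}" for i in range(1, 23)] + ["chrX"]
# Kept below the length of the shortest listed chromosome (chr21, ~46.7 Mb in hg38).
MAX_POS = 40_000_000


def synthetic_vcf_lines(n_variants, seed=0):
    """Yield header and record lines for a minimal VCF of random SNVs."""
    rng = random.Random(seed)  # fixed seed keeps the file reproducible
    yield "##fileformat=VCFv4.2"
    yield "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
    records = []
    for _ in range(n_variants):
        chrom_idx = rng.randrange(len(CHROMS))
        pos = rng.randint(1, MAX_POS)
        ref = rng.choice(BASES)
        alt = rng.choice([b for b in BASES if b != ref])
        records.append((chrom_idx, pos, ref, alt))
    # Sort by chromosome order, then position, as most tools expect.
    records.sort()
    for chrom_idx, pos, ref, alt in records:
        yield f"{CHROMS[chrom_idx]}\t{pos}\t.\t{ref}\t{alt}\t.\tPASS\t."


def write_vcf(path, n_variants, seed=0):
    """Write the synthetic VCF to disk."""
    with open(path, "w") as fh:
        for line in synthetic_vcf_lines(n_variants, seed):
            fh.write(line + "\n")
```

Because the REF alleles are random, a file like this is only suitable for timing and smoke tests, not for checking annotation correctness.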

Full benchmark (all 4 tools)

# Install all tools
bash scripts/setup.sh

# Download annotation databases (~60 GB)
# Edit config.sh first, then:
# --webfrom annovar fetches from the ANNOVAR download server
perl annovar/annotate_variation.pl --downdb --webfrom annovar refGene humandb/ --buildver hg38
perl annovar/annotate_variation.pl --downdb --webfrom annovar gnomad41_exome humandb/ --buildver hg38
perl annovar/annotate_variation.pl --downdb --webfrom annovar clinvar_20240730 humandb/ --buildver hg38
perl annovar/annotate_variation.pl --downdb --webfrom annovar avsnp151 humandb/ --buildver hg38

# (Optional) Convert to VarNova binary format for maximum speed
mkdir -p ~/varnova_db
varnova convert humandb/hg38_gnomad41_exome.txt --out-dir ~/varnova_db/
varnova convert humandb/hg38_clinvar_20240730.txt --out-dir ~/varnova_db/
varnova convert humandb/hg38_avsnp151.txt --out-dir ~/varnova_db/

# Run full benchmark
bash scripts/run_benchmark.sh

Expected output

results/YYYYMMDD_HHMMSS/
├── benchmark_report.txt    ← full timing log
├── results.csv             ← machine-readable results
├── varnova_gene.tsv        ← VarNova gene annotation
├── *.annotated.tsv         ← VarNova full pipeline output
├── annovar_gene.*
├── annovar_full.*
├── vep_*.vcf
└── snpeff_*.vcf
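
Since each run lands in its own timestamped directory, a small helper can pick the newest one and load results.csv. This is a sketch under the directory layout shown above; the column names inside results.csv are not documented here, so the code reads whatever header the file has rather than assuming any.

```python
import csv
from datetime import datetime
from pathlib import Path


def latest_run(results_root="results"):
    """Return the newest results/YYYYMMDD_HHMMSS directory, or None if absent."""
    root = Path(results_root)
    if not root.is_dir():
        return None
    runs = []
    for d in root.iterdir():
        if not d.is_dir():
            continue
        try:
            stamp = datetime.strptime(d.name, "%Y%m%d_%H%M%S")
        except ValueError:
            continue  # ignore directories that are not timestamped runs
        runs.append((stamp, d))
    return max(runs, key=lambda r: r[0])[1] if runs else None


def read_results(run_dir):
    """Load results.csv as a list of dicts keyed by the file's own header."""
    with open(Path(run_dir) / "results.csv", newline="") as fh:
        return list(csv.DictReader(fh))


if __name__ == "__main__":
    run = latest_run()
    if run is None:
        print("no benchmark runs found under results/")
    else:
        for row in read_results(run):
            print(row)
```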

What VarNova output includes

Every variant automatically gets standard ANNOVAR-compatible columns plus gene-disease context:

Func.refGene | Gene.refGene | ExonicFunc.refGene | AAChange.refGene
GenCC_Classification | ClinGen_Validity | OMIM_Inheritance | OMIM_Gene_MIM
HPO_Disease_Count | Disease_Names | PanelApp_Panels

Example for a BRCA1 variant:

exonic | BRCA1 | nonsynonymous SNV | BRCA1:NM_007294:exon10:c.981A>T:p.Lys327Asn
Definitive | Definitive | AD | 113705 | 84 | Breast-ovarian cancer;... | Hereditary BRCA(Green)
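
Downstream scripts often need the pieces of AAChange.refGene separately. A minimal parsing sketch, assuming the usual ANNOVAR convention of colon-separated fields with multiple transcripts joined by commas; the AAChange type and parse_aachange name are illustrative, not part of VarNova:

```python
from typing import List, NamedTuple


class AAChange(NamedTuple):
    gene: str
    transcript: str
    exon: str
    cdna: str
    protein: str


def parse_aachange(field: str) -> List[AAChange]:
    """Split an AAChange.refGene value into one record per transcript."""
    entries = []
    for item in field.split(","):
        parts = item.strip().split(":")
        if len(parts) != 5:
            continue  # skip placeholders such as "." and incomplete entries
        entries.append(AAChange(*parts))
    return entries


if __name__ == "__main__":
    example = "BRCA1:NM_007294:exon10:c.981A>T:p.Lys327Asn"
    for change in parse_aachange(example):
        print(change.gene, change.transcript, change.cdna, change.protein)
```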

Reproducing the Published Results

The hardware, storage layout, and VarNova command used:

Machine: 48-core CPU, 62 GB RAM, 7.3 TB HDD (82 MB/s) + 3.7 TB NVMe
OS: Ubuntu 22.04, kernel 6.8.0
VarNova databases: .vnidx on NVMe (~/varnova_db/)
ANNOVAR/VEP/SnpEff databases: standard format on HDD

# VarNova
varnova table \
  -i sample_89052variants_hg38.vcf \
  --gene-db humandb/hg38_refGene.txt \
  --filter-db hg38_gnomad41_exome.txt,hg38_clinvar_20240730.txt,hg38_avsnp151.txt \
  --cache-dir ~/varnova_db/ \
  --out-dir results/ \
  --threads 48 -v
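
The benchmark itself lists GNU time for timing. As an alternative illustration only, the sketch below measures wall time from Python and can drop the Linux page cache first to approximate a cold-cache run. The helper names are hypothetical, and writing to /proc/sys/vm/drop_caches requires root.

```python
import subprocess
import sys
import time


def drop_page_cache():
    """Flush dirty pages, then ask Linux to drop the page cache (needs root)."""
    subprocess.run(["sync"], check=True)
    with open("/proc/sys/vm/drop_caches", "w") as fh:
        fh.write("3\n")


def timed_run(cmd, cold=False):
    """Run cmd to completion and return elapsed wall-clock seconds."""
    if cold:
        drop_page_cache()
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in command; replace it with the varnova invocation shown above.
    elapsed = timed_run([sys.executable, "-c", "pass"])
    print(f"wall time: {elapsed:.2f} s")
```

Passing cold=True before each tool mirrors the stated practice of flushing the OS page cache between runs.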

Differences from Other Benchmarks

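This benchmark is designed to be fair and reproducible:

Same databases - all tools annotate against the same source data (gnomAD 4.1 + ClinVar 2024 + dbSNP 151); VarNova reads it as converted .vnidx files, the other tools in their standard formats
Same input VCF - identical variants for all tools
Same hardware - all tools run on the same machine in the same session; note that the VarNova databases sat on NVMe and the others on HDD, so the timings include that storage difference
Cold cache - first-run timing (OS page cache flushed between runs)
Output verification - output variant counts compared across tools (VarNova = ANNOVAR = 90,188; VEP = 89,052, equal to the input count)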
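
Checking output counts across tools can be done by counting data lines in each output file. The sketch below is one way to do that; it assumes VCF outputs mark headers with "#" and that the tabular outputs carry a single header row, which should be confirmed against the actual files.

```python
def count_vcf_records(path):
    """Count non-empty lines that are not '#' header lines."""
    with open(path) as fh:
        return sum(1 for line in fh if line.strip() and not line.startswith("#"))


def count_tsv_rows(path, header_rows=1):
    """Count non-empty lines, minus an assumed number of header rows."""
    with open(path) as fh:
        n = sum(1 for line in fh if line.strip())
    return max(n - header_rows, 0)


def counts_agree(counts):
    """True when every tool reports the same number of output variants."""
    return len(set(counts.values())) <= 1
```

For example, counts_agree({"VarNova": 90188, "ANNOVAR": 90188}) returns True, while adding VEP's 89,052 would make it return False.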

Contributing Your Results

Run the benchmark on your hardware and share results:

1. Fork this repo
2. Run bash scripts/run_benchmark.sh
3. Copy results/*/results.csv → community_results/<machine_name>.csv
4. Add specs to community_results/README.md
5. Open a PR
