DeepVariant Complete Genomics T7 case study

In this case study, we describe applying DeepVariant to a Complete Genomics T7 sample. Then we assess the quality of the DeepVariant variant calls with hap.py.

To make it faster to run over this case study, we run only on chromosome 20.

For how to prepare environment, the steps are the same as this doc.

Download Reference

We will be using GRCh38 for this case study.

mkdir -p reference

FTPDIR=ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids

curl ${FTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz | gunzip > reference/GRCh38_no_alt_analysis_set.fasta
curl ${FTPDIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.fai > reference/GRCh38_no_alt_analysis_set.fasta.fai

Download Complete Genomics T7 HG001 chr20 BAM

mkdir -p input

HTTPDIR=https://storage.googleapis.com/deepvariant/complete-case-study-testdata

curl ${HTTPDIR}/HG001.complete_t7.E100030471QC960.grch38.chr20.bam > input/HG001.complete_t7.E100030471QC960.grch38.chr20.bam

curl ${HTTPDIR}/HG001.complete_t7.E100030471QC960.grch38.chr20.bam.bai > input/HG001.complete_t7.E100030471QC960.grch38.chr20.bam.bai

Download Genome in a Bottle Benchmarks for HG001

mkdir -p benchmark

FTPDIR=ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/release/NA12878_HG001/NISTv4.2.1/GRCh38

curl ${FTPDIR}/HG001_GRCh38_1_22_v4.2.1_benchmark.bed > benchmark/HG001_GRCh38_1_22_v4.2.1_benchmark.bed
curl ${FTPDIR}/HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz > benchmark/HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz
curl ${FTPDIR}/HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz.tbi > benchmark/HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz.tbi

Download Complete Genomics T7 model

HTTPDIR=https://storage.googleapis.com/deepvariant/complete-case-study-testdata

curl ${HTTPDIR}/complete-t7/weights-51-0.995354.ckpt.data-00000-of-00001 > input/weights-51-0.995354.ckpt.data-00000-of-00001

curl ${HTTPDIR}/complete-t7/weights-51-0.995354.ckpt.index > input/weights-51-0.995354.ckpt.index

curl ${HTTPDIR}/complete-t7/example_info.json > input/example_info.json

curl ${HTTPDIR}/complete-t7/model.example_info.json > input/model.example_info.json

Running DeepVariant with one command

On a CPU-only machine:

mkdir -p output
mkdir -p output/intermediate_results_dir

BIN_VERSION="1.10.0"

sudo docker run \
  -v "${PWD}/input":"/input" \
  -v "${PWD}/output":"/output" \
  -v "${PWD}/reference":"/reference" \
  google/deepvariant:"${BIN_VERSION}" \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type WGS \
  --ref /reference/GRCh38_no_alt_analysis_set.fasta \
  --reads /input/HG001.complete_t7.E100030471QC960.grch38.chr20.bam \
  --output_vcf /output/HG001.output.vcf.gz \
  --output_gvcf /output/HG001.output.g.vcf.gz \
  --num_shards $(nproc) \
  --regions chr20 \
  --intermediate_results_dir /output/intermediate_results_dir \
  --customized_model /input/weights-51-0.995354.ckpt

For running on GPU machines, or using Singularity instead of Docker, see Quick Start.

Benchmark on chr20

mkdir -p happy

sudo docker pull jmcdani20/hap.py:v0.3.12

sudo docker run \
  -v "${PWD}/benchmark":"/benchmark" \
  -v "${PWD}/input":"/input" \
  -v "${PWD}/output":"/output" \
  -v "${PWD}/reference":"/reference" \
  -v "${PWD}/happy:/happy" \
  jmcdani20/hap.py:v0.3.12 /opt/hap.py/bin/hap.py \
  /benchmark/HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz \
  /output/HG001.output.vcf.gz \
  -f /benchmark/HG001_GRCh38_1_22_v4.2.1_benchmark.bed \
  -r /reference/GRCh38_no_alt_analysis_set.fasta \
  -o /happy/happy.output \
  --engine=vcfeval \
  --pass-only \
  -l chr20

Output:

Benchmarking Summary:
Type Filter  TRUTH.TOTAL  TRUTH.TP  TRUTH.FN  QUERY.TOTAL  QUERY.FP  QUERY.UNK  FP.gt  FP.al  METRIC.Recall  METRIC.Precision  METRIC.Frac_NA  METRIC.F1_Score  TRUTH.TOTAL.TiTv_ratio  QUERY.TOTAL.TiTv_ratio  TRUTH.TOTAL.het_hom_ratio  QUERY.TOTAL.het_hom_ratio
INDEL    ALL         9974      9946        28        21045         8      10743      2      5       0.997193          0.999223        0.510478         0.998207                     NaN                     NaN                   1.630447                   2.158659
INDEL   PASS         9974      9946        28        21045         8      10743      2      5       0.997193          0.999223        0.510478         0.998207                     NaN                     NaN                   1.630447                   2.158659
  SNP    ALL        69175     68876       299        85012        45      16048      8      2       0.995678          0.999347        0.188773         0.997509                2.288757                2.083179                   1.730097                   1.781651
  SNP   PASS        69175     68876       299        85012        45      16048      8      2       0.995678          0.999347        0.188773         0.997509                2.288757                2.083179                   1.730097                   1.781651

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DeepVariant Complete Genomics T7 case study

Download Reference

Download Complete Genomics T7 HG001 chr20 BAM

Download Genome in a Bottle Benchmarks for HG001

Download Complete Genomics T7 model

Running DeepVariant with one command

Benchmark on chr20

FilesExpand file tree

deepvariant-complete-t7-case-study.md

Latest commit

History

deepvariant-complete-t7-case-study.md

File metadata and controls

DeepVariant Complete Genomics T7 case study

Download Reference

Download Complete Genomics T7 HG001 chr20 BAM

Download Genome in a Bottle Benchmarks for HG001

Download Complete Genomics T7 model

Running DeepVariant with one command

Benchmark on chr20