Motivation
Clonality is one of the most decision-relevant orthogonal signals for neoantigen prioritization. Clonal neoantigens (present in essentially all tumor cells) predict immune-checkpoint-blockade response and elicit reactive T cells; subclonal neoantigens are far weaker targets, and for vaccine design a subclonal target covers only a fraction of the tumor (McGranahan et al., Science 2016).
ScanNeo2 already reports per-variant VAF, but raw VAF is not clonality: it is confounded by tumor purity and local copy number, so a variant in a copy-gained or low-purity region is mis-scored. This issue adds a proper cancer cell fraction (CCF) estimate and a clonal/subclonal classification.
Scope — light
ScanNeo2 will not call copy number or purity itself (no ASCAT/Sequenza/FACETS integration; that would be a much larger build). Instead, the user supplies per-sample tumor purity and a copy-number segments file produced by whatever CNV caller they already use. ScanNeo2 then computes CCF from VAF, purity, and copy number. Computing purity/CNV in-pipeline is a possible full-scope follow-up.
CCF formula
For a variant at a locus with tumor total copy number CN_t:
CCF = VAF * ( p * CN_t + (1 - p) * CN_n ) / ( p * m )
VAF: observed variant allele frequency (already available per variant)
p: tumor purity (user-supplied, per sample)
CN_t: tumor total copy number at the locus (from the user's CNV file)
CN_n: normal copy number (2 for autosomes; 1 at hemizygous loci such as sex chromosomes)
m: mutation multiplicity (number of copies carrying the mutation)
Classification: clonal if CCF >= threshold (default ~0.9, configurable), else subclonal.
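A minimal Python sketch of the formula and the threshold rule above. The function names `compute_ccf` and `classify_clonality` are hypothetical and not part of ScanNeo2. Returning `None` or an empty string for missing inputs is an assumption that mirrors the "leave the columns empty" behaviour described in this issue.

```python
from typing import Optional


def compute_ccf(vaf: Optional[float], purity: Optional[float],
                cn_tumor: Optional[int], cn_normal: int = 2,
                multiplicity: int = 1) -> Optional[float]:
    """CCF = VAF * (p * CN_t + (1 - p) * CN_n) / (p * m).

    Returns None when an input is missing or unusable (assumed convention).
    """
    if vaf is None or purity is None or cn_tumor is None:
        return None
    if not (0.0 < purity <= 1.0) or multiplicity < 1:
        return None
    total = purity * cn_tumor + (1.0 - purity) * cn_normal
    return vaf * total / (purity * multiplicity)


def classify_clonality(ccf: Optional[float], threshold: float = 0.9) -> str:
    """Return 'clonal' or 'subclonal'; empty string when CCF is unavailable."""
    if ccf is None:
        return ""
    return "clonal" if ccf >= threshold else "subclonal"


# Diploid locus, purity 0.5, VAF 0.2, m = 1: CCF = 0.2 * 2 / 0.5 = 0.8
example_ccf = compute_ccf(vaf=0.2, purity=0.5, cn_tumor=2)
example_call = classify_clonality(example_ccf)  # below 0.9, so "subclonal"
```

With VAF 0.25, purity 0.5, CN_t = 4 and m = 1 the same formula gives 1.5, which is why the multiplicity choice matters at copy-gained loci.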
Implementation plan
Per-sample inputs: purity (scalar 0-1) and cnv (segments file). These belong in the #93 sample sheet (Multi-sample support: process multiple samples in one run via a sample sheet) as purity and cnv columns; until #93 lands they can be config keys. If either is absent for a sample, the CCF/clonality columns are left empty: graceful, no failure (optional feature).
CNV lookup helper: given (chrom, pos), return total copy number from the segments file. Accept a generic TSV (chrom, start, end, total_cn); document how to derive it from common callers (ASCAT/Sequenza/FACETS). A position with no covering segment ⇒ assume CN_n (diploid) or leave empty.
CCF computation: slot into the prioritization stage where VAF is already in hand (variants.py populates vaf; compute alongside or in compile.py/filtering.py). Apply the formula per variant.
Multiplicity (m): the fuzzy part. v1 simplification: pick m in 1..CN_t giving CCF closest to but not exceeding 1; default m = 1 when ambiguous. Document the assumption; a probabilistic estimate is a follow-up.
Classification: add a clonality value (clonal / subclonal) from a configurable CCF threshold.
Output: two new columns, ccf and clonality, in {vartype}_{mhc_class}_neoepitopes.txt.
Ranking integration: report-only columns in v1; do not change the existing ranking_score. Whether to fold clonality into the score is a separate decision (it would shift all existing scores).
Tests: unit-test the CCF formula with synthetic (VAF, purity, CN_t, m) cases of known CCF, including hemizygous loci and missing-input handling.
Open design decisions
Folding clonality into ranking_score: v1 = report-only.
Dependencies
The purity/cnv inputs fit most naturally as sample-sheet columns from #93 (Multi-sample support: process multiple samples in one run via a sample sheet). Can ship with config keys first if #93 is not yet merged.
Scope note
Standalone feature, not part of the 2026-03-26 audit cluster. Full-scope in-pipeline purity/CNV calling (ASCAT/Sequenza-style) is explicitly out of scope here and a possible separate follow-up.
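The CNV lookup helper and the v1 multiplicity rule from the implementation plan could be sketched roughly as follows. Everything here is an assumption for illustration: the function names (`load_segments`, `lookup_total_cn`, `choose_multiplicity`) are hypothetical, segments are assumed to be non-overlapping with 1-based inclusive coordinates, and the TSV is assumed to carry a header row with the columns chrom, start, end, total_cn.

```python
import bisect
import csv
import io
from collections import defaultdict
from typing import Dict, List, Optional, TextIO, Tuple

Segment = Tuple[int, int, int]  # (start, end, total_cn), 1-based inclusive (assumed)


def load_segments(handle: TextIO) -> Dict[str, List[Segment]]:
    """Read a generic TSV with header: chrom, start, end, total_cn."""
    segments: Dict[str, List[Segment]] = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        segments[row["chrom"]].append(
            (int(row["start"]), int(row["end"]), int(row["total_cn"]))
        )
    for chrom in segments:
        segments[chrom].sort()  # sorted by start so bisect can be used
    return dict(segments)


def lookup_total_cn(segments: Dict[str, List[Segment]],
                    chrom: str, pos: int) -> Optional[int]:
    """Return total_cn of the segment covering pos, or None if uncovered."""
    chrom_segments = segments.get(chrom, [])
    # Index of the last segment whose start is <= pos.
    key = (pos, float("inf"), float("inf"))
    idx = bisect.bisect_right(chrom_segments, key) - 1
    if idx >= 0:
        start, end, total_cn = chrom_segments[idx]
        if start <= pos <= end:
            return total_cn
    return None  # caller decides: assume CN_n or leave the columns empty


def choose_multiplicity(vaf: float, purity: float,
                        cn_tumor: int, cn_normal: int = 2) -> int:
    """v1 rule: smallest m in 1..CN_t with CCF <= 1; fall back to m = 1.

    CCF shrinks as m grows, so the smallest m with CCF <= 1 is the one
    whose CCF is closest to, but not exceeding, 1.
    """
    ccf_at_m1 = vaf * (purity * cn_tumor + (1.0 - purity) * cn_normal) / purity
    for m in range(1, cn_tumor + 1):
        if ccf_at_m1 / m <= 1.0:
            return m
    return 1  # ambiguous: no m keeps CCF <= 1


demo_tsv = "chrom\tstart\tend\ttotal_cn\nchr1\t1\t1000\t2\nchr1\t1001\t5000\t4\n"
demo_segments = load_segments(io.StringIO(demo_tsv))
demo_cn = lookup_total_cn(demo_segments, "chr1", 2500)      # 4
demo_m = choose_multiplicity(0.25, 0.5, demo_cn)            # 2 (m = 1 would give CCF 1.5)
```

Returning `None` for an uncovered position keeps both options from the plan open (assume CN_n, or leave the columns empty). A strict `<= 1.0` comparison is sensitive to floating-point noise near CCF = 1, so a small tolerance may be worth adding in a real implementation.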