When two variants land close to each other (same codon, same exon, same read span), their combined effect on the protein depends on whether they are in cis (same haplotype/allele) or in trans (different haplotypes). Varcode currently predicts each variant's effect independently against the reference, which silently assumes the variants do not interact.
Example: two SNVs in the same codon
Reference codon: `GCA` (Ala)
Variant A: position 1, G→T
Variant B: position 3, A→T
In cis (both on same allele): `GCA` → `TCT` (Ala → Ser). One substitution effect.
In trans (each on different allele): two separate substitutions. Variant A alone: `GCA` → `TCA` (Ala → Ser). Variant B alone: `GCA` → `GCT` (Ala → Ala, silent).
Varcode today reports variant A as `A→S` and variant B as `A→A` (silent), which matches the trans case. If the variants are actually in cis, the allele carries one doubly mutated codon (`GCA` → `TCT`) and the protein has a single `A→S` change at that codon, not two independent events.
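The codon example above can be checked with a few lines of Python. This is only an illustrative sketch: `apply_snvs` and the four-entry `CODON_TABLE` are hypothetical helpers written for this example, not part of varcode.

```python
# Minimal codon table: only the codons that appear in this example.
CODON_TABLE = {"GCA": "A", "GCT": "A", "TCA": "S", "TCT": "S"}


def apply_snvs(codon, snvs):
    """Return the codon after applying (1-based position, ref, alt) SNVs."""
    bases = list(codon)
    for pos, ref, alt in snvs:
        if bases[pos - 1] != ref:
            raise ValueError(f"expected {ref} at codon position {pos}")
        bases[pos - 1] = alt
    return "".join(bases)


REF_CODON = "GCA"
VARIANT_A = (1, "G", "T")
VARIANT_B = (3, "A", "T")

# In cis: both SNVs hit the same copy of the codon.
cis_codon = apply_snvs(REF_CODON, [VARIANT_A, VARIANT_B])

# In trans: each SNV hits a different copy, so each is applied alone.
trans_codons = [apply_snvs(REF_CODON, [v]) for v in (VARIANT_A, VARIANT_B)]

cis_aa = CODON_TABLE[cis_codon]
trans_aas = [CODON_TABLE[c] for c in trans_codons]
```

Here the cis haplotype yields one altered codon, while the trans case yields two differently altered codons, one per allele.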
Other scenarios
Compound heterozygotes: two damaging LoF variants in the same gene. In trans → full knockout. In cis → one functional copy remains. Clinical interpretation differs radically.
Frameshift rescue: a frameshift insertion plus a nearby frameshift deletion on the same allele can restore the reading frame when their net length change is a multiple of three. Predicted independently, each one shifts the frame.
Phased germline + somatic: a somatic variant in cis with a germline SNP (see Germline-aware effect prediction (umbrella) #268) alters the allele that already carries the SNP; in trans, it alters the other allele. The peptide context differs.
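The frameshift-rescue scenario reduces to simple arithmetic on allele lengths, sketched below. `frame_offset` is a hypothetical helper, and the check is deliberately narrow: it only tests whether the net length change is a multiple of three. It ignores the residues between the two indels, which still change, and any stop codon introduced in the shifted stretch.

```python
def frame_offset(indels):
    """Net reading-frame offset (0, 1, or 2) of indels on one haplotype.

    Each indel is a (ref, alt) pair of allele strings, as in VCF REF/ALT.
    """
    return sum(len(alt) - len(ref) for ref, alt in indels) % 3


insertion = ("A", "AT")  # inserts one base
deletion = ("GC", "G")   # deletes one base

# Predicted independently, each indel leaves the frame shifted.
independent_offsets = [frame_offset([insertion]), frame_offset([deletion])]

# Predicted jointly (in cis), the offsets cancel and the frame is restored.
joint_offset = frame_offset([insertion, deletion])
```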
Scope
Preserve phase information from VCFs: VCF `GT` fields distinguish phased (`0|1`) from unphased (`0/1`) genotypes. Currently varcode discards this in the metadata dict. First-class access would let effect prediction take advantage of it when available.
Phased effect prediction: when two or more variants overlap the same codon / same exon / a short window and are known to be in cis, predict their joint effect rather than independent effects. The result could be a `HaplotypeEffect` carrying multiple source variants.
Phase block awareness: modern variant callers and tools like WhatsHap produce phase-set blocks (`PS` tag). Variants within the same block are phased relative to each other; phase between variants in different blocks is unknown. Varcode should respect these blocks.
Unphased-with-evidence fallback: when no phasing information is available but two variants are close enough that long reads or paired-end reads could resolve phase, emit both the cis and trans predictions as candidates (ties in to the possibility-set model in Incorporate RNA-level evidence for variant effects #259).
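As a rough sketch of the scope items above, the following reads `GT` and `PS` from raw VCF FORMAT/sample strings and decides whether two heterozygous calls are in cis, in trans, or unresolved. `PhasedCall`, `parse_sample`, `relationship`, and `candidate_phasings` are hypothetical names, not existing varcode API. Treating two phased calls that both lack `PS` as one phase set is an assumption made here for simplicity.

```python
import re
from typing import NamedTuple, Optional, Tuple


class PhasedCall(NamedTuple):
    alleles: Tuple[Optional[int], ...]  # allele index per haplotype, None for "."
    phased: bool
    phase_set: Optional[str]


def parse_sample(format_field, sample_field):
    """Parse GT and PS out of VCF FORMAT and per-sample column strings."""
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    gt = values.get("GT", ".")
    alleles = tuple(None if a == "." else int(a) for a in re.split(r"[|/]", gt))
    phase_set = values.get("PS")
    if phase_set == ".":
        phase_set = None
    return PhasedCall(alleles, "|" in gt, phase_set)


def relationship(a, b):
    """Return 'cis', 'trans', or 'unknown' for two calls in one sample."""
    if not (a.phased and b.phased) or a.phase_set != b.phase_set:
        return "unknown"
    # Haplotype indices that carry a non-reference allele.
    haps_a = {i for i, allele in enumerate(a.alleles) if allele}
    haps_b = {i for i, allele in enumerate(b.alleles) if allele}
    if not haps_a or not haps_b:
        return "unknown"
    return "cis" if haps_a & haps_b else "trans"


def candidate_phasings(a, b):
    """Fallback: keep both possibilities when phase is unresolved."""
    rel = relationship(a, b)
    return ["cis", "trans"] if rel == "unknown" else [rel]


first = parse_sample("GT:PS", "0|1:1001")
same_hap = parse_sample("GT:DP:PS", "0|1:30:1001")
other_hap = parse_sample("GT:PS", "1|0:1001")
unphased = parse_sample("GT", "0/1")
other_block = parse_sample("GT:PS", "0|1:2002")
```

With these calls, `relationship(first, same_hap)` is `"cis"`, `relationship(first, other_hap)` is `"trans"`, and both the unphased call and the call from a different phase block fall back to the two-candidate list.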
Design questions
Data model: where does phase live? Attached to the variant (with the phase set ID), or on a new `Haplotype` object that groups variants from the same phase block?
Priority: should phased joint effects replace individual effects, or coexist with them? (I lean toward coexist — different consumers want different granularities.)
Multi-sample: phasing is per-sample. In a multi-sample VCF, the same variant may be phased differently across samples.
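One possible shape for the `Haplotype` option, sketched here only to make the multi-sample point concrete: key each haplotype by sample, phase set, and haplotype index, so the same pair of variants can be in cis in one sample and in trans in another. `VariantKey`, the `Haplotype` fields, and `group_haplotypes` are all hypothetical and do not reflect a settled design.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class VariantKey:
    contig: str
    pos: int
    ref: str
    alt: str


@dataclass
class Haplotype:
    sample: str
    phase_set: str
    index: int  # 0 or 1 in a diploid sample
    variants: List[VariantKey] = field(default_factory=list)


def group_haplotypes(calls):
    """Group (sample, phase_set, hap_index, variant) tuples into haplotypes."""
    groups = {}
    for sample, phase_set, index, variant in calls:
        key = (sample, phase_set, index)
        if key not in groups:
            groups[key] = Haplotype(sample, phase_set, index)
        groups[key].variants.append(variant)
    return list(groups.values())


v1 = VariantKey("chr1", 100, "G", "T")
v2 = VariantKey("chr1", 102, "A", "T")

calls = [
    ("tumor", "1001", 0, v1),   # tumor: v1 and v2 on the same haplotype (cis)
    ("tumor", "1001", 0, v2),
    ("normal", "1001", 0, v1),  # normal: v1 and v2 on opposite haplotypes (trans)
    ("normal", "1001", 1, v2),
]
haplotypes = group_haplotypes(calls)
```

Under this layout a joint effect would be computed per `Haplotype`, and individual per-variant effects could still coexist alongside it.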
Dependencies
Related prior art
Part of the #270 umbrella.