Summary
vepyr adds stop_lost alongside frameshift_variant when a frameshift extends past the original stop codon position. Ensembl VEP reports only frameshift_variant in these cases — it does not add stop_lost for frameshifts that naturally displace the stop.
Scale: 26 Consequence mismatches across chr1-22 (HG002 GRCh38 benchmark).
Root cause: The suppression block in transcript_consequence.rs (lines ~1630-1635) sets classification.stop_lost = false when frameshift_variant is present, but some code paths bypass it. This happens when the stop_lost check runs after the suppression block, or when the variant is classified through classify_coding_change_deletion(), which adds stop_lost independently. #90 sub-pattern C identified 3 cases; the full chr1-22 run reveals 26.
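One possible direction, sketched below, is to apply the suppression as a final normalization pass that runs after every classification path, so a late stop_lost assignment cannot survive. This is a minimal sketch only: the Classification struct and suppress_stop_lost_for_frameshift() are hypothetical stand-ins, not the actual types or functions in transcript_consequence.rs, and it assumes the rule "frameshift_variant always suppresses stop_lost" holds for every case VEP handles.

```rust
/// Hypothetical stand-in for vepyr's classification flags.
/// The real struct in transcript_consequence.rs has more fields and may differ.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Classification {
    pub frameshift_variant: bool,
    pub stop_lost: bool,
}

/// Final normalization pass (hypothetical helper).
/// Intended to be called once, after all code paths that can set flags,
/// including any path equivalent to classify_coding_change_deletion().
pub fn suppress_stop_lost_for_frameshift(c: &mut Classification) {
    if c.frameshift_variant {
        // Assumption taken from the issue: VEP reports only
        // frameshift_variant when a frameshift displaces the stop codon.
        c.stop_lost = false;
    }
}
```

Calling the helper at the single exit point of classification, rather than in the middle of it, makes the result independent of the order in which the individual checks run. A stop_lost flag on a non-frameshift variant is left untouched.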
Test cases (10 variants from different chromosomes)
All are frameshifts (deletions or insertions). Expected: frameshift_variant only. Actual: frameshift_variant&stop_lost.
# chrom pos ref alt
chr2 20598246 CT C
chr3 37984259 CA C
chr5 1428787 AG A
chr8 22619565 GGCAGTCC G
chr9 36003431 CTT C
chr13 77017068 CT C
chr15 40354319 CT C
chr20 145669 ACC A
chr21 30541662 AG A
chr22 19029764 GC G
Detailed examples
chr2:20598246 CT>C
vepyr: frameshift_variant&stop_lost
VEP: frameshift_variant
chr8:22619565 GGCAGTCC>G (2 transcripts)
vepyr: frameshift_variant&stop_lost
VEP: frameshift_variant
chr3:7691039 AT>A (with splice_region_variant + NMD)
vepyr: frameshift_variant&stop_lost&splice_region_variant&NMD_transcript_variant
VEP: frameshift_variant&splice_region_variant&NMD_transcript_variant
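The pattern in these examples can be checked mechanically on the '&'-joined Consequence strings, for instance when scanning benchmark output for remaining mismatches. The sketch below is illustrative only; both function names are hypothetical and not part of vepyr or VEP.

```rust
/// Returns true when an '&'-joined Consequence string contains both
/// frameshift_variant and stop_lost (the mismatch pattern in this issue).
pub fn has_frameshift_with_stop_lost(consequence: &str) -> bool {
    let mut frameshift = false;
    let mut stop_lost = false;
    for term in consequence.split('&') {
        match term.trim() {
            "frameshift_variant" => frameshift = true,
            "stop_lost" => stop_lost = true,
            _ => {}
        }
    }
    frameshift && stop_lost
}

/// Builds the string expected from VEP under the issue's rule:
/// drop stop_lost when frameshift_variant is present, keep term order.
pub fn expected_consequence(consequence: &str) -> String {
    if !has_frameshift_with_stop_lost(consequence) {
        return consequence.to_string();
    }
    consequence
        .split('&')
        .filter(|term| term.trim() != "stop_lost")
        .collect::<Vec<_>>()
        .join("&")
}
```

Applied to the chr3:7691039 example, expected_consequence() turns the vepyr string into the VEP string shown above, while a lone stop_lost is returned unchanged.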
Related issues