Boundary-codon reconstruction for exon skip protein math #298

@iskandr

Problem

`_build_in_frame_exon_skip_effect` in `varcode/splice_outcomes.py` (PR #292) assumes the skipped exon begins and ends on codon boundaries when computing the resulting amino-acid deletion:

```python
aa_start = (exon_start_in_tx - cds_start_offset) // 3
n_aa_removed = exon_length // 3
aa_end = aa_start + n_aa_removed
aa_ref = str(transcript.protein_sequence[aa_start:aa_end])
```

When an exon actually starts mid-codon (i.e. the previous exon contributes 1 or 2 bases to the boundary codon), the math counts codons correctly (integer division truncates), but the boundary codon itself is not reconstructed. After the skip, the joined transcript's boundary codon is assembled from the last bases of the exon before the skip and the first bases of the exon after the skip, so it may translate to a different amino acid than either of the flanking reference codons.

Today the candidate reports the wrong `aa_ref` at the seam and doesn't express the boundary-codon substitution at all; the in-frame case collapses to a pure `Deletion`, when biologically it's a `Deletion` with a seam substitution.
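
A toy illustration of the discrepancy, as a sketch only: the three exon sequences below are invented for this example (they are not taken from any real transcript), and `translate` is a hypothetical stand-in for whatever translation helper varcode uses. The skipped exon is 6 bases long (in-frame) but starts at codon phase 1, so the floor-division math reports a clean two-residue deletion while the real post-skip protein has a new residue at the seam.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a CDS string, stopping before the first stop codon."""
    residues = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


# Invented exons: exon1 ends 1 base into a codon, exon2 (skipped) is 6 bases.
exon1, exon2, exon3 = "ATGGCTG", "ACAAAT", "GGAAATAA"

ref_protein = translate(exon1 + exon2 + exon3)
skip_protein = translate(exon1 + exon3)  # what the cell would actually make

# Same arithmetic as the snippet quoted above.
cds_start_offset = 0
exon_start_in_tx = len(exon1)
aa_start = (exon_start_in_tx - cds_start_offset) // 3
n_aa_removed = len(exon2) // 3
aa_end = aa_start + n_aa_removed
naive_aa_ref = ref_protein[aa_start:aa_end]
naive_protein = ref_protein[:aa_start] + ref_protein[aa_end:]

# The seam codon joins the last base of exon1 with the first two of exon3.
seam_codon = exon1[-1] + exon3[:2]

print(ref_protein, skip_protein, naive_protein, naive_aa_ref, seam_codon)
```

In this toy case the pure-deletion model drops two residues and keeps the downstream boundary residue unchanged, whereas the retranslated sequence shows three reference residues replaced by the single residue encoded by the seam codon.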

The out-of-frame skip path (`_build_out_of_frame_exon_skip_effect`) is less affected because it retranslates the full post-skip cDNA to the first stop codon, which naturally picks up the new boundary codon. However, it uses an ad-hoc `_ExonSkipFrameshiftEffect` shim rather than varcode's standard `FrameShift` class.
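
For comparison, a minimal sketch of the "retranslate to first stop" idea on invented sequences. This is not the code in `_build_out_of_frame_exon_skip_effect`; `translate_to_first_stop` and all sequences here are hypothetical. The skipped exon is 5 bases long, so the downstream frame shifts and the new stop codon is found in what was originally 3' UTR.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_to_first_stop(cdna_from_start_codon):
    """Translate from the start codon until the first in-frame stop codon.

    Trailing bases that do not form a full codon are ignored.
    """
    residues = []
    for i in range(0, len(cdna_from_start_codon) - 2, 3):
        aa = CODON_TABLE[cdna_from_start_codon[i:i + 3]]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


# Invented sequences: a 5-base skipped exon shifts the downstream frame.
exon1, exon2, exon3 = "ATGGCTG", "ACAAA", "TGGAAATAA"
utr3 = "GCTAGCC"

ref_protein = translate_to_first_stop(exon1 + exon2 + exon3 + utr3)
mutant_protein = translate_to_first_stop(exon1 + exon3 + utr3)

print(ref_protein, mutant_protein)
```

Because the whole joined sequence is retranslated, the seam codon is handled without any special casing, which is why this path is less affected.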

Scope

  • In-frame exon skip: reconstruct the boundary codon from the cDNA of the flanking exons and emit a `ComplexSubstitution` (or equivalent) at the seam in addition to the `Deletion` of the interior AAs. Alternatively, represent the whole thing as a single `ComplexSubstitution` whose `aa_alt` replaces the skipped AAs plus the reshaped boundary codon.
  • Out-of-frame exon skip: consider reusing `FrameShift` directly rather than the `_ExonSkipFrameshiftEffect` shim. The existing shim computes `mutant_protein_sequence` correctly but doesn't integrate with type-based downstream consumers.
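
One possible way to derive the single-`ComplexSubstitution` representation for the in-frame case, sketched under assumptions: given the reference protein and the protein translated from the post-skip CDS, trim the shared prefix and suffix and report what remains. The function name `minimal_aa_change` and its return shape are hypothetical and not part of varcode. When the seam codon happens to encode the same residue as before, `aa_alt` comes back empty and the change is a plain deletion.

```python
def minimal_aa_change(ref_protein, alt_protein):
    """Return (aa_offset, aa_ref, aa_alt) after trimming shared ends.

    aa_offset is the 0-based index of the first differing residue.
    """
    shortest = min(len(ref_protein), len(alt_protein))

    # Count residues shared at the N-terminal end.
    n_prefix = 0
    while n_prefix < shortest and ref_protein[n_prefix] == alt_protein[n_prefix]:
        n_prefix += 1

    # Count residues shared at the C-terminal end, without overlapping the prefix.
    n_suffix = 0
    max_suffix = shortest - n_prefix
    while (n_suffix < max_suffix
           and ref_protein[-1 - n_suffix] == alt_protein[-1 - n_suffix]):
        n_suffix += 1

    aa_ref = ref_protein[n_prefix:len(ref_protein) - n_suffix]
    aa_alt = alt_protein[n_prefix:len(alt_protein) - n_suffix]
    return n_prefix, aa_ref, aa_alt


# Seam codon changes the boundary residue: deletion plus substitution.
print(minimal_aa_change("MADKWK", "MAGK"))
# Seam codon preserves the boundary residue: aa_alt is empty (pure deletion).
print(minimal_aa_change("MADKWK", "MAWK"))
```

A design consequence of this approach is that the in-frame and seam-changing cases share one code path, and the effect class can be chosen afterwards from whether `aa_alt` is empty.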
