Surface non-SNV antigens (SPLICE / FUSION / CTA-SELF / ERV) from LENS reports #263

@iskandr

Status

Non-SNV antigens are currently dropped silently. ~28% of antigens in a real LENS report (Hugo IPRES Pt02: 613/2153 rows) have NaN `variant_coords` because their source isn't a single point mutation:

  • SPLICE (~6% of Pt02): aberrant splice junctions — coords live in `splice_coords` / `splice_description`.
  • FUSION (~0.1%): gene fusions — coords in `fusion_left_breakpoint` / `fusion_right_breakpoint` plus `fusion_left_gene` / `fusion_right_gene` / `fusion_type`.
  • CTA/SELF (~13%): cancer-testis antigens — no point mutation; tumor-specificity comes from expression pattern.
  • ERV (~10%): endogenous retroviruses — `erv_orf_id` and `erv_*` columns.
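
A minimal sketch of how the dropped rows could be tallied per source. It assumes each report row is available as a dict with `antigen_source` and `variant_coords` keys and that missing values appear as empty strings or NaN-like text. The helper names and the toy data (including the `SNV` label) are illustrative assumptions, not part of LENS or Vaxrank.

```python
import csv
import io
from collections import Counter

# Assumed encodings of a missing cell; adjust to the real report format.
MISSING = {"", "nan", "na", "none"}


def is_missing(value):
    """Return True if a cell is empty or a NaN-like placeholder."""
    return value is None or str(value).strip().lower() in MISSING


def count_missing_coords_by_source(rows):
    """Count rows lacking `variant_coords`, grouped by `antigen_source`."""
    counts = Counter()
    for row in rows:
        if is_missing(row.get("variant_coords")):
            counts[row.get("antigen_source", "UNKNOWN")] += 1
    return counts


# Toy data standing in for a LENS report; all values are made up.
toy_report = io.StringIO(
    "antigen_source\tvariant_coords\tsplice_coords\terv_orf_id\n"
    "SNV\tchr1:100\t\t\n"
    "SPLICE\tNaN\tchr2:500-900\t\n"
    "ERV\tNaN\t\tERV_0001\n"
    "ERV\tNaN\t\tERV_0002\n"
)
rows = list(csv.DictReader(toy_report, delimiter="\t"))
missing_by_source = count_missing_coords_by_source(rows)
print(dict(missing_by_source))
```

Running the same tally on a real report would show which sources account for the silently dropped rows.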

Why these matter

These are real tumor antigens with known therapeutic relevance:

  • Fusion-derived neoantigens are the canonical targets in fusion-driven cancers (e.g. EWSR1-FLI1 in Ewing sarcoma).
  • Vaccines targeting CTAs (e.g. NY-ESO-1, MAGE-A) have been evaluated in several clinical trials.
  • ERV-derived antigens are an active area for melanoma and other tumors.

Vaxrank data-model gap

`MutantProteinFragment` is shaped around point mutations (single `varcode.Variant`). To represent fusion / splice / CTA / ERV, we need either:

  1. Extend the data model: add an `antigen_source` field plus per-source provenance (e.g. `fusion_breakpoints: tuple`, `splice_junction: dict`, `cta_gene_id: str`, `erv_orf_id: str`). Each source type gets its own optional dataclass attached to the fragment.

  2. Polymorphic dispatch: a single `SourceProvenance` union type with subclasses (`SNVProvenance`, `SpliceProvenance`, `FusionProvenance`, `CTAProvenance`, `ERVProvenance`).

  3. Free-form metadata dict: `MutantProteinFragment.source_metadata: dict` — least invasive, least typed, most flexible.

Option 1 or 2 is the right long-term move; option 3 unlocks the data faster while we figure out the schema.
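
To make option 2 concrete, here is a rough sketch using plain dataclasses joined by a `typing.Union` alias rather than a shared base class. The class names come from the list above and the row keys from the LENS columns named in this issue. The `SNV` source label, the `gene_name` key used for CTA rows, and the `provenance_from_row` helper are assumptions for illustration, not existing Vaxrank or LENS API.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class SNVProvenance:
    variant_coords: str


@dataclass(frozen=True)
class SpliceProvenance:
    splice_coords: str
    splice_description: Optional[str] = None


@dataclass(frozen=True)
class FusionProvenance:
    left_gene: str
    right_gene: str
    left_breakpoint: str
    right_breakpoint: str
    fusion_type: Optional[str] = None


@dataclass(frozen=True)
class CTAProvenance:
    cta_gene_id: str


@dataclass(frozen=True)
class ERVProvenance:
    erv_orf_id: str


SourceProvenance = Union[
    SNVProvenance, SpliceProvenance, FusionProvenance, CTAProvenance, ERVProvenance
]


def provenance_from_row(row: dict) -> SourceProvenance:
    """Build the provenance object matching a row's `antigen_source`."""
    source = row["antigen_source"]
    if source == "SPLICE":
        return SpliceProvenance(row["splice_coords"], row.get("splice_description"))
    if source == "FUSION":
        return FusionProvenance(
            left_gene=row["fusion_left_gene"],
            right_gene=row["fusion_right_gene"],
            left_breakpoint=row["fusion_left_breakpoint"],
            right_breakpoint=row["fusion_right_breakpoint"],
            fusion_type=row.get("fusion_type"),
        )
    if source == "CTA/SELF":
        # `gene_name` is a hypothetical column; the real LENS key may differ.
        return CTAProvenance(cta_gene_id=row["gene_name"])
    if source == "ERV":
        return ERVProvenance(erv_orf_id=row["erv_orf_id"])
    if source == "SNV":  # assumed label for point-mutation rows
        return SNVProvenance(variant_coords=row["variant_coords"])
    raise ValueError(f"Unrecognized antigen_source: {source!r}")


erv_example = provenance_from_row({"antigen_source": "ERV", "erv_orf_id": "ERV_0001"})
print(erv_example)
```

Raising on an unrecognized source, instead of falling back to a default, keeps new source types from being dropped silently again.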

Acceptance

  • LENS rows with `antigen_source` ∈ {SPLICE, FUSION, CTA/SELF, ERV} produce `MutantProteinFragment`s with their source-specific provenance preserved.
  • Reports surface the antigen source so reviewers know what kind of neoantigen they're looking at.
  • Per-source filtering: vaccine designs can opt in / out of CTA, ERV, etc. independently.
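
The per-source opt-out could look roughly like the sketch below. `Candidate` is a hypothetical stand-in for a fragment tagged with its antigen source, `filter_by_source` is an illustrative helper rather than an existing Vaxrank function, and the peptide strings are placeholders.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a fragment tagged with its antigen source."""
    peptide: str
    antigen_source: str


def filter_by_source(
    candidates: Iterable[Candidate], excluded_sources: AbstractSet[str]
) -> List[Candidate]:
    """Keep candidates whose source is not excluded, preserving input order."""
    return [c for c in candidates if c.antigen_source not in excluded_sources]


# Placeholder peptides; only the source labels matter here.
candidates = [
    Candidate("AAAAAAAAA", "SNV"),
    Candidate("CCCCCCCCC", "CTA/SELF"),
    Candidate("DDDDDDDDD", "FUSION"),
    Candidate("EEEEEEEEE", "ERV"),
]
kept = filter_by_source(candidates, excluded_sources={"CTA/SELF", "ERV"})
print([c.antigen_source for c in kept])
```

An exclusion set keeps the default behavior inclusive, so a design sees every source unless it opts out explicitly.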

Related
