You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When stitching an annotated splice junction, rustar adds the motif score AND the sjdb_score bonus to the alignment score. STAR uses one OR the other (replacement, not addition) — sjdb_score for annotated junctions, motif_score for unannotated ones. Net effect: rustar over-credits annotated junctions by motif_score (e.g. +0 for GT/AG → no visible difference, -4 for GC/AG → wrong by 4, -8 for AT/AC or non-canonical → wrong by 8).
Currently invisible because of #27 (annotated lookup always returns false so the additive branch never fires). The moment #27 lands (currently in PR #45), this becomes the dominant remaining AS-divergence source on annotated splices.
Surfaced during the AS-divergence investigation on #33.
source/Transcript_alignScore.cpp (and the matching logic in Stitch.cpp) treats annotated and unannotated junctions as mutually exclusive scoring paths:
if (sjAnnot == 1) {
score += sjdbScore; // annotated: replace motif score with sjdb bonus
} else {
score += motifScore; // unannotated: motif score only
}
So an annotated GT/AG junction in STAR contributes +sjdbScore (default +2), not motif_score (+0) + sjdbScore (+2) = +2. For motifs where motif_score != 0 this matters:
Motif
motif_score (STAR default)
rustar additive
STAR replacement
Per-junction error
GT/AG (canonical)
0
+0 + +2 = +2
+2
0 (invisible)
GC/AG
-4
-4 + +2 = -2
+2
4
AT/AC
-8
-8 + +2 = -6
+2
8
Non-canonical
-8
-8 + +2 = -6
+2
8
(unannotated GT/AG)
0
+0
+0
0
Most annotated junctions in well-curated annotations are canonical GT/AG, so the visible impact is small on typical data. But for any genome with appreciable GC/AG, AT/AC, or rescue-by-novel junctions in the sjdb, rustar's AS is systematically off.
Suggested fix
Switch to replacement semantics matching STAR:
d_score += if is_annotated {
scorer.sjdb_score}else{
motif_score
};
Or equivalently:
if is_annotated {
d_score += scorer.sjdb_score;}else{
d_score += motif_score;}
4-line patch. Gated on #27 — until that lands, is_annotated is always false and the change has no observable effect (the else branch is identical to today's unconditional d_score += motif_score).
Run on a paired-end input with a known mix of annotated motif types (or use the yeast test profile — most splices are canonical GT/AG, so the per-record delta on this profile is small).
Compare AS values against STAR's on identical-CIGAR records. On annotated GT/AG records the delta should be 0; on annotated GC/AG records the delta should drop from -4 (pre-fix) to 0.
Severity
Medium-low. Score-only divergence on annotated junctions, observable only after #27 lands. Doesn't affect alignment selection (since the relative ordering of candidates is preserved — both annotated branches gain +motif_score - sjdb_score relative to STAR, so the ranking among annotated alternatives is consistent). Affects any downstream tool that consumes the absolute AS value (multi-mapper EM weights, score-threshold filters keyed on STAR's AS distribution).
Filed during nf-core/rnaseq integration testing (nf-core/rnaseq#1855). All sibling issues from this exercise: author:pinin4fjords or grep for nf-core/rnaseq#1855.
Summary
When stitching an annotated splice junction, rustar adds the motif score AND the
sjdb_scorebonus to the alignment score. STAR uses one OR the other (replacement, not addition) —sjdb_scorefor annotated junctions,motif_scorefor unannotated ones. Net effect: rustar over-credits annotated junctions bymotif_score(e.g.+0for GT/AG → no visible difference,-4for GC/AG → wrong by 4,-8for AT/AC or non-canonical → wrong by 8).Currently invisible because of #27 (annotated lookup always returns
falseso the additive branch never fires). The moment #27 lands (currently in PR #45), this becomes the dominant remaining AS-divergence source on annotated splices.Surfaced during the AS-divergence investigation on #33.
Current rustar code
src/align/stitch.rs:1316-1319:STAR reference behaviour
source/Transcript_alignScore.cpp(and the matching logic inStitch.cpp) treats annotated and unannotated junctions as mutually exclusive scoring paths:So an annotated GT/AG junction in STAR contributes
+sjdbScore(default+2), notmotif_score (+0) + sjdbScore (+2) = +2. For motifs wheremotif_score != 0this matters:motif_score(STAR default)+0 + +2 = +2+2-4 + +2 = -2+2-8 + +2 = -6+2-8 + +2 = -6+2+0+0Most annotated junctions in well-curated annotations are canonical GT/AG, so the visible impact is small on typical data. But for any genome with appreciable GC/AG, AT/AC, or rescue-by-novel junctions in the sjdb, rustar's AS is systematically off.
Suggested fix
Switch to replacement semantics matching STAR:
Or equivalently:
4-line patch. Gated on #27 — until that lands,
is_annotatedis alwaysfalseand the change has no observable effect (theelsebranch is identical to today's unconditionald_score += motif_score).Test plan
Log.final.outalways reportsNumber of splices: Annotated (sjdb) = 0despite--sjdbGTFfile#27 + this PR applied.-4(pre-fix) to0.Severity
Medium-low. Score-only divergence on annotated junctions, observable only after #27 lands. Doesn't affect alignment selection (since the relative ordering of candidates is preserved — both annotated branches gain
+motif_score - sjdb_scorerelative to STAR, so the ranking among annotated alternatives is consistent). Affects any downstream tool that consumes the absoluteASvalue (multi-mapper EM weights, score-threshold filters keyed on STAR'sASdistribution).Filed during nf-core/rnaseq integration testing (nf-core/rnaseq#1855). All sibling issues from this exercise:
author:pinin4fjordsor grep fornf-core/rnaseq#1855.