- Every phase uses differential testing against STAR where applicable
- Test data tiers: synthetic micro-genome → chr22 → full human genome

-**Current test status**: 278/278 tests passing (274 unit + 4 integration), 0 clippy warnings
+**Current test status**: 364/364 tests passing (359 unit + 5 integration), 0 clippy warnings

## Known Issues — Disagreement Root Causes (10k SE yeast)

-**127 total position disagreements — ALL verified as genuine ties** (confirmed via STAR debug tracing):
+**299 total position disagreements — ALL verified as genuine ties** (SA-order ties + seeded-RNG tie-break divergence from STAR's mt19937):

-Both tools find identical alignment sets for all 127 disagreements. The primary difference is tie-breaking order (SA iteration order). Neither alignment is more correct than the other.
+Both tools find identical alignment sets for all 299 disagreements. The primary alignment differs because of either SA iteration order or RNG seed divergence (PR #5: `--runRNGseed`; ruSTAR uses `StdRng`, not `mt19937`).

-- **100 diff-chr ties** — same set of alignments, different repeat copy chosen as primary.
-- **27 same-chr ties** — same alignment set, different primary due to tie-breaking (includes multi-intron reads where both tools find same 2 alignments but select different primaries).
+- **100+ diff-chr ties** — same set of alignments, different repeat copy chosen as primary.
+- **27+ same-chr ties** — same alignment set, different primary due to tie-breaking.
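To illustrate how an identical alignment set can still yield a different primary, here is a minimal Rust sketch. `Aln`, `tied_best`, `primary_by_order` and `primary_by_seed` are hypothetical names, not ruSTAR's API, and the single xorshift64 step only stands in for a seeded generator: it reproduces neither `StdRng` nor STAR's `mt19937`.

```rust
/// Hypothetical alignment record; field names are illustrative only.
#[allow(dead_code)]
#[derive(Debug)]
struct Aln {
    chr: &'static str,
    pos: u64,
    score: i32,
}

/// Indices of all alignments tied at the best score, in input (e.g. SA) order.
fn tied_best(alns: &[Aln]) -> Vec<usize> {
    let Some(best) = alns.iter().map(|a| a.score).max() else {
        return Vec::new();
    };
    (0..alns.len()).filter(|&i| alns[i].score == best).collect()
}

/// SA-order tie-break: the first tied alignment encountered wins.
fn primary_by_order(alns: &[Aln]) -> Option<usize> {
    tied_best(alns).first().copied()
}

/// Seeded tie-break. One xorshift64 step is a stand-in RNG here
/// (NOT StdRng and NOT mt19937); the same seed always gives the same pick.
fn primary_by_seed(alns: &[Aln], seed: u64) -> Option<usize> {
    let tied = tied_best(alns);
    if tied.is_empty() {
        return None;
    }
    // xorshift64 must not start from 0, so substitute a fixed non-zero state.
    let mut x: u64 = if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed };
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    Some(tied[(x % tied.len() as u64) as usize])
}
```

Both strategies choose from the same tied set returned by `tied_best`; only the winner changes when the iteration order or the generator changes, which is the kind of divergence counted above.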
**1 CIGAR-only disagreement (same position, different CIGAR):**
- `ERR12389696.13573895`: both tools align to XV:218357 with MAPQ=255, but ruSTAR gives `100M1I45M4S` (insertion at read pos 100) while STAR gives `108M1I37M4S` (insertion at read pos 108). Root cause: both alignments score AS=133. The 71-base seed is found at RC pos 29 (ruSTAR) vs RC pos 37 (STAR) because the Lmapped chains take different paths through a long homopolymer region. Same diagonal, different seed start → different insertion placement. Seed-level tie.
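The two CIGARs above can be checked mechanically. This Rust sketch uses hypothetical helper names and the SAM-spec rules for which operations consume read and reference bases; it shows that both strings consume 150 read bases and 145 reference bases and differ only in where the 1-base insertion starts.

```rust
/// Parse a CIGAR string such as "100M1I45M4S" into (length, op) pairs.
fn parse_cigar(cigar: &str) -> Result<Vec<(u32, char)>, String> {
    let mut ops = Vec::new();
    let mut len: Option<u32> = None;
    for c in cigar.chars() {
        if let Some(d) = c.to_digit(10) {
            len = Some(len.unwrap_or(0) * 10 + d);
        } else {
            let n = len.take().ok_or_else(|| format!("op '{c}' has no length"))?;
            if !"MIDNSHP=X".contains(c) {
                return Err(format!("unknown CIGAR op '{c}'"));
            }
            ops.push((n, c));
        }
    }
    if len.is_some() {
        return Err("trailing length without an operation".to_string());
    }
    Ok(ops)
}

/// (read bases consumed, reference bases consumed) per the SAM spec.
fn consumed_lengths(ops: &[(u32, char)]) -> (u32, u32) {
    ops.iter().fold((0, 0), |(q, r), &(n, op)| match op {
        'M' | '=' | 'X' => (q + n, r + n),
        'I' | 'S' => (q + n, r),
        'D' | 'N' => (q, r + n),
        _ => (q, r), // H and P consume neither
    })
}

/// 0-based read offsets at which each insertion starts.
fn insertion_offsets(ops: &[(u32, char)]) -> Vec<u32> {
    let mut q = 0;
    let mut out = Vec::new();
    for &(n, op) in ops {
        if op == 'I' {
            out.push(q);
        }
        if matches!(op, 'M' | '=' | 'X' | 'I' | 'S') {
            q += n;
        }
    }
    out
}
```

For `100M1I45M4S` and `108M1I37M4S`, `consumed_lengths` returns `(150, 145)` for both, while `insertion_offsets` returns `[100]` and `[108]`.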
@@ -161,22 +161,19 @@ Previously listed issues now resolved:
See [ROADMAP.md](ROADMAP.md) and [docs/](docs/) for full issue tracking.
1. `try_pair_transcripts` now computes a STAR-faithful combined-span penalty: `combined_wt_score = t1.score + t2.score - p1 - p2 + combined_p`. Previously, per-mate span penalties were applied twice, making the AS tag wrong for 99.6% of PE reads (now 3.1%).
-2. Decision tree reordered to (1) position dedup → (2) score-range filter → (3) TooManyLoci → (4) quality filter. Fixes 12 half-mapped pairs.
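A minimal Rust sketch of the combined-span formula, assuming each per-mate score already includes its own non-positive span penalty (`p1`, `p2`), so the formula swaps them for a single pair-level `combined_p`. `MateWt` and `double_applied` are hypothetical; the latter is only one illustrative way to double-count penalties, not necessarily the exact previous bug.

```rust
/// Hypothetical per-mate window transcript: `score` already includes
/// this mate's own span penalty (`span_penalty`, assumed <= 0).
struct MateWt {
    score: i32,
    span_penalty: i32,
}

/// combined_wt_score = t1.score + t2.score - p1 - p2 + combined_p
/// Per-mate penalties are backed out, then the pair-level penalty is applied once.
fn combined_wt_score(t1: &MateWt, t2: &MateWt, combined_p: i32) -> i32 {
    t1.score + t2.score - t1.span_penalty - t2.span_penalty + combined_p
}

/// For contrast (hypothetical): keeps the per-mate penalties and also adds
/// the pair penalty, so span is penalized twice.
fn double_applied(t1: &MateWt, t2: &MateWt, combined_p: i32) -> i32 {
    t1.score + t2.score + combined_p
}
```

With `t1 = {score: 70, p1: -1}`, `t2 = {score: 62, p2: -1}` and `combined_p = -2`, the formula gives 132 and the double-counting variant gives 130, a 2-point AS gap.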
+**Phase G2** (2026-04-29): `MAX_RECURSION` 10,000→100,000 + `sa_pos_to_forward` underflow fix. `ERR12389696.7118031` was the sole source of both the NH diffs and the MAPQ inflations (NH=3 vs STAR's NH=9; ruSTAR MAPQ=1 vs STAR MAPQ=0). Root cause: the 47-WA rDNA cluster (4 copies × multiple seeds per mate) exhausted the 10k recursion budget before exploring the 4th within-copy pair. Fix: raise the per-cluster recursion budget from 10k to 100k. Also fixed: a `sa_pos_to_forward` underflow panic for reverse-strand seeds near the genome boundary (now uses `saturating_sub`). Also added a guard in `finalize_transcript` that rejects WTs where `adjusted_genome_start + ref_len > n_genome`.
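A sketch of the two boundary fixes, under an assumed coordinate convention: a seed of `len` bases at reverse-strand position `sa_pos` maps to forward start `n_genome - (sa_pos + len)`. The bodies are illustrative and may not match ruSTAR's actual `sa_pos_to_forward` or `finalize_transcript`; `transcript_fits` is a hypothetical name.

```rust
/// Assumed reverse-strand mapping: forward start = n_genome - (sa_pos + len).
/// `saturating_sub` clamps to 0 instead of panicking (debug builds) or
/// wrapping (release builds) when the seed runs past the genome boundary.
fn sa_pos_to_forward(sa_pos: u64, len: u64, n_genome: u64, reverse: bool) -> u64 {
    if reverse {
        n_genome.saturating_sub(sa_pos.saturating_add(len))
    } else {
        sa_pos
    }
}

/// Guard in the spirit of the `finalize_transcript` check: accept a window
/// transcript only if its reference span ends within the genome.
fn transcript_fits(adjusted_genome_start: u64, ref_len: u64, n_genome: u64) -> bool {
    adjusted_genome_start
        .checked_add(ref_len)
        .is_some_and(|end| end <= n_genome)
}
```

`checked_add` also rejects starts so large that the end coordinate itself would overflow `u64`.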

-**Current PE parity**: 8767 vs STAR 8390 (+377 extra, mostly rDNA N² cross-copy pairs). 236 half-mapped. 28 MAPQ inflations / 192 deflations (rDNA N² problem). `.18919121` fixed (Phase 17.B). `.6302610` still FP.
+**Phase G1** (2026-04-29): `split_combined_wt` junction_idx fix. Reduced `.16980960`'s pairs from 11 to 9, matching STAR's NH=9.

-## Remaining Limitations (Top 5)
+**Note on faithfulness change**: Phase F1 (the `--runRNGseed` PR) changed PE tie-breaking from SA order to a seeded `StdRng`, which increased tie-breaking diffs. Phase G1 improved faithfulness from 99.755% to 99.865%, and Phase G2 from 99.865% to 99.883%.
- No `--outStd SAM/BAM` (stdout output) — Phase 17.6
-- No `--outReadsUnmapped Fastx` — Phase 17.4
- No STARsolo single-cell features — Phase 14 (deferred)
+- 4 PE AS diffs (ruSTAR improvements, not bugs), e.g. `.844151` finds VIII:451791 with 0 mismatches vs STAR's VII:1001391 with 6; `.4972950` finds the correct spliced mate2 where STAR reports it unspliced. In both of these cases, STAR's combined-window approach fails to stitch the PE pair at the better location.
See [docs/phase17_features.md](docs/phase17_features.md) for full feature status.
CI runs on Linux (x86_64, x86-64-v3, aarch64), macOS (aarch64), and Windows (x86_64). PRs must pass all CI checks before merging.
+
+## Test data
+
+Small synthetic and yeast test data lives in `test/`. Integration tests in `tests/` use the synthetic genome. Differential testing against STAR reference outputs is done via `test/compare_sam.py` and `test/compare_pe.py`.
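For a sense of what the position comparison involves, here is a rough Rust analogue of a single-end primary-position diff. It is not `test/compare_sam.py`; it assumes single-end SAM text, keys records by QNAME, and ignores unmapped (0x4), secondary (0x100) and supplementary (0x800) records. A PE comparison would additionally need to separate mates by FLAG 0x40/0x80.

```rust
use std::collections::HashMap;

/// Primary, mapped records of a single-end SAM text: QNAME -> (RNAME, POS).
fn primary_positions(sam: &str) -> HashMap<String, (String, u64)> {
    let mut out = HashMap::new();
    for line in sam.lines() {
        if line.starts_with('@') || line.trim().is_empty() {
            continue; // header or blank line
        }
        let f: Vec<&str> = line.split('\t').collect();
        if f.len() < 4 {
            continue;
        }
        let Ok(flag) = f[1].parse::<u16>() else { continue };
        // Skip unmapped (0x4), secondary (0x100) and supplementary (0x800).
        if flag & (0x4 | 0x100 | 0x800) != 0 {
            continue;
        }
        let Ok(pos) = f[3].parse::<u64>() else { continue };
        out.insert(f[0].to_string(), (f[2].to_string(), pos));
    }
    out
}

/// Reads mapped by both tools whose primary (RNAME, POS) differs, sorted by name.
fn position_disagreements(a: &str, b: &str) -> Vec<String> {
    let (pa, pb) = (primary_positions(a), primary_positions(b));
    let mut diffs: Vec<String> = pa
        .iter()
        .filter(|(name, loc)| pb.get(*name).is_some_and(|other| other != *loc))
        .map(|(name, _)| name.clone())
        .collect();
    diffs.sort();
    diffs
}
```

Reads mapped by only one tool are not counted here; they would need a separate mapped/unmapped tally.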
+
+## Project history
+
+ruSTAR was written as a faithful port of [STAR](https://github.com/alexdobin/STAR) by Alexander Dobin. Up to the initial release, the goal was behavioral parity with STAR — matching its algorithms, thresholds, and output formats as closely as possible. Notes from that development phase are in `docs/dev/`.
+
+Future development is not bound by that constraint. Adding STARsolo, new features, or diverging from STAR behavior is entirely welcome.