--chimSegmentMin > 0 + --twopassMode Basic aborts the run when --outFileNamePrefix doesn't end in / #35

@pinin4fjords

Summary

rustar-aligner --chimSegmentMin > 0 combined with --twopassMode Basic (and a --outFileNamePrefix value that doesn't end in /) aborts the run before writing any output:

[INFO  rustar_aligner] Chimeric detection enabled (chimSegmentMin=12)
Error: I/O error: No such file or directory (os error 2) (SAMPLE./Chimeric.out.junction)

Caused by:
    No such file or directory (os error 2)

No outputs are produced — Aligned.out.bam, Log.final.out, SJ.out.tab, and the chim file are all missing. The run is a hard failure.

This is a separable bug from #26 (--outFileNamePrefix treated as a directory):

  • Without --twopassMode Basic: rustar treats <prefix> as a directory and writes all outputs (including Chimeric.out.junction) inside it. This prefix-as-directory behaviour is the bug tracked in #26 (an --outFileNamePrefix ending in . is treated as a directory, whereas STAR uses it as a literal string prefix), but it doesn't crash.
  • With --twopassMode Basic: rustar still tries to write to <prefix>/Chimeric.out.junction, but in two-pass mode the chim writer fires in pass 1 before the dir-creating step that normal-output writing implicitly triggers, so the parent directory doesn't exist yet and the file open fails.

Pipelines that wrap STAR commonly add --chimSegmentMin via extra_star_align_args together with the rest of the standard STAR-fusion / arriba flag set, which includes --twopassMode Basic. Any nf-core/rnaseq run that enables both will fail in a way that is hard to diagnose: Nextflow surfaces only the non-zero exit, and the error message points at a path issue, not a chim configuration issue.

STAR reference behaviour

STAR concatenates --outFileNamePrefix as a literal string when emitting <prefix>Chimeric.out.junction — no directory split, no parent-dir requirement, works regardless of whether the prefix ends in ., /, or any other character. See source/Chimeric.cpp and the file-opening glue in source/InOutStreams.cpp. The output path is built once at startup from P.outFileNamePrefix + "Chimeric.out.junction" and lives next to the other STAR outputs.
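
The difference between the two path-building strategies can be sketched in a few lines of Rust. This is an illustration only: the function names star_style_path and dir_style_path are hypothetical and are not taken from the STAR or rustar sources.

```rust
use std::path::PathBuf;

/// STAR-style (as described above): the prefix is a literal string that is
/// prepended to the output file name. Hypothetical helper, not rustar's API.
fn star_style_path(prefix: &str, file_name: &str) -> PathBuf {
    PathBuf::from(format!("{}{}", prefix, file_name))
}

/// Directory-style (the behaviour reported in this issue): the prefix is
/// treated as a directory and the file name is joined onto it.
/// Hypothetical helper, not rustar's API.
fn dir_style_path(prefix: &str, file_name: &str) -> PathBuf {
    PathBuf::from(prefix).join(file_name)
}
```

With the prefix RUS., the first helper yields RUS.Chimeric.out.junction and the second yields RUS./Chimeric.out.junction, which needs a directory named RUS. to exist. With a trailing-slash prefix such as RUSdir/, both helpers produce the same path, which is consistent with the workaround shown in the reproducer.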

Reproducer

#!/usr/bin/env bash
set -euo pipefail
mkdir -p /tmp/rustar-mre-chim && cd /tmp/rustar-mre-chim

BASE=https://raw.githubusercontent.com/nf-core/test-datasets/626c8fab639062eade4b10747e919341cbf9b41a
curl -fsLO $BASE/reference/genome.fasta
curl -fsL  $BASE/reference/genes_with_empty_tid.gtf.gz | gunzip -c > genes.gtf
curl -fsLO $BASE/testdata/GSE110004/SRR6357072_1.fastq.gz
curl -fsLO $BASE/testdata/GSE110004/SRR6357072_2.fastq.gz

RUSTAR=ghcr.io/scverse/rustar-aligner:dev
STAR=community.wave.seqera.io/library/htslib_samtools_star_gawk:ae438e9a604351a4

mkdir -p idx-rustar idx-star
docker run --rm -v $PWD:/w -w /w $RUSTAR rustar-aligner --runMode genomeGenerate \
    --genomeDir idx-rustar --genomeFastaFiles genome.fasta --sjdbGTFfile genes.gtf \
    --sjdbOverhang 100 --genomeSAindexNbases 7
docker run --rm -v $PWD:/w -w /w $STAR STAR --runMode genomeGenerate \
    --genomeDir idx-star --genomeFastaFiles genome.fasta --sjdbGTFfile genes.gtf \
    --sjdbOverhang 100 --genomeSAindexNbases 7

# Identical flag set, two aligners. Both prefixes lack a trailing slash.
COMMON=(--readFilesIn SRR6357072_1.fastq.gz SRR6357072_2.fastq.gz --readFilesCommand zcat
        --runThreadN 4 --sjdbGTFfile genes.gtf --twopassMode Basic --runRNGseed 0
        --outSAMtype BAM Unsorted --outSAMattributes NH HI AS NM MD
        --outSAMattrRGline ID:WT_REP2 SM:WT_REP2
        --quantMode TranscriptomeSAM --quantTranscriptomeSAMoutput BanSingleEnd
        --chimSegmentMin 12 --chimOutType Junctions)

echo "=== STAR (--outFileNamePrefix STAR.) ==="
docker run --rm -v $PWD:/w -w /w $STAR STAR \
    --genomeDir idx-star "${COMMON[@]}" --outFileNamePrefix STAR.
echo "--- STAR outputs ---"
ls STAR.*Chimeric.out.junction STAR.Aligned.out.bam STAR.Log.final.out 2>&1

echo
echo "=== rustar (--outFileNamePrefix RUS.) ==="
docker run --rm -v $PWD:/w -w /w $RUSTAR rustar-aligner \
    --genomeDir idx-rustar "${COMMON[@]}" --outFileNamePrefix RUS. 2>&1 | tail -8 || true
echo "--- rustar outputs (none expected; run failed) ---"
ls RUS.* 2>&1 || true
ls RUS./ 2>&1 || true

echo
echo "=== rustar with trailing-slash prefix (workaround) ==="
mkdir -p RUSdir
docker run --rm -v $PWD:/w -w /w $RUSTAR rustar-aligner \
    --genomeDir idx-rustar "${COMMON[@]}" --outFileNamePrefix RUSdir/ 2>&1 | tail -3
echo "--- rustar outputs ---"
ls RUSdir/

Observed (verified, fresh paired run on commit 5f8ad08)

STAR with --outFileNamePrefix STAR.:

STAR.Aligned.out.bam
STAR.Aligned.toTranscriptome.out.bam
STAR.Chimeric.out.junction              # <-- chim file at top level, prefixed
STAR.Log.final.out
STAR.Log.out
STAR.Log.progress.out
STAR.SJ.out.tab
STAR._STARgenome/
STAR._STARpass1/

rustar with --outFileNamePrefix RUS.:

[INFO  rustar_aligner] Reading paired-end from ...
[INFO  rustar_aligner] Chimeric detection enabled (chimSegmentMin=12)
Error: I/O error: No such file or directory (os error 2) (RUS./Chimeric.out.junction)

Caused by:
    No such file or directory (os error 2)

--- rustar outputs (none expected; run failed) ---
ls: cannot access 'RUS.*': No such file or directory

rustar with --outFileNamePrefix RUSdir/ (parent pre-created):

Aligned.out.bam
Chimeric.out.junction       # <-- correct 14-column format
Log.final.out
SJ.out.tab

So the data layer is fine — Chimeric.out.junction content matches STAR's format when the path is reachable. Only the path-building / parent-dir-creation step is broken, and only when chim output combines with two-pass mode.

Suggested fix

The chimeric-output writer appears to construct its path as <outFileNamePrefix>/Chimeric.out.junction, treating the prefix as a directory unconditionally — but in two-pass mode it fires before the directory has been created (in non-two-pass mode, an earlier output writer happens to create it first, masking the bug). Either:

  1. Route chimeric output through the same path-builder used for the main outputs (Aligned.out.bam, Log.final.out, ...). That builder still has the #26 problem (--outFileNamePrefix treated as a directory), so it writes inside the directory, but it does at least create the directory along the way, which is why the bare-prefix case works for those outputs today.
  2. Have the chim writer call std::fs::create_dir_all(parent_of(target_path)) before opening the output file. One-line fix in the chim-emission path.

The cleanest fix is #1 plus the broader #26 fix (treat prefix as a string concatenation, matching STAR). Then --outFileNamePrefix SAMPLE. yields SAMPLE.Chimeric.out.junction next to SAMPLE.Aligned.out.bam etc., matching STAR byte-for-byte.
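
Option 2 could look roughly like the sketch below. It assumes the chim writer ends up with a target path and opens it with File::create; the helper name create_with_parents is hypothetical, and the actual rustar code may be structured differently.

```rust
use std::fs::{self, File};
use std::io;
use std::path::Path;

/// Open `target` for writing, creating any missing parent directories first.
/// Hypothetical helper sketching option 2; not taken from the rustar sources.
fn create_with_parents(target: &Path) -> io::Result<File> {
    if let Some(parent) = target.parent() {
        // For a bare file name `parent()` is an empty path, so there is
        // nothing to create in that case.
        if !parent.as_os_str().is_empty() {
            fs::create_dir_all(parent)?;
        }
    }
    File::create(target)
}
```

This only removes the crash. The chim file would still land inside a directory named after the prefix until the string-concatenation behaviour from #26 is also addressed.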

Test plan

# After fix, both invocations should succeed and produce a chim file:
rustar-aligner --twopassMode Basic --chimSegmentMin 12 ... --outFileNamePrefix SAMPLE.
ls SAMPLE.Chimeric.out.junction   # currently does not exist; should after fix

rustar-aligner --twopassMode Basic --chimSegmentMin 12 ... --outFileNamePrefix dir/
ls dir/Chimeric.out.junction      # already works today; should still work
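
The ls checks above could also be expressed as a small regression helper, for example inside an integration test. The sketch below is an assumption about how such a check might be written: the name missing_outputs and the EXPECTED list are hypothetical, and the list is taken from the rustar output listing in this issue, not from the rustar test suite.

```rust
use std::path::PathBuf;

/// Output names taken from the rustar listing in this issue.
const EXPECTED: [&str; 4] = [
    "Aligned.out.bam",
    "Chimeric.out.junction",
    "Log.final.out",
    "SJ.out.tab",
];

/// Return the expected outputs that do not exist for a given prefix,
/// treating the prefix as a literal string (STAR semantics).
/// Hypothetical helper for a regression test.
fn missing_outputs(prefix: &str) -> Vec<PathBuf> {
    EXPECTED
        .iter()
        .map(|name| PathBuf::from(format!("{}{}", prefix, name)))
        .filter(|path| !path.is_file())
        .collect()
}
```

After a fixed run, missing_outputs("SAMPLE.") and missing_outputs("dir/") should both return an empty list.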

Why this matters

--chimSegmentMin is the standard way users enable fusion / chimeric detection on top of STAR — every STAR-fusion, arriba, and STAR-derived chimeric protocol begins with that flag. Most STAR wrappers (nf-core/rnaseq's extra_star_align_args, the STAR-fusion Snakemake wrapper, raw fusion-detection scripts) combine it with --twopassMode Basic and a STAR-prefix convention --outFileNamePrefix SAMPLE.. Today, any such invocation crashes against rustar before any output is written, with an error that doesn't obviously point at chim configuration.

Severity

Medium. nf-core/rnaseq doesn't currently pass --chimSegmentMin in its default test profile, so this isn't blocking the PR — but every user who adds it via extra_star_align_args will hit a silent failure mode that's hard to attribute without reading rustar source.


Filed during nf-core/rnaseq integration testing (nf-core/rnaseq#1855). To find the sibling issues from this exercise, filter by author:pinin4fjords or search for nf-core/rnaseq#1855.