--outFileNamePrefix ending in . is treated as a directory (STAR uses it as a literal string prefix) #26

@pinin4fjords

Summary

When --outFileNamePrefix ends with . (or, more generally, with anything other than a path separator), rustar-aligner treats the whole value as a directory name and writes bare-named outputs inside it. STAR treats the same value as a literal string prefix and writes prefixed files at the top level.

This breaks every wrapper that emits files via STAR's prefix convention (<sample>.Aligned.out.bam, <sample>.Log.final.out, ...). We had to add a post-processing step in the nf-core/rnaseq integration to flatten the directory back to STAR-style filenames so the downstream collect-channels would work.

STAR reference behaviour

STAR treats --outFileNamePrefix as a raw string prefix concatenated onto each output filename (no directory-component split). With --outFileNamePrefix SAMPLE. you get SAMPLE.Aligned.out.bam, SAMPLE.Log.final.out, SAMPLE._STARpass1/ etc. at the top level — see source/Parameters.cpp where outFileNamePrefix is parsed and concatenated.

Root cause in rustar

src/params.rs:346-347:

#[arg(long = "outFileNamePrefix", default_value = "./")]
pub out_file_name_prefix: PathBuf,

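Typing the prefix as PathBuf produces the directory-style behaviour: the prefix is joined with each output filename as a path component rather than concatenated as a string. A value such as WT. therefore becomes the directory WT./, and only prefixes that already end in / give the same result under both approaches.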
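
The difference can be reproduced with the standard library alone. The sketch below is illustrative and is not rustar's code: the helper names joined_output and concatenated_output are made up here, and it assumes output paths are built with PathBuf::join, as the description above suggests.

```rust
use std::path::PathBuf;

/// Directory-style: the prefix becomes its own path component,
/// so a separator is inserted between prefix and file name.
/// (Hypothetical helper, for illustration only.)
fn joined_output(prefix: &str, name: &str) -> PathBuf {
    PathBuf::from(prefix).join(name)
}

/// STAR-style: the prefix is glued onto the file name as plain text,
/// with no separator inserted.
/// (Hypothetical helper, for illustration only.)
fn concatenated_output(prefix: &str, name: &str) -> PathBuf {
    PathBuf::from(format!("{prefix}{name}"))
}
```

With the prefix WT. the first helper yields WT./Aligned.out.bam and the second yields WT.Aligned.out.bam. With the default ./ both produce ./Aligned.out.bam, which may be why the discrepancy goes unnoticed until a non-default prefix is used.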

Reproducer

#!/usr/bin/env bash
set -euo pipefail
mkdir -p /tmp/rustar-mre-26 && cd /tmp/rustar-mre-26

BASE=https://raw.githubusercontent.com/nf-core/test-datasets/626c8fab639062eade4b10747e919341cbf9b41a
curl -fsLO $BASE/reference/genome.fasta
curl -fsL  $BASE/reference/genes_with_empty_tid.gtf.gz | gunzip -c > genes.gtf
curl -fsLO $BASE/testdata/GSE110004/SRR6357072_1.fastq.gz
curl -fsLO $BASE/testdata/GSE110004/SRR6357072_2.fastq.gz

RUSTAR=ghcr.io/scverse/rustar-aligner:dev
STAR=community.wave.seqera.io/library/htslib_samtools_star_gawk:ae438e9a604351a4

mkdir -p idx-rustar idx-star
docker run --rm -v $PWD:/w -w /w $RUSTAR rustar-aligner --runMode genomeGenerate \
    --genomeDir idx-rustar --genomeFastaFiles genome.fasta --sjdbGTFfile genes.gtf \
    --sjdbOverhang 100 --genomeSAindexNbases 7
docker run --rm -v $PWD:/w -w /w $STAR STAR --runMode genomeGenerate \
    --genomeDir idx-star --genomeFastaFiles genome.fasta --sjdbGTFfile genes.gtf \
    --sjdbOverhang 100 --genomeSAindexNbases 7

COMMON=(--readFilesIn SRR6357072_1.fastq.gz SRR6357072_2.fastq.gz --readFilesCommand zcat
        --runThreadN 4 --outSAMtype BAM Unsorted)

docker run --rm -v $PWD:/w -w /w $RUSTAR rustar-aligner \
    --genomeDir idx-rustar "${COMMON[@]}" --outFileNamePrefix WT.
docker run --rm -v $PWD:/w -w /w $STAR STAR \
    --genomeDir idx-star "${COMMON[@]}" --outFileNamePrefix WT.

echo "=== Top-level entries matching 'WT' ==="
ls -d WT* 2>/dev/null
echo
echo "=== If WT. is a directory (rustar), list its contents ==="
[ -d WT. ] && ls -1 WT./

Observed (verified)

STAR (concatenated literal prefix):

WT.Aligned.out.bam
WT.Log.final.out
WT.Log.out
WT.Log.progress.out
WT.SJ.out.tab
WT._STARpass1/
WT._STARgenome/

rustar (WT. is a directory, files bare-named inside):

WT.

# contents of WT.:
Aligned.out.bam
Log.final.out
SJ.out.tab
SJ.pass1.out.tab

(Both listings come from the same run; each invocation was passed WT. as its --outFileNamePrefix value.)

Suggested fix

Treat out_file_name_prefix as a plain string and concatenate it with each output filename, matching STAR's behaviour. Two options:

  1. Change the type to String, build each output path as format!("{prefix}Aligned.out.bam") etc. Naturally handles both styles (SAMPLE. -> SAMPLE.Aligned.out.bam; out/SAMPLE_ -> out/SAMPLE_Aligned.out.bam).
  2. Keep PathBuf but explicitly split into (parent_dir, file_prefix) based on whether the value ends with a path separator vs a string suffix.
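
For comparison, option 2 might look roughly like the sketch below. It is an assumption-laden illustration, not a proposed patch: split_prefix and output_path are hypothetical names, and the split treats / as the only separator (Unix-style paths). It works on the raw string rather than on Path components, since Path normalises some trailing components and could lose part of the prefix.

```rust
use std::path::PathBuf;

/// Split a raw prefix at its last '/' into (directory part, file-name prefix).
/// Assumption: '/' is the only path separator.
fn split_prefix(raw: &str) -> (&str, &str) {
    match raw.rfind('/') {
        // Keep the separator with the directory part.
        Some(i) => (&raw[..=i], &raw[i + 1..]),
        // No separator at all: the whole value is a file-name prefix.
        None => ("", raw),
    }
}

/// Rebuild an output path from the two parts (hypothetical helper).
fn output_path(raw: &str, name: &str) -> PathBuf {
    let (dir, file_prefix) = split_prefix(raw);
    PathBuf::from(dir).join(format!("{file_prefix}{name}"))
}
```

Here WT. splits into an empty directory part and the file prefix WT., while out/SAMPLE_ splits into out/ and SAMPLE_. The end result matches plain concatenation for these inputs, at the cost of an extra splitting step that has to be kept correct.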

Option 1 is closer to STAR's implementation and avoids special-casing trailing characters.
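
A minimal sketch of option 1, under stated assumptions: OutPrefix, path_for, OUTPUT_NAMES and expected_outputs are hypothetical names that do not exist in rustar, and the clap attribute from src/params.rs is left out so the block has no dependencies. In the real struct the field would presumably just change from PathBuf to String.

```rust
use std::path::PathBuf;

/// Hypothetical newtype holding the raw `--outFileNamePrefix` value as text.
struct OutPrefix(String);

impl OutPrefix {
    /// Build an output path by gluing the prefix onto the file name,
    /// with no separator inserted.
    fn path_for(&self, name: &str) -> PathBuf {
        PathBuf::from(format!("{}{}", self.0, name))
    }
}

/// A few output names taken from the listings in this issue.
const OUTPUT_NAMES: [&str; 3] = ["Aligned.out.bam", "Log.final.out", "SJ.out.tab"];

/// All expected output paths for a given prefix.
fn expected_outputs(prefix: &OutPrefix) -> Vec<PathBuf> {
    OUTPUT_NAMES.iter().map(|name| prefix.path_for(name)).collect()
}
```

Both prefix styles mentioned in option 1 go through the same code path: SAMPLE. gives SAMPLE.Aligned.out.bam and out/SAMPLE_ gives out/SAMPLE_Aligned.out.bam, with no check on the trailing character.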

Why this matters

Every STAR-driven pipeline I'm aware of (nf-core/rnaseq, nf-core/scrnaseq, the various STAR Snakemake wrappers, raw scripts in countless papers) expects <prefix>Aligned.out.bam at a known location. Today those break silently: the output collection step either finds nothing matching the glob or matches the prefix-named directory rather than the files inside it.


Filed during nf-core/rnaseq integration testing (nf-core/rnaseq#1855). To find sibling issues from this exercise, filter by author:pinin4fjords or search for nf-core/rnaseq#1855.
