--outFileNamePrefix ending in . is treated as a directory (STAR uses it as a literal string prefix) #26

@pinin4fjords

Summary

When --outFileNamePrefix does not end with a path separator (for example WT. or SAMPLE_), rustar-aligner treats the whole value as a directory name and writes bare-named outputs inside it. STAR treats the same value as a literal string prefix and writes prefixed files at the top level.

This breaks every wrapper that emits files via STAR's prefix convention (<sample>.Aligned.out.bam, <sample>.Log.final.out, ...). We had to add a post-processing step in the nf-core/rnaseq integration to flatten the directory back to STAR-style filenames so the downstream collect-channels would work.

STAR reference behaviour

STAR treats --outFileNamePrefix as a raw string prefix concatenated onto each output filename (no directory-component split). With --outFileNamePrefix SAMPLE. you get SAMPLE.Aligned.out.bam, SAMPLE.Log.final.out, SAMPLE._STARpass1/ etc. at the top level — see source/Parameters.cpp where outFileNamePrefix is parsed and concatenated.

Root cause in rustar

src/params.rs:346-347:

#[arg(long = "outFileNamePrefix", default_value = "./")]
pub out_file_name_prefix: PathBuf,

Typing the prefix as PathBuf produces the directory-style behaviour: the prefix is joined with each output filename as a path component rather than concatenated as a string, so a separator is inserted even when the prefix (for example WT.) does not end in one.
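To illustrate the difference, here is a minimal standard-library sketch. It assumes the output paths are built with something like Path::join; the actual call sites in rustar-aligner are not shown in this issue, and the function names joined and concatenated are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Directory-style (assumed to mirror the current behaviour): `Path::join`
/// treats the prefix as a path component and inserts a separator when the
/// prefix does not already end in one.
fn joined(prefix: &str, name: &str) -> PathBuf {
    Path::new(prefix).join(name)
}

/// STAR-style: the prefix is a raw string glued onto the filename.
fn concatenated(prefix: &str, name: &str) -> PathBuf {
    PathBuf::from(format!("{prefix}{name}"))
}
```

On a Unix-like system, joined("WT.", "Aligned.out.bam") gives WT./Aligned.out.bam, while concatenated("WT.", "Aligned.out.bam") gives WT.Aligned.out.bam. For a prefix that already ends in a separator, such as the default ./, the two functions agree, which may be why the default does not expose the difference.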

Reproducer

#!/usr/bin/env bash
set -euo pipefail
mkdir -p /tmp/rustar-mre-26 && cd /tmp/rustar-mre-26

BASE=https://raw.githubusercontent.com/nf-core/test-datasets/626c8fab639062eade4b10747e919341cbf9b41a
curl -fsLO $BASE/reference/genome.fasta
curl -fsL  $BASE/reference/genes_with_empty_tid.gtf.gz | gunzip -c > genes.gtf
curl -fsLO $BASE/testdata/GSE110004/SRR6357072_1.fastq.gz
curl -fsLO $BASE/testdata/GSE110004/SRR6357072_2.fastq.gz

RUSTAR=ghcr.io/scverse/rustar-aligner:dev
STAR=community.wave.seqera.io/library/htslib_samtools_star_gawk:ae438e9a604351a4

mkdir -p idx-rustar idx-star
docker run --rm -v $PWD:/w -w /w $RUSTAR rustar-aligner --runMode genomeGenerate \
    --genomeDir idx-rustar --genomeFastaFiles genome.fasta --sjdbGTFfile genes.gtf \
    --sjdbOverhang 100 --genomeSAindexNbases 7
docker run --rm -v $PWD:/w -w /w $STAR STAR --runMode genomeGenerate \
    --genomeDir idx-star --genomeFastaFiles genome.fasta --sjdbGTFfile genes.gtf \
    --sjdbOverhang 100 --genomeSAindexNbases 7

COMMON=(--readFilesIn SRR6357072_1.fastq.gz SRR6357072_2.fastq.gz --readFilesCommand zcat
        --runThreadN 4 --outSAMtype BAM Unsorted)

docker run --rm -v $PWD:/w -w /w $RUSTAR rustar-aligner \
    --genomeDir idx-rustar "${COMMON[@]}" --outFileNamePrefix WT.
docker run --rm -v $PWD:/w -w /w $STAR STAR \
    --genomeDir idx-star "${COMMON[@]}" --outFileNamePrefix WT.

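echo "=== Top-level entries matching 'WT' ==="
ls -d WT* 2>/dev/null || true   # do not abort under `set -e` if nothing matches
echo
echo "=== If WT. is a directory (rustar), list its contents ==="
if [ -d WT. ]; then ls -1 WT./; fi   # `if` keeps the script's exit status 0 when WT. is absent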

Observed (verified)

STAR (concatenated literal prefix):

WT.Aligned.out.bam
WT.Log.final.out
WT.Log.out
WT.Log.progress.out
WT.SJ.out.tab
WT._STARpass1/
WT._STARgenome/

rustar (WT. is a directory, files bare-named inside):

WT.

# contents of WT.:
Aligned.out.bam
Log.final.out
SJ.out.tab
SJ.pass1.out.tab

(Both listings come from the same run of the reproducer; both invocations use WT. as the --outFileNamePrefix value.)

Suggested fix

Treat out_file_name_prefix as a string concatenation, matching STAR's behaviour. Two options:

  1. Change the type to String, build each output path as format!("{prefix}Aligned.out.bam") etc. Naturally handles both styles (SAMPLE. -> SAMPLE.Aligned.out.bam; out/SAMPLE_ -> out/SAMPLE_Aligned.out.bam).
  2. Keep PathBuf but explicitly split into (parent_dir, file_prefix) based on whether the value ends with a path separator vs a string suffix.

Option 1 is closer to STAR's implementation and avoids special-casing trailing characters.
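The two options can be sketched with the standard library alone. This is only an illustration under stated assumptions: output_path, split_prefix and output_path_split are hypothetical names, not existing rustar-aligner functions, and the sketch treats / as the only path separator.

```rust
use std::path::PathBuf;

/// Option 1 (hypothetical helper): keep the prefix as a string and
/// concatenate it with each output filename, as STAR does.
fn output_path(prefix: &str, name: &str) -> PathBuf {
    PathBuf::from(format!("{prefix}{name}"))
}

/// Option 2 (hypothetical helper): split the prefix at the last '/' into a
/// directory part (including the separator) and a filename prefix.
/// Assumes '/' is the only separator.
fn split_prefix(prefix: &str) -> (PathBuf, String) {
    match prefix.rfind('/') {
        Some(idx) => (
            PathBuf::from(&prefix[..=idx]),
            prefix[idx + 1..].to_string(),
        ),
        // No separator at all: the whole value is a filename prefix.
        None => (PathBuf::new(), prefix.to_string()),
    }
}

/// Option 2, continued: rebuild the output path from the two parts.
fn output_path_split(prefix: &str, name: &str) -> PathBuf {
    let (dir, file_prefix) = split_prefix(prefix);
    dir.join(format!("{file_prefix}{name}"))
}
```

For prefixes such as SAMPLE., out/SAMPLE_ and ./, both helpers should produce the same path, so the choice is mostly about how much special-casing the code has to carry. Neither sketch creates missing directories; whether to do so for a prefix like out/SAMPLE_ is a separate decision not covered here.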

Why this matters

Every STAR-driven pipeline I'm aware of (nf-core/rnaseq, nf-core/scrnaseq, the various STAR Snakemake wrappers, raw scripts in countless papers) expects <prefix>Aligned.out.bam at a known location. Today those break silently: the output collection step either finds nothing matching the glob or matches the prefix-named directory itself rather than a file.


Filed during nf-core/rnaseq integration testing (nf-core/rnaseq#1855). To find the sibling issues from this exercise, filter by author:pinin4fjords or search for nf-core/rnaseq#1855.
