Summary
When --outFileNamePrefix ends with . (or, more generally, with anything other than a path separator), rustar-aligner treats it as a directory name and writes bare-named outputs inside it. STAR treats the same value as a literal string prefix and writes prefixed files at the top level.
This breaks every wrapper that emits files via STAR's prefix convention (<sample>.Aligned.out.bam, <sample>.Log.final.out, ...). We had to add a post-processing step in the nf-core/rnaseq integration to flatten the directory back to STAR-style filenames so the downstream collect-channels would work.
STAR reference behaviour
STAR treats --outFileNamePrefix as a raw string prefix concatenated onto each output filename (no directory-component split). With --outFileNamePrefix SAMPLE. you get SAMPLE.Aligned.out.bam, SAMPLE.Log.final.out, SAMPLE._STARpass1/ etc. at the top level — see source/Parameters.cpp where outFileNamePrefix is parsed and concatenated.
Root cause in rustar
src/params.rs:346-347:
#[arg(long = "outFileNamePrefix", default_value = "./")]
pub out_file_name_prefix: PathBuf,
Typing the prefix as PathBuf produces the directory-style behaviour: the prefix is joined with each output filename as a path component rather than concatenated as a string, so a prefix ending in . is handled the same way as one ending in /.
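The difference can be reproduced with the standard library alone. This is a minimal sketch, not rustar's actual code: the helper names joined and concatenated are hypothetical, and it assumes the output paths are built with Path::join.

```rust
use std::path::{Path, PathBuf};

/// Directory-style (hypothetical helper): the prefix becomes a path component.
fn joined(prefix: &str, name: &str) -> PathBuf {
    Path::new(prefix).join(name)
}

/// STAR-style (hypothetical helper): the prefix is glued onto the filename as plain text.
fn concatenated(prefix: &str, name: &str) -> PathBuf {
    PathBuf::from(format!("{prefix}{name}"))
}

fn main() {
    // join() inserts a separator, so "WT." ends up as a directory.
    println!("{}", joined("WT.", "Aligned.out.bam").display());
    // Concatenation keeps prefix and name in a single filename.
    println!("{}", concatenated("WT.", "Aligned.out.bam").display());
}
```

With a prefix that already ends in a separator, such as the default ./, both helpers produce the same path, which is presumably why the difference only shows up with STAR-style prefixes.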
Reproducer
#!/usr/bin/env bash
set -euo pipefail
mkdir -p /tmp/rustar-mre-26 && cd /tmp/rustar-mre-26
BASE=https://raw.githubusercontent.com/nf-core/test-datasets/626c8fab639062eade4b10747e919341cbf9b41a
curl -fsLO $BASE/reference/genome.fasta
curl -fsL $BASE/reference/genes_with_empty_tid.gtf.gz | gunzip -c > genes.gtf
curl -fsLO $BASE/testdata/GSE110004/SRR6357072_1.fastq.gz
curl -fsLO $BASE/testdata/GSE110004/SRR6357072_2.fastq.gz
RUSTAR=ghcr.io/scverse/rustar-aligner:dev
STAR=community.wave.seqera.io/library/htslib_samtools_star_gawk:ae438e9a604351a4
mkdir -p idx-rustar idx-star
docker run --rm -v $PWD:/w -w /w $RUSTAR rustar-aligner --runMode genomeGenerate \
--genomeDir idx-rustar --genomeFastaFiles genome.fasta --sjdbGTFfile genes.gtf \
--sjdbOverhang 100 --genomeSAindexNbases 7
docker run --rm -v $PWD:/w -w /w $STAR STAR --runMode genomeGenerate \
--genomeDir idx-star --genomeFastaFiles genome.fasta --sjdbGTFfile genes.gtf \
--sjdbOverhang 100 --genomeSAindexNbases 7
COMMON=(--readFilesIn SRR6357072_1.fastq.gz SRR6357072_2.fastq.gz --readFilesCommand zcat
--runThreadN 4 --outSAMtype BAM Unsorted)
docker run --rm -v $PWD:/w -w /w $RUSTAR rustar-aligner \
--genomeDir idx-rustar "${COMMON[@]}" --outFileNamePrefix WT.
docker run --rm -v $PWD:/w -w /w $STAR STAR \
--genomeDir idx-star "${COMMON[@]}" --outFileNamePrefix WT.
echo "=== Top-level entries matching 'WT' ==="
ls -d WT* 2>/dev/null
echo
echo "=== If WT. is a directory (rustar), list its contents ==="
[ -d WT. ] && ls -1 WT./
Observed (verified)
STAR (concatenated literal prefix):
WT.Aligned.out.bam
WT.Log.final.out
WT.Log.out
WT.Log.progress.out
WT.SJ.out.tab
WT._STARpass1/
WT._STARgenome/
rustar (WT. is a directory, files bare-named inside):
WT.
# contents of WT.:
Aligned.out.bam
Log.final.out
SJ.out.tab
SJ.pass1.out.tab
(Both listings come from the same run of the reproducer, and both invocations pass the identical value WT. to --outFileNamePrefix.)
Suggested fix
Treat out_file_name_prefix as a string concatenation, matching STAR's behaviour. Two options:
- Change the type to String and build each output path as format!("{prefix}Aligned.out.bam") etc. This naturally handles both styles (SAMPLE. -> SAMPLE.Aligned.out.bam; out/SAMPLE_ -> out/SAMPLE_Aligned.out.bam).
- Keep PathBuf but explicitly split it into (parent_dir, file_prefix) based on whether the value ends with a path separator vs a string suffix.
Option 1 is closer to STAR's implementation and avoids special-casing trailing characters.
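As a rough sketch of the two options (the function names output_path and split_prefix are hypothetical and not taken from rustar's source), Option 1 is a single format! call, while Option 2 has to decide where the directory part ends:

```rust
use std::path::PathBuf;

/// Option 1 (hypothetical helper): keep the prefix as a string and
/// concatenate it with each output name.
fn output_path(prefix: &str, name: &str) -> PathBuf {
    PathBuf::from(format!("{prefix}{name}"))
}

/// Option 2 (hypothetical helper): split the prefix into a directory part
/// and a filename prefix at the last '/'.
/// Assumption: only '/' is treated as a separator in this sketch.
fn split_prefix(prefix: &str) -> (PathBuf, String) {
    match prefix.rfind('/') {
        Some(i) => (PathBuf::from(&prefix[..=i]), prefix[i + 1..].to_string()),
        None => (PathBuf::new(), prefix.to_string()),
    }
}

fn main() {
    for prefix in ["WT.", "out/SAMPLE_", "./"] {
        let (dir, file_prefix) = split_prefix(prefix);
        let via_split = dir.join(format!("{file_prefix}Aligned.out.bam"));
        // Both options should name the same file for every prefix style.
        println!(
            "{} | {}",
            output_path(prefix, "Aligned.out.bam").display(),
            via_split.display()
        );
    }
}
```

Both helpers resolve to the same file for the three prefix styles shown, but Option 2 carries the extra splitting logic and a separator assumption that Option 1 avoids.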
Why this matters
Every STAR-driven pipeline I'm aware of (nf-core/rnaseq, nf-core/scrnaseq, the various STAR Snakemake wrappers, raw scripts in countless papers) expects <prefix>Aligned.out.bam at a known location. Today those break silently — the output collection step finds nothing matching the glob, or matches the empty directory.
Filed during nf-core/rnaseq integration testing (nf-core/rnaseq#1855). To find the sibling issues from this exercise, search for author:pinin4fjords or for nf-core/rnaseq#1855.