Summary
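Two small gaps in the output layout keep rustar from being a drop-in replacement for STAR. Neither is worth filing individually, but together they create friction for pipelines that wrap STAR's output convention. Both are verifiable in one paired MRE.
Log.out and Log.progress.out are never written (STAR writes them alongside Log.final.out).
SJ.pass1.out.tab is emitted at the top level, whereas STAR keeps its pass-1 intermediates inside <prefix>_STARpass1/.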
STAR reference behaviour
STAR's header writer is at source/samHeaders.cpp; separate Log.out and Log.progress.out writers are at source/Parameters_openReadsFiles.cpp and the source/InOutStreams.cpp init. Pass-1 outputs (Log.final.out, SJ.out.tab) live inside <prefix>_STARpass1/ — the two-pass orchestration is in source/twoPass.cpp (it mkdirs the _STARpass1 directory and redirects pass-1 output into it).
Reproducer
#!/usr/bin/env bash
set -euo pipefail
mkdir -p /tmp/rustar-mre-28 && cd /tmp/rustar-mre-28
# Fetch the nf-core test reference, annotation and paired-end reads (pinned commit).
BASE=https://raw.githubusercontent.com/nf-core/test-datasets/626c8fab639062eade4b10747e919341cbf9b41a
curl -fsLO "$BASE/reference/genome.fasta"
curl -fsL "$BASE/reference/genes_with_empty_tid.gtf.gz" | gunzip -c > genes.gtf
curl -fsLO "$BASE/testdata/GSE110004/SRR6357072_1.fastq.gz"
curl -fsLO "$BASE/testdata/GSE110004/SRR6357072_2.fastq.gz"
RUSTAR=ghcr.io/scverse/rustar-aligner:dev
STAR=community.wave.seqera.io/library/htslib_samtools_star_gawk:ae438e9a604351a4
# Build one index per aligner with identical parameters.
mkdir -p idx-rustar idx-star
docker run --rm -v "$PWD:/w" -w /w "$RUSTAR" rustar-aligner --runMode genomeGenerate \
  --genomeDir idx-rustar --genomeFastaFiles genome.fasta --sjdbGTFfile genes.gtf \
  --sjdbOverhang 100 --genomeSAindexNbases 7
docker run --rm -v "$PWD:/w" -w /w "$STAR" STAR --runMode genomeGenerate \
  --genomeDir idx-star --genomeFastaFiles genome.fasta --sjdbGTFfile genes.gtf \
  --sjdbOverhang 100 --genomeSAindexNbases 7
# Alignment arguments shared by both aligners (two-pass mode, unsorted BAM).
COMMON=(--readFilesIn SRR6357072_1.fastq.gz SRR6357072_2.fastq.gz --readFilesCommand zcat
  --runThreadN 4 --sjdbGTFfile genes.gtf --twopassMode Basic --runRNGseed 0
  --outSAMtype BAM Unsorted)
docker run --rm -v "$PWD:/w" -w /w "$RUSTAR" rustar-aligner \
  --genomeDir idx-rustar "${COMMON[@]}" --outFileNamePrefix RUS.
docker run --rm -v "$PWD:/w" -w /w "$STAR" STAR \
  --genomeDir idx-star "${COMMON[@]}" --outFileNamePrefix STAR.
# Compare the two output layouts.
echo "=== STAR top-level + pass1 dir ==="
ls -d STAR* 2>/dev/null
echo "--- inside STAR._STARpass1: ---"
ls -1 STAR._STARpass1/ 2>/dev/null || echo "(no _STARpass1 directory)"
echo
echo "=== rustar (RUS. is a directory; see issue #26) ==="
ls -1 RUS./
echo "--- pass-1 intermediate dir present? ---"
ls -d RUS./*_STARpass1 2>/dev/null || echo "(no _STARpass1 directory inside RUS./)"
Observed (verified on the same fresh run)
STAR (top level):
STAR.Aligned.out.bam
STAR.Aligned.toTranscriptome.out.bam # with --quantMode TranscriptomeSAM
STAR.Log.final.out
STAR.Log.out
STAR.Log.progress.out
STAR.SJ.out.tab
STAR._STARgenome/
STAR._STARpass1/
STAR pass-1 contents (inside STAR._STARpass1/):
Log.final.out
SJ.out.tab # pass-1 SJ tab
rustar (inside RUS./):
Aligned.out.bam
Aligned.toTranscriptome.out.bam # with --quantMode TranscriptomeSAM
Log.final.out # <-- only this; no Log.out / Log.progress.out
SJ.out.tab
SJ.pass1.out.tab # <-- at top level, not in a _STARpass1/ dir
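The difference between the two listings can also be checked mechanically. The following is a minimal sketch, not part of STAR or rustar: `missing_star_outputs` is a hypothetical helper name, and the expected file names are simply the ones this issue attributes to STAR. It takes a directory and a file-name prefix and prints every expected output that is absent.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of STAR or rustar): print each STAR-convention
# output that is missing for a given directory and file-name prefix.
# Usage: missing_star_outputs DIR PREFIX
missing_star_outputs() {
  local dir=$1 prefix=$2 rel
  # Expected names are the ones this issue lists for STAR, including the
  # pass-1 files inside <prefix>_STARpass1/.
  for rel in Log.out Log.progress.out Log.final.out SJ.out.tab \
             _STARpass1/Log.final.out _STARpass1/SJ.out.tab; do
    [ -e "$dir/$prefix$rel" ] || printf '%s\n' "$prefix$rel"
  done
}
```

After the reproducer has run, `missing_star_outputs . STAR.` should print nothing, while `missing_star_outputs RUS. ""` lists the files that rustar does not yet write. The empty prefix reflects that rustar currently treats `RUS.` as a directory (issue #26).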
Suggested fix
Gap 1: Log.out / Log.progress.out
Add a Log.out writer that records parameters at start and a periodic Log.progress.out writer (per-chunk; STAR updates ~every minute during alignment). Even minimal stubs (a parameter dump for Log.out, a single "done" line for Log.progress.out) would close the file-existence gap.
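To illustrate how little the minimal stubs need to contain, here is a wrapper-side sketch that creates them after the aligner has finished. It is an assumption-laden stopgap, not a proposal for rustar's implementation: `write_stub_logs` is a hypothetical name, and the file contents are placeholders that do not reproduce STAR's actual log format.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper-side stopgap: create placeholder Log.out and
# Log.progress.out files. Contents are NOT in STAR's real log format.
# Usage: write_stub_logs OUT_PREFIX [aligner arguments...]
write_stub_logs() {
  local out=$1
  shift
  # Never overwrite a log that the aligner itself produced.
  if [ ! -e "${out}Log.out" ]; then
    {
      printf '##### stub Log.out (placeholder, not STAR format)\n'
      printf 'arguments:'
      printf ' %s' "$@"
      printf '\n'
    } > "${out}Log.out"
  fi
  if [ ! -e "${out}Log.progress.out" ]; then
    # A single "done" line, as suggested for the minimal stub.
    printf 'done\n' > "${out}Log.progress.out"
  fi
}
```

With the reproducer's layout this would be called as `write_stub_logs RUS./ --genomeDir idx-rustar "${COMMON[@]}"`. Because existing files are left alone, the call becomes a no-op once rustar writes the logs itself.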
Gap 2: SJ.pass1.out.tab location
Move pass-1 intermediates inside <prefix>_STARpass1/. STAR uses:
<prefix>_STARpass1/
Log.final.out
SJ.out.tab # <-- pass-1 SJ tab, named SJ.out.tab inside the dir
This also gives a natural home if rustar ever wants to expose pass-1 stats separately, the way STAR does.
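For comparison, the target layout can be produced today by a small post-processing step in the wrapper. This is a sketch under the assumptions stated in this issue (rustar emits `SJ.pass1.out.tab` at the top level, and STAR names the pass-1 table `SJ.out.tab` inside `<prefix>_STARpass1/`); `relocate_pass1_sj` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper-side stopgap: move a top-level SJ.pass1.out.tab to
# <out>_STARpass1/SJ.out.tab, the location this issue describes for STAR.
# Usage: relocate_pass1_sj OUT_PREFIX
relocate_pass1_sj() {
  local out=$1
  local src="${out}SJ.pass1.out.tab"
  local dest_dir="${out}_STARpass1"
  # Nothing to do if the aligner did not emit the top-level file.
  [ -e "$src" ] || return 0
  mkdir -p "$dest_dir"
  mv -- "$src" "$dest_dir/SJ.out.tab"
}
```

For the reproducer's output this would be `relocate_pass1_sj RUS./`, which yields `RUS./_STARpass1/SJ.out.tab`. The early return keeps the step harmless once rustar writes the file in the right place.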
Severity
Low. Today nf-core/rnaseq works around both with optional: true outputs and a permissive *.tab glob. Mostly a drop-in compatibility cleanup; if either is out of scope for v0.1.x, please say so and we'll keep the workarounds.
Related: #22 mate fields (functional, higher severity); #26 prefix-as-dir and #25 --limitGenomeGenerateRAM rejection are filed separately as they need real code changes rather than additional output writers.
Filed during nf-core/rnaseq integration testing (nf-core/rnaseq#1855). To find all sibling issues from this exercise, filter by author:pinin4fjords or grep for nf-core/rnaseq#1855.