Summary
rustar's BAM @PG line is content-free — just @PG\tID:rustar-aligner, no PN, VN, or CL fields. STAR emits a fully populated PG record including the full command line. Provenance tools (MultiQC program-version table, dx-toolkit lineage tracking, internal QC dashboards) end up with a blank entry.
STAR reference behaviour
source/samHeaders.cpp:61-63:
samHeaderStream << "@PG\tID:STAR\tPN:STAR\tVN:" << STAR_VERSION
<< "\tCL:" << P.commandLineFull << "\n";
SAM spec §1.3 defines the @PG fields:
ID — required, program record identifier.
PN — program name.
VN — program version.
CL — command line.
PN, VN, CL are optional in the spec but conventionally populated by every modern aligner (bwa, bwa-mem2, STAR, HISAT2, minimap2, bowtie2).
Suggested fix
In the rustar header writer (src/io/sam.rs::SamWriter::write_header), expand the @PG line to:
@PG\tID:rustar-aligner\tPN:rustar-aligner\tVN:{version}\tCL:{full command line}
version from env!("CARGO_PKG_VERSION"). Command line from std::env::args().collect::<Vec<_>>().join(" ") captured in main() before clap parsing.
Reproducer
#!/usr/bin/env bash
set -euo pipefail
mkdir -p /tmp/rustar-mre-33 && cd /tmp/rustar-mre-33
BASE=https://raw.githubusercontent.com/nf-core/test-datasets/626c8fab639062eade4b10747e919341cbf9b41a
curl -fsLO $BASE/reference/genome.fasta
curl -fsL $BASE/reference/genes_with_empty_tid.gtf.gz | gunzip -c > genes.gtf
curl -fsLO $BASE/testdata/GSE110004/SRR6357072_1.fastq.gz
curl -fsLO $BASE/testdata/GSE110004/SRR6357072_2.fastq.gz
RUSTAR=ghcr.io/scverse/rustar-aligner:dev
STAR=community.wave.seqera.io/library/htslib_samtools_star_gawk:ae438e9a604351a4
mkdir -p idx-rustar idx-star
docker run --rm -v $PWD:/w -w /w $RUSTAR rustar-aligner --runMode genomeGenerate \
--genomeDir idx-rustar --genomeFastaFiles genome.fasta --sjdbGTFfile genes.gtf \
--sjdbOverhang 100 --genomeSAindexNbases 7
docker run --rm -v $PWD:/w -w /w $STAR STAR --runMode genomeGenerate \
--genomeDir idx-star --genomeFastaFiles genome.fasta --sjdbGTFfile genes.gtf \
--sjdbOverhang 100 --genomeSAindexNbases 7
COMMON=(--readFilesIn SRR6357072_1.fastq.gz SRR6357072_2.fastq.gz --readFilesCommand zcat
--runThreadN 4 --sjdbGTFfile genes.gtf --twopassMode Basic --runRNGseed 0
--outSAMtype BAM Unsorted --outSAMattributes NH HI AS NM MD
--outSAMattrRGline ID:WT_REP2 SM:WT_REP2)
docker run --rm -v $PWD:/w -w /w $RUSTAR rustar-aligner \
--genomeDir idx-rustar "${COMMON[@]}" --outFileNamePrefix RUS.
docker run --rm -v $PWD:/w -w /w $STAR STAR \
--genomeDir idx-star "${COMMON[@]}" --outFileNamePrefix STAR.
echo "=== rustar @PG ==="; docker run --rm -v $PWD:/w $STAR samtools view -H /w/RUS./Aligned.out.bam | grep '^@PG' | head -1
echo "=== STAR @PG ===" ; docker run --rm -v $PWD:/w $STAR samtools view -H /w/STAR.Aligned.out.bam | grep '^@PG' | head -1
Observed:
=== rustar @PG ===
@PG ID:rustar-aligner
=== STAR @PG ===
@PG ID:STAR PN:STAR VN:2.7.11b CL:/opt/conda/bin/STAR-avx2 --runThreadN 4 ... --twopassMode Basic
Severity
Low. Doesn't break specific downstream tools but blanks the provenance entry for any tool that reads @PG.
Filed during nf-core/rnaseq integration testing (nf-core/rnaseq#1855).
Summary
rustar's BAM
@PGline is content-free — just@PG\tID:rustar-aligner, noPN,VN, orCLfields. STAR emits a fully populated PG record including the full command line. Provenance tools (MultiQC program-version table, dx-toolkit lineage tracking, internal QC dashboards) end up with a blank entry.STAR reference behaviour
source/samHeaders.cpp:61-63:SAM spec §1.3 defines the
@PGfields:ID— required, program record identifier.PN— program name.VN— program version.CL— command line.PN,VN,CLare optional in the spec but conventionally populated by every modern aligner (bwa, bwa-mem2, STAR, HISAT2, minimap2, bowtie2).Suggested fix
In the rustar header writer (
src/io/sam.rs::SamWriter::write_header), expand the@PGline to:versionfromenv!("CARGO_PKG_VERSION"). Command line fromstd::env::args().collect::<Vec<_>>().join(" ")captured inmain()before clap parsing.Reproducer
Observed:
Severity
Low. Doesn't break specific downstream tools but blanks the provenance entry for any tool that reads
@PG.Filed during nf-core/rnaseq integration testing (nf-core/rnaseq#1855).