
--limitGenomeGenerateRAM rejected at CLI parser (STAR-compat regression) #25

@pinin4fjords

Summary

rustar-aligner --runMode genomeGenerate rejects the STAR-compatible --limitGenomeGenerateRAM flag at startup. STAR exposes this flag, and pipelines that wrap STAR commonly derive a RAM cap from the job's resources and pass it through (e.g. the nf-core/rnaseq STAR_GENOMEGENERATE Nextflow module passes a value computed from task.memory).

STAR reference behaviour

--limitGenomeGenerateRAM <int> is a documented STAR flag (see the STAR manual §3) that sets the maximum RAM, in bytes, available for genome generation. Default 31000000000 (31 GB, about 28.9 GiB). It is independent of --limitBAMsortRAM (the BAM-sort memory cap).
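Because the value is a raw byte count, a wrapper has to turn the job's memory allocation into bytes itself. A small Rust sketch of that arithmetic follows; `limit_from_job_memory` and the 1 GiB headroom are hypothetical choices for illustration, not what STAR or nf-core prescribe.

```rust
/// STAR's documented default for --limitGenomeGenerateRAM, in bytes.
const STAR_DEFAULT_GENOME_GENERATE_RAM: u64 = 31_000_000_000;

/// Hypothetical helper: derive a --limitGenomeGenerateRAM value from a job's
/// memory allocation, keeping `headroom` bytes back for the rest of the process.
/// The headroom amount is an illustrative assumption.
fn limit_from_job_memory(job_bytes: u64, headroom: u64) -> u64 {
    // saturating_sub avoids wrapping to a huge u64 when the job is tiny.
    job_bytes.saturating_sub(headroom)
}

/// Convert a byte count to binary gibibytes (2^30 bytes) for display.
fn to_gib(bytes: u64) -> f64 {
    bytes as f64 / (1u64 << 30) as f64
}

fn main() {
    // 31_000_000_000 bytes is 31 GB (decimal), i.e. about 28.87 GiB.
    println!("default = {:.2} GiB", to_gib(STAR_DEFAULT_GENOME_GENERATE_RAM));

    // Example: a task granted 36 GiB, keeping 1 GiB back as headroom.
    let limit = limit_from_job_memory(36u64 << 30, 1u64 << 30);
    println!("--limitGenomeGenerateRAM {limit}");
}
```

The first line prints `default = 28.87 GiB`, which is why "~31 GB" in STAR's documentation should be read as decimal gigabytes.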

Reproducer

#!/usr/bin/env bash
set -euo pipefail
mkdir -p /tmp/rustar-mre-25 && cd /tmp/rustar-mre-25

BASE=https://raw.githubusercontent.com/nf-core/test-datasets/626c8fab639062eade4b10747e919341cbf9b41a
curl -fsLO "$BASE/reference/genome.fasta"
curl -fsL "$BASE/reference/genes_with_empty_tid.gtf.gz" | gunzip -c > genes.gtf

RUSTAR=ghcr.io/scverse/rustar-aligner:dev
STAR=community.wave.seqera.io/library/htslib_samtools_star_gawk:ae438e9a604351a4

echo "=== STAR with --limitGenomeGenerateRAM 31000000000 ==="
docker run --rm -v $PWD:/w -w /w $STAR STAR --runMode genomeGenerate \
    --genomeDir /tmp/star-idx --genomeFastaFiles genome.fasta --sjdbGTFfile genes.gtf \
    --sjdbOverhang 100 --genomeSAindexNbases 7 \
    --limitGenomeGenerateRAM 31000000000 2>&1 | tail -3

echo
echo "=== rustar with the same flag ==="
docker run --rm -v $PWD:/w -w /w $RUSTAR rustar-aligner --runMode genomeGenerate \
    --genomeDir /tmp/rustar-idx --genomeFastaFiles genome.fasta --sjdbGTFfile genes.gtf \
    --sjdbOverhang 100 --genomeSAindexNbases 7 \
    --limitGenomeGenerateRAM 31000000000 2>&1 | tail -3

Observed (verified on commit 5f8ad08 + STAR 2.7.11b)

STAR completes:

May 12 15:24:18 ... writing Suffix Array to disk ...
May 12 15:24:18 ... writing SAindex to disk
May 12 15:24:18 ..... finished successfully

rustar fails at the CLI parser, before any work:

error: unexpected argument '--limitGenomeGenerateRAM' found

  tip: a similar argument exists: '--limitBAMsortRAM'

Suggested fix

In src/params.rs (near the existing --limitBAMsortRAM field around line 363), add:

#[arg(long = "limitGenomeGenerateRAM", default_value_t = 31_000_000_000_u64)]
pub limit_genome_generate_ram: u64,
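The attribute above relies on clap's derive API. To pin down the intended command-line behaviour without that dependency, here is a std-only sketch that accepts both `--limitGenomeGenerateRAM N` and `--limitGenomeGenerateRAM=N`, parses N as `u64`, and falls back to STAR's default when the flag is absent. `parse_limit` is a hypothetical name; unlike clap, it ignores every other argument instead of validating it, and it simply takes the first occurrence of the flag.

```rust
/// STAR's documented default for --limitGenomeGenerateRAM, in bytes.
const DEFAULT_LIMIT: u64 = 31_000_000_000;

/// Hypothetical std-only stand-in for the clap field: find the flag,
/// parse its value as u64, or return the default if it is not present.
fn parse_limit(args: &[&str]) -> Result<u64, String> {
    const FLAG: &str = "--limitGenomeGenerateRAM";
    for (i, arg) in args.iter().enumerate() {
        let value = if *arg == FLAG {
            // Separate-token form: the value is the next argument.
            *args.get(i + 1).ok_or_else(|| format!("{FLAG} requires a value"))?
        } else if let Some(v) = arg.strip_prefix(FLAG).and_then(|rest| rest.strip_prefix('=')) {
            // `--limitGenomeGenerateRAM=N` form.
            v
        } else {
            // Not our flag; a real parser would validate it instead.
            continue;
        };
        return value
            .parse::<u64>()
            .map_err(|e| format!("invalid value {value:?} for {FLAG}: {e}"));
    }
    Ok(DEFAULT_LIMIT)
}

fn main() {
    let argv = ["--runMode", "genomeGenerate", "--limitGenomeGenerateRAM", "8000000000"];
    match parse_limit(&argv) {
        Ok(limit) => println!("limit = {limit} bytes"),
        Err(e) => eprintln!("error: {e}"),
    }
}
```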

Two acceptable behaviours, in order of preference:

  1. Accept and honour it: cap memory use during suffix-array and index construction (or at least the in-memory genome chunks).
  2. Accept it and warn that it is ignored for now (log::warn!("--limitGenomeGenerateRAM accepted but not enforced yet; rustar uses its own memory management")). This unblocks every STAR-compatible caller without committing to a cap implementation.

Either is far better than failing at the CLI parser, because users rarely set this flag by hand: it is emitted by the pipelines that wrap STAR.
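A minimal sketch of option 2, assuming the field is declared as `Option<u64>` with no default (a variation on the `default_value_t` field suggested above) so rustar can tell an explicit value apart from the default and warn only when a caller actually passes the flag. `resolve_genome_generate_ram` is a hypothetical name, and `eprintln!` stands in for `log::warn!` to keep the example dependency-free.

```rust
/// STAR's documented default for --limitGenomeGenerateRAM, in bytes.
const STAR_DEFAULT: u64 = 31_000_000_000;

/// Hypothetical resolution step: `explicit` is Some(n) only when the flag was
/// passed on the command line. Returns the effective limit and, if the flag
/// was given, a warning that it is accepted but not enforced.
fn resolve_genome_generate_ram(explicit: Option<u64>) -> (u64, Option<String>) {
    match explicit {
        Some(n) => (
            n,
            Some(format!(
                "--limitGenomeGenerateRAM {n} accepted but not enforced yet; \
                 rustar uses its own memory management"
            )),
        ),
        // Flag absent: fall back to STAR's default silently.
        None => (STAR_DEFAULT, None),
    }
}

fn main() {
    let (limit, warning) = resolve_genome_generate_ram(Some(8_000_000_000));
    if let Some(msg) = warning {
        // In rustar this would presumably be log::warn!.
        eprintln!("warning: {msg}");
    }
    println!("effective limit: {limit} bytes");
}
```

With a `default_value_t` field the parser fills in the default whether or not the flag was passed, so a warning keyed on the value alone would fire on every run or never; the `Option` form avoids that.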

Why this matters

rustar-aligner positions itself as a STAR drop-in. Rejecting a flag STAR has accepted for years makes the drop-in claim conditional. nf-core/rnaseq's STAR_GENOMEGENERATE Nextflow module is one of many wrappers that would otherwise need a workaround.


Filed during nf-core/rnaseq integration testing (nf-core/rnaseq#1855). To find all sibling issues from this exercise, filter by author:pinin4fjords or search for nf-core/rnaseq#1855.
