fix(transcriptome-bam): write per-record RG:Z: tag matching @RG header #39
Open
pinin4fjords wants to merge 2 commits into
When --outSAMattrRGline is supplied, rustar writes the @RG header to both the genome and transcriptome BAMs, but only stamps the per-record RG:Z: tag on the genome BAM. The transcriptome BAM was emitting 0 RG:Z: records vs STAR's 1-per-record output.

Add the RG tag stamp to the transcriptome record builder (paired-end and single-end paths) so every record carries RG:Z:<id> matching the @RG header, byte-symmetric with STAR.

Fixes scverse#32
Author
Verified end-to-end on macOS/aarch64 against the rebuilt fix branch. Same PE yeast input + [...]. Pre-fix, the same invocation produced [...].
Summary
When `--outSAMattrRGline ID:... SM:...` is supplied, rustar writes the `@RG` header to both the genome and transcriptome BAMs. It also writes `RG:Z:<id>` per record on the genome BAM, but not on the transcriptome BAM. STAR writes `RG:Z:` on every transcriptome record (78886/78886 in the issue's test sample), while rustar wrote 0.

Any tool that splits transcriptome BAM records by read group (multi-sample bundled BAMs, custom QC keyed on RG) silently sees no RG.
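To make the symptom concrete, here is a minimal sketch of the kind of check a downstream tool might run. It assumes the transcriptome BAM has already been rendered to SAM text (for example with `samtools view -h`); the function names `read_group` and `rg_coverage` and the sample records are illustrative only and are not part of rustar or STAR.

```rust
/// Returns the value of the `RG:Z:` optional field of one SAM text record, if present.
fn read_group(sam_line: &str) -> Option<&str> {
    // SAM records have 11 mandatory tab-separated columns;
    // optional TAG:TYPE:VALUE fields follow them.
    sam_line
        .split('\t')
        .skip(11)
        .find_map(|field| field.strip_prefix("RG:Z:"))
}

/// Returns (records carrying an RG tag, total records), skipping header lines.
fn rg_coverage<'a>(lines: impl IntoIterator<Item = &'a str>) -> (usize, usize) {
    let mut tagged = 0;
    let mut total = 0;
    for line in lines {
        // Header lines start with '@'; a read name cannot.
        if line.is_empty() || line.starts_with('@') {
            continue;
        }
        total += 1;
        if read_group(line).is_some() {
            tagged += 1;
        }
    }
    (tagged, total)
}

fn main() {
    // Made-up records: one with the tag, one without.
    let sam = [
        "@RG\tID:foo\tSM:bar",
        "r1\t0\ttx1\t1\t255\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\tRG:Z:foo",
        "r2\t0\ttx1\t5\t255\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
    ];
    let (tagged, total) = rg_coverage(sam);
    println!("{tagged}/{total} records carry RG:Z:");
}
```

A tool that groups records by `read_group` would place every untagged record in no group at all, which is the silent failure described above.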
Fix
Add the per-record `RG:Z:` stamp to the transcriptome record builder (paired-end and single-end paths), mirroring the genome record builder. The approach matches STAR's `outSAMattrOrderQuant.push_back(ATTR_RG)` at `Parameters_samAttributes.cpp:201-205`.

`SamWriter::build_transcriptome_records` already receives `&Parameters`, and the existing `maybe_insert_rg_tag` helper (already used by every genome-BAM record builder) is reused unchanged. Both the SE and PE call sites in `lib.rs` go through this single builder, so the fix lands on both paths without touching them.

Test plan
- Every transcriptome record carries `RG:Z:<id>` when `--outSAMattrRGline ID:foo ...` is supplied
- No `RG:Z:` tag is emitted when `--outSAMattrRGline` is unset (mirrors genome-BAM gating)
- `cargo build`
- `cargo fmt --check`
- `cargo clippy --lib -- -D warnings` clean; `cargo clippy --all-targets` has 46 pre-existing errors on `main` unrelated to this change (deprecated `assert_cmd::Command::cargo_bin`, `modulo_one` in integration tests, unused import in `src/chimeric/output.rs`). Picked up separately by `pa/lint-all-targets`.

Fixes #32
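The gating behaviour in the test plan can be sketched roughly as follows. This is not the rustar implementation: `Params`, `Record`, the `rg_id` method, and the signature given to `maybe_insert_rg_tag` here are simplified, hypothetical stand-ins used only to illustrate "stamp the tag when an RG line is supplied, emit nothing otherwise".

```rust
/// Hypothetical stand-in for the relevant part of rustar's `Parameters`.
struct Params {
    /// Raw value of `--outSAMattrRGline`, e.g. "ID:foo SM:bar"; `None` when unset.
    rg_line: Option<String>,
}

impl Params {
    /// Extracts the read-group ID from the `ID:` field of the RG line, if any.
    fn rg_id(&self) -> Option<&str> {
        self.rg_line
            .as_deref()?
            .split_whitespace()
            .find_map(|field| field.strip_prefix("ID:"))
    }
}

/// Hypothetical simplified record: optional fields kept as rendered SAM text.
struct Record {
    tags: Vec<String>,
}

/// Appends `RG:Z:<id>` only when an RG line was supplied (assumed signature).
fn maybe_insert_rg_tag(record: &mut Record, params: &Params) {
    if let Some(id) = params.rg_id() {
        record.tags.push(format!("RG:Z:{id}"));
    }
}

/// A single builder shared by SE and PE paths, so one call covers both.
fn build_transcriptome_record(params: &Params) -> Record {
    let mut record = Record {
        tags: vec!["NH:i:1".to_string()],
    };
    maybe_insert_rg_tag(&mut record, params);
    record
}

fn main() {
    let with_rg = Params {
        rg_line: Some("ID:foo SM:bar".to_string()),
    };
    let without_rg = Params { rg_line: None };
    println!("{:?}", build_transcriptome_record(&with_rg).tags);
    println!("{:?}", build_transcriptome_record(&without_rg).tags);
}
```

Routing both paths through one builder keeps the gating in a single place, which is why the two test-plan cases (RG line set, RG line unset) are enough to cover SE and PE.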