feat(bam): emit XS:A: strand tag when --outSAMstrandField intronMotif #43
Open
pinin4fjords wants to merge 1 commit into
The flag was accepted at the CLI parser but never produced any XS:A: tags on the output BAM. Downstream tools that need strand inference (StringTie, Cufflinks, rseqc infer_experiment.py) saw nothing. Mirror STAR's two coupling paths: --outSAMattributes XS auto-enables --outSAMstrandField intronMotif, and vice versa. For each output record with at least one canonical intron motif, map the motif to + or - per STAR's convention; non-canonical or mixed motifs omit the tag. Fixes scverse#30
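The two coupling paths and the per-record motif-to-strand mapping described above can be sketched in Rust as follows. This is a minimal illustration, not rustar's code. `JunctionMotif`, `StrandField`, `resolve_attributes`, `xs_strand` and `xs_tag` are hypothetical names, and the real `SpliceMotif` type may carry more motif classes. The sketch also assumes that "mixed" means canonical junctions implying opposite strands, and that a non-canonical junction next to a canonical one is skipped rather than disqualifying; the PR's exact policy may differ.

```rust
use std::collections::BTreeSet;

/// Hypothetical per-junction motif class (rustar's real `SpliceMotif` may differ).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum JunctionMotif {
    /// Canonical GT/AG: transcript on the forward strand.
    CanonicalFwd,
    /// Canonical CT/AC (GT/AG read on the reverse strand).
    CanonicalRev,
    /// Anything else; carries no strand information.
    NonCanonical,
}

/// Hypothetical model of the `--outSAMstrandField` setting.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum StrandField {
    Off,
    IntronMotif,
}

/// Resolve the effective attribute set and strand field using the two
/// STAR-style coupling rules.
fn resolve_attributes(
    requested: &[&str],
    strand_field: StrandField,
) -> (BTreeSet<String>, StrandField) {
    let mut attrs: BTreeSet<String> = requested.iter().map(|a| a.to_string()).collect();
    let mut field = strand_field;
    // Path 1: `XS` in --outSAMattributes forces intronMotif strand inference.
    if attrs.contains("XS") {
        field = StrandField::IntronMotif;
    }
    // Path 2: intronMotif, however it was enabled, adds `XS` to the set.
    if field == StrandField::IntronMotif {
        attrs.insert("XS".to_string());
    }
    (attrs, field)
}

/// Map a record's junction motifs to an XS strand char, or None to omit the tag.
fn xs_strand(motifs: &[JunctionMotif]) -> Option<char> {
    let mut strand: Option<char> = None;
    for motif in motifs {
        let s = match motif {
            JunctionMotif::CanonicalFwd => '+',
            JunctionMotif::CanonicalRev => '-',
            // Assumption: non-canonical junctions are skipped, not disqualifying.
            JunctionMotif::NonCanonical => continue,
        };
        match strand {
            // Assumed meaning of "mixed": canonical junctions on opposite strands.
            Some(prev) if prev != s => return None,
            _ => strand = Some(s),
        }
    }
    // None here means no canonical junction was seen.
    strand
}

/// Format the SAM optional field, e.g. "XS:A:+".
fn xs_tag(motifs: &[JunctionMotif]) -> Option<String> {
    xs_strand(motifs).map(|s| format!("XS:A:{s}"))
}
```

Resolving both coupling rules in one place means either CLI spelling ends in the same state: strand inference on and `XS` in the attribute set.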
Author
Verified end-to-end on macOS/aarch64 against the rebuilt fix branch. Pre-fix, the same invocation produced 0 XS:A: tags. In the sampled spliced records, all spliced records on long introns have XS populated. The total count (549) is lower than STAR's 1738 on the same data, but that gap is downstream of #27 (~50% fewer spliced reads in rustar overall on this profile) plus the mixed-motif omit-XS policy, not a hole in the XS-tag fix itself. LGTM.
Summary
`--outSAMstrandField intronMotif` was a no-op at tag-write time: accepted by the CLI parser, but the output BAM contained zero `XS:A:` tags. Downstream tools that need strand inference (StringTie, Cufflinks, rseqc `infer_experiment.py`) saw nothing. `samtools view <bam> | grep -c 'XS:A:'` pre-fix: 0. STAR on the same data: 1738.

Fix
Mirror STAR's two coupling paths from `Parameters_samAttributes.cpp`:

- `XS` in `--outSAMattributes` -> force `outSAMstrandField = intronMotif` and add `XS` to the effective attribute set (lines 172-179).
- `--outSAMstrandField intronMotif` (regardless of `--outSAMattributes`) -> add `XS` to the effective attribute set (lines 213-216).

Per-record write: map `Transcript::junction_motifs` to a strand char (canonical GT/AG donor -> `+`, canonical CT/AC -> `-`, non-canonical or mixed -> omit). The tag is emitted alongside the existing `NH`/`HI`/`MD` tags wherever a record is built.

The motif detection itself is already implemented (`src/align/score::SpliceMotif`, `Transcript::junction_motifs`); this PR just plumbs the existing data into the tag write.

Test plan
- `--outSAMstrandField intronMotif` adds `XS` to the effective attribute set
- `--outSAMattributes XS` force-enables intronMotif strand inference
- … `+`, canonical rev → `-`, non-canonical → `None`)
- … `XS:A:+` in the emitted record
- `cargo build` / `cargo clippy --lib -- -D warnings` / `cargo fmt --check`

After this fix, `samtools view <bam> | grep -c 'XS:A:'` returns a positive count on spliced data, and `infer_experiment.py` no longer reports "stranding could not be inferred".

Fixes #30
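The `samtools view <bam> | grep -c 'XS:A:'` check counts every line containing that substring, including header lines or read names that happen to match. A stricter count that inspects only SAM optional fields (column 12 onward) could look like the sketch below. It assumes uncompressed SAM text such as `samtools view` prints, and `has_xs_tag` / `count_xs_records` are hypothetical helper names, not part of rustar.

```rust
use std::io::{self, BufRead};

/// True if a SAM alignment line carries an `XS:A:` optional field.
/// Only columns 12 onward are inspected, so a read name that happens to
/// contain "XS:A:" is not counted (unlike a plain `grep -c`).
fn has_xs_tag(line: &str) -> bool {
    if line.starts_with('@') {
        return false; // header line
    }
    line.split('\t')
        .skip(11) // QNAME..QUAL are the 11 mandatory SAM fields
        .any(|field| field.starts_with("XS:A:"))
}

/// Count alignment records with an `XS:A:` tag in SAM text from any reader.
fn count_xs_records<R: BufRead>(reader: R) -> io::Result<usize> {
    let mut count = 0;
    for line in reader.lines() {
        if has_xs_tag(&line?) {
            count += 1;
        }
    }
    Ok(count)
}
```

Wrapped in a `main` that calls `count_xs_records(io::stdin().lock())`, it can replace `grep -c` at the end of `samtools view <bam> |`.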