fix(bam): emit SAM-spec NM:i: tag distinct from STAR-internal nM:i: #42
Open
pinin4fjords wants to merge 1 commit into
--outSAMattributes NM was being routed to the existing nM writer (substitutions only), so requests for the SAM-spec edit-distance tag silently produced wrong values. Downstream tools that read NM:i: (samtools stats, picard, MultiQC) saw nothing.

Treat NM and nM as distinct attribute tokens. When NM is requested, compute SAM-spec edit distance per the SAM v1 spec section 1.4: substitutions + inserted bases + deleted bases (excluding intron N skips). Keep nM emission unchanged when explicitly requested.

Fixes scverse#29
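As a rough illustration of treating the two tokens as distinct, the sketch below matches attribute names case-sensitively so that `NM` and `nM` never alias. The `SamAttr` enum and the `parse_attr` / `parse_attrs` helpers are hypothetical names invented for this example, not the parser used in this repository, and every other attribute token is simply ignored here.

```rust
/// Hypothetical attribute token type, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SamAttr {
    /// SAM-spec edit distance, written as `NM:i:`.
    EditDistance,
    /// STAR-internal mismatch count, written as `nM:i:`.
    MismatchCount,
}

/// Map one token to an attribute. The match is case-sensitive,
/// so `NM` and `nM` resolve to different variants.
fn parse_attr(token: &str) -> Option<SamAttr> {
    match token {
        "NM" => Some(SamAttr::EditDistance),
        "nM" => Some(SamAttr::MismatchCount),
        _ => None, // other attributes are out of scope for this sketch
    }
}

/// Parse a whitespace-separated attribute list such as "NM nM".
fn parse_attrs(arg: &str) -> Vec<SamAttr> {
    arg.split_whitespace().filter_map(parse_attr).collect()
}
```

With this shape, requesting `NM` alone yields only the edit-distance variant, and requesting both yields both, which mirrors the behaviour the commit message describes.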
Author
Verified end-to-end on macOS/aarch64 against the rebuilt fix branch. Per-record NM values on indel-containing records match STAR's exactly:
Pre-fix the same run produced 0 |
Summary
`--outSAMattributes NM` was routed to the existing `nM:i:` writer (substitutions only). The SAM-spec `NM` tag is the edit distance to the reference: substitutions + inserted bases + deleted bases. STAR and SAM-conformant tools (samtools stats, Picard, MultiQC) read `NM:i:` and were getting nothing back.

[Table: `NM:i:` records, `nM:i:` records]

On records with indels, the per-record values also differed (rustar `nM:i:0` where STAR's `NM:i:2` reflected the deleted bases).

Fix

Two changes:
- Treat `NM` and `nM` as distinct tokens: a request for one no longer silently enables the other.
- When `NM` is requested, compute SAM-spec edit distance: `n_mismatch + sum(I-op lengths) + sum(D-op lengths)` from the existing `Transcript::cigar` and `Transcript::n_mismatch`. Intron `N` skips are excluded per SAM v1 section 1.4.

Both tags can now be emitted together when both are in `--outSAMattributes`.

Test plan

- `80M2D21M` record with one mismatch produces `NM:i:3` and `nM:i:1`
- `cargo build`
- `cargo clippy --lib -- -D warnings`
- `cargo fmt --check`

After this fix, `samtools view <bam> | grep -c 'NM:i:'` matches the total record count and `samtools stats` error_rate reports a non-zero edit distance.

Fixes #29
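The `80M2D21M` case from the test plan can be checked by hand with a minimal sketch of the edit-distance rule. The `CigarOp` enum and the `sam_edit_distance` function are hypothetical stand-ins written for this example; the PR itself reads `Transcript::cigar` and `Transcript::n_mismatch`, whose actual types are not shown here.

```rust
/// One CIGAR operation with its length.
/// Hypothetical type for illustration; not the repository's own representation.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CigarOp {
    Match(u32),    // M: aligned bases (match or mismatch)
    Ins(u32),      // I: bases inserted relative to the reference
    Del(u32),      // D: reference bases deleted from the read
    RefSkip(u32),  // N: intron skip, not counted as an edit
    SoftClip(u32), // S: clipped bases, not counted as an edit
}

/// SAM-spec NM: mismatches + inserted bases + deleted bases.
/// Intron `N` skips and every other op contribute nothing.
fn sam_edit_distance(n_mismatch: u32, cigar: &[CigarOp]) -> u32 {
    let indel_bases: u32 = cigar
        .iter()
        .map(|op| match *op {
            CigarOp::Ins(len) | CigarOp::Del(len) => len,
            _ => 0,
        })
        .sum();
    n_mismatch + indel_bases
}
```

For `80M2D21M` with one mismatch, the call `sam_edit_distance(1, &[CigarOp::Match(80), CigarOp::Del(2), CigarOp::Match(21)])` gives 1 + 2 = 3, while the mismatch count alone stays at 1, matching the `NM:i:3` and `nM:i:1` values in the test plan.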