Commit 38fea75

CramComparison: ignore mapQ on unmapped reads
The SAM spec leaves mapQ undefined for unmapped reads. In practice, unaligned inputs (PacBio CCS, ONT basecaller output) commonly emit mapQ=255 ("not available"), while both htsjdk and samtools normalize it to 0 on CRAM decode. Without this change, every unmapped-read record in a roundtrip against such input fails the mapQ check even though the data is otherwise identical. Skip the mapQ check when the record is unmapped; the lenient flag comparison already catches any mismatch in mapped/unmapped status.
1 parent 8a4cb98 commit 38fea75

1 file changed

Lines changed: 6 additions & 1 deletion

src/main/java/htsjdk/samtools/cram/CramComparison.java

@@ -197,13 +197,18 @@ private static String compareLenient(final SAMRecord a, final SAMRecord b) {
      * Strict comparison with known CRAM tolerance:
      * - Auto-generated MD/NM tags stripped when one side lacks them
      * - Unsigned B-array type differences tolerated (CRAM stores as signed)
+     * - mapQ is ignored for unmapped reads (SAM spec leaves it undefined, and both
+     * htsjdk and samtools normalize it to 0 on CRAM decode regardless of source)
+     * - CIGAR =/X operators are normalized to M (CRAM does not preserve the distinction)
      */
     private static String compareStrict(final SAMRecord a, final SAMRecord b, final Set<String> ignoreTags) {
         // Core fields
         final String lenientDiff = compareLenient(a, b);
         if (lenientDiff != null) return lenientDiff;
 
-        if (a.getMappingQuality() != b.getMappingQuality())
+        // Only compare mapQ for mapped reads; for unmapped reads the SAM spec leaves
+        // mapQ undefined and CRAM normalizes it to 0 regardless of the source value.
+        if (!a.getReadUnmappedFlag() && a.getMappingQuality() != b.getMappingQuality())
             return "mapQ: " + a.getMappingQuality() + " vs " + b.getMappingQuality();
         // CRAM stores match/mismatch information in "read features" separately from
         // the CIGAR operator, so =/X distinction is not preserved through a roundtrip:
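
The effect of the new guard can be illustrated outside htsjdk with a minimal, self-contained sketch. The `Rec` class and the `compareMapQ` method below are hypothetical stand-ins, not htsjdk API: `Rec` carries only the unmapped flag and mapQ, and the flag check is assumed to approximate what the lenient comparison already does in `CramComparison`.

```java
public class MapQToleranceSketch {

    /** Hypothetical stand-in for the two SAMRecord fields this check touches. */
    static final class Rec {
        final boolean unmapped;
        final int mapQ;

        Rec(final boolean unmapped, final int mapQ) {
            this.unmapped = unmapped;
            this.mapQ = mapQ;
        }
    }

    /** Returns a description of the first difference, or null if the records agree. */
    static String compareMapQ(final Rec a, final Rec b) {
        // Assumed stand-in for the lenient flag comparison: mapped/unmapped status must agree.
        if (a.unmapped != b.unmapped) {
            return "unmapped flag: " + a.unmapped + " vs " + b.unmapped;
        }
        // mapQ is only compared for mapped reads; for unmapped reads it is skipped.
        if (!a.unmapped && a.mapQ != b.mapQ) {
            return "mapQ: " + a.mapQ + " vs " + b.mapQ;
        }
        return null;
    }

    public static void main(final String[] args) {
        // Unmapped read, source mapQ=255, decoded mapQ=0: tolerated.
        System.out.println(compareMapQ(new Rec(true, 255), new Rec(true, 0)));
        // Mapped read with differing mapQ: still reported.
        System.out.println(compareMapQ(new Rec(false, 60), new Rec(false, 0)));
        // Mapped vs unmapped: caught by the flag check before mapQ is considered.
        System.out.println(compareMapQ(new Rec(false, 60), new Rec(true, 0)));
    }
}
```

Because the flag check runs first, testing only `a`'s unmapped status in the mapQ guard is sufficient in this sketch: by that point both records are known to share the same status.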
