Skip to content

Changed SAMRecord.toString() to emit the SAM format string with all fields#1762

Merged
tfenne merged 3 commits into
devfrom
tf_getter_samrecord_tostring
Apr 25, 2026
Merged

Changed SAMRecord.toString() to emit the SAM format string with all fields#1762
tfenne merged 3 commits into
devfrom
tf_getter_samrecord_tostring

Conversation

@tfenne
Copy link
Copy Markdown
Member

@tfenne tfenne commented Mar 30, 2026

Description

SAMRecord has historically output an abbreviated format in toString(). I think more often than not this is confusing to users, since a SAMRecord has a well defined String format. Moreover the toString() method is used extensively by TestNG to display when two SAMRecord instances are not equal - and the vast majority of the time the toString() output did not contain the differences.

Things to think about before submitting:

  • Make sure your changes compile and new tests pass locally.
  • Add new tests or update existing ones:
    • A bug fix should include a test that previously would have failed and passes now.
    • New features should come with new tests that exercise and validate the new functionality.
  • Extended the README / documentation, if necessary
  • Check your code style.
  • Write a clear commit title and message
    • The commit message should describe what changed and is targeted at htsjdk developers
    • Breaking changes should be mentioned in the commit message.

@tfenne tfenne requested a review from yfarjoun March 30, 2026 19:59
Copy link
Copy Markdown
Member

@nh13 nh13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. can you verify there's no newline, and if there is, strip it (he says knowingly)
  2. are there tools (picard, igv, gatk, fgbio, others) that rely on the toString returned format (I know I know), but it is a public so a breaking change...
  3. Should you add at least one regression test?
  4. toString was previously lock free, but looking at getSAMString we have some synchronized methods...
  5. the previous method was lighter-weight, is this a concern?
  6. are there tests that (shudders) compared outputs using toString?
  7. is it time to remove the format deprecated method?

@tfenne tfenne changed the base branch from master to dev April 25, 2026 10:35
tfenne added 3 commits April 25, 2026 04:35
The text formatting path used by SAMRecord.toString() / getSAMString() has been
cleaned up:

- SAMTextWriter.getSAMString no longer uses a JVM-wide synchronized static cache
  of a shared StringWriter/SAMTextWriter. It now allocates a fresh StringWriter
  per call. Without this change, every toString() call would have taken a global
  monitor.
- SAMTextWriter.writeAlignment is split into a private writeAlignmentNoNewline
  plus a thin wrapper that appends '\n'. getSAMString uses the no-newline
  variant, so the result no longer needs to be trimmed.

BREAKING CHANGE: SAMRecord.getSAMString() no longer terminates the returned
String with a newline character. This brings SAMRecord into line with the other
getSAMString() implementations (SAMSequenceRecord, SAMReadGroupRecord,
SAMProgramRecord, SAMFileHeader), which already return newline-free strings.
Callers that relied on the trailing '\n' as a separator (e.g. concatenating two
records' SAM strings) must now insert their own separator. Callers that stripped
the newline with .trim() can drop the call.
SAMRecord.format() was deprecated in favor of getSAMString() because it didn't
guarantee a valid SAM text representation. Now that getSAMString() is the
canonical formatter (and is used by toString()), drop format() and the private
helpers that supported it: addField, formatTagValue, safeEquals.

Also remove the SRALazyRecord.format() override, which existed only because
format() bypassed lazy attribute loading; getSAMString() goes through
SRALazyRecord.getBinaryAttributes() which already handles lazy loading. And
delete a couple of stale commented-out debug prints that referenced format().

BREAKING CHANGE: SAMRecord.format() has been removed. Callers should use
SAMRecord.getSAMString() instead.
@tfenne tfenne force-pushed the tf_getter_samrecord_tostring branch from 8d787ad to 6ba9463 Compare April 25, 2026 11:21
@tfenne
Copy link
Copy Markdown
Member Author

tfenne commented Apr 25, 2026

Thanks for the review @nh13. I have done several things based upon it:

  • Removed the newline from the toString() / getSAMString() path and fixed up anywhere in htsjdk that was using those methods
  • Added tests for the new behavior to SAMRecord.toString()
  • Removed the deprecated format() and friends
  • Removed the locking by allocating a SAMTextWriter each time in getSAMString - it turns out to be very lightweight so it should be just fine

@tfenne tfenne merged commit 12edefb into dev Apr 25, 2026
3 checks passed
@tfenne tfenne deleted the tf_getter_samrecord_tostring branch April 25, 2026 12:02
tfenne added a commit that referenced this pull request Apr 25, 2026
…ields (#1762)

* Changed SAMRecord.toString() to emit the SAM format string with all fields.
* Drop sync + trailing newline from SAMRecord.getSAMString.
* Remove deprecated SAMRecord.format() and now-dead helpers.

The text formatting path used by SAMRecord.toString() / getSAMString() has been
cleaned up:

- SAMTextWriter.getSAMString no longer uses a JVM-wide synchronized static cache
  of a shared StringWriter/SAMTextWriter. It now allocates a fresh StringWriter
  per call. Without this change, every toString() call would have taken a global
  monitor.
- SAMTextWriter.writeAlignment is split into a private writeAlignmentNoNewline
  plus a thin wrapper that appends '\n'. getSAMString uses the no-newline
  variant, so the result no longer needs to be trimmed.

BREAKING CHANGE: SAMRecord.getSAMString() no longer terminates the returned
String with a newline character. This brings SAMRecord into line with the other
getSAMString() implementations (SAMSequenceRecord, SAMReadGroupRecord,
SAMProgramRecord, SAMFileHeader), which already return newline-free strings.
Callers that relied on the trailing '\n' as a separator (e.g. concatenating two
records' SAM strings) must now insert their own separator. Callers that stripped
the newline with .trim() can drop the call.

BREAKING CHANGE: SAMRecord.format() has been removed. Callers should use
SAMRecord.getSAMString() instead.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants