Commit eff8590

Refine CompressionHeader map serialization
Document why buffering is required (length-prefixed format), pre-size the ByteArrayOutputStream to 16 KB (fits typical tag sets without reallocation; grows via BAOS doubling for rich tag data), reuse a single buffer across both map blocks via reset(), and use writeTo() rather than toByteArray() to avoid an extra copy.
1 parent b64c0e0 commit eff8590

1 file changed

Lines changed: 6 additions & 5 deletions

src/main/java/htsjdk/samtools/cram/structure/CompressionHeader.java

Lines changed: 6 additions & 5 deletions
@@ -293,11 +293,12 @@ private void internalWrite(final OutputStream outputStream) throws IOException {
 // Each map below is written to outputStream as a length-prefixed byte
 // array, so we need to know the full serialized size before writing.
 // Buffer to a ByteArrayOutputStream first, then emit [length][bytes].
-// Pre-sized to 100 KB (matches the previous fixed-buffer size, so the
-// common case fits without any reallocation) but allowed to grow for
-// rich tag sets (PacBio/Ultima flow-space, ONT mod bases) where the
-// TD dictionary can exceed 100 KB. Reused across both blocks via reset().
-final ByteArrayOutputStream mapStream = new ByteArrayOutputStream(100 * 1024);
+// Pre-sized to 16 KB: enough for typical tag sets without reallocation,
+// but small enough that we don't waste memory when most of it is unused.
+// Rich tag sets (PacBio/Ultima flow-space, ONT mod bases) grow via the
+// usual doubling -- a few reallocations are cheap relative to the final
+// size. Reused across both blocks via reset().
+final ByteArrayOutputStream mapStream = new ByteArrayOutputStream(16 * 1024);
 
 { // preservation map:
 ITF8.writeUnsignedITF8(5, mapStream);
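
The buffering pattern described in the commit message can be sketched in isolation. This is a minimal, self-contained illustration and not the htsjdk code: the class LengthPrefixedBlocks and its methods are hypothetical names, and a fixed 4-byte length written with DataOutputStream.writeInt stands in for the ITF8-encoded length that the real serializer would presumably use.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class LengthPrefixedBlocks {

    // Hypothetical helper: emits [length][bytes] for whatever is buffered.
    // ASSUMPTION: a 4-byte big-endian length replaces the ITF8 encoding.
    static void flushBlock(ByteArrayOutputStream buffer, DataOutputStream out)
            throws IOException {
        out.writeInt(buffer.size()); // size is only known once the block is fully buffered
        buffer.writeTo(out);         // writes from the internal array, no toByteArray() copy
        buffer.reset();              // count goes back to 0, allocated capacity is kept
    }

    // Serializes two payloads as consecutive length-prefixed blocks,
    // reusing one scratch buffer for both.
    static byte[] writeTwoBlocks(byte[] first, byte[] second) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(sink);
        // Starts at 16 KB and grows automatically if a block needs more.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream(16 * 1024);
        try {
            buffer.write(first, 0, first.length);
            flushBlock(buffer, out);
            buffer.write(second, 0, second.length);
            flushBlock(buffer, out);
            out.flush();
        } catch (IOException e) {
            // Not expected for in-memory streams; rethrown unchecked for simplicity.
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] result = writeTwoBlocks(new byte[] {1, 2, 3}, new byte[] {9});
        System.out.println(result.length); // 4 + 3 + 4 + 1 = 12
    }
}
```

Because reset() only rewinds the write position, the second block reuses whatever capacity the first block caused the buffer to grow to, which is why a modest initial size costs at most a few reallocations per header.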