Commit c944b6c

Update README and CHANGELOG for CRAM 3.1 write support (5.0.0)
Parent commit: 886f2de

2 files changed: 43 additions, 0 deletions

CHANGELOG.md (41 additions, 0 deletions)

---

## 5.0.0

Adds **CRAM 3.1 write support** to htsjdk. This is the culmination of the read-side codec work
in 4.2.0 and the reader wiring in 4.3.0: htsjdk can now produce CRAM 3.1 files that are
interoperable with samtools/htslib.

### CRAM 3.1 Write Support

- Enable CRAM 3.1 writing with all spec codecs: rANS Nx16, adaptive arithmetic Range coder, FQZComp, Name Tokenisation, and STRIPE
- Add configurable compression profiles (FAST, NORMAL, BEST, ARCHIVE) with trial compression for automatic codec selection
- Implement `TrialCompressor` to replace ad-hoc triple-compression for tags and align trial candidates with htslib
- Add `GzipCodec` for direct Deflater/Inflater GZIP compression, wired into CRAM as a codec option
- Strip NM/MD tags on CRAM encode and regenerate on decode, matching htslib behavior
- Implement attached (same-slice) mate pair resolution
- Align DataSeries content IDs with htslib for cross-implementation debugging
- Remove unnecessary content digest tags from CRAM slice headers
- Add `CramConverter` command-line tool for testing and benchmarking CRAM write profiles
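
Trial compression, in the sense used above, compresses the same block with several candidate codecs and keeps whichever output is smallest. The Java sketch below illustrates only that selection step. It is not htsjdk's `TrialCompressor` API: the class, record, and method names (`TrialCompressionSketch`, `Candidate`, `compressSmallest`) are hypothetical, and raw DEFLATE from `java.util.zip.Deflater` at different levels and strategies stands in for the real CRAM codecs.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.zip.Deflater;

// Hypothetical illustration of trial compression; not htsjdk code.
final class TrialCompressionSketch {

    // A candidate "codec": a name plus a compression function.
    record Candidate(String name, UnaryOperator<byte[]> compress) {}

    // The winning candidate and its compressed bytes.
    record Result(String name, byte[] data) {}

    // Raw DEFLATE (no zlib/GZIP wrapper) standing in for a real CRAM codec.
    static byte[] deflate(byte[] input, int level, int strategy) {
        Deflater deflater = new Deflater(level, true);
        deflater.setStrategy(strategy);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static List<Candidate> defaultCandidates() {
        List<Candidate> candidates = new ArrayList<>();
        candidates.add(new Candidate("raw", b -> b.clone())); // uncompressed fallback
        candidates.add(new Candidate("deflate-1", b -> deflate(b, 1, Deflater.DEFAULT_STRATEGY)));
        candidates.add(new Candidate("deflate-9", b -> deflate(b, 9, Deflater.DEFAULT_STRATEGY)));
        candidates.add(new Candidate("huffman-only", b -> deflate(b, 9, Deflater.HUFFMAN_ONLY)));
        return candidates;
    }

    // Runs every candidate on the block and keeps the first strictly smallest output.
    // Returns null if the candidate list is empty.
    static Result compressSmallest(byte[] block, List<Candidate> candidates) {
        Result best = null;
        for (Candidate candidate : candidates) {
            byte[] compressed = candidate.compress().apply(block);
            if (best == null || compressed.length < best.data().length) {
                best = new Result(candidate.name(), compressed);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        byte[] block = "ACGT".repeat(1000).getBytes(StandardCharsets.US_ASCII);
        Result best = compressSmallest(block, defaultCandidates());
        System.out.println(best.name() + " -> " + best.data().length + " bytes");
    }
}
```

Including an uncompressed candidate bounds the output at the input size. A real encoder must also record which codec won (each CRAM block carries its compression method), and would likely avoid running every trial on every block, since each extra candidate adds encode time.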

### Codec and Compression Optimizations

- Refactor and optimize all rANS codecs: byte-array API, backwards-write encoding, and general simplifications
- Optimize Name Tokeniser encoder: replace regex with hand-written parser; add per-type flags, STRIPE support, stream deduplication, and all-MATCH elimination
- Optimize FQZComp, Range coder, and rANS encoder hot paths
- Tune NORMAL profile codec assignments based on empirical compression testing
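
The Name Tokeniser change above swaps a regular expression for a hand-written parser. A single linear scan can split each read name into runs of letters, runs of digits, and individual punctuation characters. The sketch below shows that general idea with a hypothetical `ReadNameTokenizerSketch` class; it is a simplification, not htsjdk's tokeniser. The CRAM 3.1 name tokeniser defines more token types (for example, digit runs with leading zeros are handled separately) and encodes each token relative to the matching token of the previous name, which is where MATCH tokens come from.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified read-name tokenizer; not htsjdk code.
final class ReadNameTokenizerSketch {

    enum Type { ALPHA, DIGITS, CHAR }

    record Token(Type type, String text) {}

    // One pass over the name, no regex: maximal letter runs, maximal digit runs,
    // and every other character as a single-character token.
    static List<Token> tokenize(String name) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        int n = name.length();
        while (i < n) {
            char c = name.charAt(i);
            int start = i;
            Type type;
            if (isDigit(c)) {
                while (i < n && isDigit(name.charAt(i))) i++;
                type = Type.DIGITS;
            } else if (isAlpha(c)) {
                while (i < n && isAlpha(name.charAt(i))) i++;
                type = Type.ALPHA;
            } else {
                i++; // ':', '.', '_', '/', ... become single-character tokens
                type = Type.CHAR;
            }
            tokens.add(new Token(type, name.substring(start, i)));
        }
        return tokens;
    }

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    private static boolean isAlpha(char c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); }

    public static void main(String[] args) {
        // Prints: ALPHA SRR, DIGITS 062634, CHAR ., DIGITS 1234 (one per line)
        for (Token t : tokenize("SRR062634.1234")) {
            System.out.println(t.type() + " " + t.text());
        }
    }
}
```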

### Performance

- Replace `ByteArrayInputStream`/`ByteArrayOutputStream` with unsynchronized `CRAMByteReader`/`CRAMByteWriter` to eliminate synchronization overhead
- Fuse read base restoration, CIGAR building, and NM/MD computation into a single pass during decode
- Cache tag key metadata to eliminate per-record `String` allocation during CRAM decode
- Pool `RANSNx16Decode` instances in the Name Tokeniser
- Optimize BAM nibble-to-ASCII base decoding with a bulk lookup table
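
BAM stores read bases as 4-bit codes, two per byte with the first base in the high nibble, using the alphabet `=ACMGRSVTWYHKDBN`. A lookup table keyed by the whole packed byte can produce both bases from one table access. The sketch below builds a 512-byte table holding the two ASCII bases for each of the 256 possible byte values; `BamBaseDecodeSketch` and `decode` are hypothetical names, not htsjdk's implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of table-driven BAM base decoding; not htsjdk code.
final class BamBaseDecodeSketch {

    // 4-bit base codes from the SAM/BAM specification, indexed by nibble value 0..15.
    private static final byte[] NIBBLE_TO_BASE =
            "=ACMGRSVTWYHKDBN".getBytes(StandardCharsets.US_ASCII);

    // Flat table: entries 2*b and 2*b+1 hold the bases for packed byte value b
    // (high nibble first, then low nibble).
    private static final byte[] BYTE_TO_BASES = new byte[512];

    static {
        for (int b = 0; b < 256; b++) {
            BYTE_TO_BASES[2 * b] = NIBBLE_TO_BASE[b >>> 4];
            BYTE_TO_BASES[2 * b + 1] = NIBBLE_TO_BASE[b & 0x0F];
        }
    }

    // Decodes readLength bases from (readLength + 1) / 2 packed bytes.
    static byte[] decode(byte[] packed, int readLength) {
        byte[] bases = new byte[readLength];
        int fullBytes = readLength / 2;
        for (int i = 0; i < fullBytes; i++) {
            int index = 2 * (packed[i] & 0xFF); // mask so the byte is treated as unsigned
            bases[2 * i] = BYTE_TO_BASES[index];
            bases[2 * i + 1] = BYTE_TO_BASES[index + 1];
        }
        if ((readLength & 1) != 0) {
            // Odd length: the last byte contributes only its high nibble.
            bases[readLength - 1] = NIBBLE_TO_BASE[(packed[fullBytes] & 0xFF) >>> 4];
        }
        return bases;
    }

    public static void main(String[] args) {
        byte[] packed = {(byte) 0x12, (byte) 0x48}; // A=1, C=2, G=4, T=8
        System.out.println(new String(decode(packed, 4), StandardCharsets.US_ASCII)); // ACGT
    }
}
```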

### Testing and Infrastructure

- Split CRAM 3.1 fidelity tests into per-profile classes for parallel execution
- Reduce memory pressure in unit tests to eliminate OOM failures
- Fix thread-safety bug in `VariantContextTestProvider` causing non-deterministic test counts

---

## 4.3.0 (2025-05-09)

Completes CRAM 3.1 read support by wiring the codec implementations (added in 4.2.0) into

README.md (2 additions, 0 deletions)

> **NOTE: _HTSJDK has only partial support for the latest Variant Call Format Specification. VCFv4.3 can be read but not written, VCFv4.4 can be read in lenient mode only, and there is no support for BCFv2.2._**

> **NOTE: _HTSJDK now supports both reading and writing CRAM 3.1 files. CRAM 3.1 write support includes all codecs defined in the specification (rANS Nx16, adaptive arithmetic Range coder, FQZComp, Name Tokenisation, and STRIPE), configurable compression profiles (FAST, NORMAL, BEST, ARCHIVE), and trial compression for automatic codec selection. Files produced by htsjdk are interoperable with samtools/htslib._**

### Documentation & Getting Help

API documentation for all versions of HTSJDK since `1.128` is available through [javadoc.io](http://www.javadoc.io/doc/com.github.samtools/htsjdk).
