Expands the existing 5.0.0 stub (which previously covered only the CRAM
3.1 write work) into a complete entry covering everything since 4.3.0.

Adds:
- Lead headlines summarizing the major themes (CRAM 3.1 writing, slimmer
runtime deps, faster BAM [de]compression, enforced formatting, fixed
test reporting).
- A prominent ⚠️ Breaking changes section calling out SRA removal,
Nashorn now opt-in, the SAMRecord.toString() format change, the
removed CRAM slice digest tags, and the new default CRAM version 3.1.
- A new "CRAM correctness and cross-implementation fixes" section
consolidating the read- and write-path fixes that improve interop
with samtools/htslib (TLEN computation, CIGAR =/X comparison, CIGAR
reconstruction for sequence '*', container-with-no-slices crash,
archive header overflow, unmapped-read query, supplementary/secondary
read-name limitation).
- Performance entries beyond just the CRAM-internal optimizations:
jlibdeflate integration, the BAM decoding path improvements, and a
long-read-friendly bases-per-slice threshold.
- A bug-fix section covering the LTF8 9-byte write fix, the
SamLocusIterator offset bug, the SamPairUtil dovetail fix, and the
snappy native-load UnsatisfiedLinkError catch.
- A build, tooling, and dependency clean-up section: Palantir Java
Format + Spotless enforcement, Maven Central portal migration,
snapshot version naming, deprecation cleanup, the test-runner
pass/fail-reporting fix, and the dependency clean-up (commons-logging
constraint, Nashorn compileOnly, ngs-java removal).
- A compatibility line noting JDK 17 / 21 / 24 spot-checks.
- Expanded testing entries: the hts-specs CRAM 3.0/3.1 compliance
tests, FQZComp round-trip tests, CRAI correctness tests, test-suite
speedups, the CEUTrio test-data downsizing, and the JS filter test
bulk-up.

Source for the additions was the full git log since the 4.3.0 tag plus
the unsquashed backup branch tf_cram_31_backup_20260425, which retains
fine-grained commits that the merged CRAM 3.1 PR squashed away.

The CRAM write-speed gains are intentionally not headlined yet:
prior htsjdk wrote CRAM 3.0 (lower compression, fewer codec passes), so
"faster" without "and same/better compression" would be misleading.
We'll revisit the perf bullet after benchmarking against samtools.
 - Align DataSeries content IDs with htslib for cross-implementation debugging
-- Remove content digest tags (BD/SD/B5/S5/B1/S1) from CRAM slice headers, matching htslib/samtools behavior. These are optional per the spec and were expensive to compute. Block-level CRC32 (required by CRAM 3.0+) provides data integrity. This is technically a breaking change but has zero practical impact since no known tools consume these tags.
-- Default CRAM version for writing is now 3.1 (was 3.0)
+- Remove content digest tags (BD/SD/B5/S5/B1/S1) from CRAM slice headers, matching htslib/samtools behavior (see Breaking changes)
+- Default CRAM version for writing is now 3.1 (was 3.0; see Breaking changes)
 - Add `CramConverter` command-line tool for testing and benchmarking CRAM write profiles
+- Add cross-implementation CRAM validation pipeline (`validation/`) for round-tripping against samtools/htslib
+- Add bases-per-slice threshold to bound slice memory when writing long reads
+- Refine `CompressionHeader` map serialization
+- Resolve a pile of in-tree `TODO`s in CRAM structure classes
+
+### CRAM correctness and cross-implementation fixes
+
+These fixes apply to both reading and writing CRAM and substantially improve interoperability with samtools/htslib.
+
+- Fix CRAM `TLEN` computation to match htslib (cross-tool comparisons of the same input now produce matching `TLEN` values)
+- Fix `CIGAR` reconstruction when the sequence is `*` (`CF_UNKNOWN_BASES`)
+- Fix `=`/`X` `CIGAR` op comparison in cross-implementation tests
+- Fix CRAM archive header overflow on large containers
+- Fix crash when reading a CRAM container with no slices
+- Fix unmapped-read query in the hts-specs compliance harness
+- Document the supplementary/secondary read-name resolution limitation in the writer
 
 ### Codec and Compression Optimizations
 
@@ -38,17 +95,48 @@ interoperable with samtools/htslib.
 
 ### Performance
 
-- Replace `ByteArrayInputStream`/`ByteArrayOutputStream` with unsynchronized `CRAMByteReader`/`CRAMByteWriter` to eliminate synchronization overhead
-- Fuse read base restoration, CIGAR building, and NM/MD computation into a single pass during decode
+- Integrate [jlibdeflate](https://github.com/fulcrumgenomics/jlibdeflate) for native libdeflate-backed DEFLATE compression and decompression. Used by default; falls back to the JDK Deflater/Inflater if the native library cannot be loaded (#1768)
+- A few targeted optimizations to the BAM decoding path yielding ~6-7% improvement in BAM read performance (#1764)
+- Optimize CRAM write performance: ~15% faster encoding via codec-level tuning and reduced per-record allocation
+- Replace `ByteArrayInputStream`/`ByteArrayOutputStream` with unsynchronized `CRAMByteReader`/`CRAMByteWriter` to eliminate synchronization overhead in CRAM
+- Fuse read base restoration, CIGAR building, and NM/MD computation into a single pass during CRAM decode
 - Cache tag key metadata to eliminate per-record `String` allocation during CRAM decode
 - Pool `RANSNx16Decode` instances in the Name Tokeniser
 - Optimize BAM nibble-to-ASCII base decoding with a bulk lookup table
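
For readers unfamiliar with the bulk lookup-table idea named in the last Performance entry, here is a minimal sketch. It is not htsjdk's actual code: the class name `NibbleBaseDecoder` and its `decode` method are hypothetical, and the only thing taken from the BAM format is the 4-bit base alphabet `=ACMGRSVTWYHKDBN`. The sketch assumes the optimization works by precomputing both decoded bases for every possible packed byte, so the hot loop does one table lookup per byte rather than two shifts and masks.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the htsjdk implementation.
final class NibbleBaseDecoder {
    // BAM 4-bit base codes 0..15 mapped to ASCII.
    private static final byte[] CODES =
            "=ACMGRSVTWYHKDBN".getBytes(StandardCharsets.US_ASCII);

    // For each of the 256 packed byte values, the two decoded bases
    // (high nibble first) stored side by side.
    private static final byte[] PAIRS = new byte[512];

    static {
        for (int b = 0; b < 256; b++) {
            PAIRS[2 * b] = CODES[b >> 4];
            PAIRS[2 * b + 1] = CODES[b & 0x0F];
        }
    }

    // Decodes baseCount bases from a packed array holding two bases per byte.
    static byte[] decode(byte[] packed, int baseCount) {
        byte[] out = new byte[baseCount];
        int fullBytes = baseCount / 2;
        for (int i = 0; i < fullBytes; i++) {
            int idx = (packed[i] & 0xFF) * 2;
            out[2 * i] = PAIRS[idx];
            out[2 * i + 1] = PAIRS[idx + 1];
        }
        // An odd base count leaves one base in the high nibble of the last byte.
        if ((baseCount & 1) == 1) {
            out[baseCount - 1] = CODES[(packed[fullBytes] & 0xFF) >> 4];
        }
        return out;
    }
}
```

As a usage example, `NibbleBaseDecoder.decode(new byte[] {0x12, 0x48, (byte) 0xF0}, 5)` yields the ASCII bytes for `ACGTN`.
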
 
+### Bug fixes
+
+- Fix LTF8 9-byte write bug: wrong bit shift (`>> 28` instead of `>> 24`) corrupted the high byte of large CRAM offsets (#1765)
+- Fix `SamLocusIterator` so that read position is not incorrectly offset (#1758)
+- Fix asymmetric `SamPairUtil.getPairOrientation` on dovetail pairs (#1771)
+- Catch `UnsatisfiedLinkError` when loading the snappy native library so failure to load it does not abort downstream consumers (#1753)
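
To make the LTF8 entry above concrete, the following is a hedged sketch of the 9-byte form, assuming (per the CRAM specification's description of LTF8) that it consists of a `0xFF` marker byte followed by the 64-bit value in eight big-endian bytes. The class `Ltf8NineByteSketch` and its methods are hypothetical and are not htsjdk's code. Driving the shift from a loop counter means each byte's shift is derived rather than typed by hand, which rules out a slip such as `>> 28` where `>> 24` was intended.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only; not the htsjdk implementation.
final class Ltf8NineByteSketch {
    // Writes the 9-byte LTF8 form: 0xFF, then the value as 8 big-endian bytes.
    static byte[] writeNineByte(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(9);
        out.write(0xFF);
        // Shifts run 56, 48, 40, 32, 24, 16, 8, 0: exactly one per output byte.
        for (int shift = 56; shift >= 0; shift -= 8) {
            out.write((int) ((value >>> shift) & 0xFF));
        }
        return out.toByteArray();
    }

    // Reads the value back, skipping the marker byte at index 0.
    static long readNineByte(byte[] encoded) {
        long value = 0;
        for (int i = 1; i <= 8; i++) {
            value = (value << 8) | (encoded[i] & 0xFF);
        }
        return value;
    }
}
```

A round trip of `0x0123456789ABCDEFL` through `writeNineByte` and `readNineByte` returns the same value, and the byte at index 5 of the encoding (bits 24 to 31) is `0x89`. That byte is the one a misplaced shift at that position would get wrong.
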
+
+### Build, tooling, and dependency clean-up
+
+- **Code formatting:** apply [Palantir Java Format](https://github.com/palantir/palantir-java-format) to the entire codebase and enforce it on every build via [Spotless](https://github.com/diffplug/spotless). `compileJava` auto-formats source in place; CI separately runs `spotlessCheck` as the enforcement boundary. See `CONTRIBUTING.md` for details, including the `.git-blame-ignore-revs` opt-in for the bulk-format commit (#1761)
+- **Maven Central publishing migrated** from the legacy OSSRH endpoint to the new [Sonatype Central Portal](https://central.sonatype.com), via the [NMCP Gradle plugin](https://github.com/GradleUp/nmcp). Consumer-visible groupId/artifactId/version coordinates are unchanged (#1769)
+- **Snapshot versioning** now embeds the short commit hash (e.g. `5.0.0-23c681a-SNAPSHOT`) so each snapshot is a distinct, pinnable artifact rather than a moving Maven SNAPSHOT (#1772)
+- **Test runner** now correctly reports failures rather than silently skipping them when a `@DataProvider` throws (#1759)
+- **Existing API deprecations** cleaned up across `htsjdk.samtools` and `htsjdk.variant` (#1767)
+- **`commons-logging` direct declaration removed.** htsjdk does not use commons-logging itself; the version pin is now expressed as a Gradle dependency constraint and only kicks in transitively when JEXL pulls it in
+- **Nashorn moved to `compileOnly`** (see Breaking changes)
+- **`gov.nih.nlm.ncbi:ngs-java` removed** (see Breaking changes, SRA support)
+
+### Compatibility
+
+- Compiled and tested against JDK 17 (CI default), 21, and 24. CI continues to build only on 17. htsjdk's published minimum remains Java 17 (set in 4.0.0)
+
 ### Testing and Infrastructure
 
+- Add hts-specs CRAM 3.0 / 3.1 decode-compliance tests, plus FQZComp round-trip tests using hts-specs quality data
+- Add CRAI index query correctness tests and codec round-trip property tests
 - Split CRAM 3.1 fidelity tests into per-profile classes for parallel execution
+- Speed up BCF2 and SeekableStream integration tests; cache test data in CRAM index test classes
+- Reduce `CRAMFileBAIIndexTest` from 4 to 2 slice-size variants, sampling every 200th
+- Downsample the CEUTrio test CRAM from ~654K to ~150K records (47 MB → 11 MB)
 - Reduce memory pressure in unit tests to eliminate OOM failures
 - Fix thread-safety bug in `VariantContextTestProvider` causing non-deterministic test counts
+- Bulk up the JavaScript filter test suites: replace 4 checked-in `.js` fixtures with 46 small inline-script tests covering all three constructors, return-type semantics, bindings, and error paths (#1775)