Commit e2d1f25
Detect libdeflate >= 1.9 and adjust CRAM RN encoding.

Previously, at compression levels 4 to 7, zlib compressed read names
considerably better than libdeflate, due to libdeflate's poor selection
of minimum match length. This was raised as issue
ebiggers/libdeflate#85. It has now been resolved in libdeflate 1.9
(along with some general improvements elsewhere), so libdeflate is the
better choice in all cases.
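As a rough illustration of the decision described above, the sketch below picks a codec for the read-name (RN) block from the libdeflate version and the compression level. It is not the code from this commit: the names `rn_codec` and `rn_pick_codec`, and the exact 4..7 range used for older versions, are assumptions based only on the description in this message. In a real build the version would presumably come from the `LIBDEFLATE_VERSION_MAJOR` and `LIBDEFLATE_VERSION_MINOR` macros in `libdeflate.h`; here it is passed in as plain integers so the example stands alone.

```c
#include <assert.h>

/* Hypothetical names for illustration; not taken from the commit. */
enum rn_codec { RN_ZLIB, RN_LIBDEFLATE };

/*
 * Assumption: before libdeflate 1.9, zlib gives smaller read-name
 * blocks at levels 4..7; from 1.9 onwards libdeflate is always used.
 */
static enum rn_codec rn_pick_codec(int major, int minor, int level) {
    assert(level >= 1 && level <= 9);

    /* True when the library is 1.9 or newer. */
    int has_fix = major > 1 || (major == 1 && minor >= 9);

    if (!has_fix && level >= 4 && level <= 7)
        return RN_ZLIB;
    return RN_LIBDEFLATE;
}
```

Under these assumptions, `rn_pick_codec(1, 8, 5)` selects zlib while `rn_pick_codec(1, 9, 5)` selects libdeflate, matching the RN= annotations in the figures in this message.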

This commit also fixes the mapping of levels 1..9 (standard zlib) to
1..12 (libdeflate). The maths in the comment was incorrect because the
calculation is done in integer arithmetic, not floating point.
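To illustrate why integer versus floating-point evaluation matters for such a mapping, the sketch below scales a zlib level by a constant and stores the result in an `int`, which truncates toward zero. The 1.2 factor and the function names are assumptions chosen for illustration; this message does not show the actual expression used.

```c
#include <assert.h>

/* Hypothetical scale factor, for illustration only. */
#define SCALE 1.2

/* What a comment might assume: the real-valued product. */
static double scaled_real(int zlevel) {
    return zlevel * SCALE;
}

/*
 * What C stores when the result lands in an int: the product is
 * computed in double, then the fractional part is discarded.
 */
static int scaled_int(int zlevel) {
    assert(zlevel >= 1 && zlevel <= 9);
    int level = zlevel;
    level *= SCALE;   /* double multiply, converted back to int */
    return level;
}
```

With these assumptions, level 9 gives 10.8 in real arithmetic but 10 once stored in an `int`, so a comment written with real-valued results in mind describes a different mapping from the one the code performs.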

Figures from converting 1 million NovaSeq records from BAM to CRAM 3.0:

                        Time       Size       RN
Libdeflate 1.9+PR -7    0m43.816s  204732408   48381374  (RN=libdeflate)
Libdeflate 1.8 -7       0m45.379s  206626451   50580708  (RN=zlib)
Libdeflate 1.8 -7       1m1.431s   210172035  *54126292  (RN=libdeflate, forced)
Zlib only -7            0m48.531s  207189920   50580708  (RN=zlib)

(Default level)
Libdeflate 1.9+PR -5    0m30.323s  207793059   51023626  (RN=libdeflate)
Libdeflate 1.8 -5       0m33.265s  208714328   51612215  (RN=zlib, as devel)
Libdeflate 1.8 -5       0m29.753s  213024792  *55922679  (RN=libdeflate, forced)
Zlib only -5            0m40.353s  208499406   51612215  (RN=zlib)

The rows marked (*) show the problem with using libdeflate for read
names in 1.8: the RN data is about 7% (level 7) and 8% (level 5) larger
than with zlib. This is fixed in 1.9, where libdeflate is once again
both smaller and faster than zlib.

At level 9 it was using libdeflate for RN already, but we see
improvements to both RN and elsewhere which are simply down to other
changes in the library:

                        Time       Size       RN
Libdeflate 1.9+PR -9    2m21.757s  202890458  47327597  (RN=libdeflate)
Libdeflate 1.8 -9       2m6.304s   204292448  48541687  (RN=libdeflate)
Zlib only -9            1m20.966s  206482425  49988310  (RN=zlib)

Finally, the impact of switching level 9 from the old mapping (11) to
the new one (12; "9+"), along with a more complete table for
curiosity's sake:

                        Time       Size       RN
Libdeflate 1.9+PR -9+   2m54.664s  202315823  46783148
Libdeflate 1.9+PR -9    2m21.757s  202890458  47327597
Libdeflate 1.9+PR -8    1m39.040s  202934405  47247996
Libdeflate 1.9+PR -7    0m43.816s  204732408  48381374
Libdeflate 1.9+PR -6    0m31.521s  207437149  50768595
Libdeflate 1.9+PR -5    0m30.323s  207793059  51023626  (default level)
Libdeflate 1.9+PR -4    0m29.478s  210425588  52946850
Libdeflate 1.9+PR -1    0m27.460s  215975209  57142706  (no change)

(Note: "1.9" here is actually master, which is a few commits on from
the tag, but the main gist of it is the same.)

1 parent b28a043 commit e2d1f25
1 file changed
Lines changed: 6 additions & 2 deletions