Commit e2d1f25

Detect libdeflate >= 1.9 and adjust CRAM RN encoding.
Previously, for read names at compression levels between 4 and 7, zlib considerably beat libdeflate due to libdeflate's poor selection of minimum match length. This was raised as issue ebiggers/libdeflate#85. It has now been resolved in libdeflate 1.9 (along with some general improvements elsewhere), and in all cases libdeflate is a better choice.

Also fixed the mapping of levels 1..9 (standard zlib) to 1..12 (libdeflate). The maths in the comment was incorrect as it's an integer calculation, not a floating point one.

Figures from converting 1 million NovaSeq records from BAM to CRAM 3.0:

                      Time       Size       RN
Libdeflate 1.9+PR-7   0m43.816s  204732408   48381374  (RN=libdeflate)
Libdeflate 1.8-7      0m45.379s  206626451   50580708  (RN=zlib)
Libdeflate 1.8-7      1m1.431s   210172035  *54126292  (RN=libdeflate, forced)
Zlib only -7          0m48.531s  207189920   50580708  (RN=zlib)

(Default level)
Libdeflate 1.9+PR-5   0m30.323s  207793059   51023626  (RN=libdeflate)
Libdeflate 1.8-5      0m33.265s  208714328   51612215  (RN=zlib, as devel)
Libdeflate 1.8-5      0m29.753s  213024792  *55922679  (RN=libdeflate, forced)
Zlib only -5          0m40.353s  208499406   51612215  (RN=zlib)

We can clearly see the problem (*) in using libdeflate for read names in 1.8, how it's fixed in 1.9, and how it is now smaller/faster than zlib again.
At level 9 it was using libdeflate for RN already, but we see improvements to both RN and elsewhere which are simply down to other changes in the library:

                      Time       Size       RN
Libdeflate 1.9+PR-9   2m21.757s  202890458   47327597  (RN=libdeflate)
Libdeflate 1.8-9      2m6.304s   204292448   48541687  (RN=libdeflate)
Zlib only -9          1m20.966s  206482425   49988310  (RN=zlib)

Finally, the impact of switching level 9 from the old mapping (11) to new (12; "9+"), along with a more complete table for curiosity's sake:

                      Time       Size       RN
Libdeflate 1.9+PR-9+  2m54.664s  202315823   46783148
Libdeflate 1.9+PR-9   2m21.757s  202890458   47327597
Libdeflate 1.9+PR-8   1m39.040s  202934405   47247996
Libdeflate 1.9+PR-7   0m43.816s  204732408   48381374
Libdeflate 1.9+PR-6   0m31.521s  207437149   50768595
Libdeflate 1.9+PR-5   0m30.323s  207793059   51023626  (default level)
Libdeflate 1.9+PR-4   0m29.478s  210425588   52946850
Libdeflate 1.9+PR-1   0m27.460s  215975209   57142706  (no change)

(Note: "1.9" here is actually master, which is a few commits on from the tag, but the main gist of it is the same.)
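
The level-mapping fix described above can be checked in isolation. The sketch below is not htslib code: `map_level_old`, `map_level_new`, `print_level_table` and `check_level_mapping` are hypothetical names, and the two mapping functions copy only the arithmetic from the removed and added lines of `libdeflate_deflate()`. The explicit `(int)` cast is assumed to be equivalent to the `level *= 1.2` and `level *= 1.23` statements applied to an `int`, and it makes the truncation visible.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: arithmetic of the removed lines (old mapping). */
int map_level_old(int level) {
    level = level > 0 ? level : 6;      /* -1 (default) becomes 6 */
    level = (int)(level * 1.2);         /* product is truncated, not rounded */
    if (level >= 8) level += level / 8; /* integer division: adds 1 here */
    if (level > 12) level = 12;
    return level;
}

/* Hypothetical helper: arithmetic of the added lines (new mapping). */
int map_level_new(int level) {
    level = level > 0 ? level : 6;
    level = (int)(level * 1.23);        /* truncated back to int */
    level += level >= 8;                /* comparison yields 0 or 1 */
    if (level > 12) level = 12;
    return level;
}

/* Print both mappings side by side for zlib levels 1..9. */
void print_level_table(void) {
    for (int z = 1; z <= 9; z++)
        printf("zlib %d -> old %2d, new %2d\n",
               z, map_level_old(z), map_level_new(z));
}

/* Spot checks that follow from the integer arithmetic above. */
void check_level_mapping(void) {
    assert(map_level_old(9) == 11);  /* the "old mapping (11)" */
    assert(map_level_new(9) == 12);  /* the "new (12)" */
    assert(map_level_new(8) == 10);
    assert(map_level_new(5) == 6);
}
```

By this arithmetic the old mapping sent zlib level 9 to libdeflate level 11 and the new one sends it to 12, which agrees with the "old mapping (11) to new (12)" remark. Level 7 ends up at 9 once the `level += level>=8` step is applied; the `5,6,7->6,7,8` in the inline comment matches the values after the multiplication alone.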
1 parent b28a043 commit e2d1f25

1 file changed

Lines changed: 6 additions & 2 deletions

cram/cram_io.c

@@ -1108,8 +1108,8 @@ char *zlib_mem_inflate(char *cdata, size_t csize, size_t *size) {
 static char *libdeflate_deflate(char *data, size_t size, size_t *cdata_size,
                                 int level, int strat) {
     level = level > 0 ? level : 6; // libdeflate doesn't honour -1 as default
-    level *= 1.2; // NB levels go up to 12 here; 5 onwards is +1
-    if (level >= 8) level += level/8; // 8->10, 9->12
+    level *= 1.23; // NB levels go up to 12 here; 5 onwards is +1
+    level += level>=8; // 5,6,7->6,7,8 8->10 9->12
     if (level > 12) level = 12;

     if (strat == Z_RLE) // not supported by libdeflate
@@ -1213,6 +1213,7 @@ char *zlib_mem_inflate(char *cdata, size_t csize, size_t *size) {
 }
 #endif

+#if !defined(HAVE_LIBDEFLATE) || LIBDEFLATE_VERSION_MAJOR < 1 || (LIBDEFLATE_VERSION_MAJOR == 1 && LIBDEFLATE_VERSION_MINOR <= 8)
 static char *zlib_mem_deflate(char *data, size_t size, size_t *cdata_size,
                               int level, int strat) {
     z_stream s;
@@ -1269,6 +1270,7 @@ static char *zlib_mem_deflate(char *data, size_t size, size_t *cdata_size,
     }
     return (char *)cdata;
 }
+#endif

 #ifdef HAVE_LIBLZMA
 /* ------------------------------------------------------------------------ */
@@ -1754,9 +1756,11 @@ static char *cram_compress_by_method(cram_slice *s, char *in, size_t in_size,
         //
         // Eg RN at level 5; libdeflate=55.9MB zlib=51.6MB
 #ifdef HAVE_LIBDEFLATE
+# if (LIBDEFLATE_VERSION_MAJOR < 1 || (LIBDEFLATE_VERSION_MAJOR == 1 && LIBDEFLATE_VERSION_MINOR <= 8))
         if (content_id == DS_RN && level >= 4 && level <= 7)
             return zlib_mem_deflate(in, in_size, out_size, level, strat);
         else
+# endif
             return libdeflate_deflate(in, in_size, out_size, level, strat);
 #else
         return zlib_mem_deflate(in, in_size, out_size, level, strat);
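
The preprocessor conditions in this diff boil down to a single choice of compressor per block. As a rough illustration only, the sketch below restates that choice as ordinary runtime logic. The names `libdeflate_has_rn_fix`, `rn_use_zlib` and `check_rn_choice` are hypothetical and do not exist in htslib or libdeflate. The real code makes the decision at compile time from `HAVE_LIBDEFLATE`, `LIBDEFLATE_VERSION_MAJOR` and `LIBDEFLATE_VERSION_MINOR`, and the `is_rn` parameter stands in for `content_id == DS_RN`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true for libdeflate 1.9 or later, i.e. the negation
 * of the diff's "MAJOR < 1 || (MAJOR == 1 && MINOR <= 8)" test. */
bool libdeflate_has_rn_fix(int major, int minor) {
    return major > 1 || (major == 1 && minor >= 9);
}

/* Hypothetical helper: true when the block would go to zlib_mem_deflate(),
 * false when it would go to libdeflate_deflate(). */
bool rn_use_zlib(bool have_libdeflate, int major, int minor,
                 bool is_rn, int level) {
    if (!have_libdeflate)
        return true;                    /* the #else branch: zlib only */
    if (libdeflate_has_rn_fix(major, minor))
        return false;                   /* 1.9+: libdeflate in all cases */
    /* libdeflate <= 1.8: zlib only for read names at levels 4..7 */
    return is_rn && level >= 4 && level <= 7;
}

/* Spot checks of the three branches. */
void check_rn_choice(void) {
    assert(rn_use_zlib(true, 1, 8, true, 5));    /* old library, RN, level 5 */
    assert(!rn_use_zlib(true, 1, 9, true, 5));   /* fixed library */
    assert(rn_use_zlib(false, 0, 0, false, 1));  /* no libdeflate at all */
}
```

The same version test also guards the definition of `zlib_mem_deflate()` in the diff, presumably so that the static function is not left defined but unused when libdeflate 1.9 or later handles every block.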
