# vpxcoding single-file-header C library

**WIP Note** - This offshoot has not been battle-hardened, and is subject to change. Also, hopefully in time there will be more complete, practical examples.

Single file header form of the range coder from [libvpx](https://github.com/webmproject/libvpx) (from the VP8/VP9 video codecs), usable as a general-purpose compression/decompression algorithm for bitstreams. [Range coding](https://en.wikipedia.org/wiki/Range_coding) is a type of [arithmetic coding](https://en.wikipedia.org/wiki/Arithmetic_coding), able to offer even better compression than [Huffman coding](https://en.wikipedia.org/wiki/Huffman_coding) (which is optimal only among codes that spend a whole number of bits per symbol) because it can represent symbols using fractional numbers of bits.

The idea of this coding is, given:

1. A bitstream
2. Knowledge about how likely the next bit is to be a 0 or 1 (written as a probability from 0..255)

You can optimally code an output bitstream, with compression better than Huffman trees, by using arithmetic coding. Please note this is **not** a replacement for something like LZ77, zstd, zlib, etc., but it **is** a replacement for Huffman coding. This ONLY covers optimal symbol expression; if there is data-similarity, that must be compressed by another algorithm. In general, you will want to get rid of whatever redundancy you can before applying this compression technique, i.e. you can't just use this to compress text on its own. If you are looking for something for that, you may want to consider my [heatshrink single-file-header](https://github.com/cnlohr/heatshrink-sfh).

It's also reasonably fast. Not great, but not bad:

```
Input Len: 16777216 bytes
Output Len: 14104056 bytes
Relative Size: 84.07 %
Matching 16777216 bytes
Encode Time: 375.116ms (42.653 MBytes/s)
Decode Time: 537.677ms (29.758 MBytes/s)
```

(On an AMD Ryzen 7 5800X, GCC 11.4.0, -O2.)

Also, the code is very small: about 768 bytes each for reading and writing when compiled (below, using -Os, x64).

```
.rodata 0100 (256 bytes) vpx_norm  // Table used for both encode and decode

.text 003f (63 bytes) vpx_start_encode
.text 00e4 (228 bytes) vpx_write
.text 005e (94 bytes) vpx_stop_encode

.text 0073 (115 bytes) vpx_read
.text 00fa (250 bytes) vpx_reader_fill
.text 003f (35 bytes) vpx_reader_find_end
.text 0066 (102 bytes) vpx_reader_init
.text 0016 (22 bytes) vpx_reader_has_error
```

If you are on a platform that supports `__builtin_clz`, you may want to define `VPXCODING_NOTABLE`, as that replaces the table lookup with a `clz` and `andi` operation, which may be faster and uses less cache/RAM. If you are on a RAM-constrained system, you may want to do this as well, but see the note in the header file about the manually unwound log2.

In my tests, depending on the application, this seems to save between 1% and 5% over Huffman trees. Notably, though, there are situations where you can use this to much greater effect, and with more simplicity, than Huffman trees (but not in all situations).
## Example
It's very simple: if you have a bitstream you want to encode, you can write something like: