Merge branch 'master' of https://github.com/cnlohr/vpxcoding

cnlohr · cnlohr · commit 39f4e075b4da · 2025-07-08T00:10:59.000-04:00
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
@@ -0,0 +1,20 @@
+name: Build Test
+
+on:
+  push:
+  pull_request:
+
+jobs:
+  Build-for-Linux:
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@v4
+      with:
+        submodules: recursive
+    - name: Install more dependencies
+      run: |
+        sudo apt-get install -y \
+          make \
+          build-essential
+    - name: Build
+      run: make test
diff --git a/Makefile b/Makefile
@@ -4,7 +4,7 @@ bittester : bittester.c
 	gcc -o $@ $^ -O2 -g
 
 optimalfinder : optimalfinder.c
-	gcc -o $@ $^ -O2 -g
+	gcc -o $@ $^ -Os -g
 
 test : bittester
 	./bittester
diff --git a/README.md b/README.md
@@ -1,15 +1,48 @@
-# vpxcoding single-file-header library.
+# vpxcoding single-file-header C library
 
-Single file header form of the arithmetic coder from [libvpx](https://github.com/webmproject/libvpx) as a general purpose compression/decompression of bitstreams algorithm.
+**WIP Note** - This offshoot has not been battle-hardened, and is subject to change.  Also, hopefully in time there will be more complete, practical examples.
+
+Single file header form of the range coder from [libvpx](https://github.com/webmproject/libvpx) (from the VP8/VP9 video codecs) as a general purpose algorithm for compressing and decompressing bitstreams.  [Range Coding](https://en.wikipedia.org/wiki/Range_coding) is a type of [Arithmetic Coding](https://en.wikipedia.org/wiki/Arithmetic_coding).  [Huffman Coding](https://en.wikipedia.org/wiki/Huffman_coding) is provably optimal only among codes that spend a whole number of bits on each symbol; range coding can offer even better compression because it can represent symbols using fractional numbers of bits.
 
 The idea of this coding is, given:
 
 1. A bitstream
 2. Knowledge about how likely the next bit is to be a 0 or 1 (written as a probability from 0..255)
 
-You can optimally code an output bitstream, with compression better than huffman trees by using arithmetic coding.  Please note this is **not** a replacement for something like lz77, zstd, zlib, etc.  But **is** a replacement for huffman coding.  This ONLY covers optimal symbol expression.  If there is data-similarity that must be compressed by another algorithm.
+You can optimally code an output bitstream, with compression better than Huffman trees, by using arithmetic coding.  Please note this is **not** a replacement for something like lz77, zstd, zlib, etc., but it **is** a replacement for Huffman coding.  This ONLY covers optimal symbol expression.  If there is data similarity, it must be compressed by another algorithm.  In general, you will want to remove whatever redundancy you can before applying this compression technique, i.e. you can't just use this to compress text.  If you are looking for something for that, you may want to consider my [heatshrink single-file-header](https://github.com/cnlohr/heatshrink-sfh).
+
+It's also reasonably fast. Not great, but not bad.
+
+```
+Input  Len: 16777216 bytes
+Output Len: 14104056 bytes
+Relative Size: 84.07 %
+Matching 16777216 bytes
+Encode Time:  375.116ms (42.653 MBytes/s)
+Decode Time: 537.677ms (29.758 MBytes/s)
+```
+(on an AMD Ryzen 7 5800X, GCC 11.4.0, -O2)
+
+Also, the code is very small, about 768 bytes each for reading and writing when compiled (below, using -Os on x64).
+```
+.rodata	0100 (256 bytes)  vpx_norm           // Table used for both encode and decode
+
+.text	003f (63 bytes)   vpx_start_encode
+.text	00e4 (228 bytes)  vpx_write
+.text	005e (94 bytes)   vpx_stop_encode
 
-In general, you will want to get rid of whatever entropy you can before applying this compression technique.
+.text	0073 (115 bytes)  vpx_read
+.text	00fa (250 bytes)  vpx_reader_fill
+.text	003f (35 bytes)   vpx_reader_find_end
+.text	0066 (102 bytes)  vpx_reader_init
+.text	0016 (22 bytes)   vpx_reader_has_error
+```
+
+If you are on a platform that supports `__builtin_clz`, then you may want to define `VPXCODING_NOTABLE` as that will replace the table call with a `clz` and `andi` operation, which may be faster, and use less cache/RAM.  If you are on a RAM constrained system, you may want to do this as well, but see the note in the header file about the manually unwound log2.
+
+In my tests, depending on the application, this seems to save between 1% and 5% over Huffman trees.  Notably, there are some situations (but not all) where you can use this to much greater effect, and with more simplicity, than Huffman trees.
+
+## Example
 
 It's very simple, if you have a bitstream you want to encode, you can write something like:
 
@@ -66,11 +99,13 @@ NOTE: This is found emperically.  It may not be correct or as-designed.
 
 ## Overall Properties
 
-![Overall](https://private-user-images.githubusercontent.com/2748168/398571327-1ef7e391-2c7e-4f97-8a4e-cff0c53cc818.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzUxMjkyNDIsIm5iZiI6MTczNTEyODk0MiwicGF0aCI6Ii8yNzQ4MTY4LzM5ODU3MTMyNy0xZWY3ZTM5MS0yYzdlLTRmOTctOGE0ZS1jZmYwYzUzY2M4MTgucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MTIyNSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDEyMjVUMTIxNTQyWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9OTFkOTgyMTY5MjJhNjk0OTNjM2VhMGI2ZDMxYzVkNTIyN2QwMmY0ZWQ1NDlmMWU1MWNhNTZmNGVkNDI3YmQxNiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.IX-AwtKMzUBppi4N1R1tnjksyofYlQKl0izo2jKk1Eg)
+![Optimal Compression Ratio](https://github.com/user-attachments/assets/02b9d48f-497c-4633-87b8-42a0e345aeaa)
 
-![Edges](https://private-user-images.githubusercontent.com/2748168/398571325-ded30a7a-e449-4da4-865c-d450513f0139.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzUxMjkyNDIsIm5iZiI6MTczNTEyODk0MiwicGF0aCI6Ii8yNzQ4MTY4LzM5ODU3MTMyNS1kZWQzMGE3YS1lNDQ5LTRkYTQtODY1Yy1kNDUwNTEzZjAxMzkucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MTIyNSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDEyMjVUMTIxNTQyWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9YTJmM2E4YmE3NGQ1Y2EwZDFhYTk2N2Q1ZmIyMWQ5ODJmZTE1ZTZhZmQyNTM2ZTA1ODI4YjAxNWE5YzExMDgxYyZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.MEo910d7UEcy97vSx2zMIF6cBCvuRbwpt2yofZA4ROM)
+![Overall](https://github.com/user-attachments/assets/55d98d1d-9fc9-4bb2-a436-16dd0fbc603d)
 
-![Optimal](https://private-user-images.githubusercontent.com/2748168/398571328-1c47a87f-d00f-4b7b-bf1e-1da297602786.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzUxMjkyNDIsIm5iZiI6MTczNTEyODk0MiwicGF0aCI6Ii8yNzQ4MTY4LzM5ODU3MTMyOC0xYzQ3YTg3Zi1kMDBmLTRiN2ItYmYxZS0xZGEyOTc2MDI3ODYucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MTIyNSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDEyMjVUMTIxNTQyWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9OTc3ODllOTVmZmU4MzJjZGNkNGVhOWEwMjBkNjM5MGViMGUxNjM2NDc1OTk4ZmMwYjQwNTNkZTA2OTFkNmNhMyZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.P3PlCYdhjuegYD7KD6o_LoAldsuhj8dNmVlHC33C6AQ)
+![Edges](https://github.com/user-attachments/assets/c18f296a-d2af-4d7d-84a3-ef145f01a66c)
+
+![Optimal](https://github.com/user-attachments/assets/d2315457-68a6-460e-aaa2-73ba25c0b0aa)
 
 
 ## Special Thanks
@@ -82,3 +117,14 @@ Whoever the original author actually was.  At the very least thank you for makin
  * https://github.com/danielrh/arithmetic_coding_tutorial 
  * https://github.com/danielrh/losslessh264
 
+ * This video is a GREAT introduction to Huffman, Arithmetic Coding, and ANS: [Better than Huffman](https://www.youtube.com/watch?v=RFWJM8JMXBs)
+
+These YouTube videos were really good, but don't explain this specific implementation.
+ * [(IC 5.1) Arithmetic coding - introduction](https://www.youtube.com/watch?v=ouYV3rBtrTI)
+ * [(IC 5.2) Arithmetic coding - Example #1](https://www.youtube.com/watch?v=7vfqhoJVwuc)
+ * [(IC 5.3) Arithmetic coding - Example #2](https://www.youtube.com/watch?v=CXCWQy9N2ag)
+ * [(IC 5.4) Why the interval needs to be completely contained](https://www.youtube.com/watch?v=jHS8-rmEo5k)
+ * [(IC 5.5) Rescaling operations for arithmetic coding](https://www.youtube.com/watch?v=t8_198HHSfI)
+ * The series continues on for what looks like about 13 episodes.
+
+
diff --git a/bittester.c b/bittester.c
@@ -4,6 +4,7 @@
 
 #define VPXCODING_READER
 #define VPXCODING_WRITER
+#define VPXCODING_NOTABLE
 #include "vpxcoding.h"
 
 double GetAbsoluteTime()
@@ -14,9 +15,29 @@ double GetAbsoluteTime()
 }
 
 uint8_t dummydata[1024*1024*16];
+
+//#define TEST_RANDOM_PROBS
+
+#ifndef TEST_RANDOM_PROBS
+const int arbitrary_prob = 187;
+#else
+uint8_t dummyprobs[1024*1024*16*8];
+#endif
+
 int main( int argc, char ** argv )
 {
 	int i;
+#if 0
+	for( i = 0; i < 256; i++ )
+	{
+		printf( "%2d", VPXCODING_VPXNORM( i ) );
+		if( ( i & 15 ) == 15 )
+		{
+			printf( "\n" );
+		}
+	}
+	return 0;
+#endif
 	for( i = 0; i < sizeof(dummydata); i++ )
 	{
 		//dummydata[i] = rand();
@@ -27,18 +48,22 @@ int main( int argc, char ** argv )
 	uint8_t * bufferO = malloc(sizeof(dummydata)*20);
 	vpx_start_encode( &w, bufferO, sizeof(dummydata)*20);
 
-	const int arbitrary_prob = 187;
-
 	double startEnc = GetAbsoluteTime();
+	int bitno = 0;
 	for( i = 0; i < sizeof(dummydata); i++ )
 	{
 		int data = dummydata[i];
 		int bits = 8;
 		int bit;
 		for (bit = bits - 1; bit >= 0; bit--)
 		{
+#ifdef TEST_RANDOM_PROBS
+			int prob = dummyprobs[bitno++];
+#else
+			int prob = arbitrary_prob;
+#endif
 			int outbit = (data >> bit) & 1;
-			vpx_write(&w, outbit, arbitrary_prob);
+			vpx_write(&w, outbit, prob);
 		}
 	}
 	vpx_stop_encode(&w);
@@ -56,13 +81,20 @@ int main( int argc, char ** argv )
 	vpx_reader reader;
 	int ret = vpx_reader_init(&reader, bufferO, w.pos, 0, 0 );
 	double startDec = GetAbsoluteTime();
+	bitno = 0;
 	for( i = 0; i < sizeof(dummydata); i++ )
 	{
 		int bits = 8;
 		int bit;
 		uint8_t data = 0;
+#ifdef TEST_RANDOM_PROBS
+		int prob = dummyprobs[bitno++] = rand()&0xff;
+#else
+		int prob = arbitrary_prob;
+#endif
+
 		for (bit = bits - 1; bit >= 0; bit--)// Arbitrary, for testing
-			data = (data<<1) | vpx_read(&reader, arbitrary_prob);
+			data = (data<<1) | vpx_read(&reader, prob);
 		if( data != dummydata[i] )
 		{
 			fprintf( stderr, "Disagree at %d (%08x != %08x)\n", i, data, dummydata[i] );
@@ -73,7 +105,7 @@ int main( int argc, char ** argv )
 
 	printf( "Matching %d bytes\n", i );
 	printf( "Encode Time:  %.3fms (%.3f MBytes/s)\n", (endEnc - startEnc)*1000.0, (sizeof(dummydata)/1024/1024)/(endEnc - startEnc) );
-	printf( "Deccode Time: %.3fms (%.3f MBytes/s)\n", (endDec - startDec)*1000.0, (sizeof(dummydata)/1024/1024)/(endDec - startDec) );
+	printf( "Decode Time: %.3fms (%.3f MBytes/s)\n", (endDec - startDec)*1000.0, (sizeof(dummydata)/1024/1024)/(endDec - startDec) );
 
 	return 0;
 }
diff --git a/optimalfinder.c b/optimalfinder.c
@@ -2,8 +2,10 @@
 #include <stdio.h>
 #include <sys/time.h>
 
+#define VPXCODING_DECORATOR
 #define VPXCODING_READER
 #define VPXCODING_WRITER
+#define VPXCODING_IMPLEMENTATION
 #include "vpxcoding.h"
 
 uint8_t dummydata[1024*64];
@@ -12,14 +14,18 @@ int main( int argc, char ** argv )
 {
 	srand( 1 );
 	int i;
-	printf( "Probability of 1," );
+	FILE * fOpt = fopen( "FullList.csv", "w" );
+	FILE * fOptPar = fopen( "FullListPar.csv", "w" );
+	fprintf( fOpt, "Probability of 1," );
 	for( int probability = 0; probability < 256; probability++ )
 	{
-		printf( "%d,", probability );
+		fprintf( fOpt, "%d,", probability );
 	}
-	printf( "Best\n" );	double percentones = 0.0;
+	fprintf( fOpt, "Best\n" );	double percentones = 0.0;
 	for( percentones = 0.0; percentones < 100; percentones+=0.1 )
 	{
+		printf( "%f\n", percentones );
+
 		memset( dummydata, 0, sizeof( dummydata ) );
 		for( i = 0; i < sizeof(dummydata); i++ )
 		{
@@ -33,7 +39,7 @@ int main( int argc, char ** argv )
 			}
 		}
 
-		printf( "%.1f%%,",percentones);
+		fprintf( fOpt, "%.1f%%,",percentones);
 		int bestprob;
 		double bestrate;
 		bestprob = 0;
@@ -62,7 +68,7 @@ int main( int argc, char ** argv )
 			}
 			//printf( "Relative Size: %.2f %%\n", w.pos * 100.0 / sizeof(dummydata) );
 			double rate = w.pos * 100.0 / sizeof(dummydata);
-			printf( "%.4f,", rate );
+			fprintf( fOpt, "%.4f,", rate );
 			if( bestrate > rate )
 			{
 				bestrate = rate;
@@ -85,7 +91,8 @@ int main( int argc, char ** argv )
 				}
 			}
 		}
-		printf( "%d\n",bestprob );
+		fprintf( fOpt, "%d\n",bestprob );
+		fprintf( fOptPar, "%f,%f,%d\n", percentones, bestrate, bestprob );
 	}
 
 
diff --git a/vpxcoding.h b/vpxcoding.h
@@ -1,6 +1,6 @@
 /*
  *  Copyright (c) 2010 The WebM project authors. All Rights Reserved.
- *  Amalgam is (c) 2024 cnlohr
+ *  Amalgam is (c) 2024,2025 cnlohr
  *
  *  Use of this source code is governed by a BSD-style license
  *  that can be found in the LICENSE file in the root of the libvpx
@@ -15,6 +15,8 @@
  *  This amalgam has some notable changes:
  *    1. Changed decrypt_state / decrypt_cb to ingest for reader.
  *    2. Removed all libvpx dependencies.
+ *    3. Changed vpx_norm to be configurable (For low-flash situations)
+ *    4. Removed endian-specific code, and just iterated directly.
  *
  *  To Use:
 
@@ -32,7 +34,6 @@
 #include <stdint.h>
 #include <limits.h>
 #include <string.h>
-#include <endian.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -44,7 +45,7 @@ extern "C" {
 #define VPXCODING_IMPLEMENTATION
 #endif
 
-
+#ifndef VPXCODING_CUSTOM_VPXNORM
 static const uint8_t vpx_norm[256] = {
 	0, 7, 6, 6, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4,
 	3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
@@ -63,6 +64,7 @@ static const uint8_t vpx_norm[256] = {
 	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
 };
+#endif
 
 #ifdef VPXCODING_READER
 
@@ -175,12 +177,13 @@ VPXCODING_DECORATOR void vpx_reader_fill(vpx_reader *r)
 	if (bits_left > BD_VALUE_SIZE) {
 		const int bits = (shift & 0xfffffff8) + CHAR_BIT;
 		BD_VALUE nv;
-		BD_VALUE big_endian_values;
-		memcpy(&big_endian_values, buffer, sizeof(BD_VALUE));
+		BD_VALUE big_endian_values = 0;
+		int n;
 #ifdef VPX_64BIT
-		big_endian_values = htobe64(big_endian_values);
+		// Formulated a little unusually, but selected by looking through different godbolt outputs, comparing this and |= (buffer[n]<<(56-n*8))
+		for( n = 0; n < 8; n++ ) big_endian_values = (big_endian_values<<8) | buffer[n];
 #else
-		big_endian_values = htobe32(big_endian_values);
+		for( n = 0; n < 4; n++ ) big_endian_values = (big_endian_values<<8) | buffer[n];
 #endif
 		nv = big_endian_values >> (BD_VALUE_SIZE - bits);
 		count += bits;
@@ -459,3 +462,4 @@ VPXCODING_DECORATOR int vpx_stop_encode(vpx_writer *br) {
 
 #endif
 
+
