33Comparison of `zodb-json-codec` (Rust + PyO3) vs CPython's `pickle` module
44for ZODB record encoding/decoding.
55
6- Measured on: 2026-02-24
6+ Measured on: 2026-02-25
77Python: 3.13.9, PyO3: 0.28, 5000 iterations, 100 warmup
8- Build: `maturin develop --release` (optimized, LTO + codegen-units=1 + PGO)
8+ Build: `maturin develop --release` + PGO (LTO + codegen-units=1)
99
1010**Important:** Always benchmark with `maturin develop --release`. Debug builds
1111are 3-8x slower due to missing optimizations and inlining.
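
The iteration and warmup counts above suggest a timing loop of roughly the following shape. This is a minimal sketch of the CPython `pickle` baseline only, not the project's `benchmarks/bench.py`; the helper name `time_call` and the sample record are hypothetical.

```python
import pickle
import statistics
import time


def time_call(fn, arg, iterations=5000, warmup=100):
    """Return per-call timings in microseconds for fn(arg)."""
    # Warmup calls are executed but not recorded.
    for _ in range(warmup):
        fn(arg)
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn(arg)
        samples.append((time.perf_counter_ns() - start) / 1000.0)
    return samples


# Hypothetical stand-in for a small ZODB state dict.
state = {"title": "Front page", "count": 42, "tags": ["a", "b"], "ratio": 0.5}
payload = pickle.dumps(state, protocol=3)

decode_us = time_call(pickle.loads, payload)
encode_us = time_call(pickle.dumps, state)

print(f"pickle.loads mean: {statistics.mean(decode_us):.2f} us")
print(f"pickle.dumps mean: {statistics.mean(encode_us):.2f} us")
```

Timing each call separately keeps the raw samples available for median and percentile summaries, at the cost of some clock overhead on sub-microsecond calls.
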
@@ -20,7 +20,8 @@ The codec does fundamentally more work than `pickle.loads`/`pickle.dumps`:
2020
2121The codec's value is not raw speed but **JSONB queryability** — enabling SQL
2222queries on ZODB object attributes in PostgreSQL. Despite the extra work, the
23- release build beats CPython pickle on most operations.
23+ release build beats CPython pickle on encode and roundtrip across all
24+ categories, and on decode for all but the largest string-dominated payloads.
2425
2526---
2627
@@ -30,64 +31,66 @@ release build beats CPython pickle on most operations.
3031
3132| Category | Python | Codec | Ratio |
3233| ---| ---| ---| ---|
33- | simple_flat_dict (120 B) | 1.9 us | 1.1 us | **1.8x faster** |
34- | nested_dict (187 B) | 2.9 us | 1.8 us | **1.6x faster** |
35- | large_flat_dict (2.5 KB) | 22.8 us | 19.7 us | **1.2x faster** |
36- | bytes_in_state (1 KB) | 1.8 us | 1.9 us | 1.1x slower |
37- | special_types (314 B) | 6.8 us | 4.7 us | **1.5x faster** |
38- | btree_small (112 B) | 1.9 us | 1.8 us | 1.1x faster |
39- | btree_length (44 B) | 1.0 us | 0.5 us | **2.0x faster** |
40- | scalar_string (72 B) | 1.1 us | 0.5 us | **2.1x faster** |
41- | wide_dict (27 KB) | 264 us | 279 us | 1.1x slower |
42- | deep_nesting (379 B) | 7.2 us | 7.3 us | 1.0x |
34+ | simple_flat_dict (120 B) | 1.9 us | 1.0 us | **1.9x faster** |
35+ | nested_dict (187 B) | 2.7 us | 1.6 us | **1.3x faster** |
36+ | large_flat_dict (2.5 KB) | 22.6 us | 18.0 us | **1.3x faster** |
37+ | bytes_in_state (1 KB) | 1.6 us | 1.4 us | **1.1x faster** |
38+ | special_types (314 B) | 6.8 us | 3.8 us | **1.8x faster** |
39+ | btree_small (112 B) | 1.7 us | 1.5 us | **1.2x faster** |
40+ | btree_length (44 B) | 1.0 us | 0.4 us | **2.3x faster** |
41+ | scalar_string (72 B) | 1.1 us | 0.5 us | **2.2x faster** |
42+ | wide_dict (27 KB) | 250 us | 244.5 us | **1.0x faster** |
43+ | deep_nesting (379 B) | 6.9 us | 6.4 us | 1.0x slower |
4344
4445### Decode to JSON string (pickle bytes -> JSON, all in Rust)
4546
46- The direct path for PG storage — serializes to a JSON string entirely in Rust
47- with the GIL released. Compared against the dict path + `json.dumps()`.
47+ The direct path for PG storage — writes JSON tokens directly to a `String`
48+ buffer from the PickleValue AST, entirely in Rust with the GIL released.
49+ No intermediate `serde_json::Value` allocations. Compared against the dict
50+ path + `json.dumps()`.
4851
4952| Category | Dict+dumps | JSON str | Speedup |
5053| ---| ---| ---| ---|
51- | simple_flat_dict | 2.7 us | 1.3 us | **2.2x faster** |
52- | nested_dict | 4.3 us | 2.5 us | **1.7x faster** |
53- | large_flat_dict | 35.4 us | 25.6 us | **1.4x faster** |
54- | bytes_in_state | 5.7 us | 2.7 us | **2.1x faster** |
55- | special_types | 7.1 us | 4.7 us | **1.5x faster** |
56- | btree_small | 3.8 us | 2.1 us | **1.8x faster** |
57- | btree_length | 1.5 us | 0.8 us | **1.9x faster** |
58- | scalar_string | 0.9 us | 0.7 us | **1.3x faster** |
59- | wide_dict | 273.7 us | 307.6 us | 1.1x slower |
60- | deep_nesting | 13.3 us | 8.6 us | **1.5x faster** |
54+ | simple_flat_dict | 2.7 us | 1.1 us | **2.5x faster** |
55+ | nested_dict | 4.3 us | 1.9 us | **2.3x faster** |
56+ | large_flat_dict | 33.7 us | 17.1 us | **2.0x faster** |
57+ | bytes_in_state | 5.2 us | 1.6 us | **3.3x faster** |
58+ | special_types | 7.5 us | 4.0 us | **1.9x faster** |
59+ | btree_small | 3.6 us | 1.6 us | **2.3x faster** |
60+ | btree_length | 1.4 us | 0.5 us | **2.8x faster** |
61+ | scalar_string | 0.8 us | 0.6 us | **1.3x faster** |
62+ | wide_dict | 290.5 us | 161.6 us | **1.8x faster** |
63+ | deep_nesting | 14.2 us | 5.7 us | **2.5x faster** |
6164
6265### Encode (Python dict -> pickle bytes)
6366
6467| Category | Python | Codec | Ratio |
6568| ---| ---| ---| ---|
66- | simple_flat_dict | 1.3 us | 0.2 us | **6.5x faster** |
67- | nested_dict | 1.5 us | 0.3 us | **4.8x faster** |
68- | large_flat_dict | 5.3 us | 1.5 us | **3.5x faster** |
69- | bytes_in_state | 1.2 us | 0.7 us | **1.7x faster** |
70- | special_types | 4.7 us | 0.5 us | **9.8x faster** |
71- | btree_small | 1.3 us | 0.2 us | **6.0x faster** |
72- | btree_length | 1.1 us | 0.1 us | **8.8x faster** |
73- | scalar_string | 1.2 us | 0.1 us | **8.3x faster** |
74- | wide_dict | 56.4 us | 13.9 us | **4.0x faster** |
75- | deep_nesting | 2.8 us | 1.0 us | **2.8x faster** |
69+ | simple_flat_dict | 1.3 us | 0.2 us | **6.7x faster** |
70+ | nested_dict | 1.6 us | 0.3 us | **6.4x faster** |
71+ | large_flat_dict | 5.7 us | 1.6 us | **3.9x faster** |
72+ | bytes_in_state | 1.3 us | 0.8 us | **1.7x faster** |
73+ | special_types | 4.6 us | 0.5 us | **9.2x faster** |
74+ | btree_small | 1.3 us | 0.2 us | **6.6x faster** |
75+ | btree_length | 1.0 us | 0.1 us | **8.0x faster** |
76+ | scalar_string | 1.0 us | 0.1 us | **7.9x faster** |
77+ | wide_dict | 56.9 us | 13.7 us | **4.1x faster** |
78+ | deep_nesting | 2.6 us | 1.0 us | **2.6x faster** |
7679
7780### Full roundtrip (decode + encode)
7881
7982| Category | Python | Codec | Ratio |
8083| ---| ---| ---| ---|
81- | simple_flat_dict | 3.2 us | 1.4 us | **2.4x faster** |
82- | nested_dict | 4.5 us | 2.1 us | **2.2x faster** |
83- | large_flat_dict | 29.7 us | 19.1 us | **1.6x faster** |
84- | bytes_in_state | 3.3 us | 2.4 us | **1.4x faster** |
85- | special_types | 11.7 us | 4.4 us | **2.7x faster** |
86- | btree_small | 5.8 us | 1.8 us | **3.3x faster** |
87- | btree_length | 2.1 us | 0.6 us | **3.6x faster** |
88- | scalar_string | 2.3 us | 0.6 us | **3.6x faster** |
89- | wide_dict | 316 us | 260 us | **1.2x faster** |
90- | deep_nesting | 10.3 us | 7.3 us | **1.4x faster** |
84+ | simple_flat_dict | 3.2 us | 1.3 us | **2.6x faster** |
85+ | nested_dict | 4.4 us | 2.1 us | **2.1x faster** |
86+ | large_flat_dict | 28.7 us | 19.8 us | **1.5x faster** |
87+ | bytes_in_state | 3.1 us | 2.3 us | **1.4x faster** |
88+ | special_types | 11.5 us | 4.9 us | **2.4x faster** |
89+ | btree_small | 3.1 us | 1.8 us | **1.7x faster** |
90+ | btree_length | 2.0 us | 0.6 us | **3.4x faster** |
91+ | scalar_string | 2.1 us | 0.6 us | **3.5x faster** |
92+ | wide_dict | 318 us | 258.8 us | **1.3x faster** |
93+ | deep_nesting | 10.0 us | 7.8 us | **1.3x faster** |
9194
9295### Output size (pickle bytes vs JSON)
9396
@@ -122,18 +125,18 @@ plus OOBTree containers, group summaries, and edge-case objects.
122125
123126| Metric | Codec | Python | Speedup |
124127| ---| ---| ---| ---|
125- | Decode mean | 26.9 us | 22.2 us | 1.2x slower |
126- | Decode median | 23.2 us | 21.6 us | 1.1x slower |
127- | Decode P95 | 39.7 us | 31.7 us | 1.3x slower |
128- | Encode mean | 4.7 us | 18.0 us | **3.8x faster** |
129- | Encode median | 3.9 us | 19.7 us | **5.1x faster** |
130- | Encode P95 | 9.6 us | 29.1 us | **3.0x faster** |
128+ | Decode mean | 27.2 us | 22.7 us | 1.2x slower |
129+ | Decode median | 23.6 us | 22.2 us | 1.1x slower |
130+ | Decode P95 | 40.5 us | 33.1 us | 1.2x slower |
131+ | Encode mean | 4.8 us | 18.2 us | **3.8x faster** |
132+ | Encode median | 4.0 us | 19.9 us | **5.0x faster** |
133+ | Encode P95 | 9.9 us | 30.0 us | **3.0x faster** |
131134| Total pickle | 5.1 MB | — | — |
132135| Total JSON | 7.2 MB | — | 1.41x |
133136
134137Decode is slightly slower (1.1x median) due to the two-pass conversion plus
135138type-aware transformation. The gap narrows on metadata-heavy records.
136- Encode is consistently **3.0-5.1x faster** because the Rust encoder writes
139+ Encode is consistently **3.0-5.0x faster** because the Rust encoder writes
137140pickle opcodes directly from Python objects, bypassing intermediate allocations.
138141
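
The mean, median, P95 and Speedup columns above can be derived from per-record timings roughly as follows. This is a sketch only: `summarize` and `ratio_label` are hypothetical helpers, the nearest-rank P95 is an assumption about how the percentile is taken, and the benchmark script presumably computes ratios from unrounded timings rather than the rounded table values used here.

```python
import math
import statistics


def summarize(samples_us):
    """Mean, median and nearest-rank 95th percentile of timings in microseconds."""
    ordered = sorted(samples_us)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
    }


def ratio_label(codec_us, python_us):
    """Label the codec relative to Python, e.g. '3.8x faster' or '1.2x slower'."""
    if codec_us <= python_us:
        return f"{python_us / codec_us:.1f}x faster"
    return f"{codec_us / python_us:.1f}x slower"


# Rounded values taken from the table above.
print(ratio_label(4.8, 18.2))   # encode mean
print(ratio_label(27.2, 22.7))  # decode mean
print(summarize([1.0, 2.0, 3.0, 4.0, 100.0]))
```

Reporting the median next to the mean makes it easier to see when a few large records dominate the average.
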
139142### Record type distribution
@@ -154,26 +157,27 @@ pickle opcodes directly from Python objects, bypassing intermediate allocations.
154157The zodb-pgjsonb storage path has two decode functions. The dict path
155158(`decode_zodb_record_for_pg`) returns a Python dict that must then be
156159serialized via `json.dumps()`. The JSON string path
157- (`decode_zodb_record_for_pg_json`) does everything in Rust with the GIL
158- released. See the synthetic comparison above.
160+ (`decode_zodb_record_for_pg_json`) writes JSON tokens directly from the
161+ PickleValue AST to a `String` buffer, entirely in Rust with the GIL released.
159162
160163```
161164Dict path: pickle bytes → Rust AST → Python dict (GIL held) → json.dumps() → PG
162- JSON path: pickle bytes → Rust AST → serde_json → JSON string (all Rust, GIL released) → PG
165+ JSON path: pickle bytes → Rust AST → JSON string (direct write, GIL released) → PG
163166```
164167
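
From the Python side, the difference between the two paths is whether a dict is materialized and then serialized while the GIL is held. The sketch below mimics only that shape: `dict_path_decode` and `json_path_decode` are hypothetical stand-ins built on `pickle` and `json`, not the codec's `decode_zodb_record_for_pg` or `decode_zodb_record_for_pg_json`, whose signatures are not shown here.

```python
import json
import pickle


def dict_path_decode(data: bytes) -> dict:
    # Stand-in for the dict path: returns a Python dict (GIL held).
    return pickle.loads(data)


def json_path_decode(data: bytes) -> str:
    # Stand-in for the JSON path: the real function would build this string
    # in Rust without creating Python objects; here it is emulated in Python.
    return json.dumps(pickle.loads(data))


record = pickle.dumps({"title": "Doc", "size": 3})

# Dict path: two steps on the Python side before the value reaches PG.
via_dict = json.dumps(dict_path_decode(record))

# JSON path: one call, and the result is already a JSON string.
via_json = json_path_decode(record)

print(via_dict == via_json)
```

Both routes should yield equivalent JSON; the benchmark compares only how much time each takes to get there.
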
165168### 1,692 records
166169
167170| Metric | Dict+dumps | JSON str | Speedup |
168171| ---| ---| ---| ---|
169- | Mean | 41.3 us | 31.5 us | **1.3x faster** |
170- | Median | 35.9 us | 26.9 us | **1.3x faster** |
171- | P95 | 64.2 us | 47.7 us | **1.3x faster** |
172+ | Mean | 40.4 us | 28.3 us | **1.4x faster** |
173+ | Median | 34.7 us | 24.4 us | **1.4x faster** |
174+ | P95 | 62.0 us | 51.9 us | **1.2x faster** |
172175
173- The JSON string path is **1.3x faster** across real-world data because
174- it eliminates the Python dict allocation + `json.dumps()` serialization.
175- The entire pipeline runs in Rust with the GIL released, improving
176- multi-threaded throughput in Zope/Plone deployments.
176+ The JSON string path is **1.4x faster** across real-world data because
177+ it eliminates both the Python dict allocation + `json.dumps()` serialization
178+ and all intermediate `serde_json::Value` heap allocations. The entire pipeline
179+ runs in Rust with the GIL released, improving multi-threaded throughput in
180+ Zope/Plone deployments.
177181
178182---
179183
@@ -182,9 +186,9 @@ multi-threaded throughput in Zope/Plone deployments.
182186The sweet spot is typical ZODB objects (5-50 keys, mixed types, datetime
183187fields, persistent refs):
184188
185- - **Decode:** 1.5-2.0x faster on synthetic, near parity on real-world data
186- - **Encode:** 2-10x faster on synthetic, 3-5x faster on real-world data
187- - **PG path:** 1.3x faster end-to-end with GIL-free throughput
189+ - **Decode:** 1.1-2.3x faster on synthetic, near parity on real-world data
190+ - **Encode:** 1.7-9.2x faster on synthetic, 3-5x faster on real-world data
191+ - **PG path:** 1.3-3.3x faster end-to-end with GIL-free throughput
188192
189193Decode overhead comes from the two-pass conversion plus type transformation.
190194On string-dominated payloads this matters more; on metadata-rich records with
@@ -215,49 +219,33 @@ mixed types (the typical ZODB case) the codec is competitive or faster.
215219- Thread-local buffer reuse (retains capacity across encode calls)
216220- `reserve()` calls before multi-part writes (eliminates mid-write reallocations)
217221- Direct i64 LONG1 encoding (eliminates BigInt heap allocation)
222+ - Thread-local class pickle cache per (module, name) pair (single memcpy
223+ replaces 7 opcode writes for ~99.6% of records)
218224- `#[inline]` on `write_u8`, `write_bytes`, `encode_int`
219225
220226**Both paths:**
221227- Interned marker strings (`pyo3::intern!` for `@t`, `@cls`, `@s`, etc.)
222228- Pre-collected PyList (`PyList::new` vs append loop)
223229- Thin LTO + single codegen unit (free 6-9% improvement)
224230- Profile-guided optimization (PGO) with real FileStorage + synthetic data
225- - Direct pickle → JSON string path for PG storage (GIL released)
231+ - Direct PickleValue → JSON string writer (`json_writer.rs`) for PG storage,
232+ eliminating all `serde_json::Value` intermediate allocations (GIL released)
233+ - Thread-local JSON writer buffer reuse (retains capacity across decode calls)
226234
227235---
228236
229237## Running benchmarks
230238
239+ All numbers in this document are from PGO builds. Always use PGO for
240+ benchmarking — it adds 5-15% and reflects production performance.
241+
231242``` bash
232243cd sources/zodb-json-codec
233244
234- # Build release first (important!)
235- maturin develop --release
236-
237- # Synthetic micro-benchmarks
238- python benchmarks/bench.py synthetic --iterations 1000
239-
240- # Generate a reproducible benchmark FileStorage (requires ZODB + BTrees)
241- python benchmarks/bench.py generate
242-
243- # Scan the generated (or any) FileStorage
244- python benchmarks/bench.py filestorage benchmarks/bench_data/Data.fs
245-
246- # PG decode path comparison (dict vs JSON string)
247- python benchmarks/bench.py pg-compare --filestorage benchmarks/bench_data/Data.fs
248-
249- # Both synthetic + filestorage, with JSON export
250- python benchmarks/bench.py all --filestorage benchmarks/bench_data/Data.fs --output results.json
251- ```
245+ # 0. Decompress benchmark data (once — Data.fs is gitignored, only .gz is tracked)
246+ gunzip -k benchmarks/bench_data/Data.fs.gz
252247
253- ## PGO build (optional, adds 5-15%)
254-
255- Profile-guided optimization uses real workload data to optimize branch
256- prediction and code layout. The release CI builds include PGO for
257- Linux x86_64 wheels.
258-
259- ``` bash
260- # 1. Install LLVM tools
248+ # 1. Install LLVM tools (once)
261249rustup component add llvm-tools
262250
263251# 2. Instrumented build
@@ -266,11 +254,23 @@ RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" maturin develop --release
266254# 3. Generate profiles — use BOTH real data and synthetic for best coverage
267255python benchmarks/bench.py filestorage benchmarks/bench_data/Data.fs
268256python benchmarks/bench.py synthetic --iterations 2000
257+ python benchmarks/bench.py pg-compare --filestorage benchmarks/bench_data/Data.fs --iterations 500
269258
270259# 4. Merge profiles
271260LLVM_PROFDATA=$(find ~/.rustup -name llvm-profdata | head -1)
272261$LLVM_PROFDATA merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data/*.profraw
273262
274263# 5. Optimized build
275264RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" maturin develop --release
265+
266+ # 6. Run benchmarks
267+ python benchmarks/bench.py synthetic --iterations 5000
268+ python benchmarks/bench.py filestorage benchmarks/bench_data/Data.fs
269+ python benchmarks/bench.py pg-compare --filestorage benchmarks/bench_data/Data.fs
270+
271+ # Generate a reproducible benchmark FileStorage (requires ZODB + BTrees)
272+ python benchmarks/bench.py generate
273+
274+ # Both synthetic + filestorage, with JSON export
275+ python benchmarks/bench.py all --filestorage benchmarks/bench_data/Data.fs --output results.json
276276```