@@ -7,7 +7,10 @@ Measured on: 2026-02-24
 Python: 3.13.9, PyO3: 0.28, 500 iterations, 100 warmup
 Build: `maturin develop --release` (optimized, LTO + codegen-units=1)

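The header's methodology (100 untimed warmup calls, then 500 timed iterations) can be illustrated with a small harness. This is a minimal sketch, not the project's `benchmarks/bench.py`. The name `bench_call` is hypothetical. The sketch assumes per-call timing with `time.perf_counter_ns` and reports the mean, median, and P95 statistics that appear in the tables.

```python
import pickle
import statistics
import time
from typing import Callable


def bench_call(fn: Callable[[], object], iterations: int = 500, warmup: int = 100) -> dict[str, float]:
    """Hypothetical harness: time fn() per call in microseconds after an untimed warmup."""
    for _ in range(warmup):
        fn()  # untimed calls so caches and allocator state settle first
    samples_us: list[float] = []
    for _ in range(iterations):  # needs iterations >= 2 for quantiles()
        start = time.perf_counter_ns()
        fn()
        samples_us.append((time.perf_counter_ns() - start) / 1_000)
    return {
        "mean": statistics.fmean(samples_us),
        "median": statistics.median(samples_us),
        # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile
        "p95": statistics.quantiles(samples_us, n=20)[-1],
    }


# Usage: time CPython's pickle.loads on a small, made-up state dict
data = pickle.dumps({"title": "Example", "tags": ["a", "b"], "size": 1024}, protocol=3)
stats = bench_call(lambda: pickle.loads(data))
print({name: round(value, 2) for name, value in stats.items()})
```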
-## Context
+**Important:** Always benchmark with `maturin develop --release`. Debug builds
+are 3-8x slower due to missing optimizations and inlining.
+
+## Why the codec exists

 The codec does fundamentally more work than `pickle.loads`/`pickle.dumps`:

@@ -19,10 +22,9 @@ The codec's value is not raw speed but **JSONB queryability** — enabling SQL
 queries on ZODB object attributes in PostgreSQL. Despite the extra work, the
 release build beats CPython pickle on most operations.

-**Important:** Always benchmark with `maturin develop --release`. Debug builds
-are 3-8x slower due to missing optimizations and inlining.
+---

-## Synthetic Benchmarks
+## Synthetic micro-benchmarks

 ### Decode (pickle bytes -> Python dict)

@@ -39,6 +41,24 @@ are 3-8x slower due to missing optimizations and inlining.
 | wide_dict (27 KB) | 264 us | 279 us | 1.1x slower |
 | deep_nesting (379 B) | 7.2 us | 7.3 us | 1.0x |

+### Decode to JSON string (pickle bytes -> JSON, all in Rust)
+
+The direct path for PG storage: it serializes to a JSON string entirely in Rust
+with the GIL released. It is compared against the dict path followed by `json.dumps()`.
+
+| Category | Dict+dumps | JSON str | Speedup |
+|---|---|---|---|
+| simple_flat_dict | 2.7 us | 1.3 us | **2.2x faster** |
+| nested_dict | 4.3 us | 2.5 us | **1.7x faster** |
+| large_flat_dict | 35.4 us | 25.6 us | **1.4x faster** |
+| bytes_in_state | 5.7 us | 2.7 us | **2.1x faster** |
+| special_types | 7.1 us | 4.7 us | **1.5x faster** |
+| btree_small | 3.8 us | 2.1 us | **1.8x faster** |
+| btree_length | 1.5 us | 0.8 us | **1.9x faster** |
+| scalar_string | 0.9 us | 0.7 us | **1.3x faster** |
+| wide_dict | 273.7 us | 307.6 us | 1.1x slower |
+| deep_nesting | 13.3 us | 8.6 us | **1.5x faster** |
+
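The Speedup and Ratio columns appear to divide the slower time by the faster one and then label the direction. Under that assumed formula, a small helper reproduces rows of the table above from their printed times. The name `ratio_label` is hypothetical.

```python
def ratio_label(baseline_us: float, candidate_us: float) -> str:
    """Hypothetical helper: express candidate vs baseline as 'Nx faster' or 'Nx slower'."""
    if candidate_us <= baseline_us:
        return f"{baseline_us / candidate_us:.1f}x faster"
    return f"{candidate_us / baseline_us:.1f}x slower"


# Rows from the "Decode to JSON string" table: (Dict+dumps, JSON str)
print(ratio_label(4.3, 2.5))      # 1.7x faster (nested_dict)
print(ratio_label(273.7, 307.6))  # 1.1x slower (wide_dict)
```

The table cells are rounded to one decimal, so recomputing from them can be off by 0.1x. For example, simple_flat_dict gives 2.7 / 1.3 ≈ 2.1x, while the table prints 2.2x.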
 ### Encode (Python dict -> pickle bytes)

 | Category | Python | Codec | Ratio |
@@ -54,7 +74,7 @@ are 3-8x slower due to missing optimizations and inlining.
 | wide_dict | 59.2 us | 15.7 us | **3.7x faster** |
 | deep_nesting | 2.7 us | 1.4 us | **1.9x faster** |

-### Full Roundtrip (decode + encode)
+### Full roundtrip (decode + encode)

 | Category | Python | Codec | Ratio |
 |---|---|---|---|
@@ -69,7 +89,7 @@ are 3-8x slower due to missing optimizations and inlining.
 | wide_dict | 316 us | 232 us | **1.4x faster** |
 | deep_nesting | 10.3 us | 9.2 us | 1.1x faster |

-### Size Comparison (pickle bytes vs JSON)
+### Output size (pickle bytes vs JSON)

 | Category | Pickle | JSON | Ratio |
 |---|---|---|---|
@@ -88,15 +108,17 @@ JSON is typically smaller than pickle for string-heavy data (wide_dict: 42%
 smaller). It is larger for binary data (base64 overhead) and deeply nested
 structures (marker overhead).

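JSON has no binary type, so `bytes` values have to be stored as text, and the paragraph above names base64 as the encoding. Standard padded base64 emits 4 characters for every 3 input bytes. That ratio accounts for most of the size overhead on binary payloads, since pickle protocol 3 and later store `bytes` as raw, length-prefixed data. The quick check below uses only the standard library, and `base64_len` is a hypothetical helper. JSON quoting and any marker keys add a little more on top of the figure it prints.

```python
import base64


def base64_len(n_bytes: int) -> int:
    """Length of standard (padded) base64 output for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


payload = bytes(range(256)) * 4  # 1,024 bytes of binary data
encoded = base64.b64encode(payload)

assert len(encoded) == base64_len(len(payload))  # 1,368 characters
print(f"{len(encoded) / len(payload):.2f}x")      # 1.34x, before JSON quoting
```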
-## FileStorage Scan (Generated Wikipedia Database)
+---
+
+## FileStorage scan (generated Wikipedia database)

 1,692 records, 6 distinct types, 0 errors. Generated from 1,062 multilingual
 Wikipedia articles (en/de/zh) with body text truncated to 500-10,000 chars
 (exponential skew toward shorter texts), enriched type-diverse fields
 (datetime, date, timedelta, Decimal, UUID, frozenset, set, tuple, bytes)
 plus OOBTree containers, group summaries, and edge-case objects.

-Generate with: `python benchmarks/bench.py generate`
+### Codec vs CPython pickle

 | Metric | Codec | Python | Speedup |
 |---|---|---|---|
@@ -109,14 +131,12 @@ Generate with: `python benchmarks/bench.py generate`
 | Total pickle | 5.1 MB | — | — |
 | Total JSON | 7.2 MB | — | 1.41x |

-The codec is slightly slower on decode (1.1x median) because it does
-fundamentally more work than CPython's C-extension pickle: two conversions
-(pickle bytes → Rust AST → Python objects) plus type-aware transformation.
-The gap narrows on metadata-heavy records (small dicts with mixed types).
-
+Decode is slightly slower (1.1x median) due to the two-pass conversion
+(pickle bytes → Rust AST → Python objects) plus type-aware transformation. The gap narrows on metadata-heavy records.
 Encode is consistently **2.5-3.3x faster** because the Rust encoder writes
-pickle opcodes directly from Python objects, bypassing intermediate
-allocations that CPython's pickle module incurs.
+pickle opcodes directly from Python objects, bypassing intermediate allocations.
+
+### Record type distribution

 | Record type | Count | % |
 |---|---|---|
@@ -127,130 +147,79 @@ allocations that CPython's pickle module incurs.
 | BTrees.Length.Length | 5 | 0.3% |
 | BTrees.OIBTree.OIBTree | 2 | 0.1% |

-## Analysis
-
-The codec **beats CPython pickle** on decode for 8 of 10 synthetic categories,
-and on encode for **all 10 categories**. On the generated FileStorage data,
-decode is near parity (1.1x median) while encode is **2.5-3.3x faster**.
-
-The sweet spot is typical ZODB objects (5-50 keys, mixed types, datetime
-fields, persistent refs) where the codec is **1.5-2.0x faster** decode and
-**4-7x faster** encode while also producing queryable JSONB output.
-
-Decode overhead comes from the codec's two-pass conversion plus type
-transformation. On string-dominated payloads this matters more; on
-metadata-rich records with mixed types (the typical ZODB case) the codec
-is competitive or faster.
-
-## Optimizations Applied
+---

-1. **Direct PickleValue <-> PyObject** (`src/pyconv.rs`) — bypasses the
-   `serde_json::Value` intermediate layer, eliminating one full allocation
-   pass. Persistent ref compact/expand happens inline during the tree walk.
+## PG storage path (FileStorage full pipeline)

-2. **Direct PyObject -> pickle bytes encoder** — for the encode path,
-   writes pickle opcodes directly from Python objects to a `Vec<u8>` buffer,
-   skipping the intermediate `PickleValue` AST allocation for common types.
+The zodb-pgjsonb storage path has two decode functions. The dict path
+(`decode_zodb_record_for_pg`) returns a Python dict that must then be
+serialized via `json.dumps()`. The JSON string path
+(`decode_zodb_record_for_pg_json`) does everything in Rust with the GIL
+released. See the "Decode to JSON string" synthetic comparison above.

-3. **Interned marker strings** (`pyo3::intern!`) — all JSON marker keys
-   (`@t`, `@cls`, `@s`, etc.) are interned Python strings, cached across
-   calls. Eliminates temporary string allocation + hashing per marker check.
-
-4. **Frequency-ordered type dispatch** — encode path checks `PyString` first
-   (most common ZODB type), then `PyDict`, before numeric types. Saves 3-4
-   type checks per string value.
-
-5. **Dict-size fast path** — dicts with >4 keys skip all marker checks (no
-   JSON marker dict has >4 keys). Helps wide_dict and large_flat_dict.
-
-6. **Pre-collected PyList** (`PyList::new` vs empty+append loop) — builds
-   Python lists in one allocation instead of repeated appends.
-
-7. **Simplified decoder stack** — removed `StackItem` enum wrapper from the
-   pickle decoder. Stack operations (`push`/`pop`/`peek`) are now direct
-   `Vec<PickleValue>` operations with `#[inline]` hints. `pop_mark` uses
-   `mem::take` (pointer swap) instead of `drain().map().collect()`.
-
-8. **Pre-allocated decoder vectors** — stack, memo, and metastack start with
-   `Vec::with_capacity` instead of empty, reducing reallocations during parsing.
-
-9. **Pre-scan Dict decode** — checks `all_string_keys` with a cheap enum
-   discriminant scan before processing values. Builds string-key PyDict if
-   all keys are strings (>99% of ZODB dicts); otherwise uses `@d` format.
-   Avoids quadratic re-processing when mixed-key dicts are encountered.
-
-10. **Set/frozenset move** — REDUCE handler for `builtins.set`/`frozenset`
-    moves the list items by value instead of cloning the entire Vec.
-
-11. **`@` prefix encode fast path** — for small dicts (1-4 keys), scans key
-    prefixes before doing marker lookups. If no key starts with `@`, skips
-    all 15 marker `get_item` checks. Cuts deep_nesting encode by 20%.
-
-12. **Encoder `#[inline]` hints** — `write_u8`, `write_bytes`, and
-    `encode_int` marked `#[inline]` to eliminate call overhead in the hot
-    encode loop.
-
-13. **Shared ZODB memo** — single decoder processes both class and state
-    pickles, sharing the pickle memo between them. Avoids the overhead
-    of splitting and re-initializing for the state pickle.
+```
+Dict path: pickle bytes → Rust AST → Python dict (GIL held) → json.dumps() → PG
+JSON path: pickle bytes → Rust AST → serde_json → JSON string (all Rust, GIL released) → PG
+```

-14. **Boxed Instance variant** — `Instance(Box<InstanceData>)` reduces the
-    `PickleValue` enum from 56 to 48 bytes, improving cache utilization
-    across the entire decode/encode pipeline (-13% weighted average).
+### FileStorage results (1,692 records)

-15. **Thin LTO + single codegen unit** — `lto = "thin"` + `codegen-units = 1`
-    in the release profile enables cross-crate inlining and whole-crate
-    optimization. Free 6-9% improvement across decode and encode with no
-    code changes.
+| Metric | Dict+dumps | JSON str | Speedup |
+|---|---|---|---|
+| Mean | 41.3 us | 31.5 us | **1.3x faster** |
+| Median | 35.9 us | 26.9 us | **1.3x faster** |
+| P95 | 64.2 us | 47.7 us | **1.3x faster** |

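The "Shared ZODB memo" item relies on a property of pickle memos. When one `pickle.Pickler` writes the class pickle and then the state pickle, the second pickle can refer back to memo entries created by the first. The sketch below assumes a record is written that way, with one pickler and two `dump()` calls. The `("myapp.content", "Page")` class tuple is made up. Decoding both pickles with one `Unpickler` works, while decoding the state pickle on its own with a fresh memo fails.

```python
import io
import pickle

class_info = ("myapp.content", "Page")  # made-up class pickle payload

buf = io.BytesIO()
pickler = pickle.Pickler(buf, protocol=3)
pickler.dump(class_info)           # class pickle: memoizes the tuple
boundary = buf.tell()
pickler.dump({"cls": class_info})  # state pickle: refers back to it via BINGET
record = buf.getvalue()

# One Unpickler, two load() calls: the memo carries over between them.
unpickler = pickle.Unpickler(io.BytesIO(record))
first = unpickler.load()
second = unpickler.load()
print(first, second)

# Splitting the record and decoding the state pickle alone loses the memo.
try:
    pickle.loads(record[boundary:])
    split_ok = True
except (KeyError, pickle.UnpicklingError):
    split_ok = False
print("split decode ok:", split_ok)
```

With a shared memo, the second `load()` returns the very same tuple object as the first, because the state pickle contains a memo reference instead of a second copy.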
-## Changelog
+The JSON string path is **1.3x faster** on the real-world FileStorage data because
+it eliminates the Python dict allocation and the `json.dumps()` serialization.
+The entire pipeline runs in Rust with the GIL released, so other Python threads
+can run while a record is decoded, which matters for multi-threaded Zope/Plone deployments.

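The dict path pays for `json.dumps()` on top of building the Python dict, and the JSON string path avoids that step. One rough way to gauge that step in isolation is to time `json.dumps()` on a representative state dict with `timeit`. The dict contents below are made up for illustration. This measures only the serialization step, not the dict allocation or any GIL effects.

```python
import json
import timeit

# Made-up, ZODB-like state dict (field names are illustrative only)
state = {
    "title": "Example page",
    "description": "lorem ipsum " * 20,
    "tags": ["news", "archive", "de"],
    "sizes": list(range(25)),
    "meta": {"creator": "admin", "version": 3, "public": True},
}

NUMBER = 2_000
seconds = timeit.timeit(lambda: json.dumps(state), number=NUMBER)
per_call_us = seconds / NUMBER * 1e6
print(f"json.dumps: {per_call_us:.2f} us per record")

# Sanity check: the serialized form round-trips to the same data
assert json.loads(json.dumps(state)) == state
```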
-### 1.3.1 (2026-02-24): LTO release profile optimization
+---

-Enabled thin LTO (`lto = "thin"`) and single codegen unit (`codegen-units = 1`)
-in the Cargo release profile. This allows LLVM to inline across crate boundaries
-and optimize the entire crate as a single compilation unit.
+## Summary

-Impact on FileStorage benchmark (1,692 records):
+The sweet spot is typical ZODB objects (5-50 keys, mixed types, datetime
+fields, persistent refs):

-| Metric | Before | After | Improvement |
-|---|---|---|---|
-| Decode median | 26.1 us | 24.7 us | **-5.4%** |
-| Decode mean | 30.5 us | 28.7 us | **-5.9%** |
-| Encode median | 6.8 us | 6.2 us | **-8.8%** |
-| Encode mean | 7.5 us | 7.0 us | **-6.7%** |
+- **Decode:** 1.5-2.0x faster on synthetic payloads, near parity on real-world data
+- **Encode:** 4-7x faster on synthetic payloads, 2.5-3.3x faster on real-world data
+- **PG path:** 1.3x faster end-to-end, with the GIL released during decoding

-Zero code changes — purely a build configuration improvement.
+Decode overhead comes from the two-pass conversion plus type transformation.
+On string-dominated payloads this matters more; on metadata-rich records with
+mixed types (the typical ZODB case) the codec is competitive or faster.

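The Improvement and Delta columns in the changelog entries read as relative change, (after - before) / before, with negative values meaning faster. A quick check against the 1.3.1 LTO table reproduces all four printed values. The helper name `pct_delta` is hypothetical.

```python
def pct_delta(before_us: float, after_us: float) -> float:
    """Relative change in percent; negative means the new build is faster."""
    return (after_us - before_us) / before_us * 100


# (metric, before, after) from the 1.3.1 LTO changelog table
rows = [
    ("Decode median", 26.1, 24.7),
    ("Decode mean", 30.5, 28.7),
    ("Encode median", 6.8, 6.2),
    ("Encode mean", 7.5, 7.0),
]
for name, before, after in rows:
    print(f"{name}: {pct_delta(before, after):.1f}%")  # -5.4, -5.9, -8.8, -6.7
```

Recomputing from the rounded two-decimal cells of the 2026-02-23 table can drift from the printed deltas. For example, btree_small encode (0.27 → 0.24 us) recomputes to about -11.1%, while the table prints -10.1%.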
-### 2026-02-23: Dict/list subclass support + PickleValue boxing optimization
+---

-Added support for pickle SETITEMS/SETITEM/APPENDS/APPEND on Reduce and
-Instance variants (fixes [#5](https://github.com/bluedynamics/zodb-json-codec/issues/5):
-`ValueError: SETITEMS on non-dict` for OrderedDict, defaultdict, deque, etc.).
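Issue #5 concerns containers that are pickled as a REDUCE followed by SETITEMS or APPENDS. The REDUCE calls the class to build the object, and the SETITEMS or APPENDS then fill it, instead of the data appearing as a plain dict or list. The standard `pickletools.genops` shows the difference between a plain dict and a `collections.OrderedDict`. The helper name `opcode_names` is hypothetical, and protocol 3 is chosen only to keep the opcode stream short.

```python
import pickle
import pickletools
from collections import OrderedDict


def opcode_names(obj: object) -> list[str]:
    """Disassemble a protocol-3 pickle of obj into its opcode names."""
    data = pickle.dumps(obj, protocol=3)
    return [op.name for op, _arg, _pos in pickletools.genops(data)]


plain = opcode_names({"a": 1, "b": 2})
ordered = opcode_names(OrderedDict(a=1, b=2))

print(plain)    # contains EMPTY_DICT ... SETITEMS: items go into a dict literal
print(ordered)  # contains GLOBAL ... REDUCE ... SETITEMS: items go into a REDUCE result
```

A decoder that only applies SETITEMS to a dict value on its stack would reject the second stream. That behavior is consistent with the `ValueError: SETITEMS on non-dict` error reported in the issue.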
+## Optimizations applied

-To avoid an enum size regression, the `Instance` variant was refactored from
-an inline struct to `Instance(Box<InstanceData>)`, reducing `PickleValue` from
-56 bytes (pre-change baseline) to **48 bytes** — a 14% reduction.
+**Decode path:**
+- Direct PickleValue <-> PyObject conversion (`pyconv.rs`), bypassing the
+  `serde_json::Value` intermediate layer
+- Simplified decoder stack with `#[inline]` hints, `mem::take` for `pop_mark`
+- Pre-allocated stack/memo/metastack vectors (`Vec::with_capacity`)
+- Pre-scan dict keys for string-only fast path (>99% of ZODB dicts)
+- Shared ZODB memo across class + state pickles
+- Set/frozenset move semantics (no Vec clone)
+- Boxed Instance variant (PickleValue 56 → 48 bytes, -13% weighted-average time)

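The "pre-scan dict keys" item can be sketched in Python. The sketch first checks every key's type, which is cheap, and only then converts values, so the output shape is chosen once. The function name, the `convert` callback, and the exact shape of the `@d` fallback (a list of `[key, value]` pairs) are assumptions for illustration. This document does not specify the codec's real `@d` layout.

```python
from typing import Any, Callable


def decode_dict(
    pairs: list[tuple[Any, Any]],
    convert: Callable[[Any], Any] = lambda value: value,
) -> dict:
    """Hypothetical sketch of the pre-scan: pick the output shape before converting values."""
    # Pass 1: inspect key types only; all() stops at the first non-string key.
    if all(isinstance(key, str) for key, _ in pairs):
        return {key: convert(value) for key, value in pairs}
    # Mixed or non-string keys: assumed "@d" shape, a list of [key, value] pairs.
    return {"@d": [[convert(key), convert(value)] for key, value in pairs]}


print(decode_dict([("title", "Example"), ("size", 1024)]))
print(decode_dict([(1, "one"), ("two", 2)]))
```

Deciding up front means each value is converted exactly once. Without the scan, a decoder that starts building a string-keyed dict and then meets a non-string key has to start over, which is the re-processing the optimization note describes.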
-5-round min-median benchmark comparison (baseline vs fix):
+**Encode path:**
+- Direct PyObject → pickle bytes encoder (bypasses PickleValue AST)
+- Frequency-ordered type dispatch (PyString first)
+- Dict-size fast path (dicts with >4 keys skip all marker checks)
+- `@` prefix scan before marker lookups (-20% encode time on deep_nesting)
+- `#[inline]` on `write_u8`, `write_bytes`, `encode_int`

-| Payload | Op | Baseline | Fix | Delta |
-|---|---|---|---|---|
-| simple_flat_dict | decode | 1.31 us | 1.21 us | **-7.9%** |
-| nested_dict | decode | 2.00 us | 1.95 us | -2.5% |
-| large_flat_dict | decode | 20.19 us | 19.65 us | -2.7% |
-| btree_length | decode | 0.63 us | 0.58 us | **-9.0%** |
-| wide_dict | decode | 304.69 us | 257.02 us | **-15.6%** |
-| special_types | encode | 1.01 us | 0.96 us | **-5.2%** |
-| btree_small | encode | 0.27 us | 0.24 us | **-10.1%** |
-| wide_dict | encode | 17.47 us | 16.24 us | **-7.1%** |
-| **Weighted avg** | **all** | | | **-13.4%** |
+**Both paths:**
+- Interned marker strings (`pyo3::intern!` for `@t`, `@cls`, `@s`, etc.)
+- Pre-collected PyList (`PyList::new` vs append loop)
+- Thin LTO + single codegen unit (free 6-9% improvement)
+- Direct pickle → JSON string path for PG storage (GIL released)

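The two encode-side fast paths can be sketched as a single guard. The first skips marker checks for dicts with more than 4 keys, and the second scans 1-4 key dicts for an `@` prefix before any marker lookup. Only `@t`, `@cls`, `@s`, and `@d` are named in this document, while the real encoder checks 15 markers, so the tuple below is a partial, illustrative list. `find_marker` is a hypothetical name, and the dict values in the usage lines are placeholders.

```python
from typing import Any, Optional

# Partial marker list: only these keys are named in the document (the encoder checks 15).
MARKER_KEYS = ("@t", "@cls", "@s", "@d")


def find_marker(d: dict[Any, Any]) -> Optional[str]:
    """Return the first known marker key present in d, or None for an ordinary dict."""
    # Dict-size fast path: no marker dict has more than 4 keys.
    if len(d) > 4:
        return None
    # Prefix fast path: without a key starting with "@", skip every marker lookup.
    if not any(isinstance(key, str) and key.startswith("@") for key in d):
        return None
    for marker in MARKER_KEYS:
        if marker in d:
            return marker
    return None


print(find_marker({"title": "Example"}))  # None: no "@" key, no lookups done
print(find_marker({"@t": "placeholder"}))  # @t
```

The size check is only safe because of the invariant stated in the optimization note: a dict with more than 4 keys is never treated as a marker, even if one of its keys starts with `@`.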
-No regressions above noise threshold. The smaller enum improves cache
-utilization across the entire decode/encode pipeline, with the largest
-gains on payloads that allocate many PickleValue nodes (wide_dict, large dicts).
+---

-## Running Benchmarks
+## Running benchmarks

 ```bash
 cd sources/zodb-json-codec
@@ -263,13 +232,13 @@ python benchmarks/bench.py synthetic --iterations 1000

 # Generate a reproducible benchmark FileStorage (requires ZODB + BTrees)
 python benchmarks/bench.py generate
-# Custom paths:
-python benchmarks/bench.py generate --output /tmp/bench.fs \
-  --seed-data path/to/seed_data.json.gz

 # Scan the generated (or any) FileStorage
 python benchmarks/bench.py filestorage benchmarks/bench_data/Data.fs

+# PG decode path comparison (dict vs JSON string)
+python benchmarks/bench.py pg-compare --filestorage benchmarks/bench_data/Data.fs
+
 # Both synthetic + filestorage, with JSON export
 python benchmarks/bench.py all --filestorage benchmarks/bench_data/Data.fs --output results.json
 ```