
Commit 31fe090

jensens and claude committed
Add direct pickle → JSON string path for PG storage (1.4.0)
New decode_zodb_record_for_pg_json() converts ZODB pickle records to JSON strings entirely in Rust with the GIL released, eliminating the intermediate Python dict + json.dumps() step. 1.3x faster full pipeline on real-world data.

Also restructures BENCHMARKS.md for clarity.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 27261bc commit 31fe090
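The 1.3x figure in the commit message is a ratio of per-record timings: for the 1,692-record FileStorage, BENCHMARKS.md reports a 41.3 us mean for the dict path + `json.dumps()` against 31.5 us for the JSON string path. The following is a minimal Python sketch of how such mean/median/P95 rows and speedup labels can be produced. `time_per_call`, `summarize` and `speedup_label` are hypothetical helpers, not functions from `benchmarks/bench.py`. The percentile method (linear interpolation via `statistics.quantiles(..., method="inclusive")`) is also an assumption, and the timed workload is a CPython stand-in, not the Rust codec.

```python
import json
import pickle
import statistics
import time
from typing import Callable


def time_per_call(fn: Callable[[], object], iterations: int = 500,
                  warmup: int = 100) -> list[float]:
    """Return per-call wall times in microseconds, after discarding warmup runs."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn()
        samples.append((time.perf_counter_ns() - start) / 1000.0)
    return samples


def summarize(samples_us: list[float]) -> dict[str, float]:
    """Mean, median and 95th percentile (linear interpolation between ranks)."""
    p95 = statistics.quantiles(samples_us, n=20, method="inclusive")[18]
    return {
        "mean": statistics.fmean(samples_us),
        "median": statistics.median(samples_us),
        "p95": p95,
    }


def speedup_label(baseline_us: float, candidate_us: float) -> str:
    """Format baseline/candidate like the benchmark tables, e.g. '1.3x faster'."""
    ratio = baseline_us / candidate_us
    if ratio >= 1.0:
        return f"{ratio:.1f}x faster"
    return f"{1.0 / ratio:.1f}x slower"


if __name__ == "__main__":
    # Reported means from the PG storage path table (dict+dumps vs JSON str).
    print(speedup_label(41.3, 31.5))  # 1.3x faster
    # The wide_dict synthetic row, where the JSON string path is slower.
    print(speedup_label(273.7, 307.6))  # 1.1x slower

    # Stand-in workload only: CPython pickle + json.dumps, not the Rust codec.
    blob = pickle.dumps({"title": "Example", "tags": ["a", "b"], "count": 3})
    stats = summarize(time_per_call(lambda: json.dumps(pickle.loads(blob))))
    print({name: round(value, 2) for name, value in stats.items()})
```

Some rows in the synthetic tables (for example simple_flat_dict, 2.7 us vs 1.3 us labelled 2.2x) do not reproduce exactly from the displayed values. This is presumably because the ratios were computed from unrounded timings.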

7 files changed

Lines changed: 715 additions & 163 deletions


BENCHMARKS.md

Lines changed: 93 additions & 124 deletions
@@ -7,7 +7,10 @@ Measured on: 2026-02-24
 Python: 3.13.9, PyO3: 0.28, 500 iterations, 100 warmup
 Build: `maturin develop --release` (optimized, LTO + codegen-units=1)

-## Context
+**Important:** Always benchmark with `maturin develop --release`. Debug builds
+are 3-8x slower due to missing optimizations and inlining.
+
+## Why the codec exists

 The codec does fundamentally more work than `pickle.loads`/`pickle.dumps`:

@@ -19,10 +22,9 @@ The codec's value is not raw speed but **JSONB queryability** — enabling SQL
 queries on ZODB object attributes in PostgreSQL. Despite the extra work, the
 release build beats CPython pickle on most operations.

-**Important:** Always benchmark with `maturin develop --release`. Debug builds
-are 3-8x slower due to missing optimizations and inlining.
+---

-## Synthetic Benchmarks
+## Synthetic micro-benchmarks

 ### Decode (pickle bytes -> Python dict)

@@ -39,6 +41,24 @@ are 3-8x slower due to missing optimizations and inlining.
 | wide_dict (27 KB) | 264 us | 279 us | 1.1x slower |
 | deep_nesting (379 B) | 7.2 us | 7.3 us | 1.0x |

+### Decode to JSON string (pickle bytes -> JSON, all in Rust)
+
+The direct path for PG storage — serializes to a JSON string entirely in Rust
+with the GIL released. Compared against the dict path + `json.dumps()`.
+
+| Category | Dict+dumps | JSON str | Speedup |
+|---|---|---|---|
+| simple_flat_dict | 2.7 us | 1.3 us | **2.2x faster** |
+| nested_dict | 4.3 us | 2.5 us | **1.7x faster** |
+| large_flat_dict | 35.4 us | 25.6 us | **1.4x faster** |
+| bytes_in_state | 5.7 us | 2.7 us | **2.1x faster** |
+| special_types | 7.1 us | 4.7 us | **1.5x faster** |
+| btree_small | 3.8 us | 2.1 us | **1.8x faster** |
+| btree_length | 1.5 us | 0.8 us | **1.9x faster** |
+| scalar_string | 0.9 us | 0.7 us | **1.3x faster** |
+| wide_dict | 273.7 us | 307.6 us | 1.1x slower |
+| deep_nesting | 13.3 us | 8.6 us | **1.5x faster** |
+
 ### Encode (Python dict -> pickle bytes)

 | Category | Python | Codec | Ratio |
@@ -54,7 +74,7 @@ are 3-8x slower due to missing optimizations and inlining.
 | wide_dict | 59.2 us | 15.7 us | **3.7x faster** |
 | deep_nesting | 2.7 us | 1.4 us | **1.9x faster** |

-### Full Roundtrip (decode + encode)
+### Full roundtrip (decode + encode)

 | Category | Python | Codec | Ratio |
 |---|---|---|---|
@@ -69,7 +89,7 @@ are 3-8x slower due to missing optimizations and inlining.
 | wide_dict | 316 us | 232 us | **1.4x faster** |
 | deep_nesting | 10.3 us | 9.2 us | 1.1x faster |

-### Size Comparison (pickle bytes vs JSON)
+### Output size (pickle bytes vs JSON)

 | Category | Pickle | JSON | Ratio |
 |---|---|---|---|
@@ -88,15 +108,17 @@ JSON is typically smaller than pickle for string-heavy data (wide_dict: 42%
 smaller). It is larger for binary data (base64 overhead) and deeply nested
 structures (marker overhead).

-## FileStorage Scan (Generated Wikipedia Database)
+---
+
+## FileStorage scan (generated Wikipedia database)

 1,692 records, 6 distinct types, 0 errors. Generated from 1,062 multilingual
 Wikipedia articles (en/de/zh) with body text truncated to 500-10,000 chars
 (exponential skew toward shorter texts), enriched type-diverse fields
 (datetime, date, timedelta, Decimal, UUID, frozenset, set, tuple, bytes)
 plus OOBTree containers, group summaries, and edge-case objects.

-Generate with: `python benchmarks/bench.py generate`
+### Codec vs CPython pickle

 | Metric | Codec | Python | Speedup |
 |---|---|---|---|
@@ -109,14 +131,12 @@ Generate with: `python benchmarks/bench.py generate`
 | Total pickle | 5.1 MB |||
 | Total JSON | 7.2 MB || 1.41x |

-The codec is slightly slower on decode (1.1x median) because it does
-fundamentally more work than CPython's C-extension pickle: two conversions
-(pickle bytes → Rust AST → Python objects) plus type-aware transformation.
-The gap narrows on metadata-heavy records (small dicts with mixed types).
-
+Decode is slightly slower (1.1x median) due to the two-pass conversion plus
+type-aware transformation. The gap narrows on metadata-heavy records.
 Encode is consistently **2.5-3.3x faster** because the Rust encoder writes
-pickle opcodes directly from Python objects, bypassing intermediate
-allocations that CPython's pickle module incurs.
+pickle opcodes directly from Python objects, bypassing intermediate allocations.
+
+### Record type distribution

 | Record type | Count | % |
 |---|---|---|
@@ -127,130 +147,79 @@ allocations that CPython's pickle module incurs.
 | BTrees.Length.Length | 5 | 0.3% |
 | BTrees.OIBTree.OIBTree | 2 | 0.1% |

-## Analysis
-
-The codec **beats CPython pickle** on decode for 8 of 10 synthetic categories,
-and on encode for **all 10 categories**. On the generated FileStorage data,
-decode is near parity (1.1x median) while encode is **2.5-3.3x faster**.
-
-The sweet spot is typical ZODB objects (5-50 keys, mixed types, datetime
-fields, persistent refs) where the codec is **1.5-2.0x faster** decode and
-**4-7x faster** encode while also producing queryable JSONB output.
-
-Decode overhead comes from the codec's two-pass conversion plus type
-transformation. On string-dominated payloads this matters more; on
-metadata-rich records with mixed types (the typical ZODB case) the codec
-is competitive or faster.
-
-## Optimizations Applied
+---

-1. **Direct PickleValue <-> PyObject** (`src/pyconv.rs`) — bypasses the
-`serde_json::Value` intermediate layer, eliminating one full allocation
-pass. Persistent ref compact/expand happens inline during the tree walk.
+## PG storage path (FileStorage full pipeline)

-2. **Direct PyObject -> pickle bytes encoder** — for the encode path,
-writes pickle opcodes directly from Python objects to a `Vec<u8>` buffer,
-skipping the intermediate `PickleValue` AST allocation for common types.
+The zodb-pgjsonb storage path has two decode functions. The dict path
+(`decode_zodb_record_for_pg`) returns a Python dict that must then be
+serialized via `json.dumps()`. The JSON string path
+(`decode_zodb_record_for_pg_json`) does everything in Rust with the GIL
+released. See the synthetic comparison above.

-3. **Interned marker strings** (`pyo3::intern!`) — all JSON marker keys
-(`@t`, `@cls`, `@s`, etc.) are interned Python strings, cached across
-calls. Eliminates temporary string allocation + hashing per marker check.
-
-4. **Frequency-ordered type dispatch** — encode path checks `PyString` first
-(most common ZODB type), then `PyDict`, before numeric types. Saves 3-4
-type checks per string value.
-
-5. **Dict-size fast path** — dicts with >4 keys skip all marker checks (no
-JSON marker dict has >4 keys). Helps wide_dict and large_flat_dict.
-
-6. **Pre-collected PyList** (`PyList::new` vs empty+append loop) — builds
-Python lists in one allocation instead of repeated appends.
-
-7. **Simplified decoder stack** — removed `StackItem` enum wrapper from the
-pickle decoder. Stack operations (`push`/`pop`/`peek`) are now direct
-`Vec<PickleValue>` operations with `#[inline]` hints. `pop_mark` uses
-`mem::take` (pointer swap) instead of `drain().map().collect()`.
-
-8. **Pre-allocated decoder vectors** — stack, memo, and metastack start with
-`Vec::with_capacity` instead of empty, reducing reallocations during parsing.
-
-9. **Pre-scan Dict decode** — checks `all_string_keys` with a cheap enum
-discriminant scan before processing values. Builds string-key PyDict if
-all keys are strings (>99% of ZODB dicts); otherwise uses `@d` format.
-Avoids quadratic re-processing when mixed-key dicts are encountered.
-
-10. **Set/frozenset move** — REDUCE handler for `builtins.set`/`frozenset`
-moves the list items by value instead of cloning the entire Vec.
-
-11. **`@` prefix encode fast path** — for small dicts (1-4 keys), scans key
-prefixes before doing marker lookups. If no key starts with `@`, skips
-all 15 marker `get_item` checks. Cuts deep_nesting encode by 20%.
-
-12. **Encoder `#[inline]` hints**`write_u8`, `write_bytes`, and
-`encode_int` marked `#[inline]` to eliminate call overhead in the hot
-encode loop.
-
-13. **Shared ZODB memo** — single decoder processes both class and state
-pickles, sharing the pickle memo between them. Avoids the overhead
-of splitting and re-initializing for the state pickle.
+```
+Dict path: pickle bytes → Rust AST → Python dict (GIL held) → json.dumps() → PG
+JSON path: pickle bytes → Rust AST → serde_json → JSON string (all Rust, GIL released) → PG
+```

-14. **Boxed Instance variant**`Instance(Box<InstanceData>)` reduces the
-`PickleValue` enum from 56 to 48 bytes, improving cache utilization
-across the entire decode/encode pipeline (-13% weighted average).
+### 1,692 records

-15. **Thin LTO + single codegen unit**`lto = "thin"` + `codegen-units = 1`
-in the release profile enables cross-crate inlining and whole-crate
-optimization. Free 6-9% improvement across decode and encode with no
-code changes.
+| Metric | Dict+dumps | JSON str | Speedup |
+|---|---|---|---|
+| Mean | 41.3 us | 31.5 us | **1.3x faster** |
+| Median | 35.9 us | 26.9 us | **1.3x faster** |
+| P95 | 64.2 us | 47.7 us | **1.3x faster** |

-## Changelog
+The JSON string path is **1.3x faster** across real-world data because
+it eliminates the Python dict allocation + `json.dumps()` serialization.
+The entire pipeline runs in Rust with the GIL released, improving
+multi-threaded throughput in Zope/Plone deployments.

-### 1.3.1 (2026-02-24): LTO release profile optimization
+---

-Enabled thin LTO (`lto = "thin"`) and single codegen unit (`codegen-units = 1`)
-in the Cargo release profile. This allows LLVM to inline across crate boundaries
-and optimize the entire crate as a single compilation unit.
+## Summary

-Impact on FileStorage benchmark (1,692 records):
+The sweet spot is typical ZODB objects (5-50 keys, mixed types, datetime
+fields, persistent refs):

-| Metric | Before | After | Improvement |
-|---|---|---|---|
-| Decode median | 26.1 us | 24.7 us | **-5.4%** |
-| Decode mean | 30.5 us | 28.7 us | **-5.9%** |
-| Encode median | 6.8 us | 6.2 us | **-8.8%** |
-| Encode mean | 7.5 us | 7.0 us | **-6.7%** |
+- **Decode:** 1.5-2.0x faster on synthetic, near parity on real-world data
+- **Encode:** 4-7x faster on synthetic, 2.5-3.3x faster on real-world data
+- **PG path:** 1.3x faster end-to-end with GIL-free throughput

-Zero code changes — purely a build configuration improvement.
+Decode overhead comes from the two-pass conversion plus type transformation.
+On string-dominated payloads this matters more; on metadata-rich records with
+mixed types (the typical ZODB case) the codec is competitive or faster.

-### 2026-02-23: Dict/list subclass support + PickleValue boxing optimization
+---

-Added support for pickle SETITEMS/SETITEM/APPENDS/APPEND on Reduce and
-Instance variants (fixes [#5](https://github.com/bluedynamics/zodb-json-codec/issues/5):
-`ValueError: SETITEMS on non-dict` for OrderedDict, defaultdict, deque, etc.).
+## Optimizations applied

-To avoid an enum size regression, the `Instance` variant was refactored from
-an inline struct to `Instance(Box<InstanceData>)`, reducing `PickleValue` from
-56 bytes (pre-change baseline) to **48 bytes** — a 14% reduction.
+**Decode path:**
+- Direct PickleValue <-> PyObject conversion (`pyconv.rs`), bypassing the
+`serde_json::Value` intermediate layer
+- Simplified decoder stack with `#[inline]` hints, `mem::take` for `pop_mark`
+- Pre-allocated stack/memo/metastack vectors (`Vec::with_capacity`)
+- Pre-scan dict keys for string-only fast path (>99% of ZODB dicts)
+- Shared ZODB memo across class + state pickles
+- Set/frozenset move semantics (no Vec clone)
+- Boxed Instance variant (PickleValue 56 → 48 bytes, -13% weighted avg)

-5-round min-median benchmark comparison (baseline vs fix):
+**Encode path:**
+- Direct PyObject → pickle bytes encoder (bypasses PickleValue AST)
+- Frequency-ordered type dispatch (PyString first)
+- Dict-size fast path (>4 keys skips all marker checks)
+- `@` prefix scan before marker lookups (-20% on deep_nesting)
+- `#[inline]` on `write_u8`, `write_bytes`, `encode_int`

-| Payload | Op | Baseline | Fix | Delta |
-|---|---|---|---|---|
-| simple_flat_dict | decode | 1.31 us | 1.21 us | **-7.9%** |
-| nested_dict | decode | 2.00 us | 1.95 us | -2.5% |
-| large_flat_dict | decode | 20.19 us | 19.65 us | -2.7% |
-| btree_length | decode | 0.63 us | 0.58 us | **-9.0%** |
-| wide_dict | decode | 304.69 us | 257.02 us | **-15.6%** |
-| special_types | encode | 1.01 us | 0.96 us | **-5.2%** |
-| btree_small | encode | 0.27 us | 0.24 us | **-10.1%** |
-| wide_dict | encode | 17.47 us | 16.24 us | **-7.1%** |
-| **Weighted avg** | **all** | | | **-13.4%** |
+**Both paths:**
+- Interned marker strings (`pyo3::intern!` for `@t`, `@cls`, `@s`, etc.)
+- Pre-collected PyList (`PyList::new` vs append loop)
+- Thin LTO + single codegen unit (free 6-9% improvement)
+- Direct pickle → JSON string path for PG storage (GIL released)

-No regressions above noise threshold. The smaller enum improves cache
-utilization across the entire decode/encode pipeline, with the largest
-gains on payloads that allocate many PickleValue nodes (wide_dict, large dicts).
+---

-## Running Benchmarks
+## Running benchmarks

 ```bash
 cd sources/zodb-json-codec
@@ -263,13 +232,13 @@ python benchmarks/bench.py synthetic --iterations 1000

 # Generate a reproducible benchmark FileStorage (requires ZODB + BTrees)
 python benchmarks/bench.py generate
-# Custom paths:
-python benchmarks/bench.py generate --output /tmp/bench.fs \
---seed-data path/to/seed_data.json.gz

 # Scan the generated (or any) FileStorage
 python benchmarks/bench.py filestorage benchmarks/bench_data/Data.fs

+# PG decode path comparison (dict vs JSON string)
+python benchmarks/bench.py pg-compare --filestorage benchmarks/bench_data/Data.fs
+
 # Both synthetic + filestorage, with JSON export
 python benchmarks/bench.py all --filestorage benchmarks/bench_data/Data.fs --output results.json
 ```
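The output-size notes in BENCHMARKS.md say that JSON is larger than pickle for binary data because of base64 overhead. The following quick stdlib check illustrates that arithmetic: base64 emits 4 output characters per 3 input bytes, while a pickle stores a `bytes` value almost verbatim. This is a sketch under stated assumptions. It uses pickle protocol 3 and encodes the bytes as a bare base64 JSON string, ignoring whatever marker key the codec actually wraps bytes in. `size_comparison` is a hypothetical helper, not part of the codec.

```python
import base64
import json
import pickle


def size_comparison(payload: bytes) -> tuple[int, int]:
    """Return (pickle size, JSON size) in bytes for a raw bytes value.

    The JSON side stores the bytes as a base64 string, which is the source of
    the overhead described in the output-size notes. Marker keys are ignored.
    """
    pickled = pickle.dumps(payload, protocol=3)
    as_json = json.dumps(base64.b64encode(payload).decode("ascii"))
    return len(pickled), len(as_json)


if __name__ == "__main__":
    pickle_size, json_size = size_comparison(bytes(3000))
    print(pickle_size, json_size, round(json_size / pickle_size, 2))
```

For 3,000 bytes, the base64 string is exactly 4,000 characters plus two JSON quotes. The pickle adds only a few bytes of opcodes around the raw payload, so JSON ends up roughly a third larger before any marker overhead.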

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [package]
 name = "zodb-json-codec"
-version = "1.3.1"
+version = "1.4.0"
 edition = "2021"
 description = "Fast pickle ↔ JSON transcoder for ZODB, implemented in Rust"
 readme = "README.md"
