@@ -88,40 +88,60 @@ JSON is typically smaller than pickle for string-heavy data (wide_dict: 42%
 smaller). It is larger for binary data (base64 overhead) and deeply nested
 structures (marker overhead).
 
-## FileStorage Scan (Real Plone 6 Database)
+## FileStorage Scan (Generated Wikipedia Database)
 
-8,422 records, 182 distinct types, 0 errors.
+1,692 records, 6 distinct types, 0 errors. Generated from 1,062 multilingual
+Wikipedia articles (en/de/zh) with body text truncated to 500-10,000 chars
+(exponential skew toward shorter texts), enriched type-diverse fields
+(datetime, date, timedelta, Decimal, UUID, frozenset, set, tuple, bytes)
+plus OOBTree containers, group summaries, and edge-case objects.
+
+Generate with: `python benchmarks/bench.py generate`
 
 | Metric | Codec | Python | Speedup |
 | --- | --- | --- | --- |
-| Decode mean | 5.3 us | 100.1 us | **18.7x faster** |
-| Decode median | 3.6 us | 4.6 us | **1.3x faster** |
-| Decode P95 | 11.6 us | 10.1 us | 1.1x slower |
-| Encode mean | 1.1 us | 3.8 us | **3.5x faster** |
-| Encode median | 0.7 us | 2.9 us | **4.1x faster** |
-| Encode P95 | 2.7 us | 7.0 us | **2.6x faster** |
-| Total pickle | 3.1 MB | — | — |
-| Total JSON | 4.1 MB | — | 1.30x |
-
-The codec's mean decode speedup (18.7x) far exceeds median (1.3x) because
-Python pickle has extreme outliers (max 365 ms) that the Rust codec avoids
-(max 2.4 ms). This matters for tail latency in web applications.
+| Decode mean | 30.5 us | 24.2 us | 1.3x slower |
+| Decode median | 26.1 us | 23.4 us | 1.1x slower |
+| Decode P95 | 43.2 us | 36.1 us | 1.2x slower |
+| Encode mean | 7.5 us | 19.3 us | **2.6x faster** |
+| Encode median | 6.8 us | 20.9 us | **3.1x faster** |
+| Encode P95 | 13.2 us | 31.9 us | **2.4x faster** |
+| Total pickle | 5.1 MB | — | — |
+| Total JSON | 7.2 MB | — | 1.41x |
+
+The codec is slightly slower on decode (1.1x median) because it does
+fundamentally more work than CPython's C-extension pickle: two conversions
+(pickle bytes → Rust AST → Python objects) plus type-aware transformation.
+The gap narrows on metadata-heavy records (small dicts with mixed types).
+
+Encode is consistently **2.4-3.1x faster** because the Rust encoder writes
+pickle opcodes directly from Python objects, bypassing intermediate
+allocations that CPython's pickle module incurs.
+
+| Record type | Count | % |
+| --- | --- | --- |
+| persistent.mapping.PersistentMapping | 1,188 | 70.2% |
+| BTrees.OOBTree.OOBucket | 342 | 20.2% |
+| persistent.list.PersistentList | 100 | 5.9% |
+| BTrees.OOBTree.OOBTree | 55 | 3.3% |
+| BTrees.Length.Length | 5 | 0.3% |
+| BTrees.OIBTree.OIBTree | 2 | 0.1% |
 
 ## Analysis
 
 The codec **beats CPython pickle** on decode for 8 of 10 synthetic categories,
-and on encode for **all 10 categories**. On real Plone data, both decode and
-encode are faster across all statistical measures.
-
-The remaining decode-parity cases:
-
-- **btree_small decode**: at parity (1.0x) — small payload, minimal work
-- **deep_nesting decode**: recursive marker prefix scanning on nested dicts
+and on encode for **all 10 categories**. On the generated FileStorage data,
+decode is near parity (1.1x median) while encode is **2.4-3.1x faster**.
 
 The sweet spot is typical ZODB objects (5-50 keys, mixed types, datetime
 fields, persistent refs) where the codec is **1.3-1.7x faster** decode and
 **3-7x faster** encode while also producing queryable JSONB output.
 
+Decode overhead comes from the codec's two-pass conversion plus type
+transformation. On string-dominated payloads this matters more; on
+metadata-rich records with mixed types (the typical ZODB case) the codec
+is competitive or faster.
+
 ## Optimizations Applied
 
 1. **Direct PickleValue <-> PyObject** (`src/pyconv.rs`) — bypasses the
@@ -219,9 +239,15 @@ maturin develop --release
 # Synthetic micro-benchmarks
 python benchmarks/bench.py synthetic --iterations 1000
 
-# Scan a real FileStorage
-python benchmarks/bench.py filestorage /path/to/Data.fs
+# Generate a reproducible benchmark FileStorage (requires ZODB + BTrees)
+python benchmarks/bench.py generate
+# Custom paths:
+python benchmarks/bench.py generate --output /tmp/bench.fs \
+    --seed-data path/to/seed_data.json.gz
+
+# Scan the generated (or any) FileStorage
+python benchmarks/bench.py filestorage benchmarks/bench_data/Data.fs
 
-# Both, with JSON export for tracking
-python benchmarks/bench.py all --filestorage /path/to/Data.fs --output results.json
+# Both synthetic + filestorage, with JSON export
+python benchmarks/bench.py all --filestorage benchmarks/bench_data/Data.fs --output results.json
 ```
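
The Speedup column in the new FileStorage table appears to follow from the ratio of the two timings in each row. Below is a minimal sketch of that arithmetic, assuming (the diff does not confirm this) that each label is the slower time divided by the faster time, rounded to one decimal. `speedup_label` and `summarize` are hypothetical helpers for illustration, not functions from `benchmarks/bench.py`, and the P95 uses a simple nearest-rank rule that may differ from what the benchmark script does.

```python
from __future__ import annotations

import math
import statistics


def speedup_label(codec_us: float, python_us: float) -> str:
    """Format a codec-vs-Python timing pair like the Speedup column."""
    if codec_us <= python_us:
        return f"{python_us / codec_us:.1f}x faster"
    return f"{codec_us / python_us:.1f}x slower"


def summarize(samples_us: list[float]) -> dict[str, float]:
    """Mean, median and nearest-rank P95 of a non-empty list of timings."""
    ordered = sorted(samples_us)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest-rank position
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
    }


# Rounded values copied from the FileStorage table in this diff.
print(speedup_label(30.5, 24.2))  # decode mean  -> 1.3x slower
print(speedup_label(7.5, 19.3))   # encode mean  -> 2.6x faster
```

Feeding `summarize` a sample with one large outlier, such as `[1.0, 2.0, 3.0, 4.0, 100.0]`, gives a mean of 22.0 against a median of 3.0, which is why the table reports mean, median, and P95 side by side.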