Refresh benchmark snapshot for release

adz · adz · commit ac8c83e7d022 · 2026-03-16T23:04:53.000+10:30
diff --git a/AGENTS.md b/AGENTS.md
@@ -29,6 +29,7 @@ These standards represent the user's preferred style and architectural philosoph
 ## 3. Testing & Validation
 - **Unquote Assertions:** Use `Swensen.Unquote` for all assertions: `test <@ actual = expected @>`.
 - **Round-Trip Testing:** Always include tests that verify a value can be serialized and then deserialized back to its original state.
+- **Release benchmark refresh:** Before cutting a release, rerun `bash scripts/generate-benchmark-snapshot.sh --stdout-only`, update the benchmark-facing docs to match the new snapshot, and do not ship stale performance numbers.
 
 ## 4. Architectural Patterns
 - **Decoder Pattern:** `JsonSource -> struct('T * JsonSource)`
@@ -40,6 +41,7 @@ These standards represent the user's preferred style and architectural philosoph
 - **Keep `Json.compile` explicit.** Hiding compilation inside `serialize`/`deserialize` would either recompile on each call or require implicit caching, which is poor UX for a performance-oriented library.
 - **Explicit nested/custom schemas currently use `Schema.fieldWith`.** Auto-resolution exists for primitives, lists, and arrays only. Future work may rename this, but the explicit-schema distinction is currently meaningful.
 - **Benchmarks should use the same DSL as tests and docs.** Avoid introducing parallel schema-definition styles unless the repo deliberately adopts a second public API.
+- **Release prep includes benchmark docs refresh.** If the release changes public performance-relevant code or performance messaging, refresh the manual benchmark snapshot and update `README.md` and benchmark docs in the same release-prep pass.
 - **When changing parsers, expand tests before refactoring.** The JSON and XML parsers are handwritten and should be treated as deterministic state machines, not “best effort” parsers.
 - **The XML surface is intentionally a small subset.** Current support is element-only XML with exact tags, escaped text, repeated `<item>` children for collections, and ignorable inter-element whitespace. Attributes, namespaces, mixed content, comments, CDATA, self-closing tags, and processing instructions are still out of scope.
 - **Common built-in schemas are now broader, but still intentional.** Auto-resolution currently includes `int64`, `int16`, `byte`, `sbyte`, `uint32`, `uint16`, `uint64`, `float`, `decimal`, `char`, `Guid`, `DateTime`, `DateTimeOffset`, `TimeSpan`, numeric-wire enums, and array-backed `IReadOnlyList<T>` / `ICollection<T>` in addition to the original primitives, lists, arrays, options, and mapping helpers. Concrete `ResizeArray<'T>` / `List<T>` uses the explicit `Schema.resizeArray` helper, and direct dictionary support still stays out of scope until there is a cleaner JSON/XML symmetry story.
@@ -54,5 +56,5 @@ These standards represent the user's preferred style and architectural philosoph
 - **Numeric parsing should stay on the shared portable helpers.** Route JSON/XML/KeyValue/import numeric decoding through the `Core.tryParse...Invariant` and `Core.parse...Invariant` helpers instead of ad hoc `Parse(..., InvariantCulture)` calls plus exception-type checks, so Fable stays warning-free and the invalid/out-of-range behavior remains aligned across runtimes.
 - **The C# facade is intentionally narrower than the bridge.** `CSharpSchema.Record(...)` is for new setter-bound C# classes and wraps the existing schema model; constructor-bound or attribute-driven C# contracts should still prefer the bridge or future codegen instead of stretching the facade into a second schema system.
 - **Do not use `System.Enum.ToObject` or `System.Convert.ChangeType` in the core portable path.** Fable rejects both APIs. When adding enum support, keep the .NET path behind `#if !FABLE_COMPILER` and use a Fable-safe erased-number path instead.
-- **BenchmarkDotNet now runs via the in-process emit toolchain.** Keep the manual runner for quick snapshots and README numbers.
+- **BenchmarkDotNet now runs via the in-process emit toolchain.** Keep the manual runner for quick snapshots and release-facing README/docs numbers.
 - **Project layout is now split by role.** Public libraries live under `src/`, executable and xUnit tests live under `tests/`, and benchmark apps live under `benchmarks/`. Keep new projects in the root that matches their purpose so tooling and docs discovery stay predictable.
diff --git a/README.md b/README.md
@@ -107,15 +107,15 @@ The project ships both a manual scenario runner and a repeatable `perf` workflow
 - profiling guide: [docs/HOW_TO_PROFILE_BENCHMARK_HOT_PATHS.md](docs/HOW_TO_PROFILE_BENCHMARK_HOT_PATHS.md)
 - full benchmark page: [docs/BENCHMARKS.md](docs/BENCHMARKS.md)
 
-Latest local manual snapshot, measured on March 11, 2026:
+Latest local manual snapshot, measured on March 16, 2026:
 
 | Scenario | CodecMapper serialize | STJ serialize | CodecMapper deserialize | STJ deserialize | Takeaway |
 | --- | ---: | ---: | ---: | ---: | --- |
-| `small-message` | `3.0 us` | `3.6 us` | `6.9 us` | `5.2 us` | `CodecMapper` wins serialize on tiny payloads; `STJ` still leads deserialize. |
-| `person-batch-25` | `76.1 us` | `68.5 us` | `152.2 us` | `152.5 us` | Medium nested decode is effectively even; serialize remains close. |
-| `person-batch-250` | `436.0 us` | `386.9 us` | `1.303 ms` | `1.074 ms` | Larger nested batches are still competitive, but `STJ` leads on throughput. |
-| `escaped-articles-20` | `236.4 us` | `192.9 us` | `410.7 us` | `325.8 us` | String-heavy payloads are a clear weak spot today. |
-| `telemetry-500` | `1.984 ms` | `1.609 ms` | `3.981 ms` | `2.810 ms` | Numeric-heavy flat payloads still need real optimization work. |
-| `person-batch-25-unknown-fields` | `40.4 us` | `39.3 us` | `158.9 us` | `129.4 us` | Unknown-field decode improved, but `STJ` still holds a noticeable lead. |
+| `small-message` | `519.5 ns` | `676.9 ns` | `990.1 ns` | `928.4 ns` | `CodecMapper` still wins tiny-message serialize; `STJ` keeps a slight decode lead. |
+| `person-batch-25` | `8.83 us` | `8.36 us` | `26.08 us` | `20.41 us` | Medium nested serialize stays close, but `STJ` now leads decode. |
+| `person-batch-250` | `86.93 us` | `78.18 us` | `247.16 us` | `190.27 us` | Larger nested batches remain competitive on serialize, while `STJ` leads decode throughput. |
+| `escaped-articles-20` | `46.00 us` | `33.87 us` | `80.78 us` | `63.08 us` | String-heavy payloads are still a clear weak spot. |
+| `telemetry-500` | `393.93 us` | `311.45 us` | `745.63 us` | `520.84 us` | Numeric-heavy flat payloads still need significant optimization work, especially on decode. |
+| `person-batch-25-unknown-fields` | `7.92 us` | `7.51 us` | `30.50 us` | `24.23 us` | Unknown-field decode improved, but `STJ` still holds a noticeable lead. |
 
 Those numbers are machine-specific. Compare ratios and workload shape more than the absolute values.
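
A minimal sketch of the ratio comparison recommended above, using the March 16, 2026 rows from the README table. The helper names `to_ns` and `ratio` are illustrative assumptions and are not part of CodecMapper or its benchmark tooling; Python is used here only for the arithmetic.

```python
# Multipliers to nanoseconds; snapshot tables mix ns, us, and ms cells.
UNIT_TO_NS = {"ns": 1.0, "us": 1_000.0, "ms": 1_000_000.0}


def to_ns(cell: str) -> float:
    """Convert a table cell such as '8.83 us' to nanoseconds."""
    value, unit = cell.split()
    return float(value) * UNIT_TO_NS[unit]


def ratio(codec_mapper: str, stj: str) -> float:
    """CodecMapper time divided by STJ time.

    Below 1.0 means CodecMapper was faster; above 1.0 means STJ was faster.
    """
    return to_ns(codec_mapper) / to_ns(stj)


# (scenario, CodecMapper serialize, STJ serialize,
#  CodecMapper deserialize, STJ deserialize), copied from the table above.
SNAPSHOT = [
    ("small-message", "519.5 ns", "676.9 ns", "990.1 ns", "928.4 ns"),
    ("person-batch-25", "8.83 us", "8.36 us", "26.08 us", "20.41 us"),
    ("person-batch-250", "86.93 us", "78.18 us", "247.16 us", "190.27 us"),
    ("escaped-articles-20", "46.00 us", "33.87 us", "80.78 us", "63.08 us"),
    ("telemetry-500", "393.93 us", "311.45 us", "745.63 us", "520.84 us"),
    ("person-batch-25-unknown-fields", "7.92 us", "7.51 us", "30.50 us", "24.23 us"),
]

for name, cm_ser, stj_ser, cm_de, stj_de in SNAPSHOT:
    print(
        f"{name:32} "
        f"serialize x{ratio(cm_ser, stj_ser):.2f}  "
        f"deserialize x{ratio(cm_de, stj_de):.2f}"
    )
```

Because each ratio divides two timings taken on the same machine in the same run, it should be less sensitive to hardware differences than the raw values, which is why it is the more useful figure to compare between snapshots.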
diff --git a/docs/BENCHMARKS.md b/docs/BENCHMARKS.md
@@ -20,7 +20,7 @@ The current scenario matrix covers:
 - `telemetry-500`
 - `person-batch-25-unknown-fields`
 
-These numbers were measured locally on March 11, 2026 with:
+These numbers were measured locally on March 16, 2026 with:
 
 ```bash
 dotnet run -c Release --project benchmarks/CodecMapper.Benchmarks.Runner/CodecMapper.Benchmarks.Runner.fsproj
@@ -30,19 +30,19 @@ dotnet run -c Release --project benchmarks/CodecMapper.Benchmarks.Runner/CodecMa
 
 | Scenario | CodecMapper serialize | STJ serialize | Newtonsoft serialize | CodecMapper deserialize | STJ deserialize | Newtonsoft deserialize | Brief explanation |
 | --- | ---: | ---: | ---: | ---: | ---: | ---: | --- |
-| `small-message` | `3.0 us` | `3.6 us` | `6.7 us` | `6.9 us` | `5.2 us` | `11.5 us` | `CodecMapper` wins tiny-message serialize, while `STJ` still leads decode. |
-| `person-batch-25` | `76.1 us` | `68.5 us` | `130.1 us` | `152.2 us` | `152.5 us` | `150.2 us` | Medium nested decode is effectively even; serialize remains close. |
-| `person-batch-250` | `436.0 us` | `386.9 us` | `670.5 us` | `1.303 ms` | `1.074 ms` | `1.627 ms` | Larger nested batches are still competitive, but `STJ` has the throughput lead. |
-| `escaped-articles-20` | `236.4 us` | `192.9 us` | `288.0 us` | `410.7 us` | `325.8 us` | `404.9 us` | String-heavy payloads are a clear weak spot today. |
-| `telemetry-500` | `1.984 ms` | `1.609 ms` | `2.814 ms` | `3.981 ms` | `2.810 ms` | `5.205 ms` | Numeric-heavy payloads still need real optimization work, especially on decode. |
-| `person-batch-25-unknown-fields` | `40.4 us` | `39.3 us` | `68.9 us` | `158.9 us` | `129.4 us` | `273.9 us` | Unknown-field decode improved, but `STJ` still has a noticeable lead. |
+| `small-message` | `519.5 ns` | `676.9 ns` | `1012.0 ns` | `990.1 ns` | `928.4 ns` | `1817.7 ns` | `CodecMapper` wins tiny-message serialize, while `STJ` still leads decode. |
+| `person-batch-25` | `8.83 us` | `8.36 us` | `14.06 us` | `26.08 us` | `20.41 us` | `28.80 us` | Medium nested serialize remains close, but `STJ` holds a clearer decode lead than before. |
+| `person-batch-250` | `86.93 us` | `78.18 us` | `125.44 us` | `247.16 us` | `190.27 us` | `277.88 us` | Larger nested batches are still competitive on serialize, but `STJ` has the throughput lead on decode. |
+| `escaped-articles-20` | `46.00 us` | `33.87 us` | `49.79 us` | `80.78 us` | `63.08 us` | `78.27 us` | String-heavy payloads remain a clear weak spot, especially against `STJ`. |
+| `telemetry-500` | `393.93 us` | `311.45 us` | `539.74 us` | `745.63 us` | `520.84 us` | `938.99 us` | Numeric-heavy payloads still need real optimization work, especially on decode. |
+| `person-batch-25-unknown-fields` | `7.92 us` | `7.51 us` | `12.25 us` | `30.50 us` | `24.23 us` | `48.85 us` | Unknown-field decode improved, but `STJ` still has a noticeable lead. |
 
 ## Current reading
 
-- `CodecMapper` is already competitive on small messages and medium nested-record contracts.
+- `CodecMapper` is already competitive on small messages and stays reasonably close on medium nested-record serialize workloads.
 - `System.Text.Json` still leads on string-heavy and numeric-heavy workloads.
 - `Newtonsoft.Json` is slower across the whole current matrix.
-- Decode on wider numeric and string-heavy payloads is still the most obvious performance gap.
+- Decode on wider nested, numeric-heavy, and string-heavy payloads is still the most obvious performance gap.
 
 ## How to use this