
Spec.Data.Value.UnionBudget: raw sorted-merge vs builtin matrix #7803

Draft
Unisay wants to merge 1 commit into master from yura/issue-2243-fused-unionwith-evidence

Unisay commented May 28, 2026


Summary

Not for merge. Compares the on-chain builtin union path (unsafeDataAsValue + unionValue + mkValue) against a hand-rolled sorted-merge over the raw BuiltinData representation, so that V3 guidance on union cost can rest on concrete numbers rather than estimates.

The new module Spec.Data.Value.UnionBudget adds 8 goldenBundle entries: union_S{1,3,8,100}_{builtin,raw}. Both bundle paths take two BuiltinData-encoded Values (the same value used on both sides, mirroring the conservation-of-value pattern from production validators), produce the merged BuiltinData, and report the CEK budget. The bundle no longer chains a lookupCoin / valueOf on the result so the measured cost is the union itself, not a union-then-lookup composite.

The hand-rolled unionValuePositiveRaw is a Plinth translation of Philip's pvalueUnionFast (Plutarch, slack thread 1776810760.659419). It is a sorted-merge with a three-way branch on key comparison and assumes both inputs are sorted by lexicographic key and that inner-pair values are strictly positive integers. Sorted-merge is O(L + R) per level; lookup-and-merge through AssocMap.union would be O(L × R).
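The merge can be modelled in plain GHC Haskell over ordinary association lists. This is a minimal sketch, not the Plinth code in this PR: `mergeSortedWith` and `unionPositive` are hypothetical names, and the sketch assumes, as unionValuePositiveRaw does, that both inputs are sorted by key at both levels and that every quantity is strictly positive, so the equal-key branch never needs to drop an entry.

```haskell
-- Hypothetical plain-Haskell model of a two-level sorted-merge union.
-- Not the PR's Plinth code: keys and quantities live in ordinary lists,
-- not in BuiltinData.

-- Merge two association lists that are both sorted ascending by key.
-- Each step consumes the head of at least one input, so one level costs
-- O(L + R) key comparisons for inputs of length L and R.
mergeSortedWith :: Ord k => (v -> v -> v) -> [(k, v)] -> [(k, v)] -> [(k, v)]
mergeSortedWith _ [] ys = ys
mergeSortedWith _ xs [] = xs
mergeSortedWith f xs@((kx, vx) : xs') ys@((ky, vy) : ys') =
  case compare kx ky of
    -- Three-way branch on the key comparison.
    LT -> (kx, vx) : mergeSortedWith f xs' ys
    GT -> (ky, vy) : mergeSortedWith f xs ys'
    EQ -> (kx, f vx vy) : mergeSortedWith f xs' ys'

-- Union of two Values shaped as policy -> (token -> quantity).
-- Assumes strictly positive quantities: a sum of two positives is never
-- zero, so no entry ever has to be removed.
unionPositive
  :: (Ord p, Ord t)
  => [(p, [(t, Integer)])]
  -> [(p, [(t, Integer)])]
  -> [(p, [(t, Integer)])]
unionPositive = mergeSortedWith (mergeSortedWith (+))
```

Merging a value with itself, as the benchmark does, hits the equal-key branch at every key and so doubles every quantity while preserving key order.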

unionValuePositiveRaw is module-internal and not exported from plutus-ledger-api. The positive-quantity precondition is documented at the call site, not encoded in the type system. The unionValueNonZero variant from the slack discussion (where inputs may contain zero quantities and the sum-may-cancel branch must drop entries) is left for a follow-up commit if needed; the production case Philip cares about is tx-output values, which are strictly positive.
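A sketch of how the cancelling case could differ, again in plain Haskell over lists, assuming the intended behaviour is to drop any token whose summed quantity is zero and any policy whose token map ends up empty. `mergeSortedMaybe`, `addNonZero` and `unionNonZero` are hypothetical names; this is not the follow-up commit's code. Only the equal-key branch changes: the combine function returns `Nothing` to delete the entry. Zeros already present in an input pass through the one-sided branches untouched, so inputs with stray zero entries would still need a separate normalisation pass.

```haskell
-- Hypothetical sketch of a zero-dropping union; not code from this PR or
-- from any follow-up commit. Combine functions return Nothing to drop.

-- Sorted-merge where the equal-key branch may delete the entry.
mergeSortedMaybe
  :: Ord k => (v -> v -> Maybe v) -> [(k, v)] -> [(k, v)] -> [(k, v)]
mergeSortedMaybe _ [] ys = ys
mergeSortedMaybe _ xs [] = xs
mergeSortedMaybe f xs@((kx, vx) : xs') ys@((ky, vy) : ys') =
  case compare kx ky of
    LT -> (kx, vx) : mergeSortedMaybe f xs' ys
    GT -> (ky, vy) : mergeSortedMaybe f xs ys'
    EQ -> case f vx vy of
      Nothing -> mergeSortedMaybe f xs' ys'
      Just v  -> (kx, v) : mergeSortedMaybe f xs' ys'

-- Inner level: drop a token whose summed quantity cancels to zero.
addNonZero :: Integer -> Integer -> Maybe Integer
addNonZero a b = let s = a + b in if s == 0 then Nothing else Just s

-- Outer level: drop a policy whose merged token map ends up empty.
unionNonZero
  :: (Ord p, Ord t)
  => [(p, [(t, Integer)])]
  -> [(p, [(t, Integer)])]
  -> [(p, [(t, Integer)])]
unionNonZero = mergeSortedMaybe mergeInner
  where
    mergeInner m1 m2 =
      case mergeSortedMaybe addNonZero m1 m2 of
        [] -> Nothing
        m  -> Just m
```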

For plutus-private#2243.

Union matrix, GHC 9.6, plutus-ledger-api 1.65.0.0

CPU and Mem are absolute CEK budget units. builtin = unsafeDataAsValue + unionValue + mkValue; raw = unionValuePositiveRaw, the sorted-merge Plinth helper.

Shape legend: S1 = ada only (1 token); S3 = ada plus 2 single-token policies (3 tokens); S8 = ada plus 7 single-token policies (8 tokens); S100 = ada plus 10 policies of 10 tokens each (101 tokens).

| shape | builtin CPU | builtin Mem | raw CPU | raw Mem | CPU ratio (raw / builtin) | Mem ratio (raw / builtin) |
|---|---:|---:|---:|---:|---:|---:|
| S1 | 1 628 911 | 2 002 | 8 467 985 | 32 754 | 5.20× | 16.4× |
| S3 | 3 951 025 | 2 306 | 22 311 683 | 81 606 | 5.65× | 35.4× |
| S8 | 9 757 640 | 3 066 | 56 920 928 | 203 736 | 5.83× | 66.5× |
| S100 | 83 344 331 | 13 242 | 372 767 585 | 1 298 334 | 4.47× | 98.0× |
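The ratio columns can be recomputed from the absolute budgets. The figures below are copied from the table; the `Row` type and the `ratio` helper are illustrative names, with CPU ratios rounded to two decimal places and memory ratios to one, matching the table's precision.

```haskell
-- Recompute the ratio columns (raw / builtin) from the absolute budgets.
-- Budget numbers are copied from the table; the helper names are ours.

data Row = Row
  { shape      :: String
  , builtinCpu :: Integer
  , builtinMem :: Integer
  , rawCpu     :: Integer
  , rawMem     :: Integer
  }

rows :: [Row]
rows =
  [ Row "S1"   1628911  2002  8467985   32754
  , Row "S3"   3951025  2306  22311683  81606
  , Row "S8"   9757640  3066  56920928  203736
  , Row "S100" 83344331 13242 372767585 1298334
  ]

-- num / den, rounded to the given number of decimal places.
ratio :: Int -> Integer -> Integer -> Double
ratio places num den =
  fromIntegral (round (x * scale) :: Integer) / scale
  where
    x = fromIntegral num / fromIntegral den :: Double
    scale = 10 ^ places

-- (shape, CPU ratio to 2 places, Mem ratio to 1 place)
ratios :: [(String, Double, Double)]
ratios =
  [ (shape r, ratio 2 (rawCpu r) (builtinCpu r), ratio 1 (rawMem r) (builtinMem r))
  | r <- rows
  ]
```

For S8 memory, 203 736 / 3 066 ≈ 66.4501, which rounds to 66.5 at one decimal place.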

The raw column is the production-relevant "before BuiltinValue" baseline: validators today do their own sorted-merge over raw BuiltinData (see Djed stdlib for an example), and that path is what the builtin replaces. The raw path costs roughly 4.5–5.8× the builtin on CPU and 16–98× on memory. The CPU ratio stays in that narrow band because both columns grow roughly in proportion to total token count; the memory ratio widens with shape because builtin memory grows far more slowly (2 002 to 13 242 from S1 to S100) than raw memory (32 754 to 1 298 334).

Correctness

Byte-identical output between builtin and raw columns at every shape:

```bash
diff <(tail -n +5 .../union_S1_builtin.golden.eval) <(tail -n +5 .../union_S1_raw.golden.eval)
# (no output: identical)
```

Same input on both sides (valueS* unsafeApplyCode valueS*), so the result is 2 × valueS* at every token slot. Both columns produce the same outer-key order (preserved from input) and the same per-key sum.

V3 union guidance

Validators that accumulate Values over many inputs should prefer the unionValue builtin when available. The sorted-merge raw path closes most of the gap that a typed user-space unionWith opens (see #7799 for the typed delta, sub-1% at S100 against master unionWith), but it still costs about 5× the builtin on CPU. For positive-quantity inputs (tx outputs), unionValuePositiveRaw is the right "before BuiltinValue" implementation to compare against.

The 15× headline gap on the original #7738 at S1 referred to the typed unionWith (via Map.union, lookup-and-merge). With the raw sorted-merge path the ratio drops to 5.2× at S1 and stays in the 4.5–6× band. The remaining gap is the cost of running UPLC vs native code; the algorithmic improvement is fully realised.
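For contrast, a simplified model of lookup-and-merge in the same plain-Haskell style. It is an assumption about the cost shape, not the actual AssocMap.union or typed unionWith source: every left key is found with a linear `lookup` in the right list and every right key is checked against the left list, so a single level performs O(L × R) key comparisons, against O(L + R) for the sorted merge. `unionLookupWith` and `unionLookupPositive` are hypothetical names.

```haskell
-- Simplified model of lookup-and-merge; NOT the AssocMap.union source.
-- Each key of one side is searched for linearly in the other side, so a
-- single level costs O(L * R) key comparisons.

unionLookupWith :: Eq k => (v -> v -> v) -> [(k, v)] -> [(k, v)] -> [(k, v)]
unionLookupWith f xs ys = fromLeft ++ rightOnly
  where
    -- One linear lookup into ys per left entry: O(L * R).
    fromLeft = [ (k, maybe v (f v) (lookup k ys)) | (k, v) <- xs ]
    -- One linear scan of xs per right entry: O(R * L).
    rightOnly = [ (k, v) | (k, v) <- ys, k `notElem` map fst xs ]

-- Two-level union over policy -> (token -> quantity), positive quantities.
unionLookupPositive
  :: (Eq p, Eq t)
  => [(p, [(t, Integer)])]
  -> [(p, [(t, Integer)])]
  -> [(p, [(t, Integer)])]
unionLookupPositive = unionLookupWith (unionLookupWith (+))
```

Because this model never compares keys for order it only needs `Eq`, but its output lists left keys first and then right-only keys, so the result is not sorted in general.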

A typed unionWith column is not included here. The typed path is on impl PR #7799, with its own old-vs-new delta measured against master. Including it again here would invite a derail into "why is the typed path even on the matrix" when Philip has already accepted sorted-merge as the canonical baseline.

Compares the on-chain builtin union path (unsafeDataAsValue + unionValue +
mkValue) against a hand-rolled sorted-merge over the raw BuiltinData
representation, so V3 guidance on union cost rests on concrete numbers
rather than estimates.

The new module Spec.Data.Value.UnionBudget adds 8 goldenBundle entries:
union_S{1,3,8,100}_{builtin,raw}. Both bundle paths take two
BuiltinData-encoded Values (the same value on both sides, mirroring the
conservation-of-value pattern from production validators), produce the
merged BuiltinData, and report the CEK budget. The bundle does not chain
a lookupCoin or valueOf on the result, so the measured cost is the union
itself, not a union-then-lookup composite.

The hand-rolled unionValuePositiveRaw is a Plinth translation of
Philip's pvalueUnionFast (Plutarch, slack thread 1776810760.659419).
Sorted-merge with a three-way branch on key comparison; assumes both
inputs sorted by lexicographic key and inner-pair values strictly
positive. Sorted-merge is O(L + R) per level; lookup-and-merge through
AssocMap.union would be O(L x R).

unionValuePositiveRaw is module-internal and not exported from
plutus-ledger-api. The positive-quantity precondition is documented at
the call site, not encoded in the type system. The unionValueNonZero
variant from the slack discussion is left for a follow-up commit; the
production case Philip cares about is tx outputs, which are strictly
positive.

Result matrix, raw vs builtin CPU ratio: S1 5.20x, S3 5.65x, S8 5.83x,
S100 4.47x. Byte-identical output between builtin and raw at every
shape; the raw column closes most of the gap that the typed unionWith
from plutus#7799 opens. Goldens regenerated under both GHC 9.6 and GHC
9.12 (per ghc-version-support cabal stanza). 24 goldens times 2 GHC
versions equals 48 files.

For IntersectMBO/plutus-private#2243.
@github-actions


Execution Budget Golden Diff

8927187 (master) vs 819cafb


This comment will get updated when changes are made.
