Spec.Data.Value.UnionBudget: raw sorted-merge vs builtin matrix #7803
Draft
Unisay wants to merge 1 commit into
Compares the on-chain builtin union path (unsafeDataAsValue + unionValue + mkValue) against a hand-rolled sorted-merge over the raw BuiltinData representation, so V3 guidance on union cost rests on concrete numbers rather than estimates.
The new module Spec.Data.Value.UnionBudget adds 8 goldenBundle entries: union_S1,3,8,100 in builtin and raw variants. Both bundle paths take two BuiltinData-encoded Values, the same value on both sides, mirroring the conservation-of-value pattern from production validators, produce the merged BuiltinData, and report the CEK budget. The bundle does not chain a lookupCoin or valueOf on the result so the measured cost is the union itself, not a union-then-lookup composite.
The hand-rolled unionValuePositiveRaw is a Plinth translation of Philip's pvalueUnionFast (Plutarch, slack thread 1776810760.659419). Sorted-merge with a three-way branch on key comparison; assumes both inputs sorted by lexicographic key and inner-pair values strictly positive. Sorted-merge is O(L + R) per level; lookup-and-merge through AssocMap.union would be O(L x R).
unionValuePositiveRaw is module-internal and not exported from plutus-ledger-api. The positive-quantity precondition is documented at the call site, not encoded in the type system. The unionValueNonZero variant from the slack discussion is left for a follow-up commit; the production case Philip cares about is tx outputs, which are strictly positive.
Result matrix, raw vs builtin CPU ratio: S1 5.20x, S3 5.65x, S8 5.83x, S100 4.47x. Byte-identical output between builtin and raw at every shape; the raw column closes most of the gap that the typed unionWith from plutus#7799 opens.
Goldens regenerated under both GHC 9.6 and GHC 9.12 (per ghc-version-support cabal stanza). 24 goldens times 2 GHC versions equals 48 files.
For IntersectMBO/plutus-private#2243.
Summary
Not for merge. Compares the on-chain builtin union path (unsafeDataAsValue + unionValue + mkValue) against a hand-rolled sorted-merge over the raw BuiltinData representation, so that V3 guidance on union cost can rest on concrete numbers rather than estimates.
The new module Spec.Data.Value.UnionBudget adds 8 goldenBundle entries: union_S{1,3,8,100}_{builtin,raw}. Both bundle paths take two BuiltinData-encoded Values (the same value used on both sides, mirroring the conservation-of-value pattern from production validators), produce the merged BuiltinData, and report the CEK budget. The bundle no longer chains a lookupCoin/valueOf on the result so the measured cost is the union itself, not a union-then-lookup composite.
The hand-rolled unionValuePositiveRaw is a Plinth translation of Philip's pvalueUnionFast (Plutarch, slack thread 1776810760.659419). It is a sorted-merge with a three-way branch on key comparison and assumes both inputs are sorted by lexicographic key and that inner-pair values are strictly positive integers. Sorted-merge is O(L + R) per level; lookup-and-merge through AssocMap.union would be O(L × R).
unionValuePositiveRaw is module-internal and not exported from plutus-ledger-api. The positive-quantity precondition is documented at the call site, not encoded in the type system. The unionValueNonZero variant from the slack discussion (where inputs may contain zero quantities and the sum-may-cancel branch must drop entries) is left for a follow-up commit if needed; the production case Philip cares about is tx-output values, which are strictly positive.
For plutus-private#2243.
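For readers who want the shape of the algorithm without the BuiltinData plumbing, the following is a minimal sketch of a two-level sorted-merge over plain Haskell association lists. It is not the code in this PR: the names mergeSorted, unionPositive and the list-based Value type are hypothetical, and the real helper works on raw BuiltinData. It assumes, like the description above, that both inputs are sorted by strictly ascending key and that all quantities are strictly positive, so a summed entry can never become zero and nothing has to be dropped.

```haskell
-- Illustrative model only: plain lists stand in for the raw BuiltinData maps.
-- Outer keys model policies, inner keys model token names.
type Value = [(String, [(String, Integer)])]

-- | Merge two association lists sorted by strictly ascending key.
-- Three-way branch on the key comparison; every step consumes at least
-- one element from one side, so the cost is O(L + R).
mergeSorted :: Ord k => (v -> v -> v) -> [(k, v)] -> [(k, v)] -> [(k, v)]
mergeSorted _ [] rs = rs
mergeSorted _ ls [] = ls
mergeSorted f l@((kl, vl) : ls) r@((kr, vr) : rs) =
  case compare kl kr of
    LT -> (kl, vl) : mergeSorted f ls r        -- key only on the left
    GT -> (kr, vr) : mergeSorted f l rs        -- key only on the right
    EQ -> (kl, f vl vr) : mergeSorted f ls rs  -- key on both sides: combine

-- | Union of two values: merge the outer maps, and where a policy occurs
-- on both sides merge the inner maps by adding quantities.
-- Assumes positive quantities, so no zero entries need to be filtered out.
unionPositive :: Value -> Value -> Value
unionPositive = mergeSorted (mergeSorted (+))
```

Merging a value with itself, as the bundle does, doubles every quantity and keeps the key order of the input, which makes the result easy to check by eye.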
Union matrix, GHC 9.6, plutus-ledger-api 1.65.0.0
CPU and Memory in absolute units.
builtin = unsafeDataAsValue + unionValue + mkValue. raw = unionValuePositiveRaw, the sorted-merge Plinth helper.
Shape legend: S1 = ada only (1 token); S3 = ada plus 2 single-token policies (3 tokens); S8 = ada plus 7 single-token policies (8 tokens); S100 = ada plus 10 policies of 10 tokens each (101 tokens).
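The token counts in the legend can be reproduced with a small model of the shapes. This is only a sketch on plain lists: shape, tokenCount, the key names and the quantities are made up for illustration and are not the valueS* fixtures from the module. Keys are zero-padded so that the generated lists stay in lexicographic order, with the empty ada key first.

```haskell
-- Illustrative model only: plain lists stand in for the encoded Value.
type Value = [(String, [(String, Integer)])]

-- | Zero-pad to two digits so that names sort lexicographically up to 99.
pad :: Int -> String
pad n = let s = show n in replicate (2 - length s) '0' ++ s

-- | Ada entry plus the given number of policies, each holding the given
-- number of tokens. Quantities are arbitrary positive placeholders.
shape :: Int -> Int -> Value
shape policies tokensPer =
  ("", [("", 1000000)])
    : [ ("p" ++ pad p, [("t" ++ pad t, 1) | t <- [1 .. tokensPer]])
      | p <- [1 .. policies]
      ]

-- | Total number of token slots across all policies, ada included.
tokenCount :: Value -> Int
tokenCount = sum . map (length . snd)
```

Under this model, shape 0 1, shape 2 1, shape 7 1 and shape 10 10 correspond to S1, S3, S8 and S100, and tokenCount on them gives 1, 3, 8 and 101.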
The raw column is the production-relevant "before BuiltinValue" baseline: validators today do their own sorted-merge over raw BuiltinData (see Djed stdlib for an example), and that path is what the builtin replaces. The ratio is roughly 5× on CPU and 16–98× on memory; the CPU ratio stays flat across shapes because both columns scale linearly in total token count with different constants.
Correctness
Byte-identical output between builtin and raw columns at every shape:
Same input on both sides (valueS* unsafeApplyCode valueS*), so the result is 2 × valueS* at every token slot. Both columns produce the same outer-key order (preserved from input) and the same per-key sum.
V3 union guidance
Validators that accumulate Values over many inputs should prefer the unionValue builtin when available. The sorted-merge raw path closes most of the gap a typed user-space unionWith opens (see #7799 for the typed delta, sub-1% at S100 against master unionWith) but is still 5× the builtin cost. For positive-quantity inputs (tx outputs) unionValuePositiveRaw is the right "before BuiltinValue" implementation to compare against.
The 15× headline gap on the original #7738 at S1 referred to the typed unionWith (via Map.union, lookup-and-merge). With the raw sorted-merge path the ratio drops to 5.2× at S1 and stays in the 4.5–6× band. The remaining gap is the cost of running UPLC vs native code; the algorithmic improvement is fully realised.
A typed unionWith column is not included here. The typed path is on impl PR #7799, with its own old-vs-new delta measured against master. Including it again here would invite a derail into "why is the typed path even on the matrix" when Philip has already accepted sorted-merge as the canonical baseline.