Commit b66856d

feat(pre-registration): expose hashJson + canonicalize for arbitrary content signing (#32)
The canonicalize+sha256 logic that signManifest is built on is general enough that consumers signing arbitrary structured content (artifact bundles, production packets, dataset versions, etc.) end up reimplementing it from scratch. Two cases I checked while making this change:

- physim/apps/server/src/lib/manifest.ts re-implements canonicalize+sha256 for production-packet attestation, with an inline comment noting that agent-eval's signer is shaped for HypothesisManifest only.
- phony/products/builder/api/src/eval/champion-sign.ts re-implements the same canonicalize+sha256 (sync variant) before wrapping in Ed25519.

Both pieces of duplication go away if the generic primitive is exposed.

This change:

- Lifts the previously-private `canonicalize(v: unknown)` to an exported function. Recursive key-sort, primitives pass through, arrays preserve order. Behavior unchanged from the inlined version.
- Adds `hashJson<T>(obj: T): Promise<string>`: sha256 hex (full 64 chars) over the canonicalized JSON encoding. The same primitive signManifest is built on, refactored to call through.
- Naming: I went with `hashJson` rather than `hashContent` because prompt-registry already exports `hashContent(s: string)` for the truncated 12-char prompt-id helper. The semantics differ (string in, short id out), so the two coexist, each named for what it actually does.

Backward compat: signManifest / verifyManifest output bit-for-bit identical hashes (verified by a new test that compares hashJson(base) directly against signManifest(base).contentHash). All 827 existing tests still pass.

Tests added:

- canonicalize sorts keys recursively + preserves array order + passes primitives
- hashJson is stable across key insertion order
- hashJson(base) === signManifest(base).contentHash (composition guarantee)
- hashJson and prompt-registry's hashContent are independent (different return shape)
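
To make the motivation concrete, here is a minimal sketch of why plain `JSON.stringify` is not enough for content hashing and what the recursive key sort buys. The `canonicalize` body is copied from this commit so the snippet runs standalone; the `bundleA` / `bundleB` payloads are hypothetical sample data, not taken from either consumer mentioned above.

```typescript
// Copied from src/pre-registration.ts in this commit so the sketch is self-contained.
function canonicalize(v: unknown): unknown {
  if (v === null || typeof v !== 'object') return v
  if (Array.isArray(v)) return v.map(canonicalize)
  const keys = Object.keys(v as Record<string, unknown>).sort()
  const out: Record<string, unknown> = {}
  for (const k of keys) out[k] = canonicalize((v as Record<string, unknown>)[k])
  return out
}

// Hypothetical payloads: same content, different key insertion order.
const bundleA = { name: 'run-42', files: ['a.bin', 'b.bin'], meta: { seed: 7, env: 'ci' } }
const bundleB = { meta: { env: 'ci', seed: 7 }, files: ['a.bin', 'b.bin'], name: 'run-42' }

// Plain JSON.stringify follows insertion order, so the two encodings differ.
const rawEqual = JSON.stringify(bundleA) === JSON.stringify(bundleB)

// After canonicalization both serialize to the same string, so any hash over
// that string is independent of how the object was built.
const canonicalA = JSON.stringify(canonicalize(bundleA))
const canonicalB = JSON.stringify(canonicalize(bundleB))
const canonicalEqual = canonicalA === canonicalB

console.log({ rawEqual, canonicalEqual }) // { rawEqual: false, canonicalEqual: true }
```

Array order is left alone on purpose: `files` keeps its original sequence, so reordering array elements still changes the encoding.
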
1 parent f5c1a88 commit b66856d

3 files changed

Lines changed: 92 additions & 16 deletions

src/index.ts

Lines changed: 7 additions & 1 deletion
@@ -497,7 +497,13 @@ export type {
   CrossTraceDiffOptions,
 } from './cross-trace-diff'
 
-export { signManifest, verifyManifest, evaluateHypothesis } from './pre-registration'
+export {
+  signManifest,
+  verifyManifest,
+  evaluateHypothesis,
+  hashJson,
+  canonicalize,
+} from './pre-registration'
 export type {
   HypothesisManifest,
   SignedManifest,

src/pre-registration.ts

Lines changed: 50 additions & 15 deletions
@@ -72,6 +72,55 @@ export interface HypothesisResult {
   notes?: string
 }
 
+/**
+ * Deterministic JSON canonicalization — sort object keys recursively.
+ *
+ * Two semantically-equal objects produce byte-identical canonicalized output;
+ * this is what makes a content-hash stable across encoders, key insertion
+ * orders, and runtime versions. Exported for any consumer that needs the same
+ * canonicalization guarantee outside the manifest-signing path (e.g., signing
+ * an artifact bundle, hashing a dataset version, etc.).
+ */
+export function canonicalize(v: unknown): unknown {
+  if (v === null || typeof v !== 'object') return v
+  if (Array.isArray(v)) return v.map(canonicalize)
+  const keys = Object.keys(v as Record<string, unknown>).sort()
+  const out: Record<string, unknown> = {}
+  for (const k of keys) out[k] = canonicalize((v as Record<string, unknown>)[k])
+  return out
+}
+
+/**
+ * SHA-256 hex (full 64 chars) over the canonicalized JSON encoding of `obj`.
+ *
+ * The same primitive `signManifest` and `verifyManifest` are built on, exposed
+ * directly so consumers signing arbitrary structured content (artifact bundles,
+ * production packets, dataset manifests, etc.) don't have to re-derive
+ * canonicalize+sha256 from scratch.
+ *
+ * Stable across:
+ * - object key insertion order (canonicalization sorts keys recursively)
+ * - encoder choice (UTF-8 via TextEncoder, fixed)
+ * - runtime (uses the Web Crypto subtle digest, present in Node ≥18 and browsers)
+ *
+ * Naming note: `hashJson` rather than `hashContent` because `hashContent` is
+ * already taken in `prompt-registry.ts` for the truncated 12-char prompt-id
+ * helper, which has different semantics (string input, short return). Both
+ * coexist; `hashJson` is the right name when you mean "canonicalize then hash."
+ *
+ * @example
+ * const hash = await hashJson({ id: '1', kind: 'spec' })
+ * // 'a3f1...' (64 hex chars)
+ */
+export async function hashJson<T>(obj: T): Promise<string> {
+  const canonical = canonicalize(obj)
+  const bytes = new TextEncoder().encode(JSON.stringify(canonical))
+  const digest = await globalThis.crypto.subtle.digest('SHA-256', bytes)
+  return Array.from(new Uint8Array(digest))
+    .map((b) => b.toString(16).padStart(2, '0'))
+    .join('')
+}
+
 /**
  * Sign a manifest with a SHA-256 content hash.
  *
@@ -83,12 +132,7 @@ export interface HypothesisResult {
  * hashing on both sides.
  */
 export async function signManifest(m: HypothesisManifest): Promise<SignedManifest> {
-  const canonical = canonicalize(m)
-  const bytes = new TextEncoder().encode(JSON.stringify(canonical))
-  const digest = await globalThis.crypto.subtle.digest('SHA-256', bytes)
-  const hash = Array.from(new Uint8Array(digest))
-    .map((b) => b.toString(16).padStart(2, '0'))
-    .join('')
+  const hash = await hashJson(m)
   return { ...m, contentHash: hash, algo: 'sha256-content' }
 }

@@ -133,12 +177,3 @@ export async function evaluateHypothesis(
     rejectionReasons: reasons,
   }
 }
-
-function canonicalize(v: unknown): unknown {
-  if (v === null || typeof v !== 'object') return v
-  if (Array.isArray(v)) return v.map(canonicalize)
-  const keys = Object.keys(v as Record<string, unknown>).sort()
-  const out: Record<string, unknown> = {}
-  for (const k of keys) out[k] = canonicalize((v as Record<string, unknown>)[k])
-  return out
-}
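
Since `canonicalize` is now public and meant for arbitrary content, two behaviors that follow from the code as written may be worth checking before hashing payloads that are not plain JSON. This is an observation about the implementation in this diff, not something the commit message claims or tests. The sketch copies `canonicalize` so it runs standalone; the sample values and the ISO-string workaround are assumptions for illustration.

```typescript
// Copied from this diff so the sketch is self-contained.
function canonicalize(v: unknown): unknown {
  if (v === null || typeof v !== 'object') return v
  if (Array.isArray(v)) return v.map(canonicalize)
  const keys = Object.keys(v as Record<string, unknown>).sort()
  const out: Record<string, unknown> = {}
  for (const k of keys) out[k] = canonicalize((v as Record<string, unknown>)[k])
  return out
}

// 1. Non-plain objects: a Date has no own enumerable keys, so the object
//    branch rebuilds it as {} and the timestamp is lost.
const dateA = JSON.stringify(canonicalize({ at: new Date(0) }))
const dateB = JSON.stringify(canonicalize({ at: new Date(86400000) }))
// dateA and dateB are both '{"at":{}}', so they would hash identically.

// 2. undefined-valued keys: canonicalize keeps the key, but JSON.stringify
//    drops it, so the result matches an object without that key.
const withUndefined = JSON.stringify(canonicalize({ a: 1, b: undefined }))
const withoutKey = JSON.stringify(canonicalize({ a: 1 }))

// Possible workaround (an assumption, not part of the commit): convert to
// JSON-native values before canonicalizing.
const normalized = JSON.stringify(canonicalize({ at: new Date(0).toISOString() }))

console.log({ dateA, dateB, withUndefined, withoutKey, normalized })
```

For payloads that are already JSON-native (strings, numbers, booleans, null, arrays, plain objects), neither case applies.
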

tests/tier2.test.ts

Lines changed: 35 additions & 0 deletions
@@ -11,7 +11,9 @@ import {
 } from '../src/counterfactual'
 import { crossTraceDiff } from '../src/cross-trace-diff'
 import {
+  canonicalize,
   evaluateHypothesis,
+  hashJson,
   signManifest,
   verifyManifest,
   type HypothesisManifest,
@@ -172,4 +174,37 @@ describe('pre-registration', () => {
       evaluateHypothesis(tampered, { n: 30, effect: 0.003, pValue: 0.01 }),
     ).rejects.toThrow(/tampered|hash/i)
   })
+
+  it('canonicalize sorts keys recursively and produces stable encoding', () => {
+    const a = canonicalize({ b: 2, a: { y: [3, 2, 1], x: 'k' } })
+    const b = canonicalize({ a: { x: 'k', y: [3, 2, 1] }, b: 2 })
+    expect(JSON.stringify(a)).toBe(JSON.stringify(b))
+    // Arrays preserve order (canonicalization sorts object keys, not array elements).
+    expect(JSON.stringify(canonicalize([3, 1, 2]))).toBe('[3,1,2]')
+    // Primitives pass through.
+    expect(canonicalize(42)).toBe(42)
+    expect(canonicalize('s')).toBe('s')
+    expect(canonicalize(null)).toBe(null)
+  })
+
+  it('hashJson is stable across key insertion order — the property signManifest depends on', async () => {
+    const ordered = await hashJson({ b: 2, a: 1 })
+    const reordered = await hashJson({ a: 1, b: 2 })
+    expect(ordered).toBe(reordered)
+    expect(ordered).toMatch(/^[0-9a-f]{64}$/)
+  })
+
+  it('hashJson matches signManifest contentHash for the same payload — generic primitive composes with the manifest signer', async () => {
+    const signed = await signManifest(base)
+    const direct = await hashJson(base)
+    expect(signed.contentHash).toBe(direct)
+  })
+
+  it('hashJson and prompt-registry hashContent are independent functions — different return shape', async () => {
+    // Regression: don't accidentally collapse the two. hashContent (prompt-registry)
+    // returns a 12-char id over a string. hashJson (here) returns 64 hex chars over
+    // canonicalized JSON.
+    const long = await hashJson('x')
+    expect(long).toMatch(/^[0-9a-f]{64}$/)
+  })
 })
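
For consumers outside the manifest path, a rough sketch of how the exported primitive might be used to attest arbitrary content follows. `SignedContent`, `signContent`, and `verifyContent` are hypothetical names that do not exist in this commit, and the envelope wraps the payload rather than spreading it the way `signManifest` does. `canonicalize` and `hashJson` are copied from the diff so the snippet runs on its own; in real use they would be imported from the package.

```typescript
// Copied from src/pre-registration.ts in this commit so the sketch is self-contained.
function canonicalize(v: unknown): unknown {
  if (v === null || typeof v !== 'object') return v
  if (Array.isArray(v)) return v.map(canonicalize)
  const keys = Object.keys(v as Record<string, unknown>).sort()
  const out: Record<string, unknown> = {}
  for (const k of keys) out[k] = canonicalize((v as Record<string, unknown>)[k])
  return out
}

async function hashJson<T>(obj: T): Promise<string> {
  const bytes = new TextEncoder().encode(JSON.stringify(canonicalize(obj)))
  const digest = await globalThis.crypto.subtle.digest('SHA-256', bytes)
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('')
}

// Hypothetical envelope: payload plus the hash computed over it.
interface SignedContent<T> {
  content: T
  contentHash: string
  algo: 'sha256-content'
}

async function signContent<T>(content: T): Promise<SignedContent<T>> {
  return { content, contentHash: await hashJson(content), algo: 'sha256-content' }
}

// Recompute the hash from the payload and compare it with the stored one.
async function verifyContent<T>(signed: SignedContent<T>): Promise<boolean> {
  return signed.algo === 'sha256-content' && (await hashJson(signed.content)) === signed.contentHash
}

async function demo(): Promise<{ ok: boolean; tampered: boolean }> {
  const signed = await signContent({ dataset: 'eval-v3', rows: 1200 })
  const ok = await verifyContent(signed)
  // Changing the payload without re-signing should fail verification.
  const tampered = await verifyContent({ ...signed, content: { ...signed.content, rows: 1201 } })
  return { ok, tampered }
}

demo().then((result) => console.log(result))
```

A content hash like this only detects accidental or unsigned modification; binding it to an identity still needs a signature over the hash, as the Ed25519 wrapper mentioned in the commit message does.
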
