Skip to content

Commit 56fe09d

Browse files
shumkovclaude
andauthored
fix(hashes): make SerdeHash tolerant of ContentDeserializer's HR-quirk (#729)
* fix(hashes): make SerdeHash tolerant of ContentDeserializer's HR-quirk This is the same kind of serde-tag incompatibility fixed for `OutPoint` in #708, applied to the hash-newtype family (sha256, sha256d, hash160, hash_x11, ripemd160, sha1, sha512 — affecting Txid, BlockHash, ProTxHash, PubkeyHash, QuorumHash, and every other type generated by `hash_newtype!` or `serde_impl!`). `SerdeHash::deserialize` used two separate visitors — a string-only HR visitor (`HexVisitor`) and a bytes-only non-HR visitor (`BytesVisitor`). That works fine in isolation but breaks the moment a hash-bearing struct is wrapped by an internally-tagged enum (`#[serde(tag = "...")]`), `flatten`, or an untagged enum. Serde routes those through `ContentDeserializer`, a format-agnostic intermediate buffer that always reports `is_human_readable() == true` regardless of the upstream format. A value originally written by a non-HR encoder is therefore replayed into the HR branch as raw bytes, which the previous `HexVisitor::visit_str` saw as "32 chars" instead of "64-char hex" and rejected with `bad hex string length 32 (expected 64)`. This was hit downstream in dashpay/platform when validators / validator sets (which contain `ProTxHash`, `PubkeyHash`, `QuorumHash`) were configured for the dpp `tag = "$formatVersion"` versioning convention. ## Fix Rework `SerdeHash::deserialize` to use a single `AnyShapeVisitor` that accepts every shape a hash can arrive in: - `visit_str` / `visit_borrowed_str` — ASCII hex (canonical HR form). - `visit_bytes` / `visit_borrowed_bytes` — disambiguated by length: exactly `N` bytes → raw hash, exactly `2*N` bytes → UTF-8 hex. Any other length is rejected. - `visit_seq` — length-prefixed `u8` sequence (used by bincode and other non-self-describing formats). Use `deserialize_any` in the HR branch so the actual content shape — not the reported HR flag — drives dispatch. Keep `deserialize_bytes` in the non-HR branch since bincode is non-self-describing and does not support `deserialize_any`. ## Trade-off Raw JSON now also accepts the byte-form (`"\x11..."` UTF-8 bytes vs. `"11..."` hex string) because `deserialize_any` in serde_json's self-describing mode dispatches based on the JSON token. We disambiguate strictly by length in `visit_bytes`, so anything that's neither N bytes nor 2*N bytes still errors. This is consistent with the OutPoint fix in #708 — accept any shape, validate by length. ## Implementation note: no_std / no alloc `dashcore_hashes` does not enable `serde/alloc` (it has only `serde-std` which transitively gates that), so `Visitor::visit_byte_buf` and `visit_string` (defined behind serde's `alloc` feature) are unavailable. The `visit_seq` path uses a stack array sized to fit the largest hash (64 bytes — sha512) instead of a `Vec`, keeping the crate's no-alloc posture. ## Tests Two new regression tests in `dash/src/hash_types.rs`: - `serde_round_trip_through_internally_tagged_enum` — wraps a `Txid` in a `#[serde(tag = "type")]` enum, round-trips through `serde_json::Value` (which forces buffering through `ContentDeserializer`), and asserts the round-trip is identity. Also verifies the canonical hex-string form still deserializes and that bincode round-trip still succeeds via the byte/seq path. - `serde_round_trip_through_internally_tagged_enum_pubkey_hash` — same shape with `PubkeyHash` (20-byte hash) to exercise the smaller-length disambiguation path. `bincode` dev-dep updated to `features = ["serde"]` (same change as #708) so the bincode regression assertion compiles. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(test): rustfmt + correct comment about N being in bytes * review: address PR feedback — real visit_seq test + debug_assert overflow Two in-scope fixes from review: 1. The Txid round-trip test had an abandoned `raw_txid_bytes` literal followed by `let _ = raw_txid_bytes; // documentation only` — leftover exploration that misled readers into thinking the bytes were used. Replace with a real assertion that constructs a `serde_json::Value::Array` of u8 numbers, wraps it in a `#[serde(tag = "type")]` enum, and round-trips through `serde_json::from_value`. This now actually exercises the new `visit_seq` path through `ContentDeserializer` — the security review noted that the prior test only hit `visit_str`, leaving `visit_bytes`/`visit_seq` regression coverage thin. 2. The `MAX_HASH_BYTES = 64` overflow check in `visit_seq` was returning a runtime error with a debug-prose string ("recompile with larger MAX") that leaked an internal type name to user error logs. Convert to `debug_assert!` — failure mode is now a test panic in debug builds (caught at CI time when adding a wider hash type), zero overhead in release. The condition is unreachable in any release build that compiled at all, since adding a wider digest would require updating `serde_impl!` invocations. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 976fbe1 commit 56fe09d

3 files changed

Lines changed: 217 additions & 32 deletions

File tree

dash/Cargo.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -67,7 +67,7 @@ serde_json = "1.0.140"
6767
serde_test = "1.0.177"
6868
serde_derive = "1.0.219"
6969
secp256k1 = { features = [ "recovery", "rand", "hashes" ], version="0.30.0" }
70-
bincode = { version = "2.0.1" }
70+
bincode = { version = "2.0.1", features = ["serde"] }
7171
assert_matches = "1.5.0"
7272
dashcore = { path = ".", features = ["core-block-hash-use-x11", "message_verification", "quorum_validation", "signer"] }
7373
criterion = "0.5"

dash/src/hash_types.rs

Lines changed: 114 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -389,3 +389,117 @@ mod newtypes {
389389
}
390390
}
391391
}
392+
393+
#[cfg(all(test, feature = "serde"))]
394+
mod tests {
395+
use super::*;
396+
use serde_derive::{Deserialize, Serialize};
397+
398+
/// Regression test for the bug where hash newtypes' `Deserialize` errored
399+
/// with "bad hex string length 32 (expected 64)" (or similar) when an
400+
/// hash-bearing struct was wrapped by an internally-tagged enum and
401+
/// round-tripped through serde's intermediate `ContentDeserializer`.
402+
/// `ContentDeserializer` always reports `is_human_readable() == true`,
403+
/// so a value originally produced by a non-human-readable encoder ends up
404+
/// replayed into the HR branch as raw bytes — which the previous
405+
/// string-only `HexVisitor` rejected because `from_str` saw 32 UTF-8 chars
406+
/// instead of the expected 64-char hex form.
407+
#[test]
408+
fn serde_round_trip_through_internally_tagged_enum() {
409+
#[derive(Debug, PartialEq, Serialize, Deserialize)]
410+
struct WithTxid {
411+
txid: Txid,
412+
}
413+
414+
#[derive(Debug, PartialEq, Serialize, Deserialize)]
415+
#[serde(tag = "type")]
416+
enum Tagged {
417+
A(WithTxid),
418+
}
419+
420+
let original = Tagged::A(WithTxid {
421+
txid: "5df6e0e2761359d30a8275058e299fcc0381534545f55cf43e41983f5d4c9456"
422+
.parse()
423+
.unwrap(),
424+
});
425+
426+
// Round-trip through serde_json::Value forces serde to buffer the
427+
// value into a Content tree, then replay it through
428+
// ContentDeserializer when resolving the internally-tagged enum
429+
// variant. The canonical HR form serializes a hash as a hex string,
430+
// so this exercises the visit_str path through ContentDeserializer.
431+
let value = serde_json::to_value(&original).unwrap();
432+
let restored: Tagged = serde_json::from_value(value).unwrap();
433+
assert_eq!(original, restored);
434+
435+
// Hand-build the array-of-numbers form of the Txid inside the tagged
436+
// enum and deserialize from it. This routes through `ContentDeserializer`
437+
// (because of the internally-tagged enum) and exercises the
438+
// `visit_seq` path on the unified visitor — the exact shape produced
439+
// downstream when a non-human-readable encoder hands hash bytes to a
440+
// tagged-enum-bearing context. Before the fix the visitor only had
441+
// string/bytes-disjoint visitors and rejected this shape.
442+
let raw_txid_bytes: [u8; 32] = [
443+
0x56, 0x94, 0x4c, 0x5d, 0x3f, 0x98, 0x41, 0x3e, 0xf4, 0x5c, 0xf5, 0x45, 0x45, 0x53,
444+
0x81, 0x03, 0xcc, 0x9f, 0x29, 0x8e, 0x05, 0x75, 0x82, 0x0a, 0xd3, 0x59, 0x13, 0x76,
445+
0xe2, 0xe0, 0xf6, 0x5d,
446+
];
447+
let arr_value = serde_json::Value::Array(
448+
raw_txid_bytes.iter().map(|b| serde_json::Value::Number((*b).into())).collect(),
449+
);
450+
let map_form = serde_json::json!({
451+
"type": "A",
452+
"txid": arr_value,
453+
});
454+
let from_arr: Tagged = serde_json::from_value(map_form).unwrap();
455+
assert_eq!(original, from_arr);
456+
457+
// The canonical HR string form must still deserialize, so existing
458+
// JSON producers do not break.
459+
let from_string: Txid = serde_json::from_str(
460+
"\"5df6e0e2761359d30a8275058e299fcc0381534545f55cf43e41983f5d4c9456\"",
461+
)
462+
.unwrap();
463+
assert_eq!(
464+
from_string,
465+
"5df6e0e2761359d30a8275058e299fcc0381534545f55cf43e41983f5d4c9456"
466+
.parse::<Txid>()
467+
.unwrap(),
468+
);
469+
470+
// Plain bincode (non-human-readable) round-trip of a `Txid` must
471+
// still succeed via the byte-shape branch — guards against breaking
472+
// the `visit_seq` path used by length-prefixed sequence formats.
473+
let raw: Txid =
474+
"5df6e0e2761359d30a8275058e299fcc0381534545f55cf43e41983f5d4c9456".parse().unwrap();
475+
let cfg = bincode::config::standard();
476+
let bytes = bincode::serde::encode_to_vec(raw, cfg).unwrap();
477+
let (decoded, _): (Txid, _) = bincode::serde::decode_from_slice(&bytes, cfg).unwrap();
478+
assert_eq!(raw, decoded);
479+
}
480+
481+
/// 20-byte hash (PubkeyHash) goes through the same path. Smaller hash
482+
/// length exercises a different `raw_len_bytes` / `hex_len_bytes`
483+
/// disambiguation in the visitor.
484+
#[test]
485+
fn serde_round_trip_through_internally_tagged_enum_pubkey_hash() {
486+
#[derive(Debug, PartialEq, Serialize, Deserialize)]
487+
struct WithPubkeyHash {
488+
pkh: PubkeyHash,
489+
}
490+
491+
#[derive(Debug, PartialEq, Serialize, Deserialize)]
492+
#[serde(tag = "type")]
493+
enum Tagged {
494+
A(WithPubkeyHash),
495+
}
496+
497+
let original = Tagged::A(WithPubkeyHash {
498+
pkh: PubkeyHash::from_hex("e8b43025641eea4fd21190f01bd870ef90f1a8b1").unwrap(),
499+
});
500+
501+
let value = serde_json::to_value(&original).unwrap();
502+
let restored: Tagged = serde_json::from_value(value).unwrap();
503+
assert_eq!(original, restored);
504+
}
505+
}

hashes/src/serde_macros.rs

Lines changed: 102 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -23,60 +23,120 @@ pub mod serde_details {
2323
use core::{fmt, ops, str};
2424

2525
use crate::Error;
26-
struct HexVisitor<ValueT>(PhantomData<ValueT>);
2726
use serde::{de, Deserializer, Serializer};
2827

29-
impl<'de, ValueT> de::Visitor<'de> for HexVisitor<ValueT>
28+
/// Single visitor that accepts every shape a hash can arrive in: an ASCII
29+
/// hex string (`visit_str`), a UTF-8 byte slice that decodes as hex
30+
/// (`visit_bytes` of length `2*N` — note `N` is in BYTES per the macro,
31+
/// see `serde_impl!` invocation in `internal_macros.rs`), a raw byte
32+
/// slice of the hash's length-in-bytes (`visit_bytes` of length `N`),
33+
/// or a length-prefixed sequence of `u8` from non-self-describing
34+
/// formats (`visit_seq`, used by bincode).
35+
///
36+
/// Required to interoperate with serde's `ContentDeserializer`, the
37+
/// format-agnostic intermediate buffer serde uses to dispatch
38+
/// internally-tagged enums (`#[serde(tag = "...")]`), `flatten`, and
39+
/// untagged enums. `ContentDeserializer` always reports
40+
/// `is_human_readable() == true` regardless of the upstream format. This
41+
/// is intentional in serde's source — see long-standing upstream issues;
42+
/// the maintainers consider it working-as-intended and the recommended
43+
/// pattern is **"don't branch on `is_human_readable()` for shape dispatch
44+
/// — accept any shape."** A value originally written by a
45+
/// non-human-readable encoder (raw bytes) can therefore be replayed into
46+
/// the human-readable branch as bytes / a byte-buf and must be accepted
47+
/// there. See the regression tests in
48+
/// `dash/src/hash_types.rs::serde_round_trip_through_internally_tagged_enum`.
49+
struct AnyShapeVisitor<ValueT>(PhantomData<ValueT>);
50+
51+
impl<'de, ValueT> de::Visitor<'de> for AnyShapeVisitor<ValueT>
3052
where
31-
ValueT: FromStr,
53+
ValueT: SerdeHash,
3254
<ValueT as FromStr>::Err: fmt::Display,
3355
{
3456
type Value = ValueT;
3557

3658
fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
37-
formatter.write_str("an ASCII hex string")
59+
formatter.write_str("an ASCII hex string or a byte string of the hash length")
3860
}
3961

40-
fn visit_bytes<E>(self, v: &[u8]) -> Result<Self::Value, E>
62+
fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
4163
where
4264
E: de::Error,
4365
{
44-
if let Ok(hex) = str::from_utf8(v) {
45-
Self::Value::from_str(hex).map_err(E::custom)
46-
} else {
47-
Err(E::invalid_value(de::Unexpected::Bytes(v), &self))
48-
}
66+
Self::Value::from_str(v).map_err(E::custom)
4967
}
5068

51-
fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
69+
fn visit_borrowed_str<E>(self, v: &'de str) -> Result<Self::Value, E>
5270
where
5371
E: de::Error,
5472
{
5573
Self::Value::from_str(v).map_err(E::custom)
5674
}
57-
}
58-
59-
struct BytesVisitor<ValueT>(PhantomData<ValueT>);
60-
61-
impl<'de, ValueT> de::Visitor<'de> for BytesVisitor<ValueT>
62-
where
63-
ValueT: SerdeHash,
64-
<ValueT as FromStr>::Err: fmt::Display,
65-
{
66-
type Value = ValueT;
6775

68-
fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
69-
formatter.write_str("a bytestring")
76+
fn visit_bytes<E>(self, v: &[u8]) -> Result<Self::Value, E>
77+
where
78+
E: de::Error,
79+
{
80+
// Disambiguate by length. A correctly-sized raw hash byte string
81+
// is exactly `N` bytes (`N` is in bytes per the macro — see
82+
// `serde_impl!` invocation in `internal_macros.rs`); a hex-encoded
83+
// form of that hash is `2*N` ASCII bytes. Any other length is
84+
// rejected.
85+
let raw_len_bytes = ValueT::N;
86+
let hex_len_bytes = raw_len_bytes * 2;
87+
if v.len() == raw_len_bytes {
88+
SerdeHash::from_slice_delegated(v)
89+
.map_err(|_| E::invalid_length(v.len(), &stringify!(N)))
90+
} else if v.len() == hex_len_bytes {
91+
if let Ok(hex) = str::from_utf8(v) {
92+
Self::Value::from_str(hex).map_err(E::custom)
93+
} else {
94+
Err(E::invalid_value(de::Unexpected::Bytes(v), &self))
95+
}
96+
} else {
97+
Err(E::invalid_length(v.len(), &self))
98+
}
7099
}
71100

72-
fn visit_bytes<E>(self, v: &[u8]) -> Result<Self::Value, E>
101+
fn visit_borrowed_bytes<E>(self, v: &'de [u8]) -> Result<Self::Value, E>
73102
where
74103
E: de::Error,
75104
{
76-
SerdeHash::from_slice_delegated(v).map_err(|_| {
77-
// from_slice only errors on incorrect length
78-
E::invalid_length(v.len(), &stringify!(N))
79-
})
105+
self.visit_bytes(v)
106+
}
107+
108+
fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
109+
where
110+
A: de::SeqAccess<'de>,
111+
{
112+
// Used by bincode and any non-self-describing format that emits a
113+
// length-prefixed sequence of u8. Use a stack buffer sized to fit
114+
// the largest hash this trait services (sha512 / Hmac<sha512> at
115+
// 64 bytes) so we keep `no_std`-compatible (no `Vec`/`alloc`).
116+
// Bumping `MAX_HASH_BYTES` is only needed if a wider digest type
117+
// is added — `debug_assert!` catches that in tests.
118+
const MAX_HASH_BYTES: usize = 64;
119+
let raw_len_bytes = ValueT::N;
120+
debug_assert!(
121+
raw_len_bytes <= MAX_HASH_BYTES,
122+
"hash byte-length {} exceeds AnyShapeVisitor stack buffer ({}); bump MAX_HASH_BYTES",
123+
raw_len_bytes,
124+
MAX_HASH_BYTES,
125+
);
126+
let mut buf = [0u8; MAX_HASH_BYTES];
127+
let mut len: usize = 0;
128+
while let Some(b) = seq.next_element::<u8>()? {
129+
if len == raw_len_bytes {
130+
return Err(de::Error::invalid_length(len + 1, &stringify!(N)));
131+
}
132+
buf[len] = b;
133+
len += 1;
134+
}
135+
if len != raw_len_bytes {
136+
return Err(de::Error::invalid_length(len, &stringify!(N)));
137+
}
138+
SerdeHash::from_slice_delegated(&buf[..len])
139+
.map_err(|_| de::Error::invalid_length(len, &stringify!(N)))
80140
}
81141
}
82142

@@ -90,7 +150,7 @@ pub mod serde_details {
90150
+ ops::Index<ops::RangeFull, Output = [u8]>,
91151
<Self as FromStr>::Err: fmt::Display,
92152
{
93-
/// Size, in bits, of the hash.
153+
/// Size of the hash, in bytes.
94154
const N: usize;
95155

96156
/// Helper function to turn a deserialized slice into the correct hash type.
@@ -106,11 +166,22 @@ pub mod serde_details {
106166
}
107167

108168
/// Do serde deserialization.
169+
///
170+
/// Uses a single visitor that accepts every shape a hash can arrive
171+
/// in (ASCII hex string, raw byte slice, length-prefixed `u8`
172+
/// sequence). The HR branch dispatches via `deserialize_any` to
173+
/// handle both true human-readable deserializers (where the visitor
174+
/// receives `visit_str`) and serde's `ContentDeserializer` (which
175+
/// reports `is_human_readable() == true` even when wrapping bytes
176+
/// from a non-HR source — internally-tagged enums, `flatten`, and
177+
/// untagged enums all route through it). The non-HR branch keeps
178+
/// `deserialize_bytes` because bincode is non-self-describing and
179+
/// does not support `deserialize_any`.
109180
fn deserialize<'de, D: Deserializer<'de>>(d: D) -> Result<Self, D::Error> {
110181
if d.is_human_readable() {
111-
d.deserialize_str(HexVisitor::<Self>(PhantomData))
182+
d.deserialize_any(AnyShapeVisitor::<Self>(PhantomData))
112183
} else {
113-
d.deserialize_bytes(BytesVisitor::<Self>(PhantomData))
184+
d.deserialize_bytes(AnyShapeVisitor::<Self>(PhantomData))
114185
}
115186
}
116187
}

0 commit comments

Comments
 (0)