Skip to content

repr: Harden proto/Row decoding against malformed input#36984

Merged
def- merged 14 commits into
MaterializeInc:mainfrom
def-:fuzz-repr
Jun 23, 2026
Merged

repr: Harden proto/Row decoding against malformed input#36984
def- merged 14 commits into
MaterializeInc:mainfrom
def-:fuzz-repr

Conversation

@def-

@def- def- commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Hardens mz-repr proto/Row decoding against malformed/untrusted bytes — these paths are reachable from persisted state and the wire, so a panic is an availability bug. Replaces panics/asserts with proper decode errors and validates ranges (Date, CheckedTimestamp, ProtoNumeric, ProtoRange, ProtoRelationDesc, ProtoRow dict ordering, uuid parsing, leap-second truncation, Avro decimal).

Found by the cargo-fuzz suite (separate infra PR). Each fix has a regression test.

🤖 Generated with Claude Code

@def- def- requested a review from antiguru June 12, 2026 10:41
@def- def- marked this pull request as ready for review June 12, 2026 10:41
@def- def- requested review from a team as code owners June 12, 2026 10:41
@def-

def- commented Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

Ready for review

@antiguru antiguru left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank you!

Comment thread src/repr/src/row/encode.rs Outdated
Comment thread src/repr/src/row/encode.rs Outdated
def- and others added 13 commits June 23, 2026 08:27
`Decimal::from_packed_bcd` calls the C function `decPackedToNumber`,
which segfaults on empty bcd input. Reachable from untrusted proto
bytes (anyone able to send a `ProtoRow` could crash the process via a
`ProtoNumeric { bcd: [] }` datum). Reject the empty case before
descending into the FFI.
`push_range_with` returned a `Result`, but the proto decode path
unconditionally `.expect(...)`ed it — meaning any `ProtoRange` that
`push_range_with` rejects (e.g. lower > upper from an attacker-crafted
proto) panicked the process. Propagate the error to the caller via
`?`, matching the rest of the proto decode path.
`<CheckedTimestamp<_> as RustType<ProtoNaiveDateTime>>::from_proto`
constructed the struct directly (`Self { t: ... }`), bypassing the
range validation that `from_timestamplike` enforces. Out-of-range
values pushed into a `Row` cleanly, but `read_datum` then called
`from_timestamplike(...).expect(...)` while iterating and panicked —
reachable from untrusted proto bytes.

Go through `from_timestamplike` in `from_proto` so the value is
rejected at decode time.
Same shape as the `CheckedTimestamp` fix: `Date::from_proto`
constructed `Date { days: proto.days }` directly, bypassing the range
validation that `from_pg_epoch` enforces. Out-of-range days pushed
into a `Row` cleanly, then `read_datum` panicked on
`Date::from_pg_epoch(days).expect(...)` while iterating.

Go through `from_pg_epoch` in `from_proto` so the value is rejected
at decode time.
`SqlScalarType::{List,Record,Map}` from `ProtoScalarType` called
`x.custom_id.map(|id| id.into_rust().unwrap())` — a malformed
`ProtoCatalogItemId` inside any of those three variants panicked the
process. Reachable from untrusted proto bytes (an attacker-crafted
`ProtoRow` containing a list/record/map value with a bad custom_id).
Propagate via `transpose()?` instead.
The non-migration branch zipped `proto.names` and `proto.metadata`
via `zip_eq`, which panics on length mismatch — reachable from
untrusted proto bytes. Check the lengths explicitly and return
`InvalidFieldError`.
`RowPacker::push_range_with` documents a panic if a finite range
bound expression pushes `Datum::Null`. The proto-decoding path
called it with a closure that just forwarded the proto datum,
which would happily push a `Null` from a malicious or corrupt
`ProtoRangeInner.lower/upper` and crash the process. Detect the
`Null` ProtoDatum up front and return a decode error instead.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The `uuid` crate panics while constructing a parse error for some short,
brace-wrapped inputs (e.g. `{}`, `{\0}`) — it mis-slices the input
("byte range starts at 1 but ends at 0"). `strconv::parse_uuid` is reached
from `Value::decode_text` for a client-supplied UUID bind parameter, so
this is a client-triggerable crash. Reject anything that can't be a UUID
(shorter than 32 chars, or non-ASCII) before handing it to the crate, and
route the two pgrepr UUID text-decode call sites through `parse_uuid`
(they previously called `Uuid::parse_str` directly, bypassing the guard).

Found by the value_decode_text cargo-fuzz target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`RowPacker::push_range_with` asserted (`assert!`, so a panic even in
release) that a range's bounds are non-null, of consistent datum kind, and
the expected count. These bounds are reconstructed from an untrusted
`ProtoRow`, so a crafted/corrupted range (e.g. an Array lower and a Map
upper) panicked the decoder. Return a new `InvalidRangeError::InvalidRangeData`
(with a matching proto variant) instead; valid callers never hit it.

Found by the row_proto_roundtrip cargo-fuzz target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`DatumDictIter` debug-asserts that map keys are unique and strictly ascending,
but the `ProtoRow` -> `Row` decode packed a dict's elements verbatim without
checking. A crafted proto with duplicate or out-of-order keys therefore decoded
to a malformed map that tripped the assert (a panic in debug builds) the next
time it was iterated. Validate the key order while decoding and reject a
violation as a decode error. Regression for the row_proto_roundtrip cargo-fuzz
finding.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`CheckedTimestamp::round_to_precision` — applied by the string->timestamp and
string->timestamptz casts before storage — truncates to microsecond (or, for
`date_trunc`-style precision, millisecond) resolution via
`truncate_microseconds`/`truncate_milliseconds`. Both rebuilt the time with
`NaiveTime::from_hms_{micro,milli}_opt`, whose leap-second sub-second range
(>= 1s) is only accepted when `sec == 59`.

A parsed `:60` literal is stored as chrono's leap representation, and time-zone
math can leave that 1s sub-second on a second other than `:59` (e.g.
`12:30:56` with a 1,000,000,000ns sub-second). The constructor then returned
`None` and the `.unwrap()` panicked, so e.g. `SELECT TIMESTAMPTZ '...:60...'`
aborted the process.

Truncate with `with_nanosecond` instead: it accepts the whole [0, 2s) range at
any second, so it preserves the value (leap included) and never fails here.
Found by the strconv timestamp round-trip fuzz target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
twos_complement_be_to_numeric_inner read input[0] without first checking
that the slice is non-empty, panicking with "index out of bounds: the len
is 0 but the index is 0" on an empty byte string.

That input is reachable from the wire: an Avro `decimal` logical type whose
unscaled value is encoded as zero-length `bytes` flows straight from a Kafka
message body through AvroFlatDecoder into this function, so a single crafted
(or buggy-producer) message crashes Avro source decoding — an availability
bug for ingestion. Our own encoder never emits an empty byte string (zero is
the single byte 0x00), and Java's Avro decoder likewise rejects empty input
(BigInteger throws "Zero length BigInteger"), so the right behaviour is a
clean decode error, not a silent zero. The caller already maps the Err to a
DecodeError, so the malformed row now surfaces as a decode error instead of
aborting the worker.

Found by the new avro_decode_fuzzed_schema fuzz target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Moritz Hoffmann <antiguru@gmail.com>
@def- def- enabled auto-merge (squash) June 23, 2026 08:32
@def- def- merged commit 2bd0f58 into MaterializeInc:main Jun 23, 2026
120 checks passed
@def- def- deleted the fuzz-repr branch June 23, 2026 09:36
def- added a commit to def-/materialize that referenced this pull request Jun 24, 2026
…panic

`strconv::parse_uuid` (MaterializeInc#36984) rejected inputs shorter than 32 chars or
non-ASCII before handing them to the `uuid` crate, on the premise that
`Uuid::parse_str` panics (rather than erroring) while building its parse error
for some malformed inputs by slicing on a non-char boundary. That premise is
false for the version we lock (uuid 1.19.0) and for every release back to
0.8.2: `Uuid::parse_str` returns a clean `Err` for all such inputs (`{}`,
`{\0}`, non-ASCII, etc.). `InvalidUuid::into_err` rejects non-UTF8 up front,
strips the `{}`/`urn:uuid:` wrappers on raw bytes (always standalone ASCII, so
char-safe), and its one subtraction can't underflow.

So the guard only suppressed the `uuid` crate's own (more detailed) error
message for short/non-ASCII input, and the `value_decode_text` regression test
asserted an invariant the crate already upholds. No saved cargo-fuzz artifact
reaches a uuid panic. Restore the original parser and drop the vacuous test.
Mirrors removing the equivalent persist `parse_id`/`ShardId::from_str` guard.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
frankmcsherry added a commit that referenced this pull request Jun 24, 2026
Rebased the int-unification spike onto current main. The base spike commit
cherry-picked cleanly; this commit re-applies the mechanical follow-on work
against the new base and migrates the references that landed in main after the
original branch point (#37155, #37177, #36776 count-aware-aggregates, #36984):

- production: signed-sum fold + SumInt16/32 narrow (relation/func.rs), the
  generate_series-unoptimized literal (sql/func.rs), and the generate_series
  collapse (projection_pushdown.rs: ReprScalarType::Int64 -> Int, and the
  literal_i64 closure's int2/4/8 arms collapse to one Datum::Int arm).
- test/bench/doctest: ~292 Datum::IntN -> Datum::Int/UInt sites, plus the
  PropDatum::Int32 narrowings and an i64-typed literal in casts.rs.

Workspace builds clean under --all-targets.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
frankmcsherry added a commit that referenced this pull request Jun 24, 2026
Rebased the int-unification spike onto current main. The base spike commit
cherry-picked cleanly; this commit re-applies the mechanical follow-on work
against the new base and migrates the references that landed in main after the
original branch point (#37155, #37177, #36776 count-aware-aggregates, #36984):

- production: signed-sum fold + SumInt16/32 narrow (relation/func.rs), the
  generate_series-unoptimized literal (sql/func.rs), and the generate_series
  collapse (projection_pushdown.rs: ReprScalarType::Int64 -> Int, and the
  literal_i64 closure's int2/4/8 arms collapse to one Datum::Int arm).
- test/bench/doctest: ~292 Datum::IntN -> Datum::Int/UInt sites, plus the
  PropDatum::Int32 narrowings and an i64-typed literal in casts.rs.

Workspace builds clean under --all-targets.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
def- added a commit to def-/materialize that referenced this pull request Jun 25, 2026
…panic

`strconv::parse_uuid` (MaterializeInc#36984) rejected inputs shorter than 32 chars or
non-ASCII before handing them to the `uuid` crate, on the premise that
`Uuid::parse_str` panics (rather than erroring) while building its parse error
for some malformed inputs by slicing on a non-char boundary. That premise is
false for the version we lock (uuid 1.19.0) and for every release back to
0.8.2: `Uuid::parse_str` returns a clean `Err` for all such inputs (`{}`,
`{\0}`, non-ASCII, etc.). `InvalidUuid::into_err` rejects non-UTF8 up front,
strips the `{}`/`urn:uuid:` wrappers on raw bytes (always standalone ASCII, so
char-safe), and its one subtraction can't underflow.

So the guard only suppressed the `uuid` crate's own (more detailed) error
message for short/non-ASCII input, and the `value_decode_text` regression test
asserted an invariant the crate already upholds. No saved cargo-fuzz artifact
reaches a uuid panic. Restore the original parser and drop the vacuous test.
Mirrors removing the equivalent persist `parse_id`/`ShardId::from_str` guard.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants