Skip to content

Unsound buffer initialization via Vec::set_len exposing uninitialized u8 values #136

Description

@Manishearth

Note

This finding was identified during an agentic unsafe Rust code review performed by Gemini AI, followed by human review and verification.

The Issue

V2Serializer::serialize reserves capacity in self.buf (Vec<u8>) using self.buf.reserve(max_size) and subsequently sets the length of the vector to max_size inside an unsafe block without initializing the newly allocated elements:

unsafe {
// want to treat the rest of the vec as a slice, and we've already reserved this
// space, so this way we don't have to resize() on a lot of dummy bytes.
self.buf.set_len(max_size);
}

While reserve(max_size) ensures sufficient allocation capacity, reserve allocates uninitialized memory from the global allocator without zeroing or writing valid byte values into it. Consequently, when self.buf.set_len(max_size) is executed, the elements from index V2_HEADER_SIZE (40) to max_size are completely uninitialized.

This itself is not UB, however taking a slice to uninitialized memory is:

let counts_len = encode_counts(h, &mut self.buf[V2_HEADER_SIZE..])?;

Finally, encode_counts only writes up to total_len <= max_size, leaving any trailing capacity total_len..max_size permanently uninitialized while included in self.buf.len().

Any caller invoking V2Serializer::serialize (or Serializer::serialize) on a histogram triggers this UB.

Attempt at Minimal Reproduction (Miri)

Unfortunately, miri does not currently transitively check slice validity (I guess it's expensive). The hacky way to repro this is to show that the vector is in an uninitialized state, but you need unsafe code to pull it out.

The UB has already happened in the .serialize() line, before the unsafe code; Miri just doesn't catch it.

use hdrhistogram::serialization::{Serializer, V2Serializer};
use hdrhistogram::Histogram;

fn main() {
    let mut h = Histogram::<u64>::new(3).unwrap();
    h.record(42).unwrap();
    h.record(1000).unwrap();

    let mut buf = Vec::new();
    let mut serializer = V2Serializer::new();
    let total_len = serializer.serialize(&h, &mut buf).unwrap();

    // V2Serializer contains a single field `buf: Vec<u8>`.
    // We can cast a pointer to V2Serializer to a pointer to Vec<u8> to access it.
    let internal_buf: &Vec<u8> =
        unsafe { &*(&serializer as *const V2Serializer as *const Vec<u8>) };

    // Read the uninitialized bytes beyond total_len (which are within internal_buf.len())
    if internal_buf.len() > total_len {
        println!("Reading uninitialized byte: {}", internal_buf[total_len]);
    }
}

Running under Miri produces the following trace:

error: Undefined Behavior: reading memory at alloc6709[0x2d..0x2e], but memory is uninitialized at [0x2d..0x2e], and this operation requires initialized memory
   --> /usr/local/google/home/manishearth/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/num.rs:143:55
    |
143 |                     unsafe { f.pad_integral(true, "", self._fmt(&mut buf)) }
    |                                                       ^^^^ Undefined Behavior occurred here
...
598 |     impl_Display!(i8, u8, i16, u16, i32, u32, i64, u64, isize, usize; as u64 into display_u64);
    |     ------------------------------------------------------------------------------------------ in this macro invocation
    |
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
    = note: stack backtrace:
            0: core::fmt::num::imp::<impl std::fmt::Display for u8>::fmt
                at /usr/local/google/home/manishearth/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/num.rs:143:55: 143:59
            1: core::fmt::rt::Argument::<'_>::fmt
                at /usr/local/google/home/manishearth/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/rt.rs:152:76: 152:95
            2: std::fmt::write
                at /usr/local/google/home/manishearth/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/mod.rs:1685:17: 1687:80

Note

The full audit report below also contains additional minor findings (such as missing safety comments or undocumented FFI assumptions) that are probably worth fixing as well but not the primary goal of this issue. The audit report has not been human-reviewed, it may contain misleading claims.

Full Audit Report

Unsafe Rust Review: hdrhistogram (v7)

Overall Safety Assessment

hdrhistogram (v7) is a Rust port of the high-dynamic-range histogram library originally developed in Java. The crate exhibits a very low density of unsafe Rust code overall, keeping almost all core histogram bucketing, indexing, recording, concurrency (SyncHistogram), and iteration logic within safe Rust boundaries.

The entire unsafe surface area of the crate is confined to a single file, src/serialization/v2_serializer.rs, containing exactly three unsafe blocks. However, auditing this localized unsafe code under strict Rust Reference and standard library proof-obligation semantics reveals significant issues:

  1. One critical soundness vulnerability (Undefined Behavior) involving Vec::set_len on uninitialized heap capacity.
  2. One fishy finding involving an off-by-one assertion intended to guard unchecked slice accesses (get_unchecked).
  3. Complete absence of formal // SAFETY: proof comments across all three unsafe blocks.

Aside from the V2 serialization module, the crate's architecture is sound. Concurrency primitives (SyncHistogram and Recorder) build entirely on safe abstractions (Arc, Mutex, and crossbeam_channel) without manual Send or Sync implementations. Remediating the V2 serializer issues is straightforward and would bring the crate to complete soundness.

Critical Findings

1. Unsound buffer initialization via Vec::set_len exposing uninitialized u8 values 🔴 ⚠️

  • Priority: 🔴 High

  • Threat Vector: ⚠️ Accidental Misuse

  • Bug Type: Uninitialized Memory Exposure

  • Location: src/serialization/v2_serializer.rs:96

  • Description:
    In V2Serializer::serialize, self.buf (Vec<u8>) is cleared to length 0, and self.buf.reserve(max_size) is called to ensure sufficient capacity for encoded histogram data. At line 96, the author calls self.buf.set_len(max_size) inside an unsafe block with the intention of avoiding the cost of zeroing memory during buffer resizing.

  • Proof of Soundness Violation:
    According to the standard library documentation for std::vec::Vec::set_len, the caller must uphold two safety preconditions:

    1. new_len must be less than or equal to capacity().
    2. The elements at old_len..new_len must be initialized.

    While reserve(max_size) satisfies the capacity precondition, reserve allocates uninitialized memory from the global allocator without zeroing or writing valid byte values into it. When self.buf.set_len(max_size) is executed, the elements from index V2_HEADER_SIZE (40) to max_size are completely uninitialized.

    According to the Rust Reference section Behavior considered undefined:

    Producing an invalid value, even in private fields and locals. "Producing" a value happens any time a value is assigned to or read from a place, passed to a function/primitive operation or returned from a function/primitive operation... An uninitialized integer (i*/u*), floating point value (f*), or raw pointer.

    Because u8 does not permit uninitialized byte values (unlike MaybeUninit<u8>), setting the length of a Vec<u8> to include uninitialized elements immediately produces invalid values. Furthermore, passing &mut self.buf[V2_HEADER_SIZE..] to encode_counts creates a reference &mut [u8] pointing to uninitialized bytes, violating the fundamental Rust invariant that references must point to valid data. Finally, encode_counts only writes up to total_len <= max_size, leaving the trailing slice total_len..max_size permanently uninitialized while included in self.buf.len().

  • Remediation:
    Replace self.buf.reserve(max_size) and unsafe { self.buf.set_len(max_size) } with safe buffer resizing: self.buf.resize(max_size, 0). Alternatively, if avoiding zeroing overhead is strictly necessary, self.buf should be converted to Vec<MaybeUninit<u8>> or written to via spare_capacity_mut(), calling set_len only after elements have been written and initialized.

Fishy Findings

1. Off-by-one assertion guarding slice::get_unchecked accesses 🟠 ⚠️

  • Priority: 🟠 Medium

  • Threat Vector: ⚠️ Accidental Misuse

  • Bug Type: Improper Bounds Check

  • Location: src/serialization/v2_serializer.rs:145

  • Description:
    In encode_counts, histogram counts are iterated using an index up to index_limit, which is computed as h.index_for(h.max()). The iteration loop while index <= index_limit accesses elements via unsafe { *(h.counts.get_unchecked(index)) } at lines 149 and 161.

  • Analysis:
    According to slice::get_unchecked safety contract, calling get_unchecked with an out-of-bounds index is instant Undefined Behavior. Since the loop executes when index == index_limit, sound execution requires that index_limit < h.counts.len().

    At line 145, the author wrote an assertion intended to establish this invariant:

    assert!(index_limit <= h.counts.len());

    Note the use of <= instead of <. If index_limit == h.counts.len(), this assertion passes, but subsequent evaluation of h.counts.get_unchecked(index_limit) accesses element h.counts.len(), causing out-of-bounds memory access and Undefined Behavior.

    In practice, for any uncorrupted Histogram, counts is resized during recording to cover highest_trackable_value, and index_for(h.max()) is strictly less than h.counts.len(). The author included a helpful inline note explaining their intent: // index is inside h.counts because of the assert above. This shows good maintainer intent to document why the unchecked access is safe. However, because the assertion at line 145 uses <= instead of <, expanding this check to < and formalizing the inline note into an explicit // SAFETY: comment will make the safety invariant robust and self-documenting for future maintainers.

  • Remediation:
    Correct the assertion to strictly check in-bounds indices: assert!(index_limit < h.counts.len());.

Missing Safety Comments

None of the three unsafe blocks in the crate possess formal // SAFETY: comments outlining why proof obligations are met. Informal comments explain the author's optimization and safety intent, but formalizing them into explicit // SAFETY: comments will document the preconditions for future maintainers.

**1. src/serialization/v2_serializer.rs:96**🟠

-Context: unsafe { self.buf.set_len(max_size); }
-Missing Proof Obligation: Must prove max_size <= self.buf.capacity() and that elements in 0..max_size are initialized. (Currently unsound as noted in Critical Findings).
-Proposed Proof Comment(if refactored to initialize memory prior to set_len):

// SAFETY: `self.buf` has capacity >= `max_size` due to `reserve(max_size)` on line 79. All elements in `0..max_size` have been properly initialized prior to setting the length.

(Note: If remediated using safe code like self.buf.resize(max_size, 0), the unsafe block and comment are completely removed).

**2. src/serialization/v2_serializer.rs:149**🟡

-Context: let count = unsafe { *(h.counts.get_unchecked(index)) };
-Missing Proof Obligation: Must prove index < h.counts.len().
-Proposed Proof Comment(assuming line 145 assertion is corrected to <):

// SAFETY: The outer loop condition ensures `index <= index_limit`. The assertion on line 145 guarantees `index_limit < h.counts.len()`, proving that `index < h.counts.len()` holds for every iteration.

**3. src/serialization/v2_serializer.rs:161**🟡

-Context: && (unsafe { *(h.counts.get_unchecked(index)) } == T::zero())
-Missing Proof Obligation: Must prove index < h.counts.len() during the inner zero-skipping loop.
-Proposed Proof Comment:

// SAFETY: The inner loop condition explicitly checks `index <= index_limit` before evaluating `get_unchecked(index)`. Since `index_limit < h.counts.len()` (guaranteed by line 145), `index` is strictly within bounds of `h.counts`.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions