Commit fc30171

MagicalTux and claude committed
fix(decoder): report OutputFull when the output buffer is full
A RawDecoder that buffers a whole block internally (notably bzip2, which absorbs an entire BWT block before draining it) could make a naive decode loop fail with UnexpectedEnd. When the caller's output buffer filled mid-block, the RawDecoder->Decoder bridge derived the Status of an unfinished stream purely from consumed >= input.len(). Since the decoder had already swallowed all the input, it returned InputEmpty instead of OutputFull. A loop that stops on InputEmpty then called finish() on a half-drained stream and got UnexpectedEnd, even when decoding output produced by the matching encoder.

Return OutputFull whenever the output buffer is non-empty and completely filled, which is always the correct "drain and call again" signal; a later call with no remaining input yields InputEmpty once the pending bytes are out. Genuine truncation still errors, because that path returns with the output buffer not full.

Adds round_trip_small_output_buffer_naive_loop, which drives the exact documented decode loop with 1/64/4096/65536-byte output buffers over 100 KB to 1 MB inputs; it failed with UnexpectedEnd before this change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent ee147c8 commit fc30171

3 files changed

Lines changed: 72 additions & 0 deletions


CHANGELOG.md

Lines changed: 12 additions & 0 deletions
```diff
@@ -7,6 +7,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Fixed
+
+- *(decoder bridge)* a decoder that buffers a whole block internally (notably
+  `bzip2`) could fail a naive decode loop with `UnexpectedEnd`. When the
+  caller's `output` buffer filled mid-block, the `RawDecoder` → `Decoder` bridge
+  reported `InputEmpty` (because the decoder had already absorbed all the input)
+  instead of `OutputFull`; a loop that stops on `InputEmpty` then called
+  `finish` on a half-drained stream and got `UnexpectedEnd`. The bridge now
+  returns `OutputFull` whenever the output buffer is full, which is always the
+  correct "drain and call again" signal. Affected `bzip2` round-trips whenever a
+  decoded block was larger than the output buffer.
+
 ## [0.6.6](https://github.com/KarpelesLab/compcol/compare/v0.6.5...v0.6.6) - 2026-06-27
 
 ### Added
```
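To make the failure described in the changelog entry concrete, here is a small self-contained model. It is a sketch, not the crate's code: `BlockDecoder`, `Progress`, `bridge_status`, `naive_decode` and the `fixed` flag are hypothetical names. The toy decoder copies bytes through unchanged and knows the stream length up front instead of parsing a bzip2 end-of-stream marker; its `finish` step simply treats an incompletely emitted stream as truncated, which matches the symptom the commit reports; and the bridge's final `else` branch is assumed to return `OutputFull`, since the diff context ends there.

```rust
// Toy model of the bug fixed in this commit. All names are hypothetical
// stand-ins; only the `Status` variants mirror the ones used in the diff.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    InputEmpty,
    OutputFull,
    StreamEnd,
}

#[derive(Debug, PartialEq, Eq)]
enum Error {
    UnexpectedEnd,
}

/// What one `raw_decode` call reports.
struct Progress {
    consumed: usize,
    written: usize,
    done: bool,
}

/// Absorbs all input at once (like bzip2 absorbing a BWT block), then drains
/// it into whatever output space it is given. "Decoding" is the identity; the
/// stream length is known up front in place of a real end-of-stream marker.
struct BlockDecoder {
    total: usize,
    buffered: Vec<u8>,
    emitted: usize,
}

impl BlockDecoder {
    fn raw_decode(&mut self, input: &[u8], output: &mut [u8]) -> Progress {
        self.buffered.extend_from_slice(input); // swallow every input byte
        let n = (self.buffered.len() - self.emitted).min(output.len());
        output[..n].copy_from_slice(&self.buffered[self.emitted..self.emitted + n]);
        self.emitted += n;
        Progress { consumed: input.len(), written: n, done: self.emitted == self.total }
    }
}

/// Bridge from `Progress` to `Status`; `fixed` toggles the new check.
fn bridge_status(p: &Progress, input_len: usize, output_len: usize, fixed: bool) -> Status {
    if p.done {
        Status::StreamEnd
    } else if fixed && p.written == output_len && output_len != 0 {
        Status::OutputFull // buffer full: drain and call again
    } else if p.consumed >= input_len {
        Status::InputEmpty // old logic lands here even with bytes pending
    } else {
        Status::OutputFull // assumed: the diff context ends at this `else`
    }
}

/// The naive loop: stop on `InputEmpty`, then `finish`.
fn naive_decode(encoded: &[u8], out_chunk: usize, fixed: bool) -> Result<Vec<u8>, Error> {
    let mut dec = BlockDecoder { total: encoded.len(), buffered: Vec::new(), emitted: 0 };
    let mut buf = vec![0u8; out_chunk];
    let mut out = Vec::new();
    let mut consumed = 0;
    loop {
        let input = &encoded[consumed..];
        let p = dec.raw_decode(input, &mut buf);
        out.extend_from_slice(&buf[..p.written]);
        consumed += p.consumed;
        match bridge_status(&p, input.len(), buf.len(), fixed) {
            Status::OutputFull => continue,
            Status::InputEmpty => break,
            Status::StreamEnd => return Ok(out),
        }
    }
    // Toy `finish`: a stream that has not been fully emitted counts as truncated.
    if dec.emitted == dec.total { Ok(out) } else { Err(Error::UnexpectedEnd) }
}

fn main() {
    let data: Vec<u8> = (0..10u8).collect();
    // 4-byte output buffer, 10-byte "block":
    assert_eq!(naive_decode(&data, 4, false), Err(Error::UnexpectedEnd));
    assert_eq!(naive_decode(&data, 4, true), Ok(data.clone()));
    println!("old bridge fails, fixed bridge round-trips");
}
```

With a 4-byte buffer and a 10-byte block, the old mapping reports `InputEmpty` after the first call (all input consumed, 6 bytes still pending), so the loop reaches `finish` early; with the new check it keeps calling until `StreamEnd`.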

src/traits.rs

Lines changed: 11 additions & 0 deletions
```diff
@@ -364,6 +364,17 @@ impl<T: RawDecoder> Decoder for T {
         let p = self.raw_decode(input, output)?;
         let status = if p.done {
             Status::StreamEnd
+        } else if p.written == output.len() && !output.is_empty() {
+            // Output buffer is full. A decoder that buffers input internally
+            // (e.g. bzip2 absorbs a whole block before draining it) reports
+            // `consumed == input.len()` here even though it still has decoded
+            // bytes pending — so keying the status off `consumed` alone would
+            // wrongly say `InputEmpty` and a standard loop would stop early
+            // (then `finish` sees a half-drained stream → `UnexpectedEnd`).
+            // A filled output always means "drain and call again", which is
+            // exactly `OutputFull`'s contract; calling again with no remaining
+            // input simply yields `InputEmpty` once the pending bytes are out.
+            Status::OutputFull
         } else if p.consumed >= input.len() {
             Status::InputEmpty
         } else {
```
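The new branch sits before the `consumed` check, so "buffer filled" takes priority over "all input consumed". The sketch below restates the before/after mapping as plain functions to show the two cases that matter: the mid-block case from the bug report, and the zero-length buffer that the `!output.is_empty()` guard excludes. The function names are hypothetical, and the old mapping's final `else` is assumed to be `OutputFull` because the hunk cuts off there. A plausible reading of the guard, not stated in the commit, is that `written == output.len()` is trivially true for an empty buffer, so without it every call with an empty `output` would report `OutputFull` and a loop that continues on `OutputFull` could spin forever.

```rust
// Hypothetical mirror of the bridge's status mapping, before and after this
// commit. Arguments stand in for `p.done`, `p.consumed`, `p.written`,
// `input.len()` and `output.len()`.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    InputEmpty,
    OutputFull,
    StreamEnd,
}

fn status_before(done: bool, consumed: usize, input_len: usize) -> Status {
    if done {
        Status::StreamEnd
    } else if consumed >= input_len {
        Status::InputEmpty
    } else {
        Status::OutputFull // assumed: the diff context stops at this `else`
    }
}

fn status_after(done: bool, consumed: usize, written: usize, input_len: usize, output_len: usize) -> Status {
    if done {
        Status::StreamEnd
    } else if written == output_len && output_len != 0 {
        // Filled, non-empty buffer: the decoder may still hold pending bytes.
        Status::OutputFull
    } else {
        // Otherwise fall back to the original `consumed`-based rule.
        status_before(done, consumed, input_len)
    }
}

fn main() {
    // Mid-block: all 10 input bytes absorbed, 64-byte buffer filled.
    assert_eq!(status_before(false, 10, 10), Status::InputEmpty); // the bug
    assert_eq!(status_after(false, 10, 64, 10, 64), Status::OutputFull);

    // Pending bytes finished draining on a later call with no input left.
    assert_eq!(status_after(false, 0, 12, 0, 64), Status::InputEmpty);

    // Zero-length buffer: `written == output_len` holds trivially (0 == 0),
    // but the non-empty guard keeps it from being reported as OutputFull.
    assert_eq!(status_after(false, 0, 0, 0, 0), Status::InputEmpty);
}
```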

tests/bzip2.rs

Lines changed: 49 additions & 0 deletions
```diff
@@ -268,6 +268,55 @@ fn round_trip_streaming_one_byte() {
     assert_eq!(decoded, input);
 }
 
+/// Decode following the *exact* streaming loop documented on the `Decoder`
+/// trait — no defensive "drain after InputEmpty" workaround like
+/// [`decode_chunked`] has. A naive caller breaks out of the decode loop on
+/// `InputEmpty` and then calls `finish`. This is the pattern that regressed:
+/// the decoder buffers a whole block internally, so when the small `output`
+/// fills mid-block it used to report `InputEmpty` (instead of `OutputFull`),
+/// the loop stopped with the block half-drained, and `finish` then failed with
+/// `UnexpectedEnd`.
+fn decode_documented_loop(encoded: &[u8], out_chunk: usize) -> Result<Vec<u8>, Error> {
+    let mut dec = Decoder::new();
+    let mut buf = vec![0u8; out_chunk];
+    let mut out = Vec::new();
+    let mut consumed = 0;
+    loop {
+        let (p, status) = dec.decode(&encoded[consumed..], &mut buf)?;
+        out.extend_from_slice(&buf[..p.written]);
+        consumed += p.consumed;
+        match status {
+            Status::OutputFull => continue,
+            Status::InputEmpty => break,
+            Status::StreamEnd => return Ok(out),
+        }
+    }
+    loop {
+        let (p, status) = dec.finish(&mut buf)?;
+        out.extend_from_slice(&buf[..p.written]);
+        if matches!(status, Status::StreamEnd) {
+            break;
+        }
+    }
+    Ok(out)
+}
+
+#[test]
+fn round_trip_small_output_buffer_naive_loop() {
+    // Inputs larger than the output buffer force the decoder to drain a single
+    // decoded block across several `decode` calls. Before the fix this failed
+    // with `UnexpectedEnd` for any block bigger than `out_chunk`.
+    for &n in &[100_000usize, 600_000, 1_000_000] {
+        let input: Vec<u8> = (0..n).map(|i| (i.wrapping_mul(2654435761) >> 13) as u8).collect();
+        let encoded = encode_all(&input);
+        for &out_chunk in &[1usize, 64, 4096, 65536] {
+            let decoded = decode_documented_loop(&encoded, out_chunk)
+                .unwrap_or_else(|e| panic!("n={n} out_chunk={out_chunk}: {e:?}"));
+            assert_eq!(decoded, input, "n={n} out_chunk={out_chunk}");
+        }
+    }
+}
+
 // ─── reset / reuse ─────────────────────────────────────────────────────
 
 #[test]
```
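The doc comment on `decode_documented_loop` contrasts the naive loop with the "drain after InputEmpty" workaround in `decode_chunked`, whose body is not part of this diff. Below is a minimal sketch of one such workaround, assuming a decode step that returns `(consumed, written, status)`; `decode_defensive` and the closure-based step are hypothetical, not the crate's API. It tolerates the pre-fix behavior by treating `InputEmpty` on a completely filled buffer as "more may be pending" and calling again with the (empty) remaining input.

```rust
// Sketch of a "drain after InputEmpty" decode loop. This is NOT the crate's
// `decode_chunked`; `decode_defensive` and its closure-based step are
// hypothetical, with the step returning (consumed, written, status).

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    InputEmpty,
    OutputFull,
    StreamEnd,
}

/// Returns the decoded bytes and the final status. `InputEmpty` means the
/// caller should go on to `finish`; `StreamEnd` means decoding is complete.
fn decode_defensive<F>(encoded: &[u8], out_chunk: usize, mut step: F) -> (Vec<u8>, Status)
where
    F: FnMut(&[u8], &mut [u8]) -> (usize, usize, Status),
{
    // With an empty buffer `written == out_chunk` always holds and the drain
    // rule below would spin forever.
    assert!(out_chunk > 0, "output buffer must be non-empty");
    let mut buf = vec![0u8; out_chunk];
    let mut out = Vec::new();
    let mut consumed = 0;
    loop {
        let (c, written, status) = step(&encoded[consumed..], &mut buf);
        out.extend_from_slice(&buf[..written]);
        consumed += c;
        match status {
            Status::OutputFull => continue,
            // A full buffer plus `InputEmpty` is the ambiguous case the old
            // bridge produced: call again (with the empty remainder) to drain.
            Status::InputEmpty if written == out_chunk => continue,
            Status::InputEmpty | Status::StreamEnd => return (out, status),
        }
    }
}

fn main() {
    let data: Vec<u8> = (0..10u8).collect();
    let total = data.len();
    // Toy block decoder with the OLD bridge semantics: it absorbs all input
    // at once and says `InputEmpty` even when it filled the buffer.
    let mut buffered: Vec<u8> = Vec::new();
    let mut emitted = 0usize;
    let (out, status) = decode_defensive(&data, 4, |input: &[u8], output: &mut [u8]| {
        buffered.extend_from_slice(input);
        let n = (buffered.len() - emitted).min(output.len());
        output[..n].copy_from_slice(&buffered[emitted..emitted + n]);
        emitted += n;
        let status = if emitted == total { Status::StreamEnd } else { Status::InputEmpty };
        (input.len(), n, status)
    });
    assert_eq!(status, Status::StreamEnd);
    assert_eq!(out, data);
}
```

The extra rule costs at most one additional call that returns `written == 0` when nothing was actually pending. With the bridge fix in this commit, a bridge-backed decoder should no longer report `InputEmpty` on a filled buffer, which is why the plain documented loop in the new test is expected to suffice.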
