Skip to content

Fix DataDescriptorStream infinite loop on signature boundary match#1175

Draft
Copilot wants to merge 4 commits into
masterfrom
copilot/fix-data-descriptor-stream-bug
Draft

Fix DataDescriptorStream infinite loop on signature boundary match#1175
Copilot wants to merge 4 commits into
masterfrom
copilot/fix-data-descriptor-stream-bug

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Jan 28, 2026

Fix DataDescriptorStream boundary bug when signature starts at buffer edge

  • Understand the bug in DataDescriptorStream.Read() method
  • Identify root cause: infinite loop when rewinding to same position
  • Implement correct fix: detect infinite loop vs legitimate partial match
  • Test the fix with existing ZIP tests (38/38 passing)
  • Format code with CSharpier
  • Add regression test for the boundary bug scenario
  • Address feedback: fix handles legitimate cross-boundary signatures
Original prompt

This section details on the original issue you should resolve

<issue_title>DataDescriptorStream boundary bug when signature starts at buffer edge</issue_title>
<issue_description>When the ZIP data is read in streaming mode and the first byte of the data descriptor signature (0x50 = P) appears as the last byte of a read buffer, the reader gets stuck and the stream positioning breaks.

This causes extraction to fail.

This seems to be caused by how DataDescriptorStream.Read() handles partial matches of PK\07\08 across buffer boundaries.

Observed Behavior

When the last byte of a buffer is 0x50 ('P'):

  1. _searchPosition becomes 1
  2. At end of buffer, this code runs:
    if (_searchPosition > 0)
    {
    read -= _searchPosition;
    _stream.Position -= _searchPosition;
    _searchPosition = 0;
    }
  3. One byte is “rewound”
  4. On next read, only 1 byte is returned
  5. That byte is again P
  6. Same logic repeats
  7. Read returns 0 → stream appears finished
  8. Extraction stops prematurely

Reproduction

This happens reliably with:
• Streaming ZIP reader
• DataDescriptorStream
• BufferSize = 64k
• Payload length > 64k bytes
• Payload filled with 0x50 ('P')

repro.zip</issue_description>

Comments on the Issue (you are @copilot in this section)

@adamhathcock Sounds similar to https://github.com//issues/1168

💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

When the first byte of the data descriptor signature (0x50 = 'P') appears as
the last byte of a read buffer, the reader would rewind and re-read the same
byte repeatedly, causing an infinite loop.

The fix tracks the last rewind position and prevents rewinding to the same
position twice, breaking the infinite loop while preserving correct behavior.

Fixes boundary condition where signature bytes at buffer edge cause hang.

Co-authored-by: adamhathcock <527620+adamhathcock@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix boundary bug in DataDescriptorStream Fix DataDescriptorStream infinite loop on signature boundary match Jan 28, 2026
Copilot AI requested a review from adamhathcock January 28, 2026 12:40
@adamhathcock
Copy link
Copy Markdown
Owner

@copilot make a test for the situation described

Test reproduces the infinite loop scenario where:
- ZIP file contains data filled with 0x50 ('P') bytes
- Non-seekable stream triggers DataDescriptorStream usage
- Partial signature matches at buffer boundaries caused infinite rewind loop

The test verifies that with the fix in place, extraction completes
successfully without hanging. It includes safeguards to detect if the
infinite loop condition occurs (> 1000 read iterations for 100KB file).

Co-authored-by: adamhathcock <527620+adamhathcock@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Jan 28, 2026

@copilot make a test for the situation described

Added regression test in commit e896179.

The test reproduces the boundary bug scenario:

  • Creates a ZIP with 100KB of 0x50 ('P') bytes (first byte of PK signature)
  • Uses non-seekable stream to trigger DataDescriptorStream
  • Uses CompressionType.None to ensure DataDescriptorStream is activated
  • Includes safeguards to detect infinite loop (fails if > 1000 read iterations)
  • Verifies correct extraction of all data

The test passes with the fix and would fail with the infinite loop before the fix.

@p247
Copy link
Copy Markdown

p247 commented Jan 28, 2026

This fixes the case where part of the data descriptor is included in the extracted data. However, if the actual data descriptor is split at the end of the buffer, it will no longer be detected.

@adamhathcock
Copy link
Copy Markdown
Owner

@copilot see this:

This fixes the case where part of the data descriptor is included in the extracted data. However, if the actual data descriptor is split at the end of the buffer, it will no longer be detected.

Previous fix prevented infinite loops but broke detection of legitimate
data descriptor signatures split across buffer boundaries.

New approach:
- Track previous rewind count in _previousSearchPosition
- Detect infinite loop when read size equals previous rewind count
  AND all bytes match the signature pattern
- In that case, treat bytes as data (not signature) to break loop
- Otherwise, allow normal rewind/match to continue for legitimate
  cross-boundary signatures

This fixes both cases:
1. Infinite loop on data containing signature-like bytes (0x50...)
2. Correct detection of actual signatures split at buffer edge

Co-authored-by: adamhathcock <527620+adamhathcock@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Jan 29, 2026

@copilot see this:

This fixes the case where part of the data descriptor is included in the extracted data. However, if the actual data descriptor is split at the end of the buffer, it will no l...

Fixed in commit c994a9f.

The previous fix broke legitimate cross-boundary signature detection. The new approach:

  • Detects infinite loop by checking if we read the exact same bytes again (read size == previous rewind count)
  • If so, treats those bytes as data (not signature) to break the loop
  • Otherwise, allows normal rewind/match for legitimate partial signatures

This correctly handles both:

  1. Infinite loop on data with signature-like bytes (0x50...)
  2. Valid signatures split across buffer boundaries

Base automatically changed from release to master March 28, 2026 09:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants