perf: enhance LZMA decompression performance and improve resource management #471

Open
unxed wants to merge 2 commits into bodgit:main from unxed:optimized_xz

unxed commented Jun 22, 2026

Description:

This pull request introduces significant performance optimizations for LZMA/LZMA2 decompression and fixes a resource management issue where internal decoder buffers and goroutines were not being properly released.

Key Changes:

  1. Resource Cleanup: Updated the Close methods in the lzma and lzma2 packages to explicitly close the internal decoder. This is crucial for returning large dictionary buffers to the sync.Pool and ensuring background goroutines are terminated, preventing memory leaks and potential hangs in high-concurrency environments.
  2. Optimized Decoder: Switched to a performance-optimized LZMA fork (github.com/unxed/xz). This fork provides faster sequential decoding through register caching and manual inlining, and supports multi-threaded block decompression.
  3. Parallel Decompression: Implemented a check for io.ReaderAt and io.Seeker interfaces within the LZMA2 constructor. When a seekable stream is detected (such as a SectionReader common in 7z archives), the decoder now attempts to use the ParallelReader for concurrent block processing, significantly increasing throughput on multi-core systems.
  4. Safety & Fallback: The implementation uses io.NewSectionReader to ensure parallel reads stay within the bounds of the specific archive folder. If initialization of the parallel reader fails or the stream is not seekable, the system transparently falls back to the standard sequential mode.
  5. Unit Tests: Added a new test suite in reader_test.go to verify the interface detection logic, ensure proper cleanup during Close, and confirm that the fallback mechanism works correctly for different types of input streams.
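The detection and fallback described in items 3 and 4 can be sketched roughly as follows. This is a minimal illustration built only on the standard library, not the code in this PR: `seekableSection`, `chooseMode`, and `demo` are hypothetical names, and the real constructor would hand the section to the parallel reader instead of returning a string.

```go
package main

import (
	"io"
	"strings"
)

// seekableSection is a hypothetical helper. It reports whether r supports
// random access and, if so, returns a SectionReader covering the bytes from
// the current offset to the end of the stream, so that later ReadAt calls
// cannot leave that window.
func seekableSection(r io.Reader) (*io.SectionReader, bool) {
	ra, okAt := r.(io.ReaderAt)
	s, okSeek := r.(io.Seeker)
	if !okAt || !okSeek {
		return nil, false
	}
	start, err := s.Seek(0, io.SeekCurrent)
	if err != nil {
		return nil, false
	}
	end, err := s.Seek(0, io.SeekEnd)
	if err != nil {
		return nil, false
	}
	// Restore the original position so a sequential fallback still works.
	if _, err := s.Seek(start, io.SeekStart); err != nil {
		return nil, false
	}
	return io.NewSectionReader(ra, start, end-start), true
}

// chooseMode stands in for the constructor's decision (assumption: the real
// code would also fall back if the parallel reader fails to initialise).
func chooseMode(r io.Reader) string {
	if _, ok := seekableSection(r); ok {
		return "parallel"
	}
	return "sequential"
}

// demo runs the check on a seekable input and on a plain stream.
func demo() (string, int64, string) {
	seekable := io.NewSectionReader(strings.NewReader("0123456789"), 2, 6)
	streamOnly := io.LimitReader(strings.NewReader("0123456789"), 6)
	sec, _ := seekableSection(seekable)
	return chooseMode(seekable), sec.Size(), chooseMode(streamOnly)
}
```

Here the `io.SectionReader` input is detected as seekable, while the `io.LimitReader` input only implements `Read` and therefore takes the sequential path.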

Performance Impact:
Sequential decoding speed is improved due to internal optimizations, and parallel-capable streams (where independent XZ blocks are present) can see a substantial increase in decompression throughput, scaling with the number of available CPU cores.
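To illustrate why independent blocks allow throughput to scale with cores, here is a rough sketch of bounded concurrent block reads over an `io.ReaderAt`. It is not the fork's `ParallelReader`: the `block` type, `processBlocks`, and the block offsets are hypothetical, and the "decoding" step is a plain copy used as a placeholder.

```go
package main

import (
	"io"
	"runtime"
	"strings"
	"sync"
)

// block describes one independently decodable region of the input.
// The offsets are made up; a real XZ reader would take them from the index.
type block struct{ off, size int64 }

// processBlocks reads every block through its own SectionReader, running at
// most GOMAXPROCS workers at a time, and returns the results in block order.
func processBlocks(src io.ReaderAt, blocks []block) ([][]byte, error) {
	out := make([][]byte, len(blocks))
	errs := make([]error, len(blocks))
	sem := make(chan struct{}, runtime.GOMAXPROCS(0))
	var wg sync.WaitGroup
	for i, b := range blocks {
		wg.Add(1)
		sem <- struct{}{} // wait for a free worker slot
		go func(i int, b block) {
			defer wg.Done()
			defer func() { <-sem }()
			// Placeholder for decoding: the block is copied unchanged.
			out[i], errs[i] = io.ReadAll(io.NewSectionReader(src, b.off, b.size))
		}(i, b)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return out, nil
}

// demoBlocks splits a 12-byte input into three 4-byte blocks.
func demoBlocks() string {
	src := strings.NewReader("aaaabbbbcccc")
	parts, err := processBlocks(src, []block{{0, 4}, {4, 4}, {8, 4}})
	if err != nil {
		return "error: " + err.Error()
	}
	return string(parts[0]) + "|" + string(parts[1]) + "|" + string(parts[2])
}
```

Each worker writes only to its own slot of `out` and `errs`, so no extra locking is needed, and the output order matches the block order regardless of which goroutine finishes first.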

Fix #470

…port and improved resource management.

This change replaces the standard LZMA library with a performance-optimized fork to enable multi-threaded block decompression when the input stream supports random access via Seek and ReadAt interfaces. The Close methods in the lzma and lzma2 packages have been updated to explicitly close the internal decoder using errors.Join, ensuring that buffers are properly returned to the pool and background goroutines are terminated to prevent memory leaks and test timeouts. Furthermore, the LZMA2 constructor now includes logic to detect seekable streams and safely wrap them in section readers for parallel processing, accompanied by new unit tests to verify the interface detection and cleanup logic.
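The `Close` behaviour described above might look roughly like the sketch below. The `readCloser` layout, the `stubCloser` test double, and `errDecoder` are assumptions made for illustration and are not copied from the lzma or lzma2 packages; `errors.Join` requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"io"
)

// errDecoder is a made-up error used to show that decoder failures survive.
var errDecoder = errors.New("decoder close failed")

// stubCloser records whether Close was called and returns a preset error.
type stubCloser struct {
	closed bool
	err    error
}

func (s *stubCloser) Close() error {
	s.closed = true
	return s.err
}

// readCloser is a hypothetical wrapper holding the internal decoder and the
// underlying source stream.
type readCloser struct {
	decoder io.Closer
	source  io.Closer
}

// Close closes both the decoder and the source even if one of them fails,
// and reports every failure through errors.Join. A second call is a no-op.
func (rc *readCloser) Close() error {
	var errs []error
	if rc.decoder != nil {
		errs = append(errs, rc.decoder.Close())
		rc.decoder = nil
	}
	if rc.source != nil {
		errs = append(errs, rc.source.Close())
		rc.source = nil
	}
	// errors.Join returns nil when every element is nil.
	return errors.Join(errs...)
}

// demoClose reports whether both closers ran, whether the decoder error was
// propagated, and whether a repeated Close returns nil.
func demoClose() (bool, bool, bool, bool) {
	dec := &stubCloser{err: errDecoder}
	src := &stubCloser{}
	rc := &readCloser{decoder: dec, source: src}
	first := rc.Close()
	second := rc.Close()
	return dec.closed, src.closed, errors.Is(first, errDecoder), second == nil
}
```

Closing the decoder before the source is one plausible ordering; the point of the sketch is only that neither `Close` call is skipped when the other returns an error.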

Fix bodgit#470
unxed changed the title from "Enhance LZMA decompression performance and improve resource management" to "perf: enhance LZMA decompression performance and improve resource management" Jun 22, 2026

Development

Successfully merging this pull request may close these issues.

Support parallel multi-core XZ decompression when input is seekable (io.ReaderAt)
