Optimize Streaming Write Performance for large Data Sets by ChronosMasterOfAllTime · Pull Request #2315 · qax-os/excelize

ChronosMasterOfAllTime · 2026-04-30T03:20:54Z

Description

Replaces #2288 in smaller chunks

Performance and memory optimization of the streaming write path (StreamWriter), the WriteTo/WriteToBuffer output pipeline, and the ZIP compression layer. Profiling against the mzimmerman/excelizetest benchmark suite identified four hot spots — ColumnNumberToName, CoordinatesToCellName, SetRow, and writeCell — which together accounted for the majority of CPU time and nearly all heap allocations per row.

Summary of improvements

Area	Key metric
SetRow hot path	66–79% faster, 94–99% fewer allocations
Full pipeline (SetRow + WriteTo)	67–72% faster, 51–87% less memory
XML-escaped strings	79% faster, 81% less memory
Peak memory (50K×100)	162 MB → 43 MB (−73%)
Allocations (50K×100)	15.1M → 153K (−99%)
ZIP compression	klauspost/compress: ~2× faster than stdlib

Changes

lib.go — precomputed column names

Added columnNames: a package-level precomputed lookup table of all 16 384 column name strings (A–XFD), initialized once at startup via an IIFE. ColumnNumberToName now returns a slice element instead of allocating a new string on every call.
Optimized CoordinatesToCellName to early-return on the common (non-absolute) path, avoiding concatenation with an empty sign variable.
Updated readXML to use the new bufferedWriter.Bytes() method instead of accessing .buf directly.
Replaced archive/zip import with github.com/klauspost/compress/zip.
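
To illustrate the lookup-table idea, here is a minimal sketch, not the code from this PR. The helper name columnName and the constant maxColumns are hypothetical; only the columnNames table name and the 16 384-column limit come from the description above.

```go
package main

import "fmt"

// maxColumns is the Excel column limit (XFD); the constant name is illustrative.
const maxColumns = 16384

// columnNames[i] holds the name of column i+1. The table is built once, at
// package initialization, by an immediately invoked function literal.
var columnNames = func() []string {
	names := make([]string, maxColumns)
	var buf [3]byte // "XFD" is the longest name: three letters
	for n := 1; n <= maxColumns; n++ {
		i := len(buf)
		// Bijective base-26: emit the last letter first, then move left.
		for v := n; v > 0; v = (v - 1) / 26 {
			i--
			buf[i] = byte('A' + (v-1)%26)
		}
		names[n-1] = string(buf[i:])
	}
	return names
}()

// columnName is a hypothetical stand-in for ColumnNumberToName: after the
// range check it only indexes the table, so no string is built per call.
func columnName(n int) (string, error) {
	if n < 1 || n > maxColumns {
		return "", fmt.Errorf("column number %d out of range [1, %d]", n, maxColumns)
	}
	return columnNames[n-1], nil
}

func main() {
	for _, n := range []int{1, 26, 27, 702, 703, 16384} {
		name, _ := columnName(n)
		fmt.Println(n, name)
	}
}
```

The trade-off is a one-time startup cost and a table that stays resident for the life of the process, in exchange for allocation-free lookups in the per-cell loop.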

stream.go — hot-loop optimizations

SetRow rewrite: precomputes rowStr once per row (previously per cell via CoordinatesToCellName), reuses a single xlsxC struct across the inner loop (fields zeroed per iteration instead of allocating a new struct), and reads column names directly from the columnNames table.
writeNumericCell: zero-allocation fast path for int, int8–int64, uint–uint64, float32, float64, and bool. Writes the complete <c> element directly to the buffer using strconv.Append* into a [24]byte scratch field, bypassing xlsxC entirely.
writeStringCell: zero-allocation fast path for string and []byte. Writes inline-string XML directly, bypassing xlsxC/xlsxSI/trimCellValue/bstrMarshal/xml.EscapeText. Handles xml:space="preserve" for leading/trailing whitespace. Falls back to slow path for _xHHHH_ escape patterns or strings exceeding TotalCellChars.
writeEscaped: custom XML escaper that scans for <>&"\r; fast-path writes directly when no special chars are found, slow-path does character-by-character replacement (still zero-alloc).
writeCellStart: helper to deduplicate the <c r="…" opening across fast and slow paths.
writeCell: now takes *xlsxC by pointer plus pre-split colName/rowStr, eliminating a ~184-byte struct copy and a string concatenation per cell. Skips xml.EscapeText for numeric/boolean <v> values (digits, ., -, +, E are always safe).
setCellValFunc: inlines all integer type cases directly, removing a redundant second type-switch dispatch through setCellIntFunc.
marshalAttrs: writes directly to *bufferedWriter (no intermediate strings.Builder). Row option validation split into validateRowOpts so XML is only written after validation passes.
parseRowOpts: returns RowOpts by value instead of *RowOpts.
streamCellStyle / colStyles []int: column style lookup is now O(1) via a cached slice built once in writeSheetData, replacing a per-cell O(N) linear scan of worksheet.Cols.
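
The fast-path/slow-path split described for writeEscaped could look roughly like the sketch below. It is an illustration under assumptions: the function name escapeTo, the use of bytes.Buffer as the sink, and the exact entity strings (for example &quot; and &#xD;) are not taken from the PR.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// escapeTo appends s to buf, replacing the characters < > & " and \r with
// XML entities. The entity spellings here are an assumption.
func escapeTo(buf *bytes.Buffer, s string) {
	// Fast path: nothing to escape, so the string is copied in one call.
	if !strings.ContainsAny(s, "<>&\"\r") {
		buf.WriteString(s)
		return
	}
	// Slow path: write the clean run before each special byte, then the
	// entity. Every special character is ASCII, so scanning bytes is safe
	// for UTF-8 input.
	last := 0
	for i := 0; i < len(s); i++ {
		var esc string
		switch s[i] {
		case '<':
			esc = "&lt;"
		case '>':
			esc = "&gt;"
		case '&':
			esc = "&amp;"
		case '"':
			esc = "&quot;"
		case '\r':
			esc = "&#xD;"
		default:
			continue
		}
		buf.WriteString(s[last:i])
		buf.WriteString(esc)
		last = i + 1
	}
	buf.WriteString(s[last:])
}

// escapeString is a small helper for trying the function out.
func escapeString(s string) string {
	var buf bytes.Buffer
	escapeTo(&buf, s)
	return buf.String()
}

func main() {
	fmt.Println(escapeString("plain text"))  // prints: plain text
	fmt.Println(escapeString(`a<b & "c"`))   // prints: a&lt;b &amp; &quot;c&quot;
}
```

Because the entities are string constants and the input is only sliced, the escaping itself creates no intermediate strings; any allocation comes from the destination buffer growing.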

stream.go — `bufferedWriter` memory architecture

Two-phase architecture: below the threshold, all writes go to an in-memory bytes.Buffer. Once the threshold is crossed, the buffer is drained to a temp file exactly once and all subsequent writes flow through a fixed-size bufio.Writer wrapping the file. This bounds peak heap usage to approximately StreamingChunkSize + bioSize regardless of total data size, compared to the previous approach which re-grew a new bytes.Buffer to the threshold size on every flush cycle.
Sync() is now a no-op when bio != nil — bufio.Writer flushes internally when its buffer is full; forcing a flush on every SetRow call (the previous behavior) negated all batching benefit.
New methods: Bytes(), Reset(), CopyTo(w io.Writer), WriteInt(int64), WriteUint(uint64), WriteFloat(float64, ...).
CopyTo: uses a 256 KiB buffered reader to minimize Pread syscalls when copying from temp files (reduces ~3000 syscalls to ~400 for a 100 MB worksheet).
scratch [24]byte: used by WriteInt, WriteUint, WriteFloat to format numbers without heap allocation.
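
As a rough sketch of the two-phase design, the type below buffers in memory until a threshold and then switches, once, to a bufio.Writer over a temp file. The type name spillWriter, its fields, the helper roundTrip, and the temp-file pattern are all hypothetical; the real bufferedWriter in this PR has more methods and different details.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"os"
	"strconv"
)

// spillWriter is a hypothetical two-phase writer: memory first, temp file after.
type spillWriter struct {
	threshold int           // spill once the in-memory buffer reaches this size
	buf       bytes.Buffer  // phase 1: in-memory storage
	tmp       *os.File      // phase 2: backing temp file
	bio       *bufio.Writer // phase 2: fixed-size buffer in front of tmp
	scratch   [24]byte      // reused by WriteInt for number formatting
}

func (w *spillWriter) Write(p []byte) (int, error) {
	if w.bio != nil {
		return w.bio.Write(p)
	}
	n, _ := w.buf.Write(p) // bytes.Buffer.Write always returns a nil error
	if w.buf.Len() < w.threshold {
		return n, nil
	}
	// Threshold crossed: drain the buffer to a temp file exactly once.
	f, err := os.CreateTemp("", "spill-*.xml")
	if err != nil {
		return n, err
	}
	w.tmp, w.bio = f, bufio.NewWriterSize(f, 128<<10)
	if _, err := w.buf.WriteTo(w.bio); err != nil {
		return n, err
	}
	w.buf = bytes.Buffer{} // drop the in-memory storage
	return n, nil
}

// WriteInt formats v into the scratch array instead of building a string.
func (w *spillWriter) WriteInt(v int64) error {
	_, err := w.Write(strconv.AppendInt(w.scratch[:0], v, 10))
	return err
}

// CopyTo sends everything written so far to dst.
func (w *spillWriter) CopyTo(dst io.Writer) error {
	if w.bio == nil {
		_, err := dst.Write(w.buf.Bytes())
		return err
	}
	if err := w.bio.Flush(); err != nil {
		return err
	}
	if _, err := w.tmp.Seek(0, io.SeekStart); err != nil {
		return err
	}
	// The struct wrappers hide any ReaderFrom/WriterTo fast paths, so the
	// 256 KiB buffer is what io.CopyBuffer actually reads into.
	_, err := io.CopyBuffer(struct{ io.Writer }{dst}, struct{ io.Reader }{w.tmp}, make([]byte, 256<<10))
	return err
}

// Close removes the temp file, if one was created.
func (w *spillWriter) Close() {
	if w.tmp != nil {
		w.tmp.Close()
		os.Remove(w.tmp.Name())
	}
}

// roundTrip writes the values separated by commas and reads them back.
func roundTrip(threshold int, values []int64) (out string, spilled bool, err error) {
	w := &spillWriter{threshold: threshold}
	defer w.Close()
	for _, v := range values {
		if err = w.WriteInt(v); err == nil {
			_, err = w.Write([]byte(","))
		}
		if err != nil {
			return "", false, err
		}
	}
	var sink bytes.Buffer
	err = w.CopyTo(&sink)
	return sink.String(), w.bio != nil, err
}

func main() {
	fmt.Println(roundTrip(4, []int64{1, 22, 333}))     // small threshold: spills to disk
	fmt.Println(roundTrip(1<<20, []int64{1, 22, 333})) // large threshold: stays in memory
}
```

After the spill, heap use in this sketch is limited to the fixed bufio.Writer buffer, which is the property the description attributes to the new architecture.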

file.go — streaming WriteTo & compression

WriteTo rewrite: non-encrypted path now streams the ZIP directly to w via a countWriter wrapper — no intermediate bytes.Buffer. Encrypted path delegates to new writeToWithEncryption, which writes ZIP to a temp file, applies ZIP64 LFH fixup, reads back, encrypts, and writes to w.
WriteToBuffer: now calls configureZipCompression and only performs ZIP64 LFH fixup when len(f.zip64Entries) > 0.
writeToZip: replaces stream.rawData.Reader() + io.Copy with stream.rawData.CopyTo(fi) (uses the new efficient copy path).
writeZip64LFHFile: performs ZIP64 local file header fixup on a temp file (chunk-based, 1 MB reads) instead of requiring an in-memory buffer.
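
The countWriter idea on the non-encrypted path can be sketched as follows. This uses the standard library's archive/zip for self-containment (the PR uses the klauspost package), and writeArchive and demoArchive are hypothetical helpers, not functions from the PR.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// countWriter forwards writes to w and tallies the bytes written, so a
// WriteTo-style method can report its int64 result without first building
// the whole archive in memory.
type countWriter struct {
	w io.Writer
	n int64
}

func (c *countWriter) Write(p []byte) (int, error) {
	n, err := c.w.Write(p)
	c.n += int64(n)
	return n, err
}

// writeArchive is a hypothetical stand-in for the non-encrypted WriteTo path:
// the ZIP writer emits straight into dst through the counter.
func writeArchive(dst io.Writer, parts map[string]string) (int64, error) {
	cw := &countWriter{w: dst}
	zw := zip.NewWriter(cw)
	for name, body := range parts {
		fw, err := zw.Create(name)
		if err != nil {
			return cw.n, err
		}
		if _, err := io.WriteString(fw, body); err != nil {
			return cw.n, err
		}
	}
	// Close writes the central directory; read the count only afterwards.
	err := zw.Close()
	return cw.n, err
}

// demoArchive writes two small parts and returns the reported byte count
// next to the number of bytes that actually reached the destination.
func demoArchive() (reported int64, actual int, err error) {
	var buf bytes.Buffer
	reported, err = writeArchive(&buf, map[string]string{
		"a.xml": "<a/>",
		"b.xml": "<b/>",
	})
	return reported, buf.Len(), err
}

func main() {
	reported, actual, err := demoArchive()
	fmt.Println(reported == int64(actual), err)
}
```

The destination here is a bytes.Buffer only so the count can be checked; in practice dst would be a file or network writer, which is where avoiding the intermediate buffer pays off.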

excelize.go — options & compression

New type Compression int with three constants: CompressionDefault, CompressionNone, CompressionBestSpeed.
New fields on Options: StreamingChunkSize int, StreamingBufSize int, Compression Compression. All are zero-by-default (zero → use package constants), so existing callers are completely unaffected. StreamingChunkSize: -1 keeps all data in memory (never spills to disk).
configureZipCompression: registers a custom flate.NewWriter compressor on *zip.Writer based on the Compression option.
Replaced archive/zip import with github.com/klauspost/compress/zip and github.com/klauspost/compress/flate.
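
One way a compression option can be wired into a ZIP writer is sketched below, using the standard library's archive/zip and compress/flate, whose RegisterCompressor and flate.NewWriter calls have the same shape as the klauspost equivalents. The constant values, the mapping to flate levels, and the helper names configureCompression and archiveSize are assumptions, not the PR's code.

```go
package main

import (
	"archive/zip"
	"bytes"
	"compress/flate"
	"fmt"
	"io"
	"strings"
)

// Compression mirrors the option type named in the description; the
// numeric values and their order are assumed.
type Compression int

const (
	CompressionDefault Compression = iota // zero value: library default
	CompressionNone
	CompressionBestSpeed
)

// configureCompression registers a Deflate compressor on zw whose level
// follows c. Mapping CompressionNone to flate.NoCompression is a guess.
func configureCompression(zw *zip.Writer, c Compression) {
	level := flate.DefaultCompression
	switch c {
	case CompressionNone:
		level = flate.NoCompression
	case CompressionBestSpeed:
		level = flate.BestSpeed
	}
	zw.RegisterCompressor(zip.Deflate, func(w io.Writer) (io.WriteCloser, error) {
		return flate.NewWriter(w, level)
	})
}

// archiveSize writes one repetitive XML part and returns the archive size.
func archiveSize(c Compression) (int, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	configureCompression(zw, c)
	fw, err := zw.Create("xl/worksheets/sheet1.xml")
	if err != nil {
		return 0, err
	}
	if _, err := io.WriteString(fw, strings.Repeat("<c><v>42</v></c>", 4096)); err != nil {
		return 0, err
	}
	if err := zw.Close(); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

func main() {
	for _, c := range []Compression{CompressionDefault, CompressionBestSpeed, CompressionNone} {
		size, err := archiveSize(c)
		fmt.Println(c, size, err)
	}
}
```

Because the zero value selects the default level, callers that never set the option get the existing behavior, which matches the backward-compatibility claim above.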

templates.go

Added StreamingBufSizeDefault = 128 << 10 (128 KiB). Value determined empirically via BenchmarkBioSizeSweep and TestBioSizeIOProfile.

go.mod

Added github.com/klauspost/compress v1.18.5 — a high-performance, pure-Go drop-in replacement for archive/zip and compress/flate.

Related Issue

Fixes #876 — High memory when writing 1 million number of rows

Motivation and Context

User-reported profiling showed that generating large worksheets via StreamWriter was dominated by per-cell allocations in the column-name conversion functions and by unbounded bytes.Buffer growth in the write buffer. For a 100-column × 50 000-row sheet (~150 MB of XML), the previous code allocated 162 MB peak and made 15.1 million allocations. The WriteTo path then buffered the entire compressed ZIP in a bytes.Buffer before writing, adding another 50–200 MB of peak memory on top.

How Has This Been Tested

All existing tests pass (go test ./...).

New tests

Test	What it covers
`TestStreamingWriteTo`	Verifies WriteTo streams correctly without password; round-trips 100×10 sheet
`TestCompressionOption`	Generates 500×20 sheet at Default/None/BestSpeed; asserts size ordering; validates all are readable XLSX
`TestWriteToBufferCompression`	Verifies `WriteToBuffer` respects `CompressionNone`
`TestWriteToWithPassword`	Round-trips encrypted file via WriteTo with password
`TestWriteToWithPasswordAndCompression`	Combines password encryption with `CompressionBestSpeed`
`TestBioSizeIOProfile`	Instruments write-syscall counts and bytes at 10 `bufio.Writer` sizes (4 KiB – 4 MiB) to project performance on different storage tiers
`BenchmarkBioSizeSweep`	Measures ns/op and B/op across 10 `bufio.Writer` sizes for a 50K×100 sheet
`BenchmarkStringCellClean/Special`	Measures `writeEscaped` fast path vs slow path
`BenchmarkCompressionLevels`	50K×20 string sheet at Default/BestSpeed/None, each with disk-spill and in-memory variants (6 sub-benchmarks)
`BenchmarkStreamWriterLarge/Huge`	10K×50 and 50K×100 integer-cell benchmarks for regression tracking
`BenchmarkExcelize*` (9 sizes)	Full pipeline (build data + SetRow + WriteTo) adapted from mzimmerman/excelizetest

Benchmark results

Platform: Apple M1 Pro, macOS, Go 1.24, arm64
Methodology: go test -run=^$ -bench=... -benchmem -count=3, median of 3 runs

Streaming write path (SetRow + Flush + Close, no WriteTo)

Benchmark	master ns/op	PR ns/op	Δ CPU	master B/op	PR B/op	Δ Mem	master allocs	PR allocs	Δ Allocs
StreamWriter (100×10)	238.7 µs	64.2 µs	−73%	101.9 KB	86.1 KB	−16%	2.3K	147	−94%
StreamWriterLarge (50K×10)	74.9 ms	24.2 ms	−68%	55.0 MB	42.4 MB	−23%	1.65M	33.3K	−98%
StreamWriterHuge (50K×100)	851.9 ms	271.1 ms	−68%	162.2 MB	43.2 MB	−73%	15.14M	153.3K	−99%
StringCellClean (50K×10)	181.1 ms	61.2 ms	−66%	163.6 MB	42.5 MB	−74%	3.65M	33.3K	−99%
StringCellSpecial (50K×10)	337.7 ms	70.7 ms	−79%	223.8 MB	42.5 MB	−81%	5.15M	33.3K	−99%

Full pipeline (build string data + SetRow + WriteTo to buffer)

Benchmark	master ns/op	PR ns/op	Δ CPU	master B/op	PR B/op	Δ Mem	master allocs	PR allocs	Δ Allocs
Excelize 1K×10	9.1 ms	2.7 ms	−70%	4.9 MB	2.1 MB	−57%	86.0K	16.9K	−80%
Excelize 10K×10	66.1 ms	21.5 ms	−67%	47.4 MB	23.3 MB	−51%	833.0K	133.9K	−84%
Excelize 100K×100	7.17 s	2.03 s	−72%	2.54 GB	339.7 MB	−87%	80.29M	10.30M	−87%

Compression options (PR only, 50K×20 string rows, full WriteTo)

Mode	Time	Memory	Allocs
Default (temp file)	383.5 ms	68.7 MB	1.15M
BestSpeed (temp file)	285.9 ms	83.7 MB	1.15M
None (temp file)	249.7 ms	213.6 MB	1.15M
Default (in-memory)	271.3 ms	194.2 MB	1.15M
BestSpeed (in-memory)	254.1 ms	209.1 MB	1.15M
None (in-memory)	178.8 ms	339.1 MB	1.15M

Aggregate (sum of the streaming write path and full pipeline benchmarks above)

Metric	master (sum)	PR (sum)	Δ
CPU time	8.69 s	2.48 s	−71%
Memory	3.20 GB	536 MB	−83%
Allocations	106.8M	10.7M	−90%

The most realistic scenario, the Excelize 100K×100 full pipeline, improves as follows:

7.17 s → 2.03 s (−72% CPU)
2.54 GB → 340 MB (−87% memory)
80.3M → 10.3M allocs (−87%)

Key takeaways

StreamWriterHuge (50K×100): −68% CPU, −73% memory (162 MB → 43 MB), −99% allocs (15.1M → 153K)
Excelize 100K×100 full pipeline: 7.17 s → 2.03 s (−72%), 2.54 GB → 340 MB (−87%), 80M → 10M allocs (−87%)
XML-escaped strings (50K×10): −79% CPU, −81% memory — the writeEscaped zero-alloc path eliminates per-character allocations
Allocation reduction is the biggest win across the board: 94–99% fewer allocs in all streaming benchmarks
Memory is now bounded: peak ≈ StreamingChunkSize + bioSize regardless of total data size

Types of changes

Docs change / refactoring / dependency upgrade
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist

My code follows the code style of this project.
My change requires a change to the documentation.
I have updated the documentation accordingly.
I have read the CONTRIBUTING document.
I have added tests to cover my changes.
All new and existing tests passed.

AdamDrewsTR · 2026-05-01T20:06:50Z

@ChronosMasterOfAllTime Is the plan to then merge in the other features: read performance + file size optimization using shared strings, etc?

ChronosMasterOfAllTime · 2026-05-01T20:10:18Z

> @ChronosMasterOfAllTime Is the plan to then merge in the other features: read performance + file size optimization using shared strings, etc?

Yes, exactly: break the features down piecemeal so it's easier to troubleshoot.

AdamDrewsTR · 2026-05-12T18:54:34Z

@xuri Exactly what corruption did you see, and where? I am not able to replicate any issues that don't already exist in master.

AdamDrewsTR · 2026-05-12T20:10:13Z

@xuri Please close. I will break this up into more granular pieces.

AdamDrewsTR added 4 commits April 20, 2026 15:21

Add streaming options and improve performance for large data writes

d58305d

Add writeEscaped function for optimized XML escaping in cell writing

1d9bbf5

Add benchmarks for writeEscaped function with normal and special characters

7c7990a

Optimize string cell writing by adding a fast path for plain strings and []byte values

0f3bcc6

xuri added the size/XL label (denotes a PR that changes 500-999 lines, ignoring generated files) Apr 30, 2026

ChronosMasterOfAllTime closed this May 12, 2026

xuri reopened this May 21, 2026

xuri closed this May 21, 2026

ChronosMasterOfAllTime wants to merge 4 commits into qax-os:master from ChronosMasterOfAllTime:pr-1-streaming-write-perf

3 participants
