
Defer Sorter.DocMap packing until after flush #16048

Open

Tim-Brooks wants to merge 2 commits into apache:main from Tim-Brooks:do_not_pack_sorter


@Tim-Brooks
Contributor

IndexingChain calls sortMap.oldToNew heavily during flush: once per posting
in FreqProxTermsWriter, once per (field, doc) in the doc values writers, and
once per vector in the KNN writers. The previous implementation built oldToNew
as a PackedLongValues, which is slow on that random-access hot path.
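To make the hot-path cost concrete, here is a minimal Java sketch of a fixed-width bit-packed array. `BitPackedArray` is hypothetical and is not Lucene's PackedLongValues, which (roughly) splits values into pages and may add per-page delta or monotonic decoding on top of this. The point is only that every packed `get()` pays for a multiply, shifts, a mask, and sometimes a second load, whereas oldToNew on an int[] is a single array read.

```java
/**
 * Hypothetical fixed-width bit-packed int array, for illustration only.
 * Not Lucene's PackedLongValues; it just shows the decode work a packed
 * lookup does compared with reading an int[] element.
 */
final class BitPackedArray {
  private final long[] blocks;
  private final int bitsPerValue;
  private final long mask;
  private final int size;

  private BitPackedArray(long[] blocks, int bitsPerValue, int size) {
    this.blocks = blocks;
    this.bitsPerValue = bitsPerValue;
    this.mask = (1L << bitsPerValue) - 1;
    this.size = size;
  }

  /** Packs non-negative ints that each fit in bitsPerValue bits (1..31). */
  static BitPackedArray pack(int[] values, int bitsPerValue) {
    if (bitsPerValue < 1 || bitsPerValue > 31) {
      throw new IllegalArgumentException("bitsPerValue must be in [1, 31]");
    }
    long mask = (1L << bitsPerValue) - 1;
    long totalBits = (long) values.length * bitsPerValue;
    long[] blocks = new long[(int) ((totalBits + 63) >>> 6)];
    for (int i = 0; i < values.length; i++) {
      long v = values[i];
      if ((v & ~mask) != 0) {
        throw new IllegalArgumentException("value does not fit: " + v);
      }
      long bitPos = (long) i * bitsPerValue;
      int block = (int) (bitPos >>> 6); // which long holds the low bits
      int shift = (int) (bitPos & 63);  // bit offset inside that long
      blocks[block] |= v << shift;
      if (shift + bitsPerValue > 64) {
        // Value straddles two longs: store the high bits in the next one.
        blocks[block + 1] |= v >>> (64 - shift);
      }
    }
    return new BitPackedArray(blocks, bitsPerValue, values.length);
  }

  /** Random access: multiply, shift, mask, and sometimes a second load. */
  int get(int index) {
    if (index < 0 || index >= size) {
      throw new IndexOutOfBoundsException("index " + index + " out of [0, " + size + ")");
    }
    long bitPos = (long) index * bitsPerValue;
    int block = (int) (bitPos >>> 6);
    int shift = (int) (bitPos & 63);
    long v = blocks[block] >>> shift;
    if (shift + bitsPerValue > 64) {
      v |= blocks[block + 1] << (64 - shift);
    }
    return (int) (v & mask);
  }

  int size() {
    return size;
  }
}
```

Called once per posting, per (field, doc) and per vector, that per-lookup decode work adds up; an int[] lookup avoids it entirely at the cost of 32 bits per document.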

Build oldToNew as an int[] during flush, then pack it into PackedLongValues
before storing it on FlushedSegment, which is retained long-term on
ReadersAndUpdates for sorted segments with segment-private updates.
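The flush-then-pack lifecycle might look roughly like the sketch below. `DocMapSketch`, `IntArrayDocMap`, and `pack()` are hypothetical names, not Lucene's Sorter.DocMap API. The constructor assumes it is given the sorted order as a newToOld permutation, and `pack()` uses char[] narrowing as a simple stand-in for PackedLongValues.

```java
/** Hypothetical stand-in for Sorter.DocMap (illustrative names only). */
interface DocMapSketch {
  int oldToNew(int docID);
  int size();
}

/** Flush-time doc map: oldToNew is a plain int[] for fast random access. */
final class IntArrayDocMap implements DocMapSketch {
  private final int[] oldToNew;

  /**
   * Assumes newToOld is a permutation of 0..n-1 giving the sorted order
   * (new doc i was old doc newToOld[i]); inverts it into oldToNew.
   */
  IntArrayDocMap(int[] newToOld) {
    oldToNew = new int[newToOld.length];
    for (int newDoc = 0; newDoc < newToOld.length; newDoc++) {
      oldToNew[newToOld[newDoc]] = newDoc;
    }
  }

  @Override
  public int oldToNew(int docID) {
    return oldToNew[docID]; // one array load on the hot path
  }

  @Override
  public int size() {
    return oldToNew.length;
  }

  /**
   * Called once the segment is sealed. Stand-in for packing into
   * PackedLongValues: narrows to 16 bits per doc when every doc ID fits
   * in a char; otherwise this sketch just keeps the int[] form.
   */
  DocMapSketch pack() {
    final int n = oldToNew.length;
    if (n > (1 << 16)) {
      return this;
    }
    final char[] packed = new char[n];
    for (int i = 0; i < n; i++) {
      packed[i] = (char) oldToNew[i];
    }
    return new DocMapSketch() {
      @Override
      public int oldToNew(int docID) {
        return packed[docID];
      }

      @Override
      public int size() {
        return n;
      }
    };
  }
}
```

During flush, the writers would call oldToNew on the `IntArrayDocMap`; only the result of `pack()` would be handed on for long-term retention, so the wide array becomes garbage once the flush completes.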

github-actions (bot) added this to the 10.5.0 milestone on May 11, 2026
@Tim-Brooks
Contributor Author

Tim-Brooks commented May 11, 2026

During a flush, reading packed values is a major performance hit, particularly when flushing points.


Keeping the unpacked version for the duration of the flush (which is still short) resolves this, and Lucene has already allocated the memory. The memory is still bounded directly by maxBufferedDocs and indirectly by ramBufferSizeMB. I assumed it was unnecessary to add switches or configuration options to tune the proposed behavior, but I can add them if that is desired.

For context, a segment with 128K documents takes ~512KB for this int[] (4 bytes per document). Lucene already creates this array for the initial sorter; it just releases it within the same method scope. I am proposing that we keep it for the lifetime of the flush and then replace it with a packed version once we have a sealed segment.
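The memory figure can be checked with a little arithmetic. The helper below is a hypothetical back-of-the-envelope calculator: it counts 4 bytes per document for the int[] (ignoring the array header) and compares that with an idealized fixed-width packing that uses just enough bits per doc ID. The packed number is only an illustration, not PackedLongValues' actual footprint, which depends on its paging and encoding.

```java
/** Hypothetical back-of-the-envelope memory estimates for a doc map. */
final class DocMapMemory {
  /** Plain int[]: 4 bytes per document (array header ignored). */
  static long intArrayBytes(int maxDoc) {
    return (long) Integer.BYTES * maxDoc;
  }

  /** Bits needed to store any doc ID in [0, maxDoc). */
  static int bitsPerDoc(int maxDoc) {
    return maxDoc <= 1 ? 1 : 32 - Integer.numberOfLeadingZeros(maxDoc - 1);
  }

  /** Idealized fixed-width packing, rounded up to whole bytes. */
  static long fixedWidthPackedBytes(int maxDoc) {
    return ((long) maxDoc * bitsPerDoc(maxDoc) + 7) / 8;
  }
}
```

For maxDoc = 131072 (128K), `intArrayBytes` returns 524,288 bytes (512 KiB), consistent with the ~512KB estimate, while `fixedWidthPackedBytes` returns 278,528 bytes (272 KiB) at 17 bits per document. Under these assumptions, packing roughly halves the retained memory, which matters for long-lived segments but less so for the short window of a flush.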

