Reclaim orphaned chain pages after spill #379

@tjgreen42

Summary

When tp_spill_finalize publishes a freshly-built L0 segment from the
in-flight memtable chain, the old chain pages are not WAL-touched
(they are simply unlinked from the metapage's memtable_head_blkno).
On disk they remain as unreferenced dead pages (free at query time,
but still occupying space) until something reclaims them. There is
currently no such reclaim path.

This is a storage leak, not a correctness issue: queries don't see
the orphans (the chain head moves), and stock PG WAL replay reaches
the same end-state on standbys. But under workloads with many
spills the relation file grows monotonically.

The path to reclamation is the same as for nbtree's deleted pages:
stamp each unlinked chain page with a merged_at_xid horizon on spill,
then have a future amvacuumcleanup pass FSM-recycle pages whose
horizon is older than the oldest xmin any snapshot can still hold
(RecentGlobalXmin before PostgreSQL 14, the GlobalVisCheckRemovableXid
family since). The horizon must also be standby-safe; the same
mechanism is required by #(see standby horizon issue) for displaced
segment pages.
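
The stamp-then-recycle test described above can be sketched as a
standalone model. This is an illustration under assumptions, not the
extension's code: ChainPageStamp, stamp_on_spill and
chain_page_is_recyclable are hypothetical names, and the horizon is
passed in as a plain integer rather than obtained from the server's
snapshot machinery. The wraparound-aware comparison follows the same
modulo-2^32 idea PostgreSQL uses for normal XIDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PostgreSQL's 32-bit TransactionId. */
typedef uint32_t Xid;

#define INVALID_XID ((Xid) 0)

/* Hypothetical per-page stamp written when a chain page is unlinked. */
typedef struct ChainPageStamp
{
    Xid merged_at_xid;   /* INVALID_XID while the page is still linked */
} ChainPageStamp;

/* Wraparound-aware "a is older than b" (modulo-2^32 comparison). */
static bool
xid_precedes(Xid a, Xid b)
{
    return (int32_t) (a - b) < 0;
}

/* On spill: record the horizon XID in the page being unlinked. */
static void
stamp_on_spill(ChainPageStamp *page, Xid spill_xid)
{
    assert(spill_xid != INVALID_XID);
    page->merged_at_xid = spill_xid;
}

/*
 * During vacuum cleanup: a page may go back to the FSM only if it was
 * stamped and its stamp is strictly older than the oldest xmin that
 * any snapshot (primary or standby) could still hold.
 */
static bool
chain_page_is_recyclable(const ChainPageStamp *page, Xid oldest_xmin)
{
    if (page->merged_at_xid == INVALID_XID)
        return false;    /* never unlinked, so still part of a chain */
    return xid_precedes(page->merged_at_xid, oldest_xmin);
}
```

The strict comparison is deliberate in this sketch: a page stamped at
exactly the oldest xmin is kept for one more vacuum cycle rather than
recycled early.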

Acceptance criteria

  • After N successful spills with K chain pages each, on-disk relation
    size grows by O(active chain pages), not O(N×K).
  • amvacuumcleanup reclaims orphaned chain pages once their horizon
    is past every active snapshot on primary and replicas.
  • Reclaimed pages are returned to the FSM so subsequent
    ExtendBufferedRel calls reuse them instead of growing the relation.
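
The first and third criteria can be illustrated with a toy page-count
model. Everything here is hypothetical (ToyRelation, toy_run and the
other toy_ helpers are not part of the extension), the model tracks
only chain pages and ignores the L0 segment pages that spills
legitimately add, and it assumes every orphan's horizon has already
passed by the time cleanup runs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy relation: only page counts are tracked, no page contents. */
typedef struct ToyRelation
{
    size_t nblocks;     /* on-disk size in pages */
    size_t free_pages;  /* pages currently recorded in the toy FSM */
    size_t orphans;     /* unlinked chain pages not yet reclaimed */
} ToyRelation;

/* Allocate one page: reuse a free page if any, else extend the relation. */
static void
toy_alloc_page(ToyRelation *rel)
{
    if (rel->free_pages > 0)
        rel->free_pages--;
    else
        rel->nblocks++;
}

/* Build a k-page chain, then spill it: all k pages become orphans. */
static void
toy_fill_and_spill(ToyRelation *rel, size_t k)
{
    for (size_t i = 0; i < k; i++)
        toy_alloc_page(rel);
    rel->orphans += k;
}

/* Cleanup pass: hand every orphan to the toy FSM (horizons assumed passed). */
static void
toy_vacuum_cleanup(ToyRelation *rel)
{
    rel->free_pages += rel->orphans;
    rel->orphans = 0;
}

/* Run n spills of k pages each; return the final relation size in pages. */
static size_t
toy_run(size_t n, size_t k, int reclaim)
{
    ToyRelation rel = {0, 0, 0};

    for (size_t i = 0; i < n; i++)
    {
        toy_fill_and_spill(&rel, k);
        if (reclaim)
            toy_vacuum_cleanup(&rel);
    }
    return rel.nblocks;
}
```

With n = 100 and k = 8, toy_run returns 800 pages without reclaim
(the O(N×K) leak) and 8 pages with it (bounded by the active chain).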

Notes

  • The leak is documented in tp_spill_finalize's docstring.
  • See also the matching issue for displaced segment pages during
    merge — same horizon mechanism.
