
gh-142889: Restructure PyDictKeysObject memory layout for simpler access #145097

Open
clintonsteiner wants to merge 12 commits into python:main from clintonsteiner:gh-142889


@clintonsteiner
Contributor

@clintonsteiner clintonsteiner commented Feb 22, 2026

Restructure dict keys allocation to store dk_indices before the PyDictKeysObject header and keep dk_entries after the header.

Update dict index access and related allocation/free/clone paths, adjust gdb dict entry location logic, and add layout coverage tests.

Local dict microbenchmarks showed about a 1.4% overall improvement, with most operations around 1-2% faster.
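
To make the described layout concrete, here is a minimal C sketch of a single allocation that holds the index table before the header and the entries after it. Everything in it (`demo_keys`, `demo_entry`, `demo_keys_new`, the `indices_bytes` field, the `round_up` helper) is a hypothetical stand-in for illustration and not the actual CPython definitions or the code in this PR. It also assumes 1-byte indices only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; NOT the real CPython structs. */
typedef struct {
    size_t hash;
    void *key;
    void *value;
} demo_entry;

typedef struct {
    size_t log2_size;      /* table has 1 << log2_size index slots */
    size_t indices_bytes;  /* padded size of the index block before the header */
    size_t nentries;
    demo_entry entries[];  /* entries follow the header */
} demo_keys;

/* Round n up to a multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align) {
    return (n + align - 1) & ~(align - 1);
}

/* One malloc block laid out as: [ indices | header | entries ]. */
static demo_keys *demo_keys_new(size_t log2_size, size_t max_entries) {
    size_t nslots = (size_t)1 << log2_size;
    /* Pad the index block so the header that follows stays aligned. */
    size_t indices_bytes = round_up(nslots * sizeof(int8_t), _Alignof(demo_keys));
    size_t total = indices_bytes + sizeof(demo_keys)
                   + max_entries * sizeof(demo_entry);
    char *base = malloc(total);
    if (base == NULL) {
        return NULL;
    }
    memset(base, 0xff, indices_bytes);  /* every slot starts as -1 (empty) */
    demo_keys *keys = (demo_keys *)(base + indices_bytes);
    keys->log2_size = log2_size;
    keys->indices_bytes = indices_bytes;
    keys->nentries = 0;
    return keys;
}

/* Start of the index block, which is also the start of the allocation. */
static int8_t *demo_keys_indices(demo_keys *keys) {
    return (int8_t *)((char *)keys - keys->indices_bytes);
}

/* free() must receive the allocation base, not the header pointer. */
static void demo_keys_free(demo_keys *keys) {
    free((char *)keys - keys->indices_bytes);
}
```

In this sketch the entries sit at a fixed offset from the header pointer regardless of table size, which is one plausible reading of the "simpler access" in the title. The free path has to step back to the allocation base, which matches the description's mention of updated allocation/free/clone paths.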

@python-cla-bot

python-cla-bot bot commented Feb 22, 2026

All commit authors signed the Contributor License Agreement.

CLA signed

…er entry access

Member

@markshannon markshannon left a comment


This looks promising.
I've a few suggestions, but nothing major.

clintonsteiner and others added 8 commits March 11, 2026 14:38
…ucture

  Replace char dk_indices[] with an explicit union named dk_entries.
  Because the union has non-zero size, update allocation and size reporting
  to use offsetof in place of sizeof, keeping the actual memory footprint unchanged.
  Remove _DK_INDICES_END() and inline its uses in the macros.
  Add a PyDictKeysObject pointer note to the layout diagram for dictobject.c.
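
The sizeof-versus-offsetof point can be illustrated with a small C sketch. The struct names (`keys_flex`, `keys_union`), the union's members, and `keys_alloc_size` are hypothetical and chosen for illustration; the actual union in this PR may look different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry types; NOT the real CPython definitions. */
typedef struct { size_t hash; void *key; void *value; } demo_entry;
typedef struct { void *key; void *value; } demo_unicode_entry;

/* Old style: a flexible array member adds nothing to sizeof. */
typedef struct {
    size_t refcnt;
    size_t usable;
    char tail[];
} keys_flex;

/* New style: a trailing union has non-zero size, so sizeof(keys_union)
 * already includes room for one entry. */
typedef struct {
    size_t refcnt;
    size_t usable;
    union {
        demo_entry general[1];
        demo_unicode_entry unicode[1];
    } entries;
} keys_union;

/* Header size that excludes the trailing union. */
#define KEYS_HEADER_SIZE offsetof(keys_union, entries)

/* Allocation size computed from the header size, so the footprint
 * matches the flexible-array version. */
static size_t keys_alloc_size(size_t indices_bytes, size_t nentries) {
    return indices_bytes + KEYS_HEADER_SIZE + nentries * sizeof(demo_entry);
}
```

Using `sizeof(keys_union)` in `keys_alloc_size` would over-allocate by the size of the union for every table, which is why the size computation switches to `offsetof` in this sketch.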
…ointer arithmetic

The previous macros used negative indexing from the PyDictKeysObject pointer
(e.g. ((int8_t*)keys)[-1-idx]) to access the indices stored before the struct.
This is undefined behavior: the compiler sees keys as a pointer of type PyDictKeysObject*,
so negative indices are formally out of bounds, which allows MSVC's release optimizer
to generate incorrect hash table lookups.

Fix by using _DK_INDICES_BASE(keys) (the actual malloc base) with positive forward
indexing, which is well-defined pointer arithmetic within the allocation.
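
A minimal C sketch of the forward-indexing idea follows. The names (`demo_hdr`, `demo_indices_base`, `demo_get_index`, `demo_set_index`, `demo_new`) are hypothetical, and both the way the base pointer is recovered and the slot order are assumptions; the real `_DK_INDICES_BASE` macro may be defined differently. The slot order mirrors the `[-1-idx]` expression quoted above, so slot 0 is the byte directly before the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header; NOT the real PyDictKeysObject. */
typedef struct {
    size_t nslots;  /* number of int8_t index slots stored before the header */
} demo_hdr;

/* Hypothetical analogue of _DK_INDICES_BASE: start of the malloc'd block. */
static int8_t *demo_indices_base(demo_hdr *keys) {
    return (int8_t *)((char *)keys - keys->nslots);
}

/* Read slot i using a non-negative offset from the allocation base. */
static int8_t demo_get_index(demo_hdr *keys, size_t i) {
    assert(i < keys->nslots);
    return demo_indices_base(keys)[keys->nslots - 1 - i];
}

/* Write slot i using the same forward offset. */
static void demo_set_index(demo_hdr *keys, size_t i, int8_t value) {
    assert(i < keys->nslots);
    demo_indices_base(keys)[keys->nslots - 1 - i] = value;
}

/* Allocate [ indices | header ]; nslots must keep the header aligned. */
static demo_hdr *demo_new(size_t nslots) {
    assert(nslots % _Alignof(demo_hdr) == 0);
    char *base = malloc(nslots + sizeof(demo_hdr));
    if (base == NULL) {
        return NULL;
    }
    memset(base, 0xff, nslots);  /* every slot starts as -1 (empty) */
    demo_hdr *keys = (demo_hdr *)(base + nslots);
    keys->nslots = nslots;
    return keys;
}
```

Every index access in this sketch goes through a pointer to the start of the allocation with an offset in `[0, nslots)`, so no expression subscripts the header pointer with a negative index.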