gh-145742: Manually emit _LOAD_FAST_BORROW to reduce stencil bloat by corona10 · Pull Request #148217 · python/cpython

corona10 · 2026-04-07T13:59:09Z

Manually emit _LOAD_FAST_BORROW at JIT compile time, encoding the operand offset directly into the instruction instead of loading it from the GOT at runtime.
This shrinks the generic case (oparg ≥ 8) from 28 bytes to 8 bytes and eliminates 27 stencil functions.
I compared the generated machine code on Godbolt:
- x86-64 / aarch64: https://godbolt.org/z/3M9zKeosj
- i686: https://godbolt.org/z/cdjdzev5Y
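The sketch below is only a minimal illustration of the idea for x86-64. It assumes a hypothetical register assignment (frame pointer in r13, result in rax) and a hypothetical LOCALSPLUS_OFFSET of 80 bytes; neither comes from the PR. Rather than going through a GOT entry at run time, the emitter computes the byte offset of localsplus[oparg] at JIT compile time and writes it directly into the 32-bit displacement of a `mov rax, [r13 + disp32]` template.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical offset of localsplus inside the interpreter frame, for
   illustration only; real code would use offsetof() on the frame struct. */
#define LOCALSPLUS_OFFSET 80

/* mov rax, qword ptr [r13 + disp32]  (hypothetical registers)
   0x49 = REX.W + REX.B (64-bit operand, base is r8-r15)
   0x8B = MOV r64, r/m64
   0x85 = ModRM mod=10 (disp32), reg=000 (rax), rm=101 (r13)
   The last four bytes are the displacement "hole", patched below. */
static const unsigned char LOAD_TEMPLATE[7] = {
    0x49, 0x8B, 0x85, 0x00, 0x00, 0x00, 0x00
};

/* Emit the load of localsplus[oparg] into code; returns bytes written. */
static size_t
emit_load_fast_borrow_x86_64(unsigned char *code, uint32_t oparg)
{
    int64_t offset = LOCALSPLUS_OFFSET + (int64_t)oparg * 8;
    /* The displacement field is a signed 32-bit immediate, so this range
       check fails for very large opargs. */
    assert(offset <= INT32_MAX);
    uint32_t disp = (uint32_t)offset;
    memcpy(code, LOAD_TEMPLATE, sizeof(LOAD_TEMPLATE));
    /* Write the displacement little-endian into bytes 3..6. */
    for (int i = 0; i < 4; i++) {
        code[3 + i] = (unsigned char)(disp >> (8 * i));
    }
    return sizeof(LOAD_TEMPLATE);
}
```

For example, with oparg 10 the sketch writes 49 8B 85 A0 00 00 00, which decodes as mov rax, [r13 + 0xA0].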

Issue: Some (mostly) easy ways to reduce the size of JIT generated code #145742

corona10 · 2026-04-08T16:00:38Z

For i686: https://godbolt.org/z/cdjdzev5Y

diegorusso · 2026-04-08T16:08:56Z

Some initial feedback on this:

- We should not pollute jit.c with uop implementations. They should live in separate compilation units and have the same signature as the other emitters (e.g.: void emit__UOP_NAME(unsigned char *code, unsigned char *data, _PyExecutorObject *executor, const _PyUOpInstruction *instruction, jit_state *state)).
- #ifdefs can select the right architecture-specific version of the custom implementation.
- In bytecodes.c we should have a way to tell the JIT machinery not to generate any code for a specific uop, but the uop should still be accounted for in the table in jit-stencils-*.h (static const StencilGroup stencil_groups[MAX_UOP_REGS_ID + 1]). The linker will later pick up our own version of the uop implementation.
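A minimal sketch of that layout, for illustration only. The stand-in types, the enum names, LOCALSPLUS_OFFSET (assumed to be 80 bytes) and the register choices are hypothetical simplifications, not CPython's actual definitions. The hand-written emitter shares the common signature, #ifdefs select the per-architecture body, and the stencil table still carries an entry for the uop even though no generated stencil exists for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical stand-ins for CPython's internal JIT types. */
typedef struct { uint16_t opcode; uint16_t oparg; } _PyUOpInstruction;
typedef struct _PyExecutorObject _PyExecutorObject;  /* opaque here */
typedef struct jit_state jit_state;                  /* opaque here */

/* Every emitter, generated or hand-written, shares one signature. */
typedef void (*emit_fn)(unsigned char *code, unsigned char *data,
                        _PyExecutorObject *executor,
                        const _PyUOpInstruction *instruction,
                        jit_state *state);

typedef struct {
    size_t code_size;
    size_t data_size;
    emit_fn emit;
} StencilGroup;

#define LOCALSPLUS_OFFSET 80  /* hypothetical frame layout */

#if defined(__x86_64__) || defined(_M_X64)
#define LOAD_FAST_BORROW_CODE_SIZE 7
#else
#define LOAD_FAST_BORROW_CODE_SIZE 4
#endif

/* Would live in its own compilation unit rather than in jit.c. */
void
emit__LOAD_FAST_BORROW(unsigned char *code, unsigned char *data,
                       _PyExecutorObject *executor,
                       const _PyUOpInstruction *instruction,
                       jit_state *state)
{
    (void)data; (void)executor; (void)state;
    uint32_t offset = LOCALSPLUS_OFFSET + (uint32_t)instruction->oparg * 8;
#if defined(__x86_64__) || defined(_M_X64)
    /* mov rax, [r13 + disp32] (hypothetical registers) */
    code[0] = 0x49; code[1] = 0x8B; code[2] = 0x85;
    for (int i = 0; i < 4; i++) {
        code[3 + i] = (unsigned char)(offset >> (8 * i));
    }
#elif defined(__aarch64__) || defined(_M_ARM64)
    /* ldr x0, [x20, #offset] (hypothetical registers); imm12 is scaled
       by 8, so this range check fails for large opargs. */
    assert(offset / 8 <= 0xFFF);
    uint32_t insn = 0xF9400000u | ((offset / 8) << 10) | (20u << 5);
    memcpy(code, &insn, sizeof(insn));  /* assumes a little-endian host */
#else
#error "no hand-written _LOAD_FAST_BORROW for this architecture"
#endif
}

/* Stands in for a generated emitter. */
static void
emit__NOP(unsigned char *code, unsigned char *data,
          _PyExecutorObject *executor, const _PyUOpInstruction *instruction,
          jit_state *state)
{
    (void)code; (void)data; (void)executor; (void)instruction; (void)state;
}

enum { UOP_NOP, UOP_LOAD_FAST_BORROW, MAX_UOP_ID = UOP_LOAD_FAST_BORROW };

static const StencilGroup stencil_groups[MAX_UOP_ID + 1] = {
    [UOP_NOP] = {0, 0, emit__NOP},
    /* No generated stencil, but the entry is still accounted for and
       points at the hand-written emitter. */
    [UOP_LOAD_FAST_BORROW] = {LOAD_FAST_BORROW_CODE_SIZE, 0,
                              emit__LOAD_FAST_BORROW},
};
```

Keeping the entry in the table means the code that sizes and emits a trace does not need to know which uops are hand-written.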

corona10 · 2026-04-08T16:13:51Z

Thanks, @diegorusso. I’ll keep working on this based on your feedback.

markshannon · 2026-04-09T17:21:41Z

A couple of other things:

- This PR asserts that the immediate value fits into the space given, but this assertion will fail for larger opargs.
- I don't know if this matters, but the x86 code is inferior to that generated by the stencils for oparg 0-5. For example, for LOAD_FAST_BORROW_1_r01 the stencil-generated code uses a 1-byte offset instead of the 4-byte offset this PR generates. For oparg > 5, the code is the same.

I think you need to split _LOAD_FAST_BORROW into two variants for the JIT: one for all normal opargs, which can use manual code generation, and a generated fallback for huge opargs.
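One way to read that suggestion, sketched for x86-64 only under hypothetical assumptions (LOCALSPLUS_OFFSET assumed to be 80 bytes, registers rax and r13, oparg widened to uint32_t): pick the shortest manual encoding the displacement allows, and return 0 so the caller falls back to the generated stencil when the displacement does not fit in 32 bits. The variant names and helpers are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOCALSPLUS_OFFSET 80  /* hypothetical, for illustration only */

typedef enum {
    VARIANT_MANUAL_DISP8,   /* hand-emitted, 1-byte displacement */
    VARIANT_MANUAL_DISP32,  /* hand-emitted, 4-byte displacement */
    VARIANT_STENCIL,        /* generated fallback for huge opargs */
} LoadFastVariant;

static LoadFastVariant
choose_variant(uint32_t oparg)
{
    int64_t offset = LOCALSPLUS_OFFSET + (int64_t)oparg * 8;
    if (offset <= INT8_MAX) {
        return VARIANT_MANUAL_DISP8;
    }
    if (offset <= INT32_MAX) {
        return VARIANT_MANUAL_DISP32;
    }
    return VARIANT_STENCIL;
}

/* Emit mov rax, [r13 + disp] with the shortest encoding. Returns the
   number of bytes written, or 0 if the stencil fallback must be used. */
static size_t
emit_manual(unsigned char *code, uint32_t oparg)
{
    int64_t offset = LOCALSPLUS_OFFSET + (int64_t)oparg * 8;
    switch (choose_variant(oparg)) {
    case VARIANT_MANUAL_DISP8:
        code[0] = 0x49;                  /* REX.W + REX.B */
        code[1] = 0x8B;                  /* MOV r64, r/m64 */
        code[2] = 0x45;                  /* ModRM: mod=01 (disp8), rm=r13 */
        code[3] = (unsigned char)offset;
        return 4;
    case VARIANT_MANUAL_DISP32:
        code[0] = 0x49;
        code[1] = 0x8B;
        code[2] = 0x85;                  /* ModRM: mod=10 (disp32), rm=r13 */
        for (int i = 0; i < 4; i++) {
            code[3 + i] = (unsigned char)((uint64_t)offset >> (8 * i));
        }
        return 7;
    case VARIANT_STENCIL:
    default:
        return 0;
    }
}
```

Under the assumed 80-byte offset, opargs 0 to 5 get the 4-byte disp8 form and larger ones the 7-byte disp32 form; the real cut-off depends on the actual frame layout.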

c6fa882 pythongh-145742: Manually emit _LOAD_FAST_BORROW to reduce stencil bloat

corona10 requested review from brandtbucher, diegorusso, markshannon and savannahostrowski as code owners April 7, 2026 13:59

bedevere-app bot added the awaiting core review label Apr 7, 2026

corona10 added 2 commits April 9, 2026 00:53:
- e24e176 Support i686
- d459f5e Merge remote-tracking branch 'upstream/main' into pythongh-145742-impl

corona10 wants to merge 3 commits into python:main from corona10:gh-145742-impl
