
perf: replace dis.get_instructions with direct co_code parsing in from_code#194

Merged
MatthieuDartiailh merged 9 commits into MatthieuDartiailh:main from P403n1x87:perf/avoid-dis-location-overhead on May 9, 2026


@P403n1x87
Contributor

dis.get_instructions performs two full passes over the bytecode:

  • _make_labels_map → findlabels → _unpack_opargs (to build a jump-label map)
  • _get_instructions_bytes (to iterate instructions with full metadata)

Neither pass is needed here. ConcreteBytecode.from_code only needs the opname, raw arg byte, and source positions for each instruction word — all of which are directly available from co_code and co_positions().

CACHE entries are already inline in co_code on all supported Python versions, so direct 2-byte iteration handles them naturally without the per-version cache_info loop that 3.13 previously required.
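As a rough illustration of the direct iteration described above, here is a minimal sketch that walks `co_code` two bytes at a time and pairs each code unit with its entry from `co_positions()`. The names `iter_code_units` and `has_argument` are hypothetical; `has_argument` is only a stand-in for the library's `opcode_has_argument` helper, and `None` is used where the library uses its own `UNSET` sentinel. This is not the code merged in this PR.

```python
import dis
from typing import Iterator, Optional, Tuple

Position = Tuple[Optional[int], Optional[int], Optional[int], Optional[int]]
_NO_POSITION: Position = (None, None, None, None)

# dis.hasarg exists on Python 3.12+; older versions fall back to the
# HAVE_ARGUMENT threshold.
_HASARG = frozenset(getattr(dis, "hasarg", ()))


def has_argument(op: int) -> bool:
    """Hypothetical stand-in for the library's opcode_has_argument helper."""
    if _HASARG:
        return op in _HASARG
    return op >= dis.HAVE_ARGUMENT


def iter_code_units(code) -> Iterator[Tuple[int, str, Optional[int], Position]]:
    """Yield (offset, opname, raw arg byte or None, position) per 2-byte unit."""
    bc = code.co_code
    # co_positions() is available on Python 3.11+; yield empty positions otherwise.
    positions = iter(code.co_positions()) if hasattr(code, "co_positions") else iter(())
    for offset in range(0, len(bc), 2):
        op = bc[offset]
        arg = bc[offset + 1] if has_argument(op) else None
        # CACHE entries are ordinary 2-byte units here, so no special casing.
        yield offset, dis.opname[op], arg, next(positions, _NO_POSITION)
```

Because every unit is visited in a single pass, no jump-label map is built and no per-instruction metadata object is created, which is where the saving over `dis.get_instructions` would come from.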

Throughput (round-trips of Bytecode.from_code().to_code() on the dis module's own code object, timed over 1 second, 3 runs each):

Before: 92–94 round-trips/s
After: 107–111 round-trips/s (~+17%)
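For anyone wanting to reproduce numbers of this kind, a timed-window throughput measurement could be sketched as follows. The helper names `round_trips_per_second` and `load_dis_module_code` are hypothetical and not part of this PR; the sketch assumes the `dis` module is available as a plain `.py` source file, and the actual harness used for the figures above may differ.

```python
import dis
import time
from typing import Callable


def round_trips_per_second(fn: Callable[[], object], window: float = 1.0) -> float:
    """Call fn repeatedly for about `window` seconds; return calls per second."""
    count = 0
    start = time.perf_counter()
    deadline = start + window
    while time.perf_counter() < deadline:
        fn()
        count += 1
    return count / (time.perf_counter() - start)


def load_dis_module_code():
    """Compile the dis module's source into a module-level code object."""
    with open(dis.__file__, encoding="utf-8") as f:
        return compile(f.read(), dis.__file__, "exec")


# With the bytecode package installed, the measured callable would be:
#     code = load_dis_module_code()
#     round_trips_per_second(lambda: Bytecode.from_code(code).to_code())
```

Running the measurement several times and reporting the range, as done above, helps separate the change from run-to-run noise.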

Own CPU time figures:

| Function | Before | After |
|---|---|---|
| `dis._unpack_opargs` | 5.98% | 0.0% |
| `dis._get_instructions_bytes` | 3.45% | 0.0% |
| `ConcreteBytecode.from_code` | 3.63% | 4.91% |

…m_code

dis.get_instructions performs two full passes over the bytecode:
- _make_labels_map → findlabels → _unpack_opargs (to build a jump-label map)
- _get_instructions_bytes (to iterate instructions with full metadata)

Neither pass is needed here. ConcreteBytecode.from_code only needs the
opname, raw arg byte, and source positions for each instruction word —
all of which are directly available from co_code and co_positions().

CACHE entries are already inline in co_code on all supported Python
versions, so direct 2-byte iteration handles them naturally without the
per-version cache_info loop that 3.13 previously required.

Throughput (round-trips of Bytecode.from_code().to_code() on the dis
module's own code object, timed over 1 second, 3 runs each):

  Before: 92–94 round-trips/s
  After:  107–111 round-trips/s  (~+17%)

Austin CPU profile figures:

  dis._unpack_opargs:          5.98% own  →  eliminated
  dis._get_instructions_bytes: 3.45% own  →  eliminated
  ConcreteBytecode.from_code:  3.63% own  →  4.91% own
@codecov-commenter

codecov-commenter commented May 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.17%. Comparing base (9de3e78) to head (9f53642).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #194      +/-   ##
==========================================
- Coverage   95.21%   95.17%   -0.05%     
==========================================
  Files           7        7              
  Lines        2048     2051       +3     
  Branches      448      446       -2     
==========================================
+ Hits         1950     1952       +2     
- Misses         54       55       +1     
  Partials       44       44              


@P403n1x87 P403n1x87 marked this pull request as ready for review May 8, 2026 13:48
Owner

@MatthieuDartiailh MatthieuDartiailh left a comment


Very nice !!! Getting rid of dis without complications is something I wished I had known was possible.

A couple of comments but LGTM

P403n1x87 and others added 2 commits May 8, 2026 15:48
Co-authored-by: Matthieu Dartiailh <marul@laposte.net>
@P403n1x87
Contributor Author

Throughput up to ~130 after updating the PR

Add two fast-path factory methods that skip validation by using
object.__new__ + direct slot assignment, for call sites where the
inputs are already known to be valid:

**InstrLocation._from_tuple** — replaces InstrLocation(...) at four
internal sites where positions come from trusted sources (existing
InstrLocation.lineno, SetLineno.lineno, first_lineno):
- ConcreteBytecode.to_bytecode (fallback lineno-only location)
- ConcreteBytecode._pack_location (propagated from existing location)
- _ConvertBytecodeToConcrete.concrete_instructions (first_lineno seed
  and SetLineno-derived locations)

**BaseInstr._from_trusted** — replaces Instr(name, arg, location=loc)
in ConcreteBytecode.to_bytecode, where name/opcode/arg/location are all
derived from already-validated ConcreteInstr objects.

CPU own-time profile data:

| Hotspot | Before | After |
|---|---|---|
| `ConcreteBytecode.to_bytecode` | 5.98% | 5.07% |
| `Instr._check_arg` | 2.87% | eliminated |
| `BaseInstr._set` (via to_bytecode) | 1.48% | eliminated |
| `BaseInstr._from_trusted` | — | <1% (not in top 20) |

Throughput (Bytecode.from_code().to_code() on dis module's code object,
1 second timed window, 5 runs):

| | r/s range |
|---|---|
| Before | 103–108 |
| After | 109–114 |
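The `object.__new__` plus direct slot assignment pattern described in this commit message can be illustrated with a small self-contained sketch. The `Location` class below is hypothetical and much simpler than `InstrLocation` or `BaseInstr`; it only shows the general technique of bypassing a validating `__init__` when the inputs are already known to be valid.

```python
class Location:
    """Hypothetical stand-in for a validated, slotted value class."""

    __slots__ = ("lineno", "end_lineno")

    def __init__(self, lineno, end_lineno=None):
        # The public constructor validates every field.
        for value in (lineno, end_lineno):
            if value is not None and (not isinstance(value, int) or value < 0):
                raise ValueError(f"invalid line number: {value!r}")
        self.lineno = lineno
        self.end_lineno = end_lineno

    @classmethod
    def _from_trusted(cls, lineno, end_lineno=None):
        # Fast path: skip __init__ (and its checks) for already-validated input.
        self = object.__new__(cls)
        self.lineno = lineno
        self.end_lineno = end_lineno
        return self
```

The trade-off is that `_from_trusted` will happily accept invalid values, so a factory like this is best kept private and used only at call sites whose inputs come from objects that were validated earlier.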
Comment thread src/bytecode/concrete.py Outdated
    Tuple[Optional[int], Optional[int], Optional[int], Optional[int]]
] = iter(code.co_positions())
for offset in range(0, len(bc), 2):
    arg = bc[offset + 1] if opcode_has_argument(op := bc[offset]) else UNSET
Owner


Won't this be problematic for Cython ?

Contributor Author


It shouldn't be, the issue is specific to certain ad-hoc optimisations (cf. cython/cython#7670)

Owner


Actually, in this particular occurrence I do not find the walrus very readable. Could you go back to first fetching the op and then using it to get the arg?

@MatthieuDartiailh MatthieuDartiailh merged commit 25bf1bc into MatthieuDartiailh:main May 9, 2026
10 checks passed