perf: replace dis.get_instructions with direct co_code parsing in from_code#194
Conversation
…m_code dis.get_instructions performs two full passes over the bytecode: - _make_labels_map → findlabels → _unpack_opargs (to build a jump-label map) - _get_instructions_bytes (to iterate instructions with full metadata) Neither pass is needed here. ConcreteBytecode.from_code only needs the opname, raw arg byte, and source positions for each instruction word — all of which are directly available from co_code and co_positions(). CACHE entries are already inline in co_code on all supported Python versions, so direct 2-byte iteration handles them naturally without the per-version cache_info loop that 3.13 previously required. Throughput (round-trips of Bytecode.from_code().to_code() on the dis module's own code object, timed over 1 second, 3 runs each): Before: 92–94 round-trips/s After: 107–111 round-trips/s (~+17%) Austin CPU profile figures: dis._unpack_opargs: 5.98% own → eliminated dis._get_instructions_bytes: 3.45% own → eliminated ConcreteBytecode.from_code: 3.63% own → 4.91% own
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #194 +/- ##
==========================================
- Coverage 95.21% 95.17% -0.05%
==========================================
Files 7 7
Lines 2048 2051 +3
Branches 448 446 -2
==========================================
+ Hits 1950 1952 +2
- Misses 54 55 +1
Partials 44 44 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
MatthieuDartiailh
left a comment
There was a problem hiding this comment.
Very nice !!! Getting rid of dis without complications is something I wished I had known was possible.
A couple of comments but LGTM
Co-authored-by: Matthieu Dartiailh <marul@laposte.net>
|
Throughput up to ~130 after updating the PR |
Add two fast-path factory methods that skip validation by using object.__new__ + direct slot assignment, for call sites where the inputs are already known to be valid: **InstrLocation._from_tuple** — replaces InstrLocation(...) at four internal sites where positions come from trusted sources (existing InstrLocation.lineno, SetLineno.lineno, first_lineno): - ConcreteBytecode.to_bytecode (fallback lineno-only location) - ConcreteBytecode._pack_location (propagated from existing location) - _ConvertBytecodeToConcrete.concrete_instructions (first_lineno seed and SetLineno-derived locations) **BaseInstr._from_trusted** — replaces Instr(name, arg, location=loc) in ConcreteBytecode.to_bytecode, where name/opcode/arg/location are all derived from already-validated ConcreteInstr objects. CPU own-time profile data: | Hotspot | Before | After | |---|---|---| | `ConcreteBytecode.to_bytecode` | 5.98% | 5.07% | | `Instr._check_arg` | 2.87% | eliminated | | `BaseInstr._set` (via to_bytecode) | 1.48% | eliminated | | `BaseInstr._from_trusted` | — | <1% (not in top 20) | Throughput (Bytecode.from_code().to_code() on dis module's code object, 1 second timed window, 5 runs): | | r/s range | |---|---| | Before | 103–108 | | After | 109–114 |
| Tuple[Optional[int], Optional[int], Optional[int], Optional[int]] | ||
| ] = iter(code.co_positions()) | ||
| for offset in range(0, len(bc), 2): | ||
| arg = bc[offset + 1] if opcode_has_argument(op := bc[offset]) else UNSET |
There was a problem hiding this comment.
Won't this be problematic for Cython ?
There was a problem hiding this comment.
It shouldn't be, the issue is specific to certain ad-hoc optimisations (cf. cython/cython#7670)
There was a problem hiding this comment.
Actually in this particular occurance I do not find the walrus very readable. Could you go back to first fetching the op and then using it to get the arg ?
dis.get_instructionsperforms two full passes over the bytecode:_make_labels_map → findlabels → _unpack_opargs(to build a jump-label map)_get_instructions_bytes(to iterate instructions with full metadata)Neither pass is needed here.
ConcreteBytecode.from_codeonly needs theopname, raw arg byte, and source positions for each instruction word — all of which are directly available fromco_codeandco_positions().CACHEentries are already inline inco_codeon all supported Python versions, so direct 2-byte iteration handles them naturally without the per-versioncache_infoloop that 3.13 previously required.Throughput (round-trips of
Bytecode.from_code().to_code()on thedismodule's own code object, timed over 1 second, 3 runs each):Before: 92–94 round-trips/s
After: 107–111 round-trips/s (~+17%)
Own CPU time figures:
dis._unpack_opargsdis._get_instructions_bytesConcreteBytecode.from_code