Use 'in <literal>' instead of str.isupper/isspace/islower in gen_moves (+3.7% nps) #132
Open
simin75simin wants to merge 1 commit into
Profiling shows that `gen_moves` accounts for ~67% of CPython search
time (measured via `cProfile` on a 5-ply search from startpos). Inside
that function, `q.isupper()`, `q.isspace()`, and `q.islower()` are
called millions of times per search. In CPython each call is a
Python-level attribute lookup plus a C call.
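One way to see the difference, offered as a sketch and not as part of the patch, is to disassemble both forms with the standard `dis` module. The helper name `opnames` is made up for illustration, and the exact opcodes emitted for the method call vary between CPython versions, so no expected output is shown.

```python
import dis


def opnames(src):
    """Return the bytecode opnames CPython emits for one expression."""
    code = compile(src, "<expr>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]


# Method form: load the bound method from the str object, then call it.
method_ops = opnames("q.isupper()")

# Literal form: a single containment opcode against a constant string.
literal_ops = opnames('q in "PNBRQK"')

if __name__ == "__main__":
    print("q.isupper()   ->", method_ops)
    print('q in "PNBRQK" ->', literal_ops)
```

On CPython 3.9 and later the second list should contain `CONTAINS_OP` and no call instruction, while the first contains a method load followed by a call.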
Substring containment against a small literal is meaningfully faster:

  timeit on 120-char board scan, 100k iters:

    c.isupper()                   3.33 us
    c in "PNBRQK"                 2.61 us  (~22% faster)
    c.isspace() or c.isupper()    4.97 us
    c in " \nPNBRQK"              3.34 us  (~33% faster)
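A micro-benchmark along these lines could look like the sketch below. It is not the script used for the numbers above: the `BOARD` constant is an assumed 120-character padded board modeled on sunfish's layout, and `scan_method`, `scan_literal`, and `bench` are hypothetical names. Absolute timings will differ by machine and CPython version.

```python
import timeit

# Assumption: a 12x10 padded board (two blank rows above and below,
# a leading space and trailing newline on every row), 120 chars total.
PAD_ROW = " " * 9 + "\n"
BOARD = (
    PAD_ROW * 2
    + " rnbqkbnr\n"
    + " pppppppp\n"
    + " ........\n" * 4
    + " PPPPPPPP\n"
    + " RNBQKBNR\n"
    + PAD_ROW * 2
)


def scan_method(board=BOARD):
    """Count own pieces using the str method."""
    n = 0
    for c in board:
        if c.isupper():
            n += 1
    return n


def scan_literal(board=BOARD):
    """Count own pieces using containment in a literal."""
    n = 0
    for c in board:
        if c in "PNBRQK":
            n += 1
    return n


def bench(fn, iters=100_000):
    """Return the mean time per full-board scan in microseconds."""
    return timeit.timeit(fn, number=iters) / iters * 1e6


if __name__ == "__main__":
    print(f"c.isupper()    {bench(scan_method):.2f} us")
    print(f'c in "PNBRQK"  {bench(scan_literal):.2f} us')
```

Both scans must return the same count for the comparison to be meaningful, so it is worth asserting that before trusting the timings.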
End-to-end across a 6-position suite at depth 5 (5 runs each, same
machine):

    original   16,711 - 17,240 nps  (mean 17,030)
    patched    17,592 - 17,685 nps  (mean 17,656)
    speedup    +3.7% (ranges don't overlap)
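The headline figure follows directly from the two means, and the "ranges don't overlap" claim from the patched minimum exceeding the original maximum. A small check using only the numbers reported above (the variable names are illustrative):

```python
# nps figures copied from the benchmark summary above.
original = {"min": 16_711, "max": 17_240, "mean": 17_030}
patched = {"min": 17_592, "max": 17_685, "mean": 17_656}

# Relative speedup of the mean, in percent.
speedup_pct = (patched["mean"] / original["mean"] - 1) * 100

# The ranges are disjoint if the slowest patched run beats the fastest original run.
ranges_disjoint = patched["min"] > original["max"]

print(f"speedup: {speedup_pct:+.1f}%")         # speedup: +3.7%
print(f"ranges disjoint: {ranges_disjoint}")   # ranges disjoint: True
```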
The search tree is unchanged: node counts are identical on every
position at every depth, perft matches across 6 positions at depth 3
(startpos=8902, kiwipete=97862, ...), and the first 8 mate-in-1 puzzles
from tools/test_files/mate1.fen produce identical `bestmove` output.
After cleanup the line count is also identical (3 functional lines
swapped 1:1).
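The tree can only stay unchanged if each literal check agrees with the method it replaces for every character that can appear on the board. A minimal sketch of that check follows. It assumes the board alphabet is space, newline, `.`, and the twelve piece letters; `BOARD_ALPHABET`, `REWRITES`, and `mismatches` are hypothetical names, not part of the patch.

```python
# Assumption: the only characters that ever appear on the board.
BOARD_ALPHABET = " \n.PNBRQKpnbrqk"

# (old predicate, new predicate) pairs for the three rewrites in gen_moves.
REWRITES = [
    (lambda q: q.isupper(), lambda q: q in "PNBRQK"),
    (lambda q: q.isspace() or q.isupper(), lambda q: q in " \nPNBRQK"),
    (lambda q: q.islower(), lambda q: q in "pnbrqk"),
]


def mismatches(alphabet=BOARD_ALPHABET):
    """Return (rewrite index, char) pairs where old and new disagree."""
    return [
        (i, q)
        for i, (old, new) in enumerate(REWRITES)
        for q in alphabet
        if bool(old(q)) != bool(new(q))
    ]


if __name__ == "__main__":
    print(mismatches())      # [] on the board alphabet
    print(mismatches("A"))   # non-empty: 'A' is uppercase but not a piece letter
```

The second call shows that the rewrite is equivalent only on the board alphabet and not for arbitrary strings, which is why it is safe here but should not be applied mechanically elsewhere.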
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Replace three Python str-method calls in `Position.gen_moves` with literal-string `in` checks. The search tree is unchanged; the only thing that changes is how each character is classified inside the move-generator loop.

`q.isupper()` -> `q in "PNBRQK"`
`q.isspace() or q.isupper()` -> `q in " \nPNBRQK"`
`q.islower()` -> `q in "pnbrqk"`

Why
`gen_moves` accounted for ~67% of CPython search time in a `cProfile` run of a 5-ply search from startpos. Inside it, the three str-methods above are called millions of times per search, and each one is a Python-level attribute lookup plus a C call. Literal `in` is faster because CPython has a specialized opcode for it (`CONTAINS_OP`) that skips the method-lookup overhead.

`timeit` over a 120-char board scan, 100k iters, CPython 3.13:

    c.isupper()                   3.33 us
    c in "PNBRQK"                 2.61 us  (~22% faster)
    c.isspace() or c.isupper()    4.97 us
    c in " \nPNBRQK"              3.34 us  (~33% faster)

Speedup
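A 6-position suite (startpos + 5 typical openings/middlegames) at fixed depth 5, 5 runs each, CPython 3.13 on Windows:

    original   16,711 - 17,240 nps  (mean 17,030)
    patched    17,592 - 17,685 nps  (mean 17,656)
    speedup    +3.7%

The nps ranges don't overlap across 5 runs, so the speedup is reproducible and not noise.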
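Correctness
- Node counts are identical on every position at every depth.
- Perft matches across 6 positions at depth 3 (startpos=8902, kiwipete=97862, ...).
- The first 8 mate-in-1 puzzles from `tools/test_files/mate1.fen` produce identical `bestmove` between original and patched.

Notes on minimalism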
`build/clean.sh sunfish.py | wc -l` is unchanged: 3 lines swapped 1:1 in `gen_moves`. The diff also adds a 3-line `# NB:` comment explaining why the literal form is used (so a future reader doesn't "clean up" back to method calls); comments are stripped by `clean.sh`, so the 131-line claim is unaffected.

Test plan
- `perft_ab.py` (modified vs original) at depth 3 across 6 positions: totals and per-root-move breakdowns identical.
- `mate1.fen` positions: identical `bestmove`.
- `bench.py` at depth 4 and depth 5, 5 runs each: speedup is reproducible.
- `tools/quick_tests.sh` end-to-end.