Refactor to do more work in Python #382
Conversation
Force-pushed 50b4624 to 75a085b.
@diegorusso: This moves as much work as possible from the GHA yml files to Python, so it should be portable across automation systems. I realize it's a fairly large PR, but I would be interested in your feedback to confirm it is going to meet your needs, etc.
Pull Request Overview
This PR is the first step in the refactor to shift more work into Python as part of #380. The changes update tests, centralize benchmark hash computation via benchmark_definitions, remove the old should_run script, and adjust workflow and template scripts to use the new commands.
- Updated tests in tests/test_run_benchmarks.py to use the new workflow.should_run and hardcoded benchmark hash logic.
- Removed the obsolete util.get_benchmark_hash in favor of benchmark_definitions.get_benchmark_hash and updated references across the codebase.
- Revised workflow and template scripts (including workflow_bootstrap.py, _pystats.src.yml, and _benchmark.src.yml) to align with the new command structure.
Reviewed Changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/test_run_benchmarks.py | Updated tests to call hardcode_benchmark_hash and use workflow.should_run |
| bench_runner/util.py | Removed legacy benchmark hash function |
| bench_runner/templates/workflow_bootstrap.py | Added new bootstrap script for setting up the workflow |
| bench_runner/templates/benchmark.src.yml, _pystats.src.yml, _benchmark.src.yml | Adjusted steps to call workflow_bootstrap.py instead of the old commands |
| bench_runner/scripts/workflow.py | Major refactoring of workflow logic with benchmark hash updates |
| bench_runner/scripts/should_run.py | Removed in favor of updated workflow script |
| bench_runner/scripts/run_benchmarks.py | Updated to reference benchmark_definitions for hash |
| bench_runner/scripts/install.py, get_merge_base.py | Minor changes for YAML generation and hash retrieval |
| bench_runner/git.py | Updated clone function with context management changes |
| bench_runner/benchmark_definitions.py | New centralized benchmark hash computation |
| bench_runner/main.py | Updated commands reflecting the refactored workflow |
Files not reviewed (1)
- bench_runner/templates/env.yml: Language not supported
Comments suppressed due to low confidence (1)
bench_runner/benchmark_definitions.py:34
- [nitpick] The variable name 'hash' shadows the built-in Python function hash(). Consider renaming it to something like 'sha' or 'hasher' to improve clarity.
hash = hashlib.sha256()
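A rename along the lines Copilot suggests might look like this. This is a sketch only, not the PR's actual implementation: the input list and the six-character truncation (inferred from hashes like 'dcfded' mentioned later in this thread) are assumptions.

```python
import hashlib

# Sketch of the suggested rename: "hasher" instead of "hash", so the
# built-in hash() is no longer shadowed. Inputs and truncation length
# are assumptions for illustration.
def get_benchmark_hash(commit_shas: list[str]) -> str:
    hasher = hashlib.sha256()
    for sha in commit_shas:
        hasher.update(sha.encode("ascii"))
    return hasher.hexdigest()[:6]
```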
```python
if not sys.platform.startswith("linux"):
    return

run_in_venv(venv, "pyperf", ["system", perf and "reset" or "tune"], sudo=True)
```
Does this mean that the CPU frequency scaling governors are set to "powersave" when profiling and "performance" when benchmarking?
Could this not lead to different characteristics between the profiled run and the scored run? Though I admit that the profiling will create different characteristics anyway. So I guess my question is just, what is the reason for this?
The reason for this is pretty recent. When @mpage added events to pyperf so that we can turn perf on and off tightly around benchmarking code, I discovered that the tuning made some of these events not fire all the time, probably due to a lower sampling rate. This resolved that and did not seem to have a significant effect on the overall perf results.
But you're right in general that the perf measurement makes the results slightly different anyway, which is unavoidable.
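For readers parsing the `perf and "reset" or "tune"` expression in the diff above: it is the legacy and/or conditional idiom, equivalent to a modern conditional expression. A sketch (the helper name is hypothetical; only the subcommand selection is shown):

```python
# Hypothetical helper illustrating the subcommand selection only.
# "reset" restores pyperf's default system settings when profiling;
# "tune" applies pyperf's benchmarking tuning otherwise.
def pyperf_system_subcommand(perf: bool) -> str:
    legacy = perf and "reset" or "tune"
    modern = "reset" if perf else "tune"
    assert legacy == modern  # the two spellings are equivalent here
    return modern
```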
It would be nice to have a way to locally set the nickname for the host as used here (bench_runner/bench_runner/result.py, line 527, at eeccea6). Otherwise …
Looks reasonable, in that it does everything we want it to do, thanks! Though I haven't gone through and understood everything this PR changes, just confirmed that it works for our workflow.

I'm still getting `error: use of undeclared identifier 'lastopcode'` with the current CPython main, so `python3.11 workflow_bootstrap.py python main linux-aarch64 all clang --pystats` fails.

If I fix that build failure for the tail-calling interpreter and I want the bytecode profiles, I can run something like `python3.11 workflow_bootstrap.py python main linux-aarch64 all clang --pystats` and `python3.11 workflow_bootstrap.py python main linux-aarch64 all clang --pystats`, but I have to separately move the pystats results files from `results` to `profiling` for `profiling_plot` to pick them up. Is there currently a script somewhere that does this movement that needs including in this, or am I missing something?
```python
if Path(".debug").exists():
    shutil.rmtree(".debug")

pystats_dir = Path("/tmp") / "pystats"
```
```python
# Now that we've installed the full bench_runner library,
# continue on in a new process...

last_arg = sys.argv.index("workflow_bootstrap.py")
```
This means that I have to copy workflow_bootstrap.py out of templates into the repository that I'm running bench_runner from, could this be changed to check the index of the argument that includes "workflow_bootstrap.py"?
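A suffix match, as the comment suggests, could look something like this (a sketch; the helper name and the error handling are assumptions, not code from the PR):

```python
# Locate the first argv entry whose path ends in workflow_bootstrap.py,
# so the lookup works whether the script is invoked by bare name or by
# a longer path such as templates/workflow_bootstrap.py.
def find_bootstrap_index(argv: list[str]) -> int:
    for i, arg in enumerate(argv):
        if arg.endswith("workflow_bootstrap.py"):
            return i
    raise ValueError("workflow_bootstrap.py not found in argv")
```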
The python -m bench_runner install script will copy the workflow_bootstrap.py script to the results repository for you (just as it already does for all of the generated GitHub workflow files).
The bootstrap script is required because we need to have a working venv to run the rest of the workflow.
Ah yes thanks, that got lost in my fiddling.
```python
args = []
if pystats:
    args.append("--with-pystats")
```
I think this should be `--enable-pystats`
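For reference, CPython's configure script accepts `--enable-pystats`; there is no `--with-pystats` option. The corrected snippet would read as below (a sketch; the wrapper function is hypothetical, added only so the fragment is self-contained):

```python
def configure_args(pystats: bool) -> list[str]:
    # --enable-pystats is the actual CPython configure flag for
    # statistics-gathering builds; the helper name is hypothetical.
    args = []
    if pystats:
        args.append("--enable-pystats")
    return args
```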
Sorry, I'm not following. Can you be more specific? The code at the second link should just sort unknown nicknames to the beginning.
Force-pushed 02c4dc1 to 6e8270f.
Copilot reviewed 15 out of 16 changed files in this pull request and generated no comments.
Files not reviewed (1)
- bench_runner/templates/env.yml: Language not supported
Comments suppressed due to low confidence (2)
bench_runner/git.py:157
- The code uses 'contextlib.chdir', which is not a standard library function. Please verify that a valid context manager for changing the working directory is defined or imported.
with contextlib.chdir(dirname):
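Copilot's claim here is out of date: `contextlib.chdir` was added to the standard library in Python 3.11, which matches bench_runner's minimum supported version. For older interpreters, an equivalent context manager is only a few lines (a sketch):

```python
import contextlib
import os

# Equivalent of contextlib.chdir (stdlib in Python 3.11+) for older
# interpreters: change into `path` on entry, restore the previous
# working directory on exit, even if the body raises.
@contextlib.contextmanager
def chdir(path):
    old = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(old)
```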
bench_runner/benchmark_definitions.py:33
- The updated benchmark hash logic now produces a different hash (e.g., 'dcfded') compared to earlier expectations (like '215d35'). Confirm that this change is intentional and that all tests and dependent logic have been updated accordingly.
def get_benchmark_hash() -> str:
I don't understand why it should do that. If the string returned by …
GeorgeWort left a comment:
Looks good to me; the only nice-to-haves left are the moving of the pystats files and the "unknown" issue.
You're right -- I'm confusing it with …
```python
BENCHMARK_REPOS = [
    BenchmarkRepo(
        "56d12a8fd7cc1432835965d374929bfa7f6f7a07",
        "https://github.com/mdboom/pyperformance.git",
```
Why are you pointing to your own fork?
Oh, this is just an artifact of needing some modifications to pyperformance early on. There's no need for this anymore, I'll change it.
```python
# the virtual environment to run the full bench_runner.

# NOTE: This file should import in Python 3.9 or later so it can at least print
```
Later we require Python 3.11 though
Good catch. I'll update this comment.
Oh, actually, this is correct. bench_runner requires 3.11 and above and is only tested on that. However, the oldest LTS Linux we use installs Python 3.9 by default. It's nice to be able to run this script and get a clear error message that Python 3.11 or later needs to be installed.
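A minimal sketch of the guard described here (the exact message text is an assumption): the module stays importable on Python 3.9 purely so it can fail with a readable error rather than a SyntaxError.

```python
import sys

# Fail early with a clear message when the interpreter is too old.
# This check itself runs on Python 3.9, even though the rest of
# bench_runner requires 3.11+.
def check_python_version() -> None:
    if sys.version_info < (3, 11):
        raise SystemExit(
            "bench_runner requires Python 3.11 or later; "
            f"found {sys.version.split()[0]}"
        )
```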
Copilot reviewed 16 out of 17 changed files in this pull request and generated no comments.
Files not reviewed (1)
- bench_runner/templates/env.yml: Language not supported
Comments suppressed due to low confidence (4)
bench_runner/benchmark_definitions.py:34
- [nitpick] The variable name 'hash' shadows the built-in hash() function; consider renaming it to 'hasher' or a similar alternative.
hash = hashlib.sha256()
bench_runner/scripts/generate_results.py:130
- The try/except fallback for unknown runner names defaults the index to -1; please document or reconsider if this behavior is intended for proper runner ordering.
idx = order.index(val.split()[0])
bench_runner/scripts/install.py:248
- The condition now skips files with a '.py' extension; verify that legitimate template files are not unintentionally omitted from processing.
if not (ROOT_PATH / path.name).is_file() or path.suffix == ".py":
bench_runner/git.py:157
- Using 'contextlib.chdir' is nonstandard since the standard library does not provide this context manager; ensure that it is defined elsewhere or replace it with a custom context manager for changing directories.
with contextlib.chdir(dirname):
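On the generate_results.py point above, the fallback-to-minus-one behavior means unknown runner nicknames receive a key smaller than every known runner's index, so they sort to the beginning, which matches the "unknown nicknames sort to the beginning" behavior discussed earlier in this thread. A sketch (the function name is hypothetical):

```python
# Hypothetical sort key mirroring the try/except fallback flagged by
# Copilot: known runner names use their position in `order`; unknown
# names get -1 and therefore sort before all known ones.
def runner_sort_key(val: str, order: list[str]) -> int:
    try:
        return order.index(val.split()[0])
    except ValueError:
        return -1
```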
Force-pushed 8f2f945 to 8915035.
Force-pushed 19dd6bd to a8b142f.
This is the first step of #380.