
feat(test-benchmark): support fractional gas benchmark values#2694

Closed
shrirajpawar4 wants to merge 1 commit into ethereum:forks/amsterdam from shrirajpawar4:fix/speedup-bench-gas-check

Conversation

@shrirajpawar4

Summary

Fixes #2673 by allowing fractional Mgas inputs for --gas-benchmark-values, while keeping the benchmark/test pipeline integer-based after parsing.

The issue is not just “accept 0.5”. The important part is to preserve benchmark semantics downstream:

  • benchmark tests consume gas_benchmark_value as an absolute gas budget in gas units
  • many tests use integer arithmetic and exact gas validation
  • fixture IDs and output subdirectories need to represent fractional Mgas values consistently
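On the last point, subdirectory names derived from fractional Mgas values only sort correctly if they are normalized somehow; a minimal sketch of one such scheme (the helper name and zero-padding width here are hypothetical, not taken from the PR):

```python
def format_gas_value_dir(mgas: float, width: int = 8) -> str:
    """Encode a gas budget in gas units, zero-padded, so that plain
    lexicographic sorting of directory names matches numeric order
    (naive names like "gas_0.5M" and "gas_10M" would not sort that way)."""
    gas = int(round(mgas * 1_000_000))
    return f"gas_{gas:0{width}d}"


print(format_gas_value_dir(0.5))   # gas_00500000
print(format_gas_value_dir(10))    # gas_10000000
```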

What changed

  • Parse --gas-benchmark-values as decimal Mgas inputs and convert them immediately to integer gas-unit budgets
  • Keep gas_benchmark_value integer-based for downstream benchmark logic
  • Add readable benchmark IDs such as benchmark-gas-value_0.5M
  • Update gas benchmark subdirectory formatting to support fractional values while staying sortable
  • Change just bench-gas from --gas-benchmark-values 1 to 0.5
  • Update benchmark docs/examples to document fractional Mgas inputs

Why this approach

Propagating raw float values all the way through the pipeline puts the conversion at the wrong boundary for this repo. The safe place to handle fractions is the CLI/input layer; below that, the code should continue to operate on integer gas budgets.

This keeps the fix scoped to the benchmark tooling and avoids changing benchmark test semantics under tests/benchmark/**.

Validation

Ran targeted benchmark plugin and collection tests locally:

```console
PYTHONPATH=src:packages/testing/src .venv/bin/python -m pytest \
  packages/testing/src/execution_testing/cli/pytest_commands/plugins/filler/tests/test_benchmarking.py \
  -k 'gas_benchmark_values_valid_input or gas_benchmark_values_invalid_input or gas_benchmark_value_formatting or benchmarking_mode_configured_with_option'

PYTHONPATH=src:packages/testing/src .venv/bin/python -m pytest \
  packages/testing/src/execution_testing/cli/pytest_commands/plugins/filler/tests/test_filler.py \
  -k 'benchmark_conftest_matrix and gas-benchmark-values'

PYTHONPATH=src:packages/testing/src .venv/bin/python -m pytest \
  packages/testing/src/execution_testing/cli/pytest_commands/plugins/filler/tests/test_benchmarking.py \
  -k 'fractional_benchmark_gas_values_split_into_subdirs'
```

Results:

- 10 passed, 38 deselected
- 8 passed, 23 deselected
- 1 passed, 47 deselected

I did not run the full `just bench-gas` recipe locally because that needs the benchmark `evm` binary available in `PATH`/`EVM_BIN`.

Member

@danceratopz danceratopz left a comment


Hey @shrirajpawar4, thanks a lot for giving this a go. I think this PR went down the wrong path. Please check my comment about updating the pydantic model instead: it would be easier to change the type there from int to float, which would let pydantic parse these values correctly, preferably with a reasonable, heuristically determined minimum value that gets checked in a validation step.

As #2693 landed (just) before this PR, and this PR is heading down the wrong path, I will close it and focus on #2693. I'm sorry about that, bad luck with timing. If you'd like to tackle a different issue, feel free to reach out.

Comment thread: docs/templates/base.md.j2

```diff
 {% if is_benchmark %}
-fill -v {{ pytest_node_id }} --gas-benchmark-values 1
+fill -v {{ pytest_node_id }} --gas-benchmark-values 0.5
```
Member


Just a comment: I would avoid unnecessarily inflating the changeset in a PR across too many files. An example might be worth including, but I think this includes too many doc changes for what is otherwise a relatively small change.

```python
def _parse_gas_limit_millions(value: str) -> int:
```
Member


Ah, I think this approach is too verbose. I believe a better idea would be to update the pydantic model above at L77, see here:

```python
class GasBenchmarkValues(RootModel, BenchmarkParametrizer):
    """Gas benchmark values configuration object."""

    root: List[int]

    flag: ClassVar[str] = "--gas-benchmark-values"
    config_field: ClassVar[str] = "_gas_benchmark_values_config"
    parameter_name: ClassVar[str] = "gas_benchmark_value"

    @classmethod
    def from_parameter_value(
        cls, _config: pytest.Config, value: str
    ) -> Self | None:
        """Given the parameter value and config, return the expected object."""
        return cls.model_validate(value.split(","))

    def get_test_parameters(self, _test_name: str) -> list[ParameterSet]:
        """Get benchmark values. All tests have the same list."""
        return [
            pytest.param(
                gas_value * 1_000_000,
                id=f"benchmark-gas-value_{gas_value}M",
                marks=[
                    pytest.mark.fixture_subfolder(
                        level=1,
                        prefix=format_gas_limit_prefix(gas_value, self.root),
                    ),
                ],
            )
            for gas_value in self.root
        ]
```

Member


I.e., change the pydantic model to allow a `List[float]` instead of a `List[int]` at L177.
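A minimal sketch of that suggestion, assuming pydantic v2 (the 0.1 Mgas minimum below is a placeholder for the "heuristically determined" value, not a number from this discussion):

```python
from typing import List

from pydantic import RootModel, field_validator


class GasBenchmarkValues(RootModel):
    """Fractional Mgas values parsed directly by pydantic."""

    # Widening the root type from List[int] lets pydantic coerce
    # inputs like "0.5" without any hand-written parsing code.
    root: List[float]

    @field_validator("root")
    @classmethod
    def check_minimum(cls, values: List[float]) -> List[float]:
        # Reject unreasonably small budgets in a validation step.
        for v in values:
            if v < 0.1:
                raise ValueError(f"gas benchmark value too small: {v} Mgas")
        return values


print(GasBenchmarkValues.model_validate("0.5,1".split(",")).root)  # [0.5, 1.0]
```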

Author


I agree the original diff was broader. The reason I did not stop at changing GasBenchmarkValues to List[float] is that I think the parsing error is only one part of the problem.

The benchmark pipeline internally works with integer gas-unit budgets, not floats. If we accept floats directly and then do int(value * 1_000_000), we silently truncate and can also end up with inconsistent benchmark IDs.

Please correct me if I am wrong; this will help me understand the system better.

