framework.sh cpuset overflow check extracts trailing int, not max #656

@MDA2AV

The cpuset overflow check in scripts/lib/framework.sh:102 extracts the trailing integer from the cpuset string instead of the maximum:

requested_max=$(echo "$cpu_limit" | grep -oP '\d+$')

This works by accident for the current profiles (all list their ranges in ascending order, so the last integer happens to be the max), but it breaks if a profile's cpuset is reordered or if a future profile is written with the largest range first.

| cpuset string | actual max | parser returns |
| --- | --- | --- |
| `0-31,64-95` | 95 | 95 ✓ |
| `0-3` | 3 | 3 ✓ |
| `64-95,0-31` | 95 | 31 ✗ |
| `96-103,0-7,32-39` | 103 | 39 ✗ |
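
The rows above can be reproduced with a short script. This is only a sketch: `trailing_int` and the row loop are hypothetical helpers, not code from `framework.sh`, and `grep -oE '[0-9]+$'` is used as a portable stand-in for the issue's `grep -oP '\d+$'` (assumed equivalent for plain ASCII cpuset strings).

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration; not part of framework.sh.
# Mirrors the current extraction: only the integer at the very end of the string.
trailing_int() {
  echo "$1" | grep -oE '[0-9]+$'
}

# Each row: cpuset string, then the actual max CPU index it names.
while read -r cpuset actual; do
  parsed=$(trailing_int "$cpuset")
  if [ "$parsed" = "$actual" ]; then
    verdict="ok"
  else
    verdict="MISMATCH"
  fi
  printf '%-18s actual=%-4s parsed=%-4s %s\n' "$cpuset" "$actual" "$parsed" "$verdict"
done <<'EOF'
0-31,64-95 95
0-3 3
64-95,0-31 95
96-103,0-7,32-39 103
EOF
```

The first two rows match; the last two are flagged because the largest index is not at the end of the string.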

When the parser returns a too-low value, `requested_max > max_cpu` is false, so the code takes the `--cpuset-cpus="$cpu_limit"` path. Docker then rejects the cpuset (some indices are out of range on the host) and the framework container fails to start with a Docker error, instead of the intended `warn` + `--cpus` fallback.
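
A minimal sketch of that decision may make the failure easier to see. The function name `choose_cpu_path`, its arguments, and the printed labels are assumptions made for illustration; the real logic in `framework.sh` may be structured differently. As above, `grep -oE '[0-9]+$'` stands in for `grep -oP '\d+$'`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the branch described above; not framework.sh code.
# $1 = cpuset string, $2 = highest CPU index available on the host.
choose_cpu_path() {
  local cpu_limit=$1 max_cpu=$2 requested_max
  # Current behaviour: trailing integer only.
  requested_max=$(echo "$cpu_limit" | grep -oE '[0-9]+$')
  if [ "$requested_max" -gt "$max_cpu" ]; then
    echo "fallback"   # intended soft path: warn, then use --cpus
  else
    echo "cpuset"     # would pass --cpuset-cpus="$cpu_limit" to Docker
  fi
}

# Assumed 64-CPU host, so the highest index is 63.
choose_cpu_path "0-31,64-95" 63   # prints "fallback": 95 > 63
choose_cpu_path "64-95,0-31" 63   # prints "cpuset": 31 <= 63, though 64-95 do not exist
```

On a smaller host, say one with a highest index of 15, both orderings still print `fallback`, since even the trailing `31` exceeds it. The wrong branch is only reachable when the host's CPU count falls between the trailing integer and the true max.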

Fix

One-line change — extract every integer, sort numerically, take the last:

requested_max=$(echo "$cpu_limit" | grep -oP '\d+' | sort -n | tail -1)

This computes the actual max regardless of ordering. The rest of the overflow logic stays the same, and there are no other call sites.
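
As a quick check of the proposed pipeline, here it is wrapped in a function and run against both orderings. The wrapper name `cpuset_max` is hypothetical, and `grep -oE '[0-9]+'` is again a portable stand-in for `grep -oP '\d+'`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the proposed pipeline; not framework.sh code.
# Prints the largest integer found anywhere in the cpuset string.
cpuset_max() {
  echo "$1" | grep -oE '[0-9]+' | sort -n | tail -1
}

cpuset_max "0-31,64-95"         # 95
cpuset_max "64-95,0-31"         # 95
cpuset_max "96-103,0-7,32-39"   # 103
cpuset_max "0-3"                # 3
```

The `-n` on `sort` matters here: a plain lexical sort would order `96` after `103`, so the third case would come out as `96`.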

Why surface it now

The current production profiles in `scripts/lib/profiles.sh` all happen to list the largest range last (`0-31,64-95`, `1-31,65-95`, `0-3`, etc.), so the bug doesn't fire today. But it's a quiet trap waiting for the next profile that breaks the convention, and the failure mode (a Docker error instead of the soft fallback) is worse than what the same code does on a 16-core laptop, where it would normally warn and continue.

Labels

platform: Everything related to benchmark execution, including the PR workflow.
