Skip to content

feat: add delta column and alignment to benchmark workflow#1661

Closed
knyk-dev wants to merge 1 commit into
apache:masterfrom
knyk-dev:master
Closed

feat: add delta column and alignment to benchmark workflow#1661
knyk-dev wants to merge 1 commit into
apache:masterfrom
knyk-dev:master

Conversation

@knyk-dev
Copy link
Copy Markdown
Contributor

@knyk-dev knyk-dev commented Jan 6, 2026

This update improves our performance benchmark report to make it much easier to read.

  • New Delta Column: It now calculates the exact percentage change between the base and PR versions.
  • Status Emojis: Added visual icons for quick checks: 🚀 (faster >10%), 🐌 (slower >10%), and ➡️ (no significant change).
  • Better Alignment: The script fixes table formatting and removes messy worker suffixes (like -1) from benchmark names.

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Jan 6, 2026

CLA assistant check
All committers have signed the CLA.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances the performance benchmark workflow by splitting it into two separate workflows and adding visual enhancements to the benchmark comparison output. The changes move from a direct comment posting approach to a workflow_run pattern that separates the benchmark execution from comment posting.

  • Adds Python post-processing script to calculate delta percentages and add status emojis (🚀 for >10% faster, 🐌 for >10% slower, ➡️ for no significant change)
  • Strips worker suffixes from benchmark names for cleaner output
  • Splits the workflow into two: one for running benchmarks and another for posting comments via workflow_run trigger

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

File Description
.github/workflows/performance-pr.yml Modified to remove inline comment posting, added Python script for post-processing benchmark results with delta calculations and emojis, and changed to upload results as artifacts
.github/workflows/comment.yml New workflow that triggers on completion of the performance benchmark workflow to download artifacts and post/update PR comments with the results

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +28 to +35
var matchArtifact = artifacts.data.artifacts.filter((artifact) => {
return artifact.name == "benchmark-results"
})[0];
var download = await github.rest.actions.downloadArtifact({
owner: context.repo.owner,
repo: context.repo.repo,
artifact_id: matchArtifact.id,
archive_format: 'zip',
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The artifact download logic doesn't handle the case where no matching artifact is found. If the benchmark-results artifact doesn't exist or has a different name, matchArtifact will be undefined, and accessing matchArtifact.id on line 34 will cause a runtime error. Add a check to verify the artifact exists before attempting to download it.

Copilot uses AI. Check for mistakes.
Comment on lines +48 to +50
var issue_number = Number(fs.readFileSync('pr_number.txt'));
var comparison = fs.readFileSync('comparison.md', 'utf8');

Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fs.readFileSync calls for both pr_number.txt and comparison.md don't handle potential file reading errors. If either file is missing or corrupted in the artifact, the workflow will fail with an unclear error. Add error handling to provide better feedback when these files cannot be read.

Suggested change
var issue_number = Number(fs.readFileSync('pr_number.txt'));
var comparison = fs.readFileSync('comparison.md', 'utf8');
let issue_number;
let comparison;
try {
if (!fs.existsSync('pr_number.txt')) {
core.setFailed("Required artifact file 'pr_number.txt' was not found in the workspace.");
return;
}
const prNumberContent = fs.readFileSync('pr_number.txt', 'utf8').trim();
issue_number = Number(prNumberContent);
if (Number.isNaN(issue_number)) {
core.setFailed("Contents of 'pr_number.txt' are not a valid pull request number: \"" + prNumberContent + "\".");
return;
}
} catch (error) {
core.setFailed("Failed to read 'pr_number.txt': " + error.message);
return;
}
try {
if (!fs.existsSync('comparison.md')) {
core.setFailed("Required artifact file 'comparison.md' was not found in the workspace.");
return;
}
comparison = fs.readFileSync('comparison.md', 'utf8');
} catch (error) {
core.setFailed("Failed to read 'comparison.md': " + error.message);
return;
}

Copilot uses AI. Check for mistakes.
in_code = False

def strip_worker_suffix(text: str) -> str:
return re.sub(r'^(\S+?)-\d+(\s|$)', r'\1\2', text, count=1)
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex pattern for stripping worker suffixes only processes the first match (count=1), but it assumes the pattern appears at the start of the text (^). This means if there are multiple benchmark names on a line or if the pattern doesn't appear at the very beginning, it may not work as expected. Consider whether the pattern should be more flexible to handle different line formats from benchstat output.

Suggested change
return re.sub(r'^(\S+?)-\d+(\s|$)', r'\1\2', text, count=1)
return re.sub(r'(\S+?)-\d+(\s|$)', r'\1\2', text)

Copilot uses AI. Check for mistakes.
with:
script: |
var fs = require('fs');
var issue_number = Number(fs.readFileSync('pr_number.txt'));
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converting the PR number using Number() doesn't validate whether the value is actually a valid number. If pr_number.txt contains non-numeric content or is empty, Number() will return NaN, which will cause the API calls to fail. Consider adding validation to ensure issue_number is a valid positive integer before making API calls.

Suggested change
var issue_number = Number(fs.readFileSync('pr_number.txt'));
var prNumberContent = fs.readFileSync('pr_number.txt', 'utf8').trim();
var issue_number = parseInt(prNumberContent, 10);
if (!Number.isFinite(issue_number) || issue_number <= 0) {
throw new Error('Invalid PR number in pr_number.txt: "' + prNumberContent + '"');
}

Copilot uses AI. Check for mistakes.
Comment on lines +70 to +127
python3 - <<'PY'
import pathlib, re, sys

p = pathlib.Path("comparison.md")
lines = p.read_text(encoding="utf-8").splitlines()
out = []
in_code = False

def strip_worker_suffix(text: str) -> str:
return re.sub(r'^(\S+?)-\d+(\s|$)', r'\1\2', text, count=1)

def decorate(line: str) -> str:
line = strip_worker_suffix(line)

def get_icon(diff_val):
if diff_val > 10: return "🐌"
if diff_val < -10: return "🚀"
return "➡️"

# 1. Existing percentage
m = re.search(r'([+-]?\d+\.\d+)%', line)
if m:
diff = float(m.group(1))
icon = get_icon(diff)
return line[:m.end()] + " " + icon + line[m.end():]

# 2. Calculate from values
matches = re.findall(r'(\d+(?:\.\d+)?)([nµmsu])', line)
if len(matches) >= 2:
def parse_val(num_str, unit_str):
v = float(num_str)
if unit_str == 'n': return v * 1e-9
if unit_str in ['µ', 'u']: return v * 1e-6
if unit_str == 'm': return v * 1e-3
return v # s

v1 = parse_val(matches[0][0], matches[0][1])
v2 = parse_val(matches[1][0], matches[1][1])

if v1 != 0:
diff = (v2 - v1) / v1 * 100
icon = get_icon(diff)
return f"{line}\t{diff:+.2f}% {icon}"

return line

for line in lines:
if line.strip() == "```":
in_code = not in_code
out.append(line)
continue
if not in_code:
out.append(line)
continue
out.append(decorate(line))

p.write_text("\n".join(out) + "\n", encoding="utf-8")
PY
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Python script doesn't handle potential errors when reading or writing the comparison.md file. If the file doesn't exist or there are encoding issues, the script will crash and fail silently due to 'continue-on-error: true'. Consider adding basic error handling to provide meaningful feedback when the post-processing fails.

Copilot uses AI. Check for mistakes.
Comment on lines +101 to +104
if unit_str == 'n': return v * 1e-9
if unit_str in ['µ', 'u']: return v * 1e-6
if unit_str == 'm': return v * 1e-3
return v # s
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unit conversion logic doesn't handle the case when unit_str is 's' (seconds). When unit_str is 's', the function returns 'v' without any conversion, but the return statement is outside the conditional checks. This means if an unexpected unit is encountered (not 'n', 'µ', 'u', 'm', or 's'), it would also return the raw value without conversion, which could lead to incorrect delta calculations. Consider explicitly handling all expected units and raising an error for unexpected ones.

Suggested change
if unit_str == 'n': return v * 1e-9
if unit_str in ['µ', 'u']: return v * 1e-6
if unit_str == 'm': return v * 1e-3
return v # s
if unit_str == 'n':
return v * 1e-9
if unit_str in ['µ', 'u']:
return v * 1e-6
if unit_str == 'm':
return v * 1e-3
if unit_str == 's':
return v
raise ValueError(f"Unexpected time unit: {unit_str}")

Copilot uses AI. Check for mistakes.
Comment on lines +106 to +110
v1 = parse_val(matches[0][0], matches[0][1])
v2 = parse_val(matches[1][0], matches[1][1])

if v1 != 0:
diff = (v2 - v1) / v1 * 100
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The delta calculation assumes v1 (base benchmark) is the first value and v2 (PR benchmark) is the second value in the regex matches. However, the order of values in benchstat output should be verified. If benchstat outputs PR first and base second, the delta percentage would have the wrong sign (showing improvements as regressions and vice versa). The formula (v2 - v1) / v1 * 100 calculates the percentage change from v1 to v2.

Copilot uses AI. Check for mistakes.
uses: actions/github-script@v7
# Save PR number
- name: Save PR number
run: echo ${{ github.event.number }} > pr_number.txt
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workflow relies on github.event.number to get the PR number, but this context variable may not be available in all pull_request event scenarios. If the PR number cannot be retrieved, the pr_number.txt file will contain an empty or invalid value, causing the comment workflow to fail. Consider adding validation or using github.event.pull_request.number instead.

Suggested change
run: echo ${{ github.event.number }} > pr_number.txt
run: |
PR_NUMBER="${{ github.event.pull_request.number }}"
if [ -z "$PR_NUMBER" ]; then
echo "Error: Pull request number is not available in event payload." >&2
exit 1
fi
echo "$PR_NUMBER" > pr_number.txt

Copilot uses AI. Check for mistakes.
@knyk-dev knyk-dev closed this Jan 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants