Skip to content

Commit 0b52e48

Browse files
authored
more improvements to file sizes comments (#7319)
- Run clickbench vortex-compact benchmarks on PR - Fix up comment to be collapsible <!-- Thank you for submitting a pull request! We appreciate your time and effort. Please make sure to provide enough information so that we can review your pull request. The Summary and Testing sections below contain guidance on what to include. --> ## Summary <!-- If this PR is related to a tracked effort, please link to the relevant issue here (e.g., `Closes: #123`). Otherwise, feel free to ignore / delete this. In this section, please: 1. Explain the rationale for this change. 2. Summarize the changes included in this PR. A general rule of thumb is that larger PRs should have larger summaries. If there are a lot of changes, please help us review the code by explaining what was changed and why. If there is an issue or discussion attached, there is no need to duplicate all the details, but clarity is always preferred over brevity. --> Closes: #000 <!-- ## API Changes Uncomment this section if there are any user-facing changes. Consider whether the change affects users in one of the following ways: 1. Breaks public APIs in some way. 2. Changes the underlying behavior of one of the engine integrations. 3. Should some documentation be updated to reflect this change? If a public API is changed in a breaking manner, make sure to add the appropriate label. You can run `./scripts/public-api.sh` locally to see if there are any public API changes (and this also runs in our CI). --> ## Testing <!-- Please describe how this change was tested. Here are some common categories for testing in Vortex: 1. Verifying existing behavior is maintained. 2. Verifying new behavior and functionality works correctly. 3. Serialization compatibility (backwards and forwards) should be maintained or explicitly broken. --> --------- Signed-off-by: Andrew Duffy <andrew@a10y.dev>
1 parent 8d55015 commit 0b52e48

File tree

3 files changed

+33
-1
lines changed

3 files changed

+33
-1
lines changed

.github/workflows/bench-pr.yml

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -141,7 +141,8 @@ jobs:
141141
"id": "clickbench-nvme",
142142
"subcommand": "clickbench",
143143
"name": "Clickbench on NVME",
144-
"targets": "datafusion:parquet,datafusion:vortex,duckdb:parquet,duckdb:vortex,duckdb:duckdb"
144+
"targets": "datafusion:parquet,datafusion:vortex,duckdb:parquet,duckdb:vortex,duckdb:duckdb",
145+
"extra_data_formats": "vortex-compact"
145146
},
146147
{
147148
"id": "tpch-nvme",

.github/workflows/sql-benchmarks.yml

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -154,6 +154,11 @@ jobs:
154154
# Extract all unique formats from targets (e.g., "datafusion:parquet,duckdb:vortex" -> "parquet,vortex")
155155
all_formats=$(echo "${{ matrix.targets }}" | tr ',' '\n' | sed 's/^[^:]*://' | sort -u | tr '\n' ',' | sed 's/,$//')
156156
157+
# Append extra data formats if specified (for file size tracking without benchmarking)
158+
if [ -n "${{ matrix.extra_data_formats }}" ]; then
159+
all_formats="$all_formats,${{ matrix.extra_data_formats }}"
160+
fi
161+
157162
# Build options string if scale_factor is set
158163
opts=""
159164
if [ -n "${{ matrix.scale_factor }}" ]; then

scripts/compare-file-sizes.py

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -114,6 +114,29 @@ def main():
114114
# Sort by pct_change descending (largest increases first)
115115
comparisons.sort(key=lambda x: x["pct_change"], reverse=True)
116116

117+
# Build summary line for collapsible header
118+
total_base = sum(format_totals[fmt]["base"] for fmt in format_totals)
119+
total_head = sum(format_totals[fmt]["head"] for fmt in format_totals)
120+
if total_base > 0:
121+
overall_pct = (total_head / total_base - 1) * 100
122+
overall_pct_str = format_pct_change(overall_pct)
123+
else:
124+
overall_pct_str = "new"
125+
126+
increases = sum(1 for c in comparisons if c["change"] > 0)
127+
decreases = sum(1 for c in comparisons if c["change"] < 0)
128+
129+
# Output collapsible markdown
130+
print("<details>")
131+
summary = (
132+
f"<summary>File Size Changes ({len(comparisons)} files changed, "
133+
f"{overall_pct_str} overall, {increases}{decreases}↓)</summary>"
134+
)
135+
print(summary)
136+
print("")
137+
print("<br>")
138+
print("")
139+
117140
# Output markdown table
118141
print("| File | Scale | Format | Base | HEAD | Change | % |")
119142
print("|------|-------|--------|------|------|--------|---|")
@@ -140,6 +163,9 @@ def main():
140163
pct_str = ""
141164
print(f"- {fmt}: {format_size(base_total)} \u2192 {format_size(head_total)}{pct_str}")
142165

166+
print("")
167+
print("</details>")
168+
143169

144170
if __name__ == "__main__":
145171
main()

0 commit comments

Comments
 (0)