cache benchmark runs #417

Open
vastzuby wants to merge 1 commit into master from AUTO-1452-benchmarks-run-cache

vastzuby commented Jun 10, 2026

`vastai run benchmarks` rents a real GPU, measures perf, and tears it down. Every run's perf is reported to the benchmarks table, so the table is a cross-user record of measured perf per spec.

This PR uses that table as a cache. Before renting, we check for an
existing benchmark with the same GPU type, num_gpus, and template,
reported in the last 30 days.

  • Cache hit: serve the median of the reported perf values and skip the
    rental. We also show the spread (range + sample count), since perf
    varies machine-to-machine for some workloads.
  • Cache miss: fall back to the current rent-and-measure flow.
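
A minimal sketch of the lookup described above, not the code in this PR. The row fields (`gpu_name`, `num_gpus`, `template`, `perf`, `reported_at`) and the names `CacheHit` and `lookup_cached_perf` are assumptions for illustration; only the 30-day window, the match keys, and the median plus spread come from the description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from statistics import median
from typing import Optional

MAX_AGE = timedelta(days=30)  # cache window stated in the description


@dataclass
class CacheHit:
    median_perf: float
    min_perf: float
    max_perf: float
    n: int  # sample count behind the median


def lookup_cached_perf(rows, gpu_name, num_gpus, template, now=None) -> Optional[CacheHit]:
    """Summarize matching recent rows, or return None on a cache miss."""
    now = now or datetime.now(timezone.utc)
    perfs = [
        row["perf"]
        for row in rows  # hypothetical row shape: plain dicts
        if row["gpu_name"] == gpu_name
        and row["num_gpus"] == num_gpus
        and row["template"] == template
        and now - row["reported_at"] <= MAX_AGE
    ]
    if not perfs:
        return None  # caller falls back to rent-and-measure
    return CacheHit(median(perfs), min(perfs), max(perfs), len(perfs))
```

Returning `None` for a miss keeps the existing rent-and-measure path as the default branch for the caller.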

Cached rows report perf only. We deliberately do not show $/hr or perf/$
for them: the price would come from a different machine than the one
benchmarked, so a cached perf/$ wouldn't correspond to anything real.
perf/$ stays accurate only on freshly measured (--no-cache) rows.

If a cached spec has no offers available to rent right now, we flag it
("no offers available to rent right now") rather than implying it's
rentable.
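
As a rough illustration of the row rules above (a sketch under assumptions, not the PR's rendering code): `render_row`, its parameters, and the column order are hypothetical. Cached rows render "-" for both price columns, and a cached spec with no current offers carries the flag text.

```python
def render_row(spec, perf, cached, dph=None, has_offers=True):
    """Format one result row as a list of cell strings.

    `dph` is a hypothetical dollars-per-hour value for the measured machine.
    """
    if cached or not dph:
        # A cached perf has no price from the benchmarked machine, so no perf/$.
        price, perf_per_dollar = "-", "-"
    else:
        price = f"{dph:.3f}"
        perf_per_dollar = f"{perf / dph:.2f}"
    note = ""
    if cached and not has_offers:
        note = "no offers available to rent right now"
    return [spec, f"{perf:.1f}", price, perf_per_dollar, note]
```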

vastzuby requested a review from LucasArmandVast June 24, 2026 01:07
vastzuby force-pushed the AUTO-1452-benchmarks-run-cache branch from e97b98e to 9ebf5d5 June 24, 2026 01:09
vastzuby changed the title from "feat(AUTO-1452): use benchmarks table as cache for the run command" to "cache benchmark runs" Jun 30, 2026

Reuse the benchmarks table as a cross-user cache for `vastai run
benchmarks`: before renting, look for a recent (<=30d) row matching the
same template + GPU + count and serve its median measured perf instead
of renting and re-measuring.

- Perf-only cache: cached rows carry no $/hr, since a cached row's price
  would come from a different machine than the one benchmarked; $/hr and
  perf/$ render "-" for cached rows.
- Show the perf spread (median, range, n, age) so real machine-to-machine
  variance is visible rather than a single misleading number.
- On a cache hit, prompt to reuse the result or run a fresh benchmark;
  -y/--raw are non-interactive and reuse the cache. --no-cache always
  re-measures.
- Flag cache hits that have no current matching offers as not rentable.
- Fix endpoint_name: drop the parens around the uuid suffix; the backend
  rejects shell metacharacters in endpoint_name, so any real rental 400'd
  (pre-existing bug in the base command).
- Lower the default --timeout from 60m to 30m.
- search_benchmarks type hint accepts dict queries.
- Drop leading underscores from module helpers/constants to match the
  rest of the cli commands; tidy comments, docstrings, and constant
  grouping.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JKE6tJgMRhV5tcLtYBYKnQ
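
The flag interaction in the commit message (`--no-cache`, `-y`/`--raw`, and the interactive prompt) can be sketched as a small decision function. This is an assumption-laden illustration: `should_reuse_cache` and the `ask` callback are hypothetical names, not helpers from this PR.

```python
def should_reuse_cache(cache_hit, no_cache, yes, raw, ask):
    """Decide whether to serve the cached result instead of renting.

    `ask` is a hypothetical zero-argument callable that prompts the user
    and returns True to reuse the cached result.
    """
    if no_cache or not cache_hit:
        return False  # --no-cache always re-measures; a miss has nothing to reuse
    if yes or raw:
        return True  # non-interactive modes reuse the cache without prompting
    return ask()
```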
vastzuby force-pushed the AUTO-1452-benchmarks-run-cache branch from 88582c7 to abeda07 June 30, 2026 19:54

Template is matched client-side: rows carry template_hash or template_id
depending on how the benchmarked workergroup was created.
"""
query = {


We should put a limit on this query, since it could return thousands of rows for common template/num_gpu/gpu_name combos (e.g. ComfyUI on 1x5090).
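
One possible shape for this, sketched under assumptions: the query dict layout, the `limit` key, and the names `MAX_ROWS`, `build_query`, and `matches_template` are all hypothetical, and whether the benchmarks endpoint honors a `limit` would need to be confirmed. Only the client-side match on `template_hash` or `template_id` comes from the docstring in the diff.

```python
MAX_ROWS = 200  # hypothetical cap; enough samples for a stable median


def build_query(gpu_name, num_gpus, limit=MAX_ROWS):
    """Build a bounded query (assumed shape; `limit` support is unverified)."""
    return {
        "gpu_name": {"eq": gpu_name},
        "num_gpus": {"eq": num_gpus},
        "limit": limit,
    }


def matches_template(row, template_hash=None, template_id=None):
    """Client-side template match: rows carry template_hash or template_id."""
    if template_hash is not None and row.get("template_hash") == template_hash:
        return True
    if template_id is not None and row.get("template_id") == template_id:
        return True
    return False
```

Because the template filter runs client-side, a server-side cap could drop matching rows before they are filtered, so the cap would likely need headroom.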

2 participants