Commit 7a2a697

CI - fix LLM benchmark workflows (clockworklabs#5181)
# Description of Changes

Updates `llm-benchmark-periodic.yml` and `llm-benchmark-validate-goldens.yml` to new CI infrastructure.

- Switch to the current runner (`spacetimedb-new-runner-2`)
- Drop the dead `localhost:5000` container + `--privileged`
- Build the local SpacetimeDB server the benchmark harness needs, and use that same local CLI for publishing

# API and ABI breaking changes

None. CI-only.

# Expected complexity level and risk

1: workflow-only, no production code.

# Testing

- [ ] Run both workflows via `workflow_dispatch` and confirm they pass
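The testing item above could be exercised from a terminal with the GitHub CLI. The following is a minimal sketch, assuming `gh` is installed and authenticated against the repository and that `master` is the ref to run on. The `REF` and `DRY_RUN` variables and the `dispatch_all` helper are hypothetical and not part of the commit. By default the script only prints the commands it would run.

```shell
#!/bin/sh
set -eu

# Workflow file names are taken from the commit description.
WORKFLOWS="llm-benchmark-periodic.yml llm-benchmark-validate-goldens.yml"
# Assumption: the workflows are dispatched on master; override with REF=<branch>.
REF="${REF:-master}"
# Hypothetical safety switch: 1 prints the commands, 0 actually dispatches them.
DRY_RUN="${DRY_RUN:-1}"

dispatch_all() {
  for wf in $WORKFLOWS; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "gh workflow run $wf --ref $REF"
    else
      # Triggers the workflow's workflow_dispatch event on the given ref.
      gh workflow run "$wf" --ref "$REF"
    fi
  done
}

dispatch_all
```

Running with `DRY_RUN=0` would dispatch both workflows for real. Afterwards, `gh run list --workflow llm-benchmark-periodic.yml` is one way to check the status of the resulting run.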
1 parent 3169913 commit 7a2a697

2 files changed

Lines changed: 16 additions & 20 deletions

.github/workflows/llm-benchmark-periodic.yml

Lines changed: 8 additions & 10 deletions
@@ -29,19 +29,10 @@ concurrency:
 
 jobs:
   run-benchmarks:
-    runs-on: spacetimedb-new-runner
-    container:
-      image: localhost:5000/spacetimedb-ci:latest
-      options: >-
-        --privileged
+    runs-on: spacetimedb-new-runner-2
     timeout-minutes: 180
 
     steps:
-      - name: Install spacetime CLI
-        run: |
-          curl -sSf https://install.spacetimedb.com | sh -s -- -y
-          echo "$HOME/.local/bin" >> $GITHUB_PATH
-
       - name: Checkout master
         uses: actions/checkout@v4
         with:
@@ -75,6 +66,13 @@ jobs:
       - name: Build llm-benchmark tool
         run: cargo install --path tools/xtask-llm-benchmark --locked
 
+      - name: Build SpacetimeDB server for benchmark harness
+        run: |
+          cargo ci smoketests prepare
+          mkdir -p "$HOME/.local/bin"
+          ln -sf "$GITHUB_WORKSPACE/target/release/spacetimedb-cli" "$HOME/.local/bin/spacetime"
+          echo "$HOME/.local/bin" >> "$GITHUB_PATH"
+
       - name: Run benchmarks
         env:
           OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}

.github/workflows/llm-benchmark-validate-goldens.yml

Lines changed: 8 additions & 10 deletions
@@ -15,11 +15,7 @@ concurrency:
 
 jobs:
   validate-goldens:
-    runs-on: spacetimedb-new-runner
-    container:
-      image: localhost:5000/spacetimedb-ci:latest
-      options: >-
-        --privileged
+    runs-on: spacetimedb-new-runner-2
     timeout-minutes: 60
 
     strategy:
@@ -28,11 +24,6 @@ jobs:
         lang: [rust, csharp, typescript]
 
     steps:
-      - name: Install spacetime CLI
-        run: |
-          curl -sSf https://install.spacetimedb.com | sh -s -- -y
-          echo "$HOME/.local/bin" >> $GITHUB_PATH
-
       - name: Checkout master
         uses: actions/checkout@v4
         with:
@@ -70,6 +61,13 @@ jobs:
       - name: Build llm-benchmark tool
         run: cargo install --path tools/xtask-llm-benchmark --locked
 
+      - name: Build SpacetimeDB server for benchmark harness
+        run: |
+          cargo ci smoketests prepare
+          mkdir -p "$HOME/.local/bin"
+          ln -sf "$GITHUB_WORKSPACE/target/release/spacetimedb-cli" "$HOME/.local/bin/spacetime"
+          echo "$HOME/.local/bin" >> "$GITHUB_PATH"
+
       - name: Validate golden answers (${{ matrix.lang }})
         env:
           MSBUILDDISABLENODEREUSE: "1"
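
The "Build SpacetimeDB server for benchmark harness" step added to both workflows exposes the locally built CLI under the name `spacetime` by symlinking it into a directory that is then appended to `$GITHUB_PATH`. GitHub Actions prepends such directories to `PATH` for later steps only, not for the step that writes the entry. The sketch below reproduces the mechanism outside CI. The scratch directory layout and the stub binary are hypothetical stand-ins: the real binary is the one the workflow expects at `target/release/spacetimedb-cli` after `cargo ci smoketests prepare`.

```shell
#!/bin/sh
set -eu

# Scratch directory standing in for the runner's workspace and home (hypothetical paths).
scratch="$(mktemp -d)"
trap 'rm -rf "$scratch"' EXIT

workspace="$scratch/workspace"
home="$scratch/home"
mkdir -p "$workspace/target/release" "$home/.local/bin"

# Stub standing in for the real spacetimedb-cli binary (assumption for illustration).
cat > "$workspace/target/release/spacetimedb-cli" <<'EOF'
#!/bin/sh
echo "stub spacetimedb-cli"
EOF
chmod +x "$workspace/target/release/spacetimedb-cli"

# Same link as the workflow step: -s makes it symbolic, -f replaces an existing link.
ln -sf "$workspace/target/release/spacetimedb-cli" "$home/.local/bin/spacetime"

# Emulate what GITHUB_PATH does for subsequent steps: prepend the directory to PATH.
PATH="$home/.local/bin:$PATH"
export PATH

resolved="$(command -v spacetime)"
output="$(spacetime)"
echo "spacetime resolves to: $resolved"
echo "running it prints: $output"
```

Because the link points at the build output rather than copying it, a rebuild within the same job is picked up without repeating the step.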
