Commit 1dbe29a

docs: address v0.13 review notes
1 parent 75b5b8c commit 1dbe29a

3 files changed

Lines changed: 11 additions & 11 deletions

benchmarks/README.md

Lines changed: 7 additions & 7 deletions
@@ -36,8 +36,8 @@ Retail list prices; some providers may offer committed-use discounts.

 | Model | Cost | p50 | p95 | Pass | Notes |
 |---|---:|---:|---:|---:|---|
-| google/gemini-3.1-flash | $0.018 | 0.9s | 1.6s | 98/100 | Default for this workload |
-| cerebras/qwen-3-32b | $0.004 | 0.3s | 0.7s | 96/100 | **Fastest**, slightly worse on sarcasm |
+| google/gemini-3.1-flash | $0.018 | 0.9s | 1.6s | 98/100 | Refresh against Gemini 3.1 Flash; was default for this workload |
+| cerebras/qwen-3-32b | $0.004 | 0.3s | 0.7s | 96/100 | Refresh against Qwen 3 32B; was **fastest**, slightly worse on sarcasm |
 | anthropic/claude-haiku-4 | $0.021 | 1.1s | 2.2s | 98/100 | Overkill |
 | openai/gpt-5.5-mini | $0.031 | 1.4s | 2.9s | 99/100 | Good but pricier; refresh against GPT-5.5-mini |

@@ -47,8 +47,8 @@ Retail list prices; some providers may offer committed-use discounts.

 | Model | Cost | p50 | p95 | Pass | Notes |
 |---|---:|---:|---:|---:|---|
-| google/gemini-3.1-pro | $0.31 | 22s | 38s || **Best quality**, 1M context |
-| google/gemini-3.1-flash | $0.08 | 11s | 19s || 4x cheaper, acceptable quality |
+| google/gemini-3.1-pro | $0.31 | 22s | 38s || Refresh against Gemini 3.1 Pro; was best quality, 1M context |
+| google/gemini-3.1-flash | $0.08 | 11s | 19s || Refresh against Gemini 3.1 Flash; was 4x cheaper, acceptable quality |
 | anthropic/claude-sonnet-5 | $0.72 | 19s | 31s || Caps at 200K; refresh against Sonnet 5 |
 | openai/gpt-5.5 | $0.90 | 26s | 45s || Refresh against GPT-5.5 |

@@ -73,7 +73,7 @@ Retail list prices; some providers may offer committed-use discounts.
 | openai/gpt-5.5 | $0.11 | 18s | 32s || Refresh against GPT-5.5 |
 | anthropic/claude-opus-4.7 | $0.42 | 27s | 46s || Refresh against Opus 4.7 |
 | zai/glm-5 | $0.03 | 9s | 18s || Refresh against GLM-5 |
-| google/gemini-3.1-pro | $0.08 | 14s | 25s | 4/5 | Sometimes skips steps |
+| google/gemini-3.1-pro | $0.08 | 14s | 25s | 4/5 | Refresh against Gemini 3.1 Pro; sometimes skipped steps |

 **Recommendation:** GPT-5.5 when stakes are high, GLM-5 for exploration.

@@ -82,8 +82,8 @@ Retail list prices; some providers may offer committed-use discounts.
 | Model | Cost | p50 | p95 | Pass | Notes |
 |---|---:|---:|---:|---:|---|
 | moonshot/kimi-k2.6 | $0.12 | 38s | 74s | 50/50 | Refresh against Kimi K2.6 |
-| google/gemini-3.1-flash | $0.29 | 46s | 82s | 50/50 | Slightly slower |
-| cerebras/qwen-3-32b | $0.08 | 12s | 28s | 48/50 | **Fastest**; some schema drift |
+| google/gemini-3.1-flash | $0.29 | 46s | 82s | 50/50 | Refresh against Gemini 3.1 Flash; was slightly slower |
+| cerebras/qwen-3-32b | $0.08 | 12s | 28s | 48/50 | Refresh against Qwen 3 32B; was **fastest** with some schema drift |

 **Recommendation:** Kimi for correctness, Cerebras when latency > perfection.

skills/ops/weekly-dep-audit/SKILL.md

Lines changed: 3 additions & 3 deletions
@@ -21,7 +21,7 @@ parameters:

 # weekly-dep-audit — Cross-Repo Dependency Audit

-Uses Gemini 2.5 Pro's 1M context to ingest entire lockfiles + advisory databases and report actionable findings.
+Uses Gemini 3.1 Pro's 1M context to ingest entire lockfiles + advisory databases and report actionable findings.

 ## Procedure

@@ -34,7 +34,7 @@ Uses Gemini 2.5 Pro's 1M context to ingest entire lockfiles + advisory databases
 - `go.sum`
 - `Gemfile.lock`

-3. **Delegate to Gemini 2.5 Pro.** Build a single `delegate_task` call:
+3. **Delegate to Gemini 3.1 Pro.** Build a single `delegate_task` call:
 ```yaml
 goal: |
 Audit the following lockfiles for security advisories at severity ${SEVERITY_FLOOR} or higher.
@@ -87,4 +87,4 @@ Uses Gemini 2.5 Pro's 1M context to ingest entire lockfiles + advisory databases

 ## Cost note

-Gemini 2.5 Pro at $1.25/$10 per MTok ingesting 1M of lockfiles ≈ $1.25 per run. Cheaper than GitHub Advanced Security for small orgs, and catches non-GitHub advisories too.
+Gemini 3.1 Pro at $1.50/$12 per MTok ingesting 1M of lockfiles ≈ $1.50 per run. Cheaper than GitHub Advanced Security for small orgs, and catches non-GitHub advisories too.
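
The per-run figure in the updated cost note can be checked with a small sketch. It assumes that "$1.50/$12 per MTok" means the input price and the output price per million tokens, and that the ≈ $1.50 estimate counts input tokens only. The function name `run_cost_usd` and the 10,000-token report size are hypothetical, chosen only to show how output tokens would move the total.

```python
def run_cost_usd(input_tokens, output_tokens, input_price_per_mtok, output_price_per_mtok):
    """Estimate the USD cost of one run from token counts and per-MTok prices."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_mtok
    output_cost = (output_tokens / 1_000_000) * output_price_per_mtok
    return input_cost + output_cost


# Prices taken from the cost note; the input/output reading is an assumption.
INPUT_PRICE = 1.50
OUTPUT_PRICE = 12.00

# 1M input tokens of lockfiles, ignoring output: the cost note's estimate.
input_only = run_cost_usd(1_000_000, 0, INPUT_PRICE, OUTPUT_PRICE)

# Same run with a hypothetical 10,000-token findings report.
with_report = run_cost_usd(1_000_000, 10_000, INPUT_PRICE, OUTPUT_PRICE)

print(f"input only:  ${input_only:.2f}")
print(f"with report: ${with_report:.2f}")
```

Under these assumptions the output side adds little for a short report, so the input-only approximation in the cost note is reasonable. A long report would add more because the output price per token is higher.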

templates/config/cost-optimized.yaml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 # Target: <$5/mo for personal daily-driver usage.
 # - Gemini Flash / Pro for 90% of calls
 # - Kimi K2.6 / Moonshot for bulk / background
-# - Cerebras Llama 70B (free-ish tier) for classification
+# - Cerebras Qwen 3 32B (free-ish tier) for classification
 # - Gemini OAuth free tier
 # - Anthropic Sonnet only when `intent: coding` on complex files
 # ------------------------------------------------------------
