docs: refresh Hermes guide for v0.13 #11
| google/gemini-3.1-flash | $0.018 | 0.9s | 1.6s | 98/100 | Default for this workload |
| cerebras/qwen-3-32b | $0.004 | 0.3s | 0.7s | 96/100 | **Fastest**, slightly worse on sarcasm |
🟡 Benchmark T1/T5 model names changed without 'Refresh' notes, inconsistent with T2–T4 convention
In the benchmark results tables, T2 through T4 entries consistently mark model-name-changed rows with "Refresh against [model]" notes to signal the data was measured on the predecessor model. T1 and T5 entries for google/gemini-3.1-flash and cerebras/qwen-3-32b change model names (from gemini-2.5-flash and cerebras/llama-3.1-70b) but keep the original Notes text (e.g., "Default for this workload", "Fastest; some schema drift"). This implies the benchmark data was measured on the new models when it was actually measured on entirely different ones. The T1 recommendation line at benchmarks/README.md:44 does note model IDs changed, but the individual table rows at lines 39–40 do not, breaking the convention followed consistently elsewhere in the same file.
- | google/gemini-3.1-flash | $0.018 | 0.9s | 1.6s | 98/100 | Default for this workload |
- | cerebras/qwen-3-32b | $0.004 | 0.3s | 0.7s | 96/100 | **Fastest**, slightly worse on sarcasm |
+ | google/gemini-3.1-flash | $0.018 | 0.9s | 1.6s | 98/100 | Refresh against Gemini 3.1 Flash; was default for this workload |
+ | cerebras/qwen-3-32b | $0.004 | 0.3s | 0.7s | 96/100 | Refresh against Qwen 3 32B; was **Fastest**, slightly worse on sarcasm |
Fixed in 1dbe29a: renamed benchmark rows now explicitly say they need refresh against the new model IDs instead of implying the old measurements were run on the new models.
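The convention discussed above could be checked mechanically. The following is a minimal sketch, not part of the repository: the function name `rows_missing_refresh_note` is hypothetical, and it assumes each table row lists the model ID in the first cell and the Notes text in the last cell.

```python
# Model IDs the review says were renamed in this PR.
RENAMED_MODELS = {"google/gemini-3.1-flash", "cerebras/qwen-3-32b"}
REFRESH_MARKER = "Refresh against"  # note prefix used by the T2-T4 rows


def rows_missing_refresh_note(table_text, renamed_models=RENAMED_MODELS):
    """Return the model IDs of renamed rows whose Notes cell lacks the marker."""
    missing = []
    for line in table_text.splitlines():
        # Split a Markdown table row into trimmed, non-empty cells.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        cells = [c for c in cells if c]
        if len(cells) < 2:
            continue
        model, notes = cells[0], cells[-1]  # assumption: model first, Notes last
        if model in renamed_models and REFRESH_MARKER not in notes:
            missing.append(model)
    return missing


# Rows as quoted in the review, before and after the fix.
BEFORE = """\
| google/gemini-3.1-flash | $0.018 | 0.9s | 1.6s | 98/100 | Default for this workload |
| cerebras/qwen-3-32b | $0.004 | 0.3s | 0.7s | 96/100 | **Fastest**, slightly worse on sarcasm |
"""
AFTER = """\
| google/gemini-3.1-flash | $0.018 | 0.9s | 1.6s | 98/100 | Refresh against Gemini 3.1 Flash; was default for this workload |
| cerebras/qwen-3-32b | $0.004 | 0.3s | 0.7s | 96/100 | Refresh against Qwen 3 32B; was **Fastest**, slightly worse on sarcasm |
"""

print(rows_missing_refresh_note(BEFORE))  # both renamed rows are flagged
print(rows_missing_refresh_note(AFTER))   # nothing is flagged
```

A check like this would only catch rows for model IDs listed in `RENAMED_MODELS`, so the set has to be maintained alongside each rename.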
long_context: google/gemini-2.5-pro
coding: moonshot/kimi # Fallback to Claude only for hard coding
default: google/gemini-3.1-flash
classification: cerebras/qwen-3-32b
🟡 cost-optimized.yaml header comment says 'Cerebras Llama 70B' but classification model was changed to Qwen 3 32B
The PR changed the classification model from cerebras/llama-3.1-70b to cerebras/qwen-3-32b at line 16, but the header comment at templates/config/cost-optimized.yaml:7 still reads # - Cerebras Llama 70B (free-ish tier) for classification. The CONTRIBUTING.md PR checklist requires "For templates: comment every non-obvious field; include a header explaining what the template is for" — the header now misidentifies a key model.
Prompt for agents
The header comment block at lines 1-10 of templates/config/cost-optimized.yaml states "Cerebras Llama 70B (free-ish tier) for classification" at line 7, but the actual classification model at line 16 was changed to cerebras/qwen-3-32b. Update line 7 to read something like "Cerebras Qwen 3 32B (free-ish tier) for classification" to match the actual model used.
Fixed in 1dbe29a: the header now matches the actual cerebras/qwen-3-32b classification model.
  ...
  toolsets: [web]
- model: gemini-2.5-pro # 1M context
+ model: gemini-3.1-pro # 1M context
🟡 weekly-dep-audit model ref updated to gemini-3.1-pro but surrounding prose still says 'Gemini 2.5 Pro' with stale pricing
The delegate_task model at skills/ops/weekly-dep-audit/SKILL.md:54 was changed from gemini-2.5-pro to gemini-3.1-pro, but the surrounding prose was not updated: line 24 says "Uses Gemini 2.5 Pro's 1M context", line 37 says "Delegate to Gemini 2.5 Pro", and line 90 quotes pricing as "$1.25/$10 per MTok" which matches the old gemini-2.5-pro entry, not the new gemini-3.1-pro which is priced at $1.50/$12 per MTok in the updated benchmarks/matrix.yaml:10-12. Users following the cost note would underestimate per-run cost.
Prompt for agents
In skills/ops/weekly-dep-audit/SKILL.md, the model reference at line 54 was updated to gemini-3.1-pro but three other locations in the same file still reference the old model:
- Line 24: "Uses Gemini 2.5 Pro's 1M context" should become "Uses Gemini 3.1 Pro's 1M context"
- Line 37: "Delegate to Gemini 2.5 Pro" should become "Delegate to Gemini 3.1 Pro"
- Line 90: "Gemini 2.5 Pro at $1.25/$10 per MTok" should be updated to reflect Gemini 3.1 Pro pricing from benchmarks/matrix.yaml ($1.50/$12 per MTok), updating the per-run cost estimate accordingly.
This is an incomplete model-name migration within a single file.
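An incomplete migration like this can be found with a simple text scan. The sketch below is illustrative only: `find_stale_references` is a hypothetical helper, and `SAMPLE` is a condensed excerpt built from the lines quoted in this review, so its line numbers do not match the real SKILL.md.

```python
import re

# Matches both the prose spelling "Gemini 2.5 Pro" and the ID "gemini-2.5-pro".
STALE_MODEL = re.compile(r"gemini[ -]2\.5[ -]pro", re.IGNORECASE)


def find_stale_references(text, pattern=STALE_MODEL):
    """Return (line_number, line) pairs for lines that mention the old model."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


# Hypothetical excerpt: two stale prose lines around the already-updated model ID.
SAMPLE = """\
Uses Gemini 2.5 Pro's 1M context
model: gemini-3.1-pro # 1M context
Delegate to Gemini 2.5 Pro
"""

for number, line in find_stale_references(SAMPLE):
    print(f"{number}: {line}")
```

Running the same scan over the whole file after a rename gives a quick list of prose that still needs updating; pricing figures tied to the old model still need a manual check.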
Fixed in 1dbe29a: updated the skill prose to Gemini 3.1 Pro and adjusted the cost note to $1.50/$12 per MTok, or roughly $1.50 per 1M-token run.
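The per-run figure follows from simple arithmetic. The sketch below assumes the "$1.50/$12 per MTok" pair means input price / output price, which the thread does not state explicitly; `run_cost_usd` is a hypothetical helper, and the 5,000 output tokens are a made-up figure used only to show how output adds to the total.

```python
def run_cost_usd(input_tokens, output_tokens, input_per_mtok, output_per_mtok):
    """Estimate the USD cost of one run from token counts and per-MTok prices."""
    return (
        (input_tokens / 1_000_000) * input_per_mtok
        + (output_tokens / 1_000_000) * output_per_mtok
    )


# Input-only cost of a 1M-token run at the old and new prices.
old_cost = run_cost_usd(1_000_000, 0, 1.25, 10.0)
new_cost = run_cost_usd(1_000_000, 0, 1.50, 12.0)
print(f"old: ${old_cost:.2f}, new: ${new_cost:.2f}")  # old: $1.25, new: $1.50

# Hypothetical: the same run with a 5,000-token report as output.
with_report = run_cost_usd(1_000_000, 5_000, 1.50, 12.0)
print(f"with report: ${with_report:.2f}")  # with report: $1.56
```

Under these assumptions the stale note understated the input cost of each run by $0.25.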
Summary
Refreshes the guide for Hermes v0.13.0 / v2026.5.7 and the May 2026 SOTA agent/model landscape. Adds Part 23 for the Tenacity stack: Kanban, worker lanes, /goal, Checkpoints v2, no-agent cron, media routing, xAI Custom Voices, provider plugins, and the v0.13 upgrade checklist. Updates stale v0.12-as-current claims across README, platform docs, model guidance, coding-agent routing, security, observability, templates, benchmarks, localized READMEs, and outreach copy. Follow-up commits address Devin Review notes by marking renamed benchmark rows as refresh-needed, aligning the cost-optimized template header, and updating weekly dependency-audit Gemini 3.1 pricing/prose.
Type
- skills/)
- templates/config/)
Checklist
- ./partN-foo.md) and resolve
- ${VAR} placeholders only
- trust: / bypass_subagents posture documented
Screenshots / diffs (optional)
Local validation run:
- python3 .github/scripts/validate_skills.py
- yamllint -c .github/yamllint.yml templates benchmarks
- markdown-link-check across Markdown files
- git diff --check
Link to Devin session: https://app.devin.ai/sessions/d600c4abbb09414abe9bb0a0802421d0
Requested by: @OnlyTerp