docs: refresh Hermes guide for v0.13 by OnlyTerp · Pull Request #11 · OnlyTerp/hermes-optimization-guide

OnlyTerp · 2026-05-14T23:31:56Z

Summary

Refreshes the guide for Hermes v0.13.0 / v2026.5.7 and the May 2026 SOTA agent/model landscape. Adds Part 23 for the Tenacity stack: Kanban, worker lanes, /goal, Checkpoints v2, no-agent cron, media routing, xAI Custom Voices, provider plugins, and the v0.13 upgrade checklist. Updates stale v0.12-as-current claims across README, platform docs, model guidance, coding-agent routing, security, observability, templates, benchmarks, localized READMEs, and outreach copy. Follow-up commits address Devin Review notes by marking renamed benchmark rows as refresh-needed, aligning the cost-optimized template header, and updating weekly dependency-audit Gemini 3.1 pricing/prose.

Type

Checklist

Cross-links are relative (./partN-foo.md) and resolve
No secrets in any example — ${VAR} placeholders only
Dates / prices / PR numbers are current (or marked with the date)
For skills: security notes included; trust: / bypass_subagents posture documented
For templates: every non-obvious field is commented
CHANGELOG.md updated if user-facing

Screenshots / diffs (optional)

Local validation run:

python3 .github/scripts/validate_skills.py
yamllint -c .github/yamllint.yml templates benchmarks
markdown-link-check across Markdown files
git diff --check

Link to Devin session: https://app.devin.ai/sessions/d600c4abbb09414abe9bb0a0802421d0
Requested by: @OnlyTerp

devin-ai-integration · 2026-05-14T23:31:58Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.


devin-ai-integration

Devin Review found 3 potential issues.

View 4 additional findings in Devin Review.

devin-ai-integration · 2026-05-14T23:36:46Z

+| google/gemini-3.1-flash | $0.018 | 0.9s | 1.6s | 98/100 | Default for this workload |
+| cerebras/qwen-3-32b  | $0.004 | 0.3s | 0.7s | 96/100 | **Fastest**, slightly worse on sarcasm |


🟡 Benchmark T1/T5 model names changed without 'Refresh' notes, inconsistent with T2–T4 convention

In the benchmark results tables, T2 through T4 entries consistently mark model-name-changed rows with "Refresh against [model]" notes to signal the data was measured on the predecessor model. T1 and T5 entries for google/gemini-3.1-flash and cerebras/qwen-3-32b change model names (from gemini-2.5-flash and cerebras/llama-3.1-70b) but keep the original Notes text (e.g., "Default for this workload", "Fastest; some schema drift"). This implies the benchmark data was measured on the new models when it was actually measured on entirely different ones. The T1 recommendation line at benchmarks/README.md:44 does note model IDs changed, but the individual table rows at lines 39–40 do not, breaking the convention followed consistently elsewhere in the same file.

Suggested change

- | google/gemini-3.1-flash | $0.018 | 0.9s | 1.6s | 98/100 | Default for this workload |
- | cerebras/qwen-3-32b | $0.004 | 0.3s | 0.7s | 96/100 | **Fastest**, slightly worse on sarcasm |
+ | google/gemini-3.1-flash | $0.018 | 0.9s | 1.6s | 98/100 | Refresh against Gemini 3.1 Flash; was default for this workload |
+ | cerebras/qwen-3-32b | $0.004 | 0.3s | 0.7s | 96/100 | Refresh against Qwen 3 32B; was **Fastest**, slightly worse on sarcasm |


Fixed in 1dbe29a: renamed benchmark rows now explicitly say they need refresh against the new model IDs instead of implying the old measurements were run on the new models.
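The convention this finding enforces can be checked mechanically. Below is a minimal Python sketch that is not part of the repository: `parse_row` and `rows_missing_refresh_note` are hypothetical names. It assumes the table layout shown above, with the model ID in the first cell and Notes in the last, and it compares rows as (old, new) pairs such as those taken from a diff. In the sample, only the old model ID comes from the review; the other old-row cells are illustrative.

```python
"""Hypothetical lint: a benchmark row whose model ID changed must say it needs a refresh.

Not part of hermes-optimization-guide; assumes Markdown rows of the form
| model | cost | p50 | p95 | score | notes |
"""


def parse_row(row: str) -> list[str]:
    # Trim the row, drop the outer pipes, then split and trim each cell.
    return [cell.strip() for cell in row.strip().strip("|").split("|")]


def rows_missing_refresh_note(pairs: list[tuple[str, str]]) -> list[str]:
    """Return model IDs of new rows that were renamed without a "Refresh against" note."""
    flagged = []
    for old_row, new_row in pairs:
        old_cells, new_cells = parse_row(old_row), parse_row(new_row)
        renamed = old_cells[0] != new_cells[0]
        if renamed and not new_cells[-1].startswith("Refresh against"):
            flagged.append(new_cells[0])
    return flagged


if __name__ == "__main__":
    # Old row: only the model ID is taken from the review; other cells are illustrative.
    old = "| cerebras/llama-3.1-70b | $0.004 | 0.3s | 0.7s | 96/100 | **Fastest**, slightly worse on sarcasm |"
    renamed_only = old.replace("cerebras/llama-3.1-70b", "cerebras/qwen-3-32b")
    fixed = renamed_only.replace("| **Fastest**", "| Refresh against Qwen 3 32B; was **Fastest**")
    print(rows_missing_refresh_note([(old, renamed_only)]))  # ['cerebras/qwen-3-32b']
    print(rows_missing_refresh_note([(old, fixed)]))  # []
```

Rows whose model ID is unchanged are never flagged, so re-measured rows can keep their original notes.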

devin-ai-integration · 2026-05-14T23:36:47Z

-  long_context: google/gemini-2.5-pro
-  coding: moonshot/kimi                 # Fallback to Claude only for hard coding
+  default: google/gemini-3.1-flash
+  classification: cerebras/qwen-3-32b


🟡 cost-optimized.yaml header comment says 'Cerebras Llama 70B' but classification model was changed to Qwen 3 32B

The PR changed the classification model from cerebras/llama-3.1-70b to cerebras/qwen-3-32b at line 16, but the header comment at templates/config/cost-optimized.yaml:7 still reads # - Cerebras Llama 70B (free-ish tier) for classification. The CONTRIBUTING.md PR checklist requires "For templates: comment every non-obvious field; include a header explaining what the template is for" — the header now misidentifies a key model.

Prompt for agents

The header comment block at lines 1-10 of templates/config/cost-optimized.yaml states "Cerebras Llama 70B (free-ish tier) for classification" at line 7, but the actual classification model at line 16 was changed to cerebras/qwen-3-32b. Update line 7 to read something like "Cerebras Qwen 3 32B (free-ish tier) for classification" to match the actual model used.


Fixed in 1dbe29a: the header now matches the actual cerebras/qwen-3-32b classification model.
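A similar check could catch a stale template header. This is a hedged Python sketch, not the repository's validator: the `FAMILIES` keyword list and `stale_header_families` function are hypothetical, every full-line `#` comment is treated as header prose, and plain string matching stands in for a YAML parser. It reports model families that a comment mentions but no config value uses.

```python
"""Hypothetical check: flag model families named in comments but absent from config values."""

# Illustrative keyword list; a real check might derive it from benchmarks/matrix.yaml.
FAMILIES = ("llama", "qwen", "gemini", "kimi", "claude")


def stale_header_families(config_text: str) -> set[str]:
    comment_lines, value_lines = [], []
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            comment_lines.append(stripped.lower())
        else:
            # Keep only the part before an inline comment, e.g. "coding: moonshot/kimi  # ...".
            value_lines.append(line.split("#", 1)[0].lower())
    comments = " ".join(comment_lines)
    values = " ".join(value_lines)
    return {family for family in FAMILIES if family in comments and family not in values}


if __name__ == "__main__":
    # Hypothetical minimal template; the "models:" key is assumed for illustration.
    before = (
        "# - Cerebras Llama 70B (free-ish tier) for classification\n"
        "models:\n"
        "  default: google/gemini-3.1-flash\n"
        "  classification: cerebras/qwen-3-32b\n"
    )
    after = before.replace("Cerebras Llama 70B", "Cerebras Qwen 3 32B")
    print(stale_header_families(before))  # {'llama'}
    print(stale_header_families(after))  # set()
```

Substring matching is deliberately loose; a family name that appears only inside an unrelated word would need a stricter word-boundary match.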

devin-ai-integration · 2026-05-14T23:36:48Z

         ...
   toolsets: [web]
-   model: gemini-2.5-pro          # 1M context
+   model: gemini-3.1-pro          # 1M context


🟡 weekly-dep-audit model ref updated to gemini-3.1-pro but surrounding prose still says 'Gemini 2.5 Pro' with stale pricing

The delegate_task model at skills/ops/weekly-dep-audit/SKILL.md:54 was changed from gemini-2.5-pro to gemini-3.1-pro, but the surrounding prose was not updated: line 24 says "Uses Gemini 2.5 Pro's 1M context", line 37 says "Delegate to Gemini 2.5 Pro", and line 90 quotes pricing as "$1.25/$10 per MTok" which matches the old gemini-2.5-pro entry, not the new gemini-3.1-pro which is priced at $1.50/$12 per MTok in the updated benchmarks/matrix.yaml:10-12. Users following the cost note would underestimate per-run cost.

Prompt for agents

In skills/ops/weekly-dep-audit/SKILL.md, the model reference at line 54 was updated to gemini-3.1-pro, but three other locations in the same file still reference the old model:

- Line 24: "Uses Gemini 2.5 Pro's 1M context" should become "Uses Gemini 3.1 Pro's 1M context".
- Line 37: "Delegate to Gemini 2.5 Pro" should become "Delegate to Gemini 3.1 Pro".
- Line 90: "Gemini 2.5 Pro at $1.25/$10 per MTok" should reflect Gemini 3.1 Pro pricing from benchmarks/matrix.yaml ($1.50/$12 per MTok), with the per-run cost estimate updated accordingly.

This is an incomplete model-name migration within a single file.
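A case-insensitive search for both the old model ID and its display name finds this kind of leftover. A minimal sketch, assuming the only spellings to catch are `gemini-2.5-pro` and `Gemini 2.5 Pro`; the regex and `stale_model_lines` are hypothetical, and line numbers are 1-based to match the `SKILL.md:54` style used in the review. The sample lines paraphrase the quotes above.

```python
import re

# Matches both the model ID ("gemini-2.5-pro") and the prose form ("Gemini 2.5 Pro").
OLD_MODEL = re.compile(r"gemini[-\s]2\.5[-\s]pro", re.IGNORECASE)


def stale_model_lines(text: str, pattern: re.Pattern = OLD_MODEL) -> list[int]:
    """Return 1-based line numbers that still mention the old model."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if pattern.search(line)]


if __name__ == "__main__":
    sample = "\n".join([
        "Uses Gemini 2.5 Pro's 1M context",
        "Delegate to Gemini 3.1 Pro",
        "  model: gemini-3.1-pro          # 1M context",
        "Gemini 2.5 Pro at $1.25/$10 per MTok",
    ])
    print(stale_model_lines(sample))  # [1, 4]
```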


Fixed in 1dbe29a: updated the skill prose to Gemini 3.1 Pro and adjusted the cost note to $1.50/$12 per MTok / roughly $1.50 per 1M-token run.
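The per-run figure follows from per-million-token pricing. A short Python sketch of the arithmetic; the function name is hypothetical, and it assumes cost is linear in input and output tokens and that "roughly $1.50 per 1M-token run" means about 1M input tokens with negligible output. The 20k-output case is illustrative, not a measured run.

```python
def run_cost_usd(input_tokens: int, output_tokens: int,
                 input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Cost of one run when prices are quoted in USD per million tokens (MTok)."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000


if __name__ == "__main__":
    # Gemini 3.1 Pro at $1.50/$12 per MTok vs. Gemini 2.5 Pro at $1.25/$10 per MTok.
    print(run_cost_usd(1_000_000, 0, 1.50, 12.0))  # 1.5
    print(run_cost_usd(1_000_000, 0, 1.25, 10.0))  # 1.25
    # Output costs 8x input at these prices, so 20k output tokens add $0.24.
    print(round(run_cost_usd(1_000_000, 20_000, 1.50, 12.0), 2))  # 1.74
```

Because output tokens cost eight times as much as input here, a run that emits a long report lands noticeably above the $1.50 estimate.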

docs: refresh Hermes guide for v0.13

75b5b8c

devin-ai-integration Bot assigned OnlyTerp May 14, 2026

devin-ai-integration Bot reviewed May 14, 2026


docs: address v0.13 review notes

1dbe29a

OnlyTerp merged commit a82cace into main May 15, 2026
4 checks passed


OnlyTerp merged 2 commits into main from devin/1778800581-hermes-sota-refresh

OnlyTerp commented May 14, 2026 • edited by devin-ai-integration Bot


1 participant
