docs: refresh Hermes guide for v0.13 #11

Merged
OnlyTerp merged 2 commits into main from devin/1778800581-hermes-sota-refresh
May 15, 2026

@OnlyTerp (Owner) commented May 14, 2026

Summary

Refreshes the guide for Hermes v0.13.0 / v2026.5.7 and the May 2026 SOTA agent/model landscape. Adds Part 23 for the Tenacity stack: Kanban, worker lanes, /goal, Checkpoints v2, no-agent cron, media routing, xAI Custom Voices, provider plugins, and the v0.13 upgrade checklist. Updates stale v0.12-as-current claims across README, platform docs, model guidance, coding-agent routing, security, observability, templates, benchmarks, localized READMEs, and outreach copy. Follow-up commits address Devin Review notes by marking renamed benchmark rows as refresh-needed, aligning the cost-optimized template header, and updating weekly dependency-audit Gemini 3.1 pricing/prose.

Type

  • Docs / content update
  • New skill (skills/)
  • New config template (templates/config/)
  • Benchmark addition
  • Ecosystem entry
  • Infra template (compose / caddy / systemd / script)
  • Fix / typo / link

Checklist

  • Cross-links are relative (./partN-foo.md) and resolve
  • No secrets in any example — ${VAR} placeholders only
  • Dates / prices / PR numbers are current (or marked with the date)
  • For skills: security notes included; trust: / bypass_subagents posture documented
  • For templates: every non-obvious field is commented
  • CHANGELOG.md updated if user-facing

Screenshots / diffs (optional)

Local validation run:

  • python3 .github/scripts/validate_skills.py
  • yamllint -c .github/yamllint.yml templates benchmarks
  • markdown-link-check across Markdown files
  • git diff --check

Link to Devin session: https://app.devin.ai/sessions/d600c4abbb09414abe9bb0a0802421d0
Requested by: @OnlyTerp



@devin-ai-integration (Contributor)

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.


@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 3 potential issues.

View 4 additional findings in Devin Review.

Comment thread benchmarks/README.md Outdated
Comment on lines +39 to +40
| google/gemini-3.1-flash | $0.018 | 0.9s | 1.6s | 98/100 | Default for this workload |
| cerebras/qwen-3-32b | $0.004 | 0.3s | 0.7s | 96/100 | **Fastest**, slightly worse on sarcasm |

🟡 Benchmark T1/T5 model names changed without 'Refresh' notes, inconsistent with T2–T4 convention

In the benchmark results tables, T2 through T4 entries consistently mark model-name-changed rows with "Refresh against [model]" notes to signal the data was measured on the predecessor model. T1 and T5 entries for google/gemini-3.1-flash and cerebras/qwen-3-32b change model names (from gemini-2.5-flash and cerebras/llama-3.1-70b) but keep the original Notes text (e.g., "Default for this workload", "Fastest; some schema drift"). This implies the benchmark data was measured on the new models when it was actually measured on entirely different ones. The T1 recommendation line at benchmarks/README.md:44 does note model IDs changed, but the individual table rows at lines 39–40 do not, breaking the convention followed consistently elsewhere in the same file.

Suggested change
- | google/gemini-3.1-flash | $0.018 | 0.9s | 1.6s | 98/100 | Default for this workload |
- | cerebras/qwen-3-32b | $0.004 | 0.3s | 0.7s | 96/100 | **Fastest**, slightly worse on sarcasm |
+ | google/gemini-3.1-flash | $0.018 | 0.9s | 1.6s | 98/100 | Refresh against Gemini 3.1 Flash; was default for this workload |
+ | cerebras/qwen-3-32b | $0.004 | 0.3s | 0.7s | 96/100 | Refresh against Qwen 3 32B; was **Fastest**, slightly worse on sarcasm |

Fixed in 1dbe29a: renamed benchmark rows now explicitly say they need refresh against the new model IDs instead of implying the old measurements were run on the new models.
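
The convention described in this thread can be checked mechanically. The following is a minimal sketch, not part of the repository's validation scripts: it assumes benchmark tables are pipe-delimited Markdown rows with the model ID in the first column and the notes in the last, and that the set of renamed model IDs is maintained by hand. `RENAMED_MODELS` and `rows_missing_refresh_note` are hypothetical names.

```python
# Hypothetical helper: flag table rows whose model ID was renamed
# but whose Notes column does not mention a refresh.
RENAMED_MODELS = {"google/gemini-3.1-flash", "cerebras/qwen-3-32b"}  # assumed, hand-maintained


def rows_missing_refresh_note(markdown: str, renamed=RENAMED_MODELS) -> list[str]:
    flagged = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # not a table row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) < 2:
            continue
        model, notes = cells[0], cells[-1]
        if model in renamed and "refresh" not in notes.lower():
            flagged.append(model)
    return flagged


# Rows as they appeared before and after the suggested change in this thread.
before = """
| google/gemini-3.1-flash | $0.018 | 0.9s | 1.6s | 98/100 | Default for this workload |
| cerebras/qwen-3-32b | $0.004 | 0.3s | 0.7s | 96/100 | **Fastest**, slightly worse on sarcasm |
"""
after = """
| google/gemini-3.1-flash | $0.018 | 0.9s | 1.6s | 98/100 | Refresh against Gemini 3.1 Flash; was default for this workload |
| cerebras/qwen-3-32b | $0.004 | 0.3s | 0.7s | 96/100 | Refresh against Qwen 3 32B; was **Fastest**, slightly worse on sarcasm |
"""

print(rows_missing_refresh_note(before))  # both renamed rows are flagged
print(rows_missing_refresh_note(after))   # nothing is flagged
```

A check like this only catches rows for model IDs listed in the set, so it complements review rather than replacing it.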

long_context: google/gemini-2.5-pro
coding: moonshot/kimi # Fallback to Claude only for hard coding
default: google/gemini-3.1-flash
classification: cerebras/qwen-3-32b

🟡 cost-optimized.yaml header comment says 'Cerebras Llama 70B' but classification model was changed to Qwen 3 32B

The PR changed the classification model from cerebras/llama-3.1-70b to cerebras/qwen-3-32b at line 16, but the header comment at templates/config/cost-optimized.yaml:7 still reads # - Cerebras Llama 70B (free-ish tier) for classification. The CONTRIBUTING.md PR checklist requires "For templates: comment every non-obvious field; include a header explaining what the template is for" — the header now misidentifies a key model.

Prompt for agents
The header comment block at lines 1-10 of templates/config/cost-optimized.yaml states "Cerebras Llama 70B (free-ish tier) for classification" at line 7, but the actual classification model at line 16 was changed to cerebras/qwen-3-32b. Update line 7 to read something like "Cerebras Qwen 3 32B (free-ish tier) for classification" to match the actual model used.

Fixed in 1dbe29a: the header now matches the actual cerebras/qwen-3-32b classification model.
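
Header drift of this kind could be caught with a small consistency check. The sketch below is an illustration under stated assumptions, not the repository's tooling: it treats the leading run of `#` lines as the header, reads the model ID for a key with a regular expression instead of a YAML parser, and uses a rough heuristic (the first dash-separated token after the provider prefix) as the model family. `header_mentions_model` is a hypothetical name, and the two sample strings are trimmed to the lines quoted in this thread.

```python
import re


def header_mentions_model(yaml_text: str, key: str) -> bool:
    """Return True if the leading comment block names the model family set for `key`."""
    header = []
    for line in yaml_text.splitlines():
        if line.lstrip().startswith("#"):
            header.append(line)
        elif line.strip():
            break  # first non-comment, non-blank line ends the header
    match = re.search(rf"^\s*{re.escape(key)}:\s*(\S+)", yaml_text, re.MULTILINE)
    if match is None:
        raise KeyError(key)
    model_id = match.group(1)                         # e.g. "cerebras/qwen-3-32b"
    family = model_id.split("/")[-1].split("-")[0]    # heuristic: "qwen"
    return family.lower() in "\n".join(header).lower()


# Trimmed excerpts: the stale header line from the review, and the suggested wording.
stale = "# - Cerebras Llama 70B (free-ish tier) for classification\n\nclassification: cerebras/qwen-3-32b\n"
fixed = "# - Cerebras Qwen 3 32B (free-ish tier) for classification\n\nclassification: cerebras/qwen-3-32b\n"

print(header_mentions_model(stale, "classification"))  # False
print(header_mentions_model(fixed, "classification"))  # True
```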

...
toolsets: [web]
model: gemini-2.5-pro # 1M context
model: gemini-3.1-pro # 1M context

🟡 weekly-dep-audit model ref updated to gemini-3.1-pro but surrounding prose still says 'Gemini 2.5 Pro' with stale pricing

The delegate_task model at skills/ops/weekly-dep-audit/SKILL.md:54 was changed from gemini-2.5-pro to gemini-3.1-pro, but the surrounding prose was not updated: line 24 says "Uses Gemini 2.5 Pro's 1M context", line 37 says "Delegate to Gemini 2.5 Pro", and line 90 quotes pricing as "$1.25/$10 per MTok" which matches the old gemini-2.5-pro entry, not the new gemini-3.1-pro which is priced at $1.50/$12 per MTok in the updated benchmarks/matrix.yaml:10-12. Users following the cost note would underestimate per-run cost.

Prompt for agents
In skills/ops/weekly-dep-audit/SKILL.md, the model reference at line 54 was updated to gemini-3.1-pro but three other locations in the same file still reference the old model:

- Line 24: "Uses Gemini 2.5 Pro's 1M context" should become "Uses Gemini 3.1 Pro's 1M context"
- Line 37: "Delegate to Gemini 2.5 Pro" should become "Delegate to Gemini 3.1 Pro"
- Line 90: "Gemini 2.5 Pro at $1.25/$10 per MTok" should be updated to reflect Gemini 3.1 Pro pricing from benchmarks/matrix.yaml ($1.50/$12 per MTok), updating the per-run cost estimate accordingly.

This is an incomplete model-name migration within a single file.
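
An incomplete rename like this one is easy to surface with a line scan. A minimal sketch, assuming the old model is written either as `gemini-2.5-pro` or as "Gemini 2.5 Pro"; `STALE` and `find_stale_mentions` are hypothetical names, and the sample text is assembled only from the fragments quoted in this comment, so its line numbers do not match the real SKILL.md.

```python
import re

# Matches both the model ID form and the prose form of the old name.
STALE = re.compile(r"gemini[- ]2\.5[- ]pro", re.IGNORECASE)


def find_stale_mentions(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that still mention the old model."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if STALE.search(line)
    ]


# Illustrative sample built from the quoted fragments, not the actual file.
sample = "\n".join([
    "Uses Gemini 2.5 Pro's 1M context",
    "Delegate to Gemini 2.5 Pro",
    "model: gemini-3.1-pro # 1M context",
    "Gemini 2.5 Pro at $1.25/$10 per MTok",
])

for number, line in find_stale_mentions(sample):
    print(f"{number}: {line}")
```

The already-migrated `model:` line is skipped, while the three prose lines are reported.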

Fixed in 1dbe29a: updated the skill prose to Gemini 3.1 Pro and adjusted the cost note to $1.50/$12 per MTok (roughly $1.50 per 1M-token run).
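
The per-run figure follows from simple arithmetic. The sketch below assumes the two prices in "$1.50/$12 per MTok" are input and output prices respectively, and that a run consumes about 1M input tokens; output tokens are set to zero here because the thread does not state an output size. `run_cost_usd` is a hypothetical helper.

```python
def run_cost_usd(input_tokens: int, output_tokens: int,
                 input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Cost of one run in USD, given per-million-token prices."""
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000


# Assumed reading: first price is input, second is output.
old_cost = run_cost_usd(1_000_000, 0, 1.25, 10.0)  # stale gemini-2.5-pro pricing
new_cost = run_cost_usd(1_000_000, 0, 1.50, 12.0)  # gemini-3.1-pro pricing

print(f"old note: ${old_cost:.2f} per run")
print(f"new note: ${new_cost:.2f} per run")
print(f"underestimate if left stale: ${new_cost - old_cost:.2f} per run")
```

Any output tokens add to both figures at the output rate, so the gap between the stale and updated estimates grows with response length.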

@OnlyTerp OnlyTerp merged commit a82cace into main May 15, 2026
4 checks passed