feat(sisyphus): add GLM-5.x dedicated prompt builder and speed overlay#3736
feat(sisyphus): add GLM-5.x dedicated prompt builder and speed overlay#3736islee23520 wants to merge 35 commits intocode-yeongyu:devfrom
Conversation
- Add isGlmSisyphusHarnessModel for GLM-5/5.1/5-turbo detection - Route GLM harness models to specialized prompts in Sisyphus agents - Add Small Context Working Memory with state slices for GLM context optimization - Add GLM-specific context priorities and vision constraints for Sisyphus Junior - Add comprehensive tests for GLM prompt validation and routing
- Refactor sisyphus/glm.ts: 387→69 lines via overlay pattern (string replacement)
- Add isGlmThinkingModel() for GLM-5+ text models (excludes VLM)
- Add isGlmVisionModel() for GLM VLM variants (glm-4.6v, glm-5v-turbo)
- Oracle/Metis/Momus: GLM-5+ text → thinking: { type: enabled }, Claude → budgetTokens
- Sisyphus-Junior: GLM → thinking: { type: enabled } (was bare base)
- Sisyphus: GLM overlay injection + thinking config, fact-checked comments
- Update stale test: sisyphus-junior GLM now returns thinking
- Add 100-test factory benchmark (5 agents × 7 GLM variants + cross-agent guards)
- Add runtime benchmark script (scripts/benchmark-glm-thinking.ts)
Benchmark: 100 factory tests pass, 452 agent tests pass, typecheck clean
Verified: No GLM text model receives budgetTokens across any agent
Refs: code-yeongyu#3210, code-yeongyu#3256, code-yeongyu#3568
…rlay - New src/agents/sisyphus/glm-prompt.ts: 8-block GLM-specific Sisyphus prompt (DISPATCH→DELEGATE→COLLECT→SYNTHESIZE→DONE execution loop replacing EXPLORE→PLAN→ROUTE→EXECUTE→VERIFY→RETRY→DONE) - New src/agents/glm-prompt-quality.test.ts: 32 quality benchmarks across Instruction Compliance (10), Speed (10), Accuracy (9), Cross-Agent (3) - Extended src/agents/sisyphus-junior/glm.ts: SJ speed overlay with execution-first mindset, brief thinking, re-entry rule, exploration budget (2-iteration cap), tiered verification V1/V2/V3, token economy - Modified src/agents/sisyphus.ts: GLM routing from overlay string.replace to dedicated buildGlmSisyphusPrompt() builder (matches Kimi K2.x pattern) GLM-5.x does not support budgetTokens. Excessive thinking was controlled via prompt engineering: concise thinking mandate, re-entry rule (suppress re-verbalization for resolved turns), exploration budget hard stops, and tiered verification (V1/V2/V3) to avoid over-verification on trivial changes. Hephaestus delegation strategy included: sequential edits >= 3 automatically routed to Hephaestus (deep-thinking worker) to keep Sisyphus unblocked. All 140 GLM-related tests pass. Typecheck clean. AI slop removed.
There was a problem hiding this comment.
No issues found across 18 files
Confidence score: 5/5
- Automated review surfaced no issues in the provided summaries.
- No files require special attention.
Auto-approved: GLM-specific changes are strictly isolated using model-specific predicates. Non-GLM logic and configurations are preserved, verified by 140 tests including 32 new quality benchmarks.
- Add buildGlmSubagentVisionBlock() for concise subagent vision warnings - Apply vision constraint to Oracle, Metis, Momus (GLM branches) - Simplify GLM SJ speed overlay prompt (remove 4 redundant lines) - Remove redundant JSDoc from metis.ts, marketing language from sisyphus.ts - Centralize Sisyphus description as SISYPHUS_DESCRIPTION constant
There was a problem hiding this comment.
No issues found across 18 files
Confidence score: 5/5
- Automated review surfaced no issues in the provided summaries.
- No files require special attention.
Requires human review: Large PR (1800+ lines) modifies Sisyphus metadata descriptions globally, not just for GLM models. 45 failing tests reported, and complex model-routing logic changes require manual verification.
|
Looking forward to try these changes |
…unior prompt test
The SJ GLM prompt intentionally omits the .sisyphus/state/{plan-or-session}/
path and individual slice filenames (goal.md, decisions.md, etc.) that the
main Sisyphus GLM prompt includes. The test incorrectly expected the full
ledger path; align it with the lightweight memory contract.
# Conflicts: # src/agents/momus.ts
…r GLM Upstream removed GLM-specific thinking config from Momus, causing budgetTokens: 32000 to be applied to GLM models that do not support it. Restore the isGlmThinkingModel branch matching Metis and Oracle. Also add test coverage for: - Momus GLM thinking config without budgetTokens - Sisyphus call_omo_agent permission (allow) vs Hephaestus (deny)
There was a problem hiding this comment.
1 issue found across 5 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/plugin-handlers/tool-config-handler.ts">
<violation number="1">
P1: Sisyphus’s `call_omo_agent` permission was changed from allow to deny, which can break its ability to delegate to other agents.</violation>
</file>
Tip: Review your code locally with the cubic CLI to iterate faster.
…="deep") delegation
Restore architecture and routing intent comments in types.ts, oracle.ts, and sisyphus/glm.ts that document design decisions.
There was a problem hiding this comment.
0 issues found across 3 files (changes from recent commits).
Requires human review: PR adds 2544 lines across 12+ files (new agents, prompt builders, config changes). Despite clean AI review and tests, the strict '100% sure no regressions' criteria cannot be met without manual review
…J, sisyphus Restore architecture intent comments: - metis.ts: agent role/responsibilities JSDoc - sisyphus-junior/agent.ts: routing order, BLOCKED_TOOLS intent - sisyphus.ts: Gemini overlay placement rationale, GLM thinking note
There was a problem hiding this comment.
0 issues found across 3 files (changes from recent commits).
Requires human review: Cannot be 100% sure of zero regressions: modifies core orchestration agents (sisyphus, metis, oracle, momus, sisyphus-junior) with new GLM routing logic, which could introduce edge-case misrouting or
… prompt The GLM_SJ_Speed_Optimizations section duplicated content already present in the base SJ prompt and Sisyphus system prompt. Only the GLM-specific context priorities and vision constraint are kept.
delegation-scorecard and event-metric-collector are test-only utilities with zero runtime consumers. Only referenced by scripts/benchmark-* Move to scripts/ if needed later.
|
You're iterating quickly on this pull request. To help protect your rate limits, cubic has paused automatic reviews on new pushes for now—when you're ready for another review, comment |
Summary
GLM-5.x models overthink during Sisyphus orchestration. Since
budgetTokensis not supported, thinking restraint must come from prompt structure. This PR introduces a dedicated GLM prompt system that enforces a fast dispatch-first execution loop and prevents text-only models from attempting image analysis.Changes
glm-prompt.ts) with 8-block structure enforcing DISPATCH→DELEGATE→COLLECT→SYNTHESIZE→DONEVerification
bun run typecheck— cleanbun run build— cleanbun test— 6026 pass (45 pre-existing failures in tmux/background-agent, unrelated)Checklist
bun run typecheckpassesbun run buildsucceedspackage.json