
feat(sisyphus): add GLM-5.x dedicated prompt builder and speed overlay#3736

Closed
islee23520 wants to merge 35 commits into code-yeongyu:dev from islee23520:tune/glm-performance


Contributor

@islee23520 islee23520 commented Apr 30, 2026

Summary

GLM-5.x models overthink during Sisyphus orchestration. Since budgetTokens is not supported, thinking restraint must come from prompt structure. This PR introduces a dedicated GLM prompt system that enforces a fast dispatch-first execution loop and prevents text-only models from attempting image analysis.

Changes

  • New GLM-specific prompt builder (glm-prompt.ts) with 8-block structure enforcing DISPATCH→DELEGATE→COLLECT→SYNTHESIZE→DONE
  • Vision constraints applied to all GLM-using agents (Sisyphus, SJ, Oracle, Metis, Momus)
  • SJ speed overlay for concise GLM delegation
  • AI slop removal across agent prompts
  • Quality benchmarks (32 tests)

Verification

  • bun run typecheck — clean
  • bun run build — clean
  • bun test — 6026 pass (45 pre-existing failures in tmux/background-agent, unrelated)

Checklist

  • Code follows project conventions
  • bun run typecheck passes
  • bun run build succeeds
  • Tested locally with OpenCode
  • Updated documentation if needed (README, AGENTS.md)
  • No version changes in package.json

- Add isGlmSisyphusHarnessModel for GLM-5/5.1/5-turbo detection
- Route GLM harness models to specialized prompts in Sisyphus agents
- Add Small Context Working Memory with state slices for GLM context optimization
- Add GLM-specific context priorities and vision constraints for Sisyphus Junior
- Add comprehensive tests for GLM prompt validation and routing
- Refactor sisyphus/glm.ts: 387→69 lines via overlay pattern (string replacement)
- Add isGlmThinkingModel() for GLM-5+ text models (excludes VLM)
- Add isGlmVisionModel() for GLM VLM variants (glm-4.6v, glm-5v-turbo)
- Oracle/Metis/Momus: GLM-5+ text → thinking: { type: enabled }, Claude → budgetTokens
- Sisyphus-Junior: GLM → thinking: { type: enabled } (was bare base)
- Sisyphus: GLM overlay injection + thinking config, fact-checked comments
- Update stale test: sisyphus-junior GLM now returns thinking
- Add 100-test factory benchmark (5 agents × 7 GLM variants + cross-agent guards)
- Add runtime benchmark script (scripts/benchmark-glm-thinking.ts)

Benchmark: 100 factory tests pass, 452 agent tests pass, typecheck clean
Verified: No GLM text model receives budgetTokens across any agent

Refs: code-yeongyu#3210, code-yeongyu#3256, code-yeongyu#3568
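The model detection described in this commit can be illustrated with a minimal sketch. The regular expressions and the `modelName` helper below are assumptions made for illustration; only the function names `isGlmThinkingModel` and `isGlmVisionModel` and the example model IDs come from the commit message, and the PR's real predicates may match differently.

```typescript
// Hypothetical helper: strips an optional provider prefix, e.g. "zai/glm-5.1" -> "glm-5.1".
function modelName(modelId: string): string {
  return modelId.toLowerCase().split("/").pop() ?? "";
}

// GLM VLM variants named in the commit message: glm-4.6v, glm-5v-turbo.
// Assumption: a "v" directly after the version number marks a vision model.
function isGlmVisionModel(modelId: string): boolean {
  return /^glm-\d+(\.\d+)?v(-|$)/.test(modelName(modelId));
}

// GLM-5+ text models (glm-5, glm-5.1, glm-5-turbo), excluding VLM variants.
// Assumption: the major version is the first number after "glm-".
function isGlmThinkingModel(modelId: string): boolean {
  const match = /^glm-(\d+)(\.\d+)?(-|$)/.exec(modelName(modelId));
  if (match === null) return false;
  return Number(match[1]) >= 5;
}
```

Under these assumptions a vision ID such as glm-5v-turbo fails the text-model pattern because the character after the version is "v" rather than "-" or end of string, so the two predicates never both return true for the same ID.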
islee23520 changed the title from "feat(agents): GLM-5.x thinking optimization, overlay refactor, and GPT-5.5 prompt hardening" to "feat(agents): GLM-5.x thinking optimization and overlay refactor" on Apr 30, 2026
islee23520 added 2 commits May 1, 2026 09:04
…rlay

- New src/agents/sisyphus/glm-prompt.ts: 8-block GLM-specific Sisyphus prompt
  (DISPATCH→DELEGATE→COLLECT→SYNTHESIZE→DONE execution loop replacing
  EXPLORE→PLAN→ROUTE→EXECUTE→VERIFY→RETRY→DONE)
- New src/agents/glm-prompt-quality.test.ts: 32 quality benchmarks across
  Instruction Compliance (10), Speed (10), Accuracy (9), Cross-Agent (3)
- Extended src/agents/sisyphus-junior/glm.ts: SJ speed overlay with
  execution-first mindset, brief thinking, re-entry rule, exploration budget
  (2-iteration cap), tiered verification V1/V2/V3, token economy
- Modified src/agents/sisyphus.ts: GLM routing from overlay string.replace
  to dedicated buildGlmSisyphusPrompt() builder (matches Kimi K2.x pattern)

GLM-5.x does not support budgetTokens. Excessive thinking was controlled via
prompt engineering: concise thinking mandate, re-entry rule (suppress
re-verbalization for resolved turns), exploration budget hard stops, and
tiered verification (V1/V2/V3) to avoid over-verification on trivial changes.

Hephaestus delegation strategy included: tasks with three or more sequential
edits are automatically routed to Hephaestus (the deep-thinking worker) to
keep Sisyphus unblocked.

All 140 GLM-related tests pass. Typecheck clean. AI slop removed.
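The two numeric limits in this commit message (a 2-iteration exploration cap and Hephaestus routing at 3 or more sequential edits) can be made concrete with a small sketch. In the PR these rules are written into the prompt text rather than enforced at runtime, so every name below is hypothetical and only the two thresholds come from the commit message.

```typescript
// Thresholds taken from the commit message; constant names are hypothetical.
const HEPHAESTUS_EDIT_THRESHOLD = 3;
const EXPLORATION_ITERATION_CAP = 2;

type Worker = "sisyphus" | "hephaestus";

// Hypothetical shape for a planned task.
interface TaskPlan {
  sequentialEdits: number;
}

// Routes tasks with three or more sequential edits to Hephaestus.
function chooseWorker(plan: TaskPlan): Worker {
  return plan.sequentialEdits >= HEPHAESTUS_EDIT_THRESHOLD ? "hephaestus" : "sisyphus";
}

// Returns false once the exploration budget is spent (hard stop).
function mayExploreAgain(iterationsUsed: number): boolean {
  return iterationsUsed < EXPLORATION_ITERATION_CAP;
}
```

For example, `chooseWorker({ sequentialEdits: 2 })` keeps the task with Sisyphus, while a plan with three edits is handed to Hephaestus.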
islee23520 changed the title from "feat(agents): GLM-5.x thinking optimization and overlay refactor" to "feat(sisyphus): add GLM-5.x dedicated prompt builder and speed overlay" on May 1, 2026

@cubic-dev-ai cubic-dev-ai Bot left a comment


No issues found across 18 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

Auto-approved: GLM-specific changes are strictly isolated using model-specific predicates. Non-GLM logic and configurations are preserved, verified by 140 tests including 32 new quality benchmarks.

islee23520 added 2 commits May 1, 2026 15:30
- Add buildGlmSubagentVisionBlock() for concise subagent vision warnings
- Apply vision constraint to Oracle, Metis, Momus (GLM branches)
- Simplify GLM SJ speed overlay prompt (remove 4 redundant lines)
- Remove redundant JSDoc from metis.ts, marketing language from sisyphus.ts
- Centralize Sisyphus description as SISYPHUS_DESCRIPTION constant
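A subagent vision block of the kind this commit describes might look roughly like the sketch below. The function name `buildGlmSubagentVisionBlock` comes from the commit message, but the warning text, the tag name, and the `withVisionConstraint` helper are assumptions, since the actual block content does not appear in this thread.

```typescript
// Hypothetical wording and tag name; the PR's real block text is not shown here.
function buildGlmSubagentVisionBlock(): string {
  return [
    "<vision_constraint>",
    "This model is text-only. Do not attempt to analyze images.",
    "</vision_constraint>",
  ].join("\n");
}

// Hypothetical helper: appends the block only on the GLM branch,
// leaving prompts for other models unchanged.
function withVisionConstraint(basePrompt: string, isGlmTextModel: boolean): string {
  return isGlmTextModel
    ? `${basePrompt}\n\n${buildGlmSubagentVisionBlock()}`
    : basePrompt;
}
```

Gating the block on a model predicate keeps the change isolated to GLM branches, which is the property the automated reviews in this thread check for.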

@cubic-dev-ai cubic-dev-ai Bot left a comment


No issues found across 18 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

Requires human review: Large PR (1800+ lines) modifies Sisyphus metadata descriptions globally, not just for GLM models. 45 failing tests reported, and complex model-routing logic changes require manual verification.


TheAsda commented May 1, 2026

Looking forward to trying these changes

islee23520 added 3 commits May 3, 2026 11:42
…unior prompt test

The SJ GLM prompt intentionally omits the .sisyphus/state/{plan-or-session}/
path and individual slice filenames (goal.md, decisions.md, etc.) that the
main Sisyphus GLM prompt includes. The test incorrectly expected the full
ledger path; align it with the lightweight memory contract.

@cubic-dev-ai cubic-dev-ai Bot left a comment


0 issues found across 1 file (changes from recent commits).

Requires human review: The PR removes extensive JSDoc and descriptive comments from multiple agent files and types.ts, which is a regression in code maintainability and documentation.

islee23520 added 5 commits May 5, 2026 21:59
# Conflicts:
#	src/agents/momus.ts
…r GLM

Upstream removed GLM-specific thinking config from Momus, causing
budgetTokens: 32000 to be applied to GLM models that do not support
it. Restore the isGlmThinkingModel branch matching Metis and Oracle.

Also add test coverage for:
- Momus GLM thinking config without budgetTokens
- Sisyphus call_omo_agent permission (allow) vs Hephaestus (deny)
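The branch this commit restores can be sketched as follows. The `{ type: enabled }` shape and the 32000 budget come from the commit messages in this thread; the `ThinkingConfig` interface, the inline `looksLikeGlmTextModel` stand-in, and the example model IDs are assumptions, and the real agents may have further branches for other model families.

```typescript
// Hypothetical config shape based on the commit messages.
interface ThinkingConfig {
  type: "enabled";
  budgetTokens?: number;
}

// Simplified stand-in for the PR's isGlmThinkingModel predicate (assumption).
function looksLikeGlmTextModel(modelId: string): boolean {
  return /(^|\/)glm-5(\.\d+)?(-|$)/.test(modelId.toLowerCase());
}

// GLM text models get thinking enabled without budgetTokens, which they do
// not support; other models keep the token budget mentioned in the commit.
function buildThinkingConfig(modelId: string): ThinkingConfig {
  if (looksLikeGlmTextModel(modelId)) {
    return { type: "enabled" };
  }
  return { type: "enabled", budgetTokens: 32000 };
}
```

A regression test in this spirit would assert that the GLM result has no `budgetTokens` key at all, rather than checking for a zero or undefined value.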

@cubic-dev-ai cubic-dev-ai Bot left a comment


0 issues found across 11 files (changes from recent commits).

Requires human review: Prompt modifications (e.g., shortening sisyphus role description) could cause subtle behavioral regressions; cannot guarantee 100% no regression.


@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 5 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/plugin-handlers/tool-config-handler.ts">

<violation number="1">
P1: Sisyphus’s `call_omo_agent` permission was changed from allow to deny, which can break its ability to delegate to other agents.</violation>
</file>
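The permission difference flagged here can be pictured with a minimal sketch. The allow/deny values for Sisyphus and Hephaestus follow the test description earlier in this thread; the map shape and the `canDelegate` helper are hypothetical, and the real tool-config-handler.ts may store permissions differently.

```typescript
type Permission = "allow" | "deny";

// Hypothetical per-agent permission map for the call_omo_agent tool.
const callOmoAgentPermission: Record<string, Permission> = {
  sisyphus: "allow", // the orchestrator needs to delegate to other agents
  hephaestus: "deny", // the worker agent does not delegate further
};

// Unknown agents are treated as denied in this sketch (assumption).
function canDelegate(agent: string): boolean {
  return callOmoAgentPermission[agent] === "allow";
}
```

Flipping the `sisyphus` entry to "deny" would make `canDelegate("sisyphus")` return false, which is the delegation breakage the violation describes.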



@cubic-dev-ai cubic-dev-ai Bot left a comment


0 issues found across 1 file (changes from recent commits).

Requires human review: Large PR (2561 lines) touches core agent logic with significant prompt changes. Cannot be 100% sure of no regressions despite clean tests.


@cubic-dev-ai cubic-dev-ai Bot left a comment


0 issues found across 3 files (changes from recent commits).

Requires human review: Prompt identity changes (Sisyphus role/description) affect all models, not just GLM. Cannot guarantee zero regression from altering agent personality framing.

Restore architecture and routing intent comments in types.ts,
oracle.ts, and sisyphus/glm.ts that document design decisions.

@cubic-dev-ai cubic-dev-ai Bot left a comment


0 issues found across 3 files (changes from recent commits).

Requires human review: PR adds 2544 lines across 12+ files (new agents, prompt builders, config changes). Despite clean AI review and tests, the strict '100% sure no regressions' criteria cannot be met without manual review.

islee23520 added 3 commits May 7, 2026 09:57
…J, sisyphus

Restore architecture intent comments:
- metis.ts: agent role/responsibilities JSDoc
- sisyphus-junior/agent.ts: routing order, BLOCKED_TOOLS intent
- sisyphus.ts: Gemini overlay placement rationale, GLM thinking note

@cubic-dev-ai cubic-dev-ai Bot left a comment


0 issues found across 3 files (changes from recent commits).

Requires human review: Cannot be 100% sure of zero regressions: modifies core orchestration agents (sisyphus, metis, oracle, momus, sisyphus-junior) with new GLM routing logic, which could introduce edge-case misrouting or

islee23520 added 3 commits May 7, 2026 10:04
… prompt

The GLM_SJ_Speed_Optimizations section duplicated content already
present in the base SJ prompt and Sisyphus system prompt. Only the
GLM-specific context priorities and vision constraint are kept.
delegation-scorecard and event-metric-collector are test-only utilities
with zero runtime consumers; they are referenced only by scripts/benchmark-*.
Move them to scripts/ if needed later.

@cubic-dev-ai cubic-dev-ai Bot left a comment


0 issues found across 12 files (changes from recent commits).

Requires human review: Unrelated change in ralph-loop-event-handler.ts (runtime error retry cap) introduces behavioral regression risk. Not 100% sure of no regressions per custom criteria.


cubic-dev-ai Bot commented May 7, 2026

You're iterating quickly on this pull request. To help protect your rate limits, cubic has paused automatic reviews on new pushes for now—when you're ready for another review, comment @cubic-dev-ai review.

@islee23520 islee23520 closed this May 7, 2026

2 participants