
Commit 84cf154

Kasper Junge, Ralphify authored and committed
fix: resolve Windows .cmd/.exe extension breaking streaming mode detection
On Windows, npm installs Claude Code as `claude.cmd` (or `claude.exe`). The `_supports_stream_json` check compared `Path(cmd[0]).name` against `"claude"`, which failed for `claude.cmd` since `.name` includes the extension. Switch to `.stem` which strips the extension, so streaming mode is correctly detected on all platforms.

Co-authored-by: Ralphify <noreply@ralphify.co>
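A minimal sketch of the behavior this message describes. The function bodies are assumptions made for illustration; only the name `_supports_stream_json`, the `Path(cmd[0])` expression, and the switch from `.name` to `.stem` come from the commit message.

```python
from pathlib import Path
from typing import Sequence


def supports_stream_json_old(cmd: Sequence[str]) -> bool:
    # Pre-fix comparison (assumed shape): .name keeps the extension,
    # so "claude.cmd" is compared against "claude" and fails.
    return Path(cmd[0]).name == "claude"


def supports_stream_json(cmd: Sequence[str]) -> bool:
    # Post-fix comparison (assumed shape): .stem drops the final
    # extension, so "claude", "claude.cmd" and "claude.exe" all match.
    return Path(cmd[0]).stem == "claude"


for exe in ("claude", "claude.cmd", "claude.exe"):
    print(exe, supports_stream_json_old([exe]), supports_stream_json([exe]))
```

In this sketch `.stem` removes only the final suffix and the comparison stays case-sensitive, so a hypothetical `Claude.CMD` would still not match.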
1 parent ad52ebf commit 84cf154

8 files changed

Lines changed: 245 additions & 1 deletion


research/ralph-loops/REPORT.md

Lines changed: 7 additions & 0 deletions
@@ -50,6 +50,8 @@

23. **Memory engineering has moved beyond vector databases — and structured triggers beat vigilance for cross-session learning.** The leading memory frameworks (Google Always On Memory Agent, SimpleMem, Mastra Observational Memory) use SQLite or structured files with periodic LLM consolidation, not vector databases. Claude Code's two-tier memory (CLAUDE.md auto-briefing + .memory/state.json on-demand store) achieves production-grade cross-session learning at $0.05-$0.10/day with Jaccard deduplication and confidence-weighted decay. But ngrok's BMO post-mortem reveals the knowing-doing gap: agents use self-improvement tools only 2 out of 60+ sessions despite explicit instructions, and creating an OPPORTUNITIES.md file paradoxically increased procrastination. The fix: **boundary-triggered** learning (end-of-session reflections, every-N-iteration consolidation) executes reliably while vigilance-based behaviors fail. Ralph loops' command system already implements restorable compression — `{{ commands.X }}` re-derives state each iteration rather than relying on stale summaries. Claude Code Channels (March 2026) add event-driven push into running sessions, enabling reactive ralphs that respond to CI webhooks, monitoring alerts, and chat messages — shifting loops from timer-driven batch processing toward event-driven continuous operation.

+24. **CI/CD pipelines are the natural scheduling infrastructure for agent loops — and premature orchestration complexity kills 40% of projects.** Elastic saved 20 days of engineering work by embedding Claude Code in Buildkite CI (24 PRs fixed, 22 commits). Red Hat's Cicaddy proves that CI/CD pipelines already provide everything agents need: scheduling, isolation, secrets management, and artifact storage — "no dedicated agentic platform needed." The deterministic sandwich (pre-processing → AI reasoning → post-processing) matches ralph's command→prompt→agent architecture. Meanwhile, Deloitte predicts 40% of agentic AI projects will be cancelled by 2027 due to complexity, and 86% of IT leaders cite complexity as their top concern (Salesforce). The framework landscape (LangGraph 2.2x faster than CrewAI, 8-9x token efficiency differences) is converging on the same conclusion practitioners already know: **start with single agents and strong prompts, add tools before agents, graduate to multi-agent only when facing clear limitations.** Context loss at handoff points is the #1 multi-agent failure (GitHub Blog) — "typed schemas are table stakes." Ralph's YAML frontmatter + `{{ commands.<name> }}` system already provides this structure. Fresh practitioner metrics confirm the economics: $23.14 for 47 overnight commits with 80% success rate (IntelligentTools), and hash-based line identification alone yields +5-14pp on coding benchmarks (blog.can.ac). The anti-complexity positioning is ralphify's strongest competitive moat.
+
## Chapters

| # | Chapter | Summary |
@@ -81,6 +83,7 @@
| 25 | [Domain-Specific Loops & The Observability Gap](chapters/25-domain-specific-loops-observability.md) | Ralph loops beyond coding (security/DevOps/data/content/business), Databricks Genie Code (32→77% success), observability crisis (47.1% monitored, 88% incidents), traditional monitoring insufficient, AgenticOS concept, "any metric" positioning |
| 26 | [Resilience Patterns, Model Routing & Durable Execution](chapters/26-resilience-patterns-durable-execution.md) | 4-layer fault tolerance (23%→2% unrecoverable), AIMD model failover, inner/outer loop separation, graceful degradation tiers, durable execution vs filesystem-as-checkpoint, production incident catalog (10 incidents, 0 postmortems), autoresearch at GPU scale, "harness > model" quantified |
| 27 | [Practical Memory Engineering & Event-Driven Loops](chapters/27-memory-engineering-event-driven-loops.md) | BMO knowing-doing gap, Claude Code two-tier memory (CLAUDE.md + .memory/state.json), Factory.ai restorable compression, 7 memory frameworks (no vector DB trend), Claude Code Channels (event-driven push into sessions), guardrails scaling strategies, three-layer practitioner consensus |
+| 28 | [Workflow Composition, CI/CD Integration & Agent Fleets](chapters/28-workflow-composition-cicd-integration.md) | Google ADK 8 patterns, CI as agent scheduler (Elastic 20 days saved, Red Hat Cicaddy), typed schemas at handoffs (GitHub Blog), 40% project cancellation (Deloitte), framework landscape (LangGraph/CrewAI/AutoGen), DAG orchestration, Cursor Automations (event-triggered), fresh metrics ($23/night 80% success) |

## Open Questions

@@ -131,3 +134,7 @@
- [BMO Self-Improving Coding Agent](https://ngrok.com/blog/bmo-self-improving-coding-agent) — ngrok (knowing-doing gap, 2/60 tool usage)
- [Context Compression](https://factory.ai/news/compressing-context) — Factory.ai (restorable compression, two-threshold system)
- [Claude Code Channels](https://code.claude.com/docs/en/channels) — Anthropic (event-driven push into running sessions)
+- [CI/CD Pipelines with Claude AI](https://www.elastic.co/search-labs/blog/ci-pipelines-claude-ai-agent) — Elastic (24 PRs fixed, 20 days saved in Buildkite CI)
+- [Multi-Agent Workflows Often Fail](https://github.blog/ai-and-ml/generative-ai/multi-agent-workflows-often-fail-heres-how-to-engineer-ones-that-dont/) — GitHub (typed schemas, context loss as #1 failure)
+- [Cicaddy: Agentic Workflows in CI](https://developers.redhat.com/articles/2026/03/12/how-develop-agentic-workflows-ci-pipeline-cicaddy) — Red Hat (pipeline-native agents, MCP as tool interface)
+- [Ralph Loop + Claude Code: 47 Commits](https://intelligenttools.co/blog/claude-code-unsupervised-8-hours-ralph-loop) — IntelligentTools ($23/night, 80% success)
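
The deterministic sandwich named in finding 24 (pre-processing → AI reasoning → post-processing) can be sketched roughly as follows. This is an illustration under stated assumptions, not ralphify's implementation: `render_prompt`, `run_iteration`, the stub `agent` and `check` callables, and the rule that command names match `\w+` are all hypothetical. Only the `{{ commands.<name> }}` placeholder syntax comes from the report.

```python
import re
from typing import Callable, Dict

# Hypothetical pattern for the report's `{{ commands.<name> }}` placeholders.
PLACEHOLDER = re.compile(r"\{\{\s*commands\.(\w+)\s*\}\}")


def render_prompt(template: str, outputs: Dict[str, str]) -> str:
    """Pre-processing: substitute deterministic command output into the prompt."""
    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in outputs:
            raise KeyError(f"no output recorded for command {name!r}")
        return outputs[name]

    return PLACEHOLDER.sub(substitute, template)


def run_iteration(
    template: str,
    outputs: Dict[str, str],
    agent: Callable[[str], str],
    check: Callable[[str], bool],
) -> str:
    prompt = render_prompt(template, outputs)  # deterministic pre-processing
    reply = agent(prompt)                      # the only non-deterministic step
    if not check(reply):                       # deterministic post-processing
        raise ValueError("agent reply failed the post-processing check")
    return reply


# Usage with a stub agent standing in for a real model call.
reply = run_iteration(
    "Fix: {{ commands.tests }}",
    {"tests": "3 failed"},
    agent=lambda prompt: prompt.upper(),
    check=lambda text: "FAILED" in text,
)
print(reply)
```

Keeping the model call between two deterministic steps means each iteration re-derives its inputs from fresh command output, which is the property the report attributes to the command system.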
