computerlovetech
diff --git a/research/ralph-loops/REPORT.md b/research/ralph-loops/REPORT.md
@@ -50,6 +50,8 @@
 
 23. **Memory engineering has moved beyond vector databases — and structured triggers beat vigilance for cross-session learning.** The leading memory frameworks (Google Always On Memory Agent, SimpleMem, Mastra Observational Memory) use SQLite or structured files with periodic LLM consolidation, not vector databases. Claude Code's two-tier memory (CLAUDE.md auto-briefing + .memory/state.json on-demand store) achieves production-grade cross-session learning at $0.05-$0.10/day with Jaccard deduplication and confidence-weighted decay. But ngrok's BMO post-mortem reveals the knowing-doing gap: agents use self-improvement tools only 2 out of 60+ sessions despite explicit instructions, and creating an OPPORTUNITIES.md file paradoxically increased procrastination. The fix: **boundary-triggered** learning (end-of-session reflections, every-N-iteration consolidation) executes reliably while vigilance-based behaviors fail. Ralph loops' command system already implements restorable compression — `{{ commands.X }}` re-derives state each iteration rather than relying on stale summaries. Claude Code Channels (March 2026) add event-driven push into running sessions, enabling reactive ralphs that respond to CI webhooks, monitoring alerts, and chat messages — shifting loops from timer-driven batch processing toward event-driven continuous operation.
 
+24. **CI/CD pipelines are the natural scheduling infrastructure for agent loops — and premature orchestration complexity is projected to kill 40% of projects.** Elastic saved 20 days of engineering work by embedding Claude Code in Buildkite CI (24 PRs fixed, 22 commits). Red Hat's Cicaddy shows that CI/CD pipelines already provide everything agents need (scheduling, isolation, secrets management, and artifact storage): "no dedicated agentic platform needed." The deterministic sandwich (pre-processing → AI reasoning → post-processing) matches ralph's command→prompt→agent architecture. Meanwhile, Deloitte predicts that 40% of agentic AI projects will be cancelled by 2027 due to complexity, and 86% of IT leaders cite complexity as their top concern (Salesforce). Framework comparisons (LangGraph 2.2x faster than CrewAI, 8-9x differences in token efficiency) point to the conclusion practitioners have already reached: **start with single agents and strong prompts, add tools before agents, and graduate to multi-agent only when facing clear limitations.** Context loss at handoff points is the #1 multi-agent failure (GitHub Blog), hence "typed schemas are table stakes." Ralph's YAML frontmatter + `{{ commands.<name> }}` system already provides this structure. Fresh practitioner metrics confirm the economics: $23.14 for 47 overnight commits at an 80% success rate (IntelligentTools), and hash-based line identification alone yields +5-14pp on coding benchmarks (blog.can.ac). This anti-complexity positioning is ralphify's strongest competitive moat.
+
 ## Chapters
 
 | # | Chapter | Summary |
@@ -81,6 +83,7 @@
 | 25 | [Domain-Specific Loops & The Observability Gap](chapters/25-domain-specific-loops-observability.md) | Ralph loops beyond coding (security/DevOps/data/content/business), Databricks Genie Code (32→77% success), observability crisis (47.1% monitored, 88% incidents), traditional monitoring insufficient, AgenticOS concept, "any metric" positioning |
 | 26 | [Resilience Patterns, Model Routing & Durable Execution](chapters/26-resilience-patterns-durable-execution.md) | 4-layer fault tolerance (23%→2% unrecoverable), AIMD model failover, inner/outer loop separation, graceful degradation tiers, durable execution vs filesystem-as-checkpoint, production incident catalog (10 incidents, 0 postmortems), autoresearch at GPU scale, "harness > model" quantified |
 | 27 | [Practical Memory Engineering & Event-Driven Loops](chapters/27-memory-engineering-event-driven-loops.md) | BMO knowing-doing gap, Claude Code two-tier memory (CLAUDE.md + .memory/state.json), Factory.ai restorable compression, 7 memory frameworks (no vector DB trend), Claude Code Channels (event-driven push into sessions), guardrails scaling strategies, three-layer practitioner consensus |
+| 28 | [Workflow Composition, CI/CD Integration & Agent Fleets](chapters/28-workflow-composition-cicd-integration.md) | Google ADK 8 patterns, CI as agent scheduler (Elastic 20 days saved, Red Hat Cicaddy), typed schemas at handoffs (GitHub Blog), 40% project cancellation (Deloitte), framework landscape (LangGraph/CrewAI/AutoGen), DAG orchestration, Cursor Automations (event-triggered), fresh metrics ($23/night 80% success) |
 
 ## Open Questions
 
@@ -131,3 +134,7 @@
 - [BMO Self-Improving Coding Agent](https://ngrok.com/blog/bmo-self-improving-coding-agent) — ngrok (knowing-doing gap, 2/60 tool usage)
 - [Context Compression](https://factory.ai/news/compressing-context) — Factory.ai (restorable compression, two-threshold system)
 - [Claude Code Channels](https://code.claude.com/docs/en/channels) — Anthropic (event-driven push into running sessions)
+- [CI/CD Pipelines with Claude AI](https://www.elastic.co/search-labs/blog/ci-pipelines-claude-ai-agent) — Elastic (24 PRs fixed, 20 days saved in Buildkite CI)
+- [Multi-Agent Workflows Often Fail](https://github.blog/ai-and-ml/generative-ai/multi-agent-workflows-often-fail-heres-how-to-engineer-ones-that-dont/) — GitHub (typed schemas, context loss as #1 failure)
+- [Cicaddy: Agentic Workflows in CI](https://developers.redhat.com/articles/2026/03/12/how-develop-agentic-workflows-ci-pipeline-cicaddy) — Red Hat (pipeline-native agents, MCP as tool interface)
+- [Ralph Loop + Claude Code: 47 Commits](https://intelligenttools.co/blog/claude-code-unsupervised-8-hours-ralph-loop) — IntelligentTools ($23/night, 80% success)