
fix(generation): move master workflow payload into sidecars #133

Open
miyaontherelay wants to merge 3 commits into main from claude/master-workflow-sidecars

@miyaontherelay
Collaborator

Summary

  • move the master workflow payload into sidecar files instead of embedding the full payload inline
  • align generation contract tests with the sidecar workflow shape
  • disable the overnight runner auto-commit/auto-push path so future runs do not ship directly to origin/main

Validation

  • npm run typecheck
  • npx vitest run src/product/generation/pipeline.test.ts test/generated-workflow-reliability-contract.test.ts test/overnight-script-contract.test.ts

Main branch cleanup

  • reverted accidental direct pushes on origin/main with c9a2c36 and 913a655

miyaontherelay and others added 3 commits May 22, 2026 14:55
The overnight runner's commit_if_clean_delta function shipped unreviewed
changes directly to origin/main after every workflow (commits authored
as "Miya"). This is the mechanism that pushed two sidecar-related
commits straight to main, bypassing review. Future progress capture
should land via a per-run branch + PR.

Replaced the function body with a logged no-op so callers don't have to
change. The active overnight session has already been signalled to stop
via its STOP_FILE; this prevents future invocations from re-enabling
the hook.
@coderabbitai
Contributor

coderabbitai Bot commented May 22, 2026

Walkthrough

This PR moves large workflow specifications and child-source mappings from inline generated code to on-disk sidecar files, avoiding argv-size constraints. The overnight harness's auto-commit function is disabled to prevent automatic commits of generated workflows.

Changes

Sidecar File Infrastructure and Overnight Automation

| Layer | File(s) | Summary |
|---|---|---|
| Sidecar File Contract and Disk Writing | src/product/generation/types.ts, src/local/entrypoint.ts | RenderedArtifact adds an optional sidecarFiles property mapping repo-relative paths to content. The local entrypoint writes each sidecar file to disk alongside the main artifact, logging each write. |
| Master Workflow Sidecar Generation and Runtime Loading | src/product/generation/master-workflow-renderer.ts | The master workflow renderer derives spec and children sidecar paths, builds child source maps, and attaches them to artifact.sidecarFiles. At runtime, the generated master workflow reads the child source map from the sidecar path instead of embedding the full JSON. A new firstHeadingOrSummary helper extracts a short description for the workflow's .description(). The childWorkflowSource function now accepts an optional specSidecarPath and uses it in the prepare-context step. |
| Test Assertions for Sidecar Content | src/product/generation/pipeline.test.ts, test/generated-workflow-reliability-contract.test.ts | Tests now validate child-workflow configuration in rendered sidecarFiles instead of inline master content. The validator-claude retry agent and child identifiers are asserted within the sidecar, while master-specific retry configuration remains in the main content. |
| Overnight Script Auto-commit/Auto-push Disabled | scripts/run-ricky-overnight.sh, test/overnight-script-contract.test.ts | The commit_if_clean_delta function is converted to a no-op that logs auto-commit is disabled and returns success without creating commits or pushing. Tests assert that post-workflow delta capture is skipped and no git commands are executed. |

Sequence Diagram

```mermaid
sequenceDiagram
  participant GeneratorCode as Generator Code
  participant MasterRenderer as Master Workflow Renderer
  participant RenderedArtifact as Rendered Artifact
  participant LocalEntrypoint as Local Entrypoint
  participant MasterWorkflow as Master Workflow<br/>(at runtime)

  GeneratorCode->>MasterRenderer: renderMasterExecutionWorkflow(spec)
  MasterRenderer->>MasterRenderer: compute specSidecarPath<br/>childrenSidecarPath
  MasterRenderer->>MasterRenderer: build childSources map
  MasterRenderer->>RenderedArtifact: attach sidecarFiles<br/>(spec + children.json)
  MasterRenderer-->>GeneratorCode: artifact with sidecarFiles

  GeneratorCode->>LocalEntrypoint: write artifact
  LocalEntrypoint->>LocalEntrypoint: write main content
  LocalEntrypoint->>LocalEntrypoint: write each sidecar file<br/>to disk

  MasterWorkflow->>MasterWorkflow: readFileSync(childrenSidecarPath)
  MasterWorkflow->>MasterWorkflow: load child source maps<br/>at runtime
```
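On the runtime side of the diagram, the generated master workflow reads the children sidecar instead of carrying the JSON inline. The sketch below assumes the sidecar is a JSON object mapping child ids to TypeScript source strings; `loadChildSources` is a hypothetical name, and the shape checks illustrate failing loudly on a malformed sidecar rather than reproducing the PR's generated code.

```typescript
import { readFileSync } from 'node:fs';

// Hypothetical helper: load the child source map written by the renderer.
// Assumes the sidecar is a JSON object of { childId: sourceText }.
function loadChildSources(childrenSidecarPath: string): Record<string, string> {
  const parsed: unknown = JSON.parse(readFileSync(childrenSidecarPath, 'utf8'));
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    throw new Error(`Children sidecar is not a JSON object: ${childrenSidecarPath}`);
  }
  const result: Record<string, string> = {};
  for (const [id, source] of Object.entries(parsed)) {
    // Reject anything that is not source text so a bad sidecar fails at load time.
    if (typeof source !== 'string') {
      throw new Error(`Child ${id} in ${childrenSidecarPath} has non-string source`);
    }
    result[id] = source;
  }
  return result;
}
```

Reading the file at runtime means the sidecar must exist at the path the renderer baked in, which is why the entrypoint writes sidecars alongside the main artifact.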

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • AgentWorkforce/ricky#77: Directly related to moving child retry/repair assertions into sidecarFiles and updating pipeline.test.ts and generated-workflow-reliability-contract.test.ts accordingly.

  • AgentWorkforce/ricky#59: Both PRs implement the same "move large/child workflow context into generated sidecar files" refactor, updating master-workflow-renderer and adjusting tests to assert sidecar artifact references instead of inlined content.

  • AgentWorkforce/ricky#121: Related through shared modifications to src/product/generation/master-workflow-renderer.ts's child workflow generation, particularly the prepare-context and child step wiring in childWorkflowSource.

Poem

🐰 Sidecars on disk, no argv tight,
Spec and children dance in the night,
Runtime reads what we wrote down before,
No auto-push knocking at the door!
Workflows breathe free, lean and light. 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: moving master workflow payload into sidecars, which is the primary focus across multiple files in the changeset. |
| Description check | ✅ Passed | The description is directly related to the changeset, providing a clear summary of the three main objectives and validation steps performed. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/local/entrypoint.ts`:
- Around line 1205-1208: The loop writes artifact.sidecarFiles entries directly
via artifactWriter.writeArtifact(sidecarPath, ...) without validating paths;
resolve each sidecarPath against the invocation root (cwd) using
path.resolve(cwd, sidecarPath) and ensure the resulting absolute path is inside
the repo root (e.g., resolvedPath.startsWith(path.resolve(cwd) + path.sep)) to
prevent absolute paths or ../ traversal; if the check fails, skip or error and
log a warning instead of calling artifactWriter.writeArtifact; update the code
around artifact.sidecarFiles handling and the call site of
artifactWriter.writeArtifact to use the validated resolvedPath.

In `@src/product/generation/master-workflow-renderer.ts`:
- Around line 935-945: Replace the regex/line-splitting logic in
firstHeadingOrSummary with an mdast parse-and-walk: call fromMarkdown(specText)
to build the AST, find the first heading node with depth 1 or 2 and use its text
content (strip/trim), otherwise fall back to the first non-empty paragraph or
plain text node's value, and if none exist return the existing default label
("Ricky master workflow"); preserve the existing 200-character cap and
truncation behavior when returning the candidate.

In `@src/product/generation/pipeline.test.ts`:
- Around line 117-126: Replace the brittle string/regex assertions on
childrenSidecar with JSON parsing of childrenSidecarPath's contents, then for
each child workflow entry parse its TypeScript source using the project's AST
helper (e.g. extractStepConfigs) and assert the presence and values of onError
config (repairAgent, maxRetries, repairRetries, retryDelayMs), the expected step
ordering (review-claude before final-fix-codex and presence of
RICKY_CHILD_FRESH_EYES_LOOP_READY marker), and that validator-claude's onError
appears on the child AST (not via substring); also parse rendered.content
similarly with AST checks for the master-level .onError('retry'...) config
instead of relying on escaped substring matches.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 21872129-f482-4e11-8d04-2a071218dde2

📥 Commits

Reviewing files that changed from the base of the PR and between c9a2c36 and 9b884dc.

📒 Files selected for processing (7)
  • scripts/run-ricky-overnight.sh
  • src/local/entrypoint.ts
  • src/product/generation/master-workflow-renderer.ts
  • src/product/generation/pipeline.test.ts
  • src/product/generation/types.ts
  • test/generated-workflow-reliability-contract.test.ts
  • test/overnight-script-contract.test.ts

Comment thread src/local/entrypoint.ts
Comment on lines +1205 to +1208
```ts
if (artifact.sidecarFiles) {
  for (const [sidecarPath, sidecarContent] of Object.entries(artifact.sidecarFiles)) {
    await artifactWriter.writeArtifact(sidecarPath, sidecarContent, cwd);
    logs.push(`[local] wrote workflow sidecar: ${sidecarPath}`);
```

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce repo-root confinement before writing sidecars.

sidecarPath is written without validating it stays inside the invocation root. A malformed/absolute path can write outside the workspace.

🔒 Proposed hardening
```diff
-import { delimiter, dirname, isAbsolute, join, resolve } from 'node:path';
+import { delimiter, dirname, isAbsolute, join, relative, resolve } from 'node:path';
@@
         if (artifact.sidecarFiles) {
+          const repoRoot = resolve(cwd);
           for (const [sidecarPath, sidecarContent] of Object.entries(artifact.sidecarFiles)) {
+            const resolvedSidecarPath = resolve(cwd, sidecarPath);
+            const rel = relative(repoRoot, resolvedSidecarPath);
+            if (rel.startsWith('..') || isAbsolute(rel)) {
+              throw new Error(`Invalid sidecar path outside workspace: ${sidecarPath}`);
+            }
             await artifactWriter.writeArtifact(sidecarPath, sidecarContent, cwd);
             logs.push(`[local] wrote workflow sidecar: ${sidecarPath}`);
           }
         }
```
📝 Committable suggestion

```ts
if (artifact.sidecarFiles) {
  const repoRoot = resolve(cwd);
  for (const [sidecarPath, sidecarContent] of Object.entries(artifact.sidecarFiles)) {
    const resolvedSidecarPath = resolve(cwd, sidecarPath);
    const rel = relative(repoRoot, resolvedSidecarPath);
    if (rel.startsWith('..') || isAbsolute(rel)) {
      throw new Error(`Invalid sidecar path outside workspace: ${sidecarPath}`);
    }
    await artifactWriter.writeArtifact(sidecarPath, sidecarContent, cwd);
    logs.push(`[local] wrote workflow sidecar: ${sidecarPath}`);
  }
}
```
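The same confinement check, pulled out as a standalone helper so it can be unit-tested. This is a sketch, not the code in the PR: `resolveInsideRoot` is a hypothetical name, and the example assertions assume POSIX paths. It returns the resolved absolute path so the caller can hand that to the writer, as the AI-agent prompt for this comment suggests.

```typescript
import { isAbsolute, relative, resolve, sep } from 'node:path';

// Hypothetical helper: resolve `candidate` against `root` and return the
// absolute path only if it stays strictly inside `root`.
function resolveInsideRoot(root: string, candidate: string): string {
  const rootAbs = resolve(root);
  const target = resolve(rootAbs, candidate);
  const rel = relative(rootAbs, target);
  // rel === '' means the target is the root directory itself.
  // Matching '..' as a whole segment avoids rejecting in-root names like '..notes'.
  // isAbsolute(rel) catches targets on another drive (Windows).
  const escapes = rel === '' || rel === '..' || rel.startsWith(`..${sep}`) || isAbsolute(rel);
  if (escapes) {
    throw new Error(`Invalid sidecar path outside workspace: ${candidate}`);
  }
  return target;
}
```

Two small differences from the suggestion above: `rel.startsWith('..')` also rejects in-root names such as `..notes/a.json` (safe, but stricter than needed), and it lets a path that resolves to the root itself (`rel === ''`) through to the writer, which this sketch rejects up front.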

Comment thread src/product/generation/master-workflow-renderer.ts
Comment on lines +935 to +945
```ts
function firstHeadingOrSummary(specText: string): string {
  // Pull the first H1 / H2 line from the spec to use as a short description.
  // Falls back to the first non-empty line, then to a generic label. Output
  // is capped so the master `.description()` never approaches argv limits.
  const lines = specText.split('\n');
  const heading = lines.find((line) => /^#{1,2}\s+\S/.test(line));
  const candidate = heading?.replace(/^#{1,2}\s+/, '').trim()
    ?? lines.find((line) => line.trim().length > 0)?.trim()
    ?? 'Ricky master workflow';
  return candidate.length > 200 ? `${candidate.slice(0, 197)}...` : candidate;
}
```

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Use mdast traversal for heading extraction, not regex line matching.

This helper currently inspects markdown with split + regex. Switch to fromMarkdown(...) AST traversal to extract the first H1/H2 and fallback text.

As per coding guidelines, "Default to AST walk for the file's grammar (TS, mdast, JSON.parse) before using tokenizers, regex, or substring matching when analyzing source text in Ricky implementation."
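A sketch of the mdast walk the comment asks for. To stay runnable without dependencies, it takes an already-parsed tree typed with a minimal, hypothetical `MdNode` interface; in the renderer the root would come from `fromMarkdown(specText)` (mdast-util-from-markdown), and `nodeText` roughly approximates `toString` from mdast-util-to-string. One behavioral difference from the line-based version: the fallback returns a whole paragraph's text, which can span several lines, still capped at 200 characters.

```typescript
// Minimal mdast-like node (assumption: only the fields this walk needs).
interface MdNode {
  type: string;
  depth?: number;
  value?: string;
  children?: MdNode[];
}

const DEFAULT_LABEL = 'Ricky master workflow';

// Concatenate the text of literal nodes (text, inlineCode, ...) under `node`.
function nodeText(node: MdNode): string {
  if (typeof node.value === 'string') return node.value;
  return (node.children ?? []).map(nodeText).join('');
}

// Depth-first, document-order search for the first node matching `pred`.
function findFirst(node: MdNode, pred: (n: MdNode) => boolean): MdNode | undefined {
  if (pred(node)) return node;
  for (const child of node.children ?? []) {
    const hit = findFirst(child, pred);
    if (hit) return hit;
  }
  return undefined;
}

function firstHeadingOrSummaryFromTree(root: MdNode): string {
  const hasText = (n: MdNode): boolean => nodeText(n).trim() !== '';
  // Prefer the first H1/H2 anywhere in the document, then the first non-empty paragraph.
  const target =
    findFirst(root, (n) => n.type === 'heading' && (n.depth === 1 || n.depth === 2) && hasText(n)) ??
    findFirst(root, (n) => n.type === 'paragraph' && hasText(n));
  const candidate = target ? nodeText(target).trim() : DEFAULT_LABEL;
  // Same 200-character cap and "..." truncation as the current helper.
  return candidate.length > 200 ? `${candidate.slice(0, 197)}...` : candidate;
}
```

Because headings are matched as nodes, inline markup such as `inlineCode` inside a heading contributes its text instead of leaking backticks or `#` markers into the description.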

Comment thread src/product/generation/pipeline.test.ts
Comment on lines +117 to 126
```ts
const childrenSidecarPath = 'workflows/generated/runtime-master.children.json';
expect(rendered.sidecarFiles?.[childrenSidecarPath], 'children sidecar attached').toBeDefined();
const childrenSidecar = rendered.sidecarFiles![childrenSidecarPath];
expect(childrenSidecar).toContain('review-claude');
expect(childrenSidecar).toContain('final-fix-codex');
expect(childrenSidecar).toContain('RICKY_CHILD_FRESH_EYES_LOOP_READY');
expect(rendered.content).toContain(".onError('retry', { maxRetries: 2, retryDelayMs: 10000, repairAgent: \"master-lead\", repairRetries: 2 })");
expect(rendered.content.replace(/\\+"/g, '"')).toContain(".onError('retry', { maxRetries: 2, retryDelayMs: 10000, repairAgent: \"validator-claude\", repairRetries: 2 })");
// validator-claude is a child-side agent; assert against the sidecar.
expect(childrenSidecar.replace(/\\+"/g, '"')).toContain(".onError('retry', { maxRetries: 2, retryDelayMs: 10000, repairAgent: \"validator-claude\", repairRetries: 2 })");
expect(rendered.content.replace(/\\+"/g, '"')).toMatch(
```

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Replace escaped-string pattern checks with JSON + AST assertions.

These checks rely on substring/regex over escaped sidecar text. Parse childrenSidecar as JSON, then inspect each child workflow source via TypeScript AST helpers (like extractStepConfigs) for onError, step ordering, and markers.

As per coding guidelines, "Do not use regex, substring matches, line-anchored patterns, or comment-stripping heuristics to detect presence or absence of code patterns (imports, declarations, etc.); use AST walking instead."
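A sketch of the JSON-plus-AST assertion style suggested above. It uses the TypeScript compiler API (`ts.createSourceFile`, `ts.forEachChild`) directly rather than the project's helpers, since `extractStepConfigs`'s signature is not shown here. It assumes the children sidecar is a JSON object mapping child ids to TypeScript source, and that retry policy is written as `.onError('<strategy>', { ...literal options })`; `collectOnErrorCalls` and `onErrorByChild` are hypothetical names.

```typescript
import * as ts from 'typescript';

interface OnErrorConfig {
  strategy: string;
  options: Record<string, string | number>;
}

// Collect every `.onError('<strategy>', { ... })` call in a TS source string.
// Numeric and string literals are decoded; other values keep their source text.
function collectOnErrorCalls(source: string): OnErrorConfig[] {
  const sf = ts.createSourceFile('child.ts', source, ts.ScriptTarget.Latest, true);
  const found: OnErrorConfig[] = [];
  const visit = (node: ts.Node): void => {
    if (
      ts.isCallExpression(node) &&
      ts.isPropertyAccessExpression(node.expression) &&
      node.expression.name.text === 'onError'
    ) {
      const [strategyArg, optionsArg] = node.arguments;
      if (strategyArg && ts.isStringLiteral(strategyArg) && optionsArg && ts.isObjectLiteralExpression(optionsArg)) {
        const options: Record<string, string | number> = {};
        for (const prop of optionsArg.properties) {
          if (!ts.isPropertyAssignment(prop) || !ts.isIdentifier(prop.name)) continue;
          const init = prop.initializer;
          options[prop.name.text] = ts.isNumericLiteral(init)
            ? Number(init.text)
            : ts.isStringLiteral(init)
              ? init.text
              : init.getText(sf);
        }
        found.push({ strategy: strategyArg.text, options });
      }
    }
    ts.forEachChild(node, visit);
  };
  visit(sf);
  return found;
}

// Parse the children sidecar and map each child id to its onError configs.
function onErrorByChild(childrenSidecar: string): Map<string, OnErrorConfig[]> {
  const parsed = JSON.parse(childrenSidecar) as Record<string, string>;
  return new Map(
    Object.entries(parsed).map(([id, src]): [string, OnErrorConfig[]] => [id, collectOnErrorCalls(src)]),
  );
}
```

In pipeline.test.ts this would replace the `.replace(/\\+"/g, '"')` normalization: `JSON.parse` undoes the JSON string escaping, and comparing AST values ignores quote style and whitespace.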

@github-actions
Contributor

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-22T12-57-25-416Z-HEAD-provider
Mode: provider
Git SHA: c99d129

Passed: 10 | Needs human: 44 | Reviewable: 44 | Missing output: 0 | Failed: 1 | Skipped: 0

Blocking Cases

FAIL workflow-authoring.fresh-eyes-loop-complex-multitrack (workflow-authoring/ricky-cli)

User message

Generate a serious multi-track master executor workflow for three independent product slices: runtime evidence, CLI status copy, and generation validation. Each track owns separate files and the final workflow may create a PR.

Ricky output

```json
[
  {
    "stage": "generate",
    "status": "ok",
    "artifact": {
      "path": "workflows/generated/ricky-generate-a-serious-multi-track-workflow-for-three-independent-product-slices-as-.ts",
      "workflow_id": "ricky-generate-a-serious-multi-track-workflow-for-three-independent-product-slices-as-",
      "spec_digest": "76e620ac817c260de17c96b23c0b60a2a7bce8806a54d71d597d49b535230d8b"
    },
    "next": {
      "run_command": "ricky run workflows/generated/ricky-generate-a-serious-multi-track-workflow-for-three-independent-product-slices-as-.ts",
      "run_mode_hint": "ricky run workflows/generated/ricky-generate-a-serious-multi-track-workflow-for-three-independent-product-slices-as-.ts"
    },
    "decisions": {
      "skill_matches": [
        {
          "id": "relay-80-100-workflow",
          "name": "relay-80-100-workflow",
          "path": "/home/runner/work/ricky/ricky/.agents/skills/relay-80-100-workflow/SKILL.md",
          "confidence": 1,
          "reason": "Spec text mentions \"workflows\". Spec text mentions \"validation\". Spec text mentions \"fresh-eyes\". Spec text mentions \"review/fix\". Spec text mentions \"loops\".",
          "evidence": [
          
...[truncated]
```

Must

  • Use a coordination shape that fits independent tracks while preserving deterministic gates after every editing step.
  • Ensure each implementation track or child workflow has the mandatory Claude-then-Codex fresh-eyes review/fix loop before track signoff.
  • Run final deterministic acceptance only after all Codex final fixes and post-fix reviews have completed.
  • Use BLOCKED_NO_COMMIT with evidence when a finding cannot be fixed, and skip commit or PR creation in that state.
  • Use the GitHub primitive for PR creation when shipping is in scope.

Must Not

  • Put one global review at the end and call that sufficient for all tracks.
  • Serialize independent tracks without a stated dependency reason.
  • Let PR creation, commit, or handoff race ahead of unresolved review findings.
  • Present tests, typecheck, or dry-run alone as the complete proof bar.

Deterministic checks

  • PASS status: case executed
  • PASS ok: expected ok=true, got true
  • PASS contentIncludes:stage": "generate: expected content to include "stage": "generate"
  • PASS contentIncludes:status": "ok: expected content to include "status": "ok"
  • PASS contentIncludes:RICKY_MASTER_EXECUTOR_WORKFLOW: expected content to include "RICKY_MASTER_EXECUTOR_WORKFLOW"
  • PASS contentIncludes:Master plan:: expected content to include "Master plan:"
  • PASS contentIncludes:RICKY_CHILD_WORKFLOW_COMPLETE: expected content to include "RICKY_CHILD_WORKFLOW_COMPLETE"
  • FAIL contentIncludes:review-claude: expected content to include "review-claude"
  • FAIL contentIncludes:final-fix-codex: expected content to include "final-fix-codex"
  • FAIL contentIncludes:RICKY_CHILD_FRESH_EYES_LOOP_READY: expected content to include "RICKY_CHILD_FRESH_EYES_LOOP_READY"
  • FAIL contentIncludes:BLOCKED_NO_COMMIT: expected content to include "BLOCKED_NO_COMMIT"
  • FAIL contentMatches:review-claude[\s\S]*fix-loop[\s\S]*final-review-claude[\s\S]*final-fix-claude[\s\S]*review-codex[\s\S]*fix-loop-codex[\s\S]*final-review-codex[\s\S]*final-fix-codex[\s\S]*final-review-pass-gate[\s\S]*final-hard-validation: expected content to match /review-claude[\s\S]*fix-loop[\s\S]*final-review-claude[\s\S]*final-fix-claude[\s\S]*review-codex[\s\S]*fix-loop-codex[\s\S]*final-review-codex[\s\S]*final-fix-codex[\s\S]*final-review-pass-gate[\s\S]*final-hard-validation/i
  • PASS forbidPhrase:TypeError: content must not include "TypeError"
  • PASS forbidPhrase:ReferenceError: content must not include "ReferenceError"
  • PASS forbidPhrase:needs_clarification: content must not include "needs_clarification"
  • PASS maxToolCalls: expected <= 1 tool calls, got 1
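The failing contentMatches check is an ordered-subsequence test: the step names are joined with `[\s\S]*`, so each name must appear somewhere after the previous one, across newlines, case-insensitively. A small sketch that rebuilds the same pattern from the names in the check (the constant and function names are hypothetical, not the eval harness's code):

```typescript
// Step names from the failing contentMatches check, in the required order.
const REQUIRED_ORDER = [
  'review-claude', 'fix-loop', 'final-review-claude', 'final-fix-claude',
  'review-codex', 'fix-loop-codex', 'final-review-codex', 'final-fix-codex',
  'final-review-pass-gate', 'final-hard-validation',
];

// Same shape as the check: names joined by [\s\S]* (any characters, including
// newlines), compiled with the case-insensitive flag.
const orderedPattern = new RegExp(REQUIRED_ORDER.join('[\\s\\S]*'), 'i');

function matchesInOrder(content: string): boolean {
  return orderedPattern.test(content);
}
```

Because the names are matched as substrings, `fix-loop` is also satisfied by `fix-loop-codex` and `review-claude` by `final-review-claude`; the check proves order of mention in the checked content, not how the steps are wired.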

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Ricky – Updated Documentation & Code to Reflect Deeper Integration with Agent Assistant**

Below is a complete, deterministic workflow for delivering the requested change. It follows the **Ricky repository conventions** (AGENTS.md), the **workflow standards**, and the **shared authoring rules**.

---

## 1️⃣ Scope & Acceptance Criteria

| Item | Description |
|------|-------------|
| **Docs** | • Update `README.md` and `docs/architecture.md` to state that Ricky now **delegates** intent parsing, context enrichment, and response synthesis to the **Agent Assistant** service.<br>• Add a new section “Deep Agent Assistant Integration” with a diagram and example call‑flow. |
| **Code** | • In `src/ricky.ts` replace the local `parseIntent()` stub with a call to `AgentAssistant.parseIntent()`.<br>• Add a wrapper `agentAssistantClient` that handles retries, timeout, and typed response validation.<br>• Update unit tests to mock the Agent Assistant client and verify the new path. |
| **Verification** | • All TypeScript compile checks pass (`npm run build`).<br>• 100 % unit‑test coverage on the new client wrapper.<br>• End‑to‑end smoke test (`npm run e2e`) confirms Ricky still produces correc
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review**  
*(as of the latest CI run on 2026‑05‑20)*  

| Metric | Current Value | Target / Trend | Status |
|--------|---------------|----------------|--------|
| **Workflow contracts defined** | 12 of 15 active agents have a `turn‑context` contract in `workflow.yaml` | 100 % | ⚠️ 3 agents missing |
| **Deterministic verification gates** | 9 agents have automated gate tests (`verify-turn-context.js`) | 100 % for agents with contracts | ⚠️ 3 agents lack gates |
| **Review artifacts** | 78 % of PRs include a `Turn‑Context Review` checklist item | ≥ 90 % | ⚠️ Below target |
| **Validation loops (80‑100 per release)** | Avg. 62 loops per release (target 80‑100) | ↑ | ⚠️ Insufficient loops |
| **Blocker reporting** | 4 blockers logged in the last sprint (all resolved) | ≤ 2 per sprint | ⚠️ Slightly high |
| **Branch/PR scoping** | 85 % of PRs respect scoped `turn‑context/*` branch naming | ≥ 95 % | ⚠️ Needs improvement |
| **Documentation coverage** | 68 % of agents have `TURN_CONTEXT.md` linked from `README` | ≥ 90 % | ⚠️ Gaps remain |

---

### Key Findings

1. **Partial Contract Coverage**  
   - Agents **`data‑ingest`**, **`nlp‑summari
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation:** Move **all** of the listed capabilities—workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording—into the **Agent Assistant** component.

---

## Rationale (aligned with AGENTS.md and workflow standards)

| Capability | Why it belongs in Agent Assistant | Impact on reliability & coordination |
|------------|-----------------------------------|---------------------------------------|
| **Workflow generation** | Agent Assistant already owns the *authoring contract* (input schema → workflow definition). Centralizing generation avoids duplicated template logic across agents. | Guarantees a single source‑of‑truth for workflow shape; downstream agents can rely on deterministic contracts. |
| **Workflow validation** | Validation gates (schema checks, deterministic test‑cases, 80‑100 loop verification) are part of the *verification contract* that Agent Assistant enforces before publishing. | Early failure detection, reduces downstream breakage, and provides a clear “gate‑open” signal for other agents. |
| **Debugging** | Debug utilities (step‑by‑step replay, state diff, deterministic replay logs) are tightly coupled to the generation/valid
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Rollout”**  
*Target window: 12 weeks (Q4 2024)*  

---

## 1. Slice Objective
Deliver a production‑ready, self‑service adoption package for the **Ricky** AgentWorkforce assistant that enables **10 pilot teams** to onboard, configure, and run reliable workflow authoring cycles with deterministic verification and blocker reporting.

---

## 2. Scope & Boundaries  

| **In‑Scope** | **Out‑of‑Scope** |
|--------------|------------------|
| • End‑to‑end onboarding flow (invite → repo fork → CI pipeline)  | • Full enterprise‑wide rollout (beyond pilot) |
| • Pre‑configured **AGENTS.md** template repository (Ricky‑template) | • Custom UI extensions (outside CLI/VS Code) |
| • Deterministic verification gates (lint, unit, contract, integration) | • Migration of legacy non‑Git workflows |
| • 80‑to‑100 validation loops (auto‑generated test matrix) | • Non‑Git‑based agents |
| • Review artifact generation (PR checklist, blocker report) | • External vendor integrations not in spec |
| • Scoped branch/PR policy (feature → dev → main) | • Long‑term monitoring dashboards (post‑slice) |

---

## 3. Milestones & Deliverables  

| **Week** | **Milesto
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky × Slack / Web Support – Agent‑Assistant Package Design  

Below is a **complete, contract‑driven package** that lets Ricky expose its workflow‑reliability, coordination, and authoring capabilities through two front‑ends:

| Package | Entry point | Primary use‑case |
|---------|-------------|-----------------|
| **ricky‑slack‑bot** | `src/slack_bot.py` | Interactive Slack slash‑commands & message actions |
| **ricky‑web‑ui** | `src/web_ui.py` | Self‑service web portal (React + FastAPI) |

Both packages share a **core service layer** (`src/core/`) that implements the **Ricky Engine** (workflow validation, deterministic gates, review artifact generation, blocker reporting). The design follows the **Ricky repository conventions** (AGENTS.md) and the **workflow standards** (80‑100 validation loops, scoped PR boundaries, etc.).

---

## 1. Core Service Layer (`src/core/`)

| Module | Contract (Python‑type‑hinted) | Deterministic verification gate |
|--------|------------------------------|----------------------------------|
| `engine.py` | `def run_workflow(workflow_id: str, payload: Mapping[str, Any]) -> WorkflowResult:` | **Gate‑A** – schema validation of `payload` against the
...[truncated]

Must

  • Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
  • Keep local CLI behavior product-local unless shared adoption is proven harmless.
  • Explain which behavior is future/target architecture versus implemented today.

Must Not

  • Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
  • Let future surface abstractions distort the current CLI contract.
  • Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs `ricky --mode local --spec "generate a workflow for package checks"` without `--run`.

Ricky output

**What happens when you run**

```bash
ricky --mode local --spec "generate a workflow for package checks"
```

without the `--run` flag

| Step | Description | Outcome |
|------|-------------|---------|
| 1️⃣ | Argument parsing – Ricky reads `--mode local` and the free‑form `--spec` string. | The request is accepted; Ricky knows you want a local workflow generation. |
| 2️⃣ | Specification interpretation – The text "generate a workflow for package checks" is sent to the Spec‑to‑Workflow engine. | A workflow contract is produced (YAML/JSON) that follows the repository’s standard contract shape (see AGENTS.md → Workflow Contract). |
| 3️⃣ | Contract rendering – The contract is written to the default output location (`./ricky_output/<timestamp>/workflow.yaml`). | You get a draft workflow file on disk, but no execution of any steps. |
| 4️⃣ | Verification gate (dry‑run) – Because `--run` is omitted, Ricky stops after the deterministic verification gate that checks:<br>• Syntax validity (YAML/JSON schema).<br>• Presence of required stages (init, check, report).<br>• No missing required inputs. | If the contract passes, you see a |

...[truncated]

**Must**

- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.

**Must Not**

- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.
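
One way to keep a generation-only run honest is to render a summary that names the artifact, workflow id, and spec digest, and ends with the explicit opt-in command. The TypeScript sketch below assumes a hypothetical `GenerationResult` shape and uses a 12-character SHA-256 prefix as the displayed digest; `ricky run <artifact>` mirrors the command form used elsewhere in this report, not a confirmed CLI contract.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of a generation-only result; not Ricky's actual type.
interface GenerationResult {
  artifactPath: string;
  workflowId: string;
  spec: string;
}

// SHA-256 of the spec text, hex-encoded, so reviewers can tie the artifact to its input.
function specDigest(spec: string): string {
  return createHash("sha256").update(spec, "utf8").digest("hex");
}

// Renders a generation-only summary: no execution evidence, explicit opt-in run command.
function renderGenerationSummary(result: GenerationResult): string {
  return [
    "Generated workflow (not executed; execution was not requested).",
    `  artifact:    ${result.artifactPath}`,
    `  workflow id: ${result.workflowId}`,
    `  spec digest: sha256:${specDigest(result.spec).slice(0, 12)}`,
    "Next: run it explicitly with",
    `  ricky run ${result.artifactPath}`,
  ].join("\n");
}

// Example with hypothetical values.
console.log(
  renderGenerationSummary({
    artifactPath: "workflows/package-checks.ts",
    workflowId: "package-checks",
    spec: "generate a workflow for package checks",
  }),
);
```

Nothing in the summary mentions steps, exit codes, or gates, so it cannot be read as execution evidence.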

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>

**User message**

> Render Ricky's first-run CLI onboarding for a new user.

**Ricky output**

```text
**Ricky – First‑Run CLI Onboarding**

Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the **Ricky** AgentWorkforce CLI. Follow the steps in order; each step is deterministic, self‑contained, and includes verification gates that will stop the process and surface a clear blocker if anything goes wrong.

---

## 1️⃣ Prerequisites (Verification Gate)

| Check | Command | Expected Result |
|-------|---------|-----------------|
| **Node ≥18** | `node -v` | `v18.x` or higher |
| **Git ≥2.30** | `git --version` | `git version 2.30.x` or higher |
| **Docker (optional, for containerized agents)** | `docker version --format '{{.Server.Version}}'` | `>=20.10` |

If any check fails, install the missing tool before proceeding.

---

## 2️⃣ Install the Ricky CLI (Deterministic Contract)

```bash
# Install globally via npm (preferred)
npm i -g @ricky/cli

# Verify installation
ricky --version

Expected output: ricky-cli vX.Y.Z (the latest stable release).
If the version string is missing or the command exits with a non‑zero code → Blocker: “Ricky CLI not installed correctly”.


3️⃣ Initialise a New Workspace (Scoped Branch)

# Create a d
...[truncated]
```

Must

  • Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
  • End every branch with a concrete next step.
  • Advertise only commands that are currently implemented.

Must Not

  • Sound like a launch page or documentation dump.
  • Claim Ricky runs workflows by default when generation is the default path.
  • Require web or Slack onboarding before CLI use.
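
The copy rules above can be checked mechanically before onboarding text ships. This sketch assumes the caller supplies the set of implemented subcommands (the `run` and `status` used in the example are only taken from commands mentioned in this report); `findUnimplementedCommands` and `choicesWithoutNextStep` are hypothetical helpers, and the subcommand match is plain pattern matching, not a full parser.

```typescript
// Onboarding branches named in the checklist above; next-step text is supplied by the caller.
interface OnboardingChoice {
  label: "Local / BYOH" | "Cloud" | "Both" | "Just explore";
  nextStep: string;
}

// Returns the labels of branches that do not end with a concrete next step.
function choicesWithoutNextStep(choices: OnboardingChoice[]): string[] {
  return choices.filter((c) => c.nextStep.trim() === "").map((c) => c.label);
}

// Returns `ricky <subcommand>` mentions whose subcommand is not in the implemented set.
// Flags such as `ricky --version` are not matched because subcommands must start with a letter.
function findUnimplementedCommands(copy: string, implemented: ReadonlySet<string>): string[] {
  const missing = new Set<string>();
  for (const match of copy.matchAll(/\bricky\s+([a-z][a-z-]*)/g)) {
    if (!implemented.has(match[1])) missing.add(match[1]);
  }
  return Array.from(missing);
}

// Example: "deploy" is reported because it is not in the (hypothetical) implemented set.
console.log(findUnimplementedCommands("Next: ricky status, then ricky deploy", new Set(["run", "status"])));
```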

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.recovery-guidance-no-stack-traces (cli-behavior/manual)

User message

A user gives Ricky an empty spec or a missing spec file.

Ricky output

**Ricky – Workflow Reliability & Coordination Assistant**

---

### Situation
You’ve received an empty specification or a reference to a missing spec file. Without a concrete spec, I can’t generate a deterministic workflow contract, verification gates, or review artifacts.

### Immediate Action
1. **Notify the requester** that the spec is missing or empty.  
2. **Request the required information** (full spec content, location, or a brief description of the intended workflow).  

### Suggested Response Template

> **Subject:** Missing or Empty Workflow Specification  
> 
> Hi [User Name],
> 
> I tried to load the workflow specification you referenced, but the file is either empty or not found at the provided path. To proceed with authoring a reliable workflow (including contracts, verification gates, and review artifacts), I need the complete spec or at least a concise description of:
> 
> 1. **Workflow purpose** – what problem does it solve?  
> 2. **Key stages / tasks** – major steps, inputs, and outputs.  
> 3. **Success criteria** – measurable outcomes or SLAs.  
> 4. **Dependencies** – external services, data sources, or other workflows.  
> 
> Once I have this information, I c
...[truncated]

Must

  • Return a user-facing failure or guidance message with a real recovery command.
  • Distinguish generation failure from execution failure.
  • Show stack traces only when verbose diagnostic mode is requested.

Must Not

  • Crash with an uncaught exception in normal mode.
  • Suggest commands that do not exist.
  • Pretend a missing spec was accepted.
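
A minimal sketch of these recovery rules, assuming a hypothetical `CliFailure` record: the phase label separates generation from execution failures, a recovery line is always present, and the stack trace only appears when verbose diagnostics are requested. The suggested command reuses the `ricky --mode local --spec ...` form from the earlier case; a missing spec *file* would need whatever file option the CLI actually implements, which this report does not name.

```typescript
type Phase = "generation" | "execution";

// Hypothetical failure record; field names are illustrative.
interface CliFailure {
  phase: Phase;
  message: string; // user-facing summary
  recovery: string; // a real, implemented command the user can run next
  cause?: Error; // raw error, only printed in verbose mode
}

// Formats a failure for the terminal; stack traces are gated behind `verbose`.
function formatFailure(failure: CliFailure, verbose: boolean): string {
  const label = failure.phase === "generation" ? "Generation" : "Execution";
  const lines = [`${label} failed: ${failure.message}`, `Try: ${failure.recovery}`];
  if (verbose && failure.cause?.stack) {
    lines.push("", "Diagnostics:", failure.cause.stack);
  }
  return lines.join("\n");
}

// An empty spec is a generation failure, detected before anything could execute.
function checkSpec(spec: string | undefined): CliFailure | null {
  if (spec === undefined || spec.trim() === "") {
    return {
      phase: "generation",
      message: "the spec is empty or missing, so no workflow was generated.",
      recovery: 'ricky --mode local --spec "<describe the workflow you want>"',
    };
  }
  return null;
}
```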

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render `ricky status` when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**  

```text
=== AgentWorkforce Status Report ===

🟡 Provider Connectivity
• Google : NOT VERIFIED (no successful check)
• GitHub : NOT VERIFIED (no successful check)

✅ Core Services
• Scheduler : RUNNING
• Queue : ACTIVE
• Metrics : COLLECTING

🔧 Configuration
• Google OAuth : CONFIGURED (credentials present)
• GitHub Token : CONFIGURED (token present)

⚠️ Alerts
• Provider health checks have not yet confirmed connectivity for Google or GitHub.
• Actions dependent on these providers may be delayed or fail until verification succeeds.

🛠️ Next Steps

  1. Run provider health checks:
    ricky check --provider google
    ricky check --provider github
  2. Review logs for any authentication errors.
  3. Re‑run ricky status after checks complete.

--- End of Report ---
```

Must

  • Report unknown or not-connected provider state honestly.
  • Update provider status only from explicit provider checks or Cloud status results.
  • Give concrete setup guidance for Cloud when relevant.

Must Not

  • Mark Google or GitHub connected because guidance text was shown.
  • Invent a provider connection URL or OAuth flow.
  • Show empty fields with no recovery guidance when config is missing.
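
Provider state can be modeled so that only explicit evidence moves it. In this sketch (all type and function names are hypothetical), showing guidance text produces no `Evidence` value, so it cannot flip Google or GitHub to connected, and the unknown state still renders a line that says what is missing instead of an empty field.

```typescript
type ProviderState = "unknown" | "connected" | "not_connected";
// The only sources allowed to change provider state, per the checklist above.
type EvidenceSource = "provider-check" | "cloud-status";

interface ProviderStatus {
  state: ProviderState;
  source?: EvidenceSource;
}

interface Evidence {
  source: EvidenceSource;
  connected: boolean;
}

// Pure transition: without evidence (e.g. guidance was merely shown) the state is unchanged.
function applyEvidence(current: ProviderStatus, evidence: Evidence | undefined): ProviderStatus {
  if (!evidence) return current;
  return { state: evidence.connected ? "connected" : "not_connected", source: evidence.source };
}

// Every state renders a non-empty, honest line.
function renderProvider(name: string, status: ProviderStatus): string {
  switch (status.state) {
    case "connected":
      return `${name}: connected (from ${status.source})`;
    case "not_connected":
      return `${name}: not connected (from ${status.source}); follow the Cloud setup guidance`;
    default:
      return `${name}: unknown (no provider check or Cloud status result yet)`;
  }
}
```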

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

**Workflow: “GitHub Primitive Change”**  
*Purpose*: Apply a small, self‑contained change (e.g., typo fix, comment update, CI config tweak) to a repository while guaranteeing traceability, deterministic verification, and safe merge.

---

## 1. Scope & Branching Contract  

| Item | Definition |
|------|------------|
| **Change Type** | Primitive (≤ 10 lines, no API change, no new dependencies) |
| **Target Repo** | `owner/repo` (must have `main` protected) |
| **Branch Naming** | `ricky/primitive/<short‑desc>/<YYYYMMDD>` |
| **PR Title** | `fix: <short‑desc>` (follow Conventional Commits) |
| **PR Description** | • Why the change is needed  <br>• Exact file(s) and line ranges <br>• Verification steps (see §3) |
| **Merge Policy** | Require **2 approvals** + **all verification gates** passing; **no force‑push** after PR opened. |
| **Rollback** | If any gate fails post‑merge, revert via `git revert <merge‑sha>` and open a follow‑up PR titled `revert: <original‑title>`. |

---

## 2. Step‑by‑Step Execution Contract  

| Step | Owner | Action | Artifact | Success Gate |
|------|-------|--------|----------|--------------|
| **0 – Prep** | *Ricky (automation)* | Clone repo, create bran
...[truncated]

Must

  • Select skills from the actual registry rather than a hardcoded static set.
  • Record ranked skill matches with confidence and trigger evidence.
  • Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

  • Claim runtime agents embody skills when only generation-time guidance was applied.
  • Fail generation solely because optional skill files are missing.
  • Hide skill selection evidence from artifacts or JSON output.
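
A sketch of registry-backed matching, assuming each registry entry lists trigger phrases. Confidence here is simply the fraction of an entry's triggers found in the spec, a stand-in heuristic rather than Ricky's scoring; the skill ids in the example are hypothetical. A missing registry or no skill above the threshold returns an empty list so generation can continue, and each match carries the triggers that fired as evidence.

```typescript
// Hypothetical registry entry: a skill id plus phrases that suggest it.
interface SkillEntry {
  id: string;
  triggers: string[];
}

interface SkillMatch {
  id: string;
  confidence: number; // fraction of triggers found, 0..1
  evidence: string[]; // triggers that matched
}

// Ranks registry skills by trigger coverage; never throws on a missing registry.
function matchSkills(spec: string, registry: SkillEntry[] | undefined, threshold = 0.5): SkillMatch[] {
  if (!registry || registry.length === 0) return [];
  const text = spec.toLowerCase();
  const matches: SkillMatch[] = [];
  for (const skill of registry) {
    if (skill.triggers.length === 0) continue;
    const evidence = skill.triggers.filter((t) => text.includes(t.toLowerCase()));
    const confidence = evidence.length / skill.triggers.length;
    if (confidence >= threshold) matches.push({ id: skill.id, confidence, evidence });
  }
  return matches.sort((a, b) => b.confidence - a.confidence);
}
```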

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow Title:** Claude‑Review → Codex‑Implementation Pipeline  
**Owner:** AgentWorkforce (Ricky)  
**Scope:** Automated code change request (CR) from specification to merged PR.  

---  

## 1. Contract Overview  

| Element | Description |
|---------|-------------|
| **Input** | `spec.yaml` – structured spec containing `description`, `acceptance_criteria`, and optional `sample_inputs`. |
| **Output** | `implementation/` – generated source files, `review_report.md`, and a **ready‑to‑merge** Pull Request (PR) targeting `main`. |
| **Success Criteria** | 1. Claude’s review report contains **no “BLOCKER”** tags. 2. All automated tests pass (`npm test` / `pytest`). 3. Human reviewer signs off (optional gate). |
| **SLAs** | Total wall‑clock ≤ 30 min per CR; each deterministic gate ≤ 5 min. |
| **Failure Modes** | • Claude returns “BLOCKER”. <br>• Codex generation fails syntax check. <br>• Test suite fails. <br>All failures raise a **Workflow Blocker** ticket in the issue tracker. |

---  

## 2. Step‑by‑Step Execution Plan  

| # | Step | Agent | Deterministic Gate | Artifact |
|---|------|-------|--------------------|----------|
| 1 | **Ingest Spec** | `FileReader` | – | `spec.ya
...[truncated]

Must

  • Assign per-step runner/model decisions from explicit spec hints where possible.
  • Let skill metadata or project defaults fill gaps when the spec is silent.
  • Write tool-selection.json or equivalent audit metadata.

Must Not

  • Use one generic runtime default for every agent despite explicit hints.
  • Let skill defaults override explicit user/spec runner hints.
  • Omit the reason each tool or model was chosen.
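
The precedence in this checklist (explicit spec hint, then skill metadata, then project default) fits in a small function that also records why each choice was made. The runner names, the `<runner> to <verb>` hint grammar, and the decision shape are assumptions for illustration; serializing the decisions is one way to produce a `tool-selection.json`-style audit record.

```typescript
// Hypothetical runner names; a real set would come from Ricky's tool/model registry.
type Runner = "claude" | "codex";

interface ToolDecision {
  step: string;
  runner: string;
  reason: string;
}

// Extracts hints such as "use Claude to review and Codex to implement" into verb -> runner.
function parseRunnerHints(spec: string): Map<string, Runner> {
  const hints = new Map<string, Runner>();
  for (const m of spec.matchAll(/\b(claude|codex)\s+to\s+([a-z]+)/gi)) {
    hints.set(m[2].toLowerCase(), m[1].toLowerCase() as Runner);
  }
  return hints;
}

// Precedence: explicit spec hint > skill metadata default > project default.
function selectRunner(
  step: string,
  hints: Map<string, Runner>,
  skillDefaults: Record<string, string>,
  projectDefault: string,
): ToolDecision {
  const hinted = hints.get(step);
  if (hinted) return { step, runner: hinted, reason: `spec hint: "${hinted} to ${step}"` };
  const skill = skillDefaults[step];
  if (skill) return { step, runner: skill, reason: "skill metadata default (spec silent)" };
  return { step, runner: projectDefault, reason: "project default (spec and skills silent)" };
}
```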

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use `--refine` to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

### Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is a deterministic post‑processing flag that takes a **generated workflow JSON** (or YAML) and returns a **tighter, review‑ready version**. It focuses on two things:

| Target | What `--refine` does |
|--------|----------------------|
| **Task descriptions** | • Removes vague verbs (“do”, “handle”). <br>• Adds concrete inputs, outputs, and success criteria. <br>• Enforces the *“who‑does‑what‑when‑how”* pattern. |
| **Acceptance gates** | • Converts informal “check” statements into explicit verification predicates. <br>• Inserts deterministic “gate‑type” fields (`type: unit-test`, `type: schema‑check`, `type: external‑api‑ping`). <br>• Adds a `timeout` and `retry` policy for each gate. |

---

## 1. Basic Invocation

```bash
awf generate --template my_template.yaml --output workflow.json
awf refine --input workflow.json --output workflow.refined.json
```

`awf` = AgentWorkforce CLI

- `--input` must be a valid workflow file (JSON/YAML).
- `--output` is the refined artifact you will submit for review.

## 2. What the Refiner Looks For

| Section | Refine Action |
|---------|---------------|

...[truncated]

**Must**

- Keep the deterministic unrefined path as the default.
- Bound the refinement pass by timeout, token budget, and editable regions.
- Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

**Must Not**

- Change the workflow graph, agent assignments, or side-effect scope during refinement.
- Fail the whole generation if optional refinement times out.
- Hide warnings when the deterministic artifact is returned unchanged.
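
A sketch of opt-in, bounded refinement under a hypothetical `Workflow` shape in which only `task` and `acceptance` are meant to be editable. Structural equality (step ids, agents, dependencies) plus re-validation decide whether the refined artifact is kept; otherwise the deterministic artifact is returned with a warning. Timeouts and token budgets are assumed to be enforced inside the `refine` callback, which signals them by throwing.

```typescript
// Hypothetical artifact shape for illustration.
interface Step {
  id: string;
  agent: string;
  dependsOn: string[];
  task: string;
  acceptance: string;
}
interface Workflow {
  steps: Step[];
}
interface RefineOutcome {
  workflow: Workflow;
  refined: boolean;
  warnings: string[];
}

// Graph, agent assignments and dependencies must be identical before and after refinement.
function sameStructure(a: Workflow, b: Workflow): boolean {
  const shape = (w: Workflow) => JSON.stringify(w.steps.map((s) => [s.id, s.agent, [...s.dependsOn].sort()]));
  return shape(a) === shape(b);
}

function maybeRefine(
  base: Workflow,
  opts: { refine: boolean },
  refine: (w: Workflow) => Workflow,
  validate: (w: Workflow) => boolean,
): RefineOutcome {
  if (!opts.refine) return { workflow: base, refined: false, warnings: [] }; // deterministic default
  const keep = (warning: string): RefineOutcome => ({ workflow: base, refined: false, warnings: [warning] });
  let candidate: Workflow;
  try {
    candidate = refine(base); // may throw on timeout, budget exhaustion or provider failure
  } catch (err) {
    return keep(`refinement failed, kept deterministic artifact: ${String(err)}`);
  }
  if (!sameStructure(base, candidate)) return keep("refinement changed graph or agents; kept deterministic artifact");
  if (!validate(candidate)) return keep("refined artifact failed validation; kept deterministic artifact");
  return { workflow: candidate, refined: true, warnings: [] };
}
```

Returning the base artifact with a warning, rather than failing, keeps the optional pass from sinking an otherwise valid generation.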

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.behavior-grounded-gates</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for the `ricky --version` spec.

**Ricky output**

```text
**Workflow: `ricky --version` Specification & Release**  
*Repository: `github.com/agentworkforce/ricky`*  
*Target Branch: `main` → `release/vX.Y.Z`*  

---  

## 1. Scope & Objective
Produce a deterministic, test‑driven workflow that defines the contract for the `ricky --version` CLI flag, validates it across all supported platforms, and packages the change for release. The workflow must be reproducible, auditable, and gate‑controlled.

---

## 2. Contract (Specification)

| Item | Description |
|------|-------------|
| **Command** | `ricky --version` |
| **Output Format** | `ricky version <semver>` (e.g., `ricky version 1.4.2`) |
| **Exit Code** | `0` on success, non‑zero on error |
| **Supported Platforms** | Linux (x86_64, arm64), macOS (x86_64, arm64), Windows (x86_64) |
| **Version Source** | `git describe --tags --always --dirty=-dev` at build time; fallback to `VERSION` file if git metadata unavailable |
| **Logging** | No extra logs; output must be a single line to `stdout` |
| **Error Cases** | If version cannot be resolved, exit 1 and print `ricky version unknown` to `stderr` |
| **Backward Compatibility** | Must not affect existing flags or sub‑commands |

---

## 3. W
...[truncated]
```

Must

  • Build gates from the stated acceptance behavior, such as checking `ricky --version` output.
  • Avoid generic source-shape grep checks when the spec asks for CLI behavior.
  • Keep generated validation meaningful for the current repo shape.

Must Not

  • Treat `grep -Eq 'export|function|class|workflow(' dist/ricky.js` as proof of version behavior.
  • Claim the workflow is proven by source syntax alone.
  • Ignore the package-json version resolution order in the spec.
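
A behavior gate executes the command and inspects what it prints, which is what this checklist asks for instead of grepping built source. The sketch uses Node's `execFileSync`; `versionGate` is a hypothetical helper, and calling it with the built CLI entry (for example `node dist/ricky.js --version`) and the version resolved from `package.json` is an assumed usage, not a confirmed Ricky contract.

```typescript
import { execFileSync } from "node:child_process";

// Runs the CLI and passes only if stdout contains the expected version as a whole token.
function versionGate(command: string, args: string[], expectedVersion: string): { pass: boolean; detail: string } {
  let stdout: string;
  try {
    stdout = execFileSync(command, args, { encoding: "utf8", timeout: 10000 });
  } catch (err) {
    // Non-zero exit, timeout, or a missing binary all fail the gate, with evidence.
    return { pass: false, detail: `command failed: ${(err as Error).message}` };
  }
  const line = stdout.trim();
  const pass = line.split(/\s+/).includes(expectedVersion);
  return { pass, detail: `stdout=${JSON.stringify(line)}, expected version ${expectedVersion}` };
}
```

Because the gate observes real output and exit status, a build that still contains `export` or `function` but prints the wrong version (or crashes) fails it.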

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

OpenRouter returned an empty response after 3 attempts for generation-quality.pattern-selection-deliberate.
This provider response is reviewable as an infrastructure-quality signal, but it is not a Ricky product answer.

Attempts:
- attempt 1: empty content
- attempt 2: empty content
- attempt 3: empty content

Must

  • Choose dag, supervisor, or pipeline deliberately based on the work shape.
  • Explain the pattern choice in artifact metadata or a rationale.
  • Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

  • Default blindly to dag for every workflow.
  • Collapse independent artifact work into one vague agent task.
  • Omit verification gates because the chosen pattern seems obvious.
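
One way to make the pattern choice deliberate is to derive it from an explicit work-shape summary and return the rationale alongside it. Only the first rule (dag for validation/fix/rerun loops whose dependencies matter) comes from the checklist; the `WorkShape` fields and the supervisor and pipeline rules are illustrative assumptions, not Ricky's selection logic.

```typescript
type Pattern = "dag" | "supervisor" | "pipeline";

// Hypothetical summary of the work shape extracted from a spec.
interface WorkShape {
  independentUnits: number; // artifacts that could be produced in parallel
  hasFixRerunLoop: boolean; // validation -> fix -> rerun cycle
  dependenciesMatter: boolean; // later steps consume earlier outputs
  needsDynamicDelegation: boolean; // units are only known at runtime
}

// Returns the pattern plus a rationale suitable for artifact metadata.
function choosePattern(shape: WorkShape): { pattern: Pattern; rationale: string } {
  if (shape.hasFixRerunLoop && shape.dependenciesMatter) {
    return { pattern: "dag", rationale: "validation/fix/rerun loop with dependent steps; explicit edges scope reruns" };
  }
  if (shape.needsDynamicDelegation) {
    return { pattern: "supervisor", rationale: "work units are discovered at runtime and must be delegated" };
  }
  if (shape.independentUnits > 1) {
    return { pattern: "dag", rationale: `${shape.independentUnits} independent units can fan out in parallel` };
  }
  return { pattern: "pipeline", rationale: "a single linear chain of stages" };
}
```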

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow  

Below is a concise “skill‑impact map” that shows the direct effect each of the core Ricky writing skills has on the structure, reliability, and reviewability of a generated workflow. The map is illustrated with a concrete example: a **“Customer‑Onboarding Email Campaign”** workflow that moves a new lead through a series of automated emails, approvals, and data‑enrichment steps.

| **Skill** | **What the Skill Enforces** | **Resulting Artifact in the Workflow** | **Why It Improves Reliability / Coordination** |
|-----------|----------------------------|----------------------------------------|-----------------------------------------------|
| **Deterministic Contract Definition** | Every step declares an explicit **input‑output contract** (type, schema, required fields). | `Step: enrich_profile` – `input: LeadRecord {id, email}` → `output: EnrichedLead {id, email, score, segment}` | Guarantees downstream steps receive exactly what they expect; type‑checking catches mismatches before runtime. |
| **Verification Gate Insertion** | After any external call (API, DB, third‑party service) a **gate** runs a valida
...[truncated]

Must

  • Describe skills as generation-time selection, loading, and rendering inputs.
  • Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
  • Say the workflow instructions were informed by selected skills.

Must Not

  • Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
  • Treat metadata existence as proof that agents received skill bodies at runtime.
  • Overstate current tests beyond the generation boundary.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

Below is a **concrete, end‑to‑end workflow** that turns a **vague product specification** into a **review‑ready implementation**.  
All stages are expressed as **deterministic contracts**, include **verification gates**, and are scoped so that each branch/PR has a single responsibility.  

---

## 1. High‑level Goal (derived from the vague spec)

> *“Create a feature that lets users upload a profile picture, stores it securely, and displays it on their dashboard.”*  

**Assumptions added for clarity**

| Area | Assumption |
|------|------------|
| Input | JPEG/PNG ≤ 5 MB, optional cropping UI |
| Storage | AWS S3 bucket with server‑side encryption |
| Auth | Existing JWT‑based auth; user ID is available as `req.user.id` |
| UI | React component `ProfilePictureUploader` |
| Backend | Node.js/Express service `profile.service.ts` |
| Tests | Jest for unit, Cypress for e2e |
| CI/CD | GitHub Actions, PR must pass all checks before merge |

If any assumption is wrong, open a **blocking issue** (see Blocker Reporting below) before proceeding.

---

## 2. Workflow Contract Overview

| Stage | Branch | PR Title | Owner | Success Criteria |
|-------|--------|----------|-------|-------------
...[truncated]

Must

  • Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
  • Validate with dry-run or targeted structural checks where possible.
  • Return artifacts, warnings, and follow-up commands honestly.

Must Not

  • Act like Ricky is a pure code-generation bot that emits workflows without verification.
  • Stop at "code compiles" as the proof bar.
  • Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s Immediate Response to the Failure**

| Phase | Action | Rationale |
|-------|--------|-----------|
| 1️⃣ Detect & Log | • Capture the timeout event (step X) and the “alive‑but‑idle” worker (step Y). <br>• Write a structured failure record to the workflow‑run log (JSON schema: `runId`, `stepId`, `status`, `timestamp`, `errorDetail`). | Provides an immutable audit trail and makes the failure reproducible for downstream analysis. |
| 2️⃣ Blocker Classification | • Mark the run as **BLOCKED** in the orchestration DB. <br>• Tag the blocker type: `TIMEOUT` + `NO_ARTIFACT`. | Guarantees that no downstream steps are scheduled until the blocker is resolved. |
| 3️⃣ Automated Remediation Attempt | • **Retry Policy**: If the step’s retry‑count < 3, enqueue a deterministic retry with a fresh worker instance (same container image, same input hash). <br>• Reset the idle worker’s lease and force a graceful shutdown to avoid zombie resources. | Gives the system a chance to self‑heal while keeping resource usage bounded. |
| 4️⃣ Verification Gate (post‑retry) | • After the retry finishes, run the **Artifact‑Presence Gate**: <br> `if artifacts.exists(stepId) → PASS else → FAIL`. <br>• Run 
...[truncated]

Must

  • Classify the failure before deciding whether to retry.
  • Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
  • Preserve raw evidence and uncertainty when the class is not known.

Must Not

  • Blindly rerun the whole workflow without a blocker classification.
  • Treat every failure as a broken workflow definition.
  • Claim Ricky fixed the workflow before rerun evidence proves it.
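
A classification-first sketch for the scenario above. The evidence fields, the order of the checks, and mapping "worker alive with no artifacts" to agent-runtime opacity are assumptions; the point is that the retry decision is derived from the class and the raw evidence travels with it. Only timeouts are treated as retryable here, which is also an assumption.

```typescript
// Failure classes named in the checklist above.
type FailureClass =
  | "agent-runtime-opacity"
  | "timeout"
  | "environment-blocker"
  | "workflow-structure-bug"
  | "validation-strategy-mismatch"
  | "unknown";

// Hypothetical per-step evidence gathered from a run.
interface StepEvidence {
  stepId: string;
  timedOut: boolean;
  processAlive: boolean;
  artifactsProduced: boolean;
  missingEnv: string[];
  definitionError?: string; // e.g. a dependency on an unknown step id
  gateFailedButOutputPresent: boolean;
}

function classify(e: StepEvidence): FailureClass {
  if (e.missingEnv.length > 0) return "environment-blocker";
  if (e.definitionError) return "workflow-structure-bug";
  if (e.timedOut) return "timeout";
  if (e.processAlive && !e.artifactsProduced) return "agent-runtime-opacity";
  if (e.gateFailedButOutputPresent) return "validation-strategy-mismatch";
  return "unknown"; // keep uncertainty explicit instead of guessing
}

// Classify first, then decide; the raw evidence is kept with the decision.
function nextAction(e: StepEvidence) {
  const failureClass = classify(e);
  return { stepId: e.stepId, failureClass, retry: failureClass === "timeout", evidence: e };
}
```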

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale `.agent-relay/`, `.relay/`, and `.trajectories/` state before launching a local workflow.

Ricky output

**Workflow: Detect and Clean Stale Relay State Before a Local Run**  

| **Stage** | **Purpose** | **Inputs** | **Outputs** | **Verification Gate** |
|-----------|-------------|------------|-------------|-----------------------|
| **1️⃣ Scan** | Locate any stale persistence directories in the project root. | `repo_root/` | List of paths that exist: `.agent‑relay/`, `.relay/`, `.trajectories/` | ‑ Exists‑check: each path must be a directory. If none exist → **skip** to stage 5. |
| **2️⃣ Timestamp‑filter** | Determine if the directories are *stale* (no modification in the last `N` minutes). | Directory list, configurable `STALE_MINUTES` (default 30) | Sub‑list of *stale* directories | ‑ Age‑check: `now - mtime(dir) ≥ STALE_MINUTES`. If any directory fails → **block** and emit a **StaleStateBlocker** (see § 5). |
| **3️⃣ Prompt / Auto‑Clean** | Offer the user a deterministic choice: (a) **preserve** (skip clean), (b) **remove** (delete recursively), (c) **archive** (tar‑gz to `.agent‑relay/.archive/`). | Stale list, user config `CLEAN_MODE` (`prompt`|`auto‑remove`|`auto‑archive`) | Action taken flag (`preserved`/`removed`/`archived`) | ‑ User‑confirmation gate (if `prompt`). Must rec
...[truncated]

Must

  • Classify stale local runtime state as an environment contamination issue.
  • Recommend quarantine or isolated-run guidance before launch.
  • Record the observed paths and the action taken or recommended.

Must Not

  • Treat stale runtime state as a workflow logic failure.
  • Delete or overwrite state without an explicit safe path or user intent.
  • Continue into execution as if the workspace were clean.
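
A read-only preflight sketch for the three directories named in this case. It classifies their presence as environment contamination, records the observed paths, and recommends isolation without deleting anything. `preflightRuntimeState` is hypothetical; the injectable `isDir` parameter exists so the check can be exercised without a real workspace.

```typescript
import { existsSync, statSync } from "node:fs";
import { join } from "node:path";

// Local runtime state directories named in the case above.
const RUNTIME_STATE_DIRS = [".agent-relay", ".relay", ".trajectories"];

function isDirectory(path: string): boolean {
  return existsSync(path) && statSync(path).isDirectory();
}

interface PreflightReport {
  classification: "clean" | "environment-contamination";
  observedPaths: string[];
  recommendation: string;
  proceed: boolean;
}

// Read-only: records what exists and recommends isolation; never deletes or overwrites.
function preflightRuntimeState(workspace: string, isDir: (p: string) => boolean = isDirectory): PreflightReport {
  const observedPaths = RUNTIME_STATE_DIRS.map((name) => join(workspace, name)).filter((p) => isDir(p));
  if (observedPaths.length === 0) {
    return { classification: "clean", observedPaths, recommendation: "No leftover runtime state found.", proceed: true };
  }
  return {
    classification: "environment-contamination",
    observedPaths,
    recommendation: "Quarantine these paths or launch from an isolated copy of the workspace; nothing was deleted.",
    proceed: false,
  };
}
```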

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Fixing “Another Ricky/Relay run is already active in this workspace”**

The marker that blocks a new run is a safety file created by the **Ricky** or **Relay** engine to prevent overlapping executions. When the marker is left behind (e.g., a crash, an orphaned container, or a stale CI job) the next invocation aborts with the message you’re seeing.

Below is the deterministic, step‑by‑step contract for clearing the marker and guaranteeing that future runs start cleanly.

---

## 1️⃣ Verify the Stale Marker

| Check | Command | Expected Result |
|-------|---------|-----------------|
| Workspace root | `ls -a .ricky/` | Should contain a file named `RUNNING` (or `relay.lock`). |
| Process list | `ps aux \| grep -i ricky` | No live `ricky`/`relay` PID should appear. |
| CI pipeline | Look at the latest pipeline UI for a *running* job that never completed. | If a job is still “in‑progress”, wait for it to finish or cancel it. |

If **any** of the above shows a live process, **do not delete the marker** – the run is still active. Cancel/kill the process first (see §2).

---

## 2️⃣ Safely Terminate an Orphaned Run

1. **Kill the process (if present)**  
   ```bash
   # Replace <PID> wit
...[truncated]

Must

  • Report the active marker, run id, or status path when available.
  • Ask the user to inspect, wait for, or explicitly clear the active run.
  • Avoid launching a competing run that could corrupt evidence.

Must Not

  • Silently start another run.
  • Hide the existing run marker from the user.
  • Treat the conflict as a generic failure with no recovery path.
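
A sketch of the conflict check, assuming a hypothetical JSON marker file that may carry a `runId`; the real marker path and format are not specified in this report. An unreadable marker still blocks the launch, and the message names what was found and how to proceed.

```typescript
import { existsSync, readFileSync } from "node:fs";

interface ActiveRun {
  markerPath: string;
  runId?: string;
}

// File access is injectable so the check can be tested without a real workspace.
function detectActiveRun(
  markerPath: string,
  read: (p: string) => string = (p) => readFileSync(p, "utf8"),
  exists: (p: string) => boolean = existsSync,
): ActiveRun | null {
  if (!exists(markerPath)) return null;
  let runId: string | undefined;
  try {
    const parsed = JSON.parse(read(markerPath)) as { runId?: unknown };
    if (typeof parsed.runId === "string") runId = parsed.runId;
  } catch {
    // Unreadable or non-JSON marker: still a conflict, just without a run id.
  }
  return { markerPath, runId };
}

// Refuses to start a competing run and reports the marker instead of hiding it.
function launchDecision(active: ActiveRun | null): { launch: boolean; message: string } {
  if (!active) return { launch: true, message: "No active run marker found." };
  const id = active.runId ? `, run id: ${active.runId}` : "";
  return {
    launch: false,
    message: [
      `Another run appears to be active (marker: ${active.markerPath}${id}).`,
      "Inspect or wait for that run, or clear the marker explicitly once you have confirmed it is stale.",
    ].join("\n"),
  };
}
```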

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.env-loader-injection-runtime-loadable (runtime-recovery/manual)

User message

A workflow artifact references a `MISSING_ENV_VAR` value. Ricky's deterministic auto-fix injects the `.env.local` / `.env` loader (`loadRickyWorkflowEnv`) and the optional `assertRickyWorkflowEnv` guard into the artifact before retry. The artifact may be a master-rendered workflow whose `.step({ command: ... })` bodies embed `node --input-type=module` HEREDOCs containing literal `import { ... } from 'node:fs'` / `from 'node:path'` strings.

Ricky output

**Deterministic auto‑fix for missing environment variables**

Below is the minimal, reproducible change you can apply to any workflow artifact that currently fails because it references a `MISSING_ENV_VAR`. The fix injects the **Ricky environment loader** (`loadRickyWorkflowEnv`) and the **optional guard** (`assertRickyWorkflowEnv`) **once**, at the top of the generated file, and then re‑exports the original step definitions unchanged.  

The patch respects the repository’s **branch/PR boundaries** (a dedicated `fix/missing‑env‑var` branch) and includes the required **verification gates** and **review artifacts**.

---

## 1. Branch & PR scaffolding

```bash
# 1️⃣ Create a short‑lived branch for the fix
git checkout -b fix/missing-env-var

# 2️⃣ Apply the patch (see §2)
# 3️⃣ Run the deterministic verification gate (see §3)
# 4️⃣ Commit & push
git add path/to/workflow.artifact.ts
git commit -m "fix: inject Ricky env loader for MISSING_ENV_VAR"
git push -u origin fix/missing-env-var

# 5️⃣ Open a PR against `main` with the title:
#    "fix: inject Ricky env loader for MISSING_ENV_VAR"
#    – Add the checklist from the workflow‑repair template (see §4)
```

## 2. Concrete code i

...[truncated]


**Must**

- Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected `loadRickyWorkflowEnv` body references `rickyWorkflowFs.*` and `rickyWorkflowPath.*`, so the repair must also add the corresponding `import * as rickyWorkflowFs from 'node:fs'` and `import * as rickyWorkflowPath from 'node:path'` aliases at module top level.
- Detect existing alias imports by matching real top-level `import * as <alias> from '<module>'` statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in `.step({ command: ... })` bodies do not count as imports).
- Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

**Must Not**

- Skip adding the `rickyWorkflowFs` / `rickyWorkflowPath` aliases because `from 'node:fs'` or `from 'node:path'` already appears somewhere in the file as a string literal.
- Inject `loadRickyWorkflowEnv` (or `assertRickyWorkflowEnv`) without the supporting alias imports, which produces a `ReferenceError: rickyWorkflowPath is not defined` at module load and burns the auto-fix budget on `UNSUPPORTED_RUNTIME at runtime-launch`.
- Rewrite or escape the embedded HEREDOC text in step commands.
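
The alias check described above has to look at real top-level statements, not substrings. This sketch scans the artifact while skipping strings, template literals (where step-command HEREDOCs live), and comments, and only then matches `import * as <alias> from '<module>'`. `findTopLevelNamespaceImports` and `missingAliasImports` are illustrative helpers, not Ricky's implementation; the scanner does not tokenize regex literals, so treat it as a sketch.

```typescript
// Returns the index just past the closing quote of a '...' or "..." literal.
function skipQuoted(src: string, start: number, quote: string): number {
  let i = start + 1;
  while (i < src.length && src[i] !== quote) i += src[i] === "\\" ? 2 : 1;
  return i + 1;
}

// Returns the index just past the closing backtick, skipping ${...} interpolations.
function skipTemplate(src: string, start: number): number {
  let i = start + 1;
  while (i < src.length) {
    const ch = src[i];
    if (ch === "\\") { i += 2; continue; }
    if (ch === "`") return i + 1;
    if (ch === "$" && src[i + 1] === "{") {
      let depth = 1;
      i += 2;
      while (i < src.length && depth > 0) {
        const c = src[i];
        if (c === "'" || c === '"') { i = skipQuoted(src, i, c); continue; }
        if (c === "`") { i = skipTemplate(src, i); continue; }
        if (c === "{") depth++;
        else if (c === "}") depth--;
        i++;
      }
      continue;
    }
    i++;
  }
  return i;
}

// Maps alias -> module for namespace imports that are genuine top-level statements.
function findTopLevelNamespaceImports(source: string): Map<string, string> {
  const found = new Map<string, string>();
  const importRe = /import\s*\*\s*as\s+([A-Za-z_$][\w$]*)\s+from\s+(['"])([^'"]+)\2/y;
  let depth = 0; // bracket nesting; top-level statements live at depth 0
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    const next = source[i + 1];
    if (ch === "/" && next === "/") {
      const end = source.indexOf("\n", i);
      i = end === -1 ? source.length : end;
    } else if (ch === "/" && next === "*") {
      const end = source.indexOf("*/", i + 2);
      i = end === -1 ? source.length : end + 2;
    } else if (ch === "'" || ch === '"') {
      i = skipQuoted(source, i, ch);
    } else if (ch === "`") {
      i = skipTemplate(source, i); // HEREDOC text inside step commands is skipped here
    } else {
      if (ch === "{" || ch === "(" || ch === "[") depth++;
      else if (ch === "}" || ch === ")" || ch === "]") depth--;
      else if (depth === 0 && ch === "i" && (i === 0 || !/[\w$.]/.test(source[i - 1]))) {
        importRe.lastIndex = i;
        const m = importRe.exec(source);
        if (m) { found.set(m[1], m[3]); i = importRe.lastIndex; continue; }
      }
      i++;
    }
  }
  return found;
}

// Import lines to prepend; the artifact body (including HEREDOCs) is left untouched.
function missingAliasImports(source: string): string[] {
  const have = findTopLevelNamespaceImports(source);
  const wanted: Array<[string, string]> = [["rickyWorkflowFs", "node:fs"], ["rickyWorkflowPath", "node:path"]];
  return wanted.filter(([alias, mod]) => have.get(alias) !== mod).map(([alias, mod]) => `import * as ${alias} from '${mod}';`);
}
```

Parsing with the TypeScript compiler API (`ts.createSourceFile`, then inspecting `ImportDeclaration` nodes) would be more robust than a hand-written scanner; the scanner keeps the example dependency-free.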

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.auto-fix-bounded-loop</code> (runtime-recovery/manual)</summary>

**User message**

> Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

**Ricky output**

```text
**Local Workflow Execution – Auto‑Fix Mode**  

Below is a deterministic, step‑by‑step contract you can run in your development environment. It assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failing workflow artifact is stored in `./artifacts/workflow.yaml`. The previous run ID is supplied as `prevRunId=12345abcde`.

---

## 1. Preconditions (Verification Gate)

| Check | Command | Expected Result |
|-------|---------|-----------------|
| CLI available | `aw --version` | `aw version x.y.z` |
| Workflow file exists | `test -f ./artifacts/workflow.yaml && echo ok` | `ok` |
| Previous run metadata reachable | `aw run get 12345abcde --output json` | JSON with `status: "completed"` |
| Auto‑fix feature enabled in config | `awk '/autoFix/ {print $2}' ~/.aw/config.yaml` | `true` |

If any check fails, **block** the run and fix the issue before proceeding.

---

## 2. Execution Contract

```bash
# 1️⃣ Load previous run context (optional but recommended)
export AW_PREV_RUN_ID=12345abcde

# 2️⃣ Invoke the workflow with auto‑fix
aw workflow run ./artifacts/workflow.yaml \
    --auto-fix \
    --prev-run $AW_PREV_RUN_ID \
    --output json \
    > run-output.json

...[truncated]
```


**Must**

- Use a bounded retry budget and summarize every attempt.
- Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
- Resume from the failed step with the previous run id when those values are available.

**Must Not**

- Edit arbitrary repository source files as the default auto-fix surface.
- Keep retrying after the configured max attempts.
- Lose the single Ricky tracking run id across repair/resume attempts.
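
The rules above describe a bounded loop, sketched here in TypeScript. Every name in the sketch is hypothetical, including `runWithAutoFix`, `AutoFixDeps`, and the `run` and `repairArtifact` callbacks, and the real runner's API may differ. The loop follows these rules:

- It checks the attempt budget before any repair.
- It stops when a repair changes nothing.
- Each resume carries the previous run id and the failed step.
- It passes the single tracking run id through untouched.

Repairs go through a callback scoped to the workflow artifact, which mirrors the rule against editing arbitrary source files.

```typescript
// Hypothetical types: Ricky's real runner API may look different.
interface AttemptResult {
  ok: boolean;
  runId: string;       // id of this particular execution
  failedStep?: string; // set when the run failed at a known step
  error?: string;
}

interface AttemptSummary {
  attempt: number;
  runId: string;
  resumedFrom?: { runId: string; step: string };
  ok: boolean;
  error?: string;
}

interface AutoFixDeps {
  // Run (or resume) the workflow artifact.
  run(opts: { resumeRunId?: string; startFromStep?: string }): Promise<AttemptResult>;
  // Ask the workflow persona to repair the artifact only; resolves true if it changed.
  repairArtifact(failure: AttemptResult): Promise<boolean>;
}

async function runWithAutoFix(
  trackingRunId: string, // the single Ricky tracking id, reused for every attempt
  maxAttempts: number,
  deps: AutoFixDeps,
): Promise<{ trackingRunId: string; ok: boolean; attempts: AttemptSummary[] }> {
  const attempts: AttemptSummary[] = [];
  let resume: { runId: string; step: string } | undefined;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await deps.run({ resumeRunId: resume?.runId, startFromStep: resume?.step });
    // Summarize every attempt, successful or not.
    attempts.push({ attempt, runId: result.runId, resumedFrom: resume, ok: result.ok, error: result.error });
    if (result.ok) return { trackingRunId, ok: true, attempts };
    if (attempt === maxAttempts) break; // budget exhausted: no further repair or retry
    if (!(await deps.repairArtifact(result))) break; // nothing repairable: stop rather than rerun blindly
    // Resume from the failed step when it is known; otherwise rerun from the start.
    resume = result.failedStep ? { runId: result.runId, step: result.failedStep } : undefined;
  }
  return { trackingRunId, ok: false, attempts };
}
```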

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.no-auto-fix-preserves-single-attempt</code> (runtime-recovery/manual)</summary>

**User message**

> A user runs `ricky run workflows/foo.ts --no-auto-fix` and the workflow fails.

**Ricky output**

```text
**Quick‑Start Debug Checklist**

| Step | Command / Action | What to Look For |
|------|------------------|-----------------|
| 1️⃣  | `ricky logs workflows/foo.ts` | Full execution trace – timestamps, step IDs, and any **ERROR** lines. |
| 2️⃣  | `ricky run workflows/foo.ts --debug` | Interactive debug mode – pauses before each gate so you can inspect the in‑memory contract. |
| 3️⃣  | Open the workflow file | Verify that the **contract block** (`export const contract = …`) matches the expected input/output schema. |
| 4️⃣  | Run the contract validator | `ricky validate contracts workflows/foo.ts` – should return **PASS**; any **FAIL** indicates a schema mismatch. |
| 5️⃣  | Check the deterministic verification gates | Look for `gate.verify(...)` calls inside the workflow. Confirm that the gate’s **expected hash** and **pre‑condition** are correct. |
| 6️⃣  | Review the latest PR / branch | `git status` → ensure you are on the branch that contains the most recent **review artifact** (e.g., `review/foo.md`). |
| 7️⃣  | Re‑run with auto‑fix disabled (already done) | If the failure is due to a **blocker** (e.g., missing required field), the log will contain `BLOCKER:`. Note the exact
...[truncated]
```

Must

  • Preserve one-attempt behavior when auto-fix is disabled.
  • Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
  • Make clear that the user chose manual inspection over repair/resume automation.

Must Not

  • Start a repair loop despite --no-auto-fix.
  • Suppress the diagnosis because no repair was attempted.
  • Present the failure as a completed repair attempt.
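
The sketch below uses assumed names to show how `--no-auto-fix` could pin the attempt budget to one while still returning the full failure report. `resolveMaxAttempts`, `buildFailureReport`, and `FailureReport` are hypothetical. Using exit code `1` for a failed run is an assumed convention. The `autoFix` field records that no repair ran because the user opted out, so the output cannot be mistaken for a completed repair attempt.

```typescript
// Hypothetical sketch; names and exit-code convention are assumptions.
interface FailureReport {
  blocker: string;  // classified blocker
  diagnosis: string;
  recoverySteps: string[];
  autoFix: "disabled-by-user" | "budget-exhausted";
  attempts: number;
  exitCode: number; // always non-zero for a failed run
}

// --no-auto-fix pins the budget to exactly one attempt.
function resolveMaxAttempts(cliArgs: readonly string[], configuredMax: number): number {
  return cliArgs.includes("--no-auto-fix") ? 1 : Math.max(1, configuredMax);
}

function buildFailureReport(
  failure: { blocker: string; diagnosis: string; recoverySteps: string[] },
  autoFixEnabled: boolean,
  attempts: number,
): FailureReport {
  return {
    ...failure,
    // State explicitly that no repair ran because the user opted out.
    autoFix: autoFixEnabled ? "budget-exhausted" : "disabled-by-user",
    attempts,
    exitCode: 1,
  };
}

// Illustrative values only.
const cliArgs = ["run", "workflows/foo.ts", "--no-auto-fix"];
const maxAttempts = resolveMaxAttempts(cliArgs, 3);
const failureReport = buildFailureReport(
  {
    blocker: "MISSING_ENV_VAR",
    diagnosis: "Step `deploy` read an environment variable that is not set.",
    recoverySteps: ["Export the missing variable", "Rerun: ricky run workflows/foo.ts"],
  },
  !cliArgs.includes("--no-auto-fix"),
  1,
);
console.log(maxAttempts, failureReport.autoFix, failureReport.exitCode); // 1 disabled-by-user 1
```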

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)

User message

Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

Ricky output

**Ricky’s Standard Procedure for Executing a Local TypeScript Workflow Artifact**

Below is the concrete, repeat‑able contract that Ricky follows whenever a developer asks to run a workflow artifact that lives in the *primary local path* (by convention `./workflows`). The steps are written as a deterministic workflow that can be copied into a CI‑ready script or a developer‑run helper (`ricky run <workflow‑name>`).

---

## 1️⃣ Preconditions (Verification Gate #1)

| Check | How to Verify | Expected Result |
|------|---------------|-----------------|
| **Repo root** | `git rev-parse --show-toplevel` | Path points to the repository root |
| **Primary workflow directory** | `test -d ./workflows && echo ok` | `ok` |
| **Node ≥ 18** | `node -v` | `v18.x` or higher |
| **npm ≥ 9** | `npm -v` | `9.x` or higher |
| **ts-node installed** | `npx ts-node -v` | prints a version (no error) |
| **Workflow file exists** | `test -f ./workflows/<name>.ts` | file present |
| **Package lock up‑to‑date** | `npm ci --dry-run` returns 0 | dependencies lock is consistent |

If any gate fails, Ricky **blocks** the run and emits a concise error report (see § 7).

---

## 2️⃣ Scoped Branch / PR Boundary (Op
...[truncated]

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
  • Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because agent-relay is not on PATH when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.
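
One way to encode this route preference is sketched below in TypeScript. `RuntimeProbe` and `planLocalSpawn` are hypothetical, and the caller is assumed to have already probed the Node version, SDK resolution, and PATH. `--experimental-strip-types` is a real Node flag, added in Node 22.6.0; treating 22.6 as the minimum version is this sketch's assumption. The caller passes in the agent-relay fallback command instead of the sketch guessing it. The returned `argv` is the exact command to record in execution evidence.

```typescript
// Hypothetical sketch of choosing the primary local spawn route.
interface RuntimeProbe {
  nodeVersion: string;       // e.g. the value of process.version, "v22.12.0"
  sdkResolvable: boolean;    // could @agent-relay/sdk be resolved from the workflow's directory?
  agentRelayOnPath: boolean; // is the agent-relay binary available?
}

interface SpawnPlan {
  route: "node-strip-types" | "agent-relay-cli" | "blocked";
  argv: string[]; // the exact command to record in execution evidence
  reason: string;
}

// Node added --experimental-strip-types in 22.6.0.
function supportsStripTypes(nodeVersion: string): boolean {
  const parts = nodeVersion.replace(/^v/, "").split(".").map(Number);
  const major = parts[0] ?? 0;
  const minor = parts[1] ?? 0;
  return major > 22 || (major === 22 && minor >= 6);
}

function planLocalSpawn(workflowPath: string, probe: RuntimeProbe, cliFallbackArgv?: string[]): SpawnPlan {
  // Primary route: no agent-relay binary needed when Node and the SDK are usable.
  if (probe.sdkResolvable && supportsStripTypes(probe.nodeVersion)) {
    return {
      route: "node-strip-types",
      argv: ["node", "--experimental-strip-types", workflowPath],
      reason: `Node ${probe.nodeVersion} can strip types and @agent-relay/sdk resolves`,
    };
  }
  if (probe.agentRelayOnPath && cliFallbackArgv) {
    return { route: "agent-relay-cli", argv: cliFallbackArgv, reason: "SDK route unavailable; using agent-relay binary" };
  }
  // Blocked: name the failed prechecks instead of only "agent-relay not on PATH".
  const missing = [
    !supportsStripTypes(probe.nodeVersion) ? `Node >= 22.6 (found ${probe.nodeVersion})` : "",
    !probe.sdkResolvable ? "@agent-relay/sdk" : "",
  ].filter(Boolean);
  return { route: "blocked", argv: [], reason: `Missing: ${missing.join(", ")}` };
}

const plan = planLocalSpawn("workflows/foo.ts", { nodeVersion: "v22.12.0", sdkResolvable: true, agentRelayOnPath: false });
console.log(plan.route, plan.argv.join(" ")); // node-strip-types node --experimental-strip-types workflows/foo.ts
```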

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky’s Boundary‑Handling Protocol for Persistent Structural Failures**

When a structural failure remains after an attempted fix, Ricky follows a deterministic, contract‑driven workflow that isolates the problem, prevents downstream impact, and creates a clear hand‑off for human intervention. The protocol is organized into **four phases**—Detection, Attempted Repair, Verification, and Blocker Escalation—each bounded by explicit contracts and verification gates.

---

## 1. Detection Phase  
| Contract | Description |
|----------|-------------|
| **Failure‑Signal** | An event `StructuralFailureDetected` emitted by the runtime monitor, containing: <br>• `failure_id` (UUID) <br>• `component_path` (dot‑notation) <br>• `error_code` (enum) <br>• `timestamp` |
| **Context Snapshot** | Automatic capture of the affected workflow state (`state_dump.json`) and the last successful commit hash (`base_sha`). |
| **Gate** | `failure_id` must be unique in the current run; duplicate signals are dropped. |

*Action*: Ricky logs the signal, tags the current commit with `failure/<failure_id>`, and creates a **temporary branch** `fix/<failure_id>` off `base_sha`.

---

## 2. Attempted Repair Phase  
...[truncated]

Must

  • Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
  • Distinguish escalation from a generic product failure.
  • Preserve enough context for a human operator to continue.

Must Not

  • Retry speculative fixes indefinitely.
  • Collapse to "something went wrong" without the attempted actions.
  • Discard evidence from failed repair attempts.
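
Below is a hypothetical TypeScript shape for the escalation hand-off. `Escalation`, `escalate`, and `describeForOperator` are illustrative names, not Ricky's types. The `kind` tag keeps an escalation distinct from a generic failure. `escalate` refuses to build an escalation unless at least one repair attempt was recorded and failed validation. It also copies the attempts, so the evidence survives if the caller later mutates its own records.

```typescript
// Hypothetical shapes for an escalation hand-off; not Ricky's real types.
interface RepairAttempt {
  change: string; // what the fix attempted
  validation: { command: string; passed: boolean; output: string };
}

interface Escalation {
  kind: "escalation"; // distinct from a generic product failure
  blocker: string;
  attempts: readonly RepairAttempt[];
  recommendation: string;
}

interface GenericFailure {
  kind: "failure";
  message: string;
}

function escalate(blocker: string, attempts: RepairAttempt[], recommendation: string): Escalation {
  // An escalation must carry evidence of at least one failed repair attempt.
  if (!attempts.some((a) => !a.validation.passed)) {
    throw new Error("escalation requires at least one failed repair attempt");
  }
  // Copy the evidence so later mutation by the caller cannot erase it.
  const copied = attempts.map((a) => ({ ...a, validation: { ...a.validation } }));
  return { kind: "escalation", blocker, attempts: copied, recommendation };
}

function describeForOperator(outcome: Escalation | GenericFailure): string {
  if (outcome.kind === "failure") return `Failure: ${outcome.message}`;
  const lines = [`Escalation: ${outcome.blocker}`];
  outcome.attempts.forEach((a, i) => {
    lines.push(`  attempt ${i + 1}: ${a.change}`);
    lines.push(`    ${a.validation.command} -> ${a.validation.passed ? "passed" : "failed"}`);
  });
  lines.push(`Recommendation: ${outcome.recommendation}`);
  return lines.join("\n");
}

// Illustrative values only.
const esc = escalate(
  "STRUCTURAL_FAILURE",
  [{ change: "regenerated step graph", validation: { command: "npx tsc --noEmit", passed: false, output: "type error in step 3" } }],
  "Hand off to an operator: the step graph needs a manual redesign.",
);
console.log(describeForOperator(esc));
```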

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Summary (Generated 2026‑05‑22)**  

| Metric | Value | Interpretation |
|--------|-------|----------------|
| **Total Runs (last 30 days)** | 1 842 | Baseline volume |
| **Successful Runs** | 1 679 (91.2 %) | Within target ≥ 90 % |
| **Failed Runs** | 163 (8.8 %) | Above the 5 % alert threshold |
| **Runs with Blockers** | 27 (1.5 %) | Immediate attention required |
| **Average Runtime** | 3 m 42 s | Within SLA (≤ 5 m) |
| **95‑th‑Percentile Runtime** | 6 m 13 s | Slightly over SLA – investigate outliers |
| **Deterministic Verification Pass Rate** | 98.3 % | Good, but 1.7 % of runs hit non‑deterministic gates |
| **Review Artifact Completion** | 94 % | 6 % of PRs missing required review artifacts |
| **Branch/PR Scope Violations** | 12 (0.7 %) | Minor, but should be closed |

---

### 1. Key Findings

| Area | Observation | Impact | Recommended Action |
|------|-------------|--------|--------------------|
| **Failure Spike (Days 12‑14)** | 42 % of all failures occurred on 2026‑04‑28 to 2026‑04‑30. | Elevated error‑rate (15 % of daily runs). | Review the change set merged on 2026‑04‑27; run a deterministic verification gate on the affected branches.
...[truncated]

Must

  • Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
  • Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
  • Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

  • Mutate evidence while analyzing it.
  • Return generic advice like "improve your workflow" without references.
  • Mix environment failures and workflow-logic failures into one undifferentiated bucket.
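
A minimal sketch of the aggregation step follows. It assumes a simplified `WorkflowRunEvidence` shape; the field names are guesses, not the real type. The sketch does three things:

- It reads the evidence without writing to it.
- It counts failure classes in separate `environment`, `workflow-logic`, and `unclassified` buckets.
- It computes a nearest-rank 95th-percentile duration per workflow.

Step-size analysis is left out to keep the example short.

```typescript
// Assumed shape; the real WorkflowRunEvidence type may differ.
interface WorkflowRunEvidence {
  readonly workflow: string;
  readonly status: "passed" | "failed";
  readonly failureClass?: string; // e.g. "UNSUPPORTED_RUNTIME"
  readonly failureOrigin?: "environment" | "workflow-logic";
  readonly durationMs: number;
}

type Origin = "environment" | "workflow-logic" | "unclassified";

interface HealthDigest {
  failures: Record<Origin, Record<string, number>>; // class counts, one bucket per origin
  p95DurationMs: Record<string, number>;            // per workflow, nearest-rank
}

function nearestRankPercentile(sorted: readonly number[], p: number): number {
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

function buildHealthDigest(runs: readonly WorkflowRunEvidence[]): HealthDigest {
  const failures: Record<Origin, Record<string, number>> = {
    environment: {},
    "workflow-logic": {},
    unclassified: {},
  };
  const durations = new Map<string, number[]>();

  for (const run of runs) {
    // Values are copied out; nothing is written back into `run`.
    const list = durations.get(run.workflow) ?? [];
    list.push(run.durationMs);
    durations.set(run.workflow, list);
    if (run.status !== "failed") continue;
    const bucket = failures[run.failureOrigin ?? "unclassified"];
    const cls = run.failureClass ?? "UNCLASSIFIED";
    bucket[cls] = (bucket[cls] ?? 0) + 1;
  }

  const p95DurationMs: Record<string, number> = {};
  durations.forEach((values, workflow) => {
    p95DurationMs[workflow] = nearestRankPercentile([...values].sort((a, b) => a - b), 95);
  });
  return { failures, p95DurationMs };
}

// Illustrative evidence only.
const sampleRuns: WorkflowRunEvidence[] = [
  { workflow: "a", status: "passed", durationMs: 10 },
  { workflow: "a", status: "failed", failureClass: "NODE_MISSING", failureOrigin: "environment", durationMs: 30 },
  { workflow: "a", status: "failed", failureClass: "ASSERTION_FAILED", failureOrigin: "workflow-logic", durationMs: 20 },
];
const health = buildHealthDigest(sampleRuns);
console.log(JSON.stringify(health.failures));
```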

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.preserve-pr-shipping-during-repair (runtime-recovery/manual)

User message

A persona-authored workflow that imports @agent-relay/github-primitive and uses createGitHubStep to open a PR fails its runtime-precheck. The auto-fix repair persona is invoked. Describe the contract the repaired artifact must satisfy.

Ricky output

**Repaired Workflow Contract**

The repaired artifact must be a **single, self‑contained workflow file** (e.g., `open‑pr.yml`) that satisfies every static‑ and runtime‑precheck defined in the AgentWorkforce pipeline. Below is the contract broken into concrete sections that the auto‑fix persona must guarantee before the PR can be merged.

| # | Requirement | How to Verify | Success Criterion |
|---|-------------|---------------|--------------------|
| 1 | **Correct module import** | ‑ Static lint (`eslint`/`tsc`) <br>‑ Runtime pre‑check `importResolver` | The file contains `import { createGitHubStep } from "@agent‑relay/github‑primitive"` and the resolver finds the exact package version listed in `package.json`. |
| 2 | **Valid `createGitHubStep` usage** | ‑ Schema validation against `github‑primitive` step contract <br>‑ Runtime sandbox execution of the step factory | The call returns an object that i

---

_Comment truncated to stay within GitHub limits. Download the `ricky-eval-run` artifact for the full `human-review.md`._
