Relay — Roadmap

Product Direction

Vision: Co-pilot assistant — every fix is a proposal you review. Quality and control over speed and autonomy.

Near-term goal: Open source the project once fundamentals are solid. No rush on new integrations — get the existing pipeline trustworthy and visible first.


Phase 1 — Visibility (Current Priority)

The pipeline is a black box between "issue picked up" and "Telegram message arrives." Goal: make every step observable.

1.1 Structured Failure Reports

Right now failed is just a status with no context. Each failure mode should surface a specific diagnosis.

Failure modes to distinguish:

  • Triage timeout (exit 143) → "Triage timed out after 2m — try increasing triageTimeout"
  • Triage parse error → "Claude returned malformed JSON — see raw output with relay logs <id>"
  • Fix timeout → "Fix timed out after 5m — issue may be too complex"
  • Tests failed after fix → "Claude committed changes but bun test failed — review diff before retrying"
  • No commits made → "Claude ran but made no file changes — may be unfixable or needs more context"
  • Git/worktree error → "Worktree creation failed — check repo path and permissions"
  • PR creation failed → "Branch pushed but gh pr create failed — check gh auth status"

Changes needed:

  • Add failure_reason column to DB
  • fix.ts and triage.ts classify errors before setting status to failed
  • Telegram failure messages show the specific reason + suggested action
  • relay show <id> displays failure_reason
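
A minimal sketch of the classification step, assuming the worker knows the stage, exit code, and a few flags when a run ends. The `RunOutcome` shape, the reason codes, and `classifyFailure` are hypothetical names for illustration; only the failure modes themselves come from the list above. Treating exit 143 as a timeout for the fix stage as well is also an assumption.

```typescript
// Hypothetical reason codes, one per failure mode listed above.
type FailureReason =
  | "triage_timeout"
  | "triage_parse_error"
  | "fix_timeout"
  | "tests_failed"
  | "no_commits"
  | "worktree_error"
  | "pr_create_failed";

// Assumed shape of what fix.ts / triage.ts know when a run ends badly.
interface RunOutcome {
  stage: "triage" | "fix" | "pr";
  exitCode: number | null;
  parseError?: boolean;
  worktreeError?: boolean;
  commitCount?: number;
  testsPassed?: boolean;
}

const SIGTERM_EXIT = 143; // 128 + SIGTERM (15): the process was killed, e.g. by a timeout

function classifyFailure(o: RunOutcome): FailureReason | "unknown" {
  if (o.worktreeError) return "worktree_error";
  if (o.stage === "triage") {
    if (o.exitCode === SIGTERM_EXIT) return "triage_timeout";
    if (o.parseError) return "triage_parse_error";
  }
  if (o.stage === "fix") {
    if (o.exitCode === SIGTERM_EXIT) return "fix_timeout";
    if (o.commitCount === 0) return "no_commits";
    if (o.testsPassed === false) return "tests_failed";
  }
  if (o.stage === "pr") return "pr_create_failed";
  return "unknown"; // never guess: an unclassified failure should stay visible as such
}
```

The returned code is what would be written to the `failure_reason` column; the human-readable hint can then be looked up from the code at display time.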

1.2 relay logs <id> Command

Pipe the raw Claude stdout (NDJSON stream) for a specific issue run to the terminal. Useful for debugging bad fixes or understanding what Claude did.

  • Store Claude's raw output to a temp file per issue run (e.g., /tmp/relay-<id>.ndjson)
  • relay logs <id> tails or cats that file
  • relay logs <id> --follow follows live during an active fix
  • Optional: --pretty flag to render as readable text (strip stream-json envelope)
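
The `--pretty` rendering could be sketched as below. The envelope shape (assistant events whose `message.content` is an array of `text` and `tool_use` blocks) is an assumption about the stream format, and `prettyLines` is a hypothetical helper name.

```typescript
// Assumed envelope (not confirmed by this roadmap): one JSON object per line,
// where assistant events carry message.content as an array of blocks.
interface ContentBlock {
  type?: string;
  text?: string;
  name?: string;
  input?: unknown;
}

function prettyLines(ndjson: string): string[] {
  const out: string[] = [];
  for (const line of ndjson.split("\n")) {
    if (line.trim() === "") continue;
    let event: unknown;
    try {
      event = JSON.parse(line);
    } catch {
      out.push(`[unparseable] ${line}`); // keep bad lines visible for debugging
      continue;
    }
    const content = (event as { message?: { content?: unknown } } | null)?.message?.content;
    if (!Array.isArray(content)) continue; // no content blocks on this event; skip it
    for (const block of content as ContentBlock[]) {
      if (block?.type === "text" && typeof block.text === "string") out.push(block.text);
      if (block?.type === "tool_use") {
        out.push(`[tool] ${block.name ?? "?"} ${JSON.stringify(block.input ?? {})}`);
      }
    }
  }
  return out;
}
```

Unparseable lines are printed rather than dropped, since a truncated or malformed line is often exactly what the person running `relay logs` is looking for.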

1.3 Streaming Telegram Progress

Currently Telegram shows "🔧 Fixing..." and updates every 3s with generic text. Show what Claude is actually doing:

  • Which file it's currently editing (tool_use events already stream)
  • How many tool calls so far
  • Running cost estimate

Example message evolution:

🔧 Fixing... (0s)
🔧 Fixing... reading src/worker/fix.ts, src/db.ts (12s · 8 tool calls)
🔧 Fixing... editing src/db.ts (34s · 15 tool calls · ~$0.04)

The onEvent callback in claude.ts already fires tool_use events — wire them into the Telegram progress edit.
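
One way to wire this up is to keep a small progress state that each `tool_use` event updates and that the periodic Telegram edit renders. This is a sketch only: `ProgressState`, `recordToolUse`, `formatProgress`, and the tool names in `VERBS` are illustrative assumptions, not the existing code.

```typescript
interface ProgressState {
  startedAtMs: number;
  toolCalls: number;
  activity?: string; // e.g. "editing src/db.ts"
  costUsd?: number;
}

// Assumed tool names; adjust to whatever the tool_use events actually carry.
const VERBS: Record<string, string> = { Read: "reading", Edit: "editing", Write: "editing" };

// Returns a new state with the call counted and the current activity updated.
function recordToolUse(state: ProgressState, tool: string, filePath?: string): ProgressState {
  const verb = VERBS[tool];
  const activity = verb && filePath ? `${verb} ${filePath}` : `running ${tool}`;
  return { ...state, toolCalls: state.toolCalls + 1, activity };
}

// Renders one progress line in the style of the example messages above.
function formatProgress(state: ProgressState, nowMs: number): string {
  const secs = Math.floor((nowMs - state.startedAtMs) / 1000);
  const parts = [`${secs}s`];
  if (state.toolCalls > 0) {
    parts.push(`${state.toolCalls} tool call${state.toolCalls === 1 ? "" : "s"}`);
  }
  if (state.costUsd !== undefined) parts.push(`~$${state.costUsd.toFixed(2)}`);
  const activity = state.activity ? ` ${state.activity}` : "";
  return `🔧 Fixing...${activity} (${parts.join(" · ")})`;
}
```

Keeping the formatter pure (state and clock passed in) means the same state can feed other consumers later without re-parsing events.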

1.4 Local Web Dashboard

A bun-served page at localhost:7842 showing live pipeline state.

Views:

  • Queue — active, pending, recent issues with status badges and severity
  • Issue detail — triage verdict + plan, Claude output log, diff viewer, cost breakdown, failure reason
  • Timeline — chronological feed of all pipeline events

Tech:

  • Bun.serve() with WebSocket for live updates (daemon pushes events)
  • No frameworks — plain HTML + minimal JS (open source friendliness, no build step)
  • Daemon exposes a local HTTP+WS server alongside the poll loop
  • Auth: none (localhost only)

Commands:

  • relay start --ui — start daemon with dashboard enabled
  • Or always-on, configured via "ui": { "enabled": true, "port": 7842 } in config

Phase 1.5 — Web UI as Primary Interface

The web dashboard currently shows issue state but has no interactivity. Telegram is required for every workspace. Goal: make the web UI a fully self-sufficient control plane, with Telegram (and future Slack) as optional notification connectors.

Why Before Phase 2

Phase 2 features (confidence gating, test-gate, PR feedback loop) need a proper UI to display detailed data — confidence charts, test output, review comment threads. Building on the Telegram-only model would mean cramming rich data into chat messages. The web UI is the right surface for this, so it should be feature-complete first.

1.5.1 Make Telegram Optional

Right now telegram.botToken and telegram.chatId are required fields. Change this:

  • Make telegram optional in WorkspaceConfig (remove validation errors when missing)
  • Guard every ws.telegram.* call with if (ws.telegram) in daemon.ts
  • Dashboard starts unconditionally (already does)
  • Without Telegram, the web UI becomes the sole control interface

Config change:

// Before (required):
{ "key": "acme", "telegram": { "botToken": "...", "chatId": "..." }, "projects": [...] }

// After (optional):
{ "key": "acme", "projects": [...] }
// or with Telegram:
{ "key": "acme", "telegram": { "botToken": "...", "chatId": "..." }, "projects": [...] }
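
A sketch of what the relaxed validation might look like, assuming validation returns a list of error strings. `validateWorkspace` and the error wording are hypothetical. The one design choice worth noting: a missing `telegram` block is fine, but a half-filled one is still an error, so a typo does not silently disable notifications.

```typescript
interface TelegramConfig {
  botToken: string;
  chatId: string;
}

interface WorkspaceConfig {
  key: string;
  telegram?: TelegramConfig; // optional after this change
  projects: unknown[];
}

function validateWorkspace(raw: Record<string, unknown>): string[] {
  const errors: string[] = [];
  if (typeof raw.key !== "string" || raw.key === "") errors.push("key is required");
  if (!Array.isArray(raw.projects)) errors.push("projects must be an array");

  // Absent telegram is valid; a partially configured one is not.
  if (raw.telegram !== undefined) {
    const t = raw.telegram as Partial<TelegramConfig> | null;
    if (!t?.botToken) errors.push("telegram.botToken is required when telegram is set");
    if (!t?.chatId) errors.push("telegram.chatId is required when telegram is set");
  }
  return errors;
}
```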

1.5.2 Web UI Action Buttons

Add interactive controls to the issue detail view matching every Telegram callback action:

  • Start — visible when pending_confirmation
  • Skip — visible when pending_confirmation
  • Accept — visible when pending_approval
  • Discard — visible when pending_approval
  • Retry — visible when failed

Backend:

  • POST /api/issues/:id/actions with { "action": "fix" | "accept" | "discard" | "skip" | "retry" }
  • Reuse existing handleCallback() logic — extract it into a shared handler that both Telegram and the web API call
  • Return updated issue row in response

Frontend:

  • Buttons render conditionally based on issue.status
  • Optimistic UI update on click, roll back on error
  • Buttons disable while action is in flight
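
The shared handler needs a single place that decides whether an action is legal for the issue's current status. A minimal sketch, assuming the Start button sends the `"fix"` action; `checkAction` and the HTTP status choices (400 and 409) are illustrative, not existing code.

```typescript
type Action = "fix" | "accept" | "discard" | "skip" | "retry";

// Status in which each action is allowed, taken from the button list above.
const ALLOWED_IN: Record<Action, string> = {
  fix: "pending_confirmation", // assumed to be what the "Start" button sends
  skip: "pending_confirmation",
  accept: "pending_approval",
  discard: "pending_approval",
  retry: "failed",
};

function isAction(value: unknown): value is Action {
  return typeof value === "string" && Object.prototype.hasOwnProperty.call(ALLOWED_IN, value);
}

type ActionCheck =
  | { ok: true; action: Action }
  | { ok: false; httpStatus: number; error: string };

function checkAction(status: string, body: { action?: unknown }): ActionCheck {
  if (!isAction(body.action)) {
    return { ok: false, httpStatus: 400, error: "unknown action" };
  }
  if (ALLOWED_IN[body.action] !== status) {
    return { ok: false, httpStatus: 409, error: `cannot ${body.action} an issue in status ${status}` };
  }
  return { ok: true, action: body.action };
}
```

Running the same check for Telegram callbacks and for `POST /api/issues/:id/actions` also covers the case where a stale button is clicked in one interface after the issue has already moved on in the other.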

1.5.3 Streaming Progress in Web UI

Push real-time triage/fix progress to connected WebSocket clients:

  • New WS event: { type: "progress", issueId, stage, tool, toolCallCount, elapsedMs, costUsd }
  • Dashboard shows a live progress bar/indicator in the issue detail view
  • Reuse the same onEvent callback pipeline — add a second consumer alongside Telegram's createStreamReporter
  • Show tool name, call count, elapsed time, running cost — same info as Telegram but richer (can show a tool call timeline)
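
Adding a second consumer can be as simple as fanning one callback out to several. The sketch below assumes the event fields listed above; the field types, the `Consumer` alias, and `fanOut` are hypothetical.

```typescript
// Field names from the WS event above; the types are assumptions.
interface ProgressEvent {
  type: "progress";
  issueId: string;
  stage: "triage" | "fix";
  tool: string;
  toolCallCount: number;
  elapsedMs: number;
  costUsd: number;
}

type Consumer = (ev: ProgressEvent) => void;

// Combines several consumers into one callback. A failing consumer
// (e.g. a Telegram API error) must not stop the others from receiving events.
function fanOut(
  consumers: Consumer[],
  onError: (err: unknown) => void = (err) => console.error("progress consumer failed:", err),
): Consumer {
  return (ev) => {
    for (const consume of consumers) {
      try {
        consume(ev);
      } catch (err) {
        onError(err);
      }
    }
  };
}
```

In use, the Telegram reporter and a WebSocket broadcaster would both be passed to `fanOut`, and the result handed to the pipeline as its single progress callback.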

1.5.4 Triage Result View

When an issue is in pending_confirmation, show:

  • Verdict badge (actionable/not actionable) with confidence percentage
  • Triage reasoning (expandable)
  • Implementation plan as a numbered checklist
  • Estimated file scope (files mentioned in plan)
  • Start Fix / Skip buttons below

1.5.5 Approval View

When an issue is in pending_approval, show:

  • Fix summary
  • Diff viewer with syntax highlighting (line-level add/remove coloring — already partially exists)
  • Token usage and cost breakdown (input_tokens, output_tokens, cost_usd)
  • Duration
  • Accept / Discard buttons

1.5.6 Failure View

When an issue is failed, show:

  • Failure reason code as a badge
  • Human-readable hint (reuse FAILURE_HINTS from telegram.ts — move to a shared module)
  • Link to raw logs (relay logs <id> equivalent, or inline log viewer)
  • Retry button

1.5.7 Integrations Status Page

New panel in the dashboard showing what's connected per workspace:

  • Endpoint: GET /api/config/status — returns sanitized workspace/project/source info (no secrets)
  • For each workspace: name, notification channels (Telegram connected/not, future: Slack)
  • For each project: name, repo path, base branch, active sources (Sentry, Asana) with connection status
  • Source health check: last successful poll time, error count, rate limit remaining (from adapter state)
  • Render as a settings/config panel accessible from the dashboard header

Example response:

{
  "workspaces": [
    {
      "key": "acme",
      "notifications": { "telegram": true, "slack": false },
      "projects": [
        {
          "key": "acme-web",
          "repoPath": "/Users/dev/acme-web",
          "baseBranch": "main",
          "sources": {
            "sentry": { "connected": true, "org": "acme", "project": "web-app" },
            "asana": { "connected": false }
          }
        }
      ]
    }
  ]
}
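
Producing that response is safest as an allow-list: copy only the fields known to be harmless instead of deleting known secrets. A sketch under assumed config shapes (`authToken`, `accessToken`, and `toStatus` are hypothetical names). Note that `connected` here only reflects whether a source is configured; a real health check would come from adapter state.

```typescript
// Assumed config shapes; the secret field names are illustrative.
interface SentrySource { org: string; project: string; authToken: string }
interface AsanaSource { accessToken: string }

interface ProjectConfig {
  key: string;
  repoPath: string;
  baseBranch: string;
  sources?: { sentry?: SentrySource; asana?: AsanaSource };
}

interface WorkspaceConfig {
  key: string;
  telegram?: { botToken: string; chatId: string };
  projects: ProjectConfig[];
}

// Builds the status payload field by field so new secrets never leak by default.
function toStatus(workspaces: WorkspaceConfig[]) {
  return {
    workspaces: workspaces.map((ws) => ({
      key: ws.key,
      notifications: { telegram: ws.telegram !== undefined, slack: false },
      projects: ws.projects.map((p) => ({
        key: p.key,
        repoPath: p.repoPath,
        baseBranch: p.baseBranch,
        sources: {
          sentry: p.sources?.sentry
            ? { connected: true, org: p.sources.sentry.org, project: p.sources.sentry.project }
            : { connected: false },
          asana: { connected: p.sources?.asana !== undefined },
        },
      })),
    })),
  };
}
```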

1.5.8 Browser Notifications

Opt-in desktop notifications via the Notification API:

  • Trigger on key status transitions: triage complete, fix ready for approval, fix failed
  • User enables via a toggle in the dashboard header (stored in localStorage)
  • No server-side changes needed — dashboard JS listens for WS events and shows notifications

1.5.9 Preserve CLI Debugging

All existing CLI tools remain untouched:

  • relay logs <id> with --follow, --pretty, --stage
  • relay show <id> with failure reason display
  • relay status / relay list
  • Web UI log viewer is additive — it reads the same NDJSON files, not a replacement for CLI access

Phase 2 — Fix Quality

Make the co-pilot trustworthy. Every fix should either be clearly correct or clearly explain why it isn't.

2.1 Confidence Gating

Triage already returns a confidence score (0–1). Use it:

  • confidence >= 0.8 → proceed normally to pending_confirmation
  • 0.5 <= confidence < 0.8 → send Telegram warning: "Low confidence (62%) — review triage plan carefully before starting"
  • confidence < 0.5 → don't offer Start at all. Show plan with a "Force Start" button that requires explicit override.
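
The three bands reduce to a small pure function. This sketch uses the per-project keys `confidenceThreshold` and `minConfidence` as the two cut-offs; the `Gate` type and `gateConfidence` are hypothetical names, and treating a missing or non-numeric score as "block" is an assumption.

```typescript
interface ConfidenceConfig {
  confidenceThreshold: number; // below this: warn
  minConfidence: number;       // below this: block (override required)
}

type Gate = "proceed" | "warn" | "block";

function gateConfidence(confidence: number, cfg: ConfidenceConfig): Gate {
  // Assumption: an unusable score is treated as the most cautious outcome.
  if (!Number.isFinite(confidence)) return "block";
  if (confidence < cfg.minConfidence) return "block";
  if (confidence < cfg.confidenceThreshold) return "warn";
  return "proceed";
}
```

With cut-offs of 0.8 and 0.5, a score of 0.62 lands in the warning band, matching the "Low confidence (62%)" example above.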

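Configurable per project (the values shown are placeholder guesses and do not yet match the 0.8 / 0.5 cut-offs above; see Open Questions):

"confidenceThreshold": 0.6,   // below this: warn
"minConfidence": 0.4           // below this: block (Force Start only)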

2.2 Pre-Fix Plan Confirmation

Currently the triage result card shows the plan, but the "Start Fix" button is always visible.

Enhancement: make the triage card more prominent as a review checkpoint:

  • Format the plan as a numbered step list in Telegram
  • Highlight estimated scope: "Touches: src/db.ts, src/worker/fix.ts"
  • Add an "Edit plan" button that lets you append instructions before Claude starts

2.3 Test-Gate on Accept

Don't offer the Accept button if tests are failing in the worktree.

  • After fix.ts completes, run testCommand and store result in DB
  • If tests fail: send Telegram card with failure output, offer "Retry fix" button instead of Accept
  • If tests pass: offer Accept / Discard as normal, show test result summary ("✅ 90 tests passed")
  • Accept button is disabled (or absent) on test failure — never let a broken fix reach PR
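
The gate itself is a small decision on the stored test result. A sketch, assuming the result is stored as an exit code plus a summary line; `TestResult` and `approvalCard` are hypothetical, and the case where no `testCommand` is configured is deliberately left out because it is still an open question.

```typescript
interface TestResult {
  exitCode: number;
  summary: string; // e.g. "90 tests passed"
}

interface ApprovalCard {
  headline: string;
  buttons: string[];
}

// Accept is only ever offered when the test command exited cleanly.
function approvalCard(test: TestResult): ApprovalCard {
  if (test.exitCode === 0) {
    return { headline: `✅ ${test.summary}`, buttons: ["Accept", "Discard"] };
  }
  return { headline: `❌ Tests failed: ${test.summary}`, buttons: ["Retry fix", "Discard"] };
}
```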

2.4 PR Review Feedback Loop

When a reviewer comments on a Relay PR, feed those comments back to Claude to address. Full cycle: fix → PR → review → re-fix.

Flow:

  1. After Accept, store the PR URL in DB
  2. Poll GitHub for review comments on Relay PRs (filter by relay/fix-* branch prefix)
  3. When review comments appear: create a new pipeline item linked to the original issue
  4. Claude resumes the session with review comments as additional context
  5. New commit pushed, PR updated, Telegram notification with diff of changes

Challenges:

  • Avoid re-polling PRs that already got a re-fix (track review_addressed_at in DB)
  • Rate limit GitHub polling (separate interval from issue polling)
  • Session resume may not work if worktree was cleaned up — need to recreate it from the branch
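
The first challenge, not re-processing comments that were already addressed, can be sketched as a filter over polled comments. The `TrackedPr` and `ReviewComment` shapes and the `pendingComments` helper are assumptions; timestamps are taken to be ISO 8601 strings.

```typescript
interface TrackedPr {
  url: string;
  branch: string;
  reviewAddressedAt: string | null; // ISO timestamp of the last re-fix, if any
}

interface ReviewComment {
  body: string;
  createdAt: string; // ISO timestamp
}

const RELAY_BRANCH_PREFIX = "relay/fix-";

// Returns only the comments that still need a re-fix for this PR.
function pendingComments(pr: TrackedPr, comments: ReviewComment[]): ReviewComment[] {
  if (!pr.branch.startsWith(RELAY_BRANCH_PREFIX)) return []; // not a Relay PR
  const cutoff = pr.reviewAddressedAt ? Date.parse(pr.reviewAddressedAt) : -Infinity;
  return comments.filter((c) => Date.parse(c.createdAt) > cutoff);
}
```

A new pipeline item would be created only when this returns a non-empty list, and `review_addressed_at` updated once the re-fix is pushed.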

Phase 3 — Open Source Readiness

Do this when Phase 1+2 are solid. Goal: someone can find Relay, install it, and have it running in under 10 minutes.

3.1 Distribution

Multiple install paths to minimise friction:

  • bunx relay — zero-install try without committing to a global install
  • bun build --compile — single self-contained binary, no Bun runtime required
  • brew install adriandmitroca/tap/relay — Homebrew formula for macOS
  • docker-compose.yml — containerised deployment (mounts repo dirs as volumes)
  • systemd unit file for Linux (parallel to the existing macOS LaunchAgent script)

3.2 Multi-Directory Projects

Many real projects are monorepos or have separate repos per layer (e.g., api/ in one repo, frontend/ in another). Support this in project config:

{
  "key": "acme-web",
  "dirs": [
    { "path": "/Users/dev/acme/api", "baseBranch": "main" },
    { "path": "/Users/dev/acme/frontend", "baseBranch": "main" }
  ]
}

  • Claude gets context from all configured directories
  • Triage and fix run with access to the full multi-dir scope
  • Commits and PRs go to the correct repo per changed file
  • Worktrees created per repo that has changes

3.3 Better Onboarding

  • relay init wizard must handle every edge case gracefully (missing gh auth, bad bot token, etc.)
  • Preflight checks with actionable error messages for every dependency
  • relay doctor command — validates entire setup and reports what's broken

3.4 Plugin Architecture

Allow sources and notification channels to be added without modifying core:

  • Sources: npm packages implementing SourceAdapter
  • Notification channels: npm packages implementing a NotificationChannel interface
  • Config: "plugins": ["@relay/linear", "@relay/slack"]

3.5 Slack Notification Channel

Telegram is niche — Slack is where most dev teams operate. Implement as the second notification channel (and first plugin) to validate the plugin architecture.

3.6 Releases & Updates

Release automation:

  • Conventional commits → semantic-release on every merge to main
  • Auto-bumps version in package.json, generates CHANGELOG.md, creates GitHub Release with release notes
  • CI attaches compiled binaries to the GitHub Release as assets (macOS arm64, macOS x64, Linux x64)
  • Homebrew formula auto-updated via a homebrew-tap repo triggered by the release

Update handling per distribution path:

| Method | How updates work |
|---|---|
| bunx relay | Always pulls latest; no action needed |
| Binary | relay update checks GitHub Releases API, downloads new binary, replaces itself |
| Homebrew | brew upgrade relay; formula points to latest release tarball |
| Docker | User pulls new image tag; latest tag always points to newest release |
| npm global | npm install -g relay@latest |

relay update command:

  • Checks https://api.github.com/repos/adriandmitroca/relay/releases/latest against current version
  • If newer: downloads the appropriate binary for the current platform, verifies checksum, replaces the running binary
  • If already latest: prints current version and exits
  • On start: silent background check, prints one-line notice if a new version is available ("Relay v1.2.0 is available — run relay update")
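
The "is there a newer version" check is a plain semantic-version comparison on the release tag. A sketch, assuming tags look like `v1.2.0`; `parseVersion` and `isNewer` are hypothetical helpers, and pre-release suffixes are ignored here.

```typescript
type Version = [number, number, number];

// Accepts "1.2.0" or "v1.2.0"; anything after the patch number is ignored.
function parseVersion(v: string): Version | null {
  const m = /^v?(\d+)\.(\d+)\.(\d+)/.exec(v.trim());
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

// True only when `latest` is strictly newer than `current`.
// Unparseable input returns false so a bad tag never triggers a download.
function isNewer(latest: string, current: string): boolean {
  const a = parseVersion(latest);
  const b = parseVersion(current);
  if (!a || !b) return false;
  const [aMajor, aMinor, aPatch] = a;
  const [bMajor, bMinor, bPatch] = b;
  if (aMajor !== bMajor) return aMajor > bMajor;
  if (aMinor !== bMinor) return aMinor > bMinor;
  return aPatch > bPatch;
}
```

Comparing numerically matters: a string comparison would rank 1.9.0 above 1.10.0.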

3.7 X.com Launch Announcement

Public launch post positioning Relay as an autonomous dev agent, not just a bug-fixer:

  • Lead with the demo (GIF of dashboard + Telegram callback → merged PR)
  • Core message: "your issues, automatically triaged, fixed, and PR'd — you just review"
  • Cover: sources (Sentry, Asana, Linear, Jira), full web dashboard, optional Telegram
  • Thread format: hook → demo → how it works → how to install → what's next

Open Questions

  • Dashboard port: 7842 or make it configurable from day one?
  • PR feedback loop polling: Use GitHub webhooks (needs tunnel) or poll (simpler, slower)?
  • Confidence defaults: What thresholds feel right? (0.6 / 0.4 are guesses)
  • Log retention: How long to keep raw Claude output? Delete after N days or keep forever?
  • Test-gate strictness: What if there's no testCommand configured? Skip gate or warn?
  • Plugin system timing: Build it before open source, or ship v1 with built-in sources only?
  • Web UI auth: Localhost-only is fine for now, but should we add basic auth or token for remote/Docker setups?
  • Notification channel abstraction: Extract NotificationChannel interface now (Phase 1.5) or defer to plugin architecture (Phase 3)?
  • Dashboard log viewer: Stream NDJSON over WS in real-time, or load-on-demand from temp files via REST?

What's Not on the Roadmap (Intentionally)

  • New source integrations (Linear, Jira, GitHub Issues) — good defaults and solid fundamentals first
  • Auto-accept / full autonomy — co-pilot vision means human stays in the loop
  • Hosted/SaaS version — self-hosted first, community second
  • Webhook mode — polling is fine for now; add later if latency becomes a real problem
  • Issue batching — premature optimisation until volume is actually a problem
  • Mobile-responsive dashboard — desktop-first for now, mobile can come later
  • Multi-user / RBAC — single-user tool for now; auth and roles are a Phase 3+ concern