End-to-End Execution Plan

Goal: after the harness is fully implemented, a user should be able to type a short task such as:

openjia run "Build a small portfolio website" --llm-backend deepagents

and receive a runnable implementation with traceable planning, scoped generation, command evidence, evaluation, and repair artifacts under .harness/.

This plan is not tied to any single fixture. Older milestones used Todo List as an acceptance fixture because it is small, visual, and testable; the runtime path now uses generic scaffolds and LLM-owned task implementation.

Current Reality

What works now:

.env can provide MiniMax API credentials.
Planner can call MiniMax and produce FEATURE_SPEC.json.
Planner and Generator can run through DeepAgents SDK with --llm-backend deepagents.
Feature ledger and progress artifacts are written.
PlanFeasibilityGate, ContractGate, SelfVerifyGate, EvaluationGate, and command logging exist.
Evaluator can run shell commands and write command evidence.
Empty simple web tasks can bootstrap a dependency-light generic web runtime.
Deterministic Generator can create a generic static web shell inside contract scope.
Generator runs required commands and writes real self-verification logs.
Browser E2E opens the generated app, performs smoke verification, and runs a generic CRUD interaction probe when matching controls exist.
Evaluator collects evidence such as test-results/page-smoke.html and test-results/crud-interactions.txt.

What does not yet work:

DeepAgents generation works for general tasks, but its direct tool-use integration is still experimental.
Repair loop does not persist repeated failure fingerprints or perform root-cause escalation.
openjia run can create and verify simple static web app paths, but does not yet start a persistent dev server or print a final URL.

Target User Flow

User runs:

openjia run "Build a small portfolio website with a projects section and contact call to action" --llm-backend deepagents

Initializer creates .harness/ and detects whether the target directory is empty, Python, JS, TS, Vite, React, or another supported stack.
Planner creates FEATURE_SPEC.json, ROADMAP.md, FEATURE_LEDGER.json, and PROGRESS.md.
PlanFeasibilityGate accepts the plan only if every feature has testable acceptance criteria.
SprintSelector selects the smallest sprint, usually S001.
Contract negotiation writes CONTRACT_PROPOSAL.yaml, CONTRACT_REVIEW.md, and CONTRACT.yaml.
ContextCurator writes CONTEXT_MANIFEST.yaml.
Generator modifies only allowed files and writes code, tests, GENERATOR_PLAN.md, CHANGESET.md, and SELF_VERIFY_REPORT.md.
SelfVerifyGate blocks if required commands were not run, logs are missing, placeholders remain, or command exits are nonzero.
Evaluator runs build/test/e2e commands, collects evidence, checks scope, and writes EVAL_REPORT.json.
If evaluation fails, EvaluationGate writes REPAIR_PACKET.md, Generator repairs, and Evaluator reruns.
If passed, FeatureLedger updates acceptance criteria to pass and FinalQA writes a summary.
CLI prints final status, run command, evidence paths, and failing artifacts if blocked.
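
The PlanFeasibilityGate step in the flow above can be illustrated with a minimal sketch. The FEATURE_SPEC.json schema is not shown in this plan, so the `features`, `acceptance_criteria`, `description`, and `verify` keys below are assumptions made for illustration, as are the function names.

```python
from typing import Any


def infeasible_features(spec: dict[str, Any]) -> list[str]:
    """Return ids of features whose acceptance criteria are missing or untestable.

    Assumed (hypothetical) schema: each feature has an "id" and a list of
    "acceptance_criteria", each with a "description" and a "verify" hint.
    """
    rejected = []
    for feature in spec.get("features", []):
        criteria = feature.get("acceptance_criteria", [])
        # A criterion counts as testable only if it says how to verify it.
        testable = [c for c in criteria if c.get("description") and c.get("verify")]
        if not criteria or len(testable) != len(criteria):
            rejected.append(feature.get("id", "<unknown>"))
    return rejected


def plan_is_feasible(spec: dict[str, Any]) -> bool:
    """Accept only a non-empty plan in which every feature is testable."""
    return bool(spec.get("features")) and not infeasible_features(spec)
```

Returning the list of rejected feature ids, rather than a bare boolean, would let the gate report which features the Planner needs to revise.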

Implementation Milestones

Milestone 1: Safe Configuration and Provider Readiness

Status: mostly complete.

Acceptance:

openjia llm-smoke --llm-backend minimax --model MiniMax-M2.7 returns valid JSON.
pytest -q passes.
git check-ignore .env confirms the secret file is ignored.

Milestone 2: Stack-Aware Project Bootstrap

Status: complete for simple static web tasks.

Acceptance:

Empty temp directory plus openjia run "Build a small portfolio website" produces app files.
npm run build and browser checks are discoverable.
Existing projects are not overwritten without contract scope.
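
Stack detection during bootstrap could rely on marker files. The sketch below is only an illustration: the marker files, the returned labels, and the `detect_stack` name are assumptions, and React detection (which would need to inspect package.json dependencies) is left out.

```python
from pathlib import Path


def detect_stack(target: Path) -> str:
    """Classify a target directory from marker files (hypothetical heuristic)."""
    # Ignore the harness's own directory when deciding whether the target is empty.
    names = {p.name for p in target.iterdir() if p.name != ".harness"}
    if not names:
        return "empty"
    # Check the most specific markers first.
    if any(name.startswith("vite.config.") for name in names):
        return "vite"
    if "tsconfig.json" in names:
        return "ts"
    if "package.json" in names:
        return "js"
    if names & {"pyproject.toml", "setup.py", "requirements.txt"}:
        return "python"
    return "unknown"
```

Only an "empty" result would justify scaffolding new app files; any other result means an existing project is present and writes should stay inside contract scope.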

Milestone 3: LLM Generator Backend

Status: initial interface complete; general file generation is experimental.

Acceptance:

Generator receives only CONTRACT.yaml, CONTEXT_MANIFEST.yaml, allowed file contents, and repair packets.
Writes are applied through guarded filesystem checks.
Forbidden file changes are blocked.
CHANGESET.md lists actual changed files.
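
One way to picture the guarded filesystem check is a write helper that refuses any path outside the contract scope. This is a sketch under assumptions: `guarded_write`, `ScopeViolation`, and the glob-style `allowed` patterns are hypothetical names, not the harness's actual interface.

```python
from fnmatch import fnmatch
from pathlib import Path


class ScopeViolation(Exception):
    """Raised when a write targets a file outside the allowed scope."""


def guarded_write(root: Path, rel_path: str, content: str, allowed: list[str]) -> Path:
    """Write content to rel_path only if it stays in root and matches a pattern."""
    root = root.resolve()
    target = (root / rel_path).resolve()
    # Reject paths that escape the project root, such as "../secrets".
    if not target.is_relative_to(root):
        raise ScopeViolation(f"path escapes project root: {rel_path}")
    rel = target.relative_to(root).as_posix()
    # fnmatch's "*" also matches "/", so "src/*" covers nested files.
    if not any(fnmatch(rel, pattern) for pattern in allowed):
        raise ScopeViolation(f"path not in contract scope: {rel}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return target
```

Collecting the paths returned by successful calls would give a record of actual changed files for CHANGESET.md.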

Milestone 4: Command-Running Self Verification

Status: complete for current flow.

Acceptance:

SelfVerifyGate fails if Generator does not run required commands.
SelfVerifyGate passes when required commands produce logs and exit 0.
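
The two acceptance conditions above can be expressed as a small check over command records. The `CommandRecord` shape and the `self_verify_failures` function are assumptions made for this sketch, and the placeholder scan mentioned in the user flow is omitted.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class CommandRecord:
    """Hypothetical record of one command the Generator ran."""
    command: str
    exit_code: int
    log_path: Path


def self_verify_failures(required: list[str], records: list[CommandRecord]) -> list[str]:
    """Return reasons the gate should block; an empty list means pass."""
    by_command = {record.command: record for record in records}
    failures = []
    for command in required:
        record = by_command.get(command)
        if record is None:
            failures.append(f"not run: {command}")
        elif not record.log_path.is_file():
            failures.append(f"missing log: {command}")
        elif record.exit_code != 0:
            failures.append(f"exit {record.exit_code}: {command}")
    return failures
```

The gate passes only when the returned list is empty, which matches the rule that every required command must have both a log and exit code 0.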

Milestone 5: Browser Evaluator for Web Apps

Status: complete for generic smoke checks and common CRUD-style interactions.

Acceptance:

Evaluator catches a UI that builds but cannot satisfy required interaction evidence.
Evaluator catches missing persistence when required.
Evidence includes screenshot, command output, and log index paths.

Milestone 6: Repair Loop That Learns

Status: pending.

Acceptance:

The same acceptance criterion failing twice triggers root-cause analysis (RCA).
Reaching the maximum number of attempts marks the sprint blocked and writes BLOCKER_REPORT.md.
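
Since this milestone is pending, the following is only a sketch of how persisted failure fingerprints and escalation might fit together. The JSON store layout, the `MAX_ATTEMPTS` value, and the returned action names are all assumptions.

```python
import hashlib
import json
from pathlib import Path

MAX_ATTEMPTS = 3  # hypothetical limit; the plan does not specify a number


def fingerprint(failure_summary: str) -> str:
    """Hash a failure message after normalizing case and whitespace."""
    normalized = " ".join(failure_summary.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def record_failure(store: Path, criterion_id: str, failure_summary: str) -> str:
    """Persist one failure and return the next action: repair, rca, or blocked."""
    if store.exists():
        history = json.loads(store.read_text(encoding="utf-8"))
    else:
        history = {"attempts": 0, "criteria": {}}
    history["attempts"] += 1
    seen = history["criteria"].setdefault(criterion_id, [])
    seen.append(fingerprint(failure_summary))
    store.write_text(json.dumps(history, indent=2), encoding="utf-8")
    if history["attempts"] >= MAX_ATTEMPTS:
        return "blocked"  # the caller would write BLOCKER_REPORT.md
    if len(seen) >= 2:
        return "rca"  # the same criterion has now failed twice
    return "repair"
```

Storing the fingerprints per criterion keeps enough history to tell a repeated identical failure from a criterion that keeps failing for different reasons.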

Milestone 7: Final App Run Report

Status: pending.

Acceptance:

After a passing run, user can read the final command and URL from CLI output or FINAL_REPORT.md.

First End-to-End Acceptance Fixture

Fixture command:

openjia run "Build a small portfolio website with a projects section and contact call to action" --llm-backend minimax

DeepAgents runtime fixture:

openjia run "Build a small portfolio website with a projects section and contact call to action" --llm-backend deepagents --model MiniMax-M2.7

Expected features:

Render the requested page content
Include project cards or sections
Include a contact call to action
Remain usable on desktop and mobile

Expected verification:

npm run build
unit/static test
browser smoke test
generated acceptance test when available
console log check
screenshot evidence

Success condition:

EVAL_REPORT.json.overall_status == "pass"
All feature ledger acceptance criteria are marked pass.
final report contains app run instructions
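
The first two success conditions could be checked mechanically, for example as below. The `overall_status` field comes from the condition above; the location of both files directly under the harness directory and the ledger's `features`, `acceptance_criteria`, and `status` keys are assumptions for illustration.

```python
import json
from pathlib import Path


def run_succeeded(harness_dir: Path) -> bool:
    """Check the evaluation report and feature ledger for a fully passing run."""
    report = json.loads((harness_dir / "EVAL_REPORT.json").read_text(encoding="utf-8"))
    if report.get("overall_status") != "pass":
        return False
    ledger = json.loads((harness_dir / "FEATURE_LEDGER.json").read_text(encoding="utf-8"))
    # Assumed ledger layout: features -> acceptance_criteria -> status.
    criteria = [
        criterion
        for feature in ledger.get("features", [])
        for criterion in feature.get("acceptance_criteria", [])
    ]
    # An empty ledger is not treated as success.
    return bool(criteria) and all(c.get("status") == "pass" for c in criteria)
```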
