Goal: after the harness is fully implemented, a user should be able to type a short task such as:
`openjia run "Build a small portfolio website" --llm-backend deepagents`
and receive a runnable implementation with traceable planning, scoped generation, command evidence, evaluation, and repair artifacts under `.harness/`.
This plan is not tied to any single fixture. Older milestones used Todo List as an acceptance fixture because it is small, visual, and testable; the runtime path now uses generic scaffolds and LLM-owned task implementation.
What works now:
- `.env` can provide MiniMax API credentials.
- Planner can call MiniMax and produce `FEATURE_SPEC.json`.
- Planner and Generator can run through DeepAgents SDK with `--llm-backend deepagents`.
- Feature ledger and progress artifacts are written.
- PlanFeasibilityGate, ContractGate, SelfVerifyGate, EvaluationGate, and command logging exist.
- Evaluator can run shell commands and write command evidence.
- Empty simple web tasks can bootstrap a dependency-light generic web runtime.
- Deterministic Generator can create a generic static web shell inside contract scope.
- Generator runs required commands and writes real self-verification logs.
- Browser E2E opens the generated app, performs smoke verification, and runs a generic CRUD interaction probe when matching controls exist.
- Evaluator collects evidence such as `test-results/page-smoke.html` and `test-results/crud-interactions.txt`.
What does not yet work:
- General-purpose DeepAgents generation is now available, but direct tool-use integration is still experimental.
- Repair loop does not persist repeated failure fingerprints or perform root-cause escalation.
- `openjia run` can create and verify simple static web app paths, but does not yet start a persistent dev server or print a final URL.
- User runs: `openjia run "Build a small portfolio website with a projects section and contact call to action" --llm-backend deepagents`
- Initializer creates `.harness/` and detects whether the target directory is empty, Python, JS, TS, Vite, React, or another supported stack.
- Planner creates `FEATURE_SPEC.json`, `ROADMAP.md`, `FEATURE_LEDGER.json`, and `PROGRESS.md`.
- PlanFeasibilityGate accepts only if every feature has testable acceptance criteria.
- SprintSelector selects the smallest sprint, usually `S001`.
- Contract negotiation writes `CONTRACT_PROPOSAL.yaml`, `CONTRACT_REVIEW.md`, and `CONTRACT.yaml`.
- ContextCurator writes `CONTEXT_MANIFEST.yaml`.
- Generator modifies only allowed files and writes code, tests, `GENERATOR_PLAN.md`, `CHANGESET.md`, and `SELF_VERIFY_REPORT.md`.
- SelfVerifyGate blocks if required commands were not run, logs are missing, placeholders remain, or command exits are nonzero.
- Evaluator runs build/test/e2e commands, collects evidence, checks scope, and writes `EVAL_REPORT.json`.
- If failed, EvaluationGate writes `REPAIR_PACKET.md`; Generator repairs; Evaluator reruns.
- If passed, FeatureLedger updates acceptance criteria to `pass` and FinalQA writes a summary.
- CLI prints final status, run command, evidence paths, and failing artifacts if blocked.
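The PlanFeasibilityGate step in the flow above can be illustrated with a minimal sketch. The layout of `FEATURE_SPEC.json` is not specified in this plan, so the keys `features`, `id`, `acceptance_criteria`, and `verification` are assumptions, as is the rule that a criterion counts as testable only when it names how it will be verified. The function name `plan_is_feasible` is hypothetical.

```python
from typing import Any


def plan_is_feasible(feature_spec: dict[str, Any]) -> tuple[bool, list[str]]:
    """Return (accepted, reasons) for an assumed FEATURE_SPEC.json payload."""
    reasons: list[str] = []
    features = feature_spec.get("features", [])
    if not features:
        reasons.append("spec contains no features")
    for feature in features:
        feature_id = feature.get("id", "<unknown>")
        criteria = feature.get("acceptance_criteria", [])
        if not criteria:
            reasons.append(f"{feature_id}: no acceptance criteria")
            continue
        for index, criterion in enumerate(criteria):
            # Assumed rule: a criterion is testable only if it states
            # how it will be verified (a command, a browser check, ...).
            if not str(criterion.get("verification", "")).strip():
                reasons.append(f"{feature_id}: criterion {index} has no verification")
    # The gate accepts only when no reason to reject was collected.
    return (not reasons, reasons)
```

Returning the list of reasons alongside the verdict lets a caller write them into a review artifact rather than only reporting pass or fail.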
Status: mostly complete.
Acceptance:
- `openjia llm-smoke --llm-backend minimax --model MiniMax-M2.7` returns valid JSON.
- `pytest -q` passes.
- `git check-ignore .env` confirms the secret file is ignored.
Status: complete for simple static web tasks.
Acceptance:
- Empty temp directory plus `openjia run "Build a small portfolio website"` produces app files.
- `npm run build` and browser checks are discoverable.
- Existing projects are not overwritten without contract scope.
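Deciding whether to bootstrap a scaffold or leave an existing project alone depends on recognizing what is already in the target directory. The sketch below is one possible heuristic, not the harness's actual detector: the function name `detect_stack`, the ignored directory set, and the choice of marker files (`package.json`, `tsconfig.json`, `pyproject.toml`, `requirements.txt`) are all assumptions.

```python
import json
from pathlib import Path

# Assumption: harness and VCS directories do not make a directory "non-empty".
IGNORED = {".git", ".harness"}


def detect_stack(root: Path) -> str:
    """Classify a target directory using common marker files."""
    entries = [p for p in root.iterdir() if p.name not in IGNORED]
    if not entries:
        return "empty"
    package_json = root / "package.json"
    if package_json.is_file():
        try:
            manifest = json.loads(package_json.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            manifest = {}
        if not isinstance(manifest, dict):
            manifest = {}
        deps = {**manifest.get("dependencies", {}), **manifest.get("devDependencies", {})}
        # Check the most specific markers first.
        if "react" in deps:
            return "react"
        if "vite" in deps:
            return "vite"
        if (root / "tsconfig.json").is_file():
            return "ts"
        return "js"
    if (root / "pyproject.toml").is_file() or (root / "requirements.txt").is_file():
        return "python"
    return "unknown"
```

Only the `"empty"` result would justify writing a fresh scaffold; any other result means files already exist and writes should stay inside contract scope.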
Status: initial interface complete; general file generation is experimental.
Acceptance:
- Generator receives only `CONTRACT.yaml`, `CONTEXT_MANIFEST.yaml`, allowed file contents, and repair packets.
- Writes are applied through guarded filesystem checks.
- Forbidden file changes are blocked.
- `CHANGESET.md` lists actual changed files.
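A guarded write of the kind these criteria describe might look like the sketch below. It assumes the contract scope can be expressed as a list of glob patterns relative to the project root; the names `guarded_write` and `ScopeViolation` are hypothetical and the real `CONTRACT.yaml` format may differ.

```python
from fnmatch import fnmatch
from pathlib import Path


class ScopeViolation(Exception):
    """Raised when a write targets a path outside the contract scope."""


def guarded_write(root: Path, relative_path: str, content: str,
                  allowed_patterns: list[str]) -> Path:
    """Write content only if the path stays in root and matches the scope."""
    root = root.resolve()
    target = (root / relative_path).resolve()
    # Reject paths that escape the project root, for example via "..".
    if not target.is_relative_to(root):
        raise ScopeViolation(f"{relative_path} escapes the project root")
    normalized = target.relative_to(root).as_posix()
    # Note: fnmatch's "*" also matches "/", so "src/*" covers nested files.
    if not any(fnmatch(normalized, pattern) for pattern in allowed_patterns):
        raise ScopeViolation(f"{normalized} is not in contract scope")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return target
```

Collecting the returned paths from each successful call is one way to make sure a change list reflects files that were actually written rather than files the Generator intended to write.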
Status: complete for current flow.
Acceptance:
- SelfVerifyGate fails if Generator does not run required commands.
- SelfVerifyGate passes when required commands produce logs and exit 0.
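The two criteria above reduce to a check over recorded command runs. The following is a minimal sketch under assumed data shapes: `CommandRecord` and `self_verify` are hypothetical names, and the placeholder scan mentioned in the flow is left out to keep the example short.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class CommandRecord:
    command: str
    exit_code: int
    log_path: Optional[Path]


def self_verify(required: list[str], records: list[CommandRecord]) -> list[str]:
    """Return blocking reasons; an empty list means the gate passes."""
    by_command = {record.command: record for record in records}
    problems: list[str] = []
    for command in required:
        record = by_command.get(command)
        if record is None:
            problems.append(f"not run: {command}")
        elif record.log_path is None or not record.log_path.is_file():
            problems.append(f"log missing: {command}")
        elif record.exit_code != 0:
            problems.append(f"exit {record.exit_code}: {command}")
    return problems
```

Checking for the log file on disk, rather than trusting a reported path, is what ties the gate to real command evidence.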
Status: complete for generic smoke checks and common CRUD-style interactions.
Acceptance:
- Evaluator catches a UI that builds but cannot satisfy required interaction evidence.
- Evaluator catches missing persistence when required.
- Evidence includes screenshot, command output, and log index paths.
Status: pending.
Acceptance:
- Same failed acceptance criterion twice triggers RCA.
- Max attempts mark the sprint blocked with `BLOCKER_REPORT.md`.
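Since this milestone is pending, the following is only a sketch of how the two acceptance criteria could be decided. It assumes each repair attempt is recorded as a set of failed acceptance criterion IDs and uses the ID itself as the failure fingerprint; the function name `next_repair_action`, the action labels, and the default of three attempts are hypothetical.

```python
from collections import Counter


def next_repair_action(attempts: list[set[str]], max_attempts: int = 3) -> str:
    """Decide the next step from per-attempt sets of failed criterion IDs.

    Returns "pass", "repair", "rca", or "blocked".
    """
    if not attempts:
        return "repair"  # nothing evaluated yet
    latest = attempts[-1]
    if not latest:
        return "pass"
    if len(attempts) >= max_attempts:
        return "blocked"  # the caller would write BLOCKER_REPORT.md here
    # Count how often each criterion has failed across all attempts so far.
    counts = Counter(cid for failed in attempts for cid in failed)
    if any(counts[cid] >= 2 for cid in latest):
        return "rca"  # same criterion failed at least twice
    return "repair"
```

For example, `next_repair_action([{"AC1"}, {"AC1", "AC2"}])` returns `"rca"` because `AC1` failed in both attempts. Persisting the `attempts` list between runs is what would let the escalation survive a restart.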
Status: pending.
Acceptance:
- After a passing run, user can read the final command and URL from CLI output or `FINAL_REPORT.md`.
Fixture command:
`openjia run "Build a small portfolio website with a projects section and contact call to action" --llm-backend minimax`
DeepAgents runtime fixture:
`openjia run "Build a small portfolio website with a projects section and contact call to action" --llm-backend deepagents --model MiniMax-M2.7`
Expected features:
- Render the requested page content
- Include project cards or sections
- Include a contact call to action
- Remain usable on desktop and mobile
Expected verification:
- `npm run build`
- unit/static test
- browser smoke test
- generated acceptance test when available
- console log check
- screenshot evidence
Success condition:
- `EVAL_REPORT.json.overall_status == "pass"`
- all feature ledger acceptance criteria are `pass`
- final report contains app run instructions
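The first two success conditions can be checked mechanically. The sketch below reads `overall_status` from the evaluation report as stated above; the ledger layout (`features`, `acceptance_criteria`, `status`) is an assumption, and `run_succeeded` is a hypothetical helper. The third condition, run instructions in the final report, is not covered here.

```python
import json
from pathlib import Path


def run_succeeded(eval_report_path: Path, ledger_path: Path) -> bool:
    """Check the report status and every ledger acceptance criterion."""
    report = json.loads(eval_report_path.read_text(encoding="utf-8"))
    if report.get("overall_status") != "pass":
        return False
    ledger = json.loads(ledger_path.read_text(encoding="utf-8"))
    # Assumed ledger shape: features -> acceptance_criteria -> status.
    statuses = [
        criterion.get("status")
        for feature in ledger.get("features", [])
        for criterion in feature.get("acceptance_criteria", [])
    ]
    # An empty ledger is treated as a failure rather than a vacuous pass.
    return bool(statuses) and all(status == "pass" for status in statuses)
```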