sd0x-dev-flow

The harness layer for Claude Code.

Quality gates the AI cannot skip. An AI Agent Harness Engineering reference implementation for Claude Code, with hook-enforced dual review, state-machine gates that survive context compaction, and safety mechanisms that fail closed at critical points.

96 bundled · 96 public skills · 15 agents, using only ~4% of Claude's context window

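What this harness does

Harness engineering is the discipline of engineering everything around an LLM (tool loops, context management, hooks, state machines, safety layers) rather than training the model itself. Mitchell Hashimoto coined the term in February 2026; Anthropic engineering and Martin Fowler have both published on it; arXiv 2603.05344 gives a formal definition.

sd0x-dev-flow is a reference implementation. Each row of the table below maps a typical harness sub-problem to concrete code you can study: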

#	Harness sub-problem	sd0x-dev-flow implementation	Code evidence
1	Tool loop control	Auto-loop `/codex-review-fast` → `/precommit`, with sentinel-driven state transitions	`rules/auto-loop.md` + `hooks/post-tool-review-state.sh`
2	Sentinel-driven state machine	Gate markers `✅ Ready` / `⛔ Blocked` / `✅ All Pass` are parsed into persistent state	`scripts/emit-review-gate.sh` (producer) + `hooks/post-tool-review-state.sh` (parser)
3	Context recovery across compaction	Injects `[AUTO_LOOP_RESUME]` via stdout after SessionStart(compact)	`hooks/post-compact-auto-loop.sh`
4	Lifecycle interceptors	5 hook event types dispatched to 8 scripts: PreToolUse / PostToolUse / Stop / SessionStart / UserPromptSubmit	`hooks/` (8 scripts) + `.claude/settings.json`
5	Capability-based tool gating	`allowed-tools` in skill frontmatter; for example, `/ask` has no Edit/Write permission	86 of the 95 public skills declare `allowed-tools`
6	Defense-in-depth safety	5 layers: pre-edit-guard → commit-msg-guard → pre-push-gate → stop-guard → sidecar fail-closed marker	`scripts/pre-push-gate.sh` + `scripts/commit-msg-guard.sh` + `hooks/stop-guard.sh`
7	Generator-evaluator split	Dual review: Codex (primary) + Claude (secondary), dispatched in parallel in every round of the review loop	`rules/codex-invocation.md` + `rules/auto-loop.md` (Dual Review Mode)
8	Incremental progress tracking	`iteration_history.current_round` + `max_rounds` + convergence-stall detection	`rules/auto-loop.md` (exit conditions + strategic reset)
9	Human-in-the-loop safety gates	`/dev/tty` confirmation + `AskUserQuestion` for destructive operations	`scripts/pre-push-gate.sh` + `skills/push-ci/SKILL.md`
10	Self-improvement loop	Correction → record a lesson → promote to a rule after 3 or more occurrences	`rules/self-improvement.md`

Most harness projects cover only 2 to 4 of these. sd0x-dev-flow covers all 10, which makes its code worth studying as learning material, not just using as a tool.
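To make row 2 concrete, here is a minimal Python sketch of a sentinel parser. It is not the project's code (the table points to the shell hook `hooks/post-tool-review-state.sh` as the real parser). Only the three marker strings come from the table; the function name `parse_gate` and the state labels are hypothetical.

```python
from typing import Optional

# Marker strings from the table; the state labels on the right are
# hypothetical names chosen for this sketch.
SENTINELS = {
    "✅ All Pass": "all_pass",
    "✅ Ready": "ready",
    "⛔ Blocked": "blocked",
}


def parse_gate(tool_output: str) -> Optional[str]:
    """Return the state for the last sentinel in the output, or None."""
    last_pos, state = -1, None
    for marker, name in SENTINELS.items():
        pos = tool_output.rfind(marker)
        if pos > last_pos:  # a later marker overrides an earlier one
            last_pos, state = pos, name
    return state
```

Taking the last marker rather than the first means a re-review that ends in `✅ Ready` overrides an earlier `⛔ Blocked` in the same output. Output with no marker yields `None`, which a caller can treat as an incomplete gate.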

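Why sd0x-dev-flow?

Without guardrails	With sd0x-dev-flow
AI skips review when the context gets long	Hook enforcement: stop-guard blocks stopping while a review is incomplete
A single reviewer misses issues	Dual review dispatch: Codex + a secondary reviewer in parallel
"Fixed" but never re-verified	Auto-loop: fix → re-review → pass → continue
Review state lost after compaction	State tracking: the SessionStart hook re-injects it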
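The fix → re-review → pass cycle, together with the `max_rounds` and stall detection mentioned earlier, can be sketched as a small loop. This is an illustrative Python sketch and not the logic in `rules/auto-loop.md`. The callbacks `review` and `fix`, the `stall_limit` parameter, and the return labels are all assumptions.

```python
from typing import Callable, List


def auto_loop(review: Callable[[], int], fix: Callable[[], None],
              max_rounds: int = 5, stall_limit: int = 2) -> str:
    """Run review/fix rounds until clean, stalled, or out of rounds."""
    history: List[int] = []  # open-finding count per round
    for _current_round in range(1, max_rounds + 1):
        open_findings = review()
        history.append(open_findings)
        if open_findings == 0:
            return "pass"
        # Stall: none of the last `stall_limit` rounds improved on the
        # count seen just before them.
        recent = history[-(stall_limit + 1):]
        if len(recent) == stall_limit + 1 and min(recent[1:]) >= recent[0]:
            return "stalled"
        fix()
    return "max_rounds"
```

A review that reports 3, 1, then 0 open findings returns `"pass"`. A review stuck at 2 findings returns `"stalled"` on the third round, so it never burns through every remaining round.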

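Quick start

# Install the plugin
/plugin marketplace add sd0xdev/sd0x-dev-flow
/plugin install sd0x-dev-flow@sd0xdev-marketplace

# Configure the project
/project-setup

A single command auto-detects the framework, package manager, database, entry files, and script commands. It installs a subset of the rules and hooks; the full plugin ships 14 rules and 9 hooks.

Use --lite to configure only CLAUDE.md (skipping rules and hooks).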

How it works

flowchart LR
    P["🎯 Plan"] --> B["🔨 Build"]
    B --> G["🛡️ Gate"]
    G --> S["🚀 Ship"]

    P -.- P1["/codex-brainstorm<br/>/feasibility-study<br/>/tech-spec"]
    B -.- B1["/feature-dev<br/>/bug-fix<br/>/codex-implement"]
    G -.- G1["/codex-review-fast<br/>/precommit<br/>/codex-test-review"]
    S -.- S1["/smart-commit<br/>/push-ci<br/>/create-pr<br/>/pr-review"]

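The Auto-Loop engine runs the quality gates automatically: after a code edit, the review command dispatches two reviewers in parallel (Codex MCP and a secondary reviewer run at the same time). Findings are deduplicated, severity-normalized, and aggregated into a single gate. In strict mode, hooks enforce fail-closed semantics: while the aggregated gate is incomplete, stop-guard prevents the session from stopping. See docs/hooks.md for details.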
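The fail-closed rule can be illustrated with a short Python sketch. It is only a model of the decision, not the contents of `hooks/stop-guard.sh`. The state labels and the function name `may_stop` are hypothetical, and treating an incomplete gate as advisory outside strict mode is an assumption, since the text only specifies the strict case.

```python
from typing import Optional

# Hypothetical state labels for this sketch.
PASSING = {"ready", "all_pass"}


def may_stop(gate_state: Optional[str], strict: bool) -> bool:
    """Return True if the session is allowed to stop."""
    if gate_state in PASSING:
        return True
    if gate_state == "blocked":
        return False
    # Missing or pending gate: fail closed in strict mode.
    # Assumption: outside strict mode an incomplete gate is advisory.
    return not strict
```

The key property is the last branch: in strict mode an unknown state is handled like a blocked one, so a crashed or skipped review cannot pass silently.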

Details: dual-reviewer sequence diagram

sequenceDiagram
    participant D as Developer
    participant C as Claude
    participant X as Codex MCP
    participant T as Secondary Reviewer
    participant H as Hooks

    D->>C: Edit code
    H->>H: Track file change
    C->>H: emit-review-gate PENDING
    par Dual Review
        C->>X: Codex review (sandbox)
    and
        C->>T: Task(code-reviewer)
    end
    X-->>C: Findings (primary)
    T-->>C: Findings (secondary)
    C->>C: Aggregate + dedup + gate
    C->>H: emit-review-gate READY/BLOCKED

    alt Issues found
        C->>C: Fix all issues
        C->>X: --continue threadId
        X-->>C: Re-verify
    end

    C->>C: /precommit (auto)
    C-->>D: ✅ All gates passed

    Note over H: Strict mode: incomplete gate → blocked

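Feature highlight: dual-reviewer architecture

v2.0 dispatches two independent reviewers in parallel. Dual parallel review is the default, with a degraded fallback mode:

Reviewer	Role	Degradation strategy
Codex MCP	Primary (sandbox, full diff)	Falls back to single-reviewer mode when unavailable
Secondary (pr-review-toolkit)	Confidence-scored review	strict-reviewer → single-reviewer mode

Findings are severity-normalized (P0-Nit), deduplicated (file + issue key, ±5 line tolerance), and tagged by source (codex | toolkit | both).

Gate: ✅ Ready or ⛔ Blocked. In strict mode, an incomplete gate counts as blocked.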
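The deduplication rule (same file, same issue key, line numbers within ±5) can be sketched in Python as follows. This is an illustration under assumptions, not the project's aggregator. The `Finding` fields, the `dedup` function, and the intermediate levels in `SEVERITY_ORDER` are hypothetical; the source only gives the endpoints P0 and Nit.

```python
from dataclasses import dataclass
from typing import List

# Assumed scale for this sketch: lower index means more severe.
SEVERITY_ORDER = ["P0", "P1", "P2", "Nit"]


@dataclass
class Finding:
    file: str
    line: int
    issue_key: str
    severity: str  # already normalized to SEVERITY_ORDER
    source: str    # "codex" or "toolkit"


def dedup(findings: List[Finding], tolerance: int = 5) -> List[Finding]:
    """Merge findings on the same file and issue key within `tolerance` lines."""
    merged: List[Finding] = []
    for f in sorted(findings, key=lambda x: (x.file, x.issue_key, x.line)):
        prev = merged[-1] if merged else None
        if (prev and prev.file == f.file and prev.issue_key == f.issue_key
                and abs(prev.line - f.line) <= tolerance):
            if prev.source != f.source:
                prev.source = "both"  # reported by both reviewers
            # Keep the more severe of the two ratings.
            if SEVERITY_ORDER.index(f.severity) < SEVERITY_ORDER.index(prev.severity):
                prev.severity = f.severity
        else:
            merged.append(Finding(f.file, f.line, f.issue_key, f.severity, f.source))
    return merged
```

Two reports of the same issue at lines 10 and 13 collapse into one finding tagged `both`, while a report at line 40 stays separate. Keeping the more severe rating on merge is a design choice of this sketch.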

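How it compares

Capability	sd0x-dev-flow	gstack	Generic prompts
Enforced review gates	Hook + behavior layer	Advisory only	None
Dual reviewers	Codex + secondary (parallel)	Single /review	None
Auto-fix loop	Fix → re-review → pass	Manual	None
Multi-agent research	/deep-research (3 agents)	None	None
Adversarial validation	Nash-equilibrium debate	None	None
Self-improvement	Lesson log + rule promotion	/retro statistics only	None
Cross-tool support	Codex/Cursor/Windsurf	Claude/Codex/Gemini/Cursor	N/A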

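When to use it

Good fit	Less suitable
Individual or small-team projects using Claude Code	Teams that do not use Claude Code at all
Projects that need automated review gates	One-off scripts with no CI
Codex CLI / Cursor / Windsurf users (skills subset)	Projects that need a custom LLM provider
Repositories where quality gates prevent regressions	Repositories with no test infrastructure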

Installation

Codex CLI / other AI agents

# Install individual skills via the Agent Skills standard
npx skills add sd0xdev/sd0x-dev-flow

# Generate AGENTS.md + install hooks (run inside Claude Code)
/codex-setup init

Method	Tools	Coverage
Plugin install	Claude Code	Full (96 bundled skills, hooks, rules, auto-loop)
`npx skills add`	Codex CLI, Cursor, Windsurf, Aider	Skills only (96 public skills)
`/codex-setup init`	Codex CLI	AGENTS.md kernel + git hooks

Requirements: Claude Code 2.1+ | Codex MCP (optional; required by the /codex-* skills; without it, review falls back to single-reviewer mode)
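The fallback behavior can be pictured as choosing a review mode from whichever reviewers are available. The Python sketch below is one reading of the degradation table, not the plugin's actual selection logic. The function `plan_reviewers`, its parameters, and the `"unavailable"` label are hypothetical.

```python
from typing import List, Tuple


def plan_reviewers(codex_ok: bool, toolkit_ok: bool,
                   strict_reviewer_ok: bool = True) -> Tuple[str, List[str]]:
    """Pick the review mode from whichever reviewers are available."""
    reviewers: List[str] = []
    if codex_ok:
        reviewers.append("codex")
    # The secondary slot degrades from pr-review-toolkit to strict-reviewer.
    if toolkit_ok:
        reviewers.append("pr-review-toolkit")
    elif strict_reviewer_ok:
        reviewers.append("strict-reviewer")
    if len(reviewers) == 2:
        return "dual", reviewers
    if len(reviewers) == 1:
        return "single", reviewers
    return "unavailable", reviewers  # assumption: no reviewer at all
```

Without Codex MCP, `plan_reviewers(False, True)` returns `("single", ["pr-review-toolkit"])`, which matches the single-reviewer fallback described above.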

Workflow paths

Workflow	Commands	Gate	Enforcement layer
Feature development	`/feature-dev` → `/verify` → `/codex-review-fast` → `/precommit`	✅/⛔	Hook + behavior layer
Bug fix	`/issue-analyze` → `/bug-fix` → `/verify` → `/precommit`	✅/⛔	Hook + behavior layer
Auto-Loop	Code edit → `/codex-review-fast` → `/precommit`	✅/⛔	Hook
Doc review	`.md` edit → `/codex-review-doc`	✅/⛔	Hook
Planning	`/codex-brainstorm` → `/feasibility-study` → `/tech-spec`	-	-
Onboarding	`/project-setup` → `/repo-intake`	-	-
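Each gated workflow above is an ordered pipeline in which a ⛔ result stops progress. A minimal Python sketch of that idea follows. The `run_step` callback stands in for actually running a command and reading its gate, and it, `run_workflow`, and `FEATURE_DEV` are hypothetical names.

```python
from typing import Callable, List, Optional, Tuple

# Command order taken from the feature-development row of the table.
FEATURE_DEV = ["/feature-dev", "/verify", "/codex-review-fast", "/precommit"]


def run_workflow(steps: List[str],
                 run_step: Callable[[str], bool]) -> Tuple[List[str], Optional[str]]:
    """Run steps in order; return (passed steps, first blocked step or None)."""
    passed: List[str] = []
    for step in steps:
        if not run_step(step):  # False models a blocked gate
            return passed, step
        passed.append(step)
    return passed, None
```

This sketch simply halts at the first blocked step. The real auto-loop, as described earlier, would fix the findings and re-review rather than give up.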

Visualization: workflow diagram

flowchart TD
    subgraph feat ["🔨 Feature Development"]
        F1["/feature-dev"] --> F2["Code + Tests"]
        F2 --> F3["/verify"]
        F3 --> F4["/codex-review-fast"]
        F4 --> F5["/precommit"]
        F5 --> F6["/update-docs"]
    end

    subgraph fix ["🐛 Bug Fix"]
        B1["/issue-analyze"] --> B2["/bug-fix"]
        B2 --> B3["Fix + Regression test"]
        B3 --> B4["/verify"]
        B4 --> B5["/codex-review-fast"]
        B5 --> B6["/precommit"]
    end

    subgraph docs ["📝 Docs Only"]
        D1["Edit .md"] --> D2["/codex-review-doc"]
        D2 --> D3["Done"]
    end

    subgraph plan ["🎯 Planning"]
        P1["/codex-brainstorm"] --> P2["/feasibility-study"]
        P2 --> P3["/tech-spec"]
        P3 --> P4["/codex-architect"]
        P4 --> P5["Implementation ready"]
    end

    subgraph ops ["⚙️ Operations"]
        O1["/project-setup"] --> O2["/repo-intake"]
        O2 --> O3["Develop"]
        O3 --> O4["/project-audit"]
        O3 --> O7["/best-practices"]
        O3 --> O5["/risk-assess"]
        O4 --> O6["/next-step --go"]
        O5 --> O6
        O7 --> O6
    end

Cookbook

Shows how to combine skills in real scenarios, and in what order to run them.

Scenario	Flow	Docs
First day on a new repository	`/project-setup` → `/repo-intake` → `/next-step`	→
Implement a new feature	`/feature-dev` → `/verify` → `/codex-test-review` → `/codex-review-fast` → `/precommit`	→
Address PR review comments	`/load-pr-review` → fix → `/codex-review-fast` → `/push-ci`	→
Pre-merge security check	`/codex-security` → `/dep-audit` → `/risk-assess` → `/pre-pr-audit`	→
Featured combo: validate a direction	`/deep-research` → `/best-practices` → `/feasibility-study` → `/codex-brainstorm`	→
Featured combo: adversarial design	`/codex-brainstorm` (Nash-equilibrium-style debate) → `/codex-architect`	→

All 10 scenarios →

What's included

Category	Count	Examples
Skills	96 public (96 bundled)	`/project-setup`, `/codex-review-fast`, `/verify`, `/smart-commit`, `/deep-research`
Agents	15	strict-reviewer, verify-app, coverage-analyst, architecture-designer
Hooks	9	pre-edit-guard, auto-format, review state tracking, stop guard, namespace hint, post-compact-auto-loop, post-skill-auto-loop, user-prompt-review-guard, session-init
Rules	14	auto-loop, auto-loop-project, codex-invocation, security, testing, git-workflow, self-improvement, context-management
Scripts	13	precommit runner, verify runner, dep audit, namespace hint, skill runner, commit-msg guard, pre-push gate, utils (shared lib), emit-review-gate, build-codex-artifacts, resolve-feature (CLI + shell), feature-resolver, readme-catalog

Minimal context footprint

~4% of Claude's 200k context window, leaving 96% for your code.

Component	Tokens	Share of 200k
Rules (always loaded)	5.1k	2.6%
Skills (loaded on demand)	1.9k	1.0%
Agents	791	0.4%
Total	~8k	~4%

Skills load on demand. An idle skill costs no tokens.
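The table's arithmetic can be checked in a few lines of Python. The token counts are the table's own figures, with "5.1k" and "1.9k" read as 5,100 and 1,900; the names `WINDOW_TOKENS`, `COMPONENTS`, and `share` are just labels for this sketch.

```python
WINDOW_TOKENS = 200_000

# Token counts from the table ("5.1k" and "1.9k" read as 5,100 and 1,900).
COMPONENTS = {"Rules": 5_100, "Skills": 1_900, "Agents": 791}


def share(tokens: int, window: int = WINDOW_TOKENS) -> float:
    """Fraction of the context window used by `tokens`."""
    return tokens / window


total = sum(COMPONENTS.values())  # 7,791 tokens, shown as ~8k in the table
total_share = share(total)        # about 0.039, shown as ~4% in the table
```

The three components sum to 7,791 tokens, or roughly 3.9% of a 200k window, which is what the table rounds to ~8k and ~4%.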

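Skill reference

Skill	When to use
`/project-setup`	First-time project configuration
`/bug-fix`	Fix defects and resolve issues
`/feature-dev`	Implement a new feature end to end
`/smart-commit`	Commit changes in intelligently grouped batches
`/push-ci`	Push code and monitor CI
`/create-pr`	Create a GitHub pull request
`/codex-review-fast`	Quick code review (diff only)
`/codex-review-doc`	Review documentation changes
`/codex-security`	OWASP Top 10 security audit
`/verify`	Run the full verification chain
`/precommit`	Pre-commit quality gate (lint + build + test)
`/precommit-fast`	Quick pre-commit check (lint + test, skips build)
`/codex-brainstorm`	Adversarial brainstorming (Nash equilibrium)
`/tech-spec`	Write a technical specification
`/pr-review`	Pre-merge PR self-review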

All 96 public skills

Development (33)

Skill	Description
`/ask`	Context-aware Q&A that gathers context automatically.
`/bug-fix`	Bug fix workflow.
`/bump-version`	Bump package and plugin version in sync.
`/code-explore`	Pure Claude code investigation.
`/code-investigate`	Dual-perspective code investigation.
`/codex-architect`	Codex architecture consulting.
`/codex-implement`	Implement features via Codex MCP.
`/codex-setup`	Initialize sd0x-dev-flow infrastructure for Codex CLI and other non-Claude agents.
`/create-pr`	Create or update GitHub PR with gh CLI.
`/debug`	Interactive debugging workflow with hypothesis-driven probe loop.
`/deep-explore`	Multi-wave parallel code exploration orchestrator.
`/epic-merge`	Sequentially squash-merge a stacked PR chain into an epic branch.
`/feature-dev`	Feature development workflow.
`/feature-verify`	Feature verification (READ-ONLY, P0-P5).
`/git-investigate`	Git history investigation.
`/git-profile`	Git identity and GPG signing profile manager.
`/install-hooks`	Install plugin hooks into project .claude/ for persistent use without plugin loaded
`/install-rules`	Install plugin rules into project .claude/rules/ for persistent use without plugin loaded
`/install-scripts`	Install plugin runner scripts into project .claude/scripts/ for persistent use without plugin loaded
`/issue-analyze`	GitHub Issue and PR review thread deep analysis with Codex blind verdict.
`/jira`	Jira integration — view issues, generate branches, create tickets, transition status.
`/load-pr-review`	Load GitHub PR review comments into AI session — analyze, triage, plan.
`/merge-prep`	Pre-merge analysis and preparation.
`/next-step`	Change-aware next step advisor.
`/post-dev-test`	Post-development test completion.
`/pr-comment`	Post friendly review comments to a GitHub PR — prepare locally, preview, then submit as atomic review.
`/project-setup`	Project configuration initialization.
`/push-ci`	Push to remote and monitor CI.
`/remind`	Lightweight model correction with context-aware rule loading.
`/repo-intake`	Project initialization inventory (one-time).
`/smart-commit`	Smart batch commit.
`/smart-rebase`	Smart partial rebase for squash-merge repositories.
`/watch-ci`	Monitor GitHub Actions CI runs until completion.

Review (Codex MCP) (14)

Skill	Description	Loop support
`/codex-cli-review`	Code review via Codex CLI with full disk access.	-
`/codex-code-review`	Code review using Codex MCP.	-
`/codex-explain`	Explain complex code via Codex MCP.	-
`/codex-review`	Full second-opinion using Codex MCP (with lint:fix + build).	`--continue <threadId>`
`/codex-review-branch`	Fully automated review of an entire feature branch using Codex MCP	-
`/codex-review-doc`	Review documents using Codex MCP.	`--continue <threadId>`
`/codex-review-fast`	Quick second-opinion using Codex MCP (diff only, no tests).	`--continue <threadId>`
`/codex-security`	OWASP Top 10 security review using Codex MCP.	`--continue <threadId>`
`/codex-test-gen`	Generate unit tests for specified functions using Codex MCP	-
`/codex-test-review`	Review test case sufficiency using Codex MCP, suggest additional edge cases.	`--continue <threadId>`
`/doc-review`	Document review via Codex MCP.	-
`/security-review`	Security review via Codex MCP.	-
`/seek-verdict`	Independent second-opinion verification for any finding.	-
`/test-review`	Test coverage review via Codex MCP.	-

Verification (13)

Skill	Description
`/best-practices`	Industry best practices conformance audit with mandatory adversarial debate.
`/check-coverage`	Comprehensive assessment of Unit / Integration / E2E three-layer test coverage, identify gaps and provide actionable ...
`/dep-audit`	Audit dependency security risks
`/dev-security-audit`	Comprehensive developer workstation security audit — scans for exposed credentials, compromised application data, per...
`/necessity-audit`	Necessity audit for over-designed spec elements.
`/pre-pr-audit`	Pre-PR confidence audit with 5-dimension scoring.
`/precommit`	Pre-commit checks — lint:fix -> build -> test
`/precommit-fast`	Quick pre-commit checks — lint:fix -> test
`/project-audit`	Project health audit with deterministic scoring.
`/risk-assess`	Uncommitted code risk assessment with breaking change detection, blast radius analysis, and scope metrics.
`/test-deep`	Context-aware test orchestration.
`/test-health`	Holistic test coverage measurement.
`/verify`	Verification loop — lint -> typecheck -> unit -> integration -> e2e

Planning (16)

Skill	Description
`/architecture`	Architecture design and documentation.
`/codex-brainstorm`	Adversarial brainstorming via Claude+Codex debate.
`/deep-analyze`	Deep-dive analysis of an initial proposal — research code implementation, produce an actionable roadmap and alternatives
`/deep-research`	Universal multi-source research orchestration.
`/feasibility-study`	Feasibility analysis from first principles.
`/fp-brief`	First-principles briefing from technical documents.
`/post-dev-recap`	Guided post-dev recap wrapper — scope detection + doc generation + Q&A.
`/project-brief`	Convert a technical spec into a PM/CTO-readable executive summary.
`/recap-ask`	Recap-bounded Q&A follow-up over an existing briefing-recap.
`/recap-doc`	Post-development recap document generator with blind-spot detection.
`/req-analyze`	Requirements analysis — problem decomposition, stakeholder scan, requirement structuring.
`/request-tracking`	Request tracking knowledge base.
`/review-spec`	Review technical spec documents from completeness, feasibility, risk, and code consistency perspectives.
`/tech-brief`	Technical briefing for developer sharing.
`/tech-spec`	Tech spec generation and review.
`/ui-first-principles`	First-principles UI/IA reasoning: turns a `<scenario>` + API field set into JTBD analysis, principle-anchored field-p...

Docs & tools (20)

Skill	Description
`/claude-health`	Claude Code config health check + plugin sync.
`/contract-decode`	EVM contract error and calldata decoder.
`/create-request`	Create, update, or scan per-task request tickets for progress tracking.
`/de-ai-flavor`	Remove AI artifacts from documents.
`/doc-refactor`	Refactor documents — simplify without losing information, visualize flows with sequenceDiagram.
`/generate-runner`	Generate a customized precommit runner for any ecosystem.
`/obsidian-cli`	Obsidian vault integration via official CLI.
`/op-session`	Initialize 1Password CLI session for Claude Code.
`/portfolio`	Portfolio system knowledge base.
`/pr-review`	PR self-review — review changes, produce checklist, update rules
`/pr-summary`	List open PRs, filter automation PRs, group by ticket ID, format as Markdown.
`/refactor`	Multi-target refactoring orchestrator.
`/runbook`	Generate/update feature release runbook
`/safe-remove`	Safely remove plugin assets (skill/agent/rule/script/hook) with dependency detection and reference cleanup.
`/sharingan`	Replicate knowledge from any source as sd0x-dev-flow skill definition.
`/simplify`	Wrap-up refactoring — simplify code, eliminate duplication, preserve behavior
`/skill-health-check`	Validate skill quality against routing, progressive loading, and verification criteria.
`/statusline-config`	Customize Claude Code statusline.
`/update-docs`	Research current code state then update corresponding docs, ensuring docs stay in sync with code.
`/zh-tw`	Rewrite the previous reply in Traditional Chinese

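Rules & hooks

14 rules (always-loaded conventions) + 9 hooks (automated guardrails).

Customization: edit auto-loop-project.md to override the auto-loop behavior for your project. Plugin updates will not conflict with it; see the Rule Override Pattern.

For the complete rules, hooks, and environment-variable reference, see docs/rules.md and docs/hooks.md.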

Custom configuration

Run /project-setup to auto-detect and fill in every placeholder, or edit .claude/CLAUDE.md by hand:

Placeholder	Description	Example
`{PROJECT_NAME}`	Project name	my-app
`{FRAMEWORK}`	Framework	MidwayJS 3.x, NestJS, Express
`{CONFIG_FILE}`	Main config file	src/configuration.ts
`{BOOTSTRAP_FILE}`	Bootstrap entry point	bootstrap.js, main.ts
`{DATABASE}`	Database	MongoDB, PostgreSQL
`{TEST_COMMAND}`	Test command	yarn test:unit
`{LINT_FIX_COMMAND}`	Lint auto-fix command	yarn lint:fix
`{BUILD_COMMAND}`	Build command	yarn build
`{TYPECHECK_COMMAND}`	Type-check command	yarn typecheck
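Filling these placeholders amounts to a template substitution. The Python sketch below shows one way to do it by hand. It is not how /project-setup is implemented, and `fill_placeholders` is a hypothetical helper. Only the `{UPPER_SNAKE}` placeholder shape comes from the table.

```python
import re
from typing import Dict

# Matches placeholders shaped like {PROJECT_NAME}.
PLACEHOLDER = re.compile(r"\{([A-Z_]+)\}")


def fill_placeholders(template: str, values: Dict[str, str]) -> str:
    """Replace known placeholders; leave unknown ones visible for review."""
    def repl(match: "re.Match[str]") -> str:
        return values.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(repl, template)
```

For example, `fill_placeholders("# {PROJECT_NAME} ({DATABASE})", {"PROJECT_NAME": "my-app"})` returns `"# my-app ({DATABASE})"`. Leaving unresolved placeholders in place makes missing values easy to spot.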

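Showcase: multi-agent research

Running /deep-research dispatches 2-3 parallel research agents across web sources, the codebase, and community knowledge, combined with claim-registry synthesis and a conditional adversarial debate.

Feature	Details
Agents	2-3 in parallel (web + code + community)
Synthesis	Claim-registry consensus detection
Validation	Conditional /codex-brainstorm debate
Scoring	4-signal completeness model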
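A claim registry with consensus detection can be sketched in Python as follows. This is a guess at the shape of the idea, not the /deep-research implementation. The function names, the `min_agents` threshold, and the use of exact text matching after normalization are all assumptions; real claims would likely need semantic matching. Treating claims without consensus as the trigger for the conditional debate is also an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_registry(reports: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each normalized claim to the set of agents that made it."""
    registry: Dict[str, Set[str]] = defaultdict(set)
    for agent, claims in reports.items():
        for claim in claims:
            registry[claim.strip().lower()].add(agent)
    return dict(registry)


def split_by_consensus(registry: Dict[str, Set[str]],
                       min_agents: int = 2) -> Tuple[List[str], List[str]]:
    """Return (agreed, contested) claims; contested ones may need debate."""
    agreed = sorted(c for c, a in registry.items() if len(a) >= min_agents)
    contested = sorted(c for c, a in registry.items() if len(a) < min_agents)
    return agreed, contested
```

In this model, a claim reported by both the web agent and the code agent lands in `agreed`, while a claim from a single agent lands in `contested` and becomes a candidate for the debate step.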

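Full documentation

Architecture

Command (entry) → Skill (capability) → Agent (environment)

Commands: triggered by the user via /...
Skills: knowledge bases loaded on demand
Agents: isolated sub-agents with specific tools
Hooks: automated guardrails (formatting, review state, stop guard)
Rules: always-on conventions (auto-loaded)

For advanced architecture details (agentic control stack, control-loop theory, sandbox rules), see docs/architecture.md.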

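Contributing

PRs are welcome. Please:

Follow the existing naming convention (kebab-case)
Include When to Use / When NOT to Use in each skill
Add disable-model-invocation: true for dangerous operations
Test with Claude Code before submitting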
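The first two guidelines are mechanical enough to check before opening a PR. The Python sketch below is a hypothetical helper (`check_skill` is not part of the repository). It assumes the skill body is plain text that mentions both section titles literally.

```python
import re
from typing import List

# Lowercase words or digits joined by single hyphens, e.g. "smart-commit".
KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
REQUIRED_SECTIONS = ("When to Use", "When NOT to Use")


def check_skill(name: str, body: str) -> List[str]:
    """Return a list of guideline violations (empty means OK)."""
    problems: List[str] = []
    if not KEBAB_CASE.match(name):
        problems.append("name is not kebab-case")
    for section in REQUIRED_SECTIONS:
        if section not in body:
            problems.append(f"missing section: {section}")
    return problems
```

A name like `SmartCommit` with only a "When to Use" section yields two problems, while `smart-commit` with both sections yields an empty list.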

License

MIT
