sd0x-dev-flow


Language: English | 繁體中文 | 简体中文 | 日本語 | 한국어 | Español

A harness layer for Claude Code.

Quality gates that AI cannot skip. An AI Agent Harness Engineering reference implementation for Claude Code, featuring hook-enforced dual review, state-machine gates that survive context compaction, and safety mechanisms that fail closed at critical points.

96 bundled · 96 public skills · 15 agents, using only ~4% of Claude's context window

License: MIT npm

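What this harness does

Harness engineering is the discipline of engineering everything around the LLM (tool loops, context management, hooks, state machines, safety layers) rather than training the model itself. Mitchell Hashimoto coined the term in February 2026; Anthropic engineering and Martin Fowler have both published on it; arXiv 2603.05344 gives it a formal definition.

sd0x-dev-flow is a reference implementation. Each row of the table below maps a typical harness sub-problem to concrete code you can study: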

| # | Harness sub-problem | sd0x-dev-flow implementation | Code evidence |
|---|---|---|---|
| 1 | Tool loop control | Auto-loop over `/codex-review-fast` → `/precommit`, with sentinel-driven state transitions | `rules/auto-loop.md` + `hooks/post-tool-review-state.sh` |
| 2 | Sentinel-driven state machine | The gate markers `✅ Ready` / `⛔ Blocked` / `✅ All Pass` are parsed into persistent state | `scripts/emit-review-gate.sh` (producer) + `hooks/post-tool-review-state.sh` (parser) |
| 3 | Context recovery across compaction | Injects `[AUTO_LOOP_RESUME]` via stdout after `SessionStart(compact)` | `hooks/post-compact-auto-loop.sh` |
| 4 | Lifecycle interceptors | 5 hook event types dispatched to 8 scripts: PreToolUse / PostToolUse / Stop / SessionStart / UserPromptSubmit | `hooks/` (8 scripts) + `.claude/settings.json` |
| 5 | Capability-based tool gating | `allowed-tools` in skill frontmatter; for example, `/ask` has no Edit/Write permission | 86 of 95 public skills declare `allowed-tools` |
| 6 | Defense-in-depth safety | 5 layers of protection: pre-edit-guard → commit-msg-guard → pre-push-gate → stop-guard → sidecar fail-closed marker | `scripts/pre-push-gate.sh` + `scripts/commit-msg-guard.sh` + `hooks/stop-guard.sh` |
| 7 | Generator-evaluator split | Dual review: Codex (primary) + Claude (secondary), dispatched in parallel in every review round | `rules/codex-invocation.md` + `rules/auto-loop.md` (Dual Review Mode) |
| 8 | Incremental progress tracking | `iteration_history.current_round` + `max_rounds` + convergence-stall detection | `rules/auto-loop.md` (exit conditions + strategic reset) |
| 9 | Human-in-the-loop safety gates | `/dev/tty` confirmation + AskUserQuestion for destructive operations | `scripts/pre-push-gate.sh` + `skills/push-ci/SKILL.md` |
| 10 | Self-improvement loop | Correction → record a lesson → promote to a rule after 3 or more occurrences | `rules/self-improvement.md` |

Most harness projects cover only 2 to 4 of these. sd0x-dev-flow covers all 10, which makes its code not just a tool but study material worth reading.
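To make the sentinel-driven state machine (row 2) concrete, here is a minimal Python sketch. It assumes only that the three gate markers listed in the table map onto distinct states; the `GateState` enum and `parse_gate` function are hypothetical names for illustration, and the project's actual producer and parser are the shell scripts cited in the table.

```python
from enum import Enum


class GateState(Enum):
    PENDING = "pending"
    READY = "ready"
    BLOCKED = "blocked"
    ALL_PASS = "all_pass"


# Marker strings are taken from the table; the marker-to-state mapping is an assumption.
SENTINELS = [
    ("⛔ Blocked", GateState.BLOCKED),
    ("✅ All Pass", GateState.ALL_PASS),
    ("✅ Ready", GateState.READY),
]


def parse_gate(output: str, current: GateState = GateState.PENDING) -> GateState:
    """Return the state implied by the last sentinel found in `output`.

    If no sentinel is present, the current state is kept unchanged.
    """
    best_pos, state = -1, current
    for marker, candidate in SENTINELS:
        pos = output.rfind(marker)  # -1 when the marker is absent
        if pos > best_pos:
            best_pos, state = pos, candidate
    return state
```

Taking the last marker rather than the first means a later verdict overrides an earlier one in the same output, which seems a reasonable choice for a review that is re-run after fixes.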

Why sd0x-dev-flow?

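| Without guardrails | With sd0x-dev-flow |
|---|---|
| AI skips review when the context gets long | Hook enforcement: stop-guard blocks unfinished reviews |
| A single reviewer misses issues | Dual-review dispatch: Codex + secondary reviewer in parallel |
| "Fixed" but never re-verified | Auto-loop: fix → re-review → pass → continue |
| Review state lost after compaction | State tracking: the SessionStart hook re-injects it |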
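The fix → re-review → pass cycle, bounded by a round limit and stall detection, can be sketched roughly as follows. This is an illustrative Python outline, not the plugin's implementation: `auto_loop`, `stall_limit`, and the return labels are hypothetical, and `review` and `fix` stand in for whatever actually runs the reviewers and applies fixes. Only the `max_rounds` name comes from this README.

```python
from typing import Callable, List, Tuple


def auto_loop(
    review: Callable[[], List[str]],
    fix: Callable[[List[str]], None],
    max_rounds: int = 5,
    stall_limit: int = 2,
) -> Tuple[str, List[int]]:
    """Repeat review and fix until clean, stalled, or out of rounds."""
    history: List[int] = []  # issue count per round
    best = None              # lowest issue count seen so far
    stalled = 0              # consecutive rounds without improvement
    for _current_round in range(1, max_rounds + 1):
        issues = review()
        history.append(len(issues))
        if not issues:
            return "pass", history
        if best is not None and len(issues) >= best:
            stalled += 1
            if stalled >= stall_limit:
                return "stalled", history  # convergence stall: stop looping
        else:
            best, stalled = len(issues), 0
        fix(issues)
    return "max_rounds", history
```

The loop never reports success on the strength of a fix alone; only a fresh review with zero issues yields `"pass"`.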

Quick Start

# Install the plugin
/plugin marketplace add sd0xdev/sd0x-dev-flow
/plugin install sd0x-dev-flow@sd0xdev-marketplace

# Configure the project
/project-setup

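One command auto-detects the framework, package manager, database, entry files, and script commands. It installs a subset of the rules and hooks; the full plugin includes 14 rules + 9 hooks.

Use --lite to configure only CLAUDE.md (skipping rules/hooks).

How it works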

flowchart LR
    P["🎯 Plan"] --> B["🔨 Build"]
    B --> G["🛡️ Gate"]
    G --> S["🚀 Ship"]

    P -.- P1["/codex-brainstorm<br/>/feasibility-study<br/>/tech-spec"]
    B -.- B1["/feature-dev<br/>/bug-fix<br/>/codex-implement"]
    G -.- G1["/codex-review-fast<br/>/precommit<br/>/codex-test-review"]
    S -.- S1["/smart-commit<br/>/push-ci<br/>/create-pr<br/>/pr-review"]

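The Auto-Loop engine enforces the quality gates automatically: after a code edit, the review command dispatches two reviewers in parallel (Codex MCP and a secondary reviewer run concurrently). Findings are deduplicated, severity-normalized, and aggregated into a single gate. In strict mode, hooks enforce fail-closed semantics: while the aggregated gate is incomplete, stop-guard blocks stopping. See docs/hooks.md for details.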
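The fail-closed rule can be illustrated with a short decision function. This is a sketch under stated assumptions: the gate values `PENDING`, `READY`, and `BLOCKED` are the ones that appear in the sequence diagram, while `may_stop` is a hypothetical name, and the non-strict behavior shown here (an incomplete gate is tolerated, an explicit block is not) is an assumption rather than documented behavior.

```python
from typing import Optional


def may_stop(gate: Optional[str], strict: bool) -> bool:
    """Decide whether the session is allowed to stop.

    `gate` is the last recorded gate value, or None if no gate was recorded.
    """
    if gate == "READY":
        return True   # explicit pass
    if gate == "BLOCKED":
        return False  # explicit failure (assumed to block in both modes)
    # Missing, PENDING, or unrecognized: the gate is incomplete.
    # Fail-closed means strict mode treats "unknown" the same as "blocked".
    return not strict
```

The key property is that, in strict mode, only a positive signal opens the gate; a missing or unreadable state never does.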

Details: dual-reviewer sequence diagram
sequenceDiagram
    participant D as Developer
    participant C as Claude
    participant X as Codex MCP
    participant T as Secondary Reviewer
    participant H as Hooks

    D->>C: Edit code
    H->>H: Track file change
    C->>H: emit-review-gate PENDING
    par Dual Review
        C->>X: Codex review (sandbox)
    and
        C->>T: Task(code-reviewer)
    end
    X-->>C: Findings (primary)
    T-->>C: Findings (secondary)
    C->>C: Aggregate + dedup + gate
    C->>H: emit-review-gate READY/BLOCKED

    alt Issues found
        C->>C: Fix all issues
        C->>X: --continue threadId
        X-->>C: Re-verify
    end

    C->>C: /precommit (auto)
    C-->>D: ✅ All gates passed

    Note over H: Strict mode: incomplete gate → blocked

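Feature highlight: dual-reviewer architecture

v2.0 dispatches two independent reviewers in parallel. Dual-reviewer parallel review is the default, with a degraded fallback mode:

| Reviewer | Role | Fallback strategy |
|---|---|---|
| Codex MCP | Primary (sandbox, full diff) | Falls back to single-reviewer mode when unavailable |
| Secondary (pr-review-toolkit) | Confidence-scored review | strict-reviewer → single-reviewer mode |

Findings are severity-normalized (P0 to Nit), deduplicated (by file + issue key, with a ±5-line tolerance), and tagged by source (codex | toolkit | both).

Gate: ✅ Ready / ⛔ Blocked. In strict mode, an incomplete gate = blocked.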
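The aggregation step can be sketched as below. This is a hypothetical Python illustration of the dedup rule described above (same file, same issue key, lines within ±5), not the project's code: the `Finding` fields, the `merge` function, and the severity alias table are all assumptions, since the README gives only the P0 to Nit range.

```python
from dataclasses import dataclass
from typing import Iterable, List

LINE_TOLERANCE = 5  # the "±5 lines" tolerance from the text

# Hypothetical alias table; the real severity vocabulary is not given here.
SEVERITY_ALIASES = {"critical": "P0", "high": "P1", "medium": "P2", "low": "Nit"}


@dataclass
class Finding:
    file: str
    issue_key: str
    line: int
    severity: str
    source: str  # "codex" or "toolkit"


def normalize(severity: str) -> str:
    """Map a reviewer-specific label onto the shared scale, if known."""
    return SEVERITY_ALIASES.get(severity.lower(), severity)


def merge(findings: Iterable[Finding]) -> List[Finding]:
    """Normalize severities, then fold near-duplicate findings together."""
    merged: List[Finding] = []
    for f in findings:
        f = Finding(f.file, f.issue_key, f.line, normalize(f.severity), f.source)
        for m in merged:
            same_issue = m.file == f.file and m.issue_key == f.issue_key
            if same_issue and abs(m.line - f.line) <= LINE_TOLERANCE:
                if m.source != f.source:
                    m.source = "both"  # reported by both reviewers
                break  # duplicate: keep the first occurrence
        else:
            merged.append(f)
    return merged
```

This sketch keeps the severity of the first occurrence; a real aggregator might instead keep the more severe of the two.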

How it compares

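Capability sd0x-dev-flow gstack Generic prompts
Enforced review gates Hook + behavior layer Suggestion only
Dual reviewers Codex + secondary (parallel) Single /review
Auto-fix loop Fix → re-review → pass Manual
Multi-agent research /deep-research (3 agents)
Adversarial validation Nash-equilibrium debate
Self-improvement Lesson logging + rule promotion /retro statistics only
Cross-tool support Codex/Cursor/Windsurf Claude/Codex/Gemini/Cursor N/A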

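When to use it

| Good fit | Less suitable |
|---|---|
| Individual or small-team projects that use Claude Code | Teams that do not use Claude Code at all |
| Projects that need automated review gates | One-off scripts without CI |
| Codex CLI / Cursor / Windsurf users (skills subset) | Projects that need a custom LLM provider |
| Repositories where quality gates prevent regressions | Repositories without test infrastructure |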

Installation

Codex CLI / other AI agents

# Install individual skills via the Agent Skills standard
npx skills add sd0xdev/sd0x-dev-flow

# Generate AGENTS.md + install hooks (run inside Claude Code)
/codex-setup init
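
| Method | Supported tools | Coverage |
|---|---|---|
| Plugin install | Claude Code | Full (96 bundled skills, hooks, rules, auto-loop) |
| `npx skills add` | Codex CLI, Cursor, Windsurf, Aider | Skills only (96 public skills) |
| `/codex-setup init` | Codex CLI | AGENTS.md kernel + git hooks |

Requirements: Claude Code 2.1+ | Codex MCP (optional; required by the /codex-* skills; falls back to single-reviewer mode when not installed)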

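Workflow paths

| Workflow | Commands | Gate | Enforcement |
|---|---|---|---|
| Feature development | `/feature-dev` → `/verify` → `/codex-review-fast` → `/precommit` | ✅/⛔ | Hook + behavior layer |
| Bug fix | `/issue-analyze` → `/bug-fix` → `/verify` → `/precommit` | ✅/⛔ | Hook + behavior layer |
| Auto-Loop | Code edit → `/codex-review-fast` → `/precommit` | ✅/⛔ | Hook |
| Doc review | `.md` edit → `/codex-review-doc` | ✅/⛔ | Hook |
| Planning | `/codex-brainstorm` → `/feasibility-study` → `/tech-spec` | | |
| Onboarding | `/project-setup` → `/repo-intake` | | |

Visualization: workflow diagram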
flowchart TD
    subgraph feat ["🔨 Feature Development"]
        F1["/feature-dev"] --> F2["Code + Tests"]
        F2 --> F3["/verify"]
        F3 --> F4["/codex-review-fast"]
        F4 --> F5["/precommit"]
        F5 --> F6["/update-docs"]
    end

    subgraph fix ["🐛 Bug Fix"]
        B1["/issue-analyze"] --> B2["/bug-fix"]
        B2 --> B3["Fix + Regression test"]
        B3 --> B4["/verify"]
        B4 --> B5["/codex-review-fast"]
        B5 --> B6["/precommit"]
    end

    subgraph docs ["📝 Docs Only"]
        D1["Edit .md"] --> D2["/codex-review-doc"]
        D2 --> D3["Done"]
    end

    subgraph plan ["🎯 Planning"]
        P1["/codex-brainstorm"] --> P2["/feasibility-study"]
        P2 --> P3["/tech-spec"]
        P3 --> P4["/codex-architect"]
        P4 --> P5["Implementation ready"]
    end

    subgraph ops ["⚙️ Operations"]
        O1["/project-setup"] --> O2["/repo-intake"]
        O2 --> O3["Develop"]
        O3 --> O4["/project-audit"]
        O3 --> O7["/best-practices"]
        O3 --> O5["/risk-assess"]
        O4 --> O6["/next-step --go"]
        O5 --> O6
        O7 --> O6
    end

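Practical guide (Cookbook)

Shows how to combine the skills, and in what order, in real-world scenarios.

| Scenario | Flow | Docs |
|---|---|---|
| First day in a new repository | `/project-setup` → `/repo-intake` → `/next-step` | |
| Implementing a new feature | `/feature-dev` → `/verify` → `/codex-test-review` → `/codex-review-fast` → `/precommit` | |
| Handling PR review comments | `/load-pr-review` → fix → `/codex-review-fast` → `/push-ci` | |
| Pre-merge security check | `/codex-security` → `/dep-audit` → `/risk-assess` → `/pre-pr-audit` | |
| Featured combo: validating a direction | `/deep-research` → `/best-practices` → `/feasibility-study` → `/codex-brainstorm` | |
| Featured combo: adversarial design | `/codex-brainstorm` (Nash-equilibrium-style debate) → `/codex-architect` | |

All 10 scenarios →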

What's included

| Category | Count | Examples |
|---|---|---|
| Skills | 96 public (96 bundled) | /project-setup, /codex-review-fast, /verify, /smart-commit, /deep-research |
| Agents | 15 | strict-reviewer, verify-app, coverage-analyst, architecture-designer |
| Hooks | 9 | pre-edit-guard, auto-format, review state tracking, stop guard, namespace hint, post-compact-auto-loop, post-skill-auto-loop, user-prompt-review-guard, session-init |
| Rules | 14 | auto-loop, auto-loop-project, codex-invocation, security, testing, git-workflow, self-improvement, context-management |
| Scripts | 13 | precommit runner, verify runner, dep audit, namespace hint, skill runner, commit-msg guard, pre-push gate, utils (shared lib), emit-review-gate, build-codex-artifacts, resolve-feature (CLI + shell), feature-resolver, readme-catalog |

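Minimal context footprint

~4% of Claude's 200k context window, leaving 96% for your code.

| Component | Tokens | Share of 200k |
|---|---|---|
| Rules (always loaded) | 5.1k | 2.6% |
| Skills (loaded on demand) | 1.9k | 1.0% |
| Agents | 791 | 0.4% |
| Total | ~8k | ~4% |

Skills load on demand. Idle skills consume no tokens.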
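As a quick check of the arithmetic, the total can be recomputed from the rounded per-component figures in the table. The snippet below is only a worked calculation; the variable names are illustrative, and because the inputs are the rounded table values, per-row percentages may differ slightly from the published ones.

```python
WINDOW_TOKENS = 200_000

# Token counts as listed in the table (5.1k, 1.9k, 791).
COMPONENT_TOKENS = {"rules": 5_100, "skills": 1_900, "agents": 791}


def share_percent(tokens: int, window: int = WINDOW_TOKENS) -> float:
    """Percentage of the context window used by `tokens`."""
    return 100 * tokens / window


total_tokens = sum(COMPONENT_TOKENS.values())
total_percent = share_percent(total_tokens)
print(f"total: {total_tokens} tokens, {total_percent:.1f}% of the window")
# prints "total: 7791 tokens, 3.9% of the window"
```

That is consistent with the "~8k" and "~4%" totals stated above.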

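Skill reference

| Skill | When to use |
|---|---|
| `/project-setup` | First-time project configuration |
| `/bug-fix` | Fixing bugs and resolving issues |
| `/feature-dev` | Implementing a new feature end to end |
| `/smart-commit` | Committing changes in smart groups |
| `/push-ci` | Pushing code and monitoring CI |
| `/create-pr` | Creating a GitHub Pull Request |
| `/codex-review-fast` | Quick code review (diff only) |
| `/codex-review-doc` | Reviewing documentation changes |
| `/codex-security` | OWASP Top 10 security audit |
| `/verify` | Running the full verification chain |
| `/precommit` | Pre-commit quality gate (lint + build + test) |
| `/precommit-fast` | Quick pre-commit check (lint + test, skips build) |
| `/codex-brainstorm` | Adversarial brainstorming (Nash equilibrium) |
| `/tech-spec` | Writing a technical specification |
| `/pr-review` | Pre-merge PR self-review |

All 96 public skills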

Development (33)

| Skill | Description |
|---|---|
| `/ask` | Context-aware Q&A that gathers context automatically. |
| `/bug-fix` | Bug fix workflow. |
| `/bump-version` | Bump package and plugin version in sync. |
| `/code-explore` | Pure Claude code investigation. |
| `/code-investigate` | Dual-perspective code investigation. |
| `/codex-architect` | Codex architecture consulting. |
| `/codex-implement` | Implement features via Codex MCP. |
| `/codex-setup` | Initialize sd0x-dev-flow infrastructure for Codex CLI and other non-Claude agents. |
| `/create-pr` | Create or update GitHub PR with gh CLI. |
| `/debug` | Interactive debugging workflow with hypothesis-driven probe loop. |
| `/deep-explore` | Multi-wave parallel code exploration orchestrator. |
| `/epic-merge` | Sequentially squash-merge a stacked PR chain into the epic branch. |
| `/feature-dev` | Feature development workflow. |
| `/feature-verify` | Feature verification (READ-ONLY, P0-P5). |
| `/git-investigate` | Git history investigation. |
| `/git-profile` | Git identity and GPG signing profile manager. |
| `/install-hooks` | Install plugin hooks into project .claude/ for persistent use without plugin loaded |
| `/install-rules` | Install plugin rules into project .claude/rules/ for persistent use without plugin loaded |
| `/install-scripts` | Install plugin runner scripts into project .claude/scripts/ for persistent use without plugin loaded |
| `/issue-analyze` | GitHub Issue and PR review thread deep analysis with Codex blind verdict. |
| `/jira` | Jira integration: view issues, generate branches, create tickets, transition status. |
| `/load-pr-review` | Load GitHub PR review comments into AI session: analyze, triage, plan. |
| `/merge-prep` | Pre-merge analysis and preparation. |
| `/next-step` | Change-aware next step advisor. |
| `/post-dev-test` | Post-development test completion. |
| `/pr-comment` | Post friendly review comments to a GitHub PR: prepare locally, preview, then submit as atomic review. |
| `/project-setup` | Project configuration initialization. |
| `/push-ci` | Push to remote and monitor CI. |
| `/remind` | Lightweight model correction with context-aware rule loading. |
| `/repo-intake` | Project initialization inventory (one-time). |
| `/smart-commit` | Smart batch commit. |
| `/smart-rebase` | Smart partial rebase for squash-merge repositories. |
| `/watch-ci` | Monitor GitHub Actions CI runs until completion. |

Review (Codex MCP) (14)

| Skill | Description | Loop support |
|---|---|---|
| `/codex-cli-review` | Code review via Codex CLI with full disk access. | - |
| `/codex-code-review` | Code review using Codex MCP. | - |
| `/codex-explain` | Explain complex code via Codex MCP. | - |
| `/codex-review` | Full second-opinion using Codex MCP (with lint:fix + build). | `--continue <threadId>` |
| `/codex-review-branch` | Fully automated review of an entire feature branch using Codex MCP | - |
| `/codex-review-doc` | Review documents using Codex MCP. | `--continue <threadId>` |
| `/codex-review-fast` | Quick second-opinion using Codex MCP (diff only, no tests). | `--continue <threadId>` |
| `/codex-security` | OWASP Top 10 security review using Codex MCP. | `--continue <threadId>` |
| `/codex-test-gen` | Generate unit tests for specified functions using Codex MCP | - |
| `/codex-test-review` | Review test case sufficiency using Codex MCP, suggest additional edge cases. | `--continue <threadId>` |
| `/doc-review` | Document review via Codex MCP. | - |
| `/security-review` | Security review via Codex MCP. | - |
| `/seek-verdict` | Independent second-opinion verification for any finding. | - |
| `/test-review` | Test coverage review via Codex MCP. | - |

Verification (13)

| Skill | Description |
|---|---|
| `/best-practices` | Industry best practices conformance audit with mandatory adversarial debate. |
| `/check-coverage` | Comprehensive assessment of Unit / Integration / E2E three-layer test coverage, identify gaps and provide actionable ... |
| `/dep-audit` | Audit dependency security risks |
| `/dev-security-audit` | Comprehensive developer workstation security audit: scans for exposed credentials, compromised application data, per... |
| `/necessity-audit` | Necessity audit for over-designed spec elements. |
| `/pre-pr-audit` | Pre-PR confidence audit with 5-dimension scoring. |
| `/precommit` | Pre-commit checks: lint:fix -> build -> test |
| `/precommit-fast` | Quick pre-commit checks: lint:fix -> test |
| `/project-audit` | Project health audit with deterministic scoring. |
| `/risk-assess` | Uncommitted code risk assessment with breaking change detection, blast radius analysis, and scope metrics. |
| `/test-deep` | Context-aware test orchestration. |
| `/test-health` | Holistic test coverage measurement. |
| `/verify` | Verification loop: lint -> typecheck -> unit -> integration -> e2e |

Planning (16)

| Skill | Description |
|---|---|
| `/architecture` | Architecture design and documentation. |
| `/codex-brainstorm` | Adversarial brainstorming via Claude+Codex debate. |
| `/deep-analyze` | Deep-dive analysis of an initial proposal: research code implementation, produce an actionable roadmap and alternatives |
| `/deep-research` | Universal multi-source research orchestration. |
| `/feasibility-study` | Feasibility analysis from first principles. |
| `/fp-brief` | First-principles briefing from technical documents. |
| `/post-dev-recap` | Guided post-dev recap wrapper: scope detection + doc generation + Q&A. |
| `/project-brief` | Convert a technical spec into a PM/CTO-readable executive summary. |
| `/recap-ask` | Recap-bounded Q&A follow-up over an existing briefing-recap. |
| `/recap-doc` | Post-development recap document generator with blind-spot detection. |
| `/req-analyze` | Requirements analysis: problem decomposition, stakeholder scan, requirement structuring. |
| `/request-tracking` | Request tracking knowledge base. |
| `/review-spec` | Review technical spec documents from completeness, feasibility, risk, and code consistency perspectives. |
| `/tech-brief` | Technical briefing for developer sharing. |
| `/tech-spec` | Tech spec generation and review. |
| `/ui-first-principles` | First-principles UI/IA reasoning: turns a `<scenario>` + API field set into JTBD analysis, principle-anchored field-p... |

Docs & tools (20)

| Skill | Description |
|---|---|
| `/claude-health` | Claude Code config health check + plugin sync. |
| `/contract-decode` | EVM contract error and calldata decoder. |
| `/create-request` | Create, update, or scan per-task request tickets for progress tracking. |
| `/de-ai-flavor` | Remove AI artifacts from documents. |
| `/doc-refactor` | Refactor documents: simplify without losing information, visualize flows with sequenceDiagram. |
| `/generate-runner` | Generate a customized precommit runner for any ecosystem. |
| `/obsidian-cli` | Obsidian vault integration via official CLI. |
| `/op-session` | Initialize 1Password CLI session for Claude Code. |
| `/portfolio` | Portfolio system knowledge base. |
| `/pr-review` | PR self-review: review changes, produce checklist, update rules |
| `/pr-summary` | List open PRs, filter automation PRs, group by ticket ID, format as Markdown. |
| `/refactor` | Multi-target refactoring orchestrator. |
| `/runbook` | Generate/update feature release runbook |
| `/safe-remove` | Safely remove plugin assets (skill/agent/rule/script/hook) with dependency detection and reference cleanup. |
| `/sharingan` | Replicate knowledge from any source as sd0x-dev-flow skill definition. |
| `/simplify` | Wrap-up refactoring: simplify code, eliminate duplication, preserve behavior |
| `/skill-health-check` | Validate skill quality against routing, progressive loading, and verification criteria. |
| `/statusline-config` | Customize Claude Code statusline. |
| `/update-docs` | Research current code state then update corresponding docs, ensuring docs stay in sync with code. |
| `/zh-tw` | Rewrite the previous reply in Traditional Chinese |

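Rules & hooks

14 rules (always-loaded conventions) + 9 hooks (automated guardrails).

Customization: edit auto-loop-project.md to override the auto-loop behavior for your project. Plugin updates will not conflict; see Rule Override Pattern for details.

For the full reference on rules, hooks, and environment variables, see docs/rules.md and docs/hooks.md.

Custom configuration

Run /project-setup to auto-detect and fill in all placeholders, or edit .claude/CLAUDE.md manually.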

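| Placeholder | Description | Example |
|---|---|---|
| `{PROJECT_NAME}` | Project name | my-app |
| `{FRAMEWORK}` | Framework | MidwayJS 3.x, NestJS, Express |
| `{CONFIG_FILE}` | Main config file | src/configuration.ts |
| `{BOOTSTRAP_FILE}` | Bootstrap entry point | bootstrap.js, main.ts |
| `{DATABASE}` | Database | MongoDB, PostgreSQL |
| `{TEST_COMMAND}` | Test command | yarn test:unit |
| `{LINT_FIX_COMMAND}` | Lint auto-fix | yarn lint:fix |
| `{BUILD_COMMAND}` | Build command | yarn build |
| `{TYPECHECK_COMMAND}` | Type check | yarn typecheck |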
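Filling placeholders by hand amounts to a simple template substitution. The Python sketch below shows one way this could work, assuming placeholders are upper-case names wrapped in braces as in the table; the `fill` function and its reporting of unfilled names are hypothetical and not a description of what /project-setup does internally.

```python
import re
from typing import Dict, List, Tuple

# Matches {UPPER_CASE_NAME} placeholders such as {PROJECT_NAME}.
PLACEHOLDER = re.compile(r"\{([A-Z_]+)\}")


def fill(template: str, values: Dict[str, str]) -> Tuple[str, List[str]]:
    """Substitute known placeholders and report the ones left unfilled."""
    missing: List[str] = []

    def substitute(match: "re.Match[str]") -> str:
        key = match.group(1)
        if key in values:
            return values[key]
        missing.append(key)
        return match.group(0)  # leave unknown placeholders untouched

    return PLACEHOLDER.sub(substitute, template), missing


text, missing = fill(
    "# {PROJECT_NAME}\nTest: {TEST_COMMAND}\nBuild: {BUILD_COMMAND}",
    {"PROJECT_NAME": "my-app", "TEST_COMMAND": "yarn test:unit"},
)
```

Here `text` has the first two placeholders replaced, and `missing` lists `BUILD_COMMAND` so that an unfilled value is visible instead of silently left in the file.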

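Showcase: multi-agent research

Running /deep-research dispatches 2-3 parallel research agents across web sources, the codebase, and community knowledge, combined with claim-registry synthesis and conditional adversarial debate.

| Feature | Details |
|---|---|
| Agents | 2-3 in parallel (web + code + community) |
| Synthesis | Claim registry consensus detection |
| Validation | Conditional /codex-brainstorm debate |
| Scoring | 4-signal completeness model |

Full documentation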

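Architecture

Command (entry) → Skill (capability) → Agent (environment)
  • Commands: triggered by the user via /...
  • Skills: knowledge bases loaded on demand
  • Agents: isolated subagents with specific tools
  • Hooks: automated guardrails (formatting, review state, stop guard)
  • Rules: always-on conventions (auto-loaded)

For advanced architecture details (agentic control stack, control loop theory, sandbox rules), see docs/architecture.md

Contributing

PRs are welcome. Please:

  1. Follow the existing naming convention (kebab-case)
  2. Include When to Use / When NOT to Use in each skill
  3. Add disable-model-invocation: true to dangerous operations
  4. Test with Claude Code before submitting

License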

MIT
