- Status: accepted
- Date: 2026-04-27
- Authors: Laith Al-Saadoon + Claude
- Branch:
feat/artifact-factory
ADR 0007 committed to shipping an artifact factory inside the Claude Code plugin. The orchestration pattern — how Claude drives the skill → precompute → parallel subagents → assembler flow — is the hard part. It is also solvable from prior art: a single-repo documentation skill in another project has already validated the shape. We port that shape here with three adaptations.
The pattern is:
- Phase 0 (pre-flight, inline in the skill) — read the relevant graph
artifacts plus a small number of structured queries, persist their combined
context to
<output-dir>/.context.md(200-line cap) and<output-dir>/.prefetch.md(a JSON ledger of tool calls with response digests). Subagents read these files instead of re-calling tools. - Phase AB (N subagents in parallel, single message with N Agent tool calls) — each reads the two shared-context files first, then writes its section-specific files.
- Phase CD (small number of subagents in parallel) — diagrams plus specialty sections (e.g. cross-repo in group mode).
- Phase E (inline deterministic assembler) — regex over backtick source
citations builds a co-occurrence index, appends
See also:footers, writesREADME.md+.docmeta.json.
The pattern resolves the three hardest problems in generative documentation:
prompt dedup at fan-out (filesystem, not copy-paste), determinism of
structure (Phase E is LLM-free), and refresh without full regen (mtime
comparison against declared data_sources).
Adopt the four-phase pattern for OpenCodeHub's codehub-document skill with
three deliberate adaptations.
OpenCodeHub's supply-chain tools (verdict, list_findings, license_audit,
list_dead_code, list_findings_delta) pre-digest a lot of analysis output,
so we land on six subagents instead of a larger fan-out:
| Subagent | Scope |
|---|---|
doc-architecture |
System overview, module map, data flow |
doc-reference |
Public API, CLI surface, MCP tool surface |
doc-behavior |
Processes, state machines |
doc-analysis |
Risk hotspots, ownership, dead code (driven by verdict / risk_trends / owners / list_dead_code) |
doc-diagrams |
Mermaid diagrams (component, sequence, dependency) |
doc-cross-repo |
Group mode only. Consumes group_contracts + group_query. |
Single-repo documentation is the bootstrap case; group mode is the wedge.
Our Phase 0 writer emits context either to .codehub/ (single-repo) or
.codehub/groups/<name>/ (group mode). Phase AB fans out 4 subagents per
repo in group mode — batched by role if the cardinality exceeds Claude Code's
concurrent-subagent ceiling (~10 per message, per brainstorm 004 and
spec 001 AC-5-1). Phase E builds a cross-repo link graph in addition to the
single-repo See-also footers.
The base pattern's Phase E regex matches backtick citations in path:LOC
form. We extend the grammar to also match repo:path:LOC for group mode,
and the .docmeta.json schema adds a cross_repo_refs[] array. See ADR
0009 for the full output contract.
- Phase 0 writes two files on disk; subagents read them before touching any MCP tool.
- Every subagent prompt follows an 8-section scaffold: frontmatter, output files, input specification, process, document format rules, tool usage guide, fallback paths, quality checklist.
- Phase E is deterministic Markdown assembly, no LLM call. Regex → join → footer → manifest.
.docmeta.jsonis the source of truth for--refresh. Mtime comparison againstsection.sources[]decides what to regenerate.- No YAML frontmatter on generated outputs. H1 is the identifier.
- Generated docs have backtick source citations (
`path:LOC`or`repo:path:LOC`).
- Proven pattern. The shape has run in production elsewhere; our risk is adaptation, not invention.
- Prompt dedup is a filesystem property, not a prompt-engineering
property. Per-subagent prompts stay small; the context lives in
.context.md+.prefetch.md. - Determinism where it matters. Phase E is regex + join. Same inputs, same cross-links. Prose is LLM-generated (non-deterministic), but structure and citations are deterministic.
- Group mode comes for free once the topology is named. The unique
OpenCodeHub wedge (
group_contracts+group_query) fits as a single additional subagent.
- Subagent tool sprawl. Each doc-* carries 6–10 MCP tools plus
Read/Write/Grep/Glob. Context-bloat-from-tool-metadata is the realistic
failure mode, not bad prompts. Mitigation is baked into the subagent
prompts: each opens with "do not re-call tools whose digest is in
.prefetch.md" plus a Tool Usage Guide table. - Parallel subagent ceiling. Claude Code caps concurrent Agent calls
at ~10 per message. Groups of 3+ repos require role-batched dispatch
(all
doc-architecturefirst, thendoc-behavior, etc.). Verified against the current Claude Code release whencodehub-document --groupships.
- Pattern divergences are recorded in this ADR and in ADR 0009. Internal prompt templates and references document the adaptations without naming the external prior art.
docs/adr/0007-artifact-factory.md— the parent decisiondocs/adr/0009-artifact-output-conventions.md— output contract.erpaval/brainstorms/004-opencodehub-subagent-prompts.md— per-agent 8-section scaffolds.erpaval/brainstorms/005-opencodehub-output-conventions.md— citation grammar +.docmeta.jsonschema