This guide suggests approaches for teams adopting agentic development patterns incrementally. It is not a rigid plan, but a set of paths teams have found useful.
Core principle: Each pattern level delivers value independently. L0 Foundation on its own improves how well agents navigate the codebase, so you don't need L4 to benefit from L0. Start where your gaps are and adopt what helps.
Before choosing a path, assess your current state. Run through this checklist to identify gaps.
- CLAUDE.md exists and is under 150 lines
- README.md clearly explains what the project does
- CLAUDE.md links to all relevant documentation
- File structure is grouped by domain, not layer
- Stack tests and E2E/integration tests use real components (no mocks for owned services)
- Complex external dependencies have focused integration tests against real testnet/sandbox
- Tests run in isolated environments (no shared state)
- Test failures provide clear diagnostic signals
- Tests assert on side effects, not just responses
- Every pattern is documented with examples
- Code examples in docs match current codebase
- Documentation is linked from CLAUDE.md
- No orphaned docs (all reachable from master index)
- Git worktrees are used for feature branches
- Linting/formatting is automated
- Pre-commit hooks enforce basic rules
- CI runs tests in isolated environments
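Several of the checklist items above can be checked mechanically. The following is a minimal sketch in Python, assuming documentation lives in a `docs/` directory and uses standard Markdown links. The names `audit` and `linked_docs` and the `docs/` location are illustrative assumptions, not part of any tool referenced in this guide. It checks the 150-line limit on CLAUDE.md, the presence of README.md, and orphaned docs by following links outward from CLAUDE.md.

```python
import re
from pathlib import Path

MAX_CLAUDE_LINES = 150  # "under 150 lines", as stated in the checklist
# Matches [text](path.md) and [text](path.md#anchor); captures the path only.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+\.md)(?:#[^)]*)?\)")


def linked_docs(markdown_file: Path) -> set:
    """Return the local .md files that markdown_file links to."""
    text = markdown_file.read_text(encoding="utf-8")
    targets = set()
    for target in LINK_RE.findall(text):
        if "://" in target:
            continue  # external URL, not a local doc
        targets.add((markdown_file.parent / target).resolve())
    return targets


def audit(root: Path, docs_dir: str = "docs") -> list:
    """Return human-readable gaps; an empty list means these checks passed."""
    gaps = []
    if not (root / "README.md").exists():
        gaps.append("README.md is missing")

    claude = root / "CLAUDE.md"
    if not claude.exists():
        gaps.append("CLAUDE.md is missing")
        return gaps

    line_count = len(claude.read_text(encoding="utf-8").splitlines())
    if line_count >= MAX_CLAUDE_LINES:
        gaps.append(f"CLAUDE.md has {line_count} lines (target: under {MAX_CLAUDE_LINES})")

    # Follow links starting at CLAUDE.md to collect every reachable doc.
    reachable, queue = set(), [claude.resolve()]
    while queue:
        current = queue.pop()
        if current in reachable or not current.exists():
            continue
        reachable.add(current)
        queue.extend(linked_docs(current))

    docs_root = root / docs_dir  # assumed location of the documentation
    if docs_root.is_dir():
        for doc in sorted(docs_root.rglob("*.md")):
            if doc.resolve() not in reachable:
                gaps.append(f"orphaned doc: {doc.relative_to(root).as_posix()}")
    return gaps
```

Calling `audit(Path("."))` at the repository root returns a list of gap descriptions. The remaining checklist items, such as domain grouping, test isolation, and diagnostic quality, still need human or agent judgment.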
The ultimate assessment is the New Starter Test: give someone (or an agent) with zero context your CLAUDE.md, README, and file structure. Can they answer these questions without asking for help?
- What does this project do?
- How do I run it locally?
- Where do I add a new feature?
- What patterns should I follow?
- How do I test my changes?
- What must I never do?
Document every assumption they had to make. Each assumption is a gap.
| Level | Focus | Common Gaps | Quick Win |
|---|---|---|---|
| L0 Foundation | Structure, CLAUDE.md, doc freshness, cleanup | No CLAUDE.md, layer-based organization, stale docs | Write CLAUDE.md, restructure by domain |
| L1 Closed Loop Design | Design-led verification with stack tests and focused integration tests | No stack tests, shallow assertions, no edge-case coverage for complex external deps | Add app-startup stack test |
| L2 Guardrails | Skills, hooks, behavioral rules | No enforcement, agents make common errors | Add test-integrity skill |
| L3 Optimization | Smart routing, structured search | Raw grep/cat commands, token waste | Set up jcodemunch indexing |
| L4 Standards & Measurement | Evidence, drift detection, metrics, context eval | Claims without evidence, spec drift | Establish evidence standard |
Teams have used three paths. Choose based on your situation.
Path A: Start with L0, the highest-impact, lowest-effort starting point, then build upward.
- L0 Foundation — Restructure by domain, write CLAUDE.md, establish doc freshness and cleanup practices. See L0-foundation.md.
- L1 Closed Loop Design — Add stack tests for your most critical user journeys. See L1-feedback-loops.md and the working examples in examples/stack-test/.
- L2 Guardrails — Add skills and hooks for your project's most common errors. See L2-behavioral-guardrails.md.
- L3 Optimization — Set up structured search and routing. See L3-optimization.md and the working example in examples/guardrails/.
- L4 Standards & Measurement — Add drift detection, metrics, and context eval when the earlier levels are stable. See L4-standards-measurement.md.
Path B: If testing is your biggest pain point (no stack tests, shallow assertions, or missing edge-case coverage for external dependencies), start at L1.
- L1 — Cover every user journey with a stack test, and avoid testing the same behavior twice. For complex external dependencies (payment processors, blockchain RPCs), add focused integration tests against a real testnet/sandbox. See Pattern 1.1 — Stack Tests, Pattern 1.5 — Real Dependencies, and examples/stack-test/.
- L0 — Once stack tests are working, structure the project to make them maintainable: deep modules, CLAUDE.md, and progressive disclosure.
- L2-L4 — Continue upward as in Path A.
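To illustrate what a test that asserts on side effects and not only on the response might look like, here is a minimal sketch. Everything in it is hypothetical: the `place_order` handler, the `orders` and `outbox` tables, and the use of an in-memory SQLite database as the real (unmocked) store. A real stack test would start the whole application; this only shows the assertion style.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a fresh in-memory database per test so no state is shared."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, sku TEXT, qty INTEGER)")
    conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, event TEXT, order_id INTEGER)")
    return conn


def place_order(conn: sqlite3.Connection, sku: str, qty: int) -> dict:
    """Hypothetical handler under test: stores the order and queues an event."""
    if qty <= 0:
        return {"status": 400, "error": "qty must be positive"}
    with conn:  # commit both writes together, or roll both back
        cur = conn.execute("INSERT INTO orders (sku, qty) VALUES (?, ?)", (sku, qty))
        order_id = cur.lastrowid
        conn.execute(
            "INSERT INTO outbox (event, order_id) VALUES (?, ?)",
            ("order_placed", order_id),
        )
    return {"status": 201, "order_id": order_id}


def test_place_order_full_loop():
    conn = make_db()
    response = place_order(conn, "SKU-1", 2)

    # Response check: necessary, but not sufficient on its own.
    assert response["status"] == 201, f"unexpected response: {response}"

    # Side effect 1: the order row was actually persisted.
    row = conn.execute(
        "SELECT sku, qty FROM orders WHERE id = ?", (response["order_id"],)
    ).fetchone()
    assert row == ("SKU-1", 2), f"order row missing or wrong: {row}"

    # Side effect 2: the downstream event was queued.
    events = conn.execute(
        "SELECT event FROM outbox WHERE order_id = ?", (response["order_id"],)
    ).fetchall()
    assert events == [("order_placed",)], f"outbox contents: {events}"


def test_rejected_order_leaves_no_trace():
    conn = make_db()
    response = place_order(conn, "SKU-1", 0)
    assert response["status"] == 400
    # A rejected request must not write anything.
    assert conn.execute("SELECT COUNT(*) FROM orders").fetchone() == (0,)
    assert conn.execute("SELECT COUNT(*) FROM outbox").fetchone() == (0,)
```

The assertion messages include the observed value so that a failure points at the broken step without a debugger, which is the "clear diagnostic signal" the checklist asks for.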
Path C: If agents are making consistent errors (missing tests, ignoring rules, skipping verification), start at L2.
- L2 — Add skills for your most frequently violated rules and hooks that block destructive patterns. See L2-behavioral-guardrails.md.
- L0 — Structure the project so agents have clear context to work with.
- L1, L3-L4 — Fill in remaining levels as capacity allows.
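As a sketch of what a hook that blocks destructive patterns could check, the function below matches a shell command against a small deny-list. The patterns, the `check_command` name, and the idea that the hook receives the command as a string are all assumptions for illustration; the actual hook interface depends on your agent tooling and is described in the L2 pattern docs.

```python
import re
from typing import Optional

# Hypothetical deny-list; tune it to the mistakes your agents actually make.
BLOCKED_PATTERNS = [
    (
        re.compile(r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*f|\brm\s+-[a-zA-Z]*f[a-zA-Z]*r"),
        "recursive force delete",
    ),
    # Note: this also matches --force-with-lease.
    (re.compile(r"\bgit\s+push\b.*(--force\b|\s-f\b)"), "force push"),
    (re.compile(r"\bgit\s+reset\s+--hard\b"), "hard reset"),
    (re.compile(r"\bgit\s+commit\b.*--no-verify\b"), "skipping pre-commit hooks"),
]


def check_command(command: str) -> Optional[str]:
    """Return the reason a command is blocked, or None if it is allowed."""
    for pattern, reason in BLOCKED_PATTERNS:
        if pattern.search(command):
            return reason
    return None


if __name__ == "__main__":
    for cmd in ["git status", "rm -rf build", "git push --force origin main"]:
        verdict = check_command(cmd)
        print(f"{cmd!r}: {'blocked (' + verdict + ')' if verdict else 'allowed'}")
```

A wrapper script around `check_command` would typically exit non-zero and print the reason when a command is blocked, so the agent sees why the action was refused. Regex deny-lists are easy to bypass, so treat this as a guardrail against common mistakes, not as a security boundary.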
The pattern docs contain the detailed guidance. This guide links to them rather than duplicating content.
| Level | Start With | Working Example |
|---|---|---|
| L0 | L0-foundation.md — Deep modules, CLAUDE.md, progressive disclosure | examples/project-structure/ |
| L1 | L1-feedback-loops.md — Context harvesting, stack tests, full-loop assertions | examples/stack-test/typescript/ or examples/stack-test/python/ |
| L2 | L2-behavioral-guardrails.md — Skills, hooks, constitutional rules | superpowers or rig |
| L3 | L3-optimization.md — Smart routing, structured search, scout pattern | examples/guardrails/ |
| L4 | L4-standards-measurement.md — Evidence, drift detection, metrics, context eval | — |
| System | What it implements | Language |
|---|---|---|
| rig | L2 enforcement pipeline, L3 tool routing + token-optimized scout agent, L4 context eval, skill chain with phase transitions, CI guardrails | TypeScript |
| gstack | L2 skill framework with resolver pipeline, preamble system | TypeScript |
| superpowers | L2 base skills (brainstorming, TDD, verification, review), worktree management | Markdown/JS |
- Incremental adoption — Each level delivers value independently. Don't try to implement everything at once.
- Start with L0 if unsure — Structure and CLAUDE.md have the highest ROI for the least effort.
- Update docs in the same task as code — Deferred documentation is stale documentation. See Pattern 0.7 — Documentation as System Map.
- Remove dead code as you find it — Cleanup is continuous, not periodic. See Pattern 0.8 — Aggressive Cleanup.
- Use the New Starter Test regularly — It catches gaps that code review misses. See Pattern 4.3 — New Starter Standard.
- Skipping assessment — Without understanding your gaps, you may optimize the wrong things.
- Deferring documentation — "I'll update the docs later" means the docs stay stale.
- Allowing constitutional rule exceptions — Rules that bend become suggestions, and suggestions erode.
- Treating cleanup as a quarterly sprint — Dead code is noise. Remove it as you encounter it.
- L0 Foundation — Project structure patterns
- L1 Closed Loop Design — Design-led verification
- L2 Behavioral Guardrails — Skills and hooks
- L3 Optimization — Token efficiency
- L4 Standards & Measurement — Evidence, drift detection, metrics
- Anti-Patterns — Common mistakes to avoid
- Reference Implementation Case Study — Production validation