How to split features into agent-comparable tasks
Based on v2: 15 tasks, perfect parallelization, zero conflicts.
Understanding the pipeline is critical for writing effective task files:
- You write a task file (markdown) with requirements, constraints, and planning doc references
- Prompter AI (Phase 1) reads your task file + codebase and generates a detailed writer prompt
- Writer AIs (Phase 2) receive the generated prompt and implement the task — they also have full codebase access and can read planning docs directly
- Judge AIs (Phase 4+) review implementations against the original task requirements
This two-stage pipeline has key implications:
| Do This | Not This | Why |
|---|---|---|
| Reference planning docs | Duplicate planning doc content | Writers can read the originals directly |
| State requirements and constraints | Write step-by-step procedures | Writers reason better from goals than instructions |
| Show 2-5 line style patterns | Include full code implementations | Verbose examples cause copy-paste, not reasoning |
| Describe anti-patterns as constraints | Write full bad/good code blocks | Brief constraints are more generalizable |
| Be specific about WHAT | Be prescriptive about HOW | Prescriptive tasks prevent writers from finding better solutions |
Research-backed insight: Goal-oriented prompts with clear constraints consistently outperform verbose procedural instructions. AI agents exploit surface patterns in code examples rather than reasoning about requirements — which means long code examples can actually reduce solution quality.
Create tasks that are:
- ✅ Comparable - 2 agents can propose different valid approaches
- ✅ Independent - Can run in parallel without conflicts
- ✅ Complete - Shippable unit of value
- ✅ Sized right - 2-8 hours (sweet spot for quality comparison)
❌ "Add a constant"
❌ "Update import"
❌ "Fix typo"
Problem: No room for different approaches, not worth dual-writer overhead
✅ "CRUD Factory Core" (02-crud-factory-core)
✅ "Auth Middleware" (02-auth-middleware)
✅ "OpenAPI Generator" (03-openapi-generator)
Perfect: Enough complexity for approaches to differ, small enough to compare
❌ "Complete authentication system" (auth + RBAC + sessions + tests)
❌ "Full frontend" (too many decisions)
Problem: Too many sub-decisions, hard to judge holistically, long feedback cycles
Feature: Error Handling
Split:
- 02-error-handler.ts - Core error middleware
- 02-error-types.ts - Error taxonomy (if complex enough)
- 02-error-tests.ts - Test suite (if substantial)
v2 example: Single task (02-error-handler) worked well at 323 lines
Feature: Observability Stack
Split:
- 02-structured-logging - Logger setup
- 02-otel-instrumentation - Tracing
- 02-otel-collector-config - Collector setup
v2 example: Combined into one (02-observability-complete) due to tight integration
Feature: Database Layer
Split:
- 02-db-middleware-and-rls - Request-scoped DB, transactions
- 02-data-driver-and-repos - Tenant-scoped wrapper
- 02-migration-helpers - Schema factory utilities
v2 example: Worked perfectly - each independently valuable
Feature: API + SDK
Split:
- 03-openapi-generator - Zod → OpenAPI spec
- 03-sdk-build - OpenAPI → TypeScript SDK
v2 example: Perfect split - clean interface between tasks
One agent owns composition:
agent-0-integrator owns:
- server.ts
- main.tsx
- route files
- config assembly
Domain agents own modules:
02-auth-middleware owns:
- lib/middleware/auth.ts
- Does NOT touch server.ts
00-integrator later:
- Imports auth middleware
- Registers in server.ts
Why: Prevents conflicts, enables parallel work
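The ownership split above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the `Req`, `Next`, and `Middleware` types and the `createServer` helper are hypothetical stand-ins for whatever web framework the real `server.ts` uses, and the token check is a placeholder for real JWT verification.

```typescript
// Hypothetical minimal types so the sketch runs without a web framework.
type Req = { headers: Record<string, string | undefined>; userId?: string };
type Next = () => void;
type Middleware = (req: Req, next: Next) => void;

// --- lib/middleware/auth.ts (owned by 02-auth-middleware) ---
// The domain task only defines and exports the middleware; it never edits server.ts.
const authMiddleware: Middleware = (req, next) => {
  const header = req.headers["authorization"] ?? "";
  // Placeholder check; a real task would verify the JWT here.
  if (header.startsWith("Bearer ")) {
    req.userId = header.slice("Bearer ".length);
    next();
  }
};

// --- server.ts (owned by the integrator) ---
// The only place where modules from different tasks are composed.
function createServer(middlewares: Middleware[]) {
  return {
    // Returns true when the request passed through every middleware.
    handle(req: Req): boolean {
      let index = 0;
      let reachedEnd = false;
      const next: Next = () => {
        const mw = middlewares[index];
        index += 1;
        if (mw !== undefined) mw(req, next);
        else reachedEnd = true;
      };
      next();
      return reachedEnd;
    },
  };
}

const server = createServer([authMiddleware]);
const accepted = server.handle({ headers: { authorization: "Bearer user-1" } });
const rejected = server.handle({ headers: {} });
```

Because the auth task only exports a value and the integrator only imports it, the two tasks never write to the same file.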
In each task file:
## Owned Paths
This task owns:
- apps/api/src/lib/auth/**
- apps/api/src/lib/auth.ts
Must NOT modify:
- apps/api/src/server.ts (owned by integrator)
- Other middleware files
Phase 02 had 9 parallel tasks:
- Each owned distinct paths
- Zero file overlap
- Integrator wired them together in Phase 04
- Result: All 9 ran simultaneously!
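Zero file overlap can be checked mechanically before a phase starts. The sketch below is one possible approach, under assumptions: the `TaskOwnership` shape and function names are hypothetical, the task `hypothetical-rbac` is invented to trigger a conflict, and the path matching handles only a trailing `/**` or `/` rather than full glob syntax.

```typescript
type TaskOwnership = { task: string; ownedPaths: string[] };

// Strip a trailing "/**" or "/" so "lib/auth/**" and "lib/auth/" compare as the same directory.
function normalize(path: string): string {
  return path.replace(/\/\*\*$/, "").replace(/\/$/, "");
}

// Two paths overlap when they are equal or one is a directory prefix of the other.
function pathsOverlap(a: string, b: string): boolean {
  const x = normalize(a);
  const y = normalize(b);
  return x === y || x.startsWith(y + "/") || y.startsWith(x + "/");
}

// Compare every pair of tasks and report each overlapping claim.
function findOwnershipConflicts(tasks: TaskOwnership[]): string[] {
  const conflicts: string[] = [];
  tasks.forEach((first, i) => {
    for (const second of tasks.slice(i + 1)) {
      for (const a of first.ownedPaths) {
        for (const b of second.ownedPaths) {
          if (pathsOverlap(a, b)) {
            conflicts.push(`${first.task} and ${second.task} both claim ${a}`);
          }
        }
      }
    }
  });
  return conflicts;
}

const conflicts = findOwnershipConflicts([
  { task: "02-auth-middleware", ownedPaths: ["apps/api/src/lib/auth/**", "apps/api/src/lib/auth.ts"] },
  { task: "02-crud-factory-core", ownedPaths: ["apps/api/src/lib/crud/"] },
  // Invented task that reaches into the auth directory.
  { task: "hypothetical-rbac", ownedPaths: ["apps/api/src/lib/auth/rbac.ts"] },
]);
```

Here `conflicts` holds a single entry, for the invented task that claims a file inside the auth directory. An empty result means the tasks are safe to run in parallel as far as path ownership goes.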
Every task must have (high-value sections):
- Goal - One sentence, clear deliverable
- Required Reading - Planning docs the writer MUST read (these are the golden source)
- Requirements - Specific, testable deliverables with acceptance criteria
- Constraints - Architecture rules, technical limits, KISS principles
- Anti-Patterns - Brief constraints on what NOT to do (1-2 sentences each, not code blocks)
- Owned Paths - Clear file ownership (critical for parallel safety)
- Acceptance Criteria - Testable definition of done
- Integration Points - Dependencies and what this enables
Include sparingly:
- Style Reference - Only if there's a pattern NOT already in planning docs. Keep to 2-5 lines showing the convention, not a full implementation.
- Implementation Steps - Only if ordering genuinely matters (e.g., "migrations before seed data"). Otherwise, let the writer decide the approach.
Avoid:
- Full code implementations (writers copy instead of reasoning)
- Duplicating planning doc content (writers can read the originals)
- Step-by-step procedures for obvious work (writers are capable AI models)
- Generic boilerplate sections with no task-specific content
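The required-section list lends itself to a quick pre-flight check. The following is a rough sketch that assumes each section appears as a markdown `## ` heading (as in the `## Owned Paths` example); the function name and the sample draft are hypothetical.

```typescript
const REQUIRED_SECTIONS = [
  "Goal",
  "Required Reading",
  "Requirements",
  "Constraints",
  "Anti-Patterns",
  "Owned Paths",
  "Acceptance Criteria",
  "Integration Points",
];

// Returns the required sections that have no matching "## " heading in the task file.
function missingSections(taskFile: string): string[] {
  const headings = new Set(
    taskFile
      .split("\n")
      .filter((line) => line.startsWith("## "))
      .map((line) => line.slice(3).trim().toLowerCase()),
  );
  return REQUIRED_SECTIONS.filter((section) => !headings.has(section.toLowerCase()));
}

// Hypothetical incomplete draft of a task file.
const draft = [
  "# Task: 02-auth-middleware",
  "## Goal",
  "Verify JWTs and expose the result as middleware.",
  "## Requirements",
  "## Owned Paths",
].join("\n");

const missing = missingSections(draft);
```

For this draft, `missing` lists the five sections that still need to be written: Required Reading, Constraints, Anti-Patterns, Acceptance Criteria, and Integration Points.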
Task: 02-crud-factory-core.md
What made it good:
- Clear scope: Factory pattern for CRUD endpoints
- Self-contained: All in lib/crud/
- Comparable: Many ways to implement factories
- Testable: Integration tests possible
- Result: Huge architectural debate, synthesis produced 73% simpler solution!
Owned paths:
apps/api/src/lib/crud/
├── types.ts
├── register.ts
└── register.test.ts
Did NOT touch: server.ts, resource files (separate tasks)
Task: 02-auth-middleware.md
What made it good:
- Single responsibility: JWT verification
- Clear planning reference: planning/rbac.md
- Integration point: Exports middleware, integrator registers
- Testable: Can mock Auth0
- Result: Unanimous vote B (3/3), clean implementation
Owned paths:
apps/api/src/lib/middleware/auth.ts
Task: 02-migration-helpers.md (added mid-stream!)
Origin story:
- Not in original plan
- After CRUD factory: Realized boilerplate pain
- Orchestrator proposed: "Factory for tenant table migrations"
- Human approved
- Task created and executed
- Result: 80% boilerplate reduction (50→10 lines)
Lesson: Plans evolve! Don't be rigid.
❌ BAD: "Complete user management system"
(auth + CRUD + validation + tests + docs)
✅ GOOD: "User CRUD endpoints"
(specific, builds on auth from previous task)
❌ BAD: "Improve error handling"
(where? how? success criteria?)
✅ GOOD: "Error envelope middleware per api-conventions.md"
(specific doc reference, clear deliverable)
❌ BAD: Both auth and RBAC tasks modify server.ts
(conflict!)
✅ GOOD: Both export middleware, integrator wires later
(parallel-safe)
❌ BAD: Just requirements, no why
✅ GOOD: "Builds on 02-auth-middleware, needed for CRUD"
(agents understand the bigger picture)
Signs of a good task file:
- Requirements are specific enough to test, open enough to allow different approaches
- Planning doc references tell the writer WHERE to find patterns (not what the patterns are)
- Anti-patterns are brief constraints ("Don't use X because Y — use Z instead")
- Acceptance criteria are concrete and verifiable
- Total length is 100-250 lines (shorter is better if requirements are clear)
Signs of a bloated task file:
- Code examples longer than 5 lines (writer will copy, not reason)
- Sections that restate planning doc content (wasted context window)
- Step-by-step procedures for straightforward work ("install X, then configure Y")
- Generic boilerplate sections with no task-specific content
- Anti-pattern blocks with full bad/good implementations (10+ lines each)
- Total length exceeds 300 lines
The test: If you removed all code examples and the task is still clear, the examples weren't adding value. If the task is unclear without them, the requirements section needs work.
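Two of the bloat signals (code examples longer than 5 lines, total length over 300 lines) are easy to measure automatically. This is an illustrative sketch only: `checkBloat` and `BloatReport` are hypothetical names, and it assumes code examples are delimited by triple-backtick fences.

```typescript
type BloatReport = { totalLines: number; longCodeBlocks: number; warnings: string[] };

// Built at runtime so this example does not contain a literal fence inside its own code block.
const FENCE = "`".repeat(3);

function checkBloat(taskFile: string): BloatReport {
  const lines = taskFile.split("\n");
  let inFence = false;
  let blockLength = 0;
  let longCodeBlocks = 0;

  for (const line of lines) {
    if (line.trim().startsWith(FENCE)) {
      // A closing fence ends the current block; count it if it ran past 5 lines.
      if (inFence && blockLength > 5) longCodeBlocks += 1;
      inFence = !inFence;
      blockLength = 0;
    } else if (inFence) {
      blockLength += 1;
    }
  }

  const warnings: string[] = [];
  if (lines.length > 300) warnings.push("Total length exceeds 300 lines");
  if (longCodeBlocks > 0) warnings.push(`${longCodeBlocks} code example(s) longer than 5 lines`);
  return { totalLines: lines.length, longCodeBlocks, warnings };
}

// Hypothetical task file fragment with one 6-line code example.
const sample = ["## Style Reference", FENCE + "ts", "a", "b", "c", "d", "e", "f", FENCE, "## Goal"].join("\n");
const bloat = checkBloat(sample);
```

The remaining signals (restated planning content, generic boilerplate) need a human or judge to assess, so a script like this can only flag the measurable ones.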
When deciding how to split, ask:
1. Can two agents propose meaningfully different approaches?
- If NO → Too small or too prescriptive
- If YES → Good task!
2. Can it run in parallel with other tasks?
- If NO → Check dependencies, might need different phase
- If YES → Good for velocity!
3. Does it have clear value when done?
- If NO → Might be too granular
- If YES → Shippable unit!
4. Can you test/verify it independently?
- If NO → Scope might be unclear
- If YES → Clear acceptance criteria!
5. Is it 2-8 hours for an experienced developer?
- If NO → Resize (split or combine)
- If YES → Perfect comparison window!
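The five questions can also be written down as a small review function, which is handy when triaging a long list of candidate tasks. The `SplitAnswers` shape and `reviewSplit` name are hypothetical, and the hour estimates in the usage example are invented for illustration.

```typescript
type SplitAnswers = {
  comparable: boolean;            // 1. Two agents could propose meaningfully different approaches
  parallelSafe: boolean;          // 2. Can run in parallel with other tasks
  clearValue: boolean;            // 3. Has clear value when done
  independentlyTestable: boolean; // 4. Can be tested/verified on its own
  estimatedHours: number;         // 5. Effort for an experienced developer
};

// Returns one piece of advice per failed question; an empty list means the split looks good.
function reviewSplit(answers: SplitAnswers): string[] {
  const advice: string[] = [];
  if (!answers.comparable) advice.push("Too small or too prescriptive");
  if (!answers.parallelSafe) advice.push("Check dependencies, might need a different phase");
  if (!answers.clearValue) advice.push("Might be too granular");
  if (!answers.independentlyTestable) advice.push("Scope might be unclear");
  if (answers.estimatedHours < 2) advice.push("Resize: combine with related work");
  else if (answers.estimatedHours > 8) advice.push("Resize: split into smaller tasks");
  return advice;
}

// "Fix typo": no room for different approaches and far below the 2-hour floor.
const typoFix = reviewSplit({
  comparable: false, parallelSafe: true, clearValue: true,
  independentlyTestable: true, estimatedHours: 0.25,
});

// "CRUD Factory Core": passes every question (the 6-hour estimate is an assumption).
const crudFactory = reviewSplit({
  comparable: true, parallelSafe: true, clearValue: true,
  independentlyTestable: true, estimatedHours: 6,
});
```

`typoFix` comes back with two pieces of advice, while `crudFactory` comes back empty.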
Phase 02 (Core Services):
- 9 tasks defined
- 8 ran in parallel (1 was integrator)
- Average task: ~300-500 LOC
- Synthesis: 5/9 tasks (56%)
- Duration: Completed in 4 days (would be 36 days sequential!)
Optimal task characteristics:
- Backend: 200-600 lines
- Frontend: 150-400 lines
- Config/Setup: 50-200 lines
- Tests included in estimate
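As a quick sanity check, the size ranges above can be encoded as a lookup. This is a sketch with hypothetical names (`TaskKind`, `sizeVerdict`); the only real data point used is the 323-line 02-error-handler task mentioned earlier, which is assumed here to count as a backend task.

```typescript
type TaskKind = "backend" | "frontend" | "config";

// Optimal line counts per task kind (tests included), taken from the list above.
const LOC_RANGES: Record<TaskKind, [number, number]> = {
  backend: [200, 600],
  frontend: [150, 400],
  config: [50, 200],
};

function sizeVerdict(kind: TaskKind, estimatedLoc: number): "too small" | "ok" | "too large" {
  const [min, max] = LOC_RANGES[kind];
  if (estimatedLoc < min) return "too small";
  if (estimatedLoc > max) return "too large";
  return "ok";
}

const errorHandler = sizeVerdict("backend", 323); // 02-error-handler from v2
const oversized = sizeVerdict("frontend", 900);   // invented estimate for a "Full frontend" task
```

A "too small" verdict suggests combining with a neighboring task, and "too large" suggests splitting, in line with the resize advice in the decision checklist.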
Step 1: Feature identified
"We need CRUD operations with tenancy"
Step 2: Check planning docs
planning/crud-factory.md exists? ✅
planning/db-conventions.md exists? ✅
Step 3: Define task scope
Task: Core factory pattern
Not: Specific resources (separate tasks)
Not: Server wiring (integrator task)
Step 4: Identify owned paths
Owns: apps/api/src/lib/crud/
Does NOT touch: server.ts, resource files
Step 5: Write task file
Use template, fill in requirements
Reference planning docs
List anti-patterns
Step 6: Run it!
cube auto implementation/phase-02/tasks/02-crud-factory-core.md
See: templates/task-template.md
Task: [Name]
Scope: [Clear boundary]
Owned paths: [Specific files]
Planning refs: [Which docs]
Time: [2-8 hours]
Testable: [Yes/No]
Parallel-safe: [Yes/No]
- Read: docs/PLANNING_GUIDE.md (architecture meetings → planning docs)
- Read: docs/PHASE_ORGANIZATION.md (how phases emerge)
- Review: v2 tasks in your-project/implementation/
- Practice: Split one of YOUR features into tasks
The art: Small enough to compare, large enough to matter!