"Archaeology over Creativity" — We extract the truth of what code does, grounded in static analysis, not hallucination.
When reverse-engineering specifications from existing code, the goal is documentation, not invention. We document what IS, not what SHOULD BE.
LLMs are creative by nature. Given incomplete information, they fill gaps with plausible-sounding content. This is dangerous when documenting existing systems:
- Invented features that don't exist
- Assumed behaviors that never happen
- Imagined integrations that aren't real
Every requirement and scenario in generated specs must trace back to actual code:
### Requirement: UserEmailValidation
The system SHALL validate email format before creating user accounts.
#### Scenario: InvalidEmailRejected
- **GIVEN** a registration request with email "not-an-email"
- **WHEN** the user creation endpoint is called
- **THEN** the request fails with validation error
## Technical Notes
- **Implementation**: `src/validators/email.ts:15-30`
- **Evidence**: Regex validation in `validateEmail()` function

The "Evidence" field is key: it points to the exact code that supports this requirement.
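One way to keep the requirement-to-code link explicit is to model it as data and refuse to render a requirement that has no evidence attached. The sketch below is only an illustration: the `Evidence` and `Requirement` shapes and the `renderTechnicalNotes` helper are hypothetical names, not part of any existing tool.

```typescript
// Hypothetical shapes for illustration; none of these names come from a real tool.
interface Evidence {
  file: string;      // path relative to the repository root
  startLine: number; // first line of the supporting code
  endLine: number;   // last line of the supporting code
  note: string;      // what the code at that location does
}

interface Requirement {
  name: string;
  statement: string;
  evidence: Evidence[];
}

// Render the "Technical Notes" block for a requirement.
// Throws when there is no evidence, so an untraceable requirement is never emitted.
function renderTechnicalNotes(req: Requirement): string {
  if (req.evidence.length === 0) {
    throw new Error(`Requirement ${req.name} has no code evidence`);
  }
  const lines: string[] = ["## Technical Notes"];
  for (const ev of req.evidence) {
    lines.push(`- **Implementation**: \`${ev.file}:${ev.startLine}-${ev.endLine}\``);
    lines.push(`- **Evidence**: ${ev.note}`);
  }
  return lines.join("\n");
}

// The email requirement from the example above, expressed in this shape.
const emailRequirement: Requirement = {
  name: "UserEmailValidation",
  statement: "The system SHALL validate email format before creating user accounts.",
  evidence: [
    {
      file: "src/validators/email.ts",
      startLine: 15,
      endLine: 30,
      note: "Regex validation in validateEmail() function",
    },
  ],
};

const emailTechnicalNotes = renderTechnicalNotes(emailRequirement);
```

Failing loudly on an empty `evidence` array is a design choice made for this sketch; a generator could just as reasonably drop the requirement or mark it for review.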
- ✅ "The system validates email format using regex pattern X"
- ✅ "User passwords are hashed with bcrypt (12 rounds)"
- ✅ "The API returns 404 for non-existent resources"

- ❌ "The system should validate email format"
- ❌ "Passwords should be securely hashed"
- ❌ "The API should handle errors gracefully"
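The difference between the two groups of statements above is largely one of wording: descriptive statements say what the code does, prescriptive ones say what it should do. A rough check for prescriptive phrasing might look like the sketch below. The word list and the `isPrescriptive` helper are assumptions made for illustration, and a keyword match cannot judge whether a statement is actually backed by code.

```typescript
// Phrases that usually signal a wish rather than an observation.
// This short list is an illustrative assumption, not an exhaustive rule.
// "SHALL" is deliberately not flagged, since the requirement format uses it.
const PRESCRIPTIVE_PATTERN = /\b(?:should|ought to)\b/i;

// True when a statement reads as a recommendation instead of a description.
function isPrescriptive(statement: string): boolean {
  return PRESCRIPTIVE_PATTERN.test(statement);
}

// Return the statements that need rewording or removal.
function findPrescriptive(statements: string[]): string[] {
  return statements.filter(isPrescriptive);
}

const flaggedStatements = findPrescriptive([
  "User passwords are hashed with bcrypt (12 rounds)",
  "Passwords should be securely hashed",
]);
```

Here `flaggedStatements` contains only the second statement, which would then be rewritten against the code or dropped.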
When code behavior is ambiguous, say so:
### Requirement: SessionTimeout
The system appears to implement session timeouts, though the exact duration is not clearly defined in code.
**Confidence**: Low
**Evidence**: `src/middleware/auth.ts` references `SESSION_TIMEOUT` but value not found

If you can't verify it from code, don't include it. Missing documentation is better than wrong documentation.
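The `SESSION_TIMEOUT` case can be checked mechanically: is the name only referenced, or is it also defined somewhere in the scanned sources? The following is a minimal sketch under several assumptions: sources are passed in as an in-memory map of path to contents, the name is a plain identifier, and a definition looks like a `const`/`let`/`var` assignment. The `inspectConstant` helper and the sample file contents are hypothetical.

```typescript
type SourceFiles = Record<string, string>; // path -> file contents

interface ConstantFinding {
  referencedIn: string[]; // files that use the name without defining it
  definedIn: string[];    // files that assign a value to the name
  confidence: "High" | "Low";
}

// Assumes `name` is a plain identifier, so it is safe to embed in a RegExp.
function inspectConstant(files: SourceFiles, name: string): ConstantFinding {
  const reference = new RegExp(`\\b${name}\\b`);
  const definition = new RegExp(`\\b(?:const|let|var)\\s+${name}\\s*(?::[^=]+)?=`);
  const referencedIn: string[] = [];
  const definedIn: string[] = [];
  for (const path of Object.keys(files)) {
    const text = files[path];
    if (definition.test(text)) {
      definedIn.push(path);
    } else if (reference.test(text)) {
      referencedIn.push(path);
    }
  }
  // Without a definition the value cannot be stated, so confidence stays low.
  const confidence = definedIn.length > 0 ? "High" : "Low";
  return { referencedIn, definedIn, confidence };
}

// Hypothetical file contents: the name is used but never assigned.
const sessionTimeoutFinding = inspectConstant(
  { "src/middleware/auth.ts": "if (idleMs > SESSION_TIMEOUT) { endSession(); }" },
  "SESSION_TIMEOUT",
);
```

In this example the finding lists one referencing file, no defining file, and a confidence of `"Low"`, which matches the hedged wording in the requirement above. Values that come from environment variables or configuration files would need their own lookup.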

Start with a survey of the project:
- Directory structure
- File types and naming patterns
- Dependencies and frameworks
- Entry points
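As a rough illustration of such a survey, the sketch below summarizes a list of file paths into top-level directories, file-type counts, and likely entry points. It is a simplification: it works on paths only (no file system access, no dependency detection), and the `surveyPaths` helper and the list of entry-point file names are assumptions.

```typescript
interface SurveySummary {
  topLevelDirs: string[];
  extensionCounts: Record<string, number>;
  entryPointCandidates: string[];
}

// File names commonly used as entry points; this list is an assumption.
const ENTRY_POINT_NAMES = ["index.ts", "main.ts", "app.ts", "server.ts"];

// Summarize a list of repository-relative paths that use "/" as separator.
function surveyPaths(paths: string[]): SurveySummary {
  const dirs = new Set<string>();
  const extensionCounts: Record<string, number> = {};
  const entryPointCandidates: string[] = [];
  for (const path of paths) {
    const segments = path.split("/");
    if (segments.length > 1) {
      dirs.add(segments[0]);
    }
    const fileName = segments[segments.length - 1];
    const dot = fileName.lastIndexOf(".");
    // Dotfiles such as ".gitignore" are counted as having no extension.
    const ext = dot > 0 ? fileName.slice(dot) : "(none)";
    extensionCounts[ext] = (extensionCounts[ext] || 0) + 1;
    if (ENTRY_POINT_NAMES.indexOf(fileName) !== -1) {
      entryPointCandidates.push(path);
    }
  }
  return {
    topLevelDirs: Array.from(dirs).sort(),
    extensionCounts,
    entryPointCandidates,
  };
}

const exampleSurvey = surveyPaths([
  "src/index.ts",
  "src/validators/email.ts",
  "src/middleware/auth.ts",
  "package.json",
]);
```

The entry-point candidates are only candidates; they still need to be confirmed against the build configuration or package manifest before being documented.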

Then analyze the code in depth:
- Parse high-value files
- Extract entities and relationships
- Identify operations and flows
- Map dependencies
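The dependency-mapping step, for instance, can be approximated by collecting static `import` statements per file. This is a deliberately small sketch: it uses a regular expression instead of a real parser, ignores `require()` and dynamic `import()`, and can be fooled by import-like text inside block comments or strings. The `extractImports` and `mapDependencies` helpers are hypothetical names.

```typescript
// Collect module specifiers from static import statements in one source file.
// Handles `import ... from "x"` and bare `import "x"`, including multi-line forms.
function extractImports(source: string): string[] {
  // A fresh RegExp per call avoids shared lastIndex state from the "g" flag.
  const pattern = /^\s*import\s+(?:[^"']*?\s+from\s+)?["']([^"']+)["']/gm;
  const found: string[] = [];
  let match: RegExpExecArray | null = pattern.exec(source);
  while (match !== null) {
    found.push(match[1]);
    match = pattern.exec(source);
  }
  return found;
}

// Build a map of file path -> imported module specifiers.
function mapDependencies(files: Record<string, string>): Record<string, string[]> {
  const graph: Record<string, string[]> = {};
  for (const path of Object.keys(files)) {
    graph[path] = extractImports(files[path]);
  }
  return graph;
}

// Hypothetical file contents used only to exercise the helpers.
const exampleGraph = mapDependencies({
  "src/services/user.ts": 'import { validateEmail } from "../validators/email";\nimport "./setup";',
  "src/validators/email.ts": "export function validateEmail(value: string) { return value.length > 0; }",
});
```

For anything beyond a quick overview, a parser-based approach is the safer choice, because the resulting dependency map becomes evidence that later requirements cite.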

When generating specs:
- Only document verified behavior
- Link every requirement to code
- Note confidence levels
- Flag ambiguities

Signs of high confidence:
- Clear code evidence
- Explicit behavior in source
- Test coverage confirms behavior
- Comments/docs align with code

Signs of low confidence:
- Behavior inferred from patterns
- No direct code evidence
- Missing test coverage
- Conflicting signals

When in doubt:
- Skip it — Incomplete is better than wrong
- Flag it — Mark as "needs verification"
- Ask — Human review is valuable
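The confidence indicators listed above can be turned into a small, explicit rule so that the level recorded in a spec is reproducible. The sketch below takes a strict reading in which every high-confidence indicator must hold and any low-confidence indicator forces `"Low"`; that rule, the `EvidenceSignals` shape, and the `assessConfidence` helper are assumptions for illustration. What to do with a low result (skip, flag, or ask) remains a human decision.

```typescript
// One flag per indicator; how each flag gets set is outside this sketch.
interface EvidenceSignals {
  hasDirectCodeEvidence: boolean; // explicit behavior visible in source
  coveredByTests: boolean;        // a test confirms the behavior
  docsAgreeWithCode: boolean;     // comments/docs align with the code
  inferredFromPatterns: boolean;  // behavior guessed from conventions
  hasConflictingSignals: boolean; // sources disagree with each other
}

type Confidence = "High" | "Low";

// Strict rule (an assumption): a single doubt is enough to report "Low".
function assessConfidence(signals: EvidenceSignals): Confidence {
  const anyLowIndicator =
    !signals.hasDirectCodeEvidence ||
    !signals.coveredByTests ||
    signals.inferredFromPatterns ||
    signals.hasConflictingSignals;
  if (anyLowIndicator) {
    return "Low";
  }
  return signals.docsAgreeWithCode ? "High" : "Low";
}

// Hypothetical signals for a behavior that is visible in source but untested.
const untestedBehavior = assessConfidence({
  hasDirectCodeEvidence: true,
  coveredByTests: false,
  docsAgreeWithCode: true,
  inferredFromPatterns: false,
  hasConflictingSignals: false,
});
```

Here `untestedBehavior` evaluates to `"Low"` because test coverage is missing, so the requirement would carry a "needs verification" note or go to human review.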
Generated specs should be:
- Accurate — Every statement is verifiable
- Useful — Provides real understanding
- Maintainable — Easy to update as code changes
- Honest — Acknowledges limitations
Remember: We're archaeologists documenting ancient ruins, not architects designing new buildings. Our job is to understand and record what exists, not to imagine what could be.