| cypilot | true |
|---|---|
| type | requirement |
| name | Reverse Engineering Methodology |
| version | 1.0 |
| purpose | Technology-agnostic methodology for systematic project analysis |
- Agent Instructions
- Overview
- Layer Map
- L1: Surface Reconnaissance
- L2: Entry Point Analysis
- L3: Structural Decomposition
- L4: Data Flow Tracing
- L5: Dependency Mapping
- L6: State Management Analysis
- L7: Integration Boundary Scan
- L8: Pattern Recognition
- L9: Knowledge Synthesis
- Execution Protocol
- Error Handling
- Consolidated Validation Checklist
- References
Scope: Any software project regardless of language, framework, or architecture.
ALWAYS open and follow this file WHEN the user asks to analyze a codebase, search project code or docs, or generate artifacts/code from existing project structure.
ALWAYS open and follow {cypilot_path}/.core/requirements/execution-protocol.md for workflow context.
Prerequisites: confirm the agent has read this methodology, has repository access, will execute layers 1 -> 9 in order, and will checkpoint after each layer.
Reverse engineering builds a progressive mental model of a system. The rule is: observe patterns, not technologies. Every project reveals structure through entry points, organization, data movement, dependency direction, state transitions, and boundary behavior.
| Layer | Question |
|---|---|
| L1 | What does the repository look like before reading code? |
| L2 | Where and how does execution begin? |
| L3 | How is code organized into logical units? |
| L4 | How does data move through the system? |
| L5 | What depends on what? |
| L6 | How is state created, modified, and persisted? |
| L7 | Where does the system touch the outside world? |
| L8 | What patterns and conventions recur? |
| L9 | What knowledge should be carried forward? |
L1: Surface Reconnaissance. Goal: form initial impressions without deep code reading.
| Area | Required checks |
|---|---|
| Repository structure | List top-level directories; identify standard names (src, lib, pkg, app, cmd, internal, test, docs); identify non-standard/domain directories; note naming convention (kebab-case, snake_case, camelCase, PascalCase); note hidden directories (.git, .github, .vscode, .idea); note config directories (config, settings, env). |
| File inventory | List top-level files; identify config files (package.json, pyproject.toml, Cargo.toml, go.mod, pom.xml, build.gradle, *.csproj); doc files (README, CHANGELOG, CONTRIBUTING, LICENSE); CI/CD files (.github/workflows, .gitlab-ci.yml, Jenkinsfile, .circleci); container/infra files (Dockerfile, docker-compose.yml, k8s/, terraform/); editor config (.editorconfig, .prettierrc, .eslintrc, .rubocop.yml). |
| Git history | Check repository age, recent activity, most active directories, stale directories, contributor count, and commit message patterns (conventional commits, ticket references). |
| Language detection | Scan extensions (.ts, .js, .py, .rs, .go, .java, .cs, .rb, .php, .kt, .swift, .cpp, .c); count files per extension; identify primary language in source dirs; identify secondary languages in scripts, tests, or tools. |
| Multi-language patterns | Check for polyglot layout, FFI/bindings, generated code (protobuf, GraphQL codegen, ORM models), and DSLs (SQL, templates, config schemas). |
| Explicit docs | Read README.md; inspect docs/; look for architecture docs (ARCHITECTURE.md, ADR/, decisions); API docs (openapi.yml, swagger.json, Postman); inline docs (docstrings, JSDoc, rustdoc). |
| Implicit docs | Analyze test names, type definitions, error messages, and log statements. |
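
The language-detection check in the table above can be sketched as a short script. This is a minimal illustration under stated assumptions, not a prescribed tool: the names `count_extensions`, `primary_extension`, and the `SKIP_DIRS` list are invented for the example and are not part of the methodology.

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path

# Directories ignored during the scan. This list is an assumption for the
# example; adjust it to the repository being analyzed.
SKIP_DIRS = {".git", "node_modules", "vendor", "dist", "build", "__pycache__"}


def count_extensions(root: str) -> Counter:
    """Count files per lowercase extension under root, ignoring SKIP_DIRS."""
    counts: Counter = Counter()
    root_path = Path(root)
    for path in root_path.rglob("*"):
        # Only inspect path parts below root, so the root's own location
        # on disk never triggers a skip.
        rel_parts = path.relative_to(root_path).parts
        if any(part in SKIP_DIRS for part in rel_parts):
            continue
        if path.is_file() and path.suffix:
            counts[path.suffix.lower()] += 1
    return counts


def primary_extension(counts: Counter, source_exts: set[str]) -> str | None:
    """Return the most frequent extension among source_exts, or None."""
    candidates = {ext: n for ext, n in counts.items() if ext in source_exts}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```

Restricting `primary_extension` to a set of source extensions keeps documentation and config files (for example `.md` or `.json`) from being reported as the primary language.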
L2: Entry Point Analysis. Goal: understand where execution starts and how bootstrap reaches business logic.
| Area | Required checks |
|---|---|
| Main entry points | Search language-specific entry patterns: Go func main() in main.go / cmd/*/main.go; Python if __name__ == "__main__" or __main__.py; Node.js main in package.json, index.js, app.js, server.js; Java public static void main or @SpringBootApplication; Rust fn main() in src/main.rs or src/bin/; C# static void Main or Program.cs; Ruby script files, config.ru, Rakefile. |
| Multiple entry points | Check for CLI subcommands or multiple binaries, workers/background jobs, scheduled tasks, event handlers/webhooks/serverless functions, and migration scripts. |
| HTTP entry points | Find route definitions; list endpoints with methods; identify middleware chains (auth, logging, rate limiting); map URL patterns to handlers. |
| Event entry points | Find queue consumers, event listeners, scheduled jobs, file watchers, and stream processors. |
| CLI entry points | Find command definitions (argparse, cobra, clap, commander), list commands/subcommands, identify hierarchy. |
| Bootstrap sequence | Trace entry point to first business logic; identify config loading, DI/service container setup, DB connection init, external client init, middleware/interceptor registration, and initialization-order dependencies. |
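
The "Main entry points" search can be approximated with per-extension regular expressions. The sketch below is illustrative only: the regexes are rough approximations of the patterns named in the table, and `ENTRY_PATTERNS` and `find_entry_points` are hypothetical names. It takes file contents as a dictionary so it can be tried without touching the filesystem.

```python
from __future__ import annotations

import os
import re

# Extension -> regex for a conventional entry point. These are approximate,
# illustrative detectors, not exhaustive ones.
ENTRY_PATTERNS = {
    ".go": re.compile(r"^func\s+main\s*\(\s*\)", re.MULTILINE),
    ".py": re.compile(r"^if\s+__name__\s*==\s*[\"']__main__[\"']\s*:", re.MULTILINE),
    ".java": re.compile(r"public\s+static\s+void\s+main\s*\("),
    ".rs": re.compile(r"^\s*(?:pub\s+)?(?:async\s+)?fn\s+main\s*\(\s*\)", re.MULTILINE),
    ".cs": re.compile(r"static\s+(?:async\s+)?(?:void|int|Task)\s+Main\s*\("),
}


def find_entry_points(files: dict[str, str]) -> list[str]:
    """Return sorted paths whose text matches the entry pattern for their extension."""
    hits = []
    for path, text in files.items():
        pattern = ENTRY_PATTERNS.get(os.path.splitext(path)[1])
        if pattern is not None and pattern.search(text):
            hits.append(path)
    return sorted(hits)


# Hypothetical repository contents, invented for the example.
example_files = {
    "cmd/api/main.go": "package main\n\nfunc main() {\n}\n",
    "tool.py": 'def run():\n    pass\n\nif __name__ == "__main__":\n    run()\n',
    "lib/util.py": "def helper():\n    return 1\n",
}
print(find_entry_points(example_files))
```

A regex match only nominates a candidate; each hit still needs to be read to confirm it is a real entry point and to trace the bootstrap sequence from it.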
L3: Structural Decomposition. Goal: understand how code is grouped and what each unit owns.
| Area | Required checks |
|---|---|
| Architecture pattern | Identify the dominant pattern: Layered, Hexagonal / Ports & Adapters, Clean Architecture, Microservices, Monolith, Modular Monolith, Event-Driven, or Serverless. |
| Module boundaries | Identify top-level modules/packages; map each module responsibility in one sentence; identify module dependencies; check for circular deps; identify shared/common modules; identify vendor or third-party wrappers. |
| Grouping strategy | Determine whether code is grouped by layer, spec/feature, domain, or a hybrid. |
| File organization | Identify naming patterns; file-per-class vs file-per-module usage; index/barrel files; test file locations (adjacent, separate, nested). |
| Core components | For each module, record module name/location, primary responsibility, public interface, key dependencies, persistence involvement, and external integrations. |
| Cross-cutting components | Identify logging, error handling, configuration management, security/authentication, caching, and validation infrastructure. |
L4: Data Flow Tracing. Goal: explain how requests, commands, or events transform data.
| Area | Required checks |
|---|---|
| Representative flows | Trace 3-5 operations from entry point through input validation/transformation, business logic, persistence, external calls, response construction, and error paths. |
| Data transformations | For each traced flow, record input shape, intermediate shapes, output shape, and side effects (what persists, what notifies). |
| Domain entities | Identify core entities; for each, record definition location, key attributes, relationships, invariants/validation rules, and lifecycle states if stateful. |
| DTOs | Identify request/response DTO patterns, DTO-to-entity transformations, serialization formats (JSON, protobuf, XML), and versioning patterns. |
| Storage technologies | Identify databases, file storage, caches, and search indices. |
| Data access patterns | Identify ORM/query builders, raw SQL, repository/DAO patterns, database migrations (location/tool), and seed data (location/format). |
L5: Dependency Mapping. Goal: make dependency direction and replaceability visible.
| Area | Required checks |
|---|---|
| Internal dependency graph | Build the module import graph; identify acyclic core, hubs, leaf modules, and whether dependency direction is inverted (lower layers depending on higher layers). |
| Dependency injection | Identify DI container/framework, service registration patterns, injection style (constructor/property/method), and interface-to-implementation bindings. |
| Third-party libraries | List direct dependencies from the package manager; categorize by framework, database/ORM, HTTP client, serialization, validation, testing, and utilities; identify critical dependencies, outdated/deprecated dependencies, and security vulnerabilities. |
| External services | Identify external API calls; for each service record name/purpose, client location, authentication method, error handling, and retry/resilience patterns. |
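
Once the module import graph exists, the checks for circular dependencies, hubs, and leaf modules reduce to simple graph operations. The following is a minimal sketch assuming the graph has already been extracted into a dictionary that maps each module to the modules it imports; the function names and the example module names are hypothetical.

```python
from __future__ import annotations


def find_cycle(graph: dict[str, set[str]]) -> list[str] | None:
    """Return one dependency cycle as a module list, or None if the graph is acyclic."""
    visiting: set[str] = set()  # modules on the current DFS path
    done: set[str] = set()      # modules fully explored
    path: list[str] = []

    def visit(node: str) -> list[str] | None:
        visiting.add(node)
        path.append(node)
        for dep in sorted(graph.get(node, ())):
            if dep in visiting:
                # Back edge: the cycle runs from dep's position on the path to here.
                return path[path.index(dep):] + [dep]
            if dep not in done:
                cycle = visit(dep)
                if cycle is not None:
                    return cycle
        path.pop()
        visiting.discard(node)
        done.add(node)
        return None

    for node in sorted(graph):
        if node not in done:
            cycle = visit(node)
            if cycle is not None:
                return cycle
    return None


def fan_in(graph: dict[str, set[str]]) -> dict[str, int]:
    """Count how many modules import each module; high counts suggest hubs."""
    counts = {node: 0 for node in graph}
    for deps in graph.values():
        for dep in deps:
            counts[dep] = counts.get(dep, 0) + 1
    return counts


def leaf_modules(graph: dict[str, set[str]]) -> list[str]:
    """Modules that import no other internal module."""
    return sorted(node for node, deps in graph.items() if not deps)
```

For example, with `{"api": {"service"}, "service": {"repo", "model"}, "repo": {"model"}, "model": set()}`, `find_cycle` returns `None`, `leaf_modules` returns `["model"]`, and `fan_in` reports `model` as imported by two modules.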
L6: State Management Analysis. Goal: explain where state lives and how it changes.
| Area | Required checks |
|---|---|
| In-memory state | Identify singleton/global state, request-scoped state, cached state, and runtime configuration state. |
| State lifecycle | Explain how state is initialized, accessed, modified (mutation vs immutable updates), and invalidated/cleared. |
| Database state | Record schema definition location, migration history, index definitions, constraint definitions, and trigger/stored-procedure usage. |
| State machines | Identify stateful entities (status fields, enums); record valid states, allowed transitions, transition triggers, and transition side effects. |
| Session/user state | Record where session state is stored (cookie, JWT, server), what data is stored, and how expiration/cleanup works. |
| Distributed coordination | Check for distributed locks, leader election, distributed caching, and event-sourcing / CQRS patterns. |
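
The "State machines" row asks for valid states, allowed transitions, and triggers. One possible way to record them, sketched here with a hypothetical `StateMachineRecord` type and an invented `Order` entity, also makes it easy to spot states that no recorded transition reaches.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StateMachineRecord:
    """Valid states and allowed transitions for one stateful entity."""

    entity: str
    states: set[str]
    # (from_state, to_state) -> trigger that causes the transition
    transitions: dict[tuple[str, str], str] = field(default_factory=dict)

    def allows(self, source: str, target: str) -> bool:
        """True if a direct transition from source to target was recorded."""
        return (source, target) in self.transitions

    def unreachable_from(self, initial: str) -> set[str]:
        """States that no chain of recorded transitions reaches from initial."""
        seen = {initial}
        frontier = [initial]
        while frontier:
            current = frontier.pop()
            for source, target in self.transitions:
                if source == current and target not in seen:
                    seen.add(target)
                    frontier.append(target)
        return self.states - seen


# Hypothetical entity; the states and triggers are invented for illustration.
order = StateMachineRecord(
    entity="Order",
    states={"pending", "paid", "shipped", "cancelled", "archived"},
    transitions={
        ("pending", "paid"): "payment_received",
        ("paid", "shipped"): "shipment_created",
        ("pending", "cancelled"): "user_cancelled",
    },
)
print(order.unreachable_from("pending"))
```

An unreachable state such as `archived` in this example is worth recording as a knowledge gap: either a transition was missed during analysis or the state is dead code.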
L7: Integration Boundary Scan. Goal: map inbound, outbound, and infrastructure boundaries.
| Area | Required checks |
|---|---|
| Inbound boundaries | Catalog public APIs (HTTP REST, GraphQL, gRPC, WebSocket, webhook receivers), internal APIs (service-to-service, admin/management, health/metrics), and async inputs (queue consumers, event subscribers, scheduled jobs, file watchers). |
| External APIs | For each external API record service name/purpose, base URL config, auth/authz, request/response formats, timeout config, retry policy, circuit breaker presence, and fallback behavior. |
| Database connections | For each database record type/version, connection string location, pool config, read/write split, and replica usage. |
| External outputs | Identify queue publishing, email/SMS sending, file storage writes, and outbound notification webhooks. |
| Runtime boundary | Record container base image, runtime config, environment variables, and secrets management. |
| Network boundary | Record port bindings, host configuration, TLS/SSL setup, and proxy configuration. |
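
The "External APIs" row lists several fields to capture per service. A small record type can make missing resilience settings visible at a glance. This is a sketch only: `ExternalApiRecord`, its field names, and the example values are assumptions chosen to mirror the table, not a required schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ExternalApiRecord:
    """Findings for one outbound API, mirroring the fields in the table above."""

    service: str
    purpose: str
    base_url_config: str  # where the base URL is configured, e.g. an env var name
    auth: str
    timeout_seconds: float | None = None
    retry_policy: str | None = None
    has_circuit_breaker: bool = False
    fallback: str | None = None

    def resilience_gaps(self) -> list[str]:
        """List the resilience settings that were not found during analysis."""
        gaps = []
        if self.timeout_seconds is None:
            gaps.append("timeout config")
        if self.retry_policy is None:
            gaps.append("retry policy")
        if not self.has_circuit_breaker:
            gaps.append("circuit breaker")
        if self.fallback is None:
            gaps.append("fallback behavior")
        return gaps


# Hypothetical service, invented for the example.
payments = ExternalApiRecord(
    service="payments",
    purpose="card charges",
    base_url_config="PAYMENTS_BASE_URL",
    auth="API key header",
    timeout_seconds=5.0,
)
print(payments.resilience_gaps())
```

The returned gaps feed naturally into the technical debt and knowledge gap outputs of L9.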
L8: Pattern Recognition. Goal: identify conventions and repeated implementation idioms.
| Area | Required checks |
|---|---|
| Creational patterns | Factory patterns, Builder patterns, Singleton patterns, dependency injection patterns. |
| Structural patterns | Adapter/wrapper patterns, Decorator patterns, Facade patterns, Proxy patterns. |
| Behavioral patterns | Strategy patterns, Observer patterns, Command patterns, State patterns. |
| Naming conventions | Variable naming, function naming, class/type naming, file naming, directory naming. |
| Code style | Indentation, line-length limits, import organization, comment style, documentation format. |
| Error handling conventions | Exception vs result types, error message format, error code patterns, logging on errors, propagation strategy. |
| Test organization | Test file location, naming convention, test structure (describe/it, given/when/then, arrange/act/assert), setup/teardown, fixture patterns. |
| Test types | Unit, integration, E2E, test data management, mocking/stubbing patterns. |
L9: Knowledge Synthesis. Goal: turn findings into reusable knowledge.
| Output | Required content |
|---|---|
| System overview paragraph | Primary purpose, key technologies, architectural style, major components, and primary data flows. |
| Component map | All major components, relationships, data-flow directions, and integration points. |
| Domain model summary | Entities with one-line descriptions, relationship summary, and key business rules/invariants. |
| Key operations | Critical business operations, operation-to-entry-point mapping, and data-flow summary for each. |
| Technical debt & risks | Circular dependencies, overly complex modules, inconsistent patterns, missing error handling, security concerns, performance concerns. |
| Knowledge gaps | Areas not fully understood, missing docs, unclear business logic, untested code paths. |
| Developer entry points | Where to start reading, key files first, critical abstractions, common modification patterns. |
| Operations entry points | Deployment process, configuration options, monitoring/alerting setup, troubleshooting guides. |
Before starting: confirm source access, search capability, read permissions, and optionally local-run access or a running instance.
Order: execute layers 1 -> 9, checkpoint after each layer, and carry findings forward.
Time box by project size:
| Project Size | L1-L2 | L3-L4 | L5-L7 | L8-L9 |
|---|---|---|---|---|
| Small (< 10k LOC) | 15 min | 30 min | 30 min | 15 min |
| Medium (10k-100k LOC) | 30 min | 1 hr | 1 hr | 30 min |
| Large (> 100k LOC) | 1 hr | 2 hr | 2 hr | 1 hr |
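
The time-box table above can be expressed as a lookup keyed on lines of code. This is a minimal sketch; the function name `time_box_minutes` is hypothetical, and the treatment of the exact boundaries (10k counts as Medium, 100k counts as Medium) is an assumption read from the table's ranges.

```python
from __future__ import annotations

# Minutes per layer group, copied from the time-box table.
TIME_BOXES = {
    "small": {"L1-L2": 15, "L3-L4": 30, "L5-L7": 30, "L8-L9": 15},
    "medium": {"L1-L2": 30, "L3-L4": 60, "L5-L7": 60, "L8-L9": 30},
    "large": {"L1-L2": 60, "L3-L4": 120, "L5-L7": 120, "L8-L9": 60},
}


def project_size(loc: int) -> str:
    """Classify a project by lines of code using the table's thresholds."""
    if loc < 0:
        raise ValueError("loc must be non-negative")
    if loc < 10_000:
        return "small"
    if loc <= 100_000:
        return "medium"
    return "large"


def time_box_minutes(loc: int) -> dict[str, int]:
    """Return the per-group time box, in minutes, for a project of the given size."""
    return dict(TIME_BOXES[project_size(loc)])
```

Summing the values gives the total budget: 90 minutes for a small project, 180 for a medium one, and 360 for a large one.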
Required output artifacts: System Overview (max 1 page: purpose, tech stack, architecture style, key components and relationships), Domain Model, Entry Points Catalog, Integration Map, Conventions Guide, and Technical Debt List.
Applicability: greenfield validation before implementation; brownfield understanding before modification; acquisitions/transfers for due diligence and onboarding; legacy modernization to find strangler boundaries; documentation generation as input to Cypilot artifacts.
Integration with Cypilot: Adapter workflow uses L1-L3 for project scan; Generate workflow uses all layers for artifact creation; Validate workflow uses L4-L7 for traceability verification.
| Condition | Response | Action |
|---|---|---|
| Repository access failed | Repository access failed: {error}; check file permissions, verify the path exists, confirm VCS access. | STOP: source access is required. |
| Layer incomplete | Layer {N} incomplete: {reason}; record completed items, skipped items, and blocker. | Document gaps explicitly and continue with caveat. |
| External dependencies unavailable | External dependency unavailable: {service}; note unverifiable integration patterns, auth, and data formats; mark boundary UNVERIFIED. | WARN and continue as a knowledge gap. |
| Large codebase timeout | Time box exceeded for Layer {N}; record completion percentage and whether to resume later or proceed with partial findings. | Save a checkpoint, note incompleteness, proceed. |
| Obfuscated/generated code | Obfuscated/generated code detected: {location}; skip generated output and analyze source templates/generators instead. | Analyze generators/templates, not generated output. |
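
The table above distinguishes one condition that halts analysis from four that are recorded and carried forward. A sketch of that split follows; the condition keys and the `AnalysisLog` type are hypothetical identifiers introduced for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical keys for the rows of the error-handling table.
STOP_CONDITIONS = {"repository_access_failed"}
CONTINUE_CONDITIONS = {
    "layer_incomplete",
    "external_dependency_unavailable",
    "time_box_exceeded",
    "generated_code_detected",
}


@dataclass
class AnalysisLog:
    """Collects knowledge gaps and records whether analysis had to stop."""

    gaps: list[str] = field(default_factory=list)
    halted: bool = False

    def report(self, condition: str, detail: str) -> bool:
        """Record a condition; return True if analysis may continue."""
        if condition in STOP_CONDITIONS:
            self.halted = True
            return False
        if condition in CONTINUE_CONDITIONS:
            # Continue, but keep the caveat so it reaches the Knowledge gaps output.
            self.gaps.append(f"{condition}: {detail}")
            return True
        raise ValueError(f"unknown condition: {condition}")
```

Raising on an unknown key, rather than silently continuing, keeps new failure modes from being dropped without a documented response.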
Use this single checklist for all reverse-engineering validation.

Surface Analysis (L1-L2):

| # | Check | Required | How to Verify |
|---|---|---|---|
| L1.1 | Repository structure documented | YES | Directory tree captured |
| L1.2 | Primary language identified | YES | File-extension counts analyzed |
| L1.3 | Documentation inventory complete | YES | README, docs/, ADRs listed |
| L2.1 | Main entry points identified | YES | Entry files/functions listed |
| L2.2 | Bootstrap sequence traced | YES | Initialization order documented |

Structural Analysis (L3-L4):

| # | Check | Required | How to Verify |
|---|---|---|---|
| L3.1 | Architecture pattern identified | YES | Pattern named with evidence |
| L3.2 | Module boundaries mapped | YES | Modules listed with responsibilities |
| L3.3 | Component inventory complete | YES | Core + cross-cutting components listed |
| L4.1 | Representative flows traced | YES | 3-5 flows documented entry-to-exit |
| L4.2 | Domain entities identified | YES | Entities with attributes listed |
| L4.3 | Persistence layer documented | YES | Storage technologies and patterns noted |

Dependency Analysis (L5-L6):

| # | Check | Required | How to Verify |
|---|---|---|---|
| L5.1 | Module dependency graph built | YES | Import relationships mapped |
| L5.2 | External dependencies cataloged | YES | Libraries categorized by purpose |
| L5.3 | External services documented | YES | API calls with auth/error handling noted |
| L6.1 | Application state locations identified | YES | Global, request-scoped, cached state listed |
| L6.2 | State machines documented | CONDITIONAL | If stateful entities exist, transitions mapped |
| L6.3 | Distributed state patterns noted | CONDITIONAL | If distributed, coordination mechanisms listed |

Integration Analysis (L7-L8):

| # | Check | Required | How to Verify |
|---|---|---|---|
| L7.1 | Inbound boundaries cataloged | YES | APIs, consumers, triggers listed |
| L7.2 | Outbound boundaries cataloged | YES | External calls, databases, outputs listed |
| L7.3 | Infrastructure boundaries noted | YES | Container, network, secrets documented |
| L8.1 | Code patterns identified | YES | Creational, structural, behavioral patterns listed |
| L8.2 | Project conventions documented | YES | Naming, style, error-handling patterns noted |
| L8.3 | Testing conventions documented | YES | Test organization and patterns noted |

Synthesis (L9):

| # | Check | Required | How to Verify |
|---|---|---|---|
| L9.1 | System overview produced | YES | Single-paragraph description complete |
| L9.2 | Component map produced | YES | Visual or textual map created |
| L9.3 | Domain model summarized | YES | Entities and relationships listed |
| L9.4 | Technical debt identified | YES | Issues and risks documented |
| L9.5 | Knowledge gaps listed | YES | Unclear areas explicitly noted |
| L9.6 | Entry points summary for developers | YES | Where to start reading documented |

Final (F):

| # | Check | Required | How to Verify |
|---|---|---|---|
| F.1 | All Surface Analysis checks pass | YES | L1.1-L2.2 verified |
| F.2 | All Structural Analysis checks pass | YES | L3.1-L4.3 verified |
| F.3 | All Dependency Analysis checks pass | YES | L5.1-L6.3 verified, conditionals where applicable |
| F.4 | All Integration Analysis checks pass | YES | L7.1-L8.3 verified |
| F.5 | All Synthesis checks pass | YES | L9.1-L9.6 verified |
| F.6 | Output artifacts produced | YES | Six artifacts from Execution Protocol created |
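
The checklist mixes checks that are always required with CONDITIONAL ones that apply only when the relevant feature exists. One way to evaluate it is sketched below, assuming each check result is recorded with an explicit applicability flag; the `Check` type and `blocking_failures` function are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    check_id: str
    required: str            # "YES" or "CONDITIONAL", as in the Required column
    passed: bool
    applicable: bool = True  # only meaningful for CONDITIONAL checks


def blocking_failures(checks: list[Check]) -> list[str]:
    """Return IDs of failed checks, skipping CONDITIONAL checks that do not apply."""
    blocking = []
    for check in checks:
        if check.required == "CONDITIONAL" and not check.applicable:
            continue
        if not check.passed:
            blocking.append(check.check_id)
    return blocking


# Hypothetical results for a project with no distributed state.
results = [
    Check("L6.1", "YES", passed=True),
    Check("L6.2", "CONDITIONAL", passed=False, applicable=True),
    Check("L6.3", "CONDITIONAL", passed=False, applicable=False),
]
print(blocking_failures(results))
```

In this example only `L6.2` blocks validation: the project has stateful entities whose transitions were not mapped, while the distributed-state check does not apply.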
- Generate workflow: {cypilot_path}/.core/workflows/generate.md
- Execution protocol: {cypilot_path}/.core/requirements/execution-protocol.md