| cypilot | true |
|---|---|
| type | requirement |
| name | Bug-Finding Methodology |
| version | 1.0 |
| purpose | Compact language-agnostic methodology for high-recall bug discovery in source code |
Scope: Source-code analysis for correctness, logic, reliability, security, concurrency, performance, and integration defects across programming languages.
Non-goal: guarantee 100% bug detection. That is not achievable in the general case because programs depend on incomplete specifications, runtime environment, dynamic inputs, external systems, and nondeterministic behavior. The practical target is maximum recall with explicit uncertainty, evidence, and escalation paths.
- Combine complementary signals. Pattern rules, semantic reasoning, data-flow review, dynamic checks, and historical evidence catch different bug classes.
- Optimize for recall first, then calibrate precision with evidence. Missing a bug is usually worse than investigating one plausible candidate.
- When recall and compactness conflict, keep the smallest slice that still covers the full invariant or boundary under test. Expand immediately when a plausible bug depends on unseen callers, callees, shared state, config, or cross-language contracts.
- Work from invariants and failure modes, not from language syntax alone. This keeps the method portable across languages.
- Never claim
all bugs found. Report confidence and residual uncertainty explicitly. - Load only the code needed for the current reasoning slice. Expand context only when call graph, state flow, or boundary contracts require it.
- A single LLM pass is insufficient. CodeRabbit-like quality requires a layered review stack, not just a better prompt.
- This section is the single canonical hotspot budget and escalation rule for the methodology.
- Start each hotspot with a working set of
<= 5files and<= 800active code lines. - Summarize and drop already-processed context before expanding.
- Expand only when the current slice cannot confirm or refute a plausible bug in the active invariant, state transition, or boundary contract.
- For companion requirements or checklists, load the smallest decisive slice first: TOC, then review-mode or reporting slices, then only the specific item sections required by the chosen review mode or the active hotspot, then one contiguous line range if needed. A slice is decisive only when the inspected text is sufficient to resolve the hotspot-relevant normative effect: whether that dependency changes the active invariant, required check, status rule, boundary contract, or validation obligation for the hotspot under review. Do not whole-file load a companion dependency by default.
- Keep companion coverage bounded to
<= 2dependency files and<= 240raw dependency lines in active context for one review pass; summarize and drop raw dependency text before loading the next slice. - Treat a companion as proved non-material only when the inspected slice is sufficient to show that the remaining unseen text cannot change those hotspot-relevant obligations and any remaining mentions are purely descriptive, duplicative, or outside the reviewed hotspot. If that proof is missing, the companion remains unresolved.
- If a hotspot still needs more than
1500active code lines, more than2expansion rounds, or companion coverage beyond the dependency budget, stop broadening, emit a checkpoint with the unresolved boundary, and mark the reviewPARTIALunless a confirmed material defect already makes itFAIL. - Recommend a focused follow-up pass or dynamic validation instead of claiming broader coverage.
| Layer | Question |
|---|---|
| L1 | Where are the real risk hotspots? |
| L2 | What contracts and invariants must always hold? |
| L3 | Which paths, states, and interleavings can violate them? |
| L4 | Which universal bug classes apply here? |
| L5 | Can a concrete counterexample be constructed? |
| L6 | What dynamic check would confirm or refute the finding? |
| L7 | What is the overall review status, confidence, impact, and next action? |
Focus first on code that is most likely to contain high-impact defects.
- Start from changed code, entry points, trust boundaries, persistence boundaries, async boundaries, and externally visible behavior.
- Prioritize authentication, authorization, money movement, state transitions, retries, parsing, serialization, migrations, caching, and cleanup logic.
- Expand to callers, callees, shared utilities, and configuration only when they influence the active path.
- Use repository signals when available: churn, incident history, bug-fix patterns, flaky tests, complex functions, and modules with many dependencies.
Extract what must be true before, during, and after execution.
- Preconditions: input shape, nullability, permissions, ordering, initialization, feature flags, units, and schema assumptions.
- Postconditions: returned value, persisted state, emitted events, side effects, idempotency, and cleanup guarantees.
- Cross-step invariants: uniqueness, monotonicity, ownership, transactional boundaries, retry safety, and consistency between cache, database, and outbound messages.
- If the contract is not explicit, infer it from tests, names, types, assertions, error messages, docs, and call sites, but mark it as inferred rather than proven.
Trace how bugs emerge when the happy path breaks.
- Check the main path, unhappy path, edge values, repeated invocation, partial failure, timeout, retry, stale state, invalid config, startup, shutdown, and rollback behavior.
- For stateful logic, trace creation, mutation, persistence, invalidation, and cleanup.
- For async or concurrent logic, examine races, double delivery, out-of-order completion, missing awaits, lock ordering, cancellation, and duplicate side effects.
- For distributed flows, examine retries, deduplication, eventual consistency gaps, and split-brain assumptions between services.
Apply the same defect lenses regardless of language.
| Class | Typical failures |
|---|---|
| Correctness & logic | Wrong branch, inverted condition, off-by-one, missing case, unreachable branch, bad default |
| Input & boundary | Missing validation, parse mismatch, encoding mismatch, unit mismatch, schema drift |
| Error handling & resilience | Swallowed error, wrong fallback, retry storm, partial commit, misleading success |
| State & lifecycle | Wrong initialization order, stale cache, missing cleanup, duplicate apply, broken rollback |
| Security & trust boundary | Authz gap, injection path, traversal, unsafe deserialization, secret or PII leak |
| Concurrency & async | Race, deadlock, lost update, double execution, missing await, cancellation bug |
| Performance & resources | N+1, unbounded loop, leak, blocking hot path, missing backpressure |
| Integration & config | Version drift, env mismatch, clock/timezone bug, feature-flag inversion, protocol mismatch |
| Testing gaps | Missing regression coverage for critical or failure paths |
A suspected bug becomes stronger when you can describe exactly how it fails.
- Build the smallest trigger: input, prior state, ordering, timing, or configuration needed to break the invariant.
- Express the failure as
condition -> execution path -> bad outcome. - Search for contradictory code, assertions, tests, or guards that disprove the hypothesis.
- If no plausible trigger can be constructed, lower confidence or discard the finding.
When static reasoning is insufficient, specify the cheapest next proof.
- Use targeted unit tests for local logic and boundary conditions.
- Use integration tests for persistence, network, serialization, configuration, and cross-service behavior.
- Use property-based tests or fuzzing for parsers, protocol handlers, validators, and state machines.
- Use semantic static analysis or data-flow engines for taint, authorization, and multi-hop flow issues.
- Use runtime traces, logs, metrics, and production incidents for nondeterministic or environment-sensitive failures.
Practical layered stack:
- Hotspot triage plus invariant and failure-path review on bounded local slices
- Universal bug-class sweep plus counterexample construction on the highest-risk paths
- Cheapest confirming proof next: targeted tests, semantic/static analyzers, runtime evidence, then feedback from escaped defects or incidents
Overall review status is mandatory:
PASS: the stated scope was completed, every hotspot in scope was checked enough to resolve the active bug hypotheses, every required companion slice for the final report was covered within budget, no confirmed or high-confidence material defect remains open, and residual risk is explicitly bounded.PARTIAL: coverage is incomplete, a hotspot or companion dependency was checkpointed, a material hypothesis still needs more bounded context, or dynamic validation is still required before the review can close safely.FAIL: the review path was invalid or at least one confirmed or high-confidence material defect remains open.
Mandatory status triggers:
- Use
PARTIALwhen the canonical hotspot budget forced a stop, any required companion slice remains unresolved, or follow-up validation is still required for a material hotspot. - Use
FAILwhen methodology requirements were not followed well enough for a valid review or when an open material defect still stands at report time. - Never use
PASSwhen unresolved hotspots, unresolved companion effects, or required follow-up validation remain. - A companion counts as checked only after its inspected slice resolved the hotspot-relevant normative effect or proved the dependency non-material by the decision rule in
Context Budget & Expansion Control; otherwise it stays unresolved and forcesPARTIAL.
Report each finding with:
- Bug class
- Severity
- Confidence:
CONFIRMED,HIGH,MEDIUM, orLOW - Location
- Violated invariant or contract
- Minimal trigger or counterexample
- Impact
- Evidence
- Proposed fix
- Best validation step
Residual uncertainty is mandatory:
- List unproven high-risk areas.
- List required dynamic checks not yet run.
- State which bug classes were checked and which were only partially checked.
- State why the final status is
PARTIALorFAILwhenever either value is used. - Never collapse uncertainty into a blanket
PASS.
Final review output is mandatory. This methodology defines the authoritative bug-finding content contract, but a host workflow may still control the outer response wrapper.
For a standalone bug-finding report, use these sections in order; this order is authoritative even when companion checklists are loaded:
Review Summary: review status, target scope, hotspots reviewed, files inspected, whether the review stayed local or expanded, and which companion slices were loaded.Findings: severity-sorted findings using the required per-finding fields above. Report only problems. If none, stateNo confirmed findings.Coverage & Residual Risk: bug classes checked, bug classes only partially checked, unchecked high-risk hotspots, dynamic checks not yet run, checklist mode used, any checklist items or companion coverage still unverified, and a checklist ledger for item-level status accounting whencode-checklist.mdis in scope.Next Actions: cheapest confirming validations, required context expansions, or an explicit statement that no further action is currently justified.
When a host workflow already defines mandatory top-level headings, keep that outer structure and map this methodology into it instead of replacing the workflow headings. For /cypilot-analyze, keep the six-section Validation Report wrapper, place Review Summary, Coverage & Residual Risk, and Next Actions inside the bug-finding portion of ### 3. Semantic Review (MANDATORY), and surface actual defects from Findings in ### 6. Issues (if any). Do not introduce competing top-level headings.
Inside Coverage & Residual Risk, include a compact checklist ledger for checklist items required by the selected review mode or explicitly loaded and assessed in the current pass, using the columns ID | Status | Rationale. Allowed status values are PASS, FAIL, N/A, and NOT REVIEWED. Preserve item-level accounting for review-mode-excluded items without forcing whole-checklist expansion: record each excluded item as NOT REVIEWED with rationale excluded by review mode, but compact rows are allowed only when they still enumerate the exact checklist IDs covered by that row, using exact ID lists or contiguous ID ranges/lists only when every ID in that row shares the same status and rationale. Use NOT REVIEWED for explicitly checkpointed loaded items only after their governing slice was identified. Keep Findings problem-only; record PASS, N/A, and NOT REVIEWED entries in the ledger instead of inventing extra top-level sections, and summarize any other unloaded checklist remainder as unresolved coverage rather than implying whole-checklist review.
Companion checklist integration rules:
- When
code-checklist.mdis in scope, keep this section order instead of replacing it with a separate quick/standard/full top-level format. - Satisfy the checklist's reporting minimums inside
Findingsby makingLocation,Evidence,Impact, andProposed fixexplicit in every reported problem; do not duplicate the same content under a second heading set. - Put checklist acceptance/reporting evidence in
Coverage & Residual Risk, not in a competing report preamble. - This four-section contract supersedes only
code-checklist.md's top-level report shape when this methodology is the outer report contract. It does not waive that checklist's item-by-item applicability decisions, item-levelNOT REVIEWEDaccounting for review-mode-excluded items, status labels, rationale requirements, or review-mode obligations.
- Use this methodology when the user asks to find bugs, logic errors, edge cases, regressions, hidden failure modes, or "all problems" in code.
- Load
reverse-engineering.mdWHEN the bug review needs structure beyond the local hotspot: entry points, module boundaries, dependency direction, state lifecycle, or integration boundaries. Start with a TOC, section, or contiguous line-range read, then summarize and drop it before loading the next slice. Skip it for bounded local reviews where those are already clear. - Load
code-checklist.mdbefore final output as a mandatory acceptance and reporting checklist, but start with only the reporting, procedure, review-mode, conflict-resolution, and specific checklist-item slices required by the chosen review mode or already implicated by the hotspot. Do not load unrelated checklist sections by default. - Companion dependency loading is bounded by
Context Budget & Expansion Control. If required review-mode coverage or other required companion coverage cannot be completed safely within that budget, checkpoint the unresolved dependency, set the final review status toPARTIAL, and recommend a focused follow-up instead of forcing a broader ledger orPASS. - Use this methodology as the search procedure that drives what code paths and failure modes to inspect first.
Use this sequence for each hotspot:
- Map the boundary and impacted path.
- Extract explicit and inferred invariants.
- Walk the happy path and the most dangerous unhappy paths.
- Sweep all universal bug classes.
- Build or refute a concrete counterexample.
- Propose the cheapest confirming dynamic check.
- Report confidence and residual risk.
Efficiency rules:
- Apply
Context Budget & Expansion Controlas the single canonical hotspot budget and escalation rule for every hotspot. - Keep only the active path, invariant, and evidence in working context; summarize and drop processed code before expanding.
- Expand to adjacent files or companion requirements only when a plausible bug depends on that boundary and the current slice cannot resolve it.
- If the canonical budget forces a stop, emit a checkpoint and set status per
L7instead of broadening implicitly.
Review is complete when:
- Risk hotspots were identified and prioritized
- Explicit and inferred invariants were extracted
- Happy path and failure paths were both examined
- All universal bug classes were swept for the target scope
- Each reported issue includes a plausible trigger or counterexample
- Missing proof was converted into a concrete dynamic follow-up
- Review status, bounded companion coverage, and residual uncertainty were reported explicitly
- Any unresolved companion dependency or required follow-up validation forced
PARTIALinstead ofPASS - Final output either used the standalone four-section order or mapped it into the host workflow's mandatory wrapper without competing top-level headings
- No claim of
100%detection or blanket coverage was made