Skip to content

Latest commit

 

History

History
190 lines (142 loc) · 46.4 KB

File metadata and controls

190 lines (142 loc) · 46.4 KB

Testing Pitfalls

Test scenario checklist for reviewing coverage of any feature. Every item on this list exists because it catches bugs that have occurred in real codebases. Items marked with 🔥 Found in bug hunts were discovered in this codebase specifically. Unmarked items are equally important — they represent bugs we haven't made yet. Do not deprioritize an item because it lacks a marker.


1. Concurrency & TOCTOU

  • Multi-step flows under concurrent access: When a flow reads state then writes state (check-then-act), test two goroutines racing through the same flow simultaneously. Use a barrier pattern (ready := make(chan struct{}) + close(ready)) to ensure goroutines hit the critical section at the same time — sync.WaitGroup alone doesn't guarantee simultaneous execution.
  • Token/code consumption: Any "use once" token (reset, verification, invitation) must be tested with two concurrent consumers. Exactly one must succeed. 🔥 Found in bug hunts: password reset token consumed across separate transactions allowed concurrent resets.
  • Rate limit bypass under concurrency: Count-then-insert rate limits can be bypassed by concurrent requests that all read the same count before any insert. Test with burst requests. 🔥 Found in bug hunts: concurrent forgot-password requests exceeded per-user rate limit.
  • Bootstrap/initialization races: First-user, first-org, or any "only if none exist" flow must be tested with concurrent attempts. Exactly one must win. 🔥 Found in bug hunts: first-user bootstrap race in invite-only mode.
  • Idempotent operations under concurrency: When an operation should be idempotent (e.g., accepting an invitation twice), test concurrent execution — the second attempt must not produce a 500 from a constraint violation. 🔥 Found in bug hunts: concurrent invitation accept hit constraint violation.
  • Job deduplication under concurrent triggers: When both a scheduler tick and an API trigger can enqueue the same job type, test that concurrent triggers produce exactly one running job — not two. 🔥 Found in bug hunts: job queue TOCTOU allowed duplicate feed runs.

2. Negative Property Testing

  • Cleanup and eviction: When code accumulates state (maps, caches, queues), test that stale entries are eventually cleaned up. Don't just test "it works" — test "it doesn't leak." 🔥 Found in bug hunts: lockout map grew without bound — no eviction ever triggered.
  • Case sensitivity: When a string key is used for identity (email, username), test that case variations are treated consistently. Admin@Example.com and admin@example.com must be the same identity. 🔥 Found in bug hunts: login lockout bypassed via email case variation.
  • Bounded growth: For any in-memory data structure that grows with external input, test that it has a maximum size or eviction policy. Simulate 1000+ entries and verify memory is bounded.
  • Resource release on panic: When a resource (semaphore, lock, file handle) is acquired, test that it's released even if the protected code panics. Prefer defer over explicit release. 🔥 Found in bug hunts: argon2 semaphore not released on panic.
  • Stream error recovery: When a streaming parser (e.g., json.Decoder) hits a malformed record and continues to the next iteration, test that subsequent valid records still parse correctly. A continue after a partial read can leave the decoder in a corrupted state — the next Decode() call reads garbage. 🔥 Found in bug hunts: streaming JSON continue after error corrupted decoder state.
  • Post-filter pagination interaction: When SQL paginates (LIMIT N+1) and an application-layer filter removes rows from the result, the has-next-page decision must use the pre-filter row count — not the post-filter count. Test with a page where the filter removes enough rows that len(filtered) <= limit while len(unfiltered) > limit. The cursor position must also account for the filter: use the last visible row if any survive, or fall back to the last pre-filter row so the client can advance past an empty page. 🔥 Found in bug hunts: PostFilter regex conditions applied after SQL fetch — pagination decision used post-filter count, silently truncating results.
  • Case-sensitivity divergence between SQL and application layers: When the same logical condition is evaluated in SQL (e.g., lower(column)) and again in application code (e.g., Go regex match), test that both paths apply the same case normalization. A case-insensitive SQL query paired with a case-sensitive application filter silently drops matches. 🔥 Found in bug hunts: alert evaluator used lower(cves.description_primary) in SQL, but PostFilterField returned the raw-case description — same regex rule produced different results depending on evaluation path.

3. Error Path Differentiation

Silent error swallowing is the #1 bug category in this codebase — 26% of all bug hunt findings. Every error path must be tested explicitly.

  • Information leakage via error codes: When a handler must return the same status code regardless of whether a resource exists (anti-enumeration), test that ALL error paths — including DB errors on post-lookup queries — return the same status. A 500 on a query that only runs for existing users leaks user existence. 🔥 Found in bug hunts: GetPasswordResetTokenByHash error leaked user existence.
  • Error swallowing in Go: When code silently returns on error (if err != nil { return }), test that the error is at minimum logged. Silent return hides bugs. 🔥 Found in bug hunts: sync state persistence errors silently discarded; fetch log errors silently discarded; resendVerificationHandler claimed "sent" when SMTP failed.
  • Partial failure in multi-step flows: When a flow commits step 1 (e.g., creates a user) then fails on step 2 (e.g., creates an org), test that step 1's result is still usable. The user must not be in a broken state. 🔥 Found in bug hunts: orphaned tokens created when email delivery failed during registration.
  • Silent success on missing resources: When a DELETE or cancel operation targets a non-existent resource, test that it returns 404 — not 204. A silent 204 hides bugs in the caller. 🔥 Found in bug hunts: cancelInvitationHandler returned silent success on 404.
  • Error propagation across layers: When a store method returns an error, trace it through the handler to the HTTP response. Test that the handler doesn't discard the error and return 200. Inject DB errors via test fixtures and verify the HTTP status reflects the failure.
  • Error response format consistency across layers: When handlers use a structured error format (e.g., RFC 9457 application/problem+json), middleware must use the same format. Test that errors from middleware (auth, RBAC, CSRF, rate limiting) produce the same Content-Type and body structure as handler errors. A client parsing structured errors will break on text/plain responses from middleware. 🔥 Found in bug hunts: 16 middleware error paths used http.Error() (text/plain) while all handlers used writeProblem() (application/problem+json) — same 401 status arrived in two different formats.

4. Validation Symmetry

  • Create vs Update/PATCH: When a create handler validates a field (e.g., rejects empty names), the corresponding PATCH/update handler MUST apply the same validation. When testing PATCH, test every invalid input that create rejects — without exception. 🔥 Found in bug hunts: PATCH channel allowed empty/whitespace names; org create/update allowed whitespace-only names.
  • Whitespace-only strings: "" and " " are both "empty" from a business perspective. When a handler rejects empty strings, test that it also rejects whitespace-only strings. Use strings.TrimSpace() before checking.
  • Zero-value vs absent in PATCH: For pointer-based optional fields (*string, *bool), test that sending an explicit zero value (empty string, false) is handled correctly — not silently ignored by omitempty.
  • Falsy-value data preservation: When mapping external data to internal structs, test that legitimate zero values (CVSS score of 0.0, empty string that means "no value" vs missing field) are preserved — not dropped by truthiness checks. 🔥 Found in bug hunts: CVSS 0.0 scores dropped as falsy in GHSA, MSRC, and OSV adapters — this bug recurred three separate times.
  • Single canonical input source: When the same logical parameter can arrive from multiple sources (query param, JSON body, header), test that only one source is accepted. If both are provided with different values, the handler must either reject the request or document which source wins — never silently prefer one. 🔥 Found in bug hunts: bulk retry handler accepted limit from both query param (max 200) and JSON body (max 1000) — body silently won with no documentation.

5. Boundary & Configuration Validation

  • Dangerous config combinations: When two config values interact (e.g., CORS wildcard + credentials), test that dangerous combinations are rejected at startup — not silently accepted.
  • Missing config for optional features: When a feature depends on external config (SMTP host, API key), test what happens when the config is absent. The error message must name the missing config — not produce a cryptic connection failure.
  • Rate limit accounting: When a flow implicitly creates a counted resource (e.g., registration creates a verification token), and a rate limit counts those resources, the test must account for the implicitly-created resource. A limit of 3 with 1 implicit creation means 2 explicit attempts before the limit.
  • Schema version drift: When the application checks schema version at startup, test that the expected version constant matches the latest migration. A stale constant produces spurious warnings on every boot. 🔥 Found in bug hunts: expectedSchemaVersion was stale — startup always warned.
  • Unimplemented CLI flags must error: When a CLI flag is parsed but not yet implemented, the command must return an error — never log a warning and exit 0. A user running --dry-run expects either dry-run behavior or an explicit failure. Test every accepted flag to verify it either changes behavior or rejects the invocation. 🔥 Found in bug hunts: validate-feeds --dry-run logged "not yet implemented" then exited 0 with "all feeds valid" — misleading success.
  • Config endpoint secret redaction: When an admin or debug endpoint exposes configuration values, test that every field containing a secret (API keys, JWT secrets, database URLs, encryption keys, SMTP passwords) is redacted in the response — not just omitted. A refactoring that changes the redaction helper or adds a new secret field silently exposes credentials if there's no test asserting redaction. Test by comparing the response value against the actual secret. 🔥 Found in coverage review: adminConfigHandler calls redactSecret() on 10+ fields, but has 0% test coverage — a refactoring mistake could expose all secrets in plaintext.
  • Dependent config sub-field validation: When a config struct has a type/mode field that makes other fields required (e.g., pagination.type: cursor requires cursor_param and cursor_path), validate all dependent fields in the Validate() method — not just the type field itself. Test with a config that has a valid type but missing required sub-fields, and assert a validation error. An unchecked dependency can cause infinite loops (empty cursor param → fetch same page forever) or silent data loss. 🔥 Found in coverage review: generic feed Validate() checked pagination.type was valid but not that cursor_param/cursor_path were set — a misconfigured cursor feed could loop infinitely.

6. Data Lifecycle

  • Soft-delete filtering: Every query that lists or counts resources must be tested after a soft-delete. If deleted_at IS NULL is missing, soft-deleted records leak into results.
  • Expired record handling: Queries filtered by expires_at > now() must be tested with expired records present — verify they're excluded, and verify cleanup jobs remove them.
  • Duplicate detection: When a resource should be unique within a scope (e.g., one pending invitation per email per org), test that creating a duplicate is rejected — not silently accepted. 🔥 Found in bug hunts: createInvitationHandler allowed duplicate pending invitations.

7. Transaction & Store Conventions

  • Transaction helper compliance: Every store method must use the appropriate transaction helper (withOrgTx, withBypassTx, WorkerTx). Direct s.q or s.db access bypasses RLS setup. Verify with code inspection — grep for s.db.QueryContext and s.db.ExecContext in org-scoped methods. 🔥 Found in bug hunts: CountUsers, CreateUser, GetOrgByID, and UpdatePasswordHash all bypassed transaction helpers.
  • Multi-attribute PATCH atomicity: When a PATCH handler modifies multiple attributes (e.g., tier + suspend state), all modifications must happen in a single transaction. Test by failing the second update (e.g., inject a constraint violation) and verifying the first update did not persist. Separate transactions per attribute produce inconsistent state on partial failure. 🔥 Found in bug hunts: admin org PATCH ran tier update and suspend/unsuspend as independent transactions — a failure on suspend left the tier change committed.
  • Audit trail completeness: When similar operations (create/update/delete channel) have audit logging, verify that ALL operations in the same category (create/cancel/resend invitation) also have audit logging. Missing audit entries are hard to catch with automated tests — use code inspection.
  • Defined event/error constants must be emitted: When a security event constant or error sentinel is defined in source code, test that at least one code path actually emits it. A constant defined but never referenced is dead code that creates a false sense of coverage -- monitoring dashboards, alert rules, and audit queries referencing the event will silently match nothing. Verify with a grep that every defined constant appears in at least one emit/log/record call outside its definition. 🔥 Found in bug hunts: EventMFAChallengeExhausted was defined in secure/events.go and registered in the severity map, but never emitted -- the store method could not distinguish exhaustion from a normal failure, so the handler always emitted a generic event instead.
  • Direct tests for every public store method: When a store method is only tested indirectly (e.g., DeleteAllRecoveryCodes tested via RegenerateRecoveryCodes which calls it internally), that's not real coverage. The indirect test won't catch bugs in the standalone method's error wrapping, return value, or transaction helper usage. Every public method on *Store needs at least one test that calls it directly and asserts its return values. 🔥 Found in review: DeleteAllRecoveryCodes, DeleteAllUserChallenges, and CountMFACredentialsByUser were all tested only indirectly — none had direct tests verifying their behavior.
  • Multi-side-effect operations must assert ALL effects: When a store method or handler performs multiple side effects in one call (e.g., ResetUserMFA deletes credentials, recovery codes, challenges, and increments token_version), test that ALL side effects occurred — not just the primary one. A test that only checks hasMFA == false after a reset won't catch a bug where recovery codes survive, device tokens persist, or sessions remain valid. For each side effect: assert its post-condition directly. 🔥 Found in coverage review: TestAdminMFAReset checked 1 of 4 side effects (credential deletion only); TestAdminForcePasswordReset checked 1 of 3 (flag only — missed session invalidation and device token cleanup).
  • Test data must flow through production code paths: When an integration test needs seed data (CVEs, alert rules, notification channels), create it through the same code path production uses — not raw SQL inserts. A test that seeds CVEs via db.ExecContext("INSERT INTO cves ...") with manual material_hash values bypasses the merge pipeline's hash computation, child table population, and FTS index update. If the code under test evolves to join against child tables or rely on computed columns, the test passes with hand-crafted data while production fails. Use store methods or the merge pipeline to seed data. For CVE data specifically, use testutil.SeedCorpus(t, db) — see §9 "When to use SeedCorpus vs hand-crafted fixtures." 🔥 Found in review: alert evaluator tests seeded CVEs via raw SQL with fake material_hash strings — any query joining cve_affected_packages would silently return nothing.
  • Transaction commit persistence verification: When testing a transaction helper's commit path, verify that a write inside the transaction is readable after the function returns — not just that the function returned no error. A transaction helper that calls Rollback() instead of Commit() would pass a test that only checks err == nil. The symmetric test: rollback tests verify rows are NOT persisted; commit tests must verify rows ARE persisted. 🔥 Found in review: TestOrgTx_CommitsOnSuccess ran SELECT 1 inside the transaction and asserted no error — never wrote a row or verified persistence.

8. External Dependency Failure

  • SMTP/webhook failure reporting: When a handler sends an email or webhook, test the failure path. Does it claim success when the send failed? Does it surface the error to the user (for authenticated endpoints) or silently log it (for unauthenticated anti-enumeration endpoints)? 🔥 Found in bug hunts: resendVerificationHandler claimed "sent" when SMTP failed.
  • Sync vs async send implications: When an email is sent synchronously, the handler can report failure. When async, it can't. Test that the chosen approach matches the error-reporting contract. Don't make a send async if the API promises to report delivery status. 🔥 Found in bug hunts: verification email sent synchronously blocked the registration endpoint under SMTP latency.
  • No open DB transaction during external I/O: When code makes an outbound HTTP call (webhook, feed fetch), verify that no database transaction is held open during the call. The pattern is: claim job → commit → call external service → update status in new transaction. Holding a transaction open during a slow HTTP call exhausts the connection pool.
  • Production client configuration exercised in tests: When a security-critical HTTP client has specific configuration (SSRF protection, redirect policy, connection limits, TLS settings), at least one functional test must exercise the production client — not a substitute. When all tests use a plain http.Client because the production client blocks 127.0.0.1, the entire production client configuration is untested. Solve by verifying all configuration properties on the built client object, or by testing the production client against a non-loopback target. 🔥 Found in review: all webhook delivery tests used plainHTTPClient() or buildTestClient() bypassing safeurl — redirect policy, MaxConnsPerHost: 50, and SSRF blocking were never exercised in delivery path tests.

9. Feed Data Quality & Cursor Lifecycle

Feed adapters process untrusted external data. Every adapter test suite must cover these scenarios — upstream feeds are inconsistent, sparse, and periodically non-spec-compliant.

  • Null byte sanitization: Test that null bytes (\x00) in feed data are stripped before reaching Postgres. Postgres text columns reject null bytes — an unsanitized field causes the entire upsert to fail. Test with a fixture containing \x00 in description, required action, or any free-text field.
  • Falsy-value preservation: Test that CVSS 0.0, empty strings, and other legitimate zero values are persisted — not dropped by Go truthiness checks (if score != 0). This is the single most recurring adapter bug. 🔥 Found in bug hunts: CVSS 0.0 dropped in GHSA, MSRC, and OSV adapters — three independent occurrences.
  • Sparse field handling: A CVE without a CVSS score, without references, or without a description is normal — not an error. Test that the adapter and merge pipeline handle nil/missing fields gracefully without panicking or skipping the record.
  • Wire format assumptions: Never assume the upstream response is a top-level JSON array. Test the actual envelope structure. Use curl | jq 'keys' on the real API before writing fixtures. 🔥 Found in implementation-pitfalls: multiple adapters assumed wrong top-level structure.
  • Timestamp format tolerance: External feeds use inconsistent timestamp formats. Test with RFC3339, RFC3339Nano, date-only strings, and timezone-offset variants. A parser that hard-fails on a non-standard timestamp loses the entire CVE.
  • Cursor advancement after success: After a successful fetch-and-merge cycle, test that the cursor has advanced in the database. A feed that fetches successfully but doesn't persist its cursor will re-process the same data on every run. 🔥 Found in bug hunts: Red Hat adapter never advanced AfterDate cursor; feed sync cursor not persisted after successful fetch.
  • Cursor state on final page: Test that the cursor after processing the last page represents "caught up" — not the penultimate page's marker. 🔥 Found in bug hunts: NVD cursor regressed on final page.
  • Crash recovery — no re-processing: Simulate a crash mid-pagination (process N of M pages, then restart). Verify that already-processed pages are not re-merged. When an adapter fetches all pages internally before checkpointing, every crash re-processes everything. 🔥 Found in bug hunts: GHSA fetched all pages internally with no mid-pagination checkpoint.
  • Fetch duration tracking: Verify that fetch log entries record accurate start/end timestamps with non-zero duration. 🔥 Found in bug hunts: InsertFeedFetchLog discarded started_at and ended_at.
  • Behavioral parity between alternate code paths: When the same data can be processed by two different code paths (streaming vs buffered, fast-path vs slow-path, simple vs nested), test both paths with identical input and assert identical output. Type coercion differences between libraries are a common source of divergence — e.g., encoding/json rejects type mismatches while gjson coerces them. 🔥 Found in bug hunts: generic adapter streaming path used json.Decode(&string) for cursor values while the buffered path used gjson.String() — numeric cursors like 42 succeeded on the buffered path but failed on the streaming path.
  • End-to-end adapter→merge→store integration: For each feed adapter, test the full pipeline: adapter.Fetch()CanonicalPatchmerge.Ingest() → database read-back. Adapter unit tests verify JSON→CanonicalPatch conversion; merge integration tests construct CanonicalPatch manually. Neither catches field-mapping errors at the boundary — e.g., swapping CVSSv3 and CVSSv4 vectors, or dropping CWE IDs during struct conversion. These produce structurally valid patches with silently wrong data. At minimum, one test per adapter should round-trip through the merge pipeline and assert specific field values in the resulting cves row.
  • Golden file tests against real captured data: Every feed adapter MUST have a golden_test.go that serves captured real API responses via httptest and runs Fetch() against them. Unit tests with hand-crafted JSON fixtures cannot detect upstream schema drift — the fixture matches what the developer assumed the API returns, not what it actually returns. Golden file tests are the only reliable way to catch this class of bug. 🔥 Found in Phase 10: MSRC adapter had 8 passing unit tests but was 100% broken against the real API (endpoint never existed). GHSA adapter had 27 passing unit tests but silently dropped every advisory (references field was []string, not []object). Both bugs were invisible until golden file tests ran real captured data.

When to use SeedCorpus vs hand-crafted fixtures

testutil.SeedCorpus(t, db) seeds a test database with 65 real CVEs from 8 feed adapters via golden fixtures and the real merge pipeline. It provides a realistic, deterministic corpus for integration tests. Requires Docker (testcontainers).

Use SeedCorpus when:

  • Testing features that query across the CVE corpus (alert evaluation, search/FTS, reports, watchlists)
  • Testing merge pipeline behavior with multi-source CVEs (field precedence, vendor enrichment)
  • Testing any code that joins cves with child tables (cve_sources, cve_affected_packages, cve_search_index)
  • You need CVEs that look like production data — proper material_hash, populated child tables, FTS index entries

Use hand-crafted fixtures when:

  • Testing a specific edge case that real data may never contain (CVSS 0.0, null bytes, malformed timestamps)
  • Adapter unit tests that need precise control over individual JSON fields
  • Testing error paths (malformed responses, HTTP failures, cursor regressions)
  • Speed matters — SeedCorpus takes ~10s (Docker startup); hand-crafted fixtures are instant

Never use raw SQL inserts for CVE test data — they bypass material_hash computation, child table population, and FTS index updates. Use SeedCorpus, store methods, or the merge pipeline. (See §7 "Test data must flow through production code paths.")

10. RLS & Tenant Isolation Verification

RLS bugs are invisible in tests that use the superuser connection. Every org-scoped store method needs a test through the restricted connection.

  • Dual-connection testing: When testing any org-scoped store method, query through db.AppStore (the NOBYPASSRLS connection) — not db.Store (superuser). Tests that only use the superuser connection cannot detect RLS policy bugs. 🔥 Found in implementation-pitfalls §2.14: four broken list methods passed all tests because the superuser connection returned all rows regardless of app.org_id.
  • Cross-tenant visibility assertion: For every org-scoped list/get operation, create data in Org A and Org B via the superuser connection, then query via AppStore scoped to Org A. Org B's data must not appear. This is the minimum viable RLS test.
  • Per-user resource isolation: Not all isolation is org-scoped RLS. Per-user resources (MFA challenges, remember-device tokens, recovery codes) are filtered by user_id in global tables — no RLS involved. For every per-user lookup, test that User B cannot access User A's resource using the same token/hash. This is easy to miss because "it's just a WHERE clause" — but a missing AND user_id = $1 in a query turns a per-user resource into a global one. 🔥 Found in review: remember-device token cross-user isolation was untested — query correctness was assumed, not verified.
  • Child table isolation: RLS on a parent table does NOT protect child tables. When a child table has org_id, it must have its own RLS policy and its own cross-tenant test. When a child table lacks org_id, that is a schema bug — flag it.
  • Worker vs API transaction helpers: WorkerTx bypasses RLS (for background jobs that need cross-tenant access). withOrgTx enforces it (for API handlers). Test that API handler code paths never use WorkerTx — grep for WorkerTx in internal/api/ as a code review step. 🔥 Found in implementation-pitfalls §4.6: worker transaction helpers called from API handlers allowed cross-tenant data access.

11. Security Enforcement Testing

Building a security mechanism is not enough — test that the mechanism actually fires. Every security check needs at least one test that proves rejection works.

  • RBAC matrix coverage: For every protected endpoint, test with every role (owner, admin, member, viewer, unauthenticated). An endpoint that only tests the happy-path role doesn't prove the others are rejected. 🔥 Found in bug hunts: admin feed endpoints lacked role-based auth — any authenticated user could trigger feed syncs.
  • JWT algorithm confusion: Test that a JWT signed with alg: "none" is rejected. Test that a JWT signed with an unexpected algorithm (e.g., RS256 when the server expects HS256) is rejected. WithValidMethods must be present and tested.
  • Anti-enumeration timing: When a handler must not reveal whether a resource exists (forgot-password, login), test that response times for existing and non-existing targets are statistically indistinguishable. A constant-time comparison on the secret is necessary but not sufficient — a DB query that only runs for existing users creates a timing side-channel.
  • Authorization parity across auth flows: When a behavior exists in one auth flow (native register creates a default org), verify it exists in ALL auth flows (GitHub OAuth, Google OIDC). Maintain a checklist of "things that happen on first registration" and test each flow against it. 🔥 Found in implementation-pitfalls §8.3: OAuth and native registration produced different post-registration states.
  • Security check enforcement across similar endpoints: When adding a security check to one handler, grep for all handlers that perform the same kind of operation and verify they have the same check. Role assignment isn't just updateMemberRole — it's also invitations, OAuth account linking, and admin overrides. 🔥 Found in implementation-pitfalls §8.1.
  • Webhook SSRF protection: When testing webhook delivery, test with internal/private IP targets (127.0.0.1, 169.254.169.254, 10.0.0.0/8). The safeurl client must reject them. Test with DNS rebinding (a hostname that resolves to a private IP).
  • Constant-time secret comparison: Any comparison of secrets (API key hashes, HMAC signatures, CSRF tokens) must use subtle.ConstantTimeCompare or hmac.Equal — never == or bytes.Equal. Code inspection alone doesn't prevent regressions — complement it with a mechanical check (grep test, go/ast scan, or linter rule) that breaks when the comparison function changes.
  • Multi-layer authorization negative cases: When a function checks multiple independent authorization layers (e.g., "site config → org policy → per-member rule"), test each layer's negative case with all other layers inactive. A test that sets layer 1's config to "off" but leaves layer 3 active gives false confidence — it passes whether layer 1's check works or not, because layer 3 is the actual reason the assertion holds. For N layers, you need at least N negative tests where only one layer could match and it doesn't. 🔥 Found in review: UserMFARequired layer 1 negative test (site admin + config off) also had a per-member requirement active — would have passed even if layer 1 ignored the config flag entirely.
  • Multi-step restricted session state consistency: When an auth flow issues restricted tokens (pending JWTs) that carry state claims (e.g., token_version, remaining steps), and an intermediate step mutates that state in the DB (e.g., password change increments token_version), test that the reissued restricted token reflects the post-mutation state -- not the pre-mutation value carried forward from the original token. The test must decode the reissued token and assert the claim matches the DB. 🔥 Found in bug hunts: changePasswordHandler reissued a pending token with the stale token_version from before UpdatePasswordHash incremented it in the DB.
  • Full auth token issuance on restricted session completion: When a restricted session flow has multiple pending steps and the final step clears the last one, test that the handler issues full access/refresh tokens -- not just clears the restricted cookie. A flow that clears the cookie without issuing replacement tokens leaves the user unauthenticated despite having just proven their identity. Test the single-pending-step case specifically -- multi-step flows may mask the bug by always having remaining steps. 🔥 Found in bug hunts: clearEnrollmentPending only cleared the cookie when all pending items were done -- never issued full tokens, forcing an unnecessary re-login.
  • Token type enforcement across session types: When the auth system issues multiple token types with different privilege levels (access, refresh, pending/restricted), test that each parser rejects tokens of the wrong type. A ParseAccessToken that doesn't check the typ claim accepts a pending MFA token as a fully-authenticated session — bypassing MFA entirely. For every token type, test that presenting it to the wrong parser/endpoint returns 401. 🔥 Found in review: no test verified that a pending token was rejected when presented as an access token.
  • Cross-user token swapping in multi-step flows: When a multi-step authentication flow issues user-bound bearer tokens (enrollment cookies, pending tokens, setup nonces), test that completing another user's flow with a stolen token is rejected. Start the flow as User A, then attempt to complete it as User B using A's token. The guard (enrollClaims.UserID != authenticatedUserID) must be tested explicitly — it's easy to implement but also easy to forget when the happy path works. This is distinct from token type enforcement (§11 "Token type enforcement"): type enforcement verifies wrong parsers reject wrong types; cross-user enforcement verifies the same parser rejects wrong users. 🔥 Found in coverage review: mfaTOTPConfirmHandler checks enrollClaims.UserID != userID to prevent User A from binding their authenticator to User B's account — no test exercised this guard.
  • Fail-open regression tests: When a security mechanism intentionally fails open on infrastructure errors (e.g., lockout check returns "allowed" on DB error, with rate limiter as backup), add a regression test that: (1) documents the design decision, (2) verifies the fail-open behavior under simulated infrastructure failure, (3) prevents accidental change to fail-closed (which could lock out all users on DB outage). Comments documenting design intent get deleted; regression tests don't. 🔥 Found in coverage review: lockoutManager.Check intentionally fails open on DB errors (rate limiter as backup defense) — no regression test existed to prevent an inadvertent change to fail-closed.

12. Frontend State & Error Handling

Empty catch {} blocks and stale state on navigation are the two most common frontend bug categories. 13 frontend bugs were found in bug hunts — the majority were silent failures that rendered blank pages.

  • Error states, not blank pages: When a network request fails, the component must render an error message — never a blank page or an empty list that looks like "no results." Test every fetch/GET call with a mocked network error and assert that an error element is visible. 🔥 Found in bug hunts: WatchlistDetailView rendered blank page on network error; GroupMembersDialog showed "empty state" instead of error message; fetchItems() had no catch block.
  • Mutation error feedback: When a POST/PUT/DELETE fails, the user must see feedback. Test every mutation call with a mocked failure and assert visible error feedback. 🔥 Found in bug hunts: MembersView.cancelInvitation() silently failed; multiple dialogs swallowed mutation errors.
  • Stale data on route param change: When a view fetches data based on a route param (e.g., /cves/:id), test that navigating to a different param triggers a new fetch — not a stale render of the previous data. Use Vue's watch on the route param or onBeforeRouteUpdate. 🔥 Found in bug hunts: CveDetailView showed wrong CVE after param change.
  • Org switch cache invalidation: When the user switches organizations, every org-scoped data cache must be invalidated and refetched. Test by switching orgs and asserting that list data reflects the new org — not stale data from the previous org. 🔥 Found in bug hunts: org switch didn't refetch data — showed cross-org data.
  • Auth token refresh deduplication: When multiple concurrent requests receive 401s, exactly one refresh request must fire — not one per failed request. Test with two parallel requests that both trigger 401, assert a single refresh call. 🔥 Found in bug hunts: dual independent refreshPromise in API client and org fetch module.
  • Request body on retry: When a middleware retries a failed request (e.g., after token refresh), test that the original request body is re-sent. A fetch() body is a stream — once consumed, it's gone. The retry must use a cloned or buffered body. 🔥 Found in bug hunts: refresh middleware retry used consumed fetch() body.
  • Non-2xx success responses: When an endpoint intentionally returns non-2xx status codes with valid response bodies (e.g., 503 for health checks, 207 for multi-status), test that the frontend correctly parses the response body — not just checks resp.ok. A fetch response with ok === false still carries a valid JSON body that must be read. 🔥 Found in bug hunts: AdminSystemView doctor check used resp.ok to gate JSON parsing, silently discarding 503 responses — the exact case where the doctor results are most needed.

13. Feature Flag & Admin Column Enforcement

When a migration adds admin-managed columns (disable flags, forced-reset flags, pause states), the feature is incomplete until the enforcement code reads them. These bugs are invisible in integration tests that only exercise the admin write path.

  • Admin flag enforcement at all entry points: When a migration adds a boolean/timestamp flag (e.g., disabled_at, force_password_reset, paused_at), trace every code path that the flag should gate. Login handlers, refresh handlers, scheduler loops, and manual trigger endpoints all need to check the flag — not just the middleware that runs on authenticated API calls. 🔥 Found in bug hunts: disabled_at not checked in login handler (only in auth middleware); force_password_reset not checked anywhere; paused_at not checked by scheduler or manual trigger endpoint.
  • In-memory vs DB state consistency: When both in-memory state (rate limiters, lockout managers) and DB columns exist for the same concept, test that: (1) DB state survives restarts, (2) admin "clear" operations actually affect the enforcement mechanism, (3) multi-instance deployments share state. 🔥 Found in bug hunts: in-memory lockout manager and DB locked_at/failed_login_count columns were completely disconnected — admin "unlock" cleared DB columns the login flow never wrote.
  • Registry completeness for extensible systems: When a system supports both built-in items and user-configured items (e.g., built-in feeds + generic YAML feeds), test that management endpoints (list, trigger, pause, resume) work for BOTH categories — not just the hardcoded built-in list. 🔥 Found in bug hunts: admin feed list/trigger/pause/resume endpoints only accepted built-in feed names; generic feeds were invisible and unmanageable.

14. Background Worker & Job Lifecycle

Workers operate outside the request-response cycle with different failure modes: no caller to report errors to, no request context for scoping, and crash recovery depends on database state.

  • Claim-commit-call-update pattern: When a worker delivers to an external service (webhook, email), verify this exact sequence: (1) claim job in DB, (2) commit the claim transaction, (3) make the external HTTP call, (4) update job status in a new transaction. Any test that mocks the external call while a DB transaction is open is testing the wrong thing.
  • Stale job recovery: Simulate a worker crash by claiming a job and not completing it. Verify that the recovery sweep picks it up after the stale threshold and re-enqueues it.
  • Goroutine shutdown path: Every go func() { for { ... } }() must have a corresponding Stop() method and a done channel. In tests, every server or worker that spawns goroutines must have t.Cleanup(srv.Close). A test that passes but leaks goroutines is a test that hides bugs. 🔥 Found in implementation-pitfalls §8.5.
  • Large corpus pagination: When a worker processes the full CVE corpus (activation scans, batch evaluation, search index rebuilds), test that it paginates — not loads everything into memory. Mock a corpus of 10,000+ records and verify memory stays bounded.
  • Fan-out error independence: When delivering notifications to multiple channels, test that a failure on one channel does not cancel or skip the others. The pattern is sync.WaitGroup with independent error recording per channel — never errgroup (which cancels siblings on first error).
  • Prometheus counter race safety: When workers update Prometheus counters, test with -race enabled and multiple concurrent worker goroutines. Counter init and increment must be goroutine-safe. 🔥 Found in bug hunts: scheduler Prometheus counters had data race on init.
  • Scheduler respects pause/disable flags: When a scheduler checks whether to enqueue a job, test that it reads and respects all relevant state flags (paused, disabled, suspended) — not just timing-based checks (interval, backoff). 🔥 Found in bug hunts: feed scheduler checked backoff and interval but ignored paused_at flag entirely.

15. Config Indirection & Hot-Reload

When a config value is read from an indirection layer (atomic pointer, cache, holder), tests must verify the runtime code actually reads from that layer — not from the original source.

  • Runtime reads from the correct source: When introducing a config holder/cache that sits between startup config and runtime consumers, test that every consumer reads from the holder — not from the original config struct. A holder that is updated on reload but never read is a silent no-op. Test by changing the holder's value and asserting that subsequent operations use the new value, not the startup value. 🔥 Found in bug hunts: configHolder was updated by SIGHUP and admin reload, but all JWT signing/parsing and SSO encryption/decryption still read from the static startup srv.cfg — hot-reload had zero runtime effect.
  • Full-replacement vs merge on reload: When a config reload replaces the entire config object (not individual fields), test that a reload with a partial config file does not zero out fields that were present at startup but absent from the reload file. If the reload file only contains JWT_SECRET, fields like SMTP_HOST must retain their previous values — not silently become empty. 🔥 Found in bug hunts: LoadFromSecretsFile created a new ReloadableConfig from scratch — any field not in the file was zeroed, silently disabling SMTP, SSO encryption, and other features.
  • Feature wiring completeness: When a feature has implementation code (constructors, methods, types) and configuration fields, test that the startup path actually calls the constructors with the config values. Dead code that is implemented but never wired is invisible to unit tests. 🔥 Found in bug hunts: SyslogWriter and SetSyslog were fully implemented, SIEMSyslogAddr existed in ReloadableConfig, but no startup code ever called NewSyslogWriter or SetSyslog — SIEM forwarding was entirely dead code.

16. Test Infrastructure Hygiene

Test code has the same error-swallowing bugs as production code — but they're harder to catch because the symptom is a misleading test failure, not a user-visible bug.

  • Test setup must not discard errors: When test setup calls store methods (CreateOrg, CreateUser, CreateChannel), never discard the error with _ =. A nil return from a failed CreateOrg propagates as a nil pointer dereference deep in test logic — the stack trace points at the wrong line, and the real failure (e.g., missing migration, duplicate constraint) is invisible. Use require.NoError(t, err) for every setup call. Grep for , _ := and _, _ := patterns in _test.go files as a code review step. 🔥 Found in review: CreateOrg and CreateUser errors discarded with _ = across multiple test files in store, notify, and alert packages — infrastructure failures surfaced as nil pointer panics.
  • Tests must be executed, not just compiled: go build and go vet verify syntax; only go test verifies behavior. When infrastructure is unavailable (Docker Desktop down, test DB unreachable), treat this as a hard blocker — not something to work around. Code that compiles but was never executed is unverified code. 🔥 Found in remediation: 4,600 lines of test code across 4 execution stages were merged to dev with "compiles clean" verification only. Docker Desktop was unavailable throughout. 12 of 13 CI failures would have been caught by running the tests once.
  • Nullable columns need nullable scan destinations: When querying a table with nullable columns (common in security_events, audit_log), the scan destination must be a pointer type (*string, *uuid.UUID) or sql.NullString. Scanning a nullable column into a plain string panics at runtime: can't scan into dest: cannot scan NULL into *string. This is invisible at compile time. 🔥 Found in remediation: flushAndQueryEvents helper scanned actor_email (nullable) into string — 9 tests crashed in CI.
  • CSRF header on mutating test requests: Every POST/PUT/PATCH/DELETE request through the full HTTP stack must include req.Header.Set("X-Requested-By", "CVErt-Ops"). Cookie-authenticated requests without it get 403 from CSRF middleware. GET requests are exempt. 🔥 Found in remediation: 3 admin user tests got 403 because CSRF header was missing from POST requests — the pattern existed in every other test file but wasn't copied.
  • DB CHECK constraints must cover all handler action strings: When a migration adds a CHECK constraint on an enum-like column (e.g., audit_log.action), the allowed values must include every string used by application code. Adding a new handler action (revoke, add, remove) without updating the constraint silently fails in CI while working locally if the migration hasn't been re-run. 🔥 Found in remediation: audit_log CHECK constraint allowed create/update/delete but handlers used revoke, add, remove, bind, unbind.