Test scenario checklist for reviewing coverage of any feature. Every item on this list exists because it catches bugs that have occurred in real codebases. Items marked with 🔥 Found in bug hunts were discovered in this codebase specifically. Unmarked items are equally important — they represent bugs we haven't made yet. Do not deprioritize an item because it lacks a marker.
- Multi-step flows under concurrent access: When a flow reads state then writes state (check-then-act), test two goroutines racing through the same flow simultaneously. Use a barrier pattern (
ready := make(chan struct{})+close(ready)) to ensure goroutines hit the critical section at the same time —sync.WaitGroupalone doesn't guarantee simultaneous execution. - Token/code consumption: Any "use once" token (reset, verification, invitation) must be tested with two concurrent consumers. Exactly one must succeed. 🔥 Found in bug hunts: password reset token consumed across separate transactions allowed concurrent resets.
- Rate limit bypass under concurrency: Count-then-insert rate limits can be bypassed by concurrent requests that all read the same count before any insert. Test with burst requests. 🔥 Found in bug hunts: concurrent forgot-password requests exceeded per-user rate limit.
- Bootstrap/initialization races: First-user, first-org, or any "only if none exist" flow must be tested with concurrent attempts. Exactly one must win. 🔥 Found in bug hunts: first-user bootstrap race in invite-only mode.
- Idempotent operations under concurrency: When an operation should be idempotent (e.g., accepting an invitation twice), test concurrent execution — the second attempt must not produce a 500 from a constraint violation. 🔥 Found in bug hunts: concurrent invitation accept hit constraint violation.
- Job deduplication under concurrent triggers: When both a scheduler tick and an API trigger can enqueue the same job type, test that concurrent triggers produce exactly one running job — not two. 🔥 Found in bug hunts: job queue TOCTOU allowed duplicate feed runs.
- Cleanup and eviction: When code accumulates state (maps, caches, queues), test that stale entries are eventually cleaned up. Don't just test "it works" — test "it doesn't leak." 🔥 Found in bug hunts: lockout map grew without bound — no eviction ever triggered.
- Case sensitivity: When a string key is used for identity (email, username), test that case variations are treated consistently.
Admin@Example.comandadmin@example.commust be the same identity. 🔥 Found in bug hunts: login lockout bypassed via email case variation. - Bounded growth: For any in-memory data structure that grows with external input, test that it has a maximum size or eviction policy. Simulate 1000+ entries and verify memory is bounded.
- Resource release on panic: When a resource (semaphore, lock, file handle) is acquired, test that it's released even if the protected code panics. Prefer
deferover explicit release. 🔥 Found in bug hunts: argon2 semaphore not released on panic. - Stream error recovery: When a streaming parser (e.g.,
json.Decoder) hits a malformed record andcontinues to the next iteration, test that subsequent valid records still parse correctly. Acontinueafter a partial read can leave the decoder in a corrupted state — the nextDecode()call reads garbage. 🔥 Found in bug hunts: streaming JSONcontinueafter error corrupted decoder state. - Post-filter pagination interaction: When SQL paginates (
LIMIT N+1) and an application-layer filter removes rows from the result, the has-next-page decision must use the pre-filter row count — not the post-filter count. Test with a page where the filter removes enough rows thatlen(filtered) <= limitwhilelen(unfiltered) > limit. The cursor position must also account for the filter: use the last visible row if any survive, or fall back to the last pre-filter row so the client can advance past an empty page. 🔥 Found in bug hunts:PostFilterregex conditions applied after SQL fetch — pagination decision used post-filter count, silently truncating results. - Case-sensitivity divergence between SQL and application layers: When the same logical condition is evaluated in SQL (e.g.,
lower(column)) and again in application code (e.g., Go regex match), test that both paths apply the same case normalization. A case-insensitive SQL query paired with a case-sensitive application filter silently drops matches. 🔥 Found in bug hunts: alert evaluator usedlower(cves.description_primary)in SQL, butPostFilterFieldreturned the raw-case description — same regex rule produced different results depending on evaluation path.
Silent error swallowing is the #1 bug category in this codebase — 26% of all bug hunt findings. Every error path must be tested explicitly.
- Information leakage via error codes: When a handler must return the same status code regardless of whether a resource exists (anti-enumeration), test that ALL error paths — including DB errors on post-lookup queries — return the same status. A 500 on a query that only runs for existing users leaks user existence. 🔥 Found in bug hunts:
GetPasswordResetTokenByHasherror leaked user existence. - Error swallowing in Go: When code silently returns on error (
if err != nil { return }), test that the error is at minimum logged. Silentreturnhides bugs. 🔥 Found in bug hunts: sync state persistence errors silently discarded; fetch log errors silently discarded;resendVerificationHandlerclaimed "sent" when SMTP failed. - Partial failure in multi-step flows: When a flow commits step 1 (e.g., creates a user) then fails on step 2 (e.g., creates an org), test that step 1's result is still usable. The user must not be in a broken state. 🔥 Found in bug hunts: orphaned tokens created when email delivery failed during registration.
- Silent success on missing resources: When a DELETE or cancel operation targets a non-existent resource, test that it returns 404 — not 204. A silent 204 hides bugs in the caller. 🔥 Found in bug hunts:
cancelInvitationHandlerreturned silent success on 404. - Error propagation across layers: When a store method returns an error, trace it through the handler to the HTTP response. Test that the handler doesn't discard the error and return 200. Inject DB errors via test fixtures and verify the HTTP status reflects the failure.
- Error response format consistency across layers: When handlers use a structured error format (e.g., RFC 9457
application/problem+json), middleware must use the same format. Test that errors from middleware (auth, RBAC, CSRF, rate limiting) produce the sameContent-Typeand body structure as handler errors. A client parsing structured errors will break ontext/plainresponses from middleware. 🔥 Found in bug hunts: 16 middleware error paths usedhttp.Error()(text/plain) while all handlers usedwriteProblem()(application/problem+json) — same 401 status arrived in two different formats.
- Create vs Update/PATCH: When a create handler validates a field (e.g., rejects empty names), the corresponding PATCH/update handler MUST apply the same validation. When testing PATCH, test every invalid input that create rejects — without exception. 🔥 Found in bug hunts: PATCH channel allowed empty/whitespace names; org create/update allowed whitespace-only names.
- Whitespace-only strings:
""and" "are both "empty" from a business perspective. When a handler rejects empty strings, test that it also rejects whitespace-only strings. Usestrings.TrimSpace()before checking. - Zero-value vs absent in PATCH: For pointer-based optional fields (
*string,*bool), test that sending an explicit zero value (empty string, false) is handled correctly — not silently ignored byomitempty. - Falsy-value data preservation: When mapping external data to internal structs, test that legitimate zero values (CVSS score of 0.0, empty string that means "no value" vs missing field) are preserved — not dropped by truthiness checks. 🔥 Found in bug hunts: CVSS 0.0 scores dropped as falsy in GHSA, MSRC, and OSV adapters — this bug recurred three separate times.
- Single canonical input source: When the same logical parameter can arrive from multiple sources (query param, JSON body, header), test that only one source is accepted. If both are provided with different values, the handler must either reject the request or document which source wins — never silently prefer one. 🔥 Found in bug hunts: bulk retry handler accepted
limitfrom both query param (max 200) and JSON body (max 1000) — body silently won with no documentation.
- Dangerous config combinations: When two config values interact (e.g., CORS wildcard + credentials), test that dangerous combinations are rejected at startup — not silently accepted.
- Missing config for optional features: When a feature depends on external config (SMTP host, API key), test what happens when the config is absent. The error message must name the missing config — not produce a cryptic connection failure.
- Rate limit accounting: When a flow implicitly creates a counted resource (e.g., registration creates a verification token), and a rate limit counts those resources, the test must account for the implicitly-created resource. A limit of 3 with 1 implicit creation means 2 explicit attempts before the limit.
- Schema version drift: When the application checks schema version at startup, test that the expected version constant matches the latest migration. A stale constant produces spurious warnings on every boot. 🔥 Found in bug hunts:
expectedSchemaVersionwas stale — startup always warned. - Unimplemented CLI flags must error: When a CLI flag is parsed but not yet implemented, the command must return an error — never log a warning and exit 0. A user running
--dry-runexpects either dry-run behavior or an explicit failure. Test every accepted flag to verify it either changes behavior or rejects the invocation. 🔥 Found in bug hunts:validate-feeds --dry-runlogged "not yet implemented" then exited 0 with "all feeds valid" — misleading success. - Config endpoint secret redaction: When an admin or debug endpoint exposes configuration values, test that every field containing a secret (API keys, JWT secrets, database URLs, encryption keys, SMTP passwords) is redacted in the response — not just omitted. A refactoring that changes the redaction helper or adds a new secret field silently exposes credentials if there's no test asserting redaction. Test by comparing the response value against the actual secret. 🔥 Found in coverage review:
adminConfigHandlercallsredactSecret()on 10+ fields, but has 0% test coverage — a refactoring mistake could expose all secrets in plaintext. - Dependent config sub-field validation: When a config struct has a type/mode field that makes other fields required (e.g.,
pagination.type: cursorrequirescursor_paramandcursor_path), validate all dependent fields in theValidate()method — not just the type field itself. Test with a config that has a valid type but missing required sub-fields, and assert a validation error. An unchecked dependency can cause infinite loops (empty cursor param → fetch same page forever) or silent data loss. 🔥 Found in coverage review: generic feedValidate()checkedpagination.typewas valid but not thatcursor_param/cursor_pathwere set — a misconfigured cursor feed could loop infinitely.
- Soft-delete filtering: Every query that lists or counts resources must be tested after a soft-delete. If
deleted_at IS NULLis missing, soft-deleted records leak into results. - Expired record handling: Queries filtered by
expires_at > now()must be tested with expired records present — verify they're excluded, and verify cleanup jobs remove them. - Duplicate detection: When a resource should be unique within a scope (e.g., one pending invitation per email per org), test that creating a duplicate is rejected — not silently accepted. 🔥 Found in bug hunts:
createInvitationHandlerallowed duplicate pending invitations.
- Transaction helper compliance: Every store method must use the appropriate transaction helper (
withOrgTx,withBypassTx,WorkerTx). Directs.qors.dbaccess bypasses RLS setup. Verify with code inspection — grep fors.db.QueryContextands.db.ExecContextin org-scoped methods. 🔥 Found in bug hunts:CountUsers,CreateUser,GetOrgByID, andUpdatePasswordHashall bypassed transaction helpers. - Multi-attribute PATCH atomicity: When a PATCH handler modifies multiple attributes (e.g., tier + suspend state), all modifications must happen in a single transaction. Test by failing the second update (e.g., inject a constraint violation) and verifying the first update did not persist. Separate transactions per attribute produce inconsistent state on partial failure. 🔥 Found in bug hunts: admin org PATCH ran tier update and suspend/unsuspend as independent transactions — a failure on suspend left the tier change committed.
- Audit trail completeness: When similar operations (create/update/delete channel) have audit logging, verify that ALL operations in the same category (create/cancel/resend invitation) also have audit logging. Missing audit entries are hard to catch with automated tests — use code inspection.
- Defined event/error constants must be emitted: When a security event constant or error sentinel is defined in source code, test that at least one code path actually emits it. A constant defined but never referenced is dead code that creates a false sense of coverage -- monitoring dashboards, alert rules, and audit queries referencing the event will silently match nothing. Verify with a grep that every defined constant appears in at least one emit/log/record call outside its definition. 🔥 Found in bug hunts:
EventMFAChallengeExhaustedwas defined insecure/events.goand registered in the severity map, but never emitted -- the store method could not distinguish exhaustion from a normal failure, so the handler always emitted a generic event instead. - Direct tests for every public store method: When a store method is only tested indirectly (e.g.,
DeleteAllRecoveryCodestested viaRegenerateRecoveryCodeswhich calls it internally), that's not real coverage. The indirect test won't catch bugs in the standalone method's error wrapping, return value, or transaction helper usage. Every public method on*Storeneeds at least one test that calls it directly and asserts its return values. 🔥 Found in review:DeleteAllRecoveryCodes,DeleteAllUserChallenges, andCountMFACredentialsByUserwere all tested only indirectly — none had direct tests verifying their behavior. - Multi-side-effect operations must assert ALL effects: When a store method or handler performs multiple side effects in one call (e.g.,
ResetUserMFAdeletes credentials, recovery codes, challenges, and incrementstoken_version), test that ALL side effects occurred — not just the primary one. A test that only checkshasMFA == falseafter a reset won't catch a bug where recovery codes survive, device tokens persist, or sessions remain valid. For each side effect: assert its post-condition directly. 🔥 Found in coverage review:TestAdminMFAResetchecked 1 of 4 side effects (credential deletion only);TestAdminForcePasswordResetchecked 1 of 3 (flag only — missed session invalidation and device token cleanup). - Test data must flow through production code paths: When an integration test needs seed data (CVEs, alert rules, notification channels), create it through the same code path production uses — not raw SQL inserts. A test that seeds CVEs via
db.ExecContext("INSERT INTO cves ...")with manualmaterial_hashvalues bypasses the merge pipeline's hash computation, child table population, and FTS index update. If the code under test evolves to join against child tables or rely on computed columns, the test passes with hand-crafted data while production fails. Use store methods or the merge pipeline to seed data. For CVE data specifically, usetestutil.SeedCorpus(t, db)— see §9 "When to use SeedCorpus vs hand-crafted fixtures." 🔥 Found in review: alert evaluator tests seeded CVEs via raw SQL with fakematerial_hashstrings — any query joiningcve_affected_packageswould silently return nothing. - Transaction commit persistence verification: When testing a transaction helper's commit path, verify that a write inside the transaction is readable after the function returns — not just that the function returned no error. A transaction helper that calls
Rollback()instead ofCommit()would pass a test that only checkserr == nil. The symmetric test: rollback tests verify rows are NOT persisted; commit tests must verify rows ARE persisted. 🔥 Found in review:TestOrgTx_CommitsOnSuccessranSELECT 1inside the transaction and asserted no error — never wrote a row or verified persistence.
- SMTP/webhook failure reporting: When a handler sends an email or webhook, test the failure path. Does it claim success when the send failed? Does it surface the error to the user (for authenticated endpoints) or silently log it (for unauthenticated anti-enumeration endpoints)? 🔥 Found in bug hunts:
resendVerificationHandlerclaimed "sent" when SMTP failed. - Sync vs async send implications: When an email is sent synchronously, the handler can report failure. When async, it can't. Test that the chosen approach matches the error-reporting contract. Don't make a send async if the API promises to report delivery status. 🔥 Found in bug hunts: verification email sent synchronously blocked the registration endpoint under SMTP latency.
- No open DB transaction during external I/O: When code makes an outbound HTTP call (webhook, feed fetch), verify that no database transaction is held open during the call. The pattern is: claim job → commit → call external service → update status in new transaction. Holding a transaction open during a slow HTTP call exhausts the connection pool.
- Production client configuration exercised in tests: When a security-critical HTTP client has specific configuration (SSRF protection, redirect policy, connection limits, TLS settings), at least one functional test must exercise the production client — not a substitute. When all tests use a plain
http.Clientbecause the production client blocks127.0.0.1, the entire production client configuration is untested. Solve by verifying all configuration properties on the built client object, or by testing the production client against a non-loopback target. 🔥 Found in review: all webhook delivery tests usedplainHTTPClient()orbuildTestClient()bypassing safeurl — redirect policy,MaxConnsPerHost: 50, and SSRF blocking were never exercised in delivery path tests.
Feed adapters process untrusted external data. Every adapter test suite must cover these scenarios — upstream feeds are inconsistent, sparse, and periodically non-spec-compliant.
- Null byte sanitization: Test that null bytes (
\x00) in feed data are stripped before reaching Postgres. Postgres text columns reject null bytes — an unsanitized field causes the entire upsert to fail. Test with a fixture containing\x00in description, required action, or any free-text field. - Falsy-value preservation: Test that CVSS 0.0, empty strings, and other legitimate zero values are persisted — not dropped by Go truthiness checks (
if score != 0). This is the single most recurring adapter bug. 🔥 Found in bug hunts: CVSS 0.0 dropped in GHSA, MSRC, and OSV adapters — three independent occurrences. - Sparse field handling: A CVE without a CVSS score, without references, or without a description is normal — not an error. Test that the adapter and merge pipeline handle nil/missing fields gracefully without panicking or skipping the record.
- Wire format assumptions: Never assume the upstream response is a top-level JSON array. Test the actual envelope structure. Use
curl | jq 'keys'on the real API before writing fixtures. 🔥 Found in implementation-pitfalls: multiple adapters assumed wrong top-level structure. - Timestamp format tolerance: External feeds use inconsistent timestamp formats. Test with RFC3339, RFC3339Nano, date-only strings, and timezone-offset variants. A parser that hard-fails on a non-standard timestamp loses the entire CVE.
- Cursor advancement after success: After a successful fetch-and-merge cycle, test that the cursor has advanced in the database. A feed that fetches successfully but doesn't persist its cursor will re-process the same data on every run. 🔥 Found in bug hunts: Red Hat adapter never advanced
AfterDatecursor; feed sync cursor not persisted after successful fetch. - Cursor state on final page: Test that the cursor after processing the last page represents "caught up" — not the penultimate page's marker. 🔥 Found in bug hunts: NVD cursor regressed on final page.
- Crash recovery — no re-processing: Simulate a crash mid-pagination (process N of M pages, then restart). Verify that already-processed pages are not re-merged. When an adapter fetches all pages internally before checkpointing, every crash re-processes everything. 🔥 Found in bug hunts: GHSA fetched all pages internally with no mid-pagination checkpoint.
- Fetch duration tracking: Verify that fetch log entries record accurate start/end timestamps with non-zero duration. 🔥 Found in bug hunts:
InsertFeedFetchLogdiscardedstarted_atandended_at. - Behavioral parity between alternate code paths: When the same data can be processed by two different code paths (streaming vs buffered, fast-path vs slow-path, simple vs nested), test both paths with identical input and assert identical output. Type coercion differences between libraries are a common source of divergence — e.g.,
encoding/jsonrejects type mismatches whilegjsoncoerces them. 🔥 Found in bug hunts: generic adapter streaming path usedjson.Decode(&string)for cursor values while the buffered path usedgjson.String()— numeric cursors like42succeeded on the buffered path but failed on the streaming path. - End-to-end adapter→merge→store integration: For each feed adapter, test the full pipeline:
adapter.Fetch()→CanonicalPatch→merge.Ingest()→ database read-back. Adapter unit tests verify JSON→CanonicalPatch conversion; merge integration tests construct CanonicalPatch manually. Neither catches field-mapping errors at the boundary — e.g., swapping CVSSv3 and CVSSv4 vectors, or dropping CWE IDs during struct conversion. These produce structurally valid patches with silently wrong data. At minimum, one test per adapter should round-trip through the merge pipeline and assert specific field values in the resultingcvesrow. - Golden file tests against real captured data: Every feed adapter MUST have a
golden_test.gothat serves captured real API responses viahttptestand runsFetch()against them. Unit tests with hand-crafted JSON fixtures cannot detect upstream schema drift — the fixture matches what the developer assumed the API returns, not what it actually returns. Golden file tests are the only reliable way to catch this class of bug. 🔥 Found in Phase 10: MSRC adapter had 8 passing unit tests but was 100% broken against the real API (endpoint never existed). GHSA adapter had 27 passing unit tests but silently dropped every advisory (references field was[]string, not[]object). Both bugs were invisible until golden file tests ran real captured data.
testutil.SeedCorpus(t, db) seeds a test database with 65 real CVEs from 8 feed adapters via golden fixtures and the real merge pipeline. It provides a realistic, deterministic corpus for integration tests. Requires Docker (testcontainers).
Use SeedCorpus when:
- Testing features that query across the CVE corpus (alert evaluation, search/FTS, reports, watchlists)
- Testing merge pipeline behavior with multi-source CVEs (field precedence, vendor enrichment)
- Testing any code that joins
cveswith child tables (cve_sources,cve_affected_packages,cve_search_index) - You need CVEs that look like production data — proper
material_hash, populated child tables, FTS index entries
Use hand-crafted fixtures when:
- Testing a specific edge case that real data may never contain (CVSS 0.0, null bytes, malformed timestamps)
- Adapter unit tests that need precise control over individual JSON fields
- Testing error paths (malformed responses, HTTP failures, cursor regressions)
- Speed matters —
SeedCorpustakes ~10s (Docker startup); hand-crafted fixtures are instant
Never use raw SQL inserts for CVE test data — they bypass material_hash computation, child table population, and FTS index updates. Use SeedCorpus, store methods, or the merge pipeline. (See §7 "Test data must flow through production code paths.")
RLS bugs are invisible in tests that use the superuser connection. Every org-scoped store method needs a test through the restricted connection.
- Dual-connection testing: When testing any org-scoped store method, query through
db.AppStore(theNOBYPASSRLSconnection) — notdb.Store(superuser). Tests that only use the superuser connection cannot detect RLS policy bugs. 🔥 Found in implementation-pitfalls §2.14: four broken list methods passed all tests because the superuser connection returned all rows regardless ofapp.org_id. - Cross-tenant visibility assertion: For every org-scoped list/get operation, create data in Org A and Org B via the superuser connection, then query via
AppStorescoped to Org A. Org B's data must not appear. This is the minimum viable RLS test. - Per-user resource isolation: Not all isolation is org-scoped RLS. Per-user resources (MFA challenges, remember-device tokens, recovery codes) are filtered by
user_idin global tables — no RLS involved. For every per-user lookup, test that User B cannot access User A's resource using the same token/hash. This is easy to miss because "it's just a WHERE clause" — but a missingAND user_id = $1in a query turns a per-user resource into a global one. 🔥 Found in review: remember-device token cross-user isolation was untested — query correctness was assumed, not verified. - Child table isolation: RLS on a parent table does NOT protect child tables. When a child table has
org_id, it must have its own RLS policy and its own cross-tenant test. When a child table lacksorg_id, that is a schema bug — flag it. - Worker vs API transaction helpers:
WorkerTxbypasses RLS (for background jobs that need cross-tenant access).withOrgTxenforces it (for API handlers). Test that API handler code paths never useWorkerTx— grep forWorkerTxininternal/api/as a code review step. 🔥 Found in implementation-pitfalls §4.6: worker transaction helpers called from API handlers allowed cross-tenant data access.
Building a security mechanism is not enough — test that the mechanism actually fires. Every security check needs at least one test that proves rejection works.
- RBAC matrix coverage: For every protected endpoint, test with every role (owner, admin, member, viewer, unauthenticated). An endpoint that only tests the happy-path role doesn't prove the others are rejected. 🔥 Found in bug hunts: admin feed endpoints lacked role-based auth — any authenticated user could trigger feed syncs.
- JWT algorithm confusion: Test that a JWT signed with
alg: "none"is rejected. Test that a JWT signed with an unexpected algorithm (e.g., RS256 when the server expects HS256) is rejected.WithValidMethodsmust be present and tested. - Anti-enumeration timing: When a handler must not reveal whether a resource exists (forgot-password, login), test that response times for existing and non-existing targets are statistically indistinguishable. A constant-time comparison on the secret is necessary but not sufficient — a DB query that only runs for existing users creates a timing side-channel.
- Authorization parity across auth flows: When a behavior exists in one auth flow (native register creates a default org), verify it exists in ALL auth flows (GitHub OAuth, Google OIDC). Maintain a checklist of "things that happen on first registration" and test each flow against it. 🔥 Found in implementation-pitfalls §8.3: OAuth and native registration produced different post-registration states.
- Security check enforcement across similar endpoints: When adding a security check to one handler, grep for all handlers that perform the same kind of operation and verify they have the same check. Role assignment isn't just
updateMemberRole— it's also invitations, OAuth account linking, and admin overrides. 🔥 Found in implementation-pitfalls §8.1. - Webhook SSRF protection: When testing webhook delivery, test with internal/private IP targets (
127.0.0.1,169.254.169.254,10.0.0.0/8). Thesafeurlclient must reject them. Test with DNS rebinding (a hostname that resolves to a private IP). - Constant-time secret comparison: Any comparison of secrets (API key hashes, HMAC signatures, CSRF tokens) must use
subtle.ConstantTimeCompareorhmac.Equal— never==orbytes.Equal. Code inspection alone doesn't prevent regressions — complement it with a mechanical check (grep test,go/astscan, or linter rule) that breaks when the comparison function changes. - Multi-layer authorization negative cases: When a function checks multiple independent authorization layers (e.g., "site config → org policy → per-member rule"), test each layer's negative case with all other layers inactive. A test that sets layer 1's config to "off" but leaves layer 3 active gives false confidence — it passes whether layer 1's check works or not, because layer 3 is the actual reason the assertion holds. For N layers, you need at least N negative tests where only one layer could match and it doesn't. 🔥 Found in review:
UserMFARequiredlayer 1 negative test (site admin + config off) also had a per-member requirement active — would have passed even if layer 1 ignored the config flag entirely. - Multi-step restricted session state consistency: When an auth flow issues restricted tokens (pending JWTs) that carry state claims (e.g.,
token_version, remaining steps), and an intermediate step mutates that state in the DB (e.g., password change incrementstoken_version), test that the reissued restricted token reflects the post-mutation state -- not the pre-mutation value carried forward from the original token. The test must decode the reissued token and assert the claim matches the DB. 🔥 Found in bug hunts:changePasswordHandlerreissued a pending token with the staletoken_versionfrom beforeUpdatePasswordHashincremented it in the DB. - Full auth token issuance on restricted session completion: When a restricted session flow has multiple pending steps and the final step clears the last one, test that the handler issues full access/refresh tokens -- not just clears the restricted cookie. A flow that clears the cookie without issuing replacement tokens leaves the user unauthenticated despite having just proven their identity. Test the single-pending-step case specifically -- multi-step flows may mask the bug by always having remaining steps. 🔥 Found in bug hunts:
clearEnrollmentPendingonly cleared the cookie when all pending items were done -- never issued full tokens, forcing an unnecessary re-login. - Token type enforcement across session types: When the auth system issues multiple token types with different privilege levels (access, refresh, pending/restricted), test that each parser rejects tokens of the wrong type. A
ParseAccessTokenthat doesn't check thetypclaim accepts a pending MFA token as a fully-authenticated session — bypassing MFA entirely. For every token type, test that presenting it to the wrong parser/endpoint returns 401. 🔥 Found in review: no test verified that a pending token was rejected when presented as an access token. - Cross-user token swapping in multi-step flows: When a multi-step authentication flow issues user-bound bearer tokens (enrollment cookies, pending tokens, setup nonces), test that completing another user's flow with a stolen token is rejected. Start the flow as User A, then attempt to complete it as User B using A's token. The guard (
enrollClaims.UserID != authenticatedUserID) must be tested explicitly — it's easy to implement but also easy to forget when the happy path works. This is distinct from token type enforcement (§11 "Token type enforcement"): type enforcement verifies wrong parsers reject wrong types; cross-user enforcement verifies the same parser rejects wrong users. 🔥 Found in coverage review:mfaTOTPConfirmHandlerchecksenrollClaims.UserID != userIDto prevent User A from binding their authenticator to User B's account — no test exercised this guard. - Fail-open regression tests: When a security mechanism intentionally fails open on infrastructure errors (e.g., lockout check returns "allowed" on DB error, with rate limiter as backup), add a regression test that: (1) documents the design decision, (2) verifies the fail-open behavior under simulated infrastructure failure, (3) prevents accidental change to fail-closed (which could lock out all users on DB outage). Comments documenting design intent get deleted; regression tests don't. 🔥 Found in coverage review:
lockoutManager.Checkintentionally fails open on DB errors (rate limiter as backup defense) — no regression test existed to prevent an inadvertent change to fail-closed.
Empty catch {} blocks and stale state on navigation are the two most common frontend bug categories. 13 frontend bugs were found in bug hunts — the majority were silent failures that rendered blank pages.
- Error states, not blank pages: When a network request fails, the component must render an error message — never a blank page or an empty list that looks like "no results." Test every
fetch/GETcall with a mocked network error and assert that an error element is visible. 🔥 Found in bug hunts:WatchlistDetailViewrendered blank page on network error;GroupMembersDialogshowed "empty state" instead of error message;fetchItems()had no catch block. - Mutation error feedback: When a POST/PUT/DELETE fails, the user must see feedback. Test every mutation call with a mocked failure and assert visible error feedback. 🔥 Found in bug hunts:
MembersView.cancelInvitation()silently failed; multiple dialogs swallowed mutation errors. - Stale data on route param change: When a view fetches data based on a route param (e.g.,
/cves/:id), test that navigating to a different param triggers a new fetch — not a stale render of the previous data. Use Vue'swatchon the route param oronBeforeRouteUpdate. 🔥 Found in bug hunts:CveDetailViewshowed wrong CVE after param change. - Org switch cache invalidation: When the user switches organizations, every org-scoped data cache must be invalidated and refetched. Test by switching orgs and asserting that list data reflects the new org — not stale data from the previous org. 🔥 Found in bug hunts: org switch didn't refetch data — showed cross-org data.
- Auth token refresh deduplication: When multiple concurrent requests receive 401s, exactly one refresh request must fire — not one per failed request. Test with two parallel requests that both trigger 401, assert a single refresh call. 🔥 Found in bug hunts: dual independent
refreshPromisein API client and org fetch module. - Request body on retry: When a middleware retries a failed request (e.g., after token refresh), test that the original request body is re-sent. A
fetch()body is a stream — once consumed, it's gone. The retry must use a cloned or buffered body. 🔥 Found in bug hunts: refresh middleware retry used consumedfetch()body. - Non-2xx success responses: When an endpoint intentionally returns non-2xx status codes with valid response bodies (e.g., 503 for health checks, 207 for multi-status), test that the frontend correctly parses the response body — not just checks
resp.ok. Afetchresponse withok === falsestill carries a valid JSON body that must be read. 🔥 Found in bug hunts:AdminSystemViewdoctor check usedresp.okto gate JSON parsing, silently discarding 503 responses — the exact case where the doctor results are most needed.
When a migration adds admin-managed columns (disable flags, forced-reset flags, pause states), the feature is incomplete until the enforcement code reads them. These bugs are invisible in integration tests that only exercise the admin write path.
- Admin flag enforcement at all entry points: When a migration adds a boolean/timestamp flag (e.g.,
disabled_at,force_password_reset,paused_at), trace every code path that the flag should gate. Login handlers, refresh handlers, scheduler loops, and manual trigger endpoints all need to check the flag — not just the middleware that runs on authenticated API calls. 🔥 Found in bug hunts:disabled_atnot checked in login handler (only in auth middleware);force_password_resetnot checked anywhere;paused_atnot checked by scheduler or manual trigger endpoint. - In-memory vs DB state consistency: When both in-memory state (rate limiters, lockout managers) and DB columns exist for the same concept, test that: (1) DB state survives restarts, (2) admin "clear" operations actually affect the enforcement mechanism, (3) multi-instance deployments share state. 🔥 Found in bug hunts: in-memory lockout manager and DB
locked_at/failed_login_countcolumns were completely disconnected — admin "unlock" cleared DB columns the login flow never wrote. - Registry completeness for extensible systems: When a system supports both built-in items and user-configured items (e.g., built-in feeds + generic YAML feeds), test that management endpoints (list, trigger, pause, resume) work for BOTH categories — not just the hardcoded built-in list. 🔥 Found in bug hunts: admin feed list/trigger/pause/resume endpoints only accepted built-in feed names; generic feeds were invisible and unmanageable.
Workers operate outside the request-response cycle with different failure modes: no caller to report errors to, no request context for scoping, and crash recovery depends on database state.
- Claim-commit-call-update pattern: When a worker delivers to an external service (webhook, email), verify this exact sequence: (1) claim job in DB, (2) commit the claim transaction, (3) make the external HTTP call, (4) update job status in a new transaction. Any test that mocks the external call while a DB transaction is open is testing the wrong thing.
- Stale job recovery: Simulate a worker crash by claiming a job and not completing it. Verify that the recovery sweep picks it up after the stale threshold and re-enqueues it.
- Goroutine shutdown path: Every
go func() { for { ... } }()must have a correspondingStop()method and adonechannel. In tests, every server or worker that spawns goroutines must havet.Cleanup(srv.Close). A test that passes but leaks goroutines is a test that hides bugs. 🔥 Found in implementation-pitfalls §8.5. - Large corpus pagination: When a worker processes the full CVE corpus (activation scans, batch evaluation, search index rebuilds), test that it paginates — not loads everything into memory. Mock a corpus of 10,000+ records and verify memory stays bounded.
- Fan-out error independence: When delivering notifications to multiple channels, test that a failure on one channel does not cancel or skip the others. The pattern is
sync.WaitGroupwith independent error recording per channel — nevererrgroup(which cancels siblings on first error). - Prometheus counter race safety: When workers update Prometheus counters, test with
-raceenabled and multiple concurrent worker goroutines. Counter init and increment must be goroutine-safe. 🔥 Found in bug hunts: scheduler Prometheus counters had data race on init. - Scheduler respects pause/disable flags: When a scheduler checks whether to enqueue a job, test that it reads and respects all relevant state flags (paused, disabled, suspended) — not just timing-based checks (interval, backoff). 🔥 Found in bug hunts: feed scheduler checked backoff and interval but ignored
paused_atflag entirely.
When a config value is read from an indirection layer (atomic pointer, cache, holder), tests must verify the runtime code actually reads from that layer — not from the original source.
- Runtime reads from the correct source: When introducing a config holder/cache that sits between startup config and runtime consumers, test that every consumer reads from the holder — not from the original config struct. A holder that is updated on reload but never read is a silent no-op. Test by changing the holder's value and asserting that subsequent operations use the new value, not the startup value. 🔥 Found in bug hunts:
configHolderwas updated by SIGHUP and admin reload, but all JWT signing/parsing and SSO encryption/decryption still read from the static startupsrv.cfg— hot-reload had zero runtime effect. - Full-replacement vs merge on reload: When a config reload replaces the entire config object (not individual fields), test that a reload with a partial config file does not zero out fields that were present at startup but absent from the reload file. If the reload file only contains
JWT_SECRET, fields likeSMTP_HOSTmust retain their previous values — not silently become empty. 🔥 Found in bug hunts:LoadFromSecretsFilecreated a newReloadableConfigfrom scratch — any field not in the file was zeroed, silently disabling SMTP, SSO encryption, and other features. - Feature wiring completeness: When a feature has implementation code (constructors, methods, types) and configuration fields, test that the startup path actually calls the constructors with the config values. Dead code that is implemented but never wired is invisible to unit tests. 🔥 Found in bug hunts:
SyslogWriterandSetSyslogwere fully implemented,SIEMSyslogAddrexisted inReloadableConfig, but no startup code ever calledNewSyslogWriterorSetSyslog— SIEM forwarding was entirely dead code.
Test code has the same error-swallowing bugs as production code — but they're harder to catch because the symptom is a misleading test failure, not a user-visible bug.
- Test setup must not discard errors: When test setup calls store methods (
CreateOrg,CreateUser,CreateChannel), never discard the error with_ =. A nil return from a failedCreateOrgpropagates as a nil pointer dereference deep in test logic — the stack trace points at the wrong line, and the real failure (e.g., missing migration, duplicate constraint) is invisible. Userequire.NoError(t, err)for every setup call. Grep for, _ :=and_, _ :=patterns in_test.gofiles as a code review step. 🔥 Found in review:CreateOrgandCreateUsererrors discarded with_ =across multiple test files in store, notify, and alert packages — infrastructure failures surfaced as nil pointer panics. - Tests must be executed, not just compiled:
go buildandgo vetverify syntax; onlygo testverifies behavior. When infrastructure is unavailable (Docker Desktop down, test DB unreachable), treat this as a hard blocker — not something to work around. Code that compiles but was never executed is unverified code. 🔥 Found in remediation: 4,600 lines of test code across 4 execution stages were merged to dev with "compiles clean" verification only. Docker Desktop was unavailable throughout. 12 of 13 CI failures would have been caught by running the tests once. - Nullable columns need nullable scan destinations: When querying a table with nullable columns (common in security_events, audit_log), the scan destination must be a pointer type (
*string,*uuid.UUID) orsql.NullString. Scanning a nullable column into a plainstringpanics at runtime:can't scan into dest: cannot scan NULL into *string. This is invisible at compile time. 🔥 Found in remediation:flushAndQueryEventshelper scannedactor_email(nullable) intostring— 9 tests crashed in CI. - CSRF header on mutating test requests: Every POST/PUT/PATCH/DELETE request through the full HTTP stack must include
req.Header.Set("X-Requested-By", "CVErt-Ops"). Cookie-authenticated requests without it get 403 from CSRF middleware. GET requests are exempt. 🔥 Found in remediation: 3 admin user tests got 403 because CSRF header was missing from POST requests — the pattern existed in every other test file but wasn't copied. - DB CHECK constraints must cover all handler action strings: When a migration adds a CHECK constraint on an enum-like column (e.g.,
audit_log.action), the allowed values must include every string used by application code. Adding a new handler action (revoke,add,remove) without updating the constraint silently fails in CI while working locally if the migration hasn't been re-run. 🔥 Found in remediation: audit_log CHECK constraint allowedcreate/update/deletebut handlers usedrevoke,add,remove,bind,unbind.