| 1 | +# AGENTS.md — Inspect |
| 2 | + |
| 3 | +Website security auditing and crawling library for Go. Crawls sites concurrently, runs checks and declarative rules, generates findings with severity and CWE references. |
| 4 | + |
| 5 | +## Design Principles |
| 6 | + |
| 7 | +- **Library + CLI** — importable library with optional `inspect-ci` binary |
| 8 | +- **No LLM dependency** — pure static analysis on crawled pages |
| 9 | +- **Extensible** — custom checks (Go code) + declarative rules (no code required) |
| 10 | + |
| 11 | +## Build & Test |
| 12 | + |
| 13 | +```bash |
| 14 | +go test ./... # Run all tests |
| 15 | +go test -race ./... # Race detector |
| 16 | +go test -coverprofile=c.out ./... # Coverage |
| 17 | +go vet ./... # Static analysis |
| 18 | +gofumpt -w . # Format |
| 19 | +``` |
| 20 | + |
| 21 | +## Architecture |
| 22 | + |
| 23 | +- `crawler.go` — Concurrent website crawler with depth control |
| 24 | +- `check.go` — Check interface and built-in security checks |
| 25 | +- `rule.go` — Declarative rule engine (YAML-based) |
| 26 | +- `finding.go` — Findings with severity, CWE, and evidence |
| 27 | +- `report.go` — Report generation (JSON, SARIF, HTML) |
| 28 | +- `cmd/inspect-ci/` — Optional CI binary for pipeline integration |
| 29 | + |
| 30 | +## Conventions |
| 31 | + |
| 32 | +- Go 1.26+, pure Go, no CGO |
| 33 | +- Table-driven tests |
| 34 | +- Conventional Commits: `feat:`, `fix:`, `docs:`, `refactor:`, `test:` |
| 35 | +- No `Co-authored-by:` trailers (auto-stripped by githook) |
| 36 | +- `gofumpt` formatting enforced in CI |
| 37 | +- CWE references required for all security findings |
| 38 | + |
| 39 | +## Common Pitfalls |
| 40 | + |
| 41 | +- Crawler tests need HTTP test servers — use `httptest.NewServer` |
| 42 | +- Rule YAML must be validated before execution |
| 43 | +- Session cookie matching uses substring, not exact match |
| 44 | + |
| 45 | +## Naming Conventions |
| 46 | + |
| 47 | +- **Types are domain nouns**: `Finding`, `Report`, `Stats`, `Page`, `PageLink`, `Checker`, `RuleCheck` |
| 48 | +- **Option functions use `With` prefix**: `WithChecks()`, `WithDepth()`, `WithConcurrency()`, `WithAllowPrivateIPs()` |
| 49 | +- **Preset options are bare vars**: `Quick`, `Standard`, `Deep`, `SecurityOnly`, `CI` — exported `var Option` values |
| 50 | +- **Severity is a type alias**: `type Severity = types.Severity` from `hawk/shared/types` — shared across hawk-eco |
| 51 | +- **Internal adapters use `Adapter` suffix**: `ruleCheckAdapter`, `customCheckAdapter` — bridge public to internal interfaces |
| 52 | +- **Check names are lowercase strings**: `"security"`, `"links"`, `"forms"`, `"a11y"`, `"performance"` — used in `WithChecks()` |
| 53 | +- **Error handling**: `Scan()` returns `(*Report, error)`, with a validation error for an empty URL and a nil error on success |
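The naming rules above (the `With` prefix, presets as exported `Option` values) can be illustrated with a minimal functional-options sketch. The `config` fields, the default values, and what `Quick` sets are assumptions made for illustration; only the names `Option`, `optFunc`, `buildConfig`, `WithDepth`, `WithChecks`, and `Quick` come from this document, and the real signatures may differ.

```go
package main

import "fmt"

// config is a hypothetical stand-in for the library's unexported config struct.
type config struct {
	depth       int
	concurrency int
	checks      []string
}

// Option is anything that can modify a config.
type Option interface {
	apply(*config)
}

// optFunc adapts a plain function to the Option interface.
type optFunc func(*config)

func (f optFunc) apply(c *config) { f(c) }

// WithDepth limits how many links deep the crawl goes.
func WithDepth(n int) Option {
	return optFunc(func(c *config) { c.depth = n })
}

// WithChecks selects checks by their lowercase names.
func WithChecks(names ...string) Option {
	return optFunc(func(c *config) { c.checks = names })
}

// Quick is a preset: an exported Option value rather than a function.
// The values it sets here are assumed for illustration.
var Quick Option = optFunc(func(c *config) {
	c.depth = 1
	c.concurrency = 2
})

// buildConfig starts from assumed defaults and applies options in order,
// so later options override earlier ones.
func buildConfig(opts ...Option) config {
	c := config{depth: 3, concurrency: 4}
	for _, o := range opts {
		if o != nil {
			o.apply(&c)
		}
	}
	return c
}

func main() {
	c := buildConfig(Quick, WithDepth(2), WithChecks("security", "links"))
	fmt.Println(c.depth, c.concurrency, c.checks) // prints "2 2 [security links]"
}
```

Because options are applied in order, a preset can be passed first and then refined by individual `With*` options, as the `main` function does here.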
| 54 | + |
| 55 | +## API Patterns |
| 56 | + |
| 57 | +- **Functional options pattern**: same as sight — `Option` interface with `optFunc` adapter, `buildConfig()` merge |
| 58 | +- **One-shot + reusable**: `Scan(ctx, target, opts...)` creates a `Scanner` internally; `NewScanner(opts...)` for reuse |
| 59 | +- **Checker interface for extensibility**: `Name() string` + `Run(ctx, pages) []Finding` — register via `RegisterCheck()` |
| 60 | +- **RuleCheck for declarative rules**: `HeaderMatch`, `HeaderMissing`, `BodyMatch`, `BodyMissing`, `URLMatch` patterns |
| 61 | +- **Global + per-scanner custom checks**: `RegisterCheck()`/`RegisterRule()` register globally; pass slices to a `Scanner` for checks scoped to that scanner |
| 62 | +- **Report.Failed()**: checks if any finding meets `FailOn` severity threshold — same pattern as sight |
| 63 | +- **ReDoS protection**: all user-supplied regex patterns go through `compileWithTimeout()` and `matchWithTimeout()`, with 1s and 100ms limits respectively |
| 64 | +- **Regex complexity check**: `checkRegexComplexity()` rejects nested quantifiers and deep group nesting before compilation |
| 65 | + |
| 66 | +## Testing Patterns |
| 67 | + |
| 68 | +- **External test package**: `package inspect_test` — tests import `inspect` as a consumer would |
| 69 | +- **httptest.NewServer for scan tests**: each scan test spins up a mock HTTP server with specific HTML/headers/responses |
| 70 | +- **Test patterns by concern**: `TestScan_BasicSite` (links), `TestScan_SecurityHeaders`, `TestScan_FormCSRF`, `TestScan_Accessibility` |
| 71 | +- **Always pass `WithAllowPrivateIPs()`**: tests run against `127.0.0.1`, and without this option localhost is blocked |
| 72 | +- **Always pass `WithDepth(1)`**: keeps tests fast by limiting crawl depth |
| 73 | +- **Finding assertions**: iterate `report.Findings` and check specific `Check`, `Severity`, `Message` fields |
| 74 | +- **Preset smoke test**: `TestScan_Presets` runs all presets against a simple server — catches config panics |
| 75 | +- **ClearCustomChecks() in tests**: call before registering test-specific checks to avoid global state leaks |
| 76 | +- **Report method tests**: `TestReport_Failed`, `TestReport_MaxSeverity` — test on struct literals, no HTTP needed |
| 77 | + |
| 78 | +## Refactoring Guidelines |
| 79 | + |
| 80 | +- **Safe to refactor**: `checkRegexComplexity()`, `compileWithTimeout()`, `matchWithTimeout()` — internal helpers |
| 81 | +- **Safe to refactor**: `truncateEvidence()`, `intIn()` — pure utility functions |
| 82 | +- **Safe to refactor**: `parseInspectTOML()`, `parseInspectKeyValue()`, `applyFileConfig()` — config parsing internals |
| 83 | +- **Do not touch**: `Checker` interface (`Name()`, `Run()`) — breaking change for all custom check implementations |
| 84 | +- **Do not touch**: `RuleCheck` struct field names — used by consumers to define declarative rules |
| 85 | +- **Do not touch**: `Finding`, `Report`, `Stats` struct field names/tags — JSON serialization contract |
| 86 | +- **Safe to extend**: add new `Option` functions, new presets, new built-in checks in `checks/` package |
| 87 | +- **When adding checks**: create a new file in `checks/`, implement `Checker` interface, register in `init()` |
| 88 | + |
| 89 | +## Key File Locations |
| 90 | + |
| 91 | +| What | Where | |
| 92 | +|---|---| |
| 93 | +| Public API entry point | `inspect.go` (types, `Scan()`, `Finding`, `Report`, `Stats`) | |
| 94 | +| Check interface & adapters | `check.go` (`Checker`, `RuleCheck`, `RegisterCheck()`, `RegisterRule()`, ReDoS protection) | |
| 95 | +| Scanner implementation | `scanner.go` (crawler orchestration, check execution) | |
| 96 | +| Configuration & presets | `options.go` (`config` struct, `With*` functions, presets) | |
| 97 | +| Config file loading | `config.go` (`.inspect.toml` parsing, `LoadConfig()`) | |
| 98 | +| Severity type alias | `severity.go` (re-exports from `hawk/shared/types`) | |
| 99 | +| SARIF output | `sarif.go` | |
| 100 | +| CI output formatting | `ci_output.go` | |
| 101 | +| Built-in checks | `checks/` directory | |
| 102 | +| Internal crawler | `internal/crawler/` | |
| 103 | +| Internal check runner | `internal/check/` | |
| 104 | +| Browser-based crawling | `browser.go`, `browser/` | |
| 105 | +| LLM scanner integration | `llm_scanner.go` | |
| 106 | +| API security checks | `api_security.go` | |
| 107 | +| Dependency checking | `dependency_check.go` | |
| 108 | +| SBOM generation | `sbom.go` | |
| 109 | +| Main test file | `inspect_test.go` (httptest servers, per-concern scenarios) | |
| 110 | +| Linter config | `.golangci.yml` (errcheck, govet, staticcheck, gocritic, bodyclose, noctx) | |