# Quality Engineering Guide

> Living document — last updated: 2026-03-15
>
> This guide defines PostgresAI's quality standards, processes, and automated
> gates. It serves as the "constitution" that both AI agents and human engineers
> reference when building, reviewing, and releasing software.

## Core Philosophy: Quality as Code

PostgresAI products touch production PostgreSQL instances — mistakes can mean
data loss, incorrect diagnostics, or silent monitoring failures. Traditional QA
departments don't fit a small, distributed team. Instead, quality is embedded
into the development workflow itself:

| Layer | Purpose | Catches |
|-------|---------|---------|
| **1. Automated Gates** | CI/CD pipelines, pre-commit hooks, schema validation | ~80% of issues before any human sees them |
| **2. AI-Assisted Review** | PostgreSQL-specific PR review, test generation, spec gap analysis | Edge cases, combinatorial scenarios, domain-specific bugs |
| **3. Human Judgment** | Architecture decisions, customer scenarios, risk assessment | Design flaws, UX issues, safety-critical decisions |

---

## Layer 1: Automated Foundation

### 1.1 Pre-Commit Hooks

Every developer must have pre-commit hooks installed (`pre-commit install`).
Current hooks:

- **gitleaks** — Prevents secrets from being committed
- **TypeScript typecheck** — Catches type errors before push
- **pytest (unit)** — Runs fast unit tests on changed Python files

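
These hooks are typically wired up via a `.pre-commit-config.yaml` at the repo root. A hedged sketch (the gitleaks repo and hook id are the upstream defaults; the local hook names, `entry` commands, and the pinned `rev` are illustrative assumptions, not this repo's actual config):

```yaml
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.2            # pin to a real release tag; this one is illustrative
    hooks:
      - id: gitleaks
  - repo: local
    hooks:
      - id: typescript-typecheck
        name: TypeScript typecheck
        entry: bun run typecheck   # assumed package script
        language: system
        types: [ts]
        pass_filenames: false
      - id: pytest-unit
        name: pytest (unit)
        entry: pytest -m unit      # changed Python files are passed as arguments
        language: system
        types: [python]
```

After installing, `pre-commit run --all-files` verifies the hooks pass on a clean tree.
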
### 1.2 CI Pipeline Quality Gates

Every PR must pass these gates before merge:

| Gate | Tool | Blocks Merge |
|------|------|:------------:|
| Python unit + integration tests | pytest + pytest-postgresql | Yes |
| CLI unit tests + coverage | Bun test runner | Yes |
| CLI smoke tests | Node.js + built CLI | Yes |
| E2E monitoring stack tests | Docker-in-Docker | Yes |
| Helm/config validation | pytest + helm template | Yes |
| SAST security scanning | GitLab SAST | Yes |
| Secret detection | gitleaks | Yes |
| JSON schema validation | ajv / jsonschema | Yes |
| Performance regression check | Benchmark comparison | Warning |

### 1.3 PostgreSQL Version Matrix

Products must be tested against supported PostgreSQL versions:

| Version | Status | CI Coverage |
|---------|--------|:-----------:|
| 14 | Supported | Nightly |
| 15 | Supported | Every PR |
| 16 | Supported | Every PR |
| 17 | Supported | Nightly |
| 18 | Preview | Weekly |

### 1.4 Test Categories

Tests are organized by execution speed and infrastructure requirements:

```
pytest markers:
  unit              — Fast, mocked, no external services (~seconds)
  integration       — Requires PostgreSQL (~30s)
  requires_postgres — Alias for integration
  e2e               — Full monitoring stack (~minutes)
  enable_socket     — Allow network access

Bun test tags:
  *.test.ts             — Unit tests (default)
  *.integration.test.ts — Integration tests
```

### 1.5 Coverage Requirements

| Component | Minimum | Target |
|-----------|:-------:|:------:|
| Reporter (Python) | 70% | 85% |
| CLI (TypeScript) | 60% | 80% |
| New code (any) | 80% | 95% |

Coverage is reported automatically in CI and visible in MR/PR comments.

### 1.6 Schema Validation

All health check outputs must conform to JSON schemas in `reporter/schemas/`.
Schema compliance is enforced at two levels:

1. **Build time** — `test_report_schemas.py` validates all check outputs
2. **Runtime** — `checkup.ts` validates against embedded schemas before upload

When adding a new check:

1. Create `reporter/schemas/<CHECK_ID>.schema.json`
2. Add test cases in `tests/reporter/`
3. Add CLI implementation in `cli/lib/checkup.ts`
4. Validate output matches schema in both Python and TypeScript paths
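
The build-time gate reduces to validating each check's output against its schema. A stdlib-only sketch of the idea (the real tests use the jsonschema package against the files in `reporter/schemas/`; the `H001` schema here is a made-up miniature covering only `required` and `enum`):

```python
def validate_check_output(output: dict, schema: dict) -> list[str]:
    """Return a list of violations; an empty list means the output is valid."""
    errors = []
    # Every required field must be present.
    for field in schema.get("required", []):
        if field not in output:
            errors.append(f"missing required field: {field}")
    # Fields constrained by an enum must hold an allowed value.
    for field, rules in schema.get("properties", {}).items():
        if field in output and "enum" in rules and output[field] not in rules["enum"]:
            errors.append(f"{field}: {output[field]!r} not allowed")
    return errors

# Hypothetical miniature of a check schema.
H001_SCHEMA = {
    "required": ["check_id", "status", "results"],
    "properties": {"status": {"enum": ["ok", "warning", "critical"]}},
}
```

A conforming output such as `{"check_id": "H001", "status": "ok", "results": []}` yields no violations; dropping a required field or using an unknown status fails the gate.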

---

## Layer 2: AI-Assisted Quality

### 2.1 AI PR Review

Every PR is reviewed by an AI agent with the PostgreSQL-specific system prompt
defined in `quality/pr-review-prompt.md`. The review focuses on:

- **SQL safety** — Injection paths, raw string concatenation, missing parameterization
- **Connection handling** — Unclosed connections, missing timeouts, pool exhaustion
- **Transaction safety** — Incorrect isolation assumptions, long-running transactions
- **Resource leaks** — Unreleased advisory locks, unclosed cursors, temp table accumulation
- **PostgreSQL version compatibility** — Features not available in all supported versions
- **Error handling** — Missing error paths on database operations
- **Lock awareness** — DDL that acquires AccessExclusive locks, missing `CONCURRENTLY`

### 2.2 AI Test Generation

When implementing a new health check or analyzer, use AI to generate test
scaffolding:

1. Write the spec/implementation
2. Feed to AI with the prompt: *"Generate test cases for this PostgreSQL analyzer.
   Cover: normal case, empty table, table with no indexes, partial indexes,
   expression indexes, concurrent DDL during analysis, permission errors,
   PostgreSQL version differences."*
3. Developer reviews, adjusts, and commits the tests
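
The scaffolding that comes back usually enumerates the prompted scenarios as parametrized stubs for the developer to fill in. An illustrative shape (the scenario names and expected statuses are invented):

```python
import pytest

# One entry per scenario from the prompt; the developer replaces the stub
# body with real fixtures and assertions before committing.
SCENARIOS = [
    ("normal_case", "ok"),
    ("empty_table", "ok"),
    ("no_indexes", "warning"),
    ("partial_indexes", "ok"),
    ("permission_error", "error"),
]

@pytest.mark.parametrize("scenario, expected_status", SCENARIOS)
def test_index_analyzer_scaffold(scenario, expected_status):
    pytest.skip(f"scaffold for {scenario!r}: implement before merge")
```

Skipped stubs keep the scenario list visible in CI output until each case is implemented.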

### 2.3 Spec Gap Analysis

Before implementation begins, feed the spec to AI for gap analysis:

- *"What failure modes aren't addressed in this spec?"*
- *"What PostgreSQL version-specific behaviors could affect this?"*
- *"What happens if this runs concurrently with vacuum/reindex/DDL?"*

### 2.4 Automated Issue Triage

When a bug report arrives:

1. AI agent classifies severity (P0-P3)
2. Identifies likely affected components (reporter, CLI, monitoring stack)
3. Searches for related past issues
4. Drafts initial investigation path
5. Human picks up with context already assembled

---

## Layer 3: Human Quality Decisions

### 3.1 Architecture Reviews

Required for:

- New health checks that modify database state
- Changes to the Analyst/Auditor/Actor pipeline
- New autonomous actions (anything that writes to production databases)
- Changes to connection pooling or authentication flows
- New PostgreSQL extension dependencies

### 3.2 Customer Scenario Testing

Before each release, one engineer walks through key customer workflows:

| Scenario | What to verify |
|----------|----------------|
| Express checkup on fresh PostgreSQL | All checks run, report is valid JSON, upload succeeds |
| Monitoring stack install (demo mode) | `local-install --demo` completes, Grafana accessible, metrics flowing |
| Add external target database | Target added, metrics collected, checkup runs against it |
| Large database checkup | No timeouts, memory stays bounded, results are accurate |
| Extension-heavy database | Common extensions (PostGIS, pg_partman, pg_stat_statements) don't cause failures |

### 3.3 Risk Classification for Autonomous Actions

Every autonomous action (current or future) must have a risk classification:

| Risk Level | Description | Gate |
|------------|-------------|------|
| **Read-only** | Queries, EXPLAIN, pg_stat views | Automated |
| **Advisory** | Recommendations shown to user | AI review + human spot-check |
| **Reversible write** | CREATE INDEX CONCURRENTLY, config changes with reload | Human approval required |
| **Irreversible write** | DROP, TRUNCATE, ALTER TABLE rewrite | Human approval + confirmation prompt |
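
To keep this gate enforceable in code rather than by convention, the four levels can be modeled directly. A hypothetical sketch (only the level names come from the table; the function names are invented):

```python
from enum import Enum, auto

class Risk(Enum):
    READ_ONLY = auto()
    ADVISORY = auto()
    REVERSIBLE_WRITE = auto()
    IRREVERSIBLE_WRITE = auto()

def requires_human_approval(risk: Risk) -> bool:
    # Both write levels need a human in the loop before execution.
    return risk in (Risk.REVERSIBLE_WRITE, Risk.IRREVERSIBLE_WRITE)

def requires_confirmation_prompt(risk: Risk) -> bool:
    # Only irreversible writes add the extra confirmation step.
    return risk is Risk.IRREVERSIBLE_WRITE
```

An action dispatcher can then refuse to run anything whose declared risk level demands an approval it has not received.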

---

## PostgreSQL-Specific Quality Standards

### SQL Query Standards

- All queries generated by the product must be tested with `EXPLAIN ANALYZE`
- No sequential scans on tables expected to have >10k rows
- No queries that acquire `AccessExclusiveLock` without explicit documentation
- All SQL uses parameterized queries (`$1`, `$2`) — never string concatenation
- Queries must specify `statement_timeout` for safety
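
The parameterization rule in practice: build the SQL text and its bind parameters separately, and let the driver do the substitution server-side. A minimal sketch (the helper name and query are illustrative):

```python
def large_tables_query(min_rows: int) -> tuple[str, tuple]:
    """Return SQL text and bind parameters separately; the value is bound
    as $1 by the driver and never touches the SQL string itself."""
    sql = (
        "SELECT relname, n_live_tup FROM pg_stat_user_tables "
        "WHERE n_live_tup > $1 ORDER BY n_live_tup DESC"
    )
    return sql, (min_rows,)

# Anti-pattern the rule forbids -- value concatenated into the string:
#   sql = "... WHERE n_live_tup > " + str(min_rows)
```

Because the returned SQL contains only the placeholder, a hostile or malformed value can never change the query's structure.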

### Extension Compatibility

First-class CI coverage for these extensions (used by most customers):

| Extension | Why |
|-----------|-----|
| pg_stat_statements | Core dependency for K-series checks |
| pg_stat_kcache | CPU/IO metrics in D004 |
| auto_explain | Query plan analysis |
| pg_buffercache | Buffer analysis |
| PostGIS | Common in customer deployments |
| pg_partman | Partition management |
| pgvector | Growing adoption |

### Connection Handling Standards

- All connections must have a `statement_timeout` (default: 30s for checks)
- All connections must have a `connect_timeout` (default: 10s)
- Connections must be returned to pool or closed in `finally` blocks
- Connection errors must produce actionable error messages
- Maximum connection count must be configurable and bounded
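
The `finally` rule, sketched against a minimal stand-in connection (a real code path would use a pooled driver connection; the stub exists only to make the pattern concrete):

```python
class StubConnection:
    """Stand-in for a DB connection; records whether close() ran."""
    def __init__(self):
        self.closed = False

    def execute(self, sql: str):
        if self.closed:
            raise RuntimeError("connection already closed")

    def close(self):
        self.closed = True

def run_check(conn, sql: str):
    try:
        conn.execute("SET statement_timeout = '30s'")  # per-check default
        conn.execute(sql)
    finally:
        conn.close()  # or return to the pool; never skipped on error

conn = StubConnection()
run_check(conn, "SELECT 1")
assert conn.closed
```

The `finally` block guarantees the connection is released even when the check itself raises, which is exactly the failure mode that exhausts pools in production.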

### WAL and Replication Safety

- Features touching WAL or replication need tests for:
  - Replica lag scenarios
  - Failover during operation
  - WAL segment cleanup interaction
- Never hold connections across WAL switch boundaries unnecessarily

---

## Process: Feature Development Workflow

### For Every Feature

```
1. Spec written
   └─→ Spec reviewed by engineer + AI gap analysis

2. Implementation + tests
   └─→ Developer writes code
   └─→ AI generates test scaffolding from spec
   └─→ Developer refines tests

3. PR opened
   └─→ CI runs fast suite (unit + lint + typecheck)
   └─→ AI runs PostgreSQL-specific review
   └─→ Human reviewer focuses on design + correctness

4. Merge to main
   └─→ Nightly: full PostgreSQL version matrix
   └─→ Nightly: performance benchmarks vs baseline

5. Release candidate
   └─→ AI produces release readiness report
   └─→ Human does scenario walkthrough
   └─→ Go/no-go decision
```

### PR Review Checklist

Before approving any PR, verify:

- [ ] Tests cover the happy path AND at least 2 error paths
- [ ] New SQL queries are parameterized (no string concatenation)
- [ ] Database connections are properly closed/returned
- [ ] New checks have a corresponding JSON schema
- [ ] Schema changes are backward-compatible
- [ ] No new dependencies without justification
- [ ] Error messages are actionable (not just "something went wrong")
- [ ] PostgreSQL version-specific behavior is handled
- [ ] No hardcoded credentials, tokens, or connection strings

---

## Quality Metrics

Track these metrics to measure the quality system's effectiveness:

| Metric | How to Measure | Target |
|--------|----------------|--------|
| Test coverage (Python) | `pytest --cov` in CI | >70% overall, >80% new code |
| Test coverage (CLI) | Bun coverage in CI | >60% overall, >80% new code |
| CI pipeline pass rate | GitLab CI analytics | >90% on main |
| Mean time from bug introduction to detection | Git blame + issue timestamps | <1 sprint |
| Performance benchmark trend | Nightly benchmark results | No regression >5% |
| Schema validation failures | CI artifact count | 0 on main |
| Security findings (SAST) | GitLab security dashboard | 0 critical/high |

---

## Weekly Quality Rhythm

| Day | Activity |
|-----|----------|
| **Monday** | Review nightly test failures, triage new issues |
| **Wednesday** | Mid-week check: any flaky tests? CI pipeline health? |
| **Friday** | Quality retro: what slipped through? New test needed? CI tightening? |

---

## What We Don't Do

- **Dedicated QA team** — Quality ownership stays with engineers, amplified by AI
- **Manual test plans in spreadsheets** — Everything is code
- **Separate staging that drifts** — Use the monitoring stack's own Docker setup to mirror real environments
- **100% coverage targets** — Diminishing returns; focus on critical paths and failure modes