# Multi-agent codebase review: audit and improve all components

## Overview

Launch a comprehensive, multi-agent codebase review of the entire postgresai project. Each agent focuses on one specific area, documents findings in a structured way, and, where improvements are warranted, opens PRs.

This is a full audit of what we have so far — covering code quality, architecture, security, testing, documentation, DevOps, and more.

---

## Review Agents

### 1. CLI Code Quality & Architecture (`cli/`)
**Scope:** All TypeScript in `cli/lib/` and `cli/bin/`
**Focus areas:**
- Code organization and module boundaries (checkup, auth, issues, MCP server, etc.)
- Error handling consistency (propagating vs. graceful degradation pattern documented in `checkup.ts`)
- TypeScript strictness — any use of `any`, missing types, unsafe casts
- Dependency review in `package.json` — unused, outdated, or duplicated deps
- SQL injection surface in inline SQL and `cli/sql/` files
- API client patterns (`checkup-api.ts`, `supabase.ts`) — retry logic, timeout handling, error surfaces

**Deliverable:** Findings doc + PR(s) for any quick wins
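
The SQL injection check above comes down to one pattern: user-supplied values must reach the database as bound parameters, never via string interpolation. A minimal sketch of the pattern in Python with the stdlib `sqlite3` driver (the CLI itself is TypeScript; the `users` table and query are purely illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

def find_user(name: str):
    # Safe: the driver binds `name` as a parameter, so quotes in the
    # input are data, not query syntax.
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (name,)
    ).fetchall()

# An injection attempt comes back empty instead of matching every row.
print(find_user("alice"))        # [(1,)]
print(find_user("' OR '1'='1"))  # []
```

Anywhere the review finds a query built with string concatenation or template literals, that line is a finding.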

---

### 2. Reporter Module (`reporter/`)
**Scope:** Python reporter — `postgres_reports.py`, `report_schemas.py`, `logger.py`
**Focus areas:**
- PromQL query correctness and efficiency
- Report schema compliance and schema evolution strategy
- Memory management (`gc` usage, large result sets)
- Error handling — are all Prometheus/Postgres failures handled gracefully?
- Python code quality — type hints, docstrings, naming conventions
- Connection management — connection leaks, pooling, timeout handling

**Deliverable:** Findings doc + PR(s) for improvements
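
For the error-handling item, the shape to look for is graceful degradation per report section: one unreachable backend should produce an error entry, not abort the whole run. A sketch of that pattern (all names here are illustrative, not the reporter's actual API):

```python
class MetricSourceError(Exception):
    """Raised when a metrics backend (Prometheus/Postgres) fails."""

def collect_sections(sources):
    """Build a report from independent sources, degrading gracefully:
    a failing source yields an 'error' entry instead of raising."""
    report = {}
    for name, fetch in sources.items():
        try:
            report[name] = {"status": "ok", "data": fetch()}
        except MetricSourceError as exc:
            report[name] = {"status": "error", "reason": str(exc)}
    return report

def broken_source():
    raise MetricSourceError("prometheus timeout")

report = collect_sections({"buffers": lambda: [1, 2], "wal": broken_source})
print(report["buffers"]["status"], report["wal"]["status"])  # ok error
```

The review should flag any collection path where a single failed query can take down the remaining sections.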

---

### 3. Monitoring Flask Backend (`monitoring_flask_backend/`)
**Scope:** `app.py` and its test suite
**Focus areas:**
- Flask best practices — app factory pattern, blueprints, configuration
- Security — input validation, SQL injection in query text endpoints, AWS auth handling
- API endpoint design — consistency, error responses, status codes
- Query truncation logic (`smart_truncate_query`) — edge cases, correctness
- Performance — any N+1 patterns, unnecessary queries, missing caching

**Deliverable:** Findings doc + PR(s) for fixes
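
The edge cases for the truncation check are the usual suspects: input shorter than the limit, a single token longer than the limit, and off-by-one cuts at word boundaries. A hypothetical reimplementation to illustrate what the tests should cover (this is not the repo's actual `smart_truncate_query`):

```python
def truncate_query(query: str, limit: int = 50) -> str:
    """Truncate a SQL query for display, cutting at a word boundary
    where possible and appending an ellipsis marker."""
    if len(query) <= limit:
        return query          # edge case 1: short input passes through
    cut = query.rfind(" ", 0, limit)
    if cut <= 0:
        cut = limit           # edge case 2: one long token forces a hard cut
    return query[:cut] + "..."

print(truncate_query("SELECT 1"))       # SELECT 1
print(truncate_query("x" * 100))        # 50 x's followed by "..."
```

Whatever the real implementation does, the review should check it has explicit tests for these three cases.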

---

### 4. Test Coverage & Quality
**Scope:** `tests/`, `cli/test/`, `monitoring_flask_backend/test_app.py`
**Focus areas:**
- Coverage gaps — which modules/functions lack tests?
- Test quality — are tests testing behavior or implementation details?
- Flaky tests — any timing-dependent or order-dependent tests?
- Test organization — naming conventions, fixture reuse, conftest patterns
- Integration test reliability — Docker dependencies, external service mocks
- Reporter test suite (`tests/reporter/`) — there are many test files; check for redundancy and consolidation opportunities

**Deliverable:** Findings doc + PR(s) to improve test quality/coverage
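
A quick litmus test for the "behavior vs. implementation" item: does the test assert on the contract callers rely on, or on internals? A toy sketch (function and test names are invented for illustration):

```python
def summarize(values):
    """Toy function under test: total and peak of a metric series."""
    return {"total": sum(values), "peak": max(values)}

def test_summarize_reports_total_and_peak():
    result = summarize([3, 7, 2])
    # Asserts on observable output (what callers depend on)...
    assert result == {"total": 12, "peak": 7}
    # ...not on how it was computed: no mocking of sum/max, no
    # inspection of private state, so refactors don't break it.

test_summarize_reports_total_and_peak()
print("ok")
```

Tests that mock an internal helper just to verify it was called are the ones worth flagging for rewrite.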

---

### 5. Docker & Compose Infrastructure
**Scope:** `docker-compose.yml`, `docker-compose.local.yml`, `Dockerfile`s, `config/`
**Focus areas:**
- Resource limits — are the documented 4 vCPU / 8 GiB assumptions still valid?
- Image sizes — multi-stage builds, unnecessary packages
- Health checks — are all services using proper health check configurations?
- Startup ordering — `depends_on` conditions, race conditions
- Volume management — data persistence, cleanup
- Config initialization pattern (config-init + sources-generator) — robustness
- Security — running as root vs. non-root, secret handling

**Deliverable:** Findings doc + PR(s) for improvements
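
For the health check and startup ordering items together, the pattern to verify per service is a real probe plus a condition-gated `depends_on`. A hedged compose sketch (service names, image tag, and probe command are illustrative, not copied from the repo's compose files):

```yaml
services:
  db:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 10
  reporter:
    # Waits for a passing health probe, not merely for container start,
    # which closes the startup race the review should look for.
    depends_on:
      db:
        condition: service_healthy
```

A bare `depends_on: [db]` only orders container creation; it is exactly the race-condition smell this agent should hunt for.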

---

### 6. Security Audit
**Scope:** Entire codebase
**Focus areas:**
- Recent SAST remediation (CWE-78, CWE-918, CWE-770, CWE-489, CWE-185) — verify completeness
- Credential handling — connection strings, API keys, AWS auth patterns
- Input validation at all boundaries (CLI args, API endpoints, SQL parameters)
- Dependency vulnerabilities — `npm audit`, `pip audit`, known CVEs
- Pre-commit hooks — gitleaks config effectiveness
- SECURITY.md — is the vulnerability reporting process clear?
- VictoriaMetrics Basic Auth — implementation correctness
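
For the CWE-78 (OS command injection) items, the thing to verify is that external commands are invoked with argument lists, never through a shell that interpolates user input. A minimal Python sketch (the command and argument are illustrative):

```python
import subprocess

def run_tool(user_arg: str) -> str:
    # Safe: argv-style invocation. `user_arg` arrives as one argument,
    # so shell metacharacters in it are data, not syntax.
    result = subprocess.run(
        ["echo", user_arg], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()

# With shell=True and f-string interpolation, this input would have
# executed `id` as a second command; here it stays inert text.
print(run_tool("hello; id"))  # hello; id
```

Any remaining `shell=True` call (or its Node equivalent, `child_process.exec` with interpolated strings) that touches user input is an incomplete remediation.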

---

### 7. Helm Chart & Terraform (`postgres_ai_helm/`, `terraform/`)
**Scope:** Kubernetes and cloud deployment
**Focus areas:**
- Helm chart — values.yaml defaults, RBAC configuration, resource requests/limits
- Secret management in Kubernetes — are secrets properly handled?
- Terraform — state management, variable validation, security groups
- Networking — ingress configuration, service exposure
- Monitoring stack topology — is the architecture sound for production?
- Documentation — are QUICKSTART.md and deployment docs accurate?

**Deliverable:** Findings doc + PR(s) for improvements
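
On the requests/limits item, the check is that every workload sets explicit values rather than inheriting none. A sketch of the shape to expect in `values.yaml` (the numbers are placeholders, not recommendations for this stack):

```yaml
# values.yaml fragment: explicit requests so the scheduler can place
# pods, and a memory limit so the kubelet can evict predictably.
resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    memory: 2Gi   # CPU limit often left unset to avoid throttling
```

Charts that ship with an empty `resources: {}` default should be flagged, since they schedule as BestEffort pods in production.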

---

### 8. Index Pilot Component (`components/index_pilot/`)
**Scope:** SQL extension for automated index management
**Focus areas:**
- SQL correctness — DDL safety, transaction handling
- Security — FDW configuration, permission model, privilege escalation risks
- Test coverage — are the pgTAP tests (`test/`) comprehensive?
- Install/uninstall reliability — idempotency, version upgrades
- Documentation — is the component well-documented for users?

**Deliverable:** Findings doc + PR(s) for improvements
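
Idempotency is the concrete thing to check for install reliability: re-running the install script must be a no-op, not an error. A sketch of guarded DDL using Python's built-in `sqlite3` (the component itself is PostgreSQL SQL, where the same guards are `IF NOT EXISTS` / `CREATE OR REPLACE`; the table and index names here are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def install(conn):
    # Guarded DDL: safe on a fresh database and over an existing install.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pilot_state (key TEXT PRIMARY KEY, value TEXT)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS pilot_state_key ON pilot_state (key)"
    )

install(conn)
install(conn)  # second run succeeds: the script is idempotent

print(conn.execute("SELECT count(*) FROM pilot_state").fetchone()[0])  # 0
```

The review should run install twice and uninstall twice against a scratch database and treat any error as a finding.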

---

### 9. Documentation & Developer Experience
**Scope:** `README.md`, `CONTRIBUTING.md`, `CLAUDE.md`, inline docs
**Focus areas:**
- README accuracy — do the quick-start steps actually work?
- CONTRIBUTING.md — is the local dev workflow complete and reproducible?
- CLI help text — is `--help` output clear and complete for all commands?
- Inline documentation — are architectural decisions documented where they matter?
- Changelog / release notes — is there a process?
- Discoverability — can a new contributor find what they need?

**Deliverable:** Findings doc + PR(s) for improvements

---

## Process

1. **Each agent works independently** — reviewing its area, documenting findings
2. **Findings are structured** — severity (critical/high/medium/low), category, description, suggested fix
3. **Quick wins become PRs** — if a fix is straightforward and low-risk, open a PR directly
4. **Larger issues become follow-up issues** — for architectural changes or multi-file refactors
5. **No over-engineering** — improvements should be practical, not theoretical
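
One way to make the structured-findings requirement concrete is a fixed-field entry every agent fills in. The field names below are a suggestion, not an agreed schema:

```python
# Suggested findings-entry shape; all field names are a proposal.
finding = {
    "severity": "high",          # critical | high | medium | low
    "category": "security",
    "component": "monitoring_flask_backend",
    "description": "Query text endpoint interpolates user input into SQL.",
    "suggested_fix": "Switch to parameterized queries.",
    "quick_win": True,           # True => open a PR directly (process item 3)
}

assert finding["severity"] in {"critical", "high", "medium", "low"}
```

Keeping the entries machine-readable makes it easy to aggregate all nine agents' findings into one prioritized list.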

## Labels
`review`, `tech-debt`, `quality`