| 1 | +--- |
| 2 | +name: code-quality |
| 3 | +description: Audit code quality gaps not covered by ruff - complexity trends, exception hygiene, type coverage, TODO aging |
| 4 | +trigger: schedule |
| 5 | +tool: claude-code |
| 6 | +timeout_minutes: 20 |
| 7 | +max_turns: 30 |
| 8 | +permissions: |
| 9 | + contents: write |
| 10 | +--- |
| 11 | + |
| 12 | +# Code Quality Audit |
| 13 | + |
| 14 | +Catch quality drift that CI doesn't cover. Write findings to |
| 15 | +`/tmp/audit-{{suite}}.md`. |
| 16 | + |
| 17 | +**What CI already enforces** (do NOT duplicate): |
| 18 | +- Ruff rules: W, F, I, ICN, PIE, TID, UP006, UP007, UP045 |
| 19 | +- Ruff format with 120-char line length, double quotes |
| 20 | +- Test coverage >= 90% aggregate |
| 21 | + |
| 22 | +**What CI does NOT enforce** (this recipe's focus): |
| 23 | +- C901 cyclomatic complexity (not in ruff select) |
| 24 | +- ANN type annotation completeness (not in ruff select) |
| 25 | +- BLE001 blind except and E722 bare except handling (not in ruff select) |
| 26 | +- Google-style docstring format (D* rules not enabled) |
| 27 | +- Complexity growth trends over time |
| 28 | +- TODO/FIXME aging |
| 29 | + |
| 30 | +## Runner memory |
| 31 | + |
| 32 | +Read `{{memory_path}}/runner-state.json` for baselines from previous runs |
| 33 | +(complexity scores, type coverage, TODO inventory). After the audit, update |
| 34 | +`baselines` with current values and `known_issues` with new findings. Skip |
| 35 | +re-reporting known issues. Flag metrics that are trending in the wrong |
| 36 | +direction compared to the previous baseline. |
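The baseline comparison described above can be sketched as follows. This is a minimal illustration, not the runner's actual schema: only the `baselines` and `known_issues` keys come from this recipe, while the metric names, the flat string form of an issue, and the `higher_is_better` parameter are assumptions made for the example.

```python
import json
from pathlib import Path


def load_state(path: Path) -> dict:
    """Return the stored state, or an empty skeleton on the first run."""
    if path.exists():
        return json.loads(path.read_text())
    return {"baselines": {}, "known_issues": []}


def update_state(
    state: dict,
    current: dict[str, float],
    findings: list[str],
    higher_is_better: frozenset[str] = frozenset(),
) -> dict:
    """Report regressions and new findings, then fold them into the state."""
    previous = state.get("baselines", {})
    known = set(state.get("known_issues", []))

    regressions = {}
    for metric, value in current.items():
        if metric not in previous:
            continue  # no baseline yet, so nothing to compare against
        # Assumption: metrics are "lower is better" unless listed otherwise.
        if metric in higher_is_better:
            worse = value < previous[metric]
        else:
            worse = value > previous[metric]
        if worse:
            regressions[metric] = (previous[metric], value)

    # Known issues are skipped so they are not re-reported.
    new_findings = [item for item in findings if item not in known]

    state["baselines"] = {**previous, **current}
    state["known_issues"] = sorted(known | set(findings))
    return {"regressions": regressions, "new_findings": new_findings}
```

Writing the state back would then be a single `path.write_text(json.dumps(state, indent=2))` once the report has been produced.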
| 37 | + |
| 38 | +## Instructions |
| 39 | + |
| 40 | +### 1. Complexity hotspots |
| 41 | + |
| 42 | +Try ruff C901 first (may not be in the config but can be invoked directly): |
| 43 | +```bash |
| 44 | +ruff check packages/*/src/ --select C901 --output-format json 2>/dev/null || true |
| 45 | +``` |
| 46 | + |
| 47 | +If ruff is not available or C901 produces no output, manually inspect the |
| 48 | +largest source files for functions with: |
| 49 | +- Deep nesting (3+ levels of if/for/try) |
| 50 | +- Many branches (>5 if/elif chains) |
| 51 | +- Long method bodies (>60 lines) |
| 52 | + |
| 53 | +**Track trends**: compare against the previous run's baseline in runner |
| 54 | +memory. A function at complexity 12 that was 8 last week is more concerning |
| 55 | +than one that has been at 15 for months. Report the delta. |
| 56 | + |
| 57 | +Focus on `packages/data-designer-engine/src/` (core execution) and |
| 58 | +`packages/data-designer/src/data_designer/interface/` (public API) where |
| 59 | +complexity tends to accumulate. |
| 60 | + |
| 61 | +### 2. Exception hygiene |
| 62 | + |
| 63 | +Check for patterns that violate the project's "errors normalize at |
| 64 | +boundaries" principle (AGENTS.md): |
| 65 | + |
| 66 | +```bash |
| 67 | +# Bare except clauses (should use specific exception types) |
| 68 | +grep -rn "except:" packages/*/src/ --include='*.py' | grep -v "# noqa" |
| 69 | + |
| 70 | +# Swallowed exceptions (except + pass/continue with no logging) |
| 71 | +grep -rn -A1 "except" packages/*/src/ --include='*.py' | grep -B1 "pass$\|continue$" |
| 72 | +``` |
| 73 | + |
| 74 | +The key principle: internal code should NOT leak raw third-party exceptions. |
| 75 | +Module boundary functions (public API, entry points) should wrap external |
| 76 | +exceptions in `data_designer` error types. Check: |
| 77 | +- Functions in `packages/data-designer/src/` that catch third-party exceptions |
| 78 | + (httpx, pydantic, etc.) - are they re-raised as `data_designer` errors? |
| 79 | +- Plugin loading code (`data_designer/plugins/`) - bare `except:` has been |
| 80 | + found here before |
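The grep patterns above can miss multi-line handlers or match comments and strings. As a complement, a small AST pass finds the same two patterns structurally. This is a sketch using only the standard library; the function name and the finding labels are made up for illustration.

```python
import ast


def exception_findings(source: str, filename: str = "<string>") -> list[tuple[int, str]]:
    """Return (line, pattern) pairs for bare and swallowed exception handlers."""
    findings: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.ExceptHandler):
            continue
        # `except:` with no exception type at all.
        if node.type is None:
            findings.append((node.lineno, "bare except"))
        # A handler whose whole body is `pass` or `continue` swallows the error.
        if all(isinstance(stmt, (ast.Pass, ast.Continue)) for stmt in node.body):
            findings.append((node.lineno, "swallowed exception"))
    return sorted(findings)
```

A handler that logs and then passes is not flagged, since its body contains more than `pass` or `continue`, which matches the "no logging" condition in the grep comment.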
| 81 | + |
| 82 | +### 3. Type annotation coverage |
| 83 | + |
| 84 | +The repo requires typed code (AGENTS.md: "all functions, methods, and class |
| 85 | +attributes require type annotations") but has no ANN ruff rules enforcing |
| 86 | +this. Check for gaps: |
| 87 | + |
| 88 | +```bash |
| 89 | +# Public functions missing return type annotations |
| 90 | +grep -rn "def " packages/*/src/ --include='*.py' \ |
| 91 | + | grep -v -e "-> " \ |
| 92 | + | grep -v "def _" \ |
| 93 | + | grep -v "__init__\|__repr__\|__str__\|__eq__\|__hash__" \ |
| 94 | + | grep -v "test_" |
| 95 | +``` |
| 96 | + |
| 97 | +Also check for `Any` usage that could be more specific: |
| 98 | +```bash |
| 99 | +grep -rn ": Any\| -> Any" packages/*/src/ --include='*.py' |
| 100 | +``` |
| 101 | + |
| 102 | +**Track coverage percentage**: count public functions with full annotations |
| 103 | +vs total public functions. Compare against previous baseline. |
| 104 | + |
| 105 | +Known gap: `packages/data-designer-config/src/data_designer/custom_column.py` |
| 106 | +and `packages/data-designer-config/src/data_designer/analysis/` have been |
| 107 | +flagged before. |
| 108 | + |
| 109 | +### 4. Executable quality checks |
| 110 | + |
| 111 | +Run a few checks that exercise real code paths to catch regressions that |
| 112 | +static analysis misses. The workflow puts `.venv/bin` on PATH via |
| 113 | +`make install-dev`, so `python` resolves to the project venv. |
| 114 | + |
| 115 | +#### 4a. Error type hierarchy (fixed - run as written) |
| 116 | + |
| 117 | +Verify that the project's error types are importable and properly |
| 118 | +structured. Silent breakage here means third-party exceptions leak to users: |
| 119 | + |
| 120 | +```bash |
| 121 | +python -c " |
| 122 | +from data_designer.errors import DataDesignerError |
| 123 | +assert issubclass(DataDesignerError, Exception), 'DataDesignerError must be an Exception' |
| 124 | +print('OK: error hierarchy intact') |
| 125 | +" 2>&1 || echo "WARN: error hierarchy check failed" |
| 126 | +``` |
| 127 | + |
| 128 | +#### 4b. Input validation checks (creative - vary each run) |
| 129 | + |
| 130 | +Verify the config builder rejects bad inputs rather than silently |
| 131 | +producing corrupt configs. **Design your own invalid inputs each run** |
| 132 | +to maximize coverage over time. |
| 133 | + |
| 134 | +Examples of things to test (pick 2-3 per run, and invent new ones): |
| 135 | +- Invalid `column_type` string (should raise) |
| 136 | +- `column_type='sampler'` without `sampler_type` (should raise) |
| 137 | +- Empty builder `.build()` (should handle gracefully) |
| 138 | +- Duplicate column names (should raise or deduplicate clearly) |
| 139 | +- Invalid sampler params (e.g., `gaussian` with negative `std`, `category` |
| 140 | + with empty `values` list) |
| 141 | +- Column names with special characters or very long strings |
| 142 | +- Recently changed validators (check `git log --oneline -10 -- packages/*/src/data_designer/config/`) |
| 143 | + |
| 144 | +**API reference:** |
| 145 | + |
| 146 | +```python |
| 147 | +from data_designer.config.config_builder import DataDesignerConfigBuilder |
| 148 | + |
| 149 | +# Test that invalid input is rejected (not silently accepted) |
| 150 | +try: |
| 151 | + DataDesignerConfigBuilder().add_column( |
| 152 | + name='x', column_type='nonexistent_type' |
| 153 | + ).build() |
| 154 | + print('FAIL: invalid column type was silently accepted') |
| 155 | +except Exception as e: |
| 156 | + print(f'OK: invalid column type rejected ({type(e).__name__})') |
| 157 | +``` |
| 158 | + |
| 159 | +The pattern: try something that should fail, print FAIL if it succeeds |
| 160 | +silently, print OK if it raises. A FAIL means a validation regression |
| 161 | +that could lead to silent data corruption. |
| 162 | + |
| 163 | +Report what you tested and why. Any FAIL is a critical finding. |
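When running several creative checks, the try/FAIL/OK pattern can be wrapped in a small helper so each check is a single call. The helper name `expect_rejection` is hypothetical and nothing in it depends on `data_designer`; the usage line uses `int()` as a stand-in for a builder call.

```python
from collections.abc import Callable


def expect_rejection(label: str, action: Callable[[], object]) -> str:
    """Run `action` and report OK if it raises, FAIL if it returns normally."""
    try:
        action()
    except Exception as exc:
        return f"OK: {label} rejected ({type(exc).__name__})"
    return f"FAIL: {label} was silently accepted"


# Stand-in usage; in a real run the lambda would call the config builder.
print(expect_rejection("non-numeric int", lambda: int("not a number")))
```

Returning the status string rather than printing inside the helper makes it easy to collect results into the report table.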
| 164 | + |
| 165 | +### 5. TODO/FIXME/HACK aging |
| 166 | + |
| 167 | +Inventory markers with their git blame age: |
| 168 | + |
| 169 | +```bash |
| 170 | +grep -rn "TODO\|FIXME\|HACK" packages/*/src/ --include='*.py' |
| 171 | +``` |
| 172 | + |
| 173 | +For each marker, get the commit date: |
| 174 | +```bash |
| 175 | +# Example: get blame date for a specific line |
| 176 | +git blame -L 42,42 --date=short path/to/file.py |
| 177 | +``` |
| 178 | + |
| 179 | +**Only flag items older than 30 days.** Recent TODOs are part of normal |
| 180 | +development flow. For old items, include: |
| 181 | +- File and line number |
| 182 | +- The marker text |
| 183 | +- Age in days |
| 184 | +- The commit that introduced it (short SHA) |
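Computing the age is easier from machine-readable blame output than from `--date=short`. The sketch below parses the text produced by `git blame --porcelain -L N,N <file>`, which includes the commit SHA on the first line and an `author-time` header holding a Unix timestamp. The function names are illustrative, and using author time rather than `committer-time` is an assumption.

```python
from datetime import datetime, timezone


def blame_age(porcelain: str, now: datetime) -> tuple[str, int]:
    """Return (short SHA, age in days) from single-line porcelain blame output."""
    lines = porcelain.splitlines()
    # First line: "<40-char sha> <orig line> <final line> <group size>".
    sha = lines[0].split()[0][:7]
    # Header lines are unindented; the source line itself starts with a tab.
    author_time = next(
        int(line.split()[1]) for line in lines if line.startswith("author-time ")
    )
    age = now - datetime.fromtimestamp(author_time, tz=timezone.utc)
    return sha, age.days


def is_aging(age_days: int, threshold_days: int = 30) -> bool:
    """Apply the recipe's rule: only items older than the threshold are flagged."""
    return age_days > threshold_days
```

Passing `now` explicitly (for example `datetime.now(timezone.utc)`) keeps the age calculation reproducible within a single run.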
| 185 | + |
| 186 | +## Output format |
| 187 | + |
| 188 | +Write the report to `/tmp/audit-{{suite}}.md`: |
| 189 | + |
| 190 | +```markdown |
| 191 | +<!-- agentic-ci-daily-{{suite}} --> |
| 192 | +## Code Quality Audit - {{date}} |
| 193 | + |
| 194 | +### Complexity hotspots |
| 195 | + |
| 196 | +| File | Function | Complexity | Trend | |
| 197 | +|------|----------|-----------|-------| |
| 198 | +| ... | ... | C901: 18 | +3 since last run | |
| 199 | + |
| 200 | +### Exception hygiene |
| 201 | + |
| 202 | +| File | Line | Pattern | Recommendation | |
| 203 | +|------|------|---------|----------------| |
| 204 | +| plugins/plugin.py | 99 | bare except | Catch ImportError/ModuleNotFoundError | |
| 205 | + |
| 206 | +### Type annotation coverage |
| 207 | + |
| 208 | +| File | Function | Issue | |
| 209 | +|------|----------|-------| |
| 210 | +| custom_column.py | generate | Missing return type | |
| 211 | + |
| 212 | +**Coverage:** ~X% of public functions fully annotated (previous: Y%) |
| 213 | + |
| 214 | +### Executable quality checks |
| 215 | + |
| 216 | +| Check | Type | Status | Detail | |
| 217 | +|-------|------|--------|--------| |
| 218 | +| Error hierarchy | fixed | OK/FAIL | DataDesignerError is properly structured | |
| 219 | +| (describe input tested) | creative | OK/FAIL | (what was tested and why) | |
| 220 | +| ... | creative | ... | ... | |
| 221 | + |
| 222 | +### TODO/FIXME/HACK inventory |
| 223 | + |
| 224 | +| File | Line | Marker | Age (days) | Commit | |
| 225 | +|------|------|--------|-----------|--------| |
| 226 | +| ... | ... | TODO: fix this | 45 | abc1234 | |
| 227 | + |
| 228 | +**Aging items:** N markers older than 30 days (M new since last run) |
| 229 | + |
| 230 | +### Summary |
| 231 | + |
| 232 | +- N complexity hotspots (M trending up) |
| 233 | +- N exception hygiene issues (M new) |
| 234 | +- Type coverage: X% (delta: +/-N% from last run) |
| 235 | +- Executable checks: N of T passed, where T is the fixed check plus the creative checks run (any FAIL is critical) |
| 236 | +- N aging TODO/FIXME markers (M new) |
| 237 | +``` |
| 238 | + |
| 239 | +If no findings in any category, write `NO_FINDINGS` on the first line instead. |
| 240 | + |
| 241 | +## Constraints |
| 242 | + |
| 243 | +- Do not modify any repository files. This is a read-only audit; the only writes are the report in `/tmp` and the runner memory update. |
| 244 | +- Do not flag test files for type coverage or exception hygiene. Tests have |
| 245 | + different standards. |
| 246 | +- Do not duplicate ruff checks (W, F, I, ICN, PIE, TID, UP006, UP007, UP045). Those are |
| 247 | + already enforced in CI. |
| 248 | +- For complexity, focus on growth trends rather than absolute values. |
| 249 | +- For TODOs, only flag items older than 30 days. |
| 250 | +- For type annotations, focus on public API surface. Internal helpers with |
| 251 | + obvious types from context are lower priority. |