|
248 | 248 | - **Type**: migration |
249 | 249 | - **Status**: complete |
250 | 250 | - **Chain hash**: `c33daae014d19022...` |
| 251 | + |
| 252 | +## 2026-05-19 — Multi-GPU support, governance hardening, full doc cross-check |
| 253 | + |
| 254 | +**Objective**: Add community GPU support (ROCm/XPU), bring the `specsmith audit` governance checks to 30/30, |
| 255 | +resolve all documentation gaps, and fix stale content across the repository. |
| 256 | + |
| 257 | +**What was done**: |
| 258 | + |
| 259 | +- **Multi-backend device support** (`real_lm_experiment.py`): `--device` flag added |
| 260 | + (`cuda`, `rocm`, `xpu`, `mps`, `cpu`); auto-detection chain `cuda > rocm > xpu > mps > cpu`; |
| 261 | +  ROCm detected via `torch.version.hip`; community-tested backends emit an issue link at runtime. |
| 262 | +- **Docker images**: `Dockerfile.cuda` (NVIDIA, verified), `Dockerfile.rocm` (AMD ROCm 6.x, |
| 263 | + community-tested), `Dockerfile.xpu` (Intel Arc/Xe, community-tested). MPS documented as |
| 264 | + not Docker-compatible (Apple Metal not accessible from Linux containers). |
| 265 | +- **Hardware issue template**: `.github/ISSUE_TEMPLATE/hardware_compat.md` added for |
| 266 | + community ROCm/XPU/MPS compatibility reports. |
| 267 | +- **REQ-OEA-023 + TEST-OEA-023**: hardware abstraction (P2) added to REQUIREMENTS.md and |
| 268 | + TESTS.md. All 23 accepted REQs now have test coverage. |
| 269 | +- **DEC-005**: hardware abstraction decision documented in ARCHITECTURE.md. |
| 270 | + REQ-OEA-020 and TEST-OEA-020 updated to reference `Dockerfile.cuda`. |
| 271 | +- **`scaffold.yml` type fix**: `aee-research` → `research-python` to match scanner detection. |
| 272 | + AEE epistemic governance fully preserved via `enable_epistemic: true`. |
| 273 | + specsmith audit: 30/30 checks, 0 issues (was 29/29 with 1 issue). |
| 274 | +- **AGENTS.md**: spec version updated 0.10.1 → 0.11.3.dev427; type updated aee-research → research-python. |
| 275 | +- **REPRODUCE.md**: Step 4 rewritten with direct pip install commands per backend; |
| 276 | + stale `setup.sh --cuda/--mps` references removed; stale numpy<2 note removed; |
| 277 | + Docker section fully updated with ROCm/XPU run commands. |
| 278 | +- **requirements-lock.txt**: per-backend install instructions added (ROCm 6.x, XPU, CUDA 12.4+, MPS); |
| 279 | + incorrect ABI comment from dependabot bump fixed. |
| 280 | +- **Dependabot PRs**: all 4 merged (numpy 2.4.5, matplotlib 3.10.9, scipy 1.17.1, pytest 9.0.3). |
| 281 | +- **GitHub issues**: #12 (stress-test confidence parser), #13 (type false-positive), |
| 282 | + #14 (publication workflow feature), #5 (submission prep) — all closed with comments. |
| 283 | +- **specsmith migrate**: 0.11.3 → 0.11.3.dev427 applied; ledger-chain.txt committed. |
| 284 | +- **AMLA 2026**: evaluated as a predatory conference (AIRCC, no CORE ranking, 9 co-located |
| 285 | +  events on the same day, $390-490 fee). Not recommended. Issue #5 updated accordingly. |
| 286 | + |
| 287 | +**Files changed**: `scaffold.yml`, `AGENTS.md`, `CHANGELOG.md`, `LEDGER.md`, |
| 288 | +`Dockerfile`, `Dockerfile.cuda`, `Dockerfile.rocm`, `Dockerfile.xpu`, |
| 289 | +`requirements-lock.txt`, `README.md`, `REPRODUCE.md`, `docs/ARCHITECTURE.md`, |
| 290 | +`docs/REQUIREMENTS.md`, `docs/TESTS.md`, `experiments/real_lm_experiment.py`, |
| 291 | +`.github/ISSUE_TEMPLATE/hardware_compat.md` |
| 292 | + |
| 293 | +**Checks run**: `specsmith audit` (30/30), `specsmith validate` (5/5), |
| 294 | +`specsmith status` (CI ✓, 0 Dependabot alerts, 0 open PRs), pytest (12/12), CI green. |
| 295 | + |
| 296 | +**Results**: Healthy. 30/30 audit checks. 0 open issues. 0 open PRs. CI passing. |
| 297 | + |
| 298 | +**Next step**: Merge develop → main when ready to publish hardware support. |
| 299 | + |
| 300 | +## 2026-05-19T13:38 — Multi-GPU support, governance hardening, full doc cross-check: added --device flag (cuda/rocm/xpu/mps/cpu) with ROCm/XPU auto-detection; Dockerfile.cuda (verified), Dockerfile.rocm, Dockerfile.xpu (community-tested); hardware_compat issue template; REQ/TEST-OEA-023 (hardware abstraction); DEC-005 in ARCHITECTURE; scaffold.yml type aee-research->research-python (specsmith audit 30/30 clean); AGENTS.md spec version 0.10.1->0.11.3.dev427; REPRODUCE.md stale content fixed; requirements-lock.txt per-backend install instructions; 4 dependabot PRs merged; GitHub issues #5 #12 #13 #14 closed; AMLA 2026 evaluated as predatory conference |
| 301 | +- **Author**: Tristen Pierson |
| 302 | +- **Type**: feature |
| 303 | +- **REQs affected**: REQ-OEA-020,REQ-OEA-023 |
| 304 | +- **Status**: complete |
| 305 | +- **Chain hash**: `522c1c447906f02a...` |
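
The ledger entry above describes a `--device` flag and an auto-detection chain `cuda > rocm > xpu > mps > cpu`, with ROCm detected via `torch.version.hip`. The following is a minimal sketch of how such a chain might look, not the actual code in `real_lm_experiment.py`. The function names `pick_device`, `detect_device`, and `resolve_device` are hypothetical, and the sketch assumes that an omitted `--device` means "auto-detect" and that a ROCm build of PyTorch reports its GPUs through `torch.cuda.is_available()` with `torch.version.hip` set.

```python
import argparse
from typing import List, Optional

BACKENDS = ["cuda", "rocm", "xpu", "mps", "cpu"]


def pick_device(cuda_available: bool, hip_version: Optional[str],
                xpu_available: bool, mps_available: bool) -> str:
    """Pure priority chain: cuda > rocm > xpu > mps > cpu (hypothetical helper)."""
    if cuda_available:
        # Assumption: ROCm builds expose GPUs via the CUDA API, so a non-empty
        # HIP version string is what distinguishes "rocm" from "cuda".
        return "rocm" if hip_version else "cuda"
    if xpu_available:
        return "xpu"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch, if installed, and feed the results to pick_device."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds may lack torch.xpu or torch.backends.mps,
    # so look them up defensively instead of assuming they exist.
    xpu = getattr(torch, "xpu", None)
    mps = getattr(torch.backends, "mps", None)
    return pick_device(
        cuda_available=torch.cuda.is_available(),
        hip_version=getattr(torch.version, "hip", None),
        xpu_available=bool(xpu is not None and xpu.is_available()),
        mps_available=bool(mps is not None and mps.is_available()),
    )


def resolve_device(argv: Optional[List[str]] = None) -> str:
    """Parse --device; fall back to auto-detection when the flag is omitted."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--device", choices=BACKENDS, default=None)
    args, _unknown = parser.parse_known_args(argv)
    return args.device if args.device is not None else detect_device()
```

Keeping the priority logic in a pure function separates it from the hardware probing, so the ordering can be unit-tested on a machine with no GPU and no PyTorch installed. Note that when passing the result to PyTorch, a `rocm` label would still need to map to the `"cuda"` device string, since ROCm builds use that name.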