Commit 9dc963f

feat: Implement VM fault injection tests, introduce my-stagehand-app scaffolding, add research documents, and enhance installer robustness with improved error handling and observability.
1 parent 17cdacf commit 9dc963f

23 files changed

Lines changed: 6814 additions & 36 deletions

ACFS Installation Log.md

Lines changed: 1412 additions & 0 deletions
Large diffs are not rendered by default.

AGENTS.md

Lines changed: 44 additions & 0 deletions
@@ -115,6 +115,50 @@ shellcheck install.sh scripts/lib/*.sh

---

## Defensive Engineering Standard

All long-running workflows (installer, upgrade, migration) MUST follow this standard:

### Stage Contract

Every phase declares preconditions and postconditions in `scripts/lib/stage_contract.sh`.
- Preconditions are checked **before** execution in `_run_phase_with_report()`
- Postconditions are verified **after** execution AND on resume (before skipping)
- Postcondition drift triggers automatic phase re-run via `state_unmark_phase()`
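
The contract flow described in these bullets can be sketched roughly as follows. This is a minimal illustration, not the contents of `scripts/lib/stage_contract.sh`: the helpers `run_phase_with_contract`, `phase_is_done`, `mark_phase`, and `unmark_phase`, and the `pre_<phase>` / `run_<phase>` / `post_<phase>` naming convention, are hypothetical stand-ins for the real `_run_phase_with_report()` and `state_unmark_phase()`, and state is held in a shell variable rather than persisted.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a stage contract; all names here are illustrative only.

DONE_PHASES=""   # space-separated list standing in for persisted install state

phase_is_done() { case " $DONE_PHASES " in *" $1 "*) return 0 ;; *) return 1 ;; esac; }
mark_phase()    { phase_is_done "$1" || DONE_PHASES="$DONE_PHASES $1"; }
unmark_phase() {
  local p out=""
  for p in $DONE_PHASES; do
    [ "$p" = "$1" ] || out="$out $p"
  done
  DONE_PHASES="$out"
}

run_phase_with_contract() {
  local phase="$1"

  # On resume: skip only if the postcondition still holds; otherwise re-run.
  if phase_is_done "$phase"; then
    if "post_$phase"; then
      echo "skip $phase"
      return 0
    fi
    echo "drift $phase"
    unmark_phase "$phase"
  fi

  # Preconditions are checked before execution.
  if ! "pre_$phase"; then
    echo "precondition_failed $phase"
    return 1
  fi

  "run_$phase" || return 1

  # Postconditions are verified after execution.
  if ! "post_$phase"; then
    echo "postcondition_failed $phase"
    return 1
  fi
  mark_phase "$phase"
  echo "done $phase"
}
```

A caller would define `pre_demo`, `run_demo`, and `post_demo` and then invoke `run_phase_with_contract demo`. Checking the postcondition before skipping is what lets a resumed run notice that a previously completed phase has drifted.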

### Observability

- Every install run gets an `ACFS_RUN_ID` (generated in `observability.sh`)
- JSONL events are written to `~/.acfs/logs/install/<run_id>.jsonl`
- Event types: `install_start`, `stage_start`, `stage_end`, `check_failed`, `cmd_failed`, `resume`
- On failure, a structured summary box is printed with run ID, error class, and remediation
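
As a rough illustration of the JSONL logging described here, the sketch below appends one JSON object per line to a per-run file. It is not the code in `observability.sh`: the function names `emit_event` and `json_escape`, the field names (`ts`, `run_id`, `event`, `detail`), the run-ID format, and the `ACFS_LOG_DIR` override are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of JSONL event logging; field names and run-ID format are assumptions.

# Illustrative run ID: UTC timestamp plus the shell's PID.
ACFS_RUN_ID="${ACFS_RUN_ID:-$(date -u +%Y%m%dT%H%M%SZ)-$$}"
ACFS_LOG_DIR="${ACFS_LOG_DIR:-$HOME/.acfs/logs/install}"   # this override is hypothetical

# Escape backslashes and double quotes so a value is safe inside a JSON string.
# (Newlines and control characters are not handled in this sketch.)
json_escape() {
  local s="$1"
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  printf '%s' "$s"
}

# emit_event <event_type> [detail]: append one JSON object as a single line.
emit_event() {
  local event="$1" detail="${2:-}"
  mkdir -p "$ACFS_LOG_DIR"
  printf '{"ts":"%s","run_id":"%s","event":"%s","detail":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    "$(json_escape "$ACFS_RUN_ID")" \
    "$(json_escape "$event")" \
    "$(json_escape "$detail")" \
    >> "$ACFS_LOG_DIR/$ACFS_RUN_ID.jsonl"
}
```

For example, `emit_event stage_start "phase=base"` appends a single line to the run's log file. Keeping one object per line means the log stays parseable even if a run is interrupted partway through.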

### Error Taxonomy

Errors are classified by `classify_error()` in `error_tracking.sh`:

| Class | Examples | Action |
|-------|----------|--------|
| `transient_network` | DNS, timeout, connection refused | Retry with backoff |
| `permission` | Permission denied, EACCES | Stop, print fix command |
| `dependency_conflict` | APT lock, broken packages | Stop, print dpkg fix |
| `corrupt_state` | Invalid JSON, interrupted dpkg | Stop, suggest --force-reinstall |
| `unsupported_env` | Wrong arch, unsupported OS | Stop, run preflight |
| `unknown` | Unclassified | Stop, point to logs |
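
One way such a classifier could work is by matching substrings of the failing command's error output. The sketch below is an assumption-laden illustration, not the real `classify_error()`: the function names `classify_error_sketch` and `action_for_class` are hypothetical, and the match patterns are guesses based on the examples column of the table.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of message-based classification; the patterns are illustrative guesses.

classify_error_sketch() {
  # Lowercase the message so matching is case-insensitive.
  local msg
  msg="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"

  case "$msg" in
    *"could not resolve"*|*"timed out"*|*"timeout"*|*"connection refused"*)
      echo "transient_network" ;;
    *"permission denied"*|*"eacces"*)
      echo "permission" ;;
    *"could not get lock"*|*"broken packages"*)
      echo "dependency_conflict" ;;
    *"invalid json"*|*"dpkg was interrupted"*)
      echo "corrupt_state" ;;
    *"unsupported architecture"*|*"unsupported os"*)
      echo "unsupported_env" ;;
    *)
      echo "unknown" ;;
  esac
}

# Collapse the table's Action column to retry-or-stop: only transient_network retries.
action_for_class() {
  case "$1" in
    transient_network) echo "retry" ;;
    *)                 echo "stop" ;;
  esac
}
```

The order of the `case` arms matters when a message matches more than one pattern, so a real classifier would need to decide which class takes precedence.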

### Resumability

- `--resume` (default when state exists)
- `--resume-from <stage>` — skip all phases before the target
- `--stop-after <stage>` — exit cleanly after the target completes
- `--force-reinstall` — start fresh
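
The interaction between `--resume-from` and `--stop-after` can be pictured as selecting a window over an ordered list of phases. The sketch below is only an illustration under that assumption: `select_phases` is a hypothetical helper, the phase names in the usage note are placeholders, and the installer's real argument handling is not shown in this diff.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: which phases run for a given --resume-from / --stop-after pair.

# select_phases <resume_from> <stop_after> <phase>...
# Prints the phases that would run, one per line. An empty string means "no limit".
select_phases() {
  local resume_from="$1" stop_after="$2"
  shift 2
  local started=0 phase
  [ -z "$resume_from" ] && started=1

  for phase in "$@"; do
    # Skip every phase before the resume target.
    [ "$phase" = "$resume_from" ] && started=1
    if [ "$started" -eq 1 ]; then
      echo "$phase"
    fi
    # Exit cleanly once the stop target has been emitted.
    if [ "$started" -eq 1 ] && [ "$phase" = "$stop_after" ]; then
      return 0
    fi
  done
}
```

With placeholder phases `a b c`, calling `select_phases b "" a b c` prints `b` and `c`, while `select_phases "" b a b c` prints `a` and `b`.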

### Fault Injection Tests

Run with `./tests/vm/fault_injection.sh`. Tests cover network loss, apt lock, low disk, permission errors, interrupted runs, and postcondition drift.

---

## Landing the Plane (Session Completion)

**When ending a work session**, you MUST complete ALL steps below. Work is NOT complete until `git push` succeeds.
