
---

## Defensive Engineering Standard

All long-running workflows (installer, upgrade, migration) MUST follow this standard:

### Stage Contract

Every phase declares its preconditions and postconditions in `scripts/lib/stage_contract.sh`.

- Preconditions are checked **before** execution, in `_run_phase_with_report()`
- Postconditions are verified **after** execution AND on resume, before a completed phase is skipped
- Postcondition drift (a completed phase whose postconditions no longer hold) triggers an automatic re-run of that phase via `state_unmark_phase()`
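
The contract logic above can be sketched in a few lines of Bash. This is a minimal, self-contained illustration, not the contents of `stage_contract.sh`: `run_phase`, the `pre_`/`do_`/`post_` naming scheme, and the one-marker-file-per-phase state store are all hypothetical stand-ins for `_run_phase_with_report()`, `state_unmark_phase()`, and the real state file.

```bash
#!/usr/bin/env bash
# Hypothetical stage-contract sketch; not the real stage_contract.sh API.
# State is one marker file per completed phase under STATE_DIR.
STATE_DIR="${STATE_DIR:-$(mktemp -d)}"

state_is_done() { [[ -f "$STATE_DIR/$1.done" ]]; }
state_mark()    { touch "$STATE_DIR/$1.done"; }
state_unmark()  { rm -f "$STATE_DIR/$1.done"; }

# run_phase NAME: calls pre_NAME, do_NAME and post_NAME, which must be defined.
run_phase() {
  local phase="$1"
  if state_is_done "$phase"; then
    # Resume path: trust the marker only if the postconditions still hold.
    if "post_$phase"; then
      echo "skip $phase"
      return 0
    fi
    echo "drift $phase"
    state_unmark "$phase"   # drift detected: fall through and re-run
  fi
  "pre_$phase"  || { echo "precondition failed: $phase" >&2; return 1; }
  "do_$phase"   || { echo "phase failed: $phase" >&2; return 1; }
  "post_$phase" || { echo "postcondition failed: $phase" >&2; return 1; }
  state_mark "$phase"
  echo "done $phase"
}

# Example phase: write a config file and verify its contents.
CONFIG_FILE="$STATE_DIR/example.conf"
pre_config()  { [[ -d "$STATE_DIR" && -w "$STATE_DIR" ]]; }
do_config()   { echo "ok=1" > "$CONFIG_FILE"; }
post_config() { grep -qx 'ok=1' "$CONFIG_FILE" 2>/dev/null; }

run_phase config        # first run: executes the phase
run_phase config        # resume: postconditions hold, so the phase is skipped
rm -f "$CONFIG_FILE"    # simulate drift
run_phase config        # resume: drift detected, so the phase re-runs
```

Re-checking postconditions on resume is what makes a completion marker safe to trust: the marker only records that a phase once succeeded, not that its effects still exist.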

### Observability

- Every install run gets an `ACFS_RUN_ID` (generated in `observability.sh`)
- JSONL events are written to `~/.acfs/logs/install/<run_id>.jsonl`
- Event types: `install_start`, `stage_start`, `stage_end`, `check_failed`, `cmd_failed`, `resume`
- On failure, a structured summary box is printed with the run ID, error class, and remediation
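
A JSONL event log needs little more than a run ID and one `printf` per event. The sketch below assumes a hypothetical `emit_event TYPE [KEY VALUE]...` helper and a `LOG_ROOT` override (handy for tests); only `ACFS_RUN_ID` and the `~/.acfs/logs/install/<run_id>.jsonl` layout come from the standard, and the field names `ts`, `run_id`, and `event` are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical JSONL event logger. Only ACFS_RUN_ID and the log path layout
# come from the standard; function names and field names are illustrative.
LOG_ROOT="${LOG_ROOT:-$HOME/.acfs/logs/install}"

new_run_id() {
  # UTC timestamp + PID + 16-bit random suffix: unique enough for local runs.
  printf '%s-%s-%04x\n' "$(date -u +%Y%m%dT%H%M%SZ)" "$$" "$RANDOM"
}

json_escape() {
  # Escape backslashes, double quotes, and newline/tab/carriage-return.
  local s="$1"
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\t'/\\t}
  s=${s//$'\r'/\\r}
  printf '%s' "$s"
}

# emit_event TYPE [KEY VALUE]...: append one JSON object as a single line.
emit_event() {
  local type="$1"; shift
  local line
  line="{\"ts\":\"$(date -u +%Y-%m-%dT%H:%M:%SZ)\""
  line+=",\"run_id\":\"$(json_escape "$ACFS_RUN_ID")\""
  line+=",\"event\":\"$(json_escape "$type")\""
  while (( $# >= 2 )); do
    line+=",\"$(json_escape "$1")\":\"$(json_escape "$2")\""
    shift 2
  done
  line+="}"
  mkdir -p "$LOG_ROOT"
  printf '%s\n' "$line" >> "$LOG_ROOT/$ACFS_RUN_ID.jsonl"
}

export ACFS_RUN_ID="${ACFS_RUN_ID:-$(new_run_id)}"
```

For example, `emit_event stage_start stage base_packages` appends a line of the form `{"ts":"...","run_id":"...","event":"stage_start","stage":"base_packages"}`. Because every line is a complete JSON object, a log truncated by a crash still parses up to the last full event.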

### Error Taxonomy

Errors are classified by `classify_error()` in `error_tracking.sh`:

| Class | Examples | Action |
|-------|----------|--------|
| `transient_network` | DNS, timeout, connection refused | Retry with backoff |
| `permission` | Permission denied, EACCES | Stop, print fix command |
| `dependency_conflict` | APT lock, broken packages | Stop, print dpkg fix |
| `corrupt_state` | Invalid JSON, interrupted dpkg | Stop, suggest `--force-reinstall` |
| `unsupported_env` | Wrong arch, unsupported OS | Stop, run preflight |
| `unknown` | Unclassified | Stop, point to logs |
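
One plausible shape for a classifier like `classify_error()` is a `case` statement over the lowercased error text, paired with a lookup of the table's action column. The match patterns below are illustrative guesses based on common curl, coreutils, and APT messages, and `error_action` is a hypothetical helper; the real function may also use exit codes or captured stderr files. The `${1,,}` lowercasing requires Bash 4 or later.

```bash
#!/usr/bin/env bash
# Hypothetical message-based classifier; patterns are illustrative guesses.
classify_error() {
  local msg="${1,,}"   # lowercase for case-insensitive matching (Bash 4+)
  case "$msg" in
    *"could not resolve"*|*"temporary failure in name resolution"*|*"timed out"*|*"connection refused"*)
      echo transient_network ;;
    *"permission denied"*|*eacces*)
      echo permission ;;
    *"could not get lock"*|*"unmet dependencies"*|*"broken packages"*)
      echo dependency_conflict ;;
    *"dpkg was interrupted"*|*"parse error"*|*"invalid json"*)
      echo corrupt_state ;;
    *"unsupported architecture"*|*"unsupported os"*)
      echo unsupported_env ;;
    *)
      echo unknown ;;
  esac
}

# Map each class to the action column of the table above.
error_action() {
  case "$1" in
    transient_network)   echo "retry with backoff" ;;
    permission)          echo "stop: print fix command" ;;
    dependency_conflict) echo "stop: print dpkg fix" ;;
    corrupt_state)       echo "stop: suggest --force-reinstall" ;;
    unsupported_env)     echo "stop: run preflight" ;;
    *)                   echo "stop: point to logs" ;;
  esac
}
```

In a `case` statement the first matching pattern wins, so narrow patterns should be listed before broad ones, and anything unmatched falls through to `unknown` rather than being guessed.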

### Resumability

- `--resume`: continue from saved state (the default when state exists)
- `--resume-from <stage>`: skip all phases before the target stage
- `--stop-after <stage>`: exit cleanly after the target stage completes
- `--force-reinstall`: ignore saved state and start fresh
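
These flags reduce to choosing a start point, a stop point, and whether completed phases are skipped. In this sketch, `PHASES`, `COMPLETED` (standing in for the persisted state), `is_phase`, `is_completed`, and `plan_phases` are hypothetical. It also assumes that `--resume-from` runs every phase from the target onward regardless of saved state, which the standard does not specify.

```bash
#!/usr/bin/env bash
# Hypothetical resume planner. PHASES and COMPLETED stand in for the real
# phase list and persisted state; only the flag names come from the docs.
PHASES=(preflight base_packages languages tools config verify)
COMPLETED="${COMPLETED:-}"   # space-separated names of completed phases

is_phase() {
  local p
  for p in "${PHASES[@]}"; do [[ "$p" == "$1" ]] && return 0; done
  return 1
}
is_completed() { [[ " $COMPLETED " == *" $1 "* ]]; }

# plan_phases [FLAGS...]: print the phases this run would execute, one per line.
plan_phases() {
  local mode=resume from="" stop="" s phase
  while (( $# > 0 )); do
    case "$1" in
      --resume)          mode=resume ;;
      --force-reinstall) mode=fresh ;;
      --resume-from|--stop-after)
        (( $# >= 2 )) || { echo "$1 needs a stage name" >&2; return 2; }
        if [[ "$1" == --resume-from ]]; then from="$2"; else stop="$2"; fi
        shift ;;
      *) echo "unknown flag: $1" >&2; return 2 ;;
    esac
    shift
  done
  for s in "$from" "$stop"; do
    [[ -z "$s" ]] || is_phase "$s" || { echo "unknown stage: $s" >&2; return 2; }
  done

  # Assumption: plain resume skips completed phases; --resume-from does not.
  local started=1 skip_done=0
  [[ -n "$from" ]] && started=0
  [[ "$mode" == resume && -z "$from" ]] && skip_done=1
  for phase in "${PHASES[@]}"; do
    [[ "$phase" == "$from" ]] && started=1
    if (( started )); then
      if (( skip_done )) && is_completed "$phase"; then
        :   # already complete (the real installer re-checks postconditions)
      else
        echo "$phase"
      fi
    fi
    [[ "$phase" == "$stop" ]] && break
  done
  return 0
}
```

Validating stage names before planning means a typo such as `--resume-from tols` fails fast with a clear message instead of silently running nothing.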

### Fault Injection Tests

Run them with `./tests/vm/fault_injection.sh`. The tests cover network loss, APT lock contention, low disk space, permission errors, interrupted runs, and postcondition drift.
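
A common way to inject command-level faults is a PATH shim: a fake executable placed ahead of the real one that prints a canned error and exits non-zero. This sketch applies the technique to an APT lock failure; `with_fake_cmd` and `install_step` are hypothetical, and `./tests/vm/fault_injection.sh` may inject faults differently.

```bash
#!/usr/bin/env bash
# Hypothetical PATH-shim fault injector: a fake executable earlier on PATH
# replaces the real command, prints a canned error, and exits non-zero.

# with_fake_cmd NAME EXIT_CODE STDERR_MSG CMD [ARGS...]
with_fake_cmd() {
  local name="$1" code="$2" msg="$3"
  shift 3
  local shim_dir rc=0
  shim_dir="$(mktemp -d)"
  # %q quotes the message so the generated script prints it verbatim.
  printf '#!/usr/bin/env bash\nprintf "%%s\\n" %q >&2\nexit %d\n' "$msg" "$code" > "$shim_dir/$name"
  chmod +x "$shim_dir/$name"
  # Subshell: the PATH change (and bash's command hash reset) stays local.
  ( export PATH="$shim_dir:$PATH"; "$@" ) || rc=$?
  rm -rf "$shim_dir"
  return "$rc"
}

# Example: an apt-based step should surface an APT lock failure, not hide it.
install_step() { apt-get install -y jq 2>&1; }

if out="$(with_fake_cmd apt-get 100 "E: Could not get lock /var/lib/dpkg/lock-frontend" install_step)"; then
  echo "FAIL: install_step succeeded despite the injected lock error"
else
  echo "injected fault surfaced: $out"
fi
```

Shims only cover faults that surface through a command's output and exit code; conditions such as low disk space or a real network outage still need a VM or container, which is where a suite under `tests/vm/` earns its keep.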

---

## Landing the Plane (Session Completion)

**When ending a work session**, you MUST complete ALL steps below. Work is NOT complete until `git push` succeeds.