| 1 | +# Constitution-Sim |
| 2 | + |
| 3 | +**Stress-test constitutions with AI-powered politicians before trying them out on a real nation.** |
| 4 | + |
| 5 | +`constitution-sim` is a research-grade multi-agent simulator. You give it |
| 6 | +a constitution and a scenario; it spins up an LLM-powered agent for each |
| 7 | +political role (Executive, Legislature, Judiciary, Media, Bureaucracy) |
| 8 | +and lets them act under the rules you wrote, turn by turn. Every action |
| 9 | +is checked by a rules engine, every event is logged, and every run is |
| 10 | +reproducible from a seed (up to provider variance in LLM mode). |
| 11 | + |
| 12 | +## Why |
| 13 | + |
| 14 | +Politicians are not utility-maximisers reading from a spec — they |
| 15 | +deliberate, bargain, posture, and reach for legitimacy. The interesting |
| 16 | +question is *how the rules of a constitution shape that behaviour*. So |
| 17 | +the agents here are LLMs (OpenAI / Anthropic) instructed with a |
| 18 | +role-specific persona, the constitution they live under, their own |
| 19 | +goals and utility weights, and a memory of their own recent decisions. |
| 20 | +They never get to mutate the world directly — every move passes the |
| 21 | +typed rules engine first. |
| 22 | + |
| 23 | +A deterministic heuristic agent is still available as a no-LLM fallback, |
| 24 | +so the project also runs offline / in CI / with zero API keys. |
| 25 | + |
| 26 | +## Features |
| 27 | + |
| 28 | +- **AI cognition is the default.** When `OPENAI_API_KEY` or |
| 29 | + `ANTHROPIC_API_KEY` is in the environment, `constitution-sim run` |
| 30 | + uses LLM-powered agents out of the box. With no key, it falls back to |
| 31 | + a deterministic heuristic — same CLI, same outputs, no setup required. |
| 32 | +- **Role-specific personas.** Each role (Executive, Legislature, |
| 33 | + Judiciary, Media, Bureaucracy) gets its own LLM system prompt. The |
| 34 | + Executive is ambitious; the Judiciary is reactive; the Media chases a |
| 35 | + narrative; the Bureaucracy implements steadily. |
| 36 | +- **Agent memory.** Each agent sees its own recent decisions (turn, |
| 37 | + action, legal or not) so it can reason about continuity. |
| 38 | +- **Schema-driven constitutions.** Strict Pydantic v2 models; YAML in, |
| 39 | + typed objects out. Errors are explicit and structured. |
| 40 | +- **Rules engine is source of truth.** Agents propose typed actions; |
| 41 | + the engine accepts or rejects with a reason. The LLM cannot mutate |
| 42 | + state directly. |
| 43 | +- **Partial observability.** Each role gets a state view filtered by |
| 44 | + its `observation_limits`. |
| 45 | +- **Institutional metrics.** Power concentration, deadlock, trust |
| 46 | + volatility, legitimacy, corruption pressure, emergency-power drift. |
| 47 | +- **Repeated-run evaluation harness.** Multi-seed runs with pandas / |
| 48 | + matplotlib output. |
| 49 | +- **Deterministic when seeded** (heuristic mode is byte-for-byte |
| 50 | + reproducible; LLM mode is reproducible up to provider variance). |
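
The partial-observability bullet above can be illustrated with a minimal sketch. It assumes `observation_limits` is a set of state keys hidden from a role; the real `WorldState` and `StateView` classes may be shaped differently, and every name below is a hypothetical stand-in rather than the project's API.

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for the project's WorldState (assumption, not the real model).
@dataclass
class WorldState:
    turn: int
    values: dict = field(default_factory=dict)


def state_view(world: WorldState, observation_limits: set[str]) -> dict:
    """Return a filtered copy of the state with the hidden keys removed.

    Assumes observation_limits lists the keys a role may NOT see.
    """
    return {k: v for k, v in world.values.items() if k not in observation_limits}


world = WorldState(turn=3, values={"public_trust": 0.61, "pending_bills": ["B-7"]})

# The Bureaucracy in this toy example is not allowed to see pending bills.
view = state_view(world, observation_limits={"pending_bills"})
print(view)  # {'public_trust': 0.61}
```

The canonical state is never modified; each role only receives a filtered copy.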
| 51 | + |
| 52 | +## Requirements |
| 53 | + |
| 54 | +- Python 3.10+ (target: 3.14) |
| 55 | +- `pydantic >= 2`, `PyYAML`, `pandas`, `matplotlib`, `seaborn` |
| 56 | +- For AI cognition: `openai` (and/or `anthropic`) |
| 57 | + |
| 58 | +## Install |
| 59 | + |
| 60 | +```bash |
| 61 | +git clone https://github.com/arianXdev/constitution-sim.git |
| 62 | +cd constitution-sim |
| 63 | +pip install -e ".[dev,llm]" # core + tests + LLM SDKs (recommended) |
| 64 | +# or, for an install without the LLM SDKs: |
| 65 | +pip install -e ".[dev]" |
| 66 | +``` |
| 67 | + |
| 68 | +This exposes a `constitution-sim` console entry point. |
| 69 | + |
| 70 | +## Quickstart (AI-powered) |
| 71 | + |
| 72 | +```bash |
| 73 | +export OPENAI_API_KEY=sk-... |
| 74 | +constitution-sim run \ |
| 75 | + --constitution examples/advanced_constitution.yaml \ |
| 76 | + --scenario examples/scenario.yaml \ |
| 77 | + --turns 20 --seed 42 \ |
| 78 | + --log /tmp/cs/events.jsonl \ |
| 79 | + --metrics-out /tmp/cs/metrics.csv |
| 80 | +``` |
| 81 | + |
| 82 | +That's it. The default `--agent-type auto` notices the key, spins up |
| 83 | +LLM-powered Executive / Legislature / Judiciary / Media / Bureaucracy |
| 84 | +agents, and runs the simulation. You'll see a one-liner telling you |
| 85 | +which provider was picked. |
| 86 | + |
| 87 | +Want to force a provider explicitly? |
| 88 | + |
| 89 | +```bash |
| 90 | +constitution-sim run --agent-type openai --model gpt-4o-mini ... |
| 91 | +constitution-sim run --agent-type anthropic --model claude-sonnet-4-5 ... |
| 92 | +``` |
| 93 | + |
| 94 | +Want deterministic, no-API runs (for tests / reproducibility)? |
| 95 | + |
| 96 | +```bash |
| 97 | +constitution-sim run --agent-type heuristic ... |
| 98 | +``` |
| 99 | + |
| 100 | +## The four CLI subcommands |
| 101 | + |
| 102 | +```bash |
| 103 | +# 1. Validate a constitution YAML against the schema. |
| 104 | +constitution-sim validate --constitution examples/advanced_constitution.yaml |
| 105 | + |
| 106 | +# 2. Run a simulation (single seed or multi-seed evaluation). |
| 107 | +constitution-sim run \ |
| 108 | + --constitution examples/advanced_constitution.yaml \ |
| 109 | + --scenario examples/scenario.yaml \ |
| 110 | + --turns 30 --runs 5 --seed 42 \ |
| 111 | + --log /tmp/cs/events.jsonl \ |
| 112 | + --metrics-out /tmp/cs/metrics.csv \ |
| 113 | + --plot-dir /tmp/cs/plots |
| 114 | + |
| 115 | +# 3. Replay a recorded event log (structured summary, not re-execution). |
| 116 | +constitution-sim replay --log /tmp/cs/eval_logs/run_0_events.jsonl --show-first 5 |
| 117 | + |
| 118 | +# 4. Compare two evaluations (e.g. two constitutions). |
| 119 | +constitution-sim compare --a /tmp/cs/metrics_A.csv --b /tmp/cs/metrics_B.csv |
| 120 | +``` |
| 121 | + |
| 122 | +## What the LLM sees |
| 123 | + |
| 124 | +For each turn, the LLM agent is prompted with: |
| 125 | + |
| 126 | +- A role-specific persona (Executive / Legislature / …). |
| 127 | +- The constitution's name, description, and the list of other roles. |
| 128 | +- Its own declared goals and utility weights (from the YAML). |
| 129 | +- A partial state view filtered by its `observation_limits`. |
| 130 | +- A short memory of its own recent decisions (and whether they were |
| 131 | + legal). |
| 132 | +- The exact set of typed actions it's allowed to return. |
| 133 | + |
| 134 | +It replies with one JSON object describing a single action. If the LLM |
| 135 | +returns malformed JSON or an action outside its permission set, the |
| 136 | +agent silently falls back to the deterministic heuristic policy, so a |
| 137 | +bad reply never halts the run. |
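
The fallback behaviour described above might look roughly like the sketch below. The action names, the `heuristic_action` helper, and the `source` field are hypothetical, chosen only for illustration; the project's actual `LLMAgent` may validate replies differently.

```python
import json

# Hypothetical action names; the real permission set comes from the constitution.
ALLOWED = {"propose_bill", "vote", "pass"}


def heuristic_action(role: str) -> dict:
    """Stand-in for the deterministic heuristic policy (assumption)."""
    return {"type": "pass", "actor": role, "source": "heuristic"}


def parse_llm_action(raw: str, role: str, allowed: set[str]) -> dict:
    """Parse one JSON action; fall back to the heuristic on any problem."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return heuristic_action(role)  # malformed JSON
    if not isinstance(data, dict):
        return heuristic_action(role)  # valid JSON, but not an object
    action_type = data.get("type")
    if not isinstance(action_type, str) or action_type not in allowed:
        return heuristic_action(role)  # action outside the permission set
    return {**data, "actor": role, "source": "llm"}


good = parse_llm_action('{"type": "vote", "bill_id": "B-7"}', "Legislature", ALLOWED)
bad_json = parse_llm_action("I think we should vote yes", "Legislature", ALLOWED)
forbidden = parse_llm_action('{"type": "declare_emergency"}', "Legislature", ALLOWED)
print(good["source"], bad_json["source"], forbidden["source"])  # llm heuristic heuristic
```

Even when the reply parses cleanly, the resulting action would still go through the rules engine before it touches the state.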
| 138 | + |
| 139 | +## Project structure |
| 140 | + |
| 141 | +``` |
| 142 | +src/constitution_sim/ |
| 143 | + models/ Pydantic schemas: Constitution, Role, Rule, WorldState, actions |
| 144 | + core/ SimulationEngine, RulesEngine, Scheduler, EventLogger |
| 145 | + agents/ BaseAgent, DeterministicHeuristicAgent, LLMAgent, providers |
| 146 | + scenarios/ Shock model + ScenarioEngine |
| 147 | + analysis/ MetricsCollector, Evaluator, plot |
| 148 | + app/ CLI (validate / run / replay / compare) |
| 149 | +examples/ |
| 150 | + simple_constitution.yaml |
| 151 | + advanced_constitution.yaml |
| 152 | + strong_executive_constitution.yaml |
| 153 | + scenario.yaml |
| 154 | +docs/ |
| 155 | + architecture.md |
| 156 | + tutorial.md |
| 157 | +tests/ |
| 158 | +``` |
| 159 | + |
| 160 | +## Tests |
| 161 | + |
| 162 | +```bash |
| 163 | +pytest -q |
| 164 | +``` |
| 165 | + |
| 166 | +All tests should pass. `tests/test_determinism.py` explicitly asserts |
| 167 | +that two heuristic-mode runs with the same seed produce byte-identical |
| 168 | +event logs. `tests/test_llm_agent.py::test_live_openai_smoke` runs a |
| 169 | +real LLM round-trip when `OPENAI_API_KEY` is set, and is automatically |
| 170 | +skipped otherwise. |
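
For readers curious what a byte-identical determinism check involves, here is a toy sketch. It does not use the project's `SimulationEngine`; `run_toy_simulation` is a hypothetical stand-in that shows the two ingredients such a test typically relies on: a private seeded RNG and a stable JSON serialisation.

```python
import json
import random


def run_toy_simulation(seed: int, turns: int) -> bytes:
    """Produce a JSONL event log as bytes from a seeded, private RNG."""
    rng = random.Random(seed)  # no shared global random state
    lines = []
    for turn in range(turns):
        event = {
            "turn": turn,
            "actor": rng.choice(["Executive", "Legislature", "Judiciary"]),
            "roll": rng.random(),
        }
        # sort_keys keeps the serialised key order stable across runs.
        lines.append(json.dumps(event, sort_keys=True))
    return ("\n".join(lines) + "\n").encode("utf-8")


log_a = run_toy_simulation(seed=42, turns=20)
log_b = run_toy_simulation(seed=42, turns=20)
log_c = run_toy_simulation(seed=43, turns=20)

print(log_a == log_b)  # True: same seed, identical bytes
print(log_a == log_c)  # False: a different seed gives a different log
```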
| 171 | + |
| 172 | +## Headline experiment |
| 173 | + |
| 174 | +Compare a balanced constitution against a strong-executive one (3 runs × |
| 175 | +12 turns, seed 11). The strong-executive YAML pushes `power_concentration` |
| 176 | +from ~0.47 to ~0.92 and adds illegal-action attempts to the log: laws |
| 177 | +written by one actor, judiciary unable to push back. That's the |
| 178 | +framework working as intended — see `docs/tutorial.md` for a walkthrough. |
| 179 | + |
| 180 | +## Design highlights |
| 181 | + |
| 182 | +- `WorldState` is the single canonical truth; agents only ever see a |
| 183 | + `StateView`. |
| 184 | +- Every action attempt is recorded in the JSONL event log, including |
| 185 | + the rules-engine reason for any rejection. |
| 186 | +- `Role.observation_limits` lets the constitution define what each role |
| 187 | + can see (e.g. the Bureaucracy doesn't see pending bills in |
| 188 | + `advanced_constitution.yaml`). |
| 189 | +- `Role.utility_weights` drives heuristic voting and is surfaced to |
| 190 | + LLM agents in their prompt alongside the persona. |
| 191 | +- `RulesEngine` does both permission checks AND state-level legality |
| 192 | + checks (you can't vote on a non-existent bill, you can't declare |
| 193 | + emergency powers if the constitution doesn't allow them). |
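
The two layers of checking in the last bullet can be sketched as follows. This is an illustration under assumptions, not the real `RulesEngine`: the `Verdict` type, the action dictionaries, and the action names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    legal: bool
    reason: str


def check_action(
    action: dict,
    permissions: dict[str, set[str]],
    pending_bills: set[str],
    emergency_allowed: bool,
) -> Verdict:
    """Check permissions first, then state-level legality."""
    actor, kind = action["actor"], action["type"]
    # Layer 1: is this role allowed to take this kind of action at all?
    if kind not in permissions.get(actor, set()):
        return Verdict(False, f"{actor} is not permitted to {kind}")
    # Layer 2: does the action make sense in the current state?
    if kind == "vote" and action.get("bill_id") not in pending_bills:
        return Verdict(False, f"bill {action.get('bill_id')!r} does not exist")
    if kind == "declare_emergency" and not emergency_allowed:
        return Verdict(False, "constitution does not allow emergency powers")
    return Verdict(True, "ok")


permissions = {"Legislature": {"vote"}, "Executive": {"declare_emergency"}}
pending = {"B-7"}

ok = check_action({"actor": "Legislature", "type": "vote", "bill_id": "B-7"},
                  permissions, pending, emergency_allowed=False)
ghost = check_action({"actor": "Legislature", "type": "vote", "bill_id": "B-99"},
                     permissions, pending, emergency_allowed=False)
coup = check_action({"actor": "Executive", "type": "declare_emergency"},
                    permissions, pending, emergency_allowed=False)
print(ok.legal, ghost.reason, coup.reason, sep=" | ")
```

Returning a reason with every rejection is what lets the event log explain why an attempt failed.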
| 194 | + |
| 195 | +See [`docs/architecture.md`](docs/architecture.md) for the full design |
| 196 | +and [`docs/tutorial.md`](docs/tutorial.md) for an end-to-end "use it |
| 197 | +like I'm 10" walkthrough. |
| 198 | + |
| 199 | +## Out of scope (intentional) |
| 200 | + |
| 201 | +This is an MVP, not a finished research instrument. The following are |
| 202 | +explicit non-goals at this stage: |
| 203 | + |
| 204 | +- Multi-actor coalition formation / strategic communication. |
| 205 | +- Persistent economic/demographic simulation (state variables are |
| 206 | + scalars, not vector economies). |
| 207 | +- Fine-tuned LLMs or RL self-play. |