Commit 3b4ffdc

Initial commit
0 parents  commit 3b4ffdc

41 files changed

Lines changed: 3524 additions & 0 deletions

.github/workflows/ci.yml

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.10", "3.11", "3.12"]

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev,llm]"

      - name: Smoke-test CLI
        run: |
          constitution-sim validate --constitution examples/simple_constitution.yaml
          constitution-sim validate --constitution examples/advanced_constitution.yaml
          constitution-sim validate --constitution examples/strong_executive_constitution.yaml

      - name: Run pytest
        run: pytest -q

.gitignore

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
dist/
*.egg-info/
*.egg
MANIFEST

# Virtual envs
.venv/
venv/
env/

# Test / cache
.pytest_cache/
.mypy_cache/
.ruff_cache/
.coverage
htmlcov/

# Editor / OS
.idea/
.vscode/
*.swp
.DS_Store

# Simulator outputs (don't commit run artefacts)
events.jsonl
eval_logs/
plots/
*.jsonl
*.csv
!docs/**/*.csv
!examples/**/*.csv

README.md

Lines changed: 207 additions & 0 deletions
@@ -0,0 +1,207 @@
# Constitution-Sim

**Stress-test constitutions with AI-powered politicians before trying them out on a real nation.**

`constitution-sim` is a research-grade multi-agent simulator. You give it
a constitution and a scenario; it spins up an LLM-powered agent for each
political role (Executive, Legislature, Judiciary, Media, Bureaucracy)
and lets them act under the rules you wrote, turn by turn. Every action
is checked by a rules engine, every event is logged, and every run is
reproducible from a seed.

## Why

Politicians are not utility-maximisers reading from a spec; they
deliberate, bargain, posture, and reach for legitimacy. The interesting
question is *how the rules of a constitution shape that behaviour*. So
the agents here are LLMs (OpenAI / Anthropic) instructed with a
role-specific persona, the constitution they live under, their own
goals and utility weights, and a memory of their own recent decisions.
They never get to mutate the world directly: every move passes the
typed rules engine first.

A deterministic heuristic agent is still available as a no-LLM fallback,
so the project also runs offline / in CI / with zero API keys.

## Features

- **AI cognition is the default.** When `OPENAI_API_KEY` or
  `ANTHROPIC_API_KEY` is in the environment, `constitution-sim run`
  uses LLM-powered agents out of the box. With no key, it falls back to
  a deterministic heuristic: same CLI, same output files, no setup
  required.
- **Role-specific personas.** Each role (Executive, Legislature,
  Judiciary, Media, Bureaucracy) gets its own LLM system prompt. The
  Executive is ambitious; the Judiciary is reactive; the Media chases a
  narrative; the Bureaucracy implements steadily.
- **Agent memory.** Each agent sees its own recent decisions (turn,
  action, legal or not) so it can reason about continuity.
- **Schema-driven constitutions.** Strict Pydantic v2 models; YAML in,
  typed objects out. Errors are explicit and structured.
- **Rules engine is source of truth.** Agents propose typed actions;
  the engine accepts or rejects with a reason. The LLM cannot mutate
  state directly.
- **Partial observability.** Each role gets a state view filtered by
  its `observation_limits`.
- **Institutional metrics.** Power concentration, deadlock, trust
  volatility, legitimacy, corruption pressure, emergency-power drift.
- **Repeated-run evaluation harness.** Multi-seed runs with pandas /
  matplotlib output.
- **Deterministic when seeded** (heuristic mode is byte-for-byte
  reproducible; LLM mode is reproducible up to provider variance).
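
The partial-observability item can be pictured as a filter applied to the world state before an agent sees it. The sketch below is illustrative only: the field names (`public_trust`, `pending_bills`, `emergency_active`) and the `make_state_view` helper are assumptions made for this example, not the project's actual `WorldState` / `StateView` API, and `observation_limits` is modelled here as a plain set of hidden field names.

```python
from copy import deepcopy
from typing import Any


def make_state_view(world_state: dict[str, Any], hidden_fields: set[str]) -> dict[str, Any]:
    """Return a copy of the world state with the hidden fields removed.

    The copy is deep, so an agent that mutates its view cannot touch the
    canonical state.
    """
    return {
        key: deepcopy(value)
        for key, value in world_state.items()
        if key not in hidden_fields
    }


# Hypothetical world state; the real WorldState schema may differ.
world = {
    "turn": 3,
    "public_trust": 0.62,
    "pending_bills": ["bill-7"],
    "emergency_active": False,
}

# Assumed limits: the Bureaucracy cannot see pending bills, the Executive sees all.
bureaucracy_view = make_state_view(world, hidden_fields={"pending_bills"})
executive_view = make_state_view(world, hidden_fields=set())

print(sorted(bureaucracy_view))
print(sorted(executive_view))
```

Returning a deep copy rather than a reference is one way to keep the view read-only in effect, which matches the idea that agents never mutate the world directly.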

## Requirements

- Python 3.10+ (target: 3.14)
- `pydantic >= 2`, `PyYAML`, `pandas`, `matplotlib`, `seaborn`
- For AI cognition: `openai` (and/or `anthropic`)

## Install

```bash
git clone https://github.com/arianXdev/constitution-sim.git
cd constitution-sim
pip install -e ".[dev,llm]"  # core + tests + LLM SDKs (recommended)
# or, no-LLM-only install:
pip install -e ".[dev]"
```

This exposes a `constitution-sim` console entry point.

## Quickstart (AI-powered)

```bash
export OPENAI_API_KEY=sk-...
constitution-sim run \
  --constitution examples/advanced_constitution.yaml \
  --scenario examples/scenario.yaml \
  --turns 20 --seed 42 \
  --log /tmp/cs/events.jsonl \
  --metrics-out /tmp/cs/metrics.csv
```

That's it. The default `--agent-type auto` notices the key, spins up
LLM-powered Executive / Legislature / Judiciary / Media / Bureaucracy
agents, and runs the simulation. You'll see a one-liner telling you
which provider was picked.
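
As a rough mental model of what `auto` does, the selection can be sketched as a lookup over environment variables. This is a guess at the shape of the logic, not the project's code: `resolve_agent_type` is a hypothetical helper, and the priority order when both keys are set (OpenAI first) is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_agent_type(
    requested: str = "auto",
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Map an --agent-type value to a concrete backend name.

    An explicit request is returned unchanged; "auto" inspects the
    environment and falls back to the heuristic agent when no key is set.
    """
    env = os.environ if env is None else env
    if requested != "auto":
        return requested
    # Assumed priority: OpenAI before Anthropic when both keys exist.
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    return "heuristic"


print(f"agent type picked for this environment: {resolve_agent_type()}")
```

Passing the environment in as an argument keeps the function easy to exercise in tests without touching real API keys.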

Want to force a provider explicitly?

```bash
constitution-sim run --agent-type openai --model gpt-4o-mini ...
constitution-sim run --agent-type anthropic --model claude-sonnet-4-5 ...
```

Want deterministic, no-API runs (for tests / reproducibility)?

```bash
constitution-sim run --agent-type heuristic ...
```

## The four CLI subcommands

```bash
# 1. Validate a constitution YAML against the schema.
constitution-sim validate --constitution examples/advanced_constitution.yaml

# 2. Run a simulation (single seed or multi-seed evaluation).
constitution-sim run \
  --constitution examples/advanced_constitution.yaml \
  --scenario examples/scenario.yaml \
  --turns 30 --runs 5 --seed 42 \
  --log /tmp/cs/events.jsonl \
  --metrics-out /tmp/cs/metrics.csv \
  --plot-dir /tmp/cs/plots

# 3. Replay a recorded event log (structured summary, not re-execution).
constitution-sim replay --log /tmp/cs/eval_logs/run_0_events.jsonl --show-first 5

# 4. Compare two evaluations (e.g. two constitutions).
constitution-sim compare --a /tmp/cs/metrics_A.csv --b /tmp/cs/metrics_B.csv
```

## What the LLM sees

For each turn, the LLM agent is prompted with:

- A role-specific persona (Executive / Legislature / …).
- The constitution's name, description, and the list of other roles.
- Its own declared goals and utility weights (from the YAML).
- A partial state view filtered by its `observation_limits`.
- A short memory of its own recent decisions (and whether they were
  legal).
- The exact set of typed actions it's allowed to return.

It replies with one JSON object describing a single action. If the LLM
returns malformed JSON or an action outside its permission set, the
agent silently falls back to the deterministic heuristic policy, so the
simulator never breaks.
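
The parse-then-fall-back behaviour can be sketched in a few lines. Everything here is hypothetical: the `choose_action` helper, the `"type"` key, and the action names are assumptions for illustration and may not match the real action schema or the `LLMAgent` implementation.

```python
import json
from typing import Callable


def choose_action(
    raw_reply: str,
    allowed_types: set[str],
    fallback: Callable[[], dict],
) -> dict:
    """Return the LLM's action if it is usable, else the fallback's action."""
    try:
        action = json.loads(raw_reply)
    except json.JSONDecodeError:
        return fallback()  # malformed JSON
    action_type = action.get("type") if isinstance(action, dict) else None
    if not isinstance(action_type, str) or action_type not in allowed_types:
        return fallback()  # not an object, or an action outside the permission set
    return action


def heuristic_policy() -> dict:
    """Stand-in for the deterministic heuristic agent (assumed behaviour)."""
    return {"type": "pass"}


allowed = {"propose_bill", "vote", "pass"}  # hypothetical action names

print(choose_action('{"type": "vote", "bill_id": "bill-7"}', allowed, heuristic_policy))
print(choose_action("not json at all", allowed, heuristic_policy))
print(choose_action('{"type": "dissolve_parliament"}', allowed, heuristic_policy))
```

Note that this check only covers the permission set; state-level legality is still the rules engine's job.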

## Project structure

```
src/constitution_sim/
  models/      Pydantic schemas: Constitution, Role, Rule, WorldState, actions
  core/        SimulationEngine, RulesEngine, Scheduler, EventLogger
  agents/      BaseAgent, DeterministicHeuristicAgent, LLMAgent, providers
  scenarios/   Shock model + ScenarioEngine
  analysis/    MetricsCollector, Evaluator, plot
  app/         CLI (validate / run / replay / compare)
examples/
  simple_constitution.yaml
  advanced_constitution.yaml
  strong_executive_constitution.yaml
  scenario.yaml
docs/
  architecture.md
  tutorial.md
tests/
```

## Tests

```bash
pytest -q
```

All tests should pass. `tests/test_determinism.py` explicitly asserts
that two heuristic-mode runs with the same seed produce byte-identical
event logs. `tests/test_llm_agent.py::test_live_openai_smoke` runs a
real LLM round-trip when `OPENAI_API_KEY` is set, and is automatically
skipped otherwise.
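
The idea behind the determinism check can be shown with a toy stand-in for the simulator. This is not the contents of `tests/test_determinism.py`; `run_toy_simulation` and its event fields are invented for the example. The point is only that a seeded RNG plus stable JSON serialisation yields identical bytes.

```python
import hashlib
import json
import random

ROLES = ["Executive", "Legislature", "Judiciary", "Media", "Bureaucracy"]


def run_toy_simulation(seed: int, turns: int = 5) -> bytes:
    """Produce a JSONL event log as bytes, driven only by the seed."""
    rng = random.Random(seed)  # private RNG; no global state involved
    lines = []
    for turn in range(turns):
        event = {"turn": turn, "actor": rng.choice(ROLES), "roll": rng.random()}
        # sort_keys pins the key order so the serialised bytes are stable.
        lines.append(json.dumps(event, sort_keys=True))
    return ("\n".join(lines) + "\n").encode("utf-8")


first = run_toy_simulation(seed=42)
second = run_toy_simulation(seed=42)

print(hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest())
```

Comparing hashes (or the raw bytes) is stricter than comparing parsed events, because it also catches differences in key order or float formatting.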

## Headline experiment

Compare a balanced constitution against a strong-executive one (3 runs ×
12 turns, seed 11). The strong-executive YAML pushes `power_concentration`
from ~0.47 to ~0.92 and adds illegal-action attempts to the log: laws
written by one actor, judiciary unable to push back. That's the
framework working as intended; see `docs/tutorial.md` for a walkthrough.

## Design highlights

- `WorldState` is the single canonical truth; agents only ever see a
  `StateView`.
- Every action attempt is recorded in the JSONL event log, including
  the rules-engine reason for any rejection.
- `Role.observation_limits` lets the constitution define what each role
  can see (e.g. the Bureaucracy doesn't see pending bills in
  `advanced_constitution.yaml`).
- `Role.utility_weights` drives heuristic voting and is surfaced to
  LLM agents in their prompt as part of the persona.
- `RulesEngine` does both permission checks AND state-level legality
  checks (you can't vote on a non-existent bill, you can't declare
  emergency powers if the constitution doesn't allow them).
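
The two layers of checking in the last item can be sketched as follows. This is a minimal illustration, not the real `RulesEngine`: the `Verdict` type, the `check_action` function, the action names, and the state keys are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    legal: bool
    reason: str


def check_action(role: str, action: dict, permissions: dict, state: dict) -> Verdict:
    """Apply a permission check first, then state-level legality checks."""
    kind = action["type"]
    # Layer 1: is this role allowed to attempt this kind of action at all?
    if kind not in permissions.get(role, set()):
        return Verdict(False, f"{role} is not permitted to {kind}")
    # Layer 2: does the action make sense given the current state?
    if kind == "vote" and action.get("bill_id") not in state["pending_bills"]:
        return Verdict(False, f"bill {action.get('bill_id')} does not exist")
    if kind == "declare_emergency" and not state["emergency_powers_allowed"]:
        return Verdict(False, "constitution does not allow emergency powers")
    return Verdict(True, "ok")


# Hypothetical permissions and state for the example.
PERMISSIONS = {
    "Legislature": {"vote", "propose_bill"},
    "Executive": {"declare_emergency", "propose_bill"},
}
STATE = {"pending_bills": ["bill-7"], "emergency_powers_allowed": False}

print(check_action("Legislature", {"type": "vote", "bill_id": "bill-7"}, PERMISSIONS, STATE))
print(check_action("Legislature", {"type": "vote", "bill_id": "bill-99"}, PERMISSIONS, STATE))
print(check_action("Executive", {"type": "declare_emergency"}, PERMISSIONS, STATE))
print(check_action("Judiciary", {"type": "vote", "bill_id": "bill-7"}, PERMISSIONS, STATE))
```

Returning the reason alongside the verdict is what would let a rejection be written to the event log with its explanation.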

See [`docs/architecture.md`](docs/architecture.md) for the full design
and [`docs/tutorial.md`](docs/tutorial.md) for an end-to-end "use it
like I'm 10" walkthrough.

## Out of scope (intentional)

This is an MVP, not a finished research instrument. The following are
explicit non-goals at this stage:

- Multi-actor coalition formation / strategic communication.
- Persistent economic/demographic simulation (state variables are
  scalars, not vector economies).
- Fine-tuned LLMs or RL self-play.
