-**Default:** `medium`. Best cost/quality tradeoff. Names, narrative, temporal awareness, full memory, the one conversation that matters most (partner), aggregate mood. ~$150 for 10k agents on 5-mini. ~3.5x current cost for a fundamentally better simulation.
+**Default:** `medium`. Best cost/quality tradeoff. Names, narrative, temporal awareness, full memory, the one conversation that matters most (partner), aggregate mood. ~$360 for 10k agents on gpt-5-mini. ~1.8x current cost for a fundamentally better simulation.

---
@@ -1120,9 +1121,9 @@ These are the minimum changes needed to move every tenet to **Strong**. Listed i
7. **Demographic consistency:** An agent with `digital_literacy: basic` should not describe a plan involving "fine-tuning open-source models." Elaborations should reflect the agent's actual capabilities and constraints.

-### Clustering Validation
+### Exploratory Outcome Validation

-8. **Cluster coherence:** For exploratory outcomes, verify clusters are semantically meaningful. Silhouette scores on embeddings. Human review of cluster labels vs representative samples. Bad clusters = too heterogeneous or too small.
+8. **Export completeness:** For exploratory outcomes, verify all agent elaborations are exported with correct agent_id, demographics, and timestep. Downstream analysis (clustering, thematic coding) is done by the agentic harness or manual DS workflows, not validated by the engine.
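
The export-completeness check introduced by this change could be sketched roughly as follows. This is only an illustration under assumptions: the diff does not define the export schema, so the row layout (a dict with `agent_id`, `demographics`, `timestep`, `elaboration`) and the function name `check_export_completeness` are hypothetical, not the engine's actual API.

```python
# Hypothetical export schema; the field names are assumptions, not taken from the engine.
REQUIRED_FIELDS = ("agent_id", "demographics", "timestep", "elaboration")


def check_export_completeness(expected_agent_ids, exported_rows, timestep):
    """Compare exported rows against the agents expected at one timestep.

    Returns a dict of problem lists; all lists empty means the export looks complete.
    """
    expected = set(expected_agent_ids)
    seen = set()
    problems = {
        "bad_rows": [],           # indices of rows with missing fields or wrong timestep
        "unexpected_agents": [],  # agent_ids that were exported but not expected
        "duplicate_agents": [],   # agent_ids exported more than once
        "missing_agents": [],     # expected agent_ids with no valid exported row
    }
    for index, row in enumerate(exported_rows):
        # A field counts as missing if it is absent, None, an empty string, or an empty dict.
        missing = [f for f in REQUIRED_FIELDS if row.get(f) in (None, "", {})]
        if missing or row.get("timestep") != timestep:
            # Invalid rows are not counted as exported, so their agent also shows up as missing.
            problems["bad_rows"].append(index)
            continue
        agent_id = row["agent_id"]
        if agent_id not in expected:
            problems["unexpected_agents"].append(agent_id)
        elif agent_id in seen:
            problems["duplicate_agents"].append(agent_id)
        else:
            seen.add(agent_id)
    problems["missing_agents"] = sorted(expected - seen)
    return problems
```

A check like this only confirms that every elaboration is present and labeled; it says nothing about cluster quality, which the change above leaves to the agentic harness or manual DS workflows.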