feat/social game support by XuhuiZhou · Pull Request #343 · sotopia-lab/sotopia

XuhuiZhou · 2025-12-25T19:27:19Z

An optimized version of #333

Closes #

📑 Description

✅ Checks

My pull request adheres to the code style of this project
My code requires changes to the documentation
I have updated the documentation as required
All the tests have passed
Branch name follows type/descript (e.g. feature/add-llm-agents)
Ready for code review

ℹ Additional Information

Note

Introduces an experimental framework for multi-agent social deduction games with structured phases, roles, and private information.

- Adds SocialGame and SocialDeductionGame envs (sotopia/envs/social_game.py) with FSM state management, action masks (round-robin/simultaneous), per-state visibility (public/team/private), per-agent observations, and environment notifications
- New ActionHandler hook for game-specific logic; example WerewolfActionHandler and WerewolfEnv implement voting/kill/inspect/witch logic with end conditions via SocialGameEndEvaluator
- Extends evaluators with SocialGameEndEvaluator and passes env to evaluators; refactors ParallelSotopiaEnv (turn markers option, hidden backgrounds, action processing, async evaluator runner)
- Enhances agents and messages: LLMAgent supports custom_template and strict constraints; Observation gains action_instruction; ScriptBackground supports hide_unknown; trims agent names in BaseAgent
- Adds env profile game_metadata; minor server/test adjustments; bumps OpenAI dep range; expands .gitignore
- Documentation: new Experimental "Social Game Engine" page and index entry
- Examples: full 6-player Werewolves scenario (examples/experimental/werewolves/*) and configs for Spyfall/Undercover
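To make the round-robin versus simultaneous action masks in the summary above more concrete, here is a minimal sketch. The function name `compute_action_mask` and its parameters are hypothetical and are not taken from sotopia/envs/social_game.py; the real implementation may differ.

```python
from typing import List


def compute_action_mask(
    agents: List[str],
    eligible: List[str],
    mode: str,
    round_robin_idx: int = 0,
) -> List[bool]:
    """Return one boolean per agent: True if that agent may act this turn."""
    if mode == "simultaneous":
        # Every eligible agent acts in the same turn.
        return [name in eligible for name in agents]
    if mode == "round-robin":
        if not eligible:
            return [False] * len(agents)
        # Exactly one eligible agent acts; the index wraps around.
        current = eligible[round_robin_idx % len(eligible)]
        return [name == current for name in agents]
    raise ValueError(f"unknown mode: {mode}")
```

With agents `["a", "b", "c"]` and only `a` and `c` eligible, the simultaneous mode unmasks both, while the round-robin mode unmasks one of them per turn.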

Written by Cursor Bugbot for commit 4f1f9ae. This will update automatically on new commits.

Fixes several bugs preventing custom models (via custom/model@url format) from working:

- Fix parameter name in generate.py: api_base → base_url (line 257)
- Fix hardcoded "gpt-4" evaluator models in server.py (lines 309, 401). Now uses model_dict.get("evaluator", model_dict["env"])
- Add markdown code block stripping in PydanticOutputParser. Many local LLMs wrap JSON in ```json...```, parser now handles this
- Fix format_bad_output to support custom models. Passes base_url/api_key through error recovery path. Conditionally uses response_format (custom servers may not support it)
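As a rough illustration of two of the fixes listed above, the sketch below strips a markdown fence from model output before JSON parsing and picks an evaluator model with a fallback. The helper names `strip_markdown_fence` and `pick_evaluator_model` are hypothetical; this is not the code in PydanticOutputParser or server.py.

```python
import json
import re
from typing import Dict

_FENCE = "`" * 3  # three backticks, built here to keep this example readable
_FENCE_RE = re.compile(
    r"^\s*" + _FENCE + r"(?:json)?\s*(.*?)\s*" + _FENCE + r"\s*$", re.DOTALL
)


def strip_markdown_fence(text: str) -> str:
    """Return the payload inside a fenced block, or the stripped text itself."""
    match = _FENCE_RE.match(text)
    return match.group(1) if match else text.strip()


def pick_evaluator_model(model_dict: Dict[str, str]) -> str:
    # Same expression as in the commit message; note that it requires an
    # "env" key because the default argument is evaluated eagerly.
    return model_dict.get("evaluator", model_dict["env"])
```

A caller could then run `json.loads(strip_markdown_fence(raw_output))` whether or not the model wrapped its JSON in a fence.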

…ility issues in the game

Refactor SocialDeductionGame for real-time history and cleaner prompts

- ParallelSotopiaEnv: Added `include_turn_marker` flag to control environment turn messages.
- SocialDeductionGame:
  - Disabled environment turn markers to avoid duplication.
  - Implemented real-time history appending via `recv_message` override and `agent_message_buffer`.
  - Populated `action_instruction` in `Observation` for dynamic prompt instructions.
- Observation: Added `action_instruction` field.
- generate.py: Added `fill_template` helper for partial string formatting.
- LLMAgent: Updated `aact` to use `fill_template` to inject `action_instructions` into `custom_template`.
- Werewolves: Updated config description to populate `{agent_names}` dynamically.
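Partial string formatting, as mentioned for `fill_template` above, can be sketched as follows. This is a hypothetical re-implementation for illustration only; the signature and behavior of the real helper in generate.py are not shown in this PR page and may differ.

```python
import re
from typing import Any


def fill_template(template: str, **values: Any) -> str:
    """Replace only the `{name}` placeholders that appear in `values`."""

    def _substitute(match: "re.Match[str]") -> str:
        key = match.group(1)
        # Unknown placeholders are returned unchanged for a later pass.
        return str(values[key]) if key in values else match.group(0)

    return re.sub(r"\{(\w+)\}", _substitute, template)
```

Unlike `str.format`, this does not raise `KeyError` for placeholders that are filled in at a later stage, which is the property a two-stage prompt template needs.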

Found and fixed the evaluation and generation errors in the negotiation arena examples.

- **Termination Fix**: Updated `ParallelSotopiaEnv` to pass the `env` instance to evaluators. Modified `RuleBasedTerminatedEvaluator` to correctly count active agents using `env.agents` instead of relying solely on message history, which caused early termination in the first turn.
- **LiteLLM Support**: Updated `generate.py` to handle OpenAI schema limitations. Added `_fix_schema` to convert `prefixItems` (tuples) to `items` (arrays) and set `strict=False` to support dynamic dictionary outputs (Evaluator maps) while preventing `BadRequestError`.
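The `prefixItems` to `items` conversion described above might look roughly like the sketch below. The function `fix_schema` is a hypothetical stand-in for `_fix_schema`; in particular, merging differing positional schemas into `anyOf` and overwriting any existing `items` key are assumptions made here, not behavior confirmed by the PR.

```python
from typing import Any


def fix_schema(schema: Any) -> Any:
    """Recursively rewrite tuple-style `prefixItems` into array-style `items`."""
    if isinstance(schema, list):
        return [fix_schema(item) for item in schema]
    if not isinstance(schema, dict):
        return schema
    # Build a new dict so the caller's schema is left untouched.
    fixed = {key: fix_schema(value) for key, value in schema.items()}
    if "prefixItems" in fixed:
        prefix = fixed.pop("prefixItems")
        # Deduplicate positional schemas while preserving their order.
        unique = [s for i, s in enumerate(prefix) if s not in prefix[:i]]
        if not unique:
            fixed["items"] = {}
        elif len(unique) == 1:
            fixed["items"] = unique[0]
        else:
            fixed["items"] = {"anyOf": unique}
    return fixed
```

The trade-off is that positional typing is lost: a tuple of (string, integer) becomes an array whose elements may each be a string or an integer.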

codecov · 2025-12-25T19:29:15Z

Codecov Report

❌ Patch coverage is 32.93413% with 224 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.95%. Comparing base (80aeaaa) to head (4f1f9ae).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| sotopia/envs/social_game.py | 20.40% | 199 Missing ⚠️ |
| sotopia/envs/evaluators.py | 54.54% | 10 Missing ⚠️ |
| sotopia/generation_utils/generate.py | 45.45% | 6 Missing ⚠️ |
| sotopia/envs/parallel.py | 84.61% | 4 Missing ⚠️ |
| sotopia/messages/message_classes.py | 63.63% | 4 Missing ⚠️ |
| sotopia/agents/llm_agent.py | 80.00% | 1 Missing ⚠️ |

@@            Coverage Diff             @@
##             main     #343      +/-   ##
==========================================
- Coverage   74.80%   71.95%   -2.85%     
==========================================
  Files          72       73       +1     
  Lines        4827     5121     +294     
==========================================
+ Hits         3611     3685      +74     
- Misses       1216     1436     +220

| Files with missing lines | Coverage Δ | |
|---|---|---|
| sotopia/agents/base_agent.py | `74.28% <100.00%> (ø)` | |
| sotopia/database/persistent_profile.py | `65.00% <100.00%> (+0.25%)` | ⬆️ |
| sotopia/envs/__init__.py | `100.00% <100.00%> (ø)` | |
| sotopia/samplers/uniform_sampler.py | `73.80% <100.00%> (ø)` | |
| sotopia/server.py | `41.20% <ø> (ø)` | |
| tests/conftest.py | `84.82% <100.00%> (-1.55%)` | ⬇️ |
| sotopia/agents/llm_agent.py | `50.00% <80.00%> (+1.54%)` | ⬆️ |
| sotopia/envs/parallel.py | `84.18% <84.61%> (+4.00%)` | ⬆️ |
| sotopia/messages/message_classes.py | `53.29% <63.63%> (-0.16%)` | ⬇️ |
| sotopia/generation_utils/generate.py | `74.21% <45.45%> (-6.33%)` | ⬇️ |

... and 2 more

... and 1 file with indirect coverage changes

openhands-ai · 2026-01-07T04:12:15Z

Looks like there are a few issues preventing this PR from being merged!

GitHub Actions are failing:
- Mypy

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #343 at branch `pr-333`

Feel free to include any additional details that might help me get this PR into a better state.

cursor

This PR is being reviewed by Cursor Bugbot

cursor · 2026-01-07T04:16:28Z

+        secrets = ""
+
+        # Check if agent is a werewolf
+        is_werewolf = env_profile.agent_goals[idx] == "Werewolf"


Werewolf role check compares goal to wrong string

High Severity

The is_werewolf check compares env_profile.agent_goals[idx] to the literal string "Werewolf", but agent_goals contains goal descriptions like "Deceive others, avoid detection, and eliminate villagers.", not role names. The werewolf_goal_str variable defined earlier at line 417 has the correct goal text and is used correctly at line 421, but line 432 doesn't use it. This means is_werewolf is always False, and werewolf agents never receive their partner information in the secrets variable, breaking a core game mechanic.
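A minimal sketch of the comparison the comment asks for is shown below. The werewolf goal text and the name `werewolf_goal_str` come from the comment; the helper functions, the agent names, and the wording of the secrets string are hypothetical.

```python
from typing import List

# Goal description quoted in the review comment above.
werewolf_goal_str = "Deceive others, avoid detection, and eliminate villagers."


def find_werewolves(agent_names: List[str], agent_goals: List[str]) -> List[str]:
    # Compare against the goal description, not the role name "Werewolf".
    return [n for n, g in zip(agent_names, agent_goals) if g == werewolf_goal_str]


def build_secrets(idx: int, agent_names: List[str], agent_goals: List[str]) -> str:
    """Return partner information for werewolves and an empty string otherwise."""
    wolves = find_werewolves(agent_names, agent_goals)
    me = agent_names[idx]
    if me not in wolves:
        return ""
    partners = [w for w in wolves if w != me]
    if not partners:
        return ""
    return "Your fellow werewolves: " + ", ".join(partners)
```

Because no entry in `agent_goals` equals the literal `"Werewolf"`, a check against that string can never succeed, which is why the partner information was never populated.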

cursor · 2026-01-07T04:16:28Z

+                # log elimination
+                _gen_logger.info(
+                    f"{eliminated} was voted out! They were a {self.agent_to_role[eliminated]}."
+                )


Eliminated variable used outside defining conditional block

Low Severity

The eliminated variable is assigned inside the if vote_counts: block at line 293, but the logging statements at lines 302-304 that reference eliminated are outside that block (within the outer if votes: block). While current logic makes vote_counts always non-empty when votes is non-empty, this code structure could cause a NameError if the logic is modified or edge cases arise.
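One way to remove the latent `NameError` is to keep the logging in the same block that assigns `eliminated`, as in the sketch below. The function `resolve_votes` and its signature are hypothetical, and tie-breaking here simply follows `Counter.most_common`, which may not match the game's actual rule.

```python
import logging
from collections import Counter
from typing import Dict, Optional

logger = logging.getLogger(__name__)


def resolve_votes(votes: Dict[str, str], agent_to_role: Dict[str, str]) -> Optional[str]:
    """Return the eliminated agent, or None when nobody voted."""
    eliminated: Optional[str] = None  # defined on every code path
    if votes:
        vote_counts = Counter(votes.values())
        if vote_counts:
            eliminated = vote_counts.most_common(1)[0][0]
            # Logging lives in the block that assigns `eliminated`.
            logger.info(
                f"{eliminated} was voted out! They were a {agent_to_role[eliminated]}."
            )
    return eliminated
```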

cursor · 2026-01-07T04:16:28Z


+        complied_actions = self._process_incoming_actions(actions)
+
+        # Sync evaluation (not refactored to helper as it's sync vs async)


Sync step doesn't pass env to evaluators

Medium Severity

The synchronous step method calls evaluators without passing env=self to kwargs, while the async _run_evaluators method correctly passes env=self. Evaluators like RuleBasedTerminatedEvaluator and SocialGameEndEvaluator check kwargs.get("env") and fall back to different behavior when it's None. This inconsistency means sync and async code paths may produce different termination decisions.
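The inconsistency can be avoided by building the evaluator kwargs in one place that both paths share. The sketch below uses toy classes; `ToyTerminationEvaluator`, `ToyEnv`, `acall`, and `_evaluator_kwargs` are hypothetical names and do not reproduce ParallelSotopiaEnv.

```python
import asyncio
from typing import Any, Dict, List


class ToyTerminationEvaluator:
    """Hypothetical evaluator that reads `env` from kwargs, with a fallback."""

    def __call__(self, turn_number: int, **kwargs: Any) -> bool:
        env = kwargs.get("env")
        if env is None:
            # Fallback path: without env, guess from the turn number alone.
            return turn_number >= 1
        # Preferred path: terminate only when fewer than two agents remain.
        return len(env.agents) < 2

    async def acall(self, turn_number: int, **kwargs: Any) -> bool:
        return self(turn_number, **kwargs)


class ToyEnv:
    def __init__(self, agents: List[str]) -> None:
        self.agents = agents
        self.turn_number = 1
        self.evaluator = ToyTerminationEvaluator()

    def _evaluator_kwargs(self) -> Dict[str, Any]:
        # Single source of truth so sync and async paths cannot drift apart.
        return {"turn_number": self.turn_number, "env": self}

    def step(self) -> bool:
        return self.evaluator(**self._evaluator_kwargs())

    async def astep(self) -> bool:
        return await self.evaluator.acall(**self._evaluator_kwargs())
```

In this toy setup, calling the evaluator without `env` at turn 1 reports termination, while both `step()` and `astep()` report that the three-agent game continues.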

cursor · 2026-01-07T04:16:28Z

+            agents=agents,
+            omniscient=omniscient,
+            include_background_observations=include_background_observations,
+        )


lite parameter not forwarded to parent reset

Medium Severity

The SocialDeductionGame.reset() method accepts a lite parameter in its signature at line 270, but doesn't forward it to super().reset(). The parent ParallelSotopiaEnv.reset() uses lite to clear agent backgrounds in lite mode. Since the parameter is silently dropped, calling reset(lite=True) on a SocialDeductionGame instance has no effect, and lite mode won't work for social deduction games.
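Forwarding the parameter is a one-line change in the override. The sketch below uses toy classes; `ParentEnv` and `ChildEnv` are hypothetical, and the parent's lite behavior is simplified to blanking the backgrounds.

```python
from typing import Dict


class ParentEnv:
    def __init__(self) -> None:
        self._backgrounds = {"alice": "long background", "bob": "long background"}

    def reset(self, lite: bool = False) -> Dict[str, str]:
        backgrounds = dict(self._backgrounds)
        if lite:
            # Lite mode clears every agent background.
            backgrounds = {name: "" for name in backgrounds}
        return backgrounds


class ChildEnv(ParentEnv):
    def reset(self, lite: bool = False) -> Dict[str, str]:
        # Forward `lite` instead of silently dropping it.
        return super().reset(lite=lite)
```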

cursor · 2026-01-07T04:16:28Z

+                if instruction:
+                    base_obs[agent_name].action_instruction = instruction
+
+        return base_obs


Message buffer not cleared on environment reset

Medium Severity

The agent_message_buffer is initialized in __init__ but is never cleared in the reset() method. When a SocialDeductionGame instance is reused by calling reset() to start a new game, messages from the previous game remain in the buffer and will be delivered to agents in the new game, causing incorrect game state and potential information leakage between games.
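A sketch of clearing per-game state in `reset()` follows. The class `BufferedGame` is hypothetical, and the buffer is assumed here to map each agent name to a list of pending messages; the real type of `agent_message_buffer` is not shown in this PR page.

```python
from typing import Dict, List


class BufferedGame:
    def __init__(self, agents: List[str]) -> None:
        self.agents = list(agents)
        self.agent_message_buffer: Dict[str, List[str]] = {}
        self.reset()

    def recv_message(self, recipient: str, message: str) -> None:
        self.agent_message_buffer[recipient].append(message)

    def reset(self) -> None:
        # Start every game with empty buffers so nothing leaks across games.
        self.agent_message_buffer = {name: [] for name in self.agents}
```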

cursor · 2026-01-07T04:16:28Z

+                if instruction:
+                    base_obs[agent_name].action_instruction = instruction
+
+        return base_obs


Round-robin index not reset between games

Medium Severity

The _round_robin_idx counter is lazily initialized in _update_action_mask() and reset during state transitions in _perform_transition_state(), but it's not reset in reset(). When reusing a SocialDeductionGame instance, the round-robin index from the previous game persists, causing the first turn of the new game to start at the wrong agent instead of the first eligible agent.

Additional Locations (1)

sotopia/envs/social_game.py#L407-L415
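The same pattern applies to the turn counter: reset it alongside the rest of the per-game state. The class `RoundRobinTurns` below is a hypothetical illustration and only borrows the attribute name `_round_robin_idx` from the comment.

```python
from typing import List


class RoundRobinTurns:
    def __init__(self, eligible: List[str]) -> None:
        self.eligible = list(eligible)
        self._round_robin_idx = 0

    def next_speaker(self) -> str:
        speaker = self.eligible[self._round_robin_idx % len(self.eligible)]
        self._round_robin_idx += 1
        return speaker

    def reset(self) -> None:
        # A new game always starts from the first eligible agent.
        self._round_robin_idx = 0
```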

cursor · 2026-01-07T04:28:39Z

+            .replace(
+                "{goal}",
+                role_goal,  # Also replace the goal here
+            )


Unreplaced {secret} placeholder in prompt template

Medium Severity

The SOCIAL_GAME_PROMPT_TEMPLATE contains {secret} (singular) on line 27, but create_agents() adds and replaces {secrets} (plural). The template modification on line 413 adds {secrets} after {goal}, and line 444 replaces {secrets}, but the original {secret} placeholder is never replaced. This causes the literal text {secret} to appear in prompts sent to the LLM, which could confuse the model.

Additional Locations (1)

sotopia/envs/social_game.py#L26-L27
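A small check for leftover placeholders would catch this kind of singular/plural mismatch before a prompt is sent. The helper `unreplaced_placeholders` and the two-line template used to exercise it are hypothetical; the template is not the real SOCIAL_GAME_PROMPT_TEMPLATE.

```python
import re
from typing import Set


def unreplaced_placeholders(prompt: str) -> Set[str]:
    """Return the names of any `{name}` placeholders still present."""
    return set(re.findall(r"\{(\w+)\}", prompt))


# Hypothetical template with the singular placeholder from the comment.
template = "Goal: {goal}\nSecret: {secret}"

# Replacing the plural form leaves the singular placeholder untouched.
prompt = template.replace("{goal}", "win").replace("{secrets}", "none")
leftover = unreplaced_placeholders(prompt)
```

Here `leftover` contains `secret`, so an assertion such as `assert not leftover` in a test would flag the mismatch.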

Keyu-He and others added 24 commits September 21, 2025 01:10

werewolf game in progress

d389b6e

with minor bugs, will fix in future iterations

werewolf game in progress

8b8850d

contain minor bugs, will fix in future iterations

updated prompt

2cc3990

current progress

3a9f689

fix mypy errors

df62578

To run the local models

f482b60

Merge branch 'fix/custom-model-support' into feature/social-game-support

Design Social Game class, werewolf demo working in progress

b453633

Merge branch 'main' into feature/social-game-support

7de839b

update on the SocialGame class / SocialDeductionGame class

ff49e41

fix mypy errors

71711b6

debugging on the prompts

39bb4e3

werewolf game debug

ca835c3

next step, change script_like to false, and fix the rest errors that may cause

Refactor social_game.py and update werewolves example

c0f7866

Add Social Game Engine documentation

089b30c

Delete examples/experimental/negotiation_arena/NegotiationArena_1_Buy…

4ce9d6c

…_Sell_custom_models.py

Restore sotopia/cli/install/redis-data/dump.rdb to match origin/main

39f46cd

Revert unnessarily changes in the uniform_sample and server.py

aacd07a

Minor update on werewolf prompt, Compatibility on uniform sampler and…

f676238

… server

update uniform_sampler and server.py to the correct versions

67dc7db

previous commit reverted too much..

move visibility prompt inside werewolf game's config

d48f71d

Merge branch 'main' into pr-333

dc127c8

XuhuiZhou requested a review from Keyu-He December 25, 2025 19:27

XuhuiZhou changed the title ~~social game support~~ feat/social game support Dec 26, 2025

Keyu-He and others added 2 commits January 2, 2026 13:43

Revert include_background_observations changes in parallel.py

89be742

Also, in the main, the default value to the include_background_observations is a bit misleading, changing to default False.

fix tests

d0ca4b4

cursor Bot reviewed Jan 7, 2026

mypy

4f1f9ae

cursor Bot reviewed Jan 7, 2026

XuhuiZhou merged commit a0aaafb into main Jan 7, 2026
10 checks passed

XuhuiZhou merged 27 commits into main from pr-333
