feat/social game support #343

Merged
XuhuiZhou merged 27 commits into main from pr-333 on Jan 7, 2026
Conversation

@XuhuiZhou
Member

@XuhuiZhou XuhuiZhou commented Dec 25, 2025

An optimized version of #333

Closes #

📑 Description

✅ Checks

  • My pull request adheres to the code style of this project
  • My code requires changes to the documentation
  • I have updated the documentation as required
  • All the tests have passed
  • Branch name follows type/descript (e.g. feature/add-llm-agents)
  • Ready for code review

ℹ Additional Information


Note

Introduces an experimental framework for multi-agent social deduction games with structured phases, roles, and private information.

  • Adds SocialGame and SocialDeductionGame envs (sotopia/envs/social_game.py) with FSM state management, action masks (round-robin/simultaneous), per-state visibility (public/team/private), per-agent observations, and environment notifications
  • New ActionHandler hook for game-specific logic; example WerewolfActionHandler and WerewolfEnv implement voting/kill/inspect/witch logic with end conditions via SocialGameEndEvaluator
  • Extends evaluators with SocialGameEndEvaluator and passes env to evaluators; refactors ParallelSotopiaEnv (turn markers option, hidden backgrounds, action processing, async evaluator runner)
  • Enhances agents and messages: LLMAgent supports custom_template and strict constraints; Observation gains action_instruction; ScriptBackground supports hide_unknown; trims agent names in BaseAgent
  • Adds env profile game_metadata; minor server/test adjustments; bumps OpenAI dep range; expands .gitignore
  • Documentation: new Experimental "Social Game Engine" page and index entry
  • Examples: full 6-player Werewolves scenario (examples/experimental/werewolves/*) and configs for Spyfall/Undercover

Written by Cursor Bugbot for commit 4f1f9ae. This will update automatically on new commits.
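
To illustrate the two action-mask modes named in the note above, here is a minimal sketch. The function `action_mask`, its parameters, and the mode strings are hypothetical and are not taken from `sotopia/envs/social_game.py`; the sketch only shows the general idea that a simultaneous mask enables every eligible agent, while a round-robin mask enables exactly one.

```python
from typing import List


def action_mask(
    agents: List[str], eligible: List[str], mode: str, rr_idx: int = 0
) -> List[bool]:
    """Return one flag per agent: True if that agent may act this turn."""
    if mode == "simultaneous":
        # Every eligible agent acts in the same turn.
        return [name in eligible for name in agents]
    if mode == "round-robin":
        if not eligible:
            return [False] * len(agents)
        # Only the eligible agent selected by the running index acts.
        current = eligible[rr_idx % len(eligible)]
        return [name == current for name in agents]
    raise ValueError(f"unknown mode: {mode}")
```

In this sketch the caller owns `rr_idx` and increments it after each turn, which keeps the function itself stateless.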

Keyu-He and others added 24 commits September 21, 2025 01:10
with minor bugs, will fix in future iterations
contain minor bugs, will fix in future iterations
Fixes several bugs preventing custom models (via custom/model@url format) from working:

  - Fix parameter name in generate.py: api_base → base_url (line 257)
  - Fix hardcoded "gpt-4" evaluator models in server.py (lines 309, 401)
    Now uses model_dict.get("evaluator", model_dict["env"])
  - Add markdown code block stripping in PydanticOutputParser
    Many local LLMs wrap JSON in ```json...```, parser now handles this
  - Fix format_bad_output to support custom models
    Passes base_url/api_key through error recovery path
    Conditionally uses response_format (custom servers may not support it)
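
The code-block stripping described in this commit message could look roughly like the sketch below. This is an assumption about the approach, not the code added to `PydanticOutputParser`; `strip_code_fence` and `parse_json_reply` are hypothetical names.

```python
import json
import re
from typing import Any

FENCE = "`" * 3  # three backticks
# Matches a reply that is entirely wrapped in one fence, with an optional "json" tag.
_FENCE_RE = re.compile(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", re.DOTALL)


def strip_code_fence(text: str) -> str:
    """Return the payload inside a surrounding code fence, or the text unchanged."""
    stripped = text.strip()
    match = _FENCE_RE.match(stripped)
    return match.group(1) if match else stripped


def parse_json_reply(text: str) -> Any:
    """Parse a model reply as JSON after removing an optional code fence."""
    return json.loads(strip_code_fence(text))
```

Replies that are not fenced pass through untouched, so the same parser serves both hosted and local models.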
Merge branch 'fix/custom-model-support' into feature/social-game-support
…ility issues in the game

Refactor SocialDeductionGame for real-time history and cleaner prompts

- ParallelSotopiaEnv: Added `include_turn_marker` flag to control environment turn messages.
- SocialDeductionGame:
    - Disabled environment turn markers to avoid duplication.
    - Implemented real-time history appending via `recv_message` override and `agent_message_buffer`.
    - Populated `action_instruction` in `Observation` for dynamic prompt instructions.
- Observation: Added `action_instruction` field.
- generate.py: Added `fill_template` helper for partial string formatting.
- LLMAgent: Updated `aact` to use `fill_template` to inject `action_instructions` into `custom_template`.
- Werewolves: Updated config description to populate `{agent_names}` dynamically.
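
Partial string formatting, as mentioned for the `fill_template` helper, can be sketched with `str.format_map` and a dict that leaves unknown placeholders in place. This is a generic technique under the hypothetical name `partial_format`; the actual `fill_template` in `generate.py` may be implemented differently.

```python
class _KeepMissing(dict):
    """Mapping that re-emits a placeholder when no value was supplied for it."""

    def __missing__(self, key: str) -> str:
        return "{" + key + "}"


def partial_format(template: str, **values: str) -> str:
    """Fill only the placeholders present in `values`; keep the rest verbatim."""
    return template.format_map(_KeepMissing(values))
```

A template can then be filled in stages, for example the action instruction first and the remaining fields later. The sketch handles plain `{name}` fields only; format specs and attribute access on missing fields are out of scope.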
Next step: change script_like to false and fix the remaining errors that may cause
Found and fixed the evaluation and generation errors in the negotiation arena examples.

- **Termination Fix**: Updated `ParallelSotopiaEnv` to pass the `env` instance to evaluators. Modified `RuleBasedTerminatedEvaluator` to correctly count active agents using `env.agents` instead of relying solely on message history, which caused early termination in the first turn.
- **LiteLLM Support**: Updated `generate.py` to handle OpenAI schema limitations. Added `_fix_schema` to convert `prefixItems` (tuples) to `items` (arrays) and set `strict=False` to support dynamic dictionary outputs (Evaluator maps) while preventing `BadRequestError`.
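
As a rough illustration of the `prefixItems` to `items` conversion, the sketch below walks a JSON schema recursively. It is not the project's `_fix_schema`: the name `relax_tuple_schemas` is hypothetical, and merging the positional schemas into an `anyOf` is an assumption about one reasonable way to do the conversion. It also assumes `prefixItems` always holds a list of schemas.

```python
from typing import Any, List


def relax_tuple_schemas(schema: Any) -> Any:
    """Return a copy of `schema` with every `prefixItems` replaced by `items`."""
    if isinstance(schema, list):
        return [relax_tuple_schemas(item) for item in schema]
    if not isinstance(schema, dict):
        return schema

    # Copy every key except prefixItems, converting nested schemas on the way.
    fixed = {
        key: relax_tuple_schemas(value)
        for key, value in schema.items()
        if key != "prefixItems"
    }
    if "prefixItems" in schema:
        # Collect the distinct positional schemas, preserving their order.
        unique: List[Any] = []
        for option in relax_tuple_schemas(schema["prefixItems"]):
            if option not in unique:
                unique.append(option)
        if len(unique) == 1:
            fixed["items"] = unique[0]
        elif unique:
            fixed["items"] = {"anyOf": unique}
    return fixed
```

The result is looser than the original tuple schema, since positional typing is lost, so any downstream validation that depends on element order would need to happen after parsing.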

@XuhuiZhou XuhuiZhou requested a review from Keyu-He December 25, 2025 19:27
@codecov

codecov Bot commented Dec 25, 2025

Codecov Report

❌ Patch coverage is 32.93413% with 224 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.95%. Comparing base (80aeaaa) to head (4f1f9ae).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| sotopia/envs/social_game.py | 20.40% | 199 Missing ⚠️ |
| sotopia/envs/evaluators.py | 54.54% | 10 Missing ⚠️ |
| sotopia/generation_utils/generate.py | 45.45% | 6 Missing ⚠️ |
| sotopia/envs/parallel.py | 84.61% | 4 Missing ⚠️ |
| sotopia/messages/message_classes.py | 63.63% | 4 Missing ⚠️ |
| sotopia/agents/llm_agent.py | 80.00% | 1 Missing ⚠️ |

@@            Coverage Diff             @@
##             main     #343      +/-   ##
==========================================
- Coverage   74.80%   71.95%   -2.85%     
==========================================
  Files          72       73       +1     
  Lines        4827     5121     +294     
==========================================
+ Hits         3611     3685      +74     
- Misses       1216     1436     +220     

| Files with missing lines | Coverage Δ |
|---|---|
| sotopia/agents/base_agent.py | 74.28% <100.00%> (ø) |
| sotopia/database/persistent_profile.py | 65.00% <100.00%> (+0.25%) ⬆️ |
| sotopia/envs/__init__.py | 100.00% <100.00%> (ø) |
| sotopia/samplers/uniform_sampler.py | 73.80% <100.00%> (ø) |
| sotopia/server.py | 41.20% <ø> (ø) |
| tests/conftest.py | 84.82% <100.00%> (-1.55%) ⬇️ |
| sotopia/agents/llm_agent.py | 50.00% <80.00%> (+1.54%) ⬆️ |
| sotopia/envs/parallel.py | 84.18% <84.61%> (+4.00%) ⬆️ |
| sotopia/messages/message_classes.py | 53.29% <63.63%> (-0.16%) ⬇️ |
| sotopia/generation_utils/generate.py | 74.21% <45.45%> (-6.33%) ⬇️ |

... and 2 more

... and 1 file with indirect coverage changes

@XuhuiZhou XuhuiZhou changed the title social game support feat/social game support Dec 26, 2025
Keyu-He and others added 2 commits January 2, 2026 13:43
Also, on main, the default value of include_background_observations is a bit misleading; changed the default to False.
@openhands-ai

openhands-ai Bot commented Jan 7, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Mypy

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #343 at branch `pr-333`

Feel free to include any additional details that might help me get this PR into a better state.

@cursor cursor Bot left a comment

This PR is being reviewed by Cursor Bugbot

secrets = ""

# Check if agent is a werewolf
is_werewolf = env_profile.agent_goals[idx] == "Werewolf"

Werewolf role check compares goal to wrong string

High Severity

The is_werewolf check compares env_profile.agent_goals[idx] to the literal string "Werewolf", but agent_goals contains goal descriptions like "Deceive others, avoid detection, and eliminate villagers.", not role names. The werewolf_goal_str variable defined earlier at line 417 has the correct goal text and is used correctly at line 421, but line 432 doesn't use it. This means is_werewolf is always False, and werewolf agents never receive their partner information in the secrets variable, breaking a core game mechanic.
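
A minimal sketch of the mismatch described in this comment, assuming goals are stored as full descriptions. The constant `WEREWOLF_GOAL` reuses the goal text quoted in the comment; the function `werewolf_secrets` and the wording of the returned string are hypothetical.

```python
from typing import List

# Goal text quoted in the review comment; roles are identified by this text.
WEREWOLF_GOAL = "Deceive others, avoid detection, and eliminate villagers."


def werewolf_secrets(agent_names: List[str], agent_goals: List[str], idx: int) -> str:
    """Return partner information for a werewolf, or an empty string otherwise."""
    # Compare against the goal description, not the role name "Werewolf".
    is_werewolf = agent_goals[idx] == WEREWOLF_GOAL
    if not is_werewolf:
        return ""
    partners = [
        name
        for i, (name, goal) in enumerate(zip(agent_names, agent_goals))
        if goal == WEREWOLF_GOAL and i != idx
    ]
    if not partners:
        return ""
    return "Your fellow werewolves: " + ", ".join(partners)
```

A more robust alternative would be to key the check on an explicit role mapping rather than on goal text, so that rewording a goal cannot silently disable the check.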

# log elimination
_gen_logger.info(
    f"{eliminated} was voted out! They were a {self.agent_to_role[eliminated]}."
)

Eliminated variable used outside defining conditional block

Low Severity

The eliminated variable is assigned inside the if vote_counts: block at line 293, but the logging statements at lines 302-304 that reference eliminated are outside that block (within the outer if votes: block). While current logic makes vote_counts always non-empty when votes is non-empty, this code structure could cause a NameError if the logic is modified or edge cases arise.
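
One way to avoid the scoping hazard is to return early when there is nothing to count, so that every later use of `eliminated` is guaranteed to follow its assignment. The function `resolve_vote` and its arguments below are hypothetical and only mirror the shape of the reviewed code.

```python
import logging
from collections import Counter
from typing import Dict, Optional

_logger = logging.getLogger("social_game_sketch")


def resolve_vote(votes: Dict[str, str], agent_to_role: Dict[str, str]) -> Optional[str]:
    """Return the agent with the most votes, or None when no votes were cast."""
    vote_counts = Counter(votes.values())
    if not vote_counts:
        return None  # nothing below runs without a defined `eliminated`

    eliminated, _count = vote_counts.most_common(1)[0]
    # Logging sits after the guard, in the same scope as the assignment.
    _logger.info(
        "%s was voted out! They were a %s.", eliminated, agent_to_role[eliminated]
    )
    return eliminated
```

Tie-breaking is not addressed here; a real game would need an explicit rule for equal vote counts.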

Comment thread sotopia/envs/parallel.py

complied_actions = self._process_incoming_actions(actions)

# Sync evaluation (not refactored to helper as it's sync vs async)

Sync step doesn't pass env to evaluators

Medium Severity

The synchronous step method calls evaluators without passing env=self to kwargs, while the async _run_evaluators method correctly passes env=self. Evaluators like RuleBasedTerminatedEvaluator and SocialGameEndEvaluator check kwargs.get("env") and fall back to different behavior when it's None. This inconsistency means sync and async code paths may produce different termination decisions.
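
The inconsistency can be sketched with a toy evaluator that falls back to message history when no `env` is supplied. All class and method names here (`TerminationCheck`, `TinyEnv`, `astep`) are hypothetical stand-ins, not the real `ParallelSotopiaEnv` API; the point is that routing both paths through one helper keeps their decisions identical.

```python
from typing import Any, List, Tuple

Message = Tuple[str, str]  # (sender, text)


class TerminationCheck:
    """Toy evaluator: terminate when fewer than two agents are active."""

    def __call__(self, turn_number: int, messages: List[Message], **kwargs: Any) -> bool:
        env = kwargs.get("env")
        if env is not None:
            active = len(env.agents)  # authoritative count from the environment
        else:
            # Fallback: only agents that have already spoken are counted.
            active = len({sender for sender, _ in messages})
        return active < 2


class TinyEnv:
    def __init__(self, agents: List[str]) -> None:
        self.agents = agents
        self.evaluators = [TerminationCheck()]

    def _run_evaluators(self, turn_number: int, messages: List[Message]) -> List[bool]:
        # Single helper so the sync and async paths pass the same kwargs.
        return [ev(turn_number, messages, env=self) for ev in self.evaluators]

    def step(self, turn_number: int, messages: List[Message]) -> List[bool]:
        return self._run_evaluators(turn_number, messages)

    async def astep(self, turn_number: int, messages: List[Message]) -> List[bool]:
        return self._run_evaluators(turn_number, messages)
```

On the first turn, when only one agent has spoken, the fallback path in this sketch would report termination while the `env`-aware path would not, which is the kind of divergence the comment warns about.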

    agents=agents,
    omniscient=omniscient,
    include_background_observations=include_background_observations,
)

lite parameter not forwarded to parent reset

Medium Severity

The SocialDeductionGame.reset() method accepts a lite parameter in its signature at line 270, but doesn't forward it to super().reset(). The parent ParallelSotopiaEnv.reset() uses lite to clear agent backgrounds in lite mode. Since the parameter is silently dropped, calling reset(lite=True) on a SocialDeductionGame instance has no effect, and lite mode won't work for social deduction games.
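
The effect of dropping the argument can be reproduced with two toy subclasses. `BaseEnv`, `DroppingGame`, and `ForwardingGame` are hypothetical, and the way lite mode blanks the backgrounds is an assumption made for illustration.

```python
from typing import Dict, Optional


class BaseEnv:
    def reset(self, seed: Optional[int] = None, lite: bool = False) -> Dict[str, str]:
        backgrounds = {"alice": "A detective.", "bob": "A baker."}
        if lite:
            # Assumed lite behaviour: blank out every agent background.
            backgrounds = {name: "" for name in backgrounds}
        return backgrounds


class DroppingGame(BaseEnv):
    def reset(self, seed: Optional[int] = None, lite: bool = False) -> Dict[str, str]:
        # Bug pattern: `lite` is accepted but never reaches the parent.
        return super().reset(seed=seed)


class ForwardingGame(BaseEnv):
    def reset(self, seed: Optional[int] = None, lite: bool = False) -> Dict[str, str]:
        # Fix pattern: pass every accepted argument through.
        return super().reset(seed=seed, lite=lite)
```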

if instruction:
    base_obs[agent_name].action_instruction = instruction

return base_obs

Message buffer not cleared on environment reset

Medium Severity

The agent_message_buffer is initialized in __init__ but is never cleared in the reset() method. When a SocialDeductionGame instance is reused by calling reset() to start a new game, messages from the previous game remain in the buffer and will be delivered to agents in the new game, causing incorrect game state and potential information leakage between games.
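
A small sketch of the leak and its remedy, using a hypothetical `BufferedGame` class. Only the attribute name `agent_message_buffer` comes from the comment; the buffer's real structure is not shown in this thread, so the dict-of-lists layout is an assumption.

```python
from typing import Dict, List


class BufferedGame:
    def __init__(self) -> None:
        # Pending messages per agent, delivered on that agent's next observation.
        self.agent_message_buffer: Dict[str, List[str]] = {}

    def recv_message(self, agent: str, message: str) -> None:
        self.agent_message_buffer.setdefault(agent, []).append(message)

    def reset(self) -> None:
        # Clear per-game state so nothing carries over into the next game.
        self.agent_message_buffer.clear()
```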

if instruction:
    base_obs[agent_name].action_instruction = instruction

return base_obs

Round-robin index not reset between games

Medium Severity

The _round_robin_idx counter is lazily initialized in _update_action_mask() and reset during state transitions in _perform_transition_state(), but it's not reset in reset(). When reusing a SocialDeductionGame instance, the round-robin index from the previous game persists, causing the first turn of the new game to start at the wrong agent instead of the first eligible agent.
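
The stale-index problem can be shown with a hypothetical `TurnOrder` class. Only the attribute name `_round_robin_idx` comes from the comment; everything else is an illustrative assumption.

```python
from typing import List


class TurnOrder:
    def __init__(self, agents: List[str]) -> None:
        self.agents = list(agents)
        self._round_robin_idx = 0  # initialized eagerly rather than lazily

    def next_speaker(self) -> str:
        speaker = self.agents[self._round_robin_idx % len(self.agents)]
        self._round_robin_idx += 1
        return speaker

    def reset(self) -> None:
        # Without this line, a reused instance resumes from the old position.
        self._round_robin_idx = 0
```

Initializing the counter in `__init__` instead of lazily also means `reset()` never has to guess whether the attribute exists yet.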

.replace(
    "{goal}",
    role_goal,  # Also replace the goal here
)

Unreplaced {secret} placeholder in prompt template

Medium Severity

The SOCIAL_GAME_PROMPT_TEMPLATE contains {secret} (singular) on line 27, but create_agents() adds and replaces {secrets} (plural). The template modification on line 413 adds {secrets} after {goal}, and line 444 replaces {secrets}, but the original {secret} placeholder is never replaced. This causes the literal text {secret} to appear in prompts sent to the LLM, which could confuse the model.
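
A singular/plural mismatch like this one is easy to catch with a check for leftover placeholders before the prompt is sent. The helper below is a hypothetical sketch; it assumes placeholders are simple `{identifier}` fields and would also flag any literal braces of that shape that are meant to stay in the prompt.

```python
import re
from typing import List

# Matches simple {identifier} placeholders only.
_PLACEHOLDER_RE = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def leftover_placeholders(prompt: str) -> List[str]:
    """Return the sorted names of placeholders still present in `prompt`."""
    return sorted(set(_PLACEHOLDER_RE.findall(prompt)))
```

Asserting that this list is empty in a unit test, or logging it at agent creation time, would surface the unreplaced `{secret}` immediately.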

@XuhuiZhou XuhuiZhou merged commit a0aaafb into main Jan 7, 2026
10 checks passed