with minor bugs, will fix in future iterations
contain minor bugs, will fix in future iterations
Fixes several bugs preventing custom models (via custom/model@url format) from working:
- Fix parameter name in generate.py: api_base → base_url (line 257)
- Fix hardcoded "gpt-4" evaluator models in server.py (lines 309, 401)
  Now uses model_dict.get("evaluator", model_dict["env"])
- Add markdown code block stripping in PydanticOutputParser
  Many local LLMs wrap JSON in ```json...```, parser now handles this
- Fix format_bad_output to support custom models
  Passes base_url/api_key through error recovery path
  Conditionally uses response_format (custom servers may not support it)
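
The fence-stripping step mentioned in this commit could look roughly like the sketch below. It is only an illustration: `strip_code_fence` is a hypothetical helper name, and the actual change inside `PydanticOutputParser` may be structured differently.

```python
import json
import re

FENCE = "`" * 3  # a markdown code fence, built this way to keep the example easy to embed

# Matches an optional "json" language tag and captures the payload between the fences.
_FENCED = re.compile(
    rf"^\s*{FENCE}(?:json)?\s*(.*?)\s*{FENCE}\s*$",
    re.DOTALL | re.IGNORECASE,
)


def strip_code_fence(text: str) -> str:
    """Return the payload of a fenced block, or the stripped text if it is not fenced."""
    match = _FENCED.match(text)
    return match.group(1) if match else text.strip()


# A typical local-LLM reply that wraps its JSON in a fenced block.
raw = f'{FENCE}json\n{{"action_type": "speak", "argument": "hi"}}\n{FENCE}'
parsed = json.loads(strip_code_fence(raw))
```

Unfenced output passes through unchanged, so the same call can sit in front of the JSON parser for every model.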
Merge branch 'fix/custom-model-support' into feature/social-game-support
…ility issues in the game
Refactor SocialDeductionGame for real-time history and cleaner prompts
- ParallelSotopiaEnv: Added `include_turn_marker` flag to control environment turn messages.
- SocialDeductionGame:
  - Disabled environment turn markers to avoid duplication.
  - Implemented real-time history appending via `recv_message` override and `agent_message_buffer`.
  - Populated `action_instruction` in `Observation` for dynamic prompt instructions.
- Observation: Added `action_instruction` field.
- generate.py: Added `fill_template` helper for partial string formatting.
- LLMAgent: Updated `aact` to use `fill_template` to inject `action_instructions` into `custom_template`.
- Werewolves: Updated config description to populate `{agent_names}` dynamically.
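
To illustrate what "partial string formatting" means here, the following is a minimal sketch of a `fill_template`-style helper. The signature and the replacement strategy are assumptions; the helper in `generate.py` may behave differently.

```python
def fill_template(template: str, **values: str) -> str:
    """Replace only the placeholders named in `values`; leave all others intact.

    Assumed behaviour: plain `{name}` substitution, unlike `str.format`,
    which raises KeyError when any placeholder is missing.
    """
    for key, value in values.items():
        template = template.replace("{" + key + "}", value)
    return template


# Hypothetical custom template with three placeholders.
custom_template = "You are {agent}. {action_instructions} History: {history}"

# Inject the instructions now; {agent} and {history} stay for a later formatting pass.
partial = fill_template(custom_template, action_instructions="Vote for one player.")
final = partial.format(agent="Alice", history="(empty)")
```

One design caveat of this approach: injected values that themselves contain braces would need escaping before a later `str.format` pass.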
Next step: change script_like to false and fix the remaining errors that may cause
…_Sell_custom_models.py
previous commit reverted too much..
Found and fixed the evaluation and generation errors in the negotiation arena examples.
- **Termination Fix**: Updated `ParallelSotopiaEnv` to pass the `env` instance to evaluators. Modified `RuleBasedTerminatedEvaluator` to count active agents using `env.agents` instead of relying solely on message history, which caused early termination in the first turn.
- **LiteLLM Support**: Updated `generate.py` to handle OpenAI schema limitations. Added `_fix_schema` to convert `prefixItems` (tuples) to `items` (arrays) and set `strict=False` to support dynamic dictionary outputs (Evaluator maps) while preventing `BadRequestError`.
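
As a rough illustration of the `prefixItems` to `items` conversion, a schema rewrite along these lines is one possibility. The function name `fix_schema` and the choice to merge positional types into an `anyOf` are assumptions, not the PR's actual `_fix_schema`.

```python
from typing import Any


def fix_schema(node: Any) -> Any:
    """Recursively rewrite tuple-style `prefixItems` into a plain `items` schema."""
    if isinstance(node, list):
        return [fix_schema(child) for child in node]
    if not isinstance(node, dict):
        return node

    fixed = {key: fix_schema(value) for key, value in node.items()}
    prefix = fixed.get("prefixItems")
    if isinstance(prefix, list):
        del fixed["prefixItems"]
        if prefix and "items" not in fixed:
            # Assumption: collapse positional types into one union.
            # This deliberately gives up per-position validation.
            fixed["items"] = prefix[0] if len(prefix) == 1 else {"anyOf": prefix}
    return fixed


# Roughly what Pydantic emits for a field typed `tuple[str, int]`.
schema = {
    "type": "object",
    "properties": {
        "pair": {
            "type": "array",
            "prefixItems": [{"type": "string"}, {"type": "integer"}],
        }
    },
}
converted = fix_schema(schema)
```

The input dictionary is not mutated, so the original schema remains available for local validation after the model responds.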
Codecov Report ❌

| Coverage Diff | main | #343 | +/- |
|---|---|---|---|
| Coverage | 74.80% | 71.95% | -2.85% |
| Files | 72 | 73 | +1 |
| Lines | 4827 | 5121 | +294 |
| Hits | 3611 | 3685 | +74 |
| Misses | 1216 | 1436 | +220 |

... and 1 file with indirect coverage changes
Also, in main, the default value of `include_background_observations` is a bit misleading, so this changes the default to False.
This PR is being reviewed by Cursor Bugbot
    secrets = ""

    # Check if agent is a werewolf
    is_werewolf = env_profile.agent_goals[idx] == "Werewolf"
Werewolf role check compares goal to wrong string
High Severity
The is_werewolf check compares env_profile.agent_goals[idx] to the literal string "Werewolf", but agent_goals contains goal descriptions like "Deceive others, avoid detection, and eliminate villagers.", not role names. The werewolf_goal_str variable defined earlier at line 417 has the correct goal text and is used correctly at line 421, but line 432 doesn't use it. This means is_werewolf is always False, and werewolf agents never receive their partner information in the secrets variable, breaking a core game mechanic.
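
A small sketch of the mismatch described above, using stand-in data. The villager goal text and the helper function names are hypothetical; only `werewolf_goal_str` and the comparison against `"Werewolf"` come from the review.

```python
# Goal text quoted in the review; agent_goals holds descriptions, not role names.
werewolf_goal_str = "Deceive others, avoid detection, and eliminate villagers."
agent_goals = [
    "Find and vote out the werewolves.",  # hypothetical villager goal
    werewolf_goal_str,
]


def is_werewolf_buggy(idx: int) -> bool:
    # Compares a goal description to a role name, so it can never match.
    return agent_goals[idx] == "Werewolf"


def is_werewolf_fixed(idx: int) -> bool:
    # Compares against the same goal string used to assign the role.
    return agent_goals[idx] == werewolf_goal_str


buggy_flags = [is_werewolf_buggy(i) for i in range(len(agent_goals))]
fixed_flags = [is_werewolf_fixed(i) for i in range(len(agent_goals))]
```

Looking the role up in an explicit role mapping, where one is available, would avoid depending on goal text at all.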
    # log elimination
    _gen_logger.info(
        f"{eliminated} was voted out! They were a {self.agent_to_role[eliminated]}."
    )
Eliminated variable used outside defining conditional block
Low Severity
The eliminated variable is assigned inside the if vote_counts: block at line 293, but the logging statements at lines 302-304 that reference eliminated are outside that block (within the outer if votes: block). While current logic makes vote_counts always non-empty when votes is non-empty, this code structure could cause a NameError if the logic is modified or edge cases arise.
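
One way to make the scoping robust is to return early so that every use of `eliminated` sits after its assignment. The sketch below is an assumption about shape only: `resolve_votes`, the logger name, and the tie-breaking behaviour are all hypothetical.

```python
import logging
from collections import Counter
from typing import Dict, Optional

_gen_logger = logging.getLogger("generation")  # logger name is hypothetical


def resolve_votes(votes: Dict[str, str], agent_to_role: Dict[str, str]) -> Optional[str]:
    """Return the agent voted out, or None when there is nothing to resolve."""
    if not votes:
        return None

    vote_counts = Counter(votes.values())
    if not vote_counts:
        return None

    # Assumption: ties go to the candidate counted first.
    eliminated, _ = vote_counts.most_common(1)[0]

    # The log call can no longer run with `eliminated` unbound.
    _gen_logger.info(
        f"{eliminated} was voted out! They were a {agent_to_role[eliminated]}."
    )
    return eliminated


roles = {"Alice": "Villager", "Bob": "Villager", "Carol": "Werewolf"}
outcome = resolve_votes({"Alice": "Carol", "Bob": "Carol", "Carol": "Alice"}, roles)
```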
    complied_actions = self._process_incoming_actions(actions)

    # Sync evaluation (not refactored to helper as it's sync vs async)
Sync step doesn't pass env to evaluators
Medium Severity
The synchronous step method calls evaluators without passing env=self to kwargs, while the async _run_evaluators method correctly passes env=self. Evaluators like RuleBasedTerminatedEvaluator and SocialGameEndEvaluator check kwargs.get("env") and fall back to different behavior when it's None. This inconsistency means sync and async code paths may produce different termination decisions.
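
A possible way to keep the two paths consistent is to build the evaluator keyword arguments in one place. Everything in this sketch is a toy stand-in: `ToyEnv`, `ToyTerminatedEvaluator`, and `_evaluator_kwargs` are hypothetical names, and the termination rule is simplified.

```python
import asyncio


class ToyTerminatedEvaluator:
    """Hypothetical stand-in for an evaluator such as RuleBasedTerminatedEvaluator."""

    def __call__(self, turn_number, messages, **kwargs):
        env = kwargs.get("env")
        if env is not None:
            active = len(env.agents)  # preferred: ask the environment
        else:
            active = len({sender for sender, _ in messages})  # fallback: infer from history
        return active < 2  # True means "terminate"

    async def __acall__(self, turn_number, messages, **kwargs):
        return self(turn_number, messages, **kwargs)


class ToyEnv:
    def __init__(self, agents, evaluators):
        self.agents = list(agents)
        self.evaluators = list(evaluators)
        self.turn_number = 0
        self.inbox = []  # (sender, text) pairs

    def _evaluator_kwargs(self):
        # Single source of truth, so the sync and async paths cannot drift apart.
        return {"turn_number": self.turn_number, "messages": self.inbox, "env": self}

    def evaluate_sync(self):
        return [ev(**self._evaluator_kwargs()) for ev in self.evaluators]

    async def evaluate_async(self):
        kwargs = self._evaluator_kwargs()
        return list(await asyncio.gather(*(ev.__acall__(**kwargs) for ev in self.evaluators)))


env = ToyEnv(["Alice", "Bob"], [ToyTerminatedEvaluator()])
env.inbox.append(("Alice", "hello"))  # first turn: only one agent has spoken so far
sync_result = env.evaluate_sync()
async_result = asyncio.run(env.evaluate_async())
```

Without `env`, the toy evaluator sees a single speaker in the history and terminates on the first turn, which is the kind of divergence the comment warns about.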
        agents=agents,
        omniscient=omniscient,
        include_background_observations=include_background_observations,
    )
lite parameter not forwarded to parent reset
Medium Severity
The SocialDeductionGame.reset() method accepts a lite parameter in its signature at line 270, but doesn't forward it to super().reset(). The parent ParallelSotopiaEnv.reset() uses lite to clear agent backgrounds in lite mode. Since the parameter is silently dropped, calling reset(lite=True) on a SocialDeductionGame instance has no effect, and lite mode won't work for social deduction games.
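
The forwarding fix can be sketched with simplified stand-in classes. The signatures below are assumptions and much shorter than the real `reset()` methods; only the idea of passing `lite` through to `super().reset()` is taken from the review.

```python
class ToyParentEnv:
    """Simplified, hypothetical stand-in for ParallelSotopiaEnv."""

    def reset(self, agents, omniscient=False, lite=False):
        # In lite mode the parent clears agent backgrounds.
        self.backgrounds = {
            name: "" if lite else f"{name}'s background" for name in agents
        }
        return self.backgrounds


class DroppingGame(ToyParentEnv):
    def reset(self, agents, omniscient=False, lite=False):
        # Bug pattern: `lite` is accepted but silently dropped.
        return super().reset(agents=agents, omniscient=omniscient)


class ForwardingGame(ToyParentEnv):
    def reset(self, agents, omniscient=False, lite=False):
        # Fix pattern: every accepted parameter is forwarded to the parent.
        return super().reset(agents=agents, omniscient=omniscient, lite=lite)


dropped = DroppingGame().reset(["Alice"], lite=True)
forwarded = ForwardingGame().reset(["Alice"], lite=True)
```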
    if instruction:
        base_obs[agent_name].action_instruction = instruction

    return base_obs
Message buffer not cleared on environment reset
Medium Severity
The agent_message_buffer is initialized in __init__ but is never cleared in the reset() method. When a SocialDeductionGame instance is reused by calling reset() to start a new game, messages from the previous game remain in the buffer and will be delivered to agents in the new game, causing incorrect game state and potential information leakage between games.
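
A minimal sketch of clearing per-game state in `reset()`. The buffer type (a `defaultdict` of lists) and the `recv_message` signature are assumptions made for illustration; only the attribute name comes from the review.

```python
from collections import defaultdict


class ToyGame:
    """Hypothetical stand-in for SocialDeductionGame's message buffering."""

    def __init__(self):
        self.agent_message_buffer = defaultdict(list)

    def recv_message(self, sender: str, recipient: str, text: str) -> None:
        # Queue the message until the recipient's next observation is built.
        self.agent_message_buffer[recipient].append(f"{sender}: {text}")

    def reset(self) -> None:
        # Drop anything left over from the previous game before starting a new one.
        self.agent_message_buffer.clear()


game = ToyGame()
game.recv_message("Alice", "Bob", "I think Carol is the werewolf.")
pending_before = dict(game.agent_message_buffer)
game.reset()
pending_after = dict(game.agent_message_buffer)
```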
    if instruction:
        base_obs[agent_name].action_instruction = instruction

    return base_obs
Round-robin index not reset between games
Medium Severity
The _round_robin_idx counter is lazily initialized in _update_action_mask() and reset during state transitions in _perform_transition_state(), but it's not reset in reset(). When reusing a SocialDeductionGame instance, the round-robin index from the previous game persists, causing the first turn of the new game to start at the wrong agent instead of the first eligible agent.
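
The effect of a stale index can be shown with a toy turn scheduler. `ToyTurnOrder` and `next_speaker` are hypothetical; only the `_round_robin_idx` attribute name and the idea of resetting it in `reset()` come from the review.

```python
class ToyTurnOrder:
    """Hypothetical round-robin scheduler that skips ineligible agents."""

    def __init__(self, agents):
        self.agents = list(agents)
        self._round_robin_idx = 0

    def next_speaker(self, eligible):
        # Walk the ring at most once, returning the first eligible agent.
        for _ in range(len(self.agents)):
            candidate = self.agents[self._round_robin_idx % len(self.agents)]
            self._round_robin_idx += 1
            if candidate in eligible:
                return candidate
        raise ValueError("no eligible agent")

    def reset(self):
        # Start every new game from the first agent again.
        self._round_robin_idx = 0


order = ToyTurnOrder(["Alice", "Bob", "Carol"])
everyone = {"Alice", "Bob", "Carol"}
first_game = [order.next_speaker(everyone), order.next_speaker(everyone)]
order.reset()
second_game_first = order.next_speaker(everyone)
```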
    .replace(
        "{goal}",
        role_goal,  # Also replace the goal here
    )
Unreplaced {secret} placeholder in prompt template
Medium Severity
The SOCIAL_GAME_PROMPT_TEMPLATE contains {secret} (singular) on line 27, but create_agents() adds and replaces {secrets} (plural). The template modification on line 413 adds {secrets} after {goal}, and line 444 replaces {secrets}, but the original {secret} placeholder is never replaced. This causes the literal text {secret} to appear in prompts sent to the LLM, which could confuse the model.
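
A leftover-placeholder check is one way to catch this kind of singular/plural mismatch before a prompt reaches the model. The abbreviated template and the helper `unreplaced_placeholders` are hypothetical; the real `SOCIAL_GAME_PROMPT_TEMPLATE` is longer.

```python
import re

# Abbreviated, hypothetical version of the template; only the placeholders matter here.
TEMPLATE = "Goal: {goal}\nSecret: {secret}\n"


def unreplaced_placeholders(prompt: str) -> list:
    """Return the names of any `{name}` placeholders still present in the prompt."""
    return sorted(set(re.findall(r"\{([A-Za-z_]+)\}", prompt)))


# Bug pattern: replaces {secrets} (plural), which the template does not contain.
buggy_prompt = TEMPLATE.replace("{goal}", "Survive.").replace(
    "{secrets}", "Your partner is Bob."
)

# Fix pattern: replace the placeholder name the template actually uses.
fixed_prompt = TEMPLATE.replace("{goal}", "Survive.").replace(
    "{secret}", "Your partner is Bob."
)

leftover_buggy = unreplaced_placeholders(buggy_prompt)
leftover_fixed = unreplaced_placeholders(fixed_prompt)
```

Running such a check in a unit test over each configured game would surface the literal `{secret}` text without needing an LLM call.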
An optimized version of #333
Note
Introduces an experimental framework for multi-agent social deduction games with structured phases, roles, and private information.
- `SocialGame` and `SocialDeductionGame` envs (`sotopia/envs/social_game.py`) with FSM state management, action masks (round-robin/simultaneous), per-state visibility (public/team/private), per-agent observations, and environment notifications
- `ActionHandler` hook for game-specific logic; example `WerewolfActionHandler` and `WerewolfEnv` implement voting/kill/inspect/witch logic with end conditions via `SocialGameEndEvaluator`
- Adds `SocialGameEndEvaluator` and passes `env` to evaluators; refactors `ParallelSotopiaEnv` (turn markers option, hidden backgrounds, action processing, async evaluator runner)
- `LLMAgent` supports `custom_template` and strict constraints; `Observation` gains `action_instruction`; `ScriptBackground` supports `hide_unknown`; trims agent names in `BaseAgent`
- Adds `game_metadata`; minor server/test adjustments; bumps OpenAI dep range; expands .gitignore
- `examples/experimental/werewolves/*` and configs for Spyfall/Undercover

Written by Cursor Bugbot for commit 4f1f9ae. This will update automatically on new commits.