feat: implement chatroom for multi-llm conversation#162
Open
dolaameng wants to merge 24 commits into
Open
Conversation
…om framework - Implemented ChatRoom context manager for multi-agent perspective-aware message routing. - Added identity awareness, system prompt enrichment, and automatic roster injection. - Added support for private channels, visible_to restrictions, and interleaved histories. - Refactored game_werewolf_chatroom.py to dynamically scale up to 7 players. - Added unit tests verifying multi-directional privacy and sealed bid isolation.
…oting bugs - Fixed cache_id in runs.py and slug in serialization.py to resolve the actual model version identifier instead of participant name. - Fixed werewolf game vote extraction robustness in game_werewolf_chatroom.py to prevent false-positives on mentioned names. - Fixed test_corporate_takeover_chatroom assertion types in test_chatroom.py.
…fix streaming typeerrors - Switched game_werewolf_chatroom.py to use structured WerewolfVote outputs. - Enabled kbench.config.enable_interactive_mode() and player.stream_responses = True for live onstream rendering. - Added survival/existence validation to voting loops to prevent crashes during ties or votes on dead players. - Fixed panel.py new_chunk streaming TypeError by extracting string content from LLMResponse chunk objects. - Updated tests/test_chatroom.py werewolf mock responses to return structured JSON.
…update design doc - Remove 4-player backward compat; run_werewolf now strictly requires 7 players. - Upgrade Alice/Bob wolf prompts with double-bluff and distancing strategies. - Update test_werewolf_chatroom to simulate a full 2-round, 7-player game. - Add section 9.5 to design.md for Panel streaming bug fix. - Remove example-specific structured voting details from design doc.
…vatars - Replace fuzzy name matching with explicit eligible name lists in vote prompts. - Use neutral role-agnostic avatars to avoid spoiling werewolf identities.
bc7701a to
633ced9
Compare
ca3ee6b to
2e84931
Compare
e8e876e to
9b3ec0e
Compare
develra
approved these changes
May 29, 2026
Contributor
develra
left a comment
There was a problem hiding this comment.
Reviewed at a high level to the best of my ability in a time-boxed way (30 minutes) - mostly looking at the examples and tests. LGTM - neat feature, but def might have missed some more subtle issues.
dolaameng
commented
May 29, 2026
| # Uses role="user" because LLM APIs require user/assistant alternation. | ||
| self._narrator = actors.Actor(name=name, role="user", avatar="📢") | ||
|
|
||
| def add_participant( |
Collaborator
Author
There was a problem hiding this comment.
Do we still need to clone the llm?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
ChatRoomis a shared conversation context forkaggle-benchmarksthat letsmultiple LLMs converse with full awareness of each other's identities and roles.
Key capabilities:
assistantand peers' messages as attributedusermessages.multiple participants without identity collisions.
via
visible_to, and multi-turn private conversations are supported viaprivate_channel.participate alongside LLMs as
Actorinstances.Why
Multi-agent evaluation (debate, negotiation, social deduction, cooperative games)
is an increasingly important dimension for frontier LLM benchmarking, but the
existing
ChatAPI doesn't support it natively:LLMs are unaware of each other. Each agent has an isolated chat context.
The user manually forwards messages between them, stripping and re-injecting
roles. LLMs have no idea they are talking to another LLM.
Boilerplate is high. Existing multi-agent benchmarks in this repo
(dungeon_adventure.py,
game_tic_tac_toe.py,
pgg.py)
each re-implement ~40–160 lines of manual orchestration.
No conversation memory. Some benchmarks create a brand new
Chatevery turn, leaving LLMs with zero memory of previous turns.
How
Core Abstraction
A
ChatRoomis a shared conversation space. Users register participants viaadd_participant(), then drive the conversation inside awith room:blockusing two primitives:
room.post(msg)participant.reply()/actor.say(msg)Participant Registration (
add_participant)room.add_participant(actor, *, name=, avatar=, system_prompt=):system prompt so every agent knows who else is in the room.
automatically visible to all other participants.
LLMChatparticipants, creates an independentclone so the same LLM can be reused for multiple participants.
Perspective Projection
The core mechanism that makes multi-agent conversations work. When an LLM calls
reply(), the room:personal prompt.
viewer's own messages have role
assistantand all peer messages have roleuserwith name prefixes (e.g.,[Bob]: ...).the ground-truth log.
Original messages are never mutated — projection always creates new objects.
Private Information
visible_to— anyroom.post()can be restricted to a subset ofparticipants. Hidden messages are filtered out during perspective projection.
private_channel— creates a childChatRoomfor multi-turn privateconversations (e.g., werewolf night phase). Members retain context across
both the main room and the channel; non-members never see channel messages.
Code-Driven Participants (
Actor)Non-LLM participants (game engines, moderators) participate as
Actorinstancesusing
actor.say(msg). Other participants see attributed messages just like LLMmessages (e.g.,
[Game]: Board: X|O|_).Integration with Existing Framework
ChatRoomintegrates with the existingcontexts.enter()system, so
chats.get_current_chat()returns the active room.ChatRoomauto-registers it as a nested stepin the parent chat, so the full transcript renders in the Panel UI without any
UI modifications.
reply(schema=...)works for typed responses (e.g.,game moves, votes).
underlying model identity, not participant names.
Example: Before & After
Tic-Tac-Toe
Before — fresh chat each turn, zero memory:
After — full history, attributed turns:
Key Files
ChatRoomclass, perspective projection, visibility filteringLLMChat.reply()implementationActor.say()implementationOpen Questions
reply()visibility — shouldreply()supportvisible_tofor privateLLM-generated responses? Deferred for now; all
reply()output is public.run.json(ground-truth log vs. per-participant projections vs. both).