A Survey on Large Language Model-Based Game Agents (ACM CSUR)


🔥 Must-read papers for LLM-based game agents.

📘 Our survey has been accepted by ACM Computing Surveys (CSUR), and we are preparing the camera-ready version. Feel free to reach out if you find a missing reference.

💫 We update this GitHub list weekly.

📝 If you discover any papers that are suitable but not yet included, please open an issue or submit a pull request.

Browse by Genre

Browse by Mechanism


By Genre

minecraft

  • [2026/05] GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents arXiv [paper] #minecraft #training #vlm
  • [2026/05] Ratchet: A Minimal Hygiene Recipe for Self-Evolving LLM Agents arXiv [paper] #minecraft #self-improvement
  • [2026/04] Experience Transfer for Multimodal LLM Agents in Minecraft Game arXiv [paper] #minecraft
  • [2026/04] Gated Coordination for Efficient Multi-Agent Collaboration in Minecraft Game arXiv [paper] #minecraft #memory #multi-agent #vlm
  • [2026/03] BLOCK: An Open-Source Bi-Stage MLLM Character-to-Skin Pipeline for Minecraft arXiv [paper] #minecraft #vlm
  • [2026/02] Requesting Expert Reasoning: Augmenting LLM Agents with Learned Collaborative Intervention arXiv [paper] #minecraft
  • [2026/01] MineNPC-Task: Task Suite for Memory-Aware Minecraft Agents arXiv [paper] #minecraft
  • [2025/12] Synergizing Code Coverage and Gameplay Intent: Coverage-Aware Game Playtesting with LLM-Guided Reinforcement Learning arXiv [paper] #minecraft #training
  • [2025/11] Knowledge Graph-enhanced Large Language Model for Incremental Game PlayTesting IEICE Transactions on Information and Systems 2025 [paper] #minecraft #memory
  • [2025/09] PillagerBench: Benchmarking LLM-Based Agents in Competitive Minecraft Team Environments 2025 IEEE Conference on Games (CoG) 2025 [paper] #minecraft #multi-agent #self-improvement
  • [2025/09] Experience-based Knowledge Correction for Robust Planning in Minecraft ICLR 2026 Poster [paper] #minecraft #planning
  • [2025/08] CausalMACE: Causality Empowered Multi-Agents in Minecraft Cooperative Tasks Findings of EMNLP 2025 [paper] #minecraft #planning #multi-agent
  • [2025/08] Vistawise: Building Cost-effective Agent with Cross-modal Knowledge Graph for Minecraft EMNLP 2025 [paper] #minecraft #memory #tool-use
  • [2025/07] VoyagerVision: Investigating the Role of Multi-modal Information for Open-ended Learning Systems arXiv [paper] #minecraft #vlm
  • [2025/07] Referential ambiguity and clarification requests: comparing human and LLM behaviour Proceedings of the Eighth Workshop on Computational Models of Reference, Anaphora and Coreference 2025 [paper] #minecraft
  • [2025/06] Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts arXiv [paper][code] #minecraft #planning
  • [2025/06] Matrix-Game: Interactive World Foundation Model arXiv [paper][code] #minecraft
  • [2025/06] GuessBench: Sensemaking Multimodal Creativity in the Wild arXiv [paper] #minecraft #training #vlm
  • [2025/05] Don’t Just Follow MLLM Plans: Robust and Efficient Planning for Open-World Agents arXiv [paper] #minecraft #planning #vlm
  • [2025/05] Knowledge Retrieval in LLM Gaming: A Shift from Entity-Centric to Goal-Oriented Graphs Knowledge-Based Systems 2025 [paper] #minecraft #planning #memory
  • [2025/05] BeliefNest: A Joint Action Simulator for Embodied Agents with Theory of Mind arXiv [paper] #minecraft
  • [2025/05] MindForge: Empowering Embodied Agents with Theory of Mind for Lifelong Cultural Learning NeurIPS 2025 poster [paper] #minecraft #training
  • [2025/05] WALL-E: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents NeurIPS 2025 poster [paper] #minecraft #planning #world-model #prompting
  • [2025/04] Collaborating Action by Action: A Multi-agent LLM Framework for Embodied Reasoning arXiv [paper][code] #minecraft #multi-agent #training
  • [2025/04] WALL-E 2.0: World Alignment by NeuroSymbolic Learning Improves World Model-based LLM Agents arXiv [paper][code] #minecraft #planning #world-model #prompting
  • [2025/03] Uncertainty in Action: Confidence Elicitation in Embodied Agents arXiv [paper] #minecraft
  • [2025/03] Parallelized Planning-Acting for Efficient LLM-based Multi-Agent Systems arXiv [paper] #minecraft #planning #multi-agent #tool-use
  • [2025/03] NeSyC: A Neuro-symbolic Continual Learner For Complex Embodied Tasks In Open Domains ICLR 2025 Poster [paper] #minecraft
  • [2025/03] Word2Minecraft: Generating 3D Game Levels through Large Language Models arXiv [paper][code] #minecraft #generation
  • [2025/03] Plancraft: an evaluation dataset for planning with LLM agents COLM 2025 [paper] #minecraft #planning #memory #tool-use
  • [2025/02] GATE: Graph-based Adaptive Tool Evolution Across Diverse Tasks arXiv [paper][code] #minecraft
  • [2025/02] Optimus-2: Multimodal World Model for Open-World Minecraft Agents CVPR 2025 [paper][code] #minecraft #planning #world-model #vlm
  • [2025/01] LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence ICML 2025 poster [paper] #minecraft #training
  • [2024/12] TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft arXiv [paper][code] #minecraft #multi-agent #training #vlm
  • [2024/11] MrSteve: Instruction-Following Agents with What-Where-When Memory ICLR 2025 [paper][code] #minecraft #memory
  • [2024/10] WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents arXiv [paper][code] #minecraft #planning #world-model
  • [2024/10] ADAM: An Embodied Causal Agent in Open-World Environments ICLR 2025 [paper][code] #minecraft #planning
  • [2024/09] MrSteve: Instruction-Following Agents in Minecraft with What-Where-When Memory ICLR 2025 Poster [paper] #minecraft #memory
  • [2024/08] Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-horizon Tasks NeurIPS 2024 [paper][code] #minecraft #planning #memory #prompting
  • [2024/07] Odyssey: Empowering Agents with Open-World Skills IJCAI 2024 [paper][code] #minecraft #planning #tool-use
  • [2024/07] OmniJARVIS: Omni-Modal Open-World Agents in Minecraft NeurIPS 2024 [paper][code] #minecraft #training #vlm
  • [2024/06] VillagerAgent: A Graph-Based Multi-Agent Framework for Coordinating Complex Task Dependencies in Minecraft Findings of ACL 2024 [paper][code] #minecraft #planning #multi-agent
  • [2024/03] MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control arXiv [paper][code] #minecraft
  • [2024/03] MineLand: Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs arXiv [paper][code] #minecraft #multi-agent #vlm
  • [2024/03] Hierarchical Auto-Organizing System for Open-Ended Multi-Agent Navigation (HAS) ICLR 2024 Workshop [paper] #minecraft #planning #multi-agent #vlm
  • [2024/02] RL-GPT: Integrating Reinforcement Learning and Code-as-policy NeurIPS 2024 oral [paper] #minecraft #planning #tool-use #training
  • [2024/01] ReGAL: Refactoring Programs to Discover Generalizable Abstractions ICML 2024 [paper][code] #minecraft
  • [2023/12] MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception CVPR 2024 [paper][code] #minecraft #planning #vlm
  • [2023/12] Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft CVPR 2024 [paper] #minecraft #training
  • [2023/12] Creative Agents: Empowering Agents with Imagination for Creative Tasks UAI 2023 [paper][code] #minecraft #planning
  • [2023/11] JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models TPAMI 2023 [paper][code] #minecraft #planning #memory
  • [2023/11] See and Think: Embodied Agent in Virtual Environment ECCV 2024 [paper][code] #minecraft #memory
  • [2023/10] LLaMA Rider: Spurring Large Language Models to Explore the Open World NAACL 2024 [paper][code] #minecraft #planning #training
  • [2023/10] MCU: A Task-centric Framework for Open-ended Agent Evaluation in Minecraft ICML 2025 [paper][code] #minecraft
  • [2023/10] Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds ICLR 2024 [paper] #minecraft #vlm
  • [2023/05] Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory arXiv [paper] #minecraft #training
  • [2023/05] VOYAGER: An Open-Ended Embodied Agent with Large Language Models FMDM@NeurIPS2023 [paper][code] #minecraft #tool-use #training
  • [2023/03] Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks FMDM@NeurIPS2023 [paper][code] #minecraft #planning #training
  • [2023/02] Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents NeurIPS 2023 [paper][code] #minecraft #planning #prompting
  • [2022/07] Craft an Iron Sword: Dynamically Generating Interactive Game Characters by Prompting Large Language Models Tuned on Code Wordplay@ACL 2022 [paper] #minecraft

text-adventure

  • [2026/05] T$^2$PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning arXiv [paper][code] #text-adventure #training
  • [2026/05] PRISM: Perception Reasoning Interleaved for Sequential Decision Making arXiv [paper] #text-adventure #vlm
  • [2026/05] From History to State: Constant-Context Skill Learning for LLM Agents arXiv [paper] #text-adventure #planning #training
  • [2026/05] StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction arXiv [paper] #text-adventure #training
  • [2026/05] AEM: Adaptive Entropy Modulation for Multi-Turn Agentic Reinforcement Learning arXiv [paper] #text-adventure #training
  • [2026/05] Belief Memory: Agent Memory Under Partial Observability arXiv [paper] #text-adventure #memory
  • [2026/05] SkillLens: Adaptive Multi-Granularity Skill Reuse for Cost-Efficient LLM Agents arXiv [paper] #text-adventure
  • [2026/05] Evolving-RL: End-to-End Optimization of Experience-Driven Self-Evolving Capability within Agents arXiv [paper] #text-adventure #training #self-improvement
  • [2026/05] Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning arXiv [paper] #text-adventure #tool-use #training
  • [2026/05] SkillMaster: Toward Autonomous Skill Mastery in LLM Agents arXiv [paper] #text-adventure #training
  • [2026/05] PriorZero: Bridging Language Priors and World Models for Decision Making arXiv [paper][code] #text-adventure #planning #world-model #training
  • [2026/05] Skills on the Fly: Test-Time Adaptive Skill Synthesis for LLM Agents arXiv [paper] #text-adventure #memory #tool-use
  • [2026/05] What and When to Distill: Selective Hindsight Distillation for Multi-Turn Agents arXiv [paper] #text-adventure #training
  • [2026/05] APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents arXiv [paper] #text-adventure #planning #self-improvement
  • [2026/05] SkillGraph: Skill-Augmented Reinforcement Learning for Agents via Evolving Skill Graphs arXiv [paper] #text-adventure #memory #tool-use #training
  • [2026/05] Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning arXiv [paper] #text-adventure #training
  • [2026/05] Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL arXiv [paper][code] #text-adventure #training
  • [2026/05] R2V Agent: Teaching SLMs When to Ask for Help arXiv [paper] #text-adventure #training
  • [2026/05] SkillOps: Managing LLM Agent Skill Libraries as Self-Maintaining Software Ecosystems arXiv [paper] #text-adventure #planning #memory #tool-use
  • [2026/04] Aligning Progress and Feasibility: A Neuro-Symbolic Dual Memory Framework for Long-Horizon LLM Agents arXiv [paper] #text-adventure #planning #memory
  • [2026/04] Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents arXiv [paper][code] #text-adventure #training
  • [2026/04] DORA Explorer: Improving the Exploration Ability of LLMs Without Training arXiv [paper] #text-adventure #planning #prompting
  • [2026/04] From Actions to Understanding: Conformal Interpretability of Temporal Concepts in LLM Agents arXiv [paper] #text-adventure #planning
  • [2026/04] Complete Cyclic Subtask Graphs for Tool-Using LLM Agents: Flexibility, Cost, and Bottlenecks in Multi-Agent Workflows arXiv [paper] #text-adventure #planning #memory #multi-agent
  • [2026/04] ReDAct: Uncertainty-Aware Deferral for LLM Agents arXiv [paper] #text-adventure
  • [2026/04] GraSP: Graph-Structured Skill Compositions for LLM Agents arXiv [paper] #text-adventure #planning #memory #self-improvement
  • [2026/04] DPEPO: Diverse Parallel Exploration Policy Optimization for LLM-based Agents arXiv [paper][code] #text-adventure #training
  • [2026/03] Hindsight Credit Assignment for Long-Horizon LLM Agents arXiv [paper] #text-adventure #training
  • [2026/03] How Clued up are LLMs? Evaluating Multi-Step Deductive Reasoning in a Text-Based Game Environment arXiv [paper] #text-adventure #multi-agent #training
  • [2026/03] RPMS: Enhancing LLM-Based Embodied Planning through Rule-Augmented Memory Synergy arXiv [paper] #text-adventure #planning #memory
  • [2026/03] RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback arXiv [paper] #text-adventure #memory #training #self-improvement
  • [2026/03] Reward Prediction with Factorized World States arXiv [paper] #text-adventure #planning #prompting
  • [2026/02] MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents arXiv [paper] #text-adventure #self-improvement
  • [2026/02] Active Epistemic Control for Query-Efficient Verified Planning arXiv [paper] #text-adventure #planning
  • [2026/02] Think Fast and Slow: Step-Level Cognitive Depth Adaptation for LLM Agents arXiv [paper] #text-adventure #planning #training
  • [2026/02] TAPE: Tool-Guided Adaptive Planning and Constrained Execution in Language Model Agents arXiv [paper] #text-adventure #planning
  • [2026/02] CWM: Contrastive World Models for Action Feasibility Learning in Embodied Agent Pipelines arXiv [paper] #text-adventure #planning #world-model #training
  • [2026/02] Reinforcement World Model Learning for LLM-based Agents arXiv [paper] #text-adventure #world-model
  • [2026/02] SELAUR: Self Evolving LLM Agent via Uncertainty-aware Rewards arXiv [paper] #text-adventure #training
  • [2026/02] SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning arXiv [paper][code] #text-adventure #memory #tool-use #training
  • [2026/01] Learning How to Remember: A Meta-Cognitive Management Method for Structured and Transferable Agent Memory arXiv [paper] #text-adventure
  • [2026/01] Beyond Entangled Planning: Task-Decoupled Planning for Long-Horizon Agents arXiv [paper] #text-adventure #planning #prompting
  • [2026/01] Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents arXiv [paper] #text-adventure #planning #training
  • [2026/01] Just-In-Time Reinforcement Learning: Continual Learning in LLM Agents Without Gradient Updates arXiv [paper][code] #text-adventure #training
  • [2025/12] Learning Hierarchical Procedural Memory for LLM Agents through Bayesian Selection and Contrastive Refinement Proceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems 2025 [paper] #text-adventure #memory
  • [2025/12] GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators arXiv [paper] #text-adventure
  • [2025/11] SkillGen: Learning Domain Skills for In-Context Sequential Decision Making arXiv [paper] #text-adventure #prompting
  • [2025/10] Fine-tuning with RAG for Improving LLM Learning of New Skills arXiv [paper] #text-adventure #planning #memory #training
  • [2025/10] GenQuest: An LLM-based Text Adventure Game for Language Learners arXiv [paper] #text-adventure #vlm #generation
  • [2025/10] Constrained Natural Language Action Planning for Resilient Embodied Systems arXiv [paper] #text-adventure #planning #prompting
  • [2025/10] The Cognitive Bandwidth Bottleneck: Shifting Long-Horizon Agent from Planning with Actions to Planning with Schemas arXiv [paper] #text-adventure #planning
  • [2025/10] Graph-Enhanced Policy Optimization in LLM Agent Training arXiv [paper] #text-adventure #planning #training
  • [2025/10] SALT: Step-level Advantage Assignment for Long-horizon Agents via Trajectory Graph arXiv [paper] #text-adventure #training
  • [2025/09] World Model Implanting for Test-time Adaptation of Embodied Agents ICML 2025 poster [paper] #text-adventure #memory #world-model #prompting
  • [2025/09] Meta-Policy Reflexion: Reusable Reflective Memory and Rule Admissibility for Resource-Efficient LLM Agent arXiv [paper] #text-adventure #planning #multi-agent #self-improvement
  • [2025/09] Code Driven Planning with Domain-Adaptive Critic arXiv [paper] #text-adventure #planning #self-improvement
  • [2025/09] Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents arXiv [paper] #text-adventure #training
  • [2025/09] Reflect before Act: Proactive Error Correction in Language Models arXiv [paper] #text-adventure
  • [2025/09] Reward Is Enough: LLMs Are In-Context Reinforcement Learners ICLR 2026 Poster [paper] #text-adventure #training #self-improvement
  • [2025/09] Natural Language PDDL (NL-PDDL) for Open-world Goal-oriented Commonsense Regression Planning in Embodied AI ICLR 2026 Poster [paper] #text-adventure #planning #vlm
  • [2025/09] Test-Time Mixture of World Models for Embodied Agents in Dynamic Environments ICLR 2026 Poster [paper] #text-adventure #prompting
  • [2025/09] Spinning Straw into Gold: Relabeling LLM Agent Trajectories in Hindsight for Successful Demonstrations ICLR 2026 Poster [paper] #text-adventure #training
  • [2025/09] Dual-Scale World Memory for LLM Agents towards Hard-Exploration Problems ICLR 2026 Poster [paper] #text-adventure
  • [2025/09] Code Driven Planning with Domain-Adaptive Selector ICLR 2026 Poster [paper] #text-adventure #planning
  • [2025/09] Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization ICLR 2026 Poster [paper] #text-adventure #memory #training
  • [2025/09] DreamPhase: Offline Imagination and Uncertainty-Guided Planning for Large-Language-Model Agents ICLR 2026 Poster [paper] #text-adventure #planning #world-model
  • [2025/09] Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning ICLR 2026 Poster [paper] #text-adventure #tool-use #training
  • [2025/08] Enhancing Vision-Language Model Training with Reinforcement Learning in Synthetic Worlds for Real-World Success Proceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems 2025 [paper] #text-adventure #planning #training #vlm
  • [2025/07] CoEx -- Co-evolving World-model and Exploration EMNLP 2025 [paper] #text-adventure #planning #world-model
  • [2025/06] Enhancing Decision-Making of Large Language Models via Actor-Critic ICML 2025 poster [paper] #text-adventure #training
  • [2025/06] Improving LLM Agent Planning with In-Context Learning via Atomic Fact Augmentation and Lookahead Search arXiv [paper] #text-adventure #planning #world-model #training
  • [2025/06] StoryBench: A Dynamic Benchmark for Evaluating Long-Term Memory with Multi Turns arXiv [paper] #text-adventure #memory
  • [2025/06] OmniReflect: Discovering Transferable Constitutions for LLM agents via Neuro-Symbolic Reflections arXiv [paper] #text-adventure #planning #training #self-improvement
  • [2025/06] KnowMap: Efficient Knowledge-Driven Task Adaptation for LLMs arXiv [paper] #text-adventure #training
  • [2025/05] STORY2GAME: Generating (Almost) Everything in an Interactive Fiction Game arXiv [paper] #text-adventure #generation
  • [2025/05] LLM-BABYBENCH: Understanding and Evaluating Grounded Planning and Reasoning in LLMs arXiv [paper][code] #text-adventure #planning
  • [2025/05] Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning ICML 2025 poster [paper] #text-adventure #planning #training
  • [2025/05] Retrospex: Language Agent Meets Offline Reinforcement Learning Critic EMNLP 2025 [paper] #text-adventure #training
  • [2025/05] Agent-Environment Alignment via Automated Interface Generation arXiv [paper][code] #text-adventure #tool-use
  • [2025/05] Divide, Optimize, Merge: Fine-Grained LLM Agent Optimization at Scale arXiv [paper] #text-adventure
  • [2025/05] Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks NeurIPS 2025 poster [paper] #text-adventure
  • [2025/05] Learning to Play Like Humans: A Framework for LLM Adaptation in Interactive Fiction Games ACL 2025 [paper] #text-adventure
  • [2025/05] Training LLM-Based Agents with Synthetic Self-Reflected Trajectories and Partial Masking arXiv [paper] #text-adventure #prompting
  • [2025/05] SPA-RL: Reinforcing LLM Agents via Stepwise Progress Attribution arXiv [paper][code] #text-adventure #training
  • [2025/05] ActiveVOO: Value of Observation Guided Active Knowledge Acquisition for Open-World Embodied Lifted Regression Planning NeurIPS 2025 poster [paper] #text-adventure #planning #prompting #vlm
  • [2025/05] SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents NeurIPS 2025 poster [paper] #text-adventure #planning #training #self-improvement
  • [2025/04] TALES: Text Adventure Learning Environment Suite arXiv [paper] #text-adventure
  • [2025/04] Monte Carlo Planning with Large Language Model for Text-Based Game Agents ICLR 2025 Poster [paper] #text-adventure #planning #training
  • [2025/04] Group-in-Group Policy Optimization for LLM Agent Training NeurIPS 2025 poster [paper] #text-adventure #training
  • [2025/03] Haunted House: A text-based game for comparing the flexibility of mental models in humans and LLMs arXiv [paper] #text-adventure
  • [2025/03] GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks CVPR 2025 [paper] #text-adventure #planning #training #vlm
  • [2025/02] TextGames: Learning to Self-Play Text-Based Puzzle Games via Language Model Reasoning arXiv [paper] #text-adventure #self-improvement
  • [2025/02] Process Reward Models for LLM Agents: Practical Framework and Directions arXiv [paper][code] #text-adventure #training
  • [2024/12] Fine-tuning large vision-language models as decision-making agents via reinforcement learning NeurIPS 2024 [paper][code] #text-adventure #training #vlm
  • [2024/09] Discriminator-Guided Embodied Planning for LLM Agent ICLR 2025 Poster [paper] #text-adventure #planning
  • [2024/09] Better than Your Teacher: LLM Agents that learn from Privileged AI Feedback ICLR 2025 Poster [paper] #text-adventure #planning #training #self-improvement
  • [2024/07] AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents ACL 2024 [paper][code] #text-adventure #tool-use
  • [2024/07] Arigraph: Learning knowledge graph world models with episodic memory for llm agents IJCAI 2024 [paper] #text-adventure #planning #memory
  • [2024/06] Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement EMNLP 2024 [paper][code] #text-adventure #self-improvement
  • [2024/06] STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models ACL 2024 [paper][code] #text-adventure #training
  • [2024/05] Agent Planning with World Knowledge Model NeurIPS 2024 [paper][code] #text-adventure #planning #memory #world-model
  • [2024/05] THREAD: Thinking Deeper with Recursive Spawning NAACL 2024 [paper] #text-adventure #prompting
  • [2024/05] Policy Improvement using Language Feedback Models NeurIPS 2024 poster [paper] #text-adventure #training
  • [2024/05] AutoManual: Constructing Instruction Manuals by LLM Agents via Interactive Environmental Learning NeurIPS 2024 poster [paper][code] #text-adventure #planning
  • [2024/04] Learning From Failure: Integrating Negative Examples When Fine-tuning Large Language Models as Agent arXiv [paper][code] #text-adventure #tool-use #training
  • [2024/04] ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy arXiv [paper] #text-adventure #planning #training #self-improvement
  • [2024/03] KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents NAACL 2024 [paper][code] #text-adventure #planning
  • [2024/03] Language Guided Exploration for RL Agents in Text Environments NAACL 2024 [paper][code] #text-adventure #training
  • [2024/03] Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents ACL 2024 [paper][code] #text-adventure #training #self-improvement
  • [2024/03] O3D: Offline Data-driven Discovery and Distillation for Sequential Decision-Making with Large Language Models COLM [paper] #text-adventure #training
  • [2024/03] ReAct Meets ActRe: Autonomous Annotation of Agent Trajectories for Contrastive Self-Training COLM [paper] #text-adventure #planning #training #self-improvement
  • [2024/03] StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows COLM [paper] #text-adventure #planning #self-improvement
  • [2024/02] Soft Self-Consistency Improves Language Model Agents arXiv [paper][code] #text-adventure #self-improvement
  • [2024/02] Empowering Large Language Model Agents through Action Learning COLM [paper][code] #text-adventure #planning #self-improvement
  • [2023/11] ADaPT: As-Needed Decomposition and Planning with Language Models NAACL 2024 [paper][code] #text-adventure #planning
  • [2023/10] FireAct: Toward Language Agent Fine-tuning arXiv [paper][code] #text-adventure #training
  • [2023/10] Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models ICML 2024 [paper][code] #text-adventure #planning #training
  • [2023/05] SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks NeurIPS 2023 [paper][code] #text-adventure #planning
  • [2023/04] Can Large Language Models Play Text Games Well? Current State-of-the-Art and Open Questions arXiv [paper] #text-adventure #world-model
  • [2023/03] Reflexion: Language Agents with Verbal Reinforcement Learning NeurIPS 2023 [paper][code] #text-adventure #training #self-improvement
  • [2022/10] ReAct: Synergizing Reasoning and Acting in Language Models ICLR 2023 [paper][code] #text-adventure #planning #training
  • [2022/03] ScienceWorld: Is your Agent Smarter than a 5th Grader? EMNLP 2022 [paper][code] #text-adventure
  • [2020/10] ALFWorld: Aligning Text and Embodied Environments for Interactive Learning ICLR 2021 [paper][code] #text-adventure #planning
  • [2019/09] Interactive Fiction Games: A Colossal Adventure AAAI 2020 [paper][code] #text-adventure

communication

  • [2026/05] Evaluating Large Language Models in a Complex Hidden Role Game arXiv [paper] #communication #planning
  • [2026/05] QUACK: Questioning, Understanding, and Auditing Communicated Knowledge in Multimodal Social Deduction Agents arXiv [paper][code] #communication
  • [2026/04] Trust, Lies, and Long Memories: Emergent Social Dynamics and Reputation in Multi-Round Avalon with LLM Agents arXiv [paper] #communication
  • [2026/03] Enhancing Consistency of Werewolf AI through Dialogue Summarization and Persona Information arXiv [paper] #communication #role-play
  • [2026/03] Deception and Communication in Autonomous Multi-Agent Systems: An Experimental Study with Among Us Proceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems 2026 [paper] #communication #multi-agent
  • [2026/01] Hidden in Plain Text: Measuring LLM Deception Quality Against Human Baselines Using Social Deduction Games International Conference on Agents 2027 [paper] #communication #multi-agent
  • [2026/01] Multicultural Spyfall: Assessing LLMs through Dynamic Multilingual Social Deduction Game arXiv [paper] #communication
  • [2025/12] WOLF: Werewolf-based Observations for LLM Deception and Falsehoods arXiv [paper] #communication #multi-agent
  • [2025/12] Measuring Fine-Grained Negotiation Tactics of Humans and LLMs in Diplomacy arXiv [paper] #communication #training
  • [2025/11] CSP4SDG: Constraint and Information-Theory Based Role Identification in Social Deduction Games with LLM-Enhanced Inference AAAI 2025 [paper] #communication
  • [2025/11] Multi-agent Undercover Gaming: Hallucination Removal via Counterfactual Test for Multimodal Reasoning arXiv [paper][code] #communication #multi-agent
  • [2025/10] Beyond Survival: Evaluating LLMs in Social Deduction Games with Human-Aligned Strategies arXiv [paper] #communication #multi-agent #self-improvement
  • [2025/08] Democratizing Diplomacy: A Harness for Evaluating Any Large Language Model on Full-Press Diplomacy AAAI 2025 [paper] #communication #training
  • [2025/08] What to Ask Next? Probing the Imaginative Reasoning of LLMs with TurtleSoup Puzzles AAAI 2025 [paper] #communication
  • [2025/08] Ethical Considerations of Large Language Models in Game Playing arXiv [paper] #communication
  • [2025/07] CoMet: Metaphor-Driven Covert Communication for Multi-Agent Language Games ACL 2025 [paper][code] #communication #multi-agent
  • [2025/07] Strategy Adaptation in Large Language Model Werewolf Agents arXiv [paper] #communication #prompting
  • [2025/06] WereWolf-Plus: An Update of Werewolf Game setting Based on DSGBench arXiv [paper][code] #communication #multi-agent
  • [2025/06] DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy ICML 2025 poster [paper] #communication #training
  • [2025/05] Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL NeurIPS 2025 poster [paper] #communication #planning #tool-use #training
  • [2025/01] DVM: Towards Controllable LLM Agents in Social Deduction Games IEEE International Conference on Acoustics, Speech, and Signal Processing 2025 [paper] #communication #training
  • [2024/12] Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy NeurIPS 2024 [paper][code] #communication #self-improvement
  • [2024/09] Strategist: Self-improvement of LLM Decision Making via Bi-Level Tree Search ICLR 2025 Poster [paper] #communication #planning #multi-agent #training
  • [2024/06] PLAYER: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games arXiv [paper] #communication #multi-agent
  • [2024/05] Learning to Discuss Strategically: A Case Study on One Night Ultimate Werewolf NeurIPS 2024 poster [paper] #communication #training
  • [2024/05] Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy NeurIPS 2024 poster [paper] #communication #planning #multi-agent #self-improvement
  • [2024/04] Self-playing Adversarial Language Game Enhances LLM Reasoning NeurIPS 2024 [paper][code] #communication #training #self-improvement
  • [2024/03] Helmsman of the Masses? Evaluate the Opinion Leadership of Large Language Models in the Werewolf Game COLM [paper] #communication #multi-agent
  • [2024/02] Enhance Reasoning for Large Language Models in the Game Werewolf arXiv [paper] #communication #training
  • [2024/02] What if LLMs Have Different World Views: Simulating Alien Civilizations with LLM-based Agents arXiv [paper] #communication
  • [2024/02] Can Large Language Model Agents Simulate Human Trust Behaviors? NeurIPS 2024 [paper] #communication
  • [2024/02] Large Language Models Fall Short: Understanding Complex Relationships in Detective Narratives ACL 2024 [paper] #communication
  • [2023/12] Can Large Language Models Serve as Rational Players in Game Theory? A Systematic Analysis AAAI 2024 [paper] #communication
  • [2023/12] Cooperation on the Fly: Exploring Language Agents for Ad Hoc Teamwork in the Avalon Game arXiv [paper] #communication #multi-agent
  • [2023/12] Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games ACL 2024 [paper] #communication #multi-agent #prompting
  • [2023/11] War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars arXiv [paper][code] #communication #multi-agent
  • [2023/11] clembench: Systematic Evaluation of Chat-Optimized Language Models as Conversational Agents EMNLP 2023 [paper] #communication
  • [2023/10] Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game ICML 2024 [paper] #communication #training
  • [2023/10] Avalon's Game of Thoughts: Battle Against Deception through Recursive Contemplation arXiv [paper] #communication #training
  • [2023/10] LLM-Based Agent Society Investigation: Collaboration and Confrontation in Avalon Gameplay EMNLP 2023 [paper] #communication #multi-agent
  • [2023/10] Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models arXiv [paper][code] #communication #multi-agent
  • [2023/10] AvalonBench: Evaluating LLMs Playing the Game of Avalon FMDM@NeurIPS2023 [paper][code] #communication
  • [2023/09] Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf arXiv [paper] #communication #memory
  • [2023/08] GameEval: Evaluating LLMs on Conversational Games arXiv [paper][code] #communication
  • [2022/12] Human-Level Play in the Game of Diplomacy by Combining Language Models with Strategic Reasoning Science [paper] #communication

competition

  • [2026/05] GAMBIT: A Three-Mode Benchmark for Adversarial Robustness in Multi-Agent LLM Collectives arXiv [paper] #competition #multi-agent #prompting
  • [2026/05] Watermarking Game-Playing Agents in Perfect-Information Extensive-Form Games arXiv [paper] #competition
  • [2026/05] Generalization or Memorization? Brittleness Testing for Chess-Trained Language Models arXiv [paper] #competition #training
  • [2026/04] Readable Minds: Emergent Theory-of-Mind-Like Behavior in LLM Poker Agents arXiv [paper] #competition
  • [2026/04] MARL-GPT: Foundation Model for Multi-Agent Reinforcement Learning Proceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems 2026 [paper] #competition #multi-agent #training
  • [2026/03] Grounded Chess Reasoning in Language Models via Master Distillation arXiv [paper] #competition #planning #training
  • [2026/03] GTO Wizard Benchmark arXiv [paper] #competition #planning #multi-agent #prompting
  • [2026/03] Self-Evolving Multi-Agent Framework for Efficient Decision Making in Real-Time Strategy Scenarios arXiv [paper] #competition #planning #memory #multi-agent
  • [2026/02] World Models for Policy Refinement in StarCraft II arXiv [paper] #competition #world-model #prompting
  • [2026/02] VAM: Verbalized Action Masking for Controllable Exploration in RL Post-Training -- A Chess Case Study arXiv [paper] #competition #training
  • [2025/12] LLM CHESS: Benchmarking Reasoning and Instruction-Following in LLMs through Chess arXiv [paper] #competition
  • [2025/12] Beyond Accuracy: A Geometric Stability Analysis of Large Language Models in Chess Evaluation arXiv [paper] #competition
  • [2025/10] Memory-Augmented State Machine Prompting: A Novel LLM Agent Framework for Real-Time Strategy Games arXiv [paper] #competition #memory
  • [2025/10] ChessQA: Evaluating Large Language Models for Chess Understanding arXiv [paper] #competition
  • [2025/10] Out-of-distribution Tests Reveal Compositionality in Chess Transformers arXiv [paper] #competition #planning
  • [2025/09] HLSMAC: A New StarCraft Multi-Agent Challenge for High-Level Strategic Decision-Making arXiv [paper] #competition #multi-agent #training
  • [2025/09] Speculative Actions: A Lossless Framework for Faster AI Agents ICLR 2026 Oral [paper] #competition #tool-use
  • [2025/09] SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning ICLR 2026 Poster [paper] #competition #planning #multi-agent #training
  • [2025/08] Tracking World States with Language Models: State-Based Evaluation Using Chess arXiv [paper] #competition
  • [2025/08] SC2Arena and StarEvolve: Benchmark and Self-Improvement Framework for LLMs in Complex Decision-Making Tasks arXiv [paper] #competition #planning #training #self-improvement
  • [2025/07] Learning to Imitate with Less: Efficient Individual Behavior Modeling in Chess arXiv [paper] #competition
  • [2025/05] Enfoque Odychess: Un método dialéctico, constructivista y adaptativo para la enseñanza del ajedrez con inteligencias artificiales generativas arXiv [paper] #competition #training
  • [2025/05] Can Large Language Models Master Complex Card Games? NeurIPS 2025 poster [paper][code] #competition #training
  • [2025/04] Explore the Reasoning Capability of LLMs in the Chess Testbed NAACL 2025 [paper] #competition
  • [2025/04] ZeroSumEval: Scaling LLM Evaluation with Inter-Model Competition arXiv [paper][code] #competition #planning
  • [2025/04] LLM-PySC2: Starcraft II learning environment for Large Language Models NeurIPS 2025 poster [paper] #competition #planning #multi-agent #vlm
  • [2025/04] The PokeAgent Challenge: Competitive and Long Context Learning at Scale NeurIPS Competition Track 2025 [paper] #competition
  • [2025/03] Society of Mind Meets Real-Time Strategy: A Hierarchical Multi-Agent Framework for Strategic Reasoning COLM 2025 [paper] #competition #multi-agent #training
  • [2025/02] Hierarchical Expert Prompt for Large-Language-Model: An Approach Defeat Elite AI in TextStarCraft II for the First Time arXiv [paper][code] #competition
  • [2025/02] Implicit Search via Discrete Diffusion: A Study on Chess ICLR 2025 Poster [paper][code] #competition #planning
  • [2025/01] POKERBENCH: Training Large Language Models to become Professional Poker Players AAAI 2025 [paper] #competition #planning #training
  • [2025/01] Complete Chess Games Enable LLM Become A Chess Master NAACL 2025 [paper] #competition #training
  • [2025/01] Mastering Board Games by External and Internal Planning with Language Models ICML 2025 spotlight poster [paper] #competition #planning
  • [2025/01] Language Models as Implicit Tree Search ICML 2025 poster [paper] #competition #planning #training
  • [2024/10] PokéChamp: An Expert-level Minimax Language Agent ICML 2025 [paper][code] #competition #planning
  • [2024/08] Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information 2024 IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT) 2024 [paper] #competition #planning #multi-agent #training
  • [2024/05] Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models NeurIPS 2024 poster [paper] #competition
  • [2024/05] Reflective Multi-Agent Collaboration based on Large Language Models NeurIPS 2024 poster [paper] #competition #planning #multi-agent #training
  • [2024/03] Embodied LLM Agents Learn to Cooperate in Organized Teams IEEE Transactions on Computational Social Systems 2024 [paper] #competition #planning #multi-agent
  • [2024/02] PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models TOIT 2025 [paper][code] #competition #training
  • [2024/02] Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization ACL 2024 [paper][code] #competition #training
  • [2024/01] PokerGPT: An End-to-End Lightweight Solver for Multi-Player Texas Hold'em via Large Language Model arXiv [paper] #competition #training
  • [2024/01] SwarmBrain: Embodied agent for real-time strategy game StarCraft II via large language models arXiv [paper] #competition #training
  • [2023/12] Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach NeurIPS 2024 poster [paper][code] #competition #planning
  • [2023/09] Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4 COLM 2024 [paper] #competition #planning #memory #prompting
  • [2023/08] Are ChatGPT and GPT-4 Good Poker Players?--A Pre-Flop Analysis arXiv [paper] #competition
  • [2023/06] ChessGPT: Bridging Policy Learning and Language Modeling NeurIPS 2023 [paper][code] #competition
  • [2022/10] Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task ICLR 2023 [paper] #competition

cooperation

  • [2026/04] Don't Make the LLM Read the Graph: Make the Graph Think arXiv [paper] #cooperation #multi-agent
  • [2026/03] On the Strengths and Weaknesses of Data for Open-set Embodied Assistance arXiv [paper] #cooperation #training
  • [2025/12] ReCollab: Retrieval-Augmented LLMs for Cooperative Ad-hoc Teammate Modeling arXiv [paper] #cooperation #memory
  • [2025/10] LLM-Hanabi: Evaluating Multi-Agent Gameplays with Theory-of-Mind and Rationale Inference in Imperfect Information Collaboration Game arXiv [paper] #cooperation #multi-agent
  • [2025/08] CausalPlan: Empowering Efficient LLM Multi-Agent Collaboration Through Causality-Driven Planning arXiv [paper] #cooperation #planning #multi-agent #training
  • [2025/06] PSALM-V: Automating Symbolic Planning in Interactive Visual Environments with Large Language Models arXiv [paper] #cooperation #planning #multi-agent
  • [2024/05] Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration ACL 2024 [paper][code] #cooperation #planning #multi-agent #training
  • [2024/03] Can LLM-Augmented Autonomous Agents Cooperate?, An Evaluation of Their Cooperative Capabilities through Melting Pot IEEE Transactions on Artificial Intelligence 2024 [paper] #cooperation #multi-agent
  • [2024/03] ProAgent: Building Proactive Cooperative Agents with Large Language Models AAAI 2024 [paper] #cooperation
  • [2024/02] S-Agents: Self-organizing Agents in Open-ended Environments arXiv [paper] #cooperation
  • [2023/12] LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination AAMAS 2024 [paper] #cooperation
  • [2023/10] Evaluating Multi-agent Coordination Abilities in Large Language Models NAACL 2025 [paper] #cooperation #planning #multi-agent #prompting
  • [2023/10] Theory of Mind for Multi-Agent Collaboration via Large Language Models EMNLP 2023 [paper][code] #cooperation #planning #multi-agent #training
  • [2023/09] MindAgent: Emergent Gaming Interaction NAACL 2024 [paper] #cooperation #planning #multi-agent #prompting
  • [2023/07] Building Cooperative Embodied Agents Modularly with Large Language Models ICLR 2024 [paper][code] #cooperation #planning #multi-agent #training

sim-social

  • [2026/05] GASim: A Graph-Accelerated Hybrid Framework for Social Simulation arXiv [paper][code] #sim-social #memory #multi-agent
  • [2026/05] ScioMind: Cognitively Grounded Multi-Agent Social Simulation with Anchoring-Based Belief Dynamics and Dynamic Profiles arXiv [paper] #sim-social #memory #multi-agent
  • [2026/05] ALSO: Adversarial Online Strategy Optimization for Social Agents arXiv [paper] #sim-social #multi-agent #training
  • [2026/05] PAVE: A Cognitive Architecture for Legitimate Violation in Generative Agent Societies arXiv [paper] #sim-social #role-play
  • [2026/04] LLM-Agent-based Social Simulation for Attitude Diffusion arXiv [paper] #sim-social #memory
  • [2026/04] Restoring Heterogeneity in LLM-based Social Simulation: An Audience Segmentation Approach arXiv [paper] #sim-social #role-play
  • [2026/04] SLALOM: Simulation Lifecycle Analysis via Longitudinal Observation Metrics for Social Simulation arXiv [paper] #sim-social
  • [2026/04] RPA-Check: A Multi-Stage Automated Framework for Evaluating Dynamic LLM-based Role-Playing Agents arXiv [paper] #sim-social #planning
  • [2026/04] Superminds Test: Actively Evaluating Collective Intelligence of Agent Society via Probing Agents arXiv [paper] #sim-social
  • [2026/04] Auditing Support Strategies in LLMs through Grounded Multi-Turn Social Simulation arXiv [paper] #sim-social
  • [2026/03] PolicySim: An LLM-Based Agent Social Simulation Sandbox for Proactive Policy Optimization WWW 2026 [paper] #sim-social #training
  • [2026/03] Belief-Driven Multi-Agent Collaboration via Approximate Perfect Bayesian Equilibrium for Social Simulation WWW 2026 [paper][code] #sim-social #multi-agent
  • [2026/02] AIvilization v0: Toward Large-Scale Artificial Social Simulation with a Unified Agent Architecture and Adaptive Agent Profiles arXiv [paper] #sim-social #planning
  • [2026/02] Exploring Silicon-Based Societies: An Early Study of the Moltbook Agent Community arXiv [paper] #sim-social
  • [2026/02] Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook Proceedings of the ACM Conference on AI and Agentic Systems 2026 [paper] #sim-social
  • [2026/02] The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies arXiv [paper] #sim-social #multi-agent #self-improvement
  • [2026/01] When Agents See Humans as the Outgroup: Belief-Dependent Bias in LLM-Powered Agents arXiv [paper] #sim-social #multi-agent
  • [2026/01] MARO: Learning Stronger Reasoning from Social Interaction arXiv [paper] #sim-social #multi-agent
  • [2026/01] HumanLLM: Towards Personalized Understanding and Simulation of Human Nature arXiv [paper] #sim-social #training
  • [2025/12] EZYer: A simulacrum of high school with generative agent arXiv [paper] #sim-social #memory
  • [2025/12] Agent-Kernel: A MicroKernel Multi-Agent System Framework for Adaptive Social Simulation Powered by LLMs arXiv [paper] #sim-social #multi-agent
  • [2025/10] Multimodal Safety Evaluation in Generative Agent Social Simulations arXiv [paper] #sim-social #planning #vlm
  • [2025/10] Alita-G: Self-Evolving Generative Agent for Agent Generation arXiv [paper] #sim-social #memory #self-improvement
  • [2025/10] Doing Things with Words: Rethinking Theory of Mind Simulation in Large Language Models Computational Linguistics 2025 [paper] #sim-social
  • [2025/10] Social Simulations with Large Language Model Risk Utopian Illusion arXiv [paper] #sim-social #multi-agent
  • [2025/10] Emergent Coordinated Behaviors in Networked LLM Agents: Modeling the Strategic Dynamics of Information Operations WWW 2025 [paper] #sim-social
  • [2025/09] Implicit Behavioral Alignment of Language Agents in High-Stakes Crowd Simulations EMNLP 2025 [paper] #sim-social #role-play
  • [2025/09] The Emergence of Altruism in Large-Language-Model Agents Society arXiv [paper] #sim-social
  • [2025/07] LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra arXiv [paper][code] #sim-social #multi-agent #training #role-play
  • [2025/07] Too Human to Model: The Uncanny Valley of LLMs in Social Simulation -- When Generative Language Agents Misalign with Modelling Principles arXiv [paper] #sim-social
  • [2025/07] Validating Generative Agent-Based Models of Social Norm Enforcement: From Replication to Novel Predictions Annual Meeting of the Cognitive Science Society 2025 [paper] #sim-social #role-play
  • [2025/06] Empowering Economic Simulation for Massively Multiplayer Online Games through Generative Agent-Based Modeling KDD 2025 [paper] #sim-social #training
  • [2025/06] IndoorWorld: Integrating Physical Task Solving and Social Simulation in A Heterogeneous Multi-Agent Environment EMNLP 2025 [paper] #sim-social #multi-agent
  • [2025/06] AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need arXiv [paper][code] #sim-social #planning #multi-agent
  • [2025/06] Infected Smallville: How Disease Threat Shapes Sociality in LLM Agents arXiv [paper] #sim-social
  • [2025/05] EcoLANG: Efficient and Effective Agent Communication Language Induction for Social Simulation EMNLP 2025 [paper] #sim-social #role-play
  • [2025/04] SOTOPIA-S4: a user-friendly system for flexible, customizable, and large-scale social simulation NAACL 2025 [paper] #sim-social #planning
  • [2025/04] BookWorld: From Novels to Interactive Agent Societies for Creative Story Generation arXiv [paper] #sim-social #multi-agent #generation
  • [2025/04] MF-LLM: Simulating Population Decision Dynamics via a Mean-Field Large Language Model Framework NeurIPS 2025 poster [paper] #sim-social #planning #training
  • [2025/03] The Impact of Big Five Personality Traits on AI Agent Decision-Making in Public Spaces: A Social Simulation Study arXiv [paper] #sim-social
  • [2025/02] Investigating and Extending Homans' Social Exchange Theory with Large Language Model based Agents ACL 2025 [paper][code] #sim-social
  • [2025/01] Simulating Human-like Daily Activities with Desire-driven Autonomy ICLR 2025 [paper][code] #sim-social
  • [2025/01] Are Human Interactions Replicable by Generative Agents? A Case Study on Pronoun Usage in Hierarchical Interactions arXiv [paper] #sim-social
  • [2024/10] Project Sid: Many-agent simulations toward AI civilization arXiv [paper] #sim-social
  • [2024/06] Artificial Leviathan: Exploring Social Evolution of LLM Agents Through the Lens of Hobbesian Social Contract Theory arXiv [paper] #sim-social #multi-agent
  • [2024/05] Agent hospital: A simulacrum of hospital with evolvable medical agents arXiv [paper] #sim-social #planning
  • [2024/03] SOTOPIA-$\pi$: Interactive Learning of Socially Intelligent Language Agents ACL 2024 [paper][code] #sim-social #training
  • [2023/10] Humanoid Agents: Platform for Simulating Human-like Generative Agents EMNLP 2023 [paper] #sim-social
  • [2023/10] Lyfe Agents: Generative agents for low-cost real-time social interactions arXiv [paper] #sim-social #memory #multi-agent #self-improvement
  • [2023/10] SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents ICLR 2024 [paper][code] #sim-social #role-play
  • [2023/08] AgentSims: An Open-Source Sandbox for Large Language Model Evaluation arXiv [paper][code] #sim-social #planning #tool-use
  • [2023/07] S3: Social-network Simulation System with Large Language Model-Empowered Agents arXiv [paper] #sim-social #prompting
  • [2023/04] Generative Agents: Interactive Simulacra of Human Behavior UIST 2023 [paper][code] #sim-social

sim-embodied

  • [2026/05] Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty arXiv [paper] #sim-embodied #training
  • [2026/05] Probing Embodied LLMs: When Higher Observation Fidelity Hurts Problem Solving arXiv [paper] #sim-embodied
  • [2026/02] To Move or Not to Move: Constraint-based Planning Enables Zero-Shot Generalization for Interactive Navigation arXiv [paper] #sim-embodied #planning #prompting #vlm
  • [2025/12] Emergence: Overcoming Privileged Information Bias in Asymmetric Embodied Agents via Active Querying arXiv [paper] #sim-embodied
  • [2025/12] ESearch-R1: Learning Cost-Aware MLLM Agents for Interactive Embodied Search via Reinforcement Learning arXiv [paper] #sim-embodied #planning #memory #training
  • [2025/12] HELP: Hierarchical Embodied Language Planner for Household Tasks arXiv [paper] #sim-embodied #planning
  • [2025/11] DR. WELL: Dynamic Reasoning and Learning with Symbolic World Model for Embodied LLM-Based Multi-Agent Collaboration arXiv [paper] #sim-embodied #planning #multi-agent #world-model
  • [2025/11] MADRA: Multi-Agent Debate for Risk-Aware Embodied Planning arXiv [paper] #sim-embodied #planning #multi-agent #prompting
  • [2025/09] ReCAPA: Hierarchical Predictive Correction to Mitigate Cascading Failures ICLR 2026 Poster [paper] #sim-embodied #planning
  • [2024/09] Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-Based Decision-Making Systems ICLR 2025 Poster [paper] #sim-embodied #training
  • [2024/09] BadRobot: Jailbreaking Embodied LLM Agents in the Physical World ICLR 2025 Poster [paper] #sim-embodied #planning #vlm
  • [2024/09] GameGen-X: Interactive Open-world Game Video Generation ICLR 2025 Poster [paper][code] #sim-embodied #vlm
  • [2024/01] True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning arXiv [paper][code] #sim-embodied #training
  • [2023/10] Octopus: Embodied Vision-Language Programmer from Environmental Feedback ECCV 2024 [paper][code] #sim-embodied #planning #training #vlm
  • [2023/05] Language Models Meet World Models: Embodied Experiences Enhance Language Models NeurIPS 2023 [paper][code] #sim-embodied #planning #world-model
  • [2022/12] LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models ICCV 2023 [paper] #sim-embodied #planning #prompting
  • [2022/01] Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents ICML 2022 [paper][code] #sim-embodied

sim-other

  • [2024/01] CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents ICLR 2024 [paper][code] #sim-other

crafter

  • [2025/09] Goal-Guided Efficient Exploration via Large Language Model in Reinforcement Learning arXiv [paper] #crafter #planning #training
  • [2025/08] HERAKLES: Hierarchical Skill Compilation for Open-ended LLM Agents arXiv [paper] #crafter #planning #training
  • [2025/06] Automated Skill Discovery for Language Agents through Exploration and Iterative Feedback arXiv [paper] #crafter #self-improvement
  • [2025/02] LLM-Powered Decentralized Generative Agents with Adaptive Hierarchical Knowledge Graph for Cooperative Planning arXiv [paper] #crafter #planning #memory #multi-agent
  • [2024/10] Mars: Situated Inductive Reasoning in an Open-World Environment NeurIPS 2024 [paper] #crafter
  • [2024/07] Enhancing Agent Learning through World Dynamics Modeling EMNLP 2024 [paper] #crafter
  • [2024/04] AgentKit: Flow Engineering with Graphs, not Coding arXiv [paper][code] #crafter #planning
  • [2024/04] World Models with Hints of Large Language Models for Goal Achieving NAACL 2024 [paper] #crafter #training #vlm
  • [2024/03] EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents COLM [paper] #crafter #training
  • [2024/03] AgentKit: Structured LLM Reasoning with Dynamic Graphs COLM [paper] #crafter #planning
  • [2023/09] AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback NAACL 2024 [paper] #crafter #training
  • [2023/06] OMNI: Open-endedness via Models of human Notions of Interestingness arXiv [paper][code] #crafter
  • [2023/05] SPRING: Studying Papers and Reasoning to play Games NeurIPS 2023 [paper] #crafter
  • [2023/02] Guiding Pretraining in Reinforcement Learning with Large Language Models ICML 2023 [paper] #crafter #training

action

  • [2026/05] Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning arXiv [paper] #action #training #vlm
  • [2026/05] ANO: A Principled Approach to Robust Policy Optimization arXiv [paper] #action #training
  • [2026/05] Brain alignment of reasoning and action representations from vision-language and action models during naturalistic gameplay arXiv [paper] #action #planning #training #vlm
  • [2026/04] Playing DOOM with 1.3M Parameters: Specialized Small Models vs Large Language Models for Real-Time Game Control arXiv [paper] #action
  • [2026/04] PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models arXiv [paper] #action #planning #vlm
  • [2026/04] GRAIL: Autonomous Concept Grounding for Neuro-Symbolic Reinforcement Learning arXiv [paper] #action #training
  • [2026/03] Understanding the Challenges in Iterative Generative Optimization with LLMs arXiv [paper] #action
  • [2026/03] See, Symbolize, Act: Grounding VLMs with Spatial Representations for Better Gameplay arXiv [paper] #action #vlm
  • [2026/02] Implicit Strategic Optimization: Rethinking Long-Horizon Decision-Making in Adversarial Poker Environments arXiv [paper] #action #training
  • [2025/12] Robust Agents in Open-Ended Worlds arXiv [paper] #action #multi-agent #training #generation
  • [2025/09] Exploration with Foundation Models: Capabilities, Limitations, and Hybrid Approaches arXiv [paper] #action #training #vlm
  • [2025/08] A Multi-Agent Pokemon Tournament for Evaluating Strategic Reasoning of Large Language Models arXiv [paper] #action #multi-agent
  • [2025/08] Learning Game-Playing Agents with Generative Code Optimization arXiv [paper] #action #training #self-improvement
  • [2025/05] Frog Soup: Zero-Shot, In-Context, and Sample-Efficient Frogger Agents arXiv [paper][code] #action #training
  • [2025/05] Multiple Weaks Win Single Strong: Large Language Models Ensemble Weak Reinforcement Learning Agents into a Supreme One arXiv [paper] #action #training
  • [2025/05] Prompted Policy Search: Reinforcement Learning through Linguistic and Numerical Reasoning in LLMs NeurIPS 2025 poster [paper] #action #training
  • [2025/05] LLM-Explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven by Large Language Models NeurIPS 2025 spotlight [paper][code] #action #training
  • [2025/05] PoE-World: Compositional World Modeling with Products of Programmatic Experts NeurIPS 2025 spotlight [paper] #action #planning #world-model
  • [2025/04] Better Decisions through the Right Causal World Model arXiv [paper] #action #planning #world-model #training
  • [2025/01] LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations ICML 2025 poster [paper] #action #planning #training
  • [2024/10] Unbounded: A Generative Infinite Game of Character Life Simulation ICLR 2024 [paper] #action
  • [2024/09] Can VLMs Play Action Role-Playing Games? Take Black Myth Wukong as a Study Case arXiv [paper][code] #action #planning #training
  • [2024/09] MaestroMotif: Skill Design from Artificial Intelligence Feedback ICLR 2025 Oral [paper] #action #training
  • [2024/09] BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games ICLR 2025 Poster [paper] #action #planning #training #vlm
  • [2024/08] Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games arXiv [paper] #action #planning #training
  • [2024/07] Baba Is AI: Break the Rules to Beat the Benchmark ICML 2024 [paper] #action #vlm
  • [2024/03] Will GPT-4 Run DOOM? IEEE Transactions on Games 2024 [paper][code] #action #planning #training
  • [2024/03] Evaluate LLMs in Real Time with Street Fighter III GitHub [paper][code] #action
  • [2023/02] Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning ICML 2023 [paper][code] #action #training

video-adventure

  • [2024/03] Cradle: Empowering Foundation Agents Towards General Computer Control ICML 2024 [paper][code] #video-adventure #planning
  • [2024/03] Playing NetHack with LLMs: Potential & Limitations as Zero-Shot Agents 2024 IEEE Conference on Games (CoG) 2024 [paper][code] #video-adventure #planning #prompting
  • [2023/09] Motif: Intrinsic Motivation from Artificial Intelligence Feedback ICLR 2024 [paper][code] #video-adventure #training

benchmark

  • [2026/01] NitroGen: An Open Foundation Model for Generalist Gaming Agents arXiv [paper] #benchmark #training
  • [2025/10] Think Globally, Group Locally: Evaluating LLMs Using Multi-Lingual Word Grouping Games EMNLP 2025 [paper] #benchmark
  • [2025/06] Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games arXiv [paper][code] #benchmark #training
  • [2025/06] UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI ICCV 2025 [paper][code] #benchmark #multi-agent #training
  • [2025/05] lmgame-Bench: How Good are LLMs at Playing Games? ICLR 2026 Poster [paper][code] #benchmark #planning #training
  • [2025/05] Is Your LLM Really Mastering the Concept? A Multi-Agent Benchmark arXiv [paper][code] #benchmark #multi-agent

other

  • [2026/04] GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents arXiv [paper] #planning #vlm
  • [2026/04] Nemobot Games: Crafting Strategic AI Gaming Agents for Interactive Learning with Large Language Models arXiv [paper] #tool-use #training #self-improvement
  • [2026/04] From World-Gen to Quest-Line: A Dependency-Driven Prompt Pipeline for Coherent RPG Generation arXiv [paper] #planning #generation
  • [2026/03] Sensi: Learn One Thing at a Time -- Curriculum-Based Test-Time Learning for LLM Game Agents arXiv [paper]
  • [2026/02] VLM-Guided Experience Replay arXiv [paper] #planning #training #vlm
  • [2026/01] SNAP: A Plan-Driven Framework for Controllable Interactive Narrative Generation arXiv [paper] #planning #generation
  • [2025/10] ROBOPSY PL[AI]: Using Role-Play to Investigate how LLMs Present Collective Memory arXiv [paper] #role-play
  • [2025/08] All Stories Are One Story: Emotional Arc Guided Procedural Game Level Generation arXiv [paper] #generation
  • [2025/08] CHBench: A Cognitive Hierarchy Benchmark for Evaluating Strategic Reasoning Capability of LLMs arXiv [paper] #memory
  • [2025/06] The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind arXiv [paper] #multi-agent #training
  • [2025/04] PAYADOR: A Minimalist Approach to Grounding Language Models on Structured Data for Interactive Storytelling and Role-playing Games arXiv [paper]
  • [2025/04] Enhancing Player Enjoyment with a Two-Tier DRL and LLM-Based Agent System for Fighting Games arXiv [paper] #training
  • [2025/03] Playing games with Large language models: Randomness and strategy arXiv [paper] #multi-agent
  • [2025/03] Collaborative Storytelling and LLM: A Linguistic Analysis of Automatically-Generated Role-Playing Game Sessions arXiv [paper]
  • [2025/03] Cultivating Game Sense for Yourself: Making VLMs Gaming Experts arXiv [paper] #vlm
  • [2025/02] RPGBENCH: Evaluating Large Language Models as Role-Playing Game Engines arXiv [paper]
  • [2025/02] Hybrid Voting-Based Task Assignment in Role-Playing Games arXiv [paper] #planning
  • [2024/09] Agents' Room: Narrative Generation through Multi-step Collaboration ICLR 2025 Poster [paper] #generation
  • [2024/07] What if Red Can Talk? Dynamic Dialogue Generation Using Large Language Models arXiv [paper] #generation
  • [2023/10] Language as reality: a co-creative storytelling game experience in 1001 nights using generative AI AAAI 2023 [paper] #generation

By Mechanism

The same papers as above, re-grouped by agent design. A paper with multiple Mechanism tags appears in each relevant section.

planning

  • [2026/05] From History to State: Constant-Context Skill Learning for LLM Agents arXiv [paper] #text-adventure #planning #training
  • [2026/05] PriorZero: Bridging Language Priors and World Models for Decision Making arXiv [paper][code] #text-adventure #planning #world-model #training
  • [2026/05] Brain alignment of reasoning and action representations from vision-language and action models during naturalistic gameplay arXiv [paper] #action #planning #training #vlm
  • [2026/05] APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents arXiv [paper] #text-adventure #planning #self-improvement
  • [2026/05] Evaluating Large Language Models in a Complex Hidden Role Game arXiv [paper] #communication #planning
  • [2026/05] SkillOps: Managing LLM Agent Skill Libraries as Self-Maintaining Software Ecosystems arXiv [paper] #text-adventure #planning #memory #tool-use
  • [2026/04] Aligning Progress and Feasibility: A Neuro-Symbolic Dual Memory Framework for Long-Horizon LLM Agents arXiv [paper] #text-adventure #planning #memory
  • [2026/04] GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents arXiv [paper] #planning #vlm
  • [2026/04] PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models arXiv [paper] #action #planning #vlm
  • [2026/04] RPA-Check: A Multi-Stage Automated Framework for Evaluating Dynamic LLM-based Role-Playing Agents arXiv [paper] #sim-social #planning
  • [2026/04] DORA Explorer: Improving the Exploration Ability of LLMs Without Training arXiv [paper] #text-adventure #planning #prompting
  • [2026/04] From Actions to Understanding: Conformal Interpretability of Temporal Concepts in LLM Agents arXiv [paper] #text-adventure #planning
  • [2026/04] Complete Cyclic Subtask Graphs for Tool-Using LLM Agents: Flexibility, Cost, and Bottlenecks in Multi-Agent Workflows arXiv [paper] #text-adventure #planning #memory #multi-agent
  • [2026/04] From World-Gen to Quest-Line: A Dependency-Driven Prompt Pipeline for Coherent RPG Generation arXiv [paper] #planning #generation
  • [2026/04] GraSP: Graph-Structured Skill Compositions for LLM Agents arXiv [paper] #text-adventure #planning #memory #self-improvement
  • [2026/03] RPMS: Enhancing LLM-Based Embodied Planning through Rule-Augmented Memory Synergy arXiv [paper] #text-adventure #planning #memory
  • [2026/03] Grounded Chess Reasoning in Language Models via Master Distillation arXiv [paper] #competition #planning #training
  • [2026/03] GTO Wizard Benchmark arXiv [paper] #competition #planning #multi-agent #prompting
  • [2026/03] Reward Prediction with Factorized World States arXiv [paper] #text-adventure #planning #prompting
  • [2026/03] Self-Evolving Multi-Agent Framework for Efficient Decision Making in Real-Time Strategy Scenarios arXiv [paper] #competition #planning #memory #multi-agent
  • [2026/02] VLM-Guided Experience Replay arXiv [paper] #planning #training #vlm
  • [2026/02] Active Epistemic Control for Query-Efficient Verified Planning arXiv [paper] #text-adventure #planning
  • [2026/02] AIvilization v0: Toward Large-Scale Artificial Social Simulation with a Unified Agent Architecture and Adaptive Agent Profiles arXiv [paper] #sim-social #planning
  • [2026/02] Think Fast and Slow: Step-Level Cognitive Depth Adaptation for LLM Agents arXiv [paper] #text-adventure #planning #training
  • [2026/02] TAPE: Tool-Guided Adaptive Planning and Constrained Execution in Language Model Agents arXiv [paper] #text-adventure #planning
  • [2026/02] To Move or Not to Move: Constraint-based Planning Enables Zero-Shot Generalization for Interactive Navigation arXiv [paper] #sim-embodied #planning #prompting #vlm
  • [2026/02] CWM: Contrastive World Models for Action Feasibility Learning in Embodied Agent Pipelines arXiv [paper] #text-adventure #planning #world-model #training
  • [2026/01] Beyond Entangled Planning: Task-Decoupled Planning for Long-Horizon Agents arXiv [paper] #text-adventure #planning #prompting
  • [2026/01] SNAP: A Plan-Driven Framework for Controllable Interactive Narrative Generation arXiv [paper] #planning #generation
  • [2026/01] Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents arXiv [paper] #text-adventure #planning #training
  • [2025/12] ESearch-R1: Learning Cost-Aware MLLM Agents for Interactive Embodied Search via Reinforcement Learning arXiv [paper] #sim-embodied #planning #memory #training
  • [2025/12] HELP: Hierarchical Embodied Language Planner for Household Tasks arXiv [paper] #sim-embodied #planning
  • [2025/11] DR. WELL: Dynamic Reasoning and Learning with Symbolic World Model for Embodied LLM-Based Multi-Agent Collaboration arXiv [paper] #sim-embodied #planning #multi-agent #world-model
  • [2025/11] MADRA: Multi-Agent Debate for Risk-Aware Embodied Planning arXiv [paper] #sim-embodied #planning #multi-agent #prompting
  • [2025/10] Fine-tuning with RAG for Improving LLM Learning of New Skills arXiv [paper] #text-adventure #planning #memory #training
  • [2025/10] Constrained Natural Language Action Planning for Resilient Embodied Systems arXiv [paper] #text-adventure #planning #prompting
  • [2025/10] The Cognitive Bandwidth Bottleneck: Shifting Long-Horizon Agent from Planning with Actions to Planning with Schemas arXiv [paper] #text-adventure #planning
  • [2025/10] Multimodal Safety Evaluation in Generative Agent Social Simulations arXiv [paper] #sim-social #planning #vlm
  • [2025/10] Graph-Enhanced Policy Optimization in LLM Agent Training arXiv [paper] #text-adventure #planning #training
  • [2025/10] Out-of-distribution Tests Reveal Compositionality in Chess Transformers arXiv [paper] #competition #planning
  • [2025/09] Meta-Policy Reflexion: Reusable Reflective Memory and Rule Admissibility for Resource-Efficient LLM Agent arXiv [paper] #text-adventure #planning #multi-agent #self-improvement
  • [2025/09] Code Driven Planning with Domain-Adaptive Critic arXiv [paper] #text-adventure #planning #self-improvement
  • [2025/09] Goal-Guided Efficient Exploration via Large Language Model in Reinforcement Learning arXiv [paper] #crafter #planning #training
  • [2025/09] Natural Language PDDL (NL-PDDL) for Open-world Goal-oriented Commonsense Regression Planning in Embodied AI ICLR 2026 Poster [paper] #text-adventure #planning #vlm
  • [2025/09] Experience-based Knowledge Correction for Robust Planning in Minecraft ICLR 2026 Poster [paper] #minecraft #planning
  • [2025/09] SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning ICLR 2026 Poster [paper] #competition #planning #multi-agent #training
  • [2025/09] Code Driven Planning with Domain-Adaptive Selector ICLR 2026 Poster [paper] #text-adventure #planning
  • [2025/09] ReCAPA: Hierarchical Predictive Correction to Mitigate Cascading Failures ICLR 2026 Poster [paper] #sim-embodied #planning
  • [2025/09] DreamPhase: Offline Imagination and Uncertainty-Guided Planning for Large-Language-Model Agents ICLR 2026 Poster [paper] #text-adventure #planning #world-model
  • [2025/08] CausalMACE: Causality Empowered Multi-Agents in Minecraft Cooperative Tasks Findings of EMNLP 2025 [paper] #minecraft #planning #multi-agent
  • [2025/08] Enhancing Vision-Language Model Training with Reinforcement Learning in Synthetic Worlds for Real-World Success Proceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems 2026 [paper] #text-adventure #planning #training #vlm
  • [2025/08] CausalPlan: Empowering Efficient LLM Multi-Agent Collaboration Through Causality-Driven Planning arXiv [paper] #cooperation #planning #multi-agent #training
  • [2025/08] SC2Arena and StarEvolve: Benchmark and Self-Improvement Framework for LLMs in Complex Decision-Making Tasks arXiv [paper] #competition #planning #training #self-improvement
  • [2025/08] HERAKLES: Hierarchical Skill Compilation for Open-ended LLM Agents arXiv [paper] #crafter #planning #training
  • [2025/07] CoEx -- Co-evolving World-model and Exploration EMNLP 2025 [paper] #text-adventure #planning #world-model
  • [2025/06] Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts arXiv [paper][code] #minecraft #planning
  • [2025/06] Improving LLM Agent Planning with In-Context Learning via Atomic Fact Augmentation and Lookahead Search arXiv [paper] #text-adventure #planning #world-model #training
  • [2025/06] OmniReflect: Discovering Transferable Constitutions for LLM agents via Neuro-Symbolic Reflections arXiv [paper] #text-adventure #planning #training #self-improvement
  • [2025/06] AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need arXiv [paper][code] #sim-social #planning #multi-agent
  • [2025/06] PSALM-V: Automating Symbolic Planning in Interactive Visual Environments with Large Language Models arXiv [paper] #cooperation #planning #multi-agent
  • [2025/05] Don’t Just Follow MLLM Plans: Robust and Efficient Planning for Open-World Agents arXiv [paper] #minecraft #planning #vlm
  • [2025/05] Knowledge Retrieval in LLM Gaming: A Shift from Entity-Centric to Goal-Oriented Graphs Knowledge-Based Systems 2025 [paper] #minecraft #planning #memory
  • [2025/05] lmgame-Bench: How Good are LLMs at Playing Games? ICLR 2026 Poster [paper][code] #benchmark #planning #training
  • [2025/05] LLM-BABYBENCH: Understanding and Evaluating Grounded Planning and Reasoning in LLMs arXiv [paper][code] #text-adventure #planning
  • [2025/05] Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning ICML 2025 poster [paper] #text-adventure #planning #training
  • [2025/05] ActiveVOO: Value of Observation Guided Active Knowledge Acquisition for Open-World Embodied Lifted Regression Planning NeurIPS 2025 poster [paper] #text-adventure #planning #prompting #vlm
  • [2025/05] Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL NeurIPS 2025 poster [paper] #communication #planning #tool-use #training
  • [2025/05] SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents NeurIPS 2025 poster [paper] #text-adventure #planning #training #self-improvement
  • [2025/05] PoE-World: Compositional World Modeling with Products of Programmatic Experts NeurIPS 2025 spotlight [paper] #action #planning #world-model
  • [2025/05] WALL-E: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents NeurIPS 2025 poster [paper] #minecraft #planning #world-model #prompting
  • [2025/04] WALL-E 2.0: World Alignment by NeuroSymbolic Learning Improves World Model-based LLM Agents arXiv [paper][code] #minecraft #planning #world-model #prompting
  • [2025/04] Better Decisions through the Right Causal World Model arXiv [paper] #action #planning #world-model #training
  • [2025/04] ZeroSumEval: Scaling LLM Evaluation with Inter-Model Competition arXiv [paper][code] #competition #planning
  • [2025/04] SOTOPIA-S4: a user-friendly system for flexible, customizable, and large-scale social simulation NAACL 2025 [paper] #sim-social #planning
  • [2025/04] Monte Carlo Planning with Large Language Model for Text-Based Game Agents ICLR 2025 Poster [paper] #text-adventure #planning #training
  • [2025/04] LLM-PySC2: Starcraft II learning environment for Large Language Models NeurIPS 2025 poster [paper] #competition #planning #multi-agent #vlm
  • [2025/04] MF-LLM: Simulating Population Decision Dynamics via a Mean-Field Large Language Model Framework NeurIPS 2025 poster [paper] #sim-social #planning #training
  • [2025/03] Parallelized Planning-Acting for Efficient LLM-based Multi-Agent Systems arXiv [paper] #minecraft #planning #multi-agent #tool-use
  • [2025/03] GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks CVPR 2025 [paper] #text-adventure #planning #training #vlm
  • [2025/03] Plancraft: an evaluation dataset for planning with LLM agents COLM 2025 [paper] #minecraft #planning #memory #tool-use
  • [2025/02] Optimus-2: Multimodal World Model for Open-World Minecraft Agents CVPR 2025 [paper][code] #minecraft #planning #world-model #vlm
  • [2025/02] LLM-Powered Decentralized Generative Agents with Adaptive Hierarchical Knowledge Graph for Cooperative Planning arXiv [paper] #crafter #planning #memory #multi-agent
  • [2025/02] Hybrid Voting-Based Task Assignment in Role-Playing Games arXiv [paper] #planning
  • [2025/02] Implicit Search via Discrete Diffusion: A Study on Chess ICLR 2025 Poster [paper][code] #competition #planning
  • [2025/01] POKERBENCH: Training Large Language Models to become Professional Poker Players AAAI 2025 [paper] #competition #planning #training
  • [2025/01] Mastering Board Games by External and Internal Planning with Language Models ICML 2025 spotlight poster [paper] #competition #planning
  • [2025/01] LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations ICML 2025 poster [paper] #action #planning #training
  • [2025/01] Language Models as Implicit Tree Search ICML 2025 poster [paper] #competition #planning #training
  • [2024/10] WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents arXiv [paper][code] #minecraft #planning #world-model
  • [2024/10] ADAM: An Embodied Causal Agent in Open-World Environments ICLR 2025 [paper][code] #minecraft #planning
  • [2024/10] PokéChamp: An Expert-level Minimax Language Agent ICML 2025 [paper][code] #competition #planning
  • [2024/09] Can VLMs Play Action Role-Playing Games? Take Black Myth Wukong as a Study Case arXiv [paper][code] #action #planning #training
  • [2024/09] Strategist: Self-improvement of LLM Decision Making via Bi-Level Tree Search ICLR 2025 Poster [paper] #communication #planning #multi-agent #training
  • [2024/09] Discriminator-Guided Embodied Planning for LLM Agent ICLR 2025 Poster [paper] #text-adventure #planning
  • [2024/09] Better than Your Teacher: LLM Agents that learn from Privileged AI Feedback ICLR 2025 Poster [paper] #text-adventure #planning #training #self-improvement
  • [2024/09] BadRobot: Jailbreaking Embodied LLM Agents in the Physical World ICLR 2025 Poster [paper] #sim-embodied #planning #vlm
  • [2024/09] BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games ICLR 2025 Poster [paper] #action #planning #training #vlm
  • [2024/08] Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-horizon Tasks NeurIPS 2024 [paper][code] #minecraft #planning #memory #prompting
  • [2024/08] Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information 2024 IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT) 2024 [paper] #competition #planning #multi-agent #training
  • [2024/08] Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games arXiv [paper] #action #planning #training
  • [2024/07] AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents IJCAI 2024 [paper] #text-adventure #planning #memory
  • [2024/07] Odyssey: Empowering Agents with Open-World Skills IJCAI 2024 [paper][code] #minecraft #planning #tool-use
  • [2024/06] VillagerAgent: A Graph-Based Multi-Agent Framework for Coordinating Complex Task Dependencies in Minecraft Findings of ACL 2024 [paper][code] #minecraft #planning #multi-agent
  • [2024/05] Agent Planning with World Knowledge Model NeurIPS 2024 [paper][code] #text-adventure #planning #memory #world-model
  • [2024/05] Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents arXiv [paper] #sim-social #planning
  • [2024/05] Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration ACL 2024 [paper][code] #cooperation #planning #multi-agent #training
  • [2024/05] Reflective Multi-Agent Collaboration based on Large Language Models NeurIPS 2024 poster [paper] #competition #planning #multi-agent #training
  • [2024/05] AutoManual: Constructing Instruction Manuals by LLM Agents via Interactive Environmental Learning NeurIPS 2024 poster [paper][code] #text-adventure #planning
  • [2024/05] Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy NeurIPS 2024 poster [paper] #communication #planning #multi-agent #self-improvement
  • [2024/04] ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy arXiv [paper] #text-adventure #planning #training #self-improvement
  • [2024/04] AgentKit: Flow Engineering with Graphs, not Coding arXiv [paper][code] #crafter #planning
  • [2024/03] KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents NAACL 2024 [paper][code] #text-adventure #planning
  • [2024/03] Cradle: Empowering Foundation Agents Towards General Computer Control ICML 2024 [paper][code] #video-adventure #planning
  • [2024/03] Playing NetHack with LLMs: Potential & Limitations as Zero-Shot Agents 2024 IEEE Conference on Games (CoG) 2024 [paper][code] #video-adventure #planning #prompting
  • [2024/03] Hierarchical Auto-Organizing System for Open-Ended Multi-Agent Navigation (HAS) ICLR 2024 Workshop [paper] #minecraft #planning #multi-agent #vlm
  • [2024/03] Embodied LLM Agents Learn to Cooperate in Organized Teams IEEE Transactions on Computational Social Systems 2024 [paper] #competition #planning #multi-agent
  • [2024/03] Will GPT-4 Run DOOM? IEEE Transactions on Games 2024 [paper][code] #action #planning #training
  • [2024/03] ReAct Meets ActRe: Autonomous Annotation of Agent Trajectories for Contrastive Self-Training COLM [paper] #text-adventure #planning #training #self-improvement
  • [2024/03] StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows COLM [paper] #text-adventure #planning #self-improvement
  • [2024/03] AgentKit: Structured LLM Reasoning with Dynamic Graphs COLM [paper] #crafter #planning
  • [2024/02] Empowering Large Language Model Agents through Action Learning COLM [paper][code] #text-adventure #planning #self-improvement
  • [2024/02] RL-GPT: Integrating Reinforcement Learning and Code-as-policy NeurIPS 2024 oral [paper] #minecraft #planning #tool-use #training
  • [2023/12] MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception CVPR 2024 [paper][code] #minecraft #planning #vlm
  • [2023/12] Creative Agents: Empowering Agents with Imagination for Creative Tasks UAI 2023 [paper][code] #minecraft #planning
  • [2023/12] Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach NeurIPS 2024 poster [paper][code] #competition #planning
  • [2023/11] ADaPT: As-Needed Decomposition and Planning with Language Models NAACL 2024 [paper][code] #text-adventure #planning
  • [2023/11] JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models TPAMI 2023 [paper][code] #minecraft #planning #memory
  • [2023/10] Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models ICML 2024 [paper][code] #text-adventure #planning #training
  • [2023/10] LLaMA Rider: Spurring Large Language Models to Explore the Open World NAACL 2024 [paper][code] #minecraft #planning #training
  • [2023/10] Octopus: Embodied Vision-Language Programmer from Environmental Feedback ECCV 2024 [paper][code] #sim-embodied #planning #training #vlm
  • [2023/10] Evaluating Multi-agent Coordination Abilities in Large Language Models NAACL 2023 [paper] #cooperation #planning #multi-agent #prompting
  • [2023/10] Theory of Mind for Multi-Agent Collaboration via Large Language Models EMNLP 2023 [paper][code] #cooperation #planning #multi-agent #training
  • [2023/09] Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4 COLM 2024 [paper] #competition #planning #memory #prompting
  • [2023/09] MindAgent: Emergent Gaming Interaction NAACL 2024 [paper] #cooperation #planning #multi-agent #prompting
  • [2023/08] AgentSims: An Open-Source Sandbox for Large Language Model Evaluation arXiv [paper][code] #sim-social #planning #tool-use
  • [2023/07] Building Cooperative Embodied Agents Modularly with Large Language Models ICLR 2024 [paper][code] #cooperation #planning #multi-agent #training
  • [2023/05] Language Models Meet World Models: Embodied Experiences Enhance Language Models NeurIPS 2023 [paper][code] #sim-embodied #planning #world-model
  • [2023/05] SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks NeurIPS 2023 [paper][code] #text-adventure #planning
  • [2023/03] Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks FMDM@NeurIPS2023 [paper][code] #minecraft #planning #training
  • [2023/02] Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents NeurIPS 2023 [paper][code] #minecraft #planning #prompting
  • [2022/12] LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models ICCV 2023 [paper] #sim-embodied #planning #prompting
  • [2022/10] ReAct: Synergizing Reasoning and Acting in Language Models ICLR 2023 [paper][code] #text-adventure #planning #training
  • [2020/10] ALFWorld: Aligning Text and Embodied Environments for Interactive Learning ICLR 2021 [paper][code] #text-adventure #planning

memory

  • [2026/05] Belief Memory: Agent Memory Under Partial Observability arXiv [paper] #text-adventure #memory
  • [2026/05] GASim: A Graph-Accelerated Hybrid Framework for Social Simulation arXiv [paper][code] #sim-social #memory #multi-agent
  • [2026/05] ScioMind: Cognitively Grounded Multi-Agent Social Simulation with Anchoring-Based Belief Dynamics and Dynamic Profiles arXiv [paper] #sim-social #memory #multi-agent
  • [2026/05] Skills on the Fly: Test-Time Adaptive Skill Synthesis for LLM Agents arXiv [paper] #text-adventure #memory #tool-use
  • [2026/05] SkillGraph: Skill-Augmented Reinforcement Learning for Agents via Evolving Skill Graphs arXiv [paper] #text-adventure #memory #tool-use #training
  • [2026/05] SkillOps: Managing LLM Agent Skill Libraries as Self-Maintaining Software Ecosystems arXiv [paper] #text-adventure #planning #memory #tool-use
  • [2026/04] Aligning Progress and Feasibility: A Neuro-Symbolic Dual Memory Framework for Long-Horizon LLM Agents arXiv [paper] #text-adventure #planning #memory
  • [2026/04] LLM-Agent-based Social Simulation for Attitude Diffusion arXiv [paper] #sim-social #memory
  • [2026/04] Complete Cyclic Subtask Graphs for Tool-Using LLM Agents: Flexibility, Cost, and Bottlenecks in Multi-Agent Workflows arXiv [paper] #text-adventure #planning #memory #multi-agent
  • [2026/04] GraSP: Graph-Structured Skill Compositions for LLM Agents arXiv [paper] #text-adventure #planning #memory #self-improvement
  • [2026/04] Gated Coordination for Efficient Multi-Agent Collaboration in Minecraft Game arXiv [paper] #minecraft #memory #multi-agent #vlm
  • [2026/03] RPMS: Enhancing LLM-Based Embodied Planning through Rule-Augmented Memory Synergy arXiv [paper] #text-adventure #planning #memory
  • [2026/03] RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback arXiv [paper] #text-adventure #memory #training #self-improvement
  • [2026/03] Self-Evolving Multi-Agent Framework for Efficient Decision Making in Real-Time Strategy Scenarios arXiv [paper] #competition #planning #memory #multi-agent
  • [2026/02] SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning arXiv [paper][code] #text-adventure #memory #tool-use #training
  • [2025/12] EZYer: A simulacrum of high school with generative agent arXiv [paper] #sim-social #memory
  • [2025/12] ESearch-R1: Learning Cost-Aware MLLM Agents for Interactive Embodied Search via Reinforcement Learning arXiv [paper] #sim-embodied #planning #memory #training
  • [2025/12] Learning Hierarchical Procedural Memory for LLM Agents through Bayesian Selection and Contrastive Refinement Proceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems 2026 [paper] #text-adventure #memory
  • [2025/12] ReCollab: Retrieval-Augmented LLMs for Cooperative Ad-hoc Teammate Modeling arXiv [paper] #cooperation #memory
  • [2025/11] Knowledge Graph-enhanced Large Language Model for Incremental Game PlayTesting IEICE Transactions on Information and Systems 2025 [paper] #minecraft #memory
  • [2025/10] Fine-tuning with RAG for Improving LLM Learning of New Skills arXiv [paper] #text-adventure #planning #memory #training
  • [2025/10] Memory-Augmented State Machine Prompting: A Novel LLM Agent Framework for Real-Time Strategy Games arXiv [paper] #competition #memory
  • [2025/10] Alita-G: Self-Evolving Generative Agent for Agent Generation arXiv [paper] #sim-social #memory #self-improvement
  • [2025/09] World Model Implanting for Test-time Adaptation of Embodied Agents ICML 2025 poster [paper] #text-adventure #memory #world-model #prompting
  • [2025/09] Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization ICLR 2026 Poster [paper] #text-adventure #memory #training
  • [2025/08] Vistawise: Building Cost-effective Agent with Cross-modal Knowledge Graph for Minecraft EMNLP 2025 [paper] #minecraft #memory #tool-use
  • [2025/08] CHBench: A Cognitive Hierarchy Benchmark for Evaluating Strategic Reasoning Capability of LLMs arXiv [paper] #memory
  • [2025/06] StoryBench: A Dynamic Benchmark for Evaluating Long-Term Memory with Multi Turns arXiv [paper] #text-adventure #memory
  • [2025/05] Knowledge Retrieval in LLM Gaming: A Shift from Entity-Centric to Goal-Oriented Graphs Knowledge-Based Systems 2025 [paper] #minecraft #planning #memory
  • [2025/03] Plancraft: an evaluation dataset for planning with LLM agents COLM 2025 [paper] #minecraft #planning #memory #tool-use
  • [2025/02] LLM-Powered Decentralized Generative Agents with Adaptive Hierarchical Knowledge Graph for Cooperative Planning arXiv [paper] #crafter #planning #memory #multi-agent
  • [2024/11] MrSteve: Instruction-Following Agents with What-Where-When Memory ICLR 2025 [paper][code] #minecraft #memory
  • [2024/09] MrSteve: Instruction-Following Agents in Minecraft with What-Where-When Memory ICLR 2025 Poster [paper] #minecraft #memory
  • [2024/08] Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-horizon Tasks NeurIPS 2024 [paper][code] #minecraft #planning #memory #prompting
  • [2024/07] AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents IJCAI 2024 [paper] #text-adventure #planning #memory
  • [2024/05] Agent Planning with World Knowledge Model NeurIPS 2024 [paper][code] #text-adventure #planning #memory #world-model
  • [2023/11] JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models TPAMI 2023 [paper][code] #minecraft #planning #memory
  • [2023/11] See and Think: Embodied Agent in Virtual Environment ECCV 2024 [paper][code] #minecraft #memory
  • [2023/10] Lyfe Agents: Generative agents for low-cost real-time social interactions arXiv [paper] #sim-social #memory #multi-agent #self-improvement
  • [2023/09] Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4 COLM 2024 [paper] #competition #planning #memory #prompting
  • [2023/09] Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf arXiv [paper] #communication #memory

multi-agent

  • [2026/05] GASim: A Graph-Accelerated Hybrid Framework for Social Simulation arXiv [paper][code] #sim-social #memory #multi-agent
  • [2026/05] ScioMind: Cognitively Grounded Multi-Agent Social Simulation with Anchoring-Based Belief Dynamics and Dynamic Profiles arXiv [paper] #sim-social #memory #multi-agent
  • [2026/05] GAMBIT: A Three-Mode Benchmark for Adversarial Robustness in Multi-Agent LLM Collectives arXiv [paper] #competition #multi-agent #prompting
  • [2026/05] ALSO: Adversarial Online Strategy Optimization for Social Agents arXiv [paper] #sim-social #multi-agent #training
  • [2026/04] MARL-GPT: Foundation Model for Multi-Agent Reinforcement Learning Proceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems 2026 [paper] #competition #multi-agent #training
  • [2026/04] Complete Cyclic Subtask Graphs for Tool-Using LLM Agents: Flexibility, Cost, and Bottlenecks in Multi-Agent Workflows arXiv [paper] #text-adventure #planning #memory #multi-agent
  • [2026/04] Don't Make the LLM Read the Graph: Make the Graph Think arXiv [paper] #cooperation #multi-agent
  • [2026/04] Gated Coordination for Efficient Multi-Agent Collaboration in Minecraft Game arXiv [paper] #minecraft #memory #multi-agent #vlm
  • [2026/03] How Clued up are LLMs? Evaluating Multi-Step Deductive Reasoning in a Text-Based Game Environment arXiv [paper] #text-adventure #multi-agent #training
  • [2026/03] GTO Wizard Benchmark arXiv [paper] #competition #planning #multi-agent #prompting
  • [2026/03] Self-Evolving Multi-Agent Framework for Efficient Decision Making in Real-Time Strategy Scenarios arXiv [paper] #competition #planning #memory #multi-agent
  • [2026/03] Belief-Driven Multi-Agent Collaboration via Approximate Perfect Bayesian Equilibrium for Social Simulation WWW 2026 [paper][code] #sim-social #multi-agent
  • [2026/03] Deception and Communication in Autonomous Multi-Agent Systems: An Experimental Study with Among Us Proceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems 2026 [paper] #communication #multi-agent
  • [2026/02] The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies arXiv [paper] #sim-social #multi-agent #self-improvement
  • [2026/01] When Agents See Humans as the Outgroup: Belief-Dependent Bias in LLM-Powered Agents arXiv [paper] #sim-social #multi-agent
  • [2026/01] Hidden in Plain Text: Measuring LLM Deception Quality Against Human Baselines Using Social Deduction Games International Conference on Agents 2027 [paper] #communication #multi-agent
  • [2026/01] MARO: Learning Stronger Reasoning from Social Interaction arXiv [paper] #sim-social #multi-agent
  • [2025/12] WOLF: Werewolf-based Observations for LLM Deception and Falsehoods arXiv [paper] #communication #multi-agent
  • [2025/12] Robust Agents in Open-Ended Worlds arXiv [paper] #action #multi-agent #training #generation
  • [2025/12] Agent-Kernel: A MicroKernel Multi-Agent System Framework for Adaptive Social Simulation Powered by LLMs arXiv [paper] #sim-social #multi-agent
  • [2025/11] DR. WELL: Dynamic Reasoning and Learning with Symbolic World Model for Embodied LLM-Based Multi-Agent Collaboration arXiv [paper] #sim-embodied #planning #multi-agent #world-model
  • [2025/11] Multi-agent Undercover Gaming: Hallucination Removal via Counterfactual Test for Multimodal Reasoning arXiv [paper][code] #communication #multi-agent
  • [2025/11] MADRA: Multi-Agent Debate for Risk-Aware Embodied Planning arXiv [paper] #sim-embodied #planning #multi-agent #prompting
  • [2025/10] LLM-Hanabi: Evaluating Multi-Agent Gameplays with Theory-of-Mind and Rationale Inference in Imperfect Information Collaboration Game arXiv [paper] #cooperation #multi-agent
  • [2025/10] Beyond Survival: Evaluating LLMs in Social Deduction Games with Human-Aligned Strategies arXiv [paper] #communication #multi-agent #self-improvement
  • [2025/10] Social Simulations with Large Language Model Risk Utopian Illusion arXiv [paper] #sim-social #multi-agent
  • [2025/09] Meta-Policy Reflexion: Reusable Reflective Memory and Rule Admissibility for Resource-Efficient LLM Agent arXiv [paper] #text-adventure #planning #multi-agent #self-improvement
  • [2025/09] PillagerBench: Benchmarking LLM-Based Agents in Competitive Minecraft Team Environments 2025 IEEE Conference on Games (CoG) 2025 [paper] #minecraft #multi-agent #self-improvement
  • [2025/09] HLSMAC: A New StarCraft Multi-Agent Challenge for High-Level Strategic Decision-Making arXiv [paper] #competition #multi-agent #training
  • [2025/09] SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning ICLR 2026 Poster [paper] #competition #planning #multi-agent #training
  • [2025/08] CausalMACE: Causality Empowered Multi-Agents in Minecraft Cooperative Tasks Findings of EMNLP 2025 [paper] #minecraft #planning #multi-agent
  • [2025/08] A Multi-Agent Pokemon Tournament for Evaluating Strategic Reasoning of Large Language Models arXiv [paper] #action #multi-agent
  • [2025/08] CausalPlan: Empowering Efficient LLM Multi-Agent Collaboration Through Causality-Driven Planning arXiv [paper] #cooperation #planning #multi-agent #training
  • [2025/07] LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra arXiv [paper][code] #sim-social #multi-agent #training #role-play
  • [2025/07] CoMet: Metaphor-Driven Covert Communication for Multi-Agent Language Games ACL 2025 [paper][code] #communication #multi-agent
  • [2025/06] UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI ICCV 2025 [paper][code] #benchmark #multi-agent #training
  • [2025/06] IndoorWorld: Integrating Physical Task Solving and Social Simulation in A Heterogeneous Multi-Agent Environment EMNLP 2025 [paper] #sim-social #multi-agent
  • [2025/06] WereWolf-Plus: An Update of Werewolf Game setting Based on DSGBench arXiv [paper][code] #communication #multi-agent
  • [2025/06] The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind arXiv [paper] #multi-agent #training
  • [2025/06] AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need arXiv [paper][code] #sim-social #planning #multi-agent
  • [2025/06] PSALM-V: Automating Symbolic Planning in Interactive Visual Environments with Large Language Models arXiv [paper] #cooperation #planning #multi-agent
  • [2025/05] Is Your LLM Really Mastering the Concept? A Multi-Agent Benchmark arXiv [paper][code] #benchmark #multi-agent
  • [2025/04] Collaborating Action by Action: A Multi-agent LLM Framework for Embodied Reasoning arXiv [paper][code] #minecraft #multi-agent #training
  • [2025/04] BookWorld: From Novels to Interactive Agent Societies for Creative Story Generation arXiv [paper] #sim-social #multi-agent #generation
  • [2025/04] LLM-PySC2: Starcraft II learning environment for Large Language Models NeurIPS 2025 poster [paper] #competition #planning #multi-agent #vlm
  • [2025/03] Parallelized Planning-Acting for Efficient LLM-based Multi-Agent Systems arXiv [paper] #minecraft #planning #multi-agent #tool-use
  • [2025/03] Playing games with Large language models: Randomness and strategy arXiv [paper] #multi-agent
  • [2025/03] Society of Mind Meets Real-Time Strategy: A Hierarchical Multi-Agent Framework for Strategic Reasoning COLM 2025 [paper] #competition #multi-agent #training
  • [2025/02] LLM-Powered Decentralized Generative Agents with Adaptive Hierarchical Knowledge Graph for Cooperative Planning arXiv [paper] #crafter #planning #memory #multi-agent
  • [2024/12] TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft arXiv [paper][code] #minecraft #multi-agent #training #vlm
  • [2024/09] Strategist: Self-improvement of LLM Decision Making via Bi-Level Tree Search ICLR 2025 Poster [paper] #communication #planning #multi-agent #training
  • [2024/08] Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information 2024 IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT) 2024 [paper] #competition #planning #multi-agent #training
  • [2024/06] VillagerAgent: A Graph-Based Multi-Agent Framework for Coordinating Complex Task Dependencies in Minecraft Findings of ACL 2024 [paper][code] #minecraft #planning #multi-agent
  • [2024/06] Artificial Leviathan: Exploring Social Evolution of LLM Agents Through the Lens of Hobbesian Social Contract Theory arXiv [paper] #sim-social #multi-agent
  • [2024/06] PLAYER: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games arXiv [paper] #communication #multi-agent
  • [2024/05] Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration ACL 2024 [paper][code] #cooperation #planning #multi-agent #training
  • [2024/05] Reflective Multi-Agent Collaboration based on Large Language Models NeurIPS 2024 poster [paper] #competition #planning #multi-agent #training
  • [2024/05] Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy NeurIPS 2024 poster [paper] #communication #planning #multi-agent #self-improvement
  • [2024/03] MineLand: Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs arXiv [paper][code] #minecraft #multi-agent #vlm
  • [2024/03] Hierarchical Auto-Organizing System for Open-Ended Multi-Agent Navigation (HAS) ICLR 2024 Workshop [paper] #minecraft #planning #multi-agent #vlm
  • [2024/03] Embodied LLM Agents Learn to Cooperate in Organized Teams IEEE Transactions on Computational Social Systems 2024 [paper] #competition #planning #multi-agent
  • [2024/03] Can LLM-Augmented Autonomous Agents Cooperate?, An Evaluation of Their Cooperative Capabilities through Melting Pot IEEE Transactions on Artificial Intelligence 2024 [paper] #cooperation #multi-agent
  • [2024/03] Helmsman of the Masses? Evaluate the Opinion Leadership of Large Language Models in the Werewolf Game COLM 2024 [paper] #communication #multi-agent
  • [2023/12] Cooperation on the Fly: Exploring Language Agents for Ad Hoc Teamwork in the Avalon Game arXiv [paper] #communication #multi-agent
  • [2023/12] Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games ACL 2024 [paper] #communication #multi-agent #prompting
  • [2023/11] War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars arXiv [paper][code] #communication #multi-agent
  • [2023/10] Lyfe Agents: Generative agents for low-cost real-time social interactions arXiv [paper] #sim-social #memory #multi-agent #self-improvement
  • [2023/10] Evaluating Multi-agent Coordination Abilities in Large Language Models NAACL 2025 [paper] #cooperation #planning #multi-agent #prompting
  • [2023/10] Theory of Mind for Multi-Agent Collaboration via Large Language Models EMNLP 2023 [paper][code] #cooperation #planning #multi-agent #training
  • [2023/10] LLM-Based Agent Society Investigation: Collaboration and Confrontation in Avalon Gameplay EMNLP 2024 [paper] #communication #multi-agent
  • [2023/10] Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models arXiv [paper][code] #communication #multi-agent
  • [2023/09] MindAgent: Emergent Gaming Interaction NAACL 2024 [paper] #cooperation #planning #multi-agent #prompting
  • [2023/07] Building Cooperative Embodied Agents Modularly with Large Language Models ICLR 2024 [paper][code] #cooperation #planning #multi-agent #training

world-model

  • [2026/05] PriorZero: Bridging Language Priors and World Models for Decision Making arXiv [paper][code] #text-adventure #planning #world-model #training
  • [2026/02] World Models for Policy Refinement in StarCraft II arXiv [paper] #competition #world-model #prompting
  • [2026/02] CWM: Contrastive World Models for Action Feasibility Learning in Embodied Agent Pipelines arXiv [paper] #text-adventure #planning #world-model #training
  • [2026/02] Reinforcement World Model Learning for LLM-based Agents arXiv [paper] #text-adventure #world-model
  • [2025/11] DR. WELL: Dynamic Reasoning and Learning with Symbolic World Model for Embodied LLM-Based Multi-Agent Collaboration arXiv [paper] #sim-embodied #planning #multi-agent #world-model
  • [2025/09] World Model Implanting for Test-time Adaptation of Embodied Agents ICML 2025 poster [paper] #text-adventure #memory #world-model #prompting
  • [2025/09] DreamPhase: Offline Imagination and Uncertainty-Guided Planning for Large-Language-Model Agents ICLR 2026 Poster [paper] #text-adventure #planning #world-model
  • [2025/07] CoEx -- Co-evolving World-model and Exploration EMNLP 2025 [paper] #text-adventure #planning #world-model
  • [2025/06] Improving LLM Agent Planning with In-Context Learning via Atomic Fact Augmentation and Lookahead Search arXiv [paper] #text-adventure #planning #world-model #training
  • [2025/05] PoE-World: Compositional World Modeling with Products of Programmatic Experts NeurIPS 2025 spotlight [paper] #action #planning #world-model
  • [2025/05] WALL-E: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents NeurIPS 2025 poster [paper] #minecraft #planning #world-model #prompting
  • [2025/04] WALL-E 2.0: World Alignment by NeuroSymbolic Learning Improves World Model-based LLM Agents arXiv [paper][code] #minecraft #planning #world-model #prompting
  • [2025/04] Better Decisions through the Right Causal World Model arXiv [paper] #action #planning #world-model #training
  • [2025/02] Optimus-2: Multimodal World Model for Open-World Minecraft Agents CVPR 2025 [paper][code] #minecraft #planning #world-model #vlm
  • [2024/10] WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents arXiv [paper][code] #minecraft #planning #world-model
  • [2024/05] Agent Planning with World Knowledge Model NeurIPS 2024 [paper][code] #text-adventure #planning #memory #world-model
  • [2023/05] Language Models Meet World Models: Embodied Experiences Enhance Language Models NeurIPS 2023 [paper][code] #sim-embodied #planning #world-model
  • [2023/04] Can Large Language Models Play Text Games Well? Current State-of-the-Art and Open Questions arXiv [paper] #text-adventure #world-model

tool-use

  • [2026/05] Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning arXiv [paper] #text-adventure #tool-use #training
  • [2026/05] Skills on the Fly: Test-Time Adaptive Skill Synthesis for LLM Agents arXiv [paper] #text-adventure #memory #tool-use
  • [2026/05] SkillGraph: Skill-Augmented Reinforcement Learning for Agents via Evolving Skill Graphs arXiv [paper] #text-adventure #memory #tool-use #training
  • [2026/05] SkillOps: Managing LLM Agent Skill Libraries as Self-Maintaining Software Ecosystems arXiv [paper] #text-adventure #planning #memory #tool-use
  • [2026/04] Nemobot Games: Crafting Strategic AI Gaming Agents for Interactive Learning with Large Language Models arXiv [paper] #tool-use #training #self-improvement
  • [2026/02] SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning arXiv [paper][code] #text-adventure #memory #tool-use #training
  • [2025/09] Speculative Actions: A Lossless Framework for Faster AI Agents ICLR 2026 Oral [paper] #competition #tool-use
  • [2025/09] Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning ICLR 2026 Poster [paper] #text-adventure #tool-use #training
  • [2025/08] Vistawise: Building Cost-effective Agent with Cross-modal Knowledge Graph for Minecraft EMNLP 2025 [paper] #minecraft #memory #tool-use
  • [2025/05] Agent-Environment Alignment via Automated Interface Generation arXiv [paper][code] #text-adventure #tool-use
  • [2025/05] Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL NeurIPS 2025 poster [paper] #communication #planning #tool-use #training
  • [2025/03] Parallelized Planning-Acting for Efficient LLM-based Multi-Agent Systems arXiv [paper] #minecraft #planning #multi-agent #tool-use
  • [2025/03] Plancraft: an evaluation dataset for planning with LLM agents COLM 2025 [paper] #minecraft #planning #memory #tool-use
  • [2024/07] AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents ACL 2024 [paper][code] #text-adventure #tool-use
  • [2024/07] Odyssey: Empowering Agents with Open-World Skills IJCAI 2024 [paper][code] #minecraft #planning #tool-use
  • [2024/04] Learning From Failure: Integrating Negative Examples When Fine-tuning Large Language Models as Agent arXiv [paper][code] #text-adventure #tool-use #training
  • [2024/02] RL-GPT: Integrating Reinforcement Learning and Code-as-policy NeurIPS 2024 oral [paper] #minecraft #planning #tool-use #training
  • [2023/08] AgentSims: An Open-Source Sandbox for Large Language Model Evaluation arXiv [paper][code] #sim-social #planning #tool-use
  • [2023/05] VOYAGER: An Open-Ended Embodied Agent with Large Language Models FMDM@NeurIPS2023 [paper][code] #minecraft #tool-use #training

training

  • [2026/05] Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning arXiv [paper] #action #training #vlm
  • [2026/05] T$^2$PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning arXiv [paper][code] #text-adventure #training
  • [2026/05] ANO: A Principled Approach to Robust Policy Optimization arXiv [paper] #action #training
  • [2026/05] From History to State: Constant-Context Skill Learning for LLM Agents arXiv [paper] #text-adventure #planning #training
  • [2026/05] StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction arXiv [paper] #text-adventure #training
  • [2026/05] AEM: Adaptive Entropy Modulation for Multi-Turn Agentic Reinforcement Learning arXiv [paper] #text-adventure #training
  • [2026/05] Evolving-RL: End-to-End Optimization of Experience-Driven Self-Evolving Capability within Agents arXiv [paper] #text-adventure #training #self-improvement
  • [2026/05] Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning arXiv [paper] #text-adventure #tool-use #training
  • [2026/05] SkillMaster: Toward Autonomous Skill Mastery in LLM Agents arXiv [paper] #text-adventure #training
  • [2026/05] Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty arXiv [paper] #sim-embodied #training
  • [2026/05] PriorZero: Bridging Language Priors and World Models for Decision Making arXiv [paper][code] #text-adventure #planning #world-model #training
  • [2026/05] Generalization or Memorization? Brittleness Testing for Chess-Trained Language Models arXiv [paper] #competition #training
  • [2026/05] ALSO: Adversarial Online Strategy Optimization for Social Agents arXiv [paper] #sim-social #multi-agent #training
  • [2026/05] Brain alignment of reasoning and action representations from vision-language and action models during naturalistic gameplay arXiv [paper] #action #planning #training #vlm
  • [2026/05] What and When to Distill: Selective Hindsight Distillation for Multi-Turn Agents arXiv [paper] #text-adventure #training
  • [2026/05] GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents arXiv [paper] #minecraft #training #vlm
  • [2026/05] SkillGraph: Skill-Augmented Reinforcement Learning for Agents via Evolving Skill Graphs arXiv [paper] #text-adventure #memory #tool-use #training
  • [2026/05] Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning arXiv [paper] #text-adventure #training
  • [2026/05] Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL arXiv [paper][code] #text-adventure #training
  • [2026/05] R2V Agent: Teaching SLMs When to Ask for Help arXiv [paper] #text-adventure #training
  • [2026/04] MARL-GPT: Foundation Model for Multi-Agent Reinforcement Learning Proceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems 2026 [paper] #competition #multi-agent #training
  • [2026/04] Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents arXiv [paper][code] #text-adventure #training
  • [2026/04] GRAIL: Autonomous Concept Grounding for Neuro-Symbolic Reinforcement Learning arXiv [paper] #action #training
  • [2026/04] Nemobot Games: Crafting Strategic AI Gaming Agents for Interactive Learning with Large Language Models arXiv [paper] #tool-use #training #self-improvement
  • [2026/04] DPEPO: Diverse Parallel Exploration Policy Optimization for LLM-based Agents arXiv [paper][code] #text-adventure #training
  • [2026/03] On the Strengths and Weaknesses of Data for Open-set Embodied Assistance arXiv [paper] #cooperation #training
  • [2026/03] Hindsight Credit Assignment for Long-Horizon LLM Agents arXiv [paper] #text-adventure #training
  • [2026/03] How Clued up are LLMs? Evaluating Multi-Step Deductive Reasoning in a Text-Based Game Environment arXiv [paper] #text-adventure #multi-agent #training
  • [2026/03] PolicySim: An LLM-Based Agent Social Simulation Sandbox for Proactive Policy Optimization WWW 2026 [paper] #sim-social #training
  • [2026/03] Grounded Chess Reasoning in Language Models via Master Distillation arXiv [paper] #competition #planning #training
  • [2026/03] RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback arXiv [paper] #text-adventure #memory #training #self-improvement
  • [2026/02] VLM-Guided Experience Replay arXiv [paper] #planning #training #vlm
  • [2026/02] Implicit Strategic Optimization: Rethinking Long-Horizon Decision-Making in Adversarial Poker Environments arXiv [paper] #action #training
  • [2026/02] Think Fast and Slow: Step-Level Cognitive Depth Adaptation for LLM Agents arXiv [paper] #text-adventure #planning #training
  • [2026/02] VAM: Verbalized Action Masking for Controllable Exploration in RL Post-Training -- A Chess Case Study arXiv [paper] #competition #training
  • [2026/02] CWM: Contrastive World Models for Action Feasibility Learning in Embodied Agent Pipelines arXiv [paper] #text-adventure #planning #world-model #training
  • [2026/02] SELAUR: Self Evolving LLM Agent via Uncertainty-aware Rewards arXiv [paper] #text-adventure #training
  • [2026/02] SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning arXiv [paper][code] #text-adventure #memory #tool-use #training
  • [2026/01] NitroGen: An Open Foundation Model for Generalist Gaming Agents arXiv [paper] #benchmark #training
  • [2026/01] Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents arXiv [paper] #text-adventure #planning #training
  • [2026/01] Just-In-Time Reinforcement Learning: Continual Learning in LLM Agents Without Gradient Updates arXiv [paper][code] #text-adventure #training
  • [2026/01] HumanLLM: Towards Personalized Understanding and Simulation of Human Nature arXiv [paper] #sim-social #training
  • [2025/12] Synergizing Code Coverage and Gameplay Intent: Coverage-Aware Game Playtesting with LLM-Guided Reinforcement Learning arXiv [paper] #minecraft #training
  • [2025/12] ESearch-R1: Learning Cost-Aware MLLM Agents for Interactive Embodied Search via Reinforcement Learning arXiv [paper] #sim-embodied #planning #memory #training
  • [2025/12] Measuring Fine-Grained Negotiation Tactics of Humans and LLMs in Diplomacy arXiv [paper] #communication #training
  • [2025/12] Robust Agents in Open-Ended Worlds arXiv [paper] #action #multi-agent #training #generation
  • [2025/10] Fine-tuning with RAG for Improving LLM Learning of New Skills arXiv [paper] #text-adventure #planning #memory #training
  • [2025/10] Graph-Enhanced Policy Optimization in LLM Agent Training arXiv [paper] #text-adventure #planning #training
  • [2025/10] SALT: Step-level Advantage Assignment for Long-horizon Agents via Trajectory Graph arXiv [paper] #text-adventure #training
  • [2025/09] HLSMAC: A New StarCraft Multi-Agent Challenge for High-Level Strategic Decision-Making arXiv [paper] #competition #multi-agent #training
  • [2025/09] Exploration with Foundation Models: Capabilities, Limitations, and Hybrid Approaches arXiv [paper] #action #training #vlm
  • [2025/09] Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents arXiv [paper] #text-adventure #training
  • [2025/09] Goal-Guided Efficient Exploration via Large Language Model in Reinforcement Learning arXiv [paper] #crafter #planning #training
  • [2025/09] Reward Is Enough: LLMs Are In-Context Reinforcement Learners ICLR 2026 Poster [paper] #text-adventure #training #self-improvement
  • [2025/09] Spinning Straw into Gold: Relabeling LLM Agent Trajectories in Hindsight for Successful Demonstrations ICLR 2026 Poster [paper] #text-adventure #training
  • [2025/09] SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning ICLR 2026 Poster [paper] #competition #planning #multi-agent #training
  • [2025/09] Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization ICLR 2026 Poster [paper] #text-adventure #memory #training
  • [2025/09] Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning ICLR 2026 Poster [paper] #text-adventure #tool-use #training
  • [2025/08] Enhancing Vision-Language Model Training with Reinforcement Learning in Synthetic Worlds for Real-World Success Proceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems 2026 [paper] #text-adventure #planning #training #vlm
  • [2025/08] Democratizing Diplomacy: A Harness for Evaluating Any Large Language Model on Full-Press Diplomacy AAAI 2025 [paper] #communication #training
  • [2025/08] CausalPlan: Empowering Efficient LLM Multi-Agent Collaboration Through Causality-Driven Planning arXiv [paper] #cooperation #planning #multi-agent #training
  • [2025/08] Learning Game-Playing Agents with Generative Code Optimization arXiv [paper] #action #training #self-improvement
  • [2025/08] SC2Arena and StarEvolve: Benchmark and Self-Improvement Framework for LLMs in Complex Decision-Making Tasks arXiv [paper] #competition #planning #training #self-improvement
  • [2025/08] HERAKLES: Hierarchical Skill Compilation for Open-ended LLM Agents arXiv [paper] #crafter #planning #training
  • [2025/07] LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra arXiv [paper][code] #sim-social #multi-agent #training #role-play
  • [2025/06] Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games arXiv [paper][code] #benchmark #training
  • [2025/06] UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI ICCV 2025 [paper][code] #benchmark #multi-agent #training
  • [2025/06] Empowering Economic Simulation for Massively Multiplayer Online Games through Generative Agent-Based Modeling KDD 2025 [paper] #sim-social #training
  • [2025/06] Enhancing Decision-Making of Large Language Models via Actor-Critic ICML 2025 poster [paper] #text-adventure #training
  • [2025/06] Improving LLM Agent Planning with In-Context Learning via Atomic Fact Augmentation and Lookahead Search arXiv [paper] #text-adventure #planning #world-model #training
  • [2025/06] DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy ICML 2025 poster [paper] #communication #training
  • [2025/06] OmniReflect: Discovering Transferable Constitutions for LLM agents via Neuro-Symbolic Reflections arXiv [paper] #text-adventure #planning #training #self-improvement
  • [2025/06] The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind arXiv [paper] #multi-agent #training
  • [2025/06] GuessBench: Sensemaking Multimodal Creativity in the Wild arXiv [paper] #minecraft #training #vlm
  • [2025/06] KnowMap: Efficient Knowledge-Driven Task Adaptation for LLMs arXiv [paper] #text-adventure #training
  • [2025/05] lmgame-Bench: How Good are LLMs at Playing Games? ICLR 2026 Poster [paper][code] #benchmark #planning #training
  • [2025/05] Frog Soup: Zero-Shot, In-Context, and Sample-Efficient Frogger Agents arXiv [paper][code] #action #training
  • [2025/05] Enfoque Odychess: Un método dialéctico, constructivista y adaptativo para la enseñanza del ajedrez con inteligencias artificiales generativas arXiv [paper] #competition #training
  • [2025/05] Multiple Weaks Win Single Strong: Large Language Models Ensemble Weak Reinforcement Learning Agents into a Supreme One arXiv [paper] #action #training
  • [2025/05] Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning ICML 2025 poster [paper] #text-adventure #planning #training
  • [2025/05] Retrospex: Language Agent Meets Offline Reinforcement Learning Critic EMNLP 2025 [paper] #text-adventure #training
  • [2025/05] SPA-RL: Reinforcing LLM Agents via Stepwise Progress Attribution arXiv [paper][code] #text-adventure #training
  • [2025/05] Prompted Policy Search: Reinforcement Learning through Linguistic and Numerical Reasoning in LLMs NeurIPS 2025 poster [paper] #action #training
  • [2025/05] Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL NeurIPS 2025 poster [paper] #communication #planning #tool-use #training
  • [2025/05] Can Large Language Models Master Complex Card Games? NeurIPS 2025 poster [paper][code] #competition #training
  • [2025/05] MindForge: Empowering Embodied Agents with Theory of Mind for Lifelong Cultural Learning NeurIPS 2025 poster [paper] #minecraft #training
  • [2025/05] SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents NeurIPS 2025 poster [paper] #text-adventure #planning #training #self-improvement
  • [2025/05] LLM-Explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven by Large Language Models NeurIPS 2025 spotlight [paper][code] #action #training
  • [2025/04] Collaborating Action by Action: A Multi-agent LLM Framework for Embodied Reasoning arXiv [paper][code] #minecraft #multi-agent #training
  • [2025/04] Better Decisions through the Right Causal World Model arXiv [paper] #action #planning #world-model #training
  • [2025/04] Enhancing Player Enjoyment with a Two-Tier DRL and LLM-Based Agent System for Fighting Games arXiv [paper] #training
  • [2025/04] Monte Carlo Planning with Large Language Model for Text-Based Game Agents ICLR 2025 Poster [paper] #text-adventure #planning #training
  • [2025/04] MF-LLM: Simulating Population Decision Dynamics via a Mean-Field Large Language Model Framework NeurIPS 2025 poster [paper] #sim-social #planning #training
  • [2025/04] Group-in-Group Policy Optimization for LLM Agent Training NeurIPS 2025 poster [paper] #text-adventure #training
  • [2025/03] GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks CVPR 2025 [paper] #text-adventure #planning #training #vlm
  • [2025/03] Society of Mind Meets Real-Time Strategy: A Hierarchical Multi-Agent Framework for Strategic Reasoning COLM 2025 [paper] #competition #multi-agent #training
  • [2025/02] Process Reward Models for LLM Agents: Practical Framework and Directions arXiv [paper][code] #text-adventure #training
  • [2025/01] POKERBENCH: Training Large Language Models to become Professional Poker Players AAAI 2025 [paper] #competition #planning #training
  • [2025/01] DVM: Towards Controllable LLM Agents in Social Deduction Games IEEE International Conference on Acoustics, Speech, and Signal Processing 2025 [paper] #communication #training
  • [2025/01] Complete Chess Games Enable LLM Become A Chess Master NAACL 2025 [paper] #competition #training
  • [2025/01] LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations ICML 2025 poster [paper] #action #planning #training
  • [2025/01] Language Models as Implicit Tree Search ICML 2025 poster [paper] #competition #planning #training
  • [2025/01] LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence ICML 2025 poster [paper] #minecraft #training
  • [2024/12] TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft arXiv [paper][code] #minecraft #multi-agent #training #vlm
  • [2024/12] Fine-tuning large vision-language models as decision-making agents via reinforcement learning NeurIPS 2024 [paper][code] #text-adventure #training #vlm
  • [2024/09] Can VLMs Play Action Role-Playing Games? Take Black Myth Wukong as a Study Case arXiv [paper][code] #action #planning #training
  • [2024/09] Strategist: Self-improvement of LLM Decision Making via Bi-Level Tree Search ICLR 2025 Poster [paper] #communication #planning #multi-agent #training
  • [2024/09] MaestroMotif: Skill Design from Artificial Intelligence Feedback ICLR 2025 Oral [paper] #action #training
  • [2024/09] Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-Based Decision-Making Systems ICLR 2025 Poster [paper] #sim-embodied #training
  • [2024/09] Better than Your Teacher: LLM Agents that learn from Privileged AI Feedback ICLR 2025 Poster [paper] #text-adventure #planning #training #self-improvement
  • [2024/09] BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games ICLR 2025 Poster [paper] #action #planning #training #vlm
  • [2024/08] Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information 2024 IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT) 2024 [paper] #competition #planning #multi-agent #training
  • [2024/08] Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games arXiv [paper] #action #planning #training
  • [2024/07] OmniJARVIS: Omni-Modal Open-World Agents in Minecraft NeurIPS 2024 [paper][code] #minecraft #training #vlm
  • [2024/06] STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models ACL 2024 [paper][code] #text-adventure #training
  • [2024/05] Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration ACL 2024 [paper][code] #cooperation #planning #multi-agent #training
  • [2024/05] Policy Improvement using Language Feedback Models NeurIPS 2024 poster [paper] #text-adventure #training
  • [2024/05] Learning to Discuss Strategically: A Case Study on One Night Ultimate Werewolf NeurIPS 2024 poster [paper] #communication #training
  • [2024/05] Reflective Multi-Agent Collaboration based on Large Language Models NeurIPS 2024 poster [paper] #competition #planning #multi-agent #training
  • [2024/04] Learning From Failure: Integrating Negative Examples When Fine-tuning Large Language Models as Agent arXiv [paper][code] #text-adventure #tool-use #training
  • [2024/04] ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy arXiv [paper] #text-adventure #planning #training #self-improvement
  • [2024/04] World Models with Hints of Large Language Models for Goal Achieving NAACL 2024 [paper] #crafter #training #vlm
  • [2024/04] Self-playing Adversarial Language Game Enhances LLM Reasoning NeurIPS 2024 [paper][code] #communication #training #self-improvement
  • [2024/03] Language Guided Exploration for RL Agents in Text Environments NAACL 2024 [paper][code] #text-adventure #training
  • [2024/03] Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents ACL 2024 [paper][code] #text-adventure #training #self-improvement
  • [2024/03] EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents COLM 2024 [paper] #crafter #training
  • [2024/03] SOTOPIA-$\pi$: Interactive Learning of Socially Intelligent Language Agents ACL 2024 [paper][code] #sim-social #training
  • [2024/03] Will GPT-4 Run DOOM? IEEE Transactions on Games 2024 [paper][code] #action #planning #training
  • [2024/03] O3D: Offline Data-driven Discovery and Distillation for Sequential Decision-Making with Large Language Models COLM 2024 [paper] #text-adventure #training
  • [2024/03] ReAct Meets ActRe: Autonomous Annotation of Agent Trajectories for Contrastive Self-Training COLM 2024 [paper] #text-adventure #planning #training #self-improvement
  • [2024/02] RL-GPT: Integrating Reinforcement Learning and Code-as-policy NeurIPS 2024 oral [paper] #minecraft #planning #tool-use #training
  • [2024/02] PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models TOIT 2025 [paper][code] #competition #training
  • [2024/02] Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization ACL 2024 [paper][code] #competition #training
  • [2024/02] Enhance Reasoning for Large Language Models in the Game Werewolf arXiv [paper] #communication #training
  • [2024/01] True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning arXiv [paper][code] #sim-embodied #training
  • [2024/01] PokerGPT: An End-to-End Lightweight Solver for Multi-Player Texas Hold'em via Large Language Model arXiv [paper] #competition #training
  • [2024/01] SwarmBrain: Embodied agent for real-time strategy game StarCraft II via large language models arXiv [paper] #competition #training
  • [2023/12] Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft CVPR 2024 [paper] #minecraft #training
  • [2023/10] FireAct: Toward Language Agent Fine-tuning arXiv [paper][code] #text-adventure #training
  • [2023/10] Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models ICML 2024 [paper][code] #text-adventure #planning #training
  • [2023/10] LLaMA Rider: Spurring Large Language Models to Explore the Open World NAACL 2024 [paper][code] #minecraft #planning #training
  • [2023/10] Octopus: Embodied Vision-Language Programmer from Environmental Feedback ECCV 2024 [paper][code] #sim-embodied #planning #training #vlm
  • [2023/10] Theory of Mind for Multi-Agent Collaboration via Large Language Models EMNLP 2023 [paper][code] #cooperation #planning #multi-agent #training
  • [2023/10] Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game ICML 2024 [paper] #communication #training
  • [2023/10] Avalon's Game of Thoughts: Battle Against Deception through Recursive Contemplation arXiv [paper] #communication #training
  • [2023/09] Motif: Intrinsic Motivation from Artificial Intelligence Feedback ICLR 2024 [paper][code] #video-adventure #training
  • [2023/09] AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback NAACL 2024 [paper] #crafter #training
  • [2023/07] Building Cooperative Embodied Agents Modularly with Large Language Models ICLR 2024 [paper][code] #cooperation #planning #multi-agent #training
  • [2023/05] Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory arXiv [paper] #minecraft #training
  • [2023/05] VOYAGER: An Open-Ended Embodied Agent with Large Language Models FMDM@NeurIPS2023 [paper][code] #minecraft #tool-use #training
  • [2023/03] Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks FMDM@NeurIPS2023 [paper][code] #minecraft #planning #training
  • [2023/03] Reflexion: Language Agents with Verbal Reinforcement Learning NeurIPS 2023 [paper][code] #text-adventure #training #self-improvement
  • [2023/02] Guiding Pretraining in Reinforcement Learning with Large Language Models ICML 2023 [paper] #crafter #training
  • [2023/02] Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning ICML 2023 [paper][code] #action #training
  • [2022/10] ReAct: Synergizing Reasoning and Acting in Language Models ICLR 2023 [paper][code] #text-adventure #planning #training

self-improvement

  • [2026/05] Evolving-RL: End-to-End Optimization of Experience-Driven Self-Evolving Capability within Agents arXiv [paper] #text-adventure #training #self-improvement
  • [2026/05] APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents arXiv [paper] #text-adventure #planning #self-improvement
  • [2026/05] Ratchet: A Minimal Hygiene Recipe for Self-Evolving LLM Agents arXiv [paper] #minecraft #self-improvement
  • [2026/04] Nemobot Games: Crafting Strategic AI Gaming Agents for Interactive Learning with Large Language Models arXiv [paper] #tool-use #training #self-improvement
  • [2026/04] GraSP: Graph-Structured Skill Compositions for LLM Agents arXiv [paper] #text-adventure #planning #memory #self-improvement
  • [2026/03] RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback arXiv [paper] #text-adventure #memory #training #self-improvement
  • [2026/02] MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents arXiv [paper] #text-adventure #self-improvement
  • [2026/02] The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies arXiv [paper] #sim-social #multi-agent #self-improvement
  • [2025/10] Alita-G: Self-Evolving Generative Agent for Agent Generation arXiv [paper] #sim-social #memory #self-improvement
  • [2025/10] Beyond Survival: Evaluating LLMs in Social Deduction Games with Human-Aligned Strategies arXiv [paper] #communication #multi-agent #self-improvement
  • [2025/09] Meta-Policy Reflexion: Reusable Reflective Memory and Rule Admissibility for Resource-Efficient LLM Agent arXiv [paper] #text-adventure #planning #multi-agent #self-improvement
  • [2025/09] PillagerBench: Benchmarking LLM-Based Agents in Competitive Minecraft Team Environments 2025 IEEE Conference on Games (CoG) 2025 [paper] #minecraft #multi-agent #self-improvement
  • [2025/09] Code Driven Planning with Domain-Adaptive Critic arXiv [paper] #text-adventure #planning #self-improvement
  • [2025/09] Reward Is Enough: LLMs Are In-Context Reinforcement Learners ICLR 2026 Poster [paper] #text-adventure #training #self-improvement
  • [2025/08] Learning Game-Playing Agents with Generative Code Optimization arXiv [paper] #action #training #self-improvement
  • [2025/08] SC2Arena and StarEvolve: Benchmark and Self-Improvement Framework for LLMs in Complex Decision-Making Tasks arXiv [paper] #competition #planning #training #self-improvement
  • [2025/06] Automated Skill Discovery for Language Agents through Exploration and Iterative Feedback arXiv [paper] #crafter #self-improvement
  • [2025/06] OmniReflect: Discovering Transferable Constitutions for LLM agents via Neuro-Symbolic Reflections arXiv [paper] #text-adventure #planning #training #self-improvement
  • [2025/05] SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents NeurIPS 2025 poster [paper] #text-adventure #planning #training #self-improvement
  • [2025/02] TextGames: Learning to Self-Play Text-Based Puzzle Games via Language Model Reasoning arXiv [paper] #text-adventure #self-improvement
  • [2024/12] Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy NeurIPS 2024 [paper][code] #communication #planning #multi-agent #self-improvement
  • [2024/09] Better than Your Teacher: LLM Agents that learn from Privileged AI Feedback ICLR 2025 Poster [paper] #text-adventure #planning #training #self-improvement
  • [2024/06] Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement EMNLP 2024 [paper][code] #text-adventure #self-improvement
  • [2024/04] ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy arXiv [paper] #text-adventure #planning #training #self-improvement
  • [2024/04] Self-playing Adversarial Language Game Enhances LLM Reasoning NeurIPS 2024 [paper][code] #communication #training #self-improvement
  • [2024/03] Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents ACL 2024 [paper][code] #text-adventure #training #self-improvement
  • [2024/03] ReAct Meets ActRe: Autonomous Annotation of Agent Trajectories for Contrastive Self-Training COLM [paper] #text-adventure #planning #training #self-improvement
  • [2024/03] StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows COLM 2024 [paper] #text-adventure #planning #self-improvement
  • [2024/02] Soft Self-Consistency Improves Language Model Agents arXiv [paper][code] #text-adventure #self-improvement
  • [2024/02] Empowering Large Language Model Agents through Action Learning COLM 2024 [paper][code] #text-adventure #planning #self-improvement
  • [2023/10] Lyfe Agents: Generative agents for low-cost real-time social interactions arXiv [paper] #sim-social #memory #multi-agent #self-improvement
  • [2023/03] Reflexion: Language Agents with Verbal Reinforcement Learning NeurIPS 2023 [paper][code] #text-adventure #training #self-improvement

prompting

  • [2026/05] GAMBIT: A Three-Mode Benchmark for Adversarial Robustness in Multi-Agent LLM Collectives arXiv [paper] #competition #multi-agent #prompting
  • [2026/04] DORA Explorer: Improving the Exploration Ability of LLMs Without Training arXiv [paper] #text-adventure #planning #prompting
  • [2026/03] GTO Wizard Benchmark arXiv [paper] #competition #planning #multi-agent #prompting
  • [2026/03] Reward Prediction with Factorized World States arXiv [paper] #text-adventure #planning #prompting
  • [2026/02] World Models for Policy Refinement in StarCraft II arXiv [paper] #competition #world-model #prompting
  • [2026/02] To Move or Not to Move: Constraint-based Planning Enables Zero-Shot Generalization for Interactive Navigation arXiv [paper] #sim-embodied #planning #prompting #vlm
  • [2026/01] Beyond Entangled Planning: Task-Decoupled Planning for Long-Horizon Agents arXiv [paper] #text-adventure #planning #prompting
  • [2025/11] SkillGen: Learning Domain Skills for In-Context Sequential Decision Making arXiv [paper] #text-adventure #prompting
  • [2025/11] MADRA: Multi-Agent Debate for Risk-Aware Embodied Planning arXiv [paper] #sim-embodied #planning #multi-agent #prompting
  • [2025/10] Constrained Natural Language Action Planning for Resilient Embodied Systems arXiv [paper] #text-adventure #planning #prompting
  • [2025/09] World Model Implanting for Test-time Adaptation of Embodied Agents ICML 2025 poster [paper] #text-adventure #memory #world-model #prompting
  • [2025/09] Test-Time Mixture of World Models for Embodied Agents in Dynamic Environments ICLR 2026 Poster [paper] #text-adventure #prompting
  • [2025/07] Strategy Adaptation in Large Language Model Werewolf Agents arXiv [paper] #communication #prompting
  • [2025/05] Training LLM-Based Agents with Synthetic Self-Reflected Trajectories and Partial Masking arXiv [paper] #text-adventure #prompting
  • [2025/05] ActiveVOO: Value of Observation Guided Active Knowledge Acquisition for Open-World Embodied Lifted Regression Planning NeurIPS 2025 poster [paper] #text-adventure #planning #prompting #vlm
  • [2025/05] WALL-E: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents NeurIPS 2025 poster [paper] #minecraft #planning #world-model #prompting
  • [2025/04] WALL-E 2.0: World Alignment by NeuroSymbolic Learning Improves World Model-based LLM Agents arXiv [paper][code] #minecraft #planning #world-model #prompting
  • [2024/08] Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-horizon Tasks NeurIPS 2024 [paper][code] #minecraft #planning #memory #prompting
  • [2024/05] THREAD: Thinking Deeper with Recursive Spawning NAACL 2025 [paper] #text-adventure #prompting
  • [2024/03] Playing NetHack with LLMs: Potential & Limitations as Zero-Shot Agents 2024 IEEE Conference on Games (CoG) 2024 [paper][code] #video-adventure #planning #prompting
  • [2023/12] Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games ACL 2024 [paper] #communication #multi-agent #prompting
  • [2023/10] Evaluating Multi-agent Coordination Abilities in Large Language Models NAACL 2025 [paper] #cooperation #planning #multi-agent #prompting
  • [2023/09] Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4 COLM 2024 [paper] #competition #planning #memory #prompting
  • [2023/09] MindAgent: Emergent Gaming Interaction NAACL 2024 [paper] #cooperation #planning #multi-agent #prompting
  • [2023/07] S3: Social-network Simulation System with Large Language Model-Empowered Agents arXiv [paper] #sim-social #prompting
  • [2023/02] Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents NeurIPS 2023 [paper][code] #minecraft #planning #prompting
  • [2022/12] LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models ICCV 2023 [paper] #sim-embodied #planning #prompting

role-play

  • [2026/05] PAVE: A Cognitive Architecture for Legitimate Violation in Generative Agent Societies arXiv [paper] #sim-social #role-play
  • [2026/04] Restoring Heterogeneity in LLM-based Social Simulation: An Audience Segmentation Approach arXiv [paper] #sim-social #role-play
  • [2026/03] Enhancing Consistency of Werewolf AI through Dialogue Summarization and Persona Information arXiv [paper] #communication #role-play
  • [2025/10] ROBOPSY PL[AI]: Using Role-Play to Investigate how LLMs Present Collective Memory arXiv [paper] #role-play
  • [2025/09] Implicit Behavioral Alignment of Language Agents in High-Stakes Crowd Simulations EMNLP 2025 [paper] #sim-social #role-play
  • [2025/07] LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra arXiv [paper][code] #sim-social #multi-agent #training #role-play
  • [2025/07] Validating Generative Agent-Based Models of Social Norm Enforcement: From Replication to Novel Predictions Annual Meeting of the Cognitive Science Society 2025 [paper] #sim-social #role-play
  • [2025/05] EcoLANG: Efficient and Effective Agent Communication Language Induction for Social Simulation EMNLP 2025 [paper] #sim-social #role-play
  • [2023/10] SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents ICLR 2024 [paper][code] #sim-social #role-play

vlm

  • [2026/05] Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning arXiv [paper] #action #training #vlm
  • [2026/05] PRISM: Perception Reasoning Interleaved for Sequential Decision Making arXiv [paper] #text-adventure #vlm
  • [2026/05] Brain alignment of reasoning and action representations from vision-language and action models during naturalistic gameplay arXiv [paper] #action #planning #training #vlm
  • [2026/05] GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents arXiv [paper] #minecraft #training #vlm
  • [2026/04] GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents arXiv [paper] #planning #vlm
  • [2026/04] PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models arXiv [paper] #action #planning #vlm
  • [2026/04] Gated Coordination for Efficient Multi-Agent Collaboration in Minecraft Game arXiv [paper] #minecraft #memory #multi-agent #vlm
  • [2026/03] BLOCK: An Open-Source Bi-Stage MLLM Character-to-Skin Pipeline for Minecraft arXiv [paper] #minecraft #vlm
  • [2026/03] See, Symbolize, Act: Grounding VLMs with Spatial Representations for Better Gameplay arXiv [paper] #action #vlm
  • [2026/02] VLM-Guided Experience Replay arXiv [paper] #planning #training #vlm
  • [2026/02] To Move or Not to Move: Constraint-based Planning Enables Zero-Shot Generalization for Interactive Navigation arXiv [paper] #sim-embodied #planning #prompting #vlm
  • [2025/10] GenQuest: An LLM-based Text Adventure Game for Language Learners arXiv [paper] #text-adventure #vlm #generation
  • [2025/10] Multimodal Safety Evaluation in Generative Agent Social Simulations arXiv [paper] #sim-social #planning #vlm
  • [2025/09] Exploration with Foundation Models: Capabilities, Limitations, and Hybrid Approaches arXiv [paper] #action #training #vlm
  • [2025/09] Natural Language PDDL (NL-PDDL) for Open-world Goal-oriented Commonsense Regression Planning in Embodied AI ICLR 2026 Poster [paper] #text-adventure #planning #vlm
  • [2025/08] Enhancing Vision-Language Model Training with Reinforcement Learning in Synthetic Worlds for Real-World Success Proceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems 2026 [paper] #text-adventure #planning #training #vlm
  • [2025/07] VoyagerVision: Investigating the Role of Multi-modal Information for Open-ended Learning Systems arXiv [paper] #minecraft #vlm
  • [2025/06] GuessBench: Sensemaking Multimodal Creativity in the Wild arXiv [paper] #minecraft #training #vlm
  • [2025/05] Don’t Just Follow MLLM Plans: Robust and Efficient Planning for Open-World Agents arXiv [paper] #minecraft #planning #vlm
  • [2025/05] ActiveVOO: Value of Observation Guided Active Knowledge Acquisition for Open-World Embodied Lifted Regression Planning NeurIPS 2025 poster [paper] #text-adventure #planning #prompting #vlm
  • [2025/04] LLM-PySC2: Starcraft II learning environment for Large Language Models NeurIPS 2025 poster [paper] #competition #planning #multi-agent #vlm
  • [2025/03] GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks CVPR 2025 [paper] #text-adventure #planning #training #vlm
  • [2025/03] Cultivating Game Sense for Yourself: Making VLMs Gaming Experts arXiv [paper] #vlm
  • [2025/02] Optimus-2: Multimodal World Model for Open-World Minecraft Agents CVPR 2025 [paper][code] #minecraft #planning #world-model #vlm
  • [2024/12] TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft arXiv [paper][code] #minecraft #multi-agent #training #vlm
  • [2024/12] Fine-tuning large vision-language models as decision-making agents via reinforcement learning NeurIPS 2024 [paper][code] #text-adventure #training #vlm
  • [2024/09] BadRobot: Jailbreaking Embodied LLM Agents in the Physical World ICLR 2025 Poster [paper] #sim-embodied #planning #vlm
  • [2024/09] GameGen-X: Interactive Open-world Game Video Generation ICLR 2025 Poster [paper][code] #sim-embodied #vlm
  • [2024/09] BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games ICLR 2025 Poster [paper] #action #planning #training #vlm
  • [2024/07] OmniJARVIS: Omni-Modal Open-World Agents in Minecraft NeurIPS 2024 [paper][code] #minecraft #training #vlm
  • [2024/07] Baba Is AI: Break the Rules to Beat the Benchmark ICML 2024 [paper] #action #vlm
  • [2024/04] World Models with Hints of Large Language Models for Goal Achieving NAACL 2024 [paper] #crafter #training #vlm
  • [2024/03] MineLand: Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs arXiv [paper][code] #minecraft #multi-agent #vlm
  • [2024/03] Hierarchical Auto-Organizing System for Open-Ended Multi-Agent Navigation (HAS) ICLR 2024 Workshop [paper] #minecraft #planning #multi-agent #vlm
  • [2023/12] MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception CVPR 2024 [paper][code] #minecraft #planning #vlm
  • [2023/10] Octopus: Embodied Vision-Language Programmer from Environmental Feedback ECCV 2024 [paper][code] #sim-embodied #planning #training #vlm
  • [2023/10] Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds ICLR 2024 [paper] #minecraft #vlm

generation

  • [2026/04] From World-Gen to Quest-Line: A Dependency-Driven Prompt Pipeline for Coherent RPG Generation arXiv [paper] #planning #generation
  • [2026/01] SNAP: A Plan-Driven Framework for Controllable Interactive Narrative Generation arXiv [paper] #planning #generation
  • [2025/12] Robust Agents in Open-Ended Worlds arXiv [paper] #action #multi-agent #training #generation
  • [2025/10] GenQuest: An LLM-based Text Adventure Game for Language Learners arXiv [paper] #text-adventure #vlm #generation
  • [2025/08] All Stories Are One Story: Emotional Arc Guided Procedural Game Level Generation arXiv [paper] #generation
  • [2025/05] STORY2GAME: Generating (Almost) Everything in an Interactive Fiction Game arXiv [paper] #text-adventure #generation
  • [2025/04] BookWorld: From Novels to Interactive Agent Societies for Creative Story Generation arXiv [paper] #sim-social #multi-agent #generation
  • [2025/03] Word2Minecraft: Generating 3D Game Levels through Large Language Models arXiv [paper][code] #minecraft #generation
  • [2024/09] Agents' Room: Narrative Generation through Multi-step Collaboration ICLR 2025 Poster [paper] #generation
  • [2024/07] What if Red Can Talk? Dynamic Dialogue Generation Using Large Language Models arXiv [paper] #generation
  • [2023/10] Language as reality: a co-creative storytelling game experience in 1001 nights using generative AI AIIDE 2023 [paper] #generation

Citation

If you find this repository useful, please cite our paper. We will periodically check for new papers citing the survey and update this list and the survey if relevant.

@misc{hu2024survey,
      title={A Survey on Large Language Model-Based Game Agents},
      author={Sihao Hu and Tiansheng Huang and Fatih Ilhan and Selim Tekin and Gaowen Liu and Ramana Kompella and Ling Liu},
      year={2024},
      eprint={2404.02039},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}