Commit 2ebb14e

feat(doc): update README.md (#204)
* Update README.md
* Update README.md
1 parent d51cdf0 commit 2ebb14e

1 file changed

Lines changed: 12 additions & 2 deletions

README.md

@@ -2,7 +2,7 @@
 
 <h1 align="center"> Sotopia-RL: Reward Design for Social Intelligence</h1>
 
-[![Project Page](https://img.shields.io/badge/Project-Page-green.svg)](https://rl.sotopia.world/) ![Paper PDF](https://img.shields.io/badge/Paper-PDF-red.svg) [![huggingface](https://img.shields.io/badge/%F0%9F%A4%97-Model-orange)](https://huggingface.co/ulab-ai/sotopia-rl-qwen-2.5-7B-grpo) [![Python 3.10](https://img.shields.io/badge/python-%E2%89%A53.10-blue)](https://www.python.org/downloads/release/python-3109/) [![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://pre-commit.com/) <a href="https://github.com/psf/black"><img alt="Code style: black" src="https://img.shields.io/badge/code%20style-black-000000.svg"></a> ![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-blue.svg)
+[![Project Page](https://img.shields.io/badge/Project-Page-green.svg)](https://rl.sotopia.world/) [![Paper PDF](https://img.shields.io/badge/Paper-PDF-red.svg)](https://arxiv.org/abs/2508.03905) [![huggingface](https://img.shields.io/badge/%F0%9F%A4%97-Model-orange)](https://huggingface.co/ulab-ai/sotopia-rl-qwen-2.5-7B-grpo) [![Python 3.10](https://img.shields.io/badge/python-%E2%89%A53.10-blue)](https://www.python.org/downloads/release/python-3109/) [![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://pre-commit.com/) <a href="https://github.com/psf/black"><img alt="Code style: black" src="https://img.shields.io/badge/code%20style-black-000000.svg"></a> ![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-blue.svg)
 
 
 ## 📚 Table of Contents
@@ -21,7 +21,17 @@
 
 We first attribute episode-level rewards for multi-turn social interactions to individual utterances with LLMs. Then, we construct a combined reward that includes multiple dimensions of rewards besides goal completion, allowing us to regularize the optimization process for goal completion. These rewards are used to guide the RL training of social agents.
 
-
+```
+@misc{yu2025sotopiarlrewarddesignsocial,
+title={Sotopia-RL: Reward Design for Social Intelligence},
+author={Haofei Yu and Zhengyang Qi and Yining Zhao and Kolby Nottingham and Keyang Xuan and Bodhisattwa Prasad Majumder and Hao Zhu and Paul Pu Liang and Jiaxuan You},
+year={2025},
+eprint={2508.03905},
+archivePrefix={arXiv},
+primaryClass={cs.CL},
+url={https://arxiv.org/abs/2508.03905}
+}
+```
 
 ![sotopia-rl](assets/sotopia_method.jpg)
 