You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: real-multi-round-qa/README.md
+31-7Lines changed: 31 additions & 7 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -1,13 +1,37 @@
1
-
# Real Multi-Round QA Benchmark
1
+
# CxS: Real Multi-Round QA Benchmark
2
2
3
3
## Overview
4
4
5
-
This benchmark is designed to identify **the maximum number of user sessions ($C\times S$) that can be kept active while maintaining a steady-state TTFT ≤ 2 s (95-th percentile)**. By sweeping the concurrency (C) and sequential (S) independently, it isolates whether compute capacity or KV-cache pressure is the first limiting factor.
5
+
This benchmark is designed to identify **the maximum harmonic mean of user sessions $(C,S)$ that can be kept active while maintaining a steady-state TTFT ≤ 2 s (95-th percentile)**. By sweeping the concurrency ($C$) and sequential ($S$) independently, it isolates whether compute capacity or KV-cache pressure is the first limiting factor.
6
+
6
7
7
8
We highly recommend monitoring vLLM/LMCache/GPU/storage metrics at the same time.
8
9
9
10
This benchmark feeds full‑length novels to your LLM server and asks many follow‑up questions, just like a book critic. It is handy for testing long‑context handling and KV‑cache tools such as LMCache.
10
11
12
+
The benchmark is called CxS (pronounced six for simplicity), referring to the product of Concurrent $\times$ Sequential users.
0 commit comments