Commit 6ae4a93

feat: replace controversial examples with neutral alternatives (#1844)
1 parent f522b0c commit 6ae4a93

4 files changed

Lines changed: 18 additions & 18 deletions

benchmark/RAG/README.md

Lines changed: 5 additions & 5 deletions
@@ -428,11 +428,11 @@ Example (single result):

 ```json
 {
-  "_global_index": 2,
-  "question": "When did Caroline go to the LGBTQ support group?",
-  "gold_answers": ["7 May 2023"],
+  "_global_index": 18,
+  "question": "When did Melanie sign up for a pottery class?",
+  "gold_answers": ["2 July 2023"],
   "llm": {
-    "final_answer": "7 May 2023 (the day before the chat at 1:56 pm on 8 May, 2023)"
+    "final_answer": "2 July 2023 (mentioned in the conversation on 3 July 2023)"
   },
   "metrics": {
     "Recall": 1.0,
@@ -441,7 +441,7 @@ Example (single result):
   },
   "llm_evaluation": {
     "prompt_used": "Locomo_0or4",
-    "reasoning": "The generated answer explicitly includes the exact date 7 May 2023 that matches the gold answer...",
+    "reasoning": "The generated answer explicitly includes the exact date 2 July 2023 that matches the gold answer...",
     "normalized_score": 4
   }
 }
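
Reviewer note: the updated example still pairs a gold answer with a final answer that contains it, so the illustrated `"Recall": 1.0` remains plausible. The sketch below checks this with a simple verbatim-substring recall. That metric and the name `substring_recall` are assumptions for illustration; the diff does not show how the benchmark actually computes `Recall`. The JSON is an abridged copy of the new example, with the fields elided by the diff left out.

```python
import json

# Abridged copy of the updated README example (fields elided in the diff are omitted).
RESULT_JSON = """
{
  "_global_index": 18,
  "question": "When did Melanie sign up for a pottery class?",
  "gold_answers": ["2 July 2023"],
  "llm": {
    "final_answer": "2 July 2023 (mentioned in the conversation on 3 July 2023)"
  },
  "llm_evaluation": {
    "prompt_used": "Locomo_0or4",
    "normalized_score": 4
  }
}
"""


def substring_recall(result: dict) -> float:
    """Hypothetical metric: share of gold answers found verbatim in the final answer."""
    answer = result["llm"]["final_answer"].lower()
    golds = result["gold_answers"]
    if not golds:
        return 0.0
    hits = sum(1 for gold in golds if gold.lower() in answer)
    return hits / len(golds)


result = json.loads(RESULT_JSON)
print(substring_recall(result))  # 1.0
```

A substring check is deliberately strict; it would miss a correct answer written as "July 2, 2023", which is presumably why the example also carries an LLM-judged `normalized_score`.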

benchmark/RAG/README_zh.md

Lines changed: 5 additions & 5 deletions
@@ -428,11 +428,11 @@ datasets/{dataset_name}/viking_store_index_dir

 ```json
 {
-  "_global_index": 2,
-  "question": "When did Caroline go to the LGBTQ support group?",
-  "gold_answers": ["7 May 2023"],
+  "_global_index": 18,
+  "question": "When did Melanie sign up for a pottery class?",
+  "gold_answers": ["2 July 2023"],
   "llm": {
-    "final_answer": "7 May 2023 (the day before the chat at 1:56 pm on 8 May, 2023)"
+    "final_answer": "2 July 2023 (mentioned in the conversation on 3 July 2023)"
   },
   "metrics": {
     "Recall": 1.0,
@@ -441,7 +441,7 @@ datasets/{dataset_name}/viking_store_index_dir
   },
   "llm_evaluation": {
     "prompt_used": "Locomo_0or4",
-    "reasoning": "The generated answer explicitly includes the exact date 7 May 2023 that matches the gold answer...",
+    "reasoning": "The generated answer explicitly includes the exact date 2 July 2023 that matches the gold answer...",
     "normalized_score": 4
   }
 }

openviking/prompts/templates/compression/ov_wm_v2_update.yaml

Lines changed: 4 additions & 4 deletions
@@ -44,7 +44,7 @@ template: |
 The goal is a document that lets a future assistant seamlessly continue the
 conversation. Concrete facts (names, dates, decisions, relationships, exact
 values) MUST survive. But repetitive events of the same type should be
-consolidated into patterns — "Caroline attended 5 pride parades" is better
+consolidated into patterns — "Tim competed in 5 swimming contests" is better
 than 5 separate entries. The test: can a new assistant answer the user's
 likely follow-up questions from this WM alone?

@@ -115,9 +115,9 @@ template: |
 2. Preserve: names, dates, exact numbers, decisions with rationale,
 relationships, preferences. These are non-negotiable anchors.
 3. Drop: intermediate chronological steps that don't change the
-current state (e.g., 5 separate "attended pride parade" entries
-become "Caroline regularly attends pride parades; most recently
-on [date], felt [emotion]").
+current state (e.g., 5 separate "competed in swimming contest" entries
+become "Tim regularly competes in swimming contests; won 2 gold
+medals, most recently on [date]").
 4. The consolidation target is a "continuation-grade" fact list:
 enough to answer follow-up questions without re-reading history,
 but not a verbatim transcript of every event.
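
Reviewer note: the consolidation rule in this template (collapse repeated events of one type into a single pattern line, keeping the most recent date) can be sketched deterministically as below. This is only an illustration of the intended outcome; in the project the consolidation is done by the prompted model. The `Event` type, the `consolidate` function, the `min_repeats` threshold, and the sample dates are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    person: str
    activity: str  # e.g. "swimming contest"
    on: date


def consolidate(events: list[Event], min_repeats: int = 2) -> list[str]:
    """Collapse repeated (person, activity) events into one pattern line each."""
    groups: dict[tuple[str, str], list[Event]] = defaultdict(list)
    for ev in events:
        groups[(ev.person, ev.activity)].append(ev)

    lines = []
    for (person, activity), evs in groups.items():
        latest = max(ev.on for ev in evs)  # the date anchor that must survive
        if len(evs) >= min_repeats:
            lines.append(
                f"- {person}: {len(evs)} x {activity}; most recently on {latest.isoformat()}."
            )
        else:
            # One-off events are kept as individual facts.
            lines.append(f"- {person}: {activity} on {latest.isoformat()}.")
    return lines


# Hypothetical sample data, not taken from the repository.
events = [
    Event("Tim", "swimming contest", date(2023, 6, 10)),
    Event("Tim", "swimming contest", date(2024, 6, 9)),
    Event("Tim", "pottery class", date(2023, 7, 2)),
]
for line in consolidate(events):
    print(line)
```

Grouping on the (person, activity) pair keeps the name and the latest date as anchors while dropping the intermediate entries, which is the trade-off rule 3 describes.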

tests/unit/session/test_working_memory_growth.py

Lines changed: 4 additions & 4 deletions
@@ -148,8 +148,8 @@ def test_key_facts_consolidation_rejected_when_trivially_small():
 _RICH_KEY_FACTS = "\n".join([
     "- Caroline adopted a rescue dog named Biscuit on 15 March 2024.",
     "- Melanie's family took 3 camping trips to Yellowstone because the kids love hiking.",
-    "- Caroline attended pride parade on 10 June 2023, felt empowered.",
-    "- Caroline attended pride parade on 9 June 2024, brought friends.",
+    "- Caroline competed in swimming contest on 10 June 2023, won gold medal.",
+    "- Caroline competed in swimming contest on 9 June 2024, brought friends to cheer.",
     "- Sweden trip planned for August 2025 with budget of 5000 dollars.",
     "- Decided to use Python because the team has 4 years experience.",
     "- Marcus is Caroline's brother, lives in Portland.",
@@ -191,7 +191,7 @@ def test_key_facts_consolidation_rejected_when_trivially_small():

 _GOOD_CONSOLIDATION = "\n".join([
     "- Caroline adopted rescue dog Biscuit (golden retriever mix, 3 years old) on 15 March 2024; vet at Pawsome Clinic every 6 months. Neighbor Jake (retired teacher, next door since 2020) offered to pet-sit during Sweden trip.",
-    "- Caroline regularly attends pride parades; most recently on 9 June 2024, brought friends.",
+    "- Caroline regularly competes in swimming contests; most recently on 9 June 2024, won 2 gold medals total.",
     "- Caroline's art: switched from acrylic to watercolor (March 2024) because of studio ventilation; exhibition on 22 November 2024 at Gallery One (12 paintings, 500 dollars space); studio in garage, converted 2023. Budget 200 dollars/month.",
     "- Caroline took ceramics class starting January 2025; resolved to run half-marathon Spring 2025 (target under 2 hours, runs 5 miles mornings along river trail, group meets Wednesdays 6 AM).",
     "- Caroline volunteers at shelter every Saturday morning.",
@@ -264,7 +264,7 @@ def test_key_facts_consolidation_accepted_at_low_volume_with_anchors():

     consolidated = "\n".join([
         "- Caroline adopted Biscuit (golden retriever, 3 years old) on 15 March 2024; vet every 6 months at Pawsome Clinic.",
-        "- Caroline attends pride parades regularly; latest 9 June 2024.",
+        "- Caroline competes in swimming contests regularly; latest 9 June 2024, 2 gold medals total.",
         "- Caroline's art: watercolor (switched March 2024 because ventilation); exhibition 22 November 2024 at Gallery One, 12 paintings, 500 dollars. Studio in garage since 2023. Budget 200 dollars/month. Also took ceramics January 2025.",
         "- Caroline runs half-marathon Spring 2025, target under 2 hours; runs 5 miles mornings, group Wednesdays 6 AM. Volunteers shelter Saturdays.",
         "- Marcus: Caroline's brother, Portland. Joint savings, Sunday calls. Committed to Sweden trip help.",

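Reviewer note: these fixtures rely on date anchors surviving consolidation, so it may help to see which dates the rewritten strings keep. The sketch below compares "D Month YYYY" dates between two of the new fixture lines and the new consolidated line. The regex and the helpers `date_anchors` and `missing_dates` are assumptions for illustration, not the validation logic under test, which this diff does not show.

```python
import re

# Matches dates written as "9 June 2024" (the style used in the fixtures).
DATE_RE = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{4}\b"
)


def date_anchors(text: str) -> set[str]:
    """Return every 'D Month YYYY' date found in the text."""
    return set(DATE_RE.findall(text))


def missing_dates(original: str, consolidated: str) -> set[str]:
    """Dates present in the original facts but absent after consolidation."""
    return date_anchors(original) - date_anchors(consolidated)


original = "\n".join([
    "- Caroline competed in swimming contest on 10 June 2023, won gold medal.",
    "- Caroline competed in swimming contest on 9 June 2024, brought friends to cheer.",
])
consolidated = (
    "- Caroline competes in swimming contests regularly; "
    "latest 9 June 2024, 2 gold medals total."
)

print(missing_dates(original, consolidated))  # {'10 June 2023'}
```

The earlier contest date is dropped while the most recent one is kept, which matches the "Drop: intermediate chronological steps" rule in the prompt template changed by this same commit.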