feat: add EdTech student-tutor benchmark scenario #18

Merged
Neal006 merged 1 commit into Neal006:main from Priyanshu-byte-coder:feat/edtech-scenario on May 24, 2026

@Priyanshu-byte-coder
Contributor

Summary

Implements the EdTech domain scenario requested in issues #13 and #4.

What's added

simulator/scenarios/edtech.py

  • EDTECH_FACTS — 8 facts about a student persona (name, grade, school, favourite subject, weakest subject, GPA, learning style, study hours). Four facts have mid-conversation updates simulating real learning progressions (grade advancement, subject switch, GPA improvement, learning-style evolution).
  • EDTECH_FILLER_TURNS — 20 domain-specific tutoring requests ("Can you explain the Pythagorean theorem?", "What caused the First World War?", etc.) that replace the generic tech Q&A fillers. Because the filler stays in the same domain as the facts, it should make retrieval harder for keyword-matching backends.
  • EDTECH_PERSONA_POOL — 5 diverse student personas (Priya Nair, Carlos Mendez, Aisha Kamara, Haruto Tanaka, Amelia Brooks) for multi-seed benchmarking.
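The description does not show how the facts are stored, so the following is only a minimal sketch of one possible layout. The `Fact` dataclass, its field names, the `EDTECH_FACTS_SKETCH` list, and every value in it are hypothetical and not taken from `edtech.py`. It illustrates eight facts, four of which carry a mid-conversation update.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """Hypothetical record type; the real edtech.py may use plain dicts."""
    key: str
    value: str
    # Set only for facts that change mid-conversation.
    updated_value: Optional[str] = None


# Illustrative values only; they are not copied from the PR.
EDTECH_FACTS_SKETCH = [
    Fact("name", "Priya Nair"),
    Fact("grade", "9th grade", updated_value="10th grade"),
    Fact("school", "Example High School"),
    Fact("favourite_subject", "biology", updated_value="chemistry"),
    Fact("weakest_subject", "history"),
    Fact("gpa", "3.2", updated_value="3.6"),
    Fact("learning_style", "visual", updated_value="visual plus hands-on practice"),
    Fact("study_hours", "2 hours per day"),
]


def current_value(fact: Fact) -> str:
    """Return the value a memory backend should report once all updates have been seen."""
    return fact.updated_value if fact.updated_value is not None else fact.value
```

With a layout like this, a benchmark can score a backend against `current_value(fact)` and treat the stale `fact.value` as a wrong answer for updated facts.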

simulator/conversation.py

generate_conversation() now accepts an optional filler_turns parameter, so any domain's Q&A list can be plugged in without copying the generator.
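As a rough illustration of the optional-parameter pattern described here, the sketch below interleaves fact statements with filler questions. The function name `generate_conversation_sketch`, its other parameters, and `DEFAULT_FILLER_TURNS` are assumptions for illustration; the actual signature of `generate_conversation()` is not shown in this PR.

```python
import random
from typing import List, Optional, Sequence

# Stand-in for the generator's built-in generic fillers (hypothetical name and values).
DEFAULT_FILLER_TURNS = ["What is a hash map?", "How does TCP differ from UDP?"]


def generate_conversation_sketch(
    fact_statements: Sequence[str],
    num_filler_per_fact: int = 2,
    filler_turns: Optional[Sequence[str]] = None,
    seed: int = 0,
) -> List[str]:
    """Interleave fact statements with filler questions.

    Falls back to the default filler list when filler_turns is None, so
    callers that do not pass the new argument keep their old behaviour.
    """
    fillers = DEFAULT_FILLER_TURNS if filler_turns is None else filler_turns
    rng = random.Random(seed)  # seeded so a run is reproducible
    turns: List[str] = []
    for statement in fact_statements:
        turns.append(statement)
        turns.extend(rng.choice(fillers) for _ in range(num_filler_per_fact))
    return turns
```

Defaulting the new argument to `None` rather than to the list itself keeps the change backward compatible and avoids a shared mutable default.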

evaluation/benchmark.py

run_benchmark() and run_benchmark_multi_seed() accept:

  • filler_turns — propagated to generate_conversation()
  • persona_pool — lets multi-seed runs use a scenario-specific persona set instead of the default PERSONA_POOL
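How the multi-seed runner picks a persona for each seed is not stated in the PR. The sketch below assumes the simplest scheme, cycling through the pool by seed index. `run_multi_seed_sketch`, `DEFAULT_PERSONA_POOL`, and the `run_one` callback are hypothetical names used only to show how a `persona_pool` keyword could override a default.

```python
from typing import Callable, Optional, Sequence

# Placeholder values standing in for the project's default PERSONA_POOL.
DEFAULT_PERSONA_POOL = ["Default Persona A", "Default Persona B"]


def run_multi_seed_sketch(
    run_one: Callable[[str, int], float],
    seeds: int,
    persona_pool: Optional[Sequence[str]] = None,
) -> float:
    """Average a per-run score over `seeds` runs.

    Assumption: seed i uses persona i modulo the pool size.
    """
    if seeds < 1:
        raise ValueError("seeds must be at least 1")
    pool = DEFAULT_PERSONA_POOL if persona_pool is None else persona_pool
    scores = [run_one(pool[seed % len(pool)], seed) for seed in range(seeds)]
    return sum(scores) / len(scores)
```

Passing the five EdTech personas as `persona_pool` with `seeds=5` would, under this assumption, run each persona exactly once.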

main.py

New --scenario flag:

python main.py --scenario edtech
python main.py --scenario edtech --seeds 5
python main.py --scenario edtech --backends naive rag cascading entity
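The commands above could be parsed with the standard-library `argparse` module roughly as follows. This is a sketch, not the contents of `main.py`: the flag names and the two scenario choices come from this PR, while the default values and the `build_parser` helper are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags used in the PR's example commands."""
    parser = argparse.ArgumentParser(description="Memory benchmark runner (sketch)")
    # Choices taken from the commit message: default | edtech.
    parser.add_argument("--scenario", choices=["default", "edtech"], default="default")
    # Default of 1 is an assumption; the PR only shows `--seeds 5`.
    parser.add_argument("--seeds", type=int, default=1)
    # One or more backend names; None here means "use whatever the project defaults to".
    parser.add_argument("--backends", nargs="+", default=None)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.scenario, args.seeds, args.backends)
```

Restricting `--scenario` with `choices` makes `argparse` reject unknown scenario names with a usage error before any benchmark code runs.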

Closes #13, closes #4

Adds a domain-specific benchmark scenario modelling AI-tutor conversations
with hierarchical student facts (name, grade, school, favourite subject, GPA,
learning style) and four fact updates that simulate real learning progressions.

Changes:
- simulator/scenarios/edtech.py — EDTECH_FACTS, EDTECH_FILLER_TURNS, and
  EDTECH_PERSONA_POOL (5 diverse student personas for multi-seed runs)
- simulator/conversation.py — generate_conversation() now accepts an optional
  filler_turns list for domain-specific Q&A interleaving
- evaluation/benchmark.py — run_benchmark() and run_benchmark_multi_seed()
  accept filler_turns and persona_pool kwargs so any scenario can be plugged in
- main.py — --scenario flag (choices: default | edtech) resolves the right
  facts, filler, and persona pool before calling the benchmark

Usage:
  python main.py --scenario edtech
  python main.py --scenario edtech --seeds 5

Closes Neal006#13 Neal006#4
@Neal006 merged commit 2a6fdad into Neal006:main on May 24, 2026

Development

Successfully merging this pull request may close these issues.

LLM memory decay benchmark: add EdTech domain scenario (student/teacher conversation memory)
