
Commit 34771d1

update in news
1 parent 925ca79 commit 34771d1

5 files changed

Lines changed: 21 additions & 13 deletions


_bibliography/papers.bib

Lines changed: 4 additions & 4 deletions
@@ -6,19 +6,19 @@ @article{huihan2024culture
 author={Huihan Li* and Arnav Goel* and Keyu He and Xiang Ren},
 year={2025},
 journal={ICLR},
-abstract={This paper introduces MEMOed, a framework to analyze whether AI generations are driven by memorization or generalization, with a focus on cultural symbols.},
+abstract={In open-ended generative tasks like narrative writing or dialogue, large language models often exhibit cultural biases, showing limited knowledge and generating templated outputs for less prevalent cultures. Recent works show that these biases may stem from uneven cultural representation in pretraining corpora. This work investigates how pretraining leads to biased culture-conditioned generations by analyzing how models associate entities with cultures based on pretraining data patterns. We propose the MEMOed framework (MEMOrization from pretraining document) to determine whether a generation for a culture arises from memorization. Using MEMOed on culture-conditioned generations about food and clothing for 110 cultures, we find that high-frequency cultures in pretraining data yield more generations with memorized symbols, while some low-frequency cultures produce none. Additionally, the model favors generating entities with extraordinarily high frequency regardless of the conditioned culture, reflecting biases toward frequent pretraining terms irrespective of relevance. We hope that the MEMOed framework and our insights will inspire more works on attributing model performance on pretraining data.},
 selected={true},
 url={https://arxiv.org/pdf/2412.20760},
 doi={10.48550/arXiv.2412.20760},
-pdf={2412.20760v1.pdf}
+pdf={Attributing_Culture_Cond.pdf}
 }
 
 @article{keyu2025explanations,
 title={ELI-Why: Evaluating the Pedagogical Utility of LLM Explanations},
 author={Brihi Joshi* and Keyu He* and Sahana Ramnath and Sadra Sabouri and Kaitlyn Zhou and Souti Chattopadhyay and Swabha Swayamdipta and Xiang Ren},
 year={2025},
 journal={Submitted to ACL, Under review},
-abstract={Evaluate the pedagogical utility of LLMs in tailoring explanations to users with different educational backgrounds.},
+abstract={Language models today are widely used in education, yet their ability to tailor responses for learners with varied informational needs and knowledge backgrounds remains under-explored. To this end, we introduce ELI-WHY, a benchmark of 13.4K "Why" questions to assess the pedagogical capabilities of LLMs. We then conduct two extensive human studies to assess the utility of LLM-generated explanatory answers (explanations) on our benchmark, tailored to three distinct educational grades: elementary, high-school, and graduate school. In our first study, human raters assume the role of an "educator" to assess model explanations' fit to different educational grades. We find that GPT-4-generated explanations match their intended educational background only 50% of the time, compared to 79% for human-curated explanations. In our second study, human raters assume the role of a learner to assess if an explanation fits their own informational needs. Results show that users deemed GPT-4-generated explanations relatively 20% less suited to their informational needs, particularly for advanced learners. Additionally, automated evaluation metrics reveal that GPT-4 explanations for different informational needs remain indistinguishable in their grade-level, limiting their pedagogical effectiveness. These findings suggest that LLMs' ability to follow inference-time instructions alone is insufficient for producing high-utility explanations tailored to users' informational needs.},
 selected={true},
 pdf={ELI_Why_Evaluating_the_Pe.pdf}
 }
@@ -28,7 +28,7 @@ @article{keyu2025vlm
 author={Keyu He and Brihi Joshi and Tejas Srinivasan and Swabha Swayamdipta},
 year={2025},
 journal={Under preparation for NeurIPS},
-abstract={Identify limitations of current text-only metric and explore new vision-specific qualities to improve trust in explanations by VLMs.},
+abstract={Visual Language Models (VLMs) are deployed in scenarios where users lack direct access to visual stimuli, such as remote sensing, robotics, and assistance for people with visual impairments. Despite their utility, these models can produce hallucinated outputs that may mislead users. In this work, we investigate the role of explanation quality in calibrating user trust and reliance on VLM outputs. We propose new qualities, Visual Fidelity and Contrastiveness, to complement traditional text-only measures. Through quantitative evaluations on A-OKVQA and VizWiz datasets and a user study, our results indicate that explanations enriched with quality signals lead to a lower unsure rate and improved prediction accuracy and utility in AI-assisted decision-making. We also highlight limitations and future directions to further enhance the interpretability and reliability of VLM-generated rationales.},
 selected={true}
 }

_news/2024-11-15-vlm-research.md

Lines changed: 0 additions & 7 deletions
This file was deleted.

_news/2024-12-01-kaggle-medal.md

Lines changed: 10 additions & 2 deletions
@@ -1,7 +1,15 @@
 ---
 layout: post
 title: Silver Medal in Kaggle Competition
-date: 2024-12-01
+date: 2024-06-21
 ---
 
-Achieved a **top 3.4% rank globally** in the **LLM-Prompt-Recovery Challenge** for refining prompt recovery methods using advanced similarity metrics and LoRA-based fine-tuning.
+Thrilled to share that our team won a **silver medal** (**top 3.4% globally**) in the LLM-Prompt-Recovery Challenge on Kaggle! 🥈
+
+The task involved recovering original user prompts from Gemma-generated completions. I led model finetuning, focusing on a custom scoring strategy: a sharpened cosine similarity using sentence-t5-base.
+
+We used LoRA for efficient finetuning, and also explored light adversarial attacks, such as appending generic prompts, to game the metric. We ultimately achieved a high similarity score of 0.657, only 0.059 away from the top team.
+
+A fun and rewarding challenge blending research and engineering!
+

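The sharpened cosine similarity named in the Kaggle post can be illustrated with a minimal, hypothetical Python sketch. It is not the team's code and not the official competition scorer. It assumes an exponent of 3 and works on precomputed embedding vectors. The post names sentence-t5-base as the embedding model, but that step is omitted here, and toy vectors stand in for real embeddings. The names `cosine`, `sharpened_cosine`, and `mean_score` are illustrative only.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_u * norm_v)


def sharpened_cosine(u: Sequence[float], v: Sequence[float], p: float = 3.0) -> float:
    """Cosine similarity raised to the power p (assumed p = 3).

    The sign is kept via copysign so that a non-integer p applied to a
    negative cosine does not produce a complex number in Python.
    """
    c = cosine(u, v)
    return math.copysign(abs(c) ** p, c)


def mean_score(preds, refs, p: float = 3.0) -> float:
    """Average sharpened cosine over (prediction, reference) embedding pairs."""
    scores = [sharpened_cosine(a, b, p) for a, b in zip(preds, refs)]
    if not scores:
        raise ValueError("need at least one pair")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 3-d vectors standing in for real sentence embeddings.
    print(sharpened_cosine([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # 1.0
    print(round(sharpened_cosine([1.0, 1.0, 0.0], [1.0, 0.0, 0.0]), 3))  # 0.354
```

Under this assumed form, raising the cosine to a power shrinks middling similarities far more than high ones. For example, 0.9 becomes 0.729, while 0.6 becomes 0.216. The score therefore rewards near-exact prompt recovery disproportionately.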
_news/2025-02-15-ELI-research.md

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+---
+layout: post
+title: Research on Pedagogical Utility of LLM Explanations
+date: 2025-02-15
+---
+
+I am excited to share that we just submitted our research paper titled **"ELI-Why: Evaluating the Pedagogical Utility of LLM Explanations"** to ACL Rolling Review. In this work, we introduced **ELI-Why**, a benchmark to assess the pedagogical capabilities of LLMs, and we found that inference-time instructions alone are insufficient for LLMs to produce high-utility explanations tailored to users' informational needs.

assets/img/prof_pic-800.webp

93.8 KB
