Commit 9b18f93

committed
update page
1 parent 1047387 commit 9b18f93

1 file changed

Lines changed: 39 additions & 20 deletions

index.html

@@ -18,8 +18,11 @@
 <meta name="keywords" content="large language models, LLM, psychometrics, evaluation, validation, enhancement, survey">
 <meta name="viewport" content="width=device-width, initial-scale=1">
 
-<title>Large Language Model Psychometrics: A Comprehensive Survey of Evaluation, Validation, and Enhancement</title>
-<link rel="icon" type="image/x-icon" href="static/images/logo-survey.ico">
+<title>Large Language Model Psychometrics:
+A Comprehensive Survey of
+Evaluation, Validation, and Enhancement
+</title>
+<!-- <link rel="icon" type="image/x-icon" href="static/images/logo-survey.ico"> -->
 <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">
 
 <link rel="stylesheet" href="static/css/bulma.min.css">
@@ -460,7 +463,8 @@ <h1 class="title is-1 publication-title">Large Language Model Psychometrics: A C
 <h2 class="title is-3 has-text-centered">Abstract</h2>
 <div class="abstract content">
 <p>
-The evolving capabilities of large language models (LLMs) have outpaced traditional evaluation methodologies and introduced novel evaluation challenges,
+The survey will be released soon. Stay tuned!
+<!-- The evolving capabilities of large language models (LLMs) have outpaced traditional evaluation methodologies and introduced novel evaluation challenges,
 such as assessing human-like psychological constructs, addressing the limitations
 of static and task-specific benchmarks, and meeting the requirement for human-centered
 evaluation. These challenges intersect with psychometrics, the science
@@ -474,7 +478,7 @@ <h2 class="title is-3 has-text-centered">Abstract</h2>
 framework for researchers from various backgrounds, facilitating a comprehensive
 understanding of this emerging field. We aim to offer insights into developing
 future evaluation paradigms for human-level AI and advance human-centered AI
-psychology for the greater common good.
+psychology for the greater common good. -->
 </p>
 </div>
 </div>
@@ -565,9 +569,9 @@ <h2 class="title is-3 has-text-centered">Measuring Psychological Constructs</h2>
 </div>
 
 <div class="text-content-box">
-<h4 class="gradient-title">Psychological Domains in LLM Research</h4>
+<h4 class="gradient-title">Psychological Constructs in LLM Research</h4>
 <p>
-We group the traits probed in LLMs into two super‑domains. Personality Constructs capture relatively enduring dispositions and preferences. Under this heading we place (1) Personality traits measured by Big Five, HEXACO, MBTI or Dark‑Triad batteries; (2) Values inventories such as Schwartz, WVS, VSM and GLOBE; (3) Morality scales—MFT, DIT, ETHICS—gauging moral foundations; and (4) Attitudes & Opinions from political panels like ANES, ATP, GLES and PCT. By contrast, Cognitive Constructs probe fluid abilities. These include (1) Heuristics & Biases tasks such as the Cognitive Reflection Test; (2) Social interaction abilities—Theory of Mind, Emotional and Social Intelligence; (3) Psychology of Language modules covering comprehension, generation and acquisition; and (4) Learning & Cognitive Capabilities assessed through open‑ended similarity or reasoning items. Together these two domains provide a comprehensive blueprint for modelling an LLM's "personality" and "cognition" with standard psychometric tools.
+LLM psychometrics evaluates LLMs in their personality and cognitive constructs. Personality constructs include (1) personality traits measured based on theories such as Big Five, HEXACO, MBTI, or Dark Triad; (2) values based on theories such as Schwartz, WVS, VSM, and GLOBE; (3) morality based on MFT, DIT, and ETHICS; and (4) attitudes and opinions from political panels like ANES, ATP, GLES, and PCT. In contrast, cognitive constructs include (1) heuristics and biases measured by tasks such as the Cognitive Reflection Test; (2) social interaction abilities—Theory of Mind, Emotional and Social Intelligence; (3) psychology of language modules covering comprehension, generation, and acquisition; and (4) learning and cognitive capabilities.
 </p>
 </div>
 </div>
@@ -615,9 +619,9 @@ <h2 class="title is-3 has-text-centered">Psychometric Validation</h2>
 </div>
 
 <div class="text-content-box">
-<h4 class="gradient-title">Validation Framework for LLM Assessments</h4>
+<h4 class="gradient-title">Validation Framework for LLM Psychometrics</h4>
 <p>
-Applying psychometrics to LLMs demands the same evidential standards used for human tests. Reliability is assessed via test-retest stability, parallel‑form equivalence, and inter-rater agreement when subjective coding is involved. Validity evidence is gathered on multiple fronts: content (guarding against training-data contamination and item representativeness), construct (ensuring responses reflect the intended latent trait rather than response sets or social desirability bias), and criterion or ecological correspondence with external benchmarks. We also emphasize emerging standards: nondisclosure of test materials, fairness across languages and cultures, and suitability of item difficulty to model capability—to prevent inflation of scores and to promote transparent comparisons across models and human baselines.
+Applying psychometrics to LLMs requires validation. Reliability is assessed through test-retest reliability, parallel forms reliability, and inter-rater reliability when subjective coding is involved. Validity evidence is gathered on multiple fronts: content (guarding against training-data contamination and item representativeness), construct (ensuring responses reflect the intended latent trait rather than response sets or social desirability bias), and criterion or ecological correspondence with external benchmarks. We also gather emerging standards, such as non-disclosure of test materials, fairness across languages and cultures, and the suitability of tests for model capabilities.
 </p>
 </div>
 </div>
@@ -641,7 +645,7 @@ <h2 class="title is-3 has-text-centered">LLM Enhancement Techniques</h2>
 </p>
 
 <ul>
-<li><span class="highlight">Trait Manipulation</span>: Techniques for controlling LLM personality expressions through prompting, inference-time interventions, and fine-tuning</li>
+<li><span class="highlight">Trait Manipulation</span>: Techniques for controlling LLM traits through prompting, inference-time interventions, and fine-tuning</li>
 <li><span class="highlight">Safety and Alignment</span>: Leveraging psychometric measurements to guide alignment with human values and improve safety</li>
 <li><span class="highlight">Cognitive Enhancement</span>: Methods for developing more human-like reasoning, empathy, and communication capabilities</li>
 </ul>
@@ -667,13 +671,13 @@ <h2 class="title is-3 has-text-centered">Future Directions</h2>
 </p>
 
 <ul>
-<li><span class="highlight">Psychometric Validation</span>: Establish rigorous reliability and validity checks to confirm that measured traits truly reflect LLM capabilities.</li>
+<li><span class="highlight">Psychometric Validation</span>: Establish rigorous reliability and validity checks.</li>
 <li><span class="highlight">From Human Constructs to LLM Constructs</span>: Reframe classical psychological constructs so they align with the representational space and behavior patterns of language models.</li>
-<li><span class="highlight">Perceived vs. Aligned Traits</span>: Separate traits that humans infer from outputs from those explicitly optimized during alignment.</li>
-<li><span class="highlight">Anthropomorphization Challenges</span>: Avoid methodological pitfalls that arise when interpreting stochastic text as evidence of stable intentions.</li>
-<li><span class="highlight">Expanding Dimensions in Model Deployment</span>: Extend evaluations to multilingual, multi-turn, multimodal, and multi-agent contexts where new validity issues emerge.</li>
-<li><span class="highlight">Item Response Theory</span>: Adopt multidimensional IRT and adaptive testing to boost efficiency and precision of LLM assessments.</li>
-<li><span class="highlight">From Evaluation to Enhancement</span>: Use psychometric insights to fine‑tune, augment data, and steer models toward predictable, aligned behavior.</li>
+<li><span class="highlight">Perceived vs. Aligned Traits</span>: Distinguish between traits that humans perceive from LLM outputs and those aligned with human self-views.</li>
+<li><span class="highlight">Anthropomorphization Challenges</span>: Properly anthropomorphizing LLMs in psychometric tests remains a controversial topic.</li>
+<li><span class="highlight">Expanding Dimensions in Model Deployment</span>: Extend evaluations to multilingual, multi-turn, multimodal, agent, and multi-agent contexts where new validity issues emerge.</li>
+<li><span class="highlight">Item Response Theory</span>: Adopt sophisticated IRT models and adaptive testing to improve LLM evaluation.</li>
+<li><span class="highlight">From Evaluation to Enhancement</span>: Use psychometric insights to enhance and align LLMs.</li>
 </ul>
 </div>
 </div>
@@ -690,13 +694,28 @@ <h2 class="title is-3 has-text-centered">Citation</h2>
 <div class="columns is-centered">
 <div class="column">
 <div class="citation-box">
-<code>@article{ye2025llmpsychometrics,
+<pre>
+@article{ye2025llmpsychometrics,
 title={Large Language Model Psychometrics: A Comprehensive Survey of Evaluation, Validation, and Enhancement},
-author={Ye, Haoran and [Other Authors]},
-journal={[Journal Name]},
+author={Ye, Haoran and Jin, Jing and Xie, Yuhang and Zhang, Xin and Song, Guojie},
 year={2025},
-publisher={[Publisher]}
-}</code>
+}
+</pre>
+<button onclick="copyCitation()">Copy Citation</button>
+<script>
+function copyCitation() {
+const citationText = `@article{ye2025llmpsychometrics,
+title={Large Language Model Psychometrics: A Comprehensive Survey of Evaluation, Validation, and Enhancement},
+author={Ye, Haoran and Jin, Jing and Xie, Yuhang and Zhang, Xin and Song, Guojie},
+year={2025},
+}`;
+navigator.clipboard.writeText(citationText).then(() => {
+alert('Citation copied to clipboard!');
+}, (err) => {
+console.error('Could not copy text: ', err);
+});
+}
+</script>
 </div>
 </div>
 </div>
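
The last hunk stores the BibTeX entry twice, once in the `<pre>` block and once in the `copyCitation()` template string, so the two copies can drift apart. One possible follow-up, given here only as a sketch and not as part of this commit, is to copy whatever the element displays. The names `normalizeCitation` and `copyCitationFrom` are hypothetical, and the clipboard object is passed in as a parameter so the function can be exercised outside a browser.

```javascript
// Sketch only: `normalizeCitation` and `copyCitationFrom` are hypothetical
// names and are not part of the committed page.

// Strip trailing spaces on each line and the blank lines that surround
// the BibTeX entry inside the <pre> element.
function normalizeCitation(rawText) {
  return rawText
    .split('\n')
    .map((line) => line.trimEnd())
    .join('\n')
    .trim();
}

// Copy the text shown by `element`, so the citation lives in one place.
// `element` needs a `textContent` string; `clipboard` needs a
// Promise-returning `writeText`, which is the shape of `navigator.clipboard`.
async function copyCitationFrom(element, clipboard) {
  const text = normalizeCitation(element.textContent);
  await clipboard.writeText(text);
  return text;
}
```

In the page this might be wired up as `copyCitationFrom(document.querySelector('.citation-box pre'), navigator.clipboard)`, assuming the `<pre>` stays inside the `citation-box` container shown in the hunk.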
