The survey will be released soon. Stay tuned!
<!-- The evolving capabilities of large language models (LLMs) have outpaced traditional evaluation methodologies and introduced novel evaluation challenges, such as assessing human-like psychological constructs, addressing the limitations of static and task-specific benchmarks, and meeting the requirement for human-centered evaluation. These challenges intersect with psychometrics, the science
<h4 class="gradient-title">Psychological Constructs in LLM Research</h4>
<p>
LLM psychometrics evaluates LLMs on personality and cognitive constructs. Personality constructs include (1) personality traits measured with theories such as the Big Five, HEXACO, MBTI, or the Dark Triad; (2) values based on theories such as Schwartz, WVS, VSM, and GLOBE; (3) morality based on MFT, DIT, and ETHICS; and (4) attitudes and opinions from political panels such as ANES, ATP, GLES, and PCT. In contrast, cognitive constructs include (1) heuristics and biases measured by tasks such as the Cognitive Reflection Test; (2) social interaction abilities, including Theory of Mind and emotional and social intelligence; (3) psychology of language, covering comprehension, generation, and acquisition; and (4) learning and cognitive capabilities.
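To make the administration of such inventories concrete, here is a minimal sketch of how a Likert-style personality item might be keyed and aggregated. The item texts, ratings, and reverse-keying below are hypothetical illustrations, not drawn from any cited inventory.

```python
# Hedged sketch: scoring Likert-style personality items from model responses.
# Item wording and keying are hypothetical, not from a real inventory.

ITEMS = [
    {"text": "I am the life of the party.", "reverse": False},
    {"text": "I don't talk a lot.", "reverse": True},
]

def score_item(rating: int, reverse: bool, scale_max: int = 5) -> int:
    """Map a 1..scale_max rating to its keyed value; reverse-keyed items flip."""
    return scale_max + 1 - rating if reverse else rating

# Hypothetical model ratings for the two items above
ratings = [4, 2]
keyed = [score_item(r, item["reverse"]) for r, item in zip(ratings, ITEMS)]
extraversion = sum(keyed) / len(keyed)
print(extraversion)
```

Reverse-keying matters because a low rating on a negatively worded item indicates a high trait level, so raw and keyed scores must not be mixed.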
<h4 class="gradient-title">Validation Framework for LLM Psychometrics</h4>
<p>
Applying psychometrics to LLMs requires rigorous validation. Reliability is assessed through test-retest reliability, parallel-forms reliability, and inter-rater reliability when subjective coding is involved. Validity evidence is gathered on multiple fronts: content (guarding against training-data contamination and ensuring item representativeness), construct (ensuring responses reflect the intended latent trait rather than response sets or social desirability bias), and criterion or ecological correspondence with external benchmarks. We also highlight emerging standards, such as non-disclosure of test materials, fairness across languages and cultures, and the suitability of test difficulty for model capabilities.
<li><span class="highlight">Trait Manipulation</span>: Techniques for controlling LLM traits through prompting, inference-time interventions, and fine-tuning</li>
<li><span class="highlight">Safety and Alignment</span>: Leveraging psychometric measurements to guide alignment with human values and improve safety</li>
<li><span class="highlight">Cognitive Enhancement</span>: Methods for developing more human-like reasoning, empathy, and communication capabilities</li>
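The prompting route to trait manipulation can be sketched as follows. The template and the `build_persona_prompt` helper are hypothetical illustrations, not an API from the survey.

```python
# Hedged sketch: steering trait expression via a persona system prompt.
# The prompt template and trait description are illustrative placeholders.

def build_persona_prompt(trait: str, level: str) -> str:
    """Compose a system prompt asking the model to exhibit a trait at a level."""
    return (
        f"You are an assistant whose personality is {level} in {trait}. "
        f"Answer every question in a way consistent with that disposition."
    )

prompt = build_persona_prompt("extraversion", "very high")
print(prompt)
```

The resulting string would be supplied as the system message before administering a questionnaire, letting the same inventory measure how far the induced trait shifts the model's responses.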
<li><span class="highlight">Psychometric Validation</span>: Establish rigorous reliability and validity checks.</li>
<li><span class="highlight">From Human Constructs to LLM Constructs</span>: Reframe classical psychological constructs so they align with the representational space and behavior patterns of language models.</li>
<li><span class="highlight">Perceived vs. Aligned Traits</span>: Distinguish between traits that humans perceive from LLM outputs and those aligned with human self-views.</li>
<li><span class="highlight">Anthropomorphization Challenges</span>: Whether and how to anthropomorphize LLMs in psychometric testing remains controversial.</li>
<li><span class="highlight">Expanding Dimensions in Model Deployment</span>: Extend evaluations to multilingual, multi-turn, multimodal, agent, and multi-agent contexts where new validity issues emerge.</li>
<li><span class="highlight">Item Response Theory</span>: Adopt sophisticated IRT models and adaptive testing to improve LLM evaluation.</li>
<li><span class="highlight">From Evaluation to Enhancement</span>: Use psychometric insights to enhance and align LLMs.</li>