Update index.html

jinjing777 · web-flow · commit 7bad576da519 · 2025-05-12T08:12:04.000+08:00
diff --git a/index.html b/index.html
@@ -22,7 +22,7 @@
     A Systematic Review of
     Evaluation, Validation, and Enhancement
   </title>
-  <link rel="icon" type="image/x-icon" href="static/images/logo_llm.ico">
+  <link rel="icon" type="image/x-icon" href="static/images/llm_psychometrics.ico">
   <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">
 
   <link rel="stylesheet" href="static/css/bulma.min.css">
@@ -490,7 +490,7 @@ <h2 class="title is-3 has-text-centered">Comparison: Psychometrics vs AI Benchma
           <tbody>
             <tr>
               <td>Core goal</td>
-              <td>To prove that a test measures what it is intended to measure (validity evidence) and to understand the construct being measured.</td>
+              <td>To measure psychological constructs, to prove that a test measures as intended (validity evidence), and to understand the construct being measured.</td>
               <td>To test and compare the task performance of different LLMs. Focuses on ranking models and selecting the best one suited for a specific task.</td>
             </tr>
             <tr>
@@ -520,13 +520,13 @@ <h2 class="title is-3 has-text-centered">Comparison: Psychometrics vs AI Benchma
             </tr>
             <tr>
               <td>Sample size</td>
-              <td>Typically requires a larger sample size of individuals for robust statistical modeling.</td>
+              <td>Typically requires a larger sample size of test takers for robust statistical modeling.</td>
               <td>Can be applied to evaluate the performance of a single LLM on the benchmark.</td>
             </tr>
             <tr>
               <td>Statistical modeling</td>
               <td>Employs advanced and various statistical models like Item Response Theory and Factor Analysis to analyze data, estimate latent abilities, and assess model fit.</td>
-              <td>Often relies on simple aggregation methods, such as calculating average accuracy across benchmarks.</td>
+              <td>Often relies on simple aggregation methods, such as calculating average accuracy across benchmark tasks.</td>
             </tr>
             <tr>
               <td>Result analysis</td>