Commit 7bad576

Update index.html
1 parent 7d5ee6b commit 7bad576

1 file changed

Lines changed: 4 additions & 4 deletions

index.html

@@ -22,7 +22,7 @@
 A Systematic Review of
 Evaluation, Validation, and Enhancement
 </title>
-<link rel="icon" type="image/x-icon" href="static/images/logo_llm.ico">
+<link rel="icon" type="image/x-icon" href="static/images/llm_psychometrics.ico">
 <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">
 
 <link rel="stylesheet" href="static/css/bulma.min.css">
@@ -490,7 +490,7 @@ <h2 class="title is-3 has-text-centered">Comparison: Psychometrics vs AI Benchma
 <tbody>
 <tr>
 <td>Core goal</td>
-<td>To prove that a test measures what it is intended to measure (validity evidence) and to understand the construct being measured.</td>
+<td>To measure psychological constructs, to prove that a test measures as intended (validity evidence), and to understand the construct being measured.</td>
 <td>To test and compare the task performance of different LLMs. Focuses on ranking models and selecting the best one suited for a specific task.</td>
 </tr>
 <tr>
@@ -520,13 +520,13 @@ <h2 class="title is-3 has-text-centered">Comparison: Psychometrics vs AI Benchma
 </tr>
 <tr>
 <td>Sample size</td>
-<td>Typically requires a larger sample size of individuals for robust statistical modeling.</td>
+<td>Typically requires a larger sample size of test takers for robust statistical modeling.</td>
 <td>Can be applied to evaluate the performance of a single LLM on the benchmark.</td>
 </tr>
 <tr>
 <td>Statistical modeling</td>
 <td>Employs advanced and various statistical models like Item Response Theory and Factor Analysis to analyze data, estimate latent abilities, and assess model fit.</td>
-<td>Often relies on simple aggregation methods, such as calculating average accuracy across benchmarks.</td>
+<td>Often relies on simple aggregation methods, such as calculating average accuracy across benchmark tasks.</td>
 </tr>
 <tr>
 <td>Result analysis</td>
