Commit 1047387

Update index.html
1 parent e617dc1 commit 1047387

1 file changed

Lines changed: 2 additions & 2 deletions

File tree

index.html

@@ -432,7 +432,7 @@ <h1 class="title is-1 publication-title">Large Language Model Psychometrics: A C
     </div>
   </section>

-  <div class="has-text-centered" style="margin-top: -1.0rem; margin-bottom: 1.5rem;">
+  <div class="has-text-centered" style="margin-top: -2.0rem; margin-bottom: 1.5rem;">
     <img src="static/images/logo_survey.png" alt="LLM Psychometrics Logo" style="max-width: 180px; width: 100%; height: auto;">
   </div>


@@ -592,7 +592,7 @@ <h2 class="title is-3 has-text-centered">Evaluation Methodology</h2>
   <div class="text-content-box">
     <h4 class="gradient-title">Assessment Techniques for LLMs</h4>
     <p>
-      Our methodological framework mirrors a classic assessment pipeline but is tailored to LLM tooling. Test format can range from tightly controlled structured items (single choice or Likert) to open‑ended conversations and full agentic simulations in which the model plays a role over many turns. Data sources may come from established inventories, researcher‑curated adaptations, or synthetic prompts automatically generated to extend test coverage. Prompting strategies include perturbing the original question, injecting step‑by‑step or role‑playing instructions, and enforcing chain‑of‑thought disclosure, all designed to surface latent capacities while controlling for prompt sensitivity. Finally, output & scoring modules translate the model's raw text into numerical metrics: logits for probability‑based scoring, direct mapping to scale points, or rubric‑based evaluation of free text, optionally adjudicated by additional models or humans.
+      Our framework mirrors a classic assessment pipeline but is tailored to LLM tooling. Test format can range from tightly controlled structured items (single choice or Likert) to open‑ended conversations and full agentic simulations in which the model plays a role over many turns. Data sources may come from established inventories, researcher‑curated adaptations, or synthetic prompts automatically generated to extend test coverage. Prompting strategies include perturbing the original question, injecting step‑by‑step or role‑playing instructions, and enforcing chain‑of‑thought disclosure, all designed to surface latent capacities while controlling for prompt sensitivity. Finally, output & scoring modules translate the model's raw text into numerical metrics: logits for probability‑based scoring, direct mapping to scale points, or rubric‑based evaluation of free text, optionally adjudicated by additional models or humans.
     </p>
   </div>
 </div>
