Can Large Language Models Judge in Basque?

This repository contains the model responses and evaluations from the paper: Judging Instruction Responses in a Low-Resource Language: A Case Study on Basque.

The inferences were generated for the just-eval-eus dataset, available at: Vicomtech/just-eval-instruct-eus

Contents

.
├── multi
│   ├── justeval_scores_multi.csv
│   ├── sampled_instructions_common.json
│   ├── sampled_instructions_specific-A.json
│   ├── sampled_instructions_specific-B.json
│   ├── sampled_instructions_specific-D.json
│   ├── sampled_instructions_specific-E.json
│   └── sampled_instructions_specific-G.json
└── safety
    ├── sampled_instructions_safety.json
    └── scores_safety.csv

Files

multi/

Contains the general-purpose instruction-response samples and their judge evaluations.

  • sampled_instructions_common.json: sampled instructions evaluated by all human annotators.
  • sampled_instructions_specific-[A/B/D/E/G].json: sampled instructions evaluated by a single human annotator.
  • justeval_scores_multi.csv: judge scores for the multi subset.

safety/

Contains the safety-oriented instruction-response samples and their judge evaluations.

  • sampled_instructions_safety.json: sampled safety instructions and generated responses.
  • scores_safety.csv: judge scores for the safety subset.
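As a quick sanity check on a local clone, the layout above can be turned into a list of expected paths. This is only a sketch: the file names are copied from the tree in this README, while the helper names `expected_files` and `missing_files` are hypothetical and not part of the repository.

```python
from pathlib import Path

# Annotator letters as they appear in the file names listed above.
ANNOTATORS = ["A", "B", "D", "E", "G"]


def expected_files(root):
    """Return the paths listed in the Contents tree, resolved under `root`."""
    root = Path(root)
    multi = root / "multi"
    safety = root / "safety"
    paths = [
        multi / "justeval_scores_multi.csv",
        multi / "sampled_instructions_common.json",
    ]
    # One "specific" file per human annotator.
    paths += [multi / f"sampled_instructions_specific-{a}.json" for a in ANNOTATORS]
    paths += [
        safety / "sampled_instructions_safety.json",
        safety / "scores_safety.csv",
    ]
    return paths


def missing_files(root):
    """Return the expected paths that do not exist as files under `root`."""
    return [p for p in expected_files(root) if not p.is_file()]
```

Calling `missing_files(".")` from the repository root should return an empty list if the clone is complete.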

Data

The sampled_instructions JSON files include the generated responses from the inference models described in the paper.

The score CSV files include the corresponding judge responses and scores.
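This README does not document the field names of the JSON files or the columns of the CSV files, so a reasonable first step is to inspect them rather than assume a schema. The sketch below uses only the Python standard library; the helper names `describe_json` and `describe_csv` are hypothetical, and the CSV files are assumed to be comma-delimited with a header row.

```python
import csv
import json


def describe_json(path):
    """Report the top-level type, size, and keys of a JSON file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, list):
        # For a list of records, report the keys of the first record.
        first = data[0] if data else None
        keys = sorted(first) if isinstance(first, dict) else []
        return {"type": "list", "length": len(data), "keys": keys}
    if isinstance(data, dict):
        return {"type": "dict", "length": len(data), "keys": sorted(data)}
    return {"type": type(data).__name__, "length": None, "keys": []}


def describe_csv(path):
    """Report the header columns and the number of data rows of a CSV file.

    Assumes a comma delimiter and a header row; adjust if the files differ.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        return {"columns": reader.fieldnames or [], "rows": len(rows)}
```

For example, `describe_json("multi/sampled_instructions_common.json")` and `describe_csv("multi/justeval_scores_multi.csv")` show which fields are available before any analysis code is written against them.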

Citation

If you use this repository, please cite the paper:

@inproceedings{ponce2026judging,
  title = {Judging Instruction Responses in a Low-Resource Language: A Case Study on Basque},
  author = {Ponce, David and Gete, Harritxu and Etchegoyhen, Thierry and Zubiaga, Irune and Soroa, Aitor},
  booktitle = {Proceedings of the 15th edition of the Language Resources and Evaluation Conference (LREC 2026)},
  note = {to appear},
  year = {2026}
}