
New Alternative to LLM-as-a-judge!  #24

@milangritta

Description

Hello Clementine and the Evaluation Community,

We would like to introduce our new metric, HumanRankEval, an alternative to the popular 'LLM-as-a-judge'. Instead of using an LLM to judge machine-generated text, we use human-generated text to 'judge' the LLM! :) Please take a look, and let us know what you think. Thank you very much! :)

NAACL '24 PAPER LINK: https://aclanthology.org/2024.naacl-long.456/
CODE: https://github.com/huawei-noah/noah-research/tree/master/NLP/HumanRankEval
DATA: https://huggingface.co/datasets/huawei-noah/human_rank_eval
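For readers who want the gist before opening the paper: a minimal sketch of the idea, assuming answers to each question are scored by the LLM's average log-likelihood and compared against human votes via Pearson correlation, averaged over questions. The names here (`human_rank_eval`, `log_likelihood`) are illustrative stand-ins, not the official API from the linked repository:

```python
# Illustrative sketch of the HumanRankEval idea (not the official code):
# for each question, score every human-written answer with the LLM, then
# measure how well the model's ranking agrees with the human-vote ranking.

from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def human_rank_eval(questions, log_likelihood):
    """Average per-question correlation between model scores and human votes.

    `questions` is a list of (answers, votes) pairs, where `answers` is a
    list of human-written answer strings and `votes` their human vote counts.
    `log_likelihood` is a hypothetical callable returning the model's
    average per-token log-probability for an answer string.
    """
    per_question = []
    for answers, votes in questions:
        model_scores = [log_likelihood(a) for a in answers]
        per_question.append(pearson(model_scores, votes))
    return sum(per_question) / len(per_question)
```

In this toy setup, a model whose likelihoods track human votes scores close to 1.0, so no second LLM is needed as a judge; the human votes already in the data play that role.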
