This repository contains the code associated with the paper "Evaluating the Performance of LLMs in Drafting NIH Data Management Plans." In the paper, we evaluated the performance of Llama 3.3 and GPT-4.1 in drafting NIH-compliant Data Management Plans (DMPs) using two complementary approaches: automated reference-based evaluation and human expert evaluation.
The repository includes the complete automated and human evaluation workflows. Please refer to the project inventory for all related resources, including the paper.
The overall codebase is organized in alignment with the FAIR-BioRS guidelines. All Python code follows PEP 8 conventions, including consistent formatting, inline comments, and docstrings. Project dependencies are fully captured in requirements.txt.
Clone the repository and open it in your editor:

```bash
git clone https://github.com/fairdataihub/nih-dmp-llm-evaluation-paper-code.git
cd dmpchef
code .
```

Create and activate a virtual environment.

Windows (cmd):

```bash
python -m venv venv
venv\Scripts\activate.bat
```

macOS/Linux:

```bash
python -m venv venv
source venv/bin/activate
```

Install the dependencies:

```bash
pip install -r requirements.txt
```

This repository supports two complementary evaluation workflows. Use the appropriate notebook depending on the evaluation approach you want to run.
For the automated reference-based evaluation, use `Automated-evaluation.ipynb`.
The Jupyter notebook makes use of files in the dataset associated with the paper. You will need to download the dataset and add it to the input folder (name the dataset folder `dataset`). Please refer to the project inventory for a link to the dataset.
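Before running the notebook, you can sanity-check that the dataset landed in the expected location. This is a minimal sketch, not part of the notebook itself; the `input/dataset` path simply mirrors the placement instructions above, so adjust it if your checkout differs:

```python
from pathlib import Path


def dataset_ready(root: str = ".") -> bool:
    """Return True if the downloaded dataset folder exists at
    input/dataset under the given repository root (folder names
    taken from the setup instructions in this README)."""
    return (Path(root) / "input" / "dataset").is_dir()


if __name__ == "__main__":
    if dataset_ready():
        print("Dataset folder found - ready to run the notebook.")
    else:
        print("Dataset folder missing - see the project inventory for the download link.")
```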
All outputs from both evaluation pipelines (tables and figures) are saved under the `outputs/` directory.
This work is licensed under the MIT License. See LICENSE for more information.
Use GitHub Issues to submit feedback, report problems, or suggest improvements. You can also fork the repository and submit a Pull Request with your changes.
If you use this code, please cite this repository by following the instructions in the CITATION.cff file.