cmu-llab/PBEBench

pbe-reasoning

A benchmark for evaluating the reasoning capabilities of LLMs using linguistics-inspired BFCC sequential string manipulation programs in a programming-by-example / sound law induction setting.
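
To make the setting concrete, here is a minimal sketch of what a sequential string manipulation program could look like, assuming each step is a plain substring replacement applied in order. The `Rule` type, the `apply_program` and `is_consistent` helpers, and the example rules are hypothetical illustrations, not code from this repository.

```python
from typing import List, Tuple

# Hypothetical representation of one step: (pattern, replacement).
Rule = Tuple[str, str]


def apply_program(program: List[Rule], word: str) -> str:
    """Apply each replacement rule to the word, one after another."""
    for pattern, replacement in program:
        word = word.replace(pattern, replacement)
    return word


def is_consistent(program: List[Rule], examples: List[Tuple[str, str]]) -> bool:
    """Return True if the program maps every input to its expected output."""
    return all(apply_program(program, src) == tgt for src, tgt in examples)


# Illustrative rules: the first rule produces the "b" that the second rewrites.
program = [("a", "b"), ("b", "c")]
examples = [("aba", "ccc"), ("xyz", "xyz")]
print(is_consistent(program, examples))
```

In a programming-by-example setting, only the input/output pairs are given and the program has to be inferred; a check like `is_consistent` is one way a candidate program could be scored against the examples.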

Running Instructions

Data Generation

To automatically generate samples, run:

python src/data_generation/generate.py

Data Validation

To validate automatically generated or human-written samples, run:

python src/data_generation/validate.py "/path/to/samples.json"

Program Permutation/Reordering Task

Dataset creation command:

python src/permutation_eval/dataset.py --input "data/adaptive_balanced_1008_complete_promptsfile.jsonl" --output "data/adaptive_balanced_1008_permutation_promptsfile.jsonl" --max-attempts 10000 --seed 42 --strategy "fb_swap"
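
A reordering task is meaningful because the order of steps can change a program's output. The sketch below assumes programs are ordered lists of substring replacements and lists which orderings of a small program still reproduce the examples. It does not reproduce the `fb_swap` strategy or any logic in `src/permutation_eval/dataset.py`; every name in it is hypothetical.

```python
from itertools import permutations


def apply_program(program, word):
    """Apply (pattern, replacement) rules to the word in the given order."""
    for pattern, replacement in program:
        word = word.replace(pattern, replacement)
    return word


def consistent_orderings(program, examples):
    """Return every ordering of the rules that reproduces all examples."""
    return [
        list(order)
        for order in permutations(program)
        if all(apply_program(order, src) == tgt for src, tgt in examples)
    ]


# Illustrative program: swapping the two rules changes the result for "ab".
program = [("a", "b"), ("b", "c")]
examples = [("ab", "cc")]
print(consistent_orderings(program, examples))
```

Brute-force enumeration is only practical for very short programs, since the number of orderings grows factorially with the number of rules.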

About

Code for the paper: "PBEBench: A Multi-Step Programming by Examples Reasoning Benchmark inspired by Historical Linguistics"
