@@ -44,7 +44,7 @@ <h1 class="paper-title">
     </p>
 
     <div class="badge-row">
-      <a href="paper.pdf" class="badge-btn paper" target="_blank">
+      <a href="https://arxiv.org/pdf/2505.23126" class="badge-btn paper" target="_blank">
         <span class="icon">📄</span> Paper
       </a>
       <a href="https://huggingface.co/datasets/changelinglab/PBEBench-Lite" class="badge-btn dataset" target="_blank">
@@ -324,61 +324,6 @@ <h3>PBEBench-Lite on HuggingFace 🤗</h3>
   </div>
 </section>
 
-<!-- ===== EVALUATION ===== -->
-<section id="evaluation">
-  <div class="container">
-    <h2 class="section-title">Evaluate Your Model</h2>
-    <p style="color:#4a4a4a;margin-bottom:1.5rem;">
-      Follow these steps to evaluate a new model on PBEBench-Lite and submit results to the leaderboard:
-    </p>
-
-    <div class="eval-steps">
-      <div class="eval-step">
-        <p>
-          <strong>Clone the repository.</strong><br>
-          <code>git clone https://github.com/ymathur/symbolic-library-agent</code>
-          and install dependencies with <code>pip install -r requirements.txt</code>.
-        </p>
-      </div>
-      <div class="eval-step">
-        <p>
-          <strong>Download the dataset.</strong><br>
-          Load PBEBench-Lite from HuggingFace:
-          <code>datasets.load_dataset("changelinglab/PBEBench-Lite")</code>.
-          Both <code>synthesis</code> and <code>reordering</code> splits are available.
-        </p>
-      </div>
-      <div class="eval-step">
-        <p>
-          <strong>Configure your model.</strong><br>
-          Add your model's API key or local path to <code>configs/</code>.
-          Refer to <code>docs/agents.md</code> for the agent interface specification.
-        </p>
-      </div>
-      <div class="eval-step">
-        <p>
-          <strong>Run evaluation.</strong><br>
-          <code>python main.py --model your_model_name --split synthesis</code>.
-          Results are written to <code>outputs/</code> in the standard JSONL format.
-        </p>
-      </div>
-      <div class="eval-step">
-        <p>
-          <strong>Compute metrics and submit.</strong><br>
-          Use <code>scripts/compute_metrics.py</code> to compute Pass@1, EditSim, and other metrics.
-          Open a GitHub issue with your results to be added to the leaderboard.
-        </p>
-      </div>
-    </div>
-
-    <div style="margin-top:1.5rem">
-      <a href="https://github.com/ymathur/symbolic-library-agent" class="badge-btn github" target="_blank">
-        <span class="icon">💻</span> GitHub Repository
-      </a>
-    </div>
-  </div>
-</section>
-
 <!-- ===== CITATION ===== -->
 <section id="citation">
   <div class="container">
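The removed "Compute metrics and submit" step refers to `scripts/compute_metrics.py` for Pass@1 and EditSim, but this diff does not show how that script defines them. The following is a minimal Python sketch under stated assumptions, not the repository's implementation. It assumes one JSONL record per problem with hypothetical fields `passed`, `prediction`, and `reference`. Pass@1 is taken as the fraction of problems whose single attempt passes. EditSim is taken as one minus the Levenshtein distance normalized by the longer string's length.

```python
import json
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def edit_sim(pred: str, ref: str) -> float:
    """Assumed EditSim definition: 1 - distance / length of the longer string."""
    if not pred and not ref:
        return 1.0
    return 1.0 - levenshtein(pred, ref) / max(len(pred), len(ref))


def score(records: Iterable[dict]) -> dict:
    """Average Pass@1 and EditSim over one attempt per problem.

    The field names 'passed', 'prediction', and 'reference' are hypothetical.
    """
    records = list(records)
    if not records:
        raise ValueError("no records to score")
    n = len(records)
    pass_at_1 = sum(bool(r["passed"]) for r in records) / n
    mean_sim = sum(edit_sim(r["prediction"], r["reference"]) for r in records) / n
    return {"pass@1": pass_at_1, "edit_sim": mean_sim}


def score_jsonl(path: str) -> dict:
    """Read one JSON object per non-empty line of a results file and score them."""
    with open(path, encoding="utf-8") as f:
        return score(json.loads(line) for line in f if line.strip())


if __name__ == "__main__":
    rows = [
        {"passed": True, "prediction": "abc", "reference": "abc"},
        {"passed": False, "prediction": "kitten", "reference": "sitting"},
    ]
    print(score(rows))
```

With several sampled attempts per problem, Pass@1 would instead be the mean per-problem success rate. The single-attempt form above is the simplest case.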