Commit 5ada569

Remove Evaluate Your Model section and update paper link to arXiv
1 parent 7f76ccb commit 5ada569

1 file changed

Lines changed: 1 addition & 56 deletions

docs/index.html

@@ -44,7 +44,7 @@ <h1 class="paper-title">
 </p>
 
 <div class="badge-row">
-<a href="paper.pdf" class="badge-btn paper" target="_blank">
+<a href="https://arxiv.org/pdf/2505.23126" class="badge-btn paper" target="_blank">
 <span class="icon">📄</span> Paper
 </a>
 <a href="https://huggingface.co/datasets/changelinglab/PBEBench-Lite" class="badge-btn dataset" target="_blank">
@@ -324,61 +324,6 @@ <h3>PBEBench-Lite on HuggingFace 🤗</h3>
 </div>
 </section>
 
-<!-- ===== EVALUATION ===== -->
-<section id="evaluation">
-<div class="container">
-<h2 class="section-title">Evaluate Your Model</h2>
-<p style="color:#4a4a4a;margin-bottom:1.5rem;">
-Follow these steps to evaluate a new model on PBEBench-Lite and submit results to the leaderboard:
-</p>
-
-<div class="eval-steps">
-<div class="eval-step">
-<p>
-<strong>Clone the repository.</strong><br>
-<code>git clone https://github.com/ymathur/symbolic-library-agent</code>
-and install dependencies with <code>pip install -r requirements.txt</code>.
-</p>
-</div>
-<div class="eval-step">
-<p>
-<strong>Download the dataset.</strong><br>
-Load PBEBench-Lite from HuggingFace:
-<code>datasets.load_dataset("changelinglab/PBEBench-Lite")</code>.
-Both <code>synthesis</code> and <code>reordering</code> splits are available.
-</p>
-</div>
-<div class="eval-step">
-<p>
-<strong>Configure your model.</strong><br>
-Add your model's API key or local path to <code>configs/</code>.
-Refer to <code>docs/agents.md</code> for the agent interface specification.
-</p>
-</div>
-<div class="eval-step">
-<p>
-<strong>Run evaluation.</strong><br>
-<code>python main.py --model your_model_name --split synthesis</code>.
-Results are written to <code>outputs/</code> in the standard JSONL format.
-</p>
-</div>
-<div class="eval-step">
-<p>
-<strong>Compute metrics and submit.</strong><br>
-Use <code>scripts/compute_metrics.py</code> to compute Pass@1, EditSim, and other metrics.
-Open a GitHub issue with your results to be added to the leaderboard.
-</p>
-</div>
-</div>
-
-<div style="margin-top:1.5rem">
-<a href="https://github.com/ymathur/symbolic-library-agent" class="badge-btn github" target="_blank">
-<span class="icon">💻</span> GitHub Repository
-</a>
-</div>
-</div>
-</section>
-
 <!-- ===== CITATION ===== -->
 <section id="citation">
 <div class="container">
