
Commit 45f51ad

docs (README): update features description

Parent commit: eddefaf
1 file changed, 7 additions and 6 deletions


README.md

```diff
@@ -2,16 +2,17 @@
 
 Infrastructure for **AI model comparison and evaluation in symbolic execution workflows**.
 
-PySymBench is a **local web application** for evaluating ONNX models against a non-AI baseline symbolic execution strategy. Experiments run inside Docker using [PySymGym](https://github.com/PySymGym/PySymGym) tools on a fixed dataset; results are emailed back to the user and (when published) saved to a leaderboard.
+PySymBench is a **local web application** for running ONNX models as guiding strategies in symbolic execution. Experiments run inside Docker using [PySymGym](https://github.com/PySymGym/PySymGym) tools on a fixed test set; results are emailed back to the user and saved to a public leaderboard.
 
-Three target languages are supported for the dataset: **C#**, **Java**, and **C++**.
+The platform is designed to cover three target languages — **C#**, **Java** and **C++** — but only **C#** is available right now; Java and C++ are in development.
 
 ## Features
 
-- **Run Experiment** — upload an ONNX model, choose a target language, select methods from the dataset, and compare the model against the baseline strategy. Coverage, errors and timing are emailed to you. Each running task can be cancelled via a one-click link in the confirmation email.
-- **Model Ranking** — a leaderboard of all completed experiments per language (with an aggregated view across languages), sorted by mean coverage. Per-experiment metrics include mean/median coverage, total tests, errors, runtime, and coverage percentage.
-- **Pairwise Comparison** — pick any two experiments from the ranking and produce side-by-side comparison artifacts (PDFs) downloadable individually or as a single zip.
-- **Model Interface docs** — page that describes the ONNX input/output specification required to plug a model into PySymGym.
+- **Run Experiment** — name the experiment, upload an ONNX model, pick a test set (currently C# or the "All" mode that runs the model against every available test set), and provide an email. The experiment is launched in Docker; metrics and artifacts are emailed back when it finishes. While the task is in progress it can be cancelled via a one-click link in the confirmation email.
+- **Model Ranking** — leaderboard of all completed experiments, split into tabs by language plus an **All Methods** tab. Rows are sorted by mean coverage (with ties broken by total tests, total errors, recency and runtime); every column is also sortable in the UI. Per-experiment metrics include mean/median coverage, total tests, errors, runtime and the share of methods that produced results.
+- **All Methods mode** — a dedicated experiment mode that runs the model against every language's test set. Each per-language run produces its own leaderboard entry (with that language's metrics), and an additional aggregated entry covering all languages is shown in the **All Methods** tab.
+- **Pairwise Comparison** — select any two experiments from a ranking tab and produce side-by-side comparison artifacts (PDFs), downloadable individually or as a single zip.
+- **Model Interface docs** — page describing the ONNX input/output specification a model must satisfy to be runnable by the experiment pipeline (tensor names, shapes, graph encoding).
 
 ### Routes
 
```

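The Model Ranking entry above orders rows by mean coverage and then by four tie-breakers, which amounts to a lexicographic sort key. The Python sketch below illustrates that ordering under stated assumptions: the `Entry` type and its field names are hypothetical (not PySymBench's actual schema), and the direction of every tie-breaker (more tests, fewer errors, more recent, shorter runtime ranking higher) is assumed, since the README names only the order of the keys.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical leaderboard row; names are illustrative, not PySymBench's schema.
    name: str
    mean_coverage: float   # e.g. 0.85 for 85%
    total_tests: int
    total_errors: int
    finished_at: float     # Unix timestamp of completion (used for recency)
    runtime_s: float       # wall-clock runtime in seconds


def ranking_key(e: Entry) -> tuple:
    # sorted() is ascending, so fields where "bigger ranks higher" are negated.
    # Only the key order comes from the README; every direction is an assumption.
    return (
        -e.mean_coverage,   # 1. higher mean coverage first
        -e.total_tests,     # 2. more tests first (assumed direction)
        e.total_errors,     # 3. fewer errors first (assumed direction)
        -e.finished_at,     # 4. more recent first (assumed direction)
        e.runtime_s,        # 5. shorter runtime first (assumed direction)
    )


def rank(entries: list[Entry]) -> list[Entry]:
    """Return a new list ordered for display on the leaderboard."""
    return sorted(entries, key=ranking_key)


if __name__ == "__main__":
    rows = [
        Entry("a", 0.80, 100, 2, 1_700_000_000.0, 50.0),
        Entry("b", 0.80, 120, 5, 1_600_000_000.0, 70.0),
        Entry("c", 0.90, 10, 0, 1_500_000_000.0, 10.0),
    ]
    print([e.name for e in rank(rows)])  # ['c', 'b', 'a']
```

Because `sorted` is stable, entries that tie on every key keep their input order, and flipping a single sign in `ranking_key` is enough to change the direction of one tie-breaker.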