Commit 53ad94b

Document running the entire benchmarking suite and testing if the results differ from the paper.
1 parent 8c18397 commit 53ad94b

1 file changed

Lines changed: 79 additions & 2 deletions

benchmarking/README.md

@@ -4,16 +4,93 @@ This directory contains sacred configuration files for benchmarking imitation's

Configuration files can be loaded either from the CLI or from the Python API. The examples below assume that your current working directory is the root of the `imitation` repository. If it is not, adjust the paths accordingly.

To run a single benchmark from the command line:

```bash
python -m imitation.scripts.<train_script> <algo> with benchmarking/<config_name>.json
```

`<train_script>` is either `train_imitation`, with `<algo>` set to `bc` or `dagger`, or `train_adversarial`, with `<algo>` set to `gail` or `airl`.
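
The pairing of training script and algorithm can be captured in a small lookup table. The following is a minimal sketch and not part of `imitation`: the helper `benchmark_command` and the config file name are hypothetical, and only the two script names and four algorithm names are taken from the description above.

```python
# Hypothetical helper (not part of imitation): builds the single-benchmark
# command line from an algorithm name and a config path.
SCRIPT_FOR_ALGO = {
    "bc": "train_imitation",
    "dagger": "train_imitation",
    "gail": "train_adversarial",
    "airl": "train_adversarial",
}


def benchmark_command(algo: str, config_path: str) -> str:
    """Return the CLI invocation for one benchmark config."""
    try:
        script = SCRIPT_FOR_ALGO[algo]
    except KeyError:
        raise ValueError(f"unknown algorithm: {algo!r}") from None
    return f"python -m imitation.scripts.{script} {algo} with {config_path}"


# The config file name below is a placeholder for illustration only.
print(benchmark_command("gail", "benchmarking/my_config.json"))
```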

To run a single benchmark from Python, add the config to your experiment:

```python
...
ex.add_config('benchmarking/<config_name>.json')
```

To generate the commands to run the entire benchmarking suite with multiple random seeds:

```bash
python experiments/commands.py \
    --name=run0 \
    --cfg_pattern=benchmarking/example_*.json \
    --seeds 0,1,2 \
    --output_dir=output
```

To run those commands in parallel:

```bash
python experiments/commands.py ... | parallel -j 8
```
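
Where GNU `parallel` is not installed, a similar effect can be approximated with the Python standard library. This is a minimal sketch under assumptions: `run_all` is a hypothetical helper, and it assumes each generated line is one complete, trusted shell command, as piping into `parallel` implies.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_all(commands, max_workers=8):
    """Run shell commands concurrently; return exit codes in input order."""

    def run_one(cmd):
        # shell=True mirrors how `parallel` hands each line to a shell.
        # Only use this with commands you trust.
        return subprocess.run(cmd, shell=True).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


# Demo with two harmless commands; real input would be the generated lines.
demo = [
    f'"{sys.executable}" -c "pass"',
    f'"{sys.executable}" -c "import sys; sys.exit(3)"',
]
exit_codes = run_all(demo, max_workers=2)
failed = [cmd for cmd, code in zip(demo, exit_codes) if code != 0]
print(f"{len(failed)} of {len(demo)} commands failed")
```

Threads are sufficient here because each worker only waits on a child process; `max_workers` plays the role of `-j`.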

To generate the commands for the Hofvarpnir cluster:

```bash
python experiments/commands.py \
    --name=run0 \
    --cfg_pattern=benchmarking/example_*.json \
    --seeds 0,1,2 \
    --output_dir=/data/output \
    --remote
```
48+
49+
To run those commands pipe them into bash:
50+
51+
```bash
52+
python experiments/commands.py ... | bash
53+
```
54+
55+
To produce a table with all the results:
56+
57+
```bash
58+
python -m imitation.scripts.analyze analyze_imitation with \
59+
source_dir_str="output/sacred" table_verbosity=0 \
60+
csv_output_path=results.csv \
61+
run_name="run0"
62+
```
63+
64+
To compute a p-value to test whether the differences from the paper are statistically significant:
65+
66+
```python
67+
import pandas as pd
68+
import numpy as np
69+
import scipy
70+
71+
data = pd.read_csv("results.csv")
72+
data["imit_return"] = data["imit_return_summary"].apply(lambda x: float(x.split(" ")[0]))
73+
summary = data[["algo", "env_name", "imit_return"]].groupby(["algo", "env_name"]).describe()
74+
summary.columns = summary.columns.get_level_values(1)
75+
summary = summary.reset_index()
76+
77+
# Table 2 (https://arxiv.org/pdf/2211.11972.pdf)
78+
paper = pd.DataFrame.from_records([
79+
{"algo": "BC", "env_name": "seals/Ant-v0", "mean": 1953, "margin": 123},
80+
{"algo": "BC", "env_name": "seals/HalfCheetah-v0", "mean": 3446, "margin": 130},
81+
])
82+
paper["count"] = 5
83+
paper["confidence_level"] = 0.95
84+
# Back out the standard deviation from the margin of error.
85+
paper["std"] = (paper["margin"] * paper["count"]) / scipy.stats.t.ppf(1-((1-paper["confidence_level"])/2), paper["count"] -1)
86+
87+
comparison = pd.merge(summary, paper, on=["algo", "env_name"])
88+
89+
comparison["pvalue"] = scipy.stats.ttest_ind_from_stats(
90+
comparison["mean_x"],
91+
comparison["std_x"],
92+
comparison["count_x"],
93+
comparison["mean_y"],
94+
comparison["std_y"],
95+
comparison["count_y"]).pvalue
96+
```
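
The back-calculation of the standard deviation and the test statistic can be written out explicitly. The following assumes, as the snippet does, that each ± value in the paper is the half-width $m$ of a two-sided Student-t confidence interval over $n$ runs, and that `ttest_ind_from_stats` is called with its default `equal_var=True`. The subscripts 1 (our runs) and 2 (the paper) are notation introduced here.

```latex
\begin{align*}
m &= t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
\quad\Longrightarrow\quad
s = \frac{m\,\sqrt{n}}{t_{1-\alpha/2,\,n-1}} \\[6pt]
s_p^2 &= \frac{(n_1-1)\,s_1^2 + (n_2-1)\,s_2^2}{n_1+n_2-2} \\[6pt]
t &= \frac{\bar{x}_1-\bar{x}_2}{s_p\,\sqrt{\tfrac{1}{n_1}+\tfrac{1}{n_2}}},
\qquad \mathrm{df} = n_1+n_2-2
\end{align*}
```

For example, with $n = 5$ and a 95% interval, $t_{0.975,\,4} \approx 2.776$, so a margin of 123 corresponds to $s \approx 123 \cdot \sqrt{5} / 2.776 \approx 99.1$.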
