Commit f014bf9

experiments: all unfilled cells are plain '-' per convention
1 parent 7b962fd commit f014bf9
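
The convention named in the commit message (every unfilled cell is a plain `-`) could be applied mechanically rather than by hand. The sketch below is a hypothetical helper, not part of this repository: `fill_empty_cells` is an assumed name, and it assumes simple pipe-delimited Markdown rows with no escaped `|` inside a cell.

```python
def fill_empty_cells(line: str, placeholder: str = "-") -> str:
    """Replace blank cells in one Markdown table row with `placeholder`.

    Hypothetical helper for illustration; it does not handle escaped
    pipes (\\|) or pipes inside inline code spans.
    """
    stripped = line.strip()
    # Only rows that start and end with a pipe are treated as table rows.
    if not (stripped.startswith("|") and stripped.endswith("|")):
        return line
    cells = stripped[1:-1].split("|")
    # Leave header separator rows such as |---|:--:| untouched.
    if all(c.strip() and set(c.strip()) <= set("-:") for c in cells):
        return line
    filled = [c.strip() or placeholder for c in cells]
    return "| " + " | ".join(filled) + " |"


if __name__ == "__main__":
    row = "| `max_similarity_path` | 33.56 | | | | | | | |"
    print(fill_empty_cells(row))
```

Running it over every line of a README leaves prose and separator rows unchanged and is idempotent on rows that are already filled.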

1 file changed

Lines changed: 33 additions & 28 deletions

experiments/README.md

@@ -68,12 +68,12 @@ accuracy % over OEWN-resolvable instances.
 | `simple_lesk` | 47.70 | 55.34 | 61.90 | 58.64 | 55.19 | 39.65 | 47.15 | 47.24 |
 | `adapted_lesk` | 47.00 | 55.34 | 60.98 | 57.19 | 54.79 | 39.15 | 47.24 | 46.54 |
 | `cosine_lesk` | 32.03 | 44.72 | 48.11 | 45.67 | 41.38 | 27.59 | 26.36 | 31.80 |
-| `max_similarity_path` | 33.56 | | | | | | | |
-| `max_similarity_wup` | 30.56 | | | | | | | |
-| `max_similarity_lch` | 33.56 | | | | | | | |
-| `max_similarity_res` | 26.62 | | | | | | | |
-| `max_similarity_jcn` | **52.55** | | | | | | | |
-| `max_similarity_lin` | 30.56 | | | | | | | |
+| `max_similarity_path` | 33.56 | - | - | - | - | - | - | - |
+| `max_similarity_wup` | 30.56 | - | - | - | - | - | - | - |
+| `max_similarity_lch` | 33.56 | - | - | - | - | - | - | - |
+| `max_similarity_res` | 26.62 | - | - | - | - | - | - | - |
+| `max_similarity_jcn` | **52.55** | - | - | - | - | - | - | - |
+| `max_similarity_lin` | 30.56 | - | - | - | - | - | - | - |
 
 Column headers: `SE07 (AW)`=SemEval-2007 fine-grained all-words
 (Raganato export), `SE13 (AW)`=SemEval-2013 Task 12,
@@ -91,28 +91,33 @@ scores confirm it — all methods produce near-identical numbers
 (±0.5 pp) across the two columns. Will rename in the next dataset
 release.
 
-### Cells not filled
-
-*****deliberately skipped.* Each `max_similarity` run is
-quadratic in (candidate synsets × context synsets) and takes ~10–30
-minutes per metric per config even on the 455-row SemEval-2007. The
-other test configs are 2–10× larger, so a full sweep of all six
-metrics would be many hours. Partial SE2007 results (the filled SE07
-column above) are sufficient to rank the six metrics; `jcn` is clearly
-best. If someone needs the complete grid, run:
-```
-python experiments/evaluate.py \
-    --configs <larger-config> \
-    --methods max_similarity_path max_similarity_wup max_similarity_lch \
-      max_similarity_res max_similarity_jcn max_similarity_lin \
-    --out experiments/results_maxsim_<config>.jsonl
-```
-
-*****jcn row being filled now.* Because jcn is the standout
-IC-based metric on SE2007, we're running it across every remaining
-config to see whether the advantage holds. Streams into
-`results_maxsim_jcn.jsonl`; expect several hours of wall time (jcn
-was 586 s for 432 rows; largest config here is 4,239 rows).
+### Cells not filled (`-`)
+
+Two reasons a cell is `-`:
+
+1. **The jcn row is being filled now.** Because jcn is the standout
+   IC-based metric on SE2007, we're running it across every remaining
+   config to see whether the advantage holds. Streams into
+   `results_maxsim_jcn.jsonl`; expect several hours of wall time (jcn
+   was 586 s for 432 rows; largest config here is 4,239 rows). Cells
+   will be updated as they land.
+
+2. **All other `max_similarity` cells are deliberately skipped.** Each
+   `max_similarity` run is quadratic in (candidate synsets × context
+   synsets) and takes ~10–30 minutes per metric per config even on
+   the 455-row SemEval-2007. The other test configs are 2–10× larger,
+   so a full sweep of all six metrics would be many hours. Partial
+   SE2007 results (the filled SE07 column above) are sufficient to
+   rank the six metrics; `jcn` is clearly best. If someone needs the
+   complete grid, run:
+
+   ```
+   python experiments/evaluate.py \
+       --configs <larger-config> \
+       --methods max_similarity_path max_similarity_wup max_similarity_lch \
+         max_similarity_res max_similarity_jcn max_similarity_lin \
+       --out experiments/results_maxsim_<config>.jsonl
+   ```
 
 ### Instance counts (gold-resolvable / total in test split)
 
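
The wall-time figures quoted in the README text (586 s for 432 rows; 4,239 rows in the largest config) allow a rough per-config estimate. The sketch below is a back-of-envelope calculation, not output from `evaluate.py`. It assumes cost scales linearly with row count, i.e. that the per-instance candidate × context synset work is about the same across configs, which the source does not confirm. The function name `estimate_seconds` is illustrative.

```python
# Figures quoted in the README text of this commit.
JCN_SECONDS = 586      # measured jcn wall time on SemEval-2007
JCN_ROWS = 432         # rows processed in that run
LARGEST_ROWS = 4239    # rows in the largest remaining config


def estimate_seconds(rows: int,
                     ref_seconds: float = JCN_SECONDS,
                     ref_rows: int = JCN_ROWS) -> float:
    """Linear extrapolation of wall time from a reference run.

    Assumption: per-row cost is constant across configs.
    """
    return ref_seconds / ref_rows * rows


est = estimate_seconds(LARGEST_ROWS)
print(f"largest config: ~{est:.0f} s (~{est / 3600:.1f} h)")
# prints: largest config: ~5750 s (~1.6 h)
```

Under that assumption the largest config alone takes roughly an hour and a half for one metric, which fits the README's expectation of several hours once every remaining config is included.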