<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="description"
content="FIND: A Function Description Benchmark for Evaluating Interpretability Methods">
<meta name="keywords" content="Interpretability, LLMs">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>FIND: A Function Description Benchmark for Evaluating Interpretability Methods</title>
<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro"
rel="stylesheet">
<link rel="stylesheet" href="./static/css/bulma.min.css">
<link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
<link rel="stylesheet" href="./static/css/bulma-slider.min.css">
<link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
<link rel="stylesheet"
href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
<link rel="stylesheet" href="./static/css/index.css">
<link rel="icon" href="./static/images/favicon.svg">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script defer src="./static/js/fontawesome.all.min.js"></script>
<script src="./static/js/bulma-carousel.min.js"></script>
<script src="./static/js/bulma-slider.min.js"></script>
<script src="./static/js/index.js"></script>
</head>
<body>
<section class="hero">
<div class="hero-body">
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column has-text-centered">
<h1 class="title is-1 publication-title">FIND: A Function Description Benchmark for Evaluating Interpretability Methods</h1>
<div class="is-size-5 publication-authors">
<span class="author-block">
<a href="https://cogconfluence.com">Sarah Schwettmann</a><sup>1*</sup>,</span>
<span class="author-block">
<a href="https://tamarott.github.io/">Tamar Rott Shaham</a><sup>1*</sup>,</span>
<br>
<span class="author-block">
<a href="https://joaanna.github.io/">Joanna Materzynska</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://nchowdhury.com/">Neil Chowdhury</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://shuangli59.github.io/">Shuang Li</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://www.mit.edu/~jda/">Jacob Andreas</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://baulab.info/">David Bau</a><sup>2</sup>,</span>
<span class="author-block">
<a href="https://groups.csail.mit.edu/vision/torralbalab/">Antonio Torralba</a><sup>1</sup>
</span>
</div>
* indicates equal contribution.
<div class="is-size-5 publication-authors">
<span class="author-block"><sup>1</sup>MIT CSAIL</span>
<span class="author-block"><sup>2</sup>Northeastern University</span>
</div>
<div class="is-size-4 publication-authors">
<span class="author-block">NeurIPS 2023</span>
</div>
<div class="column has-text-centered">
<div class="publication-links">
<!-- PDF Link. -->
<span class="link-block">
<a href="https://arxiv.org/pdf/2309.03886.pdf"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fas fa-file-pdf"></i>
</span>
<span>Paper</span>
</a>
</span>
<span class="link-block">
<a href="https://arxiv.org/abs/2309.03886"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="ai ai-arxiv"></i>
</span>
<span>arXiv</span>
</a>
</span>
<!-- Code Link. -->
<span class="link-block">
<a href="https://github.com/multimodal-interpretability/FIND"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fab fa-github"></i>
</span>
<span>Code</span>
</a>
</span>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
<section class="hero teaser" style="margin-top: -5px;">
<div class="container is-max-desktop">
<div class="hero-body">
<div style="display: flex; justify-content: center; width: 100%;">
<img src="./static/figures/FIND_AIA_gif.gif" alt="Animation of the FIND Automated Interpretability Agent (AIA)" style="width: 75%; height: auto;" />
</div>
</div>
</div>
</section>
<section class="section">
<div class="container is-max-desktop">
<!-- Abstract. -->
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h2 class="title is-3">Abstract</h2>
<div class="content has-text-justified">
<p>
Labeling neural network submodules with human-legible descriptions is useful for many downstream tasks: such descriptions can surface failures, guide interventions, and perhaps even explain important model behaviors. To date, most mechanistic descriptions of trained networks have involved small models, narrowly delimited phenomena, and large amounts of human labor. Labeling all human-interpretable sub-computations in models of increasing size and complexity will almost certainly require tools that can generate and validate descriptions automatically. Recently, techniques that use learned models in-the-loop for labeling have begun to gain traction, but methods for evaluating their efficacy are limited and ad-hoc. How should we validate and compare open-ended labeling tools? This paper introduces <b>FIND</b> (<b>F</b>unction <b>IN</b>terpretation and <b>D</b>escription), a benchmark suite for evaluating the building blocks of automated interpretability methods. FIND contains functions that resemble components of trained neural networks, and accompanying descriptions of the kind we seek to generate. The functions are procedurally constructed across textual and numeric domains, and involve a range of real-world complexities, including noise, composition, approximation, and bias. We evaluate methods that use pretrained language models (LMs) to produce code-based and natural language descriptions of function behavior. Additionally, we introduce a new interactive method in which an <b>A</b>utomated <b>I</b>nterpretability <b>A</b>gent (<b>AIA</b>) generates function descriptions. We find that an AIA, built with an off-the-shelf LM augmented with black-box access to functions, can sometimes infer function structure, acting as a scientist by forming hypotheses, proposing experiments, and updating descriptions in light of new data. However, <b>FIND</b> also reveals that LM-based descriptions capture global function behavior while missing local details. 
These results suggest that FIND will be useful for characterizing the performance of more sophisticated interpretability methods before they are applied to real-world models.
</p>
<hr>
</div>
</div>
</div>
<!--/ Abstract. -->
</div>
</section>
<section class="hero teaser">
<div class="container is-max-desktop">
<div class="hero-body">
<figure>
<img src="./static/figures/FIND_schematic.png" alt="Schematic of the FIND dataset and the Automated Interpretability Agent" />
<figcaption style="font-size: 0.8em;"></figcaption>
</figure>
<div class="content has-text-justified">
<p><b>FIND dataset and the Automated Interpretability Agent. </b>FIND is constructed procedurally: atomic functions are defined across domains
including elementary numeric operations (purple), string operations (green), and synthetic neural modules that
compute semantic similarity to reference entities (yellow) and implement real-world factual associations (blue).
Complexity is introduced through composition, bias, approximation and noise. We provide an LM-based
interpretation baseline that compares text and code interpretations to ground-truth function implementations.</p>
<hr>
</div>
</div>
</div>
</section>
<section class="section">
<div class="container is-max-desktop">
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h3 class="title is-4">Video</h3>
<div class="content has-text-justified">
<figure>
<iframe width="620" height="360" src="https://www.youtube.com/embed/X1yJAZMLJG8?si=l6LfplnvRGRXO0Wo" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</figure>
<hr>
</div>
</div>
</div>
</div>
</section>
<section class="section" style="margin-top: -5px;">
<div class="container is-max-desktop">
<div class="columns is-centered">
<h3 class="title is-4">AIA interpretations</h3>
</div>
</div>
</section>
<section class="section" style="margin-top: -5px;">
<div class="container is-max-desktop">
<h4 class="title is-4">Numeric functions</h4>
</div>
</section>
<section class="hero teaser">
<div class="container is-max-desktop">
<div class="hero-body">
<figure>
<img src="./static/figures/math_examples.png" alt="AIA interpretations of numeric functions" />
<figcaption style="font-size: 0.8em;"></figcaption>
</figure>
</div>
</div>
</section>
<section class="section" style="margin-top: -5px;">
<div class="container is-max-desktop">
<h4 class="title is-4">String functions</h4>
</div>
</section>
<section class="hero teaser">
<div class="container is-max-desktop">
<div class="hero-body">
<figure>
<img src="./static/figures/strings_examples.png" alt="AIA interpretations of string functions" />
<figcaption style="font-size: 0.8em;"></figcaption>
</figure>
</div>
</div>
</section>
<section class="section" style="margin-top: -5px;">
<div class="container is-max-desktop">
<h4 class="title is-4">Synthetic neurons</h4>
</div>
</section>
<section class="hero teaser">
<div class="container is-max-desktop">
<div class="hero-body">
<figure>
<img src="./static/figures/synthetic_neurons.png" alt="AIA interpretations of synthetic neurons" />
<figcaption style="font-size: 0.8em;"></figcaption>
</figure>
</div>
</div>
</section>
<section class="section" id="BibTeX">
<div class="container is-max-desktop content">
<h2 class="title">BibTeX</h2>
<pre><code>@inproceedings{schwettmann2023find,
title={FIND: A Function Description Benchmark for Evaluating Interpretability Methods},
author={Schwettmann, Sarah and Rott Shaham, Tamar and Materzynska, Joanna and Chowdhury, Neil and Li, Shuang and Andreas, Jacob and Bau, David and Torralba, Antonio},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2023}
}
</code></pre>
</div>
</section>
<footer class="footer">
<div class="columns is-centered">
<div class="column is-8">
<div class="content" style="text-align: center;">
<p>
This website is adapted from the <a href="https://github.com/nerfies/nerfies.github.io">Nerfies template</a>, which you are free to borrow if you link back to it in the footer.
</p>
</div>
</div>
</div>
</footer>
</body>
</html>