
Commit b87841a

a.palmas authored and committed
Update README and webpage
1 parent 67f85b0 commit b87841a

2 files changed

Lines changed: 35 additions & 25 deletions

README.md

Lines changed: 10 additions & 8 deletions
@@ -4,8 +4,8 @@
 </p>

 <p align="center">
-<a href="https://arxiv.org/abs/2604.00641"><img src="https://img.shields.io/badge/paper-arXiv:2604.00641-B31B1B?logo=arxiv" alt="Paper"/></a>
-<a href="https://tbd.com"><img src="https://img.shields.io/badge/blog-read%20post-blue" alt="Blog Post"/></a>
+<a href="https://arxiv.org/abs/2604.17465"><img src="https://img.shields.io/badge/paper-arXiv:2604.17465-B31B1B?logo=arxiv" alt="Paper"/></a>
+<a href="https://saifh-github.github.io/llm-dropout-noise-recognition/"><img src="https://img.shields.io/badge/blog-read%20post-blue" alt="Blog Post"/></a>
 </p>

 <p align="center">
@@ -511,12 +511,14 @@ spe/
 If you use this code in your research, please cite our paper:

 ```bibtex
-@article{spe2026,
-title = {Self-Perturbation Experiments: Can Language Models Detect Changes in Their Own Activations?},
-author = {TODO},
-year = {2026},
-journal = {TODO},
-url = {TODO}
+@article{fornasiere2026languagemodelsrecognizedropout,
+title={Language models recognize dropout and Gaussian noise applied to their activations},
+author={Damiano Fornasiere and Mirko Bronzi and Spencer Kitts and Alessandro Palmas and Yoshua Bengio and Oliver Richardson},
+year={2026},
+eprint={2604.17465},
+archivePrefix={arXiv},
+primaryClass={cs.AI},
+url={https://arxiv.org/abs/2604.17465},
 }
 ```

webpage/index.html

Lines changed: 25 additions & 17 deletions
@@ -57,15 +57,15 @@ <h1 class="title is-1 publication-title">Language Models Recognize Dropout and G
 <div class="publication-links">
 <!-- PDF Link. -->
 <span class="link-block">
-<a href="#" class="external-link button is-normal is-rounded is-dark">
+<a href="https://arxiv.org/pdf/2604.17465" class="external-link button is-normal is-rounded is-dark">
 <span class="icon">
 <i class="fas fa-file-pdf"></i>
 </span>
 <span>Paper</span>
 </a>
 </span>
 <span class="link-block">
-<a href="#" class="external-link button is-normal is-rounded is-dark">
+<a href="https://arxiv.org/abs/2604.17465" class="external-link button is-normal is-rounded is-dark">
 <span class="icon">
 <i class="ai ai-arxiv"></i>
 </span>
@@ -74,7 +74,8 @@ <h1 class="title is-1 publication-title">Language Models Recognize Dropout and G
 </span>
 <!-- Code Link. -->
 <span class="link-block">
-<a href="#" class="external-link button is-normal is-rounded is-dark">
+<a href="https://github.com/saifh-github/llm-dropout-noise-recognition"
+class="external-link button is-normal is-rounded is-dark">
 <span class="icon">
 <i class="fab fa-github"></i>
 </span>
@@ -323,12 +324,12 @@ <h3 class="title is-4">Are models just biased toward the perturbed sentence?
 </h3>
 <div class="content has-text-justified">
 <p>
-One hypothesis that might have accounted for the performance on this
-localization task is that the perturbation simply steers the model to pick
-whichever sentence was perturbed, regardless of the question. To rule this out,
-we present the model with two sentences on different topics (<em>e.g.</em>, one
-about animals, one about cities), perturb one, and ask a simple comprehension
-question (<em>e.g.</em>, <em>"Which sentence was about animals?"</em>).
+One hypothesis that might have accounted for the performance on this
+localization task is that the perturbation simply steers the model to pick
+whichever sentence was perturbed, regardless of the question. To rule this out,
+we present the model with two sentences on different topics (<em>e.g.</em>, one
+about animals, one about cities), perturb one, and ask a simple comprehension
+question (<em>e.g.</em>, <em>"Which sentence was about animals?"</em>).
 </p>
 </div>

@@ -391,7 +392,8 @@ <h2 class="title is-3">Result 2: Zero-Shot Classification</h2>

 <div class="content has-text-justified">
 <p>
-Setting itself apart from the other models, <b>Qwen3-32B</b> (left) exhibits a notable pattern: <em>accuracy increases monotonically</em> with
+Setting itself apart from the other models, <b>Qwen3-32B</b> (left) exhibits a notable pattern:
+<em>accuracy increases monotonically</em> with
 perturbation strength. The model has a high prior to answer "dropout" (96.2% at the lowest
 tested rate), and yet it climbs to 99.2% at the highest rate. For noise, accuracy
 rises from 4.3% to 15.5% with the correct labels, and drastically to 89.6% when
@@ -460,7 +462,8 @@ <h3 class="title is-4">Do correct labels matter?</h3>
 the degree to which the demonstrations conflict with other learned prior.
 We therefore compare the difference between in-context learning with the correct labels, and flipped ones
 (<em>i.e.</em>, dropout labeled as "noise", and vice versa).
-We also run the same test with control labels. The resulting heatmaps of accuracy as a function of both perturbation strengths are shown for Qwen3-32B below.
+We also run the same test with control labels. The resulting heatmaps of accuracy as a function of both
+perturbation strengths are shown for Qwen3-32B below.
 </p>
 </div>

@@ -541,11 +544,16 @@ <h3>
 <section class="section" id="BibTeX">
 <div class="container is-max-desktop content">
 <h2 class="title">BibTeX</h2>
-<pre><code>@article{fornasiere2026dropout,
-author = {Fornasiere, Damiano and Bronzi, Mirko and Kitts, Spencer and Palmas, Alessandro and Bengio, Yoshua and Richardson, Oliver},
-title = {Language Models Recognize Dropout and Gaussian Noise Applied to Their Activations},
-year = {2026},
-}</code></pre>
+<pre><code>@article{fornasiere2026languagemodelsrecognizedropout,
+title={Language models recognize dropout and Gaussian noise applied to their activations},
+author={Damiano Fornasiere and Mirko Bronzi and Spencer Kitts and Alessandro Palmas and Yoshua Bengio and Oliver Richardson},
+year={2026},
+eprint={2604.17465},
+archivePrefix={arXiv},
+primaryClass={cs.AI},
+url={https://arxiv.org/abs/2604.17465},
+}
+</code></pre>
 </div>
 </section>

@@ -567,4 +575,4 @@ <h2 class="title">BibTeX</h2>

 </body>

-</html>
+</html>
