Commit 5b8c275

Add references and citations to manuscript
Author: Marcel
1 parent 86c65ef

3 files changed

Lines changed: 46 additions & 6 deletions

paper/main.pdf

12.4 KB
Binary file not shown.

paper/main.tex

@@ -39,12 +39,12 @@
 This paper reports a multi-stage study of hidden-state geometry for correctness prediction in language-model reasoning.
 The original hypothesis was ambitious: perhaps global topological summaries of hidden-state traces could provide a robust signal of correctness or ``truth.''
 That hypothesis did not survive stronger checks.
-On GSM8K-style reasoning traces, dynamic and windowed $H_0$ features were weak, often degenerate, and did not outperform simple controls.
+On GSM8K-style reasoning traces \cite{cobbe2021gsm8k}, dynamic and windowed $H_0$ features were weak, often degenerate, and did not outperform simple controls.
 A separate fixed-decoding GSM8K branch yielded a stronger operational result: non-convergence, as measured by token-cap termination, was the dominant predictor of error.
 
 The main positive result emerged only after changing the task.
 We introduced a procedural micro-world benchmark with nonce vocabularies, held-out templates, and exact tri-label semantics over \texttt{True}/\texttt{False}/\texttt{Unknown}.
-In that setting, a consistent dissociation appeared across Qwen and Gemma families: decoder behavior systematically under-expressed \texttt{Unknown}, yet verdict-region hidden-state probes recovered substantial \texttt{Unknown} signal on held-out worlds.
+In that setting, a consistent dissociation appeared across Qwen and Gemma families \cite{qwen_model_card,gemma_model_card}: decoder behavior systematically under-expressed \texttt{Unknown}, yet verdict-region hidden-state probes recovered substantial \texttt{Unknown} signal on held-out worlds.
 This dissociation survived prompt-path controls, constrained label decoding, base-vs-instruct checks, verdict-step logit analysis, and layer sweeps.
 The paper's final claim is therefore narrower and stronger than the original repository framing: semantic non-entailment is internally represented, but often under-realized by decoder readout.
 \end{abstract}
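Reviewer note on the abstract hunk above: for readers unfamiliar with the $H_0$ features it mentions, the following is a minimal sketch of how finite $H_0$ death times of a Vietoris--Rips filtration can be computed for a small point cloud. It is not the manuscript's pipeline (the paper cites Ripser for that). The function names `h0_death_times` and `windowed_h0_total`, the use of the pairwise Euclidean distance as the scale parameter, and the "sum of death times per sliding window" feature are all assumptions made for illustration.

```python
import math


def h0_death_times(points):
    """Finite H_0 death times of the Vietoris-Rips filtration of a point cloud.

    Every point is born at scale 0. A connected component dies at the pairwise
    distance at which it merges into another component, which is exactly a
    minimum-spanning-tree edge length (Kruskal's algorithm with union-find).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, smallest first.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components
            parent[ri] = rj
            deaths.append(d)
    return deaths  # n - 1 values; the last surviving component never dies


def windowed_h0_total(trace, window):
    """Hypothetical windowed feature: total H_0 persistence per sliding window."""
    return [
        sum(h0_death_times(trace[s:s + window]))
        for s in range(len(trace) - window + 1)
    ]


# Toy 2-D "trace"; real hidden states would be high-dimensional vectors.
toy_trace = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (5.0, 1.0)]
toy_deaths = h0_death_times(toy_trace[:3])
toy_windows = windowed_h0_total(toy_trace, window=3)
```

Because $H_0$ only tracks when components merge, these values reduce to single-linkage merge distances, which may help explain why such features can be weak or degenerate on short traces.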
@@ -67,7 +67,7 @@ \section{Scope and Final Claim}
 The stronger result is a readout-bottleneck interpretation supported by multiple controls.
 
 \section{Why the Original Idea Failed}
-The original working hypothesis was that correct reasoning might have a cleaner global geometric or topological signature than incorrect reasoning.
+The original working hypothesis was that correct reasoning might have a cleaner global geometric or topological signature than incorrect reasoning \cite{carlsson2009,bauer2021ripser}.
 The basic recipe was:
 \begin{enumerate}
 \item run a small model on a reasoning task,
@@ -114,7 +114,7 @@ \subsection{Phase A: Global Topology on GSM8K (Negative)}
 These numbers are not compatible with a strong correctness-prediction claim.
 
 \subsection{Phase B: Fixed-Decoding GSM8K (Convergence Result)}
-A second branch held decoding fixed for Qwen3.5-2B and shifted focus from topology to operational failure mode.
+A second branch held decoding fixed for Qwen3.5-2B and shifted focus from topology to operational failure mode on a GSM8K slice \cite{cobbe2021gsm8k}.
 The key variables were:
 \[
 C = \mathbf{1}[\text{run terminated by } \texttt{max\_new\_tokens}], \qquad
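Reviewer note on the Phase B hunk above: the indicator $C$ marks runs that stopped because they exhausted the token budget. One way such a flag might be computed is sketched below. This is an assumption, not the manuscript's code. The names `hit_token_cap`, `error_rate_by_cap`, and `eos_id`, and the rule "full budget used and last token is not end-of-sequence", are hypothetical. The remaining variables of the truncated equation are not reconstructed here.

```python
def hit_token_cap(generated_ids, max_new_tokens, eos_id):
    """Return 1 if the run used its whole budget without ending on EOS, else 0.

    Assumed operationalisation of C; the paper may define termination differently.
    """
    used_full_budget = len(generated_ids) >= max_new_tokens
    ended_with_eos = bool(generated_ids) and generated_ids[-1] == eos_id
    return int(used_full_budget and not ended_with_eos)


def error_rate_by_cap(runs):
    """Error rate grouped by C, given an iterable of (C, is_correct) pairs."""
    totals, errors = {}, {}
    for c, is_correct in runs:
        totals[c] = totals.get(c, 0) + 1
        errors[c] = errors.get(c, 0) + (0 if is_correct else 1)
    return {c: errors[c] / totals[c] for c in totals}


# Made-up token ids and outcomes, purely for illustration.
capped = hit_token_cap([5, 6, 7], max_new_tokens=3, eos_id=0)
stopped_early = hit_token_cap([5, 0], max_new_tokens=3, eos_id=0)
toy_rates = error_rate_by_cap([(1, False), (1, False), (0, True), (0, False)])
```

Comparing the two error rates returned by `error_rate_by_cap` is the simplest check of whether cap termination predicts error in a given set of runs.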
@@ -147,7 +147,7 @@ \subsection{Phase B: Fixed-Decoding GSM8K (Convergence Result)}
 some failures were rescued by a longer budget, but many remained wrong even after receiving more room to continue.
 
 \subsection{Phase C: Procedural Micro-World Semantics (Main Positive Result)}
-The project became scientifically cleaner only after replacing benchmark reasoning with a procedurally generated semantic task.
+The project became scientifically cleaner only after replacing benchmark reasoning with a procedurally generated semantic task, in line with controlled-evaluation guidance from recent LM analysis literature \cite{liang2022holistic}.
 Each generated world contains:
 \begin{itemize}
 \item a latent structured state,
@@ -252,7 +252,7 @@ \subsection{Linear Probe}
 \log \frac{\exp((Wh_i+b)_{y_i})}{\sum_c \exp((Wh_i+b)_c)}.
 \]
 
-This probe is intentionally weak.
+This probe is intentionally weak and follows the standard linear-probe setup \cite{alain2016probes}.
 If it succeeds, the information is already arranged in hidden space in a directly readable linear form.
 
 \subsection{Within-World Geometry Gap}
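Reviewer note on the Linear Probe hunk above: the visible term is the per-example log-probability under a softmax over $Wh_i+b$. A minimal pure-Python sketch of such a probe follows, assuming the objective is the usual mean negative log-likelihood (the start of the equation is outside the hunk). The data are synthetic 2-D points rather than hidden states. The label encoding 0/1/2 for \texttt{True}/\texttt{False}/\texttt{Unknown}, the learning rate, the step count, and all function names are illustrative assumptions.

```python
import math
import random


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, b, h):
    # Computes W h + b, one entry per class.
    return [sum(w * x for w, x in zip(row, h)) + bc for row, bc in zip(W, b)]


def mean_nll(W, b, H, y):
    # Negative mean of log softmax(W h_i + b)[y_i].
    return -sum(math.log(softmax(logits(W, b, h))[yi]) for h, yi in zip(H, y)) / len(H)


def train_probe(H, y, n_classes, lr=0.5, steps=200):
    """Fit W and b by full-batch gradient descent on the mean NLL."""
    d, n = len(H[0]), len(H)
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(steps):
        gW = [[0.0] * d for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for h, yi in zip(H, y):
            p = softmax(logits(W, b, h))
            for c in range(n_classes):
                err = p[c] - (1.0 if c == yi else 0.0)  # d(NLL)/d(logit_c)
                gb[c] += err / n
                for j in range(d):
                    gW[c][j] += err * h[j] / n
        for c in range(n_classes):
            b[c] -= lr * gb[c]
            for j in range(d):
                W[c][j] -= lr * gW[c][j]
    return W, b


def predict(W, b, h):
    z = logits(W, b, h)
    return max(range(len(z)), key=z.__getitem__)


# Synthetic stand-in for hidden states: three well-separated clusters,
# with hypothetical labels 0 = True, 1 = False, 2 = Unknown.
random.seed(0)
centers = [(2.0, 0.0), (-2.0, 0.0), (0.0, 2.0)]
H, y = [], []
for label, (cx, cy) in enumerate(centers):
    for _ in range(30):
        H.append([cx + random.gauss(0, 0.3), cy + random.gauss(0, 0.3)])
        y.append(label)

W, b = train_probe(H, y, n_classes=3)
accuracy = sum(predict(W, b, h) == yi for h, yi in zip(H, y)) / len(H)
final_loss = mean_nll(W, b, H, y)
```

The probe has no hidden layer, so any accuracy it reaches must come from structure that is already linearly separable in the input vectors, which is the point the hunk makes about a deliberately weak probe. A real evaluation would score held-out worlds rather than the training points used here.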
@@ -446,6 +446,46 @@ \section{Conclusion}
 semantic non-entailment is strongly encoded in hidden states but systematically under-surfaced at output time.
 The correct framing is no longer a search for a single topology-of-truth scalar, but a study of how semantic uncertainty is preserved internally and lost at decoder readout.
 
+\section*{References}
+\begin{thebibliography}{99}
+
+\bibitem{alain2016probes}
+Guillaume Alain and Yoshua Bengio.
+\newblock Understanding intermediate layers using linear classifier probes.
+\newblock In \emph{ICLR Workshop Track}, 2017.
+
+\bibitem{bauer2021ripser}
+Ulrich Bauer.
+\newblock Ripser: efficient computation of Vietoris--Rips persistence barcodes.
+\newblock \emph{Journal of Applied and Computational Topology}, 5:391--423, 2021.
+
+\bibitem{carlsson2009}
+Gunnar Carlsson.
+\newblock Topology and data.
+\newblock \emph{Bulletin of the American Mathematical Society}, 46(2):255--308, 2009.
+
+\bibitem{cobbe2021gsm8k}
+Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, and John Schulman.
+\newblock Training verifiers to solve math word problems.
+\newblock arXiv:2110.14168, 2021.
+
+\bibitem{gemma_model_card}
+Google.
+\newblock Gemma 3 4B Instruct model card.
+\newblock \url{https://huggingface.co/google/gemma-3-4b-it}. Accessed 2026-04-14.
+
+\bibitem{liang2022holistic}
+Percy Liang, Rishi Bommasani, Tony Lee, et al.
+\newblock Holistic evaluation of language models.
+\newblock \emph{Transactions on Machine Learning Research}, 2023.
+
+\bibitem{qwen_model_card}
+Qwen Team.
+\newblock Qwen3.5 model family card.
+\newblock \url{https://huggingface.co/Qwen/Qwen3.5-4B}. Accessed 2026-04-14.
+
+\end{thebibliography}
+
 \appendix
 \section{Appendix A: Full Reproduction Commands}
 \begin{verbatim}

paper/paper.pdf

12.4 KB
Binary file not shown.
