paper/main.tex (46 additions, 6 deletions)
@@ -39,12 +39,12 @@
 This paper reports a multi-stage study of hidden-state geometry for correctness prediction in language-model reasoning.
 The original hypothesis was ambitious: perhaps global topological summaries of hidden-state traces could provide a robust signal of correctness or ``truth.''
 That hypothesis did not survive stronger checks.
-On GSM8K-style reasoning traces, dynamic and windowed $H_0$ features were weak, often degenerate, and did not outperform simple controls.
+On GSM8K-style reasoning traces\cite{cobbe2021gsm8k}, dynamic and windowed $H_0$ features were weak, often degenerate, and did not outperform simple controls.
 A separate fixed-decoding GSM8K branch yielded a stronger operational result: non-convergence, as measured by token-cap termination, was the dominant predictor of error.

 The main positive result emerged only after changing the task.
 We introduced a procedural micro-world benchmark with nonce vocabularies, held-out templates, and exact tri-label semantics over \texttt{True}/\texttt{False}/\texttt{Unknown}.
-In that setting, a consistent dissociation appeared across Qwen and Gemma families: decoder behavior systematically under-expressed \texttt{Unknown}, yet verdict-region hidden-state probes recovered substantial \texttt{Unknown} signal on held-out worlds.
+In that setting, a consistent dissociation appeared across Qwen and Gemma families\cite{qwen_model_card,gemma_model_card}: decoder behavior systematically under-expressed \texttt{Unknown}, yet verdict-region hidden-state probes recovered substantial \texttt{Unknown} signal on held-out worlds.
 This dissociation survived prompt-path controls, constrained label decoding, base-vs-instruct checks, verdict-step logit analysis, and layer sweeps.
 The paper's final claim is therefore narrower and stronger than the original repository framing: semantic non-entailment is internally represented, but often under-realized by decoder readout.
 \end{abstract}
@@ -67,7 +67,7 @@ \section{Scope and Final Claim}
 The stronger result is a readout-bottleneck interpretation supported by multiple controls.

 \section{Why the Original Idea Failed}
-The original working hypothesis was that correct reasoning might have a cleaner global geometric or topological signature than incorrect reasoning.
+The original working hypothesis was that correct reasoning might have a cleaner global geometric or topological signature than incorrect reasoning\cite{carlsson2009,bauer2021ripser}.
 The basic recipe was:
 \begin{enumerate}
 \item run a small model on a reasoning task,
@@ -114,7 +114,7 @@ \subsection{Phase A: Global Topology on GSM8K (Negative)}
 These numbers are not compatible with a strong correctness-prediction claim.
-The project became scientifically cleaner only after replacing benchmark reasoning with a procedurally generated semantic task.
+The project became scientifically cleaner only after replacing benchmark reasoning with a procedurally generated semantic task, in line with controlled-evaluation guidance from recent LM analysis literature \cite{liang2022holistic}.
+This probe is intentionally weak and follows the standard linear-probe setup \cite{alain2016probes}.
 If it succeeds, the information is already arranged in hidden space in a directly readable linear form.

 \subsection{Within-World Geometry Gap}
@@ -446,6 +446,46 @@ \section{Conclusion}
 semantic non-entailment is strongly encoded in hidden states but systematically under-surfaced at output time.
 The correct framing is no longer a search for a single topology-of-truth scalar, but a study of how semantic uncertainty is preserved internally and lost at decoder readout.

+\section*{References}
+\begin{thebibliography}{99}
+
+\bibitem{alain2016probes}
+Guillaume Alain and Yoshua Bengio.
+\newblock Understanding intermediate layers using linear classifier probes.
+\newblock In \emph{ICLR Workshop Track}, 2016.
+
+\bibitem{bauer2021ripser}
+Ulrich Bauer.
+\newblock Ripser: efficient computation of Vietoris--Rips persistence barcodes.
+\newblock\emph{Journal of Applied and Computational Topology}, 5:391--423, 2021.
+
+\bibitem{carlsson2009}
+Gunnar Carlsson.
+\newblock Topology and data.
+\newblock\emph{Bulletin of the American Mathematical Society}, 46(2):255--308, 2009.
+
+\bibitem{cobbe2021gsm8k}
+Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, and John Schulman.
+\newblock Training verifiers to solve math word problems.