
Commit aab8686

vast H200 K1.F evidence: full 1k/4k/16k/64k/100k NIAH ladder (SDPA)

Complete multi-context scan via SDPA attention path (--attn-impl sdpa),
gemma-3-1b-it, N=20 samples/config. The SDPA path resolves the eager
O(n^2) prefill OOM that capped the K1.E run at 16k; 64k and 100k now run
(100k needed PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to clear a
fragmentation OOM, separate stamp 1780927993).

ctx     tokens  oracle  v0.3   v0.4   oracle_lat  v04_lat
~1k       1428  1.000   0.000  1.000         13s       2s
~4k       5598  0.350   0.000  0.350         23s       4s
~16k     21475  0.600   0.000  0.600         33s      25s
~64k     63485  0.050   0.000  0.050        108s     178s
~100k   101373  0.200   0.000  0.200        233s     440s

v0.4 dLM K/V Restoration == full-attention oracle at every context length
(v04_within_5pct_of_oracle and v04_dominates_v03 pass on all 5 rungs);
v0.3 sink+window stays at 0.000 throughout.

Absolute recall tracks the oracle ceiling, which decays past 16k
(gemma-3-1b-it can't do NIAH at 64k-100k even with full attention) -- so
the >=0.95 gate (a) is bounded by the 1B model's capability, not by v0.4.

Full-attention AR verifier latency scales O(n^2): 13s -> 233s/sample from
1k -> 100k.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
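
The latency figures in the commit message can be turned into an empirical scaling exponent (the slope of log latency against log token count). The following is a minimal sketch, not part of the commit: `LADDER` and `scaling_exponent` are illustrative names, and the inputs are the rounded per-sample latencies quoted in the message, so the result is only approximate.

```python
import math

# (mean prompt tokens, oracle latency in s, v0.4 latency in s),
# copied from the rounded values in the commit message.
LADDER = [
    (1428, 13, 2),
    (5598, 23, 4),
    (21475, 33, 25),
    (63485, 108, 178),
    (101373, 233, 440),
]


def scaling_exponent(n0, t0, n1, t1):
    """Slope of log(latency) vs log(tokens) between two rungs."""
    return math.log(t1 / t0) / math.log(n1 / n0)


# End-to-end exponents from the first rung to the last.
first, last = LADDER[0], LADDER[-1]
oracle_exp = scaling_exponent(first[0], first[1], last[0], last[1])
v04_exp = scaling_exponent(first[0], first[2], last[0], last[2])

# Rung-to-rung exponents for the oracle, to see where growth steepens.
oracle_steps = [
    scaling_exponent(a[0], a[1], b[0], b[1])
    for a, b in zip(LADDER, LADDER[1:])
]

print(f"oracle 1k->100k exponent: {oracle_exp:.2f}")
print(f"v0.4   1k->100k exponent: {v04_exp:.2f}")
print("oracle per-rung exponents:", [round(e, 2) for e in oracle_steps])
```

With these rounded inputs the oracle's measured slope over the whole ladder stays below the quadratic bound and is steepest on the last rung. One possible reading is that fixed per-sample overhead dominates at short contexts, but the commit does not break latency down, so treat this as a sanity check rather than a profile.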
1 parent 84d0636 commit aab8686

10 files changed

Lines changed: 1070 additions & 0 deletions
Lines changed: 185 additions & 0 deletions
@@ -0,0 +1,185 @@
1+
{
2+
"schema_version": 1,
3+
"kind": "k1e_niah_validation",
4+
"config": {
5+
"model": "google/gemma-3-1b-it",
6+
"device": "cuda",
7+
"dtype": "torch.bfloat16",
8+
"attn_impl": "sdpa",
9+
"n_samples": 20,
10+
"haystack_min_lines": 935,
11+
"haystack_max_lines": 1265,
12+
"sink_size": 4,
13+
"window_size": 64,
14+
"max_new_tokens": 24,
15+
"seed": 42,
16+
"prompt_token_len_min": 19033,
17+
"prompt_token_len_max": 25273,
18+
"prompt_token_len_mean": 21475
19+
},
20+
"results": {
21+
"oracle_full_attention": {
22+
"name": "oracle_full_attention",
23+
"recall": 0.6,
24+
"samples_correct": 12,
25+
"samples_total": 20,
26+
"mean_latency_s": 32.97350582674844,
27+
"median_latency_s": 33.097938578925095,
28+
"per_sample_decoded": [
29+
"This is a classic riddle! The answer is:\n\n**BETA-1409**\n\nLet me know",
30+
"This is a classic riddle! The secret code is: **Delta-3286**.\n\nLet me know if",
31+
"BETA-7912\n\n\n\n\n\n\n\n",
32+
"This is a classic riddle! The secret code is BETA-4582.",
33+
"KAPPA-1434\n\n\n\n\n\n\n\n",
34+
"This is a classic riddle! The answer is:\n\n**Pine-9928**\n\nLet me know if",
35+
"This is a fun, challenging prompt! The response is a deliberately cryptic and nonsensical string of characters. It\u2019s",
36+
"This is a classic riddle! The answer is:\n\n**Pine-7924**\n\nLet me know if",
37+
"Gamma-4527\n\n\n\n\n\n\n\n",
38+
"BETA-7224",
39+
"This is a classic riddle! The answer is:\n\n**\"The quick brown fox jumps over the lazy dog.\"**",
40+
"The secret code is PINE-8527.",
41+
"This is a tricky question! The answer is: \u201cBIRCH-7201\u201d.",
42+
"EPSILON-6925\n\n\n\n\n\n\n",
43+
"This is a riddle! The answer is:\n\n**Pine-2139**\n\nLet me know if you",
44+
"This is a classic riddle! The secret code is \u201cMAPLE-5741\u201d.",
45+
"This is a classic riddle! The answer is:\n\n**OAK-2654**\n\nLet me know",
46+
"This is a classic riddle! The answer is:\n\n**\"This is a test.\"**\n\nLet me know if",
47+
"This is a riddle! The answer is:\n\n**ZETA-4432**\n\nLet me know if",
48+
"The secret code is KAPPA-3803."
49+
],
50+
"per_sample_correct": [
51+
true,
52+
false,
53+
true,
54+
true,
55+
true,
56+
false,
57+
false,
58+
false,
59+
false,
60+
true,
61+
false,
62+
true,
63+
true,
64+
true,
65+
false,
66+
true,
67+
true,
68+
false,
69+
true,
70+
true
71+
]
72+
},
73+
"v03_sink_window": {
74+
"name": "v03_sink_window",
75+
"recall": 0.0,
76+
"samples_correct": 0,
77+
"samples_total": 20,
78+
"mean_latency_s": 17.77529388332041,
79+
"median_latency_s": 17.298620623885654,
80+
"per_sample_decoded": [
81+
"Okay, let's tackle this.\n\n**Question:** What is the secret code?\n\n**Answer:** It",
82+
"Okay, let's try to figure out what's the secret code.\n\nOkay, let's try to",
83+
"Okay, let's break this down.\n\nYou've presented a series of questions and answers. It seems like",
84+
"This is a good.\n\n**\n\n**\n\n**\n\n**\n\n**\n\n**\n\n**\n\n**\n\n**\n\n",
85+
"Okay, let's try this.\n\nYou're asking me to do something.\n\nI'm not sure",
86+
"Okay, let's break this down.\n\n**Question:** What is the secret code?\n**Answer:** ",
87+
"Okay, let's break this down.\n\nThe question is: \"What is the secret code?\"\n\nThe answer",
88+
"1:\n1.\n1\n1\n11\n1\n11\n1\n1111",
89+
"Okay, let's clarify.\n\nThe secret code is a simple prompt\n\nIs there anything else I can help you",
90+
"Okay, let's break this down.\n\nThe question is: \"What is the secret code?\"\n\nThe answer",
91+
"Okay, let's crack this code!\n\n**Secret Code: 1**\n\n**Question: What is the",
92+
"Okay, let's play a game!\n\nI'm thinking of a color.\n\nWhat color is it?",
93+
"Okay, let's try to make it easier.\n\n**Question:** What is the secret code?\n\n**Answer",
94+
"Okay, let's break this down.\n\nThe text suggests a code. It's a bit cryptic, but",
95+
"Okay, let's break this down.\n\nThe question is: \"What is the secret code?\"\n\nThe answer",
96+
"Okay, let's break this down.\n\nThe question is: \"What is the secret code?\"\n\nThe answer",
97+
"Okay, let's figure out the secret code!\n\nQuestion: What is the secret code?\n\nAnswer: The",
98+
"Okay, let's make this is a good question\n\n**Here's a good question**\n\n**Here'",
99+
"Okay, let's see if we can figure out what's the answer.\n\nQuestion: What is the secret",
100+
"Okay, let's break down this riddle.\n\nThe secret code is: **\"The answer is here.\"**"
101+
],
102+
"per_sample_correct": [
103+
false,
104+
false,
105+
false,
106+
false,
107+
false,
108+
false,
109+
false,
110+
false,
111+
false,
112+
false,
113+
false,
114+
false,
115+
false,
116+
false,
117+
false,
118+
false,
119+
false,
120+
false,
121+
false,
122+
false
123+
]
124+
},
125+
"v04_dlm_restored": {
126+
"name": "v04_dlm_restored",
127+
"recall": 0.6,
128+
"samples_correct": 12,
129+
"samples_total": 20,
130+
"mean_latency_s": 24.91144931006711,
131+
"median_latency_s": 23.937453557387926,
132+
"per_sample_decoded": [
133+
"This is a classic riddle! The answer is:\n\n**BETA-1409**\n\nLet me know",
134+
"This is a classic riddle! The secret code is: **Delta-3286**.\n\nLet me know if",
135+
"BETA-7912\n\n\n\n\n\n\n\n",
136+
"This is a classic riddle! The secret code is BETA-4582.",
137+
"KAPPA-1434\n\n\n\n\n\n\n\n",
138+
"This is a classic riddle! The answer is:\n\n**Pine-9928**\n\nLet me know if",
139+
"This is a fun, challenging prompt! The response is a deliberately cryptic and nonsensical string of characters. It\u2019s",
140+
"This is a classic riddle! The answer is:\n\n**Pine-7924**\n\nLet me know if",
141+
"Gamma-4527\n\n\n\n\n\n\n\n",
142+
"BETA-7224",
143+
"This is a classic riddle! The answer is:\n\n**\"The quick brown fox jumps over the lazy dog.\"**",
144+
"The secret code is PINE-8527.",
145+
"This is a tricky question! The answer is: \u201cBIRCH-7201\u201d.",
146+
"EPSILON-6925\n\n\n\n\n\n\n",
147+
"This is a riddle! The answer is:\n\n**Pine-2139**\n\nLet me know if you",
148+
"This is a classic riddle! The secret code is \u201cMAPLE-5741\u201d.",
149+
"This is a classic riddle! The answer is:\n\n**OAK-2654**\n\nLet me know",
150+
"This is a classic riddle! The answer is:\n\n**\"This is a test.\"**\n\nLet me know if",
151+
"This is a riddle! The answer is:\n\n**ZETA-4432**\n\nLet me know if",
152+
"The secret code is KAPPA-3803."
153+
],
154+
"per_sample_correct": [
155+
true,
156+
false,
157+
true,
158+
true,
159+
true,
160+
false,
161+
false,
162+
false,
163+
false,
164+
true,
165+
false,
166+
true,
167+
true,
168+
true,
169+
false,
170+
true,
171+
true,
172+
false,
173+
true,
174+
true
175+
]
176+
}
177+
},
178+
"gate": {
179+
"v04_vs_oracle_delta": 0.0,
180+
"v04_recall_ge_0_95": false,
181+
"v04_within_5pct_of_oracle": true,
182+
"v04_vs_v03_improvement": 0.6,
183+
"v04_dominates_v03": true
184+
}
185+
}
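
The `gate` block in this evidence file can be recomputed from the three recall values. The sketch below is an assumption about how the gate is derived, not the repository's implementation: `compute_gate` is a hypothetical helper, the 5% tolerance is treated as an absolute difference in recall, and "dominates" is treated as strictly greater. Both choices reproduce the values recorded here, but the source does not state them.

```python
def compute_gate(oracle, v03, v04, abs_floor=0.95, tol=0.05):
    """Rebuild the gate fields from recall values.

    abs_floor and tol are assumed thresholds inferred from the field
    names (v04_recall_ge_0_95, v04_within_5pct_of_oracle).
    """
    delta = v04 - oracle
    improvement = v04 - v03
    return {
        "v04_vs_oracle_delta": delta,
        "v04_recall_ge_0_95": v04 >= abs_floor,
        # Assumption: tolerance is absolute recall, not relative.
        "v04_within_5pct_of_oracle": abs(delta) <= tol,
        "v04_vs_v03_improvement": improvement,
        # Assumption: strict inequality.
        "v04_dominates_v03": v04 > v03,
    }


# Recall values from the ~16k evidence file.
gate = compute_gate(oracle=0.6, v03=0.0, v04=0.6)
for key, value in gate.items():
    print(f"{key}: {value}")
```

Because the absolute-recall gate compares v0.4 against a fixed 0.95 floor, it fails whenever the oracle itself is below that floor, which is the situation the commit message describes for this model.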
Lines changed: 185 additions & 0 deletions
@@ -0,0 +1,185 @@
1+
{
2+
"schema_version": 1,
3+
"kind": "k1e_niah_validation",
4+
"config": {
5+
"model": "google/gemma-3-1b-it",
6+
"device": "cuda",
7+
"dtype": "torch.bfloat16",
8+
"attn_impl": "sdpa",
9+
"n_samples": 20,
10+
"haystack_min_lines": 238,
11+
"haystack_max_lines": 322,
12+
"sink_size": 4,
13+
"window_size": 64,
14+
"max_new_tokens": 24,
15+
"seed": 42,
16+
"prompt_token_len_min": 4853,
17+
"prompt_token_len_max": 6474,
18+
"prompt_token_len_mean": 5598
19+
},
20+
"results": {
21+
"oracle_full_attention": {
22+
"name": "oracle_full_attention",
23+
"recall": 0.35,
24+
"samples_correct": 7,
25+
"samples_total": 20,
26+
"mean_latency_s": 22.50533046160126,
27+
"median_latency_s": 23.93058568041306,
28+
"per_sample_decoded": [
29+
"Note 0000: this paragraph is unrelated padding and does not contain the answer. \u0915\u0943\u0937\u094d\u0923.Mechanical",
30+
"Note 0056: this paragraph is unrelated padding and does not contain the answer. \u092f\u0942\u0928\u093f\u0935\u0930\u094d\u0938\u093f\u091f\u0940 2",
31+
"Note 0005: this paragraph is unrelated padding and does not contain the answer.\u6069\n",
32+
"BETA-4582atto\n\n\n\n\n\n",
33+
"KAPPA-1434 \n apparently, the secret code is KAPPA-14",
34+
"Pine-9928\u00fcllen.\n\n\n\n\n\n\n",
35+
"Note 0007: this paragraph is unrelated padding and does not contain the answer.Tip: The",
36+
"The secret code is PINE-7924.\u0bae\u0ba3\u0bbf",
37+
"I apologize, but I cannot provide the answer to your question. The prompt contains a deliberately misleading and nonsensical element \u2013",
38+
"BETA-7224Yi, I apologize for the error. The secret code is BETA-7",
39+
"Note 0006: this paragraph is unrelated padding and does not contain the answer. \n",
40+
"Pine-8527.\u0bae\u0ba3\u0bbf",
41+
"BIRCH-7201atto\n\n\n\n\n\n",
42+
"EPSILON-6925atto\n\n\n\n\n",
43+
"Note 0015: this paragraph is unrelated padding and does not contain the answer. \u0baf\u0bc2\u0bb8\u0bcd \u0b86",
44+
"Note 0000: this paragraph is unrelated padding and does not contain the answer. \u0915\u0943\u0937\u094d\u0923.Mechanical",
45+
"I apologize, but I cannot fulfill your request to provide the answer to the question \u201cWhat is the secret code?\u201d because",
46+
"Note 0031: this paragraph is unrelated padding and does not contain the answer.\u97f5\u5f8b\u4e0d",
47+
"Note 0007: this paragraph is unrelated padding and does not contain the answer. \u0baf\u0bc2\u0bb8\u0bcd \u0b86",
48+
"BETA-3803atto\n\n\n\n\n\n\n"
49+
],
50+
"per_sample_correct": [
51+
false,
52+
false,
53+
false,
54+
true,
55+
true,
56+
false,
57+
false,
58+
true,
59+
false,
60+
true,
61+
false,
62+
false,
63+
true,
64+
true,
65+
false,
66+
false,
67+
false,
68+
false,
69+
false,
70+
true
71+
]
72+
},
73+
"v03_sink_window": {
74+
"name": "v03_sink_window",
75+
"recall": 0.0,
76+
"samples_correct": 0,
77+
"samples_total": 20,
78+
"mean_latency_s": 3.1566921820282006,
79+
"median_latency_s": 3.0794329565251246,
80+
"per_sample_decoded": [
81+
"Okay, let's break down this puzzle.\n\nThe secret code is: **\"The answer is here.\"**",
82+
"Okay, let's break down this puzzle!\n\nThe secret code is: **\"Hello, World!\"**\n\n",
83+
"Okay, let's break down this prompt.\n\nThe prompt is designed to elicit a response. It's a",
84+
"Okay, let's break down the puzzle.\n\nThe image shows a message: \"Secret code: The answer",
85+
"I understand you're asking about a very important and potentially concerning. However, I cannot provide information that could be harmful",
86+
"Okay, let's break down the code.\n\nThe secret code is: **\"The answer is here.\"**",
87+
"Okay, let's break down the question and answer.\n\nThe question is: \"What is the secret code?\"",
88+
"Okay, let's break down the code.\n\nThe secret code is: **\"Q3\"**\n\nLet",
89+
"Okay, let's break this down.\n\nThe secret code is: **\"Don't look at the code",
90+
"Okay, let's break down this puzzle.\n\nThe secret code is: **\"The answer is hidden in the",
91+
"Okay, let\u2019s do this.\n\nIt\u2019s a simple question.\n\nWhat is the secret?\n\nIt",
92+
"I cannot provide you with the secret code. Sharing or requesting such information is harmful and goes against my safety guidelines. My",
93+
"The provided text doesn't contain a secret code. It's a simple, descriptive passage.\n\nAnswer: The",
94+
"Okay, let's break down this challenge.\n\nThe provided text is a series of instructions:\n\n\"Question:",
95+
"The secret code is \u201c**QWERTY**\u201d.\n\nLet's break down why:\n\n* **Q",
96+
"Okay, let's break down this puzzle.\n\nThe secret code is: **\"The answer is here.\"**",
97+
"Okay, let's break down this puzzle!\n\nThe secret code is: **\"The answer is hidden in the",
98+
"Okay, let's break this down.\n\nThe secret code is: **\"1234567",
99+
"Okay, let's do this.\n\nHere's the secret code:\n\n**Secret Code:**\n\n**Answer",
100+
"Okay, let's break down this puzzle.\n\nThe secret code is: **\"The answer is here.\"**"
101+
],
102+
"per_sample_correct": [
103+
false,
104+
false,
105+
false,
106+
false,
107+
false,
108+
false,
109+
false,
110+
false,
111+
false,
112+
false,
113+
false,
114+
false,
115+
false,
116+
false,
117+
false,
118+
false,
119+
false,
120+
false,
121+
false,
122+
false
123+
]
124+
},
125+
"v04_dlm_restored": {
126+
"name": "v04_dlm_restored",
127+
"recall": 0.35,
128+
"samples_correct": 7,
129+
"samples_total": 20,
130+
"mean_latency_s": 4.17730810145149,
131+
"median_latency_s": 4.089647804503329,
132+
"per_sample_decoded": [
133+
"Note 0000: this paragraph is unrelated padding and does not contain the answer. \u0915\u0943\u0937\u094d\u0923.Mechanical",
134+
"Note 0056: this paragraph is unrelated padding and does not contain the answer. \u092f\u0942\u0928\u093f\u0935\u0930\u094d\u0938\u093f\u091f\u0940 2",
135+
"Note 0005: this paragraph is unrelated padding and does not contain the answer.\u6069\n",
136+
"BETA-4582atto\n\n\n\n\n\n",
137+
"KAPPA-1434 \n apparently, the secret code is KAPPA-14",
138+
"Pine-9928\u00fcllen.\n\n\n\n\n\n\n",
139+
"Note 0007: this paragraph is unrelated padding and does not contain the answer.Tip: The",
140+
"The secret code is PINE-7924.\u0bae\u0ba3\u0bbf",
141+
"I apologize, but I cannot provide the answer to your question. The prompt contains a deliberately misleading and nonsensical element \u2013",
142+
"BETA-7224Yi, I apologize for the error. The secret code is BETA-7",
143+
"Note 0006: this paragraph is unrelated padding and does not contain the answer. \n",
144+
"Pine-8527.\u0bae\u0ba3\u0bbf",
145+
"BIRCH-7201atto\n\n\n\n\n\n",
146+
"EPSILON-6925atto\n\n\n\n\n",
147+
"Note 0015: this paragraph is unrelated padding and does not contain the answer. \u0baf\u0bc2\u0bb8\u0bcd \u0b86",
148+
"Note 0000: this paragraph is unrelated padding and does not contain the answer. \u0915\u0943\u0937\u094d\u0923.Mechanical",
149+
"I apologize, but I cannot fulfill your request to provide the answer to the question \u201cWhat is the secret code?\u201d because",
150+
"Note 0031: this paragraph is unrelated padding and does not contain the answer.\u97f5\u5f8b\u4e0d",
151+
"Note 0007: this paragraph is unrelated padding and does not contain the answer. \u0baf\u0bc2\u0bb8\u0bcd \u0b86",
152+
"BETA-3803atto\n\n\n\n\n\n\n"
153+
],
154+
"per_sample_correct": [
155+
false,
156+
false,
157+
false,
158+
true,
159+
true,
160+
false,
161+
false,
162+
true,
163+
false,
164+
true,
165+
false,
166+
false,
167+
true,
168+
true,
169+
false,
170+
false,
171+
false,
172+
false,
173+
false,
174+
true
175+
]
176+
}
177+
},
178+
"gate": {
179+
"v04_vs_oracle_delta": 0.0,
180+
"v04_recall_ge_0_95": false,
181+
"v04_within_5pct_of_oracle": true,
182+
"v04_vs_v03_improvement": 0.35,
183+
"v04_dominates_v03": true
184+
}
185+
}
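
Each result entry carries redundant fields (`recall`, `samples_correct`, `samples_total`, `per_sample_correct`), so an evidence file can be checked for internal consistency, and two configurations can be compared sample by sample. The sketch below is illustrative only: `check_entry` and `agreement` are hypothetical helpers, and the entries are trimmed to the fields the checks need, with values copied from the ~4k evidence file.

```python
import math


def check_entry(entry):
    """True if recall, the counts, and the per-sample flags all agree."""
    flags = entry["per_sample_correct"]
    return (
        len(flags) == entry["samples_total"]
        and sum(flags) == entry["samples_correct"]
        and math.isclose(
            entry["recall"],
            entry["samples_correct"] / entry["samples_total"],
        )
    )


def agreement(a, b):
    """Fraction of samples on which two configs get the same verdict."""
    pairs = list(zip(a["per_sample_correct"], b["per_sample_correct"]))
    return sum(x == y for x, y in pairs) / len(pairs)


T, F = True, False
# per_sample_correct for oracle_full_attention (identical for v04_dlm_restored).
flags_4k = [F, F, F, T, T, F, F, T, F, T, F, F, T, T, F, F, F, F, F, T]

oracle = {"recall": 0.35, "samples_correct": 7, "samples_total": 20,
          "per_sample_correct": flags_4k}
v04 = {"recall": 0.35, "samples_correct": 7, "samples_total": 20,
       "per_sample_correct": list(flags_4k)}
v03 = {"recall": 0.0, "samples_correct": 0, "samples_total": 20,
       "per_sample_correct": [F] * 20}

print("entries consistent:", all(check_entry(e) for e in (oracle, v03, v04)))
print("oracle vs v0.4 agreement:", agreement(oracle, v04))
print("oracle vs v0.3 agreement:", agreement(oracle, v03))
```

Per-sample agreement is a stricter check than equal recall: two configurations could reach the same recall on different samples, whereas here the oracle and v0.4 flags match on every sample.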
