Commit ec27d0c

Add cell outputs
1 parent 47a33b0 commit ec27d0c

1 file changed

Lines changed: 147 additions & 34 deletions

tutorials/49_TurboQuant_Quantization_with_HuggingFace.ipynb

@@ -8,7 +8,7 @@
 "\n",
 "- **Level**: Advanced\n",
 "- **Time to complete**: 20 min\n",
-"- **Nodes Used**: [`HuggingFaceLocalChatGenerator`](https://docs.haystack.deepset.ai/docs/huggingfacelocalchatgenerator)\n",
+"- **Components Used**: [`HuggingFaceLocalChatGenerator`](https://docs.haystack.deepset.ai/docs/huggingfacelocalchatgenerator)\n",
 "- **Goal**: Apply TurboQuant KV cache compression to a local LLM and measure its memory and throughput impact with Haystack."
 ]
 },
@@ -41,14 +41,29 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
+"metadata": {
+"ExecuteTime": {
+"end_time": "2026-04-03T13:26:59.381856Z",
+"start_time": "2026-04-03T13:26:24.012627Z"
+}
+},
 "source": [
 "%%bash\n",
 "\n",
 "pip install -q haystack-ai turboquant-vllm"
-]
+],
+"outputs": [
+{
+"name": "stderr",
+"output_type": "stream",
+"text": [
+"\n",
+"\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m A new release of pip is available: \u001B[0m\u001B[31;49m25.0.1\u001B[0m\u001B[39;49m -> \u001B[0m\u001B[32;49m26.0.1\u001B[0m\n",
+"\u001B[1m[\u001B[0m\u001B[34;49mnotice\u001B[0m\u001B[1;39;49m]\u001B[0m\u001B[39;49m To update, run: \u001B[0m\u001B[32;49mpip install --upgrade pip\u001B[0m\n"
+]
+}
+],
+"execution_count": 1
 },
 {
 "cell_type": "markdown",
@@ -61,9 +76,12 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
+"metadata": {
+"ExecuteTime": {
+"end_time": "2026-04-03T13:26:59.418920Z",
+"start_time": "2026-04-03T13:26:59.390162Z"
+}
+},
 "source": [
 "import time\n",
 "\n",
@@ -76,7 +94,9 @@
 "    if first_token_time is None:\n",
 "        first_token_time = now\n",
 "    last_token_time = now"
-]
+],
+"outputs": [],
+"execution_count": 2
 },
 {
 "cell_type": "markdown",
@@ -95,9 +115,12 @@
 },
 {
 "cell_type": "code",
-"execution_count": 10,
-"metadata": {},
-"outputs": [],
+"metadata": {
+"ExecuteTime": {
+"end_time": "2026-04-03T13:27:40.845609Z",
+"start_time": "2026-04-03T13:26:59.424193Z"
+}
+},
 "source": [
 "from transformers import DynamicCache\n",
 "from turboquant_vllm import CompressedDynamicCache\n",
@@ -107,7 +130,18 @@
 "# and not `compressed` directly.\n",
 "cache = DynamicCache()\n",
 "compressed = CompressedDynamicCache(cache, head_dim=128, bits=4)"
-]
+],
+"outputs": [
+{
+"name": "stderr",
+"output_type": "stream",
+"text": [
+"/Users/kacper.lukawski/Projects/haystack-tutorials/.venv/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
+"  from .autonotebook import tqdm as notebook_tqdm\n"
+]
+}
+],
+"execution_count": 3
 },
 {
 "cell_type": "markdown",
@@ -120,12 +154,14 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
+"metadata": {
+"ExecuteTime": {
+"end_time": "2026-04-03T13:28:00.561590Z",
+"start_time": "2026-04-03T13:27:40.873264Z"
+}
+},
 "source": [
 "from haystack.components.generators.chat import HuggingFaceLocalChatGenerator\n",
-"from haystack.utils import Secret\n",
 "\n",
 "generator = HuggingFaceLocalChatGenerator(\n",
 "    model=\"Qwen/Qwen3-4B-Thinking-2507\",\n",
@@ -136,7 +172,9 @@
 "    },\n",
 "    streaming_callback=timing_callback,\n",
 ")"
-]
+],
+"outputs": [],
+"execution_count": 4
 },
 {
 "cell_type": "markdown",
@@ -149,9 +187,12 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
+"metadata": {
+"ExecuteTime": {
+"end_time": "2026-04-03T14:06:57.416718Z",
+"start_time": "2026-04-03T13:28:00.574408Z"
+}
+},
 "source": [
 "from haystack.dataclasses import ChatMessage\n",
 "\n",
@@ -160,17 +201,57 @@
 "    ChatMessage.from_user(\"What is the capital of France?\"),\n",
 "])\n",
 "total_time = time.perf_counter() - start"
-]
+],
+"outputs": [
+{
+"name": "stderr",
+"output_type": "stream",
+"text": [
+"Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00, 55.96it/s]\n",
+"Device set to use mps\n"
+]
+}
+],
+"execution_count": 5
 },
 {
 "cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
+"metadata": {
+"ExecuteTime": {
+"end_time": "2026-04-03T14:06:59.733732Z",
+"start_time": "2026-04-03T14:06:58.328820Z"
+}
+},
 "source": [
 "reply = output[\"replies\"][0]\n",
 "print(reply.text)"
-]
+],
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"Okay, the user is asking, \"What is the capital of France?\" This seems like a straightforward geography question. Let me recall... I know that Paris is the capital of France. But wait, I should make sure I'm not mixing it up with other countries. For example, London is the capital of the UK, and Berlin is for Germany. Yeah, France's capital is definitely Paris.\n",
+"\n",
+"Hmm, why would someone ask this? Maybe they're a student studying for a test, or someone learning English as a second language. They might be confirming basic facts. Or perhaps they're testing me to see if I know the answer correctly. Either way, I should give a clear and confident answer.\n",
+"\n",
+"I remember that historically, Paris has been the capital since the Middle Ages. It's also known for the Eiffel Tower, the Louvre Museum, and the French Revolution. So, there's no doubt here. But I should double-check to avoid any mistakes. Let me think... Yes, all reliable sources say Paris is the capital. No recent changes either—France hasn't moved its capital. \n",
+"\n",
+"The user might also be confused if they heard about \"République française\" or something else, but no, the capital is still Paris. I should mention it's in the Île-de-France region to be precise, but maybe that's extra. The question is simple, so the answer should be concise.\n",
+"\n",
+"Wait, is there any trick here? Like, does France have multiple capitals? No, Paris is the only capital. Sometimes people confuse it with other cities like Lyon or Marseille, but those are major cities, not capitals. So, no confusion here.\n",
+"\n",
+"I think the best response is to state clearly that Paris is the capital of France. Maybe add a brief note about its significance to be helpful. Like, it's the political, cultural, and economic center. But the user just asked for the capital, so keep it short. Don't overcomplicate it.\n",
+"\n",
+"Also, the user might be non-native English speaker, so I should use simple language. No need for complex terms. Just \"Paris\" is enough, but adding \"the capital city of France\" makes it clear.\n",
+"\n",
+"Let me phrase it: \"The capital of France is Paris.\" Done. That's accurate and straightforward. No need for more details unless the user asks follow-ups. \n",
+"\n",
+"Wait, just to be thorough—did France ever have another capital? Like, during the French Revolution, they moved temporarily? No, Paris remained the capital. There\n"
+]
+}
+],
+"execution_count": 6
 },
 {
 "cell_type": "markdown",
@@ -187,9 +268,12 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
+"metadata": {
+"ExecuteTime": {
+"end_time": "2026-04-03T14:07:00.078667Z",
+"start_time": "2026-04-03T14:06:59.919485Z"
+}
+},
 "source": [
 "tokens = reply.meta[\"usage\"][\"completion_tokens\"]\n",
 "if first_token_time is not None and last_token_time is not None:\n",
@@ -198,7 +282,20 @@
 "    print(f\"Tokens: {tokens}\")\n",
 "    print(f\"Speed: {tokens / generation_time:.1f} tok/s\")\n",
 "print(f\"Total time: {total_time:.3f}s\")"
-]
+],
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"TTFT: 81.985s\n",
+"Tokens: 512\n",
+"Speed: 0.4 tok/s\n",
+"Total time: 1425.370s\n"
+]
+}
+],
+"execution_count": 7
 },
 {
 "cell_type": "markdown",
@@ -211,12 +308,28 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
+"metadata": {
+"ExecuteTime": {
+"end_time": "2026-04-03T14:07:00.526515Z",
+"start_time": "2026-04-03T14:07:00.121042Z"
+}
+},
 "source": [
 "compressed.vram_bytes()"
-]
+],
+"outputs": [
+{
+"data": {
+"text/plain": [
+"20680704"
+]
+},
+"execution_count": 8,
+"metadata": {},
+"output_type": "execute_result"
+}
+],
+"execution_count": 8
 },
 {
 "cell_type": "markdown",

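
Note on the timing cells: the diff shows only the tail of the streaming callback (the `first_token_time` / `last_token_time` updates) and the final metric prints. As a rough, self-contained sketch of how such numbers can be derived, the snippet below uses a hypothetical `StreamTimer` class with an injected fake clock instead of a real generator. The class name, the `report` helper, and the definition of generation time as last-chunk time minus first-chunk time are assumptions made for illustration and are not taken from the notebook.

```python
import time


class StreamTimer:
    """Hypothetical helper: records when the first and latest chunks arrive."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # injectable so the example is deterministic
        self.first_token_time = None
        self.last_token_time = None

    def __call__(self, chunk):
        # Called once per streamed chunk; the chunk content is not inspected.
        now = self._clock()
        if self.first_token_time is None:
            self.first_token_time = now
        self.last_token_time = now

    def report(self, start, tokens):
        """Return (ttft, tokens_per_second), or (None, None) if nothing streamed."""
        if self.first_token_time is None:
            return None, None
        ttft = self.first_token_time - start
        # Assumption: generation time spans first chunk to last chunk.
        generation_time = self.last_token_time - self.first_token_time
        speed = tokens / generation_time if generation_time > 0 else None
        return ttft, speed


# Simulate three chunks arriving at t = 1.0, 2.0 and 3.0 seconds.
ticks = iter([1.0, 2.0, 3.0])
timer = StreamTimer(clock=lambda: next(ticks))
for chunk in ["Par", "is", "."]:
    timer(chunk)

ttft, speed = timer.report(start=0.0, tokens=3)
print(f"TTFT: {ttft:.3f}s")        # TTFT: 1.000s
print(f"Speed: {speed:.1f} tok/s")  # Speed: 1.5 tok/s
```

Because an instance is a plain callable taking one chunk, it could presumably be passed as `streaming_callback` in place of a module-level function with globals; that wiring is not exercised here.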