Commit 4635846

fix: repair notebook CI (dead model, missing API key, pyarrow type bug) (#348)
* fix: repair notebook CI by replacing dead vision model and adding missing API key

  - Replace `meta/llama-4-scout-17b-16e-instruct` (no longer serving on build.nvidia.com) with `nvidia/nemotron-nano-12b-v2-vl` (project default) in tutorial notebook 4
  - Add `OPENROUTER_API_KEY` to the `build-notebooks` workflow so notebooks 5 and 6 (which use OpenRouter for image generation) can authenticate
  - Regenerate colab notebooks to reflect the model change

* fix: handle pyarrow list types in notebook 6 display_image

  When image columns are loaded from parquet with pyarrow backend, list values are pyarrow ListScalars, not Python lists. The isinstance(x, list) check fails, causing the whole ListScalar to be treated as a single path string (producing filenames ending in `png')]`). Use isinstance(x, str) instead to correctly handle any iterable type.
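
The type-check change described in the second fix can be illustrated with a small sketch. This is not the notebook's actual `display_image` code, which is not shown in this commit view. The function names and the `FakeListScalar` class are hypothetical stand-ins, chosen so the example runs without pyarrow installed.

```python
def normalize_paths_old(value):
    """Pre-fix style check: only a built-in list counts as 'several paths'."""
    return value if isinstance(value, list) else [str(value)]


def normalize_paths_new(value):
    """Post-fix style check: a str is one path; anything else is iterated."""
    return [value] if isinstance(value, str) else [str(p) for p in value]


class FakeListScalar:
    """Hypothetical stand-in for a list-like value that is not a Python list."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)

    def __repr__(self):
        return f"FakeListScalar({self._items!r})"


value = FakeListScalar(["a.png", "b.png"])

# The old check stringifies the whole container into one bogus "path".
old = normalize_paths_old(value)

# The new check iterates the container and yields one path per element.
new = normalize_paths_new(value)

print(old)
print(new)
```

Checking for `str` first matters because a string is itself iterable; without that branch a single path would be split into characters.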
1 parent 8f7a720 commit 4635846

9 files changed

Lines changed: 171 additions & 170 deletions

.github/workflows/build-notebooks.yml

Lines changed: 2 additions & 1 deletion
@@ -4,14 +4,15 @@ on:
   workflow_dispatch:
   schedule:
     - cron: "0 12 * * MON"
-
+
 jobs:
   build:
     runs-on: ubuntu-latest
     permissions:
       contents: write
     env:
       NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
+      OPENROUTER_API_KEY: ${{ secrets.TEST_OPENROUTER_API_KEY }}
     steps:
       - name: Checkout repository
         uses: actions/checkout@v2
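
The `env` entry added in this hunk makes `OPENROUTER_API_KEY` visible to the notebook build. As a rough sketch only, a preflight check along the following lines could report an absent key before any notebook runs. The helper `missing_keys` and the `REQUIRED_KEYS` tuple are hypothetical and not part of this commit.

```python
import os

# Variable names taken from the workflow's env block.
REQUIRED_KEYS = ("NVIDIA_API_KEY", "OPENROUTER_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    absent = missing_keys()
    if absent:
        raise SystemExit("Missing environment variables: " + ", ".join(absent))
    print("All required API keys are set.")
```

An empty value counts as missing here because GitHub Actions substitutes an empty string when a referenced secret is not defined.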

docs/colab_notebooks/1-the-basics.ipynb

Lines changed: 31 additions & 31 deletions
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "f80a5317",
+   "id": "538dab5d",
    "metadata": {},
    "source": [
     "# 🎨 Data Designer Tutorial: The Basics\n",
@@ -14,7 +14,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "f94e1097",
+   "id": "8f148ac1",
    "metadata": {},
    "source": [
     "### 📦 Import Data Designer\n",
@@ -26,7 +26,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "847f7d3b",
+   "id": "85a4f14c",
    "metadata": {},
    "source": [
     "### ⚡ Colab Setup\n",
@@ -37,7 +37,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "ef72fc77",
+   "id": "8cb4fa30",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -48,7 +48,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "cfa5d568",
+   "id": "8c97cb88",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -66,7 +66,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "dc6a83bf",
+   "id": "192de987",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -76,7 +76,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "1d2f7999",
+   "id": "09cc055e",
    "metadata": {},
    "source": [
     "### ⚙️ Initialize the Data Designer interface\n",
@@ -89,7 +89,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "a073af81",
+   "id": "64cd40d2",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -98,7 +98,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "92c42c7d",
+   "id": "25827639",
    "metadata": {},
    "source": [
     "### 🎛️ Define model configurations\n",
@@ -115,7 +115,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "45ae1dbf",
+   "id": "6ccc7929",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -145,7 +145,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "35d0b3af",
+   "id": "ffeb8a6e",
    "metadata": {},
    "source": [
     "### 🏗️ Initialize the Data Designer Config Builder\n",
@@ -160,7 +160,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "0c957c0b",
+   "id": "0f13f0dd",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -169,7 +169,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "132e10d6",
+   "id": "ba888091",
    "metadata": {},
    "source": [
     "## 🎲 Getting started with sampler columns\n",
@@ -186,7 +186,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "b62b30fd",
+   "id": "9d831f16",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -195,7 +195,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "04df17aa",
+   "id": "6bf9f07e",
    "metadata": {},
    "source": [
     "Let's start designing our product review dataset by adding product category and subcategory columns.\n"
@@ -204,7 +204,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "cb62478e",
+   "id": "47236c2a",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -285,7 +285,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "889beb4b",
+   "id": "5fef86e3",
    "metadata": {},
    "source": [
     "Next, let's add samplers to generate data related to the customer and their review.\n"
@@ -294,7 +294,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "e2746105",
+   "id": "6f4282c6",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -331,7 +331,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "9fcf1a92",
+   "id": "09c6d0bd",
    "metadata": {},
    "source": [
     "## 🦜 LLM-generated columns\n",
@@ -346,7 +346,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "6e6ac591",
+   "id": "b136c9c9",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -382,7 +382,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "35332948",
+   "id": "77b21c85",
    "metadata": {},
    "source": [
     "### 🔁 Iteration is key – preview the dataset!\n",
@@ -399,7 +399,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "e2830ad2",
+   "id": "9e7d8e57",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -409,7 +409,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "911fecd7",
+   "id": "35a3e198",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -420,7 +420,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "46faf8e9",
+   "id": "ac646977",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -430,7 +430,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "3565f974",
+   "id": "135c82ff",
    "metadata": {},
    "source": [
     "### 📊 Analyze the generated data\n",
@@ -443,7 +443,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "6effb2c0",
+   "id": "8b0290e0",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -453,7 +453,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "5b63d3ec",
+   "id": "f780d07f",
    "metadata": {},
    "source": [
     "### 🆙 Scale up!\n",
@@ -466,7 +466,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "0214e011",
+   "id": "e040f619",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -476,7 +476,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "11560d0f",
+   "id": "2beb335c",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -489,7 +489,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "246f210c",
+   "id": "c72948ca",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -501,7 +501,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "f9f91c1d",
+   "id": "59725e56",
    "metadata": {},
    "source": [
     "## ⏭️ Next Steps\n",
