Commit 698a7cb

feat: add convert-to-marimo skill
Encapsulates the full Jupyter-to-marimo conversion workflow derived from the semantic-search notebook conversion, covering dependency management, SDK updates, marimo affordances, code quality, prose standards, and common pitfalls.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
1 parent 97f4811 commit 698a7cb

1 file changed

Lines changed: 340 additions & 0 deletions

File tree

  • .claude/skills/convert-to-marimo
@@ -0,0 +1,340 @@
1+
---
2+
name: convert-to-marimo
3+
description: This skill should be used when the user asks to "convert a notebook to marimo", "migrate a Jupyter notebook to marimo", "rewrite a notebook in marimo", or wants to modernize an existing .ipynb file into a high-quality marimo .py notebook.
4+
version: 0.1.0
5+
---
6+
7+
# Convert Jupyter Notebook to Marimo
8+
9+
Convert an existing `.ipynb` Jupyter notebook into a high-quality marimo `.py` notebook, updating dependencies, adopting marimo affordances, improving code quality, and revising prose to meet the Pinecone examples writing guidelines.
10+
11+
**Writing guidelines reference:** See **`.ai/writing-guidelines.md`** for voice, tone, and style.
12+
13+
## Phase 1: Initial Conversion
14+
15+
### Convert with marimo
16+
17+
```bash
18+
uv run marimo convert path/to/notebook.ipynb -o docs/notebook-name.py
19+
```
20+
21+
### Start in sandbox mode for development
22+
23+
```bash
24+
uvx marimo edit --sandbox docs/notebook-name.py --no-token
25+
```
26+
27+
Sandbox mode creates an isolated environment from the notebook's `# /// script` inline metadata — none of the project's root dependencies bleed in. Always develop in sandbox mode so the dependency list stays honest.
28+
29+
### Explore the code_mode API at the start of each session
30+
31+
```python
32+
import marimo._code_mode as cm
33+
help(cm)
34+
```
35+
36+
The API can change between marimo versions; verify it before using it.
37+
38+
---
39+
40+
## Phase 2: Dependencies
41+
42+
### Update the `# /// script` metadata block
43+
44+
The converted file will have a metadata block at the top. This is the source of truth for the notebook's dependencies when running in sandbox mode.
45+
46+
```python
47+
# /// script
48+
# requires-python = ">=3.10"
49+
# dependencies = [
50+
# "marimo>=0.23.6",
51+
# "pinecone==9.0.1",
52+
# "datasets==3.5.1",
53+
# ]
54+
# ///
55+
```
56+
57+
**Rules:**
58+
- Pin every dependency except marimo to a specific version with `==`
59+
- Include `marimo>=0.23.6` (or current version)
60+
- Pin the Pinecone SDK to `9.0.1` (or latest)
61+
- Keep this block in the notebook file — **never** add notebook deps to the root `pyproject.toml`
62+
63+
### Remove unused dependencies
64+
65+
After conversion, audit the declared deps against what's actually imported. Common removals:
66+
- `tqdm` — replaced by `mo.status.progress_bar()`
67+
- `numpy` — often imported by Jupyter cells that don't need it directly
68+
- `pinecone-notebooks` — Colab-only authentication widget, not needed in marimo
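
One way to run this audit is to compare the declared names with the modules the file actually imports. The sketch below uses only the standard library; the function names are hypothetical, and it assumes the import name equals the package name with `-` replaced by `_`, which does not hold for every package.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Collect the top-level module names imported anywhere in `source`."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return modules


def unused_dependencies(declared: list[str], source: str) -> list[str]:
    """Return declared package names that no import statement refers to."""
    # Assumption: import name == package name with "-" replaced by "_".
    used = imported_modules(source)
    return [name for name in declared if name.replace("-", "_") not in used]


# Illustrative notebook source; a real run would read the converted .py file.
notebook_source = """
import marimo as mo
from pinecone import Pinecone
from datasets import load_dataset
"""
declared = ["marimo", "pinecone", "datasets", "tqdm", "numpy"]
print(unused_dependencies(declared, notebook_source))
```

Treat the result as a list of candidates to review by hand, since a package can be required at runtime without being imported directly.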
69+
70+
### Watch for library compatibility breaks
71+
72+
Check whether newer versions of dependencies break with the data sources used. A known example: `datasets>=4.0` dropped support for custom loading scripts (e.g. `Helsinki-NLP/tatoeba`). Pin to the last working version and note why in a comment.
73+
74+
---
75+
76+
## Phase 3: Remove Jupyter/Colab Artifacts
77+
78+
Delete or replace:
79+
80+
- **Colab/nbviewer badges** — strip from the header markdown cell
81+
- **`!pip install` cells** — dependencies are declared in `# /// script`, not installed at runtime
82+
- **Colab authentication cells**: `pinecone_notebooks.colab.Authenticate()` and similar widgets
83+
- **"Note: pip install is formatted for Jupyter" markdown** — not relevant in marimo
84+
- **`## Installation` section headings** — no installation step needed
85+
- **`trust_remote_code=True` notes** — keep the argument but remove surrounding Jupyter-specific explanation
86+
- **References to "this notebook", "run this cell", "Jupyter"** — rewrite as plain prose
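
A quick scan can surface most of the leftovers listed above before a manual read-through. This is a rough sketch using only the standard library: the patterns and the name `find_artifacts` are assumptions chosen for illustration, and they will miss some phrasings and flag some false positives.

```python
import re

# Hypothetical patterns for the artifacts listed above; extend as needed.
ARTIFACT_PATTERNS = {
    "pip install line": re.compile(r"[!%]pip install"),
    "Colab or nbviewer reference": re.compile(r"colab|nbviewer", re.IGNORECASE),
    "Colab auth widget": re.compile(r"pinecone_notebooks"),
    "Jupyter wording": re.compile(
        r"\bthis notebook\b|\brun (this|the) cell\b|\bjupyter\b", re.IGNORECASE
    ),
}


def find_artifacts(source: str) -> list[tuple[int, str]]:
    """Return (line number, label) pairs for lines that look like leftovers."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in ARTIFACT_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


# Illustrative input only.
sample = '''mo.md("[Open in Colab] badge")
# !pip install pinecone
from pinecone_notebooks.colab import Authenticate
mo.md("Run this cell to create the index.")
index_name = "semantic-search"
'''
for lineno, label in find_artifacts(sample):
    print(f"line {lineno}: {label}")
```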
87+
88+
---
89+
90+
## Phase 4: Update the Pinecone SDK to 9.0.1
91+
92+
Replace deprecated method calls with the `pc.indexes.*` namespace:
93+
94+
| Old | New |
95+
|-----|-----|
96+
| `pc.has_index(name=x)` | `pc.indexes.exists(name=x)` |
97+
| `pc.create_index(name=x, ...)` | `pc.indexes.create(name=x, ...)` |
98+
| `pc.describe_index(name=x)` | `pc.indexes.describe(name=x)` |
99+
| `pc.delete_index(name=x)` | `pc.indexes.delete(name=x)` |
100+
| `pc.Index(host=desc.host)` | `pc.index(name=x)` |
101+
| `index.search(namespace=ns, query={"top_k": k, "inputs": {...}})` | `index.search(namespace=ns, top_k=k, inputs={...})` |
102+
| `results["result"]["hits"]` / `hit["_score"]` | `results.result.hits` / `hit.score` |
103+
104+
**Always use keyword argument names** in all Pinecone API calls — positional args are harder to read and more fragile across SDK versions.
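
To locate calls that still need migrating, the index-management renames from the table can be turned into a small lookup. This sketch assumes the client variable is named `pc`, as in the table, and only reports matches; the name `deprecated_calls` is hypothetical, and the remaining rows (`pc.Index`, `index.search`, result access) need manual review because their arguments change shape.

```python
# Old call prefix -> new call prefix, copied from the table above.
RENAMES = {
    "pc.has_index(": "pc.indexes.exists(",
    "pc.create_index(": "pc.indexes.create(",
    "pc.describe_index(": "pc.indexes.describe(",
    "pc.delete_index(": "pc.indexes.delete(",
}


def deprecated_calls(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, old call, suggested replacement) for each match."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for old, new in RENAMES.items():
            if old in line:
                hits.append((lineno, old, new))
    return hits


# Illustrative pre-migration code, held as text and never executed.
old_code = '''if not pc.has_index(name=index_name):
    pc.create_index(name=index_name, ...)
desc = pc.describe_index(name=index_name)
'''
for lineno, old, new in deprecated_calls(old_code):
    print(f"line {lineno}: replace {old} with {new}")
```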
105+
106+
---
107+
108+
## Phase 5: Adopt Marimo Affordances
109+
110+
### Replace `print()` output with tables
111+
112+
```python
113+
# Before
114+
for result in results:
115+
print(f"{result['text']} (score: {result['score']})")
116+
117+
# After
118+
mo.ui.table([{"text": r["text"], "score": r["score"]} for r in results])
119+
```
120+
121+
Use `mo.vstack()` to combine a heading with a table:
122+
```python
123+
mo.vstack([
124+
mo.md(f"**Query:** {query}"),
125+
mo.ui.table(data, show_column_summaries=False),
126+
])
127+
```
128+
129+
### Replace `tqdm` with `mo.status.progress_bar()`
130+
131+
```python
132+
# Before
133+
for batch in tqdm(batches):
134+
index.upsert(batch)
135+
136+
# After
137+
for batch in mo.status.progress_bar(batches, title="Upserting", show_rate=True, show_eta=True):
138+
index.upsert(batch)
139+
```
140+
141+
When passing a `range`, omit `total` — marimo infers it from `len(range(...))`.
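
The same inference needs a collection with a length, and a generator of batches has none. One option, sketched here with a hypothetical `batched` helper, is to materialize the batches into a list first; the alternative is to keep the generator and pass `total` explicitly.

```python
from itertools import islice


def batched(records, batch_size):
    """Yield successive lists of at most `batch_size` items from `records`."""
    iterator = iter(records)
    while batch := list(islice(iterator, batch_size)):
        yield batch


# A list has a length, so a progress bar can infer the total from it.
batches = list(batched(range(10), 4))
print(len(batches), batches)
```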
142+
143+
### Wrap destructive operations in `mo.ui.run_button()`
144+
145+
By default, marimo runs cells automatically: once when the notebook opens and again whenever a variable they reference changes. A cleanup cell that deletes an index would therefore fire immediately, so gate it with a button:
146+
147+
```python
148+
# Cell 1 — display the button
149+
delete_button = mo.ui.run_button(label="Delete index")
150+
delete_button
151+
152+
# Cell 2 — action (separate cell — can't read .value in the same cell that creates it)
153+
mo.stop(not delete_button.value)
154+
pc.indexes.delete(name=index_name)
155+
```
156+
157+
### Use `mo.callout()` for status messages
158+
159+
```python
160+
mo.callout(mo.md("API key loaded from environment."), kind="success")
161+
mo.callout(mo.md("Enter your API key to continue."), kind="info")
162+
mo.callout(mo.md("**Error:** index not found."), kind="danger")
163+
```
164+
165+
Kinds: `neutral`, `info`, `warn`, `success`, `danger`.
166+
167+
### Handle API key input
168+
169+
Users running locally can set `PINECONE_API_KEY` in their environment or a `.env` file (marimo reads `.env` on startup). Users in molab need a password input:
170+
171+
```python
172+
# Cell 1 — input (hide_code=True)
173+
env_key = os.environ.get("PINECONE_API_KEY", "")
174+
api_key_input = mo.ui.text(
175+
kind="password",
176+
placeholder="pcsk_...",
177+
label="Pinecone API Key",
178+
value=env_key,
179+
full_width=True,
180+
)
181+
(
182+
mo.callout(mo.md("API key loaded from environment."), kind="success")
183+
if env_key
184+
else mo.vstack([
185+
mo.callout(mo.md("Enter your Pinecone API key. Get a free key at [app.pinecone.io](https://app.pinecone.io)."), kind="info"),
186+
api_key_input,
187+
])
188+
)
189+
190+
# Cell 2: validate the key and stop downstream cells if it is missing (hide_code=True)
191+
api_key = api_key_input.value
192+
mo.stop(
193+
not api_key,
194+
mo.callout(mo.md("**API key required.** Enter your key above to continue."), kind="danger"),
195+
)
196+
197+
# Cell 3 — visible: instantiate the client
198+
pc = Pinecone(api_key=api_key, source_tag="pinecone_examples:...")
199+
```
200+
201+
### Display data with `mo.ui.table()`
202+
203+
HuggingFace datasets and lists of dicts both work directly:
204+
205+
```python
206+
mo.ui.table(dataset, page_size=10)
207+
mo.ui.table(records, page_size=10)
208+
```
209+
210+
### Add interactive inputs for exploration
211+
212+
At the end of the notebook, add a "Try It Yourself" section:
213+
214+
```python
215+
query_input = mo.ui.text(value="default query", full_width=True)
216+
lang_select = mo.ui.radio(
217+
options={"All": None, "English": "en", "Spanish": "es"},
218+
value="All",
219+
)
220+
mo.vstack([query_input, lang_select])
221+
```
222+
223+
Then in the next cell:
224+
```python
225+
search(query_input.value, lang=lang_select.value)
226+
```
227+
228+
Results update when the user changes either input.
229+
230+
---
231+
232+
## Phase 6: Code Quality
233+
234+
### Remove over-explaining comments
235+
236+
Only comment on the non-obvious WHY — not on what the code does. Delete comments like:
237+
- `# Initialize client`
238+
- `# convert to record format`
239+
- `# flatten and shuffle for ease of use`
240+
- `# Here, we create a record for each sentence in the dataset`
241+
242+
Keep comments that explain constraints, workarounds, or non-obvious choices.
243+
244+
### Decompose monolithic functions
245+
246+
If the converted notebook has a large function doing multiple things, split it:
247+
- Separate filtering from reshaping from formatting
248+
- Name each function after its single responsibility
249+
- Parameterize functions properly — avoid globals captured by closures
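
As an illustration of the split described above, the sketch below separates filtering, reshaping, and formatting into three small functions that take everything they need as parameters. The field names (`text`, `lang`, `_id`) and the sample rows are hypothetical.

```python
def filter_by_language(rows, lang):
    """Keep only the rows written in `lang`."""
    return [row for row in rows if row["lang"] == lang]


def to_records(rows):
    """Reshape rows into records with a generated `_id` field."""
    return [
        {"_id": f"{row['lang']}-{i}", "text": row["text"], "lang": row["lang"]}
        for i, row in enumerate(rows)
    ]


def format_hit(text, score):
    """Render one search hit as a single display line."""
    return f"{text} (score: {score:.2f})"


# Illustrative data only.
rows = [
    {"text": "Hello", "lang": "en"},
    {"text": "Hola", "lang": "es"},
    {"text": "Goodbye", "lang": "en"},
]
records = to_records(filter_by_language(rows, "en"))
print(records)
print(format_hit("Hello", 0.8731))
```

Because each function depends only on its arguments, each can live in its own cell and be reused by later cells without hidden state.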
250+
251+
### Avoid multiply-defined variables across cells
252+
253+
Marimo's static analysis flags any top-level variable defined in more than one cell. When two cells assign the same top-level name, do one of the following:
254+
- Use different names
255+
- Inline the computation (no assignment)
256+
- Consolidate both cells into one
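
Conflicts can be spotted ahead of time with a small check over the cell sources. This is a simplified sketch with hypothetical function names: it only looks at top-level assignments and `for` targets (not imports or function definitions), and it skips names starting with an underscore, which marimo treats as local to their cell.

```python
import ast
from collections import defaultdict


def top_level_names(cell_source: str) -> set[str]:
    """Names bound by top-level assignments or for-loops in one cell."""
    names = set()
    for node in ast.parse(cell_source).body:
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, (ast.AnnAssign, ast.AugAssign, ast.For)):
            targets = [node.target]
        else:
            continue
        for target in targets:
            for child in ast.walk(target):
                is_store = isinstance(child, ast.Name) and isinstance(child.ctx, ast.Store)
                # Underscore-prefixed names are cell-local in marimo, so skip them.
                if is_store and not child.id.startswith("_"):
                    names.add(child.id)
    return names


def multiply_defined(cells: list[str]) -> dict[str, list[int]]:
    """Map each conflicting name to the indexes of the cells that define it."""
    owners = defaultdict(list)
    for index, source in enumerate(cells):
        for name in top_level_names(source):
            owners[name].append(index)
    return {name: idxs for name, idxs in owners.items() if len(idxs) > 1}


# Illustrative cells; the code is only parsed, never executed.
cells = [
    "results = search('a')\nrows = list(results)",
    "results = search('b')\n_rows = list(results)",
]
print(multiply_defined(cells))
```

Here `results` is reported because both cells assign it, while `_rows` is ignored.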
257+
258+
### Watch for marimo cell configuration issues
259+
260+
Cells created with `code_mode` default to `hide_code=True`. Always explicitly set `hide_code=False` for code cells that should be visible. Verify with:
261+
262+
```python
263+
for cell in ctx.cells:
264+
kind = "md " if cell.config.hide_code else "code"
265+
print(f"[{cell.id}] {kind}: {cell.code[:60]!r}")
266+
```
267+
268+
---
269+
270+
## Phase 7: Prose and Structure
271+
272+
Follow `.ai/writing-guidelines.md`. Key points for marimo conversion:
273+
274+
### Voice and tone
275+
- Use "we" throughout (collaborative tutorial voice)
276+
- Factual and collegial — no "super helpful!", "Neat!", "magic", "Congrats"
277+
- No superlatives, no marketing language
278+
- No time references ("recently added", "new feature")
279+
280+
### Structure
281+
- **Intersperse explanations between code cells** — don't dump all prose at the top
282+
- Put "why" before the code it motivates (e.g. explain why a keyword is ambiguous just before the filter that uses it)
283+
- After showing data, explain what you see before proceeding
284+
- Use `###` subheadings within sections for skimmability
285+
286+
### Merge adjacent text cells
287+
When two or more markdown cells appear next to each other with no code between them, consolidate them into one unless they serve structurally distinct purposes. A section heading followed by its body text, for example, can be merged.
288+
289+
### Remove Jupyter-specific prose
290+
- "Run the cell below" → remove or rewrite
291+
- "This notebook will..." → "This example demonstrates..."
292+
- References to Colab, Google Colab, nbviewer → remove entirely
293+
- "In this notebook" → rewrite without the word "notebook"
294+
295+
### Section heading guidelines
296+
- Headings should be short noun phrases, not full sentences
297+
- "Meaning Over Keywords" not "Semantic Search considers the meaning of the query"
298+
- "How It Works" not "Wait, how is this working?"
299+
- "Cleanup" not "Demo Cleanup"
300+
301+
---
302+
303+
## Phase 8: Final Checks
304+
305+
### Run ruff
306+
```bash
307+
uv run ruff check docs/notebook-name.py
308+
uv run ruff format docs/notebook-name.py
309+
```
310+
311+
The CI pipeline runs `ruff check` and `ruff format --check` on changed `.py` files. Fix all issues before committing.
312+
313+
### Verify sandbox runs
314+
```bash
315+
uvx marimo edit --sandbox docs/notebook-name.py --no-token
316+
```
317+
318+
Run through the notebook end-to-end to confirm all cells execute correctly in the isolated environment.
319+
320+
### Verify no root pyproject.toml changes
321+
Notebook dependencies belong in the `# /// script` block only. If marimo's package manager added anything to `pyproject.toml` during development, revert those changes and restore `uv.lock` from main:
322+
```bash
323+
git checkout origin/main -- uv.lock
324+
```
325+
326+
---
327+
328+
## Common Pitfalls
329+
330+
| Problem | Fix |
331+
|---------|-----|
332+
| `mo.ui.run_button().value` read in same cell | Split button creation and value access into separate cells |
333+
| Multiply-defined variable names across cells | Inline the call or use distinct names |
334+
| Cells created with `code_mode` are hidden | Explicitly set `hide_code=False` |
335+
| marimo package manager edits `pyproject.toml` | Revert — deps belong in `# /// script` only |
336+
| `datasets>=4` breaks dataset loading scripts | Pin to last working version (e.g. `datasets==3.5.1`) |
337+
| Old SDK calls (`pc.has_index`, `pc.Index(host=...)`) | Replace with `pc.indexes.*` namespace |
338+
| `tqdm` still imported but unused | Remove it — use `mo.status.progress_bar()` |
339+
| `source_tag` in `pc = Pinecone(...)` | Keep it, but note in prose it's for internal Pinecone analytics — users should not include it in their own apps |
340+
| Index deletion cell auto-fires on notebook load | Wrap in `mo.ui.run_button()` |
