Commit 48802e5

jhamon and claude authored
feat: improve semantic search marimo notebook (#582)
## Summary

Follow-up improvements to the semantic search marimo notebook merged in #581.

## Changes

**Multilingual support**
- Switch embedding model to `multilingual-e5-large` for cross-lingual retrieval
- Embed both English and Spanish sentences from Tatoeba using `filter_pairs` + `extract_sentences(lang)`
- Add cross-lingual query examples and a language filtering section using Pinecone metadata filters

**Interactivity**
- Interactive query input with `mo.ui.text` and `mo.ui.radio` language selector
- Interactive API key input: reads `PINECONE_API_KEY` from env/`.env` with a `mo.ui.text(kind="password")` fallback for molab users; uses `mo.callout` admonitions for each state

**Display**
- Search results rendered as `mo.ui.table` with a `lang` column showing which language each hit came from
- Progress bar via `mo.status.progress_bar` (replacing tqdm)

**Correctness / hygiene**
- Pin `datasets==3.5.1`: `datasets>=4` dropped support for custom loading scripts used by `Helsinki-NLP/tatoeba`
- Use keyword argument names in all Pinecone API calls
- Remove unused `numpy` and `tqdm` dependencies
- Remove notebook-specific deps from root `pyproject.toml` (they belong in the notebook's `# /// script` inline metadata)

## Test Plan

- [ ] Notebook runs end-to-end in sandbox mode (`uvx marimo edit --sandbox`) with a valid `PINECONE_API_KEY`
- [ ] Password input appears when env var is unset; success callout appears when set
- [ ] Cross-lingual queries return results in both English and Spanish
- [ ] Language filter correctly scopes results to `en` or `es`
- [ ] Interactive query input updates results on change

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Low risk documentation/notebook-only changes that adjust dependency pinning and interactive API key handling; no production code paths affected.
>
> **Overview**
> Improves the `docs/semantic-search.py` semantic search marimo notebook setup experience by **pinning `datasets==3.5.1`** and replacing the env-only Pinecone key requirement with an **interactive API key input** (env/`.env` auto-detect + password field fallback with callouts).
>
> Adds a guard (`mo.stop`) to halt execution until a key is provided and introduces a brief section clarifying client instantiation (including the example-only `source_tag`).
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 6da4180. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Co-authored-by: Claude Code <claude@anthropic.com>
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
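
The language filtering mentioned under "Multilingual support" can be sketched roughly as follows. This is a minimal illustration, not code from the notebook: it assumes each vector carries a `lang` metadata field (suggested by the `lang` column in the results table) with values `en` or `es`. The helper names `language_filter` and `matches` are hypothetical, and `matches` is only a local stand-in for the `$eq` comparison that Pinecone evaluates server-side.

```python
from typing import Any, Dict, List

# Assumption: the notebook indexes only these two languages.
SUPPORTED_LANGS = ("en", "es")


def language_filter(lang: str) -> Dict[str, Dict[str, str]]:
    """Build a Pinecone-style metadata filter that keeps one language."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language: {lang!r}")
    return {"lang": {"$eq": lang}}


def matches(metadata: Dict[str, Any], flt: Dict[str, Dict[str, str]]) -> bool:
    """Local stand-in for the server-side `$eq` check (illustration only)."""
    return all(metadata.get(field) == cond["$eq"] for field, cond in flt.items())


# Made-up hits, shaped like query results that include metadata.
hits: List[Dict[str, Any]] = [
    {"id": "a", "metadata": {"lang": "en", "text": "Where is the library?"}},
    {"id": "b", "metadata": {"lang": "es", "text": "¿Dónde está la biblioteca?"}},
]

spanish_only = [h["id"] for h in hits if matches(h["metadata"], language_filter("es"))]
print(spanish_only)
```

In a real query the dictionary returned by `language_filter` would presumably be passed as the `filter` keyword argument of the index query call, so the filtering happens on the Pinecone side rather than locally.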
1 parent c63f91e commit 48802e5

1 file changed

Lines changed: 67 additions & 7 deletions

docs/semantic-search.py

@@ -1,7 +1,7 @@
 # /// script
 # requires-python = ">=3.10"
 # dependencies = [
-#     "datasets",
+#     "datasets==3.5.1",
 #     "marimo>=0.23.6",
 #     "pinecone==9.0.1",
 # ]
@@ -53,17 +53,77 @@ def _(mo):
 
     ### Pinecone API Key
 
-    Set your `PINECONE_API_KEY` environment variable before running this notebook.
-    You can get a free key at [app.pinecone.io](https://app.pinecone.io).
+    You'll need a free Pinecone API key to run this notebook. Get one at
+    [app.pinecone.io](https://app.pinecone.io).
+
+    **Running locally?** Set `PINECONE_API_KEY` in your environment or in a `.env`
+    file — marimo reads `.env` files automatically on startup. The cell below will
+    detect the key and confirm it's loaded.
+
+    **Running in molab?** Enter your key directly in the input field below.
     """)
     return
 
 
-@app.cell
-def _(Pinecone, os):
-    # Initialize client
-    api_key = os.environ.get("PINECONE_API_KEY")
+@app.cell(hide_code=True)
+def _(mo, os):
+    env_key = os.environ.get("PINECONE_API_KEY", "")
+
+    api_key_input = mo.ui.text(
+        kind="password",
+        placeholder="pcsk_...",
+        label="Pinecone API Key",
+        value=env_key,
+        full_width=True,
+    )
+
+    (
+        mo.callout(mo.md("API key loaded from environment."), kind="success")
+        if env_key
+        else mo.vstack(
+            [
+                mo.callout(
+                    mo.md(
+                        "Enter your Pinecone API key. Get a free key at [app.pinecone.io](https://app.pinecone.io)."
+                    ),
+                    kind="info",
+                ),
+                api_key_input,
+            ]
+        )
+    )
+    return (api_key_input,)
+
+
+@app.cell(hide_code=True)
+def _(api_key_input, mo):
+    api_key = api_key_input.value
+    mo.stop(
+        not api_key,
+        mo.callout(
+            mo.md("**API key required.** Enter your key above to continue."),
+            kind="danger",
+        ),
+    )
+    return (api_key,)
+
+
+@app.cell(hide_code=True)
+def _(mo):
+    mo.md(r"""
+    ### Instantiating the Client
 
+    With the API key in hand, we can create a `Pinecone` client. This is the entry point for all
+    control-plane operations — creating and managing indexes, listing namespaces, and so on.
+
+    The `source_tag` parameter is used internally by Pinecone to attribute API usage from example
+    notebooks. You would not include this in your own applications.
+    """)
+    return
+
+
+@app.cell(hide_code=True)
+def _(Pinecone, api_key):
     pc = Pinecone(
         api_key=api_key,
         source_tag="pinecone_examples:docs:semantic_search",
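
The key-resolution behaviour introduced by this hunk can be sketched outside marimo roughly as below. This is an illustration under stated assumptions, not code from the commit: `resolve_api_key` is a hypothetical helper, `typed_value` stands in for `api_key_input.value` when the password field is shown, and returning `None` stands in for the point where `mo.stop` halts downstream cells.

```python
from typing import Mapping, Optional


def resolve_api_key(environ: Mapping[str, str], typed_value: str = "") -> Optional[str]:
    """Return the key the notebook would proceed with, or None if it would halt.

    Mirrors the diff: the environment value wins when present (the input is
    pre-filled with it and the field is not displayed); otherwise whatever the
    user typed into the password field is used.
    """
    env_key = environ.get("PINECONE_API_KEY", "")
    api_key = env_key or typed_value
    # An empty key is the condition under which the notebook calls mo.stop.
    return api_key or None


# Placeholder values only; these are not real keys.
print(resolve_api_key({"PINECONE_API_KEY": "pcsk_from_env"}))
print(resolve_api_key({}, typed_value="pcsk_typed"))
print(resolve_api_key({}))
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the sketch easy to exercise without touching the real environment.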
