Commit c63f91e
feat: add semantic search marimo notebook (#581)
## Summary
Adds a new marimo notebook demonstrating semantic search with Pinecone,
converted and significantly expanded from the existing
`docs/semantic-search.ipynb`. The notebook uses Pinecone's Integrated
Inference with the `multilingual-e5-large` model to demonstrate
cross-lingual semantic search across English and Spanish sentences.
## Changes
- New notebook `docs/semantic-search.py` (marimo format) with:
  - Pinecone SDK 9.0.1 API (`pc.indexes.*`, `pc.index()`, updated search
    signature)
  - `multilingual-e5-large` embedding model for cross-lingual retrieval
  - Refactored dataset prep: `filter_pairs` + `extract_sentences(lang)` to
    produce both English and Spanish records from Tatoeba
  - `to_records` parameterized on column name with ID prefixes for
    multi-language upsert
  - `mo.ui.table` for dataset inspection, `mo.status.progress_bar`
    replacing tqdm, `mo.ui.run_button` for safe index deletion
  - Interactive query section with `mo.ui.text` and `mo.ui.radio` for
    language filter
  - Language filtering section demonstrating metadata filters scoped to
    `en`/`es`
  - Prose interspersed between code cells narrating the process
  - "Meaning Over Keywords" and "How It Works" sections explaining model
    selection and cross-lingual retrieval
- `pyproject.toml`: pins notebook dependencies (`datasets==3.5.1`,
`pinecone==9.0.1`, `numpy`, `tqdm`)
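
For reviewers unfamiliar with the record-preparation step, the idea behind `to_records` with ID prefixes and batched upserts can be sketched roughly as below. This is an illustration only, not the notebook's code: the function signatures, the `_id`/`text`/`lang` field names, the sample rows, and the batch size are all assumptions.

```python
from typing import Dict, Iterator, List


def to_records(rows: List[Dict[str, str]], column: str, prefix: str) -> List[Dict[str, str]]:
    """Build one record per row from `column`, using `prefix` to keep IDs unique per language.

    Hypothetical signature; the notebook's real `to_records` may differ.
    """
    return [
        {"_id": f"{prefix}-{i}", "text": row[column], "lang": prefix}
        for i, row in enumerate(rows)
    ]


def batched(records: List[Dict[str, str]], size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield consecutive slices of at most `size` records (hypothetical helper)."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


# Made-up sample rows standing in for filtered Tatoeba sentence pairs.
rows = [
    {"en": "Good morning", "es": "Buenos días"},
    {"en": "Thank you", "es": "Gracias"},
]

# Prefixing IDs lets English and Spanish records share one index without colliding.
records = to_records(rows, "en", "en") + to_records(rows, "es", "es")
batches = list(batched(records, 3))
```

Each batch would then be passed to the index's upsert call; the prefix is what keeps `en-0` and `es-0` from overwriting each other.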
## Test Plan
- [ ] Notebook runs end-to-end with a valid `PINECONE_API_KEY`
- [ ] Index creation, upsert, and query cells execute without errors
- [ ] Cross-lingual queries return results in both languages
- [ ] Language filter correctly scopes results to `en` or `es`
- [ ] Interactive query input updates results on change
- [ ] Delete button safely removes the index
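
One way to check the language-filter item locally, without touching the index, is to build the filter from the radio value and apply it to sample records. The sketch below assumes the radio offers `en`, `es`, and an "all" option, and that records carry a `lang` field; `build_lang_filter` and `matches` are hypothetical helpers, not functions from the notebook. Only the `$eq` operator of Pinecone's metadata filter syntax is used.

```python
from typing import Any, Dict, Optional


def build_lang_filter(choice: str) -> Optional[Dict[str, Any]]:
    """Map a UI choice to a metadata filter; None means no filtering (hypothetical helper)."""
    if choice in ("en", "es"):
        return {"lang": {"$eq": choice}}
    return None


def matches(record: Dict[str, Any], flt: Optional[Dict[str, Any]]) -> bool:
    """Apply an `$eq`-only filter to a record locally, for sanity checks."""
    if flt is None:
        return True
    return all(record.get(field) == cond["$eq"] for field, cond in flt.items())


# Made-up records standing in for search hits.
sample = [
    {"_id": "en-0", "lang": "en"},
    {"_id": "es-0", "lang": "es"},
]

spanish_only = [r for r in sample if matches(r, build_lang_filter("es"))]
unfiltered = [r for r in sample if matches(r, build_lang_filter("all"))]
```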
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Low Risk**
> Low risk: adds a new documentation notebook only, with no changes to
production code paths; the main impact is on users running the example
(it creates/deletes a Pinecone index).
>
> **Overview**
> Adds a new `docs/semantic-search.py` marimo notebook that walks
through building a semantic search demo with Pinecone Integrated
Inference, including index creation for `multilingual-e5-large`, dataset
filtering/record preparation for English+Spanish, batched
`upsert_records`, and `index.search` with optional `lang` metadata
filtering.
>
> The notebook also adds interactive UI elements for querying and a
run-button gated cleanup step to delete the created index.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
a233d4b. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
---------
Co-authored-by: Claude Code <claude@anthropic.com>
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

1 parent 57b4a4e commit c63f91e
1 file changed
Lines changed: 491 additions & 0 deletions