Commit c63f91e

jhamon and claude authored
feat: add semantic search marimo notebook (#581)
## Summary

Adds a new marimo notebook demonstrating semantic search with Pinecone, converted and significantly expanded from the existing `docs/semantic-search.ipynb`. The notebook uses Pinecone's Integrated Inference with the `multilingual-e5-large` model to demonstrate cross-lingual semantic search across English and Spanish sentences.

## Changes

- New notebook `docs/semantic-search.py` (marimo format) with:
  - Pinecone SDK 9.0.1 API (`pc.indexes.*`, `pc.index()`, updated search signature)
  - `multilingual-e5-large` embedding model for cross-lingual retrieval
  - Refactored dataset prep: `filter_pairs` + `extract_sentences(lang)` to produce both English and Spanish records from Tatoeba
  - `to_records` parameterized on column name with ID prefixes for multi-language upsert
  - `mo.ui.table` for dataset inspection, `mo.status.progress_bar` replacing tqdm, `mo.ui.run_button` for safe index deletion
  - Interactive query section with `mo.ui.text` and `mo.ui.radio` for language filter
  - Language filtering section demonstrating metadata filters scoped to `en`/`es`
  - Prose interspersed between code cells narrating the process
  - "Meaning Over Keywords" and "How It Works" sections explaining model selection and cross-lingual retrieval
- `pyproject.toml`: pins notebook dependencies (`datasets==3.5.1`, `pinecone==9.0.1`, `numpy`, `tqdm`)

## Test Plan

- [ ] Notebook runs end-to-end with a valid `PINECONE_API_KEY`
- [ ] Index creation, upsert, and query cells execute without errors
- [ ] Cross-lingual queries return results in both languages
- [ ] Language filter correctly scopes results to `en` or `es`
- [ ] Interactive query input updates results on change
- [ ] Delete button safely removes the index

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Low risk: adds a new documentation notebook only, with no changes to production code paths; the main impact is on users running the example (it creates/deletes a Pinecone index).
>
> **Overview**
> Adds a new `docs/semantic-search.py` Marimo notebook that walks through building a semantic search demo with Pinecone Integrated Inference, including index creation for `multilingual-e5-large`, dataset filtering/record preparation for English+Spanish, batched `upsert_records`, and `index.search` with optional `lang` metadata filtering.
>
> The notebook also adds interactive UI elements for querying and a run-button gated cleanup step to delete the created index.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit a233d4b. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Code <claude@anthropic.com>
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
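
For readers skimming this commit, the record preparation, batching, and language filtering described above might look roughly like the sketch below. It is not the notebook's code. The function bodies, the `_id`/`text`/`lang` field names, the `batched` helper, and the `"all"` sentinel are assumptions made for illustration, and no Pinecone calls are made. Only the `$eq` operator is taken from Pinecone's documented metadata filter syntax.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def to_records(rows: Iterable[dict], column: str, lang: str) -> List[dict]:
    """Build one record per row, reading text from `column`.

    IDs are prefixed with the language code so English and Spanish
    records can share an index without colliding (assumed scheme).
    """
    return [
        {"_id": f"{lang}-{i}", "text": row[column], "lang": lang}
        for i, row in enumerate(rows)
    ]


def batched(records: List[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive chunks of at most `size` records."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def lang_filter(choice: str) -> Optional[dict]:
    """Map a UI choice to a metadata filter; "all" means no filter."""
    if choice == "all":
        return None
    return {"lang": {"$eq": choice}}


# Hypothetical sentence pairs standing in for the Tatoeba data.
rows = [
    {"en": "Hello", "es": "Hola"},
    {"en": "Thank you", "es": "Gracias"},
]
records = to_records(rows, "en", "en") + to_records(rows, "es", "es")
batches = list(batched(records, 3))
```

Each batch would then be passed to the index's upsert call, and the result of `lang_filter` to the search call. Those signatures are left out here because they depend on the SDK version pinned in this commit.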
1 parent 57b4a4e commit c63f91e

1 file changed

Lines changed: 491 additions & 0 deletions
