---
name: pinecone-rag
description: >
  Build production RAG pipelines and persistent agent memory using Pinecone as
  the vector database backend. ALWAYS USE THIS SKILL when the user mentions
  Pinecone, wants to index documents for semantic search, build a
  retrieval-augmented generation system, store agent memory across sessions,
  implement hybrid search, or connect an LLM to a searchable knowledge base,
  even if they don't say "Pinecone" explicitly. Also use when the user asks
  about vector databases for RAG, namespace isolation for multi-tenant agents,
  embedding pipelines, or scaling a knowledge base beyond what local storage
  can handle. DO NOT use for local-only vector stores (Chroma, FAISS, pgvector)
  or pure keyword search with no semantic component.
license: Apache-2.0
compatibility: "pinecone>=6.0.0, Python 3.10+"
---

# Pinecone RAG Skill

This skill guides you through building a production RAG pipeline or persistent
agent memory system using Pinecone. Follow the workflow from start to finish:
don't skip steps or jump to code before understanding what the user actually
needs.

## Before you start: ask one question

Before writing any code, identify which of these two use cases applies:

**A: RAG over documents**: User wants to index a corpus (PDFs, docs, code,
web pages) and retrieve relevant chunks to ground LLM responses.

**B: Agent memory**: User wants an agent to remember facts, decisions, or
context across sessions or across multiple agents sharing a knowledge base.

The setup is similar, but the namespace strategy and retrieval patterns differ.
If the user hasn't said, ask: *"Is this for document retrieval, agent memory,
or both?"* Then follow the relevant workflow below.

---

## Step 1: Choose your index configuration

Pick the index type before writing any code. Getting this wrong means
re-creating the index later.

**Serverless (recommended for most cases)**
```python
import os

from pinecone import Pinecone, ServerlessSpec

# Read the key from the environment; passing the literal string
# "PINECONE_API_KEY" would fail authentication.
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

if "my-index" not in pc.list_indexes().names():
    pc.create_index(
        name="my-index",
        dimension=1536,  # must match your embedding model exactly
        metric="cosine",  # use "dotproduct" if you plan to run hybrid search
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
index = pc.Index("my-index")
```

**Pod-based (for consistent high-throughput production)**
```python
from pinecone import PodSpec

# Reuses the `pc` client created in the serverless example.
pc.create_index(
    name="my-index-prod",
    dimension=1536,
    metric="cosine",
    spec=PodSpec(environment="us-east1-gcp", pod_type="p1.x1"),
)
```

**Dimension quick reference (match this exactly to your embedding model):**

| Model | Dimension |
|---|---|
| `text-embedding-3-small` | 1536 |
| `text-embedding-3-large` | 3072 |
| `voyage-3` / `voyage-multimodal-3` | 1024 |
| `BAAI/bge-large-en-v1.5` | 1024 |
| `intfloat/multilingual-e5-large` (Arabic, Malay, Chinese) | 1024 |

> **Checkpoint**: Index exists, dimension matches embedding model, `index.describe_index_stats()` returns without error.
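
The dimension check in this checkpoint can be automated before the first upsert. The sketch below is one possible way to do it: `KNOWN_DIMENSIONS` and `check_dimension` are hypothetical names introduced here for illustration, and the table only mirrors the quick reference above.

```python
# Dimensions copied from the quick-reference table above.
KNOWN_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "voyage-3": 1024,
    "voyage-multimodal-3": 1024,
    "BAAI/bge-large-en-v1.5": 1024,
    "intfloat/multilingual-e5-large": 1024,
}


def check_dimension(model: str, index_dimension: int) -> None:
    """Raise ValueError if the index dimension does not match the model."""
    expected = KNOWN_DIMENSIONS.get(model)
    if expected is None:
        raise ValueError(f"Unknown model {model!r}; add its dimension to the table")
    if expected != index_dimension:
        raise ValueError(
            f"{model} produces {expected}-dim vectors, "
            f"but the index expects {index_dimension}"
        )
```

A call such as `check_dimension("text-embedding-3-small", pc.describe_index("my-index").dimension)` would fail fast on a mismatch, assuming the `pc` client and index name from the serverless example.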

---

## Step 2: Embed and upsert documents

Always batch upserts. Never upsert one vector at a time.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts: list[str]) -> list[list[float]]:
    res = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [r.embedding for r in res.data]

def upsert_docs(index, docs: list[dict], namespace: str = "default"):
    """docs = [{"id": "...", "text": "...", "metadata": {...}}]"""
    BATCH = 100
    for i in range(0, len(docs), BATCH):
        batch = docs[i:i + BATCH]
        # One embedding request per batch, then one upsert per batch.
        vecs = [
            {
                "id": d["id"],
                "values": emb,
                "metadata": {**d.get("metadata", {}), "text": d["text"]},
            }
            for d, emb in zip(batch, embed([d["text"] for d in batch]))
        ]
        index.upsert(vectors=vecs, namespace=namespace)
```

**Always store the original text in metadata.** This avoids a second lookup
at retrieval time.

> **Checkpoint**: `index.describe_index_stats()` shows vector count > 0 in the
> target namespace.
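
`upsert_docs` expects documents that are already split into retrievable pieces. The sketch below shows one simple way to produce them: fixed-size character windows with overlap. The function name `chunk_text`, the `#chunk-N` ID scheme, and the default sizes are illustrative assumptions, not part of the skill; token-based or structure-aware splitting may suit a given corpus better.

```python
def chunk_text(doc_id: str, text: str, size: int = 800,
               overlap: int = 100) -> list[dict]:
    """Split text into overlapping character windows shaped for upsert_docs."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for n, start in enumerate(range(0, len(text), step)):
        chunks.append({
            "id": f"{doc_id}#chunk-{n}",  # stable ID: re-indexing overwrites
            "text": text[start:start + size],
            "metadata": {"doc_id": doc_id, "chunk": n},
        })
        if start + size >= len(text):
            break  # the window already reached the end of the text
    return chunks
```

Because the IDs are derived from the document ID and chunk position, re-running the pipeline on an unchanged document overwrites the same vectors instead of duplicating them.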

---

## Step 3: Choose retrieval strategy

### Dense (semantic) search: use for most cases
```python
def search(index, query: str, top_k: int = 5, namespace: str = "default",
           filter: dict | None = None) -> list[dict]:
    [q_emb] = embed([query])
    results = index.query(
        vector=q_emb, top_k=top_k, namespace=namespace,
        include_metadata=True, filter=filter
    )
    # Relies on the original text having been stored in metadata at upsert time.
    return [{"text": m.metadata["text"], "score": m.score, "id": m.id}
            for m in results.matches]
```

### Hybrid search (semantic + BM25 keyword): use when corpus has exact terminology
Use hybrid when the domain has precise terms that semantic search misses:
legal citations, medical codes, product SKUs, API method names.
Hybrid queries require an index created with `metric="dotproduct"`, and each
stored vector needs `sparse_values` alongside its dense `values`; the cosine
index and `upsert_docs` from the earlier steps store dense vectors only.

```python
from pinecone_text.sparse import BM25Encoder

# `docs` is the same list of {"id", "text", ...} dicts passed to upsert_docs.
# BM25Encoder.default() would instead load pretrained parameters;
# here the encoder is fitted on your own corpus.
bm25 = BM25Encoder()
bm25.fit([d["text"] for d in docs])  # fit once on your corpus

def hybrid_search(index, query: str, top_k: int = 5, alpha: float = 0.7):
    """alpha=1.0 is pure dense; alpha=0.0 is pure sparse."""
    dense = [v * alpha for v in embed([query])[0]]
    sparse_raw = bm25.encode_queries(query)
    sparse = {
        "indices": sparse_raw["indices"],
        "values": [v * (1 - alpha) for v in sparse_raw["values"]],
    }
    return index.query(vector=dense, sparse_vector=sparse,
                       top_k=top_k, include_metadata=True).matches
```
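
Hybrid queries only match on keywords if the stored vectors carry sparse values as well. The sketch below shows one way to assemble such records. `build_hybrid_vectors` is a hypothetical helper; it assumes `sparse_encoder` exposes `encode_documents(list[str])` returning one `{"indices": [...], "values": [...]}` dict per text, which is how a fitted `BM25Encoder` behaves.

```python
def build_hybrid_vectors(docs: list[dict], dense: list[list[float]],
                         sparse_encoder) -> list[dict]:
    """Pair each doc with its dense embedding and its sparse encoding."""
    if len(docs) != len(dense):
        raise ValueError("docs and dense embeddings must have the same length")
    sparse = sparse_encoder.encode_documents([d["text"] for d in docs])
    return [
        {
            "id": d["id"],
            "values": emb,
            "sparse_values": {"indices": sp["indices"], "values": sp["values"]},
            "metadata": {**d.get("metadata", {}), "text": d["text"]},
        }
        for d, emb, sp in zip(docs, dense, sparse)
    ]
```

The result can be passed to `index.upsert(vectors=..., namespace=...)` in batches of 100, the same way `upsert_docs` does for dense-only vectors.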

### Metadata filtering: use to scope results before semantic ranking
```python
# `emb` is a query embedding, e.g. embed(["deployment checklist"])[0].
# `top_k` is required on every query.

# Exact match
results = index.query(vector=emb, top_k=5, include_metadata=True,
                      filter={"source": {"$eq": "confluence"}})

# Combined filter
results = index.query(vector=emb, top_k=5, include_metadata=True, filter={
    "$and": [
        {"category": {"$eq": "engineering"}},
        {"language": {"$in": ["en", "ar"]}},
    ]
})
```

> **Checkpoint**: A test query returns relevant results with scores > 0.7 for
> clearly matching content. Treat 0.7 as a rough guide: typical score ranges
> depend on the embedding model and metric.
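
When filters are assembled from optional user input, building the dictionary by hand gets error-prone. The following is a minimal sketch of a builder that emits the `$eq`, `$in`, and `$and` shapes used above; `build_filter` is a hypothetical helper, not part of the Pinecone SDK.

```python
from typing import Optional


def build_filter(**conditions) -> Optional[dict]:
    """Turn keyword arguments into a metadata filter.

    Scalars become $eq, lists and tuples become $in, None values are skipped.
    Returns None when no condition remains, so the result can be passed
    directly as the `filter` argument of `search`.
    """
    clauses = []
    for field, value in conditions.items():
        if value is None:
            continue
        if isinstance(value, (list, tuple)):
            clauses.append({field: {"$in": list(value)}})
        else:
            clauses.append({field: {"$eq": value}})
    if not clauses:
        return None
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}
```

For example, `search(index, "rollback procedure", filter=build_filter(category="engineering", language=["en", "ar"]))` scopes the query to the same subset as the combined filter above.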

---

## Step 4A: Full RAG pipeline (document use case)

```python
def rag_answer(index, question: str, namespace: str = "default",
               model: str = "gpt-4o-mini") -> str:
    # Retrieve the top chunks and join them into a single context string.
    hits = search(index, question, top_k=5, namespace=namespace)
    context = "\n\n".join(h["text"] for h in hits)

    return client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer using only the provided context. "
                    "If the answer isn't in the context, say so.\n\n"
                    f"Context:\n{context}"
                ),
            },
            {"role": "user", "content": question},
        ],
    ).choices[0].message.content
```
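
Joining every hit into the prompt works for small chunks, but long chunks can overflow the context window and low-scoring hits add noise. The sketch below is one possible refinement: it numbers each chunk so the model can cite it, drops hits under a score floor, and stops at a character budget. `build_context` and its default values are assumptions chosen for illustration.

```python
def build_context(hits: list[dict], max_chars: int = 6000,
                  min_score: float = 0.0) -> str:
    """Format search hits as numbered, ID-tagged blocks within a size budget."""
    parts, used = [], 0
    for n, hit in enumerate(hits, start=1):
        if hit["score"] < min_score:
            continue  # skip weak matches
        block = f"[{n}] (id={hit['id']})\n{hit['text']}"
        if used + len(block) > max_chars:
            break  # budget exhausted; hits arrive best-first
        parts.append(block)
        used += len(block) + 2  # account for the "\n\n" separator
    return "\n\n".join(parts)
```

The input is the list of `{"text", "score", "id"}` dicts returned by `search`, so the result can stand in for the `context` string in `rag_answer`.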

---

## Step 4B: Agent memory (memory use case)

Use namespaces to isolate each agent's or user's memories completely.
One namespace per agent prevents memory bleed across users or sessions.

```python
import hashlib
import time

def remember(index, agent_id: str, content: str,
             memory_type: str = "fact"):
    """Store a memory for an agent."""
    now = time.time()
    # The timestamp in the ID keeps repeated content as separate memories.
    mem_id = hashlib.md5(f"{agent_id}{content}{now}".encode()).hexdigest()
    [emb] = embed([content])
    index.upsert(
        vectors=[{
            "id": mem_id,
            "values": emb,
            "metadata": {
                "text": content,
                "type": memory_type,
                "timestamp": now,
                "agent_id": agent_id,
            },
        }],
        namespace=f"agent_{agent_id}",
    )

def recall(index, agent_id: str, query: str,
           top_k: int = 5) -> list[str]:
    """Recall relevant memories for an agent."""
    return [h["text"] for h in
            search(index, query, top_k=top_k,
                   namespace=f"agent_{agent_id}")]

def forget(index, agent_id: str):
    """Wipe all memories for an agent (e.g., on user request)."""
    index.delete(delete_all=True, namespace=f"agent_{agent_id}")
```

---

## Step 5: Wire it together and test end to end

Run a quick smoke test before integrating into the larger system:

```python
import time

# Smoke test
upsert_docs(index, [
    {"id": "t1", "text": "Pinecone is a vector database for semantic search."},
    {"id": "t2", "text": "RAG combines retrieval with language model generation."},
])
# Upserts are eventually consistent; give them a moment to become queryable.
time.sleep(10)

hits = search(index, "What is Pinecone?")
assert hits, "No results yet; the upsert may not be visible"
assert hits[0]["score"] > 0.7, f"Expected high similarity, got {hits[0]['score']}"
print("Smoke test passed:", hits[0]["text"])
```

> **Checkpoint**: Smoke test passes. End-to-end: index → upsert → query →
> LLM response works without errors.
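
Freshly upserted vectors are not always queryable immediately, so a smoke test that queries right after writing can fail intermittently. The sketch below polls `describe_index_stats()` until the namespace reports enough vectors. `wait_for_vectors` is a hypothetical helper, and it assumes the stats response exposes `namespaces` as a mapping whose values have a `vector_count` attribute.

```python
import time


def wait_for_vectors(index, namespace: str, expected: int,
                     timeout_s: float = 30.0, poll_s: float = 1.0) -> bool:
    """Poll index stats until `namespace` holds at least `expected` vectors.

    Returns True once the count is reached, False if the timeout expires.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        stats = index.describe_index_stats()
        summary = stats.namespaces.get(namespace)
        if summary is not None and summary.vector_count >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Calling `wait_for_vectors(index, "default", 2)` between the upsert and the first query is one way to avoid relying on a fixed delay.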

---

## Common pitfalls: fix these before they become bugs

- **Dimension mismatch**: always verify `len(embed(["test"])[0])` matches
  the index dimension before your first upsert.
- **Missing text in metadata**: if you don't store `"text"` in metadata,
  you'll need a second lookup to get the actual content at query time.
- **Single-vector upserts in a loop**: always batch in chunks of 100.
- **No namespace strategy**: decide upfront. One namespace per user/agent
  prevents cross-tenant data leaks that are hard to fix later.
- **Fitting BM25 on a small corpus**: BM25 needs a representative corpus to
  build good term frequencies. Fit on at least a few hundred documents.

## When NOT to use this skill

Use a different approach when:
- The dataset fits in memory and latency doesn't matter: use FAISS or Chroma
- You're already on PostgreSQL and want to avoid a new service: use pgvector
- You need sub-5ms p99 latency with no external API calls: use a local vector store
- The user explicitly wants a different vector DB (Weaviate, Qdrant, etc.)