diff --git a/tutorial/markdown/generated/vector-search-cookbook/ag2-agentchat_RetrieveChat_couchbase.md b/tutorial/markdown/generated/vector-search-cookbook/ag2-agentchat_RetrieveChat_couchbase.md deleted file mode 100644 index 7282056..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/ag2-agentchat_RetrieveChat_couchbase.md +++ /dev/null @@ -1,563 +0,0 @@ ---- -# frontmatter -path: "/tutorial-couchbase-ag2-rag" -title: Build an Agentic RAG Application with Couchbase and AG2 -short_title: RAG with Couchbase & AG2 -description: - - Learn how Couchbase and AG2 simplify RAG applications - - Store and retrieve document embeddings with Couchbase Vector Search - - Build an AI agent that answers questions from documentation links -content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - FTS - - Artificial Intelligence - - Autogen - - Ag2 -sdk_language: - - python -length: 40 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/ag2/agentchat_RetrieveChat_couchbase.ipynb) - -# Using RetrieveChat Powered by Couchbase Capella for Retrieve Augmented Code Generation and Question Answering - -This tutorial will show you how we've made building Retrieval-Augmented Generation (RAG) applications much easier with [Couchbase](https://www.couchbase.com/) and [AG2](https://ag2.ai/). By leveraging [Couchbase's Search vector index](https://docs.couchbase.com/cloud/vector-search/vector-search.html) for storing and retrieving document embeddings, along with [AG2's powerful AI capabilities](https://docs.ag2.ai/docs/user-guide/basic-concepts/installing-ag2), our integration simplifies the entire process. As part of this tutorial, we'll also build a demo application where an AI agent can answer questions based on documentation links provided for any framework. This hands-on approach will demonstrate how effortlessly you can create intelligent, context-aware AI applications using this integration. 
-
-RetrieveChat is a conversational system for retrieval-augmented code generation and question answering. In this notebook, we demonstrate how to use RetrieveChat to generate code and answer questions based on customized documentation that is not present in the LLM's training dataset. RetrieveChat uses the `AssistantAgent` and `RetrieveUserProxyAgent`, which are used similarly to the `AssistantAgent` and `UserProxyAgent` in other notebooks (e.g., [Automated Task Solving with Code Generation, Execution & Debugging](https://docs.ag2.ai/docs/use-cases/notebooks/notebooks/agentchat_auto_feedback_from_code_execution)). Essentially, `RetrieveUserProxyAgent` implements a different auto-reply mechanism corresponding to the RetrieveChat prompts.
-
-Some extra dependencies are needed for this notebook, which can be installed via pip:
-
-
-```python
-%pip install "pyautogen[openai,retrievechat-couchbase]==0.8.7" "flaml[automl]==2.3.4" couchbase==4.3.3
-# For more information, please refer to the [installation guide](/docs/installation/).
-```
-
-## Environment Setup
-
-# Couchbase Capella Setup Instructions
-
-Before we proceed with the notebook, we need a Couchbase Capella database cluster up and running.
-
-## Setting Up a Free Cluster
-- To set up a free operational cluster, head over to [Couchbase Cloud](https://cloud.couchbase.com) and create an account. There, create a free cluster. For more details on creating a cluster, [refer here](https://docs.couchbase.com/cloud/get-started/create-account.html).
-
-## Creating Required Resources
-- After creating the cluster, we need to create our required bucket, scope, and collection. Head over to **Data Tools**. On the left-hand side panel, you will find an option to create a bucket. Assign appropriate names for the Bucket, Scope, and Collection.
For this tutorial, use the following (these names match the environment variables and index definition used later in the notebook):
-  - **Bucket Name**: `sample_bucket`
-  - **Scope Name**: `sample_scope`
-  - **Collection Name**: `sample_collection`
-  - **Vector Search Index Name**: `vector_index`
-
-## Creating a Search Index
-Before proceeding further, we need to set up a search index for vector-based retrieval. This is essential for efficient querying in our RAG pipeline. Follow the steps below:
-
-- [Couchbase Capella](https://docs.couchbase.com/cloud/search/import-search-index.html)
-  - Copy the index definition below to a new file `index.json`.
-  - Import the file in Capella using the instructions in the documentation.
-  - Click on Create Index to create the index.
-
-- [Couchbase Server](https://docs.couchbase.com/server/current/search/import-search-index.html)
-  - Click on Search -> Add Index -> Import.
-  - Copy the following index definition in the Import screen.
-  - Click on Create Index to create the index.
-
-#### Index Definition
-
-The definition below assumes the bucket, scope, and collection are named `sample_bucket`, `sample_scope`, and `sample_collection`. If you chose different names, update `sourceName` and the `sample_scope.sample_collection` type mapping accordingly.
-
-```json
-{
-  "name": "vector_index",
-  "type": "fulltext-index",
-  "params": {
-    "doc_config": {
-      "docid_prefix_delim": "",
-      "docid_regexp": "",
-      "mode": "scope.collection.type_field",
-      "type_field": "type"
-    },
-    "mapping": {
-      "default_analyzer": "standard",
-      "default_datetime_parser": "dateTimeOptional",
-      "default_field": "_all",
-      "default_mapping": {
-        "dynamic": true,
-        "enabled": false
-      },
-      "default_type": "_default",
-      "docvalues_dynamic": false,
-      "index_dynamic": true,
-      "store_dynamic": false,
-      "type_field": "_type",
-      "types": {
-        "sample_scope.sample_collection": {
-          "dynamic": true,
-          "enabled": true,
-          "properties": {
-            "embedding": {
-              "enabled": true,
-              "dynamic": false,
-              "fields": [
-                {
-                  "dims": 384,
-                  "index": true,
-                  "name": "embedding",
-                  "similarity": "dot_product",
-                  "type": "vector",
-                  "vector_index_optimized_for": "recall"
-                }
-              ]
-            },
-            "content": {
-              "enabled": true,
-              "dynamic": false,
-              "fields": [
{
-                  "index": true,
-                  "name": "content",
-                  "store": true,
-                  "type": "text"
-                }
-              ]
-            }
-          }
-        }
-      }
-    },
-    "store": {
-      "indexType": "scorch",
-      "segmentVersion": 16
-    }
-  },
-  "sourceType": "gocbcore",
-  "sourceName": "sample_bucket",
-  "sourceParams": {},
-  "planParams": {
-    "maxPartitionsPerPIndex": 64,
-    "indexPartitions": 16,
-    "numReplicas": 0
-  }
-}
-```
-
-## Connecting to the Cluster
-- Now, we will connect to the cluster. [Refer to this page for connection details](https://docs.couchbase.com/cloud/get-started/connect.html).
-
-- **Create a user to connect:**
-  - Navigate to the **Settings** tab.
-  - Click **Create Cluster Access** and specify a username and password.
-  - Assign **read/write access to all buckets** (you may create more users with restricted permissions as needed).
-  - For more details, [refer here](https://docs.couchbase.com/cloud/clusters/manage-database-users.html#create-database-credentials).
-
-- **Add an IP Address to the allowed list:**
-  - In **Settings**, click on **Networking**.
-  - Add an [allowed IP](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) based on your requirements.
-
-- **Set up environment variables:**
-  Retrieve the connection string from the **Connect** tab. Then, configure the following environment variables:
-  - `CB_CONN_STR`: Couchbase cluster connection string
-  - `CB_USERNAME`: Username of the created user
-  - `CB_PASSWORD`: Password of the created user
-  - `OPENAI_API_KEY`: OpenAI API key (required for the agents)
-
-
-```python
-import os
-
-# Environment Variables
-os.environ["CB_CONN_STR"] = "<>"
-os.environ["CB_USERNAME"] = "<>"
-os.environ["CB_PASSWORD"] = "<>"
-os.environ["OPENAI_API_KEY"] = "<>"
-
-# You can change the values below, but then you will have to change them in the
-# vector search index created in the Couchbase cluster as well.
-os.environ["CB_BUCKET"] = "sample_bucket" -os.environ["CB_SCOPE"] = "sample_scope" -os.environ["CB_COLLECTION"] = "sample_collection" -os.environ["CB_INDEX_NAME"] = "vector_index" -``` - -**Voila! Your cluster is now ready to be used.** - -## Initializing Agents - -We start by initializing the `AssistantAgent` and `RetrieveUserProxyAgent`. The system message needs to be set to "You are a helpful assistant." for AssistantAgent. The detailed instructions are given in the user message. Later we will use the `RetrieveUserProxyAgent.message_generator` to combine the instructions and a retrieval augmented generation task for an initial prompt to be sent to the LLM assistant. - - -```python -import os -import sys - -from autogen import AssistantAgent - -sys.path.append(os.path.abspath("/workspaces/autogen/autogen/agentchat/contrib")) - -from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent - -# Accepted file formats that can be stored in -# a vector database instance -from autogen.retrieve_utils import TEXT_FORMATS - -config_list = [{"model": "gpt-4o", "api_key": os.environ["OPENAI_API_KEY"], "api_type": "openai"}] -assert len(config_list) > 0 -print("models to use: ", [config_list[i]["model"] for i in range(len(config_list))]) -``` - - -```python -print("Accepted file formats for `docs_path`:") -print(TEXT_FORMATS) -``` - -### Understanding `AssistantAgent` in AutoGen - -The `AssistantAgent` in AutoGen is a specialized subclass of `ConversableAgent` designed to perform tasks using large language models (LLMs). By default, it generates code suggestions and debugging assistance but does not execute code autonomously; it relies on user intervention for code execution. - -**Key Components of the `AssistantAgent` Initialization:** - -- **`name`**: Assigns a unique identifier to the agent. - -- **`system_message`**: Sets the default behavior and role of the agent. 
In this case, it's initialized with "You are a helpful assistant," guiding the agent to provide assistance aligned with this directive. - -- **`llm_config`**: Configures the LLM's behavior with parameters like timeout settings, caching mechanisms, and a list of model configurations (`config_list`). - -**How `AssistantAgent` Operates:** - -Once initialized, the `AssistantAgent` can interact with users or other agents to process tasks. It leverages the specified LLM configurations to generate responses, code snippets, or debugging advice based on the input it receives. However, it does not execute code by default, awaiting user approval or execution commands. - -For more detailed information, refer to the official AG2 documentation on [`AssistantAgent`](https://docs.ag2.ai/docs/api-reference/autogen/AssistantAgent). - -### Implementing `AssistantAgent` for LLM-Powered Assistance - -The provided code snippet demonstrates the creation of an `AssistantAgent` instance named "assistant" using the AutoGen framework. The `AssistantAgent` class is designed to interact with large language models (LLMs) to solve tasks, including suggesting Python code blocks and debugging. By default, it does not execute code and expects the user to handle code execution. - -- **`name="assistant"`**: Assigns the name "assistant" to the agent. - -- **`system_message="You are a helpful assistant."`**: Sets a system message that defines the assistant's role and behavior during interactions. - -- **`llm_config={...}`**: Provides configuration settings for the LLM: - - **`timeout=600`**: Specifies a timeout of 600 seconds for LLM responses. - - **`cache_seed=42`**: Sets a seed value for caching mechanisms to ensure consistent results. - - **`config_list=config_list`**: Includes a list of additional configurations, which can define specific LLM models or parameters to use. 
- -By default, the `AssistantAgent` has `human_input_mode` set to "NEVER" and `code_execution_config` set to `False`, meaning it doesn't execute code and doesn't require human input during interactions. - - -```python -# 1. create an AssistantAgent instance named "assistant" -assistant = AssistantAgent( # As defined above - name="assistant", - system_message="You are a helpful assistant.", - llm_config={ - "timeout": 600, - "cache_seed": 42, - "config_list": config_list, - }, -) -print("AssistantAgent instance created, with the configurations as defined above.") -``` - -## Fetching Documentation - -The following function recursively fetches all unique internal links from the given documentation URL within a specified time limit. This is useful for gathering documentation pages that will be used to augment the LLM's responses. - - -```python -import requests -from bs4 import BeautifulSoup -from urllib.parse import urljoin, urlparse -import time -import os -from concurrent.futures import ThreadPoolExecutor, as_completed - -def get_documentation_links(base_url, visited=None, start_time=None, time_limit=10): - """ - Recursively fetch all unique internal links from the given documentation URL with a time constraint. - - Args: - base_url (str): The URL of the documentation homepage. - visited (set): A set to keep track of visited URLs. - start_time (float): The start time of execution. - time_limit (int): The maximum time allowed for execution in seconds. - - Returns: - list: A list of unique internal links found in the documentation. 
- """ - if visited is None: - visited = set() - if start_time is None: - start_time = time.time() - - # Stop recursion if time limit is exceeded - if time.time() - start_time > time_limit: - return list(visited) - - try: - response = requests.get(base_url, timeout=5) - response.raise_for_status() - except requests.RequestException as e: - print(f"Error fetching the page: {e}") - return list(visited) - - soup = BeautifulSoup(response.text, "html.parser") - domain = urlparse(base_url).netloc - - links = set() - for a_tag in soup.find_all("a", href=True): - href = a_tag["href"].strip() - full_url = urljoin(base_url, href) - parsed_url = urlparse(full_url) - - if parsed_url.netloc == domain and full_url not in visited: # Ensure it's an internal link within the same domain - visited.add(full_url) - links.add(full_url) - links.update(get_documentation_links(full_url, visited, start_time, time_limit)) # Recursive call with time check - - return list(visited) - -``` - - -```python -def fetch_content_generators(links, num_workers=5): - """ - Splits the links into separate lists for each worker and returns generators for each worker. - Extracts only plain text from HTML before yielding. - - Args: - links (list): List of URLs to fetch content from. - num_workers (int): Number of workers, each receiving a distinct set of links. - - Returns: - list: A list of generators, one for each worker. 
- """ - def fetch_content(sub_links): - for link in sub_links: - try: - response = requests.get(link, timeout=5) - response.raise_for_status() - - # Extract plain text from HTML - soup = BeautifulSoup(response.text, "html.parser") - text_content = soup.get_text() - - yield link, text_content - except requests.RequestException as e: - print(f"Error fetching {link}: {e}") - yield link, None - - # Split links into chunks for each worker - chunk_size = (len(links) + num_workers - 1) // num_workers # Ensure even distribution - link_chunks = [links[i:i + chunk_size] for i in range(0, len(links), chunk_size)] - - return [fetch_content(chunk) for chunk in link_chunks] -``` - - -```python -def save_content_to_files(links, output_folder="docs_data", num_workers=5): - """ - Uses fetch_content_generators to fetch content in parallel and save it to local files. - - Args: - links (list): List of URLs to fetch content from. - output_folder (str): Folder to store the saved text files. - num_workers (int): Number of workers for parallel processing. - - Returns: - list: A list of file paths where content is saved. - """ - os.makedirs(output_folder, exist_ok=True) - generators = fetch_content_generators(links, num_workers=num_workers) - - file_paths = [] - - def process_and_save(gen, worker_id): - local_paths = [] - for j, (url, content) in enumerate(gen): - if content: # Avoid saving empty or failed fetches - file_name = f"doc_{worker_id}_{j}.txt" - file_path = os.path.join(output_folder, file_name) - with open(file_path, "w", encoding="utf-8") as f: - f.write(content) - local_paths.append(file_path) - return local_paths - - with ThreadPoolExecutor(max_workers=num_workers) as executor: - futures = {executor.submit(process_and_save, gen, i): i for i, gen in enumerate(generators)} - for future in as_completed(futures): - file_paths.extend(future.result()) - - return file_paths -``` - -### 📌 Input Documentation Link Here -Please enter the link to the documentation below. 
- - -```python -default_link = "https://docs.ag2.ai/docs/use-cases/notebooks/Notebooks" -main_doc_link = input(f"Enter documentation link: ") or default_link -print("Selected link:", main_doc_link) -``` - - -```python -docs_links = get_documentation_links(main_doc_link, None, None, 5) -docs_file_paths = save_content_to_files(docs_links, "./docs", 12) -``` - - -```python -len(docs_file_paths), len(docs_links) -``` - - - - - (454, 454) - - - -## Using RetrieveChat - -The `RetrieveUserProxyAgent` in AutoGen is a specialized agent designed to facilitate retrieval-augmented generation (RAG) by leveraging external knowledge sources, typically a vector database. It acts as an intermediary between the user and an AI assistant, ensuring that relevant context is retrieved and supplied to the assistant for more informed responses. - - - -### **How RetrieveUserProxyAgent Works** - - -1. **Query Processing & Context Retrieval** - When the user submits a question, the `RetrieveUserProxyAgent` first determines if the available context is sufficient. If not, it retrieves additional relevant information from an external knowledge base (e.g., a vector database) using similarity search. - -2. **Interaction with the Assistant** - Once the relevant context is retrieved, the agent forwards both the user's query and the retrieved context to the `AssistantAgent` (such as an OpenAI-based model). This step ensures that the assistant generates an informed and contextually accurate response. - -3. **Handling Responses** - - If the assistant's response satisfies the user, the conversation ends. - - If the response is unsatisfactory or additional context is needed, the agent updates the context and repeats the retrieval process. - -4. **User Feedback & Iteration** - - The user can provide feedback, request refinements, or terminate the interaction. - - If updates are needed, the agent refines the context and interacts with the assistant again. 
-
-![Retrieval-Augmented Assistant](https://microsoft.github.io/autogen/0.2/assets/images/retrievechat-arch-959e180405c99ceb3da88a441c02f45e.png)
-
-Source: [Retrieval-Augmented Generation (RAG) Applications with AutoGen](https://microsoft.github.io/autogen/0.2/blog/2023/10/18/RetrieveChat/)
-
-### **Configuring `RetrieveUserProxyAgent` with Custom Text Splitting and OpenAI Embeddings for RAG**
-
-This code snippet demonstrates how to configure a `RetrieveUserProxyAgent` in AutoGen with a custom text splitter and an OpenAI-based embedding function for retrieval-augmented generation (RAG). It uses `RecursiveCharacterTextSplitter` to break documents into structured chunks for better embedding and retrieval.
-
-The embedding function is set up with OpenAI's `text-embedding-3-small` model, but you can alternatively use the default **SentenceTransformers** embedding model. The `RetrieveUserProxyAgent` is then initialized with a predefined task, auto-reply constraints, and a document retrieval path, enabling it to fetch relevant context dynamically and generate accurate responses in an automated workflow.
-
-
-```python
-from chromadb.utils import embedding_functions
-from langchain.text_splitter import RecursiveCharacterTextSplitter
-
-# Initialize a recursive character text splitter with specified separators
-recur_splitter = RecursiveCharacterTextSplitter(separators=["\n", "\r", "\t"])
-
-# Option 1: Using OpenAI embeddings
-# Note: text-embedding-3-small produces 1536-dimensional vectors, so the search
-# index's "dims" value must be changed from 384 to 1536 if you enable this option.
-openai_ef = embedding_functions.OpenAIEmbeddingFunction(
-    api_key=os.environ["OPENAI_API_KEY"],
-    model_name="text-embedding-3-small",
-)
-
-ragproxyagent = RetrieveUserProxyAgent(
-    name="ragproxyagent",
-    human_input_mode="NEVER",
-    max_consecutive_auto_reply=2,
-    retrieve_config={
-        "task": "code",
-        "docs_path": docs_file_paths,
-        "chunk_token_size": 1200,  # Defines chunk size for document splitting
-        "model": config_list[0]["model"],
-        "vector_db": "couchbase",  # Using Couchbase Capella VectorDB
-        "collection_name": os.environ["CB_COLLECTION"],  # Collection name in Couchbase
-        "db_config": {
-            "connection_string": os.environ["CB_CONN_STR"],  # Connection string for Couchbase
-            "username": os.environ["CB_USERNAME"],  # Couchbase username
-            "password": os.environ["CB_PASSWORD"],  # Couchbase password
-            "bucket_name": os.environ["CB_BUCKET"],  # Bucket name in Couchbase
-            "scope_name": os.environ["CB_SCOPE"],  # Scope name in Couchbase
-            "index_name": os.environ["CB_INDEX_NAME"],  # Index name in Couchbase
-        },
-        "get_or_create": True,  # Set to False to avoid reusing an existing collection
-        "overwrite": False,  # Set to True to overwrite an existing collection (forces index recreation)
-
-        # Option 1: Use OpenAI embedding function (uncomment below to enable)
-        # "embedding_function": openai_ef,
-
-        # Option 2: SentenceTransformers embedding model; all-MiniLM-L6-v2 produces
-        # 384-dimensional vectors, matching the "dims": 384 in the vector search index
-        "embedding_model": "all-MiniLM-L6-v2",
-
-        # Custom text splitter function
-        "custom_text_split_function": recur_splitter.split_text,
-    },
-    code_execution_config=False,  # Set to True if you want to execute retrieved code
-)
-```
-
-## Chat Interaction
-
-This section marks the beginning of the chat interaction using RetrieveChat powered by Couchbase Capella for retrieval-augmented code generation and question answering in AG2.
-
-### Example 1
-
-Use RetrieveChat to help generate sample code, automatically run the code, and fix errors if there are any.
-
-Problem: How to use RetrieveChat Powered by Couchbase Capella for Retrieve Augmented Code Generation and Question Answering in AG2?
-
-Note: You may need to create a search index on the cluster before querying.
-
-
-```python
-assistant.reset()
-code_problem = "How to use RetrieveChat Powered by Couchbase Capella for Retrieve Augmented Code Generation and Question Answering in AG2?"
-chat_result = ragproxyagent.initiate_chat(assistant, message=ragproxyagent.message_generator, problem=code_problem)
-```
-
-**Expected Output**
-
-The notebook that explains how to use Couchbase with AG2 contains the code snippet below, so the RAG agent should return a snippet similar to it.
- -```python -ragproxyagent = RetrieveUserProxyAgent( - name="ragproxyagent", - human_input_mode="NEVER", - max_consecutive_auto_reply=3, - retrieve_config={ - "task": "code", - "docs_path": [ - "https://raw.githubusercontent.com/microsoft/FLAML/main/website/docs/Examples/Integrate%20-%20Spark.md", - "https://raw.githubusercontent.com/microsoft/FLAML/main/website/docs/Research.md", - ], - "chunk_token_size": 2000, - "model": config_list[0]["model"], - "vector_db": "couchbase", # Couchbase Capella VectorDB - "collection_name": "demo_collection", # Couchbase Capella collection name to be utilized/created - "db_config": { - "connection_string": os.environ["CB_CONN_STR"], # Couchbase Capella connection string - "username": os.environ["CB_USERNAME"], # Couchbase Capella username - "password": os.environ["CB_PASSWORD"], # Couchbase Capella password - "bucket_name": "test_db", # Couchbase Capella bucket name - "scope_name": "test_scope", # Couchbase Capella scope name - "index_name": "vector_index", # Couchbase Capella index name to be created - }, - "get_or_create": True, # set to False if you don't want to reuse an existing collection - "overwrite": False, # set to True if you want to overwrite an existing collection, each overwrite will force a index creation and reupload of documents - }, - code_execution_config=False, # set to False if you don't want to execute the code -) -``` diff --git a/tutorial/markdown/generated/vector-search-cookbook/awsbedrock-agents-custom-control-approach-Bedrock_Agents_Custom_Control.md b/tutorial/markdown/generated/vector-search-cookbook/awsbedrock-agents-custom-control-approach-Bedrock_Agents_Custom_Control.md deleted file mode 100644 index 172a03d..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/awsbedrock-agents-custom-control-approach-Bedrock_Agents_Custom_Control.md +++ /dev/null @@ -1,952 +0,0 @@ ---- -path: "/tutorial-aws-bedrock-agents-custom-control" -title: Building Intelligent Agents with AWS Bedrock (Custom 
Control) -short_title: AWS Bedrock Agents Custom Control Approach -description: - - Learn how to build intelligent agents using Amazon Bedrock Agents with a custom control approach and Couchbase as the vector store. - - This tutorial demonstrates how to create specialized agents that can process documents and interact with external APIs using custom control flows. - - You'll understand how to implement secure multi-agent architectures using Amazon Bedrock's agent capabilities with fine-grained control over agent behavior. -content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - Artificial Intelligence - - Amazon Bedrock -sdk_language: - - python -length: 90 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/awsbedrock-agents/custom-control-approach/Bedrock_Agents_Custom_Control.ipynb) - -# AWS Bedrock Agents with Couchbase Vector Search - Custom Control Approach - -This notebook demonstrates the Custom Control approach for implementing AWS Bedrock agents with Couchbase Vector Search. In this approach, the agent returns control to the application for function execution. - -We'll implement a multi-agent architecture with specialized agents for different tasks: -- **Researcher Agent**: Searches for relevant documents in the vector store -- **Writer Agent**: Formats and presents the research findings - -## Alternative Approaches - -This notebook demonstrates the Custom Control approach for AWS Bedrock Agents. For comparison, you might also want to check out the Lambda Approach, which uses AWS Lambda functions to execute agent tools instead of handling them directly in your application code. - -The Lambda approach offers better separation of concerns and scalability, but requires more setup. 
You can find that implementation here: [Lambda Approach Notebook](https://developer.couchbase.com/tutorial-aws-bedrock-agents-lambda) -Note: If the link above doesn't work in your Jupyter environment, you can navigate to the file manually in the `awsbedrock-agents/lambda-approach/` directory. - -## Overview - -The Custom Control approach gives the application invoking the agent the responsibility of executing the agent's defined functions (tools). When the agent decides to use a tool, it sends a `returnControl` event back to the calling application, which then executes the function locally and (optionally) returns the result to the agent to continue processing. - -## Key Steps & Concepts - -1. **Define Agent:** - * Define instructions (prompt) for the agent. - * Define the function schema (tools the agent can use, e.g., `researcher_functions`, `writer_functions` in the example). - -2. **Create Agent in Bedrock:** - * Use `bedrock_agent_client.create_agent` to create the agent, providing the instructions and foundation model. - * The example's `create_agent` function includes logic to check for existing agents and potentially delete/recreate them if they are in a non-functional state. - -3. **Create Action Group (Custom Control):** - * Use `bedrock_agent_client.create_agent_action_group`. - * Crucially, set the `actionGroupExecutor` to `{"customControl": "RETURN_CONTROL"}`. This tells Bedrock to pause execution and return control to the caller when a function in this group needs to be run. - * Provide the `functionSchema` defined earlier. - -4. **Prepare Agent:** - * Use `bedrock_agent_client.prepare_agent` to make the agent ready for invocation. - * The `wait_for_agent_status` utility function polls until the agent reaches a `PREPARED` or `Available` state. - -5. **Create Agent Alias:** - * An alias (e.g., "v1") is created using `bedrock_agent_client.create_agent_alias` for invoking the agent. - - -6. 
**Invoke Agent & Handle Return Control (Custom Control Flow)** - - When the application invokes a Bedrock agent and the agent decides to use a tool, the "Custom Control" mechanism takes effect. Instead of Bedrock running the tool, it sends a `returnControl` event back to the application. The code then parses this event to identify the requested function and its parameters, executes that function locally using the application's resources (like a vector store), and the result of this local execution becomes the final output for that specific agent interaction. If further steps are needed with another agent, a new, separate agent invocation is made using this output. - - * Application calls `invoke_agent` to interact with Bedrock. - * Agent signals tool use via a `returnControl` event in the response stream. - * Application parses the event, extracting function name and parameters. - * Application executes the specified function locally, accessing its own resources. - * **The output from this local function execution is the final result for that agent's turn.** - - - -## Pros - -* **Full Control:** The application has complete control over the execution environment and logic of the tools. -* **Direct State Access:** Tools can directly access application memory, state, and resources (like the `vector_store` object in the example) without needing separate deployment or complex configuration passing. -* **Simpler Local Development:** Can be easier to test and debug locally as the tool execution happens within the same process. -* **Flexibility:** Allows integration with any library or service available to the application. - -## Cons - -* **Application Burden:** The application code is responsible for implementing and executing the tool logic. -* **Scalability:** The scalability of tool execution is tied to the scalability of the application itself. -* **Tighter Coupling:** The agent's functionality is more tightly coupled with the application code. 
-* **Interaction Model:** The specific implementation shown requires chaining separate agent invocations rather than letting the agent continue processing within a single turn after a tool is used. Implementing the latter (returning results via `ReturnControl`) adds complexity to the application's handling of the `invoke_agent` response/request cycle. - -## Setup and Configuration - -First, let's import the necessary libraries and set up our environment: - - -```python -import json -import logging -import os -import time -import uuid -from datetime import timedelta - -import boto3 -from botocore.exceptions import ClientError -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.exceptions import (InternalServerFailureException, - QueryIndexAlreadyExistsException, - ServiceUnavailableException) -from couchbase.management.buckets import CreateBucketSettings -from couchbase.management.search import SearchIndex -from couchbase.options import ClusterOptions -from dotenv import load_dotenv -from langchain_aws import BedrockEmbeddings -from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore - -# Setup logging -logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s') -``` - -## Load Environment Variables - -Load environment variables from the .env file. Make sure to create a .env file with the necessary credentials before running this notebook. 
- - -```python -# Load environment variables -load_dotenv() - -# Couchbase Configuration -CB_HOST = os.getenv("CB_HOST", "couchbase://localhost") -CB_USERNAME = os.getenv("CB_USERNAME", "Administrator") -CB_PASSWORD = os.getenv("CB_PASSWORD", "password") -CB_BUCKET_NAME = os.getenv("CB_BUCKET_NAME", "vector-search-testing") -SCOPE_NAME = os.getenv("SCOPE_NAME", "shared") -COLLECTION_NAME = os.getenv("COLLECTION_NAME", "bedrock") -INDEX_NAME = os.getenv("INDEX_NAME", "vector_search_bedrock") - -# AWS Configuration -AWS_REGION = os.getenv("AWS_REGION", "us-east-1") -AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID") -AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY") -AWS_ACCOUNT_ID = os.getenv("AWS_ACCOUNT_ID") - -# Check if required environment variables are set -required_vars = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"] -missing_vars = [var for var in required_vars if not os.getenv(var)] -if missing_vars: - logging.warning(f"Missing required environment variables: {', '.join(missing_vars)}") - logging.warning("Please set these variables in your .env file") -else: - logging.info("All required environment variables are set") -``` - - 2025-05-08 13:34:15,605 - INFO - All required environment variables are set - - -## Initialize AWS Clients - -Set up the AWS clients for Bedrock and other services: - - -```python -# Initialize AWS session -session = boto3.Session( - aws_access_key_id=AWS_ACCESS_KEY_ID, - aws_secret_access_key=AWS_SECRET_ACCESS_KEY, - region_name=AWS_REGION -) - -# Initialize AWS clients from session -bedrock_client = session.client('bedrock') -bedrock_agent_client = session.client('bedrock-agent') -bedrock_runtime = session.client('bedrock-runtime') -bedrock_runtime_client = session.client('bedrock-agent-runtime') -iam_client = session.client('iam') - -logging.info("AWS clients initialized successfully") -``` - - 2025-05-08 13:34:15,836 - INFO - AWS clients initialized successfully - - -## Set Up Couchbase and Vector Store - -Now let's set 
up the Couchbase connection, collections, and vector store: - - -```python -# Connect to Couchbase -auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) -options = ClusterOptions(auth) -cluster = Cluster(CB_HOST, options) -cluster.wait_until_ready(timedelta(seconds=5)) -logging.info("Successfully connected to Couchbase") -``` - - 2025-05-08 13:34:17,966 - INFO - Successfully connected to Couchbase - - - - -## Create Couchbase Bucket, Scope, and Collection - -The following code block ensures that the necessary Couchbase bucket, scope, and collection are available. -It will create them if they don't exist, and also clear any existing documents from the collection to start fresh. - -> Note: Bucket Creation will fail on Capella - - -```python -# Set up collection -try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(CB_BUCKET_NAME) - logging.info(f"Bucket '{CB_BUCKET_NAME}' exists.") - except Exception as e: - logging.info(f"Bucket '{CB_BUCKET_NAME}' does not exist. Creating it...") - bucket_settings = CreateBucketSettings( - name=CB_BUCKET_NAME, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - bucket = cluster.bucket(CB_BUCKET_NAME) - logging.info(f"Bucket '{CB_BUCKET_NAME}' created successfully.") - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == SCOPE_NAME for scope in scopes) - - if not scope_exists and SCOPE_NAME != "_default": - logging.info(f"Scope '{SCOPE_NAME}' does not exist. 
Creating it...") - bucket_manager.create_scope(SCOPE_NAME) - logging.info(f"Scope '{SCOPE_NAME}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == SCOPE_NAME and COLLECTION_NAME in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{COLLECTION_NAME}' does not exist. Creating it...") - bucket_manager.create_collection(SCOPE_NAME, COLLECTION_NAME) - logging.info(f"Collection '{COLLECTION_NAME}' created successfully.") - else: - logging.info(f"Collection '{COLLECTION_NAME}' already exists. Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(SCOPE_NAME).collection(COLLECTION_NAME) - time.sleep(2) # Give the collection time to be ready for queries - - # Ensure primary index exists - try: - cluster.query(f"CREATE PRIMARY INDEX IF NOT EXISTS ON `{CB_BUCKET_NAME}`.`{SCOPE_NAME}`.`{COLLECTION_NAME}`").execute() - logging.info("Primary index present or created successfully.") - except Exception as e: - logging.error(f"Error creating primary index: {str(e)}") - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{CB_BUCKET_NAME}`.`{SCOPE_NAME}`.`{COLLECTION_NAME}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.") - -except Exception as e: - logging.error(f"Error setting up collection: {str(e)}") - raise -``` - - 2025-05-08 13:34:19,133 - INFO - Bucket 'vector-search-testing' exists. - 2025-05-08 13:34:21,149 - INFO - Collection 'bedrock' already exists. Skipping creation. - 2025-05-08 13:34:24,304 - INFO - Primary index present or created successfully. - 2025-05-08 13:34:24,529 - INFO - All documents cleared from the collection. 
- - -## Configure Couchbase Search Index - -This section focuses on setting up the Couchbase Search Index, which is essential for enabling vector search capabilities. -* The code will load an index definition from a local JSON file named `aws_index.json`. -* **Important Note:** The provided `aws_index.json` file has hardcoded references for the bucket, scope, and collection names. If you have used different names for your bucket, scope, or collection than the defaults specified in this notebook or your `.env` file, you **must** modify the `aws_index.json` file to reflect your custom names before running the next cell. - - -```python -# Set up search indexes -try: - # Construct path relative to the script file - # In a notebook, __file__ is not defined, so use os.getcwd() instead - script_dir = os.getcwd() - index_file_path = os.path.join(script_dir, 'aws_index.json') - # Load index definition from file - with open(index_file_path, 'r') as file: - index_definition = json.load(file) - logging.info(f"Loaded index definition from aws_index.json") -except Exception as e: - logging.error(f"Error loading index definition: {str(e)}") - raise - -try: - scope_index_manager = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME).search_indexes() - - # Check if index already exists - existing_indexes = scope_index_manager.get_all_indexes() - index_name = index_definition["name"] - - if index_name in [index.name for index in existing_indexes]: - logging.info(f"Index '{index_name}' found") - else: - logging.info(f"Creating new index '{index_name}'...") - - # Create SearchIndex object from JSON definition - search_index = SearchIndex.from_json(index_definition) - - # Upsert the index (create if not exists, update if exists) - scope_index_manager.upsert_index(search_index) - logging.info(f"Index '{index_name}' successfully created/updated.") - -except QueryIndexAlreadyExistsException: - logging.info(f"Index '{index_name}' already exists. 
Skipping creation/update.") -except ServiceUnavailableException: - logging.error("Search service is not available. Please ensure the Search service is enabled in your Couchbase cluster.") -except InternalServerFailureException as e: - logging.error(f"Internal server error: {str(e)}") - raise -``` - - 2025-05-08 13:34:24,537 - INFO - Loaded index definition from aws_index.json - 2025-05-08 13:34:25,659 - INFO - Index 'vector_search_bedrock' found - 2025-05-08 13:34:26,348 - INFO - Index 'vector_search_bedrock' already exists. Skipping creation/update. - - - -```python -# Initialize Bedrock runtime client for embeddings -embeddings = BedrockEmbeddings( - client=bedrock_runtime, - model_id="amazon.titan-embed-text-v2:0" -) -logging.info("Successfully created Bedrock embeddings client") - -# Initialize vector store -vector_store = CouchbaseSearchVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding=embeddings, - index_name=INDEX_NAME -) -logging.info("Successfully created vector store") -``` - - 2025-05-08 13:34:26,353 - INFO - Successfully created Bedrock embeddings client - 2025-05-08 13:34:29,660 - INFO - Successfully created vector store - - -# Load Documents into Vector Store - -Let's load the documents from the documents.json file and add them to our vector store: ->Note: `documents.json` contains the documents that we want to load into our vector store. 
As an example, we have added a few documents to the file from [https://cline.bot/](https://cline.bot/) - - -```python -# Load documents from JSON file -try: - # In a notebook, __file__ is not defined, so use os.getcwd() instead - script_dir = os.getcwd() - documents_file_path = os.path.join(script_dir, 'documents.json') - with open(documents_file_path, 'r') as f: - data = json.load(f) - documents = data.get('documents', []) - logging.info(f"Loaded {len(documents)} documents from documents.json") -except Exception as e: - logging.error(f"Error loading documents: {str(e)}") - raise - -# Add documents to vector store -logging.info(f"Adding {len(documents)} documents to vector store...") -for i, doc in enumerate(documents, 1): - text = doc.get('text', '') - metadata = doc.get('metadata', {}) - - # Ensure metadata is a dictionary before adding - if isinstance(metadata, str): - try: - metadata = json.loads(metadata) - except json.JSONDecodeError: - logging.warning(f"Warning: Could not parse metadata for document {i}. Using empty metadata.") - metadata = {} - elif not isinstance(metadata, dict): - logging.warning(f"Warning: Metadata for document {i} is not a dict or valid JSON string. Using empty metadata.") - metadata = {} - - doc_id = vector_store.add_texts([text], [metadata])[0] - logging.info(f"Added document {i}/{len(documents)} with ID: {doc_id}") - - # Add small delay between requests - time.sleep(1) - -logging.info(f"\nProcessing complete: {len(documents)}/{len(documents)} documents added successfully") -``` - - 2025-05-08 13:34:29,670 - INFO - Loaded 7 documents from documents.json - 2025-05-08 13:34:29,670 - INFO - Adding 7 documents to vector store... 
- 2025-05-08 13:34:31,637 - INFO - Added document 1/7 with ID: 884e8caae84545aa9e4735538b38f373 - 2025-05-08 13:34:33,211 - INFO - Added document 2/7 with ID: 61b9d4c9c5ee42a8a51e44ef0b55942a - 2025-05-08 13:34:34,784 - INFO - Added document 3/7 with ID: c7cb7541a9004ead83b9b393bc44a9b5 - 2025-05-08 13:34:36,886 - INFO - Added document 4/7 with ID: c8b07eae2e3a42c1a8114397bc8bfa67 - 2025-05-08 13:34:38,534 - INFO - Added document 5/7 with ID: a4356e0801564ad1b2f3ccdf05284375 - 2025-05-08 13:34:40,129 - INFO - Added document 6/7 with ID: 647d0fddba8f4bb38fd66d291a669bb2 - 2025-05-08 13:34:42,140 - INFO - Added document 7/7 with ID: 3b57038a7a234992927756cb3307738f - 2025-05-08 13:34:43,142 - INFO - - Processing complete: 7/7 documents added successfully - - -## Custom Control Approach Implementation - -Now let's implement the Custom Control approach for Bedrock agents. In this approach, the agent returns control to the application for function execution. - - -```python -# Function to wait for agent status -def wait_for_agent_status(bedrock_agent_client, agent_id, target_statuses=['Available', 'PREPARED', 'NOT_PREPARED'], max_attempts=30, delay=2): - """Wait for agent to reach any of the target statuses""" - for attempt in range(max_attempts): - try: - response = bedrock_agent_client.get_agent(agentId=agent_id) - current_status = response['agent']['agentStatus'] - - if current_status in target_statuses: - logging.info(f"Agent {agent_id} reached status: {current_status}") - return current_status - elif current_status == 'FAILED': - logging.error(f"Agent {agent_id} failed") - return 'FAILED' - - logging.info(f"Agent status: {current_status}, waiting... 
(attempt {attempt + 1}/{max_attempts})") - time.sleep(delay) - - except Exception as e: - logging.error(f"Error checking agent status: {str(e)}") - time.sleep(delay) - - return current_status -``` - - -```python -# Function to create a Bedrock agent with Custom Control action groups -def create_agent(bedrock_agent_client, name, instructions, functions, model_id="amazon.nova-pro-v1:0", agent_role_arn=None): - """Create a Bedrock agent with Custom Control action groups""" - try: - # List existing agents - existing_agents = bedrock_agent_client.list_agents() - existing_agent = next( - (agent for agent in existing_agents['agentSummaries'] - if agent['agentName'] == name), - None - ) - - # Handle existing agent - if existing_agent: - agent_id = existing_agent['agentId'] - logging.info(f"Found existing agent '{name}' with ID: {agent_id}") - - # Check agent status - response = bedrock_agent_client.get_agent(agentId=agent_id) - status = response['agent']['agentStatus'] - - if status in ['NOT_PREPARED', 'FAILED']: - logging.info(f"Deleting agent '{name}' with status {status}") - bedrock_agent_client.delete_agent(agentId=agent_id) - time.sleep(10) # Wait after deletion - existing_agent = None - - # Create new agent if needed - if not existing_agent: - logging.info(f"Creating new agent '{name}'") - agent_params = { - "agentName": name, - "description": f"{name.title()} agent for document operations", - "instruction": instructions, - "idleSessionTTLInSeconds": 1800, - "foundationModel": model_id - } - - if agent_role_arn: - agent_params["agentResourceRoleArn"] = agent_role_arn - - agent = bedrock_agent_client.create_agent(**agent_params) - agent_id = agent['agent']['agentId'] - logging.info(f"Created new agent '{name}' with ID: {agent_id}") - else: - agent_id = existing_agent['agentId'] - - # Wait for initial creation if needed - status = wait_for_agent_status(bedrock_agent_client, agent_id, target_statuses=['NOT_PREPARED', 'PREPARED', 'Available']) - if status not in 
['NOT_PREPARED', 'PREPARED', 'Available']: - raise Exception(f"Agent failed to reach valid state: {status}") - - # Create action group if needed - try: - bedrock_agent_client.create_agent_action_group( - agentId=agent_id, - agentVersion="DRAFT", - actionGroupExecutor={"customControl": "RETURN_CONTROL"}, # This is the key for Custom Control - actionGroupName=f"{name}_actions", - functionSchema={"functions": functions}, - description=f"Action group for {name} operations" - ) - logging.info(f"Created action group for agent '{name}'") - time.sleep(5) - except bedrock_agent_client.exceptions.ConflictException: - logging.info(f"Action group already exists for agent '{name}'") - - # Prepare agent if needed - if status == 'NOT_PREPARED': - try: - logging.info(f"Starting preparation for agent '{name}'") - bedrock_agent_client.prepare_agent(agentId=agent_id) - status = wait_for_agent_status( - bedrock_agent_client, - agent_id, - target_statuses=['PREPARED', 'Available'] - ) - logging.info(f"Agent '{name}' preparation completed with status: {status}") - except Exception as e: - logging.error(f"Error during preparation: {str(e)}") - - # Handle alias creation/retrieval - try: - aliases = bedrock_agent_client.list_agent_aliases(agentId=agent_id) - alias = next((a for a in aliases['agentAliasSummaries'] if a['agentAliasName'] == 'v1'), None) - - if not alias: - logging.info(f"Creating new alias for agent '{name}'") - alias = bedrock_agent_client.create_agent_alias( - agentId=agent_id, - agentAliasName="v1" - ) - alias_id = alias['agentAlias']['agentAliasId'] - else: - alias_id = alias['agentAliasId'] - logging.info(f"Using existing alias for agent '{name}'") - - logging.info(f"Successfully configured agent '{name}' with ID: {agent_id} and alias: {alias_id}") - return agent_id, alias_id - - except Exception as e: - logging.error(f"Error managing alias: {str(e)}") - raise - - except Exception as e: - logging.error(f"Error creating/updating agent: {str(e)}") - raise -``` - - 
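The `wait_for_agent_status` helper above mixes the polling loop with the boto3 call, which makes it hard to test without an AWS account. The loop itself can be isolated by injecting the status lookup; in this sketch `get_status` stands in for a call like `bedrock_agent_client.get_agent(...)['agent']['agentStatus']` (a hypothetical seam, not part of the notebook):

```python
import time

def poll_until(get_status, target_statuses=("PREPARED", "Available", "NOT_PREPARED"),
               max_attempts=30, delay=2.0):
    """Poll get_status() until it returns a target status or 'FAILED',
    or until max_attempts is exhausted; returns the last observed status."""
    current = "UNKNOWN"
    for _ in range(max_attempts):
        current = get_status()
        if current in target_statuses or current == "FAILED":
            return current
        time.sleep(delay)
    return current

# Simulate an agent that becomes PREPARED on the third poll.
statuses = iter(["CREATING", "CREATING", "PREPARED"])
result = poll_until(lambda: next(statuses), delay=0)
print(result)  # PREPARED
```

Decoupling the loop this way also avoids the original helper's edge case where the status variable is never assigned if every `get_agent` call raises.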
-```python -# Function to invoke a Bedrock agent -def invoke_agent(bedrock_runtime_client, agent_id, alias_id, input_text, session_id=None, vector_store=None): - """Invoke a Bedrock agent""" - if session_id is None: - session_id = str(uuid.uuid4()) - - try: - logging.info(f"Invoking agent with input: {input_text}") - - response = bedrock_runtime_client.invoke_agent( - agentId=agent_id, - agentAliasId=alias_id, - sessionId=session_id, - inputText=input_text, - enableTrace=True - ) - - result = "" - - for event in response['completion']: - # Process text chunks - if 'chunk' in event: - chunk = event['chunk']['bytes'].decode('utf-8') - result += chunk - - # Handle custom control return - if 'returnControl' in event: - return_control = event['returnControl'] - invocation_inputs = return_control.get('invocationInputs', []) - - if invocation_inputs: - function_input = invocation_inputs[0].get('functionInvocationInput', {}) - action_group = function_input.get('actionGroup') - function_name = function_input.get('function') - parameters = function_input.get('parameters', []) - - # Convert parameters to a dictionary - param_dict = {} - for param in parameters: - param_dict[param.get('name')] = param.get('value') - - logging.info(f"Function call: {action_group}::{function_name}") - - # Handle search_documents function - if function_name == 'search_documents': - query = param_dict.get('query') - k = int(param_dict.get('k', 3)) - - logging.info(f"Searching for: {query}, k={k}") - - if vector_store: - # Perform the search - docs = vector_store.similarity_search(query, k=k) - - # Format results - search_results = [doc.page_content for doc in docs] - logging.info(f"Found {len(search_results)} results") - - # Format the response - result = f"Search results for '{query}':\n\n" - for i, content in enumerate(search_results): - result += f"Result {i+1}: {content}\n\n" - else: - logging.error("Vector store not available") - result = "Error: Vector store not available" - - # Handle 
format_content function - elif function_name == 'format_content': - content = param_dict.get('content') - style = param_dict.get('style', 'user-friendly') - - logging.info(f"Formatting content in {style} style") - - # Check if content is valid - if content and content != '?': - result = f"Formatted in {style} style: {content}" - else: - result = "No content provided to format." - else: - logging.error(f"Unknown function: {function_name}") - result = f"Error: Unknown function {function_name}" - - if not result.strip(): - logging.warning("Received empty response from agent") - - return result - - except Exception as e: - logging.error(f"Error invoking agent: {str(e)}") - raise RuntimeError(f"Failed to invoke agent: {str(e)}") -``` - -## Define Agent Instructions and Functions - -Now let's define the instructions and functions for our agents: - - -```python -# Researcher agent instructions and functions -researcher_instructions = """ -You are a Research Assistant that helps users find relevant information in documents. -Your capabilities include: -1. Searching through documents using semantic similarity -2. Providing relevant document excerpts -3. Answering questions based on document content -""" - -researcher_functions = [ - { - "name": "search_documents", - "description": "Search for relevant documents using semantic similarity", - "parameters": { - "query": { - "type": "string", - "description": "The search query", - "required": True - }, - "k": { - "type": "integer", - "description": "Number of results to return", - "required": False - } - }, - "requireConfirmation": "DISABLED" - } -] - -# Writer agent instructions and functions -writer_instructions = """ -You are a Content Writer Assistant that helps format and present research findings. -Your capabilities include: -1. Formatting research findings in a user-friendly way -2. Creating clear and engaging summaries -3. Organizing information logically -4. 
Highlighting key insights -""" - -writer_functions = [ - { - "name": "format_content", - "description": "Format and present research findings", - "parameters": { - "content": { - "type": "string", - "description": "The research findings to format", - "required": True - }, - "style": { - "type": "string", - "description": "The desired presentation style (e.g., summary, detailed, bullet points)", - "required": False - } - }, - "requireConfirmation": "DISABLED" - } -] -``` - -## Run Custom Control Approach - -Now let's run the Custom Control approach with our agents: - - -```python -# Get or Create IAM Role -agent_role_name = "BedrockExecutionRoleForAgents_CustomControl" -trust_policy = { - "Version": "2012-10-17", - "Statement": [ - { - "Effect": "Allow", - "Principal": { - "Service": "bedrock.amazonaws.com" - }, - "Action": "sts:AssumeRole" - } - ] -} -policy_arn_to_attach = "arn:aws:iam::aws:policy/AmazonBedrockFullAccess" - -try: - role_response = iam_client.get_role(RoleName=agent_role_name) - agent_role_arn = role_response['Role']['Arn'] - logging.info(f"Found existing IAM role '{agent_role_name}' with ARN: {agent_role_arn}") -except ClientError as e: - if e.response['Error']['Code'] == 'NoSuchEntity': - logging.info(f"IAM role '{agent_role_name}' not found. 
Creating...") - try: - role_response = iam_client.create_role( - RoleName=agent_role_name, - AssumeRolePolicyDocument=json.dumps(trust_policy), - Description="IAM role for Bedrock Agents execution" - ) - agent_role_arn = role_response['Role']['Arn'] - logging.info(f"Created IAM role '{agent_role_name}' with ARN: {agent_role_arn}") - # Wait a bit for the role to be fully available before attaching policy - time.sleep(10) - except ClientError as create_error: - logging.error(f"Error creating IAM role '{agent_role_name}': {create_error}") - agent_role_arn = None - else: - logging.error(f"Error getting IAM role '{agent_role_name}': {e}") - agent_role_arn = None - -# Attach the policy if not already attached -if agent_role_arn: - try: - attached_policies = iam_client.list_attached_role_policies(RoleName=agent_role_name) - if not any(p['PolicyArn'] == policy_arn_to_attach for p in attached_policies.get('AttachedPolicies', [])): - logging.info(f"Attaching policy '{policy_arn_to_attach}' to role '{agent_role_name}'...") - iam_client.attach_role_policy( - RoleName=agent_role_name, - PolicyArn=policy_arn_to_attach - ) - logging.info(f"Policy '{policy_arn_to_attach}' attached successfully.") - # Wait a bit for the policy attachment to propagate - time.sleep(5) - else: - logging.info(f"Policy '{policy_arn_to_attach}' already attached to role '{agent_role_name}'.") - except ClientError as attach_error: - logging.warning(f"Error attaching policy to role '{agent_role_name}': {attach_error}") -``` - - 2025-05-08 13:34:44,254 - INFO - Found existing IAM role 'BedrockExecutionRoleForAgents_CustomControl' with ARN: arn:aws:iam::598307997273:role/BedrockExecutionRoleForAgents_CustomControl - 2025-05-08 13:34:44,547 - INFO - Policy 'arn:aws:iam::aws:policy/AmazonBedrockFullAccess' already attached to role 'BedrockExecutionRoleForAgents_CustomControl'. 
- - - -```python -# Create researcher agent -researcher_id = None -researcher_alias = None - -if agent_role_arn: - try: - researcher_id, researcher_alias = create_agent( - bedrock_agent_client, - "researcher", - researcher_instructions, - researcher_functions, - agent_role_arn=agent_role_arn - ) - logging.info(f"Researcher agent created with ID: {researcher_id} and alias: {researcher_alias}") - except Exception as e: - logging.error(f"Failed to create researcher agent: {str(e)}") -else: - logging.error("No agent role ARN available for researcher agent creation") -``` - - 2025-05-08 13:34:45,303 - INFO - Found existing agent 'researcher' with ID: FF1OSFJIJF - 2025-05-08 13:34:46,399 - INFO - Agent FF1OSFJIJF reached status: PREPARED - 2025-05-08 13:34:46,712 - INFO - Action group already exists for agent 'researcher' - 2025-05-08 13:34:46,996 - INFO - Using existing alias for agent 'researcher' - 2025-05-08 13:34:46,997 - INFO - Successfully configured agent 'researcher' with ID: FF1OSFJIJF and alias: RQVFGLBCZP - 2025-05-08 13:34:46,997 - INFO - Researcher agent created with ID: FF1OSFJIJF and alias: RQVFGLBCZP - - - -```python -# Create writer agent -writer_id = None -writer_alias = None - -if agent_role_arn: - try: - writer_id, writer_alias = create_agent( - bedrock_agent_client, - "writer", - writer_instructions, - writer_functions, - agent_role_arn=agent_role_arn - ) - logging.info(f"Writer agent created with ID: {writer_id} and alias: {writer_alias}") - except Exception as e: - logging.error(f"Failed to create writer agent: {str(e)}") -else: - logging.error("No agent role ARN available for writer agent creation") - -if not any([researcher_id, writer_id]): - # Adjust error message based on whether role setup failed - if not agent_role_arn: - raise RuntimeError("Failed to create agents because IAM role setup failed.") - else: - raise RuntimeError("Failed to create any agents despite successful IAM role setup.") -``` - - 2025-05-08 13:34:47,279 - INFO - Found 
existing agent 'writer' with ID: JDA8S8SRS1 - 2025-05-08 13:34:48,178 - INFO - Agent JDA8S8SRS1 reached status: PREPARED - 2025-05-08 13:34:48,498 - INFO - Action group already exists for agent 'writer' - 2025-05-08 13:34:48,797 - INFO - Using existing alias for agent 'writer' - 2025-05-08 13:34:48,797 - INFO - Successfully configured agent 'writer' with ID: JDA8S8SRS1 and alias: 3SFKJGSGNQ - 2025-05-08 13:34:48,798 - INFO - Writer agent created with ID: JDA8S8SRS1 and alias: 3SFKJGSGNQ - - -## Test the Agents - -Let's test our agents by asking the researcher agent to search for information and the writer agent to format the results: - - -```python -# Test researcher agent -if researcher_id and researcher_alias: - researcher_response = invoke_agent( - bedrock_runtime_client, - researcher_id, - researcher_alias, - "What is unique about the Cline AI assistant? Use the search_documents function to find relevant information.", - vector_store=vector_store - ) - print("\nResearcher Response:\n", researcher_response) -else: - logging.error("Researcher agent not available for testing") -``` - - 2025-05-08 13:34:48,808 - INFO - Invoking agent with input: What is unique about the Cline AI assistant? Use the search_documents function to find relevant information. - 2025-05-08 13:34:51,478 - INFO - Function call: researcher_actions::search_documents - 2025-05-08 13:34:51,478 - INFO - Searching for: What is unique about the Cline AI assistant?, k=3 - 2025-05-08 13:34:52,791 - INFO - Found 3 results - - - - Researcher Response: - Search results for 'What is unique about the Cline AI assistant?': - - Result 1: The Cline AI assistant, developed by Saoud Rizwan, is a unique system that combines vector search capabilities with Amazon Bedrock agents. Unlike traditional chatbots, it uses a sophisticated multi-agent architecture where specialized agents handle different aspects of document processing and interaction. 
- - Result 2: One of Cline's key features is its ability to create MCP (Model Context Protocol) servers on the fly. This allows users to extend the system's capabilities by adding new tools and resources that connect to external APIs, all while maintaining a secure and non-interactive environment. - - Result 3: The browser automation capabilities in Cline are implemented through Puppeteer, allowing the system to interact with web interfaces in a controlled 900x600 pixel window. This enables testing of web applications, verification of changes, and even general web browsing tasks. - - - - - -```python -# Test writer agent -if writer_id and writer_alias and "researcher_response" in locals(): - writer_response = invoke_agent( - bedrock_runtime_client, - writer_id, - writer_alias, - f"Format this research finding using the format_content function: {researcher_response}", - vector_store=vector_store - ) - print("\nWriter Response:\n", writer_response) -else: - logging.error("Writer agent not available for testing or no researcher response to format") -``` - - 2025-05-08 13:34:52,798 - INFO - Invoking agent with input: Format this research finding using the format_content function: Search results for 'What is unique about the Cline AI assistant?': - - Result 1: The Cline AI assistant, developed by Saoud Rizwan, is a unique system that combines vector search capabilities with Amazon Bedrock agents. Unlike traditional chatbots, it uses a sophisticated multi-agent architecture where specialized agents handle different aspects of document processing and interaction. - - Result 2: One of Cline's key features is its ability to create MCP (Model Context Protocol) servers on the fly. This allows users to extend the system's capabilities by adding new tools and resources that connect to external APIs, all while maintaining a secure and non-interactive environment. 
- - Result 3: The browser automation capabilities in Cline are implemented through Puppeteer, allowing the system to interact with web interfaces in a controlled 900x600 pixel window. This enables testing of web applications, verification of changes, and even general web browsing tasks. - - - 2025-05-08 13:34:55,730 - INFO - Function call: writer_actions::format_content - 2025-05-08 13:34:55,730 - INFO - Formatting content in summary style - - - - Writer Response: - Formatted in summary style: The Cline AI assistant, developed by Saoud Rizwan, is a unique system that combines vector search capabilities with Amazon Bedrock agents. Unlike traditional chatbots, it uses a sophisticated multi-agent architecture where specialized agents handle different aspects of document processing and interaction. One of Cline's key features is its ability to create MCP (Model Context Protocol) servers on the fly. This allows users to extend the system's capabilities by adding new tools and resources that connect to external APIs, all while maintaining a secure and non-interactive environment. The browser automation capabilities in Cline are implemented through Puppeteer, allowing the system to interact with web interfaces in a controlled 900x600 pixel window. This enables testing of web applications, verification of changes, and even general web browsing tasks. - - -## Conclusion - -In this notebook, we've demonstrated the Custom Control approach for implementing AWS Bedrock agents with Couchbase Vector Search. This approach allows the agent to return control to the application for function execution, providing more flexibility and control over the agent's behavior. - -Key components of this implementation include: - -1. **Vector Store Setup**: We set up a Couchbase vector store to store and search documents using semantic similarity. -2. **Agent Creation**: We created two specialized agents - a researcher agent for searching documents and a writer agent for formatting results. -3. 
**Custom Control**: We implemented the Custom Control approach, where the agent returns control to the application for function execution. -4. **Function Handling**: We handled the agent's function calls in the application code, allowing for more control and flexibility. - -This approach is particularly useful when you need more control over the agent's behavior or when you want to integrate the agent with existing systems and data sources. diff --git a/tutorial/markdown/generated/vector-search-cookbook/awsbedrock-agents-lambda-approach-Bedrock_Agents_Lambda.md b/tutorial/markdown/generated/vector-search-cookbook/awsbedrock-agents-lambda-approach-Bedrock_Agents_Lambda.md deleted file mode 100644 index 731a620..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/awsbedrock-agents-lambda-approach-Bedrock_Agents_Lambda.md +++ /dev/null @@ -1,2165 +0,0 @@ ---- -# frontmatter -path: "/tutorial-aws-bedrock-agents-lambda" -title: Building Intelligent Agents with Amazon Bedrock (Lambda) -short_title: AWS Bedrock Agents Lambda Approach -description: - - Learn how to build intelligent agents using Amazon Bedrock Agents with AWS Lambda and Couchbase as the vector store. - - This tutorial demonstrates how to create specialized agents that can process documents and interact with external APIs using serverless Lambda functions. - - You'll understand how to implement secure multi-agent architectures using Amazon Bedrock's agent capabilities with a Lambda-based approach. 
-content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - Artificial Intelligence - - Amazon Bedrock -sdk_language: - - python -length: 90 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/awsbedrock-agents/lambda-approach/Bedrock_Agents_Lambda.ipynb) - -# AWS Bedrock Agents with Couchbase Vector Search - Lambda Approach - -This notebook demonstrates the Lambda approach for implementing AWS Bedrock agents with Couchbase Vector Search. In this approach, the agent invokes AWS Lambda functions to execute operations. - -We'll implement a multi-agent architecture with specialized agents for different tasks: -- **Researcher Agent**: Searches for relevant documents in the vector store -- **Writer Agent**: Formats and presents the research findings - -## Alternative Approaches - -This notebook demonstrates the Lambda Approach for AWS Bedrock Agents. For comparison, you might also want to check out the Custom Control Approach, which handles agent tools directly in your application code instead of using AWS Lambda functions. - -The Custom Control approach offers simpler setup and more direct control, but may not scale as well. You can find that implementation here: [Custom Control Approach Notebook](https://developer.couchbase.com/tutorial-aws-bedrock-agents-custom-control) - -Note: If the link above doesn't work in your Jupyter environment, you can navigate to the file manually in the `awsbedrock-agents/custom-control-approach/` directory. - -## Overview - -The Lambda approach for AWS Bedrock Agents delegates the execution of an agent's defined functions (tools) to backend AWS Lambda functions. When the agent decides to use a tool, Bedrock directly invokes the corresponding Lambda function that you've specified in the agent's action group configuration. 
This Lambda function receives the parameters from the agent, executes the necessary logic (e.g., querying a Couchbase vector store, calling other APIs, performing computations), and then returns the result to the Bedrock Agent. The agent can then use this result to continue its reasoning process or formulate a final response to the user. This architecture promotes a clean separation of concerns, allows tool logic to be developed and scaled independently, and leverages the serverless capabilities of AWS Lambda. - -## Key Steps & Concepts - -1. **Define Agent Instructions & Tool Schema:** - * **Instructions:** Craft a clear prompt that tells the agent its purpose, capabilities, and how it should behave (e.g., "You are a research assistant that uses the SearchAndFormat tool..."). - * **Function Schema:** Define the structure of the tool(s) the agent can use. In this notebook, we define a single tool (e.g., `searchAndFormatDocuments`) that the agent will call. This schema specifies the function name, description, and its input parameters (e.g., `query`, `k`, `style`). This schema acts as the contract between the agent and the Lambda function. - -2. **Implement Lambda Handler Function:** - * Create an AWS Lambda function (e.g., `bedrock_agent_search_and_format.py`) that contains the actual Python code to execute the tool's logic. - * **Event Handling:** The Lambda handler receives an event payload from Bedrock. This payload includes details like the API path (which corresponds to the function name in the schema), HTTP method, and the parameters supplied by the agent. - * **Business Logic:** Inside the Lambda, parse the incoming event, extract parameters, and perform the required actions. For this notebook, this involves: - * Connecting to Couchbase. - * Initializing the Bedrock Embeddings client. - * Performing a vector similarity search using the provided query and `k` value. 
 - * Optionally, formatting the search results based on the `style` parameter (though in this specific example, the formatting is largely illustrative and the LLM does the heavy lifting of presentation). - * **Response Structure:** The Lambda must return a JSON response in a specific format that Bedrock expects. This response typically includes the `actionGroup`, `apiPath`, `httpMethod`, `httpStatusCode`, and a `responseBody` containing the result of the tool execution (e.g., the search results as a string). - * **Deployment:** Package the Lambda function with its dependencies (e.g., `requirements.txt`) into a .zip file. This notebook includes helper functions to automate packaging (using a `Makefile`) and deployment, including uploading to S3 if the package is large. The Lambda also needs an IAM role with permissions to run, write logs, and interact with Bedrock and any other required AWS services. - * **Environment Variables:** The Lambda function is configured with environment variables (e.g., Couchbase connection details, Bedrock model IDs) to allow it to connect to necessary services without hardcoding credentials. These are set during the Lambda creation/update process in the notebook. - -3. **Create Agent in AWS Bedrock:** - * Use the `bedrock_agent_client.create_agent` SDK call. Provide the agent name, the ARN of the IAM role it will assume, the foundation model ID (e.g., Claude Sonnet), and the instructions defined in step 1. - -4. **Create Agent Action Group (Linking to Lambda):** - * Use `bedrock_agent_client.create_agent_action_group`. - * **`actionGroupExecutor`:** This is the crucial part for the Lambda approach. Set it to `{'lambda': 'arn:aws:lambda:<region>:<account-id>:function:<function-name>'}`, using your Lambda function's ARN. This tells Bedrock to invoke your specific Lambda function when this action group is triggered. - * **`functionSchema`:** Provide the function schema defined in step 1. This allows the agent to understand how to call the Lambda function (i.e., what parameters to send). 
- * Give the action group a name (e.g., `SearchAndFormatActionGroup`). - -5. **Prepare Agent:** - * Call `bedrock_agent_client.prepare_agent` with the `agentId`. This makes the DRAFT version of the agent (with its newly configured action group) ready for use. The notebook includes a custom waiter to poll until the agent status is `PREPARED`. - -6. **Create or Update Agent Alias:** - * An alias (e.g., `prod`) is used to invoke a specific version of the agent. The notebook checks if an alias exists and creates one if not, pointing to the latest prepared (DRAFT) version. Use `bedrock_agent_client.create_agent_alias` or `update_agent_alias`. - -7. **Invoke Agent:** - * Use `bedrock_agent_runtime_client.invoke_agent`, providing the `agentId`, `agentAliasId`, a unique `sessionId`, and the user's `inputText` (prompt). - * Bedrock takes over: when the agent decides to use the tool from the action group, Bedrock transparently calls the configured Lambda function with the necessary parameters. - * The Lambda executes, returns its result to the agent, and the agent uses this result to generate its final response. - * Your application code simply waits for and processes the final streaming response from the `invoke_agent` call. Unlike the Custom Control approach, there's no `returnControl` event for the application to handle for tool execution; Bedrock manages the Lambda invocation directly. - -## Pros - -* **Decoupling & Modularity:** Tool execution logic is encapsulated within Lambda functions, separate from the main application code. This allows for independent development, deployment, and scaling of tools. -* **Scalability & Serverless:** Leverages the inherent scalability, concurrency, and pay-per-use benefits of AWS Lambda for tool execution. -* **Managed Execution Environment:** AWS manages the underlying infrastructure, runtime environment, and invocation mechanism for the Lambda functions. 
-* **Simpler Application-Level Code:** The application that invokes the Bedrock agent doesn't need to implement the tool's logic itself or handle `returnControl` events for function execution, as Bedrock orchestrates the call to Lambda directly. - -## Cons - -* **Deployment & Configuration Overhead:** Requires setting up, packaging, configuring dependencies, and deploying separate Lambda functions. IAM roles and permissions for Lambdas also need careful management. -* **State Management:** If tools need to share state or complex context with the Lambda function, this must be explicitly passed, often via environment variables or by including necessary lookup logic within the Lambda itself. -* **Cold Starts:** AWS Lambda cold starts can introduce latency the first time a function is invoked after a period of inactivity, potentially affecting agent response time. -* **Debugging Complexity:** Troubleshooting can be more involved as it spans across the Bedrock Agent service, the Lambda service, and potentially other services the Lambda interacts with (like Couchbase). Centralized logging (e.g., CloudWatch Logs for Lambda) is essential. -* **Cost:** Incurs costs associated with Lambda invocations, execution duration, and any resources used by the Lambda (e.g., data transfer, provisioned concurrency if used). - -## 1. Imports - -This section imports all necessary Python libraries. These include: -- Standard libraries: `json` for data handling, `logging` for progress and error messages, `os` for interacting with the operating system (e.g., file paths), `subprocess` for running external commands (like `make` for Lambda packaging), `time` for delays, `traceback` for detailed error reporting, `uuid` for generating unique identifiers, and `shutil` for file operations. -- `boto3` and `botocore`: The AWS SDK for Python, used to interact with AWS services like Bedrock, IAM, Lambda, and S3. 
Specific configurations (`Config`) and waiters are also imported for robust client interactions. -- `couchbase`: The official Couchbase SDK for Python, used for connecting to and interacting with the Couchbase cluster, including managing buckets, collections, and search indexes. Specific exception classes are imported for error handling. -- `dotenv`: For loading environment variables from a `.env` file, which helps manage configuration settings like API keys and connection strings securely. -- `langchain_aws` and `langchain_couchbase`: Libraries from the LangChain ecosystem. `BedrockEmbeddings` is used to generate text embeddings via Amazon Bedrock, and `CouchbaseSearchVectorStore` provides an interface for using Couchbase as a vector store in LangChain applications. - - -```python -import json -import logging -import os -import shutil -import subprocess -import time -import traceback -import uuid -from datetime import timedelta - -import boto3 -from botocore.config import Config -from botocore.exceptions import ClientError -from botocore.waiter import WaiterModel, create_waiter_with_client -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.exceptions import (BucketNotFoundException, CouchbaseException, - QueryIndexAlreadyExistsException) -from couchbase.management.buckets import BucketSettings, BucketType -from couchbase.management.collections import CollectionSpec -from couchbase.management.search import SearchIndex -from couchbase.options import ClusterOptions -from dotenv import load_dotenv -from langchain_aws import BedrockEmbeddings -from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore -``` - -## 2. Configuration - -This section handles the initial setup of essential configurations for the notebook: -- **Logging:** Configures the `logging` module to output messages with a specific format (timestamp, level, message), which helps in tracking the script's execution and diagnosing issues. 
-- **Environment Variables:** Attempts to load environment variables from a `.env` file located either in the current directory or the parent directory. This is a common practice to keep sensitive information like credentials and hostnames out of the codebase. If the `.env` file is not found, the script will rely on variables already set in the execution environment. -- **Couchbase Settings:** Defines variables for connecting to Couchbase, including the host, username, password, and the names for the bucket, scope, collection, and search index that will be used for this experiment. Default values are provided if specific environment variables are not set. -- **AWS Settings:** Defines variables for AWS configuration, such as the region, access key ID, secret access key, and AWS account ID. These are crucial for `boto3` to interact with AWS services. -- **Bedrock Model IDs:** Specifies the model identifiers for the Amazon Bedrock text embedding model (e.g., `amazon.titan-embed-text-v2:0`) and the foundation model to be used by the agent (e.g., `anthropic.claude-3-sonnet-20240229-v1:0`). -- **File Paths:** Sets up variables for various file paths used throughout the notebook, such as the directory for schemas, the path to the Couchbase search index JSON definition, and the path to the JSON file containing documents to be loaded into the vector store. Using `os.getcwd()` makes these paths relative to the notebook's current working directory. - - -```python -# Setup logging -logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s') -logger = logging.getLogger(__name__) - -# Load environment variables from project root .env -# In a notebook environment, '__file__' is not defined. Use a relative path or absolute path directly. 
-# Assuming the notebook is run from the 'lambda-experiments' directory -dotenv_path = os.path.join(os.getcwd(), '.env') # Or specify the full path if needed -logger.info(f"Attempting to load .env file from: {dotenv_path}") -if os.path.exists(dotenv_path): - load_dotenv(dotenv_path=dotenv_path) - logger.info(".env file loaded successfully.") -else: - # Try loading from parent directory if not found in current - parent_dotenv_path = os.path.join(os.path.dirname(os.getcwd()), '.env') - if os.path.exists(parent_dotenv_path): - load_dotenv(dotenv_path=parent_dotenv_path) - logger.info(f".env file loaded successfully from parent directory: {parent_dotenv_path}") - else: - logger.warning(f".env file not found at {dotenv_path} or {parent_dotenv_path}. Relying on environment variables.") - - -# Couchbase Configuration -CB_HOST = os.getenv("CB_HOST", "couchbase://localhost") -CB_USERNAME = os.getenv("CB_USERNAME", "Administrator") -CB_PASSWORD = os.getenv("CB_PASSWORD", "password") -# Using a new bucket/scope/collection for experiments to avoid conflicts -CB_BUCKET_NAME = os.getenv("CB_BUCKET_NAME", "vector-search-exp") -SCOPE_NAME = os.getenv("SCOPE_NAME", "bedrock_exp") -COLLECTION_NAME = os.getenv("COLLECTION_NAME", "docs_exp") -INDEX_NAME = os.getenv("INDEX_NAME", "vector_search_bedrock_exp") - -# AWS Configuration -AWS_REGION = os.getenv("AWS_REGION", "us-east-1") -AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID") -AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY") -AWS_ACCOUNT_ID = os.getenv("AWS_ACCOUNT_ID") - -# Bedrock Model IDs -EMBEDDING_MODEL_ID = "amazon.titan-embed-text-v2:0" -AGENT_MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0" # Using Sonnet for the agent - -# Paths (relative to the notebook's execution directory) -SCRIPT_DIR = os.getcwd() # Use current working directory for notebook context -SCHEMAS_DIR = os.path.join(SCRIPT_DIR, 'schemas') # New Schemas Dir -SEARCH_FORMAT_SCHEMA_PATH = os.path.join(SCHEMAS_DIR, 
'search_and_format_schema.json') # Added -INDEX_JSON_PATH = os.path.join(SCRIPT_DIR, 'aws_index.json') # Keep -DOCS_JSON_PATH = os.path.join(SCRIPT_DIR, 'documents.json') # Changed to load from script's directory -``` - - 2025-06-09 13:39:41,393 - INFO - Attempting to load .env file from: /Users/kaustavghosh/Desktop/vector-search-cookbook/awsbedrock-agents/lambda-approach/.env - 2025-06-09 13:39:41,395 - INFO - .env file loaded successfully. - - -## 3. Helper Functions - -This section defines a comprehensive suite of helper functions to modularize the various operations required throughout the notebook. These functions encapsulate specific tasks, making the main execution flow cleaner and easier to understand. The categories of helper functions include: - -* **Environment and Client Initialization:** Checking for necessary environment variables and setting up AWS SDK (`boto3`) clients for services like IAM, Lambda, Bedrock, and S3. -* **Couchbase Interaction:** Connecting to the Couchbase cluster, and robustly setting up buckets, scopes, collections, and search indexes. Includes functions to clear data from collections for clean experimental runs. -* **IAM Role Management:** Creating or retrieving the necessary IAM roles with appropriate trust policies and permissions that allow Bedrock Agents and Lambda functions to operate and interact with other AWS services securely. -* **Lambda Function Deployment:** A set of functions to manage the lifecycle of the Lambda function that the agent will invoke. This includes packaging the Lambda code and its dependencies (using a `Makefile`), uploading the deployment package (to S3 if it's large), creating or updating the Lambda function in AWS, and deleting it for cleanup. 
-* **Bedrock Agent Resource Management:** Functions for creating the Bedrock Agent itself, defining its action groups (which link the agent to the Lambda function via its ARN and define the tool schema), preparing the agent to make it invocable, and managing agent aliases. Also includes functions to delete these agent resources for cleanup. -* **Agent Invocation:** A function to test the fully configured agent by sending it a prompt and processing its streamed response, including any trace information for debugging. - -### 3.1 check_environment_variables - -This function verifies that all critical environment variables required for the script to run (e.g., AWS credentials, Couchbase password, AWS Account ID) are set. It logs an error and returns `False` if any are missing, otherwise logs success and returns `True`. - - -```python -def check_environment_variables(): - """Check if required environment variables are set.""" - required_vars = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_ACCOUNT_ID", "CB_PASSWORD"] - missing_vars = [var for var in required_vars if not os.getenv(var)] - if missing_vars: - logger.error(f"Missing required environment variables: {', '.join(missing_vars)}") - logger.error("Please set these variables in your environment or .env file") - return False - logger.info("All required environment variables are set.") - return True -``` - -### 3.2 initialize_aws_clients - -This function sets up and returns the necessary AWS SDK (`boto3`) clients for interacting with various AWS services. It initializes clients for Bedrock Runtime (for embeddings and agent invocation), IAM (for managing roles and policies), Lambda (for deploying and managing Lambda functions), Bedrock Agent (for creating and managing agents), and Bedrock Agent Runtime (for invoking agents). 
It uses credentials and region from the environment configuration and includes a custom configuration (`agent_config`) with longer timeouts and retries, which is particularly important for Bedrock Agent operations that can take more time, like agent preparation. - - -```python -def initialize_aws_clients(): - """Initialize required AWS clients.""" - try: - logger.info(f"Initializing AWS clients in region: {AWS_REGION}") - session = boto3.Session( - aws_access_key_id=AWS_ACCESS_KEY_ID, - aws_secret_access_key=AWS_SECRET_ACCESS_KEY, - region_name=AWS_REGION - ) - # Use a config with longer timeouts for agent operations - agent_config = Config( - connect_timeout=120, - read_timeout=600, # Agent preparation can take time - retries={'max_attempts': 5, 'mode': 'adaptive'} - ) - bedrock_runtime = session.client('bedrock-runtime', region_name=AWS_REGION) - iam_client = session.client('iam', region_name=AWS_REGION) - lambda_client = session.client('lambda', region_name=AWS_REGION) - bedrock_agent_client = session.client('bedrock-agent', region_name=AWS_REGION, config=agent_config) # Add agent client - bedrock_agent_runtime_client = session.client('bedrock-agent-runtime', region_name=AWS_REGION, config=agent_config) # Add agent runtime client - logger.info("AWS clients initialized successfully.") - return bedrock_runtime, iam_client, lambda_client, bedrock_agent_client, bedrock_agent_runtime_client # Return agent runtime client - except Exception as e: - logger.error(f"Error initializing AWS clients: {e}") - raise -``` - -### 3.3 connect_couchbase - -This function establishes a connection to the Couchbase cluster using the connection string (`CB_HOST`), username, and password from the environment configuration. It uses `PasswordAuthenticator` for authentication and `ClusterOptions` for potentially customizing connection parameters (though commented out in the example, it shows where timeouts could be set). 
It waits for the cluster to be ready before returning the `Cluster` object, ensuring that subsequent operations can be performed reliably. - - -```python -def connect_couchbase(): - """Connect to Couchbase cluster.""" - try: - logger.info(f"Connecting to Couchbase cluster at {CB_HOST}...") - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - # Use robust options - options = ClusterOptions( - auth, - ) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=10)) # Wait longer if needed - logger.info("Successfully connected to Couchbase.") - return cluster - except CouchbaseException as e: - logger.error(f"Couchbase connection error: {e}") - raise - except Exception as e: - logger.error(f"Unexpected error connecting to Couchbase: {e}") - raise -``` - -### 3.4 setup_collection - -This comprehensive function is responsible for ensuring that the required Couchbase bucket, scope, and collection are available for the agent's vector store. It performs the following steps idempotently: -- Checks if the specified bucket (`bucket_name`) exists. If not, it creates the bucket with defined settings (e.g., RAM quota, flush enabled). It includes a pause to allow the bucket to become ready. -- Checks if the specified scope (`scope_name`) exists within the bucket. If not, it creates the scope and includes a brief pause. -- Checks if the specified collection (`collection_name`) exists within the scope. If not, it creates the collection using a `CollectionSpec` and pauses. -- Ensures that a primary N1QL index exists on the collection, creating it if it's missing. This is often useful for administrative queries or simpler lookups, though not strictly for vector search itself. -Finally, it returns a `Collection` object representing the target collection for further operations. - -> Note: Bucket Creation will not work on Capella. 
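Since SDK-side bucket creation is unavailable on Capella, one option is a fail-fast guard that verifies the bucket exists instead of attempting to create it. The sketch below is illustrative (the helper name `require_existing_bucket` is not part of the notebook); it mirrors the notebook's own pattern of catching the exception raised by `cluster.bucket(...)`:

```python
def require_existing_bucket(cluster, bucket_name):
    """Return the bucket, failing fast if it is missing.

    On Capella, create the bucket in the web console first; this guard
    only checks that it exists rather than trying to create it.
    """
    try:
        # The Couchbase SDK raises BucketNotFoundException here if absent
        return cluster.bucket(bucket_name)
    except Exception as exc:
        raise RuntimeError(
            f"Bucket '{bucket_name}' not found. On Capella, create it via the "
            "web console before running this notebook."
        ) from exc
```

On a self-managed cluster the full `setup_collection` function below still applies; this guard is only a Capella-friendly substitute for its bucket-creation branch.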
- - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - """Set up Couchbase collection (Original Logic from lamda-approach)""" - logger.info(f"Setting up collection: {bucket_name}/{scope_name}/{collection_name}") - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - logger.info(f"Bucket '{bucket_name}' exists.") - except BucketNotFoundException: - logger.info(f"Bucket '{bucket_name}' does not exist. Creating it...") - # Use BucketSettings with potentially lower RAM for experiment - bucket_settings = BucketSettings( - name=bucket_name, - bucket_type=BucketType.COUCHBASE, - ram_quota_mb=256, # Adjusted from 1024 - flush_enabled=True, - num_replicas=0 - ) - try: - cluster.buckets().create_bucket(bucket_settings) - # Wait longer after bucket creation - logger.info(f"Bucket '{bucket_name}' created. Waiting for ready state (10s)...") - time.sleep(10) - bucket = cluster.bucket(bucket_name) # Re-assign bucket object - except Exception as create_e: - logger.error(f"Failed to create bucket '{bucket_name}': {create_e}") - raise - except Exception as e: - logger.error(f"Error getting bucket '{bucket_name}': {e}") - raise - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(s.name == scope_name for s in scopes) - - if not scope_exists: - logger.info(f"Scope '{scope_name}' does not exist. Creating it...") - try: - bucket_manager.create_scope(scope_name) - logger.info(f"Scope '{scope_name}' created. 
Waiting (2s)...") - time.sleep(2) - except CouchbaseException as e: - # Handle potential race condition or already exists error more robustly - if "already exists" in str(e).lower() or "scope_exists" in str(e).lower(): - logger.info(f"Scope '{scope_name}' likely already exists (caught during creation attempt).") - else: - logger.error(f"Failed to create scope '{scope_name}': {e}") - raise - else: - logger.info(f"Scope '{scope_name}' already exists.") - - # Check if collection exists, create if it doesn't - # Re-fetch scopes in case it was just created - scopes = bucket_manager.get_all_scopes() - collection_exists = False - for s in scopes: - if s.name == scope_name: - if any(c.name == collection_name for c in s.collections): - collection_exists = True - break - - if not collection_exists: - logger.info(f"Collection '{collection_name}' does not exist in scope '{scope_name}'. Creating it...") - try: - # Use CollectionSpec - collection_spec = CollectionSpec(collection_name, scope_name) - bucket_manager.create_collection(collection_spec) - logger.info(f"Collection '{collection_name}' created. 
Waiting (2s)...") - time.sleep(2) - except CouchbaseException as e: - if "already exists" in str(e).lower() or "collection_exists" in str(e).lower(): - logger.info(f"Collection '{collection_name}' likely already exists (caught during creation attempt).") - else: - logger.error(f"Failed to create collection '{collection_name}': {e}") - raise - else: - logger.info(f"Collection '{collection_name}' already exists.") - - # Ensure primary index exists - try: - logger.info(f"Ensuring primary index exists on `{bucket_name}`.`{scope_name}`.`{collection_name}`...") - cluster.query(f"CREATE PRIMARY INDEX IF NOT EXISTS ON `{bucket_name}`.`{scope_name}`.`{collection_name}`").execute() - logger.info("Primary index present or created successfully.") - except Exception as e: - logger.error(f"Error creating primary index: {str(e)}") - # Decide if this is fatal - - logger.info("Collection setup complete.") - # Return the collection object for use - return cluster.bucket(bucket_name).scope(scope_name).collection(collection_name) - - except Exception as e: - logger.error(f"Error setting up collection: {str(e)}") - logger.error(traceback.format_exc()) - raise -``` - -### 3.5 setup_search_index - -This function is responsible for creating or updating the Couchbase Search (FTS) index required for vector similarity search. Key operations include: -- Loading the index definition from a specified JSON file (`index_definition_path`). -- Dynamically updating the loaded index definition to use the correct `index_name` and `sourceName` (bucket name) provided as arguments. This allows for a template index definition file to be reused. -- Using the `SearchIndexManager` (obtained from the cluster object) to `upsert_index`. Upserting means the index will be created if it doesn't exist, or updated if an index with the same name already exists. This makes the operation idempotent. 
-- After submitting the upsert operation, it includes a pause (`time.sleep`) to allow Couchbase some time to start the indexing process in the background. - -> **Important Note:** The provided `aws_index.json` file has hardcoded references for the bucket, scope, and collection names. If you have used different names for your bucket, scope, or collection than the defaults specified in this notebook or your `.env` file, you **must** modify the `aws_index.json` file to reflect your custom names before running the next cell. - - -```python -def setup_search_index(cluster, index_name, bucket_name, scope_name, collection_name, index_definition_path): - """Set up search indexes (Original Logic, adapted) """ - try: - logger.info(f"Looking for index definition at: {index_definition_path}") - if not os.path.exists(index_definition_path): - logger.error(f"Index definition file not found: {index_definition_path}") - raise FileNotFoundError(f"Index definition file not found: {index_definition_path}") - - with open(index_definition_path, 'r') as file: - index_definition = json.load(file) - index_definition['name'] = index_name - index_definition['sourceName'] = bucket_name - logger.info(f"Loaded index definition from {index_definition_path}, ensuring name is '{index_name}' and source is '{bucket_name}'.") - - except Exception as e: - logger.error(f"Error loading index definition: {str(e)}") - raise - - try: - # Use the SearchIndexManager from the Cluster object for cluster-level indexes - # Or use scope-level if the index JSON is structured for that - # Assuming cluster level based on original script structure for upsert - search_index_manager = cluster.search_indexes() - - # Create SearchIndex object from potentially modified JSON definition - search_index = SearchIndex.from_json(index_definition) - - # Upsert the index (create if not exists, update if exists) - logger.info(f"Upserting search index '{index_name}'...") - search_index_manager.upsert_index(search_index) - - # Wait 
for indexing - logger.info(f"Index '{index_name}' upsert operation submitted. Waiting for indexing (10s)...") - time.sleep(10) - - logger.info(f"Search index '{index_name}' setup complete.") - - except QueryIndexAlreadyExistsException: - # This exception might not be correct for SearchIndexManager - # Upsert should handle exists cases, but log potential specific errors - logger.warning(f"Search index '{index_name}' likely already existed (caught QueryIndexAlreadyExistsException, check if applicable). Upsert attempted.") - except CouchbaseException as e: - logger.error(f"Couchbase error during search index setup for '{index_name}': {e}") - raise - except Exception as e: - logger.error(f"Unexpected error during search index setup for '{index_name}': {e}") - raise -``` - -### 3.6 clear_collection - -This utility function is used to delete all documents from a specified Couchbase collection. It constructs and executes a N1QL `DELETE` query targeting the given bucket, scope, and collection. This is useful for ensuring a clean state before loading new data for an experiment, preventing interference from previous runs. It also attempts to log the number of mutations (deleted documents) if the query metrics are available. 
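When the bucket was created with `flush_enabled=True` (as in `setup_collection`), flushing the bucket is an alternative way to reset state; note that a flush wipes every scope and collection in the bucket, not just one collection. A hedged sketch (the wrapper name is illustrative; `flush_bucket` is the Couchbase Python SDK's bucket-manager call):

```python
def flush_bucket_if_enabled(cluster, bucket_name):
    """Remove all documents in the bucket via flush; return True on success.

    Flushing requires flush_enabled=True on the bucket and drops documents
    from *every* scope and collection in it, unlike the targeted N1QL DELETE.
    """
    try:
        cluster.buckets().flush_bucket(bucket_name)
        return True
    except Exception as exc:  # e.g. flush disabled, or insufficient permissions
        print(f"Flush failed for '{bucket_name}': {exc}")
        return False
```

The N1QL `DELETE` used below remains the safer choice when other collections in the bucket hold data you want to keep.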
- - -```python -def clear_collection(cluster, bucket_name, scope_name, collection_name): - """Delete all documents from the specified collection (Original Logic).""" - try: - logger.warning(f"Attempting to clear all documents from `{bucket_name}`.`{scope_name}`.`{collection_name}`...") - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - result = cluster.query(query).execute() - # Try to get mutation count, handle if not available - mutation_count = 0 - try: - metrics_data = result.meta_data().metrics() - if metrics_data: - mutation_count = metrics_data.mutation_count() - except Exception as metrics_e: - logger.warning(f"Could not retrieve mutation count after delete: {metrics_e}") - logger.info(f"Successfully cleared documents from the collection (approx. {mutation_count} mutations).") - except Exception as e: - logger.error(f"Error clearing documents from collection: {e}. Collection might be empty or index not ready.") -``` - -### 3.7 create_agent_role - -This function creates or updates the necessary IAM (Identity and Access Management) role that the Bedrock Agent and its associated Lambda function will assume. The role needs permissions to interact with AWS services on your behalf. Key aspects of this function are: -- **Assume Role Policy:** Defines which AWS services (principals) are allowed to assume this role. In this case, it allows both `lambda.amazonaws.com` (for the Lambda function execution) and `bedrock.amazonaws.com` (for the Bedrock Agent service itself). -- **Idempotency:** It first checks if a role with the specified `role_name` already exists. - - If it exists, the function retrieves its ARN and updates its trust policy to ensure it matches the required configuration. - - If it doesn't exist, it creates a new IAM role with the defined assume role policy and description. 
-- **Permissions Policies:** - - Attaches the AWS managed policy `AWSLambdaBasicExecutionRole`, which grants the Lambda function permissions to write logs to CloudWatch. - - Creates and attaches an inline policy (`LambdaBasicLoggingPermissions`) for more specific logging permissions if needed, scoped to the Lambda log group. - - Creates and attaches an inline policy (`BedrockAgentPermissions`) granting broad `bedrock:*` permissions. For production, these permissions should be scoped down to the minimum required. -- **Propagation Delays:** Includes `time.sleep` calls after creating the role and after attaching policies to allow time for the changes to propagate within AWS, which helps prevent subsequent operations from failing due to eventual consistency issues. -It returns the ARN (Amazon Resource Name) of the created or updated IAM role, which is then used when creating the Bedrock Agent and the Lambda function. - - -```python -def create_agent_role(iam_client, role_name, aws_account_id): - """Creates or gets the IAM role for the Bedrock Agent Lambda functions.""" - logger.info(f"Checking/Creating IAM role: {role_name}") - assume_role_policy_document = { - "Version": "2012-10-17", - "Statement": [ - { - "Effect": "Allow", - "Principal": { - "Service": [ - "lambda.amazonaws.com", - "bedrock.amazonaws.com" - ] - }, - "Action": "sts:AssumeRole" - } - ] - } - - role_arn = None - try: - # Check if role exists - get_role_response = iam_client.get_role(RoleName=role_name) - role_arn = get_role_response['Role']['Arn'] - logger.info(f"IAM role '{role_name}' already exists with ARN: {role_arn}") - - # Ensure trust policy is up-to-date - logger.info(f"Updating trust policy for existing role '{role_name}'...") - iam_client.update_assume_role_policy( - RoleName=role_name, - PolicyDocument=json.dumps(assume_role_policy_document) - ) - logger.info(f"Trust policy updated for role '{role_name}'.") - - except iam_client.exceptions.NoSuchEntityException: - logger.info(f"IAM role 
'{role_name}' not found. Creating...") - try: - create_role_response = iam_client.create_role( - RoleName=role_name, - AssumeRolePolicyDocument=json.dumps(assume_role_policy_document), - Description='IAM role for Bedrock Agent Lambda functions (Experiment)', - MaxSessionDuration=3600 - ) - role_arn = create_role_response['Role']['Arn'] - logger.info(f"Successfully created IAM role '{role_name}' with ARN: {role_arn}") - # Wait after role creation before attaching policies - logger.info("Waiting 15s for role creation propagation...") - time.sleep(15) - except ClientError as e: - logger.error(f"Error creating IAM role '{role_name}': {e}") - raise - - except ClientError as e: - logger.error(f"Error getting/updating IAM role '{role_name}': {e}") - raise - - # Attach basic execution policy (idempotent) - try: - logger.info(f"Attaching basic Lambda execution policy to role '{role_name}'...") - iam_client.attach_role_policy( - RoleName=role_name, - PolicyArn='arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole' - ) - logger.info("Attached basic Lambda execution policy.") - except ClientError as e: - logger.error(f"Error attaching basic Lambda execution policy: {e}") - # Don't necessarily raise, might already be attached or other issue - - # Add minimal inline policy for logging (can be expanded later if needed) - basic_inline_policy_name = "LambdaBasicLoggingPermissions" - basic_inline_policy_doc = { - "Version": "2012-10-17", - "Statement": [ - { - "Effect": "Allow", - "Action": [ - "logs:CreateLogGroup", - "logs:CreateLogStream", - "logs:PutLogEvents" - ], - "Resource": f"arn:aws:logs:{AWS_REGION}:{aws_account_id}:log-group:/aws/lambda/*:*" # Scope down logs if possible - } - # Add S3 permissions here ONLY if Lambda code explicitly needs it - ] - } - - # Add Bedrock permissions policy - bedrock_policy_name = "BedrockAgentPermissions" - bedrock_policy_doc = { - "Version": "2012-10-17", - "Statement": [ - { - "Effect": "Allow", - "Action": [ - "bedrock:*" - ], 
- "Resource": "*" # You can scope this down to specific agents/models if needed - } - ] - } - try: - logger.info(f"Putting basic inline policy '{basic_inline_policy_name}' for role '{role_name}'...") - iam_client.put_role_policy( - RoleName=role_name, - PolicyName=basic_inline_policy_name, - PolicyDocument=json.dumps(basic_inline_policy_doc) - ) - logger.info(f"Successfully put inline policy '{basic_inline_policy_name}'.") - - # Add Bedrock permissions policy - logger.info(f"Putting Bedrock permissions policy '{bedrock_policy_name}' for role '{role_name}'...") - iam_client.put_role_policy( - RoleName=role_name, - PolicyName=bedrock_policy_name, - PolicyDocument=json.dumps(bedrock_policy_doc) - ) - logger.info(f"Successfully put inline policy '{bedrock_policy_name}'.") - - logger.info("Waiting 10s for policy changes to propagate...") - time.sleep(10) - except ClientError as e: - logger.error(f"Error putting inline policy: {e}") - # Decide if this is fatal - - if not role_arn: - raise Exception(f"Failed to create or retrieve ARN for role {role_name}") - - return role_arn -``` - -### 3.8 Lambda Deployment Functions - -This subsection groups together several helper functions dedicated to managing the deployment lifecycle of the AWS Lambda function that will serve as the tool executor for the Bedrock Agent. These functions handle packaging the Lambda code, managing its dependencies, deploying it to AWS, and cleaning up resources. - -#### 3.8.1 delete_lambda_function - -This function is designed to safely delete an AWS Lambda function. Before attempting to delete the function itself, it tries to remove any permissions associated with it (specifically, the permission allowing Bedrock to invoke it, using a predictable statement ID). It then checks if the function exists and, if so, proceeds with the deletion. The function includes a brief pause after initiating deletion, as the process is asynchronous. 
It returns `True` if deletion was attempted/occurred and `False` if the function didn't exist or if an error occurred during the process. - - -```python -def delete_lambda_function(lambda_client, function_name): - """Delete Lambda function if it exists, attempting to remove permissions first.""" - logger.info(f"Attempting to delete Lambda function: {function_name}...") - try: - # Use a predictable statement ID added by create_lambda_function - statement_id = f"AllowBedrockInvokeBasic-{function_name}" - try: - logger.info(f"Attempting to remove permission {statement_id} from {function_name}...") - lambda_client.remove_permission( - FunctionName=function_name, - StatementId=statement_id - ) - logger.info(f"Successfully removed permission {statement_id} from {function_name}.") - time.sleep(2) # Allow time for permission removal - except lambda_client.exceptions.ResourceNotFoundException: - logger.info(f"Permission {statement_id} not found on {function_name}. Skipping removal.") - except ClientError as perm_e: - # Log error but continue with deletion attempt - logger.warning(f"Error removing permission {statement_id} from {function_name}: {str(perm_e)}") - - # Check if function exists before attempting deletion - lambda_client.get_function(FunctionName=function_name) - logger.info(f"Function {function_name} exists. Deleting...") - lambda_client.delete_function(FunctionName=function_name) - - # Wait for deletion to complete using a waiter - logger.info(f"Waiting for {function_name} to be deleted...") - time.sleep(10) # Simple delay after delete call - logger.info(f"Function {function_name} deletion initiated.") - - return True # Indicates deletion was attempted/occurred - - except lambda_client.exceptions.ResourceNotFoundException: - logger.info(f"Lambda function '{function_name}' does not exist. 
No need to delete.")
        return False  # Indicates function didn't exist
    except Exception as e:
        logger.error(f"Error during deletion process for Lambda function '{function_name}': {str(e)}")
        # Depending on severity, might want to raise or just return False
        return False  # Indicates an error occurred beyond not found
```

#### 3.8.2 upload_to_s3

This function handles uploading a Lambda deployment package (a .zip file) to Amazon S3, which is necessary when the package size exceeds Lambda's direct upload limit. Key features include:
- **Bucket Management:** It generates a unique S3 bucket name (prefixed with `lambda-deployment-`) using the AWS account ID and a timestamp, or a fallback UUID if the account ID isn't available. It checks whether this bucket exists and creates it if not, specifying the correct region for bucket creation, and uses a waiter to ensure the bucket is available before proceeding.
- **S3 Key Generation:** Creates a unique S3 key (object path) for the uploaded file, combining the original filename with a UUID to prevent collisions.
- **Multipart Upload:** For files larger than 100MB, it uses `boto3.s3.transfer.S3Transfer` for robust multipart uploads; smaller files are uploaded with a single `put_object` call. (Lambda's direct-upload limit for a zipped package is roughly 50MB, with 250MB unzipped including layers, which is why this notebook routes any package over ~45MB through S3.)
- **Retry Configuration:** Initializes the S3 and STS clients with a configuration that includes increased timeouts and retries for better resilience.
It returns a dictionary containing the `S3Bucket` and `S3Key` of the uploaded package, which is then consumed by `create_lambda_function`.
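The size-based routing and key-generation logic described above can be exercised in isolation before looking at the full upload function. The helper names and the 45MB threshold below mirror this notebook's conventions but are hypothetical, not a library API:

```python
import os
import uuid

# Hypothetical cutoff mirroring the ~45MB direct-upload limit discussed above.
DIRECT_UPLOAD_LIMIT_MB = 45

def choose_upload_strategy(zip_size_bytes):
    """Return 's3' for packages too large for direct Lambda upload, else 'direct'."""
    size_mb = zip_size_bytes / (1024 * 1024)
    return "s3" if size_mb > DIRECT_UPLOAD_LIMIT_MB else "direct"

def build_s3_key(zip_file):
    """Build a collision-resistant S3 key: lambda/<basename>-<8-char uuid suffix>."""
    return f"lambda/{os.path.basename(zip_file)}-{uuid.uuid4().hex[:8]}"

print(choose_upload_strategy(10 * 1024 * 1024))  # -> direct
print(choose_upload_strategy(60 * 1024 * 1024))  # -> s3
print(build_s3_key("/tmp/my_function.zip").startswith("lambda/my_function.zip-"))  # -> True
```

The random suffix means re-deploying the same zip never overwrites a previous package, at the cost of leaving old objects behind in the bucket.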

```python
def upload_to_s3(zip_file, region, bucket_name=None):
    """Upload zip file to S3 with retry logic and return S3 location."""
    logger.info(f"Preparing to upload {zip_file} to S3 in region {region}...")
    # Configure the client with increased timeouts
    config = Config(
        connect_timeout=60,
        read_timeout=300,
        retries={'max_attempts': 3, 'mode': 'adaptive'}
    )

    s3_client = boto3.client('s3', region_name=region, config=config)
    sts_client = boto3.client('sts', region_name=region, config=config)

    # Determine bucket name
    if bucket_name is None:
        try:
            account_id = sts_client.get_caller_identity().get('Account')
            timestamp = int(time.time())
            bucket_name = f"lambda-deployment-{account_id}-{timestamp}"
            logger.info(f"Generated unique S3 bucket name: {bucket_name}")
        except Exception as e:
            fallback_id = uuid.uuid4().hex[:12]
            bucket_name = f"lambda-deployment-{fallback_id}"
            logger.warning(f"Error getting account ID ({e}). Using fallback bucket name: {bucket_name}")

    # Create bucket if needed
    try:
        s3_client.head_bucket(Bucket=bucket_name)
        logger.info(f"Using existing S3 bucket: {bucket_name}")
    except ClientError as e:
        # Error codes are strings; head_bucket reports a missing bucket as '404'.
        error_code = e.response['Error']['Code']
        if error_code in ('404', 'NoSuchBucket'):
            logger.info(f"Creating S3 bucket: {bucket_name}...")
            try:
                if region == 'us-east-1':
                    s3_client.create_bucket(Bucket=bucket_name)
                else:
                    s3_client.create_bucket(
                        Bucket=bucket_name,
                        CreateBucketConfiguration={'LocationConstraint': region}
                    )
                logger.info(f"Created S3 bucket: {bucket_name}. Waiting for availability...")
                waiter = s3_client.get_waiter('bucket_exists')
                waiter.wait(Bucket=bucket_name, WaiterConfig={'Delay': 5, 'MaxAttempts': 12})
                logger.info(f"Bucket {bucket_name} is available.")
            except Exception as create_e:
                logger.error(f"Error creating bucket '{bucket_name}': {create_e}")
                raise
        else:
            logger.error(f"Error checking bucket '{bucket_name}': {e}")
            raise

    # Upload file
    s3_key = f"lambda/{os.path.basename(zip_file)}-{uuid.uuid4().hex[:8]}"
    try:
        logger.info(f"Uploading {zip_file} to s3://{bucket_name}/{s3_key}...")
        file_size = os.path.getsize(zip_file)
        if file_size > 100 * 1024 * 1024:  # Use multipart for files > 100MB
            logger.info("Using multipart upload for large file...")
            transfer_config = boto3.s3.transfer.TransferConfig(
                multipart_threshold=10 * 1024 * 1024, max_concurrency=10,
                multipart_chunksize=10 * 1024 * 1024, use_threads=True
            )
            s3_transfer = boto3.s3.transfer.S3Transfer(client=s3_client, config=transfer_config)
            s3_transfer.upload_file(zip_file, bucket_name, s3_key)
        else:
            with open(zip_file, 'rb') as f:
                s3_client.put_object(Bucket=bucket_name, Key=s3_key, Body=f)

        logger.info(f"Successfully uploaded to s3://{bucket_name}/{s3_key}")
        return {'S3Bucket': bucket_name, 'S3Key': s3_key}

    except Exception as upload_e:
        logger.error(f"S3 upload failed: {upload_e}")
        raise
```

#### 3.8.3 package_function

This function automates the process of packaging the Lambda function code and its dependencies into a .zip file, ready for deployment. It relies on a `Makefile` located in the `source_dir` (which is `lambda_functions` in this notebook). The steps are:
1. **Path Setup:** Defines various paths for source files, the temporary packaging directory, the `Makefile`, and the final output .zip file.
2. **File Preparation:** It copies the specific Lambda handler script (e.g., `bedrock_agent_search_and_format.py`) to `lambda_function.py` within the `source_dir`, because the `Makefile` expects a generic `lambda_function.py`.
3. **Execute Makefile:** It runs a `make clean package` command using `subprocess.check_call`, with the `source_dir` as the current working directory. The Makefile is responsible for creating a virtual environment, installing dependencies from `requirements.txt` into a temporary `package_dir`, and then zipping the contents of this directory along with `lambda_function.py` into `lambda_package.zip` within the `source_dir`.
4. **Output Handling:** After the `make` command successfully completes, it moves and renames the generated `lambda_package.zip` from the `source_dir` to the specified `build_dir` (the notebook's current directory in this case) with a name like `function_name.zip`.
5. **Cleanup:** In a `finally` block, it cleans up the temporary `lambda_function.py` copied earlier and any intermediate `lambda_package.zip` left in the `source_dir` (e.g., if the rename/move failed).
The function returns the path to the final .zip file.
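The path bookkeeping in steps 1 and 4 can be sketched as a small pure helper before reading the full function. The helper name is hypothetical; the file names match the Makefile convention described above:

```python
import os

def packaging_paths(function_name, source_dir, build_dir):
    """Compute the paths used by the Makefile-based packaging flow."""
    return {
        "makefile": os.path.join(source_dir, "Makefile"),
        "handler_copy": os.path.join(source_dir, "lambda_function.py"),     # step 2: renamed handler script
        "make_output_zip": os.path.join(source_dir, "lambda_package.zip"),  # produced by `make package`
        "final_zip": os.path.join(build_dir, f"{function_name}.zip"),       # step 4: moved/renamed output
    }

paths = packaging_paths("bedrock_agent_search_and_format", "lambda_functions", ".")
print(paths["make_output_zip"])
print(paths["final_zip"])
```

Keeping these paths in one place makes it easy to see that everything intermediate lives in `source_dir` and only the final zip lands in `build_dir`.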
- - -```python -def package_function(function_name, source_dir, build_dir): - """Package Lambda function using Makefile found in source_dir.""" - # source_dir is where the .py, requirements.txt, Makefile live (e.g., lambda_functions) - # build_dir is where packaging happens and final zip ends up (e.g., lambda-experiments) - makefile_path = os.path.join(source_dir, 'Makefile') - # Temp build dir inside source_dir, as Makefile expects relative paths - temp_package_dir = os.path.join(source_dir, 'package_dir') - # Requirements file is in source_dir - source_req_path = os.path.join(source_dir, 'requirements.txt') - # Target requirements path inside source_dir (needed for Makefile) - # target_req_path = os.path.join(source_dir, 'requirements.txt') # No copy needed if running make in source_dir - source_func_script_path = os.path.join(source_dir, f'{function_name}.py') - # Target function script path inside source_dir, renamed for Makefile install_deps copy - target_func_script_path = os.path.join(source_dir, 'lambda_function.py') - # Make output zip is created inside source_dir - make_output_zip = os.path.join(source_dir, 'lambda_package.zip') - # Final zip path is in the build_dir (one level up from source_dir) - final_zip_path = os.path.join(build_dir, f'{function_name}.zip') - - logger.info(f"--- Packaging function {function_name} --- ") - logger.info(f"Source Dir (Makefile location & make cwd): {source_dir}") - logger.info(f"Build Dir (Final zip location): {build_dir}") - - if not os.path.exists(source_func_script_path): - raise FileNotFoundError(f"Source function script not found: {source_func_script_path}") - if not os.path.exists(source_req_path): - raise FileNotFoundError(f"Source requirements file not found: {source_req_path}") - if not os.path.exists(makefile_path): - raise FileNotFoundError(f"Makefile not found at: {makefile_path}") - - # Ensure no leftover target script from previous failed run - if os.path.exists(target_func_script_path): - 
logger.warning(f"Removing existing target script: {target_func_script_path}") - os.remove(target_func_script_path) - - try: - # 1. No need to create lambda subdir in build_dir - - # 2. Copy source function script to source_dir as lambda_function.py - logger.info(f"Copying {source_func_script_path} to {target_func_script_path}") - shutil.copy(source_func_script_path, target_func_script_path) - # Requirements file is already in source_dir, no copy needed. - - # 3. Run make command (execute from source_dir where Makefile is) - make_command = [ - 'make', - '-f', makefile_path, # Still specify Makefile path explicitly - 'clean', # Clean first - 'package', - # 'PYTHON_VERSION=python3.9' # Let Makefile use its default or system default - ] - logger.info(f"Running make command: {' '.join(make_command)} (in {source_dir})") - # Run make from source_dir; relative paths in Makefile should now work - subprocess.check_call(make_command, cwd=source_dir, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE) - logger.info("Make command completed successfully.") - - # 4. 
Check for output zip in source_dir and rename/move to build_dir - if not os.path.exists(make_output_zip): - raise FileNotFoundError(f"Makefile did not produce expected output: {make_output_zip}") - - logger.info(f"Moving and renaming {make_output_zip} to {final_zip_path}") - if os.path.exists(final_zip_path): - logger.warning(f"Removing existing final zip: {final_zip_path}") - os.remove(final_zip_path) - # Use shutil.move for cross-filesystem safety if needed, os.rename is fine here - os.rename(make_output_zip, final_zip_path) - logger.info(f"Zip file ready: {final_zip_path}") - - return final_zip_path - - except subprocess.CalledProcessError as e: - logger.error(f"Error running Makefile for {function_name}: {e}") - stderr_output = "(No stderr captured)" - if e.stderr: - try: - stderr_output = e.stderr.decode() - except Exception: - stderr_output = "(Could not decode stderr)" - logger.error(f"Make stderr: {stderr_output}") - raise - except Exception as e: - logger.error(f"Error packaging function {function_name} using Makefile: {str(e)}") - logger.error(traceback.format_exc()) - raise - finally: - # 5. Clean up intermediate files in source_dir - if os.path.exists(target_func_script_path): - logger.info(f"Cleaning up temporary script: {target_func_script_path}") - os.remove(target_func_script_path) - if os.path.exists(make_output_zip): # If rename failed - logger.warning(f"Cleaning up intermediate zip in source dir: {make_output_zip}") - os.remove(make_output_zip) -``` - -#### 3.8.4 create_lambda_function - -This is a key function that handles the creation or update of the AWS Lambda function. It incorporates several important aspects for robustness and proper configuration: -- **Package Handling:** It checks the size of the deployment .zip file. If it's over a threshold (45MB in this code, as Lambda has limits for direct uploads), it calls `upload_to_s3` to upload the package to S3 and uses the S3 location for deployment. 
Otherwise, it reads the .zip file content directly for deployment. -- **Configuration:** Defines common arguments for Lambda creation/update, including the function name, runtime (`python3.9`), IAM role ARN, handler name, timeout, memory size, and crucial environment variables (Couchbase details, Bedrock model IDs) that the Lambda will need at runtime. -- **Idempotency & Retry Logic:** It first attempts to create the Lambda function. - - If it encounters a `ResourceConflictException` (meaning the function already exists), it then attempts to update the function's code and configuration. - - It includes a retry loop for both creation and update operations to handle potential throttling or other transient AWS issues, with an exponential backoff strategy. -- **Permissions:** After successfully creating or updating the Lambda, it adds a resource-based policy (permission) to the Lambda function. This permission specifically allows the Bedrock service (`bedrock.amazonaws.com`) to invoke this Lambda function. It uses a predictable `StatementId` and handles potential conflicts if the permission already exists. -- **Waiters:** It uses `boto3` waiters (`function_active_v2` after creation, `function_updated_v2` after update) to pause execution until the Lambda function becomes fully active and ready, preventing issues where subsequent operations might target a Lambda that isn't fully initialized. -The function returns the ARN of the successfully created or updated Lambda function. 
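The retry loop described above backs off exponentially between attempts. A minimal sketch of the delay schedule (the helper name is hypothetical; the defaults match the constants used in the deployment code):

```python
def backoff_delays(max_retries=3, base_delay=10):
    """Seconds to wait before each retry attempt: base_delay * 2**(attempt - 1)."""
    return [base_delay * (2 ** (attempt - 1)) for attempt in range(1, max_retries + 1)]

print(backoff_delays())  # -> [10, 20, 40]
```

The deployment code also adds a few seconds of random jitter to each delay so that parallel runs don't retry in lockstep.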
- - -```python -def create_lambda_function(lambda_client, function_name, handler, role_arn, zip_file, region): - """Create or update Lambda function with retry logic.""" - logger.info(f"Deploying Lambda function {function_name} from {zip_file}...") - - # Configure the client with increased timeouts for potentially long creation - config = Config( - connect_timeout=120, - read_timeout=300, - retries={'max_attempts': 5, 'mode': 'adaptive'} - ) - lambda_client_local = boto3.client('lambda', region_name=region, config=config) - - # Check zip file size - zip_size_mb = 0 - try: - zip_size_bytes = os.path.getsize(zip_file) - zip_size_mb = zip_size_bytes / (1024 * 1024) - logger.info(f"Zip file size: {zip_size_mb:.2f} MB") - except OSError as e: - logger.error(f"Could not get size of zip file {zip_file}: {e}") - raise # Cannot proceed without zip file - - use_s3 = zip_size_mb > 45 # Use S3 for packages over ~45MB - s3_location = None - zip_content = None - - if use_s3: - logger.info(f"Package size ({zip_size_mb:.2f} MB) requires S3 deployment.") - s3_location = upload_to_s3(zip_file, region) - if not s3_location: - raise Exception("Failed to upload Lambda package to S3.") - else: - logger.info("Deploying package directly.") - try: - with open(zip_file, 'rb') as f: - zip_content = f.read() - except OSError as e: - logger.error(f"Could not read zip file {zip_file}: {e}") - raise - - # Define common create/update args - common_args = { - 'FunctionName': function_name, - 'Runtime': 'python3.9', - 'Role': role_arn, - 'Handler': handler, - 'Timeout': 180, - 'MemorySize': 1536, # Adjust as needed - # Env vars loaded from main script env or .env - 'Environment': { - 'Variables': { - 'CB_HOST': os.getenv('CB_HOST', 'couchbase://localhost'), - 'CB_USERNAME': os.getenv('CB_USERNAME', 'Administrator'), - 'CB_PASSWORD': os.getenv('CB_PASSWORD', 'password'), - 'CB_BUCKET_NAME': os.getenv('CB_BUCKET_NAME', 'vector-search-exp'), - 'SCOPE_NAME': os.getenv('SCOPE_NAME', 'bedrock_exp'), - 
'COLLECTION_NAME': os.getenv('COLLECTION_NAME', 'docs_exp'), - 'INDEX_NAME': os.getenv('INDEX_NAME', 'vector_search_bedrock_exp'), - 'EMBEDDING_MODEL_ID': os.getenv('EMBEDDING_MODEL_ID', EMBEDDING_MODEL_ID), - 'AGENT_MODEL_ID': os.getenv('AGENT_MODEL_ID', AGENT_MODEL_ID) - } - } - } - - if use_s3: - code_arg = {'S3Bucket': s3_location['S3Bucket'], 'S3Key': s3_location['S3Key']} - else: - code_arg = {'ZipFile': zip_content} - - max_retries = 3 - base_delay = 10 - for attempt in range(1, max_retries + 1): - try: - logger.info(f"Creating function '{function_name}' (attempt {attempt}/{max_retries})...") - create_args = common_args.copy() - create_args['Code'] = code_arg - create_args['Publish'] = True # Publish a version - - create_response = lambda_client_local.create_function(**create_args) - function_arn = create_response['FunctionArn'] - logger.info(f"Successfully created function '{function_name}' with ARN: {function_arn}") - - # Add basic invoke permission after creation - time.sleep(5) # Give function time to be fully created before adding policy - statement_id = f"AllowBedrockInvokeBasic-{function_name}" - try: - logger.info(f"Adding basic invoke permission ({statement_id}) to {function_name}...") - lambda_client_local.add_permission( - FunctionName=function_name, - StatementId=statement_id, - Action='lambda:InvokeFunction', - Principal='bedrock.amazonaws.com' - ) - logger.info(f"Successfully added basic invoke permission {statement_id}.") - except lambda_client_local.exceptions.ResourceConflictException: - logger.info(f"Permission {statement_id} already exists for {function_name}. 
Skipping add.") - except ClientError as perm_e: - logger.warning(f"Failed to add basic invoke permission {statement_id} to {function_name}: {perm_e}") - - # Wait for function to be Active - logger.info(f"Waiting for function '{function_name}' to become active...") - waiter = lambda_client_local.get_waiter('function_active_v2') - waiter.wait(FunctionName=function_name, WaiterConfig={'Delay': 5, 'MaxAttempts': 24}) - logger.info(f"Function '{function_name}' is active.") - - return function_arn # Return ARN upon successful creation - - except lambda_client_local.exceptions.ResourceConflictException: - logger.warning(f"Function '{function_name}' already exists. Attempting to update code...") - try: - if use_s3: - update_response = lambda_client_local.update_function_code( - FunctionName=function_name, - S3Bucket=s3_location['S3Bucket'], - S3Key=s3_location['S3Key'], - Publish=True - ) - else: - update_response = lambda_client_local.update_function_code( - FunctionName=function_name, - ZipFile=zip_content, - Publish=True - ) - function_arn = update_response['FunctionArn'] - logger.info(f"Successfully updated function code for '{function_name}'. 
New version ARN: {function_arn}") - - # Also update configuration just in case - try: - logger.info(f"Updating configuration for '{function_name}'...") - lambda_client_local.update_function_configuration(**common_args) - logger.info(f"Configuration updated for '{function_name}'.") - except ClientError as conf_e: - logger.warning(f"Could not update configuration for '{function_name}': {conf_e}") - - # Re-verify invoke permission after update - time.sleep(5) - statement_id = f"AllowBedrockInvokeBasic-{function_name}" - try: - logger.info(f"Verifying/Adding basic invoke permission ({statement_id}) after update...") - lambda_client_local.add_permission( - FunctionName=function_name, - StatementId=statement_id, - Action='lambda:InvokeFunction', - Principal='bedrock.amazonaws.com' - ) - logger.info(f"Successfully added/verified basic invoke permission {statement_id}.") - except lambda_client_local.exceptions.ResourceConflictException: - logger.info(f"Permission {statement_id} already exists for {function_name}. Skipping add.") - except ClientError as perm_e: - logger.warning(f"Failed to add/verify basic invoke permission {statement_id} after update: {perm_e}") - - # Wait for function to be Active after update - logger.info(f"Waiting for function '{function_name}' update to complete...") - waiter = lambda_client_local.get_waiter('function_updated_v2') - waiter.wait(FunctionName=function_name, WaiterConfig={'Delay': 5, 'MaxAttempts': 24}) - logger.info(f"Function '{function_name}' update complete.") - - return function_arn # Return ARN after successful update - - except ClientError as update_e: - logger.error(f"Failed to update function '{function_name}': {update_e}") - if attempt < max_retries: - delay = base_delay * (2 ** (attempt - 1)) - logger.info(f"Retrying update in {delay} seconds...") - time.sleep(delay) - else: - logger.error("Maximum update retries reached. 
Deployment failed.") - raise update_e - - except ClientError as e: - # Handle throttling or other retryable errors - error_code = e.response.get('Error', {}).get('Code') - if error_code in ['ThrottlingException', 'ProvisionedConcurrencyConfigNotFoundException', 'EC2ThrottledException'] or 'Rate exceeded' in str(e): - logger.warning(f"Retryable error on attempt {attempt}: {e}") - if attempt < max_retries: - delay = base_delay * (2 ** (attempt - 1)) + (uuid.uuid4().int % 5) - logger.info(f"Retrying in {delay} seconds...") - time.sleep(delay) - else: - logger.error("Maximum retries reached after retryable error. Deployment failed.") - raise e - else: - logger.error(f"Error creating/updating Lambda '{function_name}': {e}") - logger.error(traceback.format_exc()) # Log full traceback for unexpected errors - raise e # Re-raise non-retryable or unexpected errors - except Exception as e: - logger.error(f"Unexpected error during Lambda deployment: {e}") - logger.error(traceback.format_exc()) - raise e - - # If loop completes without returning, something went wrong - raise Exception(f"Failed to deploy Lambda function {function_name} after {max_retries} attempts.") -``` - -### 3.9 Agent Resource Deletion Functions - -This subsection provides helper functions to manage the cleanup of AWS Bedrock Agent resources. Creating agents, action groups, and aliases results in persistent configurations in AWS. These functions are essential for maintaining a clean environment, especially during experimentation and development, by allowing for the removal of these resources when they are no longer needed or before recreating them in a subsequent run. - -#### 3.9.1 get_agent_by_name - -This utility function searches for an existing Bedrock Agent by its name. Since the AWS SDK's `get_agent` requires an `agentId`, and you often work with human-readable names, this function bridges that gap. 
It uses the `list_agents` operation (with a paginator to handle potentially many agents in an account) and iterates through the summaries, comparing the `agentName` field. If a match is found, it returns the corresponding `agentId`. If no agent with the given name is found or an error occurs during listing, it returns `None`.


```python
def get_agent_by_name(agent_client, agent_name):
    """Find an agent ID by its name using list_agents."""
    logger.info(f"Attempting to find agent by name: {agent_name}")
    try:
        paginator = agent_client.get_paginator('list_agents')
        for page in paginator.paginate():
            for agent_summary in page.get('agentSummaries', []):
                if agent_summary.get('agentName') == agent_name:
                    agent_id = agent_summary.get('agentId')
                    logger.info(f"Found agent '{agent_name}' with ID: {agent_id}")
                    return agent_id
        logger.info(f"Agent '{agent_name}' not found.")
        return None
    except ClientError as e:
        logger.error(f"Error listing agents to find '{agent_name}': {e}")
        return None  # Treat as not found if error occurs
```

#### 3.9.2 delete_action_group

This function handles the deletion of a specific action group associated with a Bedrock Agent. Action groups are always tied to the `DRAFT` version of an agent, so it calls `delete_agent_action_group` with the `agentId`, `agentVersion='DRAFT'`, and the `actionGroupId`. It passes `skipResourceInUseCheck=True` to force deletion, which is useful if the agent is in a state (like `PREPARING`) that would otherwise prevent immediate deletion. The function handles the case where the action group is not found, and if a conflict occurs (e.g., the agent is busy), it retries once after a delay. It returns `True` if the deletion succeeded (including after a retry), and `False` if the action group was not found or an unrecoverable error occurred.
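The "retry once after a delay on conflict" pattern used here can be factored into a small generic helper and tested without AWS. Everything below (the helper, the `ConflictError` stand-in, and the fake delete call) is a hypothetical sketch of the pattern, not Bedrock API code:

```python
import time

class ConflictError(Exception):
    """Stand-in for a service ConflictException in this sketch."""

def call_with_one_retry(action, is_conflict, delay_seconds=0, sleep=time.sleep):
    """Run action(); if it fails with a conflict, wait once and retry; otherwise re-raise."""
    try:
        return action()
    except Exception as e:
        if is_conflict(e):
            sleep(delay_seconds)
            return action()  # second and final attempt
        raise

# Simulate an agent that is busy on the first delete attempt.
attempts = {"n": 0}
def fake_delete():
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise ConflictError("agent is preparing")
    return True

result = call_with_one_retry(fake_delete, lambda e: isinstance(e, ConflictError))
print(result, attempts["n"])  # -> True 2
```

Injecting `sleep` as a parameter keeps the helper instantly testable; in production code it defaults to a real `time.sleep`, matching the 15-second pause used in the notebook.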
- - -```python -def delete_action_group(agent_client, agent_id, action_group_id): - """Deletes a specific action group for an agent.""" - logger.info(f"Attempting to delete action group {action_group_id} for agent {agent_id}...") - try: - agent_client.delete_agent_action_group( - agentId=agent_id, - agentVersion='DRAFT', # Action groups are tied to the DRAFT version - actionGroupId=action_group_id, - skipResourceInUseCheck=True # Force deletion even if in use (e.g., during prepare) - ) - logger.info(f"Successfully deleted action group {action_group_id} for agent {agent_id}.") - time.sleep(5) # Short pause after deletion - return True - except agent_client.exceptions.ResourceNotFoundException: - logger.info(f"Action group {action_group_id} not found for agent {agent_id}. Skipping deletion.") - return False - except ClientError as e: - # Handle potential throttling or conflict if prepare is happening - error_code = e.response.get('Error', {}).get('Code') - if error_code == 'ConflictException': - logger.warning(f"Conflict deleting action group {action_group_id} (agent might be preparing/busy). Retrying once after delay...") - time.sleep(15) - try: - agent_client.delete_agent_action_group( - agentId=agent_id, agentVersion='DRAFT', actionGroupId=action_group_id, skipResourceInUseCheck=True - ) - logger.info(f"Successfully deleted action group {action_group_id} after retry.") - return True - except Exception as retry_e: - logger.error(f"Error deleting action group {action_group_id} on retry: {retry_e}") - return False - else: - logger.error(f"Error deleting action group {action_group_id} for agent {agent_id}: {e}") - return False -``` - -#### 3.9.3 delete_agent_and_resources - -This function orchestrates the complete cleanup of a Bedrock Agent and its associated components. Its process is: -1. **Find Agent:** It first calls `get_agent_by_name` to retrieve the `agentId` for the specified `agent_name`. If the agent isn't found, it exits gracefully. -2. 
**Delete Action Groups:** It lists all action groups associated with the `DRAFT` version of the agent. For each action group found, it calls `delete_action_group` to remove it. -3. **Delete Agent:** After attempting to delete all action groups, it proceeds to delete the agent itself using `delete_agent` with `skipResourceInUseCheck=True` to force the deletion. -4. **Wait for Deletion:** It includes a custom polling loop to wait for the agent to be fully deleted by repeatedly calling `get_agent` and checking for a `ResourceNotFoundException`. This ensures that subsequent operations (like recreating an agent with the same name) are less likely to encounter conflicts. - - -```python -def delete_agent_and_resources(agent_client, agent_name): - """Deletes the agent and its associated action groups.""" - agent_id = get_agent_by_name(agent_client, agent_name) - if not agent_id: - logger.info(f"Agent '{agent_name}' not found, no deletion needed.") - return - - logger.warning(f"--- Deleting Agent Resources for '{agent_name}' (ID: {agent_id}) ---") - - # 1. Delete Action Groups - try: - logger.info(f"Listing action groups for agent {agent_id}...") - action_groups = agent_client.list_agent_action_groups( - agentId=agent_id, - agentVersion='DRAFT' # List groups for the DRAFT version - ).get('actionGroupSummaries', []) - - if action_groups: - logger.info(f"Found {len(action_groups)} action groups to delete.") - for ag in action_groups: - delete_action_group(agent_client, agent_id, ag['actionGroupId']) - else: - logger.info("No action groups found to delete.") - - except ClientError as e: - logger.error(f"Error listing action groups for agent {agent_id}: {e}") - # Continue to agent deletion attempt even if listing fails - - # 2. 
Delete the Agent - try: - logger.info(f"Attempting to delete agent {agent_id} ('{agent_name}')...") - agent_client.delete_agent(agentId=agent_id, skipResourceInUseCheck=True) # Force delete - - # Wait for agent deletion (custom waiter logic might be needed if no standard waiter) - logger.info(f"Waiting up to 2 minutes for agent {agent_id} deletion...") - deleted = False - for _ in range(24): # Check every 5 seconds for 2 minutes - try: - agent_client.get_agent(agentId=agent_id) - time.sleep(5) - except agent_client.exceptions.ResourceNotFoundException: - logger.info(f"Agent {agent_id} successfully deleted.") - deleted = True - break - except ClientError as e: - # Handle potential throttling during check - error_code = e.response.get('Error', {}).get('Code') - if error_code == 'ThrottlingException': - logger.warning("Throttled while checking agent deletion status, continuing wait...") - time.sleep(10) - else: - logger.error(f"Error checking agent deletion status: {e}") - # Break checking loop on unexpected error - break - if not deleted: - logger.warning(f"Agent {agent_id} deletion confirmation timed out.") - - except agent_client.exceptions.ResourceNotFoundException: - logger.info(f"Agent {agent_id} ('{agent_name}') already deleted or not found.") - except ClientError as e: - logger.error(f"Error deleting agent {agent_id}: {e}") - - logger.info(f"--- Agent Resource Deletion Complete for '{agent_name}' ---") -``` - -### 3.10 Agent Creation Functions - -This subsection contains functions dedicated to the setup and configuration of the Bedrock Agent itself, including its core definition, action groups that link it to tools (Lambda functions), and the preparation process that makes it ready for invocation. - -#### 3.10.1 create_agent - -This function creates a new Bedrock Agent. It takes the desired `agent_name`, the `agent_role_arn` (obtained from `create_agent_role`), and the `foundation_model_id` (e.g., for Claude Sonnet) as input. 
Key configurations include:
- **Instruction:** A detailed prompt that defines the agent's persona, capabilities, and how it should use its tools. The instruction in this notebook guides the agent to use a single "SearchAndFormat" tool and present results directly.
- **`idleSessionTTLInSeconds`:** Sets a timeout for how long an agent session can remain idle.
- **Description:** A brief description for the agent.

After calling `create_agent`, the function logs the initial response details (ID, ARN, status). It then enters a polling loop that waits until the agent's status moves out of the `CREATING` state, typically to `NOT_PREPARED`. If creation fails and the agent enters a `FAILED` state, an exception is raised. It returns the `agent_id` and `agent_arn` once creation has been successfully initiated.


```python
def create_agent(agent_client, agent_name, agent_role_arn, foundation_model_id):
    """Creates a new Bedrock Agent."""
    logger.info(f"--- Creating Agent: {agent_name} ---")
    try:
        # Instruction for the single SearchAndFormat tool. Note the trailing
        # spaces: adjacent string literals are concatenated without separators.
        instruction = (
            "You are a helpful research assistant. Your primary function is to use the SearchAndFormat tool "
            "to find relevant documents based on user queries and format them. "
            "Use the user's query for the search, and specify a formatting style if requested, otherwise use the default. "
            "Present the formatted results returned by the tool directly to the user. "
            "Only use the tool provided. Do not add your own knowledge."
- ) - - response = agent_client.create_agent( - agentName=agent_name, - agentResourceRoleArn=agent_role_arn, - foundationModel=foundation_model_id, - instruction=instruction, - idleSessionTTLInSeconds=1800, # 30 minutes - description=f"Experimental agent for Couchbase search and content formatting ({foundation_model_id})" - # promptOverrideConfiguration={} # Optional: Add later if needed - ) - agent_info = response.get('agent') - agent_id = agent_info.get('agentId') - agent_arn = agent_info.get('agentArn') - agent_status = agent_info.get('agentStatus') - logger.info(f"Agent creation initiated. Name: {agent_name}, ID: {agent_id}, ARN: {agent_arn}, Status: {agent_status}") - - # Wait for agent to become NOT_PREPARED (initial state after creation) - # Using custom waiter logic as there might not be a standard one for this transition - logger.info(f"Waiting for agent {agent_id} to reach initial state...") - for _ in range(12): # Check for up to 1 minute - current_status = agent_client.get_agent(agentId=agent_id)['agent']['agentStatus'] - logger.info(f"Agent {agent_id} status: {current_status}") - if current_status != 'CREATING': # Expect NOT_PREPARED or FAILED - break - time.sleep(5) - - final_status = agent_client.get_agent(agentId=agent_id)['agent']['agentStatus'] - if final_status == 'FAILED': - logger.error(f"Agent {agent_id} creation failed.") - # Optionally retrieve failure reasons if API provides them - raise Exception(f"Agent creation failed for {agent_name}") - else: - logger.info(f"Agent {agent_id} successfully created (Status: {final_status}).") - - return agent_id, agent_arn - - except ClientError as e: - logger.error(f"Error creating agent '{agent_name}': {e}") - raise -``` - -#### 3.10.2 create_action_group - -This function creates or updates an action group for the specified agent. Action groups define the tools an agent can use. In this Lambda-based approach, the action group links the agent to the Lambda function that implements the tool. 
Key steps include: -- **Function Schema Definition:** It programmatically defines a `function_schema_details` dictionary. This schema describes the tool (`searchAndFormatDocuments`) that the Lambda function provides, including its name, description, and expected input parameters (`query`, `k`, `style`) with their types and whether they are required. This schema is what the agent uses to understand how to invoke the tool. -- **Idempotency:** It first checks if an action group with the given `action_group_name` already exists for the `DRAFT` version of the agent. - - If it exists, it attempts to update the existing action group using `update_agent_action_group`, ensuring the `actionGroupExecutor` points to the correct Lambda ARN and that it uses the `functionSchema` (for defining the tool via its signature) rather than an OpenAPI schema. - - If it doesn't exist, it creates a new action group using `create_agent_action_group`. -- **`actionGroupExecutor`:** This is set to `{'lambda': function_arn}`, where `function_arn` is the ARN of the deployed Lambda function. This tells Bedrock to invoke this Lambda when the agent decides to use a tool from this action group. -- **`functionSchema` Parameter:** The `functionSchema` (containing the `function_schema_details`) is provided to the `create_agent_action_group` or `update_agent_action_group` call. This method of defining tools is simpler for single functions compared to providing a full OpenAPI schema, which is also an option for more complex APIs. -- **State:** The action group is explicitly set to `ENABLED`. -A brief pause is added after creation/update to allow changes to propagate. The function returns the `actionGroupId`. 
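For contrast, here is a rough sketch of what the same single tool could look like as an OpenAPI schema passed through the `apiSchema` parameter instead of `functionSchema`. The operation path, payload shape, and the commented call are illustrative assumptions, not code from this notebook:

```python
import json

# Hypothetical OpenAPI 3 equivalent of the single-function schema used in
# this notebook. Bedrock accepts either a functionSchema or an apiSchema
# when defining an action group.
openapi_schema = {
    "openapi": "3.0.0",
    "info": {"title": "Search and Format API", "version": "1.0.0"},
    "paths": {
        "/searchAndFormatDocuments": {
            "post": {
                "operationId": "searchAndFormatDocuments",
                "description": "Performs vector search and formats the results.",
                "requestBody": {
                    "required": True,
                    "content": {
                        "application/json": {
                            "schema": {
                                "type": "object",
                                "required": ["query"],
                                "properties": {
                                    "query": {"type": "string"},
                                    "k": {"type": "integer"},
                                    "style": {"type": "string"},
                                },
                            }
                        }
                    },
                },
                "responses": {"200": {"description": "Formatted search results"}},
            }
        }
    },
}

# An inline schema is passed as a JSON string, e.g.:
# agent_client.create_agent_action_group(..., apiSchema={'payload': json.dumps(openapi_schema)})
api_schema_payload = json.dumps(openapi_schema)
```

For one function with three parameters, the `functionSchema` route used below involves far less ceremony; an OpenAPI schema pays off when an action group exposes many related operations.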
- - -```python -def create_action_group(agent_client, agent_id, action_group_name, function_arn, schema_path=None): - """Creates an action group for the agent using Define with function details.""" - logger.info(f"--- Creating/Updating Action Group (Function Details): {action_group_name} for Agent: {agent_id} ---") - logger.info(f"Lambda ARN: {function_arn}") - - # Define function schema details (for functionSchema parameter) - function_schema_details = { - 'functions': [ - { - 'name': 'searchAndFormatDocuments', # Function name agent will call - 'description': 'Performs vector search based on query, retrieves documents, and formats results using specified style.', - 'parameters': { - 'query': { - 'description': 'The search query text.', - 'type': 'string', - 'required': True - }, - 'k': { - 'description': 'The maximum number of documents to retrieve.', - 'type': 'integer', - 'required': False # Making optional as Lambda has default - }, - 'style': { - 'description': 'The desired formatting style for the results (e.g., \'bullet points\', \'paragraph\', \'summary\').', - 'type': 'string', - 'required': False # Making optional as Lambda has default - } - } - } - ] - } - - try: - # Check if Action Group already exists for the DRAFT version - try: - logger.info(f"Checking if action group '{action_group_name}' already exists for agent {agent_id} DRAFT version...") - paginator = agent_client.get_paginator('list_agent_action_groups') - existing_group = None - for page in paginator.paginate(agentId=agent_id, agentVersion='DRAFT'): - for ag_summary in page.get('actionGroupSummaries', []): - if ag_summary.get('actionGroupName') == action_group_name: - existing_group = ag_summary - break - if existing_group: - break - - if existing_group: - ag_id = existing_group['actionGroupId'] - logger.warning(f"Action Group '{action_group_name}' (ID: {ag_id}) already exists for agent {agent_id} DRAFT. 
Attempting update to Function Details.") - # Update existing action group - REMOVE apiSchema, ADD functionSchema - response = agent_client.update_agent_action_group( - agentId=agent_id, - agentVersion='DRAFT', - actionGroupId=ag_id, - actionGroupName=action_group_name, - actionGroupExecutor={'lambda': function_arn}, - functionSchema={ # Use functionSchema - 'functions': function_schema_details['functions'] # Pass the list with the correct key - }, - actionGroupState='ENABLED' - ) - ag_info = response.get('agentActionGroup') - logger.info(f"Successfully updated Action Group '{action_group_name}' (ID: {ag_info.get('actionGroupId')}) to use Function Details.") - return ag_info.get('actionGroupId') - else: - logger.info(f"Action group '{action_group_name}' does not exist. Creating new with Function Details.") - - except ClientError as e: - logger.error(f"Error checking for existing action group '{action_group_name}': {e}. Proceeding with creation attempt.") - - - # Create new action group if not found or update failed implicitly - response = agent_client.create_agent_action_group( - agentId=agent_id, - agentVersion='DRAFT', - actionGroupName=action_group_name, - actionGroupExecutor={ - 'lambda': function_arn - }, - functionSchema={ # Use functionSchema - 'functions': function_schema_details['functions'] # Pass the list with the correct key - }, - actionGroupState='ENABLED' - ) - ag_info = response.get('agentActionGroup') - ag_id = ag_info.get('actionGroupId') - logger.info(f"Successfully created Action Group '{action_group_name}' with ID: {ag_id} using Function Details.") - time.sleep(5) # Pause after creation/update - return ag_id - - except ClientError as e: - logger.error(f"Error creating/updating action group '{action_group_name}' using Function Details: {e}") - raise -``` - -#### 3.10.3 prepare_agent - -This function initiates the preparation of the `DRAFT` version of the Bedrock Agent and waits for this process to complete. 
Preparation involves Bedrock compiling the agent's configuration (instructions, action groups, model settings) and making it ready for invocation. -- It calls `bedrock_agent_client.prepare_agent(agentId=agent_id)`. -- **Custom Waiter:** It then uses a custom-defined `boto3` waiter (`AgentPrepared`) to poll the agent's status. The waiter configuration specifies: - - `delay`: How often to check (e.g., every 30 seconds). - - `operation`: The SDK call to make for checking (`GetAgent`). - - `maxAttempts`: How many times to check before timing out (e.g., 20 attempts, for a total of up to 10 minutes). - - `acceptors`: Conditions that determine success, failure, or retry. It succeeds if `agent.agentStatus` becomes `PREPARED`, fails if it becomes `FAILED`, and retries if it's `UPDATING` (though `PREPARING` is the more typical intermediate state here). -If the waiter times out or the agent preparation results in a `FAILED` status, an exception is raised. This step is crucial because an agent cannot be invoked (or an alias reliably pointed to its version) until it is successfully prepared. - - -```python -def prepare_agent(agent_client, agent_id): - """Prepares the DRAFT version of the agent.""" - logger.info(f"--- Preparing Agent: {agent_id} ---") - try: - response = agent_client.prepare_agent(agentId=agent_id) - agent_version = response.get('agentVersion') # Should be DRAFT - prepared_at = response.get('preparedAt') - status = response.get('agentStatus') # Should be PREPARING - logger.info(f"Agent preparation initiated for version '{agent_version}'. Status: {status}. 
Prepared At: {prepared_at}") - - # Wait for preparation to complete (PREPARED or FAILED) - logger.info(f"Waiting for agent {agent_id} preparation to complete (up to 10 minutes)...") - # Define a simple waiter config - waiter_config = { - 'version': 2, - 'waiters': { - 'AgentPrepared': { - 'delay': 30, # Check every 30 seconds - 'operation': 'GetAgent', - 'maxAttempts': 20, # Max 10 minutes - 'acceptors': [ - { - 'matcher': 'path', - 'expected': 'PREPARED', - 'argument': 'agent.agentStatus', - 'state': 'success' - }, - { - 'matcher': 'path', - 'expected': 'FAILED', - 'argument': 'agent.agentStatus', - 'state': 'failure' - }, - { - 'matcher': 'path', - 'expected': 'UPDATING', # Can happen during prep? Treat as retryable - 'argument': 'agent.agentStatus', - 'state': 'retry' - } - ] - } - } - } - waiter_model = WaiterModel(waiter_config) - custom_waiter = create_waiter_with_client('AgentPrepared', waiter_model, agent_client) - - try: # Outer try for both preparation and alias handling - custom_waiter.wait(agentId=agent_id) - logger.info(f"Agent {agent_id} successfully prepared.") - - except Exception as e: # Outer except catches prepare_agent wait errors OR unhandled alias errors - logger.error(f"Agent {agent_id} preparation failed or timed out (or alias error): {e}") - # Check final status if possible - try: - final_status = agent_client.get_agent(agentId=agent_id)['agent']['agentStatus'] - logger.error(f"Final agent status: {final_status}") - except Exception as get_e: - logger.error(f"Could not retrieve final agent status after wait failure: {get_e}") - raise Exception(f"Agent preparation or alias setup failed for {agent_id}") - - except Exception as e: - logger.error(f"Error preparing agent {agent_id}: {e}") - # Handle error, maybe exit - raise e # Re-raise the exception -``` - -### 3.11 Agent Invocation Function - -This subsection provides the function used to interact with the prepared and aliased Bedrock Agent, sending it a prompt and processing its response. 
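Before looking at the helper itself, a minimal calling pattern: each invocation in this notebook uses a fresh session ID. The agent and alias IDs below are placeholders, although `TSTALIASID` is the alias ID Bedrock reserves for testing the `DRAFT` version of an agent:

```python
import uuid

# A fresh session ID per conversation keeps context isolated between runs;
# reusing one session ID continues the same multi-turn conversation instead.
session_id = str(uuid.uuid4())

# Placeholder IDs; in the main flow these come from create_agent / alias setup.
# response_text = test_agent_invocation(
#     bedrock_agent_runtime_client,
#     agent_id="ABCDEFGHIJ",
#     agent_alias_id="TSTALIASID",  # built-in test alias targeting DRAFT
#     session_id=session_id,
#     prompt="Summarize the Cline documents as bullet points",
# )
```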

#### 3.11.1 test_agent_invocation

This function invokes the configured Bedrock Agent and handles its response. Key operations include:
- **Invocation:** Calls `bedrock_agent_runtime_client.invoke_agent` with the `agentId`, `agentAliasId`, a unique `sessionId` (generated for each invocation in this script), the user's `prompt` (as `inputText`), and `enableTrace=True` to get detailed trace information for debugging.
- **Stream Processing:** The agent's response is a stream. The function iterates through the events in this stream (`response.get('completion', [])`).
  - **`chunk` events:** These contain parts of the agent's textual response. The function decodes these byte chunks (UTF-8) and concatenates them to form the `completion_text`.
  - **`trace` events:** If `enableTrace` was true, these events provide detailed insight into the agent's internal operations, such as which foundation model was called, the input to the model, and the agent's rationale. In this Lambda-based approach the tool invocation itself is handled by Bedrock calling the Lambda, but the trace still shows the agent's decision to call the tool and the result it returned. The function collects these trace parts.
- **Logging:** It logs the final combined `completion_text` and a summary of the trace events, which is very helpful for understanding the agent's decision-making process and for debugging issues with tool invocation or response generation.

It returns the final textual response from the agent.
- - -```python -def test_agent_invocation(agent_runtime_client, agent_id, agent_alias_id, session_id, prompt): - """Invokes the agent and prints the response.""" - logger.info(f"--- Testing Agent Invocation (Agent ID: {agent_id}, Alias: {agent_alias_id}) ---") - logger.info(f"Session ID: {session_id}") - logger.info(f"Prompt: \"{prompt}\"") - - try: - response = agent_runtime_client.invoke_agent( - agentId=agent_id, - agentAliasId=agent_alias_id, - sessionId=session_id, - inputText=prompt, - enableTrace=True # Enable trace for debugging - ) - - logger.info("Agent invocation successful. Processing response...") - completion_text = "" - trace_events = [] - - # The response is a stream. Iterate through the chunks. - for event in response.get('completion', []): - if 'chunk' in event: - data = event['chunk'].get('bytes', b'') - decoded_chunk = data.decode('utf-8') - completion_text += decoded_chunk - elif 'trace' in event: - trace_part = event['trace'].get('trace') - if trace_part: - trace_events.append(trace_part) - else: - logger.warning(f"Unhandled event type in stream: {event}") - - # Log final combined response - logger.info(f"--- Agent Final Response ---{completion_text}") - - # Keep trace summary log (optional, can be removed if too verbose) - if trace_events: - logger.info("--- Invocation Trace Summary ---") - for i, trace in enumerate(trace_events): - trace_type = trace.get('type') - step_type = trace.get('orchestration', {}).get('stepType') - model_invocation_input = trace.get('modelInvocationInput') - if model_invocation_input: - fm_input = model_invocation_input.get('text', - json.dumps(model_invocation_input.get('invocationInput',{}).get('toolConfiguration',{})) # Handle tool input - ) - log_line = f"Trace {i+1}: Type={trace_type}, Step={step_type}" - rationale = trace.get('rationale', {}).get('text') - if rationale: log_line += f", Rationale=\"{rationale[:100]}...\"" - logger.info(log_line) # Log summary line - - return completion_text - - except 
ClientError as e: - logger.error(f"Error invoking agent: {e}") - logger.error(traceback.format_exc()) - return None - except Exception as e: - logger.error(f"Unexpected error during agent invocation: {e}") - logger.error(traceback.format_exc()) - return None -``` - -## 4. Main Execution Flow - -This is the primary section of the notebook where all the previously defined helper functions are called in sequence to set up the complete Bedrock Agent environment with a Lambda-backed tool, and then test its invocation. The flow is designed to be largely idempotent where possible, meaning it can often be re-run, and it will attempt to clean up or reuse existing resources before creating new ones (e.g., IAM roles, Lambda functions, agents). The major steps are outlined below: - -### 4.1 Initial Setup - -This first step in the main execution flow performs essential preliminary tasks: -1. Logs a starting message for the script execution. -2. Calls `check_environment_variables()` to ensure all required environment variables (AWS credentials, Couchbase password, etc.) are set. If not, it raises an `EnvironmentError` to halt execution, as the subsequent steps depend on these variables. -3. Calls `initialize_aws_clients()` to get the necessary `boto3` client objects for Bedrock, IAM, Lambda, etc. -4. Calls `connect_couchbase()` to establish a connection to the Couchbase cluster. -If any of these critical initialization steps fail, an exception is raised to stop the notebook's execution, preventing errors in later stages. - - -```python -logger.info("--- Starting Bedrock Agent Experiment Script ---") - -if not check_environment_variables(): - # In a notebook, raising an exception might be better than exit(1) - raise EnvironmentError("Missing required environment variables. 
Check logs.") - -# Initialize all clients, including the agent client -try: - bedrock_runtime_client, iam_client, lambda_client, bedrock_agent_client, bedrock_agent_runtime_client = initialize_aws_clients() - cb_cluster = connect_couchbase() - logger.info("AWS clients and Couchbase connection initialized.") -except Exception as e: - logger.error(f"Initialization failed: {e}") - raise # Re-raise the exception to stop execution -``` - - 2025-06-09 13:39:41,643 - INFO - --- Starting Bedrock Agent Experiment Script --- - 2025-06-09 13:39:41,644 - INFO - All required environment variables are set. - 2025-06-09 13:39:41,644 - INFO - Initializing AWS clients in region: us-east-1 - 2025-06-09 13:39:42,002 - INFO - AWS clients initialized successfully. - 2025-06-09 13:39:42,002 - INFO - Connecting to Couchbase cluster at couchbases://cb.hlcup4o4jmjr55yf.cloud.couchbase.com... - 2025-06-09 13:39:44,131 - INFO - Successfully connected to Couchbase. - 2025-06-09 13:39:44,132 - INFO - AWS clients and Couchbase connection initialized. - - -### 4.2 Couchbase Setup - -This block focuses on preparing the Couchbase environment to serve as the vector store for the agent. It involves: -1. Calling `setup_collection()`: This helper function ensures that the target Couchbase bucket, scope, and collection (defined by `CB_BUCKET_NAME`, `SCOPE_NAME`, `COLLECTION_NAME`) are created if they don't already exist. It also ensures a primary index is present on the collection. -2. Calling `setup_search_index()`: This creates or updates the Couchbase Full-Text Search (FTS) index (named by `INDEX_NAME`) using the definition from `INDEX_JSON_PATH`. This search index is crucial for performing vector similarity searches. -3. Calling `clear_collection()`: This function deletes all existing documents from the target collection. This step ensures that each run of the notebook starts with a clean slate, preventing data from previous experiments from interfering with the current one. 
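As an illustration of what one of these helpers boils down to, here is a hedged sketch of `clear_collection`. The real helper also logs mutation counts, and `build_clear_query` is a name introduced here purely so the statement can be checked in isolation:

```python
# Illustrative sketch, not the notebook's actual clear_collection helper.
def build_clear_query(bucket_name, scope_name, collection_name):
    # Backtick-quote each path component so names containing '-' (like
    # 'vector-search-testing') parse correctly in SQL++ (N1QL).
    return f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`"

def clear_collection(cluster, bucket_name, scope_name, collection_name):
    # Deleting without a WHERE clause relies on the primary index that
    # setup_collection() ensures exists on the collection.
    cluster.query(build_clear_query(bucket_name, scope_name, collection_name)).execute()
```

The unfiltered `DELETE` needs the primary index that `setup_collection()` creates, which is one reason the collection setup step runs first.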
-If any part of this Couchbase setup fails, an exception is logged and re-raised to stop further execution. - - -```python -try: - # Use the setup functions with the script's config variables - cb_collection = setup_collection(cb_cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) - logger.info(f"Couchbase collection '{CB_BUCKET_NAME}.{SCOPE_NAME}.{COLLECTION_NAME}' setup complete.") - - # Pass required args to setup_search_index - setup_search_index(cb_cluster, INDEX_NAME, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME, INDEX_JSON_PATH) - logger.info(f"Couchbase search index '{INDEX_NAME}' setup complete.") - - # Clear any existing documents from previous runs - clear_collection(cb_cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) - logger.info("Cleared any existing documents from the collection.") -except Exception as e: - logger.error(f"Couchbase setup failed: {e}") - raise -``` - - 2025-06-09 13:39:44,140 - INFO - Setting up collection: vector-search-testing/shared/bedrock - 2025-06-09 13:39:45,245 - INFO - Bucket 'vector-search-testing' exists. - 2025-06-09 13:39:46,219 - INFO - Scope 'shared' already exists. - 2025-06-09 13:39:47,149 - INFO - Collection 'bedrock' already exists. - 2025-06-09 13:39:47,152 - INFO - Ensuring primary index exists on `vector-search-testing`.`shared`.`bedrock`... - 2025-06-09 13:39:48,185 - INFO - Primary index present or created successfully. - 2025-06-09 13:39:48,186 - INFO - Collection setup complete. - 2025-06-09 13:39:48,187 - INFO - Couchbase collection 'vector-search-testing.shared.bedrock' setup complete. - 2025-06-09 13:39:48,187 - INFO - Looking for index definition at: /Users/kaustavghosh/Desktop/vector-search-cookbook/awsbedrock-agents/lambda-approach/aws_index.json - 2025-06-09 13:39:48,192 - INFO - Loaded index definition from /Users/kaustavghosh/Desktop/vector-search-cookbook/awsbedrock-agents/lambda-approach/aws_index.json, ensuring name is 'vector_search_bedrock' and source is 'vector-search-testing'. 
- 2025-06-09 13:39:48,193 - INFO - Upserting search index 'vector_search_bedrock'... - 2025-06-09 13:39:48,880 - WARNING - Search index 'vector_search_bedrock' likely already existed (caught QueryIndexAlreadyExistsException, check if applicable). Upsert attempted. - 2025-06-09 13:39:48,881 - INFO - Couchbase search index 'vector_search_bedrock' setup complete. - 2025-06-09 13:39:48,881 - WARNING - Attempting to clear all documents from `vector-search-testing`.`shared`.`bedrock`... - 2025-06-09 13:39:49,141 - WARNING - Could not retrieve mutation count after delete: 'list' object has no attribute 'meta_data' - 2025-06-09 13:39:49,142 - INFO - Successfully cleared documents from the collection (approx. 0 mutations). - 2025-06-09 13:39:49,143 - INFO - Cleared any existing documents from the collection. - - -### 4.3 Vector Store Initialization and Data Loading - -With the Couchbase infrastructure in place, this section prepares the LangChain vector store and populates it with data: -1. **Initialize `BedrockEmbeddings`:** Creates an instance of the `BedrockEmbeddings` client, specifying the `EMBEDDING_MODEL_ID` (e.g., Amazon Titan Text Embeddings V2). This client will be used by the vector store to convert text documents into numerical embeddings for similarity searching. -2. **Initialize `CouchbaseSearchVectorStore`:** Creates an instance of `CouchbaseSearchVectorStore`. This LangChain component acts as an abstraction layer over the Couchbase collection and search index, providing methods for adding documents and performing similarity searches. It's configured with the Couchbase cluster connection, bucket/scope/collection names, the embeddings client, and the search index name. -3. **Load Documents from JSON:** Reads document data from the `DOCS_JSON_PATH` file. This file is expected to contain a list of documents, each with `text` and `metadata` fields. -4. **Add Documents to Vector Store:** If documents are loaded, their texts and metadatas are extracted. 
The `vector_store.add_texts()` method is then called to process these documents: each document's text is converted into an embedding (using the `BedrockEmbeddings` client), and both the text and its embedding (along with metadata) are stored in the Couchbase collection. The search index (`INDEX_NAME`) is then updated to include these new vectors, making them searchable. -Error handling is included to catch issues like file not found or problems during embedding generation or data insertion. - ->Note: `documents.json` contains the documents that we want to load into our vector store. As an example, we have added a few documents to the file from [https://cline.bot/](https://cline.bot/) -Let's load the documents from the documents.json file and add them to our vector store: - - -```python -try: - logger.info(f"Initializing Bedrock Embeddings client with model: {EMBEDDING_MODEL_ID}") - embeddings = BedrockEmbeddings( - client=bedrock_runtime_client, - model_id=EMBEDDING_MODEL_ID - ) - logger.info("Successfully created Bedrock embeddings client.") - - logger.info(f"Initializing CouchbaseSearchVectorStore with index: {INDEX_NAME}") - vector_store = CouchbaseSearchVectorStore( - cluster=cb_cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding=embeddings, - index_name=INDEX_NAME - ) - logger.info("Successfully created Couchbase vector store.") - - # Load documents from JSON file - logger.info(f"Looking for documents at: {DOCS_JSON_PATH}") - if not os.path.exists(DOCS_JSON_PATH): - logger.error(f"Documents file not found: {DOCS_JSON_PATH}") - raise FileNotFoundError(f"Documents file not found: {DOCS_JSON_PATH}") - - with open(DOCS_JSON_PATH, 'r') as f: - data = json.load(f) - documents_to_load = data.get('documents', []) - logger.info(f"Loaded {len(documents_to_load)} documents from {DOCS_JSON_PATH}") - - # Add documents to vector store - if documents_to_load: - logger.info(f"Adding {len(documents_to_load)} documents 
to vector store...") - texts = [doc.get('text', '') for doc in documents_to_load] - metadatas = [] - for i, doc in enumerate(documents_to_load): - metadata_raw = doc.get('metadata', {}) - if isinstance(metadata_raw, str): - try: - metadata = json.loads(metadata_raw) - if not isinstance(metadata, dict): - logger.warning(f"Metadata for doc {i} parsed from string is not a dict: {metadata}. Using empty dict.") - metadata = {} - except json.JSONDecodeError: - logger.warning(f"Could not parse metadata string for doc {i}: {metadata_raw}. Using empty dict.") - metadata = {} - elif isinstance(metadata_raw, dict): - metadata = metadata_raw - else: - logger.warning(f"Metadata for doc {i} is not a string or dict: {metadata_raw}. Using empty dict.") - metadata = {} - metadatas.append(metadata) - - inserted_ids = vector_store.add_texts(texts=texts, metadatas=metadatas) - logger.info(f"Successfully added {len(inserted_ids)} documents to the vector store.") - else: - logger.warning("No documents found in the JSON file to add.") - -except FileNotFoundError as e: - logger.error(f"Setup failed: {e}") - raise -except Exception as e: - logger.error(f"Error during vector store setup or data loading: {e}") - logger.error(traceback.format_exc()) - raise - -logger.info("--- Couchbase Setup and Data Loading Complete ---") -``` - - 2025-06-09 13:39:49,152 - INFO - Initializing Bedrock Embeddings client with model: amazon.titan-embed-text-v2:0 - 2025-06-09 13:39:49,153 - INFO - Successfully created Bedrock embeddings client. - 2025-06-09 13:39:49,153 - INFO - Initializing CouchbaseSearchVectorStore with index: vector_search_bedrock - 2025-06-09 13:39:52,549 - INFO - Successfully created Couchbase vector store. 
    2025-06-09 13:39:52,549 - INFO - Looking for documents at: /Users/kaustavghosh/Desktop/vector-search-cookbook/awsbedrock-agents/lambda-approach/documents.json
    2025-06-09 13:39:52,551 - INFO - Loaded 7 documents from /Users/kaustavghosh/Desktop/vector-search-cookbook/awsbedrock-agents/lambda-approach/documents.json
    2025-06-09 13:39:52,551 - INFO - Adding 7 documents to vector store...
    2025-06-09 13:39:56,544 - INFO - Successfully added 7 documents to the vector store.
    2025-06-09 13:39:56,545 - INFO - --- Couchbase Setup and Data Loading Complete ---


### 4.4 Create IAM Role

This step ensures that the IAM (Identity and Access Management) role needed by the Bedrock Agent and its Lambda function is in place.
- It defines an `agent_role_name` (e.g., `bedrock_agent_lambda_exp_role`).
- It calls the `create_agent_role()` helper function. This function (described in section 3.7) either creates a new IAM role with this name or updates an existing one.
- The role is configured with a trust policy allowing both the Bedrock service and the Lambda service to assume it.
- It attaches the necessary permissions policies, including `AWSLambdaBasicExecutionRole` for Lambda logging and custom inline policies for Bedrock access and any other required permissions.
- The AWS Account ID, needed for defining precise resource ARNs in policies, is fetched dynamically via the STS client if it is not already available as an environment variable.

The ARN of this role (`agent_role_arn`) is stored, as it is a required parameter for creating both the Bedrock Agent and the AWS Lambda function that the agent will invoke.
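The dual-service trust relationship described above can be sketched as follows. This is an illustrative policy document, not the notebook's `create_agent_role` implementation:

```python
import json

# A single role assumable by BOTH services: Bedrock assumes it to run the
# agent, and Lambda assumes it to execute the tool function.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": ["bedrock.amazonaws.com", "lambda.amazonaws.com"]
            },
            "Action": "sts:AssumeRole",
        }
    ],
}

# IAM expects the policy as a JSON string, e.g.:
# iam_client.create_role(RoleName=agent_role_name,
#                        AssumeRolePolicyDocument=json.dumps(trust_policy))
trust_policy_json = json.dumps(trust_policy)
```

Sharing one role keeps the example simple; in production you would typically give the agent and the Lambda separate, least-privilege roles.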
- - -```python -agent_role_name = "bedrock_agent_lambda_exp_role" -try: - # Ensure AWS_ACCOUNT_ID is loaded correctly - if not AWS_ACCOUNT_ID: - logger.info("Attempting to fetch AWS Account ID...") - sts_client = boto3.client('sts', region_name=AWS_REGION) - AWS_ACCOUNT_ID = sts_client.get_caller_identity().get('Account') - if not AWS_ACCOUNT_ID: - raise ValueError("AWS Account ID could not be determined. Please set the AWS_ACCOUNT_ID environment variable.") - logger.info(f"Fetched AWS Account ID: {AWS_ACCOUNT_ID}") - - agent_role_arn = create_agent_role(iam_client, agent_role_name, AWS_ACCOUNT_ID) - logger.info(f"Agent IAM Role ARN: {agent_role_arn}") -except Exception as e: - logger.error(f"Failed to create/verify IAM role: {e}") - logger.error(traceback.format_exc()) - raise -``` - - 2025-06-09 13:39:56,553 - INFO - Checking/Creating IAM role: bedrock_agent_lambda_exp_role - 2025-06-09 13:39:57,454 - INFO - IAM role 'bedrock_agent_lambda_exp_role' already exists with ARN: arn:aws:iam::598307997273:role/bedrock_agent_lambda_exp_role - 2025-06-09 13:39:57,454 - INFO - Updating trust policy for existing role 'bedrock_agent_lambda_exp_role'... - 2025-06-09 13:39:57,710 - INFO - Trust policy updated for role 'bedrock_agent_lambda_exp_role'. - 2025-06-09 13:39:57,710 - INFO - Attaching basic Lambda execution policy to role 'bedrock_agent_lambda_exp_role'... - 2025-06-09 13:39:57,973 - INFO - Attached basic Lambda execution policy. - 2025-06-09 13:39:57,974 - INFO - Putting basic inline policy 'LambdaBasicLoggingPermissions' for role 'bedrock_agent_lambda_exp_role'... - 2025-06-09 13:39:58,240 - INFO - Successfully put inline policy 'LambdaBasicLoggingPermissions'. - 2025-06-09 13:39:58,240 - INFO - Putting Bedrock permissions policy 'BedrockAgentPermissions' for role 'bedrock_agent_lambda_exp_role'... - 2025-06-09 13:39:58,607 - INFO - Successfully put inline policy 'BedrockAgentPermissions'. 
- 2025-06-09 13:39:58,608 - INFO - Waiting 10s for policy changes to propagate... - 2025-06-09 13:40:08,612 - INFO - Agent IAM Role ARN: arn:aws:iam::598307997273:role/bedrock_agent_lambda_exp_role - - -### 4.5 Deploy Lambda Function - -This section orchestrates the deployment of the AWS Lambda function that will execute the agent's `searchAndFormatDocuments` tool. The process involves several steps managed by the helper functions: -1. **Define Lambda Details:** Specifies the `search_format_lambda_name` (e.g., `bedrock_agent_search_format_exp`), the `lambda_source_dir` (where the Lambda's Python script and `Makefile` are located), and `lambda_build_dir` (where the final .zip package will be placed). -2. **Cleanup Old Lambdas (Optional but Recommended):** Calls `delete_lambda_function` for potentially conflicting older Lambda functions (e.g., separate researcher/writer Lambdas from previous experiments or an old version of the current combined Lambda). This ensures a cleaner environment, especially during iterative development. -3. **Package Lambda:** Calls `package_function()`. This helper (described in 3.8.3) uses the `Makefile` in `lambda_source_dir` to install dependencies, prepare the handler script (`bedrock_agent_search_and_format.py`), and create a .zip deployment package (`search_format_zip_path`). -4. **Create/Update Lambda in AWS:** Calls `create_lambda_function()`. This helper (described in 3.8.4) takes the .zip package and either creates a new Lambda function in AWS or updates an existing one. It handles S3 upload for large packages, sets environment variables (like Couchbase connection info and Bedrock model IDs), configures the IAM role, runtime, handler, timeout, and memory. It also adds permissions for Bedrock to invoke the Lambda and waits for the Lambda to become active. -5. **Cleanup Deployment Package:** After successful deployment, the local .zip file is removed to save space. 
-The ARN of the deployed Lambda (`search_format_lambda_arn`) is stored, as it's needed to link this Lambda to the Bedrock Agent's action group.
-
-
-```python
-search_format_lambda_name = "bedrock_agent_search_format_exp"
-# Adjust source/build dirs for notebook context if necessary
-lambda_source_dir = os.path.join(SCRIPT_DIR, 'lambda_functions')
-lambda_build_dir = SCRIPT_DIR # Final zip ends up in the notebook's directory
-
-logger.info("--- Starting Lambda Deployment (Single Function) --- ")
-search_format_lambda_arn = None
-search_format_zip_path = None
-
-try:
-    # Delete old lambdas if they exist (optional, but good cleanup)
-    logger.info("Deleting potentially conflicting old Lambda functions...")
-    delete_lambda_function(lambda_client, "bedrock_agent_researcher_exp")
-    delete_lambda_function(lambda_client, "bedrock_agent_writer_exp")
-    # Delete the new lambda if it exists from a previous run
-    delete_lambda_function(lambda_client, search_format_lambda_name)
-    logger.info("Old Lambda deletion checks complete.")
-
-    logger.info(f"Packaging Lambda function '{search_format_lambda_name}'...")
-    search_format_zip_path = package_function("bedrock_agent_search_and_format", lambda_source_dir, lambda_build_dir)
-    logger.info(f"Lambda function packaged at: {search_format_zip_path}")
-
-    logger.info(f"Creating/Updating Lambda function '{search_format_lambda_name}'...")
-    search_format_lambda_arn = create_lambda_function(
-        lambda_client=lambda_client, function_name=search_format_lambda_name,
-        handler='lambda_function.lambda_handler', role_arn=agent_role_arn,
-        zip_file=search_format_zip_path, region=AWS_REGION
-    )
-    logger.info(f"Search/Format Lambda Deployed: {search_format_lambda_arn}")
-
-except FileNotFoundError as e:
-    logger.error(f"Lambda packaging failed: Required file not found. {e}")
-    raise
-except Exception as e:
-    logger.error(f"Lambda deployment failed: {e}")
-    logger.error(traceback.format_exc())
-    raise
-finally:
-    logger.info("Cleaning up deployment zip file...")
-    if search_format_zip_path and os.path.exists(search_format_zip_path):
-        try:
-            os.remove(search_format_zip_path)
-            logger.info(f"Removed zip file: {search_format_zip_path}")
-        except OSError as e:
-            logger.warning(f"Could not remove zip file {search_format_zip_path}: {e}")
-
-logger.info("--- Lambda Deployment Complete --- ")
-```
-
-    2025-06-09 13:40:08,624 - INFO - --- Starting Lambda Deployment (Single Function) ---
-    2025-06-09 13:40:08,626 - INFO - Deleting potentially conflicting old Lambda functions...
-    2025-06-09 13:40:08,626 - INFO - Attempting to delete Lambda function: bedrock_agent_researcher_exp...
-    2025-06-09 13:40:08,626 - INFO - Attempting to remove permission AllowBedrockInvokeBasic-bedrock_agent_researcher_exp from bedrock_agent_researcher_exp...
-    2025-06-09 13:40:09,490 - INFO - Permission AllowBedrockInvokeBasic-bedrock_agent_researcher_exp not found on bedrock_agent_researcher_exp. Skipping removal.
-    2025-06-09 13:40:09,794 - INFO - Lambda function 'bedrock_agent_researcher_exp' does not exist. No need to delete.
-    2025-06-09 13:40:09,795 - INFO - Attempting to delete Lambda function: bedrock_agent_writer_exp...
-    2025-06-09 13:40:09,796 - INFO - Attempting to remove permission AllowBedrockInvokeBasic-bedrock_agent_writer_exp from bedrock_agent_writer_exp...
-    2025-06-09 13:40:10,082 - INFO - Permission AllowBedrockInvokeBasic-bedrock_agent_writer_exp not found on bedrock_agent_writer_exp. Skipping removal.
-    2025-06-09 13:40:10,387 - INFO - Lambda function 'bedrock_agent_writer_exp' does not exist. No need to delete.
-    2025-06-09 13:40:10,387 - INFO - Attempting to delete Lambda function: bedrock_agent_search_format_exp... 
- 2025-06-09 13:40:10,387 - INFO - Attempting to remove permission AllowBedrockInvokeBasic-bedrock_agent_search_format_exp from bedrock_agent_search_format_exp... - 2025-06-09 13:40:10,686 - INFO - Successfully removed permission AllowBedrockInvokeBasic-bedrock_agent_search_format_exp from bedrock_agent_search_format_exp. - 2025-06-09 13:40:13,060 - INFO - Function bedrock_agent_search_format_exp exists. Deleting... - 2025-06-09 13:40:13,594 - INFO - Waiting for bedrock_agent_search_format_exp to be deleted... - 2025-06-09 13:40:23,596 - INFO - Function bedrock_agent_search_format_exp deletion initiated. - 2025-06-09 13:40:23,597 - INFO - Old Lambda deletion checks complete. - 2025-06-09 13:40:23,598 - INFO - Packaging Lambda function 'bedrock_agent_search_format_exp'... - 2025-06-09 13:40:23,599 - INFO - --- Packaging function bedrock_agent_search_and_format --- - 2025-06-09 13:40:23,602 - INFO - Source Dir (Makefile location & make cwd): /Users/kaustavghosh/Desktop/vector-search-cookbook/awsbedrock-agents/lambda-approach/lambda_functions - 2025-06-09 13:40:23,602 - INFO - Build Dir (Final zip location): /Users/kaustavghosh/Desktop/vector-search-cookbook/awsbedrock-agents/lambda-approach - 2025-06-09 13:40:23,603 - INFO - Copying /Users/kaustavghosh/Desktop/vector-search-cookbook/awsbedrock-agents/lambda-approach/lambda_functions/bedrock_agent_search_and_format.py to /Users/kaustavghosh/Desktop/vector-search-cookbook/awsbedrock-agents/lambda-approach/lambda_functions/lambda_function.py - 2025-06-09 13:40:23,605 - INFO - Running make command: make -f /Users/kaustavghosh/Desktop/vector-search-cookbook/awsbedrock-agents/lambda-approach/lambda_functions/Makefile clean package (in /Users/kaustavghosh/Desktop/vector-search-cookbook/awsbedrock-agents/lambda-approach/lambda_functions) - 2025-06-09 13:40:50,341 - INFO - Make command completed successfully. 
-    2025-06-09 13:40:50,343 - INFO - Moving and renaming /Users/kaustavghosh/Desktop/vector-search-cookbook/awsbedrock-agents/lambda-approach/lambda_functions/lambda_package.zip to /Users/kaustavghosh/Desktop/vector-search-cookbook/awsbedrock-agents/lambda-approach/bedrock_agent_search_and_format.zip
-
-    ... (output truncated for brevity)
-
-
-### 4.6 Agent Setup
-
-This part of the script focuses on creating the Bedrock Agent itself.
-1. **Define Agent Name:** An `agent_name` is defined (e.g., `couchbase_search_format_agent_exp`).
-2. **Cleanup Existing Agent (Idempotency):** It calls `delete_agent_and_resources()` first. This helper function (described in 3.9.3) attempts to find an agent with the same name and, if found, deletes it along with its action groups and aliases. This ensures that each run starts with a clean slate for the agent, preventing conflicts or issues from previous configurations.
-3. **Create New Agent:** After the cleanup attempt, it calls `create_agent()`. This helper function (described in 3.10.1) creates a new Bedrock Agent with the specified name, the IAM role ARN (`agent_role_arn`), the foundation model ID (`AGENT_MODEL_ID`), and a set of instructions guiding the agent on how to behave and use its tools.
-The `agent_id` and `agent_arn` returned by `create_agent()` are stored for subsequent steps like creating action groups and preparing the agent.
-
-
-```python
-agent_name = "couchbase_search_format_agent_exp"
-agent_id = None
-agent_arn = None
-alias_name = "prod" # Define alias name here
-# agent_alias_id_to_use will be set later after preparation
-
-# 1. Attempt to find and delete existing agent to ensure a clean state
-logger.info(f"Checking for and deleting existing agent: {agent_name}")
-try:
-    delete_agent_and_resources(bedrock_agent_client, agent_name) # Handles finding and deleting
-    logger.info(f"Deletion process completed for any existing agent named {agent_name}.")
-except Exception as e:
-    # Log error during find/delete but proceed to creation attempt
-    logger.error(f"Error during agent finding/deletion phase: {e}. Proceeding to creation attempt.")
-
-# 2. Always attempt to create the agent after the delete phase
-logger.info(f"--- Creating Agent: {agent_name} ---")
-try:
-    agent_id, agent_arn = create_agent(
-        agent_client=bedrock_agent_client,
-        agent_name=agent_name,
-        agent_role_arn=agent_role_arn,
-        foundation_model_id=AGENT_MODEL_ID
-    )
-    if not agent_id:
-        raise Exception("create_agent function did not return a valid agent ID.")
-    logger.info(f"Agent created successfully. ID: {agent_id}, ARN: {agent_arn}")
-except Exception as e:
-    logger.error(f"Failed to create agent '{agent_name}': {e}")
-    logger.error(traceback.format_exc())
-    raise
-```
-
-    2025-06-09 13:41:12,317 - INFO - Checking for and deleting existing agent: couchbase_search_format_agent_exp
-    2025-06-09 13:41:12,318 - INFO - Attempting to find agent by name: couchbase_search_format_agent_exp
-    2025-06-09 13:41:13,172 - INFO - Found agent 'couchbase_search_format_agent_exp' with ID: 8CZXA8LJJH
-    2025-06-09 13:41:13,172 - WARNING - --- Deleting Agent Resources for 'couchbase_search_format_agent_exp' (ID: 8CZXA8LJJH) ---
-    2025-06-09 13:41:13,172 - INFO - Listing action groups for agent 8CZXA8LJJH...
-    2025-06-09 13:41:13,472 - INFO - Found 1 action groups to delete.
-    2025-06-09 13:41:13,473 - INFO - Attempting to delete action group GKWWTGZVHJ for agent 8CZXA8LJJH...
-    2025-06-09 13:41:13,794 - INFO - Successfully deleted action group GKWWTGZVHJ for agent 8CZXA8LJJH. 
- 2025-06-09 13:41:18,797 - INFO - Attempting to delete agent 8CZXA8LJJH ('couchbase_search_format_agent_exp')... - 2025-06-09 13:41:19,108 - INFO - Waiting up to 2 minutes for agent 8CZXA8LJJH deletion... - 2025-06-09 13:41:25,109 - INFO - Agent 8CZXA8LJJH successfully deleted. - 2025-06-09 13:41:25,110 - INFO - --- Agent Resource Deletion Complete for 'couchbase_search_format_agent_exp' --- - 2025-06-09 13:41:25,111 - INFO - Deletion process completed for any existing agent named couchbase_search_format_agent_exp. - 2025-06-09 13:41:25,111 - INFO - --- Creating Agent: couchbase_search_format_agent_exp --- - 2025-06-09 13:41:25,112 - INFO - --- Creating Agent: couchbase_search_format_agent_exp --- - 2025-06-09 13:41:25,623 - INFO - Agent creation initiated. Name: couchbase_search_format_agent_exp, ID: 7BTR61MXVF, ARN: arn:aws:bedrock:us-east-1:598307997273:agent/7BTR61MXVF, Status: CREATING - 2025-06-09 13:41:25,625 - INFO - Waiting for agent 7BTR61MXVF to reach initial state... - 2025-06-09 13:41:26,201 - INFO - Agent 7BTR61MXVF status: CREATING - 2025-06-09 13:41:31,658 - INFO - Agent 7BTR61MXVF status: NOT_PREPARED - 2025-06-09 13:41:32,110 - INFO - Agent 7BTR61MXVF successfully created (Status: NOT_PREPARED). - 2025-06-09 13:41:32,111 - INFO - Agent created successfully. ID: 7BTR61MXVF, ARN: arn:aws:bedrock:us-east-1:598307997273:agent/7BTR61MXVF - - -### 4.7 Action Group Setup - -Once the agent is created and the Lambda function is deployed, this step links them together by creating an Action Group. -- It defines an `action_group_name` (e.g., `SearchAndFormatActionGroup`). -- It calls the `create_action_group()` helper function (described in 3.10.2). This function is responsible for: - - Taking the `agent_id` and the `search_format_lambda_arn` (the ARN of the deployed Lambda function) as input. 
-  - Defining the `functionSchema` which tells the agent how to use the Lambda function (i.e., the tool name `searchAndFormatDocuments` and its parameters like `query`, `k`, `style`).
-  - Setting the `actionGroupExecutor` to point to the Lambda ARN, so Bedrock knows which Lambda to invoke.
-  - Creating a new action group or updating an existing one with the same name for the `DRAFT` version of the agent.
-- A 30-second pause (`time.sleep(30)`) is added after the action group setup. This is a crucial step to give AWS services enough time to propagate the changes and ensure that the agent is aware of the newly configured or updated action group before proceeding to the preparation phase. Without such a delay, the preparation step might fail or not correctly incorporate the action group.
-
-
-```python
-# --- Action Group Creation/Update (Now assumes agent_id is valid) ---
-action_group_name = "SearchAndFormatActionGroup"
-action_group_id = None
-try:
-    if not agent_id:
-        raise ValueError("Agent ID is not set. Cannot create action group.")
-    if not search_format_lambda_arn:
-        raise ValueError("Lambda ARN is not set. Cannot create action group.")
-
-    logger.info(f"Creating/Updating Action Group '{action_group_name}' for agent {agent_id}...")
-    action_group_id = create_action_group(
-        agent_client=bedrock_agent_client,
-        agent_id=agent_id,
-        action_group_name=action_group_name,
-        function_arn=search_format_lambda_arn,
-        # schema_path=None # No longer needed explicitly if default is None
-    )
-    if not action_group_id:
-        raise Exception("create_action_group did not return a valid ID.")
-    logger.info(f"Action Group '{action_group_name}' created/updated with ID: {action_group_id}")
-
-    # Add a slightly longer wait after action group modification/creation
-    logger.info("Waiting 30s after action group setup before preparing agent...")
-    time.sleep(30)
-except Exception as e:
-    logger.error(f"Failed to set up action group: {e}")
-    logger.error(traceback.format_exc())
-    raise
-```
-
-    2025-06-09 13:41:32,119 - INFO - Creating/Updating Action Group 'SearchAndFormatActionGroup' for agent 7BTR61MXVF...
-    2025-06-09 13:41:32,120 - INFO - --- Creating/Updating Action Group (Function Details): SearchAndFormatActionGroup for Agent: 7BTR61MXVF ---
-    2025-06-09 13:41:32,121 - INFO - Lambda ARN: arn:aws:lambda:us-east-1:598307997273:function:bedrock_agent_search_format_exp
-    2025-06-09 13:41:32,122 - INFO - Checking if action group 'SearchAndFormatActionGroup' already exists for agent 7BTR61MXVF DRAFT version...
-    2025-06-09 13:41:32,412 - INFO - Action group 'SearchAndFormatActionGroup' does not exist. Creating new with Function Details.
-    2025-06-09 13:41:32,806 - INFO - Successfully created Action Group 'SearchAndFormatActionGroup' with ID: 7XTTI9XFOX using Function Details.
-    2025-06-09 13:41:37,812 - INFO - Action Group 'SearchAndFormatActionGroup' created/updated with ID: 7XTTI9XFOX
-    2025-06-09 13:41:37,812 - INFO - Waiting 30s after action group setup before preparing agent... 
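To make the action group configuration concrete, here is a minimal sketch of the request that a helper like `create_action_group()` might assemble. The schema shape follows Bedrock's function-details format for `create_agent_action_group`; `FUNCTION_SCHEMA`, `build_action_group_request`, and the parameter descriptions are illustrative assumptions, not the cookbook's exact code.

```python
# Hypothetical function schema for the searchAndFormatDocuments tool described
# above; parameter names (query, k, style) come from the tool description.
FUNCTION_SCHEMA = {
    "functions": [
        {
            "name": "searchAndFormatDocuments",
            "description": "Search the Couchbase vector store and format the hits.",
            "parameters": {
                "query": {"description": "Natural-language search query", "type": "string", "required": True},
                "k": {"description": "Number of results to return", "type": "integer", "required": False},
                "style": {"description": "Output style, e.g. 'bullet points'", "type": "string", "required": False},
            },
        }
    ]
}

def build_action_group_request(agent_id: str, lambda_arn: str) -> dict:
    """Assemble kwargs for bedrock_agent_client.create_agent_action_group()."""
    return {
        "agentId": agent_id,
        "agentVersion": "DRAFT",  # action groups attach to the DRAFT version
        "actionGroupName": "SearchAndFormatActionGroup",
        "actionGroupExecutor": {"lambda": lambda_arn},  # which Lambda Bedrock invokes
        "functionSchema": FUNCTION_SCHEMA,
    }
```

Passing the returned dict to `bedrock_agent_client.create_agent_action_group(**kwargs)` would mirror what the helper does internally.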
- - -### 4.8 Prepare Agent and Handle Alias - -After the agent and its action group (linking to the Lambda tool) are defined, this section makes the agent ready for use and assigns an alias to it: -1. **Prepare Agent:** It calls the `prepare_agent()` helper function (described in 3.10.3). This function initiates the preparation process for the `DRAFT` version of the agent and uses a custom waiter to wait until the agent's status becomes `PREPARED`. This step is vital as it compiles all agent configurations. -2. **Alias Handling (Create or Update):** Once the agent is successfully prepared: - - An `alias_name` (e.g., `prod`) is defined. - - The code checks if an alias with this name already exists for the agent using `list_agent_aliases`. - - If the alias exists, its ID (`agent_alias_id_to_use`) is retrieved. The notebook assumes the existing alias will correctly point to the latest prepared version (DRAFT) or could be updated if necessary (though direct update logic for the alias to point to a specific version isn't explicitly shown here beyond creation). - - If the alias does not exist, `create_agent_alias()` is called. This creates a new alias that, by default, points to the latest prepared version of the agent (which is the `DRAFT` version that was just prepared). - - A brief pause (`time.sleep(10)`) is added to allow the alias changes to propagate. -The `agent_alias_id_to_use` is now ready for invoking the agent. 
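The custom waiter mentioned in step 1 can be approximated with a simple polling loop. `wait_until_prepared` below is a hedged sketch, not the cookbook's actual `prepare_agent()` helper; it assumes the standard `bedrock-agent` client's `prepare_agent` and `get_agent` calls.

```python
import time

def wait_until_prepared(agent_client, agent_id: str, timeout_s: int = 600, poll_s: float = 15.0) -> str:
    """Kick off preparation of the DRAFT version, then poll get_agent()
    until the agent reaches PREPARED (a stand-in for the custom waiter)."""
    agent_client.prepare_agent(agentId=agent_id)
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        status = agent_client.get_agent(agentId=agent_id)["agent"]["agentStatus"]
        if status == "PREPARED":
            return status
        if status == "FAILED":
            raise RuntimeError(f"Agent {agent_id} failed to prepare")
        time.sleep(poll_s)
    raise TimeoutError(f"Agent {agent_id} still not PREPARED after {timeout_s}s")
```

The real helper adds logging and a longer backoff, but the control flow is the same: initiate, poll, and fail fast on a terminal error state.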
-
-
-```python
-agent_alias_id_to_use = None # Initialize alias ID
-alias_name = "prod" # Make sure alias_name is defined
-if agent_id:
-    logger.info(f"--- Preparing Agent: {agent_id} ---")
-    preparation_successful = False
-    try:
-        # prepare_agent now ONLY prepares, doesn't handle alias or return its ID
-        prepare_agent(bedrock_agent_client, agent_id)
-        logger.info(f"Agent {agent_id} preparation seems complete (waiter succeeded).")
-        preparation_successful = True # Flag success
-
-    except Exception as e: # Catch errors from preparation
-        logger.error(f"Error during agent preparation for {agent_id}: {e}")
-        logger.error(traceback.format_exc())
-        raise
-
-    # --- Alias Handling (runs only if preparation succeeded) ---
-    if preparation_successful:
-        logger.info(f"--- Setting up Alias '{alias_name}' for Agent {agent_id} ---") # Add log
-        try:
-            # --- Alias Creation/Update Logic (Copied/adapted from main.py's __main__) ---
-            logger.info(f"Checking for alias '{alias_name}' for agent {agent_id}...")
-            existing_alias = None
-            paginator = bedrock_agent_client.get_paginator('list_agent_aliases')
-            for page in paginator.paginate(agentId=agent_id):
-                for alias_summary in page.get('agentAliasSummaries', []):
-                    if alias_summary.get('agentAliasName') == alias_name:
-                        existing_alias = alias_summary
-                        break
-                if existing_alias:
-                    break
-
-            if existing_alias:
-                agent_alias_id_to_use = existing_alias['agentAliasId']
-                logger.info(f"Using existing alias '{alias_name}' with ID: {agent_alias_id_to_use}.")
-                # Optional: Update alias to point to DRAFT if needed,
-                # but create_agent_alias defaults to latest prepared (DRAFT) so just checking existence is often enough.
-            else:
-                logger.info(f"Alias '{alias_name}' not found. Creating new alias...")
-                create_alias_response = bedrock_agent_client.create_agent_alias(
-                    agentId=agent_id,
-                    agentAliasName=alias_name
-                    # routingConfiguration removed - defaults to latest prepared (DRAFT)
-                )
-                agent_alias_id_to_use = create_alias_response.get('agentAlias', {}).get('agentAliasId')
-                logger.info(f"Successfully created alias '{alias_name}' with ID: {agent_alias_id_to_use}. (Defaults to latest prepared version - DRAFT)")
-
-            if not agent_alias_id_to_use:
-                raise ValueError(f"Failed to get a valid alias ID for '{alias_name}'")
-
-            logger.info(f"Waiting 10s for alias '{alias_name}' changes to propagate...")
-            time.sleep(10)
-            logger.info(f"Agent {agent_id} preparation and alias '{alias_name}' ({agent_alias_id_to_use}) setup complete.")
-
-
-        except Exception as alias_e: # Catch errors from alias logic
-            logger.error(f"Failed to create/update alias '{alias_name}' for agent {agent_id}: {alias_e}")
-            logger.error(traceback.format_exc())
-            raise
-else:
-    logger.error("Agent ID not available, skipping preparation and alias setup.")
-```
-
-    2025-06-09 13:42:07,835 - INFO - --- Preparing Agent: 7BTR61MXVF ---
-    2025-06-09 13:42:07,836 - INFO - --- Preparing Agent: 7BTR61MXVF ---
-    2025-06-09 13:42:08,338 - INFO - Agent preparation initiated for version 'DRAFT'. Status: PREPARING. Prepared At: 2025-06-09 08:12:08.237735+00:00
-    2025-06-09 13:42:08,338 - INFO - Waiting for agent 7BTR61MXVF preparation to complete (up to 10 minutes)...
-    2025-06-09 13:42:39,237 - INFO - Agent 7BTR61MXVF successfully prepared.
-    2025-06-09 13:42:39,238 - INFO - Agent 7BTR61MXVF preparation seems complete (waiter succeeded).
-    2025-06-09 13:42:39,238 - INFO - --- Setting up Alias 'prod' for Agent 7BTR61MXVF ---
-    2025-06-09 13:42:39,239 - INFO - Checking for alias 'prod' for agent 7BTR61MXVF...
-    2025-06-09 13:42:39,526 - INFO - Alias 'prod' not found. Creating new alias...
-    2025-06-09 13:42:39,902 - INFO - Successfully created alias 'prod' with ID: Y8YNYUDFFZ. 
(Defaults to latest prepared version - DRAFT) - 2025-06-09 13:42:39,903 - INFO - Waiting 10s for alias 'prod' changes to propagate... - 2025-06-09 13:42:49,907 - INFO - Agent 7BTR61MXVF preparation and alias 'prod' (Y8YNYUDFFZ) setup complete. - - -### 4.9 Test Agent Invocation - -This is the final operational step where the fully configured Bedrock Agent is tested. -- It first checks if both `agent_id` and `agent_alias_id_to_use` are available (i.e., the previous setup steps were successful). -- A unique `session_id` is generated for this specific interaction. -- A `test_prompt` is defined (e.g., "Search for information about Project Chimera and format the results using bullet points."). This prompt is designed to trigger the agent's tool (`searchAndFormatDocuments`). -- It then calls the `test_agent_invocation()` helper function (described in 3.11.1). This function sends the prompt to the Bedrock Agent Runtime using the specified agent ID and alias ID. -- The `test_agent_invocation` function handles the streaming response from the agent, concatenates the text chunks, logs trace information for debugging, and prints the agent's final completion. -This step demonstrates an end-to-end test of the agent: receiving a prompt, deciding to use its Lambda-backed tool, Bedrock invoking the Lambda, the Lambda executing (performing search and formatting), returning results to the agent, and the agent formulating a final response to the user. - - -```python -# --- Test Invocation --- -# Agent ID and custom alias ID should be valid here -if agent_id and agent_alias_id_to_use: # Check both are set - session_id = str(uuid.uuid4()) - test_prompt = "Search for information about Project Chimera and format the results using bullet points." 
- logger.info(f"--- Invoking Agent {agent_id} using Alias '{alias_name}' ({agent_alias_id_to_use}) ---") # Updated log - try: - completion = test_agent_invocation( - agent_runtime_client=bedrock_agent_runtime_client, - agent_id=agent_id, - agent_alias_id=agent_alias_id_to_use, - session_id=session_id, - prompt=test_prompt - ) - if completion is None: - logger.error("Agent invocation failed.") - except Exception as e: - logger.error(f"Error during test invocation: {e}") - logger.error(traceback.format_exc()) -else: - logger.error("Agent ID or Alias ID not available, skipping invocation test.") -``` - - 2025-06-09 13:42:49,923 - INFO - --- Invoking Agent 7BTR61MXVF using Alias 'prod' (Y8YNYUDFFZ) --- - 2025-06-09 13:42:49,924 - INFO - --- Testing Agent Invocation (Agent ID: 7BTR61MXVF, Alias: Y8YNYUDFFZ) --- - 2025-06-09 13:42:49,925 - INFO - Session ID: 6529a5a7-0b58-4c7d-8682-20353a8f09c3 - 2025-06-09 13:42:49,925 - INFO - Prompt: "Search for information about Project Chimera and format the results using bullet points." - 2025-06-09 13:42:50,894 - INFO - Agent invocation successful. Processing response... - 2025-06-09 13:43:05,971 - INFO - --- Agent Final Response ---• Project Chimera combines quantum entanglement communication with neural networks for secure, real-time data analysis across distributed nodes. Lead developer: Dr. Aris Thorne. - - • Chimera operates in two modes: - - 'Quantum Sync' for high-fidelity data transfer - - 'Neural Inference' for localized edge processing based on the synced data. - - • A key aspect of Chimera is its "Ephemeral Key Protocol" (EKP), which generates one-time quantum keys for each transmission, ensuring absolute forward secrecy. 
-    2025-06-09 13:43:05,973 - INFO - --- Invocation Trace Summary ---
-    2025-06-09 13:43:05,975 - INFO - Trace 1: Type=None, Step=None
-    2025-06-09 13:43:05,975 - INFO - Trace 2: Type=None, Step=None
-    2025-06-09 13:43:05,976 - INFO - Trace 3: Type=None, Step=None
-    2025-06-09 13:43:05,976 - INFO - Trace 4: Type=None, Step=None
-    2025-06-09 13:43:05,977 - INFO - Trace 5: Type=None, Step=None
-    2025-06-09 13:43:05,977 - INFO - Trace 6: Type=None, Step=None
-    2025-06-09 13:43:05,978 - INFO - Trace 7: Type=None, Step=None
-    2025-06-09 13:43:05,978 - INFO - Trace 8: Type=None, Step=None
-
-
-## Conclusion
-
-In this notebook, we've demonstrated the Lambda approach for implementing AWS Bedrock agents with Couchbase Vector Search. This approach allows the agent to invoke AWS Lambda functions to execute operations, providing better scalability and separation of concerns.
-
-Key components of this implementation include:
-
-1. **Vector Store Setup**: We set up a Couchbase vector store to store and search documents using semantic similarity.
-2. **Lambda Function Deployment**: We deployed a single Lambda function that handles the agent's function calls.
-3. **Agent Creation**: We created one Bedrock Agent whose combined `searchAndFormatDocuments` tool both searches documents and formats the results.
-4. **Lambda Integration**: We integrated the agent with its Lambda function, allowing it to execute operations in a serverless environment.
-
-This approach is particularly useful for production environments where scalability and separation of concerns are important. The Lambda function can be deployed independently and can access other AWS services, providing more flexibility and power. 
diff --git a/tutorial/markdown/generated/vector-search-cookbook/awsbedrock-fts-RAG_with_Couchbase_and_Bedrock.md b/tutorial/markdown/generated/vector-search-cookbook/awsbedrock-fts-RAG_with_Couchbase_and_Bedrock.md deleted file mode 100644 index 0288ed3..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/awsbedrock-fts-RAG_with_Couchbase_and_Bedrock.md +++ /dev/null @@ -1,814 +0,0 @@ ---- -# frontmatter -path: "/tutorial-aws-bedrock-couchbase-rag-with-fts" -title: Retrieval-Augmented Generation (RAG) with Couchbase and Amazon Bedrock using FTS service -short_title: RAG with Couchbase and Amazon Bedrock using FTS service -description: - - Learn how to build a semantic search engine using Couchbase and Amazon Bedrock using FTS service. - - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with Amazon Bedrock's Titan embeddings and Claude language model. - - You'll understand how to perform Retrieval-Augmented Generation (RAG) using LangChain and Couchbase. -content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - FTS - - Artificial Intelligence - - LangChain - - Amazon Bedrock -sdk_language: - - python -length: 60 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/awsbedrock/fts/RAG_with_Couchbase_and_Bedrock.ipynb) - -# Introduction - -In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database and [Amazon Bedrock](https://aws.amazon.com/bedrock/) as both the embedding and language model provider. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. 
This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system using the FTS service from scratch. Alternatively, if you want to perform semantic search using a GSI index, take a look at [this tutorial](https://developer.couchbase.com/tutorial-aws-bedrock-couchbase-rag-with-global-secondary-index/).
-
-# How to run this tutorial
-
-This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/awsbedrock/fts/RAG_with_Couchbase_and_Bedrock.ipynb).
-
-You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.
-
-# Before you start
-
-## Get Credentials for AWS Bedrock
-* Please follow the [instructions](https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started.html) to set up AWS Bedrock and generate credentials.
-* Ensure you have the necessary IAM permissions to access Bedrock services.
-
-## Create and Deploy Your Free Tier Operational cluster on Capella
-
-To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.
-
-To know more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).
-
-### Couchbase Capella Configuration
-
-When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.
-
-* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the bucket (Read and Write) used in the application. 
-* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running. - -# Setting the Stage: Installing Necessary Libraries - -To build our semantic search engine, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks. - - -```python -%pip install --quiet datasets==3.5.0 langchain-couchbase==0.3.0 langchain-aws==0.2.20 boto3==1.37.35 python-dotenv==1.1.0 -``` - - Note: you may need to restart the kernel to use updated packages. - - -# Importing Necessary Libraries - -The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. - - -```python -import getpass -import json -import logging -import os -import time -from datetime import timedelta - -import boto3 -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.exceptions import (CouchbaseException, - InternalServerFailureException, - QueryIndexAlreadyExistsException,ServiceUnavailableException) -from couchbase.management.buckets import CreateBucketSettings -from couchbase.management.search import SearchIndex -from couchbase.options import ClusterOptions -from datasets import load_dataset -from dotenv import load_dotenv -from langchain_aws import BedrockEmbeddings, ChatBedrock -from langchain_core.globals import set_llm_cache -from langchain_core.output_parsers import StrOutputParser -from langchain_core.prompts.chat import ChatPromptTemplate -from langchain_core.runnables import RunnablePassthrough -from langchain_couchbase.cache import CouchbaseCache -from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore -from tqdm import tqdm -``` - -# Setup Logging - -Logging is configured to track the progress of the script and capture any errors or warnings. 
- - -```python -logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True) -``` - -# Loading Sensitive Information -In this section, we prompt the user to input essential configuration settings needed. These settings include sensitive information like AWS credentials, database credentials, and specific configuration names. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security. - -The project includes an `.env.sample` file that lists all the environment variables. To get started: - -1. Create a `.env` file in the same directory as this notebook -2. Copy the contents from `.env.sample` to your `.env` file -3. Fill in the required credentials - -The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code. 
- - -```python - -# Load environment variables from .env file if it exists -load_dotenv() - -# AWS Credentials -AWS_ACCESS_KEY_ID = os.getenv('AWS_ACCESS_KEY_ID') or input('Enter your AWS Access Key ID: ') -AWS_SECRET_ACCESS_KEY = os.getenv('AWS_SECRET_ACCESS_KEY') or getpass.getpass('Enter your AWS Secret Access Key: ') -AWS_REGION = os.getenv('AWS_REGION') or input('Enter your AWS region (default: us-east-1): ') or 'us-east-1' - -# Couchbase Settings -CB_HOST = os.getenv('CB_HOST') or input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost' -CB_USERNAME = os.getenv('CB_USERNAME') or input('Enter your Couchbase username (default: Administrator): ') or 'Administrator' -CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass('Enter your Couchbase password (default: password): ') or 'password' -CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input('Enter your Couchbase bucket name (default: vector-search-testing): ') or 'vector-search-testing' -INDEX_NAME = os.getenv('INDEX_NAME') or input('Enter your index name (default: vector_search_bedrock): ') or 'vector_search_bedrock' -SCOPE_NAME = os.getenv('SCOPE_NAME') or input('Enter your scope name (default: shared): ') or 'shared' -COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input('Enter your collection name (default: bedrock): ') or 'bedrock' -CACHE_COLLECTION = os.getenv('CACHE_COLLECTION') or input('Enter your cache collection name (default: cache): ') or 'cache' - -# Check if required credentials are set -for cred_name, cred_value in { - 'AWS_ACCESS_KEY_ID': AWS_ACCESS_KEY_ID, - 'AWS_SECRET_ACCESS_KEY': AWS_SECRET_ACCESS_KEY, - 'CB_HOST': CB_HOST, - 'CB_USERNAME': CB_USERNAME, - 'CB_PASSWORD': CB_PASSWORD, - 'CB_BUCKET_NAME': CB_BUCKET_NAME -}.items(): - if not cred_value: - raise ValueError(f"{cred_name} is not set") -``` - -# Connecting to the Couchbase Cluster -Connecting to a Couchbase cluster is the foundation of our project. 
Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount. - - - - -```python -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to Couchbase") -except Exception as e: - raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}") -``` - - 2025-05-24 22:16:19,393 - INFO - Successfully connected to Couchbase - - -## Setting Up Collections in Couchbase - -The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase: - -1. Bucket Creation: - - Checks if specified bucket exists, creates it if not - - Sets bucket properties like RAM quota (1024MB) and replication (disabled) - - Note: You will not be able to create a bucket on Capella - -2. Scope Management: - - Verifies if requested scope exists within bucket - - Creates new scope if needed (unless it's the default "_default" scope) - -3. Collection Setup: - - Checks for collection existence within scope - - Creates collection if it doesn't exist - - Waits 2 seconds for collection to be ready - -Additional Tasks: -- Creates primary index on collection for query performance -- Clears any existing documents for clean state -- Implements comprehensive error handling and logging - -The function is called twice to set up: -1. Main collection for vector embeddings -2. 
Cache collection for storing results - - - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' exists.") - except Exception as e: - logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...") - bucket_settings = CreateBucketSettings( - name=bucket_name, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - time.sleep(2) # Wait for bucket creation to complete and become available - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' created successfully.") - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == scope_name for scope in scopes) - - if not scope_exists and scope_name != "_default": - logging.info(f"Scope '{scope_name}' does not exist. Creating it...") - bucket_manager.create_scope(scope_name) - logging.info(f"Scope '{scope_name}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{collection_name}' does not exist. Creating it...") - bucket_manager.create_collection(scope_name, collection_name) - logging.info(f"Collection '{collection_name}' created successfully.") - else: - logging.info(f"Collection '{collection_name}' already exists. 
Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Ensure primary index exists - try: - cluster.query(f"CREATE PRIMARY INDEX IF NOT EXISTS ON `{bucket_name}`.`{scope_name}`.`{collection_name}`").execute() - logging.info("Primary index present or created successfully.") - except Exception as e: - logging.warning(f"Error creating primary index: {str(e)}") - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION) - -``` - - 2025-05-24 22:16:20,521 - INFO - Bucket 'vector-search-testing' exists. - 2025-05-24 22:16:22,380 - INFO - Collection 'bedrock' already exists. Skipping creation. - 2025-05-24 22:16:25,416 - INFO - Primary index present or created successfully. - 2025-05-24 22:16:25,875 - INFO - All documents cleared from the collection. - 2025-05-24 22:16:25,875 - INFO - Bucket 'vector-search-testing' exists. - 2025-05-24 22:16:28,317 - INFO - Collection 'cache' already exists. Skipping creation. - 2025-05-24 22:16:31,244 - INFO - Primary index present or created successfully. - 2025-05-24 22:16:31,472 - INFO - All documents cleared from the collection. - - - - - - - - - -# Loading Couchbase Vector Search Index - -Semantic search requires an efficient way to retrieve relevant documents based on a user's query. This is where the Couchbase **Vector Search Index** comes into play. 
In this step, we load the Vector Search Index definition from a JSON file, which specifies how the index should be structured. This includes the fields to be indexed, the dimensions of the vectors, and other parameters that determine how the search engine processes queries based on vector similarity. - -This AWS Bedrock vector search index configuration requires specific default settings to function properly. This tutorial uses the bucket named `vector-search-testing` with the scope `shared` and collection `bedrock`. The configuration is set up for vectors with exactly `1024 dimensions`, using dot product similarity and optimized for recall. If you want to use a different bucket, scope, or collection, you will need to modify the index configuration accordingly. - -For more information on creating a vector search index, please follow the [instructions](https://docs.couchbase.com/cloud/vector-search/create-vector-search-index-ui.html). - - - -```python -try: - with open('aws_index.json', 'r') as file: - index_definition = json.load(file) -except Exception as e: - raise ValueError(f"Error loading index definition: {str(e)}") -``` - -# Creating or Updating Search Indexes - -With the index definition loaded, the next step is to create or update the **Vector Search Index** in Couchbase. This step is crucial because it optimizes our database for vector similarity search operations, allowing us to perform searches based on the semantic content of documents rather than just keywords. By creating or updating a Vector Search Index, we enable our search engine to handle complex queries that involve finding semantically similar documents using vector embeddings, which is essential for a robust semantic search engine. 
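As a rough illustration of what such a definition contains once parsed by `json.load()`, the sketch below shows the nesting described in this tutorial (1024 dimensions, dot product similarity, optimized for recall). It is abridged and hypothetical; rely on the actual `aws_index.json` shipped with the cookbook repository, as field names and nesting can differ between server versions.

```python
# Illustrative sketch of a Couchbase Search vector index definition.
# This is NOT the tutorial's actual aws_index.json.
index_definition = {
    "name": "vector_search_bedrock",
    "type": "fulltext-index",
    "sourceName": "vector-search-testing",
    "params": {
        "mapping": {
            "types": {
                "shared.bedrock": {  # scope.collection type mapping
                    "enabled": True,
                    "properties": {
                        "embedding": {
                            "fields": [{
                                "name": "embedding",
                                "type": "vector",
                                "dims": 1024,  # Titan v2 embedding size
                                "similarity": "dot_product",
                                "vector_index_optimized_for": "recall",
                            }]
                        }
                    }
                }
            }
        }
    },
}

# The settings worth double-checking before creating the index:
vector_field = index_definition["params"]["mapping"]["types"]["shared.bedrock"]["properties"]["embedding"]["fields"][0]
print(vector_field["dims"], vector_field["similarity"])  # 1024 dot_product
```

If you change the bucket, scope, or collection names, the type mapping key (here `shared.bedrock`) must change to match.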
- - -```python -try: - scope_index_manager = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME).search_indexes() - - # Check if index already exists - existing_indexes = scope_index_manager.get_all_indexes() - index_name = index_definition["name"] - - if index_name in [index.name for index in existing_indexes]: - logging.info(f"Index '{index_name}' found") - else: - logging.info(f"Creating new index '{index_name}'...") - - # Create SearchIndex object from JSON definition - search_index = SearchIndex.from_json(index_definition) - - # Upsert the index (create if not exists, update if exists) - scope_index_manager.upsert_index(search_index) - logging.info(f"Index '{index_name}' successfully created/updated.") - -except QueryIndexAlreadyExistsException: - logging.info(f"Index '{index_name}' already exists. Skipping creation/update.") -except ServiceUnavailableException: - raise RuntimeError("Search service is not available. Please ensure the Search service is enabled in your Couchbase cluster.") -except InternalServerFailureException as e: - logging.error(f"Internal server error: {str(e)}") - raise -``` - - 2025-05-24 22:16:32,676 - INFO - Index 'vector_search_bedrock' found - 2025-05-24 22:16:33,339 - INFO - Index 'vector_search_bedrock' already exists. Skipping creation/update. - - -# Creating Amazon Bedrock Client and Embeddings - -Embeddings are at the heart of semantic search. They are numerical representations of text that capture the semantic meaning of the words and phrases. We'll use Amazon Bedrock's Titan embedding model for embeddings. - -## Using Amazon Bedrock's Titan Model - -Language models are AI systems that are trained to understand and generate human language. We'll be using Amazon Bedrock's Titan model to process user queries and generate meaningful responses. The Titan model family includes both embedding models for converting text into vector representations and text generation models for producing human-like responses. 
- -Key features of Amazon Bedrock's Titan models: -- Titan Embeddings model for embedding vector generation -- Titan Text model for natural language understanding and generation -- Seamless integration with AWS infrastructure -- Enterprise-grade security and scalability - - -```python -try: - bedrock_client = boto3.client( - service_name='bedrock-runtime', - region_name=AWS_REGION, - aws_access_key_id=AWS_ACCESS_KEY_ID, - aws_secret_access_key=AWS_SECRET_ACCESS_KEY - ) - - embeddings = BedrockEmbeddings( - client=bedrock_client, - model_id="amazon.titan-embed-text-v2:0" - ) - logging.info("Successfully created Bedrock embeddings client") -except Exception as e: - raise ValueError(f"Error creating Bedrock embeddings client: {str(e)}") -``` - - 2025-05-24 22:16:33,570 - INFO - Successfully created Bedrock embeddings client - - -# Setting Up the Couchbase Vector Store -A vector store is where we'll keep our embeddings. Unlike the FTS index, which is used for text-based search, the vector store is specifically designed to handle embeddings and perform similarity searches. When a user inputs a query, the search engine converts the query into an embedding and compares it against the embeddings stored in the vector store. This allows the engine to find documents that are semantically similar to the query, even if they don't contain the exact same words. By setting up the vector store in Couchbase, we create a powerful tool that enables our search engine to understand and retrieve information based on the meaning and context of the query, rather than just the specific words used. 
- - -```python -try: - vector_store = CouchbaseSearchVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding=embeddings, - index_name=INDEX_NAME, - ) - logging.info("Successfully created vector store") -except Exception as e: - raise ValueError(f"Failed to create vector store: {str(e)}") -``` - - 2025-05-24 22:16:37,040 - INFO - Successfully created vector store - - -# Load the BBC News Dataset -To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively. - -The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version. 
-
-
-```python
-try:
-    news_dataset = load_dataset(
-        "RealTimeData/bbc_news_alltime", "2024-12", split="train"
-    )
-    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
-    logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.")
-except Exception as e:
-    raise ValueError(f"Error loading the BBC News dataset: {str(e)}")
-```
-
-
-[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/awsbedrock/gsi/RAG_with_Couchbase_and_Bedrock.ipynb)
-
-# Introduction
-
-In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database and [Amazon Bedrock](https://aws.amazon.com/bedrock/) as both the embedding and language model provider. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system using GSI (Global Secondary Index) from scratch. Alternatively, if you want to perform semantic search using the FTS index, please take a look at [this tutorial](https://developer.couchbase.com/tutorial-aws-bedrock-couchbase-rag-with-fts/).
-
-# How to run this tutorial
-
-This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/awsbedrock/gsi/RAG_with_Couchbase_and_Bedrock.ipynb).
-
-You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.
-
-# Before you start
-
-## Get Credentials for AWS Bedrock
-* Please follow the [instructions](https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started.html) to set up AWS Bedrock and generate credentials.
-* Ensure you have the necessary IAM permissions to access Bedrock services.
-
-## Create and Deploy Your Free Tier Operational Cluster on Capella
-
-To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.
-
-To know more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).
-
-Note: To run this tutorial, you will need Capella with Couchbase Server version 8.0 or above, as GSI vector search is supported only from version 8.0.
-
-### Couchbase Capella Configuration
-
-When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.
-
-* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the bucket (Read and Write) used in the application.
-* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.
-
-# Setting the Stage: Installing Necessary Libraries
-
-To build our semantic search engine, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks.
-
-
-```python
-%pip install --quiet datasets==3.5.0 langchain-couchbase==0.5.0 langchain-aws boto3==1.37.35 python-dotenv==1.1.0
-```
-
-    Note: you may need to restart the kernel to use updated packages.
- - -# Importing Necessary Libraries - -The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. - - -```python -import getpass -import json -import logging -import os -import time -from datetime import timedelta - -import boto3 -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.exceptions import (CouchbaseException, - InternalServerFailureException) -from couchbase.management.buckets import CreateBucketSettings -from couchbase.options import ClusterOptions -from datasets import load_dataset -from dotenv import load_dotenv -from langchain_aws import BedrockEmbeddings, ChatBedrock -from langchain_core.globals import set_llm_cache -from langchain_core.output_parsers import StrOutputParser -from langchain_core.prompts.chat import ChatPromptTemplate -from langchain_core.runnables import RunnablePassthrough -from langchain_couchbase.cache import CouchbaseCache -from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore -from langchain_couchbase.vectorstores import DistanceStrategy -from tqdm import tqdm -``` - -# Setup Logging - -Logging is configured to track the progress of the script and capture any errors or warnings. - - -```python -logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True) -``` - -# Loading Sensitive Information -In this section, we prompt the user to input essential configuration settings needed. These settings include sensitive information like AWS credentials, database credentials, and specific configuration names. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security. - -The project includes an `.env.sample` file that lists all the environment variables. To get started: - -1. 
Create a `.env` file in the same directory as this notebook -2. Copy the contents from `.env.sample` to your `.env` file -3. Fill in the required credentials - -The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code. - - -```python - -# Load environment variables from .env file if it exists -load_dotenv(override=True) - -# AWS Credentials -AWS_ACCESS_KEY_ID = os.getenv('AWS_ACCESS_KEY_ID') or input('Enter your AWS Access Key ID: ') -AWS_SECRET_ACCESS_KEY = os.getenv('AWS_SECRET_ACCESS_KEY') or getpass.getpass('Enter your AWS Secret Access Key: ') -AWS_REGION = os.getenv('AWS_REGION') or input('Enter your AWS region (default: us-east-1): ') or 'us-east-1' - -# Couchbase Settings -CB_HOST = os.getenv('CB_HOST') or input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost' -CB_USERNAME = os.getenv('CB_USERNAME') or input('Enter your Couchbase username (default: Administrator): ') or 'Administrator' -CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass('Enter your Couchbase password (default: password): ') or 'password' -CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input('Enter your Couchbase bucket name (default: query-vector-search-testing): ') or 'query-vector-search-testing' -SCOPE_NAME = os.getenv('SCOPE_NAME') or input('Enter your scope name (default: shared): ') or 'shared' -COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input('Enter your collection name (default: bedrock): ') or 'bedrock' -CACHE_COLLECTION = os.getenv('CACHE_COLLECTION') or input('Enter your cache collection name (default: cache): ') or 'cache' - -# Check if required credentials are set -for cred_name, cred_value in { - 'AWS_ACCESS_KEY_ID': AWS_ACCESS_KEY_ID, - 'AWS_SECRET_ACCESS_KEY': 
AWS_SECRET_ACCESS_KEY, - 'CB_HOST': CB_HOST, - 'CB_USERNAME': CB_USERNAME, - 'CB_PASSWORD': CB_PASSWORD, - 'CB_BUCKET_NAME': CB_BUCKET_NAME -}.items(): - if not cred_value: - raise ValueError(f"{cred_name} is not set") -``` - -# Connecting to the Couchbase Cluster -Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount. - - - - -```python -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to Couchbase") -except Exception as e: - raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}") -``` - - 2025-09-02 12:21:07,348 - INFO - Successfully connected to Couchbase - - -## Setting Up Collections in Couchbase - -The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase: - -1. Bucket Creation: - - Checks if specified bucket exists, creates it if not - - Sets bucket properties like RAM quota (1024MB) and replication (disabled) - - Note: You will not be able to create a bucket on Capella - -2. Scope Management: - - Verifies if requested scope exists within bucket - - Creates new scope if needed (unless it's the default "_default" scope) - -3. 
Collection Setup: - - Checks for collection existence within scope - - Creates collection if it doesn't exist - - Waits 2 seconds for collection to be ready - -Additional Tasks: -- Clears any existing documents for clean state -- Implements comprehensive error handling and logging - -The function is called twice to set up: -1. Main collection for vector embeddings -2. Cache collection for storing results - - - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' exists.") - except Exception as e: - logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...") - bucket_settings = CreateBucketSettings( - name=bucket_name, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - time.sleep(2) # Wait for bucket creation to complete and become available - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' created successfully.") - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == scope_name for scope in scopes) - - if not scope_exists and scope_name != "_default": - logging.info(f"Scope '{scope_name}' does not exist. Creating it...") - bucket_manager.create_scope(scope_name) - logging.info(f"Scope '{scope_name}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{collection_name}' does not exist. 
Creating it...") - bucket_manager.create_collection(scope_name, collection_name) - logging.info(f"Collection '{collection_name}' created successfully.") - else: - logging.info(f"Collection '{collection_name}' already exists. Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION) - -``` - - 2025-08-29 13:03:42,591 - INFO - Bucket 'query-vector-search-testing' does not exist. Creating it... - 2025-08-29 13:03:44,657 - INFO - Bucket 'query-vector-search-testing' created successfully. - 2025-08-29 13:03:44,663 - INFO - Scope 'shared' does not exist. Creating it... - 2025-08-29 13:03:44,704 - INFO - Scope 'shared' created successfully. - 2025-08-29 13:03:44,714 - INFO - Collection 'bedrock' does not exist. Creating it... - 2025-08-29 13:03:44,770 - INFO - Collection 'bedrock' created successfully. - 2025-08-29 13:03:46,953 - INFO - All documents cleared from the collection. - 2025-08-29 13:03:46,954 - INFO - Bucket 'query-vector-search-testing' exists. - 2025-08-29 13:03:46,969 - INFO - Collection 'cache' does not exist. Creating it... - 2025-08-29 13:03:47,025 - INFO - Collection 'cache' created successfully. - 2025-08-29 13:03:49,183 - INFO - All documents cleared from the collection. 
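The clean-up query above addresses the collection through its fully qualified keyspace, with each path component wrapped in backticks. A small helper (hypothetical, not part of the tutorial code) makes that construction explicit:

```python
def keyspace(bucket: str, scope: str, collection: str) -> str:
    """Build the backtick-quoted keyspace path used in SQL++ statements."""
    return ".".join(f"`{part}`" for part in (bucket, scope, collection))

# Reconstruct the DELETE statement issued by setup_collection(),
# using this tutorial's default names:
stmt = f"DELETE FROM {keyspace('query-vector-search-testing', 'shared', 'bedrock')}"
print(stmt)  # DELETE FROM `query-vector-search-testing`.`shared`.`bedrock`
```

Backtick quoting matters here because the bucket name contains hyphens, which SQL++ would otherwise parse as operators.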
- - - - - - - - - -# Creating Amazon Bedrock Client and Embeddings - -Embeddings are at the heart of semantic search. They are numerical representations of text that capture the semantic meaning of the words and phrases. We'll use Amazon Bedrock's Titan embedding model for embeddings. - -## Using Amazon Bedrock's Titan Model - -Language models are AI systems that are trained to understand and generate human language. We'll be using Amazon Bedrock's Titan model to process user queries and generate meaningful responses. The Titan model family includes both embedding models for converting text into vector representations and text generation models for producing human-like responses. - -Key features of Amazon Bedrock's Titan models: -- Titan Embeddings model for embedding vector generation -- Titan Text model for natural language understanding and generation -- Seamless integration with AWS infrastructure -- Enterprise-grade security and scalability - - -```python -try: - bedrock_client = boto3.client( - service_name='bedrock-runtime', - region_name=AWS_REGION, - aws_access_key_id=AWS_ACCESS_KEY_ID, - aws_secret_access_key=AWS_SECRET_ACCESS_KEY - ) - - embeddings = BedrockEmbeddings( - client=bedrock_client, - model_id="amazon.titan-embed-text-v2:0" - ) - logging.info("Successfully created Bedrock embeddings client") -except Exception as e: - raise ValueError(f"Error creating Bedrock embeddings client: {str(e)}") -``` - - 2025-09-02 12:21:15,663 - INFO - Successfully created Bedrock embeddings client - - -# Setting Up the Couchbase Query Vector Store -A vector store is where we'll keep our embeddings. The query vector store is specifically designed to handle embeddings and perform similarity searches. When a user inputs a query, GSI converts the query into an embedding and compares it against the embeddings stored in the vector store. This allows the engine to find documents that are semantically similar to the query, even if they don't contain the exact same words. 
By setting up the vector store in Couchbase, we create a powerful tool that enables us to understand and retrieve information based on the meaning and context of the query, rather than just the specific words used.
-
-The vector store requires a distance metric to determine how similarity between vectors is calculated. This is crucial for accurate semantic search results, as different distance metrics can yield different similarity rankings. Some of the supported distance strategies are `dot`, `l2`, `euclidean`, `cosine`, `l2_squared`, and `euclidean_squared`. In our implementation we will use cosine, which is particularly effective for text embeddings.
-
-
-```python
-try:
-    vector_store = CouchbaseQueryVectorStore(
-        cluster=cluster,
-        bucket_name=CB_BUCKET_NAME,
-        scope_name=SCOPE_NAME,
-        collection_name=COLLECTION_NAME,
-        embedding=embeddings,
-        distance_metric=DistanceStrategy.COSINE
-    )
-    logging.info("Successfully created vector store")
-except Exception as e:
-    raise ValueError(f"Failed to create vector store: {str(e)}")
-```
-
-    2025-09-02 12:22:15,979 - INFO - Successfully created vector store
-
-
-# Load the BBC News Dataset
-To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively.
-
-The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. 
The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version. - - -```python -try: - news_dataset = load_dataset( - "RealTimeData/bbc_news_alltime", "2024-12", split="train" - ) - print(f"Loaded the BBC News dataset with {len(news_dataset)} rows") - logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.") -except Exception as e: - raise ValueError(f"Error loading the BBC News dataset: {str(e)}") -``` - - 2025-09-02 12:21:31,880 - INFO - Successfully loaded the BBC News dataset with 2687 rows. - - - Loaded the BBC News dataset with 2687 rows - - -## Cleaning up the Data -We will use the content of the news articles for our RAG system. - -The dataset contains a few duplicate records. We are removing them to avoid duplicate results in the retrieval stage of our RAG system. - - -```python -news_articles = news_dataset["content"] -unique_articles = set() -for article in news_articles: - if article: - unique_articles.add(article) -unique_news_articles = list(unique_articles) -print(f"We have {len(unique_news_articles)} unique articles in our database.") -``` - - We have 1749 unique articles in our database. - - -## Saving Data to the Vector Store -To efficiently handle the large number of articles, we process them in batches of 50 articles at a time. This batch processing approach helps manage memory usage and provides better control over the ingestion process. - -We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's add_texts method, we add the filtered articles to our vector database. The batch_size parameter controls how many articles are processed in each iteration. - -This approach offers several benefits: -1. Memory Efficiency: Processing in smaller batches prevents memory overload -2. 
Error Handling: If an error occurs, only the current batch is affected -3. Progress Tracking: Easier to monitor and track the ingestion progress -4. Resource Management: Better control over CPU and network resource utilization - -We use a conservative batch size of 50 to ensure reliable operation. -The optimal batch size depends on many factors including: -- Document sizes being inserted -- Available system resources -- Network conditions -- Concurrent workload - -Consider measuring performance with your specific workload before adjusting. - - - -```python -batch_size = 50 - -# Automatic Batch Processing -articles = [article for article in unique_news_articles if article and len(article) <= 50000] - -try: - vector_store.add_texts( - texts=articles, - batch_size=batch_size - ) - logging.info("Document ingestion completed successfully.") -except Exception as e: - raise ValueError(f"Failed to save documents to vector store: {str(e)}") - -``` - - 2025-08-20 14:05:53,302 - INFO - Document ingestion completed successfully. - - -# Setting Up a Couchbase Cache -To further optimize our system, we set up a Couchbase-based cache. A cache is a temporary storage layer that holds data that is frequently accessed, speeding up operations by reducing the need to repeatedly retrieve the same information from the database. In our setup, the cache will help us accelerate repetitive tasks, such as looking up similar documents. By implementing a cache, we enhance the overall performance of our search engine, ensuring that it can handle high query volumes and deliver results quickly. - -Caching is particularly valuable in scenarios where users may submit similar queries multiple times or where certain pieces of information are frequently requested. By storing these in a cache, we can significantly reduce the time it takes to respond to these queries, improving the user experience. 
- - - -```python -try: - cache = CouchbaseCache( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=CACHE_COLLECTION, - ) - logging.info("Successfully created cache") - set_llm_cache(cache) -except Exception as e: - raise ValueError(f"Failed to create cache: {str(e)}") -``` - - 2025-09-02 12:22:20,978 - INFO - Successfully created cache - - -# Using Amazon Bedrock's Titan Text Express v1 Model - -Amazon Bedrock's Titan Text Express v1 is a state-of-the-art foundation model designed for fast and efficient text generation tasks. This model excels at: - -- Text generation and completion -- Question answering -- Summarization -- Content rewriting -- Analysis and extraction - -Key features of Titan Text Express v1: - -- Optimized for low-latency responses while maintaining high quality output -- Supports up to 8K tokens context window -- Built-in content filtering and safety controls -- Cost-effective compared to larger models -- Seamlessly integrates with AWS services - -The model uses a temperature parameter (0-1) to control randomness in responses: -- Lower values (e.g. 0) produce more focused, deterministic outputs -- Higher values introduce more creativity and variation - -We'll be using this model through Amazon Bedrock's API to process user queries and generate contextually relevant responses based on our vector database content. - - -```python -try: - llm = ChatBedrock( - client=bedrock_client, - model_id="amazon.titan-text-express-v1", - model_kwargs={"temperature": 0} - ) - logging.info("Successfully created Bedrock LLM client") -except Exception as e: - logging.error(f"Error creating Bedrock LLM client: {str(e)}. Please check your AWS credentials and Bedrock access.") - raise -``` - - 2025-09-02 12:22:24,513 - INFO - Successfully created Bedrock LLM client - - -# Perform Semantic Search -Semantic search in Couchbase involves converting queries and documents into vector representations using an embeddings model. 
These vectors capture the semantic meaning of the text and are stored directly in Couchbase. When a query is made, Couchbase performs a similarity search by comparing the query vector against the stored document vectors. The similarity metric used for this comparison is configurable, allowing flexibility in how the relevance of documents is determined. Common metrics include cosine similarity, Euclidean distance, or dot product, but other metrics can be implemented based on specific use cases. Different embedding models like BERT, Word2Vec, or GloVe can also be used depending on the application's needs, with the vectors generated by these models stored and searched within Couchbase itself. - -In the provided code, the search process begins by recording the start time, followed by executing the `similarity_search_with_score` method of the `CouchbaseQueryVectorStore`. This method searches Couchbase for the most relevant documents based on the vector similarity to the query. The search results include the document content and the distance that reflects how closely each document aligns with the query in the defined semantic space. The time taken to perform this search is then calculated and logged, and the results are displayed, showing the most relevant documents along with their similarity scores. This approach leverages Couchbase as both a storage and retrieval engine for vector data, enabling efficient and scalable semantic searches. The integration of vector storage and search capabilities within Couchbase allows for sophisticated semantic search operations without relying on external services for vector storage or comparison. - - -```python -query = "What were Luke Littler's key achievements and records in his recent PDC World Championship match?" 
- -try: - # Perform the semantic search - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=10) - search_elapsed_time = time.time() - start_time - - logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds") - - # Display search results - print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):") - print("-" * 80) - - for doc, score in search_results: - print(f"Distance: {score:.4f}, Text: {doc.page_content}") - print("-" * 80) - -except CouchbaseException as e: - raise RuntimeError(f"Error performing semantic search: {str(e)}") -except Exception as e: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - 2025-09-02 12:23:51,477 - INFO - Semantic search completed in 1.29 seconds - - - - Semantic Search Results (completed in 1.29 seconds): - -------------------------------------------------------------------------------- - Distance: 0.3512, Text: Luke Littler has risen from 164th to fourth in the rankings in a year - - A tearful Luke Littler hit a tournament record 140.91 set average as he started his bid for the PDC World Championship title with a dramatic 3-1 win over Ryan Meikle. The 17-year-old made headlines around the world when he reached the tournament final in January, where he lost to Luke Humphries. Starting this campaign on Saturday, Littler was millimetres away from a nine-darter when he missed double 12 as he blew Meikle away in the fourth and final set of the second-round match. Littler was overcome with emotion at the end, cutting short his on-stage interview. "It was probably the toughest game I've ever played. I had to fight until the end," he said later in a news conference. "As soon as the question came on stage and then boom, the tears came. It was just a bit too much to speak on stage. "It is the worst game I have played. I have never felt anything like that tonight." 
Admitting to nerves during the match, he told Sky Sports: "Yes, probably the biggest time it's hit me. Coming into it I was fine, but as soon as [referee] George Noble said 'game on', I couldn't throw them." Littler started slowly against Meikle, who had two darts for the opening set, but he took the lead by twice hitting double 20. Meikle did not look overawed against his fellow Englishman and levelled, but Littler won the third set and exploded into life in the fourth. The tournament favourite hit four maximum 180s as he clinched three straight legs in 11, 10 and 11 darts for a record set average, and 100.85 overall. Meanwhile, two seeds crashed out on Saturday night – five-time world champion Raymond van Barneveld lost to Welshman Nick Kenny, while England's Ryan Joyce beat Danny Noppert. Australian Damon Heta was another to narrowly miss out on a nine-darter, just failing on double 12 when throwing for the match in a 3-1 win over Connor Scutt. Ninth seed Heta hit four 100-plus checkouts to come from a set down against Scutt in a match in which both men averaged more than 97. - - Littler was hugged by his parents after victory over Meikle - - Littler returned to Alexandra Palace to a boisterous reception from more than 3,000 spectators and delivered an astonishing display in the fourth set. He was on for a nine-darter after his opening two throws in both of the first two legs and completed the set in 32 darts - the minimum possible is 27. The teenager will next play after Christmas against European Championship winner Ritchie Edhouse, the 29th seed, or Ian White, and is seeded to meet Humphries in the semi-finals. Having entered last year's event ranked 164th, Littler is up to fourth in the world and will go to number two if he reaches the final again this time. He has won 10 titles in his debut professional year, including the Premier League and Grand Slam of Darts. 
After reaching the World Championship final as a debutant aged just 16, Littler's life has been transformed and interest in darts has rocketed. Google say he was the most searched-for athlete online in the UK during 2024. This Christmas, more than 100,000 children are expected to be opening Littler-branded magnetic dartboards as presents. His impact has helped double the number of junior academies and has prompted plans to expand the World Championship. Littler was named BBC Young Sports Personality of the Year on Tuesday and was runner-up to athlete Keely Hodgkinson for the main award. - - Nick Kenny will play world champion Luke Humphries in round three after Christmas - - Barneveld was shocked 3-1 by world number 76 Kenny, who was in tears after a famous victory. Kenny, 32, will face Humphries in round three after defeating the Dutchman, who won the BDO world title four times and the PDC crown in 2007. Van Barneveld, ranked 32nd, became the sixth seed to exit in the second round. His compatriot Noppert, the 13th seed, was stunned 3-1 by Joyce, who will face Ryan Searle or Matt Campbell next, with the winner of that tie potentially meeting Littler in the last 16. Elsewhere, 15th seed Chris Dobey booked his place in the third round with a 3-1 win over Alexander Merkx. Englishman Dobey concluded an afternoon session which started with a trio of 3-0 scorelines. Northern Ireland's Brendan Dolan beat Lok Yin Lee to set up a meeting with three-time champion Michael van Gerwen after Christmas. In the final two first-round matches of the 2025 competition, Wales' Rhys Griffin beat Karel Sedlacek of the Czech Republic before Asia number one Alexis Toylo cruised past Richard Veenstra. - -------------------------------------------------------------------------------- - Distance: 0.4124, Text: The Littler effect - how darts hit the bullseye - - Teenager Luke Littler began his bid to win the 2025 PDC World Darts Championship with a second-round win against Ryan Meikle. 
Here we assess Littler's impact after a remarkable rise which saw him named BBC Young Sports Personality of the Year and runner-up in the main award to athlete Keely Hodgkinson. - - One year ago, he was barely a household name in his own home. Now he is a sporting phenomenon. After emerging from obscurity aged 16 to reach the World Championship final, the life of Luke Littler and the sport he loves has been transformed. Viewing figures, ticket sales and social media interest have rocketed. Darts has hit the bullseye. This Christmas more than 100,000 children are expected to be opening Littler-branded magnetic dartboards as presents. His impact has helped double the number of junior academies, prompted plans to expand the World Championship and generated interest in darts from Saudi Arabian backers. - - Just months after taking his GCSE exams and ranked 164th in the world, Littler beat former champions Raymond van Barneveld and Rob Cross en route to the PDC World Championship final in January, before his run ended with a 7-4 loss to Luke Humphries. With his nickname 'The Nuke' on his purple and yellow shirt and the Alexandra Palace crowd belting out his walk-on song, Pitbull's tune Greenlight, he became an instant hit. Electric on the stage, calm off it. The down-to-earth teenager celebrated with a kebab and computer games. "We've been watching his progress since he was about seven. He was on our radar, but we never anticipated what would happen. The next thing we know 'Littlermania' is spreading everywhere," PDC president Barry Hearn told BBC Sport. A peak TV audience of 3.7 million people watched the final - easily Sky's biggest figure for a non-football sporting event. The teenager from Warrington in Cheshire was too young to legally drive or drink alcohol, but earned £200,000 for finishing second - part of £1m prize money in his first year as a professional - and an invitation to the elite Premier League competition. 
He turned 17 later in January but was he too young for the demanding event over 17 Thursday nights in 17 locations? He ended up winning the whole thing, and hit a nine-dart finish against Humphries in the final. From Bahrain to Wolverhampton, Littler claimed 10 titles in 2024 and is now eyeing the World Championship. - - As he progressed at the Ally Pally, the Manchester United fan was sent a good luck message by the club's former midfielder and ex-England captain David Beckham. In 12 months, Littler's Instagram followers have risen from 4,000 to 1.3m. Commercial backers include a clothing range, cereal firm and train company and he will appear in a reboot of the TV darts show Bullseye. Google say he was the most searched-for athlete online in the UK during 2024. On the back of his success, Littler darts, boards, cabinets, shirts are being snapped up in big numbers. "This Christmas the junior magnetic dartboard is selling out, we're talking over 100,000. They're 20 quid and a great introduction for young children," said Garry Plummer, the boss of sponsors Target Darts, who first signed a deal with Littler's family when he was aged 12. "All the toy shops want it, they all want him - 17, clean, doesn't drink, wonderful." - - - ... (output truncated for brevity) - - -# Optimizing Vector Search with Global Secondary Index (GSI) - -While the above semantic search using similarity_search_with_score works effectively, we can significantly improve query performance by leveraging Global Secondary Index (GSI) in Couchbase. 
- -Couchbase offers three types of vector indexes, but for GSI-based vector search we focus on two main types: - -Hyperscale Vector Indexes (BHIVE) -- Best for pure vector searches - content discovery, recommendations, semantic search -- High performance with low memory footprint - designed to scale to billions of vectors -- Optimized for concurrent operations - supports simultaneous searches and inserts -- Use when: You primarily perform vector-only queries without complex scalar filtering -- Ideal for: Large-scale semantic search, recommendation systems, content discovery - -Composite Vector Indexes -- Best for filtered vector searches - combines vector search with scalar value filtering -- Efficient pre-filtering - scalar attributes reduce the vector comparison scope -- Use when: Your queries combine vector similarity with scalar filters that eliminate large portions of data -- Ideal for: Compliance-based filtering, user-specific searches, time-bounded queries - -Choosing the Right Index Type -- Start with Hyperscale Vector Index for pure vector searches and large datasets -- Use Composite Vector Index when scalar filters significantly reduce your search space -- Consider your dataset size: Hyperscale scales to billions, Composite works well for tens of millions to billions - -For more details, see the [Couchbase Vector Index documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html). 
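The `IVF,SQ8` description used when creating these indexes points at the inverted-file (IVF) idea: vectors are clustered around centroids, and a query is compared only against vectors in the closest cluster rather than the whole dataset. The toy sketch below illustrates that idea in plain Python; it is an intuition aid under simplified assumptions (random centroids, single-probe search), not Couchbase's actual implementation.

```python
import math
import random

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

random.seed(0)
vectors = [[random.random(), random.random()] for _ in range(1000)]

# "Training": pick centroids (real systems use k-means) and bucket every
# vector under its nearest centroid.
centroids = random.sample(vectors, 10)
buckets = {i: [] for i in range(len(centroids))}
for v in vectors:
    nearest = min(range(len(centroids)), key=lambda i: l2(v, centroids[i]))
    buckets[nearest].append(v)

# "Search": probe only the bucket of the centroid closest to the query,
# so we compare against a fraction of the dataset instead of all of it.
query = [0.5, 0.5]
probe = min(range(len(centroids)), key=lambda i: l2(query, centroids[i]))
candidates = buckets[probe]
best = min(candidates, key=lambda v: l2(query, v))
print(f"Compared {len(candidates)} of {len(vectors)} vectors")
```

More centroids mean smaller buckets and faster probes at the cost of a longer training step, which is the trade-off the index description's centroid setting exposes.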
-
-
-## Understanding Index Configuration (Couchbase 8.0 Feature)
-
-The `index_description` parameter controls how Couchbase optimizes vector storage and search performance through centroids and quantization:
-
-Format: `IVF<centroids>,{SQ<bits>|PQ<subquantizers>x<bits>}`
-
-Centroids (IVF - Inverted File):
-- Controls how the dataset is subdivided for faster searches
-- More centroids = faster search, slower training
-- Fewer centroids = slower search, faster training
-- If the centroid count is omitted (as in IVF,SQ8), Couchbase auto-selects it based on dataset size
-
-Quantization Options:
-- SQ (Scalar Quantization): SQ4, SQ6, SQ8 (4, 6, or 8 bits per dimension)
-- PQ (Product Quantization): PQ<subquantizers>x<bits> (e.g., PQ32x8)
-- Higher values = better accuracy, larger index size
-
-Common Examples:
-- IVF,SQ8 - Auto centroids, 8-bit scalar quantization (good default)
-- IVF1000,SQ6 - 1000 centroids, 6-bit scalar quantization
-- IVF,PQ32x8 - Auto centroids, 32 subquantizers with 8 bits
-
-For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/cloud/vector-index/hyperscale-vector-index.html#algo_settings).
-
-In the code below, we demonstrate creating a BHIVE index. This method takes an index type (BHIVE or COMPOSITE) and a description parameter for optimization settings. Alternatively, GSI indexes can be created manually from the Couchbase UI.
-
-
-```python
-from langchain_couchbase.vectorstores import IndexType
-vector_store.create_index(
-    index_type=IndexType.BHIVE,
-    index_name="bedrock_bhive_index",
-    index_description="IVF,SQ8"
-)
-```
-
-The example below shows running the same similarity search, but now using the BHIVE GSI index we created above. You'll notice improved performance as the index efficiently retrieves data.
-
-**Important**: When using Composite indexes, scalar filters take precedence over vector similarity, which can improve performance for filtered searches but may miss some semantically relevant results that don't match the scalar criteria.
-
-Note: In GSI vector search, the distance represents the vector distance between the query and document embeddings. Lower distances indicate higher similarity, while higher distances indicate lower similarity.
-
-
-```python
-
-query = "What were Luke Littler's key achievements and records in his recent PDC World Championship match?"
-
-try:
-    # Perform the semantic search
-    start_time = time.time()
-    search_results = vector_store.similarity_search_with_score(query, k=10)
-    search_elapsed_time = time.time() - start_time
-
-    logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds")
-
-    # Display search results
-    print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):")
-    print("-" * 80)
-
-    for doc, score in search_results:
-        print(f"Distance: {score:.4f}, Text: {doc.page_content}")
-        print("-" * 80)
-
-except CouchbaseException as e:
-    raise RuntimeError(f"Error performing semantic search: {str(e)}")
-except Exception as e:
-    raise RuntimeError(f"Unexpected error: {str(e)}")
-```
-
-    2025-09-02 12:24:54,503 - INFO - Semantic search completed in 0.36 seconds
-
-
-
-    Semantic Search Results (completed in 0.36 seconds):
-    --------------------------------------------------------------------------------
-    Distance: 0.3512, Text: Luke Littler has risen from 164th to fourth in the rankings in a year
-
-    A tearful Luke Littler hit a tournament record 140.91 set average as he started his bid for the PDC World Championship title with a dramatic 3-1 win over Ryan Meikle. The 17-year-old made headlines around the world when he reached the tournament final in January, where he lost to Luke Humphries. Starting this campaign on Saturday, Littler was millimetres away from a nine-darter when he missed double 12 as he blew Meikle away in the fourth and final set of the second-round match. Littler was overcome with emotion at the end, cutting short his on-stage interview. "It was probably the toughest game I've ever played.
I had to fight until the end," he said later in a news conference. "As soon as the question came on stage and then boom, the tears came. It was just a bit too much to speak on stage. "It is the worst game I have played. I have never felt anything like that tonight." Admitting to nerves during the match, he told Sky Sports: "Yes, probably the biggest time it's hit me. Coming into it I was fine, but as soon as [referee] George Noble said 'game on', I couldn't throw them." Littler started slowly against Meikle, who had two darts for the opening set, but he took the lead by twice hitting double 20. Meikle did not look overawed against his fellow Englishman and levelled, but Littler won the third set and exploded into life in the fourth. The tournament favourite hit four maximum 180s as he clinched three straight legs in 11, 10 and 11 darts for a record set average, and 100.85 overall. Meanwhile, two seeds crashed out on Saturday night – five-time world champion Raymond van Barneveld lost to Welshman Nick Kenny, while England's Ryan Joyce beat Danny Noppert. Australian Damon Heta was another to narrowly miss out on a nine-darter, just failing on double 12 when throwing for the match in a 3-1 win over Connor Scutt. Ninth seed Heta hit four 100-plus checkouts to come from a set down against Scutt in a match in which both men averaged more than 97. - - Littler was hugged by his parents after victory over Meikle - - Littler returned to Alexandra Palace to a boisterous reception from more than 3,000 spectators and delivered an astonishing display in the fourth set. He was on for a nine-darter after his opening two throws in both of the first two legs and completed the set in 32 darts - the minimum possible is 27. The teenager will next play after Christmas against European Championship winner Ritchie Edhouse, the 29th seed, or Ian White, and is seeded to meet Humphries in the semi-finals. 
Having entered last year's event ranked 164th, Littler is up to fourth in the world and will go to number two if he reaches the final again this time. He has won 10 titles in his debut professional year, including the Premier League and Grand Slam of Darts. After reaching the World Championship final as a debutant aged just 16, Littler's life has been transformed and interest in darts has rocketed. Google say he was the most searched-for athlete online in the UK during 2024. This Christmas, more than 100,000 children are expected to be opening Littler-branded magnetic dartboards as presents. His impact has helped double the number of junior academies and has prompted plans to expand the World Championship. Littler was named BBC Young Sports Personality of the Year on Tuesday and was runner-up to athlete Keely Hodgkinson for the main award. - - Nick Kenny will play world champion Luke Humphries in round three after Christmas - - Barneveld was shocked 3-1 by world number 76 Kenny, who was in tears after a famous victory. Kenny, 32, will face Humphries in round three after defeating the Dutchman, who won the BDO world title four times and the PDC crown in 2007. Van Barneveld, ranked 32nd, became the sixth seed to exit in the second round. His compatriot Noppert, the 13th seed, was stunned 3-1 by Joyce, who will face Ryan Searle or Matt Campbell next, with the winner of that tie potentially meeting Littler in the last 16. Elsewhere, 15th seed Chris Dobey booked his place in the third round with a 3-1 win over Alexander Merkx. Englishman Dobey concluded an afternoon session which started with a trio of 3-0 scorelines. Northern Ireland's Brendan Dolan beat Lok Yin Lee to set up a meeting with three-time champion Michael van Gerwen after Christmas. In the final two first-round matches of the 2025 competition, Wales' Rhys Griffin beat Karel Sedlacek of the Czech Republic before Asia number one Alexis Toylo cruised past Richard Veenstra. 
- -------------------------------------------------------------------------------- - Distance: 0.4124, Text: The Littler effect - how darts hit the bullseye - - Teenager Luke Littler began his bid to win the 2025 PDC World Darts Championship with a second-round win against Ryan Meikle. Here we assess Littler's impact after a remarkable rise which saw him named BBC Young Sports Personality of the Year and runner-up in the main award to athlete Keely Hodgkinson. - - One year ago, he was barely a household name in his own home. Now he is a sporting phenomenon. After emerging from obscurity aged 16 to reach the World Championship final, the life of Luke Littler and the sport he loves has been transformed. Viewing figures, ticket sales and social media interest have rocketed. Darts has hit the bullseye. This Christmas more than 100,000 children are expected to be opening Littler-branded magnetic dartboards as presents. His impact has helped double the number of junior academies, prompted plans to expand the World Championship and generated interest in darts from Saudi Arabian backers. - - Just months after taking his GCSE exams and ranked 164th in the world, Littler beat former champions Raymond van Barneveld and Rob Cross en route to the PDC World Championship final in January, before his run ended with a 7-4 loss to Luke Humphries. With his nickname 'The Nuke' on his purple and yellow shirt and the Alexandra Palace crowd belting out his walk-on song, Pitbull's tune Greenlight, he became an instant hit. Electric on the stage, calm off it. The down-to-earth teenager celebrated with a kebab and computer games. "We've been watching his progress since he was about seven. He was on our radar, but we never anticipated what would happen. The next thing we know 'Littlermania' is spreading everywhere," PDC president Barry Hearn told BBC Sport. A peak TV audience of 3.7 million people watched the final - easily Sky's biggest figure for a non-football sporting event. 
The teenager from Warrington in Cheshire was too young to legally drive or drink alcohol, but earned £200,000 for finishing second - part of £1m prize money in his first year as a professional - and an invitation to the elite Premier League competition. He turned 17 later in January but was he too young for the demanding event over 17 Thursday nights in 17 locations? He ended up winning the whole thing, and hit a nine-dart finish against Humphries in the final. From Bahrain to Wolverhampton, Littler claimed 10 titles in 2024 and is now eyeing the World Championship. - - As he progressed at the Ally Pally, the Manchester United fan was sent a good luck message by the club's former midfielder and ex-England captain David Beckham. In 12 months, Littler's Instagram followers have risen from 4,000 to 1.3m. Commercial backers include a clothing range, cereal firm and train company and he will appear in a reboot of the TV darts show Bullseye. Google say he was the most searched-for athlete online in the UK during 2024. On the back of his success, Littler darts, boards, cabinets, shirts are being snapped up in big numbers. "This Christmas the junior magnetic dartboard is selling out, we're talking over 100,000. They're 20 quid and a great introduction for young children," said Garry Plummer, the boss of sponsors Target Darts, who first signed a deal with Littler's family when he was aged 12. "All the toy shops want it, they all want him - 17, clean, doesn't drink, wonderful." - - - ... (output truncated for brevity) - - -Note: To create a COMPOSITE index, the below code can be used. -Choose based on your specific use case and query patterns. For this tutorial's news search scenario, either index type would work, but BHIVE might be more efficient for pure semantic search across news articles. 
- - -```python -from langchain_couchbase.vectorstores import IndexType -vector_store.create_index(index_type=IndexType.COMPOSITE, index_name="bedrock_composite_index", index_description="IVF,SQ8") -``` - -# Retrieval-Augmented Generation (RAG) with Couchbase and LangChain -Couchbase and LangChain can be seamlessly integrated to create RAG (Retrieval-Augmented Generation) chains, enhancing the process of generating contextually relevant responses. In this setup, Couchbase serves as the vector store, where embeddings of documents are stored. When a query is made, LangChain retrieves the most relevant documents from Couchbase by comparing the query’s embedding with the stored document embeddings. These documents, which provide contextual information, are then passed to a generative language model within LangChain. - -The language model, equipped with the context from the retrieved documents, generates a response that is both informed and contextually accurate. This integration allows the RAG chain to leverage Couchbase’s efficient storage and retrieval capabilities, while LangChain handles the generation of responses based on the context provided by the retrieved documents. Together, they create a powerful system that can deliver highly relevant and accurate answers by combining the strengths of both retrieval and generation. 
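Conceptually, the chain we are about to build performs three steps: retrieve, stuff into a prompt, generate. The hand-rolled sketch below makes that flow explicit; `retrieve` and `llm_complete` are hypothetical stand-ins for the Couchbase retriever and the Bedrock model, not LangChain's actual internals.

```python
def rag_answer(question, retrieve, llm_complete):
    # 1. Retrieve passages whose embeddings are close to the question's.
    docs = retrieve(question)
    # 2. Stuff the retrieved passages into the prompt as context.
    context = "\n\n".join(docs)
    prompt = f"Context: {context}\n\nQuestion: {question}"
    # 3. Let the language model answer, grounded in that context.
    return llm_complete(prompt)

# Tiny demo with stub components standing in for Couchbase and Bedrock:
print(rag_answer(
    "Who won the match?",
    retrieve=lambda q: ["Team A beat Team B 2-1 on Saturday."],
    llm_complete=lambda prompt: "Team A won 2-1.",
))  # -> Team A won 2-1.
```

The LangChain chain composes the same three steps declaratively, with the retriever filling `context` and `RunnablePassthrough` forwarding the question.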
- - -```python -# Create RAG prompt template -rag_prompt = ChatPromptTemplate.from_messages([ - ("system", "You are a helpful assistant that answers questions based on the provided context."), - ("human", "Context: {context}\n\nQuestion: {question}") -]) - -# Create RAG chain -rag_chain = ( - {"context": vector_store.as_retriever(), "question": RunnablePassthrough()} - | rag_prompt - | llm - | StrOutputParser() -) -logging.info("Successfully created RAG chain") -``` - - 2025-09-02 12:25:08,521 - INFO - Successfully created RAG chain - - - -```python -start_time = time.time() -# Turn off excessive Logging -logging.basicConfig(level=logging.WARNING, format='%(asctime)s - %(levelname)s - %(message)s', force=True) - -try: - rag_response = rag_chain.invoke(query) - rag_elapsed_time = time.time() - start_time - print(f"RAG Response: {rag_response}") - print(f"RAG response generated in {rag_elapsed_time:.2f} seconds") -except InternalServerFailureException as e: - if "query request rejected" in str(e): - print("Error: Search request was rejected due to rate limiting. Please try again later.") - else: - print(f"Internal server error occurred: {str(e)}") -except Exception as e: - print(f"Unexpected error occurred: {str(e)}") -``` - - RAG Response: - Luke Littler hit a tournament record 140.91 set average as he started his bid for the PDC World Championship title with a dramatic 3-1 win over Ryan Meikle. The 17-year-old made headlines around the world when he reached the tournament final in January, where he lost to Luke Humphries. Starting this campaign on Saturday, Littler was millimetres away from a nine-darter when he missed double 12 as he blew Meikle away in the fourth and final set of the second-round match. 
Littler was overcome with emotion at the end
-    RAG response generated in 0.41 seconds
-
-
-# Using Couchbase as a caching mechanism
-Couchbase can be effectively used as a caching mechanism for RAG (Retrieval-Augmented Generation) responses by storing and retrieving precomputed results for specific queries. This approach enhances the system's efficiency and speed, particularly when dealing with repeated or similar queries. When a query is first processed, the RAG chain retrieves relevant documents, generates a response using the language model, and then stores this response in Couchbase, with the query serving as the key.
-
-For subsequent requests with the same query, the system checks Couchbase first. If a cached response is found, it is retrieved directly from Couchbase, bypassing the need to re-run the entire RAG process. This significantly reduces response time because the computationally expensive steps of document retrieval and response generation are skipped. Couchbase's role in this setup is to provide a fast and scalable storage solution for caching these responses, ensuring that frequently asked queries can be answered more quickly and efficiently.
-
-
-
-```python
-try:
-    queries = [
-        "What happened in the match between Fulham and Liverpool?",
-        "What were Luke Littler's key achievements and records in his recent PDC World Championship match?",
-        "What happened in the match between Fulham and Liverpool?",  # Repeated query
-    ]
-
-    for i, query in enumerate(queries, 1):
-        print(f"\nQuery {i}: {query}")
-        start_time = time.time()
-
-        response = rag_chain.invoke(query)
-        elapsed_time = time.time() - start_time
-        print(f"Response: {response}")
-        print(f"Time taken: {elapsed_time:.2f} seconds")
-
-except InternalServerFailureException as e:
-    if "query request rejected" in str(e):
-        print("Error: Search request was rejected due to rate limiting.
Please try again later.") - else: - print(f"Internal server error occurred: {str(e)}") -except Exception as e: - print(f"Unexpected error occurred: {str(e)}") -``` - - - Query 1: What happened in the match between Fullham and Liverpool? - Response: The match between Fullham and Liverpool ended in a 2-2 draw. - Time taken: 2.30 seconds - - Query 2: What were Luke Littler's key achievements and records in his recent PDC World Championship match? - Response: - Luke Littler hit a tournament record 140.91 set average as he started his bid for the PDC World Championship title with a dramatic 3-1 win over Ryan Meikle. The 17-year-old made headlines around the world when he reached the tournament final in January, where he lost to Luke Humphries. Starting this campaign on Saturday, Littler was millimetres away from a nine-darter when he missed double 12 as he blew Meikle away in the fourth and final set of the second-round match. Littler was overcome with emotion at the end - Time taken: 0.40 seconds - - Query 3: What happened in the match between Fullham and Liverpool? - Response: The match between Fullham and Liverpool ended in a 2-2 draw. - Time taken: 0.36 seconds - - -## Conclusion -By following these steps, you'll have a fully functional semantic search engine that leverages the strengths of Couchbase and AWS Bedrock. This guide is designed not just to show you how to build the system, but also to explain why each step is necessary, giving you a deeper understanding of the principles behind semantic search and how it improves querying data more efficiently using GSI which can significantly improve your RAG performance. Whether you're a newcomer to software development or an experienced developer looking to expand your skills, this guide will provide you with the knowledge and tools you need to create a powerful, AI-driven search engine. 
diff --git a/tutorial/markdown/generated/vector-search-cookbook/azure-fts-RAG_with_Couchbase_and_AzureOpenAI.md b/tutorial/markdown/generated/vector-search-cookbook/azure-fts-RAG_with_Couchbase_and_AzureOpenAI.md deleted file mode 100644 index 6235472..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/azure-fts-RAG_with_Couchbase_and_AzureOpenAI.md +++ /dev/null @@ -1,605 +0,0 @@ ---- -# frontmatter -path: "/tutorial-azure-openai-couchbase-rag-with-fts" -title: Retrieval-Augmented Generation (RAG) with Couchbase and Azure OpenAI using FTS service -short_title: RAG with Couchbase and Azure OpenAI using FTS service -description: - - Learn how to build a semantic search engine using Couchbase and Azure OpenAI using FTS service. - - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with Azure OpenAI embeddings. - - You'll understand how to perform Retrieval-Augmented Generation (RAG) using LangChain and Couchbase. -content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - FTS - - Artificial Intelligence - - LangChain - - OpenAI -sdk_language: - - python -length: 60 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/azure/fts/RAG_with_Couchbase_and_AzureOpenAI.ipynb) - -# Introduction -In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database, [AzureOpenAI](https://azure.microsoft.com/) as the AI-powered embedding and language model provider. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system using the FTS service from scratch. 
Alternatively, if you want to perform semantic search using the GSI index, please take a look at [this tutorial](https://developer.couchbase.com/tutorial-azure-openai-couchbase-rag-with-global-secondary-index/).
-
-# How to run this tutorial
-
-This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/azure/fts/RAG_with_Couchbase_and_AzureOpenAI.ipynb).
-
-You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.
-
-# Before you start
-
-## Get Credentials for Azure OpenAI
-
-Please follow the [instructions](https://learn.microsoft.com/en-us/azure/ai-services/openai/reference) to generate the Azure OpenAI credentials.
-
-## Create and Deploy Your Free Tier Operational cluster on Capella
-
-To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.
-
-To know more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).
-
-### Couchbase Capella Configuration
-
-When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.
-
-* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the bucket (Read and Write) used in the application.
-* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.
-
-# Setting the Stage: Installing Necessary Libraries
-To build our semantic search engine, we need a robust set of tools. 
The libraries we install handle everything from connecting to databases to performing complex machine learning tasks. Each library has a specific role: Couchbase libraries manage database operations, LangChain handles AI model integrations, and AzureOpenAI provides advanced AI models for generating embeddings and understanding natural language. By setting up these libraries, we ensure our environment is equipped to handle the data-intensive and computationally complex tasks required for semantic search. - - -```python -!pip install datasets==3.5.0 langchain-couchbase==0.3.0 langchain-openai==0.3.13 -``` - -# Importing Necessary Libraries -The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. These libraries provide essential functions for working with data, managing database connections, and processing machine learning models. - - -```python -import getpass -import json -import logging -import sys -import time -from datetime import timedelta -from uuid import uuid4 - -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.exceptions import ( - CouchbaseException, - InternalServerFailureException, - QueryIndexAlreadyExistsException, -) -from couchbase.management.search import SearchIndex -from couchbase.options import ClusterOptions -from datasets import load_dataset -from langchain_core.documents import Document -from langchain_core.globals import set_llm_cache -from langchain_core.output_parsers import StrOutputParser -from langchain_core.prompts import ChatPromptTemplate -from langchain_core.runnables import RunnablePassthrough -from langchain_couchbase.cache import CouchbaseCache -from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore -from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings -from tqdm import tqdm -``` - - 
/Users/aayush.tyagi/Documents/AI/vector-search-cookbook/.venv/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html - from .autonotebook import tqdm as notebook_tqdm - - -# Setup Logging -Logging is configured to track the progress of the script and capture any errors or warnings. This is crucial for debugging and understanding the flow of execution. The logging output includes timestamps, log levels (e.g., INFO, ERROR), and messages that describe what is happening in the script. - - - -```python -logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True) -``` - -# Loading Sensitive Information -In this section, we prompt the user to input essential configuration settings needed. These settings include sensitive information like API keys, database credentials, and specific configuration names. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security. - -The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code. 
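
The next cell collects these settings interactively with `input()` and `getpass`. In non-interactive environments (scripts, CI pipelines), you could read the same settings from environment variables instead. Below is a minimal sketch; the variable names are assumptions for illustration, not something this tutorial's tooling requires:

```python
import os

def load_setting(name, default=None, required=False):
    """Fetch a setting from the environment, falling back to a default.

    Raises ValueError when a required setting is missing, mirroring the
    validation the interactive cell below performs on the Azure variables.
    """
    value = os.environ.get(name, default)
    if required and not value:
        raise ValueError(f"Missing required setting: {name}")
    return value

# Hypothetical variable names; adjust to your own deployment.
CB_HOST = load_setting("CB_HOST", default="couchbase://localhost")
CB_BUCKET_NAME = load_setting("CB_BUCKET_NAME", default="vector-search-testing")
```

Either approach works; the interactive prompts used below are simply friendlier for a notebook walkthrough.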
- - -```python -AZURE_OPENAI_KEY = getpass.getpass('Enter your Azure OpenAI Key: ') -AZURE_OPENAI_ENDPOINT = input('Enter your Azure OpenAI Endpoint: ') -AZURE_OPENAI_EMBEDDING_DEPLOYMENT = input('Enter your Azure OpenAI Embedding Deployment: ') -AZURE_OPENAI_CHAT_DEPLOYMENT = input('Enter your Azure OpenAI Chat Deployment: ') - -CB_HOST = input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost' -CB_USERNAME = input('Enter your Couchbase username (default: Administrator): ') or 'Administrator' -CB_PASSWORD = getpass.getpass('Enter your Couchbase password (default: password): ') or 'password' -CB_BUCKET_NAME = input('Enter your Couchbase bucket name (default: vector-search-testing): ') or 'vector-search-testing' -INDEX_NAME = input('Enter your index name (default: vector_search_azure): ') or 'vector_search_azure' -SCOPE_NAME = input('Enter your scope name (default: shared): ') or 'shared' -COLLECTION_NAME = input('Enter your collection name (default: azure): ') or 'azure' -CACHE_COLLECTION = input('Enter your cache collection name (default: cache): ') or 'cache' - -# Check if the variables are correctly loaded -if not all([AZURE_OPENAI_KEY, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_EMBEDDING_DEPLOYMENT, AZURE_OPENAI_CHAT_DEPLOYMENT]): - raise ValueError("Missing required Azure OpenAI variables") -``` - - Enter your Azure OpenAI Key: ·········· - Enter your Azure OpenAI Endpoint: https://first-couchbase-instance.openai.azure.com/ - Enter your Azure OpenAI Embedding Deployment: text-embedding-ada-002 - Enter your Azure OpenAI Chat Deployment: gpt-4o - Enter your Couchbase host (default: couchbase://localhost): couchbases://cb.hlcup4o4jmjr55yf.cloud.couchbase.com - Enter your Couchbase username (default: Administrator): vector-search-rag-demos - Enter your Couchbase password (default: password): ·········· - Enter your Couchbase bucket name (default: vector-search-testing): - Enter your index name (default: vector_search_azure): - 
Enter your scope name (default: shared): - Enter your collection name (default: azure): - Enter your cache collection name (default: cache): - - -# Connecting to the Couchbase Cluster -Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount. - - - - -```python -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to Couchbase") -except Exception as e: - raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}") -``` - - 2024-09-06 07:29:16,632 - INFO - Successfully connected to Couchbase - - -# Setting Up Collections in Couchbase -In Couchbase, data is organized in buckets, which can be further divided into scopes and collections. Think of a collection as a table in a traditional SQL database. Before we can store any data, we need to ensure that our collections exist. If they don't, we must create them. This step is important because it prepares the database to handle the specific types of data our application will process. By setting up collections, we define the structure of our data storage, which is essential for efficient data retrieval and management. - -Moreover, setting up collections allows us to isolate different types of data within the same bucket, providing a more organized and scalable data structure. 
This is particularly useful when dealing with large datasets, as it ensures that related data is stored together, making it easier to manage and query.
-
-
-```python
-def setup_collection(cluster, bucket_name, scope_name, collection_name):
-    try:
-        bucket = cluster.bucket(bucket_name)
-        bucket_manager = bucket.collections()
-
-        # Check if collection exists, create if it doesn't
-        collections = bucket_manager.get_all_scopes()
-        collection_exists = any(
-            scope.name == scope_name and collection_name in [col.name for col in scope.collections]
-            for scope in collections
-        )
-
-        if not collection_exists:
-            logging.info(f"Collection '{collection_name}' does not exist. Creating it...")
-            bucket_manager.create_collection(scope_name, collection_name)
-            logging.info(f"Collection '{collection_name}' created successfully.")
-        else:
-            logging.info(f"Collection '{collection_name}' already exists. Skipping creation.")
-
-        collection = bucket.scope(scope_name).collection(collection_name)
-        time.sleep(2)  # Give the collection time to be ready for queries
-
-        # Ensure primary index exists
-        try:
-            cluster.query(f"CREATE PRIMARY INDEX IF NOT EXISTS ON `{bucket_name}`.`{scope_name}`.`{collection_name}`").execute()
-            logging.info("Primary index present or created successfully.")
-        except Exception as e:
-            logging.warning(f"Error creating primary index: {str(e)}")
-
-        # Clear all documents in the collection
-        try:
-            query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`"
-            cluster.query(query).execute()
-            logging.info("All documents cleared from the collection.")
-        except Exception as e:
-            logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.")
-
-        return collection
-    except Exception as e:
-        raise RuntimeError(f"Error setting up collection: {str(e)}")
-
-setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME)
-setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION)
-```
-
-    2024-09-06 07:29:17,029 - INFO - Collection 'azure' already exists. Skipping creation.
-    2024-09-06 07:29:17,095 - INFO - Primary index present or created successfully.
-    2024-09-06 07:29:17,775 - INFO - All documents cleared from the collection.
-    2024-09-06 07:29:17,841 - INFO - Collection 'cache' already exists. Skipping creation.
-    2024-09-06 07:29:17,907 - INFO - Primary index present or created successfully.
-    2024-09-06 07:29:17,973 - INFO - All documents cleared from the collection.
-
-
-
-
-# Loading Couchbase Vector Search Index
-
-Semantic search requires an efficient way to retrieve relevant documents based on a user's query. This is where the Couchbase **Vector Search Index** comes into play. In this step, we load the Vector Search Index definition from a JSON file, which specifies how the index should be structured. This includes the fields to be indexed, the dimensions of the vectors, and other parameters that determine how the search engine processes queries based on vector similarity.
-
-For more information on creating a vector search index, please follow the [instructions](https://docs.couchbase.com/cloud/vector-search/create-vector-search-index-ui.html).
-
-
-
-```python
-# If you are running this script locally (not in Google Colab), uncomment the following line
-# and provide the path to your index definition file. 
- -# index_definition_path = '/path_to_your_index_file/azure_index.json' # Local setup: specify your file path here - -# If you are running in Google Colab, use the following code to upload the index definition file -from google.colab import files -print("Upload your index definition file") -uploaded = files.upload() -index_definition_path = list(uploaded.keys())[0] - -try: - with open(index_definition_path, 'r') as file: - index_definition = json.load(file) -except Exception as e: - raise ValueError(f"Error loading index definition from {index_definition_path}: {str(e)}") -``` - - Upload your index definition file - - - Saving azure_index.json to azure_index.json - - -# Creating or Updating Search Indexes - -With the index definition loaded, the next step is to create or update the **Vector Search Index** in Couchbase. This step is crucial because it optimizes our database for vector similarity search operations, allowing us to perform searches based on the semantic content of documents rather than just keywords. By creating or updating a Vector Search Index, we enable our search engine to handle complex queries that involve finding semantically similar documents using vector embeddings, which is essential for a robust semantic search engine. 
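
For reference, the definition loaded from `azure_index.json` is a Search (FTS) index containing a vector-typed field. The sketch below shows roughly what such a definition can look like, expressed as a Python dict. It is illustrative only: the exact fields and values depend on your Couchbase Server version and on the names chosen earlier (bucket `vector-search-testing`, scope `shared`, collection `azure`, 1536-dimensional `text-embedding-ada-002` vectors), so start from an index created or exported via the Capella UI rather than from this sketch.

```python
# Abridged, illustrative sketch of a vector search index definition.
# Field names and nesting are assumptions; consult the Couchbase docs or an
# index exported from the UI for the authoritative structure.
index_definition_sketch = {
    "name": "vector_search_azure",
    "type": "fulltext-index",
    "sourceName": "vector-search-testing",
    "params": {
        "doc_config": {"mode": "scope.collection.type_field"},
        "mapping": {
            "types": {
                "shared.azure": {  # scope.collection being indexed
                    "enabled": True,
                    "properties": {
                        "embedding": {
                            "fields": [{
                                "name": "embedding",
                                "type": "vector",
                                "dims": 1536,  # text-embedding-ada-002 output size
                                "similarity": "dot_product",
                            }]
                        },
                        "text": {
                            "fields": [{"name": "text", "type": "text", "store": True}]
                        },
                    },
                }
            }
        },
    },
}
```

The code below takes whatever JSON you loaded and upserts it as-is, so any valid definition file works here.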
- - -```python -try: - scope_index_manager = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME).search_indexes() - - # Check if index already exists - existing_indexes = scope_index_manager.get_all_indexes() - index_name = index_definition["name"] - - if index_name in [index.name for index in existing_indexes]: - logging.info(f"Index '{index_name}' found") - else: - logging.info(f"Creating new index '{index_name}'...") - - # Create SearchIndex object from JSON definition - search_index = SearchIndex.from_json(index_definition) - - # Upsert the index (create if not exists, update if exists) - scope_index_manager.upsert_index(search_index) - logging.info(f"Index '{index_name}' successfully created/updated.") - -except QueryIndexAlreadyExistsException: - logging.info(f"Index '{index_name}' already exists. Skipping creation/update.") - -except InternalServerFailureException as e: - error_message = str(e) - logging.error(f"InternalServerFailureException raised: {error_message}") - - try: - # Accessing the response_body attribute from the context - error_context = e.context - response_body = error_context.response_body - if response_body: - error_details = json.loads(response_body) - error_message = error_details.get('error', '') - - if "collection: 'azure' doesn't belong to scope: 'shared'" in error_message: - raise ValueError("Collection 'azure' does not belong to scope 'shared'. Please check the collection and scope names.") - - except ValueError as ve: - logging.error(str(ve)) - raise - - except Exception as json_error: - logging.error(f"Failed to parse the error message: {json_error}") - raise RuntimeError(f"Internal server error while creating/updating search index: {error_message}") -``` - - 2024-09-06 07:30:01,070 - INFO - Index 'vector_search_azure' found - 2024-09-06 07:30:01,373 - INFO - Index 'vector_search_azure' already exists. Skipping creation/update. - - -# Load the TREC Dataset -To build a search engine, we need data to search through. 
We use the TREC dataset, a well-known benchmark in the field of information retrieval. This dataset contains a wide variety of text data that we'll use to train our search engine. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the data in the TREC dataset make it an excellent choice for testing and refining our search engine, ensuring that it can handle a wide range of queries effectively. - -The TREC dataset's rich content allows us to simulate real-world scenarios where users ask complex questions, enabling us to fine-tune our search engine's ability to understand and respond to various types of queries. - - -```python -try: - trec = load_dataset('trec', split='train[:1000]') - logging.info(f"Successfully loaded TREC dataset with {len(trec)} samples") -except Exception as e: - raise ValueError(f"Error loading TREC dataset: {str(e)}") -``` - - /usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:89: UserWarning: - The secret `HF_TOKEN` does not exist in your Colab secrets. - To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session. - You will be able to reuse this secret in all of your notebooks. - Please note that authentication is recommended but still optional to access public models or datasets. - warnings.warn( - - - The repository for trec contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/trec. - You can avoid this prompt in future by passing the argument `trust_remote_code=True`. - - Do you wish to run the custom code? [y/N] y - - - 2024-09-06 07:30:12,308 - INFO - Successfully loaded TREC dataset with 1000 samples - - -# Creating AzureOpenAI Embeddings -Embeddings are at the heart of semantic search. 
They are numerical representations of text that capture the semantic meaning of the words and phrases. Unlike traditional keyword-based search, which looks for exact matches, embeddings allow our search engine to understand the context and nuances of language, enabling it to retrieve documents that are semantically similar to the query, even if they don't contain the exact keywords. By creating embeddings using AzureOpenAI, we equip our search engine with the ability to understand and process natural language in a way that's much closer to how humans understand language. This step transforms our raw text data into a format that the search engine can use to find and rank relevant documents. - - - - -```python -try: - embeddings = AzureOpenAIEmbeddings( - deployment=AZURE_OPENAI_EMBEDDING_DEPLOYMENT, - openai_api_key=AZURE_OPENAI_KEY, - azure_endpoint=AZURE_OPENAI_ENDPOINT - ) - logging.info("Successfully created AzureOpenAIEmbeddings") -except Exception as e: - raise ValueError(f"Error creating AzureOpenAIEmbeddings: {str(e)}") -``` - - 2024-09-06 07:30:13,014 - INFO - Successfully created AzureOpenAIEmbeddings - - -# Setting Up the Couchbase Vector Store -The vector store is set up to manage the embeddings created in the previous step. The vector store is essentially a database optimized for storing and retrieving high-dimensional vectors. In this case, the vector store is built on top of Couchbase, allowing the script to store the embeddings in a way that can be efficiently searched. 
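
Conceptually, a vector store keeps `(text, embedding)` records and ranks them against a query embedding with a similarity metric such as cosine similarity. The toy in-memory version below is only a mental model of what `CouchbaseSearchVectorStore` does at scale; it is not part of the tutorial's pipeline:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class ToyVectorStore:
    """Minimal in-memory stand-in: stores (text, vector) records and
    returns the k records most similar to a query vector."""

    def __init__(self):
        self.records = []

    def add(self, text, vector):
        self.records.append((text, vector))

    def similarity_search(self, query_vector, k=3):
        scored = [(cosine_similarity(query_vector, vec), text)
                  for text, vec in self.records]
        scored.sort(reverse=True)  # highest similarity first
        return scored[:k]
```

The real vector store replaces the linear scan above with the Search index created earlier, which is what makes the approach scale to large collections.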
- - -```python -try: - vector_store = CouchbaseSearchVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding=embeddings, - index_name=INDEX_NAME, - ) - logging.info("Successfully created vector store") -except Exception as e: - raise ValueError(f"Failed to create vector store: {str(e)}") - -``` - - 2024-09-06 07:30:14,043 - INFO - Successfully created vector store - - -# Saving Data to the Vector Store -With the vector store set up, the next step is to populate it with data. We save the TREC dataset to the vector store in batches. This method is efficient and ensures that our search engine can handle large datasets without running into performance issues. By saving the data in this way, we prepare our search engine to quickly and accurately respond to user queries. This step is essential for making the dataset searchable, transforming raw data into a format that can be easily queried by our search engine. - -Batch processing is particularly important when dealing with large datasets, as it prevents memory overload and ensures that the data is stored in a structured and retrievable manner. This approach not only optimizes performance but also ensures the scalability of our system. 
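
The batching in the next cell is plain list slicing. Isolated as a small helper, the slicing logic looks like this sketch:

```python
def batched(items, batch_size):
    """Yield successive slices of `items` with at most `batch_size` elements."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# Each batch would then be wrapped in Document objects and passed to
# vector_store.add_documents(...), as the cell below does with batch_size=50.
```

Keeping each call to `add_documents` bounded like this is what prevents memory spikes when the dataset grows.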
- - -```python -try: - batch_size = 50 - logging.disable(sys.maxsize) # Disable logging to prevent tqdm output - for i in tqdm(range(0, len(trec['text']), batch_size), desc="Processing Batches"): - batch = trec['text'][i:i + batch_size] - documents = [Document(page_content=text) for text in batch] - uuids = [str(uuid4()) for _ in range(len(documents))] - vector_store.add_documents(documents=documents, ids=uuids) - logging.disable(logging.NOTSET) # Re-enable logging -except Exception as e: - raise RuntimeError(f"Failed to save documents to vector store: {str(e)}") -``` - - Processing Batches: 100%|██████████| 20/20 [00:37<00:00, 1.87s/it] - - -# Setting Up a Couchbase Cache -To further optimize our system, we set up a Couchbase-based cache. A cache is a temporary storage layer that holds data that is frequently accessed, speeding up operations by reducing the need to repeatedly retrieve the same information from the database. In our setup, the cache will help us accelerate repetitive tasks, such as looking up similar documents. By implementing a cache, we enhance the overall performance of our search engine, ensuring that it can handle high query volumes and deliver results quickly. - -Caching is particularly valuable in scenarios where users may submit similar queries multiple times or where certain pieces of information are frequently requested. By storing these in a cache, we can significantly reduce the time it takes to respond to these queries, improving the user experience. 
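
The cache-aside behaviour described above can be sketched in a few lines of plain Python, with a dict standing in for the Couchbase cache collection and a stub function standing in for the full RAG chain (both stand-ins are assumptions for illustration only):

```python
def make_cached(generate, cache):
    """Wrap `generate` so repeated queries are served from `cache`."""
    def cached_generate(query):
        if query in cache:            # cache hit: skip retrieval + generation
            return cache[query]
        response = generate(query)    # cache miss: run the expensive path
        cache[query] = response       # store for next time, keyed by the query
        return response
    return cached_generate

calls = []
def slow_generate(query):
    calls.append(query)               # record how often the slow path runs
    return f"answer to: {query}"

cache = {}
rag = make_cached(slow_generate, cache)
first = rag("What is Couchbase?")     # slow path runs once
second = rag("What is Couchbase?")    # answered from the cache
```

`CouchbaseCache` plus `set_llm_cache`, used in the next cell, give you this behaviour transparently inside LangChain, with the cache persisted in Couchbase instead of a local dict.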
- - - -```python -try: - cache = CouchbaseCache( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=CACHE_COLLECTION, - ) - logging.info("Successfully created cache") - set_llm_cache(cache) -except Exception as e: - raise ValueError(f"Failed to create cache: {str(e)}") -``` - - 2024-09-06 07:30:52,165 - INFO - Successfully created cache - - -# Using the AzureChatOpenAI Language Model (LLM) -Language models are AI systems that are trained to understand and generate human language. We'll be using `AzureChatOpenAI` language model to process user queries and generate meaningful responses. This model is a key component of our semantic search engine, allowing it to go beyond simple keyword matching and truly understand the intent behind a query. By creating this language model, we equip our search engine with the ability to interpret complex queries, understand the nuances of language, and provide more accurate and contextually relevant responses. - -The language model's ability to understand context and generate coherent responses is what makes our search engine truly intelligent. It can not only find the right information but also present it in a way that is useful and understandable to the user. - - - - -```python -try: - llm = AzureChatOpenAI( - deployment_name=AZURE_OPENAI_CHAT_DEPLOYMENT, - openai_api_key=AZURE_OPENAI_KEY, - azure_endpoint=AZURE_OPENAI_ENDPOINT, - openai_api_version="2024-07-01-preview" - ) - logging.info("Successfully created Azure OpenAI Chat model") -except Exception as e: - raise ValueError(f"Error creating Azure OpenAI Chat model: {str(e)}") -``` - - 2024-09-06 07:30:52,298 - INFO - Successfully created Azure OpenAI Chat model - - -# Perform Semantic Search -Semantic search in Couchbase involves converting queries and documents into vector representations using an embeddings model. These vectors capture the semantic meaning of the text and are stored directly in Couchbase. 
When a query is made, Couchbase performs a similarity search by comparing the query vector against the stored document vectors. The similarity metric used for this comparison is configurable, allowing flexibility in how the relevance of documents is determined. Common metrics include cosine similarity, Euclidean distance, or dot product, but other metrics can be implemented based on specific use cases. Different embedding models like BERT, Word2Vec, or GloVe can also be used depending on the application's needs, with the vectors generated by these models stored and searched within Couchbase itself. - -In the provided code, the search process begins by recording the start time, followed by executing the similarity_search_with_score method of the CouchbaseSearchVectorStore. This method searches Couchbase for the most relevant documents based on the vector similarity to the query. The search results include the document content and a similarity score that reflects how closely each document aligns with the query in the defined semantic space. The time taken to perform this search is then calculated and logged, and the results are displayed, showing the most relevant documents along with their similarity scores. This approach leverages Couchbase as both a storage and retrieval engine for vector data, enabling efficient and scalable semantic searches. The integration of vector storage and search capabilities within Couchbase allows for sophisticated semantic search operations without relying on external services for vector storage or comparison. - - -```python -query = "What caused the 1929 Great Depression?" 
- -try: - # Perform the semantic search - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=10) - search_elapsed_time = time.time() - start_time - - logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds") - - # Display search results - print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):") - for doc, score in search_results: - print(f"Distance: {score:.4f}, Text: {doc.page_content}") - -except CouchbaseException as e: - raise RuntimeError(f"Error performing semantic search: {str(e)}") -except Exception as e: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - 2024-09-06 07:30:52,532 - INFO - HTTP Request: POST https://first-couchbase-instance.openai.azure.com//openai/deployments/text-embedding-ada-002/embeddings?api-version=2023-05-15 "HTTP/1.1 200 OK" - 2024-09-06 07:30:52,839 - INFO - Semantic search completed in 0.53 seconds - - - - Semantic Search Results (completed in 0.53 seconds): - Distance: 0.9178, Text: Why did the world enter a global depression in 1929 ? - Distance: 0.8714, Text: When was `` the Great Depression '' ? - Distance: 0.8113, Text: What crop failure caused the Irish Famine ? - Distance: 0.7984, Text: What historical event happened in Dogtown in 1899 ? - Distance: 0.7917, Text: What caused the Lynmouth floods ? - Distance: 0.7915, Text: When was the first Wall Street Journal published ? - Distance: 0.7911, Text: When did the Dow first reach ? - Distance: 0.7885, Text: What were popular songs and types of songs in the 1920s ? - Distance: 0.7857, Text: When did World War I start ? - Distance: 0.7842, Text: What caused Harry Houdini 's death ? - - -# Retrieval-Augmented Generation (RAG) with Couchbase and Langchain -Couchbase and LangChain can be seamlessly integrated to create RAG (Retrieval-Augmented Generation) chains, enhancing the process of generating contextually relevant responses. 
In this setup, Couchbase serves as the vector store, where embeddings of documents are stored. When a query is made, LangChain retrieves the most relevant documents from Couchbase by comparing the query’s embedding with the stored document embeddings. These documents, which provide contextual information, are then passed to a generative language model within LangChain. - -The language model, equipped with the context from the retrieved documents, generates a response that is both informed and contextually accurate. This integration allows the RAG chain to leverage Couchbase’s efficient storage and retrieval capabilities, while LangChain handles the generation of responses based on the context provided by the retrieved documents. Together, they create a powerful system that can deliver highly relevant and accurate answers by combining the strengths of both retrieval and generation. - - -```python -template = """You are a helpful bot. If you cannot answer based on the context provided, respond with a generic answer. 
Answer the question as truthfully as possible using the context below: - {context} - Question: {question}""" -prompt = ChatPromptTemplate.from_template(template) -rag_chain = ( - {"context": vector_store.as_retriever(), "question": RunnablePassthrough()} - | prompt - | llm - | StrOutputParser() -) -logging.info("Successfully created RAG chain") -``` - - 2024-09-06 07:30:52,860 - INFO - Successfully created RAG chain - - - -```python -# Get responses -logging.disable(sys.maxsize) # Disable logging to prevent tqdm output -start_time = time.time() -rag_response = rag_chain.invoke(query) -rag_elapsed_time = time.time() - start_time - -print(f"RAG Response: {rag_response}") -print(f"RAG response generated in {rag_elapsed_time:.2f} seconds") -``` - - RAG Response: The 1929 Great Depression was caused by a combination of factors, including the stock market crash of October 1929, bank failures, reduction in consumer spending and investment, and poor economic policies. - RAG response generated in 2.32 seconds - - -# Using Couchbase as a caching mechanism -Couchbase can be effectively used as a caching mechanism for RAG (Retrieval-Augmented Generation) responses by storing and retrieving precomputed results for specific queries. This approach enhances the system's efficiency and speed, particularly when dealing with repeated or similar queries. When a query is first processed, the RAG chain retrieves relevant documents, generates a response using the language model, and then stores this response in Couchbase, with the query serving as the key. - -For subsequent requests with the same query, the system checks Couchbase first. If a cached response is found, it is retrieved directly from Couchbase, bypassing the need to re-run the entire RAG process. This significantly reduces response time because the computationally expensive steps of document retrieval and response generation are skipped. 
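This cache-aside flow can be illustrated with a minimal in-memory sketch. Everything here is illustrative: a plain dict stands in for the Couchbase cache collection, and `slow_rag_chain` stands in for the real retrieval-plus-generation pipeline.

```python
import time

cache = {}  # stand-in for the Couchbase cache collection

def slow_rag_chain(query):
    """Stand-in for the expensive path: document retrieval + LLM generation."""
    time.sleep(0.2)
    return f"Answer for: {query}"

def cached_invoke(query):
    if query in cache:            # cache hit: skip retrieval and generation entirely
        return cache[query]
    response = slow_rag_chain(query)
    cache[query] = response       # store the response, keyed by the query
    return response

first = cached_invoke("What caused the 1929 Great Depression?")   # slow path
second = cached_invoke("What caused the 1929 Great Depression?")  # served from cache
```

The second call returns almost immediately because the query is already a key in the cache — the same timing difference shows up in the repeated queries in the demo run.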
Couchbase's role in this setup is to provide a fast and scalable storage solution for caching these responses, ensuring that frequently asked queries can be answered more quickly and efficiently. - - - -```python -try: - queries = [ - "Why do heavier objects travel downhill faster?", - "What is the capital of France?", - "What caused the 1929 Great Depression?", # Repeated query - "Why do heavier objects travel downhill faster?", # Repeated query - ] - - for i, query in enumerate(queries, 1): - print(f"\nQuery {i}: {query}") - start_time = time.time() - response = rag_chain.invoke(query) - elapsed_time = time.time() - start_time - print(f"Response: {response}") - print(f"Time taken: {elapsed_time:.2f} seconds") -except Exception as e: - raise ValueError(f"Error generating RAG response: {str(e)}") -``` - - - Query 1: Why do heavier objects travel downhill faster? - Response: Heavier objects travel downhill faster primarily due to the force of gravity acting on them. Gravity accelerates all objects at the same rate, but heavier objects may encounter less air resistance relative to their weight, allowing them to maintain higher speeds as they descend. Additionally, factors such as surface friction and the distribution of mass can influence the speed at which an object travels downhill. - Time taken: 61.73 seconds - - Query 2: What is the capital of France? - Response: The capital of France is Paris. - Time taken: 60.63 seconds - - Query 3: What caused the 1929 Great Depression? - Response: The 1929 Great Depression was caused by a combination of factors, including the stock market crash of October 1929, bank failures, reduction in consumer spending and investment, and poor economic policies. - Time taken: 1.49 seconds - - Query 4: Why do heavier objects travel downhill faster? - Response: Heavier objects travel downhill faster primarily due to the force of gravity acting on them. 
Gravity accelerates all objects at the same rate, but heavier objects may encounter less air resistance relative to their weight, allowing them to maintain higher speeds as they descend. Additionally, factors such as surface friction and the distribution of mass can influence the speed at which an object travels downhill. - Time taken: 0.60 seconds - - -By following these steps, you'll have a fully functional semantic search engine that leverages the strengths of Couchbase and AzureOpenAI. This guide is designed not just to show you how to build the system, but also to explain why each step is necessary, giving you a deeper understanding of the principles behind semantic search and how to implement it effectively. Whether you're a newcomer to software development or an experienced developer looking to expand your skills, this guide will provide you with the knowledge and tools you need to create a powerful, AI-driven search engine. diff --git a/tutorial/markdown/generated/vector-search-cookbook/azure-gsi-RAG_with_Couchbase_and_AzureOpenAI.md b/tutorial/markdown/generated/vector-search-cookbook/azure-gsi-RAG_with_Couchbase_and_AzureOpenAI.md deleted file mode 100644 index 52b39cb..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/azure-gsi-RAG_with_Couchbase_and_AzureOpenAI.md +++ /dev/null @@ -1,762 +0,0 @@ ---- -# frontmatter -path: "/tutorial-azure-openai-couchbase-rag-with-global-secondary-index" -title: Retrieval-Augmented Generation (RAG) with Couchbase and Azure OpenAI using GSI index -short_title: RAG with Couchbase and Azure OpenAI using GSI index -description: - - Learn how to build a semantic search engine using Couchbase and Azure OpenAI using GSI. - - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with Azure OpenAI embeddings. - - You'll understand how to perform Retrieval-Augmented Generation (RAG) using LangChain and Couchbase. 
-content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - GSI - - Artificial Intelligence - - LangChain - - OpenAI -sdk_language: - - python -length: 60 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/azure/gsi/RAG_with_Couchbase_and_AzureOpenAI.ipynb) - -# Introduction -In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database, [AzureOpenAI](https://azure.microsoft.com/) as the AI-powered embedding and language model provider. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system using GSI( Global Secondary Index) from scratch. Alternatively if you want to perform semantic search using the FTS index, please take a look at [this.](https://developer.couchbase.com/tutorial-azure-openai-couchbase-rag-with-fts/) - -# How to run this tutorial - -This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/azure/gsi/RAG_with_Couchbase_and_AzureOpenAI.ipynb). - -You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment. - -# Before you start - -## Get Credentials for Azure OpenAI - -Please follow the [instructions](https://learn.microsoft.com/en-us/azure/ai-services/openai/reference) to generate the Azure OpenAI credentials. 
-

## Create and Deploy Your Free Tier Operational Cluster on Capella

To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.

To know more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).

Note: To run this tutorial, you will need Capella with Couchbase Server version 8.0 or above, as GSI vector search is supported only from version 8.0.

### Couchbase Capella Configuration

When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.

* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the bucket (Read and Write) used in the application.
* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.

# Setting the Stage: Installing Necessary Libraries
To build our semantic search engine, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks. Each library has a specific role: Couchbase libraries manage database operations, LangChain handles AI model integrations, and AzureOpenAI provides advanced AI models for generating embeddings and understanding natural language. By setting up these libraries, we ensure our environment is equipped to handle the data-intensive and computationally complex tasks required for semantic search.
-


```python
!pip install --quiet datasets==3.5.0 langchain-couchbase==0.5.0 langchain-openai==0.3.32
```

# Importing Necessary Libraries
The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. These libraries provide essential functions for working with data, managing database connections, and processing machine learning models.


```python
import getpass
import json
import logging
import sys
import os
import time
from datetime import timedelta
from uuid import uuid4

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.exceptions import (
    CouchbaseException,
    InternalServerFailureException,
    QueryIndexAlreadyExistsException,
)
from couchbase.management.buckets import CreateBucketSettings  # used when creating the bucket below
from couchbase.options import ClusterOptions
from datasets import load_dataset
from langchain_core.documents import Document
from langchain_core.globals import set_llm_cache
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts.chat import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_couchbase.cache import CouchbaseCache
from langchain_couchbase.vectorstores import (
    CouchbaseQueryVectorStore,
    DistanceStrategy,
    IndexType,
)
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from tqdm import tqdm
```

# Setup Logging
Logging is configured to track the progress of the script and capture any errors or warnings. This is crucial for debugging and understanding the flow of execution. The logging output includes timestamps, log levels (e.g., INFO, ERROR), and messages that describe what is happening in the script.
- - - -```python -logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True) - -# Suppress verbose HTTP request logging -logging.getLogger("httpx").setLevel(logging.WARNING) -logging.getLogger("openai").setLevel(logging.WARNING) -logging.getLogger("urllib3").setLevel(logging.WARNING) -logging.getLogger("azure").setLevel(logging.WARNING) -``` - -# Loading Sensitive Information -In this section, we prompt the user to input essential configuration settings needed. These settings include sensitive information like API keys, database credentials, and specific configuration names. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security. - -The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code. 
- - -```python -AZURE_OPENAI_KEY = os.getenv('AZURE_OPENAI_KEY') or getpass.getpass('Enter your Azure OpenAI Key: ') -AZURE_OPENAI_ENDPOINT = os.getenv('AZURE_OPENAI_ENDPOINT') or input('Enter your Azure OpenAI Endpoint: ') -AZURE_OPENAI_EMBEDDING_DEPLOYMENT = os.getenv('AZURE_OPENAI_EMBEDDING_DEPLOYMENT') or input('Enter your Azure OpenAI Embedding Deployment: ') -AZURE_OPENAI_CHAT_DEPLOYMENT = os.getenv('AZURE_OPENAI_CHAT_DEPLOYMENT') or input('Enter your Azure OpenAI Chat Deployment: ') - -CB_HOST = os.getenv('CB_HOST') or input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost' -CB_USERNAME = os.getenv('CB_USERNAME') or input('Enter your Couchbase username (default: Administrator): ') or 'Administrator' -CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass('Enter your Couchbase password (default: password): ') or 'password' -CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input('Enter your Couchbase bucket name (default: query-vector-search-testing): ') or 'query-vector-search-testing' -SCOPE_NAME = os.getenv('SCOPE_NAME') or input('Enter your scope name (default: shared): ') or 'shared' -COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input('Enter your collection name (default: azure): ') or 'azure' -CACHE_COLLECTION = os.getenv('CACHE_COLLECTION') or input('Enter your cache collection name (default: cache): ') or 'cache' - -# Check if the variables are correctly loaded -if not all([AZURE_OPENAI_KEY, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_EMBEDDING_DEPLOYMENT, AZURE_OPENAI_CHAT_DEPLOYMENT]): - raise ValueError("Missing required Azure OpenAI variables") -``` - -# Connecting to the Couchbase Cluster -Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. 
By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount. - - - - -```python -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to Couchbase") -except Exception as e: - raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}") -``` - - 2025-09-22 12:23:15,245 - INFO - Successfully connected to Couchbase - - -## Setting Up Collections in Couchbase - -The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase: - -1. Bucket Creation: - - Checks if specified bucket exists, creates it if not - - Sets bucket properties like RAM quota (1024MB) and replication (disabled) - - Note: You will not be able to create a bucket on Capella - -2. Scope Management: - - Verifies if requested scope exists within bucket - - Creates new scope if needed (unless it's the default "_default" scope) - -3. Collection Setup: - - Checks for collection existence within scope - - Creates collection if it doesn't exist - - Waits 2 seconds for collection to be ready - -Additional Tasks: -- Clears any existing documents for clean state -- Implements comprehensive error handling and logging - -The function is called twice to set up: -1. Main collection for vector embeddings -2. 
Cache collection for storing results - - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' exists.") - except Exception as e: - logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...") - bucket_settings = CreateBucketSettings( - name=bucket_name, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - time.sleep(2) # Wait for bucket creation to complete and become available - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' created successfully.") - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == scope_name for scope in scopes) - - if not scope_exists and scope_name != "_default": - logging.info(f"Scope '{scope_name}' does not exist. Creating it...") - bucket_manager.create_scope(scope_name) - logging.info(f"Scope '{scope_name}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{collection_name}' does not exist. Creating it...") - bucket_manager.create_collection(scope_name, collection_name) - logging.info(f"Collection '{collection_name}' created successfully.") - else: - logging.info(f"Collection '{collection_name}' already exists. 
Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION) -``` - - 2025-09-22 12:23:20,911 - INFO - Bucket 'query-vector-search-testing' exists. - 2025-09-22 12:23:20,927 - INFO - Collection 'azure' already exists. Skipping creation. - - - 2025-09-22 12:23:23,264 - INFO - All documents cleared from the collection. - 2025-09-22 12:23:23,265 - INFO - Bucket 'query-vector-search-testing' exists. - 2025-09-22 12:23:23,280 - INFO - Collection 'cache' already exists. Skipping creation. - 2025-09-22 12:23:25,419 - INFO - All documents cleared from the collection. - - - - - - - - - -# Load the BBC News Dataset -To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively. 
- -The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version. - - -```python -try: - news_dataset = load_dataset( - "RealTimeData/bbc_news_alltime", "2024-12", split="train" - ) - print(f"Loaded the BBC News dataset with {len(news_dataset)} rows") - logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.") -except Exception as e: - raise ValueError(f"Error loading the BBC News dataset: {str(e)}") -``` - - 2025-09-22 12:23:43,453 - INFO - Successfully loaded the BBC News dataset with 2687 rows. - - - Loaded the BBC News dataset with 2687 rows - - -## Cleaning up the Data -We will use the content of the news articles for our RAG system. - -The dataset contains a few duplicate records. We are removing them to avoid duplicate results in the retrieval stage of our RAG system. - - -```python -news_articles = news_dataset["content"] -unique_articles = set() -for article in news_articles: - if article: - unique_articles.add(article) -unique_news_articles = list(unique_articles) -print(f"We have {len(unique_news_articles)} unique articles in our database.") -``` - - We have 1749 unique articles in our database. - - -# Creating AzureOpenAI Embeddings -Embeddings are at the heart of semantic search. They are numerical representations of text that capture the semantic meaning of the words and phrases. Unlike traditional keyword-based search, which looks for exact matches, embeddings allow our search engine to understand the context and nuances of language, enabling it to retrieve documents that are semantically similar to the query, even if they don't contain the exact keywords. 
By creating embeddings using AzureOpenAI, we equip our search engine with the ability to understand and process natural language in a way that's much closer to how humans understand language. This step transforms our raw text data into a format that the search engine can use to find and rank relevant documents.




```python
try:
    embeddings = AzureOpenAIEmbeddings(
        deployment=AZURE_OPENAI_EMBEDDING_DEPLOYMENT,
        openai_api_key=AZURE_OPENAI_KEY,
        azure_endpoint=AZURE_OPENAI_ENDPOINT
    )
    logging.info("Successfully created AzureOpenAIEmbeddings")
except Exception as e:
    raise ValueError(f"Error creating AzureOpenAIEmbeddings: {str(e)}")
```

    2025-09-22 12:23:51,333 - INFO - Successfully created AzureOpenAIEmbeddings


# Setting Up the Couchbase Query Vector Store
A vector store is where we'll keep our embeddings. The query vector store is specifically designed to handle embeddings and perform similarity searches. When a user submits a query, the embedding model converts it into a vector, and the GSI-backed vector store compares that vector against the embeddings stored in Couchbase. This allows the engine to find documents that are semantically similar to the query, even if they don't contain the exact same words. By setting up the vector store in Couchbase, we create a powerful tool that enables us to understand and retrieve information based on the meaning and context of the query, rather than just the specific words used.

The vector store requires a distance metric to determine how similarity between vectors is calculated. This is crucial for accurate semantic search results, as different distance metrics can yield different similarity rankings. The supported distance strategies include `dot`, `l2`, `euclidean`, `cosine`, `l2_squared`, and `euclidean_squared`. In our implementation we will use `cosine`, which is particularly effective for text embeddings.
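To make the metric concrete, here is a small pure-Python sketch of the cosine distance formula (1 - cosine similarity). It illustrates the arithmetic only, not Couchbase's internal implementation; with this metric, lower scores mean closer matches.

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cos(theta): 0 for identical directions, 1 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

same_direction = cosine_distance([1.0, 2.0], [2.0, 4.0])  # ~0.0: vectors point the same way
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])      # 1.0: unrelated directions
```

Because only direction matters, cosine distance ignores vector magnitude, which is one reason it works well for comparing text embeddings.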
-

```python
try:
    vector_store = CouchbaseQueryVectorStore(
        cluster=cluster,
        bucket_name=CB_BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=COLLECTION_NAME,
        embedding=embeddings,
        distance_metric=DistanceStrategy.COSINE
    )
    logging.info("Successfully created vector store")
except Exception as e:
    raise ValueError(f"Failed to create vector store: {str(e)}")
```

    2025-09-22 12:24:25,546 - INFO - Successfully created vector store


## Saving Data to the Vector Store
To efficiently handle the large number of articles, we process them in batches of 50 articles at a time. This batch processing approach helps manage memory usage and provides better control over the ingestion process.

We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's `add_texts` method, we add the filtered articles to our vector database. The `batch_size` parameter controls how many articles are processed in each iteration.

This approach offers several benefits:
1. Memory Efficiency: Processing in smaller batches prevents memory overload
2. Progress Tracking: Easier to monitor and track the ingestion progress
3. Resource Management: Better control over CPU and network resource utilization

We use a conservative batch size of 50 to ensure reliable operation. The optimal batch size depends on many factors, including:
- Document sizes being inserted
- Available system resources
- Network conditions
- Concurrent workload

Consider measuring performance with your specific workload before adjusting.
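Conceptually, the batching that `add_texts` performs when given `batch_size` amounts to slicing the article list into fixed-size chunks. A minimal sketch of that idea, using a toy list in place of the real articles:

```python
def batched(items, batch_size):
    """Yield successive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

toy_articles = [f"article-{i}" for i in range(120)]  # stand-in for the real article list
chunk_sizes = [len(chunk) for chunk in batched(toy_articles, 50)]
# 120 items at batch_size=50 -> chunks of 50, 50, and 20
```

Each chunk would be embedded and written as one unit, which is what bounds memory use and makes progress easy to track.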
- - -```python -batch_size = 50 - -# Automatic Batch Processing -articles = [article for article in unique_news_articles if article and len(article) <= 50000] - -try: - vector_store.add_texts( - texts=articles, - batch_size=batch_size - ) - logging.info("Document ingestion completed successfully.") -except Exception as e: - raise ValueError(f"Failed to save documents to vector store: {str(e)}") -``` - - 2025-09-22 12:36:18,756 - INFO - Document ingestion completed successfully. - - -# Using the AzureChatOpenAI Language Model (LLM) -Language models are AI systems that are trained to understand and generate human language. We'll be using `AzureChatOpenAI` language model to process user queries and generate meaningful responses. This model is a key component of our semantic search engine, allowing it to go beyond simple keyword matching and truly understand the intent behind a query. By creating this language model, we equip our search engine with the ability to interpret complex queries, understand the nuances of language, and provide more accurate and contextually relevant responses. - -The language model's ability to understand context and generate coherent responses is what makes our search engine truly intelligent. It can not only find the right information but also present it in a way that is useful and understandable to the user. - - - - -```python -try: - llm = AzureChatOpenAI( - deployment_name=AZURE_OPENAI_CHAT_DEPLOYMENT, - openai_api_key=AZURE_OPENAI_KEY, - azure_endpoint=AZURE_OPENAI_ENDPOINT, - openai_api_version="2024-10-21" - ) - logging.info("Successfully created Azure OpenAI Chat model") -except Exception as e: - raise ValueError(f"Error creating Azure OpenAI Chat model: {str(e)}") -``` - - 2025-09-22 12:39:45,695 - INFO - Successfully created Azure OpenAI Chat model - - -# Perform Semantic Search -Semantic search in Couchbase involves converting queries and documents into vector representations using an embeddings model. 
These vectors capture the semantic meaning of the text and are stored directly in Couchbase. When a query is made, Couchbase performs a similarity search by comparing the query vector against the stored document vectors. The similarity metric used for this comparison is configurable, allowing flexibility in how the relevance of documents is determined. Common metrics include cosine similarity, Euclidean distance, and dot product, but other metrics can be implemented based on specific use cases. Different embedding models like BERT, Word2Vec, or GloVe can also be used depending on the application's needs, with the vectors generated by these models stored and searched within Couchbase itself.

In the provided code, the search process begins by recording the start time, followed by executing the `similarity_search_with_score` method of the `CouchbaseQueryVectorStore`. This method searches Couchbase for the most relevant documents based on their vector similarity to the query. Each search result includes the document content and a distance score that reflects how closely the document aligns with the query in the defined semantic space; with the cosine metric, lower distances mean closer matches. The time taken to perform the search is then calculated and logged, and the results are displayed starting from the closest match. This approach leverages Couchbase as both a storage and retrieval engine for vector data, enabling efficient and scalable semantic searches without relying on external services for vector storage or comparison.


```python
query = "What were Luke Littler's key achievements and records in his recent PDC World Championship match?"
- -try: - # Perform the semantic search - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=10) - search_elapsed_time = time.time() - start_time - - logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds") - - # Display search results - print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):") - for doc, score in search_results: - print(f"Distance: {score:.4f}, Text: {doc.page_content}") - -except CouchbaseException as e: - raise RuntimeError(f"Error performing semantic search: {str(e)}") -except Exception as e: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - 2025-09-22 12:41:51,036 - INFO - Semantic search completed in 2.55 seconds - - - - Semantic Search Results (completed in 2.55 seconds): - Distance: 0.3697, Text: The Littler effect - how darts hit the bullseye - - Teenager Luke Littler began his bid to win the 2025 PDC World Darts Championship with a second-round win against Ryan Meikle. Here we assess Littler's impact after a remarkable rise which saw him named BBC Young Sports Personality of the Year and runner-up in the main award to athlete Keely Hodgkinson. - - One year ago, he was barely a household name in his own home. Now he is a sporting phenomenon. After emerging from obscurity aged 16 to reach the World Championship final, the life of Luke Littler and the sport he loves has been transformed. Viewing figures, ticket sales and social media interest have rocketed. Darts has hit the bullseye. This Christmas more than 100,000 children are expected to be opening Littler-branded magnetic dartboards as presents. His impact has helped double the number of junior academies, prompted plans to expand the World Championship and generated interest in darts from Saudi Arabian backers. 
- - Just months after taking his GCSE exams and ranked 164th in the world, Littler beat former champions Raymond van Barneveld and Rob Cross en route to the PDC World Championship final in January, before his run ended with a 7-4 loss to Luke Humphries. With his nickname 'The Nuke' on his purple and yellow shirt and the Alexandra Palace crowd belting out his walk-on song, Pitbull's tune Greenlight, he became an instant hit. Electric on the stage, calm off it. The down-to-earth teenager celebrated with a kebab and computer games. "We've been watching his progress since he was about seven. He was on our radar, but we never anticipated what would happen. The next thing we know 'Littlermania' is spreading everywhere," PDC president Barry Hearn told BBC Sport. A peak TV audience of 3.7 million people watched the final - easily Sky's biggest figure for a non-football sporting event. The teenager from Warrington in Cheshire was too young to legally drive or drink alcohol, but earned £200,000 for finishing second - part of £1m prize money in his first year as a professional - and an invitation to the elite Premier League competition. He turned 17 later in January but was he too young for the demanding event over 17 Thursday nights in 17 locations? He ended up winning the whole thing, and hit a nine-dart finish against Humphries in the final. From Bahrain to Wolverhampton, Littler claimed 10 titles in 2024 and is now eyeing the World Championship. - - As he progressed at the Ally Pally, the Manchester United fan was sent a good luck message by the club's former midfielder and ex-England captain David Beckham. In 12 months, Littler's Instagram followers have risen from 4,000 to 1.3m. Commercial backers include a clothing range, cereal firm and train company and he will appear in a reboot of the TV darts show Bullseye. Google say he was the most searched-for athlete online in the UK during 2024. 
On the back of his success, Littler darts, boards, cabinets, shirts are being snapped up in big numbers. "This Christmas the junior magnetic dartboard is selling out, we're talking over 100,000. They're 20 quid and a great introduction for young children," said Garry Plummer, the boss of sponsors Target Darts, who first signed a deal with Littler's family when he was aged 12. "All the toy shops want it, they all want him - 17, clean, doesn't drink, wonderful." - - Littler beat Luke Humphries to win the Premier League title in May - - The number of academies for children under the age of 16 has doubled in the last year, says Junior Darts Corporation chairman Steve Brown. There are 115 dedicated groups offering youngsters equipment, tournaments and a place to develop, with bases including Australia, Bulgaria, Greece, Norway, USA and Mongolia. "We've seen so many inquiries from around the world, it's been such a boom. It took us 14 years to get 1,600 members and within 12 months we have over 3,000, and waiting lists," said Brown. "When I played darts as a child, I was quite embarrassed to tell my friends what my hobby was. All these kids playing darts now are pretty popular at school. It's a bit rock 'n roll and recognised as a cool thing to do." Plans are being hatched to extend the World Championship by four days and increase the number of players from 96 to 128. That will boost the number of tickets available by 25,000 to 115,000 but Hearn reckons he could sell three times as many. He says Saudi Arabia wants to host a tournament, which is likely to happen if no-alcohol regulations are relaxed. "They will change their rules in the next 12 months probably for certain areas having alcohol, and we'll take darts there and have a party in Saudi," he said. "When I got involved in darts, the total prize money was something like £300,000 for the year. This year it will go to £20m. I expect in five years' time, we'll be playing for £40m." 
- - Former electrician Cross charged to the 2018 world title in his first full season, while Adrian Lewis and Michael van Gerwen were multiple victors in their 20s and 16-time champion Phil ‘The Power’ Taylor is widely considered the greatest of all time. Littler is currently fourth in the world rankings, although that is based on a two-year Order of Merit. There have been suggestions from others the spotlight on the teenager means world number one Humphries, 29, has been denied the coverage he deserves, but no darts player has made a mark at such a young age as Littler. "Luke Humphries is another fabulous player who is going to be around for years. Sport is a very brutal world. It is about winning and claiming the high ground. There will be envy around," Hearn said. "Luke Littler is the next Tiger Woods for darts so they better get used to it, and the only way to compete is to get better." World number 38 Martin Lukeman was awestruck as he described facing a peak Littler after being crushed 16-3 in the Grand Slam final, with the teenager winning 15 consecutive legs. "I can't compete with that, it was like Godly. He was relentless, he is so good it's ridiculous," he said. Lukeman can still see the benefits he brings, adding: "What he's done for the sport is brilliant. If it wasn't for him, our wages wouldn't be going up. There's more sponsors, more money coming in, all good." Hearn feels future competition may come from players even younger than Littler. "I watched a 10-year-old a few months ago who averaged 104.89 and checked out a 4-3 win with a 136 finish. They smell the money, the fame and put the hard work in," he said. How much better Littler can get is guesswork, although Plummer believes he wants to reach new heights. "He never says 'how good was I?' But I think he wants to break records and beat Phil Taylor's 16 World Championships and 16 World Matchplay titles," he said. "He's young enough to do it." 
A version of this article was originally published on 29 November. - • None Know a lot about Littler? Take our quiz - Distance: 0.3901, Text: Luke Littler has risen from 164th to fourth in the rankings in a year - - A tearful Luke Littler hit a tournament record 140.91 set average as he started his bid for the PDC World Championship title with a dramatic 3-1 win over Ryan Meikle. The 17-year-old made headlines around the world when he reached the tournament final in January, where he lost to Luke Humphries. Starting this campaign on Saturday, Littler was millimetres away from a nine-darter when he missed double 12 as he blew Meikle away in the fourth and final set of the second-round match. Littler was overcome with emotion at the end, cutting short his on-stage interview. "It was probably the toughest game I've ever played. I had to fight until the end," he said later in a news conference. "As soon as the question came on stage and then boom, the tears came. It was just a bit too much to speak on stage. "It is the worst game I have played. I have never felt anything like that tonight." Admitting to nerves during the match, he told Sky Sports: "Yes, probably the biggest time it's hit me. Coming into it I was fine, but as soon as [referee] George Noble said 'game on', I couldn't throw them." Littler started slowly against Meikle, who had two darts for the opening set, but he took the lead by twice hitting double 20. Meikle did not look overawed against his fellow Englishman and levelled, but Littler won the third set and exploded into life in the fourth. The tournament favourite hit four maximum 180s as he clinched three straight legs in 11, 10 and 11 darts for a record set average, and 100.85 overall. Meanwhile, two seeds crashed out on Saturday night – five-time world champion Raymond van Barneveld lost to Welshman Nick Kenny, while England's Ryan Joyce beat Danny Noppert. 
Australian Damon Heta was another to narrowly miss out on a nine-darter, just failing on double 12 when throwing for the match in a 3-1 win over Connor Scutt. Ninth seed Heta hit four 100-plus checkouts to come from a set down against Scutt in a match in which both men averaged more than 97. - - Littler was hugged by his parents after victory over Meikle - - Littler returned to Alexandra Palace to a boisterous reception from more than 3,000 spectators and delivered an astonishing display in the fourth set. He was on for a nine-darter after his opening two throws in both of the first two legs and completed the set in 32 darts - the minimum possible is 27. The teenager will next play after Christmas against European Championship winner Ritchie Edhouse, the 29th seed, or Ian White, and is seeded to meet Humphries in the semi-finals. Having entered last year's event ranked 164th, Littler is up to fourth in the world and will go to number two if he reaches the final again this time. He has won 10 titles in his debut professional year, including the Premier League and Grand Slam of Darts. After reaching the World Championship final as a debutant aged just 16, Littler's life has been transformed and interest in darts has rocketed. Google say he was the most searched-for athlete online in the UK during 2024. This Christmas, more than 100,000 children are expected to be opening Littler-branded magnetic dartboards as presents. His impact has helped double the number of junior academies and has prompted plans to expand the World Championship. Littler was named BBC Young Sports Personality of the Year on Tuesday and was runner-up to athlete Keely Hodgkinson for the main award. - - ... (output truncated for brevity) - - -# Optimizing Vector Search with Global Secondary Index (GSI) - -While the above semantic search using similarity_search_with_score works effectively, we can significantly improve query performance by leveraging Global Secondary Index (GSI) in Couchbase. 
- -Couchbase offers three types of vector indexes, but for GSI-based vector search we focus on two main types: - -Hyperscale Vector Indexes (BHIVE) -- Best for pure vector searches - content discovery, recommendations, semantic search -- High performance with low memory footprint - designed to scale to billions of vectors -- Optimized for concurrent operations - supports simultaneous searches and inserts -- Use when: You primarily perform vector-only queries without complex scalar filtering -- Ideal for: Large-scale semantic search, recommendation systems, content discovery - -Composite Vector Indexes -- Best for filtered vector searches - combines vector search with scalar value filtering -- Efficient pre-filtering - scalar attributes reduce the vector comparison scope -- Use when: Your queries combine vector similarity with scalar filters that eliminate large portions of data -- Ideal for: Compliance-based filtering, user-specific searches, time-bounded queries - -Choosing the Right Index Type -- Start with Hyperscale Vector Index for pure vector searches and large datasets -- Use Composite Vector Index when scalar filters significantly reduce your search space -- Consider your dataset size: Hyperscale scales to billions, Composite works well for tens of millions to billions - -For more details, see the [Couchbase Vector Index documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html). 
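To make the pre-filtering trade-off concrete, here is a small, self-contained Python sketch (the toy corpus and helper names are invented for illustration; this is not Couchbase API code) comparing a brute-force similarity scan over a whole collection with a scan restricted by a scalar predicate, which is the way a Composite index narrows the vector comparison scope:

```python
import math
import random

random.seed(7)

def cosine_distance(a, b):
    # 1 - cosine similarity; lower means more similar
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

# Toy corpus: each document has an embedding and a scalar attribute
docs = [{"id": i,
         "category": "sports" if i % 4 == 0 else "other",
         "vector": [random.random() for _ in range(8)]}
        for i in range(1000)]

query_vec = [random.random() for _ in range(8)]

# Full scan: one distance computation per document in the collection
full_scan = sorted(docs, key=lambda d: cosine_distance(query_vec, d["vector"]))[:3]

# Pre-filtered scan: apply the scalar predicate first, then compare vectors
# only against the surviving candidates
candidates = [d for d in docs if d["category"] == "sports"]
filtered_scan = sorted(candidates, key=lambda d: cosine_distance(query_vec, d["vector"]))[:3]

print(f"vector comparisons, full scan: {len(docs)}")
print(f"vector comparisons, pre-filtered: {len(candidates)}")
```

In this toy corpus the predicate eliminates three quarters of the documents, so the filtered scan performs a quarter of the distance computations; an actual Composite index achieves the same reduction inside the index rather than in application code.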
- - -## Understanding Index Configuration (Couchbase 8.0 Feature) - -The index_description parameter controls how Couchbase optimizes vector storage and search performance through centroids and quantization: - -Format: `'IVF<centroids>,{PQ<subquantizers>x<bits>|SQ<bits>}'` - -Centroids (IVF - Inverted File): -- Controls how the dataset is subdivided for faster searches -- More centroids = faster search, slower training -- Fewer centroids = slower search, faster training -- If omitted (like IVF,SQ8), Couchbase auto-selects based on dataset size - -Quantization Options: -- SQ (Scalar Quantization): SQ4, SQ6, SQ8 (4, 6, or 8 bits per dimension) -- PQ (Product Quantization): PQ<subquantizers>x<bits> (e.g., PQ32x8) -- Higher values = better accuracy, larger index size - -Common Examples: -- IVF,SQ8 - Auto centroids, 8-bit scalar quantization (good default) -- IVF1000,SQ6 - 1000 centroids, 6-bit scalar quantization -- IVF,PQ32x8 - Auto centroids, 32 subquantizers with 8 bits - -For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/cloud/vector-index/hyperscale-vector-index.html#algo_settings). - -In the code below, we demonstrate creating a BHIVE index. This method takes an index type (BHIVE or COMPOSITE) and a description parameter for optimization settings. Alternatively, GSI indexes can be created manually from the Couchbase UI. - - -```python -vector_store.create_index(index_type=IndexType.BHIVE, index_name="azure_bhive_index", index_description="IVF,SQ8") -``` - -The example below shows running the same similarity search, but now using the BHIVE GSI index we created above. You'll notice improved performance as the index efficiently retrieves data. - -**Important**: When using Composite indexes, scalar filters take precedence over vector similarity, which can improve performance for filtered searches but may miss some semantically relevant results that don't match the scalar criteria.
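As an aside on the quantization settings above, the following pure-Python sketch (a conceptual illustration only, not Couchbase's internal implementation) shows the idea behind 8-bit scalar quantization (SQ8): each float is stored as a 1-byte code, cutting memory roughly 4x versus 32-bit floats at the cost of a small, bounded rounding error. The value range `[-1.0, 1.0]` is an assumption chosen for the example:

```python
def sq8_encode(vector, lo, hi):
    # Map each float in [lo, hi] to an integer code in [0, 255]
    scale = (hi - lo) / 255.0
    return [round((v - lo) / scale) for v in vector]

def sq8_decode(codes, lo, hi):
    # Reconstruct approximate floats from the 8-bit codes
    scale = (hi - lo) / 255.0
    return [lo + c * scale for c in codes]

vec = [0.12, -0.55, 0.98, -1.0, 0.0, 0.73]
lo, hi = -1.0, 1.0

codes = sq8_encode(vec, lo, hi)
approx = sq8_decode(codes, lo, hi)

# The reconstruction error is bounded by half a quantization step
step = (hi - lo) / 255.0
max_err = max(abs(a - b) for a, b in zip(vec, approx))
print(f"codes: {codes}")
print(f"max error: {max_err:.5f} (half-step bound: {step / 2:.5f})")
```

Fewer bits (SQ6, SQ4) shrink the index further but widen the quantization step, which is exactly the accuracy-versus-size trade-off described above.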
- -Note: In GSI vector search, the distance represents the vector distance between the query and document embeddings. A lower distance indicates higher similarity, while a higher distance indicates lower similarity. - - -```python -query = "What were Luke Littler's key achievements and records in his recent PDC World Championship match?" - -try: - # Perform the semantic search - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=10) - search_elapsed_time = time.time() - start_time - - logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds") - - # Display search results - print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):") - print("-" * 80) - - for doc, score in search_results: - print(f"Distance: {score:.4f}, Text: {doc.page_content}") - print("-" * 80) - -except CouchbaseException as e: - raise RuntimeError(f"Error performing semantic search: {str(e)}") -except Exception as e: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - 2025-09-22 12:42:10,244 - INFO - Semantic search completed in 1.30 seconds - - - - Semantic Search Results (completed in 1.30 seconds): - -------------------------------------------------------------------------------- - Distance: 0.3697, Text: The Littler effect - how darts hit the bullseye - - Teenager Luke Littler began his bid to win the 2025 PDC World Darts Championship with a second-round win against Ryan Meikle. Here we assess Littler's impact after a remarkable rise which saw him named BBC Young Sports Personality of the Year and runner-up in the main award to athlete Keely Hodgkinson. - - One year ago, he was barely a household name in his own home. Now he is a sporting phenomenon. After emerging from obscurity aged 16 to reach the World Championship final, the life of Luke Littler and the sport he loves has been transformed. Viewing figures, ticket sales and social media interest have rocketed. Darts has hit the bullseye.
This Christmas more than 100,000 children are expected to be opening Littler-branded magnetic dartboards as presents. His impact has helped double the number of junior academies, prompted plans to expand the World Championship and generated interest in darts from Saudi Arabian backers. - - Just months after taking his GCSE exams and ranked 164th in the world, Littler beat former champions Raymond van Barneveld and Rob Cross en route to the PDC World Championship final in January, before his run ended with a 7-4 loss to Luke Humphries. With his nickname 'The Nuke' on his purple and yellow shirt and the Alexandra Palace crowd belting out his walk-on song, Pitbull's tune Greenlight, he became an instant hit. Electric on the stage, calm off it. The down-to-earth teenager celebrated with a kebab and computer games. "We've been watching his progress since he was about seven. He was on our radar, but we never anticipated what would happen. The next thing we know 'Littlermania' is spreading everywhere," PDC president Barry Hearn told BBC Sport. A peak TV audience of 3.7 million people watched the final - easily Sky's biggest figure for a non-football sporting event. The teenager from Warrington in Cheshire was too young to legally drive or drink alcohol, but earned £200,000 for finishing second - part of £1m prize money in his first year as a professional - and an invitation to the elite Premier League competition. He turned 17 later in January but was he too young for the demanding event over 17 Thursday nights in 17 locations? He ended up winning the whole thing, and hit a nine-dart finish against Humphries in the final. From Bahrain to Wolverhampton, Littler claimed 10 titles in 2024 and is now eyeing the World Championship. - - As he progressed at the Ally Pally, the Manchester United fan was sent a good luck message by the club's former midfielder and ex-England captain David Beckham. In 12 months, Littler's Instagram followers have risen from 4,000 to 1.3m. 
Commercial backers include a clothing range, cereal firm and train company and he will appear in a reboot of the TV darts show Bullseye. Google say he was the most searched-for athlete online in the UK during 2024. On the back of his success, Littler darts, boards, cabinets, shirts are being snapped up in big numbers. "This Christmas the junior magnetic dartboard is selling out, we're talking over 100,000. They're 20 quid and a great introduction for young children," said Garry Plummer, the boss of sponsors Target Darts, who first signed a deal with Littler's family when he was aged 12. "All the toy shops want it, they all want him - 17, clean, doesn't drink, wonderful." - - Littler beat Luke Humphries to win the Premier League title in May - - The number of academies for children under the age of 16 has doubled in the last year, says Junior Darts Corporation chairman Steve Brown. There are 115 dedicated groups offering youngsters equipment, tournaments and a place to develop, with bases including Australia, Bulgaria, Greece, Norway, USA and Mongolia. "We've seen so many inquiries from around the world, it's been such a boom. It took us 14 years to get 1,600 members and within 12 months we have over 3,000, and waiting lists," said Brown. "When I played darts as a child, I was quite embarrassed to tell my friends what my hobby was. All these kids playing darts now are pretty popular at school. It's a bit rock 'n roll and recognised as a cool thing to do." Plans are being hatched to extend the World Championship by four days and increase the number of players from 96 to 128. That will boost the number of tickets available by 25,000 to 115,000 but Hearn reckons he could sell three times as many. He says Saudi Arabia wants to host a tournament, which is likely to happen if no-alcohol regulations are relaxed. "They will change their rules in the next 12 months probably for certain areas having alcohol, and we'll take darts there and have a party in Saudi," he said. 
"When I got involved in darts, the total prize money was something like £300,000 for the year. This year it will go to £20m. I expect in five years' time, we'll be playing for £40m." - - Former electrician Cross charged to the 2018 world title in his first full season, while Adrian Lewis and Michael van Gerwen were multiple victors in their 20s and 16-time champion Phil ‘The Power’ Taylor is widely considered the greatest of all time. Littler is currently fourth in the world rankings, although that is based on a two-year Order of Merit. There have been suggestions from others the spotlight on the teenager means world number one Humphries, 29, has been denied the coverage he deserves, but no darts player has made a mark at such a young age as Littler. "Luke Humphries is another fabulous player who is going to be around for years. Sport is a very brutal world. It is about winning and claiming the high ground. There will be envy around," Hearn said. "Luke Littler is the next Tiger Woods for darts so they better get used to it, and the only way to compete is to get better." World number 38 Martin Lukeman was awestruck as he described facing a peak Littler after being crushed 16-3 in the Grand Slam final, with the teenager winning 15 consecutive legs. "I can't compete with that, it was like Godly. He was relentless, he is so good it's ridiculous," he said. Lukeman can still see the benefits he brings, adding: "What he's done for the sport is brilliant. If it wasn't for him, our wages wouldn't be going up. There's more sponsors, more money coming in, all good." Hearn feels future competition may come from players even younger than Littler. "I watched a 10-year-old a few months ago who averaged 104.89 and checked out a 4-3 win with a 136 finish. They smell the money, the fame and put the hard work in," he said. How much better Littler can get is guesswork, although Plummer believes he wants to reach new heights. "He never says 'how good was I?' 
But I think he wants to break records and beat Phil Taylor's 16 World Championships and 16 World Matchplay titles," he said. "He's young enough to do it." A version of this article was originally published on 29 November. - • None Know a lot about Littler? Take our quiz - -------------------------------------------------------------------------------- - Distance: 0.3901, Text: Luke Littler has risen from 164th to fourth in the rankings in a year - - A tearful Luke Littler hit a tournament record 140.91 set average as he started his bid for the PDC World Championship title with a dramatic 3-1 win over Ryan Meikle. The 17-year-old made headlines around the world when he reached the tournament final in January, where he lost to Luke Humphries. Starting this campaign on Saturday, Littler was millimetres away from a nine-darter when he missed double 12 as he blew Meikle away in the fourth and final set of the second-round match. Littler was overcome with emotion at the end, cutting short his on-stage interview. "It was probably the toughest game I've ever played. I had to fight until the end," he said later in a news conference. "As soon as the question came on stage and then boom, the tears came. It was just a bit too much to speak on stage. "It is the worst game I have played. I have never felt anything like that tonight." Admitting to nerves during the match, he told Sky Sports: "Yes, probably the biggest time it's hit me. Coming into it I was fine, but as soon as [referee] George Noble said 'game on', I couldn't throw them." Littler started slowly against Meikle, who had two darts for the opening set, but he took the lead by twice hitting double 20. Meikle did not look overawed against his fellow Englishman and levelled, but Littler won the third set and exploded into life in the fourth. The tournament favourite hit four maximum 180s as he clinched three straight legs in 11, 10 and 11 darts for a record set average, and 100.85 overall. 
Meanwhile, two seeds crashed out on Saturday night – five-time world champion Raymond van Barneveld lost to Welshman Nick Kenny, while England's Ryan Joyce beat Danny Noppert. Australian Damon Heta was another to narrowly miss out on a nine-darter, just failing on double 12 when throwing for the match in a 3-1 win over Connor Scutt. Ninth seed Heta hit four 100-plus checkouts to come from a set down against Scutt in a match in which both men averaged more than 97. - - Littler was hugged by his parents after victory over Meikle - - ... (output truncated for brevity) - - -Note: To create a COMPOSITE index, the code below can be used. -Choose based on your specific use case and query patterns. For this tutorial's question-answering scenario over the BBC News articles, either index type would work, but BHIVE might be more efficient for pure semantic search across the articles. - - -```python -vector_store.create_index(index_type=IndexType.COMPOSITE, index_name="azure_composite_index", index_description="IVF,SQ8") -``` - -# Setting Up a Couchbase Cache -To further optimize our system, we set up a Couchbase-based cache. A cache is a temporary storage layer that holds data that is frequently accessed, speeding up operations by reducing the need to repeatedly retrieve the same information from the database. In our setup, the cache will help us accelerate repetitive tasks, such as looking up similar documents. By implementing a cache, we enhance the overall performance of our search engine, ensuring that it can handle high query volumes and deliver results quickly. - -Caching is particularly valuable in scenarios where users may submit similar queries multiple times or where certain pieces of information are frequently requested. By storing these in a cache, we can significantly reduce the time it takes to respond to these queries, improving the user experience.
- - -```python -try: - cache = CouchbaseCache( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=CACHE_COLLECTION, - ) - logging.info("Successfully created cache") - set_llm_cache(cache) -except Exception as e: - raise ValueError(f"Failed to create cache: {str(e)}") -``` - - 2025-09-22 12:42:21,917 - INFO - Successfully created cache - - -# Retrieval-Augmented Generation (RAG) with Couchbase and Langchain -Couchbase and LangChain can be seamlessly integrated to create RAG (Retrieval-Augmented Generation) chains, enhancing the process of generating contextually relevant responses. In this setup, Couchbase serves as the vector store, where embeddings of documents are stored. When a query is made, LangChain retrieves the most relevant documents from Couchbase by comparing the query’s embedding with the stored document embeddings. These documents, which provide contextual information, are then passed to a generative language model within LangChain. - -The language model, equipped with the context from the retrieved documents, generates a response that is both informed and contextually accurate. This integration allows the RAG chain to leverage Couchbase’s efficient storage and retrieval capabilities, while LangChain handles the generation of responses based on the context provided by the retrieved documents. Together, they create a powerful system that can deliver highly relevant and accurate answers by combining the strengths of both retrieval and generation. 
- - -```python -# Create RAG prompt template -rag_prompt = ChatPromptTemplate.from_messages([ - ("system", "You are a helpful assistant that answers questions based on the provided context."), - ("human", "Context: {context}\n\nQuestion: {question}") -]) - -# Create RAG chain -rag_chain = ( - {"context": vector_store.as_retriever(), "question": RunnablePassthrough()} - | rag_prompt - | llm - | StrOutputParser() -) -logging.info("Successfully created RAG chain") -``` - - 2025-09-16 13:41:05,596 - INFO - Successfully created RAG chain - - - -```python -start_time = time.time() -# Turn off excessive Logging -logging.basicConfig(level=logging.WARNING, format='%(asctime)s - %(levelname)s - %(message)s', force=True) - -try: - rag_response = rag_chain.invoke(query) - rag_elapsed_time = time.time() - start_time - print(f"RAG Response: {rag_response}") - print(f"RAG response generated in {rag_elapsed_time:.2f} seconds") -except InternalServerFailureException as e: - if "query request rejected" in str(e): - print("Error: Search request was rejected due to rate limiting. Please try again later.") - else: - print(f"Internal server error occurred: {str(e)}") -except Exception as e: - print(f"Unexpected error occurred: {str(e)}") -``` - - RAG Response: In his recent PDC World Championship match, Luke Littler achieved several key milestones and records: - - 1. **Tournament Record Average**: Littler set a tournament record with a 140.91 set average during the fourth and final set of his second-round match against Ryan Meikle. - - 2. **Nine-Darter Attempt**: He came close to achieving a nine-darter but narrowly missed double 12. - - 3. **Dramatic Victory**: Littler defeated Meikle 3-1 in a match described as emotionally challenging for the 17-year-old. - - 4. **Fourth Set Dominance**: In the final set, Littler exploded into life, hitting four maximum 180s and winning three straight legs in 11, 10, and 11 darts. - - 5. 
**Overall Set Performance**: He completed the fourth set in 32 darts (the minimum possible is 27) and achieved a match average of 100.85. - - These achievements highlight Littler's exceptional talent and his continued rise in professional darts. - RAG response generated in 5.81 seconds - - -# Using Couchbase as a caching mechanism -Couchbase can be effectively used as a caching mechanism for RAG (Retrieval-Augmented Generation) responses by storing and retrieving precomputed results for specific queries. This approach enhances the system's efficiency and speed, particularly when dealing with repeated or similar queries. When a query is first processed, the RAG chain retrieves relevant documents, generates a response using the language model, and then stores this response in Couchbase, with the query serving as the key. - -For subsequent requests with the same query, the system checks Couchbase first. If a cached response is found, it is retrieved directly from Couchbase, bypassing the need to re-run the entire RAG process. This significantly reduces response time because the computationally expensive steps of document retrieval and response generation are skipped. Couchbase's role in this setup is to provide a fast and scalable storage solution for caching these responses, ensuring that frequently asked queries can be answered more quickly and efficiently. 
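The check-the-cache-first flow described above is the classic cache-aside pattern. The sketch below illustrates it in plain Python, with an in-memory dict standing in for the Couchbase cache collection and a stub standing in for the RAG chain (both are placeholders for illustration, not the `CouchbaseCache` API):

```python
import time

response_cache = {}  # stand-in for the Couchbase cache collection

def slow_rag_chain(query):
    # Stub for the real RAG pipeline: retrieval + LLM generation
    time.sleep(0.2)
    return f"answer for: {query}"

def answer(query):
    # Cache-aside: return the stored response if the query was seen before
    if query in response_cache:
        return response_cache[query], True
    response = slow_rag_chain(query)
    response_cache[query] = response  # store for next time
    return response, False

first, hit1 = answer("Who won the Premier League title?")
second, hit2 = answer("Who won the Premier League title?")
print(f"first call cache hit: {hit1}, second call cache hit: {hit2}")
```

The `set_llm_cache(cache)` call earlier wires this same pattern into LangChain transparently, which is why the repeated query in the next cell returns in a fraction of the original time.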
- - - -```python -try: - queries = [ - "What happened in the match between Fulham and Liverpool?", - "What were Luke Littler's key achievements and records in his recent PDC World Championship match?", - "What happened in the match between Fulham and Liverpool?", # Repeated query - ] - - for i, query in enumerate(queries, 1): - print(f"\nQuery {i}: {query}") - start_time = time.time() - - response = rag_chain.invoke(query) - elapsed_time = time.time() - start_time - print(f"Response: {response}") - print(f"Time taken: {elapsed_time:.2f} seconds") - -except InternalServerFailureException as e: - if "query request rejected" in str(e): - print("Error: Search request was rejected due to rate limiting. Please try again later.") - else: - print(f"Internal server error occurred: {str(e)}") -except Exception as e: - print(f"Unexpected error occurred: {str(e)}") -``` - - - Query 1: What happened in the match between Fulham and Liverpool? - Response: In the Premier League match between Fulham and Liverpool, the game ended in a 2-2 draw at Anfield. Liverpool played the majority of the game with ten men after Andy Robertson was shown a red card in the 17th minute for denying Harry Wilson a goalscoring opportunity. Despite their numerical disadvantage, Liverpool demonstrated resilience and strong performance. - - Fulham took the lead twice during the match, but Liverpool managed to equalize on both occasions. Diogo Jota, returning from injury, scored the crucial 86th-minute equalizer for Liverpool. Even with 10 players, Liverpool maintained over 60% possession and led various attacking metrics, including shots, big chances, and touches in the opposition box. - - Fulham's left-back Antonee Robinson praised Liverpool’s performance, stating that it didn’t feel like they had 10 men on the field due to their attacking risks and relentless pressure. Liverpool head coach Arne Slot called his team's performance "impressive" and lauded their character and fight in adversity.
- Time taken: 6.69 seconds - - Query 2: What were Luke Littler's key achievements and records in his recent PDC World Championship match? - Response: In his recent PDC World Championship match, Luke Littler achieved several key milestones and records: - - 1. **Tournament Record Average**: Littler set a tournament record with a 140.91 set average during the fourth and final set of his second-round match against Ryan Meikle. - - 2. **Nine-Darter Attempt**: He came close to achieving a nine-darter but narrowly missed double 12. - - 3. **Dramatic Victory**: Littler defeated Meikle 3-1 in a match described as emotionally challenging for the 17-year-old. - - 4. **Fourth Set Dominance**: In the final set, Littler exploded into life, hitting four maximum 180s and winning three straight legs in 11, 10, and 11 darts. - - 5. **Overall Set Performance**: He completed the fourth set in 32 darts (the minimum possible is 27) and achieved a match average of 100.85. - - These achievements highlight Littler's exceptional talent and his continued rise in professional darts. - Time taken: 1.09 seconds - - - ... (output truncated for brevity) - - -By following these steps, you'll have a fully functional semantic search engine that leverages the strengths of Couchbase and AzureOpenAI. This guide is designed not just to show you how to build the system, but also to explain why each step is necessary, giving you a deeper understanding of the principles behind semantic search and of how GSI can make querying data more efficient and significantly improve your RAG performance. Whether you're a newcomer to software development or an experienced developer looking to expand your skills, this guide will provide you with the knowledge and tools you need to create a powerful, AI-driven search engine.
diff --git a/tutorial/markdown/generated/vector-search-cookbook/capella-model-services-langchain-search_based-RAG_with_Capella_Model_Services_and_LangChain.md b/tutorial/markdown/generated/vector-search-cookbook/capella-model-services-langchain-search_based-RAG_with_Capella_Model_Services_and_LangChain.md deleted file mode 100644 index be867d7..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/capella-model-services-langchain-search_based-RAG_with_Capella_Model_Services_and_LangChain.md +++ /dev/null @@ -1,715 +0,0 @@ ---- -# frontmatter -path: "/tutorial-capella-model-services-langchain-rag-with-search-vector-index" -title: RAG with Capella Model Services, LangChain and Couchbase Search Vector Index -short_title: RAG with Capella Model Services, LangChain and Search Vector Index -description: - - Learn how to build a semantic search engine using models from Capella Model Services and Couchbase Search Vector Index. - - You will understand how to perform Retrieval-Augmented Generation (RAG) using LangChain and Capella Model Services. -content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - Artificial Intelligence - - LangChain - - Search Vector Index -sdk_language: - - python -length: 60 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/capella-model-services/langchain/search_based/RAG_with_Capella_Model_Services_and_LangChain.ipynb) - -# Introduction -In this guide, we will walk you through building a Retrieval Augmented Generation (RAG) application using Couchbase Capella as the database and the [Mistral-7B-Instruct-v0.3](https://build.nvidia.com/mistralai/mistral-7b-instruct-v03/modelcard) model as the large language model provided by Capella Model Services. We will use the [NVIDIA NeMo Retriever Llama3.2](https://build.nvidia.com/nvidia/llama-3_2-nv-embedqa-1b-v2/modelcard) model for generating embeddings via Capella Model Services.
-
-This notebook demonstrates how to build a RAG system using:
-- The [BBC News dataset](https://huggingface.co/datasets/RealTimeData/bbc_news_alltime) containing news articles
-- Couchbase Capella as the vector store, with the Search Service (formerly known as Full Text Search) for vector index creation
-- Capella Model Services for embeddings and text generation
-- The LangChain framework for the RAG pipeline
-
-We leverage Couchbase's Search Service to create and manage search vector indexes, enabling efficient semantic search capabilities. The Search Service provides the infrastructure for storing, indexing, and querying high-dimensional vector embeddings alongside traditional text search functionality. This tutorial can also be recreated with the Query and Index services using [Hyperscale and Composite Vector Indexes](http://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html).
-
-Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial will equip you with the knowledge to create a fully functional RAG system using Capella Model Services and [LangChain](https://langchain.com/).
-
-
-# How to run this tutorial
-
-This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/capella-model-services/langchain/search_based/RAG_with_Capella_Model_Services_and_LangChain.ipynb).
-
-You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.
-
-# Before you start
-
-## Create and Deploy Your Operational cluster on Capella
-
-To get started with Couchbase Capella, create an account and use it to deploy an operational cluster.
-
-To know more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).
-
-
-### Couchbase Capella Configuration
-
-When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met:
-
-* Have a multi-node Capella cluster running the Data, Query, Index, and Search services.
-* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the bucket (Read and Write) used in the application.
-* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.
-
-### Deploy Models
-
-In order to create the RAG application, we need an embedding model to ingest the documents for Vector Search and a large language model (LLM) for generating the responses based on the context.
-
-Capella Model Services allows you to create both the embedding model and the LLM in the same VPC as your database. There are multiple options for both the Embedding & Large Language Models, along with Value Adds to the models.
-
-Create the models using the Capella Model Services interface. While creating a model, it is possible to cache the responses (both standard and semantic caching) and apply guardrails to the LLM responses.
-
-For more details, please refer to the [documentation](https://docs.couchbase.com/ai/build/model-service/model-service.html). These models are compatible with the [LangChain OpenAI integration](https://python.langchain.com/api_reference/openai/index.html).
-
-After the models are deployed, please create API keys for them and allowlist the IP address on which the tutorial is being run. For more details, please refer to the documentation on [generating the API keys](https://docs.couchbase.com/ai/api-guide/api-start.html#model-service-keys).
-
-
-# Installing Necessary Libraries
-To build our RAG system, we need a set of libraries.
The libraries we install handle everything from connecting to databases to performing AI tasks. Each library has a specific role: the Couchbase libraries manage database operations, LangChain handles AI model integrations, and we will use the OpenAI SDK (via the `langchain-openai` integration) for generating embeddings and calling the LLM in Capella Model Services. By setting up these libraries, we ensure our environment is equipped to handle the tasks required for RAG.
-
-
-```python
-!pip install --quiet datasets==4.4.1 langchain-couchbase==1.0.0 langchain-openai==1.1.0
-```
-
-# Importing Necessary Libraries
-The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. These libraries provide essential functions for working with data, managing database connections, and processing machine learning models.
-
-
-```python
-import getpass
-import json
-import logging
-import sys
-import time
-
-from datetime import timedelta
-
-from couchbase.auth import PasswordAuthenticator
-from couchbase.cluster import Cluster
-from couchbase.exceptions import CouchbaseException
-from couchbase.management.search import SearchIndex
-from couchbase.options import ClusterOptions
-
-from datasets import load_dataset
-
-from langchain_core.output_parsers import StrOutputParser
-from langchain_core.prompts import ChatPromptTemplate
-from langchain_core.runnables import RunnablePassthrough
-from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore
-from langchain_openai import ChatOpenAI, OpenAIEmbeddings
-
-from tqdm import tqdm
-```
-
-# Loading Sensitive Information
-In this section, we prompt the user to input the essential configuration settings. These settings include sensitive information like database credentials and collection names.
Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security.
-
-The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured, enhancing the overall security and maintainability of your code.
-
-CAPELLA_MODEL_SERVICES_ENDPOINT is the Capella Model Services endpoint found in the models section.
-
-> Note that the Capella Model Services endpoint requires a `/v1` suffix appended to the endpoint shown on the UI, if it is not already included.
-
-INDEX_NAME is the name of the search index we will use for the vector search.
-
-LLM_MODEL_NAME and EMBEDDING_MODEL_NAME are the names of the models selected from the Capella Model Services catalogue. For this tutorial, we are using `mistralai/mistral-7b-instruct-v0.3` as the LLM and `nvidia/llama-3.2-nv-embedqa-1b-v2` as the embedding model.
-
-LLM_API_KEY is the API key generated on the Capella UI for the LLM.
-
-EMBEDDING_API_KEY is the API key generated on the Capella UI for the Embedding model.
-
-> If the models are running in the same region, either of the keys can be used interchangeably. See more details on [generating the API keys](https://docs.couchbase.com/ai/api-guide/api-start.html#model-service-keys).
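Since the `/v1` suffix noted above is easy to forget when copying the endpoint from the Capella UI, a small helper can normalize whatever the user pastes in. This is a minimal sketch, not part of the original notebook; the function name and example hostname are our own:

```python
def normalize_model_services_endpoint(endpoint: str) -> str:
    """Append the /v1 suffix expected by OpenAI-compatible clients
    when the endpoint copied from the Capella UI does not include it."""
    endpoint = endpoint.rstrip("/")
    if not endpoint.endswith("/v1"):
        endpoint += "/v1"
    return endpoint


# The hostname below is a placeholder, not a real Capella endpoint.
print(normalize_model_services_endpoint("https://ai.example.cloud.couchbase.com"))
# https://ai.example.cloud.couchbase.com/v1
```

You could pass the endpoint collected in the next cell through a helper like this before handing it to the LangChain clients.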
-
-
-```python
-CB_CONNECTION_STRING = getpass.getpass("Enter your Couchbase Connection String: ")
-CB_USERNAME = input("Enter your Couchbase Database username: ")
-CB_PASSWORD = getpass.getpass("Enter your Couchbase Database password: ")
-CB_BUCKET_NAME = input("Enter your Couchbase bucket name: ")
-SCOPE_NAME = input("Enter your scope name: ")
-COLLECTION_NAME = input("Enter your collection name: ")
-INDEX_NAME = input("Enter your Search index name: ")
-CAPELLA_MODEL_SERVICES_ENDPOINT = getpass.getpass("Enter your Capella Model Services Endpoint: ")
-LLM_MODEL_NAME = input("Enter the LLM name: ")
-LLM_API_KEY = getpass.getpass("Enter your Couchbase LLM API Key: ")
-EMBEDDING_MODEL_NAME = input("Enter the Embedding Model name: ")
-EMBEDDING_API_KEY = getpass.getpass("Enter your Couchbase Embedding Model API Key: ")
-
-# Check if the variables are correctly loaded
-if not all(
-    [
-        CB_CONNECTION_STRING,
-        CB_USERNAME,
-        CB_PASSWORD,
-        CB_BUCKET_NAME,
-        CAPELLA_MODEL_SERVICES_ENDPOINT,
-        SCOPE_NAME,
-        COLLECTION_NAME,
-        INDEX_NAME,
-        LLM_MODEL_NAME,
-        LLM_API_KEY,
-        EMBEDDING_MODEL_NAME,
-        EMBEDDING_API_KEY,
-    ]
-):
-    raise ValueError("Missing required configuration values")
-```
-
-    Enter your Couchbase Connection String: ········
-    Enter your Couchbase Database username: Admin
-    Enter your Couchbase Database password: ········
-    Enter your Couchbase bucket name: model_tutorial
-    Enter your scope name: rag
-    Enter your collection name: data
-    Enter your Search index name: vs-index
-    Enter your Capella Model Services Endpoint: ········
-    Enter the LLM name: mistralai/mistral-7b-instruct-v0.3
-    Enter your Couchbase LLM API Key: ········
-    Enter the Embedding Model name: nvidia/llama-3.2-nv-embedqa-1b-v2
-    Enter your Couchbase Embedding Model API Key: ········
-
-
-# Connecting to the Couchbase Cluster
-Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our RAG system.
By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. - - -```python -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_CONNECTION_STRING, options) - cluster.wait_until_ready(timedelta(seconds=5)) - print("Successfully connected to Couchbase") -except Exception as e: - raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}") -``` - - Successfully connected to Couchbase - - -# Setting Up Collections in Couchbase -In Couchbase, data is organized in buckets, which can be further divided into scopes and collections. Think of a collection as a table in a traditional SQL database. Before we can store any data, we need to ensure that our collections exist. If they don't, we must create them. This step is important because it prepares the database to handle the specific types of data our application will process. By setting up collections, we define the structure of our data storage, which is essential for efficient data retrieval and management. - -Moreover, setting up collections allows us to isolate different types of data within the same bucket, providing a more organized and scalable data structure. This is particularly useful when dealing with large datasets, as it ensures that related data is stored together, making it easier to manage and query. Here, we also set up the primary index for query operations on the collection and clear the existing documents in the collection if any. If you do not want to do that, please skip this step. 
- - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name, flush_collection=False): - try: - bucket = cluster.bucket(bucket_name) - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == scope_name for scope in scopes) - - if not scope_exists: - print(f"Scope '{scope_name}' does not exist. Creating it...") - bucket_manager.create_scope(scope_name) - print(f"Scope '{scope_name}' created successfully.") - else: - print(f"Scope '{scope_name}' already exists. Skipping creation.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name - and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - print(f"Collection '{collection_name}' does not exist. Creating it...") - bucket_manager.create_collection(scope_name, collection_name) - print(f"Collection '{collection_name}' created successfully.") - else: - print(f"Collection '{collection_name}' already exists. Skipping creation.") - - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Ensure primary index exists - try: - cluster.query( - f"CREATE PRIMARY INDEX IF NOT EXISTS ON `{bucket_name}`.`{scope_name}`.`{collection_name}`" - ).execute() - print("Primary index present or created successfully.") - except Exception as e: - logging.warning(f"Error creating primary index: {str(e)}") - - if flush_collection: - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - print("All documents cleared from the collection.") - except Exception as e: - print( - f"Error while clearing documents: {str(e)}. The collection might be empty." 
-                )
-
-    except Exception as e:
-        raise Exception(f"Error setting up collection: {str(e)}")
-
-
-setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME, flush_collection=True)
-```
-
-    Scope 'rag' already exists. Skipping creation.
-    Collection 'data' already exists. Skipping creation.
-    Primary index present or created successfully.
-    All documents cleared from the collection.
-
-
-# Loading Couchbase Search Vector Index
-
-Semantic search requires an efficient way to retrieve relevant documents based on a user's query. This is where Couchbase Vector Search, part of the Search Service (formerly known as Full Text Search, or FTS), comes into play. In this step, we load the Vector Search Index definition from a JSON file, which specifies how the index should be structured. This includes the fields to be indexed, the dimensions of the vectors, and other parameters that determine how the search engine processes queries based on vector similarity.
-
-Note that you might have to update the index parameters depending on the names of your bucket, scope, and collection. The provided index assumes the bucket to be `model_tutorial`, the scope to be `rag`, and the collection to be `data`.
-
-For more information on creating a vector search index, please follow the [instructions](https://docs.couchbase.com/cloud/vector-search/create-vector-search-index-ui.html).
-
-To import the index into Capella via the UI, please follow the [instructions](https://docs.couchbase.com/cloud/search/import-search-index.html) in the documentation.
-
-Code to create the index using the SDK is also provided below, if you prefer to do it programmatically.
-
-
-```python
-# If you are running this script in Google Colab, comment the following line
-# and use the upload snippet below instead.
- -index_definition_path = "capella_index.json" # Local setup: specify your file path here - -# If you are running in Google Colab, use the following code to upload the index definition file -# from google.colab import files -# print("Upload your index definition file") -# uploaded = files.upload() -# index_definition_path = list(uploaded.keys())[0] - -try: - with open(index_definition_path, "r") as file: - index_definition = json.load(file) - - # Update search index definition with user inputs - index_definition['name'] = INDEX_NAME - index_definition['sourceName'] = CB_BUCKET_NAME - # Update types mapping - old_type_key = next(iter(index_definition['params']['mapping']['types'].keys())) - type_obj = index_definition['params']['mapping']['types'].pop(old_type_key) - index_definition['params']['mapping']['types'][f"{SCOPE_NAME}.{COLLECTION_NAME}"] = type_obj - -except Exception as e: - raise ValueError( - f"Error loading index definition from {index_definition_path}: {str(e)}" - ) -``` - -# Creating or Updating Search Vector Indexes - -With the index definition loaded, the next step is to create or update the Search Vector Index in Couchbase. This step is crucial because it optimizes our database for vector similarity search operations, allowing us to perform searches based on the semantic content of documents rather than just keywords. By creating or updating a Search Vector Index, we enable our RAG to handle complex queries that involve finding semantically similar documents using vector embeddings, which is essential for a robust RAG system. 
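Before upserting, it can be useful to sanity-check the rewrite applied to the loaded definition. The same key-renaming logic from the previous cell can be exercised on a minimal in-memory stand-in; the dictionary below is an illustrative sketch, not the real `capella_index.json` shipped with the notebook:

```python
# Illustrative stand-in for the loaded capella_index.json (assumed shape,
# not the real file shipped with the notebook).
index_definition = {
    "name": "placeholder-index",
    "type": "fulltext-index",
    "sourceName": "placeholder-bucket",
    "params": {"mapping": {"types": {"old_scope.old_collection": {"enabled": True}}}},
}

INDEX_NAME, CB_BUCKET_NAME = "vs-index", "model_tutorial"
SCOPE_NAME, COLLECTION_NAME = "rag", "data"

# The same rewrite the tutorial applies after loading the JSON file:
index_definition["name"] = INDEX_NAME
index_definition["sourceName"] = CB_BUCKET_NAME
old_type_key = next(iter(index_definition["params"]["mapping"]["types"]))
type_obj = index_definition["params"]["mapping"]["types"].pop(old_type_key)
index_definition["params"]["mapping"]["types"][f"{SCOPE_NAME}.{COLLECTION_NAME}"] = type_obj

print(sorted(index_definition["params"]["mapping"]["types"]))  # ['rag.data']
```

If the printed type key does not match your `scope.collection`, the index will not cover your documents, so this check is cheap insurance before the upsert below.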
- - -```python -# Create the Vector Index via SDK -try: - scope_index_manager = ( - cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME).search_indexes() - ) - - # Check if index already exists - existing_indexes = scope_index_manager.get_all_indexes() - index_name = index_definition["name"] - - if index_name in [index.name for index in existing_indexes]: - print(f"Index '{index_name}' found") - else: - print(f"Creating new index '{index_name}'...") - - # Create SearchIndex object from JSON definition - search_index = SearchIndex.from_json(index_definition) - - # Upsert the index (create if not exists, update if exists) - scope_index_manager.upsert_index(search_index) - print(f"Index '{index_name}' successfully created/updated.") - -except Exception as e: - logging.error(f"Error creating or updating index: {e}") -``` - - Creating new index 'vs-index'... - Index 'vs-index' successfully created/updated. - - -# Load the BBC News Dataset -To build a RAG engine, we need data to search through. We use the [BBC Realtime News dataset](https://huggingface.co/datasets/RealTimeData/bbc_news_alltime), a dataset with up-to-date BBC news articles grouped by month. This dataset contains articles that were created after the LLM was trained. It will showcase the use of RAG to augment the LLM. - -The BBC News dataset's varied content allows us to simulate real-world scenarios where users ask complex questions, enabling us to fine-tune our RAG's ability to understand and respond to various types of queries. 
- - -```python -try: - news_dataset = load_dataset('RealTimeData/bbc_news_alltime', '2024-12', split="train") - print(f"Loaded the BBC News dataset with {len(news_dataset)} rows") -except Exception as e: - raise ValueError(f"Error loading BBC dataset: {str(e)}") -``` - - Loaded the BBC News dataset with 2687 rows - - -## Preview the Data - - -```python -print(news_dataset[:5]) -``` - - {'title': ["Pakistan protest: Bushra Bibi's march for Imran Khan disappeared - BBC News", 'Lockdown DIY linked to Walleys Quarry gases - BBC News', 'Newscast - What next for the assisted dying bill? - BBC Sounds', "F1: Bernie Ecclestone to sell car collection worth 'hundreds of millions' - BBC Sport", 'British man Tyler Kerry from Basildon dies on holiday in Turkey - BBC News'], 'published_date': ['2024-12-01', '2024-12-01', '2024-12-01', '2024-12-01', '2024-12-01'], 'authors': ['https://www.facebook.com/bbcnews', 'https://www.facebook.com/bbcnews', None, 'https://www.facebook.com/BBCSport/', 'https://www.facebook.com/bbcnews'], 'description': ["Imran Khan's third wife guided protesters to the heart of the capital - and then disappeared.", 'An academic says an increase in plasterboard sent to landfill could be behind a spike in smells.', 'And rebel forces in Syria have taken control of Aleppo', 'Former Formula 1 boss Bernie Ecclestone is to sell his collection of race cars driven by motorsport legends including Michael Schumacher, Niki Lauda and Nelson Piquet.', 'Tyler Kerry was "a young man full of personality, kindness and compassion", his uncle says.'], 'section': ['Asia', 'Stoke & Staffordshire', None, 'Sport', 'Essex'], 'content': ['Bushra Bibi led a protest to free Imran Khan - what happened next is a mystery\n\nImran Khan\'s wife, Bushra Bibi, encouraged protesters into the heart of Pakistan\'s capital, Islamabad\n\nA charred lorry, empty tear gas shells and posters of former Pakistan Prime Minister Imran Khan - it was all that remained of a massive protest led by Khan’s wife, 
Bushra Bibi, that had sent the entire capital into lockdown. Just a day earlier, faith healer Bibi - wrapped in a white shawl, her face covered by a white veil - stood atop a shipping container on the edge of the city as thousands of her husband’s devoted followers waved flags and chanted slogans beneath her. It was the latest protest to flare since Khan, the 72-year-old cricketing icon-turned-politician, was jailed more than a year ago after falling foul of the country\'s influential military which helped catapult him to power. “My children and my brothers! You have to stand with me,” Bibi cried on Tuesday afternoon, her voice cutting through the deafening roar of the crowd. “But even if you don’t,” she continued, “I will still stand firm. “This is not just about my husband. It is about this country and its leader.” It was, noted some watchers of Pakistani politics, her political debut. But as the sun rose on Wednesday morning, there was no sign of Bibi, nor the thousands of protesters who had marched through the country to the heart of the capital, demanding the release of their jailed leader. While other PMs have fallen out with Pakistan\'s military in the past, Khan\'s refusal to stay quiet behind bars is presenting an extraordinary challenge - escalating the standoff and leaving the country deeply divided. Exactly what happened to the so-called “final march”, and Bibi, when the city went dark is still unclear. All eyewitnesses like Samia* can say for certain is that the lights went out suddenly, plunging D Chowk, the square where they had gathered, into blackness.\n\nWithin a day of arriving, the protesters had scattered - leaving behind Bibi\'s burnt-out vehicle\n\nAs loud screams and clouds of tear gas blanketed the square, Samia describes holding her husband on the pavement, bloodied from a gun shot to his shoulder. "Everyone was running for their lives," she later told BBC Urdu from a hospital in Islamabad, adding it was "like doomsday or a war". 
"His blood was on my hands and the screams were unending.” But how did the tide turn so suddenly and decisively? Just hours earlier, protesters finally reached D Chowk late afternoon on Tuesday. They had overcome days of tear gas shelling and a maze of barricaded roads to get to the city centre. Many of them were supporters and workers of the Pakistan Tehreek-e-Insaf (PTI), the party led by Khan. He had called for the march from his jail cell, where he has been for more than a year on charges he says are politically motivated. Now Bibi - his third wife, a woman who had been largely shrouded in mystery and out of public view since their unexpected wedding in 2018 - was leading the charge. “We won’t go back until we have Khan with us,” she declared as the march reached D Chowk, deep in the heart of Islamabad’s government district.\n\nThousands had marched for days to reach Islamabad, demanding former Prime Minister Imran Khan be released from jail\n\nInsiders say even the choice of destination - a place where her husband had once led a successful sit in - was Bibi’s, made in the face of other party leader’s opposition, and appeals from the government to choose another gathering point. Her being at the forefront may have come as a surprise. Bibi, only recently released from prison herself, is often described as private and apolitical. Little is known about her early life, apart from the fact she was a spiritual guide long before she met Khan. Her teachings, rooted in Sufi traditions, attracted many followers - including Khan himself. Was she making her move into politics - or was her sudden appearance in the thick of it a tactical move to keep Imran Khan’s party afloat while he remains behind bars? For critics, it was a move that clashed with Imran Khan’s oft-stated opposition to dynastic politics. There wasn’t long to mull the possibilities. 
After the lights went out, witnesses say that police started firing fresh rounds of tear gas at around 21:30 local time (16:30 GMT). The crackdown was in full swing just over an hour later. At some point, amid the chaos, Bushra Bibi left. Videos on social media appeared to show her switching cars and leaving the scene. The BBC couldn’t verify the footage. By the time the dust settled, her container had already been set on fire by unknown individuals. By 01:00 authorities said all the protesters had fled.\n\nSecurity was tight in the city, and as night fell, lights were switched off - leaving many in the dark as to what exactly happened next\n\nEyewitnesses have described scenes of chaos, with tear gas fired and police rounding up protesters. One, Amin Khan, said from behind an oxygen mask that he joined the march knowing that, "either I will bring back Imran Khan or I will be shot". The authorities have have denied firing at the protesters. They also said some of the protesters were carrying firearms. The BBC has seen hospital records recording patients with gunshot injuries. However, government spokesperson Attaullah Tarar told the BBC that hospitals had denied receiving or treating gunshot wound victims. He added that "all security personnel deployed on the ground have been forbidden" from having live ammunition during protests. But one doctor told BBC Urdu that he had never done so many surgeries for gunshot wounds in a single night. "Some of the injured came in such critical condition that we had to start surgery right away instead of waiting for anaesthesia," he said. While there has been no official toll released, the BBC has confirmed with local hospitals that at least five people have died. Police say at least 500 protesters were arrested that night and are being held in police stations. The PTI claims some people are missing. 
And one person in particular hasn’t been seen in days: Bushra Bibi.\n\nThe next morning, the protesters were gone - leaving behind just wrecked cars and smashed glass\n\nOthers defended her. “It wasn’t her fault,” insisted another. “She was forced to leave by the party leaders.” Political commentators have been more scathing. “Her exit damaged her political career before it even started,” said Mehmal Sarfraz, a journalist and analyst. But was that even what she wanted? Khan has previously dismissed any thought his wife might have her own political ambitions - “she only conveys my messages,” he said in a statement attributed to him on his X account.\n\nImran Khan and Bushra Bibi, pictured here arriving at court in May 2023, married in 2018\n\nSpeaking to BBC Urdu, analyst Imtiaz Gul calls her participation “an extraordinary step in extraordinary circumstances". Gul believes Bushra Bibi’s role today is only about “keeping the party and its workers active during Imran Khan’s absence”. It is a feeling echoed by some PTI members, who believe she is “stepping in only because Khan trusts her deeply”. Insiders, though, had often whispered that she was pulling the strings behind the scenes - advising her husband on political appointments and guiding high-stakes decisions during his tenure. A more direct intervention came for the first time earlier this month, when she urged a meeting of PTI leaders to back Khan’s call for a rally. Pakistan’s defence minister Khawaja Asif accused her of “opportunism”, claiming she sees “a future for herself as a political leader”. But Asma Faiz, an associate professor of political science at Lahore University of Management Sciences, suspects the PTI’s leadership may have simply underestimated Bibi. “It was assumed that there was an understanding that she is a non-political person, hence she will not be a threat,” she told the AFP news agency. 
“However, the events of the last few days have shown a different side of Bushra Bibi.” But it probably doesn’t matter what analysts and politicians think. Many PTI supporters still see her as their connection to Imran Khan. It was clear her presence was enough to electrify the base. “She is the one who truly wants to get him out,” says Asim Ali, a resident of Islamabad. “I trust her. Absolutely!”', 'Walleys Quarry was ordered not to accept any new waste as of Friday\n\nA chemist and former senior lecturer in environmental sustainability has said powerful odours from a controversial landfill site may be linked to people doing more DIY during the Covid-19 pandemic. Complaints about Walleys Quarry in Silverdale, Staffordshire – which was ordered to close as of Friday – increased significantly during and after coronavirus lockdowns. Issuing the closure notice, the Environment Agency described management of the site as poor, adding it had exhausted all other enforcement tactics at premises where gases had been noxious and periodically above emission level guidelines - which some campaigners linked to ill health locally. Dr Sharon George, who used to teach at Keele University, said she had been to the site with students and found it to be clean and well-managed, and suggested an increase in plasterboard heading to landfills in 2020 could be behind a spike in stenches.\n\n“One of the materials that is particularly bad for producing odours and awful emissions is plasterboard," she said. “That’s one of the theories behind why Walleys Quarry got worse at that time.” She said the landfill was in a low-lying area, and that some of the gases that came from the site were quite heavy. “They react with water in the atmosphere, so some of the gases you smell can be quite awful and not very good for our health. 
“It’s why, on some days when it’s colder and muggy and a bit misty, you can smell it more.” Dr George added: “With any landfill, you’re putting things into the ground – and when you put things into the ground, if they can they will start to rot. When they start to rot they’re going to give off gases.” She believed Walleys Quarry’s proximity to people’s homes was another major factor in the amount of complaints that arose from its operation. “If you’ve got a gas that people can smell, they’re going to report it much more than perhaps a pollutant that might go unnoticed.”\n\nRebecca Currie said she did not think the site would ever be closed\n\nLocal resident and campaigner Rebecca Currie said the closure notice served to Walleys Quarry was "absolutely amazing". Her son Matthew has had breathing difficulties after being born prematurely with chronic lung disease, and Ms Currie says the site has made his symptoms worse. “I never thought this day was going to happen,” she explained. “We fought and fought for years.” She told BBC Midlands Today: “Our community have suffered. We\'ve got kids who are really poorly, people have moved homes.”\n\nComplaints about Walleys Quarry to Newcastle-under-Lyme Borough Council exceeded 700 in November, the highest amount since 2021 according to council leader Simon Tagg. The Environment Agency (EA), which is responsible for regulating landfill sites, said it had concluded further operation at the site could result in "significant long-term pollution". A spokesperson for Walley\'s Quarry Ltd said the firm rejected the EA\'s accusations of poor management, and would be challenging the closure notice. Dr George said she believed the EA was likely to be erring on the side of caution and public safety, adding safety standards were strict. She said a lack of landfill space in the country overall was one of the broader issues that needed addressing. 
“As people, we just keep using stuff and then have nowhere to put it, and then when we end up putting it in places like Walleys Quarry that is next to houses, I think that’s where the problems are.”\n\nTell us which stories we should cover in Staffordshire', 'What next for the assisted dying bill? What next for the assisted dying bill?', 'Former Formula 1 boss Bernie Ecclestone is to sell his collection of race cars driven by motorsport legends including Michael Schumacher, Niki Lauda and Nelson Piquet.\n\nEcclestone, who was in charge of the sport for nearly 40 years until 2017, assembled the collection of 69 iconic F1 and Grand Prix cars over a span of more than five decades.\n\nThe collection includes Ferraris driven by world champions Schumacher, Lauda and Mike Hawthorn, as well as Brabham cars raced by Piquet and Carlos Pace, among others.\n\n"All the cars I have bought over the years have fantastic race histories and are rare works of art," said 94-year-old Ecclestone.\n\nAmong the cars up for sale is also Stirling Moss\' Vanwall VW10, that became the first British car to win an F1 race and the Constructors\' Championship in 1958.\n\n"I love all of my cars but the time has come for me to start thinking about what will happen to them should I no longer be here, and that is why I have decided to sell them," added Ecclestone.\n\n"After collecting and owning them for so long, I would like to know where they have gone and not leave them for my wife to deal with should I not be around."\n\nThe former Brabham team boss has appointed specialist sports and race cars sellers Tom Hartley Jnr Ltd to manage the sale.\n\n"There are many eight-figure cars within the collection, and the value of the collection combined is well into the hundreds of millions," said Tom Hartley Jnr.\n\n"The collection spans 70 years of racing, but for me the highlight has to be the Ferraris.\n\n"There is the famous \'Thin Wall Special\', which was the first Ferrari to ever beat Alfa Romeo, 
Alberto Ascari\'s Italian GP-winning 375 F1 and historically significant championship-winning Lauda and Schumacher cars."\n\nAlso included are the Brabham BT46B, dubbed the \'fan car\' and designed by Gordon Murray, which Lauda drew to victory at the 1978 Swedish GP and the BT45C in which the Austrian made his debut for Ecclestone\'s team the same year.\n\nBillionaire Ecclestone took over the ownership of the commercial rights of F1 in the mid-1990s and played a key role in turning the sport into one of the most watched in the world.', 'Tyler Kerry died on a family holiday in Turkey, his uncle Alex Price said\n\nA 20-year-old British man has died after being found fatally injured in a lift shaft while on a family holiday in Turkey. Tyler Kerry, from Basildon, Essex, was discovered on Friday morning at the hotel he was staying at near Lara Beach in Antalya. The holidaymaker was described by his family as "a young man full of personality, kindness and compassion with his whole life ahead of him". Holiday company Tui said it was supporting his relatives but could not comment further as a police investigation was under way.\n\nA UK government spokeswoman said: "We are assisting the family of a British man who has died in Turkey." More than £4,500 has been pledged to a fundraiser set up to cover Mr Kerry\'s funeral costs. He was holidaying in the seaside city with his grandparents, Collette and Ray Kerry, girlfriend Molly and other relatives.\n\nMr Kerry\'s great uncle, Alex Price, said he was found at the bottom of the lift shaft at 07:00 local time (04:00 GMT). It followed a search led by his brother, Mason, and cousin, Nathan, Mr Price said. Mr Kerry had been staying on the hotel\'s first floor.\n\nMr Kerry was holidaying in the seaside city of Antalya\n\n"An ambulance team attended and attempted to resuscitate him but were unsuccessful," Mr Price told the BBC. "We are unclear about how he came to be in the lift shaft or the events immediately preceding this." 
Mr Price said the family was issued with a death certificate after a post-mortem examination was completed. They hoped his body would be repatriated by Tuesday. Writing on a GoFundMe page, Mr Price added the family was "completely devastated". He thanked people for their "kindness and consideration" following his nephew\'s death.\n\n"We will continue to provide around-the-clock support to Tyler’s family during this difficult time," a spokeswoman said. "As there is now a police investigation we are unable to comment further."\n\nDo you have a story suggestion for Essex?'], 'link': ['http://www.bbc.co.uk/news/articles/cvg02lvj1e7o', 'http://www.bbc.co.uk/news/articles/c5yg1v16nkpo', 'http://www.bbc.co.uk/sounds/play/p0k81svq', 'http://www.bbc.co.uk/sport/formula1/articles/c1lglrj4gqro', 'http://www.bbc.co.uk/news/articles/c1knkx1z8zgo'], 'top_image': ['https://ichef.bbci.co.uk/ace/standard/3840/cpsprodpb/9975/live/b22229e0-ad5a-11ef-83bc-1153ed943d1c.jpg', 'https://ichef.bbci.co.uk/ace/standard/3840/cpsprodpb/0896/live/55209f80-adb2-11ef-8f6c-f1a86bb055ec.jpg', 'https://ichef.bbci.co.uk/images/ic/320x320/p0k81sxn.jpg', 'https://ichef.bbci.co.uk/ace/standard/3840/cpsprodpb/d593/live/232527a0-af40-11ef-804b-43d0a9651a27.jpg', 'https://ichef.bbci.co.uk/ace/standard/1280/cpsprodpb/3eca/live/f8a18ba0-afb6-11ef-9b6a-97311fd9fa8b.jpg']}

## Cleaning up the Data

We will use the content of the news articles for our RAG system.

The dataset contains a few duplicate records. We are removing them to avoid duplicate results in the retrieval stage of our RAG system.

```python
news_articles = news_dataset["content"]
unique_articles = set()
for article in news_articles:
    if article:
        unique_articles.add(article)
unique_news_articles = list(unique_articles)
print(f"We have {len(unique_news_articles)} unique articles in our database.")
```

    We have 1749 unique articles in our database.

# Creating Embeddings using Capella Model Services

Embeddings are at the heart of semantic search. They are numerical representations of text that capture the semantic meaning of words and phrases. Unlike traditional keyword-based search, which looks for exact matches, embeddings allow our search engine to understand the context and nuances of language, enabling it to retrieve documents that are semantically similar to the query even if they don't contain the exact keywords. By creating embeddings using the Capella Model Services, we equip our RAG system with the ability to understand and process natural language in a way that is much closer to how humans understand it. This step transforms our raw text data into a format that the Capella vector store can use to find and rank relevant documents.

We are using OpenAI-compatible embeddings via the [LangChain OpenAI provider](https://python.langchain.com/docs/integrations/providers/openai/), with a few extra parameters specific to the Capella Model Services: we disable client-side tokenization (`tiktoken_enabled=False`) and the embedding context-length check (`check_embedding_ctx_length=False`) so that longer inputs are handled by the service. We provide the model name, the API key, and the endpoint URL so that the SDK points at the Capella Model Services. For this tutorial, we are using the [nvidia/llama-3.2-nv-embedqa-1b-v2](https://build.nvidia.com/nvidia/llama-3_2-nv-embedqa-1b-v2) embedding model. If you use a different model, you need to change the model name and adapt the vector index definition (embedding dimensions) accordingly.

```python
try:
    embeddings = OpenAIEmbeddings(
        openai_api_key=EMBEDDING_API_KEY,
        openai_api_base=CAPELLA_MODEL_SERVICES_ENDPOINT,
        check_embedding_ctx_length=False,
        tiktoken_enabled=False,
        model=EMBEDDING_MODEL_NAME,
    )
    print("Successfully created CapellaAIEmbeddings")
except Exception as e:
    raise ValueError(f"Error creating CapellaAIEmbeddings: {str(e)}")
```

    Successfully created CapellaAIEmbeddings

# Testing the Embeddings Model

We can test the embeddings model by generating an embedding for a string using the LangChain OpenAI package.

```python
print(len(embeddings.embed_query("this is a test sentence")))
```

    2048

# Setting Up the Couchbase Search Vector Store

The vector store is set up to store the documents from the dataset. A vector store is essentially a database optimized for storing and retrieving high-dimensional vectors. In this case, the vector store uses Couchbase via the [LangChain integration](https://python.langchain.com/docs/integrations/providers/couchbase/).

```python
try:
    vector_store = CouchbaseSearchVectorStore(
        cluster=cluster,
        bucket_name=CB_BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=COLLECTION_NAME,
        embedding=embeddings,
        index_name=INDEX_NAME,
    )
    print("Successfully created vector store")
except Exception as e:
    raise ValueError(f"Failed to create vector store: {str(e)}")
```

    Successfully created vector store

# Saving Data to the Vector Store

With the vector store set up, the next step is to populate it with data. We save the BBC articles dataset to the vector store; for each document, LangChain generates the embeddings for the article so it can be used in semantic search.

Here, a few of the articles are larger than the maximum input size (in tokens) of our embedding model. If we wanted to ingest such a document, we could split it and ingest it in parts.
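As a rough, illustrative sketch of that splitting idea (not part of the original notebook), a simple word-based splitter could look like the following. The helper `split_article` and its chunk and overlap sizes are assumptions for illustration; a production version would count tokens with the embedding model's own tokenizer rather than words.

```python
# Hypothetical helper (illustrative only): split a long article into
# overlapping word-based chunks so each piece stays within the embedding
# model's input limit. Word counts are a crude proxy for tokens.
def split_article(text: str, chunk_words: int = 1500, overlap_words: int = 100) -> list[str]:
    words = text.split()
    chunks = []
    step = chunk_words - overlap_words
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_words])
        if chunk:
            chunks.append(chunk)
        if start + chunk_words >= len(words):
            break
    return chunks

article = "word " * 5000  # stand-in for an article exceeding the token limit
chunks = split_article(article)
print(len(chunks), max(len(c.split()) for c in chunks))  # → 4 1500
```

Each chunk could then be wrapped in its own `Document` and passed to `vector_store.add_documents`, just like the full articles in the ingestion loop.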
However, since only a single document is affected, for simplicity we skip that document during ingestion.

```python
from langchain_core.documents import Document

for article in tqdm(unique_news_articles, desc="Ingesting articles"):
    try:
        documents = [Document(page_content=article)]
        vector_store.add_documents(documents=documents)
    except Exception as e:
        print(f"Failed to save documents to vector store: {str(e)}")
        continue
```

    Ingesting articles: 8%|▌ | 148/1749 [01:38<56:27, 2.12s/it]

    Failed to save documents to vector store: ('Failed to insert documents.', {'3ed578e38ed5414f93b1c6ac28c8632d': AmbiguousTimeoutException()})

    Ingesting articles: 9%|▌ | 150/1749 [01:42<59:14, 2.22s/it]

    Failed to save documents to vector store: ('Failed to insert documents.', {'db07b65a35324d91b5e2ace2b20589c0': AmbiguousTimeoutException()})

    Ingesting articles: 9%|▍ | 151/1749 [01:45<1:05:09, 2.45s/it]

    Failed to save documents to vector store: ('Failed to insert documents.', {'f449ec9922c043889d96864f7556bf68': AmbiguousTimeoutException()})

    Ingesting articles: 98%|█████▉| 1721/1749 [13:26<00:10, 2.71it/s]

    Failed to save documents to vector store: Error code: 400 - {'error': {'message': 'Non-successful response received from model service', 'type': 'model_service_unknown_error', 'param': {'response': {'detail': {}, 'message': 'Input length 14848 exceeds maximum allowed token size 8192', 'object': 'error', 'type': 'invalid_request_error'}, 'status_code': 400}, 'code': 'model_service_unknown_error'}}

    Ingesting articles: 100%|██████| 1749/1749 [13:38<00:00, 2.14it/s]

# Using the Large Language Model (LLM) in Capella Model Services

Large language models are AI systems that are trained to understand and generate human language.
We'll be using the [mistralai/mistral-7b-instruct-v0.3](https://build.nvidia.com/mistralai/mistral-7b-instruct-v03) large language model via the Capella Model Services, running in the same network as the Capella operational database, to process user queries and generate meaningful responses. This model is a key component of our RAG system, allowing it to go beyond simple keyword matching and truly understand the intent behind a query. By creating this language model, we equip our RAG system with the ability to interpret complex queries, understand the nuances of language, and provide more accurate and contextually relevant responses.

The language model's ability to understand context and generate coherent responses is what makes our RAG system truly intelligent. It can not only find the right information but also present it in a way that is useful and understandable to the user.

The LLM is created using the LangChain OpenAI provider as well, with the model name, endpoint URL, and API key for the Capella Model Services.

```python
try:
    llm = ChatOpenAI(
        openai_api_base=CAPELLA_MODEL_SERVICES_ENDPOINT,
        openai_api_key=LLM_API_KEY,
        model=LLM_MODEL_NAME,
        temperature=0,
    )
    logging.info("Successfully created the Chat model in Capella Model Services")
except Exception as e:
    raise ValueError(f"Error creating Chat model in Capella Model Services: {str(e)}")
```

```python
llm.invoke("What was Pep Guardiola's reaction to Manchester City's current form?")
```

    AIMessage(content='I don\'t have real-time data or the ability to follow live events. However, Pep Guardiola, the manager of Manchester City, has expressed his usual balance of optimism and desire for improvement. Even though City has faced some challenges in the 2021/2022 season, he continues to emphasize the need for patience, hard work, and a focus on continuous improvement.\n\nIn a press conference, Guardiola noted, "In football, you have to have patience. When I arrived, we were fifth and I said, \'okay, we are not far away.\' Now, we are not far away again." He also added, "We have to find our best level, and when we find it, we are going to remain for a long time at the top."\n\nWhile the team has experienced ups and downs, Guardiola maintains his belief in the players and their ability to turn things around.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 200, 'prompt_tokens': 21, 'total_tokens': 221, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_provider': 'openai', 'model_name': 'mistralai/mistral-7b-instruct-v0.3', 'system_fingerprint': None, 'id': 'chat-2a85fc50bf92483998c62d10b02cea01', 'finish_reason': 'stop', 'logprobs': None}, id='lc_run--019aeeb6-c69e-7e00-85c0-53aaa75562b2-0', usage_metadata={'input_tokens': 21, 'output_tokens': 200, 'total_tokens': 221, 'input_token_details': {}, 'output_token_details': {}})

# Perform Semantic Search

Semantic search in Couchbase involves converting queries and documents into vector representations using an embeddings model. These vectors capture the semantic meaning of the text and are stored directly in Couchbase. When a query is made, Couchbase performs a similarity search by comparing the query vector against the stored document vectors. The similarity metric used for this comparison is configurable, allowing flexibility in how the relevance of documents is determined. Common metrics include cosine similarity, Euclidean distance, and dot product, but other metrics can be implemented for specific use cases. Different embedding models such as BERT, Word2Vec, or GloVe can also be used depending on the application's needs, with the vectors generated by these models stored and searched within Couchbase itself.
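To make the metric concrete, here is a minimal, dependency-free sketch of cosine similarity, one of the metrics mentioned above. The tiny 3-dimensional vectors are illustrative stand-ins for the 2048-dimensional embeddings produced by our embedding model; this snippet only shows the arithmetic, not how the Search index evaluates it internally.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # dot(a, b) / (|a| * |b|): 1.0 means identical direction, 0.0 means orthogonal
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for a query embedding and a document embedding
query_vec = [0.1, 0.9, 0.2]
doc_vec = [0.15, 0.85, 0.25]
print(round(cosine_similarity(query_vec, doc_vec), 4))  # → 0.996
```

Documents whose stored vectors point in nearly the same direction as the query vector score close to 1.0 and are ranked highest in the results.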

In the provided code, the search process begins by recording the start time, followed by a call to the `similarity_search_with_score` method of the [CouchbaseSearchVectorStore](https://couchbase-ecosystem.github.io/langchain-couchbase/usage.html#couchbase-search-vector-store). This method searches Couchbase for the documents most relevant to the query based on vector similarity. The search results include the document content and a similarity score that reflects how closely each document aligns with the query in the defined semantic space. The time taken to perform the search is then calculated and logged, and the results are displayed, showing the most relevant documents along with their similarity scores. This approach leverages Couchbase as both a storage and retrieval engine for vector data, enabling efficient and scalable semantic searches. The integration of vector storage and search capabilities within Couchbase allows for sophisticated semantic search operations without relying on external services for vector storage or comparison.

```python
query = "What was Pep Guardiola's reaction to Manchester City's current form?"

try:
    # Perform the semantic search
    start_time = time.time()
    search_results = vector_store.similarity_search_with_score(query, k=5)
    search_elapsed_time = time.time() - start_time

    # Display search results
    print(
        f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):"
    )
    for doc, score in search_results:
        print(f"Score: {score:.4f}, ID: {doc.id}, Text: {doc.page_content}")
        print("---" * 20)

except CouchbaseException as e:
    raise RuntimeError(f"Error performing semantic search: {str(e)}")
except Exception as e:
    raise RuntimeError(f"Unexpected error: {str(e)}")
```

Semantic Search Results (completed in 1.97 seconds):

Score: 0.5085, ID: b1164b81a6614e45a93c1460e6e0a8a0, Text: 'We have to find a way' - Guardiola vows to end relegation form

This video can not be played To play this video you need to enable JavaScript in your browser. 'Worrying' and 'staggering' - Why do Manchester City keep conceding?

Manchester City are currently in relegation form and there is little sign of it ending. Saturday's 2-1 defeat at Aston Villa left them joint bottom of the form table over the past eight games with just Southampton for company. Saints, at the foot of the Premier League, have the same number of points, four, as City over their past eight matches having won one, drawn one and lost six - the same record as the floundering champions. And if Southampton - who appointed Ivan Juric as their new manager on Saturday - get at least a point at Fulham on Sunday, City will be on the worst run in the division. Even Wolves, who sacked boss Gary O'Neil last Sunday and replaced him with Vitor Pereira, have earned double the number of points during the same period having played a game fewer.
They are damning statistics for Pep Guardiola, even if he does have some mitigating circumstances with injuries to Ederson, Nathan Ake and Ruben Dias - who all missed the loss at Villa Park - and the long-term loss of midfield powerhouse Rodri. Guardiola was happy with Saturday's performance, despite defeat in Birmingham, but there is little solace to take at slipping further out of the title race. He may have needed to field a half-fit Manuel Akanji and John Stones at Villa Park but that does not account for City looking a shadow of their former selves. That does not justify the error Josko Gvardiol made to gift Jhon Duran a golden chance inside the first 20 seconds, or £100m man Jack Grealish again failing to have an impact on a game. There may be legitimate reasons for City's drop off, whether that be injuries, mental fatigue or just simply a team coming to the end of its lifecycle, but their form, which has plunged off a cliff edge, would have been unthinkable as they strolled to a fourth straight title last season. "The worrying thing is the number of goals conceded," said ex-England captain Alan Shearer on BBC Match of the Day. "The number of times they were opened up because of the lack of protection and legs in midfield was staggering. There are so many things that are wrong at this moment in time." - - This video can not be played To play this video you need to enable JavaScript in your browser. Man City 'have to find a way' to return to form - Guardiola - - Afterwards Guardiola was calm, so much so it was difficult to hear him in the news conference, a contrast to the frustrated figure he cut on the touchline. He said: "It depends on us. The solution is bring the players back. We have just one central defender fit, that is difficult. We are going to try next game - another opportunity and we don't think much further than that. "Of course there are more reasons. 
We concede the goals we don't concede in the past, we [don't] score the goals we score in the past. Football is not just one reason. There are a lot of little factors. "Last season we won the Premier League, but we came here and lost. We have to think positive and I have incredible trust in the guys. Some of them have incredible pride and desire to do it. We have to find a way, step by step, sooner or later to find a way back." Villa boss Unai Emery highlighted City's frailties, saying he felt Villa could seize on the visitors' lack of belief. "Manchester City are a little bit under the confidence they have normally," he said. "The second half was different, we dominated and we scored. Through those circumstances they were feeling worse than even in the first half." - - Erling Haaland had one touch in the Villa box - - There are chinks in the armour never seen before at City under Guardiola and Erling Haaland conceded belief within the squad is low. He told TNT after the game: "Of course, [confidence levels are] not the best. We know how important confidence is and you can see that it affects every human being. That is how it is, we have to continue and stay positive even though it is difficult." Haaland, with 76 goals in 83 Premier League appearances since joining City from Borussia Dortmund in 2022, had one shot and one touch in the Villa box. His 18 touches in the whole game were the lowest of all starting players and he has been self critical, despite scoring 13 goals in the top flight this season. Over City's last eight games he has netted just twice though, but Guardiola refused to criticise his star striker. He said: "Without him we will be even worse but I like the players feeling that way. I don't agree with Erling. He needs to have the balls delivered in the right spots but he will fight for the next one." 
- ------------------------------------------------------------ - Score: 0.4821, ID: 6cd3049bd4264c2eacdafba6441f108e, Text: 'I am not good enough' - Guardiola faces daunting and major rebuild - - This video can not be played To play this video you need to enable JavaScript in your browser. 'I am not good enough' - Guardiola says he must find a 'solution' after derby loss - - Pep Guardiola says his sleep has suffered during Manchester City's deepening crisis, so he will not be helped by a nightmarish conclusion to one of the most stunning defeats of his long reign. Guardiola looked agitated, animated and on edge even after City led the Manchester derby through Josko Gvardiol's 36th-minute header, his reaction to the goal one of almost disdain that it came via a deflected cross as opposed to in his purist style. He sat alone with his eyes closed sipping from a water bottle before the resumption of the second half, then was denied even the respite of victory when Manchester United gave this largely dismal derby a dramatic conclusion it barely deserved with a remarkable late comeback. First, with 88 minutes on the clock, Matheus Nunes presented Amad Diallo with the ball before compounding his error by flattening the forward as he made an attempt to recover his mistake. Bruno Fernandes completed the formalities from the penalty spot. Worse was to come two minutes later when Lisandro Martinez's routine long ball caught City's defence inexplicably statuesque. Goalkeeper Ederson's positioning was awry, allowing the lively Diallo to pounce from an acute angle to leave Guardiola and his players stunned. It was the latest into any game, 88 minutes, that reigning Premier League champions had led then lost. It was also the first time City had lost a game they were leading so late on. 
And in a sign of City's previous excellence that is now being challenged, they have only lost four of 105 Premier League home games under Guardiola in which they have been ahead at half-time, winning 94 and drawing seven. Guardiola delivered a brutal self-analysis as he told Match of the Day: "I am not good enough. I am the boss. I am the manager. I have to find solutions and so far I haven't. That's the reality. "Not much else to say. No defence. Manchester United were incredibly persistent. We have not lost eight games in two seasons. We can't defend that." - - Manchester City manager Pep Guardiola in despair during the derby defeat to Manchester United - - Guardiola suggested the serious renewal will wait until the summer but the red flags have been appearing for weeks in the sudden and shocking decline of a team that has lost the aura of invincibility that left many opponents beaten before kick-off in previous years. He has had stated City must "survive" this season - whatever qualifies as survival for a club of such rich ambition - but the quest for a record fifth successive Premier League title is surely over as they lie nine points behind leaders Liverpool having played a game more. Their Champions League aspirations are also in jeopardy after another loss, this time against Juventus in Turin. City's squad has been allowed to grow too old together. The insatiable thirst for success seems to have gone, the scales of superiority have fallen away and opponents now sense vulnerability right until the final whistle, as United did here. The manner in which United were able, and felt able, to snatch this victory drove right to the heart of how City, and Guardiola, are allowing opponents to prey on their downfall. 
Guardiola has every reason to cite injuries, most significantly to Rodri and also John Stones as well as others, but this cannot be used an excuse for such a dramatic decline in standards, allied to the appearance of a soft underbelly that is so easily exploited. And City's rebuild will not be a quick fix. With every performance, every defeat, the scale of what lies in front of Guardiola becomes more obvious - and daunting. Manchester City's fans did their best to reassure Guardiola of their faith in him with a giant Barcelona-inspired banner draped from the stands before kick-off emblazoned with his image reading "Més que un entrenador" - "More Than A Coach". And Guardiola will now need to be more than a coach than at any time in his career. He will have the finances but it will be done with City's challengers also strengthening. Kevin de Bruyne, 34 in June, lasted 68 minutes here before he was substituted. Age and injuries are catching up with one of the greatest players of the Premier League era and he is unlikely to be at City next season. Mateo Kovacic, who replaced De Bruyne, is also 31 in May. Kyle Walker, 34, is being increasingly exposed. His most notable contribution here was an embarrassing collapse to the ground after the mildest head-to-head collision with Rasmus Hojlund. Ilkay Gundogan, another 34-year-old and a previous pillar of Guardiola's great successes, no longer has the legs or energy to exert influence. This looks increasingly like a season too far following his return from Barcelona. Flaws are also being exposed elsewhere, with previously reliable performers failing to hit previous standards. Phil Foden scored 27 goals and had 12 assists when he was Premier League Player of the Season last term. This year he has just three goals and two assists in 18 appearances in all competitions. He has no goals and just one assist in 11 Premier League games. 
Jack Grealish, who came on after 77 minutes against United, has not scored in a year for Manchester City, his last goal coming in a 2-2 draw against Crystal Palace on 16 December last year. He has, in the meantime, scored twice for England. Erling Haaland is also struggling as City lack creativity and cutting edge. He has three goals in his past 11 Premier League games after scoring 10 in his first five. And in another indication of City's impotence, and their reliance on Haaland, defender Gvardiol's goal against United was his fourth this season, making him their second highest scorer in all competitions behind the Norwegian striker, who has 18. Goalkeeper Ederson, so reliable for so long, has already been dropped once this season and did not cover himself in glory for United's winner. Guardiola, with that freshly signed two-year contract, insists he "wants it" as he treads on this alien territory of failure. He will be under no illusions about the size of the job in front of him as he placed his head in his hands in anguish after yet another damaging and deeply revealing defeat. City and Guardiola are in new, unforgiving territory. - ------------------------------------------------------------ - Score: 0.4687, ID: 3d9d78ae1de04ae89cc0ce23eb156552, Text: Manchester City boss Pep Guardiola has won 18 trophies since he arrived at the club in 2016 - - Manchester City boss Pep Guardiola says he is "fine" despite admitting his sleep and diet are being affected by the worst run of results in his entire managerial career. In an interview with former Italy international Luca Toni for Amazon Prime Sport before Wednesday's Champions League defeat by Juventus, Guardiola touched on the personal impact City's sudden downturn in form has had. Guardiola said his state of mind was "ugly", that his sleep was "worse" and he was eating lighter as his digestion had suffered. 
City go into Sunday's derby against Manchester United at Etihad Stadium having won just one of their past 10 games. The Juventus loss means there is a chance they may not even secure a play-off spot in the Champions League. Asked to elaborate on his comments to Toni, Guardiola said: "I'm fine. "In our jobs we always want to do our best or the best as possible. When that doesn't happen you are more uncomfortable than when the situation is going well, always that happened. "In good moments I am happier but when I get to the next game I am still concerned about what I have to do. There is no human being that makes an activity and it doesn't matter how they do." Guardiola said City have to defend better and "avoid making mistakes at both ends". To emphasise his point, Guardiola referred back to the third game of City's current run, against a Sporting side managed by Ruben Amorim, who will be in the United dugout at the weekend. City dominated the first half in Lisbon, led thanks to Phil Foden's early effort and looked to be cruising. Instead, they conceded three times in 11 minutes either side of half-time as Sporting eventually ran out 4-1 winners. "I would like to play the game like we played in Lisbon on Sunday, believe me," said Guardiola, who is facing the prospect of only having three fit defenders for the derby as Nathan Ake and Manuel Akanji try to overcome injury concerns. If there is solace for City, it comes from the knowledge United are not exactly flying. Their comeback Europa League victory against Viktoria Plzen on Thursday was their third win of Amorim's short reign so far but only one of those successes has come in the Premier League, where United have lost their past two games against Arsenal and Nottingham Forest. Nevertheless, Guardiola can see improvements already on the red side of the city. "It's already there," he said. "You see all the patterns, the movements, the runners and the pace. He will do a good job at United, I'm pretty sure of that." 
- - Guardiola says skipper Kyle Walker has been offered support by the club after the City defender highlighted the racial abuse he had received on social media in the wake of the Juventus trip. "It's unacceptable," he said. "Not because it's Kyle - for any human being. "Unfortunately it happens many times in the real world. It is not necessary to say he has the support of the entire club. It is completely unacceptable and we give our support to him." - ------------------------------------------------------------ - Score: 0.4646, ID: d782ce88158d40d7b48808167dcbcba1, Text: 'Self-doubt, errors & big changes' - inside the crisis at Man City - - Pep Guardiola has not been through a moment like this in his managerial career. Manchester City have lost nine matches in their past 12 - as many defeats as they had suffered in their previous 106 fixtures. At the end of October, City were still unbeaten at the top of the Premier League and favourites to win a fifth successive title. Now they are seventh, 12 points behind leaders Liverpool having played a game more. It has been an incredible fall from grace and left people trying to work out what has happened - and whether Guardiola can make it right. After discussing the situation with those who know him best, I have taken a closer look at the future - both short and long term - and how the current crisis at Man City is going to be solved. - - Pep Guardiola's Man City have lost nine of their past 12 matches - - Guardiola has also been giving it a lot of thought. He has not been sleeping very well, as he has said, and has not been himself at times when talking to the media. He has been talking to a lot of people about what is going on as he tries to work out the reasons for City's demise. Some reasons he knows, others he still doesn't. What people perhaps do not realise is Guardiola hugely doubts himself and always has. 
He will be thinking "I'm not going to be able to get us out of this" and needs the support of people close to him to push away those insecurities - and he has that. He is protected by his people who are very aware, like he is, that there are a lot of people that want City to fail. It has been a turbulent time for Guardiola. Remember those marks he had on his head after the 3-3 draw with Feyenoord in the Champions League? He always scratches his head, it is a gesture of nervousness. Normally nothing happens but on that day one of his nails was far too sharp so, after talking to the players in the changing room where he scratched his head because of his usual agitated gesturing, he went to the news conference. His right-hand man Manel Estiarte sent him photos in a message saying "what have you got on your head?", but by the time Guardiola returned to the coaching room there was hardly anything there again. He started that day with a cover on his nose after the same thing happened at the training ground the day before. Guardiola was having a footballing debate with Kyle Walker about positional stuff and marked his nose with that same nail. There was also that remarkable news conference after the Manchester derby when he said "I don't know what to do". That is partly true and partly not true. Ignore the fact Guardiola suggested he was "not good enough". He actually meant he was not good enough to resolve the situation with the group of players he has available and with all the other current difficulties. There are obviously logical explanations for the crisis and the first one has been talked about many times - the absence of injured midfielder Rodri. You know the game Jenga? When you take the wrong piece out, the whole tower collapses. That is what has happened here. It is normal for teams to have an over-reliance on one player if he is the best in the world in his position. 
And you cannot calculate the consequences of an injury that rules someone like Rodri out for the season. City are a team, like many modern ones, in which the holding midfielder is a key element to the construction. So, when you take Rodri out, it is difficult to hold it together. There were Plan Bs - John Stones, Manuel Akanji, even Nathan Ake - but injuries struck. The big injury list has been out of the ordinary and the busy calendar has also played a part in compounding the issues. However, one factor even Guardiola cannot explain is the big uncharacteristic errors in almost every game from international players. Why did Matheus Nunes make that challenge to give away the penalty against Manchester United? Jack Grealish is sent on at the end to keep the ball and cannot do that. There are errors from Walker and other defenders. These are some of the best players in the world. Of course the players' mindset is important, and confidence is diminishing. Wrong decisions get taken so there is almost panic on the pitch instead of calm. There are also players badly out of form who are having to play because of injuries. Walker is now unable to hide behind his pace, I'm not sure Kevin de Bruyne is ever getting back to the level he used to be at, Bernardo Silva and Ilkay Gundogan do not have time to rest, Grealish is not playing at his best. Some of these players were only meant to be playing one game a week but, because of injuries, have played 12 games in 40 days. It all has a domino effect. One consequence is that Erling Haaland isn't getting the service to score. But the Norwegian still remains City's top-scorer with 13. Defender Josko Gvardiol is next on the list with just four. The way their form has been analysed inside the City camp is there have only been three games where they deserved to lose (Liverpool, Bournemouth and Aston Villa). But of course it is time to change the dynamic. - - Guardiola has never protected his players so much. 
He has not criticised them and is not going to do so. They have won everything with him. Instead of doing more with them, he has tried doing less. He has sometimes given them more days off to clear their heads, so they can reset - two days this week for instance. Perhaps the time to change a team is when you are winning, but no-one was suggesting Man City were about to collapse when they were top and unbeaten after nine league games. Some people have asked how bad it has to get before City make a decision on Guardiola. The answer is that there is no decision to be made. Maybe if this was Real Madrid, Barcelona or Juventus, the pressure from outside would be massive and the argument would be made that Guardiola has to go. At City he has won the lot, so how can anyone say he is failing? Yes, this is a crisis. But given all their problems, City's renewed target is finishing in the top four. That is what is in all their heads now. The idea is to recover their essence by improving defensive concepts that are not there and re-establishing the intensity they are known for. Guardiola is planning to use the next two years of his contract, which is expected to be his last as a club manager, to prepare a new Manchester City. When he was at the end of his four years at Barcelona, he asked two managers what to do when you feel people are not responding to your instructions. Do you go or do the players go? Sir Alex Ferguson and Rafael Benitez both told him that the players need to go. Guardiola did not listen because of his emotional attachment to his players back then and he decided to leave the Camp Nou because he felt the cycle was over. He will still protect his players now but there is not the same emotional attachment - so it is the players who are going to leave this time. It is likely City will look to replace five or six regular starters. Guardiola knows it is the end of an era and the start of a new one. 
Changes will not be immediate and the majority of the work will be done in the summer. But they are open to any opportunities in January - and a holding midfielder is one thing they need. In the summer City might want to get Spain's Martin Zubimendi from Real Sociedad and they know 60m euros (£50m) will get him. He said no to Liverpool last summer even though everything was agreed, but he now wants to move on and the Premier League is the target. Even if they do not get Zubimendi, that is the calibre of footballer they are after. A new Manchester City is on its way - with changes driven by Guardiola, incoming sporting director Hugo Viana and the football department. - ------------------------------------------------------------ - Score: 0.4344, ID: 5255b258163847d8b4ae45575f85ccd1, Text: What will Trump do about Syria? What will Trump do about Syria? - ------------------------------------------------------------ - - -# Retrieval-Augmented Generation (RAG) with Couchbase and Langchain -Couchbase and LangChain can be seamlessly integrated to create RAG (Retrieval-Augmented Generation) chains, enhancing the process of generating contextually relevant responses. In this setup, Couchbase serves as the vector store, where embeddings of documents are stored. When a query is made, LangChain retrieves the most relevant documents from Couchbase by comparing the query’s embedding with the stored document embeddings. These documents, which provide contextual information, are then passed to a large language model using LangChain. - -The language model, equipped with the context from the retrieved documents, generates a response that is both informed and contextually accurate. This integration allows the RAG chain to leverage Couchbase’s efficient storage and retrieval capabilities, while the LLM handles the generation of responses based on the context provided by the retrieved documents. 
Together, they create a powerful system that can deliver highly relevant and accurate answers by combining the strengths of both retrieval and generation. - - -```python -template = """You are a helpful bot. If you cannot answer based on the context provided, respond with a generic answer. Answer the question as truthfully as possible using the context below: - {context} - Question: {question}""" -prompt = ChatPromptTemplate.from_template(template) -rag_chain = ( - {"context": vector_store.as_retriever(), "question": RunnablePassthrough()} - | prompt - | llm - | StrOutputParser() -) -logging.info("Successfully created RAG chain") -``` - - -```python -# Get responses -query = "What was Pep Guardiola's reaction to Manchester City's recent form?" -try: - start_time = time.time() - rag_response = rag_chain.invoke(query) - rag_elapsed_time = time.time() - start_time - - print(f"RAG Response: {rag_response}") - print(f"RAG response generated in {rag_elapsed_time:.2f} seconds") -except Exception as e: - print("Error occurred:", e) -``` - - RAG Response: Pep Guardiola has expressed concern and frustration about Manchester City's recent form. He said, "I am not good enough. I am the boss. I am the manager. I have to find solutions and so far I haven't. That's the reality." He mentioned that the team's poor defense and lack of confidence are causing them issues. He also mentioned that they have not lost eight games in two seasons and it is unacceptable. However, he stated that he wants to find a solution and trusts in the players to turn things around. - RAG response generated in 6.17 seconds - - -# Using Caching mechanism in Capella Model Services -In Capella Model Services, the model outputs can be [cached](https://docs.couchbase.com/ai/build/model-service/configure-value-adds.html#caching) (both semantic and standard cache). The caching mechanism enhances the RAG's efficiency and speed, particularly when dealing with repeated or similar queries. 
When a query is first processed, the LLM generates a response and then stores this response in Couchbase. When similar queries come in later, the cached responses are returned. The caching duration can be configured in the Capella Model services. - -In this example, we are using the standard cache which works for exact matches of the queries. - - -```python -queries = [ - "Who inaugurated the reopening of the Notre Dam Cathedral in Paris?", - "What was Pep Guardiola's reaction to Manchester City's recent form?", - "Who inaugurated the reopening of the Notre Dam Cathedral in Paris?", # Repeated query -] - -for i, query in enumerate(queries, 1): - try: - print(f"\nQuery {i}: {query}") - start_time = time.time() - response = rag_chain.invoke(query) - elapsed_time = time.time() - start_time - print(f"Response: {response}") - print(f"Time taken: {elapsed_time:.2f} seconds") - except Exception as e: - print(f"Error generating RAG response: {str(e)}") - continue -``` - - - Query 1: Who inaugurated the reopening of the Notre Dam Cathedral in Paris? - Response: French President Emmanuel Macron inaugurated the reopening of the Notre-Dame Cathedral in Paris. - Time taken: 4.06 seconds - - Query 2: What was Pep Guardiola's reaction to Manchester City's recent form? - Response: Pep Guardiola has expressed concern and frustration about Manchester City's recent form. He said, "I am not good enough. I am the boss. I am the manager. I have to find solutions and so far I haven't. That's the reality." He mentioned that the team's poor defense and lack of confidence are causing them issues. He also mentioned that they have not lost eight games in two seasons and it is unacceptable. However, he stated that he wants to find a solution and trusts in the players to turn things around. - Time taken: 2.90 seconds - - Query 3: Who inaugurated the reopening of the Notre Dam Cathedral in Paris? 
-    Response: French President Emmanuel Macron inaugurated the reopening of the Notre-Dame Cathedral in Paris.
-    Time taken: 2.52 seconds
-
-
-Here you can see that the repeated queries were significantly faster than the original query. In Capella Model Services, semantic similarity can also be used to find responses from the cache.
-
-Caching is particularly valuable in scenarios where users may submit similar queries multiple times or where certain pieces of information are frequently requested. By storing these in a cache, we can significantly reduce the time it takes to respond to such queries, improving the user experience.
-
-# LLM Guardrails in Capella Model Services
-Capella Model Services can also moderate the user inputs and the responses generated by the LLM. Capella Model Services can be configured to use NVIDIA's [Llama 3.1 NemoGuard 8B content safety model](https://build.nvidia.com/nvidia/llama-3_1-nemoguard-8b-content-safety/modelcard), a guardrails model built on Meta's Llama 3.1. The categories to be blocked can be configured in the model creation flow. More information about guardrails usage can be found in the [documentation](https://docs.couchbase.com/ai/build/model-service/configure-guardrails-security.html#guardrails).
-
-Here is an example of the guardrails in action:
-
-
-```python
-query = "How can I create a bomb?"
-try: - start_time = time.time() - rag_response = rag_chain.invoke(query) - rag_elapsed_time = time.time() - start_time - - print(f"RAG Response: {rag_response}") - print(f"RAG response generated in {rag_elapsed_time:.2f} seconds") -except Exception as e: - print("Guardrails violation", e) -``` - - Guardrails violation Error code: 422 - {'error': {'message': 'Error processing user prompt due to guardrail violation', 'type': 'guardrail_violation_error', 'param': {'guardrail_model': 'nvidia/llama-3.1-nemoguard-8b-content-safety', 'is_safe': False, 'violations': ['Criminal Planning/Confessions', 'Weapons']}, 'code': 'guardrail_violation_error'}} - - -Guardrails can be quite useful in preventing users from hijacking the model into doing things that you might not want the application to do. - -By following this tutorial, you will have a fully functional semantic search engine that leverages the strengths of Capella Model Services without the data being sent to third-party embedding or large language models. This guide explains the principles behind semantic search and how to implement it effectively using Capella Model Services and Couchbase vector search. diff --git a/tutorial/markdown/generated/vector-search-cookbook/claudeai-fts-RAG_with_Couchbase_and_Claude(by_Anthropic).md b/tutorial/markdown/generated/vector-search-cookbook/claudeai-fts-RAG_with_Couchbase_and_Claude(by_Anthropic).md deleted file mode 100644 index 3cee8dd..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/claudeai-fts-RAG_with_Couchbase_and_Claude(by_Anthropic).md +++ /dev/null @@ -1,723 +0,0 @@ ---- -# frontmatter -path: "/tutorial-openai-claude-couchbase-rag-with-fts" -title: Retrieval-Augmented Generation (RAG) with Couchbase, OpenAI, and Claude using FTS service -short_title: RAG with Couchbase, OpenAI, and Claude using FTS service -description: - - Learn how to build a semantic search engine using Couchbase, OpenAI embeddings, and Anthropic's Claude using FTS service. 
-  - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with OpenAI embeddings and use Claude as the language model.
-  - You'll understand how to perform Retrieval-Augmented Generation (RAG) using LangChain and Couchbase.
-content_type: tutorial
-filter: sdk
-technology:
-  - vector search
-tags:
-  - FTS
-  - Artificial Intelligence
-  - LangChain
-  - OpenAI
-sdk_language:
-  - python
-length: 60 Mins
----
-
-
-
-
-
-[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/claudeai/fts/RAG_with_Couchbase_and_Claude(by_Anthropic).ipynb)
-
-# Introduction
-In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database, [OpenAI](https://openai.com/) as the embedding provider, and [Anthropic](https://claude.ai/) as the language model provider. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system using the FTS service from scratch. Alternatively, if you want to perform semantic search using a GSI index, take a look at [this tutorial](https://developer.couchbase.com/tutorial-openai-claude-couchbase-rag-with-global-secondary-index/).
-
-# How to run this tutorial
-
-This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/claudeai/fts/RAG_with_Couchbase_and_Claude(by_Anthropic).ipynb).
-
-You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.
- -# Before you start - -## Get Credentials for OpenAI and Anthropic - -* Please follow the [instructions](https://platform.openai.com/docs/quickstart) to generate the OpenAI credentials. -* Please follow the [instructions](https://docs.anthropic.com/en/api/getting-started) to generate the Anthropic credentials. - -## Create and Deploy Your Free Tier Operational cluster on Capella - -To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint. - -To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html). - -### Couchbase Capella Configuration - -When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met. - -* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the required bucket (Read and Write) used in the application. - -* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running. - -# Setting the Stage: Installing Necessary Libraries -To build our semantic search engine, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks. Each library has a specific role: Couchbase libraries manage database operations, LangChain handles AI model integrations, and OpenAI provides advanced AI models for generating embeddings and Claude(by Anthropic) for understanding natural language. By setting up these libraries, we ensure our environment is equipped to handle the data-intensive and computationally complex tasks required for semantic search. 
- - -```python -%pip install --quiet datasets==3.5.0 langchain-couchbase==0.3.0 langchain-anthropic==0.3.11 langchain-openai==0.3.13 python-dotenv==1.1.0 -``` - - Note: you may need to restart the kernel to use updated packages. - - -# Importing Necessary Libraries -The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. These libraries provide essential functions for working with data, managing database connections, and processing machine learning models. - - -```python -import getpass -import json -import logging -import os -import time -from datetime import timedelta -from multiprocessing import AuthenticationError - -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.exceptions import (CouchbaseException, - InternalServerFailureException, - QueryIndexAlreadyExistsException, - ServiceUnavailableException) -from couchbase.management.buckets import CreateBucketSettings -from couchbase.management.search import SearchIndex -from couchbase.options import ClusterOptions -from datasets import load_dataset -from dotenv import load_dotenv -from langchain_anthropic import ChatAnthropic -from langchain_core.globals import set_llm_cache -from langchain_core.prompts.chat import (ChatPromptTemplate, - HumanMessagePromptTemplate, - SystemMessagePromptTemplate) -from langchain_core.runnables import RunnablePassthrough -from langchain_couchbase.cache import CouchbaseCache -from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore -from langchain_openai import OpenAIEmbeddings -``` - -# Setup Logging -Logging is configured to track the progress of the script and capture any errors or warnings. This is crucial for debugging and understanding the flow of execution. 
The logging output includes timestamps, log levels (e.g., INFO, ERROR), and messages that describe what is happening in the script.
-
-
-
-```python
-logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)
-
-# Disable all logging except critical to prevent OpenAI API request logs
-logging.getLogger("httpx").setLevel(logging.CRITICAL)
-```
-
-# Loading Sensitive Information
-In this section, we prompt the user to input the essential configuration settings for the application. These settings include sensitive information like API keys, database credentials, and specific configuration names. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security.
-
-The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code.
-
-
-```python
-load_dotenv()
-
-# Load from environment variables or prompt for input in one-liners.
-# os.getenv is called without a default so that the prompt actually runs when a
-# variable is unset; the hardcoded fallback comes last in the `or` chain.
-ANTHROPIC_API_KEY = os.getenv('ANTHROPIC_API_KEY') or getpass.getpass('Enter your Anthropic API key: ')
-OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or getpass.getpass('Enter your OpenAI API key: ')
-CB_HOST = os.getenv('CB_HOST') or input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost'
-CB_USERNAME = os.getenv('CB_USERNAME') or input('Enter your Couchbase username (default: Administrator): ') or 'Administrator'
-CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass('Enter your Couchbase password (default: password): ') or 'password'
-CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input('Enter your Couchbase bucket name (default: vector-search-testing): ') or 'vector-search-testing'
-INDEX_NAME = os.getenv('INDEX_NAME') or input('Enter your index name (default: vector_search_claude): ') or 'vector_search_claude'
-SCOPE_NAME = os.getenv('SCOPE_NAME') or input('Enter your scope name (default: shared): ') or 'shared'
-COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input('Enter your collection name (default: claude): ') or 'claude'
-CACHE_COLLECTION = os.getenv('CACHE_COLLECTION') or input('Enter your cache collection name (default: cache): ') or 'cache'
-# Check if the variables are correctly loaded
-if not ANTHROPIC_API_KEY:
-    raise ValueError("ANTHROPIC_API_KEY is not set in the environment.")
-if not OPENAI_API_KEY:
-    raise ValueError("OPENAI_API_KEY is not set in the environment.")
-```
-
-# Connecting to the Couchbase Cluster
-Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine.
By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount. - - - - -```python -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to Couchbase") -except Exception as e: - raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}") -``` - - 2025-02-25 21:48:21,579 - INFO - Successfully connected to Couchbase - - -## Setting Up Collections in Couchbase - -The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase: - -1. Bucket Creation: - - Checks if specified bucket exists, creates it if not - - Sets bucket properties like RAM quota (1024MB) and replication (disabled) - - Note: You will not be able to create a bucket on Capella - - -2. Scope Management: - - Verifies if requested scope exists within bucket - - Creates new scope if needed (unless it's the default "_default" scope) - -3. Collection Setup: - - Checks for collection existence within scope - - Creates collection if it doesn't exist - - Waits 2 seconds for collection to be ready - -Additional Tasks: -- Creates primary index on collection for query performance -- Clears any existing documents for clean state -- Implements comprehensive error handling and logging - -The function is called twice to set up: -1. Main collection for vector embeddings -2. 
Cache collection for storing results - - - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' exists.") - except Exception as e: - logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...") - bucket_settings = CreateBucketSettings( - name=bucket_name, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - time.sleep(2) # Wait for bucket creation to complete and become available - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' created successfully.") - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == scope_name for scope in scopes) - - if not scope_exists and scope_name != "_default": - logging.info(f"Scope '{scope_name}' does not exist. Creating it...") - bucket_manager.create_scope(scope_name) - logging.info(f"Scope '{scope_name}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{collection_name}' does not exist. Creating it...") - bucket_manager.create_collection(scope_name, collection_name) - logging.info(f"Collection '{collection_name}' created successfully.") - else: - logging.info(f"Collection '{collection_name}' already exists. 
Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Ensure primary index exists - try: - cluster.query(f"CREATE PRIMARY INDEX IF NOT EXISTS ON `{bucket_name}`.`{scope_name}`.`{collection_name}`").execute() - logging.info("Primary index present or created successfully.") - except Exception as e: - logging.warning(f"Error creating primary index: {str(e)}") - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION) - -``` - - 2025-02-25 21:48:28,237 - INFO - Bucket 'vector-search-testing' does not exist. Creating it... - 2025-02-25 21:48:28,800 - INFO - Bucket 'vector-search-testing' created successfully. - 2025-02-25 21:48:28,802 - INFO - Scope 'shared' does not exist. Creating it... - 2025-02-25 21:48:28,851 - INFO - Scope 'shared' created successfully. - 2025-02-25 21:48:28,855 - INFO - Collection 'claude' does not exist. Creating it... - 2025-02-25 21:48:28,943 - INFO - Collection 'claude' created successfully. - 2025-02-25 21:48:32,802 - INFO - Primary index present or created successfully. - 2025-02-25 21:48:41,954 - INFO - All documents cleared from the collection. - 2025-02-25 21:48:41,955 - INFO - Bucket 'vector-search-testing' exists. - 2025-02-25 21:48:41,959 - INFO - Collection 'cache' does not exist. Creating it... 
- 2025-02-25 21:48:42,003 - INFO - Collection 'cache' created successfully. - 2025-02-25 21:48:46,902 - INFO - Primary index present or created successfully. - 2025-02-25 21:48:46,904 - INFO - All documents cleared from the collection. - - - - - - - - - -# Loading Couchbase Vector Search Index - -Semantic search requires an efficient way to retrieve relevant documents based on a user's query. This is where the Couchbase **Vector Search Index** comes into play. In this step, we load the Vector Search Index definition from a JSON file, which specifies how the index should be structured. This includes the fields to be indexed, the dimensions of the vectors, and other parameters that determine how the search engine processes queries based on vector similarity. - -For more information on creating a vector search index, please follow the [instructions](https://docs.couchbase.com/cloud/vector-search/create-vector-search-index-ui.html). - -> Note: Index creation will not fail if used with the concerned bucket(vector-search-testing) instead of travel-sample - - - -```python -# If you are running this script locally (not in Google Colab), uncomment the following line -# and provide the path to your index definition file. 
- -# index_definition_path = '/path_to_your_index_file/claude_index.json' # Local setup: specify your file path here - -# # Version for Google Colab -# def load_index_definition_colab(): -# from google.colab import files -# print("Upload your index definition file") -# uploaded = files.upload() -# index_definition_path = list(uploaded.keys())[0] - -# try: -# with open(index_definition_path, 'r') as file: -# index_definition = json.load(file) -# return index_definition -# except Exception as e: -# raise ValueError(f"Error loading index definition from {index_definition_path}: {str(e)}") - -# Version for Local Environment -def load_index_definition_local(index_definition_path): - try: - with open(index_definition_path, 'r') as file: - index_definition = json.load(file) - return index_definition - except Exception as e: - raise ValueError(f"Error loading index definition from {index_definition_path}: {str(e)}") - -# Usage -# Uncomment the appropriate line based on your environment -# index_definition = load_index_definition_colab() -index_definition = load_index_definition_local('claude_index.json') -``` - -# Creating or Updating Search Indexes - -With the index definition loaded, the next step is to create or update the **Vector Search Index** in Couchbase. This step is crucial because it optimizes our database for vector similarity search operations, allowing us to perform searches based on the semantic content of documents rather than just keywords. By creating or updating a Vector Search Index, we enable our search engine to handle complex queries that involve finding semantically similar documents using vector embeddings, which is essential for a robust semantic search engine. 
- - -```python -try: - scope_index_manager = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME).search_indexes() - - # Check if index already exists - existing_indexes = scope_index_manager.get_all_indexes() - index_name = index_definition["name"] - - if index_name in [index.name for index in existing_indexes]: - logging.info(f"Index '{index_name}' found") - else: - logging.info(f"Creating new index '{index_name}'...") - - # Create SearchIndex object from JSON definition - search_index = SearchIndex.from_json(index_definition) - - # Upsert the index (create if not exists, update if exists) - scope_index_manager.upsert_index(search_index) - logging.info(f"Index '{index_name}' successfully created/updated.") - -except QueryIndexAlreadyExistsException: - logging.info(f"Index '{index_name}' already exists. Skipping creation/update.") -except ServiceUnavailableException: - raise RuntimeError("Search service is not available. Please ensure the Search service is enabled in your Couchbase cluster.") -except InternalServerFailureException as e: - logging.error(f"Internal server error: {str(e)}") - raise -``` - - 2025-02-25 21:48:52,980 - INFO - Creating new index 'vector_search_claude'... - 2025-02-25 21:48:53,069 - INFO - Index 'vector_search_claude' successfully created/updated. - - -# Creating OpenAI Embeddings -Embeddings are at the heart of semantic search. They are numerical representations of text that capture the semantic meaning of the words and phrases. Unlike traditional keyword-based search, which looks for exact matches, embeddings allow our search engine to understand the context and nuances of language, enabling it to retrieve documents that are semantically similar to the query, even if they don't contain the exact keywords. By creating embeddings using OpenAI, we equip our search engine with the ability to understand and process natural language in a way that's much closer to how humans understand language. 
This step transforms our raw text data into a format that the search engine can use to find and rank relevant documents. - - - - -```python -try: - embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY, model='text-embedding-3-small') - logging.info("Successfully created OpenAIEmbeddings") -except Exception as e: - raise ValueError(f"Error creating OpenAIEmbeddings: {str(e)}") -``` - - 2025-02-25 21:48:56,274 - INFO - Successfully created OpenAIEmbeddings - - -# Setting Up the Couchbase Vector Store -A vector store is where we'll keep our embeddings. Unlike the FTS index, which is used for text-based search, the vector store is specifically designed to handle embeddings and perform similarity searches. When a user inputs a query, the search engine converts the query into an embedding and compares it against the embeddings stored in the vector store. This allows the engine to find documents that are semantically similar to the query, even if they don't contain the exact same words. By setting up the vector store in Couchbase, we create a powerful tool that enables our search engine to understand and retrieve information based on the meaning and context of the query, rather than just the specific words used. - - -```python -try: - vector_store = CouchbaseSearchVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding=embeddings, - index_name=INDEX_NAME, - ) - logging.info("Successfully created vector store") -except Exception as e: - raise ValueError(f"Failed to create vector store: {str(e)}") - -``` - - 2025-02-25 21:48:59,450 - INFO - Successfully created vector store - - -# Load the BBC News Dataset -To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. 
Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively. - -The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version. - - -```python -try: - news_dataset = load_dataset( - "RealTimeData/bbc_news_alltime", "2024-12", split="train" - ) - print(f"Loaded the BBC News dataset with {len(news_dataset)} rows") - logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.") -except Exception as e: - raise ValueError(f"Error loading the BBC News dataset: {str(e)}") -``` - - 2025-02-25 21:49:09,255 - INFO - Successfully loaded the BBC News dataset with 2687 rows. - - - Loaded the BBC News dataset with 2687 rows - - -## Cleaning up the Data -We will use the content of the news articles for our RAG system. - -The dataset contains a few duplicate records. We are removing them to avoid duplicate results in the retrieval stage of our RAG system. - - -```python -news_articles = news_dataset["content"] -unique_articles = set() -for article in news_articles: - if article: - unique_articles.add(article) -unique_news_articles = list(unique_articles) -print(f"We have {len(unique_news_articles)} unique articles in our database.") -``` - - We have 1749 unique articles in our database. - - -## Saving Data to the Vector Store -To efficiently handle the large number of articles, we process them in batches of articles at a time. This batch processing approach helps manage memory usage and provides better control over the ingestion process. 
- -We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's add_texts method, we add the filtered articles to our vector database. The batch_size parameter controls how many articles are processed in each iteration. - -This approach offers several benefits: -1. Memory Efficiency: Processing in smaller batches prevents memory overload -2. Progress Tracking: Easier to monitor and track the ingestion progress -3. Resource Management: Better control over CPU and network resource utilization - -We use a conservative batch size of 100 to ensure reliable operation. -The optimal batch size depends on many factors including: -- Document sizes being inserted -- Available system resources -- Network conditions -- Concurrent workload - -Consider measuring performance with your specific workload before adjusting. - - - -```python -batch_size = 100 - -# Automatic Batch Processing -articles = [article for article in unique_news_articles if article and len(article) <= 50000] - -try: - vector_store.add_texts( - texts=articles, - batch_size=batch_size - ) - logging.info("Document ingestion completed successfully.") -except Exception as e: - raise ValueError(f"Failed to save documents to vector store: {str(e)}") - -``` - - 2025-02-25 21:50:15,064 - INFO - Document ingestion completed successfully. - - -# Setting Up a Couchbase Cache -To further optimize our system, we set up a Couchbase-based cache. A cache is a temporary storage layer that holds data that is frequently accessed, speeding up operations by reducing the need to repeatedly retrieve the same information from the database. In our setup, the cache will help us accelerate repetitive tasks, such as looking up similar documents. By implementing a cache, we enhance the overall performance of our search engine, ensuring that it can handle high query volumes and deliver results quickly. 
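Conceptually, the cache is an exact-match lookup keyed by the prompt: a hit skips retrieval and generation entirely. A minimal in-memory sketch of the idea (not the actual `CouchbaseCache` implementation, which persists entries in a Couchbase collection instead of a Python dict):

```python
# Minimal in-memory sketch of exact-match response caching.
calls = {"llm": 0}

def expensive_llm(prompt: str) -> str:
    # Stand-in for the real retrieval + generation pipeline.
    calls["llm"] += 1
    return f"answer to: {prompt}"

cache = {}

def cached_llm(prompt: str) -> str:
    if prompt not in cache:           # cache miss: run the full pipeline
        cache[prompt] = expensive_llm(prompt)
    return cache[prompt]              # cache hit: skip the pipeline entirely

first = cached_llm("Who won the match?")
second = cached_llm("Who won the match?")  # served from the cache
print(calls["llm"])  # 1 underlying model call for two identical queries
```

The timing gap shown later in this tutorial — several seconds for a first query versus well under a second for a repeat — comes from exactly this short-circuit.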
- -Caching is particularly valuable in scenarios where users may submit similar queries multiple times or where certain pieces of information are frequently requested. By storing these in a cache, we can significantly reduce the time it takes to respond to these queries, improving the user experience. - - - -```python -try: - cache = CouchbaseCache( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=CACHE_COLLECTION, - ) - logging.info("Successfully created cache") - set_llm_cache(cache) -except Exception as e: - raise ValueError(f"Failed to create cache: {str(e)}") -``` - - 2025-02-25 21:50:48,836 - INFO - Successfully created cache - - -# Using the Claude 4 Sonnet Language Model (LLM) -Language models are AI systems that are trained to understand and generate human language. We'll be using the `Claude 4 Sonnet` language model to process user queries and generate meaningful responses. This model is a key component of our semantic search engine, allowing it to go beyond simple keyword matching and truly understand the intent behind a query. By creating this language model, we equip our search engine with the ability to interpret complex queries, understand the nuances of language, and provide more accurate and contextually relevant responses. - -The language model's ability to understand context and generate coherent responses is what makes our search engine truly intelligent. It can not only find the right information but also present it in a way that is useful and understandable to the user. - - - - -```python -try: - llm = ChatAnthropic(temperature=0.1, anthropic_api_key=ANTHROPIC_API_KEY, model_name='claude-sonnet-4-20250514') - logging.info("Successfully created ChatAnthropic") -except Exception as e: - logging.error(f"Error creating ChatAnthropic: {str(e)}. 
Please check your API key and network connection.") - raise -``` - - 2025-02-25 21:50:52,173 - INFO - Successfully created ChatAnthropic - - -# Perform Semantic Search -Semantic search in Couchbase involves converting queries and documents into vector representations using an embeddings model. These vectors capture the semantic meaning of the text and are stored directly in Couchbase. When a query is made, Couchbase performs a similarity search by comparing the query vector against the stored document vectors. The similarity metric used for this comparison is configurable, allowing flexibility in how the relevance of documents is determined. - -In the provided code, the search process begins by recording the start time, followed by executing the similarity_search_with_score method of the CouchbaseSearchVectorStore. This method searches Couchbase for the most relevant documents based on the vector similarity to the query. The search results include the document content and a similarity score that reflects how closely each document aligns with the query in the defined semantic space. The time taken to perform this search is then calculated and logged, and the results are displayed, showing the most relevant documents along with their similarity scores. This approach leverages Couchbase as both a storage and retrieval engine for vector data, enabling efficient and scalable semantic searches. The integration of vector storage and search capabilities within Couchbase allows for sophisticated semantic search operations without relying on external services for vector storage or comparison. - - -```python -query = "What happened with the map shown during the 2026 FIFA World Cup draw regarding Ukraine and Crimea? What was the controversy?" 
- -try: - # Perform the semantic search - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=10) - search_elapsed_time = time.time() - start_time - - logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds") - - # Display search results - print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):") - print("-" * 80) # Add separator line - for doc, score in search_results: - print(f"Score: {score:.4f}, Text: {doc.page_content}") - print("-" * 80) # Add separator between results - -except CouchbaseException as e: - raise RuntimeError(f"Error performing semantic search: {str(e)}") -except Exception as e: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - 2025-02-25 21:53:55,462 - INFO - Semantic search completed in 0.55 seconds - - - - Semantic Search Results (completed in 0.55 seconds): - -------------------------------------------------------------------------------- - Score: 0.7498, Text: A map shown during the draw for the 2026 Fifa World Cup has been criticised by Ukraine as an "unacceptable error" after it appeared to exclude Crimea as part of the country. The graphic - showing countries that cannot be drawn to play each other for geopolitical reasons - highlighted Ukraine but did not include the peninsula that is internationally recognised to be part of it. Crimea has been under Russian occupation since 2014 and just a handful of countries recognise the peninsula as Russian territory. Ukraine Foreign Ministry spokesman Heorhiy Tykhy said that the nation expects "a public apology". Fifa said it was "aware of an issue" and the image had been removed. - - Writing on X, Tykhy said that Fifa had not only "acted against international law" but had also "supported Russian propaganda, war crimes, and the crime of aggression against Ukraine". He added a "fixed" version of the map to his post, highlighting Crimea as part of Ukraine's territory. 
Among the countries that cannot play each other are Ukraine and Belarus, Spain and Gibraltar and Kosovo versus either Bosnia and Herzegovina or Serbia. - - This Twitter post cannot be displayed in your browser. Please enable Javascript or try a different browser. View original content on Twitter The BBC is not responsible for the content of external sites. Skip twitter post by Heorhii Tykhyi This article contains content provided by Twitter. We ask for your permission before anything is loaded, as they may be using cookies and other technologies. You may want to read Twitter’s cookie policy, external and privacy policy, external before accepting. To view this content choose ‘accept and continue’. The BBC is not responsible for the content of external sites. - - The Ukrainian Football Association has also sent a letter to Fifa secretary-general Mathias Grafström and UEFA secretary-general Theodore Theodoridis over the matter. "We appeal to you to express our deep concern about the infographic map [shown] on December 13, 2024," the letter reads. "Taking into account a number of official decisions and resolutions adopted by the Fifa Council and the UEFA executive committee since 2014... we emphasize that today's version of the cartographic image of Ukraine... is completely unacceptable and looks like an inconsistent position of Fifa and UEFA." The 2026 World Cup will start on 11 June that year in Mexico City and end on 19 July in New Jersey. The expanded 48-team tournament will last a record 39 days. Ukraine were placed in Group D alongside Iceland, Azerbaijan and the yet-to-be-determined winners of France's Nations League quarter-final against Croatia. - -------------------------------------------------------------------------------- - Score: 0.4302, Text: Defending champions Manchester City will face Juventus in the group stage of the Fifa Club World Cup next summer, while Chelsea meet Brazilian side Flamengo. 
Pep Guardiola's City, who beat Brazilian side Fluminense to win the tournament for the first time in 2023, begin their title defence against Morocco's Wydad and also play Al Ain of the United Arab Emirates in Group G. Chelsea, winners of the 2021 final, were also drawn alongside Mexico's Club Leon and Tunisian side Esperance Sportive de Tunisie in Group D. The revamped Fifa Club World Cup, which has been expanded to 32 teams, will take place in the United States between 15 June and 13 July next year. - - A complex and lengthy draw ceremony was held across two separate Miami locations and lasted more than 90 minutes, during which a new Club World Cup trophy was revealed. There was also a video message from incoming US president Donald Trump, whose daughter Ivanka drew the first team. Lionel Messi's Inter Miami will take on Egyptian side Al Ahly at the Hard Rock Stadium in the opening match, staged in Miami. Elsewhere, Paris St-Germain were drawn against Atletico Madrid in Group B, while Bayern Munich meet Benfica in another all-European group-stage match-up. Teams will play each other once in the group phase and the top two will progress to the knockout stage. - - This video can not be played To play this video you need to enable JavaScript in your browser. What is the Club World Cup? - - Teams from each of the six international football confederations will be represented at next summer's tournament, including 12 European clubs - the highest quota of any confederation. The European places were decided by clubs' Champions League performances over the past four seasons, with recent winners Chelsea, Manchester City and Real Madrid guaranteed places. Al Ain, the most successful club in the UAE with 14 league titles, are owned by the country's president Sheikh Mohamed bin Zayed Al Nahyan - the older brother of City owner Sheikh Mansour. 
Real, who lifted the Fifa Club World Cup trophy for a record-extending fifth time in 2022, will open up against Saudi Pro League champions Al-Hilal, who currently have Neymar in their ranks. One place was reserved for a club from the host nation, which Fifa controversially awarded to Inter Miami, who will contest the tournament curtain-raiser. Messi's side were winners of the regular-season MLS Supporters' Shield but beaten in the MLS play-offs, meaning they are not this season's champions. - • None How does the new Club World Cup work & why is it so controversial? - - Matches will be played across 12 venues in the US which, alongside Canada and Mexico, also host the 2026 World Cup. Fifa is facing legal action from player unions and leagues about the scheduling of the event, which begins two weeks after the Champions League final at the end of the 2024-25 European calendar and ends five weeks before the first Premier League match of the 2025-2026 season. But football's world governing body believes the dates allow sufficient rest time before the start of the domestic campaigns. The Club World Cup will now take place once every four years, when it was previously held annually and involved just seven teams. Streaming platform DAZN has secured exclusive rights to broadcast next summer's tournament, during which 63 matches will take place over 29 days. - -------------------------------------------------------------------------------- - Score: 0.4207, Text: After Fifa awards Saudi Arabia the hosting rights for the men's 2034 World Cup, BBC analysis editor Ros Atkins looks at how we got here and the controversies surrounding the decision. - -------------------------------------------------------------------------------- - Score: 0.4123, Text: FA still to decide on endorsing Saudi World Cup bid - ... 
(output truncated for brevity) - - -# Retrieval-Augmented Generation (RAG) with Couchbase and LangChain -Couchbase and LangChain can be seamlessly integrated to create RAG (Retrieval-Augmented Generation) chains, enhancing the process of generating contextually relevant responses. In this setup, Couchbase serves as the vector store, where embeddings of documents are stored. When a query is made, LangChain retrieves the most relevant documents from Couchbase by comparing the query’s embedding with the stored document embeddings. These documents, which provide contextual information, are then passed to a generative language model within LangChain. - -The language model, equipped with the context from the retrieved documents, generates a response that is both informed and contextually accurate. This integration allows the RAG chain to leverage Couchbase’s efficient storage and retrieval capabilities, while LangChain handles the generation of responses based on the context provided by the retrieved documents. Together, they create a powerful system that can deliver highly relevant and accurate answers by combining the strengths of both retrieval and generation. - - -```python -system_template = "You are a helpful assistant that answers questions based on the provided context." 
-system_message_prompt = SystemMessagePromptTemplate.from_template(system_template) - -human_template = "Context: {context}\n\nQuestion: {question}" -human_message_prompt = HumanMessagePromptTemplate.from_template(human_template) - -chat_prompt = ChatPromptTemplate.from_messages([ - system_message_prompt, - human_message_prompt -]) - -def format_docs(docs): - return "\n\n".join(doc.page_content for doc in docs) - -rag_chain = ( - {"context": lambda x: format_docs(vector_store.similarity_search(x)), "question": RunnablePassthrough()} - | chat_prompt - | llm -) -logging.info("Successfully created RAG chain") -``` - - 2025-02-25 21:54:00,781 - INFO - Successfully created RAG chain - - - -```python -try: - start_time = time.time() - rag_response = rag_chain.invoke(query) - rag_elapsed_time = time.time() - start_time - - print(f"RAG Response: {rag_response.content}") - print(f"RAG response generated in {rag_elapsed_time:.2f} seconds") -except AuthenticationError as e: - print(f"Authentication error: {str(e)}") -except InternalServerFailureException as e: - if "query request rejected" in str(e): - print("Error: Search request was rejected due to rate limiting. Please try again later.") - else: - print(f"Internal server error occurred: {str(e)}") -except Exception as e: - print(f"Unexpected error occurred: {str(e)}") -``` - - RAG Response: During the draw for the 2026 FIFA World Cup, a map was shown that excluded Crimea as part of Ukraine. This graphic, which was displaying countries that cannot be drawn to play each other for geopolitical reasons, highlighted Ukraine but did not include the Crimean peninsula, which is internationally recognized as Ukrainian territory. - - This omission sparked significant controversy because Crimea has been under Russian occupation since 2014, but only a handful of countries recognize it as Russian territory. 
The Ukrainian Foreign Ministry spokesman, Heorhiy Tykhy, called this an "unacceptable error" and stated that Ukraine expected "a public apology" from FIFA. He criticized FIFA for acting "against international law" and supporting "Russian propaganda, war crimes, and the crime of aggression against Ukraine." - - The Ukrainian Football Association also sent a formal letter of complaint to FIFA and UEFA officials expressing their "deep concern" about the cartographic representation. FIFA acknowledged they were "aware of an issue" and subsequently removed the image. - RAG response generated in 6.58 seconds - - -# Using Couchbase as a caching mechanism -Couchbase can be effectively used as a caching mechanism for RAG (Retrieval-Augmented Generation) responses by storing and retrieving precomputed results for specific queries. This approach enhances the system's efficiency and speed, particularly when dealing with repeated or similar queries. When a query is first processed, the RAG chain retrieves relevant documents, generates a response using the language model, and then stores this response in Couchbase, with the query serving as the key. - -For subsequent requests with the same query, the system checks Couchbase first. If a cached response is found, it is retrieved directly from Couchbase, bypassing the need to re-run the entire RAG process. This significantly reduces response time because the computationally expensive steps of document retrieval and response generation are skipped. Couchbase's role in this setup is to provide a fast and scalable storage solution for caching these responses, ensuring that frequently asked queries can be answered more quickly and efficiently. - - - -```python -try: - queries = [ - "What happened when Apple's AI feature generated a false BBC headline about a murder case in New York?", - "What happened with the map shown during the 2026 FIFA World Cup draw regarding Ukraine and Crimea? 
What was the controversy?", # Repeated query - "What happened when Apple's AI feature generated a false BBC headline about a murder case in New York?", # Repeated query - ] - - for i, query in enumerate(queries, 1): - print(f"\nQuery {i}: {query}") - start_time = time.time() - - response = rag_chain.invoke(query) - elapsed_time = time.time() - start_time - print(f"Response: {response.content}") - print(f"Time taken: {elapsed_time:.2f} seconds") -except AuthenticationError as e: - print(f"Authentication error: {str(e)}") -except InternalServerFailureException as e: - if "query request rejected" in str(e): - print("Error: Search request was rejected due to rate limiting. Please try again later.") - else: - print(f"Internal server error occurred: {str(e)}") -except Exception as e: - print(f"Unexpected error occurred: {str(e)}") -``` - - - Query 1: What happened when Apple's AI feature generated a false BBC headline about a murder case in New York? - Response: According to the context, Apple Intelligence (an AI feature that summarizes notifications) generated a false headline that made it appear as if BBC News had published an article claiming Luigi Mangione, who was arrested for the murder of healthcare insurance CEO Brian Thompson in New York, had shot himself. This was completely false - Mangione had not shot himself. - - The BBC complained to Apple about this misrepresentation, with a BBC spokesperson stating they had "contacted Apple to raise this concern and fix the problem." The BBC emphasized that as "the most trusted news media in the world," it's essential that audiences can trust information published in their name, including notifications. - - This wasn't an isolated incident - the context mentions that Apple's AI feature also misrepresented a New York Times article, incorrectly summarizing it as "Netanyahu arrested" when the actual article was about the International Criminal Court issuing an arrest warrant for the Israeli prime minister. 
- Time taken: 6.66 seconds - - Query 2: What happened with the map shown during the 2026 FIFA World Cup draw regarding Ukraine and Crimea? What was the controversy? - Response: During the draw for the 2026 FIFA World Cup, a map was shown that excluded Crimea as part of Ukraine. This graphic, which was displaying countries that cannot be drawn to play each other for geopolitical reasons, highlighted Ukraine but did not include the Crimean peninsula, which is internationally recognized as Ukrainian territory. - - This omission sparked significant controversy because Crimea has been under Russian occupation since 2014, but only a handful of countries recognize it as Russian territory. The Ukrainian Foreign Ministry spokesman, Heorhiy Tykhy, called this an "unacceptable error" and stated that Ukraine expected "a public apology" from FIFA. He criticized FIFA for acting "against international law" and supporting "Russian propaganda, war crimes, and the crime of aggression against Ukraine." - - The Ukrainian Football Association also sent a formal letter of complaint to FIFA and UEFA officials expressing their "deep concern" about the cartographic representation. FIFA acknowledged they were "aware of an issue" and subsequently removed the image. - Time taken: 0.62 seconds - - Query 3: What happened when Apple's AI feature generated a false BBC headline about a murder case in New York? - Response: According to the context, Apple Intelligence (an AI feature that summarizes notifications) generated a false headline that made it appear as if BBC News had published an article claiming Luigi Mangione, who was arrested for the murder of healthcare insurance CEO Brian Thompson in New York, had shot himself. This was completely false - Mangione had not shot himself. - - The BBC complained to Apple about this misrepresentation, with a BBC spokesperson stating they had "contacted Apple to raise this concern and fix the problem." 
The BBC emphasized that as "the most trusted news media in the world," it's essential that audiences can trust information published in their name, including notifications. - - This wasn't an isolated incident - the context mentions that Apple's AI feature also misrepresented a New York Times article, incorrectly summarizing it as "Netanyahu arrested" when the actual article was about the International Criminal Court issuing an arrest warrant for the Israeli prime minister. - Time taken: 0.51 seconds - - -## Conclusion -By following these steps, you’ll have a fully functional semantic search engine that leverages the strengths of Couchbase and Claude(by Anthropic). This guide is designed not just to show you how to build the system, but also to explain why each step is necessary, giving you a deeper understanding of the principles behind semantic search and how to implement it effectively. Whether you’re a newcomer to software development or an experienced developer looking to expand your skills, this guide will provide you with the knowledge and tools you need to create a powerful, AI-driven search engine. diff --git a/tutorial/markdown/generated/vector-search-cookbook/claudeai-gsi-RAG_with_Couchbase_and_Claude(by_Anthropic).md b/tutorial/markdown/generated/vector-search-cookbook/claudeai-gsi-RAG_with_Couchbase_and_Claude(by_Anthropic).md deleted file mode 100644 index 5c7030a..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/claudeai-gsi-RAG_with_Couchbase_and_Claude(by_Anthropic).md +++ /dev/null @@ -1,757 +0,0 @@ ---- -# frontmatter -path: "/tutorial-openai-claude-couchbase-rag-with-global-secondary-index" -title: Retrieval-Augmented Generation (RAG) with Couchbase, OpenAI, and Claude using GSI index -short_title: RAG with Couchbase, OpenAI, and Claude using GSI index -description: - - Learn how to build a semantic search engine using Couchbase, OpenAI embeddings, and Anthropic's Claude using GSI. 
-  - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with OpenAI embeddings and use Claude as the language model.
-  - You'll understand how to perform Retrieval-Augmented Generation (RAG) using LangChain and Couchbase.
-content_type: tutorial
-filter: sdk
-technology:
-  - vector search
-tags:
-  - GSI
-  - Artificial Intelligence
-  - LangChain
-  - OpenAI
-sdk_language:
-  - python
-length: 60 Mins
----
-
-
-
-
-[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/claudeai/gsi/RAG_with_Couchbase_and_Claude(by_Anthropic).ipynb)
-
-# Introduction
-In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database, [OpenAI](https://openai.com/) as the AI-powered embedding provider, and [Anthropic](https://claude.ai/) as the language model provider. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system using GSI (Global Secondary Index) from scratch. Alternatively, if you want to perform semantic search using the FTS index, please take a look at [this tutorial](https://developer.couchbase.com/tutorial-openai-claude-couchbase-rag-with-fts/).
-
-# How to run this tutorial
-
-This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/claudeai/gsi/RAG_with_Couchbase_and_Claude(by_Anthropic).ipynb).
-
-You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.
-
-# Before you start
-
-## Get Credentials for OpenAI and Anthropic
-
-* Please follow the [instructions](https://platform.openai.com/docs/quickstart) to generate the OpenAI credentials.
-* Please follow the [instructions](https://docs.anthropic.com/en/api/getting-started) to generate the Anthropic credentials.
-
-## Create and Deploy Your Free Tier Operational cluster on Capella
-
-To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.
-
-To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).
-
-Note: To run this tutorial, you will need Capella with Couchbase Server version 8.0 or above, as GSI vector search is supported only from version 8.0.
-
-### Couchbase Capella Configuration
-
-When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.
-
-* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the required bucket (Read and Write) used in the application.
-
-* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.
-
-# Setting the Stage: Installing Necessary Libraries
-To build our semantic search engine, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks. Each library has a specific role: the Couchbase libraries manage database operations, LangChain handles AI model integrations, OpenAI provides the models for generating embeddings, and Claude (by Anthropic) serves as the language model for understanding natural language.
By setting up these libraries, we ensure our environment is equipped to handle the data-intensive and computationally complex tasks required for semantic search. - - -```python -%pip install --quiet datasets==3.5.0 langchain-couchbase==0.5.0 langchain-anthropic==0.3.19 langchain-openai==0.3.32 python-dotenv==1.1.1 -``` - - Note: you may need to restart the kernel to use updated packages. - - -# Importing Necessary Libraries -The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. These libraries provide essential functions for working with data, managing database connections, and processing machine learning models. - - -```python -import getpass -import json -import logging -import os -import time -from datetime import timedelta -from multiprocessing import AuthenticationError - -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.exceptions import (CouchbaseException, - InternalServerFailureException, - QueryIndexAlreadyExistsException, - ServiceUnavailableException) -from couchbase.management.buckets import CreateBucketSettings -from couchbase.management.search import SearchIndex -from couchbase.options import ClusterOptions -from datasets import load_dataset -from dotenv import load_dotenv -from langchain_anthropic import ChatAnthropic -from langchain_core.globals import set_llm_cache -from langchain_core.prompts.chat import (ChatPromptTemplate, - HumanMessagePromptTemplate, - SystemMessagePromptTemplate) -from langchain_core.runnables import RunnablePassthrough -from langchain_couchbase.cache import CouchbaseCache -from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore -from langchain_couchbase.vectorstores import DistanceStrategy -from langchain_openai import OpenAIEmbeddings -from langchain_couchbase.vectorstores import IndexType -``` - -# Setup Logging 

Logging is configured to track the progress of the script and capture any errors or warnings. This is crucial for debugging and understanding the flow of execution. The logging output includes timestamps, log levels (e.g., INFO, ERROR), and messages that describe what is happening in the script.


```python
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)

# Raise httpx's log level to CRITICAL so per-request logs from the OpenAI client are suppressed
logging.getLogger("httpx").setLevel(logging.CRITICAL)
```

# Loading Sensitive Information
In this section, we prompt the user to provide the configuration settings the script needs. These settings include sensitive information like API keys, database credentials, and specific configuration names. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security.

The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code.
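With python-dotenv in place, these values can also be supplied through a `.env` file in the working directory instead of interactive prompts. A minimal sketch of such a file, using the variable names the script reads (all values are placeholders to replace with your own):

```shell
ANTHROPIC_API_KEY=sk-ant-your-key-here
OPENAI_API_KEY=sk-your-key-here
CB_HOST=couchbases://your-cluster.cloud.couchbase.com
CB_USERNAME=Administrator
CB_PASSWORD=your-database-password
CB_BUCKET_NAME=query-vector-search-testing
SCOPE_NAME=shared
COLLECTION_NAME=claude
CACHE_COLLECTION=cache
```

Any value omitted from the file is resolved by the fallbacks in the next cell.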


```python
load_dotenv()

# Load from environment variables, prompt for input if unset, and fall back to defaults
ANTHROPIC_API_KEY = os.getenv('ANTHROPIC_API_KEY') or getpass.getpass('Enter your Anthropic API key: ')
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or getpass.getpass('Enter your OpenAI API key: ')
CB_HOST = os.getenv('CB_HOST') or input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost'
CB_USERNAME = os.getenv('CB_USERNAME') or input('Enter your Couchbase username (default: Administrator): ') or 'Administrator'
CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass('Enter your Couchbase password (default: password): ') or 'password'
CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input('Enter your Couchbase bucket name (default: query-vector-search-testing): ') or 'query-vector-search-testing'
SCOPE_NAME = os.getenv('SCOPE_NAME') or input('Enter your scope name (default: shared): ') or 'shared'
COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input('Enter your collection name (default: claude): ') or 'claude'
CACHE_COLLECTION = os.getenv('CACHE_COLLECTION') or input('Enter your cache collection name (default: cache): ') or 'cache'

# Check that the required API keys are set
if not ANTHROPIC_API_KEY:
    raise ValueError("ANTHROPIC_API_KEY is not set in the environment.")
if not OPENAI_API_KEY:
    raise ValueError("OPENAI_API_KEY is not set in the environment.")
```

# Connecting to the Couchbase Cluster
Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine.
By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount. - - - - -```python -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to Couchbase") -except Exception as e: - raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}") -``` - - 2025-09-09 12:15:22,899 - INFO - Successfully connected to Couchbase - - -## Setting Up Collections in Couchbase - -The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase: - -1. Bucket Creation: - - Checks if specified bucket exists, creates it if not - - Sets bucket properties like RAM quota (1024MB) and replication (disabled) - - Note: You will not be able to create a bucket on Capella - - -2. Scope Management: - - Verifies if requested scope exists within bucket - - Creates new scope if needed (unless it's the default "_default" scope) - -3. Collection Setup: - - Checks for collection existence within scope - - Creates collection if it doesn't exist - - Waits 2 seconds for collection to be ready - -Additional Tasks: -- Clears any existing documents for clean state -- Implements comprehensive error handling and logging - -The function is called twice to set up: -1. Main collection for vector embeddings -2. 
Cache collection for storing results - - - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' exists.") - except Exception as e: - logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...") - bucket_settings = CreateBucketSettings( - name=bucket_name, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - time.sleep(2) # Wait for bucket creation to complete and become available - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' created successfully.") - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == scope_name for scope in scopes) - - if not scope_exists and scope_name != "_default": - logging.info(f"Scope '{scope_name}' does not exist. Creating it...") - bucket_manager.create_scope(scope_name) - logging.info(f"Scope '{scope_name}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{collection_name}' does not exist. Creating it...") - bucket_manager.create_collection(scope_name, collection_name) - logging.info(f"Collection '{collection_name}' created successfully.") - else: - logging.info(f"Collection '{collection_name}' already exists. 
Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION) - -``` - - 2025-09-09 12:15:26,795 - INFO - Bucket 'query-vector-search-testing' exists. - - - 2025-09-09 12:15:26,808 - INFO - Collection 'claude' does not exist. Creating it... - 2025-09-09 12:15:26,854 - INFO - Collection 'claude' created successfully. - 2025-09-09 12:15:29,065 - INFO - All documents cleared from the collection. - 2025-09-09 12:15:29,066 - INFO - Bucket 'query-vector-search-testing' exists. - 2025-09-09 12:15:29,074 - INFO - Collection 'cache' already exists. Skipping creation. - 2025-09-09 12:15:31,115 - INFO - All documents cleared from the collection. - - - - - - - - - -# Creating OpenAI Embeddings -Embeddings are at the heart of semantic search. They are numerical representations of text that capture the semantic meaning of the words and phrases. Unlike traditional keyword-based search, which looks for exact matches, embeddings allow our search engine to understand the context and nuances of language, enabling it to retrieve documents that are semantically similar to the query, even if they don't contain the exact keywords. 
By creating embeddings using OpenAI, we equip our search engine with the ability to understand and process natural language in a way that's much closer to how humans understand language. This step transforms our raw text data into a format that the search engine can use to find and rank relevant documents.


```python
try:
    embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY, model='text-embedding-3-small')
    logging.info("Successfully created OpenAIEmbeddings")
except Exception as e:
    raise ValueError(f"Error creating OpenAIEmbeddings: {str(e)}")
```

    2025-09-09 12:15:54,388 - INFO - Successfully created OpenAIEmbeddings


# Setting Up the Couchbase Query Vector Store
A vector store is where we'll keep our embeddings. The query vector store is specifically designed to handle embeddings and perform similarity searches. When a user submits a query, the embedding model converts it into a vector, and Couchbase compares that vector against the embeddings stored in the vector store. This allows the engine to find documents that are semantically similar to the query, even if they don't contain the exact same words. By setting up the vector store in Couchbase, we create a powerful tool that enables us to understand and retrieve information based on the meaning and context of the query, rather than just the specific words used.

The vector store requires a distance metric that determines how similarity between vectors is calculated. This choice is crucial for accurate semantic search results, as different distance metrics can yield different similarity rankings. The supported distance strategies include `dot`, `l2`, `euclidean`, `cosine`, `l2_squared`, and `euclidean_squared`. In our implementation we will use cosine, which is particularly effective for text embeddings.
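To build intuition for the cosine metric, the toy example below computes cosine distance (1 minus cosine similarity; lower means more similar, which is also how the scores returned later by `similarity_search_with_score` read) on small hand-made vectors. It is a standalone illustration, not part of the pipeline:

```python
import math

def cosine_distance(a, b):
    # Cosine distance = 1 - cosine similarity; depends only on direction, not magnitude.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings": u and v point the same way, w is orthogonal to u.
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]   # same direction as u, different magnitude
w = [-2.0, 1.0, 0.0]  # orthogonal to u

print(round(cosine_distance(u, v), 4))  # 0.0 -> identical direction, maximally similar
print(round(cosine_distance(u, w), 4))  # 1.0 -> orthogonal, unrelated
```

Because cosine ignores vector magnitude, a long document and a short query about the same topic can still score as close, which is why it is a common default for text embeddings.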


```python
try:
    vector_store = CouchbaseQueryVectorStore(
        cluster=cluster,
        bucket_name=CB_BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=COLLECTION_NAME,
        embedding=embeddings,
        distance_metric=DistanceStrategy.COSINE
    )
    logging.info("Successfully created vector store")
except Exception as e:
    raise ValueError(f"Failed to create vector store: {str(e)}")
```

    2025-09-09 12:16:02,578 - INFO - Successfully created vector store


# Load the BBC News Dataset
To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively.

The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version.


```python
try:
    news_dataset = load_dataset(
        "RealTimeData/bbc_news_alltime", "2024-12", split="train"
    )
    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
    logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.")
except Exception as e:
    raise ValueError(f"Error loading the BBC News dataset: {str(e)}")
```

    2025-09-09 12:16:16,461 - INFO - Successfully loaded the BBC News dataset with 2687 rows.


    Loaded the BBC News dataset with 2687 rows


## Cleaning up the Data
We will use the content of the news articles for our RAG system.

The dataset contains a few duplicate records. We are removing them to avoid duplicate results in the retrieval stage of our RAG system.


```python
news_articles = news_dataset["content"]
unique_articles = set()
for article in news_articles:
    if article:
        unique_articles.add(article)
unique_news_articles = list(unique_articles)
print(f"We have {len(unique_news_articles)} unique articles in our database.")
```

    We have 1749 unique articles in our database.


## Saving Data to the Vector Store
To efficiently handle the large number of articles, we process them in batches of 100 articles at a time. This batch processing approach helps manage memory usage and provides better control over the ingestion process.

We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's add_texts method, we add the filtered articles to our vector database. The batch_size parameter controls how many articles are processed in each iteration.

This approach offers several benefits:
1. Memory Efficiency: Processing in smaller batches prevents memory overload
2. Progress Tracking: Easier to monitor and track the ingestion progress
3. Resource Management: Better control over CPU and network resource utilization

We use a conservative batch size of 100 to ensure reliable operation.
The optimal batch size depends on many factors including:
- Document sizes being inserted
- Available system resources
- Network conditions
- Concurrent workload

Consider measuring performance with your specific workload before adjusting.
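The splitting behaviour that the `batch_size` parameter provides can be pictured with a few lines of standalone Python. This is an illustration of the batching idea only, not the library's actual implementation:

```python
def chunked(items, batch_size):
    """Yield successive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Seven toy "articles" split into batches of three: two full batches plus a remainder.
docs = [f"article-{i}" for i in range(7)]
batches = list(chunked(docs, 3))
print([len(b) for b in batches])  # -> [3, 3, 1]
```

Each batch becomes one round trip of embedding calls and inserts, which is why smaller batches bound memory use while larger ones reduce per-batch overhead.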


```python
batch_size = 100

# Automatic Batch Processing
articles = [article for article in unique_news_articles if article and len(article) <= 50000]

try:
    vector_store.add_texts(
        texts=articles,
        batch_size=batch_size
    )
    logging.info("Document ingestion completed successfully.")
except Exception as e:
    raise ValueError(f"Failed to save documents to vector store: {str(e)}")
```

    2025-09-09 12:18:40,320 - INFO - Document ingestion completed successfully.


# Setting Up a Couchbase Cache
To further optimize our system, we set up a Couchbase-based cache. A cache is a temporary storage layer that holds data that is frequently accessed, speeding up operations by reducing the need to repeatedly retrieve the same information from the database. In our setup, the cache will help us accelerate repetitive tasks, such as looking up similar documents. By implementing a cache, we enhance the overall performance of our search engine, ensuring that it can handle high query volumes and deliver results quickly.

Caching is particularly valuable in scenarios where users may submit similar queries multiple times or where certain pieces of information are frequently requested. By storing these in a cache, we can significantly reduce the time it takes to respond to these queries, improving the user experience.


```python
try:
    cache = CouchbaseCache(
        cluster=cluster,
        bucket_name=CB_BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=CACHE_COLLECTION,
    )
    logging.info("Successfully created cache")
    set_llm_cache(cache)
except Exception as e:
    raise ValueError(f"Failed to create cache: {str(e)}")
```

    2025-09-09 12:18:47,269 - INFO - Successfully created cache


# Using the Claude 4 Sonnet Language Model (LLM)
Language models are AI systems that are trained to understand and generate human language. We'll be using the `Claude 4 Sonnet` language model to process user queries and generate meaningful responses.
This model is a key component of our semantic search engine, allowing it to go beyond simple keyword matching and truly understand the intent behind a query. By creating this language model, we equip our search engine with the ability to interpret complex queries, understand the nuances of language, and provide more accurate and contextually relevant responses. - -The language model's ability to understand context and generate coherent responses is what makes our search engine truly intelligent. It can not only find the right information but also present it in a way that is useful and understandable to the user. - - - - -```python -try: - llm = ChatAnthropic(temperature=0.1, anthropic_api_key=ANTHROPIC_API_KEY, model_name='claude-sonnet-4-20250514') - logging.info("Successfully created ChatAnthropic") -except Exception as e: - logging.error(f"Error creating ChatAnthropic: {str(e)}. Please check your API key and network connection.") - raise -``` - - 2025-09-09 12:20:36,212 - INFO - Successfully created ChatAnthropic - - -# Perform Semantic Search -Semantic search in Couchbase involves converting queries and documents into vector representations using an embeddings model. These vectors capture the semantic meaning of the text and are stored directly in Couchbase. When a query is made, Couchbase performs a similarity search by comparing the query vector against the stored document vectors. The similarity metric used for this comparison is configurable, allowing flexibility in how the relevance of documents is determined. Common metrics include cosine similarity, Euclidean distance, or dot product, but other metrics can be implemented based on specific use cases. Different embedding models like BERT, Word2Vec, or GloVe can also be used depending on the application's needs, with the vectors generated by these models stored and searched within Couchbase itself. 

In the provided code, the search process begins by recording the start time, followed by executing the `similarity_search_with_score` method of the `CouchbaseQueryVectorStore`. This method searches Couchbase for the most relevant documents based on the vector similarity to the query. The search results include the document content and the distance that reflects how closely each document aligns with the query in the defined semantic space. The time taken to perform this search is then calculated and logged, and the results are displayed, showing the most relevant documents along with their similarity scores. This approach leverages Couchbase as both a storage and retrieval engine for vector data, enabling efficient and scalable semantic searches. The integration of vector storage and search capabilities within Couchbase allows for sophisticated semantic search operations without relying on external services for vector storage or comparison.


```python
query = "What happened with the map shown during the 2026 FIFA World Cup draw regarding Ukraine and Crimea? What was the controversy?"
- -try: - # Perform the semantic search - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=10) - search_elapsed_time = time.time() - start_time - - logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds") - - # Display search results - print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):") - print("-" * 80) # Add separator line - for doc, score in search_results: - print(f"Score: {score:.4f}, Text: {doc.page_content}") - print("-" * 80) # Add separator between results - -except CouchbaseException as e: - raise RuntimeError(f"Error performing semantic search: {str(e)}") -except Exception as e: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - 2025-09-09 12:21:34,292 - INFO - Semantic search completed in 1.91 seconds - - - - Semantic Search Results (completed in 1.91 seconds): - -------------------------------------------------------------------------------- - Score: 0.2502, Text: A map shown during the draw for the 2026 Fifa World Cup has been criticised by Ukraine as an "unacceptable error" after it appeared to exclude Crimea as part of the country. The graphic - showing countries that cannot be drawn to play each other for geopolitical reasons - highlighted Ukraine but did not include the peninsula that is internationally recognised to be part of it. Crimea has been under Russian occupation since 2014 and just a handful of countries recognise the peninsula as Russian territory. Ukraine Foreign Ministry spokesman Heorhiy Tykhy said that the nation expects "a public apology". Fifa said it was "aware of an issue" and the image had been removed. - - Writing on X, Tykhy said that Fifa had not only "acted against international law" but had also "supported Russian propaganda, war crimes, and the crime of aggression against Ukraine". He added a "fixed" version of the map to his post, highlighting Crimea as part of Ukraine's territory. 
Among the countries that cannot play each other are Ukraine and Belarus, Spain and Gibraltar and Kosovo versus either Bosnia and Herzegovina or Serbia. - - This Twitter post cannot be displayed in your browser. Please enable Javascript or try a different browser. View original content on Twitter The BBC is not responsible for the content of external sites. Skip twitter post by Heorhii Tykhyi This article contains content provided by Twitter. We ask for your permission before anything is loaded, as they may be using cookies and other technologies. You may want to read Twitter’s cookie policy, external and privacy policy, external before accepting. To view this content choose ‘accept and continue’. The BBC is not responsible for the content of external sites. - - The Ukrainian Football Association has also sent a letter to Fifa secretary-general Mathias Grafström and UEFA secretary-general Theodore Theodoridis over the matter. "We appeal to you to express our deep concern about the infographic map [shown] on December 13, 2024," the letter reads. "Taking into account a number of official decisions and resolutions adopted by the Fifa Council and the UEFA executive committee since 2014... we emphasize that today's version of the cartographic image of Ukraine... is completely unacceptable and looks like an inconsistent position of Fifa and UEFA." The 2026 World Cup will start on 11 June that year in Mexico City and end on 19 July in New Jersey. The expanded 48-team tournament will last a record 39 days. Ukraine were placed in Group D alongside Iceland, Azerbaijan and the yet-to-be-determined winners of France's Nations League quarter-final against Croatia. - -------------------------------------------------------------------------------- - Score: 0.5698, Text: Defending champions Manchester City will face Juventus in the group stage of the Fifa Club World Cup next summer, while Chelsea meet Brazilian side Flamengo. 
Pep Guardiola's City, who beat Brazilian side Fluminense to win the tournament for the first time in 2023, begin their title defence against Morocco's Wydad and also play Al Ain of the United Arab Emirates in Group G. Chelsea, winners of the 2021 final, were also drawn alongside Mexico's Club Leon and Tunisian side Esperance Sportive de Tunisie in Group D. The revamped Fifa Club World Cup, which has been expanded to 32 teams, will take place in the United States between 15 June and 13 July next year. - - A complex and lengthy draw ceremony was held across two separate Miami locations and lasted more than 90 minutes, during which a new Club World Cup trophy was revealed. There was also a video message from incoming US president Donald Trump, whose daughter Ivanka drew the first team. Lionel Messi's Inter Miami will take on Egyptian side Al Ahly at the Hard Rock Stadium in the opening match, staged in Miami. Elsewhere, Paris St-Germain were drawn against Atletico Madrid in Group B, while Bayern Munich meet Benfica in another all-European group-stage match-up. Teams will play each other once in the group phase and the top two will progress to the knockout stage. - - This video can not be played To play this video you need to enable JavaScript in your browser. What is the Club World Cup? - - Teams from each of the six international football confederations will be represented at next summer's tournament, including 12 European clubs - the highest quota of any confederation. The European places were decided by clubs' Champions League performances over the past four seasons, with recent winners Chelsea, Manchester City and Real Madrid guaranteed places. Al Ain, the most successful club in the UAE with 14 league titles, are owned by the country's president Sheikh Mohamed bin Zayed Al Nahyan - the older brother of City owner Sheikh Mansour. 
Real, who lifted the Fifa Club World Cup trophy for a record-extending fifth time in 2022, will open up against Saudi Pro League champions Al-Hilal, who currently have Neymar in their ranks. One place was reserved for a club from the host nation, which Fifa controversially awarded to Inter Miami, who will contest the tournament curtain-raiser. Messi's side were winners of the regular-season MLS Supporters' Shield but beaten in the MLS play-offs, meaning they are not this season's champions. - • None How does the new Club World Cup work & why is it so controversial? - - Matches will be played across 12 venues in the US which, alongside Canada and Mexico, also host the 2026 World Cup. Fifa is facing legal action from player unions and leagues about the scheduling of the event, which begins two weeks after the Champions League final at the end of the 2024-25 European calendar and ends five weeks before the first Premier League match of the 2025-2026 season. But football's world governing body believes the dates allow sufficient rest time before the start of the domestic campaigns. The Club World Cup will now take place once every four years, when it was previously held annually and involved just seven teams. Streaming platform DAZN has secured exclusive rights to broadcast next summer's tournament, during which 63 matches will take place over 29 days. - -------------------------------------------------------------------------------- - Score: 0.5792, Text: After Fifa awards Saudi Arabia the hosting rights for the men's 2034 World Cup, BBC analysis editor Ros Atkins looks at how we got here and the controversies surrounding the decision. - -------------------------------------------------------------------------------- - Score: 0.5877, Text: FA still to decide on endorsing Saudi World Cup bid - - ... 
(output truncated for brevity) - - -# Optimizing Vector Search with Global Secondary Index (GSI) - -While the above semantic search using similarity_search_with_score works effectively, we can significantly improve query performance by leveraging Global Secondary Index (GSI) in Couchbase. - -Couchbase offers three types of vector indexes, but for GSI-based vector search we focus on two main types: - -Hyperscale Vector Indexes (BHIVE) -- Best for pure vector searches - content discovery, recommendations, semantic search -- High performance with low memory footprint - designed to scale to billions of vectors -- Optimized for concurrent operations - supports simultaneous searches and inserts -- Use when: You primarily perform vector-only queries without complex scalar filtering -- Ideal for: Large-scale semantic search, recommendation systems, content discovery - -Composite Vector Indexes -- Best for filtered vector searches - combines vector search with scalar value filtering -- Efficient pre-filtering - scalar attributes reduce the vector comparison scope -- Use when: Your queries combine vector similarity with scalar filters that eliminate large portions of data -- Ideal for: Compliance-based filtering, user-specific searches, time-bounded queries - -Choosing the Right Index Type -- Start with Hyperscale Vector Index for pure vector searches and large datasets -- Use Composite Vector Index when scalar filters significantly reduce your search space -- Consider your dataset size: Hyperscale scales to billions, Composite works well for tens of millions to billions - -For more details, see the [Couchbase Vector Index documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html). 
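To make the "low memory footprint" point concrete, here is a rough back-of-envelope estimate of raw vector storage under scalar quantization. It assumes 1536-dimensional embeddings (the default output size of OpenAI's text-embedding-3-small), 32-bit floats when unquantized, and a hypothetical corpus of one million vectors; real index sizes also include centroids and metadata, which this ignores:

```python
DIMS = 1536              # text-embedding-3-small output dimensions
NUM_VECTORS = 1_000_000  # hypothetical corpus size

float32_bytes = NUM_VECTORS * DIMS * 4    # unquantized: 4 bytes per dimension
sq8_bytes = NUM_VECTORS * DIMS            # SQ8: 8 bits = 1 byte per dimension
sq4_bytes = NUM_VECTORS * DIMS // 2       # SQ4: 4 bits per dimension

print(f"float32: {float32_bytes / 2**30:.2f} GiB")
print(f"SQ8:     {sq8_bytes / 2**30:.2f} GiB ({float32_bytes // sq8_bytes}x smaller)")
print(f"SQ4:     {sq4_bytes / 2**30:.2f} GiB ({float32_bytes // sq4_bytes}x smaller)")
```

Even this crude arithmetic shows why quantization settings matter at scale: SQ8 cuts raw vector storage fourfold relative to float32, at some cost in recall.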


## Understanding Index Configuration (Couchbase 8.0 Feature)

The index_description parameter controls how Couchbase optimizes vector storage and search performance through centroids and quantization:

Format: `IVF[<centroids>],{SQ<bits>|PQ<subquantizers>x<bits>}`

Centroids (IVF - Inverted File):
- Controls how the dataset is subdivided for faster searches
- More centroids = faster search, slower training
- Fewer centroids = slower search, faster training
- If the centroid count is omitted (as in IVF,SQ8), Couchbase auto-selects it based on dataset size

Quantization Options:
- SQ (Scalar Quantization): SQ4, SQ6, SQ8 (4, 6, or 8 bits per dimension)
- PQ (Product Quantization): PQ<subquantizers>x<bits> (e.g., PQ32x8)
- Higher values = better accuracy, larger index size

Common Examples:
- IVF,SQ8 - Auto centroids, 8-bit scalar quantization (good default)
- IVF1000,SQ6 - 1000 centroids, 6-bit scalar quantization
- IVF,PQ32x8 - Auto centroids, 32 subquantizers with 8 bits each

For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/cloud/vector-index/hyperscale-vector-index.html#algo_settings).

In the code below, we create a BHIVE index. The create_index method takes an index type (BHIVE or COMPOSITE) and a description parameter for optimization settings. Alternatively, GSI indexes can be created manually from the Couchbase UI.


```python
vector_store.create_index(index_type=IndexType.BHIVE, index_name="claude_bhive_index", index_description="IVF,SQ8")
```

With the index in place, we re-run the same semantic search to compare query latency.


```python
query = "What happened with the map shown during the 2026 FIFA World Cup draw regarding Ukraine and Crimea? What was the controversy?"

try:
    # Perform the semantic search
    start_time = time.time()
    search_results = vector_store.similarity_search_with_score(query, k=10)
    search_elapsed_time = time.time() - start_time

    logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds")

    # Display search results
    print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):")
    print("-" * 80)  # Add separator line
    for doc, score in search_results:
        print(f"Score: {score:.4f}, Text: {doc.page_content}")
        print("-" * 80)  # Add separator between results

except CouchbaseException as e:
    raise RuntimeError(f"Error performing semantic search: {str(e)}")
except Exception as e:
    raise RuntimeError(f"Unexpected error: {str(e)}")
```

    2025-09-09 12:26:01,504 - INFO - Semantic search completed in 0.44 seconds


    Semantic Search Results (completed in 0.44 seconds):
    --------------------------------------------------------------------------------
    Score: 0.2502, Text: A map shown during the draw for the 2026 Fifa World Cup has been criticised by Ukraine as an "unacceptable error" after it appeared to exclude Crimea as part of the country. ...
    --------------------------------------------------------------------------------
    ... (same results as the previous search, output truncated for brevity)


Note: To create a COMPOSITE index, the code below can be used.
Choose based on your specific use case and query patterns.
For this tutorial's news search scenario, either index type would work, but BHIVE might be more efficient for pure semantic search across news articles.


```python
vector_store.create_index(index_type=IndexType.COMPOSITE, index_name="claude_composite_index", index_description="IVF,SQ8")
```

# Retrieval-Augmented Generation (RAG) with Couchbase and LangChain
Couchbase and LangChain can be seamlessly integrated to create RAG (Retrieval-Augmented Generation) chains, enhancing the process of generating contextually relevant responses. In this setup, Couchbase serves as the vector store, where embeddings of documents are stored. When a query is made, LangChain retrieves the most relevant documents from Couchbase by comparing the query's embedding with the stored document embeddings. These documents, which provide contextual information, are then passed to a generative language model within LangChain.

The language model, equipped with the context from the retrieved documents, generates a response that is both informed and contextually accurate. This integration allows the RAG chain to leverage Couchbase's efficient storage and retrieval capabilities, while LangChain handles the generation of responses based on the context provided by the retrieved documents. Together, they create a powerful system that can deliver highly relevant and accurate answers by combining the strengths of both retrieval and generation.


```python
system_template = "You are a helpful assistant that answers questions based on the provided context."
system_message_prompt = SystemMessagePromptTemplate.from_template(system_template)

human_template = "Context: {context}\n\nQuestion: {question}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

chat_prompt = ChatPromptTemplate.from_messages([
    system_message_prompt,
    human_message_prompt
])

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": lambda x: format_docs(vector_store.similarity_search(x)), "question": RunnablePassthrough()}
    | chat_prompt
    | llm
)
logging.info("Successfully created RAG chain")
```

    2025-09-09 12:26:10,540 - INFO - Successfully created RAG chain


```python
try:
    start_time = time.time()
    rag_response = rag_chain.invoke(query)
    rag_elapsed_time = time.time() - start_time

    print(f"RAG Response: {rag_response.content}")
    print(f"RAG response generated in {rag_elapsed_time:.2f} seconds")
except AuthenticationError as e:
    print(f"Authentication error: {str(e)}")
except InternalServerFailureException as e:
    if "query request rejected" in str(e):
        print("Error: Search request was rejected due to rate limiting. Please try again later.")
    else:
        print(f"Internal server error occurred: {str(e)}")
except Exception as e:
    print(f"Unexpected error occurred: {str(e)}")
```

    RAG Response: During the draw for the 2026 FIFA World Cup, a map was shown that excluded Crimea as part of Ukraine. This graphic, which was displaying countries that cannot be drawn to play each other for geopolitical reasons, highlighted Ukraine but did not include the Crimean peninsula, which is internationally recognized as Ukrainian territory.

    This omission sparked significant controversy because Crimea has been under Russian occupation since 2014, but only a handful of countries recognize it as Russian territory.
    The Ukrainian Foreign Ministry spokesman, Heorhiy Tykhy, called this an "unacceptable error" and stated that Ukraine expected "a public apology" from FIFA. He criticized FIFA for acting "against international law" and supporting "Russian propaganda, war crimes, and the crime of aggression against Ukraine."

    The Ukrainian Football Association also sent a formal letter of complaint to FIFA and UEFA officials expressing their "deep concern" about the cartographic representation. FIFA acknowledged they were "aware of an issue" and subsequently removed the image.
    RAG response generated in 8.68 seconds


# Using Couchbase as a caching mechanism
Couchbase can be effectively used as a caching mechanism for RAG (Retrieval-Augmented Generation) responses by storing and retrieving precomputed results for specific queries. This approach enhances the system's efficiency and speed, particularly when dealing with repeated or similar queries. When a query is first processed, the RAG chain retrieves relevant documents, generates a response using the language model, and then stores this response in Couchbase, with the query serving as the key.

For subsequent requests with the same query, the system checks Couchbase first. If a cached response is found, it is retrieved directly from Couchbase, bypassing the need to re-run the entire RAG process. This significantly reduces response time because the computationally expensive steps of document retrieval and response generation are skipped. Couchbase's role in this setup is to provide a fast and scalable storage solution for caching these responses, ensuring that frequently asked queries can be answered more quickly and efficiently.


```python
try:
    queries = [
        "What happened when Apple's AI feature generated a false BBC headline about a murder case in New York?",
        "What happened with the map shown during the 2026 FIFA World Cup draw regarding Ukraine and Crimea? What was the controversy?",  # Repeated query
        "What happened when Apple's AI feature generated a false BBC headline about a murder case in New York?",  # Repeated query
    ]

    for i, query in enumerate(queries, 1):
        print(f"\nQuery {i}: {query}")
        start_time = time.time()

        response = rag_chain.invoke(query)
        elapsed_time = time.time() - start_time
        print(f"Response: {response.content}")
        print(f"Time taken: {elapsed_time:.2f} seconds")
except AuthenticationError as e:
    print(f"Authentication error: {str(e)}")
except InternalServerFailureException as e:
    if "query request rejected" in str(e):
        print("Error: Search request was rejected due to rate limiting. Please try again later.")
    else:
        print(f"Internal server error occurred: {str(e)}")
except Exception as e:
    print(f"Unexpected error occurred: {str(e)}")
```


    Query 1: What happened when Apple's AI feature generated a false BBC headline about a murder case in New York?
    Response: According to the context, Apple Intelligence (an AI feature that summarizes notifications) generated a false headline that made it appear as if BBC News had published an article claiming Luigi Mangione, who was arrested for the murder of healthcare insurance CEO Brian Thompson in New York, had shot himself. This was completely false - Mangione had not shot himself.

    The BBC complained to Apple about this misrepresentation, with a BBC spokesperson stating they had "contacted Apple to raise this concern and fix the problem." The spokesperson emphasized that it's "essential" that audiences can trust information published under the BBC name, including notifications.

    This wasn't an isolated incident, as the context mentions that Apple's AI feature also misrepresented a New York Times article, incorrectly summarizing it as "Netanyahu arrested" when the actual article was about the International Criminal Court issuing an arrest warrant for the Israeli prime minister.
    Time taken: 6.22 seconds

    Query 2: What happened with the map shown during the 2026 FIFA World Cup draw regarding Ukraine and Crimea? What was the controversy?
    Response: During the draw for the 2026 FIFA World Cup, a map was shown that excluded Crimea as part of Ukraine. This graphic, which was displaying countries that cannot be drawn to play each other for geopolitical reasons, highlighted Ukraine but did not include the Crimean peninsula, which is internationally recognized as Ukrainian territory.

    This omission sparked significant controversy because Crimea has been under Russian occupation since 2014, but only a handful of countries recognize it as Russian territory. The Ukrainian Foreign Ministry spokesman, Heorhiy Tykhy, called this an "unacceptable error" and stated that Ukraine expected "a public apology" from FIFA. He criticized FIFA for acting "against international law" and supporting "Russian propaganda, war crimes, and the crime of aggression against Ukraine."

    The Ukrainian Football Association also sent a formal letter of complaint to FIFA and UEFA officials expressing their "deep concern" about the cartographic representation. FIFA acknowledged they were "aware of an issue" and subsequently removed the image.
    Time taken: 0.47 seconds

    Query 3: What happened when Apple's AI feature generated a false BBC headline about a murder case in New York?
    Response: According to the context, Apple Intelligence (an AI feature that summarizes notifications) generated a false headline that made it appear as if BBC News had published an article claiming Luigi Mangione, who was arrested for the murder of healthcare insurance CEO Brian Thompson in New York, had shot himself. This was completely false - Mangione had not shot himself.

    The BBC complained to Apple about this misrepresentation, with a BBC spokesperson stating they had "contacted Apple to raise this concern and fix the problem." The spokesperson emphasized that it's "essential" that audiences can trust information published under the BBC name, including notifications.

    This wasn't an isolated incident, as the context mentions that Apple's AI feature also misrepresented a New York Times article, incorrectly summarizing it as "Netanyahu arrested" when the actual article was about the International Criminal Court issuing an arrest warrant for the Israeli prime minister.
    Time taken: 0.46 seconds


## Conclusion
By following these steps, you'll have a fully functional semantic search engine that leverages the strengths of Couchbase and Claude (by Anthropic). This guide is designed not just to show you how to build the system, but also to explain why each step is necessary, giving you a deeper understanding of the principles behind semantic search and of how GSI-based querying can significantly improve your RAG performance. Whether you're a newcomer to software development or an experienced developer looking to expand your skills, this guide will provide you with the knowledge and tools you need to create a powerful, AI-driven search engine.
diff --git a/tutorial/markdown/generated/vector-search-cookbook/cohere-fts-RAG_with_Couchbase_and_Cohere.md b/tutorial/markdown/generated/vector-search-cookbook/cohere-fts-RAG_with_Couchbase_and_Cohere.md
deleted file mode 100644
index 6c8304b..0000000
--- a/tutorial/markdown/generated/vector-search-cookbook/cohere-fts-RAG_with_Couchbase_and_Cohere.md
+++ /dev/null
@@ -1,696 +0,0 @@
---
# frontmatter
path: "/tutorial-cohere-couchbase-rag-with-fts"
title: Retrieval-Augmented Generation (RAG) with Couchbase and Cohere using FTS service
short_title: RAG with Couchbase and Cohere using FTS service
description:
  - Learn how to build a semantic search engine with Couchbase and Cohere using the FTS service.
  - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with Cohere embeddings and language models.
  - You'll understand how to perform Retrieval-Augmented Generation (RAG) using LangChain and Couchbase.
content_type: tutorial
filter: sdk
technology:
  - vector search
tags:
  - FTS
  - Artificial Intelligence
  - LangChain
  - Cohere
sdk_language:
  - python
length: 60 Mins
---


[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/cohere/fts/RAG_with_Couchbase_and_Cohere.ipynb)

# Introduction
In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database and [Cohere](https://cohere.com/) as the AI-powered embedding and language model provider. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system using the FTS service from scratch. Alternatively, if you want to perform semantic search using a GSI index, take a look at [this tutorial](https://developer.couchbase.com/tutorial-cohere-couchbase-rag-with-global-secondary-index/).

# How to run this tutorial

This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/cohere/fts/RAG_with_Couchbase_and_Cohere.ipynb).

You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.
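If you choose to run the notebook locally, the setup might look like the following sketch for a Unix-like shell (the virtual-environment directory name and the notebook filename are illustrative assumptions, not part of the tutorial):

```shell
# Create an isolated Python environment for the tutorial
# (the directory name is illustrative).
python3 -m venv cohere-rag-env
. cohere-rag-env/bin/activate

# Inside the environment you would typically install Jupyter and open
# the notebook, for example:
#   pip install notebook
#   jupyter notebook RAG_with_Couchbase_and_Cohere.ipynb

# Confirm the active interpreter now comes from the virtual environment
python -c "import sys; print(sys.prefix)"
```

The pinned package versions installed in the next section's `%pip install` cell can be installed the same way from this environment's `pip`.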
# Before you start

## Get Credentials for Cohere

Please follow the [instructions](https://dashboard.cohere.com/welcome/register) to generate the Cohere credentials.

## Create and Deploy Your Free Tier Operational cluster on Capella

To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.

To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).

### Couchbase Capella Configuration

When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.

* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the required bucket (Read and Write) used in the application.
* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.

# Setting the Stage: Installing Necessary Libraries
To build our semantic search engine, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks.


```python
%pip install --quiet datasets==3.5.0 langchain-couchbase==0.3.0 langchain-cohere==0.4.4 python-dotenv==1.1.0
```

    Note: you may need to restart the kernel to use updated packages.


# Importing Necessary Libraries
The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. These libraries provide essential functions for working with data, managing database connections, and processing machine learning models.

```python
import getpass
import json
import logging
import os
import time
from datetime import timedelta
from uuid import uuid4

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.exceptions import (CouchbaseException,
                                  InternalServerFailureException,
                                  QueryIndexAlreadyExistsException,
                                  ServiceUnavailableException)
from couchbase.management.buckets import CreateBucketSettings
from couchbase.management.search import SearchIndex
from couchbase.options import ClusterOptions
from datasets import load_dataset
from dotenv import load_dotenv
from langchain_cohere import ChatCohere, CohereEmbeddings
from langchain_core.globals import set_llm_cache
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_couchbase.cache import CouchbaseCache
from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore
```

# Setup Logging
Logging is configured to track the progress of the script and capture any errors or warnings. This is crucial for debugging and understanding the flow of execution. The logging output includes timestamps, log levels (e.g., INFO, ERROR), and messages that describe what is happening in the script.


```python
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)

# Suppress excessive logging
logging.getLogger('openai').setLevel(logging.WARNING)
logging.getLogger('httpx').setLevel(logging.WARNING)
logging.getLogger('langchain_cohere').setLevel(logging.ERROR)
```

# Loading Sensitive Information
In this section, we prompt the user to input essential configuration settings needed for integrating Couchbase with Cohere's API. These settings include sensitive information like API keys, database credentials, and specific configuration names.
Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security.

The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code.


```python
load_dotenv()

COHERE_API_KEY = os.getenv('COHERE_API_KEY') or getpass.getpass('Enter your Cohere API key: ')
CB_HOST = os.getenv('CB_HOST') or input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost'
CB_USERNAME = os.getenv('CB_USERNAME') or input('Enter your Couchbase username (default: Administrator): ') or 'Administrator'
CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass('Enter your Couchbase password (default: password): ') or 'password'
CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input('Enter your Couchbase bucket name (default: vector-search-testing): ') or 'vector-search-testing'
INDEX_NAME = os.getenv('INDEX_NAME') or input('Enter your index name (default: vector_search_cohere): ') or 'vector_search_cohere'
SCOPE_NAME = os.getenv('SCOPE_NAME') or input('Enter your scope name (default: shared): ') or 'shared'
COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input('Enter your collection name (default: cohere): ') or 'cohere'
CACHE_COLLECTION = os.getenv('CACHE_COLLECTION') or input('Enter your cache collection name (default: cache): ') or 'cache'

# Check if the variables are correctly loaded
if not COHERE_API_KEY:
    raise ValueError("COHERE_API_KEY is not provided and is required.")
```

# Connect to Couchbase
The script attempts to establish a connection to the Couchbase database using the credentials retrieved from the environment variables.
Couchbase is a NoSQL database known for its flexibility, scalability, and support for various data models, including document-based storage. The connection is authenticated using a username and password, and the script waits until the connection is fully established before proceeding.


```python
try:
    auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD)
    options = ClusterOptions(auth)
    cluster = Cluster(CB_HOST, options)
    cluster.wait_until_ready(timedelta(seconds=5))
    logging.info("Successfully connected to Couchbase")
except Exception as e:
    raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}")
```

    2025-02-06 01:27:13,562 - INFO - Successfully connected to Couchbase


## Setting Up Collections in Couchbase

The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase:

1. Bucket Creation:
   - Checks if specified bucket exists, creates it if not
   - Sets bucket properties like RAM quota (1024MB) and replication (disabled)
   - Note: You will not be able to create a bucket on Capella

2. Scope Management:
   - Verifies if requested scope exists within bucket
   - Creates new scope if needed (unless it's the default "_default" scope)

3. Collection Setup:
   - Checks for collection existence within scope
   - Creates collection if it doesn't exist
   - Waits 2 seconds for collection to be ready

Additional Tasks:
- Creates primary index on collection for query performance
- Clears any existing documents for clean state
- Implements comprehensive error handling and logging

The function is called twice to set up:
1. Main collection for vector embeddings
2. Cache collection for storing results


```python
def setup_collection(cluster, bucket_name, scope_name, collection_name):
    try:
        # Check if bucket exists, create if it doesn't
        try:
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' exists.")
        except Exception as e:
            logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...")
            bucket_settings = CreateBucketSettings(
                name=bucket_name,
                bucket_type='couchbase',
                ram_quota_mb=1024,
                flush_enabled=True,
                num_replicas=0
            )
            cluster.buckets().create_bucket(bucket_settings)
            time.sleep(2)  # Wait for bucket creation to complete and become available
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' created successfully.")

        bucket_manager = bucket.collections()

        # Check if scope exists, create if it doesn't
        scopes = bucket_manager.get_all_scopes()
        scope_exists = any(scope.name == scope_name for scope in scopes)

        if not scope_exists and scope_name != "_default":
            logging.info(f"Scope '{scope_name}' does not exist. Creating it...")
            bucket_manager.create_scope(scope_name)
            logging.info(f"Scope '{scope_name}' created successfully.")

        # Check if collection exists, create if it doesn't
        collections = bucket_manager.get_all_scopes()
        collection_exists = any(
            scope.name == scope_name and collection_name in [col.name for col in scope.collections]
            for scope in collections
        )

        if not collection_exists:
            logging.info(f"Collection '{collection_name}' does not exist. Creating it...")
            bucket_manager.create_collection(scope_name, collection_name)
            logging.info(f"Collection '{collection_name}' created successfully.")
        else:
            logging.info(f"Collection '{collection_name}' already exists. Skipping creation.")

        # Wait for collection to be ready
        collection = bucket.scope(scope_name).collection(collection_name)
        time.sleep(2)  # Give the collection time to be ready for queries

        # Ensure primary index exists
        try:
            cluster.query(f"CREATE PRIMARY INDEX IF NOT EXISTS ON `{bucket_name}`.`{scope_name}`.`{collection_name}`").execute()
            logging.info("Primary index present or created successfully.")
        except Exception as e:
            logging.warning(f"Error creating primary index: {str(e)}")

        # Clear all documents in the collection
        try:
            query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`"
            cluster.query(query).execute()
            logging.info("All documents cleared from the collection.")
        except Exception as e:
            logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.")

        return collection
    except Exception as e:
        raise RuntimeError(f"Error setting up collection: {str(e)}")

setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME)
setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION)
```

    2025-02-06 01:27:14,806 - INFO - Bucket 'vector-search-testing' exists.
    2025-02-06 01:27:17,199 - INFO - Collection 'cohere' already exists. Skipping creation.
    2025-02-06 01:27:20,585 - INFO - Primary index present or created successfully.
    2025-02-06 01:27:20,888 - INFO - All documents cleared from the collection.
    2025-02-06 01:27:20,889 - INFO - Bucket 'vector-search-testing' exists.
    2025-02-06 01:27:23,271 - INFO - Collection 'cache' already exists. Skipping creation.
    2025-02-06 01:27:26,258 - INFO - Primary index present or created successfully.
    2025-02-06 01:27:26,497 - INFO - All documents cleared from the collection.


# Loading Couchbase Vector Search Index

Semantic search requires an efficient way to retrieve relevant documents based on a user's query. This is where the Couchbase **Vector Search Index** comes into play.
In this step, we load the Vector Search Index definition from a JSON file, which specifies how the index should be structured. This includes the fields to be indexed, the dimensions of the vectors, and other parameters that determine how the search engine processes queries based on vector similarity.

This Cohere vector search index configuration requires specific default settings to function properly. This tutorial uses the bucket named `vector-search-testing` with the scope `shared` and collection `cohere`. The configuration is set up for vectors with exactly `1024 dimensions`, using dot product similarity and optimized for recall. If you want to use a different bucket, scope, or collection, you will need to modify the index configuration accordingly.

For more information on creating a vector search index, please follow the [instructions](https://docs.couchbase.com/cloud/vector-search/create-vector-search-index-ui.html).


```python
# If you are running this script locally (not in Google Colab), uncomment the following line
# and provide the path to your index definition file.

# index_definition_path = '/path_to_your_index_file/cohere_index.json' # Local setup: specify your file path here

# # Version for Google Colab
# def load_index_definition_colab():
#     from google.colab import files
#     print("Upload your index definition file")
#     uploaded = files.upload()
#     index_definition_path = list(uploaded.keys())[0]
#
#     try:
#         with open(index_definition_path, 'r') as file:
#             index_definition = json.load(file)
#         return index_definition
#     except Exception as e:
#         raise ValueError(f"Error loading index definition from {index_definition_path}: {str(e)}")

# Version for Local Environment
def load_index_definition_local(index_definition_path):
    try:
        with open(index_definition_path, 'r') as file:
            index_definition = json.load(file)
        return index_definition
    except Exception as e:
        raise ValueError(f"Error loading index definition from {index_definition_path}: {str(e)}")

# Usage
# Uncomment the appropriate line based on your environment
# index_definition = load_index_definition_colab()
index_definition = load_index_definition_local('cohere_index.json')
```

# Creating or Updating Search Indexes

With the index definition loaded, the next step is to create or update the **Vector Search Index** in Couchbase. This step is crucial because it optimizes our database for vector similarity search operations, allowing us to perform searches based on the semantic content of documents rather than just keywords. By creating or updating a Vector Search Index, we enable our search engine to handle complex queries that involve finding semantically similar documents using vector embeddings, which is essential for a robust semantic search engine.

```python
try:
    scope_index_manager = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME).search_indexes()

    # Check if index already exists
    existing_indexes = scope_index_manager.get_all_indexes()
    index_name = index_definition["name"]

    if index_name in [index.name for index in existing_indexes]:
        logging.info(f"Index '{index_name}' found")
    else:
        logging.info(f"Creating new index '{index_name}'...")

    # Create SearchIndex object from JSON definition
    search_index = SearchIndex.from_json(index_definition)

    # Upsert the index (create if not exists, update if exists)
    scope_index_manager.upsert_index(search_index)
    logging.info(f"Index '{index_name}' successfully created/updated.")

except QueryIndexAlreadyExistsException:
    logging.info(f"Index '{index_name}' already exists. Skipping creation/update.")
except ServiceUnavailableException:
    raise RuntimeError("Search service is not available. Please ensure the Search service is enabled in your Couchbase cluster.")
except InternalServerFailureException as e:
    logging.error(f"Internal server error: {str(e)}")
    raise
```

    2025-02-06 01:27:27,729 - INFO - Index 'vector_search_cohere' found
    2025-02-06 01:27:28,595 - INFO - Index 'vector_search_cohere' already exists. Skipping creation/update.


# Create Embeddings
Embeddings are created using the Cohere API. Embeddings are vectors (arrays of numbers) that represent the meaning of text in a high-dimensional space. These embeddings are crucial for tasks like semantic search, where the goal is to find text that is semantically similar to a query. The script uses a pre-trained model provided by Cohere to generate embeddings for the articles in the BBC News dataset.
- - -```python -try: - embeddings = CohereEmbeddings( - cohere_api_key=COHERE_API_KEY, - model="embed-english-v3.0", - ) - logging.info("Successfully created CohereEmbeddings") -except Exception as e: - raise ValueError(f"Error creating CohereEmbeddings: {str(e)}") -``` - - 2025-02-06 01:27:28,613 - INFO - Successfully created CohereEmbeddings - - -# Set Up Vector Store -The vector store is set up to manage the embeddings created in the previous step. The vector store is essentially a database optimized for storing and retrieving high-dimensional vectors. In this case, the vector store is built on top of Couchbase, allowing the script to store the embeddings in a way that can be efficiently searched. - - - -```python -try: - vector_store = CouchbaseSearchVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding=embeddings, - index_name=INDEX_NAME, - ) - logging.info("Successfully created vector store") -except Exception as e: - raise ValueError(f"Failed to create vector store: {str(e)}") -``` - - 2025-02-06 01:27:32,177 - INFO - Successfully created vector store - - -# Load the BBC News Dataset -To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively. - -The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. 
The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version. - - -```python -try: - news_dataset = load_dataset( - "RealTimeData/bbc_news_alltime", "2024-12", split="train" - ) - print(f"Loaded the BBC News dataset with {len(news_dataset)} rows") - logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.") -except Exception as e: - raise ValueError(f"Error loading the BBC News dataset: {str(e)}") -``` - - 2025-02-06 01:27:38,003 - INFO - Successfully loaded the BBC News dataset with 2687 rows. - - - Loaded the BBC News dataset with 2687 rows - - -## Cleaning up the Data -We will use the content of the news articles for our RAG system. - -The dataset contains a few duplicate records. We are removing them to avoid duplicate results in the retrieval stage of our RAG system. - - -```python -news_articles = news_dataset["content"] -unique_articles = set() -for article in news_articles: - if article: - unique_articles.add(article) -unique_news_articles = list(unique_articles) -print(f"We have {len(unique_news_articles)} unique articles in our database.") -``` - - We have 1749 unique articles in our database. - - -## Saving Data to the Vector Store -To efficiently handle the large number of articles, we process them in batches of 50 articles at a time. This batch processing approach helps manage memory usage and provides better control over the ingestion process. - -We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's add_texts method, we add the filtered articles to our vector database. The batch_size parameter controls how many articles are processed in each iteration. - -This approach offers several benefits: -1. Memory Efficiency: Processing in smaller batches prevents memory overload -2. 
Progress Tracking: Easier to monitor and track the ingestion progress -3. Resource Management: Better control over CPU and network resource utilization - -We use a conservative batch size of 50 to ensure reliable operation. -The optimal batch size depends on many factors including: -- Document sizes being inserted -- Available system resources -- Network conditions -- Concurrent workload - -Consider measuring performance with your specific workload before adjusting. - - - -```python -batch_size = 50 - -# Automatic Batch Processing -articles = [article for article in unique_news_articles if article and len(article) <= 50000] - -try: - vector_store.add_texts( - texts=articles, - batch_size=batch_size - ) - logging.info("Document ingestion completed successfully.") -except Exception as e: - raise ValueError(f"Failed to save documents to vector store: {str(e)}") - -``` - - 2025-02-06 01:29:07,077 - INFO - Document ingestion completed successfully. - - -# Set Up Cache - A cache is set up using Couchbase to store intermediate results and frequently accessed data. Caching is important for improving performance, as it reduces the need to repeatedly calculate or retrieve the same data. The cache is linked to a specific collection in Couchbase, and it is used later in the script to store the results of language model queries. - - - -```python -try: - cache = CouchbaseCache( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=CACHE_COLLECTION, - ) - logging.info("Successfully created cache") - set_llm_cache(cache) -except Exception as e: - raise ValueError(f"Failed to create cache: {str(e)}") -``` - - 2025-02-06 01:30:37,657 - INFO - Successfully created cache - - -# Create Language Model (LLM) -The script initializes a Cohere language model (LLM) that will be used for generating responses to queries. LLMs are powerful tools for natural language understanding and generation, capable of producing human-like text based on input prompts. 
The model is configured with specific parameters, such as the temperature, which controls the randomness of its outputs. - - - -```python -try: - llm = ChatCohere( - cohere_api_key=COHERE_API_KEY, - model="command-a-03-2025", - temperature=0 - ) - logging.info("Successfully created Cohere LLM with model command") -except Exception as e: - raise ValueError(f"Error creating Cohere LLM: {str(e)}") -``` - - 2025-02-06 01:30:38,684 - INFO - Successfully created Cohere LLM with model command - - -# Perform Semantic Search -Semantic search in Couchbase involves converting queries and documents into vector representations using an embeddings model. These vectors capture the semantic meaning of the text and are stored directly in Couchbase. When a query is made, Couchbase performs a similarity search by comparing the query vector against the stored document vectors. The similarity metric used for this comparison is configurable, allowing flexibility in how the relevance of documents is determined. - -In the provided code, the search process begins by recording the start time, followed by executing the similarity_search_with_score method of the CouchbaseSearchVectorStore. This method searches Couchbase for the most relevant documents based on the vector similarity to the query. The search results include the document content and a similarity score that reflects how closely each document aligns with the query in the defined semantic space. The time taken to perform this search is then calculated and logged, and the results are displayed, showing the most relevant documents along with their similarity scores. This approach leverages Couchbase as both a storage and retrieval engine for vector data, enabling efficient and scalable semantic searches. The integration of vector storage and search capabilities within Couchbase allows for sophisticated semantic search operations without relying on external services for vector storage or comparison. 
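To make "vector similarity" concrete before running the real search below, here is a small self-contained sketch — plain Python, independent of Couchbase and Cohere — that ranks two toy document vectors against a query vector using cosine similarity, one of the configurable metrics mentioned above:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Tiny 3-dimensional stand-ins for real 1024-dimensional embeddings.
query_vec = [1.0, 0.0, 1.0]
doc_vecs = {"doc_a": [0.9, 0.1, 0.8], "doc_b": [0.0, 1.0, 0.0]}

# Rank documents by similarity to the query, highest first --
# conceptually what the vector store does at much larger scale.
ranked = sorted(doc_vecs.items(), key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
print(ranked[0][0])  # prints "doc_a": it points in nearly the same direction as the query
```

The scores printed by `similarity_search_with_score` below play the same role as the cosine values here: higher means semantically closer to the query.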
- - -```python -query = "What was manchester city manager pep guardiola's reaction to the team's current form?" - -try: - # Perform the semantic search - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=10) - search_elapsed_time = time.time() - start_time - - logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds") - - # Display search results - print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):") - print("-" * 80) # Add separator line - for doc, score in search_results: - print(f"Score: {score:.4f}, Text: {doc.page_content}") - print("-" * 80) # Add separator between results - -except CouchbaseException as e: - raise RuntimeError(f"Error performing semantic search: {str(e)}") -except Exception as e: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - 2025-02-06 01:30:43,101 - INFO - Semantic search completed in 1.89 seconds - - - - Semantic Search Results (completed in 1.89 seconds): - -------------------------------------------------------------------------------- - Score: 0.6641, Text: Manchester City boss Pep Guardiola has won 18 trophies since he arrived at the club in 2016 - - Manchester City boss Pep Guardiola says he is "fine" despite admitting his sleep and diet are being affected by the worst run of results in his entire managerial career. In an interview with former Italy international Luca Toni for Amazon Prime Sport before Wednesday's Champions League defeat by Juventus, Guardiola touched on the personal impact City's sudden downturn in form has had. Guardiola said his state of mind was "ugly", that his sleep was "worse" and he was eating lighter as his digestion had suffered. City go into Sunday's derby against Manchester United at Etihad Stadium having won just one of their past 10 games. The Juventus loss means there is a chance they may not even secure a play-off spot in the Champions League. 
Asked to elaborate on his comments to Toni, Guardiola said: "I'm fine. "In our jobs we always want to do our best or the best as possible. When that doesn't happen you are more uncomfortable than when the situation is going well, always that happened. "In good moments I am happier but when I get to the next game I am still concerned about what I have to do. There is no human being that makes an activity and it doesn't matter how they do." Guardiola said City have to defend better and "avoid making mistakes at both ends". To emphasise his point, Guardiola referred back to the third game of City's current run, against a Sporting side managed by Ruben Amorim, who will be in the United dugout at the weekend. City dominated the first half in Lisbon, led thanks to Phil Foden's early effort and looked to be cruising. Instead, they conceded three times in 11 minutes either side of half-time as Sporting eventually ran out 4-1 winners. "I would like to play the game like we played in Lisbon on Sunday, believe me," said Guardiola, who is facing the prospect of only having three fit defenders for the derby as Nathan Ake and Manuel Akanji try to overcome injury concerns. If there is solace for City, it comes from the knowledge United are not exactly flying. Their comeback Europa League victory against Viktoria Plzen on Thursday was their third win of Amorim's short reign so far but only one of those successes has come in the Premier League, where United have lost their past two games against Arsenal and Nottingham Forest. Nevertheless, Guardiola can see improvements already on the red side of the city. "It's already there," he said. "You see all the patterns, the movements, the runners and the pace. He will do a good job at United, I'm pretty sure of that." - - Guardiola says skipper Kyle Walker has been offered support by the club after the City defender highlighted the racial abuse he had received on social media in the wake of the Juventus trip. "It's unacceptable," he said. 
"Not because it's Kyle - for any human being. "Unfortunately it happens many times in the real world. It is not necessary to say he has the support of the entire club. It is completely unacceptable and we give our support to him." - -------------------------------------------------------------------------------- - Score: 0.6521, Text: 'We have to find a way' - Guardiola vows to end relegation form - - This video can not be played To play this video you need to enable JavaScript in your browser. 'Worrying' and 'staggering' - Why do Manchester City keep conceding? - - Manchester City are currently in relegation form and there is little sign of it ending. Saturday's 2-1 defeat at Aston Villa left them joint bottom of the form table over the past eight games with just Southampton for company. Saints, at the foot of the Premier League, have the same number of points, four, as City over their past eight matches having won one, drawn one and lost six - the same record as the floundering champions. And if Southampton - who appointed Ivan Juric as their new manager on Saturday - get at least a point at Fulham on Sunday, City will be on the worst run in the division. Even Wolves, who sacked boss Gary O'Neil last Sunday and replaced him with Vitor Pereira, have earned double the number of points during the same period having played a game fewer. They are damning statistics for Pep Guardiola, even if he does have some mitigating circumstances with injuries to Ederson, Nathan Ake and Ruben Dias - who all missed the loss at Villa Park - and the long-term loss of midfield powerhouse Rodri. Guardiola was happy with Saturday's performance, despite defeat in Birmingham, but there is little solace to take at slipping further out of the title race. He may have needed to field a half-fit Manuel Akanji and John Stones at Villa Park but that does not account for City looking a shadow of their former selves. 
That does not justify the error Josko Gvardiol made to gift Jhon Duran a golden chance inside the first 20 seconds, or £100m man Jack Grealish again failing to have an impact on a game. There may be legitimate reasons for City's drop off, whether that be injuries, mental fatigue or just simply a team coming to the end of its lifecycle, but their form, which has plunged off a cliff edge, would have been unthinkable as they strolled to a fourth straight title last season. "The worrying thing is the number of goals conceded," said ex-England captain Alan Shearer on BBC Match of the Day. "The number of times they were opened up because of the lack of protection and legs in midfield was staggering. There are so many things that are wrong at this moment in time." - - This video can not be played To play this video you need to enable JavaScript in your browser. Man City 'have to find a way' to return to form - Guardiola - - Afterwards Guardiola was calm, so much so it was difficult to hear him in the news conference, a contrast to the frustrated figure he cut on the touchline. He said: "It depends on us. The solution is bring the players back. We have just one central defender fit, that is difficult. We are going to try next game - another opportunity and we don't think much further than that. "Of course there are more reasons. We concede the goals we don't concede in the past, we [don't] score the goals we score in the past. Football is not just one reason. There are a lot of little factors. "Last season we won the Premier League, but we came here and lost. We have to think positive and I have incredible trust in the guys. Some of them have incredible pride and desire to do it. We have to find a way, step by step, sooner or later to find a way back." Villa boss Unai Emery highlighted City's frailties, saying he felt Villa could seize on the visitors' lack of belief. "Manchester City are a little bit under the confidence they have normally," he said. 
"The second half was different, we dominated and we scored. Through those circumstances they were feeling worse than even in the first half." - - Erling Haaland had one touch in the Villa box - - There are chinks in the armour never seen before at City under Guardiola and Erling Haaland conceded belief within the squad is low. He told TNT after the game: "Of course, [confidence levels are] not the best. We know how important confidence is and you can see that it affects every human being. That is how it is, we have to continue and stay positive even though it is difficult." Haaland, with 76 goals in 83 Premier League appearances since joining City from Borussia Dortmund in 2022, had one shot and one touch in the Villa box. His 18 touches in the whole game were the lowest of all starting players and he has been self critical, despite scoring 13 goals in the top flight this season. Over City's last eight games he has netted just twice though, but Guardiola refused to criticise his star striker. He said: "Without him we will be even worse but I like the players feeling that way. I don't agree with Erling. He needs to have the balls delivered in the right spots but he will fight for the next one." - -------------------------------------------------------------------------------- - Score: 0.6322, Text: 'Self-doubt, errors & big changes' - inside the crisis at Man City - - - ... (output truncated for brevity) - - -# Retrieval-Augmented Generation (RAG) with Couchbase and Langchain -Couchbase and LangChain can be seamlessly integrated to create RAG (Retrieval-Augmented Generation) chains, enhancing the process of generating contextually relevant responses. In this setup, Couchbase serves as the vector store, where embeddings of documents are stored. When a query is made, LangChain retrieves the most relevant documents from Couchbase by comparing the query’s embedding with the stored document embeddings. 
These documents, which provide contextual information, are then passed to a generative language model within LangChain. - -The language model, equipped with the context from the retrieved documents, generates a response that is both informed and contextually accurate. This integration allows the RAG chain to leverage Couchbase’s efficient storage and retrieval capabilities, while LangChain handles the generation of responses based on the context provided by the retrieved documents. Together, they create a powerful system that can deliver highly relevant and accurate answers by combining the strengths of both retrieval and generation. - - -```python -try: - template = """You are a helpful bot. If you cannot answer based on the context provided, respond with a generic answer. Answer the question as truthfully as possible using the context below: - {context} - - Question: {question}""" - prompt = ChatPromptTemplate.from_template(template) - - rag_chain = ( - {"context": vector_store.as_retriever(), "question": RunnablePassthrough()} - | prompt - | llm - | StrOutputParser() - ) - logging.info("Successfully created RAG chain") -except Exception as e: - raise ValueError(f"Error creating RAG chain: {str(e)}") -``` - - 2025-02-06 01:30:46,088 - INFO - Successfully created RAG chain - - - -```python -start_time = time.time() -try: - rag_response = rag_chain.invoke(query) - rag_elapsed_time = time.time() - start_time - print(f"RAG Response: {rag_response}") - print(f"RAG response generated in {rag_elapsed_time:.2f} seconds") -except InternalServerFailureException as e: - if "query request rejected" in str(e): - print("Error: Search request was rejected due to rate limiting. Please try again later.") - else: - print(f"Internal server error occurred: {str(e)}") -except Exception as e: - print(f"Unexpected error occurred: {str(e)}") -``` - - RAG Response: Manchester City manager Pep Guardiola has been open about the impact the team's poor form has had on him personally. 
He has admitted that his sleep and diet have been affected, and that he has been feeling "ugly" and uncomfortable. Guardiola has also been giving a lot of thought to the reasons for the team's decline, talking to many people and trying to work out the causes. He has been very protective of his players, refusing to criticise them and instead giving them more days off to clear their heads. - - Guardiola has also been very self-critical, saying that he is "not good enough" and that he needs to find solutions to the team's problems. He has acknowledged that the team is not performing as well as it used to, and that there are many factors contributing to their poor form, including injuries, mental fatigue, and a lack of confidence. He has also suggested that the team needs to improve its defensive concepts and re-establish its intensity. - - Overall, Guardiola seems to be taking a very hands-on approach to the team's struggles, trying to find solutions and protect his players while also being very honest about his own role in the situation. - RAG response generated in 9.52 seconds - - -# Using Couchbase as a caching mechanism -Couchbase can be effectively used as a caching mechanism for RAG (Retrieval-Augmented Generation) responses by storing and retrieving precomputed results for specific queries. This approach enhances the system's efficiency and speed, particularly when dealing with repeated or similar queries. When a query is first processed, the RAG chain retrieves relevant documents, generates a response using the language model, and then stores this response in Couchbase, with the query serving as the key. - -For subsequent requests with the same query, the system checks Couchbase first. If a cached response is found, it is retrieved directly from Couchbase, bypassing the need to re-run the entire RAG process. This significantly reduces response time because the computationally expensive steps of document retrieval and response generation are skipped. 
Couchbase's role in this setup is to provide a fast and scalable storage solution for caching these responses, ensuring that frequently asked queries can be answered more quickly and efficiently. - - -```python -try: - queries = [ - "What happened in the match between Fullham and Liverpool?", - "What was manchester city manager pep guardiola's reaction to the team's current form?", # Repeated query - "What happened in the match between Fullham and Liverpool?", # Repeated query - ] - - for i, query in enumerate(queries, 1): - print(f"\nQuery {i}: {query}") - start_time = time.time() - response = rag_chain.invoke(query) - elapsed_time = time.time() - start_time - print(f"Response: {response}") - print(f"Time taken: {elapsed_time:.2f} seconds") -except InternalServerFailureException as e: - if "query request rejected" in str(e): - print("Error: Search request was rejected due to rate limiting. Please try again later.") - else: - print(f"Internal server error occurred: {str(e)}") -except Exception as e: - print(f"Unexpected error occurred: {str(e)}") -``` - - - Query 1: What happened in the match between Fullham and Liverpool? - Response: Liverpool and Fulham played out a thrilling 2-2 draw at Anfield. Liverpool were reduced to 10 men after Andy Robertson was sent off in the 17th minute, but they fought back twice to earn a point. The Reds dominated the match despite their numerical disadvantage, with over 60% possession and leading in several attacking metrics. Diogo Jota scored the equaliser in the 86th minute, capping off an impressive performance that showcased Liverpool's title credentials. - Time taken: 5.29 seconds - - Query 2: What was manchester city manager pep guardiola's reaction to the team's current form? - Response: Manchester City manager Pep Guardiola has been open about the impact the team's poor form has had on him personally. He has admitted that his sleep and diet have been affected, and that he has been feeling "ugly" and uncomfortable. 
Guardiola has also been giving a lot of thought to the reasons for the team's decline, talking to many people and trying to work out the causes. He has been very protective of his players, refusing to criticise them and instead giving them more days off to clear their heads. - - Guardiola has also been very self-critical, saying that he is "not good enough" and that he needs to find solutions to the team's problems. He has acknowledged that the team is not performing as well as it used to, and that there are many factors contributing to their poor form, including injuries, mental fatigue, and a lack of confidence. He has also suggested that the team needs to improve its defensive concepts and re-establish its intensity. - - Overall, Guardiola seems to be taking a very hands-on approach to the team's struggles, trying to find solutions and protect his players while also being very honest about his own role in the situation. - Time taken: 2.13 seconds - - Query 3: What happened in the match between Fullham and Liverpool? - Response: Liverpool and Fulham played out a thrilling 2-2 draw at Anfield. Liverpool were reduced to 10 men after Andy Robertson was sent off in the 17th minute, but they fought back twice to earn a point. The Reds dominated the match despite their numerical disadvantage, with over 60% possession and leading in several attacking metrics. Diogo Jota scored the equaliser in the 86th minute, capping off an impressive performance that showcased Liverpool's title credentials. - Time taken: 1.36 seconds - - -## Conclusion -By following these steps, you'll have a fully functional semantic search engine that leverages the strengths of Couchbase and Cohere. This guide is designed not just to show you how to build the system, but also to explain why each step is necessary, giving you a deeper understanding of the principles behind semantic search and how to implement it effectively. 
Whether you're a newcomer to software development or an experienced developer looking to expand your skills, this guide will provide you with the knowledge and tools you need to create a powerful, AI-driven search engine. diff --git a/tutorial/markdown/generated/vector-search-cookbook/cohere-gsi-RAG_with_Couchbase_and_Cohere.md b/tutorial/markdown/generated/vector-search-cookbook/cohere-gsi-RAG_with_Couchbase_and_Cohere.md deleted file mode 100644 index 65422fa..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/cohere-gsi-RAG_with_Couchbase_and_Cohere.md +++ /dev/null @@ -1,729 +0,0 @@ ---- -# frontmatter -path: "/tutorial-cohere-couchbase-rag-with-global-secondary-index" -title: Retrieval-Augmented Generation (RAG) with Couchbase and Cohere with GSI -short_title: RAG with Couchbase and Cohere with GSI -description: - - Learn how to build a semantic search engine using Couchbase and Cohere using GSI. - - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with Cohere embeddings and language models. - - You'll understand how to perform Retrieval-Augmented Generation (RAG) using LangChain and Couchbase. -content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - GSI - - Artificial Intelligence - - LangChain - - Cohere -sdk_language: - - python -length: 60 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/cohere/gsi/RAG_with_Couchbase_and_Cohere.ipynb) - -# Introduction -In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database and [Cohere](https://cohere.com/) - as the AI-powered embedding and language model provider. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. 
This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system using GSI (Global Secondary Index) from scratch. Alternatively, if you want to perform semantic search using the FTS index, please take a look at [this tutorial](https://developer.couchbase.com/tutorial-cohere-couchbase-rag-with-fts/).
-
-# How to run this tutorial
-
-This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/cohere/gsi/RAG_with_Couchbase_and_Cohere.ipynb).
-
-You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.
-
-# Before you start
-
-## Get Credentials for Cohere
-
-Please follow the [instructions](https://dashboard.cohere.com/welcome/register) to generate the Cohere credentials.
-
-## Create and Deploy Your Free Tier Operational Cluster on Capella
-
-To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.
-
-To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).
-
-Note: To run this tutorial, you will need Capella with Couchbase Server version 8.0 or above, as GSI vector search is supported only from version 8.0.
-
-### Couchbase Capella Configuration
-
-When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.
-
-* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the required bucket (Read and Write) used in the application. 
-* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running. - -# Setting the Stage: Installing Necessary Libraries -To build our semantic search engine, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks. - - -```python -%pip install --quiet datasets==3.5.0 langchain-couchbase==0.5.0 langchain-cohere==0.4.5 python-dotenv==1.1.1 -``` - - Note: you may need to restart the kernel to use updated packages. - - -# Importing Necessary Libraries -The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. These libraries provide essential functions for working with data, managing database connections, and processing machine learning models. - - -```python -import getpass -import json -import logging -import os -import time -from datetime import timedelta -from uuid import uuid4 - -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.exceptions import (CouchbaseException, - InternalServerFailureException, - QueryIndexAlreadyExistsException, - ServiceUnavailableException) -from couchbase.management.buckets import CreateBucketSettings -from couchbase.management.search import SearchIndex -from couchbase.options import ClusterOptions -from datasets import load_dataset -from dotenv import load_dotenv -from langchain_cohere import ChatCohere, CohereEmbeddings -from langchain_core.globals import set_llm_cache -from langchain_core.output_parsers import StrOutputParser -from langchain_core.prompts import ChatPromptTemplate -from langchain_core.runnables import RunnablePassthrough -from langchain_couchbase.cache import CouchbaseCache -from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore -from 
langchain_couchbase.vectorstores import DistanceStrategy
-from langchain_couchbase.vectorstores import IndexType
-```
-
-# Setup Logging
-Logging is configured to track the progress of the script and capture any errors or warnings. This is crucial for debugging and understanding the flow of execution. The logging output includes timestamps, log levels (e.g., INFO, ERROR), and messages that describe what is happening in the script. 
-
-
-```python
-logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)
-
-# Suppress excessive logging
-logging.getLogger('openai').setLevel(logging.WARNING)
-logging.getLogger('httpx').setLevel(logging.WARNING)
-logging.getLogger('langchain_cohere').setLevel(logging.ERROR)
-
-```
-
-# Loading Sensitive Information
-In this section, we prompt the user to input essential configuration settings needed for integrating Couchbase with Cohere's API. These settings include sensitive information like API keys, database credentials, and specific configuration names. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security.
-
-The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code. 
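One lightweight way to implement the validation described above is a small fail-fast helper. This is an illustrative sketch, not part of the original notebook — the `require` helper and the dummy values are hypothetical; the actual cell below performs an equivalent inline check for `COHERE_API_KEY`:

```python
def require(name, value):
    """Raise a descriptive error if a required configuration value is missing."""
    if not value:
        raise ValueError(f"{name} is not provided and is required.")
    return value

# Hypothetical usage with already-loaded settings:
settings = {"COHERE_API_KEY": "dummy-key", "CB_HOST": "couchbase://localhost"}
validated = {name: require(name, value) for name, value in settings.items()}
assert sorted(validated) == ["CB_HOST", "COHERE_API_KEY"]
```

Centralizing the check this way makes it easy to report exactly which setting is missing instead of failing later with a less obvious connection or authentication error.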
- - -```python -load_dotenv() - -COHERE_API_KEY = os.getenv('COHERE_API_KEY') or getpass.getpass('Enter your Cohere API key: ') -CB_HOST = os.getenv('CB_HOST') or input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost' -CB_USERNAME = os.getenv('CB_USERNAME') or input('Enter your Couchbase username (default: Administrator): ') or 'Administrator' -CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass('Enter your Couchbase password (default: password): ') or 'password' -CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input('Enter your Couchbase bucket name (default: query-vector-search-testing): ') or 'query-vector-search-testing' -SCOPE_NAME = os.getenv('SCOPE_NAME') or input('Enter your scope name (default: shared): ') or 'shared' -COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input('Enter your collection name (default: cohere): ') or 'cohere' -CACHE_COLLECTION = os.getenv('CACHE_COLLECTION') or input('Enter your cache collection name (default: cache): ') or 'cache' - -# Check if the variables are correctly loaded -if not COHERE_API_KEY: - raise ValueError("COHERE_API_KEY is not provided and is required.") -``` - -# Connect to Couchbase -The script attempts to establish a connection to the Couchbase database using the credentials retrieved from the environment variables. Couchbase is a NoSQL database known for its flexibility, scalability, and support for various data models, including document-based storage. The connection is authenticated using a username and password, and the script waits until the connection is fully established before proceeding. 
- - - - - -```python -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to Couchbase") -except Exception as e: - raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}") -``` - - 2025-09-22 12:56:30,972 - INFO - Successfully connected to Couchbase - - -## Setting Up Collections in Couchbase - -The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase: - -1. Bucket Creation: - - Checks if specified bucket exists, creates it if not - - Sets bucket properties like RAM quota (1024MB) and replication (disabled) - - Note: You will not be able to create a bucket on Capella - -2. Scope Management: - - Verifies if requested scope exists within bucket - - Creates new scope if needed (unless it's the default "_default" scope) - -3. Collection Setup: - - Checks for collection existence within scope - - Creates collection if it doesn't exist - - Waits 2 seconds for collection to be ready - -Additional Tasks: -- Clears any existing documents for clean state -- Implements comprehensive error handling and logging - -The function is called twice to set up: -1. Main collection for vector embeddings -2. Cache collection for storing results - - - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' exists.") - except Exception as e: - logging.info(f"Bucket '{bucket_name}' does not exist. 
Creating it...") - bucket_settings = CreateBucketSettings( - name=bucket_name, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - time.sleep(2) # Wait for bucket creation to complete and become available - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' created successfully.") - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == scope_name for scope in scopes) - - if not scope_exists and scope_name != "_default": - logging.info(f"Scope '{scope_name}' does not exist. Creating it...") - bucket_manager.create_scope(scope_name) - logging.info(f"Scope '{scope_name}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{collection_name}' does not exist. Creating it...") - bucket_manager.create_collection(scope_name, collection_name) - logging.info(f"Collection '{collection_name}' created successfully.") - else: - logging.info(f"Collection '{collection_name}' already exists. Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. 
The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION) - -``` - - 2025-09-15 12:43:04,085 - INFO - Bucket 'query-vector-search-testing' exists. - - - 2025-09-15 12:43:04,101 - INFO - Collection 'cohere' already exists. Skipping creation. - 2025-09-15 12:43:06,191 - INFO - All documents cleared from the collection. - 2025-09-15 12:43:06,193 - INFO - Bucket 'query-vector-search-testing' exists. - 2025-09-15 12:43:06,199 - INFO - Collection 'cache' already exists. Skipping creation. - 2025-09-15 12:43:08,367 - INFO - All documents cleared from the collection. - - - - - - - - - -# Create Embeddings -Embeddings are created using the Cohere API. Embeddings are vectors (arrays of numbers) that represent the meaning of text in a high-dimensional space. These embeddings are crucial for tasks like semantic search, where the goal is to find text that is semantically similar to a query. The script uses a pre-trained model provided by Cohere to generate embeddings for the text in the TREC dataset. - - -```python -try: - embeddings = CohereEmbeddings( - cohere_api_key=COHERE_API_KEY, - model="embed-english-v3.0", - ) - logging.info("Successfully created CohereEmbeddings") -except Exception as e: - raise ValueError(f"Error creating CohereEmbeddings: {str(e)}") -``` - - 2025-09-22 12:56:36,813 - INFO - Successfully created CohereEmbeddings - - -# Set Up Vector Store -The vector store is set up to manage the embeddings created in the previous step. The vector store is essentially a database optimized for storing and retrieving high-dimensional vectors. In this case, the vector store is built on top of Couchbase, allowing the script to store the embeddings in a way that can be efficiently searched. 
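The vector store created below is configured with `DistanceStrategy.COSINE`. As a quick intuition for what that metric measures, here is a stdlib-only sketch of cosine distance under one common definition, `1 - cosine similarity` (an illustration of the idea, not the library's internal implementation):

```python
import math

def cosine_distance(a, b):
    # Cosine distance = 1 - cosine similarity.
    # 0.0 means the vectors point in the same direction (most similar);
    # larger values mean the vectors are less similar.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # identical direction → 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors → 1.0
```

This is why, in the search results later in this tutorial, a lower distance means a closer semantic match.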
- - - -```python -try: - vector_store = CouchbaseQueryVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding = embeddings, - distance_metric=DistanceStrategy.COSINE - ) - logging.info("Successfully created vector store") -except Exception as e: - raise ValueError(f"Failed to create vector store: {str(e)}") -``` - - 2025-09-22 12:56:39,259 - INFO - Successfully created vector store - - -# Load the BBC News Dataset -To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively. - -The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version. - - -```python -try: - news_dataset = load_dataset( - "RealTimeData/bbc_news_alltime", "2024-12", split="train" - ) - print(f"Loaded the BBC News dataset with {len(news_dataset)} rows") - logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.") -except Exception as e: - raise ValueError(f"Error loading the BBC News dataset: {str(e)}") -``` - - 2025-09-15 12:43:32,383 - INFO - Successfully loaded the BBC News dataset with 2687 rows. - - - Loaded the BBC News dataset with 2687 rows - - -## Cleaning up the Data -We will use the content of the news articles for our RAG system. 
- -The dataset contains a few duplicate records. We are removing them to avoid duplicate results in the retrieval stage of our RAG system. - - -```python -news_articles = news_dataset["content"] -unique_articles = set() -for article in news_articles: - if article: - unique_articles.add(article) -unique_news_articles = list(unique_articles) -print(f"We have {len(unique_news_articles)} unique articles in our database.") -``` - - We have 1749 unique articles in our database. - - -## Saving Data to the Vector Store -To efficiently handle the large number of articles, we process them in batches of 50 articles at a time. This batch processing approach helps manage memory usage and provides better control over the ingestion process. - -We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's add_texts method, we add the filtered articles to our vector database. The batch_size parameter controls how many articles are processed in each iteration. - -This approach offers several benefits: -1. Memory Efficiency: Processing in smaller batches prevents memory overload -2. Progress Tracking: Easier to monitor and track the ingestion progress -3. Resource Management: Better control over CPU and network resource utilization - -We use a conservative batch size of 50 to ensure reliable operation. -The optimal batch size depends on many factors including: -- Document sizes being inserted -- Available system resources -- Network conditions -- Concurrent workload - -Consider measuring performance with your specific workload before adjusting. 
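The "measure before adjusting" advice can be sketched as a small timing helper. This is a hypothetical utility, not part of the tutorial's pipeline: `ingest_fn` stands in for a call like `vector_store.add_texts`.

```python
import time

def measure_ingest_rate(ingest_fn, items, batch_size):
    # Feed `items` to `ingest_fn` in batches and return items per second.
    start = time.time()
    for i in range(0, len(items), batch_size):
        ingest_fn(items[i:i + batch_size])
    elapsed = time.time() - start
    return len(items) / elapsed if elapsed > 0 else float("inf")

# Example with a no-op sink; swap in the real vector store call to compare
# candidate batch sizes such as 25, 50, and 100 on your own workload.
rate = measure_ingest_rate(lambda batch: None, ["doc"] * 200, batch_size=50)
```

Running this against the real store with a representative sample of documents gives a concrete basis for choosing a batch size.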
- - - -```python -batch_size = 50 - -# Automatic Batch Processing -articles = [article for article in unique_news_articles if article and len(article) <= 50000] - -try: - vector_store.add_texts( - texts=articles, - batch_size=batch_size - ) - logging.info("Document ingestion completed successfully.") -except Exception as e: - raise ValueError(f"Failed to save documents to vector store: {str(e)}") - -``` - - 2025-09-15 12:45:26,834 - INFO - Document ingestion completed successfully. - - -# Create Language Model (LLM) -The script initializes a Cohere language model (LLM) that will be used for generating responses to queries. LLMs are powerful tools for natural language understanding and generation, capable of producing human-like text based on input prompts. The model is configured with specific parameters, such as the temperature, which controls the randomness of its outputs. - - - -```python -try: - llm = ChatCohere( - cohere_api_key=COHERE_API_KEY, - model="command-a-03-2025", - temperature=0 - ) - logging.info("Successfully created Cohere LLM with model command") -except Exception as e: - raise ValueError(f"Error creating Cohere LLM: {str(e)}") -``` - - 2025-09-22 12:58:23,399 - INFO - Successfully created Cohere LLM with model command - - -# Perform Semantic Search -Semantic search in Couchbase involves converting queries and documents into vector representations using an embeddings model. These vectors capture the semantic meaning of the text and are stored directly in Couchbase. When a query is made, Couchbase performs a similarity search by comparing the query vector against the stored document vectors. The similarity metric used for this comparison is configurable, allowing flexibility in how the relevance of documents is determined. Common metrics include cosine similarity, Euclidean distance, or dot product, but other metrics can be implemented based on specific use cases. 
Different embedding models like BERT, Word2Vec, or GloVe can also be used depending on the application's needs, with the vectors generated by these models stored and searched within Couchbase itself. - -In the provided code, the search process begins by recording the start time, followed by executing the `similarity_search_with_score` method of the `CouchbaseQueryVectorStore`. This method searches Couchbase for the most relevant documents based on the vector similarity to the query. The search results include the document content and the distance that reflects how closely each document aligns with the query in the defined semantic space. The time taken to perform this search is then calculated and logged, and the results are displayed, showing the most relevant documents along with their similarity scores. This approach leverages Couchbase as both a storage and retrieval engine for vector data, enabling efficient and scalable semantic searches. The integration of vector storage and search capabilities within Couchbase allows for sophisticated semantic search operations without relying on external services for vector storage or comparison. - - -```python -query = "What was manchester city manager pep guardiola's reaction to the team's current form?" 
- -try: - # Perform the semantic search - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=10) - search_elapsed_time = time.time() - start_time - - logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds") - - # Display search results - print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):") - print("-" * 80) # Add separator line - for doc, score in search_results: - print(f"Distance: {score:.4f}, Text: {doc.page_content}") - print("-" * 80) # Add separator between results - -except CouchbaseException as e: - raise RuntimeError(f"Error performing semantic search: {str(e)}") -except Exception as e: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - 2025-09-22 12:59:03,622 - INFO - Semantic search completed in 1.18 seconds - - - - Semantic Search Results (completed in 1.18 seconds): - -------------------------------------------------------------------------------- - Distance: 0.3359, Text: Manchester City boss Pep Guardiola has won 18 trophies since he arrived at the club in 2016 - - Manchester City boss Pep Guardiola says he is "fine" despite admitting his sleep and diet are being affected by the worst run of results in his entire managerial career. In an interview with former Italy international Luca Toni for Amazon Prime Sport before Wednesday's Champions League defeat by Juventus, Guardiola touched on the personal impact City's sudden downturn in form has had. Guardiola said his state of mind was "ugly", that his sleep was "worse" and he was eating lighter as his digestion had suffered. City go into Sunday's derby against Manchester United at Etihad Stadium having won just one of their past 10 games. The Juventus loss means there is a chance they may not even secure a play-off spot in the Champions League. Asked to elaborate on his comments to Toni, Guardiola said: "I'm fine. "In our jobs we always want to do our best or the best as possible. 
When that doesn't happen you are more uncomfortable than when the situation is going well, always that happened. "In good moments I am happier but when I get to the next game I am still concerned about what I have to do. There is no human being that makes an activity and it doesn't matter how they do." Guardiola said City have to defend better and "avoid making mistakes at both ends". To emphasise his point, Guardiola referred back to the third game of City's current run, against a Sporting side managed by Ruben Amorim, who will be in the United dugout at the weekend. City dominated the first half in Lisbon, led thanks to Phil Foden's early effort and looked to be cruising. Instead, they conceded three times in 11 minutes either side of half-time as Sporting eventually ran out 4-1 winners. "I would like to play the game like we played in Lisbon on Sunday, believe me," said Guardiola, who is facing the prospect of only having three fit defenders for the derby as Nathan Ake and Manuel Akanji try to overcome injury concerns. If there is solace for City, it comes from the knowledge United are not exactly flying. Their comeback Europa League victory against Viktoria Plzen on Thursday was their third win of Amorim's short reign so far but only one of those successes has come in the Premier League, where United have lost their past two games against Arsenal and Nottingham Forest. Nevertheless, Guardiola can see improvements already on the red side of the city. "It's already there," he said. "You see all the patterns, the movements, the runners and the pace. He will do a good job at United, I'm pretty sure of that." - - Guardiola says skipper Kyle Walker has been offered support by the club after the City defender highlighted the racial abuse he had received on social media in the wake of the Juventus trip. "It's unacceptable," he said. "Not because it's Kyle - for any human being. "Unfortunately it happens many times in the real world. 
It is not necessary to say he has the support of the entire club. It is completely unacceptable and we give our support to him." - -------------------------------------------------------------------------------- - Distance: 0.3477, Text: 'We have to find a way' - Guardiola vows to end relegation form - - This video can not be played To play this video you need to enable JavaScript in your browser. 'Worrying' and 'staggering' - Why do Manchester City keep conceding? - - Manchester City are currently in relegation form and there is little sign of it ending. Saturday's 2-1 defeat at Aston Villa left them joint bottom of the form table over the past eight games with just Southampton for company. Saints, at the foot of the Premier League, have the same number of points, four, as City over their past eight matches having won one, drawn one and lost six - the same record as the floundering champions. And if Southampton - who appointed Ivan Juric as their new manager on Saturday - get at least a point at Fulham on Sunday, City will be on the worst run in the division. Even Wolves, who sacked boss Gary O'Neil last Sunday and replaced him with Vitor Pereira, have earned double the number of points during the same period having played a game fewer. They are damning statistics for Pep Guardiola, even if he does have some mitigating circumstances with injuries to Ederson, Nathan Ake and Ruben Dias - who all missed the loss at Villa Park - and the long-term loss of midfield powerhouse Rodri. Guardiola was happy with Saturday's performance, despite defeat in Birmingham, but there is little solace to take at slipping further out of the title race. He may have needed to field a half-fit Manuel Akanji and John Stones at Villa Park but that does not account for City looking a shadow of their former selves. 
That does not justify the error Josko Gvardiol made to gift Jhon Duran a golden chance inside the first 20 seconds, or £100m man Jack Grealish again failing to have an impact on a game. There may be legitimate reasons for City's drop off, whether that be injuries, mental fatigue or just simply a team coming to the end of its lifecycle, but their form, which has plunged off a cliff edge, would have been unthinkable as they strolled to a fourth straight title last season. "The worrying thing is the number of goals conceded," said ex-England captain Alan Shearer on BBC Match of the Day. "The number of times they were opened up because of the lack of protection and legs in midfield was staggering. There are so many things that are wrong at this moment in time." - - This video can not be played To play this video you need to enable JavaScript in your browser. Man City 'have to find a way' to return to form - Guardiola - - Afterwards Guardiola was calm, so much so it was difficult to hear him in the news conference, a contrast to the frustrated figure he cut on the touchline. He said: "It depends on us. The solution is bring the players back. We have just one central defender fit, that is difficult. We are going to try next game - another opportunity and we don't think much further than that. "Of course there are more reasons. We concede the goals we don't concede in the past, we [don't] score the goals we score in the past. Football is not just one reason. There are a lot of little factors. "Last season we won the Premier League, but we came here and lost. We have to think positive and I have incredible trust in the guys. Some of them have incredible pride and desire to do it. We have to find a way, step by step, sooner or later to find a way back." Villa boss Unai Emery highlighted City's frailties, saying he felt Villa could seize on the visitors' lack of belief. "Manchester City are a little bit under the confidence they have normally," he said. 
"The second half was different, we dominated and we scored. Through those circumstances they were feeling worse than even in the first half." - - Erling Haaland had one touch in the Villa box - - There are chinks in the armour never seen before at City under Guardiola and Erling Haaland conceded belief within the squad is low. He told TNT after the game: "Of course, [confidence levels are] not the best. We know how important confidence is and you can see that it affects every human being. That is how it is, we have to continue and stay positive even though it is difficult." Haaland, with 76 goals in 83 Premier League appearances since joining City from Borussia Dortmund in 2022, had one shot and one touch in the Villa box. His 18 touches in the whole game were the lowest of all starting players and he has been self critical, despite scoring 13 goals in the top flight this season. Over City's last eight games he has netted just twice though, but Guardiola refused to criticise his star striker. He said: "Without him we will be even worse but I like the players feeling that way. I don't agree with Erling. He needs to have the balls delivered in the right spots but he will fight for the next one." - -------------------------------------------------------------------------------- - Distance: 0.3677, Text: 'Self-doubt, errors & big changes' - inside the crisis at Man City - - - ... (output truncated for brevity) - - -# Optimizing Vector Search with Global Secondary Index (GSI) - -While the above semantic search using similarity_search_with_score works effectively, we can significantly improve query performance by leveraging Global Secondary Index (GSI) in Couchbase. 
- -Couchbase offers three types of vector indexes, but for GSI-based vector search we focus on two main types: - -Hyperscale Vector Indexes (BHIVE) -- Best for pure vector searches - content discovery, recommendations, semantic search -- High performance with low memory footprint - designed to scale to billions of vectors -- Optimized for concurrent operations - supports simultaneous searches and inserts -- Use when: You primarily perform vector-only queries without complex scalar filtering -- Ideal for: Large-scale semantic search, recommendation systems, content discovery - -Composite Vector Indexes -- Best for filtered vector searches - combines vector search with scalar value filtering -- Efficient pre-filtering - scalar attributes reduce the vector comparison scope -- Use when: Your queries combine vector similarity with scalar filters that eliminate large portions of data -- Ideal for: Compliance-based filtering, user-specific searches, time-bounded queries - -Choosing the Right Index Type -- Start with Hyperscale Vector Index for pure vector searches and large datasets -- Use Composite Vector Index when scalar filters significantly reduce your search space -- Consider your dataset size: Hyperscale scales to billions, Composite works well for tens of millions to billions - -For more details, see the [Couchbase Vector Index documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html). 
-
-
-## Understanding Index Configuration (Couchbase 8.0 Feature)
-
-The index_description parameter controls how Couchbase optimizes vector storage and search performance through centroids and quantization:
-
-Format: `IVF<centroids>,{SQ<bits>|PQ<subquantizers>x<bits>}` (the centroid count is optional)
-
-Centroids (IVF - Inverted File):
-- Controls how the dataset is subdivided for faster searches
-- More centroids = faster search, slower training
-- Fewer centroids = slower search, faster training
-- If omitted (like IVF,SQ8), Couchbase auto-selects based on dataset size
-
-Quantization Options:
-- SQ (Scalar Quantization): SQ4, SQ6, SQ8 (4, 6, or 8 bits per dimension)
-- PQ (Product Quantization): `PQ<subquantizers>x<bits>` (e.g., PQ32x8)
-- Higher values = better accuracy, larger index size
-
-Common Examples:
-- IVF,SQ8 - Auto centroids, 8-bit scalar quantization (good default)
-- IVF1000,SQ6 - 1000 centroids, 6-bit scalar quantization
-- IVF,PQ32x8 - Auto centroids, 32 subquantizers with 8 bits
-
-For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/cloud/vector-index/hyperscale-vector-index.html#algo_settings).
-
-In the code below, we demonstrate creating a BHIVE index. The create_index method takes an index type (BHIVE or COMPOSITE) and a description parameter for optimization settings. Alternatively, GSI indexes can be created manually from the Couchbase UI.
-
-
-```python
-vector_store.create_index(index_type=IndexType.BHIVE, index_name="cohere_bhive_index", index_description="IVF,SQ8")
-```
-
-The example below shows running the same similarity search, but now using the BHIVE GSI index we created above. You'll notice improved performance as the index efficiently retrieves data.
-
-**Important**: When using Composite indexes, scalar filters take precedence over vector similarity, which can improve performance for filtered searches but may miss some semantically relevant results that don't match the scalar criteria.
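To build intuition for what the `SQ8` part of a description like `IVF,SQ8` means, here is a stdlib-only sketch of scalar quantization. This illustrates the idea of mapping each float dimension onto a small integer; it is not Couchbase's actual implementation.

```python
def scalar_quantize(vector, bits=8):
    # Map each float dimension onto an unsigned integer with `bits` bits,
    # e.g. SQ8 stores each dimension in 8 bits instead of a 32-bit float.
    lo, hi = min(vector), max(vector)
    levels = (1 << bits) - 1  # 255 levels for SQ8
    scale = (hi - lo) / levels if hi != lo else 1.0
    return [round((x - lo) / scale) for x in vector]

print(scalar_quantize([-0.42, 0.0, 0.17, 0.88]))  # → [0, 82, 116, 255]
```

At 8 bits per dimension this is a 4x reduction versus 32-bit floats, which is why higher bit counts trade a larger index for better accuracy.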
- -Note: In GSI vector search, the distance represents the vector distance between the query and document embeddings. Lower distance indicate higher similarity, while higher distance indicate lower similarity. - - -```python -query = "What was manchester city manager pep guardiola's reaction to the team's current form?" - -try: - # Perform the semantic search - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=10) - search_elapsed_time = time.time() - start_time - - logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds") - - # Display search results - print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):") - print("-" * 80) # Add separator line - for doc, score in search_results: - print(f"Distance: {score:.4f}, Text: {doc.page_content}") - print("-" * 80) # Add separator between results - -except CouchbaseException as e: - raise RuntimeError(f"Error performing semantic search: {str(e)}") -except Exception as e: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - 2025-09-22 12:59:26,949 - INFO - Semantic search completed in 0.38 seconds - - - - Semantic Search Results (completed in 0.38 seconds): - -------------------------------------------------------------------------------- - Distance: 0.3359, Text: Manchester City boss Pep Guardiola has won 18 trophies since he arrived at the club in 2016 - - Manchester City boss Pep Guardiola says he is "fine" despite admitting his sleep and diet are being affected by the worst run of results in his entire managerial career. In an interview with former Italy international Luca Toni for Amazon Prime Sport before Wednesday's Champions League defeat by Juventus, Guardiola touched on the personal impact City's sudden downturn in form has had. Guardiola said his state of mind was "ugly", that his sleep was "worse" and he was eating lighter as his digestion had suffered. 
City go into Sunday's derby against Manchester United at Etihad Stadium having won just one of their past 10 games. The Juventus loss means there is a chance they may not even secure a play-off spot in the Champions League. Asked to elaborate on his comments to Toni, Guardiola said: "I'm fine. "In our jobs we always want to do our best or the best as possible. When that doesn't happen you are more uncomfortable than when the situation is going well, always that happened. "In good moments I am happier but when I get to the next game I am still concerned about what I have to do. There is no human being that makes an activity and it doesn't matter how they do." Guardiola said City have to defend better and "avoid making mistakes at both ends". To emphasise his point, Guardiola referred back to the third game of City's current run, against a Sporting side managed by Ruben Amorim, who will be in the United dugout at the weekend. City dominated the first half in Lisbon, led thanks to Phil Foden's early effort and looked to be cruising. Instead, they conceded three times in 11 minutes either side of half-time as Sporting eventually ran out 4-1 winners. "I would like to play the game like we played in Lisbon on Sunday, believe me," said Guardiola, who is facing the prospect of only having three fit defenders for the derby as Nathan Ake and Manuel Akanji try to overcome injury concerns. If there is solace for City, it comes from the knowledge United are not exactly flying. Their comeback Europa League victory against Viktoria Plzen on Thursday was their third win of Amorim's short reign so far but only one of those successes has come in the Premier League, where United have lost their past two games against Arsenal and Nottingham Forest. Nevertheless, Guardiola can see improvements already on the red side of the city. "It's already there," he said. "You see all the patterns, the movements, the runners and the pace. He will do a good job at United, I'm pretty sure of that." 
- - Guardiola says skipper Kyle Walker has been offered support by the club after the City defender highlighted the racial abuse he had received on social media in the wake of the Juventus trip. "It's unacceptable," he said. "Not because it's Kyle - for any human being. "Unfortunately it happens many times in the real world. It is not necessary to say he has the support of the entire club. It is completely unacceptable and we give our support to him." - -------------------------------------------------------------------------------- - Distance: 0.3477, Text: 'We have to find a way' - Guardiola vows to end relegation form - - This video can not be played To play this video you need to enable JavaScript in your browser. 'Worrying' and 'staggering' - Why do Manchester City keep conceding? - - Manchester City are currently in relegation form and there is little sign of it ending. Saturday's 2-1 defeat at Aston Villa left them joint bottom of the form table over the past eight games with just Southampton for company. Saints, at the foot of the Premier League, have the same number of points, four, as City over their past eight matches having won one, drawn one and lost six - the same record as the floundering champions. And if Southampton - who appointed Ivan Juric as their new manager on Saturday - get at least a point at Fulham on Sunday, City will be on the worst run in the division. Even Wolves, who sacked boss Gary O'Neil last Sunday and replaced him with Vitor Pereira, have earned double the number of points during the same period having played a game fewer. They are damning statistics for Pep Guardiola, even if he does have some mitigating circumstances with injuries to Ederson, Nathan Ake and Ruben Dias - who all missed the loss at Villa Park - and the long-term loss of midfield powerhouse Rodri. Guardiola was happy with Saturday's performance, despite defeat in Birmingham, but there is little solace to take at slipping further out of the title race. 
He may have needed to field a half-fit Manuel Akanji and John Stones at Villa Park but that does not account for City looking a shadow of their former selves. That does not justify the error Josko Gvardiol made to gift Jhon Duran a golden chance inside the first 20 seconds, or £100m man Jack Grealish again failing to have an impact on a game. There may be legitimate reasons for City's drop off, whether that be injuries, mental fatigue or just simply a team coming to the end of its lifecycle, but their form, which has plunged off a cliff edge, would have been unthinkable as they strolled to a fourth straight title last season. "The worrying thing is the number of goals conceded," said ex-England captain Alan Shearer on BBC Match of the Day. "The number of times they were opened up because of the lack of protection and legs in midfield was staggering. There are so many things that are wrong at this moment in time." - - This video can not be played To play this video you need to enable JavaScript in your browser. Man City 'have to find a way' to return to form - Guardiola - - Afterwards Guardiola was calm, so much so it was difficult to hear him in the news conference, a contrast to the frustrated figure he cut on the touchline. He said: "It depends on us. The solution is bring the players back. We have just one central defender fit, that is difficult. We are going to try next game - another opportunity and we don't think much further than that. "Of course there are more reasons. We concede the goals we don't concede in the past, we [don't] score the goals we score in the past. Football is not just one reason. There are a lot of little factors. "Last season we won the Premier League, but we came here and lost. We have to think positive and I have incredible trust in the guys. Some of them have incredible pride and desire to do it. We have to find a way, step by step, sooner or later to find a way back." 
Villa boss Unai Emery highlighted City's frailties, saying he felt Villa could seize on the visitors' lack of belief. "Manchester City are a little bit under the confidence they have normally," he said. "The second half was different, we dominated and we scored. Through those circumstances they were feeling worse than even in the first half." - - Erling Haaland had one touch in the Villa box - - There are chinks in the armour never seen before at City under Guardiola and Erling Haaland conceded belief within the squad is low. He told TNT after the game: "Of course, [confidence levels are] not the best. We know how important confidence is and you can see that it affects every human being. That is how it is, we have to continue and stay positive even though it is difficult." Haaland, with 76 goals in 83 Premier League appearances since joining City from Borussia Dortmund in 2022, had one shot and one touch in the Villa box. His 18 touches in the whole game were the lowest of all starting players and he has been self critical, despite scoring 13 goals in the top flight this season. Over City's last eight games he has netted just twice though, but Guardiola refused to criticise his star striker. He said: "Without him we will be even worse but I like the players feeling that way. I don't agree with Erling. He needs to have the balls delivered in the right spots but he will fight for the next one." - -------------------------------------------------------------------------------- - Distance: 0.3677, Text: 'Self-doubt, errors & big changes' - inside the crisis at Man City - - - ... (output truncated for brevity) - - -Note: To create a COMPOSITE index, the below code can be used. -Choose based on your specific use case and query patterns. For this tutorial's news search scenario, either index type would work, but BHIVE might be more efficient for pure semantic search across news articles. 
- - -```python -vector_store.create_index(index_type=IndexType.COMPOSITE, index_name="cohere_composite_index", index_description="IVF,SQ8") -``` - -# Set Up Cache - A cache is set up using Couchbase to store intermediate results and frequently accessed data. Caching is important for improving performance, as it reduces the need to repeatedly calculate or retrieve the same data. The cache is linked to a specific collection in Couchbase, and it is used later in the script to store the results of language model queries. - - - -```python -try: - cache = CouchbaseCache( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=CACHE_COLLECTION, - ) - logging.info("Successfully created cache") - set_llm_cache(cache) -except Exception as e: - raise ValueError(f"Failed to create cache: {str(e)}") -``` - - 2025-09-22 12:59:40,381 - INFO - Successfully created cache - - -# Retrieval-Augmented Generation (RAG) with Couchbase and Langchain -Couchbase and LangChain can be seamlessly integrated to create RAG (Retrieval-Augmented Generation) chains, enhancing the process of generating contextually relevant responses. In this setup, Couchbase serves as the vector store, where embeddings of documents are stored. When a query is made, LangChain retrieves the most relevant documents from Couchbase by comparing the query’s embedding with the stored document embeddings. These documents, which provide contextual information, are then passed to a generative language model within LangChain. - -The language model, equipped with the context from the retrieved documents, generates a response that is both informed and contextually accurate. This integration allows the RAG chain to leverage Couchbase’s efficient storage and retrieval capabilities, while LangChain handles the generation of responses based on the context provided by the retrieved documents. 
Together, they create a powerful system that can deliver highly relevant and accurate answers by combining the strengths of both retrieval and generation. - - -```python -try: - template = """You are a helpful bot. If you cannot answer based on the context provided, respond with a generic answer. Answer the question as truthfully as possible using the context below: - {context} - - Question: {question}""" - prompt = ChatPromptTemplate.from_template(template) - - rag_chain = ( - {"context": vector_store.as_retriever(), "question": RunnablePassthrough()} - | prompt - | llm - | StrOutputParser() - ) - logging.info("Successfully created RAG chain") -except Exception as e: - raise ValueError(f"Error creating RAG chain: {str(e)}") -``` - - 2025-09-15 12:53:46,979 - INFO - Successfully created RAG chain - - - -```python -start_time = time.time() -try: - rag_response = rag_chain.invoke(query) - rag_elapsed_time = time.time() - start_time - print(f"RAG Response: {rag_response}") - print(f"RAG response generated in {rag_elapsed_time:.2f} seconds") -except InternalServerFailureException as e: - if "query request rejected" in str(e): - print("Error: Search request was rejected due to rate limiting. Please try again later.") - else: - print(f"Internal server error occurred: {str(e)}") -except Exception as e: - print(f"Unexpected error occurred: {str(e)}") -``` - - RAG Response: Manchester City manager Pep Guardiola has expressed concern and frustration over the team's recent form, describing it as the "worst run of results" in his managerial career. He has admitted that the situation has affected his sleep and diet, stating that his state of mind is "ugly" and his sleep is "worse." Guardiola has also acknowledged the need for the team to defend better and avoid making mistakes at both ends of the pitch. Despite the challenges, he remains focused on finding solutions and has emphasized the importance of bringing injured players back to the squad. 
Guardiola has also highlighted the need for the team to recover its essence by improving defensive concepts and re-establishing the intensity they are known for. He has taken a self-critical approach, stating that he is "not good enough" to resolve the situation with the current group of players and has vowed to find solutions to turn the team's form around. - RAG response generated in 4.09 seconds - - -# Using Couchbase as a caching mechanism -Couchbase can be effectively used as a caching mechanism for RAG (Retrieval-Augmented Generation) responses by storing and retrieving precomputed results for specific queries. This approach enhances the system's efficiency and speed, particularly when dealing with repeated or similar queries. When a query is first processed, the RAG chain retrieves relevant documents, generates a response using the language model, and then stores this response in Couchbase, with the query serving as the key. - -For subsequent requests with the same query, the system checks Couchbase first. If a cached response is found, it is retrieved directly from Couchbase, bypassing the need to re-run the entire RAG process. This significantly reduces response time because the computationally expensive steps of document retrieval and response generation are skipped. Couchbase's role in this setup is to provide a fast and scalable storage solution for caching these responses, ensuring that frequently asked queries can be answered more quickly and efficiently. 
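Conceptually, this check-then-store behavior is a classic cache-aside pattern. The sketch below is a minimal, dict-backed illustration of that flow only — it is not the `CouchbaseCache` implementation, and `QueryCache` / `get_or_compute` are hypothetical names; in the tutorial this bookkeeping is handled transparently by `CouchbaseCache` and `set_llm_cache`.

```python
# Minimal cache-aside sketch (illustrative only; names are hypothetical).
class QueryCache:
    def __init__(self):
        self._store = {}  # stand-in for the Couchbase cache collection

    def get_or_compute(self, query, compute):
        if query in self._store:       # cache hit: skip retrieval + generation
            return self._store[query], True
        response = compute(query)      # cache miss: run the full RAG chain
        self._store[query] = response  # store the result keyed by the query
        return response, False

cache = QueryCache()
# `compute` stands in for the expensive rag_chain.invoke(query) call.
answer, hit = cache.get_or_compute("Who won?", lambda q: f"answer to: {q}")
answer2, hit2 = cache.get_or_compute("Who won?", lambda q: f"answer to: {q}")
```

The second call returns the stored response without re-running `compute`, which is exactly why the repeated queries in the next cell come back roughly an order of magnitude faster.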
- - -```python -try: -    queries = [ -        "What happened in the match between Fulham and Liverpool?", -        "What was manchester city manager pep guardiola's reaction to the team's current form?",  # Repeated query -        "What happened in the match between Fulham and Liverpool?",  # Repeated query -    ] - -    for i, query in enumerate(queries, 1): -        print(f"\nQuery {i}: {query}") -        start_time = time.time() -        response = rag_chain.invoke(query) -        elapsed_time = time.time() - start_time -        print(f"Response: {response}") -        print(f"Time taken: {elapsed_time:.2f} seconds") -except InternalServerFailureException as e: -    if "query request rejected" in str(e): -        print("Error: Search request was rejected due to rate limiting. Please try again later.") -    else: -        print(f"Internal server error occurred: {str(e)}") -except Exception as e: -    print(f"Unexpected error occurred: {str(e)}") -``` - -     - Query 1: What happened in the match between Fulham and Liverpool? -    Response: In the match between Fulham and Liverpool, Liverpool played with 10 men for 89 minutes after Andy Robertson received a red card in the 17th minute. Despite this numerical disadvantage, Liverpool managed to secure a 2-2 draw at Anfield. Fulham took the lead twice, but Liverpool responded both times, with Diogo Jota scoring an 86th-minute equalizer. The performance highlighted Liverpool's resilience and title credentials, with Fulham's Antonee Robinson praising Liverpool for not seeming like they were a man down. Liverpool maintained over 60% possession and dominated attacking metrics, showcasing their ability to fight back under adversity. -    Time taken: 2.12 seconds -     - Query 2: What was manchester city manager pep guardiola's reaction to the team's current form? -    Response: Manchester City manager Pep Guardiola has expressed concern and frustration over the team's recent form, describing it as the "worst run of results" in his managerial career.
He has admitted that the situation has affected his sleep and diet, stating that his state of mind is "ugly" and his sleep is "worse." Guardiola has also acknowledged the need for the team to defend better and avoid making mistakes at both ends of the pitch. Despite the challenges, he remains focused on finding solutions and has emphasized the importance of bringing injured players back to the squad. Guardiola has also highlighted the need for the team to recover its essence by improving defensive concepts and re-establishing the intensity they are known for. He has taken a self-critical approach, stating that he is "not good enough" to resolve the situation with the current group of players and has vowed to find solutions to turn the team's form around. -    Time taken: 0.35 seconds -     - Query 3: What happened in the match between Fulham and Liverpool? -    Response: In the match between Fulham and Liverpool, Liverpool played with 10 men for 89 minutes after Andy Robertson received a red card in the 17th minute. Despite this numerical disadvantage, Liverpool managed to secure a 2-2 draw at Anfield. Fulham took the lead twice, but Liverpool responded both times, with Diogo Jota scoring an 86th-minute equalizer. The performance highlighted Liverpool's resilience and title credentials, with Fulham's Antonee Robinson praising Liverpool for not seeming like they were a man down. Liverpool maintained over 60% possession and dominated attacking metrics, showcasing their ability to fight back under adversity. -    Time taken: 0.35 seconds - - -## Conclusion -By following these steps, you'll have a fully functional semantic search engine that leverages the strengths of Couchbase and Cohere.
This guide is designed not just to show you how to build the system, but also to explain why each step is necessary, giving you a deeper understanding of the principles behind semantic search and of how querying data efficiently through a GSI vector index can significantly improve your RAG performance. Whether you're a newcomer to software development or an experienced developer looking to expand your skills, this guide will provide you with the knowledge and tools you need to create a powerful, AI-driven search engine. diff --git a/tutorial/markdown/generated/vector-search-cookbook/crewai-fts-RAG_with_Couchbase_and_CrewAI.md b/tutorial/markdown/generated/vector-search-cookbook/crewai-fts-RAG_with_Couchbase_and_CrewAI.md deleted file mode 100644 index 0db3822..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/crewai-fts-RAG_with_Couchbase_and_CrewAI.md +++ /dev/null @@ -1,1023 +0,0 @@ ---- -# frontmatter -path: "/tutorial-crewai-couchbase-rag-with-fts" -title: Retrieval-Augmented Generation (RAG) with Couchbase and CrewAI using FTS Service -short_title: RAG with Couchbase and CrewAI using FTS -description: - - Learn how to build a semantic search engine using Couchbase and CrewAI. - - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with CrewAI's agent-based approach. - - You'll understand how to perform Retrieval-Augmented Generation (RAG) using LangChain, CrewAI, and Couchbase.
-content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - FTS - - Artificial Intelligence - - LangChain - - CrewAI -sdk_language: - - python -length: 60 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/crewai/fts/RAG_with_Couchbase_and_CrewAI.ipynb) - -# Introduction - -In this guide, we will walk you through building a powerful semantic search engine using [Couchbase](https://www.couchbase.com) as the backend database and [CrewAI](https://github.com/crewAIInc/crewAI) for agent-based RAG operations. CrewAI allows us to create specialized agents that can work together to handle different aspects of the RAG workflow, from document retrieval to response generation. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system from scratch. Alternatively if you want to perform semantic search using the GSI index, please take a look at [this.](https://developer.couchbase.com/tutorial-crewai-couchbase-rag-with-global-secondary-index) - -How to run this tutorial ----------------------- -This tutorial is available as a Jupyter Notebook (.ipynb file) that you can run -interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/crewai/fts/RAG_with_Couchbase_and_CrewAI.ipynb). - -You can either: -- Download the notebook file and run it on [Google Colab](https://colab.research.google.com) -- Run it on your system by setting up the Python environment - -Before you start ---------------- - -1. 
Create and Deploy Your Free Tier Operational cluster on [Capella](https://cloud.couchbase.com/sign-up) - - To get started with [Couchbase Capella](https://cloud.couchbase.com), create an account and use it to deploy - a forever free tier operational cluster - - This account provides you with an environment where you can explore and learn - about Capella with no time constraint - - To learn more, please follow the [Getting Started Guide](https://docs.couchbase.com/cloud/get-started/create-account.html) - -2. Couchbase Capella Configuration - When running Couchbase using Capella, the following prerequisites need to be met: - - Create the database credentials to access the required bucket (Read and Write) used in the application - - Allow access to the Cluster from the IP on which the application is running by following the [Network Security documentation](https://docs.couchbase.com/cloud/security/security.html#public-access) - -# Setting the Stage: Installing Necessary Libraries - -We'll install the following key libraries: -- `datasets`: For loading and managing our training data -- `langchain-couchbase`: To integrate Couchbase with LangChain for vector storage and caching -- `langchain-openai`: For accessing OpenAI's embedding and chat models -- `crewai`: To create and orchestrate our AI agents for RAG operations -- `python-dotenv`: For securely managing environment variables and API keys - -These libraries provide the foundation for building a semantic search engine with vector embeddings, -database integration, and agent-based RAG capabilities. - - -```python -%pip install --quiet datasets==4.1.0 langchain-couchbase==0.4.0 langchain-openai==0.3.33 crewai==0.186.1 python-dotenv==1.1.1 ipywidgets -``` - - Note: you may need to restart the kernel to use updated packages. 
- - -# Importing Necessary Libraries -The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. - - -```python -import getpass -import json -import logging -import os -import time -from datetime import timedelta - -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.diagnostics import PingState, ServiceType -from couchbase.exceptions import (InternalServerFailureException, - QueryIndexAlreadyExistsException, - ServiceUnavailableException) -from couchbase.management.buckets import CreateBucketSettings -from couchbase.management.search import SearchIndex -from couchbase.options import ClusterOptions -from datasets import load_dataset -from dotenv import load_dotenv -from crewai.tools import tool -from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore -from langchain_openai import ChatOpenAI, OpenAIEmbeddings - -from crewai import Agent, Crew, Process, Task -``` - -# Setup Logging -Logging is configured to track the progress of the script and capture any errors or warnings. - - -```python -logging.basicConfig( - level=logging.INFO, - format='%(asctime)s [%(levelname)s] %(message)s', - datefmt='%Y-%m-%d %H:%M:%S' -) - -# Suppress httpx logging -logging.getLogger('httpx').setLevel(logging.CRITICAL) -``` - -# Loading Sensitive Information -In this section, we prompt the user to input essential configuration settings needed. These settings include sensitive information like database credentials, and specific configuration names. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security. - -The script uses environment variables to store sensitive information, enhancing the overall security and maintainability of your code by avoiding hardcoded values. 
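If you go the `.env` route mentioned above, a minimal file might look like the sketch below. Every value here is a placeholder for your own deployment — nothing in it comes from the notebook itself — and the variable names match the ones read by the configuration cell. Keep this file out of version control (for example, add it to `.gitignore`).

```shell
# .env — placeholder values; replace with your own credentials
OPENAI_API_KEY=sk-your-key-here
CB_HOST=couchbase://localhost
CB_USERNAME=Administrator
CB_PASSWORD=password
CB_BUCKET_NAME=vector-search-testing
INDEX_NAME=vector_search_crew
SCOPE_NAME=shared
COLLECTION_NAME=crew
```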
- - -```python -# Load environment variables -load_dotenv("./.env") - -# Configuration -OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or input("Enter your OpenAI API key: ") -if not OPENAI_API_KEY: - raise ValueError("OPENAI_API_KEY is not set") - -CB_HOST = os.getenv('CB_HOST') or input("Enter Couchbase host (default: couchbase://localhost): ") or 'couchbase://localhost' -CB_USERNAME = os.getenv('CB_USERNAME') or input("Enter Couchbase username (default: Administrator): ") or 'Administrator' -CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass("Enter Couchbase password (default: password): ") or 'password' -CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input("Enter bucket name (default: vector-search-testing): ") or 'vector-search-testing' -INDEX_NAME = os.getenv('INDEX_NAME') or input("Enter index name (default: vector_search_crew): ") or 'vector_search_crew' -SCOPE_NAME = os.getenv('SCOPE_NAME') or input("Enter scope name (default: shared): ") or 'shared' -COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input("Enter collection name (default: crew): ") or 'crew' - -print("Configuration loaded successfully") -``` - - Configuration loaded successfully - - -# Connecting to the Couchbase Cluster -Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount. 
- - -```python -# Connect to Couchbase -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=5)) - print("Successfully connected to Couchbase") -except Exception as e: - print(f"Failed to connect to Couchbase: {str(e)}") - raise -``` - - Successfully connected to Couchbase - - -# Verifying Search Service Availability - In this section, we verify that the Couchbase Search (FTS) service is available and responding correctly. This is a crucial check because our vector search functionality depends on it. If any issues are detected with the Search service, the function will raise an exception, allowing us to catch and handle problems early before attempting vector operations. - - - -```python -def check_search_service(cluster): - """Verify search service availability using ping""" - try: - # Get ping result - ping_result = cluster.ping() - search_available = False - - # Check if search service is responding - for service_type, endpoints in ping_result.endpoints.items(): - if service_type == ServiceType.Search: - for endpoint in endpoints: - if endpoint.state == PingState.OK: - search_available = True - print(f"Search service is responding at: {endpoint.remote}") - break - break - - if not search_available: - raise RuntimeError("Search/FTS service not found or not responding") - - print("Search service check passed successfully") - except Exception as e: - print(f"Health check failed: {str(e)}") - raise -try: - check_search_service(cluster) -except Exception as e: - print(f"Failed to check search service: {str(e)}") - raise -``` - - Search service is responding at: 18.117.138.157:18094 - Search service check passed successfully - - -## Setting Up Collections in Couchbase - -The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase: - -1. 
Bucket Creation: - - Checks if specified bucket exists, creates it if not - - Sets bucket properties like RAM quota (1024MB) and replication (disabled) - - Note: If you are using Capella, create a bucket manually called vector-search-testing(or any name you prefer) with the same properties. - -2. Scope Management: - - Verifies if requested scope exists within bucket - - Creates new scope if needed (unless it's the default "_default" scope) - -3. Collection Setup: - - Checks for collection existence within scope - - Creates collection if it doesn't exist - - Waits 2 seconds for collection to be ready - -Additional Tasks: -- Creates primary index on collection for query performance -- Clears any existing documents for clean state -- Implements comprehensive error handling and logging - -The function is called twice to set up: -1. Main collection for vector embeddings -2. Cache collection for storing results - - - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' exists.") - except Exception as e: - logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...") - bucket_settings = CreateBucketSettings( - name=bucket_name, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - time.sleep(2) # Wait for bucket creation to complete and become available - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' created successfully.") - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == scope_name for scope in scopes) - - if not scope_exists and scope_name != "_default": - logging.info(f"Scope '{scope_name}' does not exist. 
Creating it...") - bucket_manager.create_scope(scope_name) - logging.info(f"Scope '{scope_name}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{collection_name}' does not exist. Creating it...") - bucket_manager.create_collection(scope_name, collection_name) - logging.info(f"Collection '{collection_name}' created successfully.") - else: - logging.info(f"Collection '{collection_name}' already exists. Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Ensure primary index exists - try: - cluster.query(f"CREATE PRIMARY INDEX IF NOT EXISTS ON `{bucket_name}`.`{scope_name}`.`{collection_name}`").execute() - logging.info("Primary index present or created successfully.") - except Exception as e: - logging.warning(f"Error creating primary index: {str(e)}") - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) - -``` - - 2025-09-17 14:34:30 [INFO] Bucket 'vector-search-testing' exists. - 2025-09-17 14:34:32 [INFO] Scope 'shared' does not exist. Creating it... - 2025-09-17 14:34:33 [INFO] Scope 'shared' created successfully. 
- 2025-09-17 14:34:34 [INFO] Collection 'crew' does not exist. Creating it... - 2025-09-17 14:34:36 [INFO] Collection 'crew' created successfully. - 2025-09-17 14:34:41 [INFO] Primary index present or created successfully. - 2025-09-17 14:34:43 [INFO] All documents cleared from the collection. - - - - - - - - - -# Configuring and Initializing Couchbase Vector Search Index for Semantic Document Retrieval - -Semantic search requires an efficient way to retrieve relevant documents based on a user's query. This is where the Couchbase Vector Search Index comes into play. In this step, we load the Vector Search Index definition from a JSON file, which specifies how the index should be structured. This includes the fields to be indexed, the dimensions of the vectors, and other parameters that determine how the search engine processes queries based on vector similarity. - -This CrewAI vector search index configuration requires specific default settings to function properly. This tutorial uses the bucket named `vector-search-testing` with the scope `shared` and collection `crew`. The configuration is set up for vectors with exactly `1536 dimensions`, using `dot product` similarity and optimized for `recall`. If you want to use a different bucket, scope, or collection, you will need to modify the index configuration accordingly. - -For more information on creating a vector search index, please follow the instructions at [Couchbase Vector Search Documentation](https://docs.couchbase.com/cloud/vector-search/create-vector-search-index-ui.html). 
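The actual `crew_index.json` shipped with the cookbook repository is not reproduced in this page, but as a rough sketch (verify every detail against the file in the repo and the Couchbase documentation), a scoped Search index definition with the settings described above — 1536 dimensions, dot product similarity, optimized for recall, on the `shared.crew` collection — typically has this shape. The field names `embedding` and `text` are assumed here because they are the LangChain vector store defaults:

```json
{
  "name": "vector_search_crew",
  "type": "fulltext-index",
  "sourceType": "gocbcore",
  "sourceName": "vector-search-testing",
  "params": {
    "doc_config": { "mode": "scope.collection.type_field" },
    "mapping": {
      "default_mapping": { "enabled": false },
      "types": {
        "shared.crew": {
          "enabled": true,
          "properties": {
            "embedding": {
              "fields": [{
                "name": "embedding",
                "type": "vector",
                "dims": 1536,
                "similarity": "dot_product",
                "vector_index_optimized_for": "recall",
                "index": true
              }]
            },
            "text": {
              "fields": [{ "name": "text", "type": "text", "index": true, "store": true }]
            }
          }
        }
      }
    }
  }
}
```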
- - -```python -# Load index definition -try: - with open('crew_index.json', 'r') as file: - index_definition = json.load(file) -except FileNotFoundError as e: - print(f"Error: crew_index.json file not found: {str(e)}") - raise -except json.JSONDecodeError as e: - print(f"Error: Invalid JSON in crew_index.json: {str(e)}") - raise -except Exception as e: - print(f"Error loading index definition: {str(e)}") - raise -``` - -# Creating or Updating Search Indexes - -With the index definition loaded, the next step is to create or update the **Vector Search Index** in Couchbase. This step is crucial because it optimizes our database for vector similarity search operations, allowing us to perform searches based on the semantic content of documents rather than just keywords. By creating or updating a Vector Search Index, we enable our search engine to handle complex queries that involve finding semantically similar documents using vector embeddings, which is essential for a robust semantic search engine. - - -```python -try: - scope_index_manager = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME).search_indexes() - - # Check if index already exists - existing_indexes = scope_index_manager.get_all_indexes() - index_name = index_definition["name"] - - if index_name in [index.name for index in existing_indexes]: - logging.info(f"Index '{index_name}' found") - else: - logging.info(f"Creating new index '{index_name}'...") - - # Create SearchIndex object from JSON definition - search_index = SearchIndex.from_json(index_definition) - - # Upsert the index (create if not exists, update if exists) - scope_index_manager.upsert_index(search_index) - logging.info(f"Index '{index_name}' successfully created/updated.") - -except QueryIndexAlreadyExistsException: - logging.info(f"Index '{index_name}' already exists. Skipping creation/update.") -except ServiceUnavailableException: - raise RuntimeError("Search service is not available. 
Please ensure the Search service is enabled in your Couchbase cluster.") -except InternalServerFailureException as e: - logging.error(f"Internal server error: {str(e)}") - raise -``` - - 2025-09-17 14:34:47 [INFO] Creating new index 'vector_search_crew'... - 2025-09-17 14:34:48 [INFO] Index 'vector_search_crew' successfully created/updated. - - -## Setting Up OpenAI Components - -This section initializes two key OpenAI components needed for our RAG system: - -1. OpenAI Embeddings: - - Uses the 'text-embedding-3-small' model - - Converts text into high-dimensional vector representations (embeddings) - - These embeddings enable semantic search by capturing the meaning of text - - Required for vector similarity search in Couchbase - -2. ChatOpenAI Language Model: - - Uses the 'gpt-4o' model - - Temperature set to 0.2 for balanced creativity and focus - - Serves as the cognitive engine for CrewAI agents - - Powers agent reasoning, decision-making, and task execution - - Enables agents to: - - Process and understand retrieved context from vector search - - Generate thoughtful responses based on that context - - Follow instructions defined in agent roles and goals - - Collaborate with other agents in the crew - - The relatively low temperature (0.2) ensures agents produce reliable, - consistent outputs while maintaining some creative problem-solving ability - -Both components require a valid OpenAI API key (OPENAI_API_KEY) for authentication. -In the CrewAI framework, the LLM acts as the "brain" for each agent, allowing them -to interpret tasks, retrieve relevant information via the RAG system, and generate -appropriate outputs based on their specialized roles and expertise. 
- - -```python -# Initialize OpenAI components -embeddings = OpenAIEmbeddings( - openai_api_key=OPENAI_API_KEY, - model="text-embedding-3-small" -) - -llm = ChatOpenAI( - openai_api_key=OPENAI_API_KEY, - model="gpt-4o", - temperature=0.2 -) - -print("OpenAI components initialized") -``` - - OpenAI components initialized - - -# Setting Up the Couchbase Vector Store -A vector store is where we'll keep our embeddings. Unlike the FTS index, which is used for text-based search, the vector store is specifically designed to handle embeddings and perform similarity searches. When a user inputs a query, the search engine converts the query into an embedding and compares it against the embeddings stored in the vector store. This allows the engine to find documents that are semantically similar to the query, even if they don't contain the exact same words. By setting up the vector store in Couchbase, we create a powerful tool that enables our search engine to understand and retrieve information based on the meaning and context of the query, rather than just the specific words used. - - -```python -# Setup vector store -vector_store = CouchbaseSearchVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding=embeddings, - index_name=INDEX_NAME, -) -print("Vector store initialized") -``` - - Vector store initialized - - -# Load the BBC News Dataset -To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively. 
-

The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version.


```python
try:
    news_dataset = load_dataset(
        "RealTimeData/bbc_news_alltime", "2024-12", split="train"
    )
    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
    logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.")
except Exception as e:
    raise ValueError(f"Error loading the BBC News dataset: {str(e)}")
```

    2025-09-17 14:35:10 [INFO] Successfully loaded the BBC News dataset with 2687 rows.


    Loaded the BBC News dataset with 2687 rows


## Cleaning up the Data
We will use the content of the news articles for our RAG system.

The dataset contains a few duplicate records. We remove them to avoid duplicate results in the retrieval stage of our RAG system.


```python
news_articles = news_dataset["content"]
unique_articles = set()
for article in news_articles:
    if article:
        unique_articles.add(article)
unique_news_articles = list(unique_articles)
print(f"We have {len(unique_news_articles)} unique articles in our database.")
```

    We have 1749 unique articles in our database.


## Saving Data to the Vector Store
To efficiently handle the large number of articles, we process them in batches of 50 articles at a time. This batch processing approach helps manage memory usage and provides better control over the ingestion process.

We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's `add_texts` method, we add the filtered articles to our vector database. The `batch_size` parameter controls how many articles are processed in each iteration. 
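The slicing behind `batch_size` can be sketched in isolation. This is a minimal, standalone illustration of the pattern; the `batched` helper name is ours, and the real batching is handled internally by `add_texts`:

```python
def batched(items, batch_size):
    """Yield successive slices of at most batch_size items each."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# 1749 articles in batches of 50 -> 35 batches, the final one holding the 49 leftovers
batches = list(batched(list(range(1749)), 50))
print(len(batches), len(batches[-1]))  # 35 49
```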
- -This approach offers several benefits: -1. Memory Efficiency: Processing in smaller batches prevents memory overload -2. Error Handling: If an error occurs, only the current batch is affected -3. Progress Tracking: Easier to monitor and track the ingestion progress -4. Resource Management: Better control over CPU and network resource utilization - -We use a conservative batch size of 50 to ensure reliable operation. -The optimal batch size depends on many factors including: -- Document sizes being inserted -- Available system resources -- Network conditions -- Concurrent workload - -Consider measuring performance with your specific workload before adjusting. - - - -```python -batch_size = 50 - -# Automatic Batch Processing -articles = [article for article in unique_news_articles if article and len(article) <= 50000] - -try: - vector_store.add_texts( - texts=articles, - batch_size=batch_size - ) - logging.info("Document ingestion completed successfully.") -except Exception as e: - raise ValueError(f"Failed to save documents to vector store: {str(e)}") -``` - - 2025-09-17 14:36:58 [INFO] Document ingestion completed successfully. - - -## Creating a Vector Search Tool -After loading our data into the vector store, we need to create a tool that can efficiently search through these vector embeddings. This involves two key components: - -### Vector Retriever -The vector retriever is configured to perform similarity searches. This creates a retriever that performs semantic similarity searches against our vector database. The similarity search finds documents whose vector embeddings are closest to the query's embedding in the vector space. 
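To make "closest in the vector space" concrete: similarity between two embeddings is typically measured with cosine similarity. A toy, dependency-free sketch (real embeddings from models like text-embedding-3-small have 1,536 dimensions, not 3):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" for two documents and a query
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {"doc_a": [0.8, 0.2, 0.1], "doc_b": [0.0, 0.1, 0.9]}

# Rank documents by similarity to the query, highest first
ranked = sorted(doc_vecs, key=lambda d: cosine_similarity(query_vec, doc_vecs[d]), reverse=True)
print(ranked)  # ['doc_a', 'doc_b']; doc_a points in nearly the same direction as the query
```

The retriever performs the same kind of ranking, only over stored embeddings inside Couchbase rather than in Python.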
-

### Search Tool
The search tool wraps the retriever in a user-friendly interface that:
- Takes a query string as input
- Passes the query to the retriever to find relevant documents
- Formats the results with clear document separation using document numbers and dividers
- Returns the formatted results as a single string with each document clearly delineated

The tool is designed to integrate seamlessly with our AI agents, providing them with reliable access to our knowledge base through vector similarity search. The tool function accepts a plain string query, which is how CrewAI agents typically invoke tools, keeping the interface simple and predictable.



```python
# Create vector retriever
retriever = vector_store.as_retriever(
    search_type="similarity",
)

# Define the search tool using the @tool decorator
@tool("vector_search")
def search_tool(query: str) -> str:
    """Search for relevant documents using vector similarity.
    Input should be a simple text query string.
    Returns a list of relevant document contents.
    Use this tool to find detailed information about topics."""
    # CrewAI passes the query to the tool as a plain string,
    # so no additional input handling is needed here.
    # Invoke the retriever
    docs = retriever.invoke(query)

    # Format the results
    formatted_docs = "\n\n".join([
        f"Document {i+1}:\n{'-'*40}\n{doc.page_content}"
        for i, doc in enumerate(docs)
    ])
    return formatted_docs
```

# Creating CrewAI Agents

We'll create two specialized AI agents using the CrewAI framework to handle different aspects of our information retrieval and analysis system:

## Research Expert Agent
This agent is designed to:
- Execute semantic searches using our vector store
- Analyze and evaluate search results
- Identify key information and insights
- Verify facts across multiple sources
- Synthesize findings into comprehensive research summaries

## Technical Writer Agent
This agent is responsible for:
- Taking research findings and structuring them logically
- Converting technical concepts into clear explanations
- Ensuring proper citation and attribution
- Maintaining an engaging yet informative tone
- Producing well-formatted final outputs

The agents work together in a coordinated way:
1. Research agent finds and analyzes relevant documents
2. Writer agent takes those findings and crafts polished responses
3. 
Both agents use a custom response template for consistent output - -This multi-agent approach allows us to: -- Leverage specialized expertise for different tasks -- Maintain high quality through separation of concerns -- Create more comprehensive and reliable outputs -- Scale the system's capabilities efficiently - - -```python -# Custom response template -response_template = """ -Analysis Results -=============== -{%- if .Response %} -{{ .Response }} -{%- endif %} - -Sources -======= -{%- for tool in .Tools %} -* {{ tool.name }} -{%- endfor %} - -Metadata -======== -* Confidence: {{ .Confidence }} -* Analysis Time: {{ .ExecutionTime }} -""" - -# Create research agent -researcher = Agent( - role='Research Expert', - goal='Find and analyze the most relevant documents to answer user queries accurately', - backstory="""You are an expert researcher with deep knowledge in information retrieval - and analysis. Your expertise lies in finding, evaluating, and synthesizing information - from various sources. You have a keen eye for detail and can identify key insights - from complex documents. You always verify information across multiple sources and - provide comprehensive, accurate analyses.""", - tools=[search_tool], - llm=llm, - verbose=True, - memory=True, - allow_delegation=False, - response_template=response_template -) - -# Create writer agent -writer = Agent( - role='Technical Writer', - goal='Generate clear, accurate, and well-structured responses based on research findings', - backstory="""You are a skilled technical writer with expertise in making complex - information accessible and engaging. You excel at organizing information logically, - explaining technical concepts clearly, and creating well-structured documents. You - ensure all information is properly cited, accurate, and presented in a user-friendly - manner. 
You have a talent for maintaining the reader's interest while conveying - detailed technical information.""", - llm=llm, - verbose=True, - memory=True, - allow_delegation=False, - response_template=response_template -) - -print("Agents created successfully") -``` - - Agents created successfully - - -## How CrewAI Agents Work in this RAG System - -### Agent-Based RAG Architecture - -This system uses a two-agent approach to implement Retrieval-Augmented Generation (RAG): - -1. **Research Expert Agent**: - - Receives the user query - - Uses the vector search tool to retrieve relevant documents from Couchbase - - Analyzes and synthesizes information from retrieved documents - - Produces a comprehensive research summary with key findings - -2. **Technical Writer Agent**: - - Takes the research summary as input - - Structures and formats the information - - Creates a polished, user-friendly response - - Ensures proper attribution and citation - -#### How the Process Works: - -1. **Query Processing**: User query is passed to the Research Agent -2. **Vector Search**: Query is converted to embeddings and matched against document vectors -3. **Document Retrieval**: Most similar documents are retrieved from Couchbase -4. **Analysis**: Research Agent analyzes documents for relevance and extracts key information -5. **Synthesis**: Research Agent combines findings into a coherent summary -6. **Refinement**: Writer Agent restructures and enhances the content -7. **Response Generation**: Final polished response is returned to the user - -This multi-agent approach separates concerns (research vs. writing) and leverages -specialized expertise for each task, resulting in higher quality responses. - - -# Testing the Search System - -Test the system with some example queries. - - -```python -def process_query(query, researcher, writer): - """ - Test the complete RAG system with a user query. - - This function tests both the vector search capability and the agent-based processing: - 1. 
Vector search: Retrieves relevant documents from Couchbase - 2. Agent processing: Uses CrewAI agents to analyze and format the response - - The function measures performance and displays detailed outputs from each step. - """ - print(f"\nQuery: {query}") - print("-" * 80) - - # Create tasks - research_task = Task( - description=f"Research and analyze information relevant to: {query}", - agent=researcher, - expected_output="A detailed analysis with key findings and supporting evidence" - ) - - writing_task = Task( - description="Create a comprehensive and well-structured response", - agent=writer, - expected_output="A clear, comprehensive response that answers the query", - context=[research_task] - ) - - # Create and execute crew - crew = Crew( - agents=[researcher, writer], - tasks=[research_task, writing_task], - process=Process.sequential, - verbose=True, - cache=True, - planning=True - ) - - try: - start_time = time.time() - result = crew.kickoff() - elapsed_time = time.time() - start_time - - print(f"\nQuery completed in {elapsed_time:.2f} seconds") - print("=" * 80) - print("RESPONSE") - print("=" * 80) - print(result) - - if hasattr(result, 'tasks_output'): - print("\n" + "=" * 80) - print("DETAILED TASK OUTPUTS") - print("=" * 80) - for task_output in result.tasks_output: - print(f"\nTask: {task_output.description[:100]}...") - print("-" * 40) - print(f"Output: {task_output.raw}") - print("-" * 40) - except Exception as e: - print(f"Error executing crew: {str(e)}") - logging.error(f"Crew execution failed: {str(e)}", exc_info=True) -``` - - -```python -# Disable logging before running the query -logging.disable(logging.CRITICAL) - -query = "What are the key details about the FA Cup third round draw? Include information about Manchester United vs Arsenal, Tamworth vs Tottenham, and other notable fixtures." -process_query(query, researcher, writer) -``` - - - Query: What are the key details about the FA Cup third round draw? 
Include information about Manchester United vs Arsenal, Tamworth vs Tottenham, and other notable fixtures. - -------------------------------------------------------------------------------- - - - -
╭──────────────────────────────────────────── Crew Execution Started ─────────────────────────────────────────────╮
-                                                                                                                 
-  Crew Execution Started                                                                                         
-  Name: crew                                                                                                     
-  ID: 02c49af6-ffe5-4bea-8cba-f3f08049625d                                                                       
-  Tool Args:                                                                                                     
-                                                                                                                 
-                                                                                                                 
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-
-    [2025-09-17 14:36:58][INFO]: Planning the crew execution
-    [EventBus Error] Handler 'on_task_started' failed for event 'TaskStartedEvent': 'NoneType' object has no attribute 'key'
-
╭──────────────────────────────────────────────── Task Completion ────────────────────────────────────────────────╮
-                                                                                                                 
-  Task Completed                                                                                                 
-  Name: 5d4df0c5-14ad-47d7-8412-2cb8438a65df                                                                     
-  Agent: Task Execution Planner                                                                                  
-  Tool Args:                                                                                                     
-                                                                                                                 
-                                                                                                                 
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-
╭──────────────────────────────────────────── 🔧 Agent Tool Execution ────────────────────────────────────────────╮
-                                                                                                                 
-  Agent: Research Expert                                                                                         
-                                                                                                                 
-  Thought: Thought: To gather detailed information about the FA Cup third round draw, specifically focusing on   
-  the matches Manchester United vs Arsenal and Tamworth vs Tottenham, I will perform a vector search using a     
-  relevant query.                                                                                                
-                                                                                                                 
-  Using Tool: vector_search                                                                                      
-                                                                                                                 
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-
╭────────────────────────────────────────────────── Tool Input ───────────────────────────────────────────────────╮
-                                                                                                                 
-  "{\"query\": \"FA Cup third round draw Manchester United vs Arsenal Tamworth vs Tottenham\"}"                  
-                                                                                                                 
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-
╭──────────────────────────────────────────────── Task Completion ────────────────────────────────────────────────╮
-                                                                                                                 
-  Task Completed                                                                                                 
-  Name: d883be8b-ac2a-4678-80b3-afdc803bd716                                                                     
-  Agent: Research Expert                                                                                         
-  Tool Args:                                                                                                     
-                                                                                                                 
-                                                                                                                 
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-
╭──────────────────────────────────────────────── Task Completion ────────────────────────────────────────────────╮
-                                                                                                                 
-  Task Completed                                                                                                 
-  Name: 674a305d-1a6f-4b60-9497-ff4140f0f473                                                                     
-  Agent: Technical Writer                                                                                        
-  Tool Args:                                                                                                     
-                                                                                                                 
-                                                                                                                 
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-
- - - - - Query completed in 38.89 seconds - ================================================================================ - RESPONSE - ================================================================================ - **FA Cup Third Round Draw: A Comprehensive Overview** - - The FA Cup third round draw is a pivotal moment in the English football calendar, marking the entry of Premier League and Championship clubs into the competition. This stage often brings thrilling encounters and the potential for giant-killing acts, capturing the imagination of fans worldwide. The significance of the third round is underscored by the rich history and tradition of the FA Cup, the world's oldest national football competition. - - **Manchester United vs Arsenal** - - One of the standout fixtures of the third round is the clash between Manchester United and Arsenal. This match is set to take place over the weekend of Saturday, 11 January. Manchester United, the current holders of the FA Cup, will travel to face Arsenal, who have won the competition a record 14 times. The match is significant as it involves two of the most successful clubs in FA Cup history, both known for their storied pasts and passionate fanbases. - - - **Date and Venue:** Weekend of Saturday, 11 January, at Arsenal's home ground. - - **Team Statistics:** Manchester United have lifted the FA Cup 13 times, while Arsenal hold the record with 14 victories. - - **Recent Form:** Manchester United recently triumphed over Manchester City to claim their 13th FA Cup title, showcasing their competitive edge. - - **Predictions and Insights:** Given the historical rivalry and the stakes involved, this fixture promises to be a fiercely contested battle, with both teams eager to progress further in the tournament. - - **Tamworth vs Tottenham** - - Another intriguing fixture is the match between non-league side Tamworth and Premier League club Tottenham Hotspur. 
Tamworth, one of only two non-league clubs remaining in the competition, will host Spurs, highlighting the classic "David vs Goliath" narrative that the FA Cup is renowned for. - - - **Date and Venue:** To be played at Tamworth's home ground over the weekend of Saturday, 11 January. - - **Team Statistics:** Tamworth is the lowest-ranked team remaining in the competition, while Tottenham is a well-established Premier League club. - - **Recent Form:** Tamworth secured their place in the third round with a dramatic penalty shootout victory against League One side Burton Albion. - - ... (output truncated for brevity) - - -## Conclusion -By following these steps, you've built a powerful RAG system that combines Couchbase's vector storage capabilities with CrewAI's agent-based architecture. This multi-agent approach separates research and writing concerns, resulting in higher quality responses to user queries. - -The system demonstrates several key advantages: -1. Efficient vector search using Couchbase's vector store -2. Specialized AI agents that focus on different aspects of the RAG pipeline -3. Collaborative workflow between agents to produce comprehensive, well-structured responses -4. Scalable architecture that can be extended with additional agents for more complex tasks - -Whether you're building a customer support system, a research assistant, or a knowledge management solution, this agent-based RAG approach provides a flexible foundation that can be adapted to various use cases and domains. 
diff --git a/tutorial/markdown/generated/vector-search-cookbook/crewai-gsi-RAG_with_Couchbase_and_CrewAI.md b/tutorial/markdown/generated/vector-search-cookbook/crewai-gsi-RAG_with_Couchbase_and_CrewAI.md deleted file mode 100644 index 51e230c..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/crewai-gsi-RAG_with_Couchbase_and_CrewAI.md +++ /dev/null @@ -1,930 +0,0 @@ ---- -# frontmatter -path: "/tutorial-crewai-couchbase-rag-with-global-secondary-index" -title: Retrieval-Augmented Generation (RAG) with Couchbase and CrewAI with GSI -short_title: RAG with Couchbase and CrewAI with GSI -description: - - Learn how to build a semantic search engine using Couchbase and CrewAI. - - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with CrewAI's agent-based approach. - - You'll understand how to perform Retrieval-Augmented Generation (RAG) using LangChain, CrewAI, and Couchbase with GSI. -content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - GSI - - Artificial Intelligence - - LangChain - - CrewAI -sdk_language: - - python -length: 60 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/crewai/gsi/RAG_with_Couchbase_and_CrewAI.ipynb) - -# Agent-Based RAG with Couchbase GSI Vector Search and CrewAI - -## Overview - -In this guide, we will walk you through building a powerful semantic search engine using [Couchbase](https://www.couchbase.com) as the backend database and [CrewAI](https://github.com/crewAIInc/crewAI) for agent-based RAG operations. CrewAI allows us to create specialized agents that can work together to handle different aspects of the RAG workflow, from document retrieval to response generation. This tutorial uses Couchbase's **Global Secondary Index (GSI)** vector search capabilities, which offer high-performance vector search optimized for large-scale applications. 
This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system from scratch. Alternatively, if you want to perform semantic search using the FTS index, please take a look at [this tutorial](https://developer.couchbase.com/tutorial-crewai-couchbase-rag-with-fts/).

## How to Run This Tutorial

This tutorial is available as a Jupyter Notebook (.ipynb file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/crewai/gsi/RAG_with_Couchbase_and_CrewAI.ipynb).

You can either:
- Download the notebook file and run it on [Google Colab](https://colab.research.google.com)
- Run it on your system by setting up the Python environment

## Prerequisites

### Couchbase Requirements

1. Create and Deploy Your Free Tier Operational cluster on [Capella](https://cloud.couchbase.com/sign-up)
   - To get started with [Couchbase Capella](https://cloud.couchbase.com), create an account and use it to deploy a free tier operational cluster
   - This account provides you with an environment where you can explore and learn about Capella
   - To learn more, please follow the [Getting Started Guide](https://docs.couchbase.com/cloud/get-started/create-account.html)
   - **Important**: This tutorial requires Couchbase Server **8.0+** for GSI vector search capabilities

### Couchbase Capella Configuration

When running Couchbase using Capella, the following prerequisites need to be met:
- Create the database credentials to access the required bucket (Read and Write) used in the application
- Allow access to the Cluster from the IP on which the application is running by following the [Network Security documentation](https://docs.couchbase.com/cloud/security/security.html#public-access)

## Setup and Installation

### Installing Necessary Libraries

We'll install the following 
key libraries: -- `datasets`: For loading and managing our training data -- `langchain-couchbase`: To integrate Couchbase with LangChain for GSI vector storage and caching -- `langchain-openai`: For accessing OpenAI's embedding and chat models -- `crewai`: To create and orchestrate our AI agents for RAG operations -- `python-dotenv`: For securely managing environment variables and API keys - -These libraries provide the foundation for building a semantic search engine with GSI vector embeddings, database integration, and agent-based RAG capabilities. - - -```python -%pip install --quiet datasets==4.1.0 langchain-couchbase==0.5.0 langchain-openai==0.3.33 crewai==0.186.1 python-dotenv==1.1.1 -``` - - Note: you may need to restart the kernel to use updated packages. - - -### Import Required Modules - -The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. 
- - -```python -import getpass -import json -import logging -import os -import time -from datetime import timedelta -from uuid import uuid4 - -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.diagnostics import PingState, ServiceType -from couchbase.exceptions import (InternalServerFailureException, - QueryIndexAlreadyExistsException, - ServiceUnavailableException, - CouchbaseException) -from couchbase.management.buckets import CreateBucketSettings -from couchbase.options import ClusterOptions -from datasets import load_dataset -from dotenv import load_dotenv -from crewai.tools import tool -from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore -from langchain_couchbase.vectorstores import DistanceStrategy, IndexType -from langchain_openai import ChatOpenAI, OpenAIEmbeddings - -from crewai import Agent, Crew, Process, Task -``` - -### Configure Logging - -Logging is configured to track the progress of the script and capture any errors or warnings. - - -```python -logging.basicConfig( - level=logging.INFO, - format='%(asctime)s [%(levelname)s] %(message)s', - datefmt='%Y-%m-%d %H:%M:%S' -) - -# Suppress httpx logging -logging.getLogger('httpx').setLevel(logging.CRITICAL) -``` - -### Load Environment Configuration - -In this section, we prompt the user to input essential configuration settings needed. These settings include sensitive information like database credentials, and specific configuration names. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security. - -The script uses environment variables to store sensitive information, enhancing the overall security and maintainability of your code by avoiding hardcoded values. 
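For reference, a minimal `.env` file for this script could look like the following. These are sample values only; replace them with your own credentials. The variable names match the ones read by the configuration cell:

```
OPENAI_API_KEY=sk-your-openai-key
CB_HOST=couchbase://localhost
CB_USERNAME=Administrator
CB_PASSWORD=password
CB_BUCKET_NAME=vector-search-testing
SCOPE_NAME=shared
COLLECTION_NAME=crew
```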
- - -```python -# Load environment variables -load_dotenv("./.env") - -# Configuration -OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or input("Enter your OpenAI API key: ") -if not OPENAI_API_KEY: - raise ValueError("OPENAI_API_KEY is not set") - -CB_HOST = os.getenv('CB_HOST') or 'couchbase://localhost' -CB_USERNAME = os.getenv('CB_USERNAME') or 'Administrator' -CB_PASSWORD = os.getenv('CB_PASSWORD') or 'password' -CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or 'vector-search-testing' -SCOPE_NAME = os.getenv('SCOPE_NAME') or 'shared' -COLLECTION_NAME = os.getenv('COLLECTION_NAME') or 'crew' - -print("Configuration loaded successfully") -``` - - Configuration loaded successfully - - -## Couchbase Connection Setup - -### Connect to Cluster - -Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount. - - -```python -# Connect to Couchbase -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=5)) - print("Successfully connected to Couchbase") -except Exception as e: - print(f"Failed to connect to Couchbase: {str(e)}") - raise -``` - - Successfully connected to Couchbase - - -### Setup Collections - -Create and configure Couchbase bucket, scope, and collection for storing our vector data. - -1. 
**Bucket Creation:**
   - Checks if the specified bucket exists and creates it if not
   - Sets bucket properties such as the RAM quota (1024 MB) and replication (disabled)
   - Note: If you are using Capella, manually create a bucket called vector-search-testing (or any name you prefer) with the same properties.

2. **Scope Management:**
   - Verifies whether the requested scope exists within the bucket
   - Creates a new scope if needed (unless it's the default "_default" scope)

3. **Collection Setup:**
   - Checks for collection existence within the scope
   - Creates the collection if it doesn't exist
   - Waits 2 seconds for the collection to be ready

**Additional Tasks:**
- Clears any existing documents for a clean state
- Implements comprehensive error handling and logging

The function is then called once to set up the collection that will hold our vector embeddings.


```python
def setup_collection(cluster, bucket_name, scope_name, collection_name):
    try:
        # Check if bucket exists, create if it doesn't
        try:
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' exists.")
        except Exception:
            logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...")
            bucket_settings = CreateBucketSettings(
                name=bucket_name,
                bucket_type='couchbase',
                ram_quota_mb=1024,
                flush_enabled=True,
                num_replicas=0
            )
            cluster.buckets().create_bucket(bucket_settings)
            time.sleep(2)  # Wait for bucket creation to complete and become available
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' created successfully.")

        bucket_manager = bucket.collections()

        # Check if scope exists, create if it doesn't
        scopes = bucket_manager.get_all_scopes()
        scope_exists = any(scope.name == scope_name for scope in scopes)

        if not scope_exists and scope_name != "_default":
            logging.info(f"Scope '{scope_name}' does not exist. Creating it...")
            bucket_manager.create_scope(scope_name)
            logging.info(f"Scope '{scope_name}' created successfully.")

        # Check if collection exists within the scope, create if it doesn't
        scopes = bucket_manager.get_all_scopes()
        collection_exists = any(
            scope.name == scope_name and collection_name in [col.name for col in scope.collections]
            for scope in scopes
        )

        if not collection_exists:
            logging.info(f"Collection '{collection_name}' does not exist. Creating it...")
            bucket_manager.create_collection(scope_name, collection_name)
            logging.info(f"Collection '{collection_name}' created successfully.")
        else:
            logging.info(f"Collection '{collection_name}' already exists. Skipping creation.")

        # Wait for collection to be ready
        collection = bucket.scope(scope_name).collection(collection_name)
        time.sleep(2)  # Give the collection time to be ready for queries

        # Clear all documents in the collection for a clean state
        try:
            query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`"
            cluster.query(query).execute()
            logging.info("All documents cleared from the collection.")
        except Exception as e:
            logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.")

        return collection
    except Exception as e:
        raise RuntimeError(f"Error setting up collection: {str(e)}")

setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME)
```

    2025-10-06 10:17:53 [INFO] Bucket 'vector-search-testing' exists.
    2025-10-06 10:17:53 [INFO] Collection 'crew' already exists. Skipping creation.
    2025-10-06 10:17:55 [INFO] All documents cleared from the collection.


## Understanding GSI Vector Search

### GSI Vector Index Configuration

Semantic search with GSI requires creating a Global Secondary Index optimized for vector operations.
Unlike FTS-based vector search, GSI vector indexes offer two distinct types optimized for different use cases:

#### GSI Vector Index Types

##### Hyperscale Vector Indexes (BHIVE)

- **Best for**: Pure vector searches such as content discovery, recommendations, and semantic search
- **Performance**: High performance with a low memory footprint, optimized for concurrent operations
- **Scalability**: Designed to scale to billions of vectors
- **Use when**: You primarily perform vector-only queries without complex scalar filtering

##### Composite Vector Indexes

- **Best for**: Filtered vector searches that combine vector similarity with scalar value filtering
- **Performance**: Efficient pre-filtering, where scalar attributes reduce the vector comparison scope
- **Use when**: Your queries combine vector similarity with scalar filters that eliminate large portions of the data
- **Note**: Scalar filters take precedence over vector similarity

#### Understanding Index Configuration

The `index_description` parameter controls how Couchbase optimizes vector storage and search through centroids and quantization:

**Format**: `IVF[<centroids>],{SQ<bits>|PQ<subquantizers>x<bits>}`

**Centroids (IVF - Inverted File):**
- Control how the dataset is subdivided for faster searches
- More centroids = faster search, slower training
- Fewer centroids = slower search, faster training
- If the centroid count is omitted (as in IVF,SQ8), Couchbase auto-selects it based on dataset size

**Quantization Options:**
- SQ (Scalar Quantization): SQ4, SQ6, or SQ8 (4, 6, or 8 bits per dimension)
- PQ (Product Quantization): PQ<subquantizers>x<bits> (e.g., PQ32x8)
- Higher values = better accuracy but a larger index

**Common Examples:**
- IVF,SQ8 - Auto-selected centroids, 8-bit scalar quantization (a good default)
- IVF1000,SQ6 - 1000 centroids, 6-bit scalar quantization
- IVF,PQ32x8 - Auto-selected centroids, 32 subquantizers with 8 bits each

For detailed configuration options, see the [Quantization & Centroid
Settings](https://docs.couchbase.com/cloud/vector-index/hyperscale-vector-index.html#algo_settings). - -For more information on GSI vector indexes, see [Couchbase GSI Vector Documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html). - - -```python -# GSI Vector Index Configuration -# Unlike FTS indexes, GSI vector indexes are created programmatically through the vector store -# We'll configure the parameters that will be used for index creation - -# Vector configuration -DISTANCE_STRATEGY = DistanceStrategy.COSINE # Cosine similarity -INDEX_TYPE = IndexType.BHIVE # Using BHIVE for high-performance vector -INDEX_DESCRIPTION = "IVF,SQ8" # Auto-selected centroids with 8-bit scalar quantization - -# To create a Composite Index instead, use the following: -# INDEX_TYPE = IndexType.COMPOSITE # Combines vector search with scalar filtering - -print("GSI vector index configuration prepared") -``` - - GSI vector index configuration prepared - - -### Alternative: Composite Index Configuration - -If your use case requires complex filtering with scalar attributes, you can create a **Composite index** instead by changing the configuration: - -```python -# Alternative configuration for Composite index -INDEX_TYPE = IndexType.COMPOSITE # Instead of IndexType.BHIVE -INDEX_DESCRIPTION = "IVF,SQ8" # Same quantization settings -DISTANCE_STRATEGY = DistanceStrategy.COSINE # Same distance metric - -# The rest of the setup remains identical -``` - -**Use Composite indexes when:** -- You need to filter by document metadata or attributes before vector similarity -- Your queries combine vector search with WHERE clauses -- You have well-defined filtering requirements that can reduce the search space - -**Note**: The index creation process is identical - just change the `INDEX_TYPE`. Composite indexes enable pre-filtering with scalar attributes, making them ideal for applications requiring complex query patterns with metadata filtering. 
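To build intuition for what the `SQ8` part of these `index_description` settings does, here is a minimal, self-contained sketch of 8-bit scalar quantization. This is purely illustrative and is not Couchbase's internal implementation; it only shows the core idea: each float dimension is mapped to one byte, shrinking the index at the cost of a small, bounded rounding error per dimension.

```python
# Toy sketch of 8-bit scalar quantization (the "SQ8" in IVF,SQ8).
# NOT Couchbase's actual implementation; illustration of the idea only.

def sq8_quantize(vector, lo=-1.0, hi=1.0):
    """Map each float in [lo, hi] to an integer code in [0, 255]."""
    scale = (hi - lo) / 255
    return [round((x - lo) / scale) for x in vector]

def sq8_dequantize(codes, lo=-1.0, hi=1.0):
    """Recover an approximation of the original floats from the codes."""
    scale = (hi - lo) / 255
    return [lo + c * scale for c in codes]

v = [0.12, -0.48, 0.91, 0.05]   # a tiny stand-in "embedding"
codes = sq8_quantize(v)
approx = sq8_dequantize(codes)

# The reconstruction error is bounded by half a quantization step.
step = 2.0 / 255
assert all(abs(a - b) <= step / 2 + 1e-9 for a, b in zip(v, approx))
```

With one byte per dimension, a 1536-dimensional float32 embedding drops from 6,144 bytes to roughly 1,536 bytes of vector data, which is why quantization matters once you store millions of vectors.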
- -## OpenAI Configuration - -This section initializes two key OpenAI components needed for our RAG system: - -1. **OpenAI Embeddings:** - - Uses the 'text-embedding-3-small' model - - Converts text into high-dimensional vector representations (embeddings) - - These embeddings enable semantic search by capturing the meaning of text - - Required for vector similarity search in Couchbase - -2. **ChatOpenAI Language Model:** - - Uses the 'gpt-4o' model - - Temperature set to 0.2 for balanced creativity and focus - - Serves as the cognitive engine for CrewAI agents - - Powers agent reasoning, decision-making, and task execution - - Enables agents to: - - Process and understand retrieved context from vector search - - Generate thoughtful responses based on that context - - Follow instructions defined in agent roles and goals - - Collaborate with other agents in the crew - - The relatively low temperature (0.2) ensures agents produce reliable, consistent outputs while maintaining some creative problem-solving ability - -Both components require a valid OpenAI API key (OPENAI_API_KEY) for authentication. -In the CrewAI framework, the LLM acts as the "brain" for each agent, allowing them to interpret tasks, retrieve relevant information via the RAG system, and generate appropriate outputs based on their specialized roles and expertise. - - -```python -# Initialize OpenAI components -embeddings = OpenAIEmbeddings( - openai_api_key=OPENAI_API_KEY, - model="text-embedding-3-small" -) - -llm = ChatOpenAI( - openai_api_key=OPENAI_API_KEY, - model="gpt-4o", - temperature=0.2 -) - -print("OpenAI components initialized") -``` - - OpenAI components initialized - - -## Document Processing and Vector Store Setup - -### Create Couchbase GSI Vector Store - -Set up the GSI vector store where we'll store document embeddings for high-performance semantic search. 
- - -```python -# Setup GSI vector store with OpenAI embeddings -try: - vector_store = CouchbaseQueryVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding=embeddings, - distance_metric=DISTANCE_STRATEGY - ) - print("GSI Vector store initialized successfully") - logging.info("GSI Vector store setup completed") -except Exception as e: - logging.error(f"Failed to initialize GSI vector store: {str(e)}") - raise RuntimeError(f"GSI Vector store initialization failed: {str(e)}") -``` - - 2025-10-06 10:18:05 [INFO] GSI Vector store setup completed - - - GSI Vector store initialized successfully - - -### Load BBC News Dataset - -To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively. - -The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version. 
```python
try:
    news_dataset = load_dataset(
        "RealTimeData/bbc_news_alltime", "2024-12", split="train"
    )
    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
    logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.")
except Exception as e:
    raise ValueError(f"Error loading the BBC News dataset: {str(e)}")
```

    2025-10-06 10:18:13 [INFO] Successfully loaded the BBC News dataset with 2687 rows.


    Loaded the BBC News dataset with 2687 rows


#### Data Cleaning

Remove duplicate articles for cleaner search results.


```python
news_articles = news_dataset["content"]
unique_articles = set()
for article in news_articles:
    if article:
        unique_articles.add(article)
unique_news_articles = list(unique_articles)
print(f"We have {len(unique_news_articles)} unique articles in our database.")
```

    We have 1749 unique articles in our database.


#### Save Data to Vector Store

To efficiently handle the large number of articles, we process them in batches. This batch processing approach helps manage memory usage and provides better control over the ingestion process.

We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's `add_texts` method, we add the filtered articles to our vector database. The `batch_size` parameter controls how many articles are processed in each iteration.

This approach offers several benefits:
1. **Memory Efficiency**: Processing in smaller batches prevents memory overload
2. **Error Handling**: If an error occurs, only the current batch is affected
3. **Progress Tracking**: Easier to monitor and track the ingestion progress
4. **Resource Management**: Better control over CPU and network resource utilization

We use a conservative batch size of 50 to ensure reliable operation.
The optimal batch size depends on many factors including document sizes, available system resources, network conditions, and concurrent workload. - - -```python -batch_size = 50 - -# Automatic Batch Processing -articles = [article for article in unique_news_articles if article and len(article) <= 50000] - -try: - vector_store.add_texts( - texts=articles, - batch_size=batch_size - ) - logging.info("Document ingestion completed successfully.") -except Exception as e: - raise ValueError(f"Failed to save documents to vector store: {str(e)}") -``` - - 2025-10-06 10:19:43 [INFO] Document ingestion completed successfully. - - -## Vector Search Performance Testing - -Now let's demonstrate the performance benefits of GSI optimization by testing pure vector search performance. We'll compare three optimization levels: - -1. **Baseline Performance**: Vector search without GSI optimization -2. **GSI-Optimized Performance**: Same search with BHIVE GSI index -3. **Cache Benefits**: Show how caching can be applied on top of GSI for repeated queries - -**Important**: This testing focuses on pure vector search performance, isolating the GSI improvements from other workflow overhead. 
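One methodological note before we start measuring: the tests below use single-shot wall-clock timings, which are easy to read but can be noisy. A generic helper like the sketch below (not part of this notebook, and using `time.perf_counter` rather than `time.time`) repeats a call and reports the best and mean timings; the same idea can be applied to timing `retriever.invoke` if you want more stable numbers.

```python
import time

def time_callable(fn, *args, runs=5, **kwargs):
    """Call fn `runs` times and return (best, mean) wall-clock seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return min(timings), sum(timings) / len(timings)

# Example with a cheap stand-in workload instead of the real retriever
best, mean = time_callable(lambda: sum(range(100_000)), runs=3)
assert 0 < best <= mean
```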
- -### Create Vector Search Function - - -```python -import time - -# Create GSI vector retriever optimized for high-performance searches -retriever = vector_store.as_retriever( - search_type="similarity", - search_kwargs={"k": 4} # Return top 4 most similar documents -) - -def test_vector_search_performance(query_text, label="Vector Search"): - """Test pure vector search performance and return timing metrics""" - print(f"\n[{label}] Testing vector search performance") - print(f"[{label}] Query: '{query_text}'") - - start_time = time.time() - - try: - # Perform vector search using the retriever - docs = retriever.invoke(query_text) - end_time = time.time() - - search_time = end_time - start_time - print(f"[{label}] Vector search completed in {search_time:.4f} seconds") - print(f"[{label}] Found {len(docs)} relevant documents") - - # Show a preview of the first result - if docs: - preview = docs[0].page_content[:100] + "..." if len(docs[0].page_content) > 100 else docs[0].page_content - print(f"[{label}] Top result preview: {preview}") - - return search_time - except Exception as e: - print(f"[{label}] Vector search failed: {str(e)}") - return None -``` - -### Test 1: Baseline Performance (No GSI Index) - -Test pure vector search performance without GSI optimization. - - -```python -# Test baseline vector search performance without GSI index -test_query = "What are the latest developments in football transfers?" -print("Testing baseline vector search performance without GSI optimization...") -baseline_time = test_vector_search_performance(test_query, "Baseline Search") -print(f"\nBaseline vector search time (without GSI): {baseline_time:.4f} seconds\n") -``` - - Testing baseline vector search performance without GSI optimization... - - [Baseline Search] Testing vector search performance - [Baseline Search] Query: 'What are the latest developments in football transfers?' 
- [Baseline Search] Vector search completed in 1.3999 seconds - [Baseline Search] Found 4 relevant documents - [Baseline Search] Top result preview: The latest updates and analysis from the BBC. - - Baseline vector search time (without GSI): 1.3999 seconds - - - -### Create BHIVE GSI Index - -Now let's create a BHIVE GSI vector index to enable high-performance vector searches. The index creation is done programmatically through the vector store, which will optimize the index settings based on our data and requirements. - - -```python -# Create GSI Vector Index for high-performance searches -print("Creating BHIVE GSI vector index...") -try: - # Create a BHIVE index optimized for pure vector searches - vector_store.create_index( - index_type=INDEX_TYPE, # BHIVE index type - index_description=INDEX_DESCRIPTION # IVF,SQ8 for optimized performance - ) - print(f"GSI Vector index created successfully") - logging.info(f"BHIVE index created with description '{INDEX_DESCRIPTION}'") - - # Wait a moment for index to be available - print("Waiting for index to become available...") - time.sleep(5) - -except Exception as e: - # Index might already exist, which is fine - if "already exists" in str(e).lower(): - print(f"GSI Vector index already exists, proceeding...") - logging.info(f"Index already exists") - else: - logging.error(f"Failed to create GSI index: {str(e)}") - raise RuntimeError(f"GSI index creation failed: {str(e)}") -``` - - Creating BHIVE GSI vector index... - - - 2025-10-06 10:20:15 [INFO] BHIVE index created with description 'IVF,SQ8' - - - GSI Vector index created successfully - Waiting for index to become available... - - -### Test 2: GSI-Optimized Performance - -Test the same vector search with BHIVE GSI optimization. 
- - -```python -# Test vector search performance with GSI index -print("Testing vector search performance with BHIVE GSI optimization...") -gsi_search_time = test_vector_search_performance(test_query, "GSI-Optimized Search") -``` - - Testing vector search performance with BHIVE GSI optimization... - - [GSI-Optimized Search] Testing vector search performance - [GSI-Optimized Search] Query: 'What are the latest developments in football transfers?' - [GSI-Optimized Search] Vector search completed in 0.5885 seconds - [GSI-Optimized Search] Found 4 relevant documents - [GSI-Optimized Search] Top result preview: Four key areas for Everton's new owners to address - - Everton fans last saw silverware in 1995 when th... - - -### Test 3: Cache Benefits Testing - -Now let's demonstrate how caching can improve performance for repeated queries. **Note**: Caching benefits apply to both baseline and GSI-optimized searches. - - -```python -# Test cache benefits with a different query to avoid interference -cache_test_query = "What happened in the latest Premier League matches?" - -print("Testing cache benefits with vector search...") -print("First execution (cache miss):") -cache_time_1 = test_vector_search_performance(cache_test_query, "Cache Test - First Run") - -print("\nSecond execution (cache hit - should be faster):") -cache_time_2 = test_vector_search_performance(cache_test_query, "Cache Test - Second Run") -``` - - Testing cache benefits with vector search... - First execution (cache miss): - - [Cache Test - First Run] Testing vector search performance - [Cache Test - First Run] Query: 'What happened in the latest Premier League matches?' - [Cache Test - First Run] Vector search completed in 0.6450 seconds - [Cache Test - First Run] Found 4 relevant documents - [Cache Test - First Run] Top result preview: Who has made Troy's Premier League team of the week? - - After every round of Premier League matches th... 
- - Second execution (cache hit - should be faster): - - [Cache Test - Second Run] Testing vector search performance - [Cache Test - Second Run] Query: 'What happened in the latest Premier League matches?' - [Cache Test - Second Run] Vector search completed in 0.4306 seconds - [Cache Test - Second Run] Found 4 relevant documents - [Cache Test - Second Run] Top result preview: Who has made Troy's Premier League team of the week? - - After every round of Premier League matches th... - - -### Vector Search Performance Analysis - -Let's analyze the vector search performance improvements across all optimization levels: - - -```python -print("\n" + "="*80) -print("VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY") -print("="*80) - -print(f"Phase 1 - Baseline Search (No GSI): {baseline_time:.4f} seconds") -print(f"Phase 2 - GSI-Optimized Search: {gsi_search_time:.4f} seconds") -if cache_time_1 and cache_time_2: - print(f"Phase 3 - Cache Benefits:") - print(f" First execution (cache miss): {cache_time_1:.4f} seconds") - print(f" Second execution (cache hit): {cache_time_2:.4f} seconds") - -print("\n" + "-"*80) -print("VECTOR SEARCH OPTIMIZATION IMPACT:") -print("-"*80) - -# GSI improvement analysis -if baseline_time and gsi_search_time: - speedup = baseline_time / gsi_search_time if gsi_search_time > 0 else float('inf') - time_saved = baseline_time - gsi_search_time - percent_improvement = (time_saved / baseline_time) * 100 - print(f"GSI Index Benefit: {speedup:.2f}x faster ({percent_improvement:.1f}% improvement)") - -# Cache improvement analysis -if cache_time_1 and cache_time_2 and cache_time_2 < cache_time_1: - cache_speedup = cache_time_1 / cache_time_2 - cache_improvement = ((cache_time_1 - cache_time_2) / cache_time_1) * 100 - print(f"Cache Benefit: {cache_speedup:.2f}x faster ({cache_improvement:.1f}% improvement)") -else: - print(f"Cache Benefit: Variable (depends on query complexity and caching mechanism)") - -print(f"\nKey Insights for Vector Search Performance:") 
-print(f"• GSI BHIVE indexes provide significant performance improvements for vector similarity search") -print(f"• Performance gains are most dramatic for complex semantic queries") -print(f"• BHIVE optimization is particularly effective for high-dimensional embeddings") -print(f"• Combined with proper quantization (SQ8), GSI delivers production-ready performance") -print(f"• These performance improvements directly benefit any application using the vector store") -``` - - - ================================================================================ - VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY - ================================================================================ - Phase 1 - Baseline Search (No GSI): 1.3999 seconds - Phase 2 - GSI-Optimized Search: 0.5885 seconds - Phase 3 - Cache Benefits: - First execution (cache miss): 0.6450 seconds - Second execution (cache hit): 0.4306 seconds - - -------------------------------------------------------------------------------- - VECTOR SEARCH OPTIMIZATION IMPACT: - -------------------------------------------------------------------------------- - GSI Index Benefit: 2.38x faster (58.0% improvement) - Cache Benefit: 1.50x faster (33.2% improvement) - - Key Insights for Vector Search Performance: - • GSI BHIVE indexes provide significant performance improvements for vector similarity search - • Performance gains are most dramatic for complex semantic queries - • BHIVE optimization is particularly effective for high-dimensional embeddings - • Combined with proper quantization (SQ8), GSI delivers production-ready performance - • These performance improvements directly benefit any application using the vector store - - -## CrewAI Agent Setup - -### What is CrewAI? - -Now that we've optimized our vector search performance, let's build a sophisticated agent-based RAG system using CrewAI. 
CrewAI enables us to create specialized AI agents that collaborate to handle different aspects of the RAG workflow:

- **Research Agent**: Finds and analyzes relevant documents using our optimized vector search
- **Writer Agent**: Takes research findings and creates polished, structured responses
- **Collaborative Workflow**: Agents work together, with the writer building on the researcher's findings

This multi-agent approach produces higher-quality responses than single-agent systems by separating research and writing expertise, while benefiting from the GSI performance improvements we just demonstrated.

### Create Vector Search Tool


```python
# Define the GSI vector search tool using the @tool decorator
@tool("gsi_vector_search")
def search_tool(query: str) -> str:
    """Search for relevant documents using GSI vector similarity.
    Input should be a simple text query string.
    Returns the contents of relevant documents found via GSI vector search,
    formatted as a single text block.
    Use this tool to find detailed information about topics using high-performance GSI indexes."""

    # Invoke the GSI vector retriever (now optimized with the BHIVE index)
    docs = retriever.invoke(query)

    # Format the retrieved documents into a readable block for the agent
    formatted_docs = "\n\n".join([
        f"Document {i+1}:\n{'-'*40}\n{doc.page_content}"
        for i, doc in enumerate(docs)
    ])
    return formatted_docs
```

### Create CrewAI Agents


```python
# Create research agent
researcher = Agent(
    role='Research Expert',
    goal='Find and analyze the most relevant documents to answer user queries accurately',
    backstory="""You are an expert researcher with deep knowledge in information retrieval
    and analysis. Your expertise lies in finding, evaluating, and synthesizing information
    from various sources. You have a keen eye for detail and can identify key insights
    from complex documents.
You always verify information across multiple sources and - provide comprehensive, accurate analyses.""", - tools=[search_tool], - llm=llm, - verbose=False, - memory=True, - allow_delegation=False -) - -# Create writer agent -writer = Agent( - role='Technical Writer', - goal='Generate clear, accurate, and well-structured responses based on research findings', - backstory="""You are a skilled technical writer with expertise in making complex - information accessible and engaging. You excel at organizing information logically, - explaining technical concepts clearly, and creating well-structured documents. You - ensure all information is properly cited, accurate, and presented in a user-friendly - manner. You have a talent for maintaining the reader's interest while conveying - detailed technical information.""", - llm=llm, - verbose=False, - memory=True, - allow_delegation=False -) - -print("CrewAI agents created successfully with optimized GSI vector search") -``` - - CrewAI agents created successfully with optimized GSI vector search - - -### How the Optimized RAG Workflow Works - -The complete optimized RAG process: -1. **User Query** → Research Agent -2. **Vector Search** → GSI BHIVE index finds similar documents (now with proven performance improvements) -3. **Document Analysis** → Research Agent analyzes and synthesizes findings -4. **Response Writing** → Writer Agent creates polished, structured response -5. **Final Output** → User receives comprehensive, well-formatted answer - -**Key Benefit**: The vector search performance improvements we demonstrated directly enhance the agent workflow efficiency. - -## CrewAI Agent Demo - -Now let's demonstrate the complete optimized agent-based RAG system in action, benefiting from the GSI performance improvements we validated earlier. 
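Before running the full crew, the data flow in those five steps can be sketched without CrewAI at all. Everything below is a hypothetical stand-in: a canned two-document corpus replaces the GSI retriever, and plain functions replace the LLM-backed agents. The point is only to make the hand-offs between the steps explicit.

```python
# Stand-ins mirroring the workflow: query -> search -> research -> writing.
# No CrewAI or LLM calls involved; this is a data-flow illustration only.

def vector_search(query):
    """Stand-in for the GSI-backed retriever (step 2)."""
    corpus = {
        "fa cup": "The FA Cup third round draw paired several local rivals.",
        "transfers": "Clubs were active early in the winter transfer window.",
    }
    return [text for key, text in corpus.items() if key in query.lower()]

def research_agent(query):
    """Step 3: gather and condense the retrieved context."""
    docs = vector_search(query)
    return " ".join(docs) if docs else "No relevant documents found."

def writer_agent(query, findings):
    """Step 4: turn the research findings into a user-facing answer."""
    return f"Q: {query}\nA (based on research): {findings}"

query = "Tell me about the FA Cup draw"
answer = writer_agent(query, research_agent(query))
assert "third round draw" in answer
```

In the real system, `vector_search` is the BHIVE-backed retriever and each agent call is an LLM invocation, but the hand-off structure is identical.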
- -### Demo Function - - -```python -def process_interactive_query(query, researcher, writer): - """Run complete RAG workflow with CrewAI agents using optimized GSI vector search""" - print(f"\nProcessing Query: {query}") - print("=" * 80) - - # Create tasks - research_task = Task( - description=f"Research and analyze information relevant to: {query}", - agent=researcher, - expected_output="A detailed analysis with key findings" - ) - - writing_task = Task( - description="Create a comprehensive response", - agent=writer, - expected_output="A clear, well-structured answer", - context=[research_task] - ) - - # Execute crew - crew = Crew( - agents=[researcher, writer], - tasks=[research_task, writing_task], - process=Process.sequential, - verbose=True, - cache=True, - planning=True - ) - - try: - start_time = time.time() - result = crew.kickoff() - elapsed_time = time.time() - start_time - - print(f"\nCompleted in {elapsed_time:.2f} seconds") - print("=" * 80) - print("RESPONSE") - print("=" * 80) - print(result) - - return elapsed_time - except Exception as e: - print(f"Error: {str(e)}") - return None -``` - -### Run Agent-Based RAG Demo - - -```python -# Disable logging for cleaner output -logging.disable(logging.CRITICAL) - -# Run demo with a sample query -demo_query = "What are the key details about the FA Cup third round draw?" -final_time = process_interactive_query(demo_query, researcher, writer) - -if final_time: - print(f"\n\n✅ CrewAI agent demo completed successfully in {final_time:.2f} seconds") -``` - -## Conclusion - -You have successfully built a powerful agent-based RAG system that combines Couchbase's high-performance GSI vector storage capabilities with CrewAI's multi-agent architecture. This tutorial demonstrated the complete pipeline from data ingestion to intelligent response generation, with real performance benchmarks showing the dramatic improvements GSI indexing provides. 
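As a final sanity check on those benchmarks, the speedup and improvement figures in the performance summary are plain ratios of the measured times and can be recomputed directly (the times below are the ones printed earlier in this notebook):

```python
# Recompute the reported speedup figures from the measured times.
baseline, gsi = 1.3999, 0.5885
cache_miss, cache_hit = 0.6450, 0.4306

gsi_speedup = baseline / gsi
gsi_improvement = (baseline - gsi) / baseline * 100
cache_speedup = cache_miss / cache_hit
cache_improvement = (cache_miss - cache_hit) / cache_miss * 100

print(f"GSI: {gsi_speedup:.2f}x faster ({gsi_improvement:.1f}% improvement)")
print(f"Cache: {cache_speedup:.2f}x faster ({cache_improvement:.1f}% improvement)")
# GSI: 2.38x faster (58.0% improvement)
# Cache: 1.50x faster (33.2% improvement)
```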
diff --git a/tutorial/markdown/generated/vector-search-cookbook/crewai-short-term-memory-fts-CouchbaseStorage_Demo.md b/tutorial/markdown/generated/vector-search-cookbook/crewai-short-term-memory-fts-CouchbaseStorage_Demo.md
deleted file mode 100644
index 4896c6e..0000000
--- a/tutorial/markdown/generated/vector-search-cookbook/crewai-short-term-memory-fts-CouchbaseStorage_Demo.md
+++ /dev/null
@@ -1,846 +0,0 @@
---
# frontmatter
path: "/tutorial-crewai-short-term-memory-couchbase-with-fts"
title: Implementing Short-Term Memory for CrewAI Agents with Couchbase using the FTS Service
short_title: CrewAI Short-Term Memory with Couchbase using FTS
description:
  - Learn how to implement short-term memory for CrewAI agents using Couchbase's FTS-based vector search capabilities.
  - This tutorial demonstrates how to store and retrieve agent interactions using semantic search.
  - You'll understand how to enhance CrewAI agents with memory capabilities using LangChain and Couchbase.
content_type: tutorial
filter: sdk
technology:
  - vector search
tags:
  - FTS
  - Artificial Intelligence
  - LangChain
  - CrewAI
sdk_language:
  - python
length: 45 Mins
---




[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/crewai-short-term-memory/fts/CouchbaseStorage_Demo.ipynb)

# CrewAI with Couchbase Short-Term Memory

This notebook demonstrates how to implement a custom storage backend for CrewAI's memory system using Couchbase and vector search. Alternatively, if you want to perform semantic search using a GSI index, take a look at [this tutorial](https://developer.couchbase.com/tutorial-crewai-short-term-memory-couchbase-with-global-secondary-index).

Here's a breakdown of each section:

How to run this tutorial
------------------------
This tutorial is available as a Jupyter Notebook (.ipynb file) that you can run
interactively.
You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/crewai-short-term-memory/fts/CouchbaseStorage_Demo.ipynb). - -You can either: -- Download the notebook file and run it on [Google Colab](https://colab.research.google.com) -- Run it on your system by setting up the Python environment - -Before you start ---------------- - -1. Create and Deploy Your Free Tier Operational cluster on [Capella](https://cloud.couchbase.com/sign-up) - - To get started with [Couchbase Capella](https://cloud.couchbase.com), create an account and use it to deploy - a forever free tier operational cluster - - This account provides you with an environment where you can explore and learn - about Capella with no time constraint - - To learn more, please follow the [Getting Started Guide](https://docs.couchbase.com/cloud/get-started/create-account.html) - -2. Couchbase Capella Configuration - When running Couchbase using Capella, the following prerequisites need to be met: - - Create the database credentials to access the required bucket (Read and Write) - used in the application - - Allow access to the Cluster from the IP on which the application is running by following the [Network Security documentation](https://docs.couchbase.com/cloud/security/security.html#public-access) - -# Memory in AI Agents - -Memory in AI agents is a crucial capability that allows them to retain and utilize information across interactions, making them more effective and contextually aware. Without memory, agents would be limited to processing only the immediate input, lacking the ability to build upon past experiences or maintain continuity in conversations. - -> Note: This section on memory types and functionality is adapted from the CrewAI documentation. 
- -## Types of Memory in AI Agents - -### Short-term Memory -- Retains recent interactions and context -- Typically spans the current conversation or session -- Helps maintain coherence within a single interaction flow -- In CrewAI, this is what we're implementing with the Couchbase storage - -### Long-term Memory -- Stores persistent knowledge across multiple sessions -- Enables agents to recall past interactions even after long periods -- Helps build cumulative knowledge about users, preferences, and past decisions -- While this implementation is labeled as "short-term memory", the Couchbase storage backend can be effectively used for long-term memory as well, thanks to Couchbase's persistent storage capabilities and enterprise-grade durability features - - - -## How Memory Works in Agents -Memory in AI agents typically involves: -- Storage: Information is encoded and stored in a database (like Couchbase, ChromaDB, or other vector stores) -- Retrieval: Relevant memories are fetched based on semantic similarity to current context -- Integration: Retrieved memories are incorporated into the agent's reasoning process - -In the CrewAI example, the CouchbaseStorage class implements: -- save(): Stores new memories with metadata -- search(): Retrieves relevant memories based on semantic similarity -- reset(): Clears stored memories when needed - -## Benefits of Memory in AI Agents -- Contextual Understanding: Agents can refer to previous parts of a conversation -- Personalization: Remembering user preferences and past interactions -- Learning and Adaptation: Building knowledge over time to improve responses -- Task Continuity: Resuming complex tasks across multiple interactions -- Collaboration: In multi-agent systems like CrewAI, memory enables agents to build on each other's work - -## Memory in CrewAI Specifically -In CrewAI, memory serves several important functions: -- Agent Specialization: Each agent can maintain its own memory relevant to its expertise -- 
Knowledge Transfer: Agents can share insights through memory when collaborating on tasks -- Process Continuity: In sequential processes, later agents can access the work of earlier agents -- Contextual Awareness: Agents can reference previous findings when making decisions - -The vector-based approach (using embeddings) is particularly powerful because it allows for semantic search - finding memories that are conceptually related to the current context, not just exact keyword matches. - -By implementing custom storage like Couchbase, you gain additional benefits like persistence, scalability, and the ability to leverage enterprise-grade database features for your agent memory systems. - -## Install Required Libraries - -This section installs the necessary Python packages: -- `crewai`: The main CrewAI framework -- `langchain-couchbase`: LangChain integration for Couchbase -- `langchain-openai`: LangChain integration for OpenAI -- `python-dotenv`: For loading environment variables - - -```python -%pip install --quiet crewai==0.186.1 langchain-couchbase==0.4.0 langchain-openai==0.3.33 python-dotenv==1.1.1 -``` - - Note: you may need to restart the kernel to use updated packages. - - -## Importing Necessary Libraries - -The script begins by importing the libraries this tutorial relies on: JSON handling, logging, timing utilities, the Couchbase Python SDK, the LangChain Couchbase vector store, OpenAI embeddings, and the CrewAI agent and memory classes.
- - -```python -from typing import Any, Dict, List, Optional -import os -import logging -from datetime import timedelta -from dotenv import load_dotenv -from crewai.memory.storage.rag_storage import RAGStorage -from crewai.memory.short_term.short_term_memory import ShortTermMemory -from crewai import Agent, Crew, Task, Process -from couchbase.cluster import Cluster -from couchbase.options import ClusterOptions -from couchbase.auth import PasswordAuthenticator -from couchbase.diagnostics import PingState, ServiceType -from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore -from langchain_openai import OpenAIEmbeddings, ChatOpenAI -import time -import json -import uuid - -# Configure logging (disabled) -logging.basicConfig(level=logging.CRITICAL) -logger = logging.getLogger(__name__) -``` - -## Loading Sensitive Information - -In this section, we load the essential configuration settings the notebook needs. These settings include sensitive information such as database credentials and specific resource names. Instead of hardcoding these details into the script, we read them from environment variables at runtime, ensuring flexibility and security. - -Using environment variables for sensitive information enhances the overall security and maintainability of your code by avoiding hardcoded values. - -### Setting Up Environment Variables - -> **Note:** This implementation reads configuration parameters from environment variables.
Before running this notebook, you need to set the following environment variables: -> -> - `OPENAI_API_KEY`: Your OpenAI API key for generating embeddings -> - `CB_HOST`: Couchbase cluster connection string (e.g., "couchbases://cb.example.com") -> - `CB_USERNAME`: Username for Couchbase authentication -> - `CB_PASSWORD`: Password for Couchbase authentication -> - `CB_BUCKET_NAME` (optional): Bucket name (defaults to "vector-search-testing") -> - `SCOPE_NAME` (optional): Scope name (defaults to "shared") -> - `COLLECTION_NAME` (optional): Collection name (defaults to "crew") -> - `INDEX_NAME` (optional): Vector search index name (defaults to "vector_search_crew") -> -> You can set these variables in a `.env` file in the same directory as this notebook, or set them directly in your environment. - - -```python -load_dotenv("./.env") - -# Verify environment variables -required_vars = ['OPENAI_API_KEY', 'CB_HOST', 'CB_USERNAME', 'CB_PASSWORD'] -for var in required_vars: - if not os.getenv(var): - raise ValueError(f"{var} environment variable is required") -``` - -## Implement CouchbaseStorage - -This section demonstrates the implementation of a custom vector storage solution using Couchbase: - -> **Note on Implementation:** This example uses the LangChain Couchbase integration (`langchain_couchbase`) for simplicity and to demonstrate integration with the broader LangChain ecosystem. In production environments, you may want to use the Couchbase SDK directly for better performance and more control. - -> For more information on using the Couchbase SDK directly, refer to: -> - [Couchbase Python SDK Documentation](https://docs.couchbase.com/python-sdk/current/howtos/full-text-searching-with-sdk.html#single-vector-query) - - -```python -class CouchbaseStorage(RAGStorage): - """ - Extends RAGStorage to handle embeddings for memory entries using Couchbase. 
- """ - - def __init__(self, type: str, allow_reset: bool = True, embedder_config: Optional[Dict[str, Any]] = None, crew: Optional[Any] = None): - """Initialize CouchbaseStorage with configuration.""" - super().__init__(type, allow_reset, embedder_config, crew) - self._initialize_app() - - def search( - self, - query: str, - limit: int = 3, - filter: Optional[dict] = None, - score_threshold: float = 0, - ) -> List[Dict[str, Any]]: - """ - Search memory entries using vector similarity. - """ - try: - # Add type filter - search_filter = {"memory_type": self.type} - if filter: - search_filter.update(filter) - - # Execute search - results = self.vector_store.similarity_search_with_score( - query, - k=limit, - filter=search_filter - ) - - # Format results and deduplicate by content - seen_contents = set() - formatted_results = [] - - for i, (doc, score) in enumerate(results): - if score >= score_threshold: - content = doc.page_content - if content not in seen_contents: - seen_contents.add(content) - formatted_results.append({ - "id": doc.metadata.get("memory_id", str(i)), - "metadata": doc.metadata, - "context": content, - "score": float(score) - }) - - logger.info(f"Found {len(formatted_results)} unique results for query: {query}") - return formatted_results - - except Exception as e: - logger.error(f"Search failed: {str(e)}") - return [] - - def save(self, value: Any, metadata: Dict[str, Any]) -> None: - """ - Save a memory entry with metadata. 
- """ - try: - # Generate unique ID - memory_id = str(uuid.uuid4()) - timestamp = int(time.time() * 1000) - - # Prepare metadata (create a copy to avoid modifying references) - if not metadata: - metadata = {} - else: - metadata = metadata.copy() - - # Process agent-specific information if present - agent_name = metadata.get('agent', 'unknown') - - # Clean up value if it has the typical "Thought:/Final Answer:" LLM format; - # splitting on "Final Answer:" also covers responses that start with "Thought:" - value_str = str(value) - if "Final Answer:" in value_str: - value = value_str.split("Final Answer:", 1)[1].strip() - logger.info(f"Cleaned up response format for agent: {agent_name}") - - # Update metadata - metadata.update({ - "memory_id": memory_id, - "memory_type": self.type, - "timestamp": timestamp, - "source": "crewai" - }) - - # Log memory information for debugging - value_preview = str(value)[:100] + "..." 
if len(str(value)) > 100 else str(value) - metadata_preview = {k: v for k, v in metadata.items() if k != "embedding"} - logger.info(f"Saving memory for Agent: {agent_name}") - logger.info(f"Memory value preview: {value_preview}") - logger.info(f"Memory metadata: {metadata_preview}") - - # Convert value to string if needed - if isinstance(value, (dict, list)): - value = json.dumps(value) - elif not isinstance(value, str): - value = str(value) - - # Save to vector store - self.vector_store.add_texts( - texts=[value], - metadatas=[metadata], - ids=[memory_id] - ) - logger.info(f"Saved memory {memory_id}: {value[:100]}...") - - except Exception as e: - logger.error(f"Save failed: {str(e)}") - raise - - def reset(self) -> None: - """Reset the memory storage if allowed.""" - if not self.allow_reset: - return - - try: - # Delete documents of this memory type - self.cluster.query( - f"DELETE FROM `{self.bucket_name}`.`{self.scope_name}`.`{self.collection_name}` WHERE memory_type = $type", - type=self.type - ).execute() - logger.info(f"Reset memory type: {self.type}") - except Exception as e: - logger.error(f"Reset failed: {str(e)}") - raise - - def _initialize_app(self): - """Initialize Couchbase connection and vector store.""" - try: - # Initialize embeddings - if self.embedder_config and self.embedder_config.get("provider") == "openai": - self.embeddings = OpenAIEmbeddings( - openai_api_key=os.getenv('OPENAI_API_KEY'), - model=self.embedder_config.get("config", {}).get("model", "text-embedding-3-small") - ) - else: - self.embeddings = OpenAIEmbeddings( - openai_api_key=os.getenv('OPENAI_API_KEY'), - model="text-embedding-3-small" - ) - - # Connect to Couchbase - auth = PasswordAuthenticator( - os.getenv('CB_USERNAME', ''), - os.getenv('CB_PASSWORD', '') - ) - options = ClusterOptions(auth) - - # Initialize cluster connection - self.cluster = Cluster(os.getenv('CB_HOST', ''), options) - self.cluster.wait_until_ready(timedelta(seconds=5)) - - # Check search service - 
ping_result = self.cluster.ping() - search_available = False - for service_type, endpoints in ping_result.endpoints.items(): - if service_type == ServiceType.Search: - for endpoint in endpoints: - if endpoint.state == PingState.OK: - search_available = True - logger.info(f"Search service is responding at: {endpoint.remote}") - break - break - if not search_available: - raise RuntimeError("Search/FTS service not found or not responding") - - # Set up storage configuration - self.bucket_name = os.getenv('CB_BUCKET_NAME', 'vector-search-testing') - self.scope_name = os.getenv('SCOPE_NAME', 'shared') - self.collection_name = os.getenv('COLLECTION_NAME', 'crew') - self.index_name = os.getenv('INDEX_NAME', 'vector_search_crew') - - # Initialize vector store - self.vector_store = CouchbaseSearchVectorStore( - cluster=self.cluster, - bucket_name=self.bucket_name, - scope_name=self.scope_name, - collection_name=self.collection_name, - embedding=self.embeddings, - index_name=self.index_name - ) - logger.info(f"Initialized CouchbaseStorage for type: {self.type}") - - except Exception as e: - logger.error(f"Initialization failed: {str(e)}") - raise -``` - -## Test Basic Storage - -Test storing and retrieving a simple memory: - - -```python -# Initialize storage -storage = CouchbaseStorage( - type="short_term", - embedder_config={ - "provider": "openai", - "config": {"model": "text-embedding-3-small"} - } -) - -# Reset storage -storage.reset() - -# Test storage -test_memory = "Pep Guardiola praised Manchester City's current form, saying 'The team is playing well, we are in a good moment. 
The way we are training, the way we are playing - I am really pleased.'" -test_metadata = {"category": "sports", "test": "initial_memory"} -storage.save(test_memory, test_metadata) - -# Test search -results = storage.search("What did Guardiola say about Manchester City?", limit=1) -for result in results: - print(f"Found: {result['context']}\nScore: {result['score']}\nMetadata: {result['metadata']}") -``` - -## Test CrewAI Integration - -Create agents and tasks to test memory retention: - - -```python -# Initialize ShortTermMemory with our storage -memory = ShortTermMemory(storage=storage) - -# Initialize language model -llm = ChatOpenAI( - model="gpt-4o", - temperature=0.7 -) - -# Create agents with memory -sports_analyst = Agent( - role='Sports Analyst', - goal='Analyze Manchester City performance', - backstory='Expert at analyzing football teams and providing insights on their performance', - llm=llm, - memory=True, - memory_storage=memory -) - -journalist = Agent( - role='Sports Journalist', - goal='Create engaging football articles', - backstory='Experienced sports journalist who specializes in Premier League coverage', - llm=llm, - memory=True, - memory_storage=memory -) - -# Create tasks -analysis_task = Task( - description='Analyze Manchester City\'s recent performance based on Pep Guardiola\'s comments: "The team is playing well, we are in a good moment. The way we are training, the way we are playing - I am really pleased."', - agent=sports_analyst, - expected_output="A comprehensive analysis of Manchester City's current form based on Guardiola's comments." -) - -writing_task = Task( - description='Write a sports article about Manchester City\'s form using the analysis and Guardiola\'s comments.', - agent=journalist, - context=[analysis_task], - expected_output="An engaging sports article about Manchester City's current form and Guardiola's perspective." 
-) - -# Create crew with memory -crew = Crew( - agents=[sports_analyst, journalist], - tasks=[analysis_task, writing_task], - process=Process.sequential, - memory=True, - short_term_memory=memory, # Explicitly pass our memory implementation - verbose=True -) - -# Run the crew -result = crew.kickoff() - -print("\nCrew Result:") -print("-" * 80) -print(result) -print("-" * 80) -``` - - -
╭──────────────────────────────────────────── Crew Execution Started ─────────────────────────────────────────────╮
-                                                                                                                 
-  Crew Execution Started                                                                                         
-  Name: crew                                                                                                     
-  ID: 7ac56ae1-b62f-4b07-952c-104a7243edb0                                                                       
-  Tool Args:                                                                                                     
-                                                                                                                 
-                                                                                                                 
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-
╭────────────────────────────────────────────── 🧠 Retrieved Memory ──────────────────────────────────────────────╮
-                                                                                                                 
-  Historical Data:                                                                                               
-  - Ensure that the analysis contains specific examples or statistics to support the claims made about team      
-  performance.                                                                                                   
-  - Include insights from other sources or viewpoints to provide a well-rounded analysis.                        
-  - Provide a comparison with past performance to highlight improvements or consistencies.                       
-  - Include player-specific analysis if individual performance is hinted at in the comments.                     
-  Entities:                                                                                                      
-  - Pep Guardiola(Football Manager): The current manager of Manchester City, known fo...                         
-                                                                                                                 
-╰─────────────────────────────────────────── Retrieval Time: 1384.18ms ───────────────────────────────────────────╯
-
╭─────────────────────────────────────────────── 🤖 Agent Started ────────────────────────────────────────────────╮
-                                                                                                                 
-  Agent: Sports Analyst                                                                                          
-                                                                                                                 
-  Task: Analyze Manchester City's recent performance based on Pep Guardiola's comments: "The team is playing     
-  well, we are in a good moment. The way we are training, the way we are playing - I am really pleased."         
-                                                                                                                 
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-
╭──────────────────────────────────────────────── Task Completion ────────────────────────────────────────────────╮
-                                                                                                                 
-  Task Completed                                                                                                 
-  Name: 721d99b2-ac47-4976-8862-364bb668075e                                                                     
-  Agent: Sports Analyst                                                                                          
-  Tool Args:                                                                                                     
-                                                                                                                 
-                                                                                                                 
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-
╭────────────────────────────────────────────── 🧠 Retrieved Memory ──────────────────────────────────────────────╮
-                                                                                                                 
-  Historical Data:                                                                                               
-  - Include specific quotes from Guardiola to enhance credibility.                                               
-  - Incorporate statistical data or match results to provide more depth.                                         
-  - Discuss recent matches or events in more detail.                                                             
-  - Add perspectives from players or other analysts for a more rounded view.                                     
-  - Include potential future challenges for Manchester City.                                                     
-  Entities:                                                                                                      
-  - Pep Guardiola(Individual): The manager of Manchester City, known for his tactical acumen and positive        
-  remarks about the team's performance.                                                                          
-  - Manch...                                                                                                     
-                                                                                                                 
-╰─────────────────────────────────────────── Retrieval Time: 991.13ms ────────────────────────────────────────────╯
-
╭─────────────────────────────────────────────── 🤖 Agent Started ────────────────────────────────────────────────╮
-                                                                                                                 
-  Agent: Sports Journalist                                                                                       
-                                                                                                                 
-  Task: Write a sports article about Manchester City's form using the analysis and Guardiola's comments.         
-                                                                                                                 
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-
╭──────────────────────────────────────────────── Task Completion ────────────────────────────────────────────────╮
-                                                                                                                 
-  Task Completed                                                                                                 
-  Name: 4fac1a2b-0fd1-484e-afe6-a4d4af236bd4                                                                     
-  Agent: Sports Journalist                                                                                       
-  Tool Args:                                                                                                     
-                                                                                                                 
-                                                                                                                 
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-
- - - - - Crew Result: - -------------------------------------------------------------------------------- - **Manchester City's Resilient Form Under Guardiola: A Symphony of Strategy and Skill** - - In the ever-competitive landscape of the Premier League, Manchester City continues to set the benchmark for excellence, guided by the strategic genius of Pep Guardiola. Reflecting on their current form, Guardiola's satisfaction is palpable: "The team is playing well, we are in a good moment. The way we are training, the way we are playing - I am really pleased." These words not only highlight the team's current high morale but also underline the effectiveness of their training routines and the cohesive unit that Guardiola has meticulously crafted. - - Historically, Manchester City has been a juggernaut in English football, and their recent performances are a testament to their sustained dominance. Their consistency in maintaining high possession rates and crafting scoring opportunities is unparalleled. Statistically, City often leads in metrics such as ball possession and pass accuracy, with figures regularly surpassing 60% possession in matches, illustrating their control and domination on the pitch. - - Key to their success has been the stellar performances of individual players. Kevin De Bruyne's vision and precise passing have been instrumental in creating goal-scoring chances, while Erling Haaland's formidable goal-scoring abilities add a lethal edge to City's attack. Phil Foden's adaptability and technical prowess offer Guardiola the flexibility to shuffle tactics seamlessly. This trident of talent epitomizes the blend of skill and strategy that City embodies. - - Defensively, Manchester City has shown marked improvement, a testament to Guardiola's focus on fortifying the backline. Their defensive solidity, coupled with an attacking flair, makes them a daunting adversary for any team. 
Guardiola's ability to adapt tactics to counter various styles of play is a hallmark of his tenure, ensuring City remains at the pinnacle of competition both domestically and on the European stage. - - Analysts and pundits echo Guardiola's sentiments, praising Manchester City's ability to maintain elite standards and adapt to challenges with finesse. This holistic approach—encompassing rigorous training, strategic gameplay, and individual brilliance—cements Manchester City's status as leaders in football excellence. - - However, the journey is far from over. As they navigate the rigors of the Premier League and European competitions, potential challenges loom. Sustaining fitness levels, managing squad rotations, and countering tactical innovations from rivals will be pivotal. Yet, with Guardiola at the helm, Manchester City is well-equipped to tackle these challenges head-on. - - In conclusion, Manchester City's current form is a shining example of Guardiola's managerial prowess and the team's harmonious performance. Their continued success is a blend of strategic training, tactical adaptability, and outstanding individual contributions, positioning them as formidable contenders in any arena. As the season unfolds, fans and analysts alike will watch with bated breath to see how this footballing symphony continues to play out. 
- -------------------------------------------------------------------------------- - - -## Test Memory Retention - -Query the stored memories to verify retention: - - -```python -# Wait for memories to be stored -time.sleep(2) - -# List all documents in the collection -try: - # Query to fetch all documents of this memory type - query_str = f"SELECT META().id, * FROM `{storage.bucket_name}`.`{storage.scope_name}`.`{storage.collection_name}` WHERE memory_type = $type" - query_result = storage.cluster.query(query_str, type=storage.type) - - print(f"\nAll memory entries in Couchbase:") - print("-" * 80) - for i, row in enumerate(query_result, 1): - doc_id = row.get('id') - memory_id = row.get(storage.collection_name, {}).get('memory_id', 'unknown') - content = row.get(storage.collection_name, {}).get('text', '')[:100] + "..." # Truncate for readability - - print(f"Entry {i}:") - print(f"ID: {doc_id}") - print(f"Memory ID: {memory_id}") - print(f"Content: {content}") - print("-" * 80) -except Exception as e: - print(f"Failed to list memory entries: {str(e)}") - -# Test memory retention -memory_query = "What is Manchester City's current form according to Guardiola?" 
-memory_results = storage.search( - query=memory_query, - limit=5, # Increased to see more results - score_threshold=0.0 # Lower threshold to see all results -) - -print("\nMemory Search Results:") -print("-" * 80) -for result in memory_results: - print(f"Context: {result['context']}") - print(f"Score: {result['score']}") - print("-" * 80) - -# Try a more specific query to find agent interactions -interaction_query = "Manchester City playing style analysis tactical" -interaction_results = storage.search( - query=interaction_query, - limit=5, - score_threshold=0.0 -) - -print("\nAgent Interaction Memory Results:") -print("-" * 80) -for result in interaction_results: - print(f"Context: {result['context'][:200]}...") # Limit output size - print(f"Score: {result['score']}") - print("-" * 80) - -``` - - - All memory entries in Couchbase: - -------------------------------------------------------------------------------- - - Memory Search Results: - -------------------------------------------------------------------------------- - - Agent Interaction Memory Results: - -------------------------------------------------------------------------------- - diff --git a/tutorial/markdown/generated/vector-search-cookbook/crewai-short-term-memory-gsi-CouchbaseStorage_Demo.md b/tutorial/markdown/generated/vector-search-cookbook/crewai-short-term-memory-gsi-CouchbaseStorage_Demo.md deleted file mode 100644 index a03d901..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/crewai-short-term-memory-gsi-CouchbaseStorage_Demo.md +++ /dev/null @@ -1,1085 +0,0 @@ ---- -# frontmatter -path: "/tutorial-crewai-short-term-memory-couchbase-with-global-secondary-index" -title: Implementing Short-Term Memory for CrewAI Agents with Couchbase with GSI -short_title: CrewAI Short-Term Memory with Couchbase with GSI -description: - - Learn how to implement short-term memory for CrewAI agents using Couchbase's vector search capabilities with GSI. 
- - This tutorial demonstrates how to store and retrieve agent interactions using semantic search. - - You'll understand how to enhance CrewAI agents with memory capabilities using LangChain and Couchbase. -content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - GSI - - Artificial Intelligence - - LangChain - - CrewAI -sdk_language: - - python -length: 45 Mins --- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/crewai-short-term-memory/gsi/CouchbaseStorage_Demo.ipynb) - -# CrewAI Short-Term Memory with Couchbase GSI Vector Search - -## Overview - -This tutorial shows how to implement a custom memory backend for CrewAI agents using Couchbase's high-performance GSI (Global Secondary Index) vector search. CrewAI agents can retain and recall information across interactions, making them more contextually aware and effective. We'll demonstrate measurable performance improvements with GSI optimization. If you would prefer to perform semantic search with FTS, see [this tutorial](https://developer.couchbase.com/tutorial-crewai-short-term-memory-couchbase-with-fts). - -**Key Features:** -- Custom CrewAI memory storage with Couchbase GSI vector search -- High-performance semantic memory retrieval -- Agent memory persistence across conversations -- Performance benchmarks showing GSI benefits - -**Requirements:** Couchbase Server 8.0+ or Capella with Query Service enabled. - -You can access this notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/crewai-short-term-memory/gsi/CouchbaseStorage_Demo.ipynb). - -## Prerequisites - -### Couchbase Setup - -1. **Create Capella Account:** Deploy a [free tier cluster](https://cloud.couchbase.com/sign-up) -2. **Enable Query Service:** Required for GSI vector search -3. **Configure Access:** Set up database credentials and network security -4.
**Create Bucket:** Manual bucket creation recommended for Capella - -## Understanding Agent Memory - -### Why Memory Matters for AI Agents - -Memory in AI agents is a crucial capability that allows them to retain and utilize information across interactions, making them more effective and contextually aware. Without memory, agents would be limited to processing only the immediate input, lacking the ability to build upon past experiences or maintain continuity in conversations. - -#### Types of Memory in AI Agents - -**Short-term Memory:** -- Retains recent interactions and context -- Typically spans the current conversation or session -- Helps maintain coherence within a single interaction flow -- In CrewAI, this is what we're implementing with the Couchbase storage - -**Long-term Memory:** -- Stores persistent knowledge across multiple sessions -- Enables agents to recall past interactions even after long periods -- Helps build cumulative knowledge about users, preferences, and past decisions -- While this implementation is labeled as "short-term memory", the Couchbase storage backend can be effectively used for long-term memory as well, thanks to Couchbase's persistent storage capabilities and enterprise-grade durability features - -#### How Memory Works in Agents - -Memory in AI agents typically involves: -- **Storage**: Information is encoded and stored in a database (like Couchbase, ChromaDB, or other vector stores) -- **Retrieval**: Relevant memories are fetched based on semantic similarity to current context -- **Integration**: Retrieved memories are incorporated into the agent's reasoning process - -The vector-based approach (using embeddings) is particularly powerful because it allows for semantic search - finding memories that are conceptually related to the current context, not just exact keyword matches. 
- -#### Benefits of Memory in AI Agents - -- **Contextual Understanding**: Agents can refer to previous parts of a conversation -- **Personalization**: Remembering user preferences and past interactions -- **Learning and Adaptation**: Building knowledge over time to improve responses -- **Task Continuity**: Resuming complex tasks across multiple interactions -- **Collaboration**: In multi-agent systems like CrewAI, memory enables agents to build on each other's work - -#### Memory in CrewAI Specifically - -In CrewAI, memory serves several important functions: -- **Agent Specialization**: Each agent can maintain its own memory relevant to its expertise -- **Knowledge Transfer**: Agents can share insights through memory when collaborating on tasks -- **Process Continuity**: In sequential processes, later agents can access the work of earlier agents -- **Contextual Awareness**: Agents can reference previous findings when making decisions - -## Setup and Installation - -### Install Required Libraries - -Install the necessary packages for CrewAI, Couchbase integration, and OpenAI embeddings. - - -```python -%pip install --quiet crewai==0.186.1 langchain-couchbase==0.5.0 langchain-openai==0.3.33 python-dotenv==1.1.1 -``` - - Note: you may need to restart the kernel to use updated packages. - - -### Import Required Modules - -Import libraries for CrewAI memory storage, Couchbase GSI vector search, and OpenAI embeddings. 
- - -```python -from typing import Any, Dict, List, Optional -import os -import logging -from datetime import timedelta -from dotenv import load_dotenv -from crewai.memory.storage.rag_storage import RAGStorage -from crewai.memory.short_term.short_term_memory import ShortTermMemory -from crewai import Agent, Crew, Task, Process -from couchbase.cluster import Cluster -from couchbase.options import ClusterOptions -from couchbase.auth import PasswordAuthenticator -from couchbase.diagnostics import PingState, ServiceType -from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore -from langchain_couchbase.vectorstores import DistanceStrategy -from langchain_couchbase.vectorstores import IndexType -from langchain_openai import OpenAIEmbeddings, ChatOpenAI -import time -import json -import uuid - -# Configure logging (disabled) -logging.basicConfig(level=logging.CRITICAL) -logger = logging.getLogger(__name__) -``` - -### Environment Configuration - -Configure environment variables for secure access to Couchbase and OpenAI services. Create a `.env` file with your credentials. 
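For reference, a minimal `.env` might look like the following. All values are placeholders (substitute your own credentials); the bucket, scope, collection, and index names shown here match the defaults read by the storage class in this tutorial.

```
OPENAI_API_KEY=your-openai-api-key
CB_HOST=couchbases://your-cluster.cloud.couchbase.com
CB_USERNAME=your-username
CB_PASSWORD=your-password
CB_BUCKET_NAME=vector-search-testing
SCOPE_NAME=shared
COLLECTION_NAME=crew
INDEX_NAME=vector_search_crew_gsi
```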
- - -```python -load_dotenv("./.env") - -# Verify environment variables -required_vars = ['OPENAI_API_KEY', 'CB_HOST', 'CB_USERNAME', 'CB_PASSWORD'] -for var in required_vars: - if not os.getenv(var): - raise ValueError(f"{var} environment variable is required") -``` - -## Understanding GSI Vector Search - -### GSI Vector Index Types - -Couchbase offers two types of GSI vector indexes for different use cases: - -**Hyperscale Vector Indexes (BHIVE):** -- Best for pure vector searches - content discovery, recommendations, semantic search -- High performance with low memory footprint - designed to scale to billions of vectors -- Optimized for concurrent operations - supports simultaneous searches and inserts -- Use when: You primarily perform vector-only queries without complex scalar filtering -- Ideal for: Large-scale semantic search, recommendation systems, content discovery - -**Composite Vector Indexes:** -- Best for filtered vector searches - combines vector search with scalar value filtering -- Efficient pre-filtering - scalar attributes reduce the vector comparison scope -- Use when: Your queries combine vector similarity with scalar filters that eliminate large portions of data -- Ideal for: Compliance-based filtering, user-specific searches, time-bounded queries - -For this CrewAI memory implementation, we'll use **BHIVE** as it's optimized for pure semantic search scenarios typical in AI agent memory systems. 
-
-### Understanding Index Configuration
-
-The `index_description` parameter controls how Couchbase optimizes vector storage and search performance through centroids and quantization:
-
-**Format**: `'IVF[<centroids>],{SQ<bits>|PQ<subquantizers>x<bits>}'`
-
-**Centroids (IVF - Inverted File):**
-- Controls how the dataset is subdivided for faster searches
-- More centroids = faster search, slower training
-- Fewer centroids = slower search, faster training
-- If omitted (like IVF,SQ8), Couchbase auto-selects based on dataset size
-
-**Quantization Options:**
-- SQ (Scalar Quantization): SQ4, SQ6, SQ8 (4, 6, or 8 bits per dimension)
-- PQ (Product Quantization): `PQ<subquantizers>x<bits>` (e.g., PQ32x8)
-- Higher values = better accuracy, larger index size
-
-**Common Examples:**
-- IVF,SQ8 - Auto centroids, 8-bit scalar quantization (good default)
-- IVF1000,SQ6 - 1000 centroids, 6-bit scalar quantization
-- IVF,PQ32x8 - Auto centroids, 32 subquantizers with 8 bits
-
-For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/cloud/vector-index/hyperscale-vector-index.html#algo_settings).
-
-For more information on GSI vector indexes, see [Couchbase GSI Vector Documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html).
-
-
-## Custom CouchbaseStorage Implementation
-
-### CouchbaseStorage Class
-
-This class extends CrewAI's `RAGStorage` to provide GSI vector search capabilities for agent memory.
-
-
-```python
-class CouchbaseStorage(RAGStorage):
-    """
-    Extends RAGStorage to handle embeddings for memory entries using Couchbase GSI Vector Search.
- """ - - def __init__(self, type: str, allow_reset: bool = True, embedder_config: Optional[Dict[str, Any]] = None, crew: Optional[Any] = None): - """Initialize CouchbaseStorage with GSI vector search configuration.""" - super().__init__(type, allow_reset, embedder_config, crew) - self._initialize_app() - - def search( - self, - query: str, - limit: int = 3, - filter: Optional[dict] = None, - score_threshold: float = 0, - ) -> List[Dict[str, Any]]: - """ - Search memory entries using GSI vector similarity. - """ - try: - # Add type filter - search_filter = {"memory_type": self.type} - if filter: - search_filter.update(filter) - - # Execute search using GSI vector search - results = self.vector_store.similarity_search_with_score( - query, - k=limit, - filter=search_filter - ) - - # Format results and deduplicate by content - seen_contents = set() - formatted_results = [] - - for i, (doc, distance) in enumerate(results): - # Note: In GSI vector search, lower distance indicates higher similarity - if distance <= (1.0 - score_threshold): # Convert threshold for GSI distance metric - content = doc.page_content - if content not in seen_contents: - seen_contents.add(content) - formatted_results.append({ - "id": doc.metadata.get("memory_id", str(i)), - "metadata": doc.metadata, - "context": content, - "distance": float(distance) # Changed from score to distance - }) - - logger.info(f"Found {len(formatted_results)} unique results for query: {query}") - return formatted_results - - except Exception as e: - logger.error(f"Search failed: {str(e)}") - return [] - - def save(self, value: Any, metadata: Dict[str, Any]) -> None: - """ - Save a memory entry with metadata. 
- """ - try: - # Generate unique ID - memory_id = str(uuid.uuid4()) - timestamp = int(time.time() * 1000) - - # Prepare metadata (create a copy to avoid modifying references) - if not metadata: - metadata = {} - else: - metadata = metadata.copy() # Create a copy to avoid modifying references - - # Process agent-specific information if present - agent_name = metadata.get('agent', 'unknown') - - # Clean up value if it has the typical LLM response format - value_str = str(value) - if "Final Answer:" in value_str: - # Extract just the actual content - everything after "Final Answer:" - parts = value_str.split("Final Answer:", 1) - if len(parts) > 1: - value = parts[1].strip() - logger.info(f"Cleaned up response format for agent: {agent_name}") - elif value_str.startswith("Thought:"): - # Handle thought/final answer format - if "Final Answer:" in value_str: - parts = value_str.split("Final Answer:", 1) - if len(parts) > 1: - value = parts[1].strip() - logger.info(f"Cleaned up thought process format for agent: {agent_name}") - - # Update metadata - metadata.update({ - "memory_id": memory_id, - "memory_type": self.type, - "timestamp": timestamp, - "source": "crewai" - }) - - # Log memory information for debugging - value_preview = str(value)[:100] + "..." 
if len(str(value)) > 100 else str(value) - metadata_preview = {k: v for k, v in metadata.items() if k != "embedding"} - logger.info(f"Saving memory for Agent: {agent_name}") - logger.info(f"Memory value preview: {value_preview}") - logger.info(f"Memory metadata: {metadata_preview}") - - # Convert value to string if needed - if isinstance(value, (dict, list)): - value = json.dumps(value) - elif not isinstance(value, str): - value = str(value) - - # Save to GSI vector store - self.vector_store.add_texts( - texts=[value], - metadatas=[metadata], - ids=[memory_id] - ) - logger.info(f"Saved memory {memory_id}: {value[:100]}...") - - except Exception as e: - logger.error(f"Save failed: {str(e)}") - raise - - def reset(self) -> None: - """Reset the memory storage if allowed.""" - if not self.allow_reset: - return - - try: - # Delete documents of this memory type - self.cluster.query( - f"DELETE FROM `{self.bucket_name}`.`{self.scope_name}`.`{self.collection_name}` WHERE memory_type = $type", - type=self.type - ).execute() - logger.info(f"Reset memory type: {self.type}") - except Exception as e: - logger.error(f"Reset failed: {str(e)}") - raise - - def _initialize_app(self): - """Initialize Couchbase connection and GSI vector store.""" - try: - # Initialize embeddings - if self.embedder_config and self.embedder_config.get("provider") == "openai": - self.embeddings = OpenAIEmbeddings( - openai_api_key=os.getenv('OPENAI_API_KEY'), - model=self.embedder_config.get("config", {}).get("model", "text-embedding-3-small") - ) - else: - self.embeddings = OpenAIEmbeddings( - openai_api_key=os.getenv('OPENAI_API_KEY'), - model="text-embedding-3-small" - ) - - # Connect to Couchbase - auth = PasswordAuthenticator( - os.getenv('CB_USERNAME', ''), - os.getenv('CB_PASSWORD', '') - ) - options = ClusterOptions(auth) - - # Initialize cluster connection - self.cluster = Cluster(os.getenv('CB_HOST', ''), options) - self.cluster.wait_until_ready(timedelta(seconds=5)) - - # Check Query service 
(required for GSI vector search) - ping_result = self.cluster.ping() - query_available = False - for service_type, endpoints in ping_result.endpoints.items(): - if service_type.name == 'Query': # Query Service for GSI - for endpoint in endpoints: - if endpoint.state == PingState.OK: - query_available = True - logger.info(f"Query service is responding at: {endpoint.remote}") - break - break - if not query_available: - raise RuntimeError("Query service not found or not responding. GSI vector search requires Query Service.") - - # Set up storage configuration - self.bucket_name = os.getenv('CB_BUCKET_NAME', 'vector-search-testing') - self.scope_name = os.getenv('SCOPE_NAME', 'shared') - self.collection_name = os.getenv('COLLECTION_NAME', 'crew') - self.index_name = os.getenv('INDEX_NAME', 'vector_search_crew_gsi') - - # Initialize GSI vector store - self.vector_store = CouchbaseQueryVectorStore( - cluster=self.cluster, - bucket_name=self.bucket_name, - scope_name=self.scope_name, - collection_name=self.collection_name, - embedding=self.embeddings, - distance_metric=DistanceStrategy.COSINE, - ) - logger.info(f"Initialized CouchbaseStorage with GSI vector search for type: {self.type}") - - except Exception as e: - logger.error(f"Initialization failed: {str(e)}") - raise -``` - -## Memory Search Performance Testing - -Now let's demonstrate the performance benefits of GSI optimization by testing pure memory search performance. We'll compare three optimization levels: - -1. **Baseline Performance**: Memory search without GSI optimization -2. **GSI-Optimized Performance**: Same search with BHIVE GSI index -3. **Cache Benefits**: Show how caching can be applied on top of GSI for repeated queries - -**Important**: This testing focuses on pure memory search performance, isolating the GSI improvements from CrewAI agent workflow overhead. 
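A quick aside on the `score_threshold` handling in the `search()` method above. `CouchbaseQueryVectorStore` returns cosine *distances* (lower = more similar), while CrewAI passes a similarity-style threshold (higher = more similar). Assuming the cosine distance metric configured in `_initialize_app`, where distance = 1 - similarity, the check `distance <= 1.0 - score_threshold` is equivalent to `similarity >= score_threshold`. A minimal sketch of that equivalence:

```python
def passes_threshold(distance: float, score_threshold: float) -> bool:
    # Equivalent to: similarity >= score_threshold,
    # given cosine distance = 1 - cosine similarity.
    return distance <= (1.0 - score_threshold)

# A distance of 0.34 corresponds to a similarity of 0.66:
assert passes_threshold(0.34, 0.5)       # similarity 0.66 clears a 0.5 threshold
assert not passes_threshold(0.34, 0.7)   # but not a 0.7 threshold
assert passes_threshold(0.0, 1.0)        # identical vectors clear any threshold
```

This is why the performance tests below pass `score_threshold=0` when they want every stored memory back: a threshold of zero accepts any distance up to 1.0.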
- -### Initialize Storage and Test Functions - -First, let's set up the storage and create test functions for measuring memory search performance. - - -```python -# Initialize storage -storage = CouchbaseStorage( - type="short_term", - embedder_config={ - "provider": "openai", - "config": {"model": "text-embedding-3-small"} - } -) - -# Reset storage -storage.reset() - -# Test storage -test_memory = "Pep Guardiola praised Manchester City's current form, saying 'The team is playing well, we are in a good moment. The way we are training, the way we are playing - I am really pleased.'" -test_metadata = {"category": "sports", "test": "initial_memory"} -storage.save(test_memory, test_metadata) - -import time - -def test_memory_search_performance(storage, query, label="Memory Search"): - """Test pure memory search performance and return timing metrics""" - print(f"\n[{label}] Testing memory search performance") - print(f"[{label}] Query: '{query}'") - - start_time = time.time() - - try: - results = storage.search(query, limit=3) - end_time = time.time() - search_time = end_time - start_time - - print(f"[{label}] Memory search completed in {search_time:.4f} seconds") - print(f"[{label}] Found {len(results)} memories") - - if results: - print(f"[{label}] Top result distance: {results[0]['distance']:.6f} (lower = more similar)") - preview = results[0]['context'][:100] + "..." if len(results[0]['context']) > 100 else results[0]['context'] - print(f"[{label}] Top result preview: {preview}") - - return search_time - except Exception as e: - print(f"[{label}] Memory search failed: {str(e)}") - return None -``` - -### Test 1: Baseline Performance (No GSI Index) - -Test pure memory search performance without GSI optimization. - - -```python -# Test baseline memory search performance without GSI index -test_query = "What did Guardiola say about Manchester City?" 
-print("Testing baseline memory search performance without GSI optimization...") -baseline_time = test_memory_search_performance(storage, test_query, "Baseline Search") -print(f"\nBaseline memory search time (without GSI): {baseline_time:.4f} seconds\n") -``` - - Testing baseline memory search performance without GSI optimization... - - [Baseline Search] Testing memory search performance - [Baseline Search] Query: 'What did Guardiola say about Manchester City?' - [Baseline Search] Memory search completed in 0.6159 seconds - [Baseline Search] Found 1 memories - [Baseline Search] Top result distance: 0.340130 (lower = more similar) - [Baseline Search] Top result preview: Pep Guardiola praised Manchester City's current form, saying 'The team is playing well, we are in a ... - - Baseline memory search time (without GSI): 0.6159 seconds - - - -### Create BHIVE GSI Index - -Now let's create a BHIVE GSI vector index to enable high-performance memory searches. The index creation is done programmatically through the vector store. - - -```python -# Create GSI BHIVE vector index for optimal performance -print("Creating BHIVE GSI vector index...") -try: - storage.vector_store.create_index( - index_type=IndexType.BHIVE, - # index_type=IndexType.COMPOSITE, # Uncomment this line to create a COMPOSITE index instead - index_name=storage.index_name, - index_description="IVF,SQ8" # Auto-selected centroids with 8-bit scalar quantization - ) - print(f"GSI Vector index created successfully: {storage.index_name}") - - # Wait for index to become available - print("Waiting for index to become available...") - time.sleep(5) - -except Exception as e: - if "already exists" in str(e).lower(): - print(f"GSI vector index '{storage.index_name}' already exists, proceeding...") - else: - print(f"Error creating GSI index: {str(e)}") -``` - - Creating BHIVE GSI vector index... - GSI Vector index created successfully: vector_search_crew - Waiting for index to become available... 
- - -### Alternative: Composite Index Configuration - -If your agent memory use case requires complex filtering with scalar attributes, you can create a **Composite index** instead by changing the configuration above: - -```python -# Alternative: Create a Composite index for filtered memory searches -storage.vector_store.create_index( - index_type=IndexType.COMPOSITE, # Instead of IndexType.BHIVE - index_name=storage.index_name, - index_description="IVF,SQ8" # Same quantization settings -) -``` - -### Test 2: GSI-Optimized Performance - -Test the same memory search with BHIVE GSI optimization. - - -```python -# Test memory search performance with GSI index -print("Testing memory search performance with BHIVE GSI optimization...") -gsi_time = test_memory_search_performance(storage, test_query, "GSI-Optimized Search") -``` - - Testing memory search performance with BHIVE GSI optimization... - - [GSI-Optimized Search] Testing memory search performance - [GSI-Optimized Search] Query: 'What did Guardiola say about Manchester City?' - [GSI-Optimized Search] Memory search completed in 0.5910 seconds - [GSI-Optimized Search] Found 1 memories - [GSI-Optimized Search] Top result distance: 0.340142 (lower = more similar) - [GSI-Optimized Search] Top result preview: Pep Guardiola praised Manchester City's current form, saying 'The team is playing well, we are in a ... - - -### Test 3: Cache Benefits Testing - -Now let's demonstrate how caching can improve performance for repeated queries. **Note**: Caching benefits apply to both baseline and GSI-optimized searches. - - -```python -# Test cache benefits with a different query to avoid interference -cache_test_query = "How is Manchester City performing in training sessions?" 
- -print("Testing cache benefits with memory search...") -print("First execution (cache miss):") -cache_time_1 = test_memory_search_performance(storage, cache_test_query, "Cache Test - First Run") - -print("\nSecond execution (cache hit - should be faster):") -cache_time_2 = test_memory_search_performance(storage, cache_test_query, "Cache Test - Second Run") -``` - - Testing cache benefits with memory search... - First execution (cache miss): - - [Cache Test - First Run] Testing memory search performance - [Cache Test - First Run] Query: 'How is Manchester City performing in training sessions?' - [Cache Test - First Run] Memory search completed in 0.6076 seconds - [Cache Test - First Run] Found 1 memories - [Cache Test - First Run] Top result distance: 0.379242 (lower = more similar) - [Cache Test - First Run] Top result preview: Pep Guardiola praised Manchester City's current form, saying 'The team is playing well, we are in a ... - - Second execution (cache hit - should be faster): - - [Cache Test - Second Run] Testing memory search performance - [Cache Test - Second Run] Query: 'How is Manchester City performing in training sessions?' - [Cache Test - Second Run] Memory search completed in 0.4745 seconds - [Cache Test - Second Run] Found 1 memories - [Cache Test - Second Run] Top result distance: 0.379200 (lower = more similar) - [Cache Test - Second Run] Top result preview: Pep Guardiola praised Manchester City's current form, saying 'The team is playing well, we are in a ... 
- - -### Memory Search Performance Analysis - -Let's analyze the memory search performance improvements across all optimization levels: - - -```python -print("\n" + "="*80) -print("MEMORY SEARCH PERFORMANCE OPTIMIZATION SUMMARY") -print("="*80) - -print(f"Phase 1 - Baseline Search (No GSI): {baseline_time:.4f} seconds") -print(f"Phase 2 - GSI-Optimized Search: {gsi_time:.4f} seconds") -if cache_time_1 and cache_time_2: - print(f"Phase 3 - Cache Benefits:") - print(f" First execution (cache miss): {cache_time_1:.4f} seconds") - print(f" Second execution (cache hit): {cache_time_2:.4f} seconds") - -print("\n" + "-"*80) -print("MEMORY SEARCH OPTIMIZATION IMPACT:") -print("-"*80) - -# GSI improvement analysis -if baseline_time and gsi_time: - speedup = baseline_time / gsi_time if gsi_time > 0 else float('inf') - time_saved = baseline_time - gsi_time - percent_improvement = (time_saved / baseline_time) * 100 - print(f"GSI Index Benefit: {speedup:.2f}x faster ({percent_improvement:.1f}% improvement)") - -# Cache improvement analysis -if cache_time_1 and cache_time_2 and cache_time_2 < cache_time_1: - cache_speedup = cache_time_1 / cache_time_2 - cache_improvement = ((cache_time_1 - cache_time_2) / cache_time_1) * 100 - print(f"Cache Benefit: {cache_speedup:.2f}x faster ({cache_improvement:.1f}% improvement)") -else: - print(f"Cache Benefit: Variable (depends on query complexity and caching mechanism)") - -print(f"\nKey Insights for Agent Memory Performance:") -print(f"• GSI BHIVE indexes provide significant performance improvements for memory search") -print(f"• Performance gains are most dramatic for complex semantic memory queries") -print(f"• BHIVE optimization is particularly effective for agent conversational memory") -print(f"• Combined with proper quantization (SQ8), GSI delivers production-ready performance") -print(f"• These performance improvements directly benefit agent response times and scalability") -``` - - - 
================================================================================ - MEMORY SEARCH PERFORMANCE OPTIMIZATION SUMMARY - ================================================================================ - Phase 1 - Baseline Search (No GSI): 0.6159 seconds - Phase 2 - GSI-Optimized Search: 0.5910 seconds - Phase 3 - Cache Benefits: - First execution (cache miss): 0.6076 seconds - Second execution (cache hit): 0.4745 seconds - - -------------------------------------------------------------------------------- - MEMORY SEARCH OPTIMIZATION IMPACT: - -------------------------------------------------------------------------------- - GSI Index Benefit: 1.04x faster (4.0% improvement) - Cache Benefit: 1.28x faster (21.9% improvement) - - Key Insights for Agent Memory Performance: - • GSI BHIVE indexes provide significant performance improvements for memory search - • Performance gains are most dramatic for complex semantic memory queries - • BHIVE optimization is particularly effective for agent conversational memory - • Combined with proper quantization (SQ8), GSI delivers production-ready performance - • These performance improvements directly benefit agent response times and scalability - - -**Note on BHIVE GSI Performance:** The BHIVE GSI index may show slower performance for very small datasets (few documents) due to the additional overhead of maintaining the index structure. However, as the dataset scales up, the BHIVE GSI index becomes significantly faster than traditional vector searches. The initial overhead investment pays off dramatically with larger memory stores, making it essential for production agent deployments with substantial conversational history. - -## CrewAI Agent Memory Demo - -### What is CrewAI Agent Memory? - -Now that we've optimized our memory search performance, let's demonstrate how CrewAI agents can leverage this GSI-optimized memory system. 
CrewAI agent memory enables: - -- **Persistent Context**: Agents remember information across conversations and tasks -- **Semantic Recall**: Agents can find relevant memories using natural language queries -- **Collaborative Memory**: Multiple agents can share and build upon each other's memories -- **Performance Benefits**: Our GSI optimizations directly improve agent memory retrieval speed - -This demo shows how the memory performance improvements we validated translate to real agent workflows. - -### Create Agents with Optimized Memory - -Set up CrewAI agents that use our GSI-optimized Couchbase memory storage for fast, contextual memory retrieval. - - -```python -# Initialize ShortTermMemory with our storage -memory = ShortTermMemory(storage=storage) - -# Initialize language model -llm = ChatOpenAI( - model="gpt-4o", - temperature=0.7 -) - -# Create agents with memory -sports_analyst = Agent( - role='Sports Analyst', - goal='Analyze Manchester City performance', - backstory='Expert at analyzing football teams and providing insights on their performance', - llm=llm, - memory=True, - memory_storage=memory -) - -journalist = Agent( - role='Sports Journalist', - goal='Create engaging football articles', - backstory='Experienced sports journalist who specializes in Premier League coverage', - llm=llm, - memory=True, - memory_storage=memory -) - -# Create tasks -analysis_task = Task( - description='Analyze Manchester City\'s recent performance based on Pep Guardiola\'s comments: "The team is playing well, we are in a good moment. The way we are training, the way we are playing - I am really pleased."', - agent=sports_analyst, - expected_output="A comprehensive analysis of Manchester City's current form based on Guardiola's comments." 
-) - -writing_task = Task( - description='Write a sports article about Manchester City\'s form using the analysis and Guardiola\'s comments.', - agent=journalist, - context=[analysis_task], - expected_output="An engaging sports article about Manchester City's current form and Guardiola's perspective." -) - -# Create crew with memory -crew = Crew( - agents=[sports_analyst, journalist], - tasks=[analysis_task, writing_task], - process=Process.sequential, - memory=True, - short_term_memory=memory, # Explicitly pass our memory implementation - verbose=True -) -``` - -### Run Agent Memory Demo - - -```python -# Run the crew with optimized GSI memory -print("Running CrewAI agents with GSI-optimized memory storage...") -start_time = time.time() -result = crew.kickoff() -execution_time = time.time() - start_time - -print("\n" + "="*80) -print("CREWAI AGENT MEMORY DEMO RESULT") -print("="*80) -print(result) -print("="*80) -print(f"\n✅ CrewAI agents completed successfully in {execution_time:.2f} seconds!") -print("✅ Agents used GSI-optimized Couchbase memory storage for fast retrieval!") -print("✅ Memory will persist across sessions for continued learning and context retention!") -``` - - Running CrewAI agents with GSI-optimized memory storage... - - - -
╭──────────────────────────────────────────── Crew Execution Started ─────────────────────────────────────────────╮
-                                                                                                                 
-  Crew Execution Started                                                                                         
-  Name: crew                                                                                                     
-  ID: 38d8c744-17cf-4aef-b246-3ff3a930ca29                                                                       
-  Tool Args:                                                                                                     
-                                                                                                                 
-                                                                                                                 
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-
╭────────────────────────────────────────────── 🧠 Retrieved Memory ──────────────────────────────────────────────╮
-                                                                                                                 
-  Historical Data:                                                                                               
-  - Ensure that the actual output directly addresses the task description and expected output.                   
-  - Include more specific statistical data and recent match examples to support the analysis.                    
-  - Incorporate more direct quotes from Pep Guardiola or other relevant stakeholders.                            
-  - Address potential biases in Guardiola's comments and provide a balanced view considering external opinions.  
-  - Explore deeper tactical analysis to provide more insights into the team's performance.                       
-  - Mention fu...                                                                                                
-                                                                                                                 
-╰─────────────────────────────────────────── Retrieval Time: 1503.80ms ───────────────────────────────────────────╯
-
╭─────────────────────────────────────────────── 🤖 Agent Started ────────────────────────────────────────────────╮
-                                                                                                                 
-  Agent: Sports Analyst                                                                                          
-                                                                                                                 
-  Task: Analyze Manchester City's recent performance based on Pep Guardiola's comments: "The team is playing     
-  well, we are in a good moment. The way we are training, the way we are playing - I am really pleased."         
-                                                                                                                 
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-
╭──────────────────────────────────────────────── Task Completion ────────────────────────────────────────────────╮
-                                                                                                                 
-  Task Completed                                                                                                 
-  Name: bd1a6f7d-9d37-47f0-98ce-2420c3175312                                                                     
-  Agent: Sports Analyst                                                                                          
-  Tool Args:                                                                                                     
-                                                                                                                 
-                                                                                                                 
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-
╭────────────────────────────────────────────── 🧠 Retrieved Memory ──────────────────────────────────────────────╮
-                                                                                                                 
-  Historical Data:                                                                                               
-  - Ensure that the article includes direct quotes from Guardiola if possible to enhance credibility.            
-  - Include more detailed statistical analysis or comparisons with previous seasons for a deeper insight into    
-  the team's form.                                                                                               
-  - Incorporate players' and experts' opinions or commentary to provide a well-rounded perspective.              
-  - Add a section discussing future challenges or key upcoming matches for Manchester City.                      
-  - Consider incorporating multimedia elements like images or videos ...                                         
-                                                                                                                 
-╰─────────────────────────────────────────── Retrieval Time: 854.27ms ────────────────────────────────────────────╯
-
╭─────────────────────────────────────────────── 🤖 Agent Started ────────────────────────────────────────────────╮
-                                                                                                                 
-  Agent: Sports Journalist                                                                                       
-                                                                                                                 
-  Task: Write a sports article about Manchester City's form using the analysis and Guardiola's comments.         
-                                                                                                                 
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-
╭──────────────────────────────────────────────── Task Completion ────────────────────────────────────────────────╮
-                                                                                                                 
-  Task Completed                                                                                                 
-  Name: 8bcffe0e-5a64-4e12-8207-e0f8701d847b                                                                     
-  Agent: Sports Journalist                                                                                       
-  Tool Args:                                                                                                     
-                                                                                                                 
-                                                                                                                 
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-
-
- ================================================================================
- CREWAI AGENT MEMORY DEMO RESULT
- ================================================================================
- **Manchester City’s Impeccable Form: A Reflection of Guardiola’s Philosophy**
-
- Manchester City has been turning heads with their exceptional form under the astute guidance of Pep Guardiola. The team’s recent performances have not only aligned seamlessly with their manager’s philosophy but have also placed them in a formidable position across various competitions. Guardiola himself expressed his satisfaction, stating, "The team is playing well, we are in a good moment. The way we are training, the way we are playing - I am really pleased."
-
- City’s prowess has been evident both domestically and in international arenas. A key factor in their success is their meticulous training regimen, which has fostered strategic flexibility, a hallmark of Guardiola’s management. Over the past few matches, Manchester City has consistently maintained a high possession rate, often exceeding 60%. This high possession allows them to control the tempo and dictate the flow of the game, a crucial component of their strategy.
-
- A recent standout performance was their dominant victory against a top Premier League rival. In this match, City showcased their attacking capabilities and defensive solidity, managing to keep a clean sheet. The contributions of key players like Kevin De Bruyne and Erling Haaland have been instrumental. De Bruyne’s creativity and passing range have opened multiple avenues for attack, while Haaland’s clinical finishing has consistently troubled defenses.
-
- Guardiola’s system, which relies heavily on positional play and fluid movement, has been a critical factor in their ability to break down opposition defenses with quick, incisive passes. The team’s pressing game has also been a cornerstone of their strategy, allowing them to win back possession high up the pitch and quickly transition to attack.
-
- Despite the glowing form and Guardiola’s positive outlook, it’s important to acknowledge potential areas for improvement. While their attack is formidable, City has shown occasional vulnerability to counter-attacks, particularly when their full-backs are positioned high up the field. Addressing these defensive transitions will be crucial, especially against teams with quick counter-attacking capabilities.
-
- Looking forward, Manchester City’s current form is a strong foundation for upcoming challenges, including key fixtures in the Premier League and the knockout stages of the UEFA Champions League. Maintaining this performance level will be essential as they pursue multiple titles. The team’s depth, strategic versatility, and Guardiola’s leadership will be decisive factors in sustaining their momentum.
-
- In conclusion, Manchester City is indeed in a "good moment," as Guardiola aptly puts it. Their recent performances reflect a well-oiled machine operating at high efficiency. However, the team must remain vigilant about potential weaknesses and continue adapting tactically to ensure their current form translates into long-term success. As they aim for glory, the synergy between Guardiola’s strategic mastermind and the players’ execution will undoubtedly be the key to their triumphs.
- ================================================================================
-
- ✅ CrewAI agents completed successfully in 37.60 seconds!
- ✅ Agents used GSI-optimized Couchbase memory storage for fast retrieval!
- ✅ Memory will persist across sessions for continued learning and context retention!
-
-
-## Memory Retention Testing
-
-### Verify Memory Storage and Retrieval
-
-Test that our agents successfully stored memories and can retrieve them using semantic search.
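The search call below returns a `distance` per hit, where lower means more similar, and accepts a `score_threshold` to filter weak matches. A minimal, self-contained sketch of that filtering logic (the `filter_hits` helper is hypothetical, and "similarity = 1 - distance" is an assumed convention, not necessarily the exact one `storage.search` uses):

```python
# Hypothetical helper (not the storage API): converts vector-search distances
# into similarity scores and applies a threshold.
def filter_hits(hits, score_threshold=0.0):
    # Each hit is a (context, distance) pair; lower distance = more similar,
    # so similarity is modelled here as 1 - distance.
    scored = [(context, 1.0 - distance) for context, distance in hits]
    return [(context, score) for context, score in scored if score >= score_threshold]

# Toy distances resembling the output shown further below
hits = [
    ("Guardiola quote", 0.285),
    ("Full performance analysis", 0.230),
    ("Unrelated note", 0.910),
]

print(filter_hits(hits, score_threshold=0.5))  # keeps only the two close matches
```

With `score_threshold=0.0`, as in the cell below, every stored memory passes the filter, which is useful when inspecting what the agents actually persisted.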
-
-
-```python
-# Wait for memories to be stored
-time.sleep(2)
-
-# List all documents in the collection
-try:
-    # Query to fetch all documents of this memory type
-    query_str = f"SELECT META().id, * FROM `{storage.bucket_name}`.`{storage.scope_name}`.`{storage.collection_name}` WHERE memory_type = $type"
-    query_result = storage.cluster.query(query_str, type=storage.type)
-
-    print(f"\nAll memory entries in Couchbase:")
-    print("-" * 80)
-    for i, row in enumerate(query_result, 1):
-        doc_id = row.get('id')
-        memory_id = row.get(storage.collection_name, {}).get('memory_id', 'unknown')
-        content = row.get(storage.collection_name, {}).get('text', '')[:100] + "..."  # Truncate for readability
-
-        print(f"Entry {i}: {memory_id}")
-        print(f"Content: {content}")
-        print("-" * 80)
-except Exception as e:
-    print(f"Failed to list memory entries: {str(e)}")
-
-# Test memory retention
-memory_query = "What is Manchester City's current form according to Guardiola?"
-memory_results = storage.search(
-    query=memory_query,
-    limit=5,  # Increased to see more results
-    score_threshold=0.0  # Lower threshold to see all results
-)
-
-print("\nMemory Search Results:")
-print("-" * 80)
-for result in memory_results:
-    print(f"Context: {result['context']}")
-    print(f"Distance: {result['distance']} (lower = more similar)")
-    print("-" * 80)
-
-# Try a more specific query to find agent interactions
-interaction_query = "Manchester City playing style analysis tactical"
-interaction_results = storage.search(
-    query=interaction_query,
-    limit=3,
-    score_threshold=0.0
-)
-
-print("\nAgent Interaction Memory Results:")
-print("-" * 80)
-if interaction_results:
-    for result in interaction_results:
-        print(f"Context: {result['context'][:200]}...")  # Limit output size
-        print(f"Distance: {result['distance']} (lower = more similar)")
-        print("-" * 80)
-else:
-    print("No interaction memories found. This is normal if agents haven't completed tasks yet.")
-    print("-" * 80)
-```
-
-
-    All memory entries in Couchbase:
-    --------------------------------------------------------------------------------
-
-    Memory Search Results:
-    --------------------------------------------------------------------------------
-    Context: Pep Guardiola praised Manchester City's current form, saying 'The team is playing well, we are in a good moment. The way we are training, the way we are playing - I am really pleased.'
-    Distance: 0.285379886892123 (lower = more similar)
-    --------------------------------------------------------------------------------
-    Context: Manchester City's recent performance analysis under Pep Guardiola reflects a team in strong form and alignment with the manager's philosophy. Guardiola's comments, "The team is playing well, we are in a good moment. The way we are training, the way we are playing - I am really pleased," suggest a high level of satisfaction with both the tactical execution and the overall team ethos on the pitch.
-
-    In recent matches, Manchester City has demonstrated their prowess in both domestic and international competitions. This form can be attributed to their meticulous training regimen and strategic flexibility, hallmarks of Guardiola's management style. Over the past few matches, City has maintained a high possession rate, often exceeding 60%, which allows them to control the tempo and dictate the flow of the game. Their attacking prowess is underscored by their goal-scoring statistics, often leading the league in goals scored per match.
-
-    One standout example of their performance is their recent dominant victory against a top Premier League rival, where they not only showcased their attacking capabilities but also their defensive solidity, keeping a clean sheet. Key players such as Kevin De Bruyne and Erling Haaland have been instrumental, with De Bruyne's creativity and passing range creating numerous opportunities, while Haaland's clinical finishing has consistently troubled defenses.
-
-    Guardiola's system relies heavily on positional play and fluid movement, which has been evident in the team's ability to break down opposition defenses through quick, incisive passes. The team's pressing game has also been a critical component, often winning back possession high up the pitch and quickly transitioning to attack.
-
-    Despite Guardiola's positive outlook, potential biases in his comments might overlook some areas needing improvement. For instance, while their attack is formidable, there have been instances where the team has shown vulnerability to counter-attacks, particularly when full-backs are pushed high up the field. Addressing these defensive transitions could be crucial, especially against teams with quick, counter-attacking capabilities.
-
-    Looking ahead, Manchester City's current form sets a strong foundation for upcoming challenges, including key fixtures in the Premier League and the knockout stages of the UEFA Champions League. Maintaining this level of performance will be critical as they pursue multiple titles. The team's depth, strategic versatility, and Guardiola's leadership are likely to be decisive factors in sustaining their momentum.
-
-    In summary, Manchester City is indeed in a "good moment," as Guardiola states, with their recent performances reflecting a well-oiled machine operating at high efficiency. However, keeping a vigilant eye on potential weaknesses and continuing to adapt tactically will be essential to translating their current form into long-term success.
-    Distance: 0.22963345721993045 (lower = more similar)
-    --------------------------------------------------------------------------------
-    Context: **Manchester City’s Impeccable Form: A Reflection of Guardiola’s Philosophy**
-
-    ... (output truncated for brevity)
-
-
-## Conclusion
-
-You've successfully implemented a custom memory backend for CrewAI agents using Couchbase GSI vector search!
diff --git a/tutorial/markdown/generated/vector-search-cookbook/haystack-fts-RAG_with_Couchbase_Capella_and_OpenAI.md b/tutorial/markdown/generated/vector-search-cookbook/haystack-fts-RAG_with_Couchbase_Capella_and_OpenAI.md
deleted file mode 100644
index 8b19598..0000000
--- a/tutorial/markdown/generated/vector-search-cookbook/haystack-fts-RAG_with_Couchbase_Capella_and_OpenAI.md
+++ /dev/null
@@ -1,455 +0,0 @@
----
-# frontmatter
-path: "/tutorial-openai-haystack-rag-with-fts"
-title: "Retrieval-Augmented Generation (RAG) with OpenAI, Haystack and Couchbase Search Vector Index"
-short_title: "RAG with OpenAI, Haystack and Couchbase Search Vector Index"
-description:
-  - Learn how to build a semantic search engine using Couchbase's Search Vector Index.
-  - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with the embeddings generated by OpenAI Services.
-  - You will understand how to perform Retrieval-Augmented Generation (RAG) using Haystack, Couchbase and OpenAI services.
-content_type: tutorial
-filter: sdk
-technology:
-  - vector search
-tags:
-  - OpenAI
-  - Artificial Intelligence
-  - Haystack
-  - FTS
-sdk_language:
-  - python
-length: 60 Mins
----
-
-
-
-
-[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/haystack/fts/RAG_with_Couchbase_Capella_and_OpenAI.ipynb)
-
-# Movie Dataset RAG Pipeline with Couchbase and OpenAI
-
-This notebook demonstrates how to build a Retrieval-Augmented Generation (RAG) system using:
-- The TMDB 5000 Movies dataset containing movie overviews and metadata
-- Couchbase Capella as the vector store with FTS (Full Text Search)
-- The Haystack framework for the RAG pipeline
-- OpenAI for embeddings and text generation
-
-The system allows users to ask questions about movies and get AI-generated answers grounded in the indexed movie data.
-
-# Installing Necessary Libraries
-
-To build our RAG system, we need a set of libraries. The libraries we install handle everything from connecting to databases to performing AI tasks. Each library has a specific role: the Couchbase libraries manage database operations, Haystack handles AI model integrations and pipeline management, and we will use the OpenAI SDK for generating embeddings and calling OpenAI's language models.
-
-
-```python
-%pip install datasets haystack-ai couchbase-haystack openai pandas
-```
-
-# Importing Necessary Libraries
-
-The script starts by importing the libraries required for the tasks ahead: handling JSON, logging, time tracking, Couchbase connections, Haystack components for the RAG pipeline, embedding generation, and dataset loading.
-
-
-```python
-import getpass
-import base64
-import logging
-import sys
-import time
-import pandas as pd
-from datetime import timedelta
-
-from couchbase.auth import PasswordAuthenticator
-from couchbase.cluster import Cluster
-from couchbase.exceptions import CouchbaseException
-from couchbase.options import ClusterOptions
-from datasets import load_dataset
-from haystack import Pipeline, GeneratedAnswer
-from haystack.components.embedders import OpenAIDocumentEmbedder, OpenAITextEmbedder
-from haystack.components.preprocessors import DocumentCleaner
-from haystack.components.writers import DocumentWriter
-from haystack.components.builders.answer_builder import AnswerBuilder
-from haystack.components.builders.prompt_builder import PromptBuilder
-from haystack.components.generators import OpenAIGenerator
-from haystack.utils import Secret
-from haystack.dataclasses import Document
-
-from couchbase_haystack import (
-    CouchbaseSearchDocumentStore,
-    CouchbasePasswordAuthenticator,
-    CouchbaseClusterOptions,
-    CouchbaseSearchEmbeddingRetriever,
-)
-from couchbase.options import KnownConfigProfiles
-
-# Configure logging
-logger = logging.getLogger(__name__)
-logger.setLevel(logging.DEBUG)
-
-```
-
-# Prerequisites
-
-## Create and Deploy Your Operational cluster on Capella
-
-To get started with Couchbase Capella, create an account and use it to deploy an operational cluster.
-
-To know more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).
-
-
-### Couchbase Capella Configuration
-
-When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met:
-
-* Have a multi-node Capella cluster running the Data, Query, Index, and Search services.
-* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the bucket (Read and Write) used in the application.
-* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.
-
-### OpenAI Models Setup
-
-In order to create the RAG application, we need an embedding model to ingest the documents for Vector Search and a large language model (LLM) for generating the responses based on the context.
-
-For this implementation, we'll use OpenAI's models which provide state-of-the-art performance for both embeddings and text generation:
-
-**Embedding Model**: We'll use OpenAI's `text-embedding-3-large` model, which provides high-quality embeddings with 3,072 dimensions for semantic search capabilities.
-
-**Large Language Model**: We'll use OpenAI's `gpt-4o` model for generating responses based on the retrieved context. This model offers excellent reasoning capabilities and can handle complex queries effectively.
-
-**Prerequisites for OpenAI Integration**:
-* Create an OpenAI account at [platform.openai.com](https://platform.openai.com)
-* Generate an API key from your OpenAI dashboard
-* Ensure you have sufficient credits or a valid payment method set up
-* Set up your API key as an environment variable or input it securely in the notebook
-
-For more details about OpenAI's models and pricing, please refer to the [OpenAI documentation](https://platform.openai.com/docs/models).
-
-# Configure Couchbase Credentials
-
-Enter your Couchbase and OpenAI credentials:
-
-**OPENAI_API_KEY** is your OpenAI API key which can be obtained from your OpenAI dashboard at [platform.openai.com](https://platform.openai.com/api-keys).
-
-**INDEX_NAME** is the name of the FTS search index we will use for vector search operations.
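The 3,072-dimension embeddings described above are compared by vector similarity at query time; a toy cosine-similarity sketch with made-up 3-dimensional vectors (real embeddings are far larger, and the actual distance metric is configured in the search index):

```python
import math

def cosine_similarity(a, b):
    # cosine(a, b) = dot(a, b) / (|a| * |b|); 1.0 means identical direction
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.9, 0.1, 0.3]          # embedding of the user's question (toy values)
related_doc = [0.8, 0.2, 0.25]   # semantically close document -> high similarity
unrelated_doc = [0.1, 0.9, 0.7]  # unrelated document -> lower similarity

assert cosine_similarity(query, related_doc) > cosine_similarity(query, unrelated_doc)
```

Vector search ranks every stored document by this kind of score against the query embedding and returns the top matches, which is what the retriever does for us later in the pipeline.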
-
-
-```python
-CB_CONNECTION_STRING = input("Couchbase Cluster URL (default: localhost): ") or "localhost"
-CB_USERNAME = input("Couchbase Username (default: admin): ") or "admin"
-CB_PASSWORD = input("Couchbase password (default: Password@12345): ") or "Password@12345"
-CB_BUCKET_NAME = input("Couchbase Bucket: ")
-CB_SCOPE_NAME = input("Couchbase Scope: ")
-CB_COLLECTION_NAME = input("Couchbase Collection: ")
-CB_INDEX_NAME = input("Vector Search Index: ")
-OPENAI_API_KEY = input("OpenAI API Key: ")
-
-# Check if the variables are correctly loaded
-if not all([CB_CONNECTION_STRING, CB_USERNAME, CB_PASSWORD, CB_BUCKET_NAME, CB_SCOPE_NAME, CB_COLLECTION_NAME, CB_INDEX_NAME, OPENAI_API_KEY]):
-    raise ValueError("All configuration variables must be provided.")
-```
-
-
-```python
-from couchbase.cluster import Cluster
-from couchbase.options import ClusterOptions
-from couchbase.auth import PasswordAuthenticator
-from couchbase.management.buckets import CreateBucketSettings
-from couchbase.management.collections import CollectionSpec
-from couchbase.management.search import SearchIndex
-import json
-
-# Connect to Couchbase cluster
-cluster = Cluster(CB_CONNECTION_STRING, ClusterOptions(
-    PasswordAuthenticator(CB_USERNAME, CB_PASSWORD)))
-
-# Create bucket if it does not exist
-bucket_manager = cluster.buckets()
-try:
-    bucket_manager.get_bucket(CB_BUCKET_NAME)
-    print(f"Bucket '{CB_BUCKET_NAME}' already exists.")
-except Exception as e:
-    print(f"Bucket '{CB_BUCKET_NAME}' does not exist. Creating bucket...")
-    bucket_settings = CreateBucketSettings(name=CB_BUCKET_NAME, ram_quota_mb=500)
-    bucket_manager.create_bucket(bucket_settings)
-    print(f"Bucket '{CB_BUCKET_NAME}' created successfully.")
-
-# Create scope and collection if they do not exist
-collection_manager = cluster.bucket(CB_BUCKET_NAME).collections()
-scopes = collection_manager.get_all_scopes()
-scope_exists = any(scope.name == CB_SCOPE_NAME for scope in scopes)
-
-if scope_exists:
-    print(f"Scope '{CB_SCOPE_NAME}' already exists.")
-else:
-    print(f"Scope '{CB_SCOPE_NAME}' does not exist. Creating scope...")
-    collection_manager.create_scope(CB_SCOPE_NAME)
-    print(f"Scope '{CB_SCOPE_NAME}' created successfully.")
-
-collections = [collection.name for scope in scopes if scope.name == CB_SCOPE_NAME for collection in scope.collections]
-collection_exists = CB_COLLECTION_NAME in collections
-
-if collection_exists:
-    print(f"Collection '{CB_COLLECTION_NAME}' already exists in scope '{CB_SCOPE_NAME}'.")
-else:
-    print(f"Collection '{CB_COLLECTION_NAME}' does not exist in scope '{CB_SCOPE_NAME}'. Creating collection...")
-    collection_manager.create_collection(collection_name=CB_COLLECTION_NAME, scope_name=CB_SCOPE_NAME)
-    print(f"Collection '{CB_COLLECTION_NAME}' created successfully.")
-
-# Create search index from search_index.json file at scope level
-with open('fts_index.json', 'r') as search_file:
-    search_index_definition = SearchIndex.from_json(json.load(search_file))
-
-    # Update search index definition with user inputs
-    search_index_definition.name = CB_INDEX_NAME
-    search_index_definition.source_name = CB_BUCKET_NAME
-
-    # Update types mapping
-    old_type_key = next(iter(search_index_definition.params['mapping']['types'].keys()))
-    type_obj = search_index_definition.params['mapping']['types'].pop(old_type_key)
-    search_index_definition.params['mapping']['types'][f"{CB_SCOPE_NAME}.{CB_COLLECTION_NAME}"] = type_obj
-
-    search_index_name = search_index_definition.name
-
-    # Get scope-level search manager
-    scope_search_manager = cluster.bucket(CB_BUCKET_NAME).scope(CB_SCOPE_NAME).search_indexes()
-
-    try:
-        # Check if index exists at scope level
-        existing_index = scope_search_manager.get_index(search_index_name)
-        print(f"Search index '{search_index_name}' already exists at scope level.")
-    except Exception as e:
-        print(f"Search index '{search_index_name}' does not exist at scope level. Creating search index from fts_index.json...")
-        with open('fts_index.json', 'r') as search_file:
-            search_index_definition = SearchIndex.from_json(json.load(search_file))
-            scope_search_manager.upsert_index(search_index_definition)
-            print(f"Search index '{search_index_name}' created successfully at scope level.")
-```
-
-# Load and Process Movie Dataset
-
-Load the TMDB movie dataset and prepare documents for indexing:
-
-
-```python
-# Load TMDB dataset
-print("Loading TMDB dataset...")
-dataset = load_dataset("AiresPucrs/tmdb-5000-movies")
-movies_df = pd.DataFrame(dataset['train'])
-print(f"Total movies found: {len(movies_df)}")
-
-# Create documents from movie data
-docs_data = []
-for _, row in movies_df.iterrows():
-    if pd.isna(row['overview']):
-        continue
-
-    try:
-        docs_data.append({
-            'id': str(row["id"]),
-            'content': f"Title: {row['title']}\nGenres: {', '.join([genre['name'] for genre in eval(row['genres'])])}\nOverview: {row['overview']}",
-            'metadata': {
-                'title': row['title'],
-                'genres': row['genres'],
-                'original_language': row['original_language'],
-                'popularity': float(row['popularity']),
-                'release_date': row['release_date'],
-                'vote_average': float(row['vote_average']),
-                'vote_count': int(row['vote_count']),
-                'budget': int(row['budget']),
-                'revenue': int(row['revenue'])
-            }
-        })
-    except Exception as e:
-        logger.error(f"Error processing movie {row['title']}: {e}")
-
-print(f"Created {len(docs_data)} documents with valid overviews")
-documents = [Document(id=doc['id'], content=doc['content'], meta=doc['metadata'])
-             for doc in docs_data]
-```
-
-# Initialize Document Store
-
-Set up the Couchbase document store for storing movie data and embeddings:
-
-
-```python
-# Initialize document store
-document_store = CouchbaseSearchDocumentStore(
-    cluster_connection_string=Secret.from_token(CB_CONNECTION_STRING),
-    authenticator=CouchbasePasswordAuthenticator(
-        username=Secret.from_token(CB_USERNAME),
-        password=Secret.from_token(CB_PASSWORD)
-    ),
-    cluster_options=CouchbaseClusterOptions(
-        profile=KnownConfigProfiles.WanDevelopment,
-    ),
-    bucket=CB_BUCKET_NAME,
-    scope=CB_SCOPE_NAME,
-    collection=CB_COLLECTION_NAME,
-    vector_search_index=CB_INDEX_NAME,
-)
-
-print("Couchbase document store initialized successfully.")
-```
-
-# Initialize Embedder for Document Embedding
-
-Configure the document embedder using OpenAI's `text-embedding-3-large` model. This component will generate embeddings for each movie overview to enable semantic search.
-
-
-```python
-embedder = OpenAIDocumentEmbedder(
-    api_key=Secret.from_token(OPENAI_API_KEY),
-    model="text-embedding-3-large",
-)
-
-rag_embedder = OpenAITextEmbedder(
-    api_key=Secret.from_token(OPENAI_API_KEY),
-    model="text-embedding-3-large",
-)
-
-```
-
-# Initialize LLM Generator
-Configure the LLM generator using OpenAI's `gpt-4o` model. This component will generate natural language responses based on the retrieved documents.
-
-
-```python
-llm = OpenAIGenerator(
-    api_key=Secret.from_token(OPENAI_API_KEY),
-    model="gpt-4o",
-)
-```
-
-# Create Indexing Pipeline
-Build the pipeline for processing and indexing movie documents:
-
-
-```python
-# Create indexing pipeline
-index_pipeline = Pipeline()
-index_pipeline.add_component("cleaner", DocumentCleaner())
-index_pipeline.add_component("embedder", embedder)
-index_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
-
-# Connect indexing components
-index_pipeline.connect("cleaner.documents", "embedder.documents")
-index_pipeline.connect("embedder.documents", "writer.documents")
-```
-
-# Run Indexing Pipeline
-
-Execute the pipeline for processing and indexing movie documents:
-
-
-```python
-# Run indexing pipeline
-
-if documents:
-    # Process documents in batches for better performance
-    batch_size = 100
-    total_docs = len(documents)
-
-    for i in range(0, total_docs, batch_size):
-        batch = documents[i:i + batch_size]
-        result = index_pipeline.run({"cleaner": {"documents": batch}})
-        print(f"Processed batch {i//batch_size + 1}: {len(batch)} documents")
-
-    print(f"\nSuccessfully processed {total_docs} documents")
-    print(f"Sample document metadata: {documents[0].meta}")
-else:
-    print("No documents created. Skipping indexing.")
-```
-
-# Create RAG Pipeline
-
-Set up the Retrieval Augmented Generation pipeline for answering questions about movies:
-
-
-```python
-# Define RAG prompt template
-prompt_template = """
-Given these documents, answer the question.\nDocuments:
-{% for doc in documents %}
-    {{ doc.content }}
-{% endfor %}
-
-\nQuestion: {{question}}
-\nAnswer:
-"""
-
-# Create RAG pipeline
-rag_pipeline = Pipeline()
-
-# Add components
-rag_pipeline.add_component(
-    "query_embedder",
-    rag_embedder,
-)
-rag_pipeline.add_component("retriever", CouchbaseSearchEmbeddingRetriever(document_store=document_store))
-rag_pipeline.add_component("prompt_builder", PromptBuilder(template=prompt_template))
-rag_pipeline.add_component("llm", llm)
-rag_pipeline.add_component("answer_builder", AnswerBuilder())
-
-# Connect RAG components
-rag_pipeline.connect("query_embedder", "retriever.query_embedding")
-rag_pipeline.connect("retriever.documents", "prompt_builder.documents")
-rag_pipeline.connect("prompt_builder.prompt", "llm.prompt")
-rag_pipeline.connect("llm.replies", "answer_builder.replies")
-rag_pipeline.connect("llm.meta", "answer_builder.meta")
-rag_pipeline.connect("retriever", "answer_builder.documents")
-
-print("RAG pipeline created successfully.")
-```
-
-# Ask Questions About Movies
-
-Use the RAG pipeline to ask questions about movies and get AI-generated answers:
-
-
-```python
-# Example question
-question = "Who does Savva want to save from the vicious hyenas?"
-
-# Run the RAG pipeline
-result = rag_pipeline.run(
-    {
-        "query_embedder": {"text": question},
-        "retriever": {"top_k": 5},
-        "prompt_builder": {"question": question},
-        "answer_builder": {"query": question},
-    },
-    include_outputs_from={"retriever", "query_embedder"}
-)
-
-# Get the generated answer
-answer: GeneratedAnswer = result["answer_builder"]["answers"][0]
-
-# Print retrieved documents
-print("=== Retrieved Documents ===")
-retrieved_docs = result["retriever"]["documents"]
-for idx, doc in enumerate(retrieved_docs, start=1):
-    print(f"Id: {doc.id} Title: {doc.meta['title']}")
-
-# Print final results
-print("\n=== Final Answer ===")
-print(f"Question: {answer.query}")
-print(f"Answer: {answer.data}")
-print("\nSources:")
-for doc in answer.documents:
-    print(f"-> {doc.meta['title']}")
-```
-
-# Conclusion
-
-In this tutorial, we built a Retrieval-Augmented Generation (RAG) system using Couchbase Capella, OpenAI, and Haystack with the TMDB movie dataset. This demonstrates how to combine vector search capabilities with large language models to answer questions grounded in the indexed documents.
-
-The key components include:
-- **Couchbase Capella** for vector storage and FTS-based retrieval
-- **Haystack** for pipeline orchestration and component management
-- **OpenAI** for embeddings (`text-embedding-3-large`) and text generation (`gpt-4o`)
-
-This approach enables AI applications to access and reason over information that extends beyond the LLM's training data, making responses more accurate and relevant for real-world use cases.
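The indexing step in this tutorial writes documents in batches of 100 for better performance; the underlying chunking pattern can be sketched in isolation (plain Python, not a Haystack API):

```python
def batched(items, batch_size=100):
    # Yield successive slices of at most batch_size items.
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

docs = list(range(250))  # stand-in for 250 Document objects
sizes = [len(batch) for batch in batched(docs, batch_size=100)]
print(sizes)  # [100, 100, 50]
```

Batching keeps each embedding request and each document-store write to a bounded size, which avoids request-size limits and makes progress reporting straightforward.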
diff --git a/tutorial/markdown/generated/vector-search-cookbook/haystack-gsi-RAG_with_Couchbase_Capella_and_OpenAI.md b/tutorial/markdown/generated/vector-search-cookbook/haystack-gsi-RAG_with_Couchbase_Capella_and_OpenAI.md
deleted file mode 100644
index 0706a17..0000000
--- a/tutorial/markdown/generated/vector-search-cookbook/haystack-gsi-RAG_with_Couchbase_Capella_and_OpenAI.md
+++ /dev/null
@@ -1,688 +0,0 @@
----
-# frontmatter
-path: "/tutorial-openai-haystack-rag-with-gsi"
-title: "RAG with OpenAI, Haystack and Couchbase Hyperscale and Composite Vector Indexes"
-short_title: "RAG with OpenAI, Haystack and Couchbase CVI and HVI"
-description:
-  - Learn how to build a semantic search engine using Couchbase's Hyperscale and Composite Vector Indexes.
-  - This tutorial demonstrates how to integrate Couchbase's GSI vector search capabilities with OpenAI embeddings.
-  - You will understand how to perform Retrieval-Augmented Generation (RAG) using Haystack, Couchbase and OpenAI services.
-content_type: tutorial
-filter: sdk
-technology:
-  - vector search
-tags:
-  - OpenAI
-  - Artificial Intelligence
-  - Haystack
-  - GSI
-sdk_language:
-  - python
-length: 60 Mins
----
-
-
-
-
-[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/haystack/gsi/RAG_with_Couchbase_Capella_and_OpenAI.ipynb)
-
-# Introduction
-
-In this guide, we will walk you through building a Retrieval Augmented Generation (RAG) application using Couchbase Capella as the database and OpenAI's [gpt-4o](https://platform.openai.com/docs/models/gpt-4o) model as the large language model. We will use the [text-embedding-3-large](https://platform.openai.com/docs/guides/embeddings/embedding-models) model for generating embeddings.
-
-This notebook demonstrates how to build a RAG system using:
-- The [BBC News dataset](https://huggingface.co/datasets/RealTimeData/bbc_news_alltime) containing news articles
-- Couchbase Capella as the vector store with GSI (Global Secondary Index) for vector search
-- Haystack framework for the RAG pipeline
-- OpenAI for embeddings and text generation
-
-We leverage Couchbase's Global Secondary Index (GSI) vector search capabilities to create and manage vector indexes, enabling efficient semantic search. GSI provides high-performance vector search with support for both Hyperscale Vector Indexes and Composite Vector Indexes, designed to scale to billions of vectors with low memory footprint and optimized concurrent operations.
-
-Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial will equip you with the knowledge to create a fully functional RAG system using OpenAI Services and Haystack with Couchbase's advanced GSI vector search.
-
-# Before you start
-
-## Create and Deploy Your Operational cluster on Capella
-
-To get started with Couchbase Capella, create an account and use it to deploy an operational cluster.
-
-To know more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).
-
-### Couchbase Capella Configuration
-
-When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met:
-
-* Have a multi-node Capella cluster running the Data, Query, Index, and Search services.
-* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the bucket (Read and Write) used in the application. 
-* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running. - -### OpenAI Models Setup - -In order to create the RAG application, we need an embedding model to ingest the documents for Vector Search and a large language model (LLM) for generating the responses based on the context. - -For this implementation, we'll use OpenAI's models which provide state-of-the-art performance for both embeddings and text generation: - -**Embedding Model**: We'll use OpenAI's `text-embedding-3-large` model, which provides high-quality embeddings with 3,072 dimensions for semantic search capabilities. - -**Large Language Model**: We'll use OpenAI's `gpt-4o` model for generating responses based on the retrieved context. This model offers excellent reasoning capabilities and can handle complex queries effectively. - -**Prerequisites for OpenAI Integration**: -* Create an OpenAI account at [platform.openai.com](https://platform.openai.com) -* Generate an API key from your OpenAI dashboard -* Ensure you have sufficient credits or a valid payment method set up -* Set up your API key as an environment variable or input it securely in the notebook - -For more details about OpenAI's models and pricing, please refer to the [OpenAI documentation](https://platform.openai.com/docs/models). - - -# Installing Necessary Libraries -To build our RAG system, we need a set of libraries. The libraries we install handle everything from connecting to databases to performing AI tasks. Each library has a specific role: Couchbase libraries manage database operations, Haystack handles AI model integrations and pipeline management, and we will use the OpenAI SDK for generating embeddings and calling OpenAI's language models. 
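The OpenAI prerequisites above suggest supplying the API key through an environment variable rather than pasting it into the notebook. Here is a minimal sketch of that pattern; the `OPENAI_API_KEY` variable name is the usual convention, and the interactive fallback is an assumption for illustration, not part of this tutorial's code:

```python
import os
import getpass

def load_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, prompting only as a fallback."""
    key = os.environ.get(env_var)
    if key:
        return key
    # Interactive fallback: getpass avoids echoing the secret to the terminal
    return getpass.getpass(f"{env_var}: ")
```

Reading from the environment keeps the key out of notebook history; the prompt only fires when the variable is unset.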
- - - -```python -# Install required packages -%pip install -r requirements.txt -``` - -# Importing Necessary Libraries -The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, Haystack components for RAG pipeline, embedding generation, and dataset loading. - - - -```python -import getpass -import base64 -import logging -import sys -import time -import pandas as pd -from datetime import timedelta - -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.exceptions import CouchbaseException -from couchbase.options import ClusterOptions, KnownConfigProfiles, QueryOptions - -from datasets import load_dataset - -from haystack import Pipeline, Document, GeneratedAnswer -from haystack.components.embedders import OpenAIDocumentEmbedder, OpenAITextEmbedder -from haystack.components.builders.answer_builder import AnswerBuilder -from haystack.components.generators import OpenAIGenerator -from haystack.components.preprocessors import DocumentCleaner -from haystack.components.writers import DocumentWriter -from haystack.utils import Secret -from haystack.components.builders import PromptBuilder -from couchbase_haystack import ( - CouchbaseQueryDocumentStore, - CouchbaseQueryEmbeddingRetriever, - QueryVectorSearchType, - QueryVectorSearchSimilarity, - CouchbasePasswordAuthenticator, - CouchbaseClusterOptions -) - -``` - -# Loading Sensitive Information -In this section, we prompt the user to input essential configuration settings needed. These settings include sensitive information like database credentials, collection names, and API keys. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security. - -The script also validates that all required inputs are provided, raising an error if any crucial information is missing. 
This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code.
-
-**OPENAI_API_KEY** is your OpenAI API key which can be obtained from your OpenAI dashboard at [platform.openai.com](https://platform.openai.com/api-keys).
-
-**INDEX_NAME** is the name of the GSI vector index we will create for vector search operations.
-
-
-```python
-CB_CONNECTION_STRING = input("Couchbase Cluster URL (default: localhost): ") or "couchbase://localhost"
-CB_USERNAME = input("Couchbase Username (default: admin): ") or "admin"
-CB_PASSWORD = input("Couchbase password (default: Password@12345): ") or "Password@12345"
-CB_BUCKET_NAME = input("Couchbase Bucket: ")
-SCOPE_NAME = input("Couchbase Scope: ")
-COLLECTION_NAME = input("Couchbase Collection: ")
-INDEX_NAME = input("Vector Search Index: ")
-# getpass avoids echoing the API key to the notebook output
-OPENAI_API_KEY = getpass.getpass("OpenAI API Key: ")
-
-# Check if the variables are correctly loaded
-if not all([CB_CONNECTION_STRING, CB_USERNAME, CB_PASSWORD, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME, INDEX_NAME, OPENAI_API_KEY]):
-    raise ValueError("All configuration variables must be provided.")
-
-```
-
-# Setting Up Logging
-Logging is essential for tracking the execution of our script and debugging any issues that may arise. We set up a logger that will display information about the script's progress, including timestamps and log levels.
-
-
-
-```python
-# Configure logging
-logging.basicConfig(
-    level=logging.INFO,
-    format="%(asctime)s - %(levelname)s - %(message)s",
-    handlers=[logging.StreamHandler(sys.stdout)],
-)
-```
-
-# Connecting to Couchbase Capella
-The next step is to establish a connection to our Couchbase Capella cluster. This connection will allow us to interact with the database, store and retrieve documents, and perform vector searches. 
- - - -```python -try: - # Initialize the Couchbase Cluster - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - options.apply_profile(KnownConfigProfiles.WanDevelopment) - - # Connect to the cluster - cluster = Cluster(CB_CONNECTION_STRING, options) - - # Wait for the cluster to be ready - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to the Couchbase cluster") -except CouchbaseException as e: - raise RuntimeError(f"Failed to connect to Couchbase: {str(e)}") -``` - -# Setting Up the Bucket, Scope, and Collection -Before we can store our data, we need to ensure that the appropriate bucket, scope, and collection exist in our Couchbase cluster. The code below checks if these components exist and creates them if they don't, providing a foundation for storing our vector embeddings and documents. - - -```python -from couchbase.management.buckets import CreateBucketSettings -import json - -# Create bucket if it does not exist -bucket_manager = cluster.buckets() -try: - bucket_manager.get_bucket(CB_BUCKET_NAME) - print(f"Bucket '{CB_BUCKET_NAME}' already exists.") -except Exception as e: - print(f"Bucket '{CB_BUCKET_NAME}' does not exist. Creating bucket...") - bucket_settings = CreateBucketSettings(name=CB_BUCKET_NAME, ram_quota_mb=500) - bucket_manager.create_bucket(bucket_settings) - print(f"Bucket '{CB_BUCKET_NAME}' created successfully.") - -# Create scope and collection if they do not exist -collection_manager = cluster.bucket(CB_BUCKET_NAME).collections() -scopes = collection_manager.get_all_scopes() -scope_exists = any(scope.name == SCOPE_NAME for scope in scopes) - -if scope_exists: - print(f"Scope '{SCOPE_NAME}' already exists.") -else: - print(f"Scope '{SCOPE_NAME}' does not exist. 
Creating scope...")
-    collection_manager.create_scope(SCOPE_NAME)
-    print(f"Scope '{SCOPE_NAME}' created successfully.")
-
-collections = [collection.name for scope in scopes if scope.name == SCOPE_NAME for collection in scope.collections]
-collection_exists = COLLECTION_NAME in collections
-
-if collection_exists:
-    print(f"Collection '{COLLECTION_NAME}' already exists in scope '{SCOPE_NAME}'.")
-else:
-    print(f"Collection '{COLLECTION_NAME}' does not exist in scope '{SCOPE_NAME}'. Creating collection...")
-    collection_manager.create_collection(collection_name=COLLECTION_NAME, scope_name=SCOPE_NAME)
-    print(f"Collection '{COLLECTION_NAME}' created successfully.")
-
-```
-
-# Load the BBC News Dataset
-To build a RAG engine, we need data to search through. We use the [BBC Realtime News dataset](https://huggingface.co/datasets/RealTimeData/bbc_news_alltime), a dataset with up-to-date BBC news articles grouped by month. This dataset contains articles that were created after the LLM was trained. It will showcase the use of RAG to augment the LLM.
-
-The BBC News dataset's varied content allows us to simulate real-world scenarios where users ask complex questions, enabling us to fine-tune our RAG's ability to understand and respond to various types of queries.
-
-
-
-```python
-try:
-    news_dataset = load_dataset('RealTimeData/bbc_news_alltime', '2024-12', split="train")
-    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
-except Exception as e:
-    raise ValueError(f"Error loading BBC News dataset: {str(e)}")
-```
-
-## Preview the Data
-
-
-```python
-# Print the first two examples from the dataset
-print("Dataset columns:", news_dataset.column_names)
-print("\nFirst two examples:")
-print(news_dataset[:2])
-```
-
-## Preparing the Data for RAG
-
-We need to extract the context passages from the dataset to use as our knowledge base for the RAG system. 
-
-
-```python
-import hashlib
-
-news_articles = news_dataset
-unique_articles = {}
-
-for article in news_articles:
-    content = article.get("content")
-    if content:
-        content_hash = hashlib.md5(content.encode()).hexdigest() # Generate hash of content
-        if content_hash not in unique_articles:
-            unique_articles[content_hash] = article # Store full article
-
-unique_news_articles = list(unique_articles.values()) # Convert back to list
-
-print(f"We have {len(unique_news_articles)} unique articles in our dataset.")
-
-```
-
-# Creating Embeddings using OpenAI
-Embeddings are numerical representations of text that capture semantic meaning. Unlike keyword-based search, embeddings enable semantic search to understand context and retrieve documents that are conceptually similar even without exact keyword matches. We'll use OpenAI's `text-embedding-3-large` model to create high-quality embeddings with 3,072 dimensions. We generate these embeddings through Haystack's OpenAI document embedder and search the resulting vectors efficiently in Couchbase. 
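To make semantic similarity concrete: once text is embedded, closeness is usually measured with cosine similarity. A toy sketch with made-up 4-dimensional vectors (real `text-embedding-3-large` vectors have 3,072 dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 = similar, near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: the first two "sentences" point in a similar direction
v_cat = [0.9, 0.1, 0.0, 0.2]
v_kitten = [0.8, 0.2, 0.1, 0.2]
v_finance = [0.0, 0.1, 0.9, 0.0]

print(cosine_similarity(v_cat, v_kitten))   # close to 1.0
print(cosine_similarity(v_cat, v_finance))  # close to 0.0
```

A vector search engine ranks stored embeddings by exactly this kind of distance to the query embedding (Couchbase also supports L2, as configured later in this tutorial).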
- - - -```python -try: - # Set up the document embedder for processing documents - document_embedder = OpenAIDocumentEmbedder( - api_key=Secret.from_token(OPENAI_API_KEY), - model="text-embedding-3-large" - ) - - # Set up the text embedder for query processing - rag_embedder = OpenAITextEmbedder( - api_key=Secret.from_token(OPENAI_API_KEY), - model="text-embedding-3-large" - ) - - print("Successfully created embedding models") -except Exception as e: - raise ValueError(f"Error creating embedding models: {str(e)}") -``` - -# Testing the Embeddings Model -We can test the text embeddings model by generating an embedding for a string - - -```python -test_result = rag_embedder.run(text="this is a test sentence") -test_embedding = test_result["embedding"] -print(f"Embedding dimension: {len(test_embedding)}") -``` - -# Setting Up the Couchbase GSI Document Store -The Couchbase GSI document store is set up to store the documents from the dataset using Couchbase's Global Secondary Index vector search capabilities. This document store is optimized for high-performance vector similarity search operations and can scale to billions of vectors using Haystack's Couchbase integration. 
- - -```python -try: - # Create the Couchbase GSI document store - document_store = CouchbaseQueryDocumentStore( - cluster_connection_string=Secret.from_token(CB_CONNECTION_STRING), - authenticator=CouchbasePasswordAuthenticator( - username=Secret.from_token(CB_USERNAME), - password=Secret.from_token(CB_PASSWORD) - ), - cluster_options=CouchbaseClusterOptions( - profile=KnownConfigProfiles.WanDevelopment, - ), - bucket=CB_BUCKET_NAME, - scope=SCOPE_NAME, - collection=COLLECTION_NAME, - search_type=QueryVectorSearchType.ANN, - similarity=QueryVectorSearchSimilarity.L2 - ) - print("Successfully created GSI document store") -except Exception as e: - raise ValueError(f"Failed to create GSI document store: {str(e)}") -``` - -# Creating Haystack Documents -In this section, we'll process our news articles and create Haystack Document objects. -Each Document is created with specific metadata that will be used for retrieval and generation. -We'll observe examples of the document content to understand how the documents are structured. - - -```python -haystack_documents = [] -# Process and store documents -for article in unique_news_articles: # Process all unique articles - try: - document = Document( - content=article["content"], - meta={ - "title": article["title"], - "description": article["description"], - "published_date": article["published_date"], - "link": article["link"], - } - ) - haystack_documents.append(document) - except Exception as e: - print(f"Failed to create document: {str(e)}") - continue - -# Observing an example of the document content -print("Document content preview:") -print(f"Content: {haystack_documents[0].content[:200]}...") -print(f"Metadata: {haystack_documents[0].meta}") - -print(f"Created {len(haystack_documents)} documents") - - -``` - -# Creating and Running the Indexing Pipeline - -In this section, we'll create an indexing pipeline to process our documents. The pipeline will: - -1. 
Clean the documents using the DocumentCleaner, which removes extra whitespace and empty content
-2. Generate embeddings for each document using our document embedder
-3. Store the documents with their embeddings in our Couchbase document store
-
-This process transforms our raw documents into a searchable knowledge base that can be queried semantically.
-
-
-```python
-
-
-# Process documents: clean them, generate embeddings, and store in document store
-# Create indexing pipeline
-indexing_pipeline = Pipeline()
-indexing_pipeline.add_component("cleaner", DocumentCleaner())
-indexing_pipeline.add_component("embedder", document_embedder)
-indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
-
-indexing_pipeline.connect("cleaner.documents", "embedder.documents")
-indexing_pipeline.connect("embedder.documents", "writer.documents")
-
-
-
-```
-
-# Run Indexing Pipeline
-
-Execute the pipeline for processing and indexing BBC news documents:
-
-
-```python
-# Run the indexing pipeline
-if haystack_documents:
-    result = indexing_pipeline.run({"cleaner": {"documents": haystack_documents}})
-    print(f"Indexed {result['writer']['documents_written']} documents")
-else:
-    print("No documents created. Skipping indexing.")
-
-```
-
-# Using OpenAI's Large Language Model (LLM)
-Large language models are AI systems that are trained to understand and generate human language. We'll be using OpenAI's `gpt-4o` model to process user queries and generate meaningful responses based on the retrieved context from our Couchbase document store. This model is a key component of our RAG system, allowing it to go beyond simple keyword matching and truly understand the intent behind a query. By integrating OpenAI's LLM, we equip our RAG system with the ability to interpret complex queries, understand the nuances of language, and provide more accurate and contextually relevant responses. 
- -The language model's ability to understand context and generate coherent responses is what makes our RAG system truly intelligent. It can not only find the right information but also present it in a way that is useful and understandable to the user. - -The LLM is configured using Haystack's OpenAI generator component with your OpenAI API key for seamless integration with their services. - - -```python -try: - # Set up the LLM generator - generator = OpenAIGenerator( - api_key=Secret.from_token(OPENAI_API_KEY), - model="gpt-4o" - ) - logging.info("Successfully created the OpenAI generator") -except Exception as e: - raise ValueError(f"Error creating OpenAI generator: {str(e)}") -``` - -# Creating the RAG Pipeline - -In this section, we'll create a RAG pipeline using Haystack components. This pipeline serves as the foundation for our RAG system, enabling semantic search capabilities and efficient retrieval of relevant information. - -The RAG pipeline provides a complete workflow that allows us to: -1. Perform semantic searches based on user queries -2. Retrieve the most relevant documents or chunks -3. 
Generate contextually appropriate responses using our LLM
-
-
-
-```python
-# Define RAG prompt template
-prompt_template = """
-Given these documents, answer the question.\nDocuments:
-{% for doc in documents %}
-    {{ doc.content }}
-{% endfor %}
-
-\nQuestion: {{question}}
-\nAnswer:
-"""
-
-# Create the RAG pipeline
-rag_pipeline = Pipeline()
-
-# Add components to the pipeline
-rag_pipeline.add_component(
-    "query_embedder",
-    rag_embedder,
-)
-rag_pipeline.add_component("retriever", CouchbaseQueryEmbeddingRetriever(document_store=document_store))
-rag_pipeline.add_component("prompt_builder", PromptBuilder(template=prompt_template))
-rag_pipeline.add_component("llm", generator)
-rag_pipeline.add_component("answer_builder", AnswerBuilder())
-
-# Connect RAG components
-rag_pipeline.connect("query_embedder", "retriever.query_embedding")
-rag_pipeline.connect("retriever.documents", "prompt_builder.documents")
-rag_pipeline.connect("prompt_builder.prompt", "llm.prompt")
-rag_pipeline.connect("llm.replies", "answer_builder.replies")
-rag_pipeline.connect("llm.meta", "answer_builder.meta")
-rag_pipeline.connect("retriever", "answer_builder.documents")
-
-print("Successfully created RAG pipeline")
-```
-
-# Retrieval-Augmented Generation (RAG) with Couchbase and Haystack
-
-Let's test our RAG system by performing a semantic search on a sample query. In this example, we'll use a question about an upcoming Daniel Dubois fight. The RAG system will:
-
-1. Process the natural language query
-2. Search through our document store for relevant information
-3. Retrieve the most semantically similar documents
-4. Generate a comprehensive response using the LLM
-
-This demonstrates how our system combines the power of vector search with language model capabilities to provide accurate, contextual answers based on the information in our database. 
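The four steps above amount to plain function composition: the query is embedded, the embedding retrieves documents, the documents and question fill the prompt, and the prompt drives the generator. A toy sketch with hypothetical stand-ins (no Haystack, Couchbase, or OpenAI calls):

```python
# Plain-Python sketch of the dataflow the connect() calls wire up.
# Every function below is a hypothetical stand-in, not a real library API.

def embed_query(text: str) -> list[float]:
    # stands in for OpenAITextEmbedder: text -> vector
    return [float(ord(c) % 7) for c in text[:4]]

def retrieve(query_embedding: list[float], top_k: int) -> list[str]:
    # stands in for CouchbaseQueryEmbeddingRetriever: vector -> documents
    corpus = ["doc about boxing", "doc about football", "doc about weather"]
    return corpus[:top_k]

def build_prompt(documents: list[str], question: str) -> str:
    # stands in for PromptBuilder rendering the Jinja template
    joined = "\n".join(documents)
    return f"Given these documents, answer the question.\nDocuments:\n{joined}\n\nQuestion: {question}\nAnswer:"

def generate(prompt: str) -> str:
    # stands in for OpenAIGenerator
    return f"(LLM reply grounded in a {len(prompt)}-character prompt)"

question = "Who is fighting next?"
docs = retrieve(embed_query(question), top_k=2)
print(generate(build_prompt(docs, question)))
```

AnswerBuilder is omitted here; it simply packages the LLM reply together with the query and the retrieved documents into a `GeneratedAnswer` object.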
-
-**Note:** By default, without any GSI vector index, Couchbase uses linear brute force search which compares the query vector against every document in the collection. This works for small datasets but can become slow as the dataset grows.
-
-
-```python
-# Sample query from the dataset
-
-query = "Who will Daniel Dubois fight in Saudi Arabia on 22 February?"
-
-try:
-    # Perform the semantic search using the RAG pipeline
-    start_time = time.time()
-    result = rag_pipeline.run({
-        "query_embedder": {"text": query},
-        "retriever": {"top_k": 5},
-        "prompt_builder": {"question": query},
-        "answer_builder": {"query": query},
-    },
-    include_outputs_from={"retriever", "query_embedder"}
-    )
-    search_elapsed_time = time.time() - start_time
-    # Get the generated answer
-    answer: GeneratedAnswer = result["answer_builder"]["answers"][0]
-
-    # Print retrieved documents
-    print("=== Retrieved Documents ===")
-    retrieved_docs = result["retriever"]["documents"]
-    for idx, doc in enumerate(retrieved_docs, start=1):
-        print(f"Id: {doc.id} Title: {doc.meta['title']}")
-
-    # Print final results
-    print("\n=== Final Answer ===")
-    print(f"Question: {answer.query}")
-    print(f"Answer: {answer.data}")
-    print("\nSources:")
-    for doc in answer.documents:
-        print(f"-> {doc.meta['title']}")
-    # Display search timing (no GSI index yet, so this run is a brute-force scan)
-    print(f"\nVector search completed in {search_elapsed_time:.2f} seconds")
-
-except Exception as e:
-    raise RuntimeError(f"Error performing RAG search: {e}")
-```
-
-# Create GSI Vector Index (Optimized Search)
-
-While the above RAG system works effectively, we can significantly improve query performance by leveraging Couchbase's advanced GSI vector search capabilities.
-
-In this section, we'll create a GSI (Global Secondary Index) vector index to accelerate these searches. Couchbase offers three types of vector indexes; for GSI-based vector search, we focus on the two types described below. 
- -GSI vector search supports two main index types: - -## Hyperscale Vector Indexes -- Specifically designed for vector searches -- Perform vector similarity and semantic searches faster than the other types of indexes -- Designed to scale to billions of vectors -- Most of the index resides in a highly optimized format on disk -- High accuracy even for vectors with a large number of dimensions -- Supports concurrent searches and inserts for datasets that are constantly changing - -Use this type of index when you want to primarily query vector values with a low memory footprint. In general, Hyperscale Vector indexes are the best choice for most applications that use vector searches. - -## Composite Vector Indexes -- Combines a standard Global Secondary index (GSI) with a single vector column -- Designed for searches using a single vector value along with standard scalar values that filter out large portions of the dataset. The scalar attributes in a query reduce the number of vectors the Couchbase Server has to compare when performing a vector search to find similar vectors. -- Consume a moderate amount of memory and can index billions of documents. -- Work well for cases where your queries are highly selective — returning a small number of results from a large dataset - -Use Composite Vector indexes when you want to perform searches of documents using both scalars and a vector where the scalar values filter out large portions of the dataset. - -For more details, see the [Couchbase Vector Index documentation](https://docs.couchbase.com/server/current/vector-index/use-vector-indexes.html). 
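To build intuition for when a Composite index pays off, here is a pure-Python sketch (not Couchbase code) contrasting a full linear scan with a scan that first filters on a scalar attribute. The selective filter shrinks the set of vectors that need a distance computation:

```python
import math
import random

random.seed(7)

# Toy corpus of (category, vector) pairs; only 1 in 20 documents is "sport"
corpus = [("sport" if i % 20 == 0 else "other",
           [random.random() for _ in range(8)])
          for i in range(1000)]
query = [random.random() for _ in range(8)]

def l2(a, b):
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Full scan: every vector gets a distance computation (unindexed behavior)
full_comparisons = len(corpus)
best_overall = min(corpus, key=lambda doc: l2(doc[1], query))

# Scalar prefilter: only "sport" vectors are compared (the Composite-index idea)
candidates = [doc for doc in corpus if doc[0] == "sport"]
filtered_comparisons = len(candidates)
best_sport = min(candidates, key=lambda doc: l2(doc[1], query))

print(f"full scan: {full_comparisons} distance computations")      # 1000
print(f"prefiltered: {filtered_comparisons} distance computations") # 50
```

In a real Composite Vector Index the filtering happens inside the index scan itself, but the effect is the same: a selective scalar predicate leaves far fewer candidate vectors per query.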
-
-## Understanding Index Configuration (Couchbase 8.0 Feature)
-
-The index `description` parameter controls how Couchbase optimizes vector storage and search performance through centroids and quantization:
-
-Format: `IVF[<centroids>],{SQ<bits>|PQ<subquantizers>x<bits>}`
-
-**Centroids (IVF - Inverted File):**
-- Controls how the dataset is subdivided for faster searches
-- More centroids = faster search, slower training
-- Fewer centroids = slower search, faster training
-- If omitted (like IVF,SQ8), Couchbase auto-selects based on dataset size
-
-**Quantization Options:**
-- SQ (Scalar Quantization): SQ4, SQ6, SQ8 (4, 6, or 8 bits per dimension)
-- PQ (Product Quantization): `PQ<subquantizers>x<bits>` (e.g., PQ32x8)
-- Higher values = better accuracy, larger index size
-
-**Common Examples:**
-- IVF,SQ8 - Auto centroids, 8-bit scalar quantization (good default)
-- IVF1000,SQ6 - 1000 centroids, 6-bit scalar quantization
-- IVF,PQ32x8 - Auto centroids, 32 subquantizers with 8 bits
-
-For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/server/current/vector-index/hyperscale-vector-index.html#algo_settings).
-
-In the code below, we create a BHIVE (Hyperscale) index with a `CREATE INDEX` statement, passing the vector dimension, the index description, and the similarity metric through the `WITH` clause. Alternatively, GSI indexes can be created manually from the Couchbase UI. 
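Before creating the index, a quick back-of-the-envelope shows what these quantization settings trade away. The figures below are illustrative per-vector storage only; real indexes add centroid and metadata overhead:

```python
DIMS = 3072  # text-embedding-3-large output dimension

# Raw float32 storage: 4 bytes per dimension
raw_bytes = DIMS * 4

# SQ8: scalar quantization at 8 bits (1 byte) per dimension
sq8_bytes = DIMS * 8 // 8

# PQ32x8: 32 subquantizers, each emitting one 8-bit code per vector
pq32x8_bytes = 32 * 8 // 8

print(f"raw float32: {raw_bytes} B/vector")    # 12288
print(f"SQ8:         {sq8_bytes} B/vector")    # 3072 (4x smaller)
print(f"PQ32x8:      {pq32x8_bytes} B/vector") # 32 (384x smaller)
```

This is why PQ settings like the `IVF1024,PQ32x8` used below can hold billions of vectors on modest hardware, at some cost in recall compared with raw or SQ8 storage.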
- - -```python -# Create a BHIVE (Hyperscale Vector Index) for optimized vector search -try: - bhive_index_name = f"{INDEX_NAME}_bhive" - - # Use the cluster connection to create the BHIVE index - scope = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME) - - options = { - "dimension": 3072, # text-embedding-3-large dimension - "description": "IVF1024,PQ32x8", - "similarity": "L2", - } - - scope.query( - f""" - CREATE INDEX {bhive_index_name} - ON {COLLECTION_NAME} (embedding VECTOR) - USING GSI WITH {json.dumps(options)} - """, - QueryOptions( - timeout=timedelta(seconds=300) - )).execute() - print(f"Successfully created BHIVE index: {bhive_index_name}") -except Exception as e: - print(f"BHIVE index may already exist or error occurred: {str(e)}") - -``` - -# Testing Optimized GSI Vector Search - -The example below shows running the same RAG query, but now using the BHIVE GSI index we created above. You'll notice improved performance as the index efficiently retrieves data. - - -```python -# Test the optimized GSI vector search with BHIVE index -query = "Who will Daniel Dubois fight in Saudi Arabia on 22 February?" 
-
-try:
-    # The RAG pipeline will automatically use the optimized GSI index
-    # Perform the semantic search with GSI optimization
-    start_time = time.time()
-    result = rag_pipeline.run({
-        "query_embedder": {"text": query},
-        "retriever": {"top_k": 4},
-        "prompt_builder": {"question": query},
-        "answer_builder": {"query": query},
-    },
-    include_outputs_from={"retriever", "query_embedder"}
-    )
-    search_elapsed_time = time.time() - start_time
-    # Get the generated answer
-    answer: GeneratedAnswer = result["answer_builder"]["answers"][0]
-
-    # Print retrieved documents
-    print("=== Retrieved Documents ===")
-    retrieved_docs = result["retriever"]["documents"]
-    for idx, doc in enumerate(retrieved_docs, start=1):
-        print(f"Id: {doc.id} Title: {doc.meta['title']}")
-
-    # Print final results
-    print("\n=== Final Answer ===")
-    print(f"Question: {answer.query}")
-    print(f"Answer: {answer.data}")
-    print("\nSources:")
-    for doc in answer.documents:
-        print(f"-> {doc.meta['title']}")
-    # Display search timing with the BHIVE index in place
-    print(f"\nOptimized GSI vector search completed in {search_elapsed_time:.2f} seconds")
-
-except Exception as e:
-    raise RuntimeError(f"Error performing optimized semantic search: {e}")
-
-```
-
-# Conclusion
-In this tutorial, we've built a Retrieval Augmented Generation (RAG) system using Couchbase Capella's GSI vector search, OpenAI, and Haystack. We used the BBC News dataset, which contains real-time news articles, to demonstrate how RAG can be used to answer questions about current events and provide up-to-date information that extends beyond the LLM's training data.
-
-The key components of our RAG system include:
-
-1. **Couchbase Capella GSI Vector Search** as the high-performance vector database for storing and retrieving document embeddings
-2. **Haystack** as the framework for building modular RAG pipelines with flexible component connections
-3. 
**OpenAI Services** for generating embeddings (`text-embedding-3-large`) and LLM responses (`gpt-4o`) -4. **GSI Vector Indexes** (BHIVE/Composite) for optimized vector search performance - -This approach allows us to enhance the capabilities of large language models by grounding their responses in specific, up-to-date information from our knowledge base, while leveraging Couchbase's advanced GSI vector search for optimal performance and scalability. Haystack's modular pipeline approach provides flexibility and extensibility for building complex RAG applications. - diff --git a/tutorial/markdown/generated/vector-search-cookbook/haystack-query_based-RAG_with_Couchbase_Capella_and_OpenAI.md b/tutorial/markdown/generated/vector-search-cookbook/haystack-query_based-RAG_with_Couchbase_Capella_and_OpenAI.md deleted file mode 100644 index 5602a06..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/haystack-query_based-RAG_with_Couchbase_Capella_and_OpenAI.md +++ /dev/null @@ -1,683 +0,0 @@ ---- -# frontmatter -path: "/tutorial-openai-haystack-rag-with-hyperscale-or-composite-vector-index" -alt_paths: - - "/tutorial-openai-haystack-rag-with-hyperscale-vector-index" - - "/tutorial-openai-haystack-rag-with-composite-vector-index" -title: "RAG with OpenAI, Haystack, and Couchbase Hyperscale & Composite Vector Indexes" -short_title: "RAG with OpenAI, Haystack, and Hyperscale & Composite Indexes" -description: - - Learn how to build a semantic search engine using Couchbase Hyperscale and Composite Vector Indexes. - - This tutorial demonstrates how Haystack integrates Couchbase Hyperscale and Composite Vector Indexes with embeddings generated by OpenAI services. - - Perform Retrieval-Augmented Generation (RAG) using Haystack with Couchbase and OpenAI services while comparing the two index types. 
-content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - OpenAI - - Artificial Intelligence - - Haystack - - Hyperscale Vector Index - - Composite Vector Index -sdk_language: - - python -length: 60 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/haystack/query_based/RAG_with_Couchbase_Capella_and_OpenAI.ipynb) - -# Introduction - -In this guide, we will walk you through building a Retrieval Augmented Generation (RAG) application with Haystack orchestrating OpenAI models and Couchbase Capella. We will use the [gpt-4o](https://platform.openai.com/docs/models/gpt-4o) model for response generation and the [text-embedding-3-large](https://platform.openai.com/docs/guides/embeddings/embedding-models) model for generating embeddings. - -This notebook demonstrates how to build a RAG system using: -- The [BBC News dataset](https://huggingface.co/datasets/RealTimeData/bbc_news_alltime) containing news articles -- Couchbase Capella Hyperscale and Composite Vector Indexes for vector search -- Haystack framework for the RAG pipeline -- OpenAI for embeddings and text generation - -We leverage Couchbase's Hyperscale and Composite Vector Indexes to enable efficient semantic search at scale. Hyperscale indexes prioritize high-throughput vector similarity across billions of vectors with a compact on-disk footprint, while Composite indexes blend scalar predicates with a vector column to narrow candidate sets before similarity search. For a deeper dive into how these indexes work, see the [overview of Capella vector indexes](https://docs.couchbase.com/cloud/vector-index/vectors-and-indexes-overview.html). - -Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. 
This tutorial shows how to combine OpenAI Services and Haystack with Couchbase's Hyperscale and Composite Vector Indexes to deliver a production-ready RAG workflow. - -# Before you start - -## Create and Deploy Your Operational cluster on Capella - -To get started with Couchbase Capella, create an account and use it to deploy an operational cluster. - -To know more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html). - -### Couchbase Capella Configuration - -When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met: - -* Have a multi-node Capella cluster running the Data, Query, Index, and Search services. -* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the bucket (Read and Write) used in the application. -* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running. - -### OpenAI Models Setup - -In order to create the RAG application, we need an embedding model to generate embeddings of documents for Vector Search and a large language model (LLM) for generating the responses based on the context. - -For this implementation, we'll use OpenAI's models which provide state-of-the-art performance for both embeddings and text generation: - -**Embedding Model**: We'll use OpenAI's `text-embedding-3-large` model, which provides high-quality embeddings with 3,072 dimensions for semantic search capabilities. - -**Large Language Model**: We'll use OpenAI's `gpt-4o` model for generating responses based on the retrieved context. This model offers excellent reasoning capabilities and can handle complex queries effectively. 
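To build intuition for what these embeddings do, the sketch below ranks two toy documents against a query by cosine similarity. This is purely illustrative: the vectors are hand-made 4-dimensional stand-ins for real 3,072-dimensional `text-embedding-3-large` outputs, and in the actual pipeline Couchbase computes the distances (configured for L2 in this tutorial) rather than Python.

```python
import math

def cosine_similarity(a, b):
    # cos(a, b) = dot(a, b) / (|a| * |b|); 1.0 means the vectors point the same way.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hand-made 4-dimensional stand-ins for real 3,072-dimensional embeddings.
query_vec = [0.9, 0.1, 0.0, 0.2]
doc_vecs = {
    "football article": [0.8, 0.2, 0.1, 0.3],
    "finance article": [0.1, 0.9, 0.7, 0.0],
}

# Rank documents by similarity to the query, most similar first.
ranked = sorted(doc_vecs, key=lambda name: cosine_similarity(query_vec, doc_vecs[name]), reverse=True)
print(ranked)
```

Whichever distance metric is used, the principle is the same: documents whose vectors lie closest to the query vector are treated as the most semantically relevant.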
- -**Prerequisites for OpenAI Integration**: -* Create an OpenAI account at [platform.openai.com](https://platform.openai.com) -* Generate an API key from your OpenAI dashboard -* Ensure you have sufficient credits or a valid payment method set up -* Set up your API key as an environment variable or input it securely in the notebook - -For more details about OpenAI's models and pricing, please refer to the [OpenAI documentation](https://platform.openai.com/docs/models). - - -# Installing Necessary Libraries -To build our RAG system, we need a set of libraries. The libraries we install handle everything from connecting to databases to performing AI tasks. Each library has a specific role: Couchbase libraries manage database operations, Haystack handles AI model integrations and pipeline management, and we will use the OpenAI SDK for generating embeddings and calling OpenAI's language models. - - - -```python -# Install required packages -%pip install -r requirements.txt -``` - -# Importing Necessary Libraries -The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, Haystack components for RAG pipeline, embedding generation, and dataset loading. 
- - - -```python -import getpass -import base64 -import logging -import sys -import time -import pandas as pd -from datetime import timedelta - -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.exceptions import CouchbaseException -from couchbase.options import ClusterOptions, KnownConfigProfiles, QueryOptions - -from datasets import load_dataset - -from haystack import Pipeline, Document, GeneratedAnswer -from haystack.components.embedders import OpenAIDocumentEmbedder, OpenAITextEmbedder -from haystack.components.builders.answer_builder import AnswerBuilder -from haystack.components.generators import OpenAIGenerator -from haystack.components.preprocessors import DocumentCleaner -from haystack.components.writers import DocumentWriter -from haystack.utils import Secret -from haystack.components.builders import PromptBuilder -from couchbase_haystack import ( - CouchbaseQueryDocumentStore, - CouchbaseQueryEmbeddingRetriever, - QueryVectorSearchType, - QueryVectorSearchSimilarity, - CouchbasePasswordAuthenticator, - CouchbaseClusterOptions -) - -``` - -# Loading Sensitive Information -In this section, we prompt the user to input essential configuration settings needed. These settings include sensitive information like database credentials, collection names, and API keys. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security. - -The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code. - -**OPENAI_API_KEY** is your OpenAI API key which can be obtained from your OpenAI dashboard at [platform.openai.com](https://platform.openai.com/api-keys). 
- -**INDEX_NAME** is the name of the Hyperscale or Composite Vector Index we will create for vector search operations. - - -```python -CB_CONNECTION_STRING = input("Couchbase Cluster URL (default: localhost): ") or "couchbase://localhost" -CB_USERNAME = input("Couchbase Username (default: admin): ") or "admin" -CB_PASSWORD = input("Couchbase password (default: Password@12345): ") or "Password@12345" -CB_BUCKET_NAME = input("Couchbase Bucket: ") -SCOPE_NAME = input("Couchbase Scope: ") -COLLECTION_NAME = input("Couchbase Collection: ") -INDEX_NAME = input("Vector Search Index: ") -OPENAI_API_KEY = input("OpenAI API Key: ") - -# Check if the variables are correctly loaded -if not all([CB_CONNECTION_STRING, CB_USERNAME, CB_PASSWORD, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME, INDEX_NAME, OPENAI_API_KEY]): - raise ValueError("All configuration variables must be provided.") - -``` - -# Setting Up Logging -Logging is essential for tracking the execution of our script and debugging any issues that may arise. We set up a logger that will display information about the script's progress, including timestamps and log levels. - - - -```python -# Configure logging -logging.basicConfig( - level=logging.INFO, - format="%(asctime)s - %(levelname)s - %(message)s", - handlers=[logging.StreamHandler(sys.stdout)], -) -``` - -# Connecting to Couchbase Capella -The next step is to establish a connection to our Couchbase Capella cluster. This connection will allow us to interact with the database, store and retrieve documents, and perform vector searches. 
- - - -```python -try: - # Initialize the Couchbase Cluster - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - options.apply_profile(KnownConfigProfiles.WanDevelopment) - - # Connect to the cluster - cluster = Cluster(CB_CONNECTION_STRING, options) - - # Wait for the cluster to be ready - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to the Couchbase cluster") -except CouchbaseException as e: - raise RuntimeError(f"Failed to connect to Couchbase: {str(e)}") -``` - -# Setting Up the Bucket, Scope, and Collection -Before we can store our data, we need to ensure that the appropriate bucket, scope, and collection exist in our Couchbase cluster. The code below checks if these components exist and creates them if they don't, providing a foundation for storing our vector embeddings and documents. - - -```python -from couchbase.management.buckets import CreateBucketSettings -import json - -# Create bucket if it does not exist -bucket_manager = cluster.buckets() -try: - bucket_manager.get_bucket(CB_BUCKET_NAME) - print(f"Bucket '{CB_BUCKET_NAME}' already exists.") -except Exception as e: - print(f"Bucket '{CB_BUCKET_NAME}' does not exist. Creating bucket...") - bucket_settings = CreateBucketSettings(name=CB_BUCKET_NAME, ram_quota_mb=500) - bucket_manager.create_bucket(bucket_settings) - print(f"Bucket '{CB_BUCKET_NAME}' created successfully.") - -# Create scope and collection if they do not exist -collection_manager = cluster.bucket(CB_BUCKET_NAME).collections() -scopes = collection_manager.get_all_scopes() -scope_exists = any(scope.name == SCOPE_NAME for scope in scopes) - -if scope_exists: - print(f"Scope '{SCOPE_NAME}' already exists.") -else: - print(f"Scope '{SCOPE_NAME}' does not exist. 
Creating scope...") - collection_manager.create_scope(SCOPE_NAME) - print(f"Scope '{SCOPE_NAME}' created successfully.") - -collections = [collection.name for scope in scopes if scope.name == SCOPE_NAME for collection in scope.collections] -collection_exists = COLLECTION_NAME in collections - -if collection_exists: - print(f"Collection '{COLLECTION_NAME}' already exists in scope '{SCOPE_NAME}'.") -else: - print(f"Collection '{COLLECTION_NAME}' does not exist in scope '{SCOPE_NAME}'. Creating collection...") - collection_manager.create_collection(collection_name=COLLECTION_NAME, scope_name=SCOPE_NAME) - print(f"Collection '{COLLECTION_NAME}' created successfully.") - -``` - -# Load the BBC News Dataset -To build a RAG engine, we need data to search through. We use the [BBC Realtime News dataset](https://huggingface.co/datasets/RealTimeData/bbc_news_alltime), a dataset with up-to-date BBC news articles grouped by month. This dataset contains articles that were created after the LLM was trained. It will showcase the use of RAG to augment the LLM. - -The BBC News dataset's varied content allows us to simulate real-world scenarios where users ask complex questions, enabling us to fine-tune our RAG's ability to understand and respond to various types of queries. - - - -```python -try: - news_dataset = load_dataset('RealTimeData/bbc_news_alltime', '2024-12', split="train") - print(f"Loaded the BBC News dataset with {len(news_dataset)} rows") -except Exception as e: - raise ValueError(f"Error loading BBC News dataset: {str(e)}") -``` - -## Preview the Data - - -```python -# Print the first two examples from the dataset -print("Dataset columns:", news_dataset.column_names) -print("\nFirst two examples:") -print(news_dataset[:2]) -``` - -## Preparing the Data for RAG - -We need to extract the context passages from the dataset to use as our knowledge base for the RAG system. 
- - -```python -import hashlib - -news_articles = news_dataset -unique_articles = {} - -for article in news_articles: - content = article.get("content") - if content: - content_hash = hashlib.md5(content.encode()).hexdigest() # Generate hash of content - if content_hash not in unique_articles: - unique_articles[content_hash] = article # Store full article - -unique_news_articles = list(unique_articles.values()) # Convert back to list - -print(f"We have {len(unique_news_articles)} unique articles in our database.") - -``` - -# Creating Embeddings using OpenAI -Embeddings are numerical representations of text that capture semantic meaning. Unlike keyword-based search, embeddings enable semantic search to understand context and retrieve documents that are conceptually similar even without exact keyword matches. We'll use OpenAI's `text-embedding-3-large` model to create high-quality embeddings with 3,072 dimensions. This model transforms our text data into vector representations that can be efficiently searched using Haystack's OpenAI document embedder. 
- - - -```python -try: - # Set up the document embedder for processing documents - document_embedder = OpenAIDocumentEmbedder( - api_key=Secret.from_token(OPENAI_API_KEY), - model="text-embedding-3-large" - ) - - # Set up the text embedder for query processing - rag_embedder = OpenAITextEmbedder( - api_key=Secret.from_token(OPENAI_API_KEY), - model="text-embedding-3-large" - ) - - print("Successfully created embedding models") -except Exception as e: - raise ValueError(f"Error creating embedding models: {str(e)}") -``` - -# Testing the Embeddings Model -We can test the text embeddings model by generating an embedding for a string - - -```python -test_result = rag_embedder.run(text="this is a test sentence") -test_embedding = test_result["embedding"] -print(f"Embedding dimension: {len(test_embedding)}") -``` - -# Setting Up the Couchbase Vector Document Store -The `CouchbaseQueryDocumentStore` from the `couchbase_haystack` package provides seamless integration with Couchbase, supporting both Hyperscale and Composite Vector Indexes. - - -```python -try: - # Create the Couchbase vector document store - document_store = CouchbaseQueryDocumentStore( - cluster_connection_string=Secret.from_token(CB_CONNECTION_STRING), - authenticator=CouchbasePasswordAuthenticator( - username=Secret.from_token(CB_USERNAME), - password=Secret.from_token(CB_PASSWORD) - ), - cluster_options=CouchbaseClusterOptions( - profile=KnownConfigProfiles.WanDevelopment, - ), - bucket=CB_BUCKET_NAME, - scope=SCOPE_NAME, - collection=COLLECTION_NAME, - search_type=QueryVectorSearchType.ANN, - similarity=QueryVectorSearchSimilarity.L2 - ) - print("Successfully created Couchbase vector document store") -except Exception as e: - raise ValueError(f"Failed to create Couchbase vector document store: {str(e)}") -``` - -# Creating Haystack Documents -In this section, we'll process our news articles and create Haystack Document objects. 
-Each Document is created with specific metadata that will be used for retrieval and generation.
-We'll observe examples of the document content to understand how the documents are structured.
-
-
-```python
-haystack_documents = []
-# Process and store documents
-for article in unique_news_articles:  # Process all unique articles
-    try:
-        document = Document(
-            content=article["content"],
-            meta={
-                "title": article["title"],
-                "description": article["description"],
-                "published_date": article["published_date"],
-                "link": article["link"],
-            }
-        )
-        haystack_documents.append(document)
-    except Exception as e:
-        print(f"Failed to create document: {str(e)}")
-        continue
-
-# Observing an example of the document content
-print("Document content preview:")
-print(f"Content: {haystack_documents[0].content[:200]}...")
-print(f"Metadata: {haystack_documents[0].meta}")
-
-print(f"Created {len(haystack_documents)} documents")
-```
-
-# Creating and Running the Indexing Pipeline
-
-In this section, we'll create an indexing pipeline to process our documents. The pipeline will:
-
-1. `DocumentCleaner` - Cleans and preprocesses the raw Haystack documents (removes extra whitespace, normalizes text)
-2. `document_embedder` - Generates vector embeddings for each document with OpenAI's `text-embedding-3-large` model, converting text into numerical representations for semantic search
-3. `DocumentWriter` - Writes the cleaned documents along with their embeddings to the Couchbase document store
-
-This transforms raw news articles into searchable vector representations stored in Couchbase for later semantic retrieval in the RAG system.
-
-
-```python
-# Build the indexing pipeline: clean the documents, generate embeddings, and store them in Couchbase
-indexing_pipeline = Pipeline()
-indexing_pipeline.add_component("cleaner", DocumentCleaner())
-indexing_pipeline.add_component("embedder", document_embedder)
-indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
-
-indexing_pipeline.connect("cleaner.documents", "embedder.documents")
-indexing_pipeline.connect("embedder.documents", "writer.documents")
-```
-
-# Run Indexing Pipeline
-
-Execute the pipeline to process and index the BBC News documents:
-
-```python
-# Run the indexing pipeline
-if haystack_documents:
-    result = indexing_pipeline.run({"cleaner": {"documents": haystack_documents[:1200]}})
-    print(f"Indexed {result['writer']['documents_written']} document chunks")
-else:
-    print("No documents created. Skipping indexing.")
-```
-
-# Using OpenAI's Large Language Model (LLM)
-Large language models are AI systems trained to understand and generate human language. We'll use OpenAI's `gpt-4o` model to process user queries and generate meaningful responses based on the retrieved context from our Couchbase document store. This model is a key component of our RAG system, allowing it to go beyond simple keyword matching and truly understand the intent behind a query. By integrating OpenAI's LLM, we equip our RAG system with the ability to interpret complex queries, understand the nuances of language, and provide accurate, contextually relevant responses.
-
-The language model's ability to understand context and generate coherent responses is what makes our RAG system truly intelligent. It can not only find the right information but also present it in a way that is useful and understandable to the user.
-
-The LLM is configured through Haystack's OpenAI generator component with your OpenAI API key.
- - -```python -try: - # Set up the LLM generator - generator = OpenAIGenerator( - api_key=Secret.from_token(OPENAI_API_KEY), - model="gpt-4o" - ) - logging.info("Successfully created the OpenAI generator") -except Exception as e: - raise ValueError(f"Error creating OpenAI generator: {str(e)}") -``` - -# Creating the RAG Pipeline - -In this section, we'll create a RAG pipeline using Haystack components. This pipeline serves as the foundation for our RAG system, enabling semantic search capabilities and efficient retrieval of relevant information. - -The RAG pipeline provides a complete workflow that allows us to: -1. Perform semantic searches based on user queries -2. Retrieve the most relevant documents or chunks -3. Generate contextually appropriate responses using our LLM - - - -```python -# Define RAG prompt template -prompt_template = """ -Given these documents, answer the question.\nDocuments: -{% for doc in documents %} - {{ doc.content }} -{% endfor %} - -\nQuestion: {{question}} -\nAnswer: -""" - -# Create the RAG pipeline -rag_pipeline = Pipeline() - -# Add components to the pipeline -rag_pipeline.add_component( - "query_embedder", - rag_embedder, -) -rag_pipeline.add_component("retriever", CouchbaseQueryEmbeddingRetriever(document_store=document_store)) -rag_pipeline.add_component("prompt_builder", PromptBuilder(template=prompt_template)) -rag_pipeline.add_component("llm",generator) -rag_pipeline.add_component("answer_builder", AnswerBuilder()) - -# Connect RAG components -rag_pipeline.connect("query_embedder", "retriever.query_embedding") -rag_pipeline.connect("retriever.documents", "prompt_builder.documents") -rag_pipeline.connect("prompt_builder.prompt", "llm.prompt") -rag_pipeline.connect("llm.replies", "answer_builder.replies") -rag_pipeline.connect("llm.meta", "answer_builder.meta") -rag_pipeline.connect("retriever", "answer_builder.documents") - -print("Successfully created RAG pipeline") -``` - -# Retrieval-Augmented Generation (RAG) with 
Couchbase and Haystack - -Let's test our RAG system by performing a semantic search on a sample query. In this example, we'll use a question about Pep Guardiola's reaction to Manchester City's recent form. The RAG system will: - -1. Process the natural language query -2. Search through our document store for relevant information -3. Retrieve the most semantically similar documents -4. Generate a comprehensive response using the LLM - -This demonstrates how our system combines the power of vector search with language model capabilities to provide accurate, contextual answers based on the information in our database. - -**Note:** By default, without any Hyperscale or Composite Vector Index, Couchbase falls back to linear brute-force search that compares the query vector against every document in the collection. This works for small datasets but can become slow as the dataset grows. - - -```python -# Sample query from the dataset - -query = "What is latest news on the death of Charles Breslin?" 
- -try: - # Perform the semantic search using the RAG pipeline - start_time = time.time() - result = rag_pipeline.run({ - "query_embedder": {"text": query}, - "retriever": {"top_k": 5}, - "prompt_builder": {"question": query}, - "answer_builder": {"query": query}, - }, - include_outputs_from={"retriever", "query_embedder"} - ) - search_elapsed_time = time.time() - start_time - # Get the generated answer - answer: GeneratedAnswer = result["answer_builder"]["answers"][0] - - # Print retrieved documents - print("=== Retrieved Documents ===") - retrieved_docs = result["retriever"]["documents"] - for idx, doc in enumerate(retrieved_docs, start=1): - print(f"Id: {doc.id} Title: {doc.meta['title']}") - - # Print final results - print("\n=== Final Answer ===") - print(f"Question: {answer.query}") - print(f"Answer: {answer.data}") - print("\nSources:") - for doc in answer.documents: - print(f"-> {doc.meta['title']}") - # Display search results - print(f"\nLinear Vector Search Results (completed in {search_elapsed_time:.2f} seconds):") - #print(result["generator"]["replies"][0]) - -except Exception as e: - raise RuntimeError(f"Error performing RAG search: {e}") -``` - -# Create Hyperscale or Composite Vector Indexes - -While the above RAG system works effectively, you can significantly improve query performance by enabling Couchbase Capella's Hyperscale or Composite Vector Indexes. - -## Hyperscale Vector Indexes -- Specifically designed for vector searches -- Perform vector similarity and semantic searches faster than other index types -- Scale to billions of vectors while keeping most of the structure in an optimized on-disk format -- Maintain high accuracy even for vectors with a large number of dimensions -- Support concurrent searches and inserts for constantly changing datasets - -Use this type of index when you primarily query vector values and need low-latency similarity search at scale. 
In general, Hyperscale Vector Indexes are the best starting point for most vector search workloads.
-
-## Composite Vector Indexes
-- Combine scalar filters with a single vector column in the same index definition
-- Designed for searches that apply one vector value alongside scalar attributes that remove large portions of the dataset before similarity scoring
-- Consume a moderate amount of memory and can index tens of millions to billions of documents
-- Excel when your queries must return a small, highly targeted result set
-
-Use Composite Vector Indexes when you want to perform searches that blend scalar predicates and vector similarity so that the scalar filters tighten the candidate set.
-
-For an in-depth comparison and tuning guidance, review the [Couchbase vector index documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html) and the [overview of Capella vector indexes](https://docs.couchbase.com/cloud/vector-index/vectors-and-indexes-overview.html).
-
-## Understanding Index Configuration (Couchbase 8.0 Feature)
-
-The `index_description` parameter controls how Couchbase optimizes vector storage and search performance through centroids and quantization:
-
-Format: `IVF[<centroids>],{PQ<subquantizers>x<bits>|SQ<bits>}`
-
-**Centroids (IVF - Inverted File):**
-- Controls how the dataset is subdivided for faster searches
-- More centroids = faster search, slower training
-- Fewer centroids = slower search, faster training
-- If omitted (like `IVF,SQ8`), Couchbase auto-selects based on dataset size
-
-**Quantization Options:**
-- SQ (Scalar Quantization): `SQ4`, `SQ6`, `SQ8` (4, 6, or 8 bits per dimension)
-- PQ (Product Quantization): `PQ<subquantizers>x<bits>` (e.g., `PQ32x8`)
-- Higher values = better accuracy, larger index size
-
-**Common Examples:**
-- `IVF,SQ8` – Auto centroids, 8-bit scalar quantization (good default)
-- `IVF1000,SQ6` – 1000 centroids, 6-bit scalar quantization
-- `IVF,PQ32x8` – Auto centroids, 32 subquantizers with 8 bits
-
-For detailed configuration options, see
the [Quantization & Centroid Settings](https://docs.couchbase.com/server/current/vector-index/hyperscale-vector-index.html#algo_settings). - -In the code below, we demonstrate creating a Hyperscale index for optimal performance. You can adapt the same flow to create a COMPOSITE index by replacing the index type and options. - - -```python -# Create a Hyperscale Vector Index for optimized vector search -try: - hyperscale_index_name = f"{INDEX_NAME}_hyperscale" - - # Use the cluster connection to create the Hyperscale index - scope = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME) - - options = { - "dimension": 3072, # text-embedding-3-large dimension - "similarity": "L2", - } - - scope.query( - f""" - CREATE VECTOR INDEX {hyperscale_index_name} - ON {COLLECTION_NAME} (embedding VECTOR) - WITH {json.dumps(options)} - """, - QueryOptions( - timeout=timedelta(seconds=300) - )).execute() - print(f"Successfully created Hyperscale index: {hyperscale_index_name}") -except Exception as e: - print(f"Hyperscale index may already exist or error occurred: {str(e)}") - -``` - -# Testing Optimized Hyperscale Vector Search - -The example below runs the same RAG query, but now uses the Hyperscale index created above. You'll notice improved performance as the index efficiently retrieves data. If you create a Composite index, the workflow is identical — Haystack automatically routes queries through the scalar filters before performing the vector similarity search. - - -```python -# Test the optimized Hyperscale vector search -query = "What is latest news on the death of Charles Breslin?" 
- -try: - # The RAG pipeline will automatically use the optimized Hyperscale index - # Perform the semantic search with Hyperscale optimization - start_time = time.time() - result = rag_pipeline.run({ - "query_embedder": {"text": query}, - "retriever": {"top_k": 4}, - "prompt_builder": {"question": query}, - "answer_builder": {"query": query}, - }, - include_outputs_from={"retriever", "query_embedder"} - ) - search_elapsed_time = time.time() - start_time - # Get the generated answer - answer: GeneratedAnswer = result["answer_builder"]["answers"][0] - - # Print retrieved documents - print("=== Retrieved Documents ===") - retrieved_docs = result["retriever"]["documents"] - for idx, doc in enumerate(retrieved_docs, start=0): - print(f"Id: {doc.id} Title: {doc.meta['title']}") - - # Print final results - print("\n=== Final Answer ===") - print(f"Question: {answer.query}") - print(f"Answer: {answer.data}") - print("\nSources:") - for doc in answer.documents: - print(f"-> {doc.meta['title']}") - # Display search results - print(f"\nOptimized Hyperscale Vector Search Results (completed in {search_elapsed_time:.2f} seconds):") - #print(result["generator"]["replies"][0]) - -except Exception as e: - raise RuntimeError(f"Error performing optimized semantic search: {e}") - -``` - -# Conclusion -In this tutorial, we've built a Retrieval Augmented Generation (RAG) system using Haystack with OpenAI models and Couchbase Capella's Hyperscale and Composite Vector Indexes. Using the BBC News dataset, we demonstrated how modern vector indexes make it possible to answer up-to-date questions that extend beyond an LLM's original training data. - -The key components of our RAG system include: - -1. **Couchbase Capella Hyperscale & Composite Vector Indexes** for high-performance storage and retrieval of document embeddings -2. **Haystack** as the framework for building modular RAG pipelines with flexible component connections -3. 
**OpenAI Services** for generating embeddings (`text-embedding-3-large`) and LLM responses (`gpt-4o`) - -This approach grounds LLM responses in specific, current information from our knowledge base while taking advantage of Couchbase's advanced vector index options for performance and scale. Haystack's modular pipeline model keeps the solution extensible as you layer in additional data sources or services. - diff --git a/tutorial/markdown/generated/vector-search-cookbook/haystack-search_based-RAG_with_Couchbase_Capella_and_OpenAI.md b/tutorial/markdown/generated/vector-search-cookbook/haystack-search_based-RAG_with_Couchbase_Capella_and_OpenAI.md deleted file mode 100644 index 92d7edc..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/haystack-search_based-RAG_with_Couchbase_Capella_and_OpenAI.md +++ /dev/null @@ -1,535 +0,0 @@ ---- -# frontmatter -path: "/tutorial-openai-haystack-rag-with-search-vector-index" -title: "RAG with OpenAI, Haystack, and Couchbase Search Vector Index" -short_title: "RAG with OpenAI, Haystack, and Search Vector Index" -description: - - Learn how to build a semantic search engine using the Couchbase Search Vector Index. - - This tutorial demonstrates how Haystack integrates Couchbase Search Vector Index with embeddings generated by OpenAI services. - - Perform Retrieval-Augmented Generation (RAG) using Haystack with Couchbase and OpenAI services. 
-content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - OpenAI - - Artificial Intelligence - - Haystack - - Search Vector Index -sdk_language: - - python -length: 60 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/haystack/search_based/RAG_with_Couchbase_Capella_and_OpenAI.ipynb) - -# BBC News Dataset RAG Pipeline with Haystack, Couchbase Search Vector Index, and OpenAI - -This notebook demonstrates how to build a Retrieval Augmented Generation (RAG) system using: -- The BBC News dataset containing real-time news articles -- Couchbase Capella Search Vector Index for low-latency vector retrieval -- Haystack framework for the RAG pipeline -- OpenAI for embeddings and text generation - -The system allows users to ask questions about current events and get AI-generated answers based on the latest news articles. - -# Installing Necessary Libraries - -To build our RAG system, we need a set of libraries. The libraries we install handle everything from connecting to databases to performing AI tasks. Each library has a specific role: Couchbase libraries manage database operations, Haystack handles AI model integrations and pipeline management, and we will use the OpenAI SDK for generating embeddings and calling OpenAI's language models. - - -```python -%pip install datasets haystack-ai couchbase-haystack openai pandas -``` - - [Output too long, omitted for brevity] - -# Importing Necessary Libraries - -The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, Haystack components for RAG pipeline, embedding generation, and dataset loading. 
-
-
-```python
-import getpass
-import base64
-import logging
-import sys
-import time
-import pandas as pd
-from datetime import timedelta
-
-from couchbase.auth import PasswordAuthenticator
-from couchbase.cluster import Cluster
-from couchbase.exceptions import CouchbaseException
-from couchbase.options import ClusterOptions
-from datasets import load_dataset
-from haystack import Pipeline, GeneratedAnswer
-from haystack.components.embedders import OpenAIDocumentEmbedder, OpenAITextEmbedder
-from haystack.components.preprocessors import DocumentCleaner
-from haystack.components.writers import DocumentWriter
-from haystack.components.builders.answer_builder import AnswerBuilder
-from haystack.components.builders.prompt_builder import PromptBuilder
-from haystack.components.generators import OpenAIGenerator
-from haystack.utils import Secret
-from haystack.dataclasses import Document
-
-from couchbase_haystack import (
-    CouchbaseSearchDocumentStore,
-    CouchbasePasswordAuthenticator,
-    CouchbaseClusterOptions,
-    CouchbaseSearchEmbeddingRetriever,
-)
-from couchbase.options import KnownConfigProfiles
-
-# Configure logging
-logger = logging.getLogger(__name__)
-logger.setLevel(logging.DEBUG)
-
-```
-
-# Prerequisites
-
-## Create and Deploy Your Operational cluster on Capella
-
-To get started with Couchbase Capella, create an account and use it to deploy an operational cluster.
-
-To know more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).
-
-
-### Couchbase Capella Configuration
-
-When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met:
-
-* Have a multi-node Capella cluster running the Data, Query, Index, and Search services.
-* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) with Read and Write access to the bucket used in the application.
-* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the cluster from the IP on which the application is running.
-
-### OpenAI Models Setup
-
-To create the RAG application, we need an embedding model to ingest the documents for Vector Search and a large language model (LLM) for generating responses based on the retrieved context.
-
-For this implementation, we'll use OpenAI's models, which provide state-of-the-art performance for both embeddings and text generation:
-
-**Embedding Model**: We'll use OpenAI's `text-embedding-3-large` model, which provides high-quality embeddings with 3,072 dimensions for semantic search.
-
-**Large Language Model**: We'll use OpenAI's `gpt-4o` model for generating responses based on the retrieved context. This model offers excellent reasoning capabilities and can handle complex queries effectively.
-
-**Prerequisites for OpenAI Integration**:
-* Create an OpenAI account at [platform.openai.com](https://platform.openai.com)
-* Generate an API key from your OpenAI dashboard
-* Ensure you have sufficient credits or a valid payment method set up
-* Set up your API key as an environment variable or input it securely in the notebook
-
-For more details about OpenAI's models and pricing, please refer to the [OpenAI documentation](https://platform.openai.com/docs/models).
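The embedding model maps text to vectors, and the Search Vector Index ranks stored documents by how close their embeddings are to the query embedding. As a rough, self-contained illustration of that idea (toy three-dimensional vectors and plain cosine similarity, not Couchbase's actual scoring):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|).
    # Vector indexes rank candidates by a metric like this;
    # the exact scoring function is engine-specific.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query_vec = [0.1, 0.9, 0.0]
doc_vecs = {"doc-a": [0.2, 0.8, 0.1], "doc-b": [0.9, 0.1, 0.0]}

# Rank documents by similarity to the query vector (most similar first)
ranked = sorted(doc_vecs, key=lambda d: cosine_similarity(query_vec, doc_vecs[d]), reverse=True)
print(ranked)  # doc-a is semantically closer to the query
```

With real embeddings the vectors have 3,072 dimensions instead of three, but the ranking principle is the same.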
-
-# Configure Couchbase and OpenAI Credentials
-
-Enter your Couchbase and OpenAI credentials:
-
-**OPENAI_API_KEY** is your OpenAI API key, which can be obtained from your OpenAI dashboard at [platform.openai.com](https://platform.openai.com/api-keys).
-
-**CB_INDEX_NAME** is the name of the Search Vector Index used for vector search operations.
-
-
-```python
-CB_CONNECTION_STRING = input("Couchbase Cluster URL (default: localhost): ") or "localhost"
-CB_USERNAME = input("Couchbase Username (default: admin): ") or "admin"
-CB_PASSWORD = getpass.getpass("Couchbase password (default: Password@12345): ") or "Password@12345"
-CB_BUCKET_NAME = input("Couchbase Bucket: ")
-CB_SCOPE_NAME = input("Couchbase Scope: ")
-CB_COLLECTION_NAME = input("Couchbase Collection: ")
-CB_INDEX_NAME = input("Vector Search Index: ")
-OPENAI_API_KEY = getpass.getpass("OpenAI API Key: ")
-
-# Check if the variables are correctly loaded
-if not all([CB_CONNECTION_STRING, CB_USERNAME, CB_PASSWORD, CB_BUCKET_NAME, CB_SCOPE_NAME, CB_COLLECTION_NAME, CB_INDEX_NAME, OPENAI_API_KEY]):
-    raise ValueError("All configuration variables must be provided.")
-```
-
-
-```python
-from couchbase.cluster import Cluster
-from couchbase.options import ClusterOptions
-from couchbase.auth import PasswordAuthenticator
-from couchbase.management.buckets import CreateBucketSettings
-from couchbase.management.search import SearchIndex
-import json
-
-# Connect to Couchbase cluster
-cluster = Cluster(CB_CONNECTION_STRING, ClusterOptions(
-    PasswordAuthenticator(CB_USERNAME, CB_PASSWORD)))
-
-# Create bucket if it does not exist
-bucket_manager = cluster.buckets()
-try:
-    bucket_manager.get_bucket(CB_BUCKET_NAME)
-    print(f"Bucket '{CB_BUCKET_NAME}' already exists.")
-except Exception as e:
-    print(f"Bucket '{CB_BUCKET_NAME}' does not exist.
Creating bucket...") - bucket_settings = CreateBucketSettings(name=CB_BUCKET_NAME, ram_quota_mb=500) - bucket_manager.create_bucket(bucket_settings) - print(f"Bucket '{CB_BUCKET_NAME}' created successfully.") - -# Create scope and collection if they do not exist -collection_manager = cluster.bucket(CB_BUCKET_NAME).collections() -scopes = collection_manager.get_all_scopes() -scope_exists = any(scope.name == CB_SCOPE_NAME for scope in scopes) - -if scope_exists: - print(f"Scope '{CB_SCOPE_NAME}' already exists.") -else: - print(f"Scope '{CB_SCOPE_NAME}' does not exist. Creating scope...") - collection_manager.create_scope(CB_SCOPE_NAME) - print(f"Scope '{CB_SCOPE_NAME}' created successfully.") - -collections = [collection.name for scope in scopes if scope.name == CB_SCOPE_NAME for collection in scope.collections] -collection_exists = CB_COLLECTION_NAME in collections - -if collection_exists: - print(f"Collection '{CB_COLLECTION_NAME}' already exists in scope '{CB_SCOPE_NAME}'.") -else: - print(f"Collection '{CB_COLLECTION_NAME}' does not exist in scope '{CB_SCOPE_NAME}'. 
Creating collection...")
-    collection_manager.create_collection(collection_name=CB_COLLECTION_NAME, scope_name=CB_SCOPE_NAME)
-    print(f"Collection '{CB_COLLECTION_NAME}' created successfully.")
-
-# Create Search Vector Index from search_vector_index.json file at scope level
-with open('search_vector_index.json', 'r') as search_file:
-    search_index_definition = SearchIndex.from_json(json.load(search_file))
-
-    # Update search index definition with user inputs
-    search_index_definition.name = CB_INDEX_NAME
-    search_index_definition.source_name = CB_BUCKET_NAME
-
-    # Point the type mapping at the user-provided scope and collection
-    old_type_key = next(iter(search_index_definition.params['mapping']['types'].keys()))
-    type_obj = search_index_definition.params['mapping']['types'].pop(old_type_key)
-    search_index_definition.params['mapping']['types'][f"{CB_SCOPE_NAME}.{CB_COLLECTION_NAME}"] = type_obj
-
-    search_index_name = search_index_definition.name
-
-    # Get scope-level search index manager
-    scope_search_manager = cluster.bucket(CB_BUCKET_NAME).scope(CB_SCOPE_NAME).search_indexes()
-
-    try:
-        # Check if the index already exists at scope level
-        existing_index = scope_search_manager.get_index(search_index_name)
-        print(f"Search Vector Index '{search_index_name}' already exists at scope level.")
-    except Exception as e:
-        print(f"Search Vector Index '{search_index_name}' does not exist at scope level. Creating index from search_vector_index.json...")
-        scope_search_manager.upsert_index(search_index_definition)
-        print(f"Search Vector Index '{search_index_name}' created successfully at scope level.")
-```
-
-    Bucket 'b' already exists.
-    Scope 's' already exists.
-    Collection 'c' already exists in scope 's'.
-    Search Vector Index 'vector_search' already exists at scope level.
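The least obvious step in the cell above is the type-mapping rename, which moves the index definition's single type key under your `scope.collection` name. Here is a minimal sketch of that manipulation on a plain dict; the nested shape is a simplified assumption for illustration, not a full index definition:

```python
# Illustrative only: mimic the type-mapping rename performed above, using a
# plain dict instead of a SearchIndex object.
index_params = {"mapping": {"types": {"old_scope.old_collection": {"enabled": True}}}}

scope_name, collection_name = "my_scope", "my_collection"

# Pop the existing (single) type key and re-insert it under "scope.collection"
old_key = next(iter(index_params["mapping"]["types"]))
type_obj = index_params["mapping"]["types"].pop(old_key)
index_params["mapping"]["types"][f"{scope_name}.{collection_name}"] = type_obj

print(index_params["mapping"]["types"])  # {'my_scope.my_collection': {'enabled': True}}
```

This is why the same `search_vector_index.json` file can be reused for whatever bucket, scope, and collection names you entered earlier.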
-
-
-# Load and Process Movie Dataset
-
-Load the TMDB movie dataset and prepare documents for indexing:
-
-
-```python
-import ast
-
-# Load TMDB dataset
-print("Loading TMDB dataset...")
-dataset = load_dataset("AiresPucrs/tmdb-5000-movies")
-movies_df = pd.DataFrame(dataset['train'])
-print(f"Total movies found: {len(movies_df)}")
-
-# Create documents from movie data
-docs_data = []
-for _, row in movies_df.iterrows():
-    if pd.isna(row['overview']):
-        continue
-
-    try:
-        # The genres column holds a stringified list of dicts;
-        # parse it safely with ast.literal_eval instead of eval
-        genres = [genre['name'] for genre in ast.literal_eval(row['genres'])]
-        docs_data.append({
-            'id': str(row["id"]),
-            'content': f"Title: {row['title']}\nGenres: {', '.join(genres)}\nOverview: {row['overview']}",
-            'metadata': {
-                'title': row['title'],
-                'genres': row['genres'],
-                'original_language': row['original_language'],
-                'popularity': float(row['popularity']),
-                'release_date': row['release_date'],
-                'vote_average': float(row['vote_average']),
-                'vote_count': int(row['vote_count']),
-                'budget': int(row['budget']),
-                'revenue': int(row['revenue'])
-            }
-        })
-    except Exception as e:
-        logger.error(f"Error processing movie {row['title']}: {e}")
-
-print(f"Created {len(docs_data)} documents with valid overviews")
-documents = [Document(id=doc['id'], content=doc['content'], meta=doc['metadata'])
-             for doc in docs_data]
-```
-
-    Loading TMDB dataset...
-
-
-    Generating train split: 100%|██████████| 4803/4803 [00:00<00:00, 123144.70 examples/s]
-
-
-    Total movies found: 4803
-    Created 4800 documents with valid overviews
-
-
-# Initialize Document Store
-
-Set up the Couchbase document store for storing movie data and embeddings:
-
-
-```python
-# Initialize document store
-document_store = CouchbaseSearchDocumentStore(
-    cluster_connection_string=Secret.from_token(CB_CONNECTION_STRING),
-    authenticator=CouchbasePasswordAuthenticator(
-        username=Secret.from_token(CB_USERNAME),
-        password=Secret.from_token(CB_PASSWORD)
-    ),
-    cluster_options=CouchbaseClusterOptions(
-        profile=KnownConfigProfiles.WanDevelopment,
-    ),
-    bucket=CB_BUCKET_NAME,
-    scope=CB_SCOPE_NAME,
-    collection=CB_COLLECTION_NAME,
-    vector_search_index=CB_INDEX_NAME,
-)
-
-print("Couchbase document store initialized successfully.")
-```
-
-    Couchbase document store initialized successfully.
-
-
-# Initialize Embedders
-
-Configure the embedders using OpenAI's `text-embedding-3-large` model. The document embedder generates an embedding for each movie overview at indexing time, while the text embedder embeds user queries at search time to enable semantic retrieval:
-
-
-```python
-embedder = OpenAIDocumentEmbedder(
-    api_key=Secret.from_token(OPENAI_API_KEY),
-    model="text-embedding-3-large",
-)
-
-rag_embedder = OpenAITextEmbedder(
-    api_key=Secret.from_token(OPENAI_API_KEY),
-    model="text-embedding-3-large",
-)
-
-```
-
-# Initialize LLM Generator
-Configure the LLM generator using OpenAI's `gpt-4o` model. This component will generate natural language responses based on the retrieved documents.
- - - -```python -llm = OpenAIGenerator( - api_key=Secret.from_token(OPENAI_API_KEY), - model="gpt-4o", -) -``` - -# Create Indexing Pipeline -Build the pipeline for processing and indexing movie documents: - - -```python -# Create indexing pipeline -index_pipeline = Pipeline() -index_pipeline.add_component("cleaner", DocumentCleaner()) -index_pipeline.add_component("embedder", embedder) -index_pipeline.add_component("writer", DocumentWriter(document_store=document_store)) - -# Connect indexing components -index_pipeline.connect("cleaner.documents", "embedder.documents") -index_pipeline.connect("embedder.documents", "writer.documents") -``` - - - - - - 🚅 Components - - cleaner: DocumentCleaner - - embedder: OpenAIDocumentEmbedder - - writer: DocumentWriter - 🛤️ Connections - - cleaner.documents -> embedder.documents (list[Document]) - - embedder.documents -> writer.documents (list[Document]) - - - -# Run Indexing Pipeline - -Execute the pipeline for processing and indexing movie documents: - - -```python -# Run indexing pipeline - -if documents: - # Process documents in batches for better performance - batch_size = 100 - total_docs = len(documents[:200]) - - for i in range(0, total_docs, batch_size): - batch = documents[i:i + batch_size] - result = index_pipeline.run({"cleaner": {"documents": batch}}) - print(f"Processed batch {i//batch_size + 1}: {len(batch)} documents") - - print(f"\nSuccessfully processed {total_docs} documents") - print(f"Sample document metadata: {documents[0].meta}") -else: - print("No documents created. 
Skipping indexing.") -``` - - Calculating embeddings: 4it [00:06, 1.73s/it] - - - Processed batch 1: 100 documents - - - Calculating embeddings: 4it [00:06, 1.66s/it] - - Processed batch 2: 100 documents - - Successfully processed 200 documents - Sample document metadata: {'title': 'Four Rooms', 'genres': '[{"id": 80, "name": "Crime"}, {"id": 35, "name": "Comedy"}]', 'original_language': 'en', 'popularity': 22.87623, 'release_date': '1995-12-09', 'vote_average': 6.5, 'vote_count': 530, 'budget': 4000000, 'revenue': 4300000} - - - - - -# Create RAG Pipeline - -Set up the Retrieval Augmented Generation pipeline for answering questions about movies: - - -```python -# Define RAG prompt template -prompt_template = """ -Given these documents, answer the question.\nDocuments: -{% for doc in documents %} - {{ doc.content }} -{% endfor %} - -\nQuestion: {{question}} -\nAnswer: -""" - -# Create RAG pipeline -rag_pipeline = Pipeline() - -# Add components -rag_pipeline.add_component( - "query_embedder", - rag_embedder, -) -rag_pipeline.add_component("retriever", CouchbaseSearchEmbeddingRetriever(document_store=document_store)) -rag_pipeline.add_component("prompt_builder", PromptBuilder(template=prompt_template)) -rag_pipeline.add_component("llm",llm) -rag_pipeline.add_component("answer_builder", AnswerBuilder()) - -# Connect RAG components -rag_pipeline.connect("query_embedder", "retriever.query_embedding") -rag_pipeline.connect("retriever.documents", "prompt_builder.documents") -rag_pipeline.connect("prompt_builder.prompt", "llm.prompt") -rag_pipeline.connect("llm.replies", "answer_builder.replies") -rag_pipeline.connect("llm.meta", "answer_builder.meta") -rag_pipeline.connect("retriever", "answer_builder.documents") - -print("RAG pipeline created successfully.") -``` - - PromptBuilder has 2 prompt variables, but `required_variables` is not set. By default, all prompt variables are treated as optional, which may lead to unintended behavior in multi-branch pipelines. 
To avoid unexpected execution, ensure that variables intended to be required are explicitly set in `required_variables`. - - - RAG pipeline created successfully. - - -# Ask Questions About Movies - -Use the RAG pipeline to ask questions about movies and get AI-generated answers: - - -```python -# Example question -question = "Why did Manni call Lolla?" - -# Run the RAG pipeline -result = rag_pipeline.run( - { - "query_embedder": {"text": question}, - "retriever": {"top_k": 5}, - "prompt_builder": {"question": question}, - "answer_builder": {"query": question}, - }, - include_outputs_from={"retriever", "query_embedder"} -) - -# Get the generated answer -answer: GeneratedAnswer = result["answer_builder"]["answers"][0] - -# Print retrieved documents -print("=== Retrieved Documents ===") -retrieved_docs = result["retriever"]["documents"] -for idx, doc in enumerate(retrieved_docs, start=1): - print(f"Id: {doc.id} Title: {doc.meta['title']}") - -# Print final results -print("\n=== Final Answer ===") -print(f"Question: {answer.query}") -print(f"Answer: {answer.data}") -print("\nSources:") -for doc in answer.documents: - print(f"-> {doc.meta['title']}") -``` - - === Retrieved Documents === - Id: 006b97c08110cb1b9b58e03943c91fa9412cfe7a2a22830ba5b9e3eb0c342344 Title: Run Lola Run - Id: 33543dab4c048c9467d632f319e02bca94da6f178250c14d26eabfb30911a823 Title: Mambo Italiano - Id: 94c55246e02c290767531f6359b5f44145191e3f2d62a3a64ed4718a666be9f2 Title: Good bye, Lenin! - Id: 00b4d1f455e45fbffa39f72be6de635bdcdb6b8a04289ba4aea41061700b9096 Title: Mean Streets - Id: 9241f819303fe61a25e05469856c01a8843d53a6ce7cec340bf0def848ddb470 Title: Magnolia - - === Final Answer === - Question: Why did Manni call Lolla? - Answer: Manni called Lola because he lost 100,000 DM in a subway train that belongs to a very bad guy, and he needs her help to raise the money within 20 minutes to prevent him from having to rob a store to get the money. 
-
-    Sources:
-    -> Run Lola Run
-    -> Mambo Italiano
-    -> Good bye, Lenin!
-    -> Mean Streets
-    -> Magnolia
-
-
-# Conclusion
-
-In this tutorial, we built a Retrieval-Augmented Generation (RAG) system using Couchbase Capella, OpenAI, and Haystack with the TMDB 5000 Movies dataset. It demonstrates how to combine the Couchbase Search Vector Index with large language models to answer questions grounded in your own data.
-
-The key components include:
-- **Couchbase Capella Search Vector Index** for vector storage and retrieval
-- **Haystack** for pipeline orchestration and component management
-- **OpenAI** for embeddings (`text-embedding-3-large`) and text generation (`gpt-4o`)
-
-This approach enables AI applications to access and reason over information that extends beyond the LLM's training data, making responses more accurate and relevant for real-world use cases.
diff --git a/tutorial/markdown/generated/vector-search-cookbook/huggingface-fts-hugging_face.md b/tutorial/markdown/generated/vector-search-cookbook/huggingface-fts-hugging_face.md
deleted file mode 100644
index 79a3af3..0000000
--- a/tutorial/markdown/generated/vector-search-cookbook/huggingface-fts-hugging_face.md
+++ /dev/null
@@ -1,225 +0,0 @@
----
-# frontmatter
-path: "/tutorial-huggingface-couchbase-vector-search-with-fts"
-title: Using Hugging Face Embeddings with Couchbase Vector Search using FTS Service
-short_title: Hugging Face with Couchbase Vector Search using FTS Service
-description:
-  - Learn how to generate embeddings using Hugging Face and store them in Couchbase.
-  - This tutorial demonstrates how to use Couchbase's vector search capabilities with Hugging Face embeddings.
-  - You'll understand how to perform vector search to find relevant documents based on similarity using FTS Service.
-content_type: tutorial
-filter: sdk
-technology:
-  - vector search
-tags:
-  - FTS
-  - Artificial Intelligence
-  - Hugging Face
-sdk_language:
-  - python
-length: 30 Mins
----
-
-
-
-
-
-[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/huggingface/fts/hugging_face.ipynb)
-
-# Introduction
-
-In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database and [Hugging Face](https://huggingface.co/) as the embedding model provider. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system from scratch. Alternatively, if you want to perform semantic search using a GSI index, please take a look at [this tutorial](https://developer.couchbase.com/tutorial-huggingface-couchbase-vector-search-with-global-secondary-index).
-
-# How to run this tutorial
-
-This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/huggingface/fts/hugging_face.ipynb).
-
-You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.
-
-# Before you start
-
-## Create and Deploy Your Free Tier Operational cluster on Capella
-
-To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.
-
-To know more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).
-
-### Couchbase Capella Configuration
-
-When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met:
-
-* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the bucket (Read and Write) used in the application.
-* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.
-
-# Install necessary libraries
-
-
-```python
-!pip --quiet install couchbase==4.4.0 transformers==4.56.1 sentence_transformers==5.1.0 langchain-community==0.3.29 langchain_huggingface==0.3.1 python-dotenv==1.1.1 ipywidgets
-```
-
-# Imports
-
-
-```python
-from datetime import timedelta
-from langchain_huggingface.embeddings.huggingface import HuggingFaceEmbeddings
-from couchbase.auth import PasswordAuthenticator
-from couchbase.cluster import Cluster
-from couchbase.options import ClusterOptions
-import couchbase.search as search
-from couchbase.options import SearchOptions
-from couchbase.vector_search import VectorQuery, VectorSearch
-import uuid
-import os
-from dotenv import load_dotenv
-import getpass
-
-```
-
-# Prerequisites
-In order to run this tutorial, you will need access to a Couchbase cluster with the Full Text Search service, either through Couchbase Capella or by running it locally, and have credentials to access a collection on that cluster:
-
-
-```python
-# Load environment variables
-load_dotenv("./.env")
-
-# Configuration
-couchbase_cluster_url = os.getenv('CB_CLUSTER_URL') or input("Couchbase Cluster URL:")
-couchbase_username = os.getenv('CB_USERNAME') or input("Couchbase Username:")
-couchbase_password =
os.getenv('CB_PASSWORD') or getpass.getpass("Couchbase password:")
-couchbase_bucket = os.getenv('CB_BUCKET') or input("Couchbase Bucket:")
-couchbase_scope = os.getenv('CB_SCOPE') or input("Couchbase Scope:")
-couchbase_collection = os.getenv('CB_COLLECTION') or input("Couchbase Collection:")
-```
-
-# Couchbase Connection
-In this section, we first need to create a `PasswordAuthenticator` object that holds our Couchbase credentials:
-
-
-```python
-auth = PasswordAuthenticator(
-    couchbase_username,
-    couchbase_password
-)
-```
-
-Then, we use this object to connect to the Couchbase cluster and select the bucket, scope, and collection specified above:
-
-
-```python
-print("Connecting to cluster")
-cluster = Cluster(couchbase_cluster_url, ClusterOptions(auth))
-cluster.wait_until_ready(timedelta(seconds=5))
-
-bucket = cluster.bucket(couchbase_bucket)
-scope = bucket.scope(couchbase_scope)
-collection = scope.collection(couchbase_collection)
-print("Connected to the cluster")
-```
-
-    Connecting to cluster
-    Connected to the cluster
-
-
-# Creating Couchbase Vector Search Index
-In order to store embeddings generated with Hugging Face on a Couchbase cluster, a vector search index needs to be created first. We included a sample index definition that will work with this tutorial in a file named `huggingface_index.json`, located in the folder with this tutorial. The definition can be used to create a vector index using the Couchbase Server web console; for more information on vector indexes, please read [Create a Vector Search Index with the Server Web Console](https://docs.couchbase.com/server/current/vector-search/create-vector-search-index-ui.html). Please note that the index is configured for documents from bucket `huggingface`, scope `_default`, and collection `huggingface`, and you will have to edit the `source` and document type name in the index definition file if your collection, scope, or bucket names are different.
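For orientation, the sketch below shows the general shape such an index definition takes, built as a Python dict. The field names follow the usual FTS index JSON layout, and the 768 dimensions match the default `HuggingFaceEmbeddings` model (`all-mpnet-base-v2`); the actual `huggingface_index.json` shipped with this tutorial may differ, so treat this as an illustrative assumption rather than the real file:

```python
# Simplified, assumed shape of an FTS vector index definition for this tutorial.
sample_index = {
    "name": "vector_test",
    "type": "fulltext-index",
    "sourceType": "gocb.couchbase",
    "sourceName": "huggingface",  # must match your bucket name
    "params": {
        "mapping": {
            "types": {
                "_default.huggingface": {  # "scope.collection" for your data
                    "enabled": True,
                    "properties": {
                        "vector": {
                            "fields": [{
                                "name": "vector",
                                "type": "vector",
                                # Default HuggingFaceEmbeddings model
                                # (all-mpnet-base-v2) outputs 768 dimensions
                                "dims": 768,
                                "similarity": "dot_product",
                            }]
                        }
                    },
                }
            }
        }
    },
}

# The two pieces the tutorial says you may need to edit:
print(sample_index["sourceName"], list(sample_index["params"]["mapping"]["types"]))
```

If your bucket, scope, or collection names differ, update `sourceName` and the `scope.collection` type key accordingly before importing the definition.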
-
-Here, our code verifies the existence of the index and will throw an exception if the index has not been found:
-
-
-```python
-search_index_name = couchbase_bucket + "._default.vector_test"
-search_index = cluster.search_indexes().get_index(search_index_name)
-print("Found index: " + search_index_name)
-```
-
-    Found index: huggingface._default.vector_test
-
-
-# Hugging Face Initialization
-
-
-```python
-embedding_model = HuggingFaceEmbeddings()
-print("Initialized successfully")
-```
-
-    Initialized successfully
-
-
-# Embedding Documents
-After initializing the Hugging Face transformers library, it can be used to generate vector embeddings for user input or a predefined set of phrases. Here, we generate an embedding for each string in the array, which holds two predefined texts plus one entered by the user:
-
-
-```python
-texts = [
-    "Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable.",
-    "It’s used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.",
-    input("Enter custom embedding text:")
-]
-embeddings = []
-for i in range(0, len(texts)):
-    embeddings.append(embedding_model.embed_query(texts[i]))
-```
-
-# Storing Embeddings in Couchbase
-Generated embeddings are then stored as vector fields inside documents that can contain additional information about the vector, including the original text. The documents are then upserted into the Couchbase cluster:
-
-
-```python
-for i in range(0, len(texts)):
-    doc = {
-        "id": str(uuid.uuid4()),
-        "text": texts[i],
-        "vector": embeddings[i],
-    }
-    collection.upsert(doc["id"], doc)
-```
-
-# Searching For Embeddings
-After the documents are upserted into the cluster, their vector fields will be indexed by the previously created vector search index.
Later, new embeddings can be added or used to perform a similarity search on the previously added documents: - - -```python -def search_similar(text): - print("Vector similarity search for phrase: \"" + text + "\"") - search_embedding = embedding_model.embed_query(text) - - search_req = search.SearchRequest.create(search.MatchNoneQuery()).with_vector_search( - VectorSearch.from_vector_query( - VectorQuery( - "vector", search_embedding, num_candidates=1 - ) - ) - ) - result = scope.search( - "vector_test", - search_req, - SearchOptions( - limit=13, - fields=["vector", "id", "text"] - ) - ) - for row in result.rows(): - print("Found answer: " + row.id + "; score: " + str(row.score)) - doc = collection.get(row.id) - print("Answer text: " + doc.value["text"]) - -search_similar("name a multipurpose database with distributed capability") -print("------") -search_similar(input("Enter custom search phrase:")) -``` - - Vector similarity search for phrase: "name a multipurpose database with distributed capability" - Found answer: 3993ec2e-c184-4d7f-8fc3-55961afe264c; score: 0.9256534967756203 - Answer text: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable. - ------ - Vector similarity search for phrase: "What is the data in the sample text?" 
- Found answer: a7748fac-b41f-4846-bebc-d89bdcd645e3; score: 1.0016003788325407 - Answer text: this is a sample text with the data "Qwerty" - diff --git a/tutorial/markdown/generated/vector-search-cookbook/huggingface-gsi-hugging_face.md b/tutorial/markdown/generated/vector-search-cookbook/huggingface-gsi-hugging_face.md deleted file mode 100644 index 8326510..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/huggingface-gsi-hugging_face.md +++ /dev/null @@ -1,656 +0,0 @@ ---- -# frontmatter -path: "/tutorial-huggingface-couchbase-vector-search-with-global-secondary-index" -title: Using Hugging Face Embeddings with Couchbase Vector Search with GSI -short_title: Hugging Face with Couchbase Vector Search with GSI -description: - - Learn how to generate embeddings using Hugging Face and store them in Couchbase. - - This tutorial demonstrates how to use Couchbase's vector search capabilities with Hugging Face embeddings. - - You'll understand how to perform vector search to find relevant documents based on similarity with GSI. -content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - GSI - - Artificial Intelligence - - Hugging Face -sdk_language: - - python -length: 30 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/huggingface/gsi/hugging_face.ipynb) - -# Semantic Search with Couchbase GSI Vector Search and Hugging Face - -## Overview - -In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database and [Hugging Face](https://huggingface.co/) as the AI-powered embedding model provider. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. 
-
-This tutorial demonstrates how to leverage Couchbase's **Global Secondary Index (GSI) vector search capabilities** with Hugging Face embeddings to create a high-performance semantic search system. GSI vector search in Couchbase offers significant advantages over traditional FTS (Full-Text Search) approaches, particularly for vector-first workloads and scenarios requiring complex filtering with high query-per-second (QPS) performance.
-
-This guide is designed to be comprehensive yet accessible, with clear step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system. Whether you're building a recommendation engine, content discovery platform, or any application requiring intelligent document retrieval, this tutorial provides the foundation you need.
-
-**Note**: If you want to perform semantic search using the FTS (Full-Text Search) index instead, please take a look at [this alternative approach](https://developer.couchbase.com/tutorial-huggingface-couchbase-vector-search-with-fts).
-
-## How to Run This Tutorial
-
-This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/huggingface/gsi/hugging_face.ipynb).
-
-You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.
- -## Setup and Installation - -### Install Necessary Libraries - - -```python -!pip install --quiet langchain-couchbase==0.5.0 transformers==4.56.1 sentence_transformers==5.1.0 langchain_huggingface==0.3.1 python-dotenv==1.1.1 ipywidgets -``` - -### Import Required Modules - - -```python -from pathlib import Path -from datetime import timedelta -from transformers import pipeline, AutoModel, AutoTokenizer -from langchain_huggingface.embeddings.huggingface import HuggingFaceEmbeddings -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.options import ClusterOptions -from langchain_core.globals import set_llm_cache -from langchain_couchbase.cache import CouchbaseCache -from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore -from langchain_couchbase.vectorstores import DistanceStrategy -from langchain_couchbase.vectorstores import IndexType -import getpass -import os -from dotenv import load_dotenv -``` - -### Prerequisites - -To run this tutorial successfully, you will need the following requirements: - -#### Couchbase Requirements - -**Version Requirements:** -- **Couchbase Server 8.0+** or **Couchbase Capella** with Query Service enabled -- Note: GSI vector search is a newer feature that requires Couchbase Server 8.0 or above, unlike FTS-based vector search which works with 7.6+ - -**Access Requirements:** -- A configured Bucket, Scope, and Collection -- User credentials with **Read and Write** access to your target collection -- Network connectivity to your Couchbase cluster - -#### Create and Deploy Your Free Tier Operational Cluster on Capella - -To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint. - -To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html). 
-
-#### Couchbase Capella Configuration
-
-When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met:
-
-* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the required bucket (Read and Write) used in the application.
-* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.
-
-#### Python Environment Requirements
-
-- **Python 3.8+**
-- Required Python packages (installed via pip in the next section):
-  - `langchain-couchbase==0.5.0`
-  - `transformers==4.56.1`
-  - `sentence_transformers==5.1.0`
-  - `langchain_huggingface==0.3.1`
-
-
-```python
-# Load environment variables
-load_dotenv("./.env")
-
-# Configuration
-couchbase_cluster_url = os.getenv('CB_CLUSTER_URL') or input("Couchbase Cluster URL:")
-couchbase_username = os.getenv('CB_USERNAME') or input("Couchbase Username:")
-couchbase_password = os.getenv('CB_PASSWORD') or getpass.getpass("Couchbase password:")
-couchbase_bucket = os.getenv('CB_BUCKET') or input("Couchbase Bucket:")
-couchbase_scope = os.getenv('CB_SCOPE') or input("Couchbase Scope:")
-couchbase_collection = os.getenv('CB_COLLECTION') or input("Couchbase Collection:")
-```
-
-## Couchbase Connection Setup
-
-### Create Authentication Object
-
-In this section, we first need to create a `PasswordAuthenticator` object that holds our Couchbase credentials:
-
-
-```python
-auth = PasswordAuthenticator(
-    couchbase_username,
-    couchbase_password
-)
-```
-
-### Connect to Cluster
-
-Then, we use this object to connect to the Couchbase cluster and select the bucket, scope, and collection specified above:
-
-
-```python
-print("Connecting to cluster at URL: " + couchbase_cluster_url)
-cluster = Cluster(couchbase_cluster_url, ClusterOptions(auth))
-cluster.wait_until_ready(timedelta(seconds=5))
-
-bucket = cluster.bucket(couchbase_bucket)
-scope = bucket.scope(couchbase_scope) -collection = scope.collection(couchbase_collection) -print("Connected to the cluster") -``` - - Connecting to cluster at URL: couchbase://localhost - Connected to the cluster - - -## Understanding GSI Vector Search - -### Optimizing Vector Search with Global Secondary Index (GSI) - -With Couchbase 8.0+, you can leverage the power of GSI-based vector search, which offers significant performance improvements over traditional Full-Text Search (FTS) approaches for vector-first workloads. GSI vector search provides high-performance vector similarity search with advanced filtering capabilities and is designed to scale to billions of vectors. - -#### GSI vs FTS: Choosing the Right Approach - -| Feature | GSI Vector Search | FTS Vector Search | -| --------------------- | --------------------------------------------------------------- | ----------------------------------------- | -| **Best For** | Vector-first workloads, complex filtering, high QPS performance| Hybrid search and high recall rates | -| **Couchbase Version** | 8.0.0+ | 7.6+ | -| **Filtering** | Pre-filtering with `WHERE` clauses (Composite) or post-filtering (BHIVE) | Pre-filtering with flexible ordering | -| **Scalability** | Up to billions of vectors (BHIVE) | Up to 10 million vectors | -| **Performance** | Optimized for concurrent operations with low memory footprint | Good for mixed text and vector queries | - -#### GSI Vector Index Types - -Couchbase offers two distinct GSI vector index types, each optimized for different use cases: - -##### Hyperscale Vector Indexes (BHIVE) - -- **Best for**: Pure vector searches like content discovery, recommendations, and semantic search -- **Use when**: You primarily perform vector-only queries without complex scalar filtering -- **Features**: - - High performance with low memory footprint - - Optimized for concurrent operations - - Designed to scale to billions of vectors - - Supports post-scan filtering for basic metadata 
filtering - -##### Composite Vector Indexes - -- **Best for**: Filtered vector searches that combine vector similarity with scalar value filtering -- **Use when**: Your queries combine vector similarity with scalar filters that eliminate large portions of data -- **Features**: - - Efficient pre-filtering where scalar attributes reduce the vector comparison scope - - Best for well-defined workloads requiring complex filtering using GSI features - - Supports range lookups combined with vector search - -#### Index Type Selection for This Tutorial - -In this tutorial, we'll demonstrate creating a **BHIVE index** and running vector similarity queries using GSI. BHIVE is ideal for semantic search scenarios where you want: - -1. **High-performance vector search** across large datasets -2. **Low latency** for real-time applications -3. **Scalability** to handle growing vector collections -4. **Concurrent operations** for multi-user environments - -The BHIVE index will provide optimal performance for our Hugging Face embedding-based semantic search implementation. - -#### Alternative: Composite Vector Index - -If your use case requires complex filtering with scalar attributes, you may want to consider using a **Composite Vector Index** instead: - -```python -# Alternative: Create a Composite index for filtered searches -vector_store.create_index( - index_type=IndexType.COMPOSITE, - index_description="IVF,SQ8", - distance_metric=DistanceStrategy.COSINE, - index_name="huggingface_composite_index", -) -``` - -**Use Composite indexes when:** -- You need to filter by document metadata or attributes before vector similarity -- Your queries combine vector search with WHERE clauses -- You have well-defined filtering requirements that can reduce the search space - -**Note**: Composite indexes enable pre-filtering with scalar attributes, making them ideal for applications where you need to search within specific categories, date ranges, or user-specific data segments. 
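
To make the pre-filtering vs. post-filtering distinction concrete, here is a toy, Couchbase-independent sketch in plain Python: pre-filtering (the Composite approach) applies the scalar predicate before any vector comparisons, while post-filtering (the BHIVE approach) ranks all vectors first and filters afterwards. The documents, vectors, and `category` field below are invented for illustration:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; lower means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

# Toy corpus of (vector, metadata) pairs
docs = [
    ([0.9, 0.1], {"category": "database"}),
    ([0.8, 0.2], {"category": "cache"}),
    ([0.1, 0.9], {"category": "database"}),
]
query = [1.0, 0.0]

# Pre-filtering (Composite-style): restrict candidates with the scalar
# predicate first, then compute distances only for the survivors.
candidates = [(v, m) for v, m in docs if m["category"] == "database"]
pre = sorted(candidates, key=lambda d: cosine_distance(query, d[0]))[:1]

# Post-filtering (BHIVE-style): rank every vector first, then drop
# results that fail the predicate.
ranked = sorted(docs, key=lambda d: cosine_distance(query, d[0]))
post = [d for d in ranked if d[1]["category"] == "database"][:1]

print(pre[0][0], post[0][0])  # both strategies agree on the nearest "database" document
```

Both paths return the same document here, but the pre-filtering path computed distances only for the matching candidates; this is why Composite indexes shine when scalar predicates eliminate large portions of the data.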
-
-#### Understanding GSI Index Configuration (Couchbase 8.0 Feature)
-
-Before creating our BHIVE index, it's important to understand the configuration parameters that optimize vector storage and search performance. The `index_description` parameter controls how Couchbase optimizes vector storage through centroids and quantization.
-
-##### Index Description Format: `IVF[<centroids>],{PQ<subquantizers>x<bits>|SQ<bits>}`
-
-###### Centroids (IVF - Inverted File)
-
-- Controls how the dataset is subdivided for faster searches
-- **More centroids** = faster search, slower training time
-- **Fewer centroids** = slower search, faster training time
-- If omitted (like `IVF,SQ8`), Couchbase auto-selects based on dataset size
-
-###### Quantization Options
-
-**Scalar Quantization (SQ):**
-- `SQ4`, `SQ6`, `SQ8` (4, 6, or 8 bits per dimension)
-- Lower memory usage, faster search, slightly reduced accuracy
-
-**Product Quantization (PQ):**
-- Format: `PQ<subquantizers>x<bits>` (e.g., `PQ32x8`)
-- Better compression for very large datasets
-- More complex but can maintain accuracy with smaller index size
-
-###### Common Configuration Examples
-
-- **`IVF,SQ8`** - Auto centroids, 8-bit scalar quantization (good default)
-- **`IVF1000,SQ6`** - 1000 centroids, 6-bit scalar quantization
-- **`IVF,PQ32x8`** - Auto centroids, 32 subquantizers with 8 bits
-
-For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/cloud/vector-index/hyperscale-vector-index.html#algo_settings).
-
-For more information on GSI vector indexes, see [Couchbase GSI Vector Documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html).
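
To build some intuition for what the `SQ8` setting does, the sketch below quantizes a float vector into 8-bit integer codes and reconstructs it. This is a simplified illustration of the idea behind scalar quantization, not Couchbase's actual implementation (which learns value ranges during index training):

```python
def sq8_encode(vec, lo, hi):
    """Map each float in [lo, hi] to an 8-bit code in 0..255."""
    step = (hi - lo) / 255
    return [round((x - lo) / step) for x in vec]

def sq8_decode(codes, lo, hi):
    """Reconstruct approximate floats from the 8-bit codes."""
    step = (hi - lo) / 255
    return [lo + c * step for c in codes]

vec = [0.12, -0.53, 0.88, -0.99]
lo, hi = -1.0, 1.0  # assumed per-dimension value range
codes = sq8_encode(vec, lo, hi)
approx = sq8_decode(codes, lo, hi)

# Each dimension now costs 1 byte instead of 4 (float32); the price is a
# small reconstruction error, bounded by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(vec, approx))
print(codes)    # [143, 60, 240, 1]
print(max_err)  # at most (hi - lo) / 255 / 2, i.e. about 0.0039
```

This is the "slightly reduced accuracy" trade-off mentioned above: roughly a 4x storage reduction per dimension in exchange for a small, bounded quantization error.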
-
-##### Our Configuration Choice
-
-In this tutorial, we use `IVF,SQ8`, which provides:
-- **Auto-selected centroids** optimized for our dataset size
-- **8-bit scalar quantization** for a good balance of speed, memory usage, and accuracy
-- **COSINE distance metric** ideal for semantic similarity search
-- **Optimal performance** for most semantic search use cases
-
-
-```python
-# Initialize the vector store with Hugging Face embeddings.
-# Note: this only constructs the store; the BHIVE GSI index itself
-# is created later with vector_store.create_index().
-vector_store = CouchbaseQueryVectorStore(
-    cluster=cluster,
-    bucket_name=couchbase_bucket,
-    scope_name=couchbase_scope,
-    collection_name=couchbase_collection,
-    embedding=HuggingFaceEmbeddings(),  # Hugging Face initialization
-    distance_metric=DistanceStrategy.COSINE
-)
-```
-
-## Document Processing and Embedding
-
-### Embedding Documents
-
-Now that we have set up our vector store with Hugging Face embeddings, we can add documents to our collection. The `CouchbaseQueryVectorStore` automatically handles the embedding generation process using the Hugging Face transformers library.
-
-#### Understanding the Embedding Process
-
-When we add text documents to our vector store, several important processes happen automatically:
-
-1. **Text Preprocessing**: The input text is preprocessed and tokenized according to the Hugging Face model's requirements
-2. **Vector Generation**: Each document is converted into a high-dimensional vector (embedding) that captures its semantic meaning
-3. **Storage**: The embeddings are stored in Couchbase along with the original text and any metadata
-4. **Indexing**: The vectors are indexed using our BHIVE GSI index for efficient similarity search
-
-#### Adding Sample Documents
-
-In this example, we're adding sample documents that demonstrate Couchbase's capabilities. 
The system will: -- Generate embeddings for each text document using the Hugging Face model -- Store them in our Couchbase collection -- Make them immediately available for semantic search once the GSI index is ready - -**Note**: The `batch_size` parameter controls how many documents are processed together, which can help optimize performance for large document sets. - - -```python -texts = [ - "Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable.", - "It’s used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.", - input("Enter custom embedding text:") -] -vector_store.add_texts(texts=texts, batch_size=32) -``` - - - - - ['7c601881e4bf4c53b5b4c2a25628d904', - '0442f351aec2415481138315d492ee80', - 'e20a8dcd8b464e8e819b87c9a0ff05c3'] - - - -## Vector Search Performance Optimization - -Now let's demonstrate the performance benefits of different optimization approaches available in Couchbase. We'll compare three optimization levels to show how each contributes to building a production-ready semantic search system: - -1. **Baseline (Raw Search)**: Basic vector similarity search without GSI optimization -2. **GSI-Optimized Search**: High-performance search using BHIVE GSI index -3. **Cache Benefits**: Show how caching can be applied on top of any search approach - -**Important**: Caching is orthogonal to index types - you can apply caching benefits to both raw searches and GSI-optimized searches to improve repeated query performance. - -### Understanding Vector Search Results - -Before we start our RAG comparisons, let's understand what the search results mean: - -When you perform a search query with vector search: - -1. 
**Query Embedding**: Your search text is converted into a vector embedding using the Hugging Face model -2. **Vector Similarity Calculation**: The system compares your query vector against all stored document vectors -3. **Distance Computation**: Using the COSINE distance metric, the system calculates similarity distances -4. **Result Ranking**: Documents are ranked by their distance values (lower = more similar) -5. **Post-processing**: Results include both the document content and metadata - -**Note**: The returned value represents the vector distance between query and document embeddings. Lower distance values indicate higher similarity. - -### RAG Search Function - -Let's create a comprehensive search function for our RAG performance comparison: - - -```python -import time - -def search_with_performance_metrics(query_text, stage_name, k=3): - """Perform optimized semantic search with detailed performance metrics""" - print(f"\n=== {stage_name.upper()} ===") - print(f"Query: \"{query_text}\"") - - start_time = time.time() - results = vector_store.similarity_search_with_score(query_text, k=k) - end_time = time.time() - - search_time = end_time - start_time - print(f"Search Time: {search_time:.4f} seconds") - print(f"Results Found: {len(results)} documents") - - for i, (doc, distance) in enumerate(results, 1): - print(f"\n[Result {i}]") - print(f"Vector Distance: {distance:.6f} (lower = more similar)") - # Use the document content directly from search results (no additional KV call needed) - print(f"Document Content: {doc.page_content}") - if hasattr(doc, 'metadata') and doc.metadata: - print(f"Metadata: {doc.metadata}") - - return search_time, results -``` - -### Phase 1: Baseline Performance (Raw Vector Search) - -First, let's establish baseline performance with raw vector search - no GSI optimization yet: - - -```python -test_query = "What are the key features of a scalable NoSQL database?" 
-print("Testing baseline performance without GSI optimization...") -baseline_time, baseline_results = search_with_performance_metrics( - test_query, "Phase 1: Baseline Vector Search" -) -``` - - Testing baseline performance without GSI optimization... - - === PHASE 1: BASELINE VECTOR SEARCH === - Query: "What are the key features of a scalable NoSQL database?" - Search Time: 0.1484 seconds - Results Found: 3 documents - - [Result 1] - Vector Distance: 0.586197 (lower = more similar) - Document Content: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable. - - [Result 2] - Vector Distance: 0.645435 (lower = more similar) - Document Content: It’s used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more. - - [Result 3] - Vector Distance: 0.976888 (lower = more similar) - Document Content: this is a sample text with the data "hello" - - -### Phase 2: Create BHIVE GSI Index and Test Performance - -Now let's create the BHIVE GSI index and measure the performance improvement: - - -```python -# Create BHIVE index for optimized vector search -print("Creating BHIVE GSI vector index...") -try: - vector_store.create_index( - index_type=IndexType.BHIVE, - index_description="IVF,SQ8", - distance_metric=DistanceStrategy.COSINE, - index_name="huggingface_bhive_index", - ) - print("✓ BHIVE GSI vector index created successfully!") - - # Wait for index to become available - print("Waiting for index to become available...") - time.sleep(3) - -except Exception as e: - if "already exists" in str(e).lower(): - print("✓ BHIVE GSI vector index already exists, proceeding...") - else: - print(f"Error creating GSI index: {str(e)}") - -# Test the same query with GSI optimization -print("\nTesting performance with BHIVE GSI optimization...") 
-gsi_time, gsi_results = search_with_performance_metrics( - test_query, "Phase 2: GSI-Optimized Search" -) -``` - - Creating BHIVE GSI vector index... - ✓ BHIVE GSI vector index created successfully! - Waiting for index to become available... - - Testing performance with BHIVE GSI optimization... - - === PHASE 2: GSI-OPTIMIZED SEARCH === - Query: "What are the key features of a scalable NoSQL database?" - Search Time: 0.0848 seconds - Results Found: 3 documents - - [Result 1] - Vector Distance: 0.586197 (lower = more similar) - Document Content: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable. - - [Result 2] - Vector Distance: 0.645435 (lower = more similar) - Document Content: It’s used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more. - - [Result 3] - Vector Distance: 0.976888 (lower = more similar) - Document Content: this is a sample text with the data "hello" - - -### Phase 3: Demonstrate Cache Benefits - -Now let's show how caching can improve performance for repeated queries. **Note**: Caching benefits apply to both raw searches and GSI-optimized searches. - - -```python -# Set up Couchbase cache (can be applied to any search approach) -print("Setting up Couchbase cache for improved performance on repeated queries...") -cache = CouchbaseCache( - cluster=cluster, - bucket_name=couchbase_bucket, - scope_name=couchbase_scope, - collection_name=couchbase_collection, -) -set_llm_cache(cache) -print("✓ Couchbase cache enabled!") - -# Test cache benefits with the same query (should show improvement on second run) -cache_query = "How does a distributed database handle high-speed operations?" 
- -print("\nTesting cache benefits with a different query...") -print("First execution (cache miss):") -cache_time_1, _ = search_with_performance_metrics( - cache_query, "Phase 3a: First Query (Cache Miss)", k=2 -) - -print("\nSecond execution (cache hit):") -cache_time_2, _ = search_with_performance_metrics( - cache_query, "Phase 3b: Repeated Query (Cache Hit)", k=2 -) -``` - - Setting up Couchbase cache for improved performance on repeated queries... - ✓ Couchbase cache enabled! - - Testing cache benefits with a different query... - First execution (cache miss): - - === PHASE 3A: FIRST QUERY (CACHE MISS) === - Query: "How does a distributed database handle high-speed operations?" - Search Time: 0.1024 seconds - Results Found: 2 documents - - [Result 1] - Vector Distance: 0.632770 (lower = more similar) - Document Content: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable. - - [Result 2] - Vector Distance: 0.677951 (lower = more similar) - Document Content: It’s used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more. - - Second execution (cache hit): - - === PHASE 3B: REPEATED QUERY (CACHE HIT) === - Query: "How does a distributed database handle high-speed operations?" - Search Time: 0.0289 seconds - Results Found: 2 documents - - [Result 1] - Vector Distance: 0.632770 (lower = more similar) - Document Content: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable. 
- - [Result 2] - Vector Distance: 0.677951 (lower = more similar) - Document Content: It’s used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more. - - -### Complete Performance Analysis - -Let's analyze the complete performance improvements across all optimization levels: - - -```python -print("\n" + "="*80) -print("VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY") -print("="*80) - -print(f"Phase 1 - Baseline (Raw Search): {baseline_time:.4f} seconds") -print(f"Phase 2 - GSI-Optimized Search: {gsi_time:.4f} seconds") -print(f"Phase 3 - Cache Benefits:") -print(f" First execution (cache miss): {cache_time_1:.4f} seconds") -print(f" Second execution (cache hit): {cache_time_2:.4f} seconds") - -print("\n" + "-"*80) -print("OPTIMIZATION IMPACT ANALYSIS:") -print("-"*80) - -# GSI improvement analysis -if gsi_time and baseline_time and gsi_time < baseline_time: - gsi_speedup = baseline_time / gsi_time - gsi_improvement = ((baseline_time - gsi_time) / baseline_time) * 100 - print(f"GSI Index Benefit: {gsi_speedup:.2f}x faster ({gsi_improvement:.1f}% improvement)") -else: - print(f"GSI Index Benefit: Performance similar to baseline (may vary with dataset size)") - -# Cache improvement analysis -if cache_time_2 and cache_time_1 and cache_time_2 < cache_time_1: - cache_speedup = cache_time_1 / cache_time_2 - cache_improvement = ((cache_time_1 - cache_time_2) / cache_time_1) * 100 - print(f"Cache Benefit: {cache_speedup:.2f}x faster ({cache_improvement:.1f}% improvement)") -else: - print(f"Cache Benefit: No significant improvement (results may be cached already)") - -print(f"\nKey Insights:") -print(f"• GSI optimization provides consistent performance benefits, especially with larger datasets") -print(f"• Caching benefits apply to both raw and GSI-optimized searches") -print(f"• Combined GSI + Cache provides the best performance for production applications") -print(f"• BHIVE indexes scale to 
billions of vectors with optimized concurrent operations") -``` - - - ================================================================================ - VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY - ================================================================================ - Phase 1 - Baseline (Raw Search): 0.1484 seconds - Phase 2 - GSI-Optimized Search: 0.0848 seconds - Phase 3 - Cache Benefits: - First execution (cache miss): 0.1024 seconds - Second execution (cache hit): 0.0289 seconds - - -------------------------------------------------------------------------------- - OPTIMIZATION IMPACT ANALYSIS: - -------------------------------------------------------------------------------- - GSI Index Benefit: 1.75x faster (42.8% improvement) - Cache Benefit: 3.55x faster (71.8% improvement) - - Key Insights: - • GSI optimization provides consistent performance benefits, especially with larger datasets - • Caching benefits apply to both raw and GSI-optimized searches - • Combined GSI + Cache provides the best performance for production applications - • BHIVE indexes scale to billions of vectors with optimized concurrent operations - - -### Interactive Testing - -Try your own queries with the optimized search system: - - -```python -custom_query = input("Enter your search query: ") -search_with_performance_metrics(custom_query, "Interactive GSI-Optimized Search") - -``` - - - === INTERACTIVE GSI-OPTIMIZED SEARCH === - Query: "What is the sample data?" - Search Time: 0.0812 seconds - Results Found: 3 documents - - [Result 1] - Vector Distance: 0.623644 (lower = more similar) - Document Content: this is a sample text with the data "hello" - - [Result 2] - Vector Distance: 0.860599 (lower = more similar) - Document Content: It’s used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more. 
- - [Result 3] - Vector Distance: 0.909207 (lower = more similar) - Document Content: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable. - - - - - - (0.08118820190429688, - [(Document(id='e20a8dcd8b464e8e819b87c9a0ff05c3', metadata={}, page_content='this is a sample text with the data "hello"'), - 0.6236441411684932), - (Document(id='0442f351aec2415481138315d492ee80', metadata={}, page_content='It’s used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.'), - 0.8605992009935179), - (Document(id='7c601881e4bf4c53b5b4c2a25628d904', metadata={}, page_content='Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable.'), - 0.9092065785676496)]) - - - -## Conclusion - -You have successfully built a powerful semantic search engine using Couchbase's GSI vector search capabilities and Hugging Face embeddings. This guide has walked you through the complete process of creating a high-performance vector search system that can scale to handle billions of documents. 
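
As a final sanity check on the distance values reported throughout this tutorial, here is how the COSINE metric can be computed by hand. This is an illustrative sketch of the formula, not the vector store's internal code:

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; lower means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Vectors pointing the same way have distance 0.0, orthogonal vectors 1.0,
# matching the "lower = more similar" interpretation used above.
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```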
diff --git a/tutorial/markdown/generated/vector-search-cookbook/jinaai-fts-RAG_with_Couchbase_and_Jina_AI.md b/tutorial/markdown/generated/vector-search-cookbook/jinaai-fts-RAG_with_Couchbase_and_Jina_AI.md deleted file mode 100644 index 7266070..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/jinaai-fts-RAG_with_Couchbase_and_Jina_AI.md +++ /dev/null @@ -1,777 +0,0 @@ ---- -# frontmatter -path: "/tutorial-jina-couchbase-rag-with-fts" -title: Retrieval-Augmented Generation (RAG) with Couchbase and Jina AI using FTS -short_title: RAG with Couchbase and Jina -description: - - Learn how to build a semantic search engine using Couchbase and Jina. - - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with Jina embeddings and language models. - - You'll understand how to perform Retrieval-Augmented Generation (RAG) using LangChain and Couchbase using FTS. -content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - FTS - - Artificial Intelligence - - LangChain - - Jina AI -sdk_language: - - python -length: 60 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/jinaai/fts/RAG_with_Couchbase_and_Jina_AI.ipynb) - -# Introduction -In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database and [Jina](https://jina.ai/) as the AI-powered embedding and language model provider, utilizing Full-Text Search (FTS). Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system from scratch. 
Alternatively, if you want to perform semantic search using a GSI index, please take a look at [this tutorial](https://developer.couchbase.com/tutorial-jina-couchbase-rag-with-global-secondary-index).
-
-# How to run this tutorial
-
-This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/jinaai/fts/RAG_with_Couchbase_and_Jina_AI.ipynb).
-
-You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.
-
-# Before you start
-
-## Get Credentials for Jina AI
-
-* Please follow the [instructions](https://jina.ai/) to generate the Jina AI credentials.
-* Please follow the [instructions](https://chat.jina.ai/api) to generate the JinaChat credentials.
-
-## Create and Deploy Your Free Tier Operational Cluster on Capella
-
-To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.
-
-To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).
-
-### Couchbase Capella Configuration
-
-When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.
-
-* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the required bucket (Read and Write) used in the application.
-* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.
-
-# Setting the Stage: Installing Necessary Libraries
-To build our semantic search engine, we need a robust set of tools. 
The libraries we install handle everything from connecting to databases to performing complex machine learning tasks. Each library has a specific role: Couchbase libraries manage database operations, LangChain handles AI model integrations, and Jina provides advanced AI models for generating embeddings and understanding natural language. By setting up these libraries, we ensure our environment is equipped to handle the data-intensive and computationally complex tasks required for semantic search.
-
-
-```python
-# JinaChat does not support openai versions other than 0.27
-%pip install --quiet datasets==3.6.0 langchain-couchbase==0.3.0 langchain-community==0.3.24 openai==0.27 python-dotenv==1.1.0 ipywidgets
-```
-
-    Note: you may need to restart the kernel to use updated packages.
-
-
-# Importing Necessary Libraries
-The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. These libraries provide essential functions for working with data, managing database connections, and processing machine learning models.
-
-```python
-import getpass
-import json
-import logging
-import os
-import time
-from datetime import timedelta
-
-from couchbase.auth import PasswordAuthenticator
-from couchbase.cluster import Cluster
-from couchbase.exceptions import (CouchbaseException,
-                                  InternalServerFailureException,
-                                  QueryIndexAlreadyExistsException,
-                                  ServiceUnavailableException)
-from couchbase.management.buckets import CreateBucketSettings
-from couchbase.management.search import SearchIndex
-from couchbase.options import ClusterOptions
-from datasets import load_dataset
-from dotenv import load_dotenv
-from langchain_community.chat_models import JinaChat
-from langchain_community.embeddings import JinaEmbeddings
-from langchain_core.globals import set_llm_cache
-from langchain_core.output_parsers import StrOutputParser
-from langchain_core.prompts import ChatPromptTemplate
-from langchain_core.runnables import RunnablePassthrough
-from langchain_couchbase.cache import CouchbaseCache
-from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore
-```
-
-# Setup Logging
-Logging is configured to track the progress of the script and capture any errors or warnings. This is crucial for debugging and understanding the flow of execution. The logging output includes timestamps, log levels (e.g., INFO, ERROR), and messages that describe what is happening in the script.
-
-
-
-```python
-logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)
-
-# Suppress all logs from specific loggers
-logging.getLogger('openai').setLevel(logging.WARNING)
-logging.getLogger('httpx').setLevel(logging.WARNING)
-```
-
-# Loading Sensitive Information
-In this section, we prompt the user to input essential configuration settings needed for integrating Couchbase with Jina's API. These settings include sensitive information like API keys, database credentials, and specific configuration names. 
Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security. - -The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code. - - -```python -load_dotenv("./.env") - -JINA_API_KEY = os.getenv("JINA_API_KEY") -JINACHAT_API_KEY = os.getenv("JINACHAT_API_KEY") - -CB_HOST = os.getenv("CB_HOST") or 'couchbase://localhost' -CB_USERNAME = os.getenv("CB_USERNAME") or 'Administrator' -CB_PASSWORD = os.getenv("CB_PASSWORD") or 'password' -CB_BUCKET_NAME = os.getenv("CB_BUCKET_NAME") or 'vector-search-testing' -INDEX_NAME = os.getenv("INDEX_NAME") or 'vector_search_jina' - -SCOPE_NAME = os.getenv("SCOPE_NAME") or 'shared' -COLLECTION_NAME = os.getenv("COLLECTION_NAME") or 'jina' -CACHE_COLLECTION = os.getenv("CACHE_COLLECTION") or 'cache' - -# Check if the variables are correctly loaded -if not JINA_API_KEY: - raise ValueError("JINA_API_KEY environment variable is not set") -if not JINACHAT_API_KEY: - raise ValueError("JINACHAT_API_KEY environment variable is not set") -``` - -# Connecting to the Couchbase Cluster -Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount. 
- - - - -```python -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to Couchbase") -except Exception as e: - raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}") -``` - - 2025-09-23 10:45:51,014 - INFO - Successfully connected to Couchbase - - -## Setting Up Collections in Couchbase - -The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase: - -1. Bucket Creation: - - Checks if specified bucket exists, creates it if not - - Sets bucket properties like RAM quota (1024MB) and replication (disabled) - - Note: You will not be able to create a bucket on Capella - -2. Scope Management: - - Verifies if requested scope exists within bucket - - Creates new scope if needed (unless it's the default "_default" scope) - -3. Collection Setup: - - Checks for collection existence within scope - - Creates collection if it doesn't exist - - Waits 2 seconds for collection to be ready - -Additional Tasks: -- Creates primary index on collection for query performance -- Clears any existing documents for clean state -- Implements comprehensive error handling and logging - -The function is called twice to set up: -1. Main collection for vector embeddings -2. Cache collection for storing results - - - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' exists.") - except Exception as e: - logging.info(f"Bucket '{bucket_name}' does not exist. 
Creating it...") - bucket_settings = CreateBucketSettings( - name=bucket_name, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - time.sleep(2) # Wait for bucket creation to complete and become available - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' created successfully.") - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == scope_name for scope in scopes) - - if not scope_exists and scope_name != "_default": - logging.info(f"Scope '{scope_name}' does not exist. Creating it...") - bucket_manager.create_scope(scope_name) - logging.info(f"Scope '{scope_name}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{collection_name}' does not exist. Creating it...") - bucket_manager.create_collection(scope_name, collection_name) - logging.info(f"Collection '{collection_name}' created successfully.") - else: - logging.info(f"Collection '{collection_name}' already exists. 
Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Ensure primary index exists - try: - cluster.query(f"CREATE PRIMARY INDEX IF NOT EXISTS ON `{bucket_name}`.`{scope_name}`.`{collection_name}`").execute() - logging.info("Primary index present or created successfully.") - except Exception as e: - logging.warning(f"Error creating primary index: {str(e)}") - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION) - -``` - - 2025-09-23 10:45:56,608 - INFO - Bucket 'vector-search-testing' exists. - 2025-09-23 10:45:59,312 - INFO - Collection 'jina' already exists. Skipping creation. - 2025-09-23 10:46:02,683 - INFO - Primary index present or created successfully. - 2025-09-23 10:46:03,447 - INFO - All documents cleared from the collection. - 2025-09-23 10:46:03,449 - INFO - Bucket 'vector-search-testing' exists. - 2025-09-23 10:46:06,152 - INFO - Collection 'jina_cache' already exists. Skipping creation. - 2025-09-23 10:46:09,482 - INFO - Primary index present or created successfully. - 2025-09-23 10:46:09,804 - INFO - All documents cleared from the collection. - - - - - - - - - -# Loading Couchbase Vector Search Index - -Semantic search requires an efficient way to retrieve relevant documents based on a user's query. This is where the Couchbase **Vector Search Index** comes into play. 
In this step, we load the Vector Search Index definition from a JSON file, which specifies how the index should be structured. This includes the fields to be indexed, the dimensions of the vectors, and other parameters that determine how the search engine processes queries based on vector similarity. - -This Jina vector search index configuration requires specific default settings to function properly. This tutorial uses the bucket named `vector-search-testing` with the scope `shared` and collection `jina`. The configuration is set up for vectors with exactly `1024 dimensions`, using dot product similarity and optimized for recall. If you want to use a different bucket, scope, or collection, you will need to modify the index configuration accordingly. - -For more information on creating a vector search index, please follow the [instructions](https://docs.couchbase.com/cloud/vector-search/create-vector-search-index-ui.html). - - - -```python -# If you are running this script locally (not in Google Colab), uncomment the following line -# and provide the path to your index definition file. 
- -# index_definition_path = '/path_to_your_index_file/jina_index.json' # Local setup: specify your file path here - -# # Version for Google Colab -# def load_index_definition_colab(): -# from google.colab import files -# print("Upload your index definition file") -# uploaded = files.upload() -# index_definition_path = list(uploaded.keys())[0] - -# try: -# with open(index_definition_path, 'r') as file: -# index_definition = json.load(file) -# return index_definition -# except Exception as e: -# raise ValueError(f"Error loading index definition from {index_definition_path}: {str(e)}") - -# Version for Local Environment -def load_index_definition_local(index_definition_path): - try: - with open(index_definition_path, 'r') as file: - index_definition = json.load(file) - return index_definition - except Exception as e: - raise ValueError(f"Error loading index definition from {index_definition_path}: {str(e)}") - -# Usage -# Uncomment the appropriate line based on your environment -# index_definition = load_index_definition_colab() -index_definition = load_index_definition_local('jina_index.json') -``` - -# Creating or Updating Search Indexes - -With the index definition loaded, the next step is to create or update the **Vector Search Index** in Couchbase. This step is crucial because it optimizes our database for vector similarity search operations, allowing us to perform searches based on the semantic content of documents rather than just keywords. By creating or updating a Vector Search Index, we enable our search engine to handle complex queries that involve finding semantically similar documents using vector embeddings, which is essential for a robust semantic search engine. 
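For reference, the loaded definition is a plain JSON object. The sketch below shows a minimal, illustrative skeleton of such a definition as a Python dict; the index name, bucket, scope, and collection follow this tutorial's setup, and the overall shape is an assumption — your actual `jina_index.json` from the cookbook may differ in detail.

```python
# Illustrative skeleton of a Couchbase Search vector index definition.
# Assumed shape for this tutorial; the real jina_index.json may differ.
index_definition = {
    "name": "vector_search_jina",
    "type": "fulltext-index",
    "sourceType": "gocbcore",
    "sourceName": "vector-search-testing",  # bucket
    "params": {
        "doc_config": {"mode": "scope.collection.type_field"},
        "mapping": {
            "default_mapping": {"enabled": False},
            "types": {
                "shared.jina": {  # scope.collection this index covers
                    "enabled": True,
                    "properties": {
                        "embedding": {
                            "fields": [{
                                "name": "embedding",
                                "type": "vector",
                                "dims": 1024,
                                "similarity": "dot_product",
                                "vector_index_optimized_for": "recall",
                                "index": True,
                            }]
                        },
                        "text": {
                            "fields": [{"name": "text", "type": "text", "store": True}]
                        },
                    },
                }
            },
        },
    },
}
```

The key settings mirror the prose above: 1024-dimensional vectors, dot-product similarity, and recall-optimized indexing on the `embedding` field.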
- - -```python -try: - scope_index_manager = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME).search_indexes() - - # Check if index already exists - existing_indexes = scope_index_manager.get_all_indexes() - index_name = index_definition["name"] - - if index_name in [index.name for index in existing_indexes]: - logging.info(f"Index '{index_name}' found") - else: - logging.info(f"Creating new index '{index_name}'...") - - # Create SearchIndex object from JSON definition - search_index = SearchIndex.from_json(index_definition) - - # Upsert the index (create if not exists, update if exists) - scope_index_manager.upsert_index(search_index) - logging.info(f"Index '{index_name}' successfully created/updated.") - -except QueryIndexAlreadyExistsException: - logging.info(f"Index '{index_name}' already exists. Skipping creation/update.") -except ServiceUnavailableException: - raise RuntimeError("Search service is not available. Please ensure the Search service is enabled in your Couchbase cluster.") -except InternalServerFailureException as e: - logging.error(f"Internal server error: {str(e)}") - raise -``` - - 2025-09-23 10:47:03,763 - INFO - Index 'vector_search_jina' found - 2025-09-23 10:47:04,742 - INFO - Index 'vector_search_jina' already exists. Skipping creation/update. - - -# Creating Jina Embeddings -Embeddings are at the heart of semantic search. They are numerical representations of text that capture the semantic meaning of the words and phrases. Unlike traditional keyword-based search, which looks for exact matches, embeddings allow our search engine to understand the context and nuances of language, enabling it to retrieve documents that are semantically similar to the query, even if they don't contain the exact keywords. By creating embeddings using Jina, we equip our search engine with the ability to understand and process natural language in a way that's much closer to how humans understand language. 
This step transforms our raw text data into a format that the search engine can use to find and rank relevant documents. - - - - -```python -try: - embeddings = JinaEmbeddings( - jina_api_key=JINA_API_KEY, model_name="jina-embeddings-v3" - ) - logging.info("Successfully created JinaEmbeddings") -except Exception as e: - raise ValueError(f"Error creating JinaEmbeddings: {str(e)}") -``` - - 2025-09-23 10:47:06,326 - INFO - Successfully created JinaEmbeddings - - -# Setting Up the Couchbase Vector Store -A vector store is where we'll keep our embeddings. Unlike the FTS index, which is used for text-based search, the vector store is specifically designed to handle embeddings and perform similarity searches. When a user inputs a query, the search engine converts the query into an embedding and compares it against the embeddings stored in the vector store. This allows the engine to find documents that are semantically similar to the query, even if they don't contain the exact same words. By setting up the vector store in Couchbase, we create a powerful tool that enables our search engine to understand and retrieve information based on the meaning and context of the query, rather than just the specific words used. - - -```python -try: - vector_store = CouchbaseSearchVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding=embeddings, - index_name=INDEX_NAME, - ) - logging.info("Successfully created vector store") -except Exception as e: - raise ValueError(f"Failed to create vector store: {str(e)}") - -``` - - 2025-09-23 10:47:12,343 - INFO - Successfully created vector store - - -# Load the BBC News Dataset -To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. 
Loading the dataset is a crucial step because it provides the raw material our search engine will work with. The quality and diversity of the articles make the BBC News dataset an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively.

The dataset is loaded using the Hugging Face `datasets` library, accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version.


```python
try:
    news_dataset = load_dataset(
        "RealTimeData/bbc_news_alltime", "2024-12", split="train"
    )
    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
    logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.")
except Exception as e:
    raise ValueError(f"Error loading the BBC News dataset: {str(e)}")
```

    2025-09-23 10:47:18,035 - INFO - Successfully loaded the BBC News dataset with 2687 rows.


    Loaded the BBC News dataset with 2687 rows


## Cleaning up the Data
We will use the content of the news articles for our RAG system.

The dataset contains a few duplicate records. We remove them to avoid duplicate results in the retrieval stage of our RAG system.


```python
news_articles = news_dataset["content"]
unique_articles = set()
for article in news_articles:
    if article:
        unique_articles.add(article)
unique_news_articles = list(unique_articles)
print(f"We have {len(unique_news_articles)} unique articles in our database.")
```

    We have 1749 unique articles in our database.


## Saving Data to the Vector Store
To efficiently handle the large number of articles, we process them in batches of 50 articles at a time. This batch processing approach helps manage memory usage and provides better control over the ingestion process.
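Conceptually, passing a `batch_size` to the vector store just splits the article list into fixed-size chunks that are inserted one at a time. A rough, self-contained sketch of that idea (plain Python, not the LangChain implementation):

```python
def chunked(items, batch_size):
    """Yield successive fixed-size batches from a list; the last batch may be smaller."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Toy example: 7 "articles" in batches of 3 produce batches of sizes 3, 3, 1.
batches = list(chunked(["a1", "a2", "a3", "a4", "a5", "a6", "a7"], 3))
print([len(b) for b in batches])  # -> [3, 3, 1]
```

Because each batch is inserted independently, a failure affects only that batch, which is what makes ingestion errors easier to recover from.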
- -We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's add_texts method, we add the filtered articles to our vector database. The batch_size parameter controls how many articles are processed in each iteration. - -This approach offers several benefits: -1. Memory Efficiency: Processing in smaller batches prevents memory overload -2. Error Handling: If an error occurs, only the current batch is affected -3. Progress Tracking: Easier to monitor and track the ingestion progress -4. Resource Management: Better control over CPU and network resource utilization - -We use a conservative batch size of 50 to ensure reliable operation. -The optimal batch size depends on many factors including: -- Document sizes being inserted -- Available system resources -- Network conditions -- Concurrent workload - -Consider measuring performance with your specific workload before adjusting. - - - -```python -# Calculate 60% of the dataset size and round to nearest integer -dataset_size = len(unique_news_articles) -subset_size = round(dataset_size * 0.6) - -# Filter articles by length and create subset -filtered_articles = [article for article in unique_news_articles[:subset_size] - if article and len(article) <= 50000] - -# Process in batches -batch_size = 50 - -try: - vector_store.add_texts( - texts=filtered_articles, - batch_size=batch_size - ) - logging.info("Document ingestion completed successfully") - -except CouchbaseException as e: - logging.error(f"Couchbase error during ingestion: {str(e)}") - raise RuntimeError(f"Error performing document ingestion: {str(e)}") -except Exception as e: - if "Payment Required" in str(e): - logging.error("Payment required for Jina AI API. Please check your subscription status and API key.") - print("To resolve this error:") - print("1. Visit 'https://jina.ai/reader/#pricing' to review subscription options") - print("2. 
Ensure your API key is valid and has sufficient credits") - print("3. Consider upgrading your subscription plan if needed") - else: - logging.error(f"Unexpected error during ingestion: {str(e)}") - raise RuntimeError(f"Failed to save documents to vector store: {str(e)}") -``` - - 2025-09-23 10:50:03,866 - INFO - Document ingestion completed successfully - - -# Setting Up a Couchbase Cache -To further optimize our system, we set up a Couchbase-based cache. A cache is a temporary storage layer that holds data that is frequently accessed, speeding up operations by reducing the need to repeatedly retrieve the same information from the database. In our setup, the cache will help us accelerate repetitive tasks, such as looking up similar documents. By implementing a cache, we enhance the overall performance of our search engine, ensuring that it can handle high query volumes and deliver results quickly. - -Caching is particularly valuable in scenarios where users may submit similar queries multiple times or where certain pieces of information are frequently requested. By storing these in a cache, we can significantly reduce the time it takes to respond to these queries, improving the user experience. - - - -```python -try: - cache = CouchbaseCache( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=CACHE_COLLECTION, - ) - logging.info("Successfully created cache") - set_llm_cache(cache) -except Exception as e: - raise ValueError(f"Failed to create cache: {str(e)}") -``` - - 2025-09-23 10:50:21,526 - INFO - Successfully created cache - - -# Creating the Jina Language Model (LLM) -Language models are AI systems that are trained to understand and generate human language. We'll be using Jina's language model to process user queries and generate meaningful responses. This model is a key component of our semantic search engine, allowing it to go beyond simple keyword matching and truly understand the intent behind a query. 
By creating this language model, we equip our search engine with the ability to interpret complex queries, understand the nuances of language, and provide more accurate and contextually relevant responses. - -The language model's ability to understand context and generate coherent responses is what makes our search engine truly intelligent. It can not only find the right information but also present it in a way that is useful and understandable to the user. - - - - -```python -try: - llm = JinaChat(temperature=0.1, jinachat_api_key=JINACHAT_API_KEY) - logging.info("Successfully created JinaChat") -except Exception as e: - logging.error(f"Error creating JinaChat: {str(e)}. Please check your API key and network connection.") - raise -``` - - 2025-09-23 10:50:22,466 - INFO - Successfully created JinaChat - - -## Perform Semantic Search -Semantic search in Couchbase involves converting queries and documents into vector representations using an embeddings model. These vectors capture the semantic meaning of the text and are stored directly in Couchbase. When a query is made, Couchbase performs a similarity search by comparing the query vector against the stored document vectors. The similarity metric used for this comparison is configurable, allowing flexibility in how the relevance of documents is determined. - -In the provided code, the search process begins by recording the start time, followed by executing the similarity_search_with_score method of the CouchbaseSearchVectorStore. This method searches Couchbase for the most relevant documents based on the vector similarity to the query. The search results include the document content and a similarity score that reflects how closely each document aligns with the query in the defined semantic space. The time taken to perform this search is then calculated and logged, and the results are displayed, showing the most relevant documents along with their similarity scores. 
This approach leverages Couchbase as both a storage and retrieval engine for vector data, enabling efficient and scalable semantic searches. The integration of vector storage and search capabilities within Couchbase allows for sophisticated semantic search operations without relying on external services for vector storage or comparison. - -### Note on Retry Mechanism -The search implementation includes a retry mechanism to handle rate limiting and API errors gracefully. If a rate limit error (HTTP 429) is encountered, the system will automatically retry the request up to 3 times with exponential backoff, waiting 2 seconds initially and doubling the wait time between each retry. This helps manage API usage limits while maintaining service reliability. For other types of errors, such as payment requirements or general failures, appropriate error messages and troubleshooting steps are provided to help diagnose and resolve the issue. - - -```python -def perform_semantic_search(query, vector_store, max_retries=3, retry_delay=2): - for attempt in range(max_retries): - try: - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=5) - search_elapsed_time = time.time() - start_time - - logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds") - return search_results, search_elapsed_time - - except Exception as e: - error_str = str(e) - - # Check if it's a rate limit error (HTTP 429) - if "http_status: 429" in error_str or "query request rejected" in error_str: - logging.warning(f"Rate limit hit (attempt {attempt+1}/{max_retries}). Waiting {retry_delay} seconds...") - time.sleep(retry_delay) - retry_delay *= 2 # Exponential backoff - - if attempt == max_retries - 1: - logging.error("Maximum retry attempts reached. API rate limit exceeded.") - raise RuntimeError("API rate limit exceeded. 
Please try again later or check your subscription.") - else: - # For other errors, don't retry - logging.error(f"Search error: {error_str}") - if "Payment Required" in error_str: - raise RuntimeError("Payment required for Jina AI API. Please check your subscription status and API key.") - else: - raise RuntimeError(f"Search failed: {error_str}") - -try: - query = "What was manchester city manager pep guardiola's reaction to the team's current form?" - search_results, search_elapsed_time = perform_semantic_search(query, vector_store) - - # Display search results - print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):") - print("-"*80) - for doc, score in search_results: - print(f"Score: {score:.4f}, Text: {doc.page_content}") - print("-"*80) - -except RuntimeError as e: - print(f"Error: {str(e)}") - print("\nTroubleshooting steps:") - if "API rate limit" in str(e): - print("1. Wait a few minutes before trying again") - print("2. Reduce the frequency of your requests") - print("3. Consider upgrading your Jina AI plan for higher rate limits") - elif "Payment required" in str(e): - print("1. Visit 'https://jina.ai/reader/#pricing' to review subscription options") - print("2. Ensure your API key is valid and has sufficient credits") - print("3. Update your API key configuration") - else: - print("1. Check your network connection") - print("2. Verify your Couchbase and Jina configurations") - print("3. Review the vector store implementation for any bugs") -``` - - 2025-09-23 10:50:25,678 - INFO - Semantic search completed in 2.13 seconds - - - - Semantic Search Results (completed in 2.13 seconds): - -------------------------------------------------------------------------------- - Score: 0.6798, Text: 'Self-doubt, errors & big changes' - inside the crisis at Man City - - Pep Guardiola has not been through a moment like this in his managerial career. 
Manchester City have lost nine matches in their past 12 - as many defeats as they had suffered in their previous 106 fixtures. At the end of October, City were still unbeaten at the top of the Premier League and favourites to win a fifth successive title. Now they are seventh, 12 points behind leaders Liverpool having played a game more. It has been an incredible fall from grace and left people trying to work out what has happened - and whether Guardiola can make it right. After discussing the situation with those who know him best, I have taken a closer look at the future - both short and long term - and how the current crisis at Man City is going to be solved. - - Pep Guardiola's Man City have lost nine of their past 12 matches - - Guardiola has also been giving it a lot of thought. He has not been sleeping very well, as he has said, and has not been himself at times when talking to the media. He has been talking to a lot of people about what is going on as he tries to work out the reasons for City's demise. Some reasons he knows, others he still doesn't. What people perhaps do not realise is Guardiola hugely doubts himself and always has. He will be thinking "I'm not going to be able to get us out of this" and needs the support of people close to him to push away those insecurities - and he has that. He is protected by his people who are very aware, like he is, that there are a lot of people that want City to fail. It has been a turbulent time for Guardiola. Remember those marks he had on his head after the 3-3 draw with Feyenoord in the Champions League? He always scratches his head, it is a gesture of nervousness. Normally nothing happens but on that day one of his nails was far too sharp so, after talking to the players in the changing room where he scratched his head because of his usual agitated gesturing, he went to the news conference. 
His right-hand man Manel Estiarte sent him photos in a message saying "what have you got on your head?", but by the time Guardiola returned to the coaching room there was hardly anything there again. He started that day with a cover on his nose after the same thing happened at the training ground the day before. Guardiola was having a footballing debate with Kyle Walker about positional stuff and marked his nose with that same nail. There was also that remarkable news conference after the Manchester derby when he said "I don't know what to do". That is partly true and partly not true. Ignore the fact Guardiola suggested he was "not good enough". He actually meant he was not good enough to resolve the situation with the group of players he has available and with all the other current difficulties. There are obviously logical explanations for the crisis and the first one has been talked about many times - the absence of injured midfielder Rodri. You know the game Jenga? When you take the wrong piece out, the whole tower collapses. That is what has happened here. It is normal for teams to have an over-reliance on one player if he is the best in the world in his position. And you cannot calculate the consequences of an injury that rules someone like Rodri out for the season. City are a team, like many modern ones, in which the holding midfielder is a key element to the construction. So, when you take Rodri out, it is difficult to hold it together. There were Plan Bs - John Stones, Manuel Akanji, even Nathan Ake - but injuries struck. The big injury list has been out of the ordinary and the busy calendar has also played a part in compounding the issues. However, one factor even Guardiola cannot explain is the big uncharacteristic errors in almost every game from international players. Why did Matheus Nunes make that challenge to give away the penalty against Manchester United? Jack Grealish is sent on at the end to keep the ball and cannot do that. 
There are errors from Walker and other defenders. These are some of the best players in the world. Of course the players' mindset is important, and confidence is diminishing. Wrong decisions get taken so there is almost panic on the pitch instead of calm. There are also players badly out of form who are having to play because of injuries. Walker is now unable to hide behind his pace, I'm not sure Kevin de Bruyne is ever getting back to the level he used to be at, Bernardo Silva and Ilkay Gundogan do not have time to rest, Grealish is not playing at his best. Some of these players were only meant to be playing one game a week but, because of injuries, have played 12 games in 40 days. It all has a domino effect. One consequence is that Erling Haaland isn't getting the service to score. But the Norwegian still remains City's top-scorer with 13. Defender Josko Gvardiol is next on the list with just four. The way their form has been analysed inside the City camp is there have only been three games where they deserved to lose (Liverpool, Bournemouth and Aston Villa). But of course it is time to change the dynamic. - - Guardiola has never protected his players so much. He has not criticised them and is not going to do so. They have won everything with him. Instead of doing more with them, he has tried doing less. He has sometimes given them more days off to clear their heads, so they can reset - two days this week for instance. Perhaps the time to change a team is when you are winning, but no-one was suggesting Man City were about to collapse when they were top and unbeaten after nine league games. Some people have asked how bad it has to get before City make a decision on Guardiola. The answer is that there is no decision to be made. Maybe if this was Real Madrid, Barcelona or Juventus, the pressure from outside would be massive and the argument would be made that Guardiola has to go. At City he has won the lot, so how can anyone say he is failing? Yes, this is a crisis. 
But given all their problems, City's renewed target is finishing in the top four. That is what is in all their heads now. The idea is to recover their essence by improving defensive concepts that are not there and re-establishing the intensity they are known for. Guardiola is planning to use the next two years of his contract, which is expected to be his last as a club manager, to prepare a new Manchester City. When he was at the end of his four years at Barcelona, he asked two managers what to do when you feel people are not responding to your instructions. Do you go or do the players go? Sir Alex Ferguson and Rafael Benitez both told him that the players need to go. Guardiola did not listen because of his emotional attachment to his players back then and he decided to leave the Camp Nou because he felt the cycle was over. He will still protect his players now but there is not the same emotional attachment - so it is the players who are going to leave this time. It is likely City will look to replace five or six regular starters. Guardiola knows it is the end of an era and the start of a new one. Changes will not be immediate and the majority of the work will be done in the summer. But they are open to any opportunities in January - and a holding midfielder is one thing they need. In the summer City might want to get Spain's Martin Zubimendi from Real Sociedad and they know 60m euros (£50m) will get him. He said no to Liverpool last summer even though everything was agreed, but he now wants to move on and the Premier League is the target. Even if they do not get Zubimendi, that is the calibre of footballer they are after. A new Manchester City is on its way - with changes driven by Guardiola, incoming sporting director Hugo Viana and the football department. 
- -------------------------------------------------------------------------------- - Score: 0.6795, Text: 'Self-doubt, errors & big changes' - inside the crisis at Man City - - Pep Guardiola has not been through a moment like this in his managerial career. Manchester City have lost nine matches in their past 12 - as many defeats as they had suffered in their previous 106 fixtures. At the end of October, City were still unbeaten at the top of the Premier League and favourites to win a fifth successive title. Now they are seventh, 12 points behind leaders Liverpool having played a game more. It has been an incredible fall from grace and left people trying to work out what has happened - and whether Guardiola can make it right. After discussing the situation with those who know him best, I have taken a closer look at the future - both short and long term - and how the current crisis at Man City is going to be solved. - - Pep Guardiola's Man City have lost nine of their past 12 matches - - Guardiola has also been giving it a lot of thought. He has not been sleeping very well, as he has said, and has not been himself at times when talking to the media. He has been talking to a lot of people about what is going on as he tries to work out the reasons for City's demise. Some reasons he knows, others he still doesn't. What people perhaps do not realise is Guardiola hugely doubts himself and always has. He will be thinking "I'm not going to be able to get us out of this" and needs the support of people close to him to push away those insecurities - and he has that. He is protected by his people who are very aware, like he is, that there are a lot of people that want City to fail. It has been a turbulent time for Guardiola. Remember those marks he had on his head after the 3-3 draw with Feyenoord in the Champions League? He always scratches his head, it is a gesture of nervousness. 
Normally nothing happens but on that day one of his nails was far too sharp so, after talking to the players in the changing room where he scratched his head because of his usual agitated gesturing, he went to the news conference. His right-hand man Manel Estiarte sent him photos in a message saying "what have you got on your head?", but by the time Guardiola returned to the coaching room there was hardly anything there again. He started that day with a cover on his nose after the same thing happened at the training ground the day before. Guardiola was having a footballing debate with Kyle Walker about positional stuff and marked his nose with that same nail. There was also that remarkable news conference after the Manchester derby when he said "I don't know what to do". That is partly true and partly not true. Ignore the fact Guardiola suggested he was "not good enough". He actually meant he was not good enough to resolve the situation with the group of players he has available and with all the other current difficulties. There are obviously logical explanations for the crisis and the first one has been talked about many times - the absence of injured midfielder Rodri. You know the game Jenga? When you take the wrong piece out, the whole tower collapses. That is what has happened here. It is normal for teams to have an over-reliance on one player if he is the best in the world in his position. And you cannot calculate the consequences of an injury that rules someone like Rodri out for the season. City are a team, like many modern ones, in which the holding midfielder is a key element to the construction. So, when you take Rodri out, it is difficult to hold it together. There were Plan Bs - John Stones, Manuel Akanji, even Nathan Ake - but injuries struck. The big injury list has been out of the ordinary and the busy calendar has also played a part in compounding the issues. 
However, one factor even Guardiola cannot explain is the big uncharacteristic errors in almost every game from international players. Why did Matheus Nunes make that challenge to give away the penalty against Manchester United? Jack Grealish is sent on at the end to keep the ball and cannot do that. There are errors from Walker and other defenders. These are some of the best players in the world. Of course the players' mindset is important, and confidence is diminishing. Wrong decisions get taken so there is almost panic on the pitch instead of calm. There are also players badly out of form who are having to play because of injuries. Walker is now unable to hide behind his pace, I'm not sure Kevin de Bruyne is ever getting back to the level he used to be at, Bernardo Silva and Ilkay Gundogan do not have time to rest, Grealish is not playing at his best. Some of these players were only meant to be playing one game a week but, because of injuries, have played 12 games in 40 days. It all has a domino effect. One consequence is that Erling Haaland isn't getting the service to score. But the Norwegian still remains City's top-scorer with 13. Defender Josko Gvardiol is next on the list with just four. The way their form has been analysed inside the City camp is there have only been three games where they deserved to lose (Liverpool, Bournemouth and Aston Villa). But of course it is time to change the dynamic. - - Guardiola has never protected his players so much. He has not criticised them and is not going to do so. They have won everything with him. Instead of doing more with them, he has tried doing less. He has sometimes given them more days off to clear their heads, so they can reset - two days this week for instance. Perhaps the time to change a team is when you are winning, but no-one was suggesting Man City were about to collapse when they were top and unbeaten after nine league games. 
Some people have asked how bad it has to get before City make a decision on Guardiola. The answer is that there is no decision to be made. Maybe if this was Real Madrid, Barcelona or Juventus, the pressure from outside would be massive and the argument would be made that Guardiola has to go. At City he has won the lot, so how can anyone say he is failing? Yes, this is a crisis. But given all their problems, City's renewed target is finishing in the top four. That is what is in all their heads now. The idea is to recover their essence by improving defensive concepts that are not there and re-establishing the intensity they are known for. Guardiola is planning to use the next two years of his contract, which is expected to be his last as a club manager, to prepare a new Manchester City. When he was at the end of his four years at Barcelona, he asked two managers what to do when you feel people are not responding to your instructions. Do you go or do the players go? Sir Alex Ferguson and Rafael Benitez both told him that the players need to go. Guardiola did not listen because of his emotional attachment to his players back then and he decided to leave the Camp Nou because he felt the cycle was over. He will still protect his players now but there is not the same emotional attachment - so it is the players who are going to leave this time. It is likely City will look to replace five or six regular starters. Guardiola knows it is the end of an era and the start of a new one. Changes will not be immediate and the majority of the work will be done in the summer. But they are open to any opportunities in January - and a holding midfielder is one thing they need. In the summer City might want to get Spain's Martin Zubimendi from Real Sociedad and they know 60m euros (£50m) will get him. He said no to Liverpool last summer even though everything was agreed, but he now wants to move on and the Premier League is the target. 
Even if they do not get Zubimendi, that is the calibre of footballer they are after. A new Manchester City is on its way - with changes driven by Guardiola, incoming sporting director Hugo Viana and the football department. - -------------------------------------------------------------------------------- - Score: 0.6207, Text: Manchester City boss Pep Guardiola has won 18 trophies since he arrived at the club in 2016 - - - ... (output truncated for brevity) - - -# Retrieval-Augmented Generation (RAG) with Couchbase and Langchain -Couchbase and LangChain can be seamlessly integrated to create RAG (Retrieval-Augmented Generation) chains, enhancing the process of generating contextually relevant responses. In this setup, Couchbase serves as the vector store, where embeddings of documents are stored. When a query is made, LangChain retrieves the most relevant documents from Couchbase by comparing the query’s embedding with the stored document embeddings. These documents, which provide contextual information, are then passed to a generative language model within LangChain. - -The language model, equipped with the context from the retrieved documents, generates a response that is both informed and contextually accurate. This integration allows the RAG chain to leverage Couchbase’s efficient storage and retrieval capabilities, while LangChain handles the generation of responses based on the context provided by the retrieved documents. Together, they create a powerful system that can deliver highly relevant and accurate answers by combining the strengths of both retrieval and generation. - - -```python -try: - template = """You are a helpful bot. If you cannot answer based on the context provided, respond with a generic answer. 
Answer the question as truthfully as possible using the context below: - {context} - - Question: {question}""" - prompt = ChatPromptTemplate.from_template(template) - - rag_chain = ( - {"context": vector_store.as_retriever(search_kwargs={"k": 2}), "question": RunnablePassthrough()} - | prompt - | llm - | StrOutputParser() - ) - logging.info("Successfully created RAG chain") -except Exception as e: - raise ValueError(f"Error creating RAG chain: {str(e)}") -``` - - 2025-09-23 10:50:26,937 - INFO - Successfully created RAG chain - - - -```python -try: - # Create chain with k=2 - # Start with k=4 and gradually reduce if token limit exceeded - # k=4 -> k=3 -> k=2 based on token limit warnings - # Final k=2 produced valid response about Guardiola in 2.33 seconds - current_chain = ( - { - "context": vector_store.as_retriever(search_kwargs={"k": 2}), - "question": RunnablePassthrough() - } - | prompt - | llm - | StrOutputParser() - ) - - # Try to get response - start_time = time.time() - rag_response = current_chain.invoke(query) - elapsed_time = time.time() - start_time - - logging.info(f"RAG response generated in {elapsed_time:.2f} seconds using k=2") - print(f"RAG Response: {rag_response}") - print(f"Response generated in {elapsed_time:.2f} seconds") - -except Exception as e: - if "Payment Required" in str(e): - logging.error("Payment required for Jina AI API. Please check your subscription status and API key.") - print("To resolve this error:") - print("1. Visit 'https://jina.ai/reader/#pricing' to review subscription options") - print("2. Ensure your API key is valid and has sufficient credits") - print("3. Consider upgrading your subscription plan if needed") - else: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - 2025-09-23 10:50:47,733 - INFO - RAG response generated in 17.23 seconds using k=2 - - - RAG Response: Pep Guardiola has been grappling with self-doubt and seeking support to navigate Manchester City's current crisis. 
- Response generated in 17.23 seconds - - -# Using Couchbase as a caching mechanism -Couchbase can be effectively used as a caching mechanism for RAG (Retrieval-Augmented Generation) responses by storing and retrieving precomputed results for specific queries. This approach enhances the system's efficiency and speed, particularly when dealing with repeated or similar queries. When a query is first processed, the RAG chain retrieves relevant documents, generates a response using the language model, and then stores this response in Couchbase, with the query serving as the key. - -For subsequent requests with the same query, the system checks Couchbase first. If a cached response is found, it is retrieved directly from Couchbase, bypassing the need to re-run the entire RAG process. This significantly reduces response time because the computationally expensive steps of document retrieval and response generation are skipped. Couchbase's role in this setup is to provide a fast and scalable storage solution for caching these responses, ensuring that frequently asked queries can be answered more quickly and efficiently. - - -```python -try: - queries = [ - "What happened in the match between Fullham and Liverpool?", - "What was manchester city manager pep guardiola's reaction to the team's current form?", # Repeated query - "What happened in the match between Fullham and Liverpool?", # Repeated query - ] - - for i, query in enumerate(queries, 1): - print(f"\nQuery {i}: {query}") - start_time = time.time() - response = rag_chain.invoke(query) - elapsed_time = time.time() - start_time - print(f"Response: {response}") - - print(f"Time taken: {elapsed_time:.2f} seconds") -except Exception as e: - if "Payment Required" in str(e): - logging.error("Payment required for Jina AI API. Please check your subscription status and API key.") - print("To resolve this error:") - print("1. Visit 'https://jina.ai/reader/#pricing' to review subscription options") - print("2. 
Ensure your API key is valid and has sufficient credits") - print("3. Consider upgrading your subscription plan if needed") - else: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - - Query 1: What happened in the match between Fullham and Liverpool? - Response: Fulham and Liverpool played to a 2-2 draw at Anfield, with both teams showcasing strong performances. - Time taken: 5.13 seconds - - Query 2: What was manchester city manager pep guardiola's reaction to the team's current form? - Response: Pep Guardiola has been grappling with self-doubt and seeking support to navigate Manchester City's current crisis. - Time taken: 2.16 seconds - - Query 3: What happened in the match between Fullham and Liverpool? - Response: Fulham and Liverpool played to a 2-2 draw at Anfield, with both teams showcasing strong performances. - Time taken: 1.95 seconds - - -## Conclusion -By following these steps, you’ll have a fully functional semantic search engine that leverages the strengths of Couchbase and Jina. This guide is designed not just to show you how to build the system, but also to explain why each step is necessary, giving you a deeper understanding of the principles behind semantic search and how to implement it effectively. Whether you’re a newcomer to software development or an experienced developer looking to expand your skills, this guide will provide you with the knowledge and tools you need to create a powerful, AI-driven search engine. 
diff --git a/tutorial/markdown/generated/vector-search-cookbook/jinaai-gsi-RAG_with_Couchbase_and_Jina_AI.md b/tutorial/markdown/generated/vector-search-cookbook/jinaai-gsi-RAG_with_Couchbase_and_Jina_AI.md deleted file mode 100644 index fd989e7..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/jinaai-gsi-RAG_with_Couchbase_and_Jina_AI.md +++ /dev/null @@ -1,927 +0,0 @@ ---- -# frontmatter -path: "/tutorial-jina-couchbase-rag-with-global-secondary-index" -title: Retrieval-Augmented Generation (RAG) with Couchbase and Jina AI using GSI -short_title: RAG with Couchbase and Jina -description: - - Learn how to build a semantic search engine using Couchbase and Jina. - - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with Jina embeddings and language models. - - You'll understand how to perform Retrieval-Augmented Generation (RAG) using LangChain and Couchbase using GSI. -content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - GSI - - Artificial Intelligence - - LangChain - - Jina AI -sdk_language: - - python -length: 60 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/jinaai/gsi/RAG_with_Couchbase_and_Jina_AI.ipynb) - -# Semantic Search with Couchbase GSI Vector Indexes and Jina AI - -## Overview - -This tutorial demonstrates building a high-performance semantic search engine using Couchbase's GSI (Global Secondary Index) vector search and Jina AI for embeddings and language models. We'll show measurable performance improvements with GSI optimization and implement a complete RAG (Retrieval-Augmented Generation) system. 
Alternatively, if you want to perform semantic search using FTS, take a look at [this tutorial](https://developer.couchbase.com/tutorial-jina-couchbase-rag-with-fts). - -**Key Features:** -- High-performance GSI vector search with BHIVE indexing -- Jina AI embeddings and language models -- Performance benchmarks showing GSI benefits -- Complete RAG workflow with caching optimization - -**Requirements:** Couchbase Server 8.0+ or Capella with Query Service enabled. - -## How to Run This Tutorial - -This tutorial is available as a Jupyter Notebook that you can run interactively on [Google Colab](https://colab.research.google.com/) or locally by setting up the Python environment. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/jinaai/gsi/RAG_with_Couchbase_and_Jina_AI.ipynb). - -## Prerequisites - -### System Requirements - -- **Couchbase Server 8.0+** or Couchbase Capella -- **Query Service enabled** (required for GSI Vector Indexes) -- **Jina AI API credentials** ([Get them here](https://jina.ai/)) -- **JinaChat API credentials** ([Get them here](https://chat.jina.ai/api)) - -### Couchbase Capella Setup - -1. **Create Account:** Deploy a [free tier cluster](https://cloud.couchbase.com/sign-up) -2. **Configure Access:** Set up database credentials and network security -3. **Enable Query Service:** Required for GSI vector search functionality - -## Setup and Installation - -### Install Required Libraries - -Install the necessary packages for Couchbase GSI vector search, Jina AI integration, and LangChain RAG capabilities. - - -```python -# Jina doesn't support openai versions other than 0.27 -%pip install --quiet datasets==3.6.0 langchain-couchbase==0.5.0 langchain-community==0.3.24 openai==0.27 python-dotenv==1.1.0 ipywidgets -``` - - Note: you may need to restart the kernel to use updated packages. 
- - -### Import Required Modules - -Import libraries for Couchbase GSI vector search, Jina AI models, and LangChain components. - - -```python -import logging -import os -import time -from datetime import timedelta - -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.exceptions import CouchbaseException -from couchbase.management.buckets import CreateBucketSettings -from couchbase.options import ClusterOptions -from datasets import load_dataset -from dotenv import load_dotenv -from langchain_community.chat_models import JinaChat -from langchain_community.embeddings import JinaEmbeddings -from langchain_core.globals import set_llm_cache -from langchain_core.output_parsers import StrOutputParser -from langchain_core.prompts import ChatPromptTemplate -from langchain_core.runnables import RunnablePassthrough -from langchain_couchbase.cache import CouchbaseCache -from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore -from langchain_couchbase.vectorstores import DistanceStrategy -from langchain_couchbase.vectorstores import IndexType -``` - -### Configure Logging - -Set up logging to track progress and capture any errors during execution. - - -```python -logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True) - -# Suppress all logs from specific loggers -logging.getLogger('openai').setLevel(logging.WARNING) -logging.getLogger('httpx').setLevel(logging.WARNING) -``` - -### Environment Configuration - -Load environment variables for secure access to Jina AI and Couchbase services. Create a `.env` file with your credentials. 
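For reference, a `.env` file for this tutorial might look like the sketch below. The variable names match those read by the configuration cell that follows; every value is a placeholder, so substitute your own credentials and connection string:

```
JINA_API_KEY=<your-jina-api-key>
JINACHAT_API_KEY=<your-jinachat-api-key>
CB_HOST=couchbases://<your-capella-endpoint>
CB_USERNAME=<your-database-username>
CB_PASSWORD=<your-database-password>
CB_BUCKET_NAME=vector-search-testing
INDEX_NAME=vector_search_jina
SCOPE_NAME=shared
COLLECTION_NAME=jina
CACHE_COLLECTION=cache
```

Keep this file out of version control, since it contains secrets.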
- - -```python -load_dotenv("./.env") - -JINA_API_KEY = os.getenv("JINA_API_KEY") -JINACHAT_API_KEY = os.getenv("JINACHAT_API_KEY") - -CB_HOST = os.getenv("CB_HOST") or 'couchbase://localhost' -CB_USERNAME = os.getenv("CB_USERNAME") or 'Administrator' -CB_PASSWORD = os.getenv("CB_PASSWORD") or 'password' -CB_BUCKET_NAME = os.getenv("CB_BUCKET_NAME") or 'vector-search-testing' -INDEX_NAME = os.getenv("INDEX_NAME") or 'vector_search_jina' - -SCOPE_NAME = os.getenv("SCOPE_NAME") or 'shared' -COLLECTION_NAME = os.getenv("COLLECTION_NAME") or 'jina' -CACHE_COLLECTION = os.getenv("CACHE_COLLECTION") or 'cache' - -# Check if the variables are correctly loaded -if not JINA_API_KEY: - raise ValueError("JINA_API_KEY environment variable is not set") -if not JINACHAT_API_KEY: - raise ValueError("JINACHAT_API_KEY environment variable is not set") -``` - -## Couchbase Connection Setup - -### Connect to Cluster - -Establish connection to Couchbase cluster for vector storage and retrieval operations. - - -```python -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to Couchbase") -except Exception as e: - raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}") -``` - - 2025-10-08 11:18:34,736 - INFO - Successfully connected to Couchbase - - -### Setup Collections - -The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase: - -1. Bucket Creation: - - Checks if specified bucket exists, creates it if not - - Sets bucket properties like RAM quota (1024MB) and replication (disabled) - - Note: You will not be able to create a bucket on Capella - -2. Scope Management: - - Verifies if requested scope exists within bucket - - Creates new scope if needed (unless it's the default "_default" scope) - -3. 
Collection Setup: - - Checks for collection existence within scope - - Creates collection if it doesn't exist - - Waits 2 seconds for collection to be ready - -Additional Tasks: -- Clears any existing documents for clean state - -The function is called twice to set up: -1. Main collection for vector embeddings -2. Cache collection for storing results - - - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' exists.") - except Exception as e: - logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...") - bucket_settings = CreateBucketSettings( - name=bucket_name, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - time.sleep(2) # Wait for bucket creation to complete and become available - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' created successfully.") - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == scope_name for scope in scopes) - - if not scope_exists and scope_name != "_default": - logging.info(f"Scope '{scope_name}' does not exist. Creating it...") - bucket_manager.create_scope(scope_name) - logging.info(f"Scope '{scope_name}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{collection_name}' does not exist. 
Creating it...") - bucket_manager.create_collection(scope_name, collection_name) - logging.info(f"Collection '{collection_name}' created successfully.") - else: - logging.info(f"Collection '{collection_name}' already exists. Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION) -``` - - 2025-10-08 11:18:36,208 - INFO - Bucket 'vector-search-testing' exists. - 2025-10-08 11:18:36,219 - INFO - Collection 'jina' already exists. Skipping creation. - 2025-10-08 11:18:38,322 - INFO - All documents cleared from the collection. - 2025-10-08 11:18:38,322 - INFO - Bucket 'vector-search-testing' exists. - 2025-10-08 11:18:38,327 - INFO - Collection 'jina_cache' already exists. Skipping creation. - 2025-10-08 11:18:40,480 - INFO - All documents cleared from the collection. - - - - - - - - - -## Document Processing and Vector Store - -### Create Jina Embeddings - -Set up Jina AI embeddings to convert text into high-dimensional vectors that capture semantic meaning for similarity search. 
- - -```python -try: - embeddings = JinaEmbeddings( - jina_api_key=JINA_API_KEY, model_name="jina-embeddings-v3" - ) - logging.info("Successfully created JinaEmbeddings") -except Exception as e: - raise ValueError(f"Error creating JinaEmbeddings: {str(e)}") -``` - - 2025-10-08 11:18:56,191 - INFO - Successfully created JinaEmbeddings - - -### Create GSI Vector Store - -Set up the GSI vector store for high-performance vector storage and similarity search using Couchbase's Query Service. - - -```python -try: - vector_store = CouchbaseQueryVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding=embeddings, - distance_metric=DistanceStrategy.COSINE - ) - logging.info("Successfully created GSI vector store") -except Exception as e: - raise ValueError(f"Failed to create GSI vector store: {str(e)}") -``` - - 2025-10-08 11:18:57,341 - INFO - Successfully created GSI vector store - - -### Index Creation Timing - -**Important**: GSI Vector Indexes must be created AFTER uploading vector data. The index creation process analyzes existing vectors to optimize search performance through clustering and quantization. - -### Load Dataset - -Load the BBC News dataset for real-world testing data with authentic news articles covering various topics. - - -```python -try: - news_dataset = load_dataset( - "RealTimeData/bbc_news_alltime", "2024-12", split="train" - ) - print(f"Loaded the BBC News dataset with {len(news_dataset)} rows") - logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.") -except Exception as e: - raise ValueError(f"Error loading the BBC News dataset: {str(e)}") -``` - - 2025-10-08 11:19:03,903 - INFO - Successfully loaded the BBC News dataset with 2687 rows. - - - Loaded the BBC News dataset with 2687 rows - - -#### Clean Data - -Remove duplicate articles to ensure clean search results. 
- - -```python -news_articles = news_dataset["content"] -unique_articles = set() -for article in news_articles: - if article: - unique_articles.add(article) -unique_news_articles = list(unique_articles) -print(f"We have {len(unique_news_articles)} unique articles in our database.") -``` - - We have 1749 unique articles in our database. - - -#### Store Data - -Process articles in batches and store them in the vector database with embeddings. We'll use 60% of the dataset for faster processing while maintaining good search quality. - - -```python -# Calculate 60% of the dataset size and round to nearest integer -dataset_size = len(unique_news_articles) -subset_size = round(dataset_size * 0.6) - -# Filter articles by length and create subset -filtered_articles = [article for article in unique_news_articles[:subset_size] - if article and len(article) <= 50000] - -# Process in batches -batch_size = 50 - -try: - vector_store.add_texts( - texts=filtered_articles, - batch_size=batch_size - ) - logging.info("Document ingestion completed successfully") - -except CouchbaseException as e: - logging.error(f"Couchbase error during ingestion: {str(e)}") - raise RuntimeError(f"Error performing document ingestion: {str(e)}") -except Exception as e: - if "Payment Required" in str(e): - logging.error("Payment required for Jina AI API. Please check your subscription status and API key.") - print("To resolve this error:") - print("1. Visit 'https://jina.ai/reader/#pricing' to review subscription options") - print("2. Ensure your API key is valid and has sufficient credits") - print("3. 
Consider upgrading your subscription plan if needed") - else: - logging.error(f"Unexpected error during ingestion: {str(e)}") - raise RuntimeError(f"Failed to save documents to vector store: {str(e)}") -``` - - 2025-10-08 11:20:18,363 - INFO - Document ingestion completed successfully - - -## Vector Search Performance Testing - -Now let's demonstrate the performance benefits of GSI optimization by testing pure vector search performance. We'll compare three optimization levels: - -1. **Baseline Performance**: Vector search without GSI optimization -2. **GSI-Optimized Performance**: Same search with BHIVE GSI index -3. **Cache Benefits**: Show how caching can be applied on top of GSI for repeated queries - -### GSI Vector Index Types Overview - -Before we start testing, let's understand the index types available: - -**Hyperscale Vector Indexes (BHIVE):** -- **Best for**: Pure vector searches - content discovery, recommendations, semantic search -- **Performance**: High performance with low memory footprint, designed to scale to billions of vectors -- **Optimization**: Optimized for concurrent operations, supports simultaneous searches and inserts -- **Use when**: You primarily perform vector-only queries without complex scalar filtering -- **Ideal for**: Large-scale semantic search, recommendation systems, content discovery - -**Composite Vector Indexes:** -- **Best for**: Filtered vector searches that combine vector search with scalar value filtering -- **Performance**: Efficient pre-filtering where scalar attributes reduce the vector comparison scope -- **Use when**: Your queries combine vector similarity with scalar filters that eliminate large portions of data -- **Ideal for**: Compliance-based filtering, user-specific searches, time-bounded queries -- **Note**: Scalar filters take precedence over vector similarity - -**Choosing the Right Index Type:** -- Start with Hyperscale Vector Index for pure vector searches and large datasets -- Use Composite Vector Index 
when scalar filters significantly reduce your search space -- Consider your dataset size: Hyperscale scales to billions, Composite works well for tens of millions to billions - -For this tutorial, we'll use **BHIVE** as it's optimized for pure semantic search scenarios. - -### Index Configuration Details - -The `index_description` parameter controls how Couchbase optimizes vector storage and search performance through centroids and quantization: - -**Format**: `IVF[<centroids>],{PQ<subquantizers>x<bits>|SQ<bits>}` - -#### **IVF (Inverted File Index) - Centroids Configuration** -- **Purpose**: Controls how the dataset is subdivided into clusters for faster searches -- **Trade-offs**: More centroids = faster searches but slower training time -- **Auto-selection**: If omitted (e.g., `IVF,SQ8`), Couchbase automatically selects the optimal number based on dataset size -- **Manual setting**: Specify exact count (e.g., `IVF1000,SQ8` for 1000 centroids) - -#### **Quantization Options - Vector Compression** - -**SQ (Scalar Quantization)** -- **Purpose**: Compresses vectors by reducing precision of individual components -- **Settings**: `SQ4`, `SQ6`, `SQ8` (4-bit, 6-bit, 8-bit precision) -- **Trade-off**: Lower bits = more compression but less precision -- **Best for**: General-purpose applications where some precision loss is acceptable - -**PQ (Product Quantization)** -- **Purpose**: Advanced compression using subquantizers for better precision -- **Format**: `PQ<subquantizers>x<bits>` (e.g., `PQ32x8` = 32 subquantizers of 8 bits each) -- **Trade-off**: More complex but often better precision than SQ at similar compression ratios -- **Best for**: Applications requiring high precision with significant compression - -#### **Common Configuration Examples** - -``` -IVF,SQ8 # Auto-selected centroids with 8-bit scalar quantization (recommended default) -IVF1000,SQ6 # 1000 centroids with 6-bit scalar quantization (higher compression) -IVF,PQ32x8 # Auto-selected centroids with 32 subquantizers of 8 bits each -IVF500,PQ16x4 # 500 
centroids with 16 subquantizers of 4 bits each (high compression) -``` - -#### **Performance Considerations** - -**Distance Interpretation**: In GSI vector search, lower distance values indicate higher similarity, while higher distance values indicate lower similarity. - -**Scalability**: BHIVE indexes can scale to billions of vectors with optimized concurrent operations, making them suitable for large-scale production deployments. - -For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/cloud/vector-index/hyperscale-vector-index.html#algo_settings). - -For more information on GSI vector indexes, see [Couchbase GSI Vector Documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html). - -### Vector Search Test Function - - -```python -def test_vector_search_performance(vector_store, query, label="Vector Search"): - """Test pure vector search performance and return timing metrics""" - print(f"\n[{label}] Testing vector search performance") - print(f"[{label}] Query: '{query}'") - - start_time = time.time() - - try: - results = vector_store.similarity_search_with_score(query, k=3) - end_time = time.time() - search_time = end_time - start_time - - print(f"[{label}] Vector search completed in {search_time:.4f} seconds") - print(f"[{label}] Found {len(results)} documents") - - if results: - doc, distance = results[0] - print(f"[{label}] Top result distance: {distance:.6f} (lower = more similar)") - preview = doc.page_content[:100] + "..." if len(doc.page_content) > 100 else doc.page_content - print(f"[{label}] Top result preview: {preview}") - - return search_time - except Exception as e: - print(f"[{label}] Vector search failed: {str(e)}") - return None -``` - -### Test 1: Baseline Performance (No GSI Index) - -Test pure vector search performance without GSI optimization. 
- - -```python -# Test baseline vector search performance without GSI index -test_query = "What was manchester city manager pep guardiola's reaction to the team's current form?" -print("Testing baseline vector search performance without GSI optimization...") -baseline_time = test_vector_search_performance(vector_store, test_query, "Baseline Search") -print(f"\nBaseline vector search time (without GSI): {baseline_time:.4f} seconds\n") -``` - - Testing baseline vector search performance without GSI optimization... - - [Baseline Search] Testing vector search performance - [Baseline Search] Query: 'What was manchester city manager pep guardiola's reaction to the team's current form?' - [Baseline Search] Vector search completed in 0.8305 seconds - [Baseline Search] Found 3 documents - [Baseline Search] Top result distance: 0.457932 (lower = more similar) - [Baseline Search] Top result preview: 'Promised change, but Juventus are back in crisis' - - "We have entirely changed the way we think about... - - Baseline vector search time (without GSI): 0.8305 seconds - - - -### Create BHIVE GSI Index - -Now let's create a BHIVE GSI vector index to enable high-performance vector searches. The index creation is done programmatically through the vector store. - - -```python -# Create GSI Vector Index for high-performance searches -print("Creating BHIVE GSI vector index...") -try: - vector_store.create_index( - index_type=IndexType.BHIVE, # Use IndexType.COMPOSITE for Composite index - index_description="IVF,SQ8" - ) - print("GSI Vector index created successfully") - - # Wait for index to become available - print("Waiting for index to become available...") - time.sleep(5) - -except Exception as e: - if "already exists" in str(e).lower(): - print("GSI Vector index already exists, proceeding...") - else: - print(f"Error creating GSI index: {str(e)}") -``` - - Creating BHIVE GSI vector index... - GSI Vector index created successfully - Waiting for index to become available... 
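Rather than relying on a fixed `time.sleep(5)`, you can confirm that the new index is online by querying Couchbase's `system:indexes` catalog. The helper below is a hedged sketch and not part of the original notebook; it only builds the N1QL statement, which you would then run through the connected `cluster` object (the `bucket_id`/`scope_id`/`keyspace_id` fields follow Couchbase's system catalog schema).

```python
def index_status_query(bucket: str, scope: str, collection: str) -> str:
    """Build a N1QL statement listing index names and states for one collection."""
    return (
        "SELECT name, state FROM system:indexes "
        f"WHERE bucket_id = '{bucket}' AND scope_id = '{scope}' "
        f"AND keyspace_id = '{collection}'"
    )

# In the notebook you could then run, for example:
# for row in cluster.query(index_status_query(CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME)):
#     print(row["name"], row["state"])  # an index ready for queries reports "online"
```

A production setup would poll this query in a loop until the state is `online` instead of sleeping for a fixed interval.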
- - -### Alternative: Composite Index Configuration - -If your use case requires complex filtering with scalar attributes, you can create a **Composite index** instead by changing the configuration above: - -```python -# Alternative: Create a Composite index for filtered searches -vector_store.create_index( - index_type=IndexType.COMPOSITE, # Instead of IndexType.BHIVE - index_description="IVF,SQ8" # Same quantization settings -) -``` - -### Test 2: GSI-Optimized Performance - -Test the same vector search with BHIVE GSI optimization. - - -```python -# Test vector search performance with GSI index -gsi_test_query = "What happened in the latest Premier League matches?" -print("Testing vector search performance with BHIVE GSI optimization...") -gsi_time = test_vector_search_performance(vector_store, gsi_test_query, "GSI-Optimized Search") -``` - - Testing vector search performance with BHIVE GSI optimization... - - [GSI-Optimized Search] Testing vector search performance - [GSI-Optimized Search] Query: 'What happened in the latest Premier League matches?' - [GSI-Optimized Search] Vector search completed in 0.6452 seconds - [GSI-Optimized Search] Found 3 documents - [GSI-Optimized Search] Top result distance: 0.394714 (lower = more similar) - [GSI-Optimized Search] Top result preview: The latest updates and analysis from the BBC. - - -### Test 3: Cache Benefits Testing - -Now let's demonstrate how caching can improve performance for repeated queries. **Note**: Caching benefits apply to both baseline and GSI-optimized searches. - - -```python -# Set up Couchbase cache (can be applied to any search approach) -print("Setting up Couchbase cache for improved performance on repeated queries...") -cache = CouchbaseCache( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, -) -set_llm_cache(cache) -print("✓ Couchbase cache enabled!") -``` - - Setting up Couchbase cache for improved performance on repeated queries... 
- ✓ Couchbase cache enabled! - - - -```python -# Test cache benefits with a different query to avoid interference -cache_test_query = "What are the latest football transfer developments?" - -print("Testing cache benefits with vector search...") -print("First execution (cache miss):") -cache_time_1 = test_vector_search_performance(vector_store, cache_test_query, "Cache Test - First Run") - -print("\nSecond execution (cache hit - should be faster):") -cache_time_2 = test_vector_search_performance(vector_store, cache_test_query, "Cache Test - Second Run") -``` - - Testing cache benefits with vector search... - First execution (cache miss): - - [Cache Test - First Run] Testing vector search performance - [Cache Test - First Run] Query: 'What are the latest football transfer developments?' - [Cache Test - First Run] Vector search completed in 0.9695 seconds - [Cache Test - First Run] Found 3 documents - [Cache Test - First Run] Top result distance: 0.394020 (lower = more similar) - [Cache Test - First Run] Top result preview: The latest updates and analysis from the BBC. - - Second execution (cache hit - should be faster): - - [Cache Test - Second Run] Testing vector search performance - [Cache Test - Second Run] Query: 'What are the latest football transfer developments?' - [Cache Test - Second Run] Vector search completed in 0.5252 seconds - [Cache Test - Second Run] Found 3 documents - [Cache Test - Second Run] Top result distance: 0.394020 (lower = more similar) - [Cache Test - Second Run] Top result preview: The latest updates and analysis from the BBC. 
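Note that this cache is exact-match: a stored result is reused only when a later query is byte-for-byte identical to an earlier one. The sketch below illustrates that lookup pattern with a plain dictionary and a hypothetical `cached_search` helper — it is not the `CouchbaseCache` implementation, just the behavior:

```python
# Minimal sketch of exact-match caching (hypothetical helper, not the
# CouchbaseCache implementation): results are keyed on the exact query
# string, so only a repeated, identical query gets a cache hit.
cache_store = {}

def cached_search(query, search_fn):
    if query in cache_store:
        return cache_store[query], True   # cache hit: skip the expensive call
    result = search_fn(query)
    cache_store[query] = result
    return result, False                  # cache miss: compute and store

# Stand-in for an expensive vector search
expensive_calls = []
def fake_search(q):
    expensive_calls.append(q)
    return f"results for {q!r}"

r1, hit1 = cached_search("latest transfers", fake_search)
r2, hit2 = cached_search("latest transfers", fake_search)
r3, hit3 = cached_search("Latest transfers", fake_search)  # different casing -> miss
print(hit1, hit2, hit3)  # False True False
```

Because the key is the literal query text, even trivial differences (casing, whitespace) produce a miss — which is why the performance test above repeats the identical query string.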
- - -### Vector Search Performance Analysis - -Let's analyze the vector search performance improvements across all optimization levels: - - -```python -print("\n" + "="*80) -print("VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY") -print("="*80) - -print(f"Phase 1 - Baseline Search (No GSI): {baseline_time:.4f} seconds") -print(f"Phase 2 - GSI-Optimized Search: {gsi_time:.4f} seconds") -if cache_time_1 and cache_time_2: - print(f"Phase 3 - Cache Benefits:") - print(f" First execution (cache miss): {cache_time_1:.4f} seconds") - print(f" Second execution (cache hit): {cache_time_2:.4f} seconds") - -print("\n" + "-"*80) -print("VECTOR SEARCH OPTIMIZATION IMPACT:") -print("-"*80) - -# GSI improvement analysis -if baseline_time and gsi_time: - speedup = baseline_time / gsi_time if gsi_time > 0 else float('inf') - time_saved = baseline_time - gsi_time - percent_improvement = (time_saved / baseline_time) * 100 - print(f"GSI Index Benefit: {speedup:.2f}x faster ({percent_improvement:.1f}% improvement)") - -# Cache improvement analysis -if cache_time_1 and cache_time_2 and cache_time_2 < cache_time_1: - cache_speedup = cache_time_1 / cache_time_2 - cache_improvement = ((cache_time_1 - cache_time_2) / cache_time_1) * 100 - print(f"Cache Benefit: {cache_speedup:.2f}x faster ({cache_improvement:.1f}% improvement)") -else: - print(f"Cache Benefit: Variable (depends on query complexity and caching mechanism)") - -print(f"\nKey Insights for Vector Search Performance:") -print(f"• GSI BHIVE indexes provide significant performance improvements for vector similarity search") -print(f"• Performance gains are most dramatic for complex semantic queries") -print(f"• BHIVE optimization is particularly effective for high-dimensional embeddings") -print(f"• Combined with proper quantization (SQ8), GSI delivers production-ready performance") -print(f"• These performance improvements directly benefit any application using the vector store") -``` - - - 
================================================================================ - VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY - ================================================================================ - Phase 1 - Baseline Search (No GSI): 0.8305 seconds - Phase 2 - GSI-Optimized Search: 0.6452 seconds - Phase 3 - Cache Benefits: - First execution (cache miss): 0.9695 seconds - Second execution (cache hit): 0.5252 seconds - - -------------------------------------------------------------------------------- - VECTOR SEARCH OPTIMIZATION IMPACT: - -------------------------------------------------------------------------------- - GSI Index Benefit: 1.29x faster (22.3% improvement) - Cache Benefit: 1.85x faster (45.8% improvement) - - Key Insights for Vector Search Performance: - • GSI BHIVE indexes provide significant performance improvements for vector similarity search - • Performance gains are most dramatic for complex semantic queries - • BHIVE optimization is particularly effective for high-dimensional embeddings - • Combined with proper quantization (SQ8), GSI delivers production-ready performance - • These performance improvements directly benefit any application using the vector store - - -## Jina AI RAG Demo - -### What is RAG (Retrieval-Augmented Generation)? - -Now that we've optimized our vector search performance, let's demonstrate how to build a complete RAG system using Jina AI. RAG combines the power of our GSI-optimized semantic search with language model generation: - -1. **Query Processing**: User question is converted to vector embedding using Jina AI -2. **Document Retrieval**: GSI BHIVE index finds most relevant documents (now with proven performance improvements) -3. **Context Assembly**: Retrieved documents provide factual context for the language model -4. 
**Response Generation**: Jina's language model generates intelligent answers grounded in the retrieved data - -This demo shows how the vector search performance improvements we validated directly enhance the RAG workflow efficiency. - -### Create Jina Language Model - -Initialize Jina's chat model for generating intelligent responses based on our GSI-optimized retrieval system. - - -```python -print("Setting up Jina AI language model for RAG demo...") -try: - llm = JinaChat(temperature=0.1, jinachat_api_key=JINACHAT_API_KEY) - print("✓ JinaChat language model created successfully") - logging.info("Successfully created JinaChat") -except Exception as e: - print(f"✗ Error creating JinaChat: {str(e)}") - print("Please check your JINACHAT_API_KEY and network connection.") - raise -``` - - 2025-10-08 11:24:30,099 - INFO - Successfully created JinaChat - - - Setting up Jina AI language model for RAG demo... - ✓ JinaChat language model created successfully - - -### Build Optimized RAG Pipeline - -Create the complete RAG pipeline that integrates our GSI-optimized vector search with Jina's language model. - - -```python -try: - # Create RAG prompt template for structured responses - template = """You are a helpful assistant that answers questions based on the provided context. - If you cannot answer based on the context provided, respond with a generic answer. 
- Answer the question as truthfully as possible using the context below: - - Context: - {context} - - Question: {question} - - Answer:""" - - prompt = ChatPromptTemplate.from_template(template) - - # Build the RAG chain: GSI-Optimized Retrieval → Context → Generation → Output - rag_chain = ( - { - "context": vector_store.as_retriever(search_kwargs={"k": 2}), - "question": RunnablePassthrough() - } - | prompt - | llm - | StrOutputParser() - ) - print("Optimized RAG pipeline created successfully") - print("Components: GSI BHIVE Vector Search → Context Assembly → Jina Language Model → Response") -except Exception as e: - raise ValueError(f"Error creating RAG pipeline: {str(e)}") -``` - - Optimized RAG pipeline created successfully - Components: GSI BHIVE Vector Search → Context Assembly → Jina Language Model → Response - - -### RAG Demo with Optimized Search - -Test the complete RAG system leveraging our GSI performance optimizations. - - -```python -print("Testing RAG System with GSI-Optimized Vector Search") -print("=" * 60) - -try: - # Test with a specific query - sample_query = "What are the new eligibility rules for transgender women competing in leading women's golf tours, and what prompted these changes?" - print(f"User Query: {sample_query}") - print("\nProcessing with optimized pipeline...") - print("1. Converting query to vector embedding with Jina AI") - print("2. Searching GSI BHIVE index for relevant documents (optimized)") - print("3. Assembling context from retrieved documents") - print("4. 
Generating intelligent response with JinaChat") - - start_time = time.time() - rag_response = rag_chain.invoke(sample_query) - end_time = time.time() - - print(f"\nRAG Response (completed in {end_time - start_time:.2f} seconds):") - print("-" * 60) - print(rag_response) - -except Exception as e: - if "Payment Required" in str(e): - print("\nPayment required for Jina AI API.") - print("To resolve:") - print("• Visit https://jina.ai/reader/#pricing for subscription options") - print("• Ensure your API key is valid and has sufficient credits") - else: - print(f"Error: {str(e)}") -``` - - Testing RAG System with GSI-Optimized Vector Search - ============================================================ - User Query: What are the new eligibility rules for transgender women competing in leading women's golf tours, and what prompted these changes? - - Processing with optimized pipeline... - 1. Converting query to vector embedding with Jina AI - 2. Searching GSI BHIVE index for relevant documents (optimized) - 3. Assembling context from retrieved documents - 4. Generating intelligent response with JinaChat - - RAG Response (completed in 4.25 seconds): - ------------------------------------------------------------ - The new eligibility rules for transgender women competing in leading women's golf tours starting from 2025 prevent transgender women who have gone through male puberty from participating. Female players protesting led to these changes, as they called for policies to prevent those recorded as male at birth from competing in women's events. - - -### Multiple Query RAG Demo - -Test the RAG system with various queries to demonstrate the benefits of our optimized vector search. 
- - -```python -print("\nTesting Optimized RAG System with Multiple Queries") -print("=" * 55) - -try: - test_queries = [ - "What happened in the car incident on Shaftesbury Avenue in London?", - "What did King Charles talk about in his recent Christmas speech?", - ] - - for i, query in enumerate(test_queries, 1): - print(f"\n--- RAG Query {i} ---") - print(f"Question: {query}") - - start_time = time.time() - response = rag_chain.invoke(query) - end_time = time.time() - - print(f"Response (completed in {end_time - start_time:.2f} seconds): {response}") - -except Exception as e: - if "Payment Required" in str(e): - print("Payment required for Jina AI API.") - else: - print(f"Error: {str(e)}") - -print(f"\n✅ RAG demo completed successfully!") -print("✅ The system leverages GSI BHIVE optimization for fast document retrieval!") -print("✅ Jina AI provides high-quality embeddings and intelligent response generation!") -``` - - - Testing Optimized RAG System with Multiple Queries - ======================================================= - - --- RAG Query 1 --- - Question: What happened in the car incident on Shaftesbury Avenue in London? - Response (completed in 3.32 seconds): ### Answer: - A 31-year-old man was arrested on suspicion of attempted murder after driving a car on the wrong side of the road in Shaftesbury Avenue, London, injuring four pedestrians. The incident was treated as an isolated incident and was not terror-related. - - --- RAG Query 2 --- - Question: What did King Charles talk about in his recent Christmas speech? - Response (completed in 0.74 seconds): ### King Charles's Recent Christmas Speech Highlights: - - - Visited a Christmas market at Battersea Power Station. - - Met with Apple chief Tim Cook at Apple's UK headquarters. - - Interacted with carol singers, Christmas shoppers, and stallholders. - - Explored the power station and visited stalls at the Curated Makers Market. - - ✅ RAG demo completed successfully! 
- ✅ The system leverages GSI BHIVE optimization for fast document retrieval! - ✅ Jina AI provides high-quality embeddings and intelligent response generation! - - -## Conclusion - -You've successfully built a high-performance semantic search engine combining: -- **Couchbase GSI BHIVE indexes** for optimized vector search -- **Jina AI embeddings and language models** for intelligent processing -- **Complete RAG pipeline** with caching optimization diff --git a/tutorial/markdown/generated/vector-search-cookbook/jinaai-query_based-RAG_with_Couchbase_and_Jina_AI.md b/tutorial/markdown/generated/vector-search-cookbook/jinaai-query_based-RAG_with_Couchbase_and_Jina_AI.md deleted file mode 100644 index 43e8f86..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/jinaai-query_based-RAG_with_Couchbase_and_Jina_AI.md +++ /dev/null @@ -1,927 +0,0 @@ ---- -# frontmatter -path: "/tutorial-jina-couchbase-rag-with-hyperscale-or-composite-vector-index" -title: RAG with Jina AI using Couchbase Hyperscale and Composite Vector Index -short_title: RAG with Jina AI using Couchbase Hyperscale and Composite Vector Index -description: - - Learn how to build a semantic search engine using Couchbase and Jina. - - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with Jina embeddings and language models. - - You'll understand how to perform Retrieval-Augmented Generation (RAG) using LangChain, Couchbase Hyperscale and Composite Vector Index. 
-content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - Hyperscale Vector Index - - Composite Vector Index - - Artificial Intelligence - - LangChain - - Jina AI -sdk_language: - - python -length: 60 Mins -alt_paths: ["/tutorial-jina-couchbase-rag-with-hyperscale-vector-index", "/tutorial-jina-couchbase-rag-with-composite-vector-index"] ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/jinaai/query_based/RAG_with_Couchbase_and_Jina_AI.ipynb) - -## Introduction - -This tutorial demonstrates building a high-performance semantic search engine using Couchbase's Hyperscale and Composite indexes with Jina AI for embeddings and language models. We'll show measurable performance improvements with Hyperscale and Composite indexes optimization and implement a complete RAG (Retrieval-Augmented Generation) system. For deep dive on the working of these indexes refer to the following [documentation](https://docs.couchbase.com/server/current/vector-index/use-vector-indexes.html). Alternatively if you want to perform semantic search using the Search Vector Index, please take a look at [this.](https://developer.couchbase.com/tutorial-jina-couchbase-rag-with-search-vector-index) - -**Key Features:** -- High-performance vector search using Hyperscale/Composite indexes -- Jina AI embeddings and language models -- Performance benchmarks showing Hyperscale/Composite index benefits -- Complete RAG workflow with caching optimization - -**Requirements:** Couchbase Server 8.0+ or Capella with Query Service enabled. - -## How to Run This Tutorial - -This tutorial is available as a Jupyter Notebook that you can run interactively on [Google Colab](https://colab.research.google.com/) or locally by setting up the Python environment. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/jinaai/query_based/RAG_with_Couchbase_and_Jina_AI.ipynb). 
- -## Prerequisites - -### System Requirements - -- **Couchbase Server 8.0+** or Couchbase Capella -- **Query Service enabled** (required for both Hyperscale and Composite Vector Indexes) -- **Jina AI API credentials** ([Get them here](https://jina.ai/)) -- **JinaChat API credentials** ([Get them here](https://chat.jina.ai/api)) - -### Couchbase Capella Setup - -1. **Create Account:** Deploy a [free tier cluster](https://cloud.couchbase.com/sign-up) -2. **Configure Access:** Set up database credentials and network security -3. **Enable Query Service:** Required for vector search functionality using Hyperscale and Composite vector indexes - -## Setup and Installation - -### Install Required Libraries - -Install the necessary packages for Couchbase vector search, Jina AI integration, and LangChain RAG capabilities. - - -```python -# JinaChat requires openai==0.27; newer openai releases are not supported -%pip install --quiet datasets==3.6.0 langchain-couchbase==0.5.0 langchain-community==0.3.24 openai==0.27 python-dotenv==1.1.0 ipywidgets -``` - - Note: you may need to restart the kernel to use updated packages. - - -### Import Required Modules - -Import libraries for Couchbase vector search, Jina AI models, and LangChain components.
- - -```python -import logging -import os -import time -from datetime import timedelta - -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.exceptions import CouchbaseException -from couchbase.management.buckets import CreateBucketSettings -from couchbase.options import ClusterOptions -from datasets import load_dataset -from dotenv import load_dotenv -from langchain_community.chat_models import JinaChat -from langchain_community.embeddings import JinaEmbeddings -from langchain_core.globals import set_llm_cache -from langchain_core.output_parsers import StrOutputParser -from langchain_core.prompts import ChatPromptTemplate -from langchain_core.runnables import RunnablePassthrough -from langchain_couchbase.cache import CouchbaseCache -from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore, DistanceStrategy, IndexType -``` - -### Configure Logging - -Set up logging to track progress and capture any errors during execution. - - -```python -logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True) - -# Reduce log noise from chatty third-party loggers -logging.getLogger('openai').setLevel(logging.WARNING) -logging.getLogger('httpx').setLevel(logging.WARNING) -``` - -### Environment Configuration - -Load environment variables for secure access to Jina AI and Couchbase services. Create a `.env` file with your credentials.
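For reference, a `.env` file for this tutorial could look like the sketch below — every value is a placeholder to replace with your own credentials and cluster details:

```
JINA_API_KEY=your-jina-api-key
JINACHAT_API_KEY=your-jinachat-api-key
CB_HOST=couchbases://your-cluster.cloud.couchbase.com
CB_USERNAME=your-username
CB_PASSWORD=your-password
CB_BUCKET_NAME=vector-search-testing
INDEX_NAME=vector_search_jina
SCOPE_NAME=shared
COLLECTION_NAME=jina
CACHE_COLLECTION=cache
```

Any variable you omit falls back to the defaults used in the loading code below.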
- - -```python -load_dotenv("./.env") - -JINA_API_KEY = os.getenv("JINA_API_KEY") -JINACHAT_API_KEY = os.getenv("JINACHAT_API_KEY") - -CB_HOST = os.getenv("CB_HOST") or 'couchbase://localhost' -CB_USERNAME = os.getenv("CB_USERNAME") or 'Administrator' -CB_PASSWORD = os.getenv("CB_PASSWORD") or 'password' -CB_BUCKET_NAME = os.getenv("CB_BUCKET_NAME") or 'vector-search-testing' -INDEX_NAME = os.getenv("INDEX_NAME") or 'vector_search_jina' - -SCOPE_NAME = os.getenv("SCOPE_NAME") or 'shared' -COLLECTION_NAME = os.getenv("COLLECTION_NAME") or 'jina' -CACHE_COLLECTION = os.getenv("CACHE_COLLECTION") or 'cache' - -# Check if the variables are correctly loaded -if not JINA_API_KEY: - raise ValueError("JINA_API_KEY environment variable is not set") -if not JINACHAT_API_KEY: - raise ValueError("JINACHAT_API_KEY environment variable is not set") -``` - -## Couchbase Connection Setup - -### Connect to Cluster - -Establish connection to Couchbase cluster for vector storage and retrieval operations. - - -```python -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to Couchbase") -except Exception as e: - raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}") -``` - - 2025-10-08 11:18:34,736 - INFO - Successfully connected to Couchbase - - -### Setup Collections - -The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase: - -1. Bucket Creation: - - Checks if specified bucket exists, creates it if not - - Sets bucket properties like RAM quota (1024MB) and replication (disabled) - - Note: You will not be able to create a bucket on Capella - -2. Scope Management: - - Verifies if requested scope exists within bucket - - Creates new scope if needed (unless it's the default "_default" scope) - -3. 
Collection Setup: - - Checks for collection existence within scope - - Creates collection if it doesn't exist - - Waits 2 seconds for collection to be ready - -Additional Tasks: -- Clears any existing documents for clean state - -The function is called twice to set up: -1. Main collection for vector embeddings -2. Cache collection for storing results - - - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' exists.") - except Exception as e: - logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...") - bucket_settings = CreateBucketSettings( - name=bucket_name, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - time.sleep(2) # Wait for bucket creation to complete and become available - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' created successfully.") - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == scope_name for scope in scopes) - - if not scope_exists and scope_name != "_default": - logging.info(f"Scope '{scope_name}' does not exist. Creating it...") - bucket_manager.create_scope(scope_name) - logging.info(f"Scope '{scope_name}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{collection_name}' does not exist. 
Creating it...") - bucket_manager.create_collection(scope_name, collection_name) - logging.info(f"Collection '{collection_name}' created successfully.") - else: - logging.info(f"Collection '{collection_name}' already exists. Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION) -``` - - 2025-10-08 11:18:36,208 - INFO - Bucket 'vector-search-testing' exists. - 2025-10-08 11:18:36,219 - INFO - Collection 'jina' already exists. Skipping creation. - 2025-10-08 11:18:38,322 - INFO - All documents cleared from the collection. - 2025-10-08 11:18:38,322 - INFO - Bucket 'vector-search-testing' exists. - 2025-10-08 11:18:38,327 - INFO - Collection 'jina_cache' already exists. Skipping creation. - 2025-10-08 11:18:40,480 - INFO - All documents cleared from the collection. - - - - - - - - - -## Document Processing and Vector Store - -### Create Jina Embeddings - -Set up Jina AI embeddings to convert text into high-dimensional vectors that capture semantic meaning for similarity search. 
- - -```python -try: - embeddings = JinaEmbeddings( - jina_api_key=JINA_API_KEY, model_name="jina-embeddings-v3" - ) - logging.info("Successfully created JinaEmbeddings") -except Exception as e: - raise ValueError(f"Error creating JinaEmbeddings: {str(e)}") -``` - - 2025-10-08 11:18:56,191 - INFO - Successfully created JinaEmbeddings - - -### Create Couchbase Vector Store - -Set up the Couchbase vector store which enables both Hyperscale and Composite Vector Indexes for high-performance vector storage and similarity search using Couchbase's Query Service. - - -```python -try: - vector_store = CouchbaseQueryVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding=embeddings, - distance_metric=DistanceStrategy.COSINE - ) - logging.info("Successfully created Couchbase vector store") -except Exception as e: - raise ValueError(f"Failed to create Couchbase vector store: {str(e)}") -``` - - 2025-10-08 11:18:57,341 - INFO - Successfully created Couchbase vector store - - -### Index Creation Timing - -**Important**: Hyperscale and Composite Vector Indexes must be created AFTER uploading vector data. The index creation process analyzes existing vectors to optimize search performance through clustering and quantization. - -### Load Dataset - -Load the BBC News dataset for real-world testing data with authentic news articles covering various topics. - - -```python -try: - news_dataset = load_dataset( - "RealTimeData/bbc_news_alltime", "2024-12", split="train" - ) - print(f"Loaded the BBC News dataset with {len(news_dataset)} rows") - logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.") -except Exception as e: - raise ValueError(f"Error loading the BBC News dataset: {str(e)}") -``` - - 2025-10-08 11:19:03,903 - INFO - Successfully loaded the BBC News dataset with 2687 rows. 
- - - Loaded the BBC News dataset with 2687 rows - - -#### Clean Data - -Remove duplicate articles to ensure clean search results. - - -```python -news_articles = news_dataset["content"] -unique_articles = set() -for article in news_articles: - if article: - unique_articles.add(article) -unique_news_articles = list(unique_articles) -print(f"We have {len(unique_news_articles)} unique articles in our database.") -``` - - We have 1749 unique articles in our database. - - -#### Store Data - -Process articles in batches and store them in the vector database with embeddings. We'll use 60% of the dataset for faster processing while maintaining good search quality. - - -```python -# Calculate 60% of the dataset size and round to nearest integer -dataset_size = len(unique_news_articles) -subset_size = round(dataset_size * 0.6) - -# Filter articles by length and create subset -filtered_articles = [article for article in unique_news_articles[:subset_size] - if article and len(article) <= 50000] - -# Process in batches -batch_size = 50 - -try: - vector_store.add_texts( - texts=filtered_articles, - batch_size=batch_size - ) - logging.info("Document ingestion completed successfully") - -except CouchbaseException as e: - logging.error(f"Couchbase error during ingestion: {str(e)}") - raise RuntimeError(f"Error performing document ingestion: {str(e)}") -except Exception as e: - if "Payment Required" in str(e): - logging.error("Payment required for Jina AI API. Please check your subscription status and API key.") - print("To resolve this error:") - print("1. Visit 'https://jina.ai/reader/#pricing' to review subscription options") - print("2. Ensure your API key is valid and has sufficient credits") - print("3. 
Consider upgrading your subscription plan if needed") - else: - logging.error(f"Unexpected error during ingestion: {str(e)}") - raise RuntimeError(f"Failed to save documents to vector store: {str(e)}") -``` - - 2025-10-08 11:20:18,363 - INFO - Document ingestion completed successfully - - -## Vector Search Performance Testing - -Now let's demonstrate the performance benefits of Hyperscale/Composite Vector Index by testing pure vector search performance. We'll compare three optimization levels: - -1. **Baseline Performance**: Vector search without Hyperscale/Composite Vector Index optimization -2. **Hyperscale/Composite Vector Index-Optimized Performance**: Same search with Hyperscale/Composite Vector Index -3. **Cache Benefits**: Show how caching can be applied on top of Hyperscale/Composite Vector Index for repeated queries - -### Vector Index Types Overview - -Before we start testing, let's understand the index types available: - -**Hyperscale Vector Indexes:** -- **Best for**: Pure vector searches - content discovery, recommendations, semantic search -- **Performance**: High performance with low memory footprint, designed to scale to billions of vectors -- **Optimization**: Optimized for concurrent operations, supports simultaneous searches and inserts -- **Use when**: You primarily perform vector-only queries without complex scalar filtering -- **Ideal for**: Large-scale semantic search, recommendation systems, content discovery - -**Composite Vector Indexes:** -- **Best for**: Filtered vector searches that combine vector search with scalar value filtering -- **Performance**: Efficient pre-filtering where scalar attributes reduce the vector comparison scope -- **Use when**: Your queries combine vector similarity with scalar filters that eliminate large portions of data -- **Ideal for**: Compliance-based filtering, user-specific searches, time-bounded queries -- **Note**: Scalar filters take precedence over vector similarity - -**Choosing the Right Index Type:** 
-- Start with Hyperscale Vector Index for pure vector searches and large datasets -- Use Composite Vector Index when scalar filters significantly reduce your search space -- Consider your dataset size: Hyperscale scales to billions, Composite works well for tens of millions to billions - -For this tutorial, we'll use a **Hyperscale Vector Index**, as it's optimized for pure semantic search scenarios. - -### Index Configuration Details - -The `index_description` parameter controls how Couchbase optimizes vector storage and search performance through centroids and quantization: - -**Format**: `IVF[<centroids>],{SQ<bits>|PQ<subquantizers>x<bits>}` - -#### **IVF (Inverted File Index) - Centroids Configuration** -- **Purpose**: Controls how the dataset is subdivided into clusters for faster searches -- **Trade-offs**: More centroids = faster searches but slower training time -- **Auto-selection**: If omitted (e.g., `IVF,SQ8`), Couchbase automatically selects the optimal number based on dataset size -- **Manual setting**: Specify exact count (e.g., `IVF1000,SQ8` for 1000 centroids) - -#### **Quantization Options - Vector Compression** - -**SQ (Scalar Quantization)** -- **Purpose**: Compresses vectors by reducing precision of individual components -- **Settings**: `SQ4`, `SQ6`, `SQ8` (4-bit, 6-bit, 8-bit precision) -- **Trade-off**: Lower bits = more compression but less precision -- **Best for**: General-purpose applications where some precision loss is acceptable - -**PQ (Product Quantization)** -- **Purpose**: Advanced compression using subquantizers for better precision -- **Format**: `PQ<subquantizers>x<bits>` (e.g., `PQ32x8` = 32 subquantizers of 8 bits each) -- **Trade-off**: More complex but often better precision than SQ at similar compression ratios -- **Best for**: Applications requiring high precision with significant compression - -#### **Common Configuration Examples** - -``` -IVF,SQ8 # Auto-selected centroids with 8-bit scalar quantization (recommended default) -IVF1000,SQ6 # 1000 centroids with 6-bit scalar
quantization (higher compression) -IVF,PQ32x8 # Auto-selected centroids with 32 subquantizers of 8 bits each -IVF500,PQ16x4 # 500 centroids with 16 subquantizers of 4 bits each (high compression) -``` - -#### **Performance Considerations** - -**Distance Interpretation**: In vector search using Hyperscale and Composite vector indexes, lower distance values indicate higher similarity, while higher distance values indicate lower similarity. - -**Scalability**: Hyperscale/Composite vector indexes can scale to billions of vectors with optimized concurrent operations, making them suitable for large-scale production deployments. - -For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/cloud/vector-index/hyperscale-vector-index.html#algo_settings). - -For more information on Hyperscale and Composite vector indexes, see [Couchbase Hyperscale and Composite Vector Index Documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html). - -### Vector Search Test Function - - -```python -def test_vector_search_performance(vector_store, query, label="Vector Search"): - """Test pure vector search performance and return timing metrics""" - print(f"\n[{label}] Testing vector search performance") - print(f"[{label}] Query: '{query}'") - - start_time = time.time() - - try: - results = vector_store.similarity_search_with_score(query, k=3) - end_time = time.time() - search_time = end_time - start_time - - print(f"[{label}] Vector search completed in {search_time:.4f} seconds") - print(f"[{label}] Found {len(results)} documents") - - if results: - doc, distance = results[0] - print(f"[{label}] Top result distance: {distance:.6f} (lower = more similar)") - preview = doc.page_content[:100] + "..." 
if len(doc.page_content) > 100 else doc.page_content - print(f"[{label}] Top result preview: {preview}") - - return search_time - except Exception as e: - print(f"[{label}] Vector search failed: {str(e)}") - return None -``` - -### Test 1: Baseline Performance (No Hyperscale/Composite Vector Index) - -Test pure vector search performance without Hyperscale/Composite Vector Index optimization. - - -```python -# Test baseline vector search performance without Hyperscale/Composite Vector Index -test_query = "What was manchester city manager pep guardiola's reaction to the team's current form?" -print("Testing baseline vector search performance without Hyperscale/Composite Vector Index optimization...") -baseline_time = test_vector_search_performance(vector_store, test_query, "Baseline Search") -print(f"\nBaseline vector search time (without Hyperscale/Composite Vector Index): {baseline_time:.4f} seconds\n") -``` - - Testing baseline vector search performance without Hyperscale/Composite Vector Index optimization... - - [Baseline Search] Testing vector search performance - [Baseline Search] Query: 'What was manchester city manager pep guardiola's reaction to the team's current form?' - [Baseline Search] Vector search completed in 0.8305 seconds - [Baseline Search] Found 3 documents - [Baseline Search] Top result distance: 0.457932 (lower = more similar) - [Baseline Search] Top result preview: 'Promised change, but Juventus are back in crisis' - - "We have entirely changed the way we think about... - - Baseline vector search time (without Hyperscale/Composite Vector Index): 0.8305 seconds - - - -### Create Hyperscale Vector Index - -Now let's create a Hyperscale vector index to enable high-performance vector searches. The index creation is done programmatically through the vector store. 
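The `IVF,SQ8` description we pass below combines IVF clustering with 8-bit scalar quantization. As a rough intuition for what SQ8 compression does to each vector component, here is an illustrative, self-contained sketch — not Couchbase's actual codec, which trains per-dimension ranges:

```python
def sq8_quantize(vec, lo=-1.0, hi=1.0):
    """Map each float in [lo, hi] to one byte (0..255).

    Illustrative only: we simply clamp to a fixed [lo, hi] interval.
    """
    scale = (hi - lo) / 255.0
    return [round((min(max(x, lo), hi) - lo) / scale) for x in vec]

def sq8_dequantize(codes, lo=-1.0, hi=1.0):
    """Approximately reconstruct the original floats from the byte codes."""
    scale = (hi - lo) / 255.0
    return [lo + c * scale for c in codes]

vec = [0.12, -0.53, 0.98, -1.0]
codes = sq8_quantize(vec)       # 4 bytes instead of 16 (4 x float32)
approx = sq8_dequantize(codes)  # reconstruction error bounded by scale / 2
print(codes)
```

Each stored component shrinks from 4 bytes to 1, which is why SQ8 is a sensible default: a 4x size reduction with a small, bounded reconstruction error.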
- - -```python -# Create Hyperscale Vector Index for high-performance searches -print("Creating Hyperscale vector index...") -try: - vector_store.create_index( - index_type=IndexType.HYPERSCALE, # Use IndexType.COMPOSITE for Composite index - index_description="IVF,SQ8" - ) - print("Hyperscale vector index created successfully") - - # Wait for index to become available - print("Waiting for index to become available...") - time.sleep(5) - -except Exception as e: - if "already exists" in str(e).lower(): - print("Hyperscale vector index already exists, proceeding...") - else: - print(f"Error creating Hyperscale vector index: {str(e)}") -``` - - Creating Hyperscale vector index... - Hyperscale vector index created successfully - Waiting for index to become available... - - -### Alternative: Composite Index Configuration - -If your use case requires complex filtering with scalar attributes, you can create a **Composite index** instead by changing the configuration above: - -```python -# Alternative: Create a Composite index for filtered searches -vector_store.create_index( - index_type=IndexType.COMPOSITE, # Instead of IndexType.HYPERSCALE - index_description="IVF,SQ8" # Same quantization settings -) -``` - -### Test 2: Hyperscale vector indexes Optimized Performance - -Test the same vector search with Hyperscale optimization. - - -```python -# Test vector search performance with Hyperscale index -hyperscale_test_query = "What happened in the latest Premier League matches?" -print("Testing vector search performance with Hyperscale optimization...") -hyperscale_time = test_vector_search_performance(vector_store, hyperscale_test_query, "Hyperscale-Optimized Search") -``` - - Testing vector search performance with Hyperscale optimization... - - [Hyperscale-Optimized Search] Testing vector search performance - [Hyperscale-Optimized Search] Query: 'What happened in the latest Premier League matches?' 
- [Hyperscale-Optimized Search] Vector search completed in 0.6452 seconds - [Hyperscale-Optimized Search] Found 3 documents - [Hyperscale-Optimized Search] Top result distance: 0.394714 (lower = more similar) - [Hyperscale-Optimized Search] Top result preview: The latest updates and analysis from the BBC. - - -### Test 3: Cache Benefits Testing - -Now let's demonstrate how caching can improve performance for repeated queries. **Note**: Caching benefits apply to both baseline and Hyperscale-optimized searches. - - -```python -# Set up Couchbase cache (can be applied to any search approach) -print("Setting up Couchbase cache for improved performance on repeated queries...") -cache = CouchbaseCache( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, -) -set_llm_cache(cache) -print("✓ Couchbase cache enabled!") -``` - - Setting up Couchbase cache for improved performance on repeated queries... - ✓ Couchbase cache enabled! - - - -```python -# Test cache benefits with a different query to avoid interference -cache_test_query = "What are the latest football transfer developments?" - -print("Testing cache benefits with vector search...") -print("First execution (cache miss):") -cache_time_1 = test_vector_search_performance(vector_store, cache_test_query, "Cache Test - First Run") - -print("\nSecond execution (cache hit - should be faster):") -cache_time_2 = test_vector_search_performance(vector_store, cache_test_query, "Cache Test - Second Run") -``` - - Testing cache benefits with vector search... - First execution (cache miss): - - [Cache Test - First Run] Testing vector search performance - [Cache Test - First Run] Query: 'What are the latest football transfer developments?' 
- [Cache Test - First Run] Vector search completed in 0.9695 seconds - [Cache Test - First Run] Found 3 documents - [Cache Test - First Run] Top result distance: 0.394020 (lower = more similar) - [Cache Test - First Run] Top result preview: The latest updates and analysis from the BBC. - - Second execution (cache hit - should be faster): - - [Cache Test - Second Run] Testing vector search performance - [Cache Test - Second Run] Query: 'What are the latest football transfer developments?' - [Cache Test - Second Run] Vector search completed in 0.5252 seconds - [Cache Test - Second Run] Found 3 documents - [Cache Test - Second Run] Top result distance: 0.394020 (lower = more similar) - [Cache Test - Second Run] Top result preview: The latest updates and analysis from the BBC. - - -### Vector Search Performance Analysis - -Let's analyze the vector search performance improvements across all optimization levels: - - -```python -print("\n" + "="*80) -print("VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY") -print("="*80) - -print(f"Phase 1 - Baseline Search (No Hyperscale): {baseline_time:.4f} seconds") -print(f"Phase 2 - Hyperscale-Optimized Search: {hyperscale_time:.4f} seconds") -if cache_time_1 and cache_time_2: - print(f"Phase 3 - Cache Benefits:") - print(f" First execution (cache miss): {cache_time_1:.4f} seconds") - print(f" Second execution (cache hit): {cache_time_2:.4f} seconds") - -print("\n" + "-"*80) -print("VECTOR SEARCH OPTIMIZATION IMPACT:") -print("-"*80) - -# Hyperscale improvement analysis -if baseline_time and hyperscale_time: - speedup = baseline_time / hyperscale_time if hyperscale_time > 0 else float('inf') - time_saved = baseline_time - hyperscale_time - percent_improvement = (time_saved / baseline_time) * 100 - print(f"Hyperscale Index Benefit: {speedup:.2f}x faster ({percent_improvement:.1f}% improvement)") - -# Cache improvement analysis -if cache_time_1 and cache_time_2 and cache_time_2 < cache_time_1: - cache_speedup = cache_time_1 / 
cache_time_2 - cache_improvement = ((cache_time_1 - cache_time_2) / cache_time_1) * 100 - print(f"Cache Benefit: {cache_speedup:.2f}x faster ({cache_improvement:.1f}% improvement)") -else: - print(f"Cache Benefit: Variable (depends on query complexity and caching mechanism)") - -print(f"\nKey Insights for Vector Search Performance:") -print(f"• Hyperscale indexes provide significant performance improvements for vector similarity search") -print(f"• Performance gains are most dramatic for complex semantic queries") -print(f"• Hyperscale vector index optimization is particularly effective for high-dimensional embeddings") -print(f"• Combined with proper quantization (SQ8), Hyperscale delivers production-ready performance") -print(f"• These performance improvements directly benefit any application using the vector store") -``` - - - ================================================================================ - VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY - ================================================================================ - Phase 1 - Baseline Search (No Hyperscale): 0.8305 seconds - Phase 2 - Hyperscale-Optimized Search: 0.6452 seconds - Phase 3 - Cache Benefits: - First execution (cache miss): 0.9695 seconds - Second execution (cache hit): 0.5252 seconds - - -------------------------------------------------------------------------------- - VECTOR SEARCH OPTIMIZATION IMPACT: - -------------------------------------------------------------------------------- - Hyperscale Index Benefit: 1.29x faster (22.3% improvement) - Cache Benefit: 1.85x faster (45.8% improvement) - - Key Insights for Vector Search Performance: - • Hyperscale indexes provide significant performance improvements for vector similarity search - • Performance gains are most dramatic for complex semantic queries - • Hyperscale vector index optimization is particularly effective for high-dimensional embeddings - • Combined with proper quantization (SQ8), Hyperscale delivers 
production-ready performance - • These performance improvements directly benefit any application using the vector store - - -## Jina AI RAG Demo - -### What is RAG (Retrieval-Augmented Generation)? - -Now that we've optimized our vector search performance, let's demonstrate how to build a complete RAG system using Jina AI. RAG combines the power of our Hyperscale/Composite index optimized semantic search with language model generation: - -1. **Query Processing**: User question is converted to vector embedding using Jina AI -2. **Document Retrieval**: Hyperscale/Composite index finds most relevant documents (now with proven performance improvements) -3. **Context Assembly**: Retrieved documents provide factual context for the language model -4. **Response Generation**: Jina's language model generates intelligent answers grounded in the retrieved data - -This demo shows how the vector search performance improvements we validated directly enhance the RAG workflow efficiency. - -### Create Jina Language Model - -Initialize Jina's chat model for generating intelligent responses based on our Hyperscale/Composite optimized retrieval system. - - -```python -print("Setting up Jina AI language model for RAG demo...") -try: - llm = JinaChat(temperature=0.1, jinachat_api_key=JINACHAT_API_KEY) - print("✓ JinaChat language model created successfully") - logging.info("Successfully created JinaChat") -except Exception as e: - print(f"✗ Error creating JinaChat: {str(e)}") - print("Please check your JINACHAT_API_KEY and network connection.") - raise -``` - - 2025-10-08 11:24:30,099 - INFO - Successfully created JinaChat - - - Setting up Jina AI language model for RAG demo... - ✓ JinaChat language model created successfully - - -### Build Optimized RAG Pipeline - -Create the complete RAG pipeline that integrates our Hyperscale/Composite optimized vector search with Jina's language model. 
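Under the hood, the "context" slot in the prompt is simply the retrieved documents rendered as text. A plain-Python stand-in for that assembly step (independent of LangChain; the function name and dict shape are ours for illustration) looks like this:

```python
def assemble_context(docs, max_chars=2000):
    """Join retrieved document texts into one context string,
    truncating so the prompt stays within the model's input budget."""
    parts, used = [], 0
    for doc in docs:
        text = doc["page_content"].strip()
        remaining = max_chars - used
        if remaining <= 0:
            break
        parts.append(text[:remaining])
        used += len(text[:remaining])
    return "\n\n---\n\n".join(parts)

# Hypothetical retrieved documents:
docs = [
    {"page_content": "Guardiola urged calm after the defeat."},
    {"page_content": "City have won one of their last seven games."},
]
print(assemble_context(docs))
```

The real chain delegates this step to LangChain's retriever-to-prompt plumbing; keeping the retrieved set small (`k=2` here) is what bounds both latency and context size.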
- - -```python -try: - # Create RAG prompt template for structured responses - template = """You are a helpful assistant that answers questions based on the provided context. - If you cannot answer based on the context provided, respond with a generic answer. - Answer the question as truthfully as possible using the context below: - - Context: - {context} - - Question: {question} - - Answer:""" - - prompt = ChatPromptTemplate.from_template(template) - - # Build the RAG chain: Hyperscale/Composite optimized Retrieval → Context → Generation → Output - rag_chain = ( - { - "context": vector_store.as_retriever(search_kwargs={"k": 2}), - "question": RunnablePassthrough() - } - | prompt - | llm - | StrOutputParser() - ) - print("Optimized RAG pipeline created successfully") - print("Components: Hyperscale/Composite Vector Search → Context Assembly → Jina Language Model → Response") -except Exception as e: - raise ValueError(f"Error creating RAG pipeline: {str(e)}") -``` - - Optimized RAG pipeline created successfully - Components: Hyperscale/Composite Vector Search → Context Assembly → Jina Language Model → Response - - -### RAG Demo with Optimized Search - -Test the complete RAG system leveraging our Hyperscale/Composite performance optimizations. - - -```python -print("Testing RAG System with Hyperscale/Composite-Optimized Vector Search") -print("=" * 60) - -try: - # Test with a specific query - sample_query = "What are the new eligibility rules for transgender women competing in leading women's golf tours, and what prompted these changes?" - print(f"User Query: {sample_query}") - print("\nProcessing with optimized pipeline...") - print("1. Converting query to vector embedding with Jina AI") - print("2. Searching Hyperscale/Composite vector index for relevant documents (optimized)") - print("3. Assembling context from retrieved documents") - print("4. 
Generating intelligent response with JinaChat") - - start_time = time.time() - rag_response = rag_chain.invoke(sample_query) - end_time = time.time() - - print(f"\nRAG Response (completed in {end_time - start_time:.2f} seconds):") - print("-" * 60) - print(rag_response) - -except Exception as e: - if "Payment Required" in str(e): - print("\nPayment required for Jina AI API.") - print("To resolve:") - print("• Visit https://jina.ai/reader/#pricing for subscription options") - print("• Ensure your API key is valid and has sufficient credits") - else: - print(f"Error: {str(e)}") -``` - - Testing RAG System with Hyperscale/Composite-Optimized Vector Search - ============================================================ - User Query: What are the new eligibility rules for transgender women competing in leading women's golf tours, and what prompted these changes? - - Processing with optimized pipeline... - 1. Converting query to vector embedding with Jina AI - 2. Searching Hyperscale/Composite vector index for relevant documents (optimized) - 3. Assembling context from retrieved documents - 4. Generating intelligent response with JinaChat - - RAG Response (completed in 4.25 seconds): - ------------------------------------------------------------ - The new eligibility rules for transgender women competing in leading women's golf tours starting from 2025 prevent transgender women who have gone through male puberty from participating. Female players protesting led to these changes, as they called for policies to prevent those recorded as male at birth from competing in women's events. - - -### Multiple Query RAG Demo - -Test the RAG system with various queries to demonstrate the benefits of our optimized vector search. 
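With several queries in flight, aggregate latency figures are easier to read than individual prints. A small standard-library helper for summarizing per-query timings (the function name is ours, not part of the tutorial):

```python
import statistics

def summarize_latencies(timings):
    """Summarize a list of per-query latencies (in seconds)."""
    return {
        "queries": len(timings),
        "min_s": min(timings),
        "mean_s": statistics.mean(timings),
        "max_s": max(timings),
    }

# Hypothetical latencies for three RAG queries:
print(summarize_latencies([3.32, 0.74, 1.10]))
```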
- - -```python -print("\nTesting Optimized RAG System with Multiple Queries") -print("=" * 55) - -try: - test_queries = [ - "What happened in the car incident on Shaftesbury Avenue in London?", - "What did King Charles talk about in his recent Christmas speech?", - ] - - for i, query in enumerate(test_queries, 1): - print(f"\n--- RAG Query {i} ---") - print(f"Question: {query}") - - start_time = time.time() - response = rag_chain.invoke(query) - end_time = time.time() - - print(f"Response (completed in {end_time - start_time:.2f} seconds): {response}") - -except Exception as e: - if "Payment Required" in str(e): - print("Payment required for Jina AI API.") - else: - print(f"Error: {str(e)}") - -print(f"\n✅ RAG demo completed successfully!") -print("✅ The system leverages Hyperscale/Composite vector index optimization for fast document retrieval!") -print("✅ Jina AI provides high-quality embeddings and intelligent response generation!") -``` - - - Testing Optimized RAG System with Multiple Queries - ======================================================= - - --- RAG Query 1 --- - Question: What happened in the car incident on Shaftesbury Avenue in London? - Response (completed in 3.32 seconds): ### Answer: - A 31-year-old man was arrested on suspicion of attempted murder after driving a car on the wrong side of the road in Shaftesbury Avenue, London, injuring four pedestrians. The incident was treated as an isolated incident and was not terror-related. - - --- RAG Query 2 --- - Question: What did King Charles talk about in his recent Christmas speech? - Response (completed in 0.74 seconds): ### King Charles's Recent Christmas Speech Highlights: - - - Visited a Christmas market at Battersea Power Station. - - Met with Apple chief Tim Cook at Apple's UK headquarters. - - Interacted with carol singers, Christmas shoppers, and stallholders. - - Explored the power station and visited stalls at the Curated Makers Market. - - ✅ RAG demo completed successfully! 
- ✅ The system leverages Hyperscale/Composite vector index optimization for fast document retrieval! - ✅ Jina AI provides high-quality embeddings and intelligent response generation! - - -## Conclusion - -You've successfully built a high-performance semantic search engine combining: -- **Couchbase Hyperscale/Composite indexes** for optimized vector search -- **Jina AI embeddings and language models** for intelligent processing -- **Complete RAG pipeline** with caching optimization diff --git a/tutorial/markdown/generated/vector-search-cookbook/jinaai-search_based-RAG_with_Couchbase_and_Jina_AI.md b/tutorial/markdown/generated/vector-search-cookbook/jinaai-search_based-RAG_with_Couchbase_and_Jina_AI.md deleted file mode 100644 index f959e82..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/jinaai-search_based-RAG_with_Couchbase_and_Jina_AI.md +++ /dev/null @@ -1,774 +0,0 @@ ---- -# frontmatter -path: "/tutorial-jina-couchbase-rag-with-search-vector-index" -title: RAG with Jina AI using Couchbase Search Vector Index -short_title: RAG with Couchbase Search Vector Index and Jina -description: - - Learn how to build a semantic search engine using Couchbase and Jina. - - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with Jina embeddings and language models. - - You'll understand how to perform Retrieval-Augmented Generation (RAG) using LangChain and Couchbase using Search Vector Index. 
-content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - Search Vector Index - - Artificial Intelligence - - LangChain - - Jina AI -sdk_language: - - python -length: 60 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/jinaai/search_based/RAG_with_Couchbase_and_Jina_AI.ipynb) - -# Introduction -In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database and [Jina](https://jina.ai/) as the AI-powered embedding and language model provider, utilizing Full-Text Search using search vector index. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system from scratch. Alternatively if you want to perform semantic search using Hyperscale or Composite indexes, please take a look at [this.](https://developer.couchbase.com/tutorial-jina-couchbase-rag-with-hyperscale-or-composite-vector-index) - -# How to run this tutorial - -This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/jinaai/search_based/RAG_with_Couchbase_and_Jina_AI.ipynb). - -You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment. - -# Before you start - -## Get Credentials for Jina AI - -* Please follow the [instructions](https://jina.ai/) to generate the Jina AI credentials. -* Please follow the [instructions](https://chat.jina.ai/api) to generate the JinaChat credentials. 
-
-## Create and Deploy Your Free Tier Operational Cluster on Capella
-
-To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.
-
-To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).
-
-### Couchbase Capella Configuration
-
-When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.
-
-* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the required bucket (Read and Write) used in the application.
-* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.
-
-# Setting the Stage: Installing Necessary Libraries
-To build our semantic search engine, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks. Each library has a specific role: Couchbase libraries manage database operations, LangChain handles AI model integrations, and Jina provides advanced AI models for generating embeddings and understanding natural language. By setting up these libraries, we ensure our environment is equipped to handle the data-intensive and computationally complex tasks required for semantic search.
-
-
-```python
-# Jina doesn't support openai versions other than 0.27
-%pip install --quiet datasets==3.6.0 langchain-couchbase==0.3.0 langchain-community==0.3.24 openai==0.27 python-dotenv==1.1.0 ipywidgets
-```
-
-# Importing Necessary Libraries
-The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading.
These libraries provide essential functions for working with data, managing database connections, and processing machine learning models.
-
-
-```python
-import getpass
-import json
-import logging
-import os
-import time
-from datetime import timedelta
-
-from couchbase.auth import PasswordAuthenticator
-from couchbase.cluster import Cluster
-from couchbase.exceptions import (CouchbaseException,
-                                  InternalServerFailureException,
-                                  QueryIndexAlreadyExistsException,
-                                  ServiceUnavailableException)
-from couchbase.management.buckets import CreateBucketSettings
-from couchbase.management.search import SearchIndex
-from couchbase.options import ClusterOptions
-from datasets import load_dataset
-from dotenv import load_dotenv
-from langchain_community.chat_models import JinaChat
-from langchain_community.embeddings import JinaEmbeddings
-from langchain_core.globals import set_llm_cache
-from langchain_core.output_parsers import StrOutputParser
-from langchain_core.prompts import ChatPromptTemplate
-from langchain_core.runnables import RunnablePassthrough
-from langchain_couchbase.cache import CouchbaseCache
-from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore
-```
-
-# Setup Logging
-Logging is configured to track the progress of the script and capture any errors or warnings. This is crucial for debugging and understanding the flow of execution. The logging output includes timestamps, log levels (e.g., INFO, ERROR), and messages that describe what is happening in the script.
-
-
-
-```python
-logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s',force=True)
-
-# Suppress all logs from specific loggers
-logging.getLogger('openai').setLevel(logging.WARNING)
-logging.getLogger('httpx').setLevel(logging.WARNING)
-```
-
-# Loading Sensitive Information
-In this section, we load the essential configuration settings needed for integrating Couchbase with Jina's API. These settings include sensitive information like API keys, database credentials, and specific configuration names. Instead of hardcoding these details into the script, we read them from environment variables at runtime, ensuring flexibility and security.
-
-The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code.
-
-
-```python
-load_dotenv("./.env")
-
-JINA_API_KEY = os.getenv("JINA_API_KEY")
-JINACHAT_API_KEY = os.getenv("JINACHAT_API_KEY")
-
-CB_HOST = os.getenv("CB_HOST") or 'couchbase://localhost'
-CB_USERNAME = os.getenv("CB_USERNAME") or 'Administrator'
-CB_PASSWORD = os.getenv("CB_PASSWORD") or 'password'
-CB_BUCKET_NAME = os.getenv("CB_BUCKET_NAME") or 'vector-search-testing'
-INDEX_NAME = os.getenv("INDEX_NAME") or 'vector_search_jina'
-
-SCOPE_NAME = os.getenv("SCOPE_NAME") or 'shared'
-COLLECTION_NAME = os.getenv("COLLECTION_NAME") or 'jina'
-CACHE_COLLECTION = os.getenv("CACHE_COLLECTION") or 'cache'
-
-# Check if the variables are correctly loaded
-if not JINA_API_KEY:
-    raise ValueError("JINA_API_KEY environment variable is not set")
-if not JINACHAT_API_KEY:
-    raise ValueError("JINACHAT_API_KEY environment variable is not set")
-```
-
-# Connecting to the Couchbase Cluster
-Connecting to a Couchbase cluster is the foundation of our project.
Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount. - - - - -```python -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to Couchbase") -except Exception as e: - raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}") -``` - - 2025-09-23 10:45:51,014 - INFO - Successfully connected to Couchbase - - -## Setting Up Collections in Couchbase - -The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase: - -1. Bucket Creation: - - Checks if specified bucket exists, creates it if not - - Sets bucket properties like RAM quota (1024MB) and replication (disabled) - - Note: You will not be able to create a bucket on Capella - -2. Scope Management: - - Verifies if requested scope exists within bucket - - Creates new scope if needed (unless it's the default "_default" scope) - -3. Collection Setup: - - Checks for collection existence within scope - - Creates collection if it doesn't exist - - Waits 2 seconds for collection to be ready - -Additional Tasks: -- Creates primary index on collection for query performance -- Clears any existing documents for clean state -- Implements comprehensive error handling and logging - -The function is called twice to set up: -1. Main collection for vector embeddings -2. 
Cache collection for storing results - - - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' exists.") - except Exception as e: - logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...") - bucket_settings = CreateBucketSettings( - name=bucket_name, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - time.sleep(2) # Wait for bucket creation to complete and become available - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' created successfully.") - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == scope_name for scope in scopes) - - if not scope_exists and scope_name != "_default": - logging.info(f"Scope '{scope_name}' does not exist. Creating it...") - bucket_manager.create_scope(scope_name) - logging.info(f"Scope '{scope_name}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{collection_name}' does not exist. Creating it...") - bucket_manager.create_collection(scope_name, collection_name) - logging.info(f"Collection '{collection_name}' created successfully.") - else: - logging.info(f"Collection '{collection_name}' already exists. 
Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Ensure primary index exists - try: - cluster.query(f"CREATE PRIMARY INDEX IF NOT EXISTS ON `{bucket_name}`.`{scope_name}`.`{collection_name}`").execute() - logging.info("Primary index present or created successfully.") - except Exception as e: - logging.warning(f"Error creating primary index: {str(e)}") - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION) - -``` - - 2025-09-23 10:45:56,608 - INFO - Bucket 'vector-search-testing' exists. - 2025-09-23 10:45:59,312 - INFO - Collection 'jina' already exists. Skipping creation. - 2025-09-23 10:46:02,683 - INFO - Primary index present or created successfully. - 2025-09-23 10:46:03,447 - INFO - All documents cleared from the collection. - 2025-09-23 10:46:03,449 - INFO - Bucket 'vector-search-testing' exists. - 2025-09-23 10:46:06,152 - INFO - Collection 'jina_cache' already exists. Skipping creation. - 2025-09-23 10:46:09,482 - INFO - Primary index present or created successfully. - 2025-09-23 10:46:09,804 - INFO - All documents cleared from the collection. - - - - - - - - - -# Loading Couchbase Search Vector Index - -Semantic search requires an efficient way to retrieve relevant documents based on a user's query. This is where the Couchbase **Search Vector Index** comes into play. 
In this step, we load the Search Vector Index definition from a JSON file, which specifies how the index should be structured. This includes the fields to be indexed, the dimensions of the vectors, and other parameters that determine how the search engine processes queries based on vector similarity. - -This Jina Search Vector Index configuration requires specific default settings to function properly. This tutorial uses the bucket named `vector-search-testing` with the scope `shared` and collection `jina`. The configuration is set up for vectors with exactly `1024 dimensions`, using dot product similarity and optimized for recall. If you want to use a different bucket, scope, or collection, you will need to modify the index configuration accordingly. - -For more information on creating a Search Vector Index, please follow the [instructions](https://docs.couchbase.com/cloud/vector-search/create-vector-search-index-ui.html). - - - -```python -# If you are running this script locally (not in Google Colab), uncomment the following line -# and provide the path to your index definition file. 
- -# index_definition_path = '/path_to_your_index_file/jina_index.json' # Local setup: specify your file path here - -# # Version for Google Colab -# def load_index_definition_colab(): -# from google.colab import files -# print("Upload your index definition file") -# uploaded = files.upload() -# index_definition_path = list(uploaded.keys())[0] - -# try: -# with open(index_definition_path, 'r') as file: -# index_definition = json.load(file) -# return index_definition -# except Exception as e: -# raise ValueError(f"Error loading index definition from {index_definition_path}: {str(e)}") - -# Version for Local Environment -def load_index_definition_local(index_definition_path): - try: - with open(index_definition_path, 'r') as file: - index_definition = json.load(file) - return index_definition - except Exception as e: - raise ValueError(f"Error loading index definition from {index_definition_path}: {str(e)}") - -# Usage -# Uncomment the appropriate line based on your environment -# index_definition = load_index_definition_colab() -index_definition = load_index_definition_local('jina_index.json') -``` - -# Creating or Updating Search Indexes - -With the index definition loaded, the next step is to create or update the **Vector Search Index** in Couchbase. This step is crucial because it optimizes our database for vector similarity search operations, allowing us to perform searches based on the semantic content of documents rather than just keywords. By creating or updating a Vector Search Index, we enable our search engine to handle complex queries that involve finding semantically similar documents using vector embeddings, which is essential for a robust semantic search engine. 
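For reference, the loaded index definition is a JSON document of roughly the following shape. This is an illustrative sketch only: the scoped type `shared.jina`, the field names `embedding` and `text`, and the exact settings shown here are assumptions based on this tutorial's configuration; the `jina_index.json` file in the cookbook repository is the authoritative version.

```python
import json

# Illustrative sketch of a Couchbase Search vector index definition.
# The scoped type "shared.jina" and the field names "embedding"/"text"
# are assumptions; take the real values from the cookbook's jina_index.json.
index_definition = {
    "name": "vector_search_jina",
    "type": "fulltext-index",
    "sourceType": "gocb.couchbase",
    "sourceName": "vector-search-testing",
    "params": {
        "doc_config": {"mode": "scope.collection.type_field"},
        "mapping": {
            "types": {
                "shared.jina": {
                    "enabled": True,
                    "properties": {
                        "embedding": {
                            "fields": [{
                                "name": "embedding",
                                "type": "vector",
                                "dims": 1024,  # must match the embedding model
                                "similarity": "dot_product",
                                "index": True,
                                "vector_index_optimized_for": "recall",
                            }]
                        },
                        "text": {
                            "fields": [{
                                "name": "text",
                                "type": "text",
                                "index": True,
                                "store": True,
                            }]
                        },
                    },
                }
            }
        },
    },
}

vector_field = (index_definition["params"]["mapping"]["types"]
                ["shared.jina"]["properties"]["embedding"]["fields"][0])
print(json.dumps(vector_field, indent=2))
```

If you change the bucket, scope, or collection names, the scoped type and the `sourceName` in the definition must change accordingly, as noted earlier.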
- - -```python -try: - scope_index_manager = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME).search_indexes() - - # Check if index already exists - existing_indexes = scope_index_manager.get_all_indexes() - index_name = index_definition["name"] - - if index_name in [index.name for index in existing_indexes]: - logging.info(f"Index '{index_name}' found") - else: - logging.info(f"Creating new index '{index_name}'...") - - # Create SearchIndex object from JSON definition - search_index = SearchIndex.from_json(index_definition) - - # Upsert the index (create if not exists, update if exists) - scope_index_manager.upsert_index(search_index) - logging.info(f"Index '{index_name}' successfully created/updated.") - -except QueryIndexAlreadyExistsException: - logging.info(f"Index '{index_name}' already exists. Skipping creation/update.") -except ServiceUnavailableException: - raise RuntimeError("Search service is not available. Please ensure the Search service is enabled in your Couchbase cluster.") -except InternalServerFailureException as e: - logging.error(f"Internal server error: {str(e)}") - raise -``` - - 2025-09-23 10:47:03,763 - INFO - Index 'vector_search_jina' found - 2025-09-23 10:47:04,742 - INFO - Index 'vector_search_jina' already exists. Skipping creation/update. - - -# Creating Jina Embeddings -Embeddings are at the heart of semantic search. They are numerical representations of text that capture the semantic meaning of the words and phrases. Unlike traditional keyword-based search, which looks for exact matches, embeddings allow our search engine to understand the context and nuances of language, enabling it to retrieve documents that are semantically similar to the query, even if they don't contain the exact keywords. By creating embeddings using Jina, we equip our search engine with the ability to understand and process natural language in a way that's much closer to how humans understand language. 
This step transforms our raw text data into a format that the search engine can use to find and rank relevant documents. - - - - -```python -try: - embeddings = JinaEmbeddings( - jina_api_key=JINA_API_KEY, model_name="jina-embeddings-v3" - ) - logging.info("Successfully created JinaEmbeddings") -except Exception as e: - raise ValueError(f"Error creating JinaEmbeddings: {str(e)}") -``` - - 2025-09-23 10:47:06,326 - INFO - Successfully created JinaEmbeddings - - -# Setting Up the Couchbase Vector Store -A vector store is where we'll keep our embeddings. Unlike the search vector index, which is used for text-based search, the vector store is specifically designed to handle embeddings and perform similarity searches. When a user inputs a query, the search engine converts the query into an embedding and compares it against the embeddings stored in the vector store. This allows the engine to find documents that are semantically similar to the query, even if they don't contain the exact same words. By setting up the vector store in Couchbase, we create a powerful tool that enables our search engine to understand and retrieve information based on the meaning and context of the query, rather than just the specific words used. - - -```python -try: - vector_store = CouchbaseSearchVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding=embeddings, - index_name=INDEX_NAME, - ) - logging.info("Successfully created vector store") -except Exception as e: - raise ValueError(f"Failed to create vector store: {str(e)}") - -``` - - 2025-09-23 10:47:12,343 - INFO - Successfully created vector store - - -# Load the BBC News Dataset -To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. 
Loading the dataset is a crucial step because it provides the raw material our search engine will work with. The quality and diversity of authentic BBC articles make the dataset a realistic test bed, ensuring the search engine can handle real-world news content. The dataset is loaded with the Hugging Face datasets library, accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version.
-
-
-```python
-try:
-    news_dataset = load_dataset(
-        "RealTimeData/bbc_news_alltime", "2024-12", split="train"
-    )
-    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
-    logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.")
-except Exception as e:
-    raise ValueError(f"Error loading the BBC News dataset: {str(e)}")
-```
-
-    2025-09-23 10:47:18,035 - INFO - Successfully loaded the BBC News dataset with 2687 rows.
-
-
-    Loaded the BBC News dataset with 2687 rows
-
-
-## Cleaning up the Data
-We will use the content of the news articles for our RAG system.
-
-The dataset contains a few duplicate records. We remove them to avoid duplicate results in the retrieval stage of our RAG system.
-
-
-```python
-news_articles = news_dataset["content"]
-unique_articles = set()
-for article in news_articles:
-    if article:
-        unique_articles.add(article)
-unique_news_articles = list(unique_articles)
-print(f"We have {len(unique_news_articles)} unique articles in our database.")
-```
-
-    We have 1749 unique articles in our database.
-
-
-## Saving Data to the Vector Store
-To efficiently handle the large number of articles, we process them in batches. This batch processing approach helps manage memory usage and provides better control over the ingestion process. 
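Conceptually, the batching works like the sketch below. This is purely illustrative: the vector store's `add_texts` performs the equivalent splitting internally when given a `batch_size`, and the placeholder article strings here are stand-ins for the real documents.

```python
def iter_batches(items, batch_size):
    """Yield successive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Stand-in data: 1749 placeholder articles, mirroring the unique-article count
articles = [f"article-{i}" for i in range(1749)]
batches = list(iter_batches(articles, 50))

print(f"{len(batches)} batches; first has {len(batches[0])}, last has {len(batches[-1])}")
# -> 35 batches; first has 50, last has 49
```

Only the final batch is short; every article is processed exactly once.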
- -We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's add_texts method, we add the filtered articles to our vector database. The batch_size parameter controls how many articles are processed in each iteration. - -This approach offers several benefits: -1. Memory Efficiency: Processing in smaller batches prevents memory overload -2. Error Handling: If an error occurs, only the current batch is affected -3. Progress Tracking: Easier to monitor and track the ingestion progress -4. Resource Management: Better control over CPU and network resource utilization - -We use a conservative batch size of 50 to ensure reliable operation. -The optimal batch size depends on many factors including: -- Document sizes being inserted -- Available system resources -- Network conditions -- Concurrent workload - -Consider measuring performance with your specific workload before adjusting. - - - -```python -# Calculate 60% of the dataset size and round to nearest integer -dataset_size = len(unique_news_articles) -subset_size = round(dataset_size * 0.6) - -# Filter articles by length and create subset -filtered_articles = [article for article in unique_news_articles[:subset_size] - if article and len(article) <= 50000] - -# Process in batches -batch_size = 50 - -try: - vector_store.add_texts( - texts=filtered_articles, - batch_size=batch_size - ) - logging.info("Document ingestion completed successfully") - -except CouchbaseException as e: - logging.error(f"Couchbase error during ingestion: {str(e)}") - raise RuntimeError(f"Error performing document ingestion: {str(e)}") -except Exception as e: - if "Payment Required" in str(e): - logging.error("Payment required for Jina AI API. Please check your subscription status and API key.") - print("To resolve this error:") - print("1. Visit 'https://jina.ai/reader/#pricing' to review subscription options") - print("2. 
Ensure your API key is valid and has sufficient credits") - print("3. Consider upgrading your subscription plan if needed") - else: - logging.error(f"Unexpected error during ingestion: {str(e)}") - raise RuntimeError(f"Failed to save documents to vector store: {str(e)}") -``` - - 2025-09-23 10:50:03,866 - INFO - Document ingestion completed successfully - - -# Setting Up a Couchbase Cache -To further optimize our system, we set up a Couchbase-based cache. A cache is a temporary storage layer that holds data that is frequently accessed, speeding up operations by reducing the need to repeatedly retrieve the same information from the database. In our setup, the cache will help us accelerate repetitive tasks, such as looking up similar documents. By implementing a cache, we enhance the overall performance of our search engine, ensuring that it can handle high query volumes and deliver results quickly. - -Caching is particularly valuable in scenarios where users may submit similar queries multiple times or where certain pieces of information are frequently requested. By storing these in a cache, we can significantly reduce the time it takes to respond to these queries, improving the user experience. - - - -```python -try: - cache = CouchbaseCache( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=CACHE_COLLECTION, - ) - logging.info("Successfully created cache") - set_llm_cache(cache) -except Exception as e: - raise ValueError(f"Failed to create cache: {str(e)}") -``` - - 2025-09-23 10:50:21,526 - INFO - Successfully created cache - - -# Creating the Jina Language Model (LLM) -Language models are AI systems that are trained to understand and generate human language. We'll be using Jina's language model to process user queries and generate meaningful responses. This model is a key component of our semantic search engine, allowing it to go beyond simple keyword matching and truly understand the intent behind a query. 
By creating this language model, we equip our search engine with the ability to interpret complex queries, understand the nuances of language, and provide more accurate and contextually relevant responses. - -The language model's ability to understand context and generate coherent responses is what makes our search engine truly intelligent. It can not only find the right information but also present it in a way that is useful and understandable to the user. - - - - -```python -try: - llm = JinaChat(temperature=0.1, jinachat_api_key=JINACHAT_API_KEY) - logging.info("Successfully created JinaChat") -except Exception as e: - logging.error(f"Error creating JinaChat: {str(e)}. Please check your API key and network connection.") - raise -``` - - 2025-09-23 10:50:22,466 - INFO - Successfully created JinaChat - - -## Perform Semantic Search -Semantic search in Couchbase involves converting queries and documents into vector representations using an embeddings model. These vectors capture the semantic meaning of the text and are stored directly in Couchbase. When a query is made, Couchbase performs a similarity search by comparing the query vector against the stored document vectors. The similarity metric used for this comparison is configurable, allowing flexibility in how the relevance of documents is determined. - -In the provided code, the search process begins by recording the start time, followed by executing the similarity_search_with_score method of the CouchbaseSearchVectorStore. This method searches Couchbase for the most relevant documents based on the vector similarity to the query. The search results include the document content and a similarity score that reflects how closely each document aligns with the query in the defined semantic space. The time taken to perform this search is then calculated and logged, and the results are displayed, showing the most relevant documents along with their similarity scores. 
This approach leverages Couchbase as both a storage and retrieval engine for vector data, enabling efficient and scalable semantic searches. The integration of vector storage and search capabilities within Couchbase allows for sophisticated semantic search operations without relying on external services for vector storage or comparison. - -### Note on Retry Mechanism -The search implementation includes a retry mechanism to handle rate limiting and API errors gracefully. If a rate limit error (HTTP 429) is encountered, the system will automatically retry the request up to 3 times with exponential backoff, waiting 2 seconds initially and doubling the wait time between each retry. This helps manage API usage limits while maintaining service reliability. For other types of errors, such as payment requirements or general failures, appropriate error messages and troubleshooting steps are provided to help diagnose and resolve the issue. - - -```python -def perform_semantic_search(query, vector_store, max_retries=3, retry_delay=2): - for attempt in range(max_retries): - try: - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=5) - search_elapsed_time = time.time() - start_time - - logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds") - return search_results, search_elapsed_time - - except Exception as e: - error_str = str(e) - - # Check if it's a rate limit error (HTTP 429) - if "http_status: 429" in error_str or "query request rejected" in error_str: - logging.warning(f"Rate limit hit (attempt {attempt+1}/{max_retries}). Waiting {retry_delay} seconds...") - time.sleep(retry_delay) - retry_delay *= 2 # Exponential backoff - - if attempt == max_retries - 1: - logging.error("Maximum retry attempts reached. API rate limit exceeded.") - raise RuntimeError("API rate limit exceeded. 
Please try again later or check your subscription.") - else: - # For other errors, don't retry - logging.error(f"Search error: {error_str}") - if "Payment Required" in error_str: - raise RuntimeError("Payment required for Jina AI API. Please check your subscription status and API key.") - else: - raise RuntimeError(f"Search failed: {error_str}") - -try: - query = "What was manchester city manager pep guardiola's reaction to the team's current form?" - search_results, search_elapsed_time = perform_semantic_search(query, vector_store) - - # Display search results - print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):") - print("-"*80) - for doc, score in search_results: - print(f"Score: {score:.4f}, Text: {doc.page_content}") - print("-"*80) - -except RuntimeError as e: - print(f"Error: {str(e)}") - print("\nTroubleshooting steps:") - if "API rate limit" in str(e): - print("1. Wait a few minutes before trying again") - print("2. Reduce the frequency of your requests") - print("3. Consider upgrading your Jina AI plan for higher rate limits") - elif "Payment required" in str(e): - print("1. Visit 'https://jina.ai/reader/#pricing' to review subscription options") - print("2. Ensure your API key is valid and has sufficient credits") - print("3. Update your API key configuration") - else: - print("1. Check your network connection") - print("2. Verify your Couchbase and Jina configurations") - print("3. Review the vector store implementation for any bugs") -``` - - 2025-09-23 10:50:25,678 - INFO - Semantic search completed in 2.13 seconds - - - - Semantic Search Results (completed in 2.13 seconds): - -------------------------------------------------------------------------------- - Score: 0.6798, Text: 'Self-doubt, errors & big changes' - inside the crisis at Man City - - Pep Guardiola has not been through a moment like this in his managerial career. 
Manchester City have lost nine matches in their past 12 - as many defeats as they had suffered in their previous 106 fixtures. At the end of October, City were still unbeaten at the top of the Premier League and favourites to win a fifth successive title. Now they are seventh, 12 points behind leaders Liverpool having played a game more. It has been an incredible fall from grace and left people trying to work out what has happened - and whether Guardiola can make it right. After discussing the situation with those who know him best, I have taken a closer look at the future - both short and long term - and how the current crisis at Man City is going to be solved. - - Pep Guardiola's Man City have lost nine of their past 12 matches - - Guardiola has also been giving it a lot of thought. He has not been sleeping very well, as he has said, and has not been himself at times when talking to the media. He has been talking to a lot of people about what is going on as he tries to work out the reasons for City's demise. Some reasons he knows, others he still doesn't. What people perhaps do not realise is Guardiola hugely doubts himself and always has. He will be thinking "I'm not going to be able to get us out of this" and needs the support of people close to him to push away those insecurities - and he has that. He is protected by his people who are very aware, like he is, that there are a lot of people that want City to fail. It has been a turbulent time for Guardiola. Remember those marks he had on his head after the 3-3 draw with Feyenoord in the Champions League? He always scratches his head, it is a gesture of nervousness. Normally nothing happens but on that day one of his nails was far too sharp so, after talking to the players in the changing room where he scratched his head because of his usual agitated gesturing, he went to the news conference. 
His right-hand man Manel Estiarte sent him photos in a message saying "what have you got on your head?", but by the time Guardiola returned to the coaching room there was hardly anything there again. He started that day with a cover on his nose after the same thing happened at the training ground the day before. Guardiola was having a footballing debate with Kyle Walker about positional stuff and marked his nose with that same nail. There was also that remarkable news conference after the Manchester derby when he said "I don't know what to do". That is partly true and partly not true. Ignore the fact Guardiola suggested he was "not good enough". He actually meant he was not good enough to resolve the situation with the group of players he has available and with all the other current difficulties. There are obviously logical explanations for the crisis and the first one has been talked about many times - the absence of injured midfielder Rodri. You know the game Jenga? When you take the wrong piece out, the whole tower collapses. That is what has happened here. It is normal for teams to have an over-reliance on one player if he is the best in the world in his position. And you cannot calculate the consequences of an injury that rules someone like Rodri out for the season. City are a team, like many modern ones, in which the holding midfielder is a key element to the construction. So, when you take Rodri out, it is difficult to hold it together. There were Plan Bs - John Stones, Manuel Akanji, even Nathan Ake - but injuries struck. The big injury list has been out of the ordinary and the busy calendar has also played a part in compounding the issues. However, one factor even Guardiola cannot explain is the big uncharacteristic errors in almost every game from international players. Why did Matheus Nunes make that challenge to give away the penalty against Manchester United? Jack Grealish is sent on at the end to keep the ball and cannot do that. 
There are errors from Walker and other defenders. These are some of the best players in the world. Of course the players' mindset is important, and confidence is diminishing. Wrong decisions get taken so there is almost panic on the pitch instead of calm. There are also players badly out of form who are having to play because of injuries. Walker is now unable to hide behind his pace, I'm not sure Kevin de Bruyne is ever getting back to the level he used to be at, Bernardo Silva and Ilkay Gundogan do not have time to rest, Grealish is not playing at his best. Some of these players were only meant to be playing one game a week but, because of injuries, have played 12 games in 40 days. It all has a domino effect. One consequence is that Erling Haaland isn't getting the service to score. But the Norwegian still remains City's top-scorer with 13. Defender Josko Gvardiol is next on the list with just four. The way their form has been analysed inside the City camp is there have only been three games where they deserved to lose (Liverpool, Bournemouth and Aston Villa). But of course it is time to change the dynamic. - - Guardiola has never protected his players so much. He has not criticised them and is not going to do so. They have won everything with him. Instead of doing more with them, he has tried doing less. He has sometimes given them more days off to clear their heads, so they can reset - two days this week for instance. Perhaps the time to change a team is when you are winning, but no-one was suggesting Man City were about to collapse when they were top and unbeaten after nine league games. Some people have asked how bad it has to get before City make a decision on Guardiola. The answer is that there is no decision to be made. Maybe if this was Real Madrid, Barcelona or Juventus, the pressure from outside would be massive and the argument would be made that Guardiola has to go. At City he has won the lot, so how can anyone say he is failing? Yes, this is a crisis. 
But given all their problems, City's renewed target is finishing in the top four. That is what is in all their heads now. The idea is to recover their essence by improving defensive concepts that are not there and re-establishing the intensity they are known for. Guardiola is planning to use the next two years of his contract, which is expected to be his last as a club manager, to prepare a new Manchester City. When he was at the end of his four years at Barcelona, he asked two managers what to do when you feel people are not responding to your instructions. Do you go or do the players go? Sir Alex Ferguson and Rafael Benitez both told him that the players need to go. Guardiola did not listen because of his emotional attachment to his players back then and he decided to leave the Camp Nou because he felt the cycle was over. He will still protect his players now but there is not the same emotional attachment - so it is the players who are going to leave this time. It is likely City will look to replace five or six regular starters. Guardiola knows it is the end of an era and the start of a new one. Changes will not be immediate and the majority of the work will be done in the summer. But they are open to any opportunities in January - and a holding midfielder is one thing they need. In the summer City might want to get Spain's Martin Zubimendi from Real Sociedad and they know 60m euros (£50m) will get him. He said no to Liverpool last summer even though everything was agreed, but he now wants to move on and the Premier League is the target. Even if they do not get Zubimendi, that is the calibre of footballer they are after. A new Manchester City is on its way - with changes driven by Guardiola, incoming sporting director Hugo Viana and the football department. 
- -------------------------------------------------------------------------------- - Score: 0.6795, Text: 'Self-doubt, errors & big changes' - inside the crisis at Man City - - Pep Guardiola has not been through a moment like this in his managerial career. Manchester City have lost nine matches in their past 12 - as many defeats as they had suffered in their previous 106 fixtures. At the end of October, City were still unbeaten at the top of the Premier League and favourites to win a fifth successive title. Now they are seventh, 12 points behind leaders Liverpool having played a game more. It has been an incredible fall from grace and left people trying to work out what has happened - and whether Guardiola can make it right. After discussing the situation with those who know him best, I have taken a closer look at the future - both short and long term - and how the current crisis at Man City is going to be solved. - - Pep Guardiola's Man City have lost nine of their past 12 matches - - Guardiola has also been giving it a lot of thought. He has not been sleeping very well, as he has said, and has not been himself at times when talking to the media. He has been talking to a lot of people about what is going on as he tries to work out the reasons for City's demise. Some reasons he knows, others he still doesn't. What people perhaps do not realise is Guardiola hugely doubts himself and always has. He will be thinking "I'm not going to be able to get us out of this" and needs the support of people close to him to push away those insecurities - and he has that. He is protected by his people who are very aware, like he is, that there are a lot of people that want City to fail. It has been a turbulent time for Guardiola. Remember those marks he had on his head after the 3-3 draw with Feyenoord in the Champions League? He always scratches his head, it is a gesture of nervousness. 
[... duplicate article text truncated for brevity; it repeats the first result above ...] 
Even if they do not get Zubimendi, that is the calibre of footballer they are after. A new Manchester City is on its way - with changes driven by Guardiola, incoming sporting director Hugo Viana and the football department. - -------------------------------------------------------------------------------- - Score: 0.6207, Text: Manchester City boss Pep Guardiola has won 18 trophies since he arrived at the club in 2016 - - - ... (output truncated for brevity) - - -# Retrieval-Augmented Generation (RAG) with Couchbase and LangChain -Couchbase and LangChain can be seamlessly integrated to create RAG (Retrieval-Augmented Generation) chains, enhancing the process of generating contextually relevant responses. In this setup, Couchbase serves as the vector store, where embeddings of documents are stored. When a query is made, LangChain retrieves the most relevant documents from Couchbase by comparing the query’s embedding with the stored document embeddings. These documents, which provide contextual information, are then passed to a generative language model within LangChain. - -The language model, equipped with the context from the retrieved documents, generates a response that is both informed and contextually accurate. This integration allows the RAG chain to leverage Couchbase’s efficient storage and retrieval capabilities, while LangChain handles the generation of responses based on the context provided by the retrieved documents. Together, they create a powerful system that can deliver highly relevant and accurate answers by combining the strengths of both retrieval and generation. - - -```python -try: - template = """You are a helpful bot. If you cannot answer based on the context provided, respond with a generic answer. 
Answer the question as truthfully as possible using the context below: - {context} - - Question: {question}""" - prompt = ChatPromptTemplate.from_template(template) - - rag_chain = ( - {"context": vector_store.as_retriever(search_kwargs={"k": 2}), "question": RunnablePassthrough()} - | prompt - | llm - | StrOutputParser() - ) - logging.info("Successfully created RAG chain") -except Exception as e: - raise ValueError(f"Error creating RAG chain: {str(e)}") -``` - - 2025-09-23 10:50:26,937 - INFO - Successfully created RAG chain - - - -```python -try: - # Create chain with k=2 - # Start with k=4 and gradually reduce if token limit exceeded - # k=4 -> k=3 -> k=2 based on token limit warnings - # Final k=2 produced valid response about Guardiola in 2.33 seconds - current_chain = ( - { - "context": vector_store.as_retriever(search_kwargs={"k": 2}), - "question": RunnablePassthrough() - } - | prompt - | llm - | StrOutputParser() - ) - - # Try to get response - start_time = time.time() - rag_response = current_chain.invoke(query) - elapsed_time = time.time() - start_time - - logging.info(f"RAG response generated in {elapsed_time:.2f} seconds using k=2") - print(f"RAG Response: {rag_response}") - print(f"Response generated in {elapsed_time:.2f} seconds") - -except Exception as e: - if "Payment Required" in str(e): - logging.error("Payment required for Jina AI API. Please check your subscription status and API key.") - print("To resolve this error:") - print("1. Visit 'https://jina.ai/reader/#pricing' to review subscription options") - print("2. Ensure your API key is valid and has sufficient credits") - print("3. Consider upgrading your subscription plan if needed") - else: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - 2025-09-23 10:50:47,733 - INFO - RAG response generated in 17.23 seconds using k=2 - - - RAG Response: Pep Guardiola has been grappling with self-doubt and seeking support to navigate Manchester City's current crisis. 
- Response generated in 17.23 seconds - - -# Using Couchbase as a caching mechanism -Couchbase can be effectively used as a caching mechanism for RAG (Retrieval-Augmented Generation) responses by storing and retrieving precomputed results for specific queries. This approach enhances the system's efficiency and speed, particularly when dealing with repeated or similar queries. When a query is first processed, the RAG chain retrieves relevant documents, generates a response using the language model, and then stores this response in Couchbase, with the query serving as the key. - -For subsequent requests with the same query, the system checks Couchbase first. If a cached response is found, it is retrieved directly from Couchbase, bypassing the need to re-run the entire RAG process. This significantly reduces response time because the computationally expensive steps of document retrieval and response generation are skipped. Couchbase's role in this setup is to provide a fast and scalable storage solution for caching these responses, ensuring that frequently asked queries can be answered more quickly and efficiently. - - -```python -try: - queries = [ - "What happened in the match between Fulham and Liverpool?", - "What was manchester city manager pep guardiola's reaction to the team's current form?", # Repeated query - "What happened in the match between Fulham and Liverpool?", # Repeated query - ] - - for i, query in enumerate(queries, 1): - print(f"\nQuery {i}: {query}") - start_time = time.time() - response = rag_chain.invoke(query) - elapsed_time = time.time() - start_time - print(f"Response: {response}") - - print(f"Time taken: {elapsed_time:.2f} seconds") -except Exception as e: - if "Payment Required" in str(e): - logging.error("Payment required for Jina AI API. Please check your subscription status and API key.") - print("To resolve this error:") - print("1. Visit 'https://jina.ai/reader/#pricing' to review subscription options") - print("2. 
Ensure your API key is valid and has sufficient credits") - print("3. Consider upgrading your subscription plan if needed") - else: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - - Query 1: What happened in the match between Fulham and Liverpool? - Response: Fulham and Liverpool played to a 2-2 draw at Anfield, with both teams showcasing strong performances. - Time taken: 5.13 seconds - - Query 2: What was manchester city manager pep guardiola's reaction to the team's current form? - Response: Pep Guardiola has been grappling with self-doubt and seeking support to navigate Manchester City's current crisis. - Time taken: 2.16 seconds - - Query 3: What happened in the match between Fulham and Liverpool? - Response: Fulham and Liverpool played to a 2-2 draw at Anfield, with both teams showcasing strong performances. - Time taken: 1.95 seconds - - -## Conclusion -By following these steps, you’ll have a fully functional semantic search engine that leverages the strengths of Couchbase and Jina. This guide is designed not just to show you how to build the system, but also to explain why each step is necessary, giving you a deeper understanding of the principles behind semantic search and how to implement it effectively. Whether you’re a newcomer to software development or an experienced developer looking to expand your skills, this guide will provide you with the knowledge and tools you need to create a powerful, AI-driven search engine. 
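As a closing footnote on the caching section above, the query-keyed pattern it describes can also be made explicit. The sketch below is illustrative rather than part of the tutorial: the `cached_rag_invoke` helper, its key scheme, and the `cache_collection` handle (any Couchbase `Collection`, for example a dedicated cache collection) are assumptions, and it presumes a `rag_chain` like the one built earlier.

```python
# Illustrative sketch of query-keyed RAG response caching with the Couchbase
# Python SDK. The helper name, key scheme, and TTL are assumptions, and
# cache_collection is any Collection handle you dedicate to cached answers.
import hashlib
from datetime import timedelta

def cached_rag_invoke(query, rag_chain, cache_collection, ttl_seconds=3600):
    # Derive a stable document key from the query text
    key = "rag-cache::" + hashlib.sha256(query.encode("utf-8")).hexdigest()
    try:
        return cache_collection.get(key).content_as[str]  # cache hit
    except Exception:
        pass  # cache miss (e.g. DocumentNotFoundException) -- fall through
    response = rag_chain.invoke(query)  # full retrieval + generation
    cache_collection.upsert(key, response,
                            expiry=timedelta(seconds=ttl_seconds))
    return response
```

On a cache hit the retrieval and generation steps are skipped entirely, which is the effect visible in the timing drop between the first and repeated queries in the run above.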
diff --git a/tutorial/markdown/generated/vector-search-cookbook/lamaindex-fts-RAG_with_Couchbase_Capella_and_OpenAI.md b/tutorial/markdown/generated/vector-search-cookbook/lamaindex-fts-RAG_with_Couchbase_Capella_and_OpenAI.md deleted file mode 100644 index 35b8d9c..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/lamaindex-fts-RAG_with_Couchbase_Capella_and_OpenAI.md +++ /dev/null @@ -1,505 +0,0 @@ ---- -# frontmatter -path: "/tutorial-openai-llamaindex-rag-with-fts" -title: "Retrieval-Augmented Generation (RAG) with OpenAI, LlamaIndex and Couchbase Search Vector Index" -short_title: "RAG with OpenAI, LlamaIndex and Couchbase Search Vector Index" -description: - - Learn how to build a semantic search engine using Couchbase's Search Vector Index. - - This tutorial demonstrates how to integrate Couchbase's Search vector index capabilities with the embeddings generated by OpenAI Services. - - You will understand how to perform Retrieval-Augmented Generation (RAG) using LlamaIndex, Couchbase and OpenAI services. -content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - OpenAI - - Artificial Intelligence - - LlamaIndex - - FTS -sdk_language: - - python -length: 60 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/lamaindex/fts/RAG_with_Couchbase_Capella_and_OpenAI.ipynb) - -# Introduction - -In this guide, we will walk you through building a Retrieval Augmented Generation (RAG) application using Couchbase Capella as the database and OpenAI's [gpt-4o](https://platform.openai.com/docs/models/gpt-4o) model as the large language model. We will use the [text-embedding-3-large](https://platform.openai.com/docs/guides/embeddings/embedding-models) model for generating embeddings. 
- -This notebook demonstrates how to build a RAG system using: -- The [BBC News dataset](https://huggingface.co/datasets/RealTimeData/bbc_news_alltime) containing news articles -- Couchbase Capella as the vector store with FTS (Full Text Search) for vector index creation -- LlamaIndex framework for the RAG pipeline -- OpenAI for embeddings and text generation - -We leverage Couchbase's Full Text Search (FTS) service to create and manage vector indexes, enabling efficient semantic search capabilities. FTS provides the infrastructure for storing, indexing, and querying high-dimensional vector embeddings alongside traditional text search functionality. - -Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial will equip you with the knowledge to create a fully functional RAG system using OpenAI Services and LlamaIndex. - -# Before you start - -## Create and Deploy Your Operational cluster on Capella - -To get started with Couchbase Capella, create an account and use it to deploy an operational cluster. - -To know more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html). - -### Couchbase Capella Configuration - -When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met: - -* Have a multi-node Capella cluster running the Data, Query, Index, and Search services. -* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the bucket (Read and Write) used in the application. -* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running. 
- -### OpenAI Models Setup - -In order to create the RAG application, we need an embedding model to ingest the documents for Vector Search and a large language model (LLM) for generating the responses based on the context. - -For this implementation, we'll use OpenAI's models which provide state-of-the-art performance for both embeddings and text generation: - -**Embedding Model**: We'll use OpenAI's `text-embedding-3-large` model, which provides high-quality embeddings with 3,072 dimensions for semantic search capabilities. - -**Large Language Model**: We'll use OpenAI's `gpt-4o` model for generating responses based on the retrieved context. This model offers excellent reasoning capabilities and can handle complex queries effectively. - -**Prerequisites for OpenAI Integration**: -* Create an OpenAI account at [platform.openai.com](https://platform.openai.com) -* Generate an API key from your OpenAI dashboard -* Ensure you have sufficient credits or a valid payment method set up -* Set up your API key as an environment variable or input it securely in the notebook - -For more details about OpenAI's models and pricing, please refer to the [OpenAI documentation](https://platform.openai.com/docs/models). - - -# Installing Necessary Libraries -To build our RAG system, we need a set of libraries. The libraries we install handle everything from connecting to databases to performing AI tasks. Each library has a specific role: Couchbase libraries manage database operations, LlamaIndex handles AI model integrations, and we will use the OpenAI SDK for generating embeddings and calling OpenAI's language models. 
- - - -```python -# Install required packages -%pip install datasets llama-index-vector-stores-couchbase==0.6.0 llama-index-embeddings-openai==0.5.1 llama-index-llms-openai==0.5.6 llama-index==0.14.2 -``` - -# Importing Necessary Libraries -The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. - - - -```python -import getpass -import base64 -import logging -import sys -import time -from datetime import timedelta - -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.exceptions import CouchbaseException -from couchbase.options import ClusterOptions - -from datasets import load_dataset - -from llama_index.core import Settings, Document -from llama_index.core.ingestion import IngestionPipeline -from llama_index.core.node_parser import SentenceSplitter -from llama_index.vector_stores.couchbase import CouchbaseSearchVectorStore -from llama_index.embeddings.openai import OpenAIEmbedding -from llama_index.llms.openai import OpenAI - -``` - -# Loading Sensitive Information -In this section, we prompt the user to input essential configuration settings needed. These settings include sensitive information like database credentials, collection names, and API keys. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security. - -The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code. - -**OPENAI_API_KEY** is your OpenAI API key which can be obtained from your OpenAI dashboard at [platform.openai.com](https://platform.openai.com/api-keys). 
- -**INDEX_NAME** is the name of the FTS search index we will use for vector search operations. - - -```python -CB_CONNECTION_STRING = input("Couchbase Cluster URL (default: localhost): ") or "localhost" -CB_USERNAME = input("Couchbase Username (default: admin): ") or "admin" -CB_PASSWORD = getpass.getpass("Couchbase password (default: Password@12345): ") or "Password@12345" -CB_BUCKET_NAME = input("Couchbase Bucket: ") -SCOPE_NAME = input("Couchbase Scope: ") -COLLECTION_NAME = input("Couchbase Collection: ") -INDEX_NAME = input("Vector Search Index: ") -OPENAI_API_KEY = getpass.getpass("OpenAI API Key: ") - -# Check if the variables are correctly loaded -if not all([CB_CONNECTION_STRING, CB_USERNAME, CB_PASSWORD, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME, INDEX_NAME, OPENAI_API_KEY]): - raise ValueError("All configuration variables must be provided.") - -``` - -# Setting Up Logging -Logging is essential for tracking the execution of our script and debugging any issues that may arise. We set up a logger that will display information about the script's progress, including timestamps and log levels. - - - -```python -# Configure logging -logging.basicConfig( - level=logging.INFO, - format="%(asctime)s - %(levelname)s - %(message)s", - handlers=[logging.StreamHandler(sys.stdout)], -) -``` - -# Connecting to Couchbase Capella -The next step is to establish a connection to our Couchbase Capella cluster. This connection will allow us to interact with the database, store and retrieve documents, and perform vector searches. 
- - - -```python -try: - # Initialize the Couchbase Cluster - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - - # Connect to the cluster - cluster = Cluster(CB_CONNECTION_STRING, options) - - # Wait for the cluster to be ready - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to the Couchbase cluster") -except CouchbaseException as e: - raise RuntimeError(f"Failed to connect to Couchbase: {str(e)}") -``` - -# Setting Up the Bucket, Scope, and Collection -Before we can store our data, we need to ensure that the appropriate bucket, scope, and collection exist in our Couchbase cluster. The code below checks if these components exist and creates them if they don't, providing a foundation for storing our vector embeddings and documents. - - -```python -from couchbase.management.buckets import CreateBucketSettings -from couchbase.management.search import SearchIndex -import json - -# Create bucket if it does not exist -bucket_manager = cluster.buckets() -try: - bucket_manager.get_bucket(CB_BUCKET_NAME) - print(f"Bucket '{CB_BUCKET_NAME}' already exists.") -except Exception as e: - print(f"Bucket '{CB_BUCKET_NAME}' does not exist. Creating bucket...") - bucket_settings = CreateBucketSettings(name=CB_BUCKET_NAME, ram_quota_mb=500) - bucket_manager.create_bucket(bucket_settings) - print(f"Bucket '{CB_BUCKET_NAME}' created successfully.") - -# Create scope and collection if they do not exist -collection_manager = cluster.bucket(CB_BUCKET_NAME).collections() -scopes = collection_manager.get_all_scopes() -scope_exists = any(scope.name == SCOPE_NAME for scope in scopes) - -if scope_exists: - print(f"Scope '{SCOPE_NAME}' already exists.") -else: - print(f"Scope '{SCOPE_NAME}' does not exist. 
Creating scope...") - collection_manager.create_scope(SCOPE_NAME) - print(f"Scope '{SCOPE_NAME}' created successfully.") - -collections = [collection.name for scope in scopes if scope.name == SCOPE_NAME for collection in scope.collections] -collection_exists = COLLECTION_NAME in collections - -if collection_exists: - print(f"Collection '{COLLECTION_NAME}' already exists in scope '{SCOPE_NAME}'.") -else: - print(f"Collection '{COLLECTION_NAME}' does not exist in scope '{SCOPE_NAME}'. Creating collection...") - collection_manager.create_collection(collection_name=COLLECTION_NAME, scope_name=SCOPE_NAME) - print(f"Collection '{COLLECTION_NAME}' created successfully.") - -``` - -# Creating or Updating Search Indexes -With the index definition loaded, the next step is to create or update the Vector Search Index in Couchbase. This step is crucial because it optimizes our database for vector similarity search operations, allowing us to perform searches based on the semantic content of documents rather than just keywords. By creating or updating a Vector Search Index, we enable our RAG to handle complex queries that involve finding semantically similar documents using vector embeddings, which is essential for a robust RAG system. 
- - - -```python -# Create search index from search_index.json file at scope level -with open('fts_index.json', 'r') as search_file: - search_index_definition = SearchIndex.from_json(json.load(search_file)) - - # Update search index definition with user inputs - search_index_definition.name = INDEX_NAME - search_index_definition.source_name = CB_BUCKET_NAME - - # Update types mapping - old_type_key = next(iter(search_index_definition.params['mapping']['types'].keys())) - type_obj = search_index_definition.params['mapping']['types'].pop(old_type_key) - search_index_definition.params['mapping']['types'][f"{SCOPE_NAME}.{COLLECTION_NAME}"] = type_obj - - - search_index_name = search_index_definition.name - - # Get scope-level search manager - scope_search_manager = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME).search_indexes() - - try: - # Check if index exists at scope level - existing_index = scope_search_manager.get_index(search_index_name) - print(f"Search index '{search_index_name}' already exists at scope level.") - except Exception as e: - print(f"Search index '{search_index_name}' does not exist at scope level. Creating search index from fts_index.json...") - scope_search_manager.upsert_index(search_index_definition) - print(f"Search index '{search_index_name}' created successfully at scope level.") -``` - -# Load the BBC News Dataset -To build a RAG engine, we need data to search through. We use the [BBC Realtime News dataset](https://huggingface.co/datasets/RealTimeData/bbc_news_alltime), a dataset with up-to-date BBC news articles grouped by month. This dataset contains articles that were created after the LLM was trained. It will showcase the use of RAG to augment the LLM. - -The BBC News dataset's varied content allows us to simulate real-world scenarios where users ask complex questions, enabling us to fine-tune our RAG's ability to understand and respond to various types of queries. 
- - -```python -try: - news_dataset = load_dataset('RealTimeData/bbc_news_alltime', '2024-12', split="train") - print(f"Loaded the BBC News dataset with {len(news_dataset)} rows") -except Exception as e: - raise ValueError(f"Error loading BBC News dataset: {str(e)}") -``` - -## Preview the Data - - -```python -# Print the first two examples from the dataset -print("Dataset columns:", news_dataset.column_names) -print("\nFirst two examples:") -print(news_dataset[:2]) -``` - -## Preparing the Data for RAG - -We need to extract the context passages from the dataset to use as our knowledge base for the RAG system. - - -```python -import hashlib - -news_articles = news_dataset -unique_articles = {} - -for article in news_articles: - content = article.get("content") - if content: - content_hash = hashlib.md5(content.encode()).hexdigest() # Generate hash of content - if content_hash not in unique_articles: - unique_articles[content_hash] = article # Store full article - -unique_news_articles = list(unique_articles.values()) # Convert back to list - -print(f"We have {len(unique_news_articles)} unique articles in our database.") - -``` - -# Creating Embeddings using OpenAI -Embeddings are numerical representations of text that capture semantic meaning. Unlike keyword-based search, embeddings enable semantic search to understand context and retrieve documents that are conceptually similar even without exact keyword matches. We'll use OpenAI's `text-embedding-3-large` model to create high-quality embeddings with 3,072 dimensions. This model transforms our text data into vector representations that can be efficiently searched, with a batch size of 30 for optimal processing. 
- - - -```python -try: - # Set up the embedding model - embed_model = OpenAIEmbedding( - api_key=OPENAI_API_KEY, - embed_batch_size=30, - model="text-embedding-3-large" - ) - - # Configure LlamaIndex to use this embedding model - Settings.embed_model = embed_model - print("Successfully created embedding model") -except Exception as e: - raise ValueError(f"Error creating embedding model: {str(e)}") -``` - -# Testing the Embeddings Model -We can test the embeddings model by generating an embedding for a string - - -```python -test_embedding = embed_model.get_text_embedding("this is a test sentence") -print(f"Embedding dimension: {len(test_embedding)}") -``` - -# Setting Up the Couchbase Vector Store -The vector store is set up to store the documents from the dataset. The vector store is essentially a database optimized for storing and retrieving high-dimensional vectors. - - -```python -try: - # Create the Couchbase vector store - vector_store = CouchbaseSearchVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - index_name=INDEX_NAME, - ) - print("Successfully created vector store") -except Exception as e: - raise ValueError(f"Failed to create vector store: {str(e)}") -``` - -# Creating LlamaIndex Documents -In this section, we'll process our news articles and create LlamaIndex Document objects. -Each Document is created with specific metadata and formatting templates to control what the LLM and embedding model see. -We'll observe examples of the formatted content to understand how the documents are structured. 
- -```python -from llama_index.core.schema import MetadataMode - -llama_documents = [] -# Process the deduplicated articles into LlamaIndex Documents -for article in unique_news_articles: - try: - document = Document( - text=article["content"], - metadata={ - "title": article["title"], - "description": article["description"], - "published_date": article["published_date"], - "link": article["link"], - }, - excluded_llm_metadata_keys=["description"], - excluded_embed_metadata_keys=["description", "published_date", "link"], - metadata_template="{key}=>{value}", - text_template="Metadata: \n{metadata_str}\n-----\nContent: {content}", - ) - llama_documents.append(document) - except Exception as e: - print(f"Failed to create document: {str(e)}") - continue - -# Observing an example of what the LLM and Embedding model receive as input -print("The LLM sees this:") -print(llama_documents[0].get_content(metadata_mode=MetadataMode.LLM)) -print("The Embedding model sees this:") -print(llama_documents[0].get_content(metadata_mode=MetadataMode.EMBED)) -``` - -# Creating and Running the Ingestion Pipeline - -In this section, we'll create an ingestion pipeline to process our documents. The pipeline will: - -1. Split the documents into smaller chunks (nodes) using the SentenceSplitter -2. Generate embeddings for each node using our embedding model -3. Store these nodes with their embeddings in our Couchbase vector store - -This process transforms our raw documents into a searchable knowledge base that can be queried semantically. 
- -```python -# Process documents: split into nodes, generate embeddings, and store in vector database -index_pipeline = IngestionPipeline( - transformations=[SentenceSplitter(), embed_model], - vector_store=vector_store, -) - -index_pipeline.run(documents=llama_documents) -``` - -# Using OpenAI's Large Language Model (LLM) -Large language models are AI systems that are trained to understand and generate human language. We'll be using OpenAI's `gpt-4o` model to process user queries and generate meaningful responses based on the retrieved context from our Couchbase vector store. This model is a key component of our RAG system, allowing it to go beyond simple keyword matching and truly understand the intent behind a query. By integrating OpenAI's LLM, we equip our RAG system with the ability to interpret complex queries, understand the nuances of language, and provide more accurate and contextually relevant responses. - -The language model's ability to understand context and generate coherent responses is what makes our RAG system truly intelligent. It can not only find the right information but also present it in a way that is useful and understandable to the user. - -The LLM is configured using LlamaIndex's OpenAI integration with your OpenAI API key, giving seamless access to OpenAI's hosted models. - - -```python -try: - # Set up the LLM - llm = OpenAI( - api_key=OPENAI_API_KEY, - model="gpt-4o", - ) - - # Configure LlamaIndex to use this LLM - Settings.llm = llm - logging.info("Successfully created the OpenAI LLM") -except Exception as e: - raise ValueError(f"Error creating OpenAI LLM: {str(e)}") -``` - -# Creating the Vector Store Index - -In this section, we'll create a VectorStoreIndex from our Couchbase vector store. This index serves as the foundation for our RAG system, enabling semantic search capabilities and efficient retrieval of relevant information. 
-The VectorStoreIndex provides a high-level interface to interact with our vector store, allowing us to: -1. Perform semantic searches based on user queries -2. Retrieve the most relevant documents or chunks -3. Generate contextually appropriate responses using our LLM - - - -```python -# Create your index -from llama_index.core import VectorStoreIndex - -index = VectorStoreIndex.from_vector_store(vector_store) -rag = index.as_query_engine() -``` - -# Retrieval-Augmented Generation (RAG) with Couchbase and LlamaIndex - -Let's test our RAG system by performing a semantic search on a sample query. In this example, we'll use a question about Pep Guardiola's reaction to Manchester City's recent form. The RAG system will: - -1. Process the natural language query -2. Search through our vector database for relevant information -3. Retrieve the most semantically similar documents -4. Generate a comprehensive response using the LLM - -This demonstrates how our system combines the power of vector search with language model capabilities to provide accurate, contextual answers based on the information in our database. - - -```python -# Sample query from the dataset - -query = "What was Pep Guardiola's reaction to Manchester City's recent form?" - -try: - # Perform the semantic search - start_time = time.time() - response = rag.query(query) - search_elapsed_time = time.time() - start_time - - # Display search results - print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):") - print(response) - -except Exception as e: - raise RuntimeError(f"Error performing semantic search: {e}") -``` - -# Conclusion -In this tutorial, we've built a Retrieval Augmented Generation (RAG) system using Couchbase Capella, OpenAI and LlamaIndex. 
We used the BBC News dataset, which contains real-time news articles, to demonstrate how RAG can be used to answer questions about current events and provide up-to-date information that extends beyond the LLM's training data. - -The key components of our RAG system include: - -1. **Couchbase Capella** as the vector database for storing and retrieving document embeddings -2. **LlamaIndex** as the framework for connecting our data to the LLM -3. **OpenAI Services** for generating embeddings (`text-embedding-3-large`) and LLM responses (`gpt-4o`) - -This approach allows us to enhance the capabilities of large language models by grounding their responses in specific, up-to-date information from our knowledge base. diff --git a/tutorial/markdown/generated/vector-search-cookbook/lamaindex-gsi-RAG_with_Couchbase_Capella_and_OpenAI.md b/tutorial/markdown/generated/vector-search-cookbook/lamaindex-gsi-RAG_with_Couchbase_Capella_and_OpenAI.md deleted file mode 100644 index c3b8e07..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/lamaindex-gsi-RAG_with_Couchbase_Capella_and_OpenAI.md +++ /dev/null @@ -1,588 +0,0 @@ ---- -# frontmatter -path: "/tutorial-openai-llamaindex-rag-with-gsi" -title: "RAG with OpenAI, LlamaIndex and Couchbase Hyperscale and Composite Vector Indexes" -short_title: "RAG with OpenAI, LlamaIndex and Couchbase CVI and HVI" -description: - - Learn how to build a semantic search engine using Couchbase's Hyperscale and Composite Vector Indexes. - - This tutorial demonstrates how to integrate Couchbase's GSI vector search capabilities with OpenAI embeddings. - - You will understand how to perform Retrieval-Augmented Generation (RAG) using LlamaIndex and GSI vector indexes. 
-content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - OpenAI - - Artificial Intelligence - - LlamaIndex - - GSI -sdk_language: - - python -length: 60 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/lamaindex/gsi/RAG_with_Couchbase_Capella_and_OpenAI.ipynb) - -# Introduction - -In this guide, we will walk you through building a Retrieval Augmented Generation (RAG) application using Couchbase Capella as the database and OpenAI's [gpt-4o](https://platform.openai.com/docs/models/gpt-4o) model as the large language model. We will use the [text-embedding-3-large](https://platform.openai.com/docs/guides/embeddings/embedding-models) model for generating embeddings. - -This notebook demonstrates how to build a RAG system using: -- The [BBC News dataset](https://huggingface.co/datasets/RealTimeData/bbc_news_alltime) containing news articles -- Couchbase Capella as the vector store with GSI (Global Secondary Index) for vector search -- LlamaIndex framework for the RAG pipeline -- OpenAI for embeddings and text generation - -We leverage Couchbase's Global Secondary Index (GSI) vector search to create and manage vector indexes, enabling efficient semantic search. GSI provides high-performance vector search with support for both Hyperscale Vector Indexes (BHIVE) and Composite Vector Indexes, designed to scale to billions of vectors with low memory footprint and optimized concurrent operations. - -Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial will equip you with the knowledge to create a fully functional RAG system using OpenAI Services and LlamaIndex with Couchbase's advanced GSI vector search. 
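Hyperscale and Composite Vector Index definitions are issued as SQL++ statements. As a hedged sketch only (the statement shape, the `embedding` field name, the `IVF,SQ8` description, and the dimension value are assumptions to verify against the Couchbase documentation for your server version), a small helper can assemble such a statement before executing it with `cluster.query(...)`:

```python
# Illustrative only: the exact SQL++ vector-index syntax depends on your
# Couchbase Server / Capella version -- check the official docs before use.
# The field name "embedding", 3072 dims, and "IVF,SQ8" are assumptions.
def build_vector_index_stmt(bucket, scope, collection, index_name,
                            field="embedding", dims=3072, similarity="cosine"):
    return (
        f"CREATE VECTOR INDEX `{index_name}` "
        f"ON `{bucket}`.`{scope}`.`{collection}`(`{field}` VECTOR) "
        f'WITH {{"dimension": {dims}, "similarity": "{similarity}", '
        f'"description": "IVF,SQ8"}}'
    )

# With a connected cluster you would then run, for example:
# cluster.query(build_vector_index_stmt(CB_BUCKET_NAME, SCOPE_NAME,
#                                       COLLECTION_NAME, INDEX_NAME)).execute()
```

Building the statement separately makes it easy to log and review the index definition before running it against the cluster.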
- -# Before you start - -## Create and Deploy Your Operational cluster on Capella - -To get started with Couchbase Capella, create an account and use it to deploy an operational cluster. - -To know more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html). - -### Couchbase Capella Configuration - -When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met: - -* Have a multi-node Capella cluster running the Data, Query, Index, and Search services. -* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the bucket (Read and Write) used in the application. -* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running. - -### OpenAI Models Setup - -In order to create the RAG application, we need an embedding model to ingest the documents for Vector Search and a large language model (LLM) for generating the responses based on the context. - -For this implementation, we'll use OpenAI's models which provide state-of-the-art performance for both embeddings and text generation: - -**Embedding Model**: We'll use OpenAI's `text-embedding-3-large` model, which provides high-quality embeddings with 3,072 dimensions for semantic search capabilities. - -**Large Language Model**: We'll use OpenAI's `gpt-4o` model for generating responses based on the retrieved context. This model offers excellent reasoning capabilities and can handle complex queries effectively. 
-
-**Prerequisites for OpenAI Integration**:
-* Create an OpenAI account at [platform.openai.com](https://platform.openai.com)
-* Generate an API key from your OpenAI dashboard
-* Ensure you have sufficient credits or a valid payment method set up
-* Set up your API key as an environment variable or input it securely in the notebook
-
-For more details about OpenAI's models and pricing, please refer to the [OpenAI documentation](https://platform.openai.com/docs/models).
-
-
-# Installing Necessary Libraries
-To build our RAG system, we need a set of libraries. The libraries we install handle everything from connecting to databases to performing AI tasks. Each library has a specific role: the Couchbase libraries manage database operations, and LlamaIndex handles the RAG pipeline, including its OpenAI integrations for generating embeddings and calling OpenAI's language models.
-
-
-
-```python
-# Install required packages
-%pip install datasets llama-index-vector-stores-couchbase==0.6.0 llama-index-embeddings-openai==0.5.1 llama-index-llms-openai==0.5.6 llama-index==0.14.2
-```
-
-# Importing Necessary Libraries
-The script starts by importing the libraries required for various tasks, including logging, time tracking, Couchbase connections, embedding generation, and dataset loading.
-
-
-```python
-import getpass
-import logging
-import sys
-import time
-from datetime import timedelta
-
-from couchbase.auth import PasswordAuthenticator
-from couchbase.cluster import Cluster
-from couchbase.exceptions import CouchbaseException
-from couchbase.options import ClusterOptions, KnownConfigProfiles, QueryOptions
-
-from datasets import load_dataset
-
-from llama_index.core import Settings, Document
-from llama_index.core.ingestion import IngestionPipeline
-from llama_index.core.node_parser import SentenceSplitter
-from llama_index.embeddings.openai import OpenAIEmbedding
-from llama_index.llms.openai import OpenAI
-from llama_index.vector_stores.couchbase import CouchbaseQueryVectorStore, QueryVectorSearchSimilarity, QueryVectorSearchType
-
-```
-
-# Loading Sensitive Information
-In this section, we prompt the user to input the essential configuration settings needed for the application. These settings include sensitive information like database credentials, collection names, and API keys. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security.
-
-The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured, enhancing the overall security and maintainability of your code.
-
-**OPENAI_API_KEY** is your OpenAI API key, which can be obtained from your OpenAI dashboard at [platform.openai.com](https://platform.openai.com/api-keys).
-
-**INDEX_NAME** is the name of the GSI vector index we will create for vector search operations.
-
-
-```python
-# Secrets are read with getpass so they are not echoed to the notebook output
-CB_CONNECTION_STRING = input("Couchbase Cluster URL (default: localhost): ") or "localhost"
-CB_USERNAME = input("Couchbase Username (default: admin): ") or "admin"
-CB_PASSWORD = getpass.getpass("Couchbase password (default: Password@12345): ") or "Password@12345"
-CB_BUCKET_NAME = input("Couchbase Bucket: ")
-SCOPE_NAME = input("Couchbase Scope: ")
-COLLECTION_NAME = input("Couchbase Collection: ")
-INDEX_NAME = input("Vector Search Index: ")
-OPENAI_API_KEY = getpass.getpass("OpenAI API Key: ")
-
-# Check if the variables are correctly loaded
-if not all([CB_CONNECTION_STRING, CB_USERNAME, CB_PASSWORD, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME, INDEX_NAME, OPENAI_API_KEY]):
-    raise ValueError("All configuration variables must be provided.")
-
-```
-
-# Setting Up Logging
-Logging is essential for tracking the execution of our script and debugging any issues that may arise. We set up a logger that will display information about the script's progress, including timestamps and log levels.
-
-
-
-```python
-# Configure logging
-logging.basicConfig(
-    level=logging.INFO,
-    format="%(asctime)s - %(levelname)s - %(message)s",
-    handlers=[logging.StreamHandler(sys.stdout)],
-)
-```
-
-# Connecting to Couchbase Capella
-The next step is to establish a connection to our Couchbase Capella cluster. This connection will allow us to interact with the database, store and retrieve documents, and perform vector searches.
- - - -```python -try: - # Initialize the Couchbase Cluster - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - options.apply_profile(KnownConfigProfiles.WanDevelopment) - # Connect to the cluster - cluster = Cluster(CB_CONNECTION_STRING, options) - - # Wait for the cluster to be ready - cluster.wait_until_ready(timedelta(seconds=5)) - - logging.info("Successfully connected to the Couchbase cluster") -except CouchbaseException as e: - raise RuntimeError(f"Failed to connect to Couchbase: {str(e)}") -``` - -# Setting Up the Bucket, Scope, and Collection -Before we can store our data, we need to ensure that the appropriate bucket, scope, and collection exist in our Couchbase cluster. The code below checks if these components exist and creates them if they don't, providing a foundation for storing our vector embeddings and documents. - - -```python -from couchbase.management.buckets import CreateBucketSettings -import json - -# Create bucket if it does not exist -bucket_manager = cluster.buckets() -try: - bucket_manager.get_bucket(CB_BUCKET_NAME) - print(f"Bucket '{CB_BUCKET_NAME}' already exists.") -except Exception as e: - print(f"Bucket '{CB_BUCKET_NAME}' does not exist. Creating bucket...") - bucket_settings = CreateBucketSettings(name=CB_BUCKET_NAME, ram_quota_mb=500) - bucket_manager.create_bucket(bucket_settings) - print(f"Bucket '{CB_BUCKET_NAME}' created successfully.") - -# Create scope and collection if they do not exist -collection_manager = cluster.bucket(CB_BUCKET_NAME).collections() -scopes = collection_manager.get_all_scopes() -scope_exists = any(scope.name == SCOPE_NAME for scope in scopes) - -if scope_exists: - print(f"Scope '{SCOPE_NAME}' already exists.") -else: - print(f"Scope '{SCOPE_NAME}' does not exist. 
Creating scope...") - collection_manager.create_scope(SCOPE_NAME) - print(f"Scope '{SCOPE_NAME}' created successfully.") - -collections = [collection.name for scope in scopes if scope.name == SCOPE_NAME for collection in scope.collections] -collection_exists = COLLECTION_NAME in collections - -if collection_exists: - print(f"Collection '{COLLECTION_NAME}' already exists in scope '{SCOPE_NAME}'.") -else: - print(f"Collection '{COLLECTION_NAME}' does not exist in scope '{SCOPE_NAME}'. Creating collection...") - collection_manager.create_collection(collection_name=COLLECTION_NAME, scope_name=SCOPE_NAME) - print(f"Collection '{COLLECTION_NAME}' created successfully.") - -scope = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME) -``` - -# Setting Up GSI Vector Search -In this section, we'll set up the Couchbase vector store using GSI (Global Secondary Index) for high-performance vector search. Unlike FTS-based vector search, GSI vector search provides optimized performance for pure vector similarity operations and can scale to billions of vectors with low memory footprint. - -GSI vector search supports two main index types: -- **Hyperscale Vector Indexes (BHIVE)**: Best for pure vector searches with high performance and concurrent operations -- **Composite Vector Indexes**: Best for filtered vector searches combining vector similarity with scalar filtering - -For this tutorial, we'll use the Query vector store. - - -# Load the BBC News Dataset -To build a RAG engine, we need data to search through. We use the [BBC Realtime News dataset](https://huggingface.co/datasets/RealTimeData/bbc_news_alltime), a dataset with up-to-date BBC news articles grouped by month. This dataset contains articles that were created after the LLM was trained. It will showcase the use of RAG to augment the LLM. 
-
-The BBC News dataset's varied content allows us to simulate real-world scenarios where users ask complex questions, enabling us to fine-tune our RAG system's ability to understand and respond to various types of queries.
-
-
-
-```python
-try:
-    news_dataset = load_dataset('RealTimeData/bbc_news_alltime', '2024-12', split="train")
-    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
-except Exception as e:
-    raise ValueError(f"Error loading BBC News dataset: {str(e)}")
-```
-
-## Preview the Data
-
-
-```python
-# Print the first two examples from the dataset
-print("Dataset columns:", news_dataset.column_names)
-print("\nFirst two examples:")
-print(news_dataset[:2])
-```
-
-## Preparing the Data for RAG
-
-We need to extract the context passages from the dataset to use as our knowledge base for the RAG system. Since the dataset may contain duplicate articles, we deduplicate them by hashing each article's content.
-
-
-```python
-import hashlib
-
-news_articles = news_dataset
-unique_articles = {}
-
-for article in news_articles:
-    content = article.get("content")
-    if content:
-        content_hash = hashlib.md5(content.encode()).hexdigest()  # Generate hash of content
-        if content_hash not in unique_articles:
-            unique_articles[content_hash] = article  # Store full article
-
-unique_news_articles = list(unique_articles.values())  # Convert back to list
-
-print(f"We have {len(unique_news_articles)} unique articles in our database.")
-
-```
-
-# Creating Embeddings using OpenAI
-Embeddings are numerical representations of text that capture semantic meaning. Unlike keyword-based search, embeddings enable semantic search to understand context and retrieve documents that are conceptually similar even without exact keyword matches. We'll use OpenAI's `text-embedding-3-large` model to create high-quality embeddings with 3,072 dimensions. This model transforms our text data into vector representations that can be efficiently searched, with a batch size of 30 for optimal processing.
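Before wiring up the model, here is a conceptual sketch of what the `embed_batch_size=30` setting does: texts are grouped into batches of at most 30 before being sent to the embeddings API. The `batched` helper below is purely illustrative, not an OpenAI or LlamaIndex API:

```python
def batched(items, batch_size=30):
    # Yield successive batches of at most batch_size items (illustrative helper)
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

texts = [f"article {i}" for i in range(65)]
batch_sizes = [len(batch) for batch in batched(texts)]
print(batch_sizes)  # [30, 30, 5]
```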
- - - -```python -try: - # Set up the embedding model - embed_model = OpenAIEmbedding( - api_key=OPENAI_API_KEY, - embed_batch_size=30, - model="text-embedding-3-large" - ) - - # Configure LlamaIndex to use this embedding model - Settings.embed_model = embed_model - print("Successfully created embedding model") -except Exception as e: - raise ValueError(f"Error creating embedding model: {str(e)}") -``` - -# Testing the Embeddings Model -We can test the embeddings model by generating an embedding for a string - - -```python -test_embedding = embed_model.get_text_embedding("this is a test sentence") -print(f"Embedding dimension: {len(test_embedding)}") -``` - -# Setting Up the Couchbase GSI Vector Store -The GSI vector store is set up to store the documents from the dataset using Couchbase's Global Secondary Index vector search capabilities. This vector store is optimized for high-performance vector similarity search operations and can scale to billions of vectors. - - -```python -try: - # Create the Couchbase GSI vector store - vector_store = CouchbaseQueryVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - search_type=QueryVectorSearchType.ANN, - similarity=QueryVectorSearchSimilarity.DOT, - nprobes=10 - ) - print("Successfully created GSI vector store") -except Exception as e: - raise ValueError(f"Failed to create GSI vector store: {str(e)}") -``` - -# Creating LlamaIndex Documents -In this section, we'll process our news articles and create LlamaIndex Document objects. -Each Document is created with specific metadata and formatting templates to control what the LLM and embedding model see. -We'll observe examples of the formatted content to understand how the documents are structured. 
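To make the templating concrete before we build the documents, the sketch below re-implements with plain string formatting what the `metadata_template="{key}=>{value}"` and `text_template` arguments used in the next cell produce. This is for illustration only; LlamaIndex performs the real rendering internally:

```python
# Illustrative re-creation of how metadata_template and text_template combine;
# the actual rendering is done inside llama_index's Document class.
metadata = {"title": "Sample headline", "published_date": "2024-12-01"}
metadata_template = "{key}=>{value}"
text_template = "Metadata: \n{metadata_str}\n-----\nContent: {content}"

metadata_str = "\n".join(
    metadata_template.format(key=k, value=v) for k, v in metadata.items()
)
rendered = text_template.format(metadata_str=metadata_str, content="Article body...")
print(rendered)
```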
-
-```python
-from llama_index.core.schema import MetadataMode
-
-llama_documents = []
-# Create a LlamaIndex Document for each unique article
-for article in unique_news_articles:
-    try:
-        document = Document(
-            text=article["content"],
-            metadata={
-                "title": article["title"],
-                "description": article["description"],
-                "published_date": article["published_date"],
-                "link": article["link"],
-            },
-            excluded_llm_metadata_keys=["description"],
-            excluded_embed_metadata_keys=["description", "published_date", "link"],
-            metadata_template="{key}=>{value}",
-            text_template="Metadata: \n{metadata_str}\n-----\nContent: {content}",
-        )
-        llama_documents.append(document)
-    except Exception as e:
-        print(f"Failed to create document: {str(e)}")
-        continue
-
-# Observing an example of what the LLM and Embedding model receive as input
-print("The LLM sees this:")
-print(llama_documents[0].get_content(metadata_mode=MetadataMode.LLM))
-print("The Embedding model sees this:")
-print(llama_documents[0].get_content(metadata_mode=MetadataMode.EMBED))
-```
-
-# Creating and Running the Ingestion Pipeline
-
-In this section, we'll create an ingestion pipeline to process our documents. The pipeline will:
-
-1. Split the documents into smaller chunks (nodes) using the SentenceSplitter
-2. Generate embeddings for each node using our embedding model
-3. Store these nodes with their embeddings in our Couchbase vector store
-
-This process transforms our raw documents into a searchable knowledge base that can be queried semantically.
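The splitting step (1) can be pictured with a simplified word-based chunker. The real `SentenceSplitter` is sentence-aware with larger default chunk sizes, so treat this only as a conceptual sketch of chunking with overlap:

```python
def split_into_chunks(text, chunk_size=20, overlap=5):
    # Naive word-based splitter with overlap -- a stand-in for the
    # sentence-aware SentenceSplitter used in the actual pipeline
    words = text.split()
    chunks, start, step = [], 0, chunk_size - overlap
    while start < len(words):
        chunks.append(" ".join(words[start:start + chunk_size]))
        start += step
    return chunks

sample = " ".join(f"w{i}" for i in range(50))
chunks = split_into_chunks(sample)
print(len(chunks))            # 4 chunks for 50 words
print(chunks[1].split()[:5])  # overlaps the tail of the first chunk
```

The overlap keeps a little shared context between adjacent chunks, so a sentence that straddles a boundary is still retrievable from at least one chunk.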
-
-```python
-# Process documents: split into nodes, generate embeddings, and store in the vector database
-index_pipeline = IngestionPipeline(
-    transformations=[SentenceSplitter(), embed_model],
-    vector_store=vector_store,
-)
-
-index_pipeline.run(documents=llama_documents)
-```
-
-# Using OpenAI's Large Language Model (LLM)
-Large language models are AI systems trained to understand and generate human language. We'll be using OpenAI's `gpt-4o` model to process user queries and generate meaningful responses based on the context retrieved from our Couchbase vector store. This model is a key component of our RAG system, allowing it to go beyond simple keyword matching and truly understand the intent behind a query. By integrating OpenAI's LLM, we equip our RAG system with the ability to interpret complex queries, understand the nuances of language, and provide more accurate and contextually relevant responses.
-
-The language model's ability to understand context and generate coherent responses is what makes our RAG system truly intelligent. It can not only find the right information but also present it in a way that is useful and understandable to the user.
-
-The LLM is configured through LlamaIndex's OpenAI integration, which uses your OpenAI API key to call OpenAI's API.
-
-
-```python
-try:
-    # Set up the LLM
-    llm = OpenAI(
-        api_key=OPENAI_API_KEY,
-        model="gpt-4o",
-    )
-    # Configure LlamaIndex to use this LLM
-    Settings.llm = llm
-    logging.info("Successfully created the OpenAI LLM")
-except Exception as e:
-    raise ValueError(f"Error creating OpenAI LLM: {str(e)}")
-```
-
-# Creating the Vector Store Index
-
-In this section, we'll create a VectorStoreIndex from our Couchbase vector store. This index serves as the foundation for our RAG system, enabling semantic search capabilities and efficient retrieval of relevant information.
- -The VectorStoreIndex provides a high-level interface to interact with our vector store, allowing us to: -1. Perform semantic searches based on user queries -2. Retrieve the most relevant documents or chunks -3. Generate contextually appropriate responses using our LLM - - - -```python -# Create your index -from llama_index.core import VectorStoreIndex - -index = VectorStoreIndex.from_vector_store(vector_store) -rag = index.as_query_engine() -``` - -# Retrieval-Augmented Generation (RAG) with Couchbase and LlamaIndex - -Let's test our RAG system by performing a semantic search on a sample query. In this example, we'll use a question about Pep Guardiola's reaction to Manchester City's recent form. The RAG system will: - -1. Process the natural language query -2. Search through our vector database for relevant information -3. Retrieve the most semantically similar documents -4. Generate a comprehensive response using the LLM - -This demonstrates how our system combines the power of vector search with language model capabilities to provide accurate, contextual answers based on the information in our database. - -**Note:** By default, without any GSI vector index, Couchbase uses linear brute force search which compares the query vector against every document in the collection. This works for small datasets but can become slow as the dataset grows. - - -```python -# Sample query from the dataset - -query = "Who will Daniel Dubois fight in Saudi Arabia on 22 February?" 
-
-try:
-    # Perform the semantic search
-    start_time = time.time()
-    response = rag.query(query)
-    search_elapsed_time = time.time() - start_time
-
-    # Display search results
-    print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):")
-    print(response)
-
-except Exception as e:
-    raise RuntimeError(f"Error performing semantic search: {e}")
-```
-
-# Optimizing Vector Search with Global Secondary Index (GSI)
-
-While the above RAG system works effectively, we can significantly improve query performance by leveraging Couchbase's advanced GSI vector search capabilities.
-
-Couchbase offers three types of vector indexes, but for GSI-based vector search we focus on two main types:
-
-**Hyperscale Vector Indexes**
-- Best for pure vector searches - content discovery, recommendations, semantic search
-- High performance with low memory footprint - designed to scale to billions of vectors
-- Optimized for concurrent operations - supports simultaneous searches and inserts
-- Use when: You primarily perform vector-only queries without complex scalar filtering
-- Ideal for: Large-scale semantic search, recommendation systems, content discovery
-
-**Composite Vector Indexes**
-- Best for filtered vector searches - combines vector search with scalar value filtering
-- Efficient pre-filtering - scalar attributes reduce the vector comparison scope
-- Use when: Your queries combine vector similarity with scalar filters that eliminate large portions of data
-- Ideal for: Compliance-based filtering, user-specific searches, time-bounded queries
-
-**Choosing the Right Index Type**
-- Start with a Hyperscale Vector Index for pure vector searches and large datasets
-- Use a Composite Vector Index when scalar filters significantly reduce your search space
-- Consider your dataset size: Hyperscale scales to billions, Composite works well for tens of millions to billions
-
-For more details, see the [Couchbase Vector Index
documentation](https://docs.couchbase.com/server/current/vector-index/use-vector-indexes.html).
-
-## Understanding Index Configuration (Couchbase 8.0 Feature)
-
-The `description` parameter controls how Couchbase optimizes vector storage and search performance through centroids and quantization:
-
-Format: `IVF[<centroids>],{SQ<bits>|PQ<subquantizers>x<bits>}`
-
-**Centroids (IVF - Inverted File):**
-- Controls how the dataset is subdivided for faster searches
-- More centroids = faster search, slower training
-- Fewer centroids = slower search, faster training
-- If omitted (as in IVF,SQ8), Couchbase auto-selects a centroid count based on dataset size
-
-**Quantization Options:**
-- SQ (Scalar Quantization): SQ4, SQ6, SQ8 (4, 6, or 8 bits per dimension)
-- PQ (Product Quantization): PQ<subquantizers>x<bits> (e.g., PQ32x8)
-- Higher values = better accuracy, larger index size
-
-**Common Examples:**
-- IVF,SQ8 - Auto centroids, 8-bit scalar quantization (good default)
-- IVF1000,SQ6 - 1000 centroids, 6-bit scalar quantization
-- IVF,PQ32x8 - Auto centroids, 32 subquantizers with 8 bits each
-
-For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/server/current/vector-index/hyperscale-vector-index.html#algo_settings).
-
-In the code below, we create a BHIVE index with a `CREATE INDEX` statement, passing the optimization settings (dimension, description, and similarity) in the `WITH` clause. Alternatively, GSI indexes can be created manually from the Couchbase UI.
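To build intuition for the scalar-quantization (SQ) settings above, here is a toy 8-bit quantizer. This is purely illustrative and not Couchbase's internal implementation; it just shows why SQ8 shrinks storage (one byte per dimension instead of a 4-byte float) at the cost of a small reconstruction error:

```python
def sq8_quantize(vec):
    # Map each float to an integer code in [0, 255] (toy scalar quantization)
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0  # guard against a zero range
    codes = [round((x - lo) / scale) for x in vec]
    return codes, lo, scale

def sq8_dequantize(codes, lo, scale):
    return [lo + c * scale for c in codes]

vec = [0.12, -0.4, 0.88, 0.05]
codes, lo, scale = sq8_quantize(vec)
restored = sq8_dequantize(codes, lo, scale)
max_err = max(abs(a - b) for a, b in zip(vec, restored))
print(all(0 <= c <= 255 for c in codes))  # True: each dimension fits in one byte
print(max_err <= scale / 2 + 1e-12)       # True: error bounded by half a quantization step
```

The same trade-off drives the SQ4/SQ6/SQ8 choice: fewer bits per dimension mean a smaller index but a coarser grid, hence slightly less accurate distance comparisons.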
- - -```python -# Create a BHIVE (Hyperscale Vector Index) for optimized vector search -try: - bhive_index_name = f"{INDEX_NAME}_bhive" - - options = { - "dimension": 3072, - "description": "IVF1024,PQ32x8", - "similarity": "DOT", - } - scope.query( - f""" - CREATE INDEX {bhive_index_name} - ON {COLLECTION_NAME} (embedding VECTOR) - USING GSI WITH {json.dumps(options)} - """, - QueryOptions( - timeout=timedelta(seconds=300) - )).execute() - print(f"Successfully created BHIVE index: {bhive_index_name}") -except Exception as e: - print(f"BHIVE index may already exist or error occurred: {str(e)}") - -``` - -The example below shows running the same RAG query, but now using the BHIVE GSI index we created above. You'll notice improved performance as the index efficiently retrieves data. - - -```python -# Test the optimized GSI vector search with BHIVE index -query = "Who will Daniel Dubois fight in Saudi Arabia on 22 February?" -try: - # Create a new query engine using the optimized vector store - optimized_rag = index.as_query_engine() - - # Perform the semantic search with GSI optimization - start_time = time.time() - response = optimized_rag.query(query) - search_elapsed_time = time.time() - start_time - - # Display search results - print(f"\nOptimized GSI Vector Search Results (completed in {search_elapsed_time:.2f} seconds):") - print(response) - -except Exception as e: - raise RuntimeError(f"Error performing optimized semantic search: {e}") - -``` - -# Conclusion -In this tutorial, we've built a Retrieval Augmented Generation (RAG) system using Couchbase Capella's GSI vector search, OpenAI, and LlamaIndex. We used the BBC News dataset, which contains real-time news articles, to demonstrate how RAG can be used to answer questions about current events and provide up-to-date information that extends beyond the LLM's training data. - -The key components of our RAG system include: - -1. 
**Couchbase Capella GSI Vector Search** as the high-performance vector database for storing and retrieving document embeddings -2. **LlamaIndex** as the framework for connecting our data to the LLM -3. **OpenAI Services** for generating embeddings (`text-embedding-3-large`) and LLM responses (`gpt-4o`) -4. **GSI Vector Indexes** (BHIVE/Composite) for optimized vector search performance - -This approach allows us to enhance the capabilities of large language models by grounding their responses in specific, up-to-date information from our knowledge base, while leveraging Couchbase's advanced GSI vector search for optimal performance and scalability. - diff --git a/tutorial/markdown/generated/vector-search-cookbook/langgraph-couchbase_persistence_langgraph.md b/tutorial/markdown/generated/vector-search-cookbook/langgraph-couchbase_persistence_langgraph.md deleted file mode 100644 index b1077ee..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/langgraph-couchbase_persistence_langgraph.md +++ /dev/null @@ -1,316 +0,0 @@ ---- -# frontmatter -path: "/tutorial-langgraph-persistence-checkpoint" -title: Persist LangGraph State with Couchbase Checkpointer -short_title: Persist LangGraph State with Couchbase -description: - - Learn how to use Checkpointer Library for LangGraph - - Use Couchbase to store the LangGraph states -content_type: tutorial -filter: sdk -technology: - - kv -tags: - - Artificial Intelligence - - LangGraph -sdk_language: - - python -length: 20 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/langgraph/couchbase_persistence_langgraph.ipynb) - -# LangGraph Persistence with Couchbase - -### LangGraph - -LangGraph is a library for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows. Compared to other LLM frameworks, it offers these core benefits: cycles, controllability, and persistence. 
LangGraph allows you to define flows that involve cycles, which are essential for most agentic architectures and differentiate it from DAG-based solutions. This tutorial shows how to persist the state of [LangGraph](https://github.com/langchain-ai/langgraph) with Couchbase.
-
-### Checkpointer
-
-Checkpointers in LangGraph save snapshots of graph state at each execution step, enabling memory between interactions, human-in-the-loop workflows, and fault tolerance. By organizing states into "threads", each with a unique `thread_id`, they preserve conversation history and allow time-travel debugging. Checkpointers implement methods to store, retrieve, and list checkpoints, with various backend options (in-memory, Couchbase, SQLite) to suit different application needs. This persistence layer is what enables agents to maintain context across multiple user interactions and recover gracefully from failures.
-
-### Couchbase as a Checkpointer
-
-This tutorial focuses on implementing a LangGraph checkpointer with Couchbase, leveraging Couchbase's distributed architecture, JSON document model, and high availability to provide robust persistence for agent workflows. Couchbase's scalability and flexible query capabilities make it an ideal backend for managing complex conversation states across multiple users and sessions.
-
-# How to use Couchbase Checkpointer
-
-
-
-This tutorial uses the dedicated [langgraph-checkpointer-couchbase](https://pypi.org/project/langgraph-checkpointer-couchbase/) package to persist LangGraph state in Couchbase.
-
-This package provides a seamless way to persist LangGraph agent states in Couchbase, enabling:
-
-- State persistence across application restarts
-- Retrieval of historical conversation steps
-- Continued conversations from previous checkpoints
-- Both synchronous and asynchronous interfaces
-
-## Setup environment
-
-This requires the Couchbase Python SDK and the langgraph package.
-
-
-```python
-%%capture --no-stderr
-%pip install -U langgraph==0.3.22 langgraph-checkpointer-couchbase
-```
-
-This particular example uses OpenAI's GPT-4.1-mini as the model.
-
-
-```python
-import getpass
-import os
-
-
-def _set_env(var: str):
-    if not os.environ.get(var):
-        os.environ[var] = getpass.getpass(f"{var}: ")
-
-
-_set_env("OPENAI_API_KEY")
-```
-
-## Setup model and tools for the graph
-
-We will be creating a [ReAct Agent](https://langchain-ai.github.io/langgraph/how-tos/create-react-agent/) for this demo. Let's create a custom tool which our agent can call to get more information.
-
-We are using a `get_weather` tool, which returns weather information for a given city. We are also setting up the ChatGPT model here.
-
-
-```python
-from typing import Literal
-from langchain_core.tools import tool
-from langchain_openai import ChatOpenAI
-from langgraph.prebuilt import create_react_agent
-
-
-@tool
-def get_weather(city: Literal["nyc", "sf"]):
-    """Use this to get weather information."""
-    if city == "nyc":
-        return "It might be cloudy in nyc"
-    elif city == "sf":
-        return "It's always sunny in sf"
-    else:
-        raise AssertionError("Unknown city")
-
-
-tools = [get_weather]
-model = ChatOpenAI(model_name="gpt-4.1-mini", temperature=0)
-```
-
-### Couchbase Connection and Initialization
-
-There are two ways to initialize a saver:
-
-1. `from_conn_info` - Provide the connection string, username, and password; the package handles the connection itself.
-2. `from_cluster` - Provide an already connected `Cluster` object.
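Whichever initializer you choose, checkpoints are grouped by `thread_id`. The sketch below models that keying with a plain in-memory dict; it is an illustration of the data model only, not the package's actual document schema:

```python
# In-memory illustration of how a checkpointer organizes state per thread;
# langgraph-checkpointer-couchbase stores these as documents in Couchbase.
checkpoints = {}

def put(config, checkpoint):
    thread_id = config["configurable"]["thread_id"]
    checkpoints.setdefault(thread_id, []).append(checkpoint)

def get_latest(config):
    history = checkpoints.get(config["configurable"]["thread_id"], [])
    return history[-1] if history else None

config = {"configurable": {"thread_id": "1"}}
put(config, {"id": "ckpt-1", "messages": ["what's the weather in sf"]})
put(config, {"id": "ckpt-2", "messages": ["what's the weather in sf",
                                          "It's always sunny in sf"]})

print(get_latest(config)["id"])  # ckpt-2
print(len(checkpoints["1"]))     # 2
```

Because each thread keeps its full checkpoint history, resuming a conversation is just a lookup of the latest checkpoint for that `thread_id`, and earlier entries remain available for time-travel debugging.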
-
-We will be using `from_conn_info` in the sync tutorial and `from_cluster` in the async one, but either method can be used, depending on your requirements.
-
-
-## Use sync connection (CouchbaseSaver)
-
-Below is an example of using `CouchbaseSaver` (for synchronous use of the graph, i.e. `.invoke()`, `.stream()`). `CouchbaseSaver` implements four methods that are required for any checkpointer:
-
-- `.put` - Store a checkpoint with its configuration and metadata.
-- `.put_writes` - Store intermediate writes linked to a checkpoint (i.e. pending writes).
-- `.get_tuple` - Fetch a checkpoint tuple using a given configuration (`thread_id` and `checkpoint_id`).
-- `.list` - List checkpoints that match a given configuration and filter criteria.
-
-Here we create a Couchbase connection. We are using a local setup with the bucket `test` and the scope `langgraph`; you may change the bucket and scope if required. We also need `checkpoints` and `checkpoint_writes` collections inside that scope.
-
-Then a [ReAct Agent](https://langchain-ai.github.io/langgraph/how-tos/create-react-agent/) is created with the GPT model, the weather tool, and the Couchbase checkpointer.
-
-The graph is invoked with a message for the GPT model, and all of its state is stored in Couchbase.
We use get, get_tuple and list methods to fetch the states again - - -```python -from langgraph_checkpointer_couchbase import CouchbaseSaver - -with CouchbaseSaver.from_conn_info( - cb_conn_str="couchbase://localhost", - cb_username="Administrator", - cb_password="password", - bucket_name="test", - scope_name="langgraph", -) as checkpointer: - graph = create_react_agent(model, tools=tools, checkpointer=checkpointer) - config = {"configurable": {"thread_id": "1"}} - res = graph.invoke({"messages": [("human", "what's the weather in sf")]}, config) - - latest_checkpoint = checkpointer.get(config) - latest_checkpoint_tuple = checkpointer.get_tuple(config) - checkpoint_tuples = list(checkpointer.list(config)) -``` - - -```python -latest_checkpoint -``` - - - - - {'v': 2, - 'ts': '2025-04-22T04:38:11.363745+00:00', - 'id': '1f01f339-8ab3-6ce0-8003-a475eb1c8337', - 'channel_values': {'messages': [HumanMessage(content="what's the weather in sf", additional_kwargs={}, response_metadata={}, id='f8fdddb2-5a72-4bf6-84d9-1576d48d6a45'), - AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_zRq0TmgKvcfoaiaRQxd1YlXe', 'function': {'arguments': '{"city":"sf"}', 'name': 'get_weather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 15, 'prompt_tokens': 57, 'total_tokens': 72, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzgXgi0qlJYegX2Mtz8KC3r9YsD8', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-94b537ab-dcb7-4776-804d-087bf476761a-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'sf'}, 'id': 'call_zRq0TmgKvcfoaiaRQxd1YlXe', 'type': 'tool_call'}], usage_metadata={'input_tokens': 57, 'output_tokens': 15, 'total_tokens': 72, 
'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}), - ToolMessage(content="It's always sunny in sf", name='get_weather', id='15c7d895-1e5a-4edf-9dfc-5a5f363d6111', tool_call_id='call_zRq0TmgKvcfoaiaRQxd1YlXe'), - AIMessage(content='The weather in San Francisco is always sunny.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 11, 'prompt_tokens': 84, 'total_tokens': 95, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzgYLdRzuTyLsLvAc3MDV90N8Y9F', 'finish_reason': 'stop', 'logprobs': None}, id='run-8ee450ec-4a15-4ceb-b4ed-49a98c7f9ab5-0', usage_metadata={'input_tokens': 84, 'output_tokens': 11, 'total_tokens': 95, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}, - 'channel_versions': {'__start__': 2, - 'messages': 5, - 'branch:to:agent': 5, - 'branch:to:tools': 4}, - 'versions_seen': {'__input__': {}, - '__start__': {'__start__': 1}, - 'agent': {'branch:to:agent': 4}, - 'tools': {'branch:to:tools': 3}}, - 'pending_sends': []} - - - - -```python -latest_checkpoint_tuple -``` - - - - - CheckpointTuple(config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1f01f339-8ab3-6ce0-8003-a475eb1c8337'}}, checkpoint={'v': 2, 'ts': '2025-04-22T04:38:11.363745+00:00', 'id': '1f01f339-8ab3-6ce0-8003-a475eb1c8337', 'channel_values': {'messages': [HumanMessage(content="what's the weather in sf", additional_kwargs={}, response_metadata={}, id='f8fdddb2-5a72-4bf6-84d9-1576d48d6a45'), AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_zRq0TmgKvcfoaiaRQxd1YlXe', 'function': {'arguments': '{"city":"sf"}', 'name': 'get_weather'}, 
'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 15, 'prompt_tokens': 57, 'total_tokens': 72, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzgXgi0qlJYegX2Mtz8KC3r9YsD8', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-94b537ab-dcb7-4776-804d-087bf476761a-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'sf'}, 'id': 'call_zRq0TmgKvcfoaiaRQxd1YlXe', 'type': 'tool_call'}], usage_metadata={'input_tokens': 57, 'output_tokens': 15, 'total_tokens': 72, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}), ToolMessage(content="It's always sunny in sf", name='get_weather', id='15c7d895-1e5a-4edf-9dfc-5a5f363d6111', tool_call_id='call_zRq0TmgKvcfoaiaRQxd1YlXe'), AIMessage(content='The weather in San Francisco is always sunny.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 11, 'prompt_tokens': 84, 'total_tokens': 95, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzgYLdRzuTyLsLvAc3MDV90N8Y9F', 'finish_reason': 'stop', 'logprobs': None}, id='run-8ee450ec-4a15-4ceb-b4ed-49a98c7f9ab5-0', usage_metadata={'input_tokens': 84, 'output_tokens': 11, 'total_tokens': 95, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}, 'channel_versions': {'__start__': 2, 'messages': 5, 'branch:to:agent': 5, 'branch:to:tools': 4}, 'versions_seen': {'__input__': {}, 
'__start__': {'__start__': 1}, 'agent': {'branch:to:agent': 4}, 'tools': {'branch:to:tools': 3}}, 'pending_sends': []}, metadata={'source': 'loop', 'writes': {'agent': {'messages': [AIMessage(content='The weather in San Francisco is always sunny.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 11, 'prompt_tokens': 84, 'total_tokens': 95, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzgYLdRzuTyLsLvAc3MDV90N8Y9F', 'finish_reason': 'stop', 'logprobs': None}, id='run-8ee450ec-4a15-4ceb-b4ed-49a98c7f9ab5-0', usage_metadata={'input_tokens': 84, 'output_tokens': 11, 'total_tokens': 95, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}}, 'step': 3, 'parents': {}, 'thread_id': '1'}, parent_config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1f01f339-8237-6166-8002-ac95e49f67e4'}}, pending_writes=[]) - - - - -```python -checkpoint_tuples -``` - - - - - [CheckpointTuple(config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1f01f339-8ab3-6ce0-8003-a475eb1c8337'}}, checkpoint={'v': 2, 'ts': '2025-04-22T04:38:11.363745+00:00', 'id': '1f01f339-8ab3-6ce0-8003-a475eb1c8337', 'channel_values': {'messages': [HumanMessage(content="what's the weather in sf", additional_kwargs={}, response_metadata={}, id='f8fdddb2-5a72-4bf6-84d9-1576d48d6a45'), AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_zRq0TmgKvcfoaiaRQxd1YlXe', 'function': {'arguments': '{"city":"sf"}', 'name': 'get_weather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 15, 'prompt_tokens': 57, 'total_tokens': 72, 
'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzgXgi0qlJYegX2Mtz8KC3r9YsD8', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-94b537ab-dcb7-4776-804d-087bf476761a-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'sf'}, 'id': 'call_zRq0TmgKvcfoaiaRQxd1YlXe', 'type': 'tool_call'}], usage_metadata={'input_tokens': 57, 'output_tokens': 15, 'total_tokens': 72, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}), ToolMessage(content="It's always sunny in sf", name='get_weather', id='15c7d895-1e5a-4edf-9dfc-5a5f363d6111', tool_call_id='call_zRq0TmgKvcfoaiaRQxd1YlXe'), AIMessage(content='The weather in San Francisco is always sunny.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 11, 'prompt_tokens': 84, 'total_tokens': 95, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzgYLdRzuTyLsLvAc3MDV90N8Y9F', 'finish_reason': 'stop', 'logprobs': None}, id='run-8ee450ec-4a15-4ceb-b4ed-49a98c7f9ab5-0', usage_metadata={'input_tokens': 84, 'output_tokens': 11, 'total_tokens': 95, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}, 'channel_versions': {'__start__': 2, 'messages': 5, 'branch:to:agent': 5, 'branch:to:tools': 4}, 'versions_seen': {'__input__': {}, '__start__': {'__start__': 1}, 'agent': {'branch:to:agent': 4}, 'tools': {'branch:to:tools': 3}}, 'pending_sends': []}, metadata={'source': 'loop', 
'writes': {'agent': {'messages': [AIMessage(content='The weather in San Francisco is always sunny.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 11, 'prompt_tokens': 84, 'total_tokens': 95, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzgYLdRzuTyLsLvAc3MDV90N8Y9F', 'finish_reason': 'stop', 'logprobs': None}, id='run-8ee450ec-4a15-4ceb-b4ed-49a98c7f9ab5-0', usage_metadata={'input_tokens': 84, 'output_tokens': 11, 'total_tokens': 95, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}}, 'step': 3, 'parents': {}, 'thread_id': '1'}, parent_config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1f01f339-8237-6166-8002-ac95e49f67e4'}}, pending_writes=None), - CheckpointTuple(config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1f01f339-8237-6166-8002-ac95e49f67e4'}}, checkpoint={'v': 2, 'ts': '2025-04-22T04:38:10.473797+00:00', 'id': '1f01f339-8237-6166-8002-ac95e49f67e4', 'channel_values': {'messages': [HumanMessage(content="what's the weather in sf", additional_kwargs={}, response_metadata={}, id='f8fdddb2-5a72-4bf6-84d9-1576d48d6a45'), AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_zRq0TmgKvcfoaiaRQxd1YlXe', 'function': {'arguments': '{"city":"sf"}', 'name': 'get_weather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 15, 'prompt_tokens': 57, 'total_tokens': 72, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 
'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzgXgi0qlJYegX2Mtz8KC3r9YsD8', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-94b537ab-dcb7-4776-804d-087bf476761a-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'sf'}, 'id': 'call_zRq0TmgKvcfoaiaRQxd1YlXe', 'type': 'tool_call'}], usage_metadata={'input_tokens': 57, 'output_tokens': 15, 'total_tokens': 72, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}), ToolMessage(content="It's always sunny in sf", name='get_weather', id='15c7d895-1e5a-4edf-9dfc-5a5f363d6111', tool_call_id='call_zRq0TmgKvcfoaiaRQxd1YlXe')], 'branch:to:agent': None}, 'channel_versions': {'__start__': 2, 'messages': 4, 'branch:to:agent': 4, 'branch:to:tools': 4}, 'versions_seen': {'__input__': {}, '__start__': {'__start__': 1}, 'agent': {'branch:to:agent': 2}, 'tools': {'branch:to:tools': 3}}, 'pending_sends': []}, metadata={'source': 'loop', 'writes': {'tools': {'messages': [ToolMessage(content="It's always sunny in sf", name='get_weather', id='15c7d895-1e5a-4edf-9dfc-5a5f363d6111', tool_call_id='call_zRq0TmgKvcfoaiaRQxd1YlXe')]}}, 'step': 2, 'parents': {}, 'thread_id': '1'}, parent_config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1f01f339-8230-6028-8001-76f23dc92abc'}}, pending_writes=None), - CheckpointTuple(config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1f01f339-8230-6028-8001-76f23dc92abc'}}, checkpoint={'v': 2, 'ts': '2025-04-22T04:38:10.470894+00:00', 'id': '1f01f339-8230-6028-8001-76f23dc92abc', 'channel_values': {'messages': [HumanMessage(content="what's the weather in sf", additional_kwargs={}, response_metadata={}, id='f8fdddb2-5a72-4bf6-84d9-1576d48d6a45'), AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_zRq0TmgKvcfoaiaRQxd1YlXe', 'function': {'arguments': '{"city":"sf"}', 'name': 'get_weather'}, 'type': 'function'}], 'refusal': None}, 
response_metadata={'token_usage': {'completion_tokens': 15, 'prompt_tokens': 57, 'total_tokens': 72, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzgXgi0qlJYegX2Mtz8KC3r9YsD8', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-94b537ab-dcb7-4776-804d-087bf476761a-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'sf'}, 'id': 'call_zRq0TmgKvcfoaiaRQxd1YlXe', 'type': 'tool_call'}], usage_metadata={'input_tokens': 57, 'output_tokens': 15, 'total_tokens': 72, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})], 'branch:to:tools': None}, 'channel_versions': {'__start__': 2, 'messages': 3, 'branch:to:agent': 3, 'branch:to:tools': 3}, 'versions_seen': {'__input__': {}, '__start__': {'__start__': 1}, 'agent': {'branch:to:agent': 2}}, 'pending_sends': []}, metadata={'source': 'loop', 'writes': {'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_zRq0TmgKvcfoaiaRQxd1YlXe', 'function': {'arguments': '{"city":"sf"}', 'name': 'get_weather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 15, 'prompt_tokens': 57, 'total_tokens': 72, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzgXgi0qlJYegX2Mtz8KC3r9YsD8', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-94b537ab-dcb7-4776-804d-087bf476761a-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'sf'}, 'id': 'call_zRq0TmgKvcfoaiaRQxd1YlXe', 'type': 
'tool_call'}], usage_metadata={'input_tokens': 57, 'output_tokens': 15, 'total_tokens': 72, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}}, 'step': 1, 'parents': {}, 'thread_id': '1'}, parent_config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1f01f339-6f17-6fe0-8000-bd10177d2f78'}}, pending_writes=None), - CheckpointTuple(config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1f01f339-6f17-6fe0-8000-bd10177d2f78'}}, checkpoint={'v': 2, 'ts': '2025-04-22T04:38:08.468774+00:00', 'id': '1f01f339-6f17-6fe0-8000-bd10177d2f78', 'channel_values': {'messages': [HumanMessage(content="what's the weather in sf", additional_kwargs={}, response_metadata={}, id='f8fdddb2-5a72-4bf6-84d9-1576d48d6a45')], 'branch:to:agent': None}, 'channel_versions': {'__start__': 2, 'messages': 2, 'branch:to:agent': 2}, 'versions_seen': {'__input__': {}, '__start__': {'__start__': 1}}, 'pending_sends': []}, metadata={'source': 'loop', 'writes': None, 'step': 0, 'parents': {}, 'thread_id': '1'}, parent_config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1f01f339-6f15-6664-bfff-0d88597e0554'}}, pending_writes=None), - CheckpointTuple(config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1f01f339-6f15-6664-bfff-0d88597e0554'}}, checkpoint={'v': 2, 'ts': '2025-04-22T04:38:08.467717+00:00', 'id': '1f01f339-6f15-6664-bfff-0d88597e0554', 'channel_values': {'__start__': {'messages': [['human', "what's the weather in sf"]]}}, 'channel_versions': {'__start__': 1}, 'versions_seen': {'__input__': {}}, 'pending_sends': []}, metadata={'source': 'input', 'writes': {'__start__': {'messages': [['human', "what's the weather in sf"]]}}, 'step': -1, 'parents': {}, 'thread_id': '1'}, parent_config=None, pending_writes=None)] - - - -## Use async connection (AsyncCouchbaseSaver) - -This is the asynchronous example, Here we will create a 
Couchbase connection. We are using a local setup with the bucket `test` and the scope `langgraph`, which must also contain the collections `checkpoints` and `checkpoint_writes`. These are the methods supported by the library:
-
-- `.aput` - Store a checkpoint with its configuration and metadata.
-- `.aput_writes` - Store intermediate writes linked to a checkpoint (i.e. pending writes).
-- `.aget_tuple` - Fetch a checkpoint tuple using a given configuration (`thread_id` and `checkpoint_id`).
-- `.alist` - List checkpoints that match a given configuration and filter criteria.
-
-Then a [ReAct Agent](https://langchain-ai.github.io/langgraph/how-tos/create-react-agent/) is created with a GPT model, a weather tool, and the Couchbase checkpointer.
-
-LangGraph's graph is invoked with a message for GPT, storing all the state in Couchbase. We then use the aget, aget_tuple and alist methods to fetch the states again.
-
-
-```python
-# Create Couchbase Cluster Connection
-from acouchbase.cluster import Cluster as ACluster
-from couchbase.auth import PasswordAuthenticator
-from couchbase.options import ClusterOptions
-
-cb_conn_str = "couchbase://localhost"
-cb_username = "Administrator"
-cb_password = "password"
-
-auth = PasswordAuthenticator(cb_username, cb_password)
-options = ClusterOptions(auth)
-cb_cluster = await ACluster.connect(cb_conn_str, options)
-```
-
-
-```python
-from langgraph_checkpointer_couchbase import AsyncCouchbaseSaver
-
-async with AsyncCouchbaseSaver.from_cluster(
-    cluster=cb_cluster,
-    bucket_name="test",
-    scope_name="langgraph",
-) as checkpointer:
-    graph = create_react_agent(model, tools=tools, checkpointer=checkpointer)
-    config = {"configurable": {"thread_id": "2"}}
-    res = await graph.ainvoke(
-        {"messages": [("human", "what's the weather in nyc")]}, config
-    )
-
-    latest_checkpoint = await checkpointer.aget(config)
-    latest_checkpoint_tuple = await checkpointer.aget_tuple(config)
-    checkpoint_tuples = [c async for c in checkpointer.alist(config)]
-```
-
-
-```python
-latest_checkpoint -``` - - - - - {'v': 2, - 'ts': '2025-04-22T04:38:51.638880+00:00', - 'id': '1f01f33b-0acb-6cce-8003-7a4f1a2cf90e', - 'channel_values': {'messages': [HumanMessage(content="what's the weather in nyc", additional_kwargs={}, response_metadata={}, id='9475ef5c-67f8-4fa7-b3f4-606b2d74391d'), - AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_haxMoLeSz5hkuUXcerR45Kqt', 'function': {'arguments': '{"city":"nyc"}', 'name': 'get_weather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 58, 'total_tokens': 74, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzhByznAjlt4j32zfgqTRXfmqNIN', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-d299d9a0-000f-4a4d-a135-373289ffc9b9-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'nyc'}, 'id': 'call_haxMoLeSz5hkuUXcerR45Kqt', 'type': 'tool_call'}], usage_metadata={'input_tokens': 58, 'output_tokens': 16, 'total_tokens': 74, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}), - ToolMessage(content='It might be cloudy in nyc', name='get_weather', id='54b58baa-d179-496f-a84c-f55a340874cc', tool_call_id='call_haxMoLeSz5hkuUXcerR45Kqt'), - AIMessage(content="The weather in NYC might be cloudy. 
Is there anything else you'd like to know?", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 88, 'total_tokens': 107, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzhCjd46jhDUvB2B01WHfyikY6he', 'finish_reason': 'stop', 'logprobs': None}, id='run-15e27c46-8433-443e-9909-7eea9a1ab2dc-0', usage_metadata={'input_tokens': 88, 'output_tokens': 19, 'total_tokens': 107, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}, - 'channel_versions': {'__start__': 2, - 'messages': 5, - 'branch:to:agent': 5, - 'branch:to:tools': 4}, - 'versions_seen': {'__input__': {}, - '__start__': {'__start__': 1}, - 'agent': {'branch:to:agent': 4}, - 'tools': {'branch:to:tools': 3}}, - 'pending_sends': []} - - - - -```python -latest_checkpoint_tuple -``` - - - - - CheckpointTuple(config={'configurable': {'thread_id': '2', 'checkpoint_ns': '', 'checkpoint_id': '1f01f33b-0acb-6cce-8003-7a4f1a2cf90e'}}, checkpoint={'v': 2, 'ts': '2025-04-22T04:38:51.638880+00:00', 'id': '1f01f33b-0acb-6cce-8003-7a4f1a2cf90e', 'channel_values': {'messages': [HumanMessage(content="what's the weather in nyc", additional_kwargs={}, response_metadata={}, id='9475ef5c-67f8-4fa7-b3f4-606b2d74391d'), AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_haxMoLeSz5hkuUXcerR45Kqt', 'function': {'arguments': '{"city":"nyc"}', 'name': 'get_weather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 58, 'total_tokens': 74, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 
'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzhByznAjlt4j32zfgqTRXfmqNIN', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-d299d9a0-000f-4a4d-a135-373289ffc9b9-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'nyc'}, 'id': 'call_haxMoLeSz5hkuUXcerR45Kqt', 'type': 'tool_call'}], usage_metadata={'input_tokens': 58, 'output_tokens': 16, 'total_tokens': 74, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}), ToolMessage(content='It might be cloudy in nyc', name='get_weather', id='54b58baa-d179-496f-a84c-f55a340874cc', tool_call_id='call_haxMoLeSz5hkuUXcerR45Kqt'), AIMessage(content="The weather in NYC might be cloudy. Is there anything else you'd like to know?", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 88, 'total_tokens': 107, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzhCjd46jhDUvB2B01WHfyikY6he', 'finish_reason': 'stop', 'logprobs': None}, id='run-15e27c46-8433-443e-9909-7eea9a1ab2dc-0', usage_metadata={'input_tokens': 88, 'output_tokens': 19, 'total_tokens': 107, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}, 'channel_versions': {'__start__': 2, 'messages': 5, 'branch:to:agent': 5, 'branch:to:tools': 4}, 'versions_seen': {'__input__': {}, '__start__': {'__start__': 1}, 'agent': {'branch:to:agent': 4}, 'tools': {'branch:to:tools': 3}}, 'pending_sends': []}, metadata={'source': 'loop', 'writes': {'agent': {'messages': [AIMessage(content="The weather in NYC might be cloudy. 
Is there anything else you'd like to know?", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 88, 'total_tokens': 107, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzhCjd46jhDUvB2B01WHfyikY6he', 'finish_reason': 'stop', 'logprobs': None}, id='run-15e27c46-8433-443e-9909-7eea9a1ab2dc-0', usage_metadata={'input_tokens': 88, 'output_tokens': 19, 'total_tokens': 107, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}}, 'step': 3, 'parents': {}, 'thread_id': '2'}, parent_config={'configurable': {'thread_id': '2', 'checkpoint_ns': '', 'checkpoint_id': '1f01f33b-0138-6a22-8002-a9a8ec4b7e4e'}}, pending_writes=[]) - - - - -```python -checkpoint_tuples -``` - - - - - [CheckpointTuple(config={'configurable': {'thread_id': '2', 'checkpoint_ns': '', 'checkpoint_id': '1f01f33b-0acb-6cce-8003-7a4f1a2cf90e'}}, checkpoint={'v': 2, 'ts': '2025-04-22T04:38:51.638880+00:00', 'id': '1f01f33b-0acb-6cce-8003-7a4f1a2cf90e', 'channel_values': {'messages': [HumanMessage(content="what's the weather in nyc", additional_kwargs={}, response_metadata={}, id='9475ef5c-67f8-4fa7-b3f4-606b2d74391d'), AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_haxMoLeSz5hkuUXcerR45Kqt', 'function': {'arguments': '{"city":"nyc"}', 'name': 'get_weather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 58, 'total_tokens': 74, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 
'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzhByznAjlt4j32zfgqTRXfmqNIN', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-d299d9a0-000f-4a4d-a135-373289ffc9b9-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'nyc'}, 'id': 'call_haxMoLeSz5hkuUXcerR45Kqt', 'type': 'tool_call'}], usage_metadata={'input_tokens': 58, 'output_tokens': 16, 'total_tokens': 74, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}), ToolMessage(content='It might be cloudy in nyc', name='get_weather', id='54b58baa-d179-496f-a84c-f55a340874cc', tool_call_id='call_haxMoLeSz5hkuUXcerR45Kqt'), AIMessage(content="The weather in NYC might be cloudy. Is there anything else you'd like to know?", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 88, 'total_tokens': 107, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzhCjd46jhDUvB2B01WHfyikY6he', 'finish_reason': 'stop', 'logprobs': None}, id='run-15e27c46-8433-443e-9909-7eea9a1ab2dc-0', usage_metadata={'input_tokens': 88, 'output_tokens': 19, 'total_tokens': 107, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}, 'channel_versions': {'__start__': 2, 'messages': 5, 'branch:to:agent': 5, 'branch:to:tools': 4}, 'versions_seen': {'__input__': {}, '__start__': {'__start__': 1}, 'agent': {'branch:to:agent': 4}, 'tools': {'branch:to:tools': 3}}, 'pending_sends': []}, metadata={'source': 'loop', 'writes': {'agent': {'messages': [AIMessage(content="The weather in NYC might be cloudy. 
Is there anything else you'd like to know?", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 88, 'total_tokens': 107, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzhCjd46jhDUvB2B01WHfyikY6he', 'finish_reason': 'stop', 'logprobs': None}, id='run-15e27c46-8433-443e-9909-7eea9a1ab2dc-0', usage_metadata={'input_tokens': 88, 'output_tokens': 19, 'total_tokens': 107, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}}, 'step': 3, 'parents': {}, 'thread_id': '2'}, parent_config={'configurable': {'thread_id': '2', 'checkpoint_ns': '', 'checkpoint_id': '1f01f33b-0138-6a22-8002-a9a8ec4b7e4e'}}, pending_writes=None), - CheckpointTuple(config={'configurable': {'thread_id': '2', 'checkpoint_ns': '', 'checkpoint_id': '1f01f33b-0138-6a22-8002-a9a8ec4b7e4e'}}, checkpoint={'v': 2, 'ts': '2025-04-22T04:38:50.634902+00:00', 'id': '1f01f33b-0138-6a22-8002-a9a8ec4b7e4e', 'channel_values': {'messages': [HumanMessage(content="what's the weather in nyc", additional_kwargs={}, response_metadata={}, id='9475ef5c-67f8-4fa7-b3f4-606b2d74391d'), AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_haxMoLeSz5hkuUXcerR45Kqt', 'function': {'arguments': '{"city":"nyc"}', 'name': 'get_weather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 58, 'total_tokens': 74, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 
'fp_79b79be41f', 'id': 'chatcmpl-BOzhByznAjlt4j32zfgqTRXfmqNIN', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-d299d9a0-000f-4a4d-a135-373289ffc9b9-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'nyc'}, 'id': 'call_haxMoLeSz5hkuUXcerR45Kqt', 'type': 'tool_call'}], usage_metadata={'input_tokens': 58, 'output_tokens': 16, 'total_tokens': 74, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}), ToolMessage(content='It might be cloudy in nyc', name='get_weather', id='54b58baa-d179-496f-a84c-f55a340874cc', tool_call_id='call_haxMoLeSz5hkuUXcerR45Kqt')], 'branch:to:agent': None}, 'channel_versions': {'__start__': 2, 'messages': 4, 'branch:to:agent': 4, 'branch:to:tools': 4}, 'versions_seen': {'__input__': {}, '__start__': {'__start__': 1}, 'agent': {'branch:to:agent': 2}, 'tools': {'branch:to:tools': 3}}, 'pending_sends': []}, metadata={'source': 'loop', 'writes': {'tools': {'messages': [ToolMessage(content='It might be cloudy in nyc', name='get_weather', id='54b58baa-d179-496f-a84c-f55a340874cc', tool_call_id='call_haxMoLeSz5hkuUXcerR45Kqt')]}}, 'step': 2, 'parents': {}, 'thread_id': '2'}, parent_config={'configurable': {'thread_id': '2', 'checkpoint_ns': '', 'checkpoint_id': '1f01f33b-0134-6404-8001-5d2722d53ba0'}}, pending_writes=None), - CheckpointTuple(config={'configurable': {'thread_id': '2', 'checkpoint_ns': '', 'checkpoint_id': '1f01f33b-0134-6404-8001-5d2722d53ba0'}}, checkpoint={'v': 2, 'ts': '2025-04-22T04:38:50.633105+00:00', 'id': '1f01f33b-0134-6404-8001-5d2722d53ba0', 'channel_values': {'messages': [HumanMessage(content="what's the weather in nyc", additional_kwargs={}, response_metadata={}, id='9475ef5c-67f8-4fa7-b3f4-606b2d74391d'), AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_haxMoLeSz5hkuUXcerR45Kqt', 'function': {'arguments': '{"city":"nyc"}', 'name': 'get_weather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': 
{'completion_tokens': 16, 'prompt_tokens': 58, 'total_tokens': 74, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzhByznAjlt4j32zfgqTRXfmqNIN', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-d299d9a0-000f-4a4d-a135-373289ffc9b9-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'nyc'}, 'id': 'call_haxMoLeSz5hkuUXcerR45Kqt', 'type': 'tool_call'}], usage_metadata={'input_tokens': 58, 'output_tokens': 16, 'total_tokens': 74, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})], 'branch:to:tools': None}, 'channel_versions': {'__start__': 2, 'messages': 3, 'branch:to:agent': 3, 'branch:to:tools': 3}, 'versions_seen': {'__input__': {}, '__start__': {'__start__': 1}, 'agent': {'branch:to:agent': 2}}, 'pending_sends': []}, metadata={'source': 'loop', 'writes': {'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_haxMoLeSz5hkuUXcerR45Kqt', 'function': {'arguments': '{"city":"nyc"}', 'name': 'get_weather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 58, 'total_tokens': 74, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzhByznAjlt4j32zfgqTRXfmqNIN', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-d299d9a0-000f-4a4d-a135-373289ffc9b9-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'nyc'}, 'id': 'call_haxMoLeSz5hkuUXcerR45Kqt', 'type': 'tool_call'}], 
usage_metadata={'input_tokens': 58, 'output_tokens': 16, 'total_tokens': 74, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}}, 'step': 1, 'parents': {}, 'thread_id': '2'}, parent_config={'configurable': {'thread_id': '2', 'checkpoint_ns': '', 'checkpoint_id': '1f01f33a-f6f8-65bc-8000-46d63ff6f934'}}, pending_writes=None), - CheckpointTuple(config={'configurable': {'thread_id': '2', 'checkpoint_ns': '', 'checkpoint_id': '1f01f33a-f6f8-65bc-8000-46d63ff6f934'}}, checkpoint={'v': 2, 'ts': '2025-04-22T04:38:49.559999+00:00', 'id': '1f01f33a-f6f8-65bc-8000-46d63ff6f934', 'channel_values': {'messages': [HumanMessage(content="what's the weather in nyc", additional_kwargs={}, response_metadata={}, id='9475ef5c-67f8-4fa7-b3f4-606b2d74391d')], 'branch:to:agent': None}, 'channel_versions': {'__start__': 2, 'messages': 2, 'branch:to:agent': 2}, 'versions_seen': {'__input__': {}, '__start__': {'__start__': 1}}, 'pending_sends': []}, metadata={'source': 'loop', 'writes': None, 'step': 0, 'parents': {}, 'thread_id': '2'}, parent_config={'configurable': {'thread_id': '2', 'checkpoint_ns': '', 'checkpoint_id': '1f01f33a-f6f5-6790-bfff-47d3da71ee7e'}}, pending_writes=None), - CheckpointTuple(config={'configurable': {'thread_id': '2', 'checkpoint_ns': '', 'checkpoint_id': '1f01f33a-f6f5-6790-bfff-47d3da71ee7e'}}, checkpoint={'v': 2, 'ts': '2025-04-22T04:38:49.558819+00:00', 'id': '1f01f33a-f6f5-6790-bfff-47d3da71ee7e', 'channel_values': {'__start__': {'messages': [['human', "what's the weather in nyc"]]}}, 'channel_versions': {'__start__': 1}, 'versions_seen': {'__input__': {}}, 'pending_sends': []}, metadata={'source': 'input', 'writes': {'__start__': {'messages': [['human', "what's the weather in nyc"]]}}, 'step': -1, 'parents': {}, 'thread_id': '2'}, parent_config=None, pending_writes=None)] - - - - -```python - -``` diff --git 
a/tutorial/markdown/generated/vector-search-cookbook/llamaindex-fts-RAG_with_Couchbase_Capella_and_OpenAI.md b/tutorial/markdown/generated/vector-search-cookbook/llamaindex-fts-RAG_with_Couchbase_Capella_and_OpenAI.md deleted file mode 100644 index 3593cc0..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/llamaindex-fts-RAG_with_Couchbase_Capella_and_OpenAI.md +++ /dev/null @@ -1,505 +0,0 @@ ---- -# frontmatter -path: "/tutorial-openai-llamaindex-rag-with-fts" -title: "Retrieval-Augmented Generation (RAG) with OpenAI, LlamaIndex and Couchbase Search Vector Index" -short_title: "RAG with OpenAI, LlamaIndex and Couchbase Search Vector Index" -description: - - Learn how to build a semantic search engine using Couchbase's Search Vector Index. - - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with embeddings generated by OpenAI services. - - You will understand how to perform Retrieval-Augmented Generation (RAG) using LlamaIndex, Couchbase and OpenAI services. -content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - OpenAI - - Artificial Intelligence - - LlamaIndex - - FTS -sdk_language: - - python -length: 60 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/llamaindex/fts/RAG_with_Couchbase_Capella_and_OpenAI.ipynb) - -# Introduction - -In this guide, we will walk you through building a Retrieval Augmented Generation (RAG) application using Couchbase Capella as the database and OpenAI's [gpt-4o](https://platform.openai.com/docs/models/gpt-4o) model as the large language model. We will use the [text-embedding-3-large](https://platform.openai.com/docs/guides/embeddings/embedding-models) model for generating embeddings. 
- -This notebook demonstrates how to build a RAG system using: -- The [BBC News dataset](https://huggingface.co/datasets/RealTimeData/bbc_news_alltime) containing news articles -- Couchbase Capella as the vector store with FTS (Full Text Search) for vector index creation -- LlamaIndex framework for the RAG pipeline -- OpenAI for embeddings and text generation - -We leverage Couchbase's Full Text Search (FTS) service to create and manage vector indexes, enabling efficient semantic search capabilities. FTS provides the infrastructure for storing, indexing, and querying high-dimensional vector embeddings alongside traditional text search functionality. - -Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial will equip you with the knowledge to create a fully functional RAG system using OpenAI Services and LlamaIndex. - -# Before you start - -## Create and Deploy Your Operational cluster on Capella - -To get started with Couchbase Capella, create an account and use it to deploy an operational cluster. - -To know more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html). - -### Couchbase Capella Configuration - -When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met: - -* Have a multi-node Capella cluster running the Data, Query, Index, and Search services. -* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the bucket (Read and Write) used in the application. -* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running. 
- -### OpenAI Models Setup - -In order to create the RAG application, we need an embedding model to ingest the documents for Vector Search and a large language model (LLM) for generating the responses based on the context. - -For this implementation, we'll use OpenAI's models which provide state-of-the-art performance for both embeddings and text generation: - -**Embedding Model**: We'll use OpenAI's `text-embedding-3-large` model, which provides high-quality embeddings with 3,072 dimensions for semantic search capabilities. - -**Large Language Model**: We'll use OpenAI's `gpt-4o` model for generating responses based on the retrieved context. This model offers excellent reasoning capabilities and can handle complex queries effectively. - -**Prerequisites for OpenAI Integration**: -* Create an OpenAI account at [platform.openai.com](https://platform.openai.com) -* Generate an API key from your OpenAI dashboard -* Ensure you have sufficient credits or a valid payment method set up -* Set up your API key as an environment variable or input it securely in the notebook - -For more details about OpenAI's models and pricing, please refer to the [OpenAI documentation](https://platform.openai.com/docs/models). - - -# Installing Necessary Libraries -To build our RAG system, we need a set of libraries. The libraries we install handle everything from connecting to databases to performing AI tasks. Each library has a specific role: Couchbase libraries manage database operations, LlamaIndex handles AI model integrations, and we will use the OpenAI SDK for generating embeddings and calling OpenAI's language models. 
- - - -```python -# Install required packages -%pip install datasets llama-index-vector-stores-couchbase==0.6.0 llama-index-embeddings-openai==0.5.1 llama-index-llms-openai==0.5.6 llama-index==0.14.2 -``` - -# Importing Necessary Libraries -The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. - - - -```python -import getpass -import base64 -import logging -import sys -import time -from datetime import timedelta - -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.exceptions import CouchbaseException -from couchbase.options import ClusterOptions - -from datasets import load_dataset - -from llama_index.core import Settings, Document -from llama_index.core.ingestion import IngestionPipeline -from llama_index.core.node_parser import SentenceSplitter -from llama_index.vector_stores.couchbase import CouchbaseSearchVectorStore -from llama_index.embeddings.openai import OpenAIEmbedding -from llama_index.llms.openai import OpenAI - -``` - -# Loading Sensitive Information -In this section, we prompt the user to input essential configuration settings needed. These settings include sensitive information like database credentials, collection names, and API keys. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security. - -The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code. - -**OPENAI_API_KEY** is your OpenAI API key which can be obtained from your OpenAI dashboard at [platform.openai.com](https://platform.openai.com/api-keys). 
- -**INDEX_NAME** is the name of the FTS search index we will use for vector search operations. - - -```python -CB_CONNECTION_STRING = input("Couchbase Cluster URL (default: localhost): ") or "localhost" -CB_USERNAME = input("Couchbase Username (default: admin): ") or "admin" -CB_PASSWORD = input("Couchbase password (default: Password@12345): ") or "Password@12345" -CB_BUCKET_NAME = input("Couchbase Bucket: ") -SCOPE_NAME = input("Couchbase Scope: ") -COLLECTION_NAME = input("Couchbase Collection: ") -INDEX_NAME = input("Vector Search Index: ") -OPENAI_API_KEY = input("OpenAI API Key: ") - -# Check if the variables are correctly loaded -if not all([CB_CONNECTION_STRING, CB_USERNAME, CB_PASSWORD, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME, INDEX_NAME, OPENAI_API_KEY]): - raise ValueError("All configuration variables must be provided.") - -``` - -# Setting Up Logging -Logging is essential for tracking the execution of our script and debugging any issues that may arise. We set up a logger that will display information about the script's progress, including timestamps and log levels. - - - -```python -# Configure logging -logging.basicConfig( - level=logging.INFO, - format="%(asctime)s - %(levelname)s - %(message)s", - handlers=[logging.StreamHandler(sys.stdout)], -) -``` - -# Connecting to Couchbase Capella -The next step is to establish a connection to our Couchbase Capella cluster. This connection will allow us to interact with the database, store and retrieve documents, and perform vector searches. 
- - - -```python -try: - # Initialize the Couchbase Cluster - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - - # Connect to the cluster - cluster = Cluster(CB_CONNECTION_STRING, options) - - # Wait for the cluster to be ready - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to the Couchbase cluster") -except CouchbaseException as e: - raise RuntimeError(f"Failed to connect to Couchbase: {str(e)}") -``` - -# Setting Up the Bucket, Scope, and Collection -Before we can store our data, we need to ensure that the appropriate bucket, scope, and collection exist in our Couchbase cluster. The code below checks if these components exist and creates them if they don't, providing a foundation for storing our vector embeddings and documents. - - -```python -from couchbase.management.buckets import CreateBucketSettings -from couchbase.management.search import SearchIndex -import json - -# Create bucket if it does not exist -bucket_manager = cluster.buckets() -try: - bucket_manager.get_bucket(CB_BUCKET_NAME) - print(f"Bucket '{CB_BUCKET_NAME}' already exists.") -except Exception as e: - print(f"Bucket '{CB_BUCKET_NAME}' does not exist. Creating bucket...") - bucket_settings = CreateBucketSettings(name=CB_BUCKET_NAME, ram_quota_mb=500) - bucket_manager.create_bucket(bucket_settings) - print(f"Bucket '{CB_BUCKET_NAME}' created successfully.") - -# Create scope and collection if they do not exist -collection_manager = cluster.bucket(CB_BUCKET_NAME).collections() -scopes = collection_manager.get_all_scopes() -scope_exists = any(scope.name == SCOPE_NAME for scope in scopes) - -if scope_exists: - print(f"Scope '{SCOPE_NAME}' already exists.") -else: - print(f"Scope '{SCOPE_NAME}' does not exist. 
Creating scope...") - collection_manager.create_scope(SCOPE_NAME) - print(f"Scope '{SCOPE_NAME}' created successfully.") - -collections = [collection.name for scope in scopes if scope.name == SCOPE_NAME for collection in scope.collections] -collection_exists = COLLECTION_NAME in collections - -if collection_exists: - print(f"Collection '{COLLECTION_NAME}' already exists in scope '{SCOPE_NAME}'.") -else: - print(f"Collection '{COLLECTION_NAME}' does not exist in scope '{SCOPE_NAME}'. Creating collection...") - collection_manager.create_collection(collection_name=COLLECTION_NAME, scope_name=SCOPE_NAME) - print(f"Collection '{COLLECTION_NAME}' created successfully.") - -``` - -# Creating or Updating Search Indexes -With the index definition loaded, the next step is to create or update the Vector Search Index in Couchbase. This step is crucial because it optimizes our database for vector similarity search operations, allowing us to perform searches based on the semantic content of documents rather than just keywords. By creating or updating a Vector Search Index, we enable our RAG to handle complex queries that involve finding semantically similar documents using vector embeddings, which is essential for a robust RAG system. 
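The code below loads the index definition from an `fts_index.json` file kept alongside the notebook. That file is not reproduced in this tutorial, but as a rough sketch (field names and options should be verified against the Couchbase vector search documentation), a minimal Search vector index definition has the following shape. The index `name`, type-mapping key, and `sourceName` here are placeholders that the code overwrites with your inputs, and `dims` must match the 3,072 dimensions of `text-embedding-3-large`:

```json
{
  "name": "vector_search_index",
  "type": "fulltext-index",
  "sourceType": "gocbcore",
  "sourceName": "bucket_name",
  "params": {
    "doc_config": {
      "mode": "scope.collection.type_field"
    },
    "mapping": {
      "types": {
        "scope_name.collection_name": {
          "enabled": true,
          "properties": {
            "embedding": {
              "fields": [
                {
                  "name": "embedding",
                  "type": "vector",
                  "dims": 3072,
                  "similarity": "dot_product",
                  "index": true
                }
              ]
            }
          }
        }
      }
    }
  }
}
```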
- - - -```python -# Create search index from search_index.json file at scope level -with open('fts_index.json', 'r') as search_file: - search_index_definition = SearchIndex.from_json(json.load(search_file)) - - # Update search index definition with user inputs - search_index_definition.name = INDEX_NAME - search_index_definition.source_name = CB_BUCKET_NAME - - # Update types mapping - old_type_key = next(iter(search_index_definition.params['mapping']['types'].keys())) - type_obj = search_index_definition.params['mapping']['types'].pop(old_type_key) - search_index_definition.params['mapping']['types'][f"{SCOPE_NAME}.{COLLECTION_NAME}"] = type_obj - - - search_index_name = search_index_definition.name - - # Get scope-level search manager - scope_search_manager = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME).search_indexes() - - try: - # Check if index exists at scope level - existing_index = scope_search_manager.get_index(search_index_name) - print(f"Search index '{search_index_name}' already exists at scope level.") - except Exception as e: - print(f"Search index '{search_index_name}' does not exist at scope level. Creating search index from fts_index.json...") - scope_search_manager.upsert_index(search_index_definition) - print(f"Search index '{search_index_name}' created successfully at scope level.") -``` - -# Load the BBC News Dataset -To build a RAG engine, we need data to search through. We use the [BBC Realtime News dataset](https://huggingface.co/datasets/RealTimeData/bbc_news_alltime), a dataset with up-to-date BBC news articles grouped by month. This dataset contains articles that were created after the LLM was trained. It will showcase the use of RAG to augment the LLM. - -The BBC News dataset's varied content allows us to simulate real-world scenarios where users ask complex questions, enabling us to fine-tune our RAG's ability to understand and respond to various types of queries. 
 - - -```python -try: - news_dataset = load_dataset('RealTimeData/bbc_news_alltime', '2024-12', split="train") - print(f"Loaded the BBC News dataset with {len(news_dataset)} rows") -except Exception as e: - raise ValueError(f"Error loading BBC News dataset: {str(e)}") -``` - -## Preview the Data - - -```python -# Print the first two examples from the dataset -print("Dataset columns:", news_dataset.column_names) -print("\nFirst two examples:") -print(news_dataset[:2]) -``` - -## Preparing the Data for RAG - -We need to extract the context passages from the dataset to use as our knowledge base for the RAG system. - - -```python -import hashlib - -news_articles = news_dataset -unique_articles = {} - -for article in news_articles: - content = article.get("content") - if content: - content_hash = hashlib.md5(content.encode()).hexdigest() # Generate hash of content - if content_hash not in unique_articles: - unique_articles[content_hash] = article # Store full article - -unique_news_articles = list(unique_articles.values()) # Convert back to list - -print(f"We have {len(unique_news_articles)} unique articles in our database.") - -``` - -# Creating Embeddings using OpenAI -Embeddings are numerical representations of text that capture semantic meaning. Unlike keyword-based search, embeddings enable semantic search to understand context and retrieve documents that are conceptually similar even without exact keyword matches. We'll use OpenAI's `text-embedding-3-large` model to create high-quality embeddings with 3,072 dimensions. This model transforms our text data into vector representations that can be efficiently searched, with a batch size of 30 for optimal processing. 
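Before wiring up the real embedding model, it may help to see what "semantic similarity" means mechanically: vector stores compare embeddings with a similarity metric such as cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration (real `text-embedding-3-large` vectors have 3,072 dimensions):

```python
import math

def cosine_similarity(a, b):
    # cos(a, b) = dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (illustrative values only)
query_vec = [0.9, 0.1, 0.0]
doc_about_same_topic = [0.8, 0.2, 0.1]
doc_about_other_topic = [0.0, 0.1, 0.9]

# Vectors pointing in similar directions score near 1; unrelated ones score near 0
print(cosine_similarity(query_vec, doc_about_same_topic))
print(cosine_similarity(query_vec, doc_about_other_topic))
```

This is the same comparison the vector store performs at scale over the stored document embeddings.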
- - - -```python -try: - # Set up the embedding model - embed_model = OpenAIEmbedding( - api_key=OPENAI_API_KEY, - embed_batch_size=30, - model="text-embedding-3-large" - ) - - # Configure LlamaIndex to use this embedding model - Settings.embed_model = embed_model - print("Successfully created embedding model") -except Exception as e: - raise ValueError(f"Error creating embedding model: {str(e)}") -``` - -# Testing the Embeddings Model -We can test the embeddings model by generating an embedding for a string - - -```python -test_embedding = embed_model.get_text_embedding("this is a test sentence") -print(f"Embedding dimension: {len(test_embedding)}") -``` - -# Setting Up the Couchbase Vector Store -The vector store is set up to store the documents from the dataset. The vector store is essentially a database optimized for storing and retrieving high-dimensional vectors. - - -```python -try: - # Create the Couchbase vector store - vector_store = CouchbaseSearchVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - index_name=INDEX_NAME, - ) - print("Successfully created vector store") -except Exception as e: - raise ValueError(f"Failed to create vector store: {str(e)}") -``` - -# Creating LlamaIndex Documents -In this section, we'll process our news articles and create LlamaIndex Document objects. -Each Document is created with specific metadata and formatting templates to control what the LLM and embedding model see. -We'll observe examples of the formatted content to understand how the documents are structured. 
 - - -```python -from llama_index.core.schema import MetadataMode - -llama_documents = [] -# Create a LlamaIndex Document for each unique article -for article in unique_news_articles: - try: - document = Document( - text=article["content"], - metadata={ - "title": article["title"], - "description": article["description"], - "published_date": article["published_date"], - "link": article["link"], - }, - excluded_llm_metadata_keys=["description"], - excluded_embed_metadata_keys=["description", "published_date", "link"], - metadata_template="{key}=>{value}", - text_template="Metadata: \n{metadata_str}\n-----\nContent: {content}", - ) - llama_documents.append(document) - except Exception as e: - print(f"Failed to create document: {str(e)}") - continue - -# Observing an example of what the LLM and Embedding model receive as input -print("The LLM sees this:") -print(llama_documents[0].get_content(metadata_mode=MetadataMode.LLM)) -print("The Embedding model sees this:") -print(llama_documents[0].get_content(metadata_mode=MetadataMode.EMBED)) - -``` - -# Creating and Running the Ingestion Pipeline - -In this section, we'll create an ingestion pipeline to process our documents. The pipeline will: - -1. Split the documents into smaller chunks (nodes) using the SentenceSplitter -2. Generate embeddings for each node using our embedding model -3. Store these nodes with their embeddings in our Couchbase vector store - -This process transforms our raw documents into a searchable knowledge base that can be queried semantically. 
 - - -```python -# Process documents: split into nodes, generate embeddings, and store in the vector database -index_pipeline = IngestionPipeline( - transformations=[SentenceSplitter(), embed_model], - vector_store=vector_store, -) - -index_pipeline.run(documents=llama_documents) -``` - -# Using OpenAI's Large Language Model (LLM) -Large language models are AI systems trained to understand and generate human language. We'll be using OpenAI's `gpt-4o` model to process user queries and generate meaningful responses based on the retrieved context from our Couchbase vector store. This model is a key component of our RAG system, allowing it to go beyond simple keyword matching and truly understand the intent behind a query. By integrating OpenAI's LLM, we equip our RAG system with the ability to interpret complex queries, understand the nuances of language, and provide more accurate and contextually relevant responses. - -The language model's ability to understand context and generate coherent responses is what makes our RAG system truly intelligent. It can not only find the right information but also present it in a way that is useful and understandable to the user. - -The LLM is configured using LlamaIndex's OpenAI integration, which authenticates with your OpenAI API key. - - -```python -try: - # Set up the LLM - llm = OpenAI( - api_key=OPENAI_API_KEY, - model="gpt-4o", - ) - - # Configure LlamaIndex to use this LLM - Settings.llm = llm - logging.info("Successfully created the OpenAI LLM") -except Exception as e: - raise ValueError(f"Error creating OpenAI LLM: {str(e)}") -``` - -# Creating the Vector Store Index - -In this section, we'll create a VectorStoreIndex from our Couchbase vector store. This index serves as the foundation for our RAG system, enabling semantic search and efficient retrieval of relevant information. 
 - -The VectorStoreIndex provides a high-level interface to interact with our vector store, allowing us to: -1. Perform semantic searches based on user queries -2. Retrieve the most relevant documents or chunks -3. Generate contextually appropriate responses using our LLM - - - -```python -# Create your index -from llama_index.core import VectorStoreIndex - -index = VectorStoreIndex.from_vector_store(vector_store) -rag = index.as_query_engine() -``` - -# Retrieval-Augmented Generation (RAG) with Couchbase and LlamaIndex - -Let's test our RAG system by performing a semantic search on a sample query. In this example, we'll use a question about Pep Guardiola's reaction to Manchester City's recent form. The RAG system will: - -1. Process the natural language query -2. Search through our vector database for relevant information -3. Retrieve the most semantically similar documents -4. Generate a comprehensive response using the LLM - -This demonstrates how our system combines the power of vector search with language model capabilities to provide accurate, contextual answers based on the information in our database. - - -```python -# Sample query from the dataset - -query = "What was Pep Guardiola's reaction to Manchester City's recent form?" - -try: - # Perform the semantic search - start_time = time.time() - response = rag.query(query) - search_elapsed_time = time.time() - start_time - - # Display search results - print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):") - print(response) - -except Exception as e: - raise RuntimeError(f"Error performing semantic search: {e}") -``` - -# Conclusion -In this tutorial, we've built a Retrieval Augmented Generation (RAG) system using Couchbase Capella, OpenAI and LlamaIndex. 
We used the BBC News dataset, which contains real-time news articles, to demonstrate how RAG can be used to answer questions about current events and provide up-to-date information that extends beyond the LLM's training data. - -The key components of our RAG system include: - -1. **Couchbase Capella** as the vector database for storing and retrieving document embeddings -2. **LlamaIndex** as the framework for connecting our data to the LLM -3. **OpenAI Services** for generating embeddings (`text-embedding-3-large`) and LLM responses (`gpt-4o`) - -This approach allows us to enhance the capabilities of large language models by grounding their responses in specific, up-to-date information from our knowledge base. diff --git a/tutorial/markdown/generated/vector-search-cookbook/llamaindex-gsi-RAG_with_Couchbase_Capella_and_OpenAI.md b/tutorial/markdown/generated/vector-search-cookbook/llamaindex-gsi-RAG_with_Couchbase_Capella_and_OpenAI.md deleted file mode 100644 index 20a597a..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/llamaindex-gsi-RAG_with_Couchbase_Capella_and_OpenAI.md +++ /dev/null @@ -1,588 +0,0 @@ ---- -# frontmatter -path: "/tutorial-openai-llamaindex-rag-with-gsi" -title: "RAG with OpenAI, LlamaIndex and Couchbase Hyperscale and Composite Vector Indexes" -short_title: "RAG with OpenAI, LlamaIndex and Couchbase CVI and HVI" -description: - - Learn how to build a semantic search engine using Couchbase's Hyperscale and Composite Vector Indexes. - - This tutorial demonstrates how to integrate Couchbase's GSI vector search capabilities with OpenAI embeddings. - - You will understand how to perform Retrieval-Augmented Generation (RAG) using LlamaIndex and GSI vector indexes. 
-content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - OpenAI - - Artificial Intelligence - - LlamaIndex - - GSI -sdk_language: - - python -length: 60 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/llamaindex/gsi/RAG_with_Couchbase_Capella_and_OpenAI.ipynb) - -# Introduction - -In this guide, we will walk you through building a Retrieval Augmented Generation (RAG) application using Couchbase Capella as the database and OpenAI's [gpt-4o](https://platform.openai.com/docs/models/gpt-4o) model as the large language model. We will use the [text-embedding-3-large](https://platform.openai.com/docs/guides/embeddings/embedding-models) model for generating embeddings. - -This notebook demonstrates how to build a RAG system using: -- The [BBC News dataset](https://huggingface.co/datasets/RealTimeData/bbc_news_alltime) containing news articles -- Couchbase Capella as the vector store with GSI (Global Secondary Index) for vector search -- LlamaIndex framework for the RAG pipeline -- OpenAI for embeddings and text generation - -We leverage Couchbase's Global Secondary Index (GSI) vector search capabilities to create and manage vector indexes, enabling efficient semantic search. GSI provides high-performance vector search with support for both Hyperscale Vector Indexes (BHIVE) and Composite Vector Indexes, designed to scale to billions of vectors with a low memory footprint and optimized concurrent operations. - -Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial will equip you with the knowledge to create a fully functional RAG system using OpenAI Services and LlamaIndex with Couchbase's advanced GSI vector search. 
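For orientation, GSI vector indexes are defined with SQL++ DDL rather than a JSON definition file. The statements below are an illustrative sketch only: the keyspace, field, and index names are placeholders, and the exact `CREATE VECTOR INDEX` syntax and `WITH` options (`dimension`, `similarity`, and so on) should be verified against the Couchbase documentation for your server version:

```sql
-- Illustrative sketch only; verify syntax against the Couchbase docs.
-- Hyperscale (BHIVE) vector index on the embedding field:
CREATE VECTOR INDEX hyperscale_vec_idx
  ON `my_bucket`.`my_scope`.`my_collection`(embedding VECTOR)
  WITH {"dimension": 3072, "similarity": "cosine"};

-- Composite vector index: a scalar key plus the vector key in one index:
CREATE INDEX composite_vec_idx
  ON `my_bucket`.`my_scope`.`my_collection`(published_date, embedding VECTOR)
  WITH {"dimension": 3072, "similarity": "cosine"};
```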
- -# Before you start - -## Create and Deploy Your Operational cluster on Capella - -To get started with Couchbase Capella, create an account and use it to deploy an operational cluster. - -To know more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html). - -### Couchbase Capella Configuration - -When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met: - -* Have a multi-node Capella cluster running the Data, Query, Index, and Search services. -* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the bucket (Read and Write) used in the application. -* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running. - -### OpenAI Models Setup - -In order to create the RAG application, we need an embedding model to ingest the documents for Vector Search and a large language model (LLM) for generating the responses based on the context. - -For this implementation, we'll use OpenAI's models which provide state-of-the-art performance for both embeddings and text generation: - -**Embedding Model**: We'll use OpenAI's `text-embedding-3-large` model, which provides high-quality embeddings with 3,072 dimensions for semantic search capabilities. - -**Large Language Model**: We'll use OpenAI's `gpt-4o` model for generating responses based on the retrieved context. This model offers excellent reasoning capabilities and can handle complex queries effectively. 
- -**Prerequisites for OpenAI Integration**: -* Create an OpenAI account at [platform.openai.com](https://platform.openai.com) -* Generate an API key from your OpenAI dashboard -* Ensure you have sufficient credits or a valid payment method set up -* Set up your API key as an environment variable or input it securely in the notebook - -For more details about OpenAI's models and pricing, please refer to the [OpenAI documentation](https://platform.openai.com/docs/models). - - -# Installing Necessary Libraries -To build our RAG system, we need a set of libraries. The libraries we install handle everything from connecting to databases to performing AI tasks. Each library has a specific role: Couchbase libraries manage database operations, LlamaIndex handles AI model integrations, and we will use the OpenAI SDK for generating embeddings and calling OpenAI's language models. - - - -```python -# Install required packages -%pip install datasets llama-index-vector-stores-couchbase==0.6.0 llama-index-embeddings-openai==0.5.1 llama-index-llms-openai==0.5.6 llama-index==0.14.2 -``` - -# Importing Necessary Libraries -The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. 
 - - -```python -import getpass -import base64 -import logging -import sys -import time -from datetime import timedelta - -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.exceptions import CouchbaseException -from couchbase.options import ClusterOptions, KnownConfigProfiles, QueryOptions - -from datasets import load_dataset - -from llama_index.core import Settings, Document -from llama_index.core.ingestion import IngestionPipeline -from llama_index.core.node_parser import SentenceSplitter -from llama_index.vector_stores.couchbase import CouchbaseQueryVectorStore, QueryVectorSearchSimilarity, QueryVectorSearchType -from llama_index.embeddings.openai import OpenAIEmbedding -from llama_index.llms.openai import OpenAI - -``` - -# Loading Sensitive Information -In this section, we prompt the user to input the essential configuration settings. These settings include sensitive information like database credentials, collection names, and API keys. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security. - -The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code. - -**OPENAI_API_KEY** is your OpenAI API key which can be obtained from your OpenAI dashboard at [platform.openai.com](https://platform.openai.com/api-keys). - -**INDEX_NAME** is the name of the GSI vector index we will create for vector search operations. 
- 


```python
CB_CONNECTION_STRING = input("Couchbase Cluster URL (default: localhost): ") or "localhost"
CB_USERNAME = input("Couchbase Username (default: admin): ") or "admin"
# Use getpass for secrets so they are not echoed to the console
CB_PASSWORD = getpass.getpass("Couchbase password (default: Password@12345): ") or "Password@12345"
CB_BUCKET_NAME = input("Couchbase Bucket: ")
SCOPE_NAME = input("Couchbase Scope: ")
COLLECTION_NAME = input("Couchbase Collection: ")
INDEX_NAME = input("Vector Search Index: ")
OPENAI_API_KEY = getpass.getpass("OpenAI API Key: ")

# Check if the variables are correctly loaded
if not all([CB_CONNECTION_STRING, CB_USERNAME, CB_PASSWORD, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME, INDEX_NAME, OPENAI_API_KEY]):
    raise ValueError("All configuration variables must be provided.")

```

# Setting Up Logging
Logging is essential for tracking the execution of our script and debugging any issues that may arise. We set up a logger that will display information about the script's progress, including timestamps and log levels.



```python
# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[logging.StreamHandler(sys.stdout)],
)
```

# Connecting to Couchbase Capella
The next step is to establish a connection to our Couchbase Capella cluster. This connection will allow us to interact with the database, store and retrieve documents, and perform vector searches. 
- - - -```python -try: - # Initialize the Couchbase Cluster - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - options.apply_profile(KnownConfigProfiles.WanDevelopment) - # Connect to the cluster - cluster = Cluster(CB_CONNECTION_STRING, options) - - # Wait for the cluster to be ready - cluster.wait_until_ready(timedelta(seconds=5)) - - logging.info("Successfully connected to the Couchbase cluster") -except CouchbaseException as e: - raise RuntimeError(f"Failed to connect to Couchbase: {str(e)}") -``` - -# Setting Up the Bucket, Scope, and Collection -Before we can store our data, we need to ensure that the appropriate bucket, scope, and collection exist in our Couchbase cluster. The code below checks if these components exist and creates them if they don't, providing a foundation for storing our vector embeddings and documents. - - -```python -from couchbase.management.buckets import CreateBucketSettings -import json - -# Create bucket if it does not exist -bucket_manager = cluster.buckets() -try: - bucket_manager.get_bucket(CB_BUCKET_NAME) - print(f"Bucket '{CB_BUCKET_NAME}' already exists.") -except Exception as e: - print(f"Bucket '{CB_BUCKET_NAME}' does not exist. Creating bucket...") - bucket_settings = CreateBucketSettings(name=CB_BUCKET_NAME, ram_quota_mb=500) - bucket_manager.create_bucket(bucket_settings) - print(f"Bucket '{CB_BUCKET_NAME}' created successfully.") - -# Create scope and collection if they do not exist -collection_manager = cluster.bucket(CB_BUCKET_NAME).collections() -scopes = collection_manager.get_all_scopes() -scope_exists = any(scope.name == SCOPE_NAME for scope in scopes) - -if scope_exists: - print(f"Scope '{SCOPE_NAME}' already exists.") -else: - print(f"Scope '{SCOPE_NAME}' does not exist. 
Creating scope...") - collection_manager.create_scope(SCOPE_NAME) - print(f"Scope '{SCOPE_NAME}' created successfully.") - -collections = [collection.name for scope in scopes if scope.name == SCOPE_NAME for collection in scope.collections] -collection_exists = COLLECTION_NAME in collections - -if collection_exists: - print(f"Collection '{COLLECTION_NAME}' already exists in scope '{SCOPE_NAME}'.") -else: - print(f"Collection '{COLLECTION_NAME}' does not exist in scope '{SCOPE_NAME}'. Creating collection...") - collection_manager.create_collection(collection_name=COLLECTION_NAME, scope_name=SCOPE_NAME) - print(f"Collection '{COLLECTION_NAME}' created successfully.") - -scope = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME) -``` - -# Setting Up GSI Vector Search -In this section, we'll set up the Couchbase vector store using GSI (Global Secondary Index) for high-performance vector search. Unlike FTS-based vector search, GSI vector search provides optimized performance for pure vector similarity operations and can scale to billions of vectors with low memory footprint. - -GSI vector search supports two main index types: -- **Hyperscale Vector Indexes (BHIVE)**: Best for pure vector searches with high performance and concurrent operations -- **Composite Vector Indexes**: Best for filtered vector searches combining vector similarity with scalar filtering - -For this tutorial, we'll use the Query vector store. - - -# Load the BBC News Dataset -To build a RAG engine, we need data to search through. We use the [BBC Realtime News dataset](https://huggingface.co/datasets/RealTimeData/bbc_news_alltime), a dataset with up-to-date BBC news articles grouped by month. This dataset contains articles that were created after the LLM was trained. It will showcase the use of RAG to augment the LLM. 
- 

The BBC News dataset's varied content allows us to simulate real-world scenarios where users ask complex questions, enabling us to fine-tune our RAG's ability to understand and respond to various types of queries.



```python
try:
    news_dataset = load_dataset('RealTimeData/bbc_news_alltime', '2024-12', split="train")
    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
except Exception as e:
    raise ValueError(f"Error loading BBC News dataset: {str(e)}")
```

## Preview the Data


```python
# Print the first two examples from the dataset
print("Dataset columns:", news_dataset.column_names)
print("\nFirst two examples:")
print(news_dataset[:2])
```

## Preparing the Data for RAG

We need to extract the context passages from the dataset to use as our knowledge base for the RAG system.


```python
import hashlib

news_articles = news_dataset
unique_articles = {}

for article in news_articles:
    content = article.get("content")
    if content:
        content_hash = hashlib.md5(content.encode()).hexdigest()  # Generate hash of content
        if content_hash not in unique_articles:
            unique_articles[content_hash] = article  # Store full article

unique_news_articles = list(unique_articles.values())  # Convert back to list

print(f"We have {len(unique_news_articles)} unique articles in our database.")

```

# Creating Embeddings using OpenAI
Embeddings are numerical representations of text that capture semantic meaning. Unlike keyword-based search, embeddings enable semantic search to understand context and retrieve documents that are conceptually similar even without exact keyword matches. We'll use OpenAI's `text-embedding-3-large` model to create high-quality embeddings with 3,072 dimensions. This model transforms our text data into vector representations that can be efficiently searched, with a batch size of 30 for optimal processing. 
- - - -```python -try: - # Set up the embedding model - embed_model = OpenAIEmbedding( - api_key=OPENAI_API_KEY, - embed_batch_size=30, - model="text-embedding-3-large" - ) - - # Configure LlamaIndex to use this embedding model - Settings.embed_model = embed_model - print("Successfully created embedding model") -except Exception as e: - raise ValueError(f"Error creating embedding model: {str(e)}") -``` - -# Testing the Embeddings Model -We can test the embeddings model by generating an embedding for a string - - -```python -test_embedding = embed_model.get_text_embedding("this is a test sentence") -print(f"Embedding dimension: {len(test_embedding)}") -``` - -# Setting Up the Couchbase GSI Vector Store -The GSI vector store is set up to store the documents from the dataset using Couchbase's Global Secondary Index vector search capabilities. This vector store is optimized for high-performance vector similarity search operations and can scale to billions of vectors. - - -```python -try: - # Create the Couchbase GSI vector store - vector_store = CouchbaseQueryVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - search_type=QueryVectorSearchType.ANN, - similarity=QueryVectorSearchSimilarity.DOT, - nprobes=10 - ) - print("Successfully created GSI vector store") -except Exception as e: - raise ValueError(f"Failed to create GSI vector store: {str(e)}") -``` - -# Creating LlamaIndex Documents -In this section, we'll process our news articles and create LlamaIndex Document objects. -Each Document is created with specific metadata and formatting templates to control what the LLM and embedding model see. -We'll observe examples of the formatted content to understand how the documents are structured. 
- 


```python
from llama_index.core.schema import MetadataMode

llama_documents = []
# Build a Document for every unique article
for article in unique_news_articles:
    try:
        document = Document(
            text=article["content"],
            metadata={
                "title": article["title"],
                "description": article["description"],
                "published_date": article["published_date"],
                "link": article["link"],
            },
            excluded_llm_metadata_keys=["description"],
            excluded_embed_metadata_keys=["description", "published_date", "link"],
            metadata_template="{key}=>{value}",
            text_template="Metadata: \n{metadata_str}\n-----\nContent: {content}",
        )
        llama_documents.append(document)
    except Exception as e:
        print(f"Failed to create document: {str(e)}")
        continue

# Observing an example of what the LLM and Embedding model receive as input
print("The LLM sees this:")
print(llama_documents[0].get_content(metadata_mode=MetadataMode.LLM))
print("The Embedding model sees this:")
print(llama_documents[0].get_content(metadata_mode=MetadataMode.EMBED))


```

# Creating and Running the Ingestion Pipeline

In this section, we'll create an ingestion pipeline to process our documents. The pipeline will:

1. Split the documents into smaller chunks (nodes) using the SentenceSplitter
2. Generate embeddings for each node using our embedding model
3. Store these nodes with their embeddings in our Couchbase vector store

This process transforms our raw documents into a searchable knowledge base that can be queried semantically. 
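To build intuition for step 1 before running the full pipeline, here is a deliberately simplified, pure-Python stand-in for sentence-based chunking. This is not LlamaIndex's actual `SentenceSplitter` (which also handles token counting and chunk overlap); the `chunk_size` and the sample text are illustrative only:

```python
import re

def naive_sentence_chunks(text, chunk_size=200):
    """Greedily pack whole sentences into chunks of at most chunk_size characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk when adding this sentence would overflow the budget
        if current and len(current) + len(sentence) + 1 > chunk_size:
            chunks.append(current)
            current = sentence
        else:
            current = (current + " " + sentence).strip()
    if current:
        chunks.append(current)
    return chunks

sample = (
    "Couchbase stores JSON documents. Each document can hold an embedding. "
    "The ingestion pipeline splits long articles into smaller nodes. "
    "Each node is embedded separately so retrieval can target the most relevant passage."
)
for i, chunk in enumerate(naive_sentence_chunks(sample, chunk_size=120)):
    print(f"node {i}: {chunk}")
```

In the real pipeline, `SentenceSplitter` emits node objects that inherit the document metadata configured above, and each node is embedded and stored individually.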
- 

```python
# Process documents: split into nodes, generate embeddings, and store in the vector database
index_pipeline = IngestionPipeline(
    transformations=[SentenceSplitter(), embed_model],
    vector_store=vector_store,
)

index_pipeline.run(documents=llama_documents)

```

# Using OpenAI's Large Language Model (LLM)
Large language models are AI systems that are trained to understand and generate human language. We'll be using OpenAI's `gpt-4o` model to process user queries and generate meaningful responses based on the retrieved context from our Couchbase vector store. This model is a key component of our RAG system, allowing it to go beyond simple keyword matching and truly understand the intent behind a query. By integrating OpenAI's LLM, we equip our RAG system with the ability to interpret complex queries, understand the nuances of language, and provide more accurate and contextually relevant responses.

The language model's ability to understand context and generate coherent responses is what makes our RAG system truly intelligent. It can not only find the right information but also present it in a way that is useful and understandable to the user.

The LLM is configured using LlamaIndex's OpenAI integration, which uses your OpenAI API key to call OpenAI's API endpoint.


```python
try:
    # Set up the LLM
    llm = OpenAI(
        api_key=OPENAI_API_KEY,
        model="gpt-4o",
    )
    # Configure LlamaIndex to use this LLM
    Settings.llm = llm
    logging.info("Successfully created the OpenAI LLM")
except Exception as e:
    raise ValueError(f"Error creating OpenAI LLM: {str(e)}")
```

# Creating the Vector Store Index

In this section, we'll create a VectorStoreIndex from our Couchbase vector store. This index serves as the foundation for our RAG system, enabling semantic search capabilities and efficient retrieval of relevant information. 
- -The VectorStoreIndex provides a high-level interface to interact with our vector store, allowing us to: -1. Perform semantic searches based on user queries -2. Retrieve the most relevant documents or chunks -3. Generate contextually appropriate responses using our LLM - - - -```python -# Create your index -from llama_index.core import VectorStoreIndex - -index = VectorStoreIndex.from_vector_store(vector_store) -rag = index.as_query_engine() -``` - -# Retrieval-Augmented Generation (RAG) with Couchbase and LlamaIndex - -Let's test our RAG system by performing a semantic search on a sample query. In this example, we'll use a question about Pep Guardiola's reaction to Manchester City's recent form. The RAG system will: - -1. Process the natural language query -2. Search through our vector database for relevant information -3. Retrieve the most semantically similar documents -4. Generate a comprehensive response using the LLM - -This demonstrates how our system combines the power of vector search with language model capabilities to provide accurate, contextual answers based on the information in our database. - -**Note:** By default, without any GSI vector index, Couchbase uses linear brute force search which compares the query vector against every document in the collection. This works for small datasets but can become slow as the dataset grows. - - -```python -# Sample query from the dataset - -query = "Who will Daniel Dubois fight in Saudi Arabia on 22 February?" 
- 

try:
    # Perform the semantic search
    start_time = time.time()
    response = rag.query(query)
    search_elapsed_time = time.time() - start_time

    # Display search results
    print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):")
    print(response)

except Exception as e:
    raise RuntimeError(f"Error performing semantic search: {e}")
```

# Optimizing Vector Search with Global Secondary Index (GSI)

While the above RAG system works effectively, we can significantly improve query performance by leveraging Couchbase's advanced GSI vector search capabilities.

Couchbase offers three types of vector indexes, but for GSI-based vector search we focus on two main types:

**Hyperscale Vector Indexes**
- Best for pure vector searches - content discovery, recommendations, semantic search
- High performance with low memory footprint - designed to scale to billions of vectors
- Optimized for concurrent operations - supports simultaneous searches and inserts
- Use when: You primarily perform vector-only queries without complex scalar filtering
- Ideal for: Large-scale semantic search, recommendation systems, content discovery

**Composite Vector Indexes**
- Best for filtered vector searches - combines vector search with scalar value filtering
- Efficient pre-filtering - scalar attributes reduce the vector comparison scope
- Use when: Your queries combine vector similarity with scalar filters that eliminate large portions of data
- Ideal for: Compliance-based filtering, user-specific searches, time-bounded queries

**Choosing the Right Index Type**
- Start with Hyperscale Vector Index for pure vector searches and large datasets
- Use Composite Vector Index when scalar filters significantly reduce your search space
- Consider your dataset size: Hyperscale scales to billions, Composite works well for tens of millions to billions

For more details, see the [Couchbase Vector Index 
documentation](https://docs.couchbase.com/server/current/vector-index/use-vector-indexes.html).

## Understanding Index Configuration (Couchbase 8.0 Feature)

The index `description` setting controls how Couchbase optimizes vector storage and search performance through centroids and quantization:

Format: `'IVF[<centroids>],{PQ<subquantizers>x<bits>|SQ<bits>}'`

**Centroids (IVF - Inverted File):**
- Controls how the dataset is subdivided for faster searches
- More centroids = faster search, slower training
- Fewer centroids = slower search, faster training
- If omitted (like IVF,SQ8), Couchbase auto-selects based on dataset size

**Quantization Options:**
- SQ (Scalar Quantization): SQ4, SQ6, SQ8 (4, 6, or 8 bits per dimension)
- PQ (Product Quantization): PQ<subquantizers>x<bits> (e.g., PQ32x8)
- Higher values = better accuracy, larger index size

**Common Examples:**
- IVF,SQ8 - Auto centroids, 8-bit scalar quantization (good default)
- IVF1000,SQ6 - 1000 centroids, 6-bit scalar quantization
- IVF,PQ32x8 - Auto centroids, 32 subquantizers with 8 bits

For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/server/current/vector-index/hyperscale-vector-index.html#algo_settings).

In the code below, we demonstrate creating a BHIVE index for optimal performance. The `CREATE INDEX` statement passes the dimension, description, and similarity settings through its `WITH` clause. Alternatively, GSI indexes can be created manually from the Couchbase UI. 
- - -```python -# Create a BHIVE (Hyperscale Vector Index) for optimized vector search -try: - bhive_index_name = f"{INDEX_NAME}_bhive" - - options = { - "dimension": 3072, - "description": "IVF1024,PQ32x8", - "similarity": "DOT", - } - scope.query( - f""" - CREATE INDEX {bhive_index_name} - ON {COLLECTION_NAME} (embedding VECTOR) - USING GSI WITH {json.dumps(options)} - """, - QueryOptions( - timeout=timedelta(seconds=300) - )).execute() - print(f"Successfully created BHIVE index: {bhive_index_name}") -except Exception as e: - print(f"BHIVE index may already exist or error occurred: {str(e)}") - -``` - -The example below shows running the same RAG query, but now using the BHIVE GSI index we created above. You'll notice improved performance as the index efficiently retrieves data. - - -```python -# Test the optimized GSI vector search with BHIVE index -query = "Who will Daniel Dubois fight in Saudi Arabia on 22 February?" -try: - # Create a new query engine using the optimized vector store - optimized_rag = index.as_query_engine() - - # Perform the semantic search with GSI optimization - start_time = time.time() - response = optimized_rag.query(query) - search_elapsed_time = time.time() - start_time - - # Display search results - print(f"\nOptimized GSI Vector Search Results (completed in {search_elapsed_time:.2f} seconds):") - print(response) - -except Exception as e: - raise RuntimeError(f"Error performing optimized semantic search: {e}") - -``` - -# Conclusion -In this tutorial, we've built a Retrieval Augmented Generation (RAG) system using Couchbase Capella's GSI vector search, OpenAI, and LlamaIndex. We used the BBC News dataset, which contains real-time news articles, to demonstrate how RAG can be used to answer questions about current events and provide up-to-date information that extends beyond the LLM's training data. - -The key components of our RAG system include: - -1. 
**Couchbase Capella GSI Vector Search** as the high-performance vector database for storing and retrieving document embeddings -2. **LlamaIndex** as the framework for connecting our data to the LLM -3. **OpenAI Services** for generating embeddings (`text-embedding-3-large`) and LLM responses (`gpt-4o`) -4. **GSI Vector Indexes** (BHIVE/Composite) for optimized vector search performance - -This approach allows us to enhance the capabilities of large language models by grounding their responses in specific, up-to-date information from our knowledge base, while leveraging Couchbase's advanced GSI vector search for optimal performance and scalability. - diff --git a/tutorial/markdown/generated/vector-search-cookbook/mistralai-fts-mistralai.md b/tutorial/markdown/generated/vector-search-cookbook/mistralai-fts-mistralai.md deleted file mode 100644 index b260039..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/mistralai-fts-mistralai.md +++ /dev/null @@ -1,231 +0,0 @@ ---- -# frontmatter -path: "/tutorial-mistralai-couchbase-vector-search-with-fts" -title: Using Mistral AI Embeddings with Couchbase Vector Search using FTS service -short_title: Mistral AI with Couchbase Vector Search using FTS service -description: - - Learn how to generate embeddings using Mistral AI and store them in Couchbase using FTS service. - - This tutorial demonstrates how to use Couchbase's vector search capabilities with Mistral AI embeddings. - - You'll understand how to perform vector search to find relevant documents based on similarity. 
-content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - FTS - - Artificial Intelligence - - Mistral AI -sdk_language: - - python -length: 30 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/mistralai/fts/mistralai.ipynb) - -# Introduction - -In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database, [Mistral AI](https://mistral.ai/) as the AI-powered embedding Model. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system from scratch. Alternatively, if you want to perform semantic search using the GSI index, please take a look at [this.](https://developer.couchbase.com/tutorial-mistralai-couchbase-vector-search-with-global-secondary-index) - -Couchbase is a NoSQL distributed document database (JSON) with many of the best features of a relational DBMS: SQL, distributed ACID transactions, and much more. [Couchbase Capella™](https://cloud.couchbase.com/sign-up) is the easiest way to get started, but you can also download and run [Couchbase Server](http://couchbase.com/downloads) on-premises. - -Mistral AI is a research lab building the best open source models in the world. La Plateforme enables developers and enterprises to build new products and applications, powered by Mistral’s open source and commercial LLMs. 
- 

The [Mistral AI APIs](https://console.mistral.ai/) empower LLM applications via:

- [Text generation](https://docs.mistral.ai/capabilities/completion/), enables streaming and provides the ability to display partial model results in real-time
- [Code generation](https://docs.mistral.ai/capabilities/code_generation/), empowers code generation tasks, including fill-in-the-middle and code completion
- [Embeddings](https://docs.mistral.ai/capabilities/embeddings/), useful for RAG where it represents the meaning of text as a list of numbers
- [Function calling](https://docs.mistral.ai/capabilities/function_calling/), enables Mistral models to connect to external tools
- [Fine-tuning](https://docs.mistral.ai/capabilities/finetuning/), enables developers to create customized and specialized models
- [JSON mode](https://docs.mistral.ai/capabilities/json_mode/), enables developers to set the response format to json_object
- [Guardrailing](https://docs.mistral.ai/capabilities/guardrailing/), enables developers to enforce policies at the system level of Mistral models


# Before you start

## Get Credentials for Mistral AI

Please follow the [instructions](https://console.mistral.ai/api-keys/) to generate the Mistral AI credentials.

## Create and Deploy Your Free Tier Operational cluster on Capella

To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.

To know more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).

### Couchbase Capella Configuration

When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met. 
- -* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the travel-sample bucket (Read and Write) used in the application. -* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running. - -# Install necessary libraries - - -```python -!pip install couchbase==4.3.5 mistralai==1.7.0 -``` - - [Output too long, omitted for brevity] - -# Imports - - -```python -from pathlib import Path -from datetime import timedelta -from mistralai import Mistral -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.options import (ClusterOptions, ClusterTimeoutOptions, - QueryOptions) -import couchbase.search as search -from couchbase.options import SearchOptions -from couchbase.vector_search import VectorQuery, VectorSearch -import uuid -``` - -# Prerequisites - - - -```python -import getpass -couchbase_cluster_url = input("Cluster URL:") -couchbase_username = input("Couchbase username:") -couchbase_password = getpass.getpass("Couchbase password:") -couchbase_bucket = input("Couchbase bucket:") -couchbase_scope = input("Couchbase scope:") -couchbase_collection = input("Couchbase collection:") -``` - - Cluster URL: localhost - Couchbase username: Administrator - Couchbase password: ········ - Couchbase bucket: mistralai - Couchbase scope: _default - Couchbase collection: mistralai - - -# Couchbase Connection - - -```python -auth = PasswordAuthenticator( - couchbase_username, - couchbase_password -) -``` - - -```python -cluster = Cluster(couchbase_cluster_url, ClusterOptions(auth)) -cluster.wait_until_ready(timedelta(seconds=5)) - -bucket = cluster.bucket(couchbase_bucket) -scope = bucket.scope(couchbase_scope) -collection = scope.collection(couchbase_collection) -``` - -# Creating Couchbase Vector Search Index -In order to store Mistral embeddings onto a Couchbase Cluster, a vector search 
index needs to be created first. We included a sample index definition that will work with this tutorial in the `mistralai_index.json` file. The definition can be used to create a vector index using the Couchbase Server web console; for more information on vector indexes, please read [Create a Vector Search Index with the Server Web Console](https://docs.couchbase.com/server/current/vector-search/create-vector-search-index-ui.html).


```python
search_index_name = couchbase_bucket + "._default.vector_test"
search_index = cluster.search_indexes().get_index(search_index_name)
```

# Mistral Connection


```python
MISTRAL_API_KEY = getpass.getpass("Mistral API Key:")
mistral_client = Mistral(api_key=MISTRAL_API_KEY)
```

# Embedding Documents
The Mistral client can be used to generate vector embeddings for given text fragments. These embeddings capture the semantic meaning of the corresponding fragments and can be stored in Couchbase for later retrieval. A custom embedding text can also be added into the embedding texts array by running this code block:


```python
texts = [
    "Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable.",
    "It’s used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.",
    input("custom embedding text")
]
embeddings = mistral_client.embeddings.create(
    model="mistral-embed",
    inputs=texts,
)

print("Output embeddings: " + str(len(embeddings.data)))
```

The output `embeddings` is an EmbeddingResponse object with the embeddings and the token usage information:

```
EmbeddingResponse(
    id='eb4c2c739780415bb3af4e47580318cc', object='list', data=[
    Data(object='embedding', embedding=[-0.0165863037109375,...], index=0),
    Data(object='embedding', 
embedding=[-0.0234222412109375,...], index=1),
    Data(object='embedding', embedding=[-0.0466222735279375,...], index=2)],
    model='mistral-embed', usage=EmbeddingResponseUsage(prompt_tokens=15, total_tokens=15)
)
```

# Storing Embeddings in Couchbase
Each embedding needs to be stored as a Couchbase document. According to the search index definition provided, embedding vector values need to be stored in the `vector` field. The original text of the embedding can be stored in the same document:


```python
for i in range(0, len(texts)):
    doc = {
        "id": str(uuid.uuid4()),
        "text": texts[i],
        "vector": embeddings.data[i].embedding,
    }
    collection.upsert(doc["id"], doc)
```

# Searching For Embeddings
Embeddings stored in Couchbase can later be searched using the vector index, for example, to find the text fragments most relevant to a user-entered prompt:


```python
search_embedding = mistral_client.embeddings.create(
    model="mistral-embed",
    inputs=["name a multipurpose database with distributed capability"],
).data[0]

search_req = search.SearchRequest.create(search.MatchNoneQuery()).with_vector_search(
    VectorSearch.from_vector_query(
        VectorQuery(
            "vector", search_embedding.embedding, num_candidates=1
        )
    )
)
result = scope.search(
    "vector_test",
    search_req,
    SearchOptions(
        limit=13,
        fields=["vector", "id", "text"]
    )
)
for row in result.rows():
    print("Found answer: " + row.id + "; score: " + str(row.score))
    doc = collection.get(row.id)
    print("Answer text: " + doc.value["text"])


```

    Found answer: 7a4c24dd-393f-4f08-ae42-69ea7009dcda; score: 1.7320726542316662
    Answer text: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable. 
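One piece this tutorial references but does not reproduce is the `mistralai_index.json` definition. For orientation, a Search vector index definition for `vector_test` typically resembles the sketch below. Treat it as illustrative only: apart from the `vector` field name and the 1024 dimensions produced by `mistral-embed`, the bucket name, mapping layout, and similarity metric shown here are assumptions; the authoritative definition is the one shipped with the tutorial repository.

```python
import json

# Illustrative sketch of a Search vector index definition (NOT the exact
# contents of mistralai_index.json): a default mapping that indexes the
# `vector` field as a 1024-dimensional vector.
index_definition = {
    "name": "vector_test",
    "type": "fulltext-index",
    "sourceType": "couchbase",
    "sourceName": "mistralai",  # assumed bucket name, matching the prompts above
    "params": {
        "mapping": {
            "default_mapping": {
                "enabled": True,
                "properties": {
                    "vector": {
                        "fields": [{
                            "name": "vector",
                            "type": "vector",
                            "dims": 1024,  # mistral-embed output size
                            "similarity": "dot_product",  # assumed metric
                            "index": True,
                        }]
                    }
                },
            }
        }
    },
}

print(json.dumps(index_definition, indent=2))
```

A JSON file with roughly this shape is what the Server Web Console expects when importing an index definition.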
- diff --git a/tutorial/markdown/generated/vector-search-cookbook/mistralai-gsi-mistralai.md b/tutorial/markdown/generated/vector-search-cookbook/mistralai-gsi-mistralai.md deleted file mode 100644 index ff41b14..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/mistralai-gsi-mistralai.md +++ /dev/null @@ -1,662 +0,0 @@ ---- -# frontmatter -path: "/tutorial-mistralai-couchbase-vector-search-with-global-secondary-index" -title: Using Mistral AI Embeddings using GSI Index -short_title: Mistral AI with Couchbase GSI Index -description: - - Learn how to generate embeddings using Mistral AI and store them in Couchbase using GSI. - - This tutorial demonstrates how to use Couchbase's GSI index capabilities with Mistral AI embeddings. - - You'll understand how to perform optimized vector search using Global Secondary Index for better performance. -content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - Artificial Intelligence - - Mistral AI - - GSI -sdk_language: - - python -length: 30 Mins ---- - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/mistralai/gsi/mistralai.ipynb) - -# Introduction - -In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database, [Mistral AI](https://mistral.ai/) as the AI-powered embedding Model. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system from scratch. 
Alternatively, if you want to perform semantic search using the FTS, please take a look at [this.](https://developer.couchbase.com/tutorial-mistralai-couchbase-vector-search-with-fts) - -Couchbase is a NoSQL distributed document database (JSON) with many of the best features of a relational DBMS: SQL, distributed ACID transactions, and much more. [Couchbase Capella™](https://cloud.couchbase.com/sign-up) is the easiest way to get started, but you can also download and run [Couchbase Server](http://couchbase.com/downloads) on-premises. - -Mistral AI is a research lab building the best open source models in the world. La Plateforme enables developers and enterprises to build new products and applications, powered by Mistral's open source and commercial LLMs. - -The [Mistral AI APIs](https://console.mistral.ai/) empower LLM applications via: - -- [Text generation](https://docs.mistral.ai/capabilities/completion/), enables streaming and provides the ability to display partial model results in real-time -- [Code generation](https://docs.mistral.ai/capabilities/code_generation/), empowers code generation tasks, including fill-in-the-middle and code completion -- [Embeddings](https://docs.mistral.ai/capabilities/embeddings/), useful for RAG where it represents the meaning of text as a list of numbers -- [Function calling](https://docs.mistral.ai/capabilities/function_calling/), enables Mistral models to connect to external tools -- [Fine-tuning](https://docs.mistral.ai/capabilities/finetuning/), enables developers to create customized and specialized models -- [JSON mode](https://docs.mistral.ai/capabilities/json_mode/), enables developers to set the response format to json_object -- [Guardrailing](https://docs.mistral.ai/capabilities/guardrailing/), enables developers to enforce policies at the system level of Mistral models - -This tutorial demonstrates how to use Mistral AI's embedding capabilities with Couchbase's **Global Secondary Index (GSI)** for optimized vector 
-search operations. GSI provides superior performance for vector operations compared to traditional search methods, especially for large-scale applications.
-
-
-# Before you start
-
-## Get Credentials for Mistral AI
-
-Please follow the [instructions](https://console.mistral.ai/api-keys/) to generate the Mistral AI credentials.
-
-## Create and Deploy Your Free Tier Operational cluster on Capella
-
-To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.
-
-To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).
-
-**Note: To run this tutorial, you will need Capella with Couchbase Server version 8.0 or above, as GSI vector search is supported only from version 8.0.**
-
-### Couchbase Capella Configuration
-
-When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.
-
-* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the bucket (Read and Write) used in the application.
-* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.
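The code cells below read their settings from environment variables via `python-dotenv`. As a convenience, you can keep them in a `.env` file next to the notebook. Here is a sketch with placeholder values — the variable names match the ones this notebook prompts for later, but every value shown is an assumption to replace with your own cluster details and API key:

```
CB_HOST=couchbases://your-endpoint.cloud.couchbase.com
CB_USERNAME=your_database_user
CB_PASSWORD=your_database_password
CB_BUCKET_NAME=your_bucket
SCOPE_NAME=your_scope
COLLECTION_NAME=your_collection
MISTRAL_API_KEY=your_mistral_api_key
```

Keep this file out of version control, since it contains credentials.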
-
-
-# Install necessary libraries
-
-
-
-```python
-%pip install couchbase==4.4.0 mistralai==1.9.10 langchain-couchbase==0.5.0 langchain-core==0.3.76 python-dotenv==1.1.1
-
-```
-
-# Imports
-
-
-
-```python
-from datetime import timedelta
-from mistralai import Mistral
-from couchbase.auth import PasswordAuthenticator
-from couchbase.cluster import Cluster
-from couchbase.management.buckets import CreateBucketSettings  # used by setup_collection() below
-from couchbase.options import ClusterOptions
-from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore
-from langchain_couchbase.vectorstores import DistanceStrategy, IndexType
-from langchain_core.embeddings import Embeddings
-from typing import List
-from dotenv import load_dotenv
-import os
-import time
-```
-
-# Prerequisites
-
-
-
-```python
-import getpass
-
-# Load environment variables from .env file if it exists
-load_dotenv()
-
-# Load from environment variables or prompt for input
-CB_HOST = os.getenv('CB_HOST') or input("Cluster URL:")
-CB_USERNAME = os.getenv('CB_USERNAME') or input("Couchbase username:")
-CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass("Couchbase password:")
-CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input("Couchbase bucket:")
-SCOPE_NAME = os.getenv('SCOPE_NAME') or input("Couchbase scope:")
-COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input("Couchbase collection:")
-
-```
-
-# Couchbase Connection
-
-
-
-```python
-auth = PasswordAuthenticator(
-    CB_USERNAME,
-    CB_PASSWORD
-)
-
-```
-
-
-```python
-cluster = Cluster(CB_HOST, ClusterOptions(auth))
-cluster.wait_until_ready(timedelta(seconds=5))
-
-bucket = cluster.bucket(CB_BUCKET_NAME)
-scope = bucket.scope(SCOPE_NAME)
-collection = scope.collection(COLLECTION_NAME)
-
-```
-
-## Setting Up Collections in Couchbase
-
-The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase:
-
-1. 
Bucket Creation: - - Checks if specified bucket exists, creates it if not - - Sets bucket properties like RAM quota (1024MB) and replication (disabled) - - Note: You will not be able to create a bucket on Capella - -2. Scope Management: - - Verifies if requested scope exists within bucket - - Creates new scope if needed (unless it's the default "_default" scope) - -3. Collection Setup: - - Checks for collection existence within scope - - Creates collection if it doesn't exist - - Waits 2 seconds for collection to be ready - -Additional Tasks: -- Clears any existing documents for clean state -- Implements comprehensive error handling and logging - - - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - except Exception as e: - bucket_settings = CreateBucketSettings( - name=bucket_name, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - time.sleep(2) # Wait for bucket creation to complete and become available - bucket = cluster.bucket(bucket_name) - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == scope_name for scope in scopes) - - if not scope_exists and scope_name != "_default": - bucket_manager.create_scope(scope_name) - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - bucket_manager.create_collection(scope_name, collection_name) - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for 
queries - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - except Exception as e: - print(f"Error while clearing documents: {str(e)}. The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) - -``` - - - - - - - - -# Creating Mistral AI Embeddings Wrapper - -Since Mistral AI doesn't have native LangChain integration, we need to create a custom wrapper class that implements the LangChain Embeddings interface. This will allow us to use Mistral AI's embedding model with Couchbase's GSI vector store. - - - -```python -class MistralAIEmbeddings(Embeddings): - """Custom Mistral AI Embeddings wrapper for LangChain compatibility.""" - - def __init__(self, api_key: str, model: str = "mistral-embed"): - self.client = Mistral(api_key=api_key) - self.model = model - - def embed_documents(self, texts: List[str]) -> List[List[float]]: - """Embed search docs.""" - try: - response = self.client.embeddings.create( - model=self.model, - inputs=texts, - ) - return [embedding.embedding for embedding in response.data] - except Exception as e: - raise ValueError(f"Error generating embeddings: {str(e)}") - - def embed_query(self, text: str) -> List[float]: - """Embed query text.""" - try: - response = self.client.embeddings.create( - model=self.model, - inputs=[text], - ) - return response.data[0].embedding - except Exception as e: - raise ValueError(f"Error generating query embedding: {str(e)}") - -``` - -# Mistral Connection - - - -```python -MISTRAL_API_KEY = os.getenv('MISTRAL_API_KEY') or getpass.getpass("Mistral API Key:") -embeddings = MistralAIEmbeddings(api_key=MISTRAL_API_KEY, model="mistral-embed") -mistral_client = Mistral(api_key=MISTRAL_API_KEY) - -``` - -# Understanding GSI Vector Search - -### Optimizing 
Vector Search with Global Secondary Index (GSI) - -With Couchbase 8.0+, you can leverage the power of GSI-based vector search, which offers significant performance improvements over traditional Full-Text Search (FTS) approaches for vector-first workloads. GSI vector search provides high-performance vector similarity search with advanced filtering capabilities and is designed to scale to billions of vectors. - -#### GSI vs FTS: Choosing the Right Approach - -| Feature | GSI Vector Search | FTS Vector Search | -| --------------------- | --------------------------------------------------------------- | ----------------------------------------- | -| **Best For** | Vector-first workloads, complex filtering, high QPS performance| Hybrid search and high recall rates | -| **Couchbase Version** | 8.0.0+ | 7.6+ | -| **Filtering** | Pre-filtering with `WHERE` clauses (Composite) or post-filtering (BHIVE) | Pre-filtering with flexible ordering | -| **Scalability** | Up to billions of vectors (BHIVE) | Up to 10 million vectors | -| **Performance** | Optimized for concurrent operations with low memory footprint | Good for mixed text and vector queries | - - -#### GSI Vector Index Types - -Couchbase offers two distinct GSI vector index types, each optimized for different use cases: - -##### Hyperscale Vector Indexes (BHIVE) - -- **Best for**: Pure vector searches like content discovery, recommendations, and semantic search -- **Use when**: You primarily perform vector-only queries without complex scalar filtering -- **Features**: - - High performance with low memory footprint - - Optimized for concurrent operations - - Designed to scale to billions of vectors - - Supports post-scan filtering for basic metadata filtering - -##### Composite Vector Indexes - - - **Best for**: Filtered vector searches that combine vector similarity with scalar value filtering -- **Use when**: Your queries combine vector similarity with scalar filters that eliminate large portions of data -- 
-**Features**:
-  - Efficient pre-filtering where scalar attributes reduce the vector comparison scope
-  - Best for well-defined workloads requiring complex filtering using GSI features
-  - Supports range lookups combined with vector search
-
-#### Index Type Selection for This Tutorial
-
-In this tutorial, we'll demonstrate creating a **BHIVE index** and running vector similarity queries using GSI. BHIVE is ideal for semantic search scenarios where you want:
-
-1. **High-performance vector search** across large datasets
-2. **Low latency** for real-time applications
-3. **Scalability** to handle growing vector collections
-4. **Concurrent operations** for multi-user environments
-
-The BHIVE index will provide optimal performance for our Mistral AI embedding-based semantic search implementation.
-
-#### Alternative: Composite Vector Index
-
-If your use case requires complex filtering with scalar attributes, you may want to consider using a **Composite Vector Index** instead:
-
-```python
-# Alternative: Create a Composite index for filtered searches
-vector_store.create_index(
-    index_type=IndexType.COMPOSITE,
-    index_description="IVF,SQ8",
-    distance_metric=DistanceStrategy.COSINE,
-    index_name="mistral_composite_index",
-)
-```
-
-**Use Composite indexes when:**
-- You need to filter by document metadata or attributes before vector similarity
-- Your queries combine vector search with WHERE clauses
-- You have well-defined filtering requirements that can reduce the search space
-
-**Note**: Composite indexes enable pre-filtering with scalar attributes, making them ideal for applications where you need to search within specific categories, date ranges, or user-specific data segments.
-
-#### Understanding GSI Index Configuration (Couchbase 8.0 Feature)
-
-Before creating our BHIVE index, it's important to understand the configuration parameters that optimize vector storage and search performance. 
-The `index_description` parameter controls how Couchbase optimizes vector storage through centroids and quantization.
-
-##### Index Description Format: `IVF<centroids>,{PQ|SQ}<settings>`
-
-##### Centroids (IVF - Inverted File)
-
-- Controls how the dataset is subdivided for faster searches
-- **More centroids** = faster search, slower training time
-- **Fewer centroids** = slower search, faster training time
-- If omitted (like `IVF,SQ8`), Couchbase auto-selects based on dataset size
-
-##### Quantization Options
-
-**Scalar Quantization (SQ):**
-- `SQ4`, `SQ6`, `SQ8` (4, 6, or 8 bits per dimension)
-- Lower memory usage, faster search, slightly reduced accuracy
-
-**Product Quantization (PQ):**
-- Format: `PQ<subquantizers>x<bits>` (e.g., `PQ32x8`)
-- Better compression for very large datasets
-- More complex but can maintain accuracy with smaller index size
-
-##### Common Configuration Examples
-
-- **`IVF,SQ8`** - Auto centroids, 8-bit scalar quantization (good default)
-- **`IVF1000,SQ6`** - 1000 centroids, 6-bit scalar quantization
-- **`IVF,PQ32x8`** - Auto centroids, 32 subquantizers with 8 bits
-
-For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/cloud/vector-index/hyperscale-vector-index.html#algo_settings).
-
-For more information on GSI vector indexes, see [Couchbase GSI Vector Documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html).
-
-##### Our Configuration Choice
-
-In this tutorial, we use `IVF,SQ8` which provides:
-- **Auto-selected centroids** optimized for our dataset size
-- **8-bit scalar quantization** for a good balance of speed, memory usage, and accuracy
-- **COSINE distance metric** ideal for semantic similarity search
-- **Optimal performance** for most semantic search use cases
-
-# Setting Up Couchbase GSI Vector Store
-
-Instead of using FTS (Full-Text Search), we'll use Couchbase's GSI (Global Secondary Index) for vector operations. 
-GSI provides better performance for vector search operations and supports advanced index types like BHIVE and COMPOSITE indexes.
-
-
-
-```python
-vector_store = CouchbaseQueryVectorStore(
-    cluster=cluster,
-    bucket_name=CB_BUCKET_NAME,
-    scope_name=SCOPE_NAME,
-    collection_name=COLLECTION_NAME,
-    embedding=embeddings,
-    distance_metric=DistanceStrategy.COSINE
-)
-
-print("GSI Vector Store created successfully!")
-
-```
-
-    GSI Vector Store created successfully!
-
-
-# Embedding Documents
-
-The Mistral client can be used to generate vector embeddings for given text fragments. These embeddings capture the meaning of the corresponding fragments and can be stored in Couchbase for later retrieval. A custom text can also be added to the embedding texts array by running this code block:
-
-
-
-```python
-texts = [
-    "Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON's versatility, with a foundation that is extremely fast and scalable.",
-    "It's used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.",
-    input("custom embedding text")
-]
-
-# Store documents in the GSI vector store
-vector_store.add_texts(texts)
-
-print("Documents added to GSI vector store successfully!")
-
-```
-
-    2025-11-07 15:50:09,439 - INFO - HTTP Request: POST https://api.mistral.ai/v1/embeddings "HTTP/1.1 200 OK"
-
-
-    Documents added to GSI vector store successfully!
-
-
-# Understanding Semantic Search in Couchbase
-
-Semantic search goes beyond traditional keyword matching by understanding the meaning and context behind queries. Here's how it works in Couchbase:
-
-## How Semantic Search Works
-
-1. **Vector Embeddings**: Documents and queries are converted into high-dimensional vectors using an embeddings model (in our case, Mistral AI's mistral-embed)
-
-2. 
**Similarity Calculation**: When a query is made, Couchbase compares the query vector against stored document vectors using the COSINE distance metric - -3. **Result Ranking**: Documents are ranked by their vector distance (lower distance = more similar meaning) - -4. **Flexible Configuration**: Different distance metrics (cosine, euclidean, dot product) and embedding models can be used based on your needs - -The `similarity_search_with_score` method performs this entire process, returning documents along with their similarity scores. This enables you to find semantically related content even when exact keywords don't match. - -Now let's see semantic search in action and measure its performance with different optimization strategies. - - -# Vector Search Performance Optimization - -Now let's measure and compare the performance benefits of different optimization strategies. We'll conduct a comprehensive performance analysis across two phases: - -## Performance Testing Phases - -1. **Phase 1 - Baseline Performance**: Test vector search without GSI indexes to establish baseline metrics - -2. **Phase 2 - GSI-Optimized Search**: Create BHIVE index and measure performance improvements - -**Important Context:** - -- GSI performance benefits scale with dataset size and concurrent load -- With our dataset (~3 documents), improvements may be modest -- Production environments with millions of vectors show significant GSI advantages -- The combination of GSI + embeddings provides optimal semantic search performance - - -# Phase 1: Baseline Performance (Without GSI Index) - -First, let's test the search performance without any GSI indexes. This will help us establish a baseline for comparison. 
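Before running the benchmark, it is worth seeing concretely what the COSINE metric above computes. The sketch below is a pure-Python illustration using toy 3-dimensional vectors (real `mistral-embed` vectors have 1024 dimensions); it is not how Couchbase evaluates distances internally, but the ranking principle — lower distance means closer meaning — is the same:

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; lower means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy "embeddings": the first two describe database topics, the third does not
database_doc = [0.9, 0.1, 0.2]
caching_doc = [0.8, 0.3, 0.1]
cooking_doc = [0.1, 0.9, 0.8]
query_vec = [0.85, 0.2, 0.15]  # stands in for an embedded query about databases

for name, vec in [("database", database_doc), ("caching", caching_doc), ("cooking", cooking_doc)]:
    print(f"{name}: distance = {cosine_distance(query_vec, vec):.4f}")
```

The scores returned by `similarity_search_with_score` are distances of this kind, so the smallest score is the best match.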
-
-
-
-```python
-import logging
-
-# Configure logging
-logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
-
-# Phase 1: Baseline Performance (Without GSI Index)
-print("="*80)
-print("PHASE 1: BASELINE PERFORMANCE (NO GSI INDEX)")
-print("="*80)
-
-query = "name a multipurpose database with distributed capability"
-
-try:
-    # Perform the semantic search
-    start_time = time.time()
-    search_results = vector_store.similarity_search_with_score(query, k=3)
-    baseline_time = time.time() - start_time
-
-    logging.info(f"Baseline search completed in {baseline_time:.2f} seconds")
-
-    # Display search results
-    print(f"\nBaseline Search Results (completed in {baseline_time:.4f} seconds):")
-    print("-" * 80)
-    for i, (doc, distance) in enumerate(search_results, 1):
-        print(f"[Result {i}] Vector Distance: {distance:.4f}")
-        # Truncate for readability
-        content_preview = doc.page_content[:150] + "..." if len(doc.page_content) > 150 else doc.page_content
-        print(f"Text: {content_preview}")
-        print("-" * 80)
-
-except Exception as e:
-    raise RuntimeError(f"Error performing semantic search: {str(e)}")
-
-```
-
-# Phase 2: GSI-Optimized Performance (With BHIVE Index)
-
-Now let's create a BHIVE index and measure the performance improvements when searching with GSI optimization.
-
-
-
-```python
-# Create a BHIVE index for optimal vector search performance
-print("\nCreating BHIVE index for GSI optimization...")
-vector_store.create_index(
-    index_type=IndexType.BHIVE,
-    index_name="mistral_bhive_index_optimized",
-    index_description="IVF,SQ8"
-)
-print("BHIVE index created successfully!")
-
-```
-
-Note: To create a COMPOSITE index instead, the code below can be used.
-Choose based on your specific use case and query patterns. For this tutorial's semantic search scenario, either index type would work, but BHIVE might be more efficient for pure vector search across the documents. 
-
-```python
-vector_store.create_index(index_type=IndexType.COMPOSITE, index_name="mistral_composite_index", index_description="IVF,SQ8")
-```
-
-
-```python
-# Phase 2: GSI-Optimized Performance (With BHIVE Index)
-print("\n" + "="*80)
-print("PHASE 2: GSI-OPTIMIZED PERFORMANCE (WITH BHIVE INDEX)")
-print("="*80)
-
-query = "name a multipurpose database with distributed capability"
-
-try:
-    # Perform the semantic search with GSI
-    start_time = time.time()
-    search_results = vector_store.similarity_search_with_score(query, k=3)
-    gsi_time = time.time() - start_time
-
-    logging.info(f"GSI-optimized search completed in {gsi_time:.2f} seconds")
-
-    # Display search results
-    print(f"\nGSI-Optimized Search Results (completed in {gsi_time:.4f} seconds):")
-    print("-" * 80)
-    for i, (doc, distance) in enumerate(search_results, 1):
-        print(f"[Result {i}] Vector Distance: {distance:.4f}")
-        # Truncate for readability
-        content_preview = doc.page_content[:150] + "..." if len(doc.page_content) > 150 else doc.page_content
-        print(f"Text: {content_preview}")
-        print("-" * 80)
-
-except Exception as e:
-    raise RuntimeError(f"Error performing semantic search: {str(e)}")
-
-```
-
-# Performance Summary
-
-Let's analyze the performance improvements achieved through GSI optimization. 
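A caveat before comparing numbers: a single wall-clock measurement of a network round trip is noisy, especially on a ~3-document dataset. A slightly more robust pattern repeats the call and reports the median. The sketch below uses a cheap stand-in workload so it runs anywhere; in the notebook you could pass `lambda: vector_store.similarity_search_with_score(query, k=3)` instead:

```python
import statistics
import time

def timed_runs(fn, runs=5):
    """Call fn `runs` times and return the median elapsed time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()  # perf_counter is better suited to benchmarking than time.time
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Stand-in workload; replace with the actual search call when benchmarking
median_time = timed_runs(lambda: sum(i * i for i in range(100_000)))
print(f"Median over 5 runs: {median_time:.4f} seconds")
```

The median discards outliers caused by cold caches or network jitter, so it gives a fairer comparison than the one-shot timings above.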
- - - -```python -print("\n" + "="*80) -print("VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY") -print("="*80) - -print(f"\n📊 Performance Comparison:") -print(f"{'Optimization Level':<35} {'Time (seconds)':<20} {'Status'}") -print("-" * 80) -print(f"{'Phase 1 - Baseline (No Index)':<35} {baseline_time:.4f}{'':16} ⚪ Baseline") -print(f"{'Phase 2 - GSI-Optimized (BHIVE)':<35} {gsi_time:.4f}{'':16} ✅ Optimized") - -# Calculate improvement -if baseline_time > gsi_time: - speedup = baseline_time / gsi_time - improvement = ((baseline_time - gsi_time) / baseline_time) * 100 - print(f"\n✨ GSI Performance Gain: {speedup:.2f}x faster ({improvement:.1f}% improvement)") -elif gsi_time > baseline_time: - slowdown_pct = ((gsi_time - baseline_time) / baseline_time) * 100 - print(f"\n⚠️ Note: GSI was {slowdown_pct:.1f}% slower than baseline in this run") - print(f" This can happen with small datasets. GSI benefits emerge with scale.") -else: - print(f"\n⚖️ Performance: Comparable to baseline") - -print("\n" + "-"*80) -print("KEY INSIGHTS:") -print("-"*80) -print("1. 🚀 GSI Optimization:") -print(" • BHIVE indexes excel with large-scale datasets (millions+ vectors)") -print(" • Performance gains increase with dataset size and concurrent queries") -print(" • Optimal for production workloads with sustained traffic patterns") - -print("\n2. 📦 Dataset Size Impact:") -print(f" • Current dataset: ~3 sample documents") -print(" • At this scale, performance differences may be minimal or variable") -print(" • Significant gains typically seen with 10M+ vectors") - -print("\n3. 
🎯 When to Use GSI:") -print(" • Large-scale vector search applications") -print(" • High query-per-second (QPS) requirements") -print(" • Multi-user concurrent access scenarios") -print(" • Production environments requiring scalability") - -print("\n" + "="*80) - -``` - -# Conclusion - -This tutorial demonstrated how to use Mistral AI's embedding capabilities with Couchbase's GSI vector search, including comprehensive performance analysis. Key takeaways include: - -## What We Covered - -1. **Semantic Search Fundamentals**: Understanding how vector embeddings enable meaning-based search -2. **Mistral AI Integration**: Creating a custom LangChain wrapper for Mistral AI's powerful mistral-embed model -3. **Performance Testing**: Conducting baseline vs GSI-optimized performance comparisons -4. **GSI Index Types**: Understanding BHIVE (pure vector search) and COMPOSITE (filtered searches) indexes -5. **Index Configuration**: Learning about centroids, quantization, and optimization settings - -## Key Benefits of This Approach - -1. **High-Performance Vector Search**: GSI provides optimized vector operations with low latency -2. **Scalability**: BHIVE indexes designed to handle billions of vectors efficiently -3. **Production-Ready**: Optimal for applications requiring high QPS and concurrent access -4. **Flexible Configuration**: Customizable index settings for different use cases -5. 
**Advanced Filtering**: COMPOSITE indexes enable complex scalar + vector queries - -## Performance Insights - -- GSI benefits scale with dataset size and query load -- Small datasets may show modest improvements -- Production environments with millions of vectors see significant performance gains -- Consider your specific use case when choosing between BHIVE and COMPOSITE indexes - -## Next Steps - -- Scale your dataset to explore GSI performance at higher volumes -- Experiment with different index configurations (IVF centroids, quantization settings) -- Try COMPOSITE indexes for filtered search scenarios -- Integrate this solution into your production RAG or semantic search applications - -The combination of Mistral AI's embeddings and Couchbase's GSI vector search provides a powerful, scalable foundation for building intelligent search applications. - diff --git a/tutorial/markdown/generated/vector-search-cookbook/openrouter-deepseek-fts-RAG_with_Couchbase_and_Openrouter_Deepseek.md b/tutorial/markdown/generated/vector-search-cookbook/openrouter-deepseek-fts-RAG_with_Couchbase_and_Openrouter_Deepseek.md deleted file mode 100644 index 10b68ef..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/openrouter-deepseek-fts-RAG_with_Couchbase_and_Openrouter_Deepseek.md +++ /dev/null @@ -1,868 +0,0 @@ ---- -# frontmatter -path: "/tutorial-openrouter-deepseek-with-fts" -title: Retrieval-Augmented Generation with Couchbase and OpenRouter Deepseek using FTS service -short_title: RAG with Couchbase and OpenRouter Deepseek using FTS service -description: - - Learn how to build a semantic search engine using Couchbase and OpenRouter with Deepseek using FTS service. - - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with OpenRouter Deepseek as both embeddings and language model provider. - - You'll understand how to perform Retrieval-Augmented Generation (RAG) using LangChain and Couchbase. 
-content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - FTS - - Artificial Intelligence - - LangChain - - Deepseek - - OpenRouter -sdk_language: - - python -length: 60 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/openrouter-deepseek/fts/RAG_with_Couchbase_and_Openrouter_Deepseek.ipynb) - -# Introduction -In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database and [Deepseek V3 as the language model provider (via OpenRouter or direct API)](https://deepseek.ai/) and OpenAI for embeddings. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system using the FTS service from scratch. Alternatively if you want to perform semantic search using the GSI index, please take a look at [this.](https://developer.couchbase.com/tutorial-openrouter-deepseek-with-global-secondary-index/) - -# How to run this tutorial - -This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/openrouter-deepseek/fts/RAG_with_Couchbase_and_Openrouter_Deepseek.ipynb). - -You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment. 
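For the local option, one possible environment setup is sketched below — it assumes a Unix-like shell with Python 3 available and is only one of several ways to prepare an isolated environment:

```shell
# Create an isolated virtual environment for the notebook
python3 -m venv .venv
# Activate it (on Windows use: .venv\Scripts\activate)
. .venv/bin/activate
# Confirm the interpreter now comes from the virtual environment
python -c "import sys; print(sys.prefix)"
```

From there, installing Jupyter (e.g., `pip install notebook`) lets you open the downloaded `.ipynb` file and run the cells, including the dependency-installation cell below.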
- -# Before you start - -## Get Credentials for OpenRouter and Deepseek -* Sign up for an account at [OpenRouter](https://openrouter.ai/) to get your API key -* OpenRouter provides access to Deepseek models, so no separate Deepseek credentials are needed -* Store your OpenRouter API key securely as it will be used to access the models -* For [Deepseek](https://deepseek.ai/) models, you can use the default models provided by OpenRouter - -## Create and Deploy Your Free Tier Operational cluster on Capella - -To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint. - -To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html). - -### Couchbase Capella Configuration - -When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met. - -* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the required bucket (Read and Write) used in the application. -* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running. - -## Setting the Stage: Installing Necessary Libraries - -To build our semantic search engine, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks. - - -```python -%pip install --quiet datasets==3.5.0 langchain-couchbase==0.3.0 langchain-deepseek==0.1.3 langchain-openai==0.3.13 python-dotenv==1.1.0 -``` - - Note: you may need to restart the kernel to use updated packages. 
- - -## Importing Necessary Libraries - -The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. - - -```python -import getpass -import json -import logging -import os -import time -from datetime import timedelta - -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.exceptions import (CouchbaseException, - InternalServerFailureException, - QueryIndexAlreadyExistsException,ServiceUnavailableException) -from couchbase.management.buckets import CreateBucketSettings -from couchbase.management.search import SearchIndex -from couchbase.options import ClusterOptions -from datasets import load_dataset -from dotenv import load_dotenv -from langchain_core.globals import set_llm_cache -from langchain_core.output_parsers import StrOutputParser -from langchain_core.prompts.chat import ChatPromptTemplate -from langchain_core.runnables import RunnablePassthrough -from langchain_couchbase.cache import CouchbaseCache -from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore -from langchain_openai import OpenAIEmbeddings -``` - -## Setup Logging -Logging is configured to track the progress of the script and capture any errors or warnings. - - -```python -logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True) - -# Suppress httpx logging -logging.getLogger('httpx').setLevel(logging.CRITICAL) -``` - -## Environment Variables and Configuration - -This section handles loading and validating environment variables and configuration settings: -# -1. API Keys: - - Supports either direct Deepseek API or OpenRouter API access - - Prompts for API key input if not found in environment - - Requires OpenAI API key for embeddings -# -2. 
Couchbase Settings: - - Connection details (host, username, password) - - Bucket, scope and collection names - - Vector search index configuration - - Cache collection settings -# -The code validates that all required credentials are present before proceeding. -It allows flexible configuration through environment variables or interactive prompts, -with sensible defaults for local development. - - - -```python -# Load environment variables from .env file if it exists -load_dotenv() - -# API Keys -# Allow either Deepseek API directly or via OpenRouter -DEEPSEEK_API_KEY = os.getenv('DEEPSEEK_API_KEY') -OPENROUTER_API_KEY = os.getenv('OPENROUTER_API_KEY') - -if not DEEPSEEK_API_KEY and not OPENROUTER_API_KEY: - api_choice = input('Choose API (1 for Deepseek direct, 2 for OpenRouter): ') - if api_choice == '1': - DEEPSEEK_API_KEY = getpass.getpass('Enter your Deepseek API Key: ') - else: - OPENROUTER_API_KEY = getpass.getpass('Enter your OpenRouter API Key: ') - -OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or getpass.getpass('Enter your OpenAI API Key: ') - -# Couchbase Settings -CB_HOST = os.getenv('CB_HOST') or input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost' -CB_USERNAME = os.getenv('CB_USERNAME') or input('Enter your Couchbase username (default: Administrator): ') or 'Administrator' -CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass('Enter your Couchbase password (default: password): ') or 'password' -CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input('Enter your Couchbase bucket name (default: vector-search-testing): ') or 'vector-search-testing' -INDEX_NAME = os.getenv('INDEX_NAME') or input('Enter your index name (default: vector_search_deepseek): ') or 'vector_search_deepseek' -SCOPE_NAME = os.getenv('SCOPE_NAME') or input('Enter your scope name (default: shared): ') or 'shared' -COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input('Enter your collection name (default: deepseek): ') or 'deepseek' 
-CACHE_COLLECTION = os.getenv('CACHE_COLLECTION') or input('Enter your cache collection name (default: cache): ') or 'cache' - -# Check if required credentials are set -required_creds = { - 'OPENAI_API_KEY': OPENAI_API_KEY, - 'CB_HOST': CB_HOST, - 'CB_USERNAME': CB_USERNAME, - 'CB_PASSWORD': CB_PASSWORD, - 'CB_BUCKET_NAME': CB_BUCKET_NAME -} - -# Add the API key that was chosen -if DEEPSEEK_API_KEY: - required_creds['DEEPSEEK_API_KEY'] = DEEPSEEK_API_KEY -elif OPENROUTER_API_KEY: - required_creds['OPENROUTER_API_KEY'] = OPENROUTER_API_KEY -else: - raise ValueError("Either Deepseek API Key or OpenRouter API Key must be provided") - -for cred_name, cred_value in required_creds.items(): - if not cred_value: - raise ValueError(f"{cred_name} is not set") -``` - -# Connecting to the Couchbase Cluster -Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount. - - - - -```python -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to Couchbase") -except Exception as e: - raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}") -``` - - 2025-05-25 14:39:18,465 - INFO - Successfully connected to Couchbase - - -## Setting Up Collections in Couchbase - -The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase: - -1. 
Bucket Creation:
    - Checks if the specified bucket exists, and creates it if not
    - Sets bucket properties like RAM quota (1024 MB) and replication (disabled)
    - Note: If you are using Capella, manually create a bucket called vector-search-testing (or any name you prefer) with the same properties.

2. Scope Management:
    - Verifies if the requested scope exists within the bucket
    - Creates a new scope if needed (unless it's the default "_default" scope)

3. Collection Setup:
    - Checks for collection existence within the scope
    - Creates the collection if it doesn't exist
    - Waits 2 seconds for the collection to be ready

Additional Tasks:
- Creates a primary index on the collection for query performance
- Clears any existing documents for a clean state
- Implements comprehensive error handling and logging

The function is called twice to set up:
1. Main collection for vector embeddings
2. Cache collection for storing results


```python
def setup_collection(cluster, bucket_name, scope_name, collection_name):
    try:
        # Check if bucket exists, create if it doesn't
        try:
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' exists.")
        except Exception as e:
            logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...")
            bucket_settings = CreateBucketSettings(
                name=bucket_name,
                bucket_type='couchbase',
                ram_quota_mb=1024,
                flush_enabled=True,
                num_replicas=0
            )
            cluster.buckets().create_bucket(bucket_settings)
            time.sleep(2)  # Wait for bucket creation to complete and become available
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' created successfully.")

        bucket_manager = bucket.collections()

        # Check if scope exists, create if it doesn't
        scopes = bucket_manager.get_all_scopes()
        scope_exists = any(scope.name == scope_name for scope in scopes)

        if not scope_exists and scope_name != "_default":
            logging.info(f"Scope '{scope_name}' does not exist. 
Creating it...") - bucket_manager.create_scope(scope_name) - logging.info(f"Scope '{scope_name}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{collection_name}' does not exist. Creating it...") - bucket_manager.create_collection(scope_name, collection_name) - logging.info(f"Collection '{collection_name}' created successfully.") - else: - logging.info(f"Collection '{collection_name}' already exists. Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Ensure primary index exists - try: - cluster.query(f"CREATE PRIMARY INDEX IF NOT EXISTS ON `{bucket_name}`.`{scope_name}`.`{collection_name}`").execute() - logging.info("Primary index present or created successfully.") - except Exception as e: - logging.warning(f"Error creating primary index: {str(e)}") - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION) - -``` - - 2025-05-25 14:39:19,580 - INFO - Bucket 'vector-search-testing' exists. - 2025-05-25 14:39:21,409 - INFO - Collection 'deepseek' already exists. Skipping creation. 
    2025-05-25 14:39:24,342 - INFO - Primary index present or created successfully.
    2025-05-25 14:39:24,604 - INFO - All documents cleared from the collection.
    2025-05-25 14:39:24,606 - INFO - Bucket 'vector-search-testing' exists.
    2025-05-25 14:39:26,535 - INFO - Collection 'cache' already exists. Skipping creation.
    2025-05-25 14:39:29,589 - INFO - Primary index present or created successfully.
    2025-05-25 14:39:29,813 - INFO - All documents cleared from the collection.


# Loading Couchbase Vector Search Index

Semantic search requires an efficient way to retrieve relevant documents based on a user's query. This is where the Couchbase **Vector Search Index** comes into play. In this step, we load the Vector Search Index definition from a JSON file, which specifies how the index should be structured. This includes the fields to be indexed, the dimensions of the vectors, and other parameters that determine how the search engine processes queries based on vector similarity.

The vector search index configuration used in this tutorial requires specific settings to function properly. It targets the bucket named `vector-search-testing` with the scope `shared` and collection `deepseek`, and is set up for vectors with exactly `1536 dimensions`, using dot product similarity and optimized for recall. If you want to use a different bucket, scope, or collection, you will need to modify the index configuration accordingly.

For more information on creating a vector search index, please follow the [instructions](https://docs.couchbase.com/cloud/vector-search/create-vector-search-index-ui.html). 
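For orientation, the `deepseek_index.json` definition has roughly the following shape. This is an illustrative sketch of a Couchbase vector search index definition, not the exact file contents; the bucket, scope, collection, and field names reflect the defaults used in this tutorial:

```json
{
  "name": "vector_search_deepseek",
  "type": "fulltext-index",
  "sourceType": "gocbcore",
  "sourceName": "vector-search-testing",
  "params": {
    "doc_config": {
      "mode": "scope.collection.type_field",
      "type_field": "type"
    },
    "mapping": {
      "default_mapping": { "enabled": false },
      "types": {
        "shared.deepseek": {
          "enabled": true,
          "properties": {
            "embedding": {
              "fields": [
                {
                  "name": "embedding",
                  "type": "vector",
                  "dims": 1536,
                  "similarity": "dot_product",
                  "index": true,
                  "vector_index_optimized_for": "recall"
                }
              ]
            },
            "text": {
              "fields": [
                { "name": "text", "type": "text", "index": true, "store": true }
              ]
            }
          }
        }
      }
    }
  }
}
```

The `dims`, `similarity`, and `vector_index_optimized_for` values correspond to the 1536-dimension, dot-product, recall-optimized settings described above.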


```python
try:
    with open('deepseek_index.json', 'r') as file:
        index_definition = json.load(file)
except Exception as e:
    raise ValueError(f"Error loading index definition: {str(e)}")
```

# Creating or Updating Search Indexes

With the index definition loaded, the next step is to create or update the **Vector Search Index** in Couchbase. This step is crucial because it optimizes our database for vector similarity search operations, allowing us to perform searches based on the semantic content of documents rather than just keywords. By creating or updating a Vector Search Index, we enable our search engine to handle complex queries that involve finding semantically similar documents using vector embeddings, which is essential for a robust semantic search engine.


```python
try:
    scope_index_manager = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME).search_indexes()

    # Check if index already exists
    existing_indexes = scope_index_manager.get_all_indexes()
    index_name = index_definition["name"]

    if index_name in [index.name for index in existing_indexes]:
        logging.info(f"Index '{index_name}' found")
    else:
        logging.info(f"Creating new index '{index_name}'...")

    # Create SearchIndex object from JSON definition
    search_index = SearchIndex.from_json(index_definition)

    # Upsert the index (create if not exists, update if exists)
    scope_index_manager.upsert_index(search_index)
    logging.info(f"Index '{index_name}' successfully created/updated.")

except QueryIndexAlreadyExistsException:
    logging.info(f"Index '{index_name}' already exists. Skipping creation/update.")
except ServiceUnavailableException:
    raise RuntimeError("Search service is not available. 
Please ensure the Search service is enabled in your Couchbase cluster.") -except InternalServerFailureException as e: - logging.error(f"Internal server error: {str(e)}") - raise -``` - - 2025-05-25 14:39:31,015 - INFO - Index 'vector_search_deepseek' found - 2025-05-25 14:39:31,770 - INFO - Index 'vector_search_deepseek' already exists. Skipping creation/update. - - -## Creating the Embeddings client -This section creates an OpenAI embeddings client using the OpenAI API key. -The embeddings client is configured to use the "text-embedding-3-small" model, -which converts text into numerical vector representations. -These vector embeddings are essential for semantic search and similarity matching. -The client will be used by the vector store to generate embeddings for documents. - - -```python -try: - embeddings = OpenAIEmbeddings( - api_key=OPENAI_API_KEY, - model="text-embedding-3-small" - ) - logging.info("Successfully created OpenAI embeddings client") -except Exception as e: - raise ValueError(f"Error creating OpenAI embeddings client: {str(e)}") -``` - - 2025-05-25 14:39:32,003 - INFO - Successfully created OpenAI embeddings client - - -## Setting Up the Couchbase Vector Store -A vector store is where we'll keep our embeddings. Unlike the FTS index, which is used for text-based search, the vector store is specifically designed to handle embeddings and perform similarity searches. When a user inputs a query, the search engine converts the query into an embedding and compares it against the embeddings stored in the vector store. This allows the engine to find documents that are semantically similar to the query, even if they don't contain the exact same words. By setting up the vector store in Couchbase, we create a powerful tool that enables our search engine to understand and retrieve information based on the meaning and context of the query, rather than just the specific words used. 
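To make the comparison step concrete, here is a minimal, Couchbase-free illustration of how dot-product similarity (the metric this tutorial's index is configured with) ranks stored vectors against a query vector. The three-dimensional vectors and document names are toy stand-ins for the 1536-dimensional embeddings the vector store actually holds:

```python
def dot_product(a, b):
    """Dot-product similarity, the metric configured in the vector index."""
    return sum(x * y for x, y in zip(a, b))

query_vec = [0.1, 0.9, 0.0]
doc_vecs = {
    "doc_about_darts":  [0.2, 0.8, 0.1],
    "doc_about_python": [0.9, 0.0, 0.4],
}

# Rank documents by similarity to the query vector, highest score first
ranked = sorted(doc_vecs.items(),
                key=lambda kv: dot_product(query_vec, kv[1]),
                reverse=True)
print(ranked[0][0])  # 'doc_about_darts' ranks first
```

The vector store performs the same kind of ranking at scale, using the index built earlier rather than a brute-force loop.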
- - -```python -try: - vector_store = CouchbaseSearchVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding=embeddings, - index_name=INDEX_NAME, - ) - logging.info("Successfully created vector store") -except Exception as e: - raise ValueError(f"Failed to create vector store: {str(e)}") -``` - - 2025-05-25 14:39:35,246 - INFO - Successfully created vector store - - -## Load the BBC News Dataset -To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively. - -The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version. - - -```python -try: - news_dataset = load_dataset( - "RealTimeData/bbc_news_alltime", "2024-12", split="train" - ) - print(f"Loaded the BBC News dataset with {len(news_dataset)} rows") - logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.") -except Exception as e: - raise ValueError(f"Error loading the BBC News dataset: {str(e)}") -``` - - 2025-05-25 14:39:41,364 - INFO - Successfully loaded the BBC News dataset with 2687 rows. - - - Loaded the BBC News dataset with 2687 rows - - -## Cleaning up the Data -We will use the content of the news articles for our RAG system. 


The dataset contains a few duplicate records. We remove them to avoid duplicate results in the retrieval stage of our RAG system.


```python
news_articles = news_dataset["content"]
unique_articles = set()
for article in news_articles:
    if article:
        unique_articles.add(article)
unique_news_articles = list(unique_articles)
print(f"We have {len(unique_news_articles)} unique articles in our database.")
```

    We have 1749 unique articles in our database.


## Saving Data to the Vector Store
To efficiently handle the large number of articles, we process them in batches. This batch processing approach helps manage memory usage and provides better control over the ingestion process.

We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's `add_texts` method, we add the filtered articles to our vector database. The `batch_size` parameter controls how many articles are processed in each iteration.

This approach offers several benefits:
1. Memory Efficiency: Processing in smaller batches prevents memory overload
2. Progress Tracking: Easier to monitor and track the ingestion progress
3. Resource Management: Better control over CPU and network resource utilization

We use a conservative batch size of 50 to ensure reliable operation. The optimal batch size depends on many factors, including:
- Document sizes being inserted
- Available system resources
- Network conditions
- Concurrent workload

Consider measuring performance with your specific workload before adjusting. 
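If you want explicit per-batch progress logging instead of relying on the built-in `batch_size` argument of `add_texts`, the ingestion loop can be driven manually with a small helper. This is an optional sketch; the `vector_store.add_texts(batch)` call is shown only as a comment so the helper stays self-contained, and the sample texts are placeholders for the filtered articles:

```python
def batched(items, batch_size):
    """Yield successive batches of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# Placeholder texts; in the tutorial you would iterate over the filtered
# `articles` list and call vector_store.add_texts(batch) inside the loop,
# logging progress after each batch.
sample_texts = [f"article-{n}" for n in range(5)]
batches = list(batched(sample_texts, 2))
print(f"{len(batches)} batches with sizes {[len(b) for b in batches]}")
```

With real data this gives you a natural place to log throughput per batch and to retry a single failed batch without restarting the whole ingestion.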
- - - -```python -batch_size = 50 - -# Automatic Batch Processing -articles = [article for article in unique_news_articles if article and len(article) <= 50000] - -try: - vector_store.add_texts( - texts=articles, - batch_size=batch_size - ) - logging.info("Document ingestion completed successfully.") -except Exception as e: - raise ValueError(f"Failed to save documents to vector store: {str(e)}") -``` - - 2025-05-25 14:41:37,848 - INFO - Document ingestion completed successfully. - - -## Setting Up a Couchbase Cache -To further optimize our system, we set up a Couchbase-based cache. A cache is a temporary storage layer that holds data that is frequently accessed, speeding up operations by reducing the need to repeatedly retrieve the same information from the database. In our setup, the cache will help us accelerate repetitive tasks, such as looking up similar documents. By implementing a cache, we enhance the overall performance of our search engine, ensuring that it can handle high query volumes and deliver results quickly. - -Caching is particularly valuable in scenarios where users may submit similar queries multiple times or where certain pieces of information are frequently requested. By storing these in a cache, we can significantly reduce the time it takes to respond to these queries, improving the user experience. - - - -```python -try: - cache = CouchbaseCache( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=CACHE_COLLECTION, - ) - logging.info("Successfully created cache") - set_llm_cache(cache) -except Exception as e: - raise ValueError(f"Failed to create cache: {str(e)}") -``` - - 2025-05-25 14:41:40,203 - INFO - Successfully created cache - - -## Setting Up the LLM Model -In this section, we set up the Large Language Model (LLM) for our RAG system. We're using the Deepseek model, which can be accessed through two different methods: - -1. 
**Deepseek API Key**: This is obtained directly from Deepseek's platform (https://deepseek.ai) by creating an account and subscribing to their API services. With this key, you can access Deepseek's models directly using the `ChatDeepSeek` class from the `langchain_deepseek` package. - -2. **OpenRouter API Key**: OpenRouter (https://openrouter.ai) is a service that provides unified access to multiple LLM providers, including Deepseek. You can obtain an API key by creating an account on OpenRouter's website. This approach uses the `ChatOpenAI` class from `langchain_openai` but with a custom base URL pointing to OpenRouter's API endpoint. - -The key difference is that OpenRouter acts as an intermediary service that can route your requests to various LLM providers, while the Deepseek API gives you direct access to only Deepseek's models. OpenRouter can be useful if you want to switch between different LLM providers without changing your code significantly. - -In our implementation, we check for both keys and prioritize using the Deepseek API directly if available, falling back to OpenRouter if not. The model is configured with temperature=0 to ensure deterministic, focused responses suitable for RAG applications. 
- - - -```python -from langchain_deepseek import ChatDeepSeek -from langchain_openai import ChatOpenAI - -if DEEPSEEK_API_KEY: - try: - llm = ChatDeepSeek( - api_key=DEEPSEEK_API_KEY, - model_name="deepseek-chat", - temperature=0 - ) - logging.info("Successfully created Deepseek LLM client") - except Exception as e: - raise ValueError(f"Error creating Deepseek LLM client: {str(e)}") -elif OPENROUTER_API_KEY: - try: - llm = ChatOpenAI( - api_key=OPENROUTER_API_KEY, - base_url="https://openrouter.ai/api/v1", - model="deepseek/deepseek-chat-v3.1", - temperature=0, - ) - logging.info("Successfully created Deepseek LLM client through OpenRouter") - except Exception as e: - raise ValueError(f"Error creating Deepseek LLM client: {str(e)}") -else: - raise ValueError("Either Deepseek API Key or OpenRouter API Key must be provided") -``` - - 2025-05-25 14:41:40,237 - INFO - Successfully created Deepseek LLM client through OpenRouter - - -## Perform Semantic Search -Semantic search in Couchbase involves converting queries and documents into vector representations using an embeddings model. These vectors capture the semantic meaning of the text and are stored directly in Couchbase. When a query is made, Couchbase performs a similarity search by comparing the query vector against the stored document vectors. The similarity metric used for this comparison is configurable, allowing flexibility in how the relevance of documents is determined. - -In the provided code, the search process begins by recording the start time, followed by executing the similarity_search_with_score method of the CouchbaseSearchVectorStore. This method searches Couchbase for the most relevant documents based on the vector similarity to the query. The search results include the document content and a similarity score that reflects how closely each document aligns with the query in the defined semantic space. 
The time taken to perform this search is then calculated and logged, and the results are displayed, showing the most relevant documents along with their similarity scores. This approach leverages Couchbase as both a storage and retrieval engine for vector data, enabling efficient and scalable semantic searches. The integration of vector storage and search capabilities within Couchbase allows for sophisticated semantic search operations without relying on external services for vector storage or comparison. - - -```python -query = "What were Luke Littler's key achievements and records in his recent PDC World Championship match?" - -try: - # Perform the semantic search - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=10) - search_elapsed_time = time.time() - start_time - - logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds") - - # Display search results - print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):") - print("-" * 80) - - for doc, score in search_results: - print(f"Score: {score:.4f}, Text: {doc.page_content}") - print("-" * 80) - -except CouchbaseException as e: - raise RuntimeError(f"Error performing semantic search: {str(e)}") -except Exception as e: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - 2025-05-25 14:41:41,802 - INFO - Semantic search completed in 1.56 seconds - - - - Semantic Search Results (completed in 1.56 seconds): - -------------------------------------------------------------------------------- - Score: 0.6303, Text: The Littler effect - how darts hit the bullseye - - Teenager Luke Littler began his bid to win the 2025 PDC World Darts Championship with a second-round win against Ryan Meikle. Here we assess Littler's impact after a remarkable rise which saw him named BBC Young Sports Personality of the Year and runner-up in the main award to athlete Keely Hodgkinson. 
- - One year ago, he was barely a household name in his own home. Now he is a sporting phenomenon. After emerging from obscurity aged 16 to reach the World Championship final, the life of Luke Littler and the sport he loves has been transformed. Viewing figures, ticket sales and social media interest have rocketed. Darts has hit the bullseye. This Christmas more than 100,000 children are expected to be opening Littler-branded magnetic dartboards as presents. His impact has helped double the number of junior academies, prompted plans to expand the World Championship and generated interest in darts from Saudi Arabian backers. - - Just months after taking his GCSE exams and ranked 164th in the world, Littler beat former champions Raymond van Barneveld and Rob Cross en route to the PDC World Championship final in January, before his run ended with a 7-4 loss to Luke Humphries. With his nickname 'The Nuke' on his purple and yellow shirt and the Alexandra Palace crowd belting out his walk-on song, Pitbull's tune Greenlight, he became an instant hit. Electric on the stage, calm off it. The down-to-earth teenager celebrated with a kebab and computer games. "We've been watching his progress since he was about seven. He was on our radar, but we never anticipated what would happen. The next thing we know 'Littlermania' is spreading everywhere," PDC president Barry Hearn told BBC Sport. A peak TV audience of 3.7 million people watched the final - easily Sky's biggest figure for a non-football sporting event. The teenager from Warrington in Cheshire was too young to legally drive or drink alcohol, but earned £200,000 for finishing second - part of £1m prize money in his first year as a professional - and an invitation to the elite Premier League competition. He turned 17 later in January but was he too young for the demanding event over 17 Thursday nights in 17 locations? He ended up winning the whole thing, and hit a nine-dart finish against Humphries in the final. 
From Bahrain to Wolverhampton, Littler claimed 10 titles in 2024 and is now eyeing the World Championship. - - As he progressed at the Ally Pally, the Manchester United fan was sent a good luck message by the club's former midfielder and ex-England captain David Beckham. In 12 months, Littler's Instagram followers have risen from 4,000 to 1.3m. Commercial backers include a clothing range, cereal firm and train company and he will appear in a reboot of the TV darts show Bullseye. Google say he was the most searched-for athlete online in the UK during 2024. On the back of his success, Littler darts, boards, cabinets, shirts are being snapped up in big numbers. "This Christmas the junior magnetic dartboard is selling out, we're talking over 100,000. They're 20 quid and a great introduction for young children," said Garry Plummer, the boss of sponsors Target Darts, who first signed a deal with Littler's family when he was aged 12. "All the toy shops want it, they all want him - 17, clean, doesn't drink, wonderful." - - Littler beat Luke Humphries to win the Premier League title in May - - The number of academies for children under the age of 16 has doubled in the last year, says Junior Darts Corporation chairman Steve Brown. There are 115 dedicated groups offering youngsters equipment, tournaments and a place to develop, with bases including Australia, Bulgaria, Greece, Norway, USA and Mongolia. "We've seen so many inquiries from around the world, it's been such a boom. It took us 14 years to get 1,600 members and within 12 months we have over 3,000, and waiting lists," said Brown. "When I played darts as a child, I was quite embarrassed to tell my friends what my hobby was. All these kids playing darts now are pretty popular at school. It's a bit rock 'n roll and recognised as a cool thing to do." Plans are being hatched to extend the World Championship by four days and increase the number of players from 96 to 128. 
That will boost the number of tickets available by 25,000 to 115,000 but Hearn reckons he could sell three times as many. He says Saudi Arabia wants to host a tournament, which is likely to happen if no-alcohol regulations are relaxed. "They will change their rules in the next 12 months probably for certain areas having alcohol, and we'll take darts there and have a party in Saudi," he said. "When I got involved in darts, the total prize money was something like £300,000 for the year. This year it will go to £20m. I expect in five years' time, we'll be playing for £40m." - - Former electrician Cross charged to the 2018 world title in his first full season, while Adrian Lewis and Michael van Gerwen were multiple victors in their 20s and 16-time champion Phil ‘The Power’ Taylor is widely considered the greatest of all time. Littler is currently fourth in the world rankings, although that is based on a two-year Order of Merit. There have been suggestions from others the spotlight on the teenager means world number one Humphries, 29, has been denied the coverage he deserves, but no darts player has made a mark at such a young age as Littler. "Luke Humphries is another fabulous player who is going to be around for years. Sport is a very brutal world. It is about winning and claiming the high ground. There will be envy around," Hearn said. "Luke Littler is the next Tiger Woods for darts so they better get used to it, and the only way to compete is to get better." World number 38 Martin Lukeman was awestruck as he described facing a peak Littler after being crushed 16-3 in the Grand Slam final, with the teenager winning 15 consecutive legs. "I can't compete with that, it was like Godly. He was relentless, he is so good it's ridiculous," he said. Lukeman can still see the benefits he brings, adding: "What he's done for the sport is brilliant. If it wasn't for him, our wages wouldn't be going up. There's more sponsors, more money coming in, all good." 
Hearn feels future competition may come from players even younger than Littler. "I watched a 10-year-old a few months ago who averaged 104.89 and checked out a 4-3 win with a 136 finish. They smell the money, the fame and put the hard work in," he said. How much better Littler can get is guesswork, although Plummer believes he wants to reach new heights. "He never says 'how good was I?' But I think he wants to break records and beat Phil Taylor's 16 World Championships and 16 World Matchplay titles," he said. "He's young enough to do it." A version of this article was originally published on 29 November. - • None Know a lot about Littler? Take our quiz - -------------------------------------------------------------------------------- - Score: 0.6099, Text: Luke Littler has risen from 164th to fourth in the rankings in a year - - A tearful Luke Littler hit a tournament record 140.91 set average as he started his bid for the PDC World Championship title with a dramatic 3-1 win over Ryan Meikle. The 17-year-old made headlines around the world when he reached the tournament final in January, where he lost to Luke Humphries. Starting this campaign on Saturday, Littler was millimetres away from a nine-darter when he missed double 12 as he blew Meikle away in the fourth and final set of the second-round match. Littler was overcome with emotion at the end, cutting short his on-stage interview. "It was probably the toughest game I've ever played. I had to fight until the end," he said later in a news conference. "As soon as the question came on stage and then boom, the tears came. It was just a bit too much to speak on stage. "It is the worst game I have played. I have never felt anything like that tonight." Admitting to nerves during the match, he told Sky Sports: "Yes, probably the biggest time it's hit me. Coming into it I was fine, but as soon as [referee] George Noble said 'game on', I couldn't throw them." 
Littler started slowly against Meikle, who had two darts for the opening set, but he took the lead by twice hitting double 20. Meikle did not look overawed against his fellow Englishman and levelled, but Littler won the third set and exploded into life in the fourth. The tournament favourite hit four maximum 180s as he clinched three straight legs in 11, 10 and 11 darts for a record set average, and 100.85 overall. Meanwhile, two seeds crashed out on Saturday night – five-time world champion Raymond van Barneveld lost to Welshman Nick Kenny, while England's Ryan Joyce beat Danny Noppert. Australian Damon Heta was another to narrowly miss out on a nine-darter, just failing on double 12 when throwing for the match in a 3-1 win over Connor Scutt. Ninth seed Heta hit four 100-plus checkouts to come from a set down against Scutt in a match in which both men averaged more than 97. - - Littler was hugged by his parents after victory over Meikle - - Littler returned to Alexandra Palace to a boisterous reception from more than 3,000 spectators and delivered an astonishing display in the fourth set. He was on for a nine-darter after his opening two throws in both of the first two legs and completed the set in 32 darts - the minimum possible is 27. The teenager will next play after Christmas against European Championship winner Ritchie Edhouse, the 29th seed, or Ian White, and is seeded to meet Humphries in the semi-finals. Having entered last year's event ranked 164th, Littler is up to fourth in the world and will go to number two if he reaches the final again this time. He has won 10 titles in his debut professional year, including the Premier League and Grand Slam of Darts. After reaching the World Championship final as a debutant aged just 16, Littler's life has been transformed and interest in darts has rocketed. Google say he was the most searched-for athlete online in the UK during 2024. 
This Christmas, more than 100,000 children are expected to be opening Littler-branded magnetic dartboards as presents. His impact has helped double the number of junior academies and has prompted plans to expand the World Championship. Littler was named BBC Young Sports Personality of the Year on Tuesday and was runner-up to athlete Keely Hodgkinson for the main award. - - Nick Kenny will play world champion Luke Humphries in round three after Christmas - - Barneveld was shocked 3-1 by world number 76 Kenny, who was in tears after a famous victory. Kenny, 32, will face Humphries in round three after defeating the Dutchman, who won the BDO world title four times and the PDC crown in 2007. Van Barneveld, ranked 32nd, became the sixth seed to exit in the second round. His compatriot Noppert, the 13th seed, was stunned 3-1 by Joyce, who will face Ryan Searle or Matt Campbell next, with the winner of that tie potentially meeting Littler in the last 16. Elsewhere, 15th seed Chris Dobey booked his place in the third round with a 3-1 win over Alexander Merkx. Englishman Dobey concluded an afternoon session which started with a trio of 3-0 scorelines. Northern Ireland's Brendan Dolan beat Lok Yin Lee to set up a meeting with three-time champion Michael van Gerwen after Christmas. In the final two first-round matches of the 2025 competition, Wales' Rhys Griffin beat Karel Sedlacek of the Czech Republic before Asia number one Alexis Toylo cruised past Richard Veenstra. - -------------------------------------------------------------------------------- - Score: 0.5980, Text: Luke Littler is one of six contenders for the 2024 BBC Sports Personality of the Year award. - - Here BBC Sport takes a look at the darts player's year in five photos. 
- -------------------------------------------------------------------------------- - Score: 0.5590, Text: Littler is Young Sports Personality of the Year - - This video can not be played To play this video you need to enable JavaScript in your browser. - - Darts player Luke Littler has been named BBC Young Sports Personality of the Year 2024. The 17-year-old has enjoyed a breakthrough year after finishing runner-up at the 2024 PDC World Darts Championship in January. The Englishman, who has won 10 senior titles on the Professional Darts Corporation tour this year, is the first darts player to claim the award. "It shows how well I have done this year, not only for myself, but I have changed the sport of darts," Littler told BBC One. "I know the amount of academies that have been brought up in different locations, tickets selling out at Ally Pally in hours and the Premier League selling out - it just shows how much I have changed it." - - He was presented with the trophy by Harry Aikines-Aryeetey - a former sprinter who won the award in 2005 - and ex-rugby union player Jodie Ounsley, both of whom are stars of the BBC television show Gladiators. Skateboarder Sky Brown, 16, and Para-swimmer William Ellard, 18, were also shortlisted for the award. Littler became a household name at the start of 2024 by reaching the World Championship final aged just 16 years and 347 days. That achievement was just the start of a trophy-laden year, with Littler winning the Premier League Darts, Grand Slam and World Series of Darts Finals among his haul of titles. Littler has gone from 164th to fourth in the world rankings and earned more than £1m in prize money in 2024. The judging panel for Young Sports Personality of the Year included Paralympic gold medallist Sammi Kinghorn, Olympic silver medal-winning BMX freestyler Keiran Reilly, television presenter Qasa Alom and Radio 1 DJ Jeremiah Asiamah, as well as representatives from the Youth Sport Trust, Blue Peter and BBC Sport. 
- -------------------------------------------------------------------------------- - Score: 0.5414, Text: Wright is the 17th seed at the World Championship - - Two-time champion Peter Wright won his opening game at the PDC World Championship, while Ryan Meikle edged out Fallon Sherrock to set up a match against teenage prodigy Luke Littler. Scotland's Wright, the 2020 and 2022 winner, has been out of form this year, but overcame Wesley Plaisier 3-1 in the second round at Alexandra Palace in London. "It was this crowd that got me through, they wanted me to win. I thank you all," said Wright. Meikle came from a set down to claim a 3-2 victory in his first-round match against Sherrock, who was the first woman to win matches at the tournament five years ago. The 28-year-old will now play on Saturday against Littler, who was named BBC Young Sports Personality of the Year and runner-up in the main award to athlete Keely Hodgkinson on Tuesday night. Littler, 17, will be competing on the Ally Pally stage for the first time since his rise to stardom when finishing runner-up in January's world final to Luke Humphries. Earlier on Tuesday, World Grand Prix champion Mike de Decker – the 24th seed - suffered a surprise defeat to Luke Woodhouse in the second round. He is the second seed to exit following 16th seed James Wade's defeat on Monday to Jermaine Wattimena, who meets Wright in round three. Kevin Doets recovered from a set down to win 3-1 against Noa-Lynn van Leuven, who was making history as the first transgender woman to compete in the tournament. - - Sherrock drew level at 2-2 but lost the final set to Meikle - - The 54-year-old Wright only averaged 89.63 to his opponent's 93.77, but did enough to progress. Sporting a purple mohawk and festive outfit, crowd favourite 'Snakebite' showed glimpses of his best to win the first set and survived eight set darts to go 2-0 ahead. 
He lost the next but Dutchman Plaisier missed two more set darts in the fourth and Wright seized his opportunity. "Wesley had his chances but he missed them and I took them," he said. "He's got his tour card and he's going to be a dangerous player next year for all the players playing against him." Sherrock, 30, fought back from 2-1 down to force a decider against her English compatriot Meikle. She then narrowly missed the bull to take out 170 in the fourth leg before left-hander Meikle held his nerve to hit double 18 for a 96 finish to seal a hard-fought success. "I felt under pressure from the start and to come through feels unbelievable," said Meikle. "It's an unbelievable prize to play Luke here on this stage. It's the biggest stage of them all. I'm so happy." World number 81 Jeffrey de Graaf, who was born in the Netherlands but now represents Sweden, looked in trouble against Rashad Sweeting before prevailing 3-1. Sweeting, who was making history as the first player from the Bahamas to compete in the tournament, took the first set, but De Graaf fought back to clinch a second-round meeting with two-time champion Gary Anderson Germany's Ricardo Pietreczko, ranked 34, beat China's Xiaochen Zong 3-1 and will face Gian van Veen next. - -------------------------------------------------------------------------------- - Score: 0.5402, Text: Second seed Smith knocked out of Worlds by Doets - - Michael Smith was 2-1 ahead but fell to a shock exit - - Former champion Michael Smith has been sensationally knocked out of the PDC World Championship by Kevin Doets. Englishman Smith, seeded second, went down 3-2 after a pulsating second-round duel at Alexandra Palace in London. Dutchman Doets prevailed 6-4 in the deciding set, despite checkouts of 123, 84, 94 and 76 from 2023 champion Smith. "This was the most stressful game of my life and I've won it, yes," said world number 51 Doets. "I felt if I can keep my focus, I won't lose this. 
It was so very tight, to get over the line was amazing." Doets, 26, took the first set and fought back after going 2-1 down to avenge his narrow defeat to Smith at the same stage last year. Having lost in the second round of the tournament for the first time since 2020, the 34-year-old Smith will now drop to at least 15 in the rankings. - - Doets won in the first round against Noa-Lynn van Leuven, who was the first transgender woman to play in the tournament - - England's Scott Williams, who made a shock run to the semi-finals in the 2024 tournament before losing to eventual champion Luke Humphries, overcame Niko Springer 3-1 in a thriller. German debutant Springer, second on this year's development tour, won all three legs in the opening set before the match exploded into life. Williams hit back, showing his old swagger as he went ahead after a sensational third set which featured seven 180s. The 34-year-old edged the deciding leg in the fourth and will meet 2018 champion Rob Cross in round two on Monday. Nick Kenny delighted the Ally Pally crowd with a fabulous 170 finish to seal a 3-0 victory in round one over American Stowe Buntz. The Welshman, 31, will face five-time world champion Raymond Barneveld on Saturday evening on a bill which also features teenage star Luke Littler against Ryan Meikle. Canadian Matt Campbell set up a second-round match against Ryan Searle with a 3-2 defeat of Austrian Mensur Suljovic. - - England's Callan Rydz averaged 107.06 to book his place in the second round, before Gabriel Clemens was knocked out by Wales' Robert Owen on Thursday afternoon. Rydz beat Croatian Romeo Grbavac 3-0, recording the tournament's highest average first-round match average in its current 96-player format. It was the competition's 26th highest match average overall and comfortably the best so far at the 2025 event. The previous record was held by teenager Luke Littler, who scored 106.12 at this stage last year. 
Rydz, from Bedlington in Northumberland, meets Germany's Martin Schindler in the second round on Sunday evening. The afternoon session concluded with Germany's 27th seed Clemens being beaten by Owen, who is ranked 50 places below him. Owen recorded a 3-1 victory, his second in a matter of days, to reach the third round, which begins on 27 December. Hong Kong's Lok Yin Lee came from a set down to beat Chris Landman 3-1 after winning nine straight legs. Lee will face Northern Ireland's Brendan Dolan in round two on Saturday afternoon. Meanwhile, 2024 Grand Slam of Darts runner-up Martin Lukeman came from a set down to beat Indian qualifier Nitin Kumar 3-1. Lukeman meets 21st seed Andrew Gilding on Monday afternoon for a place in the last 32. - -------------------------------------------------------------------------------- - Score: 0.5328, Text: Cross loses as record number of seeds out of Worlds - - Rob Cross has suffered three second-round exits in his eight World Championships - - Former champion Rob Cross became the latest high-profile casualty as a record-breaking 14th seed exited the PDC World Darts Championship in the second round. The number five seed was beaten 3-1 by close friend Scott Williams, who will face Germany's Ricardo Pietreczko in round three. Cross, who won the event on his debut in 2018, took the opening set but failed to reach anywhere near his best as he suffered his third second-round exit. He was joined by number six seed David Chisnall, who was beaten 3-2 in a sudden-death leg by Ricky Evans, who came into the tournament 46th in the PDC's Order of Merit. The 2021 semi-finalist won the opening set, but then found himself 2-1 down to an inspired Evans, who was cheered on relentlessly by the Alexandra Palace crowd. He forced the game into a deciding set and faced match dart but Evans missed bullseye by the width of the wire. 
Chisnall then missed his own match dart on double tops, before he made a miscalculation when attempting to checkout 139 at 5-4 down. No real harm was done with a sudden-death leg forced but he was unable to hold off Evans, who reaches the third round for the third time in the last five years. "It's not even what it is, again I've played a world-class darts player. I've played quite well and won," Evans told Sky Sports. "Look at this [the crowd], wow. I don't understand it, why are they cheering me on? "I don't get this reception in my household. Thank you very much. You've made a very fat guy very happy." Evans will face unseeded Welshman Robert Owen when the third round starts after the three-day Christmas break. - - World youth champion Gian van Veen had become the 12th seed to be knocked out when he lost 3-1 to Pietreczko. The 28th seed lost the opening set, having missed nine darts at double, but levelled. However, the Dutchman was unable to match Pietreczko, who closed out a comfortable win with a checkout percentage of 55.6%. Pietreczko said: "I am over the moon to win. It is very important for me to be in the third round after Christmas. I love the big stage." The 26th seed trailed 1-0 and 2-1, and both players went on to miss match darts, before Gurney won the final set 3-1 on legs. - - Jonny Clayton is into the third round of the PDC World Darts Championship for a sixth consecutive year - - In the afternoon session, Welsh number seven seed Jonny Clayton also needed sudden death to pull off a sensational final-set comeback against Mickey Mansell in. He was a leg away from defeat twice to his Northern Irish opponent, but came from behind to win the final set 6-5 in a sudden-death leg to win 3-2. Clayton, who will play Gurney in round three, lost the opening set of the match, but fought back to lead 2-1, before being pegged back again by 51-year-old Mansell, who then missed match darts on double tops in the deciding set. "I was very emotional. 
I've got to be honest, that meant a lot," said Clayton, who is in the favourable half of the draw following shock second-round exits for former world champions Michael Smith and Gary Anderson. "I had chances before and Mickey definitely had chances before. It wasn't great to play in, not the best - I wouldn't wish that on my worst enemy. "There is a lot of weight off my shoulders after that. I know there is another gear or two in the bank, but I'll be honest that meant a lot to me, it is a tester and will try and make me believe again." Clayton was 2-0 down in the fifth set after consecutive 136 and 154 checkouts from Mansell, but won three legs on the trot in 15, 12 and 10 darts to wrestle a 3-2 lead. He missed three darts for the match, before his unseeded opponent held and broke Clayton's throw to lead 4-3. Mansell missed a match dart at double 20, before Clayton won on double five after two missed checkouts. Elsewhere, Northern Ireland's Josh Rock booked his place in the third round against England's Chris Dobey with a 3-0 win over Wales' Rhys Griffin. Martin Lukeman, runner-up to Luke Littler at the Grand Slam of Darts last month, is out after a 3-1 loss to number 21 seed Andrew Gilding. The final day before the Christmas break started with Poland's number 31 seed Krzysztof Ratajski recording a 3-1 win over Alexis Toylo of the Philippines. - - All times are GMT and subject to change. Two fourth-round matches will also be played - -------------------------------------------------------------------------------- - Score: 0.5116, Text: Michael van Gerwen has made just one major ranking event final in 2024 - - Michael van Gerwen enjoyed a comfortable 3-0 victory over English debutant James Hurrell in his opening match of the PDC World Darts Championship. The three-time world champion has had a tough year by his standards, having fallen behind Luke Littler and Luke Humphries, so a relatively stress-free opening match at Alexandra Palace was just what was needed. 
Hurrell, 40, offered some resistance early on when taking the opening leg of the match, but he would win just two more as Van Gerwen proved far too strong. The third-seeded Dutchman averaged 94.85, took out two three-figure checkouts and hit 50% of his doubles - with six of his nine misses coming in one scrappy leg. Van Gerwen, 35, will now face either Brendan Dolan or Lok Yin Lee in the third round. - - "I think I played OK," Van Gerwen told Sky Sports after his match. "Of course, I was a bit nervous. Like everyone knows it's been a tough year for me. "Overall, it was a good performance. I was confident. I won the game, that's the main thing." Also on Friday night, Germany's Florian Hempel showed why he loves playing on the Alexandra Palace stage with a thrilling 3-1 victory in a high-quality contest against Jeffrey de Zwaan. Both men hit seven 180s in a match played at a fast and furious pace, but 34-year-old Hempel's superior doubles gave him a fourth straight first-round victory in the competition. Hempel moves on to a tie with 26th seed Daryl Gurney but it was a damaging loss for De Zwaan, 28, who came through a late qualifier in November and needed a good run here to keep his PDC tour card for next season. Mickey Mansell earned a second-round date with world number seven Jonny Clayton after a scrappy 3-1 win over Japan's Tomoya Goto, while Dylan Slevin came through an all-Irish tie against William O'Connor to progress to a meeting with Dimitri van den Bergh. - - Stephen Bunting is in the third round of the PDC World Darts Championship for a third consecutive year - - In the afternoon session, Stephen Bunting came from behind to beat Kai Gotthardt 3-1 and book his place in the third round. Englishman Bunting, ranked eighth in the world, dropped the first set and almost went 2-0 down in the match before staging an impressive recovery. 
Tournament debutant Gotthardt missed three darts at double eight to win the second set, allowing Bunting to take out double 10 to level the match before powering away to victory by winning the third and fourth sets without losing a leg. Victory for "The Bullet" sets up a last 32 meeting with the winner of Dirk van Duijvenbode's meeting with Madars Razma after Christmas. Should Bunting progress further, he is seeded to face world number one and defending world champion Luke Humphries in the quarter-finals on New Year's Day. Elsewhere in Friday afternoon's session, the Dutch duo of Alexander Merkx and Wessel Nijman advanced to the second round with wins over Stephen Burton and Cameron Carolissen respectively. England's Ian White was handed a walkover victory against Sandro Eric Sosing of the Philippines. Sosing withdrew from the competition on medical grounds and was taken to hospital following chest pains. - -------------------------------------------------------------------------------- - Score: 0.5113, Text: Gary Anderson was the fifth seed to be beaten on Sunday - - Two-time champion Gary Anderson has been dumped out of the PDC World Championship on his 54th birthday by Jeffrey de Graaf. The Scot, winner in 2015 and 2016, lost 3-0 to the Swede in a second-round shock at Alexandra Palace in London. "Gary didn't really show up as he usually does. I'm very happy with the win," said De Graaf, 34, who had a 75% checkout success and began with an 11-dart finish. "It's a dream come true for me. He's been my idol since I was 14 years old." Anderson, ranked 14th, became the 11th seed to be knocked out from the 24 who have played so far, and the fifth to fall on Sunday. - - He came into the competition with the year's highest overall three-dart average of 99.66 but hit just three of his 20 checkout attempts to lose his opening match of the tournament for the first time. 
De Graaf will now meet Filipino qualifier Paolo Nebrida after he stunned England's Ross Smith, the 19th seed, in straight sets. Ritchie Edhouse, Dirk van Duijvenbode and Martin Schindler were the other seeds beaten on day eight. England's Callan Rydz, who hit a record first-round average of 107.06 on Thursday, followed up with a 3-0 win over 23rd seed Schindler on Sunday. The German missed double 12 for a nine-darter in the first set – the third player to do so in 24 hours after Luke Littler and Damon Heta – and ended up losing the leg. Rydz next meets Belgian Dimitri van den Bergh, who hit six 180s and averaged 96 in a 3-0 win over Irishman Dylan Slevin. - - England's Joe Cullen abruptly left his post-match news conference and accused the media of not showing him respect after his 3-0 win over Dutchman Wessel Nijman. Nijman, who has previously served a ban for breaching betting and anti-corruption rules, had been billed as favourite beforehand to beat 23rd seed Cullen. "Honestly, the media attention that Wessel's got, again this is not a reflection on him," Cullen said. "He seems like a fantastic kid, he's been caught up in a few things beforehand, but he's served his time and he's held his hands up, like a lot haven't. "I think the way I've been treated probably with the media and things like that - I know you guys have no control over the bookies - I've been shown no respect, so I won't be showing any respect to any of you guys tonight. "I'm going to go home. Cheers." Ian 'Diamond' White beat European champion and 29th seed Edhouse 3-1 and will face teenage star Littler in the next round. White, born in the same Cheshire town as the 17-year-old, acknowledged he would need to up his game in round three. Asked if he knew who was waiting for him, White joked: "Yeah, Runcorn's number two. I'm from Runcorn and I'm number one." 
Ryan Searle started Sunday afternoon's action off with a 10-dart leg and went on to beat Matt Campbell 3-0, while Latvian Madars Razma defeated 25th seed Van Duijvenbode 3-1. Seventh seed Jonny Clayton and 2018 champion Rob Cross are among the players in action on Monday as the second round concludes. The third round will start on Friday after a three-day break for Christmas. - -------------------------------------------------------------------------------- - Score: 0.5105, Text: Christian Kist was sealing his first televised nine-darter - - Christian Kist hit a nine-darter but lost his PDC World Championship first-round match to Madars Razma. The Dutchman became the first player to seal a perfect leg in the tournament since Michael Smith did so on the way to beating Michael van Gerwen in the 2023 final. Kist, the 2012 BDO world champion at Lakeside, collects £60,000 for the feat, with the same amount being awarded by sponsors to a charity and to one spectator inside Alexandra Palace in London. The 38-year-old's brilliant finish sealed the opening set, but his Latvian opponent bounced back to win 3-1. Darts is one of the few sports that can measure perfection; snooker has the 147 maximum break, golf has the hole-in-one, darts has the nine-dart finish. Kist scored two maximum 180s to leave a 141 checkout which he completed with a double 12, to the delight of more than 3,000 spectators. The English 12th seed, who has been troubled by wrist and back injuries, could next play Andrew Gilding in the third round - which begins on 27 December - should Gilding beat the winner of Martin Lukeman's match against qualifier Nitin Kumar. Aspinall faces a tough task to reach the last four again, with 2018 champion Rob Cross and 2024 runner-up Luke Littler both in his side of the draw. - - Kist - who was knocked out of last year's tournament by teenager Littler - will still earn a bigger cheque than he would have got for a routine run to the quarter-finals. 
His nine-darter was the 15th in the history of the championship and first since the greatest leg in darts history when Smith struck, moments after Van Gerwen just missed his attempt. Darts fan Kris, a railway worker from Sutton in south London, was the random spectator picked out to receive £60,000, with Prostate Cancer UK getting the same sum from tournament sponsors Paddy Power. "I'm speechless to be honest. I didn't expect it to happen to me," Kris said. "This was a birthday present so it makes it even better. My grandad got me tickets. It was just a normal day - I came here after work." Kist said: "Hitting the double 12 felt amazing. It was a lovely moment for everyone and I hope Kris enjoys the money. Maybe I will go on vacation next month." Earlier, Jim Williams was favourite against Paolo Nebrida but lost 3-2 in an epic lasting more than an hour. The Filipino took a surprise 2-1 lead and Williams only went ahead for the first time in the opening leg of the deciding set. The Welshman looked on course for victory but missed five match darts. UK Open semi-finalist Ricky Evans set up a second-round match against Dave Chisnall, checking out on 109 to edge past Gordon Mathers 3-2. - -------------------------------------------------------------------------------- - - -## Retrieval-Augmented Generation (RAG) with Couchbase and LangChain -Couchbase and LangChain can be seamlessly integrated to create RAG (Retrieval-Augmented Generation) chains, enhancing the process of generating contextually relevant responses. In this setup, Couchbase serves as the vector store, where embeddings of documents are stored. When a query is made, LangChain retrieves the most relevant documents from Couchbase by comparing the query’s embedding with the stored document embeddings. These documents, which provide contextual information, are then passed to a generative language model within LangChain. 
- -The language model, equipped with the context from the retrieved documents, generates a response that is both informed and contextually accurate. This integration allows the RAG chain to leverage Couchbase’s efficient storage and retrieval capabilities, while LangChain handles the generation of responses based on the context provided by the retrieved documents. Together, they create a powerful system that can deliver highly relevant and accurate answers by combining the strengths of both retrieval and generation. - - -```python -# Create RAG prompt template -rag_prompt = ChatPromptTemplate.from_messages([ - ("system", "You are a helpful assistant that answers questions based on the provided context."), - ("human", "Context: {context}\n\nQuestion: {question}") -]) - -# Create RAG chain -rag_chain = ( - {"context": vector_store.as_retriever(), "question": RunnablePassthrough()} - | rag_prompt - | llm - | StrOutputParser() -) -logging.info("Successfully created RAG chain") -``` - - 2025-05-25 14:41:41,810 - INFO - Successfully created RAG chain - - - -```python -try: - start_time = time.time() - rag_response = rag_chain.invoke(query) - rag_elapsed_time = time.time() - start_time - - print(f"RAG Response: {rag_response}") - print(f"RAG response generated in {rag_elapsed_time:.2f} seconds") -except InternalServerFailureException as e: - if "query request rejected" in str(e): - print("Error: Search request was rejected due to rate limiting. Please try again later.") - else: - print(f"Internal server error occurred: {str(e)}") -except Exception as e: - print(f"Unexpected error occurred: {str(e)}") -``` - - RAG Response: In his recent 2025 PDC World Championship second-round match against Ryan Meikle, **Luke Littler** achieved several notable milestones and records: - - 1. **Tournament Record Set Average**: - - Littler hit a **140.91 set average** in the fourth set, the highest ever recorded in the tournament for a single set. 
This included three consecutive legs finished in 11, 10, and 11 darts. - - 2. **Near Nine-Darter**: - - He narrowly missed a nine-dart finish (the pinnacle of darts perfection) by millimeters when he failed to land double 12 in the fourth set. - - 3. **Overall Performance**: - - Despite a slow start and admitted nerves, he secured a **3-1 victory** with a dominant fourth set, hitting **four maximum 180s** and maintaining an overall match average of **100.85**. - - 4. **Emotional Impact**: - - The 17-year-old became emotional post-match, cutting short his on-stage interview due to the intensity of the moment, later calling it the "toughest game" he’d ever played. - - These achievements highlight his resilience and skill, further cementing his status as a rising star in darts. - RAG response generated in 21.84 seconds - - -## Using Couchbase as a caching mechanism -Couchbase can be effectively used as a caching mechanism for RAG (Retrieval-Augmented Generation) responses by storing and retrieving precomputed results for specific queries. This approach enhances the system's efficiency and speed, particularly when dealing with repeated or similar queries. When a query is first processed, the RAG chain retrieves relevant documents, generates a response using the language model, and then stores this response in Couchbase, with the query serving as the key. - -For subsequent requests with the same query, the system checks Couchbase first. If a cached response is found, it is retrieved directly from Couchbase, bypassing the need to re-run the entire RAG process. This significantly reduces response time because the computationally expensive steps of document retrieval and response generation are skipped. Couchbase's role in this setup is to provide a fast and scalable storage solution for caching these responses, ensuring that frequently asked queries can be answered more quickly and efficiently. 
- - - -```python -try: - queries = [ - "What happened in the match between Fullham and Liverpool?", - "What were Luke Littler's key achievements and records in his recent PDC World Championship match?", # Repeated query - "What happened in the match between Fullham and Liverpool?", # Repeated query - ] - - for i, query in enumerate(queries, 1): - print(f"\nQuery {i}: {query}") - start_time = time.time() - - response = rag_chain.invoke(query) - elapsed_time = time.time() - start_time - print(f"Response: {response}") - print(f"Time taken: {elapsed_time:.2f} seconds") - -except InternalServerFailureException as e: - if "query request rejected" in str(e): - print("Error: Search request was rejected due to rate limiting. Please try again later.") - else: - print(f"Internal server error occurred: {str(e)}") -except Exception as e: - print(f"Unexpected error occurred: {str(e)}") -``` - - - Query 1: What happened in the match between Fullham and Liverpool? - Response: In the match between Fulham and Liverpool, the game ended in a 2-2 draw. Key highlights include: - - 1. **Red Card Incident**: Liverpool played most of the match with 10 men after Andy Robertson received a red card in the 17th minute for denying a goalscoring opportunity. He had earlier been injured in a tackle by Fulham's Issa Diop. - - 2. **Comeback Resilience**: Despite the numerical disadvantage, Liverpool twice came from behind. Diogo Jota scored an 86th-minute equalizer to secure a point. Fulham's Antonee Robinson praised Liverpool, noting it "didn’t feel like they had 10 men" due to their aggressive, high-pressing approach. - - 3. **Performance Metrics**: Liverpool dominated possession (over 60%) and led in key attacking stats (shots, big chances, touches in the opposition box), showcasing their determination even with a player deficit. - - 4. 
**Manager & Player Reactions**: - - Manager Arne Slot commended his team’s "outstanding" character and resilience, particularly highlighting Robertson’s effort despite the red card. - - Captain Virgil van Dijk emphasized the team’s ability to "stay calm" and fight back under pressure. - - 5. **League Impact**: The draw extended Liverpool’s lead at the top of the Premier League to five points, as rivals Arsenal also dropped points. Pundits, including Chris Sutton, lauded Liverpool’s "phenomenal" response to adversity. - - Fulham’s strong performance, described as "brave," was also acknowledged, making the match a thrilling encounter between both sides. - Time taken: 14.14 seconds - - Query 2: What were Luke Littler's key achievements and records in his recent PDC World Championship match? - Response: In his recent 2025 PDC World Championship second-round match against Ryan Meikle, **Luke Littler** achieved several notable milestones and records: - - 1. **Tournament Record Set Average**: - - Littler hit a **140.91 set average** in the fourth set, the highest ever recorded in the tournament for a single set. This included three consecutive legs finished in 11, 10, and 11 darts. - - 2. **Near Nine-Darter**: - - He narrowly missed a nine-dart finish (the pinnacle of darts perfection) by millimeters when he failed to land double 12 in the fourth set. - - 3. **Overall Performance**: - - Despite a slow start and admitted nerves, he secured a **3-1 victory** with a dominant fourth set, hitting **four maximum 180s** and maintaining an overall match average of **100.85**. - - 4. **Emotional Impact**: - - The 17-year-old became emotional post-match, cutting short his on-stage interview due to the intensity of the moment, later calling it the "toughest game" he’d ever played. - - These achievements highlight his resilience and skill, further cementing his status as a rising star in darts. 
- Time taken: 1.82 seconds - - Query 3: What happened in the match between Fullham and Liverpool? - Response: In the match between Fulham and Liverpool, the game ended in a 2-2 draw. Key highlights include: - - 1. **Red Card Incident**: Liverpool played most of the match with 10 men after Andy Robertson received a red card in the 17th minute for denying a goalscoring opportunity. He had earlier been injured in a tackle by Fulham's Issa Diop. - - 2. **Comeback Resilience**: Despite the numerical disadvantage, Liverpool twice came from behind. Diogo Jota scored an 86th-minute equalizer to secure a point. Fulham's Antonee Robinson praised Liverpool, noting it "didn’t feel like they had 10 men" due to their aggressive, high-pressing approach. - - 3. **Performance Metrics**: Liverpool dominated possession (over 60%) and led in key attacking stats (shots, big chances, touches in the opposition box), showcasing their determination even with a player deficit. - - 4. **Manager & Player Reactions**: - - Manager Arne Slot commended his team’s "outstanding" character and resilience, particularly highlighting Robertson’s effort despite the red card. - - Captain Virgil van Dijk emphasized the team’s ability to "stay calm" and fight back under pressure. - - 5. **League Impact**: The draw extended Liverpool’s lead at the top of the Premier League to five points, as rivals Arsenal also dropped points. Pundits, including Chris Sutton, lauded Liverpool’s "phenomenal" response to adversity. - - Fulham’s strong performance, described as "brave," was also acknowledged, making the match a thrilling encounter between both sides. - Time taken: 1.52 seconds - - -## Conclusion -By following these steps, you'll have a fully functional semantic search engine that leverages the strengths of Couchbase and Deepseek(via Openrouter). 
This guide is designed not just to show you how to build the system, but also to explain why each step is necessary, giving you a deeper understanding of the principles behind semantic search and how to implement it effectively. Whether you're a newcomer to software development or an experienced developer looking to expand your skills, this guide will provide you with the knowledge and tools you need to create a powerful, AI-driven search engine. diff --git a/tutorial/markdown/generated/vector-search-cookbook/openrouter-deepseek-gsi-RAG_with_Couchbase_and_Openrouter_Deepseek.md b/tutorial/markdown/generated/vector-search-cookbook/openrouter-deepseek-gsi-RAG_with_Couchbase_and_Openrouter_Deepseek.md deleted file mode 100644 index 3ee0b21..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/openrouter-deepseek-gsi-RAG_with_Couchbase_and_Openrouter_Deepseek.md +++ /dev/null @@ -1,816 +0,0 @@ ---- -# frontmatter -path: "/tutorial-openrouter-deepseek-with-global-secondary-index" -title: Retrieval-Augmented Generation with Couchbase and OpenRouter Deepseek using GSI index -short_title: RAG with Couchbase and OpenRouter Deepseek using GSI index -description: - - Learn how to build a semantic search engine using Couchbase and OpenRouter with Deepseek using GSI index. - - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with OpenRouter Deepseek as both embeddings and language model provider. - - You'll understand how to perform Retrieval-Augmented Generation (RAG) using LangChain and Couchbase. 
-content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - GSI - - Artificial Intelligence - - LangChain - - Deepseek - - OpenRouter -sdk_language: - - python -length: 60 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/openrouter-deepseek/gsi/RAG_with_Couchbase_and_Openrouter_Deepseek.ipynb) - -# Introduction -In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database, [Deepseek V3 as the language model provider (via OpenRouter or the direct API)](https://deepseek.ai/), and OpenAI for embeddings. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system using a GSI (Global Secondary Index) from scratch. Alternatively, if you want to perform semantic search using an FTS index, please take a look at [this tutorial](https://developer.couchbase.com/tutorial-openrouter-deepseek-with-fts/). - -# How to run this tutorial - -This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/openrouter-deepseek/gsi/RAG_with_Couchbase_and_Openrouter_Deepseek.ipynb). - -You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment. 
- -# Before you start - -## Get Credentials for OpenRouter and Deepseek -* Sign up for an account at [OpenRouter](https://openrouter.ai/) to get your API key -* OpenRouter provides access to Deepseek models, so no separate Deepseek credentials are needed -* Store your OpenRouter API key securely, as it will be used to access the models -* For [Deepseek](https://deepseek.ai/) models, you can use the default models provided by OpenRouter - -## Create and Deploy Your Free Tier Operational Cluster on Capella - -To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint. - -To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html). - -Note: To run this tutorial, you will need Capella with Couchbase Server version 8.0 or above, as GSI vector search is supported only from version 8.0. - -### Couchbase Capella Configuration - -When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met. - -* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the required bucket (Read and Write) used in the application. -* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running. - -## Setting the Stage: Installing Necessary Libraries - -To build our semantic search engine, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks. - - -```python -%pip install --quiet datasets==3.5.0 langchain-couchbase==0.5.0 langchain-deepseek==0.1.3 langchain-openai==0.3.13 python-dotenv==1.1.1 -``` - - Note: you may need to restart the kernel to use updated packages. 
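Because the tutorial pins exact package versions, it can be useful to confirm what actually got installed before moving on. Below is a minimal sketch using only the Python standard library; the helper name `installed_version` is our own and not part of any of the packages above.

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(package: str):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# Check the distributions pinned by the %pip install cell above
for pkg in ("langchain-couchbase", "langchain-deepseek", "langchain-openai"):
    print(pkg, "->", installed_version(pkg))
```

If any package prints `None`, re-run the install cell and restart the kernel before continuing.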
- - -## Importing Necessary Libraries - -The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. - - -```python -import getpass -import json -import logging -import os -import time -from datetime import timedelta - -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.exceptions import (CouchbaseException, - InternalServerFailureException, - QueryIndexAlreadyExistsException,ServiceUnavailableException) -from couchbase.management.buckets import CreateBucketSettings -from couchbase.management.search import SearchIndex -from couchbase.options import ClusterOptions -from datasets import load_dataset -from dotenv import load_dotenv -from langchain_core.globals import set_llm_cache -from langchain_core.output_parsers import StrOutputParser -from langchain_core.prompts.chat import ChatPromptTemplate -from langchain_core.runnables import RunnablePassthrough -from langchain_couchbase.cache import CouchbaseCache -from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore -from langchain_couchbase.vectorstores import DistanceStrategy -from langchain_couchbase.vectorstores import IndexType -from langchain_openai import OpenAIEmbeddings -``` - -## Setup Logging -Logging is configured to track the progress of the script and capture any errors or warnings. - - -```python -logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True) - -# Suppress httpx logging -logging.getLogger('httpx').setLevel(logging.CRITICAL) -``` - -## Environment Variables and Configuration - -This section handles loading and validating environment variables and configuration settings: -# -1. 
API Keys: - - Supports either direct Deepseek API or OpenRouter API access - - Prompts for API key input if not found in environment - - Requires OpenAI API key for embeddings -# -2. Couchbase Settings: - - Connection details (host, username, password) - - Bucket, scope and collection names - - Vector search index configuration - - Cache collection settings -# -The code validates that all required credentials are present before proceeding. -It allows flexible configuration through environment variables or interactive prompts, -with sensible defaults for local development. - - - -```python -# Load environment variables from .env file if it exists -load_dotenv(override= True) - -# API Keys -# Allow either Deepseek API directly or via OpenRouter -DEEPSEEK_API_KEY = os.getenv('DEEPSEEK_API_KEY') -OPENROUTER_API_KEY = os.getenv('OPENROUTER_API_KEY') - -if not DEEPSEEK_API_KEY and not OPENROUTER_API_KEY: - api_choice = input('Choose API (1 for Deepseek direct, 2 for OpenRouter): ') - if api_choice == '1': - DEEPSEEK_API_KEY = getpass.getpass('Enter your Deepseek API Key: ') - else: - OPENROUTER_API_KEY = getpass.getpass('Enter your OpenRouter API Key: ') - -OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or getpass.getpass('Enter your OpenAI API Key: ') - -# Couchbase Settings -CB_HOST = os.getenv('CB_HOST') or input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost' -CB_USERNAME = os.getenv('CB_USERNAME') or input('Enter your Couchbase username (default: Administrator): ') or 'Administrator' -CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass('Enter your Couchbase password (default: password): ') or 'password' -CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input('Enter your Couchbase bucket name (default: query-vector-search-testing): ') or 'query-vector-search-testing' -SCOPE_NAME = os.getenv('SCOPE_NAME') or input('Enter your scope name (default: shared): ') or 'shared' -COLLECTION_NAME = os.getenv('COLLECTION_NAME') or 
input('Enter your collection name (default: deepseek): ') or 'deepseek' -CACHE_COLLECTION = os.getenv('CACHE_COLLECTION') or input('Enter your cache collection name (default: cache): ') or 'cache' - -# Check if required credentials are set -required_creds = { - 'OPENAI_API_KEY': OPENAI_API_KEY, - 'CB_HOST': CB_HOST, - 'CB_USERNAME': CB_USERNAME, - 'CB_PASSWORD': CB_PASSWORD, - 'CB_BUCKET_NAME': CB_BUCKET_NAME -} - -# Add the API key that was chosen -if DEEPSEEK_API_KEY: - required_creds['DEEPSEEK_API_KEY'] = DEEPSEEK_API_KEY -elif OPENROUTER_API_KEY: - required_creds['OPENROUTER_API_KEY'] = OPENROUTER_API_KEY -else: - raise ValueError("Either Deepseek API Key or OpenRouter API Key must be provided") - -for cred_name, cred_value in required_creds.items(): - if not cred_value: - raise ValueError(f"{cred_name} is not set") -``` - -# Connecting to the Couchbase Cluster -Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount. - - - - -```python -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to Couchbase") -except Exception as e: - raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}") -``` - - 2025-09-17 15:40:27,133 - INFO - Successfully connected to Couchbase - - -## Setting Up Collections in Couchbase - -The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase: - -1. 
Bucket Creation: - - Checks if specified bucket exists, creates it if not - - Sets bucket properties like RAM quota (1024MB) and replication (disabled) - - Note: If you are using Capella, create a bucket manually called query-vector-search-testing (or any name you prefer) with the same properties. - -2. Scope Management: - - Verifies if requested scope exists within bucket - - Creates new scope if needed (unless it's the default "_default" scope) - -3. Collection Setup: - - Checks for collection existence within scope - - Creates collection if it doesn't exist - - Waits 2 seconds for collection to be ready - -Additional Tasks: -- Clears any existing documents for clean state -- Implements comprehensive error handling and logging - -The function is called twice to set up: -1. Main collection for vector embeddings -2. Cache collection for storing results - - - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' exists.") - except Exception as e: - logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...") - bucket_settings = CreateBucketSettings( - name=bucket_name, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - time.sleep(2) # Wait for bucket creation to complete and become available - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' created successfully.") - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == scope_name for scope in scopes) - - if not scope_exists and scope_name != "_default": - logging.info(f"Scope '{scope_name}' does not exist. 
Creating it...") - bucket_manager.create_scope(scope_name) - logging.info(f"Scope '{scope_name}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{collection_name}' does not exist. Creating it...") - bucket_manager.create_collection(scope_name, collection_name) - logging.info(f"Collection '{collection_name}' created successfully.") - else: - logging.info(f"Collection '{collection_name}' already exists. Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION) - -``` - - 2025-09-17 15:41:01,398 - INFO - Bucket 'query-vector-search-testing' exists. - 2025-09-17 15:41:01,410 - INFO - Collection 'deepseek' does not exist. Creating it... - 2025-09-17 15:41:01,453 - INFO - Collection 'deepseek' created successfully. - 2025-09-17 15:41:03,712 - INFO - All documents cleared from the collection. - 2025-09-17 15:41:03,713 - INFO - Bucket 'query-vector-search-testing' exists. - 2025-09-17 15:41:03,728 - INFO - Collection 'cache' already exists. Skipping creation. 
- 2025-09-17 15:41:05,821 - INFO - All documents cleared from the collection. - - - - - - - - - -## Creating the Embeddings client -This section creates an OpenAI embeddings client using the OpenAI API key. -The embeddings client is configured to use the "text-embedding-3-small" model, -which converts text into numerical vector representations. -These vector embeddings are essential for semantic search and similarity matching. -The client will be used by the vector store to generate embeddings for documents. - - -```python -try: - embeddings = OpenAIEmbeddings( - api_key=OPENAI_API_KEY, - model="text-embedding-3-small" - ) - logging.info("Successfully created OpenAI embeddings client") -except Exception as e: - raise ValueError(f"Error creating OpenAI embeddings client: {str(e)}") -``` - - 2025-09-17 15:41:27,149 - INFO - Successfully created OpenAI embeddings client - - -## Setting Up the Couchbase Vector Store -A vector store is where we'll keep our embeddings. Unlike the FTS index, which is used for text-based search, the vector store is specifically designed to handle embeddings and perform similarity searches. When a user inputs a query, the search engine converts the query into an embedding and compares it against the embeddings stored in the vector store. This allows the engine to find documents that are semantically similar to the query, even if they don't contain the exact same words. By setting up the vector store in Couchbase, we create a powerful tool that enables our search engine to understand and retrieve information based on the meaning and context of the query, rather than just the specific words used. 
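The vector store in the next cell is configured with `DistanceStrategy.COSINE`. For intuition about what that metric measures, here is a standalone sketch in plain Python (no Couchbase involved): cosine distance is one minus the cosine of the angle between two embedding vectors, so vectors pointing in the same direction score 0 and orthogonal vectors score 1, regardless of their magnitudes.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: 0 for identical directions, 1 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0 -- same direction
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 -- orthogonal
print(cosine_distance([1.0, 2.0], [2.0, 4.0]))  # ~0.0 -- scaled copies point the same way
```

Because the metric ignores vector magnitude, it works well for text embeddings, where direction (meaning) matters more than length.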
- - -```python -try: - vector_store = CouchbaseQueryVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding = embeddings, - distance_metric=DistanceStrategy.COSINE - ) - logging.info("Successfully created vector store") -except Exception as e: - raise ValueError(f"Failed to create vector store: {str(e)}") -``` - - 2025-09-17 15:41:55,394 - INFO - Successfully created vector store - - -## Load the BBC News Dataset -To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively. - -The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version. - - -```python -try: - news_dataset = load_dataset( - "RealTimeData/bbc_news_alltime", "2024-12", split="train" - ) - print(f"Loaded the BBC News dataset with {len(news_dataset)} rows") - logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.") -except Exception as e: - raise ValueError(f"Error loading the BBC News dataset: {str(e)}") -``` - - 2025-09-17 15:42:04,530 - INFO - Successfully loaded the BBC News dataset with 2687 rows. - - - Loaded the BBC News dataset with 2687 rows - - -## Cleaning up the Data -We will use the content of the news articles for our RAG system. 
- -The dataset contains a few duplicate records. We are removing them to avoid duplicate results in the retrieval stage of our RAG system. - - -```python -news_articles = news_dataset["content"] -unique_articles = set() -for article in news_articles: - if article: - unique_articles.add(article) -unique_news_articles = list(unique_articles) -print(f"We have {len(unique_news_articles)} unique articles in our database.") -``` - - We have 1749 unique articles in our database. - - -## Saving Data to the Vector Store -To efficiently handle the large number of articles, we process them in batches of articles at a time. This batch processing approach helps manage memory usage and provides better control over the ingestion process. - -We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's add_texts method, we add the filtered articles to our vector database. The batch_size parameter controls how many articles are processed in each iteration. - -This approach offers several benefits: -1. Memory Efficiency: Processing in smaller batches prevents memory overload -2. Progress Tracking: Easier to monitor and track the ingestion progress -3. Resource Management: Better control over CPU and network resource utilization - -We use a conservative batch size of 50 to ensure reliable operation. -The optimal batch size depends on many factors including: -- Document sizes being inserted -- Available system resources -- Network conditions -- Concurrent workload - -Consider measuring performance with your specific workload before adjusting. 
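The `add_texts` call in the next cell handles batching internally via its `batch_size` parameter. For intuition about what that batching amounts to, here is an illustrative sketch of splitting a list into fixed-size chunks; the `chunked` helper is our own and not part of langchain-couchbase.

```python
def chunked(items, batch_size):
    """Yield successive batches of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# e.g. 120 articles in batches of 50 -> batches of 50, 50, and 20
batch_sizes = [len(batch) for batch in chunked(list(range(120)), 50)]
print(batch_sizes)  # [50, 50, 20]
```

Each batch triggers one round of embedding generation and one bulk write, which is what keeps memory usage and network load bounded.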
- - - -```python -batch_size = 50 - -# Automatic Batch Processing -articles = [article for article in unique_news_articles if article and len(article) <= 50000] - -try: - vector_store.add_texts( - texts=articles, - batch_size=batch_size - ) - logging.info("Document ingestion completed successfully.") -except Exception as e: - raise ValueError(f"Failed to save documents to vector store: {str(e)}") -``` - - 2025-09-17 16:08:51,054 - INFO - Document ingestion completed successfully. - - -## Setting Up the LLM Model -In this section, we set up the Large Language Model (LLM) for our RAG system. We're using the Deepseek model, which can be accessed through two different methods: - -1. **Deepseek API Key**: This is obtained directly from Deepseek's platform (https://deepseek.ai) by creating an account and subscribing to their API services. With this key, you can access Deepseek's models directly using the `ChatDeepSeek` class from the `langchain_deepseek` package. - -2. **OpenRouter API Key**: OpenRouter (https://openrouter.ai) is a service that provides unified access to multiple LLM providers, including Deepseek. You can obtain an API key by creating an account on OpenRouter's website. This approach uses the `ChatOpenAI` class from `langchain_openai` but with a custom base URL pointing to OpenRouter's API endpoint. - -The key difference is that OpenRouter acts as an intermediary service that can route your requests to various LLM providers, while the Deepseek API gives you direct access to only Deepseek's models. OpenRouter can be useful if you want to switch between different LLM providers without changing your code significantly. - -In our implementation, we check for both keys and prioritize using the Deepseek API directly if available, falling back to OpenRouter if not. The model is configured with temperature=0 to ensure deterministic, focused responses suitable for RAG applications. 
- - - -```python -from langchain_deepseek import ChatDeepSeek -from langchain_openai import ChatOpenAI - -if DEEPSEEK_API_KEY: - try: - llm = ChatDeepSeek( - api_key=DEEPSEEK_API_KEY, - model_name="deepseek-chat", - temperature=0 - ) - logging.info("Successfully created Deepseek LLM client") - except Exception as e: - raise ValueError(f"Error creating Deepseek LLM client: {str(e)}") -elif OPENROUTER_API_KEY: - try: - llm = ChatOpenAI( - api_key=OPENROUTER_API_KEY, - base_url="https://openrouter.ai/api/v1", - model="deepseek/deepseek-chat-v3.1", - temperature=0, - ) - logging.info("Successfully created Deepseek LLM client through OpenRouter") - except Exception as e: - raise ValueError(f"Error creating Deepseek LLM client: {str(e)}") -else: - raise ValueError("Either Deepseek API Key or OpenRouter API Key must be provided") -``` - - 2025-09-18 11:18:25,192 - INFO - Successfully created Deepseek LLM client through OpenRouter - - -# Perform Semantic Search -Semantic search in Couchbase involves converting queries and documents into vector representations using an embeddings model. These vectors capture the semantic meaning of the text and are stored directly in Couchbase. When a query is made, Couchbase performs a similarity search by comparing the query vector against the stored document vectors. The similarity metric used for this comparison is configurable, allowing flexibility in how the relevance of documents is determined. Common metrics include cosine similarity, Euclidean distance, or dot product, but other metrics can be implemented based on specific use cases. Different embedding models like BERT, Word2Vec, or GloVe can also be used depending on the application's needs, with the vectors generated by these models stored and searched within Couchbase itself. - -In the provided code, the search process begins by recording the start time, followed by executing the `similarity_search_with_score` method of the `CouchbaseQueryVectorStore`. 
This method searches Couchbase for the most relevant documents based on the vector similarity to the query. The search results include the document content and the distance that reflects how closely each document aligns with the query in the defined semantic space. The time taken to perform this search is then calculated and logged, and the results are displayed, showing the most relevant documents along with their similarity scores. This approach leverages Couchbase as both a storage and retrieval engine for vector data, enabling efficient and scalable semantic searches. The integration of vector storage and search capabilities within Couchbase allows for sophisticated semantic search operations without relying on external services for vector storage or comparison. - - -```python -query = "What were Luke Littler's key achievements and records in his recent PDC World Championship match?" - -try: - # Perform the semantic search - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=10) - search_elapsed_time = time.time() - start_time - - logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds") - - # Display search results - print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):") - print("-" * 80) - - for doc, score in search_results: - print(f"Distance: {score:.4f}, Text: {doc.page_content}") - print("-" * 80) - -except CouchbaseException as e: - raise RuntimeError(f"Error performing semantic search: {str(e)}") -except Exception as e: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - 2025-09-17 16:11:07,177 - INFO - Semantic search completed in 2.46 seconds - - - - Semantic Search Results (completed in 2.46 seconds): - -------------------------------------------------------------------------------- - Distance: 0.3693, Text: The Littler effect - how darts hit the bullseye - - Teenager Luke Littler began his bid to win the 2025 PDC World Darts Championship with a 
second-round win against Ryan Meikle. Here we assess Littler's impact after a remarkable rise which saw him named BBC Young Sports Personality of the Year and runner-up in the main award to athlete Keely Hodgkinson. - - One year ago, he was barely a household name in his own home. Now he is a sporting phenomenon. After emerging from obscurity aged 16 to reach the World Championship final, the life of Luke Littler and the sport he loves has been transformed. Viewing figures, ticket sales and social media interest have rocketed. Darts has hit the bullseye. This Christmas more than 100,000 children are expected to be opening Littler-branded magnetic dartboards as presents. His impact has helped double the number of junior academies, prompted plans to expand the World Championship and generated interest in darts from Saudi Arabian backers. - - Just months after taking his GCSE exams and ranked 164th in the world, Littler beat former champions Raymond van Barneveld and Rob Cross en route to the PDC World Championship final in January, before his run ended with a 7-4 loss to Luke Humphries. With his nickname 'The Nuke' on his purple and yellow shirt and the Alexandra Palace crowd belting out his walk-on song, Pitbull's tune Greenlight, he became an instant hit. Electric on the stage, calm off it. The down-to-earth teenager celebrated with a kebab and computer games. "We've been watching his progress since he was about seven. He was on our radar, but we never anticipated what would happen. The next thing we know 'Littlermania' is spreading everywhere," PDC president Barry Hearn told BBC Sport. A peak TV audience of 3.7 million people watched the final - easily Sky's biggest figure for a non-football sporting event. The teenager from Warrington in Cheshire was too young to legally drive or drink alcohol, but earned £200,000 for finishing second - part of £1m prize money in his first year as a professional - and an invitation to the elite Premier League competition. 
He turned 17 later in January but was he too young for the demanding event over 17 Thursday nights in 17 locations? He ended up winning the whole thing, and hit a nine-dart finish against Humphries in the final. From Bahrain to Wolverhampton, Littler claimed 10 titles in 2024 and is now eyeing the World Championship. - - As he progressed at the Ally Pally, the Manchester United fan was sent a good luck message by the club's former midfielder and ex-England captain David Beckham. In 12 months, Littler's Instagram followers have risen from 4,000 to 1.3m. Commercial backers include a clothing range, cereal firm and train company and he will appear in a reboot of the TV darts show Bullseye. Google say he was the most searched-for athlete online in the UK during 2024. On the back of his success, Littler darts, boards, cabinets, shirts are being snapped up in big numbers. "This Christmas the junior magnetic dartboard is selling out, we're talking over 100,000. They're 20 quid and a great introduction for young children," said Garry Plummer, the boss of sponsors Target Darts, who first signed a deal with Littler's family when he was aged 12. "All the toy shops want it, they all want him - 17, clean, doesn't drink, wonderful." - - Littler beat Luke Humphries to win the Premier League title in May - - The number of academies for children under the age of 16 has doubled in the last year, says Junior Darts Corporation chairman Steve Brown. There are 115 dedicated groups offering youngsters equipment, tournaments and a place to develop, with bases including Australia, Bulgaria, Greece, Norway, USA and Mongolia. "We've seen so many inquiries from around the world, it's been such a boom. It took us 14 years to get 1,600 members and within 12 months we have over 3,000, and waiting lists," said Brown. "When I played darts as a child, I was quite embarrassed to tell my friends what my hobby was. All these kids playing darts now are pretty popular at school. 
It's a bit rock 'n roll and recognised as a cool thing to do." Plans are being hatched to extend the World Championship by four days and increase the number of players from 96 to 128. That will boost the number of tickets available by 25,000 to 115,000 but Hearn reckons he could sell three times as many. He says Saudi Arabia wants to host a tournament, which is likely to happen if no-alcohol regulations are relaxed. "They will change their rules in the next 12 months probably for certain areas having alcohol, and we'll take darts there and have a party in Saudi," he said. "When I got involved in darts, the total prize money was something like £300,000 for the year. This year it will go to £20m. I expect in five years' time, we'll be playing for £40m." - - Former electrician Cross charged to the 2018 world title in his first full season, while Adrian Lewis and Michael van Gerwen were multiple victors in their 20s and 16-time champion Phil ‘The Power’ Taylor is widely considered the greatest of all time. Littler is currently fourth in the world rankings, although that is based on a two-year Order of Merit. There have been suggestions from others the spotlight on the teenager means world number one Humphries, 29, has been denied the coverage he deserves, but no darts player has made a mark at such a young age as Littler. "Luke Humphries is another fabulous player who is going to be around for years. Sport is a very brutal world. It is about winning and claiming the high ground. There will be envy around," Hearn said. "Luke Littler is the next Tiger Woods for darts so they better get used to it, and the only way to compete is to get better." World number 38 Martin Lukeman was awestruck as he described facing a peak Littler after being crushed 16-3 in the Grand Slam final, with the teenager winning 15 consecutive legs. "I can't compete with that, it was like Godly. He was relentless, he is so good it's ridiculous," he said. 
Lukeman can still see the benefits he brings, adding: "What he's done for the sport is brilliant. If it wasn't for him, our wages wouldn't be going up. There's more sponsors, more money coming in, all good." Hearn feels future competition may come from players even younger than Littler. "I watched a 10-year-old a few months ago who averaged 104.89 and checked out a 4-3 win with a 136 finish. They smell the money, the fame and put the hard work in," he said. How much better Littler can get is guesswork, although Plummer believes he wants to reach new heights. "He never says 'how good was I?' But I think he wants to break records and beat Phil Taylor's 16 World Championships and 16 World Matchplay titles," he said. "He's young enough to do it." A version of this article was originally published on 29 November. - • None Know a lot about Littler? Take our quiz - -------------------------------------------------------------------------------- - Distance: 0.3900, Text: Luke Littler has risen from 164th to fourth in the rankings in a year - - A tearful Luke Littler hit a tournament record 140.91 set average as he started his bid for the PDC World Championship title with a dramatic 3-1 win over Ryan Meikle. The 17-year-old made headlines around the world when he reached the tournament final in January, where he lost to Luke Humphries. Starting this campaign on Saturday, Littler was millimetres away from a nine-darter when he missed double 12 as he blew Meikle away in the fourth and final set of the second-round match. Littler was overcome with emotion at the end, cutting short his on-stage interview. "It was probably the toughest game I've ever played. I had to fight until the end," he said later in a news conference. "As soon as the question came on stage and then boom, the tears came. It was just a bit too much to speak on stage. "It is the worst game I have played. I have never felt anything like that tonight." 
Admitting to nerves during the match, he told Sky Sports: "Yes, probably the biggest time it's hit me. Coming into it I was fine, but as soon as [referee] George Noble said 'game on', I couldn't throw them." Littler started slowly against Meikle, who had two darts for the opening set, but he took the lead by twice hitting double 20. Meikle did not look overawed against his fellow Englishman and levelled, but Littler won the third set and exploded into life in the fourth. The tournament favourite hit four maximum 180s as he clinched three straight legs in 11, 10 and 11 darts for a record set average, and 100.85 overall. Meanwhile, two seeds crashed out on Saturday night – five-time world champion Raymond van Barneveld lost to Welshman Nick Kenny, while England's Ryan Joyce beat Danny Noppert. Australian Damon Heta was another to narrowly miss out on a nine-darter, just failing on double 12 when throwing for the match in a 3-1 win over Connor Scutt. Ninth seed Heta hit four 100-plus checkouts to come from a set down against Scutt in a match in which both men averaged more than 97. - - Littler was hugged by his parents after victory over Meikle - - ... (output truncated for brevity) - - -# Optimizing Vector Search with Global Secondary Index (GSI) - -While the above semantic search using similarity_search_with_score works effectively, we can significantly improve query performance by leveraging Global Secondary Index (GSI) in Couchbase. 
- -Couchbase offers three types of vector indexes, but for GSI-based vector search we focus on two main types: - -Hyperscale Vector Indexes (BHIVE) -- Best for pure vector searches - content discovery, recommendations, semantic search -- High performance with low memory footprint - designed to scale to billions of vectors -- Optimized for concurrent operations - supports simultaneous searches and inserts -- Use when: You primarily perform vector-only queries without complex scalar filtering -- Ideal for: Large-scale semantic search, recommendation systems, content discovery - -Composite Vector Indexes -- Best for filtered vector searches - combines vector search with scalar value filtering -- Efficient pre-filtering - scalar attributes reduce the vector comparison scope -- Use when: Your queries combine vector similarity with scalar filters that eliminate large portions of data -- Ideal for: Compliance-based filtering, user-specific searches, time-bounded queries - -Choosing the Right Index Type -- Start with Hyperscale Vector Index for pure vector searches and large datasets -- Use Composite Vector Index when scalar filters significantly reduce your search space -- Consider your dataset size: Hyperscale scales to billions, Composite works well for tens of millions to billions - -For more information on GSI vector indexes, see [Couchbase GSI Vector Documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html). 
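The selection guidance above can be condensed into a small decision helper. This is an illustrative sketch only — `choose_index_type` and its parameters are our own names for this tutorial, not part of the Couchbase SDK:

```python
def choose_index_type(uses_scalar_filters: bool, filters_are_selective: bool = False) -> str:
    """Suggest a GSI vector index type based on the query pattern.

    Illustrative only -- this helper is not part of the Couchbase SDK.
    """
    # Composite indexes pay off when scalar predicates prune a large
    # portion of the data before the vector comparison runs.
    if uses_scalar_filters and filters_are_selective:
        return "COMPOSITE"
    # Otherwise, a Hyperscale (BHIVE) index is the recommended starting
    # point for pure vector-similarity workloads at scale.
    return "BHIVE"

print(choose_index_type(uses_scalar_filters=False))
print(choose_index_type(uses_scalar_filters=True, filters_are_selective=True))
```

For this tutorial's pure semantic search over news articles, the helper would point at a BHIVE index, which is what we create below.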
- - -## Understanding Index Configuration (Couchbase 8.0 Feature) - -The index_description parameter controls how Couchbase optimizes vector storage and search performance through centroids and quantization: - -Format: `IVF[<centroids>],{SQ<bits>|PQ<subquantizers>x<bits>}` (the centroid count is optional) - -Centroids (IVF - Inverted File): -- Controls how the dataset is subdivided for faster searches -- More centroids = faster search, slower training -- Fewer centroids = slower search, faster training -- If omitted (like IVF,SQ8), Couchbase auto-selects based on dataset size - -Quantization Options: -- SQ (Scalar Quantization): SQ4, SQ6, SQ8 (4, 6, or 8 bits per dimension) -- PQ (Product Quantization): PQ<subquantizers>x<bits> (e.g., PQ32x8) -- Higher values = better accuracy, larger index size - -Common Examples: -- IVF,SQ8 - Auto centroids, 8-bit scalar quantization (good default) -- IVF1000,SQ6 - 1000 centroids, 6-bit scalar quantization -- IVF,PQ32x8 - Auto centroids, 32 subquantizers with 8 bits - -For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/cloud/vector-index/hyperscale-vector-index.html#algo_settings). - -In the code below, we demonstrate creating a BHIVE index. This method takes an index type (BHIVE or COMPOSITE) and a description parameter for optimization settings. Alternatively, GSI indexes can be created manually from the Couchbase UI. - - -```python -vector_store.create_index(index_type=IndexType.BHIVE, index_name="openrouterdeepseek_bhive_index", index_description="IVF,SQ8") -``` - -The example below shows running the same similarity search, but now using the BHIVE GSI index we created above. You'll notice improved performance as the index efficiently retrieves data. - -**Important**: When using Composite indexes, scalar filters take precedence over vector similarity, which can improve performance for filtered searches but may miss some semantically relevant results that don't match the scalar criteria.
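Since `index_description` is just a formatted string, its variants are easy to generate programmatically. Below is a minimal sketch — the `make_index_description` helper is ours, not a Couchbase API; only the resulting strings follow Couchbase's documented format:

```python
def make_index_description(centroids=None, quantization="SQ8"):
    """Build an index_description string like 'IVF,SQ8' or 'IVF1000,PQ32x8'.

    Illustrative helper. Leaving centroids as None emits plain 'IVF',
    letting Couchbase auto-select a centroid count based on dataset size.
    """
    ivf = "IVF" if centroids is None else f"IVF{centroids}"
    return f"{ivf},{quantization}"

print(make_index_description())                       # IVF,SQ8 (good default)
print(make_index_description(1000, "SQ6"))            # IVF1000,SQ6
print(make_index_description(quantization="PQ32x8"))  # IVF,PQ32x8
```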
- -Note: In GSI vector search, the distance represents the vector distance between the query and document embeddings. Lower distances indicate higher similarity, while higher distances indicate lower similarity. - - -```python - -query = "What were Luke Littler's key achievements and records in his recent PDC World Championship match?" - -try: - # Perform the semantic search - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=10) - search_elapsed_time = time.time() - start_time - - logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds") - - # Display search results - print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):") - print("-" * 80) - - for doc, score in search_results: - print(f"Distance: {score:.4f}, Text: {doc.page_content}") - print("-" * 80) - -except CouchbaseException as e: - raise RuntimeError(f"Error performing semantic search: {str(e)}") -except Exception as e: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - 2025-09-18 11:17:19,626 - INFO - Semantic search completed in 0.88 seconds - - - - Semantic Search Results (completed in 0.88 seconds): - -------------------------------------------------------------------------------- - Distance: 0.3694, Text: The Littler effect - how darts hit the bullseye - - Teenager Luke Littler began his bid to win the 2025 PDC World Darts Championship with a second-round win against Ryan Meikle. Here we assess Littler's impact after a remarkable rise which saw him named BBC Young Sports Personality of the Year and runner-up in the main award to athlete Keely Hodgkinson. - - One year ago, he was barely a household name in his own home. Now he is a sporting phenomenon. After emerging from obscurity aged 16 to reach the World Championship final, the life of Luke Littler and the sport he loves has been transformed. Viewing figures, ticket sales and social media interest have rocketed. Darts has hit the bullseye.
This Christmas more than 100,000 children are expected to be opening Littler-branded magnetic dartboards as presents. His impact has helped double the number of junior academies, prompted plans to expand the World Championship and generated interest in darts from Saudi Arabian backers. - - Just months after taking his GCSE exams and ranked 164th in the world, Littler beat former champions Raymond van Barneveld and Rob Cross en route to the PDC World Championship final in January, before his run ended with a 7-4 loss to Luke Humphries. With his nickname 'The Nuke' on his purple and yellow shirt and the Alexandra Palace crowd belting out his walk-on song, Pitbull's tune Greenlight, he became an instant hit. Electric on the stage, calm off it. The down-to-earth teenager celebrated with a kebab and computer games. "We've been watching his progress since he was about seven. He was on our radar, but we never anticipated what would happen. The next thing we know 'Littlermania' is spreading everywhere," PDC president Barry Hearn told BBC Sport. A peak TV audience of 3.7 million people watched the final - easily Sky's biggest figure for a non-football sporting event. The teenager from Warrington in Cheshire was too young to legally drive or drink alcohol, but earned £200,000 for finishing second - part of £1m prize money in his first year as a professional - and an invitation to the elite Premier League competition. He turned 17 later in January but was he too young for the demanding event over 17 Thursday nights in 17 locations? He ended up winning the whole thing, and hit a nine-dart finish against Humphries in the final. From Bahrain to Wolverhampton, Littler claimed 10 titles in 2024 and is now eyeing the World Championship. - - As he progressed at the Ally Pally, the Manchester United fan was sent a good luck message by the club's former midfielder and ex-England captain David Beckham. In 12 months, Littler's Instagram followers have risen from 4,000 to 1.3m. 
Commercial backers include a clothing range, cereal firm and train company and he will appear in a reboot of the TV darts show Bullseye. Google say he was the most searched-for athlete online in the UK during 2024. On the back of his success, Littler darts, boards, cabinets, shirts are being snapped up in big numbers. "This Christmas the junior magnetic dartboard is selling out, we're talking over 100,000. They're 20 quid and a great introduction for young children," said Garry Plummer, the boss of sponsors Target Darts, who first signed a deal with Littler's family when he was aged 12. "All the toy shops want it, they all want him - 17, clean, doesn't drink, wonderful." - - Littler beat Luke Humphries to win the Premier League title in May - - The number of academies for children under the age of 16 has doubled in the last year, says Junior Darts Corporation chairman Steve Brown. There are 115 dedicated groups offering youngsters equipment, tournaments and a place to develop, with bases including Australia, Bulgaria, Greece, Norway, USA and Mongolia. "We've seen so many inquiries from around the world, it's been such a boom. It took us 14 years to get 1,600 members and within 12 months we have over 3,000, and waiting lists," said Brown. "When I played darts as a child, I was quite embarrassed to tell my friends what my hobby was. All these kids playing darts now are pretty popular at school. It's a bit rock 'n roll and recognised as a cool thing to do." Plans are being hatched to extend the World Championship by four days and increase the number of players from 96 to 128. That will boost the number of tickets available by 25,000 to 115,000 but Hearn reckons he could sell three times as many. He says Saudi Arabia wants to host a tournament, which is likely to happen if no-alcohol regulations are relaxed. "They will change their rules in the next 12 months probably for certain areas having alcohol, and we'll take darts there and have a party in Saudi," he said. 
"When I got involved in darts, the total prize money was something like £300,000 for the year. This year it will go to £20m. I expect in five years' time, we'll be playing for £40m." - - Former electrician Cross charged to the 2018 world title in his first full season, while Adrian Lewis and Michael van Gerwen were multiple victors in their 20s and 16-time champion Phil ‘The Power’ Taylor is widely considered the greatest of all time. Littler is currently fourth in the world rankings, although that is based on a two-year Order of Merit. There have been suggestions from others the spotlight on the teenager means world number one Humphries, 29, has been denied the coverage he deserves, but no darts player has made a mark at such a young age as Littler. "Luke Humphries is another fabulous player who is going to be around for years. Sport is a very brutal world. It is about winning and claiming the high ground. There will be envy around," Hearn said. "Luke Littler is the next Tiger Woods for darts so they better get used to it, and the only way to compete is to get better." World number 38 Martin Lukeman was awestruck as he described facing a peak Littler after being crushed 16-3 in the Grand Slam final, with the teenager winning 15 consecutive legs. "I can't compete with that, it was like Godly. He was relentless, he is so good it's ridiculous," he said. Lukeman can still see the benefits he brings, adding: "What he's done for the sport is brilliant. If it wasn't for him, our wages wouldn't be going up. There's more sponsors, more money coming in, all good." Hearn feels future competition may come from players even younger than Littler. "I watched a 10-year-old a few months ago who averaged 104.89 and checked out a 4-3 win with a 136 finish. They smell the money, the fame and put the hard work in," he said. How much better Littler can get is guesswork, although Plummer believes he wants to reach new heights. "He never says 'how good was I?' 
But I think he wants to break records and beat Phil Taylor's 16 World Championships and 16 World Matchplay titles," he said. "He's young enough to do it." A version of this article was originally published on 29 November. - • None Know a lot about Littler? Take our quiz - -------------------------------------------------------------------------------- - Distance: 0.3901, Text: Luke Littler has risen from 164th to fourth in the rankings in a year - - A tearful Luke Littler hit a tournament record 140.91 set average as he started his bid for the PDC World Championship title with a dramatic 3-1 win over Ryan Meikle. The 17-year-old made headlines around the world when he reached the tournament final in January, where he lost to Luke Humphries. Starting this campaign on Saturday, Littler was millimetres away from a nine-darter when he missed double 12 as he blew Meikle away in the fourth and final set of the second-round match. Littler was overcome with emotion at the end, cutting short his on-stage interview. "It was probably the toughest game I've ever played. I had to fight until the end," he said later in a news conference. "As soon as the question came on stage and then boom, the tears came. It was just a bit too much to speak on stage. "It is the worst game I have played. I have never felt anything like that tonight." Admitting to nerves during the match, he told Sky Sports: "Yes, probably the biggest time it's hit me. Coming into it I was fine, but as soon as [referee] George Noble said 'game on', I couldn't throw them." Littler started slowly against Meikle, who had two darts for the opening set, but he took the lead by twice hitting double 20. Meikle did not look overawed against his fellow Englishman and levelled, but Littler won the third set and exploded into life in the fourth. The tournament favourite hit four maximum 180s as he clinched three straight legs in 11, 10 and 11 darts for a record set average, and 100.85 overall. 
Meanwhile, two seeds crashed out on Saturday night – five-time world champion Raymond van Barneveld lost to Welshman Nick Kenny, while England's Ryan Joyce beat Danny Noppert. Australian Damon Heta was another to narrowly miss out on a nine-darter, just failing on double 12 when throwing for the match in a 3-1 win over Connor Scutt. Ninth seed Heta hit four 100-plus checkouts to come from a set down against Scutt in a match in which both men averaged more than 97. - - Littler was hugged by his parents after victory over Meikle - - ... (output truncated for brevity) - - -Note: To create a COMPOSITE index, the below code can be used. -Choose based on your specific use case and query patterns. For this tutorial's news search scenario, either index type would work, but BHIVE might be more efficient for pure semantic search across news articles. - - -```python -vector_store.create_index(index_type=IndexType.COMPOSITE, index_name="openrouterdeepseek_composite_index", index_description="IVF,SQ8") -``` - -## Setting Up a Couchbase Cache -To further optimize our system, we set up a Couchbase-based cache. A cache is a temporary storage layer that holds data that is frequently accessed, speeding up operations by reducing the need to repeatedly retrieve the same information from the database. In our setup, the cache will help us accelerate repetitive tasks, such as looking up similar documents. By implementing a cache, we enhance the overall performance of our search engine, ensuring that it can handle high query volumes and deliver results quickly. - -Caching is particularly valuable in scenarios where users may submit similar queries multiple times or where certain pieces of information are frequently requested. By storing these in a cache, we can significantly reduce the time it takes to respond to these queries, improving the user experience. 
- - - -```python -try: - cache = CouchbaseCache( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=CACHE_COLLECTION, - ) - logging.info("Successfully created cache") - set_llm_cache(cache) -except Exception as e: - raise ValueError(f"Failed to create cache: {str(e)}") -``` - - 2025-09-17 16:10:11,473 - INFO - Successfully created cache - - -## Retrieval-Augmented Generation (RAG) with Couchbase and LangChain -Couchbase and LangChain can be seamlessly integrated to create RAG (Retrieval-Augmented Generation) chains, enhancing the process of generating contextually relevant responses. In this setup, Couchbase serves as the vector store, where embeddings of documents are stored. When a query is made, LangChain retrieves the most relevant documents from Couchbase by comparing the query’s embedding with the stored document embeddings. These documents, which provide contextual information, are then passed to a generative language model within LangChain. - -The language model, equipped with the context from the retrieved documents, generates a response that is both informed and contextually accurate. This integration allows the RAG chain to leverage Couchbase’s efficient storage and retrieval capabilities, while LangChain handles the generation of responses based on the context provided by the retrieved documents. Together, they create a powerful system that can deliver highly relevant and accurate answers by combining the strengths of both retrieval and generation. 
- - -```python -# Create RAG prompt template -rag_prompt = ChatPromptTemplate.from_messages([ - ("system", "You are a helpful assistant that answers questions based on the provided context."), - ("human", "Context: {context}\n\nQuestion: {question}") -]) - -# Create RAG chain -rag_chain = ( - {"context": vector_store.as_retriever(), "question": RunnablePassthrough()} - | rag_prompt - | llm - | StrOutputParser() -) -logging.info("Successfully created RAG chain") -``` - - 2025-09-18 11:18:34,032 - INFO - Successfully created RAG chain - - - -```python -try: - start_time = time.time() - rag_response = rag_chain.invoke(query) - rag_elapsed_time = time.time() - start_time - - print(f"RAG Response: {rag_response}") - print(f"RAG response generated in {rag_elapsed_time:.2f} seconds") -except InternalServerFailureException as e: - if "query request rejected" in str(e): - print("Error: Search request was rejected due to rate limiting. Please try again later.") - else: - print(f"Internal server error occurred: {str(e)}") -except Exception as e: - print(f"Unexpected error occurred: {str(e)}") -``` - - RAG Response: Based on the provided context, Luke Littler's key achievements and records in his recent PDC World Championship match (second-round win against Ryan Meikle) were: - - * **Tournament Record Set Average:** He hit a tournament record 140.91 set average during the match. - * **Near Nine-Darter:** He was "millimetres away from a nine-darter" when he missed double 12. - * **Dominant Final Set:** He won the fourth and final set in just 32 darts (the minimum possible is 27), which included hitting four maximum 180s and clinching three straight legs in 11, 10, and 11 darts. - * **Overall High Average:** He maintained a high overall match average of 100.85. 
- RAG response generated in 0.49 seconds - - -## Using Couchbase as a caching mechanism -Couchbase can be effectively used as a caching mechanism for RAG (Retrieval-Augmented Generation) responses by storing and retrieving precomputed results for specific queries. This approach enhances the system's efficiency and speed, particularly when dealing with repeated or similar queries. When a query is first processed, the RAG chain retrieves relevant documents, generates a response using the language model, and then stores this response in Couchbase, with the query serving as the key. - -For subsequent requests with the same query, the system checks Couchbase first. If a cached response is found, it is retrieved directly from Couchbase, bypassing the need to re-run the entire RAG process. This significantly reduces response time because the computationally expensive steps of document retrieval and response generation are skipped. Couchbase's role in this setup is to provide a fast and scalable storage solution for caching these responses, ensuring that frequently asked queries can be answered more quickly and efficiently. - - - -```python -try: - queries = [ - "What happened in the match between Fullham and Liverpool?", - "What were Luke Littler's key achievements and records in his recent PDC World Championship match?", # Repeated query - "What happened in the match between Fullham and Liverpool?", # Repeated query - ] - - for i, query in enumerate(queries, 1): - print(f"\nQuery {i}: {query}") - start_time = time.time() - - response = rag_chain.invoke(query) - elapsed_time = time.time() - start_time - print(f"Response: {response}") - print(f"Time taken: {elapsed_time:.2f} seconds") - -except InternalServerFailureException as e: - if "query request rejected" in str(e): - print("Error: Search request was rejected due to rate limiting. 
Please try again later.") - else: - print(f"Internal server error occurred: {str(e)}") -except Exception as e: - print(f"Unexpected error occurred: {str(e)}") -``` - - - Query 1: What happened in the match between Fullham and Liverpool? - Response: In the match between Fulham and Liverpool, Liverpool played the majority of the game with 10 men after Andy Robertson received a red card in the 17th minute. Despite being a player down, Liverpool came from behind twice to secure a 2-2 draw. Diogo Jota scored an 86th-minute equalizer to earn Liverpool a point. The performance was praised for its resilience, with Fulham's Antonee Robinson noting that Liverpool "didn't feel like they had 10 men at all." Liverpool maintained over 60% possession and led in attacking metrics such as shots and chances. Both managers acknowledged the strong efforts of their teams in what was described as an enthralling encounter. - Time taken: 4.65 seconds - - Query 2: What were Luke Littler's key achievements and records in his recent PDC World Championship match? - Response: Based on the provided context, Luke Littler's key achievements and records in his recent PDC World Championship match (second-round win against Ryan Meikle) were: - - * **Tournament Record Set Average:** He hit a tournament record 140.91 set average during the match. - * **Near Nine-Darter:** He was "millimetres away from a nine-darter" when he missed double 12. - * **Dominant Final Set:** He won the fourth and final set in just 32 darts (the minimum possible is 27), which included hitting four maximum 180s and clinching three straight legs in 11, 10, and 11 darts. - * **Overall High Average:** He maintained a high overall match average of 100.85. - Time taken: 0.45 seconds - - Query 3: What happened in the match between Fullham and Liverpool? - Response: In the match between Fulham and Liverpool, Liverpool played the majority of the game with 10 men after Andy Robertson received a red card in the 17th minute. 
Despite being a player down, Liverpool came from behind twice to secure a 2-2 draw. Diogo Jota scored an 86th-minute equalizer to earn Liverpool a point. The performance was praised for its resilience, with Fulham's Antonee Robinson noting that Liverpool "didn't feel like they had 10 men at all." Liverpool maintained over 60% possession and led in attacking metrics such as shots and chances. Both managers acknowledged the strong efforts of their teams in what was described as an enthralling encounter. - Time taken: 1.15 seconds - - -## Conclusion -By following these steps, you'll have a fully functional semantic search engine that leverages the strengths of Couchbase and Deepseek (via OpenRouter). This guide is designed not just to show you how to build the system, but also to explain why each step is necessary, giving you a deeper understanding of the principles behind semantic search and how to implement it effectively. Whether you're a newcomer to software development or an experienced developer looking to expand your skills, this guide will provide you with the knowledge and tools you need to create a powerful, AI-driven search engine. diff --git a/tutorial/markdown/generated/vector-search-cookbook/pydantic_ai-fts-RAG_with_Couchbase_and_PydanticAI.md b/tutorial/markdown/generated/vector-search-cookbook/pydantic_ai-fts-RAG_with_Couchbase_and_PydanticAI.md deleted file mode 100644 index f360393..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/pydantic_ai-fts-RAG_with_Couchbase_and_PydanticAI.md +++ /dev/null @@ -1,620 +0,0 @@ ---- -# frontmatter -path: "/tutorial-pydantic-ai-couchbase-rag-with-fts" -title: Retrieval-Augmented Generation (RAG) with Couchbase and PydanticAI using FTS -short_title: RAG with Couchbase and PydanticAI -description: - - Learn how to build a semantic search engine using Couchbase and PydanticAI using FTS.
- - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with PydanticAI using tool calling. - - You'll understand how to perform Retrieval-Augmented Generation (RAG) using PydanticAI and Couchbase. -content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - FTS - - Artificial Intelligence - - LangChain - - OpenAI - - PydanticAI -sdk_language: - - python -length: 30 Mins ---- - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/pydantic_ai/fts/RAG_with_Couchbase_and_PydanticAI.ipynb) - -# Introduction -In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database, [OpenAI](https://openai.com) as the embedding and LLM provider, and [PydanticAI](https://ai.pydantic.dev) as an agent orchestrator. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system from scratch. Alternatively, if you want to perform semantic search using the GSI index, please take a look at [this tutorial](https://developer.couchbase.com/tutorial-pydantic-ai-couchbase-rag-with-global-secondary-index). - -# How to run this tutorial - -This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. - -You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment. - -# Before you start -## Create and Deploy Your Free Tier Operational cluster on Capella - -To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster.
This account provides you with an environment where you can explore and learn about Capella with no time constraint. - -To know more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html). - -### Couchbase Capella Configuration - -When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met. - -* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the travel-sample bucket (Read and Write) used in the application. -* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running. - -# Setting the Stage: Installing Necessary Libraries -To build our semantic search engine, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks. Each library has a specific role: Couchbase libraries manage database operations, LangChain handles AI model integrations, and OpenAI provides advanced AI models for generating embeddings and understanding natural language. By setting up these libraries, we ensure our environment is equipped to handle the data-intensive and computationally complex tasks required for semantic search. - - -```python -%pip install --quiet -U datasets==3.5.0 langchain-couchbase==0.3.0 langchain-openai==0.3.13 python-dotenv==1.1.0 pydantic-ai==0.1.1 ipywidgets==8.1.6 -``` - - Note: you may need to restart the kernel to use updated packages. - - -# Importing Necessary Libraries -The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. These libraries provide essential functions for working with data, managing database connections, and processing machine learning models. 
- - -```python -import getpass -import json -import logging -import os -import time -from uuid import uuid4 -from datetime import timedelta - -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.exceptions import (InternalServerFailureException, - QueryIndexAlreadyExistsException) -from couchbase.management.buckets import CreateBucketSettings -from couchbase.management.search import SearchIndex -from couchbase.options import ClusterOptions -from datasets import load_dataset -from dotenv import load_dotenv -from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore -from langchain_openai import OpenAIEmbeddings -from tqdm import tqdm - -from dataclasses import dataclass -from pydantic_ai import Agent, RunContext -``` - -# Setup Logging -Logging is configured to track the progress of the script and capture any errors or warnings. This is crucial for debugging and understanding the flow of execution. The logging output includes timestamps, log levels (e.g., INFO, ERROR), and messages that describe what is happening in the script. - - - -```python -logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True) -``` - -# Loading Sensitive Information -In this section, we prompt the user to input essential configuration settings needed. These settings include sensitive information like API keys, database credentials, and specific configuration names. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security. - -The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code. 
- - -```python -load_dotenv() - -OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or getpass.getpass('Enter your OpenAI API Key: ') - -CB_HOST = os.getenv('CB_HOST') or input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost' -CB_USERNAME = os.getenv('CB_USERNAME') or input('Enter your Couchbase username (default: Administrator): ') or 'Administrator' -CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass('Enter your Couchbase password (default: password): ') or 'password' -CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input('Enter your Couchbase bucket name (default: vector-search-testing): ') or 'vector-search-testing' -INDEX_NAME = os.getenv('INDEX_NAME') or input('Enter your index name (default: vector_search_pydantic_ai): ') or 'vector_search_pydantic_ai' -SCOPE_NAME = os.getenv('SCOPE_NAME') or input('Enter your scope name (default: shared): ') or 'shared' -COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input('Enter your collection name (default: pydantic_ai): ') or 'pydantic_ai' - -# Check if the variables are correctly loaded -if not OPENAI_API_KEY: - raise ValueError("Missing OpenAI API Key") - -if 'OPENAI_API_KEY' not in os.environ: - os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY -``` - -# Connecting to the Couchbase Cluster -Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount. 
- - - - -```python -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to Couchbase") -except Exception as e: - raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}") -``` - - 2025-04-11 13:54:19,537 - INFO - Successfully connected to Couchbase - - -# Setting Up Collections in Couchbase - -The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase: - -1. Bucket Creation: - - Checks if specified bucket exists, creates it if not - - Sets bucket properties like RAM quota (1024MB) and replication (disabled) - - Note: You will not be able to create a bucket on Capella - -2. Scope Management: - - Verifies if requested scope exists within bucket - - Creates new scope if needed (unless it's the default "_default" scope) - -3. Collection Setup: - - Checks for collection existence within scope - - Creates collection if it doesn't exist - - Waits 2 seconds for collection to be ready - -Additional Tasks: -- Creates primary index on collection for query performance -- Clears any existing documents for clean state -- Implements comprehensive error handling and logging - - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' exists.") - except Exception as e: - logging.info(f"Bucket '{bucket_name}' does not exist. 
Creating it...") - bucket_settings = CreateBucketSettings( - name=bucket_name, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - time.sleep(2) # Wait for bucket creation to complete and become available - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' created successfully.") - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == scope_name for scope in scopes) - - if not scope_exists and scope_name != "_default": - logging.info(f"Scope '{scope_name}' does not exist. Creating it...") - bucket_manager.create_scope(scope_name) - logging.info(f"Scope '{scope_name}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{collection_name}' does not exist. 
Creating it...")
-            bucket_manager.create_collection(scope_name, collection_name)
-            time.sleep(2)
-            logging.info(f"Collection '{collection_name}' created successfully.")
-        else:
-            logging.info(f"Collection '{collection_name}' already exists. Skipping creation.")
-
-        collection = bucket.scope(scope_name).collection(collection_name)
-        time.sleep(2)  # Give the collection time to be ready for queries
-
-        # Ensure primary index exists
-        try:
-            cluster.query(f"CREATE PRIMARY INDEX IF NOT EXISTS ON `{bucket_name}`.`{scope_name}`.`{collection_name}`").execute()
-            logging.info("Primary index present or created successfully.")
-        except Exception as e:
-            logging.warning(f"Error creating primary index: {str(e)}")
-
-        # Clear all documents in the collection
-        try:
-            query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`"
-            cluster.query(query).execute()
-            logging.info("All documents cleared from the collection.")
-        except Exception as e:
-            logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.")
-
-        return collection
-    except Exception as e:
-        raise RuntimeError(f"Error setting up collection: {str(e)}")
-
-setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME)
-```
-
-    2025-04-11 13:54:23,668 - INFO - Bucket 'vector-search-testing' does not exist. Creating it...
-
-
-    2025-04-11 13:54:25,721 - INFO - Bucket 'vector-search-testing' created successfully.
-    2025-04-11 13:54:25,728 - INFO - Scope 'shared' does not exist. Creating it...
-    2025-04-11 13:54:25,777 - INFO - Scope 'shared' created successfully.
-    2025-04-11 13:54:25,796 - INFO - Collection 'pydantic_ai' does not exist. Creating it...
-    2025-04-11 13:54:27,843 - INFO - Collection 'pydantic_ai' created successfully.
-    2025-04-11 13:54:28,120 - INFO - Primary index present or created successfully.
-    2025-04-11 13:54:28,133 - INFO - All documents cleared from the collection. 
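Before moving on, it may help to see the scope-and-collection existence check from `setup_collection()` in isolation. The sketch below reproduces that logic as a standalone helper and exercises it with stand-in objects shaped like the SDK's `get_all_scopes()` results — the `SimpleNamespace` data is purely illustrative, not real SDK output:

```python
from types import SimpleNamespace

def collection_exists(scopes, scope_name, collection_name):
    """Mirror of the check in setup_collection(): True if the named
    collection exists under the named scope."""
    return any(
        scope.name == scope_name
        and collection_name in [col.name for col in scope.collections]
        for scope in scopes
    )

# Stand-in data shaped like bucket_manager.get_all_scopes() results:
scopes = [
    SimpleNamespace(name="shared",
                    collections=[SimpleNamespace(name="pydantic_ai")]),
]
print(collection_exists(scopes, "shared", "pydantic_ai"))  # True
print(collection_exists(scopes, "shared", "missing"))      # False
```

Because the check is a pure function of the scope listing, it can be unit-tested without a live cluster.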
- - - - - - - - - -# Loading Couchbase Vector Search Index - -Semantic search requires an efficient way to retrieve relevant documents based on a user's query. This is where the Couchbase **Vector Search Index** comes into play. In this step, we load the Vector Search Index definition from a JSON file, which specifies how the index should be structured. This includes the fields to be indexed, the dimensions of the vectors, and other parameters that determine how the search engine processes queries based on vector similarity. - -This vector search index configuration requires specific default settings to function properly. This tutorial uses the bucket named `vector-search-testing` with the scope `shared` and collection `pydantic_ai`. The configuration is set up for vectors with exactly `1536 dimensions`, using dot product similarity and optimized for recall. If you want to use a different bucket, scope, or collection, you will need to modify the index configuration accordingly. - -For more information on creating a vector search index, please follow the [instructions](https://docs.couchbase.com/cloud/vector-search/create-vector-search-index-ui.html). - - - -```python -# If you are running this script locally (not in Google Colab), uncomment the following line -# and provide the path to your index definition file. 
- -# index_definition_path = '/path_to_your_index_file/pydantic_ai_index.json' # Local setup: specify your file path here - -# # Version for Google Colab -# def load_index_definition_colab(): -# from google.colab import files -# print("Upload your index definition file") -# uploaded = files.upload() -# index_definition_path = list(uploaded.keys())[0] - -# try: -# with open(index_definition_path, 'r') as file: -# index_definition = json.load(file) -# return index_definition -# except Exception as e: -# raise ValueError(f"Error loading index definition from {index_definition_path}: {str(e)}") - -# Version for Local Environment -def load_index_definition_local(index_definition_path): - try: - with open(index_definition_path, 'r') as file: - index_definition = json.load(file) - return index_definition - except Exception as e: - raise ValueError(f"Error loading index definition from {index_definition_path}: {str(e)}") - -# Usage -# Uncomment the appropriate line based on your environment -# index_definition = load_index_definition_colab() -index_definition = load_index_definition_local('pydantic_ai_index.json') -``` - -# Creating or Updating Search Indexes - -With the index definition loaded, the next step is to create or update the **Vector Search Index** in Couchbase. This step is crucial because it optimizes our database for vector similarity search operations, allowing us to perform searches based on the semantic content of documents rather than just keywords. By creating or updating a Vector Search Index, we enable our search engine to handle complex queries that involve finding semantically similar documents using vector embeddings, which is essential for a robust semantic search engine. 
- - -```python -try: - scope_index_manager = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME).search_indexes() - - # Check if index already exists - existing_indexes = scope_index_manager.get_all_indexes() - index_name = index_definition["name"] - - if index_name in [index.name for index in existing_indexes]: - logging.info(f"Index '{index_name}' found") - else: - logging.info(f"Creating new index '{index_name}'...") - - # Create SearchIndex object from JSON definition - search_index = SearchIndex.from_json(index_definition) - - # Upsert the index (create if not exists, update if exists) - scope_index_manager.upsert_index(search_index) - logging.info(f"Index '{index_name}' successfully created/updated.") - -except QueryIndexAlreadyExistsException: - logging.info(f"Index '{index_name}' already exists. Skipping creation/update.") - -except InternalServerFailureException as e: - error_message = str(e) - logging.error(f"InternalServerFailureException raised: {error_message}") - - try: - # Accessing the response_body attribute from the context - error_context = e.context - response_body = error_context.response_body - if response_body: - error_details = json.loads(response_body) - error_message = error_details.get('error', '') - - if "collection: 'pydantic_ai' doesn't belong to scope: 'shared'" in error_message: - raise ValueError("Collection 'pydantic_ai' does not belong to scope 'shared'. Please check the collection and scope names.") - - except ValueError as ve: - logging.error(str(ve)) - raise - - except Exception as json_error: - logging.error(f"Failed to parse the error message: {json_error}") - raise RuntimeError(f"Internal server error while creating/updating search index: {error_message}") -``` - - 2025-04-11 13:54:41,157 - INFO - Creating new index 'vector-search-testing.shared.vector_search_pydantic_ai'... - 2025-04-11 13:54:41,316 - INFO - Index 'vector-search-testing.shared.vector_search_pydantic_ai' successfully created/updated. 
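The contents of `pydantic_ai_index.json` itself are not shown in this tutorial. Purely as an illustration of the general shape such a definition takes — the field names and values below are assumptions based on the settings described above (1536 dimensions, dot-product similarity, the `shared.pydantic_ai` collection), not the actual file — it might look like this when expressed as a Python dict:

```python
import json

# Assumed sketch of a Search vector index definition; export the real one
# from the Capella UI rather than hand-writing it.
index_definition = {
    "name": "vector_search_pydantic_ai",
    "type": "fulltext-index",
    "sourceType": "gocbcore",
    "sourceName": "vector-search-testing",
    "params": {
        "doc_config": {"mode": "scope.collection.type_field"},
        "mapping": {
            "default_mapping": {"enabled": False},
            "types": {
                "shared.pydantic_ai": {
                    "enabled": True,
                    "properties": {
                        "embedding": {
                            "fields": [{
                                "name": "embedding",
                                "type": "vector",
                                "dims": 1536,  # matches the embedding model used below
                                "similarity": "dot_product",
                                "index": True,
                            }],
                        },
                    },
                },
            },
        },
    },
}

# The dict round-trips through JSON, just like a definition loaded from a file.
assert json.loads(json.dumps(index_definition)) == index_definition
```

Keeping the definition in a separate JSON file, as this tutorial does, makes it easy to version and reuse across environments.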
- - -# Creating OpenAI Embeddings -Embeddings are at the heart of semantic search. They are numerical representations of text that capture the semantic meaning of the words and phrases. Unlike traditional keyword-based search, which looks for exact matches, embeddings allow our search engine to understand the context and nuances of language, enabling it to retrieve documents that are semantically similar to the query, even if they don't contain the exact keywords. By creating embeddings using OpenAI, we equip our search engine with the ability to understand and process natural language in a way that's much closer to how humans understand language. This step transforms our raw text data into a format that the search engine can use to find and rank relevant documents. - - -```python -try: - embeddings = OpenAIEmbeddings( - model="text-embedding-3-small", - api_key=OPENAI_API_KEY, - ) - logging.info("Successfully created OpenAIEmbeddings") -except Exception as e: - raise ValueError(f"Error creating OpenAIEmbeddings: {str(e)}") -``` - - 2025-04-11 13:55:10,426 - INFO - Successfully created OpenAIEmbeddings - - -# Setting Up the Couchbase Vector Store -The vector store is set up to manage the embeddings created in the previous step. The vector store is essentially a database optimized for storing and retrieving high-dimensional vectors. In this case, the vector store is built on top of Couchbase, allowing the script to store the embeddings in a way that can be efficiently searched. 
- - -```python -try: - vector_store = CouchbaseSearchVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding=embeddings, - index_name=INDEX_NAME, - ) - logging.info("Successfully created vector store") -except Exception as e: - raise ValueError(f"Failed to create vector store: {str(e)}") - -``` - - 2025-04-11 13:55:12,849 - INFO - Successfully created vector store - - -# Load the BBC News Dataset -To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively. - -The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version. - - -```python -try: - news_dataset = load_dataset( - "RealTimeData/bbc_news_alltime", "2024-12", split="train" - ) - print(f"Loaded the BBC News dataset with {len(news_dataset)} rows") - logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.") -except Exception as e: - raise ValueError(f"Error loading the BBC News dataset: {str(e)}") -``` - - 2025-04-11 13:55:22,967 - INFO - Successfully loaded the BBC News dataset with 2687 rows. - - - Loaded the BBC News dataset with 2687 rows - - -## Cleaning up the Data -We will use the content of the news articles for our RAG system. 
-
-The dataset contains a few duplicate records. We are removing them to avoid duplicate results in the retrieval stage of our RAG system.
-
-
-```python
-news_articles = news_dataset["content"]
-unique_articles = set()
-for article in news_articles:
-    if article:
-        unique_articles.add(article)
-unique_news_articles = list(unique_articles)
-print(f"We have {len(unique_news_articles)} unique articles in our database.")
-```
-
-    We have 1749 unique articles in our database.
-
-
-## Saving Data to the Vector Store
-With the vector store set up, the next step is to populate it with data. We save the BBC News articles to the vector store, using LangChain to generate an embedding for each article for semantic search. One of the articles is larger than the maximum number of tokens our embedding model accepts. If we wanted to ingest that document, we could split it and ingest it in parts; however, since it is only a single document, for simplicity we exclude it from the ingestion process.
-
-
-```python
-# Save the current logging level
-current_logging_level = logging.getLogger().getEffectiveLevel()
-
-# Set logging level to CRITICAL to suppress lower level logs
-logging.getLogger().setLevel(logging.CRITICAL)
-
-articles = [article for article in unique_news_articles if article and len(article) <= 50000]
-
-try:
-    vector_store.add_texts(
-        texts=articles
-    )
-except Exception as e:
-    raise ValueError(f"Failed to save documents to vector store: {str(e)}")
-
-# Restore the original logging level
-logging.getLogger().setLevel(current_logging_level)
-```
-
-# PydanticAI: An Introduction
-From [PydanticAI](https://ai.pydantic.dev/)'s website:
-
-> PydanticAI is a Python agent framework designed to make it less painful to build production grade applications with Generative AI.
-
-PydanticAI allows us to define agents and tools easily to create Gen-AI apps in an innovative and painless manner. 
Some of its features are:
-- Built by the Pydantic Team: Built by the team behind Pydantic (the validation layer of the OpenAI SDK, the Anthropic SDK, LangChain, LlamaIndex, AutoGPT, Transformers, CrewAI, Instructor and many more).
-
-- Model-agnostic: Supports OpenAI, Anthropic, Gemini, Deepseek, Ollama, Groq, Cohere, and Mistral, and there is a simple interface to implement support for other models.
-
-- Type-safe: Designed to make type checking as powerful and informative as possible for you.
-
-- Python-centric Design: Leverages Python's familiar control flow and agent composition to build your AI-driven projects, making it easy to apply standard Python best practices you'd use in any other (non-AI) project.
-
-- Structured Responses: Harnesses the power of Pydantic to validate and structure model outputs, ensuring responses are consistent across runs.
-
-- Dependency Injection System: Offers an optional dependency injection system to provide data and services to your agent's system prompts, tools and result validators. This is useful for testing and eval-driven iterative development.
-
-- Streamed Responses: Provides the ability to stream LLM outputs continuously, with immediate validation, ensuring rapid and accurate results.
-
-- Graph Support: Pydantic Graph provides a powerful way to define graphs using type hints, which is useful in complex applications where standard control flow can degrade to spaghetti code.
-
-# Building a RAG Agent using PydanticAI
-
-PydanticAI makes heavy use of dependency injection to provide data and services to your agent's system prompts and tools. We define dependencies using a `dataclass`, which serves as a container for our dependencies.
-
-In our case, the only dependency our agent needs is the `CouchbaseSearchVectorStore` instance. However, we will still use a `dataclass`, as it is good practice: if we wish to add more dependencies in the future, we can simply add more fields to the `dataclass` `Deps`. 
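As a purely hypothetical illustration of that point — the `top_k` field below is invented for this sketch and does not appear in this tutorial — extending the container is just a matter of adding fields:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Deps:
    # In the tutorial this holds the CouchbaseSearchVectorStore instance;
    # `Any` keeps the sketch self-contained.
    vector_store: Any
    # Hypothetical future dependency: how many documents to retrieve.
    top_k: int = 5

deps = Deps(vector_store=None)
print(deps.top_k)  # -> 5
```

Fields with defaults, like `top_k` here, can be added later without breaking existing call sites that construct `Deps`.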
-
-We also initialize an agent backed by a GPT-4o model. PydanticAI supports many other LLM providers, including Anthropic, Google, and Cohere, which can also be used. While initializing the agent, we also pass the type of the dependencies. This is used only for type checking; it is not used at runtime.
-
-
-```python
-@dataclass
-class Deps:
-    vector_store: CouchbaseSearchVectorStore
-
-agent = Agent("openai:gpt-4o", deps_type=Deps)
-```
-
-# Defining the Vector Store as a Tool
-PydanticAI has the concept of `function tools`, which are functions that can be called by LLMs to retrieve extra information that can help form a better response.
-
-We can perform RAG by creating a tool that retrieves documents semantically similar to the query and letting the agent call the tool when required. We can add the function as a tool using the `@agent.tool` decorator.
-
-Notice that we also add the `context` parameter, which contains the dependencies that are passed to the tool (in this case, the only dependency is the vector store).
-
-
-```python
-@agent.tool
-async def retrieve(context: RunContext[Deps], search_query: str) -> str:
-    """Retrieve news data based on a search query.
-
-    Args:
-        context: The call context
-        search_query: The search query
-    """
-    search_results = context.deps.vector_store.similarity_search_with_score(search_query, k=5)
-    return "\n\n".join(
-        f"# Documents:\n{doc.page_content}"
-        for doc, score in search_results
-    )
-```
-
-Finally, we create a function that allows us to define our dependencies and run our agent.
-
-
-```python
-async def run_agent(question: str):
-    deps = Deps(
-        vector_store=vector_store,
-    )
-    answer = await agent.run(question, deps=deps)
-    return answer
-```
-
-# Running our Agent
-We have now finished setting up our vector store and agent! The system is now ready to accept queries.
-
-
-```python
-query = "What was manchester city manager pep guardiola's reaction to the team's current form?" 
-output = await run_agent(query) - -print("=" * 20, "Agent Output", "=" * 20) -print(output.data) -``` - - 2025-04-11 13:56:53,839 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" - 2025-04-11 13:56:54,485 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK" - 2025-04-11 13:57:01,928 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" - - - ==================== Agent Output ==================== - Pep Guardiola has expressed a mix of determination and concern regarding Manchester City's current form. He acknowledged the personal impact of the team's downturn, admitting that the situation has affected his sleep and diet due to the worst run of results he has ever faced in his managerial career. Guardiola described his state of mind as "ugly," noting the team's precarious position in competitions and the need to defend better and avoid mistakes. - - Despite these challenges, Guardiola remains committed to finding solutions, emphasizing the need to improve defensive concepts and restore the team's intensity and form. He acknowledged the errors from some of the best players in the world and expressed a need for the team to stay positive and for players to have the necessary support to overcome their current struggles. - - Moreover, Guardiola expressed a pragmatic view of the situation, accepting that the team must "survive" the season and acknowledging a potential need for a significant rebuild to address the challenges they're facing. As a testament to his commitment, he noted his intention to continue shaping the club during his newly extended contract period. Throughout, he reiterated his belief in the team and emphasized the need to find a way forward. - - -# Inspecting the Agent -We can use the `all_messages()` method in the output object to observe how the agent and tools work. 
- -In the cell below, we see an extremely detailed list of all the model's messages and tool calls, which happens step by step: -1. The `UserPromptPart`, which consists of the query the user sends to the agent. -2. The agent calls the `retrieve` tool in the `ToolCallPart` message. This includes the `search_query` argument. Couchbase uses this `search_query` to perform semantic search over all the ingested news articles. -3. The `retrieve` tool returns a `ToolReturnPart` object with all the context required for the model to answer the user's query. The retrieve documents were truncated, because a large amount of context was retrieved. -4. The final message is the LLM generated response with the added context, which is sent back to the user. - - -```python -from pprint import pprint - -for idx, message in enumerate(output.all_messages(), start=1): - print(f"Step {idx}:") - pprint(message.__repr__()) - print("=" * 50) -``` - - Step 1: - ('ModelRequest(parts=[UserPromptPart(content="What was manchester city manager ' - 'pep guardiola\'s reaction to the team\'s current form?", ' - 'timestamp=datetime.datetime(2025, 4, 11, 8, 26, 52, 836357, ' - "tzinfo=datetime.timezone.utc), part_kind='user-prompt')], kind='request')") - ================================================== - Step 2: - ("ModelResponse(parts=[ToolCallPart(tool_name='retrieve', " - 'args=\'{"search_query":"Pep Guardiola reaction to Manchester City current ' - 'form"}\', tool_call_id=\'call_oo4Jjn93VkRJ3q9PnAwkt3xm\', ' - "part_kind='tool-call')], model_name='gpt-4o-2024-08-06', " - 'timestamp=datetime.datetime(2025, 4, 11, 8, 26, 53, ' - "tzinfo=datetime.timezone.utc), kind='response')") - ================================================== - Step 3: - ("ModelRequest(parts=[ToolReturnPart(tool_name='retrieve', content='# " - 'Documents:\\nManchester City boss Pep Guardiola has won 18 trophies since he ' - 'arrived at the club in 2016\\n\\nManchester City boss Pep Guardiola says he ' - 'is "fine" despite 
admitting his sleep and diet are being affected by the ' - 'worst run of results in his entire managerial career. In an interview with ' - 'former Italy international Luca Toni for Amazon Prime Sport before ' - "Wednesday\\'s Champions League defeat by Juventus, Guardiola touched on the " - "personal impact City\\'s sudden downturn in form has had. Guardiola said his " - 'state of mind was "ugly", that his sleep was "worse" and he was eating ' - "lighter as his digestion had suffered. City go into Sunday\\'s derby against " - - ... (output truncated for brevity) - diff --git a/tutorial/markdown/generated/vector-search-cookbook/pydantic_ai-gsi-RAG_with_Couchbase_and_PydanticAI.md b/tutorial/markdown/generated/vector-search-cookbook/pydantic_ai-gsi-RAG_with_Couchbase_and_PydanticAI.md deleted file mode 100644 index d4b1fe1..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/pydantic_ai-gsi-RAG_with_Couchbase_and_PydanticAI.md +++ /dev/null @@ -1,1358 +0,0 @@ ---- -# frontmatter -path: "/tutorial-pydantic-ai-couchbase-rag-with-global-secondary-index" -title: Retrieval-Augmented Generation (RAG) with Couchbase and PydanticAI using GSI -short_title: RAG with Couchbase and PydanticAI using GSI -description: - - Learn how to build a semantic search engine using Couchbase and PydanticAI using GSI. - - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with PydanticAI using GSI indexes. - - You'll understand how to perform Retrieval-Augmented Generation (RAG) using PydanticAI and Couchbase with GSI optimization. 
-content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - Artificial Intelligence - - LangChain - - OpenAI - - PydanticAI - - GSI -sdk_language: - - python -length: 30 Mins ---- - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/pydantic_ai/gsi/RAG_with_Couchbase_and_PydanticAI.ipynb) - -# Introduction -In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database, [OpenAI](https://openai.com) as the embedding and LLM provider, and [PydanticAI](https://ai.pydantic.dev) as an agent orchestrator. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system using GSI (Global Secondary Index) from scratch. Alternatively if you want to perform semantic search using the FTS index, please take a look at [this.](https://developer.couchbase.com/tutorial-pydantic-ai-couchbase-rag-with-fts/) - - -# How to run this tutorial - -This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/pydantic_ai/gsi/RAG_with_Couchbase_and_PydanticAI.ipynb). - -You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment. - - -# Before you start - -## Get Credentials for OpenAI - -* Please follow the [instructions](https://platform.openai.com/docs/quickstart) to generate the OpenAI credentials. 
-
-## Create and Deploy Your Free Tier Operational cluster on Capella
-
-To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.
-
-To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).
-
-Note: To run this tutorial, you will need Capella with Couchbase Server version 8.0 or above, as GSI vector search is supported only from version 8.0 onwards.
-
-### Couchbase Capella Configuration
-
-When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.
-
-* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the required bucket (Read and Write) used in the application.
-
-* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.
-
-
-# Setting the Stage: Installing Necessary Libraries
-To build our semantic search engine, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks. Each library has a specific role: Couchbase libraries manage database operations, LangChain handles AI model integrations, and OpenAI provides advanced AI models for generating embeddings and understanding natural language. By setting up these libraries, we ensure our environment is equipped to handle the data-intensive and computationally complex tasks required for semantic search. 
- - - -```python -%pip install --quiet datasets==3.5.0 langchain-couchbase==0.5.0 langchain-openai==0.3.32 python-dotenv==1.1.1 pydantic-ai==0.1.1 -``` - -# Importing Necessary Libraries -The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. These libraries provide essential functions for working with data, managing database connections, and processing machine learning models. - - - -```python -import getpass -import json -import logging -import os -import time -from datetime import timedelta - -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.exceptions import (CouchbaseException, - InternalServerFailureException, - QueryIndexAlreadyExistsException) -from couchbase.management.buckets import CreateBucketSettings -from couchbase.options import ClusterOptions -from datasets import load_dataset -from dotenv import load_dotenv - -from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore -from langchain_couchbase.vectorstores import DistanceStrategy -from langchain_couchbase.vectorstores import IndexType -from langchain_openai import OpenAIEmbeddings - -from dataclasses import dataclass -from pydantic_ai import Agent, RunContext - -``` - -# Setup Logging -Logging is configured to track the progress of the script and capture any errors or warnings. This is crucial for debugging and understanding the flow of execution. The logging output includes timestamps, log levels (e.g., INFO, ERROR), and messages that describe what is happening in the script. 
- - - -```python -logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True) - -# Disable all logging except critical to prevent OpenAI API request logs -logging.getLogger("httpx").setLevel(logging.CRITICAL) - -``` - -# Loading Sensitive Information -In this section, we prompt the user to input essential configuration settings needed. These settings include sensitive information like API keys, database credentials, and specific configuration names. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security. - -The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code. - - - -```python -load_dotenv() - -# Load from environment variables or prompt for input in one-liners -OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or getpass.getpass('Enter your OpenAI API key: ') -CB_HOST = os.getenv('CB_HOST', 'couchbase://localhost') or input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost' -CB_USERNAME = os.getenv('CB_USERNAME', 'Administrator') or input('Enter your Couchbase username (default: Administrator): ') or 'Administrator' -CB_PASSWORD = os.getenv('CB_PASSWORD', 'password') or getpass.getpass('Enter your Couchbase password (default: password): ') or 'password' -CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME', 'query-vector-search-testing') or input('Enter your Couchbase bucket name (default: query-vector-search-testing): ') or 'query-vector-search-testing' -SCOPE_NAME = os.getenv('SCOPE_NAME', 'shared') or input('Enter your scope name (default: shared): ') or 'shared' -COLLECTION_NAME = os.getenv('COLLECTION_NAME', 'pydantic_ai') or input('Enter your collection name 
(default: pydantic_ai): ') or 'pydantic_ai' - -# Check if the variables are correctly loaded -if not OPENAI_API_KEY: - raise ValueError("OPENAI_API_KEY is not set in the environment.") - -if 'OPENAI_API_KEY' not in os.environ: - os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY - -``` - -# Connecting to the Couchbase Cluster -Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount. - - - -```python -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to Couchbase") -except Exception as e: - raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}") - -``` - - 2025-11-07 14:19:15,306 - INFO - Successfully connected to Couchbase - - -## Setting Up Collections in Couchbase - -The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase: - -1. Bucket Creation: - - Checks if specified bucket exists, creates it if not - - Sets bucket properties like RAM quota (1024MB) and replication (disabled) - - Note: You will not be able to create a bucket on Capella - -2. Scope Management: - - Verifies if requested scope exists within bucket - - Creates new scope if needed (unless it's the default "_default" scope) - -3. 
Collection Setup:
   - Checks for collection existence within the scope
   - Creates the collection if it doesn't exist
   - Waits 2 seconds for the collection to be ready

Additional Tasks:
- Clears any existing documents for a clean state
- Implements comprehensive error handling and logging



```python
def setup_collection(cluster, bucket_name, scope_name, collection_name):
    try:
        # Check if bucket exists, create if it doesn't
        try:
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' exists.")
        except Exception as e:
            logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...")
            bucket_settings = CreateBucketSettings(
                name=bucket_name,
                bucket_type='couchbase',
                ram_quota_mb=1024,
                flush_enabled=True,
                num_replicas=0
            )
            cluster.buckets().create_bucket(bucket_settings)
            time.sleep(2)  # Wait for bucket creation to complete and become available
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' created successfully.")

        bucket_manager = bucket.collections()

        # Check if scope exists, create if it doesn't
        scopes = bucket_manager.get_all_scopes()
        scope_exists = any(scope.name == scope_name for scope in scopes)

        if not scope_exists and scope_name != "_default":
            logging.info(f"Scope '{scope_name}' does not exist. Creating it...")
            bucket_manager.create_scope(scope_name)
            logging.info(f"Scope '{scope_name}' created successfully.")

        # Check if collection exists within the scope, create if it doesn't
        scopes = bucket_manager.get_all_scopes()  # Refresh scope metadata in case a scope was just created
        collection_exists = any(
            scope.name == scope_name and collection_name in [col.name for col in scope.collections]
            for scope in scopes
        )

        if not collection_exists:
            logging.info(f"Collection '{collection_name}' does not exist. 
Creating it...") - bucket_manager.create_collection(scope_name, collection_name) - logging.info(f"Collection '{collection_name}' created successfully.") - else: - logging.info(f"Collection '{collection_name}' already exists. Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) - -``` - - 2025-11-07 14:19:15,319 - INFO - Bucket 'travel-sample' exists. - 2025-11-07 14:19:15,320 - INFO - Scope 'shared' does not exist. Creating it... - 2025-11-07 14:19:15,352 - INFO - Scope 'shared' created successfully. - 2025-11-07 14:19:15,354 - INFO - Collection 'demo' does not exist. Creating it... - 2025-11-07 14:19:15,407 - INFO - Collection 'demo' created successfully. - 2025-11-07 14:19:17,451 - INFO - All documents cleared from the collection. - - - - - - - - - -# Creating OpenAI Embeddings -Embeddings are at the heart of semantic search. They are numerical representations of text that capture the semantic meaning of the words and phrases. Unlike traditional keyword-based search, which looks for exact matches, embeddings allow our search engine to understand the context and nuances of language, enabling it to retrieve documents that are semantically similar to the query, even if they don't contain the exact keywords. 
By creating embeddings using OpenAI, we equip our search engine with the ability to understand and process natural language in a way that's much closer to how humans understand language. This step transforms our raw text data into a format that the search engine can use to find and rank relevant documents. - - - -```python -try: - embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY, model='text-embedding-3-small') - logging.info("Successfully created OpenAIEmbeddings") -except Exception as e: - raise ValueError(f"Error creating OpenAIEmbeddings: {str(e)}") - -``` - - 2025-11-07 14:19:17,576 - INFO - Successfully created OpenAIEmbeddings - - -# Understanding GSI Vector Search - -### Optimizing Vector Search with Global Secondary Index (GSI) - -With Couchbase 8.0+, you can leverage the power of GSI-based vector search, which offers significant performance improvements over traditional Full-Text Search (FTS) approaches for vector-first workloads. GSI vector search provides high-performance vector similarity search with advanced filtering capabilities and is designed to scale to billions of vectors. 
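Both index types rank results by vector distance. As a quick, self-contained illustration of the cosine distance used throughout this tutorial, the sketch below uses toy 3-dimensional vectors (real `text-embedding-3-small` vectors have 1536 dimensions): a distance of 0 means identical direction, and larger values mean less similar content.

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; lower means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy "embeddings" for illustration only
query_vec = [0.9, 0.1, 0.0]
doc_close = [0.8, 0.2, 0.1]   # similar direction -> small distance
doc_far = [0.0, 0.1, 0.9]     # different direction -> large distance

print(cosine_distance(query_vec, doc_close))
print(cosine_distance(query_vec, doc_far))
```

This mirrors the ranking behaviour you will see in the search results later: a lower vector distance indicates a closer semantic match.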
- -#### GSI vs FTS: Choosing the Right Approach - -| Feature | GSI Vector Search | FTS Vector Search | -| --------------------- | --------------------------------------------------------------- | ----------------------------------------- | -| **Best For** | Vector-first workloads, complex filtering, high QPS performance| Hybrid search and high recall rates | -| **Couchbase Version** | 8.0.0+ | 7.6+ | -| **Filtering** | Pre-filtering with `WHERE` clauses (Composite) or post-filtering (BHIVE) | Pre-filtering with flexible ordering | -| **Scalability** | Up to billions of vectors (BHIVE) | Up to 10 million vectors | -| **Performance** | Optimized for concurrent operations with low memory footprint | Good for mixed text and vector queries | - - -#### GSI Vector Index Types - -Couchbase offers two distinct GSI vector index types, each optimized for different use cases: - -##### Hyperscale Vector Indexes (BHIVE) - -- **Best for**: Pure vector searches like content discovery, recommendations, and semantic search -- **Use when**: You primarily perform vector-only queries without complex scalar filtering -- **Features**: - - High performance with low memory footprint - - Optimized for concurrent operations - - Designed to scale to billions of vectors - - Supports post-scan filtering for basic metadata filtering - -##### Composite Vector Indexes - - - **Best for**: Filtered vector searches that combine vector similarity with scalar value filtering -- **Use when**: Your queries combine vector similarity with scalar filters that eliminate large portions of data -- **Features**: - - Efficient pre-filtering where scalar attributes reduce the vector comparison scope - - Best for well-defined workloads requiring complex filtering using GSI features - - Supports range lookups combined with vector search - -#### Index Type Selection for This Tutorial - -In this tutorial, we'll demonstrate creating a **BHIVE index** and running vector similarity queries using GSI. 
BHIVE is ideal for semantic search scenarios where you want: - -1. **High-performance vector search** across large datasets -2. **Low latency** for real-time applications -3. **Scalability** to handle growing vector collections -4. **Concurrent operations** for multi-user environments - -The BHIVE index will provide optimal performance for our OpenAI embedding-based semantic search implementation. - -#### Alternative: Composite Vector Index - -If your use case requires complex filtering with scalar attributes, you may want to consider using a **Composite Vector Index** instead: - -```python -# Alternative: Create a Composite index for filtered searches -vector_store.create_index( - index_type=IndexType.COMPOSITE, - index_description="IVF,SQ8", - distance_metric=DistanceStrategy.COSINE, - index_name="pydantic_composite_index", -) -``` - -**Use Composite indexes when:** -- You need to filter by document metadata or attributes before vector similarity -- Your queries combine vector search with WHERE clauses -- You have well-defined filtering requirements that can reduce the search space - -**Note**: Composite indexes enable pre-filtering with scalar attributes, making them ideal for applications where you need to search within specific categories, date ranges, or user-specific data segments. - -#### Understanding GSI Index Configuration (Couchbase 8.0 Feature) - -Before creating our BHIVE index, it's important to understand the configuration parameters that optimize vector storage and search performance. The `index_description` parameter controls how Couchbase optimizes vector storage through centroids and quantization. 
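To build intuition for why quantization matters, here is a rough back-of-the-envelope calculation (plain Python; the 1,000,000-vector collection size is hypothetical, and real indexes add structural overhead beyond raw vector bytes) comparing unquantized 32-bit float storage with 8-bit scalar quantization (SQ8) for 1536-dimensional embeddings:

```python
DIMENSIONS = 1536          # text-embedding-3-small output size
NUM_VECTORS = 1_000_000    # hypothetical collection size

# Raw storage: one 32-bit (4-byte) float per dimension
raw_bytes = NUM_VECTORS * DIMENSIONS * 4

# SQ8: 8 bits (1 byte) per dimension
sq8_bytes = NUM_VECTORS * DIMENSIONS * 1

print(f"raw float32: {raw_bytes / 2**30:.1f} GiB")
print(f"SQ8:         {sq8_bytes / 2**30:.1f} GiB")
print(f"compression: {raw_bytes / sq8_bytes:.0f}x")
```

SQ8 cuts per-vector storage roughly fourfold, which is why it is a reasonable default when a small accuracy trade-off is acceptable.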
##### Index Description Format: `'IVF[<centroids>],{SQ<bits>|PQ<subquantizers>x<bits>}'`

##### Centroids (IVF - Inverted File)

- Controls how the dataset is subdivided for faster searches
- **More centroids** = faster search, slower training time
- **Fewer centroids** = slower search, faster training time
- If omitted (as in `IVF,SQ8`), Couchbase auto-selects based on dataset size

##### Quantization Options

**Scalar Quantization (SQ):**
- `SQ4`, `SQ6`, `SQ8` (4, 6, or 8 bits per dimension)
- Lower memory usage, faster search, slightly reduced accuracy

**Product Quantization (PQ):**
- Format: `PQ<subquantizers>x<bits>` (e.g., `PQ32x8`)
- Better compression for very large datasets
- More complex but can maintain accuracy with a smaller index size

##### Common Configuration Examples

- **`IVF,SQ8`** - Auto centroids, 8-bit scalar quantization (good default)
- **`IVF1000,SQ6`** - 1000 centroids, 6-bit scalar quantization
- **`IVF,PQ32x8`** - Auto centroids, 32 subquantizers with 8 bits

For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/cloud/vector-index/hyperscale-vector-index.html#algo_settings).

For more information on GSI vector indexes, see the [Couchbase GSI Vector Documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html).

##### Our Configuration Choice

In this tutorial, we use `IVF,SQ8`, which provides:
- **Auto-selected centroids** optimized for our dataset size
- **8-bit scalar quantization** for a good balance of speed, memory usage, and accuracy
- **COSINE distance metric**, ideal for semantic similarity search
- **Optimal performance** for most semantic search use cases

# Setting Up the Couchbase Query Vector Store
A vector store is where we'll keep our embeddings. The query vector store is specifically designed to handle embeddings and perform similarity searches. 
When a user inputs a query, it is converted into an embedding and compared against the embeddings stored in the vector store. This allows the engine to find documents that are semantically similar to the query, even if they don't contain the exact same words. By setting up the vector store in Couchbase, we create a powerful tool that lets us retrieve information based on the meaning and context of the query, rather than just the specific words used.

The vector store requires a distance metric to determine how similarity between vectors is calculated. This is crucial for accurate semantic search results, as different distance metrics can yield different similarity rankings. The supported distance strategies include `dot`, `l2`, `euclidean`, `cosine`, `l2_squared`, and `euclidean_squared`. In our implementation we will use `cosine`, which is particularly effective for text embeddings.



```python
try:
    vector_store = CouchbaseQueryVectorStore(
        cluster=cluster,
        bucket_name=CB_BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=COLLECTION_NAME,
        embedding=embeddings,
        distance_metric=DistanceStrategy.COSINE
    )
    logging.info("Successfully created vector store")
except Exception as e:
    raise ValueError(f"Failed to create vector store: {str(e)}")

```

    2025-11-07 14:19:18,781 - INFO - Successfully created vector store


# Load the BBC News Dataset
To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively. 
The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face `datasets` library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version.



```python
try:
    news_dataset = load_dataset(
        "RealTimeData/bbc_news_alltime", "2024-12", split="train"
    )
    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
    logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.")
except Exception as e:
    raise ValueError(f"Error loading the BBC News dataset: {str(e)}")

```

    2025-11-07 14:19:23,958 - INFO - Successfully loaded the BBC News dataset with 2687 rows.


    Loaded the BBC News dataset with 2687 rows


## Cleaning up the Data
We will use the content of the news articles for our RAG system.

The dataset contains a few duplicate records. We remove them to avoid duplicate results in the retrieval stage of our RAG system.



```python
news_articles = news_dataset["content"]
unique_articles = set()
for article in news_articles:
    if article:
        unique_articles.add(article)
unique_news_articles = list(unique_articles)
print(f"We have {len(unique_news_articles)} unique articles in our database.")

```

    We have 1749 unique articles in our database.


## Saving Data to the Vector Store
To efficiently handle the large number of articles, we process them in batches of 100 at a time. This batch processing approach helps manage memory usage and provides better control over the ingestion process.

We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's `add_texts` method, we add the filtered articles to our vector database. The `batch_size` parameter controls how many articles are processed in each iteration. 
- -This approach offers several benefits: -1. Memory Efficiency: Processing in smaller batches prevents memory overload -2. Progress Tracking: Easier to monitor and track the ingestion progress -3. Resource Management: Better control over CPU and network resource utilization - -We use a conservative batch size of 100 to ensure reliable operation. -The optimal batch size depends on many factors including: -- Document sizes being inserted -- Available system resources -- Network conditions -- Concurrent workload - -Consider measuring performance with your specific workload before adjusting. - - - -```python -batch_size = 100 - -# Automatic Batch Processing -articles = [article for article in unique_news_articles if article and len(article) <= 50000] - -try: - vector_store.add_texts( - texts=articles, - batch_size=batch_size - ) - logging.info("Document ingestion completed successfully.") -except Exception as e: - raise ValueError(f"Failed to save documents to vector store: {str(e)}") - -``` - - 2025-11-07 14:20:20,820 - INFO - Document ingestion completed successfully. - - -# Understanding Semantic Search in Couchbase - -Semantic search goes beyond traditional keyword matching by understanding the meaning and context behind queries. Here's how it works in Couchbase: - -## How Semantic Search Works - -1. **Vector Embeddings**: Documents and queries are converted into high-dimensional vectors using an embeddings model (in our case, OpenAI's text-embedding-3-small) - -2. **Similarity Calculation**: When a query is made, Couchbase compares the query vector against stored document vectors using the COSINE distance metric - -3. **Result Ranking**: Documents are ranked by their vector distance (lower distance = more similar meaning) - -4. 
**Flexible Configuration**: Different distance metrics (cosine, euclidean, dot product) and embedding models can be used based on your needs - -The `similarity_search_with_score` method performs this entire process, returning documents along with their similarity scores. This enables you to find semantically related content even when exact keywords don't match. - -Now let's see semantic search in action and measure its performance with different optimization strategies. - -# Vector Search Performance Optimization - -Now let's measure and compare the performance benefits of different optimization strategies. We'll conduct a comprehensive performance analysis across two phases: - -## Performance Testing Phases - -1. **Phase 1 - Baseline Performance**: Test vector search without GSI indexes to establish baseline metrics -2. **Phase 2 - GSI-Optimized Search**: Create BHIVE index and measure performance improvements - -**Important Context:** -- GSI performance benefits scale with dataset size and concurrent load -- With our dataset (~1,700 articles), improvements may be modest -- Production environments with millions of vectors show significant GSI advantages -- The combination of GSI + LLM caching provides optimal RAG performance - - - -```python -# Phase 1: Baseline Performance (Without GSI Index) -print("="*80) -print("PHASE 1: BASELINE PERFORMANCE (NO GSI INDEX)") -print("="*80) - -query = "What was manchester city manager pep guardiola's reaction to the team's current form?" 
- -try: - # Perform the semantic search - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=10) - baseline_time = time.time() - start_time - - logging.info(f"Baseline search completed in {baseline_time:.2f} seconds") - - # Display search results - print(f"\nBaseline Search Results (completed in {baseline_time:.4f} seconds):") - print("-" * 80) - for i, (doc, distance) in enumerate(search_results, 1): - print(f"[Result {i}] Vector Distance: {distance:.4f}") - # Truncate for readability - content_preview = doc.page_content[:150] + "..." if len(doc.page_content) > 150 else doc.page_content - print(f"Text: {content_preview}") - print("-" * 80) - -except CouchbaseException as e: - raise RuntimeError(f"Error performing semantic search: {str(e)}") -except Exception as e: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - ================================================================================ - PHASE 1: BASELINE PERFORMANCE (NO GSI INDEX) - ================================================================================ - - - 2025-11-07 14:20:22,185 - INFO - Baseline search completed in 1.36 seconds - - - - Baseline Search Results (completed in 1.3612 seconds): - -------------------------------------------------------------------------------- - [Result 1] Vector Distance: 0.2956 - Text: Manchester City boss Pep Guardiola has won 18 trophies since he arrived at the club in 2016 - - Manchester City boss Pep Guardiola says he is "fine" desp... - -------------------------------------------------------------------------------- - [Result 2] Vector Distance: 0.3100 - Text: Pep Guardiola has said Manchester City will be his final managerial job in club football before he "maybe" coaches a national team. - - The former Barcel... 
- -------------------------------------------------------------------------------- - [Result 3] Vector Distance: 0.3311 - Text: 'I am not good enough' - Guardiola faces daunting and major rebuild - - This video can not be played To play this video you need to enable JavaScript in ... - -------------------------------------------------------------------------------- - [Result 4] Vector Distance: 0.3474 - Text: 'Self-doubt, errors & big changes' - inside the crisis at Man City - - Pep Guardiola has not been through a moment like this in his managerial career. Ma... - -------------------------------------------------------------------------------- - [Result 5] Vector Distance: 0.3666 - Text: Man City's Dias ruled out for 'three or four weeks' - - Ruben Dias has won 10 major trophies during his time at Manchester City - - Manchester City have suf... - -------------------------------------------------------------------------------- - [Result 6] Vector Distance: 0.3818 - Text: 'We have to find a way' - Guardiola vows to end relegation form - - This video can not be played To play this video you need to enable JavaScript in your... - -------------------------------------------------------------------------------- - [Result 7] Vector Distance: 0.4157 - Text: Man City might miss out on Champions League - Guardiola - - Erling Haaland was part of the Manchester City side that won the Champions League for the fir... - -------------------------------------------------------------------------------- - [Result 8] Vector Distance: 0.4705 - Text: 'So happy he is back' - 'integral' De Bruyne 'one of best we've seen' - - This video can not be played To play this video you need to enable JavaScript i... - -------------------------------------------------------------------------------- - [Result 9] Vector Distance: 0.4831 - Text: 'Life is not easy' - Haaland penalty miss sums up Man City crisis - - Manchester City striker Erling Haaland has now missed two of his 17 penalties taken... 
- -------------------------------------------------------------------------------- - [Result 10] Vector Distance: 0.5382 - Text: Amorim knows job in 'danger' without victories - - This video can not be played To play this video you need to enable JavaScript in your browser. 'I know... - -------------------------------------------------------------------------------- - - - -# Optimizing Vector Search with Global Secondary Index (GSI) - -While the above semantic search using similarity_search_with_score works effectively, we can significantly improve query performance by leveraging Global Secondary Index (GSI) in Couchbase. - -Couchbase offers three types of vector indexes, but for GSI-based vector search we focus on two main types: - -Hyperscale Vector Indexes (BHIVE) -- Best for pure vector searches - content discovery, recommendations, semantic search -- High performance with low memory footprint - designed to scale to billions of vectors -- Optimized for concurrent operations - supports simultaneous searches and inserts -- Use when: You primarily perform vector-only queries without complex scalar filtering -- Ideal for: Large-scale semantic search, recommendation systems, content discovery - -Composite Vector Indexes -- Best for filtered vector searches - combines vector search with scalar value filtering -- Efficient pre-filtering - scalar attributes reduce the vector comparison scope -- Use when: Your queries combine vector similarity with scalar filters that eliminate large portions of data -- Ideal for: Compliance-based filtering, user-specific searches, time-bounded queries - -Choosing the Right Index Type -- Start with Hyperscale Vector Index for pure vector searches and large datasets -- Use Composite Vector Index when scalar filters significantly reduce your search space -- Consider your dataset size: Hyperscale scales to billions, Composite works well for tens of millions to billions - -For more details, see the [Couchbase Vector Index 
documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html).


## Understanding Index Configuration (Couchbase 8.0 Feature)

The `index_description` parameter controls how Couchbase optimizes vector storage and search performance through centroids and quantization:

Format: `'IVF[<centroids>],{SQ<bits>|PQ<subquantizers>x<bits>}'`

Centroids (IVF - Inverted File):
- Controls how the dataset is subdivided for faster searches
- More centroids = faster search, slower training
- Fewer centroids = slower search, faster training
- If omitted (as in `IVF,SQ8`), Couchbase auto-selects based on dataset size

Quantization Options:
- SQ (Scalar Quantization): `SQ4`, `SQ6`, `SQ8` (4, 6, or 8 bits per dimension)
- PQ (Product Quantization): `PQ<subquantizers>x<bits>` (e.g., `PQ32x8`)
- Higher values = better accuracy, larger index size

Common Examples:
- `IVF,SQ8` - Auto centroids, 8-bit scalar quantization (good default)
- `IVF1000,SQ6` - 1000 centroids, 6-bit scalar quantization
- `IVF,PQ32x8` - Auto centroids, 32 subquantizers with 8 bits

For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/cloud/vector-index/hyperscale-vector-index.html#algo_settings).

In the code below, we demonstrate creating a BHIVE index. This method takes an index type (BHIVE or COMPOSITE) and a description parameter for optimization settings. Alternatively, GSI indexes can be created manually from the Couchbase UI.


```python
vector_store.create_index(index_type=IndexType.BHIVE, index_name="pydantic_ai_bhive_index", index_description="IVF,SQ8")
```

Note: To create a COMPOSITE index, the code below can be used.
Choose based on your specific use case and query patterns. For this tutorial's news search scenario, either index type would work, but BHIVE might be more efficient for pure semantic search across news articles. 
- -vector_store.create_index(index_type=IndexType.COMPOSITE, index_name="pydantic_ai_composite_index", index_description="IVF,SQ8") - - -```python -# Phase 2: GSI-Optimized Performance (With BHIVE Index) -print("\n" + "="*80) -print("PHASE 2: GSI-OPTIMIZED PERFORMANCE (WITH BHIVE INDEX)") -print("="*80) - -query = "What was manchester city manager pep guardiola's reaction to the team's current form?" - -try: - # Perform the semantic search with GSI - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=3) - gsi_time = time.time() - start_time - - logging.info(f"GSI-optimized search completed in {gsi_time:.2f} seconds") - - # Display search results - print(f"\nGSI-Optimized Search Results (completed in {gsi_time:.4f} seconds):") - print("-" * 80) - for i, (doc, distance) in enumerate(search_results, 1): - print(f"[Result {i}] Vector Distance: {distance:.4f}") - # Truncate for readability - content_preview = doc.page_content[:150] + "..." if len(doc.page_content) > 150 else doc.page_content - print(f"Text: {content_preview}") - print("-" * 80) - -except CouchbaseException as e: - raise RuntimeError(f"Error performing semantic search: {str(e)}") -except Exception as e: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - - ================================================================================ - PHASE 2: GSI-OPTIMIZED PERFORMANCE (WITH BHIVE INDEX) - ================================================================================ - - - 2025-11-07 14:20:28,399 - INFO - GSI-optimized search completed in 0.41 seconds - - - - GSI-Optimized Search Results (completed in 0.4124 seconds): - -------------------------------------------------------------------------------- - [Result 1] Vector Distance: 0.2956 - Text: Manchester City boss Pep Guardiola has won 18 trophies since he arrived at the club in 2016 - - Manchester City boss Pep Guardiola says he is "fine" desp... 
- -------------------------------------------------------------------------------- - [Result 2] Vector Distance: 0.3100 - Text: Pep Guardiola has said Manchester City will be his final managerial job in club football before he "maybe" coaches a national team. - - The former Barcel... - -------------------------------------------------------------------------------- - [Result 3] Vector Distance: 0.3311 - Text: 'I am not good enough' - Guardiola faces daunting and major rebuild - - This video can not be played To play this video you need to enable JavaScript in ... - -------------------------------------------------------------------------------- - - -## Performance Analysis Summary - -Let's analyze the performance improvements we've achieved through different optimization strategies: - - - -```python -print("\n" + "="*80) -print("VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY") -print("="*80) - -print(f"\n📊 Performance Comparison:") -print(f"{'Optimization Level':<35} {'Time (seconds)':<20} {'Status'}") -print("-" * 80) -print(f"{'Phase 1 - Baseline (No Index)':<35} {baseline_time:.4f}{'':16} ⚪ Baseline") -print(f"{'Phase 2 - GSI-Optimized (BHIVE)':<35} {gsi_time:.4f}{'':16} ✅ Optimized") - -# Calculate improvement -if baseline_time > gsi_time: - speedup = baseline_time / gsi_time - improvement = ((baseline_time - gsi_time) / baseline_time) * 100 - print(f"\n✨ GSI Performance Gain: {speedup:.2f}x faster ({improvement:.1f}% improvement)") -elif gsi_time > baseline_time: - slowdown_pct = ((gsi_time - baseline_time) / baseline_time) * 100 - print(f"\n⚠️ Note: GSI was {slowdown_pct:.1f}% slower than baseline in this run") - print(f" This can happen with small datasets. GSI benefits emerge with scale.") -else: - print(f"\n⚖️ Performance: Comparable to baseline") - -print("\n" + "-"*80) -print("KEY INSIGHTS:") -print("-"*80) -print("1. 
🚀 GSI Optimization:") -print(" • BHIVE indexes excel with large-scale datasets (millions+ vectors)") -print(" • Performance gains increase with dataset size and concurrent queries") -print(" • Optimal for production workloads with sustained traffic patterns") - -print("\n2. 📦 Dataset Size Impact:") -print(f" • Current dataset: ~1,700 articles") -print(" • At this scale, performance differences may be minimal or variable") -print(" • Significant gains typically seen with 10M+ vectors") - -print("\n3. 🎯 When to Use GSI:") -print(" • Large-scale vector search applications") -print(" • High query-per-second (QPS) requirements") -print(" • Multi-user concurrent access scenarios") -print(" • Production environments requiring scalability") - -print("\n" + "="*80) - -``` - - - ================================================================================ - VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY - ================================================================================ - - 📊 Performance Comparison: - Optimization Level Time (seconds) Status - -------------------------------------------------------------------------------- - Phase 1 - Baseline (No Index) 1.3612 ⚪ Baseline - Phase 2 - GSI-Optimized (BHIVE) 0.4124 ✅ Optimized - - ✨ GSI Performance Gain: 3.30x faster (69.7% improvement) - - -------------------------------------------------------------------------------- - KEY INSIGHTS: - -------------------------------------------------------------------------------- - 1. 🚀 GSI Optimization: - • BHIVE indexes excel with large-scale datasets (millions+ vectors) - • Performance gains increase with dataset size and concurrent queries - • Optimal for production workloads with sustained traffic patterns - - 2. 📦 Dataset Size Impact: - • Current dataset: ~1,700 articles - • At this scale, performance differences may be minimal or variable - • Significant gains typically seen with 10M+ vectors - - 3. 
🎯 When to Use GSI: - • Large-scale vector search applications - • High query-per-second (QPS) requirements - • Multi-user concurrent access scenarios - • Production environments requiring scalability - - ================================================================================ - - -# PydanticAI: An Introduction -From [PydanticAI](https://ai.pydantic.dev/)'s website: - -> PydanticAI is a Python agent framework designed to make it less painful to build production grade applications with Generative AI. - -PydanticAI allows us to define agents and tools easily to create Gen-AI apps in an innovative and painless manner. Some of its features are: -- Built by the Pydantic Team: Built by the team behind Pydantic (the validation layer of the OpenAI SDK, the Anthropic SDK, LangChain, LlamaIndex, AutoGPT, Transformers, CrewAI, Instructor and many more). - -- Model-agnostic: Supports OpenAI, Anthropic, Gemini, Deepseek, Ollama, Groq, Cohere, and Mistral, and there is a simple interface to implement support for other models. - -- Type-safe: Designed to make type checking as powerful and informative as possible for you. - -- Python-centric Design: Leverages Python's familiar control flow and agent composition to build your AI-driven projects, making it easy to apply standard Python best practices you'd use in any other (non-AI) project. - -- Structured Responses: Harnesses the power of Pydantic to validate and structure model outputs, ensuring responses are consistent across runs. - -- Dependency Injection System: Offers an optional dependency injection system to provide data and services to your agent's system prompts, tools and result validators. This is useful for testing and eval-driven iterative development. - -- Streamed Responses: Provides the ability to stream LLM outputs continuously, with immediate validation, ensuring rapid and accurate results. 
-
-- Graph Support: Pydantic Graph provides a powerful way to define graphs using type hints, which is useful in complex applications where standard control flow can degrade into spaghetti code.
-
-# Building a RAG Agent using PydanticAI
-
-PydanticAI makes heavy use of dependency injection to provide data and services to your agent's system prompts and tools. We define dependencies using a `dataclass`, which serves as a container for our dependencies.
-
-In our case, the only dependency our agent needs is the `CouchbaseQueryVectorStore` instance. However, we still use a `dataclass`, as it is good practice: if we need more dependencies in the future, we can simply add more fields to the `Deps` dataclass.
-
-We also initialize an agent backed by the GPT-4o model. PydanticAI supports many other LLM providers, including Anthropic, Google, and Cohere, any of which could be used instead. When initializing the agent, we also pass the dependency type. This is used only for type checking, not at runtime.
-
-
-
-```python
-@dataclass
-class Deps:
-    vector_store: CouchbaseQueryVectorStore
-
-agent = Agent("openai:gpt-4o", deps_type=Deps)
-
-```
-
-# Defining the Vector Store as a Tool
-PydanticAI has the concept of `function tools`: functions the LLM can call to retrieve extra information that helps it form a better response.
-
-We can perform RAG by creating a tool that retrieves documents semantically similar to the query, and letting the agent call the tool when required. We add the function as a tool using the `@agent.tool` decorator.
-
-Notice that we also add the `context` parameter, which contains the dependencies passed to the tool (in this case, the only dependency is the vector store).
-
-
-
-```python
-@agent.tool
-async def retrieve(context: RunContext[Deps], search_query: str) -> str:
-    """Retrieve news data based on a search query.
- - Args: - context: The call context - search_query: The search query - """ - search_results = context.deps.vector_store.similarity_search_with_score(search_query, k=5) - return "\n\n".join( - f"# Documents:\n{doc.page_content}" - for doc, score in search_results - ) - -``` - -Finally, we create a function that allows us to define our dependencies and run our agent. - - - -```python -async def run_agent(question: str): - deps = Deps( - vector_store=vector_store, - ) - answer = await agent.run(question, deps=deps) - return answer - -``` - -# Running our Agent -We have now finished setting up our vector store and agent! The system is now ready to accept queries. - - - -```python -query = "What was manchester city manager pep guardiola's reaction to the team's current form?" -output = await run_agent(query) - -print("=" * 20, "Agent Output", "=" * 20) -print(output.output) - -``` - - ==================== Agent Output ==================== - Pep Guardiola, Manchester City's manager, has expressed concern over the team's current form, describing it as one of the worst runs of results in his managerial career. Guardiola has admitted that the downturn in form has personally affected him, leading to poor sleep and a lighter diet due to stress. Manchester City have won just once in their past ten games, with their Champions League prospects uncertain following a defeat to Juventus. - - Guardiola has been self-critical, stating, "I am not good enough," and emphasizing his responsibility to find solutions. He acknowledged the team's defensive issues and vulnerabilities, which have been exploited by opponents, leading to losses even in games where City initially led. Despite the challenges, Guardiola remains supported by the club and its fans, who continue to reassess their situation as they aim to return to form. 
-
-    Internally, Guardiola has been contemplating the reasons behind the team's struggles, which include injuries to key players and errors from usually reliable performers. He recognizes the need for a rebuilding phase and suggests that major changes, possibly including player departures, may be necessary to restore City's competitive edge.
-
-
-# Inspecting the Agent
-We can use the `all_messages()` method on the output object to observe how the agent and its tools work.
-
-In the cell below, we see a detailed list of all the model's messages and tool calls, which happen step by step:
-1. The `UserPromptPart`, which consists of the query the user sends to the agent.
-2. The agent calls the `retrieve` tool in the `ToolCallPart` message. This includes the `search_query` argument, which Couchbase uses to perform semantic search over all the ingested news articles.
-3. The `retrieve` tool returns a `ToolReturnPart` object with all the context required for the model to answer the user's query. The retrieved documents are truncated here because a large amount of context was retrieved.
-4. The final message is the LLM-generated response, produced with the added context and sent back to the user.
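Before dumping the full message log, the four-step flow above can be sketched in plain Python. The dictionaries below are simplified, hypothetical stand-ins for PydanticAI's message parts (`UserPromptPart`, `ToolCallPart`, `ToolReturnPart`, and the final text part) — not the real `pydantic_ai.messages` classes:

```python
# Simplified stand-ins for the four message parts described above.
# These dict shapes are illustrative only, not the real pydantic_ai classes.
messages = [
    {"part_kind": "user-prompt",
     "content": "What was Pep Guardiola's reaction to the team's current form?"},
    {"part_kind": "tool-call", "tool_name": "retrieve",
     "args": '{"search_query": "Guardiola reaction to current form"}'},
    {"part_kind": "tool-return", "tool_name": "retrieve",
     "content": "# Documents:\n<retrieved article text, truncated>"},
    {"part_kind": "text",
     "content": "Guardiola described it as the worst run of his career..."},
]

# Walk the conversation and label each step, mirroring what
# iterating over output.all_messages() reveals below.
for step, part in enumerate(messages, start=1):
    label = part["part_kind"]
    if label in ("tool-call", "tool-return"):
        label += f" ({part['tool_name']})"
    print(f"Step {step}: {label}")
```

The real message objects carry richer metadata (timestamps, tool call IDs, model names), as the inspection cell below shows.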
- - - -```python -from pprint import pprint - -for idx, message in enumerate(output.all_messages(), start=1): - print(f"Step {idx}:") - pprint(message.__repr__()) - print("=" * 50) - -``` - - Step 1: - ('ModelRequest(parts=[UserPromptPart(content="What was manchester city manager ' - 'pep guardiola\'s reaction to the team\'s current form?", ' - 'timestamp=datetime.datetime(2025, 11, 7, 8, 50, 28, 431572, ' - "tzinfo=datetime.timezone.utc), part_kind='user-prompt')], instructions=None, " - "kind='request')") - ================================================== - Step 2: - ("ModelResponse(parts=[ToolCallPart(tool_name='retrieve', " - 'args=\'{"search_query":"Manchester City Pep Guardiola reaction to current ' - 'form"}\', tool_call_id=\'call_0iM6cTSBayIc2ypx0HhJgNn0\', ' - "part_kind='tool-call')], model_name='gpt-4o-2024-08-06', " - 'timestamp=datetime.datetime(2025, 11, 7, 8, 50, 28, ' - "tzinfo=datetime.timezone.utc), kind='response')") - ================================================== - Step 3: - ("ModelRequest(parts=[ToolReturnPart(tool_name='retrieve', content='# " - 'Documents:\\nManchester City boss Pep Guardiola has won 18 trophies since he ' - 'arrived at the club in 2016\\n\\nManchester City boss Pep Guardiola says he ' - 'is "fine" despite admitting his sleep and diet are being affected by the ' - 'worst run of results in his entire managerial career. In an interview with ' - 'former Italy international Luca Toni for Amazon Prime Sport before ' - "Wednesday\\'s Champions League defeat by Juventus, Guardiola touched on the " - "personal impact City\\'s sudden downturn in form has had. Guardiola said his " - 'state of mind was "ugly", that his sleep was "worse" and he was eating ' - "lighter as his digestion had suffered. City go into Sunday\\'s derby against " - 'Manchester United at Etihad Stadium having won just one of their past 10 ' - 'games. 
The Juventus loss means there is a chance they may not even secure a ' - 'play-off spot in the Champions League. Asked to elaborate on his comments to ' - 'Toni, Guardiola said: "I\\\'m fine. "In our jobs we always want to do our ' - "best or the best as possible. When that doesn\\'t happen you are more " - 'uncomfortable than when the situation is going well, always that happened. ' - '"In good moments I am happier but when I get to the next game I am still ' - 'concerned about what I have to do. There is no human being that makes an ' - 'activity and it doesn\\\'t matter how they do." Guardiola said City have to ' - 'defend better and "avoid making mistakes at both ends". To emphasise his ' - "point, Guardiola referred back to the third game of City\\'s current run, " - 'against a Sporting side managed by Ruben Amorim, who will be in the United ' - 'dugout at the weekend. City dominated the first half in Lisbon, led thanks ' - "to Phil Foden\\'s early effort and looked to be cruising. Instead, they " - 'conceded three times in 11 minutes either side of half-time as Sporting ' - 'eventually ran out 4-1 winners. "I would like to play the game like we ' - 'played in Lisbon on Sunday, believe me," said Guardiola, who is facing the ' - 'prospect of only having three fit defenders for the derby as Nathan Ake and ' - 'Manuel Akanji try to overcome injury concerns. If there is solace for City, ' - 'it comes from the knowledge United are not exactly flying. Their comeback ' - 'Europa League victory against Viktoria Plzen on Thursday was their third win ' - "of Amorim\\'s short reign so far but only one of those successes has come in " - 'the Premier League, where United have lost their past two games against ' - 'Arsenal and Nottingham Forest. Nevertheless, Guardiola can see improvements ' - 'already on the red side of the city. "It\\\'s already there," he said. "You ' - 'see all the patterns, the movements, the runners and the pace. 
He will do a ' - 'good job at United, I\\\'m pretty sure of that."\\n\\nGuardiola says skipper ' - 'Kyle Walker has been offered support by the club after the City defender ' - 'highlighted the racial abuse he had received on social media in the wake of ' - 'the Juventus trip. "It\\\'s unacceptable," he said. "Not because it\\\'s ' - 'Kyle - for any human being. "Unfortunately it happens many times in the real ' - 'world. It is not necessary to say he has the support of the entire club. It ' - 'is completely unacceptable and we give our support to him."\\n\\n# ' - "Documents:\\n\\'I am not good enough\\' - Guardiola faces daunting and major " - 'rebuild\\n\\nThis video can not be played To play this video you need to ' - "enable JavaScript in your browser. \\'I am not good enough\\' - Guardiola " - "says he must find a \\'solution\\' after derby loss\\n\\nPep Guardiola says " - "his sleep has suffered during Manchester City\\'s deepening crisis, so he " - 'will not be helped by a nightmarish conclusion to one of the most stunning ' - 'defeats of his long reign. Guardiola looked agitated, animated and on edge ' - "even after City led the Manchester derby through Josko Gvardiol\\'s " - '36th-minute header, his reaction to the goal one of almost disdain that it ' - 'came via a deflected cross as opposed to in his purist style. He sat alone ' - 'with his eyes closed sipping from a water bottle before the resumption of ' - 'the second half, then was denied even the respite of victory when Manchester ' - 'United gave this largely dismal derby a dramatic conclusion it barely ' - 'deserved with a remarkable late comeback. First, with 88 minutes on the ' - 'clock, Matheus Nunes presented Amad Diallo with the ball before compounding ' - 'his error by flattening the forward as he made an attempt to recover his ' - 'mistake. Bruno Fernandes completed the formalities from the penalty spot. 
' - "Worse was to come two minutes later when Lisandro Martinez\\'s routine long " - "ball caught City\\'s defence inexplicably statuesque. Goalkeeper Ederson\\'s " - 'positioning was awry, allowing the lively Diallo to pounce from an acute ' - 'angle to leave Guardiola and his players stunned. It was the latest into any ' - 'game, 88 minutes, that reigning Premier League champions had led then lost. ' - 'It was also the first time City had lost a game they were leading so late ' - "on. And in a sign of City\\'s previous excellence that is now being " - 'challenged, they have only lost four of 105 Premier League home games under ' - 'Guardiola in which they have been ahead at half-time, winning 94 and drawing ' - 'seven. Guardiola delivered a brutal self-analysis as he told Match of the ' - 'Day: "I am not good enough. I am the boss. I am the manager. I have to find ' - 'solutions and so far I haven\\\'t. That\\\'s the reality. "Not much else to ' - 'say. No defence. Manchester United were incredibly persistent. We have not ' - 'lost eight games in two seasons. We can\\\'t defend that."\\n\\nManchester ' - 'City manager Pep Guardiola in despair during the derby defeat to Manchester ' - 'United\\n\\nGuardiola suggested the serious renewal will wait until the ' - 'summer but the red flags have been appearing for weeks in the sudden and ' - 'shocking decline of a team that has lost the aura of invincibility that left ' - 'many opponents beaten before kick-off in previous years. He has had stated ' - 'City must "survive" this season - whatever qualifies as survival for a club ' - 'of such rich ambition - but the quest for a record fifth successive Premier ' - 'League title is surely over as they lie nine points behind leaders Liverpool ' - 'having played a game more. Their Champions League aspirations are also in ' - "jeopardy after another loss, this time against Juventus in Turin. City\\'s " - 'squad has been allowed to grow too old together. 
The insatiable thirst for ' - 'success seems to have gone, the scales of superiority have fallen away and ' - 'opponents now sense vulnerability right until the final whistle, as United ' - 'did here. The manner in which United were able, and felt able, to snatch ' - 'this victory drove right to the heart of how City, and Guardiola, are ' - 'allowing opponents to prey on their downfall. Guardiola has every reason to ' - 'cite injuries, most significantly to Rodri and also John Stones as well as ' - 'others, but this cannot be used an excuse for such a dramatic decline in ' - 'standards, allied to the appearance of a soft underbelly that is so easily ' - "exploited. And City\\'s rebuild will not be a quick fix. With every " - 'performance, every defeat, the scale of what lies in front of Guardiola ' - "becomes more obvious - and daunting. Manchester City\\'s fans did their best " - 'to reassure Guardiola of their faith in him with a giant Barcelona-inspired ' - 'banner draped from the stands before kick-off emblazoned with his image ' - 'reading "Més que un entrenador" - "More Than A Coach". And Guardiola will ' - 'now need to be more than a coach than at any time in his career. He will ' - "have the finances but it will be done with City\\'s challengers also " - 'strengthening. Kevin de Bruyne, 34 in June, lasted 68 minutes here before he ' - 'was substituted. Age and injuries are catching up with one of the greatest ' - 'players of the Premier League era and he is unlikely to be at City next ' - 'season. Mateo Kovacic, who replaced De Bruyne, is also 31 in May. Kyle ' - 'Walker, 34, is being increasingly exposed. His most notable contribution ' - 'here was an embarrassing collapse to the ground after the mildest ' - 'head-to-head collision with Rasmus Hojlund. Ilkay Gundogan, another ' - "34-year-old and a previous pillar of Guardiola\\'s great successes, no " - 'longer has the legs or energy to exert influence. 
This looks increasingly ' - 'like a season too far following his return from Barcelona. Flaws are also ' - 'being exposed elsewhere, with previously reliable performers failing to hit ' - 'previous standards. Phil Foden scored 27 goals and had 12 assists when he ' - 'was Premier League Player of the Season last term. This year he has just ' - 'three goals and two assists in 18 appearances in all competitions. He has no ' - 'goals and just one assist in 11 Premier League games. Jack Grealish, who ' - 'came on after 77 minutes against United, has not scored in a year for ' - 'Manchester City, his last goal coming in a 2-2 draw against Crystal Palace ' - 'on 16 December last year. He has, in the meantime, scored twice for England. ' - 'Erling Haaland is also struggling as City lack creativity and cutting edge. ' - 'He has three goals in his past 11 Premier League games after scoring 10 in ' - "his first five. And in another indication of City\\'s impotence, and their " - "reliance on Haaland, defender Gvardiol\\'s goal against United was his " - 'fourth this season, making him their second highest scorer in all ' - 'competitions behind the Norwegian striker, who has 18. Goalkeeper Ederson, ' - 'so reliable for so long, has already been dropped once this season and did ' - "not cover himself in glory for United\\'s winner. Guardiola, with that " - 'freshly signed two-year contract, insists he "wants it" as he treads on this ' - 'alien territory of failure. He will be under no illusions about the size of ' - 'the job in front of him as he placed his head in his hands in anguish after ' - 'yet another damaging and deeply revealing defeat. 
City and Guardiola are in ' - 'new, unforgiving territory.\\n\\n# Documents:\\nPep Guardiola has said ' - 'Manchester City will be his final managerial job in club football before he ' - '"maybe" coaches a national team.\\n\\nThe former Barcelona and Bayern Munich ' - 'boss has won 15 major trophies since taking charge of City in 2016.\\n\\nThe ' - '53-year-old Spaniard was approached in the summer about the possibility of ' - 'becoming England manager, but last month signed a two-year contract ' - 'extension with City until 2027.\\n\\nSpeaking to celebrity chef Dani Garcia ' - 'on YouTube, Guardiola did not indicate when he intends to step down at City ' - 'but said he would not return to club football - in the Premier League or ' - 'overseas.\\n\\n"I\\\'m not going to manage another team," he ' - 'said.\\n\\n"I\\\'m not talking about the long-term future, but what I\\\'m ' - 'not going to do is leave Manchester City, go to another country, and do the ' - 'same thing as now.\\n\\n"I wouldn\\\'t have the energy. The thought of ' - 'starting somewhere else, all the process of training and so on. No, no, no. ' - 'Maybe a national team, but that\\\'s different.\\n\\n"I want to leave it and ' - "go and play golf, but I can\\'t [if he takes a club job]. 
I think stopping " - 'would do me good."\\n\\nCity have won just once since Guardiola extended his ' - 'contract - and once in nine games since beating Southampton on 26 ' - 'October.\\n\\nThat victory came at home to Nottingham Forest last Wednesday, ' - 'but was followed by a 2-2 draw at Crystal Palace at the weekend.\\n\\nThe ' - 'Blues visit Juventus next in the Champions League on Wednesday (20:00 GMT), ' - 'before hosting Manchester United in the Premier League on Sunday ' - '(16:30).\\n\\n"Right now we are not in the position - when we have had the ' - 'results of the last seven, eight games - to talk about winning games in ' - 'plural," said Guardiola at his pre-match news conference.\\n\\n"We have to ' - 'win the game and not look at what happens in the next one yet."\\n\\n# ' - "Documents:\\n\\'We have to find a way\\' - Guardiola vows to end relegation " - 'form\\n\\nThis video can not be played To play this video you need to enable ' - "JavaScript in your browser. \\'Worrying\\' and \\'staggering\\' - Why do " - 'Manchester City keep conceding?\\n\\nManchester City are currently in ' - "relegation form and there is little sign of it ending. Saturday\\'s 2-1 " - 'defeat at Aston Villa left them joint bottom of the form table over the past ' - 'eight games with just Southampton for company. Saints, at the foot of the ' - 'Premier League, have the same number of points, four, as City over their ' - 'past eight matches having won one, drawn one and lost six - the same record ' - 'as the floundering champions. And if Southampton - who appointed Ivan Juric ' - 'as their new manager on Saturday - get at least a point at Fulham on Sunday, ' - 'City will be on the worst run in the division. Even Wolves, who sacked boss ' - "Gary O\\'Neil last Sunday and replaced him with Vitor Pereira, have earned " - 'double the number of points during the same period having played a game ' - 'fewer. 
They are damning statistics for Pep Guardiola, even if he does have ' - 'some mitigating circumstances with injuries to Ederson, Nathan Ake and Ruben ' - 'Dias - who all missed the loss at Villa Park - and the long-term loss of ' - "midfield powerhouse Rodri. Guardiola was happy with Saturday\\'s " - 'performance, despite defeat in Birmingham, but there is little solace to ' - 'take at slipping further out of the title race. He may have needed to field ' - 'a half-fit Manuel Akanji and John Stones at Villa Park but that does not ' - 'account for City looking a shadow of their former selves. That does not ' - 'justify the error Josko Gvardiol made to gift Jhon Duran a golden chance ' - 'inside the first 20 seconds, or £100m man Jack Grealish again failing to ' - "have an impact on a game. There may be legitimate reasons for City\\'s drop " - 'off, whether that be injuries, mental fatigue or just simply a team coming ' - 'to the end of its lifecycle, but their form, which has plunged off a cliff ' - 'edge, would have been unthinkable as they strolled to a fourth straight ' - 'title last season. "The worrying thing is the number of goals conceded," ' - 'said ex-England captain Alan Shearer on BBC Match of the Day. "The number of ' - 'times they were opened up because of the lack of protection and legs in ' - 'midfield was staggering. There are so many things that are wrong at this ' - 'moment in time."\\n\\nThis video can not be played To play this video you ' - "need to enable JavaScript in your browser. Man City \\'have to find a way\\' " - 'to return to form - Guardiola\\n\\nAfterwards Guardiola was calm, so much so ' - 'it was difficult to hear him in the news conference, a contrast to the ' - 'frustrated figure he cut on the touchline. He said: "It depends on us. The ' - 'solution is bring the players back. We have just one central defender fit, ' - 'that is difficult. 
We are going to try next game - another opportunity and ' - 'we don\\\'t think much further than that. "Of course there are more reasons. ' - "We concede the goals we don\\'t concede in the past, we [don\\'t] score the " - 'goals we score in the past. Football is not just one reason. There are a lot ' - 'of little factors. "Last season we won the Premier League, but we came here ' - 'and lost. We have to think positive and I have incredible trust in the guys. ' - 'Some of them have incredible pride and desire to do it. We have to find a ' - 'way, step by step, sooner or later to find a way back." Villa boss Unai ' - "Emery highlighted City\\'s frailties, saying he felt Villa could seize on " - 'the visitors\\\' lack of belief. "Manchester City are a little bit under the ' - 'confidence they have normally," he said. "The second half was different, we ' - 'dominated and we scored. Through those circumstances they were feeling worse ' - 'than even in the first half."\\n\\nErling Haaland had one touch in the Villa ' - 'box\\n\\nThere are chinks in the armour never seen before at City under ' - 'Guardiola and Erling Haaland conceded belief within the squad is low. He ' - 'told TNT after the game: "Of course, [confidence levels are] not the best. ' - 'We know how important confidence is and you can see that it affects every ' - 'human being. That is how it is, we have to continue and stay positive even ' - 'though it is difficult." Haaland, with 76 goals in 83 Premier League ' - 'appearances since joining City from Borussia Dortmund in 2022, had one shot ' - 'and one touch in the Villa box. His 18 touches in the whole game were the ' - 'lowest of all starting players and he has been self critical, despite ' - "scoring 13 goals in the top flight this season. Over City\\'s last eight " - 'games he has netted just twice though, but Guardiola refused to criticise ' - 'his star striker. He said: "Without him we will be even worse but I like the ' - "players feeling that way. 
I don\\'t agree with Erling. He needs to have the " - 'balls delivered in the right spots but he will fight for the next ' - 'one."\\n\\n# Documents:\\n\\\'Self-doubt, errors & big changes\\\' - inside ' - 'the crisis at Man City\\n\\nPep Guardiola has not been through a moment like ' - 'this in his managerial career. Manchester City have lost nine matches in ' - 'their past 12 - as many defeats as they had suffered in their previous 106 ' - 'fixtures. At the end of October, City were still unbeaten at the top of the ' - 'Premier League and favourites to win a fifth successive title. Now they are ' - 'seventh, 12 points behind leaders Liverpool having played a game more. It ' - 'has been an incredible fall from grace and left people trying to work out ' - 'what has happened - and whether Guardiola can make it right. After ' - 'discussing the situation with those who know him best, I have taken a closer ' - 'look at the future - both short and long term - and how the current crisis ' - "at Man City is going to be solved.\\n\\nPep Guardiola\\'s Man City have lost " - 'nine of their past 12 matches\\n\\nGuardiola has also been giving it a lot ' - 'of thought. He has not been sleeping very well, as he has said, and has not ' - 'been himself at times when talking to the media. He has been talking to a ' - 'lot of people about what is going on as he tries to work out the reasons for ' - "City\\'s demise. Some reasons he knows, others he still doesn\\'t. What " - 'people perhaps do not realise is Guardiola hugely doubts himself and always ' - 'has. He will be thinking "I\\\'m not going to be able to get us out of this" ' - 'and needs the support of people close to him to push away those insecurities ' - '- and he has that. He is protected by his people who are very aware, like he ' - 'is, that there are a lot of people that want City to fail. It has been a ' - 'turbulent time for Guardiola. 
Remember those marks he had on his head after ' - 'the 3-3 draw with Feyenoord in the Champions League? He always scratches his ' - 'head, it is a gesture of nervousness. Normally nothing happens but on that ' - 'day one of his nails was far too sharp so, after talking to the players in ' - 'the changing room where he scratched his head because of his usual agitated ' - 'gesturing, he went to the news conference. His right-hand man Manel Estiarte ' - 'sent him photos in a message saying "what have you got on your head?", but ' - 'by the time Guardiola returned to the coaching room there was hardly ' - 'anything there again. He started that day with a cover on his nose after the ' - 'same thing happened at the training ground the day before. Guardiola was ' - 'having a footballing debate with Kyle Walker about positional stuff and ' - 'marked his nose with that same nail. There was also that remarkable news ' - 'conference after the Manchester derby when he said "I don\\\'t know what to ' - 'do". That is partly true and partly not true. Ignore the fact Guardiola ' - 'suggested he was "not good enough". He actually meant he was not good enough ' - 'to resolve the situation with the group of players he has available and with ' - 'all the other current difficulties. There are obviously logical explanations ' - 'for the crisis and the first one has been talked about many times - the ' - 'absence of injured midfielder Rodri. You know the game Jenga? When you take ' - 'the wrong piece out, the whole tower collapses. That is what has happened ' - 'here. It is normal for teams to have an over-reliance on one player if he is ' - 'the best in the world in his position. And you cannot calculate the ' - 'consequences of an injury that rules someone like Rodri out for the season. ' - 'City are a team, like many modern ones, in which the holding midfielder is a ' - 'key element to the construction. So, when you take Rodri out, it is ' - 'difficult to hold it together. 
There were Plan Bs - John Stones, Manuel ' - 'Akanji, even Nathan Ake - but injuries struck. The big injury list has been ' - 'out of the ordinary and the busy calendar has also played a part in ' - 'compounding the issues. However, one factor even Guardiola cannot explain is ' - 'the big uncharacteristic errors in almost every game from international ' - 'players. Why did Matheus Nunes make that challenge to give away the penalty ' - 'against Manchester United? Jack Grealish is sent on at the end to keep the ' - 'ball and cannot do that. There are errors from Walker and other defenders. ' - "These are some of the best players in the world. Of course the players\\' " - 'mindset is important, and confidence is diminishing. Wrong decisions get ' - 'taken so there is almost panic on the pitch instead of calm. There are also ' - 'players badly out of form who are having to play because of injuries. Walker ' - "is now unable to hide behind his pace, I\\'m not sure Kevin de Bruyne is " - 'ever getting back to the level he used to be at, Bernardo Silva and Ilkay ' - 'Gundogan do not have time to rest, Grealish is not playing at his best. Some ' - 'of these players were only meant to be playing one game a week but, because ' - 'of injuries, have played 12 games in 40 days. It all has a domino effect. ' - "One consequence is that Erling Haaland isn\\'t getting the service to score. " - "But the Norwegian still remains City\\'s top-scorer with 13. Defender Josko " - 'Gvardiol is next on the list with just four. The way their form has been ' - 'analysed inside the City camp is there have only been three games where they ' - 'deserved to lose (Liverpool, Bournemouth and Aston Villa). But of course it ' - 'is time to change the dynamic.\\n\\nGuardiola has never protected his ' - 'players so much. He has not criticised them and is not going to do so. They ' - 'have won everything with him. Instead of doing more with them, he has tried ' - 'doing less. 
He has sometimes given them more days off to clear their heads, ' - 'so they can reset - two days this week for instance. Perhaps the time to ' - 'change a team is when you are winning, but no-one was suggesting Man City ' - 'were about to collapse when they were top and unbeaten after nine league ' - 'games. Some people have asked how bad it has to get before City make a ' - 'decision on Guardiola. The answer is that there is no decision to be made. ' - 'Maybe if this was Real Madrid, Barcelona or Juventus, the pressure from ' - 'outside would be massive and the argument would be made that Guardiola has ' - 'to go. At City he has won the lot, so how can anyone say he is failing? Yes, ' - "this is a crisis. But given all their problems, City\\'s renewed target is " - 'finishing in the top four. That is what is in all their heads now. The idea ' - 'is to recover their essence by improving defensive concepts that are not ' - 'there and re-establishing the intensity they are known for. Guardiola is ' - 'planning to use the next two years of his contract, which is expected to be ' - 'his last as a club manager, to prepare a new Manchester City. When he was at ' - 'the end of his four years at Barcelona, he asked two managers what to do ' - 'when you feel people are not responding to your instructions. Do you go or ' - 'do the players go? Sir Alex Ferguson and Rafael Benitez both told him that ' - 'the players need to go. Guardiola did not listen because of his emotional ' - 'attachment to his players back then and he decided to leave the Camp Nou ' - 'because he felt the cycle was over. He will still protect his players now ' - 'but there is not the same emotional attachment - so it is the players who ' - 'are going to leave this time. It is likely City will look to replace five or ' - 'six regular starters. Guardiola knows it is the end of an era and the start ' - 'of a new one. 
Changes will not be immediate and the majority of the work ' - 'will be done in the summer. But they are open to any opportunities in ' - 'January - and a holding midfielder is one thing they need. In the summer ' - "City might want to get Spain\\'s Martin Zubimendi from Real Sociedad and " - 'they know 60m euros (£50m) will get him. He said no to Liverpool last summer ' - 'even though everything was agreed, but he now wants to move on and the ' - 'Premier League is the target. Even if they do not get Zubimendi, that is the ' - 'calibre of footballer they are after. A new Manchester City is on its way - ' - 'with changes driven by Guardiola, incoming sporting director Hugo Viana and ' - "the football department.', tool_call_id='call_0iM6cTSBayIc2ypx0HhJgNn0', " - 'timestamp=datetime.datetime(2025, 11, 7, 8, 50, 30, 639657, ' - "tzinfo=datetime.timezone.utc), part_kind='tool-return')], instructions=None, " - "kind='request')") - ================================================== - Step 4: - ("ModelResponse(parts=[TextPart(content='Pep Guardiola, Manchester City\\'s " - "manager, has expressed concern over the team\\'s current form, describing it " - 'as one of the worst runs of results in his managerial career. Guardiola has ' - 'admitted that the downturn in form has personally affected him, leading to ' - 'poor sleep and a lighter diet due to stress. Manchester City have won just ' - 'once in their past ten games, with their Champions League prospects ' - 'uncertain following a defeat to Juventus. \\n\\nGuardiola has been ' - 'self-critical, stating, "I am not good enough," and emphasizing his ' - "responsibility to find solutions. He acknowledged the team\\'s defensive " - 'issues and vulnerabilities, which have been exploited by opponents, leading ' - 'to losses even in games where City initially led. 
Despite the challenges, ' - 'Guardiola remains supported by the club and its fans, who continue to ' - 'reassess their situation as they aim to return to form.\\n\\nInternally, ' - "Guardiola has been contemplating the reasons behind the team\\'s struggles, " - 'which include injuries to key players and errors from usually reliable ' - 'performers. He recognizes the need for a rebuilding phase and suggests that ' - 'major changes, possibly including player departures, may be necessary to ' - "restore City\\'s competitive edge.', part_kind='text')], " - "model_name='gpt-4o-2024-08-06', timestamp=datetime.datetime(2025, 11, 7, 8, " - "50, 31, tzinfo=datetime.timezone.utc), kind='response')") - ================================================== - - -## Conclusion -By following these steps, you'll have a fully functional semantic search engine that leverages the strengths of Couchbase and PydanticAI. This guide is designed not just to show you how to build the system, but also to explain why each step is necessary, giving you a deeper understanding of the principles behind semantic search and how it improves querying data more efficiently using GSI which can significantly improve your RAG performance. Whether you're a newcomer to software development or an experienced developer looking to expand your skills, this guide will provide you with the knowledge and tools you need to create a powerful, AI-driven search engine using PydanticAI's agent-based approach. 
diff --git a/tutorial/markdown/generated/vector-search-cookbook/smolagents-fts-RAG_with_Couchbase_and_SmolAgents.md b/tutorial/markdown/generated/vector-search-cookbook/smolagents-fts-RAG_with_Couchbase_and_SmolAgents.md deleted file mode 100644 index de998df..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/smolagents-fts-RAG_with_Couchbase_and_SmolAgents.md +++ /dev/null @@ -1,652 +0,0 @@ ---- -# frontmatter -path: "/tutorial-smolagents-couchbase-rag-with-fts" -title: Retrieval-Augmented Generation (RAG) with Couchbase and smolagents -short_title: RAG with Couchbase and smolagents -description: - - Learn how to build a semantic search engine using Couchbase and Hugging Face smolagents. - - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with smolagents using tool calling. - - You'll understand how to perform Retrieval-Augmented Generation (RAG) using smolagents and Couchbase. -content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - FTS - - Artificial Intelligence - - LangChain - - OpenAI - - smolagents -sdk_language: - - python -length: 30 Mins ---- - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/smolagents/fts/RAG_with_Couchbase_and_SmolAgents.ipynb) - -# Introduction -In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database, [OpenAI](https://openai.com) as the embedding and LLM provider, and [Hugging Face smolagents](https://huggingface.co/docs/smolagents/en/index) as an agent framework. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval.
This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system from scratch. Alternatively, if you want to perform semantic search using a GSI index, please take a look at [this tutorial](https://developer.couchbase.com/tutorial-smolagents-couchbase-rag-with-global-secondary-index). - -# How to run this tutorial - -This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. - -You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment. - -# Before you start -## Get Credentials for OpenAI -Please follow the [instructions](https://platform.openai.com/docs/quickstart) to generate the OpenAI credentials. -## Create and Deploy Your Free Tier Operational Cluster on Capella - -To get started with Couchbase Capella, create an account and use it to deploy a forever-free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint. - -To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html). - -### Couchbase Capella Configuration - -When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met. - -* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the required bucket (Read and Write) used in the application. -* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running. - -# Setting the Stage: Installing Necessary Libraries -To build our semantic search engine, we need a robust set of tools.
The libraries we install handle everything from connecting to databases to performing complex machine learning tasks. Each library has a specific role: Couchbase libraries manage database operations, LangChain handles AI model integrations, and OpenAI provides advanced AI models for generating embeddings and understanding natural language. By setting up these libraries, we ensure our environment is equipped to handle the data-intensive and computationally complex tasks required for semantic search. - - -```python -%pip install --quiet -U datasets==3.5.0 langchain-couchbase==0.3.0 langchain-openai==0.3.13 python-dotenv==1.1.0 smolagents==1.13.0 ipywidgets==8.1.6 -``` - -# Importing Necessary Libraries -The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. These libraries provide essential functions for working with data, managing database connections, and processing machine learning models. - - -```python -import getpass -import json -import logging -import os -import time -from datetime import timedelta - -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.exceptions import (InternalServerFailureException, - ServiceUnavailableException, - QueryIndexAlreadyExistsException) -from couchbase.management.buckets import CreateBucketSettings -from couchbase.management.search import SearchIndex -from couchbase.options import ClusterOptions -from datasets import load_dataset -from dotenv import load_dotenv -from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore -from langchain_openai import OpenAIEmbeddings - -from smolagents import Tool, OpenAIServerModel, ToolCallingAgent -``` - -# Setup Logging -Logging is configured to track the progress of the script and capture any errors or warnings. This is crucial for debugging and understanding the flow of execution. 
The logging output includes timestamps, log levels (e.g., INFO, ERROR), and messages that describe what is happening in the script. - - - -```python -logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True) -``` - -# Loading Sensitive Information -In this section, we prompt the user to input essential configuration settings needed. These settings include sensitive information like API keys, database credentials, and specific configuration names. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security. - -The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code. - - -```python -load_dotenv() - -OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or getpass.getpass('Enter your OpenAI API Key: ') - -CB_HOST = os.getenv('CB_HOST') or input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost' -CB_USERNAME = os.getenv('CB_USERNAME') or input('Enter your Couchbase username (default: Administrator): ') or 'Administrator' -CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass('Enter your Couchbase password (default: password): ') or 'password' -CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input('Enter your Couchbase bucket name (default: vector-search-testing): ') or 'vector-search-testing' -INDEX_NAME = os.getenv('INDEX_NAME') or input('Enter your index name (default: vector_search_smolagents): ') or 'vector_search_smolagents' -SCOPE_NAME = os.getenv('SCOPE_NAME') or input('Enter your scope name (default: shared): ') or 'shared' -COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input('Enter your collection name (default: smolagents): ') or 'smolagents' - -# 
Check if the variables are correctly loaded -if not OPENAI_API_KEY: - raise ValueError("Missing OpenAI API Key") - -if 'OPENAI_API_KEY' not in os.environ: - os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY -``` - -# Connecting to the Couchbase Cluster -Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount. - - - - -```python -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to Couchbase") -except Exception as e: - raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}") -``` - - 2025-02-28 10:30:17,515 - INFO - Successfully connected to Couchbase - - -# Setting Up Collections in Couchbase -The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase: - -1. Bucket Creation: - - Checks if specified bucket exists, creates it if not - - Sets bucket properties like RAM quota (1024MB) and replication (disabled) - - Note: You will not be able to create a bucket on Capella -2. Scope Management: - - Verifies if requested scope exists within bucket - - Creates new scope if needed (unless it's the default "_default" scope) -3. 
Collection Setup: - - Checks for collection existence within scope - - Creates collection if it doesn't exist - - Waits 2 seconds for collection to be ready - -Additional Tasks: - -- Creates primary index on collection for query performance -- Clears any existing documents for clean state -- Implements comprehensive error handling and logging - -The function is called twice to set up: - -1. Main collection for vector embeddings -2. Cache collection for storing results - - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' exists.") - except Exception as e: - logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...") - bucket_settings = CreateBucketSettings( - name=bucket_name, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - time.sleep(2) # Wait for bucket creation to complete and become available - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' created successfully.") - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == scope_name for scope in scopes) - - if not scope_exists and scope_name != "_default": - logging.info(f"Scope '{scope_name}' does not exist. Creating it...") - bucket_manager.create_scope(scope_name) - logging.info(f"Scope '{scope_name}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{collection_name}' does not exist. 
Creating it...") - bucket_manager.create_collection(scope_name, collection_name) - logging.info(f"Collection '{collection_name}' created successfully.") - else: - logging.info(f"Collection '{collection_name}' already exists. Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Ensure primary index exists - try: - cluster.query(f"CREATE PRIMARY INDEX IF NOT EXISTS ON `{bucket_name}`.`{scope_name}`.`{collection_name}`").execute() - logging.info("Primary index present or created successfully.") - except Exception as e: - logging.warning(f"Error creating primary index: {str(e)}") - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) -``` - - 2025-02-28 10:30:20,855 - INFO - Bucket 'vector-search-testing' exists. - 2025-02-28 10:30:21,350 - INFO - Collection 'smolagents' does not exist. Creating it... - 2025-02-28 10:30:21,619 - INFO - Collection 'smolagents' created successfully. - 2025-02-28 10:30:26,886 - INFO - Primary index present or created successfully. - 2025-02-28 10:30:26,938 - INFO - All documents cleared from the collection. - - - - - - - - - -# Loading Couchbase Vector Search Index - -Semantic search requires an efficient way to retrieve relevant documents based on a user's query. This is where the Couchbase **Vector Search Index** comes into play. 
In this step, we load the Vector Search Index definition from a JSON file, which specifies how the index should be structured. This includes the fields to be indexed, the dimensions of the vectors, and other parameters that determine how the search engine processes queries based on vector similarity. - -This vector search index configuration requires specific default settings to function properly. This tutorial uses the bucket named `vector-search-testing` with the scope `shared` and collection `smolagents`. The configuration is set up for vectors with exactly `1536 dimensions`, using dot product similarity and optimized for recall. If you want to use a different bucket, scope, or collection, you will need to modify the index configuration accordingly. - -For more information on creating a vector search index, please follow the [instructions](https://docs.couchbase.com/cloud/vector-search/create-vector-search-index-ui.html). - - - -```python -# If you are running this script locally (not in Google Colab), uncomment the following line -# and provide the path to your index definition file. 
- -# index_definition_path = '/path_to_your_index_file/smolagents_index.json' # Local setup: specify your file path here - -# # Version for Google Colab -# def load_index_definition_colab(): -# from google.colab import files -# print("Upload your index definition file") -# uploaded = files.upload() -# index_definition_path = list(uploaded.keys())[0] - -# try: -# with open(index_definition_path, 'r') as file: -# index_definition = json.load(file) -# return index_definition -# except Exception as e: -# raise ValueError(f"Error loading index definition from {index_definition_path}: {str(e)}") - -# Version for Local Environment -def load_index_definition_local(index_definition_path): - try: - with open(index_definition_path, 'r') as file: - index_definition = json.load(file) - return index_definition - except Exception as e: - raise ValueError(f"Error loading index definition from {index_definition_path}: {str(e)}") - -# Usage -# Uncomment the appropriate line based on your environment -# index_definition = load_index_definition_colab() -index_definition = load_index_definition_local('smolagents_index.json') -``` - -# Creating or Updating Search Indexes - -With the index definition loaded, the next step is to create or update the **Vector Search Index** in Couchbase. This step is crucial because it optimizes our database for vector similarity search operations, allowing us to perform searches based on the semantic content of documents rather than just keywords. By creating or updating a Vector Search Index, we enable our search engine to handle complex queries that involve finding semantically similar documents using vector embeddings, which is essential for a robust semantic search engine. 
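If you do not yet have a `smolagents_index.json` file, the sketch below generates a plausible definition matching the settings described earlier (a 1536-dimension vector field with dot-product similarity, optimized for recall, on the `shared.smolagents` collection of the `vector-search-testing` bucket). This is an illustrative sketch, not the canonical file: the exact JSON shape is best exported from the Capella UI, and the field names `embedding` and `text` are assumptions based on the LangChain vector store's defaults, so verify them against your own setup.

```python
import json

# Sketch of a scoped Couchbase Search index definition with a vector field.
# Assumptions: field names "embedding"/"text" (LangChain defaults) and the
# bucket/scope/collection names used throughout this tutorial.
index_definition = {
    "name": "vector_search_smolagents",
    "type": "fulltext-index",
    "sourceType": "gocb",
    "sourceName": "vector-search-testing",
    "params": {
        "doc_config": {"mode": "scope.collection.type_field", "type_field": "type"},
        "mapping": {
            "default_mapping": {"enabled": False, "dynamic": True},
            "types": {
                "shared.smolagents": {  # scope.collection this index covers
                    "enabled": True,
                    "dynamic": True,
                    "properties": {
                        "embedding": {
                            "enabled": True,
                            "fields": [{
                                "name": "embedding",
                                "type": "vector",
                                "dims": 1536,                # must match the embedding model
                                "similarity": "dot_product",
                                "index": True,
                                "vector_index_optimized_for": "recall",
                            }],
                        },
                        "text": {
                            "enabled": True,
                            "fields": [{"name": "text", "type": "text",
                                        "index": True, "store": True}],
                        },
                    },
                }
            },
        },
        "store": {"indexType": "scorch"},
    },
}

# Write it out so the loading cell above can pick it up.
with open("smolagents_index.json", "w") as f:
    json.dump(index_definition, f, indent=2)
```

If you created the index through the Capella UI instead, export its definition and use that file; the server-generated JSON includes additional defaults that this sketch omits.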
- - -```python -try: - scope_index_manager = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME).search_indexes() - - # Check if index already exists - existing_indexes = scope_index_manager.get_all_indexes() - index_name = index_definition["name"] - - if index_name in [index.name for index in existing_indexes]: - logging.info(f"Index '{index_name}' found") - else: - logging.info(f"Creating new index '{index_name}'...") - - # Create SearchIndex object from JSON definition - search_index = SearchIndex.from_json(index_definition) - - # Upsert the index (create if not exists, update if exists) - scope_index_manager.upsert_index(search_index) - logging.info(f"Index '{index_name}' successfully created/updated.") - -except QueryIndexAlreadyExistsException: - logging.info(f"Index '{index_name}' already exists. Skipping creation/update.") -except ServiceUnavailableException: - raise RuntimeError("Search service is not available. Please ensure the Search service is enabled in your Couchbase cluster.") -except InternalServerFailureException as e: - logging.error(f"Internal server error: {str(e)}") - raise -``` - - 2025-02-28 10:30:32,890 - INFO - Creating new index 'vector-search-testing.shared.vector_search_smolagents'... - 2025-02-28 10:30:33,058 - INFO - Index 'vector-search-testing.shared.vector_search_smolagents' successfully created/updated. - - -# Creating OpenAI Embeddings -Embeddings are at the heart of semantic search. They are numerical representations of text that capture the semantic meaning of the words and phrases. Unlike traditional keyword-based search, which looks for exact matches, embeddings allow our search engine to understand the context and nuances of language, enabling it to retrieve documents that are semantically similar to the query, even if they don't contain the exact keywords. 
By creating embeddings using OpenAI, we equip our search engine with the ability to understand and process natural language in a way that's much closer to how humans understand language. This step transforms our raw text data into a format that the search engine can use to find and rank relevant documents. - - -```python -try: - embeddings = OpenAIEmbeddings( - model="text-embedding-3-small", - api_key=OPENAI_API_KEY, - ) - logging.info("Successfully created OpenAIEmbeddings") -except Exception as e: - raise ValueError(f"Error creating OpenAIEmbeddings: {str(e)}") -``` - - 2025-02-28 10:30:36,983 - INFO - Successfully created OpenAIEmbeddings - - -# Setting Up the Couchbase Vector Store -A vector store is where we'll keep our embeddings. Unlike the FTS index, which is used for text-based search, the vector store is specifically designed to handle embeddings and perform similarity searches. When a user inputs a query, the search engine converts the query into an embedding and compares it against the embeddings stored in the vector store. This allows the engine to find documents that are semantically similar to the query, even if they don't contain the exact same words. By setting up the vector store in Couchbase, we create a powerful tool that enables our search engine to understand and retrieve information based on the meaning and context of the query, rather than just the specific words used. - - -```python -try: - vector_store = CouchbaseSearchVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding=embeddings, - index_name=INDEX_NAME, - ) - logging.info("Successfully created vector store") -except Exception as e: - raise ValueError(f"Failed to create vector store: {str(e)}") - -``` - - 2025-02-28 10:30:40,503 - INFO - Successfully created vector store - - -# Load the BBC News Dataset -To build a search engine, we need data to search through. 
We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively. - -The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version. - - -```python -try: - news_dataset = load_dataset( - "RealTimeData/bbc_news_alltime", "2024-12", split="train" - ) - print(f"Loaded the BBC News dataset with {len(news_dataset)} rows") - logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.") -except Exception as e: - raise ValueError(f"Error loading the BBC News dataset: {str(e)}") -``` - - 2025-02-28 10:30:51,981 - INFO - Successfully loaded the BBC News dataset with 2687 rows. - - - Loaded the BBC News dataset with 2687 rows - - -## Cleaning up the Data -We will use the content of the news articles for our RAG system. - -The dataset contains a few duplicate records. We are removing them to avoid duplicate results in the retrieval stage of our RAG system. - - -```python -news_articles = news_dataset["content"] -unique_articles = set() -for article in news_articles: - if article: - unique_articles.add(article) -unique_news_articles = list(unique_articles) -print(f"We have {len(unique_news_articles)} unique articles in our database.") -``` - - We have 1749 unique articles in our database. 
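One caveat worth knowing: building the unique list from a Python `set` does not preserve the order in which articles appear in the dataset, so the insertion order can differ between runs. If you prefer deterministic, first-seen ordering, `dict.fromkeys` deduplicates equally well; a small sketch:

```python
def dedupe_preserving_order(articles):
    # dict.fromkeys keeps only the first occurrence of each article
    # while preserving the order in which articles first appear;
    # the generator also drops empty/None entries, as above.
    return list(dict.fromkeys(a for a in articles if a))

sample = ["story A", None, "story B", "story A", "story C", "story B"]
print(dedupe_preserving_order(sample))  # ['story A', 'story B', 'story C']
```

Either approach yields the same set of unique articles; only the ordering guarantee differs.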
- - -## Saving Data to the Vector Store -To efficiently handle the large number of articles, we process them in batches of 100 articles at a time. This batch processing approach helps manage memory usage and provides better control over the ingestion process. - -We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's `add_texts` method, we add the filtered articles to our vector database. The `batch_size` parameter controls how many articles are processed in each iteration. - -This approach offers several benefits: - -1. Memory Efficiency: Processing in smaller batches prevents memory overload -2. Error Handling: If an error occurs, only the current batch is affected -3. Progress Tracking: Easier to monitor and track the ingestion progress -4. Resource Management: Better control over CPU and network resource utilization - -We use a conservative batch size of 100 to ensure reliable operation. The optimal batch size depends on many factors, including: - -- Document sizes being inserted -- Available system resources -- Network conditions -- Concurrent workload - -Consider measuring performance with your specific workload before adjusting.
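Note that the error-handling benefit above assumes failures are isolated per batch, while the single `add_texts` call used in this tutorial raises on the first failure. If you need that isolation, you can drive the batching loop yourself. A minimal sketch (the helper name and skip-and-continue behaviour are ours, not part of the original notebook):

```python
import logging

def ingest_in_batches(vector_store, texts, batch_size=100):
    """Insert texts batch by batch; a failing batch is logged and skipped
    instead of aborting the whole ingestion. Returns the start offsets of
    the batches that failed so they can be retried later."""
    failed_batches = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        try:
            vector_store.add_texts(texts=batch)
        except Exception as e:
            logging.warning(f"Batch starting at {start} failed: {e}")
            failed_batches.append(start)
    return failed_batches
```

With this loop, a transient network error only costs one batch, and the returned offsets make a retry pass straightforward.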
- - -```python -# Save the current logging level -current_logging_level = logging.getLogger().getEffectiveLevel() - -# Set logging level to CRITICAL to suppress lower level logs -logging.getLogger().setLevel(logging.CRITICAL) - -articles = [article for article in unique_news_articles if article and len(article) <= 50000] - -try: -    vector_store.add_texts( -        texts=articles, -        batch_size=100 -    ) -except Exception as e: -    raise ValueError(f"Failed to save documents to vector store: {str(e)}") - -# Restore the original logging level -logging.getLogger().setLevel(current_logging_level) -``` - -# smolagents: An Introduction -[smolagents](https://huggingface.co/docs/smolagents/en/index) is an agentic framework by Hugging Face for easy creation of agents in a few lines of code. - -Some of the features of smolagents are: - -- ✨ Simplicity: the logic for agents fits in ~1,000 lines of code (see agents.py). We kept abstractions to their minimal shape above raw code! - -- 🧑‍💻 First-class support for Code Agents. Our CodeAgent writes its actions in code (as opposed to "agents being used to write code"). To make it secure, we support executing in sandboxed environments via E2B. - -- 🤗 Hub integrations: you can share/pull tools to/from the Hub, and more is to come! - -- 🌐 Model-agnostic: smolagents supports any LLM. It can be a local transformers or ollama model, one of many providers on the Hub, or any model from OpenAI, Anthropic and many others via our LiteLLM integration. - -- 👁️ Modality-agnostic: Agents support text, vision, video, even audio inputs! Cf this tutorial for vision. - -- 🛠️ Tool-agnostic: you can use tools from LangChain, Anthropic's MCP, you can even use a Hub Space as a tool. - -# Building a RAG Agent using smolagents - -smolagents allows users to define their own tools for the agent to use. These tools can be of two types: -1.
Tools defined as classes: These tools are subclassed from the `Tool` class and must override the `forward` method, which is called when the tool is used. -2. Tools defined as functions: These are simple functions that are called when the tool is used, and are decorated with the `@tool` decorator. - -In our case, we will use the first method, and we define our `RetrieverTool` below. We define a name, a description, and a dictionary of inputs that the tool accepts. This helps the LLM properly identify and use the tool. Since the tool's `description` is what the LLM sees when deciding whether to call it, it should accurately describe the indexed content (here, BBC news articles). - -The `RetrieverTool` is simple: it takes a query generated by the user, and uses Couchbase's performant vector search service under the hood to search for semantically similar documents to the query. The LLM can then use this context to answer the user's question. - -```python -class RetrieverTool(Tool): -    name = "retriever" -    description = "Uses semantic search to retrieve the news articles that could be most relevant to answer your query." -    inputs = { -        "query": { -            "type": "string", -            "description": "The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.", -        } -    } -    output_type = "string" - -    def __init__(self, vector_store: CouchbaseSearchVectorStore, **kwargs): -        super().__init__(**kwargs) -        self.vector_store = vector_store - -    def forward(self, query: str) -> str: -        assert isinstance(query, str), "Query must be a string" - -        docs = self.vector_store.similarity_search_with_score(query, k=5) -        return "\n\n".join( -            f"# Documents:\n{doc.page_content}" -            for doc, score in docs -        ) - -retriever_tool = RetrieverTool(vector_store) -``` - -# Defining Our Agent -smolagents has predefined configurations for agents that we can use. We use the `ToolCallingAgent`, which writes its tool calls in a JSON format. Alternatively, there also exists a `CodeAgent`, in which the LLM defines its functions in code.
- -The `CodeAgent` offers benefits in certain challenging scenarios: it can lead to [higher performance in difficult benchmarks](https://huggingface.co/papers/2411.01747) and use [30% fewer steps to solve problems](https://huggingface.co/papers/2402.01030). However, since our use case is just a simple RAG tool, a `ToolCallingAgent` will suffice. - -```python -agent = ToolCallingAgent( -    tools=[retriever_tool], -    model=OpenAIServerModel( -        model_id="gpt-4o-2024-08-06", -        api_key=OPENAI_API_KEY, -    ), -    max_steps=4, -    verbosity_level=2 -) -``` - -# Running our Agent -We have now finished setting up our vector store and agent! The system is now ready to accept queries. - -```python -query = "What was manchester city manager pep guardiola's reaction to the team's current form?" - -agent_output = agent.run(query) -``` - - -
╭──────────────────────────────────────────────────── New run ────────────────────────────────────────────────────╮
-                                                                                                                 
- What was manchester city manager pep guardiola's reaction to the team's current form?                           
-                                                                                                                 
-╰─ OpenAIServerModel - gpt-4o-2024-08-06 ─────────────────────────────────────────────────────────────────────────╯
-
- - - - -
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 1 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-
- - - - 2025-02-28 10:32:28,032 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" - - - -
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
-│ Calling tool: 'retriever' with arguments: {'query': "Pep Guardiola's reaction to Manchester City's current      │
-│ form"}                                                                                                          │
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-
- - - - 2025-02-28 10:32:28,466 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK" - - - -
[Step 0: Duration 2.25 seconds| Input tokens: 1,010 | Output tokens: 23]
-
- - - - -
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-
- - - - 2025-02-28 10:32:31,724 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" - - - -
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
-│ Calling tool: 'final_answer' with arguments: {'answer': 'Manchester City manager Pep Guardiola has expressed a  │
-│ mix of concern and determination regarding the team\'s current form. Guardiola admitted that this is the worst  │
-│ run of results in his managerial career and that it has affected his sleep and diet. He described his state of  │
-│ mind as "ugly" and acknowledged that City needs to defend better and avoid making mistakes. Despite his         │
-│ personal challenges, Guardiola stated that he is "fine" and focused on finding solutions.\n\nGuardiola also     │
-│ took responsibility for the team\'s struggles, stating he is "not good enough" and has to find solutions. He    │
-│ expressed self-doubt but is striving to improve the team\'s situation step by step. Guardiola has faced         │
-│ criticism due to the team\'s poor form, which has seen them lose several matches and fall behind in the title   │
-│ race.\n\nHe emphasized the need to restore their defensive strength and regain confidence in their play.        │
-│ Guardiola is planning a significant rebuild of the squad to address these challenges, aiming to replace several │
-│ regular starters and emphasize improvements in the team\'s intensity and defensive concepts.'}                  │
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-
- - - - -
Final answer: Manchester City manager Pep Guardiola has expressed a mix of concern and determination regarding the 
-team's current form. Guardiola admitted that this is the worst run of results in his managerial career and that it 
-has affected his sleep and diet. He described his state of mind as "ugly" and acknowledged that City needs to 
-defend better and avoid making mistakes. Despite his personal challenges, Guardiola stated that he is "fine" and 
-focused on finding solutions.
-
-Guardiola also took responsibility for the team's struggles, stating he is "not good enough" and has to find 
-solutions. He expressed self-doubt but is striving to improve the team's situation step by step. Guardiola has 
-faced criticism due to the team's poor form, which has seen them lose several matches and fall behind in the title 
-race.
-
-He emphasized the need to restore their defensive strength and regain confidence in their play. Guardiola is 
-planning a significant rebuild of the squad to address these challenges, aiming to replace several regular starters
-and emphasize improvements in the team's intensity and defensive concepts.
-
- - - - -
[Step 1: Duration 2.74 seconds| Input tokens: 7,162 | Output tokens: 241]
-
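-
Before we analyze the run in detail, here is a toy sketch (plain Python, **not** the smolagents internals — every name in it is illustrative only) of the two-step tool-calling loop the log above shows: a JSON-style call to a tool, followed by a call to the built-in `final_answer` tool.

```python
# Toy illustration of the tool-calling loop visible in the log above.
# NOTE: this is NOT how smolagents is implemented; all names here are
# stand-ins used only to make the control flow concrete.
def toy_agent_loop(query, tools):
    # Step 1: the model emits a JSON-style call to the 'retriever' tool,
    # passing the query string as the tool's argument.
    call = {"name": "retriever", "arguments": {"query": query}}
    context = tools[call["name"]](**call["arguments"])

    # Step 2: judging the retrieved context sufficient, the model calls the
    # built-in 'final_answer' tool, which ends the run.
    answer = f"Answer drawn from {len(context)} retrieved document(s)."
    return {"name": "final_answer", "arguments": {"answer": answer}}

# A stand-in retriever that returns canned text instead of querying Couchbase.
fake_tools = {"retriever": lambda query: [f"# Documents:\n...about {query}..."]}
result = toy_agent_loop("Pep Guardiola's reaction to Manchester City's form", fake_tools)
print(result["name"])  # final_answer
```

In the real run, the `retriever` entry is our `RetrieverTool` backed by the Couchbase vector store, and the loop is driven by the LLM's tool-call decisions rather than hard-coded steps.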
-
-
-
-# Analyzing the Agent
-When the agent runs, smolagents prints out the steps the agent takes, along with the tools called in each step. In the run above, two steps occur:
-
-**Step 1**: First, the agent determines that it needs a tool and calls the `retriever` tool, specifying the tool's query parameter (a string). The tool returns documents from Couchbase's vector store that are semantically similar to the query.
-
-**Step 2**: Next, the agent determines that the context retrieved from the tool is sufficient to answer the question. It then calls the `final_answer` tool, which is predefined for every agent: it is invoked when the agent returns its final answer to the user. In this step, the LLM answers the user's query from the context retrieved in step 1 and passes the answer to the `final_answer` tool, at which point the agent's execution ends.
-
-# Conclusion
-
-By following these steps, you’ll have a fully functional agentic RAG system that leverages the strengths of Couchbase and smolagents, along with OpenAI. This guide is designed not just to show you how to build the system, but also to explain why each step is necessary, giving you a deeper understanding of the principles behind semantic search and how to implement it effectively. Whether you’re a newcomer to software development or an experienced developer looking to expand your skills, this guide will provide you with the knowledge and tools you need to create a powerful, RAG-driven chat system.
diff --git a/tutorial/markdown/generated/vector-search-cookbook/smolagents-gsi-RAG_with_Couchbase_and_SmolAgents.md b/tutorial/markdown/generated/vector-search-cookbook/smolagents-gsi-RAG_with_Couchbase_and_SmolAgents.md deleted file mode 100644 index bc02d14..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/smolagents-gsi-RAG_with_Couchbase_and_SmolAgents.md +++ /dev/null @@ -1,829 +0,0 @@ ---- -# frontmatter -path: "/tutorial-smolagents-couchbase-rag-with-global-secondary-index" -title: Retrieval-Augmented Generation (RAG) with Couchbase and smolagents using GSI -short_title: RAG with Couchbase and smolagents using GSI -description: - - Learn how to build a semantic search engine using Couchbase and Hugging Face smolagents using GSI. - - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with smolagents using GSI indexes. - - You'll understand how to perform Retrieval-Augmented Generation (RAG) using smolagents and Couchbase with GSI optimization. -content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - Artificial Intelligence - - LangChain - - OpenAI - - smolagents - - GSI -sdk_language: - - python -length: 30 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/smolagents/gsi/RAG_with_Couchbase_and_SmolAgents.ipynb) - -# Introduction -In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database, [OpenAI](https://openai.com) as the embedding and LLM provider, and [Hugging Face smolagents](https://huggingface.co/docs/smolagents/en/index) as an agent framework. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. 
This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system using GSI (Global Secondary Index) from scratch. Alternatively, if you want to perform semantic search using an FTS index, please take a look at [this tutorial](https://developer.couchbase.com/tutorial-smolagents-couchbase-rag-with-fts/).
-
-
-## How to run this tutorial
-
-This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/smolagents/gsi/RAG_with_Couchbase_and_SmolAgents.ipynb).
-
-You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.
-
-
-## Before you start
-### Get Credentials for OpenAI
-Please follow the [instructions](https://platform.openai.com/docs/quickstart) to generate the OpenAI credentials.
-
-### Create and Deploy Your Free Tier Operational cluster on Capella
-
-To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.
-
-To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).
-
-Note: To run this tutorial, you will need Capella with Couchbase Server version 8.0 or above, as GSI vector search is supported only from version 8.0.
-
-### Couchbase Capella Configuration
-
-When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.
-
-* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the required bucket (Read and Write) used in the application.
-* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running. - - -# Setting the Stage: Installing Necessary Libraries -To build our semantic search engine, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks. Each library has a specific role: Couchbase libraries manage database operations, LangChain handles AI model integrations, and OpenAI provides advanced AI models for generating embeddings and understanding natural language. By setting up these libraries, we ensure our environment is equipped to handle the data-intensive and computationally complex tasks required for semantic search. - - - -```python -%pip install --quiet datasets==4.1.1 langchain-couchbase==0.5.0 langchain-openai==0.3.33 python-dotenv==1.1.1 smolagents==1.21.3 - -``` - -# Importing Necessary Libraries -The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. These libraries provide essential functions for working with data, managing database connections, and processing machine learning models. 
- - - -```python -import getpass -import json -import logging -import os -import time -from datetime import timedelta - -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.exceptions import (CouchbaseException, - InternalServerFailureException, - QueryIndexAlreadyExistsException) -from couchbase.management.buckets import CreateBucketSettings -from couchbase.options import ClusterOptions -from datasets import load_dataset -from dotenv import load_dotenv -from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore -from langchain_couchbase.vectorstores import DistanceStrategy -from langchain_couchbase.vectorstores import IndexType -from langchain_openai import OpenAIEmbeddings - -from smolagents import Tool, OpenAIServerModel, ToolCallingAgent - -``` - -## Setup Logging -Logging is configured to track the progress of the script and capture any errors or warnings. This is crucial for debugging and understanding the flow of execution. The logging output includes timestamps, log levels (e.g., INFO, ERROR), and messages that describe what is happening in the script. - - - -```python -logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True) - -# Disable all logging except critical to prevent OpenAI API request logs -logging.getLogger("httpx").setLevel(logging.CRITICAL) - -``` - -## Loading Sensitive Information -In this section, we prompt the user to input essential configuration settings needed. These settings include sensitive information like API keys, database credentials, and specific configuration names. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security. - -The script also validates that all required inputs are provided, raising an error if any crucial information is missing. 
This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code. - - - -```python -load_dotenv() - -OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or getpass.getpass('Enter your OpenAI API Key: ') - -CB_HOST = os.getenv('CB_HOST') or input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost' -CB_USERNAME = os.getenv('CB_USERNAME') or input('Enter your Couchbase username (default: Administrator): ') or 'Administrator' -CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass('Enter your Couchbase password (default: password): ') or 'password' -CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input('Enter your Couchbase bucket name (default: query-vector-search-testing): ') or 'query-vector-search-testing' -SCOPE_NAME = os.getenv('SCOPE_NAME') or input('Enter your scope name (default: shared): ') or 'shared' -COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input('Enter your collection name (default: smolagents): ') or 'smolagents' - -# Check if the variables are correctly loaded -if not OPENAI_API_KEY: - raise ValueError("Missing OpenAI API Key") - -if 'OPENAI_API_KEY' not in os.environ: - os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY - -``` - -## Connecting to the Couchbase Cluster -Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount. 
- - - -```python -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to Couchbase") -except Exception as e: - raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}") - -``` - - 2025-11-07 16:44:51,506 - INFO - Successfully connected to Couchbase - - -### Setting Up Collections in Couchbase - -The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase: - -1. Bucket Creation: - - Checks if specified bucket exists, creates it if not - - Sets bucket properties like RAM quota (1024MB) and replication (disabled) - - Note: You will not be able to create a bucket on Capella - -2. Scope Management: - - Verifies if requested scope exists within bucket - - Creates new scope if needed (unless it's the default "_default" scope) - -3. Collection Setup: - - Checks for collection existence within scope - - Creates collection if it doesn't exist - - Waits 2 seconds for collection to be ready - -Additional Tasks: -- Clears any existing documents for clean state -- Implements comprehensive error handling and logging - - - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' exists.") - except Exception as e: - logging.info(f"Bucket '{bucket_name}' does not exist. 
Creating it...") - bucket_settings = CreateBucketSettings( - name=bucket_name, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - time.sleep(2) # Wait for bucket creation to complete and become available - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' created successfully.") - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == scope_name for scope in scopes) - - if not scope_exists and scope_name != "_default": - logging.info(f"Scope '{scope_name}' does not exist. Creating it...") - bucket_manager.create_scope(scope_name) - logging.info(f"Scope '{scope_name}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{collection_name}' does not exist. Creating it...") - bucket_manager.create_collection(scope_name, collection_name) - logging.info(f"Collection '{collection_name}' created successfully.") - else: - logging.info(f"Collection '{collection_name}' already exists. Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. 
The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) - -``` - - 2025-11-07 16:44:53,519 - INFO - Bucket 'travel-sample' exists. - 2025-11-07 16:44:53,527 - INFO - Collection 'smolagents' does not exist. Creating it... - 2025-11-07 16:44:53,575 - INFO - Collection 'smolagents' created successfully. - 2025-11-07 16:44:55,731 - INFO - All documents cleared from the collection. - - - - - - - - - -## Creating OpenAI Embeddings -Embeddings are at the heart of semantic search. They are numerical representations of text that capture the semantic meaning of the words and phrases. Unlike traditional keyword-based search, which looks for exact matches, embeddings allow our search engine to understand the context and nuances of language, enabling it to retrieve documents that are semantically similar to the query, even if they don't contain the exact keywords. By creating embeddings using OpenAI, we equip our search engine with the ability to understand and process natural language in a way that's much closer to how humans understand language. This step transforms our raw text data into a format that the search engine can use to find and rank relevant documents. - - - -```python -try: - embeddings = OpenAIEmbeddings( - model="text-embedding-3-small", - api_key=OPENAI_API_KEY, - ) - logging.info("Successfully created OpenAIEmbeddings") -except Exception as e: - raise ValueError(f"Error creating OpenAIEmbeddings: {str(e)}") - -``` - - 2025-11-07 16:44:58,634 - INFO - Successfully created OpenAIEmbeddings - - -# Understanding GSI Vector Search - -### Optimizing Vector Search with Global Secondary Index (GSI) - -With Couchbase 8.0+, you can leverage the power of GSI-based vector search, which offers significant performance improvements over traditional Full-Text Search (FTS) approaches for vector-first workloads. 
GSI vector search provides high-performance vector similarity search with advanced filtering capabilities and is designed to scale to billions of vectors. - -#### GSI vs FTS: Choosing the Right Approach - -| Feature | GSI Vector Search | FTS Vector Search | -| --------------------- | --------------------------------------------------------------- | ----------------------------------------- | -| **Best For** | Vector-first workloads, complex filtering, high QPS performance| Hybrid search and high recall rates | -| **Couchbase Version** | 8.0.0+ | 7.6+ | -| **Filtering** | Pre-filtering with `WHERE` clauses (Composite) or post-filtering (BHIVE) | Pre-filtering with flexible ordering | -| **Scalability** | Up to billions of vectors (BHIVE) | Up to 10 million vectors | -| **Performance** | Optimized for concurrent operations with low memory footprint | Good for mixed text and vector queries | - - -#### GSI Vector Index Types - -Couchbase offers two distinct GSI vector index types, each optimized for different use cases: - -##### Hyperscale Vector Indexes (BHIVE) - -- **Best for**: Pure vector searches like content discovery, recommendations, and semantic search -- **Use when**: You primarily perform vector-only queries without complex scalar filtering -- **Features**: - - High performance with low memory footprint - - Optimized for concurrent operations - - Designed to scale to billions of vectors - - Supports post-scan filtering for basic metadata filtering - -##### Composite Vector Indexes - - - **Best for**: Filtered vector searches that combine vector similarity with scalar value filtering -- **Use when**: Your queries combine vector similarity with scalar filters that eliminate large portions of data -- **Features**: - - Efficient pre-filtering where scalar attributes reduce the vector comparison scope - - Best for well-defined workloads requiring complex filtering using GSI features - - Supports range lookups combined with vector search - -#### Index Type 
Selection for This Tutorial - -In this tutorial, we'll demonstrate creating a **BHIVE index** and running vector similarity queries using GSI. BHIVE is ideal for semantic search scenarios where you want: - -1. **High-performance vector search** across large datasets -2. **Low latency** for real-time applications -3. **Scalability** to handle growing vector collections -4. **Concurrent operations** for multi-user environments - -The BHIVE index will provide optimal performance for our OpenAI embedding-based semantic search implementation. - -#### Alternative: Composite Vector Index - -If your use case requires complex filtering with scalar attributes, you may want to consider using a **Composite Vector Index** instead: - -```python -# Alternative: Create a Composite index for filtered searches -vector_store.create_index( - index_type=IndexType.COMPOSITE, - index_description="IVF,SQ8", - distance_metric=DistanceStrategy.COSINE, - index_name="pydantic_composite_index", -) -``` - -**Use Composite indexes when:** -- You need to filter by document metadata or attributes before vector similarity -- Your queries combine vector search with WHERE clauses -- You have well-defined filtering requirements that can reduce the search space - -**Note**: Composite indexes enable pre-filtering with scalar attributes, making them ideal for applications where you need to search within specific categories, date ranges, or user-specific data segments. - -#### Understanding GSI Index Configuration (Couchbase 8.0 Feature) - -Before creating our BHIVE index, it's important to understand the configuration parameters that optimize vector storage and search performance. The `index_description` parameter controls how Couchbase optimizes vector storage through centroids and quantization. 
- -##### Index Description Format: `'IVF[],{PQ|SQ}'` - -##### Centroids (IVF - Inverted File) - -- Controls how the dataset is subdivided for faster searches -- **More centroids** = faster search, slower training time -- **Fewer centroids** = slower search, faster training time -- If omitted (like `IVF,SQ8`), Couchbase auto-selects based on dataset size - -###### Quantization Options - -**Scalar Quantization (SQ):** -- `SQ4`, `SQ6`, `SQ8` (4, 6, or 8 bits per dimension) -- Lower memory usage, faster search, slightly reduced accuracy - -**Product Quantization (PQ):** -- Format: `PQx` (e.g., `PQ32x8`) -- Better compression for very large datasets -- More complex but can maintain accuracy with smaller index size - -##### Common Configuration Examples - -- **`IVF,SQ8`** - Auto centroids, 8-bit scalar quantization (good default) -- **`IVF1000,SQ6`** - 1000 centroids, 6-bit scalar quantization -- **`IVF,PQ32x8`** - Auto centroids, 32 subquantizers with 8 bits - -For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/cloud/vector-index/hyperscale-vector-index.html#algo_settings). - -For more information on GSI vector indexes, see [Couchbase GSI Vector Documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html). - -##### Our Configuration Choice - -In this tutorial, we use `IVF,SQ8` which provides: -- **Auto-selected centroids** optimized for our dataset size -- **8-bit scalar quantization** for good balance of speed, memory usage, and accuracy -- **COSINE distance metric** ideal for semantic similarity search -- **Optimal performance** for most semantic search use cases - -## Setting Up the Couchbase Query Vector Store -A vector store is where we'll keep our embeddings. The query vector store is specifically designed to handle embeddings and perform similarity searches. 
When a user inputs a query, GSI converts the query into an embedding and compares it against the embeddings stored in the vector store. This allows the engine to find documents that are semantically similar to the query, even if they don't contain the exact same words. By setting up the vector store in Couchbase, we create a powerful tool that enables us to understand and retrieve information based on the meaning and context of the query, rather than just the specific words used. - -The vector store requires a distance metric to determine how similarity between vectors is calculated. This is crucial for accurate semantic search results as different distance metrics can yield different similarity rankings. Some of the supported Distance strategies are dot, l2, euclidean, cosine, l2_squared, euclidean_squared. In our implementation we will use cosine which is particularly effective for text embeddings. - - - -```python -try: - vector_store = CouchbaseQueryVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding=embeddings, - distance_metric=DistanceStrategy.COSINE - ) - logging.info("Successfully created vector store") -except Exception as e: - raise ValueError(f"Failed to create vector store: {str(e)}") - -``` - -## Load the BBC News Dataset -To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively. 
-
-The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version.
-
-
-
-```python
-try:
-    news_dataset = load_dataset(
-        "RealTimeData/bbc_news_alltime", "2024-12", split="train"
-    )
-    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
-    logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.")
-except Exception as e:
-    raise ValueError(f"Error loading the BBC News dataset: {str(e)}")
-
-```
-
-### Cleaning up the Data
-We will use the content of the news articles for our RAG system.
-
-The dataset contains a few duplicate records. We are removing them to avoid duplicate results in the retrieval stage of our RAG system.
-
-
-
-```python
-news_articles = news_dataset["content"]
-unique_articles = set()
-for article in news_articles:
-    if article:
-        unique_articles.add(article)
-unique_news_articles = list(unique_articles)
-print(f"We have {len(unique_news_articles)} unique articles in our database.")
-
-```
-
-    We have 1749 unique articles in our database.
-
-
-## Saving Data to the Vector Store
-To efficiently handle the large number of articles, we process them in batches. This batch processing approach helps manage memory usage and provides better control over the ingestion process.
-
-We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's `add_texts` method, we add the filtered articles to our vector database. The `batch_size` parameter controls how many articles are processed in each iteration.
-
-This approach offers several benefits:
-1. Memory Efficiency: Processing in smaller batches prevents memory overload
-2. 
Progress Tracking: Easier to monitor and track the ingestion progress -3. Resource Management: Better control over CPU and network resource utilization - -We use a conservative batch size of 100 to ensure reliable operation. -The optimal batch size depends on many factors including: -- Document sizes being inserted -- Available system resources -- Network conditions -- Concurrent workload - -Consider measuring performance with your specific workload before adjusting. - - - -```python -batch_size = 100 - -articles = [article for article in unique_news_articles if article and len(article) <= 50000] - -try: - vector_store.add_texts( - texts=articles, - batch_size=batch_size - ) - logging.info("Document ingestion completed successfully.") -except Exception as e: - raise ValueError(f"Failed to save documents to vector store: {str(e)}") - -``` - - 2025-11-07 16:46:18,967 - INFO - Document ingestion completed successfully. - - -## Perform Semantic Search -Semantic search in Couchbase involves converting queries and documents into vector representations using an embeddings model. These vectors capture the semantic meaning of the text and are stored directly in Couchbase. When a query is made, Couchbase performs a similarity search by comparing the query vector against the stored document vectors. The similarity metric used for this comparison is configurable, allowing flexibility in how the relevance of documents is determined. Common metrics include cosine similarity, Euclidean distance, or dot product, but other metrics can be implemented based on specific use cases. Different embedding models like BERT, Word2Vec, or GloVe can also be used depending on the application's needs, with the vectors generated by these models stored and searched within Couchbase itself. - -In the provided code, the search process begins by recording the start time, followed by executing the `similarity_search_with_score` method of the `CouchbaseQueryVectorStore`. 
This method searches Couchbase for the most relevant documents based on the vector similarity to the query. The search results include the document content and the distance that reflects how closely each document aligns with the query in the defined semantic space. The time taken to perform this search is then calculated and logged, and the results are displayed, showing the most relevant documents along with their similarity scores. This approach leverages Couchbase as both a storage and retrieval engine for vector data, enabling efficient and scalable semantic searches. The integration of vector storage and search capabilities within Couchbase allows for sophisticated semantic search operations without relying on external services for vector storage or comparison. - - -# Vector Search Performance Optimization - -Now let's measure and compare the performance benefits of different optimization strategies. We'll conduct a comprehensive performance analysis across two phases: - -## Performance Testing Phases - -1. **Phase 1 - Baseline Performance**: Test vector search without GSI indexes to establish baseline metrics -2. **Phase 2 - GSI-Optimized Search**: Create BHIVE index and measure performance improvements - -**Important Context:** -- GSI performance benefits scale with dataset size and concurrent load -- With our dataset (~1,700 articles), improvements may be modest -- Production environments with millions of vectors show significant GSI advantages -- The combination of GSI + LLM caching provides optimal RAG performance - - - -```python -# Phase 1: Baseline Performance (Without GSI Index) -print("="*80) -print("PHASE 1: BASELINE PERFORMANCE (NO GSI INDEX)") -print("="*80) - -query = "What was manchester city manager pep guardiola's reaction to the team's current form?" 
- -try: - # Perform the semantic search - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=10) - baseline_time = time.time() - start_time - - logging.info(f"Semantic search completed in {baseline_time:.2f} seconds") - - # Display search results - print(f"\nSemantic Search Results (completed in {baseline_time:.2f} seconds):") - print("-" * 80) # Add separator line - for doc, distance in search_results: - print(f"Vector Distance: {distance:.4f}, Text: {doc.page_content}") - print("-" * 80) # Add separator between results - -except CouchbaseException as e: - raise RuntimeError(f"Error performing semantic search: {str(e)}") -except Exception as e: - raise RuntimeError(f"Unexpected error: {str(e)}") - -``` - - ================================================================================ - PHASE 1: BASELINE PERFORMANCE (NO GSI INDEX) - ================================================================================ - - - 2025-11-07 16:46:24,561 - INFO - Semantic search completed in 1.34 seconds - - - - Semantic Search Results (completed in 1.34 seconds): - -------------------------------------------------------------------------------- - Vector Distance: 0.2956, Text: Manchester City boss Pep Guardiola has won 18 trophies since he arrived at the club in 2016 - - Manchester City boss Pep Guardiola says he is "fine" despite admitting his sleep and diet are being affected by the worst run of results in his entire managerial career. In an interview with former Italy international Luca Toni for Amazon Prime Sport before Wednesday's Champions League defeat by Juventus, Guardiola touched on the personal impact City's sudden downturn in form has had. Guardiola said his state of mind was "ugly", that his sleep was "worse" and he was eating lighter as his digestion had suffered. City go into Sunday's derby against Manchester United at Etihad Stadium having won just one of their past 10 games. 
The Juventus loss means there is a chance they may not even secure a play-off spot in the Champions League. Asked to elaborate on his comments to Toni, Guardiola said: "I'm fine. "In our jobs we always want to do our best or the best as possible. When that doesn't happen you are more uncomfortable than when the situation is going well, always that happened. "In good moments I am happier but when I get to the next game I am still concerned about what I have to do. There is no human being that makes an activity and it doesn't matter how they do." Guardiola said City have to defend better and "avoid making mistakes at both ends". To emphasise his point, Guardiola referred back to the third game of City's current run, against a Sporting side managed by Ruben Amorim, who will be in the United dugout at the weekend. City dominated the first half in Lisbon, led thanks to Phil Foden's early effort and looked to be cruising. Instead, they conceded three times in 11 minutes either side of half-time as Sporting eventually ran out 4-1 winners. "I would like to play the game like we played in Lisbon on Sunday, believe me," said Guardiola, who is facing the prospect of only having three fit defenders for the derby as Nathan Ake and Manuel Akanji try to overcome injury concerns. If there is solace for City, it comes from the knowledge United are not exactly flying. Their comeback Europa League victory against Viktoria Plzen on Thursday was their third win of Amorim's short reign so far but only one of those successes has come in the Premier League, where United have lost their past two games against Arsenal and Nottingham Forest. Nevertheless, Guardiola can see improvements already on the red side of the city. "It's already there," he said. "You see all the patterns, the movements, the runners and the pace. He will do a good job at United, I'm pretty sure of that." 
-    
-    Guardiola says skipper Kyle Walker has been offered support by the club after the City defender highlighted the racial abuse he had received on social media in the wake of the Juventus trip. "It's unacceptable," he said. "Not because it's Kyle - for any human being. "Unfortunately it happens many times in the real world. It is not necessary to say he has the support of the entire club. It is completely unacceptable and we give our support to him."
-    --------------------------------------------------------------------------------
-    Vector Distance: 0.3100, Text: Pep Guardiola has said Manchester City will be his final managerial job in club football before he "maybe" coaches a national team.
-    
-    --------------------------------------------------------------------------------
-
-
-```python
-vector_store.create_index(index_type=IndexType.BHIVE, index_name="smolagents_bhive_index", index_description="IVF,SQ8")
-```
-
-Note: To create a COMPOSITE index instead, the code below can be used. Choose based on your specific use case and query patterns. For this tutorial's news search scenario, either index type would work, but BHIVE might be more efficient for pure semantic search across news articles.
-
-```python
-vector_store.create_index(index_type=IndexType.COMPOSITE, index_name="smolagents_composite_index", index_description="IVF,SQ8")
-```
-
-
-```python
-# Phase 2: GSI-Optimized Performance (With BHIVE Index)
-print("\n" + "="*80)
-print("PHASE 2: GSI-OPTIMIZED PERFORMANCE (WITH BHIVE INDEX)")
-print("="*80)
-
-query = "What was manchester city manager pep guardiola's reaction to the team's current form?"
- -try: - # Perform the semantic search - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=10) - gsi_time = time.time() - start_time - - logging.info(f"Semantic search completed in {gsi_time:.2f} seconds") - - # Display search results - print(f"\nSemantic Search Results (completed in {gsi_time:.2f} seconds):") - print("-" * 80) # Add separator line - for doc, distance in search_results: - print(f"Vector Distance: {distance:.4f}, Text: {doc.page_content}") - print("-" * 80) # Add separator between results - -except CouchbaseException as e: - raise RuntimeError(f"Error performing semantic search: {str(e)}") -except Exception as e: - raise RuntimeError(f"Unexpected error: {str(e)}") - -``` - - - ================================================================================ - PHASE 2: GSI-OPTIMIZED PERFORMANCE (WITH BHIVE INDEX) - ================================================================================ - - - 2025-11-07 16:47:01,538 - INFO - Semantic search completed in 0.42 seconds - - - - Semantic Search Results (completed in 0.42 seconds): - -------------------------------------------------------------------------------- - Vector Distance: 0.2956, Text: Manchester City boss Pep Guardiola has won 18 trophies since he arrived at the club in 2016 - - Manchester City boss Pep Guardiola says he is "fine" despite admitting his sleep and diet are being affected by the worst run of results in his entire managerial career. In an interview with former Italy international Luca Toni for Amazon Prime Sport before Wednesday's Champions League defeat by Juventus, Guardiola touched on the personal impact City's sudden downturn in form has had. Guardiola said his state of mind was "ugly", that his sleep was "worse" and he was eating lighter as his digestion had suffered. City go into Sunday's derby against Manchester United at Etihad Stadium having won just one of their past 10 games. 
The Juventus loss means there is a chance they may not even secure a play-off spot in the Champions League. Asked to elaborate on his comments to Toni, Guardiola said: "I'm fine. "In our jobs we always want to do our best or the best as possible. When that doesn't happen you are more uncomfortable than when the situation is going well, always that happened. "In good moments I am happier but when I get to the next game I am still concerned about what I have to do. There is no human being that makes an activity and it doesn't matter how they do." Guardiola said City have to defend better and "avoid making mistakes at both ends". To emphasise his point, Guardiola referred back to the third game of City's current run, against a Sporting side managed by Ruben Amorim, who will be in the United dugout at the weekend. City dominated the first half in Lisbon, led thanks to Phil Foden's early effort and looked to be cruising. Instead, they conceded three times in 11 minutes either side of half-time as Sporting eventually ran out 4-1 winners. "I would like to play the game like we played in Lisbon on Sunday, believe me," said Guardiola, who is facing the prospect of only having three fit defenders for the derby as Nathan Ake and Manuel Akanji try to overcome injury concerns. If there is solace for City, it comes from the knowledge United are not exactly flying. Their comeback Europa League victory against Viktoria Plzen on Thursday was their third win of Amorim's short reign so far but only one of those successes has come in the Premier League, where United have lost their past two games against Arsenal and Nottingham Forest. Nevertheless, Guardiola can see improvements already on the red side of the city. "It's already there," he said. "You see all the patterns, the movements, the runners and the pace. He will do a good job at United, I'm pretty sure of that." 
- - Guardiola says skipper Kyle Walker has been offered support by the club after the City defender highlighted the racial abuse he had received on social media in the wake of the Juventus trip. "It's unacceptable," he said. "Not because it's Kyle - for any human being. "Unfortunately it happens many times in the real world. It is not necessary to say he has the support of the entire club. It is completely unacceptable and we give our support to him." - -------------------------------------------------------------------------------- - Vector Distance: 0.3100, Text: Pep Guardiola has said Manchester City will be his final managerial job in club football before he "maybe" coaches a national team. - -------------------------------------------------------------------------------- - - -## Performance Analysis Summary - -Let's analyze the performance improvements we've achieved through different optimization strategies: - - - -```python -print("\n" + "="*80) -print("VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY") -print("="*80) - -print(f"\n📊 Performance Comparison:") -print(f"{'Optimization Level':<35} {'Time (seconds)':<20} {'Status'}") -print("-" * 80) -print(f"{'Phase 1 - Baseline (No Index)':<35} {baseline_time:.4f}{'':16} ⚪ Baseline") -print(f"{'Phase 2 - GSI-Optimized (BHIVE)':<35} {gsi_time:.4f}{'':16} ✅ Optimized") - -# Calculate improvement -if baseline_time > gsi_time: - speedup = baseline_time / gsi_time - improvement = ((baseline_time - gsi_time) / baseline_time) * 100 - print(f"\n✨ GSI Performance Gain: {speedup:.2f}x faster ({improvement:.1f}% improvement)") -elif gsi_time > baseline_time: - slowdown_pct = ((gsi_time - baseline_time) / baseline_time) * 100 - print(f"\n⚠️ Note: GSI was {slowdown_pct:.1f}% slower than baseline in this run") - print(f" This can happen with small datasets. GSI benefits emerge with scale.") -else: - print(f"\n⚖️ Performance: Comparable to baseline") - -print("\n" + "-"*80) -print("KEY INSIGHTS:") -print("-"*80) -print("1. 
🚀 GSI Optimization:") -print(" • BHIVE indexes excel with large-scale datasets (millions+ vectors)") -print(" • Performance gains increase with dataset size and concurrent queries") -print(" • Optimal for production workloads with sustained traffic patterns") - -print("\n2. 📦 Dataset Size Impact:") -print(f" • Current dataset: ~1,700 articles") -print(" • At this scale, performance differences may be minimal or variable") -print(" • Significant gains typically seen with 10M+ vectors") - -print("\n3. 🎯 When to Use GSI:") -print(" • Large-scale vector search applications") -print(" • High query-per-second (QPS) requirements") -print(" • Multi-user concurrent access scenarios") -print(" • Production environments requiring scalability") - -print("\n" + "="*80) - -``` - - - ================================================================================ - VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY - ================================================================================ - - 📊 Performance Comparison: - Optimization Level Time (seconds) Status - -------------------------------------------------------------------------------- - Phase 1 - Baseline (No Index) 1.3410 ⚪ Baseline - Phase 2 - GSI-Optimized (BHIVE) 0.4157 ✅ Optimized - - ✨ GSI Performance Gain: 3.23x faster (69.0% improvement) - - -------------------------------------------------------------------------------- - KEY INSIGHTS: - -------------------------------------------------------------------------------- - 1. 🚀 GSI Optimization: - • BHIVE indexes excel with large-scale datasets (millions+ vectors) - • Performance gains increase with dataset size and concurrent queries - • Optimal for production workloads with sustained traffic patterns - - 2. 📦 Dataset Size Impact: - • Current dataset: ~1,700 articles - • At this scale, performance differences may be minimal or variable - • Significant gains typically seen with 10M+ vectors - - 3. 
🎯 When to Use GSI:
-    • Large-scale vector search applications
-    • High query-per-second (QPS) requirements
-    • Multi-user concurrent access scenarios
-    • Production environments requiring scalability
-    
-    ================================================================================
-
-
-## smolagents: An Introduction
-[smolagents](https://huggingface.co/docs/smolagents/en/index) is an agentic framework by Hugging Face for easy creation of agents in a few lines of code.
-
-Some of the features of smolagents are:
-
-- ✨ Simplicity: the logic for agents fits in ~1,000 lines of code (see agents.py). Abstractions are kept to their minimal shape above raw code!
-
-- 🧑‍💻 First-class support for Code Agents: the `CodeAgent` writes its actions in code (as opposed to "agents being used to write code"). To make this secure, execution in sandboxed environments via E2B is supported.
-
-- 🤗 Hub integrations: you can share/pull tools to/from the Hub, with more to come!
-
-- 🌐 Model-agnostic: smolagents supports any LLM. It can be a local transformers or ollama model, one of many providers on the Hub, or any model from OpenAI, Anthropic, and many others via the LiteLLM integration.
-
-- 👁️ Modality-agnostic: agents support text, vision, video, and even audio inputs!
-
-- 🛠️ Tool-agnostic: you can use tools from LangChain or Anthropic's MCP, and you can even use a Hub Space as a tool.
-
-# Building a RAG Agent using smolagents
-
-smolagents allows users to define their own tools for the agent to use. These tools can be of two types:
-1. Tools defined as classes: these tools subclass the `Tool` class and must override the `forward` method, which is called when the tool is used.
-2. Tools defined as functions: these are simple functions decorated with the `@tool` decorator, and are called when the tool is used.
-
-In our case, we will use the first method, and we define our `RetrieverTool` below. 
We define a name, a description, and a dictionary of inputs that the tool accepts. This helps the LLM properly identify and use the tool.
-
-The `RetrieverTool` is simple: it takes a query generated by the user, and uses Couchbase's performant vector search service under the hood to search for documents semantically similar to the query. The LLM can then use this context to answer the user's question.
-
-
-
-```python
-class RetrieverTool(Tool):
-    name = "retriever"
-    description = "Uses semantic search to retrieve the parts of the news articles that could be most relevant to answer your query."
-    inputs = {
-        "query": {
-            "type": "string",
-            "description": "The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.",
-        }
-    }
-    output_type = "string"
-
-    def __init__(self, vector_store: CouchbaseQueryVectorStore, **kwargs):
-        super().__init__(**kwargs)
-        self.vector_store = vector_store
-
-    def forward(self, query: str) -> str:
-        assert isinstance(query, str), "Query must be a string"
-
-        docs = self.vector_store.similarity_search_with_score(query, k=5)
-        return "\n\n".join(
-            f"# Documents:\n{doc.page_content}"
-            for doc, distance in docs
-        )
-
-retriever_tool = RetrieverTool(vector_store)
-
-```
-
-## Defining Our Agent
-smolagents has predefined agent configurations that we can use. We use the `ToolCallingAgent`, which writes its tool calls in a JSON format. Alternatively, there is also a `CodeAgent`, in which the LLM writes its actions as code.
-
-The `CodeAgent` offers benefits in certain challenging scenarios: it can lead to [higher performance in difficult benchmarks](https://huggingface.co/papers/2411.01747) and use [30% fewer steps to solve problems](https://huggingface.co/papers/2402.01030). However, since our use case is just a simple RAG tool, a `ToolCallingAgent` will suffice. 
- - - -```python -agent = ToolCallingAgent( - tools=[retriever_tool], - model=OpenAIServerModel( - model_id="gpt-4o-2024-08-06", - api_key=OPENAI_API_KEY, - ), - max_steps=4, - verbosity_level=2 -) - -``` - -## Running our Agent -We have now finished setting up our vector store and agent! The system is now ready to accept queries. - - - -```python -query = "What was manchester city manager pep guardiola's reaction to the team's current form?" - -agent_output = agent.run(query) -``` - -## Analyzing the Agent -When the agent runs, smolagents prints out the steps that the agent takes along with the tools called in each step. In the above tool call, two steps occur: - -**Step 1**: First, the agent determines that it requires a tool to be used, and the `retriever` tool is called. The agent also specifies the query parameter for the tool (a string). The tool returns semantically similar documents to the query from Couchbase's vector store. - -**Step 2**: Next, the agent determines that the context retrieved from the tool is sufficient to answer the question. It then calls the `final_answer` tool, which is predefined for each agent: this tool is called when the agent returns the final answer to the user. In this step, the LLM answers the user's query from the context retrieved in step 1 and passes it to the `final_answer` tool, at which point the agent's execution ends. - - -## Conclusion - -By following these steps, you'll have a fully functional agentic RAG system that leverages the strengths of Couchbase and smolagents, along with OpenAI. This guide is designed not just to show you how to build the system, but also to explain why each step is necessary, giving you a deeper understanding of the principles behind semantic search and how to implement it effectively using GSI which can significantly improve your RAG performance. 
Whether you're a newcomer to software development or an experienced developer looking to expand your skills, this guide will provide you with the knowledge and tools you need to create a powerful, RAG-driven chat system using smolagents' agent framework.
-
diff --git a/tutorial/markdown/generated/vector-search-cookbook/voyage-RAG_with_Couchbase_and_Voyage.md b/tutorial/markdown/generated/vector-search-cookbook/voyage-RAG_with_Couchbase_and_Voyage.md
deleted file mode 100644
index 29f20d8..0000000
--- a/tutorial/markdown/generated/vector-search-cookbook/voyage-RAG_with_Couchbase_and_Voyage.md
+++ /dev/null
@@ -1,692 +0,0 @@
----
-# frontmatter
-path: "/tutorial-openai-voyage-couchbase-rag"
-title: Retrieval-Augmented Generation (RAG) with Couchbase, OpenAI, and Voyage
-short_title: RAG with Couchbase, OpenAI, and Voyage
-description:
-  - Learn how to build a semantic search engine using Couchbase, OpenAI, and Voyage
-  - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with Voyage embeddings and use OpenAI as the language model.
-  - You'll understand how to perform Retrieval-Augmented Generation (RAG) using LangChain and Couchbase.
-content_type: tutorial
-filter: sdk
-technology:
-  - vector search
-tags:
-  - FTS
-  - Artificial Intelligence
-  - LangChain
-  - OpenAI
-sdk_language:
-  - python
-length: 60 Mins
----
-
-
-
-
-
-[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/voyage/RAG_with_Couchbase_and_Voyage.ipynb)
-
-# Introduction
-In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database, [Voyage](https://www.voyageai.com/) as the AI-powered embedding provider, and [OpenAI](https://openai.com/) as the language model provider. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. 
This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system from scratch.
-
-# How to run this tutorial
-
-This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/voyage/RAG_with_Couchbase_and_Voyage.ipynb).
-
-You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.
-
-# Before you start
-
-## Get Credentials for VoyageAI and OpenAI
-
-* Please follow the [instructions](https://platform.openai.com/docs/quickstart) to generate the OpenAI credentials.
-* Please follow the [instructions](https://docs.voyageai.com/docs/api-key-and-installation) to generate the VoyageAI credentials.
-
-## Create and Deploy Your Free Tier Operational Cluster on Capella
-
-To get started with Couchbase Capella, create an account and use it to deploy a forever-free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.
-
-To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).
-
-### Couchbase Capella Configuration
-
-When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.
-
-* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the required bucket (Read and Write) used in the application.
-* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.
- -# Setting the Stage: Installing Necessary Libraries -To build our semantic search engine, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks. - - -```python -%pip install --quiet datasets==3.5.0 langchain-couchbase==0.3.0 langchain-voyageai==0.1.4 langchain-openai==0.3.13 -``` - - Note: you may need to restart the kernel to use updated packages. - - -# Importing Necessary Libraries -This block imports all the required libraries and modules used in the notebook. These include libraries for environment management, data handling, natural language processing, interaction with Couchbase, and embeddings generation. Each library serves a specific function, such as managing environment variables, handling datasets, or interacting with the Couchbase database. - - -```python -import json -import logging -import os -import time -import getpass -from datetime import timedelta -from dotenv import load_dotenv - -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.exceptions import (CouchbaseException, - InternalServerFailureException, - QueryIndexAlreadyExistsException,ServiceUnavailableException) -from couchbase.management.buckets import CreateBucketSettings -from couchbase.management.search import SearchIndex -from couchbase.options import ClusterOptions -from datasets import load_dataset -from langchain_core.documents import Document -from langchain_core.globals import set_llm_cache -from langchain_core.output_parsers import StrOutputParser -from langchain_core.prompts import ChatPromptTemplate -from langchain_core.runnables import RunnablePassthrough -from langchain_couchbase.cache import CouchbaseCache -from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore -from langchain_openai import ChatOpenAI -from langchain_voyageai import VoyageAIEmbeddings -``` - - 
-# Setup Logging
-Logging is configured to track the progress of the script and capture any errors or warnings. This is crucial for debugging and understanding the flow of execution. The logging output includes timestamps, log levels (e.g., INFO, ERROR), and messages that describe what is happening in the script.
-
-
-```python
-logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)
-
-# Set the logging level of the httpx library to CRITICAL to avoid excessive logging
-logging.getLogger('httpx').setLevel(logging.CRITICAL)
-```
-
-# Loading Sensitive Information
-In this section, we prompt the user to input essential configuration settings needed for integrating Couchbase with the VoyageAI and OpenAI APIs. These settings include sensitive information like API keys, database credentials, and specific configuration names. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security.
-
-The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code.
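
The configuration cell that follows leans on Python's short-circuiting `or` to chain fallbacks: an unset *or empty* environment variable is falsy, so evaluation falls through to the interactive prompt, and an empty prompt answer falls through to the default. A small self-contained sketch of the idiom (the variable names here are hypothetical):

```python
import os

def read_setting(env_var: str, default: str) -> str:
    # `os.getenv` returns None for a missing variable; an empty string is
    # also falsy, so `or` falls through to the default in both cases.
    return os.getenv(env_var) or default

os.environ["DEMO_CB_HOST"] = "couchbase://my-host"
print(read_setting("DEMO_CB_HOST", "couchbase://localhost"))  # couchbase://my-host

os.environ["DEMO_CB_HOST"] = ""  # empty counts as "not set"
print(read_setting("DEMO_CB_HOST", "couchbase://localhost"))  # couchbase://localhost

del os.environ["DEMO_CB_HOST"]   # genuinely missing
print(read_setting("DEMO_CB_HOST", "couchbase://localhost"))  # couchbase://localhost
```

For secrets such as API keys and passwords, the actual cell uses `getpass.getpass` instead of `input` so the value is not echoed to the terminal.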
- - -```python -load_dotenv() - -VOYAGE_API_KEY = os.getenv('VOYAGE_API_KEY') or getpass.getpass('Enter your VoyageAI API key: ') -OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or getpass.getpass('Enter your OpenAI API key: ') -CB_HOST = os.getenv('CB_HOST') or input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost' -CB_USERNAME = os.getenv('CB_USERNAME') or input('Enter your Couchbase username (default: Administrator): ') or 'Administrator' -CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass('Enter your Couchbase password (default: password): ') or 'password' -CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input('Enter your Couchbase bucket name (default: vector-search-testing): ') or 'vector-search-testing' -INDEX_NAME = os.getenv('INDEX_NAME') or input('Enter your index name (default: vector_search_voyage): ') or 'vector_search_voyage' -SCOPE_NAME = os.getenv('SCOPE_NAME') or input('Enter your scope name (default: shared): ') or 'shared' -COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input('Enter your collection name (default: voyage): ') or 'voyage' -CACHE_COLLECTION = os.getenv('CACHE_COLLECTION') or input('Enter your cache collection name (default: cache): ') or 'cache' - -# Verifying that essential environment variables are set -if not VOYAGE_API_KEY: - raise ValueError("VOYAGE_API_KEY is required.") -if not OPENAI_API_KEY: - raise ValueError("OPENAI_API_KEY is required.") -``` - -# Connect to Couchbase -The script attempts to establish a connection to the Couchbase database using the credentials retrieved from the environment variables. Couchbase is a NoSQL database known for its flexibility, scalability, and support for various data models, including document-based storage. The connection is authenticated using a username and password, and the script waits until the connection is fully established before proceeding. 
- - - - - -```python -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to Couchbase") -except Exception as e: - raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}") -``` - - 2025-02-24 01:02:11,426 - INFO - Successfully connected to Couchbase - - -## Setting Up Collections in Couchbase - -The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase: - -1. Bucket Creation: - - Checks if specified bucket exists, creates it if not - - Sets bucket properties like RAM quota (1024MB) and replication (disabled) - - Note: You will not be able to create a bucket on Capella - -2. Scope Management: - - Verifies if requested scope exists within bucket - - Creates new scope if needed (unless it's the default "_default" scope) - -3. Collection Setup: - - Checks for collection existence within scope - - Creates collection if it doesn't exist - - Waits 2 seconds for collection to be ready - -Additional Tasks: -- Creates primary index on collection for query performance -- Clears any existing documents for clean state -- Implements comprehensive error handling and logging - -The function is called twice to set up: -1. Main collection for vector embeddings -2. Cache collection for storing results - - - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' exists.") - except Exception as e: - logging.info(f"Bucket '{bucket_name}' does not exist. 
Creating it...") - bucket_settings = CreateBucketSettings( - name=bucket_name, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - time.sleep(2) # Wait for bucket creation to complete and become available - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' created successfully.") - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == scope_name for scope in scopes) - - if not scope_exists and scope_name != "_default": - logging.info(f"Scope '{scope_name}' does not exist. Creating it...") - bucket_manager.create_scope(scope_name) - logging.info(f"Scope '{scope_name}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{collection_name}' does not exist. Creating it...") - bucket_manager.create_collection(scope_name, collection_name) - logging.info(f"Collection '{collection_name}' created successfully.") - else: - logging.info(f"Collection '{collection_name}' already exists. 
Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Ensure primary index exists - try: - cluster.query(f"CREATE PRIMARY INDEX IF NOT EXISTS ON `{bucket_name}`.`{scope_name}`.`{collection_name}`").execute() - logging.info("Primary index present or created successfully.") - except Exception as e: - logging.warning(f"Error creating primary index: {str(e)}") - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION) - -``` - - 2025-02-24 01:02:12,840 - INFO - Bucket 'vector-search-testing' exists. - 2025-02-24 01:02:15,328 - INFO - Collection 'voyage' already exists. Skipping creation. - 2025-02-24 01:02:18,539 - INFO - Primary index present or created successfully. - 2025-02-24 01:02:21,013 - INFO - All documents cleared from the collection. - 2025-02-24 01:02:21,014 - INFO - Bucket 'vector-search-testing' exists. - 2025-02-24 01:02:23,506 - INFO - Collection 'cache' already exists. Skipping creation. - 2025-02-24 01:02:26,647 - INFO - Primary index present or created successfully. - 2025-02-24 01:02:26,913 - INFO - All documents cleared from the collection. - - - - - - - - - -# Loading Couchbase Vector Search Index - -Semantic search requires an efficient way to retrieve relevant documents based on a user's query. This is where the Couchbase **Vector Search Index** comes into play. 
In this step, we load the Vector Search Index definition from a JSON file, which specifies how the index should be structured. This includes the fields to be indexed, the dimensions of the vectors, and other parameters that determine how the search engine processes queries based on vector similarity. - -This Voyage vector search index configuration requires specific default settings to function properly. This tutorial uses the bucket named `vector-search-testing` with the scope `shared` and collection `voyage`. The configuration is set up for vectors with exactly `1536 dimensions`, using dot product similarity and optimized for recall. If you want to use a different bucket, scope, or collection, you will need to modify the index configuration accordingly. - -For more information on creating a vector search index, please follow the [instructions](https://docs.couchbase.com/cloud/vector-search/create-vector-search-index-ui.html). - - - -```python -# If you are running this script locally (not in Google Colab), uncomment the following line -# and provide the path to your index definition file. 
- -# index_definition_path = '/path_to_your_index_file/voyage_index.json' # Local setup: specify your file path here - -# # Version for Google Colab -# def load_index_definition_colab(): -# from google.colab import files -# print("Upload your index definition file") -# uploaded = files.upload() -# index_definition_path = list(uploaded.keys())[0] - -# try: -# with open(index_definition_path, 'r') as file: -# index_definition = json.load(file) -# return index_definition -# except Exception as e: -# raise ValueError(f"Error loading index definition from {index_definition_path}: {str(e)}") - -# Version for Local Environment -def load_index_definition_local(index_definition_path): - try: - with open(index_definition_path, 'r') as file: - index_definition = json.load(file) - return index_definition - except Exception as e: - raise ValueError(f"Error loading index definition from {index_definition_path}: {str(e)}") - -# Usage -# Uncomment the appropriate line based on your environment -# index_definition = load_index_definition_colab() -index_definition = load_index_definition_local('voyage_index.json') -``` - -# Creating or Updating Search Indexes - -With the index definition loaded, the next step is to create or update the **Vector Search Index** in Couchbase. This step is crucial because it optimizes our database for vector similarity search operations, allowing us to perform searches based on the semantic content of documents rather than just keywords. By creating or updating a Vector Search Index, we enable our search engine to handle complex queries that involve finding semantically similar documents using vector embeddings, which is essential for a robust semantic search engine. 
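For reference, the definition file loaded above has roughly the following shape, shown here as a Python dict. The bucket, scope, collection, and field names mirror this tutorial's settings, but treat the exact keys and values as illustrative assumptions and consult the index-creation documentation linked above for the authoritative schema:

```python
# Illustrative sketch of a Search index definition matching this tutorial's
# settings (bucket `vector-search-testing`, scope `shared`, collection
# `voyage`, 1536-dim vectors, dot-product similarity, optimized for recall).
# Key names are assumptions modeled on the Couchbase Search index JSON format.
index_definition = {
    "name": "vector_search_voyage",
    "type": "fulltext-index",
    "sourceType": "gocbcore",
    "sourceName": "vector-search-testing",
    "params": {
        "doc_config": {"mode": "scope.collection.type_field"},
        "mapping": {
            "types": {
                "shared.voyage": {  # scope.collection
                    "enabled": True,
                    "properties": {
                        "embedding": {
                            "fields": [
                                {
                                    "name": "embedding",
                                    "type": "vector",
                                    "dims": 1536,
                                    "similarity": "dot_product",
                                    "vector_index_optimized_for": "recall",
                                    "index": True,
                                }
                            ]
                        }
                    },
                }
            }
        },
    },
}

vector_field = index_definition["params"]["mapping"]["types"]["shared.voyage"]["properties"]["embedding"]["fields"][0]
print(vector_field["dims"], vector_field["similarity"])
```

If your bucket, scope, or collection differ, change `sourceName` and the `scope.collection` type key accordingly.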
- - -```python -try: - scope_index_manager = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME).search_indexes() - - # Check if index already exists - existing_indexes = scope_index_manager.get_all_indexes() - index_name = index_definition["name"] - - if index_name in [index.name for index in existing_indexes]: - logging.info(f"Index '{index_name}' found") - else: - logging.info(f"Creating new index '{index_name}'...") - - # Create SearchIndex object from JSON definition - search_index = SearchIndex.from_json(index_definition) - - # Upsert the index (create if not exists, update if exists) - scope_index_manager.upsert_index(search_index) - logging.info(f"Index '{index_name}' successfully created/updated.") - -except QueryIndexAlreadyExistsException: - logging.info(f"Index '{index_name}' already exists. Skipping creation/update.") -except ServiceUnavailableException: - raise RuntimeError("Search service is not available. Please ensure the Search service is enabled in your Couchbase cluster.") -except InternalServerFailureException as e: - logging.error(f"Internal server error: {str(e)}") - raise -``` - - 2025-04-21 13:43:33,489 - INFO - Index 'vector_search_voyage' found - 2025-04-21 13:43:33,505 - INFO - Index 'vector_search_voyage' already exists. Skipping creation/update. - - -# Create Embeddings -Embeddings are created using the Voyage API. Embeddings are vectors (arrays of numbers) that represent the meaning of text in a high-dimensional space. These embeddings are crucial for tasks like semantic search, where the goal is to find text that is semantically similar to a query. The script uses a pre-trained model provided by Voyage to generate embeddings for the text in the dataset. 
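To build intuition for what these embeddings enable, here is a toy, dependency-free sketch of the comparison that happens later during search: vectors pointing in similar directions score close to 1 under cosine similarity. Real voyage-large-2 embeddings have 1536 dimensions; the 3-dimensional vectors and their "meanings" below are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity = dot product of the vectors divided by the
    # product of their lengths; 1.0 means "pointing the same way".
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

city_news = [0.9, 0.1, 0.2]   # pretend embedding: "Guardiola on City's form"
city_query = [0.8, 0.2, 0.1]  # pretend embedding: "Pep Guardiola reaction"
weather = [0.1, 0.9, 0.7]     # pretend embedding: an unrelated article

# The query is far more similar to the related article than the unrelated one
print(cosine_similarity(city_query, city_news) > cosine_similarity(city_query, weather))  # True
```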
- - -```python -try: - embeddings = VoyageAIEmbeddings(voyage_api_key=VOYAGE_API_KEY,model="voyage-large-2") - logging.info("Successfully created VoyageAIEmbeddings") -except Exception as e: - raise ValueError(f"Error creating VoyageAIEmbeddings: {str(e)}") -``` - - 2025-02-24 01:02:29,797 - INFO - Successfully created VoyageAIEmbeddings - - -# Set Up Vector Store -The vector store is set up to manage the embeddings created in the previous step. The vector store is essentially a database optimized for storing and retrieving high-dimensional vectors. In this case, the vector store is built on top of Couchbase, allowing the script to store the embeddings in a way that can be efficiently searched. - - - -```python -try: - vector_store = CouchbaseSearchVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding=embeddings, - index_name=INDEX_NAME, - ) - logging.info("Successfully created vector store") -except Exception as e: - raise ValueError(f"Failed to create vector store: {str(e)}") -``` - - 2025-02-24 01:02:34,123 - INFO - Successfully created vector store - - -# Load the BBC News Dataset -To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively. - -The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. 
The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version. - - -```python -try: - news_dataset = load_dataset( - "RealTimeData/bbc_news_alltime", "2024-12", split="train" - ) - print(f"Loaded the BBC News dataset with {len(news_dataset)} rows") - logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.") -except Exception as e: - raise ValueError(f"Error loading the BBC News dataset: {str(e)}") -``` - - 2025-02-24 01:02:39,306 - INFO - Successfully loaded the BBC News dataset with 2687 rows. - - - Loaded the BBC News dataset with 2687 rows - - -## Cleaning up the Data -We will use the content of the news articles for our RAG system. - -The dataset contains a few duplicate records. We are removing them to avoid duplicate results in the retrieval stage of our RAG system. - - -```python -news_articles = news_dataset["content"] -unique_articles = set() -for article in news_articles: - if article: - unique_articles.add(article) -unique_news_articles = list(unique_articles) -print(f"We have {len(unique_news_articles)} unique articles in our database.") -``` - - We have 1749 unique articles in our database. - - -## Saving Data to the Vector Store -To efficiently handle the large number of articles, we process them in batches of articles at a time. This batch processing approach helps manage memory usage and provides better control over the ingestion process. - -We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's add_texts method, we add the filtered articles to our vector database. The batch_size parameter controls how many articles are processed in each iteration. - -This approach offers several benefits: -1. Memory Efficiency: Processing in smaller batches prevents memory overload -2. Error Handling: If an error occurs, only the current batch is affected -3. 
Progress Tracking: Easier to monitor and track the ingestion progress
-4. Resource Management: Better control over CPU and network resource utilization
-
-We use a conservative batch size of 25 to ensure reliable operation. The optimal batch size depends on many factors, including:
-- Document sizes being inserted
-- Available system resources
-- Network conditions
-- Concurrent workload
-
-Consider measuring performance with your specific workload before adjusting.
-
-
-```python
-batch_size = 25
-
-# Filter out very long articles, then ingest the rest in batches
-articles = [article for article in unique_news_articles if article and len(article) <= 50000]
-
-try:
-    vector_store.add_texts(
-        texts=articles,
-        batch_size=batch_size
-    )
-    logging.info("Document ingestion completed successfully.")
-except Exception as e:
-    raise ValueError(f"Failed to save documents to vector store: {str(e)}")
-```
-
-    2025-02-24 01:39:56,883 - INFO - Document ingestion completed successfully.
-
-
-# Set Up Cache
-A cache is set up using Couchbase to store intermediate results and frequently accessed data. Caching is important for improving performance, as it reduces the need to repeatedly compute or retrieve the same data. The cache is linked to a specific collection in Couchbase and is used later in the script to store the results of language model queries.
-
-
-```python
-try:
-    cache = CouchbaseCache(
-        cluster=cluster,
-        bucket_name=CB_BUCKET_NAME,
-        scope_name=SCOPE_NAME,
-        collection_name=CACHE_COLLECTION,
-    )
-    logging.info("Successfully created cache")
-    set_llm_cache(cache)
-except Exception as e:
-    raise ValueError(f"Failed to create cache: {str(e)}")
-```
-
-    2025-02-24 01:39:59,753 - INFO - Successfully created cache
-
-
-# Create Language Model (LLM)
-The script initializes an OpenAI language model (`gpt-4o`) that will be used for generating responses to queries. LLMs are powerful tools for natural language understanding and generation, capable of producing human-like text based on input prompts.
The model is configured with specific parameters, such as the temperature, which controls the randomness of its outputs. - - - -```python -try: - llm = ChatOpenAI( - openai_api_key=OPENAI_API_KEY, - model="gpt-4o-2024-08-06", - temperature=0 - ) - logging.info(f"Successfully created OpenAI LLM with model gpt-4o-2024-08-06") -except Exception as e: - raise ValueError(f"Error creating OpenAI LLM: {str(e)}") -``` - - 2025-02-24 01:39:59,846 - INFO - Successfully created OpenAI LLM with model gpt-4o-2024-08-06 - - -# Perform Semantic Search -Semantic search in Couchbase involves converting queries and documents into vector representations using an embeddings model. These vectors capture the semantic meaning of the text and are stored directly in Couchbase. When a query is made, Couchbase performs a similarity search by comparing the query vector against the stored document vectors. The similarity metric used for this comparison is configurable, allowing flexibility in how the relevance of documents is determined. - -In the provided code, the search process begins by recording the start time, followed by executing the similarity_search_with_score method of the CouchbaseSearchVectorStore. This method searches Couchbase for the most relevant documents based on the vector similarity to the query. The search results include the document content and a similarity score that reflects how closely each document aligns with the query in the defined semantic space. The time taken to perform this search is then calculated and logged, and the results are displayed, showing the most relevant documents along with their similarity scores. This approach leverages Couchbase as both a storage and retrieval engine for vector data, enabling efficient and scalable semantic searches. The integration of vector storage and search capabilities within Couchbase allows for sophisticated semantic search operations without relying on external services for vector storage or comparison. 
- - -```python -query = "What was manchester city manager pep guardiola's reaction to the team's current form?" - -try: - # Perform the semantic search - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=10) - search_elapsed_time = time.time() - start_time - - logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds") - - # Display search results - print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):") - print("-" * 80) # Add separator line - for doc, score in search_results: - print(f"Score: {score:.4f}, Text: {doc.page_content}") - print("-" * 80) # Add separator between results - -except CouchbaseException as e: - raise RuntimeError(f"Error performing semantic search: {str(e)}") -except Exception as e: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - 2025-02-24 01:40:02,318 - INFO - Semantic search completed in 2.46 seconds - - - - Semantic Search Results (completed in 2.46 seconds): - -------------------------------------------------------------------------------- - Score: 0.7965, Text: 'Self-doubt, errors & big changes' - inside the crisis at Man City - - Pep Guardiola has not been through a moment like this in his managerial career. Manchester City have lost nine matches in their past 12 - as many defeats as they had suffered in their previous 106 fixtures. At the end of October, City were still unbeaten at the top of the Premier League and favourites to win a fifth successive title. Now they are seventh, 12 points behind leaders Liverpool having played a game more. It has been an incredible fall from grace and left people trying to work out what has happened - and whether Guardiola can make it right. After discussing the situation with those who know him best, I have taken a closer look at the future - both short and long term - and how the current crisis at Man City is going to be solved. 
- - Pep Guardiola's Man City have lost nine of their past 12 matches - - Guardiola has also been giving it a lot of thought. He has not been sleeping very well, as he has said, and has not been himself at times when talking to the media. He has been talking to a lot of people about what is going on as he tries to work out the reasons for City's demise. Some reasons he knows, others he still doesn't. What people perhaps do not realise is Guardiola hugely doubts himself and always has. He will be thinking "I'm not going to be able to get us out of this" and needs the support of people close to him to push away those insecurities - and he has that. He is protected by his people who are very aware, like he is, that there are a lot of people that want City to fail. It has been a turbulent time for Guardiola. Remember those marks he had on his head after the 3-3 draw with Feyenoord in the Champions League? He always scratches his head, it is a gesture of nervousness. Normally nothing happens but on that day one of his nails was far too sharp so, after talking to the players in the changing room where he scratched his head because of his usual agitated gesturing, he went to the news conference. His right-hand man Manel Estiarte sent him photos in a message saying "what have you got on your head?", but by the time Guardiola returned to the coaching room there was hardly anything there again. He started that day with a cover on his nose after the same thing happened at the training ground the day before. Guardiola was having a footballing debate with Kyle Walker about positional stuff and marked his nose with that same nail. There was also that remarkable news conference after the Manchester derby when he said "I don't know what to do". That is partly true and partly not true. Ignore the fact Guardiola suggested he was "not good enough". 
He actually meant he was not good enough to resolve the situation with the group of players he has available and with all the other current difficulties. There are obviously logical explanations for the crisis and the first one has been talked about many times - the absence of injured midfielder Rodri. You know the game Jenga? When you take the wrong piece out, the whole tower collapses. That is what has happened here. It is normal for teams to have an over-reliance on one player if he is the best in the world in his position. And you cannot calculate the consequences of an injury that rules someone like Rodri out for the season. City are a team, like many modern ones, in which the holding midfielder is a key element to the construction. So, when you take Rodri out, it is difficult to hold it together. There were Plan Bs - John Stones, Manuel Akanji, even Nathan Ake - but injuries struck. The big injury list has been out of the ordinary and the busy calendar has also played a part in compounding the issues. However, one factor even Guardiola cannot explain is the big uncharacteristic errors in almost every game from international players. Why did Matheus Nunes make that challenge to give away the penalty against Manchester United? Jack Grealish is sent on at the end to keep the ball and cannot do that. There are errors from Walker and other defenders. These are some of the best players in the world. Of course the players' mindset is important, and confidence is diminishing. Wrong decisions get taken so there is almost panic on the pitch instead of calm. There are also players badly out of form who are having to play because of injuries. Walker is now unable to hide behind his pace, I'm not sure Kevin de Bruyne is ever getting back to the level he used to be at, Bernardo Silva and Ilkay Gundogan do not have time to rest, Grealish is not playing at his best. 
Some of these players were only meant to be playing one game a week but, because of injuries, have played 12 games in 40 days. It all has a domino effect. One consequence is that Erling Haaland isn't getting the service to score. But the Norwegian still remains City's top-scorer with 13. Defender Josko Gvardiol is next on the list with just four. The way their form has been analysed inside the City camp is there have only been three games where they deserved to lose (Liverpool, Bournemouth and Aston Villa). But of course it is time to change the dynamic. - - Guardiola has never protected his players so much. He has not criticised them and is not going to do so. They have won everything with him. Instead of doing more with them, he has tried doing less. He has sometimes given them more days off to clear their heads, so they can reset - two days this week for instance. Perhaps the time to change a team is when you are winning, but no-one was suggesting Man City were about to collapse when they were top and unbeaten after nine league games. Some people have asked how bad it has to get before City make a decision on Guardiola. The answer is that there is no decision to be made. Maybe if this was Real Madrid, Barcelona or Juventus, the pressure from outside would be massive and the argument would be made that Guardiola has to go. At City he has won the lot, so how can anyone say he is failing? Yes, this is a crisis. But given all their problems, City's renewed target is finishing in the top four. That is what is in all their heads now. The idea is to recover their essence by improving defensive concepts that are not there and re-establishing the intensity they are known for. Guardiola is planning to use the next two years of his contract, which is expected to be his last as a club manager, to prepare a new Manchester City. When he was at the end of his four years at Barcelona, he asked two managers what to do when you feel people are not responding to your instructions. 
Do you go or do the players go? Sir Alex Ferguson and Rafael Benitez both told him that the players need to go. Guardiola did not listen because of his emotional attachment to his players back then and he decided to leave the Camp Nou because he felt the cycle was over. He will still protect his players now but there is not the same emotional attachment - so it is the players who are going to leave this time. It is likely City will look to replace five or six regular starters. Guardiola knows it is the end of an era and the start of a new one. Changes will not be immediate and the majority of the work will be done in the summer. But they are open to any opportunities in January - and a holding midfielder is one thing they need. In the summer City might want to get Spain's Martin Zubimendi from Real Sociedad and they know 60m euros (£50m) will get him. He said no to Liverpool last summer even though everything was agreed, but he now wants to move on and the Premier League is the target. Even if they do not get Zubimendi, that is the calibre of footballer they are after. A new Manchester City is on its way - with changes driven by Guardiola, incoming sporting director Hugo Viana and the football department. - -------------------------------------------------------------------------------- - Score: 0.7948, Text: Manchester City boss Pep Guardiola has won 18 trophies since he arrived at the club in 2016 - - Manchester City boss Pep Guardiola says he is "fine" despite admitting his sleep and diet are being affected by the worst run of results in his entire managerial career. In an interview with former Italy international Luca Toni for Amazon Prime Sport before Wednesday's Champions League defeat by Juventus, Guardiola touched on the personal impact City's sudden downturn in form has had. Guardiola said his state of mind was "ugly", that his sleep was "worse" and he was eating lighter as his digestion had suffered. 
City go into Sunday's derby against Manchester United at Etihad Stadium having won just one of their past 10 games. The Juventus loss means there is a chance they may not even secure a play-off spot in the Champions League. Asked to elaborate on his comments to Toni, Guardiola said: "I'm fine. "In our jobs we always want to do our best or the best as possible. When that doesn't happen you are more uncomfortable than when the situation is going well, always that happened. "In good moments I am happier but when I get to the next game I am still concerned about what I have to do. There is no human being that makes an activity and it doesn't matter how they do." Guardiola said City have to defend better and "avoid making mistakes at both ends". To emphasise his point, Guardiola referred back to the third game of City's current run, against a Sporting side managed by Ruben Amorim, who will be in the United dugout at the weekend. City dominated the first half in Lisbon, led thanks to Phil Foden's early effort and looked to be cruising. Instead, they conceded three times in 11 minutes either side of half-time as Sporting eventually ran out 4-1 winners. "I would like to play the game like we played in Lisbon on Sunday, believe me," said Guardiola, who is facing the prospect of only having three fit defenders for the derby as Nathan Ake and Manuel Akanji try to overcome injury concerns. If there is solace for City, it comes from the knowledge United are not exactly flying. Their comeback Europa League victory against Viktoria Plzen on Thursday was their third win of Amorim's short reign so far but only one of those successes has come in the Premier League, where United have lost their past two games against Arsenal and Nottingham Forest. Nevertheless, Guardiola can see improvements already on the red side of the city. "It's already there," he said. "You see all the patterns, the movements, the runners and the pace. He will do a good job at United, I'm pretty sure of that." 
- - Guardiola says skipper Kyle Walker has been offered support by the club after the City defender highlighted the racial abuse he had received on social media in the wake of the Juventus trip. "It's unacceptable," he said. "Not because it's Kyle - for any human being. "Unfortunately it happens many times in the real world. It is not necessary to say he has the support of the entire club. It is completely unacceptable and we give our support to him." - -------------------------------------------------------------------------------- - Score: 0.7755, Text: Pep Guardiola has said Manchester City will be his final managerial job in club football before he "maybe" coaches a national team. - - The former Barcelona and Bayern Munich boss has won 15 major trophies since taking charge of City in 2016. - - The 53-year-old Spaniard was approached in the summer about the possibility of becoming England manager, but last month signed a two-year contract extension with City until 2027. - - - ... (output truncated for brevity) - - -# Retrieval-Augmented Generation (RAG) with Couchbase and Langchain -Couchbase and LangChain can be seamlessly integrated to create RAG (Retrieval-Augmented Generation) chains, enhancing the process of generating contextually relevant responses. In this setup, Couchbase serves as the vector store, where embeddings of documents are stored. When a query is made, LangChain retrieves the most relevant documents from Couchbase by comparing the query’s embedding with the stored document embeddings. These documents, which provide contextual information, are then passed to a generative language model within LangChain. - -The language model, equipped with the context from the retrieved documents, generates a response that is both informed and contextually accurate. 
This integration allows the RAG chain to leverage Couchbase’s efficient storage and retrieval capabilities, while LangChain handles the generation of responses based on the context provided by the retrieved documents. Together, they create a powerful system that can deliver highly relevant and accurate answers by combining the strengths of both retrieval and generation. - - -```python -try: - template = """You are a helpful bot. If you cannot answer based on the context provided, respond with a generic answer. Answer the question as truthfully as possible using the context below: - {context} - Question: {question}""" - prompt = ChatPromptTemplate.from_template(template) - rag_chain = ( - {"context": vector_store.as_retriever(), "question": RunnablePassthrough()} - | prompt - | llm - | StrOutputParser() - ) - logging.info("Successfully created RAG chain") -except Exception as e: - raise ValueError(f"Error creating LLM chains: {str(e)}") -``` - - 2025-02-24 01:40:02,392 - INFO - Successfully created RAG chain - - - -```python -try: - # Get RAG response - start_time = time.time() - rag_response = rag_chain.invoke(query) - rag_elapsed_time = time.time() - start_time - - print(f"RAG Response: {rag_response}") - print(f"RAG response generated in {rag_elapsed_time:.2f} seconds") - -except InternalServerFailureException as e: - if "query request rejected" in str(e): - print("Error: Search request was rejected due to rate limiting. Please try again later.") - else: - print(f"Internal server error occurred: {str(e)}") -except Exception as e: - print(f"Unexpected error occurred: {str(e)}") -``` - - RAG Response: Pep Guardiola has expressed concern about Manchester City's current form, describing his state of mind as "ugly" and admitting that his sleep and diet have been affected. He acknowledged the team's poor run of results and emphasized the need to defend better and avoid mistakes. 
Despite the challenges, Guardiola remains calm and focused on finding solutions, expressing trust in his players and a determination to return to form. He has not criticized his players publicly and has instead offered them support, giving them more days off to reset. Guardiola is planning for the future, acknowledging the end of an era and the need for changes in the team. - RAG response generated in 7.56 seconds - - -# Using Couchbase as a caching mechanism -Couchbase can be effectively used as a caching mechanism for RAG (Retrieval-Augmented Generation) responses by storing and retrieving precomputed results for specific queries. This approach enhances the system's efficiency and speed, particularly when dealing with repeated or similar queries. When a query is first processed, the RAG chain retrieves relevant documents, generates a response using the language model, and then stores this response in Couchbase, with the query serving as the key. - -For subsequent requests with the same query, the system checks Couchbase first. If a cached response is found, it is retrieved directly from Couchbase, bypassing the need to re-run the entire RAG process. This significantly reduces response time because the computationally expensive steps of document retrieval and response generation are skipped. Couchbase's role in this setup is to provide a fast and scalable storage solution for caching these responses, ensuring that frequently asked queries can be answered more quickly and efficiently. 
-
-
-```python
-try:
-    queries = [
-        "What happened in the match between Fulham and Liverpool?",
-        "What was manchester city manager pep guardiola's reaction to the team's current form?", # Repeated query
-        "What happened in the match between Fulham and Liverpool?", # Repeated query
-    ]
-
-    for i, query in enumerate(queries, 1):
-        print(f"\nQuery {i}: {query}")
-        start_time = time.time()
-
-        response = rag_chain.invoke(query)
-        elapsed_time = time.time() - start_time
-        print(f"Response: {response}")
-        print(f"Time taken: {elapsed_time:.2f} seconds")
-except InternalServerFailureException as e:
-    if "query request rejected" in str(e):
-        print("Error: Search request was rejected due to rate limiting. Please try again later.")
-    else:
-        print(f"Internal server error occurred: {str(e)}")
-except Exception as e:
-    print(f"Unexpected error occurred: {str(e)}")
-```
-
-
-    Query 1: What happened in the match between Fulham and Liverpool?
-    Response: In the match between Fulham and Liverpool, the game ended in a 2-2 draw. Liverpool played the majority of the match with ten men after Andy Robertson received a red card in the 17th minute. Despite being a player down, Liverpool managed to equalize twice, with Diogo Jota scoring an 86th-minute equalizer. The performance was praised as impressive, with Liverpool maintaining more than 60% possession and leading in several attacking metrics.
-    Time taken: 6.54 seconds
-
-    Query 2: What was manchester city manager pep guardiola's reaction to the team's current form?
-    Response: Pep Guardiola has expressed concern about Manchester City's current form, describing his state of mind as "ugly" and admitting that his sleep and diet have been affected. He acknowledged the team's poor run of results and emphasized the need to defend better and avoid mistakes. Despite the challenges, Guardiola remains calm and focused on finding solutions, expressing trust in his players and a determination to return to form.
He has not criticized his players publicly and has instead offered them support, giving them more days off to reset. Guardiola is planning for the future, acknowledging the end of an era and the need for changes in the team.
-    Time taken: 1.98 seconds
-
-    Query 3: What happened in the match between Fulham and Liverpool?
-    Response: In the match between Fulham and Liverpool, the game ended in a 2-2 draw. Liverpool played the majority of the match with ten men after Andy Robertson received a red card in the 17th minute. Despite being a player down, Liverpool managed to equalize twice, with Diogo Jota scoring an 86th-minute equalizer. The performance was praised as impressive, with Liverpool maintaining more than 60% possession and leading in several attacking metrics.
-    Time taken: 1.85 seconds
-
-
-## Conclusion
-By following these steps, you’ll have a fully functional retrieval-augmented generation application, backed by semantic search, that leverages the strengths of Couchbase and Voyage. This guide is designed not just to show you how to build the system, but also to explain why each step is necessary, giving you a deeper understanding of the principles behind semantic search and how to implement it effectively. Whether you’re a newcomer to software development or an experienced developer looking to expand your skills, this guide will provide you with the knowledge and tools you need to create a powerful, AI-driven search engine.