Added new spring-ai couchbase vector store tutorial #53
---
# frontmatter
path: "/tutorial-java-spring-ai"
title: Couchbase Vector Search using Spring AI
short_title: Spring AI Vector Storage
description:
- Learn how to configure and use Couchbase vector search with Spring AI
- Learn how to vectorize data with Spring AI
- Learn how to retrieve vector data from Couchbase
content_type: tutorial
filter: sdk
technology:
- connectors
- vector search
tags:
- LangChain
- Artificial Intelligence
- Data Ingestion
sdk_language:
- java
length: 10 Mins
---

## About This Tutorial
This tutorial shows how to use a Couchbase cluster as an embedding store (vector store) for Spring AI.

## Example Source Code
The example source code for this tutorial is available in the [Spring AI demo application with Couchbase Vector Store](https://github.com/couchbase-examples/couchbase-spring-ai-demo) repository. Clone it with git:
```shell
git clone https://github.com/couchbase-examples/couchbase-spring-ai-demo.git
cd couchbase-spring-ai-demo
```

### What is Spring AI?

[Spring AI](https://spring.io/projects/spring-ai) is an extension of the Spring Framework that simplifies integrating AI capabilities into Spring applications. It provides abstractions and integrations for working with various AI services and models, so developers can add AI functionality without managing low-level implementation details.

> **Review comment:** Link to the Spring AI project.

Key features of Spring AI include:
- **Model integrations**: Pre-built connectors to popular AI models (such as OpenAI)
- **Prompt engineering**: Tools for crafting and managing prompts
- **Vector stores**: Abstractions for storing and retrieving vector embeddings
- **Document processing**: Utilities for working with unstructured data

### Why Use Spring AI?

Spring AI brings several benefits to Java developers:
1. **Familiar programming model**: Uses Spring's dependency injection and configuration
2. **Abstraction layer**: Provides consistent interfaces across different AI providers
3. **Enterprise-ready**: Built with production use cases in mind
4. **Simplified development**: Reduces boilerplate code for AI integrations

## Couchbase Embedding Store
The Couchbase Spring AI integration stores each embedding in a separate document and uses an FTS (Full Text Search) vector index to query the stored vectors. It supports storing embeddings together with their metadata, searching them by similarity, and deleting them.

> **Review comment:** This last phrase sounds a bit awkward. Is the framework supposed to support more things? After reading it, feels like there is something missing that we haven't implemented yet.
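
The storage model described above (one document per embedding, metadata kept alongside the vector, removal by ID) can be pictured with a small in-memory stand-in. This is only an illustration under stated assumptions: `InMemoryEmbeddingStore` and `EmbeddingDocument` are hypothetical names, a `HashMap` plays the role of the Couchbase collection, and the fields do not reflect the actual document layout written by the Couchbase integration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a "one document per embedding" store.
// Not a Couchbase or Spring AI class; the field layout is illustrative only.
class InMemoryEmbeddingStore {

    /** One stored entry: the original text, its metadata, and its embedding vector. */
    record EmbeddingDocument(String id, String text, Map<String, Object> metadata, float[] embedding) {}

    // Keyed by document ID: each embedding lives in its own "document".
    private final Map<String, EmbeddingDocument> documents = new HashMap<>();

    /** Stores the document, replacing any existing entry with the same ID. */
    void add(EmbeddingDocument doc) {
        documents.put(doc.id(), doc);
    }

    /** Removes the documents with the given IDs; unknown IDs are ignored. */
    void delete(List<String> ids) {
        ids.forEach(documents::remove);
    }

    int size() {
        return documents.size();
    }

    EmbeddingDocument get(String id) {
        return documents.get(id);
    }

    public static void main(String[] args) {
        InMemoryEmbeddingStore store = new InMemoryEmbeddingStore();
        store.add(new EmbeddingDocument("1", "Cup final report", Map.of("title", "Football"), new float[] {0.9f, 0.1f}));
        store.add(new EmbeddingDocument("2", "Interest rates rise", Map.of("title", "Economy"), new float[] {0.1f, 0.9f}));
        store.delete(List.of("2"));
        System.out.println(store.size()); // prints 1
    }
}
```

Keying by ID means that adding a document whose ID already exists replaces the earlier entry in this toy version.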

## Project Structure

```
src/main/java/com/couchbase_spring_ai/demo/
├── Config.java # Application configuration
├── Controller.java # REST API endpoints
└── CouchbaseSpringAiDemoApplications.java # Application entry point

src/main/resources/
├── application.properties # Application settings
└── bbc_news_data.json # Sample data
```

## Setup and Configuration

### Prerequisites
- [Couchbase Capella](https://docs.couchbase.com/cloud/get-started/create-account.html) account or locally installed [Couchbase Server](/tutorial-couchbase-installation-options)
- Java 21
- Maven
- A running Couchbase cluster (the sample configuration below connects to `127.0.0.1`)
- OpenAI API key

> **Review comment:** Is java 21 a must? Check the framework requirements. I think it has to be at least java 17.

> **Review comment:** You don't need couchbase server running locally. Just add "Couchbase Server".

### Configuration Details

The application is configured in `application.properties`:

```properties
spring.application.name=spring-ai-demo
spring.ai.openai.api-key=your-openai-api-key
spring.couchbase.connection-string=couchbase://127.0.0.1
spring.couchbase.username=Administrator
spring.couchbase.password=password
```

Replace the placeholder API key and the Couchbase connection string and credentials with the values for your environment.
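
Rather than hard-coding the key, Spring Boot can resolve it from an environment variable through a placeholder such as `spring.ai.openai.api-key=${OPENAI_API_KEY}`. If you also want to fail fast in plain Java when the key is missing, a minimal sketch could look like the following. The variable name `OPENAI_API_KEY` and the `ApiKeys.requireOpenAiKey` helper are assumptions for illustration, not part of the demo project.

```java
import java.util.Map;

// Hypothetical helper: not part of the demo project or of Spring AI.
final class ApiKeys {

    private ApiKeys() {}

    /**
     * Returns the OpenAI API key from the given environment map, failing fast if it is
     * missing or blank. The variable name OPENAI_API_KEY is an assumption.
     */
    static String requireOpenAiKey(Map<String, String> env) {
        String key = env.get("OPENAI_API_KEY");
        if (key == null || key.isBlank()) {
            throw new IllegalStateException("OPENAI_API_KEY is not set");
        }
        return key.strip(); // drop accidental surrounding whitespace
    }

    public static void main(String[] args) {
        // System.getenv() returns the process environment as an unmodifiable Map<String, String>
        String key = requireOpenAiKey(System.getenv());
        System.out.println("OpenAI key loaded (" + key.length() + " characters)");
    }
}
```

Taking the environment as a `Map` parameter keeps the check easy to test without changing real environment variables.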

## Key Components

### Configuration Class (`Config.java`)

This class creates the beans needed to:
- Connect to the Couchbase cluster
- Set up the OpenAI embedding model (the API key is injected from the `spring.ai.openai.api-key` property, which can be supplied through an environment variable instead of being hard-coded)
- Configure the Couchbase vector store

```java
package com.couchbase_spring_ai.demo;

import com.couchbase.client.java.Cluster;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.openai.OpenAiEmbeddingModel;
import org.springframework.ai.openai.api.OpenAiApi;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.ai.vectorstore.couchbase.CouchbaseSearchVectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class Config {
    // Values injected from application.properties
    @Value("${spring.couchbase.connection-string}")
    private String connectionUrl;
    @Value("${spring.couchbase.username}")
    private String username;
    @Value("${spring.couchbase.password}")
    private String password;
    @Value("${spring.ai.openai.api-key}")
    private String openaiKey;

    // Connects to the Couchbase cluster with the configured connection string and credentials
    @Bean
    public Cluster cluster() {
        return Cluster.connect(this.connectionUrl, this.username, this.password);
    }

    // Flag passed to the vector store's initializeSchema option (see "Vector Index" below)
    @Bean
    public Boolean initializeSchema() {
        return true;
    }

    // OpenAI embedding model used to vectorize documents and queries
    @Bean
    public EmbeddingModel embeddingModel() {
        return new OpenAiEmbeddingModel(OpenAiApi.builder().apiKey(this.openaiKey).build());
    }

    // Vector store backed by the bucket, scope and collection named "test"
    @Bean
    public VectorStore couchbaseSearchVectorStore(Cluster cluster,
                                                  EmbeddingModel embeddingModel,
                                                  Boolean initializeSchema) {
        return CouchbaseSearchVectorStore
                .builder(cluster, embeddingModel)
                .bucketName("test")
                .scopeName("test")
                .collectionName("test")
                .initializeSchema(initializeSchema)
                .build();
    }
}
```

The vector store is configured to use:
- Bucket: `test`
- Scope: `test`
- Collection: `test`

### Vector Store Integration

The application uses `CouchbaseSearchVectorStore`, which:
- Stores document embeddings in Couchbase
- Provides similarity search capabilities
- Maintains metadata alongside vector embeddings

> **Review comment:** Link to the JavaDocs of this class.

### Vector Index
The embedding store uses an FTS vector index to perform vector similarity lookups. If the configured vector index name does not exist on the cluster, the store attempts to create a new index with a default configuration, based on the provided initialization settings. Review the settings of the created index and adjust them to your use case. For more information about vector search and FTS index configuration, see the [Couchbase Documentation](https://docs.couchbase.com/server/current/vector-search/vector-search.html).

### Controller Class (`Controller.java`)

Exposes two REST API endpoints:
- `/tutorial/load`: Loads the sample BBC news data into Couchbase
- `/tutorial/search`: Performs a semantic search for sports-related news articles

##### Load functionality
```java
...
Document doc = new Document(String.format("%s", i + 1), j.getString("content"), Map.of("title", j.getString("title")));
...
// VectorStore.add expects a List<Document>, so the single document is wrapped in a list
this.couchbaseSearchVectorStore.add(List.of(doc));
...
```

- A new `Document` is created for each article. Its ID is `String.format("%s", i + 1)`, that is, the 1-based loop index `i` as a string, so IDs are unique within one load and identical across repeated calls. The article title, taken from the parsed JSON entry `j`, is attached as metadata under the key `"title"`.
- The document is then passed to `couchbaseSearchVectorStore`, the `VectorStore` bean defined in `Config`. The store computes an embedding for the document content with the configured OpenAI embedding model and saves the text, metadata, and embedding in Couchbase so that the document can be found by vector search.
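
The ID scheme described above can be illustrated without Spring. The sketch below turns a list of parsed articles into documents whose IDs are their 1-based positions, so processing the same input twice yields the same IDs. `Article`, `SimpleDocument` and `toDocuments` are hypothetical stand-ins for the parsed JSON entries and Spring AI's `Document`; `String.valueOf(i + 1)` yields the same string as `String.format("%s", i + 1)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class DeterministicIds {

    /** Hypothetical stand-in for one parsed entry of bbc_news_data.json. */
    record Article(String title, String content) {}

    /** Hypothetical stand-in for Spring AI's Document: ID, text and metadata. */
    record SimpleDocument(String id, String text, Map<String, Object> metadata) {}

    /** Builds one document per article; the ID is the article's 1-based position. */
    static List<SimpleDocument> toDocuments(List<Article> articles) {
        List<SimpleDocument> docs = new ArrayList<>(articles.size());
        for (int i = 0; i < articles.size(); i++) {
            Article a = articles.get(i);
            docs.add(new SimpleDocument(
                    String.valueOf(i + 1),            // same string as String.format("%s", i + 1)
                    a.content(),
                    Map.of("title", a.title())));     // title kept as metadata
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Article> articles = List.of(
                new Article("Match report", "The home side won 2-1."),
                new Article("Budget news", "The chancellor set out new plans."));
        System.out.println(toDocuments(articles).stream().map(SimpleDocument::id).toList()); // prints [1, 2]
        System.out.println(toDocuments(articles).equals(toDocuments(articles)));             // prints true
    }
}
```

Because the IDs depend only on the position in the input file, they stay stable only as long as the file's order does not change.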

##### Search functionality
```java
List<Document> results = this.couchbaseSearchVectorStore.similaritySearch(SearchRequest.builder()
        .query("Give me some sports news")
        .similarityThreshold(0.75)
        .topK(15)
        .build());

// Map each hit to a simple content/metadata structure for the JSON response
return results.stream()
        .map(doc -> Map.of("content", doc.getText(), "metadata", doc.getMetadata()))
        .collect(Collectors.toList());
```

- A `SearchRequest` is built with the query string "Give me some sports news". `similarityThreshold(0.75)` keeps only documents whose similarity score is at least 0.75, and `topK(15)` caps the result at the 15 most similar documents.
- `similaritySearch` on `couchbaseSearchVectorStore` embeds the query with the configured embedding model and runs a vector similarity search against the stored documents.
- The result, a `List<Document>`, is processed with a Java stream: each document is mapped to a simple structure holding its text (`content`) and its metadata (`metadata`), and the list of these maps is returned.
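
To make the effect of `similarityThreshold` and `topK` concrete, here is an in-memory sketch that scores stored vectors against a query vector, drops results below the threshold, keeps at most `topK`, and maps each hit to a content/metadata map as the controller does. It assumes cosine similarity and tiny hand-written vectors; Couchbase computes scores inside its vector index using the similarity metric configured on that index, so this only illustrates the filtering semantics, not how `CouchbaseSearchVectorStore` works internally. `ToySimilaritySearch`, `Stored` and `search` are hypothetical names.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class ToySimilaritySearch {

    /** Hypothetical stored entry: text, metadata and a precomputed embedding. */
    record Stored(String text, Map<String, Object> metadata, double[] embedding) {}

    /** Cosine similarity: dot(a, b) / (|a| * |b|). */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Keeps hits scoring at or above the threshold, best first, at most topK of them. */
    static List<Map<String, Object>> search(List<Stored> store, double[] query,
                                            double threshold, int topK) {
        record Scored(Stored doc, double score) {}
        return store.stream()
                .map(doc -> new Scored(doc, cosine(doc.embedding(), query)))
                .filter(s -> s.score() >= threshold)
                .sorted(Comparator.comparingDouble(Scored::score).reversed())
                .limit(topK)
                .map(s -> Map.<String, Object>of("content", s.doc().text(),
                                                 "metadata", s.doc().metadata()))
                .toList();
    }

    public static void main(String[] args) {
        List<Stored> store = List.of(
                new Stored("Cup final report", Map.of("title", "Football"), new double[] {0.9, 0.1}),
                new Stored("Interest rates rise", Map.of("title", "Economy"), new double[] {0.1, 0.9}));
        double[] query = {1.0, 0.0}; // stands in for the embedding of "Give me some sports news"
        System.out.println(search(store, query, 0.75, 15));
    }
}
```

Raising the threshold trades recall for precision, while `topK` only bounds the size of the result; with a strict threshold the search can return fewer than `topK` documents, or none.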

## Using the Application

### Loading Data

1. Start the application, for example with `mvn spring-boot:run`
2. Send a GET request to `http://localhost:8080/tutorial/load`
3. This loads the BBC news articles from the bundled `bbc_news_data.json` file into Couchbase, creating an embedding for each one via OpenAI

### Performing Similarity Searches

1. Send a GET request to `http://localhost:8080/tutorial/search`
2. The application searches for documents semantically similar to "Give me some sports news"
3. Results are returned with their content and metadata, sorted by similarity score
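
If you prefer to call the two endpoints from Java instead of a browser or `curl`, the JDK's built-in `java.net.http.HttpClient` is enough. This sketch assumes the application is running on `http://localhost:8080`, as in the steps above; `TutorialClient` is a hypothetical helper, not part of the demo project.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client for the tutorial's two GET endpoints.
class TutorialClient {

    static final String BASE_URL = "http://localhost:8080"; // assumed local address of the demo app

    /** Builds a GET request for one of the tutorial endpoints, e.g. "/tutorial/load". */
    static HttpRequest get(String baseUrl, String path) {
        return HttpRequest.newBuilder(URI.create(baseUrl + path)).GET().build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();

        // 1. Load the sample data (embeddings are created via OpenAI, so this can take a while)
        HttpResponse<String> load = client.send(get(BASE_URL, "/tutorial/load"),
                HttpResponse.BodyHandlers.ofString());
        System.out.println("load -> HTTP " + load.statusCode());

        // 2. Run the "Give me some sports news" similarity search and print the response body
        HttpResponse<String> search = client.send(get(BASE_URL, "/tutorial/search"),
                HttpResponse.BodyHandlers.ofString());
        System.out.println("search -> HTTP " + search.statusCode());
        System.out.println(search.body());
    }
}
```

Run the load step once before searching; calling it again reuses the same document IDs.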

## Resources

- [Spring AI Documentation](https://docs.spring.io/spring-ai/reference/index.html)
- [Couchbase Vector Search](https://docs.couchbase.com/server/current/fts/vector-search.html)
- [OpenAI Embeddings Documentation](https://platform.openai.com/docs/guides/embeddings)
- [Spring Boot Documentation](https://docs.spring.io/spring-boot/docs/current/reference/html/)

> **Review comment:** Would be nice to make this phrase a bit more engaging.