diff --git a/apps/oracle-database-java-agent-memory/.gitignore b/apps/oracle-database-java-agent-memory/.gitignore index 5cc184fd..800d9629 100644 --- a/apps/oracle-database-java-agent-memory/.gitignore +++ b/apps/oracle-database-java-agent-memory/.gitignore @@ -7,3 +7,9 @@ src/web/venv/ # Local config (contains credentials) src/chatserver/src/main/resources/application-local.yaml + +# ONNX models (large binaries) +models/ + +# Oracle database data +oradata/ diff --git a/apps/oracle-database-java-agent-memory/README.md b/apps/oracle-database-java-agent-memory/README.md index ea03f092..66baa6cc 100644 --- a/apps/oracle-database-java-agent-memory/README.md +++ b/apps/oracle-database-java-agent-memory/README.md @@ -104,7 +104,7 @@ Download the pre-built `all-MiniLM-L12-v2` ONNX model from [Oracle ML](https://b ```bash podman exec oradb mkdir -p /opt/oracle/dumps -podman cp all_MiniLM_L12_v2.onnx oradb:/opt/oracle/dumps/ +podman cp models/all_MiniLM_L12_v2.onnx oradb:/opt/oracle/dumps/ podman exec -i oradb sqlplus pdbadmin/Oracle123@freepdb1 < setup-hybrid-search.sql ``` diff --git a/apps/oracle-database-java-agent-memory/docs/articles/ai-agent-memory-spring-ai-oracle-wordpress.md b/apps/oracle-database-java-agent-memory/docs/articles/ai-agent-memory-spring-ai-oracle-wordpress.md new file mode 100644 index 00000000..e6bcd9c7 --- /dev/null +++ b/apps/oracle-database-java-agent-memory/docs/articles/ai-agent-memory-spring-ai-oracle-wordpress.md @@ -0,0 +1,298 @@ +--- +title: "Build AI Agent Memory with Spring AI and Oracle Database 26ai" +description: "Learn how to give an AI agent persistent episodic, semantic, and procedural memory using Spring AI, Oracle Database 26ai Hybrid Vector Indexes, and in-database ONNX embeddings." +primary_keyword: "AI agent memory" +products: ["Oracle AI Database 26ai", "Oracle Hybrid Vector Indexes"] +audience: ["Developers", "AI engineers"] +--- + +# How I Gave Memory to an AI Agent Using Spring AI and Oracle Database + + +
> **TL;DR:** LLMs forget everything between sessions. This post shows how to build three types of persistent memory — episodic (chat history), semantic (domain knowledge via hybrid search), and procedural (tool calls) — using Spring AI and a single Oracle Database 26ai instance.
+ + + +Every LLM has the same problem: it forgets everything the moment the conversation ends, sometimes even during long conversations. Spend twenty minutes explaining your project setup, your constraints, your preferences — and it nails the answer. Close the tab, open a new session, and it greets you like a stranger. All that context, gone.
+ + + +If you want to build an AI agent — something that actually remembers context, knows things about your domain, and can take action — you need to give it memory. The practical kind, where it remembers what you said, retrieves facts you taught it, and executes real workflows backed by database queries.
+ + + +This post walks through a proof of concept that does exactly that. Three types of memory, one database, not much code.
## Architecture

The agent runs on Spring Boot with Spring AI. Ollama handles chat inference (qwen2.5) locally. Oracle AI Database 26ai stores all three memory types: a relational table for chat history (episodic), a hybrid vector index for domain knowledge retrieval (semantic), and application tables queried by @Tool methods (procedural). Embeddings are computed in-database by a loaded ONNX model (all-MiniLM-L12-v2), so there are no external embedding API calls. A Streamlit frontend provides a simple web UI.
Two memory advisors and all six tools run on every request. The agent simultaneously remembers what you said, looks up relevant knowledge, and knows how to perform tasks — all from a single Oracle Database instance. No Pinecone. No Redis. No second database. One connection pool, one set of credentials, one thing to monitor.
## Prerequisites

- Ollama with the qwen2.5 model pulled
- The pre-built ONNX embedding model (all_MiniLM_L12_v2.onnx) for in-database embeddings

## Setting up the database

Start an Oracle AI Database 26ai instance and run the one-time setup script to load the ONNX embedding model and create the hybrid vector index. This enables in-database embeddings and combined vector + keyword search.
```sql
-- Load the ONNX model for in-database embeddings
BEGIN
  DBMS_VECTOR.LOAD_ONNX_MODEL(
    directory  => 'DM_DUMP',
    file_name  => 'all_MiniLM_L12_v2.onnx',
    model_name => 'ALL_MINILM_L12_V2'
  );
END;
/

-- Create a hybrid index: vector similarity + Oracle Text keyword search
CREATE HYBRID VECTOR INDEX POLICY_HYBRID_IDX
ON POLICY_DOCS(content)
PARAMETERS('MODEL ALL_MINILM_L12_V2 VECTOR_IDXTYPE HNSW');
```

Once the index exists, embeddings are computed automatically when rows are inserted — no external embedding API calls needed.
## Procedural memory: @Tool methods

Procedural memory is implemented as @Tool-annotated methods in a Spring component. These are real database queries via JPA that the LLM can call when it decides a task requires action, not just an answer. The @Tool description tells the LLM when to use each method, and @ToolParam describes the arguments.
```java
@Tool(description = "Look up the status of a customer order by its order ID. " +
        "Returns the current status including shipping information.")
public String lookupOrderStatus(
        @ToolParam(description = "The order ID to look up, e.g. ORD-1001") String orderId) {
    // Fetches order from DB via JPA, returns formatted status string
}

@Tool(description = "Initiate a product return for a given order. " +
        "Validates the order exists, checks that it is in DELIVERED status, " +
        "and verifies the return is within the 30-day return window.")
public String initiateReturn(
        @ToolParam(description = "The order ID to return") String orderId,
        @ToolParam(description = "The reason for the return") String reason) {
    // Validates order exists, checks DELIVERED status and 30-day window, updates status via JPA
}
```

+The full class has six tools: getCurrentDateTime, listOrders, lookupOrderStatus, initiateReturn, escalateToSupport, and listSupportTickets. The LLM decides when to act; the Java methods define how.
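The 30-day check described by initiateReturn is easy to get wrong around boundaries, so here is a hypothetical, self-contained sketch of that validation in plain Java. The class and method names are invented for illustration; in the POC these checks run inside the @Tool method against order rows via JPA.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch of the validation initiateReturn performs before
// updating an order. The DELIVERED status and 30-day window come from the
// tool description above; everything else is illustrative.
public class ReturnPolicy {
    static final int RETURN_WINDOW_DAYS = 30;

    public static String validateReturn(String status, LocalDate deliveredOn, LocalDate today) {
        if (!"DELIVERED".equals(status)) {
            return "Only DELIVERED orders can be returned (current status: " + status + ")";
        }
        long days = ChronoUnit.DAYS.between(deliveredOn, today);
        if (days > RETURN_WINDOW_DAYS) {
            return "Return window expired " + (days - RETURN_WINDOW_DAYS) + " day(s) ago";
        }
        return "OK";
    }

    public static void main(String[] args) {
        // Delivered 19 days ago: within the window
        System.out.println(validateReturn("DELIVERED",
                LocalDate.of(2025, 1, 1), LocalDate.of(2025, 1, 20))); // OK
        // Not yet delivered: rejected regardless of dates
        System.out.println(validateReturn("SHIPPED",
                LocalDate.of(2025, 1, 1), LocalDate.of(2025, 1, 20)));
    }
}
```

Returning a human-readable string rather than throwing is deliberate: the tool's return value goes straight back to the LLM, which can relay the reason to the user.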
## Wiring it together: the controller

The controller builds a single ChatClient with two advisors and six tools. MessageChatMemoryAdvisor handles episodic memory — it loads the last 100 messages for the current conversation from a relational table and saves each new exchange. RetrievalAugmentationAdvisor with a custom OracleHybridDocumentRetriever handles semantic memory — it calls DBMS_HYBRID_VECTOR.SEARCH to run vector + keyword search in parallel, fused with Reciprocal Rank Fusion (RRF). The tools are registered via .defaultTools(agentTools).
```java
@RestController
@RequestMapping("/api/v1/agent")
public class AgentController {

    public AgentController(ChatClient.Builder builder,
                           JdbcChatMemoryRepository chatMemoryRepository,
                           JdbcTemplate jdbcTemplate,
                           AgentTools agentTools) {
        // Builds a ChatClient with:
        // - MessageChatMemoryAdvisor (episodic: last 100 messages per conversation)
        // - RetrievalAugmentationAdvisor + OracleHybridDocumentRetriever (semantic: hybrid search)
        // - AgentTools via .defaultTools() (procedural: 6 @Tool methods)
        // - System prompt defining the agent persona and tool usage rules
    }

    @PostMapping("/chat")
    public ResponseEntity<String> chat(
            @RequestBody String message,
            @RequestHeader("X-Conversation-Id") String conversationId) {
        // Sends message to ChatClient with conversation ID, returns LLM response
    }

    @PostMapping("/knowledge")
    public ResponseEntity<String> addKnowledge(@RequestBody String content) {
        // Inserts text into POLICY_DOCS table via JDBC (hybrid index handles embedding)
    }
}
```

All three memory types run on every request. The agent simultaneously remembers what you said, looks up relevant knowledge, and knows how to perform tasks.
## Semantic memory: hybrid retrieval

The custom OracleHybridDocumentRetriever implements Spring AI's DocumentRetriever interface and calls DBMS_HYBRID_VECTOR.SEARCH via JDBC. It passes a JSON parameter specifying the hybrid index, the RRF scorer, and a keyword match clause. This bypasses OracleVectorStore entirely for retrieval.
Why hybrid instead of pure vector search? Dense embeddings capture meaning — a query about "return policy" will match documents about refunds and exchanges. But they're weak on exact terms: a query for "ORD-1001" degrades because the embedding model encodes semantics, not keywords. Hybrid search covers both: the vector side handles meaning, the keyword side handles exact matches, and RRF merges the two result lists by rank position.
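RRF itself is simple enough to show in a few lines. Below is a toy, self-contained Java sketch — not Oracle's implementation (the real fusion happens inside DBMS_HYBRID_VECTOR.SEARCH), and the document names are invented. Each document's fused score is the sum of 1 / (k + rank) over every result list it appears in; k = 60 is the customary constant from the RRF literature.

```java
import java.util.*;

// Toy illustration of Reciprocal Rank Fusion: merge ranked result lists
// by rank position, with no access to the underlying scores.
public class RrfSketch {
    @SafeVarargs
    public static Map<String, Double> fuse(int k, List<String>... rankedLists) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> list : rankedLists) {
            for (int rank = 0; rank < list.size(); rank++) {
                // rank is 0-based here, so the best hit scores 1 / (k + 1)
                scores.merge(list.get(rank), 1.0 / (k + rank + 1), Double::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        List<String> vectorHits  = List.of("refund-policy", "exchange-faq", "shipping");
        List<String> keywordHits = List.of("ORD-1001-note", "refund-policy");
        Map<String, Double> fused = fuse(60, vectorHits, keywordHits);
        List<String> merged = fused.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .map(Map.Entry::getKey).toList();
        // "refund-policy" wins: it scores in both lists
        System.out.println(merged.get(0)); // refund-policy
    }
}
```

Note what RRF buys you: because it merges by rank rather than raw score, it never has to reconcile cosine distances with keyword relevance scores, which live on incompatible scales.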
## Running the demo

Start the Oracle DB container, install Ollama, pull the chat model, run the Spring Boot backend with the local profile, and optionally start the Streamlit UI.
Quick test with curl:
```bash
curl -X POST http://localhost:8080/api/v1/agent/chat \
  -H "Content-Type: text/plain" \
  -H "X-Conversation-Id: test-1" \
  -d "What orders do I have?"
```

The agent will use procedural memory (the listOrders tool) to query the database and return the demo orders. Try following up with "What is your return policy?" to see semantic memory (hybrid search over policy documents) in action, then "My name is Victor" followed later by "What's my name?" to test episodic memory.
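The episodic window behaves like a bounded queue: keep the last N messages, evict the oldest. Here is a toy model of that policy in plain Java — it assumes nothing about Spring AI's internals (the advisor in the POC persists messages to Oracle via JDBC rather than holding them in memory), and the class name is invented.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Toy model of a per-conversation message window: keep only the last N
// messages, the same policy the POC applies with its 100-message window.
public class WindowedMemory {
    private final int maxMessages;
    private final Deque<String> messages = new ArrayDeque<>();

    public WindowedMemory(int maxMessages) { this.maxMessages = maxMessages; }

    public void add(String message) {
        messages.addLast(message);
        if (messages.size() > maxMessages) {
            messages.removeFirst(); // evict the oldest message
        }
    }

    public List<String> window() { return List.copyOf(messages); }

    public static void main(String[] args) {
        WindowedMemory memory = new WindowedMemory(3);
        for (String m : List.of("user: My name is Victor", "assistant: Hi Victor!",
                "user: What's my name?", "assistant: Victor")) {
            memory.add(m);
        }
        System.out.println(memory.window().size()); // 3: the oldest message was evicted
    }
}
```

This is also why the window size matters: set it too small and "My name is Victor" can fall out of the window before you ask "What's my name?".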
## Troubleshooting

- Verify the ONNX model is loaded with `SELECT MODEL_NAME FROM USER_MINING_MODELS`, and confirm the hybrid index was created with `SELECT INDEX_NAME FROM USER_INDEXES WHERE INDEX_NAME = 'POLICY_HYBRID_IDX'`.
- Check that the `SPRING_AI_CHAT_MEMORY` table was auto-created (`initialize-schema: always` in `application.yaml`), and verify you are sending the same `X-Conversation-Id` header across requests.
- Make sure the `@Tool` descriptions are clear enough for the LLM to match, and check Ollama logs for tool-calling output. The qwen2.5 model supports tool calling natively.

## FAQ

**Why does the agent need three types of memory instead of just chat history?**
Chat history (episodic memory) only covers what was said in the conversation. Semantic memory lets the agent retrieve domain knowledge — like return policies or shipping rules — that was never mentioned in chat. Procedural memory lets it take actions, such as looking up an order or initiating a return, by calling tool methods backed by real database queries.
**Why use hybrid search instead of plain vector similarity?**
Pure vector search matches by meaning, which works well for natural-language questions but struggles with exact terms like product codes or order IDs. Hybrid search runs vector and keyword search in parallel and merges the results by rank position (Reciprocal Rank Fusion), so the agent finds relevant documents whether the match is semantic, lexical, or both.
**Do I need a separate vector database to build this?**
No. Oracle AI Database 26ai supports relational tables, hybrid vector indexes, and full-text search in a single instance. The POC uses one connection pool and one set of credentials for chat history, vector retrieval, and all application data.
**How are the embeddings generated?**
An ONNX model (all-MiniLM-L12-v2) is loaded directly into Oracle Database. Embeddings are computed automatically whenever a row is inserted into the indexed table — no external API calls and no separate embedding service required.
**What are the limitations?**
This is a proof of concept. There's no authentication, no rate limiting, and no streaming responses. It demonstrates the architecture and approach — production use would require hardening those areas.