# Oracle Database for Java Agent Memory with Spring AI

POC demonstrating AI agent memory using Spring AI with Oracle AI Database 26ai. The agent has three memory layers: episodic memory (chat history persisted via JDBC), semantic memory (domain knowledge retrieved via Oracle Hybrid Vector Search — vector similarity + keyword search fused with Reciprocal Rank Fusion), and procedural memory (DB-backed @Tool-annotated methods the LLM can call to perform actions). Embeddings are computed in-database using a loaded ONNX model. Demo data (8 orders + 12 policy documents) is auto-seeded on startup for a complete end-to-end demo flow.
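The three layers meet on a single Spring AI `ChatClient`. A rough sketch of the wiring, assuming Spring AI 1.x APIs; the `OracleHybridDocumentRetriever` and `OrderTools` names follow the diagrams below, and a JDBC-backed `ChatMemory` bean is assumed rather than shown:

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.rag.advisor.RetrievalAugmentationAdvisor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class AgentConfig {

    @Bean
    ChatClient agentChatClient(ChatClient.Builder builder,
                               ChatMemory chatMemory,                    // episodic: JDBC-backed history
                               OracleHybridDocumentRetriever retriever,  // semantic: hybrid vector search
                               OrderTools orderTools) {                  // procedural: @Tool methods
        return builder
                .defaultAdvisors(
                        // replay persisted conversation history into each prompt
                        MessageChatMemoryAdvisor.builder(chatMemory).build(),
                        // retrieve policy documents and augment the prompt (RAG)
                        RetrievalAugmentationAdvisor.builder()
                                .documentRetriever(retriever)
                                .build())
                // expose DB-backed actions the LLM may call
                .defaultTools(orderTools)
                .build();
    }
}
```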

## Architecture

```mermaid
graph LR
    UI["Streamlit UI (:8501)"] --> API["Spring Boot (:8080)"]
    API --> Ollama["Ollama<br/>(LLM chat only)"]
    API --> CM["Chat Memory Table<br/>(episodic memory)"]
    API --> HV["Hybrid Vector Index<br/>(semantic memory)"]
    API --> PM["@Tool Methods<br/>(procedural memory)"]
    PM --> OT["Order & Ticket Tables"]

    subgraph Oracle AI Database 26ai
        CM
        HV
        ONNX["ONNX Model<br/>(all-MiniLM-L12-v2)"]
        HV --> ONNX
        OT
    end
```

## Memory Layers

```mermaid
graph TD
    Agent["Agent (ChatClient)"]
    Agent --> E["Episodic Memory<br/>MessageChatMemoryAdvisor"]
    Agent --> S["Semantic Memory<br/>RetrievalAugmentationAdvisor"]
    Agent --> P["Procedural Memory<br/>@Tool Methods"]

    E --> |"last 100 messages<br/>per conversation"| DB1["SPRING_AI_CHAT_MEMORY table"]
    S --> |"DBMS_HYBRID_VECTOR.SEARCH<br/>vector + keyword, RRF, top-5"| DB2["POLICY_DOCS table<br/>+ Hybrid Vector Index"]
    P --> T1["listOrders"]
    P --> T2["lookupOrderStatus"]
    P --> T3["initiateReturn"]
    P --> T4["escalateToSupport"]
    P --> T5["listSupportTickets"]
```
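Procedural memory is ordinary annotated Java. A self-contained sketch of one tool, assuming Spring AI's `@Tool`/`@ToolParam` annotations; the method name matches the diagram, but the in-memory lookup is a stand-in for the POC's DB-backed implementation:

```java
import java.util.Map;
import org.springframework.ai.tool.annotation.Tool;
import org.springframework.ai.tool.annotation.ToolParam;
import org.springframework.stereotype.Component;

@Component
class OrderTools {

    // Stand-in data; the POC reads the order table instead.
    private final Map<String, String> statusByOrderId = Map.of("ORD-1001", "DELIVERED");

    // The description is what the LLM sees when deciding whether to call the tool.
    @Tool(description = "Look up the current status of an order by its order id, e.g. ORD-1001")
    String lookupOrderStatus(@ToolParam(description = "the order id") String orderId) {
        return statusByOrderId.getOrDefault(orderId, "No order found with id " + orderId);
    }
}
```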

## Query Flow

```mermaid
graph TD
    A[User Message] --> B["POST /api/v1/agent/chat"]
    B --> C["1. Episodic Memory<br/>MessageChatMemoryAdvisor<br/>loads last 100 messages"]
    C --> D["2. Hybrid Vector Search<br/>OracleHybridDocumentRetriever"]
    D --> E["Vector Similarity<br/>(ONNX in-DB)"]
    D --> F["Keyword Search<br/>(Oracle Text)"]
    E --> G["RRF Fusion → top-5"]
    F --> G
    G --> H["3. LLM Call<br/>Ollama qwen2.5"]
    C -.->|chat history| H
    H --> I{Tool call?}
    I -->|yes| J["4. Procedural Memory<br/>@Tool method"]
    J -.->|tool result| H
    I -->|no| K["5. Save Exchange"]
    K --> L[Response]
```
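Step 2 is a custom `DocumentRetriever` that delegates to `DBMS_HYBRID_VECTOR.SEARCH`. A simplified sketch of what that call might look like from Java; the index name, the JSON search spec, and the `chunk_text` result field are assumptions based on the description above, not code copied from the repo:

```java
import java.util.List;
import org.springframework.ai.document.Document;
import org.springframework.ai.rag.Query;
import org.springframework.ai.rag.retrieval.search.DocumentRetriever;
import org.springframework.jdbc.core.simple.JdbcClient;

class OracleHybridDocumentRetriever implements DocumentRetriever {

    // Unpack the JSON array returned by DBMS_HYBRID_VECTOR.SEARCH into rows
    // (result field names depend on the index definition -- treat as illustrative).
    private static final String SQL = """
            SELECT jt.chunk_text
            FROM   JSON_TABLE(
                     DBMS_HYBRID_VECTOR.SEARCH(json(:spec)),
                     '$[*]' COLUMNS (chunk_text VARCHAR2(4000) PATH '$.chunk_text')
                   ) jt
            """;

    private final JdbcClient jdbc;

    OracleHybridDocumentRetriever(JdbcClient jdbc) {
        this.jdbc = jdbc;
    }

    @Override
    public List<Document> retrieve(Query query) {
        // Hybrid search: vector similarity + keyword search, fused with RRF, top 5.
        String spec = """
                {
                  "hybrid_index_name": "POLICY_DOCS_HYBRID_IDX",
                  "search_text": "%s",
                  "search_fusion": "RRF",
                  "return": { "topN": 5 }
                }
                """.formatted(query.text().replace("\"", "\\\""));
        return jdbc.sql(SQL)
                .param("spec", spec)
                .query((rs, rowNum) -> new Document(rs.getString("chunk_text")))
                .list();
    }
}
```

The `RetrievalAugmentationAdvisor` then folds the returned documents into the prompt before the LLM call.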

## Prerequisites

- Java 21+
- Python 3.9+
- Podman or Docker
- Ollama installed locally

## Quick Start

### 1. Start Oracle Database

```bash
mkdir -p ./oradata
podman run -d --name oradb \
  -p 1521:1521 \
  -e ORACLE_PWD=Oracle123 \
  -v ./oradata:/opt/oracle/oradata \
  container-registry.oracle.com/database/free:latest
```

Tail the logs and wait for "DATABASE IS READY TO USE!" before continuing:

```bash
podman logs -f oradb
```

### 2. Grant database privileges

Grant the privileges needed for JPA, ONNX model loading, and Oracle Text, and create the directory for the ONNX model file:

```bash
podman exec -i oradb sqlplus sys/Oracle123@freepdb1 as sysdba < setup-db.sql
```

### 3. Set up hybrid vector search

Load the ONNX embedding model into Oracle and create the hybrid vector index. This is a one-time setup.

Download the pre-built all-MiniLM-L12-v2 ONNX model from Oracle ML, then:

```bash
podman exec oradb mkdir -p /opt/oracle/dumps
podman cp all_MiniLM_L12_v2.onnx oradb:/opt/oracle/dumps/
podman exec -i oradb sqlplus pdbadmin/Oracle123@freepdb1 < setup-hybrid-search.sql
```

This loads the ONNX model into the database, creates the POLICY_DOCS table, and creates a hybrid vector index that combines vector similarity search with Oracle Text keyword search.

### 4. Install Ollama, start the server, and pull the chat model

```bash
brew install ollama
```

On macOS, `brew install` starts Ollama as a background service automatically. If it's not running, start it manually:

```bash
ollama serve
```

This runs in the foreground on `http://localhost:11434`; open a new terminal for the next steps. If you get `Error: address already in use`, the server is already running and you can skip this step.

Verify it's running:

```bash
curl -s http://localhost:11434/api/tags | jq .
```

Pull the chat model:

```bash
ollama pull qwen2.5          # chat model with tool calling support
```

Or pull it via the REST API:

```bash
curl -s http://localhost:11434/api/pull -d '{"model": "qwen2.5"}'
```

Embeddings are computed in-database by the ONNX model loaded in step 3 — no Ollama embedding model needed.

### 5. Set up the local profile

```bash
cd src/chatserver/src/main/resources
cp application-local.yaml.example application-local.yaml
```

Ollama defaults (`localhost:11434`, `qwen2.5`) are configured in `application.yaml`. The local profile only overrides database credentials and logging.
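For reference, a local profile along these lines is enough. This is a sketch using the defaults from this README; `application-local.yaml.example` is the authoritative template:

```yaml
# application-local.yaml -- sketch only; see application-local.yaml.example
spring:
  datasource:
    url: jdbc:oracle:thin:@//localhost:1521/freepdb1
    username: pdbadmin
    password: Oracle123

logging:
  level:
    org.springframework.ai: DEBUG
```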

### 6. Start the Chat Server

```bash
cd src/chatserver
./gradlew bootRun --args='--spring.profiles.active=local'
```

The local profile uses the `PDBADMIN` user that already exists in the Oracle Free container (privileges were granted in step 2).

### 7. Start the Web UI

```bash
cd src/web
pip install -r requirements.txt
streamlit run app.py
```

The UI opens on `http://localhost:8501`.

### 8. Test with curl

Chat (with conversation memory):

```bash
curl -X POST http://localhost:8080/api/v1/agent/chat \
  -H "Content-Type: text/plain" \
  -H "X-Conversation-Id: test-1" \
  -d "What orders do I have?"
```

Ask about policies (semantic memory; the policy documents are auto-seeded on startup):

```bash
curl -X POST http://localhost:8080/api/v1/agent/chat \
  -H "Content-Type: text/plain" \
  -H "X-Conversation-Id: test-1" \
  -d "What's your return policy?"
```

Use tools (procedural memory):

```bash
curl -X POST http://localhost:8080/api/v1/agent/chat \
  -H "Content-Type: text/plain" \
  -H "X-Conversation-Id: test-1" \
  -d "I want to return order ORD-1001, the product was defective."
```

## API Reference

### POST /api/v1/agent/chat

Chat with the agent. Supports episodic memory (conversation history) and semantic memory (RAG from knowledge base).

- Body: plain text message (max 10,000 chars)
- Headers: `Content-Type: text/plain`, `X-Conversation-Id: <id>`
- Response: plain text

### POST /api/v1/agent/knowledge

Add domain knowledge for RAG retrieval. Text is inserted into the `POLICY_DOCS` table; the hybrid vector index handles embedding automatically.

- Body: plain text content (max 50,000 chars)
- Headers: `Content-Type: text/plain`
- Response: confirmation message
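For example, mirroring the chat examples above (the policy text here is made up):

```bash
curl -X POST http://localhost:8080/api/v1/agent/knowledge \
  -H "Content-Type: text/plain" \
  -d "Orders over 500 USD require manager approval before a return is processed."
```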

## Environment Variables

When using the local profile, Ollama defaults and database credentials are already configured. No env var exports needed.

When not using the local profile, set:

| Variable | Description |
|---|---|
| `DB_PASSWORD` | Oracle Database password |

**Optional (with defaults):**

| Variable | Default | Description |
|---|---|---|
| `DB_URL` | `jdbc:oracle:thin:@//localhost:1521/freepdb1` | JDBC connection URL |
| `DB_USERNAME` | `pdbadmin` | Database username |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama server URL |
| `OLLAMA_CHAT_MODEL` | `qwen2.5` | Ollama chat model |
| `BACKEND_URL` | `http://localhost:8080` | Backend URL (Web UI only) |
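For example, running the server against a remote database without the local profile (placeholder values):

```bash
export DB_URL="jdbc:oracle:thin:@//db.example.com:1521/freepdb1"
export DB_USERNAME="pdbadmin"
export DB_PASSWORD="<your-password>"
cd src/chatserver
./gradlew bootRun
```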

## Switching to Other Providers

Spring AI's abstraction layer makes switching the chat model provider a dependency and configuration change; no Java code changes are needed:

| Provider | Dependency | Config prefix |
|---|---|---|
| Ollama (local) | `spring-ai-starter-model-ollama` | `spring.ai.ollama` |
| OpenAI | `spring-ai-starter-model-openai` | `spring.ai.openai` |
| Anthropic | `spring-ai-starter-model-anthropic` | `spring.ai.anthropic` |
| OCI GenAI | `spring-ai-starter-model-oci-genai` | `spring.ai.oci.genai` |
| GCP Vertex AI | `spring-ai-starter-model-vertex-ai` | `spring.ai.vertex.ai` |
| Azure OpenAI | `spring-ai-starter-model-azure-openai` | `spring.ai.azure.openai` |

Only the `build.gradle` dependency and the `application.yaml` config need to change. Embeddings are handled in-database by the ONNX model and are not affected by the chat provider choice.
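For example, a switch to OpenAI would look roughly like this; the property keys follow Spring AI's documented `spring.ai.openai` conventions, and the model name is just an example:

```groovy
// build.gradle -- swap the starter
dependencies {
    // implementation 'org.springframework.ai:spring-ai-starter-model-ollama'
    implementation 'org.springframework.ai:spring-ai-starter-model-openai'
}
```

```yaml
# application.yaml
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o-mini
```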

## Cleanup

### Oracle Database

```bash
podman rm -f oradb
rm -rf ./oradata
```

### Ollama

Delete the pulled model:

```bash
ollama rm qwen2.5
```

If you started the Ollama server in the foreground, press `Ctrl+C` to stop it.