Redis OpenAI Agents is a production-ready Python library that provides Redis-powered infrastructure for the OpenAI Agents SDK. Replace 5+ separate systems with a single Redis deployment.
| Sessions & Memory | Caching & Search | Streaming & Coordination |
|---|---|---|
| AgentSession: Persistent conversation storage | SemanticCache: Reduce LLM costs by 25%+ | RedisStreamTransport: Reliable, replayable streaming |
| JSONSession: Complex nested data storage | RedisVectorStore: Fast similarity search | AgentCoordinator: Multi-agent orchestration |
| SemanticRouter: Intent-based agent routing | HybridSearchService: BM25 + vector combined | RobustStreamProcessor: Consumer groups & replay |
- Drop-in Session Storage → Replace SQLite with distributed Redis sessions
- Cost Reduction → Semantic caching reduces LLM API calls by 25%+
- Production Streaming → Redis Streams for reliable token delivery
- Multi-Agent Systems → Coordinate agents with atomic operations
Install redis-openai-agents into your Python (>=3.10) environment:
pip install redis-openai-agents
Choose from multiple Redis deployment options:
- Redis Cloud: Managed cloud database (free tier available)
- Redis (Docker): The official redis:8 image ships with Search, JSON, Time Series, and Bloom filters built in, so no separate stack image is required.
  docker run -d --name redis -p 6379:6379 redis:8
- Redis Enterprise: Commercial, self-hosted database
Want a GUI? Run Redis Insight separately:
docker run -d --name redisinsight -p 5540:5540 redis/redisinsight:latest
Replace SQLite sessions with Redis for persistent, distributed conversation storage:
from agents import Agent, Runner
from redis_openai_agents import AgentSession
# Create a session
session = AgentSession.create(
user_id="user_123",
redis_url="redis://localhost:6379"
)
# Define your agent
agent = Agent(name="assistant", instructions="You are a helpful assistant.")
# Run the agent
result = await Runner.run(agent, input="Hello!")
# Store the conversation
session.store_agent_result(result)
# Later: Load and continue the conversation
session = AgentSession.load(
conversation_id=session.conversation_id,
user_id="user_123",
redis_url="redis://localhost:6379"
)
# Get conversation history in SDK format
history = session.to_agent_inputs()
result = await Runner.run(agent, input=history + [{"role": "user", "content": "Follow up"}])
An async-compatible JSON session is also available: JSONSession, for complex nested data.
Reduce LLM costs by caching responses for similar queries:
from redis_openai_agents import SemanticCache
cache = SemanticCache(
redis_url="redis://localhost:6379",
distance_threshold=0.1, # Similarity threshold (lower = stricter)
ttl=3600 # 1 hour TTL
)
# Check cache before calling LLM
result = cache.check(query="What is the capital of France?")
if result:
    print(f"Cache hit: {result.response}")
else:
    # Call LLM and store result
    response = "Paris is the capital of France."
    cache.store(query="What is the capital of France?", response=response)
Learn more about semantic caching.
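To make the `distance_threshold` comment concrete, here is a minimal, library-independent sketch of a two-level lookup (exact match first, then nearest neighbour under a threshold). Everything in it is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, and `ToySemanticCache` is a hypothetical class, not the library's implementation.

```python
import math
from collections import Counter
from typing import Dict, Optional


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().replace("?", "").split())


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


class ToySemanticCache:
    """Hypothetical in-memory cache: exact match, then semantic match."""

    def __init__(self, distance_threshold: float) -> None:
        self.distance_threshold = distance_threshold
        self.entries: Dict[str, str] = {}

    def store(self, query: str, response: str) -> None:
        self.entries[query] = response

    def check(self, query: str) -> Optional[str]:
        if query in self.entries:  # level 1: exact match
            return self.entries[query]
        target = embed(query)
        best = min(
            self.entries,
            key=lambda q: cosine_distance(embed(q), target),
            default=None,
        )
        # level 2: accept the nearest entry only if it is close enough
        if best is not None and cosine_distance(embed(best), target) <= self.distance_threshold:
            return self.entries[best]
        return None


cache = ToySemanticCache(distance_threshold=0.3)
cache.store("What is the capital of France?", "Paris is the capital of France.")
near_hit = cache.check("what is the capital city of France")
miss = cache.check("How do I reset my password?")
```

Lowering the threshold makes the second level stricter, so fewer paraphrases count as hits; raising it trades accuracy for a higher hit rate.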
Route queries to the appropriate agent using vector similarity - no LLM calls required:
from redis_openai_agents import SemanticRouter, Route
router = SemanticRouter(
name="support-router",
redis_url="redis://localhost:6379",
routes=[
Route(
name="billing",
references=["payment issue", "invoice", "refund request"],
metadata={"agent": "billing_agent"},
distance_threshold=0.3
),
Route(
name="technical",
references=["bug report", "error message", "not working"],
metadata={"agent": "tech_agent"},
distance_threshold=0.3
),
]
)
# Route a query (vector lookup, not LLM call)
match = router.route("I need help with my payment")
print(f"Route to: {match.name}")  # "billing"
Learn more about semantic routing.
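Once a route name is known, the application still has to pick a handler. The sketch below shows one way to do that with a plain dictionary and a fallback. `StubMatch` is a hypothetical stand-in for the router's return value (only the `name` attribute appears in the example above), and the assumption that `name` is `None` when nothing matches is ours, not something the library documents here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class StubMatch:
    """Hypothetical stand-in for the object returned by router.route()."""
    name: Optional[str]


def billing_agent(query: str) -> str:
    return f"[billing] handling: {query}"


def tech_agent(query: str) -> str:
    return f"[technical] handling: {query}"


def general_agent(query: str) -> str:
    return f"[general] handling: {query}"


# Map route names to the callables that should handle them.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "billing": billing_agent,
    "technical": tech_agent,
}


def dispatch(match: Optional[StubMatch], query: str) -> str:
    """Pick a handler by route name; fall back when nothing matched."""
    name = match.name if match is not None else None
    handler = HANDLERS.get(name, general_agent) if name else general_agent
    return handler(query)


routed = dispatch(StubMatch(name="billing"), "I need help with my payment")
unrouted = dispatch(StubMatch(name=None), "Tell me a joke")
```

Keeping the fallback explicit means an unmatched or unknown route degrades to a general handler instead of raising.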
Build RAG applications with Redis vector search:
from redis_openai_agents import RedisVectorStore
store = RedisVectorStore(
name="knowledge-base",
redis_url="redis://localhost:6379"
)
# Add documents
store.add_documents([
{"content": "Redis is an in-memory data store.", "source": "docs"},
{"content": "Python is a programming language.", "source": "wiki"},
])
# Search with metadata filtering
results = store.search(
query="What is Redis?",
k=5,
filter={"source": "docs"}
)
for result in results:
    print(f"{result.content} (score: {result.score})")
Combine vector similarity with BM25 full-text search for better retrieval:
from redis_openai_agents import HybridSearchService
search = HybridSearchService(
name="hybrid-search",
redis_url="redis://localhost:6379"
)
# Search with both vector and text matching
results = search.search(
query="Redis performance optimization",
k=10,
vector_weight=0.7, # 70% vector similarity
text_weight=0.3 # 30% BM25 text match
)
Reliable, replayable token streaming via Redis Streams:
from redis_openai_agents import RedisStreamTransport, RobustStreamProcessor
# Publisher side
transport = RedisStreamTransport(
stream_name="agent-output",
redis_url="redis://localhost:6379"
)
await transport.publish({"type": "token", "data": {"text": "Hello"}})
await transport.publish({"type": "token", "data": {"text": " world!"}})
await transport.publish({"type": "complete", "data": {}})
# Consumer side with automatic recovery
processor = RobustStreamProcessor(
stream_name="agent-output",
consumer_group="clients",
redis_url="redis://localhost:6379"
)
async for event in processor.process():
    if event["type"] == "token":
        print(event["data"]["text"], end="")
Supports consumer groups, automatic acknowledgment, and replay from any position.
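The delivery, acknowledgment, and replay behaviour mentioned above can be illustrated with a toy in-memory model. This is only a sketch of the semantics, not Redis and not the library's code: `ToyStream` is a hypothetical class, and its methods loosely mirror the Redis commands XADD (`add`), XREADGROUP (`read_new`), XACK (`ack`), and XRANGE (`replay`).

```python
from typing import Dict, List, Tuple

Entry = Tuple[int, dict]


class ToyStream:
    """In-memory model of an append-only stream with one consumer group."""

    def __init__(self) -> None:
        self.log: List[Entry] = []          # entries are never removed on read
        self.last_delivered = 0             # the group's read cursor
        self.pending: Dict[int, dict] = {}  # delivered but not yet acknowledged

    def add(self, event: dict) -> int:
        entry_id = len(self.log) + 1
        self.log.append((entry_id, event))
        return entry_id

    def read_new(self, count: int) -> List[Entry]:
        """Deliver up to `count` never-delivered entries and mark them pending."""
        batch = [e for e in self.log if e[0] > self.last_delivered][:count]
        for entry_id, event in batch:
            self.pending[entry_id] = event
            self.last_delivered = entry_id
        return batch

    def ack(self, entry_id: int) -> None:
        """Acknowledge an entry so it is no longer a candidate for retry."""
        self.pending.pop(entry_id, None)

    def replay(self, after_id: int = 0) -> List[Entry]:
        """Re-read history from any position, independent of the group cursor."""
        return [e for e in self.log if e[0] > after_id]


stream = ToyStream()
for text in ("Hello", " world!"):
    stream.add({"type": "token", "data": {"text": text}})
stream.add({"type": "complete", "data": {}})

first, second = stream.read_new(count=2)
stream.ack(first[0])              # suppose the consumer crashes before acking `second`
unacked = sorted(stream.pending)  # entries a recovery pass would retry
replayed = "".join(
    e["data"]["text"] for _, e in stream.replay() if e["type"] == "token"
)
```

Because reads do not delete entries, a late-joining client can rebuild the full output with `replay`, while the pending set tells a recovery pass which deliveries were never confirmed.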
Coordinate multiple agents with Redis pub/sub and atomic operations:
from redis_openai_agents import AgentCoordinator, EventType
coordinator = AgentCoordinator(
session_id="multi-agent-session",
redis_url="redis://localhost:6379"
)
# Agent 1: Signal handoff ready
await coordinator.publish(EventType.HANDOFF_READY, {
"from_agent": "triage",
"to_agent": "specialist",
"context": {"topic": "billing"}
})
# Agent 2: Listen for handoffs
async for event in coordinator.subscribe():
    if event.type == EventType.HANDOFF_READY:
        print(f"Handoff from {event.data['from_agent']}")
Compose cross-cutting concerns around the agent's LLM call with an
around-style middleware protocol modelled on LangChain's AgentMiddleware:
from agents import Agent, Runner
from agents.models.openai_responses import OpenAIResponsesModel
from openai import AsyncOpenAI
from redis_openai_agents import (
MiddlewareStack, Route, SemanticCache, SemanticRouter,
)
from redis_openai_agents.middleware import (
SemanticCacheMiddleware, SemanticRouterMiddleware,
)
router = SemanticRouter(
name="support-router", redis_url="redis://localhost:6379",
routes=[Route(name="greeting", references=["hello", "hi"])],
)
router_mw = SemanticRouterMiddleware(router=router, responses={"greeting": "Hi!"})
cache = SemanticCache(redis_url="redis://localhost:6379", similarity_threshold=0.92)
cache_mw = SemanticCacheMiddleware(cache=cache)
stack = MiddlewareStack(
model=OpenAIResponsesModel(model="gpt-4o-mini", openai_client=AsyncOpenAI()),
middlewares=[router_mw, cache_mw], # outer-to-inner
)
agent = Agent(name="assistant", instructions="Be concise.", model=stack)
result = await Runner.run(agent, "hello")  # short-circuited by router
Ships with SemanticCacheMiddleware, SemanticRouterMiddleware, and
ConversationMemoryMiddleware. Write your own: any object with an async
awrap_model_call(request, handler) coroutine is a middleware.
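As a rough illustration of that protocol, the sketch below defines a timing middleware. Only the `awrap_model_call(request, handler)` signature comes from the text above; the class name `TimingMiddleware`, the assumption that `await handler(request)` invokes the rest of the stack, and the `fake_model_call` stand-in are all hypothetical and exist so the example runs without an API key.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, List


class TimingMiddleware:
    """Hypothetical middleware that records how long each model call takes."""

    def __init__(self) -> None:
        self.durations: List[float] = []

    async def awrap_model_call(
        self, request: Any, handler: Callable[[Any], Awaitable[Any]]
    ) -> Any:
        start = time.perf_counter()
        try:
            # Assumption: awaiting handler(request) runs the inner layers and the model.
            return await handler(request)
        finally:
            # Record the duration even if the inner call raises.
            self.durations.append(time.perf_counter() - start)


async def fake_model_call(request: Any) -> str:
    """Stand-in for the inner handler; a real stack would call the model here."""
    await asyncio.sleep(0)
    return f"echo: {request}"


timing_mw = TimingMiddleware()
response = asyncio.run(timing_mw.awrap_model_call("hello", fake_model_call))
```

An around-style middleware can also skip the handler entirely and return its own response, which is how a router or cache layer would short-circuit a call.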
Memoize a tool's Python callable in Redis, keyed by argument hash. Side-effect
prefixes (send_, delete_, ...) and volatile args (timestamp, now, ...)
bypass the cache automatically.
from agents import function_tool
from redis_openai_agents import cached_tool
@function_tool
@cached_tool(name="lookup_company", redis_url="redis://localhost:6379", ttl=3600)
async def lookup_company(ticker: str) -> str:
    return await _hit_paid_api(ticker)
Built-in observability with RedisTimeSeries and Prometheus:
from redis_openai_agents import AgentMetrics, PrometheusExporter
metrics = AgentMetrics(redis_url="redis://localhost:6379")
# Record metrics
await metrics.record_latency("agent_run", 150.5)
await metrics.record_tokens("gpt-4", input_tokens=100, output_tokens=50)
await metrics.record_cache_hit("semantic_cache")
# Get statistics
stats = await metrics.get_stats("latency", aggregation="avg", time_range="1h")
# Prometheus export (http://localhost:9090/metrics)
exporter = PrometheusExporter(metrics)
await exporter.start_server(port=9090)
| Component | Description |
|---|---|
| AgentSession | Hash-based session storage built on RedisVL MessageHistory |
| JSONSession | JSON document storage for complex nested session data |
| SemanticRouter | Vector-based intent routing without LLM calls |

| Component | Description |
|---|---|
| SemanticCache | Two-level cache (exact match + semantic similarity) |
| RedisCachingModel | Model wrapper with automatic response caching |
| RedisVectorStore | HNSW vector search for RAG applications |
| RedisFullTextSearch | BM25 full-text search with filters |
| HybridSearchService | Combined vector + text search with configurable weights |

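The configurable weights of HybridSearchService can be pictured as a weighted sum of two per-document scores. The sketch below is an assumption about the general technique, not the library's actual fusion formula: `fuse_scores` is a hypothetical helper, and it presumes both score sets are already normalised to the range 0 to 1.

```python
from typing import Dict, List, Tuple


def fuse_scores(
    vector_scores: Dict[str, float],
    text_scores: Dict[str, float],
    vector_weight: float = 0.7,
    text_weight: float = 0.3,
) -> List[Tuple[str, float]]:
    """Rank documents by a weighted sum; a missing score counts as 0.0."""
    doc_ids = set(vector_scores) | set(text_scores)
    fused = {
        doc_id: vector_weight * vector_scores.get(doc_id, 0.0)
        + text_weight * text_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Illustrative scores only; real values would come from the two searches.
vector_scores = {"doc_a": 0.9, "doc_b": 0.4}
text_scores = {"doc_b": 1.0, "doc_c": 0.5}

ranking = fuse_scores(vector_scores, text_scores)
text_heavy = fuse_scores(vector_scores, text_scores, vector_weight=0.3, text_weight=0.7)
```

With the default 0.7/0.3 split the semantically closest document ranks first; shifting weight toward text matching promotes the document with the strongest keyword score.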
| Component | Description |
|---|---|
| RedisStreamTransport | Redis Streams-based event transport |
| RobustStreamProcessor | Consumer groups with automatic recovery |
| ResumableStreamRunner | Checkpoint-based stream resumption |
| AgentCoordinator | Multi-agent coordination via pub/sub |
| AtomicOperations | Lua script-based atomic Redis operations |

| Component | Description |
|---|---|
| AgentMetrics | RedisTimeSeries metrics collection |
| PrometheusExporter | Prometheus metrics endpoint |
| RedisTracingProcessor | SDK-compatible trace storage in Redis Streams |

| Component | Description |
|---|---|
| RedisAgentRunner | Enhanced runner with caching and metrics |
| RedisFileSearchTool | Drop-in replacement for OpenAI file search |
| RedisRateLimitGuardrail | SDK guardrail with Redis-backed rate limiting |
| MiddlewareStack | Around-style middleware wrapping the SDK Model interface |
| SemanticCacheMiddleware | Cache LLM responses by input similarity |
| SemanticRouterMiddleware | Short-circuit matched intents with canned responses |
| ConversationMemoryMiddleware | Inject semantically relevant past messages |
| cached_tool | Decorator that memoizes a tool callable's result in Redis |

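The cached_tool behaviour described earlier (keys derived from an argument hash, with side-effect prefixes and volatile arguments bypassing the cache) can be sketched in a few lines. This is an illustration under assumptions, not the decorator's real logic: `cache_key` is a hypothetical helper, the `tool:` key layout is invented, and the prefix and argument lists contain only the examples named in this document.

```python
import hashlib
import json
from typing import Any, Dict, Optional

SIDE_EFFECT_PREFIXES = ("send_", "delete_")  # examples named in this document
VOLATILE_ARGS = {"timestamp", "now"}         # examples named in this document


def cache_key(tool_name: str, kwargs: Dict[str, Any]) -> Optional[str]:
    """Return a deterministic key, or None when the call should bypass the cache."""
    if tool_name.startswith(SIDE_EFFECT_PREFIXES):
        return None  # never memoize tools that change external state
    if VOLATILE_ARGS.intersection(kwargs):
        return None  # time-dependent inputs make cached results stale
    # sort_keys makes the hash independent of keyword-argument order
    payload = json.dumps(kwargs, sort_keys=True, default=str)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"tool:{tool_name}:{digest}"  # hypothetical key layout


key_a = cache_key("lookup_company", {"ticker": "ACME", "exchange": "X"})
key_b = cache_key("lookup_company", {"exchange": "X", "ticker": "ACME"})
bypass_side_effect = cache_key("send_email", {"to": "user@example.com"})
bypass_volatile = cache_key("lookup_company", {"ticker": "ACME", "timestamp": 1})
```

Hashing a canonical JSON form keeps keys short and stable regardless of how callers order their arguments.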
| Component | Description |
|---|---|
| RankedOperations | Sorted set rankings for agents and tools |
| DeduplicationService | Bloom filter request deduplication |
| RedisConnectionPool | Connection pooling with retry logic |

| Example | Description |
|---|---|
| 01-routing-agents | Multi-agent routing with handoffs |
| 02-semantic-cache | Reduce LLM costs with caching |
| 03-vector-search | Build RAG applications |
| 04-full-text-search | BM25 full-text search |
| 05-token-streaming | Real-time streaming with Redis Streams |
| 06-time-series-metrics | Observability with TimeSeries |
| 07-full-stack-integration | Complete integration example |
| 08-runner-integration | RedisAgentRunner usage |
| 09-hybrid-search | Combined vector + full-text search |
| 10-agent-ranking | Sorted set rankings |
| 11-deduplication | Bloom filter deduplication |
| 12-agent-coordinator | Multi-agent orchestration |
| 13-robust-streaming | Consumer groups & recovery |
| 14-atomic-operations | Lua script atomicity |
| 15-semantic-router | Intent-based routing |
| 16-middleware | Cache + router + composition around the Model |
| 17-tool-caching | @cached_tool for idempotent tools |
| Challenge | Without Redis | With Redis OpenAI Agents |
|---|---|---|
| Session Storage | SQLite (single-node) | Distributed Redis sessions |
| Caching | None or external service | Built-in semantic cache |
| Vector Search | Pinecone, Qdrant ($70+/mo) | Redis Vector Search (free) |
| Streaming | Custom WebSocket code | Redis Streams (reliable) |
| Metrics | Prometheus + Grafana setup | Built-in TimeSeries |
| Total Services | 5+ separate systems | 1 Redis deployment |
This project uses uv for dependency management.
# Install dependencies
uv sync --all-extras --group dev
# Run tests
uv run pytest --run-api-tests
# Format and lint
make format
make lint
# Type check
make mypy
# Build documentation
make docs
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.
Built with ❤️ by Redis for the OpenAI Agents SDK community