What This Document Covers:
- RAG v2 microservice architecture for AI memecoin generation
- 6-stage AI pipeline (validation → query enhancement → RAG retrieval → LLM generation → image → output)
- FastAPI server with session management and background processing
- Multi-embedding weighted search (entity:4, context:3, visual:2, emotions:1)
- Real-time progress tracking and HTTP API integration
Sections in This Document:
- Service Overview
- Microservice Architecture
- 6-Stage AI Generation Pipeline
- HTTP API Endpoints
- Service Startup and Operation
- Configuration Files
Related Documentation:
- → ../WORKFLOWS.md - All workflows overview
- → ../../docs/features/RAG_SYSTEM.md - RAG system architecture
- → ../../docs/features/AI_GENERATION.md - LLM services
- → ./rag_memecoin_generation_workflow.py - Implementation
Context Tags: #workflow #rag #ai-generation #microservice #fastapi #real-time
Microservice architecture for AI-powered memecoin generation using retrieval-augmented generation
The RAG v2 Generation Workflow Service is a production-ready FastAPI microservice that provides AI-powered memecoin generation capabilities. Unlike traditional file-based workflows, this service operates as an independent HTTP API server that can be integrated with web interfaces, external applications, or called directly via REST API.
Traditional Approach: Generate content from scratch → Hope it's good
RAG v2 Approach: Find successful examples → Learn patterns → Generate enhanced content
The service draws on a 5-collection, multi-embedding vector database of manually curated successful memecoins to provide context and guidance for new generations. The weights (entity: 4, context: 3, visual: 2, emotions: 1) bias retrieval toward what a meme depicts and references rather than how it looks or feels. The aim is to produce higher-quality, market-relevant outputs.
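One way to picture the weighting is as a weighted average of per-collection cosine similarities. The sketch below is a minimal illustration under some assumptions. Each caption collection is assumed to yield one cosine score per candidate meme, and the combined score is assumed to be normalized by the total weight. `cosine` and `combined_score` are hypothetical names. The weight of the fifth collection (image embeddings) is not stated, so it is left out.

```python
import math

# Caption-collection weights as documented (entity: 4, context: 3, visual: 2, emotions: 1).
WEIGHTS = {"entity": 4, "context": 3, "visual": 2, "emotions": 1}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def combined_score(similarities: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of per-collection similarities (hypothetical normalization).

    Collections missing from `similarities` contribute nothing and are
    excluded from the denominator.
    """
    total_weight = sum(weights[name] for name in similarities if name in weights)
    if total_weight == 0:
        return 0.0
    weighted = sum(weights[name] * score for name, score in similarities.items() if name in weights)
    return weighted / total_weight
```

Consider per-collection scores of entity 0.9, context 0.8, visual 0.5 and emotions 0.2. The combined score is (3.6 + 2.4 + 1.0 + 0.2) / 10 = 0.72, so a strong entity match outweighs a weak emotional match.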
```
┌─────────────────────┐    HTTP API    ┌─────────────────────────────────────────┐
│    Web UI Client    │◄──────────────►│   RAG v2 Generation Workflow Service    │
│     (port 8000)     │    requests    │               (port 8001)               │
│                     │                │                                         │
│ • Generation UI     │                │ • FastAPI Server                        │
│ • Status Polling    │                │ • Session Management                    │
│ • Result Display    │                │ • Background Processing                 │
└─────────────────────┘                │ • HTTP API Endpoints                    │
                                       │ • CORS Support                          │
                                       └─────────────────────────────────────────┘
                                                            │
                                                            ▼
                                       ┌─────────────────────────────────────────┐
                                       │           6-Stage AI Pipeline           │
                                       │                                         │
                                       │ 1. Input Validation                     │
                                       │ 2. Query Enhancement (Transform)        │
                                       │ 3. Similar Memes Retrieval (RAG)        │
                                       │ 4. Memecoin Generation (LLM)            │
                                       │ 5. Image Generation (Gemini 2.5 Flash)  │
                                       │ 6. File Output (JSON + JPG)             │
                                       └─────────────────────────────────────────┘
```
POST /workflow/generate
Start a new memecoin generation request
Request Body:
{
"prompt": "A space-themed memecoin with friendly astronaut cats",
"generation_count": 1,
"max_examples": 5,
"selected_tags": ["space", "cats", "friendly"],
"session_id": "optional-custom-session-id"
}
Response:
{
"success": true,
"session_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
"message": "Generation request accepted and processing started",
"estimated_completion_time": "8-15 seconds"
}
GET /workflow/status/{session_id}
Get the current status of a generation request
Response:
{
"session_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
"status": "processing",
"message": "Generating memecoin metadata",
"progress": {
"completed_stages": 3,
"total_stages": 5,
"percentage": 60
},
"stage_timestamps": {
"started": "2025-01-14T15:30:42Z",
"query_embedding": "2025-01-14T15:30:44Z",
"similar_retrieval": "2025-01-14T15:30:47Z"
}
}
Check service health and availability
Response:
{
"healthy": true,
"status": "running",
"workflow_initialized": true,
"services": {
"llm_service": "healthy",
"clip_service": "healthy"
}
}
Request Flow:
1. Client Request → POST /workflow/generate
2. Server Response → session_id + "processing started"
3. Background Processing → 6-stage AI pipeline runs
4. Client Polling → GET /workflow/status/{session_id} (every 5s)
5. Status Updates → "processing" → "completed" | "failed"
6. Final Result → Generated files in res/memecoins/generated/
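The status lifecycle in step 5 can be modeled as a small state machine. The sketch below assumes an in-memory store keyed by session ID. The `Session` class and its fields are hypothetical. The percentage uses the same completed/total ratio that appears in the status response.

```python
import uuid
from dataclasses import dataclass, field

TERMINAL_STATES = {"completed", "failed"}
# Allowed transitions: processing -> completed | failed; terminal states are final.
ALLOWED = {"processing": TERMINAL_STATES}


@dataclass
class Session:
    total_stages: int = 6
    status: str = "processing"
    message: str = "Generation request accepted and processing started"
    completed_stages: int = 0
    session_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def complete_stage(self, message: str) -> None:
        """Record one finished stage; only valid while processing."""
        if self.status != "processing":
            raise RuntimeError(f"session already {self.status}")
        self.completed_stages = min(self.completed_stages + 1, self.total_stages)
        self.message = message

    def transition(self, new_status: str, message: str) -> None:
        """Move to a terminal state, rejecting illegal transitions."""
        if new_status not in ALLOWED.get(self.status, set()):
            raise ValueError(f"illegal transition {self.status} -> {new_status}")
        self.status = new_status
        self.message = message

    def to_status_response(self) -> dict:
        """Shape the session like the status endpoint's JSON body."""
        percentage = round(100 * self.completed_stages / self.total_stages)
        return {
            "session_id": self.session_id,
            "status": self.status,
            "message": self.message,
            "progress": {
                "completed_stages": self.completed_stages,
                "total_stages": self.total_stages,
                "percentage": percentage,
            },
        }
```

Calling `complete_stage` three times on a five-stage session yields the 60% shown in the sample status response.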
Stage 1: Input Validation
- Process: Validate request parameters and normalize inputs
- Input: WebGenerationInputSource events with user request
- Output: Validated and normalized generation parameters
- Failure: Terminates if required parameters missing or invalid
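The following is a minimal sketch of the validation and normalization step, based on the fields in the request body shown earlier. Several details are hypothetical placeholders rather than the service's actual values: the limits `MAX_GENERATION_COUNT` and `MAX_EXAMPLES`, the defaults (taken from the sample request), and the tag normalization rules.

```python
import uuid
from dataclasses import dataclass

# Hypothetical limits; the real service may use different bounds.
MAX_GENERATION_COUNT = 5
MAX_EXAMPLES = 20


@dataclass
class GenerationParams:
    prompt: str
    generation_count: int
    max_examples: int
    selected_tags: list[str]
    session_id: str


def _bounded_int(value, name: str, upper: int) -> int:
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= upper:
        raise ValueError(f"{name} must be an integer in 1..{upper}")
    return value


def validate_request(body: dict) -> GenerationParams:
    """Validate and normalize a /workflow/generate body; raise ValueError if invalid."""
    prompt = body.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt is required and must be a non-empty string")

    generation_count = _bounded_int(body.get("generation_count", 1), "generation_count", MAX_GENERATION_COUNT)
    max_examples = _bounded_int(body.get("max_examples", 5), "max_examples", MAX_EXAMPLES)

    # Normalize tags: strip, lowercase, drop empties and duplicates (order kept).
    tags: list[str] = []
    for tag in body.get("selected_tags", []):
        cleaned = str(tag).strip().lower()
        if cleaned and cleaned not in tags:
            tags.append(cleaned)

    # Honor an optional caller-supplied session ID, otherwise generate a UUID4.
    session_id = body.get("session_id") or str(uuid.uuid4())
    return GenerationParams(prompt.strip(), generation_count, max_examples, tags, session_id)
```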
Stage 2: Query Enhancement (Transform)
- Process: Transform user prompt to match split caption format used in vector database
- Input: Original user prompt from validation stage
- Output: Enhanced natural language description following [What] → [References] → [Visual Style] → [Emotions] structure that aligns with entity/context/visual/emotions caption embeddings
- Failure: Falls back to original prompt if enhancement fails
- Example: "crying wojak" → "Crying Wojak sitting in dark room, expressing deep sadness and despair. Meme culture reference to market losses. Simple cartoon style with melancholic mood."
- Rationale: Query enhancement aligns user prompts with the 4-part split caption structure stored in the vector database, improving CLIP similarity search accuracy
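The fallback rule can be expressed as a thin wrapper around whatever component performs the rewrite. This sketch assumes the enhancer is a callable that either returns the enhanced text or raises. `enhance_query` is a hypothetical name. Treating an empty result as a failure is an added assumption.

```python
import logging
from typing import Callable

logger = logging.getLogger("query_enhancement")


def enhance_query(prompt: str, enhancer: Callable[[str], str]) -> str:
    """Return the enhanced prompt, or the original prompt if enhancement fails."""
    try:
        enhanced = enhancer(prompt)
    except Exception:  # Any enhancer error degrades to the original prompt.
        logger.warning("Query enhancement failed; using original prompt", exc_info=True)
        return prompt
    # Assumption: an empty or whitespace-only result also counts as a failure.
    if not enhanced or not enhanced.strip():
        return prompt
    return enhanced.strip()
```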
Stage 3: Similar Memes Retrieval (RAG)
- Process: Weighted multi-embedding similarity search across 5 collections (entity: 4, context: 3, visual: 2, emotions: 1)
- Input: Enhanced query from stage 2 converted to CLIP embedding
- Output: SimilarExamplesAction with top-k similar successful examples (cosine similarity ≥0.6) ranked by weighted combined score
- Failure: Continues with empty context (graceful degradation)
- Architecture: Searches across meme_image_embeddings, entity_caption_embeddings, context_caption_embeddings, visual_caption_embeddings, and emotions_caption_embeddings
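Ranking and thresholding can be sketched independently of the vector database. The sketch assumes candidate scores have already been combined using the weights listed for this stage. It also assumes the 0.6 cutoff applies to the combined score, since the text does not say whether it applies per collection. `rank_examples` and the `search` callable are hypothetical. Any search error yields an empty list, which matches the stage's graceful degradation.

```python
from typing import Callable, Iterable

SIMILARITY_THRESHOLD = 0.6


def rank_examples(
    search: Callable[[str], Iterable[tuple[str, float]]],
    query: str,
    top_k: int = 5,
    threshold: float = SIMILARITY_THRESHOLD,
) -> list[tuple[str, float]]:
    """Return up to top_k (example_id, score) pairs with score >= threshold, best first.

    `search` returns (example_id, combined_score) pairs. If it raises, the
    stage continues with no RAG context instead of failing.
    """
    try:
        candidates = list(search(query))
    except Exception:
        return []
    kept = [(example_id, score) for example_id, score in candidates if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]
```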
Stage 4: Memecoin Generation (LLM)
- Process: RAG-enhanced LLM generation using similar examples as context
- Input: Original prompt + similar examples
- Output: GeneratedTokenMetadataAction with complete memecoin metadata (name, symbol, description, tags)
- Failure: Terminates if LLM generation fails or invalid format
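The generation step has two halves: building a prompt that embeds the retrieved examples, and rejecting output that is not valid metadata. This sketch assumes the LLM is asked to reply with a JSON object. The prompt wording, `build_prompt` and `parse_metadata` are hypothetical. The required fields (name, symbol, description, tags) come from the stage description.

```python
import json

REQUIRED_FIELDS = ("name", "symbol", "description", "tags")


def build_prompt(user_prompt: str, examples: list[dict]) -> str:
    """Assemble a RAG prompt: retrieved examples first, then the request."""
    lines = ["Here are successful memecoins similar to the request:"]
    for ex in examples:
        lines.append(f"- {ex['name']} ({ex.get('symbol', '?')}): tags={', '.join(ex.get('tags', []))}")
    if not examples:
        lines.append("- (no similar examples found)")
    lines.append("")
    lines.append(f"Create a new memecoin for: {user_prompt}")
    lines.append('Reply with JSON containing "name", "symbol", "description" and "tags".')
    return "\n".join(lines)


def parse_metadata(raw: str) -> dict:
    """Parse LLM output into metadata; raise ValueError if the format is invalid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"LLM output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("LLM output must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not isinstance(data["tags"], list):
        raise ValueError("tags must be a list")
    return data
```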
Stage 5: Image Generation
- Process: AI image generation using Gemini 2.5 Flash Image Preview multimodal model
- Input: Generated metadata + style cues from similar examples
- Output: GeneratedImageAction with high-quality 512x512 JPG image
- Failure: Terminates if image generation service unavailable
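The sketch below shows one way to combine the generated metadata with style cues from similar examples. It only builds the text prompt. The call to the image model is omitted because its client API is not described here. `build_image_prompt` and the `visual_caption` field on examples are hypothetical. The 512x512 size comes from the stage description.

```python
def build_image_prompt(metadata: dict, examples: list[dict], size: str = "512x512") -> str:
    """Compose an image-generation prompt from metadata plus example style cues."""
    # Collect distinct visual captions from the examples, preserving order.
    style_cues: list[str] = []
    for ex in examples:
        cue = ex.get("visual_caption", "").strip()
        if cue and cue not in style_cues:
            style_cues.append(cue)
    parts = [
        f"Memecoin mascot for {metadata['name']} ({metadata['symbol']}).",
        metadata["description"],
    ]
    if style_cues:
        parts.append("Style references: " + "; ".join(style_cues[:3]) + ".")
    parts.append(f"Square {size} image.")
    return " ".join(parts)
```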
Stage 6: File Output
- Process: Save complete generation results to filesystem via ExecutorRouter
- Input: Metadata + image from previous stages
- Output: OutputFileAction, which writes a JSON metadata file and a JPG image file, both named by session ID, into a timestamped directory
- Failure: Terminates if filesystem errors occur
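The following sketches how the outputs might be written. The directory naming (`YYYYMMDD_HH_` plus the upper-cased token name, e.g. AstroCat → `20250114_15_ASTROCAT`) is inferred from the example tree under res/memecoins/generated/ and is an assumption. So are the function name `write_outputs` and passing the image as raw JPG bytes.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def write_outputs(
    base_dir: Path,
    session_id: str,
    metadata: dict,
    image_bytes: bytes,
    now: Optional[datetime] = None,
) -> tuple[Path, Path]:
    """Write <session_id>.json and <session_id>.jpg into a timestamped directory.

    Any OSError propagates to the caller, which terminates the workflow.
    """
    now = now or datetime.now(timezone.utc)
    # Assumed naming: date, hour, then the token name upper-cased with non-alphanumerics removed.
    suffix = "".join(ch for ch in metadata["name"] if ch.isalnum()).upper()
    out_dir = base_dir / f"{now:%Y%m%d_%H}_{suffix}"
    out_dir.mkdir(parents=True, exist_ok=True)

    json_path = out_dir / f"{session_id}.json"
    jpg_path = out_dir / f"{session_id}.jpg"
    json_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    jpg_path.write_bytes(image_bytes)
    return json_path, jpg_path
```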
# Environment variables (REPLICATE_API_TOKEN is required; the LLM provider keys are optional)
export REPLICATE_API_TOKEN="your_replicate_token" # For CLIP embeddings & image generation
export OPENAI_API_KEY="your_openai_key" # For LLM generation (optional)
export ANTHROPIC_API_KEY="your_anthropic_key" # Alternative LLM provider (optional)

# Start RAG v2 Generation Service on port 8001
PYTHONPATH=. python src/workflows/rag_memecoin_generation_workflow.py
Console Output:
🚀 Starting RAG v2 Workflow Service on port 8001...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🤖 AI-Powered Memecoin Generation Service
🔍 Initializing workflow and services...
🌐 Web UI available at: http://127.0.0.1:8000/generation.html
🔧 Service API available at: http://127.0.0.1:8001/docs
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ Workflow service initialized successfully!
🎯 Using real stages with LLM and CLIP services
🛑 Press Ctrl+C to stop
INFO: Started server process [12345]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8001 (Press CTRL+C to quit)
# Start Web UI client on port 8000 (separate terminal)
PYTHONPATH=. uvicorn src.web_ui.main:app --port 8000 --host 127.0.0.1 --reload
Generated memecoins are saved in a timestamped directory structure:
res/memecoins/generated/
├── 20250114_15_ASTROCAT/
│   ├── f47ac10b-58cc-4372-a567-0e02b2c3d479.json   # Complete metadata + generation context
│   ├── f47ac10b-58cc-4372-a567-0e02b2c3d479.jpg    # High-quality 512x512 image
│   └── generation_context.json                     # RAG context and similar examples
└── 20250114_16_SPACEDOGE/
    ├── a1b2c3d4-5e6f-7890-abcd-ef1234567890.json
    └── a1b2c3d4-5e6f-7890-abcd-ef1234567890.jpg
Sample Metadata JSON:
{
"name": "AstroCat",
"symbol": "ASTRO",
"description": "Friendly astronaut cats exploring the cosmos, bringing joy and adventure to space travel",
"tags": ["space", "cats", "friendly", "adventure"],
"generation_context": {
"session_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
"prompt": "A space-themed memecoin with friendly astronaut cats",
"similar_examples": [
{
"name": "SpacePup",
"similarity": 0.87,
"tags": ["space", "dogs", "adventure"]
}
],
"processing_time_seconds": 23.7,
"generated_at": "2025-01-14T15:30:42Z"
}
}
The RAGQueryEnhancementStage (Stage 2) transforms short user prompts into natural language descriptions that match the split caption format used in the 5-collection vector database. The goal is to improve CLIP embedding similarity search accuracy by mapping queries into the same semantic spaces as the stored embeddings.
Caption Structure: [What/Who] → [References/Context] → [Visual Style] → [Emotions]
Split Caption Alignment:
- Entity portion → Matches entity_caption_embeddings (weight: 4, highest priority)
- Context portion → Matches context_caption_embeddings (weight: 3, cultural/meme references)
- Visual portion → Matches visual_caption_embeddings (weight: 2, style details)
- Emotions portion → Matches emotions_caption_embeddings (weight: 1, mood/expressions)
Transformation Examples:
- "crying wojak meme" → "Crying Wojak sitting in dark room, expressing deep sadness and despair. Meme culture reference to market losses. Simple cartoon style with melancholic mood."
  - Entity: "Crying Wojak sitting in dark room"
  - Context: "Meme culture reference to market losses"
  - Visual: "Simple cartoon style"
  - Emotions: "Deep sadness and despair, melancholic mood"
- "trump with beer" → "Donald Trump holding a beer can in casual pose at press conference. Political figure in relaxed setting. Realistic photo style with confident and casual emotions."
- "doge to the moon" → "Shiba Inu dog with rocket ship flying toward moon. Crypto culture reference to price increases. Colorful digital art style with excitement and optimism."
This enhancement ensures that user queries align with how memecoins are described across the 4 caption types in the database, enabling the weighted multi-embedding search to retrieve maximally relevant examples for RAG context.
The service is designed to work seamlessly with the Web UI client:
- Service Separation: Web UI focuses on interface, workflow service handles AI processing using InputSource → Processor → Executor pattern
- Independent Scaling: Services can be deployed and scaled independently
- Health Monitoring: Web UI checks service availability and provides user feedback
- Session Tracking: Real-time status updates via HTTP polling
- ExecutorRouter: Multi-action routing for different generation outputs (examples, metadata, images, files)
- User visits Web UI at http://127.0.0.1:8000/generation.html
- Web UI checks workflow service health automatically
- If service unavailable: Clear error message with startup instructions
- If service available: Generation interface becomes active
- User submits generation request via Web UI
- Web UI polls for status updates and displays progress
- Generated results displayed with download links
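The Web UI's availability check can be reduced to a function over the health response shown earlier. This sketch assumes the service counts as ready only when `healthy` and `workflow_initialized` are true and every dependent service reports "healthy". `service_ready` and its messages are hypothetical. The startup command is the one documented for the workflow service.

```python
from typing import Optional

START_HINT = "PYTHONPATH=. python src/workflows/rag_memecoin_generation_workflow.py"


def service_ready(health: Optional[dict]) -> tuple[bool, str]:
    """Decide whether the generation UI should be enabled.

    `health` is the parsed health response, or None if the request failed
    (connection refused, timeout, non-JSON body, ...).
    """
    if health is None:
        return False, f"Workflow service is not running. Start it with: {START_HINT}"
    if not health.get("healthy") or not health.get("workflow_initialized"):
        return False, "Workflow service is starting or unhealthy; try again shortly."
    degraded = sorted(
        name for name, state in health.get("services", {}).items() if state != "healthy"
    )
    if degraded:
        return False, "Dependent services unavailable: " + ", ".join(degraded)
    return True, "Workflow service available"
```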
For programmatic access or integration with external systems:
```python
import asyncio

import httpx

BASE_URL = "http://127.0.0.1:8001"
POLL_INTERVAL_SECONDS = 5  # Matches the polling interval in the request flow


async def generate_memecoin():
    async with httpx.AsyncClient(timeout=60.0) as client:
        # Start generation
        response = await client.post(
            f"{BASE_URL}/workflow/generate",
            json={
                "prompt": "Cute forest animals having a picnic",
                "selected_tags": ["animals", "nature", "cute"],
                "generation_count": 1,
            },
        )
        if response.status_code != 200:
            print(f"Error: {response.status_code} - {response.text}")
            return

        session_id = response.json()["session_id"]
        print(f"Generation started: {session_id}")

        # Poll for status until the session reaches a terminal state
        while True:
            status_response = await client.get(
                f"{BASE_URL}/workflow/status/{session_id}"
            )
            if status_response.status_code != 200:
                # e.g. 404 if the session ID is invalid or has expired
                print(f"Status error: {status_response.status_code} - {status_response.text}")
                break

            status_data = status_response.json()
            print(f"Status: {status_data['status']} - {status_data['message']}")
            if status_data["status"] in ("completed", "failed"):
                break
            await asyncio.sleep(POLL_INTERVAL_SECONDS)


# Run the example
asyncio.run(generate_memecoin())
```
LLM Configuration:
model: "anthropic/claude-3-sonnet-20240229"
temperature: 0.7 # Balanced creativity/consistency
max_tokens: 2000 # Sufficient for complete metadata
timeout: 30 # Reasonable timeout for generation
Image Generation Configuration:
model: "stability-ai/sdxl"
resolution: "512x512" # Standard memecoin image size
style: "digital art" # Appropriate for crypto content
quality: "high" # Maximum quality for production use
Reliability Features:
- Health Checks: Continuous monitoring of dependent services
- Graceful Degradation: Continues processing when possible
- Session Cleanup: Automatic cleanup of stale sessions (1 hour timeout)
- Rate Limiting: Built-in protection against API quota exhaustion
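Stale-session cleanup with the one-hour timeout can be sketched as a pass over an in-memory session map. This assumes each session records a last-updated timestamp in seconds, for example from `time.monotonic()`. The `sessions` layout and `cleanup_stale_sessions` are hypothetical.

```python
import time
from typing import Optional

SESSION_TIMEOUT_SECONDS = 60 * 60  # 1 hour, as documented


def cleanup_stale_sessions(
    sessions: dict[str, dict],
    now: Optional[float] = None,
    timeout: float = SESSION_TIMEOUT_SECONDS,
) -> list[str]:
    """Remove sessions not updated within `timeout` seconds; return the removed IDs."""
    now = time.monotonic() if now is None else now
    # Collect first, then delete, so the dict is not mutated while iterating.
    stale = [sid for sid, s in sessions.items() if now - s["last_updated"] > timeout]
    for sid in stale:
        del sessions[sid]
    return stale
```

In a long-running service this would typically be invoked periodically, for example from a background loop every few minutes. The interval is a tuning choice rather than something this document specifies.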
Example Error Response:
{
"success": false,
"detail": "RAG workflow service is not running. Please start the workflow service using: PYTHONPATH=. python src/workflows/rag_memecoin_generation_workflow.py",
"status_code": 503
}
Error Codes:
- Service Unavailable (503): Workflow service not running
- Invalid Request (400): Missing required fields or invalid format
- Processing Failed (500): Internal AI service errors
- Session Not Found (404): Invalid or expired session ID
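One way to produce the error body shown in this section from internal exceptions is a small lookup table. The exception classes and `error_response` are hypothetical. The status codes and the `success`/`detail`/`status_code` shape come from this section. Unrecognized errors fall through to 500, matching "Processing Failed".

```python
class ServiceUnavailable(Exception):
    """Workflow service not running (503)."""


class InvalidRequest(Exception):
    """Missing required fields or invalid format (400)."""


class SessionNotFound(Exception):
    """Invalid or expired session ID (404)."""


# Exception type -> HTTP status code, per the documented error cases.
STATUS_CODES = {
    ServiceUnavailable: 503,
    InvalidRequest: 400,
    SessionNotFound: 404,
}


def error_response(exc: Exception) -> dict:
    """Build the JSON error body; anything unrecognized is a 500 processing failure."""
    status = next(
        (code for exc_type, code in STATUS_CODES.items() if isinstance(exc, exc_type)),
        500,
    )
    return {"success": False, "detail": str(exc), "status_code": status}
```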
Current Deployment:
- Single machine deployment with both services on localhost
- Ports 8000 (Web UI) and 8001 (Workflow Service) must be available
- File system access required for output directory creation
Production Considerations:
- Container Deployment: Docker containers for service isolation
- Load Balancer: Multiple workflow service instances behind load balancer
- External Storage: Cloud storage for generated files
- Monitoring: Health check endpoints for service monitoring
- API Gateway: Rate limiting and authentication layer
Scaling Options:
- Horizontal: Multiple workflow service instances
- Vertical: Increase memory/CPU for faster generation
- Queue-Based: Redis/RabbitMQ for handling high request volumes
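The queue-based option decouples accepting requests from running the pipeline. The sketch below uses an in-process `asyncio.Queue` as a stand-in for Redis or RabbitMQ to illustrate the pattern: a fixed pool of workers bounds how many pipelines run at once. `process_queue` is hypothetical, and `run_pipeline` is a placeholder for the six stages.

```python
import asyncio
from typing import Awaitable, Callable


async def process_queue(
    session_ids: list[str],
    run_pipeline: Callable[[str], Awaitable[str]],
    workers: int = 2,
) -> dict[str, str]:
    """Run queued sessions with at most `workers` pipelines in flight."""
    queue: asyncio.Queue = asyncio.Queue()
    for sid in session_ids:
        queue.put_nowait(sid)
    results: dict[str, str] = {}

    async def worker() -> None:
        # Each worker pulls sessions until the queue is drained.
        while True:
            try:
                sid = queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            try:
                results[sid] = await run_pipeline(sid)
            except Exception:
                results[sid] = "failed"  # One failed pipeline does not stop the worker
            finally:
                queue.task_done()

    await asyncio.gather(*(worker() for _ in range(workers)))
    return results
```

With an external broker, the queue lives outside the process, so several workflow service instances can pull from it. The bounded-worker idea carries over unchanged.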
The RAG v2 Generation Workflow Service provides AI-powered memecoin generation as a standalone HTTP service. Its microservice architecture lets the Web UI and the workflow service be deployed and scaled independently. Meanwhile, the RAG pipeline grounds each generation in curated examples of successful memecoins.