🚀 RAG v2 Generation Workflow Service

📋 Document Summary

What This Document Covers:

  • RAG v2 microservice architecture for AI memecoin generation
  • 6-stage AI pipeline (validation → query enhancement → RAG retrieval → LLM generation → image → output)
  • FastAPI server with session management and background processing
  • Multi-embedding weighted search (entity:4, context:3, visual:2, emotions:1)
  • Real-time progress tracking and HTTP API integration

Context Tags: #workflow #rag #ai-generation #microservice #fastapi #real-time


Microservice architecture for AI-powered memecoin generation using retrieval-augmented generation

📋 Service Overview

The RAG v2 Generation Workflow Service is a production-ready FastAPI microservice that provides AI-powered memecoin generation capabilities. Unlike traditional file-based workflows, this service operates as an independent HTTP API server that can be integrated with web interfaces, external applications, or called directly via REST API.

🎯 Core Innovation: RAG-Enhanced Generation

Traditional Approach: Generate content from scratch → Hope it's good

RAG v2 Approach: Find successful examples → Learn patterns → Generate enhanced content

The service draws on a 5-collection multi-embedding vector database of manually curated successful memecoins to provide context and guidance for new generations. The four caption collections are weighted (entity: 4, context: 3, visual: 2, emotions: 1) so that retrieval favors what a meme depicts and references over how it is styled, with the aim of higher-quality, more market-relevant outputs.

🏗️ Microservice Architecture

┌─────────────────────┐    HTTP API    ┌─────────────────────────────────────────┐
│   Web UI Client     │◄──────────────►│  RAG v2 Generation Workflow Service     │
│   (port 8000)      │    requests     │  (port 8001)                           │
│                     │                 │                                         │
│ • Generation UI     │                 │ • FastAPI Server                       │
│ • Status Polling    │                 │ • Session Management                   │
│ • Result Display    │                 │ • Background Processing                │
└─────────────────────┘                 │ • HTTP API Endpoints                   │
                                         │ • CORS Support                         │
                                         └─────────────────────────────────────────┘
                                                            │
                                                            ▼
                                         ┌─────────────────────────────────────────┐
                                         │        6-Stage AI Pipeline             │
                                         │                                         │
                                         │ 1. Input Validation                    │
                                         │ 2. Query Enhancement (Transform)       │
                                         │ 3. Similar Memes Retrieval (RAG)       │
                                         │ 4. Memecoin Generation (LLM)           │
                                         │ 5. Image Generation (Gemini 2.5 Flash) │
                                         │ 6. File Output (JSON + JPG)            │
                                         └─────────────────────────────────────────┘

🌐 HTTP API Endpoints

POST /workflow/generate

Start a new memecoin generation request

Request Body:

{
  "prompt": "A space-themed memecoin with friendly astronaut cats",
  "generation_count": 1,
  "max_examples": 5,
  "selected_tags": ["space", "cats", "friendly"],
  "session_id": "optional-custom-session-id"
}

Response:

{
  "success": true,
  "session_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "message": "Generation request accepted and processing started",
  "estimated_completion_time": "8-15 seconds"
}

GET /workflow/status/{session_id}

Get the current status of a generation request

Response:

{
  "session_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "status": "processing",
  "message": "Generating memecoin metadata",
  "progress": {
    "completed_stages": 3,
    "total_stages": 6,
    "percentage": 50
  },
  "stage_timestamps": {
    "started": "2025-01-14T15:30:42Z",
    "query_embedding": "2025-01-14T15:30:44Z",
    "similar_retrieval": "2025-01-14T15:30:47Z"
  }
}
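
The `progress` object can be derived from two stage counters. The following is a minimal sketch of that calculation, not the service's actual code: the function name `build_progress` and the integer rounding rule are assumptions made for illustration.

```python
def build_progress(completed_stages: int, total_stages: int) -> dict:
    """Build a progress payload shaped like the status response above."""
    if total_stages <= 0:
        raise ValueError("total_stages must be positive")
    if not 0 <= completed_stages <= total_stages:
        raise ValueError("completed_stages must be between 0 and total_stages")
    return {
        "completed_stages": completed_stages,
        "total_stages": total_stages,
        # Integer percentage; the rounding rule is an assumption.
        "percentage": round(100 * completed_stages / total_stages),
    }
```

A client can use `percentage` directly for a progress bar and fall back to the two counters when it wants to display "stage N of M".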

GET /workflow/health

Check service health and availability

Response:

{
  "healthy": true,
  "status": "running",
  "workflow_initialized": true,
  "services": {
    "llm_service": "healthy",
    "clip_service": "healthy"
  }
}
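
A client usually needs a single yes/no answer from this payload before enabling generation. The sketch below shows one way to reduce the health response to a boolean. The helper name `is_service_ready` is hypothetical, and treating any dependency state other than "healthy" as not ready is an assumption.

```python
def is_service_ready(health: dict) -> bool:
    """Return True only if the service and every listed dependency report healthy."""
    services = health.get("services", {})
    return (
        health.get("healthy") is True
        and health.get("workflow_initialized") is True
        # Assumption: any state other than "healthy" blocks generation.
        and all(state == "healthy" for state in services.values())
    )
```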

🔄 Session-Based Processing Flow

1. Client Request → POST /workflow/generate
2. Server Response → session_id + "processing started"
3. Background Processing → 6-stage AI pipeline runs
4. Client Polling → GET /workflow/status/{session_id} (every 5s)
5. Status Updates → "processing" → "completed" | "failed"
6. Final Result → Generated files in res/memecoins/generated/
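
The flow above implies a small amount of server-side state: a session record created at request time and moved from "processing" to a terminal status by the background task. The sketch below illustrates that idea with an in-memory store. `Session`, `SessionStore`, and the `ALLOWED` transition table are hypothetical names, and the real service may track sessions differently.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Allowed transitions, following the flow above: processing -> completed | failed.
ALLOWED = {"processing": {"completed", "failed"}, "completed": set(), "failed": set()}


@dataclass
class Session:
    session_id: str
    status: str = "processing"
    message: str = "Generation request accepted and processing started"
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class SessionStore:
    """In-memory session registry (illustrative only, not persistent)."""

    def __init__(self) -> None:
        self._sessions: dict[str, Session] = {}

    def create(self, session_id: Optional[str] = None) -> Session:
        # Honor a client-supplied session_id, otherwise generate a UUID.
        session = Session(session_id or str(uuid.uuid4()))
        self._sessions[session.session_id] = session
        return session

    def get(self, session_id: str) -> Optional[Session]:
        return self._sessions.get(session_id)

    def transition(self, session_id: str, new_status: str, message: str) -> Session:
        session = self._sessions[session_id]
        if new_status not in ALLOWED[session.status]:
            raise ValueError(f"cannot go from {session.status} to {new_status}")
        session.status, session.message = new_status, message
        return session
```

Rejecting transitions out of a terminal status keeps a late background update from overwriting a result the client has already seen.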

🤖 6-Stage AI Generation Pipeline (InputSource → Processor → Executor)

Stage 1: Input Validation (< 1 second)

  • Process: Validate request parameters and normalize inputs
  • Input: WebGenerationInputSource events with user request
  • Output: Validated and normalized generation parameters
  • Failure: Terminates if required parameters missing or invalid
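
As an illustration of what "validate and normalize" can mean for the request body shown earlier, here is a minimal sketch using only the standard library. The field names come from the request example; the defaults, the tag normalization rules, and the names `GenerationRequest` and `validate_request` are assumptions, not the service's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GenerationRequest:
    prompt: str
    generation_count: int = 1          # assumed default
    max_examples: int = 5              # assumed default
    selected_tags: list[str] = field(default_factory=list)
    session_id: Optional[str] = None


def validate_request(raw: dict) -> GenerationRequest:
    """Validate a raw request body and return normalized parameters."""
    prompt = str(raw.get("prompt") or "").strip()
    if not prompt:
        raise ValueError("prompt is required")
    count = int(raw.get("generation_count", 1))
    max_examples = int(raw.get("max_examples", 5))
    if count < 1 or max_examples < 0:
        raise ValueError("generation_count must be >= 1 and max_examples >= 0")
    # Normalize tags: trim, lowercase, drop empties and duplicates (order preserved).
    tags = list(dict.fromkeys(
        t.strip().lower() for t in raw.get("selected_tags", []) if t.strip()
    ))
    return GenerationRequest(prompt, count, max_examples, tags, raw.get("session_id"))
```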

Stage 2: Query Enhancement (2-3 seconds)

  • Process: Transform user prompt to match split caption format used in vector database
  • Input: Original user prompt from validation stage
  • Output: Enhanced natural language description following [What] → [References] → [Visual Style] → [Emotions] structure that aligns with entity/context/visual/emotions caption embeddings
  • Failure: Falls back to original prompt if enhancement fails
  • Example: "crying wojak" → "Crying Wojak sitting in dark room, expressing deep sadness and despair. Meme culture reference to market losses. Simple cartoon style with melancholic mood."
  • Rationale: Query enhancement aligns user prompts with the 4-part split caption structure stored in the vector database, improving CLIP similarity search accuracy
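
The fallback behavior of this stage can be sketched as a thin wrapper around whatever performs the enhancement. In the sketch below, `enhancer` stands in for the LLM call and `enhance_query` is a hypothetical helper; treating an empty result as a failure is an additional assumption.

```python
from typing import Callable


def enhance_query(prompt: str, enhancer: Callable[[str], str]) -> str:
    """Return the enhanced prompt, or the original prompt if enhancement fails."""
    try:
        enhanced = enhancer(prompt).strip()
    except Exception:
        # Stage 2 failure mode: fall back to the original prompt.
        return prompt
    # Assumption: an empty enhancement is treated as a failure too.
    return enhanced or prompt
```

Because the wrapper always returns a usable string, Stage 3 can run unchanged whether or not the enhancement succeeded.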

Stage 3: Similar Memes Retrieval (1-3 seconds)

  • Process: Weighted multi-embedding similarity search across 5 collections (entity: 4, context: 3, visual: 2, emotions: 1)
  • Input: Enhanced query from stage 2 converted to CLIP embedding
  • Output: SimilarExamplesAction with top-k similar successful examples (cosine similarity ≥0.6) ranked by weighted combined score
  • Failure: Continues with empty context (graceful degradation)
  • Architecture: Searches across meme_image_embeddings, entity_caption_embeddings, context_caption_embeddings, visual_caption_embeddings, and emotions_caption_embeddings
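
To make the weighting concrete, the sketch below combines per-collection cosine similarities into one score and ranks candidates. Only the weights and the 0.6 threshold come from this document. Normalizing by the total weight, applying the threshold to the combined score, and leaving the image collection out of the score are all assumptions, and `combined_score` and `rank_examples` are hypothetical names.

```python
WEIGHTS = {"entity": 4, "context": 3, "visual": 2, "emotions": 1}


def combined_score(similarities: dict[str, float]) -> float:
    """Weighted average of per-collection cosine similarities (missing keys count as 0)."""
    total_weight = sum(WEIGHTS.values())
    return sum(w * similarities.get(name, 0.0) for name, w in WEIGHTS.items()) / total_weight


def rank_examples(
    candidates: dict[str, dict[str, float]],
    top_k: int = 5,
    min_score: float = 0.6,
) -> list[tuple[str, float]]:
    """Return up to top_k (name, score) pairs at or above min_score, best first."""
    scored = [(name, combined_score(sims)) for name, sims in candidates.items()]
    kept = [(name, score) for name, score in scored if score >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:top_k]
```

With these weights, a candidate that matches only on entity reaches a combined score of 0.4 at most, so it needs support from at least one other collection to clear a 0.6 threshold.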

Stage 4: Memecoin Generation (5-15 seconds)

  • Process: RAG-enhanced LLM generation using similar examples as context
  • Input: Original prompt + similar examples
  • Output: GeneratedTokenMetadataAction with complete memecoin metadata (name, symbol, description, tags)
  • Failure: Terminates if LLM generation fails or invalid format
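
The core of the RAG step is how the retrieved examples are placed in front of the LLM. The following sketch shows one plausible prompt layout. The wording of the prompt and the function name `build_generation_prompt` are illustrative assumptions; the example fields (`name`, `similarity`, `tags`) follow the sample metadata in this document.

```python
def build_generation_prompt(user_prompt: str, examples: list[dict]) -> str:
    """Assemble an LLM prompt from the user request and retrieved examples."""
    lines = ["You are generating memecoin metadata (name, symbol, description, tags)."]
    if examples:
        # Retrieved examples act as context; an empty list degrades gracefully.
        lines.append("Successful examples for reference:")
        for ex in examples:
            tags = ", ".join(ex.get("tags", []))
            lines.append(f"- {ex['name']} (similarity {ex['similarity']:.2f}; tags: {tags})")
    lines.append(f"User request: {user_prompt}")
    lines.append("Respond with a JSON object containing name, symbol, description, and tags.")
    return "\n".join(lines)
```

When Stage 3 returns no examples, the same function produces a prompt without the reference block, which matches the graceful-degradation behavior described for retrieval.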

Stage 5: Image Generation (8-25 seconds)

  • Process: AI image generation using Gemini 2.5 Flash Image Preview multimodal model
  • Input: Generated metadata + style cues from similar examples
  • Output: GeneratedImageAction with high-quality 512x512 JPG image
  • Failure: Terminates if image generation service unavailable

Stage 6: File Output (1-2 seconds)

  • Process: Save complete generation results to filesystem via ExecutorRouter
  • Input: Metadata + image from previous stages
  • Output: OutputFileAction resulting in JSON metadata file + JPG image file with timestamped names
  • Failure: Terminates if filesystem errors occur

🚀 Service Startup and Operation

Prerequisites

# Environment variables (entries marked optional depend on which LLM provider is configured)
export REPLICATE_API_TOKEN="your_replicate_token"    # For CLIP embeddings & image generation
export OPENAI_API_KEY="your_openai_key"              # For LLM generation (optional)
export ANTHROPIC_API_KEY="your_anthropic_key"        # Alternative LLM provider (optional)

Start the Service

# Start RAG v2 Generation Service on port 8001
PYTHONPATH=. python src/workflows/rag_memecoin_generation_workflow.py

Console Output:

🚀 Starting RAG v2 Workflow Service on port 8001...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🤖 AI-Powered Memecoin Generation Service
🔍 Initializing workflow and services...
🌐 Web UI available at: http://127.0.0.1:8000/generation.html
🔧 Service API available at: http://127.0.0.1:8001/docs
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

✅ Workflow service initialized successfully!
🎯 Using real stages with LLM and CLIP services
🛑 Press Ctrl+C to stop

INFO:     Started server process [12345]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8001 (Press CTRL+C to quit)

Optional: Start Web UI Client

# Start Web UI client on port 8000 (separate terminal)
PYTHONPATH=. uvicorn src.web_ui.main:app --port 8000 --host 127.0.0.1 --reload

📁 Output Structure

Generated memecoins are saved in a timestamped directory structure:

res/memecoins/generated/
├── 20250114_15_ASTROCAT/
│   ├── f47ac10b-58cc-4372-a567-0e02b2c3d479.json    # Complete metadata + generation context
│   ├── f47ac10b-58cc-4372-a567-0e02b2c3d479.jpg     # High-quality 512x512 image
│   └── generation_context.json                       # RAG context and similar examples
└── 20250114_16_SPACEDOGE/
    ├── a1b2c3d4-5e6f-7890-abcd-ef1234567890.json
    └── a1b2c3d4-5e6f-7890-abcd-ef1234567890.jpg

Sample Metadata JSON:

{
  "name": "AstroCat",
  "symbol": "ASTRO",
  "description": "Friendly astronaut cats exploring the cosmos, bringing joy and adventure to space travel",
  "tags": ["space", "cats", "friendly", "adventure"],
  "generation_context": {
    "session_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
    "prompt": "A space-themed memecoin with friendly astronaut cats",
    "similar_examples": [
      {
        "name": "SpacePup",
        "similarity": 0.87,
        "tags": ["space", "dogs", "adventure"]
      }
    ],
    "processing_time_seconds": 23.7,
    "generated_at": "2025-01-14T15:30:42Z"
  }
}
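
The directory listing above suggests a naming scheme of date, hour, and upper-cased name, with files named after the session ID. The sketch below writes a generation to disk under that inferred scheme. The scheme itself is an inference from the listing, and `output_paths`, `save_generation`, and the name sanitization are assumptions rather than the service's actual code.

```python
import json
import re
from datetime import datetime
from pathlib import Path


def output_paths(base: Path, name: str, session_id: str, generated_at: datetime) -> tuple[Path, Path]:
    """Return (json_path, image_path) for one generation."""
    # Inferred pattern: <YYYYMMDD>_<HH>_<NAME>, e.g. 20250114_15_ASTROCAT.
    safe_name = re.sub(r"[^A-Z0-9]", "", name.upper())  # sanitization is an assumption
    directory = base / f"{generated_at:%Y%m%d_%H}_{safe_name}"
    return directory / f"{session_id}.json", directory / f"{session_id}.jpg"


def save_generation(base: Path, metadata: dict, image_bytes: bytes,
                    session_id: str, generated_at: datetime) -> Path:
    """Write the metadata JSON and the image; return the JSON path."""
    json_path, image_path = output_paths(base, metadata["name"], session_id, generated_at)
    json_path.parent.mkdir(parents=True, exist_ok=True)
    json_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    image_path.write_bytes(image_bytes)
    return json_path
```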

🎯 Query Enhancement for Better Retrieval

The RAGQueryEnhancementStage (Stage 2) transforms short user prompts into natural language descriptions that match the split caption format used in the 5-collection vector database. This alignment significantly improves CLIP embedding similarity search accuracy by ensuring queries map to the same semantic spaces as the stored embeddings.

Caption Structure: [What/Who] → [References/Context] → [Visual Style] → [Emotions]

Split Caption Alignment:

  • Entity portion → Matches entity_caption_embeddings (weight: 4, highest priority)
  • Context portion → Matches context_caption_embeddings (weight: 3, cultural/meme references)
  • Visual portion → Matches visual_caption_embeddings (weight: 2, style details)
  • Emotions portion → Matches emotions_caption_embeddings (weight: 1, mood/expressions)

Transformation Examples:

  • "crying wojak meme" → "Crying Wojak sitting in dark room, expressing deep sadness and despair. Meme culture reference to market losses. Simple cartoon style with melancholic mood."

    • Entity: "Crying Wojak sitting in dark room"
    • Context: "Meme culture reference to market losses"
    • Visual: "Simple cartoon style"
    • Emotions: "Deep sadness and despair, melancholic mood"
  • "trump with beer" → "Donald Trump holding a beer can in casual pose at press conference. Political figure in relaxed setting. Realistic photo style with confident and casual emotions."

  • "doge to the moon" → "Shiba Inu dog with rocket ship flying toward moon. Crypto culture reference to price increases. Colorful digital art style with excitement and optimism."

This enhancement ensures that user queries align with how memecoins are described across the 4 caption types in the database, enabling the weighted multi-embedding search to retrieve maximally relevant examples for RAG context.

🔗 Web UI Integration

The service is designed to work seamlessly with the Web UI client:

Architecture Benefits

  • Service Separation: The Web UI focuses on the interface, while the workflow service handles AI processing using the InputSource → Processor → Executor pattern
  • Independent Scaling: Services can be deployed and scaled independently
  • Health Monitoring: Web UI checks service availability and provides user feedback
  • Session Tracking: Real-time status updates via HTTP polling
  • ExecutorRouter: Multi-action routing for different generation outputs (examples, metadata, images, files)

User Experience

  1. User visits Web UI at http://127.0.0.1:8000/generation.html
  2. Web UI checks workflow service health automatically
  3. If service unavailable: Clear error message with startup instructions
  4. If service available: Generation interface becomes active
  5. User submits generation request via Web UI
  6. Web UI polls for status updates and displays progress
  7. Generated results displayed with download links

🐍 Direct API Usage (No Web UI)

For programmatic access or integration with external systems:

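import asyncio

import httpx

BASE_URL = "http://127.0.0.1:8001"
POLL_INTERVAL_SECONDS = 5   # same interval as the session flow described earlier
MAX_POLLS = 60              # give up after about 5 minutes instead of polling forever


async def generate_memecoin():
    async with httpx.AsyncClient(timeout=60.0) as client:
        # Start generation; the server replies immediately with a session_id
        response = await client.post(
            f"{BASE_URL}/workflow/generate",
            json={
                "prompt": "Cute forest animals having a picnic",
                "selected_tags": ["animals", "nature", "cute"],
                "generation_count": 1,
            },
        )
        if response.status_code != 200:
            print(f"Error: {response.status_code} - {response.text}")
            return None

        session_id = response.json()["session_id"]
        print(f"Generation started: {session_id}")

        # Poll until the session reaches a terminal state or we run out of attempts
        for _ in range(MAX_POLLS):
            status_response = await client.get(
                f"{BASE_URL}/workflow/status/{session_id}"
            )
            if status_response.status_code == 200:
                status_data = status_response.json()
                print(f"Status: {status_data['status']} - {status_data['message']}")
                if status_data["status"] in ("completed", "failed"):
                    return status_data
            else:
                # For example, 404 for an unknown or expired session
                print(f"Status check failed: {status_response.status_code}")
            await asyncio.sleep(POLL_INTERVAL_SECONDS)

        print("Timed out waiting for generation to finish")
        return None


# Run the example
if __name__ == "__main__":
    asyncio.run(generate_memecoin())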

🔧 Configuration Files

LLM Configuration (src/res/config/generate_meme_llm.yaml)

model: "anthropic/claude-3-sonnet-20240229"
temperature: 0.7          # Balanced creativity/consistency  
max_tokens: 2000          # Sufficient for complete metadata
timeout: 30               # Reasonable timeout for generation

Image Generation (src/res/config/image_generation_llm.yaml)

model: "stability-ai/sdxl"
resolution: "512x512"     # Standard memecoin image size
style: "digital art"      # Appropriate for crypto content
quality: "high"           # Maximum quality for production use

🛡️ Error Handling and Resilience

Service-Level Error Handling

  • Health Checks: Continuous monitoring of dependent services
  • Graceful Degradation: Continues processing when possible
  • Session Cleanup: Automatic cleanup of stale sessions (1 hour timeout)
  • Rate Limiting: Built-in protection against API quota exhaustion
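
The stale-session cleanup mentioned above can be pictured as a periodic sweep over last-activity timestamps. The sketch below is illustrative only: the 1 hour timeout comes from this document, while `purge_stale_sessions`, the shape of the timestamp mapping, and the use of last activity rather than creation time are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

SESSION_TTL = timedelta(hours=1)  # the 1 hour timeout stated above


def purge_stale_sessions(
    last_seen: dict[str, datetime],
    now: Optional[datetime] = None,
) -> list[str]:
    """Remove sessions idle for longer than SESSION_TTL; return the removed IDs."""
    now = now or datetime.now(timezone.utc)
    stale = [sid for sid, seen in last_seen.items() if now - seen > SESSION_TTL]
    for sid in stale:
        del last_seen[sid]
    return stale
```

A status request for a purged session would then fall under the Session Not Found (404) case listed in the error scenarios.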

Client Error Responses

{
  "success": false,
  "detail": "RAG workflow service is not running. Please start the workflow service using: PYTHONPATH=. python src/workflows/rag_memecoin_generation_workflow.py",
  "status_code": 503
}

Common Error Scenarios

  • Service Unavailable (503): Workflow service not running
  • Invalid Request (400): Missing required fields or invalid format
  • Processing Failed (500): Internal AI service errors
  • Session Not Found (404): Invalid or expired session ID

🚀 Deployment Considerations

Development Setup

  • Single machine deployment with both services on localhost
  • Ports 8000 (Web UI) and 8001 (Workflow Service) must be available
  • File system access required for output directory creation

Production Recommendations

  • Container Deployment: Docker containers for service isolation
  • Load Balancer: Multiple workflow service instances behind load balancer
  • External Storage: Cloud storage for generated files
  • Monitoring: Health check endpoints for service monitoring
  • API Gateway: Rate limiting and authentication layer

Scaling Patterns

  • Horizontal: Multiple workflow service instances
  • Vertical: Increase memory/CPU for faster generation
  • Queue-Based: Redis/RabbitMQ for handling high request volumes

The RAG v2 Generation Workflow Service packages RAG-enhanced memecoin generation as a standalone HTTP service. Its microservice architecture supports flexible deployment, from a single localhost setup to multiple instances behind a load balancer, while keeping generation grounded in retrieved examples.