
✏️ Generated Meme Edit Analysis Workflow

📋 Document Summary

What This Document Covers:

  • AI-powered editing workflow for generated memecoins with intelligent strategy selection
  • 6-stage pipeline: decision analysis → entity extraction → dual planning → image/metadata regeneration
  • Dual plan architecture with separate image and metadata edit plans
  • Multimodal entity extraction with direct database lookups (O(1))
  • Context-enhanced regeneration using entity reference images
  • Edit proposal caching system with 5-minute TTL

Context Tags: #workflow #editing #generated-memes #ai-analysis #dual-plan #multimodal #entity-extraction


Intelligent editing workflow for generated memecoins with LLM-powered decision analysis and context-enhanced regeneration

📋 Overview

The Generated Meme Edit Analysis Workflow is a sophisticated AI-powered system that analyzes user feedback and intelligently regenerates affected parts of generated memecoins (name, description, tags, image).

Unlike simple field updates, this workflow:

  • Analyzes feedback with LLM to determine what needs changing
  • Retrieves reference images from the entity database when problematic entities are detected
  • Regenerates only the affected parts (metadata-only, metadata with image, or full regeneration)
  • Maintains consistency between text and image content

🎯 Core Innovation: Intelligent Edit Strategy Selection

Traditional Approach: Regenerate everything → Wastes time and API costs

Our Approach: LLM analyzes feedback → Regenerates only what's needed → Saves time and money

| Strategy | When Used | Time | Cost |
|----------|-----------|------|------|
| Metadata-only | Text changes only | 2-3 sec | $0.002 |
| Metadata + Image | Visual elements affected | 8-12 sec | $0.15 |
| Full regeneration | Complete overhaul needed | 10-15 sec | $0.17 |
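To make the savings concrete, the per-edit costs from the table can be plugged into a small calculation. This is only an illustrative sketch: the dictionary keys reuse the strategy identifiers from the EditDecision model, and the mix of 100 edits is a made-up assumption, not measured usage.

```python
# Per-edit cost in USD, copied from the strategy table above.
STRATEGY_COST_USD = {
    "metadata_only": 0.002,
    "metadata_with_image": 0.15,
    "full_regeneration": 0.17,
}


def total_cost(edit_counts: dict[str, int]) -> float:
    """Sum the cost of a batch of edits, keyed by strategy name."""
    return sum(STRATEGY_COST_USD[name] * count for name, count in edit_counts.items())


# Hypothetical mix of 100 edits in which most requests only touch text.
mix = {"metadata_only": 70, "metadata_with_image": 20, "full_regeneration": 10}
selective = total_cost(mix)
always_full = total_cost({"full_regeneration": sum(mix.values())})
print(f"selective: ${selective:.2f}, always full: ${always_full:.2f}")
```

Under this assumed mix, selective regeneration costs a fraction of regenerating everything; the real ratio depends on how often feedback touches the image.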

Key Innovation: Context-enhanced regeneration using entity reference images for problematic entities

🏗️ Architecture

Core Assumptions:

  • User feedback is natural language (not structured commands)
  • Generated memecoins use same UUID for in-place updates (no versioning)
  • Session IDs track generation sessions
  • Proposals cached for 5 minutes, require explicit approval
  • Entity database provides O(1) reference image lookups

Key Innovation: 6-stage intelligent pipeline (EditDecisionStage → EntityExtractionStage → MemeEditPlanningStage → ImageRegenerationStage (conditional) → MetadataRegenerationStage (conditional) → EditOutputStage) with LLM planning, multimodal entity extraction, structured dual plan generation, and proposal caching (5-minute TTL).

🧠 Edit Strategy Analysis

Strategy Selection (EditDecisionStage determines):


**Output Model**:
```python
from typing import List

class EditDecision:
    regeneration_strategy: str  # metadata_only | metadata_with_image | full_regeneration
    affected_fields: List[str]  # e.g., ["description", "tags"]
    problematic_entities: List[str]  # e.g., ["pepe", "wojak"]
    needs_image_regeneration: bool
    needs_context_retrieval: bool
    reasoning: str  # LLM explanation
```

🔄 6-Stage Processing Pipeline

Stage 1: EditDecisionStage (LLM Analysis) (~2-3 seconds)

  • Process: Analyze feedback with LLM to determine edit strategy
  • Input: Current memecoin + user feedback
  • Output: EditDecision with strategy, affected fields, problematic entities
  • Failure: LLM error → Return error to user
  • Context updates: decision, strategy, affected_fields

Example: {"regeneration_strategy": "metadata_with_image", "affected_fields": ["name", "description"], "problematic_entities": ["pepe"], "needs_image_regeneration": true}
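As a rough illustration of how such a decision could be consumed, the sketch below parses the example JSON into a dataclass and rejects unknown strategies. The dataclass defaults and the `parse_decision` helper are assumptions made for this example; the actual stage may validate its LLM output differently.

```python
import json
from dataclasses import dataclass, field
from typing import List

ALLOWED_STRATEGIES = {"metadata_only", "metadata_with_image", "full_regeneration"}


@dataclass
class EditDecision:
    regeneration_strategy: str
    # Defaults below are assumptions so that partial LLM output still parses.
    affected_fields: List[str] = field(default_factory=list)
    problematic_entities: List[str] = field(default_factory=list)
    needs_image_regeneration: bool = False
    needs_context_retrieval: bool = False
    reasoning: str = ""


def parse_decision(raw: str) -> EditDecision:
    """Parse LLM JSON output (hypothetical helper) and reject unknown strategies."""
    decision = EditDecision(**json.loads(raw))
    if decision.regeneration_strategy not in ALLOWED_STRATEGIES:
        raise ValueError(f"Unknown strategy: {decision.regeneration_strategy!r}")
    return decision


raw = (
    '{"regeneration_strategy": "metadata_with_image", '
    '"affected_fields": ["name", "description"], '
    '"problematic_entities": ["pepe"], "needs_image_regeneration": true}'
)
decision = parse_decision(raw)
print(decision.regeneration_strategy, decision.affected_fields)
```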

Stage 2: EntityExtractionStage (Multimodal Extraction) (~0.5-1 seconds)

  • Process: Multimodal entity extraction from metadata, feedback, and current image
  • Input: MemecoinEntry + user_feedback + image_base64 (optional)
  • Output: EntityExtractionResult with entity_reference_images
  • Failure: No entities found → Continue with empty result (warning logged)
  • Context updates: extracted_entities, entity_reference_images, matched_entities
  • Triggered: Always runs (not conditional)

Key Features:

  • Multimodal extraction: Analyzes both text (metadata + feedback) and image simultaneously
  • Direct database lookups: O(1) key-based lookups in EntityDatabaseService
  • Reference images: Returns Dict[str, List[str]] mapping entity names to Base64 images
  • Image extraction: Enabled by default (enable_image_extraction=True)
  • Entity types: Extracts CHARACTER, LOGO, OBJECT, SYMBOL types (filters out LOCATION, CONCEPT)

Example: Returns entity_reference_images as {"pepe": ["data:image/png;base64,iVBOR..."], "bnb_logo": ["data:image/png;base64,iVBOR..."]} with matched/unmatched counts.
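The type filtering and key-based lookup described above can be sketched as follows. The in-memory `ENTITY_DB` dictionary and the `collect_reference_images` function are hypothetical stand-ins for EntityDatabaseService, and the Base64 strings are dummy placeholders.

```python
from typing import Dict, List, Tuple

# Entity types kept by the stage; LOCATION and CONCEPT are filtered out.
KEPT_TYPES = {"CHARACTER", "LOGO", "OBJECT", "SYMBOL"}

# Hypothetical stand-in for EntityDatabaseService: entity name -> Base64 images.
ENTITY_DB: Dict[str, List[str]] = {
    "pepe": ["data:image/png;base64,AAAA"],
    "bnb_logo": ["data:image/png;base64,BBBB"],
}


def collect_reference_images(
    extracted: List[Tuple[str, str]],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Filter (name, type) pairs by type, then look each name up by exact key."""
    matched: Dict[str, List[str]] = {}
    unmatched: List[str] = []
    for name, entity_type in extracted:
        if entity_type not in KEPT_TYPES:
            continue  # dropped before any lookup happens
        images = ENTITY_DB.get(name)  # dict lookup, O(1) on average
        if images:
            matched[name] = images
        else:
            unmatched.append(name)
    return matched, unmatched


matched, unmatched = collect_reference_images(
    [("pepe", "CHARACTER"), ("moon", "LOCATION"), ("wojak", "CHARACTER")]
)
print(sorted(matched), unmatched)
```

Entities without a database entry end up in the unmatched list rather than raising, which mirrors the "continue with empty result" failure behavior of this stage.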

Stage 3: MemeEditPlanningStage (DUAL Plan Generation) (~2-3 seconds)

  • Process: LLM-powered DUAL plan creation (image + metadata) with entity-to-image mapping and inconsistency detection
  • Input: MemecoinEntry + edit_decision + entity_reference_images + extracted_entities + user_feedback + current image + current_metadata
  • Output: MemeEditPlan containing BOTH image_plan (ImageEditPlan) AND metadata_plan (List[MetadataEditInstruction])
  • Failure: LLM error → Terminate processing (critical for accurate regeneration)
  • Context updates: meme_edit_plan, planning_completed=True
  • Triggered: Always runs (not conditional)

Key Features:

  • DUAL plan architecture: Single LLM call generates both image AND metadata plans simultaneously
  • Metadata inconsistency detection: Compares stored vs current metadata to detect mismatches (e.g., name changed but ticker doesn't match)
  • Entity-to-image mapping: Explicit enumeration of which reference image corresponds to which entity
  • Visual attribute checklists: Detailed specifications for colors, shapes, proportions, styles per entity
  • Priority-based corrections: Critical (1), Important (2), Minor (3) classification for both image and metadata
  • Preservation instructions: Explicit guidance on what NOT to change
  • Context optimization: Uses 1 random reference image per entity (reduces LLM context usage)
  • Separation of concerns: Image plan for visual changes, metadata plan for text changes
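The context optimization in the list above (one random reference image per entity) might look like the following minimal sketch. The function name and the injectable random generator are assumptions made for testability.

```python
import random
from typing import Dict, List, Optional


def pick_one_reference(
    entity_reference_images: Dict[str, List[str]],
    rng: Optional[random.Random] = None,
) -> Dict[str, str]:
    """Keep a single randomly chosen reference image per entity."""
    rng = rng or random.Random()
    return {
        name: rng.choice(images)
        for name, images in entity_reference_images.items()
        if images  # entities without images are skipped
    }


references = {
    "pepe": ["pepe_a", "pepe_b", "pepe_c"],  # placeholder image payloads
    "bnb_logo": ["bnb_a"],
    "unknown": [],
}
picked = pick_one_reference(references, rng=random.Random(0))
print(picked)
```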

⚠️ IMPORTANT: current_metadata parameter enables inconsistency detection, but currently has an integration gap (see "Known Limitations" below). The planning stage code correctly handles this parameter, but it's not yet flowing from the orchestrator through the workflow to the processor context.

MemeEditPlan Output Structure (Dual Plan):

class MemeEditPlan:
    image_plan: ImageEditPlan  # Plan for visual changes
    metadata_plan: List[MetadataEditInstruction]  # Plan for metadata changes
    overall_strategy: str  # High-level editing approach

class ImageEditPlan:
    entities_to_edit: List[EntityEditInstruction]  # Entity-specific corrections
    preservation_instructions: List[str]  # Elements to keep unchanged
    edit_priority: List[str]  # Entity names ordered by importance
    overall_strategy: str  # High-level approach for image
    requires_regeneration: bool  # Whether image regeneration needed

class EntityEditInstruction:
    entity_name: str  # e.g., "Solana logo", "Wojak character"
    reference_image_indices: List[int]  # Which ref images to use (0-based)
    corrections_needed: List[str]  # Specific bullet points
    visual_attributes: Dict[str, str]  # color, shape, style, proportions
    priority: int  # 1=critical, 2=important, 3=minor

class MetadataEditInstruction:
    field_name: str  # name | ticker | description | tags
    instruction: str  # What to change/fix
    reasoning: str  # Why this change is needed
    priority: int  # 1=critical, 2=important, 3=minor

Example (Dual Plan structure):

{
  "image_plan": {
    "entities_to_edit": [{
      "entity_name": "Solana logo",
      "reference_image_indices": [1],
      "corrections_needed": ["Replace with official logo", "Match gradient"],
      "visual_attributes": {"color_scheme": "Gradient #00D4AA to #9945FF"},
      "priority": 1
    }],
    "preservation_instructions": ["Keep main Pepe character"],
    "requires_regeneration": true
  },
  "metadata_plan": [{
    "field_name": "ticker",
    "instruction": "Update to match new name",
    "priority": 1
  }]
}
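Before an image plan like this is used, it can be useful to check that every reference index points at an image that was actually sent, and to order entities by priority. The sketch below assumes indices address the combined image list (index 0 for the current image, 1 and up for entity references); the helper name and the "Pepe hat" entity are hypothetical.

```python
from typing import Any, Dict, List


def entities_by_priority(image_plan: Dict[str, Any], image_count: int) -> List[str]:
    """Return entity names ordered by priority after checking image indices."""
    entities = image_plan.get("entities_to_edit", [])
    for entity in entities:
        for index in entity["reference_image_indices"]:
            if not 0 <= index < image_count:
                raise ValueError(
                    f"{entity['entity_name']}: image index {index} out of range"
                )
    ordered = sorted(entities, key=lambda entity: entity["priority"])
    return [entity["entity_name"] for entity in ordered]


image_plan = {
    "entities_to_edit": [
        {"entity_name": "Pepe hat", "reference_image_indices": [2], "priority": 3},
        {"entity_name": "Solana logo", "reference_image_indices": [1], "priority": 1},
    ],
}
ordered_names = entities_by_priority(image_plan, image_count=3)
print(ordered_names)
```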

Stage 4: ImageRegenerationStage (Conditional) (~5-8 seconds)

  • Process: Structured plan-based image regeneration with enhanced prompts
  • Input: MemecoinEntry + meme_edit_plan + entity_reference_images + user_feedback
  • Output: New Base64-encoded image in edited_entry.image_base64
  • Failure: Image generation error → Keep original image (graceful degradation)
  • Context updates: edited_entry (with new image), image_regenerated=True
  • Triggered: Only if needs_image_regeneration == true

Key Features:

  • Extracts image_plan from MemeEditPlan: Uses ONLY the image plan component from the dual plan (ignores metadata_plan)
  • Enhanced prompt construction: Explicit entity-to-image enumeration and strong biasing language
  • Current image as PRIMARY reference: Uses existing image for edit context (Image 0)
  • Entity references numbered: Image 1, Image 2, etc. mapped to specific entities
  • Context optimization: 1 random reference image per entity (selected in Stages 2-3)
  • Strong replication bias: "MUST replicate EXACTLY" language for accurate entity rendering
  • Base64-first architecture: Returns Base64 data directly (no file paths)
  • Graceful failure handling: On image generation failure the original image is kept (see Failure above), so the edit can still proceed

Enhanced Prompt Structure: Enumerates reference images (Image 0: current, Image 1+: entity refs), lists entity-specific corrections with visual attributes (colors, shapes, styles), preservation instructions, and critical execution rules ("MUST replicate EXACTLY", "Study each reference carefully", "Do not improvise").
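A prompt with this shape could be assembled roughly as shown below. The exact wording, section labels, and the `build_image_prompt` function are illustrative assumptions; only the image numbering scheme and the quoted rule phrases come from the description above.

```python
from typing import Dict, List


def build_image_prompt(
    entity_names: List[str],
    corrections: Dict[str, List[str]],
    preservation: List[str],
) -> str:
    """Enumerate reference images, then list corrections and preservation rules."""
    lines = ["Reference images:", "Image 0: current meme image (primary reference)"]
    for number, name in enumerate(entity_names, start=1):
        lines.append(f"Image {number}: reference for {name}")
    lines.append("Corrections:")
    for number, name in enumerate(entity_names, start=1):
        for item in corrections.get(name, []):
            lines.append(f"- {name} (Image {number}): {item}")
    lines.append("Preserve:")
    lines.extend(f"- {item}" for item in preservation)
    lines.append(
        "Rules: MUST replicate EXACTLY. Study each reference carefully. Do not improvise."
    )
    return "\n".join(lines)


prompt = build_image_prompt(
    ["Solana logo"],
    {"Solana logo": ["Replace with official logo", "Match gradient"]},
    ["Keep main Pepe character"],
)
print(prompt)
```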

Stage 5: MetadataRegenerationStage (Conditional) (~2-3 seconds)

  • Process: Structured metadata field regeneration using dual plan's metadata instructions
  • Input: edited_entry (with regenerated image) + meme_edit_plan.metadata_plan + edit_decision.affected_fields
  • Output: Updated MetadataProposal (name, ticker, description, tags)
  • Failure: LLM error → Terminate processing (critical)
  • Context updates: edited_entry (with updated metadata), metadata_regenerated=True
  • Triggered: Only if metadata_plan is not empty OR affected_fields is not empty

Key Features:

  • Uses structured MetadataEditInstruction list: Extracts metadata_plan from MemeEditPlan (ignores image_plan)
  • Priority-based instructions: Displays with emoji labels (🔴 CRITICAL, 🟡 IMPORTANT, 🟢 MINOR)
  • Multimodal context: Receives regenerated image from Stage 4 as visual reference
  • Field-specific guidance: Each instruction targets specific field (name, ticker, description, tags)
  • Targeted updates: Regenerates ONLY fields with instructions or in affected_fields list
  • Field preservation: Unaffected fields remain unchanged
  • Tag validation: Pydantic validator handles both list and comma-separated string inputs
  • Visual consistency: Metadata reflects regenerated image content (seen by LLM via multimodal input)

LLM Prompt Structure: Shows current metadata fields, then structured instructions with priority emoji labels (🔴 CRITICAL, 🟡 IMPORTANT, 🟢 MINOR) including field name, instruction text, and reasoning. Regenerated image provided as multimodal visual context.
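The priority labelling can be sketched as a small formatter. The line layout and the choice to sort by priority are assumptions for illustration; the labels themselves are the ones listed above.

```python
from typing import Dict, List

PRIORITY_LABELS = {1: "🔴 CRITICAL", 2: "🟡 IMPORTANT", 3: "🟢 MINOR"}


def format_instructions(instructions: List[Dict[str, object]]) -> str:
    """Render metadata instructions, most critical first (ordering is an assumption)."""
    ordered = sorted(instructions, key=lambda item: item["priority"])
    return "\n".join(
        f"{PRIORITY_LABELS[item['priority']]} [{item['field_name']}] "
        f"{item['instruction']} (why: {item['reasoning']})"
        for item in ordered
    )


rendered = format_instructions(
    [
        {"field_name": "tags", "instruction": "Add dog tags",
         "reasoning": "Theme changed", "priority": 3},
        {"field_name": "ticker", "instruction": "Update to match new name",
         "reasoning": "Name changed", "priority": 1},
    ]
)
print(rendered)
```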

MetadataProposal Model:

class MetadataProposal(BaseModel):
    token_name: Optional[str] = None  # None = keep current
    ticker: Optional[str] = None
    description: Optional[str] = None
    tags: Optional[List[str]] = None
    reasoning: str = "Metadata updated based on user feedback"
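The "None = keep current" convention and the tolerant tag handling can be illustrated with plain dictionaries. This is a standard-library sketch, not the Pydantic validator itself; `normalize_tags` and `apply_proposal` are hypothetical helper names.

```python
from typing import Dict, List, Union


def normalize_tags(value: Union[str, List[str]]) -> List[str]:
    """Accept either a list of tags or a comma-separated string."""
    items = value.split(",") if isinstance(value, str) else value
    return [item.strip() for item in items if item.strip()]


def apply_proposal(
    current: Dict[str, object], proposal: Dict[str, object]
) -> Dict[str, object]:
    """Overlay proposed fields on current metadata; None means keep the current value."""
    updated = dict(current)
    for field_name, value in proposal.items():
        if field_name == "reasoning" or value is None:
            continue
        updated[field_name] = normalize_tags(value) if field_name == "tags" else value
    return updated


current = {"token_name": "CatCoin", "ticker": "CAT",
           "description": "A cat", "tags": ["cats"]}
proposal = {"token_name": None, "ticker": "DOGE", "description": None,
            "tags": "dogs, meme, ", "reasoning": "Ticker should match theme"}
updated = apply_proposal(current, proposal)
print(updated)
```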

Stage 6: EditOutputStage (Final Action Creation) (~0.1 seconds)

  • Process: Create GeneratedEditProposalAction from processed context
  • Input: original_entry + edited_entry + edit_decision + image_regenerated
  • Output: GeneratedEditProposalAction for executor
  • Failure: Missing entries → Terminate with error
  • Context updates: edit_proposal_action
  • Triggered: Always runs (not conditional)

Key Features:

  • Comparison logic: Determines affected_fields by comparing original vs edited
  • Edit summary: Packages strategy, affected fields, image regeneration status
  • Session tracking: Includes session_id for proposal lifecycle management
  • Action packaging: Creates final action for GeneratedEditProposalExecutor

GeneratedEditProposalAction Structure:

{
  "session_id": "abc123...",
  "original_entry": MemecoinEntry(...),
  "edited_entry": MemecoinEntry(...),
  "edit_summary": {
    "strategy": "metadata_with_image",
    "affected_fields": ["name", "description"],
    "image_regenerated": True,
    "entity_count": 2
  },
  "user_feedback": "Change dog to cat"
}
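The comparison logic behind `affected_fields` can be approximated by a field-by-field diff. The compared field names and both helper functions below are assumptions for illustration; the real stage works on MemecoinEntry objects rather than dictionaries.

```python
from typing import Dict, List

# Assumed set of comparable metadata fields.
COMPARED_FIELDS = ("name", "ticker", "description", "tags")


def changed_fields(original: Dict[str, object], edited: Dict[str, object]) -> List[str]:
    """List the fields whose values differ between the two entries."""
    return [name for name in COMPARED_FIELDS if original.get(name) != edited.get(name)]


def build_edit_summary(
    original: Dict[str, object],
    edited: Dict[str, object],
    strategy: str,
    image_regenerated: bool,
    entity_count: int,
) -> Dict[str, object]:
    """Package the summary in the shape shown in the action structure above."""
    return {
        "strategy": strategy,
        "affected_fields": changed_fields(original, edited),
        "image_regenerated": image_regenerated,
        "entity_count": entity_count,
    }


original = {"name": "Dog Coin", "ticker": "DOG", "description": "A dog", "tags": ["dogs"]}
edited = {"name": "Cat Coin", "ticker": "DOG", "description": "A cat", "tags": ["dogs"]}
summary = build_edit_summary(original, edited, "metadata_with_image", True, 2)
print(summary)
```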

Total Pipeline Time:

  • Metadata-only: ~7-10 seconds (Stage 1 + Stage 2 + Stage 3 + Stage 5 + Stage 6)
  • Image-only: ~10-15 seconds (Stage 1 + Stage 2 + Stage 3 + Stage 4 + Stage 6)
  • Full regeneration: ~12-18 seconds (All 6 stages)
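Totals like these can be sanity-checked by summing the per-stage estimates quoted in the stage headings and skipping the conditional stages that do not run. The sketch below is only an arithmetic aid; the function name and flag names are assumptions.

```python
from typing import Tuple

# (low, high) estimates in seconds, taken from the stage headings above.
STAGE_SECONDS = {
    "EditDecisionStage": (2.0, 3.0),
    "EntityExtractionStage": (0.5, 1.0),
    "MemeEditPlanningStage": (2.0, 3.0),
    "ImageRegenerationStage": (5.0, 8.0),
    "MetadataRegenerationStage": (2.0, 3.0),
    "EditOutputStage": (0.1, 0.1),
}
# Conditional stages and the flag that enables each of them.
CONDITIONAL = {"ImageRegenerationStage": "image", "MetadataRegenerationStage": "metadata"}


def estimate(run_image: bool, run_metadata: bool) -> Tuple[float, float]:
    """Sum low/high estimates over the stages that would actually run."""
    flags = {"image": run_image, "metadata": run_metadata}
    low = high = 0.0
    for stage, (stage_low, stage_high) in STAGE_SECONDS.items():
        if stage in CONDITIONAL and not flags[CONDITIONAL[stage]]:
            continue
        low += stage_low
        high += stage_high
    return round(low, 1), round(high, 1)


print("metadata-only:", estimate(run_image=False, run_metadata=True))
print("image-only:", estimate(run_image=True, run_metadata=False))
print("full:", estimate(run_image=True, run_metadata=True))
```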

🚀 Running the Workflow

Prerequisites

# Required environment variables
export OPENAI_API_KEY="sk-..."          # For LLM analysis
export REPLICATE_API_TOKEN="r8_..."    # For image generation

Orchestrator Integration

This workflow is designed for integration with GenerationOrchestrator, not standalone execution.

Web UI Flow:

User views generated memecoin (/generated.html)
    ↓
User clicks "Edit" or presses "E" key
    ↓
Navigate to /edit_generated_meme.html?uuid={uuid}&return_page={page}&return_index={index}
    ↓
User provides feedback (e.g., "Make it about dogs instead of cats")
    ↓
Frontend calls: POST /api/memecoins/generated/{uuid}/ai-edit
    Body: { "feedback": "..." }
    ↓
GenerationOrchestrator.generate_generated_edit_proposal()
    ↓
Runs GeneratedMemeEditAnalysisWorkflow (6-stage pipeline)
    ↓
Returns proposal with UUID (cached for 5 minutes)
    Response: { "proposal_uuid": "...", "proposal": {...} }
    ↓
User reviews proposal, optionally edits manually
    ↓
User clicks "Accept" or "Reject"
    ↓
If accepted:
  Frontend calls: PATCH /api/memecoins/generated/{uuid}
  Body: { "proposal_uuid": "...", "edited_proposal": {...} (optional) }
    ↓
  GenerationOrchestrator.accept_generated_edit_proposal()
    ↓
  In-place update (same UUID preserved)
    ↓
  Return to gallery with state preservation (?page=X&index=Y)

🔧 Configuration

Environment Variables Required:

  • OPENAI_API_KEY - OpenAI API key for LLM analysis and multimodal entity extraction
  • REPLICATE_API_TOKEN - Replicate API token for image generation

LLM: GPT-4 Turbo (temperature: 0.3) via litellm.yaml

Image Generation: Stability AI SDXL via Replicate (512x512, guidance_scale: 7.5)

Orchestrator Integration: See the "Integration with GenerationOrchestrator" section below for programmatic usage

🗄️ Entity Database Integration

The workflow uses EntityDatabaseService for fast reference image lookups during entity extraction.

How It Works

  1. EntityExtractionStage extracts entities from metadata, feedback, and current image
  2. For each extracted entity, performs O(1) direct key lookup in entity database
  3. Returns entity_reference_images Dict mapping entity names to Base64 images
  4. ImageRegenerationStage uses these references for accurate visual generation

Entity Database Structure & Lookup

Database Path: res/entity_database/{entity_type}/{entity_name}/ (e.g., meme_character/pepe/, crypto_logo/solana/)

Lookup: O(1) direct key lookup via entity_database_service.get_entity_images(entity_name, entity_type), which returns a Base64 image array
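A directory layout like this lends itself to a lookup that resolves the path directly from the entity type and name. The free function below is a hypothetical sketch, not the service's implementation: it takes the database root as an explicit argument and assumes reference images are stored as PNG files.

```python
import base64
from pathlib import Path
from typing import List


def get_entity_images(root: Path, entity_name: str, entity_type: str) -> List[str]:
    """Read every PNG under root/entity_type/entity_name as a Base64 data URI."""
    entity_dir = root / entity_type / entity_name
    if not entity_dir.is_dir():
        return []  # unknown entity: no reference images
    images = []
    for path in sorted(entity_dir.glob("*.png")):  # PNG-only is an assumption
        encoded = base64.b64encode(path.read_bytes()).decode("ascii")
        images.append(f"data:image/png;base64,{encoded}")
    return images
```

For example, `get_entity_images(Path("res/entity_database"), "pepe", "meme_character")` would return one data URI per PNG found in that entity's folder, or an empty list if the folder does not exist.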

Benefits vs Vector DB Queries

| Aspect | Entity Database (Current) | Vector DB (Old) |
|--------|---------------------------|-----------------|
| Lookup Speed | O(1) - instant | O(n log n) - similarity search |
| Reliability | 100% if entity exists | Fuzzy matching, false positives |
| Complexity | Simple key lookup | Embedding generation + search |
| Latency | <0.1 seconds | 1-2 seconds |
| Memory | Minimal | Requires loaded embeddings |

⚡ Async Processing Patterns

The workflow uses asyncio.Event for coordinating async workflow completion.

Async Completion Signaling

# In GeneratedMemeEditAnalysisWorkflow.run()

# Create async event for completion signaling
self.processing_complete = asyncio.Event()

# Wrap executor.execute() to signal completion
original_execute = self.executor.execute

async def execute_with_signal(action):
    result = await original_execute(action)
    self.processing_complete.set()  # Signal completion
    return result

self.executor.execute = execute_with_signal

# Run workflow
await self.workflow.run()

# Wait for completion with 90-second timeout
try:
    await asyncio.wait_for(
        self.processing_complete.wait(),
        timeout=90.0
    )
except asyncio.TimeoutError:
    raise RuntimeError("Edit workflow timed out after 90 seconds")

Why This Pattern?

  • Problem: DefaultWorkflow.run() doesn't wait for async executor completion
  • Solution: Wrap executor.execute() to signal when action processed
  • Benefits: Clean async coordination, timeout protection
  • Timeout: 90 seconds covers worst case (full regeneration with slow image gen)
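The same coordination pattern can be tried in isolation with a toy executor. Everything in this sketch is a stand-in: the `Executor` class, the `run_and_wait` function, and the `create_task` call that plays the role of a workflow run which schedules the executor without awaiting it. Unlike the snippet above, it signals completion in a `finally` block and restores the original method afterwards, which is a design choice of this example rather than documented behavior.

```python
import asyncio


class Executor:
    """Hypothetical executor; execute() stands in for the real action handler."""

    async def execute(self, action: str) -> str:
        await asyncio.sleep(0.01)
        return f"done:{action}"


async def run_and_wait(executor: Executor, action: str, timeout: float = 90.0) -> str:
    done = asyncio.Event()
    original_execute = executor.execute

    async def execute_with_signal(act: str) -> str:
        try:
            return await original_execute(act)
        finally:
            done.set()  # signal even if the executor raised

    executor.execute = execute_with_signal
    try:
        # Stand-in for workflow.run(): schedules the executor without awaiting it.
        task = asyncio.create_task(executor.execute(action))
        try:
            await asyncio.wait_for(done.wait(), timeout=timeout)
        except asyncio.TimeoutError:
            task.cancel()
            raise RuntimeError(f"Edit workflow timed out after {timeout} seconds")
        return await task
    finally:
        executor.execute = original_execute  # undo the wrapper


print(asyncio.run(run_and_wait(Executor(), "edit")))
```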

📦 Proposal Lifecycle Management

Edit proposals have a 5-minute time-to-live (TTL) managed by GeneratedEditProposalExecutor.

Proposal Caching Flow

import uuid
from datetime import datetime

# 1. Generate Proposal (cached)
proposal_uuid = str(uuid.uuid4())
timestamp = datetime.now()
edited_entry = ...  # placeholder: the MemecoinEntry with edits applied

self._proposals[proposal_uuid] = (timestamp, edited_entry)
# Stored for 5 minutes

# 2. User Reviews Proposal (within 5 minutes)
# Frontend displays proposal, user can manually edit

# 3. Accept Proposal
if proposal_uuid in self._proposals:
    timestamp, cached_entry = self._proposals[proposal_uuid]

    # Check if expired
    if (datetime.now() - timestamp).total_seconds() > 300:  # 5 minutes
        raise ValueError("Proposal expired, please regenerate")

    # Apply updates in place; memecoin_uuid is the memecoin's own UUID,
    # named so that it does not shadow the uuid module imported above
    await storage.update(memecoin_uuid, cached_entry)

    # Remove from cache
    del self._proposals[proposal_uuid]

Proposal States

| State | Description | Duration | Actions |
|-------|-------------|----------|---------|
| Generated | Proposal created, cached | 0-5 min | Review, manual edit |
| Expired | TTL exceeded | >5 min | Must regenerate |
| Accepted | Applied to storage | N/A | Removed from cache |
| Rejected | User canceled | N/A | Removed from cache |
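The state transitions in the table can be modelled with a small self-contained cache. This is a sketch under stated assumptions, not the executor's code: the `ProposalCache` class is hypothetical, it uses an injectable monotonic clock instead of `datetime.now()` so expiry can be tested, and it evicts a proposal as soon as an expired accept is attempted.

```python
import time
import uuid
from typing import Any, Callable, Dict, Tuple


class ProposalCache:
    """Hypothetical TTL cache mirroring the Generated/Expired/Accepted/Rejected states."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._proposals: Dict[str, Tuple[float, Any]] = {}

    def put(self, entry: Any) -> str:
        """Generated: store the entry and hand back its proposal UUID."""
        proposal_uuid = str(uuid.uuid4())
        self._proposals[proposal_uuid] = (self._clock(), entry)
        return proposal_uuid

    def accept(self, proposal_uuid: str) -> Any:
        """Accepted: return and remove the entry; raise if missing or expired."""
        try:
            created_at, entry = self._proposals.pop(proposal_uuid)
        except KeyError:
            raise ValueError("Proposal not found") from None
        if self._clock() - created_at > self._ttl:
            raise ValueError("Proposal expired, please regenerate")
        return entry

    def reject(self, proposal_uuid: str) -> None:
        """Rejected: drop the entry if it is still cached."""
        self._proposals.pop(proposal_uuid, None)
```

Injecting the clock keeps the five-minute rule testable without sleeping; a caller would map the raised `ValueError` to the 400 response described for the accept endpoint.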

Manual Editing Support

Users can manually edit proposals before acceptance:

// Frontend allows manual editing
const editedProposal = {
  ...originalProposal,
  token_name: "User Tweaked Name",  // Manual override
  description: "Custom description"
};

// Send edited version to acceptance endpoint
PATCH /api/memecoins/generated/{uuid}
Body: {
  proposal_uuid: "...",
  edited_proposal: editedProposal  // Optional override
}

Behavior:

  • If edited_proposal provided: Use it instead of cached version
  • If edited_proposal null: Use cached version from executor
  • Proposal UUID still validated for expiration

🛡️ Error Handling

Critical Errors

  • LLM Service Unavailable: Return error to user, suggest retry
  • Image Generation Failure: Keep original image, log warning
  • Vector Store Error: Proceed without context, log warning

Retry Mechanisms

  • LLM failures: 1 retry with exponential backoff
  • Image generation: No retry (expensive), fail gracefully
  • Vector DB timeouts: Skip context retrieval, proceed
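The single-retry policy for LLM calls could be implemented along these lines. The helper name, the base delay, and catching all exceptions are assumptions for this sketch; a real implementation would likely retry only on transient error types.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def call_with_retry(
    operation: Callable[[], Awaitable[T]],
    retries: int = 1,
    base_delay: float = 0.5,
) -> T:
    """Await operation(); on failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(retries + 1):
        try:
            return await operation()
        except Exception:
            if attempt == retries:
                raise  # retries exhausted: surface the last error
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

With `retries=1` the operation is attempted at most twice, which matches the "1 retry" rule for LLM failures; image generation would simply not be wrapped.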

Graceful Degradation

  • No context found: Proceed with regeneration using only feedback
  • Image generation fails: Keep original image, update metadata only
  • Partial metadata update: Apply successful fields, flag failures

⚠️ Known Limitations

current_metadata Integration Gap (Phase 7 Incomplete)

Status: The dual plan architecture (Stages 1-6) is fully implemented and functional, but there's a critical integration gap preventing the current_metadata parameter from reaching the planning stage.

Impact: Metadata inconsistency detection feature is non-functional - the LLM cannot compare stored vs current metadata to detect mismatches.

What Works ✅:

  • UI correctly sends current_metadata to API
  • API correctly passes it to orchestrator
  • Orchestrator correctly accepts the parameter
  • Planning stage code correctly handles the parameter
  • All stage logic works as designed

What's Broken ❌:

  • Workflow signature doesn't accept current_metadata parameter
  • MemoryMemecoinInputSource doesn't support current_metadata
  • Processor doesn't populate context.results["current_metadata"]
  • Result: Planning stage always receives None

Example of Broken Feature:

User Action: Changes name from "CatCoin" to "DogeCoin" but ticker stays "CAT"
Expected: LLM detects mismatch, suggests "DOGE" ticker in metadata_plan
Actual: No inconsistency detection (current_metadata is None)

Workaround: None - requires code changes to complete the integration

For Details: See MEME_EDIT_PLANNING_REFACTOR.md Phase 7 for complete technical analysis and required fixes

Recommended Action: Complete Phase 7 integration to enable full feature set (currently 80% complete)


🔗 Integration with GenerationOrchestrator

API Endpoints

1. Generate Edit Proposal

POST /api/memecoins/generated/{uuid}/ai-edit

Request:
{
  "feedback": "Change it to be about dogs instead"
}

Response (GeneratedEditProposalResponse):
{
  "status": 200,
  "success": true,
  "message": "Edit proposal generated successfully",
  "proposal_uuid": "def456-...",  // Valid for 5 minutes
  "proposal": {
    "token_name": "Doge Meme Coin",
    "ticker": "DOGE",
    "description": "The ultimate dog meme for crypto",
    "tags": ["animals", "dogs", "meme"],
    "image_base64": "data:image/png;base64...",  // Only if image regenerated
    "edit_summary": {
      "strategy": "metadata_with_image",
      "affected_fields": ["token_name", "description", "tags"],
      "image_regenerated": true,
      "entity_count": 2
    }
  }
}

Error Responses:
- 400: Invalid feedback or memecoin not found
- 500: Processing error (LLM failure, image generation error)

2. Accept Edit Proposal

PATCH /api/memecoins/generated/{uuid}

Request (GeneratedAcceptProposalRequest):
{
  "proposal_uuid": "def456-...",
  "edited_proposal": {  // Optional: manually edited proposal
    "token_name": "Custom Doge Coin",
    "description": "Manually tweaked description"
  }
}

Response (GeneratedAcceptProposalResponse):
{
  "status": 200,
  "success": true,
  "message": "Edit applied successfully"
}

Error Responses:
- 400: Proposal not found or expired (>5 minutes)
- 404: Memecoin UUID not found
- 500: Storage error during update

Proposal Lifecycle:

  • Generated proposals cached in executor for 5 minutes
  • After 5 minutes, proposal_uuid becomes invalid
  • User must regenerate proposal if expired
  • Accepted proposals removed from cache immediately
  • In-place updates preserve original UUID
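From a client's point of view, the two endpoints can be exercised with standard-library requests like the ones below. The base URL and both builder functions are assumptions for illustration; the sketch only constructs the requests and does not send them.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:8000"  # assumption: adjust to the actual deployment


def _json_request(url: str, payload: Dict[str, Any], method: str) -> urllib.request.Request:
    """Build a JSON request with the given HTTP method."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def build_ai_edit_request(memecoin_uuid: str, feedback: str) -> urllib.request.Request:
    """POST feedback to generate an edit proposal."""
    url = f"{BASE_URL}/api/memecoins/generated/{memecoin_uuid}/ai-edit"
    return _json_request(url, {"feedback": feedback}, "POST")


def build_accept_request(
    memecoin_uuid: str,
    proposal_uuid: str,
    edited_proposal: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """PATCH to accept a proposal, optionally with a manually edited version."""
    payload: Dict[str, Any] = {"proposal_uuid": proposal_uuid}
    if edited_proposal is not None:
        payload["edited_proposal"] = edited_proposal
    url = f"{BASE_URL}/api/memecoins/generated/{memecoin_uuid}"
    return _json_request(url, payload, "PATCH")
```

Passing either request to `urllib.request.urlopen` would send it; the accept call should follow within five minutes of the proposal being generated.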

Orchestrator Code

# GenerationOrchestrator integration
import uuid
from datetime import datetime

async def generate_generated_edit_proposal(self, memecoin_uuid: str, user_feedback: str):
    # The parameter is named memecoin_uuid so it does not shadow the uuid module
    # Load current memecoin
    memecoin = await self._storage_service.load(memecoin_uuid)

    # Use pre-initialized workflow (dependency injection)
    edited_entry = await self._edit_workflow.run(
        memecoin_data=memecoin.to_dict(),
        user_feedback=user_feedback
    )

    # Get edit summary
    summary = self._edit_workflow.get_edit_summary()

    # Cache proposal
    proposal_uuid = str(uuid.uuid4())
    self._edit_proposals[proposal_uuid] = (datetime.now(), edited_entry)

    return proposal_uuid, edited_entry, summary

🔍 Troubleshooting

Common Issues

| Issue | Cause | Solution |
|-------|-------|----------|
| "Strategy unclear" error | Ambiguous feedback | Provide clearer, more specific feedback |
| Image not regenerating | EditDecisionStage determined not needed | Add explicit "regenerate image" to feedback |
| "Proposal expired" error | >5 minutes since generation | Click "Generate Proposal" again |
| Entity extraction failing | No entities in database | Check EntityDatabaseService has required entities |
| Timeout after 90 seconds | Slow image generation or LLM | Check API service health, retry |
| "Missing original_entry" | Context flow error | Check stage graph execution order |

Stage-Specific Issues

Stage 1 (EditDecisionStage):

  • Issue: LLM returns invalid strategy
  • Solution: Check feedback clarity, verify LLM service health

Stage 2 (EntityExtractionStage):

  • Issue: No entities extracted despite visible entities
  • Solution: Check enable_image_extraction=True, verify vision model

Stage 4 (ImageRegenerationStage):

  • Issue: Image quality poor or missing entities
  • Solution: Check entity_reference_images present, verify LLM prompt generation

Stage 5 (MetadataRegenerationStage):

  • Issue: Metadata inconsistent with image
  • Solution: Verify stage order (Image before Metadata), check affected_fields

Stage 6 (EditOutputStage):

  • Issue: Proposal missing fields
  • Solution: Verify all previous stages completed, check context.results

🏗️ Architecture Patterns & Design Decisions

Decision: Linear Pipeline with Dedicated EditOutputStage

  • Decision: Extended the pipeline from 4 to 5 stages by adding a dedicated output stage (the current pipeline has 6 stages, with EditOutputStage as Stage 6)
  • Rationale: Separates action creation from processing logic
  • Benefits: Cleaner processor design, reusable output pattern
  • Implementation: EditOutputStage creates GeneratedEditProposalAction

Decision: Entity Extraction Always Runs (Not Conditional)

  • Decision: EntityExtractionStage runs on every edit, even when no entities found
  • Rationale: Simplifies pipeline logic, prepares context for potential entity needs
  • Benefits: No conditional branching, consistent stage execution
  • Cost: Minimal (~0.5-1 second), acceptable for workflow consistency

Decision: Multimodal Entity Extraction by Default

  • Decision: enable_image_extraction=True by default in EntityExtractionStage
  • Rationale: Image context improves entity identification accuracy
  • Benefits: Extracts entities from both text (metadata + feedback) and current image
  • Implementation: Single LLM call with vision capabilities

Decision: Direct Entity Database Lookups (Not Vector DB)

  • Decision: Use EntityDatabaseService with O(1) key lookups instead of vector DB queries
  • Rationale: Faster, more reliable, simpler implementation
  • Benefits: Sub-second lookups, no similarity scoring needed
  • Tradeoff: Requires exact entity name matches (handled by LLM extraction)

Decision: Image Before Metadata Regeneration

  • Decision: Stage 4 (Image) runs before Stage 5 (Metadata)
  • Rationale: Metadata can reference regenerated image content
  • Benefits: Better text/image consistency when both regenerated
  • Example: Description can mention new visual elements from regenerated image

Decision: Base64-First Image Handling

  • Decision: ImageRegenerationStage returns Base64 data directly
  • Rationale: Eliminates file I/O, simplifies data flow
  • Benefits: Faster, no disk operations, immediate availability
  • Implementation: Stored in edited_entry.image_base64

Decision: Fail-Fast Error Handling (No Placeholder Fallbacks)

  • Decision: LLM failures terminate processing immediately
  • Rationale: Prevents low-quality outputs, maintains data integrity
  • Benefits: User gets clear error, can retry with better feedback
  • Exceptions: Image regeneration failures → keep original (graceful degradation)

Decision: 5-Minute Proposal TTL with Async Caching

  • Decision: Cache proposals for 5 minutes with asyncio.Event synchronization
  • Rationale: Balances user review time with memory usage
  • Benefits: Fast acceptance, prevents stale proposals
  • Implementation: GeneratedEditProposalExecutor with TTL-based cache

Decision: In-Place Updates (Same UUID Preservation)

  • Decision: Accepted edits update same UUID, no versioning
  • Rationale: Simplifies storage, matches user mental model
  • Benefits: Gallery position preserved, no duplicate entries
  • Tradeoff: No edit history (acceptable for generated content)

Decision: LLM Always Used for Prompts (No Fallback)

  • Decision: ImageRegenerationStage always uses LLM for prompt generation
  • Rationale: Ensures high-quality, context-aware prompts
  • Benefits: Better image quality, intelligent entity corrections
  • Cost: Additional LLM call (~2 seconds), worth the quality gain

Decision: Dual Plan Architecture (Image + Metadata)

  • Decision: MemeEditPlanningStage generates both image AND metadata plans in single LLM call
  • Rationale: Separates visual concerns from text concerns, enables inconsistency detection
  • Benefits: Each downstream stage receives ONLY relevant instructions, better separation of concerns
  • Implementation: Single LLM call with structured output (MemeEditPlan model)
  • Status: ⚠️ 80% complete - stage architecture working, integration gap prevents current_metadata flow


Status: ⚠️ Production-Ready with Limitations (80% Complete - Integration Gap in Phase 7)

Version: 2.1.0 (6-stage architecture with dual planning)

Last Updated: 2025-01-15

Architecture: Event → Processing (6 stages with dual plan) → Execution

Key Features: Dual plan architecture (image + metadata), Multimodal entity extraction, Base64-first images, 5-minute proposal TTL

Known Limitations: current_metadata integration incomplete - inconsistency detection non-functional (see "Known Limitations" section)