🎯 Launch Detection Workflow (Simplified)

📋 Document Summary

What This Document Covers:

  • Simplified real-time token launch monitoring from pump.fun
  • 3-stage pipeline (metadata → image → file storage)
  • WebSocket-based event stream with automatic reconnection
  • File-only storage (no AI processing, no database)
  • High-throughput data capture for later processing

Context Tags: #workflow #real-time #websocket #launch-detection #file-storage


Real-time token monitoring with file-only storage and no AI processing

Core Concept

The Launch Detection Workflow provides a simplified, lightweight solution for monitoring new token launches from pump.fun. Unlike the AI-enhanced immediate processing variant, this workflow focuses on fast data capture with minimal processing overhead.

This creates a high-throughput foundation for collecting raw token data that can be processed later by other workflows when AI enhancement is needed.

Architecture Philosophy

WebSocket Stream → 3-Stage Pipeline → File Storage Only

Core Assumption: Fast capture beats perfect processing - collect everything first, enhance later.

Key Innovation: With zero AI dependencies, monitoring can run 24/7 without AI API quota concerns or AI service outages.

Component Architecture

┌─────────────────────┐    ┌──────────────────────┐    ┌─────────────────────┐
│ TokenLaunchSource   │───▶│ SimpleLaunchDetection│───▶│ SimpleFile          │
│ (WebSocket Client)  │    │ Processor            │    │ Executor            │
│                     │    │                      │    │                     │
│ • PumpPortal API    │    │ 3-Stage Pipeline:    │    │ • File Storage      │
│ • Event Generation  │    │ 1. TokenMetadata     │    │ • Pending Directory │
│ • Reconnection      │    │ 2. ImagePreparation  │    │ • JSON + JPG Pairs  │
└─────────────────────┘    │ 3. SimpleFileStorage │    └─────────────────────┘
                           └──────────────────────┘

Data Flow

PumpPortal WebSocket Message
    ↓
TokenLaunchEvent {
  raw_data: { mint, name, symbol, uri, ... }
  timestamp: "2024-01-04T15:30:42Z"
  platform: "pump.fun"
}
    ↓
Processing Context {
  input_data: TokenLaunchEvent
  results: {}
  should_continue_processing: true
  termination_reason: null
}
    ↓
[3 Sequential Stages - No AI Processing]
    ↓
File Output: JSON + JPG Pairs
    ↓
Pending Directory: res/memecoins/pending/
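
The two structures in the flow above can be sketched as plain dataclasses. This is a minimal illustration built only from the field names shown in the diagram; the `terminate` helper and the placeholder URI are assumptions for the example, not part of the documented API.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class TokenLaunchEvent:
    """One launch message from the WebSocket stream (fields as in the diagram)."""
    raw_data: dict[str, Any]   # e.g. mint, name, symbol, uri
    timestamp: str             # ISO 8601 string, e.g. "2024-01-04T15:30:42Z"
    platform: str              # e.g. "pump.fun"


@dataclass
class ProcessingContext:
    """Mutable state handed from stage to stage."""
    input_data: TokenLaunchEvent
    results: dict[str, Any] = field(default_factory=dict)
    should_continue_processing: bool = True
    termination_reason: Optional[str] = None

    def terminate(self, reason: str) -> None:
        # Hypothetical helper: record why the pipeline stopped early.
        self.should_continue_processing = False
        self.termination_reason = reason


event = TokenLaunchEvent(
    raw_data={
        "mint": "6eW4Cc3rFFGe7RXVDWA6cKPJzJxPwL9bDCFxbKcZpump",
        "name": "Example Token",
        "symbol": "EXMPL",
        "uri": "https://example.invalid/metadata.json",  # placeholder URI
    },
    timestamp="2024-01-04T15:30:42Z",
    platform="pump.fun",
)
context = ProcessingContext(input_data=event)
```

Each stage would read `context.input_data`, add its outputs to `context.results`, and call something like `terminate()` when it cannot continue.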

Processing Pipeline - Stage Classes

1. TokenMetadataStage - Foundation Data Extraction

class TokenMetadataStage(BaseStage[TokenLaunchEvent, TokenLaunchEvent])
  • Input: Raw WebSocket data with token address and IPFS URI
  • Processing:
    • Parses token creation data
    • Fetches IPFS metadata with retry logic (2 attempts, 1s/2s delays)
    • Extracts social links (website, twitter, telegram)
    • Creates basic MemecoinEntry object
  • Output: Enriched token data with complete metadata
  • Early termination: Missing name, symbol, or token_address, or an IPFS fetch failure
  • Context updates: memecoin_entry, ipfs_data
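
The retry and social-link steps above could look roughly like the sketch below. It reads "2 attempts, 1s/2s delays" as one initial try followed by retries after 1 s and then 2 s, which is an assumption about the intended schedule. The function names are hypothetical, and the fetcher is passed in so that no particular HTTP library is implied.

```python
import time
from typing import Callable, Optional, Sequence


def fetch_with_retry(
    fetch: Callable[[str], dict],
    uri: str,
    delays: Sequence[float] = (1.0, 2.0),
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call fetch(uri); after each failure wait the next delay and try again.

    Returns None once every attempt has failed, so the caller can
    terminate the pipeline early.
    """
    attempts = len(delays) + 1
    for attempt in range(attempts):
        try:
            return fetch(uri)
        except Exception:
            if attempt == attempts - 1:
                return None
            sleep(delays[attempt])
    return None


def extract_social_links(metadata: dict) -> dict:
    """Keep only the non-empty social fields named in the stage description."""
    return {
        key: metadata[key]
        for key in ("website", "twitter", "telegram")
        if metadata.get(key)
    }
```

Injecting `sleep` keeps the helper testable without real waiting; in production the default `time.sleep` applies.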

2. ImagePreparationStage - Image Processing

class ImagePreparationStage(BaseStage[TokenLaunchEvent, TokenLaunchEvent])
  • Input: MemecoinEntry with image URL from metadata
  • Processing:
    • Downloads image from IPFS with retry logic
    • Resizes to 512x512px for consistency
    • Converts to JPEG format with 90% quality
    • Base64-encodes the result for storage
    • Detects and validates the MIME type
  • Output: Processed image data in MemecoinEntry
  • Early termination: Image download failure, invalid format, or processing error
  • Context updates: image_base64 (updated in memecoin_entry)
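
The MIME detection and Base64 steps above can be illustrated with the standard library alone. The sketch below sniffs a few common image signatures from the leading bytes and encodes the payload; the helper names and the set of accepted formats are assumptions. Resizing and JPEG conversion need an imaging library and are deliberately left out.

```python
import base64
from typing import Optional

# Leading-byte signatures for a few common image formats.
_SIGNATURES = (
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
)


def sniff_image_mime(data: bytes) -> Optional[str]:
    """Return a MIME type based on magic bytes, or None if unrecognized."""
    for signature, mime in _SIGNATURES:
        if data.startswith(signature):
            return mime
    # WebP files are RIFF containers: "RIFF" + 4-byte size + "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None


def encode_image(data: bytes) -> str:
    """Base64-encode image bytes as an ASCII string for JSON storage."""
    return base64.b64encode(data).decode("ascii")
```

A stage built on this would terminate early when `sniff_image_mime` returns `None`, matching the "invalid format" condition listed above.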

3. SimpleFileStorageStage - File Persistence

class SimpleFileStorageStage(BaseStage[TokenLaunchEvent, TokenLaunchEvent])
  • Input: Complete MemecoinEntry with processed image
  • Processing:
    • Saves JSON metadata to {token_address}.json
    • Saves processed JPEG to {token_address}.jpg
    • Uses atomic file operations to prevent corruption
    • Stores in res/memecoins/pending/ directory
  • Output: File paths for saved data
  • Early termination: File system errors or write failures
  • Context updates: saved_json_path, saved_image_path
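
One common way to get the atomic behaviour described above is to write to a temporary file in the same directory and then rename it over the target. The sketch below assumes that approach; the function names and the `.tmp` naming are illustrative and not taken from the actual stage.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path


def atomic_write_bytes(path: Path, data: bytes) -> None:
    """Write data to a temp file next to path, then rename it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(
        dir=str(path.parent), prefix=path.stem + ".", suffix=".tmp" + path.suffix
    )
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)  # readers see the old file or the new one
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise


def save_pair(directory: Path, token_address: str, entry: dict, jpeg_bytes: bytes) -> tuple[Path, Path]:
    """Save {token_address}.jpg and {token_address}.json; return (json_path, jpg_path)."""
    json_path = directory / f"{token_address}.json"
    jpg_path = directory / f"{token_address}.jpg"
    # Image first, so a JSON file never appears without its image.
    atomic_write_bytes(jpg_path, jpeg_bytes)
    atomic_write_bytes(json_path, json.dumps(entry, indent=2).encode("utf-8"))
    return json_path, jpg_path
```

The temp file lives in the target directory because a rename is only atomic within one file system. Writing the image before the JSON is a design choice in this sketch: a consumer that scans for `.json` files will then always find the matching `.jpg`.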

Stage Graph Configuration

StageGraph(
    nodes=[
        StageNode(TokenMetadataStage(), next_stages=["ImagePreparationStage"]),
        StageNode(ImagePreparationStage(), next_stages=["SimpleFileStorageStage"]),
        StageNode(SimpleFileStorageStage(), next_stages=[])
    ],
    entry_point="TokenMetadataStage"
)
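
To show how a chain like the one configured above behaves at run time, here is a small stand-alone runner. It does not use the project's `StageGraph` or `StageNode` classes: `run_graph`, the dict-based context, and the two toy stages are hypothetical stand-ins that only mirror the sequential flow and early termination described in this document.

```python
from typing import Callable, Optional

# Hypothetical stage signature: mutate the context, then return None to
# continue or a short reason string to stop the pipeline early.
Stage = Callable[[dict], Optional[str]]


def run_graph(nodes: dict[str, tuple[Stage, list[str]]], entry_point: str, context: dict) -> dict:
    """Walk a linear stage graph until a stage stops or no successor remains."""
    current: Optional[str] = entry_point
    while current is not None:
        stage, next_stages = nodes[current]
        reason = stage(context)
        if reason is not None:
            context["should_continue_processing"] = False
            context["termination_reason"] = f"{current}: {reason}"
            break
        # The graph is a simple chain, so at most one successor is followed.
        current = next_stages[0] if next_stages else None
    return context


def metadata_stage(ctx: dict) -> Optional[str]:
    raw = ctx["input_data"]
    for key in ("name", "symbol", "mint"):
        if not raw.get(key):
            return f"missing {key}"
    ctx["results"]["memecoin_entry"] = {
        "token_name": raw["name"],
        "ticker": raw["symbol"],
        "token_address": raw["mint"],
    }
    return None


def storage_stage(ctx: dict) -> Optional[str]:
    # Placeholder: records the target path only, no file is written here.
    address = ctx["results"]["memecoin_entry"]["token_address"]
    ctx["results"]["saved_json_path"] = f"res/memecoins/pending/{address}.json"
    return None


nodes = {
    "TokenMetadataStage": (metadata_stage, ["SimpleFileStorageStage"]),
    "SimpleFileStorageStage": (storage_stage, []),
}
```

A launch missing its symbol stops in the first stage and never reaches storage, which is the early-termination behaviour each stage lists above.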

What's NOT Included (Simplified Design)

No AI Processing

  • No tag classification or LLM analysis
  • No caption generation
  • No CLIP embeddings
  • No vector database operations
  • No multimodal AI analysis

No Database Operations

  • No ChromaDB insertion
  • No pending/confirmed collection workflow
  • No vector similarity operations
  • No database dependencies

No Complex Validation

  • No AI-powered quality gates
  • No semantic tag requirements
  • No embedding validation
  • Minimal data completeness checks

Integration Points

Input: WebSocket Token Events

  • Connects to PumpPortal API for pump.fun launches
  • Automatic reconnection with exponential backoff
  • Platform-agnostic design for future expansion (Raydium, Jupiter)
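
Exponential backoff for reconnection is typically a delay that grows by a constant factor up to a cap. The generator below is a minimal sketch of that schedule; the base, factor, cap, and jitter values are assumptions, since the document does not state the ones the WebSocket client uses.

```python
import random
from typing import Iterator


def backoff_delays(
    base: float = 1.0,
    factor: float = 2.0,
    cap: float = 60.0,
    jitter: float = 0.0,
) -> Iterator[float]:
    """Yield reconnect delays: base, base*factor, ... limited to cap.

    jitter adds up to that fraction of extra random delay, so many
    clients do not reconnect at the same instant.
    """
    delay = base
    while True:
        yield delay * (1.0 + random.uniform(0.0, jitter))
        delay = min(cap, delay * factor)
```

A reconnect loop would take the next delay after each failed connection attempt and create a fresh generator once a connection succeeds, so the schedule starts again from `base`.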

Output: File-Based Storage

  • JSON Files: Complete token metadata in {token_address}.json format
  • JPG Files: Processed 512x512px images as {token_address}.jpg
  • Pending Directory: All files saved to res/memecoins/pending/
  • Atomic Operations: Prevents data corruption during writes

No AI Services Required

  • Zero API dependencies for LLM or embedding services
  • No AI quota limits or AI service outages to work around
  • No authentication tokens needed
  • Reliable 24/7 operation capability

Usage

Quick Start

# No environment variables required - zero AI dependencies
# Start simplified monitoring
PYTHONPATH=. python src/workflows/live_launch_detection_workflow.py

Expected Output

🚀 Simplified Token Launch Detection Workflow Starting...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📡 Connecting to token launch monitoring APIs...
🎯 Monitoring for new token launches on pump.fun...
📁 File storage enabled - memecoins will be saved as JSON + JPG pairs
🖼️ Images processed to 512x512px JPEG with 90% quality
💾 Target directory: res/memecoins/pending/
❌ AI processing disabled - no tags, captions, or embeddings
❌ Database storage disabled - only file operations
📊 Complete token data will be stored as files for later processing
🛑 Press Ctrl+C to stop
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

File Output Format

Directory Structure

res/memecoins/pending/
├── 6eW4Cc3rFFGe7RXVDWA6cKPJzJxPwL9bDCFxbKcZpump.json
├── 6eW4Cc3rFFGe7RXVDWA6cKPJzJxPwL9bDCFxbKcZpump.jpg
├── 8mWqZxPCFuFGF7RXVDWA6cKPJzJxPwL9bDCFxbKcZpump.json
├── 8mWqZxPCFuFGF7RXVDWA6cKPJzJxPwL9bDCFxbKcZpump.jpg
└── ...

JSON Format

{
  "token_name": "Trump Save America",
  "ticker": "TRUMPSAVE", 
  "description": "Make America Great Again memecoin for patriots",
  "token_address": "6eW4Cc3rFFGe7RXVDWA6cKPJzJxPwL9bDCFxbKcZpump",
  "created_at": "2025-01-15T10:30:42Z",
  "": "pump.fun"
}

Image Format

  • Format: JPEG (.jpg)
  • Resolution: 512x512px (resized and cropped)
  • Quality: 90% JPEG compression
  • Size: Typically 50-150KB per image

Value Proposition

For High-Volume Monitoring

  • Zero AI API costs enable continuous operation
  • High capture success rate keeps data loss low
  • Fast processing keeps up with busy launch periods
  • No AI or database service dependencies, so fewer failure points

For Later Processing

  • Complete file pairs ready for AI enhancement workflows
  • Standardized format compatible with meme processing workflow
  • Atomic file operations ensure data integrity
  • Pending directory serves as processing queue

For Development & Testing

  • No API keys or external services required
  • Fast iteration and testing cycles
  • Simple debugging with file-based output
  • Easy integration testing without AI complexity

Integration with Other Workflows

Input for Meme Processing Workflow

Files created by this workflow can be processed by:

# Process files captured by launch detection
PYTHONPATH=. python src/workflows/rag_memecoin_insertion_workflow.py res/memecoins/pending/

Input for AI Enhancement

The immediate processing workflow can handle the same events with full AI:

# For AI-enhanced processing (requires API keys)
PYTHONPATH=. python src/workflows/launch_detection_immediate_processing_workflow.py

Manual Curation

Files can be reviewed via Web UI:

# Web interface for file review
PYTHONPATH=. uvicorn src.web_ui.main:app --port 8000 --host 127.0.0.1 --reload

Design Decisions & Tradeoffs

Simplicity vs. AI Enhancement

  • Decision: Remove all AI processing for reliability
  • Tradeoff: No semantic tags or embeddings, but 95% success rate
  • Reasoning: Capture everything first, enhance later when needed

File Storage vs. Database

  • Decision: Use file-based storage only
  • Tradeoff: No immediate querying, but zero database dependencies
  • Reasoning: Files are portable and can be processed by multiple workflows

Image Processing vs. Raw Storage

  • Decision: Process images to standard format during capture
  • Tradeoff: Slight processing overhead, but consistent downstream compatibility
  • Reasoning: A fixed 512x512px JPEG suits both the web UI and later AI processing

Atomic Operations vs. Speed

  • Decision: Use atomic file operations even for simple storage
  • Tradeoff: Slightly slower writes, but prevents data corruption
  • Reasoning: Data integrity is critical for downstream processing

Dependencies & Configuration

Required Dependencies

  • Python 3.11+ with standard library
  • WebSocket connection capability
  • File system write access to res/memecoins/pending/

No External APIs Required

  • No REPLICATE_API_KEY needed
  • No OPENAI_API_KEY needed
  • No LANGFUSE_SECRET_KEY needed
  • No database connections required

System Requirements

  • Disk space: ~200KB per token (JSON + processed image)
  • Memory: Minimal - processes one token at a time
  • Network: Stable connection to the PumpPortal WebSocket API (source of pump.fun launch events)

Monitoring & Observability

Console Output

💾 Saving files for token: 6eW4Cc3rFFGe7RXVDWA6cKPJzJxPwL9bDCFxbKcZpump
✅ Successfully saved files for token 6eW4Cc3rFFGe7RXVDWA6cKPJzJxPwL9bDCFxbKcZpump
   📄 JSON: 6eW4Cc3rFFGe7RXVDWA6cKPJzJxPwL9bDCFxbKcZpump.json
   🖼️ JPG: 6eW4Cc3rFFGe7RXVDWA6cKPJzJxPwL9bDCFxbKcZpump.jpg (512x512px, processed)

Error Handling

  • Early termination on critical failures (missing metadata)
  • Graceful degradation for image processing failures
  • Atomic file operations prevent partial writes
  • Comprehensive error logging with failure reasons

File System Monitoring

  • Monitor res/memecoins/pending/ directory growth
  • Track file pair completeness (JSON + JPG)
  • Watch for failed temp files (.tmp.json, .tmp.jpg)
  • Monitor disk space usage for high-volume periods
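
The pair-completeness and temp-file checks above are easy to script. The sketch below scans a directory and reports complete pairs, orphaned files, and leftover temp files. The `audit_pending` name is hypothetical; the `.tmp.json` / `.tmp.jpg` suffixes are taken from the list above.

```python
from pathlib import Path


def audit_pending(directory: Path) -> dict[str, list[str]]:
    """Classify files in the pending directory by token address."""
    temp_files: list[str] = []
    json_ids: set[str] = set()
    jpg_ids: set[str] = set()
    for path in directory.iterdir():
        if not path.is_file():
            continue
        if path.name.endswith((".tmp.json", ".tmp.jpg")):
            temp_files.append(path.name)  # leftover from an interrupted write
        elif path.suffix == ".json":
            json_ids.add(path.stem)
        elif path.suffix == ".jpg":
            jpg_ids.add(path.stem)
    return {
        "complete": sorted(json_ids & jpg_ids),
        "missing_jpg": sorted(json_ids - jpg_ids),
        "missing_json": sorted(jpg_ids - json_ids),
        "temp_files": sorted(temp_files),
    }
```

Calling `audit_pending(Path("res/memecoins/pending"))` periodically gives a quick health check; a growing `temp_files` or `missing_*` list would point at interrupted writes.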

The Simplified Launch Detection Workflow provides a robust solution for high-volume token monitoring with no AI or database dependencies. Its focus on reliable data capture makes it well suited to 24/7 operation, and its file output serves as the foundation for more sophisticated AI-enhanced processing workflows.