DATABASE_API.md - API Integration and Best Practices

📋 Document Summary

What This Document Covers:

  • RESTful API endpoints for memecoin management
  • Database query patterns and optimization
  • Performance best practices and caching strategies
  • Troubleshooting guide for common issues
  • Implementation status and code references

Context Tags: #api #web-ui #performance #best-practices #troubleshooting


1. API Integration Patterns

LaunchAgencyBot exposes RESTful APIs for memecoin management and database operations through FastAPI. The API layer provides complete CRUD operations across the 3-tier storage workflow (Pending → Approved → Vector DB).

Key Implementation Files:

  • src/web_ui/api/memecoin_routes.py - Memecoin CRUD and curation APIs
  • src/web_ui/api/database_routes.py - Vector DB query and admin APIs
  • src/web_ui/api_models/memecoin.py - Pydantic response models
  • src/web_ui/main.py - FastAPI application entry point

2. Web UI Memecoin API

Implementation: src/web_ui/api/memecoin_routes.py

2.1 Stats Endpoint

Purpose: Get aggregated statistics across all storage tiers

| Aspect | Details |
| --- | --- |
| Endpoint | GET /api/memecoins/stats |
| Response | {pending: count, approved: count, database: count} |
| Use Case | Dashboard statistics, progress tracking |
| Performance | Cached globally, invalidated on mutations |
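The "cached globally, invalidated on mutations" behavior can be sketched as follows. This is a minimal illustration, not the project's implementation: the StatsCache class and its compute callback are hypothetical names, and it assumes a single-process server.

```python
from typing import Callable, Dict, Optional

Stats = Dict[str, int]


class StatsCache:
    """Hypothetical process-wide cache for the stats payload."""

    def __init__(self, compute: Callable[[], Stats]) -> None:
        self._compute = compute              # expensive function that counts all three tiers
        self._cached: Optional[Stats] = None

    def get(self) -> Stats:
        # Recompute only when no cached value is available.
        if self._cached is None:
            self._cached = self._compute()
        return dict(self._cached)            # copy, so callers cannot mutate the cache

    def invalidate(self) -> None:
        # Call after every approve, delete, degrade, or clear operation.
        self._cached = None
```

Every mutating route would call invalidate() before returning, so the next stats request recounts the tiers.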

2.2 Pending Collection

Purpose: Retrieve paginated pending memecoins for manual curation

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| page | int | 1 | Page number |
| limit | int | 20 | Items per page |
Endpoint: GET /api/memecoins/pending?page=1&limit=20

Response: Returns {memecoins: [...], page, limit, total}. Each memecoin includes: token_name, ticker, token_address, description, tags (array), tags_categories (array), image_data (Base64 data URI), image_caption (null for pending), creation_date (ISO 8601), has_image (boolean).

Features:

  • Includes Base64 image data for inline display
  • No captions (pending memecoins haven't been processed)
  • Sorted by creation date (newest first)
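The pagination and ordering described above can be sketched in a few lines. The function name is hypothetical, and the sketch assumes every creation_date uses the same ISO 8601 format and timezone, so that string order matches chronological order.

```python
from typing import Any, Dict, List


def paginate_newest_first(
    items: List[Dict[str, Any]], page: int = 1, limit: int = 20
) -> Dict[str, Any]:
    """Return one page of memecoins in the documented response shape."""
    if page < 1 or limit < 1:
        raise ValueError("page and limit must be positive integers")
    # Same-format ISO 8601 timestamps sort chronologically as plain strings.
    ordered = sorted(items, key=lambda m: m["creation_date"], reverse=True)
    start = (page - 1) * limit
    return {
        "memecoins": ordered[start:start + limit],
        "page": page,
        "limit": limit,
        "total": len(items),
    }
```

A page past the end returns an empty memecoins list while total still reports the full count, which lets a client detect the last page.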

2.3 Approved Collection

Purpose: Retrieve paginated approved memecoins ready for processing

Endpoint: GET /api/memecoins/approved?page=1&limit=20 (same pagination parameters as pending collection)

Response: Identical to the pending collection: returns {memecoins: [...], page, limit, total} with the same memecoin fields.

Key Difference: These memecoins have passed manual curation and are ready for AI caption generation and vector DB insertion. They still have no captions at this stage; captions are added during processing.


2.4 Approve Pending Memecoin

Purpose: Atomically move memecoin from pending → approved

POST /api/memecoins/approve/{token_address}

Path Parameters:

  • token_address (string, required): Solana token mint address

Response:

{
  "success": true,
  "message": "Memecoin moved to APPROVED collection"
}

Operation:

  1. Validates token exists in pending
  2. Moves JSON + JPG files atomically
  3. Invalidates stats cache
  4. Returns success confirmation

Error Cases:

  • 404: Token not found in pending
  • 500: File system operation failed
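A minimal sketch of the move step, under stated assumptions: files are named `<token_address>.json` and `<token_address>.jpg`, both directories live on the same filesystem, and the function name is hypothetical. Each `os.replace` call is atomic on its own; the pair is made all-or-nothing by rolling back the first move if the second fails.

```python
import os
from pathlib import Path


def approve_files(token_address: str, pending_dir: Path, approved_dir: Path) -> None:
    """Move a memecoin's JSON and JPG from pending to approved."""
    json_src = pending_dir / f"{token_address}.json"
    jpg_src = pending_dir / f"{token_address}.jpg"
    if not json_src.exists() or not jpg_src.exists():
        raise FileNotFoundError(token_address)    # a route would map this to 404

    approved_dir.mkdir(parents=True, exist_ok=True)
    json_dst = approved_dir / json_src.name
    jpg_dst = approved_dir / jpg_src.name

    os.replace(json_src, json_dst)                # atomic rename on one filesystem
    try:
        os.replace(jpg_src, jpg_dst)
    except OSError:
        os.replace(json_dst, json_src)            # roll back the first move
        raise                                     # a route would map this to 500
```

Stats cache invalidation would follow a successful return.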

2.5 Delete Operations

Purpose: Remove memecoins from any storage tier

DELETE /api/memecoins/pending/{token_address}
DELETE /api/memecoins/approved/{token_address}
DELETE /api/memecoins/database/{token_address}

Path Parameters:

  • token_address (string, required): Solana token mint address

Response:

{
  "success": true,
  "message": "Memecoin deleted from PENDING collection"
}

Operations:

Delete from Pending/Approved:

  1. Deletes JSON metadata file
  2. Deletes JPG image file
  3. Invalidates stats cache

Delete from Database:

  1. Removes from all 5 ChromaDB collections
  2. Deletes metadata file
  3. Deletes image file
  4. Invalidates stats cache
  5. Invalidates caption cache for token

Error Cases:

  • 404: Token not found in specified collection
  • 500: Partial deletion (rollback attempted)
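From the client side, the three delete routes differ only in the tier segment. The helper below is a hypothetical sketch that builds the request without sending it; the base URL is an assumption about where the FastAPI app is served.

```python
from urllib.parse import quote
from urllib.request import Request

TIERS = ("pending", "approved", "database")


def build_delete_request(base_url: str, tier: str, token_address: str) -> Request:
    """Build a DELETE request for one of the three storage tiers."""
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    # Percent-encode the address so it is always a single path segment.
    path = f"/api/memecoins/{tier}/{quote(token_address, safe='')}"
    return Request(base_url.rstrip("/") + path, method="DELETE")
```

Passing the result to `urllib.request.urlopen` would perform the call; a 404 or 500 response surfaces as `urllib.error.HTTPError`.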

2.6 Degradation Operation (NEW)

Purpose: Move memecoin from vector DB → approved, stripping AI-generated captions

POST /api/memecoins/degrade/{token_address}

Path Parameters:

  • token_address (string, required): Solana token mint address

Response:

{
  "success": true,
  "message": "Memecoin deleted from database and moved to APPROVED collection"
}

Operation:

  1. Loads metadata from vector DB
  2. Strips caption field (intentional data loss)
  3. Saves stripped metadata to approved directory
  4. Copies image file to approved directory
  5. Deletes from all 5 vector DB collections
  6. Invalidates stats cache
  7. Invalidates caption cache for token

Use Case: Fix incorrect tags/captions and reprocess through pipeline

Critical Warning:

⚠️ CAPTION DATA LOSS: Degradation intentionally removes AI-generated captions. They must be regenerated by reprocessing through the 7-stage insertion pipeline.

UI Requirement: Always display confirmation dialog with warning message: "This will remove AI-generated captions. Captions will need to be regenerated when re-inserting into the database." Include use case explanation and provide Cancel/Degrade action buttons.
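Steps 1 to 4 of the degradation operation can be sketched as below. The function name, the `<token_address>.json` / `.jpg` file naming, and the `image_caption` key for the stored caption are assumptions made for illustration; removing the entry from the vector DB collections and invalidating the caches (steps 5 to 7) are left to the caller.

```python
import json
import shutil
from pathlib import Path
from typing import Any, Dict


def degrade_files(
    token_address: str,
    db_metadata_dir: Path,
    db_image_dir: Path,
    approved_dir: Path,
    caption_key: str = "image_caption",   # assumed name of the caption field
) -> Dict[str, Any]:
    """Copy a memecoin back to the approved tier without its caption."""
    src = db_metadata_dir / f"{token_address}.json"
    metadata = json.loads(src.read_text(encoding="utf-8"))
    metadata.pop(caption_key, None)       # intentional data loss

    approved_dir.mkdir(parents=True, exist_ok=True)
    dst = approved_dir / f"{token_address}.json"
    dst.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    shutil.copy2(db_image_dir / f"{token_address}.jpg",
                 approved_dir / f"{token_address}.jpg")
    return metadata
```

Copying before deleting from the vector DB means a failure midway leaves the original entry intact.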


3. Database Query API

Implementation: src/web_ui/api/database_routes.py

3.1 Tag-Based Search

Purpose: Find memecoins matching specific tags

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| tags | string | Yes | Comma-separated tags (AND logic) |
| page | int | No | Page number (default: 1) |
| limit | int | No | Items per page (default: 20) |

Endpoint: GET /api/database/search?tags=dog,space&page=1&limit=20

Response: Returns {memecoins: [...], page, limit, total}. Each memecoin includes standard fields plus image_caption object with 4 keys: entity (subject description), context (cultural/reference meaning), visual (style/aesthetics), emotions (tone/sentiment).

Behavior:

  • Returns memecoins matching ALL specified tags (AND logic)
  • Includes complete 4-part caption structure
  • Caption loaded from metadata files (not ChromaDB)
  • Cached per token address
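The AND logic amounts to a subset test between the requested tags and each memecoin's tags. The sketch below uses hypothetical function names and assumes case-insensitive matching, which the text above does not specify.

```python
from typing import Any, Dict, Iterable, List, Set


def parse_tags(raw: str) -> Set[str]:
    """Split a comma-separated tags parameter, dropping blanks."""
    return {t.strip().lower() for t in raw.split(",") if t.strip()}


def filter_by_all_tags(
    memecoins: Iterable[Dict[str, Any]], raw_tags: str
) -> List[Dict[str, Any]]:
    """Keep only memecoins that carry every requested tag."""
    wanted = parse_tags(raw_tags)
    if not wanted:
        raise ValueError("at least one tag is required")
    # Subset test: every wanted tag must appear among the memecoin's tags.
    return [m for m in memecoins if wanted <= {t.lower() for t in m["tags"]}]
```

With tags=dog,space, a memecoin tagged only "dog" is excluded, while one tagged "dog", "space", and "moon" is returned.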

3.2 Paginated Database Browse

Purpose: Browse vector database with optional text search

GET /api/database/memecoins?page=1&limit=20&search=doge

Query Parameters:

  • page (int, optional): Page number (default: 1)
  • limit (int, optional): Items per page (default: 20)
  • search (string, optional): Text search query

Response: Same structure as tag search

Features:

  • Efficient pagination via ChromaDB offset/limit
  • Optional text search across name, ticker, description
  • No memory overhead for unused results
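Two small pieces sit behind this endpoint: translating page/limit into an offset, and matching the search text against the three searchable fields. The sketch below is illustrative only; the function names are hypothetical, and case-insensitive substring matching is an assumption about how the search behaves.

```python
from typing import Any, Dict

SEARCH_FIELDS = ("token_name", "ticker", "description")


def page_to_offset(page: int, limit: int) -> int:
    """Convert a 1-based page number into a row offset."""
    if page < 1 or limit < 1:
        raise ValueError("page and limit must be positive integers")
    return (page - 1) * limit


def matches_search(memecoin: Dict[str, Any], query: str) -> bool:
    """True if the query appears in the name, ticker, or description."""
    needle = query.strip().lower()
    if not needle:
        return True                       # empty search matches everything
    return any(needle in str(memecoin.get(f, "")).lower() for f in SEARCH_FIELDS)
```

The offset and limit can then be handed to the storage layer so only the requested page is loaded.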

3.3 Admin Clear Operation

Purpose: Clear entire vector database (admin only)

POST /api/database/admin/clear

Request Body:

{
  "confirm": "CLEAR_ALL_DATA"
}

Response:

{
  "success": true,
  "message": "Vector database cleared successfully",
  "deleted_count": 1203
}

Operation:

  1. Validates confirmation token
  2. Deletes all entries from 5 ChromaDB collections
  3. Deletes all image files
  4. Deletes all metadata files
  5. Clears caption cache
  6. Clears stats cache

Security:

  • Requires admin authentication (implementation-specific)
  • Requires explicit confirmation string
  • Irreversible operation
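The confirmation check in step 1 can be as simple as an exact string comparison on the request body. This is a sketch with a hypothetical function name; it covers only the confirmation string, not the admin authentication, which remains implementation-specific.

```python
from typing import Any, Mapping

CONFIRM_TOKEN = "CLEAR_ALL_DATA"


def is_clear_confirmed(body: Mapping[str, Any]) -> bool:
    """Accept only an exact, case-sensitive confirmation string."""
    supplied = body.get("confirm")
    return isinstance(supplied, str) and supplied == CONFIRM_TOKEN
```

Rejecting anything but the exact string (including lowercase or padded variants) keeps an accidental request from wiping the database.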

4. Data Models

Location: src/web_ui/api_models/memecoin.py

4.1 MemecoinResponse Model

Purpose: Unified Pydantic model for all memecoin API responses

Field Details:

| Field | Type | Source | Notes |
| --- | --- | --- | --- |
| token_name | str | ChromaDB metadata | Token name |
| ticker | str | ChromaDB metadata | Token symbol |
| token_address | str | ChromaDB metadata | Solana mint address (base58 string, used as the unique ID) |
| description | str | Metadata file | NOT in ChromaDB |
| tags | List[str] | ChromaDB metadata | Split from comma-separated string |
| tags_categories | List[str] | ChromaDB metadata | Split from comma-separated string |
| image_data | str | Image file | Base64 data URI from FileImageManager |
| image_caption | Optional[Dict] | Metadata file | NOT in ChromaDB, 4 keys: entity/context/visual/emotions |
| creation_date | str | ChromaDB metadata | ISO 8601 timestamp |
| has_image | bool | Computed | Always True in current implementation |
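To show how the fields come together from their three sources, here is a standard-library stand-in for the model. It uses a dataclass rather than Pydantic, and the class name, builder function, and metadata key names are assumptions; the actual definition lives in src/web_ui/api_models/memecoin.py.

```python
import base64
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


def split_csv(value: str) -> List[str]:
    """Split a comma-separated string as stored in ChromaDB metadata."""
    return [part.strip() for part in value.split(",") if part.strip()]


@dataclass
class MemecoinRecord:
    token_name: str
    ticker: str
    token_address: str
    description: str
    tags: List[str]
    tags_categories: List[str]
    image_data: str
    image_caption: Optional[Dict[str, str]]
    creation_date: str
    has_image: bool = True


def build_record(
    chroma_meta: Dict[str, Any], file_meta: Dict[str, Any], jpg_bytes: bytes
) -> MemecoinRecord:
    """Merge ChromaDB metadata, the metadata file, and the image file."""
    encoded = base64.b64encode(jpg_bytes).decode("ascii")
    return MemecoinRecord(
        token_name=chroma_meta["token_name"],
        ticker=chroma_meta["ticker"],
        token_address=chroma_meta["token_address"],
        description=file_meta.get("description", ""),        # not in ChromaDB
        tags=split_csv(chroma_meta.get("tags", "")),
        tags_categories=split_csv(chroma_meta.get("tags_categories", "")),
        image_data=f"data:image/jpeg;base64,{encoded}",
        image_caption=file_meta.get("image_caption"),        # None before processing
        creation_date=chroma_meta["creation_date"],
    )
```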

5. Best Practices and Maintenance

5.1 Database Health Monitoring

Regular Health Checks

Health Check: Call vector_store.connection_health_check() to verify database status (returns: healthy flag, total_memecoins count, collections list).

Monitoring Schedule:

  • Development: Every startup
  • Production: Daily automated check
  • Critical: After bulk operations

Cleanup Operations

Cleanup: Use image_manager.cleanup_orphaned_files(valid_tokens) and metadata_manager.cleanup_orphaned_files(valid_tokens) to remove orphaned files. Valid tokens retrieved via vector_store.get_all_token_addresses().
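The idea behind orphan cleanup is to compare the files on disk with the token addresses the vector store still knows about. The standalone function below is a hypothetical sketch of that check, not the managers' actual code, and it assumes files are named after the token address; it reports orphans without deleting them.

```python
from pathlib import Path
from typing import Iterable, List


def find_orphaned_files(directory: Path, valid_tokens: Iterable[str]) -> List[Path]:
    """List files whose name (without extension) is not a known token address."""
    valid = set(valid_tokens)
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.stem not in valid
    )
```

Listing first and deleting in a second step makes it easy to log or review what a cleanup run would remove.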


5.2 Backup Strategy

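Regular Backups (run from the directory that contains res/):

  • Full backup: tar -czf backup_$(date +%Y%m%d).tar.gz res/memecoins/rag_db/
  • Metadata only (lightweight): tar -czf metadata_backup_$(date +%Y%m%d).tar.gz res/memecoins/rag_db/metadata/

Restoration: From the same directory, extract the archive with tar -xzf backup_YYYYMMDD.tar.gz. The archive stores paths beginning with res/memecoins/rag_db/, so extracting with -C res/memecoins/ would nest the files under res/memecoins/res/memecoins/. ChromaDB automatically rebuilds from persistent storage (no manual intervention needed).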
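For scripted backups, the same archive layout can be produced with Python's tarfile module. This is a sketch with hypothetical function names; it makes explicit that member paths are stored relative to the project root, which is why restoration must extract into that same root.

```python
import tarfile
from pathlib import Path

RAG_DB_DIR = "res/memecoins/rag_db"


def create_backup(project_root: Path, archive_path: Path) -> None:
    """Archive the rag_db directory with paths relative to the project root."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(project_root / RAG_DB_DIR, arcname=RAG_DB_DIR)


def restore_backup(archive_path: Path, project_root: Path) -> None:
    """Extract into the project root so files land back under res/memecoins/rag_db."""
    # Only extract archives you created yourself; tar members can carry unsafe paths.
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(project_root)
```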


5.3 Common Pitfalls to Avoid

❌ Don't Store Large Data in ChromaDB Metadata

  • Problem: Slows down all queries, increases memory usage
  • Solution: Store only small metadata, use file storage for large data

❌ Don't Skip Validation

  • Problem: Incomplete data corrupts vector DB, causes errors in RAG
  • Solution: Validate at every stage, fail fast on invalid data

❌ Don't Use Semicolons for Tag Separation

  • Problem: Legacy docs show semicolons, but code uses commas
  • Solution: Always use comma-separated tags: "dog,cat,frog"

❌ Don't Forget to Invalidate Caches

  • Problem: UI shows stale data after deletions/degradations
  • Solution: Invalidate caption cache and stats cache on all mutations

5.4 Monitoring Metrics

Key Metrics to Track:

  • Database Size: Call vector_store.get_stats() → returns total_examples, unique_tags, unique_categories
  • Storage Usage: Call image_manager.get_stats() → returns total_images, total_size_mb
  • Query Performance: Time vector_store.query_similar_weighted(embedding, n_results=10) for latency benchmarking

Performance Targets:

| Metric | Target | Notes |
| --- | --- | --- |
| Query latency | < 500ms | Weighted search, 10 results |
| Insertion throughput | > 5 memecoins/sec | Full 7-stage pipeline |
| Storage efficiency | < 200KB per memecoin | Image + metadata combined |
| Memory usage | < 500MB | For 10,000 memecoins |
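Checking the query latency target needs a repeatable timing harness. The helper below is a generic sketch with hypothetical names; it times any zero-argument callable, so a call such as vector_store.query_similar_weighted(embedding, n_results=10) would be wrapped in a lambda.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark_ms(fn: Callable[[], object], runs: int = 20) -> Dict[str, float]:
    """Time repeated calls and report median and worst-case latency in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {"median_ms": statistics.median(samples), "max_ms": max(samples)}


def meets_latency_target(result: Dict[str, float], target_ms: float = 500.0) -> bool:
    """Compare the median against the documented < 500ms target."""
    return result["median_ms"] < target_ms
```

The median is used for the comparison because a single cold-start call can dominate the maximum.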

7. Troubleshooting Guide

7.1 Problem: Missing Images/Metadata

Symptoms:

  • API returns 404 for valid token addresses
  • UI shows broken images
  • Integrity validation reports missing files

Solution:

  1. Run integrity validation script
  2. Check file permissions on storage directories
  3. Look for failed atomic operations (leftover .tmp files)
  4. Check application logs for error messages
  5. Verify disk space availability
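Step 3 can be automated with a short scan for leftover .tmp files. The function name and the age threshold are assumptions; the threshold is there so that files belonging to a write still in progress are not reported.

```python
import time
from pathlib import Path
from typing import List


def find_stale_tmp_files(storage_root: Path, min_age_seconds: float = 60.0) -> List[Path]:
    """Return .tmp files under storage_root last modified at least min_age_seconds ago."""
    now = time.time()
    stale = []
    for path in storage_root.rglob("*.tmp"):
        if path.is_file() and now - path.stat().st_mtime >= min_age_seconds:
            stale.append(path)
    return sorted(stale)
```

Each hit points at a token whose write was interrupted; the file name usually shows which image or metadata file to re-check.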

7.2 Problem: Slow Queries

Symptoms:

  • API responses take > 1 second
  • UI feels sluggish
  • High CPU usage

Solution:

  1. Check database size (too many memecoins?)
  2. Verify ChromaDB is not storing large metadata
  3. Use pagination for large result sets
  4. Consider adding metadata filtering to reduce search space
  5. Profile query execution

7.3 Problem: Caption Data Lost After Degradation

Symptoms:

  • Degraded memecoin missing captions
  • User expects captions to persist

Solution:

  • This is intentional! Captions are stripped during degradation
  • Check approved collection files (no caption field in metadata)
  • Reprocess through 7-stage pipeline to regenerate captions

7.4 Problem: Orphaned Files Accumulating

Symptoms:

  • Storage growing faster than memecoin count
  • Integrity validation reports many orphaned files
  • Disk space issues

Solution:

  1. Run cleanup operations regularly
  2. Check for failed rollbacks (partial deletions)
  3. Implement scheduled cleanup job
  4. Monitor storage growth

7.5 Problem: Cache Invalidation Issues

Symptoms:

  • UI shows old stats after operations
  • Deleted memecoins still appear
  • Caption changes not reflected

Solution:

  1. Verify cache invalidation logic in API routes
  2. Clear caches manually if needed
  3. Check for multiple cache instances (if using multi-process)
  4. Consider using Redis for distributed caching

8. Implementation Status

8.1 Fully Implemented Features

Architecture

  • 5-Collection Multi-Embedding Architecture
  • Weighted Multi-Embedding Search (4/3/2/1)
  • Split Caption System (4-part captions)
  • 3-Tier Storage Workflow
  • Degradation System

Storage

  • Unified Metadata Storage
  • Atomic File Operations
  • FileImageManager and TokenMetadataFileManager

Processing

  • 7-Stage Processing Pipeline
  • Tag Classification System (356 tags)
  • Deduplication checking
  • Early validation

API

  • RESTful endpoints for all CRUD operations
  • Degradation endpoint
  • Tag-based search
  • Paginated browsing
  • Admin operations

Optimization

  • Caption caching (in-memory)
  • Stats caching (global)
  • Native ChromaDB pagination
  • Database integrity tools

8.2 Architecture Components

| Component | File Path | Status |
| --- | --- | --- |
| Vector Store | src/vector_store/memecoin_store.py | ✅ 5 collections, weighted search |
| Image Manager | src/vector_store/file_managers/image_manager.py | ✅ Atomic operations |
| Metadata Manager | src/vector_store/file_managers/metadata_manager.py | ✅ Validation, atomic writes |
| Storage Orchestrator | src/orchestrators/meme_storage/meme_storage.py | ✅ Degradation system |
| Insertion Processor | src/domain/processor/rag_memecoin_insertion_processor.py | ✅ 7-stage pipeline |
| Web UI | src/web_ui/main.py | ✅ FastAPI application |
| Memecoin API | src/web_ui/api/memecoin_routes.py | ✅ CRUD + degradation |
| Database API | src/web_ui/api/database_routes.py | ✅ Query + admin |
| API Models | src/web_ui/api_models/memecoin.py | ✅ Pydantic schemas |

9. References

Primary Implementation Files

  • src/vector_store/memecoin_store.py - MemecoinVectorStore (5 collections, weighted search)
  • src/vector_store/file_managers/image_manager.py - FileImageManager (atomic JPG operations)
  • src/vector_store/file_managers/metadata_manager.py - TokenMetadataFileManager (JSON metadata)
  • src/orchestrators/meme_storage/meme_storage.py - MemeStorageOrchestrator (degradation system)
  • src/constants.py - Storage paths and configuration constants

Workflows and Processing

  • src/workflows/rag_memecoin_insertion_workflow.py - 7-stage processing pipeline
  • src/workflows/rag_memecoin_generation_workflow.py - RAG-enhanced generation

API and Web UI

  • src/web_ui/main.py - FastAPI application entry point
  • src/web_ui/api/memecoin_routes.py - Memecoin CRUD and degradation APIs
  • src/web_ui/api/database_routes.py - Vector DB query and admin APIs
  • src/web_ui/api_models/memecoin.py - MemecoinResponse data model

Document Status: Complete
Last Updated: October 15, 2025
Architecture Version: 5-Collection Multi-Embedding with Degradation System