What This Document Covers:
- Entity extraction integration into database edit and RAG insertion workflows
- Implementation tracking with 5 phases (all complete)
- New stages: MetadataOnlyEditPlanningStage and CaptionAndTagsRegenerationStage
- Dual-plan architecture for generated vs database memecoins
- Testing checklist and known limitations
Context Tags: #entity-extraction #rag #workflow #implementation-tracking
Add entity extraction to database edit and RAG insertion workflows for improved AI understanding and metadata quality.
- Fix RAGMemecoinEditProcessor.process() signature
- Added missing action_handler parameter
- Initialize entity_database_service when None
- Auto-initializes using get_entity_database_service()
- Verify database edit workflow no longer crashes
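The signature fix and the lazy service initialization above can be sketched as follows. This is a minimal illustration, not the real processor: `EditProcessorSketch`, the `context` dict, and the stub body of `get_entity_database_service()` are assumptions made so the example runs on its own.

```python
from typing import Any, Callable, Dict, Optional


def get_entity_database_service() -> Dict[str, str]:
    """Hypothetical stand-in for the real factory named in the notes."""
    return {"name": "entity_database_service"}


class EditProcessorSketch:
    """Illustrative stand-in, not the real RAGMemecoinEditProcessor."""

    def __init__(self, entity_database_service: Optional[Any] = None) -> None:
        self.entity_database_service = entity_database_service

    def process(
        self,
        context: Dict[str, Any],
        action_handler: Callable[[Dict[str, Any]], Any],
    ) -> Any:
        # Initialize the service on demand so a None value does not crash later stages.
        if self.entity_database_service is None:
            self.entity_database_service = get_entity_database_service()
        context["entity_service_ready"] = True
        # action_handler is the parameter that was missing from the signature.
        return action_handler(context)
```

Calling `EditProcessorSketch().process({}, handler)` with any callable `handler` leaves the service populated before the handler runs.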
- Create MetadataOnlyEditPlanningStage
- Define MetadataOnlyEditPlan model (caption + tag instructions only)
- Implement planning logic for captions/tags
- Add comprehensive logging
- Skip image-related planning entirely
- Create CaptionAndTagsRegenerationStage
- Implement targeted caption updates (4-part structure)
- Implement tag modifications (add/remove/replace)
- Preserve immutable fields (name, ticker, description, image)
- Add validation to prevent immutable field changes
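A rough sketch of the tag modifications and the immutable-field check described above. The function names and the dict-based entry shape are assumptions for illustration; only the four immutable field names come from these notes.

```python
from typing import Dict, Iterable, List, Mapping

# Field names taken from the notes; everything else here is hypothetical.
IMMUTABLE_FIELDS = ("name", "ticker", "description", "image")


def apply_tag_edits(
    tags: Iterable[str],
    add: Iterable[str] = (),
    remove: Iterable[str] = (),
    replace: Mapping[str, str] = {},
) -> List[str]:
    """Apply remove, replace, then add, keeping first-seen order without duplicates."""
    removed = set(remove)
    result = [replace.get(tag, tag) for tag in tags if tag not in removed]
    result.extend(add)
    return list(dict.fromkeys(result))


def assert_immutable_preserved(original: Dict[str, object], edited: Dict[str, object]) -> None:
    """Raise if any immutable field differs between the original and edited entry."""
    changed = [f for f in IMMUTABLE_FIELDS if original.get(f) != edited.get(f)]
    if changed:
        raise ValueError(f"Immutable fields changed: {changed}")
```

Running the check after regeneration, rather than trusting the plan, catches cases where a generated edit touches a field it should not.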
- Restructure RAGMemecoinEditProcessor pipeline
- Remove FeedbackAnalysisAndRegenerationStage from pipeline
- Add MetadataOnlyEditPlanningStage
- Add CaptionAndTagsRegenerationStage
- Configure EntityExtractionStage to skip reference images
- Update stage graph to 3-stage linear pipeline
- Deprecate FeedbackAnalysisAndRegenerationStage (added deprecation note)
- Add EntityExtractionStage to RAGMemecoinInsertionProcessor
- Position after MemecoinValidationStage
- Configure for full entity extraction with reference images
- Update stage graph connections (8 stages total)
- Ensure proper context data flow
- Update workflow documentation
- Fix MetadataOnlyEditPlanningStage system prompt (LLM validation errors)
- Add explicit JSON schema examples to system prompt
- Provide 3 concrete examples (caption-only, tag-only, combined)
- Clarify TagEditInstructions nested structure
- Add CRITICAL RULES and OUTPUT FORMAT sections
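The three example shapes mentioned above (caption-only, tag-only, combined) might look like the payloads below. The key names `caption_instructions`, `tag_instructions`, `add`, and `remove` are assumptions, as is the hand-written `validate_plan` checker; the real model is presumably validated by Pydantic.

```python
import json
from typing import List

# Hypothetical payloads; the actual schema in the system prompt may differ.
EXAMPLES = {
    "caption_only": '{"caption_instructions": "Mention the rocket", "tag_instructions": null}',
    "tag_only": '{"caption_instructions": null, "tag_instructions": {"add": ["space"], "remove": ["cat"]}}',
    "combined": '{"caption_instructions": "Mention the rocket", "tag_instructions": {"add": ["space"], "remove": []}}',
}


def validate_plan(raw: str) -> List[str]:
    """Return a list of problems; an empty list means the payload is well formed."""
    try:
        plan = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(plan, dict):
        return ["plan must be a JSON object"]
    errors: List[str] = []
    caption = plan.get("caption_instructions")
    tags = plan.get("tag_instructions")
    if caption is None and tags is None:
        errors.append("plan contains no caption or tag instructions")
    if caption is not None and not isinstance(caption, str):
        errors.append("caption_instructions must be a string")
    if tags is not None:
        if not isinstance(tags, dict):
            # The nested structure is the part the prompt fix had to clarify.
            errors.append("tag_instructions must be a nested object")
        else:
            for key in ("add", "remove"):
                if not isinstance(tags.get(key, []), list):
                    errors.append(f"tag_instructions.{key} must be a list")
    return errors
```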
- Fix integration test workflow initialization
- Add RAGMemecoinEditAnalysisWorkflow initialization before orchestrator creation
- Pass workflow to MemeStorageOrchestrator constructor
- Fix inconsistent return values in accept_edit_proposal
- Changed 3-tuple return to 2-tuple (lines 1220-1223)
- Now matches function signature Tuple[bool, str]
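To illustrate the return-value fix, here is a minimal sketch in which every branch returns the same `Tuple[bool, str]` shape. The parameters and body are hypothetical; only the function name and return annotation come from these notes.

```python
from typing import Dict, Tuple


def accept_edit_proposal(proposals: Dict[str, dict], proposal_id: str) -> Tuple[bool, str]:
    """Hypothetical body: each path returns (success, message), never a 3-tuple."""
    proposal = proposals.get(proposal_id)
    if proposal is None:
        return False, f"Proposal {proposal_id} not found"
    proposal["status"] = "accepted"
    return True, f"Proposal {proposal_id} accepted"
```

Keeping the arity identical on every path lets callers unpack with `ok, message = accept_edit_proposal(...)` without special cases.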
- Test database edit with caption-only changes - PASSED ✅
- Entity caption modification working correctly
- Planning stage creates valid structured plan
- LLM follows JSON schema without validation errors
- Test database edit with tag-only changes - PASSED ✅
- Tag additions and removals working correctly
- Tag filtering validates against 356-tag vocabulary
- Invalid tags correctly filtered out
- Test database edit with combined caption + tag changes - PASSED ✅
- Visual caption modification + tag changes working
- Both caption instructions and tag instructions generated
- Regeneration stage applies changes correctly
- Verify immutable fields preserved in all cases - PASSED ✅
- Validation stage prevents modification of name, ticker, description, image
- Database memecoins correctly restricted to caption/tag edits only
- Integration test results: 3/3 edit proposal tests passed
- All 3 feedback scenarios generated valid proposals with UUIDs
- No Pydantic validation errors encountered
- 3-stage pipeline (entity extraction → planning → regeneration) executes correctly
- Test insertion workflow with entity extraction
- Performance validation
- Edge case testing (empty feedback, conflicting instructions)
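The tag filtering exercised by the tag-only test above can be sketched as a membership check against the vocabulary. The `filter_tags` helper and the lowercase/strip normalization are assumptions; the small set in the usage note stands in for the real 356-tag vocabulary.

```python
from typing import Iterable, List, Set, Tuple


def filter_tags(candidates: Iterable[str], vocabulary: Set[str]) -> Tuple[List[str], List[str]]:
    """Split candidate tags into (valid, rejected) against a known vocabulary."""
    valid: List[str] = []
    rejected: List[str] = []
    for tag in candidates:
        normalized = tag.strip().lower()  # normalization is an assumption
        if normalized not in vocabulary:
            rejected.append(tag)
        elif normalized not in valid:
            valid.append(normalized)  # keep first occurrence, drop duplicates
    return valid, rejected
```

For example, `filter_tags(["Dog", "moon", "space", "dog"], {"dog", "space", "cat"})` keeps `dog` and `space` and rejects `moon`.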
1. EntityExtractionStage
- Extracts entities from metadata + user feedback
- Skips reference image loading (performance optimization)
- Outputs: extracted_entities
2. MetadataOnlyEditPlanningStage (NEW)
- Analyzes feedback and entities
- Creates structured plan for caption/tag modifications ONLY
- Outputs: metadata_edit_plan
3. CaptionAndTagsRegenerationStage (NEW)
- Applies structured plan from planning stage
- Updates ONLY captions and tags
- Preserves immutable fields
- Outputs: edited_entry
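The three stages above form a linear pipeline in which each stage adds one key to a shared context. A minimal sketch, assuming a plain dict context and placeholder stage logic (the real stages presumably call an LLM); only the output key names come from these notes.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Stage = Callable[[Context], Context]


def entity_extraction_stage(ctx: Context) -> Context:
    # Placeholder: treat capitalized feedback words as entities; no reference images loaded.
    words = ctx["feedback"].split()
    ctx["extracted_entities"] = [w.strip(".,") for w in words if w[:1].isupper()]
    return ctx


def metadata_only_edit_planning_stage(ctx: Context) -> Context:
    # The plan covers captions and tags only; there is no image planning here.
    ctx["metadata_edit_plan"] = {
        "caption_instructions": ctx["feedback"],
        "entities": ctx["extracted_entities"],
    }
    return ctx


def caption_and_tags_regeneration_stage(ctx: Context) -> Context:
    edited = dict(ctx["metadata"])  # copy so immutable fields carry over unchanged
    note = ctx["metadata_edit_plan"]["caption_instructions"]
    edited["caption"] = f"{edited['caption']} [edited: {note}]"
    ctx["edited_entry"] = edited
    return ctx


PIPELINE: List[Stage] = [
    entity_extraction_stage,
    metadata_only_edit_planning_stage,
    caption_and_tags_regeneration_stage,
]


def run_pipeline(ctx: Context, stages: List[Stage] = PIPELINE) -> Context:
    """Run the stages in order, threading the context through each one."""
    for stage in stages:
        ctx = stage(ctx)
    return ctx
```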
1. MemecoinValidationStage
2. EntityExtractionStage (NEW - added after validation)
3. CategoryClassificationStage
4. TagClassificationStage
5. ImageCaptioningStage
6. TagsRefinementStage
7. MemecoinEmbeddingStage
8. MemecoinInsertionActionStage
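If the eight insertion stages are connected linearly in the order listed (an assumption; the notes only state the order and the stage count), the stage graph connections could be derived from the list itself. `build_linear_graph` is a hypothetical helper.

```python
from typing import Dict, List, Optional

# Stage names and order as listed in the notes.
INSERTION_STAGES: List[str] = [
    "MemecoinValidationStage",
    "EntityExtractionStage",
    "CategoryClassificationStage",
    "TagClassificationStage",
    "ImageCaptioningStage",
    "TagsRefinementStage",
    "MemecoinEmbeddingStage",
    "MemecoinInsertionActionStage",
]


def build_linear_graph(stages: List[str]) -> Dict[str, Optional[str]]:
    """Map each stage to its successor; the final stage maps to None."""
    return {
        stage: (stages[i + 1] if i + 1 < len(stages) else None)
        for i, stage in enumerate(stages)
    }
```

Deriving the edges from one ordered list means inserting a stage, as was done with EntityExtractionStage, only requires changing the list.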
- Database Memecoins: Can ONLY modify captions and tags
- Immutable Fields: name, ticker, description, image
- No Image Planning: MetadataOnlyEditPlanningStage has no image logic
- Performance: Skip reference images in database workflow
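One way to express the reference-image rule is a per-workflow configuration flag. The `EntityExtractionConfig` class, its `load_reference_images` field, and `inputs_for_extraction` are hypothetical names for illustration, not the actual stage API.

```python
from dataclasses import dataclass
from typing import Any, Dict, Sequence


@dataclass(frozen=True)
class EntityExtractionConfig:
    """Hypothetical config object; the real stage may be configured differently."""
    load_reference_images: bool


# Database edit workflow skips reference images; insertion workflow uses them.
DATABASE_EDIT_CONFIG = EntityExtractionConfig(load_reference_images=False)
INSERTION_CONFIG = EntityExtractionConfig(load_reference_images=True)


def inputs_for_extraction(
    config: EntityExtractionConfig,
    metadata: Dict[str, Any],
    feedback: str = "",
    reference_images: Sequence[str] = (),
) -> Dict[str, Any]:
    """Assemble extraction inputs, including images only when the config allows it."""
    inputs: Dict[str, Any] = {"metadata": metadata, "feedback": feedback}
    if config.load_reference_images:
        inputs["reference_images"] = list(reference_images)
    return inputs
```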
- src/domain/processor/rag_memecoin_edit_processor.py: Fixed signature, added service init
- src/domain/processor/stages/metadata_only_edit_planning_stage.py
- src/domain/processor/stages/caption_and_tags_regeneration_stage.py
- src/domain/processor/rag_memecoin_edit_processor.py: Pipeline restructure pending
- src/domain/processor/rag_memecoin_insertion_processor.py: Add entity extraction pending
- Database edit workflow runs without TypeError
- Entity extraction provides value to planning stage
- Plans are clear and actionable
- Regeneration follows plan correctly
- Immutable fields never change
- Performance is acceptable
- Entity extraction purpose: Provide clear entity understanding to AI stages for better planning
- No EditDecisionStage needed since we can only modify captions/tags
- Reference images skipped in database workflow for performance