[FEAT] Abstract VLM providers into an extensible Strategy Pattern

### Is your feature request related to a problem? Please describe.

Yes. The current `caption_image` function hardcodes client initialization, API payload mapping, and error handling specifically for OpenAI. If we expand our RAG pipeline to support other Vision-Language Models (VLMs) such as Anthropic Claude, Google Gemini, or local models via Ollama/LLaVA, this function will quickly grow into a deeply nested, brittle block of `if`/`elif` statements. That violates the Open/Closed Principle: adding a provider means modifying the core pipeline function rather than extending it, which makes maintenance and scaling difficult.

### Describe the solution you'd like


We need to decouple the core image captioning pipeline from individual vendor SDK implementations by introducing a Strategy Pattern:

1. Create a dedicated directory structure (e.g., `app/vision/providers/`).
2. Define a standard abstract base class or interface (e.g., `BaseVisionProvider`) requiring a `.caption(image_bytes: bytes) -> str` method.
3. Refactor the existing OpenAI code into its own class (`OpenAIVisionProvider`) conforming to this interface.
4. Implement a simple factory or registry lookup inside `caption_image` that instantiates the correct provider dynamically based on the `VISION_PROVIDER` string setting.
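To make the proposal concrete, here is a minimal single-file sketch of the steps above. It is illustrative only: `register_provider`, `get_provider`, the `_PROVIDERS` registry, and `EchoVisionProvider` are hypothetical names that are not in the codebase, reading `VISION_PROVIDER` from an environment variable is an assumption about how the setting is exposed, and the real OpenAI logic is left as a stub. In the actual refactor each provider would live in its own module under the providers directory.

```python
from __future__ import annotations

import os
from abc import ABC, abstractmethod


class BaseVisionProvider(ABC):
    """Interface that every vision provider implements."""

    @abstractmethod
    def caption(self, image_bytes: bytes) -> str:
        """Return a text caption for the given image."""


# Hypothetical registry: maps a VISION_PROVIDER value to a provider class.
_PROVIDERS: dict[str, type[BaseVisionProvider]] = {}


def register_provider(name: str):
    """Class decorator that adds a provider to the registry under `name`."""
    def decorator(cls: type[BaseVisionProvider]) -> type[BaseVisionProvider]:
        _PROVIDERS[name.lower()] = cls
        return cls
    return decorator


@register_provider("openai")
class OpenAIVisionProvider(BaseVisionProvider):
    def caption(self, image_bytes: bytes) -> str:
        # Stub: the existing OpenAI client setup, payload mapping, and
        # error handling from caption_image would move here.
        raise NotImplementedError("move the existing OpenAI logic here")


@register_provider("echo")
class EchoVisionProvider(BaseVisionProvider):
    """Hypothetical stand-in provider, useful for tests and demos."""

    def caption(self, image_bytes: bytes) -> str:
        return f"image of {len(image_bytes)} bytes"


def get_provider(name: str) -> BaseVisionProvider:
    """Instantiate the provider registered under `name`."""
    try:
        return _PROVIDERS[name.lower()]()
    except KeyError:
        raise ValueError(
            f"Unknown VISION_PROVIDER {name!r}; available: {sorted(_PROVIDERS)}"
        ) from None


def caption_image(image_bytes: bytes, provider_name: str | None = None) -> str:
    # Assumption: the setting is read from the environment when not passed in.
    name = provider_name or os.environ.get("VISION_PROVIDER", "openai")
    return get_provider(name).caption(image_bytes)
```

With this shape, adding a provider means adding one decorated class; `caption_image` itself does not change.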

### Describe alternatives you've considered

- **Keeping it inline:** Continuing to append `elif provider == "anthropic":` blocks directly inside `caption_image`. This was rejected because vendor-specific error handling and dependencies will clutter the core pipeline file.
- **Function-mapping dict:** Mapping string keys to simple standalone helper functions within the same file. While cleaner than nested conditions, it still leaves the file bloated with vendor-specific configuration code.

### Additional Context

-

### GSSoC '26

- [x] Yes, I am participating in GirlScript Summer of Code and would like to build this.

[FEAT] Abstract VLM providers into an extensible Strategy Pattern #592

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

Describe alternatives you've considered

Additional Context

GSSoC '26

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

[FEAT] Abstract VLM providers into an extensible Strategy Pattern #592

Description

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

Describe alternatives you've considered

Additional Context

GSSoC '26

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions