feat: add OpenAI-compatible HTTP server for mellea backends #746
markstur wants to merge 3 commits into
- Implement FastAPI server with /v1/chat/completions endpoint
- Add streaming support via Server-Sent Events
- Support all mellea backends (Ollama, OpenAI, HF, Watsonx, LiteLLM)
- Include tool calling and token usage tracking
- Add comprehensive test suite (9 tests)
- Provide documentation and usage examples
- Enable deployment as standalone service

Closes generative-computing#521

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
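For readers unfamiliar with the streaming format named in the commit message, here is a minimal sketch of how OpenAI-style chat completion chunks are typically framed as Server-Sent Events. It is not the code from this PR: the helper names `sse_chunks` and `collect_content` are hypothetical, and the chunks are simplified (real responses also carry fields such as `id` and `created`).

```python
import json
from typing import Iterable, Iterator


def sse_chunks(tokens: Iterable[str], model: str) -> Iterator[str]:
    """Yield simplified chat.completion.chunk payloads as SSE events.

    Hypothetical helper for illustration; not taken from the PR.
    """
    for token in tokens:
        chunk = {
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [
                {"index": 0, "delta": {"content": token}, "finish_reason": None}
            ],
        }
        # Each SSE event is a "data: ..." line followed by a blank line.
        yield f"data: {json.dumps(chunk)}\n\n"

    # Final chunk: empty delta with a finish_reason, then the [DONE] sentinel.
    final = {
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
    }
    yield f"data: {json.dumps(final)}\n\n"
    yield "data: [DONE]\n\n"


def collect_content(events: Iterable[str]) -> str:
    """Client side: join the delta contents until the [DONE] sentinel."""
    parts = []
    for event in events:
        payload = event.strip().removeprefix("data: ")
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)
```

In a FastAPI app, a generator like this would usually be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`; whether the PR does it that way is not shown in this conversation.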
The PR description has been updated. Please fill out the template for your PR to be reviewed.
Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
First pass at implementing #521. I need to check whether I interpreted it correctly.

This server has a chat/completions endpoint that acts as a proxy to the backends used by mellea. It is not a server that wraps mellea (e.g. an IVR loop). Creating mellea-as-a-backend might be more interesting, but I think there is other work going on that might do that. Some of the wording in the issue left me unsure which was the goal.

The models list is pretty limited, but I'm thinking I would add a config (like litellm) listing all the provider/model possibilities that should show up in the models list. Right now it shows only the one.

I need to confirm this is even going in the right direction. It's a bunch of nice code, mostly generated, but I'm not sure it hits the objective.
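To make the config idea in the comment above concrete, here is a rough sketch of building an OpenAI-style models list response from a provider/model config. Everything here is an assumption for illustration: the `MODEL_CONFIG` structure, the `provider/model` id convention, and the `list_models` helper are hypothetical and do not come from the PR or from mellea. Only `granite4:micro` appears in this PR; the second entry is a placeholder.

```python
import time
from typing import Optional

# Hypothetical config: one entry per provider/model pair to advertise.
MODEL_CONFIG = [
    {"provider": "ollama", "model": "granite4:micro"},
    {"provider": "openai", "model": "example-model"},  # placeholder name
]


def list_models(config: list, created: Optional[int] = None) -> dict:
    """Build a response shaped like the OpenAI models list object."""
    # Use the current time unless the caller pins a timestamp (handy in tests).
    created = int(time.time()) if created is None else created
    return {
        "object": "list",
        "data": [
            {
                # Assumed id convention: "<provider>/<model>".
                "id": f"{entry['provider']}/{entry['model']}",
                "object": "model",
                "created": created,
                "owned_by": entry["provider"],
            }
            for entry in config
        ],
    }
```

A `/v1/models` handler could then return `list_models(MODEL_CONFIG)` directly, so adding a model to the list would only require a config change.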
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
- Add tool calling tests (streaming and non-streaming)
- Add multi-model session management tests
- Add backend configuration tests (base_url, kwargs)
- Add error handling and edge case tests
- Remove dead code: convert_messages_to_context()

All tests use granite4:micro and pass successfully.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Added comments to issue #521.

Thanks for the contribution, but I think whilst it does do what the issue suggests, that isn't what was intended, and what was intended I think is implemented via
Had this on my plate to review this afternoon, but @planetf1 has pretty much already given my feedback. I'd recommend checking his comment on the issue; I agree with his train of thought.
Let's close this. It's not going in the right direction. See issue #521 for more details related to m serve features. Thanks for the reviews/feedback.