Add Ollama provider support for local model inference #9386
simpletoolsindia wants to merge 2 commits into warpdotdev:master from
Conversation
This adds client-side support for using Ollama as a local LLM provider, enabling offline AI features and giving users control over their models.

Changes:
- Add Ollama variant to LLMProvider enum
- Add ollama_url field to ApiKeys for BYOK configuration
- Create ollama_client module with chat and streaming support
- Add reqwest dependency for HTTP client

Ollama provides a simple REST API at localhost:11434 that supports both streaming and non-streaming chat completions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Sridhar Karuppusamy.
I'm starting a first review of this pull request. You can view the conversation on Warp. I requested changes on this pull request and posted feedback. Powered by Oz
Overview
This PR adds an Ollama provider enum variant, persists an Ollama URL with BYOK settings, and introduces a new Ollama HTTP client.
Concerns
- The changes currently appear to break compilation in multiple places: the generated multi-agent API settings type is initialized with a new field that this PR does not add to the generated dependency, and the new Ollama client uses `anyhow::Context` in functions returning `OllamaResult`.
- The streaming response handling does not correctly deserialize Ollama's boolean `done` field and treats arbitrary byte chunks as complete newline-delimited JSON records, so streaming would fail or drop/corrupt chunks.
- Security pass: sending a user-configured local Ollama URL through remote request settings would leak local endpoint configuration and would not provide local inference semantics; Ollama calls should remain client-side unless a dedicated trusted routing design exists.
Verdict
Found: 2 critical, 2 important, 0 suggestions
Request changes
Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).
Powered by Oz
    openai,
    google,
    open_router,
    ollama_url,
🚨 [CRITICAL] This initializes the generated warp_multi_agent_api::request::settings::ApiKeys with ollama_url, but this PR does not update that generated type, so cargo check -p ai will fail; additionally, local Ollama URLs should not be sent through remote request settings.
Fixed. Removed ollama_url from api_keys_for_request() since local Ollama endpoints should never be sent through remote request settings.
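For illustration, a self-contained sketch of that split — remote-provider keys are forwarded with the request while the Ollama URL stays client-side. All struct and field names here are hypothetical stand-ins, not the repository's actual types:

```rust
// Hypothetical sketch only; the real Warp/warp_multi_agent_api types differ.
#[derive(Clone, Default)]
struct ClientApiKeys {
    openai: Option<String>,
    google: Option<String>,
    open_router: Option<String>,
    /// Local-only: used by the Ollama client, never forwarded upstream.
    ollama_url: Option<String>,
}

#[derive(Debug)]
struct RequestApiKeys {
    openai: Option<String>,
    google: Option<String>,
    open_router: Option<String>,
    // Deliberately no ollama_url field here.
}

impl ClientApiKeys {
    /// Build only the keys that are safe to send with a remote request.
    fn api_keys_for_request(&self) -> RequestApiKeys {
        RequestApiKeys {
            openai: self.openai.clone(),
            google: self.google.clone(),
            open_router: self.open_router.clone(),
        }
    }
}

fn main() {
    let keys = ClientApiKeys {
        ollama_url: Some("http://localhost:11434".into()),
        ..Default::default()
    };
    println!("local ollama endpoint: {:?}", keys.ollama_url);
    println!("{:?}", keys.api_keys_for_request()); // ollama_url never leaves the client
}
```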
    .get(format!("{}/api/tags", self.base_url))
    .send()
    .await
    .context("Failed to send request to Ollama")?;
🚨 [CRITICAL] context(...) converts the error into anyhow::Error, but this function returns OllamaResult, so ? cannot convert it to OllamaError; map the reqwest::Error directly instead, including the same pattern below.
Fixed. Removed `anyhow::Context` and changed to map the `reqwest::Error` directly, which properly converts to `OllamaError` via its `From` implementation.
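As a sketch of the pattern being described — a hand-rolled `From<reqwest::Error>` impl so `?` converts errors without `anyhow::Context`. The `OllamaError` shape here is illustrative, not necessarily the PR's:

```rust
#[derive(Debug)]
enum OllamaError {
    Http(reqwest::Error),
    Api(String),
}

impl From<reqwest::Error> for OllamaError {
    fn from(err: reqwest::Error) -> Self {
        OllamaError::Http(err)
    }
}

type OllamaResult<T> = Result<T, OllamaError>;

async fn list_models(base_url: &str) -> OllamaResult<String> {
    // `?` converts reqwest::Error into OllamaError via the From impl above,
    // so no anyhow::Context is needed in a function returning OllamaResult.
    let response = reqwest::Client::new()
        .get(format!("{base_url}/api/tags"))
        .send()
        .await?;

    if !response.status().is_success() {
        return Err(OllamaError::Api(format!("status {}", response.status())));
    }
    Ok(response.text().await?)
}
```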
    /// A streaming chunk from Ollama's chat API.
    #[derive(Debug, Clone, Deserialize)]
    #[serde(tag = "done")]
Ollama sends `done` as a boolean, but an internally tagged enum expects the tag to match a variant identifier, so `serde_json::from_str` will reject `"done": false`; use an untagged enum or a struct with `done: bool`.
Fixed. Changed StreamChunk to use a helper struct that deserializes done as a boolean field and then converts to the appropriate enum variant.
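A rough sketch of that helper-struct approach, assuming field names from Ollama's documented chat response; the enum shape is illustrative rather than the PR's exact type:

```rust
use serde::Deserialize;

/// Raw shape of one NDJSON record from Ollama's /api/chat stream.
/// `done` is a plain boolean, so it cannot serve as a serde enum tag.
#[derive(Debug, Deserialize)]
struct RawStreamChunk {
    #[serde(default)]
    message: Option<RawMessage>,
    done: bool,
}

#[derive(Debug, Deserialize)]
struct RawMessage {
    content: String,
}

/// Chunk type the rest of the client works with.
#[derive(Debug, Clone)]
enum StreamChunk {
    Delta { content: String },
    Done,
}

impl From<RawStreamChunk> for StreamChunk {
    fn from(raw: RawStreamChunk) -> Self {
        if raw.done {
            StreamChunk::Done
        } else {
            StreamChunk::Delta {
                content: raw.message.map(|m| m.content).unwrap_or_default(),
            }
        }
    }
}

fn parse_chunk(line: &str) -> serde_json::Result<StreamChunk> {
    serde_json::from_str::<RawStreamChunk>(line).map(StreamChunk::from)
}
```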
    )));
    }

    let stream = response.bytes_stream().map(|chunk_result| {
bytes_stream() chunks are not NDJSON record boundaries, so this drops additional lines, misparses split JSON objects, and emits fake empty chunks; buffer until newline boundaries and yield each parsed record.
Fixed. Rewrote the streaming parser to buffer characters and only parse complete lines delimited by newlines, ensuring proper NDJSON parsing.
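A hedged sketch of newline-boundary buffering over `reqwest::Response::bytes_stream()` (requires reqwest's `stream` feature and `futures`); the PR's actual implementation may be structured differently:

```rust
use futures::StreamExt;

/// Collect bytes from Ollama's chat stream and invoke `on_line` for each
/// complete newline-delimited JSON record, carrying partial lines across chunks.
async fn read_ndjson_stream(
    response: reqwest::Response,
    mut on_line: impl FnMut(&str),
) -> Result<(), reqwest::Error> {
    let mut stream = response.bytes_stream();
    let mut buf: Vec<u8> = Vec::new();

    while let Some(chunk) = stream.next().await {
        buf.extend_from_slice(&chunk?);

        // A single network chunk may contain zero, one, or several records,
        // and may end mid-record; only parse up to the last newline seen.
        while let Some(pos) = buf.iter().position(|&b| b == b'\n') {
            let line: Vec<u8> = buf.drain(..=pos).collect();
            let line = String::from_utf8_lossy(&line);
            let line = line.trim();
            if !line.is_empty() {
                on_line(line);
            }
        }
    }

    // Flush any trailing record that was not newline-terminated.
    let rest = String::from_utf8_lossy(&buf);
    let rest = rest.trim();
    if !rest.is_empty() {
        on_line(rest);
    }
    Ok(())
}
```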
- Remove ollama_url from api_keys_for_request() since local endpoints should not be sent through remote request settings
- Fix StreamChunk to use untagged deserialization with a helper struct since Ollama sends done as a boolean, not a variant tag
- Fix error handling in list_models and chat to map reqwest errors directly instead of using anyhow::Context
- Fix chat_streaming to properly buffer by newline boundaries instead of using bytes_stream() which splits chunks incorrectly

Co-Authored-By: Claude <noreply@anthropic.com>
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Sridhar Karuppusamy.
I'm checking this implementation PR for association with a likely matching ready issue. Powered by Oz
Summary
- `Ollama` variant to `LLMProvider` enum
- `ollama_url` field to `ApiKeys` for BYOK (Bring Your Own Key) configuration
- `ollama_client` module with chat and streaming support
- `reqwest` dependency for HTTP client

Motivation
This enables Warp to use locally-hosted Ollama models, providing:
- Offline AI features
- User control over their models
Technical Details
Ollama provides a REST API at `localhost:11434` that supports both streaming and non-streaming chat completions.

API Endpoints Used
- `GET /api/tags` - List available models
- `POST /api/chat` - Chat completions (with streaming support; see the example below)
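For reference, a minimal non-streaming call against a locally running Ollama instance could look like the following sketch (the model name is a placeholder; assumes `reqwest` with the `json` feature, `serde_json`, and `tokio`):

```rust
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let client = reqwest::Client::new();

    // Non-streaming chat completion: with "stream": false Ollama returns a
    // single JSON object instead of newline-delimited chunks.
    let response: serde_json::Value = client
        .post("http://localhost:11434/api/chat")
        .json(&json!({
            "model": "llama3", // placeholder model name
            "messages": [{ "role": "user", "content": "Hello" }],
            "stream": false
        }))
        .send()
        .await?
        .json()
        .await?;

    println!("{}", response["message"]["content"]);
    Ok(())
}
```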
Test Plan
- `cargo check -p ai` passes
- `cargo test -p ai -- ollama` passes

TODO
🤖 Generated with Claude Code