Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 1 addition & 5 deletions docs/byok_guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -161,7 +161,7 @@ You can use the embedding generation step mentioned in the rag-content repo:

```bash
mkdir ./embeddings_model
pdm run python ./scripts/download_embeddings_model.py -l ./embeddings_model/ -r sentence-transformers/all-mpnet-base-v2
uv run python ./scripts/download_embeddings_model.py -l ./embeddings_model/ -r sentence-transformers/all-mpnet-base-v2
```

#### Option 2: Manual Download and Configuration
Expand Down Expand Up @@ -340,10 +340,6 @@ rag:
- company-docs
```

> [!NOTE]
> Your LLM inference provider (e.g., OpenAI, vLLM) must also be configured in your `run.yaml`.
> For OpenAI, set the `OPENAI_API_KEY` environment variable.

### Example 2: Multiple Knowledge Sources with pgvector

A configuration combining a local FAISS store (via `byok_rag`) with a remote pgvector store (configured directly in the Llama Stack configuration file):
Expand Down
10 changes: 1 addition & 9 deletions docs/rag_guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -223,11 +223,7 @@ Not yet supported.

### Ollama

The `remote::ollama` provider can be used for inference. However, it does not support tool calling, including RAG.
While Ollama also exposes an OpenAI compatible endpoint that supports tool calling, it cannot currently be used due to limitations in the `remote::openai` provider.

Tool calling with Ollama is not yet supported.
Currently, tool calling is not supported out of the box. Some experimental patches exist (including internal workarounds), but these are not officially released.
The `remote::ollama` provider does not support tool calling, so RAG as a tool is not available. However, inline RAG is supported.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

Does Ollama support tool calling or function calling as of 2026?

💡 Result:

Yes. As of 2026-06-03, Ollama supports tool calling (also called function calling) in its chat API. Ollama’s official documentation states that “Ollama supports tool calling (also known as function calling)” and describes how to provide a tools list in the request, how the model returns tool calls, and how to send tool results back for the model to complete its reply [1]. The API documentation in Ollama’s repo likewise documents a tools parameter (with returned tool_calls) in chat requests [2], and Ollama’s blog post explains enabling tool calling by providing tools and that supported models return tool_calls [3]. So, for an up-to-date 2026 answer: Ollama has built-in tool/function calling capability (for models that support it), via the standard tools/tool_calls mechanism in the chat endpoints [1][2][3].

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# List files that mention remote::ollama
rg -n "remote::ollama" -S . || true

# Find likely provider files (ts/js/python/etc) mentioning "ollama" and "provider"
rg -n "ollama" -S . | head -n 200 || true

# If there is a providers directory, narrow search
if [ -d "src" ]; then
  rg -n "remote::ollama|ollama.*provider|provider.*ollama" -S src || true
fi
if [ -d "packages" ]; then
  rg -n "remote::ollama|ollama.*provider|provider.*ollama" -S packages || true
fi

# Show the docs line context around docs/rag_guide.md:226
if [ -f "docs/rag_guide.md" ]; then
  sed -n '210,245p' docs/rag_guide.md | nl -ba | sed -n '1,60p'
fi

Repository: lightspeed-core/lightspeed-stack

Length of output: 2896


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show docs/providers.md around the ollama row
if [ -f "docs/providers.md" ]; then
  sed -n '1,120p' docs/providers.md | nl -ba | sed -n '1,120p'
else
  echo "Missing docs/providers.md"
fi

# Find files mentioning remote::ollama
rg -n "remote::ollama" -S . || true

# Find where provider_id="ollama" is used (docs/openapi.md likely, but also code)
rg -n "provider_id=\"ollama\"|provider_id='ollama'|provider.*ollama|ollama.*provider" -S . || true

# Search for tool calling plumbing in the codebase
rg -n "tool(s)?_call(s)?|tool_calls|tools\\b|function calling|function_calling" -S . || true

# Search for any ollama-specific client code that may build tool payloads
rg -n "ollama" -S src packages . | head -n 200 || true

Repository: lightspeed-core/lightspeed-stack

Length of output: 119


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect docs/providers.md around the ollama row (avoid nl)
if [ -f "docs/providers.md" ]; then
  # Print first 120 lines (likely includes table header)
  sed -n '1,140p' docs/providers.md
  echo "----"
  # Print around where 'ollama | remote' appears
  # Use line numbers via grep -n (supported) but avoid nl
  line=$(grep -n "ollama |" docs/providers.md | head -n 1 | cut -d: -f1 || true)
  if [ -n "${line:-}" ]; then
    start=$(( line>20 ? line-20 : 1 ))
    end=$(( line+40 ))
    sed -n "${start},${end}p" docs/providers.md
  else
    # fallback: print a larger section
    sed -n '1,240p' docs/providers.md
  fi
else
  echo "Missing docs/providers.md"
fi

echo "----"

# Find files that mention remote::ollama
rg -n "remote::ollama" -S . || true

echo "----"

# Find tool-calling related plumbing
rg -n "tool_calls|tool-call|function_call|function calling|\\btools\\b" -S . || true

echo "----"

# Find ollama provider code if present
rg -n "provider.*ollama|ollama.*provider|provider_id[=:\"']ollama" -S . || true

Repository: lightspeed-core/lightspeed-stack

Length of output: 50392


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- rag_guide context ----"
sed -n '215,240p' docs/rag_guide.md | cat

echo "---- search for remote::ollama ----"
rg -n "remote::ollama" -S . --glob '!tests/**' || true

echo "---- search for ollama + tool calling indicators (exclude tests) ----"
rg -n "(ollama).*(tool calling|tool-calling|tool_calls|function calling|function_calling|\\btools\\b)" -S . --glob '!tests/**' || true

echo "---- search for tool calling indicators near ollama (exclude tests) ----"
rg -n "(tool calling|tool-calling|tool_calls|function calling|function_calling|\\btools\\b)" -S docs --glob '*ollama*' || true
rg -n "(ollama)" -S docs --glob '*.md' || true

echo "---- list likely source files mentioning ollama (exclude tests) ----"
rg -n "(ollama)" -S . --glob '!tests/**' --max-count 50 || true

Repository: lightspeed-core/lightspeed-stack

Length of output: 6547


🌐 Web query:

llama-stack remote::ollama tool calling support tool_calls

💡 Result:

Llama Stack does support tool calling when using the remote::ollama provider, via the OpenAI-compatible chat.completions interface’s tools/tool_calls fields—but support can depend on (a) the Ollama model’s chat template/tool-calling wiring and (b) Llama Stack version. 1) Whether tool calling is supported - Llama Stack’s Ollama integration is implemented as a remote provider spec with provider_type "remote::ollama" [1]. - Tool calling is expected to work through standard OpenAI-compatible chat completions (tools passed in request, model returns tool_calls in response) as shown in Llama Stack tests and related provider work (e.g., remote-vllm tool calling support landed, indicating the general tool-calling mechanism is wired end-to-end for remote providers) [2]. - For Ollama specifically, Ollama’s own docs confirm it supports tool calling using tool_calls in responses and a follow-up request that includes tool results [3]. 2) Known caveat: tools may not be passed through for non-“Llama” Ollama models / certain model template conversions - There is an explicit Llama Stack issue reporting that when using an Ollama model that is not detected as a Llama model (e.g., custom-converted models), the.Tools variable in the Ollama template is not filled with the tools passed to Llama Stack, i.e., tool calling can fail because the model never receives tool definitions [4]. 3) Practical implication for “tool_calls” support - If your Ollama model’s chat template expects a.Tools section and that section is correctly populated by the Llama Stack Ollama provider, you should see tool_calls come back in chat completion responses (matching Ollama’s documented tool_calls behavior) [3]. - If the model is not properly detected / its template path differs (the scenario described in the issue), tool definitions may not be injected, and tool_calls may not be produced (or may be unreliable) [4]. 4) What to check in your setup - Ensure you are using the Llama Stack API endpoint variant that carries OpenAI-style tools (chat.completions) and passes them to the remote::ollama provider [1]. - Verify your Ollama model supports tool calling in its template (Ollama documents the required tool_calls flow) [3]. - If you use a custom-converted or nonstandard Ollama model, be aware of the reported provider/template bug for non-Llama models where tools are not injected [4]. Bottom line: Yes—Llama Stack + remote::ollama is intended to support tool calling via tool_calls, and Ollama supports tool_calls. But tool calling can break for certain Ollama models/templates (not properly detected as Llama), so your results may vary depending on the model and Llama Stack version [4][3].

Citations:


🌐 Web query:

site:github.com/openai/llama-stack remote::ollama tools tool_calls

💡 Result:

  • The repo you’re trying to search (github.com/openai/llama-stack) doesn’t appear to be the active home anymore—“Llama Stack is now OGX” and the code lives at ogx-ai/ogx (GitHub shows the rename/announcement in the README). [1]
  • remote::ollama provider docs (OGX): configuration includes base_url and the sample uses OLLAMA_URL (defaulting to http://localhost:11434/v1). [2]
  • Providers index (OGX): shows where remote::ollama sits under Inference providers and links to it. [3]
  • Ollama tool calling (tool_calls) reference: Ollama documents tool calling and shows response.message.tool_calls[...] usage. [4]

Sources
[1] (github.com)
[2] (ogx-ai.github.io)
[3] (ogx-ai.github.io)
[4] (docs.ollama.com)

Citations:


🌐 Web query:

site:docs.ollama.com tool calling tools parameter tool_calls

💡 Result:

Tool calling is a capability in LLMs that allows the model to interact with external tools (such as functions, APIs, or databases) by requesting their execution during a conversation [1]. When a model determines that a tool is needed to answer a user's request, it generates a structured request—typically referred to as a tool call—instead of a standard text response [1]. In API implementations (such as the OpenAI-compatible API often used by tools like Ollama), this is facilitated through the tool_calls parameter [1][2]. How tool_calls works: 1. Tool Definition: The user provides a list of available tools (often using a JSON schema format) in the request to the model [1][3]. 2. Model Request: If the model decides to use a tool, it returns a response containing a tool_calls array [1]. Each object in this array includes details about the requested function, such as: - Name: The identifier of the function to execute [1]. - Arguments: The parameters the model has determined are necessary for the function [1]. 3. Execution and Response: The application code executes the specified function using the provided arguments [1]. After execution, the application sends the result back to the model by adding a new message to the conversation history with the role of tool, including the tool_name and the content (the function's output) [1]. 4. Final Answer: The model then processes this result and generates a final response for the user [1]. This process supports parallel tool calling, where multiple tool requests can be generated in a single turn, and multi-turn loops where the model continues to call tools until it has sufficient information to provide a final answer [1]. When using streaming, tool calls are emitted as partial chunks, which the client must accumulate to fully reconstruct the tool call before execution [1][4].

Citations:


Fix misleading remote::ollama “no tool calling” documentation

docs/rag_guide.md currently claims remote::ollama “does not support tool calling,” but Ollama (and the OGX/Llama Stack remote::ollama integration) supports tool calling via OpenAI-compatible tools/tool_calls. Tool-calling/RAG-as-a-tool can still fail depending on the specific Ollama model’s chat-template/tool wiring (e.g., custom/nonstandard models), so the docs should describe this dependency instead of stating blanket non-support.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/rag_guide.md` at line 226, Replace the blanket statement that
"remote::ollama does not support tool calling" with a clarification that
remote::ollama (OGX/Llama Stack integration) can support OpenAI-compatible tool
calling via the standard tools/tool_calls interface, but tool-calling or
RAG-as-a-tool may fail depending on the specific Ollama model's chat-template
and tool wiring (e.g., custom or nonstandard models); update the sentence in
docs/rag_guide.md to reflect that support is conditional on the model's
chat-template/tool configuration rather than absent altogether.


### vLLM Mistral

Expand Down Expand Up @@ -386,7 +382,3 @@ You are a helpful assistant with access to a 'knowledge_search' tool. When users

The top-level `vector_stores` block in [`run.yaml`](../examples/run.yaml) may include `annotation_prompt_params` to control whether extra RAG annotation instructions are injected into the model prompt (for example, citation-style markers). The default configuration sets `enable_annotations: false` under that block to avoid unwanted annotations.

---

# References

4 changes: 2 additions & 2 deletions examples/lightspeed-stack-byok-okp-rag.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -38,14 +38,14 @@ byok_rag:
- rag_id: ocp-docs # referenced in rag.inline / rag.tool
rag_type: inline::faiss
embedding_model: sentence-transformers/all-mpnet-base-v2
embedding_dimension: 1024
embedding_dimension: 768
vector_db_id: vs_123 # Vector store ID (from index generation)
db_path: /tmp/ocp.faiss
score_multiplier: 1.0 # Weight for this vector store's results (Inline RAG only)
- rag_id: knowledge-base # referenced in rag.inline / rag.tool
rag_type: inline::faiss
embedding_model: sentence-transformers/all-mpnet-base-v2
embedding_dimension: 384
embedding_dimension: 768
vector_db_id: vs_456 # Vector store ID (from index generation)
db_path: /tmp/kb.faiss
score_multiplier: 1.2 # Weight for this vector store's results (Inline RAG only)
Expand Down
Loading