LCORE-1037: update BYOK and RAG guides to use lightspeed-stack config#1842
Conversation
- Replace pdm with uv for embedding model download command - Remove unnecessary OpenAI API key note - Fix Ollama section: clarify no tool RAG but inline RAG supported - Remove empty References section from RAG guide - Fix incorrect embedding dimensions for all-mpnet-base-v2 (768, not 1024/384) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
WalkthroughThis PR updates documentation and configuration for BYOK and RAG setup. It corrects the embedding model download tool from ChangesDocumentation and Configuration Updates
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~3 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/rag_guide.md`:
- Line 226: Replace the blanket statement that "remote::ollama does not support
tool calling" with a clarification that remote::ollama (OGX/Llama Stack
integration) can support OpenAI-compatible tool calling via the standard
tools/tool_calls interface, but tool-calling or RAG-as-a-tool may fail depending
on the specific Ollama model's chat-template and tool wiring (e.g., custom or
nonstandard models); update the sentence in docs/rag_guide.md to reflect that
support is conditional on the model's chat-template/tool configuration rather
than absent altogether.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1d83c5af-58d1-4eb0-875e-30d4502b8b32
📒 Files selected for processing (3)
docs/byok_guide.mddocs/rag_guide.mdexamples/lightspeed-stack-byok-okp-rag.yaml
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: unit_tests (3.13)
- GitHub Check: unit_tests (3.12)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: Pylinter
- GitHub Check: build-pr
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: src/client.py:104-108
Timestamp: 2026-05-20T08:09:36.724Z
Learning: In the lightspeed-stack repo, the synthesized `run.yaml` file handling in `src/client.py` (`_synthesize_library_config`) uses a fixed `/tmp` path intentionally in the PoC (PR `#1580`). The durable production requirements are tracked in spec doc R10 (docs/design/llama-stack-config-merge/llama-stack-config-merge.md): persistent known path overwritten each boot, file mode 0600 set via explicit create flag (not umask), and a `--synthesized-config-output` CLI flag for debugging. The PoC code is scheduled for removal pre-merge; the implementation JIRA "Unified llama_stack.config schema + synthesizer" inherits R10's requirements.
📚 Learning: 2026-05-20T08:09:30.641Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.
Applied to files:
examples/lightspeed-stack-byok-okp-rag.yaml
🔇 Additional comments (2)
examples/lightspeed-stack-byok-okp-rag.yaml (1)
41-41: LGTM!Also applies to: 48-48
docs/byok_guide.md (1)
164-164: ⚡ Quick winConfirm
uvfor rag-content embedding download script.The
rag-contentdocs instruct running the download script withuvlocally (e.g.,uv run python ./scripts/download_embeddings_model.py -l ... -r sentence-transformers/all-mpnet-base-v2), matching the updated command indocs/byok_guide.md.
|
|
||
| Tool calling with Ollama is not yet supported. | ||
| Currently, tool calling is not supported out of the box. Some experimental patches exist (including internal workarounds), but these are not officially released. | ||
| The `remote::ollama` provider does not support tool calling, so RAG as a tool is not available. However, inline RAG is supported. |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
Does Ollama support tool calling or function calling as of 2026?
💡 Result:
Yes. As of 2026-06-03, Ollama supports tool calling (also called function calling) in its chat API. Ollama’s official documentation states that “Ollama supports tool calling (also known as function calling)” and describes how to provide a tools list in the request, how the model returns tool calls, and how to send tool results back for the model to complete its reply [1]. The API documentation in Ollama’s repo likewise documents a tools parameter (with returned tool_calls) in chat requests [2], and Ollama’s blog post explains enabling tool calling by providing tools and that supported models return tool_calls [3]. So, for an up-to-date 2026 answer: Ollama has built-in tool/function calling capability (for models that support it), via the standard tools/tool_calls mechanism in the chat endpoints [1][2][3].
Citations:
- 1: https://docs.ollama.com/capabilities/tool-calling
- 2: https://github.com/ollama/ollama/blob/main/docs/api.md
- 3: https://ollama.com/blog/tool-support
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# List files that mention remote::ollama
rg -n "remote::ollama" -S . || true
# Find likely provider files (ts/js/python/etc) mentioning "ollama" and "provider"
rg -n "ollama" -S . | head -n 200 || true
# If there is a providers directory, narrow search
if [ -d "src" ]; then
rg -n "remote::ollama|ollama.*provider|provider.*ollama" -S src || true
fi
if [ -d "packages" ]; then
rg -n "remote::ollama|ollama.*provider|provider.*ollama" -S packages || true
fi
# Show the docs line context around docs/rag_guide.md:226
if [ -f "docs/rag_guide.md" ]; then
sed -n '210,245p' docs/rag_guide.md | nl -ba | sed -n '1,60p'
fiRepository: lightspeed-core/lightspeed-stack
Length of output: 2896
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show docs/providers.md around the ollama row
if [ -f "docs/providers.md" ]; then
sed -n '1,120p' docs/providers.md | nl -ba | sed -n '1,120p'
else
echo "Missing docs/providers.md"
fi
# Find files mentioning remote::ollama
rg -n "remote::ollama" -S . || true
# Find where provider_id="ollama" is used (docs/openapi.md likely, but also code)
rg -n "provider_id=\"ollama\"|provider_id='ollama'|provider.*ollama|ollama.*provider" -S . || true
# Search for tool calling plumbing in the codebase
rg -n "tool(s)?_call(s)?|tool_calls|tools\\b|function calling|function_calling" -S . || true
# Search for any ollama-specific client code that may build tool payloads
rg -n "ollama" -S src packages . | head -n 200 || trueRepository: lightspeed-core/lightspeed-stack
Length of output: 119
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect docs/providers.md around the ollama row (avoid nl)
if [ -f "docs/providers.md" ]; then
# Print first 120 lines (likely includes table header)
sed -n '1,140p' docs/providers.md
echo "----"
# Print around where 'ollama | remote' appears
# Use line numbers via grep -n (supported) but avoid nl
line=$(grep -n "ollama |" docs/providers.md | head -n 1 | cut -d: -f1 || true)
if [ -n "${line:-}" ]; then
start=$(( line>20 ? line-20 : 1 ))
end=$(( line+40 ))
sed -n "${start},${end}p" docs/providers.md
else
# fallback: print a larger section
sed -n '1,240p' docs/providers.md
fi
else
echo "Missing docs/providers.md"
fi
echo "----"
# Find files that mention remote::ollama
rg -n "remote::ollama" -S . || true
echo "----"
# Find tool-calling related plumbing
rg -n "tool_calls|tool-call|function_call|function calling|\\btools\\b" -S . || true
echo "----"
# Find ollama provider code if present
rg -n "provider.*ollama|ollama.*provider|provider_id[=:\"']ollama" -S . || trueRepository: lightspeed-core/lightspeed-stack
Length of output: 50392
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "---- rag_guide context ----"
sed -n '215,240p' docs/rag_guide.md | cat
echo "---- search for remote::ollama ----"
rg -n "remote::ollama" -S . --glob '!tests/**' || true
echo "---- search for ollama + tool calling indicators (exclude tests) ----"
rg -n "(ollama).*(tool calling|tool-calling|tool_calls|function calling|function_calling|\\btools\\b)" -S . --glob '!tests/**' || true
echo "---- search for tool calling indicators near ollama (exclude tests) ----"
rg -n "(tool calling|tool-calling|tool_calls|function calling|function_calling|\\btools\\b)" -S docs --glob '*ollama*' || true
rg -n "(ollama)" -S docs --glob '*.md' || true
echo "---- list likely source files mentioning ollama (exclude tests) ----"
rg -n "(ollama)" -S . --glob '!tests/**' --max-count 50 || trueRepository: lightspeed-core/lightspeed-stack
Length of output: 6547
🌐 Web query:
llama-stack remote::ollama tool calling support tool_calls
💡 Result:
Llama Stack does support tool calling when using the remote::ollama provider, via the OpenAI-compatible chat.completions interface’s tools/tool_calls fields—but support can depend on (a) the Ollama model’s chat template/tool-calling wiring and (b) Llama Stack version. 1) Whether tool calling is supported - Llama Stack’s Ollama integration is implemented as a remote provider spec with provider_type "remote::ollama" [1]. - Tool calling is expected to work through standard OpenAI-compatible chat completions (tools passed in request, model returns tool_calls in response) as shown in Llama Stack tests and related provider work (e.g., remote-vllm tool calling support landed, indicating the general tool-calling mechanism is wired end-to-end for remote providers) [2]. - For Ollama specifically, Ollama’s own docs confirm it supports tool calling using tool_calls in responses and a follow-up request that includes tool results [3]. 2) Known caveat: tools may not be passed through for non-“Llama” Ollama models / certain model template conversions - There is an explicit Llama Stack issue reporting that when using an Ollama model that is not detected as a Llama model (e.g., custom-converted models), the.Tools variable in the Ollama template is not filled with the tools passed to Llama Stack, i.e., tool calling can fail because the model never receives tool definitions [4]. 3) Practical implication for “tool_calls” support - If your Ollama model’s chat template expects a.Tools section and that section is correctly populated by the Llama Stack Ollama provider, you should see tool_calls come back in chat completion responses (matching Ollama’s documented tool_calls behavior) [3]. - If the model is not properly detected / its template path differs (the scenario described in the issue), tool definitions may not be injected, and tool_calls may not be produced (or may be unreliable) [4]. 4) What to check in your setup - Ensure you are using the Llama Stack API endpoint variant that carries OpenAI-style tools (chat.completions) and passes them to the remote::ollama provider [1]. - Verify your Ollama model supports tool calling in its template (Ollama documents the required tool_calls flow) [3]. - If you use a custom-converted or nonstandard Ollama model, be aware of the reported provider/template bug for non-Llama models where tools are not injected [4]. Bottom line: Yes—Llama Stack + remote::ollama is intended to support tool calling via tool_calls, and Ollama supports tool_calls. But tool calling can break for certain Ollama models/templates (not properly detected as Llama), so your results may vary depending on the model and Llama Stack version [4][3].
Citations:
- 1: https://github.com/llamastack/llama-stack/blob/main/ARCHITECTURE.md
- 2: Add vLLM raw completions API ogx-ai/ogx#823
- 3: https://docs.ollama.com/capabilities/tool-calling
- 4: Ollama provider bug: not properly passing tools to ollama when model not detected as llama model ogx-ai/ogx#2691
🌐 Web query:
site:github.com/openai/llama-stack remote::ollama tools tool_calls
💡 Result:
- The repo you’re trying to search (
github.com/openai/llama-stack) doesn’t appear to be the active home anymore—“Llama Stack is now OGX” and the code lives atogx-ai/ogx(GitHub shows the rename/announcement in the README). [1] remote::ollamaprovider docs (OGX): configuration includesbase_urland the sample usesOLLAMA_URL(defaulting tohttp://localhost:11434/v1). [2]- Providers index (OGX): shows where
remote::ollamasits under Inference providers and links to it. [3] - Ollama tool calling (
tool_calls) reference: Ollama documents tool calling and showsresponse.message.tool_calls[...]usage. [4]
Sources
[1] (github.com)
[2] (ogx-ai.github.io)
[3] (ogx-ai.github.io)
[4] (docs.ollama.com)
Citations:
- 1: https://github.com/llamastack/llama-stack
- 2: https://ogx-ai.github.io/docs/providers/inference/remote_ollama
- 3: https://ogx-ai.github.io/docs/providers
- 4: https://docs.ollama.com/capabilities/tool-calling?utm_source=openai
🌐 Web query:
site:docs.ollama.com tool calling tools parameter tool_calls
💡 Result:
Tool calling is a capability in LLMs that allows the model to interact with external tools (such as functions, APIs, or databases) by requesting their execution during a conversation [1]. When a model determines that a tool is needed to answer a user's request, it generates a structured request—typically referred to as a tool call—instead of a standard text response [1]. In API implementations (such as the OpenAI-compatible API often used by tools like Ollama), this is facilitated through the tool_calls parameter [1][2]. How tool_calls works: 1. Tool Definition: The user provides a list of available tools (often using a JSON schema format) in the request to the model [1][3]. 2. Model Request: If the model decides to use a tool, it returns a response containing a tool_calls array [1]. Each object in this array includes details about the requested function, such as: - Name: The identifier of the function to execute [1]. - Arguments: The parameters the model has determined are necessary for the function [1]. 3. Execution and Response: The application code executes the specified function using the provided arguments [1]. After execution, the application sends the result back to the model by adding a new message to the conversation history with the role of tool, including the tool_name and the content (the function's output) [1]. 4. Final Answer: The model then processes this result and generates a final response for the user [1]. This process supports parallel tool calling, where multiple tool requests can be generated in a single turn, and multi-turn loops where the model continues to call tools until it has sufficient information to provide a final answer [1]. When using streaming, tool calls are emitted as partial chunks, which the client must accumulate to fully reconstruct the tool call before execution [1][4].
Citations:
- 1: https://docs.ollama.com/capabilities/tool-calling
- 2: https://docs.ollama.com/api/openai-compatibility
- 3: https://docs.ollama.com/llms-full.txt
- 4: https://docs.ollama.com/capabilities/streaming
Fix misleading remote::ollama “no tool calling” documentation
docs/rag_guide.md currently claims remote::ollama “does not support tool calling,” but Ollama (and the OGX/Llama Stack remote::ollama integration) supports tool calling via OpenAI-compatible tools/tool_calls. Tool-calling/RAG-as-a-tool can still fail depending on the specific Ollama model’s chat-template/tool wiring (e.g., custom/nonstandard models), so the docs should describe this dependency instead of stating blanket non-support.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/rag_guide.md` at line 226, Replace the blanket statement that
"remote::ollama does not support tool calling" with a clarification that
remote::ollama (OGX/Llama Stack integration) can support OpenAI-compatible tool
calling via the standard tools/tool_calls interface, but tool-calling or
RAG-as-a-tool may fail depending on the specific Ollama model's chat-template
and tool wiring (e.g., custom or nonstandard models); update the sentence in
docs/rag_guide.md to reflect that support is conditional on the model's
chat-template/tool configuration rather than absent altogether.
Summary
lightspeed-stack.yamlconfiguration instead of Llama Stackrun.yamlrun.yaml/ Llama Stack-specific references from both guidespdm→uv, remove stale OpenAI note, correct Ollama tool calling description, fix incorrect embedding dimensionsType of change
Tools used to create PR
Related Tickets & Documents
Checklist before requesting a review
Testing
🤖 Generated with Claude Code
Summary by CodeRabbit
Documentation
uv runcommand for running embedding model downloads.Chores