LCORE-1037: update BYOK and RAG guides to use lightspeed-stack config by are-ces · Pull Request #1842 · lightspeed-core/lightspeed-stack

are-ces · 2026-06-03T09:01:53Z

Summary

Rewrite BYOK and RAG guides to reference lightspeed-stack.yaml configuration instead of Llama Stack run.yaml
Remove all run.yaml / Llama Stack-specific references from both guides
Address review nits: fix pdm → uv, remove stale OpenAI note, correct Ollama tool calling description, fix incorrect embedding dimensions

Type of change

Documentation Update

Tools used to create PR

Assisted-by: Claude Opus 4.6
Generated by: Claude Opus 4.6

Related Tickets & Documents

Closes LCORE-1037

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Documentation-only change. Verified YAML examples are consistent with current config model structure.

🤖 Generated with Claude Code

Summary by CodeRabbit

Documentation
- Updated the BYOK setup guide to use the uv run command for running embedding model downloads.
- Clarified Ollama inference provider limitations regarding tool calling support in the RAG documentation.
- Removed unnecessary LLM configuration prerequisite instructions from the BYOK guide.
Chores
- Updated RAG vector store embedding dimension values in example deployment configurations.

- Replace pdm with uv for embedding model download command - Remove unnecessary OpenAI API key note - Fix Ollama section: clarify no tool RAG but inline RAG supported - Remove empty References section from RAG guide - Fix incorrect embedding dimensions for all-mpnet-base-v2 (768, not 1024/384) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai · 2026-06-03T09:02:08Z

Walkthrough

This PR updates documentation and configuration for BYOK and RAG setup. It corrects the embedding model download tool from pdm to uv, removes outdated LLM configuration guidance, clarifies Ollama provider capabilities around tool-calling support, and updates example configuration to standardize embedding dimensions to 768.

Changes

Documentation and Configuration Updates

Layer / File(s)	Summary
BYOK guide tool and configuration updates `docs/byok_guide.md`	Embedding model download command uses `uv run python` instead of `pdm run python`. Removes note about configuring LLM inference provider (e.g., `OPENAI_API_KEY`) in `run.yaml`.
RAG guide Ollama provider clarification `docs/rag_guide.md`	Ollama `remote::ollama` provider documentation explicitly states tool-calling is unsupported (RAG-as-tool unavailable), while clarifying inline RAG remains supported. Removes caveats about OpenAI-compatible endpoints and experimental tool-calling patches. Extra formatting separator removed.
Example YAML embedding dimension configuration `examples/lightspeed-stack-byok-okp-rag.yaml`	BYOK RAG vector store configurations for `ocp-docs` and `knowledge-base` updated to use consistent `embedding_dimension: 768`.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

lightspeed-core/lightspeed-stack#1838: Directly overlaps with BYOK guide and example YAML configuration updates for embedding model dimensions and byok_rag configuration.

Suggested reviewers

tisnik
syedriko

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly references the main objective: updating BYOK and RAG guides to use lightspeed-stack config, which aligns with all three changed files.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

✨ Simplify code

Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/rag_guide.md`:
- Line 226: Replace the blanket statement that "remote::ollama does not support
tool calling" with a clarification that remote::ollama (OGX/Llama Stack
integration) can support OpenAI-compatible tool calling via the standard
tools/tool_calls interface, but tool-calling or RAG-as-a-tool may fail depending
on the specific Ollama model's chat-template and tool wiring (e.g., custom or
nonstandard models); update the sentence in docs/rag_guide.md to reflect that
support is conditional on the model's chat-template/tool configuration rather
than absent altogether.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1d83c5af-58d1-4eb0-875e-30d4502b8b32

📥 Commits

Reviewing files that changed from the base of the PR and between 6738771 and ac0b354.

📒 Files selected for processing (3)

docs/byok_guide.md
docs/rag_guide.md
examples/lightspeed-stack-byok-okp-rag.yaml

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)

GitHub Check: E2E: server mode / ci / group 2
GitHub Check: E2E: library mode / ci / group 3
GitHub Check: E2E Tests for Lightspeed Evaluation job
GitHub Check: E2E: library mode / ci / group 1
GitHub Check: E2E: library mode / ci / group 2
GitHub Check: E2E: server mode / ci / group 3
GitHub Check: E2E: server mode / ci / group 1
GitHub Check: unit_tests (3.13)
GitHub Check: unit_tests (3.12)
GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
GitHub Check: Pylinter
GitHub Check: build-pr

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: src/client.py:104-108
Timestamp: 2026-05-20T08:09:36.724Z
Learning: In the lightspeed-stack repo, the synthesized `run.yaml` file handling in `src/client.py` (`_synthesize_library_config`) uses a fixed `/tmp` path intentionally in the PoC (PR `#1580`). The durable production requirements are tracked in spec doc R10 (docs/design/llama-stack-config-merge/llama-stack-config-merge.md): persistent known path overwritten each boot, file mode 0600 set via explicit create flag (not umask), and a `--synthesized-config-output` CLI flag for debugging. The PoC code is scheduled for removal pre-merge; the implementation JIRA "Unified llama_stack.config schema + synthesizer" inherits R10's requirements.

📚 Learning: 2026-05-20T08:09:30.641Z

Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.

Applied to files:

examples/lightspeed-stack-byok-okp-rag.yaml

🔇 Additional comments (2)

examples/lightspeed-stack-byok-okp-rag.yaml (1)

41-41: LGTM!

Also applies to: 48-48

docs/byok_guide.md (1)

164-164: ⚡ Quick win

Confirm uv for rag-content embedding download script.

The rag-content docs instruct running the download script with uv locally (e.g., uv run python ./scripts/download_embeddings_model.py -l ... -r sentence-transformers/all-mpnet-base-v2), matching the updated command in docs/byok_guide.md.

coderabbitai · 2026-06-03T09:07:20Z

-
-Tool calling with Ollama is not yet supported.  
-Currently, tool calling is not supported out of the box. Some experimental patches exist (including internal workarounds), but these are not officially released.  
+The `remote::ollama` provider does not support tool calling, so RAG as a tool is not available. However, inline RAG is supported.


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

Does Ollama support tool calling or function calling as of 2026?

💡 Result:

Yes. As of 2026-06-03, Ollama supports tool calling (also called function calling) in its chat API. Ollama’s official documentation states that “Ollama supports tool calling (also known as function calling)” and describes how to provide a tools list in the request, how the model returns tool calls, and how to send tool results back for the model to complete its reply [1]. The API documentation in Ollama’s repo likewise documents a tools parameter (with returned tool_calls) in chat requests [2], and Ollama’s blog post explains enabling tool calling by providing tools and that supported models return tool_calls [3]. So, for an up-to-date 2026 answer: Ollama has built-in tool/function calling capability (for models that support it), via the standard tools/tool_calls mechanism in the chat endpoints [1][2][3].

Citations:

1: https://docs.ollama.com/capabilities/tool-calling

2: https://github.com/ollama/ollama/blob/main/docs/api.md

3: https://ollama.com/blog/tool-support

🏁 Script executed:

#!/bin/bash set -euo pipefail # List files that mention remote::ollama rg -n "remote::ollama" -S . || true # Find likely provider files (ts/js/python/etc) mentioning "ollama" and "provider" rg -n "ollama" -S . | head -n 200 || true # If there is a providers directory, narrow search if [ -d "src" ]; then rg -n "remote::ollama|ollama.*provider|provider.*ollama" -S src || true fi if [ -d "packages" ]; then rg -n "remote::ollama|ollama.*provider|provider.*ollama" -S packages || true fi # Show the docs line context around docs/rag_guide.md:226 if [ -f "docs/rag_guide.md" ]; then sed -n '210,245p' docs/rag_guide.md | nl -ba | sed -n '1,60p' fi

Repository: lightspeed-core/lightspeed-stack

Length of output: 2896

🏁 Script executed:

#!/bin/bash set -euo pipefail # Show docs/providers.md around the ollama row if [ -f "docs/providers.md" ]; then sed -n '1,120p' docs/providers.md | nl -ba | sed -n '1,120p' else echo "Missing docs/providers.md" fi # Find files mentioning remote::ollama rg -n "remote::ollama" -S . || true # Find where provider_id="ollama" is used (docs/openapi.md likely, but also code) rg -n "provider_id=\"ollama\"|provider_id='ollama'|provider.*ollama|ollama.*provider" -S . || true # Search for tool calling plumbing in the codebase rg -n "tool(s)?_call(s)?|tool_calls|tools\\b|function calling|function_calling" -S . || true # Search for any ollama-specific client code that may build tool payloads rg -n "ollama" -S src packages . | head -n 200 || true

Repository: lightspeed-core/lightspeed-stack

Length of output: 119

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect docs/providers.md around the ollama row (avoid nl) if [ -f "docs/providers.md" ]; then # Print first 120 lines (likely includes table header) sed -n '1,140p' docs/providers.md echo "----" # Print around where 'ollama | remote' appears # Use line numbers via grep -n (supported) but avoid nl line=$(grep -n "ollama |" docs/providers.md | head -n 1 | cut -d: -f1 || true) if [ -n "${line:-}" ]; then start=$(( line>20 ? line-20 : 1 )) end=$(( line+40 )) sed -n "${start},${end}p" docs/providers.md else # fallback: print a larger section sed -n '1,240p' docs/providers.md fi else echo "Missing docs/providers.md" fi echo "----" # Find files that mention remote::ollama rg -n "remote::ollama" -S . || true echo "----" # Find tool-calling related plumbing rg -n "tool_calls|tool-call|function_call|function calling|\\btools\\b" -S . || true echo "----" # Find ollama provider code if present rg -n "provider.*ollama|ollama.*provider|provider_id[=:\"']ollama" -S . || true

Repository: lightspeed-core/lightspeed-stack

Length of output: 50392

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "---- rag_guide context ----" sed -n '215,240p' docs/rag_guide.md | cat echo "---- search for remote::ollama ----" rg -n "remote::ollama" -S . --glob '!tests/**' || true echo "---- search for ollama + tool calling indicators (exclude tests) ----" rg -n "(ollama).*(tool calling|tool-calling|tool_calls|function calling|function_calling|\\btools\\b)" -S . --glob '!tests/**' || true echo "---- search for tool calling indicators near ollama (exclude tests) ----" rg -n "(tool calling|tool-calling|tool_calls|function calling|function_calling|\\btools\\b)" -S docs --glob '*ollama*' || true rg -n "(ollama)" -S docs --glob '*.md' || true echo "---- list likely source files mentioning ollama (exclude tests) ----" rg -n "(ollama)" -S . --glob '!tests/**' --max-count 50 || true

Repository: lightspeed-core/lightspeed-stack

Length of output: 6547

🌐 Web query:

llama-stack remote::ollama tool calling support tool_calls

💡 Result:

Llama Stack does support tool calling when using the remote::ollama provider, via the OpenAI-compatible chat.completions interface’s tools/tool_calls fields—but support can depend on (a) the Ollama model’s chat template/tool-calling wiring and (b) Llama Stack version. 1) Whether tool calling is supported - Llama Stack’s Ollama integration is implemented as a remote provider spec with provider_type "remote::ollama" [1]. - Tool calling is expected to work through standard OpenAI-compatible chat completions (tools passed in request, model returns tool_calls in response) as shown in Llama Stack tests and related provider work (e.g., remote-vllm tool calling support landed, indicating the general tool-calling mechanism is wired end-to-end for remote providers) [2]. - For Ollama specifically, Ollama’s own docs confirm it supports tool calling using tool_calls in responses and a follow-up request that includes tool results [3]. 2) Known caveat: tools may not be passed through for non-“Llama” Ollama models / certain model template conversions - There is an explicit Llama Stack issue reporting that when using an Ollama model that is not detected as a Llama model (e.g., custom-converted models), the.Tools variable in the Ollama template is not filled with the tools passed to Llama Stack, i.e., tool calling can fail because the model never receives tool definitions [4]. 3) Practical implication for “tool_calls” support - If your Ollama model’s chat template expects a.Tools section and that section is correctly populated by the Llama Stack Ollama provider, you should see tool_calls come back in chat completion responses (matching Ollama’s documented tool_calls behavior) [3]. - If the model is not properly detected / its template path differs (the scenario described in the issue), tool definitions may not be injected, and tool_calls may not be produced (or may be unreliable) [4]. 4) What to check in your setup - Ensure you are using the Llama Stack API endpoint variant that carries OpenAI-style tools (chat.completions) and passes them to the remote::ollama provider [1]. - Verify your Ollama model supports tool calling in its template (Ollama documents the required tool_calls flow) [3]. - If you use a custom-converted or nonstandard Ollama model, be aware of the reported provider/template bug for non-Llama models where tools are not injected [4]. Bottom line: Yes—Llama Stack + remote::ollama is intended to support tool calling via tool_calls, and Ollama supports tool_calls. But tool calling can break for certain Ollama models/templates (not properly detected as Llama), so your results may vary depending on the model and Llama Stack version [4][3].

Citations:

1: https://github.com/llamastack/llama-stack/blob/main/ARCHITECTURE.md

2: Add vLLM raw completions API ogx-ai/ogx#823

3: https://docs.ollama.com/capabilities/tool-calling

4: Ollama provider bug: not properly passing tools to ollama when model not detected as llama model ogx-ai/ogx#2691

🌐 Web query:

site:github.com/openai/llama-stack remote::ollama tools tool_calls

💡 Result:

The repo you’re trying to search (github.com/openai/llama-stack) doesn’t appear to be the active home anymore—“Llama Stack is now OGX” and the code lives at ogx-ai/ogx (GitHub shows the rename/announcement in the README). [1]

remote::ollama provider docs (OGX): configuration includes base_url and the sample uses OLLAMA_URL (defaulting to http://localhost:11434/v1). [2]

Providers index (OGX): shows where remote::ollama sits under Inference providers and links to it. [3]

Ollama tool calling (tool_calls) reference: Ollama documents tool calling and shows response.message.tool_calls[...] usage. [4]

Sources
[1] (github.com)
[2] (ogx-ai.github.io)
[3] (ogx-ai.github.io)
[4] (docs.ollama.com)

Citations:

1: https://github.com/llamastack/llama-stack

2: https://ogx-ai.github.io/docs/providers/inference/remote_ollama

3: https://ogx-ai.github.io/docs/providers

4: https://docs.ollama.com/capabilities/tool-calling?utm_source=openai

🌐 Web query:

site:docs.ollama.com tool calling tools parameter tool_calls

💡 Result:

Tool calling is a capability in LLMs that allows the model to interact with external tools (such as functions, APIs, or databases) by requesting their execution during a conversation [1]. When a model determines that a tool is needed to answer a user's request, it generates a structured request—typically referred to as a tool call—instead of a standard text response [1]. In API implementations (such as the OpenAI-compatible API often used by tools like Ollama), this is facilitated through the tool_calls parameter [1][2]. How tool_calls works: 1. Tool Definition: The user provides a list of available tools (often using a JSON schema format) in the request to the model [1][3]. 2. Model Request: If the model decides to use a tool, it returns a response containing a tool_calls array [1]. Each object in this array includes details about the requested function, such as: - Name: The identifier of the function to execute [1]. - Arguments: The parameters the model has determined are necessary for the function [1]. 3. Execution and Response: The application code executes the specified function using the provided arguments [1]. After execution, the application sends the result back to the model by adding a new message to the conversation history with the role of tool, including the tool_name and the content (the function's output) [1]. 4. Final Answer: The model then processes this result and generates a final response for the user [1]. This process supports parallel tool calling, where multiple tool requests can be generated in a single turn, and multi-turn loops where the model continues to call tools until it has sufficient information to provide a final answer [1]. When using streaming, tool calls are emitted as partial chunks, which the client must accumulate to fully reconstruct the tool call before execution [1][4].

Citations:

1: https://docs.ollama.com/capabilities/tool-calling

2: https://docs.ollama.com/api/openai-compatibility

3: https://docs.ollama.com/llms-full.txt

4: https://docs.ollama.com/capabilities/streaming

Fix misleading remote::ollama “no tool calling” documentation

docs/rag_guide.md currently claims remote::ollama “does not support tool calling,” but Ollama (and the OGX/Llama Stack remote::ollama integration) supports tool calling via OpenAI-compatible tools/tool_calls. Tool-calling/RAG-as-a-tool can still fail depending on the specific Ollama model’s chat-template/tool wiring (e.g., custom/nonstandard models), so the docs should describe this dependency instead of stating blanket non-support.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/rag_guide.md` at line 226, Replace the blanket statement that "remote::ollama does not support tool calling" with a clarification that remote::ollama (OGX/Llama Stack integration) can support OpenAI-compatible tool calling via the standard tools/tool_calls interface, but tool-calling or RAG-as-a-tool may fail depending on the specific Ollama model's chat-template and tool wiring (e.g., custom or nonstandard models); update the sentence in docs/rag_guide.md to reflect that support is conditional on the model's chat-template/tool configuration rather than absent altogether.

are-ces requested a review from syedriko June 3, 2026 09:03

coderabbitai Bot reviewed Jun 3, 2026

View reviewed changes

syedriko approved these changes Jun 3, 2026

View reviewed changes

are-ces merged commit f1c6cec into lightspeed-core:main Jun 3, 2026
30 of 31 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

LCORE-1037: update BYOK and RAG guides to use lightspeed-stack config#1842

LCORE-1037: update BYOK and RAG guides to use lightspeed-stack config#1842
are-ces merged 1 commit into
lightspeed-core:mainfrom
are-ces:lcore-1037-update-byok-guide-stack-config

are-ces commented Jun 3, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 3, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

are-ces commented Jun 3, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Type of change

Tools used to create PR

Related Tickets & Documents

Checklist before requesting a review

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

are-ces commented Jun 3, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 3, 2026 •

edited

Loading