Add session-pinned Ollama base URL balancing #29

Open
AminMahpour wants to merge 1 commit into master from dev/ollama-session-load-balancing
@AminMahpour
Owner

Summary

  • add Ollama base_urls pool support with load_balance = "session_round_robin"
  • pin each Chainlit session to one Ollama URL while keeping the base URL override behavior intact
  • update docs and examples to show the new model config shape
  • add regression tests for config parsing, session assignment, and MCP session cache behavior

Testing

  • uv run pytest tests/test_deepagent_runtime_rag.py::test_runtime_config_reads_ollama_base_urls_from_toml tests/test_deepagent_runtime_rag.py::test_agent_runtime_assigns_ollama_base_urls_per_session tests/test_deepagent_runtime_rag.py::test_stateful_mcp_reuses_session_per_chainlit_session -q
  • uv run pytest -q

@AminMahpour AminMahpour marked this pull request as ready for review May 6, 2026 01:46

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: db10cd8f2d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread deepagent_runtime.py
Comment on lines +824 to +825
if base_urls:
    base_url = base_urls[0]

P1: Preserve explicit base_url when base_urls is configured

When base_urls is present, this branch overwrites base_url with base_urls[0], so an explicitly configured [model].base_url is ignored. That breaks the documented/expected split where base_url remains the default (including RAG embedding endpoint) while base_urls is only the chat pool. A config like base_url=http://hostA and base_urls=[http://hostB,http://hostC] will silently route default and embedding traffic to hostB instead of hostA.

Comment thread deepagent_runtime.py
Comment on lines +2115 to +2117
assigned_url = urls[self._next_model_base_url_index % len(urls)]
self._next_model_base_url_index += 1
self._model_base_url_assignments[assignment_key] = assigned_url

P2: Evict stale model URL assignments after session turnover

Each new session/thread key is stored in _model_base_url_assignments and never removed, so a long-lived runtime accumulates one entry per historical session. Since chat/session IDs are effectively unbounded over time, this creates unbounded in-memory growth and stale pinning state even after sessions are closed.
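A minimal sketch of one way to bound this state, assuming a hypothetical `SessionUrlAssigner` class rather than the runtime's real attributes: sessions are pinned round-robin as in the reviewed lines, an explicit `release` drops a pin when the session ends, and a size cap evicts the least recently used pin as a backstop. The `max_entries` default is an arbitrary placeholder, and the sketch assumes single-threaded access (for example, one asyncio event loop).

```python
from collections import OrderedDict


class SessionUrlAssigner:
    """Pins each session key to one URL, with bounded bookkeeping."""

    def __init__(self, urls: list[str], max_entries: int = 1024) -> None:
        if not urls:
            raise ValueError("urls must not be empty")
        self._urls = list(urls)
        self._next_index = 0
        self._max_entries = max_entries
        self._assignments: OrderedDict[str, str] = OrderedDict()

    def assign(self, key: str) -> str:
        if key in self._assignments:
            # Mark the session as recently used and keep its existing pin.
            self._assignments.move_to_end(key)
            return self._assignments[key]
        url = self._urls[self._next_index % len(self._urls)]
        self._next_index += 1
        self._assignments[key] = url
        # Backstop: evict least recently used pins beyond the cap.
        while len(self._assignments) > self._max_entries:
            self._assignments.popitem(last=False)
        return url

    def release(self, key: str) -> None:
        """Drop a session's pin; safe to call for unknown keys."""
        self._assignments.pop(key, None)
```

Calling `release` from a session-end hook handles the common case; the cap only matters for sessions that never signal that they closed. An evicted session that later returns would be re-pinned, possibly to a different URL, which is the trade-off of bounding memory this way.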
