Add session-pinned Ollama base URL balancing #29
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: db10cd8f2d
    if base_urls:
        base_url = base_urls[0]
Preserve explicit base_url when base_urls is configured
When base_urls is present, this branch overwrites base_url with base_urls[0], so an explicitly configured [model].base_url is ignored. That breaks the documented/expected split where base_url remains the default (including RAG embedding endpoint) while base_urls is only the chat pool. A config like base_url=http://hostA and base_urls=[http://hostB,http://hostC] will silently route default and embedding traffic to hostB instead of hostA.
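One way to keep the two settings separate is to fall back to the pool only when no explicit default is configured. The following is a minimal sketch, not the PR's actual code: `resolve_model_urls` and the plain-dict shape of the `[model]` table are assumptions made for illustration.

```python
from typing import Any, Optional


def resolve_model_urls(model_cfg: dict[str, Any]) -> tuple[Optional[str], list[str]]:
    """Return (default base_url, chat pool).

    Hypothetical helper: `model_cfg` stands in for the parsed [model] table.
    """
    # Drop empty entries so the pool contains usable URLs only.
    base_urls = [url for url in model_cfg.get("base_urls", []) if url]
    explicit = model_cfg.get("base_url")
    # An explicit base_url always wins; the first pool entry is only a fallback.
    if explicit:
        base_url = explicit
    elif base_urls:
        base_url = base_urls[0]
    else:
        base_url = None
    return base_url, base_urls
```

With the config from the comment above, this returns `http://hostA` as the default (and embedding) endpoint, while `hostB` and `hostC` stay in the chat pool.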
    assigned_url = urls[self._next_model_base_url_index % len(urls)]
    self._next_model_base_url_index += 1
    self._model_base_url_assignments[assignment_key] = assigned_url
Evict stale model URL assignments after session turnover
Each new session/thread key is stored in _model_base_url_assignments and never removed, so a long-lived runtime accumulates one entry per historical session. Since chat/session IDs are effectively unbounded over time, this creates unbounded in-memory growth and stale pinning state even after sessions are closed.
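A possible way to bound this state is to combine an explicit release on session close with a capped, least-recently-used assignment map. The sketch below is illustrative only: `SessionUrlPinner`, `max_entries`, and `release` are hypothetical names, not part of the runtime under review.

```python
from collections import OrderedDict


class SessionUrlPinner:
    """Round-robin URL pinning per session key with bounded memory (hypothetical)."""

    def __init__(self, urls, max_entries=1024):
        if not urls:
            raise ValueError("urls must not be empty")
        self._urls = list(urls)
        self._max_entries = max_entries
        self._next_index = 0
        self._assignments = OrderedDict()

    def url_for(self, key):
        # Known session: keep its pinned URL and mark it as recently used.
        if key in self._assignments:
            self._assignments.move_to_end(key)
            return self._assignments[key]
        # New session: take the next URL in round-robin order.
        url = self._urls[self._next_index % len(self._urls)]
        self._next_index += 1
        self._assignments[key] = url
        # Evict least recently used entries once the cap is exceeded.
        while len(self._assignments) > self._max_entries:
            self._assignments.popitem(last=False)
        return url

    def release(self, key):
        # Call when a session closes so its pin does not linger.
        self._assignments.pop(key, None)
```

The cap is a safety net for sessions that never signal closure. An evicted session that comes back is simply treated as new and may be pinned to a different URL.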
Summary
base_urls pool support with load_balance = "session_round_robin"
Testing
uv run pytest tests/test_deepagent_runtime_rag.py::test_runtime_config_reads_ollama_base_urls_from_toml tests/test_deepagent_runtime_rag.py::test_agent_runtime_assigns_ollama_base_urls_per_session tests/test_deepagent_runtime_rag.py::test_stateful_mcp_reuses_session_per_chainlit_session -q
uv run pytest -q