refactor(llm)!: atomic switch to LLMModel protocol #1760
Conversation
Greptile Summary
This PR performs an atomic switch from LangChain types to the new LLMModel protocol.
| Filename | Overview |
|---|---|
| nemoguardrails/integrations/langchain/llm_adapter.py | New LangChain adapter; drops stop-parameter filtering for OpenAI reasoning models (o3, o4-mini, gpt-5 non-chat) that was present in the old pipeline — will cause API errors for those users. |
| nemoguardrails/actions/llm/utils.py | Clean rewrite to LLMModel protocol; LangChain-specific extraction logic removed; main concern is that streaming path doesn't call _store_reasoning_traces/_store_tool_calls (pre-existing gap, not introduced here). |
| nemoguardrails/llm/models/initializer.py | Now delegates to framework registry; mode parameter correctly moved to model_kwargs and defaulted to "chat"; clean and correct. |
| nemoguardrails/rails/llm/llmrails.py | Typed llm as LLMModel, added _wrap_legacy_llm for backward compatibility; deprecation warning correctly uses stacklevel=3. |
| nemoguardrails/types.py | Well-designed canonical type system; LLMModel/LLMFramework as runtime_checkable Protocols; LLMResponse/LLMResponseChunk/ChatMessage dataclasses are complete and correct (sketched below this table). |
| nemoguardrails/llm/frameworks.py | Lazy framework registry with env-var override; clean design. register_framework raises on duplicate which could trip integration tests if they don't call _reset_frameworks(). |
| nemoguardrails/integrations/langchain/message_utils.py | New chatmessage_to_langchain_message correctly falls back to msg.content or "" for all roles; tool_calls_to_langchain_format output boundary is correct. |
| nemoguardrails/actions/llm/generation.py | All call sites correctly updated to .content; fire-and-forget streaming task at line 1260 correctly discards the LLMResponse. |
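As context for the types.py row, here is a minimal sketch of what those canonical declarations could look like. Only the type names come from this review; all fields and signatures are assumptions:

```python
# Hypothetical sketch of nemoguardrails/types.py; field sets and signatures
# are assumptions, only the type names appear in this review.
from dataclasses import dataclass
from typing import Any, AsyncIterator, Optional, Protocol, runtime_checkable


@dataclass
class ChatMessage:
    role: str
    content: Optional[str] = None


@dataclass
class LLMResponse:
    content: str
    tool_calls: Optional[list] = None  # canonical nested tool calls


@dataclass
class LLMResponseChunk:
    content: str


@runtime_checkable
class LLMModel(Protocol):
    """Any object with matching generate()/stream() passes isinstance checks."""

    async def generate(
        self, prompt: Any, *, stop: Optional[list] = None, **kwargs: Any
    ) -> LLMResponse: ...

    def stream(
        self, prompt: Any, *, stop: Optional[list] = None, **kwargs: Any
    ) -> AsyncIterator[LLMResponseChunk]: ...
```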
Sequence Diagram
```mermaid
sequenceDiagram
    participant Caller as generation.py caller
    participant llm_call as llm_call (utils.py)
    participant Adapter as LangChainLLMAdapter
    participant LC as LangChain llm
    Caller->>llm_call: "await llm_call(llm, prompt, stop=[...], llm_params={...})"
    llm_call->>llm_call: isinstance(llm, LLMModel) check
    llm_call->>llm_call: _setup_llm_call_info / _log_prompt
    alt streaming
        llm_call->>llm_call: _stream_llm_call(model, prompt, handler, stop, llm_params)
        llm_call->>Adapter: "model.stream(prompt, stop=stop, **llm_params)"
        Adapter->>Adapter: _prepare_llm(kwargs) → filter temperature only
        Adapter->>LC: "llm.astream(messages, stop=stop)"
        LC-->>Adapter: chunks
        Adapter-->>llm_call: LLMResponseChunk stream
        llm_call-->>Caller: "LLMResponse(content=handler.completion)"
    else non-streaming
        llm_call->>Adapter: "model.generate(prompt, stop=stop, **llm_params)"
        Adapter->>Adapter: _prepare_llm(kwargs) → filter temperature only
        Adapter->>LC: "llm.ainvoke(messages, stop=stop)"
        LC-->>Adapter: AIMessage
        Adapter->>Adapter: _langchain_response_to_llm_response
        Adapter-->>llm_call: LLMResponse
        llm_call->>llm_call: _store_reasoning_traces / _log_completion / _update_token_stats
        llm_call->>llm_call: _store_tool_calls / _store_response_metadata
        llm_call-->>Caller: LLMResponse
    end
```
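To make the diagram's non-streaming branch concrete, here is a sketch from the caller's side. The adapter's module path comes from this PR; its constructor signature, the model choice, and the prompt are assumptions:

```python
# Illustrative walk through the diagram's non-streaming branch; the
# LangChainLLMAdapter constructor signature is an assumption.
import asyncio

from langchain_openai import ChatOpenAI

from nemoguardrails.integrations.langchain.llm_adapter import LangChainLLMAdapter


async def main() -> None:
    model = LangChainLLMAdapter(ChatOpenAI(model="gpt-4o-mini"))
    # llm_call would do this internally: generate() wraps llm.ainvoke()
    # and returns an LLMResponse rather than a LangChain AIMessage.
    response = await model.generate("Hello!", stop=["\nuser ", "\nUser "])
    print(response.content)


asyncio.run(main())
```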
Prompt To Fix All With AI
This is a comment left during a code review.
Path: nemoguardrails/integrations/langchain/llm_adapter.py
Line: 151-169
Comment:
**`stop` parameter filtering dropped for reasoning models**
The old `_filter_params_for_openai_reasoning_models` filtered both `temperature` **and** `stop` for models that don't support them (`o3`, `o4-mini`, `gpt-5` non-chat). The new method only removes `temperature`. Now `stop` is forwarded directly to `ainvoke`/`astream` (lines 196, 208) with no stripping for these models.
NeMo Guardrails passes `stop=["\nuser ", "\nUser "]` in the Colang 1.0 generation pipeline (`generation.py` line 1264). Any user pointing `o3`, `o4-mini`, or `gpt-5` (non-chat) at that path will hit an OpenAI API error because those models reject the `stop` parameter.
The adapter needs to filter `stop` before forwarding it:
```python
from typing import Optional


def _filter_reasoning_model_params(self, params: Optional[dict]) -> Optional[dict]:
    if not params:
        return params
    model_name = _infer_model_name(self._llm).lower()
    # OpenAI reasoning models reject `temperature`.
    is_openai_reasoning_model = (
        model_name.startswith("o1")
        or model_name.startswith("o3")
        or (model_name.startswith("gpt-5") and "chat" not in model_name)
    )
    # o3 (non-mini), o4-mini, and gpt-5 (non-chat) also reject `stop`.
    is_stop_not_supported = (
        (model_name.startswith("o3") and not model_name.startswith("o3-mini"))
        or model_name.startswith("o4-mini")
        or (model_name.startswith("gpt-5") and "gpt-5-chat" not in model_name)
    )
    if not is_openai_reasoning_model and not is_stop_not_supported:
        return params
    filtered = params.copy()
    if is_openai_reasoning_model:
        filtered.pop("temperature", None)
    if is_stop_not_supported:
        filtered.pop("stop", None)
    return filtered
```
And `stop` should be omitted from `ainvoke`/`astream` when it is filtered:
```python
async def generate(self, prompt, *, stop=None, **kwargs):
    # Drop `stop` entirely for models that reject it; `_is_stop_not_supported`
    # would reuse the model-name check above.
    filtered_stop = None if self._is_stop_not_supported() else stop
    llm = self._prepare_llm(kwargs)
    response = await llm.ainvoke(self._to_langchain_input(prompt), stop=filtered_stop)
    return _langchain_response_to_llm_response(response)
```
How can I resolve this? If you propose a fix, please make it concise.
init_llm_model() delegates to the framework registry instead of calling LangChain directly. LLMRails types self.llm as LLMModel and wraps legacy LangChain objects with a deprecation warning.
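A plausible shape for that legacy wrapping follows; _wrap_legacy_llm, LangChainLLMAdapter, and the stacklevel=3 detail come from this page, while the body and warning text are assumptions:

```python
# Sketch of _wrap_legacy_llm; the warning text and control flow are guesses.
import warnings

from nemoguardrails.types import LLMModel


def _wrap_legacy_llm(llm):
    if isinstance(llm, LLMModel):  # already speaks the new protocol
        return llm
    from nemoguardrails.integrations.langchain.llm_adapter import LangChainLLMAdapter

    warnings.warn(
        "Passing a raw LangChain LLM to LLMRails is deprecated; "
        "wrap it in LangChainLLMAdapter instead.",
        DeprecationWarning,
        stacklevel=3,  # points the warning at the user's call site
    )
    return LangChainLLMAdapter(llm)
```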
llm_call() now calls model.generate()/model.stream() via the LLMModel protocol and returns LLMResponse. All LangChain-specific extraction logic (reasoning, tool calls, metadata) removed from utils.py since the adapter handles it.
Add tool_calls_to_langchain_format() to convert canonical nested tool calls back to LangChain flat format at the output boundary. Update runnable_rails.py and guardrails.py to use the new types.
Update ~35 call sites across generation actions, library actions, and eval modules to use .content for text access now that llm_call() returns LLMResponse instead of str. Fix the hallucination action to use the new contract.
The old _raise_llm_call_exception extracted endpoint URLs by checking BASE_URL_ATTRIBUTES on the raw LangChain LLM. This was lost in the rewrite. Restore it by using LLMModel.provider_url and implementing the attribute inspection in LangChainLLMAdapter.provider_url.
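The restored inspection could look roughly like this; BASE_URL_ATTRIBUTES is named in the commit message, but its members here are illustrative guesses:

```python
# Sketch of LangChainLLMAdapter.provider_url; the attribute names in
# BASE_URL_ATTRIBUTES are guesses, not the project's actual tuple.
from typing import Optional

BASE_URL_ATTRIBUTES = ("base_url", "api_base", "openai_api_base", "endpoint")


class LangChainLLMAdapter:
    def __init__(self, llm):
        self._llm = llm

    @property
    def provider_url(self) -> Optional[str]:
        for attr in BASE_URL_ATTRIBUTES:
            value = getattr(self._llm, attr, None)
            if value:
                return str(value)
        return None
```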
- Reorder: store reasoning traces before logging completion
- Fix missing underscore in test function names
- Fix docstring param name rails -> rail_types
- Fall back to empty string for None content in chatmessage conversion
Windows time.time() has ~15ms granularity. An instant mock function gets identical start/end timestamps. Add asyncio.sleep(0.02) so duration is always measurable.
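As a sketch, the mock side of that fix (the mock itself is hypothetical):

```python
# Hypothetical mock illustrating the timing fix: without the sleep, the
# start and end time.time() readings can be identical on Windows.
import asyncio


async def mock_llm_generate(*args, **kwargs):
    await asyncio.sleep(0.02)  # guarantees a measurable call duration
    return "mocked completion"
```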
📝 Walkthrough
This PR introduces a framework-agnostic LLM model abstraction (the LLMModel protocol).
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nemoguardrails/rails/llm/llmrails.py`:
- Around line 121-136: In _wrap_legacy_llm, the ImportError caught when
importing LangChainLLMAdapter should be chained to the new TypeError so the
original exception context is preserved; update the except block to capture the
ImportError (e.g., "except ImportError as err:") and re-raise the TypeError
using "raise TypeError(... ) from err" when declining to proceed, referencing
the _wrap_legacy_llm function and LangChainLLMAdapter import to locate the
change.
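The requested change is standard exception chaining; a minimal sketch, with everything around the raise assumed:

```python
# Minimal sketch of the suggested `raise ... from err` chaining; the
# surrounding _wrap_legacy_llm code is assumed.
def _wrap_legacy_llm(llm):
    try:
        from nemoguardrails.integrations.langchain.llm_adapter import (
            LangChainLLMAdapter,
        )
    except ImportError as err:
        raise TypeError(
            "llm does not implement LLMModel and the LangChain adapter is "
            "unavailable; pass an LLMModel or install the LangChain extra."
        ) from err  # preserves the original ImportError as __cause__
    return LangChainLLMAdapter(llm)
```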
In `@tests/test_tool_output_rails.py`:
- Around line 35-36: The fixtures build flat tool-call objects but the helpers
now expect the nested shape under function.arguments; update all mocked calls
used by self_check_tool_calls and validate_tool_parameters so each tool_call has
a "function" dict with "name" and "arguments" keys (e.g., replace top-level
"name" and "args" entries with function: {"name": <tool_name>, "arguments":
<args_dict>}) for the fixtures currently constructed in
tests/test_tool_output_rails.py (the ones referenced by self_check_tool_calls
and validate_tool_parameters), ensuring every mocked call in the happy-path and
blocking tests uses this nested schema.
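Concretely, the fixtures would move from the flat to the nested shape; the tool name and arguments below are made up:

```python
# Illustrative fixture data only; tool name and arguments are invented.
flat_tool_call = {  # old shape the fixtures currently build
    "name": "get_weather",
    "args": {"city": "Paris"},
}

nested_tool_call = {  # nested shape the helpers now expect
    "function": {
        "name": "get_weather",
        "arguments": {"city": "Paris"},
    },
}
```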
📒 Files selected for processing (38)
- nemoguardrails/__init__.py
- nemoguardrails/actions/llm/generation.py
- nemoguardrails/actions/llm/utils.py
- nemoguardrails/actions/v2_x/generation.py
- nemoguardrails/eval/check.py
- nemoguardrails/evaluate/evaluate_factcheck.py
- nemoguardrails/evaluate/evaluate_moderation.py
- nemoguardrails/evaluate/utils.py
- nemoguardrails/guardrails/guardrails.py
- nemoguardrails/integrations/langchain/llm_adapter.py
- nemoguardrails/integrations/langchain/message_utils.py
- nemoguardrails/integrations/langchain/runnable_rails.py
- nemoguardrails/library/content_safety/actions.py
- nemoguardrails/library/factchecking/align_score/actions.py
- nemoguardrails/library/hallucination/actions.py
- nemoguardrails/library/llama_guard/actions.py
- nemoguardrails/library/patronusai/actions.py
- nemoguardrails/library/self_check/facts/actions.py
- nemoguardrails/library/self_check/input_check/actions.py
- nemoguardrails/library/self_check/output_check/actions.py
- nemoguardrails/library/topic_safety/actions.py
- nemoguardrails/llm/models/initializer.py
- nemoguardrails/rails/llm/llmrails.py
- tests/test_actions_llm_utils.py
- tests/test_content_safety_actions.py
- tests/test_content_safety_cache.py
- tests/test_content_safety_integration.py
- tests/test_llama_guard.py
- tests/test_llmrails.py
- tests/test_logging.py
- tests/test_patronus_lynx.py
- tests/test_reasoning_trace_extraction.py
- tests/test_tool_calling_passthrough_only.py
- tests/test_tool_calling_utils.py
- tests/test_tool_calls_event_extraction.py
- tests/test_tool_output_rails.py
- tests/test_topic_safety_cache.py
- tests/test_topic_safety_internalevent.py
💤 Files with no reviewable changes (1)
- nemoguardrails/integrations/langchain/llm_adapter.py
resolve test failures that were introduced in stack-2
tgasser-nv
left a comment
Thanks for fixing the accumulation change, the rest all looks good
After #1760 activated the LangChainLLMAdapter path, stop was forwarded unconditionally to llm.ainvoke() and llm.astream(), causing HTTP 400 errors from gpt-5*, o3 (non-mini), and o4-mini, which reject the stop parameter. Unify stop and temperature filtering into a single _prepare_call_params method.
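A sketch of that unified filter; the method name _prepare_call_params and the model families come from this page, the rest is an assumption (shown as a free function for brevity):

```python
# Sketch of the unified _prepare_call_params described above; everything
# except the method name and the model families is an assumption.
from typing import Optional


def _prepare_call_params(
    model_name: str,
    stop: Optional[list] = None,
    temperature: Optional[float] = None,
) -> dict:
    name = model_name.lower()
    # Reasoning models that reject `temperature` (per the review above).
    rejects_temperature = (
        name.startswith("o1")
        or name.startswith("o3")
        or (name.startswith("gpt-5") and "chat" not in name)
    )
    # Models that reject `stop`: o3 (non-mini), o4-mini, gpt-5 (non-chat).
    rejects_stop = (
        (name.startswith("o3") and not name.startswith("o3-mini"))
        or name.startswith("o4-mini")
        or (name.startswith("gpt-5") and "gpt-5-chat" not in name)
    )
    params: dict = {}
    if stop is not None and not rejects_stop:
        params["stop"] = stop
    if temperature is not None and not rejects_temperature:
        params["temperature"] = temperature
    return params
```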
Part of the LangChain decoupling stack:
Summary
Atomic switch from LangChain to the LLMModel protocol. These changes are tightly coupled: what the initializer returns cannot change without also changing how utils.py calls it (short of introducing messy bridge shims), and llm_call's return type cannot change without updating all callers, so the changes land as one PR.
How to review
This PR is split into 5 scoped commits for easier review. Please review commit-by-commit rather than the full diff:
1. **rewrite initializer and llmrails for LLMModel protocol**: `init_llm_model()` delegates to the framework registry; `LLMRails` types `self.llm` as `LLMModel` with legacy wrapping
2. **rewrite llm_call to use LLMModel.generate/stream**: the core pipeline change; removes all LangChain extraction logic
3. **update integration boundary for canonical tool calls**: adds `tool_calls_to_langchain_format()` at the output boundary (see the sketch after this list)
4. **migrate all callers to LLMResponse.content**: mechanical one-line changes across 35 call sites
5. **restore endpoint URL in LLMCallException via provider_url**: uses `LLMModel.provider_url` instead of inspecting LLM attributes

Individual commits do not pass tests on their own since the changes are interdependent. The PR as a whole passes the full test suite.
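For the integration-boundary commit (item 3), a hedged sketch of the nested-to-flat conversion; LangChain's flat tool-call dicts use name/args/id keys, while the canonical nested shape is taken from the review comments above:

```python
# Sketch of tool_calls_to_langchain_format(); the canonical input shape is
# inferred from the review comments, so treat field names as assumptions.
def tool_calls_to_langchain_format(tool_calls: list) -> list:
    flat = []
    for call in tool_calls:
        fn = call.get("function", {})
        flat.append(
            {
                "name": fn.get("name"),
                "args": fn.get("arguments") or {},
                "id": call.get("id"),
            }
        )
    return flat
```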
Summary by CodeRabbit
New Features
- New framework registry functions: `get_default_framework()`, `register_framework()`, and `set_default_framework()`.

Refactor
- Adopted the `LLMModel` interface throughout the codebase. Added `LangChainLLMAdapter` for backward compatibility with existing LangChain implementations.
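A possible usage of those registry functions; the factory signature, the framework name, and the assumption that get_default_framework() returns a name are all guesses:

```python
# Hypothetical usage of the registry API named above; signatures are guesses.
from nemoguardrails.llm.frameworks import (
    get_default_framework,
    register_framework,
    set_default_framework,
)


def my_framework_factory(model: str, provider: str, **model_kwargs):
    """Return an object implementing the LLMModel protocol (sketch only)."""
    raise NotImplementedError


register_framework("my-framework", my_framework_factory)  # raises on duplicate
set_default_framework("my-framework")
assert get_default_framework() == "my-framework"
```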