upstream sync#11
Merged
Merged
Conversation
* Updated notebook 1 with VDR feedback * Updated notebook 2 (Jaeger) with VDR comments * Replace Llama 3.1 8B with Llama 4 scout as main model * Correct type in cell 10 notebook 2
* chore(release): prepare for v0.16.0 ------ Co-authored-by: tgasser-nv <tgasser-nv@users.noreply.github.com> Co-authored-by: tgasser-nv <200644301+tgasser-nv@users.noreply.github.com> Co-authored-by: Pouyanpi <13303554+Pouyanpi@users.noreply.github.com>
Signed-off-by: Tim Gasser <200644301+tgasser-nv@users.noreply.github.com>
…1.0 (#1365) Add example configuration and documentation for using NVIDIA NeMoGuard NIMs, including content moderation, topic control, and jailbreak detection.
…1399) * Checkin of demo script link * Update getting-started.md Signed-off-by: Tim Gasser <200644301+tgasser-nv@users.noreply.github.com> * docs: edit 1399 (#1400) * edit more * more fixes --------- Signed-off-by: Tim Gasser <200644301+tgasser-nv@users.noreply.github.com> Co-authored-by: Miyoung Choi <miyoungc@nvidia.com>
Update verbose logging to safely handle cases where log records may not have 'id' or 'task' attributes. Prevents potential AttributeError and improves robustness of LLM and prompt log output formatting.
Remove branch restriction from pull_request trigger to allow codecov coverage reporting on PRs targeting any branch, not just develop.
) Implements tool call extraction and passthrough functionality in LLMRails: - Add tool_calls_var context variable for storing LLM tool calls - Refactor llm_call utils to extract and store tool calls from responses - Support tool calls in both GenerationResponse and dict message formats - Add ToolMessage support for langchain message conversion - Comprehensive test coverage for tool calling integration
… Runnable protocol support (#1366) - Implement comprehensive async/sync invoke, batch, and streaming support - Add robust input/output transformation for all LangChain formats (ChatPromptValue, BaseMessage, dict, string) - Enhance chaining behavior with intelligent __or__ method handling RunnableBinding and complex chains - Add concurrency controls, error handling, and configurable blocking messages - Implement proper tool calling support with tool call passthrough - Add extensive test suite (14 test files, 2800+ lines) covering all major functionality including batching, streaming, composition, piping, and tool calling - Reorganize and expand test structure for better maintainability apply review suggestions
…Rails (#1369) Ensure AIMessage responses from RunnableRails contain the same metadata fields (response_metadata, usage_metadata, additional_kwargs, id) as direct LLM calls, enabling consistent LangChain integration behavior.
Enhance streaming in RunnableRails to include generation metadata in streamed chunks. Skips END_OF_STREAM markers and updates chunk formatting to support metadata for AIMessageChunk outputs. This improves compatibility with consumers expecting metadata in streaming responses.
Remove the push trigger from the PR Tests workflow to avoid duplicate runs when a pull request is updated. The workflow will now only run on pull_request events, ensuring a single execution per PR update.
…1382) Introduce tool output/input rails configuration and Colang flows for tool call validation and parameter security checks. Add support for BotToolCall event emission in passthrough mode, enabling tool call guardrails before execution.
…ion and processing (#1386) - Add UserToolMessages event handling and tool input rails processing - Fix message-to-event conversion to properly handle tool messages in conversation history - Preserve tool call context in passthrough mode by using full conversation history - Support tool_calls and tool message metadata in LangChain format conversion - Include comprehensive test suite for tool input rails functionality test(runnable_rails): fix prompt format in passthrough mode feat: support ToolMessage in message dicts refactor: rename BotToolCall to BotToolCalls
…calling (#1405) * fix: preserve message metadata in RunnableRails tool calling - Extract message conversion logic to centralized message_utils module - Dynamically preserve all LangChain message fields (tool_calls, additional_kwargs, etc.) - Fix tool calling metadata loss in passthrough mode - Add comprehensive unit tests for message conversions
…tegration (#1355) --------- Co-authored-by: Trent Holmes <trent_holmes@trendmicro.com> Co-authored-by: Karanjot Singh Saggu <karanjotsingh_saggu@trendmicro.com>
…me (#1401) Signed-off-by: Curtis G. Northcutt <curtis.northcutt@gmail.com>
* feat(llm): add llm_params option to llm_call Extend llm_call to accept an optional llm_params dictionary for passing configuration parameters (e.g., temperature, max_tokens) to the language model. This enables more flexible control over LLM behavior during calls. refactor(llm): replace llm_params context manager with argument Update all usages of the llm_params context manager to pass llm_params as an argument to llm_call instead. This simplifies parameter handling and improves code clarity for LLM calls. docs: clarify prompt customization and llm_params usage update LLMChain config usage
* refactor(llm): remove isolated LLMs for actions Removes the logic for creating isolated LLM instances for actions that require an 'llm' parameter in `LLMRails`. This includes deleting the `_create_isolated_llms_for_actions`, `_detect_llm_requiring_actions`, `_get_action_function`, and `_create_action_llm_copy` methods, as well as removing their invocation from the class initialization. Also deletes all related tests, including unit, integration, and e2e tests for LLM isolation and model_kwargs handling. This simplifies the codebase and test suite by eliminating support for per-action LLM isolation. * Revert tests/utils.py to state before #1336
- Fixed 86 type errors across 7 files in the nemoguardrails/rails/ directory. All fixes preserve existing functionality while improving type safety. - added Pyright to pre-commits
Bump transformers to >=4.56.0 and torch to >=2.8.0 to resolve high severity vulns
Previously, messages of type "tool" were not distinctly labeled in the LoggingCallbackHandler output, causing them to be grouped under "System". This change adds explicit handling for "tool" messages, labeling them as "Tool" in the logs for improved clarity and debugging.
* refactor(logging): replace if-else with dict for type mapping
Type-cleaned all files under `nemoguard/actions` and added them to pyright pre-commit hooks so type-coverage doesn't regress.
* docs: add tools integration guide Add a comprehensive "Tools Integration" guide to the advanced user documentation. The new guide covers supported tools, configuration settings (including passthrough mode), implementation examples, and security considerations for tool usage in NeMo Guardrails. Also update the advanced index to include the new guide. * add the new page to the right index file (#1415) * Update docs/user-guides/advanced/tools-integration.md Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> * Update docs/user-guides/advanced/tools-integration.md Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> * Update docs/user-guides/advanced/tools-integration.md Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> * Update docs/user-guides/advanced/tools-integration.md Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> * Update docs/user-guides/advanced/tools-integration.md Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> * Update docs/user-guides/advanced/tools-integration.md Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> --------- Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> Co-authored-by: Miyoung Choi <miyoungc@nvidia.com>
Signed-off-by: Mike McKiernan <mmckiernan@nvidia.com>
* feat(cache): add LLM metadata caching for model and provider information Extends the cache system to store and restore LLM metadata (model name and provider name) alongside cache entries. This allows cached results to maintain provenance information about which model and provider generated the original response. - Added LLMMetadataDict and LLMCacheData TypedDict definitions for type safety - Extended CacheEntry to include optional llm_metadata field - Implemented extract_llm_metadata_for_cache() to capture model and provider info from context - Implemented restore_llm_metadata_from_cache() to restore metadata when retrieving cached results - Updated get_from_cache_and_restore_stats() to handle metadata extraction and restoration - Added comprehensive test coverage for metadata caching functionalit * feat(cache): add caching support for jailbreak detection
Skip FastEmbed embedding tests by default to prevent CI failures from external HuggingFace infrastructure issues.
Co-authored-by: Pouyanpi <13303554+Pouyanpi@users.noreply.github.com>
…ctic-embed-m-long` name. (#1464) Signed-off-by: Erick Galinkin <egalinkin@nvidia.com>
Replace real time.sleep() calls with mocked time.time() to eliminate timing dependencies that caused intermittent CI failures. The test would occasionally fail when execution overhead pushed total elapsed time over the 1.0 second threshold, triggering unexpected stats logging. Mocking time.time() ensures deterministic behavior across all environments and reduces test execution time from ~1.16s to ~0.02s.
…el rails (#1467) * fix(runtime): ensure stop flag is set for policy violations in parallel rails * Update nemoguardrails/colang/v1_0/runtime/runtime.py
…ending (#1468) * feat(llm)!: add reasoning_content field to GenerationResponse BREAKING CHANGE: Reasoning traces are no longer prepended directly to response content. When using GenerationOptions, reasoning is now available in the separate reasoning_content field. Without GenerationOptions, reasoning is wrapped in <thinking> tags. Refactor reasoning trace handling to expose reasoning content as a separate field in GenerationResponse instead of prepending it to response content. - Add reasoning_content field to GenerationResponse for structured access - Populate reasoning_content in generate_async when using GenerationOptions - Wrap reasoning traces in <thinking> tags for dict/string responses - Update test to reflect new behavior (no longer prepending reasoning) - Add comprehensive tests for new field and behavior changes This improves API usability by separating reasoning content from the actual response, allowing clients to handle thinking traces independently. Resolves the TODO from PR #1432 about not prepending reasoning traces to final generation content.
) * feat(llm): Add custom HTTP headers support to ChatNVIDIA provider Add custom HTTP headers support to the ChatNVIDIA class patch, enabling users to pass custom headers (authentication tokens, request IDs, billing information, etc.) with all requests to NVIDIA AI endpoints. Implementation Approach - Added custom_headers optional field to ChatNVIDIA class with Pydantic v2 compatibility - Implemented runtime method wrapping that intercepts _client.get_req() and _client.get_req_stream() to merge custom headers with existing headers - Included automatic version detection to ensure compatibility with langchain-nvidia-ai-endpoints >= 0.3.0, with clear error messages for older versions - Works with both synchronous invoke() and streaming requests, fully compatible with VLM (Vision Language Models)
…rect parameter passing (#1471)
…put rails streaming (#1470) * fix(streaming): raise error when stream_async used with disabled output rails streaming When output rails are configured but output.streaming.enabled is False (or not set), calling stream_async() would result in undefined behavior or hangs due to the conflict between streaming expectations and blocking output rail processing. This change adds explicit validation in stream_async() to detect this misconfiguration and raise a clear ValueError with actionable guidance: - Set rails.output.streaming.enabled = True to use streaming with output rails - Use generate_async() instead for non-streaming with output rails Updated affected tests to expect and validate the new error behavior instead of relying on the previous buggy behavior.
…ags (#1474) Adds a compatibility layer for LLM providers that don't properly populate reasoning_content in additional_kwargs. When reasoning_content is missing, the system now falls back to extracting reasoning traces from <think>...</think> tags in the response content and removes the tags from the final output. This fixes compatibility with certain NVIDIA models (e.g., nvidia/llama-3.3-nemotron-super-49b-v1.5) in langchain-nvidia-ai-endpoints that include reasoning traces in <think> tags but fail to populate the reasoning_content field. All reasoning models using ChatNVIDIA should expose reasoning content consistently through the same interface
* Initial checkin * Add nemoguardrails/server to pyright type-checking * chore(types): Type-clean embeddings/ (25 errors) (#1383) * test: restore test that was skipped due to Colang 2.0 serialization issue (#1449) * fix(llm): add fallback extraction for reasoning traces from <think> tags (#1474) Adds a compatibility layer for LLM providers that don't properly populate reasoning_content in additional_kwargs. When reasoning_content is missing, the system now falls back to extracting reasoning traces from <think>...</think> tags in the response content and removes the tags from the final output. This fixes compatibility with certain NVIDIA models (e.g., nvidia/llama-3.3-nemotron-super-49b-v1.5) in langchain-nvidia-ai-endpoints that include reasoning traces in <think> tags but fail to populate the reasoning_content field. All reasoning models using ChatNVIDIA should expose reasoning content consistently through the same interface * Clean up the config_id logic based on Traian and Greptile feedback --------- Co-authored-by: Pouyan <13303554+Pouyanpi@users.noreply.github.com>
* Cleaned llm/ type errors * Add nemoguardrails/llm to the pyright pre-commit check * Fix types in nemoguardrails/rails module * Use poetry install --all-extras --with dev to install langchain_nvidia_ai_endpoints for Github CI tests * Install extras in test-coverage-report so the langchain_nvidia_ai_endpoints work for pyright type-checking * Remove tritonclient from type-checking (should this be deprecated? * Add upgrade-deps to the full-tests.yml file in Github CI/CD * Exclude providers/trtllm/** and providers/_langchain_nvidia_ai_endpoints_patch.py from type-checking * Roll back type cleaning under llm/providers/trtllm now they're excluded from type-checking * Type-clean the LFU cache implementation * Address Pouyan's feedback. Removed Model.model Optional and default value * fix typo * Revert github workflow changes (not needed now we exclude trtllm from type-checking) * Remove comment from pyproject.toml * Revert mandatory Model name field change, add None-guard back * Address last feedback
* Initial scaffold of mock OpenAI-compatible server * Refactor mock LLM, fix tests * Added tests to load YAML config. Still debugging dependency-injection of this into endpoints * Move FastAPI app import **after** the dependencies are loaded and cached * Remove debugging print statements * Temporary checkin * Add refusal probability and tests to check it * Use YAML configs for Nemoguard and app LLMs * Add Mock configs for content-safety and App LLM * Add async sleep statements and logging to record request time * Change content-safety mock to have latency of 0.5s * Add unit-tests to mock llm * Check for config file * Rename test files to avoid conflicts with other tests * Remove example_usage.py script and type-clean config.py * Regenerate headers with 2023 - 2025 * Removed commented-out code * review: PR#1403 (#1453) * test: run_server test coverage pragma no cover * Update licence --------- Co-authored-by: tgasser-nv <200644301+tgasser-nv@users.noreply.github.com> * Apply greptile fixes * Last couple of cleanups --------- Co-authored-by: Pouyan <13303554+Pouyanpi@users.noreply.github.com>
* docs: add guide for bot reasoning guardrails update update simplify cleanup * docs: clarify Colang version for bot reasoning guide Add a note specifying that bot reasoning guardrails are supported only in Colang 1.0. Update example file references for improved clarity. * add bot thinking guardrails to toctree * docs: update self-check config link to develop branch * fix typo * fix references to use develop branch * docs: edit #1479 (#1484) --------- Co-authored-by: Miyoung Choi <miyoungc@nvidia.com>
## Summary Document the memory-caching to reduce latency for frequently Guardrail'ed user requests ## Commit history: * Initial checkin * Completed cache doc, some todos to fill in based on local integration testing * Update example with content-safety, topic-control, and jailbreak nemoguard NIMs * Add memory-caching to the table-of-contents * Cleaned up last TODOs * Apply suggestion from @greptile-apps[bot] Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> * edit (#1486) Edits to the memory-cache docs * Final updates to example log format --------- Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> Co-authored-by: Miyoung Choi <miyoungc@nvidia.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* docs: update LLM reasoning traces config guidance Refactor documentation for reasoning traces in LLM configuration guide: - Remove deprecated `reasoning_config` and `apply_to_reasoning_traces` fields. - Add warning about breaking changes in v0.18.0. - Explain new approach using output rails for reasoning traces. - Provide updated YAML and Python usage examples. - Clarify how to access reasoning traces in API responses. - Remove outdated configuration and prompt samples. * docs(llm): clarify GenerationOptions usage and patterns --------- Co-authored-by: Miyoung Choi <miyoungc@nvidia.com>
* prepare 0.18 doc release * add the other two notes * nit
* docs: Add Nemotron Safety Guard Signed-off-by: Mike McKiernan <mmckiernan@nvidia.com> * docs: Replace NemoGuard ContentSafety page The Nemotron Safety Guard model provides content safety and is multilingual. Signed-off-by: Mike McKiernan <mmckiernan@nvidia.com> * docs(fix): Fix content safety redirects Signed-off-by: Mike McKiernan <mmckiernan@nvidia.com> * docs(fix): Greptile review fixes - Clarify input and output rails - Update reference in unused (but built) index.rst Signed-off-by: Mike McKiernan <mmckiernan@nvidia.com> * docs(fix): Errant cursor suggested file name Signed-off-by: Mike McKiernan <mmckiernan@nvidia.com> --------- Signed-off-by: Mike McKiernan <mmckiernan@nvidia.com>
* chore(release): prepare for v0.18.0 * update release date --------- Co-authored-by: Pouyanpi <Pouyanpi@users.noreply.github.com> Co-authored-by: Pouyan <13303554+Pouyanpi@users.noreply.github.com> Co-authored-by: tgasser-nv <200644301+tgasser-nv@users.noreply.github.com>
83ef844
into
trustyai-explainability:develop
1 of 7 checks passed
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.