Add safety preprocessing and routing to ReactToMe #99
Closed
9 commits
30e7dff refactor: centralize preprocessing in BaseGraphBuilder (heliamoh)
7c8f50f feat: route unsafe queries in ReactToMe (heliamoh)
734f361 feat: add safety routing to ReactToMe graph (heliamoh)
fdc5cd2 feat: strengthen safety checker policy (heliamoh)
c8f9f71 feat: add preprocessing workflow module (heliamoh)
6378148 Apply formatting fixes after lint run (heliamoh)
cf347f9 style: simplify state merges in BaseGraphBuilder (heliamoh)
bbc6549 fix: address reviewer import and streaming feedback (heliamoh)
855ad4d fix: address reviewer feedback and resolve state management issues (heliamoh)
heliamoh marked this conversation as resolved.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,46 @@ | ||
| from langchain_core.language_models.chat_models import BaseChatModel | ||
| from langchain_core.prompts import ChatPromptTemplate | ||
| from langchain_core.runnables import Runnable | ||
| | ||
| | ||
| def create_unsafe_answer_generator(llm: BaseChatModel) -> Runnable: | ||
| """ | ||
| Create an unsafe answer generator chain. | ||
| | ||
| Args: | ||
| llm: Language model to use. | ||
| | ||
| Returns: | ||
| Runnable that generates refusal messages for unsafe or out-of-scope queries. | ||
| """ | ||
| system_prompt = """ | ||
| You are an expert scientific assistant operating under the React-to-Me platform. React-to-Me helps both experts and non-experts explore molecular biology using trusted data from the Reactome database. | ||
| | ||
| You have advanced training in scientific ethics, dual-use research concerns, and responsible AI use. | ||
| | ||
| You will receive three inputs: | ||
| 1. The user's question. | ||
| 2. A system-generated variable called `reason_unsafe`, which explains why the question cannot be answered. | ||
| 3. The user's preferred language (as a language code or name). | ||
| | ||
| Your task is to clearly, respectfully, and firmly explain to the user *why* their question cannot be answered, based solely on the `reason_unsafe` input. Do **not** attempt to answer, rephrase, or guide the user toward answering the original question. | ||
| | ||
| You must: | ||
| - Respond in the user’s preferred language. | ||
| - Politely explain the refusal, grounded in the `reason_unsafe`. | ||
| - Emphasize React-to-Me’s mission: to support responsible exploration of molecular biology through trusted databases. | ||
| - Suggest examples of appropriate topics (e.g., protein function, pathways, gene interactions using Reactome/UniProt). | ||
| | ||
| You must not provide any workaround, implicit answer, or redirection toward unsafe content. | ||
| """ | ||
| prompt = ChatPromptTemplate.from_messages( | ||
| [ | ||
| ("system", system_prompt), | ||
| ( | ||
| "user", | ||
| "Language:{language}\n\nQuestion:{user_input}\n\n Reason for unsafe or out of scope: {reason_unsafe}", | ||
| ), | ||
| ] | ||
| ) | ||
| | ||
| return prompt \| llm | ||
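
As a rough illustration of the three inputs this prompt expects, the sketch below fills the user-message template from the diff with plain `str.format`. That call only stands in for `ChatPromptTemplate`, so the example has no dependencies. The helper name `render_user_message` and the sample values are hypothetical and do not come from the PR.

```python
# Template copied from the "user" message in the diff.
USER_TEMPLATE = (
    "Language:{language}\n\nQuestion:{user_input}\n\n"
    " Reason for unsafe or out of scope: {reason_unsafe}"
)


def render_user_message(language: str, user_input: str, reason_unsafe: str) -> str:
    """Fill the three placeholders of the user message (hypothetical helper)."""
    return USER_TEMPLATE.format(
        language=language,
        user_input=user_input,
        reason_unsafe=reason_unsafe,
    )


# Placeholder values; in the PR these come from the preprocessing state.
message = render_user_message(
    language="en",
    user_input="<question flagged by the safety checker>",
    reason_unsafe="<reason produced by the safety checker>",
)
print(message)
```

All three keys have to be supplied when the chain is invoked; a missing key would leave a placeholder unfilled and raise an error at format time.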
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| """ | ||
| Preprocessing utilities with reusable workflows and state definitions. | ||
| """ | ||
| | ||
| from .state import PreprocessingState  # noqa: F401 | ||
| from .workflow import create_preprocessing_workflow  # noqa: F401 | ||

heliamoh marked this conversation as resolved (outdated).
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| from typing import TypedDict | ||
| | ||
| from langchain_core.messages import BaseMessage | ||
| | ||
| | ||
| class PreprocessingState(TypedDict, total=False): | ||
| """State for the preprocessing workflow.""" | ||
| | ||
| # Inputs | ||
| user_input: str | ||
| chat_history: list[BaseMessage] | ||
| | ||
| # Task outputs | ||
| rephrased_input: str | ||
| safety: str | ||
| reason_unsafe: str | ||
| expanded_queries: list[str] | ||
| detected_language: str | ||
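
Because `PreprocessingState` is declared with `total=False`, every key is optional, so a node can return a dict containing only the keys it produced. The sketch below shows that with a trimmed copy of the class; `chat_history` is left out so the example runs without `langchain_core`. The `check_safety` node, its keyword rule, and the `"true"`/`"false"` tag values are hypothetical placeholders, since the real checker and tag format are not shown in this diff.

```python
from typing import TypedDict


class PreprocessingState(TypedDict, total=False):
    """Trimmed copy of the state in the diff; chat_history is omitted here."""

    user_input: str
    rephrased_input: str
    safety: str
    reason_unsafe: str


def check_safety(state: PreprocessingState) -> PreprocessingState:
    """Hypothetical node: returns only the keys it sets, not the whole state."""
    # Placeholder rule for illustration only; the real checker is not shown here.
    if "forbidden" in state.get("user_input", ""):
        return {"safety": "false", "reason_unsafe": "placeholder reason"}
    return {"safety": "true"}


partial = check_safety({"user_input": "What does TP53 do?"})
print(partial)
```

A type checker accepts both return statements because no key is required.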
Does adding `**state` in spots like this solve some issue with updating the state?
The `BaseState(**state, …)` merge isn’t there to work around a bug; it’s there because LangGraph hands us an evolving state dict, and we don’t want to drop any of the fields that preprocessing or earlier nodes already wrote (rephrased input, safety tag, chat history, etc.). Postprocess only needs to add `additional_content`, so merging `**state` with the new field preserves the existing state while layering on the search results. If we just returned `BaseState(additional_content=...)`, we’d lose everything else that was already in the state and downstream nodes would break.
LangGraph implicitly updates the state using the returned dict without dropping omitted fields, so we shouldn't need to do this.
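
The disagreement in this thread comes down to how a node's return value is folded into the graph state. The sketch below is a plain-Python stand-in for that merge, not LangGraph's actual implementation. `apply_update` is a hypothetical helper that overwrites only the keys a node returns, which is the behavior the last comment describes. Under that assumption, returning just `additional_content` and returning `{**state, ...}` give the same final state.

```python
from typing import Any


def apply_update(state: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for the merge: returned keys overwrite, others are kept."""
    return {**state, **update}


def postprocess_partial(state: dict[str, Any]) -> dict[str, Any]:
    # Return only the new field.
    return {"additional_content": "search results"}


def postprocess_full(state: dict[str, Any]) -> dict[str, Any]:
    # Re-emit the whole state plus the new field, as the PR does.
    return {**state, "additional_content": "search results"}


state = {"user_input": "q", "rephrased_input": "q2", "safety": "true"}
after_partial = apply_update(state, postprocess_partial(state))
after_full = apply_update(state, postprocess_full(state))
print(after_partial == after_full)
```

If a state key uses an accumulating reducer (for example, a list that appends), re-emitting the full state can behave differently from a partial return, which is one more reason to prefer returning only the changed keys.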