Commit f30911e

Complete fix multi language support
1 parent ea8c944 commit f30911e

4 files changed

Lines changed: 59 additions & 22 deletions

src/agent/profiles/react_to_me.py

Lines changed: 20 additions & 2 deletions
@@ -73,9 +73,27 @@ async def generate_unsafe_response(
     async def call_model(
         self, state: ReactToMeState, config: RunnableConfig
     ) -> ReactToMeState:
+        # Build the query, injecting a language instruction for non-English users.
+        # Retrieval is always done in English (embeddings are English), but the
+        # final response must be in the user's detected language.
+        query = state["rephrased_input"]
+        detected_language = state.get("detected_language", "English")
+
+        if detected_language.lower() != "english":
+            query = (
+                f"{query}\n\n"
+                f"[CRITICAL INSTRUCTION: You MUST write your entire response in "
+                f"{detected_language}. The retrieved context is in English because "
+                f"the Reactome database is English-only, but your answer to the user "
+                f"MUST be entirely in {detected_language}. "
+                f"Keep all gene symbols, protein names, pathway identifiers, "
+                f"Reactome IDs (e.g. R-HSA-*), and URLs in their original English "
+                f"form — do NOT translate scientific nomenclature or citation links.]"
+            )
+
         result: dict[str, Any] = await self.reactome_rag.ainvoke(
             {
-                "input": state["rephrased_input"],
+                "input": query,
                 "chat_history": (
                     state["chat_history"]
                     if state["chat_history"]
@@ -97,4 +115,4 @@ def create_reactome_graph(
     llm: BaseChatModel,
     embedding: Embeddings,
 ) -> StateGraph:
-    return ReactToMeGraphBuilder(llm, embedding).uncompiled_graph
+    return ReactToMeGraphBuilder(llm, embedding).uncompiled_graph
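
Review note: the language-injection logic added to `call_model` could be pulled into a small pure function so it can be unit-tested without the graph. The following is only a sketch under assumptions: `build_query` is a hypothetical name that does not appear in this commit, and the instruction text is abbreviated from the one in the diff.

```python
def build_query(rephrased_input: str, detected_language: str = "English") -> str:
    """Return the retrieval query, appending a response-language note if needed.

    Hypothetical helper (not part of the commit). It mirrors the check in the
    diff: the English query is left untouched, any other language gets a
    bracketed instruction appended after a blank line.
    """
    if detected_language.lower() == "english":
        return rephrased_input
    return (
        f"{rephrased_input}\n\n"
        # Abbreviated version of the instruction used in the commit.
        f"[CRITICAL INSTRUCTION: You MUST write your entire response in "
        f"{detected_language}. Keep gene symbols, Reactome IDs (e.g. R-HSA-*), "
        f"and URLs in their original English form.]"
    )
```

With a helper like this, `call_model` would pass `build_query(state["rephrased_input"], state.get("detected_language", "English"))` as the `"input"` value, which is the same behaviour as the inline version above.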

src/agent/tasks/cross_database/summarize_reactome_uniprot.py

Lines changed: 19 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -4,18 +4,18 @@
44
from langchain_core.runnables import Runnable
55

66
summarization_message = """
7-
You are an expert in molecular biology with significant experience as a curator for the UniProt Database adn the Reactome Pathway Knowledgebase.
8-
Your task is to answer user's question in a clear, accurate, and comprehensive and engaging manner based strictly on the context provided from the UniProt and Reactome Pathway Knowledgebases.
7+
You are an expert in molecular biology with significant experience as a curator for the UniProt Database and the Reactome Pathway Knowledgebase.
8+
Your task is to answer the user's question in a clear, accurate, comprehensive, and engaging manner based strictly on the context provided from the UniProt and Reactome Pathway Knowledgebases.
99
1010
Instructions:
1111
1. Provide answers **strictly based on the given context from the Reactome and UniProt Knowledgebase**. Do **not** use or infer information from any external sources.
1212
2. If the answer cannot be derived from the context provided, do **not** answer the question; instead explain that the information is not currently available in Reactome or UniProt.
13-
3. Extract Key Insights: Identify the most relevant and accurate details from both databases; Focus on points that directly address the users question.
14-
4. Merge Information: Combine overlapping infromation concisely while retining key biological terms terminology (e.g., gene names, protein names, pathway names, disease involvement, etc.)
13+
3. Extract Key Insights: Identify the most relevant and accurate details from both databases; Focus on points that directly address the user's question.
14+
4. Merge Information: Combine overlapping information concisely while retaining key biological terminology (e.g., gene names, protein names, pathway names, disease involvement, etc.)
1515
5. Ensure Clarity & Accuracy:
16-
- The response should be well-structured, factually correct, and directly answer the users question.
16+
- The response should be well-structured, factually correct, and directly answer the user's question.
1717
- Use clear language and logical transitions so the reader can easily follow the discussion.
18-
4. Include all Citations From Sources:
18+
6. Include all Citations From Sources:
1919
- Collect and present **all** relevant citations (links) provided to you.
2020
- Incorporate or list these citations clearly so the user can trace the information back to each respective database.
2121
- Example:
@@ -26,17 +26,25 @@
 - <a href="https://www.uniprot.org/uniprotkb/Q92908">GATA6</a>
 - <a href="https://www.uniprot.org/uniprotkb/O00482">NR5A2</a>
 
-5. Answer in the Language requested.
-6. Write in a conversational and engaging tone suitable for a chatbot.
-6. Use clear, concise language to make complex topics accessible to a wide audience.
+7. **LANGUAGE (CRITICAL)**: You MUST write your entire response in the language specified below.
+- The context from Reactome and UniProt is in English because the databases are English-only.
+- However, your response MUST be entirely in the requested language.
+- Preserve ALL scientific terminology in English: gene names, protein names, pathway names,
+Reactome IDs (R-HSA-*), UniProt IDs, and URLs must remain in their original English form.
+- Only translate the explanatory narrative text.
+8. Write in a conversational and engaging tone suitable for a chatbot.
+9. Use clear, concise language to make complex topics accessible to a wide audience.
 """
 
 summarizer_prompt = ChatPromptTemplate.from_messages(
     [
         ("system", summarization_message),
         (
             "human",
-            "User question: {input} \n\n Language: {detected_language} \n\n Reactome-drived information: \n {reactome_answer} \n\n UniProt-drived infromation: \n {uniprot_answer}.",
+            "User question: {input} \n\n "
+            "Response Language: {detected_language} \n\n "
+            "Reactome-derived information: \n {reactome_answer} \n\n "
+            "UniProt-derived information: \n {uniprot_answer}.",
         ),
     ]
 )
@@ -49,4 +57,4 @@ def create_reactome_uniprot_summarizer(
     llm = llm.model_copy(update={"streaming": True})
     return (summarizer_prompt | llm | StrOutputParser()).with_config(
         run_name="summarize_answer"
-    )
+    )
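
Review note: the rewritten human message relies on Python's implicit concatenation of adjacent string literals, so the four lines form a single template with the same four placeholders as before. A minimal sketch of how that template fills in, using plain `str.format` rather than `ChatPromptTemplate` so it runs without LangChain; `HUMAN_TEMPLATE` and `render_human_message` are hypothetical names used only for illustration:

```python
# Adjacent string literals are joined at compile time into one template,
# exactly as in the "human" tuple of the diff above.
HUMAN_TEMPLATE = (
    "User question: {input} \n\n "
    "Response Language: {detected_language} \n\n "
    "Reactome-derived information: \n {reactome_answer} \n\n "
    "UniProt-derived information: \n {uniprot_answer}."
)


def render_human_message(
    question: str,
    detected_language: str,
    reactome_answer: str,
    uniprot_answer: str,
) -> str:
    """Fill the template the way an f-string-style prompt template would."""
    return HUMAN_TEMPLATE.format(
        input=question,
        detected_language=detected_language,
        reactome_answer=reactome_answer,
        uniprot_answer=uniprot_answer,
    )
```

Rendering the message this way in a unit test is a cheap guard against a missing placeholder or a dropped separator when the string is split across lines.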

src/agent/tasks/rephrase.py

Lines changed: 17 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -4,15 +4,25 @@
44
from langchain_core.runnables import Runnable
55

66
contextualize_q_system_prompt = """
7-
You are an expert in question formulation with deep expertise in molecular biology and experience as a Reactome curator. Your task is to analyze the conversation history and the user’s latest query to fully understand their intent and what they seek to learn.
8-
If the user's question is not in English, reformulate the question and translate it to English, ensuring the meaning and intent are preserved.
9-
Reformulate the user’s question into a standalone version that retains its full meaning without requiring prior context. The reformulated question should be:
7+
You are an expert in question formulation with deep expertise in molecular biology and experience as a Reactome curator. Your task is to analyze the conversation history and the user's latest query to fully understand their intent and what they seek to learn.
8+
9+
## Cross-Lingual Strategy
10+
The Reactome and UniProt databases are indexed entirely in English. To maximize retrieval quality,
11+
the reformulated question MUST always be in English regardless of the user's input language.
12+
The downstream generation step handles translating the response back to the user's language.
13+
14+
If the user's question is not in English, translate it to English while preserving:
15+
- The exact biological intent and meaning
16+
- All gene symbols, protein names, and identifiers in their original form
17+
- The specificity of the question (do not generalize)
18+
19+
Reformulate the user's question into a standalone version that retains its full meaning without requiring prior context. The reformulated question should be:
1020
- Clear, concise, and precise
1121
- Optimized for both vector search (semantic meaning) and case-sensitive keyword search
12-
- Faithful to the users intent and scientific accuracy
22+
- Faithful to the user's intent and scientific accuracy
1323
14-
the returned question should always be in English.
15-
If the users question is already in English, self-contained and well-formed, return it as is.
24+
The returned question MUST always be in English.
25+
If the user's question is already in English, self-contained and well-formed, return it as is.
1626
Do NOT answer the question or provide explanations.
1727
"""
1828

@@ -28,4 +38,4 @@
 def create_rephrase_chain(llm: BaseChatModel) -> Runnable:
     return (contextualize_q_prompt | llm | StrOutputParser()).with_config(
         run_name="rephrase_question"
-    )
+    )
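
Review note: the new prompt asks the model to keep identifiers in their original form while translating the question to English. One way to check this in a test is to compare the identifiers found before and after rephrasing. The sketch below is an assumption, not part of the commit: `reactome_ids` and `ids_preserved` are hypothetical helpers, and the pattern only covers Reactome-style stable IDs such as `R-HSA-109581`, not gene symbols or UniProt accessions.

```python
import re

# Matches Reactome-style stable identifiers, e.g. R-HSA-109581.
# Assumption: a three-letter uppercase species code followed by digits.
REACTOME_ID = re.compile(r"\bR-[A-Z]{3}-\d+\b")


def reactome_ids(text: str) -> set[str]:
    """Return the set of Reactome-style IDs mentioned in the text."""
    return set(REACTOME_ID.findall(text))


def ids_preserved(original: str, rephrased: str) -> bool:
    """True if every ID in the original question survives in the rephrased one."""
    return reactome_ids(original) <= reactome_ids(rephrased)
```

A check like this only catches dropped or altered IDs; whether the biological intent survived translation still needs human or model-based review.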

src/retrievers/reactome/prompt.py

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@
44
You are an expert in molecular biology with access to the **Reactome Knowledgebase**.
55
Your primary responsibility is to answer the user's questions **comprehensively, mechanistically, and with precision**, drawing strictly from the **Reactome Knowledgebase**.
66
7-
Your output must emphasize biological processes, molecular complexes, regulatory mechanisms, and interactions most relevant to the users question.
7+
Your output must emphasize biological processes, molecular complexes, regulatory mechanisms, and interactions most relevant to the user's question.
88
Provide an information-rich narrative that explains not only what is happening but also how and why, based only on Reactome context.
99
1010
@@ -24,6 +24,7 @@
 - Examples:
 - <a href="https://reactome.org/content/detail/R-HSA-109581">Apoptosis</a>
 - <a href="https://reactome.org/content/detail/R-HSA-1640170">Cell Cycle</a>
+6. **Language**: If the user's question contains a language instruction (e.g., "[CRITICAL INSTRUCTION: ... in French]"), you MUST respond in that language. Preserve all gene symbols, protein names, Reactome IDs, and URLs in their original English form — only translate the explanatory text.
 
 ## Internal QA (silent)
 - All factual claims are cited correctly.
@@ -37,4 +38,4 @@
         MessagesPlaceholder(variable_name="chat_history"),
         ("user", "Context:\n{context}\n\nQuestion: {input}"),
     ]
-)
+)
