[DEPRIORITIZED][AAQ-765] Retry LLM generation when AlignScore fails by lickem22 · Pull Request #399 · IDinsight/ask-a-question

lickem22 · 2024-08-19T13:08:10Z

Reviewer: @amiraliemami
Estimate: 40mins

Ticket

Description

Goal

The goal of this PR is to retry LLM response generation up to N times when AlignScore fails because of a low score (default is 0).

Changes

The following changes have been made:

Add backoff library dependency
Updated the endpoint to retry when the response is of type QueryResponseError and the error is a low alignment score. The previous failure reason is also added to the response in response.debug_info["past_failure"]
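To make the two changes above concrete, here is a minimal, stdlib-only sketch of the intended flow. It is not the code in this PR: the dataclasses are simplified stand-ins for the project's QueryResponse and QueryResponseError, and LOW_ALIGNMENT, search_with_retries and the generate callable are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

LOW_ALIGNMENT = "ALIGNMENT_TOO_LOW"  # hypothetical error-type label


@dataclass
class QueryResponse:
    # Simplified stand-in for the project's response model.
    llm_response: str
    debug_info: Dict[str, Any] = field(default_factory=dict)


@dataclass
class QueryResponseError(QueryResponse):
    # Simplified stand-in for the project's error response model.
    error_type: str = ""
    error_message: str = ""


def is_unable_to_generate_response(response: QueryResponse) -> bool:
    """Retry predicate: true only for an error caused by low alignment."""
    return (
        isinstance(response, QueryResponseError)
        and response.error_type == LOW_ALIGNMENT
    )


def search_with_retries(
    generate: Callable[[], QueryResponse], n_retries: int = 0
) -> QueryResponse:
    """Call `generate` once, then up to `n_retries` more times while the
    predicate holds. The reason for the last failed attempt is stored in
    debug_info["past_failure"] of the response that is returned."""
    response = generate()
    past_failure = None
    for _ in range(n_retries):
        if not is_unable_to_generate_response(response):
            break
        # Assumes the failure reason lives where the review thread says it does.
        past_failure = response.debug_info["factual_consistency"]["reason"]
        response = generate()
    if past_failure is not None:
        response.debug_info["past_failure"] = past_failure
    return response
```

With n_retries=0 the function behaves exactly like a single call to generate, which matches the idea of making retries opt-in.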

Future Tasks (optional)

How has this been tested?

Testing this is tricky because, for this change to be observed, we need the LLM response to work but AlignScore to fail, and finding these cases is not straightforward.
It was tested in two ways.
The first way:

Set ALIGN_SCORE_THRESHOLD to an unrealistic score (for example 1.5). That way AlignScore always fails.
Run /search with generate_llm_response set to true, using a piece of content and a question relevant to that content. Example content: "Here we are going to talk about pineapples because of their pine shapes and the apple-like taste",
and example question: "Are apples related to pineapples?"
Make sure the LLM response is generated twice (by checking the logs) if ALIGN_SCORE_N_RETRIES is set to the default value (1).
Make sure debug_info["past_failure"] is in the returned response.
The second way: still set ALIGN_SCORE_THRESHOLD to a value >1, but add logic in the code that reduces ALIGN_SCORE_THRESHOLD to a reasonable value every time the LLM response is regenerated, so that the AlignScore check passes on the retry. In the second run, ALIGN_SCORE_THRESHOLD should therefore be less than 0.8. This approach is not straightforward. I am open to more efficient ways of testing this feature.
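Both manual checks above could be approximated without a live LLM or AlignScore service. The following is only a sketch under assumptions: run_with_alignment_check, fake_generate and fake_score are hypothetical helpers, the fixed score of 0.9 is invented for illustration, and the ">=" comparison is a guess at how the threshold is applied.

```python
from typing import Callable, List, Optional, Tuple


def run_with_alignment_check(
    generate: Callable[[], str],
    align_score: Callable[[str], float],
    thresholds: List[float],
) -> Tuple[Optional[str], int]:
    """Generate one answer per threshold until one passes the check.

    thresholds[i] is the threshold applied on attempt i, so
    len(thresholds) equals 1 + the number of retries. Returns the accepted
    answer (or None if every attempt failed) and the number of generation
    calls that were made.
    """
    calls = 0
    for threshold in thresholds:
        answer = generate()
        calls += 1
        if align_score(answer) >= threshold:
            return answer, calls
    return None, calls


def fake_generate() -> str:
    # Stand-in for the LLM call.
    return "Pineapples are not closely related to apples."


def fake_score(answer: str) -> float:
    # Stand-in for AlignScore; a fixed, made-up value in [0, 1].
    return 0.9


# First way: an impossible threshold (1.5) on every attempt, one retry.
always_fails = run_with_alignment_check(fake_generate, fake_score, [1.5, 1.5])

# Second way: impossible threshold first, then a reasonable one (0.8).
passes_on_retry = run_with_alignment_check(fake_generate, fake_score, [1.5, 0.8])
```

Counting generation calls replaces the "check the logs" step, and passing a per-attempt threshold list avoids patching ALIGN_SCORE_THRESHOLD inside the application code.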

Checklist

Fill with x for completed.

My code follows the style guidelines of this project
I have reviewed my own code to ensure good quality
I have tested the functionality of my code to ensure it works as intended
I have resolved merge conflicts

(Delete any items below that are not relevant)

I have updated the automated tests
I have updated the scripts in scripts/
I have updated the requirements
I have updated the README file
I have updated affected documentation
I have added a blogpost in Latest Updates
I have updated the CI/CD scripts in .github/workflows/
I have updated the Terraform code

lickem22 · 2024-08-19T13:13:35Z

+            asession=asession,
+            exclude_archived=True,
+        )
+        response.debug_info["past_failure"] = failure_reason


Added this so that, if generation succeeds on the second try, we can still see why the first attempt failed.

lickem22 · 2024-08-19T13:14:03Z

    return response


+def is_unable_to_generate_response(response: QueryResponse) -> bool:


Added this function so that we retry only if that condition is met.

lickem22 · 2024-08-19T13:15:34Z

+
+
+@backoff.on_predicate(
+    backoff.expo,


What backoff.expo does is basically wait a little longer, exponentially, every time the function is rerun, just to handle the load better.

Should we just have a logic that retries once, instead of adding a config (num retries) we don't know if we'll use 🤔 ?

I guess it depends on how useful the approach is, since we haven't done any analysis to see how well it works.
But personally I think that, since it doesn't add a dependency (backoff is already used by litellm), and since the only code we would change to retry just once is the decorator and the config variable, the cost is pretty low, so we can just keep it.
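For readers unfamiliar with exponential backoff, the wait pattern discussed in this thread can be sketched with the standard library alone. This is an illustration of the idea, not the backoff package's implementation: expo_waits and total_wait are hypothetical helpers, and, as far as I know, the real library also randomizes each wait with jitter by default, which this sketch leaves out.

```python
import itertools
from typing import Iterator, Optional


def expo_waits(
    base: float = 2, factor: float = 1, max_value: Optional[float] = None
) -> Iterator[float]:
    """Yield factor * base**n for n = 0, 1, 2, ..., capped at max_value."""
    n = 0
    while True:
        wait = factor * base ** n
        if max_value is not None and wait > max_value:
            # Once the cap is reached, keep yielding the cap.
            yield max_value
        else:
            yield wait
            n += 1


def total_wait(n_retries: int, **kwargs: float) -> float:
    """Total seconds spent waiting across n_retries retries (no jitter)."""
    return sum(itertools.islice(expo_waits(**kwargs), n_retries))


first_four = list(itertools.islice(expo_waits(), 4))
```

With a single retry the added wait is only the first value of the sequence, which is one reason the choice between "retry once" and a configurable retry count mostly comes down to code clarity rather than latency.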

lickem22 · 2024-08-28T10:09:48Z

 You are a helpful question-answering AI. You understand user question and answer their \
 question using the REFERENCE TEXT below.
 """
+RETRY_PROMPT_SUFFIX = """


Added a suffix to the prompt to incorporate the failure reason.

suzinyou

Thanks Carlos! I think we need to think and validate the prompt a bit. Should we discuss in the next tech session?

suzinyou · 2024-08-30T05:13:57Z


    metadata = metadata or {}
-    prompt = RAG.prompt.format(context=context, original_language=original_language)
+    if "failure_reason" in metadata and metadata["failure_reason"]:


How about we create a new arg, "retry=False"?

The downsides are:

We would have to create it for all the parent functions, and

We need both is_retry and metadata["failure_reason"] to actually do a retry.

But I think it would be easier to understand the code, and we won't be hiding any unexpected actions! What do you think?

Something like

    if is_retry:
        if "failure_reason" not in metadata:
            raise ValueError("failure_reason is required for retry requests")
        prompt = RAG.retry_prompt.format(
            context=context,
            original_language=original_language,
            failure_reason=metadata["failure_reason"],
        )

My initial understanding was that we are using this to try out the functionality. What if we keep it like this while testing, and if it turns out to be something we want to keep, then we explicitly make it a feature by adding the is_retry parameter. What do you think?
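The explicit-flag variant proposed in this thread might look roughly like the sketch below. It is a self-contained illustration, not the project's code: BASE_PROMPT and the short RETRY_SUFFIX are placeholder strings, and build_prompt is a hypothetical function standing in for wherever the prompt is actually assembled.

```python
from typing import Any, Dict, Optional

# Placeholder templates; the real prompts live in the project's config.
BASE_PROMPT = (
    "Answer using the REFERENCE TEXT below.\n"
    "{context}\n"
    "Reply in {original_language}.\n"
)
RETRY_SUFFIX = 'A previous answer was rejected because: "{failure_reason}".\n'


def build_prompt(
    context: str,
    original_language: str,
    metadata: Optional[Dict[str, Any]] = None,
    is_retry: bool = False,
) -> str:
    """Build the prompt; the retry suffix is added only when is_retry is set."""
    metadata = metadata or {}
    prompt = BASE_PROMPT.format(
        context=context, original_language=original_language
    )
    if is_retry:
        # Fail loudly instead of silently falling back to the normal prompt.
        if not metadata.get("failure_reason"):
            raise ValueError("failure_reason is required for retry requests")
        prompt += RETRY_SUFFIX.format(failure_reason=metadata["failure_reason"])
    return prompt
```

The trade-off is the one already named above: the flag has to be threaded through every caller, but a stray failure_reason in metadata can no longer change the prompt unexpectedly.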

suzinyou · 2024-08-30T06:35:41Z

+RETRY_PROMPT_SUFFIX = """
+If the response above is not aligned with the question, please rectify this by \
+considering the following reason(s) for misalignment: "{failure_reason}". 
+Make necessary adjustments to ensure the answer is aligned with the question.
+"""


Right now, we are only passing failure_reason, which is response.debug_info["factual_consistency"]["reason"],
but we should also include the LLM response in this prompt.

Also, shouldn't the prompt define what we mean by alignment?

That makes sense. To be honest, I was just having a go at updating the prompt to take the output into consideration. I am not exactly an expert in prompt engineering. Should we discuss that in a tech session?
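As a starting point for that discussion, a suffix that includes the previous answer and spells out what "aligned" means could be sketched as follows. The wording is purely illustrative and unvalidated; RETRY_PROMPT_SUFFIX_WITH_ANSWER and format_retry_suffix are hypothetical names, and the definition of alignment as "supported by the REFERENCE TEXT" is an assumption based on the factual_consistency key mentioned above.

```python
# Hypothetical, unvalidated prompt wording for discussion only.
RETRY_PROMPT_SUFFIX_WITH_ANSWER = """
Your previous answer was:
"{previous_response}"

It was rejected because it is not aligned with the REFERENCE TEXT, meaning \
that it contains claims the REFERENCE TEXT does not support. \
The reason given was: "{failure_reason}".
Write a new answer that relies only on the REFERENCE TEXT.
"""


def format_retry_suffix(previous_response: str, failure_reason: str) -> str:
    """Fill the retry suffix with the rejected answer and the failure reason."""
    return RETRY_PROMPT_SUFFIX_WITH_ANSWER.format(
        previous_response=previous_response,
        failure_reason=failure_reason,
    )
```

Any wording along these lines would still need to be checked against real AlignScore failures before being adopted.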

suzinyou · 2024-08-30T10:36:17Z

+
+
+@backoff.on_predicate(
+    backoff.expo,


Should we just have a logic that retries once, instead of adding a config (num retries) we don't know if we'll use 🤔 ?

lickem22 added 6 commits August 15, 2024 16:11

First commit

0b40e2a

Merge branch 'main' into implement-retry

ffef8d5

Add retry logic

98536da

Add retry logic

4216376

Cleanup

7561d8c

Cleanup

a5a0ac9

lickem22 requested review from Tanmay-97, amiraliemami, markbotterill, sidravi1, suzinyou and tonyzhao6 as code owners August 19, 2024 13:08

lickem22 commented Aug 19, 2024

Comment thread core_backend/app/question_answer/routers.py

suzinyou reviewed Aug 26, 2024

Comment thread core_backend/app/question_answer/routers.py

Comment thread core_backend/app/question_answer/routers.py

Comment thread core_backend/requirements.txt Outdated

lickem22 commented Aug 28, 2024

suzinyou reviewed Aug 30, 2024

lickem22 force-pushed the implement-retry branch from 3858453 to a5a0ac9 Compare August 30, 2024 11:37

amiraliemami force-pushed the main branch from 2ac97e3 to d02746c Compare August 30, 2024 12:21

lickem22 changed the title ~~[AAQ-765] Retry LLM generation when AlignScore fails~~ [DEPRIORITIZED][AAQ-765] Retry LLM generation when AlignScore fails Nov 6, 2024

lickem22 force-pushed the implement-retry branch from 3858453 to a5a0ac9 Compare February 11, 2025 06:40



2 participants
