📋 CheckList
🐛 Bug Description
In the FailRetry execution mode, the system is designed to load historical results from previous runs and skip evaluators that have already successfully completed.
However, two critical defects have been identified that cause this mechanism to fail completely.
Defect 1: Incorrect Historical Result ID Mapping (Root Cause)
1.1 Method Context & Responsibility
- Method: GetExptItemTurnResults
- Location: backend/modules/evaluation/domain/service/expt_result_impl.go
- Caller: ExptRecordEvalModeFailRetry.PreEval (in expt_run_item_event_impl.go), called during the initialization phase of FailRetry mode.
- Business Goal: To fetch detailed results (Turn Results) of the Item's previous run from the database. These results are converted into a RunLog and stored in the expt_turn_result_run_log table, serving as the "historical memory" for the Worker execution.
1.2 Method Signature & Data Structure
- Input: exptID, itemID (identifying the specific experiment and data item)
- Output: []*entity.ExptTurnResult
- This structure contains a critical field, EvaluatorResults (Type: *entity.EvaluatorResults).
- Internally, it holds a Map: EvalVerIDToResID map[int64]int64 (Expected Key: Evaluator Version ID -> Value: Evaluator Record ID).
1.3 Defect Analysis
When assembling the EvaluatorResults Map, the code incorrectly assigns EvaluatorVersionID as the Value, whereas it should assign EvaluatorResultID.
Code Snippet (The Bug):
// backend/modules/evaluation/domain/service/expt_result_impl.go
// refs are records fetched from the intermediate table expt_turn_evaluator_result_ref
// Contains: {ExptTurnResultID, EvaluatorVersionID, EvaluatorResultID}
for _, ref := range refs {
    // ...
    // [CRITICAL BUG]
    // Expected: Value = ref.EvaluatorResultID (Primary Key of evaluator_record table, e.g., 748392...)
    // Actual: Value = ref.EvaluatorVersionID (Config ID of the evaluator, e.g., 1001)
    turnEvaluatorVerIDToResultID[ref.ExptTurnResultID][ref.EvaluatorVersionID] = ref.EvaluatorVersionID
}
1.4 Downstream Chain Reaction (Consequences)
This error triggers a domino effect of failures:
- Persistence Phase (PreEval):
  - The PreEval method calls ToRunLogDO(), serializing this incorrect Map {VerID: VerID} into JSON.
  - This JSON is stored in the evaluator_result_ids field of the expt_turn_result_run_log table.
  - DB Actual: {"EvalVerIDToResID":{"1001":1001}} (Incorrect, points to VersionID)
  - DB Expected: {"EvalVerIDToResID":{"1001":74839201}} (Correct, points to RecordID)
- Worker Loading Phase (buildExptTurnEvalCtx):
  - When the Worker (ExptItemEvalCtxExecutor) starts, it reads the Log and parses out {1001: 1001}.
  - It calls evaluatorRecordService.BatchGetEvaluatorRecord(ids=[1001]).
  - Failure: The system attempts to find a record with ID=1001 in the evaluator_record table. Since 1001 is a VersionID and not a RecordID, the query returns empty.
- Execution Decision Phase (CallEvaluators):
  - The Worker finds no historical records loaded in memory.
  - The check if existResult != nil evaluates to false.
  - Result: The Worker assumes the evaluator has never run and initiates a new RPC call, causing duplicate billing.
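The chain above can be reproduced in isolation. The following is a minimal sketch, not the project's code: `ref`, `runLogPayload`, `buildMap`, `serialize`, `countFound`, and the in-memory `recordTable` are hypothetical stand-ins for the expt_turn_evaluator_result_ref rows, the serialized run log, and the evaluator_record table. Only the field names and the EvalVerIDToResID JSON key are taken from the report.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ref is a hypothetical stand-in for one row of expt_turn_evaluator_result_ref.
type ref struct {
	ExptTurnResultID   int64
	EvaluatorVersionID int64
	EvaluatorResultID  int64
}

// runLogPayload mirrors the JSON shape quoted in the report (assumed layout).
type runLogPayload struct {
	EvalVerIDToResID map[int64]int64
}

// buildMap assembles version ID -> result ID. With buggy=true it reproduces
// the faulty assignment that stores the version ID as the value.
func buildMap(refs []ref, buggy bool) map[int64]int64 {
	m := make(map[int64]int64, len(refs))
	for _, r := range refs {
		if buggy {
			m[r.EvaluatorVersionID] = r.EvaluatorVersionID
		} else {
			m[r.EvaluatorVersionID] = r.EvaluatorResultID
		}
	}
	return m
}

// serialize returns the JSON that would be persisted in the run log.
func serialize(m map[int64]int64) string {
	b, err := json.Marshal(runLogPayload{EvalVerIDToResID: m})
	if err != nil {
		panic(err)
	}
	return string(b)
}

// countFound counts how many stored IDs exist in a fake evaluator_record table.
func countFound(m map[int64]int64, recordTable map[int64]bool) int {
	n := 0
	for _, id := range m {
		if recordTable[id] {
			n++
		}
	}
	return n
}

func main() {
	refs := []ref{{ExptTurnResultID: 1, EvaluatorVersionID: 1001, EvaluatorResultID: 74839201}}
	recordTable := map[int64]bool{74839201: true}

	for _, buggy := range []bool{true, false} {
		m := buildMap(refs, buggy)
		fmt.Println(serialize(m), "records found:", countFound(m, recordTable))
	}
}
```

With the buggy assignment the lookup finds zero records, which is the empty query result described in the Worker Loading Phase; with the corrected assignment the single historical record is found.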
Defect 2: Worker Context Cache Key Mismatch (Secondary Issue)
2.1 Problem Description
Even if Defect 1 is fixed (ensuring the database stores the correct RecordID), the Worker's internal context caching mechanism has a logic flaw where the Write Key and Read Key do not match, causing cache lookups to fail.
2.2 Key Objects & Variables
- Execution Context (Worker Context): etec (Type: *entity.ExptTurnEvalCtx)
- Cache Map Variable: etec.ExptTurnRunResult.EvaluatorResults (Type: map[int64]*entity.EvaluatorRecord)
- Responsibility: Caches loaded historical evaluator records in the Worker's memory for fast lookup in subsequent steps to avoid duplicate calls.
2.3 Logic Conflict Analysis
Writer Side (Map Construction)
- Location: backend/modules/evaluation/domain/service/expt_run_item_impl.go
- Method: buildExptTurnEvalCtx
- Timing: Runs before the Worker starts processing a Turn; it is responsible for building the execution context.
- Behavior: Upon successfully fetching evaluatorRecords from the DB, it builds the Map.
- Flawed Logic:
  // Uses record.ID (RecordID) as the Map Key
  recordMap[record.ID] = record
  etec.ExptTurnRunResult.EvaluatorResults = recordMap
- Result: The in-memory Map structure is {74839201: RecordObject} (Key is RecordID).
Reader Side (Map Lookup)
- Location: backend/modules/evaluation/domain/service/expt_run_item_turn_impl.go
- Method: CallEvaluators (calls GetEvaluatorRecord)
- Timing: Runs after the Worker finishes calling the LLM, as it prepares to execute evaluators one by one.
- Behavior: Checks the cache to decide whether to skip an evaluator.
- Flawed Logic:
  // Uses evaluatorVersion.GetEvaluatorVersionID() (VersionID) as the Key
  existResult := etec.ExptTurnRunResult.GetEvaluatorRecord(evaluatorVersion.GetEvaluatorVersionID())
  (Underlying implementation: return e.EvaluatorResults[evaluatorVersionID])
- Lookup Action: Attempts to find Key 1001 in the Map.
2.4 Consequence: Cache Never Hits
This is a classic Key Mismatch.
- The in-memory Map stores {74839201: Record}.
- The code tries to retrieve Map[1001].
- Result: Returns nil.
- Business Impact: The Worker misjudges the evaluator as "not executed" and re-executes it, rendering the checkpoint recovery mechanism ineffective.
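The mismatch can be illustrated with a small, self-contained sketch. The types below are simplified stand-ins, not the real entity definitions: `turnRunResult`, `buildCache`, `byRecordID`, and `byVersionID` are hypothetical names, and `EvaluatorRecord` keeps only the two fields this report relies on.

```go
package main

import "fmt"

// EvaluatorRecord is a simplified stand-in for entity.EvaluatorRecord.
type EvaluatorRecord struct {
	ID                 int64 // RecordID (primary key)
	EvaluatorVersionID int64 // VersionID of the evaluator that produced it
}

// turnRunResult is a hypothetical stand-in for etec.ExptTurnRunResult.
type turnRunResult struct {
	EvaluatorResults map[int64]*EvaluatorRecord
}

// GetEvaluatorRecord follows the lookup quoted in the report: index by VersionID.
func (t *turnRunResult) GetEvaluatorRecord(evaluatorVersionID int64) *EvaluatorRecord {
	return t.EvaluatorResults[evaluatorVersionID]
}

// buildCache keys the map with whatever keyOf returns for each record.
func buildCache(records []*EvaluatorRecord, keyOf func(*EvaluatorRecord) int64) *turnRunResult {
	m := make(map[int64]*EvaluatorRecord, len(records))
	for _, r := range records {
		m[keyOf(r)] = r
	}
	return &turnRunResult{EvaluatorResults: m}
}

func byRecordID(r *EvaluatorRecord) int64  { return r.ID }
func byVersionID(r *EvaluatorRecord) int64 { return r.EvaluatorVersionID }

func main() {
	records := []*EvaluatorRecord{{ID: 74839201, EvaluatorVersionID: 1001}}

	// Writer keyed by RecordID, reader asks for VersionID: always a miss.
	asIs := buildCache(records, byRecordID)
	fmt.Println("keyed by RecordID, hit:", asIs.GetEvaluatorRecord(1001) != nil)

	// Writer keyed by VersionID: the same lookup now hits.
	aligned := buildCache(records, byVersionID)
	fmt.Println("keyed by VersionID, hit:", aligned.GetEvaluatorRecord(1001) != nil)
}
```

Either side could be changed to make the keys agree; keying the writer by VersionID keeps the existing GetEvaluatorRecord call sites untouched.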
🔄 Steps to Reproduce
This issue can be reliably reproduced by observing the logs and database state during a retry operation.
- Prepare an Experiment: Create an experiment with at least one evaluator (e.g., an LLM-based evaluator).
- Run & Fail: Run the experiment and ensure it fails after the evaluator has successfully run (e.g., simulate a timeout or error in a subsequent step, or manually interrupt it). Ensure the evaluator_record table has a successful record for this run.
- Trigger Retry: Restart the experiment in FailRetry mode.
- Observe Logs & DB:
  - Check DB: Inspect the expt_turn_result_run_log table for the new run. The evaluator_result_ids JSON field will show {"1001": 1001} (where 1001 is the VersionID), instead of {"1001": 74839...} (where 74839... is the RecordID).
  - Check Worker Logs: The Worker logs will show it calling the LLM/Evaluator again, instead of logging "skip evaluator".
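The database check can be partly automated. The sketch below is a heuristic only: it assumes the evaluator_result_ids field uses the {"EvalVerIDToResID": {...}} wrapper shown in section 1.4, and it flags any entry whose value equals its key. `runLogPayload` and `suspectVersionIDs` are hypothetical names, and a matching key and value is a strong hint of the bug rather than proof.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// runLogPayload mirrors the JSON layout quoted in the report (assumed shape).
type runLogPayload struct {
	EvalVerIDToResID map[int64]int64
}

// suspectVersionIDs returns, in ascending order, the version IDs whose stored
// value equals the key, i.e. entries that look like {VerID: VerID}.
func suspectVersionIDs(raw string) ([]int64, error) {
	var p runLogPayload
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		return nil, fmt.Errorf("decode evaluator_result_ids: %w", err)
	}
	var suspects []int64
	for verID, resID := range p.EvalVerIDToResID {
		if verID == resID {
			suspects = append(suspects, verID)
		}
	}
	sort.Slice(suspects, func(i, j int) bool { return suspects[i] < suspects[j] })
	return suspects, nil
}

func main() {
	for _, raw := range []string{
		`{"EvalVerIDToResID":{"1001":1001}}`,
		`{"EvalVerIDToResID":{"1001":74839201}}`,
	} {
		suspects, err := suspectVersionIDs(raw)
		fmt.Println(raw, "suspects:", suspects, "err:", err)
	}
}
```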
✅ Expected Behavior
- Database: The expt_turn_result_run_log table should store a correct mapping of EvaluatorVersionID -> EvaluatorResultID (e.g., {"1001": 74839201}).
- Worker Logic:
  - The Worker should successfully load the EvaluatorRecord using the ID from the log.
  - The Worker should populate its internal cache map using EvaluatorVersionID as the key.
  - When iterating through evaluators, the GetEvaluatorRecord(VersionID) check should return the cached record.
  - The evaluator execution should be skipped.
❌ Actual Behavior
- Database: The expt_turn_result_run_log table stores an incorrect mapping of EvaluatorVersionID -> EvaluatorVersionID (e.g., {"1001": 1001}). Note that this stored value is eventually corrected: evaluator_result_ids ends up holding the right IDs once the Worker re-calls the LLM and the new results are written back to the database. In other words, the bad mapping self-heals, but only through the redundant re-execution.
- Worker Logic:
  - The Worker attempts to query evaluator_record with ID 1001 and finds nothing.
  - Even if the ID were correct, the Worker builds its cache map using RecordID as the key (Map[74839201] = Record).
  - The lookup logic tries to find Map[1001], which returns nil.
  - The system proceeds to re-execute the evaluator.
🚨 Severity
Low - Cosmetic issue or minor inconvenience
🔧 Component
None
💻 Environment
No response
🔧 Go Environment
No response
📋 Logs
📝 Additional Context
To fully resolve this issue, fixes must be applied to both the data source generation and the Worker's cache logic.
Fix 1: Correct Data Mapping in Domain Service
File: backend/modules/evaluation/domain/service/expt_result_impl.go
Method: GetExptItemTurnResults
Change the assignment to use the correct EvaluatorResultID:
// Before
turnEvaluatorVerIDToResultID[ref.ExptTurnResultID][ref.EvaluatorVersionID] = ref.EvaluatorVersionID
// After (Fix)
turnEvaluatorVerIDToResultID[ref.ExptTurnResultID][ref.EvaluatorVersionID] = ref.EvaluatorResultID
Fix 2: Align Worker Cache Key
File: backend/modules/evaluation/domain/service/expt_run_item_impl.go
Method: buildExptTurnEvalCtx
Change the map key to EvaluatorVersionID to match the lookup logic in GetEvaluatorRecord:
// Before
recordMap[record.ID] = record
// After (Fix)
recordMap[record.EvaluatorVersionID] = record
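
To show why neither fix is sufficient alone, the retry path can be simulated with each fix toggled independently. This is a simplified model under stated assumptions, not the service code: `ref`, `record`, `skipsEvaluator`, and the in-memory `table` are hypothetical, and the IDs are the example values used throughout this report.

```go
package main

import "fmt"

// ref and record are hypothetical, trimmed-down stand-ins for the ref-table
// row and the evaluator record.
type ref struct{ EvaluatorVersionID, EvaluatorResultID int64 }
type record struct{ ID, EvaluatorVersionID int64 }

// skipsEvaluator simulates the retry path and reports whether the evaluator
// would be skipped. fix1 toggles the mapping fix, fix2 the cache-key fix.
func skipsEvaluator(fix1, fix2 bool) bool {
	const verID, resID = int64(1001), int64(74839201)
	refs := []ref{{verID, resID}}
	table := map[int64]record{resID: {ID: resID, EvaluatorVersionID: verID}}

	// Step 1: assemble version ID -> result ID (Fix 1 chooses the value).
	verToRes := map[int64]int64{}
	for _, r := range refs {
		if fix1 {
			verToRes[r.EvaluatorVersionID] = r.EvaluatorResultID
		} else {
			verToRes[r.EvaluatorVersionID] = r.EvaluatorVersionID
		}
	}

	// Step 2: load records by the stored IDs and build the Worker cache
	// (Fix 2 chooses the key).
	cache := map[int64]record{}
	for _, id := range verToRes {
		rec, ok := table[id]
		if !ok {
			continue // the stored ID does not exist in the fake record table
		}
		if fix2 {
			cache[rec.EvaluatorVersionID] = rec
		} else {
			cache[rec.ID] = rec
		}
	}

	// Step 3: the reader looks up by version ID.
	_, hit := cache[verID]
	return hit
}

func main() {
	for _, fix1 := range []bool{false, true} {
		for _, fix2 := range []bool{false, true} {
			fmt.Printf("fix1=%v fix2=%v skip=%v\n", fix1, fix2, skipsEvaluator(fix1, fix2))
		}
	}
}
```

In this model the evaluator is skipped only when both toggles are on, which matches the claim that both the data source and the cache key need to change.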
I have accurately identified and fixed the issue, and the test results meet expectations. Please assign this task to me, and I will submit a PR to resolve it.