Fix eval monitors error messages #953

Open
menakaj wants to merge 1 commit into wso2:main from menakaj:eval-logging-fix

menakaj (Contributor) commented May 25, 2026

Purpose

Describe the problems, issues, or needs driving this feature/fix and include links to related issues in the following format: Resolves issue1, issue2, etc.

This PR improves the eval log messages by removing internal data, such as request URLs, from what is printed.

Goals

Describe the solutions that this feature/fix will introduce to resolve the problems described above

Approach

Describe how you are implementing the solutions. Include an animated GIF or screenshot if the change affects the UI (email documentation@wso2.com to review all UI text). Include a link to a Markdown file or Google doc if the feature write-up is too long to paste here.

User stories

Summary of user stories addressed by this change

Release note

Brief description of the new feature or bug fix as it will appear in the release notes

Documentation

Link(s) to product documentation that addresses the changes of this PR. If no doc impact, enter “N/A” plus brief explanation of why there’s no doc impact

Training

Link to the PR for changes to the training content in https://github.com/wso2/WSO2-Training, if applicable

Certification

Type “Sent” when you have provided new/updated certification questions, plus four answers for each question (correct answer highlighted in bold), based on this change. Certification questions/answers should be sent to certification@wso2.com and NOT pasted in this PR. If there is no impact on certification exams, type “N/A” and explain why.

Marketing

Link to drafts of marketing content that will describe and promote this feature, including product page changes, technical articles, blog posts, videos, etc., if applicable

Automation tests

  • Unit tests

    Code coverage information

  • Integration tests

    Details about the test cases and coverage

Security checks

Samples

Provide high-level details about the samples related to this feature

Related PRs

List any other related PRs

Migrations (if applicable)

Describe migration steps and platforms on which migration has been tested

Test environment

List all JDK versions, operating systems, databases, and browser/versions on which this feature/fix was tested

Learning

Describe the research phase and any blog posts, patterns, libraries, or add-ons you used to solve the problem.

Summary by CodeRabbit

  • Bug Fixes

    • Improved error messages for failed token requests with clearer HTTP status details.
    • Enhanced error logging across trace operations to provide consistent, informative feedback while protecting sensitive information from logs.
  • Refactor

    • Standardized error handling and reporting across authentication and trace-fetch operations for better consistency.
    • Streamlined monitor startup logging by removing non-essential details while preserving core monitoring information.

coderabbitai Bot (Contributor) commented May 25, 2026

Walkthrough

This PR enhances error handling and logging safety across evaluation components by introducing a request error sanitization utility, integrating it throughout trace fetching and runner error handlers, improving OAuth token error messages, and removing sensitive endpoint fields from monitor startup logs.

Changes

Safe Request Error Handling and Logging

| Layer | File(s) | Summary |
|---|---|---|
| Safe request error utility | libs/amp-evaluation/src/amp_evaluation/trace/fetcher.py | Introduces _safe_request_error(e) helper that converts requests exceptions into URL-free descriptions using HTTP status codes for HTTPError or exception class names as fallback. |
| Trace fetcher error sanitization | libs/amp-evaluation/src/amp_evaluation/trace/fetcher.py | fetch_traces and fetch_trace_by_id methods now use _safe_request_error() in exception logging instead of embedding raw exception objects, preventing URL leakage. |
| Runner error standardization | libs/amp-evaluation/src/amp_evaluation/runner.py | Runner module imports _safe_request_error and applies it to trace-fetch exception handling in Experiment (updating logs and error lists) and Monitor.run() contexts. |
| OAuth token and monitor logging improvements | evaluation-job/main.py | OAuth token manager wraps raise_for_status() call in try/except to provide normalized HTTP status error messages; monitor startup log removes previously included traces_api_endpoint field. |

Sequence Diagram(s)

(No sequence diagram generated — changes are primarily error-handling improvements and logging adjustments without complex multi-component interactions warranting visualization.)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Safe error paths now hide URLs away,
Exception messages brief and clean today,
No secrets leaked in logs we say,
Sanitized errors lead the way!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description provides only the Purpose section with substantive content ('improve eval log messages by removing internal data'). All other template sections contain placeholder headers but no meaningful details, making the description largely incomplete. | Complete the Goals, Approach, and other critical sections to explain the solutions implemented, how errors are sanitized (using _safe_request_error helper), testing performed, and security implications of the logging changes. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Fix eval monitors error messages' directly corresponds to the main changes: standardizing error messages in evaluation monitoring by removing sensitive internal data (URLs, endpoints) from logged exceptions. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@libs/amp-evaluation/src/amp_evaluation/runner.py`:
- Around line 711-713: The except ValueError block currently logs and appends
the raw exception (logger.warning("Cannot fetch traces: %s", e) and
errors.append(f"Trace fetching failed: {e}")), which can leak request URLs;
replace those uses to pass the exception through the existing sanitizer (call
_safe_request_error(e) or otherwise redact URL content) and log/append the
sanitized string instead (update the logger.warning call and the errors.append
call to use the sanitized result), referencing the except ValueError handler,
logger.warning, errors.append, and the _safe_request_error helper mentioned in
trace/fetcher.py.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f1f1096a-756f-4c73-99d7-187d03d25d6b

📥 Commits

Reviewing files that changed from the base of the PR and between ebf2f34 and cbb90c5.

📒 Files selected for processing (3)
  • evaluation-job/main.py
  • libs/amp-evaluation/src/amp_evaluation/runner.py
  • libs/amp-evaluation/src/amp_evaluation/trace/fetcher.py

Comment on lines 711 to 713
         except ValueError as e:
-            logger.warning(f"Cannot fetch traces: {e}")
+            logger.warning("Cannot fetch traces: %s", e)
             errors.append(f"Trace fetching failed: {e}")

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

In Python requests, do InvalidURL, MissingSchema, and InvalidSchema inherit from both RequestException and ValueError?

💡 Result:

Yes. In the Requests library, the exception classes MissingSchema, InvalidSchema, and InvalidURL are declared as subclasses of both requests.exceptions.RequestException and ValueError. In requests.exceptions, you can see the class definitions:

  • class MissingSchema(RequestException, ValueError) [1]
  • class InvalidSchema(RequestException, ValueError) [1]
  • class InvalidURL(RequestException, ValueError) [1]

This is also reflected directly in the Requests source for exceptions.py. [2]

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="libs/amp-evaluation/src/amp_evaluation/runner.py"

echo "== File existence =="
ls -l "$FILE" || true

echo "== Show lines around 680-760 =="
sed -n '680,760p' "$FILE" | nl -ba | sed 's/^/ /'

echo "== Find _safe_request_error definition =="
rg -n "_safe_request_error" "$FILE"

echo "== Show _safe_request_error body =="
# Print a window around the first match
LINE=$(rg -n "_safe_request_error" "$FILE" | head -n1 | cut -d: -f1)
START=$((LINE-30))
END=$((LINE+60))
sed -n "${START},${END}p" "$FILE" | nl -ba | sed 's/^/ /'

echo "== Locate the broader exception branch that mentions _safe_request_error =="
rg -n "_safe_request_error\\(e\\)" "$FILE"

Repository: wso2/agent-manager

Length of output: 276


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="libs/amp-evaluation/src/amp_evaluation/runner.py"

echo "== Show lines around 690-740 with line numbers (no nl) =="
sed -n '690,740p' "$FILE" | awk '{printf "%4d  %s\n", NR+689, $0}'

echo "== Show lines around 700-725 with line numbers =="
sed -n '700,725p' "$FILE" | awk '{printf "%4d  %s\n", NR+699, $0}'

echo "== Find _safe_request_error definition =="
rg -n "_safe_request_error" "$FILE" || true

echo "== Show _safe_request_error body (a window around its first definition) =="
DEF_LINE=$(rg -n "_safe_request_error" "$FILE" | head -n1 | cut -d: -f1)
START=$((DEF_LINE-20))
END=$((DEF_LINE+80))
sed -n "${START},${END}p" "$FILE" | awk -v s=$START '{printf "%4d  %s\n", s+NR-1, $0}'

echo "== Locate ValueError branch =="
rg -n "except ValueError as e" "$FILE" || true

echo "== Show the ValueError branch block (a small window) =="
VAL_LINE=$(rg -n "except ValueError as e" "$FILE" | head -n1 | cut -d: -f1)
START=$((VAL_LINE-10))
END=$((VAL_LINE+12))
sed -n "${START},${END}p" "$FILE" | awk -v s=$START '{printf "%4d  %s\n", s+NR-1, $0}'

Repository: wso2/agent-manager

Length of output: 9186


🏁 Script executed:

#!/bin/bash
set -euo pipefail

REPO_ROOT="."
FILE="$(fd -a "fetcher.py" libs/amp-evaluation/src/amp_evaluation/trace | head -n1 || true)"
echo "fetcher.py: $FILE"

if [ -z "${FILE:-}" ]; then
  # fallback: search for _safe_request_error definition
  echo "Locating _safe_request_error definition..."
  rg -n "def _safe_request_error" libs/amp-evaluation/src/amp_evaluation -S || true
  exit 0
fi

echo "== Show _safe_request_error definition =="
DEF_LINE=$(rg -n "def _safe_request_error" "$FILE" | head -n1 | cut -d: -f1)
START=$((DEF_LINE-20))
END=$((DEF_LINE+80))
sed -n "${START},${END}p" "$FILE" | awk -v s=$START '{printf "%4d  %s\n", NR+s, $0}'

echo "== Find any references to _safe_request_error(e) =="
rg -n "_safe_request_error\\(e\\)" -S libs/amp-evaluation/src/amp_evaluation/trace "$FILE" || true

echo "== Find mention of trace fetching errors =="
rg -n "Trace fetching|fetch traces|Failed to fetch traces" "$FILE" -S || true

Repository: wso2/agent-manager

Length of output: 4440


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="libs/amp-evaluation/src/amp_evaluation/runner.py"

echo "== Context before except ValueError (try start area) =="
sed -n '620,730p' "$FILE" | awk '{print NR+619, $0}'

echo "== Find 'Cannot fetch traces' occurrences =="
rg -n "Cannot fetch traces" "$FILE" -n

echo "== Find surrounding 'Trace fetching' logic =="
rg -n "Trace fetching failed|Trace fetching error|Error during trace fetching" "$FILE" -n

echo "== Locate the function/method containing this block =="
# Print the closest def/class above line 711
awk 'NR==711{print "target line is 711"} {print NR ":" $0}' "$FILE" | sed -n '680,740p' >/tmp/snippet.txt
tail -n +1 /tmp/snippet.txt | head -n 40

Repository: wso2/agent-manager

Length of output: 7473


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="libs/amp-evaluation/src/amp_evaluation/runner.py"

echo "== Find _fetch_traces definition =="
rg -n "def _fetch_traces" "$FILE"

LINE=$(rg -n "def _fetch_traces" "$FILE" | head -n1 | cut -d: -f1)
START=$((LINE-30))
END=$((LINE+160))
sed -n "${START},${END}p" "$FILE" | awk '{print NR+ '"$START"' -1 ":", $0}' | head -n 220

Repository: wso2/agent-manager

Length of output: 8615


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="libs/amp-evaluation/src/amp_evaluation/trace/fetcher.py"

echo "== Find fetch_traces implementations =="
rg -n "def fetch_traces|fetch_traces\\(" "$FILE"

echo "== Show likely fetch_traces body (first match) =="
LINE=$(rg -n "def fetch_traces" "$FILE" | head -n1 | cut -d: -f1)
START=$((LINE-30))
END=$((LINE+220))
sed -n "${START},${END}p" "$FILE" | awk 'BEGIN{c=0} {print NR+ '"$START"-1'" ":" $0}' | head -n 260

echo "== Look for requests.* calls in fetcher.py =="
rg -n "requests\\.(get|post|request)|_request\\(|HTTPError|MissingSchema|InvalidURL|InvalidSchema" "$FILE"

echo "== Search for any broad try/except that converts exceptions =="
rg -n "except (ValueError|RequestException|requests\\.exceptions|Exception)" "$FILE" | head -n 50

Repository: wso2/agent-manager

Length of output: 414


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="libs/amp-evaluation/src/amp_evaluation/trace/fetcher.py"

echo "== Show around fetch_traces method definition (line near 427) =="
sed -n '400,560p' "$FILE" | nl -ba | head -n 200

Repository: wso2/agent-manager

Length of output: 188


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="libs/amp-evaluation/src/amp_evaluation/trace/fetcher.py"

echo "== requests usage in fetcher.py =="
rg -n "requests\\.(get|post|request)|_request\\(|HTTPError|MissingSchema|InvalidURL|InvalidSchema|InvalidSchema" "$FILE"

Repository: wso2/agent-manager

Length of output: 330


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="libs/amp-evaluation/src/amp_evaluation/trace/fetcher.py"

echo "== Relevant exception handling in fetcher.py =="
rg -n "except (ValueError|RequestException|requests\\.exceptions\\.|Exception)" "$FILE" | head -n 80

Repository: wso2/agent-manager

Length of output: 264


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="libs/amp-evaluation/src/amp_evaluation/trace/fetcher.py"

echo "== Show around earlier caller 'traces = fetcher.fetch_traces(' (line ~373) =="
sed -n '330,440p' "$FILE" | nl -ba | head -n 160

Repository: wso2/agent-manager

Length of output: 201


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="libs/amp-evaluation/src/amp_evaluation/trace/fetcher.py"

echo "== Show fetch_traces method around requests.get and exception handlers =="
sed -n '380,560p' "$FILE" | awk 'BEGIN{c=0} {c++; printf "%4d  %s\n", 380+c-1, $0}'

Repository: wso2/agent-manager

Length of output: 7349


Sanitize the except ValueError trace-fetching path to avoid request URL leakage.

In libs/amp-evaluation/src/amp_evaluation/runner.py, the except ValueError as e block logs and appends e directly (lines 711-713). Since requests URL exceptions like MissingSchema/InvalidSchema/InvalidURL inherit from ValueError, they can hit this branch when trace/fetcher.py re-raises requests.exceptions.RequestException. Their string forms may include the bad URL, defeating the _safe_request_error logging goal.
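
The leak path can be reproduced in isolation. The sketch below uses local stand-in classes that mirror the class declarations quoted in the analysis above, so it runs without `requests` installed. The `fetch` function, the example URL, and the error text are hypothetical, and the class-name fallback is an assumption about how the sanitizer treats non-HTTP errors.

```python
# Stand-ins mirroring the requests.exceptions hierarchy quoted above.
class RequestException(IOError):
    pass


class MissingSchema(RequestException, ValueError):
    pass


def fetch(url: str) -> None:
    # Hypothetical fetch that fails the way a malformed URL would.
    raise MissingSchema(f"Invalid URL {url!r}: No scheme supplied.")


raw_errors = []
safe_errors = []
try:
    fetch("internal-host:9090/traces")
except ValueError as e:
    # The handler matches because MissingSchema is also a ValueError.
    raw_errors.append(f"Trace fetching failed: {e}")
    # Assumed sanitized form: class name only, no message text.
    safe_errors.append(f"Trace fetching failed: {type(e).__name__}")

print(raw_errors[0])   # contains the offending URL
print(safe_errors[0])  # Trace fetching failed: MissingSchema
```

Because the `except ValueError` clause is reached before any broader `RequestException` handler can sanitize the message, the raw string form is what ends up in the log and the error list.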

🔧 Proposed fix
-        except ValueError as e:
-            logger.warning("Cannot fetch traces: %s", e)
-            errors.append(f"Trace fetching failed: {e}")
+        except ValueError as e:
+            safe = _safe_request_error(e)
+            logger.warning("Cannot fetch traces: %s", safe)
+            errors.append(f"Trace fetching failed: {safe}")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@libs/amp-evaluation/src/amp_evaluation/runner.py` around lines 711 - 713, The
except ValueError block currently logs and appends the raw exception
(logger.warning("Cannot fetch traces: %s", e) and errors.append(f"Trace fetching
failed: {e}")), which can leak request URLs; replace those uses to pass the
exception through the existing sanitizer (call _safe_request_error(e) or
otherwise redact URL content) and log/append the sanitized string instead
(update the logger.warning call and the errors.append call to use the sanitized
result), referencing the except ValueError handler, logger.warning,
errors.append, and the _safe_request_error helper mentioned in trace/fetcher.py.
