Fix eval monitors error messages by menakaj · Pull Request #953 · wso2/agent-manager

menakaj · 2026-05-25T05:47:28Z

Purpose

Describe the problems, issues, or needs driving this feature/fix and include links to related issues in the following format: Resolves issue1, issue2, etc.

This PR improves the eval log messages by removing internal data, such as request URLs and the traces API endpoint, from what gets printed.

Goals

Describe the solutions that this feature/fix will introduce to resolve the problems described above

Approach

Describe how you are implementing the solutions. Include an animated GIF or screenshot if the change affects the UI (email documentation@wso2.com to review all UI text). Include a link to a Markdown file or Google doc if the feature write-up is too long to paste here.

User stories

Summary of user stories addressed by this change

Release note

Brief description of the new feature or bug fix as it will appear in the release notes

Documentation

Link(s) to product documentation that addresses the changes of this PR. If no doc impact, enter “N/A” plus brief explanation of why there’s no doc impact

Training

Link to the PR for changes to the training content in https://github.com/wso2/WSO2-Training, if applicable

Certification

Type “Sent” when you have provided new/updated certification questions, plus four answers for each question (correct answer highlighted in bold), based on this change. Certification questions/answers should be sent to certification@wso2.com and NOT pasted in this PR. If there is no impact on certification exams, type “N/A” and explain why.

Marketing

Link to drafts of marketing content that will describe and promote this feature, including product page changes, technical articles, blog posts, videos, etc., if applicable

Automation tests

Unit tests

Code coverage information

Integration tests

Details about the test cases and coverage

Security checks

Followed secure coding standards in http://wso2.com/technical-reports/wso2-secure-engineering-guidelines? yes/no
Ran FindSecurityBugs plugin and verified report? yes/no
Confirmed that this PR doesn't commit any keys, passwords, tokens, usernames, or other secrets? yes/no

Samples

Provide high-level details about the samples related to this feature

Related PRs

List any other related PRs

Migrations (if applicable)

Describe migration steps and platforms on which migration has been tested

Test environment

List all JDK versions, operating systems, databases, and browser/versions on which this feature/fix was tested

Learning

Describe the research phase and any blog posts, patterns, libraries, or add-ons you used to solve the problem.

Summary by CodeRabbit

Bug Fixes
- Improved error messages for failed token requests with clearer HTTP status details.
- Enhanced error logging across trace operations to provide consistent, informative feedback while protecting sensitive information from logs.
Refactor
- Standardized error handling and reporting across authentication and trace-fetch operations for better consistency.
- Streamlined monitor startup logging by removing non-essential details while preserving core monitoring information.

coderabbitai · 2026-05-25T05:47:46Z

📝 Walkthrough

Walkthrough

This PR enhances error handling and logging safety across evaluation components by introducing a request error sanitization utility, integrating it throughout trace fetching and runner error handlers, improving OAuth token error messages, and removing sensitive endpoint fields from monitor startup logs.

Changes

Safe Request Error Handling and Logging

Layer / File(s)	Summary
Safe request error utility `libs/amp-evaluation/src/amp_evaluation/trace/fetcher.py`	Introduces `_safe_request_error(e)` helper that converts `requests` exceptions into URL-free descriptions using HTTP status codes for `HTTPError` or exception class names as fallback.
Trace fetcher error sanitization `libs/amp-evaluation/src/amp_evaluation/trace/fetcher.py`	`fetch_traces` and `fetch_trace_by_id` methods now use `_safe_request_error()` in exception logging instead of embedding raw exception objects, preventing URL leakage.
Runner error standardization `libs/amp-evaluation/src/amp_evaluation/runner.py`	Runner module imports `_safe_request_error` and applies it to trace-fetch exception handling in `Experiment` (updating logs and error lists) and `Monitor.run()` contexts.
OAuth token and monitor logging improvements `evaluation-job/main.py`	OAuth token manager wraps `raise_for_status()` call in try/except to provide normalized HTTP status error messages; monitor startup log removes previously included `traces_api_endpoint` field.
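The last row describes wrapping `raise_for_status()` in try/except to normalize the error message. The sketch below illustrates one way that pattern could be written; it is an assumption, not the code in `evaluation-job/main.py`. `HTTPError` stands in for `requests.exceptions.HTTPError`, and `FakeResponse`, `TokenRequestError`, and `ensure_token_response_ok` are hypothetical names introduced for illustration.

```python
class HTTPError(Exception):
    """Stand-in for requests.exceptions.HTTPError (assumption for this sketch)."""


class FakeResponse:
    """Hypothetical response object exposing the two members the sketch uses."""

    def __init__(self, status_code: int, url: str) -> None:
        self.status_code = status_code
        self.url = url

    def raise_for_status(self) -> None:
        if self.status_code >= 400:
            raise HTTPError(f"{self.status_code} Client Error for url: {self.url}")


class TokenRequestError(RuntimeError):
    """Hypothetical error type carrying only the HTTP status."""


def ensure_token_response_ok(response) -> None:
    try:
        response.raise_for_status()
    except HTTPError:
        # `from None` drops the original exception from the traceback chain,
        # so its URL-bearing message is not printed alongside the new one.
        raise TokenRequestError(
            f"Token request failed with HTTP {response.status_code}"
        ) from None


try:
    ensure_token_response_ok(
        FakeResponse(401, "https://idp.internal.example/oauth2/token")
    )
except TokenRequestError as err:
    message = str(err)

print(message)  # Token request failed with HTTP 401
```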

Sequence Diagram(s)

(No sequence diagram generated — changes are primarily error-handling improvements and logging adjustments without complex multi-component interactions warranting visualization.)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Safe error paths now hide URLs away,
Exception messages brief and clean today,
No secrets leaked in logs we say,
Sanitized errors lead the way! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The PR description provides only the Purpose section with substantive content ('improve eval log messages by removing internal data'). All other template sections contain placeholder headers but no meaningful details, making the description largely incomplete.	Complete the Goals, Approach, and other critical sections to explain the solutions implemented, how errors are sanitized (using _safe_request_error helper), testing performed, and security implications of the logging changes.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Fix eval monitors error messages' directly corresponds to the main changes: standardizing error messages in evaluation monitoring by removing sensitive internal data (URLs, endpoints) from logged exceptions.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@libs/amp-evaluation/src/amp_evaluation/runner.py`:
- Around line 711-713: The except ValueError block currently logs and appends
the raw exception (logger.warning("Cannot fetch traces: %s", e) and
errors.append(f"Trace fetching failed: {e}")), which can leak request URLs;
replace those uses to pass the exception through the existing sanitizer (call
_safe_request_error(e) or otherwise redact URL content) and log/append the
sanitized string instead (update the logger.warning call and the errors.append
call to use the sanitized result), referencing the except ValueError handler,
logger.warning, errors.append, and the _safe_request_error helper mentioned in
trace/fetcher.py.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f1f1096a-756f-4c73-99d7-187d03d25d6b

📥 Commits

Reviewing files that changed from the base of the PR and between ebf2f34 and cbb90c5.

📒 Files selected for processing (3)

evaluation-job/main.py
libs/amp-evaluation/src/amp_evaluation/runner.py
libs/amp-evaluation/src/amp_evaluation/trace/fetcher.py

coderabbitai · 2026-05-25T05:53:10Z

        except ValueError as e:
-            logger.warning(f"Cannot fetch traces: {e}")
+            logger.warning("Cannot fetch traces: %s", e)
            errors.append(f"Trace fetching failed: {e}")


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

In Python requests, do InvalidURL, MissingSchema, and InvalidSchema inherit from both RequestException and ValueError?

💡 Result:

Yes. In the Requests library, MissingSchema, InvalidSchema, and InvalidURL are declared as subclasses of both requests.exceptions.RequestException and ValueError. The class definitions in requests.exceptions are:
- class MissingSchema(RequestException, ValueError) [1]
- class InvalidSchema(RequestException, ValueError) [1]
- class InvalidURL(RequestException, ValueError) [1]

This is also reflected directly in the Requests source for exceptions.py. [2]

Citations:

1: https://requests.readthedocs.io/en/latest/_modules/requests/exceptions/

2: https://github.com/psf/requests/blob/main/src/requests/exceptions.py
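The practical consequence of that dual inheritance can be sketched as follows: when a handler lists `except ValueError` before `except RequestException`, a URL-validation error lands in the `ValueError` branch. The classes below are stand-ins that mirror the quoted hierarchy so the example runs without `requests` installed; `classify` and the message text are illustrative assumptions.

```python
class RequestException(IOError):
    """Stand-in mirroring requests.exceptions.RequestException."""


class MissingSchema(RequestException, ValueError):
    """Stand-in mirroring the dual inheritance quoted above."""


def classify(exc: Exception) -> str:
    """Return the name of the handler branch that catches `exc`."""
    try:
        raise exc
    except ValueError:
        return "ValueError branch"
    except RequestException:
        return "RequestException branch"


# Illustrative message; the point is only that it embeds the URL.
bad_url = MissingSchema(
    "Invalid URL 'traces.internal.example/api': No scheme supplied."
)
print(classify(bad_url))                        # ValueError branch
print(classify(RequestException("timed out")))  # RequestException branch
print(isinstance(bad_url, ValueError) and isinstance(bad_url, RequestException))  # True
```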

🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="libs/amp-evaluation/src/amp_evaluation/runner.py"
echo "== File existence =="
ls -l "$FILE" || true
echo "== Show lines around 680-760 =="
sed -n '680,760p' "$FILE" | nl -ba | sed 's/^/ /'
echo "== Find _safe_request_error definition =="
rg -n "_safe_request_error" "$FILE"
echo "== Show _safe_request_error body =="
# Print a window around the first match
LINE=$(rg -n "_safe_request_error" "$FILE" | head -n1 | cut -d: -f1)
START=$((LINE-30))
END=$((LINE+60))
sed -n "${START},${END}p" "$FILE" | nl -ba | sed 's/^/ /'
echo "== Locate the broader exception branch that mentions _safe_request_error =="
rg -n "_safe_request_error\(e\)" "$FILE"

Repository: wso2/agent-manager

Length of output: 276

🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="libs/amp-evaluation/src/amp_evaluation/runner.py"
echo "== Show lines around 690-740 with line numbers (no nl) =="
sed -n '690,740p' "$FILE" | awk '{printf "%4d %s\n", NR+689, $0}'
echo "== Show lines around 700-725 with line numbers =="
sed -n '700,725p' "$FILE" | awk '{printf "%4d %s\n", NR+699, $0}'
echo "== Find _safe_request_error definition =="
rg -n "_safe_request_error" "$FILE" || true
echo "== Show _safe_request_error body (a window around its first definition) =="
DEF_LINE=$(rg -n "_safe_request_error" "$FILE" | head -n1 | cut -d: -f1)
START=$((DEF_LINE-20))
END=$((DEF_LINE+80))
sed -n "${START},${END}p" "$FILE" | awk -v s=$START '{printf "%4d %s\n", s+NR-1, $0}'
echo "== Locate ValueError branch =="
rg -n "except ValueError as e" "$FILE" || true
echo "== Show the ValueError branch block (a small window) =="
VAL_LINE=$(rg -n "except ValueError as e" "$FILE" | head -n1 | cut -d: -f1)
START=$((VAL_LINE-10))
END=$((VAL_LINE+12))
sed -n "${START},${END}p" "$FILE" | awk -v s=$START '{printf "%4d %s\n", s+NR-1, $0}'

Repository: wso2/agent-manager

Length of output: 9186

🏁 Script executed:

#!/bin/bash
set -euo pipefail
REPO_ROOT="."
FILE="$(fd -a "fetcher.py" libs/amp-evaluation/src/amp_evaluation/trace | head -n1 || true)"
echo "fetcher.py: $FILE"
if [ -z "${FILE:-}" ]; then
  # fallback: search for _safe_request_error definition
  echo "Locating _safe_request_error definition..."
  rg -n "def _safe_request_error" libs/amp-evaluation/src/amp_evaluation -S || true
  exit 0
fi
echo "== Show _safe_request_error definition =="
DEF_LINE=$(rg -n "def _safe_request_error" "$FILE" | head -n1 | cut -d: -f1)
START=$((DEF_LINE-20))
END=$((DEF_LINE+80))
sed -n "${START},${END}p" "$FILE" | awk -v s=$START '{printf "%4d %s\n", NR+s, $0}'
echo "== Find any references to _safe_request_error(e) =="
rg -n "_safe_request_error\(e\)" -S libs/amp-evaluation/src/amp_evaluation/trace "$FILE" || true
echo "== Find mention of trace fetching errors =="
rg -n "Trace fetching|fetch traces|Failed to fetch traces" "$FILE" -S || true

Repository: wso2/agent-manager

Length of output: 4440

🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="libs/amp-evaluation/src/amp_evaluation/runner.py"
echo "== Context before except ValueError (try start area) =="
sed -n '620,730p' "$FILE" | awk '{print NR+619, $0}'
echo "== Find 'Cannot fetch traces' occurrences =="
rg -n "Cannot fetch traces" "$FILE" -n
echo "== Find surrounding 'Trace fetching' logic =="
rg -n "Trace fetching failed|Trace fetching error|Error during trace fetching" "$FILE" -n
echo "== Locate the function/method containing this block =="
# Print the closest def/class above line 711
awk 'NR==711{print "target line is 711"} {print NR ":" $0}' "$FILE" | sed -n '680,740p' >/tmp/snippet.txt
tail -n +1 /tmp/snippet.txt | head -n 40

Repository: wso2/agent-manager

Length of output: 7473

🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="libs/amp-evaluation/src/amp_evaluation/runner.py"
echo "== Find _fetch_traces definition =="
rg -n "def _fetch_traces" "$FILE"
LINE=$(rg -n "def _fetch_traces" "$FILE" | head -n1 | cut -d: -f1)
START=$((LINE-30))
END=$((LINE+160))
sed -n "${START},${END}p" "$FILE" | awk '{print NR+ '"$START"' -1 ":", $0}' | head -n 220

Repository: wso2/agent-manager

Length of output: 8615

🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="libs/amp-evaluation/src/amp_evaluation/trace/fetcher.py"
echo "== Find fetch_traces implementations =="
rg -n "def fetch_traces|fetch_traces\\(" "$FILE"
echo "== Show likely fetch_traces body (first match) =="
LINE=$(rg -n "def fetch_traces" "$FILE" | head -n1 | cut -d: -f1)
START=$((LINE-30))
END=$((LINE+220))
sed -n "${START},${END}p" "$FILE" | awk 'BEGIN{c=0} {print NR+ '"$START"-1'" ":" $0}' | head -n 260
echo "== Look for requests.* calls in fetcher.py =="
rg -n "requests\\.(get|post|request)|_request\\(|HTTPError|MissingSchema|InvalidURL|InvalidSchema" "$FILE"
echo "== Search for any broad try/except that converts exceptions =="
rg -n "except (ValueError|RequestException|requests\\.exceptions|Exception)" "$FILE" | head -n 50

Repository: wso2/agent-manager

Length of output: 414

🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="libs/amp-evaluation/src/amp_evaluation/trace/fetcher.py"
echo "== Show around fetch_traces method definition (line near 427) =="
sed -n '400,560p' "$FILE" | nl -ba | head -n 200

Repository: wso2/agent-manager

Length of output: 188

🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="libs/amp-evaluation/src/amp_evaluation/trace/fetcher.py"
echo "== requests usage in fetcher.py =="
rg -n "requests\\.(get|post|request)|_request\\(|HTTPError|MissingSchema|InvalidURL|InvalidSchema|InvalidSchema" "$FILE"

Repository: wso2/agent-manager

Length of output: 330

🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="libs/amp-evaluation/src/amp_evaluation/trace/fetcher.py"
echo "== Relevant exception handling in fetcher.py =="
rg -n "except (ValueError|RequestException|requests\\.exceptions\\.|Exception)" "$FILE" | head -n 80

Repository: wso2/agent-manager

Length of output: 264

🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="libs/amp-evaluation/src/amp_evaluation/trace/fetcher.py"
echo "== Show around earlier caller 'traces = fetcher.fetch_traces(' (line ~373) =="
sed -n '330,440p' "$FILE" | nl -ba | head -n 160

Repository: wso2/agent-manager

Length of output: 201

🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="libs/amp-evaluation/src/amp_evaluation/trace/fetcher.py"
echo "== Show fetch_traces method around requests.get and exception handlers =="
sed -n '380,560p' "$FILE" | awk 'BEGIN{c=0} {c++; printf "%4d %s\n", 380+c-1, $0}'

Repository: wso2/agent-manager

Length of output: 7349

Sanitize the except ValueError trace-fetching path to avoid request URL leakage.

In libs/amp-evaluation/src/amp_evaluation/runner.py, the except ValueError as e block logs and appends e directly (lines 711-713). Since requests URL exceptions like MissingSchema/InvalidSchema/InvalidURL inherit from ValueError, they can hit this branch when trace/fetcher.py re-raises requests.exceptions.RequestException. Their string forms may include the bad URL, defeating the _safe_request_error logging goal.

🔧 Proposed fix

-        except ValueError as e:
-            logger.warning("Cannot fetch traces: %s", e)
-            errors.append(f"Trace fetching failed: {e}")
+        except ValueError as e:
+            safe = _safe_request_error(e)
+            logger.warning("Cannot fetch traces: %s", safe)
+            errors.append(f"Trace fetching failed: {safe}")
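A regression check for the proposed fix might look like the sketch below. It assumes a helper that behaves as the walkthrough describes (status code when available, class name otherwise) and mirrors only the shape of the handler. `fetch_with_sanitized_errors`, `failing_fetch`, and the `MissingSchema` stand-in are hypothetical and do not appear in `runner.py`.

```python
import logging


def _safe_request_error(e: BaseException) -> str:
    """Stand-in for the PR's helper: status code if present, else class name."""
    status = getattr(getattr(e, "response", None), "status_code", None)
    return f"HTTP {status}" if status is not None else type(e).__name__


class MissingSchema(ValueError):
    """Stand-in for requests.exceptions.MissingSchema."""


def fetch_with_sanitized_errors(fetch, errors, logger):
    """Mirror the shape of the proposed handler around a fetch callable."""
    try:
        return fetch()
    except ValueError as e:
        safe = _safe_request_error(e)
        logger.warning("Cannot fetch traces: %s", safe)
        errors.append(f"Trace fetching failed: {safe}")
        return []


def failing_fetch():
    # Illustrative message that embeds an internal-looking URL.
    raise MissingSchema(
        "Invalid URL 'traces.internal.example/api': No scheme supplied."
    )


errors = []
result = fetch_with_sanitized_errors(
    failing_fetch, errors, logging.getLogger("sketch")
)
print(errors)  # ['Trace fetching failed: MissingSchema']
```

The check that matters is the absence of the URL in both the log line and the `errors` list, since both previously received the raw exception.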

Fix eval monitors error messages

cbb90c5

Fix eval monitors error messages #953
menakaj wants to merge 1 commit into wso2:main from menakaj:eval-logging-fix
