Skip to content

[Backport 2.19] Monitoring: Accurately capture predict failures (#4525)#4841

Open
pyek-bot wants to merge 1 commit into
opensearch-project:2.19from
pyek-bot:backport/backport-4555-to-2.19
Open

[Backport 2.19] Monitoring: Accurately capture predict failures (#4525)#4841
pyek-bot wants to merge 1 commit into
opensearch-project:2.19from
pyek-bot:backport/backport-4555-to-2.19

Conversation

@pyek-bot
Copy link
Copy Markdown
Collaborator

@pyek-bot pyek-bot commented May 27, 2026

Summary

  • Backport of [Backport 3.3] Monitoring: Accurately capture predict failures #4555 to 2.19
  • Introduces shouldTrackRemoteFailure(Exception e) method that skips tracking for user/configuration issues (4xx HTTP errors, IllegalArgumentException) while still tracking infrastructure/service issues (5xx errors, other exceptions)
  • Fixes two call sites in runPredict that previously hard-coded false (never tracking remote failures)

Test plan

  • Unit test testShouldTrackRemoteFailure validates all scenarios

) (opensearch-project#4555)

* fix: correctly capture ml failure stats on predict failures

* refactor: remove redundant braces

* refactor: dont track any 400 errors and add positive test case

* fix: apply spotless

---------

(cherry picked from commit a28d344)

Signed-off-by: Pavan Yekbote <pybot@amazon.com>
Co-authored-by: Pavan Yekbote <pybot@amazon.com>
Signed-off-by: Pavan Yekbote <pybot@amazon.com>
@pyek-bot pyek-bot force-pushed the backport/backport-4555-to-2.19 branch from 56e412f to 69e8df5 Compare May 27, 2026 22:29
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 27, 2026

PR Reviewer Guide 🔍

(Review updated until commit 69e8df5)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 27, 2026

PR Code Suggestions ✨

Latest suggestions up to 69e8df5
Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Check exception cause chain

The method doesn't handle wrapped exceptions where the root cause might be an
IllegalArgumentException or OpenSearchStatusException. Consider checking the
exception's cause chain to ensure user configuration errors are properly identified
even when wrapped in other exception types.

plugin/src/main/java/org/opensearch/ml/task/MLPredictTaskRunner.java [608-624]

 boolean shouldTrackRemoteFailure(Exception e) {
     // Don't track failures for user configuration issues
-    if (e instanceof IllegalArgumentException) {
-        return false;
-    }
-
-    // Don't track any 4xx client errors (user/configuration issues)
-    if (e instanceof OpenSearchStatusException) {
-        RestStatus status = ((OpenSearchStatusException) e).status();
-        if (status.getStatus() >= 400 && status.getStatus() < 500) {
+    Throwable current = e;
+    while (current != null) {
+        if (current instanceof IllegalArgumentException) {
             return false;
         }
+
+        // Don't track any 4xx client errors (user/configuration issues)
+        if (current instanceof OpenSearchStatusException) {
+            RestStatus status = ((OpenSearchStatusException) current).status();
+            if (status.getStatus() >= 400 && status.getStatus() < 500) {
+                return false;
+            }
+        }
+        current = current.getCause();
     }
 
     // Track failures for infrastructure/service issues
     return true;
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that wrapped exceptions might not be properly handled. Checking the cause chain ensures that user configuration errors (like IllegalArgumentException or 4xx OpenSearchStatusException) are detected even when wrapped, preventing incorrect failure tracking. This is a valid improvement for robustness.

Medium

Previous suggestions

Suggestions up to commit 69e8df5
CategorySuggestion                                                                                                                                    Impact
General
Check exception cause chain

The method should handle nested exceptions by checking the cause chain. If the root
cause is an IllegalArgumentException or a 4xx OpenSearchStatusException, it should
not be tracked even if wrapped in another exception type.

plugin/src/main/java/org/opensearch/ml/task/MLPredictTaskRunner.java [608-624]

 boolean shouldTrackRemoteFailure(Exception e) {
-    // Don't track failures for user configuration issues
-    if (e instanceof IllegalArgumentException) {
-        return false;
-    }
-
-    // Don't track any 4xx client errors (user/configuration issues)
-    if (e instanceof OpenSearchStatusException) {
-        RestStatus status = ((OpenSearchStatusException) e).status();
-        if (status.getStatus() >= 400 && status.getStatus() < 500) {
+    // Check the exception and its cause chain
+    Throwable current = e;
+    while (current != null) {
+        // Don't track failures for user configuration issues
+        if (current instanceof IllegalArgumentException) {
             return false;
         }
+
+        // Don't track any 4xx client errors (user/configuration issues)
+        if (current instanceof OpenSearchStatusException) {
+            RestStatus status = ((OpenSearchStatusException) current).status();
+            if (status.getStatus() >= 400 && status.getStatus() < 500) {
+                return false;
+            }
+        }
+        current = current.getCause();
     }
 
     // Track failures for infrastructure/service issues
     return true;
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that checking the exception cause chain would make the error classification more robust. However, the current implementation may be intentionally simple, and the added complexity should be evaluated against actual use cases where exceptions are wrapped.

Medium

@github-actions
Copy link
Copy Markdown

Persistent review updated to latest commit 69e8df5

@pyek-bot
Copy link
Copy Markdown
Collaborator Author

pyek-bot commented May 27, 2026

Spotless failing due to:

* What went wrong:
A problem occurred evaluating project ':opensearch-ml-client'.
> Could not find method shadowJar() for arguments [build_56550d4946yr8dezgj7g3z4xk$_run_closure5@6eb61855] on project ':opensearch-ml-client' of type org.gradle.api.Project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants