Skip to content

[Fix][Zeta] Fix NPE in JobInfoService#10700

Merged
chl-wxp merged 2 commits into
apache:devfrom
dybyte:fix/get-job-info-npe
Apr 29, 2026
Merged

[Fix][Zeta] Fix NPE in JobInfoService#10700
chl-wxp merged 2 commits into
apache:devfrom
dybyte:fix/get-job-info-npe

Conversation

@dybyte
Copy link
Copy Markdown
Contributor

@dybyte dybyte commented Apr 2, 2026

Fixes: #10695

Purpose of this pull request

This PR fixes a potential NPE in JobInfoService.

Does this PR introduce any user-facing change?

How was this patch tested?

Added JobInfoServiceNullSafetyTest.

Check list

@dybyte dybyte marked this pull request as draft April 2, 2026 10:35
@dybyte dybyte force-pushed the fix/get-job-info-npe branch from 5d849ce to 889ce70 Compare April 2, 2026 10:40
@dybyte dybyte marked this pull request as ready for review April 2, 2026 10:46
@DanielLeens
Copy link
Copy Markdown
Contributor

I pulled the branch locally and traced the finished-job path in JobInfoService.getJobInfoJson(Long).

I found one blocking gap: this patch fixes the finishedJobMetrics == null case, but the same path can still NPE when finishedJobDAGInfo is missing.

On the new code:

  • JobInfoService.getJobInfoJson() still loads finishedJobDAGInfo from IMAP_FINISHED_JOB_VERTEX_INFO and passes it straight through (JobInfoService.java:86-94)
  • BaseService.getJobInfoJson(...) then unconditionally calls jobDAGInfo.toJsonObject() (BaseService.java:242-268)

So if the finished-job state exists but the DAG info map entry is absent, we still crash on the same REST path. The new test class only covers missing metrics (JobInfoServiceNullSafetyTest.java:89-101), not missing DAG info.

Please either make the finished-job JSON builder null-safe for jobDAGInfo as well, or prove by invariant that IMAP_FINISHED_JOB_STATE and IMAP_FINISHED_JOB_VERTEX_INFO are always populated atomically. Right now the NPE is only narrowed, not eliminated.

@dybyte
Copy link
Copy Markdown
Contributor Author

dybyte commented Apr 23, 2026

I pulled the branch locally and traced the finished-job path in JobInfoService.getJobInfoJson(Long).

I found one blocking gap: this patch fixes the finishedJobMetrics == null case, but the same path can still NPE when finishedJobDAGInfo is missing.

On the new code:

  • JobInfoService.getJobInfoJson() still loads finishedJobDAGInfo from IMAP_FINISHED_JOB_VERTEX_INFO and passes it straight through (JobInfoService.java:86-94)
  • BaseService.getJobInfoJson(...) then unconditionally calls jobDAGInfo.toJsonObject() (BaseService.java:242-268)

So if the finished-job state exists but the DAG info map entry is absent, we still crash on the same REST path. The new test class only covers missing metrics (JobInfoServiceNullSafetyTest.java:89-101), not missing DAG info.

Please either make the finished-job JSON builder null-safe for jobDAGInfo as well, or prove by invariant that IMAP_FINISHED_JOB_STATE and IMAP_FINISHED_JOB_VERTEX_INFO are always populated atomically. Right now the NPE is only narrowed, not eliminated.

@DanielLeens I checked this again. JobDAGInfo.toJsonObject() itself is not null-safe, but in this path BaseService.getJobInfoJson(...) already checks whether jobDAGInfo is null before calling toJsonObject().

So I don't think a missing finishedJobDAGInfo would trigger an NPE here. The null-safety issue in this patch is limited to finishedJobMetrics on this path.

@DanielLeens
Copy link
Copy Markdown
Contributor

Hi @dybyte, thank you for checking my previous comment so carefully. I re-pulled the latest head locally and traced the REST path again. You are right: my earlier concern about finishedJobDAGInfo was not valid for the current code path, because BaseService.getJobInfoJson(...) already has a null-safe fallback before calling jobDAGInfo.toJsonObject().

What This PR Fixes

  • User pain point: when the REST job-info endpoint reads a finished job whose state exists but whose metrics entry is missing, the old code dereferenced finishedJobMetrics directly and could fail with an NPE instead of returning basic job information.
  • Fix approach: JobInfoService.getJobInfoJson(Long) now falls back to JobMetrics.empty() when the finished-job metrics map has no entry for the job.
  • One-line summary: finished-job REST lookup now survives missing metrics and still returns a valid JSON response.

Runtime Path Checked

REST job-info request
  -> RestHttpGetCommandProcessor / JobInfoServlet
      -> JobInfoService.getJobInfoJson(jobId)
          -> read IMAP_RUNNING_JOB_INFO
              -> if present: convert running JobInfo to JSON
              -> if absent: continue to finished-job state
          -> read IMAP_FINISHED_JOB_STATE
              -> if absent: return a JsonObject with only jobId
              -> if present: read metrics and DAG info
          -> read IMAP_FINISHED_JOB_METRICS
              -> old behavior: null metrics -> finishedJobMetrics.toJsonString() NPE
              -> new behavior: null metrics -> JobMetrics.empty()
          -> read IMAP_FINISHED_JOB_VERTEX_INFO
          -> BaseService.getJobInfoJson(jobState, metricsJson, dagInfo)
              -> jobDAGInfo != null ? jobDAGInfo.toJsonObject() : new JsonObject()
              -> getJobMetrics(...) also checks jobDAGInfo before reading vertex info

Key Findings

  • The normal finished-job REST path does hit this PR.
  • The real fixed case is: finished state exists, but finished metrics are missing.
  • This is a precise fix and matches the existing JobMetrics.empty() semantics already used elsewhere in the engine history path.
  • The new test class covers the missing-metrics regression, the normal finished state + metrics + DAG path, and the no-finished-state fallback.
  • Current GitHub checks are green.

Compatibility / Side Effects

Compatibility: fully compatible.

No API, option, default value, protocol, serialization, checkpoint, or historical upgrade behavior is changed. The only user-visible effect is that a REST query that previously failed on this edge case can now return basic finished-job information with empty metrics.

Findings

No blocking issue found in the current head.

Merge Decision

Conclusion: can merge

  1. Blocking items
  • None.
  1. Recommended non-blocking improvements
  • None.

Overall, the current implementation is focused and low risk. Thanks again for the clear follow-up; after retracing the actual call chain, I agree this PR is ready from my side.

Copy link
Copy Markdown
Contributor

@davidzollo davidzollo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
Good job

@DanielLeens
Copy link
Copy Markdown
Contributor

Hi @dybyte, I rechecked the current head locally again after the latest review activity.

My conclusion is unchanged:

  • the real fixed case is still “finished state exists, but finished metrics are missing”
  • the current head falls back to JobMetrics.empty()
  • the regression test covers the important null-safety branches
  • the current Build is green

Runtime chain I rechecked

REST job-info request
  -> JobInfoService.getJobInfoJson(jobId)
      -> running job map
      -> finished job state map
      -> finished metrics map
          -> null -> JobMetrics.empty()
      -> convert to JSON

Conclusion: can merge

Blocking items:

  • None from my side.

Thanks again for the careful follow-up here. The current implementation still looks focused and low risk.

@chl-wxp chl-wxp merged commit bdd7b69 into apache:dev Apr 29, 2026
5 checks passed
@dybyte dybyte deleted the fix/get-job-info-npe branch April 29, 2026 10:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] [Zeta] getJobInfoJson throws a NullPointerException

4 participants