[SPARK-57776][CORE] Add History Server access logging#56893

Open

qijiale76 wants to merge 3 commits into apache:master from qijiale76:history-server-access-log

qijiale76 commented Jun 30, 2026

What changes were proposed in this pull request?

This PR adds optional HTTP access logging for Spark History Server.

It adds two History Server configurations:

  • spark.history.ui.accessLog.enabled, default false
  • spark.history.ui.accessLog.excludePaths, default /static,/favicon.ico

When enabled, Spark writes one access record for each History Server UI or REST API request whose path does not match an excluded prefix, after the request completes. The records are emitted at INFO level by the org.apache.spark.deploy.history.HistoryServerAccessLogFilter logger.
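
As a rough illustration of this request flow, the sketch below models the filter as a plain Java wrapper instead of a servlet Filter. It assumes a path-segment-aware prefix match for spark.history.ui.accessLog.excludePaths and uses a Consumer<String> in place of the INFO-level logger. The class AccessLogWrapper, its methods, and the one-line output are hypothetical and are not taken from the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical model: a Supplier stands in for the rest of the servlet chain
// and returns the HTTP status; a Consumer stands in for the logger.
final class AccessLogWrapper {
    private final List<String> excludedPrefixes = new ArrayList<>();
    private final Consumer<String> sink;

    AccessLogWrapper(String excludePaths, Consumer<String> sink) {
        // Parse a comma-separated value such as "/static,/favicon.ico".
        for (String entry : excludePaths.split(",")) {
            String prefix = entry.trim();
            if (!prefix.isEmpty()) {
                excludedPrefixes.add(prefix);
            }
        }
        this.sink = sink;
    }

    boolean isExcluded(String path) {
        for (String prefix : excludedPrefixes) {
            // Match the prefix itself or anything below it, so "/static" does
            // not also hide "/staticfoo" (an assumption of this sketch).
            String dirPrefix = prefix.endsWith("/") ? prefix : prefix + "/";
            if (path.equals(prefix) || path.startsWith(dirPrefix)) {
                return true;
            }
        }
        return false;
    }

    // Runs the handler and, unless the path is excluded, logs one line after it finishes.
    int handle(String method, String path, Supplier<Integer> handler) {
        if (isExcluded(path)) {
            return handler.get();
        }
        long start = System.nanoTime();
        String status = "-";
        String error = "-";
        try {
            int code = handler.get();
            status = Integer.toString(code);
            return code;
        } catch (RuntimeException e) {
            // Record the exception class, then let the exception propagate unchanged.
            error = e.getClass().getName();
            throw e;
        } finally {
            // Runs on both normal return and exception, i.e. after the request completes.
            long durationMs = (System.nanoTime() - start) / 1_000_000;
            sink.accept(method + " " + path + " status=" + status
                + " durationMs=" + durationMs + " exception=" + error);
        }
    }
}
```

Writing the record in a finally block means it is emitted only once the handler has returned or thrown. Whether the real filter would treat /staticfoo as excluded depends on how it matches prefixes; this sketch deliberately requires a path-segment boundary.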

Each access record includes the request method, URI, redacted query string, status code, duration, remote address, remote user (when available), user agent, referer, and the exception class if the request chain throws.
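
One way to lay out such a record is a single line of key=value pairs. The AccessRecord class, its field names, the "-" placeholder for missing values, and the quoting of free-text headers are assumptions for illustration; the PR's actual log format is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical access record mirroring the fields listed above.
final class AccessRecord {
    final String method;
    final String uri;
    final String redactedQuery;   // null when the request has no query string
    final int status;
    final long durationMs;
    final String remoteAddr;
    final String remoteUser;      // null when no authenticated user is available
    final String userAgent;
    final String referer;
    final String exceptionClass;  // null unless the request chain threw

    AccessRecord(String method, String uri, String redactedQuery, int status,
                 long durationMs, String remoteAddr, String remoteUser,
                 String userAgent, String referer, String exceptionClass) {
        this.method = method;
        this.uri = uri;
        this.redactedQuery = redactedQuery;
        this.status = status;
        this.durationMs = durationMs;
        this.remoteAddr = remoteAddr;
        this.remoteUser = remoteUser;
        this.userAgent = userAgent;
        this.referer = referer;
        this.exceptionClass = exceptionClass;
    }

    // Renders one line; missing values become "-" and free-text headers are quoted.
    String format() {
        List<String> parts = new ArrayList<>();
        parts.add("method=" + orDash(method));
        parts.add("uri=" + orDash(uri));
        parts.add("query=" + orDash(redactedQuery));
        parts.add("status=" + status);
        parts.add("durationMs=" + durationMs);
        parts.add("remote=" + orDash(remoteAddr));
        parts.add("user=" + orDash(remoteUser));
        parts.add("ua=" + quoted(userAgent));
        parts.add("referer=" + quoted(referer));
        parts.add("exception=" + orDash(exceptionClass));
        return String.join(" ", parts);
    }

    private static String orDash(String value) {
        return value == null || value.isEmpty() ? "-" : value;
    }

    private static String quoted(String value) {
        if (value == null || value.isEmpty()) {
            return "-";
        }
        // Escape backslashes first, then double quotes, so a header value
        // cannot close the quoted field early.
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

Quoting the user agent and referer keeps the line splittable on spaces, since those headers often contain spaces themselves.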

Query string values are redacted using Spark's existing spark.redaction.regex setting. Static resources and favicon requests are excluded by default to avoid low-value log volume.
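
A minimal sketch of query-string redaction, assuming a parameter's value is replaced whenever its URL-decoded name or value matches the configured pattern. QueryRedactor, the [redacted] placeholder, and the example pattern (?i)secret|password|token are hypothetical stand-ins; in Spark the pattern would come from spark.redaction.regex, and the PR's exact matching rules may differ.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class QueryRedactor {
    // Hypothetical placeholder text; the PR's actual replacement string may differ.
    static final String REDACTED = "[redacted]";

    static String redact(String query, Pattern redactionPattern) {
        if (query == null || query.isEmpty()) {
            return query;
        }
        List<String> out = new ArrayList<>();
        for (String pair : query.split("&", -1)) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                // A bare parameter such as "details" carries no value; keep it as-is.
                out.add(pair);
                continue;
            }
            String name = decode(pair.substring(0, eq));
            String value = decode(pair.substring(eq + 1));
            if (redactionPattern.matcher(name).find() || redactionPattern.matcher(value).find()) {
                // Keep the raw parameter name so the record stays useful for debugging.
                out.add(pair.substring(0, eq) + "=" + REDACTED);
            } else {
                out.add(pair);
            }
        }
        return String.join("&", out);
    }

    private static String decode(String s) {
        try {
            // Decode so that percent-encoded names like "pass%77ord" still match.
            return URLDecoder.decode(s, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            // Malformed percent-encoding: fall back to matching the raw text.
            return s;
        }
    }
}
```

Matching on the decoded text avoids missing a sensitive parameter that a client happened to percent-encode, while the logged output keeps the original encoding of everything that is not redacted.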

Why are the changes needed?

Spark History Server exposes application metadata through both web UI pages and REST APIs. Operators may need access records to troubleshoot unexpected clients, detect API polling, investigate failed requests, or satisfy operational audit requirements.

Today this generally requires deploying a reverse proxy or writing a custom spark.ui.filters servlet filter. This PR provides a built-in History Server access log while keeping the default behavior unchanged.

The feature is disabled by default, does not log request or response bodies, redacts query strings, and avoids logging static-resource noise by default.

Does this PR introduce any user-facing change?

Yes. This PR adds an optional History Server access log.

The default behavior is unchanged because spark.history.ui.accessLog.enabled defaults to false.

Users can enable it with:

spark.history.ui.accessLog.enabled=true

They can customize excluded path prefixes with:

spark.history.ui.accessLog.excludePaths=/static,/favicon.ico

Requests rejected by user-installed UI filters that run before Spark's internal access log filter may not be recorded.
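
This ordering caveat can be modeled with a toy filter chain in which each step either calls the next one or returns a status on its own. The Step and Chain types and FilterOrderDemo are hypothetical and do not use the servlet API; they only show why a request that an earlier user filter rejects never reaches a later access-log filter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a servlet filter: either delegates to next or answers directly.
interface Step {
    int apply(String path, Chain next);
}

final class Chain {
    private final List<Step> steps;
    private final int index;

    Chain(List<Step> steps, int index) {
        this.steps = steps;
        this.index = index;
    }

    int proceed(String path) {
        if (index == steps.size()) {
            // End of the chain: stands in for the UI page or REST endpoint.
            return 200;
        }
        return steps.get(index).apply(path, new Chain(steps, index + 1));
    }
}

final class FilterOrderDemo {
    static List<String> run(String path, boolean authorized) {
        List<String> logged = new ArrayList<>();
        // User-installed filter runs first and may short-circuit with 401.
        Step userAuth = (p, next) -> authorized ? next.proceed(p) : 401;
        // Access-log filter runs later, so it only sees requests that passed userAuth.
        Step accessLog = (p, next) -> {
            int status = next.proceed(p);
            logged.add(p + " " + status);
            return status;
        };
        new Chain(List.of(userAuth, accessLog), 0).proceed(path);
        return logged;
    }
}
```

With authorized set to false, the 401 is returned before the access-log step runs, so nothing is logged.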

How was this patch tested?

Tested locally with JDK 17 and in our cluster.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: OpenAI Codex (GPT-5.5)

qijiale76 force-pushed the history-server-access-log branch from 9362200 to e5682b5 on June 30, 2026 08:20