Skip to content

[SPARK-57466] Add option for dynamic-config to use mounted-ConfigMap#707

Open
csviri wants to merge 10 commits into
apache:mainfrom
csviri:dynamic-config-map-name-informer-mount
Open

[SPARK-57466] Add option for dynamic-config to use mounted-ConfigMap#707
csviri wants to merge 10 commits into
apache:mainfrom
csviri:dynamic-config-map-name-informer-mount

Conversation

@csviri

@csviri csviri commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

This PR adds a second, mount-based source for dynamic configuration overrides and makes the dynamic-config mechanism pluggable, while retaining the existing ConfigMap-informer behavior for backwards compatibility.

A new config option spark.kubernetes.operator.dynamicConfig.source selects the source:

  • configMap (default) — the existing behavior: a dedicated Operator runs SparkOperatorConfigMapReconciler, watching a ConfigMap via a Kubernetes informer. Requires RBAC to read ConfigMaps.
  • file — a new DynamicConfigMonitor periodically reloads a properties file mounted from a ConfigMap volume, requiring no extra RBAC.

Both paths funnel through the same SparkOperatorConfManager.refresh(...) + watched-namespace update logic, so runtime behavior is identical once overrides are applied.

Key changes:

  • New DynamicConfigMonitor (config/DynamicConfigMonitor.java) — a single-thread scheduled reader of a properties file with start() / stop() / isRunning() lifecycle and change detection against the last loaded
    snapshot.
  • SparkOperatorregisterDynamicConfig() dispatches on dynamicConfig.source: the file source returns a DynamicConfigMonitor; the configMap source registers the second informer Operator (restored
    registerSparkOperatorConfMonitor() / overrideConfigMonitorConfigs()). The operator still tracks a List<Operator> plus an optional monitor.
  • SparkOperatorConf — added dynamicConfig.source, dynamicConfig.filePath, dynamicConfig.reloadIntervalSeconds; retained dynamicConfig.selector and dynamicConfig.reconcilerParallelism.
  • Probes (HealthProbe, ReadinessProbe, ProbeService, ProbeUtil) — health/readiness cover all registered operators (so the informer operator is checked under configMap) and additionally the monitor's running
    state under file.
  • Helm chart:
    • dynamicConfig.source added to values.yaml (default configMap).
    • The ConfigMap Role/RoleBinding is created only when enable && source == configMap.
    • The dynamic-config volume and mount are added only when enable && source == file.
    • The dynamic ConfigMap renders a single spark-operator-dynamic.properties key for the file source, or raw per-key data for the configMap source.

Why are the changes needed?

The informer-based source requires the operator to hold cluster/namespace RBAC on ConfigMaps and runs a second JOSDK Operator purely to watch one ConfigMap. The mount-based source removes that RBAC requirement and the extra
informer by letting the kubelet project ConfigMap changes onto a volume that the operator polls. Rather than forcing a migration, this PR offers the mount-based approach as an opt-in alternative while keeping the informer as
the default, so existing deployments are unaffected.

Does this PR introduce any user-facing change?

Yes, additive and backwards-compatible:

  • New spark.kubernetes.operator.dynamicConfig.source option (default configMap). With the default, behavior is unchanged from the released operator.
  • New dynamicConfig.filePath and dynamicConfig.reloadIntervalSeconds options, used only by the file source.
  • New dynamicConfig.source Helm value (default configMap). Opting into file switches the chart to mount the ConfigMap as a volume (rendered under a single spark-operator-dynamic.properties key) and drops the ConfigMap
    RBAC Role/RoleBinding.

No change for users who do not set dynamicConfig.source.

How was this patch tested?

  • Unit tests for both sources:
    • configMapSparkOperatorConfigMapReconcilerTest and the configMap-source case in SparkOperatorTest (asserts a second Operator is registered and no DynamicConfigMonitor).
    • fileDynamicConfigMonitorTest (initial load, no-op reload when unchanged, refresh + namespace update on change, isRunning() lifecycle) and the file-source case in SparkOperatorTest.
  • Updated HealthProbeTest, ReadinessProbeTest, ProbeServiceTest for the List<Operator> + monitor signatures.
  • e2e watched-namespaces chainsaw test runs against the file source (tests/e2e/helm/dynamic-config-values{,-2}.yaml set source: file); ConfigMaps use the spark-operator-dynamic.properties layout and the propagation
    wait was raised to account for kubelet volume-sync plus the reload interval.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

@csviri csviri changed the title Replace dynamic-config informer with mounted-ConfigMap file watcher Replace dynamic-config controller with mounted-ConfigMap file watcher Jun 15, 2026
@csviri csviri changed the title Replace dynamic-config controller with mounted-ConfigMap file watcher [SPARK-57466] Replace dynamic-config controller with mounted-ConfigMap file watcher Jun 15, 2026
@csviri csviri marked this pull request as ready for review June 15, 2026 09:18
@csviri csviri marked this pull request as draft June 15, 2026 09:18
Rework the dynamic-config feature so the operator no longer runs a
dedicated Operator instance with a ConfigMap informer. The dynamic-config
ConfigMap is now mounted into the operator pod as a volume, and a new
DynamicConfigMonitor periodically re-reads the properties file from disk,
refreshing SparkOperatorConfManager and the watched-namespaces updater on
change.

- Add DynamicConfigMonitor with start/stop/isRunning lifecycle
- Simplify SparkOperator from List<Operator> to a single Operator plus an
  optional DynamicConfigMonitor; remove SparkOperatorConfigMapReconciler
- Replace dynamicConfig.selector with dynamicConfig.filePath and
  dynamicConfig.reloadIntervalSeconds
- Helm: drop configmaps RBAC Role/RoleBinding, mount the dynamic ConfigMap
  volume, render a single spark-operator-dynamic.properties key
- Probes: switch to a single Operator and include the monitor's running
  state in /healthz and /readyz

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@csviri csviri force-pushed the dynamic-config-map-name-informer-mount branch from fb3cb27 to 949aea7 Compare June 15, 2026 09:24
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
@csviri csviri marked this pull request as ready for review June 15, 2026 10:22
@csviri csviri marked this pull request as draft June 15, 2026 11:00
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
@csviri csviri marked this pull request as ready for review June 15, 2026 11:07
@csviri

csviri commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

this is a strange failure, tests are passing:
https://github.com/apache/spark-kubernetes-operator/actions/runs/26576971235

@dongjoon-hyun

Copy link
Copy Markdown
Member

Thank you, @csviri .

For the following question, it's just time-out due to the limit of ubuntu-slim GitHub Actions Virtual Environment. Please re-trigger the failed pipeline. We use ubuntu-slim because of the cost-savings.

this is a strange failure, tests are passing: https://github.com/apache/spark-kubernetes-operator/actions/runs/26576971235

Screenshot 2026-06-15 at 09 28 17

Comment thread docs/config_properties.md
csviri added 2 commits June 16, 2026 12:23
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
@csviri csviri marked this pull request as draft June 16, 2026 11:46
csviri added 3 commits June 16, 2026 14:05
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
@csviri csviri requested a review from dongjoon-hyun June 16, 2026 12:48
@csviri csviri marked this pull request as ready for review June 16, 2026 12:58
@csviri csviri changed the title [SPARK-57466] Replace dynamic-config controller with mounted-ConfigMap file watcher [SPARK-57466] Add option for dynamic-config to use mounted-ConfigMap Jun 16, 2026
@csviri

csviri commented Jun 16, 2026

Copy link
Copy Markdown
Contributor Author

@dongjoon-hyun updated, please take a look if this way with feature flag is how you imagined this. thx!

@dongjoon-hyun

Copy link
Copy Markdown
Member

Thank you, @csviri .

cc @aaruna , too.

@jiangzho jiangzho left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this feature - thanks @csviri !

}

@Test
void testHealthProbeWithInformerHealthWithMultiOperators() {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask if this is inevitable removal of irrelevant HealthProbe test coverage, @csviri ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new testHealthProbeWithDynamicConfigMonitor test coverage contains this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, nit properly, good catch, fixed that and ProbeServiceTest. thx!

data:
spark.kubernetes.operator.watchedNamespaces: default, spark-1
spark-operator-dynamic.properties: |
spark.kubernetes.operator.watchedNamespaces=default, spark-1

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to change this?

@csviri csviri Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, see below.

data:
spark.kubernetes.operator.watchedNamespaces: default
spark-operator-dynamic.properties: |
spark.kubernetes.operator.watchedNamespaces=default

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to change this?

@csviri csviri Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above: yes, afaik there is no way to mount all ConfigMap data as a single file in K8S. A key is always a new file. So we have to place the whole content under a key in ConfigMap.

Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
@dongjoon-hyun

Copy link
Copy Markdown
Member

Hi, @csviri . According to your comment, the following sounds irrelevant to this PR's proposal. Could you make a new PR only for that? We can proceed that independent topic before this PR.

yes, afaik there is not way to mount all ConfigMap data in K8S. A key is always a new file.

data:
- spark.kubernetes.operator.watchedNamespaces: default
+   spark-operator-dynamic.properties: |
+        spark.kubernetes.operator.watchedNamespaces=default

@csviri

csviri commented Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

Hi, @csviri . According to your comment, the following sounds irrelevant to this PR's proposal. Could you make a new PR only for that? We can proceed that independent topic before this PR.

yes, afaik there is not way to mount all ConfigMap data in K8S. A key is always a new file.

data:
- spark.kubernetes.operator.watchedNamespaces: default
+   spark-operator-dynamic.properties: |
+        spark.kubernetes.operator.watchedNamespaces=default

@dongjoon-hyun maybe I was not clear, those are needed/used in the chainsaw test, and needs to be in this format. Or do I miss something?

@dongjoon-hyun

Copy link
Copy Markdown
Member

No, you were clear.

maybe I was not clear, those are needed/used in the chainsaw test, and needs to be in this format. Or do I miss something?

Maybe, I wasn't clear enough. My perspective was the following.

  • This is supposed to introduce a new alternative path which we should not affect the existing functionality and test coverage.
  • In other words, this PR is supposed to introduce an independent set of test coverage instead of touching the existing test code and test resource.

So, if you need to change some existing test resource for any reasons like the following, it should be done before this PR independently.

A key is always a new file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants