[SPARK-57466] Add option for dynamic-config to use mounted-ConfigMap#707
[SPARK-57466] Add option for dynamic-config to use mounted-ConfigMap#707csviri wants to merge 10 commits into
Conversation
Rework the dynamic-config feature so the operator no longer runs a dedicated Operator instance with a ConfigMap informer. The dynamic-config ConfigMap is now mounted into the operator pod as a volume, and a new DynamicConfigMonitor periodically re-reads the properties file from disk, refreshing SparkOperatorConfManager and the watched-namespaces updater on change. - Add DynamicConfigMonitor with start/stop/isRunning lifecycle - Simplify SparkOperator from List<Operator> to a single Operator plus an optional DynamicConfigMonitor; remove SparkOperatorConfigMapReconciler - Replace dynamicConfig.selector with dynamicConfig.filePath and dynamicConfig.reloadIntervalSeconds - Helm: drop configmaps RBAC Role/RoleBinding, mount the dynamic ConfigMap volume, render a single spark-operator-dynamic.properties key - Probes: switch to a single Operator and include the monitor's running state in /healthz and /readyz Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
fb3cb27 to
949aea7
Compare
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
|
this is a strange failure, tests are passing: |
|
Thank you, @csviri . For the following question, it's just time-out due to the limit of
|
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
|
@dongjoon-hyun updated, please take a look if this way with feature flag is how you imagined this. thx! |
| } | ||
|
|
||
| @Test | ||
| void testHealthProbeWithInformerHealthWithMultiOperators() { |
There was a problem hiding this comment.
May I ask if this is inevitable removal of irrelevant HealthProbe test coverage, @csviri ?
There was a problem hiding this comment.
The new testHealthProbeWithDynamicConfigMonitor test coverage contains this?
There was a problem hiding this comment.
No, nit properly, good catch, fixed that and ProbeServiceTest. thx!
| data: | ||
| spark.kubernetes.operator.watchedNamespaces: default, spark-1 | ||
| spark-operator-dynamic.properties: | | ||
| spark.kubernetes.operator.watchedNamespaces=default, spark-1 |
There was a problem hiding this comment.
Do we need to change this?
| data: | ||
| spark.kubernetes.operator.watchedNamespaces: default | ||
| spark-operator-dynamic.properties: | | ||
| spark.kubernetes.operator.watchedNamespaces=default |
There was a problem hiding this comment.
Do we need to change this?
There was a problem hiding this comment.
Same as above: yes, afaik there is no way to mount all ConfigMap data as a single file in K8S. A key is always a new file. So we have to place the whole content under a key in ConfigMap.
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
|
Hi, @csviri . According to your comment, the following sounds irrelevant to this PR's proposal. Could you make a new PR only for that? We can proceed that independent topic before this PR.
data:
- spark.kubernetes.operator.watchedNamespaces: default
+ spark-operator-dynamic.properties: |
+ spark.kubernetes.operator.watchedNamespaces=default |
@dongjoon-hyun maybe I was not clear, those are needed/used in the chainsaw test, and needs to be in this format. Or do I miss something? |
|
No, you were clear.
Maybe, I wasn't clear enough. My perspective was the following.
So, if you need to change some existing test resource for any reasons like the following, it should be done before this PR independently.
|

What changes were proposed in this pull request?
This PR adds a second, mount-based source for dynamic configuration overrides and makes the dynamic-config mechanism pluggable, while retaining the existing ConfigMap-informer behavior for backwards compatibility.
A new config option
spark.kubernetes.operator.dynamicConfig.sourceselects the source:configMap(default) — the existing behavior: a dedicatedOperatorrunsSparkOperatorConfigMapReconciler, watching a ConfigMap via a Kubernetes informer. Requires RBAC to read ConfigMaps.file— a newDynamicConfigMonitorperiodically reloads a properties file mounted from a ConfigMap volume, requiring no extra RBAC.Both paths funnel through the same
SparkOperatorConfManager.refresh(...)+ watched-namespace update logic, so runtime behavior is identical once overrides are applied.Key changes:
DynamicConfigMonitor(config/DynamicConfigMonitor.java) — a single-thread scheduled reader of a properties file withstart()/stop()/isRunning()lifecycle and change detection against the last loadedsnapshot.
SparkOperator—registerDynamicConfig()dispatches ondynamicConfig.source: thefilesource returns aDynamicConfigMonitor; theconfigMapsource registers the second informerOperator(restoredregisterSparkOperatorConfMonitor()/overrideConfigMonitorConfigs()). The operator still tracks aList<Operator>plus an optional monitor.SparkOperatorConf— addeddynamicConfig.source,dynamicConfig.filePath,dynamicConfig.reloadIntervalSeconds; retaineddynamicConfig.selectoranddynamicConfig.reconcilerParallelism.HealthProbe,ReadinessProbe,ProbeService,ProbeUtil) — health/readiness cover all registered operators (so the informer operator is checked underconfigMap) and additionally the monitor's runningstate under
file.dynamicConfig.sourceadded tovalues.yaml(defaultconfigMap).Role/RoleBindingis created only whenenable && source == configMap.enable && source == file.spark-operator-dynamic.propertieskey for thefilesource, or raw per-key data for theconfigMapsource.Why are the changes needed?
The informer-based source requires the operator to hold cluster/namespace RBAC on ConfigMaps and runs a second JOSDK
Operatorpurely to watch one ConfigMap. The mount-based source removes that RBAC requirement and the extrainformer by letting the kubelet project ConfigMap changes onto a volume that the operator polls. Rather than forcing a migration, this PR offers the mount-based approach as an opt-in alternative while keeping the informer as
the default, so existing deployments are unaffected.
Does this PR introduce any user-facing change?
Yes, additive and backwards-compatible:
spark.kubernetes.operator.dynamicConfig.sourceoption (defaultconfigMap). With the default, behavior is unchanged from the released operator.dynamicConfig.filePathanddynamicConfig.reloadIntervalSecondsoptions, used only by thefilesource.dynamicConfig.sourceHelm value (defaultconfigMap). Opting intofileswitches the chart to mount the ConfigMap as a volume (rendered under a singlespark-operator-dynamic.propertieskey) and drops the ConfigMapRBAC
Role/RoleBinding.No change for users who do not set
dynamicConfig.source.How was this patch tested?
configMap—SparkOperatorConfigMapReconcilerTestand the configMap-source case inSparkOperatorTest(asserts a secondOperatoris registered and noDynamicConfigMonitor).file—DynamicConfigMonitorTest(initial load, no-op reload when unchanged, refresh + namespace update on change,isRunning()lifecycle) and the file-source case inSparkOperatorTest.HealthProbeTest,ReadinessProbeTest,ProbeServiceTestfor theList<Operator>+ monitor signatures.watched-namespaceschainsaw test runs against thefilesource (tests/e2e/helm/dynamic-config-values{,-2}.yamlsetsource: file); ConfigMaps use thespark-operator-dynamic.propertieslayout and the propagationwait was raised to account for kubelet volume-sync plus the reload interval.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.8)