[FLINK-39555] FLIP-575: Scaling Executor Plugin SPI for Flink Autoscaler#1085
Open
Dennis-Mircea wants to merge 1 commit intoapache:mainfrom
Open
[FLINK-39555] FLIP-575: Scaling Executor Plugin SPI for Flink Autoscaler#1085Dennis-Mircea wants to merge 1 commit intoapache:mainfrom
Dennis-Mircea wants to merge 1 commit intoapache:mainfrom
Conversation
d922cef to
e44c647
Compare
Contributor
Author
|
This FLIP PR is now ready for the final review. NOTE: If the #1099 PR will be merged before (and most probably it will) there will be some conflicts that will must be resolved afterwards. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What is the purpose of the change
This pull request introduces the
ScalingExecutorPluginSPI as defined in FLIP-575: Scaling Executor Plugin SPI for Flink Autoscaler. The SPI allows users to plug in custom logic that intercepts, modifies, or vetoes computed scaling decisions before they are applied by the autoscaler. It is complementary to FLIP-514: Custom Evaluator plugin for Flink Autoscaler, which extends the metric evaluation layer; this contribution extends the scaling decision execution layer.Unlike
FlinkAutoscalerEvaluator(capped at one instance per job),ScalingExecutorPluginnatively supports multiple named instances chained together in a deterministic priority order, enabling composition of orthogonal concerns (resource gating, parallelism alignment, cost capping, audit, …) without coupling them inside a single plugin.NOTE: This PR ensures the full alignment with the
[FLINK-39511] FLIP-514: Custom Evaluator plugin for Flink AutoscalerPR (#1099).Brief change log
New SPI -
org.apache.flink.autoscaler.ScalingExecutorPlugin<KEY, CTX>with:priority()(default0, lower runs first) for deterministic chain ordering.apply(Context, Map<JobVertexID, ScalingSummary>)returning a (possibly modified) summary map; an empty map vetoes the scaling operation.Contextbundling theJobAutoScalerContext, the prefix-stripped per-instanceConfiguration, the evaluated metrics, and the job topology.Discovery / selection split - mirrors the
FlinkAutoscalerEvaluatorflow:flink-kubernetes-operator/AutoscalerUtils#discoverCustomScalingExecutors(Configuration)via the FlinkPluginUtilsmechanism;flink-autoscaler-standalone/AutoscalerUtils#discoverCustomScalingExecutors(Configuration)via JavaServiceLoader.ScalingExecutor#resolveCustomScalingExecutors(Configuration)matches the configured chain against the discovered SPI bag, sorts bypriority(), and returns an unmodifiable orderedCollection<Map.Entry<String, ScalingExecutorPlugin>>carrying the user-chosen<name>for each resolved instance. This guarantees that per-job overrides viaFlinkDeployment.spec.flinkConfigurationtake effect on the next reconcile without restarting the operator.Configuration (metric-reporter idiom) in
AutoScalerOptions:job.autoscaler.scaling.custom-executors- list of named instances.job.autoscaler.scaling.custom-executor.<name>.class- implementation FQN.job.autoscaler.scaling.custom-executor.<name>.<parameter>- free-form per-instance options, delivered to the plugin viaContext.getConfiguration()with the per-instance prefix stripped.kubernetes.operator.-prefixed form as a fallback; canonical wins on overlap.customScalingExecutorClassKey,customScalingExecutorClassFallbackKey,customScalingExecutorClassOption,customScalingExecutorConfiguration.ScalingExecutorintegration - invokesapplyScalingExecutorPluginsafter the blocked check, after memory tuning, and after the resource-quota gate, but before persisting parallelism overrides. Approve / veto outcomes are surfaced asScalingApproved/ScalingVetoedKubernetes events withmessageKey = <reason>:<instanceName>(so chained instances of the same plugin class do not de-duplicate). An INFO log lineCustom scaling executors resolved and sorted by priority: [...]is emitted per reconciliation listing each[name, class, priority]entry in execution order.Documentation - new
## Custom Scaling Executorssection indocs/content/docs/operations/plugins.mdcovering the interface, SPI service file, packaging, YAML configuration, operator + standalone deployment with the corresponding discovery log lines, and the per-reconciliation chain log hint.Verifying this change
This change added tests and can be verified as follows:
flink-autoscaler/ScalingExecutorTest- extended with a dedicated nestedScalingExecutorPluginTestsuite covering:priority()-driven chain ordering using high / default / low priority instances registered out of order.<reason>:<instanceName>messageKey, including the no-plugins-registered case.kubernetes.operator.-prefixed fallback for both thescaling.custom-executorslist and the per-instance<name>.classkey.Configurationdelivery (canonical prefix, legacy prefix, and canonical-wins-on-overlap semantics).flink-autoscaler-standalone/AutoscalerUtilsTest- verifiesServiceLoader-based discovery returns the registeredScalingExecutorPlugininstances.flink-kubernetes-operator/AutoscalerUtilsTest- verifies plugin-directory discovery viaPluginUtils.Does this pull request potentially affect one of the following parts:
CustomResourceDescriptors: noDocumentation