[FLINK-39511] FLIP-514: Custom Evaluator plugin for Flink Autoscaler #1099
Dennis-Mircea wants to merge 4 commits into apache:main from
Question & Follow-up: single-evaluator limitation vs. config namespace shape

As it stands, At the same time,

job.autoscaler.metrics.custom-evaluator.name: custom-evaluator
job.autoscaler.metrics.custom-evaluator.custom-evaluator.target-data-rate: 100000.0

That repeated I see two coherent directions and would like the maintainers' preference before pushing further commits:

Option A - keep the single-evaluator contract, simplify the namespace. Drop the

job.autoscaler.metrics.custom-evaluator-name: custom-evaluator
job.autoscaler.metrics.custom-evaluator.target-data-rate: 100000.0

Option A has the following pros and cons:
Option B - lift the limitation, align the config shape with Flink metric reporters and with FLIP-514. Flink's metric-reporter config is the established idiom for "list of named instances, each with its own class and its own bag of options":

metrics.reporters: my_jmx_reporter,my_other_reporter
metrics.reporter.my_jmx_reporter.factory.class: org.apache.flink.metrics.jmx.JMXReporterFactory
metrics.reporter.my_jmx_reporter.port: 9020-9040
metrics.reporter.my_other_reporter.factory.class: org.apache.flink.metrics.graphite.GraphiteReporterFactory
metrics.reporter.my_other_reporter.host: 192.168.1.1
metrics.reporter.my_other_reporter.port: 10000

Applied to custom evaluators, the mapping is:
Concretely:

job.autoscaler.metrics.custom-evaluators: custom-evaluator-1,custom-evaluator-2
job.autoscaler.metrics.custom-evaluator.custom-evaluator-1.class: org.apache.flink.autoscaler.custom.MyFirstEvaluator
job.autoscaler.metrics.custom-evaluator.custom-evaluator-1.target-data-rate: 100000.0
job.autoscaler.metrics.custom-evaluator.custom-evaluator-2.class: org.apache.flink.autoscaler.custom.MySecondEvaluator
job.autoscaler.metrics.custom-evaluator.custom-evaluator-2.some-threshold: 0.8

The consequence is that the To keep the ordering contract between multiple custom evaluators explicit and self-describing rather than implicit in the declaration order of
Option B has the following pros and cons:
My preference would be to land Option B so we match FLIP-514 directly and don't ship a config surface we know we'll have to migrate away from. Thoughts?
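A minimal sketch of how the Option B key shape could be resolved, for discussion only. It assumes the flat keys shown above and uses plain `java.util` maps instead of Flink's `Configuration`. The class `EvaluatorConfigSketch` and its `resolve` method are hypothetical names that do not exist in the PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of the PR): resolves reporter-style evaluator options. */
final class EvaluatorConfigSketch {
    // Key names taken from the Option B example; everything else is illustrative.
    static final String LIST_KEY = "job.autoscaler.metrics.custom-evaluators";
    static final String PREFIX = "job.autoscaler.metrics.custom-evaluator.";

    /** Returns evaluator name -> (option suffix -> value), in declaration order. */
    static Map<String, Map<String, String>> resolve(Map<String, String> flat) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        String names = flat.getOrDefault(LIST_KEY, "");
        for (String raw : names.split(",")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue; // tolerate an empty list or stray commas
            }
            // The trailing dot keeps "custom-evaluator-1" from matching "custom-evaluator-10".
            String scope = PREFIX + name + ".";
            Map<String, String> scoped = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : flat.entrySet()) {
                if (e.getKey().startsWith(scope)) {
                    scoped.put(e.getKey().substring(scope.length()), e.getValue());
                }
            }
            result.put(name, scoped);
        }
        return result;
    }
}
```

With the example keys above, `resolve` would yield two entries, each carrying its own `class` option plus its evaluator-specific options, which is the same per-instance scoping the metric-reporter idiom provides.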
@Dennis-Mircea let's go for option 2, but to limit the change and its complexity we can start by supporting only a single evaluator plugin and add the priority/ordering logic in a follow-up JIRA.
Squashed from PR apache#953.

Original commits by Pradeepta Choudhury <pchoudhury22@apple.com>:
- Draft PR for plugin based approach for custom evaluator for scaling metric evaluation
- Add sample custom evaluator for simple trend adjustor and fallback key support with delegating configuration
- Add getName method to CustomEvaluator and javadocs
- Refactor tests for the plugin and pass null for custom evaluators in standalone autoscaler entrypoint
- Rename evaluator plugin to match naming convention for plugins

Co-authored-by: Pradeepta Choudhury <pchoudhury22@apple.com>
Force-pushed from d7bc703 to bdd087b.
Option 2 is now fully implemented, and FLINK-39554 was created as the follow-up JIRA. The PR is now ready for final review. One last question and clarification about the
What is the purpose of the change
This PR implements FLIP-514 and continues the work started in PR #953.
It introduces a pluggable FlinkAutoscalerEvaluator SPI that lets users provide custom scaling-metric evaluation logic on top of the metrics evaluated internally by the autoscaler, both in the Flink Kubernetes Operator (via the Flink Plugins mechanism) and in flink-autoscaler-standalone (via the standard Java ServiceLoader). The map returned by the evaluator is merged on top of the internally evaluated metrics for each job vertex, so users can override or augment specific ScalingMetric values (e.g. TARGET_DATA_RATE, TRUE_PROCESSING_RATE, CATCH_UP_DATA_RATE) without forking the autoscaler.

NOTE: This PR is fully aligned with the [FLINK-39555] FLIP-575: Scaling Executor Plugin SPI for Flink Autoscaler PR (#1085).

Brief change log

- FlinkAutoscalerEvaluator SPI (flink-autoscaler) with getName(), evaluateVertexMetrics(vertex, evaluatedMetrics, Context) and a Context exposing an unmodifiable view of jobConf, metricsHistory, previously evaluated vertex metrics, topology, processingBacklog, restartTime and the evaluator-specific customEvaluatorConf.
- job.autoscaler.metrics.custom-evaluator.name config option and AutoScalerOptions#forCustomEvaluator(conf, name) helper, which builds a DelegatingConfiguration scoped to job.autoscaler.metrics.custom-evaluator.<name>. for the evaluator.
- JobAutoScalerImpl#getCustomEvaluatorIfRequired and passed the resolved Tuple2<FlinkAutoscalerEvaluator, Configuration> down to ScalingMetricEvaluator, which now calls the evaluator per vertex in topological order and merges the returned metrics on top of the internally evaluated ones.
- org.apache.flink.kubernetes.operator.autoscaler.AutoscalerUtils#discoverCustomEvaluators(Configuration) using the Flink PluginManager (/opt/flink/plugins). The discovered evaluators are injected into JobAutoScalerImpl from FlinkOperator.
- org.apache.flink.autoscaler.standalone.AutoscalerUtils#discoverCustomEvaluators() using ServiceLoader.load(FlinkAutoscalerEvaluator.class) and wired it into StandaloneAutoscalerEntrypoint#createJobAutoscaler.
- docs/.../operations/plugins.md under a new Custom Flink Autoscaler Evaluator section, including a warning hint that a single custom evaluator is supported per pipeline today.

Verifying this change
This change added tests and can be verified as follows:
- (flink-autoscaler): JobAutoScalerImplTest#testGetCustomEvaluatorIfRequired covers the registry lookup (configured name present / absent / unknown) and the per-evaluator DelegatingConfiguration built by AutoScalerOptions.forCustomEvaluator. MetricsCollectionAndEvaluationTest exercises end-to-end evaluation with a TestCustomEvaluator override on TARGET_DATA_RATE for source vertices and asserts that the custom value wins over the internally computed one while the rest of the metrics remain untouched.
- (flink-kubernetes-operator): TestingFlinkDeploymentController / FlinkOperator wiring verified by existing controller tests loading the TestCustomEvaluator via META-INF/services.
- (flink-autoscaler-standalone): AutoscalerUtilsTest#testDiscoverCustomEvaluators registers TestCustomEvaluator through META-INF/services/org.apache.flink.autoscaler.metrics.FlinkAutoscalerEvaluator and asserts that AutoscalerUtils.discoverCustomEvaluators() finds it keyed by getName().

Does this pull request potentially affect one of the following parts:

- CustomResourceDescriptors: no

Documentation
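To illustrate the override semantics described in the purpose section, here is a rough, self-contained sketch of the name-keyed lookup and of the "merge on top" step. It is not the PR's code. `Metric`, `EvaluatorSketch` and `MergeSketch` are simplified, hypothetical stand-ins for `ScalingMetric`, `FlinkAutoscalerEvaluator` and the logic in `ScalingMetricEvaluator`, and the real `evaluateVertexMetrics` takes a `Context` rather than a bare map.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;

/** Simplified stand-in for the autoscaler's ScalingMetric enum (assumption). */
enum Metric { TARGET_DATA_RATE, TRUE_PROCESSING_RATE, CATCH_UP_DATA_RATE }

/** Simplified stand-in for the SPI; the real interface also receives a Context. */
interface EvaluatorSketch {
    String getName();

    Map<Metric, Double> evaluateVertexMetrics(String vertex, Map<Metric, Double> evaluated);
}

final class MergeSketch {
    /** Looks up the configured evaluator in the discovered registry, keyed by getName(). */
    static Optional<EvaluatorSketch> lookup(
            Map<String, EvaluatorSketch> discovered, String configuredName) {
        if (configuredName == null) {
            return Optional.empty(); // no custom evaluator configured
        }
        return Optional.ofNullable(discovered.get(configuredName)); // empty if unknown
    }

    /** Merges the evaluator's result on top of the internally evaluated metrics. */
    static Map<Metric, Double> evaluate(
            String vertex, Map<Metric, Double> internal, EvaluatorSketch evaluatorOrNull) {
        Map<Metric, Double> merged = new EnumMap<>(Metric.class);
        merged.putAll(internal);
        if (evaluatorOrNull != null) {
            // The evaluator only sees a read-only view; returned entries override.
            merged.putAll(
                    evaluatorOrNull.evaluateVertexMetrics(
                            vertex, Collections.unmodifiableMap(internal)));
        }
        return merged;
    }
}
```

In this sketch, an evaluator that returns only `TARGET_DATA_RATE` replaces that one value and leaves every other internally computed metric untouched, and an absent or unknown name falls back to the internal metrics unchanged.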