[hotfix] Validate jarURI in DefaultValidator #1104
Extend DefaultValidator.validateJobSpec to inspect the JobSpec jarURI: malformed URIs are rejected, the scheme must be in a configurable allowlist, and for http/https the host must not resolve to loopback, link-local, site-local, wildcard or multicast addresses.

Two new config options control the behaviour:
- kubernetes.operator.user.artifacts.allowed-schemes (List<String>, default: "https", "local")
- kubernetes.operator.user.artifacts.disallow-restricted-hosts (Boolean, default: true)

Update the FlinkSessionJob overview docs (en and zh) to describe the new defaults and the override knobs, and regenerate the operator config reference.

Signed-off-by: Andrea Cosentino <ancosen@gmail.com>
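The checks described in the commit message above can be illustrated with a small standalone sketch. It is not the code from this PR: the class name `JarUriCheckSketch`, the method shape, the error strings, and the choice to reject unresolvable hosts are all assumptions made for illustration. It uses only `java.net.URI` and `java.net.InetAddress` from the JDK.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.UnknownHostException;
import java.util.Collection;
import java.util.Optional;

/** Illustrative sketch only; not the operator's actual DefaultValidator code. */
final class JarUriCheckSketch {

    /** Returns an error message when the URI is rejected, or Optional.empty() when it passes. */
    static Optional<String> validateJarUri(
            String jarUri, Collection<String> allowedSchemes, boolean disallowRestrictedHosts) {
        final URI uri;
        try {
            uri = new URI(jarUri); // assumes jarUri is non-null
        } catch (URISyntaxException e) {
            return Optional.of("Malformed jarURI: " + e.getMessage());
        }

        String scheme = uri.getScheme();
        if (scheme == null) {
            return Optional.of("jarURI has no scheme: " + jarUri);
        }
        // Scheme matching is case-insensitive.
        boolean allowed = allowedSchemes.stream().anyMatch(s -> s.equalsIgnoreCase(scheme));
        if (!allowed) {
            return Optional.of("jarURI scheme '" + scheme + "' is not in the allowlist");
        }

        boolean isHttp = scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https");
        if (!disallowRestrictedHosts || !isHttp) {
            return Optional.empty(); // host check is skipped when opted out or for other schemes
        }

        String host = uri.getHost();
        if (host == null) {
            return Optional.of("jarURI has no host: " + jarUri);
        }
        try {
            // Check every address the host resolves to, not only the first one.
            for (InetAddress address : InetAddress.getAllByName(host)) {
                if (isRestricted(address)) {
                    return Optional.of("jarURI host resolves to a restricted address: " + host);
                }
            }
        } catch (UnknownHostException e) {
            // Assumption of this sketch: an unresolvable host is rejected.
            return Optional.of("jarURI host cannot be resolved: " + host);
        }
        return Optional.empty();
    }

    /** Loopback, link-local, site-local, wildcard (any-local) or multicast. */
    static boolean isRestricted(InetAddress address) {
        return address.isLoopbackAddress()
                || address.isLinkLocalAddress()
                || address.isSiteLocalAddress()
                || address.isAnyLocalAddress()
                || address.isMulticastAddress();
    }
}
```

With an allowlist of `https` only, a URI such as `https://127.0.0.1/job.jar` yields an error in this sketch, while the same URI passes once the restricted-host flag is set to `false`. A check like this resolves the host at validation time, so it does not by itself guard against the name resolving differently when the JAR is later downloaded.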
Configuration configuration = Configuration.fromMap(confMap);
Optional<String> jarUriError = validateJarURI(job.getJarURI(), configuration);
I don't really think that we should validate Application jarURIs here. I think session job validation makes sense because of the http download and local access on the operator pod but for applications these don't really matter.
Done in c38f8bd. Moved the call out of validateJobSpec so application clusters are no longer touched, and added it to validateSessionJobOnly which runs on every session job submission.
| </tr> | ||
| <tr> | ||
| <td><h5>kubernetes.operator.user.artifacts.allowed-schemes</h5></td> | ||
| <td style="word-wrap: break-word;">"https";<wbr>"local"</td> |
if we remove application validation then local can be removed as well
Done. local is gone from the default allowlist (default is now https only), and the option moved to the system section.
+ "schemes (such as 's3' or 'hdfs') can extend this list. "
+ "Scheme matching is case-insensitive.");
@Documentation.Section(SECTION_DYNAMIC)
These configs should go into SECTION_SYSTEM and be resolved in FlinkOperatorConfiguration; otherwise the user would be able to override them from their CR's config.
Good catch, done. Both options are now SECTION_SYSTEM, resolved via FlinkOperatorConfiguration.fromConfiguration (new jarUriAllowedSchemes / jarUriDisallowRestrictedHosts fields), and read by the validator from configManager.getOperatorConfiguration(). A new test in DefaultValidatorTest#testSessionJobJarUriValidationUsesOperatorConfig confirms a CR-supplied override of these keys is ignored.
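The idea behind this change, reading the allowlist only from operator-level settings so that a CR cannot widen it, can be sketched in isolation. Everything here is hypothetical and simplified: `OperatorLevelConfigSketch`, `OperatorSettings`, and `schemeAllowed` are invented names, plain maps stand in for the operator and CR configuration, and the semicolon-separated list format is an assumption of the sketch rather than something confirmed by this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; names and structure are hypothetical. */
final class OperatorLevelConfigSketch {

    static final String ALLOWED_SCHEMES_KEY =
            "kubernetes.operator.user.artifacts.allowed-schemes";

    /** Stand-in for settings resolved once from the operator's own configuration. */
    static final class OperatorSettings {
        final List<String> allowedSchemes;

        private OperatorSettings(List<String> allowedSchemes) {
            this.allowedSchemes = allowedSchemes;
        }

        static OperatorSettings fromOperatorConfig(Map<String, String> operatorConf) {
            // Assumption: list values are semicolon-separated; the default is "https".
            String raw = operatorConf.getOrDefault(ALLOWED_SCHEMES_KEY, "https");
            List<String> schemes = new ArrayList<>();
            for (String part : raw.split(";")) {
                if (!part.trim().isEmpty()) {
                    schemes.add(part.trim());
                }
            }
            return new OperatorSettings(schemes);
        }
    }

    /**
     * Decides from the operator-level settings only. The CR-supplied map is
     * accepted to make the point explicit, but it is deliberately never read.
     */
    static boolean schemeAllowed(
            String scheme, OperatorSettings settings, Map<String, String> crFlinkConfiguration) {
        return settings.allowedSchemes.stream().anyMatch(s -> s.equalsIgnoreCase(scheme));
    }
}
```

In this sketch a CR that sets the allowed-schemes key to include `http` has no effect: the decision depends solely on what the operator was started with, which is the property the new test in the reply above is described as checking.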
- Drop jarURI validation for FlinkDeployment; only FlinkSessionJob is validated since application clusters do not fetch the jar through the operator pod.
- Move JAR_URI_ALLOWED_SCHEMES and JAR_URI_DISALLOW_RESTRICTED_HOSTS to SECTION_SYSTEM and resolve them via FlinkOperatorConfiguration so a user-supplied flinkConfiguration in a CR cannot override the operator-level allowlist.
- Drop "local" from the default allowlist now that application clusters bypass validation; default is "https" only.
- Switch the static validateJarURI helper to take a Collection of allowed schemes and a boolean directly; call it from validateSessionJobOnly via the resolved operator configuration.
- Update tests: replace FlinkDeployment-based jarURI tests with direct unit tests on the static helper plus a session-job integration test that verifies a CR override is ignored.
- Add SAMPLE_SESSION_JOB_JAR test constant to keep existing session job test fixtures valid under the stricter default allowlist.
- Refresh docs (en + zh) and regenerate the operator config reference; the two options now appear under the system section.

Signed-off-by: Andrea Cosentino <ancosen@gmail.com>
What
Validate the jarURI field on FlinkSessionJob submission: reject malformed URIs, schemes outside an allowlist, and http/https URIs whose host resolves to loopback, link-local, site-local, wildcard or multicast addresses. FlinkDeployment is intentionally not validated, since application clusters reference a JAR shipped inside the image (e.g. local://) and the operator never fetches it.

Config options
Both options are operator-level (SECTION_SYSTEM), resolved via FlinkOperatorConfiguration so a user-supplied flinkConfiguration in a CR cannot override them.
- kubernetes.operator.user.artifacts.allowed-schemes (List<String>, default https)
- kubernetes.operator.user.artifacts.disallow-restricted-hosts (Boolean, default true)

Tests
- mvn -pl flink-kubernetes-operator test → 2133 / 0
- mvn -pl flink-kubernetes-webhook test → 105 / 0
- mvn clean install -DskipTests (full reactor) → BUILD SUCCESS

DefaultValidatorTest covers:
- testJarUriSchemeValidation and testJarUriHostValidation: direct unit tests on the static helper (allowed/disallowed schemes, malformed URIs, missing schemes, loopback/link-local/site-local/wildcard/multicast hosts, opt-out flag).
- testSessionJobJarUriValidationUsesOperatorConfig: verifies the validator reads the allowlist from the operator-level configuration and that a CR-supplied override of those keys is ignored.