Skip to content

[FLINK-40005][cdc-cli] CliExecutor application mode resolves pipeline definition from local path, remote path or inline content#4451

Open
Zhile wants to merge 1 commit into
apache:masterfrom
Zhile:FLINK-40005-cli-executor-read-local-file
Open

[FLINK-40005][cdc-cli] CliExecutor application mode resolves pipeline definition from local path, remote path or inline content#4451
Zhile wants to merge 1 commit into
apache:masterfrom
Zhile:FLINK-40005-cli-executor-read-local-file

Conversation

@Zhile

@Zhile Zhile commented Jun 29, 2026

Copy link
Copy Markdown

What is the purpose of the change

Fix [FLINK-40005]: in application mode, CliExecutor.main failed to start a YAML pipeline with Missing required field "source" in top-level configuration.

args[0] was passed straight to YamlPipelineDefinitionParser.parse(String, ...), which treats its String argument as YAML content. When args[0] is a file path, the path itself is parsed as YAML, producing a document without source/sink and failing validation.

The two native application deployment executors share CliExecutor.main but pass args[0] with different semantics — the root of the problem:

  • K8SApplicationDeploymentExecutor sets APPLICATION_ARGS = commandLine.getArgList(), so args[0] is the pipeline definition file path (shipped into the JobManager container, e.g. mounted via a ConfigMap by the Flink Kubernetes Operator).
  • YarnApplicationDeploymentExecutor reads the file on the client side and sets APPLICATION_ARGS to the pipeline definition content.

So a fix that only reads args[0] as a path would fix K8S but regress YARN (whose args[0] is already content, not a path).

Regression introduced by #3643 (FLINK-35360, Support Yarn application mode for yaml job).

Brief change log

  • CliExecutor.main now resolves args[0] via resolvePipelineDef(...), in three cases:
    1. Explicit-scheme path (s3:///hdfs:///oss:///file://) — read through Flink's FileSystem so the matching plugin resolves it.
    2. Bare local file path — read with the local JVM file API (Files.readAllBytes) rather than Flink's FileSystem, whose cluster default may be S3/HDFS and would not resolve a local path shipped next to the JobManager.
    3. Otherwise the value is already the pipeline definition content (YARN) — parsed verbatim.
  • Fixes K8S application mode without regressing YARN, and additionally supports placing the pipeline definition file on remote storage (S3/HDFS/OSS).
  • Update Kubernetes deployment docs (EN + ZH): the FlinkDeployment example now uses entryClass: org.apache.flink.cdc.cli.CliExecutor without --use-mini-cluster, and the "native application mode is not supported" note is removed.

Verifying this change

  • CliExecutorTest covers all three resolution paths (local path / scheme path via file://, same code path as s3:// / inline content) plus the original regression.
  • Manually verified on a Flink 1.20 Kubernetes Application-mode deployment (Flink Kubernetes Operator): the pipeline starts and runs; previously it failed with Missing required field "source".

Does this pull request potentially affect one of the following parts

  • Dependencies: no
  • The public API: no
  • The serializers / checkpointing: no

@github-actions github-actions Bot added the cli label Jun 29, 2026
@Zhile

Zhile commented Jun 29, 2026

Copy link
Copy Markdown
Author

@MOBIN-F can you help to review? I'm not sure if this applies the original design.

… definition from local path, remote path or inline content

In application mode, CliExecutor.main received args[0] and passed it to
YamlPipelineDefinitionParser.parse(String, Configuration), which treats the
String as YAML content. The path string itself was parsed as YAML and failed
validation with: Missing required field "source" in top-level configuration.

The two native application deployment executors share CliExecutor.main but pass
args[0] with different semantics:
- K8SApplicationDeploymentExecutor sets APPLICATION_ARGS = commandLine.getArgList(),
  so args[0] is the pipeline definition FILE PATH (shipped into the JobManager
  container, e.g. mounted via a ConfigMap by the Flink Kubernetes Operator).
- YarnApplicationDeploymentExecutor reads the file on the client side and sets
  APPLICATION_ARGS to the pipeline definition CONTENT.

Resolve args[0] in three cases, in order:
1. An explicit-scheme path (s3://, hdfs://, oss://, file://) is read through
   Flink's FileSystem so the matching plugin resolves it. The scheme is explicit,
   so it is not at risk of being hijacked by the cluster default FileSystem.
2. A bare local file path is read with the local JVM file API (java.nio
   Files.readAllBytes) instead of Flink's FileSystem, whose cluster default may
   be S3/HDFS and would not resolve a local path shipped next to the JobManager.
3. Otherwise the value is already the pipeline definition content (YARN) and is
   parsed verbatim.

This fixes K8S application mode without regressing YARN application mode, and
also supports placing the pipeline definition file on remote storage.

Also update the Kubernetes deployment docs (EN + ZH): the FlinkDeployment example
now uses entryClass org.apache.flink.cdc.cli.CliExecutor without --use-mini-cluster,
and the "native application mode is not supported" note is removed.

Regression introduced by apache#3643 (FLINK-35360, Support Yarn application mode for
yaml job).
@Zhile Zhile force-pushed the FLINK-40005-cli-executor-read-local-file branch from a96ec21 to 0e10c04 Compare June 30, 2026 06:30
@github-actions github-actions Bot added the docs Improvements or additions to documentation label Jun 30, 2026
@Zhile Zhile changed the title [FLINK-40005][cdc-cli] CliExecutor application mode reads pipeline definition from local file path [FLINK-40005][cdc-cli] CliExecutor application mode resolves pipeline definition from local path, remote path or inline content Jun 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cli docs Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant