Description
When running ingestion via the DataHub executor (acryl-datahub-executor v1.4.0.3), the ingestion subprocess completes successfully and exits cleanly, but the executor wrapper never detects the exit. The job remains permanently stuck in RUNNING in the DataHub UI and never transitions to SUCCESS.
This is not a pipeline issue: the data is ingested correctly. The problem appears to be in the executor's subprocess monitoring loop.
Steps to Reproduce
- Run any scheduled ingestion source via the DataHub executor (confirmed with iceberg and azure_ad).
- Wait for the pipeline subprocess to complete.
- Observe that the run status never changes from Running to Success in the UI.
Expected Behavior
Job transitions to SUCCESS after the subprocess exits.
Actual Behavior
- Subprocess exits cleanly (Pipeline finished successfully, pending_requests: 0, checkpoint committed).
- Executor wrapper continues sending unchanged RUNNING heartbeats to GMS.
- GMS logs:
  Skipped producing MCL for ingested aspect dataHubExecutionRequestResult ... Aspect has not changed.
- "Stale logs" warning appears ~4 minutes after completion; the job never resolves.
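To illustrate why the UI stays on RUNNING, here is a simplified, hypothetical model of the behavior the GMS log line suggests: a write whose value equals the stored aspect is skipped and produces no change event (MCL), so a stream of identical RUNNING heartbeats never surfaces a new state. AspectStore, ingest, and the URN below are illustrative assumptions, not DataHub's actual GMS code.

```python
from dataclasses import dataclass, field


@dataclass
class AspectStore:
    """Hypothetical, simplified aspect store: a change event is emitted only
    when the written value differs from the currently stored value."""

    values: dict = field(default_factory=dict)
    emitted: list = field(default_factory=list)

    def ingest(self, urn: str, aspect: str, value: dict) -> bool:
        key = (urn, aspect)
        if self.values.get(key) == value:
            # Analogous to "Skipped producing MCL ... Aspect has not changed."
            return False
        self.values[key] = value
        self.emitted.append((urn, aspect, value))
        return True


store = AspectStore()
urn = "urn:li:dataHubExecutionRequest:example"  # hypothetical URN for illustration
for _ in range(3):
    # Repeated identical heartbeats: only the first one produces an event.
    store.ingest(urn, "dataHubExecutionRequestResult", {"status": "RUNNING"})
print(len(store.emitted))  # 1
```

In this model only a write carrying a different status (for example SUCCESS) would produce a new event, which is the write the executor never sends.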
Evidence
- Pipeline finished at 21:20:22 with 0 failures, 0 warnings, and all events confirmed.
- Stale-logs warning at ~21:24:12, 230 seconds later, with the job still RUNNING.
- GMS logs (viewed in k9s) show no dataHubExecutionRequestResult aspect write with a terminal status after the subprocess exits.
- Reproduced across multiple sources (iceberg, azure_ad), which points to a systemic executor-level issue rather than a connector-specific one.
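The 230-second gap quoted in the evidence can be checked directly from the two timestamps (the second one is approximate):

```python
from datetime import datetime

# Timestamps taken from the evidence above; the date is irrelevant for the gap.
finished = datetime.strptime("21:20:22", "%H:%M:%S")
stale_warning = datetime.strptime("21:24:12", "%H:%M:%S")

gap_seconds = (stale_warning - finished).total_seconds()
print(gap_seconds)  # 230.0
```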
Environment
| Field | Value |
|---|---|
| acryl-datahub version | 1.4.0.3 |
| Executor | datahub-executor pod (Kubernetes) |
| Sources affected | iceberg, azure_ad (likely all sources) |
Suspected Root Cause
Race condition or missing waitpid / signal handler in the executor's subprocess monitoring loop — the wrapper never receives or acts on the child process exit signal.
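For comparison, a minimal sketch of a monitoring loop that does observe the child's exit: it sends heartbeats while the child runs and reports a terminal status once the exit is reaped. This is not the executor's actual code; run_and_monitor, the report callback, and heartbeat_interval are hypothetical names chosen for illustration.

```python
import subprocess
import sys
import time
from typing import Callable, Sequence


def run_and_monitor(
    cmd: Sequence[str],
    report: Callable[[str], None],
    heartbeat_interval: float = 0.5,
) -> int:
    """Hypothetical loop: heartbeat while running, then report a final status."""
    proc = subprocess.Popen(list(cmd))
    while True:
        # poll() reaps the child (waitpid with WNOHANG on POSIX) and returns
        # its exit code once it has exited, or None while it is still running.
        returncode = proc.poll()
        if returncode is not None:
            break
        report("RUNNING")
        time.sleep(heartbeat_interval)
    # The terminal status is reported from the same loop that sends heartbeats,
    # so it cannot be skipped once the exit has been observed.
    report("SUCCESS" if returncode == 0 else "FAILURE")
    return returncode


if __name__ == "__main__":
    statuses: list = []
    rc = run_and_monitor([sys.executable, "-c", "import time; time.sleep(1)"], statuses.append)
    print(rc, statuses[-1])  # prints: 0 SUCCESS
```

Polling keeps reaping and status reporting in one place; a SIGCHLD handler is an alternative, but it adds signal-safety and thread-delivery concerns that could produce exactly the kind of missed exit described here.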
Workaround
Switching the sink to SYNC mode reduces occurrence by eliminating the async drain/flush phase, but does not fully resolve the issue for all cases.
sink:
  type: datahub-rest
  config:
    mode: SYNC