Skip to content

[Bug] Single spurious "exception in shielded future" ERROR log per cancelled in-flight activity on Python 3.11+ (residual after #1523) #1600

@AlePiccin

Description

@AlePiccin

What are you really trying to do?

Cancelling a running workflow that still has activities in flight, without ERROR-level log noise for what is a normal, expected operation. After #1504 was fixed by #1523 (shipped in 1.28.0), the amplified logging is gone, but a single spurious exception in shielded future ERROR is still emitted per in-flight activity on cancel, which pollutes error logs/alerting.

Describe the bug

#1523 added task.uncancel() to the shield loops and removed the amplified behavior: without uncancel() the cancel counter stayed elevated on Python 3.11+, the while True / asyncio.shield loop re-raised CancelledError on every re-await, and each turn (a) sent a duplicate cancel command to the server and (b) armed another _log_on_exception. That part is genuinely fixed — thank you.

However, on Python 3.11+ (observed on 3.14) a single spurious line is still logged at ERROR level per in-flight activity when a workflow with running activities is cancelled:

ActivityError exception in shielded future
future: <Future finished exception=ActivityError('Activity cancelled')>
...
temporalio.exceptions.ActivityError: Activity cancelled

Cancelling a workflow with 4 activities in flight produces 4 such ERROR lines. The cancellation itself is delivered and handled correctly — these are orphaned duplicates of the same exception, not a functional failure. On Python 3.10 it is silent.

Expected: no ERROR-level log on an explicit, expected cancellation — consistent with the stated goal of #1504/#1523 ("spurious error-level logging of 'exception in shielded future'").

Minimal Reproduction

import asyncio
from datetime import timedelta
from temporalio import activity, workflow

@activity.defn
async def slow_activity() -> None:
    await asyncio.sleep(60)

@workflow.defn
class MyWorkflow:
    @workflow.run
    async def run(self) -> None:
        await asyncio.gather(*(
            workflow.execute_activity(
                slow_activity,
                start_to_close_timeout=timedelta(minutes=2),
            )
            for _ in range(4)
        ))

Start the workflow, wait for the activities to begin, then await handle.cancel(). The worker emits 4 ActivityError exception in shielded future ERROR lines (one per in-flight activity). On Python 3.10 it is silent.

Environment/Versions

Additional context

uncancel() cannot suppress this one, because the offending callback is armed before the except CancelledError block runs.

In CPython 3.14's reworked asyncio.shield() (Lib/asyncio/tasks.py), when the outer future is cancelled before the inner completes, _outer_done_callback arms _log_on_exception on the inner future:

def _outer_done_callback(outer):
    if not inner.done():
        inner.remove_done_callback(_inner_done_callback)
        inner.remove_done_callback(_log_on_exception)
        inner.add_done_callback(_log_on_exception)   # armed here

_log_on_exception then logs via inner._loop.call_exception_handler(...) if the inner future finishes not cancelled and with an exception.

In run_activity() (_outbound_schedule_activity, _workflow_instance.py):

  1. Workflow cancel → the _ActivityHandle task is cancelled → the outer of await asyncio.shield(handle._result_fut) is cancelled. _outer_done_callback arms _log_on_exception on handle._result_fut. This happens before our except runs, so uncancel() (which runs afterward) can't undo this first arming.
  2. except CancelledError: send cancel command, uncancel(), loop → re-shield the same _result_fut. Thanks to uncancel() this await now blocks normally (no re-raise loop ⇒ no further arming — the fix: call task.uncancel() after catching CancelledError in shield loops (Python 3.11+) #1523 win).
  3. Activity reports cancelled ⇒ the SDK resolves _result_fut via set_exception(ActivityError("Activity cancelled")) (not cancel()). The new shield delivers it correctly to the workflow, and the still-armed _log_on_exception from step 1 fires once ⇒ one ERROR line, routed through the workflow loop's default_exception_handler.

So it's exactly one residual log per cancelled in-flight activity, structurally unavoidable with the current shield-and-reshield pattern. The same pattern exists in the other 5 loops touched by #1523 (child workflow, nexus, signal-external), which likely emit one residual line each on cancel too.

Possible fix directions — the leftover arming comes from the first, now-abandoned shield:

  • When re-shielding the same future after catching CancelledError, drop the stale callback the abandoned outer left behind (_result_fut.remove_done_callback(...)) before the next asyncio.shield(...). (Relies on a CPython-internal callback, so brittle.)
  • Avoid re-shielding the same future across loop iterations — await it once and re-arm cancel separately.
  • As a last resort, consume the orphaned exception explicitly (a done callback that calls .exception() to mark it retrieved) so _log_on_exception finds nothing to report.

Functionally harmless — the impact is ERROR-level log/alert pollution on a normal, expected cancellation.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions