feat(agentserver): light up durable-task primitive (core 2.0.0b6 + invocations 1.0.0b5) by RaviPidaparthi · Pull Request #46997 · Azure/azure-sdk-for-python

RaviPidaparthi · 2026-05-19T19:12:36Z

Summary

Lights up the durable-task primitive in azure-ai-agentserver-core
2.0.0b6 (and the matching invocations-protocol sample suite in
azure-ai-agentserver-invocations 1.0.0b5) as a new feature.

The durable-task primitive is a small, decorator-driven API that lets a
hosted agent run long operations as named tasks that survive
process crashes, OOM kills, and container redeployments. After recovery,
a task picks up where it left off, without the developer writing
any explicit checkpoint or replay code.

Full developer guide:
sdk/agentserver/azure-ai-agentserver-core/docs/durable-task-guide.md.

Scope of THIS PR

azure-ai-agentserver-core — full durable-task primitive shipping
for the first time (no prior release of the primitive).
azure-ai-agentserver-invocations — matching durable sample
suite (durable_copilot, durable_multiturn, durable_langgraph,
durable_research) demonstrating the primitive end-to-end on the
invocations transport. It also adds per-sample READMEs, a SHIPPABLE.md
manifest, a cross-sample DURABLE_SAMPLES.md operational guide, and
a CI gate (test_samples_shippable_bar.py) that enforces the
per-sample shippable bar on every PR.

Out of scope for this PR (split into separate PRs)

azure-ai-agentserver-responses durable orchestration
→ see PR for branch feature/agentserver-responses-spec016
durable-agent-demo azd-deployable hosted-agent sample
→ see PR for branch feature/agentserver-durable-agent-demo
(temporary; never-merged demo sample)

What the primitive ships

Tour:

from azure.ai.agentserver.core.durable import task

@task(name="long_research")
async def do_research(ctx, prompt: str) -> dict:
    if ctx.entry_mode == "recovered":
        # Pick up from where you were, using ctx.metadata
        ...
    await ctx.stream({"phase": "searching"})
    ...
    return {"summary": "..."}

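# In your handler (handle_request is an illustrative name):
async def handle_request():
    run = await do_research.run(task_id="task-123", input={"prompt": "..."})
    async for chunk in run:
        ...  # forward each streamed chunk to the caller
    result = await run.result()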

Concepts shipping

@task(...) decorator, which returns a Task object with .run(),
.start(), .options(...), and .get_active_run(task_id).
TaskContext — exposes entry_mode, input, metadata (auto-flushed
at lifecycle boundaries), cancel (an asyncio.Event), the cause
booleans timeout_exceeded / cancel_requested, the steering signals
pending_input_count / is_steered_turn, shutdown,
retry_attempt, and recovery_count. Provides ctx.suspend(output=...),
ctx.stream(chunk), and ctx.exit_for_recovery().
TaskResult.status: Literal["completed", "suspended"].
Failure paths surface as exceptions (TaskFailed, TaskCancelled,
TaskConflictError).
TaskConflictError — single error type for any "task is busy / not
available" state (live elsewhere, recovered elsewhere, evicted under
split-brain protection, terminal with queued steerer). Carries
current_status so callers can branch.
RetryPolicy — exponential / fixed / linear backoff presets,
durable across crash and recovery.
EntryMode Literal: "fresh" | "resumed" | "recovered".
Suspended (sentinel for .run() of a suspended task),
TaskStatus Literal, TaskMetadata, StreamHandler,
StreamHandlerFactory, QueueStreamHandler.
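The RetryPolicy presets listed above differ only in how the delay grows with the attempt number. The sketch below shows the usual shape of exponential, fixed, and linear backoff. The function name `backoff_delay`, its `base` and `cap` parameters, and the exact formulas are assumptions for illustration, not the RetryPolicy API or its defaults. It also shows why durability matters. If the attempt number survives recovery, as "durable across crash and recovery" suggests, a recovered task continues its schedule instead of restarting at the first attempt.

```python
from typing import Literal

BackoffKind = Literal["exponential", "fixed", "linear"]


def backoff_delay(kind: BackoffKind, attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Return the delay in seconds before retry number `attempt` (1-based).

    Hypothetical helper: illustrates typical backoff shapes, not the
    azure-ai-agentserver-core RetryPolicy implementation.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    if kind == "exponential":
        delay = base * (2 ** (attempt - 1))  # 1, 2, 4, 8, ... (times base)
    elif kind == "fixed":
        delay = base  # same delay on every attempt
    elif kind == "linear":
        delay = base * attempt  # 1, 2, 3, 4, ... (times base)
    else:
        raise ValueError(f"unknown backoff kind: {kind}")
    return min(delay, cap)  # never wait longer than `cap`


# Assumption: the attempt counter was persisted before a crash, so after
# recovery the schedule continues from attempt 4 instead of attempt 1.
persisted_attempt = 3
next_attempt = persisted_attempt + 1
print([backoff_delay("exponential", n) for n in range(1, 5)])  # [1.0, 2.0, 4.0, 8.0]
print(backoff_delay("exponential", next_attempt))  # 8.0
```

A cap keeps exponential growth bounded. Without it, the tenth attempt in this sketch would wait 512 seconds.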

Behavior shipping

Automatic recovery — records of tasks that crashed mid-execution are detected at
three layers (a startup scan, a periodic background scan, and inline reclaim
at scheduling primitives). The developer sees
ctx.entry_mode == "recovered" and otherwise the same TaskContext
surface as on a fresh entry.
Split-brain protection — when a new agent process takes over a
session, stranded executions in the previous process are cancelled cleanly
via HTTP 409 binding_mismatch. On receiving it, the previous process cancels its
execution, suppresses its terminal write, and signals its awaiters
with TaskConflictError.
Steering as plain multi-turn — Task.start(...) on an already-active
steerable task queues the new input. Once the first turn calls
ctx.suspend(...), the steerer's .result() resolves with the outcome
of the next turn.
Per-turn wall-clock durable timeout — @task(timeout=...) is
anchored to a persisted turn-start timestamp. A crash mid-turn
does NOT reset the budget; the watchdog in the recovered process computes
the remaining budget from the persisted timestamp.
Metadata auto-flush at lifecycle boundaries — ctx.metadata is
flushed automatically at the end of every turn.
Bookkeeping is durable — suspended-resume input patches are
ETag-protected; steerable input data is cleared at the suspend
transition (data minimization); the lease owner string incorporates
both FOUNDRY_AGENT_NAME and session ID so two different agents
sharing a session ID cannot collide on lease ownership.
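The per-turn durable timeout described in the list above reduces to one subtraction. The budget is measured from the persisted turn-start time, not from when the current process started. This is a minimal sketch, assuming the turn-start timestamp is stored as a timezone-aware UTC datetime. The names `remaining_budget` and `turn_started_at` are hypothetical and not part of the package API.

```python
from datetime import datetime, timedelta, timezone


def remaining_budget(turn_started_at: datetime, timeout: timedelta, now: datetime) -> timedelta:
    """Time left in the current turn's wall-clock budget.

    Hypothetical helper. `turn_started_at` stands in for the persisted
    per-turn-start timestamp; because it survives a crash, a recovered
    process computes the same deadline the original process had.
    """
    elapsed = now - turn_started_at
    return max(timeout - elapsed, timedelta(0))  # clamp: never negative


start = datetime(2026, 5, 19, 12, 0, 0, tzinfo=timezone.utc)
timeout = timedelta(minutes=10)

# The original process crashes mid-turn and recovery happens at minute 7.
# The recovered watchdog gets the remaining 3 minutes, not a fresh 10.
recovered_at = start + timedelta(minutes=7)
print(remaining_budget(start, timeout, recovered_at))  # 0:03:00

# If recovery happens after the deadline, the budget is already exhausted.
print(remaining_budget(start, timeout, start + timedelta(minutes=12)))  # 0:00:00
```

Anchoring to the stored timestamp means a task that crash-loops cannot extend its own turn indefinitely. Each recovery inherits a shrinking budget.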

Transport

HostedTaskProvider is built on azure.core.AsyncPipelineClient
with the standard policy chain (request-id, headers, user-agent,
retry, AsyncBearerTokenCredentialPolicy, task-API logging,
distributed tracing). The retry policy retries only on 5xx, 408, and 429,
never on 409, regardless of the response body. ContentDecodePolicy is intentionally
excluded; body parsing happens at the call site with defensive
error handling.
httpx is no longer a production dependency.
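The retry rule in the Transport notes above is a status-code predicate. Below is a minimal stdlib sketch of that predicate. The name `is_retryable_status` is hypothetical. In the PR, the rule lives in the azure-core pipeline's retry policy rather than in a standalone function. The design point is that 409 is never retried. A 409 such as binding_mismatch reports a state conflict that a retry cannot fix, so it is surfaced to the caller instead.

```python
from http import HTTPStatus

# Transient 4xx statuses named in the PR: request timeout and throttling.
_RETRYABLE_4XX = {HTTPStatus.REQUEST_TIMEOUT, HTTPStatus.TOO_MANY_REQUESTS}  # 408, 429


def is_retryable_status(status: int) -> bool:
    """Return True if a task-API response with this status should be retried.

    Hypothetical helper mirroring the stated rule: retry on 5xx, 408, and
    429 only. 409 Conflict is never retried, whatever its body says.
    """
    if status == HTTPStatus.CONFLICT:  # 409, e.g. binding_mismatch
        return False
    return 500 <= status <= 599 or status in _RETRYABLE_4XX


for code in (200, 404, 408, 409, 429, 500, 503):
    print(code, is_retryable_status(code))
```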

Validation

Check	Status
Core pylint	10.00/10 (0 new findings vs origin/main)
Core mypy	0 new errors
Core pyright	Pass
Core sphinx	Pass
Invocations pylint	0 new findings vs origin/main
Core tests	439 passed, 6 skipped
Core durable suite	345 passed, 1 skipped
Invocations tests	244 passed, 2 skipped

…-core Implements a crash-resilient durable task system with:
- @durable_task decorator with full lifecycle management (start, run, get, cancel, terminate)
- TaskResult[Output] wrapper replacing exception-based suspension handling
- Cooperative cancellation and configurable timeouts
- Configurable retry policies with backoff
- Callable factories for tags, title, and description
- Local in-memory provider for development/testing
- Task streaming support via AsyncIterator
- Lease-based distributed locking
- Ephemeral and persistent task modes
- Task metadata and source provenance tracking

Includes:
- 248 passing tests across 17 test modules
- 3 sample applications (retry, source, streaming)
- Developer guide documentation
- Spec files (001-006) covering all design decisions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
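The lease-based locking listed in the commit message, together with the lease-owner rule under Behavior shipping, can be illustrated with a small in-memory lease table. This is a sketch under stated assumptions. The `LeaseTable` class, the `agent_name:session_id` owner format, and the TTL semantics are hypothetical and do not describe the hosted task API. The sketch shows why the owner string includes the agent name. Two agents that share a session ID produce different owner strings, so neither can renew or take over the other's live lease. With a session-ID-only owner, the second agent would look like the lease holder.

```python
from dataclasses import dataclass


def lease_owner(agent_name: str, session_id: str) -> str:
    # Hypothetical format combining FOUNDRY_AGENT_NAME and the session ID.
    return f"{agent_name}:{session_id}"


@dataclass
class Lease:
    owner: str
    expires_at: float  # seconds on an arbitrary clock


class LeaseTable:
    """Hypothetical in-memory lease table keyed by task ID."""

    def __init__(self) -> None:
        self._leases: dict[str, Lease] = {}

    def acquire(self, task_id: str, owner: str, now: float, ttl: float) -> bool:
        """Acquire or renew a lease; fail if another owner holds a live one."""
        current = self._leases.get(task_id)
        if current is not None and current.owner != owner and current.expires_at > now:
            return False  # held by a different owner and not yet expired
        self._leases[task_id] = Lease(owner=owner, expires_at=now + ttl)
        return True


table = LeaseTable()
a = lease_owner("research-agent", "sess-1")
b = lease_owner("billing-agent", "sess-1")  # same session ID, different agent

print(table.acquire("task-123", a, now=0.0, ttl=30.0))   # True: lease was free
print(table.acquire("task-123", b, now=10.0, ttl=30.0))  # False: a's lease is live
print(table.acquire("task-123", a, now=20.0, ttl=30.0))  # True: a renews until 50
print(table.acquire("task-123", b, now=60.0, ttl=30.0))  # True: a's lease expired
```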