Enable async tool cancellation feature.#4241
Conversation
Codecov Report❌ Patch coverage is
... and 18 files with indirect coverage changes 🚀 New features to boost your workflow:
|
|
@aconchillo, @kompfner, @markbackman, should we invoke |
JiwaniZakir
left a comment
There was a problem hiding this comment.
In base_llm_adapter.py, the fallback branch in from_standard_tools (lines ~155–163) silently drops the user's legacy/provider-specific tools when _builtin_tools is non-empty—the original behavior was return tools which passed them through unchanged. Replacing them entirely with only built-in tools is a silent breaking change for anyone using a non-ToolsSchema format; a better approach would be to still pass the original tools through and emit the warning, rather than discarding them.
In _compose_system_instruction within llm_service.py, the old implementation unconditionally appended completion_instructions (i.e., even when _filter_incomplete_user_turns was False), whereas the new version gates this behind the flag. This is likely a bug fix, but it's worth verifying there aren't existing users or tests relying on the old unconditional behavior—a regression test covering the "turn completion off, async cancellation on" composition path would help here.
In llm_response_universal.py, tool_call_id is now duplicated—once inside the JSON content payload and once as a top-level message key. If this is intentional (e.g., for a provider that reads it from the content body), a brief comment explaining why would prevent future readers from removing it as redundant.
283aa29 to
7b45a56
Compare
58882cc to
772fb57
Compare
| await super().start(frame) | ||
| if not self._run_in_parallel: | ||
| await self._create_sequential_runner_task() | ||
| if self._has_async_functions(): |
There was a problem hiding this comment.
We should probably also add a property to the class so users can disable this behavior if they wish.
There was a problem hiding this comment.
Just wondering...would it ever make sense to make async tool cancellation a per-tool configuration as opposed to an all-or-nothing configuration?
There was a problem hiding this comment.
I think we can start with an all-or-nothing approach and make it a per-tool cancellation if needed.
There was a problem hiding this comment.
Is the thinking that we'll add this param in a follow-up PR?
There was a problem hiding this comment.
I was planning to add this parameter in this PR so we can keep it disabled by default, preserving the current behavior unless the user specifies otherwise.
There was a problem hiding this comment.
disabled by default
Oh, huh. I think I was imagining that it would be enabled by default (but honestly, hadn't thought about it too much). The nice thing is we haven't shipped async tool support yet, so there's no "shipped behavior" to consider and we can make this decision before we ship any of it.
What was your reasoning for having it disabled by default? Are LLMs already pretty good at simply ignoring tool results that come back after they're no longer relevant, i.e. after the conversation has already moved onto another topic and it's clear that the user is no longer interested in the result?
I guess a simple test would be "get me the weather in san francisco...oh wait, sorry, i meant san diego".
There was a problem hiding this comment.
at simply ignoring tool results that come back after they're no longer relevant
Hmm, actually, thinking about this a bit more, our architecture doesn't really always let the LLM make that judgment call...when the result comes back it will trigger an LLM response.
There was a problem hiding this comment.
The idea was to avoid injecting anything for the user by default, unless they explicitly ask for it.
However, LLMs are not yet good at ignoring function call results, at least in my tests. 🤕
There was a problem hiding this comment.
LLMs are not yet good at ignoring function call results, at least in my tests
We've actually set it up so they can't, I think—we ask them to run inference with those results as the latest message(s).
I wonder if the way to get them to reliably ignore irrelevant function call results is with some prompting akin to what @markbackman did with USER_TURN_COMPLETION_INSTRUCTIONS: asking the LLM to disregard "async_tool" results that are no longer relevant to the conversation, including in the case where the result is the only message after the last assistant message (in which case we could ask the LLM to output a symbol that will be ignored, essentially making it "skip" responding).
There was a problem hiding this comment.
I have added a new property for now: enable_async_tool_cancellation
| if name != CANCEL_ASYNC_TOOL_NAME | ||
| ) | ||
|
|
||
| def _setup_async_tool_cancellation(self): |
There was a problem hiding this comment.
Do we ever want to "tear down" async tool cancellation, say when the LLM receives a context with an updated set of tools that no longer contain async tools?
There was a problem hiding this comment.
I believe the right way to do this would be to validate, when unregister_function is invoked, whether any async tools remain, and if not, then we should remove that.
What do you think ?
There was a problem hiding this comment.
Oh, hmm...not sure. unregister_function is pretty optional, in terms of telling the LLM which set of tools is available to it...the user could choose to keep all tools "registered" with the Pipecat service but at runtime set LLMSetToolsFrames to change which tools are available for each inference.
Kind of makes me think we probably want to reconsider when/where we run the logic that determines if we have async tools and sets up the tool cancellation mechanism.
There was a problem hiding this comment.
we probably want to reconsider when/where we run the logic that determines if we have async tools and sets up the tool cancellation mechanism
Exploring what that would look like, in case it would be too gnarly...
We would have the "hook in" point for setting up the async tool cancellation mechanism be right before running inference, once we've resolved the actual set of tools available to the LLM for that inference run (not the set of tools register_functioned with the Pipecat LLM service). So...completely internal to the adapter get_llm_invocation_params function.
Advantages of this approach:
- It only adds the async tool cancellation mechanism if we really need it (in the case of more tools being
registered than are being used) - Correctly omits the tool cancellation mechanism if the user doesn't
unregister_function
Disadvantage of this approach:
- There are some LLMs where
toolscan also be specified directly in the init, and the resolution of the "active set" of tools happens afterget_llm_invocation_params.- Though...I'm not sure async tools even makes sense for realtime LLMs, since we don't really have a good way for all of them to report async result updates mid-conversation...maybe we should
- It would require an update to all the
get_llm_invocation_paramsimplementations in the adapters - It might be beneficial to the LLM to have the async-tool-cancellation-related system instructions in place continually throughout the conversation...
Taking a step back: the reason to change your approach would be to make the cancellation mechanism's lifespan be "tighter" around when async tools are actually available to the LLM. But it doesn't hurt to have the mechanism in place for longer than that. So...maybe let's do nothing.
|
I think we'll really want an example file to demonstrate where this behavior would be useful, with some helpful commenting explaining how the user might exercise it. |
|
Thoughts after talking with @filipi87... It seems like async tool cancellation is sort of being designed to be two things:
Re 1: I'm not sure the cancellation mechanism as it exists in this PR will excel at this without some modification, as tool calls are "expensive"—requiring a back and forth—and the LLM will respond to cancellation tool call results coming in (the latter point which maybe we can address with additional prompt engineering in the cancellation tool call result message). It's possible an entirely different approach like this one or something we're not yet considering, would be better suited to the task. Re 2: I feel a smidge uneasy using a hidden tool cancellation mechanism as a control interface for long-running subagent-like tasks - if the developer is adding something like "start_location_tracking", it would be more natural for them to provide a "stop_location_tracking" (and possibly even an "update_location_tracking", if they so desire). This then makes me wonder if these sorts of tasks are better suited for the WIP subagents and their corresponding control systems...which then makes me wonder whether we're making a mistake by bundling up long-running, streamed-response agent-like things into our conception of "tools" in the first place. Maybe I'm overthinking things and spiraling out...but I think it's worth a bit of more discussion, as we really want to get this right since it's a pretty foundational pattern. Thanks for bearing with me! |
|
Re 1: talked to @filipi87 and he'll try addressing the issue of
🤞 hopefully that'll be enough to make things work fairly well for both the disregard-result case as well as the cancel-long-running-task case (we talked about the validation scenarios that make sense for this). Re 2: after talking to @aconchillo and @filipi87, my existential unease about longer-running streamed-result tools have been assuaged. I'm back on board.
Yes, probably! For those who know me: very much classic me.
I think it was worth it 👍. |
|
This looks great to me @filipi87 ! My only concern would be the cancellation prompt. Worst case, if the prompt doesn't work for some use case/LLM, users are able to modify the module variables. |
Or they can simply leave the mechanism disabled and handle it on their own. |
|
@kompfner , it looks like we were already respecting when the run_llm is False:
|
|
@kompfner, it looks like we need to keep For example:
|
| self._function_call_task_finished(task) | ||
|
|
||
| if cancelled_items: | ||
| await self._call_event_handler("on_function_calls_cancelled", cancelled_items) |
There was a problem hiding this comment.
Shouldn't there be a single tool call that matches tool_call_id? So, this should be on_function_call_cancelled (singular) and we don't need to keep track of a list, I think.
There was a problem hiding this comment.
I created on_function_calls_cancelled to match the behavior of on_function_calls_started, and this callback is invoked even when there are no async function calls.
So, for example, if you have two function calls in progress (non-async) and you interrupt the bot, both will be cancelled.
|
Thanks for sharing the outcome of running the examples, @filipi87!
I see: if we hadn't set Our of curiosity: what was the assistant's response at the end of the ignore-my-first-weather-request example? I assume the assistant did the right thing and didn't feel the need to mention canceling the first request. |
kompfner
left a comment
There was a problem hiding this comment.
Thanks for bearing with all my comments! This is a tricky feature! Nice work.
Exactly. In that case, after we provided the result of the cancelled function, the assistant simply invoked |
| logger.info(f"Starting {len(function_calls)} function calls") | ||
|
|
||
| @task.event_handler("on_function_calls_cancelled") | ||
| async def on_function_calls_cancelled(service, cancelled): |
There was a problem hiding this comment.
Update to:
async def on_function_calls_cancelled(service, function_calls: List[FunctionCallFromLLM]):
same in the started case.
|
LGTM! |



Enable async tool cancellation feature.