AutoContextMemory calls block() during WebFlux reactive execution
Describe the bug
When using ReActAgent with AutoContextMemory and AutoContextHook in a Spring WebFlux streaming endpoint, AgentScope-Java may call Mono.block() inside the Reactor event-loop thread during auto context compression.
This causes Reactor to throw:
java.lang.IllegalStateException: block()/blockFirst()/blockLast() are blocking, which is not supported in thread reactor-http-epoll-2
The blocking call appears in io.agentscope.core.memory.autocontext.AutoContextMemory, where compression summaries are generated by streaming the model and then calling .block() to wait for the final compressed message.
To Reproduce
Steps to reproduce the behavior:
- Create a ReActAgent with AutoContextMemory and AutoContextHook.
AutoContextMemory memory = new AutoContextMemory(autoContextConfig, compressionModel);
ReActAgent agent = ReActAgent.builder()
        .name("my_agent")
        .model(model)
        .memory(memory)
        .hook(new AutoContextHook())
        .sysPrompt(sysPrompt)
        .build();
- Execute the agent from a Spring WebFlux / Reactor streaming request, for example:
Flux<AguiSsePayload> events = agent.stream(messages, options)
        .concatMapIterable(event -> convertEvent(event, state));
- Make the conversation history large enough to trigger AutoContextMemory#compressIfNeeded().
- During PreReasoningEvent, AutoContextHook calls autoContextMemory.compressIfNeeded(). Inside AutoContextMemory, the following blocking code is executed on the Reactor HTTP event-loop thread:
Msg block =
        model.stream(newMessages, null, options)
                .concatMap(chunk -> processChunk(chunk, context))
                .then(Mono.defer(() -> Mono.just(context.buildFinalMessage())))
                .onErrorResume(InterruptedException.class, Mono::error)
                .block();
- See the runtime error.
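The failure can also be reproduced without AgentScope-Java. The following is a minimal sketch using only Reactor; the class and method names are hypothetical, and blockingCompress() is merely a stand-in that mirrors the shape of the blocking chain quoted above. It runs the blocking call on a Schedulers.parallel() worker, which, like the Reactor Netty event-loop threads, is marked as non-blocking.

```java
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

public class BlockOnNonBlockingThread {

    // Hypothetical stand-in for the compression code: consume a stream,
    // then wait synchronously for the final value with block().
    static String blockingCompress() {
        return Flux.just("chunk-1", "chunk-2")
                .then(Mono.defer(() -> Mono.just("summary")))
                .block();
    }

    // Invokes blockingCompress() on a Reactor thread that forbids blocking
    // and returns the message of the resulting IllegalStateException.
    static String runOnParallelThread() {
        return Mono.fromCallable(BlockOnNonBlockingThread::blockingCompress)
                .subscribeOn(Schedulers.parallel())
                .onErrorResume(IllegalStateException.class, e -> Mono.just(e.getMessage()))
                .block(); // called from a regular thread here, so this outer block() is allowed
    }

    public static void main(String[] args) {
        System.out.println(runOnParallelThread());
    }
}
```

The printed message should have the same form as the error reported in this issue, with the name of the parallel worker thread in place of reactor-http-epoll-2.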
Expected behavior
AutoContextMemory should not call block() on a Reactor event-loop thread.
Possible fixes:
- Provide a fully reactive compression API, for example Mono<Boolean> compressIfNeededAsync().
- Make AutoContextHook schedule blocking compression work on Schedulers.boundedElastic().
- Avoid blocking entirely by composing the model stream reactively.
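To illustrate the first and third options, here is a rough sketch of what a non-blocking compression method could look like. It is not the AgentScope-Java implementation: the class, the String-based history, and streamSummary() (a stand-in for model.stream(...)) are all assumptions made for the example. The point is only that the method returns the composed Mono instead of calling block() on it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class ReactiveCompressionSketch {

    // Simplified history; not thread-safe, which is acceptable for a sketch.
    final List<String> history = new ArrayList<>();
    final int threshold;

    ReactiveCompressionSketch(int threshold) {
        this.threshold = threshold;
    }

    // Hypothetical stand-in for model.stream(...): emits the summary in chunks.
    Flux<String> streamSummary(List<String> messages) {
        return Flux.fromIterable(messages).map(m -> "[" + m + "]");
    }

    // Hypothetical reactive counterpart of compressIfNeeded(). Emits true if
    // the history was replaced by a summary, false if no compression was needed.
    Mono<Boolean> compressIfNeededAsync() {
        return Mono.defer(() -> {
            if (history.size() <= threshold) {
                return Mono.just(false);
            }
            return streamSummary(List.copyOf(history))
                    .collect(Collectors.joining()) // assemble chunks without blocking
                    .map(summary -> {
                        history.clear();
                        history.add(summary);
                        return true;
                    });
        });
    }

    public static void main(String[] args) {
        ReactiveCompressionSketch sketch = new ReactiveCompressionSketch(2);
        sketch.history.addAll(List.of("a", "b", "c"));
        // The caller chains on the Mono instead of waiting for it.
        sketch.compressIfNeededAsync()
                .subscribe(compressed -> System.out.println(
                        "compressed=" + compressed + ", history=" + sketch.history));
    }
}
```

A hook could then return or flatMap this Mono as part of the request pipeline, so no thread ever waits for the summary.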
Error messages
java.lang.IllegalStateException: block()/blockFirst()/blockLast() are blocking, which is not supported in thread reactor-http-epoll-2
Relevant source location in AutoContextMemory.java:
Msg block =
        model.stream(newMessages, null, options)
                .concatMap(chunk -> processChunk(chunk, context))
                .then(Mono.defer(() -> Mono.just(context.buildFinalMessage())))
                .onErrorResume(InterruptedException.class, Mono::error)
                .block();
The same pattern appears multiple times in AutoContextMemory.java, including summary / compression paths for large messages, current round summaries, previous round summaries, and tool compression.
Environment (please complete the following information):
- AgentScope-Java Version: 1.1.0-RC1
- Java Version: 25
- OS: Windows 10 for local development; runtime error observed on a Reactor Netty reactor-http-epoll-* thread
- Framework: Spring WebFlux / Reactor
Additional context
In our application, AutoContextMemory is used in an AG-UI / SSE streaming endpoint. The endpoint returns a Flux and executes agent.stream(...) as part of the reactive request pipeline.
As a workaround, we replaced AutoContextHook with a project-local hook that keeps the original behavior but schedules autoContextMemory.compressIfNeeded() on Schedulers.boundedElastic() before updating the PreReasoningEvent input messages.
This avoids blocking the Reactor HTTP event-loop thread, but a native fix in AgentScope-Java would be preferable.
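The core of that workaround can be sketched with plain Reactor as follows. This is not our actual hook and does not use the AgentScope-Java hook API: blockingCompressIfNeeded() is a hypothetical stand-in for autoContextMemory.compressIfNeeded(), and runFromNonBlockingThread() only simulates a hook being invoked on a thread that forbids blocking.

```java
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

public class BoundedElasticOffloadSketch {

    // Hypothetical stand-in for autoContextMemory.compressIfNeeded():
    // it calls block() internally, as AutoContextMemory does today.
    static boolean blockingCompressIfNeeded() {
        String summary = Flux.just("chunk-1", "chunk-2")
                .then(Mono.defer(() -> Mono.just("summary")))
                .block();
        return summary != null;
    }

    // Wraps the blocking call so that it executes on a boundedElastic worker,
    // where blocking is permitted, and exposes the result as a Mono.
    static Mono<Boolean> compressOffloaded() {
        return Mono.fromCallable(BoundedElasticOffloadSketch::blockingCompressIfNeeded)
                .subscribeOn(Schedulers.boundedElastic());
    }

    // Simulates a hook running on a non-blocking thread (parallel here,
    // reactor-http-epoll-* in the real application).
    static Mono<Boolean> runFromNonBlockingThread() {
        return Mono.just("pre-reasoning-event")
                .publishOn(Schedulers.parallel())
                .flatMap(event -> compressOffloaded());
    }

    public static void main(String[] args) {
        // block() is fine here because main is a regular thread.
        System.out.println("compressed=" + runFromNonBlockingThread().block());
    }
}
```

The trade-off is that each compression still occupies a boundedElastic worker for the duration of the model call, which is why a reactive API in the library itself would be the cleaner fix.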