
temporal-spring-ai: preserve Usage and RateLimit in ChatResponse metadata#2854

Open
donald-pinckney wants to merge 4 commits into master from spring-ai/response-metadata

Conversation

Contributor

@donald-pinckney donald-pinckney commented Apr 21, 2026

What was changed

  • ActivityChatModel.toResponse now rehydrates Usage and RateLimit onto the ChatResponseMetadata returned to workflow code. Previously only model was copied across; token counts and rate-limit headers were silently discarded.
  • Usage is rehydrated as a Spring AI DefaultUsage(prompt, completion, total). RateLimit is an interface with no public default impl in spring-ai-model, so we return an anonymous implementation backed by the fields from the serialized activity output record.
  • New ResponseMetadataTest drives a workflow that calls a stub model which populates both Usage and RateLimit, then flattens the resulting metadata to primitives inside the workflow (the interfaces don't Jackson-round-trip across the workflow result without concrete-type hints) and asserts every field.

No changes to ChatModelTypes (the records already carry all fields) or ChatModelActivityImpl (already populates them).
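The rehydration described above can be sketched roughly as follows. This is a simplified, self-contained illustration, not the PR's actual code: `ChatOutputRecord`, `Usage`, `RateLimit`, and `Metadata` here are minimal stand-ins for the real Spring AI and plugin types (e.g. the real code uses Spring AI's `DefaultUsage` for the usage side).

```java
import java.time.Duration;

public class RehydrationSketch {
    // Serialized activity output: plain fields only, so it round-trips through Jackson safely.
    record ChatOutputRecord(String model,
                            int promptTokens, int completionTokens, int totalTokens,
                            long requestsLimit, long requestsRemaining, Duration requestsReset,
                            long tokensLimit, long tokensRemaining, Duration tokensReset) {}

    // Minimal analogues of the interface-typed metadata the workflow side rebuilds.
    interface Usage { int promptTokens(); int completionTokens(); int totalTokens(); }
    interface RateLimit {
        long requestsLimit(); long requestsRemaining(); Duration requestsReset();
        long tokensLimit(); long tokensRemaining(); Duration tokensReset();
    }
    record Metadata(String model, Usage usage, RateLimit rateLimit) {}

    // The fix: copy every field back onto the response metadata, not just the model name.
    static Metadata toResponseMetadata(ChatOutputRecord out) {
        Usage usage = new Usage() {
            public int promptTokens() { return out.promptTokens(); }
            public int completionTokens() { return out.completionTokens(); }
            public int totalTokens() { return out.totalTokens(); }
        };
        // RateLimit has no public default implementation, hence the anonymous class.
        RateLimit rateLimit = new RateLimit() {
            public long requestsLimit() { return out.requestsLimit(); }
            public long requestsRemaining() { return out.requestsRemaining(); }
            public Duration requestsReset() { return out.requestsReset(); }
            public long tokensLimit() { return out.tokensLimit(); }
            public long tokensRemaining() { return out.tokensRemaining(); }
            public Duration tokensReset() { return out.tokensReset(); }
        };
        return new Metadata(out.model(), usage, rateLimit);
    }

    public static void main(String[] args) {
        var out = new ChatOutputRecord("some-model", 10, 20, 30,
                100, 99, Duration.ofSeconds(1), 1000, 900, Duration.ofSeconds(2));
        var md = toResponseMetadata(out);
        if (md.usage().totalTokens() != 30 || md.rateLimit().tokensRemaining() != 900)
            throw new AssertionError("rehydration dropped fields");
        System.out.println("ok");
    }
}
```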

Why?

Users of the plugin couldn't read token counts or rate-limit headers from Spring AI responses even though the underlying ChatModel returned them. Cost tracking, observability integrations, and rate-limit-aware advisors were all broken. This is a pure bugfix — the activity payload was already carrying the data, just not being put back on the response. Independent of #2852 and #2853 (no rebase needed, no coupling).

donald-pinckney and others added 4 commits April 21, 2026 15:53
…teLimit)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Asserts that Usage (prompt/completion/total tokens) and RateLimit
(requests/tokens limit/remaining/reset) round-trip from a stub
ChatModel's ChatResponseMetadata through the chat activity and back
to workflow code. The workflow flattens to primitives because Usage
and RateLimit are interfaces and can't Jackson-round-trip across
the workflow result without concrete-type hints.
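The "flatten to primitives" step can be sketched as below. Again a hypothetical illustration: `FlatMetadata` and its field names are stand-ins, not the test's real types. The point is that a record of primitives deserializes without type hints, whereas an interface-typed field would not.

```java
public class FlattenSketch {
    // Stand-in for the interface-typed metadata the stub model returns.
    interface Usage { int promptTokens(); int completionTokens(); int totalTokens(); }

    // Interface-typed fields won't Jackson-deserialize across the workflow result
    // without concrete-type hints, so the workflow returns primitives instead.
    record FlatMetadata(int promptTokens, int completionTokens, int totalTokens) {
        static FlatMetadata from(Usage u) {
            return new FlatMetadata(u.promptTokens(), u.completionTokens(), u.totalTokens());
        }
    }

    public static void main(String[] args) {
        Usage u = new Usage() {
            public int promptTokens() { return 10; }
            public int completionTokens() { return 20; }
            public int totalTokens() { return 30; }
        };
        FlatMetadata flat = FlatMetadata.from(u);
        if (flat.totalTokens() != 30) throw new AssertionError("flattening lost data");
        System.out.println("flattened: " + flat);
    }
}
```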

Currently fails with token counts of 0 (Spring AI's EmptyUsage
sentinel) because ActivityChatModel.toResponse only rehydrates
md.getModel() — Usage and RateLimit are dropped. The implementation
follows in a subsequent commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…data

ActivityChatModel.toResponse now rehydrates Usage and RateLimit onto the
ChatResponseMetadata it returns to workflow code, not just the model
name. The activity side (ChatModelActivityImpl) already serialized
these into the output record; they were being silently discarded when
the workflow side rebuilt the ChatResponse.

Usage is rehydrated as a Spring AI DefaultUsage(promptTokens,
completionTokens, totalTokens). RateLimit is an interface with no
public default impl in spring-ai-model, so we return an anonymous
implementation backed by the fields from the activity output record.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Planning scratchpad — not part of the shipped artifact. Removed before merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@donald-pinckney donald-pinckney marked this pull request as ready for review April 22, 2026 19:50
@donald-pinckney donald-pinckney requested a review from a team as a code owner April 22, 2026 19:50