temporal-spring-ai: preserve Usage and RateLimit in ChatResponse metadata#2854
Open
donald-pinckney wants to merge 4 commits into master from
Conversation
…teLimit) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Asserts that Usage (prompt/completion/total tokens) and RateLimit (requests/tokens limit/remaining/reset) round-trip from a stub ChatModel's ChatResponseMetadata through the chat activity and back to workflow code. The workflow flattens to primitives because Usage and RateLimit are interfaces and can't Jackson-round-trip across the workflow result without concrete-type hints. Currently fails with token counts of 0 (Spring AI's EmptyUsage sentinel) because ActivityChatModel.toResponse only rehydrates md.getModel() — Usage and RateLimit are dropped. The implementation follows in a subsequent commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
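The flattening the test relies on can be sketched as below. This is a hedged illustration, not the test's actual code: the local `Usage` interface is a simplified stand-in for Spring AI's interface, and `UsageSnapshot` is a hypothetical name invented here. The point is that workflow code copies each interface getter into a plain record, which Jackson can round-trip across the workflow result without polymorphic type hints.

```java
// Sketch: flatten an interface-typed Usage into a concrete record so it can
// cross a Jackson-serialized workflow boundary without concrete-type hints.
public class FlattenSketch {
    // Simplified stand-in for Spring AI's Usage interface.
    public interface Usage {
        Integer getPromptTokens();
        Integer getCompletionTokens();
        Integer getTotalTokens();
    }

    // Plain record: a concrete type, so Jackson needs no type information.
    public record UsageSnapshot(int promptTokens, int completionTokens, int totalTokens) {
        public static UsageSnapshot from(Usage u) {
            return new UsageSnapshot(u.getPromptTokens(), u.getCompletionTokens(), u.getTotalTokens());
        }
    }

    // Minimal stub mirroring what the test's stub ChatModel might report.
    public static Usage stubUsage() {
        return new Usage() {
            @Override public Integer getPromptTokens() { return 12; }
            @Override public Integer getCompletionTokens() { return 34; }
            @Override public Integer getTotalTokens() { return 46; }
        };
    }

    public static void main(String[] args) {
        // Flatten the stub's interface-typed usage to primitives.
        UsageSnapshot snap = UsageSnapshot.from(stubUsage());
        System.out.println(snap);
    }
}
```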
…data ActivityChatModel.toResponse now rehydrates Usage and RateLimit onto the ChatResponseMetadata it returns to workflow code, not just the model name. The activity side (ChatModelActivityImpl) already serialized these into the output record; they were being silently discarded when the workflow side rebuilt the ChatResponse. Usage is rehydrated as a Spring AI DefaultUsage(promptTokens, completionTokens, totalTokens). RateLimit is an interface with no public default impl in spring-ai-model, so we return an anonymous implementation backed by the fields from the activity output record. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
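The anonymous-implementation pattern described above can be sketched as follows. A minimal, self-contained sketch: the `RateLimit` interface here is a simplified stand-in for the Spring AI interface, and `Output` with its field names is an assumed shape for the activity output record, not the plugin's real type.

```java
import java.time.Duration;

// Sketch: rehydrate a RateLimit interface from flattened primitives.
// `RateLimit` is a simplified stand-in for Spring AI's interface; `Output`
// and its field names are assumptions, not the plugin's actual record.
public class RehydrateSketch {
    public interface RateLimit {
        Long getRequestsLimit();
        Long getRequestsRemaining();
        Duration getRequestsReset();
        Long getTokensLimit();
        Long getTokensRemaining();
        Duration getTokensReset();
    }

    // Hypothetical flattened record, as an activity might serialize it.
    public record Output(long requestsLimit, long requestsRemaining, long requestsResetMillis,
                         long tokensLimit, long tokensRemaining, long tokensResetMillis) {}

    // With no public default implementation available, return an anonymous
    // implementation backed by the primitives carried in the output record.
    public static RateLimit rehydrate(Output out) {
        return new RateLimit() {
            @Override public Long getRequestsLimit() { return out.requestsLimit(); }
            @Override public Long getRequestsRemaining() { return out.requestsRemaining(); }
            @Override public Duration getRequestsReset() { return Duration.ofMillis(out.requestsResetMillis()); }
            @Override public Long getTokensLimit() { return out.tokensLimit(); }
            @Override public Long getTokensRemaining() { return out.tokensRemaining(); }
            @Override public Duration getTokensReset() { return Duration.ofMillis(out.tokensResetMillis()); }
        };
    }

    public static void main(String[] args) {
        RateLimit rl = rehydrate(new Output(100, 99, 1000, 5000, 4900, 2000));
        System.out.println(rl.getRequestsRemaining() + " requests remaining");
    }
}
```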
Planning scratchpad — not part of the shipped artifact. Removed before merge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
What was changed
`ActivityChatModel.toResponse` now rehydrates `Usage` and `RateLimit` onto the `ChatResponseMetadata` returned to workflow code. Previously only `model` was copied across; token counts and rate-limit headers were silently discarded.

- `Usage` is rehydrated as a Spring AI `DefaultUsage(prompt, completion, total)`.
- `RateLimit` is an interface with no public default impl in `spring-ai-model`, so we return an anonymous implementation backed by the fields from the serialized activity output record.
- `ResponseMetadataTest` drives a workflow that calls a stub model populating both `Usage` and `RateLimit`, then flattens the resulting metadata to primitives inside the workflow (the interfaces don't Jackson-round-trip across the workflow result) and asserts every field.

No changes to `ChatModelTypes` (the records already carry all fields) or `ChatModelActivityImpl` (already populates them).

Why?
Users of the plugin couldn't read token counts or rate-limit headers from Spring AI responses even though the underlying `ChatModel` returned them. Cost tracking, observability integrations, and rate-limit-aware advisors were all broken. This is a pure bugfix: the activity payload was already carrying the data, it just wasn't being put back onto the response. Independent of #2852 and #2853 (no rebase needed, no coupling).