Commit d267832
feat: adds reporting for cost and latency optimization failures (#180)
**Requirements**
- [x] I have added test coverage for new or changed functionality
- [x] I have followed the repository's [pull request submission
guidelines](../blob/main/CONTRIBUTING.md#submitting-pull-requests)
- [x] I have validated my changes against all supported platform
versions
**Describe the solution you've provided**
This is intended to demystify some of the results we're receiving from
the optimization package. Namely:
- Total token counts are now accrued and reported with each result so
that we can see whether a user crosses the total allowed token threshold
- When cost or latency is being optimized against, its result is
reported as an item in the `score` result so that it can be shown in
the UI
- Finally, if quality has already met the required threshold, the prompt
now contains instructions to optimize only against cost (if cost is
being optimized against)
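
As a rough illustration of the three points above, here is a minimal sketch. It is not the code in `ldai_optimizer`: `RunReport`, `gate_score`, `variation_instruction`, and the `required_improvement` parameter are hypothetical names, and the pass rule (current value at least a given fraction below the baseline) is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class RunReport:
    """Hypothetical container for one optimization result."""
    token_limit: int
    accumulated_token_usage: int = 0
    scores: dict = field(default_factory=dict)

    def add_usage(self, tokens: int) -> None:
        # Every model call adds to a single running total.
        self.accumulated_token_usage += tokens

    def over_budget(self) -> bool:
        # True once the running total exceeds the allowed token threshold.
        return self.accumulated_token_usage > self.token_limit


def gate_score(current: float, baseline: float, required_improvement: float) -> dict:
    """Build a score entry for a cost or latency gate (assumed pass rule).

    The gate passes when `current` is at least `required_improvement`
    (a fraction such as 0.10) below `baseline`.
    """
    target = baseline * (1.0 - required_improvement)
    return {"passed": current <= target, "value": current, "target": target}


def variation_instruction(quality_passed: bool, optimize_cost: bool) -> str:
    # Once quality passes and cost is being optimized, ask for cost reduction only.
    if quality_passed and optimize_cost:
        return "Quality already meets the threshold; reduce cost only."
    return "Improve quality against the acceptance criteria."
```

A caller would record usage after each call, attach the gate entry to the report (for example `report.scores["_cost_gate"] = gate_score(...)`), and include both in the result payload so that a failure can be traced to either the token budget or a specific gate.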
**Describe alternatives you've considered**
This is in some ways a bug fix, since it wasn't clear to the user what
was causing the failure. It is technically additional functionality,
but it is likely required to give the user enough information to act
on the result.
**Additional context**
Cost and latency are only optimized for, and only include scores, when
the keywords that lead to them being optimized are triggered. "Base"
implementations that do not use these features are unaffected.
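
The keyword-based opt-in could look something like the sketch below. The keyword lists, the `enabled_gates` function, and the idea that keywords are matched against a free-text criteria string are all assumptions made for illustration; the actual trigger words and where they are read from are not stated here.

```python
from typing import List

# Hypothetical keyword lists; the real trigger words are not given in this PR text.
COST_KEYWORDS = ("cost", "cheaper")
LATENCY_KEYWORDS = ("latency", "faster")


def enabled_gates(criteria: str) -> List[str]:
    """Return the gate score names that a criteria string would switch on."""
    text = criteria.lower()
    gates = []
    if any(keyword in text for keyword in COST_KEYWORDS):
        gates.append("_cost_gate")
    if any(keyword in text for keyword in LATENCY_KEYWORDS):
        gates.append("_latency_gate")
    # An empty list corresponds to a "base" run with no cost or latency scoring.
    return gates
```

With these assumed keywords, a criteria string that mentions neither cost nor latency yields no gates, which matches the statement that base implementations are unaffected.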
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Changes optimization pass/fail logic and persisted result payloads
(new gate scores, baseline handling, token-budget semantics), which
could affect when runs succeed/fail and what the UI/API receives.
>
> **Overview**
> Improves optimization run reporting by tracking and persisting a
single `accumulated_token_usage` total across agent, judge, and
variation calls, and including it in result PATCH payloads (extending
`generationTokens` to allow `accumulated_total`).
>
> Refactors latency/cost optimization to use explicit baseline values
(not `history[0]`), caps history growth (`_trim_history`) for both
standard and ground-truth flows, and adds synthetic
`_latency_gate`/`_cost_gate` score entries so gate failures are visible
in results.
>
> Adjusts run control flow so pass/fail is evaluated before token-limit
checks (including GT batches and validation), and updates variation
prompting to focus purely on cost reduction when quality is already
passing; also relaxes the cost gate tolerance from 20% to 10%
improvement and expands tests accordingly.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
365fa94. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
5 files changed
Lines changed: 1052 additions & 145 deletions
File tree
- packages/optimization
- src/ldai_optimizer
- tests