Commit 3b4baa3

feat: adds ability to optimize for cost (#172)
**Requirements**

- [x] I have added test coverage for new or changed functionality
- [x] I have followed the repository's [pull request submission guidelines](../blob/main/CONTRIBUTING.md#submitting-pull-requests)
- [x] I have validated my changes against all supported platform versions

**Describe the solution you've provided**

Implements cost optimization in the same manner as latency optimization. Searches the acceptance statement for keywords pertaining to token usage/cost (e.g. costs, pricing, bill) and adds instructions to the variation generation to try to optimize for costs. Additionally has the acceptance statement prompt return instructions for the variation generation (i.e., cheaper model, etc.).

**Describe alternatives you've considered**

This is a feature addition.

**Additional context**

We'll be adding UI options for both latency and cost with adjustable thresholds, but these are still valid once those arrive since a mention of cost/latency means the user is trying to optimize for it.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Adds new cost-gating logic and changes iteration/batch bookkeeping (baseline tracking, history trimming, token-limit handling), which can affect optimization outcomes and persisted result records. Risk is moderated by extensive new unit tests covering the new gates and edge cases.
>
> **Overview**
> Adds **cost optimization support** alongside existing latency optimization: acceptance statements are scanned for cost keywords, agent calls get per-turn `estimated_cost_usd` (via model pricing when available), and a new `_cost_gate` is applied similarly to `_latency_gate`, with both gates recorded as synthetic judge scores for visibility.
>
> Improves optimization loop correctness and observability by explicitly tracking baselines (duration and cost), trimming `_history` to bounded windows (standard and GT), counting variation-generation tokens into the run total, stamping `accumulated_token_usage` into result payloads, and refining token-limit behavior (treat `0` as unlimited and evaluate pass/fail before halting on budget). Also tightens model ID prefix stripping to avoid breaking Bedrock region-style IDs and updates package metadata naming/description.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 4fc1ecf. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
2 parents 92f51fa + 4fc1ecf commit 3b4baa3

7 files changed

Lines changed: 1973 additions & 105 deletions

packages/optimization/pyproject.toml

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 [project]
-name = "ldai_optimizer"
+name = "launchdarkly-ai-optimizer"
 version = "0.1.0" # x-release-please-version
-description = "LaunchDarkly AI tool — optimizer"
+description = "LaunchDarkly AI tool — Optimization"
 authors = [{name = "LaunchDarkly", email = "dev@launchdarkly.com"}]
 license = {text = "Apache-2.0"}
 readme = "README.md"

packages/optimization/src/ldai_optimizer/client.py

Lines changed: 514 additions & 75 deletions
Large diffs are not rendered by default.

packages/optimization/src/ldai_optimizer/dataclasses.py

Lines changed: 9 additions & 0 deletions
@@ -43,6 +43,7 @@ class JudgeResult:
     rationale: Optional[str] = None
     duration_ms: Optional[float] = None
     usage: Optional[TokenUsage] = None
+    estimated_cost_usd: Optional[float] = None

     def to_json(self) -> Dict[str, Any]:
         """
@@ -61,6 +62,8 @@ def to_json(self) -> Dict[str, Any]:
                 "input": self.usage.input,
                 "output": self.usage.output,
             }
+        if self.estimated_cost_usd is not None:
+            result["estimated_cost_usd"] = self.estimated_cost_usd
         return result


@@ -217,6 +220,8 @@ class OptimizationContext:
     iteration: int = 0 # current iteration number
     duration_ms: Optional[float] = None # wall-clock time for the agent call in milliseconds
     usage: Optional[TokenUsage] = None # token usage reported by the agent for this iteration
+    estimated_cost_usd: Optional[float] = None # estimated cost; USD when pricing available, else total tokens
+    accumulated_token_usage: Optional[int] = None # single running total across ALL calls in this run (generation + judges + variation)

     def copy_without_history(self) -> OptimizationContext:
         """
@@ -236,6 +241,8 @@ def copy_without_history(self) -> OptimizationContext:
             iteration=self.iteration,
             duration_ms=self.duration_ms,
             usage=self.usage,
+            estimated_cost_usd=self.estimated_cost_usd,
+            accumulated_token_usage=self.accumulated_token_usage,
         )

     def to_json(self) -> Dict[str, Any]:
@@ -261,6 +268,8 @@ def to_json(self) -> Dict[str, Any]:
             "history": history_list,
             "iteration": self.iteration,
             "duration_ms": self.duration_ms,
+            "estimated_cost_usd": self.estimated_cost_usd,
+            "accumulated_token_usage": self.accumulated_token_usage,
         }
         if self.usage is not None:
             result["usage"] = {
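The two `to_json` changes in this file treat a missing cost differently: `JudgeResult` emits `estimated_cost_usd` only when it is set, while `OptimizationContext` always emits the key, so an absent estimate serializes as `null`. The following is a minimal sketch of that difference. `JudgeResultSketch` and `ContextSketch` are hypothetical stand-ins that carry only the fields needed here (the `score` field is an assumption), not the real classes.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class JudgeResultSketch:
    """Hypothetical stand-in: omits the cost key when no estimate exists."""

    score: float  # assumed field, used only to give the payload some content
    estimated_cost_usd: Optional[float] = None

    def to_json(self) -> Dict[str, Any]:
        result: Dict[str, Any] = {"score": self.score}
        # Mirrors the diff: the key is added only when a value is present.
        if self.estimated_cost_usd is not None:
            result["estimated_cost_usd"] = self.estimated_cost_usd
        return result


@dataclass
class ContextSketch:
    """Hypothetical stand-in: always emits the new keys, possibly as None."""

    iteration: int = 0
    estimated_cost_usd: Optional[float] = None
    accumulated_token_usage: Optional[int] = None

    def to_json(self) -> Dict[str, Any]:
        # Mirrors the diff: both keys are written unconditionally.
        return {
            "iteration": self.iteration,
            "estimated_cost_usd": self.estimated_cost_usd,
            "accumulated_token_usage": self.accumulated_token_usage,
        }


judge_json = JudgeResultSketch(score=0.9).to_json()
context_json = ContextSketch(iteration=1).to_json()

print(json.dumps(judge_json))
print(json.dumps(context_json))
```

Consumers of the context payload should therefore expect `null` for either field, whereas consumers of judge results should check for the key's presence.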

packages/optimization/src/ldai_optimizer/ld_api_client.py

Lines changed: 1 addition & 1 deletion
@@ -118,7 +118,7 @@ class AgentOptimizationResultPatch(TypedDict, total=False):
     completionResponse: str
     scores: Dict[str, Any]
     generationLatency: int
-    generationTokens: Dict[str, int]
+    generationTokens: Dict[str, Any]
     evaluationLatencies: Dict[str, float]
     evaluationTokens: Dict[str, Dict[str, int]]
     variation: Dict[str, Any]

packages/optimization/src/ldai_optimizer/prompts.py

Lines changed: 114 additions & 0 deletions
@@ -16,6 +16,13 @@
     re.IGNORECASE,
 )

+_COST_KEYWORDS = re.compile(
+    r"\b(cheap|cheaper|cheapest|costs?|costly|expensive|budget|affordable|"
+    r"spend|spending|economical|cost-effective|frugal|"
+    r"price|pricing|bill|billing)\b",
+    re.IGNORECASE,
+)
+

 def _acceptance_criteria_implies_duration_optimization(
     judges: Optional[Dict[str, OptimizationJudge]],
@@ -39,6 +46,28 @@ def _acceptance_criteria_implies_duration_optimization(
     return False


+def _acceptance_criteria_implies_cost_optimization(
+    judges: Optional[Dict[str, OptimizationJudge]],
+) -> bool:
+    """Return True if any judge acceptance statement implies a cost reduction goal.
+
+    Scans each judge's acceptance_statement for cost-related keywords. The
+    check is case-insensitive. Returns False when judges is None or no judge
+    carries an acceptance statement.
+
+    :param judges: Judge configuration dict from OptimizationOptions, or None.
+    :return: True if cost optimization should be applied.
+    """
+    if not judges:
+        return False
+    for judge in judges.values():
+        if judge.acceptance_statement and _COST_KEYWORDS.search(
+            judge.acceptance_statement
+        ):
+            return True
+    return False
+
+
 def build_message_history_text(
     history: List[OptimizationContext],
     input_text: str,
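To see how the keyword scan behaves on real text, the sketch below copies the pattern from this diff and runs it against a few acceptance statements. The statements are hypothetical, written only for this example. Because the pattern is anchored with `\b`, it matches whole words regardless of case, which also means a statement that mentions a price as subject matter (rather than as an optimization goal) will still turn cost optimization on.

```python
import re

# Copy of the pattern added in prompts.py, reproduced so the example is self-contained.
COST_KEYWORDS = re.compile(
    r"\b(cheap|cheaper|cheapest|costs?|costly|expensive|budget|affordable|"
    r"spend|spending|economical|cost-effective|frugal|"
    r"price|pricing|bill|billing)\b",
    re.IGNORECASE,
)

# Hypothetical acceptance statements mapped to the expected scan result.
statements = {
    "Responses must be accurate and keep COSTS low.": True,       # case-insensitive hit
    "Answer politely and cite sources.": False,                   # no keyword present
    "Do not accost the user with follow-up questions.": False,    # "cost" inside a word
    "Always quote the correct price for the product.": True,      # topical mention still matches
}

results = {text: bool(COST_KEYWORDS.search(text)) for text in statements}

for text, matched in results.items():
    print(f"{matched!s:5} {text}")
```

The last statement shows the trade-off of keyword detection: it is cheap and predictable, but it cannot tell a cost goal apart from a statement that merely talks about prices.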
@@ -114,6 +143,8 @@ def build_new_variation_prompt(
     variable_choices: List[Dict[str, Any]],
     initial_instructions: str,
     optimize_for_duration: bool = False,
+    optimize_for_cost: bool = False,
+    quality_already_passing: bool = False,
 ) -> str:
     """
     Build the LLM prompt for generating an improved agent configuration.
@@ -133,6 +164,11 @@ def build_new_variation_prompt(
     :param initial_instructions: The original unmodified instructions template
     :param optimize_for_duration: When True, appends a duration optimization section
         instructing the LLM to prefer faster models and simpler instructions.
+    :param optimize_for_cost: When True, appends a cost optimization section
+        instructing the LLM to prefer cheaper models and reduce token usage.
+    :param quality_already_passing: When True, signals that all judge criteria are
+        currently passing and the cost optimization section should instruct the LLM
+        to preserve existing behavior while only reducing cost.
     :return: The assembled prompt string
     """
     sections = [
@@ -147,6 +183,7 @@ def build_new_variation_prompt(
             history, model_choices, variable_choices, initial_instructions
         ),
         variation_prompt_duration_optimization(model_choices) if optimize_for_duration else "",
+        variation_prompt_cost_optimization(model_choices, quality_already_passing=quality_already_passing) if optimize_for_cost else "",
     ]

     return "\n\n".join(s for s in sections if s)
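The optional sections rely on a small pattern: a disabled section evaluates to an empty string, and the final join filters empty strings out so no stray blank lines appear in the prompt. A minimal sketch of that pattern follows. `assemble_prompt` and the bracketed section texts are hypothetical placeholders, not the real section builders.

```python
from typing import List


def assemble_prompt(
    base_sections: List[str],
    optimize_for_duration: bool = False,
    optimize_for_cost: bool = False,
) -> str:
    """Hypothetical helper showing the conditional-section pattern from the diff."""
    sections = list(base_sections) + [
        # Each optional section collapses to "" when its flag is off.
        "[duration section]" if optimize_for_duration else "",
        "[cost section]" if optimize_for_cost else "",
    ]
    # Empty strings are dropped, so disabled sections leave no blank gaps.
    return "\n\n".join(s for s in sections if s)


cost_only = assemble_prompt(["A", "B"], optimize_for_cost=True)
neither = assemble_prompt(["A", "B"])
both = assemble_prompt(["A"], optimize_for_duration=True, optimize_for_cost=True)

print(repr(cost_only))
print(repr(neither))
print(repr(both))
```

Since the conditional expression is evaluated before the list is built, a section builder is only called when its flag is set.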
@@ -248,6 +285,8 @@ def variation_prompt_configuration(
         lines.append(f"Agent response: <untrusted>{previous_ctx.completion_response}</untrusted>")
         if previous_ctx.duration_ms is not None:
             lines.append(f"Agent duration: {previous_ctx.duration_ms:.0f}ms")
+        if previous_ctx.estimated_cost_usd is not None:
+            lines.append(f"Estimated agent cost: ${previous_ctx.estimated_cost_usd:.6f}")
         return "\n".join(lines)
     else:
         return "\n".join(
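The cost line written into the prompt uses six decimal places. Per-call estimates are typically small fractions of a dollar, so a conventional two-decimal currency format would round most of them to zero and hide the differences between iterations. A short illustration, using a made-up cost value:

```python
cost = 0.000123  # hypothetical per-call estimate in USD

six_places = f"Estimated agent cost: ${cost:.6f}"
two_places = f"Estimated agent cost: ${cost:.2f}"

print(six_places)  # Estimated agent cost: $0.000123
print(two_places)  # Estimated agent cost: $0.00
```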
@@ -301,6 +340,8 @@ def variation_prompt_feedback(
             lines.append(feedback_line)
     if ctx.duration_ms is not None:
         lines.append(f"Agent duration: {ctx.duration_ms:.0f}ms")
+    if ctx.estimated_cost_usd is not None:
+        lines.append(f"Estimated agent cost: ${ctx.estimated_cost_usd:.6f}")
     return "\n".join(lines)


@@ -556,3 +597,76 @@ def variation_prompt_duration_optimization(model_choices: List[str]) -> str:
             "Quality criteria remain the primary objective — do not sacrifice passing scores to achieve lower latency.",
         ]
     )
+
+
+def variation_prompt_cost_optimization(
+    model_choices: List[str],
+    quality_already_passing: bool = False,
+) -> str:
+    """
+    Cost optimization section of the variation prompt.
+
+    Included when acceptance criteria imply a cost reduction goal. Instructs
+    the LLM to treat token usage as a secondary objective — quality criteria
+    must still be met first — and provides concrete guidance on how to reduce
+    cost through model selection and instruction simplification.
+
+    When ``quality_already_passing`` is True, the framing shifts: since all
+    judge criteria are already satisfied, the LLM is instructed to preserve
+    the existing behavior exactly and only apply changes that reduce cost
+    without affecting output quality.
+
+    :param model_choices: List of model IDs the LLM may select from, so it can
+        apply its own knowledge of which models tend to be cheaper.
+    :param quality_already_passing: When True, signals that all judge criteria
+        are currently passing. The section will direct the LLM to preserve
+        output quality and focus exclusively on cost reduction strategies.
+    :return: The cost optimization prompt block.
+    """
+    if quality_already_passing:
+        intent_lines = [
+            "## Cost Optimization:",
+            "The acceptance criteria for this optimization implies that token usage / cost should be reduced.",
+            "*** IMPORTANT: All quality acceptance criteria are currently passing. ***",
+            "The goal of this variation is to reduce cost WITHOUT changing the behavior or quality of the agent's responses.",
+            "Do NOT alter the instructions in ways that would change what the agent says or how it reasons.",
+            "Only apply changes that reduce token usage or switch to a cheaper model while preserving the same output quality.",
+            "If you cannot reduce cost without risking quality, keep the instructions unchanged and only consider a cheaper model.",
+            "",
+        ]
+    else:
+        intent_lines = [
+            "## Cost Optimization:",
+            "The acceptance criteria for this optimization implies that token usage / cost should be reduced.",
+            "In addition to improving quality, generate a variation that aims to reduce the agent's cost.",
+            "",
+        ]
+
+    shared_lines = [
+        "Cost is driven by two factors: (1) the number of tokens processed, and (2) the per-token price of the model.",
+        "Target both factors with the strategies below.",
+        "",
+        "### Reducing token usage (input tokens):",
+        "- Remove redundant, verbose, or repeated phrasing from the instructions.",
+        "- Collapse multi-sentence explanations into a single concise directive.",
+        "- Remove examples or few-shot demonstrations unless they are essential for accuracy.",
+        "- Eliminate instructional scaffolding that the model does not need (e.g. 'You are a helpful assistant that...').",
+        "- Use bullet points instead of prose where possible — they are more token-efficient.",
+        "",
+        "### Reducing token usage (output tokens):",
+        "- Instruct the agent to be concise and avoid unnecessary elaboration.",
+        "- Specify the exact format and length of the expected response (e.g. 'Respond in one sentence.').",
+        "- Set or reduce max_tokens if the current value allows longer responses than needed.",
+        "- Avoid instructions that encourage the agent to 'explain its reasoning' unless required by the acceptance criteria.",
+        "",
+        "### Reducing per-token cost via model selection:",
+        "- Consider switching to a cheaper model from the available choices if quality requirements can still be met.",
+        f" Available models: {model_choices}",
+        " Use your knowledge of relative model pricing to prefer lower-cost options.",
+        " Only switch models if the cheaper model is capable of satisfying the acceptance criteria.",
+        "",
+        "Quality criteria remain the primary objective — do not sacrifice passing scores to achieve lower cost.",
+        "Apply cost-reduction changes incrementally: prefer the smallest change that measurably reduces cost.",
+    ]
+
+    return "\n".join(intent_lines + shared_lines)

packages/optimization/src/ldai_optimizer/util.py

Lines changed: 45 additions & 1 deletion
@@ -5,7 +5,10 @@
 import logging
 import random
 import re
-from typing import Any, Awaitable, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import TYPE_CHECKING, Any, Awaitable, Dict, List, Optional, Tuple, TypeVar, Union
+
+if TYPE_CHECKING:
+    from ldai.tracker import TokenUsage

 from ldai_optimizer._slug_words import _ADJECTIVES, _NOUNS

@@ -313,3 +316,44 @@ def judge_passed(score: float, threshold: float, is_inverted: bool) -> bool:
     the score must stay at or below the threshold: ``score <= threshold``.
     """
     return score <= threshold if is_inverted else score >= threshold
+
+
+def estimate_cost(
+    usage: Optional["TokenUsage"],
+    model_config: Optional[Dict[str, Any]],
+) -> Optional[float]:
+    """Estimate the monetary cost of a single agent call in USD.
+
+    Uses ``costPerInputToken`` and ``costPerOutputToken`` from the model config.
+    Returns ``None`` when either ``usage`` is ``None`` or no pricing fields are
+    present on the model config — ensuring the return value is always in USD or
+    absent, never a raw token count. This prevents unit-mismatch bugs when
+    comparing costs across iterations where the model (and its pricing
+    availability) may differ.
+
+    ``costPerCachedInputToken`` is intentionally ignored — the estimate uses
+    input/output tokens only.
+
+    :param usage: Token usage from the agent call. When ``None``, returns ``None``.
+    :param model_config: Model config dict from ``get_model_configs()``, or ``None``.
+    :return: Estimated cost in USD, or ``None`` if usage or pricing data is absent, or if
+        both ``usage.input`` and ``usage.output`` are ``None`` (no token counts available).
+    """
+    if usage is None:
+        return None
+
+    input_price = model_config.get("costPerInputToken") if model_config else None
+    output_price = model_config.get("costPerOutputToken") if model_config else None
+
+    if input_price is None and output_price is None:
+        return None
+
+    cost = 0.0
+    computed = False
+    if input_price is not None and usage.input is not None:
+        cost += usage.input * input_price
+        computed = True
+    if output_price is not None and usage.output is not None:
+        cost += usage.output * output_price
+        computed = True
+    return cost if computed else None
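A worked example may help clarify the return cases of `estimate_cost`. The sketch below reproduces the function body from this diff so it runs on its own. `TokenUsageSketch` is a hypothetical stand-in for `ldai.tracker.TokenUsage` that exposes only the `input` and `output` attributes the function reads, and the per-token prices are invented for illustration.

```python
import math
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class TokenUsageSketch:
    """Hypothetical stand-in exposing only the attributes estimate_cost reads."""

    input: Optional[int] = None
    output: Optional[int] = None


def estimate_cost(
    usage: Optional[TokenUsageSketch],
    model_config: Optional[Dict[str, Any]],
) -> Optional[float]:
    """Same logic as the function in the diff, copied for a runnable example."""
    if usage is None:
        return None
    input_price = model_config.get("costPerInputToken") if model_config else None
    output_price = model_config.get("costPerOutputToken") if model_config else None
    if input_price is None and output_price is None:
        return None
    cost = 0.0
    computed = False
    if input_price is not None and usage.input is not None:
        cost += usage.input * input_price
        computed = True
    if output_price is not None and usage.output is not None:
        cost += usage.output * output_price
        computed = True
    return cost if computed else None


# Invented prices: $3 per million input tokens, $15 per million output tokens.
pricing = {"costPerInputToken": 3e-06, "costPerOutputToken": 1.5e-05}
usage = TokenUsageSketch(input=1000, output=500)

full = estimate_cost(usage, pricing)  # 1000 * 3e-06 + 500 * 1.5e-05
input_only = estimate_cost(usage, {"costPerInputToken": 3e-06})
no_pricing = estimate_cost(usage, {})
no_usage = estimate_cost(None, pricing)
no_counts = estimate_cost(TokenUsageSketch(), pricing)

print(full, input_only, no_pricing, no_usage, no_counts)
```

With these numbers the full estimate is approximately 0.0105 USD. Note the partial-pricing case: when only one of the two price fields is present, the estimate covers only that side of the call, so it understates the true cost rather than returning `None`.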
