2 changes: 1 addition & 1 deletion docs/proposals/050-costguard-scorer/README.md
@@ -151,7 +151,7 @@ Note: the score assigned by the temperatured sigmoid is not in [0, 1], but in (0
Implement CostGuard via a series of small PRs

1. Extend `modelconfigcollector` to collect input and output token prices;
-2. Extend `requestmetadata` to maintain tdigest;
+2. Implement a specialized `requestcostmetadata` extractor to maintain tdigest of cost;
3. Extend `AttributeMap` in the `datastore` to maintain tdigest and warmup counters;
4. Implement CostGuard;
5. Wire `CostGuard` with the rest of the system.
1 change: 1 addition & 0 deletions go.mod
@@ -3,6 +3,7 @@ module github.com/llm-d/llm-d-inference-payload-processor
go 1.25.0

require (
	github.com/caio/go-tdigest/v5 v5.0.0
	github.com/envoyproxy/go-control-plane/envoy v1.37.0
	github.com/fsnotify/fsnotify v1.9.0
	github.com/go-logr/logr v1.4.3
4 changes: 4 additions & 0 deletions go.sum
@@ -8,6 +8,8 @@ github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM=
github.com/beorn7/perks v1.0.1/go.mod h1:G2ZrVWU2WbWT9wwq4/hrbKbnv/1ERSJQ0ibhJ6rlkpw=
github.com/blang/semver/v4 v4.0.0 h1:1PFHFE6yCCTv8C1TeyNNarDzntLi7wMI5i/pzqYIsAM=
github.com/blang/semver/v4 v4.0.0/go.mod h1:IbckMUScFkM3pff0VJDNKRiT6TG/YpiHIM2yvyW5YoQ=
github.com/caio/go-tdigest/v5 v5.0.0 h1:XQKgYSazZPbWFDAJ51dKqoZoDrISmTrB8UcWwCmfo6Y=
github.com/caio/go-tdigest/v5 v5.0.0/go.mod h1:wI618wZoAYzIDZlpX2CfyTQdrdGtwEZOJuXdrI3zk/Y=
github.com/cenkalti/backoff/v5 v5.0.3 h1:ZN+IMa753KfX5hd8vVaMixjnqRZ3y8CuJKRKj1xcsSM=
github.com/cenkalti/backoff/v5 v5.0.3/go.mod h1:rkhZdG3JZukswDf7f0cwqPNk4K0sa+F97BxZthm/crw=
github.com/cespare/xxhash/v2 v2.3.0 h1:UL815xU9SqsFlibzuggzjXhog7bL6oX9BbNZnL2UFvs=
@@ -109,6 +111,8 @@ github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY=
github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE=
github.com/kylelemons/godebug v1.1.0 h1:RPNrshWIDI6G2gRW9EHilWtl7Z6Sb1BR0xunSBf0SNc=
github.com/kylelemons/godebug v1.1.0/go.mod h1:9/0rRGxNHcop5bhtWyNeEfOS8JIWk580+fNqagV/RAw=
github.com/leesper/go_rng v0.0.0-20190531154944-a612b043e353 h1:X/79QL0b4YJVO5+OsPH9rF2u428CIrGL/jLmPsoOQQ4=
github.com/leesper/go_rng v0.0.0-20190531154944-a612b043e353/go.mod h1:N0SVk0uhy+E1PZ3C9ctsPRlvOPAFPkCNlcPBDkt0N3U=
github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd h1:TRLaZ9cD/w8PVh93nsPXa1VrQ6jlwL5oN8l14QlcNfg=
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
48 changes: 48 additions & 0 deletions pkg/framework/interface/datalayer/pricing/cost_digest.go
@@ -0,0 +1,48 @@
/*
Copyright 2026 The llm-d Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package pricing

import (
	"github.com/caio/go-tdigest/v5"

	"github.com/llm-d/llm-d-inference-payload-processor/pkg/framework/interface/datalayer"
)

// CostDigestAttributeKey is the AttributeMap key under which a model's
// *CostDigest is stored. Producers (the requestcostmetadata extractor)
// publish a snapshot of the running digest here; consumers (the CostGuard
// scorer and any other cost-aware reader) fetch the snapshot via Get.
const CostDigestAttributeKey = "cost_digest"

// CostDigest is a Cloneable wrapper around a *tdigest.TDigest from
// github.com/caio/go-tdigest/v5, recording the per-request cost
// distribution of a model. It is stored on a Model's AttributeMap under
// CostDigestAttributeKey.
//
// The wrapped digest must be non-nil. Cloning is delegated to the
// library's own Clone, which produces an independent digest with the
// same centroids; per the library docs the RNG state is not cloned.
type CostDigest struct {
	Digest *tdigest.TDigest
}

// Clone implements datalayer.Cloneable. It returns a *CostDigest whose
// inner digest is independent of the original — adding samples to the
// clone does not affect the source, and vice versa.
func (c *CostDigest) Clone() datalayer.Cloneable {
	return &CostDigest{Digest: c.Digest.Clone()}
}
69 changes: 69 additions & 0 deletions pkg/framework/plugins/datalayer/requestcostmetadata/README.md
@@ -0,0 +1,69 @@
# Request Cost Metadata Extractor

## What it is

A datasource extractor that turns each completed inference response into a per-request
cost sample and folds it into a per-model [t-digest](https://github.com/caio/go-tdigest)
stored on the Model's `AttributeMap`. It is registered as type
`request-cost-metadata-extractor` and runs on the same response-event loop as
`request-metadata-extractor`. It is a building block for the CostGuard scorer
(see [docs/proposals/050-costguard-scorer/README.md](../../../../../../docs/proposals/050-costguard-scorer/README.md)).

## What it does

1. Ignores `RequestEventType` events. Cost is observed only after a response.
2. On each `ResponseEventType` event:
   - Reads the model name from the request body's `model` field.
   - Reads `prompt_tokens` and `completion_tokens` from the response's `usage` block.
     Skips the sample (with a debug log) if either is absent or non-positive.
   - Reads the model's `*pricing.TokenPrices` from the AttributeMap under
     `pricing.TokenPricesAttributeKey`. Skips the sample (with a debug log) if absent:
     a model with no declared pricing has no defined cost. A model declared with
     `TokenPrices{0,0}` (a free model) is *not* skipped: it records `cost=0`.
   - Computes
     `cost = prompt_tokens * InputTokenPrice + completion_tokens * OutputTokenPrice`
     and adds the value to the model's running t-digest.
3. At the end of each `Extract` batch, for every model whose digest was updated and
whose flush interval has elapsed since the last publish, writes a *clone* of the
digest to the Model's AttributeMap under `pricing.CostDigestAttributeKey`. The
stored value is a `*pricing.CostDigest`.
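
The skip rules and cost formula from step 2 can be sketched as below. This is a minimal illustration, not the extractor's code: `TokenPrices` here is a local stand-in for `pricing.TokenPrices` (only the two field names come from the formula above), and `requestCost` is a hypothetical helper name.

```go
package main

import "fmt"

// TokenPrices is a local stand-in for pricing.TokenPrices (assumption: only
// the two field names are taken from the formula in this README).
type TokenPrices struct {
	InputTokenPrice  float64
	OutputTokenPrice float64
}

// requestCost (hypothetical helper) returns the cost of one response and
// whether the sample should be recorded.
func requestCost(promptTokens, completionTokens float64, prices *TokenPrices) (float64, bool) {
	// No declared pricing: the cost is undefined, so the sample is skipped.
	if prices == nil {
		return 0, false
	}
	// Absent or non-positive token counts: the sample is skipped.
	if promptTokens <= 0 || completionTokens <= 0 {
		return 0, false
	}
	return promptTokens*prices.InputTokenPrice + completionTokens*prices.OutputTokenPrice, true
}

func main() {
	paid := &TokenPrices{InputTokenPrice: 0.5, OutputTokenPrice: 2}
	free := &TokenPrices{} // a free model is recorded with cost 0, not skipped

	cost, ok := requestCost(100, 20, paid)
	fmt.Println(cost, ok) // 90 true

	cost, ok = requestCost(100, 20, free)
	fmt.Println(cost, ok) // 0 true

	cost, ok = requestCost(100, 20, nil)
	fmt.Println(cost, ok) // 0 false
}
```

The nil check is what separates "no pricing declared" from "declared as free": only the former drops the sample.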

This extractor does not freeze and replace the digest at epoch boundaries — the
digest accumulates without bound. Epoch handling lands in a follow-up PR.
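
The publish rule from step 3 (publish only when the digest changed and the flush interval has elapsed, and publish a copy rather than the live digest) can be sketched without the t-digest library. Everything here is an assumption made for illustration: `modelState`, `add`, and `snapshotIfDue` are hypothetical names, and a plain slice stands in for the running digest.

```go
package main

import (
	"fmt"
	"time"
)

// modelState is a hypothetical per-model record; the slice is a stand-in
// for the running t-digest.
type modelState struct {
	samples     []float64
	dirty       bool // true when samples were added since the last publish
	lastPublish time.Time
}

// add records one cost sample and marks the state as updated.
func (s *modelState) add(cost float64) {
	s.samples = append(s.samples, cost)
	s.dirty = true
}

// snapshotIfDue returns an independent copy of the samples when the state
// was updated and at least interval has passed since the last publish.
func (s *modelState) snapshotIfDue(now time.Time, interval time.Duration) ([]float64, bool) {
	if !s.dirty || now.Sub(s.lastPublish) < interval {
		return nil, false
	}
	snapshot := append([]float64(nil), s.samples...) // copy, like publishing a clone
	s.dirty = false
	s.lastPublish = now
	return snapshot, true
}

func main() {
	start := time.Unix(0, 0)
	s := &modelState{lastPublish: start}
	s.add(1.5)

	_, ok := s.snapshotIfDue(start.Add(2*time.Second), 5*time.Second)
	fmt.Println(ok) // false: interval has not elapsed

	snap, ok := s.snapshotIfDue(start.Add(6*time.Second), 5*time.Second)
	fmt.Println(len(snap), ok) // 1 true

	s.add(2.5)
	fmt.Println(len(snap)) // 1: the published copy is unaffected by later samples
}
```

With an interval of zero the elapsed-time check never blocks, so every updated state publishes, which matches the `"0s"` setting described under Configuration.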

## Inputs consumed

- `dlsrc.ResponsePayload.Request.Body["model"]` — the model name (string).
- `dlsrc.ResponsePayload.Response.Body["usage"]` — a `map[string]any` containing
`prompt_tokens` and `completion_tokens` as `float64`.
- `pricing.TokenPricesAttributeKey` on the Model's AttributeMap — populated by the
`modelconfigcollector` plugin at startup and on config-file changes.
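
The `float64` requirement follows from how `encoding/json` decodes numbers into `any`. A minimal sketch of reading the `usage` block from a decoded body is shown below; `tokenCounts` is a hypothetical helper, and the sample payload is invented for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tokenCounts (hypothetical helper) reads prompt_tokens and completion_tokens
// from a decoded response body. It reports false when the usage block or
// either count is absent, has the wrong type, or is non-positive.
func tokenCounts(body map[string]any) (float64, float64, bool) {
	usage, ok := body["usage"].(map[string]any)
	if !ok {
		return 0, 0, false
	}
	prompt, okPrompt := usage["prompt_tokens"].(float64)
	completion, okCompletion := usage["completion_tokens"].(float64)
	if !okPrompt || !okCompletion || prompt <= 0 || completion <= 0 {
		return 0, 0, false
	}
	return prompt, completion, true
}

func main() {
	// Invented payload; JSON numbers decode to float64 when the target is any.
	raw := []byte(`{"usage":{"prompt_tokens":12,"completion_tokens":34}}`)

	var body map[string]any
	if err := json.Unmarshal(raw, &body); err != nil {
		panic(err)
	}

	prompt, completion, ok := tokenCounts(body)
	fmt.Println(prompt, completion, ok) // 12 34 true
}
```

Using the two-value form of the type assertion keeps a malformed or missing `usage` block from panicking; the sample is simply skipped.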

## Configuration

```json
{
  "compression": 200,
  "flushIntervalDuration": "5s"
}
```

- `compression` (optional, default `200`): t-digest compression. Higher values
trade memory for quantile accuracy. Must be `> 0`.
- `flushIntervalDuration` (optional, default `"5s"`): aggregation window before a
per-model digest snapshot is published to the AttributeMap. Set to `"0s"` to
publish on every event (used in unit tests). Must be `>= 0`.
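
One way to apply these defaults and bounds when decoding the plugin parameters is sketched below. The JSON field names and default values come from this section; the type and function names (`rawConfig`, `config`, `parseConfig`) are hypothetical and need not match the plugin's own code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// rawConfig mirrors the JSON shape; pointers distinguish "absent" from zero.
type rawConfig struct {
	Compression           *float64 `json:"compression"`
	FlushIntervalDuration *string  `json:"flushIntervalDuration"`
}

// config holds the validated values (hypothetical type).
type config struct {
	Compression   float64
	FlushInterval time.Duration
}

// parseConfig applies the documented defaults, then validates the bounds.
func parseConfig(data []byte) (config, error) {
	cfg := config{Compression: 200, FlushInterval: 5 * time.Second}

	var raw rawConfig
	if err := json.Unmarshal(data, &raw); err != nil {
		return config{}, err
	}
	if raw.Compression != nil {
		if *raw.Compression <= 0 {
			return config{}, errors.New("compression must be > 0")
		}
		cfg.Compression = *raw.Compression
	}
	if raw.FlushIntervalDuration != nil {
		d, err := time.ParseDuration(*raw.FlushIntervalDuration)
		if err != nil {
			return config{}, fmt.Errorf("flushIntervalDuration: %w", err)
		}
		if d < 0 {
			return config{}, errors.New("flushIntervalDuration must be >= 0")
		}
		cfg.FlushInterval = d
	}
	return cfg, nil
}

func main() {
	cfg, err := parseConfig([]byte(`{}`))
	fmt.Println(cfg.Compression, cfg.FlushInterval, err) // 200 5s <nil>

	cfg, err = parseConfig([]byte(`{"compression": 100, "flushIntervalDuration": "0s"}`))
	fmt.Println(cfg.Compression, cfg.FlushInterval, err) // 100 0s <nil>

	_, err = parseConfig([]byte(`{"compression": 0}`))
	fmt.Println(err) // compression must be > 0
}
```

Pointer fields are used so that an explicit `"0s"` is accepted while an omitted field falls back to the default.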

## Known limitations

- **Side-effect creation of empty Models for unconfigured names.** When a
response arrives for a model name that the operator never declared (i.e. a
model with no `pricing.TokenPrices` attribute), this extractor's lookup
goes through `Datastore.GetOrCreateModel`, which registers an empty Model
in the datastore as a side effect. The cost sample is correctly skipped,
but the model name leaks into `Datastore.Models()` and becomes visible to
every other plugin that enumerates the store. This is a limitation of the
current `Datastore` interface, which has no read-only `GetModel(name)`
method. A follow-up PR will add `GetModel` to the interface and migrate
this extractor to use it; once that lands, responses for unconfigured
models will be skipped without any datastore mutation.
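
The difference between the two lookups can be illustrated with a toy map-backed store. This is only a sketch: `store` and `Model` are invented for the example, and the `GetModel` signature shown is a guess at what a read-only lookup might look like, not the interface the follow-up PR will define.

```go
package main

import "fmt"

// Model and store are toy types invented for this example.
type Model struct{ Name string }

type store struct{ models map[string]*Model }

// GetOrCreateModel registers an empty Model when the name is unknown,
// which is the side effect described above.
func (s *store) GetOrCreateModel(name string) *Model {
	if m, ok := s.models[name]; ok {
		return m
	}
	m := &Model{Name: name}
	s.models[name] = m
	return m
}

// GetModel is a hypothetical read-only lookup; it never mutates the store.
func (s *store) GetModel(name string) (*Model, bool) {
	m, ok := s.models[name]
	return m, ok
}

func main() {
	s := &store{models: map[string]*Model{}}

	_, found := s.GetModel("undeclared")
	fmt.Println(found, len(s.models)) // false 0: nothing was registered

	s.GetOrCreateModel("undeclared")
	fmt.Println(len(s.models)) // 1: the name now appears in the store
}
```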