P1-4: Routing strategies — latency-based + cost-based + tag-based #54

Description

@moonming

Problem

`crates/aisix-proxy/src/routing.rs` ships failover / round-robin / weighted (first-hop only). LiteLLM has 10+ strategies. Three of them cover the bulk of customer requests:

  1. latency-based — pick lowest-EWMA-latency target.
  2. cost-based — pick cheapest `usd_per_1k_token` target meeting a max-latency-tolerance.
  3. tag-based — match request header tags to target tags (`x-aisix-tag: prod` → only targets tagged prod).
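A minimal sketch of the three selection rules listed above, using only the standard library. The `Target` struct and its field names (`ewma_latency_ms`, `tags`) and all function names are hypothetical, chosen for illustration; they are not the existing `aisix-core` model. Only `usd_per_1k_token` comes from this issue.

```rust
/// Hypothetical target shape; not the real aisix-core type.
#[derive(Debug, Clone)]
pub struct Target {
    pub name: String,
    pub ewma_latency_ms: f64,
    pub usd_per_1k_token: f64,
    pub tags: Vec<String>,
}

/// Latency-based: the target with the lowest EWMA latency.
pub fn pick_lowest_latency(targets: &[Target]) -> Option<&Target> {
    targets
        .iter()
        .min_by(|a, b| a.ewma_latency_ms.total_cmp(&b.ewma_latency_ms))
}

/// Cost-based: the cheapest target whose EWMA latency is within the tolerance.
/// Returns None when no target meets the tolerance.
pub fn pick_cheapest(targets: &[Target], max_latency_ms: f64) -> Option<&Target> {
    targets
        .iter()
        .filter(|t| t.ewma_latency_ms <= max_latency_ms)
        .min_by(|a, b| a.usd_per_1k_token.total_cmp(&b.usd_per_1k_token))
}

/// Tag-based: keep targets that share at least one tag with the request.
pub fn filter_by_tags<'a>(targets: &'a [Target], request_tags: &[String]) -> Vec<&'a Target> {
    targets
        .iter()
        .filter(|t| t.tags.iter().any(|tag| request_tags.contains(tag)))
        .collect()
}
```

What happens when the cost-based filter leaves no eligible target (fall back to another strategy, or fail the request) is left open here; the sketch simply returns `None`.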

Scope

DP (moonming/ai-gateway)

  • Extend `RoutingStrategy` enum in `aisix-core/src/models/routing.rs`.
  • Per-target rolling latency window (already partially tracked for the existing fallback logic in `routing.rs`). Use `hdrhistogram` or a simple EWMA.
  • Cost catalog on the DP side (or pull per-target cost from CP-pushed config).
  • Tag matching: request → `Vec` of tags; target → `Vec` of tags; a target is eligible when the intersection is non-empty.
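For the "simple EWMA" option, a per-target tracker could look roughly like the sketch below. The type name `LatencyEwma`, its methods, and the choice of smoothing factor are assumptions for illustration; the existing latency tracking in `routing.rs` may be shaped differently.

```rust
/// Exponentially weighted moving average of observed latencies (hypothetical type).
#[derive(Debug, Clone)]
pub struct LatencyEwma {
    /// Smoothing factor in (0, 1]; higher values react faster to new samples.
    alpha: f64,
    /// None until the first sample arrives.
    value_ms: Option<f64>,
}

impl LatencyEwma {
    pub fn new(alpha: f64) -> Self {
        assert!(alpha > 0.0 && alpha <= 1.0, "alpha must be in (0, 1]");
        Self { alpha, value_ms: None }
    }

    /// Fold one latency sample into the average.
    /// The first sample seeds the average directly.
    pub fn observe(&mut self, sample_ms: f64) {
        self.value_ms = Some(match self.value_ms {
            None => sample_ms,
            Some(prev) => self.alpha * sample_ms + (1.0 - self.alpha) * prev,
        });
    }

    /// Current estimate, or None if nothing has been observed yet.
    pub fn get(&self) -> Option<f64> {
        self.value_ms
    }
}
```

An EWMA keeps one float per target and no sample history, which is why it is cheaper than a histogram; the trade-off is that it cannot answer percentile questions such as p99.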

CP (api7/api7ee-3-control-plane)

  • Extend the env-scoped `routing` resource with the new strategy enum values.
  • Per-target metadata: tags and cost (already partially there).
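As a rough sketch of what the extended enum might look like once the CP pushes the new values, the block below parses a strategy string into a variant. The variant names and the kebab-case wire strings are assumptions; the real `RoutingStrategy` in `aisix-core` and the CP schema may name them differently.

```rust
use std::str::FromStr;

/// Hypothetical shape of the extended enum: three existing strategies plus three new ones.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RoutingStrategy {
    Failover,
    RoundRobin,
    Weighted,
    LatencyBased,
    CostBased,
    TagBased,
}

impl FromStr for RoutingStrategy {
    type Err = String;

    /// Map an assumed wire value to a variant; unknown values are rejected
    /// rather than silently mapped to a default.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "failover" => Ok(Self::Failover),
            "round-robin" => Ok(Self::RoundRobin),
            "weighted" => Ok(Self::Weighted),
            "latency-based" => Ok(Self::LatencyBased),
            "cost-based" => Ok(Self::CostBased),
            "tag-based" => Ok(Self::TagBased),
            other => Err(format!("unknown routing strategy: {other}")),
        }
    }
}
```

Rejecting unknown values matters during rollout: a DP that predates this change would surface a clear error if the CP pushed one of the new strategies, instead of routing with the wrong one.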

Dashboard UI (api7/AISIX-Cloud)

  • Routing-strategy picker adds 3 new entries.
  • Per-target form: tags input, cost field.

e2e

  • 3 targets with different EWMAs → assert latency-based picks the fastest.
  • 3 targets with different costs → assert cost-based picks the cheapest.
  • Tag header → assert tag-based keeps only the matching targets.
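The tag-header case needs some way to turn the `x-aisix-tag` value into a tag list before matching. The sketch below assumes a comma-separated value, which this issue does not specify; `parse_tag_header` is a hypothetical helper name.

```rust
/// Split a header value such as "prod, canary" into trimmed, de-duplicated tags.
/// ASSUMPTION: multiple tags are comma-separated; the issue only shows a single tag.
pub fn parse_tag_header(value: &str) -> Vec<String> {
    let mut tags: Vec<String> = Vec::new();
    for raw in value.split(',') {
        let tag = raw.trim();
        // Skip empty segments (e.g. "prod,,canary") and repeated tags.
        if !tag.is_empty() && !tags.iter().any(|t| t == tag) {
            tags.push(tag.to_string());
        }
    }
    tags
}
```

An e2e test could then send `x-aisix-tag: prod` against targets tagged `prod` and `staging` and assert that only the `prod` targets ever receive traffic.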

Estimate

DP 3d, CP 1d, UI 1d, e2e 1d

Metadata

Labels

  • P1: High-value differentiator
  • gap-with-litellm: Identified by LiteLLM feature parity audit