Problem
`crates/aisix-proxy/src/routing.rs` currently ships three strategies: failover, round-robin, and weighted (first-hop only). LiteLLM has 10+ strategies. Three of them cover the bulk of customer requests:
- latency-based — pick lowest-EWMA-latency target.
- cost-based — pick cheapest `usd_per_1k_token` target meeting a max-latency-tolerance.
- tag-based — match request header tags to target tags (`x-aisix-tag: prod` → only targets tagged prod).
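A minimal sketch of the latency-based strategy, assuming a plain EWMA per target. `LatencyEwma`, `pick_lowest_latency`, and the `alpha` smoothing factor are hypothetical names, not taken from `routing.rs`; skipping targets that have no samples yet is also an assumption.

```rust
/// Hypothetical per-target latency tracker (illustrative, not the existing code).
#[derive(Debug, Clone)]
pub struct LatencyEwma {
    /// Smoothing factor in (0, 1]; a higher value reacts faster to new samples.
    alpha: f64,
    /// `None` until the first sample is observed.
    value_ms: Option<f64>,
}

impl LatencyEwma {
    pub fn new(alpha: f64) -> Self {
        assert!(alpha > 0.0 && alpha <= 1.0, "alpha must be in (0, 1]");
        Self { alpha, value_ms: None }
    }

    /// Fold one observed latency into the average.
    /// The first sample seeds the average directly.
    pub fn observe(&mut self, sample_ms: f64) {
        self.value_ms = Some(match self.value_ms {
            None => sample_ms,
            Some(prev) => self.alpha * sample_ms + (1.0 - self.alpha) * prev,
        });
    }

    pub fn value_ms(&self) -> Option<f64> {
        self.value_ms
    }
}

/// Index of the target with the lowest EWMA latency.
/// Assumption: targets without any sample are skipped rather than preferred.
pub fn pick_lowest_latency(targets: &[LatencyEwma]) -> Option<usize> {
    targets
        .iter()
        .enumerate()
        .filter_map(|(i, t)| t.value_ms().map(|v| (i, v)))
        .min_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

Skipping unsampled targets means a fresh target gets no traffic until something else warms it up; a real implementation would probably need a probe or an optimistic default.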
Scope
DP (moonming/ai-gateway)
- Extend `RoutingStrategy` enum in `aisix-core/src/models/routing.rs`.
- Maintain a per-target rolling latency window (latency is already partially tracked for the existing fallback in `routing.rs`). Use `hdrhistogram` or a simple EWMA.
- Keep a cost catalog on the DP side (or pull per-target cost from CP-pushed config).
- Tag matching: request → `Vec` tags; target → `Vec` tags; a target is eligible when the intersection is non-empty.
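The tag and cost rules above could look roughly like this. It is a sketch under assumptions: the `Target` struct, its `name`, `tags`, and `ewma_latency_ms` fields, and the function names are hypothetical; only `usd_per_1k_token` and the "non-empty intersection" rule come from this issue.

```rust
/// Hypothetical view of a routing target (field names are illustrative,
/// except `usd_per_1k_token`, which is the cost unit named in this issue).
#[derive(Debug, Clone)]
pub struct Target {
    pub name: String,
    pub tags: Vec<String>,
    pub usd_per_1k_token: f64,
    pub ewma_latency_ms: f64,
}

/// Tag rule: a target is eligible when request tags and target tags intersect.
/// Taken literally, a request with no tags matches no target; whether untagged
/// requests should bypass the filter is left open here.
pub fn tag_eligible(request_tags: &[String], target: &Target) -> bool {
    request_tags.iter().any(|t| target.tags.contains(t))
}

/// Cost rule: the cheapest target whose latency is within the tolerance.
/// Returns `None` when no target meets `max_latency_ms`.
pub fn pick_cheapest(targets: &[Target], max_latency_ms: f64) -> Option<&Target> {
    targets
        .iter()
        .filter(|t| t.ewma_latency_ms <= max_latency_ms)
        .min_by(|a, b| a.usd_per_1k_token.total_cmp(&b.usd_per_1k_token))
}
```

Returning `None` leaves the no-eligible-target case to the caller, for example falling back to the existing failover strategy.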
CP (api7/api7ee-3-control-plane)
- The `routing` resource (env-scoped) is extended with the new strategy enum values.
- Per-target metadata: tags, cost (already partially there).
Dashboard UI (api7/AISIX-Cloud)
- Routing-strategy picker adds 3 new entries.
- Per-target form: tags input, cost field.
e2e
- 3 targets with different EWMAs → assert latency-based picks the fastest.
- 3 targets with different costs → cost-based picks cheapest.
- Tag header → tag-based filters correctly.
Estimate
DP 3d, CP 1d, UI 1d, e2e 1d