Bootstrap DFlash profit adaptive DM at max depth, converge via argmax

Anbeeld · Anbeeld · commit 57da314b5699 · 2026-06-15T21:02:43.000+02:00
The profit controller's cold start walked the active draft depth up through
low probe depths (0 -> 4 -> 8 -> max) before settling, so short responses
spent their useful window at low depth and felt laggy. A prior attempt to
"start high" instead reversed the walk to descend (max -> 8 -> 4) and let
production rest at the walk's terminal depth. Because cold-start probe
measurements are uniform, the controller collapsed to the floor depth and
could not climb back, halving decode throughput on high-acceptance workloads.

Decouple the resting depth from the probe walk:

- Cold start now holds production at the maximum draft depth once the no-spec
  baseline exists, instead of walking through low depths, so short requests
  run at max immediately. If the held max is measured as clearly worse than
  no-spec, the controller falls through early so a better depth can take over.
- The scheduler characterizes the lower probe spread through transient
  one-cycle excursions while production stays at max; the argmax candidate
  scorer then demotes only when a measured lower depth is genuinely faster
  (the safe, well-gated direction).
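
A minimal sketch of the hold-max decision and the scheduler's excursion
choice, under simplifying assumptions. `ColdStartModel`, `cold_start_hold`,
`excursion_depth` and their fields are hypothetical stand-ins, not the
server's API. `min_gain` plays the role of `dm_profit_min`, and the
`{2, 4, 8}` spread only mirrors the depths sampled in the warmup test. As in
the diff, the held depth is abandoned only when its ready episode score fails
to beat the no-spec baseline by the `(1 + min_gain)` margin.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of the cold-start bootstrap; names are illustrative only.
struct ColdStartModel {
    int              base_n_max;       // maximum draft depth
    bool             baseline_ready;   // no-spec baseline has been measured
    float            baseline_score;   // no-spec throughput score
    float            min_gain;         // stand-in for dm_profit_min
    std::vector<int> spread;           // lower probe depths to characterize (assumed)
    std::vector<int> samples;          // samples collected per spread depth
    int              required_samples; // samples needed per spread depth
};

// True once every spread depth has enough samples.
bool spread_characterized(const ColdStartModel & m) {
    for (std::size_t i = 0; i < m.spread.size(); ++i) {
        if (m.samples[i] < m.required_samples) {
            return false;
        }
    }
    return true;
}

// Controller side. Returns 0 to run no-spec until the baseline exists, the held
// depth while the spread is uncharacterized, or std::nullopt to fall through to
// the argmax scorer. `episode_score` is the held depth's score once its episode
// is ready (std::nullopt before that).
std::optional<int> cold_start_hold(const ColdStartModel & m, int current_n,
                                   std::optional<float> episode_score) {
    if (!m.baseline_ready) {
        return 0;
    }
    if (spread_characterized(m)) {
        return std::nullopt;
    }
    const int hold_n = current_n > 0 ? current_n : m.base_n_max;
    // Fall through if the held depth fails to beat no-spec by the margin.
    const bool clearly_bad = current_n > 0 && episode_score.has_value() &&
                             *episode_score < m.baseline_score * (1.0f + m.min_gain);
    if (clearly_bad) {
        return std::nullopt;
    }
    return hold_n;
}

// Scheduler side. On an exploration cycle, pick a one-cycle excursion depth:
// the first under-sampled spread depth. Production depth is left untouched.
// Returns 0 when there is nothing to sample (steady exploration takes over).
int excursion_depth(const ColdStartModel & m) {
    if (!m.baseline_ready) {
        return 0;
    }
    for (std::size_t i = 0; i < m.spread.size(); ++i) {
        if (m.samples[i] < m.required_samples) {
            return m.spread[i];
        }
    }
    return 0;
}
```

With `base_n_max = 8` and a ready baseline, the first decision returns 8,
excursions visit 2, then 4, then 8 until each has enough samples, and only
then does the sketch hand control to the argmax scorer.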

The existing scoring, hysteresis, active-episode, baseline-reprobe, off-probe,
and lower-rescue safeguards are unchanged, so the controller still converges
to the true throughput optimum whether it is high, mid, or low.
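
The "demote only when genuinely faster" rule can be pictured as a
hysteresis-gated argmax. This sketch assumes a simple relative `margin` and a
map of per-depth scores; `gated_argmax` is hypothetical and omits the
active-episode, baseline-reprobe, off-probe and lower-rescue safeguards the
real scorer keeps.

```cpp
#include <map>

// Hypothetical hysteresis-gated argmax over measured depths. `scores` maps a
// draft depth (0 = no-spec) to its current throughput score; `margin` is an
// assumed relative gain a challenger must exceed before the controller moves.
int gated_argmax(const std::map<int, float> & scores, int current_n, float margin) {
    const auto cur = scores.find(current_n);
    if (cur == scores.end()) {
        // Current depth unmeasured: nothing to compare against, so stay put.
        return current_n;
    }
    int   best_n     = current_n;
    float best_score = cur->second;
    for (const auto & [n, score] : scores) {
        if (score > best_score) {
            best_n     = n;
            best_score = score;
        }
    }
    // Move only when the winner clearly beats the current depth.
    return best_score > cur->second * (1.0f + margin) ? best_n : current_n;
}
```

Because the gate compares against the current depth's own score, a near-tie
keeps production where it is, which is what lets the held max survive until a
lower depth is measured as decisively faster.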

Add end-to-end convergence tests (high/mid/low optima) asserting the settled
depth -- the property the earlier regression violated -- and update the
cold-start/warmup tests to the hold-max mechanism.
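
The convergence tests model throughput as
score(d) = (1 + min(d, optimum)) / (20 + 2d). Below the optimum the score
rises with d, since the derivative of (1 + d) / (20 + 2d) with respect to d is
18 / (20 + 2d)^2 > 0; above it the numerator is fixed while the cycle cost
keeps growing, so the peak sits exactly at `optimum`. The sketch below checks
this by brute force. `model_score` and `model_argmax` are illustrative
helpers, and treating a no-spec cycle as one token per 35 ms is an assumption
about how the baseline is scored. The tests themselves assert a band around
the optimum rather than the exact peak.

```cpp
#include <algorithm>

// Synthetic throughput model from the convergence tests' comment:
// (1 + accepted tokens) per cycle over a cycle cost of 20 + 2d ms.
float model_score(int d, int optimum) {
    return (1.0f + std::min(d, optimum)) / (20.0f + 2.0f * d);
}

// Brute-force argmax of the model over depths 1..max_depth.
int model_argmax(int max_depth, int optimum) {
    int best = 1;
    for (int d = 2; d <= max_depth; ++d) {
        if (model_score(d, optimum) > model_score(best, optimum)) {
            best = d;
        }
    }
    return best;
}
```

For `max_depth = 15`, the argmax lands on 15, 12, 8 and 4 for the four
optima the tests use, and every positive depth scores above the assumed
no-spec rate of 1/35.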
diff --git a/tests/test-adaptive-dm.cpp b/tests/test-adaptive-dm.cpp
@@ -422,10 +422,13 @@ int main() {
         state.apply_profit_recommendation(recommended);
     }
     state.observe_profit_timing(0, 0, 0, 0.0f, 30.0f, 0.0f, 30.0f);
-    assert(state.decide_profit_n_max(8) == 4);
+    // With a baseline available the cold controller bootstraps at the maximum
+    // draft depth rather than walking up from a shallow probe.
+    assert(state.decide_profit_n_max(8) == 8);
 
-    // test cold start: fresh profit controllers seed no-spec baseline before
-    // any positive-depth DFlash cycle, then probe shallow depth first.
+    // test cold start: fresh profit controllers seed the no-spec baseline before
+    // any positive-depth DFlash cycle, then bootstrap production at the maximum
+    // draft depth (the lower spread is characterized via transient excursions).
     state.reset_profit_state();
     state.dm_profit_min_samples = 2;
     state.dm_off_dwell = 1;
@@ -435,7 +438,7 @@ int main() {
     assert(state.decide_profit_n_max(8) == 0);
     state.observe_profit_timing(0, 0, 0, 0.0f, 32.0f, 0.0f, 32.0f);
     assert(state.profit_baseline_ready());
-    assert(state.decide_profit_n_max(8) == 2);
+    assert(state.decide_profit_n_max(8) == 8);
 
     // Baseline scoring uses the same current EWMA policy as positive depths;
     // stale best no-spec spikes must not make baseline unbeatable.
@@ -447,17 +450,79 @@ int main() {
     assert(state.profit_baseline.best_score >= 39.9f);
     assert_close(state.profit_score_for_depth(0), 16.0f);
 
-    // test explicit warmup requires extra measured samples for the initial
-    // positive-depth probe before moving to the next depth.
+    // test explicit warmup requires extra measured samples per spread depth before
+    // the initial probe set is treated as characterized; production holds at the
+    // maximum draft depth throughout, and only argmax (post-characterization) moves.
     state.reset_profit_state();
     state.dm_profit_min_samples = 1;
     state.dm_profit_warmup = 2;
     state.observe_profit_timing(0, 0, 0, 0.0f, 30.0f, 0.0f, 30.0f);
-    assert(state.decide_profit_n_max(8) == 2);
+    assert(state.profit_baseline_ready());
+    assert(state.decide_profit_n_max(8) == 8);
+    // one sample per spread depth is not enough under warmup=2
     observe_profit_cycle(state, 2, 2, 1, 60.0f);
-    assert(state.decide_profit_n_max(8) == 2);
+    observe_profit_cycle(state, 4, 4, 1, 60.0f);
+    observe_profit_cycle(state, 8, 8, 1, 60.0f);
+    assert(!state.profit_initial_probe_set_ready(8));
+    assert(state.decide_profit_n_max(8) == 8);
+    // a second sample per spread depth completes characterization
     observe_profit_cycle(state, 2, 2, 1, 60.0f);
-    assert(state.decide_profit_n_max(8) == 4);
+    observe_profit_cycle(state, 4, 4, 1, 60.0f);
+    observe_profit_cycle(state, 8, 8, 1, 60.0f);
+    assert(state.profit_initial_probe_set_ready(8));
+    state.dm_profit_warmup = 0;
+
+    // End-to-end convergence: a cold profit controller bootstraps at the maximum
+    // draft depth and must then settle on the genuinely fastest depth, regardless
+    // of whether the optimum is high, mid, or low. This is the property the earlier
+    // "start high, walk down, rest at the walk terminal" regression violated: it
+    // collapsed to the floor and could not climb back. Throughput here is modeled
+    // unimodally (peak at `optimum`): score(d) = (1 + min(d, optimum)) / (20 + 2d),
+    // with a no-spec baseline every positive depth beats.
+    auto converged_depth = [](int base_n_max, int optimum) -> int {
+        server_adaptive_dm_state s;
+        s.dm_profit_min_samples = 2;
+        s.dm_profit_baseline_interval = 0; // disable reprobe noise for a deterministic check
+        auto feed = [&](int d) {
+            if (d <= 0) {
+                s.observe_profit_timing(0, 0, 0, 0.0f, 35.0f, 0.0f, 35.0f);
+                return;
+            }
+            const int acc = d < optimum ? d : optimum;
+            const float ms = 20.0f + 2.0f * (float) d;
+            s.observe_profit_acceptance(d, acc);
+            s.observe_profit_timing(d, d, acc, 0.0f, ms, 0.0f, ms);
+        };
+        feed(0);
+        feed(0);
+        assert(s.profit_baseline_ready());
+        int rec = s.decide_profit_n_max(base_n_max);
+        assert(rec == base_n_max); // cold start bootstraps at max, not a low probe
+        s.apply_profit_recommendation(rec);
+        // Each iteration samples production plus the full depth range, standing in
+        // for transient characterization/exploration excursions accumulating over
+        // time. The controller must converge to (and hold) the throughput optimum.
+        for (int iter = 0; iter < 200; ++iter) {
+            feed(s.adaptive_n_max > 0 ? s.adaptive_n_max : base_n_max);
+            for (int d = 1; d <= base_n_max; ++d) {
+                feed(d);
+            }
+            feed(0);
+            rec = s.decide_profit_n_max(base_n_max);
+            s.apply_profit_recommendation(rec);
+        }
+        return s.adaptive_n_max;
+    };
+    {
+        const int high = converged_depth(15, 15);
+        assert(high >= 12); // high optimum: stay high (the regressed case wanted this)
+        const int mid8 = converged_depth(15, 8);
+        assert(mid8 >= 6 && mid8 <= 10); // mid optimum identified, not stuck high or low
+        const int mid12 = converged_depth(15, 12);
+        assert(mid12 >= 10 && mid12 <= 14); // mid-high optimum identified
+        const int low = converged_depth(15, 4);
+        assert(low >= 2 && low <= 6); // low optimum identified, not stuck high
+    }
 
     // test reset_request_state preserves learned profit data while resetting
     // request-local counters
diff --git a/tools/server/server-adaptive-dm.h b/tools/server/server-adaptive-dm.h
@@ -1298,29 +1298,30 @@ struct server_adaptive_dm_state {
                 ? base_n_max
                 : std::clamp<int>(adaptive_n_max, 0, base_n_max));
         const bool returning_from_baseline_probe = profit_baseline_probe_resume_n > 0;
-        const int unready_probe = profit_next_unready_probe_depth(base_n_max);
         const bool collecting_initial_probe_set =
             profit_baseline_probe_resume_n <= 0 &&
             !profit_request_requires_fresh_switch_sample &&
             !profit_initial_probe_set_ready(base_n_max);
-        const bool current_episode_ready_for_probe =
-            current_n > 0 &&
-            profit_active_episode_ready(current_n, base_n_max);
-        const float current_episode_score_for_probe =
-            current_episode_ready_for_probe ? profit_active_episode_score(current_n) : 0.0f;
-        const bool current_episode_clearly_bad =
-            current_episode_ready_for_probe &&
-            current_episode_score_for_probe < baseline_score * (1.0f + dm_profit_min);
-        const bool unready_probe_can_reduce_current =
-            current_n > 0 &&
-            unready_probe > 0 &&
-            unready_probe < current_n;
-        if (collecting_initial_probe_set &&
-                unready_probe > 0 &&
-                (!current_episode_clearly_bad || unready_probe_can_reduce_current)) {
-            profit_current_score = baseline_score;
-            profit_last_recommended_n = unready_probe;
-            return unready_probe;
+        // Cold-start bootstrap: once the no-spec baseline exists, run production at
+        // the maximum draft depth and let the scheduler characterize the lower probe
+        // spread through transient exploration excursions (see get_dflash_n_draft_max).
+        // Holding max keeps short requests fast instead of stranding them at a low
+        // probe depth, and the argmax candidate scorer below demotes only when a
+        // measured lower depth is genuinely faster. If the held positive depth is
+        // measured to be clearly worse than no-spec, fall through so a better
+        // characterized depth (or baseline) can take over without waiting for the
+        // full spread.
+        if (collecting_initial_probe_set) {
+            const int hold_n = current_n > 0 ? current_n : base_n_max;
+            const bool hold_episode_clearly_bad =
+                current_n > 0 &&
+                profit_active_episode_ready(current_n, base_n_max) &&
+                profit_active_episode_score(current_n) < baseline_score * (1.0f + dm_profit_min);
+            if (!hold_episode_clearly_bad) {
+                profit_current_score = baseline_score;
+                profit_last_recommended_n = hold_n;
+                return hold_n;
+            }
         }
 
         if (current_n == 0 && profit_baseline_probe_resume_n <= 0) {
diff --git a/tools/server/server-context.cpp b/tools/server/server-context.cpp
@@ -1264,12 +1264,20 @@ struct server_slot : server_adaptive_dm_state {
                 if (advance_adaptive_probe) {
                     explore_counter++;
                     if (explore_counter % dm_explore_interval == 0) {
-                        const int explore_n_max = profit_next_unready_explore_depth(
-                                adaptive_n_max,
-                                base_n_max,
-                                explore_counter / dm_explore_interval);
-                        if (explore_n_max > 0) {
-                            n_draft_max = explore_n_max;
+                        // While the cold-start probe spread is still uncharacterized,
+                        // sample it through transient excursions so the profit argmax
+                        // can compare low/mid/high depths; production stays at the held
+                        // max between excursions. Afterwards, use steady local explore.
+                        const int excursion_n_max =
+                            (profit_baseline_ready() &&
+                                    !profit_initial_probe_set_ready(base_n_max))
+                                ? profit_next_unready_probe_depth(base_n_max)
+                                : profit_next_unready_explore_depth(
+                                        adaptive_n_max,
+                                        base_n_max,
+                                        explore_counter / dm_explore_interval);
+                        if (excursion_n_max > 0) {
+                            n_draft_max = excursion_n_max;
                         }
                     }
                 }