Skip to content

Instrument SimTable for rare ActionIsntFinished flake#99

Merged
folkengine merged 3 commits into
mainfrom
checks
May 17, 2026
Merged

Instrument SimTable for rare ActionIsntFinished flake#99
folkengine merged 3 commits into
mainfrom
checks

Conversation

@folkengine
Copy link
Copy Markdown
Collaborator

Action rejections in apply_action were silently swallowed via let _ =,
masking a rare CI stall where run_street exhausts its iteration cap and
bring_it_in() then fails with ActionIsntFinished. The eprintln!s here
fire only on the smoking-gun paths (rejected primary action; rejected
fallback for Bet/Raise; run_street exit with incomplete betting), so
green runs stay silent. The next failure on CI will surface the
offending PlayerAction, error, seat, street, and per-seat state under
the failing test's captured stderr. No behavior change.

Next move when CI fails again

The diagnostic output will appear under the failing test's ---- ... stdout ---- block. With one captured failure, we can:

  1. Identify which action the decider returned that the table rejected.
  2. Identify which street and what seat state caused it.
  3. Decide between the three fix paths from earlier (surface-and-force-fold, decider-rule fix, or seeded-RNG plumbing) with a concrete reproducer in hand instead of a probability guess.

  Action rejections in apply_action were silently swallowed via let _ =,
  masking a rare CI stall where run_street exhausts its iteration cap and
  bring_it_in() then fails with ActionIsntFinished. The eprintln!s here
  fire only on the smoking-gun paths (rejected primary action; rejected
  fallback for Bet/Raise; run_street exit with incomplete betting), so
  green runs stay silent. The next failure on CI will surface the
  offending PlayerAction, error, seat, street, and per-seat state under
  the failing test's captured stderr. No behavior change.

Next move when CI fails again

  The diagnostic output will appear under the failing test's ---- ... stdout ---- block. With one captured failure, we can:

  1. Identify which action the decider returned that the table rejected.
  2. Identify which street and what seat state caused it.
  3. Decide between the three fix paths from earlier (surface-and-force-fold, decider-rule fix, or seeded-RNG plumbing) with a concrete
  reproducer in hand instead of a probability guess.
  The CI flakes in tests/exploitative_play_smoke.rs (ActionIsntFinished
  stalls, no-rule-fires assertions) traced to RuleBasedDecider::decide
  calling rand::rng() — the thread-local — at decider.rs:126, plus the
  deck shuffle using the same thread-local at cards.rs:467. Across 1,000
  hands of probabilistic play, rare RNG samples produced pathological
  sequences that the assertions couldn't tolerate.

  This change is additive only:

    * BotDecider trait gains decide_seeded / on_new_hand_with_rng with
      default impls that forward to the existing methods. All three
      shipped deciders (RuleBasedDecider, JokerDecider,
      ExploitativeDecider) override decide_seeded; JokerDecider also
      gains new_with_rng and overrides on_new_hand_with_rng.
    * SimTable gains an Option<SmallRng> field and with_seed / with_rng
      builder methods. When set, run_hand uses shuffle_in_place_with on
      the deck and dispatches deciders through decide_seeded; when
      unset, the original thread-local path is preserved.
    * Cards gains shuffle_in_place_with; shuffle_in_place delegates.
    * The three smoke tests opt into deterministic runs via .with_seed(0).

  Verified across 50 consecutive runs of the smoke suite (50 pass, 0
  fail), 9,059 lib tests, all doc tests, all sibling integration tests
  including bot-training. No downstream impact — existing BotDecider
  impls (e.g. TelemetryDecider in examples) compile unchanged.

Optional follow-ups

  1. Heuristic-calibration gap: the seed sweep showed BotProfile::loose_passive produces VPIP ~0.13 and PFR ~0.06 in heads-up play —
  nit-like, not LP-like. Either RangeStrategy::loose_passive's range is too tight or RuleBasedDecider's preflop calling logic ignores profile
   width. Worth a separate investigation; not blocking this PR.
  2. Decide whether to keep the previous instrumentation: the eprintln! smoking-gun diagnostics in src/bot/sim.rs::apply_action/run_street
  and the enriched panic message in tests/exploitative_play_smoke.rs::accumulated_lp_stats_cause_profile_deviation are now
  belt-and-suspenders — they don't fire on the green path, only on rare regressions. Recommend keeping them as cheap insurance, but happy to
  revert if you'd rather the PR be tight.
@folkengine folkengine merged commit ec7a21c into main May 17, 2026
10 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant