fix: use full test dataset for support overview training table #120
fix: use full test dataset for support overview training table #120vishpillai123 wants to merge 3 commits into
Conversation
- Updated run_predictions() to accept test_sample_cap=None for full dataset - Modified training_h2o.py to generate support_overview using full test dataset - Ensures support_overview matches ROC table in using complete test data - Previously used sampled 200 rows, now uses all test rows for accurate distribution
|
@vishpillai123 Is this ready for review? If yes, can you add a comment for context and explanation? Then can you tag Jonathan? |
Not ready yet since we need to backfill old support score tables for schools. The code itself is fine, I just need to backfill. |
jnolendata
left a comment
There was a problem hiding this comment.
The change makes sense—using the full test set for the support overview ensures the numbers reflect reality. I noticed run_predictions is called twice: the first call does the usual predictions and SHAP, and the second call is mainly for generating support_score_distribution. The second call repeats model loading, imputation, prediction, and SHAP, which might be more than needed since we just want the support table. Maybe a lighter approach for the second call could work, but overall the logic is clear.
No description provided.