Skip to content

fix: use full test dataset for support overview training table #120

Open
vishpillai123 wants to merge 3 commits into
developfrom
feat/support-overview-full-test-dataset
Open

fix: use full test dataset for support overview training table #120
vishpillai123 wants to merge 3 commits into
developfrom
feat/support-overview-full-test-dataset

Conversation

@vishpillai123
Copy link
Copy Markdown
Collaborator

No description provided.

- Updated run_predictions() to accept test_sample_cap=None for full dataset
- Modified training_h2o.py to generate support_overview using full test dataset
- Ensures support_overview matches ROC table in using complete test data
- Previously used sampled 200 rows, now uses all test rows for accurate distribution
@vishpillai123 vishpillai123 changed the base branch from main to develop March 3, 2026 23:11
@kaylawilding
Copy link
Copy Markdown
Collaborator

kaylawilding commented May 1, 2026

@vishpillai123 Is this ready for review? If yes, can you add a comment for context and explanation? Then can you tag Jonathan?

@vishpillai123
Copy link
Copy Markdown
Collaborator Author

@vishpillai123 Is this ready for review? If yes, can you add a comment for context and explanation? Then can you tag Jonathan?

Not ready yet since we need to backfill old support score tables for schools. The code itself is fine, I just need to backfill.

Copy link
Copy Markdown

@jnolendata jnolendata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change makes sense—using the full test set for the support overview ensures the numbers reflect reality. I noticed run_predictions is called twice: the first call does the usual predictions and SHAP, and the second call is mainly for generating support_score_distribution. The second call repeats model loading, imputation, prediction, and SHAP, which might be more than needed since we just want the support table. Maybe a lighter approach for the second call could work, but overall the logic is clear.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants