Skip to content

feat: create edvise inference tasks#153

Open
nm3224 wants to merge 11 commits into
developfrom
feat-create-edvise-inference-tasks
Open

feat: create edvise inference tasks#153
nm3224 wants to merge 11 commits into
developfrom
feat-create-edvise-inference-tasks

Conversation

@nm3224
Copy link
Copy Markdown
Collaborator

@nm3224 nm3224 commented May 21, 2026

Summary

  • Makes inference prep and H2O inference schema-aware so both PDP and Edvise/ES schools can run the same workflow.
  • Renames pdp_inf_prep.py → inf_prep.py
  • Adds --schema_type / --job_type so config, feature cleanup, and cohort logging work for both PDP (PDPProjectConfig / PDPCleanup) and Edvise (ESProjectConfig / ESCleanup) without changing PDP behavior.
  • Adds inf_prep and inference_h2o tasks to github_es_inference.yml so the Edvise inference pipeline matches PDP end-to-end (checkpoint → student_selection → inf_prep → inference_h2o).
  • Updates inference_h2o.py to load config via project_config_class(schema_type) instead of hardcoding PDPProjectConfig; both inference YAMLs now pass --schema_type.
  • Extracts shared helpers to utils.py: cohort_pair_columns() and feature_cleanup_for_schema() (also used by model_prep.py for cohort logging and cleanup selection).
  • Fixes test import after the rename (pdp_inf_prep → inf_prep) and mypy typing for the new helpers.
  • PDP inference is unchanged when schema_type=pdp (default). Edvise jobs pass schema_type=edvise (or es) through the pipeline.

Question/Need to Fix: features table

The current features table is for PDP-only — custom schools will have their own table, likely stored in unity catalog. We should make this flexible to read from UC too for edvise schools. This PR does not fully solve that yet. inference_h2o still defaults to shared/assets/pdp_features_table.toml, with --features_table_path wired as a manual override. There is no catalog-based resolution for per-school feature tables yet. We can consider resolving this by reading the features table path from school/config/catalog (similar to how other pipeline inputs are resolved), and pass it through the inference YAMLs for Edvise/custom schools instead of relying on the PDP default.

@nm3224 nm3224 marked this pull request as draft May 21, 2026 19:05
nm3224 added 6 commits May 21, 2026 15:16
Add schema_type-aware config, cleanup, and cohort logging while keeping the existing PDP term_filter flow unchanged.
…s PDP, changed to do that

note: it only reads "shared/assets/pdp_features_table.toml" right now for the feature table- we should devise a way to read it from UC for edvise schools either by storing their feature table in silver or in bronze
@nm3224 nm3224 marked this pull request as ready for review May 21, 2026 20:08
Copy link
Copy Markdown
Collaborator

@vishpillai123 vishpillai123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @nm3224, I haven't gotten the chance to review the PR yet, but just wanted to give my take on your question about the features table.

I'm pretty sure we are okay using the PDP one since edvise and PDP have the same feature engineering process. Is that right @kaylawilding ? Either way, we either should create an edvise features table or have a shared one for PDP/edvise. Another option is a shared one for PDP/edvise, then PDP one and edvise one for where they are different. But I think there's a lot of similarity between PDP and edvise now.

We should move away from defining one at the unity catalog level. That's only for legacy schools. I also don't think we need to have a --features_table_path as a manual override. This should just match PDP.

@nm3224
Copy link
Copy Markdown
Collaborator Author

nm3224 commented May 26, 2026

hey @nm3224, I haven't gotten the chance to review the PR yet, but just wanted to give my take on your question about the features table.

I'm pretty sure we are okay using the PDP one since edvise and PDP have the same feature engineering process. Is that right @kaylawilding ? Either way, we either should create an edvise features table or have a shared one for PDP/edvise. Another option is a shared one for PDP/edvise, then PDP one and edvise one for where they are different. But I think there's a lot of similarity between PDP and edvise now.

We should move away from defining one at the unity catalog level. That's only for legacy schools. I also don't think we need to have a --features_table_path as a manual override. This should just match PDP.

@vishpillai123 oh you're totally right- sorry i'm mixing up edvise vs legacy in my head! i think a shared one would be good even if there's some features that don't overlap and are only edvise-specific or pdp-specific, since the majority likely will overlap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants