feat: create edvise inference tasks #153
… to just inf_prep with aim to make it school agnostic
… from we haven't written data ingestion yet, so for now we read from bronze
Add schema_type-aware config, cleanup, and cohort logging while keeping the existing PDP term_filter flow unchanged.
…s PDP, changed to do that. Note: it only reads "shared/assets/pdp_features_table.toml" right now for the feature table. We should devise a way to read it from UC for edvise schools, either by storing their feature table in silver or in bronze.
hey @nm3224, I haven't gotten the chance to review the PR yet, but just wanted to give my take on your question about the features table.
I'm pretty sure we are okay using the PDP one since edvise and PDP have the same feature engineering process. Is that right @kaylawilding ? Either way, we either should create an edvise features table or have a shared one for PDP/edvise. Another option is a shared one for PDP/edvise, then PDP one and edvise one for where they are different. But I think there's a lot of similarity between PDP and edvise now.
We should move away from defining one at the unity catalog level. That's only for legacy schools. I also don't think we need to have a --features_table_path as a manual override. This should just match PDP.
@vishpillai123 oh you're totally right- sorry i'm mixing up edvise vs legacy in my head! i think a shared one would be good even if there's some features that don't overlap and are only edvise-specific or pdp-specific, since the majority likely will overlap.
Summary
Question/Need to Fix: features table
The current features table is PDP-only. Custom schools will have their own table, likely stored in Unity Catalog (UC), so we should make this flexible enough to read from UC for Edvise schools too. This PR does not fully solve that yet: inference_h2o still defaults to shared/assets/pdp_features_table.toml, with --features_table_path wired as a manual override, and there is no catalog-based resolution for per-school feature tables. One option is to resolve the features table path from the school config or catalog (similar to how other pipeline inputs are resolved) and pass it through the inference YAMLs for Edvise/custom schools instead of relying on the PDP default.
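A rough sketch of one possible resolution order for the features table path: manual override first, then a per-school config value, then the PDP default. Only the default path comes from this PR; the function name `resolve_features_table_path` and the `features_table_path` config key are hypothetical and do not reflect existing code in the repo.

```python
from typing import Mapping, Optional

# Default path as stated in this PR; everything else below is hypothetical.
PDP_DEFAULT_FEATURES_TABLE = "shared/assets/pdp_features_table.toml"


def resolve_features_table_path(
    cli_override: Optional[str],
    school_config: Mapping[str, object],
    default: str = PDP_DEFAULT_FEATURES_TABLE,
) -> str:
    """Pick the features table path: CLI override, then school config, then default."""
    # 1. An explicit --features_table_path value wins if it was provided.
    if cli_override:
        return cli_override

    # 2. Otherwise use a per-school value, if the config defines one.
    #    "features_table_path" is an assumed key name, not an existing one.
    configured = school_config.get("features_table_path")
    if isinstance(configured, str) and configured.strip():
        return configured.strip()

    # 3. Fall back to the shared PDP table.
    return default
```

With something like this, the inference YAMLs would only need to pass a value for Edvise/custom schools that actually have their own table; PDP schools would keep the current behavior.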