feat: remove custom institutions, enforce PDP / Edvise / Legacy in API and uploads#228
Conversation
…y schemas UNKNOWN - Require exactly one of PDP, Edvise, or Legacy on POST /institutions - Remove custom schema resolution and Databricks extension generation for uploads - Fix PATCH /institutions to persist allowed_schemas to inst.schemas column - LEGACY_SCHEMA_GROUP stores UNKNOWN only; drop validation_extension module - Update tests and default fixtures for typeless/custom removal Made-with: Cursor
- POST/PATCH: require exactly one school type (pdp, edvise, or legacy) - PATCH: recompute schemas only when the type triple changes; merge optional allowed_schemas on change - PATCH: honor is_edvise/is_legacy for auto-assigned ids (POST parity) - Docs/tests: validation namespaces; disambiguate custom naming in code and tests Made-with: Cursor
Restore original docstrings and test names where "custom" referred to\nconverters, schema config, or JSON keys—not custom institutions.\n\nKeep gcsutil validate_file institution_id line aligned with pdp/edvise/legacy\nonly (no institution-UUID-for-custom upload path). Made-with: Cursor
…ol type When (name, state) matches an existing InstTable row, validate stored\npdp_id/edvise_id/legacy_id the same as new creates: at most one non-null\nand exactly one required. Return 400 with guidance instead of 200 for\ntypeless or invalid rows. Add regression tests. Made-with: Cursor
…s-only - Reject is_pdp without pdp_id on POST\n- Reject duplicate (name, state) when stored row has conflicting ids\n- Reject PATCH is_edvise on PDP row without clearing pdp_id\n- Reject PATCH with both is_edvise and is_legacy\n- allowed_schemas-only PATCH replaces schemas when type unchanged Made-with: Cursor
- Add shared mutual-exclusion detail constant for POST/PATCH paths - Extract duplicate-post row validation and PATCH merge/validate/persist helpers - Keep update_inst within single-responsibility helpers; reuse row response mapper Made-with: Cursor
- Remove unused HTTPException import from databricks.py (F401) - Cast ORM row in _require_single_institution_row_by_uuid for InstTable (no-any-return) Made-with: Cursor
Made-with: Cursor
| LEGACY_SCHEMA_GROUP: Final = { | ||
| SchemaType.STUDENT, | ||
| SchemaType.COURSE, | ||
| SchemaType.UNKNOWN, |
There was a problem hiding this comment.
Do we want to use "UNKNOWN" as a catch-all for any type of dataset?
Is it possible to remove LEGACY_SCHEMA_GROUP entirely?
There was a problem hiding this comment.
When a file is validated, the code must always get at least one schema name in the list it validates against. An empty list causes failure. For legacy schools, we still save a default UNKNOWN on the institution that way the stored schemas field is never empty.
So we need something for LEGACY_SCHEMA_GROUP schemas, I just made it "UNKNOWN". If we want to call it something different I can adjust it.
There was a problem hiding this comment.
Given that we are removing schema extensions, which is the right call to me given how we're evolving in Edvise, I'm thinking we'll have to reconsider or even deprecate our previous API validation layer (Mesh's validation implementation, not what you've recently implemented for PDP). This is out of scope for this PR, but curious to hear your thoughts here.
There was a problem hiding this comment.
So kind of - technically legacy validation is still set up using the schema validation Mesh set up, we're just basically using what's already been built to say that if something is a legacy school it doesn't need any specific validations applied to it.
So once we have Edvise also on the Pandera validation system, legacy will be the only thing using the old system. At that point it would probably make sense to reconfigure/refactor validation for legacy schools so that we can simplify the code (since we're not actually using it for what it was supposed to be used for). So agreed on this - but technically it doesn't break anything currently to have it, so not sure if it would be a priority.
| "Edvise Schema (ES) (set edvise_id or is_edvise), or Legacy " | ||
| "(set legacy_id or is_legacy)." | ||
| ) | ||
| _MUTUALLY_EXCLUSIVE_SCHOOL_TYPE_DETAIL = ( |
There was a problem hiding this comment.
thank you for building this part out!
|
what is a school-type triple? |
@vishpillai123 it's just the three IDs we store on an institution: pdp_id, edvise_id, and legacy_id. Together they answer “what kind of school is this?” The rule is: exactly one of the three should be set (PDP or Edvise Schema or Legacy—not more than one, not zero). It's basically just referring to “those three fields as a group,” |
Ok got it makes sense! I feel like having two columns with "school_type", "school_id" makes a bit more intuitive sense to me, where school_type is limited to "pdp", "edvise", and "legacy". I'm assuming the ID is a random integer of some kind? Though I think this is fine to have the current design too. |
@vishpillai123 100% agree, and Bill wants this as well. We just already had is_pdp and pdp_id as columns, so it was following that format |
changes
Institutions API (
routers/institutions.py)pdp_id), Edvise Schema / ES (edvise_idoris_edvisewith auto-id), or Legacy (legacy_idoris_legacywith auto-id).is_edvise/is_legacycan auto-assign ids like POST;schemasis recomputed only when the school-type triple changes;allowed_schemasalone can still replaceschemaswhen the type does not change._MUTUALLY_EXCLUSIVE_SCHOOL_TYPE_DETAIL), helpers for duplicate-post checks, PATCH merge/validate/persist, and a slimmerupdate_inst(DRY + structure aligned with internal style guidelines).Data / upload path (
routers/data.pyand related)Utilities
LEGACY_SCHEMA_GROUPincludesUNKNOWNas appropriate for this model).Tests (
institutions_test.py, and related where touched)allowed_schemas-only PATCH, and invalid existing rows.context
Custom institutions are being retired. The API and upload pipeline must not accept or branch on a fourth “custom” school type; institutions and uploads must map cleanly to PDP, Edvise (ES), or Legacy with consistent validation and predictable
schemasbehavior. Hardening duplicate POST avoids silently succeeding when a legacy bad row exists (typeless or conflicting ids).https://app.asana.com/1/6325821815997/project/1208486153653829/task/1213791624909790?focus=true
testing
Tested deployed feature branch on dev with its associated edvise-ui feature branch: confirmed that custom is removed and that school creation is working as intended.
questions