Skip to content

feat: remove custom institutions, enforce PDP / Edvise / Legacy in API and uploads#228

Merged
vishpillai123 merged 8 commits into
developfrom
feat/remove-custom-institutions
Apr 13, 2026
Merged

feat: remove custom institutions, enforce PDP / Edvise / Legacy in API and uploads#228
vishpillai123 merged 8 commits into
developfrom
feat/remove-custom-institutions

Conversation

@chapmanhk

@chapmanhk chapmanhk commented Apr 10, 2026

Copy link
Copy Markdown
Collaborator

changes

  • Institutions API (routers/institutions.py)

    • Remove the custom-institution model from create/update semantics: every institution must resolve to exactly one of PDP (pdp_id), Edvise Schema / ES (edvise_id or is_edvise with auto-id), or Legacy (legacy_id or is_legacy with auto-id).
    • POST: Validates mutual exclusivity and “exactly one type”; idempotent duplicate POST (same name + state) returns 400 if the existing row is misconfigured (e.g. more than one id set, or none).
    • PATCH: Same invariants; is_edvise / is_legacy can auto-assign ids like POST; schemas is recomputed only when the school-type triple changes; allowed_schemas alone can still replace schemas when the type does not change.
    • Refactor: Shared error copy (_MUTUALLY_EXCLUSIVE_SCHOOL_TYPE_DETAIL), helpers for duplicate-post checks, PATCH merge/validate/persist, and a slimmer update_inst (DRY + structure aligned with internal style guidelines).
  • Data / upload path (routers/data.py and related)

    • Upload resolution limited to edvise / pdp / legacy (no custom-institution path).
  • Utilities

    • Legacy schema grouping updated (e.g. LEGACY_SCHEMA_GROUP includes UNKNOWN as appropriate for this model).
  • Tests (institutions_test.py, and related where touched)

    • Coverage for duplicate POST edge cases, PATCH flags, allowed_schemas-only PATCH, and invalid existing rows.

context

Custom institutions are being retired. The API and upload pipeline must not accept or branch on a fourth “custom” school type; institutions and uploads must map cleanly to PDP, Edvise (ES), or Legacy with consistent validation and predictable schemas behavior. Hardening duplicate POST avoids silently succeeding when a legacy bad row exists (typeless or conflicting ids).

https://app.asana.com/1/6325821815997/project/1208486153653829/task/1213791624909790?focus=true

testing

Tested deployed feature branch on dev with its associated edvise-ui feature branch: confirmed that custom is removed and that school creation is working as intended.

questions

  • None

…y schemas UNKNOWN

- Require exactly one of PDP, Edvise, or Legacy on POST /institutions
- Remove custom schema resolution and Databricks extension generation for uploads
- Fix PATCH /institutions to persist allowed_schemas to inst.schemas column
- LEGACY_SCHEMA_GROUP stores UNKNOWN only; drop validation_extension module
- Update tests and default fixtures for typeless/custom removal

Made-with: Cursor
- POST/PATCH: require exactly one school type (pdp, edvise, or legacy)
- PATCH: recompute schemas only when the type triple changes; merge optional allowed_schemas on change
- PATCH: honor is_edvise/is_legacy for auto-assigned ids (POST parity)
- Docs/tests: validation namespaces; disambiguate custom naming in code and tests

Made-with: Cursor
Restore original docstrings and test names where "custom" referred to\nconverters, schema config, or JSON keys—not custom institutions.\n\nKeep gcsutil validate_file institution_id line aligned with pdp/edvise/legacy\nonly (no institution-UUID-for-custom upload path).

Made-with: Cursor
…ol type

When (name, state) matches an existing InstTable row, validate stored\npdp_id/edvise_id/legacy_id the same as new creates: at most one non-null\nand exactly one required. Return 400 with guidance instead of 200 for\ntypeless or invalid rows. Add regression tests.

Made-with: Cursor
…s-only

- Reject is_pdp without pdp_id on POST\n- Reject duplicate (name, state) when stored row has conflicting ids\n- Reject PATCH is_edvise on PDP row without clearing pdp_id\n- Reject PATCH with both is_edvise and is_legacy\n- allowed_schemas-only PATCH replaces schemas when type unchanged

Made-with: Cursor
- Add shared mutual-exclusion detail constant for POST/PATCH paths
- Extract duplicate-post row validation and PATCH merge/validate/persist helpers
- Keep update_inst within single-responsibility helpers; reuse row response mapper

Made-with: Cursor
- Remove unused HTTPException import from databricks.py (F401)
- Cast ORM row in _require_single_institution_row_by_uuid for InstTable (no-any-return)

Made-with: Cursor
Comment thread src/webapp/utilities.py
LEGACY_SCHEMA_GROUP: Final = {
SchemaType.STUDENT,
SchemaType.COURSE,
SchemaType.UNKNOWN,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to use "UNKNOWN" as a catch-all for any type of dataset?

Is it possible to remove LEGACY_SCHEMA_GROUP entirely?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a file is validated, the code must always get at least one schema name in the list it validates against. An empty list causes failure. For legacy schools, we still save a default UNKNOWN on the institution that way the stored schemas field is never empty.
So we need something for LEGACY_SCHEMA_GROUP schemas, I just made it "UNKNOWN". If we want to call it something different I can adjust it.

@vishpillai123 vishpillai123 Apr 10, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we are removing schema extensions, which is the right call to me given how we're evolving in Edvise, I'm thinking we'll have to reconsider or even deprecate our previous API validation layer (Mesh's validation implementation, not what you've recently implemented for PDP). This is out of scope for this PR, but curious to hear your thoughts here.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So kind of - technically legacy validation is still set up using the schema validation Mesh set up, we're just basically using what's already been built to say that if something is a legacy school it doesn't need any specific validations applied to it.
So once we have Edvise also on the Pandera validation system, legacy will be the only thing using the old system. At that point it would probably make sense to reconfigure/refactor validation for legacy schools so that we can simplify the code (since we're not actually using it for what it was supposed to be used for). So agreed on this - but technically it doesn't break anything currently to have it, so not sure if it would be a priority.

"Edvise Schema (ES) (set edvise_id or is_edvise), or Legacy "
"(set legacy_id or is_legacy)."
)
_MUTUALLY_EXCLUSIVE_SCHOOL_TYPE_DETAIL = (

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for building this part out!

@vishpillai123

Copy link
Copy Markdown
Collaborator

what is a school-type triple?

@chapmanhk

Copy link
Copy Markdown
Collaborator Author

what is a school-type triple?

@vishpillai123 it's just the three IDs we store on an institution: pdp_id, edvise_id, and legacy_id.

Together they answer “what kind of school is this?” The rule is: exactly one of the three should be set (PDP or Edvise Schema or Legacy—not more than one, not zero). It's basically just referring to “those three fields as a group,”

@vishpillai123

vishpillai123 commented Apr 13, 2026

Copy link
Copy Markdown
Collaborator

what is a school-type triple?

@vishpillai123 it's just the three IDs we store on an institution: pdp_id, edvise_id, and legacy_id.

Together they answer “what kind of school is this?” The rule is: exactly one of the three should be set (PDP or Edvise Schema or Legacy—not more than one, not zero). It's basically just referring to “those three fields as a group,”

Ok got it makes sense! I feel like having two columns with "school_type", "school_id" makes a bit more intuitive sense to me, where school_type is limited to "pdp", "edvise", and "legacy". I'm assuming the ID is a random integer of some kind?

Though I think this is fine to have the current design too.

@vishpillai123 vishpillai123 merged commit 55c2009 into develop Apr 13, 2026
5 checks passed
@chapmanhk

Copy link
Copy Markdown
Collaborator Author

what is a school-type triple?

@vishpillai123 it's just the three IDs we store on an institution: pdp_id, edvise_id, and legacy_id.
Together they answer “what kind of school is this?” The rule is: exactly one of the three should be set (PDP or Edvise Schema or Legacy—not more than one, not zero). It's basically just referring to “those three fields as a group,”

Ok got it makes sense! I feel like having two columns with "school_type", "school_id" makes a bit more intuitive sense to me, where school_type is limited to "pdp", "edvise", and "legacy". I'm assuming the ID is a random integer of some kind?

Though I think this is fine to have the current design too.

@vishpillai123 100% agree, and Bill wants this as well. We just already had is_pdp and pdp_id as columns, so it was following that format

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants