
feat: edvise schema school data audit #89

Draft
nm3224 wants to merge 86 commits into develop from feat-custom_data_audit

nm3224 (Collaborator) commented Jan 13, 2026

feat: edvise schema school data audit

Edvise Cohort Standardizer:

  • Operates similarly to PDP's cohort standardizer.
  • Logs credential types and enrollment types
  • Logs columns with a high share of NAs
  • Logs missing bias variables
  • Logs grade distribution
  • Logs top majors
  • Replaces NA values with "N" in the pell and first gen columns
  • Finds and logs duplicates on primary keys: ["student_id", "cohort_term"]
    - Runs the drop readmits func, then checks again for duplicates; if any remain, runs the keep_earlier_record func
  • Drops unused, unpopulated bias columns
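
The NA fill and the two-stage de-dupe above can be sketched roughly as follows. This is a minimal illustration assuming pandas DataFrames, not the code in this PR: `fill_flag_nas`, `find_duplicates`, and `dedupe_cohort` are hypothetical names, and the drop readmits and keep_earlier_record steps are passed in as callables because their actual logic isn't described here.

```python
import logging
from typing import Callable

import pandas as pd

LOGGER = logging.getLogger(__name__)

# Primary keys as listed in the PR description.
COHORT_KEYS = ["student_id", "cohort_term"]

# Any function that takes a DataFrame and returns a DataFrame.
Transform = Callable[[pd.DataFrame], pd.DataFrame]


def fill_flag_nas(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Replace NA values with "N" in the given flag columns, skipping absent ones."""
    out = df.copy()
    for col in cols:
        if col in out.columns:
            out[col] = out[col].fillna("N")
    return out


def find_duplicates(df: pd.DataFrame, keys: list[str]) -> pd.DataFrame:
    """Return every row that shares its key values with at least one other row."""
    dupes = df[df.duplicated(subset=keys, keep=False)]
    LOGGER.info("Found %d duplicate rows on %s", len(dupes), keys)
    return dupes


def dedupe_cohort(
    df: pd.DataFrame,
    keys: list[str],
    drop_readmits: Transform,
    keep_earlier_record: Transform,
) -> pd.DataFrame:
    """Run drop_readmits first; fall back to keep_earlier_record if duplicates remain."""
    if find_duplicates(df, keys).empty:
        return df
    df = drop_readmits(df)
    if not find_duplicates(df, keys).empty:
        df = keep_earlier_record(df)
    return df
```

Passing the two steps in as callables keeps the control flow testable on its own, independent of how re-admits or "earlier" records end up being defined for a given school.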

Edvise Course Standardizer:

  • Operates similarly to PDP's course standardizer.
  • Strips trailing decimals
  • Logs columns with a high share of NAs
  • Logs grade distribution
  • Finds and logs duplicates on primary keys: ["student_id", "term", "course_subject", "course_num"]
    - Runs handling duplicates func (schema_type="es")
  • Runs check_pf_grade_consistency func
  • Runs validate_credit_consistency func
  • Runs assign_numeric_grade func
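
A rough sketch of three of the course-file steps above, assuming pandas. The function names are hypothetical, and "trailing decimals" is assumed to mean a float-style suffix such as `101.0` on values like course numbers; the actual helper in this PR may treat other cases differently.

```python
import pandas as pd

# Primary keys as listed in the PR description.
COURSE_KEYS = ["student_id", "term", "course_subject", "course_num"]


def strip_trailing_decimals(values: pd.Series) -> pd.Series:
    """Turn "101.0" into "101"; values such as "15.50" are left unchanged."""
    return values.astype("string").str.replace(r"\.0+$", "", regex=True)


def grade_distribution(grades: pd.Series) -> pd.Series:
    """Share of rows per grade value, with missing grades counted as their own bucket."""
    return grades.value_counts(normalize=True, dropna=False)


def count_duplicate_rows(df: pd.DataFrame, keys: list[str]) -> int:
    """Number of rows that share their key values with at least one other row."""
    return int(df.duplicated(subset=keys, keep=False).sum())
```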

Logging of cohort years and terms, and of academic years and terms, can happen in the actual script, similar to how it works for PDP.

Questions

  • I'm not sure the current approach of dropping readmits and then keeping the earlier record is the best way to de-dupe the cohort file. For some schools, I'm finding that duplicates come from dual-major or dual-program students. How might we address these?
  • We still need to edit the handling duplicates function for the course file so it fits a default standard for our edvise schema schools.
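
One way to investigate the dual-major question before choosing a rule is to report which non-key columns actually differ within each duplicate group. This is only a diagnostic sketch assuming pandas; `differing_columns` is a hypothetical helper, and the `major` and `program` columns in the usage example are made-up names.

```python
import pandas as pd


def differing_columns(df: pd.DataFrame, keys: list[str]) -> dict[tuple, list[str]]:
    """Map each duplicated key combination to the non-key columns whose values differ."""
    dupes = df[df.duplicated(subset=keys, keep=False)]
    non_keys = [c for c in df.columns if c not in keys]
    result: dict[tuple, list[str]] = {}
    for key_vals, group in dupes.groupby(keys, dropna=False):
        # A column "differs" when the group holds more than one distinct value,
        # with NA counted as a value of its own.
        result[key_vals] = [
            c for c in non_keys if group[c].nunique(dropna=False) > 1
        ]
    return result


# Usage with hypothetical column names.
cohort = pd.DataFrame({
    "student_id": ["s1", "s1", "s2"],
    "cohort_term": ["F", "F", "F"],
    "major": ["Bio", "Chem", "Math"],
    "program": ["BS", "BS", "BS"],
})
report = differing_columns(cohort, ["student_id", "cohort_term"])
```

Groups that differ only in a major or program column would point to dual-major students rather than readmits, which could then be handled by a separate rule.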

- Renamed print_credential_and_enrollment_types_and_retention to print_credential_and_enrollment_types. print_retention will become its own sub func for PDP only, because custom doesn't have a retention field, while the logic in print_credential_and_enrollment_types is useful for both custom and PDP.
nm3224 changed the base branch from main to develop (January 13, 2026 20:48)
nm3224 changed the title from "Feat custom data audit" to "feat: custom data audit" (Jan 13, 2026)
nm3224 changed the title from "feat: custom data audit" to "feat: custom/edvise schema school data audit" (Jan 13, 2026)
kaylawilding (Collaborator) commented

Edited to include the edvise schema data audit:

  • pulled the shared code into a data audit class in the backend
  • the edvise and pdp scripts now simply call that class and handle the CLI

kaylawilding marked this pull request as draft (March 23, 2026 01:35)

2 participants