feat: renumber duplicate keys only on relevant course fields #144

Open
nm3224 wants to merge 3 commits into develop from feat-change-duplicate-handling-behavior

Collaborator

nm3224 commented Apr 28, 2026

Primary keys for course records now also include section_id.
Exception: if more than 75% of section_id values are null or missing, section_id is excluded from the primary keys and processing proceeds without it.
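
The threshold rule above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code in this PR: the function names, the base key columns, the definition of "missing", and the handling of an empty input are all assumptions.

```python
from typing import Any, Dict, List, Sequence

# From the description: drop section_id when MORE than 75% of values are missing.
NULL_SHARE_THRESHOLD = 0.75


def is_missing(value: Any) -> bool:
    """Treat None, NaN, and blank strings as missing (an assumption)."""
    if value is None:
        return True
    if isinstance(value, float) and value != value:  # NaN is not equal to itself
        return True
    return isinstance(value, str) and value.strip() == ""


def choose_primary_keys(
    rows: Sequence[Dict[str, Any]],
    base_keys: Sequence[str],
    optional_key: str = "section_id",
    threshold: float = NULL_SHARE_THRESHOLD,
) -> List[str]:
    """Return base_keys, plus optional_key unless it is mostly missing."""
    if not rows:
        # Assumption: with no data, fall back to the base keys only.
        return list(base_keys)
    missing = sum(is_missing(row.get(optional_key)) for row in rows)
    if missing / len(rows) > threshold:
        return list(base_keys)
    return list(base_keys) + [optional_key]


# Hypothetical base keys, for illustration only.
BASE_KEYS = ["student_id", "term", "course_number"]
sample = [{"section_id": "S1"}, {"section_id": None}]
keys = choose_primary_keys(sample, BASE_KEYS)
```

Note that the comparison is strict, so a column that is exactly 75% missing is still kept as a key under this reading of the rule.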

PDP and Edvise schema duplicate handling now share one rule: suffix course_number only when the rows in a duplicate-key group disagree on at least one relevant field (course classification/type, course name, credits attempted or earned, or grade). Otherwise, collapse the group to one row per key. The dropped rows are arbitrary in content, but the choice is deterministic: the row with the first index in the group is kept.
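
A rough sketch of that rule, in plain Python over a list of dicts rather than the project's actual data structures. The field names in RELEVANT_FIELDS, the function name, and the suffix scheme (first row unchanged, later rows get `-1`, `-2`, ...) are assumptions chosen to match the ENG101-1 example, not confirmed details of this PR.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence

# Hypothetical column names for the "relevant fields" named in the description.
RELEVANT_FIELDS = (
    "course_type",
    "course_name",
    "credits_attempted",
    "credits_earned",
    "grade",
)


def resolve_duplicates(
    rows: Sequence[Dict[str, Any]],
    key_fields: Sequence[str],
    relevant_fields: Sequence[str] = RELEVANT_FIELDS,
    suffix_field: str = "course_number",
) -> List[Dict[str, Any]]:
    """Suffix duplicates that disagree on relevant fields; collapse the rest."""
    # Group row positions by key; dicts keep insertion order, so the
    # first index in each group is the first occurrence in the input.
    groups: Dict[tuple, List[int]] = defaultdict(list)
    for index, row in enumerate(rows):
        groups[tuple(row.get(f) for f in key_fields)].append(index)

    result: List[Dict[str, Any]] = []
    for indices in groups.values():
        first = rows[indices[0]]
        disagree = any(
            rows[i].get(f) != first.get(f)
            for i in indices[1:]
            for f in relevant_fields
        )
        if not disagree:
            # Collapse: keep only the first-index row of the group.
            result.append(dict(first))
            continue
        # Keep every row; suffix all but the first (assumed scheme).
        for position, i in enumerate(indices):
            row = dict(rows[i])
            if position > 0:
                row[suffix_field] = f"{row[suffix_field]}-{position}"
            result.append(row)
    return result
```

The output here is ordered by each group's first appearance rather than by original row order, which is a simplification of the sketch.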

How this changes our student-term features:
For a scalar ("course_id", catalog) entry in sum_val_equal_cols_by_group, course_id values that are renumbered variants of the catalog ID (e.g. ENG101-1 for ENG101) now count toward num_courses_course_id_*, so key-course counts can reflect multiple rows in a term after deduplication. This preserves information about students retaking courses, which was previously dropped.

  • Q: Should this be excluded for lab/lecture combinations, or for courses whose names differ substantively, since those are not really retakes?
    • Note: section_id should make the grain finer, so retaken courses need no suffix when their section IDs differ. If the section ID is the same, suffixing still occurs. It is unclear whether that is the behavior we want, since in that case the duplicate may be a "true" duplicate or a data error.
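
To make the counting behavior concrete, here is a small sketch of matching a catalog ID together with its renumbered variants. It is not the implementation of sum_val_equal_cols_by_group; the helper name and the `-<digits>` suffix pattern are assumptions based on the ENG101-1 example.

```python
import re
from typing import Iterable


def count_key_course(course_ids: Iterable[object], catalog_id: str) -> int:
    """Count IDs equal to catalog_id or of the form '<catalog_id>-<digits>'."""
    pattern = re.compile(rf"{re.escape(catalog_id)}(?:-\d+)?")
    return sum(
        1
        for cid in course_ids
        if isinstance(cid, str) and pattern.fullmatch(cid)
    )


# One student-term after dedupe: the original row plus one renumbered row.
term_course_ids = ["ENG101", "ENG101-1", "ENG1010", "MATH200"]
eng101_count = count_key_course(term_course_ids, "ENG101")
```

Under this matching, a lab/lecture pair that was suffixed because its course names differ would be counted the same way as a retake, which is the case the question above raises.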

New unit tests were added to verify the behavior described above.
