feat: renumber duplicate keys only on relevant course fields by nm3224 · Pull Request #144 · datakind/edvise

nm3224 · 2026-04-28T16:46:01Z

Primary keys for course records now also include section_id.
Exception: if more than 75% of section_id values are null or missing, section_id is excluded from the primary keys and processing proceeds without it.
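A minimal sketch of the threshold rule above, in plain Python so it stays dependency-free. The names `BASE_KEYS`, `is_missing`, and `course_primary_keys` are hypothetical, as are the base key columns; the PR does not show the real helper or key list. Returning the base keys for an empty input is also an assumption.

```python
# Hypothetical base key columns; the real list lives in the edvise code.
BASE_KEYS = ["student_id", "term", "course_prefix", "course_number"]
NULL_THRESHOLD = 0.75  # section_id is dropped when MORE than 75% is missing


def is_missing(value) -> bool:
    """Treat None, NaN, and blank strings as missing."""
    if value is None:
        return True
    if isinstance(value, float) and value != value:  # NaN is never equal to itself
        return True
    return isinstance(value, str) and value.strip() == ""


def course_primary_keys(rows, base_keys=BASE_KEYS, threshold=NULL_THRESHOLD):
    """Return the key columns, adding section_id only when it is populated enough."""
    rows = list(rows)
    if not rows:
        # Assumption: with no data there is nothing to judge, so skip section_id.
        return list(base_keys)
    missing = sum(is_missing(row.get("section_id")) for row in rows)
    if missing / len(rows) > threshold:
        return list(base_keys)
    return [*base_keys, "section_id"]
```

Because the comparison is strict, a column that is exactly 75% missing still keeps section_id in the keys; only a share above 75% drops it.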

PDP and Edvise schema duplicate handling now share one rule: suffix course_number only when the duplicate-key group disagrees on a relevant field (course classification/type, course name, credits attempted or earned, or grade). Otherwise, collapse the group to one row per key. The kept row is chosen deterministically as the first index in the group; which duplicate survives is otherwise arbitrary.
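The rule can be sketched roughly as follows, again in plain Python rather than the project's actual implementation. The column names in `KEY_COLS` and `RELEVANT_COLS`, the function name `handle_duplicate_keys`, and the choice to leave the first row of a disagreeing group unsuffixed (with later rows getting `-1`, `-2`, ...) are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical column names; the real schemas may name these differently.
KEY_COLS = ("student_id", "term", "course_prefix", "course_number")
RELEVANT_COLS = (
    "course_type", "course_name", "credits_attempted", "credits_earned", "grade",
)


def handle_duplicate_keys(rows, key_cols=KEY_COLS, relevant_cols=RELEVANT_COLS):
    """Suffix course_number when a duplicate-key group disagrees, else keep the first row."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in key_cols)].append(row)

    out = []
    # Groups come back in order of first appearance, so output is deterministic.
    for group in groups.values():
        first = group[0]
        disagrees = any(
            row[c] != first[c] for row in group[1:] for c in relevant_cols
        )
        if not disagrees:
            out.append(dict(first))  # collapse: keep the first row in the group
            continue
        for i, row in enumerate(group):
            fixed = dict(row)
            if i > 0:  # assumption: the first row keeps its original number
                fixed["course_number"] = f"{row['course_number']}-{i}"
            out.append(fixed)
    return out
```

A group with a single row never disagrees with itself, so unique keys pass through unchanged.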

How this changes our student-term features:
For the scalar ("course_id", catalog) case in sum_val_equal_cols_by_group, course_id values that are renumbered variants of the catalog id (e.g. ENG101-1 for ENG101) are counted toward num_courses_course_id_*, so key-course counts can reflect multiple rows in a term after dedupe. This captures important information about students re-taking courses, which was previously dropped.
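One way the matching could work is sketched below. This is not the body of sum_val_equal_cols_by_group, which the PR text does not show; `matches_catalog_id` and `count_key_course` are hypothetical helpers, and the `-<digits>` suffix pattern is inferred only from the ENG101-1 example.

```python
import re


def matches_catalog_id(course_id: str, catalog_id: str) -> bool:
    """True for the catalog id itself or a renumbered variant such as ENG101-1."""
    pattern = re.escape(catalog_id) + r"(-\d+)?"
    return re.fullmatch(pattern, course_id) is not None


def count_key_course(course_ids, catalog_id: str) -> int:
    """Count rows in one student-term whose course_id maps to the catalog id."""
    return sum(matches_catalog_id(cid, catalog_id) for cid in course_ids)
```

Using a full match rather than a prefix check keeps a different course such as ENG1010 from being counted as ENG101.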

Q: Should this be excluded for lab/lecture combos, or for courses whose course name is materially different, since those aren't really re-taken courses?
- Note: section_id should make the grain more unique, so re-taken courses don't need to be suffixed when their section IDs differ. If the section ID is the same, suffixing still occurs. Not sure this is the behavior we want, since in that case the duplicate may be a "true" duplicate or an error.

New unit tests added to ensure things are working as expected.


nm3224 requested review from kaylawilding and vishpillai123 as code owners April 28, 2026 16:46

nm3224 added 2 commits April 28, 2026 12:54

feat: adding section_id to primary keys

d78c3d6

Merge branch 'develop' into feat-change-duplicate-handling-behavior

da93731

nm3224 wants to merge 3 commits into develop from feat-change-duplicate-handling-behavior
