feat: add GenAI as 4th option in addition to PDP / Edvise / Legacy in API and uploads by nm3224 · Pull Request #230 · datakind/edvise-api

nm3224 · 2026-04-30T17:44:33Z

What I've done so far:

- Added GenAI as a fourth "lane" for institutions alongside PDP, Edvise Schema (ES), and Legacy: a new genai_id on the institution record (with is_genai / genai_id on the create/update APIs), enforcement that exactly one of the four ids is set, and genai_id returned on institution reads/lists.
- Kept GenAI distinct from Edvise in the database: the row continues to be identified by genai_id, and nothing in this work converts a GenAI school to edvise_id after mapping. This is on purpose.
- Aligned initial uploads with Legacy: for GenAI institutions, upload routing uses the same loose path as Legacy for now (encoding/CSV read, PII-style gates, no strict ES column match; vague filenames can resolve to UNKNOWN, as with Legacy).
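
For reviewers, here is a minimal sketch of the "exactly one of the four ids" rule. It is not the code in this PR: the function name `institution_lane` and the dict-based input are assumptions for illustration, and only the four id field names come from the description above.

```python
from typing import Optional

# The four id fields named in this PR; everything else here is illustrative.
ID_FIELDS = ("pdp_id", "edvise_id", "legacy_id", "genai_id")


def institution_lane(ids: dict[str, Optional[str]]) -> str:
    """Return the single id field that is set, or raise if zero or several are."""
    # Treat None and empty strings as "not set".
    set_fields = [name for name in ID_FIELDS if ids.get(name)]
    if len(set_fields) != 1:
        raise ValueError(
            f"exactly one of {ID_FIELDS} must be set, got {set_fields or 'none'}"
        )
    return set_fields[0]
```

With this shape, a payload carrying only genai_id resolves to the GenAI lane, while a payload carrying both genai_id and edvise_id is rejected, which matches the intent that a GenAI school never doubles as an Edvise one.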

What still needs to be done:

- GenAI mapping job: after a raw file lands for a genai_id institution, we need to trigger the GenAI mapping pipeline (worker, scheduled job, queue message, or internal endpoint). Define inputs (bucket/path, inst_id, file id) and failure/retry behavior.
- Validate mapped output with ES rules without changing institution type: run Edvise Schema (ES) checks on the mapped tables inside the mapper or a follow-up job. Important: this is validation only. The institution row should stay genai_id (do not set edvise_id to "graduate" the school; provenance stays GenAI).
- Databricks + webapp wiring: write validated mapped outputs to the agreed Databricks layout (catalog/schema/tables or volumes), and register those artifacts back to the webapp (metadata APIs, file/batch records, or whichever mechanism the webapp already uses) so they can flow into ES schema training and inference pipelines.
- Production database: after merging, we need to update the production database so the inst table has a genai_id column (and a unique constraint on it if we use auto-generated ids like genai_1).
- Optional, dedicated GenAI upload namespace: today GenAI reuses the Legacy validation branch in code for speed; later we may want a GenAI-specific path so we can change rules (file types, quotas, audit logs) without touching Legacy schools.
- Observability & ops: metrics/logging for mapping success/failure, PII rejections, and ES validation failures on mapped data (tagged by inst_id / genai_id).
- Documentation: short internal doc for DS/support covering the four lanes, GenAI = loose ingest + mapper + ES check on output, and the fact that institution type never flips to Edvise, for provenance.
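
To make the first two items concrete, here is a rough sketch of what the mapping job's inputs and retry behavior could look like. Everything in it is hypothetical (`MappingJobRequest`, `run_mapping_job`, the status strings, and the default of three attempts); the only parts taken from this description are the inputs (bucket/path, inst_id, file id) and the rule that ES validation must not change the institution type.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MappingJobRequest:
    """Inputs listed in this PR for the mapping job (field names are assumptions)."""
    bucket: str
    path: str
    inst_id: str
    file_id: str


def run_mapping_job(
    request: MappingJobRequest,
    mapper: Callable[[MappingJobRequest], dict],
    es_validate: Callable[[dict], list[str]],
    max_attempts: int = 3,
) -> dict:
    """Run the mapper with bounded retries, then ES-validate the mapped output.

    The result only reports status. Nothing here writes to the institution
    row, so a genai_id institution stays a genai_id institution.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            mapped = mapper(request)
            break
        except Exception as exc:  # placeholder: real code should catch only transient errors
            last_error = exc
    else:
        # Every attempt failed (or max_attempts was 0).
        return {
            "inst_id": request.inst_id,
            "file_id": request.file_id,
            "status": "mapping_failed",
            "attempts": max_attempts,
            "error": str(last_error),
        }

    es_errors = es_validate(mapped)
    return {
        "inst_id": request.inst_id,
        "file_id": request.file_id,
        "status": "validated" if not es_errors else "es_validation_failed",
        "attempts": attempt,
        "es_errors": es_errors,
    }
```

Returning a status record instead of raising keeps the three outcomes (mapping failed, ES validation failed, validated) easy to tag by inst_id for the observability item above.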

Cross-team

- edvise-ui: fourth option on create/edit institution; display genai_id wherever pdp_id / edvise_id / legacy_id are shown (likely a task for @mrmaloof).
- Access control: confirm that Datakinder-only create still matches policy, i.e. that the GenAI path didn't accidentally add a new endpoint or code path that skips the same is_datakinder() guard. Same rule, fourth type.
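
One way to check the access-control item is a small test that tries every lane as a non-Datakinder user. The sketch below uses stand-ins throughout: the `User` class, the `create_institution` signature, and the lane names are assumptions, and the real is_datakinder() guard in this repo may be shaped differently.

```python
from dataclasses import dataclass
from typing import Callable

LANES = ("pdp", "edvise", "legacy", "genai")


@dataclass
class User:
    """Stand-in for the real user object; only the guard method matters here."""
    datakinder: bool

    def is_datakinder(self) -> bool:
        return self.datakinder


def create_institution(user: User, lane: str) -> dict:
    """Hypothetical create path: one guard in front of all four lanes."""
    if not user.is_datakinder():
        raise PermissionError("only Datakinders may create institutions")
    if lane not in LANES:
        raise ValueError(f"unknown lane: {lane}")
    return {"lane": lane}


def lanes_missing_guard(create: Callable[[User, str], dict]) -> list[str]:
    """Return the lanes a non-Datakinder user was allowed to create."""
    outsider = User(datakinder=False)
    leaked = []
    for lane in LANES:
        try:
            create(outsider, lane)
        except PermissionError:
            continue  # guard held for this lane
        leaked.append(lane)
    return leaked
```

An empty list from `lanes_missing_guard(create_institution)` means the guard holds for all four types, including GenAI.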

Full transparency: Cursor helped me ground myself with a lot of this and add the necessary code. If anything is wrong or unnecessary, let me know!

note: for GenAI raw files, we will reuse the same loose rules as Legacy institutions (read CSV, PII check, no strict ES columns).
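
As a rough illustration of that note (not the actual upload code), the loose path could be pictured like this. The names `LOOSE_LANES`, `PII_HEADERS`, and `check_upload` are made up for the example, and the PII check is a toy header match, not the real gate.

```python
import csv
import io

LOOSE_LANES = frozenset({"legacy", "genai"})  # GenAI reuses Legacy's path for now
PII_HEADERS = frozenset({"ssn", "email"})     # illustrative only, not the real PII gate


def check_upload(lane: str, raw: bytes, es_columns: frozenset = frozenset()) -> list[str]:
    """Return a list of problems; an empty list means the file is accepted."""
    try:
        text = raw.decode("utf-8-sig")  # also strips a UTF-8 BOM if present
    except UnicodeDecodeError:
        return ["file is not valid UTF-8"]

    # Only the header row is needed for the column-level checks.
    header = next(csv.reader(io.StringIO(text)), [])
    columns = {name.strip().lower() for name in header}

    # The PII gate applies to every lane.
    problems = [f"PII-like column: {c}" for c in sorted(columns & PII_HEADERS)]

    # The strict ES column match applies only outside the loose lanes.
    if lane not in LOOSE_LANES:
        problems += [f"missing ES column: {c}" for c in sorted(es_columns - columns)]
    return problems
```

Under these assumptions, the same file with a missing ES column is accepted for a GenAI or Legacy institution and rejected for an Edvise one, while a PII-like header is rejected in every lane.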

nm3224 added 2 commits April 30, 2026 13:34

feat: Added "GenAI" as an option for "create institution"

348183a

note: for GenAI raw files, we will reuse the same loose rules as Legacy institutions (read CSV, PII check, no strict ES columns).

fix: style

0e0b4e0

nm3224 marked this pull request as draft April 30, 2026 17:44

nm3224 wants to merge 2 commits into develop from feat-add-GenAI-as-4th-option-in-addition-PDP-/-Edvise-/-Legacy-in-API-and-uploads
