
feat: add gen ai as 4th option in addition pdp / edvise / legacy in api and uploads #230

Draft
nm3224 wants to merge 2 commits into develop from feat-add-GenAI-as-4th-option-in-addition-PDP-/-Edvise-/-Legacy-in-API-and-uploads

Contributor

nm3224 commented Apr 30, 2026

What I've done so far:

  • Added GenAI as a fourth “lane” for institutions alongside PDP, Edvise Schema (ES), and Legacy: new genai_id on the institution record (with is_genai / genai_id on create/update APIs), exactly one of the four ids enforced, and genai_id returned on institution reads/lists.
  • Kept GenAI distinct from Edvise in the database: the row continues to be identified by genai_id; nothing in this work converts a GenAI school to edvise_id after mapping—this is on purpose.
  • Aligned initial uploads with Legacy: for GenAI institutions, upload routing uses the same loose path as Legacy for now (encoding/CSV read, PII-style gates, no strict ES column match; vague filenames can resolve to UNKNOWN like Legacy).
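The "exactly one of the four ids" rule can be sketched as a small check. This is only an illustration, not the code in this PR: the helper name `resolve_institution_lane` and the plain-dict input are assumptions, while the four field names come from the description above.

```python
from typing import Optional

# The four mutually exclusive id fields named in the PR description.
ID_FIELDS = ("pdp_id", "edvise_id", "legacy_id", "genai_id")


def resolve_institution_lane(ids: dict[str, Optional[str]]) -> str:
    """Return the name of the single populated id field.

    Hypothetical helper: raises ValueError when zero or more than one
    of the four ids is set. Empty strings and None both count as unset.
    """
    populated = [name for name in ID_FIELDS if ids.get(name)]
    if len(populated) != 1:
        raise ValueError(
            f"exactly one of {', '.join(ID_FIELDS)} must be set; "
            f"got {populated or 'none'}"
        )
    return populated[0]
```

For example, `resolve_institution_lane({"genai_id": "genai_1"})` returns `"genai_id"`, while a payload carrying both `genai_id` and `edvise_id` is rejected, which is also what keeps a GenAI institution from silently becoming an Edvise one.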

What still needs to be done:

  • GenAI mapping job
    After a raw file lands for a genai_id institution, we need to trigger the GenAI mapping pipeline (worker, scheduled job, queue message, or internal endpoint). Define inputs (bucket/path, inst_id, file id) and failure/retry behavior.
  • Validate mapped output with ES rules without changing institution type
    Run Edvise Schema (ES) checks on the mapped tables inside the mapper or a follow-up job. Important: this is validation only—the institution row should stay genai_id (do not set edvise_id to “graduate” the school; provenance stays GenAI).
  • Databricks + webapp wiring
    Write validated mapped outputs to the agreed Databricks layout (catalog/schema/tables or volumes), and register those artifacts back to the webapp (metadata APIs, file/batch records, or whatever your product uses) so they can flow into ES schema training and inference pipelines.
  • Production database: after merging, we need to migrate the production database so the inst table has a genai_id column (plus a unique constraint on it if you use auto-generated ids like genai_1).
  • Optional: dedicated GenAI upload namespace
    Today GenAI reuses the Legacy validation branch in code for speed; later you may want a GenAI-specific path so you can change rules (file types, quotas, audit logs) without touching Legacy schools.
  • Observability & ops: metrics/logging for mapping success/failure, PII rejections, and ES validation failures on mapped data (tagged by inst_id / genai_id).
  • Documentation: short internal doc for DS/support—four lanes, GenAI = loose ingest + mapper + ES check on output, institution type never flips to Edvise for provenance.
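To make the open "inputs and failure/retry behavior" question for the mapping job concrete, here is one possible shape for the trigger message. Everything in it is an assumption for discussion: the `GenAIMappingJob` class, the `attempt` field, and the retry limits are hypothetical, and only the input fields (bucket/path, inst_id, file id, genai_id) come from the list above.

```python
import json
from dataclasses import asdict, dataclass, replace
from typing import Optional

# Assumed retry policy; the real values still need to be agreed on.
MAX_ATTEMPTS = 3
BASE_DELAY_SECONDS = 30


@dataclass(frozen=True)
class GenAIMappingJob:
    """Hypothetical payload sent when a raw file lands for a genai_id institution."""
    inst_id: str
    genai_id: str
    bucket: str
    path: str
    file_id: str
    attempt: int = 1


def to_message(job: GenAIMappingJob) -> str:
    """Serialize the job as JSON for a queue message or internal endpoint body."""
    return json.dumps(asdict(job), sort_keys=True)


def next_attempt(job: GenAIMappingJob) -> Optional[tuple[GenAIMappingJob, int]]:
    """Return (retry job, delay in seconds) with exponential backoff,
    or None once MAX_ATTEMPTS has been reached."""
    if job.attempt >= MAX_ATTEMPTS:
        return None
    delay = BASE_DELAY_SECONDS * 2 ** (job.attempt - 1)
    return replace(job, attempt=job.attempt + 1), delay
```

With these assumed values, a failed first attempt is retried after 30 seconds and a failed second attempt after 60 seconds; a third failure returns `None`, at which point the job would be surfaced through whatever metrics/logging we add for mapping failures.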

Cross-team

  • edvise-ui: fourth option on create/edit institution; display genai_id wherever pdp_id / edvise_id / legacy_id are shown (likely a task for @mrmaloof).
  • Access control: confirm that Datakinder-only create still matches policy, i.e. that the GenAI path didn't accidentally add a new endpoint or code path that skips the same is_datakinder() guard. Same rule, fourth type.
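One way to check the access-control point is a test that exercises every lane through the same guard. The sketch below is hypothetical: the `User` class, the lane names, and `authorize_institution_create` are stand-ins for whatever the API actually uses, and only the `is_datakinder()` name comes from the existing code.

```python
from dataclasses import dataclass

# Assumed lane labels for the four institution types.
LANES = ("pdp", "edvise", "legacy", "genai")


@dataclass
class User:
    """Hypothetical stand-in for the real user model."""
    email: str
    datakinder: bool = False

    def is_datakinder(self) -> bool:
        return self.datakinder


def authorize_institution_create(user: User, lane: str) -> None:
    """Apply one guard to all four lanes so GenAI cannot bypass it."""
    if lane not in LANES:
        raise ValueError(f"unknown institution lane: {lane!r}")
    if not user.is_datakinder():
        raise PermissionError("only Datakinders may create institutions")


def lanes_open_to(user: User) -> list[str]:
    """Return the lanes this user is allowed to create institutions in."""
    allowed = []
    for lane in LANES:
        try:
            authorize_institution_create(user, lane)
            allowed.append(lane)
        except PermissionError:
            pass
    return allowed
```

A regression test along these lines would assert that `lanes_open_to` is empty for a non-Datakinder user and contains all four lanes, including `"genai"`, for a Datakinder.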

Full transparency: Cursor helped me ground myself with a lot of this and add the necessary code. If things are wrong or unnecessary, let me know!

nm3224 added 2 commits April 30, 2026 13:34
note: for GenAI raw files, we will reuse the same loose rules as Legacy institutions (read CSV, PII check, no strict ES columns).
nm3224 marked this pull request as draft April 30, 2026 17:44