
Commit a05d8d0

vishpillai123, Mesh-ach, chapmanhk, William Carr, and cursoragent authored
chore(release): sync develop with main (#251)
* Merge pull request #217 from datakind/fix/pdp-course-handling-duplicates fix: fix duplicate-handling step in validation * fix(storage): reduce peak memory during upload validation - Download unvalidated blob to a temp file and validate by path instead of blob.open().read() via _path_for_edvise_read (avoids a full in-RAM copy). - Write validated CSV to a temp file and upload_from_filename instead of building the entire CSV in a StringIO string. Branched from develop (repo has no dev branch). Made-with: Cursor * chore(storage): log errno on temp download/to_csv OSError Helps distinguish ENOSPC vs other failures in Cloud Run logs; re-raises unchanged. Made-with: Cursor * test(storage): cover temp cleanup and OSError logging for validate upload - Download OSError: unlink temp, skip validate_file_reader, log errno - to_csv OSError: unlink temp, no upload, log errno - Upload failure after to_csv: temp still unlinked Made-with: Cursor * refactor(storage): extract temp download/unlink helpers for clarity Aligns with universal-principles: keep _run_validation_and_get_normalized_df under 50 lines, reduce nesting, replace tmp_path with local_csv_path naming. 
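The temp-file pattern described in the storage commits above can be sketched roughly as follows. This is a minimal illustration and not the repository's code: `write_rows_and_upload` and `unlink_quietly` are hypothetical names, the standard-library `csv` writer stands in for the DataFrame `to_csv` call, and `upload_from_filename` is any callable that takes a local path.

```python
import csv
import logging
import os
import tempfile
from typing import Callable, Iterable, Sequence

logger = logging.getLogger(__name__)


def unlink_quietly(path: str) -> None:
    """Remove the temp file, ignoring the case where it is already gone."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass


def write_rows_and_upload(
    rows: Iterable[Sequence[object]],
    upload_from_filename: Callable[[str], None],
) -> None:
    """Write rows to a temp CSV on disk, upload it by path, and always clean up."""
    fd, local_csv_path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        try:
            with open(local_csv_path, "w", newline="", encoding="utf-8") as handle:
                csv.writer(handle).writerows(rows)
        except OSError as exc:
            # Logging errno helps tell ENOSPC apart from other failures; re-raise unchanged.
            logger.error("temp CSV write failed: errno=%s", exc.errno)
            raise
        upload_from_filename(local_csv_path)
    finally:
        # Runs on success, on write failure, and on upload failure.
        unlink_quietly(local_csv_path)
```

Writing to disk and uploading by filename avoids holding the whole CSV in memory as one string, and the `finally` block matches the tested behavior that the temp file is unlinked even when the upload fails.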
Made-with: Cursor * style: apply black/ruff format to gcsutil_test.py Made-with: Cursor * feat: consolidating staging into main and using main going forward as production (#234) * Feat: Added backfill endpoint * Fix: linting * added func description * added func description * added func description * added func description * added func description * added func description * added func description * feat: adjusted run output endpointto return model_run_id * Delete .DS_Store * Delete src/.DS_Store * Delete terraform/.DS_Store * feat: added model deletion endpoint * feat: added model deletion endpoint * feat: added model deletion endpoint * fix: linting * fix: linting * fix: linting * fix: linting * fix: linting * fix: linting * fix: linting * fixed model name malformation * fix: removed databricks deletion functionality * fix: removed query results not needed * fix: removed query results not needed * fix: added status * fix: added status * fix: formatting fix * fix: added query to retrieve model id * fix: added passive delete to db cascade so deleting the model ensures job runs are deleted * fix: removed extra db query for model id, since db now handles passive deletes * fix: formatting fix * fix: removed db mapping framework * fix: removed db mapping framework * fix: removed db mapping framework * fix: removed db mapping framework * feat: changed endpoint parameter name from experiment_run_id -> model_run_id * fix: type check errors * test batch and file data * eda endpoints * test data * eda calculations * eda year and term, course enrollemnts * eda degree types * fix: divide data category into a seperate front end table section * fix: linting * feat: developed function for adding custom jobs with institution and model validation * fix: linting errors * fix: linting errors * fix: changed route from GET to POST * fix: added output filename definition * fix: linting errors * eda test institution data * eda test institution * eda data * eda test data * allow missing 
eda data * eda enrollment type by intensity * eda pell recipient by 1st gen * eda student age by gender * eda pell status by race * eda tests * cache eda * tidy up * remove LOCAL test bucket setup * return List from get_term_counts * import pandas * remove unused variable * tidy up * eda bucket names * fix: type check errors * fix: type check errors * fix: type check errors * fix: formatting errors * fix: type check errors * fix: type check errors * fix: batch name renewal * fix: batch name renewal * fix: changed output_valid to true * fix: adjusted model card file path * fix: ensuring we are grabbing the most recent run for a model id * remove colors from /eda endpoint * return count and percentage in /eda degree_types * tidy up * fix: fix file format * fix: retrieve by model_run_id instead * fix: formatting * fix: validation error for worwic * fix: changed model name to model_run_id parameter * fix: added function to retrieve config.toml from select catalog * manually initialized course mappings * feat: added validation mapping * fix: formatting * fix: pylint * Ignore .cursor folder for personal cursor preferences * feat(schema): add Edvise schema definition * feat(institutions): add Edvise schema support Add Edvise schema support to institution management: - Add edvise_id field to InstTable and SchemaRegistryTable - Update create/update endpoints with Edvise support and validation - Add mutual exclusivity check (PDP vs Edvise) - Implement normalization for empty strings and whitespace - Remove redundant boolean flags (derive status from ID presence) - Add comprehensive test coverage (34 new test cases) All changes are backward compatible. 
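The ID normalization and PDP/Edvise mutual-exclusivity check listed above might look something like the sketch below. The function names `normalize_id` and `resolve_school_type` are hypothetical and do not come from the repository; the sketch only assumes that blank or whitespace-only IDs count as unset and that the school type is derived from which ID is present.

```python
from typing import Optional


def normalize_id(value: Optional[str]) -> Optional[str]:
    """Treat None, empty strings, and whitespace-only strings as 'not set'."""
    if value is None:
        return None
    stripped = value.strip()
    return stripped or None


def resolve_school_type(pdp_id: Optional[str], edvise_id: Optional[str]) -> Optional[str]:
    """Derive the school type from ID presence instead of separate boolean flags."""
    pdp = normalize_id(pdp_id)
    edvise = normalize_id(edvise_id)
    if pdp is not None and edvise is not None:
        raise ValueError("pdp_id and edvise_id are mutually exclusive")
    if pdp is not None:
        return "pdp"
    if edvise is not None:
        return "edvise"
    return None
```

For example, `resolve_school_type("  ", "E-1")` resolves to the Edvise type because the blank PDP ID is normalized away before the exclusivity check runs.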
* fix: resolve CI/CD test failures - Fix test_create_inst_with_edvise_success: use unique institution name to avoid UNIQUE constraint violation - Fix test_trigger_inference_run: add pdp_id to InstTable fixture in models_test.py - Fix code formatting: run ruff format on database.py, institutions.py, and institutions_test.py These fixes address the three issues that were causing CI/CD test failures: 1. UNIQUE constraint failed: inst.name in test_create_inst_with_edvise_success 2. Assertion error: expected 400 but got 501 in test_trigger_inference_run 3. Ruff format check failures * fix: resolve unique constraint conflicts in SchemaRegistryTable - Add doc_type to is_pdp and is_edvise unique constraints to allow base, PDP, and Edvise schemas to coexist with same version - Add CheckConstraint to enforce mutual exclusivity of is_pdp and is_edvise flags Fixes Bugbot issue: Unique constraint prevented coexisting schema types for same version. The original constraints (is_pdp, version_label) and (is_edvise, version_label) prevented base schema and PDP/Edvise extensions from sharing the same version label since they all had is_pdp=False and is_edvise=False. Adding doc_type to these constraints allows proper coexistence while maintaining uniqueness guarantees. Also adds database-level enforcement that is_pdp and is_edvise cannot both be True simultaneously. * fix: resolve mypy type errors - Fix type error in institutions.py: change set to list for requested_schemas default value - Add return type annotations to all test functions in institutions_test.py - Add return type annotations to fixture functions - Add typing.Any import for fixture return types Fixes mypy errors: incompatible types in assignment and missing return type annotations. 
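To illustrate the constraint change described above, here is a small sketch that uses the standard-library `sqlite3` module in place of the project's ORM and production database. The table name `schema_registry` and the `try_insert` helper are assumptions made for the example; only the column names and the intent of the constraints come from the commit message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE schema_registry (
        id INTEGER PRIMARY KEY,
        doc_type TEXT NOT NULL,
        version_label TEXT NOT NULL,
        is_pdp INTEGER NOT NULL DEFAULT 0,
        is_edvise INTEGER NOT NULL DEFAULT 0,
        -- doc_type is part of each unique key so schema types can share a version
        UNIQUE (is_pdp, version_label, doc_type),
        UNIQUE (is_edvise, version_label, doc_type),
        -- the two flags may never both be set
        CHECK (NOT (is_pdp = 1 AND is_edvise = 1))
    )
    """
)


def try_insert(doc_type: str, version_label: str, is_pdp: int, is_edvise: int) -> bool:
    """Insert one row; return False if a UNIQUE or CHECK constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO schema_registry (doc_type, version_label, is_pdp, is_edvise) "
                "VALUES (?, ?, ?, ?)",
                (doc_type, version_label, is_pdp, is_edvise),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With `doc_type` in the key, a base row and an extension row that both carry `is_pdp = 0` and the same `version_label` can coexist, while an exact duplicate or a row with both flags set is still rejected.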
* fix: add missing type annotations to test function parameters - Add TestClient type annotations to test_create_inst_unauth, test_create_inst, test_edit_inst, and test_delete_inst Fixes mypy errors: Function is missing a type annotation for one or more arguments. * feat: Implement Phase 3 Edvise schema validation logic - Add EDVISE_SCHEMA_GROUP constant to utilities.py (mirrors PDP_SCHEMA_GROUP) - Add _edvise_cache to _ValidationState class for schema caching with TTL - Update validation_helper() to load Edvise schema when edvise_id is set - Add defensive check for mutual exclusivity (pdp_id and edvise_id cannot both be set) - Add error handling for missing Edvise schema with clear error messages - Update institution creation endpoint to use EDVISE_SCHEMA_GROUP when edvise_id is provided - Add comprehensive test suite: 15 tests covering happy path, errors, cache, authorization, and edge cases This implementation enables institutions with edvise_id to use the Edvise schema extension for file validation, following the same pattern as PDP schema validation. All changes are backwards compatible and include comprehensive test coverage (~90% of critical paths). * fix: Resolve Edvise test failures and improve test reliability - Fix type annotation error in PDP schema branch (mypy no-redef) - Change test user to DATAKINDER for multi-institution access - Fix database constraint violation in precedence test (version_label) - Simplify cache tests to verify behavior instead of implementation - Remove duplicate assertion in cache expiration test - Optimize imports in test fixture * fix: Update Edvise test filenames to include descriptive keywords - Change generic test filenames (test.csv, test_file.csv, etc.) 
to include 'student' keyword - This allows validation_helper to properly infer model types from filenames - Fixes ValueError: Could not infer model(s) from file name errors - Formatting will be applied by CI ruff formatter * style: Format data_test.py with ruff * fix(validation): return proper HTTP status codes for institution errors - Change ValueError to HTTPException (404) when institution not found in validation_helper - Fix test_validate_edvise_unauthorized to test actual unauthorized access instead of non-existent institution - Ensures proper HTTP status codes are returned to API clients * fix: handle filename inference errors and extension schema deactivation - Replace ValueError with HTTPException (422) for filename inference failures to return proper user-facing error instead of 500 - Deactivate existing extension schemas before inserting new ones to ensure only one active extension per institution and prevent nondeterministic queries - Add comprehensive validation error formatter with PII masking and user-friendly messages - Add integration and snapshot tests for error formatter * fix: remove unused imports from validation_error_formatter_snapshot_test - Remove unused typing imports (Any, Dict, List) - Remove unused pandera imports (DataFrameSchema, Column, Check) - Remove unused MAX_ERROR_EXAMPLES import Fixes ruff linting errors (F401) reported in CI. 
* fix: resolve test failures and configuration issues - Remove invalid catalog_name parameter from create_custom_schema_extension call - Restore testpaths configuration to use src directory - Add Pandera FutureWarning filter to pytest config - Fix syntax warning in databricks.py docstring - Format files with Ruff * fix: resolve Ruff and Mypy linting errors - Remove unused imports (IO, cast, tomli/tomllib) from databricks.py - Remove duplicate import re statement - Add type annotations to test cases in validation_error_formatter_test.py - Add type: ignore comments for intentional invalid type tests * fix: align database constraints with production schema and fix Edvise version_label collision - Fix uq_pdp_version constraint to match production: remove doc_type (matches actual DB schema) - Remove uq_edvise_version constraint (enforced operationally, not via DB constraint) - Update CHECK constraint to use MySQL-compatible boolean values (1/0 instead of TRUE/FALSE) - Fix Edvise test fixture to use version_label='edvise-1.0.0' to avoid uq_pdp_version collision - Add explanatory comment about version_label choice in test fixture These changes ensure the ORM matches the actual production database schema and prevent constraint violations when running tests against MySQL. * fix: handle parameterized Pandera check types in validation error formatting Fix bug where parameterized check types (e.g., "isin(['A', 'B', 'C'])", "str_length(3, None)") were not being matched to their formatters, causing generic error messages instead of human-readable ones. 
Changes: - Add _extract_base_check_type() to extract base type from parameterized check types (e.g., "isin(['A', 'B'])" -> "isin") - Add _normalize_check_type_alias() to map verbose Pandera names to spec keys (e.g., "greater_than" -> "gt", "greater_than_or_equal_to" -> "ge") - Update _find_check_spec() to use base type extraction and alias normalization - Update _format_check_error() to only format when matching spec is found (prevents semantic errors like formatting "greater_than" as "ge") - Add _format_gt_error() and _format_lt_error() for strict comparison checks - Preserve semantic correctness: strict comparisons (> and <) vs non-strict (≥ and ≤) Edge cases handled: - Namespaced types: "Check.isin(['A'])" -> "isin" - Empty/None/non-string inputs: returns safe empty string - Spaces around parentheses: "isin (['A'])" -> "isin" - Complex repr: "str_matches(re.compile('...'))" -> "str_matches" Testing: - Add comprehensive unit tests for base type extraction and alias handling - Add tests for parameterized check types (isin, str_length, gt, ge) - Update integration test assertion to match actual output format - Update snapshot fixtures to reflect new human-readable messages Fixes parameterized check type matching while maintaining semantic correctness for strict vs non-strict comparisons. * style: format validation_error_formatter files with ruff Auto-formatted files to comply with project formatting standards. 
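The base-type extraction and alias normalization described above could be approximated as follows. This is a sketch under stated assumptions rather than the formatter's actual code: the public function names, the regular expression, and the alias table (limited to the two mappings named in the commit message) are all illustrative.

```python
import re
from typing import Any

# Hypothetical alias table; only the two mappings named in the commit message.
_CHECK_ALIASES = {
    "greater_than": "gt",
    "greater_than_or_equal_to": "ge",
}

# A (possibly dotted) identifier, optional spaces, then "(" or end of string.
_BASE_NAME = re.compile(r"^\s*([A-Za-z_][\w.]*)\s*(?:\(|$)")


def extract_base_check_type(check: Any) -> str:
    """Return the bare check name, or an empty string for unusable input."""
    if not isinstance(check, str) or not check.strip():
        return ""
    match = _BASE_NAME.match(check)
    if match is None:
        return ""
    # Drop any namespace prefix such as "Check." and keep the last segment.
    return match.group(1).rsplit(".", 1)[-1]


def normalize_check_type(check: Any) -> str:
    """Extract the base type, then map verbose names to their short spec keys."""
    base = extract_base_check_type(check)
    return _CHECK_ALIASES.get(base, base)
```

Keeping `greater_than` mapped to `gt` and `greater_than_or_equal_to` mapped to `ge` preserves the strict versus non-strict distinction the commit calls out, so a strict comparison is never formatted as a non-strict one.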
* feat: add case-insensitive institution name lookup - Implement case-insensitive matching for GET /institutions/name/{inst_name} endpoint - Use func.lower() on both database column and input parameter for case-insensitive comparison - Update docstring to document case-insensitive behavior and error handling - Add comprehensive test cases for case-insensitive matching: - Test multiple case variations (original, title case, uppercase, mixed case) - Test lowercase input matching database entries - Test uppercase input matching lowercase database entries - Fix type error: change requested_schemas assignment from set to list for type consistency - Apply code formatting with ruff * fix: add missing return type annotations to test functions - Add Generator import from typing for fixture return types - Add return type annotations (-> None) to all test functions: - test_read_all_inst - test_read_all_inst_datakinder - test_read_inst_by_name - test_read_inst_by_name_case_insensitive - test_read_inst_by_name_case_insensitive_lowercase - test_read_inst_by_name_case_insensitive_uppercase - test_read_inst_by_pdp_id - test_read_inst - Fix fixture return types to use Generator[TestClient, None, None] - client_fixture - datakinder_client_fixture - Resolves mypy type checking errors for test file * style: apply ruff formatting to test file - Split long function signatures across multiple lines for readability - Format client_fixture and datakinder_client_fixture function signatures - Format test_read_inst_by_name_case_insensitive_lowercase and _uppercase function signatures * fix(test): update institutions test for edvise_id API changes - Remove unused typing.Any import - Update test_read_all_inst_datakinder to include edvise_id in expected response - Add edvise_test_school institution to expected response (4 institutions total) - Fix line length for pylint compliance This fixes test failures caused by API changes from develop branch that now return edvise_id and pdp_id fields for 
all institutions. * fix(validation): pass institution_id so Edvise/PDP/custom use correct extension block - Thread schema_namespace (edvise | pdp | inst UUID) from data router through validate_file and validate_file_reader into validate_dataset - merge_model_columns now receives correct key for extension_schema['institutions'] - Add institution_id param with default 'pdp' for backward compatibility - Add tests: assert Edvise validation passes institution_id='edvise'; add unit test that institution_id selects the right extension block (edvise vs pdp) - Expand docstrings (Args/Returns) and add comment explaining schema_namespace - Addresses reviewer Q1: schema extension logic now works for Edvise and custom institutions, not only PDP * Apply Black formatting to institutions_test.py * Apply ruff format to institutions_test.py * Fix institutions_test assert for Black and Ruff format compatibility * Fix pylint E1135 in data_test: use .get() instead of membership test on captured_schema * Apply ruff format to data_test.py * feat(validation): schema validation during upload with PDP/edvise repo alignment - Add PDP edvise schema validation path (validation_pdp_edvise) - Add Edvise-to-PDP normalization (validation_edvise_normalize) - Integrate repo schemas into validation pipeline and error formatter - Update pdp_schema_extension and lockfile; add tests Co-authored-by: Cursor <cursoragent@cursor.com> * feat(validation): write normalized data to validated/, archive raw to raw/ - On validation success: archive original to raw/{filename}, write normalized (canonical columns, coerced dtypes) DataFrame to validated/{filename}, delete from unvalidated/ - Validation layer always returns normalized_df on success; storage serializes to UTF-8 CSV and uploads to validated/ - Add input validation and helpers in gcsutil (under 50 lines); catch specific exceptions; TYPE_CHECKING for HardValidationError in validation_pdp_edvise - Add gcsutil_test.py: validate_file input/error/success 
paths, _run_validation_and_get_normalized_df, _write_dataframe_to_gcs_as_csv - Add validation_test: empty-schema short-circuit returns normalized_df None - Ruff/black formatting and lint fixes; mypy-clean for touched files Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(validation): align with universal principles, add tests, fix types and format - Extract validation helpers to meet 50-line rule (_header_missing_and_extra, _get_csv_read_kwargs, _validate_optional_columns_json) - Extract gcsutil._archive_raw_and_write_validated; add type hints to rename_file - Add tests: PDP rename/validate_dataframe, CSV read failure, gcsutil error propagation, edvise institution_identifier in validate_file call - Remove unused validation_edvise_normalize and its tests - Fix mypy in validation_pdp_edvise and tests (Optional[List], cast, annotations) - Apply ruff format Co-authored-by: Cursor <cursoragent@cursor.com> * feat(validation): use edvise read for PDP uploads and add PDP path tests - Route PDP cohort/course through edvise read (read_raw_pdp_*); remove API-side normalizers for PDP so pipeline and API share one source of truth - Add _path_for_edvise_read, _read_pdp_course_edvise, _validate_pdp_with_edvise_read - Convert Pandera SchemaErrors to HardValidationError in PDP path - Add validation_pdp_read_path_test.py (routing, path cleanup, SchemaErrors, course converter fallback); extend Src type with io.StringIO for file-like Co-authored-by: Cursor <cursoragent@cursor.com> * move cloud build config to repo * sst-app-api -> edvise-api * quiet down sqlalchemy * use EdaSummary from edvise * use ruff formatter * test a file * tidy up * Add return type annotations for mypy in main_test and users_test * tidy up * move cache check after batch result check * fix test_execute_pdp_pull * install git * install git in correct Dockerfile * install git in worker * update edvise branch * use develop branch for edvise * install edvise in build * cloudbuild with edvise * 
fix(validation): resolve pylint used-before-assignment error Initialize schema_err_to_raise before try block to satisfy pylint's static analysis, which doesn't recognize that pytest.skip() always raises. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(api): add legacy school type with any-format uploads - Add legacy_id to InstTable and institution API (create, update, read) - Enforce mutual exclusivity of pdp_id, edvise_id, legacy_id via has_at_most_one_school_type - Legacy validation: encoding + CSV read only, no schema checks - Add LEGACY_SCHEMA_GROUP and tests for legacy path and mutual exclusivity Made-with: Cursor * feat(api): legacy PII check, principles compliance, and test coverage - Add PII column check for legacy uploads; reject before raw/validated - Treat student_id as non-PII (false positive) for all institution types - Comply with universal principles: docstrings, extract create_institution helpers (<50 lines), comment lazy import in validation - Add tests: has_at_most_one_school_type, legacy header-only CSV, legacy PII rejection returns 400, explicit legacy_id create, update add legacy_id, storage/Databricks failure paths - Fix mypy in create_institution (row variable) Made-with: Cursor * docs(api): use Edvise Schema (ES) naming to reduce confusion Replace 'Edvise schema' with 'Edvise Schema (ES)' in docstrings, comments, and user-facing error messages so the schema type is distinguished from the Edvise product (ES convention). 
Made-with: Cursor * feat(data): allow legacy institutions to upload files with any filename - Fetch institution before filename inference; set allowed_schemas to UNKNOWN when inference fails for legacy (non-legacy still get 422 for non-descriptive names) - Refactor validation_helper into helpers under 50 lines; add full docstrings, early empty-filename and invalid inst_id validation, log before 404 - Add unit tests for _infer_allowed_schemas_from_filename and _ext_models_set - Add integration tests: empty filename 422, invalid inst_id 404, edvise non-descriptive filename 422, duplicate validate idempotent - Fix mypy and ruff/black in data.py and data_test.py - Add PR_DESCRIPTION.md for feature branch Made-with: Cursor * chore: remove PR_DESCRIPTION.md Made-with: Cursor * fix(validation): run PII check for header-only legacy CSVs * fix(test): align validation error snapshot with non-PII student_id display Made-with: Cursor * feat(validation): use PDP cohort converter and support custom converters - Use converter_func_cohort by default for PDP cohort validation (filters DE/DS/SE) - Add optional pdp_cohort_converter_func and pdp_course_converter_func to validate_file_reader and validate_dataset for school-specific overrides - Course validation tries custom converter first, then default handling_duplicates - Validate converter args are callable; convert converter/read failures to HardValidationError so API returns 400 with context - Add PDPConverterFunc type; extract helpers to meet 50-line and error-handling rules Made-with: Cursor * fix(validation): satisfy mypy for PDP validation and tests - Add unreachable return after with block in _validate_pdp_with_edvise_read - Use cast(Any, ...) 
in tests that pass non-callables to converter params Made-with: Cursor * chore: remove real institution names * chore: ruff format * fix: use latest edvise EdaSummary * fix: use edvise develop branch * chore(deps): pin edvise to develop * feat(ci): notify slack channel on deployment * fix: lock file was out of sync * chore: bump edvise version to 0.1.12 * Revert "feat: legacy school type with any-format uploads, PII check, and Edvise Schema (ES) naming" * Revert "Revert "feat: legacy school type with any-format uploads, PII check, and Edvise Schema (ES) naming"" * feat(config): add optional local inst/batch/file seed from config for LOCAL * style: ruff format * fix(validation): pass schema_type to handling_duplicates for PDP course CSV read_raw_pdp_course_data calls converter_func(df) with one argument; bare handling_duplicates is invalid on current edvise. Use a wrapper that calls handling_duplicates(df, "pdp") positionally for edvise compatibility. Remove the broken second default converter. Update PDP read path test. Made-with: Cursor * style: ruff format PDP course read path test Made-with: Cursor * fix(deps): upgrade databricks-sql-connector for pyarrow>=17 (edvise) databricks-sql-connector 3.5 pins pyarrow<17; edvise requires pyarrow>=17. Use databricks-sql-connector[pyarrow]~=4.2.x and refresh uv.lock (pyarrow 19). Aligns lock with Cloud Build 'uv lock --upgrade-package edvise'. Made-with: Cursor * Merge pull request #217 from datakind/fix/pdp-course-handling-duplicates fix: fix duplicate-handling step in validation * fix(storage): reduce peak memory during upload validation - Download unvalidated blob to a temp file and validate by path instead of blob.open().read() via _path_for_edvise_read (avoids a full in-RAM copy). - Write validated CSV to a temp file and upload_from_filename instead of building the entire CSV in a StringIO string. Branched from develop (repo has no dev branch). 
Made-with: Cursor * chore(storage): log errno on temp download/to_csv OSError Helps distinguish ENOSPC vs other failures in Cloud Run logs; re-raises unchanged. Made-with: Cursor * test(storage): cover temp cleanup and OSError logging for validate upload - Download OSError: unlink temp, skip validate_file_reader, log errno - to_csv OSError: unlink temp, no upload, log errno - Upload failure after to_csv: temp still unlinked Made-with: Cursor * refactor(storage): extract temp download/unlink helpers for clarity Aligns with universal-principles: keep _run_validation_and_get_normalized_df under 50 lines, reduce nesting, replace tmp_path with local_csv_path naming. Made-with: Cursor * style: apply black/ruff format to gcsutil_test.py Made-with: Cursor * fix(storage): reduce peak memory during upload validation - Download unvalidated blob to a temp file and validate by path instead of blob.open().read() via _path_for_edvise_read (avoids a full in-RAM copy). - Write validated CSV to a temp file and upload_from_filename instead of building the entire CSV in a StringIO string. Branched from develop (repo has no dev branch). Made-with: Cursor * chore(storage): log errno on temp download/to_csv OSError Helps distinguish ENOSPC vs other failures in Cloud Run logs; re-raises unchanged. Made-with: Cursor * test(storage): cover temp cleanup and OSError logging for validate upload - Download OSError: unlink temp, skip validate_file_reader, log errno - to_csv OSError: unlink temp, no upload, log errno - Upload failure after to_csv: temp still unlinked Made-with: Cursor * refactor(storage): extract temp download/unlink helpers for clarity Aligns with universal-principles: keep _run_validation_and_get_normalized_df under 50 lines, reduce nesting, replace tmp_path with local_csv_path naming. 
Made-with: Cursor * style: apply black/ruff format to gcsutil_test.py Made-with: Cursor * chore: bump edvise v0.2.0 * fix(pdp-validation): default cohort converter to none Stop passing edvise converter_func_cohort when pdp_cohort_converter_func is omitted so PDP cohort rows are validated as read. - Callers may still pass an explicit cohort converter. - Update PDP read-path test to expect converter_func=None. - Refresh docstrings (pipeline vs API, Args/Returns/Raises) in validation and validation_pdp_edvise. Made-with: Cursor * feat(api): remove custom institution path; require school type; legacy schemas UNKNOWN - Require exactly one of PDP, Edvise, or Legacy on POST /institutions - Remove custom schema resolution and Databricks extension generation for uploads - Fix PATCH /institutions to persist allowed_schemas to inst.schemas column - LEGACY_SCHEMA_GROUP stores UNKNOWN only; drop validation_extension module - Update tests and default fixtures for typeless/custom removal Made-with: Cursor * feat(api): harden institutions API after custom-institution removal - POST/PATCH: require exactly one school type (pdp, edvise, or legacy) - PATCH: recompute schemas only when the type triple changes; merge optional allowed_schemas on change - PATCH: honor is_edvise/is_legacy for auto-assigned ids (POST parity) - Docs/tests: validation namespaces; disambiguate custom naming in code and tests Made-with: Cursor * docs(api): revert broad custom wording; keep upload docs accurate Restore original docstrings and test names where "custom" referred to converters, schema config, or JSON keys, not custom institutions. Keep gcsutil validate_file institution_id line aligned with pdp/edvise/legacy only (no institution-UUID-for-custom upload path). 
Made-with: Cursor * fix(institutions): reject POST duplicate when existing row lacks school type When (name, state) matches an existing InstTable row, validate stored pdp_id/edvise_id/legacy_id the same as new creates: at most one non-null and exactly one required. Return 400 with guidance instead of 200 for typeless or invalid rows. Add regression tests. Made-with: Cursor * test(institutions): cover duplicate POST, PATCH flags, allowed_schemas-only - Reject is_pdp without pdp_id on POST - Reject duplicate (name, state) when stored row has conflicting ids - Reject PATCH is_edvise on PDP row without clearing pdp_id - Reject PATCH with both is_edvise and is_legacy - allowed_schemas-only PATCH replaces schemas when type unchanged Made-with: Cursor * refactor(institutions): extract PATCH helpers and DRY school-type errors - Add shared mutual-exclusion detail constant for POST/PATCH paths - Extract duplicate-post row validation and PATCH merge/validate/persist helpers - Keep update_inst within single-responsibility helpers; reuse row response mapper Made-with: Cursor * fix(lint): satisfy ruff and mypy on databricks and institutions - Remove unused HTTPException import from databricks.py (F401) - Cast ORM row in _require_single_institution_row_by_uuid for InstTable (no-any-return) Made-with: Cursor * style(institutions): apply ruff format to router and tests Made-with: Cursor * refactor: simplify local_inst_data * docs: Update local_inst_data instructions * chore: remove unused import * fix: make pdp_id and state optional * chore: bumping pyproject and uv.lock --------- Co-authored-by: Mesh <meshach.ogunmodede@datakind.org> Co-authored-by: Meshach Ogunmodede <142531479+Mesh-ach@users.noreply.github.com> Co-authored-by: William Carr <bill.carr@datakind.org> Co-authored-by: Hannah Ofstedahl <98632391+chapmanhk@users.noreply.github.com> Co-authored-by: Hannah Ofstedahl <hannahxchapman@yahoo.com> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: William 
Carr <bill@datakind.org>
Co-authored-by: Vishakh Pillai <vishpillai97@gmail.com>
Co-authored-by: kaylawilding <95330483+kaylawilding@users.noreply.github.com>
* Revert "feat: consolidating staging into main and using main going forward as…" (#236)
  This reverts commit 9b70f23.
* Merge develop into main (#240)
* docs: inherit org community health files (#237)
* docs: remove local community health files to inherit from org-wide .github repo
* docs: update README to include previous contributing info
* feat(api): simplify create model request to name only (#238)
* chore: bump edvise dependency to 1.0.0 (#241)
  Co-authored-by: Vishakh Pillai <vishpillai97@gmail.com>
* chore: creating dummy changlog.md file while we create semver / gitflow process
---------
Co-authored-by: Rachel Wells <rachellaurynwells@gmail.com>
Co-authored-by: William Carr <bill@datakind.org>
Co-authored-by: Vishakh Pillai <vishpillai97@gmail.com>
* chore(release): edvise-api 1.0.0 (#249)
* docs: inherit org community health files (#237)
* docs: remove local community health files to inherit from org-wide .github repo
* docs: update README to include previous contributing info
* feat(api): simplify create model request to name only (#238)
* chore: bump edvise dependency to 1.0.0 (#241)
  Co-authored-by: Vishakh Pillai <vishpillai97@gmail.com>
* chore: creating dummy changlog.md file while we create semver / gitflow process
* feat: trigger Databricks GCS→bronze sync after file validation (Edvise/Legacy) (#239)
* Trigger GCS→bronze Databricks sync after validation (Edvise/Legacy)
  - Add run_validated_gcs_to_bronze_sync and job edvise_validated_gcs_to_bronze_sync with include_blob_paths_json for validated/{file_name}.
  - Call after successful validate-upload / validate-sftp when edvise_id or legacy_id is set; ENABLE_GCS_BRONZE_SYNC_ON_VALIDATION (default true) to disable.
  - Failures to start the job are logged and do not fail validation.
  - Extend data tests with DatabricksControl mock and assertions.
  Made-with: Cursor
* feat(data): bronze sync after validation with tracing and job id resolution
  Schedule GCS-to-bronze Databricks run_now in BackgroundTasks after validated/ writes. Add correlation_id and JSON trace logs (validation_request, background start/done). Optional DATABRICKS_VALIDATED_BRONZE_SYNC_JOB_ID; resolve job by name with duplicate detection when unset. Refine skip reasons for PDP vs Edvise/Legacy.
  Co-authored-by: Cursor <cursoragent@cursor.com>
* refactor(data): align bronze sync with universal principles
  Extract Databricks helpers and job-parameter constants, use specific exceptions (ValueError, DatabricksError), and split background logging into focused functions under 50 lines. Add tests for PDP-only and env kill-switch skips plus run_now parameter contract coverage.
  Co-authored-by: Cursor <cursoragent@cursor.com>
* style: apply ruff format to bronze sync modules
  Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(data): resolve prefixed bronze sync Databricks jobs
  Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(data): map bronze sync job ids by environment
  Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(data): trigger bronze sync during validation request
  Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
* feat(api): validate Edvise uploads with repo schemas (#242)
* feat(api): validate Edvise uploads with repo schemas
  Route Edvise student and course uploads through upstream edvise Pandera schemas so upload validation matches the pipeline contract and is not bypassed by registry schema drift.
  Co-authored-by: Cursor <cursoragent@cursor.com>
* refactor(api): separate Edvise validation routing
  Keep PDP and Edvise repo-schema upload validation paths distinct so each helper has a single responsibility while preserving the same validation behavior.
  Co-authored-by: Cursor <cursoragent@cursor.com>
* refactor(api): remove redundant repo validation fallback
  Keep JSON validation flow focused now that PDP and Edvise repo-schema uploads are routed before schema merging.
  Co-authored-by: Cursor <cursoragent@cursor.com>
* style(api): format validation routing test
  Apply Ruff formatting to keep the Edvise validation routing tests passing style checks.
  Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
* feat(webapp): establish pyproject.toml as canonical Edvise API version (#243)
* feat(webapp): establish pyproject.toml as canonical Edvise API version, set version and title in OpenAPI
* docs: rename SST -> Edvise
* docs(webapp): update formatter instructions to ruff to align with github workflows and engineering playbook
* test(webapp): assert OpenAPI version matches pyproject.toml
* feat(eda): add clear_cache option to /eda endpoint (#233)
* feat: legacy school inference DB job trigger (#212)
* feat: custom school inference, but need to confirm if custom is the same as legacy
* fix: transitioning from 'custom' to 'legacy'
* fix: remove validation of job parameters, handled already through edvise
* fix: run request still requires str values, defaulting to empty string
* fix: still getting pydantic error
* feat: using substring matching to find legacy job since i have it deployed under my name because of target==dev
* fix: style
* fix: style
* fix: style
* fix: making batch file name more robust so we don't run into decoding issues
* fix: merge conflict
* fix: merge conflict
---------
Co-authored-by: Vishakh Pillai <vishpillai97@gmail.com>
* feat: add gen ai as 4th option in addition pdp / edvise / legacy in api and uploads (#244)
* feat: Added "GenAI" as an option for "create institution"
  note: for GenAI raw files, we will reuse the same loose rules as Legacy institutions (read CSV, PII check, no strict ES columns).
* fix: style
---------
Co-authored-by: Noreen Mayat <nm3224@alum.barnard.edu>
Co-authored-by: Vishakh Pillai <vishpillai97@gmail.com>
* fix(models): derive PDP batch schema configs from institution schemas (#247)
* fix(models): derive PDP batch schema configs from institution schemas
  When model.schema_configs is null, PDP inference now builds a default required COURSE+STUDENT (etc.) batch rule from inst.schemas instead of 500ing. Explicit model configs still take precedence.
  Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(models): cast jsonpickle decode for mypy no-any-return
  Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(databricks): prefer Cloud Run job when pipeline name is ambiguous
  When multiple dev bundle jobs match a PDP or legacy inference pipeline substring, resolve to [dev dev_cloudrun_sa] if present; otherwise use the first sorted match instead of failing the inference request.
  Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Vishakh Pillai <vishpillai97@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
* ci: datakind shared workflows (#245)
* ci: datakind shared workflows
* refactor: rename test.yml -> tests.yml
* fix(ci): add workflow_call to style and tests workflows
* refactor: use pre-release workflow from shared workflows
* ci: replace with shared enforce-pr-targets workflow
  Aligns checks against the current protected branches, main and develop, rather than staging
* chore: remove unused workflow
* refactor: remove pull_request triggers.
  These run via ci.yml
* ci: pin tests and type-check to Python 3.13
* chore(ci): remove legacy webapp-and-worker precommit workflow
* ci: standardize on Python 3.12 across workflows and pyproject
* ci: test workflow enforcement
* ci: test workflow enforcement
* ci: add gate job to report required ci status check
* chore: bump python version to 3.10
* chore: standardize Python 3.12 across project and Docker
* chore: updating edvise v1.2.0
* chore: CHANGELOG.md update + type check
* chore(release): bump version
* ci(cloudbuild): parameterize webapp deploy for multi-environment triggers
---------
Co-authored-by: Rachel Wells <rachellaurynwells@gmail.com>
Co-authored-by: Vishakh Pillai <64162993+vishpillai123@users.noreply.github.com>
Co-authored-by: Vishakh Pillai <vishpillai97@gmail.com>
Co-authored-by: Hannah Ofstedahl <98632391+chapmanhk@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Noreen Mayat <nm3224@alum.barnard.edu>
---------
Co-authored-by: Meshach Ogunmodede <142531479+Mesh-ach@users.noreply.github.com>
Co-authored-by: Hannah Ofstedahl <98632391+chapmanhk@users.noreply.github.com>
Co-authored-by: Hannah Ofstedahl <hannahxchapman@yahoo.com>
Co-authored-by: Mesh <meshach.ogunmodede@datakind.org>
Co-authored-by: William Carr <bill.carr@datakind.org>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: William Carr <bill@datakind.org>
Co-authored-by: Vishakh Pillai <vishpillai97@gmail.com>
Co-authored-by: kaylawilding <95330483+kaylawilding@users.noreply.github.com>
Co-authored-by: Rachel Wells <rachellaurynwells@gmail.com>
Co-authored-by: Noreen Mayat <nm3224@alum.barnard.edu>
1 parent 387690c commit a05d8d0

15 files changed

Lines changed: 66 additions & 689 deletions

.devcontainer/devcontainer.json

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 {
-  "name": "python 3.10 & uv",
+  "name": "python 3.12 & uv",
   "dockerComposeFile": "compose.yaml",
   "service": "app",
   "workspaceFolder": "/workspace",

.devcontainer/dockerfile

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-FROM mcr.microsoft.com/devcontainers/python:3.10-bookworm
+FROM mcr.microsoft.com/devcontainers/python:3.12-bookworm
 
 # Copy uv and uvx from the official image
 COPY --from=ghcr.io/astral-sh/uv:0.4.30 /uv /uvx /bin/

.github/actions/setup-python-env/action.yml

Lines changed: 3 additions & 2 deletions
@@ -2,8 +2,9 @@ name: "Set Up Python Environment"
 
 inputs:
   python-version:
-    description: "Python version to use"
-    required: true
+    description: "Python version to use (default from .python-version)"
+    required: false
+    default: "3.12"
 
 runs:
   using: composite

.github/workflows/style.yml

Lines changed: 0 additions & 2 deletions
@@ -11,8 +11,6 @@ jobs:
         uses: actions/checkout@v4
       - name: Set up Python environment
         uses: ./.github/actions/setup-python-env
-        with:
-          python-version: "3.12"
       - name: Get changed files
         id: changed-files
         uses: step-security/changed-files@v45

.github/workflows/tests.yml

Lines changed: 0 additions & 2 deletions
@@ -13,8 +13,6 @@ jobs:
         uses: actions/checkout@v4
       - name: Set up Python environment
         uses: ./.github/actions/setup-python-env
-        with:
-          python-version: "3.12"
       - name: Run tests
         run: |
           uv run python -m pytest

.github/workflows/type-check.yml

Lines changed: 1 addition & 3 deletions
@@ -11,8 +11,6 @@ jobs:
         uses: actions/checkout@v4
       - name: Set up Python environment
         uses: ./.github/actions/setup-python-env
-        with:
-          python-version: "3.12"
       - name: Get changed files
         id: changed-files
         uses: step-security/changed-files@v45
@@ -22,4 +20,4 @@ jobs:
       - name: Check types
         if: steps.changed-files.outputs.any_changed == 'true'
         run: |
-          uv run python -m mypy --install-types --non-interactive ${{ steps.changed-files.outputs.all_changed_files }}
+          uv run python -m mypy ${{ steps.changed-files.outputs.all_changed_files }}

.gitignore

Lines changed: 0 additions & 1 deletion
@@ -79,7 +79,6 @@ profile_default/
 ipython_config.py
 
 # pyenv
-.python-version
 
 # PEP 582; used by e.g. github.com/David-OConnor/pyflow
 __pypackages__/

.python-version

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+3.12

CHANGELOG.md

Lines changed: 13 additions & 2 deletions
@@ -1,2 +1,13 @@
-# CHANGELOG
-- TBD: Updated with gitflow & semver
+## 1.0.0 (2026-06-16)
+- feat: add GenAI as a fourth schema type (alongside PDP, Edvise, and Legacy) in API and uploads (#244)
+- feat: legacy school inference Databricks job trigger (#212)
+- feat: validate Edvise uploads against repo schemas (#242)
+- feat: trigger Databricks GCS→bronze sync after file validation (Edvise/Legacy) (#239)
+- feat: simplify create-model request to name only (#238)
+- feat: add `clear_cache` option to `/eda` endpoint (#233)
+- feat: establish `pyproject.toml` as canonical Edvise API version (#243)
+- fix: derive PDP batch schema configs from institution schemas (#247)
+- chore: bump `edvise` dependency to 1.2.0
+- chore: standardize Python 3.12 across project, CI, devcontainer, and Docker
+- ci: adopt DataKind shared workflows (#245)
+- docs: inherit org community health files (#237)

cloudbuild-webapp.yaml

Lines changed: 23 additions & 20 deletions
@@ -1,5 +1,8 @@
-# Cloud Build config for webapp (dev-webapp trigger).
-# _REGION and _ENVIRONMENT are set by the trigger (Terraform).
+# Cloud Build config for webapp. Reused by per-environment triggers (dev, qa, staging, prod).
+# Each trigger sets _REGION and _ENVIRONMENT; deploys Cloud Run service "${_ENVIRONMENT}-webapp".
+substitutions:
+  _REGION: us-east4
+  _SERVICE: edvise-api
 steps:
   - name: ghcr.io/astral-sh/uv:debian
     entrypoint: bash
@@ -12,40 +15,40 @@ steps:
   - name: gcr.io/cloud-builders/docker
     args:
       - build
-      - '-f'
+      - "-f"
       - src/webapp/Dockerfile
-      - '-t'
-      - '${_REGION}-docker.pkg.dev/${PROJECT_ID}/edvise-api/webapp:$COMMIT_SHA'
-      - '-t'
-      - '${_REGION}-docker.pkg.dev/${PROJECT_ID}/edvise-api/webapp:latest'
+      - "-t"
+      - "${_REGION}-docker.pkg.dev/${PROJECT_ID}/${REPO_NAME}/webapp:$COMMIT_SHA"
+      - "-t"
+      - "${_REGION}-docker.pkg.dev/${PROJECT_ID}/${REPO_NAME}/webapp:latest"
       - .
   - name: gcr.io/cloud-builders/docker
     args:
       - push
-      - '${_REGION}-docker.pkg.dev/${PROJECT_ID}/edvise-api/webapp:$COMMIT_SHA'
+      - "${_REGION}-docker.pkg.dev/${PROJECT_ID}/${REPO_NAME}/webapp:$COMMIT_SHA"
   - name: gcr.io/cloud-builders/docker
     args:
       - push
-      - '${_REGION}-docker.pkg.dev/${PROJECT_ID}/edvise-api/webapp:latest'
+      - "${_REGION}-docker.pkg.dev/${PROJECT_ID}/${REPO_NAME}/webapp:latest"
   - name: gcr.io/cloud-builders/gcloud
     args:
       - run
       - deploy
-      - '${_ENVIRONMENT}-webapp'
-      - '--image'
-      - '${_REGION}-docker.pkg.dev/${PROJECT_ID}/edvise-api/webapp:$COMMIT_SHA'
-      - '--region'
-      - '${_REGION}'
+      - "${_SERVICE}"
+      - "--image"
+      - "${_REGION}-docker.pkg.dev/${PROJECT_ID}/${REPO_NAME}/webapp:$COMMIT_SHA"
+      - "--region"
+      - "${_REGION}"
   - name: curlimages/curl
     args:
-      - '-X'
+      - "-X"
       - POST
-      - '-H'
-      - 'Content-Type: application/json'
-      - '-f'
-      - '-d'
+      - "-H"
+      - "Content-Type: application/json"
+      - "-f"
+      - "-d"
       - >-
-        {"text":"🚀 *$REPO_NAME* deployed · `$BRANCH_NAME` · $TRIGGER_NAME · $BUILD_ID"}
+        {"text":"🚀 $TRIGGER_NAME → *$REPO_NAME* deployed · `$BRANCH_NAME`"}
       - >-
         https://hooks.slack.com/triggers/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
     id: notify-slack
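
The image reference in the docker and gcloud steps now depends on `${REPO_NAME}` instead of the hard-coded `edvise-api` path segment. To see what a given trigger would produce, the expansion can be approximated locally. This is a rough sketch using Python's `string.Template`, which happens to accept the same `$NAME` and `${NAME}` forms; it is not how Cloud Build itself performs substitution, and the project and repository values below are made up.

```python
from string import Template

# Image reference as written in the new cloudbuild-webapp.yaml.
IMAGE_TEMPLATE = Template(
    "${_REGION}-docker.pkg.dev/${PROJECT_ID}/${REPO_NAME}/webapp:$COMMIT_SHA"
)


def render_image(substitutions: dict[str, str]) -> str:
    """Expand the image reference; raises KeyError if a variable is missing."""
    return IMAGE_TEMPLATE.substitute(substitutions)


example = {
    "_REGION": "us-east4",       # default from the new substitutions block
    "PROJECT_ID": "my-project",  # hypothetical project id
    "REPO_NAME": "edvise-api",   # hypothetical repository name
    "COMMIT_SHA": "a05d8d0",     # shortened for illustration
}

print(render_image(example))
```

One consequence worth checking, if the diff is read correctly: the Artifact Registry repository must now carry the same name as the source repository that the trigger reports in `REPO_NAME`, since that value replaces the former fixed `edvise-api` segment.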
