Add Data Science template and execution base #348
Open
matheusmra wants to merge 3 commits into
Introduce a new Data Science grading template and shared execution base. Adds autograder/template_library/data_science.py with tests for stdout metrics, CSV/JSON artifact validation, and model artifact checks; adds execution_base.py to centralize sandbox execution and error handling (removes duplicate BaseExecutionTest from input_output). Registers DataScienceTemplate in the template library and service, updates translations (en/pt_br), adds docs (docs/template-library/data_science.md), a Python DS sandbox Dockerfile, and unit tests. Also updates README to advertise the Data Science template and adjusts sandbox models/tests accordingly.
Pull request overview
Adds a DataScienceTemplate with five new test functions for grading ML/data-science assignments, extracts the shared sandbox-execution base class into its own module, and introduces a PYTHON_DS sandbox variant with a scientific-Python Docker image.
Changes:
- New `data_science` template with `expect_stdout_value`, `expect_metric`, `expect_csv_output`, `expect_json_output`, `expect_model_artifact` tests, plus en/pt-BR translation entries and unit tests.
- Moves `BaseExecutionTest` from `input_output.py` into a new `execution_base.py` and rewires the I/O template to import from it.
- Adds `Language.PYTHON_DS` enum, `Dockerfile.python-ds`, README section and `docs/template-library/data_science.md`.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| autograder/template_library/data_science.py | New template and five BaseExecutionTest-based test functions. |
| autograder/template_library/execution_base.py | Extracted BaseExecutionTest (sandbox run + base error handling). |
| autograder/template_library/input_output.py | Removes the in-file BaseExecutionTest and imports it from execution_base. |
| autograder/template_library/__init__.py | Exports DataScienceTemplate. |
| autograder/services/template_library_service.py | Registers data_science in the template registry. |
| autograder/translations/en.json, pt_br.json | New data_science.* translation keys (uses {field_key} placeholder). |
| sandbox_manager/models/sandbox_models.py | Adds PYTHON_DS = ("python_ds", "sandbox-pyds:latest"). |
| sandbox_manager/images/Dockerfile.python-ds | New image with numpy/pandas/scikit-learn/matplotlib/seaborn. |
| README.md | Adds Data Science template section (4-row test table). |
| docs/template-library/data_science.md | Full documentation of template, prerequisites, and example config. |
| tests/unit/test_data_science.py | Unit tests for all five tests and template metadata. |
| tests/unit/test_template_contract.py | Adds DataScienceTemplate() to the contract validation loop. |
Comment on lines +228 to +233
| Test Name | Description | Key Parameters |
|-----------|-------------|----------------|
| `expect_metric` | Extract metric from stdout and validate vs threshold | `metric_pattern`, `condition`, `threshold` |
| `expect_csv_output` | Validate generated CSV file contents | `artifact_path`, `expected_columns`, `expected_shape`, `expected_values` |
| `expect_json_output` | Validate generated JSON structure and keys | `artifact_path`, `required_keys`, `expected_values` |
| `expect_model_artifact` | Verify a model file was produced | `artifact_path`, `min_size_bytes` |
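As a rough illustration of the `expect_metric` row above, the check could look something like the sketch below. The function name `check_metric`, the `CONDITIONS` mapping, and the accepted condition strings are assumptions made for this example; only the parameter names `metric_pattern`, `condition`, and `threshold` come from the table.

```python
import operator
import re

# Hypothetical mapping from condition strings to comparison functions.
# The condition names accepted by the real template may differ.
CONDITIONS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
}


def check_metric(stdout: str, metric_pattern: str, condition: str, threshold: float) -> bool:
    """Return True if the first captured number in stdout satisfies the condition."""
    compare = CONDITIONS.get(condition)
    if compare is None:
        raise ValueError(f"Unsupported condition: {condition}")
    match = re.search(metric_pattern, stdout)
    if match is None:
        return False  # the metric was never printed
    try:
        value = float(match.group(1))
    except (IndexError, ValueError):
        return False  # no capture group in the pattern, or capture is not numeric
    return compare(value, threshold)


stdout = "Training done\nAccuracy: 0.91\n"
print(check_metric(stdout, r"Accuracy:\s*([0-9.]+)", ">=", 0.85))
```

The pattern is expected to carry one capture group around the number, so a submission that prints extra log lines around the metric can still be graded.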
class Language(Enum):
    """Supported programming languages in the sandbox."""
    PYTHON = ("python", "sandbox-py:latest")
    PYTHON_DS = ("python_ds", "sandbox-pyds:latest")
Comment on lines +274 to +302
    @staticmethod
    def _validate_artifact_path(artifact_path: str) -> Optional[str]:
        """Return an error message if the path is unsafe, else None."""
        if not artifact_path:
            return "artifact_path is required"
        if artifact_path.startswith("/") or ".." in artifact_path.split("/"):
            return f"Invalid artifact_path (absolute or traversal): {artifact_path}"
        return None

    @staticmethod
    def _parse_csv(content: str) -> tuple:
        """Parse CSV content, return (headers, rows) or raise ValueError."""
        reader = csv.reader(io.StringIO(content))
        rows = list(reader)
        if not rows:
            raise ValueError("CSV file is empty")
        headers = rows[0]
        data_rows = rows[1:]
        return headers, data_rows

    @staticmethod
    def _values_match(actual: str, expected, tolerance: float) -> bool:
        """Compare two values with numeric tolerance if both are numbers."""
        try:
            actual_num = float(actual)
            expected_num = float(expected)
            return abs(actual_num - expected_num) <= tolerance
        except (ValueError, TypeError):
            return str(actual).strip() == str(expected).strip()
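To show how helpers like these might combine into the proportional scoring that the PR describes for `expect_csv_output`, here is a minimal sketch. The `score_csv` function and the `(row_index, column_name, value)` shape of `expected_values` are assumptions for illustration, not the template's actual interface; `values_match` mirrors the logic of `_values_match` above.

```python
import csv
import io


def values_match(actual, expected, tolerance: float) -> bool:
    """Numeric comparison within tolerance, else stripped string equality."""
    try:
        return abs(float(actual) - float(expected)) <= tolerance
    except (ValueError, TypeError):
        return str(actual).strip() == str(expected).strip()


def score_csv(content: str, expected_columns, expected_values, tolerance: float = 1e-6) -> float:
    """Return the fraction of checks that passed, between 0.0 and 1.0.

    expected_values is a hypothetical list of (row_index, column_name, value).
    """
    rows = list(csv.reader(io.StringIO(content)))
    if not rows:
        return 0.0
    headers, data = rows[0], rows[1:]
    passed = 0
    total = 0
    # One check per expected column name.
    for column in expected_columns:
        total += 1
        if column in headers:
            passed += 1
    # One check per expected cell value.
    for row_index, column, expected in expected_values:
        total += 1
        if column in headers and 0 <= row_index < len(data):
            col_idx = headers.index(column)
            row = data[row_index]
            if col_idx < len(row) and values_match(row[col_idx], expected, tolerance):
                passed += 1
    return passed / total if total else 1.0


content = "id,prediction\n1,0.50001\n2,0.9\n"
score = score_csv(
    content,
    expected_columns=["id", "prediction", "label"],
    expected_values=[(0, "prediction", 0.5), (1, "prediction", 0.8)],
    tolerance=1e-3,
)
print(score)  # 3 of 5 checks pass
```

Scoring each check separately lets a submission earn partial credit when, say, the columns are right but one prediction is off.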
Comment on lines +642 to +646
"key_found": "✓ Key '{field_key}' found.",
"key_missing": "✗ Required key '{field_key}' not found.",
"value_match": "✓ Key '{field_key}': expected {expected}, got {actual}.",
"value_mismatch": "✗ Key '{field_key}': expected {expected}, got {actual}.",
"value_key_missing": "✗ Key '{field_key}' not found (cannot check value).",
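One plausible reason for using `{field_key}` rather than `{key}` in these entries is a keyword-argument clash, sketched below. The signature of `t()` and the `data_science.key_missing` lookup key are assumptions for this example; the real translation function may reserve `key` for a different reason.

```python
def t(key: str, **params) -> str:
    """Hypothetical translator: look up a message by key, then fill placeholders."""
    messages = {
        "data_science.key_missing": "✗ Required key '{field_key}' not found.",
    }
    return messages[key].format(**params)


# Works: the placeholder name does not collide with the `key` parameter.
print(t("data_science.key_missing", field_key="accuracy"))

# A placeholder named `key` would collide with the first parameter of t(),
# so Python rejects the call before any lookup happens.
try:
    t("data_science.key_missing", key="accuracy")
except TypeError as exc:
    print(f"TypeError: {exc}")
```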
Add a new Python Data Science sandbox build target and include it in sandbox-build-all (Makefile). Introduce Language.PYTHON_DS support in command resolution with the same default as Python (command_resolver). Extract common validate_artifact_path and values_match helpers in data_science, remove duplicated per-class implementations, and update callers to use the shared functions. Update API testing to use the 'field_key' translation parameter and adjust English and Portuguese translations accordingly. Also add README entry for the new expect_stdout_value test. These changes reduce duplication and add explicit support for data-science sandboxes and key naming consistency.
Update sandbox_manager/images/Dockerfile.python-ds to use scikit-learn==1.4.1.post1 instead of 1.4.1. This ensures the Python data-science image uses the post-release build (likely for packaging/bugfix improvements) without other dependency changes.
Context
The current template library covers command-line I/O, web development, API testing, and static analysis, but lacks specialized support for data science assignments. Data science tasks have their own grading requirements: reading datasets injected via assets, parsing non-deterministic outputs such as ML model metrics (which require tolerance-based and threshold comparisons), and verifying generated artifact files (such as `.pkl` models or prediction CSVs). We needed a dedicated grading template to support these workflows securely and efficiently without polluting the existing generic I/O templates.
Solution
Implemented the new `DataScienceTemplate` to natively support grading ML and data science tasks.
What was changed:
- `expect_stdout_value`: Extracts numeric values from stdout via regex and compares them using a numeric tolerance.
- `expect_metric`: Validates ML metrics extracted via regex against a threshold (e.g., `>= 0.85`).
- `expect_csv_output`: Validates generated CSV files using proportional scoring for column names, shape, and cell values (with tolerance).
- `expect_json_output`: Validates required keys and nested values in JSON outputs via dot notation.
- `expect_model_artifact`: Ensures a generated model file meets minimum size requirements.
- Moved the shared sandbox-execution logic out of `input_output.py` into a new `BaseExecutionTest` class in `execution_base.py`. This avoids cross-template dependencies while keeping the code DRY.
- Added the `PYTHON_DS` language enum and created `Dockerfile.python-ds` to support heavy scientific libraries (e.g., pandas, scikit-learn, numpy) in an isolated environment.
- Resolved a placeholder conflict with the `t()` translation function by renaming the reserved `{key}` placeholder to `{field_key}` in all related JSON translation files.
Further clarifications
- Build the new sandbox image (`docker build -t sandbox-pyds:latest -f sandbox_manager/images/Dockerfile.python-ds .`) before testing submissions that use the `PYTHON_DS` language enum.
- Support for `.parquet` files may be considered in the future if dataset sizes for student assignments become too large for standard CSV processing.
Related issues
Closes #338, #339, #340, #341, #342, and #343
Checklist