Add Data Science template and execution base by matheusmra · Pull Request #348 · webtech-network/autograder

matheusmra · 2026-05-29T12:38:34Z

Context

The current template library covers command-line I/O, web development, API testing, and static analysis, but lacks specialized support for data science assignments. Data science tasks involve unique grading requirements, such as reading datasets injected via assets, parsing non-deterministic outputs like ML model metrics (which require tolerance-based and threshold comparisons), and verifying generated artifact files (like .pkl models or prediction CSVs). We needed a dedicated grading template to support these complex workflows securely and efficiently without polluting the existing generic I/O templates.

Solution

Implemented the new DataScienceTemplate to natively support grading ML and data science tasks.

What was changed:

New Test Functions:
- expect_stdout_value: Extracts numeric values from stdout via regex and compares them using a numeric tolerance.
- expect_metric: Validates ML metrics extracted via regex against a threshold (e.g., >= 0.85).
- expect_csv_output: Validates generated CSV files using proportional scoring for column names, shape, and cell values (with tolerance).
- expect_json_output: Validates required keys and nested values in JSON outputs via dot notation.
- expect_model_artifact: Ensures a generated model file meets minimum size requirements.
Architectural Improvements: Extracted the shared execution and base error-handling logic from input_output.py into a new BaseExecutionTest class in execution_base.py. This avoids cross-template dependencies while keeping the code DRY.
Sandbox Environment: Added the PYTHON_DS language enum and created Dockerfile.python-ds to support heavy scientific libraries (e.g., pandas, scikit-learn, numpy) in an isolated environment.
Translation Engine Fix: Fixed a collision bug in the t() translation function by renaming the reserved {key} placeholder to {field_key} in all related JSON translation files.
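
The stdout checks listed above (expect_stdout_value and expect_metric) both come down to pulling a number out of program output with a regex and then comparing it. The snippet below is a minimal sketch of that idea, not the code in this PR: the helper names (extract_number, within_tolerance, meets_threshold) and the CONDITIONS table are hypothetical, and it assumes the pattern carries exactly one capture group around the number.

```python
import operator
import re
from typing import Optional

# Hypothetical condition table; the PR description only shows ">=" as an
# example, the remaining operators are assumptions for illustration.
CONDITIONS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
}


def extract_number(stdout: str, pattern: str) -> Optional[float]:
    """Return capture group 1 of the first match as a float, or None."""
    match = re.search(pattern, stdout)
    if match is None:
        return None
    try:
        return float(match.group(1))
    except (IndexError, ValueError):
        # No capture group in the pattern, or the captured text is not numeric.
        return None


def within_tolerance(actual: float, expected: float, tolerance: float) -> bool:
    """Tolerance-based comparison, as described for expect_stdout_value."""
    return abs(actual - expected) <= tolerance


def meets_threshold(actual: float, condition: str, threshold: float) -> bool:
    """Threshold comparison, as described for expect_metric."""
    return CONDITIONS[condition](actual, threshold)


stdout = "Training done\nAccuracy: 0.8731\nMean price: 12.5\n"
accuracy = extract_number(stdout, r"Accuracy:\s*([0-9.]+)")
mean_price = extract_number(stdout, r"Mean price:\s*([0-9.]+)")

print(meets_threshold(accuracy, ">=", 0.85))
print(within_tolerance(mean_price, 12.45, 0.1))
```

Returning None on a missing match keeps "metric not printed" distinct from "metric printed but too low", which a grader would probably want to report differently.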

Further clarifications

Reviewer notes: Ensure that you build the new docker image locally (docker build -t sandbox-pyds:latest -f sandbox_manager/images/Dockerfile.python-ds .) before testing submissions that use the PYTHON_DS language enum.
Follow-up work: We might consider adding native parsing for .parquet files in the future if dataset sizes for student assignments become too large for standard CSV processing.

Related issues

Closes #338, #339, #340, #341, #342, and #343

Checklist

I linked the related issue(s) and explained the motivation.
I kept this PR focused and scoped to a single concern.
I added or updated tests for changed behavior (or explained why not needed).
I ran the relevant tests locally.
I updated documentation when needed (README/docs/API examples).
This PR introduces API contract changes (request/response/endpoint/DTO).
If API changed, I documented compatibility or migration notes.
This PR includes breaking changes.
If breaking, I clearly described impact and migration steps.

Introduce a new Data Science grading template and shared execution base. Adds autograder/template_library/data_science.py with tests for stdout metrics, CSV/JSON artifact validation, and model artifact checks; adds execution_base.py to centralize sandbox execution and error handling (removes duplicate BaseExecutionTest from input_output). Registers DataScienceTemplate in the template library and service, updates translations (en/pt_br), adds docs (docs/template-library/data_science.md), a Python DS sandbox Dockerfile, and unit tests. Also updates README to advertise the Data Science template and adjusts sandbox models/tests accordingly.

Copilot

Pull request overview

Adds a DataScienceTemplate with five new test functions for grading ML/data-science assignments, extracts the shared sandbox-execution base class into its own module, and introduces a PYTHON_DS sandbox variant with a scientific-Python Docker image.

Changes:

New data_science template with expect_stdout_value, expect_metric, expect_csv_output, expect_json_output, expect_model_artifact tests, plus en/pt-BR translation entries and unit tests.
Moves BaseExecutionTest from input_output.py into a new execution_base.py and rewires the I/O template to import from it.
Adds Language.PYTHON_DS enum, Dockerfile.python-ds, README section and docs/template-library/data_science.md.
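
For readers unfamiliar with the dot-notation lookup that expect_json_output is described as using, the following is a rough sketch of how a nested key such as model.metrics.f1 could be resolved. The function name get_by_dot_path and the MISSING sentinel are hypothetical and are not taken from the PR.

```python
import json
from typing import Any

# Sentinel so that a key holding JSON null can be told apart from an absent key.
MISSING = object()


def get_by_dot_path(data: Any, path: str) -> Any:
    """Walk nested dicts following a dot-separated path; return MISSING if absent."""
    current = data
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        else:
            return MISSING
    return current


report = json.loads('{"model": {"name": "rf", "metrics": {"f1": 0.91, "auc": null}}}')

print(get_by_dot_path(report, "model.metrics.f1"))
print(get_by_dot_path(report, "model.params") is MISSING)
```

One limitation of this sketch is that a key that itself contains a dot cannot be addressed; whether the PR handles that case is not stated.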

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

| File | Description |
|------|-------------|
| autograder/template_library/data_science.py | New template and five `BaseExecutionTest`-based test functions. |
| autograder/template_library/execution_base.py | Extracted `BaseExecutionTest` (sandbox run + base error handling). |
| autograder/template_library/input_output.py | Removes the in-file `BaseExecutionTest` and imports it from `execution_base`. |
| autograder/template_library/__init__.py | Exports `DataScienceTemplate`. |
| autograder/services/template_library_service.py | Registers `data_science` in the template registry. |
| autograder/translations/en.json, pt_br.json | New `data_science.*` translation keys (uses `{field_key}` placeholder). |
| sandbox_manager/models/sandbox_models.py | Adds `PYTHON_DS = ("python_ds", "sandbox-pyds:latest")`. |
| sandbox_manager/images/Dockerfile.python-ds | New image with numpy/pandas/scikit-learn/matplotlib/seaborn. |
| README.md | Adds Data Science template section (4-row test table). |
| docs/template-library/data_science.md | Full documentation of template, prerequisites, and example config. |
| tests/unit/test_data_science.py | Unit tests for all five tests and template metadata. |
| tests/unit/test_template_contract.py | Adds `DataScienceTemplate()` to the contract validation loop. |

+| Test Name | Description | Key Parameters |
+|-----------|-------------|----------------|
+| `expect_metric` | Extract metric from stdout and validate vs threshold | `metric_pattern`, `condition`, `threshold` |
+| `expect_csv_output` | Validate generated CSV file contents | `artifact_path`, `expected_columns`, `expected_shape`, `expected_values` |
+| `expect_json_output` | Validate generated JSON structure and keys | `artifact_path`, `required_keys`, `expected_values` |
+| `expect_model_artifact` | Verify a model file was produced | `artifact_path`, `min_size_bytes` |
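
The PR description says expect_csv_output uses proportional scoring over column names, shape, and cell values. The sketch below shows one way such a score could be computed. It is an illustration under assumptions, not the PR's implementation: equal weighting of every check, the score_csv and cell_matches names, and the shape of expected_values (a mapping from (row index, column name) to the expected cell) are all hypothetical.

```python
import csv
import io
from typing import Dict, List, Tuple, Union

Cell = Union[str, int, float]


def cell_matches(actual: str, expected: Cell, tolerance: float) -> bool:
    """Numeric comparison within tolerance, falling back to stripped strings."""
    try:
        return abs(float(actual) - float(expected)) <= tolerance
    except (TypeError, ValueError):
        return str(actual).strip() == str(expected).strip()


def score_csv(
    content: str,
    expected_columns: List[str],
    expected_shape: Tuple[int, int],
    expected_values: Dict[Tuple[int, str], Cell],
    tolerance: float = 1e-6,
) -> float:
    """Return the percentage of checks passed (assumed: every check weighs the same)."""
    rows = list(csv.reader(io.StringIO(content)))
    if not rows:
        return 0.0
    headers, data = rows[0], rows[1:]

    # One check for the header row, one for (data rows, columns).
    checks = [
        headers == expected_columns,
        (len(data), len(headers)) == tuple(expected_shape),
    ]
    # One check per expected cell; a missing row or column counts as a failure.
    for (row_index, column), expected in expected_values.items():
        passed = False
        if column in headers and 0 <= row_index < len(data):
            col_index = headers.index(column)
            row = data[row_index]
            if col_index < len(row):
                passed = cell_matches(row[col_index], expected, tolerance)
        checks.append(passed)

    return 100.0 * sum(checks) / len(checks)


content = "id,pred\n1,0.50\n2,0.75\n"
score = score_csv(
    content,
    expected_columns=["id", "pred"],
    expected_shape=(2, 2),
    expected_values={(0, "pred"): 0.5, (1, "pred"): 0.8},
    tolerance=0.01,
)
print(score)  # 75.0: columns, shape, and one of the two cells pass
```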


 class Language(Enum):
    """Supported programming languages in the sandbox."""
    PYTHON = ("python", "sandbox-py:latest")
+    PYTHON_DS = ("python_ds", "sandbox-pyds:latest")
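
Each enum member in the excerpt above carries a tuple of (language identifier, Docker image tag). The sketch below shows how such tuple-valued members could be read back; only the two members visible in the excerpt are reproduced, and the slug and image properties and the language_from_slug helper are hypothetical conveniences, not part of the PR.

```python
from enum import Enum


class Language(Enum):
    """Two of the sandbox languages, copied from the diff excerpt."""
    PYTHON = ("python", "sandbox-py:latest")
    PYTHON_DS = ("python_ds", "sandbox-pyds:latest")

    @property
    def slug(self) -> str:
        # Hypothetical accessor for the first tuple element.
        return self.value[0]

    @property
    def image(self) -> str:
        # Hypothetical accessor for the second tuple element (Docker image tag).
        return self.value[1]


def language_from_slug(slug: str) -> Language:
    """Look up a member by its identifier string (hypothetical helper)."""
    for language in Language:
        if language.slug == slug:
            return language
    raise ValueError(f"Unsupported language: {slug}")


print(language_from_slug("python_ds").image)
```

The image tag matches the one used in the docker build command from the reviewer notes, so a submission that selects python_ds would need that image to exist locally.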


+    @staticmethod
+    def _validate_artifact_path(artifact_path: str) -> Optional[str]:
+        """Return an error message if the path is unsafe, else None."""
+        if not artifact_path:
+            return "artifact_path is required"
+        if artifact_path.startswith("/") or ".." in artifact_path.split("/"):
+            return f"Invalid artifact_path (absolute or traversal): {artifact_path}"
+        return None
+
+    @staticmethod
+    def _parse_csv(content: str) -> tuple:
+        """Parse CSV content, return (headers, rows) or raise ValueError."""
+        reader = csv.reader(io.StringIO(content))
+        rows = list(reader)
+        if not rows:
+            raise ValueError("CSV file is empty")
+        headers = rows[0]
+        data_rows = rows[1:]
+        return headers, data_rows
+
+    @staticmethod
+    def _values_match(actual: str, expected, tolerance: float) -> bool:
+        """Compare two values with numeric tolerance if both are numbers."""
+        try:
+            actual_num = float(actual)
+            expected_num = float(expected)
+            return abs(actual_num - expected_num) <= tolerance
+        except (ValueError, TypeError):
+            return str(actual).strip() == str(expected).strip()
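
To make the behavior of the two helpers above concrete, the snippet below copies their logic into module-level functions (using the validate_artifact_path and values_match names mentioned in the follow-up commit) and runs them on a few inputs. The sample paths and values are made up for illustration.

```python
from typing import Optional


def validate_artifact_path(artifact_path: str) -> Optional[str]:
    """Return an error message if the path is unsafe, else None."""
    if not artifact_path:
        return "artifact_path is required"
    if artifact_path.startswith("/") or ".." in artifact_path.split("/"):
        return f"Invalid artifact_path (absolute or traversal): {artifact_path}"
    return None


def values_match(actual: str, expected, tolerance: float) -> bool:
    """Compare two values with numeric tolerance if both are numbers."""
    try:
        return abs(float(actual) - float(expected)) <= tolerance
    except (ValueError, TypeError):
        return str(actual).strip() == str(expected).strip()


# Relative paths pass; absolute paths and ".." segments are rejected.
print(validate_artifact_path("output/preds.csv"))
print(validate_artifact_path("/etc/passwd"))
print(validate_artifact_path("output/../../secret"))

# Numeric strings are compared within the tolerance; other text is compared stripped.
print(values_match("0.851", 0.85, 0.01))
print(values_match(" cat ", "cat", 0.0))
```

Two properties follow directly from this logic. The traversal check looks only at "/"-separated segments, so a name like ..data/x.csv passes and backslash-separated paths are not inspected. And because float("nan") parses successfully, a "nan" cell never matches an expected "nan", since any comparison with NaN is false.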


+        "key_found": "✓ Key '{field_key}' found.",
+        "key_missing": "✗ Required key '{field_key}' not found.",
+        "value_match": "✓ Key '{field_key}': expected {expected}, got {actual}.",
+        "value_mismatch": "✗ Key '{field_key}': expected {expected}, got {actual}.",
+        "value_key_missing": "✗ Key '{field_key}' not found (cannot check value).",
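
The {field_key} placeholder in these entries relates to the translation collision mentioned in the PR description. The actual signature of t() is not shown on this page; the sketch below assumes its first parameter is named key and that placeholders are passed as keyword arguments, which is one plausible way the collision could arise. The TRANSLATIONS dictionary and its entry names are hypothetical.

```python
# Hypothetical translation table: one entry with the old placeholder name,
# one with the renamed placeholder used in this PR.
TRANSLATIONS = {
    "key_missing_old": "✗ Required key '{key}' not found.",
    "key_missing": "✗ Required key '{field_key}' not found.",
}


def t(key: str, **kwargs) -> str:
    """Assumed shape of the translation function: look up, then format."""
    return TRANSLATIONS[key].format(**kwargs)


# Passing a placeholder named `key` clashes with the positional parameter.
try:
    t("key_missing_old", key="accuracy")
    collision = None
except TypeError as exc:
    collision = str(exc)

print(collision)
print(t("key_missing", field_key="accuracy"))
```

Under this assumption the rename is the smallest fix, since it leaves the t() signature and all other callers unchanged.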


Add a new Python Data Science sandbox build target and include it in sandbox-build-all (Makefile). Introduce Language.PYTHON_DS support in command resolution with the same default as Python (command_resolver). Extract common validate_artifact_path and values_match helpers in data_science, remove duplicated per-class implementations, and update callers to use the shared functions. Update API testing to use the 'field_key' translation parameter and adjust English and Portuguese translations accordingly. Also add README entry for the new expect_stdout_value test. These changes reduce duplication and add explicit support for data-science sandboxes and key naming consistency.

Update sandbox_manager/images/Dockerfile.python-ds to use scikit-learn==1.4.1.post1 instead of 1.4.1. This ensures the Python data-science image uses the post-release build (likely for packaging/bugfix improvements) without other dependency changes.

matheusmra · 2026-05-29T13:06:49Z

@ArthurCRodrigues

matheusmra linked an issue May 29, 2026 that may be closed by this pull request

Feature: Data Science Template (data_science) #338 (open, 5 tasks)

matheusmra marked this pull request as ready for review May 29, 2026 12:39

Copilot AI review requested due to automatic review settings May 29, 2026 12:39

Copilot started reviewing on behalf of matheusmra May 29, 2026 12:39

Copilot AI reviewed May 29, 2026

matheusmra added 2 commits May 29, 2026 09:57

fix: scikit-learn to 1.4.1.post1

3d00d4a

matheusmra requested a review from ArthurCRodrigues May 29, 2026 13:06

matheusmra wants to merge 3 commits into main from 338-feature-data-science-template-data_science
