Add Data Science template and execution base#348

Open
matheusmra wants to merge 3 commits into main from 338-feature-data-science-template-data_science

@matheusmra
Member

Context

The current template library covers command-line I/O, web development, API testing, and static analysis, but lacks specialized support for data science assignments. Data science tasks involve unique grading requirements, such as reading datasets injected via assets, parsing non-deterministic outputs like ML model metrics (which require tolerance-based and threshold comparisons), and verifying generated artifact files (like .pkl models or prediction CSVs). We needed a dedicated grading template to support these complex workflows securely and efficiently without polluting the existing generic I/O templates.

Solution

Implemented the new DataScienceTemplate to natively support grading ML and data science tasks.

What was changed:

  • New Test Functions:
    • expect_stdout_value: Extracts numeric values from stdout via regex and compares them using a numeric tolerance.
    • expect_metric: Validates ML metrics extracted via regex against a threshold (e.g., >= 0.85).
    • expect_csv_output: Validates generated CSV files using proportional scoring for column names, shape, and cell values (with tolerance).
    • expect_json_output: Validates required keys and nested values in JSON outputs via dot notation.
    • expect_model_artifact: Ensures a generated model file meets minimum size requirements.
  • Architectural Improvements: Extracted the shared execution and base error-handling logic from input_output.py into a new BaseExecutionTest class in execution_base.py. This avoids cross-template dependencies while keeping the code DRY.
  • Sandbox Environment: Added the PYTHON_DS language enum and created Dockerfile.python-ds to support heavy scientific libraries (e.g., pandas, scikit-learn, numpy) in an isolated environment.
  • Translation Engine Fix: Fixed a collision bug in the t() translation function by renaming the reserved {key} placeholder to {field_key} in all related JSON translation files.
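The two stdout-based checks described above (expect_stdout_value and expect_metric) can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names `extract_number`, `within_tolerance`, and `meets_threshold` are hypothetical, and the sketch assumes the regex exposes the number in its first capture group.

```python
import operator
import re
from typing import Optional

# Hypothetical mapping from a `condition` string to a comparison function.
_CONDITIONS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
}


def extract_number(stdout: str, pattern: str) -> Optional[float]:
    """Return the first capture group of `pattern` as a float, or None."""
    match = re.search(pattern, stdout)
    if match is None:
        return None
    try:
        return float(match.group(1))
    except (ValueError, IndexError):
        # Not a number, or the pattern has no capture group.
        return None


def within_tolerance(actual: float, expected: float, tolerance: float) -> bool:
    """Tolerance-based comparison, as in expect_stdout_value."""
    return abs(actual - expected) <= tolerance


def meets_threshold(actual: float, condition: str, threshold: float) -> bool:
    """Threshold comparison, as in expect_metric (e.g. >= 0.85)."""
    return _CONDITIONS[condition](actual, threshold)


# Example student output (made up for this sketch).
stdout = "Training done\nAccuracy: 0.8731\nRMSE: 12.504\n"
accuracy = extract_number(stdout, r"Accuracy:\s*([0-9.]+)")
rmse = extract_number(stdout, r"RMSE:\s*([0-9.]+)")

print(meets_threshold(accuracy, ">=", 0.85))   # threshold check
print(within_tolerance(rmse, 12.5, 0.01))      # tolerance check
```

Keeping extraction separate from comparison means a missing metric (`None`) can be reported differently from a metric that was found but failed the check.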

Further clarifications

  • Reviewer notes: Build the new Docker image locally (docker build -t sandbox-pyds:latest -f sandbox_manager/images/Dockerfile.python-ds .) before testing submissions that use the PYTHON_DS language enum.
  • Follow-up work: We might consider adding native parsing for .parquet files in the future if dataset sizes for student assignments become too large for standard CSV processing.

Related issues

Closes #338, #339, #340, #341, #342, and #343

Checklist

  • I linked the related issue(s) and explained the motivation.
  • I kept this PR focused and scoped to a single concern.
  • I added or updated tests for changed behavior (or explained why not needed).
  • I ran the relevant tests locally.
  • I updated documentation when needed (README/docs/API examples).
  • This PR introduces API contract changes (request/response/endpoint/DTO).
  • If API changed, I documented compatibility or migration notes.
  • This PR includes breaking changes.
  • If breaking, I clearly described impact and migration steps.

Introduce a new Data Science grading template and shared execution base. Adds autograder/template_library/data_science.py with tests for stdout metrics, CSV/JSON artifact validation, and model artifact checks; adds execution_base.py to centralize sandbox execution and error handling (removes duplicate BaseExecutionTest from input_output). Registers DataScienceTemplate in the template library and service, updates translations (en/pt_br), adds docs (docs/template-library/data_science.md), a Python DS sandbox Dockerfile, and unit tests. Also updates README to advertise the Data Science template and adjusts sandbox models/tests accordingly.
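One way to read "proportional scoring" for CSV artifacts is that every individual check (one per expected column, one for the shape, one per expected cell) contributes equally to the score. The sketch below follows that reading. It is an assumption rather than the PR's actual formula: `score_csv` and `cell_matches` are hypothetical names, and the `(row_index, column_name)` layout for `expected_values` is chosen only for illustration.

```python
import csv
import io


def cell_matches(actual: str, expected, tolerance: float) -> bool:
    """Numeric comparison with tolerance, falling back to a string comparison."""
    try:
        return abs(float(actual) - float(expected)) <= tolerance
    except (ValueError, TypeError):
        return str(actual).strip() == str(expected).strip()


def score_csv(content, expected_columns=(), expected_shape=None,
              expected_values=None, tolerance=1e-6) -> float:
    """Return a 0-100 score: the share of individual checks that pass."""
    rows = list(csv.reader(io.StringIO(content)))
    if not rows:
        return 0.0
    headers, data = rows[0], rows[1:]

    # One check per expected column name.
    checks = [col in headers for col in expected_columns]

    # One check for (row count, column count), header row excluded.
    if expected_shape is not None:
        checks.append((len(data), len(headers)) == tuple(expected_shape))

    # One check per expected cell, keyed by (row_index, column_name).
    for (row_idx, col), expected in (expected_values or {}).items():
        ok = False
        if col in headers and 0 <= row_idx < len(data):
            col_idx = headers.index(col)
            row = data[row_idx]
            ok = col_idx < len(row) and cell_matches(row[col_idx], expected, tolerance)
        checks.append(ok)

    if not checks:
        return 100.0
    return 100.0 * sum(checks) / len(checks)


predictions = "id,prediction\n1,0.91\n2,0.12\n"
score = score_csv(
    predictions,
    expected_columns=["id", "prediction", "label"],  # "label" is missing
    expected_shape=(2, 2),
    expected_values={(0, "prediction"): 0.9, (1, "prediction"): 0.12},
    tolerance=0.02,
)
print(round(score, 2))  # 5 of 6 checks pass
```

A submission that gets most of the file right still earns partial credit, which is the point of scoring proportionally instead of pass/fail.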
@matheusmra matheusmra linked an issue May 29, 2026 that may be closed by this pull request
5 tasks
@matheusmra matheusmra marked this pull request as ready for review May 29, 2026 12:39
Copilot AI review requested due to automatic review settings May 29, 2026 12:39

Copilot AI left a comment

Contributor

Pull request overview

Adds a DataScienceTemplate with five new test functions for grading ML/data-science assignments, extracts the shared sandbox-execution base class into its own module, and introduces a PYTHON_DS sandbox variant with a scientific-Python Docker image.

Changes:

  • New data_science template with expect_stdout_value, expect_metric, expect_csv_output, expect_json_output, expect_model_artifact tests, plus en/pt-BR translation entries and unit tests.
  • Moves BaseExecutionTest from input_output.py into a new execution_base.py and rewires the I/O template to import from it.
  • Adds Language.PYTHON_DS enum, Dockerfile.python-ds, README section and docs/template-library/data_science.md.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

| File | Description |
|------|-------------|
| autograder/template_library/data_science.py | New template and five BaseExecutionTest-based test functions. |
| autograder/template_library/execution_base.py | Extracted BaseExecutionTest (sandbox run + base error handling). |
| autograder/template_library/input_output.py | Removes the in-file BaseExecutionTest and imports it from execution_base. |
| autograder/template_library/__init__.py | Exports DataScienceTemplate. |
| autograder/services/template_library_service.py | Registers data_science in the template registry. |
| autograder/translations/en.json, pt_br.json | New data_science.* translation keys (uses {field_key} placeholder). |
| sandbox_manager/models/sandbox_models.py | Adds PYTHON_DS = ("python_ds", "sandbox-pyds:latest"). |
| sandbox_manager/images/Dockerfile.python-ds | New image with numpy/pandas/scikit-learn/matplotlib/seaborn. |
| README.md | Adds Data Science template section (4-row test table). |
| docs/template-library/data_science.md | Full documentation of template, prerequisites, and example config. |
| tests/unit/test_data_science.py | Unit tests for all five tests and template metadata. |
| tests/unit/test_template_contract.py | Adds DataScienceTemplate() to the contract validation loop. |

Comment thread README.md
Comment on lines +228 to +233
| Test Name | Description | Key Parameters |
|-----------|-------------|----------------|
| `expect_metric` | Extract metric from stdout and validate vs threshold | `metric_pattern`, `condition`, `threshold` |
| `expect_csv_output` | Validate generated CSV file contents | `artifact_path`, `expected_columns`, `expected_shape`, `expected_values` |
| `expect_json_output` | Validate generated JSON structure and keys | `artifact_path`, `required_keys`, `expected_values` |
| `expect_model_artifact` | Verify a model file was produced | `artifact_path`, `min_size_bytes` |
class Language(Enum):
    """Supported programming languages in the sandbox."""
    PYTHON = ("python", "sandbox-py:latest")
    PYTHON_DS = ("python_ds", "sandbox-pyds:latest")
Comment on lines +274 to +302
    @staticmethod
    def _validate_artifact_path(artifact_path: str) -> Optional[str]:
        """Return an error message if the path is unsafe, else None."""
        if not artifact_path:
            return "artifact_path is required"
        if artifact_path.startswith("/") or ".." in artifact_path.split("/"):
            return f"Invalid artifact_path (absolute or traversal): {artifact_path}"
        return None

    @staticmethod
    def _parse_csv(content: str) -> tuple:
        """Parse CSV content, return (headers, rows) or raise ValueError."""
        reader = csv.reader(io.StringIO(content))
        rows = list(reader)
        if not rows:
            raise ValueError("CSV file is empty")
        headers = rows[0]
        data_rows = rows[1:]
        return headers, data_rows

    @staticmethod
    def _values_match(actual: str, expected, tolerance: float) -> bool:
        """Compare two values with numeric tolerance if both are numbers."""
        try:
            actual_num = float(actual)
            expected_num = float(expected)
            return abs(actual_num - expected_num) <= tolerance
        except (ValueError, TypeError):
            return str(actual).strip() == str(expected).strip()
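To make the behavior of the two helpers in this hunk easier to review, here is a standalone sketch that copies their logic into module-level functions and exercises a few inputs. The function bodies mirror the diff above; the sample paths and values are made up for illustration.

```python
from typing import Optional


def validate_artifact_path(artifact_path: str) -> Optional[str]:
    """Return an error message if the path is unsafe, else None."""
    if not artifact_path:
        return "artifact_path is required"
    if artifact_path.startswith("/") or ".." in artifact_path.split("/"):
        return f"Invalid artifact_path (absolute or traversal): {artifact_path}"
    return None


def values_match(actual: str, expected, tolerance: float) -> bool:
    """Compare two values with numeric tolerance if both are numbers."""
    try:
        return abs(float(actual) - float(expected)) <= tolerance
    except (ValueError, TypeError):
        return str(actual).strip() == str(expected).strip()


# Relative paths inside the workspace are accepted.
print(validate_artifact_path("outputs/model.pkl"))
# Absolute paths and ".." segments are rejected.
print(validate_artifact_path("/etc/passwd"))
print(validate_artifact_path("outputs/../secrets.csv"))
# Only whole ".." segments count, so a name that merely starts with ".." passes.
print(validate_artifact_path("outputs/..hidden"))

# Numeric strings are compared with tolerance, everything else as stripped text.
print(values_match("0.8731", 0.87, 0.01))
print(values_match(" setosa ", "setosa", 0.0))
```

Because the traversal check splits on "/" only, backslash-separated input is not treated as path segments; whether that matters depends on how the sandbox resolves the path.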
Comment on lines +642 to +646
"key_found": "✓ Key '{field_key}' found.",
"key_missing": "✗ Required key '{field_key}' not found.",
"value_match": "✓ Key '{field_key}': expected {expected}, got {actual}.",
"value_mismatch": "✗ Key '{field_key}': expected {expected}, got {actual}.",
"value_key_missing": "✗ Key '{field_key}' not found (cannot check value).",
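The reason for the {key} to {field_key} rename can be illustrated with a minimal sketch. It assumes that t() takes the translation key as a parameter named `key` and receives placeholder values as keyword arguments; that signature, the `TRANSLATIONS` dict, and the key path `data_science.json.key_found` are hypothetical and only stand in for the real translation engine.

```python
# Hypothetical in-memory stand-in for the JSON translation files.
TRANSLATIONS = {
    "data_science.json.key_found": "✓ Key '{field_key}' found.",
}


def t(key: str, **params) -> str:
    """Look up a message by `key` and fill in its placeholders (assumed signature)."""
    return TRANSLATIONS[key].format(**params)


# A placeholder named `key` clashes with the lookup parameter of the same name:
# Python raises TypeError because `key` receives two values.
try:
    t("data_science.json.key_found", key="accuracy")
    collided = False
except TypeError:
    collided = True

# With the placeholder renamed to `field_key`, the call is unambiguous.
message = t("data_science.json.key_found", field_key="accuracy")
print(collided, message)
```

Under this assumption, any placeholder that shares a name with one of t()'s own parameters would hit the same error, so the rename applies to every translation file that used {key}.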
Add a new Python Data Science sandbox build target and include it in sandbox-build-all (Makefile). Introduce Language.PYTHON_DS support in command resolution with the same default as Python (command_resolver). Extract common validate_artifact_path and values_match helpers in data_science, remove duplicated per-class implementations, and update callers to use the shared functions. Update API testing to use the 'field_key' translation parameter and adjust English and Portuguese translations accordingly. Also add README entry for the new expect_stdout_value test. These changes reduce duplication and add explicit support for data-science sandboxes and key naming consistency.
Update sandbox_manager/images/Dockerfile.python-ds to use scikit-learn==1.4.1.post1 instead of 1.4.1. This ensures the Python data-science image uses the post-release build (likely for packaging/bugfix improvements) without other dependency changes.
@matheusmra

Member Author

@ArthurCRodrigues

Development

Successfully merging this pull request may close these issues.

Feature: Data Science Template (data_science)