
Create tasks sub-folder with .ttl schema.org and SHACL files + basic python validator #1016

Merged
leobianco merged 7 commits into mlcommons:main from leobianco:main on May 4, 2026

Conversation

@leobianco
Contributor

This pull request adds a tasks sub-folder to the Croissant repository, providing (1) an extension of schema.org in croissant-tasks.ttl that defines metadata for machine learning tasks, (2) SHACL shapes in croissant-tasks-shapes.ttl that enforce format compliance, and (3) a validator.py script plus a simple test suite covering both valid and invalid JSON-LD examples.

@leobianco leobianco requested a review from a team as a code owner March 23, 2026 10:37

github-actions Bot commented Mar 23, 2026

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@leobianco
Contributor Author

Hi,

The workflow is failing at the notebook tests; it seems the notebooks' installation cell needs to be refactored. Here is an AI-generated explanation + potential fix:

The Problem
If you look closely at the logs, the installation step fails silently (or rather, pip prints an error message but the notebook continues executing) before it reaches the cell that imports etils:

  1. Permission denied for APT:

     E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Permission denied)

     The notebook is trying to run !apt-get install without sudo, but the GitHub Actions runner requires sudo to install system packages.

  2. Invalid pip syntax:

     error: invalid-egg-fragment
     × The 'mlcroissant[dev]' egg fragment is invalid

     Modern versions of pip dropped support for the old #egg=name[extra] syntax when installing from git repositories. Because this failed, the dependencies (including etils) were never actually installed.

  3. The final crash:

     ModuleNotFoundError: No module named 'etils'

     Because the pip install in the previous cell failed, Python crashes as soon as it tries to import etils.
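
The failure only surfaces at the import because a `!` shell escape in a notebook does not raise when the command exits with an error. One way to make a setup step stop the notebook immediately is to run it through `subprocess.run` with `check=True`. This is only a minimal sketch of that idea, not something the recipes currently do; `run_checked` and the failing stand-in command are hypothetical names introduced for illustration.

```python
import subprocess
import sys


def run_checked(command):
    """Run a setup command and raise if it exits with a non-zero status."""
    # check=True turns a non-zero exit code into CalledProcessError, so the
    # notebook stops at the installation cell instead of at a later import.
    subprocess.run(command, check=True)


# Hypothetical stand-in for a failing install step: a child Python process
# that exits with status 1.
failing_step = [sys.executable, "-c", "import sys; sys.exit(1)"]

try:
    run_checked(failing_step)
except subprocess.CalledProcessError as error:
    print(f"setup step failed with exit code {error.returncode}")
```

In a real cell the `try`/`except` would be dropped, so the exception itself halts execution at the step that actually failed.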

The Solution
You need to edit the file recipes/bounding-boxes.ipynb (and potentially any other .ipynb notebook files in the recipes/ folder that have this same installation cell).

Open the notebook, find the very first code block that looks like this:

# Install mlcroissant from the source
!apt-get install -y python3-dev graphviz libgraphviz-dev pkg-config
!pip install "git+https://github.com/${GITHUB_REPOSITORY:-mlcommons/croissant}.git@${GITHUB_HEAD_REF:-main}#subdirectory=python/mlcroissant&egg=mlcroissant[dev]"

And replace the two commands with correctly formatted versions:

# Install mlcroissant from the source
!sudo apt-get install -y python3-dev graphviz libgraphviz-dev pkg-config
!pip install "mlcroissant[dev] @ git+https://github.com/${GITHUB_REPOSITORY:-mlcommons/croissant}.git@${GITHUB_HEAD_REF:-main}#subdirectory=python/mlcroissant"

What changed:

  1. Added sudo in front of apt-get so it has permissions on the test runner.
  2. Changed the pip syntax to the modern "Direct URL" syntax (name[extra] @ url) instead of appending &egg=... to the end of the URL.
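
If several notebooks share the same install line, the rewrite from the `&egg=` form to the `name[extra] @ url` form can be done mechanically. The sketch below assumes the egg name is carried in the URL fragment as in the cell above; `egg_url_to_direct_reference` is a hypothetical helper, not part of pip or mlcroissant, and the example URL uses the default repository and branch from the notebook.

```python
def egg_url_to_direct_reference(url):
    """Rewrite '<url>#...&egg=name[extra]' as 'name[extra] @ <url>#...'."""
    base, _, fragment = url.partition("#")
    name = None
    kept = []
    for part in fragment.split("&") if fragment else []:
        key, _, value = part.partition("=")
        if key == "egg":
            # The egg value becomes the requirement name in front of the URL.
            name = value
        else:
            # Other fragment parameters (e.g. subdirectory) stay on the URL.
            kept.append(part)
    if name is None:
        raise ValueError("URL has no egg fragment to convert")
    rebuilt = base + ("#" + "&".join(kept) if kept else "")
    return f"{name} @ {rebuilt}"


legacy = (
    "git+https://github.com/mlcommons/croissant.git@main"
    "#subdirectory=python/mlcroissant&egg=mlcroissant[dev]"
)
print(egg_url_to_direct_reference(legacy))
```

The helper only moves text around; it does not check that the resulting requirement is installable.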

…at some attribute be undetermined (a spec). TaskSolution only requires a pointer to a TaskProblem. Checking that TaskSolution actually solves the TaskProblem it points to will be done later in python, not in SHACL.
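
To make the two rules in this commit message concrete, here is a plain-Python sketch of the checks as I read them: a TaskProblem needs at least one property whose value is a spec, and a TaskSolution needs a pointer to a TaskProblem. It is not the PR's validator.py and does not use SHACL. The names in `SPEC_TYPES` and the `schema:isBasedOn` property are assumptions made for illustration (the latter guessed from the test file name), not identifiers confirmed by croissant-tasks.ttl.

```python
# Assumption: illustrative placeholder type names, not the ones defined
# in croissant-tasks.ttl.
SPEC_TYPES = {"ex:DataSpec", "ex:ExecutionSpec"}


def _types(node):
    """Return the @type values of a JSON-LD node as a set (empty for non-dicts)."""
    if not isinstance(node, dict):
        return set()
    declared = node.get("@type", [])
    return {declared} if isinstance(declared, str) else set(declared)


def check_task(node):
    """Return a list of violation messages for one TaskProblem or TaskSolution node."""
    violations = []
    node_types = _types(node)
    if "croissant:TaskProblem" in node_types:
        # Collect every property value, flattening list-valued properties.
        values = []
        for key, value in node.items():
            if not key.startswith("@"):
                values.extend(value if isinstance(value, list) else [value])
        if not any(_types(value) & SPEC_TYPES for value in values):
            violations.append("TaskProblem has no property whose value is a spec")
    if "croissant:TaskSolution" in node_types:
        # Assumption: the pointer to the problem is stored under schema:isBasedOn.
        if not node.get("schema:isBasedOn"):
            violations.append("TaskSolution does not point to a TaskProblem")
    return violations


problem = {"@type": "croissant:TaskProblem", "ex:execution": {"@type": "ex:ExecutionSpec"}}
solution = {"@type": "croissant:TaskSolution"}
print(check_task(problem))
print(check_task(solution))
```

Whether a solution actually solves the problem it points to is deliberately out of scope here, matching the commit message.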
Contributor

@benjelloun left a comment

Very nice! Sorry for the late review.

Comment thread tasks/testdata/invalid_problem_no_spec.jsonld
Comment thread tasks/testdata/invalid_solution_no_is_based_on.jsonld
Comment thread tasks/croissant-tasks-shapes.ttl
sh:targetClass croissant:TaskProblem ;
# A TaskProblem must have at least one property that is a spec class
sh:property [
sh:or (
Contributor

Why not the execution or evaluation?

Contributor Author

Will add the execution spec option to the execution field of the TaskProblem, as it might be filled later for a particular TaskSolution execution.

As for the evaluation: intuitively, at least for benchmarks, it should be concrete on the TaskProblem. Otherwise, each solution could specify its own evaluation metrics.

Contributor

Agreed for benchmarks, but with a bit of imagination, I can picture other types of tasks where the evaluation is not specified by the problem...

Comment thread tasks/croissant-tasks.ttl Outdated
Comment thread tasks/croissant-tasks.ttl Outdated
Comment thread tasks/croissant-tasks.ttl
@leobianco leobianco merged commit 833fe78 into mlcommons:main May 4, 2026
13 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators May 4, 2026

2 participants