
Commit 833fe78

Create tasks sub-folder with .ttl schema.org and SHACL files + basic python validator (#1016)
This pull request adds the tasks sub-folder to the Croissant repository. It provides (1) an extension of schema.org in croissant-tasks.ttl that defines metadata for machine learning tasks, (2) SHACL shapes in croissant-tasks-shapes.ttl that enforce format compliance, and (3) a validator.py script with a simple test suite covering both valid and invalid JSON-LD examples.
1 parent 74ec679 commit 833fe78

20 files changed

Lines changed: 949 additions & 0 deletions

tasks/README.md

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
# Croissant Tasks

This library provides an ontology and a SHACL validation toolkit for extending schema.org and Croissant with task descriptions. It is part of the `mlcommons/croissant` repository.

## Installation

You can install this package locally with `pip`:

```bash
pip install .
```

For development, install it in editable mode together with the dependencies needed to run the tests. The quotes stop shells such as zsh from expanding the brackets:

```bash
pip install -e ".[dev]"
```

## Usage

The package includes a validator script that checks whether a JSON-LD (`.jsonld`) file conforms to the Croissant Tasks SHACL shapes:

```bash
python validator.py <path_to_data.jsonld>
```

For example, to validate the included example file:

```bash
python validator.py testdata/valid_solution.jsonld
```

## Running Tests

To run the full suite of unit tests, use `pytest`:

```bash
pytest validator_test.py
```

## Croissant Tasks Specification

The Croissant Tasks specification defines an ontology and a set of validation rules (SHACL shapes) for describing machine learning tasks, the problems they pose, and their solutions. It builds on `schema.org` and MLCommons `Croissant`.

### Core Concepts

* **`croissant:Task`**: The base class for all tasks, extending `schema:CreativeWork`.
* **`croissant:TaskProblem`**: A task profile, or "problem", to be solved. It defines what the task is but leaves some components open for implementation. A `TaskProblem` has at least one "Spec" property, which states what a solution is expected to provide.
* **`croissant:TaskSolution`**: A concrete implementation, or solution, of a `TaskProblem`. It must be linked to a `TaskProblem` and must provide concrete values for the components that the problem declares as specs.
* **`croissant:EvaluationTask`**: A specialization of `Task` that adds evaluation-specific information, such as metrics and results.
* **`croissant:EvaluationResult`**: A structured result of an evaluation, pairing a metric with its value.
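
The sketch below makes these classes concrete using only Python's standard library. It builds a small JSON-LD graph with three nodes:

* a `TaskProblem`;
* a `TaskSolution` that points back to the problem through `isBasedOn`;
* an `EvaluationTask` carrying one `EvaluationResult`.

Several parts are illustrative assumptions rather than the repository's test data: the `@context` (schema.org as `@vocab`, plus the `croissant:` namespace IRI taken from `croissant-tasks-shapes.ttl`), the example.org identifiers, and the names. To check the output against the actual shapes, run it through `validator.py`.

```python
import json

# Assumed context: "croissant" reuses the namespace IRI from
# croissant-tasks-shapes.ttl; @vocab resolves bare terms such as
# "SoftwareSourceCode" and "isBasedOn" against schema.org.
CONTEXT = {
    "@vocab": "https://schema.org/",
    "croissant": "http://mlcommons.org/croissant/",
}

# A problem whose input is left open as an InputSpec with a RecordSet schema.
problem = {
    "@id": "https://example.org/tasks/problem",  # hypothetical IRI
    "@type": "croissant:TaskProblem",
    "name": "Example problem",
    "croissant:input": {
        "@type": "croissant:InputSpec",
        "croissant:schema": {"@type": "croissant:RecordSet", "name": "examples"},
    },
}

# A solution that links back to the problem and supplies a concrete implementation.
solution = {
    "@id": "https://example.org/tasks/solution",  # hypothetical IRI
    "@type": "croissant:TaskSolution",
    "isBasedOn": {"@id": problem["@id"]},
    "croissant:implementation": {"@type": "SoftwareSourceCode", "name": "baseline"},
}

# An evaluation of that solution with a single metric/value pair.
evaluation = {
    "@type": "croissant:EvaluationTask",
    "croissant:evaluatedSolution": {"@id": solution["@id"]},
    "croissant:evaluationResults": {
        "@type": "croissant:EvaluationResult",
        "croissant:metric": "accuracy",
        "croissant:value": {"@type": "QuantitativeValue", "value": 0.9},
    },
}

document = {"@context": CONTEXT, "@graph": [problem, solution, evaluation]}
print(json.dumps(document, indent=2))
```

The result value is wrapped in a `QuantitativeValue`, so it matches the `sh:class schema:QuantitativeValue` alternative of the `croissant:value` rule no matter how a bare number would be typed.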

### Properties

A `Task` (and each of its subclasses) can have the following properties:

* **`croissant:input`**: Data used as input. Can be a concrete dataset (MLCommons `croissant:Dataset` or `schema:Dataset`), a URL, or an `InputSpec`.
* **`croissant:output`**: Data generated by the task. Can be a dataset (`schema:Dataset`), source code (`schema:SoftwareSourceCode`), or an `OutputSpec`.
* **`croissant:implementation`**: The program that executes the task. Can be a software application (`schema:SoftwareApplication`), source code (`schema:SoftwareSourceCode`), or an `ImplementationSpec`.
* **`croissant:execution`**: Information about how the task is executed. Points to a `croissant:ExecutionInfo` or a `croissant:ExecutionSpec`.
* **`croissant:evaluation`**: Evaluation of the task. Points to another task (`croissant:EvaluationTask`) or to a `croissant:EvaluationSpec`.
* **`croissant:subTask`**: Links to one or more subtasks that are part of this task.
* **`schema:isBasedOn`**: Used by a `TaskSolution` to link back to the `TaskProblem` it solves.
* **`croissant:evaluationResults`**: Used by an `EvaluationTask` to link to one or more `EvaluationResult` objects.
* **`croissant:evaluatedSolution`**: Used by an `EvaluationTask` to link to the `TaskSolution` being evaluated.
* **`croissant:metric`**: Used by an `EvaluationResult` to specify the metric name or IRI.
* **`croissant:value`**: Used by `EvaluationResult` to specify the result value.
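
The value ranges listed above fit in a lookup table. The following sketch checks one property value against that table. It is a simplification in several ways:

* it works on compact JSON-LD dictionaries;
* it treats a plain string or a bare `{"@id": ...}` reference as a URL;
* it ignores subclass inference.

It therefore complements the SHACL validation rather than replacing it. `ALLOWED`, `value_types`, and `check_property` are hypothetical names, and the unprefixed type names assume that schema.org is the `@vocab`.

```python
# Allowed value types per property, transcribed from the list above.
# "@id" stands for a URL/IRI value; unprefixed names are schema.org types.
ALLOWED = {
    "croissant:input": {"croissant:Dataset", "Dataset", "@id", "croissant:InputSpec"},
    "croissant:output": {"Dataset", "SoftwareSourceCode", "croissant:OutputSpec"},
    "croissant:implementation": {
        "SoftwareApplication", "SoftwareSourceCode", "croissant:ImplementationSpec"},
    "croissant:execution": {"croissant:ExecutionInfo", "croissant:ExecutionSpec"},
    "croissant:evaluation": {"croissant:EvaluationTask", "croissant:EvaluationSpec"},
}


def value_types(value):
    """Return the type labels of one property value."""
    if isinstance(value, str) or (isinstance(value, dict) and set(value) == {"@id"}):
        return {"@id"}  # plain string or bare reference: treated as a URL
    types = value.get("@type", [])
    return set(types if isinstance(types, list) else [types])


def check_property(prop, value):
    """Return True if `value` carries at least one type allowed for `prop`."""
    return bool(value_types(value) & ALLOWED[prop])


print(check_property("croissant:output", {"@type": "SoftwareSourceCode"}))  # True
print(check_property("croissant:implementation", {"@type": "Dataset"}))  # False
```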

### Specification Classes (Specs)

Specs are used in a `TaskProblem` to define what a solution must provide:

* **`croissant:InputSpec`**: Specifies input requirements. Must include a `croissant:schema` property pointing to a formal Croissant `RecordSet`.
* **`croissant:OutputSpec`**: Specifies output requirements. Must include a `croissant:schema` property pointing to a formal Croissant `RecordSet`.
* **`croissant:ImplementationSpec`**: Specifies technical requirements for the implementation. Can include `croissant:tests` (pointing to a `Test` or URL) and `croissant:environment` (pointing to a description or URL).
* **`croissant:ExecutionSpec`**: A placeholder for execution information before the task is run.
* **`croissant:EvaluationSpec`**: A placeholder for evaluation metrics that have not been specified yet.
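
The constraints on spec nodes can be sketched in the same dictionary-based style. This mirrors `InputSpecShape`, `OutputSpecShape`, and `ImplementationSpecShape` from `croissant-tasks-shapes.ttl`. Those shapes check the type of each `croissant:schema`, `croissant:tests`, and `croissant:environment` value that is present. They declare no `sh:minCount`, so this sketch likewise accepts a spec without a `croissant:schema`. `SPEC_RULES` and `check_spec` are hypothetical helpers, and treating plain strings as URLs is a simplifying assumption.

```python
# Allowed value types for each constrained property, per spec class.
# "@id" stands for a URL; "CreativeWork" is the schema.org type.
SPEC_RULES = {
    "croissant:InputSpec": {"croissant:schema": {"croissant:RecordSet"}},
    "croissant:OutputSpec": {"croissant:schema": {"croissant:RecordSet"}},
    "croissant:ImplementationSpec": {
        "croissant:tests": {"croissant:Test", "@id"},
        "croissant:environment": {"CreativeWork", "@id"},
    },
}


def _types(value):
    if isinstance(value, str):
        return {"@id"}  # a plain string is treated as a URL here
    types = value.get("@type", [])
    return set(types if isinstance(types, list) else [types])


def _values(node, prop):
    value = node.get(prop, [])
    return value if isinstance(value, list) else [value]


def check_spec(node):
    """Return violation messages for one spec node (an empty list means it conforms)."""
    errors = []
    for spec_type in sorted(_types(node) & set(SPEC_RULES)):
        for prop, allowed in SPEC_RULES[spec_type].items():
            for value in _values(node, prop):  # absent property: nothing to check
                if not _types(value) & allowed:
                    errors.append(f"{spec_type}: {prop} has disallowed value {value!r}")
    return errors


print(check_spec({"@type": "croissant:OutputSpec",
                  "croissant:schema": "https://example.org/schema"}))
```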

### Validation Rules (SHACL Shapes)

The specification enforces the following rules via SHACL shapes:

**General Task Rules:**
* `input`, `output`, and `implementation` must point to valid types as described above.

**TaskProblem Rules:**
* Must have at least one `input`, `output`, or `implementation` property whose value is a Spec class (`InputSpec`, `OutputSpec`, or `ImplementationSpec`).
* `execution` must be an `ExecutionSpec`.
* `evaluation` must be an `EvaluationTask` or `EvaluationSpec`.
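
The three TaskProblem rules translate directly into a checker over a compact JSON-LD node. In this sketch, `check_task_problem` and its message strings are hypothetical. It only looks at types written inline on each value, whereas SHACL also resolves `@id` references and subclass relations across the whole graph.

```python
# Spec class that counts for each property of a TaskProblem.
SPEC_FOR = {
    "croissant:input": "croissant:InputSpec",
    "croissant:output": "croissant:OutputSpec",
    "croissant:implementation": "croissant:ImplementationSpec",
}


def _types(value):
    if not isinstance(value, dict):
        return set()  # URLs and literals carry no inline type
    types = value.get("@type", [])
    return set(types if isinstance(types, list) else [types])


def _values(node, prop):
    value = node.get(prop, [])
    return value if isinstance(value, list) else [value]


def check_task_problem(node):
    """Return violation messages for a TaskProblem node (empty list if it conforms)."""
    errors = []
    # Rule 1: at least one input/output/implementation value is a spec.
    if not any(SPEC_FOR[prop] in _types(v) for prop in SPEC_FOR for v in _values(node, prop)):
        errors.append("needs an InputSpec, OutputSpec, or ImplementationSpec value")
    # Rule 2: every execution value is an ExecutionSpec.
    if any("croissant:ExecutionSpec" not in _types(v)
           for v in _values(node, "croissant:execution")):
        errors.append("execution must be an ExecutionSpec")
    # Rule 3: every evaluation value is an EvaluationTask or EvaluationSpec.
    allowed = {"croissant:EvaluationTask", "croissant:EvaluationSpec"}
    if any(not _types(v) & allowed for v in _values(node, "croissant:evaluation")):
        errors.append("evaluation must be an EvaluationTask or EvaluationSpec")
    return errors


problem = {"@type": "croissant:TaskProblem",
           "croissant:output": {"@type": "croissant:OutputSpec"}}
print(check_task_problem(problem))  # []
```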

**TaskSolution Rules:**
* Must be linked to a `TaskProblem` via `schema:isBasedOn`.
* Cannot contain any Spec classes for `input`, `output`, `implementation`, or `evaluation`.
* Must have either:
    * at least one concrete `implementation`, or
    * one or more `subTask`s, *all* of which have concrete implementations.
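
The TaskSolution rules, including the either/or condition, can be sketched the same way. `check_task_solution` is a hypothetical helper over compact JSON-LD dictionaries, with these limitations:

* it assumes schema.org is the `@vocab`, so `schema:isBasedOn` appears as `isBasedOn`;
* it only checks that `isBasedOn` is present, not what it points to;
* it sees only types written inline.

```python
# Spec classes a TaskSolution must not use, per property.
FORBIDDEN_SPECS = {
    "croissant:input": "croissant:InputSpec",
    "croissant:output": "croissant:OutputSpec",
    "croissant:implementation": "croissant:ImplementationSpec",
    "croissant:evaluation": "croissant:EvaluationSpec",
}


def _types(value):
    if not isinstance(value, dict):
        return set()
    types = value.get("@type", [])
    return set(types if isinstance(types, list) else [types])


def _values(node, prop):
    if not isinstance(node, dict):
        return []  # a bare reference has no inline properties
    value = node.get(prop, [])
    return value if isinstance(value, list) else [value]


def has_concrete_implementation(node):
    """True if at least one implementation value is not an ImplementationSpec."""
    return any("croissant:ImplementationSpec" not in _types(v)
               for v in _values(node, "croissant:implementation"))


def check_task_solution(node):
    """Return violation messages for a TaskSolution node (empty list if it conforms)."""
    errors = []
    if not _values(node, "isBasedOn"):  # schema:isBasedOn under a schema.org @vocab
        errors.append("must be linked to a TaskProblem via schema:isBasedOn")
    for prop, spec in FORBIDDEN_SPECS.items():
        if any(spec in _types(v) for v in _values(node, prop)):
            errors.append(f"{prop} cannot be a {spec}")
    subtasks = _values(node, "croissant:subTask")
    direct = has_concrete_implementation(node)
    via_subtasks = bool(subtasks) and all(has_concrete_implementation(t) for t in subtasks)
    if not (direct or via_subtasks):
        errors.append("needs a concrete implementation, or subTasks that all have one")
    return errors
```

The final condition mirrors the two `sh:or` options in `croissant:TaskSolutionShape`. Option 1 is a direct concrete implementation. Option 2 is at least one subtask, with every subtask carrying a concrete implementation.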

**EvaluationTask Rules:**
* `evaluationResults` must point to `EvaluationResult` objects.
* `evaluatedSolution` must point to exactly one `TaskSolution`.
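
Here is a sketch of the two EvaluationTask rules, in the same hypothetical dictionary-based style. `evaluatedSolution` has both a lower and an upper bound (`sh:minCount 1` and `sh:maxCount 1` in the shapes file), so a missing value and a duplicated value are both violations. As before, the sketch only sees types written inline.

```python
def _types(value):
    if not isinstance(value, dict):
        return set()
    types = value.get("@type", [])
    return set(types if isinstance(types, list) else [types])


def _values(node, prop):
    value = node.get(prop, [])
    return value if isinstance(value, list) else [value]


def check_evaluation_task(node):
    """Return violation messages for an EvaluationTask node (empty list if it conforms)."""
    errors = []
    for result in _values(node, "croissant:evaluationResults"):
        if "croissant:EvaluationResult" not in _types(result):
            errors.append("evaluationResults must point to an EvaluationResult")
    solutions = _values(node, "croissant:evaluatedSolution")
    if len(solutions) != 1:  # sh:minCount 1 and sh:maxCount 1
        errors.append(f"evaluatedSolution needs exactly one value, found {len(solutions)}")
    elif "croissant:TaskSolution" not in _types(solutions[0]):
        errors.append("evaluatedSolution must point to a TaskSolution")
    return errors


print(check_evaluation_task({"@type": "croissant:EvaluationTask"}))
# ['evaluatedSolution needs exactly one value, found 0']
```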

**EvaluationResult Rules:**
* `metric` is required and must be a string or URL.
* `value` is required and must be a QuantitativeValue, string, or number.
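
What counts as a "number" depends on how JSON-LD turns JSON values into RDF literals. Under the JSON-LD 1.1 toRdf algorithm, a native JSON number without `@type` becomes `xsd:integer` if it has no fractional part and its magnitude is below 1e21, and `xsd:double` otherwise. It never becomes `xsd:decimal`, and `sh:datatype` requires an exact datatype match. The sketch reproduces that mapping for plain JSON values, which shows the datatype a `croissant:value` will carry when the `sh:datatype` alternatives are checked. `jsonld_datatype` is a hypothetical helper, not part of this package.

```python
import json

XSD = "http://www.w3.org/2001/XMLSchema#"


def jsonld_datatype(value):
    """Datatype IRI that JSON-LD 1.1 toRdf assigns to a native JSON value without @type."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int in Python
        return XSD + "boolean"
    if isinstance(value, (int, float)):
        # A fractional part or a very large magnitude gives double; otherwise integer.
        if value % 1 != 0 or abs(value) >= 1e21:
            return XSD + "double"
        return XSD + "integer"
    if isinstance(value, str):
        return XSD + "string"
    raise TypeError(f"not a native JSON-LD literal: {value!r}")


result = json.loads('{"croissant:metric": "accuracy", "croissant:value": 0.95}')
for key, value in result.items():
    print(key, "->", jsonld_datatype(value))
```

A value that should be an `xsd:decimal` can be written as a typed literal instead, for example `{"@value": "0.95", "@type": "http://www.w3.org/2001/XMLSchema#decimal"}`.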

tasks/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
"""Croissant Tasks Package."""

tasks/croissant-tasks-shapes.ttl

Lines changed: 250 additions & 0 deletions
@@ -0,0 +1,250 @@
@prefix schema: <https://schema.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix croissant: <http://mlcommons.org/croissant/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# -----------------------------------------------------------------------------
# Base Task Shape
# -----------------------------------------------------------------------------
croissant:TaskShape
    a sh:NodeShape ;
    sh:targetClass croissant:Task ;
    sh:property [
        sh:path croissant:input ;
        sh:or (
            [ sh:class croissant:Dataset ]
            [ sh:class schema:Dataset ]
            [ sh:nodeKind sh:IRI ]
            [ sh:class croissant:InputSpec ]
        ) ;
        sh:message "croissant:input must point to a Dataset, URL, or InputSpec."
    ] ;
    sh:property [
        sh:path croissant:output ;
        sh:or (
            [ sh:class schema:Dataset ]
            [ sh:class schema:SoftwareSourceCode ]
            [ sh:class croissant:OutputSpec ]
        ) ;
        sh:message "croissant:output must point to a Dataset, SoftwareSourceCode, or OutputSpec."
    ] ;
    sh:property [
        sh:path croissant:implementation ;
        sh:or (
            [ sh:class schema:SoftwareApplication ]
            [ sh:class schema:SoftwareSourceCode ]
            [ sh:class croissant:ImplementationSpec ]
        ) ;
        sh:message "croissant:implementation must point to a SoftwareApplication, SoftwareSourceCode, or ImplementationSpec."
    ] ;
    sh:property [
        sh:path croissant:execution ;
        sh:or (
            [ sh:class croissant:ExecutionInfo ]
            [ sh:class croissant:ExecutionSpec ]
        ) ;
        sh:message "croissant:execution must point to an ExecutionInfo or ExecutionSpec."
    ] ;
    sh:property [
        sh:path croissant:evaluation ;
        sh:or (
            [ sh:class croissant:EvaluationTask ]
            [ sh:class croissant:EvaluationSpec ]
        ) ;
        sh:message "croissant:evaluation must point to an EvaluationTask or EvaluationSpec."
    ] ;
    sh:property [
        sh:path croissant:subTask ;
        sh:class croissant:Task ;
        sh:message "croissant:subTask must point to a Task."
    ] .

# -----------------------------------------------------------------------------
# TaskProblem Shape
# -----------------------------------------------------------------------------
croissant:TaskProblemShape
    a sh:NodeShape ;
    sh:targetClass croissant:TaskProblem ;
    # A TaskProblem must have at least one property whose value is a spec class.
    # Each alternative requires at least one value of its property to be an
    # instance of the matching spec class.
    sh:or (
        [ sh:property [
            sh:path croissant:input ;
            sh:qualifiedValueShape [ sh:class croissant:InputSpec ] ;
            sh:qualifiedMinCount 1 ;
        ] ]
        [ sh:property [
            sh:path croissant:output ;
            sh:qualifiedValueShape [ sh:class croissant:OutputSpec ] ;
            sh:qualifiedMinCount 1 ;
        ] ]
        [ sh:property [
            sh:path croissant:implementation ;
            sh:qualifiedValueShape [ sh:class croissant:ImplementationSpec ] ;
            sh:qualifiedMinCount 1 ;
        ] ]
    ) ;
    sh:message "A TaskProblem must have at least one property (input, output, or implementation) that is a spec class (InputSpec, OutputSpec, or ImplementationSpec)." ;
    sh:property [
        sh:path croissant:execution ;
        sh:class croissant:ExecutionSpec ;
        sh:message "Execution property of a TaskProblem must be an ExecutionSpec."
    ] ;
    sh:property [
        sh:path croissant:evaluation ;
        sh:or (
            [ sh:class croissant:EvaluationTask ]
            [ sh:class croissant:EvaluationSpec ]
        ) ;
        sh:message "Evaluation property of a TaskProblem must be an EvaluationTask or EvaluationSpec."
    ] .

# -----------------------------------------------------------------------------
# TaskSolution Shape
# -----------------------------------------------------------------------------
croissant:TaskSolutionShape
    a sh:NodeShape ;
    sh:targetClass croissant:TaskSolution ;
    sh:property [
        sh:path schema:isBasedOn ;
        sh:or (
            [ sh:class croissant:TaskProblem ]
            [ sh:nodeKind sh:IRI ]
        ) ;
        sh:minCount 1 ;
        sh:message "A TaskSolution must be formally linked to a TaskProblem via schema:isBasedOn."
    ] ;
    sh:property [
        sh:path croissant:input ;
        sh:not [ sh:class croissant:InputSpec ] ;
        sh:message "A TaskSolution cannot have an InputSpec as input." ;
    ] ;
    sh:property [
        sh:path croissant:output ;
        sh:not [ sh:class croissant:OutputSpec ] ;
        sh:message "A TaskSolution cannot have an OutputSpec as output." ;
    ] ;
    sh:property [
        sh:path croissant:implementation ;
        sh:not [ sh:class croissant:ImplementationSpec ] ;
        sh:message "A TaskSolution cannot have an ImplementationSpec as implementation." ;
    ] ;
    sh:property [
        sh:path croissant:evaluation ;
        sh:not [ sh:class croissant:EvaluationSpec ] ;
        sh:message "A TaskSolution cannot have an EvaluationSpec as evaluation." ;
    ] ;
    sh:or (
        # Option 1: Direct concrete implementation
        [
            sh:property [
                sh:path croissant:implementation ;
                sh:qualifiedValueShape [
                    sh:not [ sh:class croissant:ImplementationSpec ]
                ] ;
                sh:qualifiedMinCount 1 ;
                sh:message "TaskSolution must have at least one concrete implementation if it has no subTasks." ;
            ]
        ]
        # Option 2: All subtasks have concrete implementations
        [
            sh:and (
                [
                    sh:property [
                        sh:path croissant:subTask ;
                        sh:minCount 1 ;
                        sh:message "TaskSolution without direct concrete implementation must have at least one subTask." ;
                    ]
                ]
                [
                    sh:property [
                        sh:path croissant:subTask ;
                        sh:node [
                            a sh:NodeShape ;
                            sh:property [
                                sh:path croissant:implementation ;
                                sh:qualifiedValueShape [
                                    sh:not [ sh:class croissant:ImplementationSpec ]
                                ] ;
                                sh:qualifiedMinCount 1 ;
                                sh:message "All subTasks of a TaskSolution must have a concrete implementation." ;
                            ]
                        ]
                    ]
                ]
            )
        ]
    ) .

# -----------------------------------------------------------------------------
# Spec Shapes
# -----------------------------------------------------------------------------
croissant:InputSpecShape
    a sh:NodeShape ;
    sh:targetClass croissant:InputSpec ;
    sh:property [
        sh:path croissant:schema ;
        sh:class croissant:RecordSet ;
        sh:message "croissant:schema must point to a RecordSet."
    ] .

croissant:OutputSpecShape
    a sh:NodeShape ;
    sh:targetClass croissant:OutputSpec ;
    sh:property [
        sh:path croissant:schema ;
        sh:class croissant:RecordSet ;
        sh:message "croissant:schema must point to a RecordSet."
    ] .

croissant:ImplementationSpecShape
    a sh:NodeShape ;
    sh:targetClass croissant:ImplementationSpec ;
    sh:property [
        sh:path croissant:tests ;
        sh:or (
            [ sh:class croissant:Test ]
            [ sh:nodeKind sh:IRI ]
        ) ;
        sh:message "croissant:tests must point to a Test or URL."
    ] ;
    sh:property [
        sh:path croissant:environment ;
        sh:or (
            [ sh:class schema:CreativeWork ]
            [ sh:nodeKind sh:IRI ]
        ) ;
        sh:message "croissant:environment must point to a CreativeWork or URL."
    ] .

# -----------------------------------------------------------------------------
# EvaluationResult Shape
# -----------------------------------------------------------------------------
croissant:EvaluationResultShape
    a sh:NodeShape ;
    sh:targetClass croissant:EvaluationResult ;
    sh:property [
        sh:path croissant:metric ;
        sh:or (
            [ sh:datatype xsd:string ]
            [ sh:nodeKind sh:IRI ]
        ) ;
        sh:minCount 1 ;
        sh:message "croissant:metric must be a string or a URL, and is required."
    ] ;
    sh:property [
        sh:path croissant:value ;
        sh:or (
            [ sh:class schema:QuantitativeValue ]
            [ sh:datatype xsd:string ]
            [ sh:datatype xsd:decimal ]
            [ sh:datatype xsd:integer ]
            # JSON-LD converts non-integer JSON numbers (e.g. 0.95) to xsd:double.
            [ sh:datatype xsd:double ]
        ) ;
        sh:minCount 1 ;
        sh:message "croissant:value must be a QuantitativeValue, string, or number, and is required."
    ] .

# -----------------------------------------------------------------------------
# EvaluationTask Shape
# -----------------------------------------------------------------------------
croissant:EvaluationTaskShape
    a sh:NodeShape ;
    sh:targetClass croissant:EvaluationTask ;
    sh:property [
        sh:path croissant:evaluationResults ;
        sh:class croissant:EvaluationResult ;
        sh:message "croissant:evaluationResults must point to an EvaluationResult."
    ] ;
    sh:property [
        sh:path croissant:evaluatedSolution ;
        sh:class croissant:TaskSolution ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:message "croissant:evaluatedSolution must point to exactly one TaskSolution."
    ] .
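
Whether these shapes reach a given node depends on SHACL's notion of an instance. `sh:targetClass` and `sh:class` match a node whose `rdf:type` is the class, or reaches it through `rdfs:subClassOf` triples in the data graph. So `croissant:TaskShape` checks `TaskProblem`, `TaskSolution`, or `EvaluationTask` nodes only if the corresponding subclass triples are visible to the validator, for example because the ontology was loaded alongside the data. The sketch below models that rule over a set of triples. The helper names and the example triples are illustrative assumptions, not taken from `croissant-tasks.ttl`.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def superclasses(cls, triples):
    """Classes reachable from `cls` via rdfs:subClassOf*, including `cls` itself."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(o for s, p, o in triples if s == current and p == SUBCLASS_OF)
    return seen


def is_shacl_instance(node, cls, triples):
    """True if some rdf:type of `node` is `cls` or a (transitive) subclass of it."""
    return any(cls in superclasses(t, triples)
               for s, p, t in triples if s == node and p == RDF_TYPE)


def focus_nodes(target_class, triples):
    """Nodes selected by `sh:targetClass target_class` in this data graph."""
    typed = {s for s, p, _ in triples if p == RDF_TYPE}
    return {n for n in typed if is_shacl_instance(n, target_class, triples)}


# Illustrative data graph: only TaskProblem is declared a subclass of Task.
graph = {
    ("ex:problem", RDF_TYPE, "croissant:TaskProblem"),
    ("ex:solution", RDF_TYPE, "croissant:TaskSolution"),
    ("croissant:TaskProblem", SUBCLASS_OF, "croissant:Task"),
}
print(sorted(focus_nodes("croissant:Task", graph)))  # ['ex:problem']
```

Here `ex:solution` is not selected by `croissant:TaskShape` because no triple makes `croissant:TaskSolution` a subclass of `croissant:Task`. Adding that triple brings it into scope.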