federated-learning-profile.md
Lines changed: 17 additions & 19 deletions
@@ -30,23 +30,21 @@ At minimum: Inform a recipient of the federated learning configuration that was

Ideal: Enable the federated learning process to be re-run automatically by providing a standard way to document configuration values.

-## Compatibility
+### Compatibility

This profile is based on RO-Crate 1.2 and aims to be compatible with other profiles used in trusted research environments and workflows, including [Five Safes RO-Crate] (0.4+) and the [Workflow Run RO-Crate] family.

-## Inheritance
+### Inheritance

This profile inherits all the requirements from [Process Run Crate], a profile designed to capture the execution of one or more computational tools. This ensures consistency in the core metadata structure of the crate.

To summarise this profile as an extension of Process Run Crate: the [CreateAction] represents the learning process, with [object] referencing the training datasets AND the learning configuration, [result] referencing the output model, and [instrument] referencing the federated learning framework used (e.g. Flower).
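
The summary above can be sketched as a small Python helper that assembles the [CreateAction] entity as a JSON-LD-style dictionary. This is only an illustrative sketch: the function name `make_training_action` and every identifier passed to it (`#site-a-dataset`, `config/training.toml`, `model/global-model.bin`, `#flower-framework`) are hypothetical placeholders, not values defined by the profile.

```python
import json


def make_training_action(dataset_ids, config_ids, model_id, framework_id):
    """Build a CreateAction entity linking inputs, output and tool.

    All @id values are supplied by the caller; the ones used below are
    hypothetical placeholders chosen for illustration.
    """
    return {
        "@id": "#fl-training-run",  # hypothetical local identifier
        "@type": "CreateAction",
        "name": "Federated learning training run",
        # object: training datasets AND the learning configuration
        "object": [{"@id": i} for i in [*dataset_ids, *config_ids]],
        # result: the output model
        "result": [{"@id": model_id}],
        # instrument: the federated learning framework used
        "instrument": {"@id": framework_id},
    }


action = make_training_action(
    dataset_ids=["#site-a-dataset", "#site-b-dataset"],
    config_ids=["config/training.toml"],
    model_id="model/global-model.bin",
    framework_id="#flower-framework",
)
print(json.dumps(action, indent=2))
```

The printed dictionary would sit in the `@graph` of `ro-crate-metadata.json`, alongside the entities it references.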

-## Example Metadata Document (`ro-crate-metadata.json`)
Example metadata file: [JSON-LD](example-fl-crate/ro-crate-metadata.json), [HTML preview](example-fl-crate/ro-crate-preview.html).

-## Full Specification
-
-### Input data
+## Input data

Each dataset used for training SHOULD be represented by a data entity in the crate. The data itself MAY be access controlled.

@@ -60,17 +58,17 @@ In data entities representing training datasets:

Each entity representing a training dataset MUST be referenced from [object] on the [CreateAction] which describes the training execution (see [Federated Learning Process Execution](#federated-learning-process-execution)).

-#### Data partitioning strategy
+### Data partitioning strategy

The federated learning process described in the crate is assumed to use a “horizontal” data partitioning strategy, where each client site holds the same variables for a different cohort.

Future versions of this profile may also support “vertical” data partitioning, where different clients hold different variables for the same cohort.
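
To make the two partitioning strategies concrete, here is a toy Python sketch. The records, column names, and both helper functions are invented for illustration and are not part of the profile; real deployments never pool the data in one place like this.

```python
# Toy cohort; every value and column name here is hypothetical.
records = [
    {"id": "p1", "age": 54, "systolic_bp": 130},
    {"id": "p2", "age": 61, "systolic_bp": 142},
    {"id": "p3", "age": 47, "systolic_bp": 118},
    {"id": "p4", "age": 70, "systolic_bp": 151},
]


def horizontal_partition(rows, n_sites):
    """Same variables at every site, different individuals per site."""
    return [rows[i::n_sites] for i in range(n_sites)]


def vertical_partition(rows, key, column_groups):
    """Same individuals at every site, different variables per site."""
    return [
        [{key: r[key], **{c: r[c] for c in cols}} for r in rows]
        for cols in column_groups
    ]


horizontal = horizontal_partition(records, 2)
vertical = vertical_partition(records, "id", [["age"], ["systolic_bp"]])
```

In `horizontal`, each site sees all three columns for its own subset of people; in `vertical`, each site sees every person but only one measurement column.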

-### Federated Learning Tools and Configuration
+## Federated Learning Tools and Configuration

It is assumed that there is, at minimum, a tool or script that is distributed to clients and used to train the model on local data. Depending on the architecture or framework used there may be additional tools or scripts, for example to configure a centralized server or aggregator.

-#### Training tool or workflow
+### Training tool or workflow

The training could be orchestrated and run using a specific federated learning framework (e.g. Flower), a general software tool (e.g. Python), or a computational workflow (e.g. a Nextflow workflow), according to how the learning process is designed.

@@ -83,21 +81,21 @@ That entity MUST be referenced from [instrument] in the [CreateAction] describin

If a computational workflow is used, the crate MAY also include further metadata to conform to [Workflow Run Crate].

-#### Training configuration – as files
+### Training configuration – as files

Where the training is configured using configuration files or scripts, those files SHOULD be included in the crate and described using data entities.

Those entities SHOULD be referenced from [object] in the [CreateAction] describing the training execution (see [Federated Learning Process Execution](#federated-learning-process-execution)).

-#### Training configuration – as environment variables
+### Training configuration – as environment variables

Configuration that is provided using environment variables should be described using [PropertyValue] entities, as in [Process Run Crate: Representing environment variable settings](https://www.researchobject.org/workflow-run-crate/profiles/process_run_crate/#representing-environment-variable-settings)
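
As a rough sketch of how environment variables might be turned into [PropertyValue] entities, consider the Python helper below. The variable names (`FL_NUM_ROUNDS`, `FL_MIN_CLIENTS`), the `#env-...` identifier scheme, and the helper itself are hypothetical. Linking the entities from the action through an `environment` property is an assumption based on a reading of the linked Process Run Crate section and should be checked against that text.

```python
def env_to_property_values(env):
    """Describe each environment variable as a PropertyValue entity.

    The "#env-<name>" identifier scheme is a hypothetical convention.
    """
    return [
        {
            "@id": f"#env-{name.lower()}",
            "@type": "PropertyValue",
            "name": name,         # the variable's name
            "value": str(value),  # the value it was set to
        }
        for name, value in sorted(env.items())
    ]


# Hypothetical variables configuring a training run.
env_entities = env_to_property_values({"FL_NUM_ROUNDS": 10, "FL_MIN_CLIENTS": 3})

# Assumed linking property; verify against Process Run Crate before use.
action_fragment = {"environment": [{"@id": e["@id"]} for e in env_entities]}
```

Sorting by name keeps the generated metadata stable between runs, which makes crates easier to compare.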

-### Federated Learning Process Execution
+## Federated Learning Process Execution

It is assumed that the training process will usually be captured as a single [CreateAction].

-#### Execution of the training process
+### Execution of the training process

A [CreateAction] entity MUST be present which describes the execution of the training process using the following properties:

@@ -112,20 +110,20 @@ A [CreateAction] entity MUST be present which describes the execution of the tra
* [resourceUsage] MAY reference [resource usage metrics](#metrics---resource-usage) for the training process
* Other properties (e.g. [name], [description], [agent]) SHOULD follow the guidelines set in [Process Run Crate](https://www.researchobject.org/workflow-run-crate/profiles/process_run_crate/#requirements)

-#### Pre-processing and post-processing
+### Pre-processing and post-processing

Additional [CreateAction]s MAY be included in the crate to describe pre- and post- processing steps. See [Process Run Crate: Multiple processes](https://www.researchobject.org/workflow-run-crate/profiles/process_run_crate/#multiple-processes).

Note that if those pre- or post-processing steps are part of an automated workflow, they may be sufficiently described by using [Workflow Run Crate] or [Provenance Run Crate].

-#### Metrics - resource usage
+### Metrics - resource usage

Resource metrics – such as memory usage, execution time, estimated carbon cost, etc. – MAY be included in the crate. If they are they SHOULD follow the guidance in [Provenance Run Crate: Representing resource usage](https://www.researchobject.org/workflow-run-crate/profiles/provenance_run_crate/#representing-resource-usage).

*Note 2026-02-26: this link does not yet work as the material is not yet merged into RDMkit*
For guidance on best-practice metrics to collect for federated learning, see [RDMkit: Federated Learning](https://rdmkit.elixir-europe.org/federated_learning).

-#### Metrics - model performance
+### Metrics - model performance

Metrics that describe the performance of the training process and/or the trained model – such as drift detection metrics, loss/accuracy metrics, client-participation rate, etc. – MAY be included in the crate. If included, they SHOULD be described using [PropertyValue] entities, and those entities MUST be linked from [result] on the [CreateAction] (along with the model itself, see [Output model](#output-model)).
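
The linking rule above could be sketched in Python as follows. The metric names, values, identifiers, and both helper functions are hypothetical; the sketch only shows performance metrics being described as [PropertyValue] entities and referenced from [result] together with the model.

```python
def metric_entity(metric_id, name, value):
    """Describe one performance metric as a PropertyValue entity."""
    return {
        "@id": metric_id,
        "@type": "PropertyValue",
        "name": name,
        "value": value,
    }


def link_results(action, model_id, metrics):
    """Reference the model and every metric entity from the action's result."""
    action["result"] = [{"@id": model_id}] + [{"@id": m["@id"]} for m in metrics]
    return action


# Hypothetical metrics and identifiers, for illustration only.
metrics = [
    metric_entity("#metric-final-loss", "final_loss", 0.213),
    metric_entity("#metric-participation", "client_participation_rate", 0.8),
]
training_action = link_results(
    {"@id": "#fl-training-run", "@type": "CreateAction"},
    model_id="model/global-model.bin",
    metrics=metrics,
)
```

The metric entities themselves still need to be added to the crate's `@graph`; `link_results` only creates the references.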

@@ -136,7 +134,7 @@ This aligns with the guidance on resource usage metrics above, except that the m
*Note 2026-02-26: this link does not yet work as the material is not yet merged into RDMkit*
For guidance on best-practice metrics to collect for federated learning, see [RDMkit: Federated Learning](https://rdmkit.elixir-europe.org/federated_learning).

-### Output model
+## Output model

The crate MUST contain a data entity representing the output model. This could be a direct serialization of the model to file, or another representation of the model. The data entity:

@@ -150,9 +148,9 @@ The model MAY be further documented by one or more supplementary files, such as
* the model entity MUST reference them through [subjectOf]
* If the files were automatically generated during/at the end of the training process, the relevant [CreateAction] SHOULD reference them via [result]

-### Additional metadata
+## Additional metadata

-#### Sensitive Data
+### Sensitive Data

In processes where sensitive data is used, the [Five Safes RO-Crate] profile MAY additionally be followed.