Commit cb21ae5

tweak heading levels
1 parent bb69be6 commit cb21ae5

1 file changed

Lines changed: 17 additions & 19 deletions

federated-learning-profile.md

@@ -30,23 +30,21 @@ At minimum: Inform a recipient of the federated learning configuration that was

 Ideal: Enable the federated learning process to be re-run automatically by providing a standard way to document configuration values.

-## Compatibility
+### Compatibility

 This profile is based on RO-Crate 1.2 and aims to be compatible with other profiles used in trusted research environments and workflows, including [Five Safes RO-Crate] (0.4+) and the [Workflow Run RO-Crate] family.

-## Inheritance
+### Inheritance

 This profile inherits all the requirements from [Process Run Crate], a profile designed to capture the execution of one or more computational tools. This ensures consistency in the core metadata structure of the crate.

 To summarise this profile as an extension of Process Run Crate: the [CreateAction] represents the learning process, with [object] referencing the training datasets AND the learning configuration, [result] referencing the output model, and [instrument] referencing the federated learning framework used (e.g. Flower).
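
The mapping summarised above can be sketched as a Python dictionary that mirrors the JSON-LD entity. This is a minimal illustration under stated assumptions, not an excerpt from the profile's example crate: every `@id` (`#fl-run`, `site-a-data/`, `site-b-data/`, `fl-config.yaml`, `model.pt`, `#flower`) and the `name` value are hypothetical placeholders. Only the property names come from the text.

```python
import json

# Hypothetical identifiers throughout; only the property names
# (object, result, instrument) are taken from the profile text.
create_action = {
    "@id": "#fl-run",
    "@type": "CreateAction",
    "name": "Federated training run",
    # object: the training datasets AND the learning configuration
    "object": [
        {"@id": "site-a-data/"},
        {"@id": "site-b-data/"},
        {"@id": "fl-config.yaml"},
    ],
    # result: the output model
    "result": [{"@id": "model.pt"}],
    # instrument: the federated learning framework used (e.g. Flower)
    "instrument": {"@id": "#flower"},
}

# Serialise the entity the way it would sit in the crate's @graph.
print(json.dumps(create_action, indent=2))
```

Each referenced `@id` would also need its own entity in the crate's `@graph`; those are omitted here for brevity.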

-## Example Metadata Document (`ro-crate-metadata.json`)
+### Example Metadata Document (`ro-crate-metadata.json`)

 Example metadata file: [JSON-LD](example-fl-crate/ro-crate-metadata.json), [HTML preview](example-fl-crate/ro-crate-preview.html).

-## Full Specification
-
-### Input data
+## Input data

 Each dataset used for training SHOULD be represented by a data entity in the crate. The data itself MAY be access controlled.

@@ -60,17 +58,17 @@ In data entities representing training datasets:

 Each entity representing a training dataset MUST be referenced from [object] on the [CreateAction] which describes the training execution (see [Federated Learning Process Execution](#federated-learning-process-execution)).

-#### Data partitioning strategy
+### Data partitioning strategy

 The federated learning process described in the crate is assumed to use a “horizontal” data partitioning strategy, where each client site holds the same variables for a different cohort.

 Future versions of this profile may also support “vertical” data partitioning, where different clients hold different variables for the same cohort.

-### Federated Learning Tools and Configuration
+## Federated Learning Tools and Configuration

 It is assumed that there is, at minimum, a tool or script that is distributed to clients and used to train the model on local data. Depending on the architecture or framework used there may be additional tools or scripts, for example to configure a centralized server or aggregator.

-#### Training tool or workflow
+### Training tool or workflow

 The training could be orchestrated and run using a specific federated learning framework (e.g. Flower), a general software tool (e.g. Python), or a computational workflow (e.g. a Nextflow workflow), according to how the learning process is designed.

@@ -83,21 +81,21 @@ That entity MUST be referenced from [instrument] in the [CreateAction] describin

 If a computational workflow is used, the crate MAY also include further metadata to conform to [Workflow Run Crate].

-#### Training configuration – as files
+### Training configuration – as files

 Where the training is configured using configuration files or scripts, those files SHOULD be included in the crate and described using data entities.

 Those entities SHOULD be referenced from [object] in the [CreateAction] describing the training execution (see [Federated Learning Process Execution](#federated-learning-process-execution)).

-#### Training configuration – as environment variables
+### Training configuration – as environment variables

 Configuration that is provided using environment variables should be described using [PropertyValue] entities, as in [Process Run Crate: Representing environment variable settings](https://www.researchobject.org/workflow-run-crate/profiles/process_run_crate/#representing-environment-variable-settings).
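
As a rough sketch of the environment-variable guidance above, the Python below turns a mapping of variables into [PropertyValue] entities. It assumes the pattern in the linked Process Run Crate section, where each entity's `name` is the variable name, its `value` is the setting, and the action points to the entities through an `environment` property. Check that section before relying on the property name. The helper `env_to_property_values`, the `#env-...` identifiers, and the variable names `FL_NUM_ROUNDS` and `FL_MIN_CLIENTS` are hypothetical.

```python
def env_to_property_values(env):
    """Build one PropertyValue entity per environment variable.

    Identifiers of the form '#env-<name>' are an arbitrary, hypothetical choice.
    """
    entities = []
    for name in sorted(env):  # sorted for a stable, reproducible order
        entities.append({
            "@id": f"#env-{name.lower()}",
            "@type": "PropertyValue",
            "name": name,
            "value": str(env[name]),  # environment values are strings
        })
    return entities


# Hypothetical settings, for illustration only.
env = {"FL_NUM_ROUNDS": 10, "FL_MIN_CLIENTS": 3}
property_values = env_to_property_values(env)

# Assumed linkage: the action references each PropertyValue via `environment`.
action = {
    "@id": "#fl-run",
    "@type": "CreateAction",
    "environment": [{"@id": pv["@id"]} for pv in property_values],
}
```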

-### Federated Learning Process Execution
+## Federated Learning Process Execution

 It is assumed that the training process will usually be captured as a single [CreateAction].

-#### Execution of the training process
+### Execution of the training process

 A [CreateAction] entity MUST be present which describes the execution of the training process using the following properties:

@@ -112,20 +110,20 @@ A [CreateAction] entity MUST be present which describes the execution of the tra
 * [resourceUsage] MAY reference [resource usage metrics](#metrics---resource-usage) for the training process
 * Other properties (e.g. [name], [description], [agent]) SHOULD follow the guidelines set in [Process Run Crate](https://www.researchobject.org/workflow-run-crate/profiles/process_run_crate/#requirements)

-#### Pre-processing and post-processing
+### Pre-processing and post-processing

 Additional [CreateAction]s MAY be included in the crate to describe pre- and post-processing steps. See [Process Run Crate: Multiple processes](https://www.researchobject.org/workflow-run-crate/profiles/process_run_crate/#multiple-processes).

 Note that if those pre- or post-processing steps are part of an automated workflow, they may be sufficiently described by using [Workflow Run Crate] or [Provenance Run Crate].

-#### Metrics - resource usage
+### Metrics - resource usage

 Resource metrics – such as memory usage, execution time, estimated carbon cost, etc. – MAY be included in the crate. If they are, they SHOULD follow the guidance in [Provenance Run Crate: Representing resource usage](https://www.researchobject.org/workflow-run-crate/profiles/provenance_run_crate/#representing-resource-usage).

 *Note 2026-02-26: this link does not yet work as the material is not yet merged into RDMkit*
 For guidance on best-practice metrics to collect for federated learning, see [RDMkit: Federated Learning](https://rdmkit.elixir-europe.org/federated_learning).

-#### Metrics - model performance
+### Metrics - model performance

 Metrics that describe the performance of the training process and/or the trained model – such as drift detection metrics, loss/accuracy metrics, client-participation rate, etc. – MAY be included in the crate. If included, they SHOULD be described using [PropertyValue] entities, and those entities MUST be linked from [result] on the [CreateAction] (along with the model itself, see [Output model](#output-model)).
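
The linkage described above can be illustrated with a short Python sketch that builds one [PropertyValue] per metric and lists each alongside the model in [result]. The metric names, the numeric values, and the identifiers (`#metric-...`, `model.pt`, `#fl-run`) are made-up placeholders. The profile text shown here does not prescribe a naming scheme for them.

```python
# Placeholder metric values, for illustration only (not real results).
metrics = {"final_loss": 0.12, "final_accuracy": 0.94}

# One PropertyValue entity per performance metric.
metric_entities = [
    {
        "@id": f"#metric-{key}",
        "@type": "PropertyValue",
        "name": key,
        "value": value,
    }
    for key, value in metrics.items()
]

# The CreateAction lists the model first, then every metric entity, in `result`.
create_action = {
    "@id": "#fl-run",
    "@type": "CreateAction",
    "result": [{"@id": "model.pt"}]
    + [{"@id": entity["@id"]} for entity in metric_entities],
}
```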

@@ -136,7 +134,7 @@ This aligns with the guidance on resource usage metrics above, except that the m
 *Note 2026-02-26: this link does not yet work as the material is not yet merged into RDMkit*
 For guidance on best-practice metrics to collect for federated learning, see [RDMkit: Federated Learning](https://rdmkit.elixir-europe.org/federated_learning).

-### Output model
+## Output model

 The crate MUST contain a data entity representing the output model. This could be a direct serialization of the model to file, or another representation of the model. The data entity:

@@ -150,9 +148,9 @@ The model MAY be further documented by one or more supplementary files, such as
 * the model entity MUST reference them through [subjectOf]
 * If the files were automatically generated during/at the end of the training process, the relevant [CreateAction] SHOULD reference them via [result]
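
The two rules above lend themselves to a small consistency check, sketched here in Python. It assumes the supplementary file was generated by the training run, so the SHOULD rule applies as well as the MUST rule. The helpers `ids` and `check_supplementary` and the file names `model.pt` and `model-card.md` are hypothetical.

```python
def ids(value):
    """Collect @id strings from a JSON-LD property value (dict, list, or missing)."""
    if value is None:
        return set()
    if isinstance(value, dict):
        value = [value]
    return {item["@id"] for item in value}


def check_supplementary(model, action, file_id):
    """Return a list of problems for one supplementary file; empty if none."""
    problems = []
    # MUST: the model entity references the file through subjectOf.
    if file_id not in ids(model.get("subjectOf")):
        problems.append(f"MUST: model does not reference {file_id} via subjectOf")
    # SHOULD: the CreateAction lists generated files in result.
    if file_id not in ids(action.get("result")):
        problems.append(f"SHOULD: CreateAction does not list {file_id} in result")
    return problems


# Hypothetical entities: the model links to its card, but the action omits it.
model = {"@id": "model.pt", "@type": "File", "subjectOf": {"@id": "model-card.md"}}
action = {"@id": "#fl-run", "@type": "CreateAction", "result": [{"@id": "model.pt"}]}
problems = check_supplementary(model, action, "model-card.md")
```

With these inputs, only the SHOULD rule is reported, because the model already points at the file through `subjectOf`.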

-### Additional metadata
+## Additional metadata

-#### Sensitive Data
+### Sensitive Data

 In processes where sensitive data is used, the [Five Safes RO-Crate] profile MAY additionally be followed.
