[QC-589] Reduce the amount of configuration needed for multinode setups (#715)
* Do not require remote machine and port when running with AliECS
* One less indentation in infrastructure generator
* Allow for optional return type for DataSampling::PortForPolicy
* Expand and update the multinode setups documentation
* trigger rebuild
* Update InfrastructureGenerator.cxx
* Wrong if condition
Co-authored-by: Barthélémy von Haller <barthelemy.von.haller@gmail.com>
Co-authored-by: Barthélémy von Haller <barthelemy.von.haller@cern.ch>
doc/Advanced.md
32 additions & 21 deletions
@@ -132,10 +132,9 @@ locally produced Monitor Objects should be merged on QC servers and then have Ch
132
132
By **remote QC tasks** we mean those which run on QC servers (**remote machines**), while **local QC Tasks**
133
133
run on FLPs and EPNs (**local machines**).
134
134
135
-
While it is responsibility of the run operators to run all the processing topologies during the
136
-
data taking, here we show how to achieve such multinode workflows on development setups, running
137
-
them just with DPL driver. Note that for now we support cases with one or more local machines,
138
-
but just only one remote machine.
135
+
Running a multinode setup standalone or with AliECS requires a different number of parameters,
136
+
as some of them are overwritten by AliECS anyway. Such parameters are marked accordingly. Please note
137
+
that for now we support cases with one or more local machines, but only one remote machine.
139
138
140
139
In our example, we assume having two local processing nodes (`localnode1`, `localnode2`) and one
141
140
QC node (`qcnode`). There are two types of QC Tasks declared:
@@ -145,7 +144,9 @@ QC node (`qcnode`). There are two types of QC Tasks declared:
145
144
`localnode2` only. Mergers are not needed in this case, but there is a process running Checks against
146
145
Monitor Objects generated by this Task.
147
146
148
-
We use the `SkeletonTask` class for both, but any Task can be used of course. Should a Task be local, all its `MonitorObject`s need to be mergeable - they should be one of the mergeable ROOT types (histograms, TTrees) or inherit [MergeInterface](https://github.com/AliceO2Group/AliceO2/blob/dev/Utilities/Mergers/include/Mergers/MergeInterface.h).
147
+
We use the `SkeletonTask` class for both, but any Task can be used of course. Should a Task be local,
148
+
all its `MonitorObject`s need to be mergeable - they should be one of the mergeable ROOT types (histograms, TTrees)
149
+
or inherit [MergeInterface](https://github.com/AliceO2Group/AliceO2/blob/dev/Utilities/Mergers/include/Mergers/MergeInterface.h).
149
150
150
151
These are the steps to follow to get a multinode setup:
151
152
@@ -165,18 +166,21 @@ added:
165
166
"localnode1",
166
167
"localnode2"
167
168
],
168
-
"remoteMachine": "qcnode",
169
-
"remotePort": "30132",
170
-
"mergingMode": "delta"
169
+
"remoteMachine": "qcnode", "":"not needed with AliECS",
170
+
"remotePort": "30132", "":"not needed with AliECS",
171
+
"mergingMode": "delta", "":"if absent, delta is default"
171
172
}
172
173
},
173
174
```
174
175
List the local processing machines in the `localMachines` array. `remoteMachine` should contain the host name which
175
-
will serve as a QC server and `remotePort` should be a port number on which Mergers will wait for upcoming MOs. Make
176
-
sure it is not used by other service. If different QC Tasks are run in parallel, use separate ports for each. One
177
-
also may choose the merging mode - `delta` is the default and recommended (tasks are reset after each cycle, so they
178
-
send only updates), but if it is not feasible, Mergers may expect `entire` objects - tasks are not reset, they
179
-
always send entire objects and the latest versions are combined in Mergers.
176
+
will serve as a QC server and `remotePort` should be a port number on which Mergers will wait for upcoming MOs. Make
177
+
sure it is not used by other service. If different QC Tasks are run in parallel, use separate ports for each. One
178
+
also may choose the merging mode - `delta` is the default and recommended (tasks are reset after each cycle, so they
179
+
send only updates), but if it is not feasible, Mergers may expect `entire` objects - tasks are not reset, they
180
+
always send entire objects and the latest versions are combined in Mergers.
181
+
182
+
With the `delta` mode, one can cheat by specifying just one local machine name and referencing only that one later.
183
+
This is not possible with `entire` mode, because then Mergers need identifiable data sources to merge objects correctly.
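To illustrate the difference between the two merging modes described above, here is a minimal Python sketch. It is not the Mergers implementation: histograms are modelled as bin counters, and the function names `merge_delta` and `merge_entire` are hypothetical. It shows why `entire` mode needs identifiable sources: the merger must replace the previous object from each source rather than add to it.

```python
from collections import Counter


def merge_delta(merged, deltas):
    """Delta mode: tasks are reset each cycle and send only updates,
    so the merger simply adds every incoming delta to the running total."""
    total = Counter(merged)
    for delta in deltas:
        total.update(delta)
    return total


def merge_entire(latest_by_source):
    """Entire mode: tasks are never reset and send whole objects,
    so the merger sums the latest version kept per identifiable source."""
    total = Counter()
    for obj in latest_by_source.values():
        total.update(obj)
    return total


# Two cycles on two local nodes (bin index -> entries filled in that cycle).
cycle1 = {"localnode1": Counter({0: 2}), "localnode2": Counter({0: 1, 1: 3})}
cycle2 = {"localnode1": Counter({1: 1}), "localnode2": Counter({0: 1})}

# Delta mode: the source of each update does not matter.
delta_result = merge_delta(merge_delta(Counter(), cycle1.values()), cycle2.values())

# Entire mode: each node sends its accumulated object, keyed by source name.
entire_objects = {node: cycle1[node] + cycle2[node] for node in cycle1}
entire_result = merge_entire(entire_objects)

print(delta_result == entire_result)
```

Both modes arrive at the same merged object. In `entire` mode this only works because each object replaces the earlier one from the same source, which is what the per-source dictionary stands for.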
180
184
181
185
In case of a remote task, choosing `"remote"` option for the `"location"` parameter is enough.
182
186
@@ -196,29 +200,28 @@ In case of a remote task, choosing `"remote"` option for the `"location"` parame
196
200
}
197
201
```
198
202
199
-
However in both cases, one has to specify the machines where data should be sampled, as below. If data should be
200
-
published to external machines (with remote tasks), one has to add a local port number. Use separate ports for each
201
-
Data Sampling Policy.
203
+
If the task runs remotely, one has to specify the machines where data should be sampled and, since the data is
204
+
published to an external machine, a local port number. Use separate ports for each Data Sampling Policy.
202
205
```json
203
206
{
204
207
"dataSamplingPolicies": [
205
208
...
206
209
{
207
210
"id": "rnd-little",
208
211
"active": "true",
209
-
"machines": [
212
+
"machines": [ "","needed only for remote QC tasks",
210
213
"localnode2"
211
214
],
212
-
"port": "30333"
215
+
"port": "30333", "":"not needed with AliECS",
213
216
...
214
217
}
215
218
]
216
219
}
217
220
```
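Since the text asks for separate ports per QC Task and per Data Sampling Policy, a small pre-flight check can catch accidental reuse. The following Python sketch is only an illustration: the helper names `duplicated` and `check_ports` are hypothetical, the `qc.tasks` nesting is an assumption about the configuration layout, and the second policy `tst-raw` is invented to trigger a clash.

```python
from collections import Counter


def duplicated(ports):
    """Return the sorted port values that occur more than once."""
    counts = Counter(str(p) for p in ports)
    return sorted(port for port, n in counts.items() if n > 1)


def check_ports(config):
    """Report reused ports among tasks and among Data Sampling Policies."""
    # Assumed layout: tasks under config["qc"]["tasks"], policies at top level.
    tasks = config.get("qc", {}).get("tasks", {})
    task_ports = [t["remotePort"] for t in tasks.values() if "remotePort" in t]
    policies = config.get("dataSamplingPolicies", [])
    policy_ports = [p["port"] for p in policies if "port" in p]
    return {"remotePort": duplicated(task_ports), "port": duplicated(policy_ports)}


# Hypothetical configuration, already parsed from JSON.
example = {
    "qc": {
        "tasks": {
            "MultiNodeLocal": {"location": "local", "remotePort": "30132"},
            "MultiNodeRemote": {"location": "remote"},
        }
    },
    "dataSamplingPolicies": [
        {"id": "rnd-little", "machines": ["localnode2"], "port": "30333"},
        {"id": "tst-raw", "machines": ["localnode1"], "port": "30333"},
    ],
}

clashes = check_ports(example)
print(clashes)
```

Task ports and policy ports are checked separately because they are opened on different machines: `remotePort` on the QC server, `port` on the local nodes.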
218
221
/
219
222
2. Make sure that the firewalls are properly configured. If your machines block incoming/outgoing connections by
220
-
default, you can add these rules to the firewall (run as sudo). Consider enabling only concrete ports or a small
221
-
range of those.
223
+
default, you can add these rules to the firewall (run as sudo). Consider enabling only concrete ports or a small
If your network is isolated, you might consider disabling the firewall as an alternative. Be wary of the security risks.
237
+
```
238
+
systemctl stop firewalld # to disable until reboot
239
+
systemctl disable firewalld # to disable permanently
240
+
```
233
241
234
242
3. Install the same version of the QC software on each of these nodes. We cannot guarantee that different QC versions will talk to each other without problems. Also, make sure the configuration file that you will use is the same everywhere.
If there are no problems, on QCG you should see the `example` histogram updated under the paths `qc/TST/MO/MultiNodeLocal`
247
255
and `qc/TST/MO/MultiNodeRemote`, and corresponding Checks under the path `qc/TST/QO/`.
248
256
249
-
## Writing a DPL data producer
257
+
When using AliECS, one has to generate workflow templates and upload them to the corresponding repository. Please
258
+
contact the QC or AliECS developers to receive assistance or instruction on how to do that.
259
+
260
+
## Writing a DPL data producer
250
261
251
262
For your convenience, and although it does not lie within the QC scope, we would like to document how to write a simple data producer in the DPL. The DPL documentation can be found [here](https://github.com/AliceO2Group/AliceO2/blob/dev/Framework/Core/README.md) and for questions please head to the [forum](https://alice-talk.web.cern.ch/).