|
| 1 | +.. # Copyright (C) 2020-2024 Intel Corporation |
| 2 | +.. # SPDX-License-Identifier: Apache-2.0 |
| 3 | +
|
| 4 | +Federated Analytics |
| 5 | +======================================= |
| 6 | + |
| 7 | +Introduction to Federated Analytics |
| 8 | +------------------------------------- |
| 9 | + |
| 10 | +Federated Analytics is a privacy-preserving approach to computing statistics or performing data analysis on distributed datasets without collecting the raw data in a central location. Each data owner shares only derived results, such as counts or sums, so insights can be drawn from decentralized sources while the raw data stays where it is. For instance, one can compute the mean, frequency distributions, or other statistical measures across datasets located on multiple devices. Federated Analytics is particularly valuable where data sharing is restricted by privacy concerns or regulatory constraints. |
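As a minimal sketch of the idea (not OpenFL code), a global mean can be computed from per-site summaries alone. The names ``LocalSummary``, ``summarize`` and ``federated_mean`` are hypothetical and exist only for this illustration; each site is assumed to report just a sum and a count.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class LocalSummary:
    """What a site shares: a sum and a count, never the raw values."""
    total: float
    count: int


def summarize(values: Iterable[float]) -> LocalSummary:
    """Runs on the data owner's side and reduces raw values to a summary."""
    values = list(values)
    return LocalSummary(total=float(sum(values)), count=len(values))


def federated_mean(summaries: List[LocalSummary]) -> float:
    """Runs on the aggregating side and only ever sees the summaries."""
    total = sum(s.total for s in summaries)
    count = sum(s.count for s in summaries)
    if count == 0:
        raise ValueError("no data points reported")
    return total / count


# Two hypothetical sites with made-up measurements.
site_a = summarize([5.1, 4.9, 4.7])
site_b = summarize([6.3, 5.8])
global_mean = federated_mean([site_a, site_b])
```

The result equals the mean of the pooled values, because sums and counts combine exactly; statistics such as medians do not decompose this simply and need a different scheme.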
| 11 | + |
| 12 | +OpenFL's Support for Federated Analytics |
| 13 | +------------------------------------------ |
| 14 | + |
| 15 | +OpenFL, a flexible framework for Federated Learning, extends its capabilities to support Federated Analytics. By leveraging the federation plan and task runner API, OpenFL enables users to perform analytics tasks across collaborators. These tasks are defined in the ``plan.yaml`` file and distributed to collaborators for execution. The results are then aggregated by the aggregator to provide global insights. |
| 16 | + |
| 17 | + |
| 18 | +Example Workspace: Histogram Calculation using sklearn IRIS Dataset |
| 19 | +------------------------------------------------------------------------------ |
| 20 | + |
| 21 | +The Federated Analytics workspace for histogram calculation demonstrates how to compute frequency distributions of specific features across distributed datasets. This workspace leverages the OpenFL framework to ensure privacy-preserving analytics while providing global insights into the data. |
| 22 | + |
| 23 | +**Task Configuration:** |
| 24 | + |
| 25 | +The analytics tasks are defined in the ``plan.yaml`` file. For example: |
| 26 | + |
| 27 | +.. code-block:: yaml |
| 28 | + :emphasize-lines: 6,41,43,45 |
| 29 | +
|
| 30 | + aggregator: |
| 31 | + defaults: plan/defaults/aggregator.yaml |
| 32 | + template: openfl.component.Aggregator |
| 33 | + settings: |
| 34 | + last_state_path: save/result.json |
| 35 | + rounds_to_train: 1 # Number of training rounds (set to 1 for Federated Analytics). |
| 36 | +
|
| 37 | + collaborator: |
| 38 | + defaults: plan/defaults/collaborator.yaml |
| 39 | + template: openfl.component.Collaborator |
| 40 | + settings: |
| 41 | + use_delta_updates: false |
| 42 | + opt_treatment: RESET |
| 43 | +
|
| 44 | + data_loader: |
| 45 | + defaults: plan/defaults/data_loader.yaml |
| 46 | + template: src.dataloader.IRISInMemory |
| 47 | + settings: |
| 48 | + collaborator_count: 2 |
| 49 | + data_group_name: iris |
| 50 | + batch_size: 150 |
| 51 | +
|
| 52 | + task_runner: |
| 53 | + defaults: plan/defaults/task_runner.yaml |
| 54 | + template: src.taskrunner.IrisHistogram |
| 55 | +
|
| 56 | + network: |
| 57 | + defaults: plan/defaults/network.yaml |
| 58 | +
|
| 59 | + assigner: |
| 60 | + template: openfl.component.RandomGroupedAssigner |
| 61 | + settings: |
| 62 | + task_groups: |
| 63 | + - name: analytics |
| 64 | + percentage: 1.0 |
| 65 | + tasks: |
| 66 | + - analytics |
| 67 | +
|
| 68 | + tasks: |
| 69 | + analytics: |
| 70 | + function: analytics |
| 71 | + aggregation_type: |
| 72 | + template: src.aggregatehistogram.AggregateHistogram |
| 73 | + kwargs: |
| 74 | + columns: ['sepal length (cm)', 'sepal width (cm)'] |
| 75 | +
|
| 76 | +**Note:** The ``function`` and ``aggregation_type.template`` fields in the configuration can be replaced with custom implementations to suit specific use cases. This lets users define analytics logic and aggregation methods tailored to their requirements. |
| 77 | + |
| 78 | +**Data Distribution**: The dataset is distributed across collaborators, with each collaborator holding a local shard of the data. |
| 79 | + |
| 80 | +**Local Computation**: Each collaborator computes the histogram for the specified feature(s) on its local data shard. This ensures that raw data never leaves the collaborator's environment. |
| 81 | + |
| 82 | +**Aggregation**: The aggregator collects the histograms from all collaborators and combines them into a global histogram. The aggregated results are saved in ``save/result.json``. This file provides a global view of the frequency distribution for each selected feature, computed without moving raw data off the collaborators. |
| 83 | + |
| 84 | + |
| 85 | +By following this structured approach, the Federated Analytics workspace enables secure and efficient computation of histograms across distributed datasets. |
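The local computation and aggregation steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the workspace's ``AggregateHistogram`` implementation: the helper names are hypothetical, and the bin range (4.0 to 20.0, 9 bins) is chosen only to resemble the sample output shown later. The key assumption is that every collaborator uses identical bin edges, so that counts can be added bin by bin.

```python
from bisect import bisect_right
from typing import List, Sequence


def make_edges(low: float, high: float, bins: int) -> List[float]:
    """Equal-width bin edges; `bins` bins need `bins + 1` edges."""
    step = (high - low) / bins
    return [low + i * step for i in range(bins)] + [high]


def local_histogram(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Runs on a collaborator: counts local values per bin."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # values outside the shared range are ignored
        # Bins are half-open [left, right) except the last, which includes its right edge.
        idx = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[idx] += 1
    return counts


def aggregate_histograms(histograms: Sequence[Sequence[int]]) -> List[int]:
    """Runs on the aggregator: element-wise sum of per-collaborator counts."""
    return [sum(column) for column in zip(*histograms)]


# Hypothetical shared binning and two made-up local shards.
EDGES = make_edges(4.0, 20.0, 9)
shard_1 = [4.5, 6.0, 6.1, 8.0]
shard_2 = [6.2, 7.9, 10.0]

global_counts = aggregate_histograms(
    [local_histogram(shard_1, EDGES), local_histogram(shard_2, EDGES)]
)
```

Summing counts gives the same histogram as binning the pooled data, which is why only the counts need to leave each collaborator.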
| 86 | + |
| 87 | +Detailed Instructions |
| 88 | +--------------------- |
| 89 | + |
| 90 | +**Workspace Setup and Federation Run** |
| 91 | + |
| 92 | +Create a workspace for analytics (for example, using the ``federated_analytics/histogram`` template): |
| 93 | + |
| 94 | +.. code-block:: bash |
| 95 | +
|
| 96 | + fx workspace create --prefix ./analytics_workspace --template federated_analytics/histogram |
| 97 | + cd analytics_workspace |
| 98 | + fx workspace certify |
| 99 | + fx aggregator generate-cert-request |
| 100 | + fx aggregator certify --silent |
| 101 | +
|
| 102 | +Initialize the plan as you would for any other workspace: |
| 103 | + |
| 104 | +.. code-block:: bash |
| 105 | +
|
| 106 | + fx plan initialize |
| 107 | +
|
| 108 | +Create and certify the collaborators, then start the aggregator and the collaborators to run the federation. For example: |
| 109 | + |
| 110 | +.. code-block:: bash |
| 111 | +
|
| 112 | + fx collaborator create -n collaborator1 -d 1 |
| 113 | + fx collaborator generate-cert-request -n collaborator1 |
| 114 | + fx collaborator certify -n collaborator1 --silent |
| 115 | +
|
| 116 | + fx collaborator create -n collaborator2 -d 2 |
| 117 | + fx collaborator generate-cert-request -n collaborator2 |
| 118 | + fx collaborator certify -n collaborator2 --silent |
| 119 | +
|
| 120 | + fx aggregator start > ~/fx_aggregator.log 2>&1 & |
| 121 | + fx collaborator start -n collaborator1 > ~/collab1.log 2>&1 & |
| 122 | + fx collaborator start -n collaborator2 > ~/collab2.log 2>&1 & |
| 123 | +
|
| 124 | +Once the federation run is complete, the aggregated results are saved to ``save/result.json``. |
| 125 | + |
| 126 | +The file contains the histogram counts and bin edges for each configured column. For example: |
| 127 | + |
| 128 | +.. code-block:: json |
| 129 | +
|
| 130 | + { |
| 131 | + "sepal length (cm) histogram": [ |
| 132 | + 0.0, |
| 133 | + 0.0, |
| 134 | + 9.0, |
| 135 | + 50.0, |
| 136 | + 56.0, |
| 137 | + 28.0, |
| 138 | + 7.0, |
| 139 | + 0.0, |
| 140 | + 0.0 |
| 141 | + ], |
| 142 | + "sepal length (cm) bins": [ |
| 143 | + 4.0, |
| 144 | + 5.777777671813965, |
| 145 | + 7.55555534362793, |
| 146 | + 9.333333015441895, |
| 147 | + 11.11111068725586, |
| 148 | + 12.88888931274414, |
| 149 | + 14.666666984558105, |
| 150 | + 16.44444465637207, |
| 151 | + 18.22222137451172, |
| 152 | + 20.0 |
| 153 | + ], |
| 154 | + "sepal width (cm) histogram": [ |
| 155 | + 47.0, |
| 156 | + 91.0, |
| 157 | + 12.0, |
| 158 | + 0.0, |
| 159 | + 0.0, |
| 160 | + 0.0, |
| 161 | + 0.0, |
| 162 | + 0.0, |
| 163 | + 0.0 |
| 164 | + ], |
| 165 | + "sepal width (cm) bins": [ |
| 166 | + 4.0, |
| 167 | + 5.777777671813965, |
| 168 | + 7.55555534362793, |
| 169 | + 9.333333015441895, |
| 170 | + 11.11111068725586, |
| 171 | + 12.88888931274414, |
| 172 | + 14.666666984558105, |
| 173 | + 16.44444465637207, |
| 174 | + 18.22222137451172, |
| 175 | + 20.0 |
| 176 | + ] |
| 177 | + } |
| 178 | +
|
| 179 | +
|
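A small helper can make the result easier to read by pairing each count with its bin range. The sketch below assumes the key naming seen in the sample output (``<column> histogram`` and ``<column> bins``); the function name ``summarize_result`` is hypothetical, and the inline sample stands in for the real file.

```python
import json
from typing import Dict, List, Tuple


def summarize_result(
    result: Dict[str, List[float]],
) -> Dict[str, List[Tuple[float, float, int]]]:
    """Pair each count with its (left, right) bin edges, per column."""
    summary = {}
    for key, counts in result.items():
        if not key.endswith(" histogram"):
            continue
        column = key[: -len(" histogram")]
        edges = result[f"{column} bins"]
        # n counts require n + 1 edges.
        if len(edges) != len(counts) + 1:
            raise ValueError(f"bin/count mismatch for column {column!r}")
        summary[column] = [
            (edges[i], edges[i + 1], int(count)) for i, count in enumerate(counts)
        ]
    return summary


# Inline sample shaped like the output above. In a workspace, load the file instead:
#   with open("save/result.json") as f:
#       result = json.load(f)
sample = json.loads('{"x histogram": [2.0, 1.0], "x bins": [0.0, 1.0, 2.0]}')
summary = summarize_result(sample)

for column, rows in summary.items():
    for left, right, count in rows:
        print(f"{column}: [{left:.2f}, {right:.2f}) -> {count}")
```

The length check guards against a result in which the counts and edges for a column have drifted out of step.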
| 180 | +Conclusion |
| 181 | +---------- |
| 182 | +Federated Analytics in OpenFL enables privacy-preserving data analysis on distributed datasets. By leveraging the task runner API and analytics tasks defined in the plan, users can compute global statistics while raw data stays with each collaborator. This simplifies the workflow for distributed data analysis and can help meet privacy and regulatory requirements. |