File: docs/programming_guide/filters.rst
8 additions & 0 deletions
@@ -44,6 +44,14 @@ purpose of security. In fact, privacy and homomorphic encryption techniques are
44
44
- SVTPrivacy for differential privacy through sparse vector techniques (:mod:`nvflare.app_common.filters.svt_privacy`)
45
45
- Homomorphic encryption filters to encrypt data before sharing (:mod:`nvflare.app_common.homomorphic_encryption.he_model_encryptor` and :mod:`nvflare.app_common.homomorphic_encryption.he_model_decryptor`)
46
46
47
+
Model update compression should use the same filter boundary: compress before
48
+
sending and decompress after receiving so trainer and aggregator code can
49
+
exchange normal model updates. Use :ref:`message_quantization` for built-in
50
+
model quantization. For custom schemes, implement a ``DXOFilter`` for
51
+
``DataKind.WEIGHTS`` or ``DataKind.WEIGHT_DIFF`` and register it as a task
52
+
result filter for client-to-server updates, or as a task data filter for
File: docs/user_guide/data_scientist_guide/job_recipe.rst
8 additions & 3 deletions
@@ -163,7 +163,7 @@ Execution Environments
163
163
164
164
A **Job Recipe** defines *what* to run in a federated learning setting, but it also needs to know *where* to run. NVFlare provides several **execution environments** that allow the same recipe to be executed in different contexts:
165
165
166
-
* **Simulation (** ``SimEnv`` **)** – For local testing and experimentation on a single machine
166
+
* **Simulation (** ``SimEnv`` **)** – For local testing and experimentation on a single machine or in one batch job
167
167
* **Proof-of-Concept (** ``PocEnv`` **)** – For small-scale, multi-process setups that mimic real-world deployment on a single machine
168
168
* **Production (** ``ProdEnv`` **)** – For full-scale distributed deployments across multiple organizations and sites
169
169
@@ -172,17 +172,22 @@ This separation enables users to **prototype once and deploy anywhere** without
172
172
SimEnv – Simulation Environment
173
173
-------------------------------
174
174
175
-
Runs all clients and the server as **threads** within a single process. This is lightweight and easy to set up with no networking required. Best suited for:
175
+
Runs the job with the local FL simulator backend: no provisioned project or
176
+
long-running server/client daemons. Simulated clients use local worker
177
+
processes; ``num_threads`` is the historical name for the worker-process
178
+
concurrency. Best suited for:
176
179
177
180
* Quick experiments
178
181
* Debugging scripts and models
179
182
* Educational use cases
183
+
* Batch-scheduled experiments where one submitted job should run the complete
184
+
federated workflow and then exit
180
185
181
186
**Arguments:**
182
187
183
188
* ``num_clients`` (int): Number of simulated clients
184
189
* ``clients``: A list of client names (length needs to match ``num_clients`` if both are provided)
185
-
* ``num_threads``: Number of threads to use to run simulated clients
190
+
* ``num_threads``: Number of concurrent simulated client worker processes
186
191
* ``gpu_config`` (str): List of GPU device IDs, comma separated
File: docs/user_guide/nvflare_cli/fl_simulator.rst
39 additions & 29 deletions
@@ -11,10 +11,11 @@ The FL Simulator is a lightweight simulator of a running NVFLARE FL deployment,
11
11
and it can allow researchers to test and debug their application without
12
12
provisioning a real project.
13
13
14
-
The FL jobs run on a server and
15
-
multiple clients in the same process but in a similar way to how it would run
16
-
in a real deployment so researchers can more quickly build out new components
17
-
and jobs that can then be directly used in a real production deployment.
14
+
The FL jobs run on a local simulator-managed server and simulated clients,
15
+
without provisioning a real project or starting long-running server/client
16
+
daemons. Use the simulator for single-machine development, tests, and
17
+
batch-scheduled experiments where one Python command should start, run, and
18
+
exit. Use POC or production modes for the provisioned deployment model.
18
19
19
20
***********************
20
21
Command Usage
@@ -846,56 +847,65 @@ application run.
846
847
Processes, Clients, and Events
847
848
******************************
848
849
849
-
Specifying number of processes
850
-
==============================
851
-
The simulator ``-t`` option provides the ability to specify how many processes to run the simulator with.
850
+
Specifying Client Worker Processes
851
+
==================================
852
+
The simulator ``-t`` option provides the ability to specify how many simulated
853
+
client worker processes can run concurrently.
852
854
853
855
.. note::
854
856
855
-
The ``-t`` and ``--threads`` option for simulator was originally due to clients running in separate threads.
856
-
However each client now actually runs in a separate process. This distinction will not affect the user experience.
857
+
The ``-t`` and ``--threads`` option name is historical. Simulated client
858
+
execution now uses separate worker processes, and the option controls worker
859
+
process concurrency.
857
860
858
861
- N = number of clients (``-n``)
859
-
- T = number of processes (``-t``)
862
+
- T = number of concurrent client worker processes (``-t``)
860
863
861
-
When running the simulator with fewer processes than clients (T < N)
862
-
the simulator will need to swap-in/out the clients for the processes, resulting in some of the clients running sequentially as processes are available.
863
-
This also will cause the ClientRunner/learner objects to go through setup and teardown in every round.
864
-
Using T < N is only needed when trying to simulate of large number of clients using a single machine with limited resources.
864
+
When running the simulator with fewer worker processes than clients (T < N),
865
+
the simulator swaps clients in and out as worker processes become available.
866
+
This also causes the ClientRunner/learner objects to go through setup and
867
+
teardown in every round. Using T < N is only needed when simulating many clients
868
+
on a single machine with limited resources.
865
869
866
-
In most cases, run the simulator with the same number of processes as clients (T = N). The simulator will run the number of clients in separate processes at the same time. Each
867
-
client will always be running in memory with no swap-in/out, but it will require more resources available.
870
+
In most cases, run the simulator with the same number of worker processes as
871
+
clients (T = N). Each client stays in memory with no swap-in/out, but this
872
+
requires more available resources.
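The relationship between T and N can be illustrated with a minimal, self-contained sketch. It does not call or reproduce the NVFlare simulator's scheduler: the function name ``plan_waves``, the ``site-N`` client names, and the fixed-size "wave" model are assumptions made for illustration only. The real simulator swaps clients in as worker processes become available rather than in fixed waves.

```python
from typing import List


def plan_waves(num_clients: int, num_workers: int) -> List[List[str]]:
    """Group N illustrative clients into waves of at most T concurrent workers.

    Hypothetical helper, not part of NVFlare. It only shows why T < N forces
    some clients to wait for a free worker process.
    """
    if num_clients < 1 or num_workers < 1:
        raise ValueError("num_clients and num_workers must be positive")

    # Illustrative client names; a real job may use different names.
    clients = [f"site-{i}" for i in range(1, num_clients + 1)]

    # Each wave holds at most `num_workers` clients running at the same time.
    return [
        clients[start:start + num_workers]
        for start in range(0, num_clients, num_workers)
    ]


# T < N: clients share worker slots, so two waves are needed.
print(plan_waves(4, 2))  # [['site-1', 'site-2'], ['site-3', 'site-4']]

# T = N: every client gets its own worker and no swapping is required.
print(plan_waves(3, 3))  # [['site-1', 'site-2', 'site-3']]
```

With T = N the plan collapses to a single wave, which corresponds to the "no swap-in/out" case described above.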
868
873
869
874
For dataset or TensorBoard initialization, you can make use of ``EventType.SWAP_IN`` and ``EventType.SWAP_OUT``
870
875
in the application.
871
876
872
877
SWAP_IN and SWAP_OUT events
873
878
===========================
874
-
During FLARE simulator execution, the client Apps are executed in turn in the same execution thread. Each executing client App will go
875
-
fetch the task from the controller on the server, execute the task, and then submit the task results to the controller. Once finished submitting
876
-
results, the current client App will yield the executing thread to the next client App to execute.
879
+
During FLARE simulator execution, simulated client Apps fetch tasks from the
880
+
controller, execute the tasks, and submit results back to the controller. When
881
+
T < N, multiple simulated clients share a smaller pool of worker processes and
882
+
may be swapped in and out as worker processes become available.
877
883
878
-
If the client App needs to preserve some states for the next "execution turn" to continue, the client executor can make use of the ``SWAP_OUT``
879
-
event fired by the simulator engine to save the current states. When the client App gets the turn to execute again, use the ``SWAP_IN``
880
-
event to recover the previous saved states.
884
+
If the client App needs to preserve state for the next execution turn, the
885
+
client executor can use the ``SWAP_OUT`` event fired by the simulator engine to
886
+
save the current state. When the client App gets another turn to execute, use
887
+
the ``SWAP_IN`` event to recover the previously saved state.
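The save-and-restore pattern can be sketched without NVFlare installed. Everything named here is an assumption for illustration: the ``SWAP_IN`` and ``SWAP_OUT`` constants are local stand-ins for ``EventType.SWAP_IN`` and ``EventType.SWAP_OUT``, and ``SwapStateSketch`` with its one-argument ``handle_event`` is a hypothetical class, not the NVFlare component interface. State is written to a file because the next execution turn may run in a different worker process.

```python
import json
from pathlib import Path

# Local stand-ins for EventType.SWAP_IN / EventType.SWAP_OUT (assumed values,
# used only so this sketch runs without NVFlare).
SWAP_IN = "swap_in"
SWAP_OUT = "swap_out"


class SwapStateSketch:
    """Hypothetical client-side object that persists state across swaps."""

    def __init__(self, client_name: str, state_dir: str):
        # One state file per client so clients sharing a worker do not collide.
        self.state_file = Path(state_dir) / f"{client_name}_state.json"
        self.completed_rounds = 0

    def handle_event(self, event_type: str) -> None:
        if event_type == SWAP_OUT:
            # Save the current state before the client gives up its worker.
            payload = {"completed_rounds": self.completed_rounds}
            self.state_file.write_text(json.dumps(payload))
        elif event_type == SWAP_IN and self.state_file.exists():
            # Recover the previously saved state on the next execution turn.
            payload = json.loads(self.state_file.read_text())
            self.completed_rounds = payload["completed_rounds"]
```

In a real job the same logic would live in the client executor's event handler, keyed on the actual ``EventType`` values.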
881
888
882
889
Multi-GPU and Separate Client Process with Simulator
File: examples/tutorials/flare_simulator.ipynb
1 addition & 1 deletion
@@ -7,7 +7,7 @@
7
7
"source": [
8
8
"## Intro to the FL Simulator\n",
9
9
"\n",
10
-
"The [FL Simulator](https://nvflare.readthedocs.io/en/latest/user_guide/nvflare_cli/fl_simulator.html) runs a local simulation of a running NVFLARE FL deployment. This allows researchers to test and debug an application without provisioning a real, distributed FL project. The FL Simulator runs a server and multiple clients in the same local process, with communication that mimics a real deployment. This allows researchers to more quickly build out new components and jobs that can be directly used in a production deployment.\n",
10
+
"The [FL Simulator](https://nvflare.readthedocs.io/en/latest/user_guide/nvflare_cli/fl_simulator.html) runs the same job concepts with a local simulator-managed server and simulated clients. It is useful for local development and batch jobs that should start, run, and exit without provisioned daemons.\n",
11
11
"\n",
12
12
"### Setup\n",
13
13
"The NVFlare [Getting Started Guide](https://nvflare.readthedocs.io/en/main/getting_started.html) provides instructions for setting up NVFlare on a local system or in a Docker image. We've also cloned the NVFlare GitHub in our top-level working directory."
File: examples/tutorials/job_recipe.ipynb
4 additions & 3 deletions
@@ -185,16 +185,17 @@
185
185
"source": [
186
186
"### SimEnv – Simulation Environment\n",
187
187
"\n",
188
-
"Runs all clients and the server as **threads** within a single process. This is lightweight and easy to set up; no networking required. Best suited for:\n",
188
+
"Runs the job with the local FL simulator backend: no provisioned project or long-running server/client daemons. Simulated clients use local worker processes; `num_threads` is the historical name for the worker-process concurrency. Best suited for:\n",
189
189
"\n",
190
190
"* Quick experiments\n",
191
191
"* Debugging scripts and models\n",
192
192
"* Educational use cases\n",
193
+
"* Batch-scheduled experiments where one submitted job should run the complete federated workflow and then exit\n",
193
194
"\n",
194
195
"**Arguments:**\n",
195
196
"* `num_clients` (int): number of simulated clients\n",
196
197
"* `clients`: a list of client names (length needs to match num_clients if both are provided)\n",
197
-
"* `num_threads`: number of threads to use to run simulated clients\n",
198
+
"* `num_threads`: number of concurrent simulated client worker processes\n",
198
199
"* `gpu_config` (str): list of GPU device IDs, comma separated\n",