Commit bce7c71

Clarify simulator and API doc signposts
1 parent 77b5934 commit bce7c71

6 files changed

Lines changed: 67 additions & 36 deletions


docs/programming_guide/execution_api_type/executor.rst

Lines changed: 7 additions & 0 deletions

@@ -10,6 +10,13 @@ An :class:`Executor<nvflare.apis.executor.Executor>` is an FLComponent for FL cl
 wherein the ``execute`` method receives and returns a Shareable object given a task name,
 ``FLContext``, and ``abort_signal``.

+.. note::
+
+   The Executor API is the low-level client task API. Most new ML training
+   examples should start with the :ref:`client_api` and :ref:`job_recipe`, and
+   use Executor directly only when they need a custom task contract or framework
+   integration.
+
 .. literalinclude:: ../../../nvflare/apis/executor.py
    :language: python
    :lines: 24-
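
For orientation, the task contract described in this file can be sketched in plain Python. This is a toy stand-in, not the real ``Executor`` or ``Shareable`` types from ``nvflare.apis``; only the shape of the ``execute`` call (task name, Shareable in/out, ``FLContext``, ``abort_signal``) is taken from the doc text above.

```python
# Toy sketch of the Executor task contract (illustrative stand-ins only; the
# real classes live in nvflare.apis and are not reproduced here).

class Shareable(dict):
    """Stand-in for NVFlare's Shareable: a dict-like message payload."""

class ToyExecutor:
    """Receives a task name plus a Shareable and returns a reply Shareable."""

    def execute(self, task_name, shareable, fl_ctx=None, abort_signal=None):
        if task_name == "train":
            # A real Executor would run local training here.
            return Shareable(weights=[w + 1 for w in shareable["weights"]])
        # Unknown task: reply with an empty Shareable rather than raising.
        return Shareable()

reply = ToyExecutor().execute("train", Shareable(weights=[1, 2, 3]))
print(reply["weights"])  # [2, 3, 4]
```

The dispatch-on-task-name shape is why the note above steers most users to the higher-level Client API: the contract is flexible but entirely manual.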

docs/programming_guide/filters.rst

Lines changed: 8 additions & 0 deletions

@@ -44,6 +44,14 @@ purpose of security. In fact, privacy and homomorphic encryption techniques are
 - SVTPrivacy for differential privacy through sparse vector techniques (:mod:`nvflare.app_common.filters.svt_privacy`)
 - Homomorphic encryption filters to encrypt data before sharing (:mod:`nvflare.app_common.homomorphic_encryption.he_model_encryptor` and :mod:`nvflare.app_common.homomorphic_encryption.he_model_decryptor`)

+Model update compression should use the same filter boundary: compress before
+sending and decompress after receiving so trainer and aggregator code can
+exchange normal model updates. Use :ref:`message_quantization` for built-in
+model quantization. For custom schemes, implement a ``DXOFilter`` for
+``DataKind.WEIGHTS`` or ``DataKind.WEIGHT_DIFF`` and register it as a task
+result filter for client-to-server updates, or as a task data filter for
+server-to-client model messages.
+
 DXO - Data Exchange Object
 ===========================
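
The compress-before-send / decompress-after-receive symmetry the added paragraph describes can be sketched with naive fixed-point quantization. This is plain illustrative Python, not the actual ``DXOFilter`` API or the built-in message quantization, and the function names are hypothetical.

```python
# Illustrative compress-before-send / decompress-after-receive pair using
# naive int8-style fixed-point quantization. Plain Python only: the real
# NVFlare mechanism is a DXOFilter (or built-in message quantization), and
# these function names are hypothetical.

def quantize(weights, scale=127.0):
    """Task-result filter side: map floats in [-1, 1] to ints in [-127, 127]."""
    return [round(w * scale) for w in weights]

def dequantize(qweights, scale=127.0):
    """Receiving side: invert the mapping so aggregation sees normal floats."""
    return [q / scale for q in qweights]

weights = [0.5, -0.25, 1.0]
restored = dequantize(quantize(weights))
print(restored)  # close to the inputs; error is about half a step (~0.004)
```

Because both directions live at the filter boundary, trainer and aggregator code on either side continue to exchange ordinary float model updates.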

docs/user_guide/data_scientist_guide/job_recipe.rst

Lines changed: 8 additions & 3 deletions

@@ -163,7 +163,7 @@ Execution Environments

 A **Job Recipe** defines *what* to run in a federated learning setting, but it also needs to know *where* to run. NVFlare provides several **execution environments** that allow the same recipe to be executed in different contexts:

-* **Simulation (** ``SimEnv`` **)** – For local testing and experimentation on a single machine
+* **Simulation (** ``SimEnv`` **)** – For local testing and experimentation on a single machine or in one batch job
 * **Proof-of-Concept (** ``PocEnv`` **)** – For small-scale, multi-process setups that mimic real-world deployment on a single machine
 * **Production (** ``ProdEnv`` **)** – For full-scale distributed deployments across multiple organizations and sites

@@ -172,17 +172,22 @@ This separation enables users to **prototype once and deploy anywhere** without
 SimEnv – Simulation Environment
 -------------------------------

-Runs all clients and the server as **threads** within a single process. This is lightweight and easy to set up with no networking required. Best suited for:
+Runs the job with the local FL simulator backend: no provisioned project or
+long-running server/client daemons. Simulated clients use local worker
+processes; ``num_threads`` is the historical name for the worker-process
+concurrency. Best suited for:

 * Quick experiments
 * Debugging scripts and models
 * Educational use cases
+* Batch-scheduled experiments where one submitted job should run the complete
+  federated workflow and then exit

 **Arguments:**

 * ``num_clients`` (int): Number of simulated clients
 * ``clients``: A list of client names (length needs to match ``num_clients`` if both are provided)
-* ``num_threads``: Number of threads to use to run simulated clients
+* ``num_threads``: Number of concurrent simulated client worker processes
 * ``gpu_config`` (str): List of GPU device IDs, comma separated
 * ``log_config`` (str): Log config mode (``'concise'``, ``'full'``, ``'verbose'``), filepath, or level
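
The argument rule above (the ``clients`` list length must match ``num_clients`` when both are provided) can be sketched as a small validator. This is a hypothetical illustration, not the real ``SimEnv`` code, and the generated default client names are made up.

```python
# Hypothetical validator for the SimEnv argument rule stated above; NOT the
# real SimEnv implementation. Default client names are invented for the demo.

def resolve_clients(num_clients=None, clients=None):
    """Return the final client-name list, generating names when only a count is given."""
    if clients is not None:
        if num_clients is not None and len(clients) != num_clients:
            raise ValueError("len(clients) must match num_clients")
        return list(clients)
    return [f"site-{i + 1}" for i in range(num_clients or 0)]

print(resolve_clients(clients=["hospital-a", "hospital-b"]))
# ['hospital-a', 'hospital-b']
```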

docs/user_guide/nvflare_cli/fl_simulator.rst

Lines changed: 39 additions & 29 deletions

@@ -11,10 +11,11 @@ The FL Simulator is a lightweight simulator of a running NVFLARE FL deployment,
 and it can allow researchers to test and debug their application without
 provisioning a real project.

-The FL jobs run on a server and
-multiple clients in the same process but in a similar way to how it would run
-in a real deployment so researchers can more quickly build out new components
-and jobs that can then be directly used in a real production deployment.
+The FL jobs run on a local simulator-managed server and simulated clients,
+without provisioning a real project or starting long-running server/client
+daemons. Use the simulator for single-machine development, tests, and
+batch-scheduled experiments where one Python command should start, run, and
+exit. Use POC or production modes for the provisioned deployment model.

 ***********************
 Command Usage
@@ -846,56 +847,65 @@ application run.
 Processes, Clients, and Events
 ******************************

-Specifying number of processes
-==============================
-The simulator ``-t`` option provides the ability to specify how many processes to run the simulator with.
+Specifying Client Worker Processes
+==================================
+The simulator ``-t`` option provides the ability to specify how many simulated
+client worker processes can run concurrently.

 .. note::

-   The ``-t`` and ``--threads`` option for simulator was originally due to clients running in separate threads.
-   However each client now actually runs in a separate process. This distinction will not affect the user experience.
+   The ``-t`` and ``--threads`` option name is historical. Simulated client
+   execution now uses separate worker processes, and the option controls worker
+   process concurrency.

 - N = number of clients (``-n``)
-- T = number of processes (``-t``)
+- T = number of concurrent client worker processes (``-t``)

-When running the simulator with fewer processes than clients (T < N)
-the simulator will need to swap-in/out the clients for the processes, resulting in some of the clients running sequentially as processes are available.
-This also will cause the ClientRunner/learner objects to go through setup and teardown in every round.
-Using T < N is only needed when trying to simulate of large number of clients using a single machine with limited resources.
+When running the simulator with fewer worker processes than clients (T < N),
+the simulator swaps clients in and out as worker processes become available.
+This also causes the ClientRunner/learner objects to go through setup and
+teardown in every round. Using T < N is only needed when simulating many clients
+on a single machine with limited resources.

-In most cases, run the simulator with the same number of processes as clients (T = N). The simulator will run the number of clients in separate processes at the same time. Each
-client will always be running in memory with no swap-in/out, but it will require more resources available.
+In most cases, run the simulator with the same number of worker processes as
+clients (T = N). Each client stays in memory with no swap-in/out, but this
+requires more available resources.

 For the dataset / tensorboard initialization, you could make use of EventType.SWAP_IN and EventType.SWAP_OUT
 in the application.

 SWAP_IN and SWAP_OUT events
 ===========================
-During FLARE simulator execution, the client Apps are executed in turn in the same execution thread. Each executing client App will go
-fetch the task from the controller on the server, execute the task, and then submit the task results to the controller. Once finished submitting
-results, the current client App will yield the executing thread to the next client App to execute.
+During FLARE simulator execution, simulated client Apps fetch tasks from the
+controller, execute the tasks, and submit results back to the controller. When
+T < N, multiple simulated clients share a smaller pool of worker processes and
+may be swapped in and out as worker processes become available.

-If the client App needs to preserve some states for the next "execution turn" to continue, the client executor can make use of the ``SWAP_OUT``
-event fired by the simulator engine to save the current states. When the client App gets the turn to execute again, use the ``SWAP_IN``
-event to recover the previous saved states.
+If the client App needs to preserve state for the next execution turn, the
+client executor can use the ``SWAP_OUT`` event fired by the simulator engine to
+save the current state. When the client App gets another turn to execute, use
+the ``SWAP_IN`` event to recover the previous saved state.
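
The T < N swapping just described can be modeled as a toy scheduler: clients share a smaller pool of worker slots, saving state at SWAP_OUT and restoring it at SWAP_IN. This is a hedged sketch of the behavior, not the simulator engine's actual code.

```python
# Toy model of T < N scheduling: N simulated clients share T worker slots,
# saving state at SWAP_OUT and restoring it at SWAP_IN. Illustrative only,
# not the simulator engine's actual implementation.

def run_rounds(clients, num_workers, num_rounds):
    """Run clients in groups of num_workers per round, persisting state across swaps."""
    saved = {c: 0 for c in clients}                    # state kept across swaps
    for _ in range(num_rounds):
        for i in range(0, len(clients), num_workers):
            for client in clients[i:i + num_workers]:  # swapped-in batch
                state = saved[client]                  # SWAP_IN: restore state
                state += 1                             # the client's training turn
                saved[client] = state                  # SWAP_OUT: save state
    return saved

print(run_rounds(["c1", "c2", "c3"], num_workers=2, num_rounds=3))
# {'c1': 3, 'c2': 3, 'c3': 3}
```

With T = N the inner grouping degenerates to a single batch per round, which is the no-swap case recommended above.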

 Multi-GPU and Separate Client Process with Simulator
 ====================================================
-The simulator runs within the same process, and it will make use of a single GPU if it is detected with ``nvidia-smi``.
-If there are multiple GPUs available and you want to make use of them all for the simulator run, you can use the
-``-gpu`` option for this. The ``-gpu`` option provides the list of GPUs for the simulator to run on. The
-clients list will be distributed among the GPUs.
+The simulator uses separate client worker processes and assigns GPUs to those
+workers. If there are multiple GPUs available and you want to make use of them
+all for the simulator run, you can use the ``-gpu`` option for this. The
+``-gpu`` option provides the list of GPUs for the simulator to run on. The
+clients list will be distributed among the GPU groups.

 For example:

 .. code-block:: shell

    -c c1,c2,c3,c4,c5 -gpu 0,1

-The clients c1, c3, and c5 will run on GPU 0 in one process, and clients c2 and c4 will run on GPU 1 in another process.
+The clients c1, c3, and c5 will be assigned to GPU 0, and clients c2 and c4
+will be assigned to GPU 1.

-The GPU numbers do not have to be unique. If you use ``-gpu 0,0``, this will run 2 separate client processes on GPU 0, assuming this GPU will have
-enough memory to support the applications.
+The GPU numbers do not have to be unique. If you use ``-gpu 0,0``, this will
+create two client worker slots assigned to GPU 0, assuming this GPU has enough
+memory to support the applications.

 .. note::
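
The client-to-GPU distribution in the example above (c1, c3, c5 on GPU 0; c2, c4 on GPU 1) is consistent with simple round-robin assignment over the ``-gpu`` list. A hypothetical sketch of that rule, not the simulator's actual assignment code:

```python
# Hypothetical round-robin assignment consistent with the example above
# (-c c1,c2,c3,c4,c5 -gpu 0,1 => c1, c3, c5 on GPU 0 and c2, c4 on GPU 1).
# Not the simulator's actual assignment code.

def assign_clients_to_gpus(clients, gpus):
    """Distribute clients round-robin over the -gpu list; duplicate ids allowed."""
    slots = [[] for _ in gpus]                   # one worker slot per -gpu entry
    for idx, client in enumerate(clients):
        slots[idx % len(gpus)].append(client)
    return list(zip(gpus, slots))

print(assign_clients_to_gpus(["c1", "c2", "c3", "c4", "c5"], ["0", "1"]))
# [('0', ['c1', 'c3', 'c5']), ('1', ['c2', 'c4'])]
```

Keying slots by position rather than by GPU id is what lets ``-gpu 0,0`` create two distinct worker slots on the same device.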

examples/tutorials/flare_simulator.ipynb

Lines changed: 1 addition & 1 deletion

@@ -7,7 +7,7 @@
 "source": [
 "## Intro to the FL Simulator\n",
 "\n",
-"The [FL Simulator](https://nvflare.readthedocs.io/en/latest/user_guide/nvflare_cli/fl_simulator.html) runs a local simulation of a running NVFLARE FL deployment. This allows researchers to test and debug an application without provisioning a real, distributed FL project. The FL Simulator runs a server and multiple clients in the same local process, with communication that mimics a real deployment. This allows researchers to more quickly build out new components and jobs that can be directly used in a production deployment.\n",
+"The [FL Simulator](https://nvflare.readthedocs.io/en/latest/user_guide/nvflare_cli/fl_simulator.html) runs the same job concepts with a local simulator-managed server and simulated clients. It is useful for local development and batch jobs that should start, run, and exit without provisioned daemons.\n",
 "\n",
 "### Setup\n",
 "The NVFlare [Getting Started Guide](https://nvflare.readthedocs.io/en/main/getting_started.html) provides instructions for setting up NVFlare on a local system or in a Docker image. We've also cloned the NVFlare GitHub in our top-level working directory."

examples/tutorials/job_recipe.ipynb

Lines changed: 4 additions & 3 deletions

@@ -185,16 +185,17 @@
 "source": [
 "### SimEnv – Simulation Environment\n",
 "\n",
-"Runs all clients and the server as **threads** within a single process. This is lightweight and easy to set up; no networking required. Best suited for:\n",
+"Runs the job with the local FL simulator backend: no provisioned project or long-running server/client daemons. Simulated clients use local worker processes; `num_threads` is the historical name for the worker-process concurrency. Best suited for:\n",
 "\n",
 "* Quick experiments\n",
 "* Debugging scripts and models\n",
 "* Educational use cases\n",
+"* Batch-scheduled experiments where one submitted job should run the complete federated workflow and then exit\n",
 "\n",
 "**Arguments:**\n",
 "* `num_clients` (int): number of simulated clients\n",
 "* `clients`: a list of client names (length needs to match num_clients if both are provided)\n",
-"* `num_threads`: number of threads to use to run simulated clients\n",
+"* `num_threads`: number of concurrent simulated client worker processes\n",
 "* `gpu_config` (str): list of GPU Device Ids, comma separated\n",
 "* `log_config` (str): \"log config mode ('concise', 'full', 'verbose'), filepath, or level\"\n",
 "\n",

@@ -468,4 +469,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 5
-}
+}
