Update README files

EZoni · EZoni · commit b0bbfbe7d7b2 · 2026-02-03T20:28:01.000-08:00
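
Both READMEs touched by this commit carry the comment `# Use SINGLE quotes around the password!` next to the `export` lines. The reason is not spelled out there, so here is a minimal sketch of the shell behavior behind it. The variable `EXAMPLE_SUFFIX` and the password value are made up for illustration and do not come from the repository.

```bash
#!/usr/bin/env bash
# Illustration only: EXAMPLE_SUFFIX and the password value are hypothetical.
EXAMPLE_SUFFIX='oops'

# Single quotes: the shell keeps every character literally.
quoted_single='s3cret$EXAMPLE_SUFFIX'

# Double quotes: the shell expands $EXAMPLE_SUFFIX before assigning.
quoted_double="s3cret$EXAMPLE_SUFFIX"

printf 'single: %s\n' "$quoted_single"   # single: s3cret$EXAMPLE_SUFFIX
printf 'double: %s\n' "$quoted_double"   # double: s3cretoops
```

A password containing `$` would therefore be altered if it were placed in double quotes, which is presumably why the READMEs insist on single quotes.
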
diff --git a/dashboard/README.md b/dashboard/README.md
@@ -1,121 +1,107 @@
-## Dashboard
+# Dashboard
 
-Here are a few how-to guides on how to develop and use the dashboard.
+This guide contains important instructions on how to use and develop the Synapse dashboard.
 
-### Prerequisites
-- Ensure you have Conda installed.
-- Ensure you have Docker installed (if you plan to use Docker).
+## Prerequisites
 
-### How to create a new conda environment lock file
+Make sure you have installed [conda](https://docs.conda.io/) and [Docker](https://docs.docker.com/).
 
-1. Activate the `base` conda environment:
-    ```console
-    conda activate base
-    ```
+## How to generate the conda environment lock file
 
-2. Install `conda-lock` (if not already installed):
-    ```console
-    conda install -c conda-forge conda-lock
-    ```
+A new conda environment lock file can be generated by running the following commands:
 
-3. Create the lock file starting from the existing minimal environment file:
-    ```console
-    conda-lock --file environment.yml --lockfile environment-lock.yml
-    ```
+```bash
+conda activate base
+conda install -c conda-forge conda-lock  # if conda-lock is not installed
+conda-lock --file environment.yml --lockfile environment-lock.yml
+```
 
-### How to set up the conda environment
+## How to create the conda environment
 
-1. Activate the `base` conda environment:
-    ```console
-    conda activate base
-    ```
+The conda environment defined by the lock file can be created by running the following commands:
 
-2. Install `conda-lock` (if not already installed):
-    ```console
-    conda install -c conda-forge conda-lock
-    ```
-
-3. Create the `synapse-gui` conda environment from the lock file:
-    ```console
-    conda-lock install --name synapse-gui environment-lock.yml
-    ```
+```bash
+conda activate base
+conda install -c conda-forge conda-lock  # if conda-lock is not installed
+conda-lock install --name synapse-gui environment-lock.yml
+```
 
-### How to run the GUI
+## How to run the dashboard
 
-1. Activate the `synapse-gui` conda environment:
-    ```console
+1. Activate the conda environment:
+    ```bash
     conda activate synapse-gui
     ```
 
 2. Set the database settings (read only):
-    ```console
+    ```bash
     export SF_DB_HOST='127.0.0.1'
     export SF_DB_READONLY_PASSWORD='your_password_here'  # Use SINGLE quotes around the password!
     ```
 
-3. For local development, open a separate terminal and keep it open while SSH forwarding the database connection:
-    ```console
+3. Open a separate terminal and keep it open while SSH forwarding the database connection:
+    ```bash
     ssh -L 27017:mongodb05.nersc.gov:27017 <username>@dtn03.nersc.gov -N
     ```
 
-4. Run the GUI from the `dashboard/` folder:
+4. Run the dashboard from the `dashboard/` folder:
     - Via the web browser interface:
-    ```console
+    ```bash
     python -u app.py --port 8080
     ```
     - As a desktop application:
-    ```console
+    ```bash
     python -u app.py --app
     ```
-    If you run the GUI as a desktop application, make sure to set the following environment variable first:
-    ```console
+    If you run the dashboard as a desktop application, make sure to install `pywebview` with Qt support and set the following environment variable first:
+    ```bash
     python -m pip install pywebview[qt]
     export PYWEBVIEW_GUI=qt
     ```
 
-5. Terminate the GUI via `Ctrl` + `C`.
+5. Terminate the application via `Ctrl` + `C`.
 
-### How to build and run the Docker container
+## How to build and run the Docker container
 
 1. Move to the root directory of the repository.
 
 2. Build the Docker image based on `Dockerfile`:
-    ```console
+    ```bash
     docker build --platform linux/amd64 -t synapse-gui -f dashboard.Dockerfile .
     ```
 
-3. Run the Docker container from the `dashboard/` folder:
-    ```console
+3. Run the Docker container:
+    ```bash
     docker run --network=host -v /etc/localtime:/etc/localtime -v $PWD/ml:/app/ml -e SF_DB_HOST='127.0.0.1' -e SF_DB_READONLY_PASSWORD='your_password_here' synapse-gui
     ```
     For debugging, you can also enter the container without starting the app:
-    ```console
+    ```bash
     docker run --network=host -v /etc/localtime:/etc/localtime -v $PWD/ml:/app/ml -e SF_DB_HOST='127.0.0.1' -e SF_DB_READONLY_PASSWORD='your_password_here' -it synapse-gui bash
     ```
     Note that `-v /etc/localtime:/etc/localtime` is necessary to synchronize the time zone in the container with the host machine.
 
-4. Optional: Publish the container privately to NERSC registry (https://registry.nersc.gov):
-    ```console
+4. (Optional) Publish the container privately to the [NERSC registry](https://registry.nersc.gov):
+    ```bash
     docker login registry.nersc.gov
     # Username: your NERSC username
     # Password: your NERSC password without 2FA
     ```
-    ```console
+    ```bash
     docker tag synapse-gui:latest registry.nersc.gov/m558/superfacility/synapse-gui:latest
     docker tag synapse-gui:latest registry.nersc.gov/m558/superfacility/synapse-gui:$(date "+%y.%m")
     docker push -a registry.nersc.gov/m558/superfacility/synapse-gui
     ```
-    This has been also automated through the Python script [publish_container.py](https://github.com/BLAST-AI-ML/synapse/blob/main/publish_container.py), which can be executed via
-    ```console
+    This has also been automated through the Python script [publish_container.py](../publish_container.py), which can be executed via
+    ```bash
     python publish_container.py --gui
     ```
 
-5. Optional: From time to time, as you develop the container, you might want to prune old, unused images to get back GBytes of storage on your development machine:
-    ```console
+5. (Optional) As you develop the container, you might want to prune old, unused images periodically in order to free space on your development machine:
+    ```bash
     docker system prune -a
     ```
 
-### How to get the Superfacility API credentials
+## How to get the Superfacility API credentials
 
 Following the instructions at [docs.nersc.gov/services/sfapi/authentication/#client](https://docs.nersc.gov/services/sfapi/authentication/#client):
 
@@ -127,8 +113,8 @@ Following the instructions at [docs.nersc.gov/services/sfapi/authentication/#cli
 
 4. Enter a client name (e.g., "Synapse"), choose `sf558` for the user, choose "Red" security level, and select either "Your IP" or "Spin" from the "IP Presets" menu, depending on whether the key will be used from a local computer or from Spin.
 
-5. Download the private key file (in pem format) and save it as `priv_key.pem` in the root directory of the GUI.
-   Each time the GUI is launched, it will automatically find the existing key file and load the corresponding credentials.
+5. Download the private key file (in pem format) and save it as `priv_key.pem` in the root directory of the dashboard.
+   Each time the dashboard is launched, it will automatically find the existing key file and load the corresponding credentials.
 
 6. Copy your client ID and add it on the first line of your private key file as described in the instructions at [nersc.github.io/sfapi_client/quickstart/#storing-keys-in-files](https://nersc.github.io/sfapi_client/quickstart/#storing-keys-in-files):
     ```
diff --git a/ml/README.md b/ml/README.md
@@ -1,104 +1,116 @@
 # ML Training
 
-The ML training (implemented in ``train_model.py``) can be run in two ways:
+This guide contains important instructions on how to train ML models within Synapse.
 
-- In your local Python environment, for testing/debugging: ``python train_model.py ...``
+## Prerequisites
 
-- Through the GUI, by clicking the ``Train`` button, or through SLURM by running ``sbatch training_pm.sbatch``.
-In both cases, the training runs in a Docker container at NERSC. This Docker container
-is pulled from the NERSC registry (https://registry.nersc.gov) and does not reflect any local changes
-you may have made to ``train_model.py``, unless you re-build and re-deploy the container.
+Make sure you have installed [conda](https://docs.conda.io/) and [Docker](https://docs.docker.com/).
 
-Both methods are described in more detail below.
+## Overview
 
-## Training in a local Python environment (testing/debugging)
+Synapse's ML training is implemented primarily in [train_model.py](train_model.py).
+ML models can be trained in two distinct ways:
 
-### On your local computer
+1. In a local Python environment, for testing and debugging.
 
-For local development, ensure you have [Conda](https://conda-forge.org/download/) installed. Then:
+2. Through the dashboard (by clicking the ``Train`` button) or through SLURM (by running ``sbatch training_pm.sbatch``).
+In both cases, the training runs in a Docker container at NERSC.
+This Docker container is pulled from the [NERSC registry](https://registry.nersc.gov) and does not reflect any local changes you may have made to [train_model.py](train_model.py), unless you re-build and re-deploy the container first.
 
-1. Create the conda environment (this only needs to be done once):
+The following sections describe these two ways of training ML models in more detail.
+
+## How to run ML training in a local Python environment
+
+### On a local computer
+
+1. Create the conda environment defined by the lock file (only once):
    ```bash
-   conda env create -f environment.yml
+   conda activate base
+   conda install -c conda-forge conda-lock  # if conda-lock is not installed
+   conda-lock install --name synapse-ml environment-lock.yml
    ```
 
-2. Open a separate terminal and keep it open:
+2. Open a separate terminal and keep it open while SSH forwarding the database connection:
    ```bash
    ssh -L 27017:mongodb05.nersc.gov:27017 <username>@dtn03.nersc.gov -N
    ```
 
-3. Activate the conda environment and setup database read-write access:
+3. Activate the conda environment:
    ```bash
    conda activate synapse-ml
+   ```
+
+4. Set the database settings (read-write):
+   ```bash
    export SF_DB_ADMIN_PASSWORD='your_password_here'  # Use SINGLE quotes around the password!
    ```
 
-4. Run the training script in test mode:
-   ```console
-   python train_model.py --test --model <NN/GP> --config_file <your_test_yaml_file>
+5. Run the ML training script in test mode:
+   ```bash
+   python train_model.py --test --model <NN/GP> --config_file <your_config_file>
    ```
 
 ### At NERSC
 
-1. Create the conda environment (this only needs to be done once):
+1. Create the conda environment defined by the lock file (only once):
    ```bash
    module load python
-   conda env create --prefix /global/cfs/cdirs/m558/$(whoami)/sw/perlmutter/synapse-ml -f environment.yml
+   conda env create --prefix /global/cfs/cdirs/m558/$(whoami)/sw/perlmutter/synapse-ml -f environment.yml  # FIXME
    ```
 
-2. Activate the environment and setup database read-write access:
+2. Activate the conda environment:
    ```bash
    module load python
    conda activate /global/cfs/cdirs/m558/$(whoami)/sw/perlmutter/synapse-ml
+   ```
+
+3. Set the database settings (read-write):
+   ```bash
+   module load python
    export SF_DB_ADMIN_PASSWORD='your_password_here'  # Use SINGLE quotes around the password!
    ```
 
-3. Run the training script in test mode:
-   ```console
-   python train_model.py --test --model <NN/GP> --config_file <your_test_yaml_file>
+4. Run the ML training script in test mode:
+   ```bash
+   python train_model.py --test --model <NN/GP> --config_file <your_config_file>
    ```
 
-## Training through the GUI or through SLURM
+## Training through the dashboard or through SLURM
 
-> **Warning:**
->
-> Pushing a new Docker container affects training jobs launched from your locally-deployed GUI,
-> but also from the production GUI (deployed on NERSC Spin), since in both cases, the training
-> runs in a Docker container at NERSC, which is pulled from the NERSC registry (https://registry.nersc.gov).
->
-> Yet, currently, this is the only way to test the end-to-end integration of the GUI with the training workflow.
+> [!WARNING]
+> Pushing a new Docker container affects training jobs launched not only from your locally deployed dashboard but also from the production dashboard (deployed at NERSC through Spin), because in both cases the ML training runs in a Docker container at NERSC, which is pulled from the [NERSC registry](https://registry.nersc.gov).
+> Currently, this is the only way to test the end-to-end integration of the dashboard with the ML training workflow.
 
 1. Move to the root directory of the repository.
 
 2. Build the Docker image based on `Dockerfile`:
-   ```console
+   ```bash
    docker build --platform linux/amd64 -t synapse-ml -f ml.Dockerfile .
    ```
 
-3. Optional: From time to time, as you develop the container, you might want to prune old, unused images to get back GBytes of storage on your development machine:
-   ```console
+3. (Optional) As you develop the container, you might want to prune old, unused images periodically in order to free space on your development machine:
+   ```bash
    docker system prune -a
    ```
 
-4. Publish the container privately to NERSC registry (https://registry.nersc.gov):
-   ```console
+4. Publish the container privately to the [NERSC registry](https://registry.nersc.gov):
+   ```bash
    docker login registry.nersc.gov
    # Username: your NERSC username
    # Password: your NERSC password without 2FA
    ```
-
-   ```console
+   ```bash
    docker tag synapse-ml:latest registry.nersc.gov/m558/superfacility/synapse-ml:latest
    docker tag synapse-ml:latest registry.nersc.gov/m558/superfacility/synapse-ml:$(date "+%y.%m")
    docker push -a registry.nersc.gov/m558/superfacility/synapse-ml
    ```
-    This has been also automated through the Python script [publish_container.py](https://github.com/BLAST-AI-ML/synapse/blob/main/publish_container.py), which can be executed via
-    ```console
+    This has also been automated through the Python script [publish_container.py](../publish_container.py), which can be executed via
+    ```bash
     python publish_container.py --ml
     ```
 
-5. Optional test: Run the Docker container manually on Perlmutter:
-   ```console
+5. (Optional) Run the Docker container manually on Perlmutter:
+   ```bash
    ssh perlmutter-p1.nersc.gov
 
    podman-hpc login --username $USER registry.nersc.gov
@@ -107,27 +119,26 @@ For local development, ensure you have [Conda](https://conda-forge.org/download/
    podman-hpc pull registry.nersc.gov/m558/superfacility/synapse-ml:latest
    ```
 
-   Ensure the file `$HOME/db.profile` contains a line `export SF_DB_ADMIN_PASSWORD=...` with the write password to the database.
+   Ensure the file `$HOME/db.profile` contains a line `export SF_DB_ADMIN_PASSWORD=...` with the read-write password to the database.
 
-   ```console
+   ```bash
    salloc -N 1 --ntasks-per-node=1 -t 1:00:00 -q interactive -C gpu --gpu-bind=single:1 -c 32 -G 1 -A m558
 
    podman-hpc run --gpu -v /etc/localtime:/etc/localtime -v $HOME/db.profile:/root/db.profile -v /path/to/config.yaml:/app/ml/config.yaml --rm -it registry.nersc.gov/m558/superfacility/synapse-ml:latest python -u /app/ml/train_model.py --test --config_file /app/ml/config.yaml --model NN
    ```
    Note that `-v /etc/localtime:/etc/localtime` is necessary to synchronize the time zone in the container with the host machine.
 
 
-> **Note:**
->
-> When we run ML training jobs through the GUI, we use NERSC's Superfacility API with the collaboration account `sf558`.
-> Since this is a non-interactive, non-user account, we also use a custom user to pull the image from https://registry.nersc.gov to Perlmutter.
-> The registry login credentials need to be prepared (once) in the `$HOME` of `sf558` (`/global/homes/s/sf558/`) in a file named `registry.profile` with the following content:
+> [!NOTE]
+> When we run ML training jobs through the dashboard, we use NERSC's Superfacility API with the collaboration account `sf558`.
+> Since this is a non-interactive, non-user account, we also use a custom user to pull the image from the [NERSC registry](https://registry.nersc.gov) to Perlmutter.
+> The registry login credentials need to be prepared (only once) in the `$HOME` of user `sf558` (`/global/homes/s/sf558/`), in a file named `registry.profile` with the following content:
 > ```bash
 > export REGISTRY_USER="robot\$m558+perlmutter-nersc-gov"
 > export REGISTRY_PASSWORD="..."
 > ```
 
 ## References
 
-* https://docs.nersc.gov/development/containers/podman-hpc/overview/
-* https://docs.nersc.gov/development/containers/registry/
+* [Podman at NERSC](https://docs.nersc.gov/development/containers/podman-hpc/overview/)
+* [Using NERSC's `registry.nersc.gov`](https://docs.nersc.gov/development/containers/registry/)
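
For reference, the `docker tag` commands in both READMEs attach two tags to each image: `latest` and a date-based tag built with `date "+%y.%m"`. The sketch below prints what those tags look like. `image_tags` is a hypothetical helper written for this note, not part of the repository, and it only prints strings without calling Docker.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print the two tags
# that the README's `docker tag` commands attach to an image.
image_tags() {
    local name="$1"
    local registry='registry.nersc.gov/m558/superfacility'
    printf '%s/%s:latest\n' "$registry" "$name"
    # date '+%y.%m' yields a two-digit year and a two-digit month, e.g. 26.02
    printf '%s/%s:%s\n' "$registry" "$name" "$(date '+%y.%m')"
}

image_tags synapse-ml
```

Because the date-based tag has month granularity, pushing twice in the same month reuses the same tag.
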