diff --git a/.gitignore b/.gitignore
index 37834b7..62f8036 100644
--- a/.gitignore
+++ b/.gitignore
@@ -6,3 +6,6 @@ experiments/*/
 
 # JetBrains IDE idea directories
 .idea/
+
+# Documentation
+docs/build/
diff --git a/docs/Makefile b/docs/Makefile
new file mode 100644
index 0000000..d0c3cbf
--- /dev/null
+++ b/docs/Makefile
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS ?=
+SPHINXBUILD ?= sphinx-build
+SOURCEDIR = source
+BUILDDIR = build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
diff --git a/docs/docs.yml b/docs/docs.yml
new file mode 100644
index 0000000..fb6df6a
--- /dev/null
+++ b/docs/docs.yml
@@ -0,0 +1,12 @@
+name: synapse-docs
+
+channels:
+  - conda-forge
+  - nodefaults
+
+dependencies:
+  - myst-parser
+  - sphinx
+  - sphinx-autobuild
+  - sphinx-book-theme
+  - sphinx-copybutton
diff --git a/docs/make.bat b/docs/make.bat
new file mode 100644
index 0000000..747ffb7
--- /dev/null
+++ b/docs/make.bat
@@ -0,0 +1,35 @@
+@ECHO OFF
+
+pushd %~dp0
+
+REM Command file for Sphinx documentation
+
+if "%SPHINXBUILD%" == "" (
+	set SPHINXBUILD=sphinx-build
+)
+set SOURCEDIR=source
+set BUILDDIR=build
+
+%SPHINXBUILD% >NUL 2>NUL
+if errorlevel 9009 (
+	echo.
+	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
+	echo.installed, then set the SPHINXBUILD environment variable to point
+	echo.to the full path of the 'sphinx-build' executable. Alternatively you
+	echo.may add the Sphinx directory to PATH.
+	echo.
+	echo.If you don't have Sphinx installed, grab it from
+	echo.https://www.sphinx-doc.org/
+	exit /b 1
+)
+
+if "%1" == "" goto help
+
+%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+goto end
+
+:help
+%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+
+:end
+popd
diff --git a/docs/source/conf.py b/docs/source/conf.py
new file mode 100644
index 0000000..27d7354
--- /dev/null
+++ b/docs/source/conf.py
@@ -0,0 +1,30 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# For the full list of built-in configuration values, see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Project information -----------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
+
+project = "Synapse"
+copyright = "BSD-3-Clause-LBNL"
+author = "Arjun Dhamrait, Andrea Diaz, Marco Garten, Axel Huebl, Revathi Jambunathan, Remi Lehe, Ethan Rodriguez, Olga Shapoval, Jean-Luc Vay, Edoardo Zoni"
+
+# -- General configuration ---------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
+
+extensions = ["myst_parser", "sphinx_copybutton"]
+myst_heading_anchors = 2
+
+templates_path = ["_templates"]
+exclude_patterns = []
+
+# -- Options for HTML output -------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
+
+html_theme = "sphinx_book_theme"
+html_theme_options = {
+    "show_navbar_depth": 1,
+    "max_navbar_depth": 1,
+}
+html_static_path = []
diff --git a/docs/source/dashboard.md b/docs/source/dashboard.md
new file mode 100644
index 0000000..7b8bd98
--- /dev/null
+++ b/docs/source/dashboard.md
@@ -0,0 +1,27 @@
+# Dashboard
+
+The dashboard is a Trame application rooted at `dashboard/app.py`.
+It discovers experiments from `experiments/synapse-*`, reads each experiment's `config.yaml`, connects to MongoDB, loads MLflow models, and builds the GUI used to inspect data and launch jobs.
+
+## Main Managers
+
+- `state_manager.py`: shared Trame server, state, controller, and startup defaults.
+- `model_manager.py`: MLflow model lookup, download, evaluation, and model training launch.
+- `parameters_manager.py`: input sliders, parameter bounds, and single-simulation launch.
+- `outputs_manager.py`: displayed output selection.
+- `optimization_manager.py`: model-based input optimization with SciPy.
+- `calibration_manager.py`: simulation-to-experiment variable conversion.
+- `sfapi_manager.py`: Superfacility API credential upload, Perlmutter status, and job monitoring.
+- `error_manager.py`: user-visible error collection.
+- `utils.py`: config loading, database access, date filters, and Plotly figures.
+
+## Views
+
+- `/`: experiment selection, plots, parameter controls, optimization, ML controls, calibration controls, and errors.
+- `/hpc`: NERSC Superfacility API credential and Perlmutter status panel.
+- `/chat`: embedded assistant route for experiment support; currently backed by [synapse-chat.lbl.gov](https://synapse-chat.lbl.gov/).
+
+## NERSC Credentials
+
+Simulation and ML training launches require a Superfacility API key file uploaded through the dashboard.
+The file must be PEM-formatted and include the Superfacility API client ID as the first line, followed by the private key.
diff --git a/docs/source/data-model.md b/docs/source/data-model.md
new file mode 100644
index 0000000..b792444
--- /dev/null
+++ b/docs/source/data-model.md
@@ -0,0 +1,29 @@
+# Data Model
+
+MongoDB stores experiment and simulation records.
+The collection name should match `experiment` in `config.yaml`.
+
+## Record Types
+
+- `experiment_flag: 1`: experimental data.
+- `experiment_flag: 0`: simulation data.
+
+## Required Fields
+
+Each record must contain fields for the configured input and output variables, named to match `config.yaml`.
+
+Simulation records may use simulation-space names when `simulation_calibration` maps them back to experimental names.
+
+## Optional Fields
+
+The dashboard uses these when present:
+
+- `date`: filtering and hover text for experimental records.
+- `scan_number`: hover text.
+- `shot_number`: hover text.
+- `_id`: hover text and lookup for linked simulation media, such as MP4 files described in [Simulation Outputs](simulations.md#simulation-outputs).
+
+## Date Filtering
+
+Dashboard date filtering applies only to experimental records.
+Simulation records are loaded without the date filter.
diff --git a/docs/source/deployment.md b/docs/source/deployment.md
new file mode 100644
index 0000000..c8c630a
--- /dev/null
+++ b/docs/source/deployment.md
@@ -0,0 +1,34 @@
+# Deployment
+
+Synapse is deployed with Docker images and NERSC services.
+
+## Dashboard Image
+
+From the repository root:
+
+```bash
+docker build --platform linux/amd64 --output type=image,oci-mediatypes=true -t synapse-gui -f dashboard.Dockerfile .
+```
+
+## ML Image
+
+From the repository root:
+
+```bash
+docker build --platform linux/amd64 --output type=image,oci-mediatypes=true -t synapse-ml -f ml.Dockerfile .
+```
+
+The two build commands differ only by image tag and Dockerfile.
+
+## Publish Helper
+
+```bash
+python publish_container.py --gui --ml
+```
+
+## NERSC Assumptions
+
+- Dashboard runs on Spin.
+- Training and simulations run on Perlmutter through Superfacility API.
+- Images are pushed to `registry.nersc.gov/m558/superfacility`.
+- Before publishing, validate locally and, when possible, against a staging Spin deployment.
diff --git a/docs/source/developer-notes.md b/docs/source/developer-notes.md
new file mode 100644
index 0000000..43d859f
--- /dev/null
+++ b/docs/source/developer-notes.md
@@ -0,0 +1,34 @@
+# Developer Notes
+
+## Style
+
+Python code is linted and formatted with Ruff through pre-commit:
+
+```bash
+pre-commit run --files
+```
+
+Ruff runs with its default rule set; there is no `pyproject.toml` or `ruff.toml` to override it.
+
+## Environments
+
+- Dashboard dependencies live in `dashboard/environment.yml`.
+- ML dependencies live in `ml/environment.yml`.
+- Regenerate the matching `environment-lock.yml` after dependency changes.
+
+## Testing
+
+The project does not have a full pytest suite.
+The main integration check is:
+
+```bash
+python tests/test_ml_pipeline.py
+```
+
+It requires a local MLflow server.
+
+## Patterns
+
+- Dashboard features use manager classes in `dashboard/*_manager.py`.
+- Experiment-specific behavior belongs under `experiments/synapse-*`.
+- Shared dashboard helpers live in `dashboard/utils.py`.
diff --git a/docs/source/experiment-configuration.md b/docs/source/experiment-configuration.md
new file mode 100644
index 0000000..20e506f
--- /dev/null
+++ b/docs/source/experiment-configuration.md
@@ -0,0 +1,57 @@
+# Experiment Configuration
+
+An experiment is a directory named `experiments/synapse-/`.
+The dashboard strips `synapse-` and uses the rest as the experiment identifier.
+
+Each experiment should provide:
+
+- `config.yaml`
+- optional `simulation_scripts/`
+- optional `experiment_scripts/`
+
+## Required Config Sections
+
+- `experiment`: collection and model namespace, for example `bella-ip2`.
+- `database`: MongoDB connection and credential environment variables.
+- `mlflow`: tracking URI and optional API key environment variable.
+- `execution_mode`: ML training and simulation mode hints.
+- `inputs`: scalar variables with `name`, `type`, `default`, and `value_range`.
+- `outputs`: scalar variables with `name` and `type`.
+
+## Calibration
+
+`simulation_calibration` maps simulation variable names to experimental variable names:
+
+```yaml
+simulation_calibration:
+  input1:
+    name: "simulation_variable"
+    unit: "unit"
+    depends_on: "experimental_variable"
+    alpha_guess: 1.0
+    alpha_uncertainty: 0.1
+    beta_guess: 0.0
+    beta_uncertainty: 0.0
+```
+
+Dashboard display uses:
+
+```text
+experimental = simulation / alpha + beta
+```
+
+Simulation launch uses:
+
+```text
+simulation = alpha * (experimental - beta)
+```
+
+These are inverse conversions: display maps simulation to experimental units, while launch maps dashboard parameters back to simulation units.
+
+## Add an Experiment
+
+1. Clone or create `experiments/synapse-/`.
+2. Add `config.yaml`.
+3. Ensure MongoDB fields match the configured input and output variable names.
+4. Add `simulation_scripts/` only if dashboard launch is needed.
+5. Train and register a model if dashboard predictions are needed.
diff --git a/docs/source/getting-started.md b/docs/source/getting-started.md
new file mode 100644
index 0000000..0cc3c4b
--- /dev/null
+++ b/docs/source/getting-started.md
@@ -0,0 +1,40 @@
+# Getting Started
+
+For a reproducible install, use `environment-lock.yml` rather than the unpinned `environment.yml`.
+
+## Dashboard
+
+From `dashboard/`:
+
+```bash
+conda-lock install --name synapse-gui environment-lock.yml
+conda activate synapse-gui
+export SF_DB_HOST='127.0.0.1'
+export SF_DB_READONLY_PASSWORD='...'
+export AM_SC_API_KEY='...'
+python -u app.py --port 8080
+```
+
+For local MongoDB access, open a tunnel first:
+
+```bash
+ssh -L 27017:mongodb05.nersc.gov:27017 @dtn03.nersc.gov -N
+```
+
+## ML Training
+
+From `ml/`:
+
+```bash
+conda-lock install --name synapse-ml environment-lock.yml
+conda activate synapse-ml
+export SF_DB_READONLY_PASSWORD='...'
+export AM_SC_API_KEY='...'
+python train_model.py --test --config_file ../experiments/synapse-bella-ip2/config.yaml --model NN
+```
+
+## Required Environment Variables
+
+- `SF_DB_HOST`: MongoDB host for the dashboard.
+- `SF_DB_READONLY_PASSWORD`: read-only MongoDB password.
+- `AM_SC_API_KEY`: American Science Cloud MLflow API key when the config uses that service.
diff --git a/docs/source/index.rst b/docs/source/index.rst
new file mode 100644
index 0000000..8b24ced
--- /dev/null
+++ b/docs/source/index.rst
@@ -0,0 +1,19 @@
+Synapse
+=======
+
+Welcome to the Synapse documentation!
+
+.. toctree::
+   :hidden:
+   :maxdepth: 1
+   :caption: Contents:
+
+   overview
+   getting-started
+   dashboard
+   data-model
+   experiment-configuration
+   simulations
+   ml-training
+   deployment
+   developer-notes
diff --git a/docs/source/ml-training.md b/docs/source/ml-training.md
new file mode 100644
index 0000000..5a0e0c4
--- /dev/null
+++ b/docs/source/ml-training.md
@@ -0,0 +1,52 @@
+# ML Training
+
+Model training is implemented in `ml/train_model.py`.
+It reads config and MongoDB records, trains a model, wraps it with `lume-model`, and optionally registers it in MLflow.
+
+## Model Types
+
+Use `--model` with one of:
+
+- `GP`: Gaussian Process.
+- `NN`: single neural network.
+- `ensemble_NN`: ensemble neural network. The current ensemble size is defined in `train_nn_ensemble()` in `ml/train_model.py`.
+
+## Command
+
+```bash
+python train_model.py --config_file ../experiments/synapse-bella-ip2/config.yaml --model NN
+```
+
+Use `--test` to skip MLflow registration.
+
+## Phases
+
+1. Load config, variables, database records, and MLflow settings.
+2. Build calibration and normalization transforms.
+3. Train on simulation data.
+4. Train [calibration](experiment-configuration.md#calibration) on experimental data when available.
+5. Build a `lume-model`.
+6. Register to MLflow unless `--test` is set.
+
+## MLflow Names
+
+Registered models use:
+
+```text
+synapse-_
+```
+
+The MLflow experiment is:
+
+```text
+synapse-
+```
+
+## Pipeline Test
+
+Run against a local MLflow server:
+
+```bash
+docker run -p 127.0.0.1:5000:5000 ghcr.io/mlflow/mlflow mlflow server --host 0.0.0.0
+python tests/test_ml_pipeline.py
+```
diff --git a/docs/source/overview.md b/docs/source/overview.md
new file mode 100644
index 0000000..c5c5994
--- /dev/null
+++ b/docs/source/overview.md
@@ -0,0 +1,40 @@
+# Overview
+
+Synapse is a modular framework for connecting experimental data, simulation data, and machine learning models for digital twin workflows.
+
+The main source areas are:
+
+- `dashboard/`: a Trame web application for exploring experiments, simulations, model predictions, optimization, calibration, and NERSC job controls.
+- `ml/`: model training code for Gaussian Process, single Neural Network, and Neural Network ensemble models.
+- `experiments/`: experiment-specific configuration and scripts, usually cloned from private repositories.
+- `tests/`: integration checks for the ML pipeline.
+- `docs/`: Sphinx documentation source.
+
+The typical workflow is:
+
+1. Add or update an experiment repository under `experiments/synapse-/`.
+2. Define `config.yaml` with database, MLflow, input, output, and optional calibration settings.
+3. Load experiment and simulation points from MongoDB.
+4. Train a model from simulation data, optionally calibrating against experimental data.
+5. Register the trained model in MLflow.
+6. Use the dashboard to visualize data, query the model, optimize inputs, and launch NERSC jobs.
+
+## Services
+
+Synapse currently assumes these external services:
+
+- MongoDB for experiment and simulation records.
+- MLflow for registered model storage.
+- NERSC Spin for dashboard deployment.
+- NERSC Superfacility API for Perlmutter jobs.
+- NERSC container registry for dashboard and ML images.
+
+## Repository Map
+
+```text
+dashboard/     Trame GUI and dashboard managers
+ml/            ML training script, model classes, Perlmutter batch template
+experiments/   Experiment configs and experiment-owned scripts
+tests/         End-to-end ML pipeline helpers
+docs/          Sphinx documentation source
+```
diff --git a/docs/source/simulations.md b/docs/source/simulations.md
new file mode 100644
index 0000000..e0a90ca
--- /dev/null
+++ b/docs/source/simulations.md
@@ -0,0 +1,44 @@
+# Simulations
+
+Synapse treats simulation support as experiment-owned code.
+The dashboard only knows where to find scripts and how to submit a job through the NERSC Superfacility API.
+
+## Directory Layout
+
+For dashboard-triggered single simulations, an experiment may provide:
+
+```text
+experiments/synapse-/simulation_scripts/
+  submission_script_single
+  templates/
+    ...
+```
+
+If `submission_script_single` exists, the dashboard enables the `Simulate` button.
+Before submission, it writes the current dashboard parameters to `single_simulation_parameters.yaml` after converting experimental variables to simulation variables.
+
+## Submission Flow
+
+1. User uploads valid Superfacility API credentials.
+2. Dashboard checks Perlmutter status.
+3. User clicks `Simulate`.
+4. Files from `simulation_scripts/templates/` and the generated parameter YAML are uploaded to:
+
+   ```text
+   /global/cfs/cdirs/m558/superfacility/simulation_running//templates
+   ```
+
+5. The dashboard reads `submission_script_single` and submits it through Superfacility API.
+6. Job status is polled until a terminal state, such as completed, failed, or cancelled.
+
+## Parameter Scans
+
+Some experiment repositories also include `submission_script_multi` or custom scan scripts.
+These are experiment-specific and are usually run manually on Perlmutter.
+
+## Simulation Outputs
+
+Simulation records should be written to the experiment's MongoDB collection with `experiment_flag: 0`.
+Field names should match either the experiment config outputs or the configured simulation calibration variable names.
+
+Some dashboards may link simulation records to MP4 files stored on the Perlmutter shared file system, but this behavior is experiment-specific.
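
The field-name rule for simulation records stated in `docs/source/simulations.md` can be checked with a few lines of Python. This is a minimal sketch kept outside the patch, not code from the Synapse repository. The helper `missing_output_fields`, the sample variable names (`charge`, `energy`, `sim_energy`), the `float` type strings, and the assumption that the loaded config is a dict with an `outputs` list and a `simulation_calibration` mapping (entries carrying `name` and `depends_on`) are all hypothetical, inferred only from the keys documented in `experiment-configuration.md`.

```python
def missing_output_fields(record, config):
    """Return configured output names that a simulation record does not cover.

    An output counts as covered if the record has a field with the
    experimental name listed under `outputs`, or a field with the
    simulation-space name that `simulation_calibration` maps to it.
    """
    # Hypothetical config layout: experimental name -> simulation-space name.
    aliases = {
        entry["depends_on"]: entry["name"]
        for entry in config.get("simulation_calibration", {}).values()
    }
    missing = []
    for output in config["outputs"]:
        name = output["name"]
        # aliases.get(name) is None when no calibration entry exists,
        # and None is never a key of the record.
        if name in record or aliases.get(name) in record:
            continue
        missing.append(name)
    return missing


# Illustrative data only; these names do not come from a real experiment.
config = {
    "outputs": [
        {"name": "charge", "type": "float"},
        {"name": "energy", "type": "float"},
    ],
    "simulation_calibration": {
        "output1": {"name": "sim_energy", "depends_on": "energy"},
    },
}
record = {"experiment_flag": 0, "sim_energy": 1.2}

print(missing_output_fields(record, config))  # prints ['charge']
```

Here `energy` is satisfied through its simulation-space alias `sim_energy`, while `charge` is reported because the record carries neither name. A check like this could run before inserting simulation records into MongoDB, under the same assumptions about the config layout.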