| 1 | +# Dataset viewer |
| 2 | + |
| 3 | +The dataset viewer is a small trame/VTK web application that lets |
| 4 | +you browse PLAID datasets stored on disk and inspect their samples in 3D. |
| 5 | +It ships as the `plaid-viewer` console script. |
| 6 | + |
| 7 | +## Architecture |
| 8 | + |
| 9 | +The viewer runs as a single trame server process: |
| 10 | + |
| 11 | +- `plaid.viewer.services.PlaidDatasetService` discovers datasets and |
| 12 | + loads `plaid.Sample` instances. It uses |
| 13 | + `plaid.storage.init_from_disk` to obtain `(dataset_dict, |
| 14 | + converter_dict)` and materialises a sample on demand with |
| 15 | + `converter.to_plaid(dataset, index)`, so every PLAID backend |
| 16 | + (`hf_datasets`, `cgns`, `zarr`, ...) is supported uniformly. |
| 17 | + Hugging Face Hub datasets are also supported: when a dataset id is |
| 18 | + registered as a repo id, the service dispatches to |
| 19 | + `plaid.storage.init_streaming_from_hub` instead, so samples are |
| 20 | + streamed lazily without a full local copy. |
| 21 | +- `plaid.viewer.services.ParaviewArtifactService` writes each selected |
| 22 | + sample to a CGNS file (or `.cgns.series` sidecar for time-dependent |
| 23 | + samples) in a per-process cache directory. |
| 24 | +- `plaid.viewer.trame_app.server.build_server` assembles the UI |
| 25 | + (Vuetify side drawer with dataset/split/sample selectors and display |
| 26 | + options) and a VTK pipeline: `vtkCGNSReader` → optional cut plane → |
| 27 | + optional threshold → composite-data geometry → mapper/actor. |
| 28 | + |
| 29 | +There is no separate FastAPI backend and no second port: dataset |
| 30 | +discovery, CGNS export and the 3D view are all served by trame. |
| 31 | + |
| 32 | +## Launching the viewer |
| 33 | + |
| 34 | +```bash |
| 35 | +uv run plaid-viewer --datasets-root /path/to/datasets |
| 36 | +``` |
| 37 | + |
| 38 | +Useful options: |
| 39 | + |
| 40 | +| Option | Default | Description | |
| 41 | +| ----------------- | ----------- | ------------------------------------------------------------------------------------------------ | |
| 42 | +| `--datasets-root` | *required* | Directory containing one sub-directory per PLAID dataset. A single-dataset directory also works. | |
| 43 | +| `--host` | `127.0.0.1` | Bind address for the trame HTTP server. | |
| 44 | +| `--port` | `8080` | Port exposed by the trame HTTP server. | |
| 45 | +| `--hub-repo` | `None` | Hugging Face Hub repo id (`namespace/name`) streamed via `init_streaming_from_hub`. Repeat the flag to pre-register multiple repos. | |
| 46 | + |
| 47 | +Open `http://<host>:<port>/` in your browser. |
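
A repeatable flag such as `--hub-repo` is commonly modelled with `argparse` and `action="append"`. The sketch below is only an illustration of that pattern under the defaults listed in the table; `build_parser` is a hypothetical helper, `--datasets-root` is left out for brevity, and nothing here is taken from the actual `plaid-viewer` command-line code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser modelling three of the options listed above."""
    parser = argparse.ArgumentParser(prog="plaid-viewer")
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=8080)
    # action="append" collects every occurrence of the flag into a list;
    # the value stays None when the flag is never given.
    parser.add_argument("--hub-repo", action="append", default=None)
    return parser


args = build_parser().parse_args(
    ["--hub-repo", "PLAID-lib/VKI-LS59", "--hub-repo", "PLAID-lib/Rotor37"]
)
print(args.hub_repo)
```

With this pattern each occurrence of the flag adds one repo id to `args.hub_repo`, in the order given on the command line.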
| 48 | + |
| 49 | +### Streaming from the Hugging Face Hub |
| 50 | + |
| 51 | +Hub datasets can be added at launch time with `--hub-repo` or from the |
| 52 | +running UI through the **Hub** tab in the side drawer (the drawer now |
| 53 | +groups the local datasets root and the Hugging Face repo input under a |
| 54 | +`Local / Hub` tab selector, hidden when `--disable-root-change` is set). |
| 55 | +Each registered repo shows up as a removable chip and as a new entry in |
| 56 | +the **Dataset** dropdown. Samples are loaded on demand through |
| 57 | +`plaid.storage.init_streaming_from_hub`, so only the selected sample's |
| 58 | +shards are fetched. |
| 59 | + |
| 60 | +```bash |
| 61 | +# Start with one or more hub datasets pre-registered. |
| 62 | +uv run plaid-viewer --hub-repo PLAID-lib/VKI-LS59 --hub-repo PLAID-lib/Rotor37 |
| 63 | +``` |
| 64 | + |
| 65 | +Streaming splits returned by PLAID are forward-only |
| 66 | +`datasets.IterableDataset` objects without `__len__`. The viewer adapts |
| 67 | +accordingly: |
| 68 | + |
| 69 | +- A `streaming` chip appears in the toolbar to advertise the mode. |
| 70 | +- The **Sample** slider starts at a single reachable step and grows by |
| 71 | + one every time the user moves it to the right; each right-arrow press |
| 72 | + consumes the next element from the iterator. |
| 73 | +- Revisiting an already-fetched index simply re-renders the cached |
| 74 | + sample instead of re-reading the stream, because the underlying |
| 75 | + iterator cannot be rewound. |
| 76 | +- Switching split or dataset rebuilds a fresh iterator from the Hub. |
| 77 | +- When the stream is exhausted the slider caps at the last consumed |
| 78 | + index and the counter label shows `(end of stream)`. |
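
The slider behaviour described above can be sketched as a small cursor over a forward-only iterator. This is a minimal illustration, not the viewer's implementation: `ForwardOnlyCursor` is a hypothetical name, and keeping every consumed element in memory is an assumption made to keep the example short.

```python
from typing import Any, Iterable, List


class ForwardOnlyCursor:
    """Hypothetical cursor over a stream that can only be read forwards."""

    def __init__(self, stream: Iterable[Any]) -> None:
        self._iterator = iter(stream)
        self._seen: List[Any] = []  # elements consumed so far (assumption)
        self.exhausted = False

    @property
    def max_index(self) -> int:
        """Largest index the slider may currently show."""
        return max(len(self._seen) - 1, 0)

    def get(self, index: int) -> Any:
        # Consume new elements only when the index is past what was seen.
        while len(self._seen) <= index and not self.exhausted:
            try:
                self._seen.append(next(self._iterator))
            except StopIteration:
                self.exhausted = True
        if not self._seen:
            raise IndexError("stream is empty")
        # Past the end of the stream, cap at the last consumed element.
        return self._seen[min(index, len(self._seen) - 1)]


cursor = ForwardOnlyCursor(iter(["s0", "s1", "s2"]))
first = cursor.get(0)    # consumes one element
second = cursor.get(1)   # consumes the next one
again = cursor.get(0)    # served from the already-seen list
capped = cursor.get(10)  # drains the stream, then caps at the last element
```

Requests for already-seen indices never touch the iterator, and once `exhausted` is set the reachable range stops growing.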
| 79 | + |
| 80 | + |
| 81 | +## Using the UI |
| 82 | + |
| 83 | +The side drawer provides, from top to bottom: |
| 84 | + |
| 85 | +1. **Dataset / Split** - two `VSelect` controls that pick the active |
| 86 | + dataset and split. |
| 87 | +2. **Sample** - a `VSlider` over the integer sample index of the current |
| 88 | + split; the selected `sample_id` (and the total count) is shown under |
| 89 | + the slider. |
| 90 | +3. **Base** - a `VBtnToggle` with exclusive, mandatory selection: exactly |
| 91 | + one renderable CGNS base exposed by `vtkCGNSReader.GetBaseSelection()` |
| 92 | + is active at any time. Bases that contain |
| 93 | + no `Zone_t` children (for example, a `Global` base storing only |
| 94 | + reference scalars or free-standing tensors) are not rendered but are |
| 95 | + summarised in the **Non-visual bases** accordion further down the |
| 96 | + drawer: each `DataArray_t` is listed with its name, dtype, shape and a |
| 97 | + short value preview. |
| 98 | +4. **Field / Colormap / Show edges** - colour the geometry by any point |
| 99 | + or cell array (all point and cell arrays are enabled on the reader |
| 100 | + by default so every field shows up in the dropdown), pick from a set |
| 101 | + of built-in colormaps and optionally overlay wireframe edges. |
| 102 | +5. **Cut plane** - toggle a `vtkCutter` and interactively adjust its |
| 103 | + normal and signed offset along that normal (the plane origin is the |
| 104 | + current dataset's bounding-box centre). |
| 105 | +6. **Threshold** - toggle a `vtkThreshold` filter on the currently |
| 106 | + selected field and set the `[min, max]` range. Defaults are populated |
| 107 | + from the field's data range. |
| 108 | +7. **Select features** - an expandable panel listing the field paths |
| 109 | + available for the current dataset (retrieved from the PLAID metadata |
| 110 | + schema). Toggling checkboxes and clicking **Apply** filters the loaded |
| 111 | + samples down to the selected fields: |
| 112 | + - For disk-backed datasets the selection is forwarded to |
| 113 | + `converter.to_plaid(dataset, index, features=...)`. PLAID expands |
| 114 | + the list internally with |
| 115 | + `plaid.utils.cgns_helper.update_features_for_CGNS_compatibility` |
| 116 | + to preserve the CGNS conventions (coordinates, zones, grid |
| 117 | + locations, etc. that make the kept fields renderable). The |
| 118 | + user-facing selection is first intersected with the active split's |
| 119 | + own feature catalogue, so paths that only live in another split |
| 120 | + (for example a field present in `train` but not in `test`) do not |
| 121 | + trigger a `Missing features` error. |
| 122 | + - For streaming (Hugging Face Hub) datasets the expansion must be |
| 123 | + done ahead of `init_streaming_from_hub`. The viewer calls |
| 124 | + `update_features_for_CGNS_compatibility` itself and hands the |
| 125 | + expanded list to the streaming loader, then invalidates the |
| 126 | + current iterator so the next sample is materialised with the new |
| 127 | + filter. |
| 128 | + The **Clear** / **Select all** buttons in the panel header provide |
| 129 | + shortcuts; an empty selection loads only the geometric support |
| 130 | + (mesh + zones + metadata). |
| 131 | +8. **Reset camera** - re-frames the current actor. |
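
The intersection step mentioned under **Select features** can be illustrated with plain lists of feature paths. The function name `restrict_to_split` and the paths below are made up for the example; only the idea of filtering the user selection against the active split's catalogue comes from the description above.

```python
from typing import Iterable, List


def restrict_to_split(selected: Iterable[str], split_catalogue: Iterable[str]) -> List[str]:
    """Keep only the selected feature paths that the active split provides."""
    available = set(split_catalogue)
    # Preserve the user's ordering while dropping unknown paths.
    return [path for path in selected if path in available]


# Made-up feature paths, for illustration only.
train_only = "Base/Zone/VertexFields/pressure"
shared = "Base/Zone/VertexFields/mach"
test_catalogue = [shared, "Base/Zone/GridCoordinates"]

kept = restrict_to_split([train_only, shared], test_catalogue)
```

Here `kept` contains only the shared path, so a request made against the `test` split never names a field that exists solely in `train`.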
| 132 | + |
| 133 | +The 3D view is a server-side `VtkRemoteView` (images are rendered on the |
| 134 | +server and streamed to the browser). Camera manipulation uses the |
| 135 | +ParaView-like trackball style: |
| 136 | + |
| 137 | +- Left mouse button: rotate. |
| 138 | +- Middle mouse button (or Shift + left): pan. |
| 139 | +- Mouse wheel (or right button drag): zoom. |
| 140 | + |
| 141 | +A status line at the bottom of the drawer reports the last action or |
| 142 | +error. |
| 143 | + |
| 144 | +## Cache layout |
| 145 | + |
| 146 | +Artifacts are written under an **ephemeral** per-process temp directory |
| 147 | +created by `plaid.viewer.cache.CacheRoot` (named |
| 148 | +`plaid-viewer-{pid}-{token}` under `tempfile.gettempdir()`): |
| 149 | + |
| 150 | +``` |
| 151 | +<cache_root>/datasets/<dataset_id>/<split>/<sample_id>/<key_prefix>/ |
| 152 | + meshes/ # one CGNS per timestep (time-dependent) |
| 153 | + meshes.cgns.series # ParaView file-series sidecar (time-dependent) |
| 154 | + mesh.cgns # single static mesh |
| 155 | + metadata.json # cache key, sample ref, export version, ... |
| 156 | +``` |
| 157 | + |
| 158 | +The cache holds **at most one artifact at a time**: once VTK has loaded |
| 159 | +a sample's CGNS into memory the on-disk copy is no longer needed, so |
| 160 | +the next `ensure_artifact` call removes the previous folder before |
| 161 | +writing the new one. |
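
A single-slot policy of this kind can be sketched with `shutil` and `pathlib`. The class `SingleSlotCache` and its `ensure` method are hypothetical stand-ins, and the empty `mesh.cgns` file is a placeholder for the real export; the actual `ensure_artifact` logic may differ.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


class SingleSlotCache:
    """Hypothetical cache that keeps at most one artifact folder on disk."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self._current: Optional[Path] = None

    def ensure(self, relative: str) -> Path:
        target = self.root / relative
        if self._current is not None and self._current != target:
            # Drop the previous artifact before writing the new one.
            shutil.rmtree(self._current, ignore_errors=True)
        target.mkdir(parents=True, exist_ok=True)
        (target / "mesh.cgns").touch()  # placeholder for the real export
        self._current = target
        return target


root = Path(tempfile.mkdtemp(prefix="single-slot-demo-"))
cache = SingleSlotCache(root)
first = cache.ensure("datasets/demo/train/0/abc123")
second = cache.ensure("datasets/demo/train/1/def456")
```

After the second call only the folder for sample `1` remains, which keeps disk usage bounded by the size of one exported sample.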
| 162 | + |
| 163 | +The whole cache root is cleaned up by four complementary |
| 164 | +layers: `atexit`, `SIGINT` / `SIGTERM` handlers, the `with CacheRoot()` |
| 165 | +context manager used by the CLI, and an orphan sweep at startup that |
| 166 | +removes directories left behind by previously crashed processes. |
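
Two of these layers, the `atexit` hook and the context manager, can be combined around one idempotent cleanup function. The sketch below uses a hypothetical `EphemeralDir` class and a made-up directory prefix; signal handlers and the orphan sweep are left out, and the real `CacheRoot` may be structured differently.

```python
import atexit
import os
import secrets
import shutil
import tempfile
from pathlib import Path


class EphemeralDir:
    """Hypothetical stand-in for a per-process cache root."""

    def __init__(self, prefix: str = "demo-viewer") -> None:
        name = f"{prefix}-{os.getpid()}-{secrets.token_hex(4)}"
        self.path = Path(tempfile.gettempdir()) / name
        self.path.mkdir()
        # Layer 1: remove the directory at normal interpreter exit.
        atexit.register(self.cleanup)

    def cleanup(self) -> None:
        # Idempotent, so several layers can call it safely.
        shutil.rmtree(self.path, ignore_errors=True)

    def __enter__(self) -> "EphemeralDir":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Layer 2: remove the directory when the with-block ends.
        self.cleanup()


with EphemeralDir() as cache:
    existed_inside = cache.path.is_dir()
existed_after = cache.path.exists()
```

Because `cleanup` ignores a missing directory, it does not matter which layer runs first or how many of them fire.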
| 167 | + |
| 168 | +The cache key is a SHA-256 digest of the sample reference, the PLAID |
| 169 | +version and `ViewerConfig.export_version`. |
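
One way to derive such a key is to serialise the three inputs deterministically and hash the result. The function `make_cache_key`, the JSON serialisation and the field names below are assumptions for illustration; the viewer's own serialisation is not documented here.

```python
import hashlib
import json


def make_cache_key(sample_ref: dict, plaid_version: str, export_version: str) -> str:
    """Hypothetical key: SHA-256 over a canonical JSON payload."""
    payload = json.dumps(
        {
            "sample": sample_ref,
            "plaid_version": plaid_version,
            "export_version": export_version,
        },
        sort_keys=True,  # stable ordering, so equal inputs hash equally
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Made-up values, for illustration only.
ref = {"dataset_id": "demo", "split": "train", "sample_id": 3}
key_v1 = make_cache_key(ref, "0.0.0", "1")
key_v2 = make_cache_key(ref, "0.0.0", "2")
```

Bumping the export version yields a different key, so artifacts written by an older exporter are never mistaken for current ones.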
| 170 | + |
| 171 | +## Programmatic usage |
| 172 | + |
| 173 | +```python |
| 174 | +from pathlib import Path |
| 175 | +from plaid.viewer.cache import CacheRoot |
| 176 | +from plaid.viewer.config import ViewerConfig |
| 177 | +from plaid.viewer.services import ParaviewArtifactService, PlaidDatasetService |
| 178 | +from plaid.viewer.trame_app.server import build_server |
| 179 | + |
| 180 | +config = ViewerConfig(datasets_root=Path("/path/to/datasets")) |
| 181 | +with CacheRoot() as cache: |
| 182 | + datasets = PlaidDatasetService(config) |
| 183 | + artifacts = ParaviewArtifactService(datasets, cache.path) |
| 184 | + server = build_server(datasets, artifacts) |
| 185 | + server.start(host="127.0.0.1", port=8080, open_browser=False) |
| 186 | +``` |