Commit fc3c894

Merge branch 'main' into psharpe/GLOBE-perf-to-merge
2 parents d31351d + e579a9f commit fc3c894

102 files changed

Lines changed: 828 additions & 571 deletions

.github/ISSUE_TEMPLATE/bug_report.yml

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ body:
 attributes:
 label: Version
 description: What version of PhysicsNeMo are you running?
-placeholder: "example: 2.0.0"
+placeholder: "example: 2.1.0"
 validations:
 required: true

CHANGELOG.md

Lines changed: 93 additions & 36 deletions
@@ -6,7 +6,7 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [2.1.0a0] - 2026-XX-YY
+## [2.2.0] - 2026-XX-YY
 
 ### Added

@@ -21,15 +21,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 (`SpatialBranch`), and coordinate features.
 - Adds `Sin` elementwise sine activation to `physicsnemo.nn`, registered
 in `ACT2FN` so it can be looked up by name (`get_activation("sin")`).
-- Adds GLOBE model (`physicsnemo.experimental.models.globe.model.GLOBE`),
-including new variant that uses a dual tree traversal algorithm to reduce the
-complexity of the kernel evaluations from O(N^2) to O(N).
-- Adds GLOBE AirFRANS example case (`examples/cfd/external_aerodynamics/globe/airfrans`)
-- Adds GLOBE DrivAerML example case (`examples/cfd/external_aerodynamics/globe/drivaer`)
-- Adds drop-test dynamics recipe.
-- Adds concrete dropout uncertainty quantification for GeoTransolver. Learnable
-per-layer dropout rates enable MC-Dropout inference for uncertainty
-estimates. Disabled by default (`concrete_dropout: false`).
 - Adds active-learning recipe for external-aerodynamics surrogates
 (`examples/cfd/external_aerodynamics/active_learning_aero/`). Iteratively
 fine-tunes a GP-augmented GeoTransolver onto an out-of-distribution
@@ -39,6 +30,57 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 protocols and `physicsnemo.experimental.uq.VariationalGPHead`, with a
 layered structure (generic AL driver / GP-UQ recipe / aero adapter)
 designed for reuse on other UQ-based regression problems.
+
+### Changed
+
+### Deprecated
+
+### Removed
+
+### Fixed
+
+- Replaced three plain-string regex / docstring literals containing invalid
+escape sequences with raw-string equivalents
+(`physicsnemo/utils/logging/launch.py`,
+`physicsnemo/metrics/general/calibration.py`,
+`physicsnemo/metrics/general/crps.py`); these were `SyntaxWarning`s today
+and become `SyntaxError`s in Python 3.16.
+- Various test cleanups to remove self-inflicted warnings in CI output:
+disabled pytest collection for `TestModelA`/`TestModelB` helpers in
+`test/core/test_registry.py` via `__test__ = False`; migrated
+`test/nn/module/test_interpolation.py` to call the non-deprecated
+`grid_to_point_interpolation` and added a dedicated test for the
+deprecation alias; scoped a `lr_scheduler.step()`-before-`optimizer.step()`
+`UserWarning` filter to a single test in
+`test/optim/test_combined_optimizer.py`; guarded the
+`DistributedManager.initialize()` calls in `test/utils/test_checkpoint.py`
+with `is_initialized()`; and suppressed the import-time
+`ExperimentalFeatureWarning` in `test/datapipes/healda/test_features.py`
+via `warnings.catch_warnings()`.
+- Fixed `physicsnemo.utils.get_checkpoint_dir` returning paths with `\`
+separators on Windows (e.g. `.\checkpoints_model`), which was inconsistent
+with the `/`-based paths used elsewhere in the checkpoint utilities and
+broke the `test_get_checkpoint_dir` CI test on Windows. The function now
+always joins with `/`, working uniformly for local paths and `fsspec`
+URIs (`msc://`, etc.) across operating systems.
+
+### Security
+
+### Dependencies
+
+## [2.1.0] - 2026-05-26
+
+### Added
+
+- Adds GLOBE model (`physicsnemo.experimental.models.globe.model.GLOBE`),
+including new variant that uses a dual tree traversal algorithm to reduce the
+complexity of the kernel evaluations from O(N^2) to O(N).
+- Adds GLOBE AirFRANS example case (`examples/cfd/external_aerodynamics/globe/airfrans`)
+- Adds GLOBE DrivAerML example case (`examples/cfd/external_aerodynamics/globe/drivaer`)
+- Adds drop-test dynamics recipe.
+- Adds concrete dropout uncertainty quantification for GeoTransolver. Learnable
+per-layer dropout rates enable MC-Dropout inference for uncertainty
+estimates. Disabled by default (`concrete_dropout: false`).
 - Adds automatic support for `FSDP` and/or `ShardTensor` models in checkpoint save/load
 functionality
 - PhysicsNeMo-Mesh now supports conversion from PyVista/VTK/VTU meshes that may
@@ -226,42 +268,46 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 combined-workflow and from-checkpoint round-trip tests. Most tests
 run with `fullgraph=True` and `error_on_recompile` to catch
 `torch.compile` regressions.
+- Internal weight initialization in the distributed AFNO layers and the
+`EarthAttention` blocks of `physicsnemo.nn.module.attention_layers` now
+dispatches to `torch.nn.init.trunc_normal_` directly instead of going
+through frozen in-tree copies of the pre-PyTorch-2.12 inverse-CDF
+implementation. PyTorch 2.12 reimplemented `trunc_normal_` as a
+rejection-sampling loop on top of `normal_()` (see
+[pytorch/pytorch#174997](https://github.com/pytorch/pytorch/pull/174997)),
+so seeded from-scratch initialization consumes the RNG stream
+differently on 2.12+ vs older versions. Existing trained checkpoints
+are unaffected (loading bypasses init). Forward-accuracy reference
+outputs for `AFNO`, `ModAFNO`, `Transolver`, `FLARE`, and `Pangu` were
+regenerated against the new algorithm. Rather than wiring per-model
+skips, `test.common.validate_forward_accuracy` now uniformly skips on
+`torch < 2.12` (the reference data is locked to that floor via a single
+`_REFERENCE_DATA_MIN_TORCH` constant; bump it when a PyTorch
+release next changes an init/RNG algorithm any forward-accuracy model
+depends on, and regenerate the `.pth` files at the same time).
 
 ### Deprecated
 
 - `physicsnemo.utils.mesh` is deprecated and will be removed in v2.2.0. For
 isosurface extraction, use `physicsnemo.mesh.generate.marching_cubes` instead
 of `sdf_to_stl`. For VTP/OBJ/STL file conversion (`combine_vtp_files`,
 `convert_tesselated_files_in_directory`), use VTK or PyVista directly.
+- `physicsnemo.nn.module.utils.trunc_normal_` (and its submodule path
+`physicsnemo.nn.module.utils.weight_init.trunc_normal_`) is deprecated
+and will be removed in v2.2.0. It is now a thin wrapper around
+`torch.nn.init.trunc_normal_` that emits a `DeprecationWarning` on
+call, replacing the frozen in-tree copy of the legacy inverse-CDF
+implementation. Use `torch.nn.init.trunc_normal_` directly.
 
 ### Removed
 
+- The legacy in-tree `trunc_normal_` implementation that lived in
+`physicsnemo/models/afno/distributed/layers.py` (`_trunc_normal_` /
+`_no_grad_trunc_normal_`) is removed. These names were private; all
+in-tree call sites now use `torch.nn.init.trunc_normal_`.
+
 ### Fixed
 
-- Replaced three plain-string regex / docstring literals containing invalid
-escape sequences with raw-string equivalents
-(`physicsnemo/utils/logging/launch.py`,
-`physicsnemo/metrics/general/calibration.py`,
-`physicsnemo/metrics/general/crps.py`); these were `SyntaxWarning`s today
-and become `SyntaxError`s in Python 3.16.
-- Various test cleanups to remove self-inflicted warnings in CI output:
-disabled pytest collection for `TestModelA`/`TestModelB` helpers in
-`test/core/test_registry.py` via `__test__ = False`; migrated
-`test/nn/module/test_interpolation.py` to call the non-deprecated
-`grid_to_point_interpolation` and added a dedicated test for the
-deprecation alias; scoped a `lr_scheduler.step()`-before-`optimizer.step()`
-`UserWarning` filter to a single test in
-`test/optim/test_combined_optimizer.py`; guarded the
-`DistributedManager.initialize()` calls in `test/utils/test_checkpoint.py`
-with `is_initialized()`; and suppressed the import-time
-`ExperimentalFeatureWarning` in `test/datapipes/healda/test_features.py`
-via `warnings.catch_warnings()`.
-- Fixed `physicsnemo.utils.get_checkpoint_dir` returning paths with `\`
-separators on Windows (e.g. `.\checkpoints_model`), which was inconsistent
-with the `/`-based paths used elsewhere in the checkpoint utilities and
-broke the `test_get_checkpoint_dir` CI test on Windows. The function now
-always joins with `/`, working uniformly for local paths and `fsspec`
-URIs (`msc://`, etc.) across operating systems.
 - Fixed functional benchmark plot fallback labeling so unlabeled ASV results use
 the same key ordering as the benchmark runner.
 - Fixed graph break caused by `FunctionSpec` dispatch (`max(key=)` is not supported by `torch.compile`)
@@ -305,12 +351,23 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Fixed the sinusoidal positional embeddings formula in `SongUNet` and
 `MultiDiffusionModel2D` so it now follows the standard `sin / cos`
 convention. Affected reference data was regenerated.
-
-### Security
+- Constructing a `Mesh` (or `DomainMesh`) inside a `torch.compile`-traced
+function no longer raises `AttributeError` / `KeyError` or silently
+produces wrong output. The breakage came from two regressions in
+`tensordict >= 0.12.0` (PR `pytorch/tensordict#1552`), where the
+`@tensorclass` init wrapper's bypass branch silently skipped both
+field-default normalization and `__post_init__` under
+`torch.compile`. We pin `tensordict < 0.12` until the upstream fix
+(`pytorch/tensordict#1708`, `pytorch/tensordict#1709`) ships, and add
+a regression test (`test/mesh/mesh/test_compile.py`) that constructs
+a `Mesh` inside `torch.compile` and reads cached properties, so the
+same bug cannot return on a future pin bump unnoticed.
 
 ### Dependencies
 
 - Increments minimum viable PyTorch version to `torch>=2.5.0` to support FSDP better
+- Upper-bounds `tensordict < 0.12` to avoid the `torch.compile` regressions
+in `tensordict >= 0.12.0` (see corresponding entry under Fixed).
 
 ## [2.0.0] - 2026-03-09

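The `get_checkpoint_dir` entry under Fixed in the CHANGELOG.md diff above describes replacing an OS-dependent path join with a plain `/` join. The following is a minimal sketch of that idea, not the library's implementation: `join_checkpoint_dir` is a hypothetical helper, the `checkpoints_<model_name>` naming is inferred from the `.\checkpoints_model` example in the changelog, and `msc://bucket/run` is a made-up URI. It uses the standard-library `ntpath` module to reproduce Windows join behaviour on any platform.

```python
import ntpath
import posixpath


def join_checkpoint_dir(base: str, model_name: str) -> str:
    """Hypothetical helper: build '<base>/checkpoints_<model_name>' using '/' only."""
    # Strip a trailing '/' so the result never contains a doubled separator.
    return f"{base.rstrip('/')}/checkpoints_{model_name}"


# ntpath is the Windows flavour of os.path and is importable on every OS,
# so this shows what an os.path.join-style call yields on Windows.
windows_style = ntpath.join(".", "checkpoints_model")

# The same join under POSIX rules.
posix_style = posixpath.join(".", "checkpoints_model")

# A '/'-only join gives the same shape for local paths and URI-style paths.
local_dir = join_checkpoint_dir(".", "model")
remote_dir = join_checkpoint_dir("msc://bucket/run", "model")

print(repr(windows_style))
print(repr(posix_style))
print(local_dir)
print(remote_dir)
```

Because the separator is fixed, the result does not depend on which operating system runs the code, which is the property the changelog entry says the Windows CI test relies on.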
Dockerfile

Lines changed: 3 additions & 1 deletion
@@ -26,7 +26,9 @@ FROM ${BASE_CONTAINER} AS builder
 ARG TARGETPLATFORM
 
 # Install uv (use system Python for installs; set so --system is default)
-COPY --from=ghcr.io/astral-sh/uv:0.10.3 /uv /uvx /bin/
+# Pinned to 0.11.14 (latest stable as of May 2026) which bundles
+# rustls-webpki >= 0.103.13 (fixes GHSA-82j2-j2ch-gfr8).
+COPY --from=ghcr.io/astral-sh/uv:0.11.14 /uv /uvx /bin/
 ENV UV_SYSTEM_PYTHON=1
 # Base image Python is PEP 668 externally-managed; allow system installs in container
 ENV UV_BREAK_SYSTEM_PACKAGES=1

examples/additive_manufacturing/sintering_physics/README.md

Lines changed: 13 additions & 1 deletion
@@ -58,13 +58,25 @@ For more sample parts simulation:
 
 - Download PhysicsNeMo, make or install
 
+- Install the example's Python dependencies:
+
+```bash
+pip install -r requirements.txt
+```
+
 - Find the matching torch-scatter version with torch and cuda version enabled:
 - i.e. pip install torch-scatter-f `https://data.pyg.org/whl/torch-2.2.0%2Bcu121/torch_scatter-2.1.2%2Bpt22cu121-cp311-cp311-linux_x86_64.whl`
 (replace the torch-scatter wheel with the matching cuda, torch version )
 - torch-scatter installation guide: `https://pypi.org/project/torch-scatter/`
 - wheels source: `https://data.pyg.org/whl/`
 
-- pip install tensorflow
+- Install TensorFlow separately (used by `reading_utils.py`,
+`graph_dataset.py`, and the `data_process/` scripts for
+`tf.train.SequenceExample` I/O and `tf.data` pipelines):
+
+```bash
+pip install "tensorflow>=2.15,<3.0"
+```
 
 - test version: tensorflow-2.15.0.post1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

examples/additive_manufacturing/sintering_physics/inference.py

Lines changed: 0 additions & 9 deletions
@@ -32,15 +32,6 @@
 "Mesh Graph Net Datapipe requires the Tensorflow library. Install the "
 + "package at: https://www.tensorflow.org/install"
 )
-physical_devices = tf.config.list_physical_devices("GPU")
-
-try:
-for device_ in physical_devices:
-tf.config.experimental.set_memory_growth(device_, True)
-except:
-# Invalid device or cannot modify virtual devices once initialized.
-pass
-
 import hydra
 import torch
 from graph_dataset import GraphDataset
Lines changed: 12 additions & 3 deletions
@@ -1,3 +1,12 @@
-# pyvista is optional, required if need to run data preprocessing from raw simulation
-# pyvista==0.32.1
-tensorflow>=2.15,<3.0 # generate tfrecord
+# Core dependencies for training, inference, and rollout rendering
+dm-tree
+tqdm
+hydra-core
+omegaconf
+matplotlib
+pyvista
+vtk
+natsort
+scikit-learn
+tensorboard
+tensorflow-cpu>=2.15,<3.0

examples/additive_manufacturing/sintering_physics/train.py

Lines changed: 23 additions & 31 deletions
@@ -58,14 +58,6 @@
 )
 from physicsnemo.models.vfgn.graph_network_modules import VFGNLearnedSimulator
 
-physical_devices = tf.config.list_physical_devices("GPU")
-try:
-for device_ in physical_devices:
-tf.config.experimental.set_memory_growth(device_, True)
-except:
-# Invalid device or cannot modify virtual devices once initialized.
-pass
-
 
 def Train(rank_zero_logger, dist, cfg: DictConfig):
 """
@@ -131,6 +123,7 @@ def Train(rank_zero_logger, dist, cfg: DictConfig):
 writer = SummaryWriter(log_dir=cfg.data_options.ckpt_path_vfgn)
 
 optimizer = None
+scaler = None
 # todo : check device
 device = "cpu"
 step = 0
@@ -178,18 +171,23 @@ def Train(rank_zero_logger, dist, cfg: DictConfig):
 
 sampled_noise *= noise_mask
 
-pred_target = model(
-next_positions=targets.to(device),
-position_sequence=inputs.to(device),
-position_sequence_noise=sampled_noise.to(device),
-n_particles_per_example=features["n_particles_per_example"].to(device),
-n_edges_per_example=features["n_edges_per_example"].to(device),
-senders=features["senders"].to(device),
-receivers=features["receivers"].to(device),
-predict_length=cfg.train_options.pred_len,
-particle_types=features["particle_type"].to(device),
-global_context=features.get("step_context").to(device),
-)
+amp_enabled = cfg.general.fp16 and scaler is not None
+with torch.autocast(
+device_type=device.type if isinstance(device, torch.device) else "cpu",
+enabled=amp_enabled,
+):
+pred_target = model(
+next_positions=targets.to(device),
+position_sequence=inputs.to(device),
+position_sequence_noise=sampled_noise.to(device),
+n_particles_per_example=features["n_particles_per_example"].to(device),
+n_edges_per_example=features["n_edges_per_example"].to(device),
+senders=features["senders"].to(device),
+receivers=features["receivers"].to(device),
+predict_length=cfg.train_options.pred_len,
+particle_types=features["particle_type"].to(device),
+global_context=features.get("step_context").to(device),
+)
 
 if optimizer is None:
 # first data need to inference the feature size
@@ -208,14 +206,7 @@ def Train(rank_zero_logger, dist, cfg: DictConfig):
 model = model.to(device)
 optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
 if cfg.general.fp16:
-# double check if amp installed
-try:
-from apex import amp
-
-model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
-except ImportError as e:
-print("Apex package not available -> ", e)
-exit()
+scaler = torch.amp.GradScaler(device.type)
 
 scheduler = torch.optim.lr_scheduler.ExponentialLR(
 optimizer, gamma=0.1, verbose=True
@@ -394,11 +385,12 @@ def Train(rank_zero_logger, dist, cfg: DictConfig):
 # back propogation
 optimizer.zero_grad()
 if cfg.general.fp16:
-with amp.scale_loss(loss, optimizer) as scaled_loss:
-scaled_loss.backward()
+scaler.scale(loss).backward()
+scaler.step(optimizer)
+scaler.update()
 else:
 loss.backward()
-optimizer.step()
+optimizer.step()
 
 running_loss += loss.item()

Lines changed: 1 addition & 0 deletions
@@ -1,2 +1,3 @@
 hydra-core>=1.2.0
 termcolor>=2.1.1
+matplotlib
Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-mlflow>=2.1.1
+mlflow>=3.12.0
Lines changed: 3 additions & 1 deletion
@@ -1 +1,3 @@
-gdown
+gdown>=5.2.2
+matplotlib
+scipy
