Context
After training my model with a LightningModule, I tried exporting it to PyTorch and then to ONNX. I followed the training structure used in the git repo, saved the model's PyTorch weights, then loaded them and tried to export the model to ONNX (the code is below).
My LightningModule and Dataset classes are both identical to the ones described in the repo notebook. My library versions are as follows:
- PyTorch: 2.4.0a0+f70bd71a48.nv24.06
- Torch CUDA available: True
- PyTorch Lightning: 2.6.1
- Segmentation Models PyTorch: 0.5.0
- NumPy: 1.26.4
- OpenCV: 4.9.0
Error
Training, saving the PyTorch weights, and loading them all work without issue; I can even run inference with the loaded smp.PSPNet model. But when I try to export the PyTorch smp.PSPNet model to ONNX, I run into a SymbolicValueError saying:
Unsupported: ONNX export of operator adaptive_avg_pool2d, output size that are not factor of input size. Please feel free to request support or submit a pull request on PyTorch GitHub: https://github.com/pytorch/pytorch/issues [Caused by the value '469 defined in (%469 : Long(2, strides=[1], device=cpu) = onnx::Constant[value= 3 3 [ CPULongType{2} ]]()
)' (type 'Tensor') in the TorchScript graph. The containing node has kind 'onnx::Constant'.]
Inputs:
Empty
Outputs:
#0: 469 defined in (%469 : Long(2, strides=[1], device=cpu) = onnx::Constant[value= 3 3 [ CPULongType{2} ]]()
) (type 'Tensor')
Code used
The code below shows how I trained the model, saved and loaded the weights, and ran the ONNX export.
trainer.fit(
    model,
    train_dataloaders=train_loader,
    val_dataloaders=valid_loader,
)

torch.save(model.model.state_dict(), "model.pth")

model = smp.PSPNet(
    encoder_name="mobilenet_v2",
    encoder_weights=None,
    classes=1,
    activation=None
)
model.load_state_dict(torch.load("model.pth", map_location="cpu"))
model.to(device)

images, masks = next(iter(train_loader))
dummy_input = images[:1]
dummy_input = dummy_input.float()
dummy_input = dummy_input / 255.0
dummy_input = dummy_input.to(device)

with torch.inference_mode():
    model.eval()
    output = model(dummy_input)

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    export_params=True,
    opset_version=17,  # the ONNX version to export
    do_constant_folding=True,  # whether to execute constant folding for optimization
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)
Discussion
The input used for the ONNX export is exactly the input used in training (128x128), since I used the same DataLoader, so I find it very strange that the error shows a completely different number and that it leads to this failure.
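One possible reading of the `3 3` in the error, offered as a sketch rather than a confirmed diagnosis: it may be the target output size of one of PSPNet's pyramid pooling branches, not something derived from the 128x128 input. The check below assumes smp's PSPNet defaults of an encoder output stride of 8 and pooling sizes (1, 2, 3, 6). Both values, and the helper name `non_divisible_pool_sizes`, are assumptions made for illustration and should be verified against the installed smp version.

```python
def non_divisible_pool_sizes(input_hw, output_stride=8, pool_sizes=(1, 2, 3, 6)):
    """Return the pooling output sizes that do not evenly divide the feature map.

    output_stride and pool_sizes are ASSUMED PSPNet defaults, not values read
    from the model; adjust them to match the actual configuration.
    """
    feat_h = input_hw[0] // output_stride
    feat_w = input_hw[1] // output_stride
    # A size "fails" if it leaves a remainder along either spatial axis.
    return [s for s in pool_sizes if feat_h % s or feat_w % s]


print(non_divisible_pool_sizes((128, 128)))  # [3, 6]
print(non_divisible_pool_sizes((192, 192)))  # []
```

Under these assumptions, a 128x128 input gives a 16x16 feature map, which 3 and 6 do not divide; that would match the exporter's "not factor of input size" complaint. Input sizes that are multiples of 48 would pass this particular check.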
I assume I should use avg_pool2d instead of adaptive_avg_pool2d, because I read that ONNX doesn't know how to handle adaptive_avg_pool2d, but I can't find anywhere how to change the pooling function in smp or in the LightningModule.
Does anyone have any idea how I can solve this issue, please?
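To judge whether a plain avg_pool2d could stand in for adaptive_avg_pool2d at a given size, it can help to list the windows adaptive pooling averages over. The sketch below uses the usual index rule for adaptive pooling along one axis (start = floor(i * in / out), end = ceil((i + 1) * in / out)). The helper names are hypothetical, and the 16-wide feature map is an assumption based on a 128x128 input with an output stride of 8.

```python
def adaptive_pool_windows(in_size, out_size):
    """Return the [start, end) range averaged for each output cell along one axis."""
    return [
        # floor(i * in / out) for the start, integer ceil((i + 1) * in / out) for the end
        ((i * in_size) // out_size, ((i + 1) * in_size + out_size - 1) // out_size)
        for i in range(out_size)
    ]


def fixed_pool_equivalent(in_size, out_size):
    """Return (kernel, stride) if one fixed window reproduces the adaptive windows, else None."""
    windows = adaptive_pool_windows(in_size, out_size)
    kernels = {end - start for start, end in windows}
    strides = {b[0] - a[0] for a, b in zip(windows, windows[1:])}
    if len(kernels) == 1 and len(strides) <= 1:
        # With a single output cell there is no stride to infer; use the full width.
        return kernels.pop(), (strides.pop() if strides else in_size)
    return None


print(adaptive_pool_windows(16, 3))  # [(0, 6), (5, 11), (10, 16)]
print(fixed_pool_equivalent(16, 3))  # (6, 5)
print(fixed_pool_equivalent(16, 6))  # None
print(fixed_pool_equivalent(24, 6))  # (4, 4)
```

If the assumptions hold, the 16-to-3 case could be matched by a fixed pool with kernel 6 and stride 5, but the 16-to-6 case has windows of unequal width, so no single avg_pool2d setting reproduces it; a 24-wide map (a 192x192 input under the same stride assumption) divides evenly.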