bash tools/dist_test.sh configs/ntu60_xsub/j.py checkpoints/CHECKPOINT.pth 1 --eval top_k_accuracy --out result.pkl
Where can I find CHECKPOINT.pth for this command?
Or should I run it like this instead?
bash tools/dist_test.sh configs/ntu60_xsub/j.py checkpoints/ntu120_xsub_joint_best.pth 1 --eval top_k_accuracy --out result.pkl
However, when I run the test, the following error occurs. What am I missing?
Error log:
bash tools/dist_test.sh configs/ntu120_xsub/j.py checkpoints/ntu120_xsub/j_1/best_top1_acc_epoch_141.pth 1 --eval top_k_accuracy --out result.pkl
CONFIG=configs/ntu120_xsub/j.py
CHECKPOINT=checkpoints/ntu120_xsub/j_1/best_top1_acc_epoch_141.pth
GPUS=1
++ dirname tools/dist_test.sh
MKL_SERVICE_FORCE_INTEL=1
++ dirname tools/dist_test.sh
PYTHONPATH=tools/..:
CUDA_VISIBLE_DEVICES=0
python -m torch.distributed.launch --nproc_per_node=1 --master_port=14485 tools/test.py configs/ntu120_xsub/j.py -C checkpoints/ntu120_xsub/j_1/best_top1_acc_epoch_141.pth --launcher pytorch --eval top_k_accuracy --out result.pkl
/home/mhncity/Desktop/handonghun/protogcn/protogcn/lib/python3.10/site-packages/torch/distributed/launch.py:207: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use-env is set by default in torchrun.
If your script expects --local-rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
main()
/home/mhncity/Desktop/handonghun/protogcn/protogcn/lib/python3.10/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
warnings.warn(
2025-11-13 17:31:59,649 - protogcn - INFO - 50919 videos remain after valid thresholding
/home/mhncity/Desktop/handonghun/protogcn/protogcn/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via init_process_group or barrier. Using the current device set by the user.
warnings.warn( # warn only once
[rank0]:[W1113 17:31:59.054641934 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
load checkpoint from local path: checkpoints/ntu120_xsub/j_1/best_top1_acc_epoch_141.pth
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/mhncity/Desktop/handonghun/protogcn/ProtoGCN/tools/test.py", line 252, in <module>
[rank0]: main()
[rank0]: File "/home/mhncity/Desktop/handonghun/protogcn/ProtoGCN/tools/test.py", line 235, in main
[rank0]: outputs = inference_pytorch(args, cfg, data_loader)
[rank0]: File "/home/mhncity/Desktop/handonghun/protogcn/ProtoGCN/tools/test.py", line 164, in inference_pytorch
[rank0]: load_checkpoint(model, args.checkpoint, map_location='cpu')
[rank0]: File "/home/mhncity/Desktop/handonghun/protogcn/protogcn/lib/python3.10/site-packages/mmcv/runner/checkpoint.py", line 638, in load_checkpoint
[rank0]: checkpoint = _load_checkpoint(filename, map_location, logger)
[rank0]: File "/home/mhncity/Desktop/handonghun/protogcn/protogcn/lib/python3.10/site-packages/mmcv/runner/checkpoint.py", line 572, in _load_checkpoint
[rank0]: return CheckpointLoader.load_checkpoint(filename, map_location, logger)
[rank0]: File "/home/mhncity/Desktop/handonghun/protogcn/protogcn/lib/python3.10/site-packages/mmcv/runner/checkpoint.py", line 314, in load_checkpoint
[rank0]: return checkpoint_loader(filename, map_location) # type: ignore
[rank0]: File "/home/mhncity/Desktop/handonghun/protogcn/protogcn/lib/python3.10/site-packages/mmcv/runner/checkpoint.py", line 334, in load_from_local
[rank0]: checkpoint = torch.load(filename, map_location=map_location)
[rank0]: File "/home/mhncity/Desktop/handonghun/protogcn/protogcn/lib/python3.10/site-packages/torch/serialization.py", line 1524, in load
[rank0]: raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
[rank0]: _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
[rank0]: (1) In PyTorch 2.6, we changed the default value of the weights_only argument in torch.load from False to True. Re-running torch.load with weights_only set to False will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
[rank0]: (2) Alternatively, to load with weights_only=True please check the recommended steps in the following error message.
[rank0]: WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use torch.serialization.add_safe_globals([numpy.core.multiarray.scalar]) or the torch.serialization.safe_globals([numpy.core.multiarray.scalar]) context manager to allowlist this global if you trust this class/function.
[rank0]: Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[rank0]:[W1113 17:32:00.391155371 ProcessGroupNCCL.cpp:1479] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
E1113 17:32:01.134000 340892 torch/distributed/elastic/multiprocessing/api.py:874] failed (exitcode: 1) local_rank: 0 (pid: 340930) of binary: /home/mhncity/Desktop/handonghun/protogcn/protogcn/bin/python
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/mhncity/Desktop/handonghun/protogcn/protogcn/lib/python3.10/site-packages/torch/distributed/launch.py", line 207, in <module>
main()
File "/home/mhncity/Desktop/handonghun/protogcn/protogcn/lib/python3.10/site-packages/typing_extensions.py", line 3004, in wrapper
return arg(*args, **kwargs)
File "/home/mhncity/Desktop/handonghun/protogcn/protogcn/lib/python3.10/site-packages/torch/distributed/launch.py", line 203, in main
launch(args)
File "/home/mhncity/Desktop/handonghun/protogcn/protogcn/lib/python3.10/site-packages/torch/distributed/launch.py", line 188, in launch
run(args)
File "/home/mhncity/Desktop/handonghun/protogcn/protogcn/lib/python3.10/site-packages/torch/distributed/run.py", line 883, in run
elastic_launch(
File "/home/mhncity/Desktop/handonghun/protogcn/protogcn/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 139, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/mhncity/Desktop/handonghun/protogcn/protogcn/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 270, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
tools/test.py FAILED
Failures:
<NO_OTHER_FAILURES>
Root Cause (first observed failure):
[0]:
time : 2025-11-13_17:32:01
host : mhncity-desktop
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 340930)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
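
The root failure in the log is the "Unsupported global" rejection raised while torch.load unpickles the checkpoint. To illustrate what that rejection means, here is a minimal sketch of allowlist-based unpickling using only the Python standard library. It is a simplified stand-in, not PyTorch's actual implementation: SAFE_GLOBALS, AllowlistUnpickler, restricted_loads and the checkpoint-like dictionary are hypothetical names and data invented for this example, and fractions.Fraction plays the role that numpy.core.multiarray.scalar plays in the log.

```python
import io
import pickle
from collections import OrderedDict
from fractions import Fraction

# Hypothetical allowlist for this sketch: (module, name) pairs that may be resolved.
SAFE_GLOBALS = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global that is not explicitly allowlisted."""

    def find_class(self, module, name):
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"Unsupported global: {module}.{name}")


def restricted_loads(data):
    """Deserialize bytes, resolving only allowlisted globals."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


# Checkpoint-like payload (made-up values): the metadata holds an object whose
# type is not on the allowlist, much like a NumPy scalar stored next to the weights.
checkpoint = {
    "state_dict": OrderedDict(weight=[0.1, 0.2]),
    "meta": {"score": Fraction(1, 2)},
}
blob = pickle.dumps(checkpoint)

# First attempt: the payload references fractions.Fraction, which is not allowed yet.
try:
    restricted_loads(blob)
    first_attempt = "loaded"
except pickle.UnpicklingError as exc:
    first_attempt = f"rejected: {exc}"

# Second attempt: allowlist the type (only sensible for a trusted file), then reload.
SAFE_GLOBALS.add(("fractions", "Fraction"))
second_attempt = restricted_loads(blob)

print(first_attempt)
print(second_attempt == checkpoint)
```

In PyTorch terms, extending the allowlist corresponds to the torch.serialization.add_safe_globals call or the torch.serialization.safe_globals context manager named in the error message, and skipping the allowlist corresponds to weights_only=False, which the message recommends only for checkpoints from a trusted source. According to the traceback, torch.load is called from load_from_local in mmcv/runner/checkpoint.py, so either option has to take effect before tools/test.py reaches load_checkpoint.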