We present the detailed performance on various datasets and modalities.
All checkpoints and training logs are available on Google Drive. We sincerely hope this repo will be helpful for your research.
The detailed results for the pretrained models are shown below:

| Modality | NTU 60 X-Sub | NTU 60 X-View | NTU 120 X-Sub | NTU 120 X-Set |
|---|---|---|---|---|
| Joint | 92.75 | 97.39 | 87.39 | 89.59 |
| Bone | 93.01 | 96.94 | 90.01 | 91.57 |
| K-Bone | 93.01 | 97.01 | 89.49 | 90.46 |
| 2-ensemble | 94.01 | 97.73 | 90.51 | 92.43 |
| 4-ensemble | 94.30 | 97.99 | 91.35 | 92.97 |
| 6-ensemble | 94.51 | 98.19 | 91.70 | 93.05 |

| Modality | Kinetics-Skeleton | FineGYM | UAV-Human CSv1 | UAV-Human CSv2 |
|---|---|---|---|---|
| Joint | 50.69 | 94.64 | 47.25 | 73.66 |
| Bone | 49.10 | 95.70 | 48.50 | 73.98 |
| K-Bone | 48.26 | 95.54 | 47.44 | 73.42 |
| 2-ensemble | 52.24 | 96.08 | 50.31 | 76.00 |
| 4-ensemble | 52.98 | 96.46 | 50.85 | 76.95 |
| 6-ensemble | 53.57 | 96.53 | 52.05 | 78.07 |

We adopt the widely used six-stream ensemble strategy introduced in InfoGCN. Here, K-Bone denotes the new skeleton representation proposed by InfoGCN. Interestingly, we find that the improvement from multi-stream ensembling mainly comes from complementarity and stochasticity. For well-performing models, stochastic boosting within a single modality is more efficient than complementary boosting with the motion modality. The detailed comparisons for each dataset are provided in `{dataset}_ensemble.py`.
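
As a rough illustration of score-level ensembling, the sketch below fuses per-stream class scores with a weighted sum and reports top-1 accuracy. It is not the code in `{dataset}_ensemble.py`: the function names `ensemble_scores` and `top1_accuracy`, the equal default weights, and the toy scores are all assumptions made for this example.

```python
def ensemble_scores(stream_scores, weights=None):
    """Fuse per-stream class scores by a weighted sum (hypothetical helper).

    stream_scores: list of streams; each stream is a list of per-sample
    score vectors with one float per class.
    """
    if weights is None:
        weights = [1.0] * len(stream_scores)  # assumption: equal weights
    if len(weights) != len(stream_scores):
        raise ValueError("need exactly one weight per stream")
    num_samples = len(stream_scores[0])
    num_classes = len(stream_scores[0][0])
    fused = [[0.0] * num_classes for _ in range(num_samples)]
    for w, scores in zip(weights, stream_scores):
        for i, row in enumerate(scores):
            for c, s in enumerate(row):
                fused[i][c] += w * s
    return fused


def top1_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    correct = 0
    for row, y in zip(scores, labels):
        pred = max(range(len(row)), key=row.__getitem__)
        correct += int(pred == y)
    return correct / len(labels)


# Toy scores for two streams, three samples, two classes (made-up numbers).
joint = [[0.7, 0.3], [0.45, 0.55], [0.3, 0.7]]
bone = [[0.4, 0.6], [0.8, 0.2], [0.2, 0.8]]
labels = [0, 0, 1]

fused = ensemble_scores([joint, bone])
print(top1_accuracy(joint, labels))  # 2 of 3 correct
print(top1_accuracy(bone, labels))   # 2 of 3 correct
print(top1_accuracy(fused, labels))  # 3 of 3 correct
```

In this toy case each stream misclassifies a different sample, so the fused scores recover both errors. That is the complementarity effect in miniature; the same function also fuses several runs of one modality, which is where stochasticity would come in.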

In addition, we use three augmentation techniques, Flip, Part Drop, and Mixup, which can provide performance gains. Mixup doubles the number of samples, which leads to longer training times. Nevertheless, when coupled with multi-modal semantic priors, it can deliver superior performance. Note that, because of randomness, these augmentations may also cause the performance to fluctuate. You can choose whether to use them based on your needs.
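
To make the sample-doubling effect of Mixup concrete, here is a minimal sketch of the standard Mixup formulation applied to flattened feature vectors. It assumes the doubling comes from appending one mixed sample per original sample; the repo's actual implementation may differ. The names `mixup_pair` and `mixup_extend`, the `alpha` value, and the toy data are hypothetical.

```python
import random


def mixup_pair(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b of two equal-length vectors."""
    return [lam * u + (1.0 - lam) * v for u, v in zip(a, b)]


def mixup_extend(samples, labels, num_classes, alpha=1.0, seed=0):
    """Return the original samples plus one mixed sample per original.

    Assumption: labels are integer class ids and are turned into soft
    (one-hot style) vectors so that mixed samples can carry mixed labels.
    """
    rng = random.Random(seed)

    def one_hot(y):
        vec = [0.0] * num_classes
        vec[y] = 1.0
        return vec

    out_x = [list(x) for x in samples]
    out_y = [one_hot(y) for y in labels]

    # Pair every sample with a randomly chosen partner from the same batch.
    partners = list(range(len(samples)))
    rng.shuffle(partners)
    for i, j in enumerate(partners):
        lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
        out_x.append(mixup_pair(samples[i], samples[j], lam))
        out_y.append(mixup_pair(one_hot(labels[i]), one_hot(labels[j]), lam))
    return out_x, out_y


# Toy batch: three samples with two features each, three classes.
samples = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
labels = [0, 1, 2]
aug_x, aug_y = mixup_extend(samples, labels, num_classes=3)
print(len(samples), "->", len(aug_x))
```

Under this assumption the batch grows from N to 2N samples, which is consistent with the longer training time noted above.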