Is your feature request related to a problem? Please describe.
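I noticed that the focal loss implementation currently uses binary cross-entropy (BCE) with a sigmoid activation, which scores each class independently. For the multiclass case, where each sample belongs to exactly one class, would it be more appropriate to use softmax (i.e., a cross-entropy-based focal loss) instead?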
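To make the difference concrete, here is a minimal, framework-free sketch of both variants for a single sample. It assumes integer class labels, the standard focal-loss form FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), and hypothetical function names (`softmax_focal_loss`, `sigmoid_focal_loss`). It is an illustration, not the project's actual implementation.

```python
import math
from typing import Optional, Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax (subtract the max before exponentiating)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def softmax_focal_loss(logits: Sequence[float], target: int, gamma: float = 2.0,
                       alpha: Optional[Sequence[float]] = None) -> float:
    """Hypothetical multiclass focal loss: classes are mutually exclusive."""
    log_p_t = log_softmax(logits)[target]
    p_t = math.exp(log_p_t)
    # Optional per-class weight; None means no class weighting.
    alpha_t = 1.0 if alpha is None else alpha[target]
    return -alpha_t * (1.0 - p_t) ** gamma * log_p_t


def sigmoid_focal_loss(logits: Sequence[float], target: int, gamma: float = 2.0,
                       alpha: float = 0.25) -> float:
    """Hypothetical one-vs-rest (sigmoid + BCE) focal loss, summed over classes."""
    total = 0.0
    for c, z in enumerate(logits):
        is_positive = c == target
        # log(p_t): log(p) for the positive class, log(1 - p) = log_sigmoid(-z) otherwise.
        log_p_t = log_sigmoid(z) if is_positive else log_sigmoid(-z)
        p_t = math.exp(log_p_t)
        alpha_t = alpha if is_positive else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * log_p_t
    return total


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]
    print(softmax_focal_loss(logits, target=0))
    print(sigmoid_focal_loss(logits, target=0))
```

With `gamma=0` and no class weights, `softmax_focal_loss` reduces to ordinary cross-entropy, which is a quick sanity check. The sigmoid variant remains the natural choice for multi-label problems, where several classes can be positive for the same sample.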