Implement a function to compute the gradient of the cross-entropy loss with softmax activation, a crucial component in backpropagation for training deep neural networks. This task involves understanding the mathematical foundations of softmax and cross-entropy loss.
The softmax function is used for multi-class classification problems, where it maps a vector of real numbers to a vector of probabilities, ensuring that each element is in the range (0, 1) and the elements sum up to 1. The cross-entropy loss measures the difference between the predicted probabilities and the true distribution, typically represented as a one-hot encoded vector.
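The reason the gradient takes such a simple form can be sketched with a short derivation. The symbols here are notation assumed for illustration, not taken from the task statement: z is the logit vector for one sample, p the softmax output, y the one-hot target, and K the number of classes.

```latex
p_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad
L = -\sum_{i=1}^{K} y_i \log p_i

\frac{\partial p_i}{\partial z_k} = p_i \left( \delta_{ik} - p_k \right)

\frac{\partial L}{\partial z_k}
  = -\sum_{i=1}^{K} \frac{y_i}{p_i} \, p_i \left( \delta_{ik} - p_k \right)
  = -y_k + p_k \sum_{i=1}^{K} y_i
  = p_k - y_k
```

The last step uses the fact that a one-hot target sums to 1, so the gradient with respect to the logits is the predicted probability vector minus the one-hot vector.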
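To compute the gradient, we follow these steps: apply softmax to each row of logits to obtain the predicted probabilities, one-hot encode the target class, and subtract the one-hot vector from the probabilities.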
This technique is widely used in image classification tasks.
logits = [[1.0, 2.0, 3.0]]  # batch=1, classes=3
targets = [2]  # True class is index 2
[[0.09, 0.24, -0.33]]
Softmax of [1,2,3]: [0.09, 0.24, 0.67]
One-hot target: [0, 0, 1]
Gradient = softmax - one_hot = [0.09, 0.24, 0.67-1] = [0.09, 0.24, -0.33]
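One possible way to implement this is sketched below in plain Python. The function names softmax and softmax_cross_entropy_grad are hypothetical, chosen only for illustration. The sketch assumes targets are given as class indices and returns the per-sample gradient without averaging over the batch, since the task does not say whether averaging is expected (with batch=1 the result is the same either way).

```python
import math

def softmax(row):
    # Subtract the row maximum before exponentiating; this leaves the
    # result unchanged but avoids overflow for large logits.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy_grad(logits, targets):
    # logits: list of rows, one row of class scores per sample.
    # targets: list of true class indices, one per sample (assumed format).
    grads = []
    for row, target in zip(logits, targets):
        probs = softmax(row)
        # Subtracting the one-hot vector only changes the true-class entry.
        probs[target] -= 1.0
        grads.append(probs)
    return grads

grad = softmax_cross_entropy_grad([[1.0, 2.0, 3.0]], [2])
print([[round(g, 2) for g in row] for row in grad])  # [[0.09, 0.24, -0.33]]
```

Each row of the gradient sums to zero up to floating-point error, because the softmax probabilities sum to 1 and the one-hot vector also sums to 1. That makes for a quick sanity check on any implementation.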