Implement a function to compute the gradient of the cross-entropy loss with softmax activation, a crucial component in backpropagation for training deep neural networks. This task involves understanding the mathematical foundations of softmax and cross-entropy loss.

The softmax function is used for multi-class classification problems, where it maps a vector of real numbers to a vector of probabilities, ensuring that each element is in the range (0, 1) and the elements sum up to 1. The cross-entropy loss measures the difference between the predicted probabilities and the true distribution, typically represented as a one-hot encoded vector.

To compute the gradient, we follow these steps:

1. Compute the softmax of the input vector $z$ using the formula $p_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$.
2. Calculate the cross-entropy loss using $L = -\sum_i y_i \log(p_i)$, where $y$ is the one-hot encoded target vector.

The key formula for the gradient is:

$$\frac{\partial L}{\partial z_i} = p_i - y_i$$
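One way to see where this result comes from is to apply the chain rule through the softmax. The sketch below assumes that $y$ sums to 1 (as a one-hot vector does) and introduces the Kronecker delta $\delta_{ki}$, a symbol that does not appear in the original text:

$$\frac{\partial p_k}{\partial z_i} = p_k(\delta_{ki} - p_i)$$

$$\frac{\partial L}{\partial z_i} = -\sum_k \frac{y_k}{p_k} \cdot p_k(\delta_{ki} - p_i) = -y_i + p_i \sum_k y_k = p_i - y_i$$

The last step uses $\sum_k y_k = 1$.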

This gradient is the output-layer error signal for any classifier trained with softmax and cross-entropy, and it is widely used in image classification tasks.

Softmax of [1, 2, 3]: [0.09, 0.24, 0.67]
One-hot target: [0, 0, 1]

Gradient = softmax - one_hot = [0.09 - 0, 0.24 - 0, 0.67 - 1] = [0.09, 0.24, -0.33]
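The arithmetic above can be reproduced with a few lines of plain Python. This is a minimal sketch for a single vector: the helper name `softmax` and the max-subtraction trick for numerical stability are choices made here, not something the problem statement prescribes.

```python
import math

def softmax(z):
    """Softmax of a list of floats, shifted by max(z) to avoid overflow."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

z = [1.0, 2.0, 3.0]          # logits
y = [0.0, 0.0, 1.0]          # one-hot target, true class is index 2
p = softmax(z)
grad = [p_i - y_i for p_i, y_i in zip(p, y)]

print([round(v, 2) for v in p])     # [0.09, 0.24, 0.67]
print([round(g, 2) for g in grad])  # [0.09, 0.24, -0.33]
```

Because both the softmax output and the one-hot target sum to 1, the gradient entries sum to 0, which makes a quick sanity check.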

logits: Raw network outputs, shape (batch_size, num_classes)
targets: Ground-truth class indices, shape (batch_size,)
Return: Gradient tensor with the same shape as logits
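A batched version matching this interface might look like the sketch below. It rests on several assumptions: the function name `softmax_cross_entropy_grad` is hypothetical, nested Python lists stand in for tensors, and the gradient is returned per sample without averaging over the batch (the problem statement does not say whether a mean is expected, and a batch of one cannot distinguish the two).

```python
import math

def softmax_cross_entropy_grad(logits, targets):
    """Per-sample gradient of softmax cross-entropy with respect to logits.

    logits:  list of rows, shape (batch_size, num_classes)
    targets: list of class indices, shape (batch_size,)
    Returns a list of rows with the same shape as logits.
    Assumption: no averaging over the batch dimension.
    """
    grads = []
    for row, target in zip(logits, targets):
        m = max(row)                          # shift for numerical stability
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        grad_row = [e / total for e in exps]  # softmax probabilities p
        grad_row[target] -= 1.0               # p - y, with y one-hot at `target`
        grads.append(grad_row)
    return grads

grad = softmax_cross_entropy_grad([[1.0, 2.0, 3.0]], [2])
print([[round(g, 2) for g in row] for row in grad])  # [[0.09, 0.24, -0.33]]
```

Subtracting 1 at the target index avoids building the one-hot matrix explicitly. If the loss is defined as a mean over the batch, each row would additionally be divided by the batch size.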

Softmax Cross-Entropy Gradient

Medium | Deep Learning

Example:

Input:

logits = [[1.0, 2.0, 3.0]]  # batch=1, classes=3
targets = [2]  # True class is index 2

Output:

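[[0.09, 0.24, -0.33]]

Reasoning:

The softmax of [1.0, 2.0, 3.0] is approximately [0.09, 0.24, 0.67]. Subtracting the one-hot target [0, 0, 1] for class index 2 gives [0.09, 0.24, 0.67 - 1] = [0.09, 0.24, -0.33].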