NOTE
This is part of the 7th homework, titled Deep Learning I, for the course Machine Learning (IN2064) in the Winter Semester 2024/25 at TUM.
Problem 3:
In machine learning you often come across problems which contain the following quantity:
$$\log \sum_{i=1}^{n} \exp(x_i)$$
For example, if we want to calculate the log-likelihood of a neural network with a softmax output, this quantity appears due to the normalisation constant. If you try to calculate it naively, you will quickly encounter underflows or overflows, depending on the scale of the $x_i$. Despite working in log-space, the limited precision of floating-point arithmetic is not enough, and the result will be $-\infty$ or $\infty$. To combat this issue, we typically use the following identity:
$$\log \sum_{i=1}^{n} \exp(x_i) = b + \log \sum_{i=1}^{n} \exp(x_i - b)$$
for an arbitrary $b \in \mathbb{R}$. This means you can shift the center of the exponential sum. A typical choice is $b = \max_i x_i$, which forces the largest exponent to be zero, so even if the other terms underflow, you still get a reasonable result.
Your task is to show that the identity holds.
Solution:
$$\log \sum_{i=1}^{n} \exp(x_i) = \log \sum_{i=1}^{n} \exp\big((x_i - b) + b\big) \overset{(*)}{=} \log \left( \exp(b) \sum_{i=1}^{n} \exp(x_i - b) \right) = \log \exp(b) + \log \sum_{i=1}^{n} \exp(x_i - b) = b + \log \sum_{i=1}^{n} \exp(x_i - b)$$
In $(*)$ we used the fact that $\exp(u + v) = \exp(u)\exp(v)$, so the constant factor $\exp(b)$ can be pulled out of the sum. In this manner we have proved that LHS = RHS. Q.E.D.
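As a side note (not part of the required proof), the effect of the shift can be checked numerically. The following NumPy sketch compares the naive evaluation with the shifted one; the helper names `logsumexp_naive` and `logsumexp_stable` are just illustrative choices.

```python
import numpy as np

def logsumexp_naive(x):
    # Direct evaluation: exp(x) overflows for large x (e.g. x ~ 1000).
    return np.log(np.sum(np.exp(x)))

def logsumexp_stable(x):
    # Shift by b = max(x) so the largest exponent is exp(0) = 1.
    b = np.max(x)
    return b + np.log(np.sum(np.exp(x - b)))

x = np.array([1000.0, 1000.0, 1000.0])
print(logsumexp_naive(x))   # inf (overflow inside exp)
print(logsumexp_stable(x))  # ~1001.0986, i.e. 1000 + log(3)
```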
Problem 4:
Similar to the previous exercise, we can compute the output of the softmax function
$$\operatorname{softmax}(x)_i = \frac{\exp(x_i)}{\sum_{j=1}^{n} \exp(x_j)}$$
in a numerically stable way by shifting $x$ by an arbitrary constant $a$:
$$\operatorname{softmax}(x)_i = \frac{\exp(x_i - a)}{\sum_{j=1}^{n} \exp(x_j - a)}$$
Often, $a = \max_j x_j$. Show that the above identity holds.
Solution:
The proof is rather trivial:
$$\frac{\exp(x_i - a)}{\sum_{j=1}^{n} \exp(x_j - a)} = \frac{\exp(x_i)\exp(-a)}{\sum_{j=1}^{n} \exp(x_j)\exp(-a)} = \frac{\exp(-a)}{\exp(-a)} \cdot \frac{\exp(x_i)}{\sum_{j=1}^{n} \exp(x_j)} = \frac{\exp(x_i)}{\sum_{j=1}^{n} \exp(x_j)} = \operatorname{softmax}(x)_i$$
Q.E.D.
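Analogously, and again only as an illustrative sketch rather than part of the graded solution, the shifted softmax can be computed as follows; the helper name `softmax_stable` is a hypothetical choice.

```python
import numpy as np

def softmax_stable(x):
    # Subtract the maximum before exponentiating; by the identity above
    # this leaves the result unchanged but avoids overflow in exp.
    z = x - np.max(x)
    e = np.exp(z)
    return e / np.sum(e)

x = np.array([1000.0, 999.0, 998.0])
print(softmax_stable(x))  # ~[0.665, 0.245, 0.090], sums to 1
```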