NOTE

This is part of the 7th homework, titled Deep Learning I, for the course Machine Learning (IN2064) in the Winter Semester 2024/25 at TUM.

Problem 3:

In machine learning you often come across problems which contain the following quantity:

$$\log \sum_{i=1}^{n} \exp(x_i)$$

For example, if we want to calculate the log-likelihood of a neural network with a softmax output, we get this quantity due to the normalisation constant. If you try to calculate it naively, you will quickly encounter underflows or overflows, depending on the scale of the $x_i$. Even though we work in log-space, the limited floating-point precision of computers is not enough, and the result will be $-\infty$ or $\infty$. To combat this issue, we typically use the following identity:

$$\log \sum_{i=1}^{n} \exp(x_i) = a + \log \sum_{i=1}^{n} \exp(x_i - a)$$

for an arbitrary $a \in \mathbb{R}$. This means you can shift the center of the exponential sum. A typical choice is $a = \max_i x_i$, which forces the largest exponent to be zero, so even if the other terms underflow, you still get a reasonable result.

Your task is to show that the identity holds.

Solution:

We pull the constant factor $\exp(a)$ out of the sum:

$$\log \sum_{i=1}^{n} \exp(x_i) = \log \sum_{i=1}^{n} \exp(x_i - a)\,\exp(a) = \log \left( \exp(a) \sum_{i=1}^{n} \exp(x_i - a) \right) = a + \log \sum_{i=1}^{n} \exp(x_i - a)$$

In the first step we used the fact that $\exp(u + v) = \exp(u)\exp(v)$, and in the last step that $\log(uv) = \log u + \log v$ together with $\log \exp(a) = a$. In this manner we have proved that LHS = RHS. Q.E.D.
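For illustration, here is a minimal NumPy sketch of the trick (the function names and the example values are illustrative, not part of the exercise):

```python
import numpy as np

def logsumexp_naive(x):
    # Direct evaluation: exp(x) overflows for large x and underflows for very negative x.
    return np.log(np.sum(np.exp(x)))

def logsumexp_stable(x):
    # Shift by a = max(x) so the largest exponent is exactly 0,
    # which keeps exp(x - a) in a representable range.
    a = np.max(x)
    return a + np.log(np.sum(np.exp(x - a)))

x = np.array([1000.0, 1001.0, 1002.0])
print(logsumexp_naive(x))   # inf (overflow in exp)
print(logsumexp_stable(x))  # ~1002.4076
```

The same idea is what `scipy.special.logsumexp` implements.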

Problem 4:

Similar to the previous exercise, we can compute the output of the softmax function

$$\mathrm{softmax}(\mathbf{x})_i = \frac{\exp(x_i)}{\sum_{j=1}^{n} \exp(x_j)}$$

in a numerically stable way by shifting the inputs by an arbitrary constant $a$:

$$\frac{\exp(x_i)}{\sum_{j=1}^{n} \exp(x_j)} = \frac{\exp(x_i - a)}{\sum_{j=1}^{n} \exp(x_j - a)}$$

Often, $a = \max_j x_j$. Show that the above identity holds.

Solution:

The proof is rather trivial:

$$\frac{\exp(x_i - a)}{\sum_{j=1}^{n} \exp(x_j - a)} = \frac{\exp(x_i)\exp(-a)}{\sum_{j=1}^{n} \exp(x_j)\exp(-a)} = \frac{\exp(-a)\,\exp(x_i)}{\exp(-a)\sum_{j=1}^{n} \exp(x_j)} = \frac{\exp(x_i)}{\sum_{j=1}^{n} \exp(x_j)}$$

Here we used $\exp(u - v) = \exp(u)\exp(-v)$ and then cancelled the common factor $\exp(-a) > 0$ from the numerator and the denominator. Q.E.D.
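Analogously, a minimal NumPy sketch of the shifted softmax (the function names `softmax_naive` and `softmax_stable` are illustrative, not from the exercise sheet):

```python
import numpy as np

def softmax_naive(x):
    # exp(x) overflows for large entries, giving inf / inf = nan.
    e = np.exp(x)
    return e / np.sum(e)

def softmax_stable(x):
    # Subtracting max(x) leaves the output unchanged (see the identity above)
    # but keeps every exponent <= 0, so nothing overflows.
    z = x - np.max(x)
    e = np.exp(z)
    return e / np.sum(e)

x = np.array([1000.0, 1001.0, 1002.0])
print(softmax_naive(x))   # [nan nan nan]
print(softmax_stable(x))  # [0.09003057 0.24472847 0.66524096]
```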