NOTE

This is part of the 3rd homework, titled Probabilistic Inference, for the course Machine Learning (IN2064) in the Winter Semester 2024/25 at TUM.

Intro

Usually, we maximise the log-likelihood instead of the likelihood. The next two problems provide a justification for this. In the lecture, we encountered the likelihood maximisation problem:

$$\theta^{*} = \operatorname*{arg\,max}_{\theta \in [0, 1]} \; \theta^{T} (1 - \theta)^{H},$$

where $T$ and $H$ denote the number of tails and heads in a sequence of coin tosses, respectively, and $\theta$ is the probability of a single toss landing tails.
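As a quick numerical illustration (our addition, not part of the exercise, with made-up counts), a brute-force grid search over $\theta$ already reveals where this likelihood peaks:

```python
import numpy as np

# Made-up example data: 7 tails and 3 heads in 10 tosses.
T, H = 7, 3

# Evaluate the likelihood L(theta) = theta^T * (1 - theta)^H on a grid.
thetas = np.linspace(0.01, 0.99, 99)
likelihood = thetas**T * (1 - thetas)**H

# The grid maximiser matches the closed-form MLE T / (T + H) = 0.7.
print(thetas[np.argmax(likelihood)])
```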

Problem 6

Compute the first and second derivative of this likelihood with respect to $\theta$. Then compute the first and second derivative of the log-likelihood $\log\big(\theta^{T} (1 - \theta)^{H}\big)$.

Solution

Likelihood $L(\theta) = \theta^{T} (1 - \theta)^{H}$:

  • First derivative:
    $$L'(\theta) = T \theta^{T-1} (1 - \theta)^{H} - H \theta^{T} (1 - \theta)^{H-1} = \theta^{T-1} (1 - \theta)^{H-1} \big( T (1 - \theta) - H \theta \big)$$
  • Second derivative:
    $$L''(\theta) = T (T - 1) \theta^{T-2} (1 - \theta)^{H} - 2 T H \theta^{T-1} (1 - \theta)^{H-1} + H (H - 1) \theta^{T} (1 - \theta)^{H-2}$$

Log-likelihood $\ell(\theta) = \log L(\theta) = T \log \theta + H \log (1 - \theta)$:

  • First derivative:
    $$\ell'(\theta) = \frac{T}{\theta} - \frac{H}{1 - \theta}$$
  • Second derivative:
    $$\ell''(\theta) = -\frac{T}{\theta^{2}} - \frac{H}{(1 - \theta)^{2}}$$
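As a sanity check (our addition, not part of the graded solution), the derivatives above can be verified symbolically with sympy:

```python
import sympy as sp

theta, T, H = sp.symbols("theta T H", positive=True)

L = theta**T * (1 - theta)**H                    # likelihood
ell = T * sp.log(theta) + H * sp.log(1 - theta)  # log-likelihood

# First and second derivatives of the likelihood.
print(sp.factor(sp.diff(L, theta)))
print(sp.diff(L, theta, 2))

# First and second derivatives of the log-likelihood.
print(sp.diff(ell, theta))     # T/theta - H/(1 - theta)
print(sp.diff(ell, theta, 2))  # -T/theta**2 - H/(1 - theta)**2
```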

Problem 7

Show that for any differentiable, positive function $f$, every local maximum of $\log f$ is also a local maximum of $f$. Considering this and the previous exercise, what is your conclusion?

Solution

Let $f$ be a differentiable and positive function on its domain. Denote $g(\theta) = \log f(\theta)$. We will show that if $\theta^{*}$ is a local maximum of $g$, then $\theta^{*}$ is also a local maximum of $f$.

Since $f(\theta) > 0$, $g$ is well-defined and differentiable. By the chain rule, the derivative of $g$ is given by:

$$g'(\theta) = \frac{f'(\theta)}{f(\theta)}.$$

Now, if $\theta^{*}$ is a local maximum of $g$, then $g'(\theta^{*}) = 0$. Substituting $\theta = \theta^{*}$:

$$\frac{f'(\theta^{*})}{f(\theta^{*})} = 0.$$

Since $f(\theta^{*}) > 0$, this implies:

$$f'(\theta^{*}) = 0.$$

Thus, $\theta^{*}$ is a critical point of $f$. As $\theta^{*}$ is a local maximum of $g$, there exists an interval around $\theta^{*}$ such that for all $\theta$ in this interval:

$$g(\theta) \le g(\theta^{*}).$$

Equivalently:

$$\log f(\theta) \le \log f(\theta^{*}).$$

Exponentiating both sides (which preserves the inequality since the exponential function is monotonically increasing):

$$f(\theta) \le f(\theta^{*}).$$

Thus, $\theta^{*}$ is a local maximum of $f$.
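To make the statement tangible (an illustration we add here, not part of the proof), a quick numerical check on an arbitrary made-up positive function confirms that $f$ and $\log f$ peak at the same point:

```python
import numpy as np

# A made-up differentiable, strictly positive function with its peak at x = 1.
xs = np.linspace(-3.0, 5.0, 8001)
f = np.exp(-(xs - 1.0) ** 2) + 0.1
g = np.log(f)  # well-defined because f > 0 everywhere

# f and log(f) attain their maximum at the same grid point.
assert np.argmax(f) == np.argmax(g)
print(xs[np.argmax(f)])  # ~1.0
```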

Conclusion

Combining this result with the previous exercise, we conclude that maximising the likelihood is equivalent to maximising the log-likelihood: since the likelihood is positive, both functions share the same local maximisers. However, the log-likelihood has much simpler derivatives (compare Problem 6), making it easier to work with in practice.
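Consistent with this, solving the simple log-likelihood equation $\ell'(\theta) = 0$ symbolically (our added sketch, not part of the exercise) immediately recovers the closed-form maximum-likelihood estimate:

```python
import sympy as sp

theta, T, H = sp.symbols("theta T H", positive=True)

# Set the log-likelihood's first derivative to zero and solve for theta.
ell = T * sp.log(theta) + H * sp.log(1 - theta)
mle = sp.solve(sp.Eq(sp.diff(ell, theta), 0), theta)

print(mle)  # [T/(H + T)], i.e. the fraction of tails among all tosses
```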