Properties of MLE and MAP

NOTE

This is part of the 3rd homework, titled Probabilistic Inference, for the course Machine Learning (IN2064) in the Winter Semester 2024/25 at TUM.

Problem 8:

Consider a Bernoulli random variable $X$ and suppose we have observed $m$ occurrences of $X = 1$ and $l$ occurrences of $X = 0$ in a sequence of $N = m + l$ Bernoulli experiments. We are only interested in the number of occurrences of $X = 1$ —we will model this with a Binomial distribution with parameter $θ$ . A prior distribution for $θ$ is given by the Beta distribution with parameters $a, b$ . Show that the posterior mean value $E [θ ∣ D]$ (not the MAP estimate) of $θ$ lies between the prior mean of $θ$ and the maximum likelihood estimate for $θ$ .

To do this, show that the posterior mean can be written as $λ$ times the prior mean plus $(1 - λ)$ times the maximum likelihood estimate, with $0 \leq λ \leq 1$ . This illustrates the concept of the posterior mean being a compromise between the prior distribution and the maximum likelihood solution.

The probability mass function of the Binomial distribution for some $m \in {0, 1, \dots, N}$ is:

p (x = m ∣ N, θ) = (m N) θ^{m} (1 - θ)^{N - m} .

Hint: Identify the posterior distribution. You may then look up the mean rather than computing it.

Solution

Our goal is to show that the posterior mean $E [θ ∣ D]$ lies between the prior mean $E [θ]$ and the maximum likelihood estimate (MLE) $\hat{θ}_{MLE}$ . Specifically, we aim to show that:

E [θ ∣ D] = λ \cdot E [θ] + (1 - λ) \cdot \hat{θ}_{MLE},

for some $λ \in [0, 1]$ . This illustrates the concept that the posterior mean is a compromise between the prior distribution and the maximum likelihood solution.

We will proceed in the following steps:

Calculating the posterior distribution.
Calculating the mean of the posterior distribution.
Calculating the maximum likelihood estimate (MLE) of $θ$ .
Expressing the posterior mean as a convex combination of the prior mean and the MLE.

1. Calculating the posterior distribution

First, we recall the forms of the prior distribution and the likelihood function:

Prior distribution: $P (θ) = Beta (a, b) = \frac{Γ ( a + b )}{Γ ( a ) Γ ( b )} θ^{a - 1} (1 - θ)^{b - 1}, θ \in [0, 1] .$
Likelihood function:

P (D ∣ θ) = (m N) θ^{m} (1 - θ)^{N - m},

where $D$ represents the observed data (with $m$ successes and $l = N - m$ failures).

Using Bayes’ theorem, the posterior distribution is proportional to the product of the likelihood and the prior:

P (θ ∣ D) \propto P (D ∣ θ) P (θ) .

Substituting the expressions for the likelihood and the prior, we get:

P (θ ∣ D) \propto [θ^{m} (1 - θ)^{N - m}] [θ^{a - 1} (1 - θ)^{b - 1}] = θ^{(a + m - 1)} (1 - θ)^{(b + N - m - 1)} = θ^{(a + m - 1)} (1 - θ)^{(b + l - 1)} .

It’s not hard to notice that we arrived at unnormalized form of a Beta distribution with updated parameters. Thanks to our perceptiveness we don’t have to go through the hoops of calculating normalizing constant, as it will be equal to the gamma term in the PDF of Beta distribution. As such we have calculated a posterior distribution:

P (θ ∣ D) = Beta (a^{'}, b^{'}),

where:

a^{'} = a + m, b^{'} = b + l .

2. Calculating the mean of the posterior distribution

The mean of a Beta distribution $Beta (α, β)$ is given by:

E [θ] = \frac{α}{α + β} .

Therefore, the posterior mean as given as follows:

E [θ ∣ D] = E [B e t a (α + m, β + l)] = \frac{a + m}{a + b + N} .

3. Calculating the maximum likelihood estimate (MLE)

The maximum likelihood estimate of $θ$ for the Binomial distribution is the sample proportion:

\hat{θ}_{MLE} = \frac{m}{N} .

4. Expressing the posterior mean as a convex combination

The prior mean is:

E [θ] = \frac{a}{a + b} .

Now, we aim to express $E [θ ∣ D]$ as a convex combination of $E [θ]$ and $\hat{θ}_{MLE}$ . Consider:

E [θ ∣ D] = \frac{a + m}{a + b + N} = \frac{a + b}{a + b + N} \cdot \frac{a}{a + b} + \frac{N}{a + b + N} \cdot \frac{m}{N} = (\frac{a + b}{a + b + N}) E [θ] + (\frac{N}{a + b + N}) \hat{θ}_{MLE} .

Defining:

λ = \frac{a + b}{a + b + N}, 1 - λ = \frac{N}{a + b + N},

we have:

E [θ ∣ D] = λ \cdot E [θ] + (1 - λ) \cdot \hat{θ}_{MLE} .

Since $a, b, N \geq 0$ , it follows that $λ \in [0, 1]$ .

Conclusion

We have shown that the posterior mean $E [θ ∣ D]$ is a weighted average of the prior mean $E [θ]$ and the MLE $\hat{θ}_{MLE}$ , with weights $λ$ and $1 - λ$ , respectively, where $λ = \frac{a + b}{a + b + N}$ .

Therefore, the posterior mean lies between the prior mean and the maximum likelihood estimate:

E [θ ∣ D] = λ \cdot E [θ] + (1 - λ) \cdot \hat{θ}_{MLE}, with 0 \leq λ \leq 1.

This result illustrates how the posterior mean balances the prior belief and the observed data in Bayesian inference.

Problem 9

Consider the following probabilistic model:

p (λ ∣ a, b) p (x ∣ λ) = Gamma (λ ∣ a, b) = \frac{b ^{a}}{Γ ( a )} λ^{a - 1} exp (- bλ) = Poisson (x ∣ λ) = \frac{λ ^{x} exp ( - λ )}{x !}

where $a \in (1, \infty)$ and $b \in (0, \infty)$ . We have observed a single data point $x \in N$ . Task: Derive the maximum a posteriori (MAP) estimate of the parameter $λ$ for the above probabilistic model. Show your work.

Solution

To calculate the maximum a posteriori, we need to compute the following formula:

ar g λ max p (λ ∣ a, b, x) (1)

From Bayes’ theorem we can derive:

p (λ ∣ a, b, x) = \frac{p ( a , b , x ∣ λ ) \cdot p ( λ )}{p ( a , b , x )} = \frac{p ( a , b ∣ λ ) \cdot p ( x ∣ λ ) \cdot p ( λ )}{p ( a , b ) \cdot p ( x )} = \frac{p ( x ∣ λ ) \cdot p ( λ ∣ a , b )}{p ( x )} \propto p (x ∣ λ) \cdot p (λ ∣ a, b) (2)

Applying Equation (2) to Equation (1), we get:

ar g λ max [lo g p (x ∣ λ) + lo g p (λ ∣ a, b)] = = ar g λ max [lo g (\frac{λ ^{x} exp ( - λ )}{x !}) + lo g (\frac{b ^{a}}{Γ ( a )} λ^{a - 1} exp (- bλ))] = = ar g λ max [x lo g λ - λ + (a - 1) lo g λ - bλ] = = ar g λ max [x lo g λ - λ + (a - 1) lo g λ - bλ] = = ar g λ max [(x + a - 1) lo g λ - (1 + b) λ] (3)

From the assumptions, we know that:

$a \in (1, \infty)$ ,
$b \in (0, \infty)$ ,
$x \in N$ .

Because of these and the concavity of the logarithm function, we know that Equation (3) is concave.

To find the maximum, we set the derivative to zero:

\frac{( x + a - 1 )}{λ} - (1 + b) = 0

Solving for $λ$ :

λ = \frac{x + a - 1}{1 + b}

Answer:

λ_{MAP} = \frac{x + a - 1}{1 + b}

aderylo: field-report

Explorer

Properties of MLE and MAP

Problem 8:

Solution

1. Calculating the posterior distribution

2. Calculating the mean of the posterior distribution

3. Calculating the maximum likelihood estimate (MLE)

4. Expressing the posterior mean as a convex combination

Conclusion

Problem 9

Solution

Graph View

Table of Contents

Backlinks