Maximum likelihood is one of those topics in mathematical statistics that takes a while to wrap your head around. At first glance it seems to take something easy (estimating a parameter with a statistic) and make it way more complicated than it needs to be. For example, a frequent exercise is to find the maximum likelihood estimator of the mean of a normal distribution. You take the product of the n normal pdfs, take the log of that, find the first derivative, set it equal to 0, and solve. You find out that it's the sample mean, $\bar{x}$.
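For the record, that calculation runs roughly like this (treating $\sigma^2$ as known):

$$ \ell(\mu) = \log \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x_i - \mu)^2}{2\sigma^2}} = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2 $$

$$ \frac{d\ell}{d\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0 \quad\Longrightarrow\quad \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x} $$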
Anyway, one result of maximum likelihood that baffled me for the longest time was the variance of a maximum likelihood estimator. It’s this:
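With $L(\theta)$ denoting the likelihood of the sample and $\hat{\theta}$ the maximum likelihood estimator of the parameter $\theta$:

$$ \operatorname{Var}(\hat{\theta}) \approx \frac{1}{-\,E\!\left[\dfrac{\partial^2}{\partial\theta^2}\log L(\theta)\right]} $$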
To me this made no intuitive sense at all. In fact, it still doesn't. But it works. Now many statistics books will go over determining the maximum likelihood estimator in painstaking detail, but then they'll blow through the variance of the estimator in a few lines. The purpose of this post is to fill in those gaps.
It all starts with Taylor’s theorem, which says (in so many words):
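Written for the function we need it for, namely the first derivative of the log-likelihood $\ell(\theta) = \log L(\theta)$, expanded about the true parameter value $\theta$ and evaluated at the maximum likelihood estimate $\hat{\theta}$, it looks like this:

$$ \ell'(\hat{\theta}) = \ell'(\theta) + (\hat{\theta} - \theta)\,\ell''(\theta) + \text{remainder} $$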
We're interested in approximating the variance, so we forget about the remainder. The rest is another (approximate) way to express the left-hand side, $\ell'(\hat{\theta})$:
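$$ \ell'(\hat{\theta}) \approx \ell'(\theta) + (\hat{\theta} - \theta)\,\ell''(\theta) $$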
The first thing to note here is that $\ell'(\hat{\theta}) = 0$: by definition, $\hat{\theta}$ is the value that maximizes the log-likelihood, so the first derivative vanishes there. Set the left-hand side to 0 and solve for $\hat{\theta}$:
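$$ 0 \approx \ell'(\theta) + (\hat{\theta} - \theta)\,\ell''(\theta) \quad\Longrightarrow\quad \hat{\theta} \approx \theta - \frac{\ell'(\theta)}{\ell''(\theta)} $$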
Now let’s take the variance of that expression:
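$$ \operatorname{Var}(\hat{\theta}) \approx \operatorname{Var}\!\left(\theta - \frac{\ell'(\theta)}{\ell''(\theta)}\right) $$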
Wait, hold up. There's a random quantity sitting in the denominator, and variances don't play nicely with that. The standard move is to note that $\theta$ is a constant (it's the true parameter value, not a random variable) and to treat $\ell''(\theta)$ as approximately constant, replacing it with its expected value $E[\ell''(\theta)]$, a law-of-large-numbers style approximation that gets better as the sample size grows.
Now take the variance:
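The constant $\theta$ contributes nothing, and dividing by the (approximately) constant $E[\ell''(\theta)]$ pulls it out of the variance as a square; the minus sign disappears in the squaring:

$$ \operatorname{Var}(\hat{\theta}) \approx \frac{\operatorname{Var}\!\big(\ell'(\theta)\big)}{\big[E\,\ell''(\theta)\big]^2} $$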
OK, getting closer. It's starting to look like the result I showed at the beginning. We still need to find the numerator, $\operatorname{Var}(\ell'(\theta))$, the variance of the first derivative of the log-likelihood.
First let's recall the useful formula for the variance of any random variable $X$:
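$$ \operatorname{Var}(X) = E(X^2) - \big[E(X)\big]^2 $$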
Now it turns out that the expected value of $\ell'(\theta)$, the first derivative of the log-likelihood (often called the score), is 0.
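Here it helps to write things in terms of the density. Let $f(x;\theta)$ be the joint density of the data, so that $L(\theta) = f(x;\theta)$ once the data are in hand, and $\ell'(\theta) = \frac{\partial f(x;\theta)/\partial\theta}{f(x;\theta)}$. Then, assuming the density is regular enough to swap differentiation and integration,

$$ E\big[\ell'(\theta)\big] = \int \frac{\partial f(x;\theta)/\partial\theta}{f(x;\theta)}\, f(x;\theta)\, dx = \int \frac{\partial f(x;\theta)}{\partial\theta}\, dx = \frac{\partial}{\partial\theta}\int f(x;\theta)\, dx = \frac{\partial}{\partial\theta}\, 1 = 0 $$

So the second piece of the variance formula drops out, and $\operatorname{Var}(\ell'(\theta)) = E\big[(\ell'(\theta))^2\big]$.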
Once again, we have to back up and make yet another observation. Notice the following:
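This is the second derivative of the log-likelihood, expanded with the quotient rule:

$$ \ell''(\theta) = \frac{\partial}{\partial\theta}\left[\frac{\partial f(x;\theta)/\partial\theta}{f(x;\theta)}\right] = \frac{\partial^2 f(x;\theta)/\partial\theta^2}{f(x;\theta)} - \left[\frac{\partial f(x;\theta)/\partial\theta}{f(x;\theta)}\right]^2 $$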
Recall that $\ell'(\theta) = \frac{\partial f(x;\theta)/\partial\theta}{f(x;\theta)}$ (the derivative of the log of a function is the derivative of the function divided by the function itself), so the squared term at the end is just $[\ell'(\theta)]^2$:
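$$ \ell''(\theta) = \frac{\partial^2 f(x;\theta)/\partial\theta^2}{f(x;\theta)} - \big[\ell'(\theta)\big]^2 $$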
Now rearrange as follows and see what we have:
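$$ \big[\ell'(\theta)\big]^2 = -\,\ell''(\theta) + \frac{\partial^2 f(x;\theta)/\partial\theta^2}{f(x;\theta)} $$

The left-hand side is exactly the quantity whose expectation we need.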
Look at our variance formula we were working on:
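$$ \operatorname{Var}\!\big(\ell'(\theta)\big) = E\Big[\big(\ell'(\theta)\big)^2\Big] $$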
See where we can make the substitution? Let’s do it:
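$$ \operatorname{Var}\!\big(\ell'(\theta)\big) = E\!\left[-\,\ell''(\theta) + \frac{\partial^2 f(x;\theta)/\partial\theta^2}{f(x;\theta)}\right] = -\,E\big[\ell''(\theta)\big] + E\!\left[\frac{\partial^2 f(x;\theta)/\partial\theta^2}{f(x;\theta)}\right] $$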
The expected value of the second term is 0 for the same reason that $E[\ell'(\theta)]$ is 0: the density cancels inside the integral, and what remains is a derivative of a constant:
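$$ E\!\left[\frac{\partial^2 f(x;\theta)/\partial\theta^2}{f(x;\theta)}\right] = \int \frac{\partial^2 f(x;\theta)}{\partial\theta^2}\, dx = \frac{\partial^2}{\partial\theta^2}\int f(x;\theta)\, dx = \frac{\partial^2}{\partial\theta^2}\, 1 = 0 $$

That leaves us with the variance of the first derivative of the log-likelihood:

$$ \operatorname{Var}\!\big(\ell'(\theta)\big) = -\,E\big[\ell''(\theta)\big] $$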
OK, we’re ALMOST THERE! Now bring back the full variance expression we had earlier…
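$$ \operatorname{Var}(\hat{\theta}) \approx \frac{\operatorname{Var}\!\big(\ell'(\theta)\big)}{\big[E\,\ell''(\theta)\big]^2} $$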
…and plug in what we just found:
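$$ \operatorname{Var}(\hat{\theta}) \approx \frac{-\,E\big[\ell''(\theta)\big]}{\big[E\,\ell''(\theta)\big]^2} = \frac{-\,E\big[\ell''(\theta)\big]}{\big[-\,E\,\ell''(\theta)\big]^2} $$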
Do the cancellation and we get the final reduced expression for the variance of the maximum likelihood estimator:
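$$ \operatorname{Var}(\hat{\theta}) \approx \frac{1}{-\,E\big[\ell''(\theta)\big]} = \frac{1}{-\,E\!\left[\dfrac{\partial^2}{\partial\theta^2}\log L(\theta)\right]} $$

Exactly the result quoted at the top. The quantity in the denominator, $-E[\ell''(\theta)]$, is the Fisher information, so the variance of a maximum likelihood estimator is (approximately) the reciprocal of the information in the sample.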