MLE in R

Note: This exercise is borrowed from Jason Blevins.

This is a short exercise in running maximum likelihood estimation (MLE) in R. We will do it for the following AR(1) process: \[ z_t = \mu + \rho (z_{t - 1} - \mu) + \sigma \varepsilon_t \] where \(\varepsilon_t \sim N(0, 1)\).

Before I get into the details of MLE, what is the fundamental insight of MLE? The idea is that we want to estimate the parameters of the model that generated our data. There are some “true” parameters that we are trying to estimate (\(\mu\), \(\rho\), and \(\sigma\)), but we cannot observe them directly. For any given parameter values, we can compute the probability of seeing our specific draw of the data (i.e. the “likelihood” of the data). MLE says that the best guess for the parameters is the set of values that maximizes this likelihood.

So, back to the exercise. We want to perform MLE to estimate the above AR(1) process. First, let’s get some data. We’ll assume that the true parameter values are \(\mu = -0.5\), \(\rho = 0.9\), and \(\sigma = 1\). I also set the seed of the random number generator so that anyone running this script will be able to reproduce these results.

set.seed(100)

# True parameters
mu <- -0.5
rho <- 0.9
sigma <- 1.0

I’ll simulate this data for 10,000 periods and allocate a vector to hold the data.

periods <- 10000
z <- double(periods)

I need to draw an initial value of \(z_1\) from which to generate the full stream of data. Substituting in \(z\) recursively, we can show that the unconditional (stationary) distribution of \(z\) is \(N(\mu, \sigma^2 / (1 - \rho^2))\).
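Concretely, repeated substitution of the law of motion into itself (which converges because \(|\rho| < 1\)) gives \[ z_t = \mu + \sigma \sum_{j = 0}^{\infty} \rho^j \varepsilon_{t - j}, \qquad \text{so} \qquad E[z_t] = \mu, \quad \operatorname{Var}(z_t) = \sigma^2 \sum_{j = 0}^{\infty} \rho^{2j} = \frac{\sigma^2}{1 - \rho^2}. \]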

z[1] <- rnorm(1, mean = mu, sd = sigma / sqrt(1 - rho^2))

Now, I generate the rest of the data.1

# Draw all of the innovations outside the loop, one per remaining period
eps <- rnorm(periods - 1)
for (i in 2:periods) {
  z[i] <- mu + rho * (z[i - 1] - mu) + sigma * eps[i - 1]
}

Let’s see what our simulated data looks like:

plot(z, type = "l")

We can compare the moments with the true moments as well.

cat("True mean: ", round(mu, 2), 
    "\nSample mean: ", round(mean(z), 2))
## True mean:  -0.5 
## Sample mean:  -0.46
cat("True sd: ", round(sigma / sqrt(1 - rho^2), 2), 
    "\nSample sd: ", round(sd(z), 2))
## True sd:  2.29 
## Sample sd:  2.19

The good news is that the sample moments are close to the true parameters. So how do we translate this into an MLE algorithm that outputs parameter estimates? This is where we need to construct the likelihood function to be maximized. Basically, we need to compute the probability of the observed data given our parameter guesses. Since this is an AR(1) process, the observations are not independent, so the likelihood is not simply the product of the marginal densities of each data point (as written below):

\[ \mathcal{L}(\theta ; z) \neq \prod_t f(z_t | \theta) \]

However, since our data comes from an AR(1) process, each observation depends on the past only through the previous observation (the Markov property). In particular, conditioning on the first observation \(z_1\), the likelihood is the following product:

\[ \mathcal{L}(\theta ; z) = \prod_{t = 2}^{T} f(z_t | z_{t - 1}, \theta) \] where \(T\) is the number of periods. To make this more convenient to work with, we take the natural log, since this converts the product into a sum: \[ \ell (\theta ; z) = \sum_{t = 2}^{T} \ln f(z_t | z_{t - 1}, \theta) \]

Before we can code up the likelihood, we need to know what \(f\) looks like. Looking at the AR(1) representation, we can show that \(z_t | z_{t - 1} \sim N(\mu + \rho (z_{t - 1} - \mu), \sigma^2)\).
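This follows directly from the law of motion: conditional on \(z_{t - 1}\), the only remaining randomness is \(\varepsilon_t\), so \[ z_t - \left( \mu + \rho (z_{t - 1} - \mu) \right) = \sigma \varepsilon_t \sim N(0, \sigma^2). \]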

Now that we know what the conditional distribution of \(z\) is, we can code our likelihood function and let R do the rest of the work. The optimizer that we will use in R will find the minimum of our function, so we take the negative of everything.2 I store our parameters in \(\theta = \{\mu, \rho, \sigma \}\).

# Negative conditional log-likelihood of the AR(1) process
logl <- function(theta, data) {
  mu <- theta[1]
  rho <- theta[2]
  sigma <- theta[3]
  periods <- length(data)
  # z_t | z_{t-1} is normal with mean mu + rho * (z_{t-1} - mu) and sd sigma;
  # negate the sum of log densities so that minimizing it maximizes the likelihood
  sum(-dnorm(data[2:periods], 
             mean = mu + rho * (data[1:(periods - 1)] - mu), 
             sd = sigma, log = TRUE))
}
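
As a quick sanity check, not part of the original exercise, we can confirm that the negative log-likelihood is lower (i.e. the likelihood is higher) at the true parameters than at an arbitrary guess:

# Smaller value = higher likelihood; the true parameters should beat a poor guess
logl(c(mu, rho, sigma), data = z)
logl(c(0.1, 0.1, 0.1), data = z)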

Now we have a function that outputs the negative log-likelihood for given values of our parameters and the data. We can use nlm to compute the estimated parameters (I make the initial guess 0.1 for each parameter) and store the results in out.

theta1 <- c(0.1, 0.1, 0.1)
out <- suppressWarnings(nlm(logl, theta1, data = z))
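
The suppressWarnings wrapper is there because nlm searches over unconstrained values and may try a negative \(\sigma\) along the way, which makes dnorm produce NaNs and warnings. An alternative, not used in the original exercise, is box-constrained optimization with optim and the L-BFGS-B method; the sketch below keeps \(\sigma\) strictly positive and restricts \(\rho\) to the stationary region (the bounds are my own choices):

# Sketch: box-constrained optimization keeps sigma positive and rho stationary,
# so there are no warnings to suppress
out2 <- optim(theta1, logl, data = z, method = "L-BFGS-B",
              lower = c(-Inf, -0.999, 1e-6), upper = c(Inf, 0.999, Inf))
round(out2$par, 2)  # estimated mu, rho, sigma

Both approaches should arrive at essentially the same estimates here.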

Let’s see how the MLE estimates compare to our true parameters.

cat("True mu: ", mu, 
    "\nMLE mu: ", round(out$estimate[1], 2))
## True mu:  -0.5 
## MLE mu:  -0.46
cat("True rho: ", rho, 
    "\nMLE rho: ", round(out$estimate[2], 2))
## True rho:  0.9 
## MLE rho:  0.89
cat("True sigma: ", sigma,
    "\nMLE sigma: ", round(out$estimate[3], 2))
## True sigma:  1 
## MLE sigma:  0.99

Not bad! All three estimates land within a few hundredths of the true values.


  1. For efficiency, I make all of the error draws outside of the loop. Given that we’re only simulating 10,000 periods, this probably does not significantly affect performance, but for larger simulations it’s much better to make the draws outside of a loop.

  2. Maximizing a function is equivalent to minimizing the negative of that function.
