I recently read The Checklist Manifesto by Atul Gawande and was fascinated by how relatively simple checklists can improve performance and results in such complex endeavors as surgery or flying a commercial airplane. I decided I wanted to make a checklist of my own for logistic regression. It ended up not being a checklist on how to do it per se, but rather a list of important facts to remember. Here’s what I came up with.
- Logistic regression models the probability that y = 1, \( P(y_{i} = 1) = logit^{-1}(X_{i}\beta) \), where \( logit^{-1}(x) = \frac{e^{x}}{1+e^{x}} \).
- Logistic regression predictions are probabilistic: the model predicts the probability that y = 1. It does not make a point prediction.
- The function \( logit^{-1}(x) = \frac{e^{x}}{1+e^{x}} \) transforms continuous values to the range (0,1).
- Dividing a regression coefficient by 4 gives an upper bound on the predictive difference corresponding to a unit difference in x, because the logistic curve is steepest at its midpoint (where the predicted probability is 0.5) and its slope there is \(\beta/4\). For example, if \(\beta = 0.33\), then \(0.33/4 \approx 0.08\). This means a unit increase in x corresponds to no more than an 8% positive difference in the probability that y = 1. (A quick numeric check appears in the R sketches after this list.)
- The odds of success (i.e., y = 1) increase multiplicatively by \( e^{\beta} \) for every one-unit increase in x. That is, exponentiated logistic regression coefficients can be interpreted as odds ratios. For example, let’s say we have a regression coefficient of 0.497. Exponentiating gives \( e^{0.497} = 1.64 \). That means the odds of success increase by 64% for each one-unit increase in x. Recall that odds = \( \frac{p}{1-p} \). If our predicted probability at x is 0.674, then the odds of success are \(\frac{0.674}{0.326} = 2.07 \). Therefore at x + 1, the odds will increase by 64% from 2.07 to \(2.07(1.64) = 3.40\). Notice that \( 1.64 = \frac{3.40}{2.07}\), which is an odds ratio. The ratio of the odds at x + 1 to the odds at x will always be \( e^{\beta}\), where \( \beta\) is a logistic regression coefficient. (This arithmetic is repeated in the R sketches after this list.)
- Plots of raw residuals from logistic regression are generally not useful. Instead it’s preferable to plot binned residuals “by dividing the data into categories (bins) based on their fitted values, and then plotting the average residual versus the average fitted value for each bin” (Gelman & Hill, p. 97). Example R code for doing this can be found here; a rough sketch also appears after this list.
- The error rate is the proportion of cases for which the model’s predicted outcome disagrees with the observed outcome. We predict y = 1 when the predicted probability exceeds 0.5; otherwise we predict y = 0. It’s not good if your error rate equals the null rate, which is the error rate you’d get by simply guessing the more common outcome for every case (i.e., the proportion of the less common outcome in your data). Let’s say your data has 58% of y = 1 and 42% of y = 0. Guessing y = 1 for every case would be wrong 42% of the time, so the null rate is 42%. Further, let’s say you fit a logistic regression to this data and your model has an error rate of 36%. That is, 36% of the time it predicts the wrong outcome. This means your model does only 6 percentage points better than simply guessing that all cases are y = 1. (See the error-rate sketch after this list.)
- Deviance is a measure of error; lower deviance is better. Adding a predictor that is pure noise is expected to decrease deviance by about 1, so when an informative predictor is added to a model we expect deviance to decrease by more than 1. If it doesn’t, we’ve likely added a non-informative predictor that just adds noise. (See the deviance comparison after this list.)
- If a predictor x is completely aligned with the outcome, so that y = 1 whenever x is above some threshold and y = 0 whenever x is below it, then the coefficient estimate will explode to some gigantic value. This means the parameter cannot be estimated. This is an identifiability problem called separation. (A small demonstration appears after this list.)
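Since the list mentions R code in a couple of places, here are a few rough R sketches of the points above. None of this code comes from the books; object names like `fit`, `dat`, `x1`, and `x2` are hypothetical placeholders, so treat these as sketches rather than polished implementations.

First, a quick numeric check of the divide-by-4 rule, using the \(\beta = 0.33\) from the example above:

```r
## Divide-by-4 rule: the logistic curve is steepest where P(y = 1) = 0.5,
## and its slope there is beta/4
invlogit <- function(x) exp(x) / (1 + exp(x))
beta <- 0.33

## derivative of invlogit(beta * x) at x = 0, where P = 0.5:
## invlogit'(z) = invlogit(z) * (1 - invlogit(z)), times beta by the chain rule
beta * invlogit(0) * (1 - invlogit(0))  # 0.0825
beta / 4                                # 0.0825, the same upper bound
```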
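The odds-ratio arithmetic from the example above, with the same illustrative numbers:

```r
## Exponentiated coefficients are odds ratios
beta <- 0.497
exp(beta)                      # 1.64

p_x <- 0.674                   # predicted probability at x
odds_x  <- p_x / (1 - p_x)     # 2.07, odds of success at x
odds_x1 <- odds_x * exp(beta)  # 3.40, odds of success at x + 1
odds_x1 / odds_x               # 1.64, the odds ratio exp(beta)
odds_x1 / (1 + odds_x1)        # about 0.77, the predicted probability at x + 1
```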
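A minimal binned residual plot, assuming `fit` is a fitted logistic regression object; the `binnedplot()` function in the arm package does this more carefully:

```r
## Binned residual plot: average residual vs. average fitted value per bin.
## Assumes fit <- glm(y ~ ..., family = binomial) has already been run.
fitted_p  <- fitted(fit)                        # fitted probabilities
resid_raw <- residuals(fit, type = "response")  # observed y minus fitted probability

bins <- cut(fitted_p, breaks = 20)              # 20 bins over the fitted values
avg_fitted <- tapply(fitted_p,  bins, mean)
avg_resid  <- tapply(resid_raw, bins, mean)

plot(avg_fitted, avg_resid,
     xlab = "Average fitted value", ylab = "Average residual",
     main = "Binned residual plot")
abline(h = 0, lty = 2)
```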
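The error rate and null rate, assuming `fit` is the fitted model and `y` is the observed 0/1 outcome:

```r
## Error rate: how often the 0.5-threshold prediction disagrees with the data
pred_y <- ifelse(fitted(fit) > 0.5, 1, 0)
error_rate <- mean(pred_y != y)

## Null rate: the error rate from always guessing the more common outcome
null_rate <- min(mean(y), 1 - mean(y))

c(error_rate = error_rate, null_rate = null_rate)
```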
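Comparing deviances of nested models to see whether a predictor earns its keep (the data frame `dat` and predictors `x1` and `x2` are hypothetical):

```r
## Deviance should drop by noticeably more than 1 if x2 is informative
fit1 <- glm(y ~ x1,      data = dat, family = binomial)
fit2 <- glm(y ~ x1 + x2, data = dat, family = binomial)

deviance(fit1) - deviance(fit2)  # a drop of about 1 is what pure noise would buy
```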
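Finally, a small simulated demonstration of separation; the particulars (100 observations, threshold at 0) are arbitrary:

```r
## Separation: x perfectly predicts y, so the likelihood keeps improving
## as the coefficient grows without bound
set.seed(1)
x <- rnorm(100)
y <- as.numeric(x > 0)

fit_sep <- glm(y ~ x, family = binomial)
## glm typically warns that fitted probabilities numerically 0 or 1 occurred;
## the estimated coefficient on x is huge, with an even larger standard error
coef(fit_sep)
```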
Most of this comes from Chapter 5 of Data Analysis Using Regression and Multilevel/Hierarchical Models by Gelman and Hill. I also pulled a little from Chapter 5 of An Introduction to Categorical Data Analysis by Agresti.