Contrasts in R determine how linear model coefficients of categorical variables are interpreted. The default contrast for unordered categorical variables is the Treatment contrast. This means the “first” level (aka, the baseline) is rolled into the intercept and all subsequent levels have a coefficient that represents their difference from the baseline. That’s not too hard to grasp. But what about other contrasts, namely the Helmert and Sum contrasts? What do they do? Instead of explaining them, I figured I would demonstrate each.
First, let’s make some data:
set.seed(2112)
# create 3 levels, 10 each
flevels <- factor(rep(c("A","B","C"),c(10,10,10)))
# create some "nice" data, sorted so means at each level have good separation
vals <- sort(round(runif(30,3,15)))
# calculate mean of each level for reference
means <- tapply(vals,flevels,mean)
means
A B C
6.0 10.1 12.9
Our data consist of three levels of arbitrary values. "flevels" is our categorical variable. Notice I explicitly defined it to be a factor using the factor() function. I need to do this so R knows this variable is a factor and codes it according to whatever contrast setting we decide to use.
Let's verify the default unordered contrast setting is the Treatment contrast:
options("contrasts")
$contrasts
unordered ordered
"contr.treatment" "contr.poly"
Indeed it is. This means our factor levels are coded as follows:
contrasts(flevels)
B C
A 0 0
B 1 0
C 0 1
This is a 3 x 2 matrix. The 2 columns of the matrix tells us that our model will have 2 coefficients, one for the B level and one for the C level. Therefore the A level is the baseline. The coefficients we get in our linear model for B and C will indicate the respective differences of their mean from the level A mean. The values in the rows tell us what values to plug into the model to get the means for the row labels. For example, to get the mean for A we plug in 0's for both coefficients which leaves us with the intercept. Therefore the intercept is the mean of A. Let's see all this in action before we explore the Helmert and Sum contrasts.
m.trt <- lm(vals ~ flevels)
summary(m.trt)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.0000 0.3935 15.249 8.63e-15 ***
flevelsB 4.1000 0.5564 7.368 6.32e-08 ***
flevelsC 6.9000 0.5564 12.400 1.17e-12 ***
Now we can verify how the Treatment contrast works by extracting the coefficient values from the model and comparing to the means we calculated earlier:
# Intercept = mean of A
coef(m.trt)[1]
(Intercept)
6
means[1]
A
6
# flevelsB = mean of B - mean of A
coef(m.trt)[2]
flevelsB
4.1
means[2] - means[1]
B
4.1
# flevelsC = mean of C - mean of A
coef(m.trt)[3]
flevelsC
6.9
means[3] - means[1]
C
6.9
Let's also verify that plugging in the row values of the contrast matrix returns the means of each level:
# plug in row values into model to get the means of A, B and C, respectively
means
A B C
6.0 10.1 12.9
# mean of A --> row 1: 0 0
coef(m.trt)[1] + 0*coef(m.trt)[2] + 0*coef(m.trt)[3]
(Intercept)
6
# mean of B --> row 2: 0 0
coef(m.trt)[1] + 1*coef(m.trt)[2] + 0*coef(m.trt)[3]
(Intercept)
10.1
# mean of C --> row 3: 0 0
coef(m.trt)[1] + 0*coef(m.trt)[2] + 1*coef(m.trt)[3]
(Intercept)
12.9
So that's how Treatment contrasts work. Now let's look at Helmert contrasts. "The coefficients for the Helmert regressors compare each level with the average of the "preceding" ones", says Fox in his book An R and S-Plus Companion to Applied Regression. I guess that makes sense. Kind of. Eh, not really. At least not to me. I say we do as we did before: fit a model and compare the coefficients to the means and see what they mean. Before we do that we need to set the contrast to Helmert:
# set contrast to "contr.helmert"
contrasts(flevels) <- "contr.helmert"
contrasts(flevels) # take a look
[,1] [,2]
A -1 -1
B 1 -1
C 0 2
Interesting. Notice the column labels are no longer associated with the levels of the factor. They just say 1 and 2. However this still tells us that our model will have two coefficients. Again the row values tell us what to plug in to get the means of A, B and C, respectively. To get the mean of A, we plug in -1 and -1 to the model. This means our intercept has a different interpretation. Let's fit the linear model and investigate.
m.hel <- lm(vals ~ flevels)
summary(m.hel)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.6667 0.2272 42.553 < 2e-16 ***
flevels1 2.0500 0.2782 7.368 6.32e-08 ***
flevels2 1.6167 0.1606 10.064 1.24e-10 ***
It turns out the intercept is the mean of the means, the first coefficient is the mean of the first two levels minus the first level, and the second coefficient is the mean of all three levels minus the mean of the first two levels. Did you get that? Here, this may help:
# intercept = mean of all means
coef(m.hel)[1]
(Intercept)
9.666667
mean(means)
[1] 9.666667
# flevels1 = mean of first two levels minus first level
coef(m.hel)[2]
flevels1
2.05
mean(means[1:2]) - means[1]
A
2.05
# flevels2 = mean of all three levels minus mean of first two levels
coef(m.hel)[3]
flevels2
1.616667
mean(means) - mean(means[1:2])
[1] 1.616667
Let's do that thing again where we plug in the row values of the contrast matrix to verify it returns the means of the levels:
means
A B C
6.0 10.1 12.9
# mean of A --> row 1: -1 -1
coef(m.hel)[1] + -1*coef(m.hel)[2] + -1*coef(m.hel)[3]
(Intercept)
6
# mean of B --> row 2: 1 -1
coef(m.hel)[1] + 1*coef(m.hel)[2] + -1*coef(m.hel)[3]
(Intercept)
10.1
# mean of C --> row 3: 0 2
coef(m.hel)[1] + 0*coef(m.hel)[2] + 2*coef(m.hel)[3]
(Intercept)
12.9
That leaves us with the Sum contrast. Regarding models fitted with the Sum contrasts, Fox tells us that "each coefficient compares the corresponding level of the factor to the average of the other levels." I think like Helmert contrasts, this one is better demonstrated. As before we need to change the contrast setting.
# set contrast to "contr.sum"
contrasts(flevels) <- "contr.sum"
contrasts(flevels) # take a look
[,1] [,2]
A 1 0
B 0 1
C -1 -1
Just like the Helmert contrast we see two columns with no labels. Our model will have two coefficients that don't correspond directly to the levels of our factors. By now we know the values in the rows are what we plug into our model to get the means of our levels. To get the mean of level A, we plug in 1 and 0. Time to fit the model and investigate:
m.sum <- lm(vals ~ flevels)
summary(m.sum)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.6667 0.2272 42.553 < 2e-16 ***
flevels1 -3.6667 0.3213 -11.413 7.75e-12 ***
flevels2 0.4333 0.3213 1.349 0.189
Like the Helmert contrasts, our intercept is mean of all means. But our two coefficients have different interpretations. The first is the difference between the mean of level 1 (A) and the mean of all means. The second coefficient is the difference between the mean of level 2 (B) and the mean of all means. Notice in the model output above that the second coefficient is not significant. In other words, the mean of level B is not significantly different from the mean of all means.
# intercept = mean of all means
coef(m.sum)[1]
(Intercept)
9.666667
mean(means)
[1] 9.666667
# flevels1 = mean of level 1 (here, A) - mean of all means
coef(m.sum)[2]
flevels1
-3.666667
means[1] - mean(means)
A
-3.666667
# flevels2 = mean of level 2 (here, B) - mean of all means
coef(m.sum)[3]
flevels2
0.4333333
means[2] - mean(means)
B
0.4333333
Finally to be complete we plug in the row values of the Sum contrast matrix to verify it returns the means of the factor levels:
means
A B C
6.0 10.1 12.9
# mean of A -->row 1: 1 0
coef(m.sum)[1] + 1*coef(m.sum)[2] + 0*coef(m.sum)[3]
(Intercept)
6
# mean of B -->row 2: 0 1
coef(m.sum)[1] + 0*coef(m.sum)[2] + 1*coef(m.sum)[3]
(Intercept)
10.1
# mean of C -->row 3: -1 -1
coef(m.sum)[1] + -1*coef(m.sum)[2] + -1*coef(m.sum)[3]
(Intercept)
12.9
And finally we wrap up this exercise by returning the contrast level of our categorical variable back to the system default:
contrasts(flevels) <- NULL
Hopefully this helps you get a better handle on what Helmert and Sum contrasts do.
Very nice explainations! Thanks!
One comment though:
In the comments on explaining the contrasts using Sum contrasts you say f ex
#flevels1 = mean of all means – mean of level 1 (here, A)
but it is really the opposite (and that is what you actually calculate in the code below the comment)? I.e.
#flevels1 = mean of level 1 (here, A) – mean of all means
This goes for the comment on flevel2 as well.
You’re right, good catch! I’ll fix that. Thanks for letting me know!