Monthly Archives: January 2024

Encoding a contrast in a linear model

This is note to myself on how to encode a contrast in a linear model.

The following data come from the text Design and Analysis of Experiments by Dean and Voss (1999). It involves the lifetime per unit cost of nonrechargeable batteries. Four types of batteries are considered:

  1. alkaline, name brand
  2. alkaline, store brand
  3. heavy duty, name brand
  4. heavy duty, store brand

We can read the data in from the textbook web site.

URL <- "https://corescholar.libraries.wright.edu/cgi/viewcontent.cgi?filename=12&article=1007&context=design_analysis&type=additional"
bat <- read.table(URL, header=TRUE)
names(bat) <- tolower(names(bat))
bat$type <- factor(bat$type)
head(bat)
##   type lpuc order
## 1    1  611     1
## 2    2  923     2
## 3    1  537     3
## 4    4  476     4
## 5    1  542     5
## 6    1  593     6

A quick look at the means suggests the alkaline store brand battery seems like the best battery for the money.

means <- tapply(bat$lpuc, bat$type, mean)
means
##      1      2      3      4 
## 570.75 860.50 433.00 496.25

We might want to make comparisons between these means. Three such comparisons are as follows:

  1. compare battery duty (alkaline vs heavy duty)
  2. compare battery brand (name brand versus store brand)
  3. compare the interaction (levels 1/4 vs levels 2/3)

These are the comparisons presented in the text (p. 171). We can make these comparisons “by hand”.

# compare battery duty
mean(means[1:2]) - mean(means[3:4])
## [1] 251
# compare battery brand
mean(means[c(1,3)]) - mean(means[c(2,4)])
## [1] -176.5
# compare interaction
mean(means[c(1,4)]) - mean(means[c(2,3)])
## [1] -113.25

We can also make these comparisons using a contrast matrix. Below we create the contrast as a matrix object and name it K. Then we use matrix multiplication to calculate the same comparisons above.

K <- matrix(c(1/2, 1/2, -1/2, -1/2,
              1/2, -1/2, 1/2, -1/2,
              1/2, -1/2, -1/2, 1/2), 
            byrow = T, ncol = 4)
K %*% means
##         [,1]
## [1,]  251.00
## [2,] -176.50
## [3,] -113.25

This particular contrast matrix is orthogonal. “Two contrasts are orthogonal if the sum of the products of corresponding coefficients (i.e. coefficients for the same means) adds to zero.” (source) We can show this using the crossprod() function.

crossprod(K[1,],K[2,])
##      [,1]
## [1,]    0
crossprod(K[2,],K[3,])
##      [,1]
## [1,]    0
crossprod(K[1,],K[3,])
##      [,1]
## [1,]    0

Obviously we would like to calculate standard errors and confidence intervals for these comparisons. One way is to fit a model using lm() and do the comparisons as a follow-up using a package such as {multcomp}. To do this we use the glht() function with our contrast, K.

library(multcomp)
m <- lm(lpuc ~ type, data = bat)
comp <- glht(m, linfct = mcp(type = K))
confint(comp)
## 
##   Simultaneous Confidence Intervals
## 
## Multiple Comparisons of Means: User-defined Contrasts
## 
## 
## Fit: lm(formula = lpuc ~ type, data = bat)
## 
## Quantile = 2.7484
## 95% family-wise confidence level
##  
## 
## Linear Hypotheses:
##        Estimate  lwr       upr      
## 1 == 0  251.0000  184.1334  317.8666
## 2 == 0 -176.5000 -243.3666 -109.6334
## 3 == 0 -113.2500 -180.1166  -46.3834

Another way is to encode the contrasts directly in our model. We can do that using the capital C() function. The only difference here is that we need to transpose the matrix. The comparisons need to be defined on the columns instead of the rows. We could use the t() function to transpose K, but we’ll go ahead and create a new matrix to make this clear.

K2 <- matrix(c(1/2, 1/2, -1/2, -1/2,
              1/2, -1/2, 1/2, -1/2,
              1/2, -1/2, -1/2, 1/2), 
            ncol = 3)

Now we refit the model using the contrast in the model formula. Notice the coefficients are the same comparisons we calculated above. The intercept is the grand mean of lpuc, (i.e. mean(bat$lpuc)).

m2 <- lm(lpuc ~ C(type, K2), data = bat)
summary(m2)
## 
## Call:
## lm(formula = lpuc ~ C(type, K2), data = bat)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -66.50 -33.56 -18.12  38.19  72.75 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    590.12      12.16  48.511 3.85e-15 ***
## C(type, K2)1   251.00      24.33  10.317 2.55e-07 ***
## C(type, K2)2  -176.50      24.33  -7.255 1.01e-05 ***
## C(type, K2)3  -113.25      24.33  -4.655 0.000556 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 48.66 on 12 degrees of freedom
## Multiple R-squared:  0.9377, Adjusted R-squared:  0.9222 
## F-statistic: 60.24 on 3 and 12 DF,  p-value: 1.662e-07

And now we can use confint() to get the confidence intervals.

confint(m2)
##                  2.5 %     97.5 %
## (Intercept)   563.6202  616.62977
## C(type, K2)1  197.9905  304.00954
## C(type, K2)2 -229.5095 -123.49046
## C(type, K2)3 -166.2595  -60.24046

These are narrower than what we got with {multcomp}. That’s because {multcomp} uses a family-wise confidence interval that adjusts for making multiple comparisons.

Another property of orthogonal contrasts is that their estimated means are uncorrelated. We can see this by calling vcov() on the fitted model object that directly uses the contrast matrix we created.

zapsmall(vcov(m2))
##              (Intercept) C(type, K2)1 C(type, K2)2 C(type, K2)3
## (Intercept)     147.9818       0.0000       0.0000       0.0000
## C(type, K2)1      0.0000     591.9271       0.0000       0.0000
## C(type, K2)2      0.0000       0.0000     591.9271       0.0000
## C(type, K2)3      0.0000       0.0000       0.0000     591.9271

We can also use the {emmeans} package to make these comparisons as follows using the original model.

library(emmeans)
emm_out <- emmeans(m, "type")
contrast(emm_out, list(c1 = c(1, 1, -1, -1)/2,
                       c2 = c(1, -1, 1, -1)/2,
                       c3 = c(1, -1, -1, 1)/2)) |>
  confint()
##  contrast estimate   SE df lower.CL upper.CL
##  c1            251 24.3 12      198    304.0
##  c2           -176 24.3 12     -230   -123.5
##  c3           -113 24.3 12     -166    -60.2
## 
## Confidence level used: 0.95