We consider some ways of smart coding of interaction terms.
Smart coding of interaction terms means using alternative formulations of the standard terms in a regression model, chosen so that the coefficients are easier to interpret. The standard model for testing an interaction is
y = x1 + x2 + x1*x2
This model has 3 df (excluding the intercept), including 1 df for the interaction x1*x2 (if x1 and x2 both have 1 df). If x1 is a binary variable, we can code x2 such that it represents the effect of x2 without x1 and with x1, respectively:
y = x1 + (1 - x1)*x2 + x1*x2
This model also has 3 df, but now the effects of x2 are estimated directly in the absence and in the presence of x1: the term (1 - x1)*x2 is the x2 effect if x1 = 0, and the term x1*x2 is the x2 effect if x1 = 1.
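A minimal sketch of this reparameterization in R, on simulated data (the data frame `d`, the columns `x2_if0` and `x2_if1`, and all effect sizes are hypothetical). For simplicity it uses a linear model with `lm()`; the same algebra holds for logistic regression. It checks that the smart coding gives the same fit as the standard coding and returns the two x2 slopes directly.

```r
# Hypothetical simulated data: binary x1, continuous x2, continuous outcome y
set.seed(1)
n  <- 200
x1 <- rbinom(n, 1, 0.5)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.3 * x2 + 0.4 * x1 * x2 + rnorm(n)
d  <- data.frame(y = y, x1 = x1, x2 = x2)

# Standard coding: x1 + x2 + x1:x2 (3 df besides the intercept)
fit_std <- lm(y ~ x1 * x2, data = d)

# Smart coding: one x2 slope per level of x1 (also 3 df)
d$x2_if0 <- (1 - d$x1) * d$x2  # equals x2 when x1 = 0, else 0
d$x2_if1 <- d$x1 * d$x2        # equals x2 when x1 = 1, else 0
fit_smart <- lm(y ~ x1 + x2_if0 + x2_if1, data = d)

b_std   <- coef(fit_std)
b_smart <- coef(fit_smart)

# x2 slope when x1 = 1 in the standard fit: main effect plus interaction
slope1_std <- unname(b_std["x2"] + b_std["x1:x2"])

print(rbind(
  standard = c(x1_0 = unname(b_std["x2"]), x1_1 = slope1_std),
  smart    = c(x1_0 = unname(b_smart["x2_if0"]), x1_1 = unname(b_smart["x2_if1"]))
))
```

Both rows of the printed matrix should agree; only the way the slopes are read off differs.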
Suppose we consider a categorical predictor with 7 levels, e.g. presenting problem in children with fever ('problem'). We are interested in the effect of temperature ('temp') on urgency (high vs low). The model for testing of interaction is
y = as.factor(problem) + temp + as.factor(problem) * temp
This model has 13 degrees of freedom (6 for problem, 1 for temp, and 6 for the interaction). For easier interpretation of the effect of temp by problem, we create 7 variables, each equal to temp for one problem level and 0 otherwise:
p1t <- ifelse(problem==1, temp, 0)
p2t <- ifelse(problem==2, temp, 0)
...
p7t <- ifelse(problem==7, temp, 0)
We then fit the model
y = as.factor(problem) + p1t + p2t + ... + p7t
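This model also has 13 df, but the effects of temperature are easier to interpret: the coefficient of p1t is the temperature effect for problem 1, that of p2t the effect for problem 2, and so on.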
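A sketch of the 7-level case in R, with simulated data standing in for the fever data (the data frame `d`, the outcome `urgent`, and the true effects are hypothetical). The loop builds p1t to p7t as described above; the temperature slopes from the smart coding should match the temp main effect plus the corresponding interaction terms of the standard fit. As an aside, R's nesting operator in `as.factor(problem) / temp` produces the same per-level slopes without creating the variables by hand.

```r
# Hypothetical simulated data: 7 presenting problems, temperature, binary urgency
set.seed(2)
n       <- 1400
problem <- sample(1:7, n, replace = TRUE)
temp    <- round(rnorm(n, mean = 38.5, sd = 1), 1)
urgent  <- rbinom(n, 1, plogis(-0.5 + 0.3 * (temp - 38.5) * (problem / 4)))
d <- data.frame(urgent = urgent, problem = problem, temp = temp)

# One temperature variable per problem level: temp for that level, 0 otherwise
for (k in 1:7) {
  d[[paste0("p", k, "t")]] <- ifelse(d$problem == k, d$temp, 0)
}

fit_std   <- glm(urgent ~ as.factor(problem) * temp, family = binomial, data = d)
fit_smart <- glm(urgent ~ as.factor(problem) + p1t + p2t + p3t + p4t + p5t + p6t + p7t,
                 family = binomial, data = d)
fit_nested <- glm(urgent ~ as.factor(problem) / temp, family = binomial, data = d)

# Temperature slope per problem level, read directly from the smart coding
slopes_smart <- unname(coef(fit_smart)[paste0("p", 1:7, "t")])

# Same slopes from the standard fit: temp main effect plus interaction terms
b <- coef(fit_std)
slopes_std <- unname(b["temp"]) +
  c(0, unname(b[paste0("as.factor(problem)", 2:7, ":temp")]))

print(rbind(standard = slopes_std, smart = slopes_smart))
```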
If x1 is a continuous predictor, we can estimate the effect of x2 at a specific value of x1 by subtracting that value from x1. For example, when we examine a model with an interaction between systolic blood pressure (SBP, range e.g. 60 to 200) and age (AGE, range e.g. 30 to 90), we could reformulate the standard model
y = AGE + SBP + AGE*SBP
as
y = (AGE-50) + SBP + (AGE-50)*SBP
for the SBP effect at age 50, and as
y = (AGE-70) + SBP + (AGE-70)*SBP
for the SBP effect at age 70.
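The centering trick can be checked in R on simulated data. The ranges are taken from the text; the outcome model and effect sizes are hypothetical, and a linear model is used for simplicity. After centering age at 50 or 70, the SBP coefficient should equal b_SBP + 50 * b_AGE:SBP or b_SBP + 70 * b_AGE:SBP from the standard fit.

```r
# Hypothetical simulated data with the ranges mentioned in the text
set.seed(3)
n   <- 500
AGE <- runif(n, 30, 90)
SBP <- runif(n, 60, 200)
y   <- 0.02 * AGE + 0.01 * SBP + 0.0005 * AGE * SBP + rnorm(n)
d   <- data.frame(y = y, AGE = AGE, SBP = SBP)

# Standard fit: the SBP coefficient is the SBP effect at AGE = 0
fit_std <- lm(y ~ AGE * SBP, data = d)
b <- coef(fit_std)

# SBP effect at a given age, computed from the standard fit
sbp_at <- function(age) unname(b["SBP"] + age * b["AGE:SBP"])

# Centered fits: the SBP coefficient is the SBP effect at age 50 or 70
fit50 <- lm(y ~ I(AGE - 50) * SBP, data = d)
fit70 <- lm(y ~ I(AGE - 70) * SBP, data = d)

print(rbind(age50 = c(from_standard = sbp_at(50), centered = unname(coef(fit50)["SBP"])),
            age70 = c(from_standard = sbp_at(70), centered = unname(coef(fit70)["SBP"]))))
```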
We illustrate smart coding in the n = 785 subsample of GUSTO-I. We want to assess the effect of age in patients without and with tachycardia (HRT = 0 / 1). To do so, we calculate two age variables, AGE0 and AGE1:
AGE0 <- gustos$AGE * (1 - gustos$HRT)   # Age effect, No tachycardia
AGE1 <- gustos$AGE * gustos$HRT         # Age effect, Tachycardia
The standard fit was
                           Coef    S.E.    Wald       P
AGE                      0.0759   0.024    3.16  0.0016
HRT=Tachycardia         -3.6376   2.835   -1.28  0.1995
AGE * HRT=Tachycardia    0.0655   0.040    1.64  0.1004
...
The smart coding gives us
                    Coef    S.E.    Wald       P
AGE0              0.0759   0.024    3.16  0.0016
AGE1              0.1414   0.033    4.26  0.0000
HRT=Tachycardia  -3.6376   2.835   -1.28  0.1995
...
In the standard fit, we have to add the coefficients of AGE and AGE*HRT to obtain the age effect in patients with tachycardia, and a confidence interval for this sum requires the covariance matrix of the fit (the variance of the sum is var(AGE) + var(AGE*HRT) + 2 cov(AGE, AGE*HRT)). With smart coding, the confidence interval for each age effect follows directly from its own coefficient and SE. Note that the coefficient for AGE1 is the sum of the AGE and AGE*HRT coefficients in the standard fit (0.0759 + 0.0655 = 0.1414).
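A sketch comparing the two routes to a confidence interval for the age effect in patients with tachycardia, using simulated data in place of the GUSTO-I subsample. The data frame `sim`, the outcome `y`, and the true coefficients are hypothetical, so the numbers will not match the tables above. In the standard fit, the SE of the sum is computed from `vcov()`; in the smart-coded fit it is simply the SE of AGE1.

```r
# Hypothetical simulated data standing in for the GUSTO-I subsample (n = 785)
set.seed(4)
n   <- 785
AGE <- round(runif(n, 30, 85))
HRT <- rbinom(n, 1, 0.2)                       # 1 = tachycardia
y   <- rbinom(n, 1, plogis(-7 + 0.07 * AGE - 3 * HRT + 0.06 * AGE * HRT))
sim <- data.frame(y = y, AGE = AGE, HRT = HRT)

# Standard coding: age effect with tachycardia = b_AGE + b_AGE:HRT
fit_std <- glm(y ~ AGE * HRT, family = binomial, data = sim)
b <- coef(fit_std)
V <- vcov(fit_std)
est_std <- unname(b["AGE"] + b["AGE:HRT"])
se_std  <- sqrt(V["AGE", "AGE"] + V["AGE:HRT", "AGE:HRT"] + 2 * V["AGE", "AGE:HRT"])

# Smart coding: the age effect with tachycardia is a single coefficient
sim$AGE0 <- sim$AGE * (1 - sim$HRT)            # age effect, no tachycardia
sim$AGE1 <- sim$AGE * sim$HRT                  # age effect, tachycardia
fit_smart <- glm(y ~ AGE0 + AGE1 + HRT, family = binomial, data = sim)
est_smart <- unname(coef(fit_smart)["AGE1"])
se_smart  <- sqrt(vcov(fit_smart)["AGE1", "AGE1"])

# Wald 95% confidence intervals from both routes
z <- qnorm(0.975)
print(rbind(standard = est_std + c(-1, 1) * z * se_std,
            smart    = est_smart + c(-1, 1) * z * se_smart))
```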
The HRT coefficient refers to age zero, which is not a meaningful value. We can instead estimate the HRT effect at age 50 by centering age at 50:
AGE0 <- (gustos$AGE-50) * (1-gustos$HRT)
AGE1 <- (gustos$AGE-50) * gustos$HRT
This coding leads to the following regression coefficients
             Coef     S.E.   Wald       P
AGE0      0.07587  0.02401   3.16  0.0016
AGE1      0.14140  0.03323   4.26  0.0000
HRT       -0.3609  0.88541  -0.41  0.6835
...
For age 70, we subtract 70 instead of 50 from the age variable, and this leads to
               Coef     S.E.   Wald       P
Intercept   -4.4323  0.52548  -8.43  0.0000
AGE0        0.07587  0.02401   3.16  0.0016
AGE1        0.14140  0.03323   4.26  0.0000
HRT         0.94971  0.33094   2.87  0.0041
...
So HRT appears slightly protective at age 50 (coefficient -0.36, p = 0.68, far from significant), while it is a clear risk factor at age 70 (coefficient 0.95, p = 0.004; see Fig 12.1).
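The HRT effect at any age can also be obtained without refitting, as the linear combination b_HRT + age * b_AGE:HRT from the standard fit, with its SE from the covariance matrix. This R sketch, on simulated data with hypothetical names (`sim`, `y`, `hrt_at`, `refit_at`), checks that this gives the same estimate and SE as refitting with age centered at 50 or 70.

```r
# Hypothetical simulated data standing in for the GUSTO-I subsample
set.seed(5)
n   <- 785
AGE <- round(runif(n, 30, 85))
HRT <- rbinom(n, 1, 0.2)
y   <- rbinom(n, 1, plogis(-7 + 0.07 * AGE - 3 * HRT + 0.06 * AGE * HRT))
sim <- data.frame(y = y, AGE = AGE, HRT = HRT)

# Standard fit; coefficient order: (Intercept), AGE, HRT, AGE:HRT
fit_std <- glm(y ~ AGE * HRT, family = binomial, data = sim)
b <- coef(fit_std)
V <- vcov(fit_std)

# HRT log odds ratio at age a: b_HRT + a * b_AGE:HRT, SE from the covariance matrix
hrt_at <- function(a) {
  L <- c(0, 0, 1, a)
  c(est = sum(L * b), se = sqrt(drop(t(L) %*% V %*% L)))
}

# Same quantity from a refit with age centered at a (smart coding as in the text)
refit_at <- function(a) {
  d <- sim
  d$AGE0 <- (d$AGE - a) * (1 - d$HRT)
  d$AGE1 <- (d$AGE - a) * d$HRT
  f <- glm(y ~ AGE0 + AGE1 + HRT, family = binomial, data = d)
  c(est = unname(coef(f)["HRT"]), se = sqrt(vcov(f)["HRT", "HRT"]))
}

print(rbind(combination_50 = hrt_at(50), refit_50 = refit_at(50),
            combination_70 = hrt_at(70), refit_70 = refit_at(70)))
```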
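We can reformulate the interaction model to avoid the counterintuitive pattern that tachycardia appears protective at younger ages. Below, age has a common effect up to 55 years, with separate age effects for patients without and with tachycardia beyond 55 years: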
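AGE55min <- ifelse(gustos$AGE<55, gustos$AGE-55, 0)                         # common age effect below 55
AGE55plusNoHRT <- ifelse(gustos$AGE<55 | gustos$HRT==1, 0, gustos$AGE-55)   # age effect from 55, no tachycardia
AGE55plusHRT <- ifelse(gustos$AGE<55 | gustos$HRT==0, 0, gustos$AGE-55)     # age effect from 55, tachycardia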
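A quick check of the breakpoint coding on a few hypothetical patients. Here AGE55plusNoHRT is nonzero only for patients aged 55 or older without tachycardia, and AGE55plusHRT only for those with tachycardia, matching the variable names and the interpretation of the coefficients. The three variables always sum to AGE - 55, which is why both groups share one age line below 55 and have the same predicted risk at age 55 without a separate HRT term.

```r
# Hypothetical toy data: ages and tachycardia indicator (0 = no, 1 = yes)
AGE <- c(40, 54, 55, 60, 60, 75)
HRT <- c(0,  1,  0,  0,  1,  1)

AGE55min       <- ifelse(AGE < 55, AGE - 55, 0)               # common slope below 55
AGE55plusNoHRT <- ifelse(AGE < 55 | HRT == 1, 0, AGE - 55)    # slope from 55, HRT = 0
AGE55plusHRT   <- ifelse(AGE < 55 | HRT == 0, 0, AGE - 55)    # slope from 55, HRT = 1

print(cbind(AGE, HRT, AGE55min, AGE55plusNoHRT, AGE55plusHRT))
```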
This coding leads to the following regression coefficients
                     Coef     S.E.   Wald      P
AGE55min          0.08709  0.10706   0.81  0.416
AGE55plusNoHRT    0.07701  0.02491   3.09  0.002
AGE55plusHRT      0.14123  0.02577   5.48  0.000
...
We now estimate a common age effect up to age 55 (AGE55min, 0.087 per year); from age 55 onward the age effect is 0.141 per year for patients with tachycardia and 0.077 for those without. We do not include a separate main effect of HRT: with a breakpoint at age 55, the two groups share the same risk up to age 55 and diverge only beyond it. If we instead assume a common linear age effect that is only stronger for patients with tachycardia beyond age 55, we can formulate a model with the following coefficients:
                   Coef     S.E.   Wald       P
AGE55           0.07817  0.02097   3.73  0.0002
AGE55plusHRT    0.06409  0.01871   3.43  0.0006
...
where
AGE55 <- gustos$AGE-55   # age is zero at 55 years
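We now have an additional age effect for the HRT=1 (‘Tachycardia’) group that starts at age 55: beyond 55, its slope is 0.064 per year steeper than the common age effect of 0.078 per year (0.078 + 0.064 = 0.142 per year), while the HRT=0 (‘No tachycardia’) group keeps the common effect at all ages. Dividing the age variables by 10 would make the coefficients interpretable per decade older (Fig 12.2).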
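A sketch of this second breakpoint model in R on simulated data (hypothetical effect sizes and names). It shows that the age slope above 55 in the tachycardia group is the sum of the AGE55 and AGE55plusHRT coefficients, and that dividing both age variables by 10 multiplies their coefficients by 10 (per decade instead of per year) while leaving the intercept unchanged.

```r
# Hypothetical simulated data; effect sizes are illustrative only
set.seed(6)
n   <- 785
AGE <- round(runif(n, 30, 85))
HRT <- rbinom(n, 1, 0.2)
AGE55        <- AGE - 55                                  # age is zero at 55 years
AGE55plusHRT <- ifelse(AGE < 55 | HRT == 0, 0, AGE - 55)  # extra slope: tachycardia, age >= 55
y <- rbinom(n, 1, plogis(-2.5 + 0.08 * AGE55 + 0.06 * AGE55plusHRT))

# Per-year coding
fit_year <- glm(y ~ AGE55 + AGE55plusHRT, family = binomial)
b_year   <- coef(fit_year)

# Age slope above 55 with tachycardia: common slope plus extra slope (per year)
slope_hrt_55plus <- unname(b_year["AGE55"] + b_year["AGE55plusHRT"])

# Per-decade coding: divide both age variables by 10
AGE55_dec        <- AGE55 / 10
AGE55plusHRT_dec <- AGE55plusHRT / 10
fit_decade <- glm(y ~ AGE55_dec + AGE55plusHRT_dec, family = binomial)
b_decade   <- coef(fit_decade)

print(rbind(per_year = unname(b_year), per_decade = unname(b_decade)))
```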