18. Presentation

Below, some extra details are shown for the presentation of the testicular cancer model.

Construction of score chart

Below we discuss the construction of the score chart.

  1. Multiply and round regression coefficients of binary predictors and dummy variables of categorical predictors
  2. Search for score for continuous predictors, continuous or categorized
  3. Estimate the multiplication factor for the scores
  4. Estimate the intercept, check c statistic, and present as score chart

1. Multiply and round coefficients

The first step is to multiply regression coefficients and round them to scores. A simple approach is to multiply coefficients by 10. But we can also search for smaller rounded scores. For example, the coefficients of the binary predictors Teratoma, Pre.AFP, and Pre.HCG were quite similar (approximately 0.8 for the penalized coefficients, Table 18.4). We multiply by 10/8 to give these 3 predictors each a score of 1. In general, we can often find lower numbers for multiplication that still allow for a refined prediction.

The R command for a search of scaling factor for the penalized coefficients was:

for (i in seq(1, 10, by=0.5)) {
cat(	"Multiply by:", 10/i, "i=", i, 
"Coefs:", round(full.pen$coefficients[-1] * 10 / i), "\n") }

2. Score for continuous predictors

Three continuous predictors are considered in the prediction model, with a log transformation (LDHst), square root transformation (Post.size), and linear (Reduction in size during treatment). These continuous predictors need to be rescaled in such a way that a 1 point in score corresponds approximately to a 10/8 increase in log odds. We first treat these predictors as continuous variables, and later consider categorization as a further simplification. We go through several steps: a) score 0: convenient? With a predictor such as age, it is often strange to have a score of 0 at 0 years. We may need to change the reference point to a sensible value, which we subtract from the original value. b) score points: We aim to find values of the predictors where the scores are +1, +2, etc points, depending on the distribution of predictor values and the predictor effects. c) intermediate scores: We have to think about intermediate values, which have e.g. a score of +0.5 points. Other values can then be scored by linear interpolation. We consider these 3 steps for the 3 continuous predictors in the testicular cancer histology prediction model (LDH, postchemotherapy size, and reduction in size).

LDH

A score of zero is obtained for log(1), i.e. when LDH values are equal to the upper limit of normal. This is convenient as a reference. The log transformed variable happens to have a penalized coefficient of 0.88, which can be rounded directly to 1 point when multiplied by 10/8 (0.88 * 10/8 = 1.1). Hence, when the natural log = 1, the score is one point. The LDHst value then is exp(1) = 2.7. So a LDH values 2.7 times the upper limit of normal have a score of 1. The score of 1.1 was actually somewhat larger than 1. Hence, we can set 1 point at 2.5 times normal, and a score of zero at 1 times normal levels. A score of 2 points is achieved at 2.5*2.5 is approximately 6 times normal levels. A more general approach is to study the relation between the score corresponding to LDH values. We calculate the score (rounded at 1 decimal) for the LDHst values (Fig 18.4). We note that intermediate levels of LDH are common; the median value was 1.36 times normal. And a score of +0.5 is found at 1.6 times normal LDH; we show this value in the score chart. A score of +1.5 is found at 4 times normal LDH. Scores to be shown for LDHst are 0.6, 1, 1.6, 2.5, 4, and 6 times upper limit of normal, with scores of -0.5, 0, 0.5, 1, 1.5, and 2 respectively.
Relation between LDHst and rounded values of the score in the testicular cancer model. We can read the scores for increasing values of LDHst. The density of the data is plotted at the bottom of the graph.

Sqrt(Post.size)

In the nomogram, a postchemotherapy size of 50 mm obtains a score of zero, such that smaller sizes have positive points. In the score chart we can choose a similar approach, but prefer a size of 10 mm corresponding to zero. The 10 mm size is clinically the most interesting, since many centers do not perform surgery for patients with masses smaller than 10 mm. Masses smaller than 10 mm are considered ‘normal size’ by radiologists. We cannot define Post.size with negative numbers however, since we want to study the square root of Post.size. Hence, we subtract – 10mm from Post.size only after the model was fitted. Based on the figure below, we choose the following sizes to be shown with scores: <5mm, +0.5; 10mm, 0; 20mm, –0.5; 40mm, –1, and 70mm, –1.5 point.
Relation between postchemotherapy size and rounded values of the score in the testicular cancer model. We can read the decreasing scores for increasing values of size. The density of the data is plotted at the bottom of the graph.

Reduction in size

A reduction of 0% is associated with 0 points, which is convenient. The relationship is simply linear (see figure below). However, in the data we noted that none of the patients had necrosis when there was an increase in mass size during chemotherapy. We choose the following points: <0%: –1, 0%, 0; 50%, 1; 100%, 2 points.
Relation between reduction in size and rounded values of the score in the testicular cancer model. We can read the scores for increasing reduction. The density of the data is plotted at the bottom of the graph..

Verify that coefficients are correct for rescaled predictors

We may check that the regression coefficients for each rescaled predictor with a logistic regression model:

lrm(NEC ~ Teratoma+Pre.AFP+Pre.HCG+LDHr+SQPOSTr+REDUC5, data=n544)

The coefficients are 0.91, 0.91, 0.77, 0.89, 0.89, and 0.80. Hence the rescaling worked to obtain effects around 0.8, which was the typical value of the coefficients of the 3 dichotomous predictors.

Coding with categorization

Categorization of the three continuous predictors can also be considered. It should consider the distribution and the predictive effects of the predictors. Strong predictors, with a wide range of predictor values, should have more categories than weaker predictors. For LDHst, we could simply categorize the predictor as normal versus abnormal, as was done for AFP and HCG. (For the latter 2 tumor markers we found no clear benefit of a continuous coding with restricted cubic spline analysesStat Med 2001). For postchemotherapy size, we could use 3 categories: <20mm, 20-49mm, and >=50mm, with 2, 1, and 0 as scores respectively. For reduction in size, we create 3 categories, with increase, 0 – 49% reduction, and >=50% reduction in size having –1, 0 and +1 points respectively. These categorizations can also be checked in a regression analysis, where the coefficient for the categorized predictors should have values around 0.8. Indeed, this is the case when we fit the following model, where coefficients were 0.92, 0.86, 0.66, 0.91, 0.86, and 0.79:

lrm(NEC ~ Teratoma+Pre.AFP+Pre.HCG+PRELDH+POST2+REDUCr, data=n544)

Using categorized versions of the continuous predictors led to a substantial drop in c statistic (from 0.839 to 0.808). Hence, categorizing may simplify presentation at the cost of performance. The score chart is shown here: Tables 18.extra.

3. Estimate the multiplication factor

After finding a suitable set of weights we need to find the multiplication factor for the scores. In the testicular cancer model we round after multiplication with the factor 10/8 in a logistic model. To compensate for this multiplication, the actual multiplier was 0.86. This factor can be multiplied with the shrinkage factor (in this case: 0.95) to obtain shrunk predictions (shrunk.beta=0.81).

The calculation is as follows , where we omit the intercept from the set of coefficients:

score.fit 	<- lrm(NEC~Teratoma+Pre.AFP+Pre.HCG+LDHr+SQPOSTr+REDUC5,
data=n544,x=T,y=T)
rounded.lp	<- score.fit$x %*% rep(1,6)	# All scores a weight of 1
# multiplier makes the rescaled factors for logistic formula
multiplier	<- lrm.fit(y=score.fit$y, x=rounded.lp)$coef[2]
shrunk.beta	<- 0.95 * multiplier  # shrinkage * 0.86
shrunk.beta # shrunk multiplier for better predictions, 0.81

4. Estimate the intercept, c statistic, and present the model

We estimate the intercept corresponding to the scores, using the rounded coefficients multiplied with the shrunk.beta coefficient as an offset variable:

lrm.fit(y=score.fit$y, offset=shrunk.beta*rounded.lp)
# The formula is: lp = –1.94 + 0.81*score.

We check the deterioration in discriminative performance. The c statistic of the original model was 0.839; uniform shrinkage does not affect this value. We find that the c statistic with rounded scores and using continuous predictors is only 0.001 lower (0.838).
The final score chart can be constructed in several ways. Especially, the presentation of values for continuous predictors is possible with scores horizontally, or vertically.

R script for testicular cancer case study

Several variants to present the testicular cancer model are shown in this R script, starting with the n544.sav data set.
Enjoy! :-)

## Alternative: Br J Cancer 1996 paper: make a score from 0 - 5, 
## and leave postsize on one axis; exclude increase in size (low p(nec))
score5  <- n544$Teratoma+n544$Pre.AFP+n544$Pre.HCG+n544$PRELDH+n544$REDUCr
score5  <- ifelse(n544$REDUCr<0,0,score5)
describe(score5)
# Simple coding: 5 categories for postsize (no difference 20-30 and 30-50 mm)
n544$POST5 <- ifelse(n544$SQPOST<=sqrt(10),0,
                ifelse(n544$SQPOST<=sqrt(20),1,
                  ifelse(n544$SQPOST<=sqrt(30),2,
                    ifelse(n544$SQPOST<=sqrt(50),3,4))))
POST5 <- n544$POST5[n544$REDUCr>-1]
y <- n544$NEC[n544$REDUCr>-1]                    

x <- ftable(as.data.frame(cbind(score5,POST5,y)))
ftable(x, col.vars = c(1,3))

# Predictions from simplified lrm model
full.simple2  <- lrm(y~as.factor(POST5)+score5,x=T,y=T,se.fit=T)
# Calculate predicted probabilities + 95% CI
x <- cbind(full.simple2$x, 
        round(plogis(full.simple2$linear.predictors),2), 
        round(plogis(full.simple2$linear.predictors-1.96*full.simple2$se.fit),2),
        round(plogis(full.simple2$linear.predictors+1.96*full.simple2$se.fit),2))
unique(x[order(full.simple2$x[,1],full.simple2$x[,2],full.simple2$x[,3],full.simple2$x[,4]),])

## C stat for this Table; which makes 5 groups (4 cut-offs in ROC curve)
n544$simple.cat  <- ifelse(n544$POST5==3 & score5==4,70,
                    ifelse(n544$POST5==3 & score5==5,80,
                      ifelse(n544$POST5==2 & score5==3,60,
                      ifelse(n544$POST5==2 & score5==4,80,                    
                      ifelse(n544$POST5==2 & score5==5,90,                    
                        ifelse(n544$POST5<2 & score5==2,60,
                        ifelse(n544$POST5<2 & score5==3,70,
                        ifelse(n544$POST5<2 & score5==4,80,
                        ifelse(n544$POST5<2 & score5==5,90, 50)))))))))
describe(n544$simple.cat)
rcorr.cens(n544$simple.cat, n544$NEC)

## **Radiology paper**: could **graph pre - post with lines for score 1 - 4**
score4  <- n544$TER+n544$PREAFP+n544$PREHCG+n544$PRELDH
lrm(n544$NEC ~ as.vector(score4) + as.vector(n544$REDUC10))

full.simple3 <- lrm.fit(y=n544$NEC,x=cbind(score4,n544$SQPOST,n544$REDUC10))
full.simple3
full.simple3$coef

# reduc = (pre-post) / pre; reduc = 1 - post/pre; reduc-1 = -post/pre; pre = -post/(reduc-1)
n544$presize <- - n544$SQPOST^2 / ifelse((n544$REDUC10/10 - 1) != 0,(n544$REDUC10/10 - 1),-.04)  
describe(n544$presize)
# Now calculate iso probability lines: e.g. prob -70%, 
# depending on postsize (horizontal), and presize (vertical)
cbind(n544$presize, n544$SQPOST^2, plogis(full$linear.predictor))

# Now calculate PRESIZE from SQPOST, for given probabilities
presize = f(sqpost, plogis(full))
pre = -post/(reduc10-10) # for reduc10<10
# range sqpost from 2 to 50; score 4 0 - 4
full.simple3$coef
x <- as.matrix(cbind(rep(0:4,49), rep(sqrt(2:50),5)))
lp.simple3  <- x %*% full.simple3$coef[2:3] + full.simple3$coef[1]
lp.simple3
# now calculate reduc10 with condition p=70% etc
# Solve equation qlogis(.7) = lp.simple3 + full.simple3$coef[4] * n544$REDUC10
reduc10.70 <- (qlogis(.7) - lp.simple3) /full.simple3$coef[4]
# Calculate presizes: pre = -post/(reduc10/10-1)
presize.70  <- - rep(2:50,5) / (reduc10.70/10 - 1)
# in 1 formula for efficiency
presize.90  <- - rep(2:50,5) / (((qlogis(.9) - lp.simple3) /full.simple3$coef[4])/10 - 1)
presize.80  <- - rep(2:50,5) / (((qlogis(.8) - lp.simple3) /full.simple3$coef[4])/10 - 1)
presize.60  <- - rep(2:50,5) / (((qlogis(.6) - lp.simple3) /full.simple3$coef[4])/10 - 1)
presize.50  <- - rep(2:50,5) / (((qlogis(.5) - lp.simple3) /full.simple3$coef[4])/10 - 1)

x <- as.matrix(cbind(rep(0:4,49), rep(sqrt(2:50),5),rep(2:50,5),lp.simple3,reduc10.70,
          presize.70,presize.90,presize.80,presize.60,presize.50 ))
colnames(x) <- Cs(score4, sqpost,postsize,lp,reduc1070,
          presize70,presize90,presize80,presize60,presize50)
x <- as.data.frame(x)
x[1:10,]
x[x<0]  <- NA

# Try to make some plots; isoprobability lines for different scores, x-axis=postsize
par(mfrow=c(2,2))
for (i in 1:4) {
plot(y=x[x$score4==i,"postsize"],x=x[x$score4==i, "presize50"], main=paste("Score=",i,sep=""),
  ylab='Postchemo size (mm)', xlab="Prechemo size (mm)", 
  xlim=c(0,100), ylim=c(0,50),type="p", pch="50%", axes=F,las=1)
axis(side=2,at=c(0,10,20,30,50))
axis(side=1,at=c(0,20,50,100))
points(y=x[x$score4==i,"postsize"],x=x[x$score4==i, "presize60"], pch="60%")
points(y=x[x$score4==i,"postsize"],x=x[x$score4==i, "presize70"], pch="70%")
points(y=x[x$score4==i,"postsize"],x=x[x$score4==i, "presize80"], pch="80%")
points(y=x[x$score4==i,"postsize"],x=x[x$score4==i, "presize90"], pch="90%")
} # end loop over score 1 - 4

#########################################
## **Meta-model with tree presentation**   ##
## Start with dichotomizing predictions from full as <70% vs >=70%
n544$predfull <- plogis(full$linear.predictor)
n544$predhigh <- ifelse(plogis(full$linear.predictor)<.7,0,1)
describe(n544)
rcorr.cens(n544$predhigh, n544$NEC)
table(n544$predhigh, n544$NEC)
# So: low prediction, NEC 129:273; high prediction, NEC 116:26
273 /(273+26) # sensitivity for residual tumor/teratoma
116 /(116+129) # specificity for necrosis

# Make **tree model** for this outcome
library(rpart)
options(digits=3)
tree.orig <- rpart(NEC ~ Teratoma+Pre.AFP+Pre.HCG+LDHst+Post.size+Reduction,
			data=n544)
plot(tree.orig)
text(tree.orig, use.n=F)

tree.meta <- rpart(predhigh ~ Teratoma+Pre.AFP+Pre.HCG+LDHst+Post.size+Reduction,
			data=n544)
plot(tree.meta)
text(tree.meta, use.n=F)

## Make smooth tree presentation; classification with reduction, teratoma, AFP
score3  <- as.numeric(n544$Reduction>70)+as.numeric(n544$Teratoma==1)+as.numeric(n544$Pre.AFP==1)
n544$tree.cat  <- ifelse(n544$Reduction>50 & score3>=2,1,0)
describe(n544$tree.cat)
rcorr.cens(n544$tree.cat, n544$NEC) 
table(n544$tree.cat, n544$NEC)                                
# So: low prediction, NEC 134:264; high prediction, NEC 111:35
264 /(264+35) # sensitivity for residual tumor/teratoma
111 /(111+134) # specificity for necrosis

additional/chapter18.txt · Last modified: 2018/11/27 11:27 by ewsteyerberg
chimeric.de = chi`s home Creative Commons License Valid CSS Driven by DokuWiki do yourself a favour and use a real browser - get firefox!! Recent changes RSS feed Valid XHTML 1.0