Below, some extra details are given on the presentation of the testicular cancer model. We first discuss the construction of the score chart.
The first step is to multiply the regression coefficients by a common factor and round them to scores. A simple approach is to multiply the coefficients by 10. But we can also search for smaller rounded scores. For example, the coefficients of the binary predictors Teratoma, Pre.AFP, and Pre.HCG were quite similar (approximately 0.8 for the penalized coefficients, Table 18.4). Multiplying by 10/8 gives these 3 predictors each a score of 1. In general, we can often find a smaller multiplication factor that still allows for a refined prediction.
The R command to search for a scaling factor for the penalized coefficients was:
for (i in seq(1, 10, by=0.5)) {
  cat("Multiply by:", 10/i, "i=", i,
      "Coefs:", round(full.pen$coefficients[-1] * 10/i), "\n")
}
Three continuous predictors are considered in the prediction model, with a log transformation (LDHst), a square root transformation (Post.size), and a linear term (Reduction in size during treatment). These continuous predictors need to be rescaled in such a way that 1 point in score corresponds approximately to a 0.8 increase in the log odds (the inverse of the 10/8 multiplier). We first treat these predictors as continuous variables, and later consider categorization as a further simplification. We go through several steps:
a) score 0: is the reference point convenient? With a predictor such as age, it is often strange to have a score of 0 at 0 years. We may need to change the reference point to a sensible value, which we subtract from the original value.
b) score points: we aim to find values of the predictors where the scores are +1, +2, etc points, depending on the distribution of predictor values and the predictor effects.
c) intermediate scores: we have to think about intermediate values, which have e.g. a score of +0.5 points. Other values can then be scored by linear interpolation.
We consider these 3 steps for the 3 continuous predictors in the testicular cancer histology prediction model (LDH, postchemotherapy size, and reduction in size).
For LDH, a score of zero is obtained at log(1), i.e. when the LDH value equals the upper limit of normal. This is a convenient reference. The log-transformed variable happens to have a penalized coefficient of 0.88, which rounds to 1 point when multiplied by 10/8 (0.88 * 10/8 = 1.1). Hence, when the natural log equals 1, the score is approximately one point. The LDHst value then is exp(1) = 2.7, so an LDH value 2.7 times the upper limit of normal has a score of 1. Since the score of 1.1 was actually somewhat larger than 1, we set 1 point at 2.5 times normal, with a score of zero at 1 times normal levels. A score of 2 points is achieved at 2.5*2.5, or approximately 6 times normal levels. A more general approach is to study the relation between the score and the LDH value. We calculate the score (rounded to 1 decimal) for a range of LDHst values (Fig 18.4). We note that intermediate levels of LDH are common; the median value was 1.36 times normal. A score of +0.5 is found at 1.6 times normal LDH; we show this value in the score chart. A score of +1.5 is found at 4 times normal LDH. The values to be shown for LDHst are hence 0.6, 1, 1.6, 2.5, 4, and 6 times the upper limit of normal, with scores of -0.5, 0, +0.5, +1, +1.5, and +2 respectively.
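These cut-off values can be verified by inverting the score formula; a minimal sketch in R, assuming the penalized coefficient of 0.88 for log(LDHst):

# LDH values at which the score reaches half-point values;
# the score is approximately 0.88 * 10/8 * log(LDHst) = 1.1 * log(LDHst)
score <- seq(-0.5, 2, by=0.5)
ldh <- exp(score / (0.88 * 10/8))  # invert the score formula
round(cbind(score, ldh), 2)        # ldh approximately 0.63, 1, 1.57, 2.48, 3.91, 6.17

These computed values agree closely with the rounded chart values 0.6, 1, 1.6, 2.5, 4, and 6 times normal.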
In the nomogram, a postchemotherapy size of 50 mm obtains a score of zero, such that smaller sizes have positive points. In the score chart we can choose a similar approach, but we prefer a size of 10 mm to correspond to zero. The 10 mm size is clinically the most interesting, since many centers do not perform surgery for patients with masses smaller than 10 mm; such masses are considered 'normal size' by radiologists. We cannot define Post.size with negative numbers, however, since we want to study the square root of Post.size. Hence, we subtract 10 mm from Post.size only after the model has been fitted. Based on the figure below, we choose the following sizes to be shown with scores: <5 mm, +0.5; 10 mm, 0; 20 mm, -0.5; 40 mm, -1; and 70 mm, -1.5 points.
[Figure: score by postchemotherapy size]
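The shape of this relation can be sketched as follows; the slope of -0.32 per unit of sqrt(mm) is an assumption for illustration (the fitted penalized coefficient should be used), and the chart values are pragmatic roundings that need not fall exactly on the curve:

# sketch of the score-by-size relation on the square root scale, zero score at 10 mm
size <- 2:100
score.size <- -0.32 * (sqrt(size) - sqrt(10))  # slope -0.32 is an assumption
plot(size, score.size, type="l",
     xlab="Postchemotherapy size (mm)", ylab="Score")
points(c(5, 10, 20, 40, 70), c(0.5, 0, -0.5, -1, -1.5))  # chart values from the text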
A reduction of 0% is associated with 0 points, which is convenient. The relationship is simply linear (see figure below). However, in the data we noted that none of the patients had necrosis when the mass increased in size during chemotherapy. We choose the following points: <0% (increase), -1; 0%, 0; 50%, +1; 100%, +2 points.
[Figure: score by reduction in size]
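The exact construction of the rescaled reduction variable (REDUC5 in the model below) is not shown in this section; a plausible sketch based on the chart values, where the truncation of increases at -1 point is our assumption:

# linear score: one point per 50% reduction, zero score at 0% reduction;
# all increases in size are assumed to score -1 point
n544$REDUC5 <- pmax(n544$Reduction / 50, -1)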
We can check the regression coefficients for the rescaled predictors with a logistic regression model:
library(rms)  # provides lrm
lrm(NEC ~ Teratoma+Pre.AFP+Pre.HCG+LDHr+SQPOSTr+REDUC5, data=n544)
The coefficients are 0.91, 0.91, 0.77, 0.89, 0.89, and 0.80. Hence the rescaling worked: the effects are all around 0.8, the typical value of the coefficients of the 3 dichotomous predictors.
Categorization of the three continuous predictors can also be considered. It should take into account the distribution and the predictive effects of the predictors: strong predictors with a wide range of values should have more categories than weaker predictors. For LDHst, we could simply categorize the predictor as normal versus abnormal, as was done for AFP and HCG. (For the latter 2 tumor markers we found no clear benefit of a continuous coding in restricted cubic spline analyses, Stat Med 2001.) For postchemotherapy size, we could use 3 categories: <20 mm, 20-49 mm, and >=50 mm, with 2, 1, and 0 as scores respectively. For reduction in size, we create 3 categories, with an increase, 0-49% reduction, and >=50% reduction in size having -1, 0, and +1 points respectively. These categorizations can again be checked in a regression analysis, where the coefficients for the categorized predictors should have values around 0.8. Indeed, this is the case when we fit the following model, where the coefficients were 0.92, 0.86, 0.66, 0.91, 0.86, and 0.79:
lrm(NEC ~ Teratoma+Pre.AFP+Pre.HCG+PRELDH+POST2+REDUCr, data=n544)
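The construction of the categorized variables PRELDH, POST2, and REDUCr is not shown here; a sketch consistent with the category definitions above (the exact codings are assumptions):

# assumed codings, following the category definitions in the text
n544$PRELDH <- ifelse(n544$LDHst > 1, 1, 0)       # abnormal vs normal LDH
n544$POST2  <- ifelse(n544$Post.size < 20, 2,
               ifelse(n544$Post.size < 50, 1, 0)) # <20 mm, 20-49 mm, >=50 mm
n544$REDUCr <- ifelse(n544$Reduction < 0, -1,
               ifelse(n544$Reduction < 50, 0, 1)) # increase, 0-49%, >=50%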
Using categorized versions of the continuous predictors led to a substantial drop in the c statistic (from 0.839 to 0.808). Hence, categorization may simplify the presentation at the cost of performance. The score chart is shown in Table 18.extra.
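As a quick check, the c statistics can be read from the two lrm fits (a sketch, assuming the rescaled and categorized variables defined above):

fit.cont <- lrm(NEC ~ Teratoma+Pre.AFP+Pre.HCG+LDHr+SQPOSTr+REDUC5, data=n544)
fit.cat  <- lrm(NEC ~ Teratoma+Pre.AFP+Pre.HCG+PRELDH+POST2+REDUCr, data=n544)
c(fit.cont$stats["C"], fit.cat$stats["C"])  # approximately 0.839 vs 0.808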
After finding a suitable set of rounded scores, we need to find the multiplication factor that converts the sum score back to the scale of the linear predictor. In the testicular cancer model we rounded after multiplication by the factor 10/8. To compensate for this multiplication, the estimated multiplier was 0.86. This factor can be multiplied by the shrinkage factor (in this case 0.95) to obtain shrunk predictions (shrunk.beta = 0.95 * 0.86 = 0.81).
The calculation is as follows, where we omit the intercept from the set of coefficients:
score.fit <- lrm(NEC ~ Teratoma+Pre.AFP+Pre.HCG+LDHr+SQPOSTr+REDUC5,
                 data=n544, x=T, y=T)
rounded.lp <- score.fit$x %*% rep(1, 6)  # all scores a weight of 1
# the multiplier rescales the sum score to the logistic scale
multiplier <- lrm.fit(y=score.fit$y, x=rounded.lp)$coef[2]
shrunk.beta <- 0.95 * multiplier  # shrinkage * 0.86
shrunk.beta  # shrunk multiplier for better predictions, 0.81
We estimate the intercept corresponding to the scores, using the rounded sum score multiplied by the shrunk.beta coefficient as an offset variable:
lrm.fit(y=score.fit$y, offset=shrunk.beta*rounded.lp)
# The resulting formula is: lp = -1.94 + 0.81*score
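With this formula, a sum score is readily converted to a predicted probability of necrosis, for example:

# predicted probability of necrosis by sum score, using lp = -1.94 + 0.81*score
score <- 0:5
round(plogis(-1.94 + 0.81 * score), 2)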
We check the deterioration in discriminative performance. The c statistic of the original model was 0.839; uniform shrinkage does not affect this value. With rounded scores for the continuous predictors, the c statistic is only 0.001 lower (0.838).
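This can be verified with the rank correlation function from the Hmisc package (a sketch, using the sum score defined above):

library(Hmisc)  # provides rcorr.cens
rcorr.cens(as.vector(rounded.lp), score.fit$y)["C Index"]  # approximately 0.838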
The final score chart can be constructed in several ways. In particular, the values of continuous predictors can be presented with the scores arranged horizontally or vertically.
Several variants of presenting the testicular cancer model are shown in the R script below, which starts from the n544.sav data set.
Enjoy!
library(rms)  # lrm, lrm.fit; also attaches Hmisc (describe, rcorr.cens, Cs)

## Alternative: Br J Cancer 1996 paper: make a score from 0 - 5,
## and leave postsize on one axis; exclude increase in size (low p(nec))
score5 <- n544$Teratoma + n544$Pre.AFP + n544$Pre.HCG + n544$PRELDH + n544$REDUCr
score5 <- ifelse(n544$REDUCr<0, 0, score5)
describe(score5)

# Simple coding: 5 categories for postsize (no difference 20-30 and 30-50 mm)
n544$POST5 <- ifelse(n544$SQPOST<=sqrt(10), 0,
              ifelse(n544$SQPOST<=sqrt(20), 1,
              ifelse(n544$SQPOST<=sqrt(30), 2,
              ifelse(n544$SQPOST<=sqrt(50), 3, 4))))
POST5   <- n544$POST5[n544$REDUCr>-1]
y       <- n544$NEC[n544$REDUCr>-1]
score5r <- score5[n544$REDUCr>-1]  # subset score5 as well, so that lengths match
x <- ftable(as.data.frame(cbind(score5=score5r, POST5, y)))
ftable(x, col.vars=c(1,3))

# Predictions from simplified lrm model
full.simple2 <- lrm(y ~ as.factor(POST5) + score5r, x=T, y=T, se.fit=T)
# Calculate predicted probabilities + 95% CI
x <- cbind(full.simple2$x,
           round(plogis(full.simple2$linear.predictors), 2),
           round(plogis(full.simple2$linear.predictors - 1.96*full.simple2$se.fit), 2),
           round(plogis(full.simple2$linear.predictors + 1.96*full.simple2$se.fit), 2))
unique(x[order(full.simple2$x[,1], full.simple2$x[,2],
               full.simple2$x[,3], full.simple2$x[,4]), ])

## C stat for this Table, which makes 5 groups (4 cut-offs in ROC curve)
n544$simple.cat <- ifelse(n544$POST5==3 & score5==4, 70,
                   ifelse(n544$POST5==3 & score5==5, 80,
                   ifelse(n544$POST5==2 & score5==3, 60,
                   ifelse(n544$POST5==2 & score5==4, 80,
                   ifelse(n544$POST5==2 & score5==5, 90,
                   ifelse(n544$POST5<2  & score5==2, 60,
                   ifelse(n544$POST5<2  & score5==3, 70,
                   ifelse(n544$POST5<2  & score5==4, 80,
                   ifelse(n544$POST5<2  & score5==5, 90, 50)))))))))
describe(n544$simple.cat)
rcorr.cens(n544$simple.cat, n544$NEC)

## Radiology paper: could graph pre - post with lines for score 1 - 4
score4 <- n544$TER + n544$PREAFP + n544$PREHCG + n544$PRELDH
lrm(n544$NEC ~ as.vector(score4) + as.vector(n544$REDUC10))
full.simple3 <- lrm.fit(y=n544$NEC, x=cbind(score4, n544$SQPOST, n544$REDUC10))
full.simple3
full.simple3$coef
# reduc = (pre-post)/pre; reduc = 1 - post/pre; reduc-1 = -post/pre; pre = -post/(reduc-1)
n544$presize <- - n544$SQPOST^2 /
  ifelse((n544$REDUC10/10 - 1) != 0, (n544$REDUC10/10 - 1), -.04)  # avoid division by zero at 100% reduction
describe(n544$presize)

# Now calculate iso-probability lines, e.g. prob = 70%,
# depending on postsize (horizontal) and presize (vertical)
cbind(n544$presize, n544$SQPOST^2, plogis(full$linear.predictor))  # 'full' is the full model fitted earlier
# Calculate PRESIZE from SQPOST for given probabilities:
# presize = f(sqpost, plogis(full)); pre = -post/(reduc10/10 - 1) # for reduc10<10
# range sqpost from 2 to 50; score4 0 - 4
full.simple3$coef
x <- as.matrix(cbind(rep(0:4, 49), rep(sqrt(2:50), 5)))
lp.simple3 <- x %*% full.simple3$coef[2:3] + full.simple3$coef[1]
lp.simple3
# now calculate reduc10 with condition p=70% etc
# Solve equation qlogis(.7) = lp.simple3 + full.simple3$coef[4] * REDUC10
reduc10.70 <- (qlogis(.7) - lp.simple3) / full.simple3$coef[4]
# Calculate presizes: pre = -post/(reduc10/10 - 1)
presize.70 <- - rep(2:50, 5) / (reduc10.70/10 - 1)
# in 1 formula for efficiency
presize.90 <- - rep(2:50, 5) / (((qlogis(.9) - lp.simple3)/full.simple3$coef[4])/10 - 1)
presize.80 <- - rep(2:50, 5) / (((qlogis(.8) - lp.simple3)/full.simple3$coef[4])/10 - 1)
presize.60 <- - rep(2:50, 5) / (((qlogis(.6) - lp.simple3)/full.simple3$coef[4])/10 - 1)
presize.50 <- - rep(2:50, 5) / (((qlogis(.5) - lp.simple3)/full.simple3$coef[4])/10 - 1)
x <- as.matrix(cbind(rep(0:4, 49), rep(sqrt(2:50), 5), rep(2:50, 5), lp.simple3, reduc10.70,
                     presize.70, presize.90, presize.80, presize.60, presize.50))
colnames(x) <- Cs(score4, sqpost, postsize, lp, reduc1070,
                  presize70, presize90, presize80, presize60, presize50)
x <- as.data.frame(x)
x[1:10, ]
x[x<0] <- NA

# Plots: iso-probability lines for different scores (x-axis = prechemo size)
par(mfrow=c(2,2))
for (i in 1:4) {
  plot(y=x[x$score4==i, "postsize"], x=x[x$score4==i, "presize50"],
       main=paste("Score=", i, sep=""), ylab='Postchemo size (mm)',
       xlab="Prechemo size (mm)", xlim=c(0,100), ylim=c(0,50),
       type="p", pch="50%", axes=F, las=1)  # note: only the first character of pch is plotted
  axis(side=2, at=c(0,10,20,30,50))
  axis(side=1, at=c(0,20,50,100))
  points(y=x[x$score4==i, "postsize"], x=x[x$score4==i, "presize60"], pch="60%")
  points(y=x[x$score4==i, "postsize"], x=x[x$score4==i, "presize70"], pch="70%")
  points(y=x[x$score4==i, "postsize"], x=x[x$score4==i, "presize80"], pch="80%")
  points(y=x[x$score4==i, "postsize"], x=x[x$score4==i, "presize90"], pch="90%")
} # end loop over score 1 - 4

#########################################
## Meta-model with tree presentation
#########################################
## Start with dichotomizing predictions from full as <70% vs >=70%
n544$predfull <- plogis(full$linear.predictor)
n544$predhigh <- ifelse(plogis(full$linear.predictor) < .7, 0, 1)
describe(n544)
rcorr.cens(n544$predhigh, n544$NEC)
table(n544$predhigh, n544$NEC)
# So: low prediction, NEC 129:273; high prediction, NEC 116:26
273 / (273+26)   # sensitivity for residual tumor/teratoma
116 / (116+129)  # specificity for necrosis

# Make tree model for this outcome
library(rpart)
options(digits=3)
tree.orig <- rpart(NEC ~ Teratoma+Pre.AFP+Pre.HCG+LDHst+Post.size+Reduction, data=n544)
plot(tree.orig)
text(tree.orig, use.n=F)
tree.meta <- rpart(predhigh ~ Teratoma+Pre.AFP+Pre.HCG+LDHst+Post.size+Reduction, data=n544)
plot(tree.meta)
text(tree.meta, use.n=F)

## Make smooth tree presentation; classification with reduction, teratoma, AFP
score3 <- as.numeric(n544$Reduction>70) + as.numeric(n544$Teratoma==1) +
  as.numeric(n544$Pre.AFP==1)
n544$tree.cat <- ifelse(n544$Reduction>50 & score3>=2, 1, 0)
describe(n544$tree.cat)
rcorr.cens(n544$tree.cat, n544$NEC)
table(n544$tree.cat, n544$NEC)
# So: low prediction, NEC 134:264; high prediction, NEC 111:35
264 / (264+35)   # sensitivity for residual tumor/teratoma
111 / (111+134)  # specificity for necrosis