
A theoretical motivation for this shrinkage approach was developed by Van Houwelingen, who argued that, in the traditional shrinkage approach, a shrunken model can be seen as a weighted average of the fitted model and a null model without predictors ^{Van Houwelingen, 2000; Steyerberg, 2004}. After recalibration, we can analogously formulate the shrunken model as a weighted average of the fitted (re-estimated) model and the recalibrated model. The shrunken regression coefficients βshrunk can be written as

βshrunk = βcal + s * gamma = βoverall * βorig + s * (βnew – βoverall * βorig),

where βcal is the recalibrated regression coefficient, obtained by multiplying the overall calibration slope (βoverall) by the original coefficient value (βorig); s is the shrinkage factor; and gamma is the difference between the re-estimated coefficient (βnew) and the recalibrated coefficient (βcal). If s = 0 (severe shrinkage), the shrunken coefficient equals the recalibrated value; if s = 1 (no shrinkage), the new coefficient value is retained. The value for the shrinkage factor can be determined from the increase in model performance over simple recalibration ^{Steyerberg, 2004}.
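The formula can be illustrated with a minimal Python sketch. The function name and all numeric values below are made up for illustration and are not taken from the TIMI-II or GUSTO-I analyses; the sketch only shows how s moves a coefficient between the recalibrated value and the re-estimated value.

```python
def shrunken_coefficient(beta_orig, beta_new, beta_overall, s):
    """Return beta_shrunk = beta_cal + s * gamma (hypothetical helper)."""
    beta_cal = beta_overall * beta_orig   # recalibrated coefficient
    gamma = beta_new - beta_cal           # re-estimated minus recalibrated
    return beta_cal + s * gamma


# Illustrative (made-up) values for a single predictor
beta_orig = 0.50      # coefficient in the original model
beta_overall = 0.80   # overall calibration slope in the validation data
beta_new = 0.70       # coefficient re-estimated in the validation data

severe = shrunken_coefficient(beta_orig, beta_new, beta_overall, s=0.0)
partial = shrunken_coefficient(beta_orig, beta_new, beta_overall, s=0.5)
none = shrunken_coefficient(beta_orig, beta_new, beta_overall, s=1.0)

print(severe, partial, none)
```

With s = 0 the result is the recalibrated coefficient (βoverall * βorig), with s = 1 it is βnew, and intermediate values of s lie in between.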

Extensive simulation studies were performed within the GUSTO-I data set. The TIMI-II model was developed in 3339 patients and updated with methods 1 to 8 (Table 20.1) in validation sets of varying size (n=200, n=500, n=1000, n=2000, n=5000, n=10,000). The updated models were further tested in independent patients from GUSTO-I ^{Steyerberg, 2004}.

Calibration-in-the-large was a problem for the validity of the TIMI model in GUSTO-I. This was solved by updating the intercept (methods 2 to 8). The calibration slope was close to 1 for the TIMI model in GUSTO-I. Without shrinkage, updating led to an average slope below 1, reflecting optimism in the estimation of the updated coefficients. This miscalibration was especially severe for re-estimation of the 8-predictor model in small samples (method 5); e.g. n=200, slope=0.59; n=500, slope=0.82 on average. Shrinkage largely resolved this problem, with average slopes close to 1 for the re-estimated models (n=200, slope=0.93; n=500, slope=0.98). In larger validation samples, all methods led to average calibration slopes close to 1.
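To make the notion of a calibration slope concrete, the sketch below estimates it as the coefficient b in a logistic regression of the outcome on the linear predictor, logit P(y=1) = a + b * lp. This is a hand-rolled Newton-Raphson fit in plain Python, written only for illustration; the function name and the tiny data set are hypothetical, and in practice a statistical package would be used. The example mimics overly extreme predictions by doubling the linear predictor, which halves the estimated slope.

```python
import math


def calibration_slope(lp, y, n_iter=25):
    """Fit logit P(y=1) = a + b * lp by Newton-Raphson; return (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        # Score vector (g) and information matrix (h) of the log-likelihood
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, yi in zip(lp, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton step: solve the 2x2 system h * delta = g
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Made-up linear predictors and outcomes (not separable, so the fit exists)
lp = [-2.0, -1.0, -1.0, 0.0, 0.0, 1.0, 1.0, 2.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]

# Predictions that are twice as extreme on the logit scale
lp_extreme = [2.0 * x for x in lp]

a1, b1 = calibration_slope(lp, y)
a2, b2 = calibration_slope(lp_extreme, y)
print(b1, b2)
```

A slope below 1 indicates predictions that are too extreme, which is the pattern described above for re-estimated models in small validation samples.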

Discriminative ability of the TIMI-II model was by definition not affected by the re-calibration methods 2 or 3, which leave the rank order of predictions unchanged. The c statistic was 0.785. Discrimination was adversely affected by model re-estimation or model revision when validation samples were relatively small (n ≤ 500). For example, c decreased on average to 0.742 by method 5 with n=200, and to 0.771 with n=500. With shrinkage, the decrease was smaller (to 0.775 for n=200). An improvement in average c (by at most 0.01) was only seen with validation samples of at least 1000 patients, and with updating including shrinkage. Without shrinkage, c did not improve with re-estimation unless n ≥ 2000. Further details are presented elsewhere ^{Steyerberg, 2004}.
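The invariance of the c statistic under recalibration can be checked with a small sketch. The function below computes c as the proportion of event/non-event pairs in which the event has the higher prediction, counting ties as one half. The data, the intercept, and the slope are made up for illustration, and the slope is assumed to be positive so that the rank order is preserved.

```python
def c_statistic(pred, y):
    """Concordance: share of event/non-event pairs ranked correctly."""
    events = [p for p, yi in zip(pred, y) if yi == 1]
    nonevents = [p for p, yi in zip(pred, y) if yi == 0]
    concordant = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5   # ties count as half
    return concordant / (len(events) * len(nonevents))


# Made-up linear predictors and outcomes
lp = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
y = [0, 0, 1, 0, 1, 0, 1]

# Recalibration: new intercept and (positive) slope, illustrative values
intercept, slope = -0.3, 0.8
lp_recal = [intercept + slope * x for x in lp]

c_original = c_statistic(lp, y)
c_recalibrated = c_statistic(lp_recal, y)
print(c_original, c_recalibrated)
```

Because updating the intercept and slope is a monotone transformation of the linear predictor, both calls return the same value; only re-estimation or revision of individual coefficients can change the ranking of patients.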

Updating is expected to be more beneficial if the previous model is based on a small sample size. We therefore simulated the situation in which the TIMI-II model was developed in n=500 instead of n=3339 patients. The average slope of the linear predictor was then around 0.8, reflecting a need for shrinkage, consistent with the small development sample size. Updating of the slope (method 3) solved this problem. The discriminative ability was also hampered by the smaller size of the development data set (average c around 0.75 for methods 1 to 3, in contrast to 0.785 for the original TIMI-II model). Still, re-estimation with n=200 led to a lower c statistic (0.742). A more satisfactory performance was obtained with methods 5, 7, or 8 for validation sample sizes n ≥ 500, especially when combined with shrinkage.