Van Houwelingen and Thorogood describe a case study on the validation and updating of a prognostic model for kidney graft survival using data from the Eurotransplant database {Van Houwelingen, 1995}. The model was developed with data from 7121 patients who received their transplant between 1984 and 1987. Predictors included donor and recipient characteristics, such as age, sex, and blood group, as well as HLA mismatches. We are specifically interested in the applicability of a prediction model in each of the transplantation centers. To this end, we study differences between the centers after adjustment for important predictors. In total, 52 centers were considered. Differences between centers were highly significant in a traditional fixed effect Cox regression model (χ^{2} = 160, 51 df, p < 0.0001).

In an alternative approach, the heterogeneity between centers (τ^{2}) was estimated assuming that the underlying true center effects are distributed as N(μ, τ^{2}), with the estimate for center i summarized as N(α_{i}, σ_{i}^{2}). This is similar to random effects meta-analysis {DerSimonian, 1986}. The τ^{2} was estimated in an iterative procedure, using the estimates of the center effects α_{i} and their standard errors σ_{i} from a traditional fixed effect analysis. The heterogeneity was hence estimated in a two-step approach: first the fixed effects were estimated per center, and next the distribution of the center effects was estimated. The resulting τ^{2} was 0.22, or τ = 0.47 {Van Houwelingen, 1995}. The latter value is on the log(hazard) scale, and indicates substantially larger heterogeneity between centers than was found for patients with an acute MI in GUSTO-I.
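The two-step approach can be sketched in a few lines of Python. The source does not specify which iterative procedure was used, so the sketch below assumes a common maximum likelihood type fixed-point iteration for random effects meta-analysis; the function name `estimate_heterogeneity` and the five center estimates are hypothetical and serve only as an illustration.

```python
def estimate_heterogeneity(alpha, se, tol=1e-10, max_iter=1000):
    """Estimate mu and tau^2 from per-center estimates and standard errors.

    Assumed procedure (not necessarily the one used in the source):
    iterate between the weighted mean mu and the between-center
    variance tau^2 until tau^2 stabilizes.
    """
    var = [s * s for s in se]          # sampling variances sigma_i^2
    tau2 = 0.0                         # start from 'no heterogeneity'
    for _ in range(max_iter):
        w = [1.0 / (v + tau2) for v in var]
        mu = sum(wi * a for wi, a in zip(w, alpha)) / sum(w)
        # Observed spread minus the spread expected from sampling error alone
        num = sum(wi ** 2 * ((a - mu) ** 2 - v)
                  for wi, a, v in zip(w, alpha, var))
        new_tau2 = max(0.0, num / sum(wi ** 2 for wi in w))
        converged = abs(new_tau2 - tau2) < tol
        tau2 = new_tau2
        if converged:
            break
    # Recompute the overall mean with the final tau^2
    w = [1.0 / (v + tau2) for v in var]
    mu = sum(wi * a for wi, a in zip(w, alpha)) / sum(w)
    return mu, tau2


# Hypothetical step 1 output: fixed effect estimates (log hazard scale)
# and standard errors for five centers.
alpha_hat = [0.60, -0.45, 0.10, -0.80, 0.35]
se_hat = [0.40, 0.15, 0.10, 0.55, 0.20]

mu_hat, tau2_hat = estimate_heterogeneity(alpha_hat, se_hat)
print(f"mu = {mu_hat:.3f}, tau^2 = {tau2_hat:.3f}, tau = {tau2_hat ** 0.5:.3f}")
```

The truncation at zero reflects that a variance cannot be negative: if the observed spread of the center estimates is no larger than expected from their standard errors, the estimated heterogeneity is zero.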

The Empirical Bayes (EB) estimates were compared to the traditionally adjusted estimates of the differences between centers {Van Houwelingen, 1995}. As expected, shrinkage towards the average occurred. Again, the shrinkage was especially marked for centers with small numbers of patients (as reflected in large standard errors of the fixed effect estimates in a Cox regression analysis). These same centers had widely varying outcomes in the traditional analysis (either much larger or much smaller than the average). The EB estimates had a similar range for any size of the center, while the traditionally adjusted estimates had a wider range with decreasing sample size. The EB estimates for each center were used for the prediction of kidney graft survival in future patients.
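The shrinkage pattern can be illustrated with a minimal sketch, assuming the standard normal-normal Empirical Bayes formula in which each center estimate is pulled towards the overall mean by the factor τ^{2} / (τ^{2} + σ_{i}^{2}). The value τ^{2} = 0.22 is taken from the text; the function name `empirical_bayes`, the overall mean of 0, and the two center estimates are hypothetical.

```python
def empirical_bayes(alpha, se, mu, tau2):
    """Shrink per-center estimates towards the overall mean mu.

    Assumes the normal-normal model: the shrinkage factor for center i
    is tau2 / (tau2 + se_i^2), so imprecise estimates shrink the most.
    """
    shrunk = []
    for a, s in zip(alpha, se):
        factor = tau2 / (tau2 + s * s)   # between 0 (full shrinkage) and 1 (none)
        shrunk.append(mu + factor * (a - mu))
    return shrunk


# Two hypothetical centers with the same fixed effect estimate (0.8 on the
# log hazard scale) but very different precision.
alpha_hat = [0.8, 0.8]
se_hat = [0.1, 0.8]      # large center, small center

eb = empirical_bayes(alpha_hat, se_hat, mu=0.0, tau2=0.22)
for a, s, e in zip(alpha_hat, se_hat, eb):
    print(f"fixed effect {a:.2f} (SE {s:.2f}) -> EB estimate {e:.2f}")
```

With identical fixed effect estimates, the center with the larger standard error ends up much closer to the average, which is why the EB estimates show a similar range across center sizes.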

In the renal transplant study, further updating of the prediction model was considered with a more recent set of 6419 patients {Van Houwelingen, 1995}. The aim was to perform some fine-tuning of the model, similar to the recalibration approach discussed in Chapter 20. The prediction model included EB estimates of the center effects. A single calibration slope was considered over all centers, which was close to 1 (0.97, SE 0.06). The calibration slope was further below 1 when the fixed effect estimates were used for center-specific predictions. This example hence illustrates that better predictions are made with EB estimates than with traditional estimates.