Ioannis Ntzoufras
Publications Page
D. Fouskakis, I. Ntzoufras & D. Draper
Department of Applied Mathematics,
National Technical University of Athens,
Athens, GREECE;
e-mail: fouskakis@math.ntua.gr
Department of Statistics,
Athens University of Economics and Business,
Athens, GREECE;
e-mail: ntzoufras@aueb.gr
Department of Applied Mathematics and Statistics,
Baskin School of Engineering,
University of California,
Santa Cruz, USA;
e-mail: draper@ams.ucsc.edu
Journal of the Royal Statistical Society C (Applied Statistics), 58, 383-403, 2009.
SYNOPSIS
The measurement and improvement of the quality of health care are important areas of current research and development. An indirect way to evaluate the quality of hospital care is to compare the observed mortality rates at each of a number of hospitals with their expected rates, given the sickness at admission of their patients. Patient sickness at admission is often assessed by using logistic regression of mortality, for example within 30 days of admission, on a fairly large number of sickness indicators to construct a sickness scale, employing classical variable selection methods - which trade off prediction accuracy against parsimony - to find an "optimal" subset of 10-20 indicators. When the goal is the creation of a sickness scale that may be used prospectively to measure quality of care on a new set of patients in a cost-effective manner, traditional variable selection methods can produce sub-optimal subsets, since they do not account for differences in the data collection costs of the available predictors.
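As a rough illustration of the observed-versus-expected comparison described above, the following Python sketch sums fitted logistic-regression death probabilities per hospital. The function names, the tuple layout of the patient records, and the coefficients passed in are hypothetical; the sketch assumes a sickness scale has already been fitted elsewhere.

```python
import math

def predicted_risk(intercept, coefs, indicators):
    """Logistic-regression probability of death for one patient."""
    eta = intercept + sum(b * x for b, x in zip(coefs, indicators))
    return 1.0 / (1.0 + math.exp(-eta))

def observed_vs_expected(patients, intercept, coefs):
    """patients: iterable of (hospital, indicators, died) with died in {0, 1}.

    Returns {hospital: (observed_deaths, expected_deaths)}, where the
    expected count is the sum of the patients' predicted risks.
    """
    totals = {}
    for hospital, indicators, died in patients:
        obs, exp = totals.get(hospital, (0, 0.0))
        risk = predicted_risk(intercept, coefs, indicators)
        totals[hospital] = (obs + died, exp + risk)
    return totals
```

A hospital whose observed count sits well above its expected count would be a candidate for closer review, although any formal comparison would also need to account for sampling variability.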
In settings of this type, with two desirable criteria that compete - here, high predictive accuracy and low cost - a method must be found to achieve a joint optimization. There are two possible approaches, both of which reduce the dimensionality of the optimization problem from two to one: either (a) both criteria can be placed on a common scale, trading one off against the other, and optimization can occur on that scale, or (b) one criterion can be optimized, subject to a bound on the other. Elsewhere we explore strategy (a); here we develop a method for implementing strategy (b), through a cost-restriction-benefit analysis. The practical relevance of the selected variable subsets using the method of this paper is ensured by enforcing an overall limit on the total data collection cost of each subset: the search is conducted only among models whose cost does not exceed this budgetary restriction.
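Strategy (b) can be illustrated on a toy scale by brute force: maximize a model score over all variable subsets whose total cost stays within the budget. This is only a sketch under stated assumptions. The function name, the `score` callable (standing in for, say, a log posterior model probability) and the example costs are hypothetical, and exhaustive enumeration is feasible only for a handful of predictors, not for the number of indicators considered in the paper.

```python
from itertools import combinations

def best_subset_under_budget(costs, score, budget):
    """Exhaustively search subsets of predictors with total cost <= budget.

    costs  : list of per-variable data collection costs
    score  : callable taking a tuple of variable indices, higher is better
    budget : upper bound on the total cost of a subset
    """
    p = len(costs)
    best, best_score = (), score(())  # the empty model is always affordable
    for k in range(1, p + 1):
        for subset in combinations(range(p), k):
            if sum(costs[j] for j in subset) > budget:
                continue  # outside the cost restriction: never scored
            s = score(subset)
            if s > best_score:
                best, best_score = subset, s
    return best, best_score
```

For example, with costs `[1, 5, 2]`, an additive score built from per-variable values `[3, 10, 4]` and a budget of 3, the search returns the subset `(0, 2)` rather than the unconstrained optimum, which would include the expensive second variable.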
Conventional model search algorithms in our setting will fail when the best model under strategy (a) lies outside the imposed cost limit and collinear predictors with high predictive ability are present. The reason for this failure is the existence of multiple modes with movement paths that are forbidden due to the cost restriction. To solve this problem, in this paper we develop a population-based trans-dimensional reversible-jump Markov chain Monte Carlo (population RJMCMC) algorithm, in which ideas from the population-based MCMC and simulated tempering algorithms are combined. Comparing our method with standard RJMCMC, we find that the population-based RJMCMC algorithm moves successfully and more efficiently between distant neighbourhoods of "good" models, achieves convergence faster and has smaller Monte Carlo standard errors for a given amount of CPU time. In a case study of n = 2,532 pneumonia patients on whom p = 83 sickness indicators were measured, with marginal costs varying from smallest to largest across the predictor variables by a factor of 20, the final model chosen by population RJMCMC, both on the basis of highest posterior probability and specifying the median probability model, is clinically sensible for pneumonia patients and achieves good predictive ability while capping data collection costs.
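The general idea of combining a population of chains with tempering under a cost restriction can be sketched as follows. This is not the authors' population RJMCMC algorithm: it is a simplified Metropolis search over variable-inclusion sets only, with no reversible-jump moves on model parameters. The function names, the temperature ladder and the `log_score` callable (assumed to return a log marginal model score) are all hypothetical.

```python
import math
import random

def model_cost(model, costs):
    """Total data collection cost of the variables in a model."""
    return sum(costs[j] for j in model)

def population_search(costs, log_score, budget, temps=(1.0, 0.5, 0.25),
                      n_iter=2000, seed=0):
    """Cost-restricted Metropolis search with a population of tempered chains.

    Needs at least two temperatures. Chain i targets exp(temps[i] * log_score)
    on the set of models with cost <= budget; flatter (low-temperature-power)
    chains cross low-score regions more easily and pass states to the
    temps[0] = 1 chain through swap moves.
    Returns visit counts of the models seen by the first chain.
    """
    rng = random.Random(seed)
    p = len(costs)
    chains = [frozenset() for _ in temps]  # empty model has zero cost
    visits = {}
    for _ in range(n_iter):
        # Local move in every chain: add or drop one randomly chosen variable.
        for i, t in enumerate(temps):
            proposal = chains[i] ^ {rng.randrange(p)}
            if model_cost(proposal, costs) > budget:
                continue  # forbidden by the cost restriction: reject
            log_ratio = t * (log_score(proposal) - log_score(chains[i]))
            if rng.random() < math.exp(min(0.0, log_ratio)):
                chains[i] = proposal
        # Swap move between a random pair of adjacent temperatures.
        i = rng.randrange(len(temps) - 1)
        log_ratio = (temps[i] - temps[i + 1]) * (
            log_score(chains[i + 1]) - log_score(chains[i]))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            chains[i], chains[i + 1] = chains[i + 1], chains[i]
        visits[chains[0]] = visits.get(chains[0], 0) + 1
    return visits
```

Because every chain only ever holds models within the budget, swaps never introduce an infeasible state. The visit counts of the first chain give a crude estimate of which affordable models carry the most weight under the assumed score.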
Keywords: Bayesian model comparison; Cost-restriction-benefit analysis; Health care evaluation; Population-based MCMC algorithms; Reversible-jump Markov chain Monte Carlo (RJMCMC) methods; Simulated tempering.
First Draft 6/6/2007; Final Version 1/5/2008; Published July 2009.
Download: Published version available at the Journal of the Royal Statistical Society C (Applied Statistics) by