Bayesian inference for generalized linear mixed models

YOUYI FONG


Department of Biostatistics, University of Washington, Seattle, WA 98112, USA

HÅVARD RUE
Department of Mathematical Sciences, The Norwegian University for Science and Technology, N-7491 Trondheim, Norway

JON WAKEFIELD∗
Departments of Statistics and Biostatistics, University of Washington, Seattle, WA 98112, USA
jonno@u.washington.edu

SUMMARY
Generalized linear mixed models (GLMMs) continue to grow in popularity due to their ability to directly acknowledge multiple levels of dependency and to model different data types. For small sample sizes especially, likelihood-based inference can be unreliable, with variance components being particularly difficult to estimate. A Bayesian approach is appealing but has been hampered by the lack of a fast implementation and by the difficulty of specifying prior distributions, with variance components again being particularly problematic. Here, we briefly review previous approaches to computation in Bayesian implementations of GLMMs and illustrate in detail the use of integrated nested Laplace approximations in this context. We consider a number of examples, carefully specifying prior distributions on meaningful quantities in each case. The examples cover a wide range of data types, including those requiring smoothing over time and a relatively complicated spline model for which we examine our prior specification in terms of the implied degrees of freedom. We conclude that Bayesian inference is now practically feasible for GLMMs and provides an attractive alternative to likelihood-based approaches such as penalized quasi-likelihood. As with likelihood-based approaches, great care is required in the analysis of clustered binary data, since approximation strategies may be less accurate for such data.

Keywords: Integrated nested Laplace approximations; Longitudinal data; Penalized quasi-likelihood; Prior specification; Spline models.
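The caution about approximation accuracy for clustered binary data can be made concrete with a small, hypothetical sketch. It is not code from this article and it is not INLA itself (which nests several Laplace approximations); it applies a single plain Laplace approximation to the marginal likelihood of one cluster, assuming a random intercept b ~ N(0, σ²) and a fixed, known intercept β0, so that L = ∫ ∏_j p(y_j | β0 + b) N(b | 0, σ²) db ≈ exp{h(b̂)} (2π / −h''(b̂))^{1/2}, where h is the log integrand and b̂ its mode. A fine trapezoidal-rule integral serves as the reference. All function names and example data are illustrative assumptions.

```python
import math


def log_prior(b, sigma):
    # Log density of the random intercept b ~ N(0, sigma^2).
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - 0.5 * (b / sigma) ** 2


def laplace_log_marginal(loglik, grad, hess, sigma, max_iter=100):
    """Laplace approximation to log of the integral of exp(loglik(b)) * N(b | 0, sigma^2) db."""
    b = 0.0
    for _ in range(max_iter):
        g = grad(b) - b / sigma ** 2      # first derivative of the log integrand
        h = hess(b) - 1.0 / sigma ** 2    # second derivative (strictly negative here)
        step = g / h
        b -= step                         # Newton step towards the mode
        if abs(step) < 1e-12:
            break
    h = hess(b) - 1.0 / sigma ** 2
    return loglik(b) + log_prior(b, sigma) + 0.5 * math.log(2 * math.pi / -h)


def quadrature_log_marginal(loglik, sigma, lo=-15.0, hi=15.0, n=30001):
    """Reference value from the trapezoidal rule on a fine grid, computed on the log scale."""
    step = (hi - lo) / (n - 1)
    logs = [loglik(lo + k * step) + log_prior(lo + k * step, sigma) for k in range(n)]
    m = max(logs)
    weights = [math.exp(v - m) for v in logs]
    total = sum(weights) - 0.5 * (weights[0] + weights[-1])
    return m + math.log(total * step)


def bernoulli_cluster(y, beta0):
    # Logistic-Bernoulli log-likelihood of one cluster, as a function of b.
    def loglik(b):
        eta = beta0 + b
        return sum(yj * eta - math.log1p(math.exp(eta)) for yj in y)

    def grad(b):
        p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
        return sum(yj - p for yj in y)

    def hess(b):
        p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
        return -len(y) * p * (1.0 - p)

    return loglik, grad, hess


def poisson_cluster(y, beta0):
    # Log-linear Poisson log-likelihood of one cluster, as a function of b.
    def loglik(b):
        eta = beta0 + b
        return sum(yj * eta - math.exp(eta) - math.lgamma(yj + 1) for yj in y)

    def grad(b):
        return sum(y) - len(y) * math.exp(beta0 + b)

    def hess(b):
        return -len(y) * math.exp(beta0 + b)

    return loglik, grad, hess


def relative_error(cluster, sigma):
    # |Laplace / reference - 1| for a single cluster's marginal likelihood.
    loglik, grad, hess = cluster
    approx = laplace_log_marginal(loglik, grad, hess, sigma)
    exact = quadrature_log_marginal(loglik, sigma)
    return abs(math.exp(approx - exact) - 1.0)


# Hypothetical data: a binary cluster of two successes and a small Poisson cluster.
binary_err = relative_error(bernoulli_cluster([1, 1], beta0=0.0), sigma=1.0)
poisson_err = relative_error(poisson_cluster([4, 6, 5, 7], beta0=math.log(5.0)), sigma=1.0)
print(f"binary cluster:  relative error {binary_err:.4f}")
print(f"Poisson cluster: relative error {poisson_err:.4f}")
```

With these inputs the binary cluster would be expected to show the larger relative error, because its log integrand is further from quadratic than that of the Poisson cluster with moderate counts; the size of the gap depends on the cluster size, the counts and σ.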

∗ To whom correspondence should be addressed.
© The Author 2009. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

1. INTRODUCTION
Generalized linear mixed models (GLMMs) combine a generalized linear model with normal random effects on the linear predictor scale to give a rich family of models that have been used in a wide variety of applications (see, e.g. Diggle and others, 2002; Verbeke and Molenberghs, 2000, 2005; McCulloch and others, 2008). This flexibility comes at a price, however, in terms of analytical tractability, which has a number of implications, including computational complexity and an unknown degree to which inference is dependent on modeling assumptions. Likelihood-based inference may be carried out relatively easily within many software platforms (except perhaps for binary responses), but inference is dependent on asymptotic sampling distributions of estimators, with few guidelines available as to when such theory will produce accurate inference. A Bayesian approach is attractive, but it requires the specification of prior distributions, which is not straightforward, in particular for variance components. Computation is also an issue, since the usual implementation is via Markov chain Monte Carlo (MCMC), which carries a large computational overhead. The seminal article of Breslow and Clayton (1993) helped to popularize GLMMs and placed an emphasis on likelihood-based inference via penalized quasi-likelihood (PQL). It is the aim of this article to describe, through a series of examples (including all of those considered in Breslow and Clayton, 1993), how Bayesian inference may be performed with computation via a fast implementation and with...