Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives: An Essential Journey with Donald Rubin's Statistical Family
This book brings together a collection of articles on statistical methods relating to missing data analysis, including multiple imputation, propensity scores, instrumental variables, and Bayesian inference. It covers new research topics and real-world examples that do not feature in many standard texts. The book is dedicated to Professor Don Rubin (Harvard), who has made fundamental contributions to the study of missing data.
Key features of the book include:
- Comprehensive coverage of an important area for both research and applications.
- Adopts a pragmatic approach to describing a wide range of intermediate and advanced statistical techniques.
- Covers key topics such as multiple imputation, propensity scores, instrumental variables and Bayesian inference.
- Includes a number of applications from the social and health sciences.
- Edited and authored by highly respected researchers in the area.
Table of Contents:
Preface xiii
I Causal inference and observational studies 1
1 An overview of methods for causal inference from observational studies, by Sander Greenland 3
1.1 Introduction 3
1.2 Approaches based on causal models 3
1.3 Canonical inference 9
1.4 Methodologic modeling 10
1.5 Conclusion 13
2 Matching in observational studies, by Paul R. Rosenbaum 15
2.1 The role of matching in observational studies 15
2.2 Why match? 16
2.3 Two key issues: balance and structure 17
2.4 Additional issues 21
3 Estimating causal effects in nonexperimental studies, by Rajeev Dehejia 25
3.1 Introduction 25
3.2 Identifying and estimating the average treatment effect 27
3.3 The NSW data 29
3.4 Propensity score estimates 31
3.5 Conclusions 35
4 Medication cost sharing and drug spending in Medicare, by Alyce S. Adams 37
4.1 Methods 38
4.2 Results 40
4.3 Study limitations 45
4.4 Conclusions and policy implications 46
5 A comparison of experimental and observational data analyses, by Jennifer L. Hill, Jerome P. Reiter, and Elaine L. Zanutto 49
5.1 Experimental sample 50
5.2 Constructed observational study 51
5.3 Concluding remarks 60
6 Fixing broken experiments using the propensity score, by Bruce Sacerdote 61
6.1 Introduction 61
6.2 The lottery data 62
6.3 Estimating the propensity scores 63
6.4 Results 65
6.5 Concluding remarks 71
7 The propensity score with continuous treatments, by Keisuke Hirano and Guido W. Imbens 73
7.1 Introduction 73
7.2 The basic framework 74
7.3 Bias removal using the GPS 76
7.4 Estimation and inference 78
7.5 Application: the Imbens–Rubin–Sacerdote lottery sample 79
7.6 Conclusion 83
8 Causal inference with instrumental variables, by Junni L. Zhang 85
8.1 Introduction 85
8.2 Key assumptions for the LATE interpretation of the IV estimand 87
8.3 Estimating causal effects with IV 90
8.4 Some recent applications 95
8.5 Discussion 95
9 Principal stratification, by Constantine E. Frangakis 97
9.1 Introduction: partially controlled studies 97
9.2 Examples of partially controlled studies 97
9.3 Principal stratification 101
9.4 Estimands 102
9.5 Assumptions 104
9.6 Designs and polydesigns 107
II Missing data modeling 109
10 Nonresponse adjustment in government statistical agencies: constraints, inferential goals, and robustness issues, by John L. Eltinge 111
10.1 Introduction: a wide spectrum of nonresponse adjustment efforts in government statistical agencies 111
10.2 Constraints 112
10.3 Complex estimand structures, inferential goals, and utility functions 112
10.4 Robustness 113
10.5 Closing remarks 113
11 Bridging across changes in classification systems, by Nathaniel Schenker 117
11.1 Introduction 117
11.2 Multiple imputation to achieve comparability of industry and occupation codes 118
11.3 Bridging the transition from single-race reporting to multiple-race reporting 123
11.4 Conclusion 128
12 Representing the Census undercount by multiple imputation of households, by Alan M. Zaslavsky 129
12.1 Introduction 129
12.2 Models 131
12.3 Inference 134
12.4 Simulation evaluations 138
12.5 Conclusion 140
13 Statistical disclosure techniques based on multiple imputation, by Roderick J. A. Little, Fang Liu, and Trivellore E. Raghunathan 141
13.1 Introduction 141
13.2 Full synthesis 143
13.3 SMIKe and MIKe 144
13.4 Analysis of synthetic samples 147
13.5 An application 149
13.6 Conclusions 152
14 Designs producing balanced missing data: examples from the National Assessment of Educational Progress, by Neal Thomas 153
14.1 Introduction 153
14.2 Statistical methods in NAEP 155
14.3 Split and balanced designs for estimating population parameters 157
14.4 Maximum likelihood estimation 159
14.5 The role of secondary covariates 160
14.6 Conclusions 162
15 Propensity score estimation with missing data, by Ralph B. D'Agostino Jr. 163
15.1 Introduction 163
15.2 Notation 165
15.3 Applied example: March of Dimes data 168
15.4 Conclusion and future directions 174
16 Sensitivity to nonignorability in frequentist inference, by Guoguang Ma and Daniel F. Heitjan 175
16.1 Missing data in clinical trials 175
16.2 Ignorability and bias 175
16.3 A nonignorable selection model 176
16.4 Sensitivity of the mean and variance 177
16.5 Sensitivity of the power 178
16.6 Sensitivity of the coverage probability 180
16.7 An example 184
16.8 Discussion 185
III Statistical modeling and computation 187
17 Statistical modeling and computation, by D. Michael Titterington 189
17.1 Regression models 190
17.2 Latent-variable problems 191
17.3 Computation: non-Bayesian 191
17.4 Computation: Bayesian 192
17.5 Prospects for the future 193
18 Treatment effects in before-after data, by Andrew Gelman 195
18.1 Default statistical models of treatment effects 195
18.2 Before-after correlation is typically larger for controls than for treated units 196
18.3 A class of models for varying treatment effects 200
18.4 Discussion 201
19 Multimodality in mixture models and factor models, by Eric Loken 203
19.1 Multimodality in mixture models 204
19.2 Multimodal posterior distributions in continuous latent variable models 209
19.3 Summary 212
20 Modeling the covariance and correlation matrix of repeated measures, by W. John Boscardin and Xiao Zhang 215
20.1 Introduction 215
20.2 Modeling the covariance matrix 216
20.3 Modeling the correlation matrix 218
20.4 Modeling a mixed covariance-correlation matrix 220
20.5 Nonzero means and unbalanced data 220
20.6 Multivariate probit model 221
20.7 Example: covariance modeling 222
20.8 Example: mixed data 225
21 Robit regression: a simple robust alternative to logistic and probit regression, by Chuanhai Liu 227
21.1 Introduction 227
21.2 The robit model 228
21.3 Robustness of likelihood-based inference using logistic, probit, and robit regression models 230
21.4 Complete data for simple maximum likelihood estimation 231
21.5 Maximum likelihood estimation using EM-type algorithms 233
21.6 A numerical example 235
21.7 Conclusion 238
22 Using EM and data augmentation for the competing risks model, by Radu V. Craiu and Thierry Duchesne 239
22.1 Introduction 239
22.2 The model 240
22.3 EM-based analysis 243
22.4 Bayesian analysis 244
22.5 Example 248
22.6 Discussion and further work 250
23 Mixed effects models and the EM algorithm, by Florin Vaida, Xiao-Li Meng, and Ronghui Xu 253
23.1 Introduction 253
23.2 Binary regression with random effects 254
23.3 Proportional hazards mixed-effects models 259
24 The sampling/importance resampling algorithm, by Kim-Hung Li 265
24.1 Introduction 265
24.2 SIR algorithm 266
24.3 Selection of the pool size 267
24.4 Selection criterion of the importance sampling distribution 271
24.5 The resampling algorithms 272
24.6 Discussion 276
IV Applied Bayesian inference 277
25 Whither applied Bayesian inference?, by Bradley P. Carlin 279
25.1 Where we've been 279
25.2 Where we are 281
25.3 Where we're going 282
26 Efficient EM-type algorithms for fitting spectral lines in high-energy astrophysics, by David A. van Dyk and Taeyoung Park 285
26.1 Application-specific statistical methods 285
26.2 The Chandra X-ray observatory 287
26.3 Fitting narrow emission lines 289
26.4 Model checking and model selection 294
27 Improved predictions of lynx trappings using a biological model, by Cavan Reilly and Angelique Zeringue 297
27.1 Introduction 297
27.2 The current best model 298
27.3 Biological models for predator-prey systems 299
27.4 Some statistical models based on the Lotka-Volterra system 300
27.5 Computational aspects of posterior inference 302
27.6 Posterior predictive checks and model expansion 304
27.7 Prediction with the posterior mode 307
27.8 Discussion 308
28 Record linkage using finite mixture models, by Michael D. Larsen 309
28.1 Introduction to record linkage 309
28.2 Record linkage 310
28.3 Mixture models 311
28.4 Application 314
28.5 Analysis of linked files 316
28.6 Bayesian hierarchical record linkage 317
28.7 Summary 318
29 Identifying likely duplicates by record linkage in a survey of prostitutes, by Thomas R. Belin, Hemant Ishwaran, Naihua Duan, Sandra H. Berry, and David E. Kanouse 319
29.1 Concern about duplicates in an anonymous survey 319
29.2 General frameworks for record linkage 321
29.3 Estimating probabilities of duplication in the Los Angeles Women's Health Risk Study 322
29.4 Discussion 328
30 Applying structural equation models with incomplete data, by Hal S. Stern and Yoonsook Jeon 331
30.1 Structural equation models 332
30.2 Bayesian inference for structural equation models 334
30.3 Iowa Youth and Families Project example 339
30.4 Summary and discussion 342
31 Perceptual scaling, by Ying Nian Wu, Cheng-En Guo, and Song Chun Zhu 343
31.1 Introduction 343
31.2 Sparsity and minimax entropy 347
31.3 Complexity scaling law 353
31.4 Perceptibility scaling law 356
31.5 Texture = imperceptible structures 358
31.6 Perceptibility and sparsity 359
References 361
Index 401
About the Authors:
Andrew Gelman is Professor of Statistics and Professor of Political Science at Columbia University. He has published over 150 articles in statistical theory, methods, and computation, and in application areas including decision analysis, survey sampling, political science, public health, and policy. His other books are Bayesian Data Analysis (1995, second edition 2003) and Teaching Statistics: A Bag of Tricks (2002).
Xiao-Li Meng, Department of Statistics, Harvard University, USA.
Reviews:
"I congratulate the editors on this volume; it really is an essential and very enjoyable journey with Don Rubin's statistical family." (Biometrics, September 2006)
"…contains much current important work…" (Technometrics, November 2005)
"This [is] a useful reference book on an important topic with applications to a wide range of disciplines." (CHOICE, September 2005)
"With this variety of papers, the reader is bound to find some papers interesting…" (Journal of Applied Statistics, Vol. 32, No. 3, April 2005)
"I strongly recommend that libraries have a copy of this book in their reference section." (Journal of the Royal Statistical Society Series A, June 2005)
"…a very useful addition to academic libraries…" (Short Book Reviews, Vol. 24, No. 3, December 2004)