Andrew Gelman
Columbia University in the City of New York
H-index: 129
North America-United States
Description
Andrew Gelman is a distinguished researcher at Columbia University in the City of New York who specializes in statistics and political science. He has an h-index of 129 overall and 91 since 2020.
His recent articles reflect a diverse array of research interests and contributions to the field:
Model validation for aggregate inferences in out-of-sample prediction
Bayesian workflow for time-varying transmission in stratified compartmental infectious disease transmission models
Past, Present and Future of Software for Bayesian Inference
Active Statistics: Stories, Games, Problems, and Hands-on Demonstrations for Applied Regression and Causal Inference
Hierarchical Bayesian Models to Mitigate Systematic Disparities in Prediction with Proxy Outcomes
Commentaries on “Beyond statistical significance: Five principles for the new era of data analysis and reporting”
Using leave‐one‐out cross validation (LOO) in a multilevel regression and poststratification (MRP) workflow: A cautionary tale
Regression, poststratification, and small-area estimation with sampling weights
Professor Information
University | Columbia University in the City of New York |
---|---|
Position | Professor of Statistics and Political Science |
Citations(all) | 180737 |
Citations(since 2020) | 87586 |
Cited By | 129029 |
hIndex(all) | 129 |
hIndex(since 2020) | 91 |
i10Index(all) | 336 |
i10Index(since 2020) | 258 |
University Profile Page | Columbia University in the City of New York |
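
For readers unfamiliar with the metrics in the table above, the following is a minimal sketch of how an h-index and an i10-index are conventionally computed from a list of per-paper citation counts. The function names and the example citation counts are hypothetical and purely illustrative; they are not Andrew Gelman's actual per-paper data, and the profile does not state how its figures were produced.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations):
    """Return the number of papers with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Hypothetical citation counts for six papers (illustration only).
example_citations = [25, 12, 10, 8, 3, 0]

print(h_index(example_citations))    # 4: four papers have at least 4 citations
print(i10_index(example_citations))  # 3: three papers have at least 10 citations
```

The "since 2020" variants in the table apply the same definitions, restricted to citations received in that period.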
Research & Interests List
statistics
political science
Top articles of Andrew Gelman
Model validation for aggregate inferences in out-of-sample prediction
Generalization to new samples is a fundamental rationale for statistical modeling. For this purpose, model validation is particularly important, but recent work in survey inference has suggested that simple aggregation of individual prediction scores does not give a good measure of the score for population aggregate estimates. In this manuscript we explain why this occurs, propose two scoring metrics designed specifically for this problem, and demonstrate their use in three different ways. We show that these scoring metrics correctly order models when compared to the true score, although they do underestimate the magnitude of the score. We demonstrate with a problem in survey research, where multilevel regression and poststratification (MRP) has been used extensively to adjust convenience and low-response surveys to make population and subpopulation estimates.
Authors
Lauren Kennedy,Aki Vehtari,Andrew Gelman
Published Date
2024/2/15
Bayesian workflow for time-varying transmission in stratified compartmental infectious disease transmission models
Compartmental models that describe infectious disease transmission across subpopulations are central for assessing the impact of non-pharmaceutical interventions, behavioral changes and seasonal effects on the spread of respiratory infections. We present a Bayesian workflow for such models, including four features: (1) an adjustment for incomplete case ascertainment, (2) an adequate sampling distribution of laboratory-confirmed cases, (3) a flexible, time-varying transmission rate, and (4) a stratification by age group. We benchmarked the performance of various implementations of two of these features (2 and 3). For the second feature, we used SARS-CoV-2 data from the canton of Geneva (Switzerland) and found that a quasi-Poisson distribution is the most suitable sampling distribution for describing the overdispersion in the observed laboratory-confirmed cases. For the third feature, we implemented three methods: Brownian motion, B-splines, and approximate Gaussian processes (aGP). We compared their performance in terms of the number of effective samples per second, and the error and sharpness in estimating the time-varying transmission rate over a selection of ordinary differential equation solvers and tuning parameters, using simulated seroprevalence and laboratory-confirmed case data. Even though all methods could recover the time-varying dynamics in the transmission rate accurately, we found that B-splines perform up to four and ten times faster than Brownian motion and aGPs, respectively. We validated the B-spline model with simulated age-stratified data. We applied this model to 2020 laboratory-confirmed SARS-CoV …
Authors
Judith Bouman,Anthony Hauser,Simon L Grimm,Martin Wohlfender,Samir Bhatt,Elizaveta Semenova,Andrew Gelman,Christian L Althaus,Julien Riou
Journal
medRxiv
Published Date
2023
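The abstract above reports that a quasi-Poisson sampling distribution suited the overdispersed case counts best. As a rough illustration of what overdispersion means, the sketch below estimates a dispersion factor for a set of counts under a constant-mean model, where the Pearson statistic divided by its degrees of freedom reduces to the sample variance divided by the mean. This is a deliberate simplification and not the authors' compartmental model; the function name and the counts are hypothetical.

```python
def dispersion_factor(counts):
    """Pearson dispersion estimate for a constant-mean count model.

    Values near 1 are consistent with a Poisson distribution;
    values well above 1 indicate overdispersion.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two counts")
    mean = sum(counts) / n
    if mean == 0:
        raise ValueError("mean count is zero; dispersion is undefined")
    # Pearson chi-square statistic divided by residual degrees of freedom (n - 1).
    pearson = sum((y - mean) ** 2 / mean for y in counts)
    return pearson / (n - 1)


# Hypothetical daily case counts (illustration only, not the Geneva data).
daily_cases = [3, 7, 1, 12, 5, 0, 9, 4]

phi = dispersion_factor(daily_cases)
print(f"dispersion factor: {phi:.2f}")  # greater than 1 for these counts
```

In a quasi-Poisson setting the variance is modeled as this factor times the mean, so a factor well above 1 widens the uncertainty relative to a plain Poisson assumption.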
Past, Present and Future of Software for Bayesian Inference
Software tools for Bayesian inference have undergone rapid evolution in the past three decades, following popularisation of the first generation MCMC-sampler implementations. More recently, exponential growth in the number of users has been stimulated both by the active development of new packages by the machine learning community and popularity of specialist software for particular applications. This review aims to summarize the most popular software and provide a useful map for a reader to navigate the world of Bayesian computation. We anticipate a vigorous continued development of algorithms and corresponding software in multiple research fields, such as probabilistic programming, likelihood-free inference and Bayesian neural networks, which will further broaden the possibilities for employing the Bayesian paradigm in exciting applications.
Authors
Erik Štrumbelj,Alexandre Bouchard-Côté,Jukka Corander,Andrew Gelman,Håvard Rue,Lawrence Murray,Henri Pesonen,Martyn Plummer,Aki Vehtari
Published Date
2024/2
Active Statistics: Stories, Games, Problems, and Hands-on Demonstrations for Applied Regression and Causal Inference
This book provides statistics instructors and students with complete classroom material for a one- or two-semester course on applied regression and causal inference. It is built around 52 stories, 52 class-participation activities, 52 hands-on computer demonstrations, and 52 discussion problems that allow instructors and students to explore in a fun way the real-world complexity of the subject. The book fosters an engaging 'flipped classroom' environment with a focus on visualization and understanding. The book provides instructors with frameworks for self-study or for structuring the course, along with tips for maintaining student engagement at all levels, and practice exam questions to help guide learning. Designed to accompany the authors' previous textbook Regression and Other Stories, its modular nature and wealth of material allow this book to be adapted to different courses and texts or be used by learners as a hands-on workbook.
Authors
Andrew Gelman,Aki Vehtari
Published Date
2024/3/31
Hierarchical Bayesian Models to Mitigate Systematic Disparities in Prediction with Proxy Outcomes
Label bias occurs when the outcome of interest is not directly observable and instead modeling is performed with proxy labels. When the difference between the true outcome and the proxy label is correlated with predictors, this can yield systematic disparities in predictions for different groups of interest. We propose Bayesian hierarchical measurement models to address these issues. Through practical examples, we demonstrate how our approach improves accuracy and helps with algorithmic fairness.
Authors
Jonas Mikhaeil,Andrew Gelman,Philip Greengard
Journal
arXiv preprint arXiv:2403.00639
Published Date
2024/3/1
Commentaries on “Beyond statistical significance: Five principles for the new era of data analysis and reporting”
Three commentaries below provide different perspectives on data analysis and reporting. They generally focus on how the quality of the measures and manipulations determines the value of the analysis. Norbert Schwarz and Fritz Strack's comment is less on the right statistic and more on “sloppy reasoning, gaps between theoretical concepts and their operationalizations, and blissful ignorance of the situated nature of human thinking, feeling, and doing contribute more to the limited reproducibility of empirical findings than the choice of a particular test statistic.” They propose that particular effects are contextual and inappropriately labeled as true or false. Instead, our job is to focus on general constructs that make sense of the diversity of human experience and psychological reactions. Too often studies replicating psychological effects in the noisy and confounded conditions of the marketplace result in statistical …
Authors
Norbert Schwarz,Fritz Strack,Andrew Gelman,Stijn van Osselaer,Joel Huber
Journal
Journal of Consumer Psychology
Published Date
2023
Using leave‐one‐out cross validation (LOO) in a multilevel regression and poststratification (MRP) workflow: A cautionary tale
In recent decades, multilevel regression and poststratification (MRP) has surged in popularity for population inference. However, the validity of the estimates can depend on details of the model, and there is currently little research on validation. We explore how leave‐one‐out cross validation (LOO) can be used to compare Bayesian models for MRP. We investigate two approximate calculations of LOO: Pareto smoothed importance sampling (PSIS‐LOO) and a survey‐weighted alternative (WTD‐PSIS‐LOO). Using two simulation designs, we examine how accurately these two criteria recover the correct ordering of model goodness at predicting population and small‐area estimands. Focusing first on variable selection, we find that neither PSIS‐LOO nor WTD‐PSIS‐LOO correctly recovers the models' order for an MRP population estimand, although both criteria correctly identify the best and worst model. When …
Authors
Swen Kuh,Lauren Kennedy,Qixuan Chen,Andrew Gelman
Journal
Statistics in Medicine
Published Date
2024/2/28
Regression, poststratification, and small-area estimation with sampling weights
A well-known rule in practical survey research is to include weights when estimating a population average but not to use weights when fitting a regression model, as long as the regression includes as predictors all the information that went into the sampling weights. But what if you don’t know where the weights came from? We propose a quasi-Bayesian approach using a joint regression of the outcome and the sampling weight, followed by poststratification on the two variables, thus using design information within a model-based context to obtain inferences for small-area estimates, regressions, and other population quantities of interest.
Authors
Andrew Gelman,Yajuan Si,Brady T West
Published Date
2024/2/19
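Several of the articles above rely on poststratification. As a minimal sketch of that single step only, the code below combines per-cell estimates into a population estimate by weighting each cell by its population count, and contrasts the result with a naive average weighted by sample size. It does not implement the quasi-Bayesian joint regression described in the abstract; in practice the cell estimates would come from a fitted regression model, whereas here they are hard-coded. All names and numbers are hypothetical.

```python
def poststratify(cell_estimates, population_counts):
    """Population estimate: sum_j N_j * theta_j / sum_j N_j over cells j."""
    if len(cell_estimates) != len(population_counts):
        raise ValueError("need one population count per cell estimate")
    total = sum(population_counts)
    if total <= 0:
        raise ValueError("population counts must sum to a positive number")
    weighted = sum(n * theta for theta, n in zip(cell_estimates, population_counts))
    return weighted / total


# Hypothetical two-cell example: (cell estimate, sample size, population count).
cells = [
    (0.30, 10, 6000),  # under-sampled cell
    (0.60, 90, 4000),  # over-sampled cell
]
estimates = [theta for theta, _, _ in cells]
sample_sizes = [n for _, n, _ in cells]
population = [big_n for _, _, big_n in cells]

# Naive estimate weights cells by how often they appear in the sample.
naive = poststratify(estimates, sample_sizes)
# Poststratified estimate weights cells by their share of the population.
adjusted = poststratify(estimates, population)

print(f"naive sample average:    {naive:.2f}")     # 0.57
print(f"poststratified estimate: {adjusted:.2f}")  # 0.42
```

The gap between the two numbers shows why an unrepresentative sample needs adjustment before its cell-level estimates are aggregated.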
Professor FAQs
What is Andrew Gelman's h-index at Columbia University in the City of New York?
Andrew Gelman's h-index is 129 overall and 91 since 2020.
What are Andrew Gelman's top articles?
Andrew Gelman's top articles at Columbia University in the City of New York include:
Model validation for aggregate inferences in out-of-sample prediction
Bayesian workflow for time-varying transmission in stratified compartmental infectious disease transmission models
Past, Present and Future of Software for Bayesian Inference
Active Statistics: Stories, Games, Problems, and Hands-on Demonstrations for Applied Regression and Causal Inference
Hierarchical Bayesian Models to Mitigate Systematic Disparities in Prediction with Proxy Outcomes
Commentaries on “Beyond statistical significance: Five principles for the new era of data analysis and reporting”
Using leave‐one‐out cross validation (LOO) in a multilevel regression and poststratification (MRP) workflow: A cautionary tale
Regression, poststratification, and small-area estimation with sampling weights
What are Andrew Gelman's research interests?
Andrew Gelman's research interests are statistics and political science.
What is Andrew Gelman's total number of citations?
Andrew Gelman has 180,737 citations in total.