Weining Shen

University of California, Irvine

H-index: 13

North America-United States

About Weining Shen

Weining Shen is a researcher at the University of California, Irvine, with an h-index of 13 and a recent h-index of 12 (since 2020). He specializes in Statistics, Machine learning, and Biostatistics.

His recent articles reflect a diverse array of research interests and contributions to the field:

Clustering spatial functional data using a geographically weighted Dirichlet process

Risk-aware restricted outcome learning for individualized treatment regimes of schizophrenia

Survival impact of post-operative immunotherapy in resected stage III cutaneous melanomas in the checkpoint era

A Bayesian Approach for Selecting Relevant External Data (BASE): Application to a study of Long-Term Outcomes in a Hemophilia Gene Therapy Trial

A Bayesian nonparametric approach for handling item and examinee heterogeneity in assessment data

SportQA: A Benchmark for Sports Understanding in Large Language Models

Nonparametric tests for treatment effect heterogeneity in observational studies

A Joint Analysis for Field Goal Attempts and Percentages of Professional Basketball Players: Bayesian Nonparametric Resource.

Weining Shen Information

University	University of California, Irvine
Citations(all)	981
Citations(since 2020)	774
Cited By	482
hIndex(all)	13
hIndex(since 2020)	12
i10Index(all)	18
i10Index(since 2020)	18
University Profile Page	University of California, Irvine

Weining Shen Skills & Research Interests

Statistics

Machine learning

Biostatistics

Top articles of Weining Shen

Clustering spatial functional data using a geographically weighted Dirichlet process

Authors

Tianyu Pan,Weining Shen,Guanyu Hu

Journal

Canadian Journal of Statistics

Published Date

2024/1/5

We propose a Bayesian nonparametric clustering approach to study the spatial heterogeneity effect for functional data observed at spatially correlated locations. We consider a geographically weighted Chinese restaurant process equipped with a conditional autoregressive prior to capture fully the spatial correlation of function curves. To sample efficiently from our model, we customize a prior called Quadratic Gamma, which ensures conjugacy. We design a Markov chain Monte Carlo algorithm to infer simultaneously the posterior distributions of the number of groups and the grouping configurations. The superior numerical performance of the proposed method over competing methods is demonstrated using simulated examples and a U.S. annual precipitation study.
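The abstract above builds on the Chinese restaurant process. As a rough illustration only, the following sketch samples a partition from the standard (unweighted) Chinese restaurant process; it does not include the geographic weighting, the conditional autoregressive prior, or the Quadratic Gamma prior described in the paper. The function name `sample_crp` and the parameter `alpha` (concentration) are assumptions made for this example.

```python
import random


def sample_crp(n_items, alpha, seed=0):
    """Draw one partition of n_items from a standard Chinese restaurant process.

    alpha is the concentration parameter (must be positive). This is the plain
    CRP, not the geographically weighted variant from the paper.
    """
    rng = random.Random(seed)
    labels = []  # cluster label assigned to each item, in arrival order
    sizes = []   # current number of items in each cluster
    for _ in range(n_items):
        # An existing cluster k is chosen with weight sizes[k];
        # a new cluster is opened with weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
        labels.append(k)
    return labels, sizes


if __name__ == "__main__":
    labels, sizes = sample_crp(20, alpha=1.0, seed=42)
    print("labels:", labels)
    print("cluster sizes:", sizes)
```

Larger values of `alpha` tend to produce more clusters, which is why the number of groups does not need to be fixed in advance.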

Risk-aware restricted outcome learning for individualized treatment regimes of schizophrenia

Authors

Shuying Zhu,Weining Shen,Haoda Fu,Annie Qu

Journal

The Annals of Applied Statistics

Published Date

2024/6

Additional numerical results and the algorithm description are provided.

Survival impact of post-operative immunotherapy in resected stage III cutaneous melanomas in the checkpoint era

Authors

G Hagopian,X Jiang,C Grant,D Brazel,P Kumar,M Yamamoto,J Jakowatz,W Chow,T Tran,W Shen,J Moyers

Journal

ESMO open

Published Date

2024/2/1

Background: Checkpoint inhibitors have shown improvement in recurrence-free survival in the post-operative setting for node-positive melanoma and were first approved in late 2015. However, single-agent checkpoint therapies have yet to show benefit to overall survival (OS) for lower-risk stage III cancers. We evaluated the OS benefit of post-operative immunotherapy in the National Cancer Database (NCDB). Patients and methods: Patient cases were selected from the NCDB 2020 Participant Use File. Patients diagnosed with stage III cutaneous melanoma between 2016 and 2019 who underwent definitive resection for their melanoma were included. OS between those who received post-operative immunotherapy within 84 days of surgery and those who did not was analyzed by the Kaplan–Meier method. Demographic and clinical characteristics between the two groups were compared via Cox proportional hazard …
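The Kaplan–Meier method mentioned in this abstract can be sketched in a few lines. The example below is a minimal, generic implementation on made-up follow-up times; the function name `kaplan_meier` and the toy data are assumptions for illustration and have no connection to the NCDB analysis.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event.

    times:  follow-up durations
    events: 1 if the event (e.g. death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects that share the same follow-up time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit update: multiply by the fraction surviving past t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after t.
        at_risk -= removed
    return curve


if __name__ == "__main__":
    # Hypothetical toy data: five subjects, two of them censored.
    for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(t, round(s, 3))
```

Censored subjects contribute to the risk set up to their last follow-up time but never trigger a drop in the curve.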

A Bayesian Approach for Selecting Relevant External Data (BASE): Application to a study of Long-Term Outcomes in a Hemophilia Gene Therapy Trial

Authors

Tianyu Pan,Xiang Zhang,Weining Shen,Ting Ye

Journal

arXiv preprint arXiv:2403.13260

Published Date

2024/3/20

Gene therapies aim to address the root causes of diseases, particularly those stemming from rare genetic defects that can be life-threatening or severely debilitating. While there has been notable progress in the development of gene therapies in recent years, understanding their long-term effectiveness remains challenging due to a lack of data on long-term outcomes, especially during the early stages of their introduction to the market. To address the critical question of estimating long-term efficacy without waiting for the completion of lengthy clinical trials, we propose a novel Bayesian framework. This framework selects pertinent data from external sources, often early-phase clinical trials with more comprehensive longitudinal efficacy data that could lead to an improved inference of the long-term efficacy outcome. We apply this methodology to predict the long-term factor IX (FIX) levels of HEMGENIX (etranacogene dezaparvovec), the first FDA-approved gene therapy to treat adults with severe Hemophilia B, in a phase 3 study. Our application showcases the capability of the framework to estimate the 5-year FIX levels following HEMGENIX therapy, demonstrating sustained FIX levels induced by HEMGENIX infusion. Additionally, we provide theoretical insights into the methodology by establishing its posterior convergence properties.

A Bayesian nonparametric approach for handling item and examinee heterogeneity in assessment data

Authors

Tianyu Pan,Weining Shen,Clintin P Davis‐Stober,Guanyu Hu

Journal

British Journal of Mathematical and Statistical Psychology

Published Date

2024/2

We propose a novel nonparametric Bayesian item response theory model that estimates clusters at the question level, while simultaneously allowing for heterogeneity at the examinee level under each question cluster, characterized by a mixture of binomial distributions. The main contribution of this work is threefold. First, we present our new model and demonstrate that it is identifiable under a set of conditions. Second, we show that our model can correctly identify question‐level clusters asymptotically, and the parameters of interest that measure the proficiency of examinees in solving certain questions can be estimated at a √n rate (up to a log term). Third, we present a tractable sampling algorithm to obtain valid posterior samples from our proposed model. Compared to the existing methods, our model manages to reveal the multi‐dimensionality of the examinees' proficiency level in handling different types of …

SportQA: A Benchmark for Sports Understanding in Large Language Models

Authors

Haotian Xia,Zhengbang Yang,Yuqing Wang,Rhys Tracy,Yun Zhao,Dongdong Huang,Zezhi Chen,Yan Zhu,Yuan-fang Wang,Weining Shen

Journal

arXiv preprint arXiv:2402.15862

Published Date

2024/2/24

A deep understanding of sports, a field rich in strategic and dynamic content, is crucial for advancing Natural Language Processing (NLP). This holds particular significance in the context of evaluating and advancing Large Language Models (LLMs), given the existing gap in specialized benchmarks. To bridge this gap, we introduce SportQA, a novel benchmark specifically designed for evaluating LLMs in the context of sports understanding. SportQA encompasses over 70,000 multiple-choice questions across three distinct difficulty levels, each targeting different aspects of sports knowledge from basic historical facts to intricate, scenario-based reasoning tasks. We conducted a thorough evaluation of prevalent LLMs, mainly utilizing few-shot learning paradigms supplemented by chain-of-thought (CoT) prompting. Our results reveal that while LLMs exhibit competent performance in basic sports knowledge, they struggle with more complex, scenario-based sports reasoning, lagging behind human expertise. The introduction of SportQA marks a significant step forward in NLP, offering a tool for assessing and enhancing sports understanding in LLMs.

Nonparametric tests for treatment effect heterogeneity in observational studies

Authors

Maozhu Dai,Weining Shen,Hal S Stern

Journal

Canadian Journal of Statistics

Published Date

2023/6

We consider the problem of testing for treatment effect heterogeneity in observational studies and propose a nonparametric test based on multisample U‐statistics. To account for potential confounders, we use reweighted data where the weights are determined by estimated propensity scores. The proposed method does not require any parametric assumptions on the outcomes and bypasses the need for modelling the treatment effect for each study subgroup. We establish the asymptotic normality for the test statistic and demonstrate its superior numerical performance over several competing approaches via simulation studies. Two real data applications are discussed: an employment programme evaluation study and a mental health study of China's one‐child policy.
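To make the idea of propensity-score reweighting concrete, here is a small sketch of normalized inverse-propensity weighting, one common way to reweight observational data. It is not the paper's U-statistic test, and the abstract does not say which weighting scheme the authors use; the function name `ipw_effect` and the toy numbers are assumptions for this example.

```python
def ipw_effect(outcomes, treated, propensity):
    """Estimate an average treatment effect with normalized inverse-propensity weights.

    outcomes:   observed outcome for each subject
    treated:    1 if the subject received treatment, 0 otherwise
    propensity: estimated probability of treatment, strictly between 0 and 1
    """
    # Treated subjects get weight 1/e; control subjects get weight 1/(1 - e).
    w1 = [t / e for t, e in zip(treated, propensity)]
    w0 = [(1 - t) / (1 - e) for t, e in zip(treated, propensity)]
    mean1 = sum(w * y for w, y in zip(w1, outcomes)) / sum(w1)
    mean0 = sum(w * y for w, y in zip(w0, outcomes)) / sum(w0)
    return mean1 - mean0


if __name__ == "__main__":
    # Hypothetical data: four subjects with equal propensity scores.
    print(ipw_effect([3, 5, 1, 2], [1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5]))
```

When all propensity scores are equal, the estimate reduces to the plain difference in group means, which is a quick sanity check on the weights.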

A Joint Analysis for Field Goal Attempts and Percentages of Professional Basketball Players: Bayesian Nonparametric Resource.

Authors

Eliot Wong-Toi,Hou-Cheng Yang,Weining Shen,Guanyu Hu

Journal

Journal of Data Science

Published Date

2023/1/1

Understanding shooting patterns among different players is a fundamental problem in basketball game analyses. In this paper, we quantify the shooting pattern via the field goal attempts and percentages over twelve non-overlapping regions around the front court. A joint Bayesian nonparametric mixture model is developed to find latent clusters of players based on their shooting patterns. We apply our proposed model to learn the heterogeneity among selected players from the National Basketball Association (NBA) games over the 2018-2019 regular season and 2019-2020 bubble season. Thirteen clusters are identified for 2018-2019 regular season and seven clusters are identified for 2019-2020 bubble season. We further examine the shooting patterns of players in these clusters and discuss their relation to players' other available information. The results shed new insights on the effect of NBA COVID bubble and …

Covariate dependent Beta-GOS process

Authors

Kunzhi Chen,Weining Shen,Weixuan Zhu

Journal

Computational Statistics & Data Analysis

Published Date

2023/4/1

Covariate-dependent processes have been widely used in Bayesian nonparametric statistics thanks to their flexibility to incorporate covariate information and correlation among process realizations. Unlike most of the existing work that focuses on extensions of exchangeable species sampling processes such as the Dirichlet process, a new class of covariate-dependent nonexchangeable priors is proposed by considering the generalization of a nonexchangeable sequence, namely the Beta-GOS model. The proposed prior has an equivalent formulation under a continuous kernel mixture. It also has a latent variable representation that leads to a natural nonexchangeable parallel with the classical dependent Dirichlet process formulation. This prior is further applied in regression and autoregressive models and it is shown that its posterior sampling algorithm enjoys the same computational complexity as that of the …

Spatiotemporal Besov priors for Bayesian inverse problems

Authors

Shiwei Lan,Mirjeta Pasha,Shuyi Li

Journal

arXiv preprint arXiv:2306.16378

Published Date

2023/6/28

Fast development in science and technology has driven the need for proper statistical tools to capture special data features such as abrupt changes or sharp contrast. Many applications in data science seek spatiotemporal reconstruction from a sequence of time-dependent objects with discontinuity or singularity, e.g. dynamic computerized tomography (CT) images with edges. Traditional methods based on Gaussian processes (GP) may not provide satisfactory solutions since they tend to offer over-smooth prior candidates. Recently, the Besov process (BP) defined by wavelet expansions with random coefficients has been proposed as a more appropriate prior for this type of Bayesian inverse problem. While BP outperforms GP in imaging analysis to produce edge-preserving reconstructions, it does not automatically incorporate temporal correlation inherited in the dynamically changing images. In this paper, we generalize BP to the spatiotemporal domain (STBP) by replacing the random coefficients in the series expansion with stochastic time functions following a Q-exponential process which governs the temporal correlation strength. Mathematical and statistical properties of STBP are carefully studied. A white-noise representation of STBP is also proposed to facilitate the point estimation through maximum a posteriori (MAP) and the uncertainty quantification (UQ) by posterior sampling. Two limited-angle CT reconstruction examples and a highly non-linear inverse problem involving the Navier-Stokes equation are used to demonstrate the advantage of the proposed STBP in preserving spatial features while accounting for temporal changes …

Covariance estimation for matrix-valued data

Authors

Yichi Zhang,Weining Shen,Dehan Kong

Journal

Journal of the American Statistical Association

Published Date

2023/10/2

Covariance estimation for matrix-valued data has received an increasing interest in applications. Unlike previous works that rely heavily on matrix normal distribution assumption and the requirement of fixed matrix size, we propose a class of distribution-free regularized covariance estimation methods for high-dimensional matrix data under a separability condition and a bandable covariance structure. Under these conditions, the original covariance matrix is decomposed into a Kronecker product of two bandable small covariance matrices representing the variability over row and column directions. We formulate a unified framework for estimating bandable covariance, and introduce an efficient algorithm based on rank one unconstrained Kronecker product approximation. The convergence rates of the proposed estimators are established, and the derived minimax lower bound shows our proposed estimator is rate …
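The separability condition and the bandable structure in this abstract can be illustrated with two small helpers. The sketch below builds a Kronecker product of a column covariance and a row covariance and applies a simple banding operator that zeroes entries far from the diagonal. It is an illustration of the two concepts only, not the paper's estimator or its rank-one approximation algorithm; the names `kron` and `band`, the toy matrices, and the ordering convention (column covariance first, matching column-stacked vectorization) are assumptions.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]


def band(m, k):
    """Keep entries within k positions of the diagonal and zero out the rest."""
    return [
        [v if abs(i - j) <= k else 0.0 for j, v in enumerate(row)]
        for i, row in enumerate(m)
    ]


if __name__ == "__main__":
    # Hypothetical small covariance factors for a 2 x 3 matrix-valued observation.
    sigma_row = [[1.0, 0.3], [0.3, 1.0]]
    sigma_col = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.5], [0.2, 0.5, 1.0]]
    # Banding with k = 1 removes the weak (0, 2) and (2, 0) entries.
    sigma_col_banded = band(sigma_col, 1)
    # Under separability, the full 6 x 6 covariance is a Kronecker product.
    full = kron(sigma_col_banded, sigma_row)
    for row in full:
        print(row)
```

The appeal of separability is parameter count: a 6 x 6 covariance is described by one 2 x 2 and one 3 x 3 factor instead of 21 free entries.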

Analysis of professional basketball field goal attempts via a Bayesian matrix clustering approach

Authors

Fan Yin,Guanyu Hu,Weining Shen

Journal

Journal of Computational and Graphical Statistics

Published Date

2023/1/2

We propose a Bayesian nonparametric matrix clustering approach to analyze the latent heterogeneity structure in the shot selection data collected from professional basketball players in the National Basketball Association (NBA). The proposed method adopts a mixture of finite mixtures framework and fully uses the spatial information via a mixture of matrix normal distribution representation. We propose an efficient Markov chain Monte Carlo algorithm for posterior sampling that allows simultaneous inference on both the number of clusters and the cluster configurations. We also establish large-sample convergence properties for the posterior distribution. The compelling empirical performance of the proposed method is demonstrated via simulation studies and an application to shot chart data from selected players in the NBA's 2017–2018 regular season. Supplementary materials for this article are available online.

Advanced Volleyball Stats for All Levels: Automatic Setting Tactic Detection and Classification with a Single Camera

Authors

Haotian Xia,Rhys Tracy,Yun Zhao,Yuqing Wang,Yuan-Fang Wang,Weining Shen

Published Date

2023/12/4

This paper presents PathFinder and PathFinderPlus, two novel end-to-end computer vision frameworks designed specifically for advanced setting strategy classification in volleyball matches from a single camera view. Our frameworks combine setting ball trajectory recognition with a novel set trajectory classifier to generate comprehensive and advanced statistical data. This approach offers a fresh perspective for in-game analysis and surpasses the current level of granularity in volleyball statistics. In comparison to existing methods used in our baseline PathFinder framework, our proposed ball trajectory detection methodology in PathFinderPlus exhibits superior performance for classifying setting tactics under various game conditions. This robustness is particularly advantageous in handling complex game situations and accommodating different camera angles. Additionally, our study introduces an innovative …

Bayesian methods in tensor analysis

Authors

Yiyao Shi,Weining Shen

Journal

arXiv preprint arXiv:2302.05978

Published Date

2023/2/12

Tensors, also known as multidimensional arrays, are useful data structures in machine learning and statistics. In recent years, Bayesian methods have emerged as a popular direction for analyzing tensor-valued data since they provide a convenient way to introduce sparsity into the model and conduct uncertainty quantification. In this article, we provide an overview of frequentist and Bayesian methods for solving tensor completion and regression problems, with a focus on Bayesian methods. We review common Bayesian tensor approaches including model formulation, prior assignment, posterior computation, and theoretical properties. We also discuss potential future directions in this field.

Normal Approximation for Bayesian Mixed Effects Binomial Regression Models

Authors

Brandon Berman,Wesley O Johnson,Weining Shen

Journal

Bayesian Analysis

Published Date

2023/6

The supplementary material contains technical details and proofs that the full conditionals for u and for T(u) are asymptotically normal, as well as details for the more complex model.

Simultaneous Denoising and Heterogeneity Learning for Time Series Data

Authors

Xiwen Jiang,Weining Shen

Journal

Statistics in Biosciences

Published Date

2023/8/24

Noisy time series data are often collected in biomedical applications, and it remains an important task to understand the data heterogeneity. We propose an approach that combines the strength of trend filtering and distance-based clustering to simultaneously perform temporal mean denoising and subject-level clustering. We discuss an iterative algorithm that efficiently computes the cluster structure and clusterwise mean trends. Simulation studies confirm the excellent numerical performance of our method. We further consider two data application examples including a U.S. lung cancer mortality study and a suicide rate study.

Structure recovery and trend estimation for dynamic network analysis

Authors

Xiwen Jiang,Selena Margarita Livas,Fan Yin,Sayantan Banerjee,Carter T Butts,Weining Shen

Journal

Stat

Published Date

2023/1

Low‐dimensional parametric models for network dynamics have been successful as inferentially efficient and interpretable tools for modelling network evolution but have difficulty in settings with strong time inhomogeneity (particularly when sharp variation in parameters is possible and covariates are limited). Here, we propose to address this problem via a novel family of block‐structured dynamic exponential‐family random graph models (ERGMs), where the time domain is divided into consecutive blocks and the network parameters are assumed to evolve smoothly within each block. In particular, we let the latent ERGM parameters follow a piecewise polynomial model with an unknown block structure (e.g., change points). We propose an iterative estimation procedure that involves estimating the block structure using trend filtering and fitting ERGMs for networks belonging to the same time block. We demonstrate …

Changes in beat-to-beat blood pressure and pulse rate variability following stroke

Authors

Arash Abiri,En-Fan Chou,Weining Shen,Mark J Fisher,Michelle Khine

Journal

Scientific Reports

Published Date

2023/11/7

Associations between cerebrovascular disease and impaired autonomic function and cerebrovascular reactivity have led to increased interest in variability of heart rate (HRV) and blood pressure (BPV) following stroke. In this study, beat-to-beat pulse rate variability (PRV) and BPV were measured in clinically stable stroke patients (6 ischemic, 2 hemorrhagic) at least one year after their last cerebrovascular event. Beat-to-beat blood pressure (BP) measurements were collected from subjects while resting in the sitting position for one hour. Compared with healthy controls, stroke patients exhibited significantly greater time-domain (standard deviation, coefficient of variation, average real variability) and normalized high-frequency BPV (all p < 0.05). Stroke patients also exhibited lower LF:HF ratios than control subjects (p = 0.003). No significant differences were observed in PRV between the two groups, suggesting …
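The three time-domain variability measures named in this abstract (standard deviation, coefficient of variation, average real variability) have simple textbook definitions, sketched below. The function name `bpv_metrics`, the use of the sample (n - 1) standard deviation, and the toy readings are assumptions for illustration; the abstract does not give the exact definitions used in the study.

```python
from statistics import mean, stdev


def bpv_metrics(bp):
    """Compute time-domain variability measures for a beat-to-beat series.

    bp: sequence of at least two blood pressure readings (e.g. in mmHg)
    """
    sd = stdev(bp)       # sample standard deviation (n - 1 denominator)
    cv = sd / mean(bp)   # coefficient of variation, unitless
    # Average real variability: mean absolute difference between successive beats.
    arv = mean(abs(b - a) for a, b in zip(bp, bp[1:]))
    return {"sd": sd, "cv": cv, "arv": arv}


if __name__ == "__main__":
    # Hypothetical systolic readings over four beats.
    print(bpv_metrics([120, 124, 118, 122]))
```

Unlike the standard deviation, average real variability depends on the order of the readings, so it is sensitive to beat-to-beat swings rather than to overall spread.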

Genomic alterations and tumor mutation burden in merkel cell carcinoma

Authors

Danielle Brazel,Priyanka Kumar,Hung Doan,Tianyu Pan,Weining Shen,Ling Gao,Justin T Moyers

Journal

JAMA Network Open

Published Date

2023/1/3

Importance: Merkel cell carcinoma (MCC) is a rare and highly aggressive cutaneous neuroendocrine carcinoma with increasing incidence. Cytotoxic chemotherapy and checkpoint inhibitors provide treatment options in the metastatic setting; however, there are no approved or standard of care targeted therapy treatment options. Objective: To identify actionable alterations annotated by the OncoKB database therapeutic evidence level in association with tumor mutation burden (TMB). Design, Setting, and Participants: This is a retrospective, cross-sectional study using data from the American Association for Cancer Research Genomics Evidence Neoplasia Information Exchange, a multicenter international cancer consortium database. Patients with MCC were enrolled in participating institutions between 2017 and 2022. Data from version 11.0 of the database were released in January 2022 and analyzed from April to June …

Random forest adjustment for approximate Bayesian computation

Authors

Jiefeng Bi,Weining Shen,Weixuan Zhu

Journal

Journal of Computational and Graphical Statistics

Published Date

2022/1/2

We propose a novel method for regression adjustment in approximate Bayesian computation to help improve the accuracy and computational efficiency of the posterior inference. The proposed method uses random forest regression to model the connection between summary statistics and the parameters of interest. Compared with existing approaches, the proposed method bypasses the need of preselection of summary statistics in the model, and is capable of capturing the potential nonlinear relationship between the parameters of interest and summary statistics. We also introduce a measure to quantify the importance of each summary statistic used in the model. We study the asymptotic properties of the proposed estimator and show that it has an excellent finite-sample numerical performance via two simulation examples and an application to a population genetic study.
