Kun Chen

University of Connecticut

H-index: 26

North America-United States

About Kun Chen

Kun Chen is a researcher at the University of Connecticut with an h-index of 26 overall and 24 since 2020. He specializes in statistics, machine learning, and health data science.

His recent articles include:

Tensor regression for incomplete observations with application to longitudinal studies

Variational Bayesian inference for bipartite mixed-membership stochastic block model with applications to collaborative filtering

Tree-guided rare feature selection and logic aggregation with electronic health records data

Transfer learning with large-scale quantile regression

Elucidating Molecular Networks Underpinning Heterogeneity in Parkinson's Disease Progression Across Clinical Manifestation Spectrum

A multilinear least-squares formulation for sparse tensor canonical correlation analysis

The impact of early life experiences and gut microbiota on neurobehavioral development among preterm infants: A longitudinal cohort study

Accuracy and generalizability of machine learning models for adolescent suicide prediction with longitudinal clinical records

Kun Chen Information

University: University of Connecticut

Position: Associate Professor

Citations (all): 3723

Citations (since 2020): 2785

Cited By: 1495

h-index (all): 26

h-index (since 2020): 24

i10-index (all): 44

i10-index (since 2020): 39

Kun Chen Skills & Research Interests

Statistics

Machine Learning

Health Data Science

Top articles of Kun Chen

Tensor regression for incomplete observations with application to longitudinal studies

Authors

Tianchen Xu, Kun Chen, Gen Li

Journal

The Annals of Applied Statistics

Published Date

2024/6

The supplementary material contains the extension of the proposed method to log-contrast models, the derivation of the model fitting algorithms, and additional numerical results for simulation and real data studies.

Variational Bayesian inference for bipartite mixed-membership stochastic block model with applications to collaborative filtering

Authors

Jie Liu, Zifeng Ye, Kun Chen, Panpan Zhang

Journal

Computational Statistics & Data Analysis

Published Date

2024/1/1

A network-based method applied to collaborative filtering in recommender systems is introduced in this paper. Specifically, a novel mixed-membership stochastic block model with a conjugate prior from the exponential family is proposed for bipartite networks. The analytical expression of the model is derived, and a variational Bayesian algorithm that is computationally feasible for approximating the intractable posterior distributions is presented. Extensive simulations show that the proposed model provides more accurate inference than competing methods in the presence of outliers. The proposed model is also applied to a MovieLens dataset for a real data application.

Tree-guided rare feature selection and logic aggregation with electronic health records data

Authors

Jianmin Chen, Robert H Aseltine, Fei Wang, Kun Chen

Journal

Journal of the American Statistical Association

Published Date

2024/2/22

Statistical learning with a large number of rare binary features is commonly encountered in analyzing electronic health records (EHR) data, especially in the modeling of disease onset with prior medical diagnoses and procedures. Dealing with the resulting highly sparse and large-scale binary feature matrix is notoriously challenging as conventional methods may suffer from a lack of power in testing and inconsistency in model fitting, while machine learning methods may suffer from the inability of producing interpretable results or clinically-meaningful risk factors. To improve EHR-based modeling and use the natural hierarchical structure of disease classification, we propose a tree-guided feature selection and logic aggregation approach for large-scale regression with rare binary features, in which dimension reduction is achieved through not only a sparsity pursuit but also an aggregation promoter with the logic …

Transfer learning with large-scale quantile regression

Authors

Jun Jin, Jun Yan, Robert H Aseltine, Kun Chen

Journal

Technometrics

Published Date

2024/2/7

Quantile regression is increasingly encountered in modern big data applications due to its robustness and flexibility. We consider the scenario of learning the conditional quantiles of a specific target population when the available data may go beyond the target and be supplemented from other sources that possibly share similarities with the target. A crucial question is how to properly distinguish and utilize useful information from other sources to improve the quantile estimation and inference at the target. We develop transfer learning methods for high-dimensional quantile regression by detecting informative sources whose models are similar to the target and utilizing them to improve the target model. We show that under reasonable conditions, the detection of the informative sources based on sample splitting is consistent. Compared to the naive estimator with only the target data, the transfer learning estimator …
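As background for the abstract above, the following is a minimal sketch of the check (pinball) loss that quantile regression minimizes. It is not the transfer learning method of the paper; the function names `pinball_loss` and `best_constant` and the toy data are illustrative assumptions.

```python
import numpy as np


def pinball_loss(y, pred, tau):
    """Average check (pinball) loss of predicting `pred` for outcomes `y` at level `tau`."""
    diff = np.asarray(y, dtype=float) - pred
    # Under-predictions are weighted by tau, over-predictions by (1 - tau).
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))


def best_constant(y, tau):
    """Search the observed values for the constant prediction with the smallest loss."""
    candidates = np.unique(y)
    losses = [pinball_loss(y, c, tau) for c in candidates]
    return float(candidates[int(np.argmin(losses))])


# Toy outcomes 1, 2, ..., 101 (hypothetical data, for illustration only).
y = np.arange(1, 102)
median_fit = best_constant(y, 0.5)
upper_fit = best_constant(y, 0.9)
```

On this toy sample the minimizers coincide with the sample median and the 0.9 sample quantile, which is why minimizing this loss over a regression function targets a conditional quantile.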

Elucidating Molecular Networks Underpinning Heterogeneity in Parkinson's Disease Progression Across Clinical Manifestation Spectrum

Authors

Manqi Zhou, Alison Ke, Xingbo Wang, Kun Chen, Fei Wang, Chang Su

Journal

medRxiv

Published Date

2024

In this study, we applied statistical and machine learning techniques to identify molecular mechanisms underlying the heterogeneity in individual Parkinson's Disease (PD) progression. Leveraging data from the Parkinson's Progression Markers Initiative (PPMI) cohort, we analyzed genetic and clinical data for patients with PD, focusing on traits including motor symptoms, non-motor symptoms, and biomarkers. Our method identified significant single-nucleotide polymorphisms (SNPs) associated with each PD trait, revealing key genetic factors and their impact on disease progression. Furthermore, through network medicine approaches, we delineated disease modules, uncovering unique gene clusters and their roles in PD pathology. The integration of pathway enrichment analysis further enhanced our understanding of the functional implications of these genetic variations, notably highlighting the significance of cellular stress response and protein aggregation pathways in PD. Overall, our findings offer a comprehensive view of the genetic landscape of PD progression, highlighting the potential of personalized medicine in managing this complex disease.

A multilinear least-squares formulation for sparse tensor canonical correlation analysis

Authors

Jun Yu, Zhaoming Kong, Kun Chen, Xin Zhang, Yong Chen, Lifang He

Journal

Transactions on Machine Learning Research

Published Date

2023/6/12

Tensor data are becoming important recently in various applications, e.g., image and video recognition, which pose new challenges for data modeling and analysis approaches, such as high-order relations of large complexity, varying data scale and gross noise. In this paper, we consider the problem of sparse canonical correlation analysis for arbitrary tensor data. Although several methods have been proposed for this task, there are still limitations hindering its practical applications. To this end, we present a general Sparse Tensor Canonical Correlation Analysis (gSTCCA) method from a multilinear least-squares perspective. Specifically, we formulate the problem as a constrained multilinear least-squares problem with tensor-structured sparsity regularization based on CANDECOMP/PARAFAC (CP) decomposition. Then we present a divide-and-conquer deflation approach to tackle the problem by successive rank-one tensor estimation of the residual tensors, where the overall model is broken up into a set of unconstrained linear least-squares problems that can be efficiently solved. Through extensive experiments conducted on five different datasets for recognition tasks, we demonstrate that the proposed method achieves promising performance compared to the SOTA vector- and tensor-based canonical correlation analysis methods in terms of classification accuracy, model sparsity, and robustness to missing and noisy data. The code is publicly available at https://github.com/junfish/gSTCCA.

The impact of early life experiences and gut microbiota on neurobehavioral development among preterm infants: A longitudinal cohort study

Authors

Jie Chen, Hongfei Li, Tingting Zhao, Kun Chen, Ming-Hui Chen, Zhe Sun, Wanli Xu, Kendra Maas, Barry M Lester, Xiaomei S Cong

Journal

Microorganisms

Published Date

2023/3/22

Objectives The objective of this study is to investigate the impact of early life experiences and gut microbiota on neurobehavioral development in preterm infants during neonatal intensive care unit (NICU) hospitalization. Methods Preterm infants were followed from NICU admission until their 28th postnatal day or until discharge. Daily stool samples, painful/stressful experiences, feeding patterns, and other clinical and demographic data were collected. Gut microbiota was profiled using 16S rRNA sequencing, and operational taxonomic units (OTUs) were selected to predict the neurobehaviors. The neurobehavioral development was assessed by the Neonatal Neurobehavioral Scale (NNNS) at 36 to 38 weeks of post-menstrual age (PMA). Fifty-five infants who had NNNS measurements were included in the sparse log-contrast regression analysis. Results Preterm infants who experienced a high level of pain/stress during the NICU hospitalization had higher NNNS stress/abstinence scores. Eight operational taxonomic units (OTUs) were identified to be associated with NNNS subscales after controlling demographic and clinical features, feeding patterns, and painful/stressful experiences. These OTUs and taxa belonging to seven genera, i.e., Enterobacteriaceae_unclassified, Escherichia-Shigella, Incertae_Sedis, Veillonella, Enterococcus, Clostridium_sensu_stricto_1, and Streptococcus with five belonging to Firmicutes and two belonging to Proteobacteria phylum. The enriched abundance of Enterobacteriaceae_unclassified (OTU17) and Streptococcus (OTU28) were consistently associated with less optimal neurobehavioral outcomes. The other …

Accuracy and generalizability of machine learning models for adolescent suicide prediction with longitudinal clinical records

Authors

Chengxi Zang, Yu Hou, Jun Jin, Shane Sacco, Kun Chen, Robert Aseltine, Fei Wang

Published Date

2023/6/16

Machine Learning (ML) models trained from real-world data (RWD) have demonstrated promise in predicting suicide attempts in adolescents. However, their cross-data performance and transportability for suicide prediction are largely unknown, hindering the clinical adoption of these ML models. We developed different ML suicide prediction models based on RWD collected in different contexts (inpatient, outpatient, etc.) with varying purposes (eg, administrative claims and electronic health records), compared their cross-data performance, and evaluated their transportability. The data used was from the All-Payer Claims Database (APCD) and Hospital Inpatient Discharge Database (HIDD) in Connecticut as well as Electronic Health Records (EHR) data provided by Kansas Health Information Network (KHIN). From three datasets collected in different settings (inpatient, outpatient, etc.) and purposes (eg, administrative claims and electronic health records), we included 285,320 patients among whom we identified 3389 (1.2%) suicide attempters. 66.0% of the suicide attempters were female. Different machine learning models (regularized logistic regression, gradient boosting machine, long short-term memory neural network) were evaluated on the local datasets and transported datasets. Significant and comparable decreases in the transfer performance compared to the local performance of all three ML models were observed, with the decline in performance reducing average AUC by up to 7.7%, reducing average sensitivity by up to 16%, and PPV by up to 2% at the 90% specificity level, and reducing sensitivity by up to 20% and PPV by up to 5% at …

Regularized Fingerprinting in Detection and Attribution of Climate Change with Weight Matrix Optimizing the Efficiency in Scaling Factor Estimation

Authors

Yan Li, Kun Chen, Jun Yan, Xuebin Zhang

Journal

Annals of Applied Statistics

Published Date

2023

A) Sufficiency to Assume Orthogonal Covariates; B) Justification of Method MV in the GLS Case; C) Justification of Method MV in the GTLS Case; D) Detailed Results on Simulation Studies; E) Details of the CMIP5 climate models for the control runs.

It's all relative: Regression analysis with compositional predictors

Authors

Gen Li, Yan Li, Kun Chen

Journal

Biometrics

Published Date

2022

Compositional data reside in a simplex and measure fractions or proportions of parts to a whole. Most existing regression methods for such data rely on log‐ratio transformations that are inadequate or inappropriate in modeling high‐dimensional data with excessive zeros and hierarchical structures. Moreover, such models usually lack a straightforward interpretation due to the interrelation between parts of a composition. We develop a novel relative‐shift regression framework that directly uses proportions as predictors. The new framework provides a paradigm shift for regression analysis with compositional predictors and offers a superior interpretation of how shifting concentration between parts affects the response. New equi‐sparsity and tree‐guided regularization methods and an efficient smoothing proximal gradient algorithm are developed to facilitate feature aggregation and dimension reduction in regression …

The preterm human milk microbiota fluctuates by postpartum week and is characterized by gestational age and maternal BMI

Authors

Evgenia Jen Filatava, Zhongmao Liu, Jiaojiao Xie, Dong-Binh Tran, Kun Chen, Noura El Habbal, George Weinstock, Yanjiao Zhou, Katherine E Gregory

Journal

mBio

Published Date

2023/12/19

Evidence suggests that the type of nutrition fed to preterm infants influences their intestinal microbiome and immunity. However, few studies have conducted a longitudinal analysis of the microbiota of mother’s own milk (MOM) and the factors shaping its composition. Furthermore, the microbiota of pasteurized donor human milk (PDHM) and infant formula have not been extensively investigated. Here, we examine the microbiota of 238 MOM, 30 PDHM, and 73 formula samples from a cohort of 72 preterm infants using 16S rRNA sequencing. We find differences in the microbial diversity and composition between nutrition types. Additionally, we demonstrate that the microbiota of MOM exhibits temporal fluctuations and has associations with several maternal factors. Lastly, we identify three microbiota community clusters within MOM—termed lactotypes—that have distinct taxonomic compositions and maternal factors …

Survival modeling of suicide risk with rare and uncertain diagnoses

Authors

Wenjie Wang, Chongliang Luo, Robert H Aseltine, Fei Wang, Jun Yan, Kun Chen

Journal

Statistics in Biosciences

Published Date

2023/6/7

Motivated by the pressing need for suicide prevention through improving behavioral healthcare, we use medical claims data to study the risk of subsequent suicide attempts (SA) for patients who were hospitalized due to suicide attempts and later discharged. Understanding the risk behaviors of such patients at elevated suicide risk is an important step towards the goal of “Zero Suicide”. An immediate and unconventional challenge is that the identification of SA from medical claims contains substantial uncertainty: almost 20% of “suspected” SA are identified from diagnosis codes indicating external causes of injury and poisoning with undetermined intent. It is thus of great interest to learn which of these undetermined events are more likely actual SA and how to properly utilize them in survival analysis with severe censoring. To tackle these interrelated problems, we develop an integrative Cox cure model with …

Discussion of ‘Statistical inference for streamed longitudinal data’

Authors

Peter XK Song, Ling Zhou

Journal

Biometrika

Published Date

2023/12/1

We congratulate the authors for their contribution (Luo et al., 2023) to the development of a timely streaming data analytic. Our discussion focuses on the robustness of their proposed inference against potential outliers or contaminated data cases that are pervasive in practice. Arguably, the issue of data quality is a legitimate concern, which is particularly relevant to streaming data that may arrive voluminously and perpetually. Assuming the absence of abnormal data cases, as considered in Luo et al. (2023), is unrealistic and impractical. Thus, in this commentary we attempt a discussion of the robustness through simulation experiments for the proposed statistical inference, an important methodology aspect that has been ignored by the authors in their paper.

Target-based fusion using social determinants of health to enhance suicide prediction with electronic health records

Authors

Shane J Sacco, Kun Chen, Fei Wang, Robert Aseltine

Journal

PLOS ONE

Published Date

2023/4/26

Objective Preventing suicide in US youth is of paramount concern, with rates increasing over 50% between 2007 and 2018. Statistical modeling using electronic health records may help identify at-risk youth before a suicide attempt. While electronic health records contain diagnostic information, which are known risk factors, they generally lack or poorly document social determinants (e.g., social support), which are also known risk factors. If statistical models are built incorporating not only diagnostic records, but also social determinants measures, additional at-risk youth may be identified before a suicide attempt. Methods Suicide attempts were predicted in hospitalized patients, ages 10–24, from the State of Connecticut’s Hospital Inpatient Discharge Database (HIDD; N = 38943). Predictors included demographic information, diagnosis codes, and using a data fusion framework, social determinants features transferred or fused from an external source of survey data, The National Longitudinal Study of Adolescent to Adult Health (Add Health). Social determinant information for each HIDD patient was generated by averaging values from their most similar Add Health individuals (e.g., top 10), based upon matching shared features between datasets (e.g., Pearson’s r). Attempts were then modelled using an elastic net logistic regression with both HIDD features and fused Add Health features. Results The model including fused social determinants outperformed the conventional model (AUC = 0.83 v. 0.82). Sensitivity and positive predictive values at 90 and 95% specificity were almost 10% higher when including fused features (e.g., sensitivity at 90 …
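The fusion step described in the Methods above (averaging values from the most similar external individuals, matched on shared features by Pearson's r) might be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function name `fuse_features`, its arguments, and the array layout are assumptions, and every row is assumed to have non-constant shared features so the correlation is defined.

```python
import numpy as np


def fuse_features(target_shared, source_shared, source_extra, k=10):
    """Average `source_extra` over the k source rows most correlated with each target row.

    target_shared: (n_target, p) features available in both datasets
    source_shared: (n_source, p) the same features in the external dataset
    source_extra:  (n_source, q) features available only in the external dataset
    """

    def standardize(rows):
        # Center and scale each row so that a dot product equals Pearson's r.
        centered = rows - rows.mean(axis=1, keepdims=True)
        return centered / np.linalg.norm(centered, axis=1, keepdims=True)

    r = standardize(target_shared) @ standardize(source_shared).T  # (n_target, n_source)
    top_k = np.argsort(-r, axis=1)[:, :k]  # indices of the k largest correlations per row
    return source_extra[top_k].mean(axis=1)  # (n_target, q)
```

The fused columns returned here would then be appended to the target dataset's own features before fitting a model such as the elastic net logistic regression mentioned in the abstract.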

High-throughput target trial emulation for Alzheimer’s disease drug repurposing with real-world data

Authors

Chengxi Zang, Hao Zhang, Jie Xu, Hansi Zhang, Sajjad Fouladvand, Shreyas Havaldar, Feixiong Cheng, Kun Chen, Yong Chen, Benjamin S Glicksberg, Jin Chen, Jiang Bian, Fei Wang

Journal

Nature Communications

Published Date

2023/12/11

Target trial emulation is the process of mimicking target randomized trials using real-world data, where effective confounding control for unbiased treatment effect estimation remains a main challenge. Although various approaches have been proposed for this challenge, a systematic evaluation is still lacking. Here we emulated trials for thousands of medications from two large-scale real-world data warehouses, covering over 10 years of clinical records for over 170 million patients, aiming to identify new indications of approved drugs for Alzheimer’s disease. We assessed different propensity score models under the inverse probability of treatment weighting framework and suggested a model selection strategy for improved baseline covariate balancing. We also found that the deep learning-based propensity score model did not necessarily outperform logistic regression-based methods in covariate balancing. Finally …

Associations between natural language processing–enriched social determinants of health and suicide death among US veterans

Authors

Avijit Mitra, Richeek Pradhan, Rachel D Melamed, Kun Chen, David C Hoaglin, Katherine L Tucker, Joel I Reisman, Zhichao Yang, Weisong Liu, Jack Tsai, Hong Yu

Journal

JAMA Network Open

Published Date

2023/3/1

Importance Social determinants of health (SDOHs) are known to be associated with increased risk of suicidal behaviors, but few studies use SDOHs from unstructured electronic health record notes. Objective To investigate associations between veterans’ death by suicide and recent SDOHs, identified using structured and unstructured data. Design, Setting, and Participants This nested case-control study included veterans who received care under the US Veterans Health Administration from October 1, 2010, to September 30, 2015. A natural language processing (NLP) system was developed to extract SDOHs from unstructured clinical notes. Structured data yielded 6 SDOHs (ie, social or familial problems, employment or financial problems, housing instability, legal problems, violence, and nonspecific psychosocial needs), NLP on unstructured data yielded 8 SDOHs (social isolation, job or financial insecurity, housing …

Intentional self-harm among US veterans with traumatic brain injury or posttraumatic stress disorder: retrospective cohort study from 2008 to 2017

Authors

Bhanu Pratap Singh Rawat, Joel Reisman, Terri K Pogoda, Weisong Liu, Subendhu Rongali, Robert H Aseltine Jr, Kun Chen, Jack Tsai, Dan Berlowitz, Hong Yu, Kathleen F Carlson

Journal

JMIR Public Health and Surveillance

Published Date

2023/7/24

Background Veterans with a history of traumatic brain injury (TBI) and/or posttraumatic stress disorder (PTSD) may be at increased risk of suicide attempts and other forms of intentional self-harm as compared to veterans without TBI or PTSD. Objective Using administrative data from the US Veterans Health Administration (VHA), we studied associations between TBI and PTSD diagnoses, and subsequent diagnoses of intentional self-harm among US veterans who used VHA health care between 2008 and 2017. Methods All veterans with encounters or hospitalizations for intentional self-harm were assigned “index dates” corresponding to the date of the first related visit; among those without intentional self-harm, we randomly selected a date from among the veteran’s health care encounters to match the distribution of case index dates over the 10-year period. We then examined the prevalence of TBI and PTSD diagnoses within the 5-year period prior to veterans’ index dates. TBI, PTSD, and intentional self-harm were identified using International Classification of Diseases diagnosis and external cause of injury codes from inpatient and outpatient VHA encounters. We stratified analyses by veterans’ average yearly VHA utilization in the 5-year period before their index date (low, medium, or high). Variations in prevalence and odds of intentional self-harm diagnoses were compared by veterans’ prior TBI and PTSD diagnosis status (TBI only, PTSD only, and comorbid TBI/PTSD) for each VHA utilization stratum. Multivariable models adjusted for age, sex, race, ethnicity, marital status, Department of …

Multivariate functional regression via nested reduced-rank regularization

Authors

Xiaokang Liu, Shujie Ma, Kun Chen

Journal

Journal of Computational and Graphical Statistics

Published Date

2022

We propose a nested reduced-rank regression (NRRR) approach in fitting a regression model with multivariate functional responses and predictors to achieve tailored dimension reduction and facilitate model interpretation and visualization. Our approach is based on a two-level low-rank structure imposed on the functional regression surfaces. A global low-rank structure identifies a small set of latent principal functional responses and predictors that drives the underlying regression association. A local low-rank structure then controls the complexity and smoothness of the association between the principal functional responses and predictors. The functional problem boils down to an integrated matrix approximation task through basis expansion, where the blocks of an integrated low-rank matrix share some common row space and/or column space. This nested reduced-rank structure also finds potential applications …

Excitation wavelength-dependent photoluminescence decay of single quantum dots near plasmonic gold nanoparticles

Authors

Yonglei Sun, Yongchen Wang, Hua Zhu, Na Jin, Adnan Mohammad, Necmi Biyikli, Ou Chen, Kun Chen, Jing Zhao

Journal

The Journal of Chemical Physics

Published Date

2022/4/21

Changing the excitation wavelength is a simple but effective strategy to modulate the photophysical characteristics of colloidal quantum dots (QDs) near plasmonic nanostructures. It has been observed that the photoluminescence (PL) decay of QDs near plasmonic nanostructures differs when the excitation wavelength is varied, but the exact mechanism is still unclear today. Here, we studied the excitation wavelength dependence of the PL decay of CdSe/CdS core/shell QDs near plasmonic gold nanoparticles at the single QD level. With the aid of statistical science, we demonstrated that the PL decay of a single QD near gold nanoparticles is generally faster when the QD is excited spectrally close to the localized surface plasmon resonance of gold nanoparticles. This excitation wavelength dependence is mainly caused by the varied proportion of photons coming from biexciton emission, which is the result of …

Multivariate Linear Regression

Authors

Gregory C Reinsel, Raja P Velu, Kun Chen

Published Date

2022

Regression methods are perhaps the most widely used statistical tools in data analysis. When several response variables are studied simultaneously, we are in the sphere of multivariate regression. The usual description of the multivariate regression model, which relates the set of m multiple responses to a set of n predictor variables, assumes implicitly that the m × n regression coefficient matrix is of full rank. It can then be demonstrated that the simultaneous estimation of the elements of the coefficient matrix, by least squares or maximum likelihood estimation methods, yields the same results as a set of m multiple regressions, where each of the m individual response variables is regressed separately on the predictor variables.
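The equivalence stated in the last sentence above can be checked numerically. The sketch below is illustrative only: the sizes, the simulated data, and the variable names are assumptions, no intercept is included, and the coefficient matrix is stored as n × m (the transpose of the m × n layout in the text).

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, m = 50, 3, 2  # observations, predictors, responses (illustrative sizes)

# Simulated data (hypothetical): Y = X B + noise.
X = rng.normal(size=(N, n))
B_true = rng.normal(size=(n, m))
Y = X @ B_true + 0.1 * rng.normal(size=(N, m))

# Joint least-squares fit of all m responses at once.
B_joint, *_ = np.linalg.lstsq(X, Y, rcond=None)

# m separate regressions, one response column at a time.
B_separate = np.column_stack(
    [np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(m)]
)

# Largest entrywise gap between the two estimates (zero up to rounding error).
max_abs_diff = float(np.max(np.abs(B_joint - B_separate)))
```

The two estimates agree because the full-rank model places no constraint linking the columns of the coefficient matrix, so the least-squares criterion separates into m independent problems. Constraints such as reduced rank remove that separability.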

Kun Chen FAQs

What is Kun Chen's h-index at University of Connecticut?

Kun Chen's h-index is 26 overall and 24 since 2020.

What are Kun Chen's top articles?

Kun Chen's top articles at the University of Connecticut include:

Tensor regression for incomplete observations with application to longitudinal studies

Variational Bayesian inference for bipartite mixed-membership stochastic block model with applications to collaborative filtering

Tree-guided rare feature selection and logic aggregation with electronic health records data

Transfer learning with large-scale quantile regression

Elucidating Molecular Networks Underpinning Heterogeneity in Parkinson's Disease Progression Across Clinical Manifestation Spectrum

A multilinear least-squares formulation for sparse tensor canonical correlation analysis

The impact of early life experiences and gut microbiota on neurobehavioral development among preterm infants: A longitudinal cohort study

Accuracy and generalizability of machine learning models for adolescent suicide prediction with longitudinal clinical records

...

What are Kun Chen's research interests?

Kun Chen's research interests are statistics, machine learning, and health data science.

What is Kun Chen's total number of citations?

Kun Chen has 3,723 citations in total.

Who are the co-authors of Kun Chen?

The co-authors of Kun Chen include Sally Wenzel; Eric A. Hoffman; Fei Wang, FAMIA, FIAHSI, FACMI; Ji Zhu; and Kung-Sik Chan.

    Co-Authors

    H-index: 137
    Sally Wenzel
    University of Pittsburgh

    H-index: 115
    Eric A. Hoffman
    University of Iowa

    H-index: 80
    Fei Wang, FAMIA, FIAHSI, FACMI
    Cornell University

    H-index: 69
    Ji Zhu
    University of Michigan-Dearborn

    H-index: 56
    Kung-Sik Chan
    University of Iowa
