Building an Enhanced Publication Ecosystem for Statistical Innovation

Statistics in Biosciences

Published On 2024/3/16

Statistical research functions much like a supply chain, beginning with the identification of demands, such as open problems in science and technology, and the gathering or generation of data (e.g., through surveys, clinical trials, or scientific experiments). These demands and data fuel the creation of innovative analytical solutions, encompassing fresh theories and methodologies for study design and data analysis, and, increasingly, the development of software tools to disseminate these solutions to end-users. To ensure proper and effective use of new methods and tools, it is crucial to evaluate them systematically, enabling users to understand their properties, capabilities, and limitations. Ultimately, their value must be tested and demonstrated by solving real-world problems. Thus, demands, data, theory, methods, software, as well as their evaluation, validation, and application, all serve as vital components …

Journal

Statistics in Biosciences

Page

1-4

Authors

Hongkai Ji

Johns Hopkins University

H-Index

47

Research Interests

Computational Biology

Statistics

Genomics

Big Data

Gene Regulation

Other Articles from authors

Hongkai Ji

Johns Hopkins University

Journal of Allergy and Clinical Immunology

Metabolomic Profiles during Early Childhood and Risk of Food Allergies and Asthma in Multi-ethnic Children from a Prospective Birth Cohort

Background: There are increasing numbers of metabolomic studies in food allergy (FA) and asthma; however, these studies are predominantly limited by cross-sectional designs, small sample sizes, and a focus on European populations. Objective: We sought to identify metabolites unique to and shared by children with FA and/or asthma in a racially diverse prospective birth cohort, the Boston Birth Cohort. Methods: Mass spectrometry–based untargeted metabolomic profiling was performed using venous plasma collected in early childhood (n = 811). FA was diagnosed according to clinical symptoms consistent with an acute hypersensitivity reaction at food ingestion and food-specific IgE > 0.35 kU/L. Asthma was defined on the basis of physician diagnosis. Generalized estimating equations were applied to analyze metabolomic associations with FA and asthma, adjusting for potential confounders. Results: During a …

Hongkai Ji

Johns Hopkins University

The Journal of Infectious Diseases

Emergency myelopoiesis distinguishes multisystem inflammatory syndrome in children from pediatric severe COVID-19

Background: Multisystem inflammatory syndrome in children (MIS-C) is a hyperinflammatory condition caused by recent infection with severe acute respiratory syndrome coronavirus 2, but the underlying immunological mechanisms driving this distinct syndrome are unknown. Methods: We utilized high-dimensional flow cytometry, cell-free (cf) DNA, and cytokine and chemokine profiling to identify mechanisms of critical illness distinguishing MIS-C from severe acute coronavirus disease 2019 (SAC). Results: Compared to SAC, MIS-C patients demonstrated profound innate immune cell death and features of emergency myelopoiesis (EM), an understudied phenomenon observed in severe inflammation. EM signatures were characterized by fewer mature myeloid cells in the periphery and decreased expression of HLA-DR and CD86 on antigen-presenting cells …

Hongkai Ji

Johns Hopkins University

Nature Immunology

NF-κB subunits direct kinetically distinct transcriptional cascades in antigen receptor-activated B cells

The nuclear factor kappa B (NF-κB) family of transcription factors orchestrates signal-induced gene expression in diverse cell types. Cellular responses to NF-κB activation are regulated at the level of cell and signal specificity, as well as differential use of family members (subunit specificity). Here we used time-dependent multi-omics to investigate the selective functions of Rel and RelA, two closely related NF-κB proteins, in primary B lymphocytes activated via the B cell receptor. Despite large numbers of shared binding sites genome wide, Rel and RelA directed kinetically distinct cascades of gene expression in activated B cells. Single-cell RNA sequencing revealed marked heterogeneity of Rel- and RelA-specific responses, and sequential binding of these factors was not a major mechanism of protracted transcription. Moreover, nuclear co-expression of Rel and RelA led to functional antagonism between the …

Hongkai Ji

Johns Hopkins University

iScience

High-throughput screening for myelination promoting compounds using human stem cell-derived oligodendrocyte progenitor cells

Promoting the myelination capacity of endogenous oligodendrocyte precursor cells (OPCs) is a promising therapeutic approach for CNS demyelinating disorders such as Multiple Sclerosis (MS). To aid in the discovery of myelination-promoting compounds, we generated a genome-engineered human pluripotent stem cell (hPSC) line that consists of three reporters: identification-and-purification tag, GFP, and secreted-NanoLuc, driven by the endogenous PDGFRA, PLP1, and MBP genes, respectively. Using this cell line, we established a high-throughput drug screening platform and performed a small-molecule screen, which identified at least two myelination-promoting small molecules (Ro1138452 and SR2211) that target the prostacyclin (IP) receptor and retinoic acid receptor-related orphan receptor γ (RORγ), respectively. Single-cell transcriptomic analysis of differentiating OPCs treated with these molecules further …

Hongkai Ji

Johns Hopkins University

The Journal of Clinical Investigation

Cell-free DNA reveals distinct pathology of multisystem inflammatory syndrome in children

Multisystem inflammatory syndrome in children (MIS-C) is a rare but life-threatening hyperinflammatory condition induced by infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) that causes pediatric COVID-19 (pCOVID-19). The relationship of the systemic tissue injury to the pathophysiology of MIS-C is poorly defined. We leveraged the high sensitivity of epigenomics analyses of plasma cell-free DNA (cfDNA) and plasma cytokine measurements to identify the spectrum of tissue injury and glean mechanistic insights. Compared with pediatric healthy controls (pHCs) and patients with pCOVID-19, patients with MIS-C had higher levels of cfDNA primarily derived from innate immune cells, megakaryocyte-erythroid precursor cells, and nonhematopoietic tissues such as hepatocytes, cardiac myocytes, and kidney cells. Nonhematopoietic tissue cfDNA levels demonstrated significant …

Hongkai Ji

Johns Hopkins University

Genes

Reconstruction of Single-Cell Trajectories Using Stochastic Tree Search

The recent advancement in single-cell RNA sequencing technologies enables the understanding of dynamic cellular processes at the single-cell level. Using trajectory inference methods, pseudotimes can be estimated based on reconstructed single-cell trajectories which can be further used to gain biological knowledge. Existing methods for modeling cell trajectories, such as minimal spanning tree or k-nearest neighbor graph, often lead to locally optimal solutions. In this paper, we propose a penalized likelihood-based framework and introduce a stochastic tree search (STS) algorithm aiming at the global solution in a large and non-convex tree space. Both simulated and real data experiments show that our approach is more accurate and robust than other existing methods in terms of cell ordering and pseudotime estimation.

Hongkai Ji

Johns Hopkins University

The Journal of Nutrition

Contrasting association of maternal plasma biomarkers of smoking and one-carbon micronutrients with offspring DNA methylation: Evidence of AHRR gene-smoking-folate interaction

Background: Maternal prenatal smoking is known to alter offspring DNA methylation (DNAm). However, there are no effective interventions to mitigate smoking-induced DNAm alteration. Objectives: This study investigated whether 1-carbon nutrients (folate, vitamins B6, and B12) can protect against prenatal smoking-induced offspring DNAm alterations in the aryl hydrocarbon receptor repressor (AHRR) (cg05575921), GFI1 (cg09935388), and CYP1A1 (cg05549655) genes. Methods: This study included mother-newborn dyads from a racially diverse US birth cohort. Cord blood DNAm values at the above 3 sites were derived from a previous study using the Illumina Infinium MethylationEPIC BeadChip. Maternal smoking was assessed by self-report and plasma biomarkers (hydroxycotinine and cotinine). Maternal plasma folate, vitamin B6, and vitamin B12 concentrations were obtained shortly after delivery. Linear regressions …

Hongkai Ji

Johns Hopkins University

Nature Communications

Single-cell allele-specific expression analysis reveals dynamic and cell-type-specific regulatory effects

Differential allele-specific expression (ASE) is a powerful tool to study context-specific cis-regulation of gene expression. Such effects can reflect the interaction between genetic or epigenetic factors and a measured context or condition. Single-cell RNA sequencing (scRNA-seq) allows the measurement of ASE at individual-cell resolution, but there is a lack of statistical methods to analyze such data. We present Differential Allelic Expression using Single-Cell data (DAESC), a powerful method for differential ASE analysis using scRNA-seq from multiple individuals, with statistical behavior confirmed through simulation. DAESC accounts for non-independence between cells from the same individual and incorporates implicit haplotype phasing. Application to data from 105 induced pluripotent stem cell (iPSC) lines identifies 657 genes dynamically regulated during endoderm differentiation, with enrichment for changes …

Hongkai Ji

Johns Hopkins University

Nature Communications

Probabilistic embedding, clustering, and alignment for integrating spatial transcriptomics data with PRECAST

Spatially resolved transcriptomics involves a set of emerging technologies that enable the transcriptomic profiling of tissues with the physical location of expressions. Although a variety of methods have been developed for data integration, most of them are for single-cell RNA-seq datasets without consideration of spatial information. Thus, methods that can integrate spatial transcriptomics data from multiple tissue slides, possibly from multiple individuals, are needed. Here, we present PRECAST, a data integration method for multiple spatial transcriptomics datasets with complex batch effects and/or biological effects between slides. PRECAST unifies spatial factor analysis simultaneously with spatial clustering and embedding alignment, while requiring only partially shared cell/domain clusters across datasets. Using both simulated and four real datasets, we show improved cell/domain detection with outstanding …

Hongkai Ji

Johns Hopkins University

Nature Medicine

Neoadjuvant enoblituzumab in localized prostate cancer: a single-arm, phase 2 trial

B7 homolog 3 (B7-H3; CD276), a tumor-associated antigen and possible immune checkpoint, is highly expressed in prostate cancer (PCa) and is associated with early recurrence and metastasis. Enoblituzumab is a humanized, Fc-engineered, B7-H3-targeting antibody that mediates antibody-dependent cellular cytotoxicity. In this phase 2, biomarker-rich neoadjuvant trial, 32 biological males with operable intermediate to high-risk localized PCa were enrolled to evaluate the safety, anti-tumor activity and immunogenicity of enoblituzumab when given before prostatectomy. The coprimary outcomes were safety and undetectable prostate-specific antigen (PSA) level (PSA0) 1 year postprostatectomy, and the aim was to obtain an estimate of PSA0 with reasonable precision. The primary safety endpoint was met with no notable unexpected surgical or medical complications, or surgical delay. Overall, 12% of patients …

Hongkai Ji

Johns Hopkins University

Molecular Neurobiology

Identification of Regulatory Elements in Primary Sensory Neurons Involved in Trauma-Induced Neuropathic Pain

Chronic pain is a significant public health issue that is often refractory to existing therapies. Here we use a multiomic approach to identify cis-regulatory elements that show differential chromatin accessibility and reveal transcription factor (TF) binding motifs with functional regulation in the rat dorsal root ganglion (DRG), which contain cell bodies of primary sensory neurons, after nerve injury. We integrated RNA-seq to understand how differential chromatin accessibility after nerve injury may influence gene expression. Using TF protein arrays and chromatin immunoprecipitation-qPCR, we confirmed C/EBPγ binding to a differentially accessible sequence and used RNA-seq to identify processes in which C/EBPγ plays an important role. Our findings offer insights into TF motifs that are associated with chronic pain. These data show how interactions between chromatin landscapes and TF expression patterns may work …

Hongkai Ji

Johns Hopkins University

Oncogene

ERK hyperactivation serves as a unified mechanism of escape in intrinsic and acquired CDK4/6 inhibitor resistance in acral lentiginous melanoma

Patients with metastatic acral lentiginous melanoma (ALM) suffer worse outcomes than patients with other forms of cutaneous melanoma (CM) and benefit less from approved melanoma therapies. Identification of cyclin-dependent kinase 4 and 6 (CDK4/6) pathway gene alterations in >60% of ALMs has led to clinical trials of the CDK4/6 inhibitor (CDK4i/6i) palbociclib for ALM; however, median progression-free survival with CDK4i/6i treatment was only 2.2 months, suggesting the existence of resistance mechanisms. Therapy resistance in ALM remains poorly understood; here we report that hyperactivation of MAPK signaling and elevated cyclin D1 expression serve as a mechanism of intrinsic early/adaptive CDK4i/6i resistance. ALM cells that have acquired CDK4i/6i resistance following chronic treatment exposure also exhibit hyperactivation of the MAPK pathway. MEK and/or ERK inhibition increases CDK4i …

Hongkai Ji

Johns Hopkins University

bioRxiv

Distinct Myeloid Derived Suppressor Cell Populations Promote Tumor Aggression in Glioblastoma

The diversity of genetic programs and cellular plasticity of glioma-associated myeloid cells, and thus their contribution to tumor growth and immune evasion, is poorly understood. We performed single-cell RNA sequencing of immune and tumor cells from 33 glioma patients of varying tumor grades. We identified two populations characteristic of myeloid-derived suppressor cells (MDSC), unique to glioblastoma (GBM) and absent in grade II and III tumors: i) an early progenitor population (E-MDSC) characterized by strong upregulation of multiple catabolic, anabolic, oxidative stress, and hypoxia pathways typically observed within tumor cells themselves, and ii) a monocytic MDSC (M-MDSC) population. The E-MDSCs geographically co-localize with a subset of highly metabolic glioma stem-like tumor cells with a mesenchymal program in the pseudopalisading region, a pathognomonic feature of GBMs associated with poor prognosis. Ligand-receptor interaction analysis revealed symbiotic cross-talk between the stem-like tumor cells and E-MDSCs in GBM, whereby glioma stem cells produce chemokines attracting E-MDSCs, which in turn produce growth and survival factors for the tumor cells. Our large-scale single-cell analysis elucidated unique MDSC populations as key facilitators of GBM progression and mediators of tumor immunosuppression, suggesting that targeting these specific myeloid compartments, including their metabolic programs, may be a promising therapeutic intervention in this deadly cancer. One-Sentence Summary: Aggressive glioblastoma harbors two unique myeloid populations capable of promoting stem-like properties of …

Hongkai Ji

Johns Hopkins University

Frontiers in Immunology

Immune activation and exhaustion marker expression on T-cell subsets in ART-treated adolescents and young adults with perinatal HIV-1 infection as correlates of viral persistence

HIV-1 infection in memory CD4+ T cells forms a latent reservoir that is a barrier to cure. Identification of immune biomarkers that correlate with HIV-1 reservoir size may aid in evaluating the efficacy of HIV-1 eradication strategies toward ART-free remission and cure. In adults living with non-perinatal HIV-1, the immune exhaustion marker PD-1 on central memory CD4+ T cells (Tcm) correlates with measures of HIV-1 reservoir size. Immune correlates of HIV-1 are less defined in adolescents and young adults with perinatal HIV-1. With multi-parameter flow cytometry, we examined immune activation (CD69, CD25, HLA-DR) and exhaustion (PD-1, TIGIT, TIM-3, and LAG-3) markers on CD4+ T cell subsets (naïve (Tn), central memory (Tcm), and the combination (Ttem) of transitional (Ttm) and effector memory (Tem) cells) in 10 adolescents and young adults living with perinatal HIV-1 (median age 15.9 years; median duration of virologic suppression 7.0 years), in whom HIV-1 reservoir size was measured with the Intact Proviral HIV-1 DNA Assay (IPDA) and an enhanced Tat/Rev limiting dilution assay (TILDA). RNA-seq was also performed on the unstimulated CD4+ T cells. The median total HIV-1 DNA concentration in memory CD4+ T cells was 211.90 copies per million CD4+ T cells. In the 7 participants with subtype B HIV-1 infection, the median intact proviral DNA load was 7.96 copies per million CD4+ T cells. Levels of HLA-DR and TIGIT on the Ttem were correlated with total HIV-1 DNA (r=0.76, p=0.015 and r=0.72, p=0.023, respectively), but not with intact proviral load or induced reservoir size. HIV-1 DNA load was also positively correlated with …

Hongkai Ji

Johns Hopkins University

Science Immunology

Lung tumor–infiltrating Treg have divergent transcriptional profiles and function linked to checkpoint blockade response

Regulatory T cells (Treg) are conventionally viewed as suppressors of endogenous and therapy-induced antitumor immunity; however, their role in modulating responses to immune checkpoint blockade (ICB) is unclear. In this study, we integrated single-cell RNA-seq/T cell receptor sequencing (TCRseq) of >73,000 tumor-infiltrating Treg (TIL-Treg) from anti–PD-1–treated and treatment-naive non–small cell lung cancers (NSCLC) with single-cell analysis of tumor-associated antigen (TAA)–specific Treg derived from a murine tumor model. We identified 10 subsets of human TIL-Treg, most of which have high concordance with murine TIL-Treg subsets. Only one subset selectively expresses high levels of TNFRSF4 (OX40) and TNFRSF18 (GITR), whose engagement by cognate ligand mediated proliferative programs and NF-κB activation, as well as multiple genes involved in Treg suppression, including LAG3 …

Hongkai Ji

Johns Hopkins University

Developmental Biology

The BAF chromatin complex component SMARCC1 does not mediate GLI transcriptional repression of Hedgehog target genes in limb buds

Transcriptional responses to the Hedgehog (HH) signaling pathway are primarily modulated by GLI repression in the mouse limb. Previous studies suggested a role for the BAF chromatin remodeling complex in mediating GLI repression. Consistent with this possibility, the core BAF complex protein SMARCC1 is present at most active limb enhancers including the majority of GLI enhancers. However, in contrast to GLI repression which reduces chromatin accessibility, SMARCC1 maintains chromatin accessibility at most enhancers, including those bound by GLI. Moreover, SMARCC1 binding at GLI-regulated enhancers occurs independently of GLI3. Consistent with previous studies, some individual GLI target genes are mis-regulated in Smarcc1 conditional knockouts, though most GLI target genes are unaffected. Moreover, SMARCC1 is not necessary for mediating constitutive GLI repression in HH mutant limb …

Hongkai Ji

Johns Hopkins University

Nucleic Acids Research

DNA methylation entropy is associated with DNA sequence features and developmental epigenetic divergence

Epigenetic information defines tissue identity and is largely inherited in development through DNA methylation. While studied mostly for mean differences, methylation also encodes stochastic change, defined as entropy in information theory. Analyzing allele-specific methylation in 49 human tissue sample datasets, we find that methylation entropy is associated with specific DNA binding motifs, regulatory DNA, and CpG density. Then applying information theory to 42 mouse embryo methylation datasets, we find that the contribution of methylation entropy to time- and tissue-specific patterns of development is comparable to the contribution of methylation mean, and methylation entropy is associated with sequence and chromatin features conserved with human. Moreover, methylation entropy is directly related to gene expression variability in development, suggesting a role for epigenetic entropy in …
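Methylation entropy can be illustrated with one commonly used read-level definition. This definition is assumed here and not taken from the paper. The methylation states of b neighboring CpGs on each sequencing read form a pattern. The Shannon entropy of the pattern frequencies is computed in bits and then divided by b, so the value lies between 0 (every read identical) and 1 (all 2^b patterns equally frequent). The function name and the input format are hypothetical, and the study's allele-specific analysis is more involved than this minimal sketch.

```python
import math
from collections import Counter
from typing import Iterable


def methylation_entropy(patterns: Iterable[str]) -> float:
    """Normalized Shannon entropy of read-level methylation patterns (assumed definition).

    Each pattern is a string over {'0', '1'} giving the methylation status of the
    same b CpG sites on one read (1 = methylated). Returns H / b, where H is the
    entropy in bits of the empirical pattern distribution.
    """
    reads = list(patterns)
    if not reads:
        raise ValueError("need at least one read")
    b = len(reads[0])
    if b == 0 or any(len(r) != b for r in reads):
        raise ValueError("all reads must cover the same, non-empty set of CpG sites")
    n = len(reads)
    counts = Counter(reads)
    # p * log2(1/p) summed over observed patterns; unobserved patterns contribute 0.
    h = sum((k / n) * math.log2(n / k) for k in counts.values())
    return h / b
```

Ten identical fully methylated reads give 0.0. A 50/50 mix of fully methylated and fully unmethylated reads over 4 CpGs gives 0.25, because 1 bit is spread over 4 sites. All 16 possible 4-CpG patterns observed once each give the maximum of 1.0.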

Hongkai Ji

Johns Hopkins University

International Journal of Molecular Sciences

Regional Gene Expression in the Retina, Optic Nerve Head, and Optic Nerve of Mice with Optic Nerve Crush and Experimental Glaucoma

A major risk factor for glaucomatous optic neuropathy is the level of intraocular pressure (IOP), which can lead to retinal ganglion cell axon injury and cell death. The optic nerve has a rostral unmyelinated portion at the optic nerve head followed by a caudal myelinated region. The unmyelinated region is differentially susceptible to IOP-induced damage in rodent models and human glaucoma. While several studies have analyzed gene expression changes in the mouse optic nerve following optic nerve injury, few were designed to consider the regional gene expression differences that exist between these distinct areas. We performed bulk RNA-sequencing on the retina and separately micro-dissected unmyelinated and myelinated optic nerve regions from naïve C57BL/6 mice, mice after optic nerve crush, and mice with microbead-induced experimental glaucoma (total = 36). Gene expression patterns in the naïve unmyelinated optic nerve showed significant enrichment of the Wnt, Hippo, PI3K-Akt, and transforming growth factor β pathways, as well as extracellular matrix–receptor and cell membrane signaling pathways, compared to the myelinated optic nerve and retina. Gene expression changes induced by both injuries were more extensive in the myelinated optic nerve than the unmyelinated region, and greater after nerve crush than glaucoma. Changes present three and fourteen days after injury largely subsided by six weeks. Gene markers of reactive astrocytes did not consistently differ between injury states. Overall, the transcriptomic phenotype of the mouse unmyelinated optic nerve was significantly different from immediately adjacent …

Hongkai Ji

Johns Hopkins University

Nature Communications

A statistical framework for differential pseudotime analysis with multiple single-cell RNA-seq samples

Pseudotime analysis with single-cell RNA-sequencing (scRNA-seq) data has been widely used to study dynamic gene regulatory programs along continuous biological processes. While many methods have been developed to infer the pseudotemporal trajectories of cells within a biological sample, it remains a challenge to compare pseudotemporal patterns with multiple samples (or replicates) across different experimental conditions. Here, we introduce Lamian, a comprehensive and statistically-rigorous computational framework for differential multi-sample pseudotime analysis. Lamian can be used to identify changes in a biological process associated with sample covariates, such as different biological conditions while adjusting for batch effects, and to detect changes in gene expression, cell density, and topology of a pseudotemporal trajectory. Unlike existing methods that ignore sample variability, Lamian …

2023/11/10

Other articles from Statistics in Biosciences journal

Aidong Adam Ding

Northeastern University

Statistics in Biosciences

Statistical Methods for Selective Biomarker Testing

Biomarkers are critically important tools in modern clinical diagnosis, prognosis, and classification/prediction. However, there are fiscal and analytical barriers to biomarker research. Selective Genotyping is an approach to increasing study power and efficiency where individuals with the most extreme phenotype (response) are chosen for genotyping (exposure) in order to maximize the information in the sample. In this article, we describe an analogous procedure in the biomarker testing landscape where both response and biomarker (exposure) are continuous. We propose an intuitive reverse-regression least squares estimator for the parameters relating biomarker value to response. An expression for robust standard error and corresponding confidence interval are derived. A simulation study is used to demonstrate that this method is unbiased and efficient relative to estimates from random sampling when the joint …

Yiran Zhang

University of California, San Diego

Statistics in Biosciences

Marginal structural illness-death models for semi-competing risks data

The three-state illness-death model has been established as a general approach for regression analysis of semi-competing risks data. For observational data, marginal structural models (MSMs) are a useful tool under the potential outcomes framework for defining and estimating parameters with causal interpretations. In this paper, we introduce a class of marginal structural illness-death models for the analysis of observational semi-competing risks data. We consider two specific models of this kind, the Markov illness-death MSM and the frailty-based Markov illness-death MSM. For interpretation purposes, risk contrasts under the MSMs are defined. Inference under the illness-death MSM can be carried out using estimating equations with inverse probability weighting, while inference under the frailty-based illness-death MSM requires a weighted EM algorithm. We study the inference procedures under both MSMs using …

Dungang Liu

University of Cincinnati

Statistics in Biosciences

Something Out of Nothing? The Influence of Double-Zero Studies in Meta-analysis of Adverse Events in Clinical Trials

In addition to clinical efficacy, safety is another important outcome to assess in randomized controlled trials. It focuses on the occurrence of adverse events, such as stroke, death, and other rare events. Because adverse events are observed at low or very low rates, meta-analysis is often used to pool evidence from dozens or even hundreds of similar clinical trials to strengthen inference. A well-known issue in rare-event meta-analysis is that some, or even the majority, of the available studies may observe zero events in both the treatment and control groups. The influence of these so-called double-zero studies has been studied in the literature, which nevertheless focuses on reaching a dichotomous conclusion: whether or not double-zero studies should be included in the analysis. When and how they contribute to inference, especially for the odds ratio, has not been addressed. This paper fills this …
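This minimal Python sketch shows why the odds ratio is the delicate case. It is not the method proposed in the paper. It only demonstrates two standard facts. First, under the Mantel-Haenszel pooled odds ratio, a double-zero study adds exactly zero to both the numerator and the denominator, so it has no influence. Second, adding a 0.5 continuity correction to every cell turns the same study (with equal arm sizes) into an odds ratio of exactly 1. The function names and the example tables are hypothetical.

```python
from typing import List, Tuple

# One study as (events_treatment, n_treatment, events_control, n_control).
Study = Tuple[int, int, int, int]


def mantel_haenszel_or(studies: List[Study]) -> float:
    """Mantel-Haenszel pooled odds ratio: sum(a*d/n) / sum(b*c/n)."""
    num = 0.0
    den = 0.0
    for a, n1, c, n0 in studies:
        b = n1 - a        # non-events, treatment arm
        d = n0 - c        # non-events, control arm
        n = n1 + n0
        num += a * d / n  # zero whenever a == 0
        den += b * c / n  # zero whenever c == 0
    if den == 0.0:
        raise ValueError("pooled odds ratio undefined: no control-arm events")
    return num / den


def corrected_study_or(study: Study, cc: float = 0.5) -> float:
    """Single-study odds ratio after adding a constant continuity correction to every cell."""
    a, n1, c, n0 = study
    b, d = n1 - a, n0 - c
    return ((a + cc) * (d + cc)) / ((b + cc) * (c + cc))


trials = [(3, 100, 1, 100), (2, 150, 0, 150)]
double_zero = (0, 80, 0, 80)
print(mantel_haenszel_or(trials))
print(mantel_haenszel_or(trials + [double_zero]))  # unchanged by the double-zero study
print(corrected_study_or(double_zero))             # 1.0
```

The two approaches sit at opposite extremes. The Mantel-Haenszel estimator silently ignores the double-zero study. The corrected study enters as an estimate of "no effect" and can therefore pull an inverse-variance pooled estimate toward the null.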

Ming T. Tan

Georgetown University

Statistics in Biosciences

Integrative Classification Using Structural Equation Modeling of Homeostasis

We consider binary classification in the high-dimensional setting, where the number of features is huge, and the number of observations is limited. We focus on the setting where features in one group have certain correlation structures that are not present in the other group. This is particularly relevant in early detection of diseases where subjects develop from a normal or homeostatic state to a diseased condition. Linear discriminant analysis (with a link function) and classification based on regularized regression or machine learning have been used as methods for this problem and related variable selection. However, most methods do not account for the correlation structures of variables within groups. While the diseased group may demonstrate abundant diversity and no clear structure, achieving higher accuracy in classification requires considering the correlation structures in the control group with homeostasis …

Li-Hsiang Lin

Georgia Institute of Technology

Statistics in Biosciences

Promotion Time Cure Model with Local Polynomial Estimation

In modeling survival data with a cure fraction, flexible modeling of covariate effects on the probability of cure has important medical implications, as it helps investigators identify better curative treatments. This paper studies a semiparametric form of the Yakovlev promotion time cure model that allows for nonlinear effects of a continuous covariate. We adopt the local polynomial approach and use the local likelihood criterion to derive nonlinear estimates of covariate effects on cure rates, assuming that the baseline distribution function follows a parametric form. This approach ensures that the model is identifiable. It also provides a flexible way to estimate the cure rate locally, the key quantity in cure models, together with a convenient way to estimate the baseline function globally. An algorithm is proposed to implement estimation at both the local and global scales. Asymptotic properties of local polynomial estimates, the …
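For readers unfamiliar with the promotion time cure model, a minimal Python sketch of its standard form may help. Population survival is S(t | x) = exp(-θ(x) F(t)) with a proper baseline distribution F. Because F(t) tends to 1, the cure probability is exp(-θ(x)). The Weibull baseline, its parameter values, and the nonlinear effect θ(x) = exp(0.5 + sin x) are illustrative assumptions only. The paper's local polynomial estimation of the covariate effect is not shown.

```python
import math


def weibull_cdf(t: float, shape: float, scale: float) -> float:
    """Parametric baseline distribution F(t); a Weibull is assumed here for illustration."""
    if t <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((t / scale) ** shape))


def population_survival(t: float, theta: float, shape: float = 1.5, scale: float = 2.0) -> float:
    """Promotion time cure model: S_pop(t) = exp(-theta * F(t))."""
    return math.exp(-theta * weibull_cdf(t, shape, scale))


def cure_probability(theta: float) -> float:
    """Limit of S_pop(t) as t -> infinity, where F(t) -> 1: the cured fraction exp(-theta)."""
    return math.exp(-theta)


def theta_of(x: float) -> float:
    """Hypothetical nonlinear covariate effect: theta(x) = exp(g(x)), g(x) = 0.5 + sin(x)."""
    return math.exp(0.5 + math.sin(x))


for x in (0.0, 1.5, 3.0):
    print(x, round(cure_probability(theta_of(x)), 4))
```

Because g(x) is not linear, the cure probability rises and falls with x rather than changing monotonically. This is the kind of shape a local polynomial estimate of θ(x) is meant to recover.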

Connie Rhee

University of California, Irvine

Statistics in Biosciences

Multivariate Varying Coefficient Spatiotemporal Model

As of 2020, 807,920 individuals in the US had end-stage kidney disease (ESKD), about 70% of whom were on dialysis, a life-sustaining treatment. Dialysis patients experience high mortality rates, and frequent hospitalizations are a major contributor to morbidity and mortality. There is growing interest in identifying risk factors for the correlated outcomes of hospitalization and mortality among dialysis patients across the US. Utilizing national data from the United States Renal Data System (USRDS), we propose a novel multivariate varying coefficient spatiotemporal model to study the time-dynamic effects of risk factors (e.g., urbanicity and area deprivation index) on the multivariate outcome of hospitalization and mortality rates, as a function of time on dialysis. While capturing time-varying effects of risk factors on the mean, the proposed model also incorporates spatiotemporal patterns of the residuals for efficient …

Zhaohu (Jonathan) Fan

University of Cincinnati

Statistics in Biosciences

Something Out of Nothing? The Influence of Double-Zero Studies in Meta-analysis of Adverse Events in Clinical Trials

Hongkai Ji

Johns Hopkins University

Statistics in Biosciences

Building an Enhanced Publication Ecosystem for Statistical Innovation

Statistical research functions much like a supply chain, beginning with the identification of demands, such as open problems in science and technology, and the gathering or generation of data (eg, through surveys, clinical trials, or scientific experiments). These demands and data fuel the creation of innovative analytical solutions, encompassing fresh theories and methodologies for study design and data analysis, and, increasingly, the development of software tools to disseminate these solutions to end-users. To ensure proper and effective utilization of new methods and tools, it is crucial to systematically evaluate them, enabling users to understand their properties, capabilities, and limitations. Ultimately, their value must be tested and demonstrated by solving real-world problems. Thus, demands, data, theory, methods, software, as well as their evaluation, validation, and application, all serve as vital components …

SUDIPTO BANERJEE

University of California, Los Angeles

Statistics in Biosciences

Multivariate Varying Coefficient Spatiotemporal Model

As of 2020, 807,920 individuals in the US had end-stage kidney disease (ESKD), with about 70% of patients on dialysis, a life-sustaining treatment. Dialysis patients experience high mortality rates, and frequent hospitalizations are a major contributor to morbidity and mortality. There is growing interest in identifying the risk factors for the correlated outcomes of hospitalization and mortality among dialysis patients across the US. Utilizing national data from the United States Renal Data System (USRDS), we propose a novel multivariate varying coefficient spatiotemporal model to study the time-dynamic effects of risk factors (eg, urbanicity and area deprivation index) on the multivariate outcome of hospitalization and mortality rates, as a function of time on dialysis. While capturing time-varying effects of risk factors on the mean, the proposed model also incorporates spatiotemporal patterns of the residuals for efficient …

Kassu Mehari BEYENE

Université Catholique de Louvain

Statistics in Biosciences

Estimate Time-To-Infection (TTI) Vaccination Effect When TTI for Unvaccinated Group is Unknown

The COVID-19 pandemic has caused significant morbidity and mortality, as well as social and economic disruption, worldwide and in the USA in particular. To reduce these effects, a global effort to develop effective vaccines against the COVID-19 virus has produced various options, with effectiveness assessed by the rate of infection in vaccinated versus unvaccinated groups; this measure has informed important policy decisions on vaccination effectiveness ever since. However, the rate of infection is an oversimplified index of overall vaccination effectiveness and should be strengthened to address the duration of protection through a time-to-infection effect. The fundamental challenge in estimating the vaccination effect over time is that the time-to-infection for the unvaccinated group is unknown, because that group has no vaccination time. This paper aims to fill this knowledge gap to …

Steven D. Edland

University of California, San Diego

Statistics in Biosciences

Marginal structural illness-death models for semi-competing risks data

The three-state illness-death model has been established as a general approach for regression analysis of semi-competing risks data. For observational data, marginal structural models (MSMs) are a useful tool under the potential outcomes framework for defining and estimating parameters with causal interpretations. In this paper we introduce a class of marginal structural illness-death models for the analysis of observational semi-competing risks data. We consider two such models, the Markov illness-death MSM and the frailty-based Markov illness-death MSM. For interpretation purposes, risk contrasts under the MSMs are defined. Inference under the illness-death MSM can be carried out using estimating equations with inverse probability weighting, while inference under the frailty-based illness-death MSM requires a weighted EM algorithm. We study the inference procedures under both MSMs using …
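The abstract notes that inference under the Markov illness-death MSM uses estimating equations with inverse probability weighting. The basic weighting idea can be sketched for a much simpler case: a binary point treatment, a single outcome, and propensity scores P(A = 1 | L) that are assumed to be already estimated. This is not the paper's illness-death estimator, and the function names are hypothetical.

```python
from typing import List, Sequence


def stabilized_weights(treated: Sequence[int], propensity: Sequence[float]) -> List[float]:
    """Stabilized IPW weights sw_i = P(A = a_i) / P(A = a_i | L_i) for binary A.

    propensity[i] is an (assumed already estimated) P(A = 1 | L_i);
    P(A = 1) is estimated by the sample proportion treated.
    """
    p_treated = sum(treated) / len(treated)
    weights = []
    for a, e in zip(treated, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        weights.append(p_treated / e if a == 1 else (1.0 - p_treated) / (1.0 - e))
    return weights


def ipw_mean_difference(outcome: Sequence[float], treated: Sequence[int],
                        propensity: Sequence[float]) -> float:
    """Weighted (Hajek) estimate of E[Y(1)] - E[Y(0)] in a simple point-treatment MSM."""
    w = stabilized_weights(treated, propensity)

    def arm_mean(arm: int) -> float:
        # Weighted mean of the outcome among subjects with A == arm.
        num = sum(wi * yi for wi, yi, ai in zip(w, outcome, treated) if ai == arm)
        den = sum(wi for wi, ai in zip(w, treated) if ai == arm)
        return num / den

    return arm_mean(1) - arm_mean(0)
```

Because the stabilizing numerator is constant within each treatment arm, it cancels in these weighted arm means, so here it changes the scale of the weights but not the estimate.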

JIANGUO SUN

University of Missouri

Statistics in Biosciences

Variable Selection for Nonlinear Covariate Effects with Interval-Censored Failure Time Data

This paper discusses variable selection for general, high-dimensional interval-censored failure time data, which commonly occur in areas such as epidemiological, medical, and public health studies. In this setting, covariates often have nonlinear effects, which makes variable selection more challenging. To address this problem, we propose a novel and robust variable selection technique under a class of semiparametric additive transformation models. In particular, Bernstein polynomials are employed to approximate unknown functions, and an EM algorithm using Poisson-based data augmentation is developed to implement the proposed method. An extensive simulation study is conducted to assess the empirical performance of the approach and suggests that it works well in practical situations. Finally, we apply the method to a set of real …
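The abstract mentions approximating unknown functions with Bernstein polynomials. A minimal sketch of evaluating such an approximation, assuming the function lives on a known interval [lower, upper] that is rescaled to [0, 1]; the coefficients are placeholders that a fitted model would estimate, and the function names are hypothetical.

```python
from math import comb
from typing import Sequence


def bernstein_basis(k: int, m: int, t: float) -> float:
    """k-th Bernstein basis polynomial of degree m evaluated at t in [0, 1]."""
    return comb(m, k) * t**k * (1.0 - t) ** (m - k)


def bernstein_eval(coefs: Sequence[float], x: float, lower: float, upper: float) -> float:
    """Evaluate sum_k coefs[k] * b_{k,m}(t), with t = (x - lower) / (upper - lower).

    The degree is m = len(coefs) - 1. Nondecreasing coefficients give a
    nondecreasing function, which makes the basis convenient when a component
    (such as a cumulative baseline function) must be monotone.
    """
    m = len(coefs) - 1
    t = (x - lower) / (upper - lower)
    if not 0.0 <= t <= 1.0:
        raise ValueError("x must lie in [lower, upper]")
    return sum(c * bernstein_basis(k, m, t) for k, c in enumerate(coefs))
```

With coefficients set to f(k/m) for a target function f, the sum is the classical Bernstein approximation of f; with coefficients (0, 1/3, 2/3, 1) it reproduces the identity t exactly.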

Melvin G Mcinnis

University of Michigan

Statistics in Biosciences

Integrating Summary Information from Many External Studies with Population Heterogeneity and a Study of COVID-19 Pandemic Impact on Mental Health of People with Bipolar Disorder

The penalized constrained maximum likelihood (PCML) method integrates summary information from external studies to improve model parameter estimation for an index study of interest in the context of population heterogeneity. It simultaneously selects and incorporates relevant external information. However, it only considers scenarios where the number of external studies is small. There are situations where many external studies exist for possible information integration. Here, we extend the PCML method by allowing the number of external studies to increase according to the sample size of the index (internal) study, motivated by a study of the COVID-19 pandemic impact on mental health of people with bipolar disorder. Our extension allows leveraging the many relevant external studies of mental health before and during the COVID-19 pandemic. Within our general framework, the asymptotic properties of the …

Sebastian Zöllner

University of Michigan-Dearborn

Statistics in Biosciences

Integrating Summary Information from Many External Studies with Population Heterogeneity and a Study of COVID-19 Pandemic Impact on Mental Health of People with Bipolar Disorder

The penalized constrained maximum likelihood (PCML) method integrates summary information from external studies to improve model parameter estimation for an index study of interest in the context of population heterogeneity. It simultaneously selects and incorporates relevant external information. However, it only considers scenarios where the number of external studies is small. There are situations where many external studies exist for possible information integration. Here, we extend the PCML method by allowing the number of external studies to increase according to the sample size of the index (internal) study, motivated by a study of the COVID-19 pandemic impact on mental health of people with bipolar disorder. Our extension allows leveraging the many relevant external studies of mental health before and during the COVID-19 pandemic. Within our general framework, the asymptotic properties of the …

Danh V. Nguyen

University of California, Irvine

Statistics in Biosciences

Multivariate Varying Coefficient Spatiotemporal Model

As of 2020, 807,920 individuals in the US had end-stage kidney disease (ESKD), with about 70% of patients on dialysis, a life-sustaining treatment. Dialysis patients experience high mortality rates, and frequent hospitalizations are a major contributor to morbidity and mortality. There is growing interest in identifying the risk factors for the correlated outcomes of hospitalization and mortality among dialysis patients across the US. Utilizing national data from the United States Renal Data System (USRDS), we propose a novel multivariate varying coefficient spatiotemporal model to study the time-dynamic effects of risk factors (eg, urbanicity and area deprivation index) on the multivariate outcome of hospitalization and mortality rates, as a function of time on dialysis. While capturing time-varying effects of risk factors on the mean, the proposed model also incorporates spatiotemporal patterns of the residuals for efficient …

Boya Zhang

Virginia Polytechnic Institute and State University

Statistics in Biosciences

A Variance-Based Sensitivity Analysis Approach for Identifying Interactive Exposures

Chemical mixtures can significantly affect human health, but understanding the interactions among various chemical exposures and identifying influential ones in relation to some health outcomes are difficult. Bayesian kernel machine regression (BKMR) is a widely used model for capturing nonlinear dynamics and interactions between multiple exposures and health outcomes. However, tools for quantifying the interactions captured by this flexible model are scarce. Utilizing the inherent connection between BKMR and Gaussian process regressions, we adopt the classic variance-based sensitivity analysis tools from the uncertainty quantification community and propose a variable clustering approach to quantify interactions, discover high-order interaction terms, and rank variable importance. The performance of this method is demonstrated in a range of simulation scenarios and applied to a real dataset to examine …
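The abstract borrows variance-based sensitivity analysis from the uncertainty quantification literature. A minimal sketch of the classical first-order Sobol index, S_i = Var(E[Y | X_i]) / Var(Y), estimated by Monte Carlo with a Saltelli-type pick-freeze scheme. It assumes independent Uniform(0, 1) inputs and a cheap test function f, whereas the paper applies such tools to a fitted BKMR surface and adds its own variable clustering step, neither of which is shown here; the function name is hypothetical.

```python
import random
from typing import Callable, List


def first_order_sobol(f: Callable[[List[float]], float], dim: int, n: int,
                      seed: int = 0) -> List[float]:
    """Monte Carlo estimates of first-order Sobol indices S_i = Var(E[Y|X_i]) / Var(Y).

    Assumes independent Uniform(0, 1) inputs. Uses two sample matrices A and B and,
    for each i, a matrix A_B^(i) equal to A with column i taken from B.
    """
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(dim)] for _ in range(n)]
    B = [[rng.random() for _ in range(dim)] for _ in range(n)]
    fA = [f(row) for row in A]
    fB = [f(row) for row in B]

    # Total output variance, estimated from both samples pooled.
    pooled = fA + fB
    mean = sum(pooled) / len(pooled)
    var_y = sum((v - mean) ** 2 for v in pooled) / (len(pooled) - 1)

    indices = []
    for i in range(dim):
        fABi = [f(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # f(B) and f(A_B^(i)) share only X_i, so this average estimates Var(E[Y | X_i]).
        v_i = sum(yb * (yab - ya) for ya, yb, yab in zip(fA, fB, fABi)) / n
        indices.append(v_i / var_y)
    return indices
```

For f(x) = x_0 + 2 x_1 the exact indices are 0.2 and 0.8, summing to 1 because the function is additive. For f(x) = x_0 x_1 each exact index is 3/7, and the remaining 1/7 of the variance is attributable to the interaction; a gap between the sum of first-order indices and 1 is the kind of signal that points to interacting inputs.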

Yuan Ji, PhD

University of Chicago

Statistics in Biosciences

A Unified Decision Framework for Phase I Dose-Finding Designs

The purpose of a phase I dose-finding clinical trial is to investigate the toxicity profiles of various doses of a new drug and identify the maximum tolerated dose. Over the past three decades, various dose-finding designs have been proposed and discussed, including conventional model-based designs, new model-based designs using toxicity probability intervals, and rule-based designs. We present a simple decision framework that can generate several popular designs as special cases. We show that these designs share common elements under the framework, such as the same likelihood function, the use of loss functions, and the nature of the optimal decisions as Bayes rules. They differ mostly in the choice of prior distributions. We present theoretical results on the decision framework and its link to specific popular designs such as mTPI, BOIN, and CRM. These results provide useful insights into the …
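The abstract links its framework to BOIN, among other designs. As a point of reference, a minimal sketch of the standard BOIN escalation and de-escalation boundaries for a target toxicity rate φ, using the commonly recommended defaults φ1 = 0.6φ and φ2 = 1.4φ. It omits BOIN's dose-elimination safety rule and is not the unified decision framework proposed in the paper; the function names are hypothetical.

```python
from math import log
from typing import Tuple


def boin_boundaries(target: float, phi1_ratio: float = 0.6,
                    phi2_ratio: float = 1.4) -> Tuple[float, float]:
    """Return BOIN's (lambda_e, lambda_d) for the target toxicity rate.

    phi1 is the highest toxicity rate considered under-dosing and phi2 the
    lowest rate considered overdosing, both expressed as multiples of the target.
    """
    phi = target
    phi1 = phi1_ratio * phi
    phi2 = phi2_ratio * phi
    lam_e = log((1 - phi1) / (1 - phi)) / log(phi * (1 - phi1) / (phi1 * (1 - phi)))
    lam_d = log((1 - phi) / (1 - phi2)) / log(phi2 * (1 - phi) / (phi * (1 - phi2)))
    return lam_e, lam_d


def boin_decision(n_tox: int, n_treated: int, lam_e: float, lam_d: float) -> str:
    """Compare the observed toxicity rate at the current dose with the boundaries."""
    p_hat = n_tox / n_treated
    if p_hat <= lam_e:
        return "escalate"
    if p_hat >= lam_d:
        return "de-escalate"
    return "stay"
```

With a target of 0.3 the boundaries are about 0.236 and 0.358, so observing 1 toxicity among 3 patients (rate 0.333) keeps the current dose, while 2 of 3 triggers de-escalation.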

Haochang Shou

University of Pennsylvania

Statistics in Biosciences

Mediation Analysis with Random Distribution as Mediator with an Application to iCOMPARE Trial

Physical activity has long been shown to be associated with biological and physiological performance and risk of diseases. It is of great interest to assess whether the effect of an exposure or intervention on an outcome is mediated through physical activity measured by modern wearable devices such as actigraphy. However, existing methods for mediation analysis focus almost exclusively on mediators in Euclidean space, so they cannot be applied directly to physical activity measured by wearable devices. Such data are best summarized in the form of a random distribution. In this paper, we develop a structural equation model for the setting where a random distribution is treated as the mediator. We provide sufficient conditions for identifying the average causal effects of a distribution mediator and present methods for estimating the direct and mediating effects of a random distribution mediator on the …

Yunro Chung

Arizona State University

Statistics in Biosciences

Nonparametric Estimation of Proportional Hazards with Monotone Baseline Hazard and Covariate Effect

Order-restricted inference has been applied to survival analysis when the hazard function is known to have a specific shape prior to data analysis. Under the proportional hazards assumption, the partial likelihood approach is commonly used to estimate a covariate effect on the distribution of survival time without specifying the baseline hazard function, but the shape information of the baseline hazard function then cannot be used in the partial likelihood estimation procedure. In this paper, we propose a nonparametric full likelihood method for estimating the covariate effect and baseline hazard functions simultaneously under monotone shape restriction. We develop an efficient algorithm using generalized isotonic regression techniques. We extend the algorithm to models with time-dependent covariates. Simulation studies demonstrate that the proposed full likelihood method shows smaller variance than …
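The abstract's algorithm relies on generalized isotonic regression. Its basic building block can be sketched with the pool-adjacent-violators algorithm (PAVA) for weighted least-squares isotonic regression; this is a standard algorithm shown on hypothetical inputs, not the paper's full-likelihood procedure, and the function name is hypothetical.

```python
from typing import List, Optional, Sequence


def pava(y: Sequence[float], w: Optional[Sequence[float]] = None,
         increasing: bool = True) -> List[float]:
    """Weighted isotonic (monotone) least-squares fit via pool-adjacent-violators."""
    if w is None:
        w = [1.0] * len(y)
    sign = 1.0 if increasing else -1.0  # a decreasing fit is an increasing fit of -y
    blocks: List[List[float]] = []  # each block: [weighted mean, total weight, size]
    for yi, wi in zip(y, w):
        blocks.append([sign * yi, float(wi), 1])
        # Pool the last two blocks while they violate the order constraint.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(w1 * m1 + w2 * m2) / total, total, c1 + c2])
    fit: List[float] = []
    for mean, _, size in blocks:
        fit.extend([sign * mean] * int(size))
    return fit
```

For example, pava([1.0, 3.0, 2.0, 4.0]) pools the out-of-order middle pair and returns [1.0, 2.5, 2.5, 4.0]; passing increasing=False gives a nonincreasing fit, as needed for a decreasing hazard shape.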

Peisong Han

University of Michigan

Statistics in Biosciences

Integrating Summary Information from Many External Studies with Population Heterogeneity and a Study of COVID-19 Pandemic Impact on Mental Health of People with Bipolar Disorder

The penalized constrained maximum likelihood (PCML) method integrates summary information from external studies to improve model parameter estimation for an index study of interest in the context of population heterogeneity. It simultaneously selects and incorporates relevant external information. However, it only considers scenarios where the number of external studies is small. There are situations where many external studies exist for possible information integration. Here, we extend the PCML method by allowing the number of external studies to increase according to the sample size of the index (internal) study, motivated by a study of the COVID-19 pandemic impact on mental health of people with bipolar disorder. Our extension allows leveraging the many relevant external studies of mental health before and during the COVID-19 pandemic. Within our general framework, the asymptotic properties of the …