Functional partial least squares with censored outcomes: Prediction of breast cancer risk with mammogram images
The Annals of Applied Statistics
Published On 2024/6
Tables, figures, and additional simulation and application results.
Volume: 18
Issue: 2
Pages: 1051-1063
Authors
Graham Colditz, MD, DrPH
Washington University in St. Louis
Position: Professor, Dept. of Surgery
H-Index (all): 309
H-Index (since 2020): 125
Research Interests
Epidemiology
Cancer Prevention
Health Outcomes
Jiguo Cao
Simon Fraser University
H-Index (all): 28
H-Index (since 2020): 21
Research Interests
Functional Data Analysis
Estimating Differential Equations
Machine Learning
Other Articles from authors
Jiguo Cao
Simon Fraser University
1 Product of Normal Densities
Supplemental Figure 3: Parameter ground truths and parameter estimates from fitting asymptotic and unmarked (traditional) N-mixture models with K = 300. Estimated parameters are λ (initial mean site abundance), γ (importation rate), ω (survival probability), and p (probability of detection). A total of 1, 1000 simulations are represented.
2024/3/4
Graham Colditz, MD, DrPH
Washington University in St. Louis
Cancer Research
Longitudinal latent class analysis to further understand trajectory of density over time and risk of breast cancer
Background: It is clinically important to refine strategies to manage women with dense breasts as they represent approximately 50% of all women screened. Change in breast density is related to risk of breast cancer [1]. We also must understand this change in density over time in women with dense and non-dense breasts. While population level data have been reported to estimate decrease in density with age, this has largely used digitized film images [2]. Expanding use of digital mammography and repeated screening generates a large library of images for each woman, offering the potential to use more of the image data. Methods: We use longitudinal latent class analysis to cluster women and estimate their change in density over time. To evaluate the groups of women with different patterns of change in density we fit latent class models to our previously published data [1]. This includes 289 pathology confirmed …
2024/3/22
Graham Colditz, MD, DrPH
Washington University in St. Louis
Cancer Medicine
Racial differences in treatment and survival among older patients with multiple myeloma
Background: Treatments for multiple myeloma (MM) have evolved over time and improved MM survival. While racial differences in MM treatment and prognosis between non‐Hispanic African American (NHAA) and non‐Hispanic White (NHW) patients are well‐established, it is unclear whether they have persisted after the introduction of novel agents. Methods: Using the Surveillance, Epidemiology, and End Results‐Medicare linked database, our study investigated racial difference in the receipt of treatment within 1 year following diagnosis and assessed survival outcomes among Medicare beneficiaries (≥66 years) diagnosed with MM from 2007 to 2017. We applied multivariable Cox proportional hazards models to estimate the association between race and survival and presented hazard ratios (HRs). Results: Of 2094 NHAA and 11,983 NHW older patients with MM, 59.5% and 64.8% received treatment …
2024/1/17
Jiguo Cao
Simon Fraser University
arXiv preprint arXiv:2401.09499
Functional Autoencoder for Smoothing and Representation Learning
A common pipeline in functional data analysis is to first convert the discretely observed data to smooth functions, and then represent the functions by a finite-dimensional vector of coefficients summarizing the information. Existing methods for data smoothing and dimensional reduction mainly focus on learning the linear mappings from the data space to the representation space; however, learning only the linear representations may not be sufficient. In this study, we propose to learn the nonlinear representations of functional data using neural network autoencoders designed to process data in the form it is usually collected, without the need for preprocessing. We design the encoder to employ a projection layer computing the weighted inner product of the functional data and functional weights over the observed timestamp, and the decoder to apply a recovery layer that maps the finite-dimensional vector extracted from the functional data back to functional space using a set of predetermined basis functions. The developed architecture can accommodate both regularly and irregularly spaced data. Our experiments demonstrate that the proposed method outperforms functional principal component analysis in terms of prediction and classification, and maintains superior smoothing ability and better computational efficiency in comparison to the conventional autoencoders under both linear and nonlinear settings.
2024/1/17
Graham Colditz, MD, DrPH
Washington University in St. Louis
Cancer Research
Abstract PO5-09-02: Breast Tissue Proteomic Profile of Breast Cancer in Premenopausal Women and Association with Mammographic Breast Density
Introduction: Breast cancer incidence is rising in premenopausal women; hence, there is a critical need to understand factors underlying premenopausal breast cancer development in order to guide targeted prevention. Mammographic breast density is a strong risk factor for, as well as an intermediate phenotype for, premenopausal breast cancer. Yet, the molecular mechanisms underlying the associations of dense breasts with breast cancer are not well understood. Our objectives in this study are to perform proteomic analysis in breast tissues to (i) identify proteins that are associated with breast cancer development in premenopausal women; (ii) determine which of these proteins are also associated with dense breasts. Methods: We performed proteomic analysis on tumor and adjacent normal tissues from 50 premenopausal women with breast cancer who had breast tissue samples archived at the St. Louis Breast Tumor …
2024/5/2
Graham Colditz, MD, DrPH
Washington University in St. Louis
Breast Cancer Research
How does weight gain since the age of 18 years affect breast cancer risk in later life? A meta-analysis
Early life factors are important risk factors for breast cancer. The association between weight gain after age 18 and breast cancer risk is inconsistent across previous epidemiologic studies. To evaluate this association, we conducted a meta-analysis according to PRISMA guidelines and the established inclusion criteria. We performed a comprehensive literature search using Medline (Ovid), Embase, Scopus, Cochrane Library, and ClinicalTrials.gov to identify relevant studies published before June 3, 2022. Two reviewers independently reviewed the articles for final inclusion. Seventeen out of 4,725 unique studies met the selection criteria. The quality of studies was assessed using the Newcastle-Ottawa Scale (NOS), and all were of moderate to high quality with NOS scores ranging from 5 to 8. We included 17 studies (11 case-control, 6 cohort) in final analysis. In case-control studies, weight gain after age 18 was …
2024/3/7
Graham Colditz, MD, DrPH
Washington University in St. Louis
Journal of Clinical Oncology
Plant Foods Intake and Risk of Premature Aging in Adult Survivors of Childhood Cancer in the St Jude Lifetime Cohort (SJLIFE)
PURPOSE: To identify dietary factors that are related to premature aging in adult survivors of childhood cancer, we examined the associations between plant food intakes and age-related deficit accumulation. METHODS: A total of 3,322 childhood cancer survivors (age 18-65 years, mean = 31, standard deviation = 8.4) in the St Jude Lifetime Cohort had total fruit, total vegetables and subgroups, whole grains, refined grains, nuts/seeds, and nutrients intake assessed using a food frequency questionnaire. Premature aging at baseline was assessed by the deficit accumulation index (DAI) and categorized as low, medium, and high risk. Multinomial logistic regressions (reference: low risk) adjusting for confounders estimated odds ratios (ORs) and 95% CIs. Multivariable linear regression of a continuous intake against a continuous DAI was also performed. RESULTS: Dark green vegetable (ORhigh v low = 0.47 [95% CI, 0.28 …
2024/1
Jiguo Cao
Simon Fraser University
Journal of Computational and Graphical Statistics
Functional nonlinear learning
Using representations of functional data can be more convenient and beneficial in subsequent statistical models than direct observations. These representations, in a lower-dimensional space, extract and compress information from individual curves. The existing representation learning approaches in functional data analysis usually use linear mapping in parallel to those from multivariate analysis, for example, functional principal component analysis (FPCA). However, functions, as infinite-dimensional objects, sometimes have nonlinear structures that cannot be uncovered by linear mapping. Linear methods are overwhelmed even more by multivariate functional data. To address this, this article proposes a functional nonlinear learning (FunNoL) method to sufficiently represent multivariate functional data in a lower-dimensional feature space. Furthermore, we merge a classification model for enriching the ability of …
2024/1/2
Jiguo Cao
Simon Fraser University
The Annals of Applied Statistics
How are PreLaunch online movie reviews related to box office revenues?
The Supplementary Material contains the mean curves and the top three FPCs estimated from the observed quantile trajectories at quantile levels 0.1, 0.25, 0.5, and 0.9 and additional results for predicting the week two box office revenues.
2024/6
Graham Colditz, MD, DrPH
Washington University in St. Louis
Cancer Research
Abstract PO3-09-01: Integrating pathomic and radiomic images to classify risk of subsequent events among women with DCIS
Background: Multiple sources of ~omic data can be generated from women at different stages of developing breast cancer, the leading cancer diagnosed in women worldwide. Traditionally, interrogation of risk factors to study associations and develop prediction models for future breast events has been limited to one or a few risk factors, or summary scores of clinical and tumor characteristics. Methods to bring mammography images and breast biopsies of precancer lesions together to summarize risk of cancer developing in the breast are urgently needed. Integration of these two sources has not been performed to date, but has potential to increase accuracy of risk prediction. Approach: The Repository of Archival Human Breast Tissue (RAHBT) was established in 2007 at Washington University School of Medicine (WUSM) and maintains biospecimens and medical record data of women treated with breast-conserving …
2024/5/2
Graham Colditz, MD, DrPH
Washington University in St. Louis
Statistics in Medicine
Modeling correlated pairs of mammogram images
Mammography remains the primary screening strategy for breast cancer, which continues to be the most prevalent cancer diagnosis among women globally. Because screening mammograms capture both the left and right breast, there is a nonnegligible correlation between the pair of images. Previous studies have explored the concept of averaging between the pair of images after proper image registration; however, no comparison has been made in directly utilizing the paired images. In this paper, we extend the bivariate functional principal component analysis over triangulations to jointly characterize the pair of imaging data bounded in an irregular domain and then nest the extracted features within the survival model to predict the onset of breast cancer. The method is applied to our motivating data from the Joanne Knight Breast Health Cohort at Siteman Cancer Center. Our findings indicate that there was no …
2024/2/13
Graham Colditz, MD, DrPH
Washington University in St. Louis
Cancer Causes & Control
Pectoral muscle removal in mammogram images: A novel approach for improved accuracy and efficiency
Purpose: Accurate pectoral muscle removal is critical in mammographic breast density estimation and many other computer-aided algorithms. We propose a novel approach to remove pectoral muscles from mediolateral oblique (MLO) view mammograms and compare accuracy and computational efficiency with an existing method (Libra). Methods: A pectoral muscle identification pipeline was developed. The image is first binarized to enhance contrast and then the Canny algorithm is applied for edge detection. Robust interpolation is used to smooth out the pectoral muscle region. Accuracy and computational speed of pectoral muscle identification were assessed using 951 women (1,902 MLO mammograms) from the Joanne Knight Breast Health Cohort at Washington University School of Medicine. Results: Our proposed algorithm exhibits lower mean error of 12.22% in comparison to Libra’s estimated error of 20.44 …
2024/1
Graham Colditz, MD, DrPH
Washington University in St. Louis
screening
Most cancer deaths are unaddressed by current screening paradigms
In the United States (US), cancer is the leading cause of death among adults ≤ 80 years [1]
2024/4/5
Graham Colditz, MD, DrPH
Washington University in St. Louis
Cancer Causes & Control
Avoiding lead-time bias by estimating stage-specific proportions of cancer and non-cancer deaths
Purpose: Understanding how stage at cancer diagnosis influences cause of death, an endpoint that is not susceptible to lead-time bias, can inform population-level outcomes of cancer screening. Methods: Using data from 17 US Surveillance, Epidemiology, and End Results registries for 1,154,515 persons aged 50–84 years at cancer diagnosis in 2006–2010, we evaluated proportional causes of death by cancer type and uniformly classified stage, following or extrapolating all patients until death through 2020. Results: Most cancer patients diagnosed at stages I–II did not go on to die from their index cancer, whereas most patients diagnosed at stage IV did. For patients diagnosed with any cancer at stages I–II, an estimated 26% of deaths were due to the index cancer, 63% due to non-cancer causes, and 12% due to a subsequent primary (non-index) cancer. In contrast, for patients diagnosed with any stage IV cancer …
2024/1/18
Jiguo Cao
Simon Fraser University
Journal of Agricultural, Biological and Environmental Statistics
Faster Asymptotic Solutions for N-Mixtures on Large Populations
We derive an asymptotic likelihood function for open-population N-mixture models and show that it has favorable computational complexity and accuracy when compared to the traditional likelihood function for large population sizes. We validate our asymptotic model with simulation studies and apply our model to estimate the population size of Ancient Murrelet chicks, comparing against results obtained using the traditional N-mixture likelihood and an alternative asymptotic model based on the multivariate normal distribution. For the Ancient Murrelet case study, our asymptotic model computes twice as fast as the traditional models, eleven times faster when parallel processing is used, and provides higher-precision estimates than the asymptotic multivariate normal model. We provide an open-source implementation of our methods in the quickNmix R package. Supplementary material to this paper is provided online.
2024/3/30
Graham Colditz, MD, DrPH
Washington University in St. Louis
Cancer Research
Dynamic prediction with repeated mammogram images improves 5-year breast cancer risk performance
Background: Increasing use of digital screening mammography opens potential to add individualized dynamic measures of breast image history to predict future breast cancer risk. Current prediction models use a single mammogram measure, usually breast density, at one point in time, to add to prediction models based on established risk factors. Our previous work has shown that change in each breast over time is related to risk of diagnosis with breast cancer. Methods: We apply a dynamic prediction model that includes full field digital mammogram in each breast to predict future 5-year risk. Data from Joanne Knight Breast Health Cohort (JKBHC) serve as training population with 290 pathology-confirmed breast cancer cases and 658 controls sampled with a total number of 8,712 craniocaudal (CC) view mammograms for analysis. Nurses’ Health Study 2 (NHS2) serves as the external validation with 188 cases …
2024/3/22
Graham Colditz, MD, DrPH
Washington University in St. Louis
Systems and methods for image alignment and registration
Among the various aspects of the present disclosure is the provision of an image alignment and registration system and a breast cancer risk prediction system.
2024/1/18
Graham Colditz, MD, DrPH
Washington University in St. Louis
Handbook of Health and Rehabilitation Psychology
Physical activity and health
Increasing exercise and physical activity participation is currently a major health objective in the United States (US Department of Health and Human Services, 1990). People who are active and fit are likely to live longer than those who are not and they are more likely to avoid several debilitating chronic disease conditions. In fact, evidence relating exercise to health now suggests that a larger propor-
1995/10/31
Other articles from The Annals of Applied Statistics journal
Kun Chen
University of Connecticut
The Annals of Applied Statistics
Tensor regression for incomplete observations with application to longitudinal studies
The supplementary material contains the extension of the proposed method to log-contrast models, the derivation of the model fitting algorithms, and additional numerical results for simulation and real data studies.
2024/6
Peter A. Sims
Columbia University in the City of New York
The Annals of Applied Statistics
RZiMM-scRNA: A regularized zero-inflated mixture model framework for single-cell RNA-seq data
We provide additional plots and tables for the results section in Supplementary Information.
2024/3
Wensheng Guo
University of Pennsylvania
The Annals of Applied Statistics
Semiparametric bivariate hierarchical state space model with application to hormone circadian relationship
In the supplementary material, we provide detailed calculations for the EM algorithm and the likelihood ratio test. Additionally, we include further results from the application analysis and simulation studies.
2024/6
Tianchen Xu
Columbia University in the City of New York
The Annals of Applied Statistics
Tensor regression for incomplete observations with application to longitudinal studies
The supplementary material contains the extension of the proposed method to log-contrast models, the derivation of the model fitting algorithms, and additional numerical results for simulation and real data studies.
2024/6
Andrew D Paterson
University of Toronto
The Annals of Applied Statistics
A population-aware retrospective regression to detect genome-wide variants with sex difference in allele frequency
Supplementary figures.
2024/6
Florian Steinke
Technische Universität Darmstadt
The Annals of Applied Statistics
Generative machine learning methods for multivariate ensemble postprocessing
Ablation studies regarding the architecture and hyperparameter choices of the conditional generative model and some results not shown in the paper are provided.
2024/3
Jorge Mateu
Universidad Jaime I
The Annals of Applied Statistics
A nonseparable first-order spatiotemporal intensity for events on linear networks: An application to ambulance interventions
The supplementary material summarises the results obtained when testing the spatial predictive accuracy considering different time periods and alternative training sets. Moreover, it includes additional details on the comparison with planar and separable modelling approaches. Finally, we also report more precise details regarding the computing times on the extended road network.
2024/3
Paola Crippa
University of Notre Dame
The Annals of Applied Statistics
Sensitivity analysis of wind energy resources with Bayesian non-Gaussian and nonstationary functional ANOVA
The supplement contains additional analyses and plots in support of the main findings in the paper. The code for this work is available at the following GitHub repository: github.com/Env-an-Stat-group/24.Zhang.AoAS.
2024/3
Jian Kang
University of Michigan
The Annals of Applied Statistics
Latent subgroup identification in image-on-scalar regression
In the supplementary material, we provide supplemental information about the Hermite polynomials and basis function construction, the process of using the GPfit package to estimate the smoothing parameter, sensitivity analysis with varying hyperparameter values, and additional figures describing the detailed simulation and application results.
2024/3
Stefano Mazzuco
Università degli Studi di Padova
The Annals of Applied Statistics
Functional concurrent regression with compositional covariates and its application to the time-varying effect of causes of death on human longevity
This directory contains the R package fcrc, including the code for reproducing the analysis, the simulation studies and all images of the paper.
2024/6
Jon Wakefield
University of Washington
The Annals of Applied Statistics
A Bayesian hierarchical small area population model accounting for data source specific methodologies from American Community Survey, Population Estimates Program, and …
Appendices A–H. Appendix A: Notation Table, Appendix B: Modeling Approaches for Spatiotemporal Interaction Terms, Appendix C: Summary of Model Assumptions and Justifications, Appendix D: Naive Model Description, Appendix E: Comparison of Errors, Appendix F: Validation Results for Selected Counties, Appendix G: Maps of Relative Differences, Appendix H: County Estimates for Georgia.
2024/6
Katherine P. Liao, MD, MPH
Harvard University
The Annals of Applied Statistics
Learning healthcare delivery network with longitudinal electronic health records data
This Supplementary Material includes details for marginal pseudo-likelihood, comparison of MIC to cAIC and cBIC, and sensitivity analysis of the choice of basis functions in the real data application.
2024/3
Jeffrey P. Spence
Stanford University
The Annals of Applied Statistics
A simple and flexible test of sample exchangeability with applications to statistical genomics
The Supplementary Information PDF includes technical details, proofs, and supplementary figures for our work.
2024/3
Marc G Genton
King Abdullah University of Science and Technology
The Annals of Applied Statistics
Sensitivity analysis of wind energy resources with Bayesian non-Gaussian and nonstationary functional ANOVA
The supplement contains additional analyses and plots in support of the main findings in the paper. The code for this work is available at the following GitHub repository: github.com/Env-an-Stat-group/24.Zhang.AoAS.
2024/3
Jie Peng
University of California, Davis
The Annals of Applied Statistics
Estimating fiber orientation distribution with application to study brain lateralization using HCP D-MRI data
A supplementary text with additional details on FOD estimators, synthetic experiment results, and the HCP D-MRI application.
2024/3
Bijan Niknam
Harvard University
The Annals of Applied Statistics
Privacy-preserving, communication-efficient, and target-flexible hospital quality measurement
The Supplementary Material consists of five appendices. In Appendix I we derive the form of the influence functions. In Appendix II, we show how patient-level information is not required to solve for the data-adaptive hospital-level weights (i.e., summary-level information is sufficient). In Appendix III, we prove the data-adaptive property of the hospital-level weights. Appendix IV and Appendix V contain additional results from the simulation study and real data analysis, respectively.
2024/6
Danica M. Ommen
Iowa State University
The Annals of Applied Statistics
Density-based matching rule: Optimality, estimation, and application in forensic problems
The online supplement provides additional figures described in this article and the details of non-Gaussian distributions used in the simulation study.
2024/3
Brian J Reich
North Carolina State University
The Annals of Applied Statistics
Modeling extremal streamflow using deep learning approximations and a flexible spatial process
The Supplementary Material consists of three appendices. Appendix A covers some properties of the PMM and gives an overview of the variable importance measure used in the text. Appendix B presents supplementary simulation studies detailing the performance of the PMM in various density estimation and parameter estimation scenarios. Appendix C consists of additional results from the HCDN data analysis, including MCMC convergence, model comparison and model fit results, and selected results from analyzing the extremal streamflow data in its original scale.
2024/6
Weining Shen
University of California, Irvine
The Annals of Applied Statistics
Risk-aware restricted outcome learning for individualized treatment regimes of schizophrenia
Additional numerical results and the algorithm description are provided.
2024/6
Theo Economou
University of Exeter
The Annals of Applied Statistics
A hierarchical spline model for correcting and hindcasting temperature data
All the data, code and Supplementary Material (including Figures S1–S5) are available online (Economou, Johnson and Dyson (2024)) but can also be accessed at Zenodo (Economou (2023)) with DOI: 10.5281/zenodo.10074436. Note that the station names have been anonymised for confidentiality purposes. This repository comprises a single zipped file, code_data_supplementary_plots_v2.zip, which includes all the supplementary figures referenced in the paper: FigureS1.pdf: trace plot of the MCMC samples for the deviance. FigureS2.pdf: predicted vs observed Tmax values for each station. FigureS3.pdf: QQ plot for each station. FigureS4.pdf: Empirical and predicted autocorrelation plots for the 10 stations with long enough time series. FigureS5.pdf: Empirical autocorrelation plots for all stations (except 11, 16, 19 and 21) up to lag 30. Figure3_All_Stations.pdf: Same as Figure 3 but for all stations …
2024/6