Functional partial least squares with censored outcomes: Prediction of breast cancer risk with mammogram images

The Annals of Applied Statistics

Published On 2024/6

Tables, figures, and additional simulation and application results.

Volume 18

Issue 2

Pages 1051-1063

Authors

Graham Colditz, MD, DrPH

Washington University in St. Louis

Position: Professor, Dept. of Surgery

H-Index (all): 309

H-Index (since 2020): 125

Research Interests: Epidemiology, Cancer Prevention, Health Outcomes

Jiguo Cao

Simon Fraser University

H-Index (all): 28

H-Index (since 2020): 21

Research Interests: Functional Data Analysis, Estimating Differential Equations, Machine Learning

Other Articles from authors

Jiguo Cao

Simon Fraser University

1 Product of Normal Densities

Supplemental Figure 3: Parameter ground truths and parameter estimates from fitting asymptotic and unmarked (traditional) N-mixture models with K = 300. Estimated parameters are λ (initial mean site abundance), γ (importation rate), ω (survival probability), and p (probability of detection). A total of 1, 1000 simulations are represented.

Graham Colditz, MD, DrPH

Washington University in St. Louis

Cancer Research

Longitudinal latent class analysis to further understand trajectory of density over time and risk of breast cancer

Background: It is clinically important to refine strategies to manage women with dense breasts as they represent approximately 50% of all women screened. Change in breast density is related to risk of breast cancer [1]. We also must understand this change in density over time in women with dense and non-dense breasts. While population level data have been reported to estimate decrease in density with age, this has largely used digitized film images [2]. Expanding use of digital mammography and repeated screening generates a large library of images for each woman, offering the potential to use more of the image data. Methods: We use longitudinal latent class analysis to cluster women and estimate their change in density over time. To evaluate the groups of women with different patterns of change in density we fit latent class models to our previously published data [1]. This includes 289 pathology confirmed …

Graham Colditz, MD, DrPH

Washington University in St. Louis

Cancer Medicine

Racial differences in treatment and survival among older patients with multiple myeloma

Background: Treatments for multiple myeloma (MM) have evolved over time and improved MM survival. While racial differences in MM treatment and prognosis between non‐Hispanic African American (NHAA) and non‐Hispanic White (NHW) patients are well‐established, it is unclear whether they have persisted after the introduction of novel agents. Methods: Using the Surveillance, Epidemiology, and End Results‐Medicare linked database, our study investigated racial difference in the receipt of treatment within 1 year following diagnosis and assessed survival outcomes among Medicare beneficiaries (≥66 years) diagnosed with MM from 2007 to 2017. We applied multivariable Cox proportional hazards models to estimate the association between race and survival and presented hazard ratios (HRs). Results: Of 2094 NHAA and 11,983 NHW older patients with MM, 59.5% and 64.8% received treatment …

Jiguo Cao

Simon Fraser University

arXiv preprint arXiv:2401.09499

Functional Autoencoder for Smoothing and Representation Learning

A common pipeline in functional data analysis is to first convert the discretely observed data to smooth functions, and then represent the functions by a finite-dimensional vector of coefficients summarizing the information. Existing methods for data smoothing and dimension reduction mainly focus on learning the linear mappings from the data space to the representation space; however, learning only the linear representations may not be sufficient. In this study, we propose to learn the nonlinear representations of functional data using neural network autoencoders designed to process data in the form it is usually collected without the need for preprocessing. We design the encoder to employ a projection layer computing the weighted inner product of the functional data and functional weights over the observed timestamp, and the decoder to apply a recovery layer that maps the finite-dimensional vector extracted from the functional data back to functional space using a set of predetermined basis functions. The developed architecture can accommodate both regularly and irregularly spaced data. Our experiments demonstrate that the proposed method outperforms functional principal component analysis in terms of prediction and classification, and maintains superior smoothing ability and better computational efficiency in comparison to the conventional autoencoders under both linear and nonlinear settings.

Graham Colditz, MD, DrPH

Washington University in St. Louis

Cancer Research

Abstract PO5-09-02: Breast Tissue Proteomic Profile of Breast Cancer in Premenopausal Women and Association with Mammographic Breast Density

Introduction: Breast cancer incidence is rising in premenopausal women; hence, there is a critical need to understand factors underlying premenopausal breast cancer development in order to guide targeted prevention. Mammographic breast density is a strong risk factor for, as well as an intermediate phenotype for, premenopausal breast cancer. Yet, the molecular mechanisms underlying the associations of dense breasts with breast cancer are not well understood. Our objectives in this study are to perform proteomic analysis in breast tissues to (i) identify proteins that are associated with breast cancer development in premenopausal women; (ii) determine which of these proteins are also associated with dense breasts. Methods: We performed proteomic analysis on tumor and adjacent normal tissues from 50 premenopausal women with breast cancer who had breast tissue samples archived at the St. Louis Breast Tumor …

Graham Colditz, MD, DrPH

Washington University in St. Louis

Breast Cancer Research

How does weight gain since the age of 18 years affect breast cancer risk in later life? A meta-analysis

Early life factors are important risk factors for breast cancer. The association between weight gain after age 18 and breast cancer risk is inconsistent across previous epidemiologic studies. To evaluate this association, we conducted a meta-analysis according to PRISMA guidelines and the established inclusion criteria. We performed a comprehensive literature search using Medline (Ovid), Embase, Scopus, Cochrane Library, and ClinicalTrials.gov to identify relevant studies published before June 3, 2022. Two reviewers independently reviewed the articles for final inclusion. Seventeen out of 4,725 unique studies met the selection criteria. The quality of studies was assessed using the Newcastle-Ottawa Scale (NOS), and all were of moderate to high quality with NOS scores ranging from 5 to 8. We included 17 studies (11 case-control, 6 cohort) in final analysis. In case-control studies, weight gain after age 18 was …

Graham Colditz, MD, DrPH

Washington University in St. Louis

Journal of Clinical Oncology

Plant Foods Intake and Risk of Premature Aging in Adult Survivors of Childhood Cancer in the St Jude Lifetime Cohort (SJLIFE)

PURPOSE: To identify dietary factors that are related to premature aging in adult survivors of childhood cancer, we examined the associations between plant food intakes and age-related deficit accumulation. METHODS: A total of 3,322 childhood cancer survivors (age 18-65 years, mean = 31, standard deviation = 8.4) in the St Jude Lifetime Cohort had total fruit, total vegetables and subgroups, whole grains, refined grains, nuts/seeds, and nutrients intake assessed using a food frequency questionnaire. Premature aging at baseline was assessed by the deficit accumulation index (DAI) and categorized as low, medium, and high risk. Multinomial logistic regressions (reference: low risk) adjusting for confounders estimated odds ratios (ORs) and 95% CIs. Multivariable linear regression of a continuous intake against a continuous DAI was also performed. RESULTS: Dark green vegetable (OR high v low = 0.47 [95% CI, 0.28 …

Jiguo Cao

Simon Fraser University

Journal of Computational and Graphical Statistics

Functional nonlinear learning

Using representations of functional data can be more convenient and beneficial in subsequent statistical models than direct observations. These representations, in a lower-dimensional space, extract and compress information from individual curves. The existing representation learning approaches in functional data analysis usually use linear mapping in parallel to those from multivariate analysis, for example, functional principal component analysis (FPCA). However, functions, as infinite-dimensional objects, sometimes have nonlinear structures that cannot be uncovered by linear mapping. Linear methods will be more overwhelmed by multivariate functional data. For that matter, this article proposes a functional nonlinear learning (FunNoL) method to sufficiently represent multivariate functional data in a lower-dimensional feature space. Furthermore, we merge a classification model for enriching the ability of …

Jiguo Cao

Simon Fraser University

The Annals of Applied Statistics

How are PreLaunch online movie reviews related to box office revenues?

The Supplementary Material contains the mean curves and the top three FPCs estimated from the observed quantile trajectories at quantile levels 0.1, 0.25, 0.5, and 0.9 and additional results for predicting the week two box office revenues.

Graham Colditz, MD, DrPH

Washington University in St. Louis

Cancer Research

Abstract PO3-09-01: Integrating pathomic and radiomic images to classify risk of subsequent events among women with DCIS

Background: Multiple sources of ~omic data can be generated from women at different stages of developing breast cancer, the leading cancer diagnosed in women worldwide. Traditionally, interrogation of risk factors to study associations and develop prediction models for future breast events has been limited to one or a few risk factors, or summary scores of clinical and tumor characteristics. Methods to bring mammography images and breast biopsies of precancer lesions together to summarize risk of cancer developing in the breast are urgently needed. Integration of these two sources has not been performed to date, but has potential to increase accuracy of risk prediction. Approach: The Repository of Archival Human Breast Tissue (RAHBT) was established in 2007 at Washington University School of Medicine (WUSM) and maintains biospecimens and medical record data of women treated with breast-conserving …

Graham Colditz, MD, DrPH

Washington University in St. Louis

Statistics in Medicine

Modeling correlated pairs of mammogram images

Mammography remains the primary screening strategy for breast cancer, which continues to be the most prevalent cancer diagnosis among women globally. Because screening mammograms capture both the left and right breast, there is a nonnegligible correlation between the pair of images. Previous studies have explored the concept of averaging between the pair of images after proper image registration; however, no comparison has been made in directly utilizing the paired images. In this paper, we extend the bivariate functional principal component analysis over triangulations to jointly characterize the pair of imaging data bounded in an irregular domain and then nest the extracted features within the survival model to predict the onset of breast cancer. The method is applied to our motivating data from the Joanne Knight Breast Health Cohort at Siteman Cancer Center. Our findings indicate that there was no …

Graham Colditz, MD, DrPH

Washington University in St. Louis

Cancer Causes & Control

Pectoral muscle removal in mammogram images: A novel approach for improved accuracy and efficiency

Purpose: Accurate pectoral muscle removal is critical in mammographic breast density estimation and many other computer-aided algorithms. We propose a novel approach to remove pectoral muscles from mediolateral oblique (MLO) view mammograms and compare accuracy and computational efficiency with an existing method (Libra). Methods: A pectoral muscle identification pipeline was developed. The image is first binarized to enhance contrast and then the Canny algorithm is applied for edge detection. Robust interpolation is used to smooth out the pectoral muscle region. Accuracy and computational speed of pectoral muscle identification were assessed using 951 women (1,902 MLO mammograms) from the Joanne Knight Breast Health Cohort at Washington University School of Medicine. Results: Our proposed algorithm exhibits lower mean error of 12.22% in comparison to Libra’s estimated error of 20.44 …

Graham Colditz, MD, DrPH

Washington University in St. Louis

screening

Most cancer deaths are unaddressed by current screening paradigms

In the United States (US), cancer is the leading cause of death among adults ≤ 80 years [1]

Graham Colditz, MD, DrPH

Washington University in St. Louis

Cancer Causes & Control

Avoiding lead-time bias by estimating stage-specific proportions of cancer and non-cancer deaths

Purpose: Understanding how stage at cancer diagnosis influences cause of death, an endpoint that is not susceptible to lead-time bias, can inform population-level outcomes of cancer screening. Methods: Using data from 17 US Surveillance, Epidemiology, and End Results registries for 1,154,515 persons aged 50–84 years at cancer diagnosis in 2006–2010, we evaluated proportional causes of death by cancer type and uniformly classified stage, following or extrapolating all patients until death through 2020. Results: Most cancer patients diagnosed at stages I–II did not go on to die from their index cancer, whereas most patients diagnosed at stage IV did. For patients diagnosed with any cancer at stages I–II, an estimated 26% of deaths were due to the index cancer, 63% due to non-cancer causes, and 12% due to a subsequent primary (non-index) cancer. In contrast, for patients diagnosed with any stage IV cancer …

Jiguo Cao

Simon Fraser University

Journal of Agricultural, Biological and Environmental Statistics

Faster Asymptotic Solutions for N-Mixtures on Large Populations

We derive an asymptotic likelihood function for open-population N-mixture models and show that it has favorable computational complexity and accuracy when compared to the traditional likelihood function for large population sizes. We validate our asymptotic model with simulation studies and apply our model to estimate the population size of Ancient Murrelet chicks, comparing against results obtained using the traditional N-mixture likelihood and an alternative asymptotic model based on the multivariate normal distribution. For the Ancient Murrelet case study, our asymptotic model computes twice as fast as the traditional models, eleven times faster when parallel processing is used, and provides higher-precision estimates than the asymptotic multivariate normal model. We provide an open-source implementation of our methods in the quickNmix R package. Supplementary material to this paper is provided online.

Graham Colditz, MD, DrPH

Washington University in St. Louis

Cancer Research

Dynamic prediction with repeated mammogram images improves 5-year breast cancer risk performance

Background: Increasing use of digital screening mammography opens potential to add individualized dynamic measures of breast image history to predict future breast cancer risk. Current prediction models use a single mammogram measure, usually breast density, at one point in time, to add to prediction models based on established risk factors. Our previous work has shown that change in each breast over time is related to risk of diagnosis with breast cancer. Methods: We apply a dynamic prediction model that includes full field digital mammogram in each breast to predict future 5-year risk. Data from Joanne Knight Breast Health Cohort (JKBHC) serve as training population with 290 pathology-confirmed breast cancer cases and 658 controls sampled with a total number of 8,712 craniocaudal (CC) view mammograms for analysis. Nurses’ Health Study 2 (NHS2) serves as the external validation with 188 cases …

Graham Colditz, MD, DrPH

Washington University in St. Louis

Systems and methods for image alignment and registration

Among the various aspects of the present disclosure are the provision of an image alignment and registration system and a breast cancer risk prediction system.

Graham Colditz, MD, DrPH

Washington University in St. Louis

Handbook of Health and Rehabilitation Psychology

Physical activity and health

Increasing exercise and physical activity participation is currently a major health objective in the United States (US Department of Health and Human Services, 1990). People who are active and fit are likely to live longer than those who are not and they are more likely to avoid several debilitating chronic disease conditions. In fact, evidence relating exercise to health now suggests that a larger propor-

1995/10/31

Other articles from The Annals of Applied Statistics journal

Kun Chen

University of Connecticut

The Annals of Applied Statistics

Tensor regression for incomplete observations with application to longitudinal studies

The supplementary material contains the extension of the proposed method to log-contrast models, the derivation of the model fitting algorithms, and additional numerical results for simulation and real data studies.

Peter A. Sims

Columbia University in the City of New York

The Annals of Applied Statistics

RZiMM-scRNA: A regularized zero-inflated mixture model framework for single-cell RNA-seq data

We provide additional plots and tables for the results section in Supplementary Information.

Wensheng Guo

University of Pennsylvania

The Annals of Applied Statistics

Semiparametric bivariate hierarchical state space model with application to hormone circadian relationship

In the supplementary material, we provide detailed calculations for the EM algorithm and the likelihood ratio test. Additionally, we include further results from the application analysis and simulation studies.

Tianchen Xu

Columbia University in the City of New York

The Annals of Applied Statistics

Tensor regression for incomplete observations with application to longitudinal studies

The supplementary material contains the extension of the proposed method to log-contrast models, the derivation of the model fitting algorithms, and additional numerical results for simulation and real data studies.

Florian Steinke

Technische Universität Darmstadt

The Annals of Applied Statistics

Generative machine learning methods for multivariate ensemble postprocessing

Ablation studies regarding the architecture and hyperparameter choices of the conditional generative model and some results not shown in the paper are provided.

Jorge Mateu

Universidad Jaime I

The Annals of Applied Statistics

A nonseparable first-order spatiotemporal intensity for events on linear networks: An application to ambulance interventions

The supplementary material summarises the results obtained when testing the spatial predictive accuracy considering different time periods and alternative training sets. Moreover, it includes additional details on the comparison with planar and separable modelling approaches. Finally, we also report more precise details regarding the computing times on the extended road network.

Paola Crippa

University of Notre Dame

The Annals of Applied Statistics

Sensitivity analysis of wind energy resources with Bayesian non-Gaussian and nonstationary functional ANOVA

The supplement contains additional analyses and plots in support of the main findings in the paper. The code for this work is available at the following GitHub repository: github.com/Env-an-Stat-group/24.Zhang.AoAS.

Jian Kang

University of Michigan

The Annals of Applied Statistics

Latent subgroup identification in image-on-scalar regression

In the supplementary material, we provide supplemental information about the Hermite polynomials and basis function construction, the process of using the GPfit package to estimate the smoothing parameter, sensitivity analysis with varying hyperparameter values, and additional figures describing the detailed simulation and application results.

Stefano Mazzuco

Università degli Studi di Padova

The Annals of Applied Statistics

Functional concurrent regression with compositional covariates and its application to the time-varying effect of causes of death on human longevity

This directory contains the R package fcrc, including the code for reproducing the analysis, the simulation studies and all images of the paper.

Jon Wakefield

University of Washington

The Annals of Applied Statistics

A Bayesian hierarchical small area population model accounting for data source specific methodologies from American Community Survey, Population Estimates Program, and …

Appendices A–H. Appendix A: Notation Table, Appendix B: Modeling Approaches for Spatiotemporal Interaction Terms, Appendix C: Summary of Model Assumptions and Justifications, Appendix D: Naive Model Description, Appendix E: Comparison of Errors, Appendix F: Validation Results for Selected Counties, Appendix G: Maps of Relative Differences, Appendix H: County Estimates for Georgia.

Katherine P. Liao, MD, MPH

Harvard University

The Annals of Applied Statistics

Learning healthcare delivery network with longitudinal electronic health records data

This Supplementary Material includes details for marginal pseudo-likelihood, comparison of MIC to cAIC and cBIC, and sensitivity analysis of the choice of basis functions in the real data application.

Jeffrey P. Spence

Stanford University

The Annals of Applied Statistics

A simple and flexible test of sample exchangeability with applications to statistical genomics

The Supplementary Information PDF includes technical details, proofs, and supplementary figures for our work.

Marc G Genton

King Abdullah University of Science and Technology

The Annals of Applied Statistics

Sensitivity analysis of wind energy resources with Bayesian non-Gaussian and nonstationary functional ANOVA

The supplement contains additional analyses and plots in support of the main findings in the paper. The code for this work is available at the following GitHub repository: github.com/Env-an-Stat-group/24.Zhang.AoAS.

Jie Peng

University of California, Davis

The Annals of Applied Statistics

Estimating fiber orientation distribution with application to study brain lateralization using HCP D-MRI data

A supplementary text with additional details on FOD estimators, synthetic experiment results, and the HCP D-MRI application.

Bijan Niknam

Harvard University

The Annals of Applied Statistics

Privacy-preserving, communication-efficient, and target-flexible hospital quality measurement

The Supplementary Material consists of five appendices. In Appendix I we derive the form of the influence functions. In Appendix II, we show how patient-level information is not required to solve for the data-adaptive hospital-level weights (i.e., summary-level information is sufficient). In Appendix III, we prove the data-adaptive property of the hospital-level weights. Appendix IV and Appendix V contain additional results from the simulation study and real data analysis, respectively.

Danica M. Ommen

Iowa State University

The Annals of Applied Statistics

Density-based matching rule: Optimality, estimation, and application in forensic problems

The online supplement provides additional figures described in this article and the details of non-Gaussian distributions used in the simulation study.

Brian J Reich

North Carolina State University

The Annals of Applied Statistics

Modeling extremal streamflow using deep learning approximations and a flexible spatial process

The Supplementary Material consists of three appendices. Appendix A covers some properties of the PMM and gives an overview of the variable importance measure used in the text. Appendix B presents supplementary simulation studies detailing the performance of the PMM in various density estimation and parameter estimation scenarios. Appendix C consists of additional results from the HCDN data analysis, including MCMC convergence, model comparison and model fit results, and selected results from analyzing the extremal streamflow data in its original scale.

Weining Shen

University of California, Irvine

The Annals of Applied Statistics

Risk-aware restricted outcome learning for individualized treatment regimes of schizophrenia

Additional numerical results and the algorithm description are provided.

Theo Economou

University of Exeter

The Annals of Applied Statistics

A hierarchical spline model for correcting and hindcasting temperature data

All the data, code and Supplementary Material (including Figures S1–S5) are available online (Economou, Johnson and Dyson (2024)) but can also be accessed at Zenodo (Economou (2023)) with DOI: 10.5281/zenodo.10074436. Note that the station names have been anonymised for confidentiality purposes. This repository comprises a single zipped file, code_data_supplementary_plots_v2.zip, which includes all the supplementary figures referenced in the paper: FigureS1.pdf: trace plot of the MCMC samples for the deviance. FigureS2.pdf: predicted vs observed Tmax values for each station. FigureS3.pdf: QQ plot for each station. FigureS4.pdf: Empirical and predicted autocorrelation plots for the 10 stations with long enough time series. FigureS5.pdf: Empirical autocorrelation plots for all stations (except 11, 16, 19 and 21) up to lag 30. Figure3_All_Stations.pdf: Same as Figure 3 but for all stations …