Jiguo Cao

Simon Fraser University

H-index: 28

North America-Canada

About Jiguo Cao

Jiguo Cao is a researcher at Simon Fraser University with an h-index of 28 overall and 21 since 2020. He specializes in Functional Data Analysis, Estimating Differential Equations, and Machine Learning.

His recent articles reflect a diverse array of research interests and contributions to the field:

1 Product of Normal Densities

Functional Autoencoder for Smoothing and Representation Learning

Functional nonlinear learning

How are PreLaunch online movie reviews related to box office revenues?

Functional partial least squares with censored outcomes: Prediction of breast cancer risk with mammogram images

Faster Asymptotic Solutions for N-Mixtures on Large Populations

Learning from limited temporal data: Dynamically sparse historical functional linear models with applications to Earth science

Predicting Alzheimer’s disease progression in healthy and MCI subjects using multi‐modal deep learning approach

Jiguo Cao Information

University

Simon Fraser University

Position

___

Citations(all)

3654

Citations(since 2020)

1742

Cited By

2540

hIndex(all)

28

hIndex(since 2020)

21

i10Index(all)

55

i10Index(since 2020)

46

Email

University Profile Page

Simon Fraser University

Jiguo Cao Skills & Research Interests

Functional Data Analysis

Estimating Differential Equations

Machine Learning

Top articles of Jiguo Cao

1 Product of Normal Densities

Authors

Matthew RP Parker,Jiguo Cao,Laura LE Cowen,Lloyd T Elliott

Published Date

2024/3/4

Supplemental Figure 3: Parameter ground truths and parameter estimates from fitting asymptotic and unmarked (traditional) N-mixture models with K= 300. Estimated parameters are λ (initial mean site abundance), γ (importation rate), ω (survival probability), and p (probability of detection). A total of 1, 1000 simulations are represented.

Functional Autoencoder for Smoothing and Representation Learning

Authors

Sidi Wu,Cédric Beaulac,Jiguo Cao

Journal

arXiv preprint arXiv:2401.09499

Published Date

2024/1/17

A common pipeline in functional data analysis is to first convert the discretely observed data to smooth functions, and then represent the functions by a finite-dimensional vector of coefficients summarizing the information. Existing methods for data smoothing and dimensional reduction mainly focus on learning the linear mappings from the data space to the representation space, however, learning only the linear representations may not be sufficient. In this study, we propose to learn the nonlinear representations of functional data using neural network autoencoders designed to process data in the form it is usually collected without the need of preprocessing. We design the encoder to employ a projection layer computing the weighted inner product of the functional data and functional weights over the observed timestamp, and the decoder to apply a recovery layer that maps the finite-dimensional vector extracted from the functional data back to functional space using a set of predetermined basis functions. The developed architecture can accommodate both regularly and irregularly spaced data. Our experiments demonstrate that the proposed method outperforms functional principal component analysis in terms of prediction and classification, and maintains superior smoothing ability and better computational efficiency in comparison to the conventional autoencoders under both linear and nonlinear settings.
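The projection layer described in this abstract can be illustrated with a small sketch. It approximates the inner product of an observed curve with each functional weight by a trapezoidal rule over the observed timestamps, which is one simple way to handle irregular spacing. The function names, the choice of trapezoidal quadrature, and the two example weight functions are assumptions made for illustration; they are not taken from the paper, where the functional weights would be learned during training.

```python
import math

def trapezoid_weights(times):
    """Trapezoidal quadrature weights on a possibly irregular time grid."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two observation times")
    w = [0.0] * n
    for j in range(n - 1):
        half = 0.5 * (times[j + 1] - times[j])
        w[j] += half
        w[j + 1] += half
    return w

def projection_layer(values, times, weight_functions):
    """Approximate the integral of x(t) * w_k(t) dt for each functional weight w_k."""
    q = trapezoid_weights(times)
    return [
        sum(qj * xj * wk(tj) for qj, xj, tj in zip(q, values, times))
        for wk in weight_functions
    ]

# Irregularly spaced observations of the curve x(t) = t on [0, 1].
times = [0.0, 0.1, 0.35, 0.6, 1.0]
values = list(times)

# Two hypothetical functional weights (fixed here, learned in a real network).
weights = [lambda t: 1.0, lambda t: math.sin(math.pi * t)]

# One finite-dimensional feature per functional weight.
features = projection_layer(values, times, weights)
```

Because the quadrature weights are computed from the timestamps themselves, the same function works for regular and irregular grids without any smoothing step beforehand.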

Functional nonlinear learning

Authors

Haixu Wang,Jiguo Cao

Journal

Journal of Computational and Graphical Statistics

Published Date

2024/1/2

Using representations of functional data can be more convenient and beneficial in subsequent statistical models than direct observations. These representations, in a lower-dimensional space, extract and compress information from individual curves. The existing representation learning approaches in functional data analysis usually use linear mapping in parallel to those from multivariate analysis, for example, functional principal component analysis (FPCA). However, functions, as infinite-dimensional objects, sometimes have nonlinear structures that cannot be uncovered by linear mapping. Linear methods will be more overwhelmed by multivariate functional data. For that matter, this article proposes a functional nonlinear learning (FunNoL) method to sufficiently represent multivariate functional data in a lower-dimensional feature space. Furthermore, we merge a classification model for enriching the ability of …

How are PreLaunch online movie reviews related to box office revenues?

Authors

Tianyu Guan,Jason Ho,Robert Krider,Jiguo Cao,Andrew Fogg

Journal

The Annals of Applied Statistics

Published Date

2024/6

The Supplementary Material contains the mean curves and the top three FPCs estimated from the observed quantile trajectories at quantile levels 0.1, 0.25, 0.5, and 0.9 and additional results for predicting the week two box office revenues.

Functional partial least squares with censored outcomes: Prediction of breast cancer risk with mammogram images

Authors

Shu Jiang,Jiguo Cao,Graham A Colditz

Journal

The Annals of Applied Statistics

Published Date

2024/6

Tables, figures, and additional simulation and application results.

Faster Asymptotic Solutions for N-Mixtures on Large Populations

Authors

MRP Parker,J Cao,LLE Cowen,LT Elliott

Journal

Journal of Agricultural, Biological and Environmental Statistics

Published Date

2024/3/30

We derive an asymptotic likelihood function for open-population N-mixture models and show that it has favorable computational complexity and accuracy when compared to the traditional likelihood function for large population sizes. We validate our asymptotic model with simulation studies and apply our model to estimate the population size of Ancient Murrelet chicks, comparing against results obtained using the traditional N-mixture likelihood and an alternative asymptotic model based on the multivariate normal distribution. For the Ancient Murrelet case study, our asymptotic model computes twice as fast as the traditional models, eleven times faster when parallel processing is used, and provides higher-precision estimates than the asymptotic multivariate normal model. We provide an open-source implementation of our methods in the quickNmix R package. Supplementary material to this paper is provided online.

Learning from limited temporal data: Dynamically sparse historical functional linear models with applications to Earth science

Authors

Joseph Janssen,Shizhe Meng,Asad Haris,Stefan Schrunner,Jiguo Cao,William J Welch,Nadja Kunz,Ali A Ameli

Journal

arXiv preprint arXiv:2303.06501

Published Date

2023/3/11

Scientists and statisticians often want to learn about the complex relationships that connect two variables that vary over time. Recent work on sparse functional historical linear models confirms that they are promising for this purpose, but several notable limitations exist. Most importantly, previous works have imposed sparsity on the coefficient function, but have not allowed the sparsity, hence lag, to vary with time. We simplify the framework of sparse functional historical linear models by using a rectangular coefficient structure along with Whittaker smoothing, then relax the previous frameworks by estimating the dynamic time lag from a hierarchical coefficient structure. We motivate our study by aiming to extract the physical rainfall-runoff processes hidden within hydrological data. We show the promise and accuracy of our method using four simulation studies, justified by two real sets of hydrological data.

Predicting Alzheimer’s disease progression in healthy and MCI subjects using multi‐modal deep learning approach

Authors

Ghazal Mirabnahrazam,Da Ma,Cédric Beaulac,Sieun Lee,Karteek Popuri,Hyunwoo Lee,Jiguo Cao,Lei Wang,James E Galvin,Mirza Faisal Beg

Journal

Alzheimer's & Dementia

Published Date

2023/6

Background: Alzheimer’s disease (AD) is a complex disorder influenced by many factors, but it is unclear how each factor contributes to disease progression. An in‐depth examination of these factors may yield an accurate estimate of time‐to‐conversion to AD for patients at various disease stages. Recent advances in deep learning have enabled researchers to predict patient’s disease onset time by exploring the influencing factors in AD progression. Method: We used 543 subjects with 63 features from 3 data modalities from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. The following modalities were used: 1) MRI, 2) genetic and 3) DTC (Demographic, cognitive Tests and Cerebrospinal fluid). The 21 most important features were automatically selected for the three modalities. We used a Deep Learning‐based survival analysis model that extends the classic Cox regression model to predict the …

Neural networks for scalar input and functional output

Authors

Sidi Wu,Cédric Beaulac,Jiguo Cao

Journal

Statistics and Computing

Published Date

2023

The regression of a functional response on a set of scalar predictors can be a challenging task, especially if there is a large number of predictors, or the relationship between those predictors and the response is nonlinear. In this work, we propose a solution to this problem: a feed-forward neural network (NN) designed to predict a functional response using scalar inputs. First, we transform the functional response to a finite-dimensional representation and construct an NN that outputs this representation. Then, we propose to modify the output of an NN via the objective function and introduce different objective functions for network training. The proposed models are suited for both regularly and irregularly spaced data, and a roughness penalty can be further applied to control the smoothness of the predicted curve. The difficulty in implementing both those features lies in the definition of objective functions that can be …

A dynamic interaction semiparametric function-on-scalar model

Authors

Hua Liu,Jinhong You,Jiguo Cao

Journal

Journal of the American Statistical Association

Published Date

2023/1/2

Motivated by recent work studying massive functional data, such as the COVID-19 data, we propose a new dynamic interaction semiparametric function-on-scalar (DISeF) model. The proposed model is useful to explore the dynamic interaction among a set of covariates and their effects on the functional response. The proposed model includes many important models investigated recently as special cases. By tensor product B-spline approximating the unknown bivariate coefficient functions, a three-step efficient estimation procedure is developed to iteratively estimate bivariate varying-coefficient functions, the vector of index parameters, and the covariance functions of random effects. We also establish the asymptotic properties of the estimators including the convergence rate and their asymptotic distributions. In addition, we develop a test statistic to check whether the dynamic interaction varies with time/spatial …

Computational Efficiency and Precision for Replicated-Count and Batch-Marked Hidden Population Models

Authors

Matthew RP Parker,Laura LE Cowen,Jiguo Cao,Lloyd T Elliott

Journal

Journal of Agricultural, Biological and Environmental Statistics

Published Date

2023/3

We address two computational issues common to open-population N-mixture models, hidden integer-valued autoregressive models, and some hidden Markov models. The first issue is computation time, which can be dramatically improved through the use of a fast Fourier transform. The second issue is tractability of the model likelihood function for large numbers of hidden states, which can be solved by improving numerical stability of calculations. As an illustrative example, we detail the application of these methods to the open-population N-mixture models. We compare computational efficiency and precision between these methods and standard methods employed by state-of-the-art ecological software. We show faster computing times (a to times speed improvement for population size upper bounds of 500 and 1000, respectively) over state-of-the-art ecological software for N-mixture models. We also …

A Gaussian Sliding Windows Regression Model for Hydrological Inference

Authors

Stefan Schrunner,Joseph Janssen,Anna Jenul,Jiguo Cao,Ali A Ameli,William J Welch

Journal

arXiv preprint arXiv:2306.00453

Published Date

2023/6/1

Statistical models are an essential tool to model, forecast and understand the hydrological processes in watersheds. In particular, the modeling of time lags associated with the time between rainfall occurrence and subsequent changes in streamflow, is of high practical importance. Since water can take a variety of flowpaths to generate streamflow, a series of distinct runoff pulses from different flowpath may combine to create the observed streamflow time series. Current state-of-the-art models are not able to sufficiently confront the problem complexity with interpretable parametrization, which would allow insights into the dynamics of the distinct flow paths for hydrological inference. The proposed Gaussian Sliding Windows Regression Model targets this problem by combining the concept of multiple windows sliding along the time axis with multiple linear regression. The window kernels, which indicate the weights applied to different time lags, are implemented via Gaussian-shaped kernels. As a result, each window can represent one flowpath and, thus, offers the potential for straightforward process inference. Experiments on simulated and real-world scenarios underline that the proposed model achieves accurate parameter estimates and competitive predictive performance, while fostering explainable and interpretable hydrological modeling.
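The idea in this abstract, Gaussian-shaped kernels over time lags combined through a linear regression, can be sketched as follows. Each window turns past rainfall into one weighted-sum feature, and the predicted streamflow is a linear combination of those features. The parametrization (a center and width per window, kernels normalised to sum to one, a fixed maximum lag) and all names and numbers are assumptions for illustration; the paper's exact formulation and fitting procedure are not given here.

```python
import math

def gaussian_window(center, width, max_lag):
    """Gaussian-shaped weights over lags 0..max_lag, normalised to sum to one."""
    raw = [math.exp(-0.5 * ((lag - center) / width) ** 2) for lag in range(max_lag + 1)]
    total = sum(raw)
    return [r / total for r in raw]

def window_feature(rain, t, kernel):
    """Weighted sum of past rainfall at time t; lags before the record start are skipped."""
    return sum(w * rain[t - k] for k, w in enumerate(kernel) if t - k >= 0)

def predict_streamflow(rain, windows, betas, max_lag, intercept=0.0):
    """Linear combination of one window feature per (center, width) pair."""
    kernels = [gaussian_window(c, s, max_lag) for c, s in windows]
    return [
        intercept + sum(b * window_feature(rain, t, k) for b, k in zip(betas, kernels))
        for t in range(len(rain))
    ]

# A single rainfall pulse at time step 1, followed by dry steps.
rain = [0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# Two hypothetical flow paths: a fast one (short lag) and a slow one (longer lag).
windows = [(1.0, 0.5), (4.0, 1.0)]
betas = [0.6, 0.3]

flow = predict_streamflow(rain, windows, betas, max_lag=6)
```

In this toy setup each window's center plays the role of a typical delay for one flow path, which is what makes the fitted parameters readable; here the parameters are fixed by hand rather than estimated from data.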

Unsupervised learning on US weather forecast performance

Authors

Chuyuan Lin,Ying Yu,Lucas Y Wu,Jiguo Cao

Journal

Computational Statistics

Published Date

2023/9

Nowadays, climate events and weather predictions have a huge impact on human activities. To understand the accuracy of weather prediction, we applied the functional principal component analysis (FPCA) method to investigate the main pattern of variance within the U.S. weather prediction error over a period of 3 years. We further grouped the states in the U.S. based on their similarity in weather forecast performance using two types of functional clustering approaches: the filtering method and the model-based method. The strengths and weaknesses of each clustering method were detected through the simulation studies. Then, the clustering approaches were applied to U.S. weather data from 2014 to 2017. Through clustering, cluster-specific patterns were visually detected, and the cluster-to-cluster differences were quantified in order to identify the most and least predictable U.S. states.

Deep learning with functional inputs

Authors

Barinder Thind,Kevin Multani,Jiguo Cao

Journal

Journal of Computational and Graphical Statistics

Published Date

2023/1/2

We present a methodology for integrating functional data into deep neural networks. The model is defined for scalar responses with multiple functional and scalar covariates. A by-product of the method is a set of dynamic functional weights that can be visualized during the optimization process. This visualization leads to a greater interpretability of the relationship between the covariates and the response relative to conventional neural networks. The model is shown to perform well in a number of contexts including prediction of new data and recovery of the true underlying relationship between the functional covariate and scalar response; these results were confirmed through real data applications and simulation studies. An R package (FuncNN) has also been developed on top of Keras, a popular deep learning library—this allows for general use of the approach. A supplemental document, the data and R codes are …

Estimating functional single index models with compact support

Authors

Yunlong Nie,Liangliang Wang,Jiguo Cao

Journal

Environmetrics

Published Date

2023/3

The functional single index models are widely used to describe the nonlinear relationship between a scalar response and a functional predictor. The conventional functional single index model assumes that the coefficient function is nonzero in the entire time domain. In other words, the functional predictor always has a nonzero effect on the response all the time. We propose a new compact functional single index model, in which the coefficient function is only nonzero in a subregion. We also propose an efficient method that can simultaneously estimate the nonlinear link function, the coefficient function and also the nonzero region of the coefficient function. Hence, our method can identify the region in which the functional predictor is related to the response. Our method is illustrated by an application example in which the total number of daily bike rentals is predicted based on hourly temperature data. The finite …

Supervised two-dimensional functional principal component analysis with time-to-event outcomes and mammogram imaging data

Authors

Shu Jiang,Jiguo Cao,Bernard Rosner,Graham A Colditz

Journal

Biometrics

Published Date

2023/6

Screening mammography aims to identify breast cancer early and secondarily measures breast density to classify women at higher or lower than average risk for future breast cancer in the general population. Despite the strong association of individual mammography features to breast cancer risk, the statistical literature on mammogram imaging data is limited. While functional principal component analysis (FPCA) has been studied in the literature for extracting image-based features, it is conducted independently of the time-to-event response variable. With the consideration of building a prognostic model for precision prevention, we present a set of flexible methods, supervised FPCA (sFPCA) and functional partial least squares (FPLS), to extract image-based features associated with the failure time while accommodating the added complication from right censoring. Throughout the article, we hope to …

Nonlinear prediction of functional time series

Authors

Haixu Wang,Jiguo Cao

Journal

Environmetrics

Published Date

2023/8

We propose a nonlinear prediction (NOP) method for functional time series. Conventional methods for functional time series are mainly based on functional principal component analysis or functional regression models. These approaches rely on the stationary or linear assumption of the functional time series. However, real data sets are often nonstationary, and the temporal dependence between trajectories cannot be captured by linear models. Conventional methods are also hard to analyze multivariate functional time series. To tackle these challenges, the NOP method employs a nonlinear mapping for functional data that can be directly applied to multivariate functions without any preprocessing step. The NOP method constructs feature space with forecast information, hence it provides a better ground for predicting future trajectories. The NOP method avoids calculating covariance functions and enables online …

Functional L-Optimality Subsampling for Functional Generalized Linear Models with Massive Data

Authors

Hua Liu,Jinhong You,Jiguo Cao

Journal

Journal of Machine Learning Research

Published Date

2023

Massive data bring the big challenges of memory and computation for analysis. These challenges can be tackled by taking subsamples from the full data as a surrogate. For functional data, it is common to collect multiple measurements over their domains, which require even more memory and computation time when the sample size is large. The computation would be much more intensive when statistical inference is required through bootstrap samples. Motivated by analyzing large-scale kidney transplant data, we propose an optimal subsampling method based on the functional L-optimality criterion for functional generalized linear models. To the best of our knowledge, this is the first attempt to propose a subsampling method for functional data analysis. The asymptotic properties of the resultant estimators are also established. The analysis results from extensive simulation studies and from the kidney transplant data show that the functional L-optimality subsampling (FLoS) method is much better than the uniform subsampling approach and can well approximate the results based on the full data while dramatically reducing the computation time and memory.

Automatic search intervals for the smoothing parameter in penalized splines

Authors

Zheyuan Li,Jiguo Cao

Journal

Statistics and Computing

Published Date

2023/2

The selection of smoothing parameter is central to the estimation of penalized splines. The best value of the smoothing parameter is often the one that optimizes a smoothness selection criterion, such as generalized cross-validation error (GCV) and restricted likelihood (REML). To correctly identify the global optimum rather than being trapped in an undesired local optimum, grid search is recommended for optimization. Unfortunately, the grid search method requires a pre-specified search interval that contains the unknown global optimum, yet no guideline is available for providing this interval. As a result, practitioners have to find it by trial and error. To overcome such difficulty, we develop novel algorithms to automatically find this interval. Our automatic search interval has four advantages. (i) It specifies a smoothing parameter range where the associated penalized least squares problem is numerically solvable. (ii) It is …

Identifying regions of interest in mammogram images

Authors

Shu Jiang,Jiguo Cao,Graham A Colditz

Journal

Statistical methods in medical research

Published Date

2023/5

Screening mammography is the primary preventive strategy for early detection of breast cancer and an essential input to breast cancer risk prediction and application of prevention/risk management guidelines. Identifying regions of interest within mammogram images that are associated with 5- or 10-year breast cancer risk is therefore clinically meaningful. The problem is complicated by the irregular boundary issue posed by the semi-circular domain of the breast area within mammograms. Accommodating the irregular domain is especially crucial when identifying regions of interest, as the true signal comes only from the semi-circular domain of the breast region, and noise elsewhere. We address these challenges by introducing a proportional hazards model with imaging predictors characterized by bivariate splines over triangulation. The model sparsity is enforced with the group lasso penalty function. We apply …

Jiguo Cao FAQs

What is Jiguo Cao's h-index at Simon Fraser University?

The h-index of Jiguo Cao has been 21 since 2020 and 28 in total.

What are Jiguo Cao's top articles?

The top articles of Jiguo Cao at Simon Fraser University include:

1 Product of Normal Densities

Functional Autoencoder for Smoothing and Representation Learning

Functional nonlinear learning

How are PreLaunch online movie reviews related to box office revenues?

Functional partial least squares with censored outcomes: Prediction of breast cancer risk with mammogram images

Faster Asymptotic Solutions for N-Mixtures on Large Populations

Learning from limited temporal data: Dynamically sparse historical functional linear models with applications to Earth science

Predicting Alzheimer’s disease progression in healthy and MCI subjects using multi‐modal deep learning approach

...

What are Jiguo Cao's research interests?

The research interests of Jiguo Cao are Functional Data Analysis, Estimating Differential Equations, and Machine Learning.

What is Jiguo Cao's total number of citations?

Jiguo Cao has 3,654 citations in total.

Who are the co-authors of Jiguo Cao?

The co-authors of Jiguo Cao are Hongyu Zhao, Ke Wang, Giles Hooker, Wei Luo, Zhong Wang, Arnab Maity.

    Co-Authors

    Hongyu Zhao, Yale University (H-index: 112)
    Ke Wang, Simon Fraser University (H-index: 69)
    Giles Hooker, Cornell University (H-index: 37)
    Wei Luo, Deakin University (H-index: 26)
    Zhong Wang, Cornell University (H-index: 24)
    Arnab Maity, North Carolina State University (H-index: 23)
