Jianqing Fan
Princeton University
H-index: 121
North America-United States
Description
Jianqing Fan, a distinguished researcher at Princeton University with an exceptional h-index of 121 (88 since 2020), specializes in statistics, machine learning, finance, economics, and mathematical statistics.
His recent articles reflect a diverse array of research interests and contributions to the field:
An overview of diffusion models: Applications, guided generation, statistical rates and optimization
Semi-parametric tensor factor analysis by iteratively projected singular value decomposition
Structured Matrix Learning under Arbitrary Entrywise Dependence and Estimation of Markov Transition Kernel
Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games
Policy optimization using semiparametric models for dynamic pricing
A general theory for robust clustering via trimmed mean
Ranking inferences based on the top choice of multiway comparisons
Robust Transfer Learning with Unreliable Source Data
Professor Information
University | Princeton University
---|---
Position | Professor of Statistics; Professor of Finance
Citations (all) | 84,687
Citations (since 2020) | 33,959
Cited by | 64,242
h-index (all) | 121
h-index (since 2020) | 88
i10-index (all) | 339
i10-index (since 2020) | 252
University profile page | Princeton University
Research & Interests List
Statistics
machine learning
finance
economics
mathematical statistics
Top articles of Jianqing Fan
An overview of diffusion models: Applications, guided generation, statistical rates and optimization
Diffusion models, a powerful and universal generative AI technology, have achieved tremendous success in computer vision, audio, reinforcement learning, and computational biology. In these applications, diffusion models provide flexible high-dimensional data modeling, and act as a sampler for generating new samples under active guidance towards task-desired properties. Despite the significant empirical success, the theory of diffusion models is very limited, potentially slowing down principled methodological innovations for further harnessing and improving diffusion models. In this paper, we review emerging applications of diffusion models and how they generate samples under various controls. Next, we overview the existing theories of diffusion models, covering their statistical properties and sampling capabilities. We adopt a progressive routine, beginning with unconditional diffusion models and connecting to conditional counterparts. Further, we review a new avenue in high-dimensional structured optimization through conditional diffusion models, where searching for solutions is reformulated as a conditional sampling problem and solved by diffusion models. Lastly, we discuss future directions for diffusion models. The purpose of this paper is to provide a well-rounded theoretical exposition to stimulate forward-looking theories and methods of diffusion models.
Authors
Minshuo Chen,Song Mei,Jianqing Fan,Mengdi Wang
Published Date
2024/4/11
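The sampling mechanism the abstract refers to can be illustrated without any learned score network: for a one-dimensional Gaussian target the score of every noised marginal is available in closed form, so a toy reverse-time SDE sampler fits in a few lines. This is a sketch only; the constant noise schedule, time horizon, and step count below are arbitrary choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a 1-D Gaussian N(mu0, sigma0^2), for which the score of every
# noised marginal is known in closed form -- so no neural network is needed.
mu0, sigma0 = 3.0, 0.5
beta = 1.0                         # constant noise schedule (an assumption)
T, n_steps, n_samples = 5.0, 500, 20000
dt = T / n_steps

def score(x, t):
    # Forward VP-SDE marginal at time t: N(a*mu0, a^2*sigma0^2 + (1 - a^2)),
    # where a = exp(-beta*t/2); its score is linear in x.
    a = np.exp(-0.5 * beta * t)
    var = a ** 2 * sigma0 ** 2 + (1.0 - a ** 2)
    return -(x - a * mu0) / var

# Reverse-time SDE, Euler-Maruyama, starting from the N(0, 1) prior:
# dx = [-1/2 beta x - beta * score(x, t)] dt + sqrt(beta) dW, run backward.
x = rng.standard_normal(n_samples)
for i in range(n_steps, 0, -1):
    t = i * dt
    drift = -0.5 * beta * x - beta * score(x, t)
    x = x - drift * dt + np.sqrt(beta * dt) * rng.standard_normal(n_samples)
```

After the backward pass, the samples approximately follow the target N(3.0, 0.25), up to discretization and prior-mismatch error.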
Semi-parametric tensor factor analysis by iteratively projected singular value decomposition
This paper introduces a general framework of Semi-parametric TEnsor Factor Analysis (STEFA) that focuses on the methodology and theory of low-rank tensor decomposition with auxiliary covariates. STEFA extends tensor factor models by incorporating auxiliary covariates in the loading matrices. We propose an algorithm of iteratively projected singular value decomposition (IP-SVD) for the semi-parametric estimation. It iteratively projects tensor data onto the linear space spanned by the basis functions of covariates and applies singular value decomposition on matricized tensors over each mode. We establish the convergence rates of the loading matrices and the core tensor factor. The theoretical results only require a sub-exponential noise distribution, which is weaker than the sub-Gaussian tail assumption on the noise in the literature. Compared with the Tucker …
Authors
Elynn Y Chen,Dong Xia,Chencheng Cai,Jianqing Fan
Journal
Journal of the Royal Statistical Society Series B: Statistical Methodology
Published Date
2024/2/3
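Stripped to its essentials, the IP-SVD iteration described above alternates two steps: project the matricized tensor onto the covariate space of each mode, and take a truncated SVD. A minimal numpy sketch of that loop (shapes, ranks, and the fixed iteration count are illustrative assumptions, not the paper's exact procedure):

```python
import numpy as np

def unfold(T, mode):
    """Mode-k matricization of a tensor."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def ip_svd(Y, bases, ranks, n_iter=10):
    """Toy sketch of iteratively projected SVD (IP-SVD) for a d-way tensor Y.
    bases[k] is a column-orthonormal covariate basis Phi_k (d_k x L_k); the
    mode-k loading is assumed to lie (approximately) in col(Phi_k)."""
    d = Y.ndim
    U = []
    for k in range(d):
        # initialize: project the mode-k unfolding onto the covariate space
        M = bases[k] @ (bases[k].T @ unfold(Y, k))
        u, _, _ = np.linalg.svd(M, full_matrices=False)
        U.append(u[:, :ranks[k]])
    for _ in range(n_iter):
        for k in range(d):
            # contract Y with the current loadings of all other modes
            Z = Y
            for j in range(d):
                if j != k:
                    Z = np.moveaxis(
                        np.tensordot(U[j].T, np.moveaxis(Z, j, 0), axes=1), 0, j)
            # project onto the mode-k covariate space, then truncated SVD
            M = bases[k] @ (bases[k].T @ unfold(Z, k))
            u, _, _ = np.linalg.svd(M, full_matrices=False)
            U[k] = u[:, :ranks[k]]
    # core tensor: contract Y with all estimated loadings
    C = Y
    for j in range(d):
        C = np.moveaxis(np.tensordot(U[j].T, np.moveaxis(C, j, 0), axes=1), 0, j)
    return U, C
```

Each loading estimate stays inside its covariate span by construction, which is the "iteratively projected" part of the name.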
Structured Matrix Learning under Arbitrary Entrywise Dependence and Estimation of Markov Transition Kernel
The problem of structured matrix estimation has been studied mostly under strong noise dependence assumptions. This paper considers a general framework of noisy low-rank-plus-sparse matrix recovery, where the noise matrix may come from any joint distribution with arbitrary dependence across entries. We propose an incoherent-constrained least-square estimator and prove its tightness both in the sense of deterministic lower bound and matching minimax risks under various noise distributions. To attain this, we establish a novel result asserting that the difference between two arbitrary low-rank incoherent matrices must spread energy out across its entries, in other words cannot be too sparse, which sheds light on the structure of incoherent low-rank matrices and may be of independent interest. We then showcase the applications of our framework to several important statistical machine learning problems. In the problem of estimating a structured Markov transition kernel, the proposed method achieves the minimax optimality and the result can be extended to estimating the conditional mean operator, a crucial component in reinforcement learning. The applications to multitask regression and structured covariance estimation are also presented. We propose an alternating minimization algorithm to approximately solve the potentially hard optimization problem. Numerical results corroborate the effectiveness of our method which typically converges in a few steps.
Authors
Jinhang Chai,Jianqing Fan
Journal
arXiv preprint arXiv:2401.02520
Published Date
2024/1/4
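The abstract mentions an alternating minimization algorithm for the low-rank-plus-sparse problem. A generic alternation in that spirit, a sketch under simplifying assumptions (hard thresholds, a fixed rank, and no incoherence constraint, none of which match the paper's exact incoherent-constrained estimator):

```python
import numpy as np

def lowrank_plus_sparse(M, rank, sparse_thresh, n_iter=20):
    """Alternately fit a rank-r component by truncated SVD and a sparse
    component by entrywise hard thresholding of the residual."""
    S = np.zeros_like(M)
    L = np.zeros_like(M)
    for _ in range(n_iter):
        # low-rank step: best rank-r approximation of M - S
        U, s, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # sparse step: keep only residual entries above the threshold
        R = M - L
        S = np.where(np.abs(R) > sparse_thresh, R, 0.0)
    return L, S
```

When the low-rank part is incoherent (spread-out entries) and the sparse part has few, large entries, the two steps disentangle the components within a handful of iterations.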
Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games
This paper makes progress toward learning Nash equilibria in two-player, zero-sum Markov games from offline data. Specifically, consider a γ-discounted, infinite-horizon Markov game with S states, in which the max-player has A actions and the min-player has B actions. We propose a pessimistic model-based algorithm with Bernstein-style lower confidence bounds, called value iteration with lower confidence bounds for zero-sum Markov games, that provably finds an ε-approximate Nash equilibrium with a sample complexity no larger than C⋆_clipped S(A+B) / ((1−γ)³ε²) (up to some log factor). Here, C⋆_clipped is some unilateral clipped concentrability coefficient that reflects the coverage and distribution shift of the available data (vis-à-vis the target data), and the target accuracy ε can be any value within (0, 1/(1−γ)]. Our sample complexity bound strengthens prior art by a factor of min{A, B}, achieving minimax optimality for a broad regime …
Authors
Yuling Yan,Gen Li,Yuxin Chen,Jianqing Fan
Journal
Operations Research
Published Date
2024/4/2
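At the planning level, the algorithm's backbone is value iteration in which each state's Bellman update is a small zero-sum matrix game. A known-model sketch for 2×2 action spaces (illustrative only; the paper's actual contribution, the data-driven Bernstein-style pessimistic lower confidence bounds, is omitted here):

```python
import numpy as np

def value_2x2(G):
    """Value of a 2x2 zero-sum matrix game (row player maximizes)."""
    row_mins, col_maxs = G.min(axis=1), G.max(axis=0)
    if row_mins.max() == col_maxs.min():      # pure saddle point exists
        return row_mins.max()
    a, b = G[0]
    c, d = G[1]
    # fully mixed equilibrium value (denominator nonzero when no saddle point)
    return (a * d - b * c) / (a - b - c + d)

def zero_sum_vi(r, P, gamma, n_iter=200):
    """Value iteration for a known gamma-discounted zero-sum Markov game:
    r[s, a, b] are rewards, P[s, a, b, s'] are transition probabilities.
    Each sweep solves the per-state matrix game of the Q-function."""
    S = r.shape[0]
    V = np.zeros(S)
    for _ in range(n_iter):
        Q = r + gamma * P @ V                 # Q[s, a, b]
        V = np.array([value_2x2(Q[s]) for s in range(S)])
    return V
```

For example, discounted matching pennies has value 0, and a game where the max-player's first row dominates with reward 1 has value 1/(1−γ).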
Policy optimization using semiparametric models for dynamic pricing
In this article, we study the contextual dynamic pricing problem where the market value of a product is linear in its observed features plus some market noise. Products are sold one at a time, and only a binary response indicating success or failure of a sale is observed. Our model setting is similar to that of earlier work except that we expand the demand curve to a semiparametric model and dynamically learn both its parametric and nonparametric components. We propose a dynamic statistical learning and decision-making policy that minimizes regret (maximizes revenue) by combining semiparametric estimation for a generalized linear model with an unknown link and online decision making. Under mild conditions, for a market noise cdf with mth-order derivative (m ≥ 2), our policy achieves a regret upper bound of Õd(T^((2m+1)/(4m−1))), where T is the time horizon and Õd is the order hiding logarithmic terms and the feature …
Authors
Jianqing Fan,Yongyi Guo,Mengxin Yu
Journal
Journal of the American Statistical Association
Published Date
2022/10/28
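A heavily simplified, fully parametric cousin of this setup makes the pricing logic concrete: assume a known logistic link, so P(sale) = sigmoid(x·β − p), fit β from binary sale outcomes, then post the price maximizing expected revenue. All of this is a sketch; the paper instead learns the link function nonparametrically and controls regret, neither of which is attempted below.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_beta(X, prices, sales, lr=0.5, n_iter=3000):
    """Logistic MLE for P(sale) = sigmoid(x @ beta - p), by gradient ascent.
    A fully parametric stand-in for the paper's semiparametric estimation."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        resid = sales - sigmoid(X @ beta - prices)   # GLM score residuals
        beta += lr * X.T @ resid / len(sales)
    return beta

def greedy_price(x, beta, grid):
    """Post the grid price maximizing expected revenue p * P(sale | p)."""
    rev = grid * sigmoid(x @ beta - grid)
    return grid[np.argmax(rev)]
```

With the estimated β, higher-valued features command higher posted prices, which is the revenue-maximizing trade-off between margin and sale probability.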
A general theory for robust clustering via trimmed mean
Clustering is a fundamental tool in statistical machine learning in the presence of heterogeneous data. Many recent results focus primarily on optimal mislabeling guarantees, when data are distributed around centroids with sub-Gaussian errors. Yet, the restrictive sub-Gaussian model is often invalid in practice, since various real-world applications exhibit heavy tail distributions around the centroids or suffer from possible adversarial attacks that call for robust clustering with a robust data-driven initialization. In this paper, we introduce a hybrid clustering technique with a novel multivariate trimmed mean type centroid estimate to produce mislabeling guarantees under a weak initialization condition for general error distributions around the centroids. A matching lower bound is derived, up to factors depending on the number of clusters. In addition, our approach also produces the optimal mislabeling even in the presence of adversarial outliers. Our results reduce to the sub-Gaussian case when errors follow sub-Gaussian distributions. To solve the problem thoroughly, we also present novel data-driven robust initialization techniques and show that, with probabilities approaching one, these initial centroid estimates are sufficiently good for the subsequent clustering algorithm to achieve the optimal mislabeling rates. Furthermore, we demonstrate that the Lloyd algorithm is suboptimal for more than two clusters even when errors are Gaussian, and for two clusters when errors distributions have heavy tails. Both simulated data and real data examples lend further support to both of our robust initialization procedure and clustering algorithm.
Authors
Soham Jana,Jianqing Fan,Sanjeev Kulkarni
Journal
arXiv preprint arXiv:2401.05574
Published Date
2024/1/10
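The centroid update at the heart of the method, replacing the sample mean in Lloyd-type iterations with a trimmed mean, can be sketched directly. This is a stylized variant (coordinatewise trimming, fixed trim level, given initial centers), not the paper's exact multivariate trimmed-mean estimator or its data-driven initialization.

```python
import numpy as np

def trimmed_mean(X, trim=0.1):
    """Coordinatewise trimmed mean: drop the trim fraction of smallest and
    largest values in each coordinate before averaging."""
    n = X.shape[0]
    k = int(np.floor(trim * n))
    Xs = np.sort(X, axis=0)
    return Xs[k:n - k].mean(axis=0) if n - 2 * k > 0 else X.mean(axis=0)

def robust_lloyd(X, centers, trim=0.1, n_iter=20):
    """Lloyd-style iterations with trimmed-mean centroid updates, so that
    heavy-tailed errors and adversarial outliers cannot drag the centers."""
    for _ in range(n_iter):
        # assign each point to its nearest current centroid
        d2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d2.argmin(1)
        centers = np.vstack([
            trimmed_mean(X[labels == j], trim) if (labels == j).any() else centers[j]
            for j in range(len(centers))
        ])
    return labels, centers
```

With a plain mean, a handful of gross outliers assigned to a cluster shifts its centroid arbitrarily far; the trimmed mean discards them as long as they make up less than the trim fraction.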
Ranking inferences based on the top choice of multiway comparisons
Motivated by many applications such as online recommendations and individual choices, this paper considers ranking inference of n items based on the observed data on the top choice among M randomly selected items at each trial. This is a useful modification of the Plackett-Luce model for M-way ranking with only the top choice observed, and it extends the celebrated Bradley-Terry-Luce model, which corresponds to M = 2. Under a uniform sampling scheme in which any M distinct items are selected for comparison with probability p and the selected M items are compared L times with multinomial outcomes, we establish the statistical rates of convergence for the underlying n preference scores using both the ℓ2-norm and the ℓ∞-norm, under the minimum sampling complexity (smallest order of p). In addition, we establish the asymptotic normality of the maximum likelihood estimator, which allows us to construct …
Authors
Jianqing Fan,Zhipeng Lou,Weichen Wang,Mengxin Yu
Journal
Journal of the American Statistical Association
Published Date
2024/2/7
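In this top-choice model, each trial reveals only the winner among the M sampled items, with win probability proportional to exp(score). A plain gradient-ascent MLE sketch (the learning rate and iteration count are arbitrary choices, and the paper's inference machinery on top of the MLE is not reproduced here):

```python
import numpy as np

def fit_top_choice(n_items, trials, lr=0.5, n_iter=500):
    """MLE sketch for the top-choice multiway comparison model:
    P(winner = i | set C) = exp(theta_i) / sum_{j in C} exp(theta_j).
    trials is a list of (comparison_set, winner) pairs."""
    theta = np.zeros(n_items)
    for _ in range(n_iter):
        grad = np.zeros(n_items)
        for C, w in trials:
            C = np.asarray(C)
            p = np.exp(theta[C] - theta[C].max())   # stable softmax over C
            p /= p.sum()
            grad[C] -= p                             # expected win indicator
            grad[w] += 1.0                           # observed win indicator
        theta += lr * grad / len(trials)
        theta -= theta.mean()   # scores are identified only up to a shift
    return theta
```

The log-likelihood is concave in theta, so gradient ascent converges to the MLE; centering fixes the translation non-identifiability.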
Robust Transfer Learning with Unreliable Source Data
This paper addresses challenges in robust transfer learning stemming from ambiguity in Bayes classifiers and weak transferable signals between the target and source distributions. We introduce a novel quantity called the "ambiguity level" that measures the discrepancy between the target and source regression functions, propose a simple transfer learning procedure, and establish a general theorem that shows how this new quantity is related to the transferability of learning in terms of risk improvements. Our proposed "Transfer Around Boundary" (TAB) model, with a threshold balancing the performance of target and source data, is shown to be both efficient and robust, improving classification while avoiding negative transfer. Moreover, we demonstrate the effectiveness of the TAB model on non-parametric classification and logistic regression tasks, achieving upper bounds which are optimal up to logarithmic factors. Simulation studies lend further support to the effectiveness of TAB. We also provide simple approaches to bound the excess misclassification error without the need for specialized knowledge in transfer learning.
Authors
Jianqing Fan,Cheng Gao,Jason M Klusowski
Journal
arXiv preprint arXiv:2310.04606
Published Date
2023/10/6
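The "transfer around boundary" idea lends itself to a compact illustration: keep the source classifier's predictions where it is confident, and re-estimate near the decision boundary from the (scarcer but reliable) labeled target data. The threshold tau and the k-NN fallback below are illustrative assumptions; the paper's TAB procedure and its guarantees are more refined.

```python
import numpy as np

def tab_predict(X_target, y_target, X_test, p_source_test, tau=0.1, k=5):
    """'Transfer Around Boundary'-style sketch (not the paper's exact
    algorithm): trust the source classifier's probability p_source_test when
    it is far from 1/2; near the boundary, where a weak transferable signal
    may mislead, fall back to a k-NN vote on labeled target data."""
    preds = (p_source_test > 0.5).astype(int)
    near_boundary = np.abs(p_source_test - 0.5) <= tau
    for i in np.where(near_boundary)[0]:
        d2 = ((X_target - X_test[i]) ** 2).sum(axis=1)
        nn = np.argsort(d2)[:k]
        preds[i] = int(y_target[nn].mean() > 0.5)
    return preds
```

The threshold trades off the two data sources: a larger tau leans more on target data (robust to an unreliable source), while a smaller tau exploits the source more aggressively.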
Professor FAQs
What is Jianqing Fan's h-index at Princeton University?
Jianqing Fan's h-index is 121 overall and 88 since 2020.
What are Jianqing Fan's top articles?
Jianqing Fan's top articles at Princeton University include:
An overview of diffusion models: Applications, guided generation, statistical rates and optimization
Semi-parametric tensor factor analysis by iteratively projected singular value decomposition
Structured Matrix Learning under Arbitrary Entrywise Dependence and Estimation of Markov Transition Kernel
Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games
Policy optimization using semiparametric models for dynamic pricing
A general theory for robust clustering via trimmed mean
Ranking inferences based on the top choice of multiway comparisons
Robust Transfer Learning with Unreliable Source Data
...
What are Jianqing Fan's research interests?
The research interests of Jianqing Fan are: Statistics, machine learning, finance, economics, and mathematical statistics.
What is Jianqing Fan's total number of citations?
Jianqing Fan has 84,687 citations in total.
Who are the co-authors of Jianqing Fan?
The co-authors of Jianqing Fan are Runze Li, Han Liu, Irène Gijbels, Yi Ren, Yuxin Chen, Zongwu Cai.