Jianqing Fan
Princeton University
H-index: 121
North America-United States
Description
Jianqing Fan, a distinguished researcher at Princeton University with an exceptional h-index of 121 (88 since 2020), specializes in statistics, machine learning, finance, economics, and mathematical statistics.
His recent articles reflect a diverse array of research interests and contributions to the field:
An overview of diffusion models: Applications, guided generation, statistical rates and optimization
Semi-parametric tensor factor analysis by iteratively projected singular value decomposition
Structured Matrix Learning under Arbitrary Entrywise Dependence and Estimation of Markov Transition Kernel
Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games
Policy optimization using semiparametric models for dynamic pricing
A general theory for robust clustering via trimmed mean
Ranking inferences based on the top choice of multiway comparisons
Robust Transfer Learning with Unreliable Source Data
Professor Information
University | Princeton University
---|---
Position | Professor of Statistics; Professor of Finance
Citations (all) | 84,687
Citations (since 2020) | 33,959
Cited by | 64,242
h-index (all) | 121
h-index (since 2020) | 88
i10-index (all) | 339
i10-index (since 2020) | 252
University profile page | Princeton University
Research & Interests List
Statistics
machine learning
finance
economics
mathematical statistics
Top articles of Jianqing Fan
An overview of diffusion models: Applications, guided generation, statistical rates and optimization
Diffusion models, a powerful and universal generative AI technology, have achieved tremendous success in computer vision, audio, reinforcement learning, and computational biology. In these applications, diffusion models provide flexible high-dimensional data modeling, and act as a sampler for generating new samples under active guidance towards task-desired properties. Despite the significant empirical success, the theory of diffusion models is very limited, potentially slowing down principled methodological innovations for further harnessing and improving diffusion models. In this paper, we review emerging applications of diffusion models and how they generate samples under various controls. Next, we overview the existing theories of diffusion models, covering their statistical properties and sampling capabilities. We adopt a progressive routine, beginning with unconditional diffusion models and connecting to conditional counterparts. Further, we review a new avenue in high-dimensional structured optimization through conditional diffusion models, where searching for solutions is reformulated as a conditional sampling problem and solved by diffusion models. Lastly, we discuss future directions for diffusion models. The purpose of this paper is to provide a well-rounded theoretical exposition to stimulate forward-looking theories and methods of diffusion models.
Authors
Minshuo Chen,Song Mei,Jianqing Fan,Mengdi Wang
Published Date
2024/4/11
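The sampling mechanism the abstract refers to can be illustrated without any learned score network: for a one-dimensional Gaussian target the score of every noised marginal is available in closed form, so a toy reverse-time SDE sampler fits in a few lines. This is a sketch only; the constant noise schedule, time horizon, and step count below are arbitrary choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a 1-D Gaussian N(mu0, sigma0^2), for which the score of every
# noised marginal is known in closed form -- so no neural network is needed.
mu0, sigma0 = 3.0, 0.5
beta = 1.0                         # constant noise schedule (an assumption)
T, n_steps, n_samples = 5.0, 500, 20000
dt = T / n_steps

def score(x, t):
    # Forward VP-SDE marginal at time t: N(a*mu0, a^2*sigma0^2 + (1 - a^2)),
    # where a = exp(-beta*t/2); its score is linear in x.
    a = np.exp(-0.5 * beta * t)
    var = a ** 2 * sigma0 ** 2 + (1.0 - a ** 2)
    return -(x - a * mu0) / var

# Reverse-time SDE, Euler-Maruyama, starting from the N(0, 1) prior:
# dx = [-1/2 beta x - beta * score(x, t)] dt + sqrt(beta) dW, run backward.
x = rng.standard_normal(n_samples)
for i in range(n_steps, 0, -1):
    t = i * dt
    drift = -0.5 * beta * x - beta * score(x, t)
    x = x - drift * dt + np.sqrt(beta * dt) * rng.standard_normal(n_samples)
```

After the backward pass, the samples approximately follow the target N(3.0, 0.25), up to discretization and prior-mismatch error.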
Semi-parametric tensor factor analysis by iteratively projected singular value decomposition
This paper introduces a general framework of Semi-parametric TEnsor Factor Analysis (STEFA) that focuses on the methodology and theory of low-rank tensor decomposition with auxiliary covariates. STEFA extends tensor factor models by incorporating auxiliary covariates in the loading matrices. We propose an algorithm of iteratively projected singular value decomposition (IP-SVD) for the semi-parametric estimation. It iteratively projects tensor data onto the linear space spanned by the basis functions of covariates and applies singular value decomposition on matricized tensors over each mode. We establish the convergence rates of the loading matrices and the core tensor factor. The theoretical results only require a sub-exponential noise distribution, which is weaker than the sub-Gaussian tail assumption on the noise in the literature. Compared with the Tucker …
Authors
Elynn Y Chen,Dong Xia,Chencheng Cai,Jianqing Fan
Journal
Journal of the Royal Statistical Society Series B: Statistical Methodology
Published Date
2024/2/3
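Stripped to its essentials, the IP-SVD iteration described above alternates two steps: project the matricized tensor onto the covariate space of each mode, and take a truncated SVD. A minimal numpy sketch of that loop (shapes, ranks, and the fixed iteration count are illustrative assumptions, not the paper's exact procedure):

```python
import numpy as np

def unfold(T, mode):
    """Mode-k matricization of a tensor."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def ip_svd(Y, bases, ranks, n_iter=10):
    """Toy sketch of iteratively projected SVD (IP-SVD) for a d-way tensor Y.
    bases[k] is a column-orthonormal covariate basis Phi_k (d_k x L_k); the
    mode-k loading is assumed to lie (approximately) in col(Phi_k)."""
    d = Y.ndim
    U = []
    for k in range(d):
        # initialize: project the mode-k unfolding onto the covariate space
        M = bases[k] @ (bases[k].T @ unfold(Y, k))
        u, _, _ = np.linalg.svd(M, full_matrices=False)
        U.append(u[:, :ranks[k]])
    for _ in range(n_iter):
        for k in range(d):
            # contract Y with the current loadings of all other modes
            Z = Y
            for j in range(d):
                if j != k:
                    Z = np.moveaxis(
                        np.tensordot(U[j].T, np.moveaxis(Z, j, 0), axes=1), 0, j)
            # project onto the mode-k covariate space, then truncated SVD
            M = bases[k] @ (bases[k].T @ unfold(Z, k))
            u, _, _ = np.linalg.svd(M, full_matrices=False)
            U[k] = u[:, :ranks[k]]
    # core tensor: contract Y with all estimated loadings
    C = Y
    for j in range(d):
        C = np.moveaxis(np.tensordot(U[j].T, np.moveaxis(C, j, 0), axes=1), 0, j)
    return U, C
```

Each loading estimate stays inside its covariate span by construction, which is the "iteratively projected" part of the name.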
Structured Matrix Learning under Arbitrary Entrywise Dependence and Estimation of Markov Transition Kernel
The problem of structured matrix estimation has been studied mostly under strong noise dependence assumptions. This paper considers a general framework of noisy low-rank-plus-sparse matrix recovery, where the noise matrix may come from any joint distribution with arbitrary dependence across entries. We propose an incoherent-constrained least-square estimator and prove its tightness both in the sense of deterministic lower bound and matching minimax risks under various noise distributions. To attain this, we establish a novel result asserting that the difference between two arbitrary low-rank incoherent matrices must spread energy out across its entries, in other words cannot be too sparse, which sheds light on the structure of incoherent low-rank matrices and may be of independent interest. We then showcase the applications of our framework to several important statistical machine learning problems. In the problem of estimating a structured Markov transition kernel, the proposed method achieves the minimax optimality and the result can be extended to estimating the conditional mean operator, a crucial component in reinforcement learning. The applications to multitask regression and structured covariance estimation are also presented. We propose an alternating minimization algorithm to approximately solve the potentially hard optimization problem. Numerical results corroborate the effectiveness of our method which typically converges in a few steps.
Authors
Jinhang Chai,Jianqing Fan
Journal
arXiv preprint arXiv:2401.02520
Published Date
2024/1/4
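The abstract mentions an alternating minimization algorithm for the low-rank-plus-sparse problem. A generic alternation in that spirit, a sketch under simplifying assumptions (hard thresholds, a fixed rank, and no incoherence constraint, none of which match the paper's exact incoherent-constrained estimator):

```python
import numpy as np

def lowrank_plus_sparse(M, rank, sparse_thresh, n_iter=20):
    """Alternately fit a rank-r component by truncated SVD and a sparse
    component by entrywise hard thresholding of the residual."""
    S = np.zeros_like(M)
    L = np.zeros_like(M)
    for _ in range(n_iter):
        # low-rank step: best rank-r approximation of M - S
        U, s, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # sparse step: keep only residual entries above the threshold
        R = M - L
        S = np.where(np.abs(R) > sparse_thresh, R, 0.0)
    return L, S
```

When the low-rank part is incoherent (spread-out entries) and the sparse part has few, large entries, the two steps disentangle the components within a handful of iterations.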
Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games
This paper makes progress toward learning Nash equilibria in two-player, zero-sum Markov games from offline data. Specifically, consider a γ-discounted, infinite-horizon Markov game with S states, in which the max-player has A actions and the min-player has B actions. We propose a pessimistic model-based algorithm with Bernstein-style lower confidence bounds, called value iteration with lower confidence bounds for zero-sum Markov games, that provably finds an ε-approximate Nash equilibrium with a sample complexity no larger than C⋆_clipped S(A+B) / ((1−γ)³ε²) (up to some log factor). Here, C⋆_clipped is some unilateral clipped concentrability coefficient that reflects the coverage and distribution shift of the available data (vis-à-vis the target data), and the target accuracy ε can be any value within (0, 1/(1−γ)]. Our sample complexity bound strengthens prior art by a factor of min{A, B}, achieving minimax optimality for a broad regime …
Authors
Yuling Yan,Gen Li,Yuxin Chen,Jianqing Fan
Journal
Operations Research
Published Date
2024/4/2
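At the planning level, the algorithm's backbone is value iteration in which each state's Bellman update is a small zero-sum matrix game. A known-model sketch for 2×2 action spaces (illustrative only; the paper's actual contribution, the data-driven Bernstein-style pessimistic lower confidence bounds, is omitted here):

```python
import numpy as np

def value_2x2(G):
    """Value of a 2x2 zero-sum matrix game (row player maximizes)."""
    row_mins, col_maxs = G.min(axis=1), G.max(axis=0)
    if row_mins.max() == col_maxs.min():      # pure saddle point exists
        return row_mins.max()
    a, b = G[0]
    c, d = G[1]
    # fully mixed equilibrium value (denominator nonzero when no saddle point)
    return (a * d - b * c) / (a - b - c + d)

def zero_sum_vi(r, P, gamma, n_iter=200):
    """Value iteration for a known gamma-discounted zero-sum Markov game:
    r[s, a, b] are rewards, P[s, a, b, s'] are transition probabilities.
    Each sweep solves the per-state matrix game of the Q-function."""
    S = r.shape[0]
    V = np.zeros(S)
    for _ in range(n_iter):
        Q = r + gamma * P @ V                 # Q[s, a, b]
        V = np.array([value_2x2(Q[s]) for s in range(S)])
    return V
```

For example, discounted matching pennies has value 0, and a game where the max-player's first row dominates with reward 1 has value 1/(1−γ).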
Policy optimization using semiparametric models for dynamic pricing
In this article, we study the contextual dynamic pricing problem where the market value of a product is linear in its observed features plus some market noise. Products are sold one at a time, and only a binary response indicating success or failure of a sale is observed. Our model setting is similar to that of earlier work except that we expand the demand curve to a semiparametric model and dynamically learn both its parametric and nonparametric components. We propose a dynamic statistical learning and decision-making policy that minimizes regret (maximizes revenue) by combining semiparametric estimation for a generalized linear model with an unknown link and online decision making. Under mild conditions, for a market noise cdf with mth-order derivative (m ≥ 2), our policy achieves a regret upper bound of Õd(T^((2m+1)/(4m−1))), where T is the time horizon and Õd is the order hiding logarithmic terms and the feature …
Authors
Jianqing Fan,Yongyi Guo,Mengxin Yu
Journal
Journal of the American Statistical Association
Published Date
2022/10/28
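A heavily simplified, fully parametric cousin of this setup makes the pricing logic concrete: assume a known logistic link, so P(sale) = sigmoid(x·β − p), fit β from binary sale outcomes, then post the price maximizing expected revenue. All of this is a sketch; the paper instead learns the link function nonparametrically and controls regret, neither of which is attempted below.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_beta(X, prices, sales, lr=0.5, n_iter=3000):
    """Logistic MLE for P(sale) = sigmoid(x @ beta - p), by gradient ascent.
    A fully parametric stand-in for the paper's semiparametric estimation."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        resid = sales - sigmoid(X @ beta - prices)   # GLM score residuals
        beta += lr * X.T @ resid / len(sales)
    return beta

def greedy_price(x, beta, grid):
    """Post the grid price maximizing expected revenue p * P(sale | p)."""
    rev = grid * sigmoid(x @ beta - grid)
    return grid[np.argmax(rev)]
```

With the estimated β, higher-valued features command higher posted prices, which is the revenue-maximizing trade-off between margin and sale probability.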
A general theory for robust clustering via trimmed mean
Clustering is a fundamental tool in statistical machine learning in the presence of heterogeneous data. Many recent results focus primarily on optimal mislabeling guarantees, when data are distributed around centroids with sub-Gaussian errors. Yet, the restrictive sub-Gaussian model is often invalid in practice, since various real-world applications exhibit heavy tail distributions around the centroids or suffer from possible adversarial attacks that call for robust clustering with a robust data-driven initialization. In this paper, we introduce a hybrid clustering technique with a novel multivariate trimmed mean type centroid estimate to produce mislabeling guarantees under a weak initialization condition for general error distributions around the centroids. A matching lower bound is derived, up to factors depending on the number of clusters. In addition, our approach also produces the optimal mislabeling even in the presence of adversarial outliers. Our results reduce to the sub-Gaussian case when errors follow sub-Gaussian distributions. To solve the problem thoroughly, we also present novel data-driven robust initialization techniques and show that, with probabilities approaching one, these initial centroid estimates are sufficiently good for the subsequent clustering algorithm to achieve the optimal mislabeling rates. Furthermore, we demonstrate that the Lloyd algorithm is suboptimal for more than two clusters even when errors are Gaussian, and for two clusters when errors distributions have heavy tails. Both simulated data and real data examples lend further support to both of our robust initialization procedure and clustering algorithm.
Authors
Soham Jana,Jianqing Fan,Sanjeev Kulkarni
Journal
arXiv preprint arXiv:2401.05574
Published Date
2024/1/10
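The centroid update at the heart of the method, replacing the sample mean in Lloyd-type iterations with a trimmed mean, can be sketched directly. This is a stylized variant (coordinatewise trimming, fixed trim level, given initial centers), not the paper's exact multivariate trimmed-mean estimator or its data-driven initialization.

```python
import numpy as np

def trimmed_mean(X, trim=0.1):
    """Coordinatewise trimmed mean: drop the trim fraction of smallest and
    largest values in each coordinate before averaging."""
    n = X.shape[0]
    k = int(np.floor(trim * n))
    Xs = np.sort(X, axis=0)
    return Xs[k:n - k].mean(axis=0) if n - 2 * k > 0 else X.mean(axis=0)

def robust_lloyd(X, centers, trim=0.1, n_iter=20):
    """Lloyd-style iterations with trimmed-mean centroid updates, so that
    heavy-tailed errors and adversarial outliers cannot drag the centers."""
    for _ in range(n_iter):
        # assign each point to its nearest current centroid
        d2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d2.argmin(1)
        centers = np.vstack([
            trimmed_mean(X[labels == j], trim) if (labels == j).any() else centers[j]
            for j in range(len(centers))
        ])
    return labels, centers
```

With a plain mean, a handful of gross outliers assigned to a cluster shifts its centroid arbitrarily far; the trimmed mean discards them as long as they make up less than the trim fraction.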
Ranking inferences based on the top choice of multiway comparisons
Motivated by many applications such as online recommendations and individual choices, this paper considers ranking inference of n items based on the observed data on the top choice among M randomly selected items at each trial. This is a useful modification of the Plackett-Luce model for M-way ranking with only the top choice observed, and it extends the celebrated Bradley-Terry-Luce model, which corresponds to M = 2. Under a uniform sampling scheme in which any M distinct items are selected for comparison with probability p and the selected M items are compared L times with multinomial outcomes, we establish the statistical rates of convergence for the underlying n preference scores using both the ℓ2-norm and the ℓ∞-norm, under the minimum sampling complexity (smallest order of p). In addition, we establish the asymptotic normality of the maximum likelihood estimator, which allows us to construct …
Authors
Jianqing Fan,Zhipeng Lou,Weichen Wang,Mengxin Yu
Journal
Journal of the American Statistical Association
Published Date
2024/2/7
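In this top-choice model, each trial reveals only the winner among the M sampled items, with win probability proportional to exp(score). A plain gradient-ascent MLE sketch (the learning rate and iteration count are arbitrary choices, and the paper's inference machinery on top of the MLE is not reproduced here):

```python
import numpy as np

def fit_top_choice(n_items, trials, lr=0.5, n_iter=500):
    """MLE sketch for the top-choice multiway comparison model:
    P(winner = i | set C) = exp(theta_i) / sum_{j in C} exp(theta_j).
    trials is a list of (comparison_set, winner) pairs."""
    theta = np.zeros(n_items)
    for _ in range(n_iter):
        grad = np.zeros(n_items)
        for C, w in trials:
            C = np.asarray(C)
            p = np.exp(theta[C] - theta[C].max())   # stable softmax over C
            p /= p.sum()
            grad[C] -= p                             # expected win indicator
            grad[w] += 1.0                           # observed win indicator
        theta += lr * grad / len(trials)
        theta -= theta.mean()   # scores are identified only up to a shift
    return theta
```

The log-likelihood is concave in theta, so gradient ascent converges to the MLE; centering fixes the translation non-identifiability.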
Robust Transfer Learning with Unreliable Source Data
This paper addresses challenges in robust transfer learning stemming from ambiguity in Bayes classifiers and weak transferable signals between the target and source distributions. We introduce a novel quantity called the "ambiguity level" that measures the discrepancy between the target and source regression functions, propose a simple transfer learning procedure, and establish a general theorem that shows how this new quantity is related to the transferability of learning in terms of risk improvements. Our proposed "Transfer Around Boundary" (TAB) model, with a threshold balancing the performance of target and source data, is shown to be both efficient and robust, improving classification while avoiding negative transfer. Moreover, we demonstrate the effectiveness of the TAB model on non-parametric classification and logistic regression tasks, achieving upper bounds which are optimal up to logarithmic factors. Simulation studies lend further support to the effectiveness of TAB. We also provide simple approaches to bound the excess misclassification error without the need for specialized knowledge in transfer learning.
Authors
Jianqing Fan,Cheng Gao,Jason M Klusowski
Journal
arXiv preprint arXiv:2310.04606
Published Date
2023/10/6
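The "transfer around boundary" idea lends itself to a compact illustration: keep the source classifier's predictions where it is confident, and re-estimate near the decision boundary from the (scarcer but reliable) labeled target data. The threshold tau and the k-NN fallback below are illustrative assumptions; the paper's TAB procedure and its guarantees are more refined.

```python
import numpy as np

def tab_predict(X_target, y_target, X_test, p_source_test, tau=0.1, k=5):
    """'Transfer Around Boundary'-style sketch (not the paper's exact
    algorithm): trust the source classifier's probability p_source_test when
    it is far from 1/2; near the boundary, where a weak transferable signal
    may mislead, fall back to a k-NN vote on labeled target data."""
    preds = (p_source_test > 0.5).astype(int)
    near_boundary = np.abs(p_source_test - 0.5) <= tau
    for i in np.where(near_boundary)[0]:
        d2 = ((X_target - X_test[i]) ** 2).sum(axis=1)
        nn = np.argsort(d2)[:k]
        preds[i] = int(y_target[nn].mean() > 0.5)
    return preds
```

The threshold trades off the two data sources: a larger tau leans more on target data (robust to an unreliable source), while a smaller tau exploits the source more aggressively.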
Professor FAQs
What is Jianqing Fan's h-index at Princeton University?
Jianqing Fan's h-index is 121 overall and 88 since 2020.
What are Jianqing Fan's top articles?
Jianqing Fan's top articles at Princeton University include:
An overview of diffusion models: Applications, guided generation, statistical rates and optimization
Semi-parametric tensor factor analysis by iteratively projected singular value decomposition
Structured Matrix Learning under Arbitrary Entrywise Dependence and Estimation of Markov Transition Kernel
Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games
Policy optimization using semiparametric models for dynamic pricing
A general theory for robust clustering via trimmed mean
Ranking inferences based on the top choice of multiway comparisons
Robust Transfer Learning with Unreliable Source Data
...
What are Jianqing Fan's research interests?
The research interests of Jianqing Fan are: Statistics, machine learning, finance, economics, and mathematical statistics.
What is Jianqing Fan's total number of citations?
Jianqing Fan has 84,687 citations in total.
Who are the co-authors of Jianqing Fan?
The co-authors of Jianqing Fan are Runze Li, Han Liu, Irène Gijbels, Yi Ren, Yuxin Chen, Zongwu Cai.