Santosh S. Vempala

Georgia Institute of Technology

H-index: 71

North America-United States

About Santosh S. Vempala

Santosh S. Vempala is a distinguished researcher at Georgia Institute of Technology, with an exceptional h-index of 71 overall and a recent h-index of 50 (since 2020). He specializes in Algorithms, Randomness, High-dimensional Geometry, Optimization, and Foundations of Data Science.

His recent articles reflect a diverse array of research interests and contributions to the field:

Contrastive Moments: Unsupervised Halfspace Learning in Polynomial Time

Eldan's stochastic localization and the KLS conjecture: Isoperimetry, concentration and mixing

Computation with Sequences of Assemblies in a Model of the Brain

Sampling with Barriers: Faster Mixing via Lewis Weights

Efficiently Sampling the PSD Cone with the Metric Dikin Walk

Is Planted Coloring Easier than Planted Clique?

Calibrated language models must hallucinate

Condition-number-independent convergence rate of Riemannian Hamiltonian Monte Carlo with numerical integrators

Santosh S. Vempala Information

University

Georgia Institute of Technology

Position

___

Citations(all)

21988

Citations(since 2020)

8617

Cited By

17330

hIndex(all)

71

hIndex(since 2020)

50

i10Index(all)

161

i10Index(since 2020)

122

University Profile Page

Georgia Institute of Technology

Santosh S. Vempala Skills & Research Interests

Algorithms

Randomness

High-dimensional Geometry

Optimization

Foundations of Data Science

Top articles of Santosh S. Vempala

Contrastive Moments: Unsupervised Halfspace Learning in Polynomial Time

Authors

Xinyuan Cao, Santosh Vempala

Journal

Advances in Neural Information Processing Systems

Published Date

2024/2/13

We give a polynomial-time algorithm for learning high-dimensional halfspaces with margins in -dimensional space to within desired TV distance when the ambient distribution is an unknown affine transformation of the -fold product of an (unknown) symmetric one-dimensional logconcave distribution, and the halfspace is introduced by deleting at least an fraction of the data in one of the component distributions. Notably, our algorithm does not need labels and establishes the unique (and efficient) identifiability of the hidden halfspace under this distributional assumption. The sample and time complexity of the algorithm are polynomial in the dimension and . The algorithm uses only the first two moments of *suitable re-weightings* of the empirical distribution, which we call *contrastive moments*; its analysis uses classical facts about generalized Dirichlet polynomials and relies crucially on a new monotonicity property of the moment ratio of truncations of logconcave distributions. Such algorithms, based only on first and second moments, were suggested in earlier work, but hitherto eluded rigorous guarantees. Prior work addressed the special case when the underlying distribution is Gaussian via Non-Gaussian Component Analysis. We improve on this by providing polytime guarantees based on Total Variation (TV) distance, in place of existing moment-bound guarantees that can be super-polynomial. Our work is also the first to go beyond Gaussians in this setting.
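
As a rough illustration of the abstract's basic primitive (this is not the paper's actual algorithm, and the weighting used here is purely hypothetical), computing the first two moments of a re-weighted empirical distribution from samples looks like this:

```python
import numpy as np

def reweighted_moments(X, w):
    """First and second moments of the empirical distribution re-weighted by w.

    X : (n, d) array of samples; w : (n,) array of nonnegative weights.
    """
    w = w / w.sum()                   # normalize weights to a probability vector
    mean = w @ X                      # weighted first moment, shape (d,)
    second = (X * w[:, None]).T @ X   # weighted second moment E[x x^T], shape (d, d)
    return mean, second

# Toy usage with a hypothetical weight function (the norm of each sample).
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
mean, second = reweighted_moments(X, np.linalg.norm(X, axis=1))
```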

Eldan's stochastic localization and the KLS conjecture: Isoperimetry, concentration and mixing

Authors

Yin Tat Lee, Santosh S Vempala

Journal

arXiv preprint arXiv:1612.01507

Published Date

2016/12/5

We show that the Cheeger constant for -dimensional isotropic logconcave measures is , improving on the previous best bound of . As corollaries we obtain the same improved bound on the thin-shell estimate, Poincaré constant and Lipschitz concentration constant and an alternative proof of this bound for the isotropic (slicing) constant; it also follows that the ball walk for sampling from an isotropic logconcave density in converges in steps from a warm start. The proof is based on gradually transforming any logconcave density to one that has a significant Gaussian factor via a martingale process. Extending this proof technique, we prove that the log-Sobolev constant of any isotropic logconcave density in with support of diameter is , resolving a question posed by Frieze and Kannan in 1997. This is asymptotically the best possible estimate and improves on the previous bound of by Kannan-Lovász-Montenegro. It follows that for any isotropic logconcave density, the ball walk with step size mixes in proper steps from *any* starting point. This improves on the previous best bound of and is also asymptotically tight. The new bound leads to the following large deviation inequality for an -Lipschitz function over an isotropic logconcave density : for any , where is the median or mean of for ; this generalizes and improves on previous bounds by Paouris and by Guédon-Milman. The technique also bounds the "small ball" probability in terms of the Cheeger constant, and recovers the current best bound.
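
For readers unfamiliar with the quantity being bounded, here is the standard definition of the Cheeger (isoperimetric) constant together with the statement of the KLS conjecture; this is textbook background, not material taken from the abstract:

```latex
% Cheeger (isoperimetric) constant of a probability measure \mu on R^n:
\psi_\mu \;=\; \inf_{S:\ \mu(S) \le 1/2} \frac{\mu^{+}(\partial S)}{\mu(S)},
% where \mu^{+} denotes the boundary measure of S. The KLS conjecture asserts
% that \psi_\mu is bounded below by an absolute constant (independent of the
% dimension n) for every isotropic logconcave measure \mu.
```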

Computation with Sequences of Assemblies in a Model of the Brain

Authors

Max Dabagia, Christos Papadimitriou, Santosh Vempala

Published Date

2024/3/15

Even as machine learning exceeds human-level performance on many applications, the generality, robustness, and rapidity of the brain’s learning capabilities remain unmatched. How cognition arises from neural activity is the central open question in neuroscience, inextricable from the study of intelligence itself. A simple formal model of neural activity was proposed in Papadimitriou (2020) and has been subsequently shown, through both mathematical proofs and simulations, to be capable of implementing certain simple cognitive operations via the creation and manipulation of assemblies of neurons. However, many intelligent behaviors rely on the ability to recognize, store, and manipulate temporal sequences of stimuli (planning, language, navigation, to list a few). Here we show that, in the same model, time can be captured naturally as precedence through synaptic weights and plasticity, and, as a result, a range of computations on sequences of assemblies can be carried out. In particular, repeated presentation of a sequence of stimuli leads to the memorization of the sequence through corresponding neural assemblies: upon future presentation of any stimulus in the sequence, the corresponding assembly and its subsequent ones will be activated, one after the other, until the end of the sequence. If the stimulus sequence is presented to two brain areas simultaneously, a scaffolded representation is created, resulting in more efficient memorization and recall, in agreement with cognitive experiments. Finally, we show that any finite state machine can be learned in a similar way, through the presentation of appropriate patterns of sequences …

Sampling with Barriers: Faster Mixing via Lewis Weights

Authors

Khashayar Gatmiry, Jonathan Kelner, Santosh S Vempala

Journal

arXiv preprint arXiv:2303.00480

Published Date

2023/3/1

We analyze Riemannian Hamiltonian Monte Carlo (RHMC) for sampling a polytope defined by inequalities in ℝ^n endowed with the metric defined by the Hessian of a convex barrier function. The advantage of RHMC over Euclidean methods such as the ball walk, hit-and-run and the Dikin walk is in its ability to take longer steps. However, in all previous work, the mixing rate has a linear dependence on the number of inequalities. We introduce a hybrid of the Lewis weights barrier and the standard logarithmic barrier and prove that the mixing rate for the corresponding RHMC is bounded by , improving on the previous best bound of (based on the log barrier). This continues the general parallels between optimization and sampling, with the latter typically leading to new tools and more refined analysis. To prove our main results, we have to overcome several challenges relating to the smoothness of Hamiltonian curves and the self-concordance properties of the barrier. In the process, we give a general framework for the analysis of Markov chains on Riemannian manifolds, derive new smoothness bounds on Hamiltonian curves, a central topic of comparison geometry, and extend self-concordance to the infinity norm, which gives sharper bounds; these properties appear to be of independent interest.
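
As background for the barriers mentioned above (standard material, not taken from the abstract), the logarithmic barrier of a polytope {x : Ax ≤ b} with m inequalities and the Hessian metric it induces are:

```latex
\phi(x) = -\sum_{i=1}^{m} \log\bigl(b_i - a_i^{\top} x\bigr),
\qquad
\nabla^2 \phi(x) = \sum_{i=1}^{m} \frac{a_i a_i^{\top}}{(b_i - a_i^{\top} x)^2}
                 = A^{\top} S_x^{-2} A,
\quad S_x = \operatorname{diag}(b - Ax).
```

The Lewis-weights hybrid studied in the paper reweights these barrier terms; the exact weighting is not reproduced here.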

Efficiently Sampling the PSD Cone with the Metric Dikin Walk

Authors

Yunbum Kook, Santosh S Vempala

Journal

arXiv preprint arXiv:2307.12943

Published Date

2023/7/24

Semi-definite programs represent a frontier of efficient computation. While there has been much progress on semi-definite optimization, with moderate-sized instances currently solvable in practice by the interior-point method, the basic problem of sampling semi-definite solutions remains a formidable challenge. The direct application of known polynomial-time algorithms for sampling general convex bodies to semi-definite sampling leads to a prohibitively high running time. In addition, known general methods require an expensive rounding phase as pre-processing. Here we analyze the Dikin walk, by first adapting it to general metrics, then devising suitable metrics for the PSD cone with affine constraints. The resulting mixing time and per-step complexity are considerably smaller, and by an appropriate choice of the metric, the dependence on the number of constraints can be made polylogarithmic. We introduce a refined notion of self-concordant matrix functions and give rules for combining different metrics. Along the way, we further develop the theory of interior-point methods for sampling.
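
For orientation, here is a minimal sketch of the classical Dikin walk on a polytope {x : Ax ≤ b} with the log-barrier metric; the paper's contribution is to generalize the walk to other metrics and to the PSD cone with affine constraints, which this sketch does not attempt:

```python
import numpy as np

def log_barrier_hessian(A, b, x):
    """Hessian of the log-barrier -sum_i log(b_i - a_i^T x) at an interior point x."""
    s = b - A @ x                        # slacks, all > 0 for interior x
    return A.T @ (A / s[:, None] ** 2)   # sum_i a_i a_i^T / s_i^2

def dikin_step(A, b, x, rng, r=0.5):
    """One Metropolis-filtered Dikin-walk step targeting the uniform distribution."""
    n = x.size
    H = log_barrier_hessian(A, b, x)
    # Propose a uniform point in the Dikin ellipsoid {y : (y-x)^T H (y-x) <= r^2/n}.
    u = rng.standard_normal(n)
    u *= (r / np.sqrt(n)) * rng.uniform() ** (1.0 / n) / np.linalg.norm(u)
    y = x + np.linalg.solve(np.linalg.cholesky(H).T, u)
    if np.any(b - A @ y <= 0):
        return x                         # proposal left the polytope: reject
    Hy = log_barrier_hessian(A, b, y)
    d = x - y
    if d @ Hy @ d > r ** 2 / n:
        return x                         # x not in the ellipsoid of y: reject
    # Accept with probability min(1, vol(E_x)/vol(E_y)) = min(1, sqrt(det Hy / det H)).
    log_ratio = 0.5 * (np.linalg.slogdet(Hy)[1] - np.linalg.slogdet(H)[1])
    return y if np.log(rng.uniform()) < min(0.0, log_ratio) else x

# Example: a few steps inside the cube [-1, 1]^3 written as Ax <= b.
A = np.vstack([np.eye(3), -np.eye(3)])
b = np.ones(6)
x = np.zeros(3)
rng = np.random.default_rng(0)
for _ in range(100):
    x = dikin_step(A, b, x, rng)
```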

Is Planted Coloring Easier than Planted Clique?

Authors

Pravesh Kothari, Santosh S Vempala, Alexander S Wein, Jeff Xu

Published Date

2023/7/12

We study the computational complexity of two related problems: recovering a planted q-coloring in G(n, 1/2), and finding efficiently verifiable witnesses of non-q-colorability (aka refutations) in G(n, 1/2). Our main results show hardness for both these problems in a restricted-but-powerful class of algorithms based on computing low-degree polynomials in the inputs. The problem of recovering a planted q-coloring is equivalent to recovering q disjoint planted cliques that cover all the vertices—a potentially easier variant of the well-studied planted clique problem. Our first result shows that this variant is as hard as the original planted clique problem in the low-degree polynomial model of computation: each clique needs to have size k ≫ √n for efficient recovery to be possible. For the related variant where the cliques cover a (1 − ε)-fraction of the vertices, we also show hardness by reduction from planted clique. Our second result shows that refuting q-colorability of G(n, 1/2) is hard in the low-degree polynomial model when q ≫ n^{2/3} but easy when q ≪ n^{1/2}, and we leave closing this gap for future work. Our proof is more subtle than similar results for planted clique and involves constructing a non-standard distribution over q-colorable graphs. We note that while related to several prior works, this is the first work that explicitly formulates refutation problems in the low-degree polynomial model. The proofs of our main results involve showing low-degree hardness of hypothesis testing between an appropriately constructed pair of distributions. For refutation, we show completeness of this approach: in the low-degree model, the refutation task is …
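
A sketch of how a planted instance of this kind can be generated (consistent with the description above, though details such as whether the color classes are balanced may differ from the paper's exact distribution):

```python
import numpy as np

def planted_coloring_instance(n, q, rng=np.random.default_rng(0)):
    """G(n, 1/2) with a planted proper q-coloring.

    Sample an Erdos-Renyi graph, give each vertex a random color, then delete
    every edge whose endpoints share a color.  The hidden color classes are
    independent sets here; in the complement graph they are q disjoint cliques
    covering all vertices, as discussed in the abstract.
    """
    colors = rng.integers(q, size=n)
    upper = np.triu(rng.random((n, n)) < 0.5, 1)
    adj = upper | upper.T                            # symmetric adjacency, no loops
    adj[colors[:, None] == colors[None, :]] = False  # remove monochromatic edges
    return adj.astype(int), colors

adjacency, hidden_coloring = planted_coloring_instance(n=200, q=5)
```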

Calibrated language models must hallucinate

Authors

Adam Tauman Kalai, Santosh S Vempala

Journal

arXiv preprint arXiv:2311.14648

Published Date

2023/11/24

Recent language models generate false but plausible-sounding text with surprising frequency. Such "hallucinations" are an obstacle to the usability of language-based AI systems and can harm people who rely upon their outputs. This work shows that there is an inherent statistical lower-bound on the rate that pretrained language models hallucinate certain types of facts, having nothing to do with the transformer LM architecture or data quality. For "arbitrary" facts whose veracity cannot be determined from the training data, we show that hallucinations must occur at a certain rate for language models that satisfy a statistical calibration condition appropriate for generative language models. Specifically, if the maximum probability of any fact is bounded, we show that the probability of generating a hallucination is close to the fraction of facts that occur exactly once in the training data (a "Good-Turing" estimate), even assuming ideal training data without errors. One conclusion is that models pretrained to be sufficiently good predictors (i.e., calibrated) may require post-training to mitigate hallucinations on the type of arbitrary facts that tend to appear once in the training set. However, our analysis also suggests that there is no statistical reason that pretraining will lead to hallucination on facts that tend to appear more than once in the training data (like references to publications such as articles and books, whose hallucinations have been particularly notable and problematic) or on systematic facts (like arithmetic calculations). Therefore, different architectures and learning algorithms may mitigate these latter types of hallucinations.
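
The "Good-Turing" quantity referenced above, the fraction of training observations accounted for by facts that appear exactly once, is simple to compute; a toy illustration (not the paper's model of facts or documents):

```python
from collections import Counter

def singleton_fraction(facts):
    """Fraction of observations that come from facts seen exactly once.

    This is the classical Good-Turing estimate of the unseen mass, which the
    abstract relates to a lower bound on the hallucination rate.
    """
    counts = Counter(facts)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(facts)

# "c" and "d" each appear exactly once, so the estimate is 2/6.
print(singleton_fraction(["a", "a", "b", "c", "d", "b"]))
```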

Condition-number-independent convergence rate of Riemannian Hamiltonian Monte Carlo with numerical integrators

Authors

Yunbum Kook, Yin Tat Lee, Ruoqi Shen, Santosh Vempala

Published Date

2023/7/12

We study the convergence rate of discretized Riemannian Hamiltonian Monte Carlo for sampling from distributions of the form on a convex body M ⊂ ℝ^n. We show that for distributions of the form on a polytope with constraints, the convergence rate of a family of commonly-used integrators is independent of and the geometry of the polytope. In particular, the implicit midpoint method (IMM) and the generalized Leapfrog method (LM) have a mixing time of to achieve total variation distance to the target distribution. These guarantees are based on a general bound on the convergence rate for densities of the form in terms of parameters of the manifold and the integrator. Our theoretical guarantee complements the empirical results of Kook et al. (2022), which show that RHMC with IMM can sample ill-conditioned, non-smooth and constrained distributions in very high dimension efficiently in practice.
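
For context, here is a minimal sketch of the ordinary (Euclidean) leapfrog integrator used in Hamiltonian Monte Carlo; the generalized Leapfrog and implicit midpoint methods analyzed in the paper are Riemannian variants with implicit update equations, which this sketch does not implement:

```python
import numpy as np

def leapfrog(x, p, grad_logpi, step, n_steps):
    """Standard (Euclidean) leapfrog integrator for Hamiltonian Monte Carlo.

    x, p       : position and momentum vectors
    grad_logpi : gradient of the log-density of the target
    Integrates Hamilton's equations for H(x, p) = -log pi(x) + ||p||^2 / 2.
    """
    p = p + 0.5 * step * grad_logpi(x)       # initial half step in momentum
    for _ in range(n_steps - 1):
        x = x + step * p                     # full step in position
        p = p + step * grad_logpi(x)         # full step in momentum
    x = x + step * p
    p = p + 0.5 * step * grad_logpi(x)       # final half step in momentum
    return x, p

# Example: integrate on a standard Gaussian target, where grad log pi(x) = -x.
x, p = leapfrog(np.ones(3), np.zeros(3), lambda x: -x, step=0.1, n_steps=20)
```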

The bit complexity of efficient continuous optimization

Authors

Mehrdad Ghadiri, Richard Peng, Santosh S Vempala

Published Date

2023/4/4

We analyze the bit complexity of efficient algorithms for fundamental optimization problems, such as linear regression, p-norm regression, and linear programming (LP). State-of-the-art algorithms are iterative, and in terms of the number of arithmetic operations, they match the current time complexity of multiplying two n-by-n matrices (up to polylogarithmic factors). However, previous work has typically assumed infinite precision arithmetic, and due to complicated inverse maintenance techniques, the actual running times of these algorithms are unknown. To settle the running time and bit complexity of these algorithms, we demonstrate that a core common subroutine, known as inverse maintenance, is backward-stable. Additionally, we show that iterative approaches for solving constrained weighted regression problems can be accomplished with bounded-error preconditioners. Specifically, we prove that linear programs can be …

The k-Cap Process on Geometric Random Graphs

Authors

Mirabel E Reid, Santosh S Vempala

Published Date

2023/7/12

The k-cap (or k-winners-take-all) process on a graph works as follows: in each iteration, a subset of vertices of the graph are identified as winners; the next round winners are the vertices that have the highest total degree from the current winners, with ties broken randomly. This natural process is a simple model of firing activity and inhibition in the brain and has been found to have desirable robustness properties as an activation function. We study its convergence on directed geometric random graphs in any constant dimension, revealing rather surprising behavior, with the support of the current active set converging to lie in a small ball and the active set itself remaining essentially random within that.
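
A minimal simulation of the k-cap dynamics described above, run here on a plain random directed graph rather than the geometric random graphs the paper studies:

```python
import numpy as np

def k_cap_step(A, winners, k, rng):
    """One step of the k-cap (k-winners-take-all) process.

    A[i, j] = 1 if there is an edge from i to j.  The next winners are the k
    vertices receiving the largest total input from the current winners,
    with ties broken randomly.
    """
    inputs = A[winners].sum(axis=0).astype(float)
    inputs += rng.random(inputs.size) * 1e-9      # random tie-breaking
    return np.argsort(inputs)[-k:]

rng = np.random.default_rng(0)
n, k = 500, 20
A = (rng.random((n, n)) < 0.05).astype(int)       # a random directed graph
winners = rng.choice(n, size=k, replace=False)    # initial winner set
for _ in range(30):
    winners = k_cap_step(A, winners, k, rng)
```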

Beyond moments: Robustly learning affine transformations with asymptotically optimal error

Authors

He Jia, Pravesh K Kothari, Santosh S Vempala

Published Date

2023/11/6

We present a polynomial-time algorithm for robustly learning an unknown affine transformation of the standard hypercube from samples, an important and well-studied setting for independent component analysis (ICA). Specifically, given an -corrupted sample from a distribution D obtained by applying an unknown affine transformation to the uniform distribution on a d-dimensional hypercube , our algorithm constructs such that the total variation distance of the distribution from D is using poly time and samples. Total variation distance is the information-theoretically strongest possible notion of distance in our setting and our recovery guarantees in this distance are optimal up to the absolute constant factor multiplying . In particular, if the rows of A are normalized to be unit length, our total variation distance guarantee implies a bound on the sum of the distances between the row vectors of A and …
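
An illustration of the sampling model described above (a sketch only: the matrix, the shift, and the crude outlier corruption below are placeholders, since the paper allows arbitrary corruptions of an unspecified fraction of the sample):

```python
import numpy as np

def corrupted_affine_cube_samples(A_true, shift, n, eps, rng=np.random.default_rng(0)):
    """Samples from an affine image of the uniform cube, with an eps fraction replaced.

    Clean samples are x = A_true @ u + shift with u uniform on [-1, 1]^d; the
    adversary is modeled crudely here by replacing an eps fraction with large
    outliers (a placeholder -- the paper allows arbitrary corruptions).
    """
    d = A_true.shape[1]
    U = rng.uniform(-1.0, 1.0, size=(n, d))
    X = U @ A_true.T + shift
    m = int(eps * n)
    X[:m] = 100.0 * rng.standard_normal((m, d))   # placeholder corruptions
    return X

X = corrupted_affine_cube_samples(np.eye(3) + 0.1, np.zeros(3), n=10_000, eps=0.05)
```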

Computation with Sequences in the Brain

Authors

Max Dabagia, Christos H Papadimitriou, Santosh S Vempala

Journal

arXiv preprint arXiv:2306.03812

Published Date

2023/6/6

Even as machine learning exceeds human-level performance on many applications, the generality, robustness, and rapidity of the brain's learning capabilities remain unmatched. How cognition arises from neural activity is a central open question in neuroscience, inextricable from the study of intelligence itself. A simple formal model of neural activity was proposed in Papadimitriou [2020] and has been subsequently shown, through both mathematical proofs and simulations, to be capable of implementing certain simple cognitive operations via the creation and manipulation of assemblies of neurons. However, many intelligent behaviors rely on the ability to recognize, store, and manipulate temporal sequences of stimuli (planning, language, navigation, to list a few). Here we show that, in the same model, time can be captured naturally as precedence through synaptic weights and plasticity, and, as a result, a range of computations on sequences of assemblies can be carried out. In particular, repeated presentation of a sequence of stimuli leads to the memorization of the sequence through corresponding neural assemblies: upon future presentation of any stimulus in the sequence, the corresponding assembly and its subsequent ones will be activated, one after the other, until the end of the sequence. Finally, we show that any finite state machine can be learned in a similar way, through the presentation of appropriate patterns of sequences. Through an extension of this mechanism, the model can be shown to be capable of universal computation. We support our analysis with a number of experiments to probe the limits of learning in this model in …

Convergence of Gibbs sampling: Coordinate Hit-and-Run mixes fast

Authors

Aditi Laddha, Santosh S Vempala

Journal

Discrete & Computational Geometry

Published Date

2023/9

Gibbs sampling, also known as Coordinate Hit-and-Run (CHAR), is a Markov chain Monte Carlo algorithm for sampling from high-dimensional distributions. In each step, the algorithm selects a random coordinate and re-samples that coordinate from the distribution induced by fixing all the other coordinates. While this algorithm has become widely used over the past half-century, guarantees of efficient convergence have been elusive. We show that the Coordinate Hit-and-Run algorithm for sampling from a convex body K in mixes in steps, where K contains a ball of radius r and R is the average distance of a point of K from its centroid. We also give an upper bound on the conductance of Coordinate Hit-and-Run, showing that it is strictly worse than Hit-and-Run or the Ball Walk in the worst case.
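
The resampling step described above is easy to state concretely for the uniform distribution on a polytope; a minimal sketch (assuming the polytope {x : Ax ≤ b} is bounded and x starts in its interior):

```python
import numpy as np

def char_step(A, b, x, rng):
    """One Coordinate Hit-and-Run step for the uniform distribution on {x : Ax <= b}.

    Pick a random coordinate, compute the chord of the polytope through x
    along that axis, and resample the coordinate uniformly on that chord.
    """
    i = rng.integers(x.size)
    s = b - A @ x                     # current slacks, all > 0 for interior x
    a = A[:, i]
    with np.errstate(divide="ignore"):
        t = s / a                     # constraint j allows a_j * t <= s_j
    lo = np.max(np.where(a < 0, t, -np.inf))
    hi = np.min(np.where(a > 0, t, np.inf))
    y = x.copy()
    y[i] += rng.uniform(lo, hi)
    return y

# Example: sample from the cube [-1, 1]^3 written as Ax <= b.
A = np.vstack([np.eye(3), -np.eye(3)])
b = np.ones(6)
x = np.zeros(3)
rng = np.random.default_rng(0)
for _ in range(1000):
    x = char_step(A, b, x, rng)
```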

The manifold joys of sampling (invited talk)

Authors

Yin Tat Lee, Santosh S Vempala

Published Date

2022

We survey recent progress and many open questions in the field of sampling high-dimensional distributions, with specific focus on sampling with non-Euclidean metrics.

A unified view of graph regularity via matrix decompositions

Authors

Greg Bodwin, Santosh Vempala

Journal

Random Structures & Algorithms

Published Date

2022/8

We give a unified proof of algorithmic weak and Szemerédi regularity lemmas for several well-studied classes of sparse graphs, for which only weak regularity lemmas were previously known. These include core-dense graphs, low threshold rank graphs, and (a version of) upper regular graphs. More precisely, we define cut pseudorandom graphs, we prove our regularity lemmas for these graphs, and then we show that cut pseudorandomness captures all of the above graph classes as special cases. The core of our approach is an abstracted matrix decomposition, which can be computed by a simple algorithm due to Charikar. Using work of Oveis Gharan and Trevisan, it also implies new PTASes for MAX-CUT, MAX-BISECTION, MIN-BISECTION for a significantly expanded class of input graphs. (It is NP-hard to get PTASes for these graphs in general.)

Provable lifelong learning of representations

Authors

Xinyuan Cao, Weiyang Liu, Santosh S Vempala

Published Date

2022

In lifelong learning, tasks (or classes) to be learned arrive sequentially over time in arbitrary order. During training, knowledge from previous tasks can be captured and transferred to subsequent ones to improve sample efficiency. We consider the setting where all target tasks can be represented in the span of a small number of unknown linear or nonlinear features of the input data. We propose a lifelong learning algorithm that maintains and refines the internal feature representation. We prove that for any desired accuracy on all tasks, the dimension of the representation remains close to that of the underlying representation. The resulting sample complexity improves significantly on existing bounds. In the setting of linear features, our algorithm is provably efficient and the sample complexity for input dimension , tasks with features up to error is . We also prove a matching lower bound for any lifelong learning algorithm that uses a single task learner as a black box. We complement our analysis with an empirical study, including a heuristic lifelong learning algorithm for deep neural networks. Our method performs favorably on challenging realistic image datasets compared to state-of-the-art continual learning methods.

Geodesic Walks in Polytopes

Authors

Yin Tat Lee, Santosh Vempala

Journal

SIAM Journal on Computing

Published Date

2022

We introduce the geodesic walk for sampling Riemannian manifolds and apply it to the problem of generating uniform random points from the interior of polytopes in specified by inequalities. The walk is a discrete-time simulation of a stochastic differential equation on the Riemannian manifold equipped with the metric induced by the Hessian of a convex function; each step is the solution of an ordinary differential equation (ODE). The resulting sampling algorithm for polytopes mixes in steps. This is the first walk that breaks the quadratic barrier for mixing in high dimension, improving on the previous best bound of by Kannan and Narayanan for the Dikin walk. We also show that each step of the geodesic walk (solving an ODE) can be implemented efficiently, thus improving the time complexity for sampling polytopes. Our analysis of the geodesic walk for general Hessian manifolds does not …

Sampling with Riemannian Hamiltonian Monte Carlo in a constrained space

Authors

Yunbum Kook, Yin-Tat Lee, Ruoqi Shen, Santosh Vempala

Journal

Advances in Neural Information Processing Systems

Published Date

2022/12/6

We demonstrate for the first time that ill-conditioned, non-smooth, constrained distributions in very high dimension, upwards of 100,000, can be sampled efficiently in practice. Our algorithm incorporates constraints into the Riemannian version of Hamiltonian Monte Carlo and maintains sparsity. This allows us to achieve a mixing rate independent of smoothness and condition numbers. On benchmark data sets in systems biology and linear programming, our algorithm outperforms existing packages by orders of magnitude. In particular, we achieve a 1,000-fold speed-up for sampling from the largest published human metabolic network (RECON3D). Our package has been incorporated into a popular bioinformatics library.

Assemblies of neurons learn to classify well-separated distributions

Authors

Max Dabagia, Santosh S Vempala, Christos Papadimitriou

Published Date

2022/6/28

An assembly is a large population of neurons whose synchronous firing represents a memory, concept, word, or other cognitive category. Assemblies are believed to provide a bridge between high-level cognitive phenomena and low-level neural activity. Recently, a computational system called the Assembly Calculus (AC), with a repertoire of biologically plausible operations on assemblies, has been shown capable of simulating arbitrary space-bounded computation, but also of simulating complex cognitive phenomena such as language, reasoning, and planning. However, the mechanism whereby assemblies can mediate learning has not been known. Here we present such a mechanism, and prove rigorously that, for simple classification problems defined on distributions of labeled assemblies, a new assembly representing each class can be reliably formed in response to a few stimuli from the class; this assembly is henceforth reliably recalled in response to new stimuli from the same class. Furthermore, such class assemblies will be distinguishable as long as the respective classes are reasonably separated; for example, when they are clusters of similar assemblies, or more generally separable with margin by a linear threshold function. To prove these results, we draw on random graph theory with dynamic edge weights to estimate sequences of activated vertices, yielding strong generalizations of previous calculations and theorems in this field over the past five years. These theorems are backed up by experiments demonstrating the successful formation of assemblies which represent concept classes on synthetic data drawn …

A unified approach to discrepancy minimization

Authors

Nikhil Bansal, Aditi Laddha, Santosh S Vempala

Journal

arXiv preprint arXiv:2205.01023

Published Date

2022/5/2

We study a unified approach and algorithm for constructive discrepancy minimization based on a stochastic process. By varying the parameters of the process, one can recover various state-of-the-art results. We demonstrate the flexibility of the method by deriving a discrepancy bound for smoothed instances, which interpolates between known bounds for worst-case and random instances.


Santosh S. Vempala FAQs

What is Santosh S. Vempala's h-index at Georgia Institute of Technology?

The h-index of Santosh S. Vempala has been 50 since 2020 and 71 in total.

What are Santosh S. Vempala's top articles?

The articles with the following titles

Contrastive Moments: Unsupervised Halfspace Learning in Polynomial Time

Eldan's stochastic localization and the KLS conjecture: Isoperimetry, concentration and mixing

Computation with Sequences of Assemblies in a Model of the Brain

Sampling with Barriers: Faster Mixing via Lewis Weights

Efficiently Sampling the PSD Cone with the Metric Dikin Walk

Is Planted Coloring Easier than Planted Clique?

Calibrated language models must hallucinate

Condition-number-independent convergence rate of Riemannian Hamiltonian Monte Carlo with numerical integrators

...

are the top articles of Santosh S. Vempala at Georgia Institute of Technology.

What are Santosh S. Vempala's research interests?

The research interests of Santosh S. Vempala are: Algorithms, Randomness, High-dimensional Geometry, Optimization, Foundations of Data Science

What is Santosh S. Vempala's total number of citations?

Santosh S. Vempala has 21,988 citations in total.
