Sushant Sachdeva

Associate Professor
University of Toronto

Associate Professor, University of Toronto Mississauga
Faculty Member, Vector Institute

Fall 2025--	On leave at OpenAI
2016-2017	Research Scientist, Google
2014-2016	Department of Computer Science, Yale University Postdoctoral Associate, Hosted by Daniel Spielman
Fall 2013	Simons Fellow, Simons Institute, UC Berkeley
2008-2013	Ph.D., Computer Science, Princeton University Advised by Sanjeev Arora
2004-2008	B.Tech., Computer Science and Engg., IIT Bombay

Research Interests

Algorithms, and its connections to optimization and statistics.

Focus areas: Design of fast algorithms, particularly for graph problems combining techniques from convex optimization, and numerical linear algebra

Mathematical Machine Learning: Optimization for machine learning, Representation Learning, Learning on Graphs, Mixing of Markov Chains

Positions

The theory group is seeking MSc/PhD students with strong cs/math backgrounds interested in algorithms and/or machine learning broadly. Please apply directly to the department, and indicate in your application if you're interested in working with me.

I am also looking for MScAC students working on projects related to model training, efficiency improvement, learning on graphs etc.

Recent News

Dec 2025: Awarded the 2025 Infosys Award in Engineering and Computer Science

Jun 2023: Awarded the 2023 Frontiers of Science Award

Feb 2023: Awarded the Sloan Research Fellowship 2023

Feb 2023: Awarded the Young Alumni Achiever Award by IIT Bombay

Jan 2023: Invited to give a Plenary Talk at SODA 2023

Oct 2022: Our paper received the Best Paper Award at FOCS 2022

Aug 2022: Deeksha Adil successfully defended her Ph.D.

Current and Past Group Members

Lawrence Li (PhD, 2020--)
Yibin Zhao (PhD, 2020--)
Paul Zhang (MSc, 2024--2026)
Laurel Aquino (MScAC, 2025--2026)
Jonathan Hsu (MScAC, 2025--2026)
Sipu Qi (MScAC, 2025--2026)
Jiongan Mu (MScAC, 2025--2026)
Yuhe Fan (MScAC, 2024--2025)
Anthony Rinaldi (MScAC, 2023--2024)
Hantang Li (MScAC, 2023--24)
Deepkamal Kaur Gill (MScAC, 2022--23)
Gramoz Goranci (Postdoc, 2020--21)
Deeksha Adil (PhD, 2017--22)
Hao Zhang (MScAC, 2021--22)

Recent Service

Program Committee member for FOCS 2025, ITCS 2025, ICALP 2024, ITCS 2024, FOCS 2023. Area chair for NeurIPS 2025, 2024

Associate Editor for TheoretiCS, Guest Editor for TALG (SODA 2019 Special Issue), SICOMP (STOC 2019 Special Issue)

Richard Peng (GaTech) and I organized a workshop at FOCS 2018: Laplacian Paradigm 2.0

Selected Publications

Maximum Flow and Minimum-Cost Flow in Almost-Linear Time
Best Paper Award

FOCS 2022

with Li Chen, Rasmus Kyng, Yang P. Liu, Richard Peng, Maximilian Probst Gutenberg

Abstract

Conference

arXiv

Short Video

Long Video

We give an algorithm that computes exact maximum flows and minimum-cost flows on directed graphs with m edges and polynomially bounded integral demands, costs, and capacities in \(m^{1+o(1)}\) time. Our algorithm builds the flow through a sequence of \(m^{1+o(1)}\) approximate undirected minimum-ratio cycles, each of which is computed and processed in amortized \(m^{o(1)}\) time using a new dynamic graph data structure.
Our framework extends to algorithms running in \(m^{1+o(1)}\) time for computing flows that minimize general edge-separable convex functions to high accuracy. This gives almost-linear time algorithms for several problems including entropy-regularized optimal transport, matrix scaling, p-norm flows, and p-norm isotonic regression on arbitrary directed acyclic graphs.

Graph Sparsification, Spectral Sketches, and Faster Resistance Computation,
via Short Cycle Decompositions

SICOMP 2020
FOCS 2018

with Timothy Chu, Yu Gao, Richard Peng, Saurabh Sawlani, Junxing Wang

Abstract

Journal

Conference

arXiv

Video

We develop a framework for graph sparsification and sketching, based on a new tool, short cycle decomposition -- a decomposition of an unweighted graph into an edge-disjoint collection of short cycles, plus few extra edges. A simple observation gives that every graph \(G\) on \(n\) vertices with m edges can be decomposed in \(O(mn)\) time into cycles of length at most \(2 \log n\), and at most \(2n\) extra edges. We give an \(m^{1+o(1)}\) time algorithm for constructing a short cycle decomposition, with cycles of length \(n^{o(1)}\), and \(n^{1+o(1)}\) extra edges. These decompositions enable us to make progress on several open questions:

* We give an algorithm to find \((1\pm\epsilon)\)-approximations to effective resistances of all edges in time \(m^{1+o(1)}\epsilon^{-1.5}\), improving over the previous best of \(\widetilde{O}(\min\{m\epsilon^{-2},n^2 \epsilon^{-1}\})\). This gives an algorithm to approximate the determinant of a Laplacian up to \((1\pm\epsilon)\) in \(m^{1 + o(1)} + n^{15/8+o(1)}\epsilon^{-7/4}\) time.

* We show existence and efficient algorithms for constructing graphical spectral sketches -- a distribution over sparse graphs H such that for a fixed vector \(x\), we have w.h.p. \(x'L_Hx=(1\pm\epsilon)x'L_Gx\) and \(x'L_H^+x=(1\pm\epsilon)x'L_G^+x\). This implies the existence of resistance-sparsifiers with about \(n\epsilon^{-1}\) edges that preserve the effective resistances between every pair of vertices up to \((1\pm\epsilon).\)

* By combining short cycle decompositions with known tools in graph sparsification, we show the existence of nearly-linear sized degree-preserving spectral sparsifiers, as well as significantly sparser approximations of directed graphs. The latter is critical to recent breakthroughs on faster algorithms for solving linear systems in directed Laplacians.

Improved algorithms for constructing short cycle decompositions will lead to improvements for each of the above results.

Approximate Gaussian Elimination for Laplacians:
Fast, Sparse, and Simple

FOCS 2016

with Rasmus Kyng

Abstract

Conference

arXiv

Video

We show how to perform sparse approximate Gaussian elimination for Laplacian matrices. We present a simple, nearly linear time algorithm that approximates a Laplacian by a matrix with a sparse Cholesky factorization, the version of Gaussian elimination for symmetric matrices. This is the first nearly linear time solver for Laplacian systems that is based purely on random sampling, and does not use any graph theoretic constructions such as low-stretch trees, sparsifiers, or expanders. The crux of our analysis is a novel concentration bound for matrix martingales where the differences are sums of conditionally independent variables.

Algorithms for Lipschitz Learning on Graphs

COLT 2015

with Rasmus Kyng, Anup Rao, Daniel Spielman

Abstract

Conference

arXiv

Video

Code on Github

We develop fast algorithms for solving regression problems on graphs where one is given the value of a function at some vertices, and must find its smoothest possible extension to all vertices. The extension we compute is the absolutely minimal Lipschitz extension, and is the limit for large p of p-Laplacian regularization. We present an algorithm that computes a minimal Lipschitz extension in expected linear time, and an algorithm that computes an absolutely minimal Lipschitz extension in expected time \(\widetilde{O}(mn)\). The latter algorithm has variants that seem to run much faster in practice. These extensions are particularly amenable to regularization: we can perform \(\ell_{0}\)-regularization on the given values in polynomial time and \(\ell_{1}\)-regularization on the initial function values and on graph edge weights in time \(\widetilde{O}(m^{\frac{3}{2}})\).

Faster Algorithms via Approximation Theory

FnTTCS Monograph

with Nisheeth K. Vishnoi

Abstract

Now Publishers

PDF

Faster Algorithms via Approximation Theory illustrates how classical and modern techniques from approximation theory play a crucial role in obtaining results that are relevant to the emerging theory of fast algorithms. The key lies in the fact that such results imply faster ways to approximate primitives such as \(A^s v, A^{-1}v, \exp(-A)v\), eigenvalues and eigenvectors, which are fundamental to many spectral algorithms.
The first half of the book is devoted to the ideas and results from approximation theory that are central, elegant, and may have wider applicability in theoretical computer science. These include not only techniques relating to polynomial approximations but also those relating to approximations by rational functions and beyond. The remaining half illustrates a variety of ways that these results can be used to design fast algorithms.
Faster Algorithms via Approximation Theory is self-contained and should be of interest to researchers and students in theoretical computer science, numerical linear algebra, and related areas.

Provable ICA with Unknown Gaussian Noise,
and Implications for Gaussian Mixtures and Autoencoders

Algorithmica 2015
NIPS 2012

with Sanjeev Arora, Rong Ge, Ankur Moitra

Abstract

Journal

Conference

arXiv

We present a new algorithm for Independent Component Analysis (ICA) which has provable performance guarantees. In particular, suppose we are given samples of the form y = Ax + \(\eta\) where A is an unknown n X n matrix and x is chosen uniformly at random from \(\{+1, -1\}^n\), \(\eta\) is an n-dimensional Gaussian random variable with unknown covariance \(\Sigma\): We give an algorithm that provable recovers A and \(\Sigma\) up to an additive \(\epsilon\) whose running time and sample complexity are polynomial in n and \(1 / \epsilon\). To accomplish this, we introduce a novel "quasi-whitening" step that may be useful in other contexts in which the covariance of Gaussian noise is not known in advance. We also give a general framework for finding all local optima of a function (given an oracle for approximately finding just one) and this is a crucial step in our algorithm, one that has been overlooked in previous attempts, and allows us to control the accumulation of error when we find the columns of A one by one via local search.

Approximating the Exponential, the Lanczos Method and
an \(\widetilde{O}(m)\)-Time Spectral Algorithm for Balanced Separator

STOC 2012

with Lorenzo Orecchia, Nisheeth K. Vishnoi

Abstract

Conference

arXiv

Video

We give a novel spectral approximation algorithm for the balanced separator problem that, given a graph G, a constant balance b \(\in (0,\frac{1}{2}],\) and a parameter \(\gamma,\) either finds an \(\Omega(b)\)-balanced cut of conductance \(O(\sqrt{\gamma})\) in G, or outputs a certificate that all b-balanced cuts in G have conductance at least \(\gamma,\) and runs in time \(\widetilde{O}(m).\) This settles the question of designing asymptotically optimal spectral algorithms for balanced separator. Our algorithm relies on a variant of the heat kernel random walk and requires, as a subroutine, an algorithm to compute exp(-L)v where L is the Laplacian of a graph related to G and v is a vector. Algorithms for computing the matrix-exponential-vector product efficiently comprise our next set of results. Our main result here is a new algorithm which computes a good approximation to exp(-A)v for a class of symmetric positive semidefinite (PSD) matrices A and a given vector v, in time roughly \(\widetilde{O}(m_A),\) where \(m_A\) is the number of non-zero entries of A. This uses, in a non-trivial way, the breakthrough result of Spielman and Teng on inverting symmetric and diagonally-dominant matrices in \(\widetilde{O}(m_A)\) time. Finally, we prove that \(e^{-x}\) can be uniformly approximated up to a small additive error, in a non-negative interval [a,b] with a polynomial of degree roughly \(\sqrt{b-a}.\) While this result is of independent interest in approximation theory, we show that, via the Lanczos method from numerical analysis, it yields a simple algorithm to compute exp(-A)v for symmetric PSD matrices that runs in time roughly \(O(t_A\cdot \sqrt{\|A\|}),\) where \(t_A\) is time required for the computation of the vector Aw for given vector w. As an application, we obtain a simple and practical algorithm, with output conductance \(O(\sqrt{\gamma}),\) for balanced separator that runs in time \(\widetilde{O}(\frac{m}{\sqrt{\gamma}}).\) This latter algorithm matches the running time, but improves on the approximation guarantee of the Evolving-Sets-based algorithm by Andersen and Peres for balanced separator.