Yurii Nesterov

Photograph caption: 2005 in Oberwolfach
Born: January 25, 1956 (age 67)
Citizenship: Belgium
Alma mater: Moscow State University (1977)
Doctoral advisor: Boris Polyak

Yurii Nesterov is a Russian mathematician and an internationally recognized expert in convex optimization, especially in the development of efficient algorithms and the analysis of numerical optimization methods. He is currently a professor at the University of Louvain (UCLouvain).

Biography

In 1977, Yurii Nesterov graduated in applied mathematics from Moscow State University. From 1977 to 1992 he was a researcher at the Central Economic Mathematical Institute of the Russian Academy of Sciences. Since 1993 he has been working at UCLouvain, specifically in the Department of Mathematical Engineering of the Louvain School of Engineering and at the Center for Operations Research and Econometrics (CORE).

In 2000, Nesterov received the Dantzig Prize.[1]

In 2009, Nesterov won the John von Neumann Theory Prize.[2]

In 2016, Nesterov received the EURO Gold Medal.[3]

In 2023, Yurii Nesterov and Arkadi Nemirovski received the WLA Prize in Computer Science or Mathematics, "for their seminal work in convex optimization theory".[4]

Academic work

Nesterov is best known for his work in convex optimization, including his 2004 book, which is considered a canonical reference on the subject.[5] His main novel contribution is an accelerated version of gradient descent that converges considerably faster than ordinary gradient descent, commonly referred to as Nesterov momentum, Nesterov acceleration, or Nesterov accelerated gradient (NAG).[6][7][8][9][10] This acceleration technique was later built upon by Beck and Teboulle in their FISTA algorithm, introduced in the 2009 paper "A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems".[11]
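
A minimal sketch of the accelerated scheme, in the "momentum" form commonly used in machine learning (the function name, step size, momentum coefficient, and quadratic test problem below are illustrative choices, not taken from Nesterov's papers):

```python
import numpy as np

def nesterov_accelerated_gradient(grad, x0, lr=0.01, mu=0.9, n_iters=1000):
    """Minimize a differentiable function, given its gradient `grad`, using
    the momentum form of Nesterov's accelerated gradient."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)           # velocity (momentum) term
    for _ in range(n_iters):
        g = grad(x + mu * v)       # gradient evaluated at the look-ahead point
        v = mu * v - lr * g        # update the velocity
        x = x + v                  # take the step
    return x

# Example: an ill-conditioned quadratic f(x) = 0.5 * x^T A x, minimized at the origin.
A = np.diag([1.0, 100.0])
grad_f = lambda x: A @ x
print(nesterov_accelerated_gradient(grad_f, x0=[5.0, 5.0]))  # approaches [0, 0]
```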

His work with Arkadi Nemirovski in their 1994 book[12] was the first to show that interior-point methods can solve general convex optimization problems, and the first to give a systematic study of semidefinite programming (SDP). In this book they also introduced self-concordant functions, which are useful in the analysis of Newton's method.[13]
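
As a standard illustration (a well-known textbook example, not a quotation from the book), the logarithmic barrier of a polyhedron is the prototypical self-concordant function used by interior-point methods:

\[
\phi(x) = -\sum_{i=1}^{m} \log\bigl(b_i - a_i^{\top} x\bigr).
\]

Minimizing \(t\,c^{\top}x + \phi(x)\) by Newton's method for increasing \(t > 0\) traces the central path toward a solution of the linear program \(\min\{c^{\top}x : a_i^{\top}x \le b_i,\ i = 1,\dots,m\}\).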

Related Research Articles

The John von Neumann Theory Prize of the Institute for Operations Research and the Management Sciences (INFORMS) is awarded annually to an individual who has made fundamental and sustained contributions to theory in operations research and the management sciences.

Linear programming (LP), also called linear optimization, is a method to achieve the best outcome in a mathematical model whose requirements are represented by linear relationships. Linear programming is a special case of mathematical programming.

Mathematical optimization or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfields: discrete optimization and continuous optimization. Optimization problems arise in all quantitative disciplines from computer science and engineering to operations research and economics, and the development of solution methods has been of interest in mathematics for centuries.

In mathematics, gradient descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. The idea is to take repeated steps in the opposite direction of the gradient of the function at the current point, because this is the direction of steepest descent. Conversely, stepping in the direction of the gradient will lead to a local maximum of that function; the procedure is then known as gradient ascent. It is particularly useful in machine learning for minimizing the cost or loss function. Gradient descent should not be confused with local search algorithms, although both are iterative methods for optimization.
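
A minimal sketch of plain gradient descent (the step size, tolerance, and two-dimensional test function are illustrative):

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.1, tol=1e-8, max_iters=10000):
    """Repeatedly step opposite the gradient until the gradient is small."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # stop near a stationary point
            break
        x = x - lr * g                # step in the direction of steepest descent
    return x

# Minimize f(x, y) = (x - 3)^2 + (y + 1)^2, whose minimum is at (3, -1).
grad_f = lambda p: np.array([2 * (p[0] - 3), 2 * (p[1] + 1)])
print(gradient_descent(grad_f, x0=[0.0, 0.0]))  # approaches [3, -1]
```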

Stochastic gradient descent is an iterative method for optimizing an objective function with suitable smoothness properties. It can be regarded as a stochastic approximation of gradient descent optimization, since it replaces the actual gradient by an estimate thereof. Especially in high-dimensional optimization problems this reduces the very high computational burden, achieving faster iterations in exchange for a lower convergence rate.
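
A minimal sketch of stochastic gradient descent for least-squares regression, where each update uses the gradient of a single sample (the synthetic data, constant step size, and epoch count are illustrative):

```python
import numpy as np

def sgd_least_squares(X, y, lr=0.01, n_epochs=50, seed=0):
    """Fit w minimizing the mean of (x_i . w - y_i)^2, one sample at a time."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_epochs):
        for i in rng.permutation(n):           # visit samples in random order
            g = 2 * (X[i] @ w - y[i]) * X[i]   # gradient estimate from one sample
            w -= lr * g
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=200)
print(sgd_least_squares(X, y))  # close to w_true
```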

Interior-point methods are a certain class of algorithms that solve linear and nonlinear convex optimization problems.

Convex optimization is a subfield of mathematical optimization that studies the problem of minimizing convex functions over convex sets. Many classes of convex optimization problems admit polynomial-time algorithms, whereas mathematical optimization is in general NP-hard.

In convex optimization, a linear matrix inequality (LMI) is an expression of the form \(A(y) := A_0 + y_1 A_1 + \cdots + y_m A_m \succeq 0\), where the \(A_i\) are symmetric matrices, \(y \in \mathbb{R}^m\) is the variable, and \(\succeq 0\) means that \(A(y)\) is positive semidefinite.

In optimization, a self-concordant function is a convex function \(f\) satisfying \(|f'''(x)| \le 2 f''(x)^{3/2}\) in the univariate case; a multivariate function is self-concordant if it is self-concordant along every line in its domain.

The Bregman method is an iterative algorithm to solve certain convex optimization problems involving regularization. The original version is due to Lev M. Bregman, who published it in 1967.

Arkadi Nemirovski is a professor at the H. Milton Stewart School of Industrial and Systems Engineering at the Georgia Institute of Technology. He has been a leader in continuous optimization and is best known for his work on the ellipsoid method, modern interior-point methods and robust optimization.

Coordinate descent is an optimization algorithm that successively minimizes along coordinate directions to find the minimum of a function. At each iteration, the algorithm determines a coordinate or coordinate block via a coordinate selection rule, then exactly or inexactly minimizes over the corresponding coordinate hyperplane while fixing all other coordinates or coordinate blocks. A line search along the coordinate direction can be performed at the current iterate to determine the appropriate step size. Coordinate descent is applicable in both differentiable and derivative-free contexts.
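
A minimal sketch of cyclic coordinate descent with exact minimization along each coordinate, applied to a convex quadratic (the test problem is illustrative):

```python
import numpy as np

def coordinate_descent_quadratic(A, b, n_sweeps=100):
    """Minimize f(x) = 0.5 * x^T A x - b^T x (A symmetric positive definite)
    by exactly minimizing over one coordinate at a time."""
    x = np.zeros_like(b, dtype=float)
    for _ in range(n_sweeps):
        for i in range(len(b)):
            # With the other coordinates fixed, the minimizer in x_i solves
            # A[i, i] * x_i = b[i] - sum_{j != i} A[i, j] * x_j.
            x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(coordinate_descent_quadratic(A, b))  # approaches the solution of A x = b
```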

The randomized (block) coordinate descent method is an optimization algorithm popularized by Nesterov (2010) and Richtárik and Takáč (2011). The first analysis of this method, when applied to the problem of minimizing a smooth convex function, was performed by Nesterov (2010). In Nesterov's analysis the method needs to be applied to a quadratic perturbation of the original function with an unknown scaling factor. Richtárik and Takáč (2011) give iteration complexity bounds which do not require this, i.e., the method is applied to the objective function directly. Furthermore, they generalize the setting to the problem of minimizing a composite function, i.e., the sum of a smooth convex function and a convex block-separable function: \(F(x) = f(x) + \Psi(x)\), where \(\Psi(x) = \sum_{i=1}^{n} \Psi_i(x^{(i)})\).
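
A minimal sketch of one possible instance of this setting: a proximal gradient step on a single, uniformly random coordinate, with \(\ell_1\) as the block-separable term (the test problem, uniform sampling, and step sizes are illustrative simplifications, not the exact schemes analyzed in those papers):

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * |.| for a scalar (soft-thresholding)."""
    return np.sign(v) * max(abs(v) - t, 0.0)

def randomized_prox_coordinate_descent(A, b, lam=0.1, n_iters=20000, seed=0):
    """Minimize F(x) = 0.5 * x^T A x - b^T x + lam * ||x||_1 by a proximal
    gradient step on one uniformly random coordinate per iteration."""
    rng = np.random.default_rng(seed)
    x = np.zeros_like(b, dtype=float)
    for _ in range(n_iters):
        i = rng.integers(len(b))      # pick a coordinate uniformly at random
        grad_i = A[i] @ x - b[i]      # partial derivative of the smooth part
        L_i = A[i, i]                 # coordinate-wise Lipschitz constant
        x[i] = soft_threshold(x[i] - grad_i / L_i, lam / L_i)
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(randomized_prox_coordinate_descent(A, b))
```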

Laurence Alexander Wolsey is an English mathematician working in the field of integer programming. He is a former president and research director of the Center for Operations Research and Econometrics (CORE) at Université catholique de Louvain in Belgium. He is professor emeritus of applied mathematics at the engineering school of the same university.

Proximal gradient methods are a generalized form of projection used to solve non-differentiable convex optimization problems.

Proximal gradient methods for learning is an area of research in optimization and statistical learning theory which studies algorithms for a general class of convex regularization problems where the regularization penalty may not be differentiable. One such example is \(\ell_1\) regularization of the form \(R(w) = \lambda \lVert w \rVert_1\).
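
A minimal sketch of the proximal gradient (ISTA-style) iteration for the \(\ell_1\)-regularized least-squares problem (the synthetic data, step size, and iteration count are illustrative):

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied componentwise."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient_lasso(X, y, lam=0.1, n_iters=500):
    """Minimize 0.5 * ||X w - y||^2 + lam * ||w||_1 by alternating a gradient
    step on the smooth part with the proximal (soft-thresholding) step."""
    L = np.linalg.norm(X, 2) ** 2      # Lipschitz constant of the smooth gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y)       # gradient of the least-squares term
        w = soft_threshold(w - grad / L, lam / L)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = X[:, 0] - 2 * X[:, 1] + 0.01 * rng.normal(size=50)
print(proximal_gradient_lasso(X, y))   # approximately recovers w = [1, -2, 0, ..., 0]
```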

The Center for Operations Research and Econometrics (CORE) is an interdisciplinary research institute of the University of Louvain (UCLouvain) located in Louvain-la-Neuve, Belgium. Since 2010, it has been part of the Louvain Institute of Data Analysis and Modeling in economics and statistics (LIDAM), along with the Institute for Economic and Social Research (IRES), Louvain Finance (LFIN) and the Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).

Peter Richtárik is a Slovak mathematician and computer scientist working in the area of big data optimization and machine learning, known for his work on randomized coordinate descent algorithms, stochastic gradient descent and federated learning. He is currently a Professor of Computer Science at the King Abdullah University of Science and Technology.

In mathematical optimization, oracle complexity is a standard theoretical framework to study the computational requirements for solving classes of optimization problems. It is suitable for analyzing iterative algorithms which proceed by computing local information about the objective function at various points. The framework has been used to provide tight worst-case guarantees on the number of required iterations, for several important classes of optimization problems.

In mathematics, mirror descent is an iterative optimization algorithm for finding a local minimum of a differentiable function.
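
A minimal sketch of mirror descent with the negative-entropy mirror map, which yields multiplicative (exponentiated-gradient) updates on the probability simplex (the linear objective and step size are illustrative):

```python
import numpy as np

def mirror_descent_simplex(grad, x0, lr=0.1, n_iters=500):
    """Mirror descent with the negative-entropy mirror map: iterates stay
    on the probability simplex."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = x * np.exp(-lr * grad(x))   # multiplicative (mirror) update
        x /= x.sum()                    # renormalize onto the simplex
    return x

# Minimize a linear function c^T x over the probability simplex;
# the mass concentrates on the coordinate with the smallest c_i.
c = np.array([0.3, 0.1, 0.7])
print(mirror_descent_simplex(lambda x: c, x0=np.ones(3) / 3))
```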

References

  1. "The George B. Dantzig Prize". 2000. Retrieved December 12, 2014.
  2. "John Von Neumann Theory Prize". 2009. Retrieved June 4, 2014.
  3. "EURO Gold Medal". 2016. Retrieved August 20, 2016.
  4. "Laureates of the 2023 WLA Prize Announced". 2023. Retrieved October 4, 2023.
  5. Nesterov, Yurii (2004). Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers. CiteSeerX 10.1.1.693.855. ISBN 978-1402075537.
  6. Nesterov, Y (1983). "A method for unconstrained convex minimization problem with the rate of convergence O(1/k²)". Doklady AN USSR. 269: 543–547.
  7. Walkington, Noel J. (2023). "Nesterov's Method for Convex Optimization". SIAM Review. 65 (2): 539–562. doi:10.1137/21M1390037. ISSN 0036-1445.
  8. Bubeck, Sebastien (April 1, 2013). "ORF523: Nesterov's Accelerated Gradient Descent". Retrieved June 4, 2014.
  9. Bubeck, Sebastien (March 6, 2014). "Nesterov's Accelerated Gradient Descent for Smooth and Strongly Convex Optimization". Retrieved June 4, 2014.
  10. "The zen of gradient descent". blog.mrtz.org. Retrieved 2023-05-13.
  11. Beck, Amir; Teboulle, Marc (2009-01-01). "A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems". SIAM Journal on Imaging Sciences. 2 (1): 183–202. doi:10.1137/080716542.
  12. Nesterov, Yurii; Nemirovskii, Arkadii (1995). Interior-Point Polynomial Algorithms in Convex Programming. Society for Industrial and Applied Mathematics. ISBN 978-0898715156.
  13. Boyd, Stephen P.; Vandenberghe, Lieven (2004). Convex Optimization (PDF). Cambridge University Press. ISBN 978-0-521-83378-3. Retrieved October 15, 2011.