Peter Richtarik

Nationality: Slovak
Alma mater: Comenius University, Cornell University
Fields: Mathematics, Computer Science, Machine Learning
Institutions: KAUST
Thesis: Some algorithms for large-scale convex and linear minimization in relative scale (2007)
Academic advisors: Yurii Nesterov
Website: https://richtarik.org

Peter Richtarik is a Slovak mathematician and computer scientist [1] working in the area of big data optimization and machine learning, known for his work on randomized coordinate descent algorithms, stochastic gradient descent and federated learning. He is currently a Professor of Computer Science at the King Abdullah University of Science and Technology.

Education

Richtarik earned a master's degree in mathematics from Comenius University, Slovakia, in 2001, graduating summa cum laude. [2] In 2007, he obtained a PhD in operations research from Cornell University, advised by Michael Jeremy Todd. [3] [4]

Career

Between 2007 and 2009, he was a postdoctoral scholar at the Center for Operations Research and Econometrics and the Department of Mathematical Engineering of the Université catholique de Louvain, Belgium, working with Yurii Nesterov. [5] [6] Between 2009 and 2019, Richtarik was a Lecturer and later a Reader in the School of Mathematics at the University of Edinburgh. He is a Turing Fellow. [7] Richtarik founded and organizes a conference series entitled "Optimization and Big Data". [8] [9]

Academic work

Richtarik's early research concerned gradient-type methods, optimization in relative scale, sparse principal component analysis and algorithms for optimal design. Since his appointment at Edinburgh, he has worked extensively on building the algorithmic foundations of randomized methods in convex optimization, especially randomized coordinate descent algorithms and stochastic gradient descent methods. These methods are well suited to optimization problems involving big data and have applications in fields such as machine learning, signal processing and data science. [10] [11] Richtarik is a co-inventor of an algorithm generalizing the randomized Kaczmarz method for solving systems of linear equations, contributed to the invention of federated learning, and co-developed a stochastic variant of Newton's method.

Awards and distinctions

Bibliography

Related Research Articles

Artificial neural network: Computational model used in machine learning, based on connected, hierarchical functions

Artificial neural networks are a class of machine learning models built on principles of neuronal organization discovered by connectionism in the biological neural networks that constitute animal brains.

Reinforcement learning: Field of machine learning

Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment so as to maximize a notion of cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning.

In computer science, evolutionary computation is a family of algorithms for global optimization inspired by biological evolution, and the subfield of artificial intelligence and soft computing studying these algorithms. In technical terms, they are a family of population-based trial and error problem solvers with a metaheuristic or stochastic optimization character.

Algorithmic composition is the technique of using algorithms to create music.

Stochastic gradient descent: Optimization algorithm

Stochastic gradient descent is an iterative method for optimizing an objective function with suitable smoothness properties. It can be regarded as a stochastic approximation of gradient descent optimization, since it replaces the actual gradient by an estimate thereof. Especially in high-dimensional optimization problems this reduces the very high computational burden, achieving faster iterations in exchange for a lower convergence rate.
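
As an illustration, the following is a minimal sketch of stochastic gradient descent for a least-squares objective, written with NumPy; the function name, step size and toy data are illustrative choices, not taken from the sources above:

import numpy as np

def sgd_least_squares(A, b, lr=0.01, epochs=50, seed=0):
    """Minimize (1/2n) * ||A x - b||^2 by sampling one row per step."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            a_i = A[i]
            # Unbiased estimate of the full gradient, built from a single row.
            grad_i = (a_i @ x - b[i]) * a_i
            x -= lr * grad_i
    return x

# Toy usage: recover a planted solution from noisy linear measurements.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5))
x_true = rng.standard_normal(5)
b = A @ x_true + 0.01 * rng.standard_normal(200)
print(np.linalg.norm(sgd_least_squares(A, b) - x_true))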

The Kaczmarz method or Kaczmarz's algorithm is an iterative algorithm for solving systems of linear equations. It was first discovered by the Polish mathematician Stefan Kaczmarz, and was rediscovered in the field of image reconstruction from projections by Richard Gordon, Robert Bender, and Gabor Herman in 1970, where it is called the Algebraic Reconstruction Technique (ART). ART includes a positivity constraint, making it nonlinear.
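
The sketch below shows a randomized variant of the Kaczmarz iteration, in which rows are sampled with probability proportional to their squared norms and the iterate is projected onto the selected row's hyperplane; it is a NumPy illustration with a toy consistent system, not code from any of the cited works:

import numpy as np

def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Solve a consistent system A x = b by projecting onto one row's hyperplane per step."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms_sq = np.einsum("ij,ij->i", A, A)
    probs = row_norms_sq / row_norms_sq.sum()   # sample rows proportionally to ||a_i||^2
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        a_i = A[i]
        # Orthogonal projection of x onto the hyperplane {z : <a_i, z> = b_i}.
        x += (b[i] - a_i @ x) / row_norms_sq[i] * a_i
    return x

A = np.random.default_rng(2).standard_normal((100, 20))
x_true = np.ones(20)
b = A @ x_true
print(np.linalg.norm(randomized_kaczmarz(A, b) - x_true))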

Multi-armed bandit: Problem in machine learning

In probability theory and machine learning, the multi-armed bandit problem is a problem in which a fixed limited set of resources must be allocated between competing (alternative) choices in a way that maximizes their expected gain, when each choice's properties are only partially known at the time of allocation, and may become better understood as time passes or by allocating resources to the choice. This is a classic reinforcement learning problem that exemplifies the exploration–exploitation tradeoff dilemma. The name comes from imagining a gambler at a row of slot machines, who has to decide which machines to play, how many times to play each machine and in which order to play them, and whether to continue with the current machine or try a different machine. The multi-armed bandit problem also falls into the broad category of stochastic scheduling.
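
A minimal epsilon-greedy sketch illustrates the exploration–exploitation trade-off on a Bernoulli bandit; the arm probabilities, epsilon and horizon below are arbitrary illustrative choices:

import numpy as np

def epsilon_greedy(true_means, steps=5000, eps=0.1, seed=0):
    """Play a Bernoulli bandit: explore with probability eps, otherwise exploit the best estimate."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    counts = np.zeros(k)
    estimates = np.zeros(k)
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < eps:
            arm = int(rng.integers(k))         # explore: pick a random arm
        else:
            arm = int(np.argmax(estimates))    # exploit: pick the current best estimate
        reward = float(rng.random() < true_means[arm])
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean of rewards
        total_reward += reward
    return total_reward, estimates

reward, est = epsilon_greedy([0.2, 0.5, 0.7])
print(reward, est)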

Stochastic optimization (SO) methods are optimization methods that generate and use random variables. For stochastic problems, the random variables appear in the formulation of the optimization problem itself, which involves random objective functions or random constraints. Stochastic optimization methods also include methods with random iterates. Some stochastic optimization methods use random iterates to solve stochastic problems, combining both meanings of stochastic optimization. Stochastic optimization methods generalize deterministic methods for deterministic problems.

Learning to rank: Use of machine learning to rank items

Learning to rank or machine-learned ranking (MLR) is the application of machine learning, typically supervised, semi-supervised or reinforcement learning, in the construction of ranking models for information retrieval systems. Training data consists of lists of items with some partial order specified between items in each list. This order is typically induced by giving a numerical or ordinal score or a binary judgment for each item. The goal of constructing the ranking model is to rank new, unseen lists in a similar way to rankings in the training data.

In machine learning, a hyperparameter is a parameter whose value is used to control the learning process. By contrast, the values of other parameters are derived via training.

Deep learning: Branch of machine learning

Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. The adjective "deep" in deep learning refers to the use of multiple layers in the network. The methods used can be supervised, semi-supervised or unsupervised.

Coordinate descent is an optimization algorithm that successively minimizes along coordinate directions to find the minimum of a function. At each iteration, the algorithm determines a coordinate or coordinate block via a coordinate selection rule, then exactly or inexactly minimizes over the corresponding coordinate hyperplane while fixing all other coordinates or coordinate blocks. A line search along the coordinate direction can be performed at the current iterate to determine the appropriate step size. Coordinate descent is applicable in both differentiable and derivative-free contexts.
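
For a concrete instance, the sketch below applies exact cyclic coordinate descent to a strongly convex quadratic f(x) = 0.5 x^T Q x - c^T x, where each one-dimensional subproblem has a closed-form solution; the NumPy code and toy data are illustrative assumptions:

import numpy as np

def coordinate_descent_quadratic(Q, c, sweeps=100):
    """Minimize f(x) = 0.5 x^T Q x - c^T x by exact minimization along one coordinate at a time."""
    n = len(c)
    x = np.zeros(n)
    for _ in range(sweeps):
        for j in range(n):  # cyclic coordinate selection rule
            # Exact one-dimensional minimizer in coordinate j with all other coordinates fixed:
            # Q[j, j] * x_j = c_j - sum_{k != j} Q[j, k] * x_k
            r = c[j] - Q[j] @ x + Q[j, j] * x[j]
            x[j] = r / Q[j, j]
    return x

rng = np.random.default_rng(3)
M = rng.standard_normal((6, 6))
Q = M @ M.T + 6 * np.eye(6)   # symmetric positive definite
c = rng.standard_normal(6)
print(np.linalg.norm(coordinate_descent_quadratic(Q, c) - np.linalg.solve(Q, c)))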

Randomized (Block) Coordinate Descent Method is an optimization algorithm popularized by Nesterov (2010) and Richtárik and Takáč (2011). The first analysis of this method, when applied to the problem of minimizing a smooth convex function, was performed by Nesterov (2010). In Nesterov's analysis the method needs to be applied to a quadratic perturbation of the original function with an unknown scaling factor. Richtárik and Takáč (2011) give iteration complexity bounds which do not require this, i.e., the method is applied to the objective function directly. Furthermore, they generalize the setting to the problem of minimizing a composite function, i.e., the sum of a smooth convex function f and a convex block-separable function Ψ, so that the objective takes the form F(x) = f(x) + Ψ(x).
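
The following is a minimal sketch of a randomized proximal coordinate descent method of this composite type, specialized to f(x) = 0.5 ||Ax - b||^2 and Ψ(x) = λ ||x||_1 (so the separable proximal step is soft-thresholding). It is an illustrative NumPy implementation of the general scheme, not the authors' original code; step sizes use the coordinate-wise Lipschitz constants ||A[:, j]||^2:

import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * |.| (the separable L1 term)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def randomized_cd_lasso(A, b, lam=0.1, iters=20000, seed=0):
    """Minimize F(x) = 0.5 ||A x - b||^2 + lam * ||x||_1 by updating one random coordinate per step."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    L = np.einsum("ij,ij->j", A, A)      # coordinate-wise Lipschitz constants ||A[:, j]||^2
    x = np.zeros(n)
    residual = A @ x - b                  # maintained incrementally after each coordinate update
    for _ in range(iters):
        j = rng.integers(n)               # uniform random coordinate selection
        g_j = A[:, j] @ residual          # partial derivative of the smooth part at x
        x_new = soft_threshold(x[j] - g_j / L[j], lam / L[j])
        residual += (x_new - x[j]) * A[:, j]
        x[j] = x_new
    return x

rng = np.random.default_rng(4)
A = rng.standard_normal((50, 10))
b = A @ np.array([1.0, -2.0, 0, 0, 0, 0, 0, 0, 0, 3.0]) + 0.01 * rng.standard_normal(50)
print(randomized_cd_lasso(A, b).round(2))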

Glossary of artificial intelligence: List of definitions of terms and concepts commonly used in the study of artificial intelligence

This glossary of artificial intelligence is a list of definitions of terms and concepts relevant to the study of artificial intelligence, its sub-disciplines, and related fields. Related glossaries include Glossary of computer science, Glossary of robotics, and Glossary of machine vision.

Apache SINGA: Open-source machine learning library

Apache SINGA is an Apache top-level project for developing an open source machine learning library. It provides a flexible architecture for scalable distributed training, is extensible to run over a wide range of hardware, and has a focus on health-care applications.

In machine learning, hyperparameter optimization or tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process. By contrast, the values of other parameters are learned.
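
As a small illustration of one common tuning strategy, the sketch below runs random search over the regularization strength of a ridge-regression model, scoring candidates on a hold-out set; the search range, trial count and toy data are assumptions made for the example:

import numpy as np

def validation_error(lam, X_tr, y_tr, X_val, y_val):
    """Train ridge regression with penalty lam and report hold-out mean squared error."""
    d = X_tr.shape[1]
    w = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ y_tr)
    return float(np.mean((X_val @ w - y_val) ** 2))

def random_search(X_tr, y_tr, X_val, y_val, trials=30, seed=0):
    """Sample the hyperparameter lam log-uniformly and keep the best validation score."""
    rng = np.random.default_rng(seed)
    best_lam, best_err = None, np.inf
    for _ in range(trials):
        lam = 10 ** rng.uniform(-4, 2)    # log-uniform candidate in [1e-4, 1e2]
        err = validation_error(lam, X_tr, y_tr, X_val, y_val)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam, best_err

rng = np.random.default_rng(5)
X = rng.standard_normal((120, 8))
y = X @ rng.standard_normal(8) + 0.1 * rng.standard_normal(120)
print(random_search(X[:80], y[:80], X[80:], y[80:]))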

Federated learning: Decentralized machine learning

Federated learning is a machine learning technique that trains an algorithm via multiple independent sessions, each using its own dataset. This approach stands in contrast to traditional centralized machine learning techniques where local datasets are merged into one training session, as well as to approaches that assume that local data samples are identically distributed.
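
A minimal sketch of one widely used federated scheme of this kind (federated averaging): each client runs a few epochs of local SGD on its own least-squares data, and a server averages the resulting models weighted by dataset size. The client data, model and hyperparameters below are illustrative assumptions, not a description of any specific system mentioned above:

import numpy as np

def local_sgd(w, X, y, lr=0.05, epochs=5, seed=0):
    """One client's local update: a few epochs of SGD on its private least-squares data."""
    rng = np.random.default_rng(seed)
    w = w.copy()
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            w -= lr * (X[i] @ w - y[i]) * X[i]
    return w

def federated_averaging(clients, rounds=20, d=5):
    """Server loop: broadcast the model, collect local updates, average them by data size."""
    w = np.zeros(d)
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    weights = sizes / sizes.sum()
    for r in range(rounds):
        local_models = [local_sgd(w, X, y, seed=r) for X, y in clients]
        w = sum(p * m for p, m in zip(weights, local_models))
    return w

# Toy usage: three clients holding non-identical slices of a common linear model.
rng = np.random.default_rng(6)
w_true = rng.standard_normal(5)
clients = []
for shift in (0.0, 1.0, -1.0):
    X = rng.standard_normal((40, 5)) + shift   # each client's features are shifted differently
    clients.append((X, X @ w_true))
print(np.linalg.norm(federated_averaging(clients) - w_true))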

Elad Hazan is an Israeli-American computer scientist, academic, author and researcher. He is a Professor of Computer Science at Princeton University, and the co-founder and director of Google AI Princeton.

Probabilistic numerics is an active field of study at the intersection of applied mathematics, statistics, and machine learning centering on the concept of uncertainty in computation. In probabilistic numerics, tasks in numerical analysis such as finding numerical solutions for integration, linear algebra, optimization, simulation, and differential equations are seen as problems of statistical, probabilistic, or Bayesian inference.

(Stochastic) variance reduction is an algorithmic approach to minimizing functions that can be decomposed into finite sums. By exploiting the finite-sum structure, variance reduction techniques are able to achieve convergence rates that are impossible to achieve with methods that treat the objective as an infinite sum, as in the classical stochastic approximation setting.
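
A minimal SVRG-style sketch for a finite-sum least-squares objective: each epoch recomputes the full gradient at a snapshot and corrects the per-sample stochastic gradients with it, so the variance of the gradient estimator shrinks as the iterates approach the snapshot. Function names, step size and toy data are illustrative assumptions:

import numpy as np

def svrg_least_squares(A, b, lr=0.01, outer=50, seed=0):
    """Minimize (1/n) * sum_i 0.5 * (a_i x - b_i)^2 with SVRG-style variance-reduced gradients."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(outer):
        snapshot = x.copy()
        full_grad = A.T @ (A @ snapshot - b) / n        # full gradient at the snapshot
        for _ in range(n):                              # inner loop of length n
            i = rng.integers(n)
            g_x = (A[i] @ x - b[i]) * A[i]              # stochastic gradient at x
            g_snap = (A[i] @ snapshot - b[i]) * A[i]    # same sample, evaluated at the snapshot
            x -= lr * (g_x - g_snap + full_grad)        # variance-reduced gradient estimate
    return x

rng = np.random.default_rng(7)
A = rng.standard_normal((300, 10))
x_true = rng.standard_normal(10)
b = A @ x_true
print(np.linalg.norm(svrg_least_squares(A, b) - x_true))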

References

  1. "Richtarik's DBLP profile" . Retrieved December 23, 2020.
  2. "Richtarik's CV" (PDF). Retrieved August 21, 2016.
  3. "Mathematics Genealogy Project" . Retrieved August 20, 2016.
  4. "Cornell PhD Thesis" . Retrieved August 22, 2016.
  5. "Postdoctoral Fellows at CORE" . Retrieved August 22, 2016.
  6. "Simons Institute for the Theory of Computing, UC Berkeley" . Retrieved August 22, 2016.
  7. "Alan Turing Institute Faculty Fellows" . Retrieved August 22, 2016.
  8. "Optimization and Big Data 2012" . Retrieved August 20, 2016.
  9. "Optimization and Big Data 2015" . Retrieved August 20, 2016.
  10. Cathy O'Neil & Rachel Schutt (2013). "Modeling and Algorithms at Scale". Doing Data Science: Straight Talk from the Frontline. O'Reilly. ISBN   9781449358655 . Retrieved August 21, 2016.
  11. Sebastien Bubeck (2015). Convex Optimization: Algorithms and Complexity. Foundations and Trends in Machine Learning. Now Publishers. ISBN   978-1601988607.
  12. "Google Scholar" . Retrieved December 28, 2020.
  13. "The h Index for Computer Science" . Retrieved December 28, 2020.
  14. "SIGEST Award" . Retrieved August 20, 2016.
  15. "EPSRC Fellowship" . Retrieved August 21, 2016.
  16. "EUSA Awards 2015" . Retrieved August 20, 2016.
  17. "46th Conference of Slovak Mathematicians" . Retrieved August 22, 2016.