Multitask optimization

Last updated

Multi-task optimization is a paradigm in the optimization literature that focuses on solving multiple self-contained tasks simultaneously. [1] [2] The paradigm has been inspired by the well-established concepts of transfer learning [3] and multi-task learning [4] in predictive analytics.

Contents

The key motivation behind multi-task optimization is that if optimization tasks are related to each other in terms of their optimal solutions or the general characteristics of their function landscapes, [5] the search progress can be transferred to substantially accelerate the search on the other.

The success of the paradigm is not necessarily limited to one-way knowledge transfers from simpler to more complex tasks. In practice an attempt is to intentionally solve a more difficult task that may unintentionally solve several smaller problems. [6]

There is a direct relationship between multitask optimization and multi-objective optimization [7] .

Methods

There are several common approaches for multi-task optimization: Bayesian optimization, evolutionary computation, and approaches based on Game theory. [1]

Multi-task Bayesian optimization

Multi-task Bayesian optimization is a modern model-based approach that leverages the concept of knowledge transfer to speed up the automatic hyperparameter optimization process of machine learning algorithms. [8] The method builds a multi-task Gaussian process model on the data originating from different searches progressing in tandem. [9] The captured inter-task dependencies are thereafter utilized to better inform the subsequent sampling of candidate solutions in respective search spaces.

Evolutionary multi-tasking

Evolutionary multi-tasking has been explored as a means of exploiting the implicit parallelism of population-based search algorithms to simultaneously progress multiple distinct optimization tasks. By mapping all tasks to a unified search space, the evolving population of candidate solutions can harness the hidden relationships between them through continuous genetic transfer. This is induced when solutions associated with different tasks crossover. [2] [10] Recently, modes of knowledge transfer that are different from direct solution crossover have been explored. [11]

Game-theoretic optimization

Game-theoretic approaches to multi-task optimization propose to view the optimization problem as a game, where each task is a player. All players compete through the reward matrix of the game, and try to reach a solution that satisfies all players (all tasks). This view provide insight about how to build efficient algorithms based on gradient descent optimization (GD), which is particularly important for training deep neural networks. [12] In GD for MTL, the problem is that each task provides its own loss, and it is not clear how to combine all losses and create a single unified gradient, leading to several different aggregation strategies. [13] [14] [15] This aggregation problem can be solved by defining a game matrix where the reward of each player is the agreement of its own gradient with the common gradient, and then setting the common gradient to be the Nash Cooperative bargaining [16] of that system.

Applications

Algorithms for multi-task optimization span a wide array of real-world applications. Recent studies highlight the potential for speed-ups in the optimization of engineering design parameters by conducting related designs jointly in a multi-task manner. [10] In machine learning, the transfer of optimized features across related data sets can enhance the efficiency of the training process as well as improve the generalization capability of learned models. [17] [18] In addition, the concept of multi-tasking has led to advances in automatic hyperparameter optimization of machine learning models and ensemble learning. [19] [20]

Applications have also been reported in cloud computing, [21] with future developments geared towards cloud-based on-demand optimization services that can cater to multiple customers simultaneously. [2] [22] Recent work has additionally shown applications in chemistry. [23]

See also

Related Research Articles

<span class="mw-page-title-main">Artificial neural network</span> Computational model used in machine learning, based on connected, hierarchical functions

Artificial neural networks are a branch of machine learning models that are built using principles of neuronal organization discovered by connectionism in the biological neural networks constituting animal brains.

In computational intelligence (CI), an evolutionary algorithm (EA) is a subset of evolutionary computation, a generic population-based metaheuristic optimization algorithm. An EA uses mechanisms inspired by biological evolution, such as reproduction, mutation, recombination, and selection. Candidate solutions to the optimization problem play the role of individuals in a population, and the fitness function determines the quality of the solutions. Evolution of the population then takes place after the repeated application of the above operators.

<span class="mw-page-title-main">Particle swarm optimization</span> Iterative simulation method

In computational science, particle swarm optimization (PSO) is a computational method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. It solves a problem by having a population of candidate solutions, here dubbed particles, and moving these particles around in the search-space according to simple mathematical formula over the particle's position and velocity. Each particle's movement is influenced by its local best known position, but is also guided toward the best known positions in the search-space, which are updated as better positions are found by other particles. This is expected to move the swarm toward the best solutions.

Neuroevolution, or neuro-evolution, is a form of artificial intelligence that uses evolutionary algorithms to generate artificial neural networks (ANN), parameters, and rules. It is most commonly applied in artificial life, general game playing and evolutionary robotics. The main benefit is that neuroevolution can be applied more widely than supervised learning algorithms, which require a syllabus of correct input-output pairs. In contrast, neuroevolution requires only a measure of a network's performance at a task. For example, the outcome of a game can be easily measured without providing labeled examples of desired strategies. Neuroevolution is commonly used as part of the reinforcement learning paradigm, and it can be contrasted with conventional deep learning techniques that use gradient descent on a neural network with a fixed topology.

<span class="mw-page-title-main">Ant colony optimization algorithms</span> Optimization algorithm

In computer science and operations research, the ant colony optimization algorithm (ACO) is a probabilistic technique for solving computational problems which can be reduced to finding good paths through graphs. Artificial ants stand for multi-agent methods inspired by the behavior of real ants. The pheromone-based communication of biological ants is often the predominant paradigm used. Combinations of artificial ants and local search algorithms have become a method of choice for numerous optimization tasks involving some sort of graph, e.g., vehicle routing and internet routing.

Multi-task learning (MTL) is a subfield of machine learning in which multiple learning tasks are solved at the same time, while exploiting commonalities and differences across tasks. This can result in improved learning efficiency and prediction accuracy for the task-specific models, when compared to training the models separately. Early versions of MTL were called "hints".

A recurrent neural network (RNN) is one of the two broad types of artificial neural network, characterized by direction of the flow of information between its layers. In contrast to the uni-directional feedforward neural network, it is a bi-directional artificial neural network, meaning that it allows the output from some nodes to affect subsequent input to the same nodes. Their ability to use internal state (memory) to process arbitrary sequences of inputs makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition. The term "recurrent neural network" is used to refer to the class of networks with an infinite impulse response, whereas "convolutional neural network" refers to the class of finite impulse response. Both classes of networks exhibit temporal dynamic behavior. A finite impulse recurrent network is a directed acyclic graph that can be unrolled and replaced with a strictly feedforward neural network, while an infinite impulse recurrent network is a directed cyclic graph that can not be unrolled.

<span class="mw-page-title-main">Estimation of distribution algorithm</span>

Estimation of distribution algorithms (EDAs), sometimes called probabilistic model-building genetic algorithms (PMBGAs), are stochastic optimization methods that guide the search for the optimum by building and sampling explicit probabilistic models of promising candidate solutions. Optimization is viewed as a series of incremental updates of a probabilistic model, starting with the model encoding an uninformative prior over admissible solutions and ending with the model that generates only the global optima.

<span class="mw-page-title-main">Transfer learning</span> Machine learning technique

Transfer learning (TL) is a technique in machine learning (ML) in which knowledge learned from a task is re-used in order to boost performance on a related task. For example, for image classification, knowledge gained while learning to recognize cars could be applied when trying to recognize trucks. This topic is related to the psychological literature on transfer of learning, although practical ties between the two fields are limited. Reusing/transferring information from previously learned tasks to new tasks has the potential to significantly improve learning efficiency.

A memetic algorithm (MA) in computer science and operations research, is an extension of the traditional genetic algorithm (GA) or more general evolutionary algorithm (EA). It may provide a sufficiently good solution to an optimization problem. It uses a suitable heuristic or local search technique to improve the quality of solutions generated by the EA and to reduce the likelihood of premature convergence.

Multi-objective optimization or Pareto optimization is an area of multiple-criteria decision making that is concerned with mathematical optimization problems involving more than one objective function to be optimized simultaneously. Multi-objective is a type of vector optimization that has been applied in many fields of science, including engineering, economics and logistics where optimal decisions need to be taken in the presence of trade-offs between two or more conflicting objectives. Minimizing cost while maximizing comfort while buying a car, and maximizing performance whilst minimizing fuel consumption and emission of pollutants of a vehicle are examples of multi-objective optimization problems involving two and three objectives, respectively. In practical problems, there can be more than three objectives.

There are many types of artificial neural networks (ANN).

Bayesian optimization is a sequential design strategy for global optimization of black-box functions that does not assume any functional forms. It is usually employed to optimize expensive-to-evaluate functions.

<span class="mw-page-title-main">Symbolic regression</span> Type of regression analysis

Symbolic regression (SR) is a type of regression analysis that searches the space of mathematical expressions to find the model that best fits a given dataset, both in terms of accuracy and simplicity.

In machine learning, hyperparameter optimization or tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process.

Memetic computing is a novel computational paradigm that incorporates the notion of meme(s) as basic units of transferable information encoded in computational representations for boosting the performance of artificial evolutionary systems in the domain of search and optimization.

This is a comparison of statistical analysis software that allows doing inference with Gaussian processes often using approximations.

Self-supervised learning (SSL) is a paradigm in machine learning where a model is trained on a task using the data itself to generate supervisory signals, rather than relying on external labels provided by humans. In the context of neural networks, self-supervised learning aims to leverage inherent structures or relationships within the input data to create meaningful training signals. SSL tasks are designed so that solving it requires capturing essential features or relationships in the data. The input data is typically augmented or transformed in a way that creates pairs of related samples. One sample serves as the input, and the other is used to formulate the supervisory signal. This augmentation can involve introducing noise, cropping, rotation, or other transformations. Self-supervised learning more closely imitates the way humans learn to classify objects.

Probabilistic numerics is an active field of study at the intersection of applied mathematics, statistics, and machine learning centering on the concept of uncertainty in computation. In probabilistic numerics, tasks in numerical analysis such as finding numerical solutions for integration, linear algebra, optimization and simulation and differential equations are seen as problems of statistical, probabilistic, or Bayesian inference.

References

  1. 1 2 Gupta, Abhishek; Ong, Yew-Soon; Feng, Liang (2018). "Insights on Transfer Optimization: Because Experience is the Best Teacher". IEEE Transactions on Emerging Topics in Computational Intelligence. 2: 51–64. doi:10.1109/TETCI.2017.2769104. hdl: 10356/147980 . S2CID   11510470.
  2. 1 2 3 Gupta, Abhishek; Ong, Yew-Soon; Feng, Liang (2016). "Multifactorial Evolution: Toward Evolutionary Multitasking". IEEE Transactions on Evolutionary Computation. 20 (3): 343–357. doi:10.1109/TEVC.2015.2458037. hdl: 10356/148174 . S2CID   13767012.
  3. Pan, Sinno Jialin; Yang, Qiang (2010). "A Survey on Transfer Learning". IEEE Transactions on Knowledge and Data Engineering. 22 (10): 1345–1359. doi:10.1109/TKDE.2009.191. S2CID   740063.
  4. Caruana, R., "Multitask Learning", pp. 95-134 in Sebastian Thrun, Lorien Pratt (eds.) Learning to Learn, (1998) Springer ISBN   9780792380474
  5. Cheng, Mei-Ying; Gupta, Abhishek; Ong, Yew-Soon; Ni, Zhi-Wei (2017). "Coevolutionary multitasking for concurrent global optimization: With case studies in complex engineering design". Engineering Applications of Artificial Intelligence. 64: 13–24. doi: 10.1016/j.engappai.2017.05.008 . S2CID   13767210.
  6. Cabi, Serkan; Sergio Gómez Colmenarejo; Hoffman, Matthew W.; Denil, Misha; Wang, Ziyu; Nando de Freitas (2017). "The Intentional Unintentional Agent: Learning to Solve Many Continuous Control Tasks Simultaneously". arXiv: 1707.03300 [cs.AI].
  7. J. -Y. Li, Z. -H. Zhan, Y. Li and J. Zhang, "Multiple Tasks for Multiple Objectives: A New Multiobjective Optimization Method via Multitask Optimization," in IEEE Transactions on Evolutionary Computation, doi : 10.1109/TEVC.2023.3294307
  8. Swersky, K., Snoek, J., & Adams, R. P. (2013). Multi-task bayesian optimization. Advances in neural information processing systems (pp. 2004-2012).
  9. Bonilla, E. V., Chai, K. M., & Williams, C. (2008). Multi-task Gaussian process prediction. Advances in neural information processing systems (pp. 153-160).
  10. 1 2 Ong, Y. S., & Gupta, A. (2016). Evolutionary multitasking: a computer science view of cognitive multitasking. Cognitive Computation, 8(2), 125-142.
  11. Feng, Liang; Zhou, Lei; Zhong, Jinghui; Gupta, Abhishek; Ong, Yew-Soon; Tan, Kay-Chen; Qin, A. K. (2019). "Evolutionary Multitasking via Explicit Autoencoding". IEEE Transactions on Cybernetics. 49 (9): 3457–3470. doi:10.1109/TCYB.2018.2845361. PMID   29994415. S2CID   51613697.
  12. Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron (2016). Deep Learning. MIT Press. ISBN   978-0-262-03561-3.
  13. Liu, L.; Li, Y.; Kuang, Z.; Xue, J.; Chen, Y.; Yang, W.; Liao, Q.; Zhang, W. (2021-05-04). "Towards Impartial Multi-task Learning". In: Proceedings of the International Conference on Learning Representations (ICLR 2021). ICLR: Virtual event. (2021). Retrieved 2022-11-20.
  14. Tianhe, Yu; Saurabh, Kumar; Abhishek, Gupta; Sergey, Levine; Karol, Hausman; Chelsea, Finn (2020). "Gradient Surgery for Multi-Task Learning". Advances in Neural Information Processing Systems. 33. arXiv: 2001.06782 .
  15. Liu, Bo; Liu, Xingchao; Jin, Xiaojie; Stone, Peter; Liu, Qiang (2021-10-26). "Conflict-Averse Gradient Descent for Multi-task Learning". arXiv: 2110.14048 [cs.LG].
  16. Aviv Navon, Aviv Shamsian, Idan Achituve, Haggai Maron, Kenji Kawaguchi, Gal Chechik, Ethan Fetaya, (2022). Multi-Task Learning as a Bargaining Game. International conference on machine learning.
  17. Chandra, R., Gupta, A., Ong, Y. S., & Goh, C. K. (2016, October). Evolutionary multi-task learning for modular training of feedforward neural networks. In International Conference on Neural Information Processing (pp. 37-46). Springer, Cham.
  18. Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? In Advances in neural information processing systems (pp. 3320-3328).
  19. Wen, Yu-Wei; Ting, Chuan-Kang (2016). "Learning ensemble of decision trees through multifactorial genetic programming". 2016 IEEE Congress on Evolutionary Computation (CEC). pp. 5293–5300. doi:10.1109/CEC.2016.7748363. ISBN   978-1-5090-0623-6. S2CID   2617811.
  20. Zhang, Boyu; Qin, A. K.; Sellis, Timos (2018). "Evolutionary feature subspaces generation for ensemble classification". Proceedings of the Genetic and Evolutionary Computation Conference. pp. 577–584. doi:10.1145/3205455.3205638. ISBN   978-1-4503-5618-3. S2CID   49564862.
  21. Bao, Liang; Qi, Yutao; Shen, Mengqing; Bu, Xiaoxuan; Yu, Jusheng; Li, Qian; Chen, Ping (2018). "An Evolutionary Multitasking Algorithm for Cloud Computing Service Composition". Services – SERVICES 2018. Lecture Notes in Computer Science. Vol. 10975. pp. 130–144. doi:10.1007/978-3-319-94472-2_10. ISBN   978-3-319-94471-5.
  22. Tang, J., Chen, Y., Deng, Z., Xiang, Y., & Joy, C. P. (2018). A Group-based Approach to Improve Multifactorial Evolutionary Algorithm. In IJCAI (pp. 3870-3876).
  23. Felton, Kobi; Wigh, Daniel; Lapkin, Alexei (2021). "Multi-task Bayesian Optimization of Chemical Reactions". chemRxiv. doi:10.26434/chemrxiv.13250216.v2.