Neural architecture search

Last updated

Neural architecture search (NAS) [1] [2] is a technique for automating the design of artificial neural networks (ANN), a widely used model in the field of machine learning. NAS has been used to design networks that are on par or outperform hand-designed architectures. [3] [4] Methods for NAS can be categorized according to the search space, search strategy and performance estimation strategy used: [1]

Contents

NAS is closely related to hyperparameter optimization [5] and meta-learning [6] and is a subfield of automated machine learning (AutoML).

Reinforcement learning

Reinforcement learning (RL) can underpin a NAS search strategy. Barret Zoph and Quoc Viet Le [3] applied NAS with RL targeting the CIFAR-10 dataset and achieved a network architecture that rivals the best manually-designed architecture for accuracy, with an error rate of 3.65, 0.09 percent better and 1.05x faster than a related hand-designed model. On the Penn Treebank dataset, that model composed a recurrent cell that outperforms LSTM, reaching a test set perplexity of 62.4, or 3.6 perplexity better than the prior leading system. On the PTB character language modeling task it achieved bits per character of 1.214. [3]

Learning a model architecture directly on a large dataset can be a lengthy process. NASNet [4] [7] addressed this issue by transferring a building block designed for a small dataset to a larger dataset. The design was constrained to use two types of convolutional cells to return feature maps that serve two main functions when convoluting an input feature map: normal cells that return maps of the same extent (height and width) and reduction cells in which the returned feature map height and width is reduced by a factor of two. For the reduction cell, the initial operation applied to the cell's inputs uses a stride of two (to reduce the height and width). [4] The learned aspect of the design included elements such as which lower layer(s) each higher layer took as input, the transformations applied at that layer and to merge multiple outputs at each layer. In the studied example, the best convolutional layer (or "cell") was designed for the CIFAR-10 dataset and then applied to the ImageNet dataset by stacking copies of this cell, each with its own parameters. The approach yielded accuracy of 82.7% top-1 and 96.2% top-5. This exceeded the best human-invented architectures at a cost of 9 billion fewer FLOPS—a reduction of 28%. The system continued to exceed the manually-designed alternative at varying computation levels. The image features learned from image classification can be transferred to other computer vision problems. E.g., for object detection, the learned cells integrated with the Faster-RCNN framework improved performance by 4.0% on the COCO dataset. [4]

In the so-called Efficient Neural Architecture Search (ENAS), a controller discovers architectures by learning to search for an optimal subgraph within a large graph. The controller is trained with policy gradient to select a subgraph that maximizes the validation set's expected reward. The model corresponding to the subgraph is trained to minimize a canonical cross entropy loss. Multiple child models share parameters, ENAS requires fewer GPU-hours than other approaches and 1000-fold less than "standard" NAS. On CIFAR-10, the ENAS design achieved a test error of 2.89%, comparable to NASNet. On Penn Treebank, the ENAS design reached test perplexity of 55.8. [8]

Evolution

An alternative approach to NAS is based on evolutionary algorithms, which has been employed by several groups. [9] [10] [11] [12] [13] [14] [15] An Evolutionary Algorithm for Neural Architecture Search generally performs the following procedure. [16] First a pool consisting of different candidate architectures along with their validation scores (fitness) is initialised. At each step the architectures in the candidate pool are mutated (e.g.: 3x3 convolution instead of a 5x5 convolution). Next the new architectures are trained from scratch for a few epochs and their validation scores are obtained. This is followed by replacing the lowest scoring architectures in the candidate pool with the better, newer architectures. This procedure is repeated multiple times and thus the candidate pool is refined over time. Mutations in the context of evolving ANNs are operations such as adding or removing a layer, which include changing the type of a layer (e.g., from convolution to pooling), changing the hyperparameters of a layer, or changing the training hyperparameters. On CIFAR-10 and ImageNet, evolution and RL performed comparably, while both slightly outperformed random search. [12] [11]

Bayesian optimization

Bayesian Optimization (BO), which has proven to be an efficient method for hyperparameter optimization, can also be applied to NAS. In this context, the objective function maps an architecture to its validation error after being trained for a number of epochs. At each iteration, BO uses a surrogate to model this objective function based on previously obtained architectures and their validation errors. One then chooses the next architecture to evaluate by maximizing an acquisition function, such as expected improvement, which provides a balance between exploration and exploitation. Acquisition function maximization and objective function evaluation are often computationally expensive for NAS, and make the application of BO challenging in this context. Recently, BANANAS [17] has achieved promising results in this direction by introducing a high-performing instantiation of BO coupled to a neural predictor.

Hill-climbing

Another group used a hill climbing procedure that applies network morphisms, followed by short cosine-annealing optimization runs. The approach yielded competitive results, requiring resources on the same order of magnitude as training a single network. E.g., on CIFAR-10, the method designed and trained a network with an error rate below 5% in 12 hours on a single GPU. [18]

While most approaches solely focus on finding architecture with maximal predictive performance, for most practical applications other objectives are relevant, such as memory consumption, model size or inference time (i.e., the time required to obtain a prediction). Because of that, researchers created a multi-objective search. [15] [19]

LEMONADE [15] is an evolutionary algorithm that adopted Lamarckism to efficiently optimize multiple objectives. In every generation, child networks are generated to improve the Pareto frontier with respect to the current population of ANNs.

Neural Architect [19] is claimed to be a resource-aware multi-objective RL-based NAS with network embedding and performance prediction. Network embedding encodes an existing network to a trainable embedding vector. Based on the embedding, a controller network generates transformations of the target network. A multi-objective reward function considers network accuracy, computational resource and training time. The reward is predicted by multiple performance simulation networks that are pre-trained or co-trained with the controller network. The controller network is trained via policy gradient. Following a modification, the resulting candidate network is evaluated by both an accuracy network and a training time network. The results are combined by a reward engine that passes its output back to the controller network.

One-shot models

RL or evolution-based NAS require thousands of GPU-days of searching/training to achieve state-of-the-art computer vision results as described in the NASNet, mNASNet and MobileNetV3 papers. [4] [20] [21]

To reduce computational cost, many recent NAS methods rely on the weight-sharing idea. [22] [23] In this approach, a single overparameterized supernetwork (also known as the one-shot model) is defined. A supernetwork is a very large Directed Acyclic Graph (DAG) whose subgraphs are different candidate neural networks. Thus, in a supernetwork, the weights are shared among a large number of different sub-architectures that have edges in common, each of which is considered as a path within the supernet. The essential idea is to train one supernetwork that spans many options for the final design rather than generating and training thousands of networks independently. In addition to the learned parameters, a set of architecture parameters are learnt to depict preference for one module over another. Such methods reduce the required computational resources to only a few GPU days.

More recent works further combine this weight-sharing paradigm, with a continuous relaxation of the search space, [24] [25] [26] [27] which enables the use of gradient-based optimization methods. These approaches are generally referred to as differentiable NAS and have proven very efficient in exploring the search space of neural architectures. One of the most popular algorithms amongst the gradient-based methods for NAS is DARTS. [26] However, DARTS faces problems such as performance collapse due to an inevitable aggregation of skip connections and poor generalization which were tackled by many future algorithms. [28] [29] [30] [31] Methods like [29] [30] aim at robustifying DARTS and making the validation accuracy landscape smoother by introducing a Hessian norm based regularisation and random smoothing/adversarial attack respectively. The cause of performance degradation is later analyzed from the architecture selection aspect. [32]

Differentiable NAS has shown to produce competitive results using a fraction of the search-time required by RL-based search methods. For example, FBNet (which is short for Facebook Berkeley Network) demonstrated that supernetwork-based search produces networks that outperform the speed-accuracy tradeoff curve of mNASNet and MobileNetV2 on the ImageNet image-classification dataset. FBNet accomplishes this using over 400x less search time than was used for mNASNet. [33] [34] [35] Further, SqueezeNAS demonstrated that supernetwork-based NAS produces neural networks that outperform the speed-accuracy tradeoff curve of MobileNetV3 on the Cityscapes semantic segmentation dataset, and SqueezeNAS uses over 100x less search time than was used in the MobileNetV3 authors' RL-based search. [36] [37]

Neural architecture search benchmarks

Neural architecture search often requires large computational resources, due to its expensive training and evaluation phases. This further leads to a large carbon footprint required for the evaluation of these methods. To overcome this limitation, NAS benchmarks [38] [39] [40] [41] have been introduced, from which one can either query or predict the final performance of neural architectures in seconds. A NAS benchmark is defined as a dataset with a fixed train-test split, a search space, and a fixed training pipeline (hyperparameters). There are primarily two types of NAS benchmarks: a surrogate NAS benchmark and a tabular NAS benchmark. A surrogate benchmark uses a surrogate model (e.g.: a neural network) to predict the performance of an architecture from the search space. On the other hand, a tabular benchmark queries the actual performance of an architecture trained up to convergence. Both of these benchmarks are queryable and can be used to efficiently simulate many NAS algorithms using only a CPU to query the benchmark instead of training an architecture from scratch.

See also

Further reading

Survey articles.

Related Research Articles

<span class="mw-page-title-main">Neural network (machine learning)</span> Computational model used in machine learning, based on connected, hierarchical functions

Neural networks are a branch of machine learning models inspired by the neuronal organization found in the biological neural networks in animal brains.

<span class="mw-page-title-main">Activation function</span> Artificial neural network node function

The activation function of a node in an artificial neural network is a function that calculates the output of the node based on its individual inputs and their weights. Nontrivial problems can be solved using only a few nodes if the activation function is nonlinear. Modern activation functions include the smooth version of the ReLU, the GELU, which was used in the 2018 BERT model, the logistic (sigmoid) function used in the 2012 speech recognition model developed by Hinton et al, the ReLU used in the 2012 AlexNet computer vision model and in the 2015 ResNet model.

Kunihiko Fukushima is a Japanese computer scientist, most noted for his work on artificial neural networks and deep learning. He is currently working part-time as a senior research scientist at the Fuzzy Logic Systems Institute in Fukuoka, Japan.

In machine learning, a hyperparameter is a parameter, such as the learning rate or choice of optimizer, which specifies details of the learning process, hence the name hyperparameter. This is in contrast to parameters which determine the model itself.

<span class="mw-page-title-main">MNIST database</span> Database of handwritten digits

The MNIST database is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. It was created by "re-mixing" the samples from NIST's original datasets. The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, it was not well-suited for machine learning experiments. Furthermore, the black and white images from NIST were normalized to fit into a 28x28 pixel bounding box and anti-aliased, which introduced grayscale levels.

Convolutional neural network (CNN) is a regularized type of feed-forward neural network that learns feature engineering by itself via filters optimization. Vanishing gradients and exploding gradients, seen during backpropagation in earlier neural networks, are prevented by using regularized weights over fewer connections. For example, for each neuron in the fully-connected layer 10,000 weights would be required for processing an image sized 100 × 100 pixels. However, applying cascaded convolution kernels, only 25 neurons are required to process 5x5-sized tiles. Higher-layer features are extracted from wider context windows, compared to lower-layer features.

In machine learning, hyperparameter optimization or tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process.

Automated machine learning (AutoML) is the process of automating the tasks of applying machine learning to real-world problems.

<span class="mw-page-title-main">Residual neural network</span> Deep learning method

A residual neural network is a deep learning model in which the weight layers learn residual functions with reference to the layer inputs. It behaves like a highway network whose gates are opened through strongly positive bias weights. This enables deep learning models with tens or hundreds of layers to train easily and approach better accuracy when going deeper. The identity skip connections, often referred to as "residual connections", are also used in the 1997 LSTM networks, Transformer models, the AlphaGo Zero system, the AlphaStar system, and the AlphaFold system.

The CIFAR-10 dataset is a collection of images that are commonly used to train machine learning and computer vision algorithms. It is one of the most widely used datasets for machine learning research. The CIFAR-10 dataset contains 60,000 32x32 color images in 10 different classes. The 10 different classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. There are 6,000 images of each class.

U-Net is a convolutional neural network that was developed for biomedical image segmentation at the Computer Science Department of the University of Freiburg. The network is based on a fully convolutional neural network whose architecture was modified and extended to work with fewer training images and to yield more precise segmentation. Segmentation of a 512 × 512 image takes less than a second on a modern GPU.

In computer vision, SqueezeNet is the name of a deep neural network for image classification that was released in 2016. SqueezeNet was developed by researchers at DeepScale, University of California, Berkeley, and Stanford University. In designing SqueezeNet, the authors' goal was to create a smaller neural network with fewer parameters while achieving competitive accuracy.

DeepScale, Inc. was an American technology company headquartered in Mountain View, California, that developed perceptual system technologies for automated vehicles. On October 1, 2019, the company was acquired by Tesla, Inc.

<span class="mw-page-title-main">Federated learning</span> Decentralized machine learning

Federated learning is a sub-field of machine learning focusing on settings in which multiple entities collaboratively train a model while ensuring that their data remains decentralized. This stands in contrast to machine learning settings in which data is centrally stored. One of the primary defining characteristics of federated learning is data heterogeneity. Due to the decentralized nature of the clients' data, there is no guarantee that data samples held by each client are independently and identically distributed.

Bidirectional Encoder Representations from Transformers (BERT) is a language model based on the transformer architecture, notable for its dramatic improvement over previous state of the art models. It was introduced in October 2018 by researchers at Google. A 2020 literature survey concluded that "in a little over a year, BERT has become a ubiquitous baseline in Natural Language Processing (NLP) experiments counting over 150 research publications analyzing and improving the model."

<span class="mw-page-title-main">Knowledge graph embedding</span> Dimensionality reduction of graph-based semantic data objects [machine learning task]

In representation learning, knowledge graph embedding (KGE), also referred to as knowledge representation learning (KRL), or multi-relation learning, is a machine learning task of learning a low-dimensional representation of a knowledge graph's entities and relations while preserving their semantic meaning. Leveraging their embedded representation, knowledge graphs (KGs) can be used for various applications such as link prediction, triple classification, entity recognition, clustering, and relation extraction.

A graph neural network (GNN) belongs to a class of artificial neural networks for processing data that can be represented as graphs.

A vision transformer (ViT) is a transformer designed for computer vision. A ViT breaks down an input image into a series of patches, serialises each patch into a vector, and maps it to a smaller dimension with a single matrix multiplication. These vector embeddings are then processed by a transformer encoder as if they were token embeddings.

The Fashion MNIST dataset is a large freely available database of fashion images that is commonly used for training and testing various machine learning systems. Fashion-MNIST was intended to serve as a replacement for the original MNIST database for benchmarking machine learning algorithms, as it shares the same image size, data format and the structure of training and testing splits.

Lê Viết Quốc, or in romanized form Quoc Viet Le, is a Vietnamese-American computer scientist and a machine learning pioneer at Google Brain, which he established with others from Google. He co-invented the doc2vec and seq2seq models in natural language processing. Le also initiated and lead the AutoML initiative at Google Brain, including the proposal of neural architecture search.

References

  1. 1 2 Elsken, Thomas; Metzen, Jan Hendrik; Hutter, Frank (August 8, 2019). "Neural Architecture Search: A Survey". Journal of Machine Learning Research. 20 (55): 1–21. arXiv: 1808.05377 .
  2. Wistuba, Martin; Rawat, Ambrish; Pedapati, Tejaswini (2019-05-04). "A Survey on Neural Architecture Search". arXiv: 1905.01392 [cs.LG].
  3. 1 2 3 Zoph, Barret; Le, Quoc V. (2016-11-04). "Neural Architecture Search with Reinforcement Learning". arXiv: 1611.01578 [cs.LG].
  4. 1 2 3 4 5 Zoph, Barret; Vasudevan, Vijay; Shlens, Jonathon; Le, Quoc V. (2017-07-21). "Learning Transferable Architectures for Scalable Image Recognition". arXiv: 1707.07012 [cs.CV].
  5. Matthias Feurer and Frank Hutter. Hyperparameter optimization. In: AutoML: Methods, Systems, Challenges, pages 3–38.
  6. Vanschoren, Joaquin (2019). "Meta-Learning". Automated Machine Learning. The Springer Series on Challenges in Machine Learning. pp. 35–61. doi:10.1007/978-3-030-05318-5_2. ISBN   978-3-030-05317-8. S2CID   239362577.
  7. Zoph, Barret; Vasudevan, Vijay; Shlens, Jonathon; Le, Quoc V. (November 2, 2017). "AutoML for large scale image classification and object detection". Research Blog. Retrieved 2018-02-20.
  8. Pham, Hieu; Guan, Melody Y.; Zoph, Barret; Le, Quoc V.; Dean, Jeff (2018-02-09). "Efficient Neural Architecture Search via Parameter Sharing". arXiv: 1802.03268 [cs.LG].
  9. Real, Esteban; Moore, Sherry; Selle, Andrew; Saxena, Saurabh; Suematsu, Yutaka Leon; Tan, Jie; Le, Quoc; Kurakin, Alex (2017-03-03). "Large-Scale Evolution of Image Classifiers". arXiv: 1703.01041 [cs.NE].
  10. Suganuma, Masanori; Shirakawa, Shinichi; Nagao, Tomoharu (2017-04-03). "A Genetic Programming Approach to Designing Convolutional Neural Network Architectures". arXiv: 1704.00764v2 [cs.NE].
  11. 1 2 Liu, Hanxiao; Simonyan, Karen; Vinyals, Oriol; Fernando, Chrisantha; Kavukcuoglu, Koray (2017-11-01). "Hierarchical Representations for Efficient Architecture Search". arXiv: 1711.00436v2 [cs.LG].
  12. 1 2 Real, Esteban; Aggarwal, Alok; Huang, Yanping; Le, Quoc V. (2018-02-05). "Regularized Evolution for Image Classifier Architecture Search". arXiv: 1802.01548 [cs.NE].
  13. Miikkulainen, Risto; Liang, Jason; Meyerson, Elliot; Rawal, Aditya; Fink, Dan; Francon, Olivier; Raju, Bala; Shahrzad, Hormoz; Navruzyan, Arshak; Duffy, Nigel; Hodjat, Babak (2017-03-04). "Evolving Deep Neural Networks". arXiv: 1703.00548 [cs.NE].
  14. Xie, Lingxi; Yuille, Alan (2017). "Genetic CNN". 2017 IEEE International Conference on Computer Vision (ICCV). pp. 1388–1397. arXiv: 1703.01513 . doi:10.1109/ICCV.2017.154. ISBN   978-1-5386-1032-9. S2CID   206770867.
  15. 1 2 3 Elsken, Thomas; Metzen, Jan Hendrik; Hutter, Frank (2018-04-24). "Efficient Multi-objective Neural Architecture Search via Lamarckian Evolution". arXiv: 1804.09081 [stat.ML].
  16. Liu, Yuqiao; Sun, Yanan; Xue, Bing; Zhang, Mengjie; Yen, Gary G; Tan, Kay Chen (2021). "A Survey on Evolutionary Neural Architecture Search". IEEE Transactions on Neural Networks and Learning Systems. 34 (2): 1–21. arXiv: 2008.10937 . doi:10.1109/TNNLS.2021.3100554. PMID   34357870. S2CID   221293236.
  17. White, Colin; Neiswanger, Willie; Savani, Yash (2020-11-02). "BANANAS: Bayesian Optimization with Neural Architectures for Neural Architecture Search". arXiv: 1910.11858 [cs.LG].
  18. Thomas, Elsken; Jan Hendrik, Metzen; Frank, Hutter (2017-11-13). "Simple And Efficient Architecture Search for Convolutional Neural Networks". arXiv: 1711.04528 [stat.ML].
  19. 1 2 Zhou, Yanqi; Diamos, Gregory. "Neural Architect: A Multi-objective Neural Architecture Search with Performance Prediction" (PDF). Baidu. Archived from the original (PDF) on 2019-09-27. Retrieved 2019-09-27.
  20. Tan, Mingxing; Chen, Bo; Pang, Ruoming; Vasudevan, Vijay; Sandler, Mark; Howard, Andrew; Le, Quoc V. (2018). "MnasNet: Platform-Aware Neural Architecture Search for Mobile". arXiv: 1807.11626 [cs.CV].
  21. Howard, Andrew; Sandler, Mark; Chu, Grace; Chen, Liang-Chieh; Chen, Bo; Tan, Mingxing; Wang, Weijun; Zhu, Yukun; Pang, Ruoming; Vasudevan, Vijay; Le, Quoc V.; Adam, Hartwig (2019-05-06). "Searching for MobileNetV3". arXiv: 1905.02244 [cs.CV].
  22. Pham, Hieu; Guan, Melody Y.; Zoph, Barret; Le, Quoc V.; Dean, Jeff (2018). "Efficient Neural Architecture Search via Parameter Sharing". arXiv: 1802.03268 [cs.LG].
  23. Li, Liam; Talwalkar, Ameet (2019). "Random Search and Reproducibility for Neural Architecture Search". arXiv: 1902.07638 [cs.LG].
  24. Cai, Han; Zhu, Ligeng; Han, Song (2018). "ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware". arXiv: 1812.00332 [cs.LG].
  25. Dong, Xuanyi; Yang, Yi (2019). "Searching for a Robust Neural Architecture in Four GPU Hours". arXiv: 1910.04465 [cs.CV].
  26. 1 2 Liu, Hanxiao; Simonyan, Karen; Yang, Yiming (2018). "DARTS: Differentiable Architecture Search". arXiv: 1806.09055 [cs.LG].
  27. Xie, Sirui; Zheng, Hehui; Liu, Chunxiao; Lin, Liang (2018). "SNAS: Stochastic Neural Architecture Search". arXiv: 1812.09926 [cs.LG].
  28. Chu, Xiangxiang; Zhou, Tianbao; Zhang, Bo; Li, Jixiang (2019). "Fair DARTS: Eliminating Unfair Advantages in Differentiable Architecture Search". arXiv: 1911.12126 [cs.LG].
  29. 1 2 Zela, Arber; Elsken, Thomas; Saikia, Tonmoy; Marrakchi, Yassine; Brox, Thomas; Hutter, Frank (2019). "Understanding and Robustifying Differentiable Architecture Search". arXiv: 1909.09656 [cs.LG].
  30. 1 2 Chen, Xiangning; Hsieh, Cho-Jui (2020). "Stabilizing Differentiable Architecture Search via Perturbation-based Regularization". arXiv: 2002.05283 [cs.LG].
  31. Xu, Yuhui; Xie, Lingxi; Zhang, Xiaopeng; Chen, Xin; Qi, Guo-Jun; Tian, Qi; Xiong, Hongkai (2019). "PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search". arXiv: 1907.05737 [cs.CV].
  32. Wang, Ruochen; Cheng, Minhao; Chen, Xiangning; Tang, Xiaocheng; Hsieh, Cho-Jui (2021). "Rethinking Architecture Selection in Differentiable NAS". arXiv: 2108.04392 [cs.LG].
  33. Wu, Bichen; Dai, Xiaoliang; Zhang, Peizhao; Wang, Yanghan; Sun, Fei; Wu, Yiming; Tian, Yuandong; Vajda, Peter; Jia, Yangqing; Keutzer, Kurt (24 May 2019). "FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search". arXiv: 1812.03443 [cs.CV].
  34. Sandler, Mark; Howard, Andrew; Zhu, Menglong; Zhmoginov, Andrey; Chen, Liang-Chieh (2018). "MobileNetV2: Inverted Residuals and Linear Bottlenecks". arXiv: 1801.04381 [cs.CV].
  35. Keutzer, Kurt (2019-05-22). "Co-Design of DNNs and NN Accelerators" (PDF). IEEE. Retrieved 2019-09-26.
  36. Shaw, Albert; Hunter, Daniel; Iandola, Forrest; Sidhu, Sammy (2019). "SqueezeNAS: Fast neural architecture search for faster semantic segmentation". arXiv: 1908.01748 [cs.CV].
  37. Yoshida, Junko (2019-08-25). "Does Your AI Chip Have Its Own DNN?". EE Times. Retrieved 2019-09-26.
  38. Ying, Chris; Klein, Aaron; Real, Esteban; Christiansen, Eric; Murphy, Kevin; Hutter, Frank (2019). "NAS-Bench-101: Towards Reproducible Neural Architecture Search". arXiv: 1902.09635 [cs.LG].
  39. Zela, Arber; Siems, Julien; Hutter, Frank (2020). "NAS-Bench-1Shot1: Benchmarking and Dissecting One-shot Neural Architecture Search". arXiv: 2001.10422 [cs.LG].
  40. Dong, Xuanyi; Yang, Yi (2020). "NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search". arXiv: 2001.00326 [cs.CV].
  41. Zela, Arber; Siems, Julien; Zimmer, Lucas; Lukasik, Jovita; Keuper, Margret; Hutter, Frank (2020). "Surrogate NAS Benchmarks: Going Beyond the Limited Search Spaces of Tabular NAS Benchmarks". arXiv: 2008.09777 [cs.LG].