Developer(s) | Google AI |
---|---|
Initial release | May 2019 |
Repository | github |
Written in | Python |
License | Apache License 2.0 |
Website | Google AI Blog |
EfficientNet is a family of convolutional neural networks (CNNs) for computer vision published by researchers at Google AI in 2019. [1] Its key innovation is compound scaling, which uniformly scales all dimensions of depth, width, and resolution using a single parameter.
EfficientNet models have been adopted in various computer vision tasks, including image classification, object detection, and segmentation.
EfficientNet introduces compound scaling, which, instead of scaling one dimension of the network at a time, such as depth (number of layers), width (number of channels), or resolution (input image size), uses a compound coefficient $\phi$ to scale all three dimensions simultaneously. Specifically, given a baseline network, the depth, width, and resolution are scaled according to the following equations: [1]

$$\text{depth: } d = \alpha^{\phi}, \qquad \text{width: } w = \beta^{\phi}, \qquad \text{resolution: } r = \gamma^{\phi},$$

subject to $\alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2$ and $\alpha, \beta, \gamma \geq 1$. The condition is such that increasing $\phi$ by $k$ would increase the total FLOPs of running the network on an image approximately $2^{k}$ times. The hyperparameters $\alpha$, $\beta$, and $\gamma$ are determined by a small grid search. The original paper suggested 1.2, 1.1, and 1.15, respectively.
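As a minimal illustration (not the paper's reference code), the compound-scaling rule can be written as a small Python helper. The coefficient values are the ones suggested in the original paper; the rounding behaviour and the example baseline values are assumptions of this sketch.

```python
import math

# Compound-scaling coefficients suggested in the original EfficientNet paper.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases


def compound_scale(phi, base_depth, base_width, base_resolution):
    """Scale a baseline network's depth, width, and input resolution by phi.

    Rounding up to integers is an illustrative choice, not the exact rule
    used in the reference implementation.
    """
    depth = math.ceil(base_depth * ALPHA ** phi)             # number of layers
    width = math.ceil(base_width * BETA ** phi)              # number of channels
    resolution = math.ceil(base_resolution * GAMMA ** phi)   # input image size
    flops_multiplier = (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi  # approx. 2 ** phi
    return depth, width, resolution, flops_multiplier


# Example: scale a hypothetical baseline stage with 3 layers, 32 channels,
# and 224x224 inputs using phi = 1.
print(compound_scale(1, base_depth=3, base_width=32, base_resolution=224))
```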
Architecturally, they optimized the choice of modules by neural architecture search (NAS), and found that the inverted bottleneck convolution (which they called MBConv) used in MobileNet worked well.
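A minimal sketch of an MBConv-style inverted bottleneck block in Keras is shown below. The expansion ratio, squeeze-and-excitation ratio, and overall layer configuration are illustrative assumptions, not the exact settings of the published models.

```python
import tensorflow as tf
from tensorflow.keras import layers


def mbconv_block(x, out_channels, expansion=6, kernel_size=3, se_ratio=0.25):
    """Illustrative inverted-bottleneck (MBConv-style) block.

    Expand with a 1x1 conv, apply a depthwise conv, squeeze-and-excitation,
    then project back down with a 1x1 conv; add a residual when shapes match.
    """
    in_channels = x.shape[-1]
    expanded = in_channels * expansion

    # 1x1 expansion
    h = layers.Conv2D(expanded, 1, padding="same", use_bias=False)(x)
    h = layers.BatchNormalization()(h)
    h = layers.Activation("swish")(h)

    # depthwise convolution
    h = layers.DepthwiseConv2D(kernel_size, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    h = layers.Activation("swish")(h)

    # squeeze-and-excitation: channel-wise gating from globally pooled features
    se = layers.GlobalAveragePooling2D()(h)
    se = layers.Dense(max(1, int(expanded * se_ratio)), activation="swish")(se)
    se = layers.Dense(expanded, activation="sigmoid")(se)
    h = layers.Multiply()([h, layers.Reshape((1, 1, expanded))(se)])

    # 1x1 projection (no activation)
    h = layers.Conv2D(out_channels, 1, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)

    if in_channels == out_channels:
        h = layers.Add()([x, h])  # residual connection
    return h


# Example: wrap a single block into a tiny model.
inputs = layers.Input(shape=(224, 224, 32))
model = tf.keras.Model(inputs, mbconv_block(inputs, out_channels=32))
```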
The EfficientNet family is a stack of MBConv layers, with shapes determined by the compound scaling. The original publication consisted of 8 models, from EfficientNet-B0 to EfficientNet-B7, with increasing model size and accuracy. EfficientNet-B0 is the baseline network, and subsequent models are obtained by scaling the baseline network with increasing values of the compound coefficient $\phi$.
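Pretrained members of the family are available in common deep learning frameworks. The snippet below loads EfficientNet-B0 through the Keras applications API as one hedged example of usage; the weights and preprocessing come from that library rather than from the original paper, and the random input is a placeholder.

```python
import numpy as np
import tensorflow as tf

# Load EfficientNet-B0 with ImageNet weights via the Keras applications API.
model = tf.keras.applications.EfficientNetB0(weights="imagenet")

# Classify a single placeholder 224x224 RGB image.
image = np.random.rand(1, 224, 224, 3) * 255.0
image = tf.keras.applications.efficientnet.preprocess_input(image)
predictions = model.predict(image)
print(tf.keras.applications.efficientnet.decode_predictions(predictions, top=3))
```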
Using NAS, EfficientNet has been adapted for fast inference on edge TPUs [2] and on centralized TPU or GPU clusters. [3]
EfficientNet V2 was published in June 2021. The architecture was improved by further neural architecture search over a larger space with more types of convolutional layers. [4] It also introduced a training method that progressively increases the image size during training while strengthening regularization techniques such as dropout, RandAugment, [5] and Mixup. [6] The authors claim this approach mitigates the accuracy drop often associated with progressive resizing.
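A runnable sketch of this progressive-learning idea is given below: image size and regularization strength grow together across training stages. The linear interpolation and the example values are assumptions of this sketch, not the schedule used in the paper.

```python
def progressive_schedule(num_stages, min_size, max_size, min_dropout, max_dropout):
    """Interpolate image size and dropout rate across training stages.

    Smaller images are paired with weaker regularization and larger images
    with stronger regularization, mirroring the progressive-learning idea.
    """
    schedule = []
    for i in range(num_stages):
        t = i / max(1, num_stages - 1)
        image_size = int(min_size + t * (max_size - min_size))
        dropout = min_dropout + t * (max_dropout - min_dropout)
        schedule.append((image_size, dropout))
    return schedule


# Example: four stages, growing inputs from 128 to 300 pixels while
# increasing dropout from 0.1 to 0.3.
for image_size, dropout in progressive_schedule(4, 128, 300, 0.1, 0.3):
    print(f"train with {image_size}x{image_size} inputs, dropout={dropout:.2f}")
```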