ARM big.LITTLE

Figure: Cortex A57/A53 MPCore big.LITTLE CPU chip

ARM big.LITTLE is a heterogeneous computing architecture developed by ARM Holdings, coupling relatively battery-saving and slower processor cores (LITTLE) with relatively more powerful and power-hungry ones (big). The intention is to create a multi-core processor that adapts better to dynamic computing needs and uses less power than clock scaling alone. ARM's marketing material promises power savings of up to 75% for some activities. [1] Most commonly, ARM big.LITTLE architectures are used to create a multi-processor system-on-chip (MPSoC).

In October 2011, big.LITTLE was announced along with the Cortex-A7, which was designed to be architecturally compatible with the Cortex-A15. [2] In October 2012, ARM announced the Cortex-A53 and Cortex-A57 (ARMv8-A) cores, which are also intercompatible to allow their use in a big.LITTLE chip. [3] ARM later announced the Cortex-A12 at Computex 2013, followed by the Cortex-A17 in February 2014. Both the Cortex-A12 and the Cortex-A17 can also be paired with the Cortex-A7 in a big.LITTLE configuration. [4] [5]

The problem that big.LITTLE solves

For a given library of CMOS logic, active power increases as the logic switches more often per second, while leakage increases with the number of transistors. CPUs designed to run fast are therefore different from CPUs designed to save power. When a very fast out-of-order CPU is idling at very low speeds, a CPU with much less leakage (fewer transistors) could do the same work. Such a CPU might, for example, use a smaller memory cache (fewer transistors) or a simpler microarchitecture, for instance one without out-of-order execution. big.LITTLE is a way to optimize for both cases, power and speed, in the same system.
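The trade-off described above can be illustrated with a toy power model. This is only a sketch: it assumes the textbook approximation that switching power scales as C·V²·f and that leakage grows linearly with transistor count, and every constant in it (capacitances, transistor counts, leakage current) is a hypothetical value that does not describe any real Cortex core.

```python
# Toy CMOS power model. All constants below are hypothetical, chosen only
# to illustrate the trend; they do not describe any real core.

def dynamic_power(c_eff, voltage, freq_hz, activity=1.0):
    """Switching power in watts: activity * C * V^2 * f."""
    return activity * c_eff * voltage ** 2 * freq_hz


def leakage_power(transistors, leak_amps_per_transistor, voltage):
    """Static power in watts, growing linearly with transistor count."""
    return transistors * leak_amps_per_transistor * voltage


# Assumed core parameters: the big core has more switched capacitance
# and more transistors than the LITTLE core.
BIG = {"c_eff": 1.0e-9, "transistors": 100e6}
LITTLE = {"c_eff": 0.3e-9, "transistors": 20e6}
LEAK_AMPS = 1.0e-9  # assumed leakage current per transistor


def total_power(core, voltage, freq_hz):
    """Sum of switching and leakage power for one core."""
    return (dynamic_power(core["c_eff"], voltage, freq_hz)
            + leakage_power(core["transistors"], LEAK_AMPS, voltage))


if __name__ == "__main__":
    # Same low operating point for both cores: 0.9 V at 500 MHz.
    for name, core in (("big", BIG), ("LITTLE", LITTLE)):
        print(name, round(total_power(core, 0.9, 500e6), 4), "W")
```

Under these assumed numbers the smaller core draws less power at the same low operating point, which is the case big.LITTLE is meant to exploit.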

In practice, a big.LITTLE system can be surprisingly inflexible. One issue is the number and types of power and clock domains that the IC provides. These may not match the standard power management features offered by an operating system. Another is that the CPUs no longer have equivalent abilities, and matching the right software task to the right CPU becomes more difficult. Most of these problems are being solved by making the electronics and software more flexible.

Run-state migration

There are three ways [6] for the different processor cores to be arranged in a big.LITTLE design, depending on the scheduler implemented in the kernel. [7]

Clustered switching

Figure: big.LITTLE clustered switching

The clustered model is the first and simplest implementation, arranging the processor into identically sized clusters of "big" or "LITTLE" cores. The operating system scheduler can only see one cluster at a time; when the load on the whole processor changes between low and high, the system transitions to the other cluster. All relevant data are then passed through the common L2 cache, the active core cluster is powered off, and the other one is activated. A Cache Coherent Interconnect (CCI) is used. This model has been implemented in the Samsung Exynos 5 Octa (5410). [8]
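The switching behaviour described above can be sketched as a small state machine. This is a simplified illustration, not firmware or kernel code: the class name `ClusterSwitcher` and the two load thresholds are assumptions introduced here, and the hysteresis gap between them is only one plausible way to avoid rapid back-and-forth switching.

```python
class ClusterSwitcher:
    """Exposes exactly one cluster ("big" or "LITTLE") at a time."""

    def __init__(self, up_threshold=0.8, down_threshold=0.3):
        # Thresholds are hypothetical; real systems tune these values.
        self.up_threshold = up_threshold
        self.down_threshold = down_threshold
        self.active = "LITTLE"
        self.transitions = []  # record of (outgoing, incoming) switches

    def _switch(self, incoming):
        # In hardware, state is handed over through the shared cache,
        # the outgoing cluster is powered off and the incoming one is
        # activated. Here we only record the transition.
        self.transitions.append((self.active, incoming))
        self.active = incoming

    def update(self, load):
        """Take the load of the whole processor (0.0 to 1.0)."""
        if self.active == "LITTLE" and load > self.up_threshold:
            self._switch("big")
        elif self.active == "big" and load < self.down_threshold:
            self._switch("LITTLE")
        return self.active


if __name__ == "__main__":
    switcher = ClusterSwitcher()
    for load in (0.2, 0.9, 0.5, 0.1):
        print(load, "->", switcher.update(load))
```

Because the decision uses the load on the whole processor, a single busy thread moves every task to the big cluster, which is the main limitation of this model.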

In-kernel switcher (CPU migration)

Figure: big.LITTLE in-kernel switcher

CPU migration via the in-kernel switcher (IKS) involves pairing up a 'big' core with a 'LITTLE' core, with possibly many identical pairs in one chip. Each pair operates as one so-called virtual core, and only one real core is (fully) powered up and running at a time. The 'big' core is used when demand is high and the 'LITTLE' core when demand is low. When demand on the virtual core changes (between high and low), the incoming core is powered up, the running state is transferred, the outgoing core is shut down, and processing continues on the new core. Switching is done via the cpufreq framework. A complete big.LITTLE IKS implementation was added in Linux 3.11. big.LITTLE IKS is an improvement on cluster migration (§ Clustered switching), the main difference being that each pair is visible to the scheduler.
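The per-pair behaviour can be sketched as follows. This is an illustrative model only, not the Linux IKS code: the `VirtualCore` class, the `step` helper and the demand thresholds are hypothetical names and values introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class VirtualCore:
    """One big/LITTLE pair that the scheduler sees as a single core."""
    index: int
    active: str = "LITTLE"  # which physical core of the pair is powered

    def update(self, demand, up=0.8, down=0.3):
        # Thresholds are assumed values. On a switch, the incoming core
        # is powered up, state is transferred and the outgoing core is
        # shut down; this model only tracks which core is active.
        if self.active == "LITTLE" and demand > up:
            self.active = "big"
        elif self.active == "big" and demand < down:
            self.active = "LITTLE"
        return self.active


def step(pairs, demands):
    """Apply one demand sample to each pair independently."""
    return [pair.update(d) for pair, d in zip(pairs, demands)]


if __name__ == "__main__":
    pairs = [VirtualCore(i) for i in range(4)]
    print(step(pairs, [0.9, 0.1, 0.85, 0.2]))
```

Unlike the clustered model, each pair decides on its own, so a chip can run some pairs on their big core and others on their LITTLE core at the same moment.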

A more complex arrangement involves a non-symmetric grouping of 'big' and 'LITTLE' cores. A single chip could have one or two 'big' cores and many more 'LITTLE' cores, or vice versa. Nvidia created something similar to this with the low-power 'companion core' in their Tegra 3 System-on-Chip.

Heterogeneous multi-processing (global task scheduling)

Figure: big.LITTLE heterogeneous multi-processing

The most powerful use model of the big.LITTLE architecture is heterogeneous multi-processing (HMP), which enables the use of all physical cores at the same time. Threads with high priority or computational intensity can in this case be allocated to the "big" cores, while threads with lower priority or lower computational intensity, such as background tasks, can be run on the "LITTLE" cores. [9]
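A minimal sketch of this placement rule is shown below. It is an assumption-laden illustration rather than any real scheduler: the function `place_threads`, the thread tuples, the core names and the intensity threshold are all hypothetical, and round-robin assignment within each core type is used only to keep the example short.

```python
def place_threads(threads, big_cores, little_cores, intensity_threshold=0.6):
    """Map each thread to a core name.

    threads: iterable of (name, intensity, high_priority) tuples, where
    intensity is an assumed 0.0 to 1.0 estimate of compute demand.
    """
    placement = {}
    big_next = 0
    little_next = 0
    for name, intensity, high_priority in threads:
        if high_priority or intensity >= intensity_threshold:
            # Demanding or important work goes to a big core.
            placement[name] = big_cores[big_next % len(big_cores)]
            big_next += 1
        else:
            # Background work goes to a LITTLE core.
            placement[name] = little_cores[little_next % len(little_cores)]
            little_next += 1
    return placement


if __name__ == "__main__":
    threads = [("ui", 0.2, True), ("sync", 0.1, False), ("render", 0.9, False)]
    print(place_threads(threads, ["big0", "big1"], ["little0", "little1"]))
```

All cores remain available at once here, so big and LITTLE cores can be busy simultaneously, which is the difference from the two migration models.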

This model has been implemented in the Samsung Exynos starting with the Exynos 5 Octa series (5420, 5422, 5430), [10] [11] and Apple A series processors starting with the Apple A11. [12]

Scheduling

The paired arrangement allows switching to be done transparently to the operating system using the existing dynamic voltage and frequency scaling (DVFS) facility. The existing DVFS support in the kernel (e.g. cpufreq in Linux) simply sees a list of frequencies/voltages and switches between them as it sees fit, just as it does on existing hardware. However, the low-end slots activate the 'LITTLE' core and the high-end slots activate the 'big' core. This is the early solution provided by Linux's "deadline" CPU scheduler (not to be confused with the I/O scheduler with the same name) since 2012. [13]
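The idea of a single frequency list whose slots map onto different physical cores can be sketched like this. The table values and the `select_opp` helper are hypothetical and exist only for illustration; they are not taken from cpufreq or from any real chip.

```python
# Hypothetical operating-point table, sorted by performance level.
# The governor only sees the frequency column; the core column is
# what the switcher applies behind the scenes.
OPP_TABLE = [
    (200, "LITTLE"),
    (400, "LITTLE"),
    (600, "LITTLE"),
    (800, "big"),
    (1200, "big"),
    (1600, "big"),
]


def select_opp(required_mhz):
    """Return the lowest (frequency, core) slot that meets the demand."""
    for freq_mhz, core in OPP_TABLE:
        if freq_mhz >= required_mhz:
            return freq_mhz, core
    # Demand exceeds every slot: fall back to the fastest one.
    return OPP_TABLE[-1]


if __name__ == "__main__":
    for demand in (350, 1000, 5000):
        print(demand, "->", select_opp(demand))
```

From the governor's point of view nothing changes: it picks a slot from one ordered list, and the core switch is a side effect of crossing the boundary between the low-end and high-end slots.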

Alternatively, all the cores may be exposed to the kernel scheduler, which decides where each process/thread is executed. This is required for the non-paired arrangement but could also be used on the paired cores. It poses unique problems for the kernel scheduler, which, at least with modern commodity hardware, has been able to assume that all cores in an SMP system are equal rather than heterogeneous. Energy Aware Scheduling, added to Linux 5.0 in 2019, is an example of a scheduler that treats cores differently. [14] [15]
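To make the idea of treating cores differently concrete, here is a deliberately simplified placement heuristic. It is not the Linux Energy Aware Scheduling algorithm: the `CORES` table, its capacity and power figures, and the `pick_core` function are assumptions made up for this sketch, and the energy estimate is just power at full capacity scaled by the fraction of capacity the task would use.

```python
# Hypothetical per-core data: compute capacity in arbitrary units,
# power draw at full capacity in milliwatts, and current utilization.
CORES = [
    {"name": "little0", "capacity": 400, "power_mw": 100, "util": 0},
    {"name": "big0", "capacity": 1024, "power_mw": 600, "util": 0},
]


def estimated_cost(core, task_util):
    """Rough energy cost of running the task on this core."""
    return core["power_mw"] * task_util / core["capacity"]


def pick_core(task_util, cores):
    """Choose the cheapest core that still has room for the task."""
    fitting = [c for c in cores if c["util"] + task_util <= c["capacity"]]
    if not fitting:
        # Nothing fits: fall back to the core with the most headroom.
        return max(cores, key=lambda c: c["capacity"] - c["util"])
    return min(fitting, key=lambda c: estimated_cost(c, task_util))


if __name__ == "__main__":
    for util in (100, 600):
        print(util, "->", pick_core(util, CORES)["name"])
```

With these assumed figures a light task lands on the LITTLE core because it is cheaper per unit of work, while a task too large for the LITTLE core is sent to the big one.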

Advantages of global task scheduling

Successor

In May 2017, ARM announced DynamIQ as the successor to big.LITTLE. [16] DynamIQ is expected to allow for more flexibility and scalability when designing multi-core processors. In contrast to big.LITTLE, it increases the maximum number of cores in a cluster (to 8 for Armv8.2 CPUs, 12 for Armv9 and 14 for Armv9.2 [17]), allows for varying core designs within a single cluster, and supports up to 32 total clusters. The technology also offers more fine-grained per-core voltage control and faster L2 cache speeds. However, DynamIQ is incompatible with previous ARM designs and is supported only by the Cortex-A75 and Cortex-A55 CPU cores and their successors.

References

  1. "big.LITTLE technology". ARM.com. Archived from the original on 22 October 2012. Retrieved 17 October 2012.
  2. "ARM Unveils its Most Energy Efficient Application Processor Ever; Redefines Traditional Power And Performance Relationship With big.LITTLE Processing" (Press release). ARM Holdings. 19 October 2011. Retrieved 31 October 2012.
  3. "ARM Launches Cortex-A50 Series, the World's Most Energy-Efficient 64-bit Processors" (Press release). ARM Holdings. Retrieved 31 October 2012.
  4. "ARM's new Cortex-A12 is ready to power 2014's $200 midrange smartphones". The Verge. April 2014.
  5. "ARM Cortex A17: An Evolved Cortex A12 for the Mainstream in 2015". AnandTech. April 2014.
  6. Brian Jeff (18 June 2013). "Ten Things to Know About big.LITTLE". ARM Holdings. Archived from the original on 10 September 2013. Retrieved 17 September 2013.
  7. George Grey (10 July 2013). "big.LITTLE Software Update". Linaro. Archived from the original on 4 October 2013. Retrieved 17 September 2013.
  8. Peter Clarke (6 August 2013). "Benchmarking ARM's big-little architecture". Retrieved 17 September 2013.
  9. Big.LITTLE Processing with ARM Cortex-A15 & Cortex-A7 (PDF), ARM Holdings, September 2013, archived from the original (PDF) on 17 April 2012, retrieved 17 September 2013
  10. Brian Klug (11 September 2013). "Samsung Announces big.LITTLE MP Support in Exynos 5420". AnandTech. Retrieved 16 September 2013.
  11. "Samsung Unveils New Products from its System LSI Business at Mobile World Congress". Samsung Tomorrow. Archived from the original on 16 March 2014. Retrieved 26 February 2013.
  12. "The future is here: iPhone X". Apple Newsroom. Retrieved 25 February 2018.
  13. McKenney, Paul (12 June 2012). "A big.LITTLE scheduler update". LWN.net.
  14. Perret, Quentin (25 February 2019). "Energy Aware Scheduling merged in Linux 5.0". community.arm.com.
  15. "Energy Aware Scheduling". The Linux Kernel documentation.
  16. Humrick, Matt (29 May 2017). "Exploring Dynamiq and ARM's New CPUs". AnandTech. Retrieved 10 July 2017.
  17. Arm Ltd. "DynamIQ – Arm®". Arm | The Architecture for the Digital World. Retrieved 18 October 2023.

Further reading