AQuoSA


AQuoSA (Adaptive Quality of Service Architecture) [1] is an open architecture for adding adaptive Quality of Service (QoS) functionality to the Linux kernel. The project provides a flexible, portable and lightweight architecture for supporting QoS-related services on top of a general-purpose operating system such as Linux. The architecture is grounded in formal scheduling analysis and control-theoretic results.


A key feature of AQuoSA is the Resource Reservation layer, which can dynamically adapt the CPU allocation of QoS-aware applications to their run-time requirements. To provide this functionality, AQuoSA embeds a kernel-level CPU scheduler that implements a resource reservation mechanism based on Earliest Deadline First (EDF). This enables the Linux kernel to achieve (partial) temporal isolation among the tasks running on the system.
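The effect of an EDF-based reservation can be illustrated with a small discrete-time simulation. This is a minimal sketch under stated assumptions, not AQuoSA's kernel scheduler: `Server` and `simulate` are hypothetical names, time advances in unit ticks, every task always wants the CPU, and each reservation is a hard budget Q replenished at the start of every period P.

```python
from dataclasses import dataclass

@dataclass
class Server:
    """Hypothetical reservation: at most `budget` ticks of CPU in every `period` ticks."""
    name: str
    budget: int         # Q
    period: int         # P
    remaining: int = 0  # budget left in the current period
    deadline: int = 0   # end of the current period

def simulate(servers, ticks):
    """Schedule always-ready tasks by EDF among the servers that still have budget."""
    usage = {s.name: 0 for s in servers}
    for now in range(ticks):
        for s in servers:
            if now % s.period == 0:          # a new period starts
                s.remaining = s.budget       # replenish the budget
                s.deadline = now + s.period  # deadline is the end of the period
        ready = [s for s in servers if s.remaining > 0]
        if not ready:
            continue                         # CPU idle: every budget is exhausted
        chosen = min(ready, key=lambda s: s.deadline)  # earliest deadline first
        chosen.remaining -= 1
        usage[chosen.name] += 1
    return usage

usage = simulate([Server("video", 2, 10), Server("greedy", 3, 5)], ticks=100)
print(usage)  # {'video': 20, 'greedy': 60}
```

Although both tasks would consume the whole CPU if allowed, each receives exactly its reserved share (20% and 60%), which is the temporal isolation property in its simplest form.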

AQuoSA is one of the few projects that provide real-time scheduling capabilities to unprivileged users on a multi-user system in a controlled way, by means of a dedicated access-control model.

Description

The architecture of the project may be summarized as follows:

[Figure: AQuoSA architecture diagram]

Patch to the Linux kernel

At the lowest level, a patch to the Linux kernel adds the ability to notify dynamically loaded modules of any relevant scheduling event. The relevant events are the creation and death of tasks, and the blocking and unblocking of tasks. The patch is minimally invasive: it consists of a few lines of code inserted mainly within the Linux scheduler code (sched.c). It is called the "Generic Scheduler Patch" because it potentially allows any scheduling policy to be implemented.
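The notification mechanism can be pictured as a registry of callbacks keyed by event type. The following is only an illustrative user-space sketch: `SchedulerHooks`, `register` and `notify` are hypothetical names and do not correspond to the symbols introduced by the actual patch.

```python
from collections import defaultdict

EVENTS = ("create", "death", "block", "unblock")

class SchedulerHooks:
    """Hypothetical registry standing in for the hook points added by the patch."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def register(self, event, handler):
        if event not in EVENTS:
            raise ValueError(f"unknown scheduling event: {event}")
        self._handlers[event].append(handler)

    def notify(self, event, task_id):
        # Called by the (simulated) scheduler whenever the event occurs.
        for handler in self._handlers[event]:
            handler(task_id)

# A "module" that tracks which tasks are currently runnable.
hooks = SchedulerHooks()
runnable = set()
hooks.register("create", runnable.add)
hooks.register("unblock", runnable.add)
hooks.register("block", runnable.discard)
hooks.register("death", runnable.discard)

trace = [("create", 1), ("create", 2), ("block", 1), ("death", 2), ("unblock", 1)]
for event, task_id in trace:
    hooks.notify(event, task_id)
print(sorted(runnable))  # [1]
```

A scheduling policy implemented as a loadable module would use these four notifications to keep its own view of the task set up to date.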

Resource Reservations

The Resource Reservations layer is composed of three components.

The core component is a dynamically loadable kernel module that implements a Resource Reservations scheduling paradigm for the CPU, by exploiting functionality introduced into the Linux kernel through the Generic Scheduler Patch.

Second, a user-level library (the QRES library) allows an application to use the new scheduling policy through a set of API calls. Essentially, these calls allow an application to ask the system to reserve a certain percentage of the CPU for its processes.

Third, a kernel-level component (the Supervisor) mediates all requests made by applications through the QRES library, so that the sum of the requested CPU shares does not violate the schedulability condition of the scheduler (at most one, or in practice slightly less than one to account for overhead). The Supervisor's behaviour is fully configurable by the system administrator, who can specify, on a per-user or per-group basis, minimum guaranteed and maximum allowed values for the reservations made on the CPU.

With AQuoSA, applications may use the Resource Reservation layer directly to reserve a fraction of the CPU and so run with the required scheduling guarantees. For example, a multimedia application may ask the operating system for the guarantee of being scheduled for at least Q milliseconds every P milliseconds, where Q and P depend on the nature of the application.

When registering an application with the Resource Reservation layer, it is possible to specify a minimum reservation that the system must always guarantee to the application. Based on these minimum guaranteed reservations, the layer performs admission control: it admits a new application only if the resulting set of running applications does not exceed the CPU saturation limit.
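The admission test described above amounts to a sum of Q/P ratios compared against a limit. The sketch below is a hypothetical illustration rather than the QRES API: the names `Reservation` and `admit` and the limit of 0.95 are assumptions chosen for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reservation:
    budget_ms: float  # Q: guaranteed CPU time per period
    period_ms: float  # P: reservation period

    @property
    def bandwidth(self):
        """Fraction of the CPU reserved, Q / P."""
        return self.budget_ms / self.period_ms

def admit(active, request, limit=0.95):
    """Accept `request` only if the total reserved bandwidth stays within `limit`."""
    total = sum(r.bandwidth for r in active) + request.bandwidth
    return total <= limit

active = [Reservation(10, 40), Reservation(20, 50)]  # 0.25 + 0.40 = 0.65
print(admit(active, Reservation(5, 20)))   # adds 0.25, total 0.90: True
print(admit(active, Reservation(10, 20)))  # adds 0.50, total 1.15: False
```

Setting the limit slightly below one leaves headroom for scheduling overhead, as noted for the Supervisor.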

Adaptive Reservations

For typical multimedia applications that use high-compression technologies, running with a fixed reservation on the CPU can be difficult, impractical or inconvenient. The most efficient reservation may vary over time, because varying compression levels result in varying decompression times.

Traditional real-time systems use WCET (Worst Case Execution Time) analysis to compute the maximum time that an instance of a task (for example, a periodic task) may execute on the CPU before blocking to wait for the next instance.

Such analysis is very difficult for today's complex multimedia applications, especially on general-purpose hardware such as standard PCs, where multi-level caches, CPU execution pipelines, on-bus buffers and multi-master buses introduce many unpredictable variables into the time required for memory accesses.

On such systems, it is much more convenient to tune a system design based on the average expected load of the application. Otherwise, the system may be significantly under-utilized during runtime.

As already mentioned, for certain classes of multimedia applications, such as video players, it is practically impossible to find an appropriate fixed value for the fraction of CPU required at run-time, because the load fluctuates heavily with the data being processed. A fixed reservation based on the average requirement, or slightly above it, results in transient periods of poor quality at run-time (e.g. during movie playback). On the other hand, a fixed reservation based on the maximum expected load over-reserves the CPU most of the time, except during the periods in which the load actually approaches the maximum expected value.
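The trade-off can be made concrete with a few numbers. The per-frame decoding times below are made up for illustration, and the model is deliberately simplified: a frame counts as late whenever its cost exceeds the budget of a single period, ignoring any backlog carried over to later periods.

```python
# Made-up decoding times per frame, in milliseconds.
costs_ms = [8, 9, 8, 22, 9, 8, 10, 24, 8, 9]

def evaluate(costs, budget_ms):
    """Return (frames that overrun the budget, reserved-but-unused milliseconds)."""
    late = sum(1 for c in costs if c > budget_ms)
    unused = sum(budget_ms - c for c in costs if c <= budget_ms)
    return late, unused

near_average = evaluate(costs_ms, 12)           # the average cost is 11.5 ms
worst_case = evaluate(costs_ms, max(costs_ms))  # the maximum cost is 24 ms
print(near_average)  # (2, 27)
print(worst_case)    # (0, 125)
```

A budget just above the average misses two of the ten frames, while a worst-case budget misses none but leaves 125 ms of reserved CPU time unused over the same ten periods.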

For these classes of applications, it is more convenient to use adaptive reservation techniques such as those provided by the Adaptive Reservation layer of AQuoSA. This layer continuously monitors the computational requirements of the application processes at run-time and dynamically adapts the CPU reservation to the monitored data.

The Adaptive Reservation layer exposes an API through which applications can use a set of controllers that are general enough to suit a wide range of multimedia applications.
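A feedback loop of this kind can be sketched in a few lines. This is a hypothetical controller, not one of the controllers shipped with AQuoSA: the class `AdaptiveBudget`, the sliding-window maximum used as predictor, and the fixed safety margin are all assumptions made for the example.

```python
from collections import deque

class AdaptiveBudget:
    """Hypothetical controller: next budget = recent peak usage + margin, clamped."""

    def __init__(self, min_ms, max_ms, window=3, margin_ms=2):
        self.min_ms = min_ms
        self.max_ms = max_ms
        self.margin_ms = margin_ms
        self.samples = deque(maxlen=window)  # most recent observations only

    def update(self, consumed_ms):
        """Record the CPU time used in the last period and return the next budget."""
        self.samples.append(consumed_ms)
        wanted = max(self.samples) + self.margin_ms
        return min(max(wanted, self.min_ms), self.max_ms)

controller = AdaptiveBudget(min_ms=5, max_ms=20)
budgets = [controller.update(c) for c in [8, 9, 22, 9, 8, 8]]
print(budgets)  # [10, 11, 20, 20, 20, 11]
```

The budget rises when a demanding frame is observed (capped here at the 20 ms maximum) and falls back once the peak leaves the observation window, so the reservation tracks the load instead of being fixed at the average or at the worst case.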

AQuoSA Access Control Model

Most real-time variations of Linux require users of real-time capabilities of the modified OS to have root privileges on the system. This is perfectly acceptable in an embedded system context. However, this is excessive for multi-user systems where real-time scheduling features are needed for multimedia applications or similar. Therefore, AQuoSA embeds a dedicated access-control model by which system administrators can:

  1. assign real-time scheduling quotas to individual users or groups, expressed as maximum values for the minimum guaranteed bandwidth that the OS can grant to each user or to each group as a whole;
  2. control how the optional bandwidth requested in excess of the minimum guaranteed figures is distributed among competing users in overload situations;
  3. limit the scheduling overhead that real-time reservations created by individual users or groups can impose on the system, for example by setting the minimum period that can be specified in a real-time reservation.
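The first and third kinds of rule can be sketched as a per-user check applied before a reservation is granted. The names `Quota` and `check_request`, the user names and the numeric limits are hypothetical and do not reflect AQuoSA's actual configuration format. The distribution of excess bandwidth under overload (the second rule) is not modelled here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Quota:
    max_guaranteed: float  # cap on the user's total minimum guaranteed bandwidth
    min_period_ms: float   # shortest reservation period the user may request

def check_request(quota, already_granted, budget_ms, period_ms):
    """Return (allowed, reason) for a new minimum guaranteed reservation."""
    if period_ms < quota.min_period_ms:
        return False, "period below the allowed minimum"
    if already_granted + budget_ms / period_ms > quota.max_guaranteed:
        return False, "user quota exceeded"
    return True, "ok"

quotas = {"alice": Quota(0.30, 10), "guest": Quota(0.05, 50)}
print(check_request(quotas["alice"], 0.10, 5, 40))   # 0.10 + 0.125 fits in 0.30
print(check_request(quotas["alice"], 0.10, 1, 2))    # a 2 ms period is too short
print(check_request(quotas["guest"], 0.0, 10, 100))  # 0.10 exceeds the 0.05 quota
```

Bounding the minimum period limits how often the scheduler must act on a user's reservations, which is how a quota on the period translates into a bound on scheduling overhead.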

More details can be found in the paper on the topic published at RTAS 2008. [2]

Later Evolution

The AQuoSA developers submitted the first proposed EDF patch to the Linux kernel in 2009, under the name SCHED_EDF.

Shortly afterwards, the patch was renamed SCHED_DEADLINE.

The developers released numerous subsequent versions of the patch over the following years. EDF scheduling also became the foundation of the IRMOS project, and subsequent developments were contributed to that project.

SCHED_DEADLINE continued to be maintained as an out-of-tree patch set until it was merged into the mainline kernel in version 3.14, released in 2014. It is separate from PREEMPT_RT, an independently developed patch set that targets low-latency kernel preemption rather than EDF scheduling.

Until that merge, EDF scheduling was available only to users who applied the patch and compiled their own kernel. Since then, SCHED_DEADLINE has been the mainline implementation of EDF-based scheduling in Linux, as a policy that real-time tasks must request explicitly.

In October 2023, kernel 6.6 replaced the Completely Fair Scheduler (CFS), the default scheduler for ordinary tasks since its adoption in kernel 2.6.23 in 2007, with EEVDF (Earliest Eligible Virtual Deadline First). EEVDF is a proportional-share algorithm for ordinary tasks. Despite the similar name, it is not a descendant of the EDF work described above, and it replaces neither SCHED_DEADLINE nor PREEMPT_RT.

References

  1. Palopoli, Luigi; Cucinotta, Tommaso; Marzario, Luca; Lipari, Giuseppe (April 2008). "AQuoSA - Adaptive Quality of Service Architecture". Software: Practice and Experience. 39: 1–31. CiteSeerX 10.1.1.149.8231. doi:10.1002/spe.883. hdl:11382/361474. S2CID 3056232.
  2. Cucinotta, Tommaso (2008). "Access Control for Adaptive Reservations on Multi-User Systems". 2008 IEEE Real-Time and Embedded Technology and Applications Symposium. pp. 387–396. doi:10.1109/RTAS.2008.16. ISBN 978-0-7695-3146-5. S2CID 1008365.