Developer(s) | Con Kolivas |
---|---|
Final release | 0.512 / October 3, 2016 [1] |
Written in | C |
Operating system | Linux |
License | GNU GPL |
Website | kernel |
The Brain Fuck Scheduler (BFS) is a process scheduler designed for the Linux kernel in August 2009 based on earliest eligible virtual deadline first scheduling (EEVDF), [2] as an alternative to the Completely Fair Scheduler (CFS) and the O(1) scheduler. [3] BFS was created by Con Kolivas. [4]
The objective of BFS, compared to other schedulers, is to provide a simpler algorithm that does not require adjustment of heuristics or tuning parameters to tailor performance to a specific type of computational workload. Kolivas asserted that these tunable parameters were difficult for the average user to understand, especially in terms of interactions of multiple parameters with each other, and claimed that the use of such tuning parameters could often result in improved performance in a specific targeted type of computation at the cost of worse performance in the general case. [4] BFS has been reported to improve responsiveness on Linux desktop computers with fewer than 16 cores. [5]
Shortly following its introduction, the new scheduler made headlines within the Linux community, appearing on Slashdot , with reviews in Linux Magazine and Linux Pro Magazine . [3] [6] [7] Although there have been varied reviews of improved performance and responsiveness, Con Kolivas did not intend for BFS to be integrated into the mainline kernel. [4]
When BFS was introduced in 2009, it used a doubly linked list data structure, [8] [9] but the data structure is treated like a queue. Task insertion is O(1). [9] : ln 119–120 Searching for the next task to execute is O(n) worst case. [9] : ln 209 It uses a single global run queue which all CPUs use. Tasks with higher scheduling priorities get executed first. [9] : ln 4146–4161 Tasks are ordered (or distributed) and chosen based on the virtual deadline formula in all policies except for the realtime and Isochronous priority classes.
The execution behavior is still a weighted variation of the round-robin scheduler, especially when tasks have the same priority below the Isochronous policy. [9] : ln 1193–1195, 334–335 The user-tuneable round robin interval (time slice) is 6 milliseconds by default, chosen as the minimal jitter just below what is detectable by humans. [9] : ln 306 Kolivas claimed that anything below 6 ms was pointless and anything above 300 ms for the round robin timeslice is fruitless in terms of throughput. [9] : ln 314–320 This important tuneable tailors the round robin scheduler as a trade-off between throughput and latency. [9] : ln 229–320 All tasks get the same time slice, with the exception of realtime FIFO, which is assumed to have an infinite time slice. [9] : ln 1646, 1212–1218, 4062, 3910
Kolivas explained that he chose the doubly linked list mono-runqueue over the per-CPU multi-runqueue (round robin [10] : par. 3 ) priority array [11] [10] used in his RSDL scheduler in order to make fairness easy to achieve across multiple CPUs and to remove the complexity of each runqueue in a multi-runqueue scenario having to maintain its own latencies and [task] fairness. [9] : ln 81–92 He claimed that deterministic latencies were guaranteed with BFS in his later iteration, MuQSS. [12] : ln 471–472 He also recognized a possible lock contention problem (related to the altering, removal, and creation of task node data) [12] : ln 126–144 with increasing numbers of CPUs, as well as the overhead of looking up the next task for execution. [12] : ln 472–478 MuQSS tried to resolve those problems.
Kolivas later changed the design to a skip list in the v0.480 release of BFS in 2016. [13] This altered the efficiency of the scheduler: he noted O(log n) task insertion and O(1) task lookup, with O(k), k ≤ 16, for task removal. [13] : ln 4
The virtual deadline is a point in the future: the round robin timeslice scaled by the nice level, offset by the current time (in niffy units, or nanosecond jiffies, an internal kernel time counter). [9] : ln 4023, 4063 The virtual deadline only suggests the order but does not guarantee that a task will run exactly at the future scheduled niffy. [9] : ln 161–163
First a prio ratios lookup table is created. [9] : ln 8042–8044 It is based on a recursive sequence that increases 10% each nice level. [9] : ln 161 It follows a roughly exponential (geometric) pattern if graphed, and the niced tasks are distributed as a moving squared function with 0 to 39 (corresponding to highest to lowest nice priority) as the domain and 128 to 5089 as the range. [9] : ln 177–179, 120, 184–185 The moving part comes from the t variable in the virtual deadline formula that Kolivas hinted at.
g(0) = 128
g(i) = INT( g(i−1) × 11/10 )
Index | Numerator |
---|---|
0 | 128 |
1 | 140 |
2 | 154 |
3 | 169 |
4 | 185 |
5 | 203 |
6 | 223 |
7 | 245 |
8 | 269 |
9 | 295 |
10 | 324 |
11 | 356 |
12 | 391 |
13 | 430 |
14 | 473 |
15 | 520 |
16 | 572 |
17 | 629 |
18 | 691 |
19 | 760 |
20 | 836 |
21 | 919 |
22 | 1010 |
23 | 1111 |
24 | 1222 |
25 | 1344 |
26 | 1478 |
27 | 1625 |
28 | 1787 |
29 | 1965 |
30 | 2161 |
31 | 2377 |
32 | 2614 |
33 | 2875 |
34 | 3162 |
35 | 3478 |
36 | 3825 |
37 | 4207 |
38 | 4627 |
39 | 5089 |
The task's nice-to-index mapping function f(n) maps nice levels −20...19 to indices 0...39 for use as the input to the prio ratio lookup table. This mapping function is the TASK_USER_PRIO() macro in sched.h in the kernel headers. The internal kernel implementation differs slightly, using a static priority range between 100 and 140, but users will see it as nice −20...19.
Nice | Index |
---|---|
−20 | 0 |
−19 | 1 |
−18 | 2 |
−17 | 3 |
−16 | 4 |
−15 | 5 |
−14 | 6 |
−13 | 7 |
−12 | 8 |
−11 | 9 |
−10 | 10 |
−9 | 11 |
−8 | 12 |
−7 | 13 |
−6 | 14 |
−5 | 15 |
−4 | 16 |
−3 | 17 |
−2 | 18 |
−1 | 19 |
0 | 20 |
1 | 21 |
2 | 22 |
3 | 23 |
4 | 24 |
5 | 25 |
6 | 26 |
7 | 27 |
8 | 28 |
9 | 29 |
10 | 30 |
11 | 31 |
12 | 32 |
13 | 33 |
14 | 34 |
15 | 35 |
16 | 36 |
17 | 37 |
18 | 38 |
19 | 39 |
The virtual deadline is based on this exact formula: [9] : ln 4063, 4036, 4033, 1180
T = 6
N = 1 << 20
d(n,t) = t + g(f(n)) × T × (N/128)
Here d(n,t) is the virtual deadline in u64 integer nanoseconds, as a function of nice n and the current time t in niffies; g(i) is the prio ratio table lookup as a function of index; f(n) is the task's nice-to-index mapping function; and T is the round robin timeslice in milliseconds. N is a constant of 1 millisecond in terms of nanoseconds: a latency-reducing approximation of the true millisecond-to-nanosecond conversion factor of 10^6, since Kolivas uses a base-2 constant of approximately that scale. [9] : ln 1173–1174 Smaller values of d mean that the virtual deadline is earlier, corresponding to negative nice values. Larger values of d indicate that the virtual deadline is pushed back later, corresponding to positive nice values. The formula is applied whenever the timeslice expires. [9] : ln 5087
128 in base 2 corresponds to 100 in base 10, possibly a "pseudo 100". [9] : ln 3415 Similarly, 115 in base 2 corresponds to 90 in base 10. Kolivas uses 128 for "fast shifts", [9] : ln 3846, 1648, 3525 since division by 128 is a right shift in base 2.
Nice | Virtual deadline in timeslices relative to t | Virtual deadline in exact seconds relative to t |
---|---|---|
−20 | 1.0 | 0.006 |
−19 | 1.09 | 0.006562 |
−18 | 1.2 | 0.007219 |
−17 | 1.3 | 0.007922 |
−16 | 1.4 | 0.008672 |
−15 | 1.5 | 0.009516 |
−14 | 1.7 | 0.010453 |
−13 | 1.9 | 0.011484 |
−12 | 2.1 | 0.012609 |
−11 | 2.3 | 0.013828 |
−10 | 2.5 | 0.015187 |
−9 | 2.7 | 0.016688 |
−8 | 3.0 | 0.018328 |
−7 | 3.3 | 0.020156 |
−6 | 3.6 | 0.022172 |
−5 | 4.0 | 0.024375 |
−4 | 4.4 | 0.026812 |
−3 | 4.9 | 0.029484 |
−2 | 5.3 | 0.032391 |
−1 | 5.9 | 0.035625 |
0 | 6.5 | 0.039188 |
1 | 7.1 | 0.043078 |
2 | 7.8 | 0.047344 |
3 | 8.6 | 0.052078 |
4 | 9.5 | 0.057281 |
5 | 10.5 | 0.063000 |
6 | 11.5 | 0.069281 |
7 | 12.6 | 0.076172 |
8 | 13.9 | 0.083766 |
9 | 15.3 | 0.092109 |
10 | 16.8 | 0.101297 |
11 | 18.5 | 0.111422 |
12 | 20.4 | 0.122531 |
13 | 22.4 | 0.134766 |
14 | 24.7 | 0.148219 |
15 | 27.1 | 0.163031 |
16 | 29.8 | 0.179297 |
17 | 32.8 | 0.197203 |
18 | 36.1 | 0.216891 |
19 | 39.7 | 0.238547 |
BFS uses scheduling policies to determine how much CPU time tasks may use. It uses four scheduling tiers (called scheduling policies or scheduling classes), ordered from best to worst, which determine how tasks are selected, [9] : ln 4146–4161 with the ones on top being executed first.
Each task has a special value called a prio. In the v0.462 edition (used in the -ck 4.0 kernel patchset), there are a total of 103 "priority queues" (aka prio), or allowed values that it can take. No special data structure was used as the priority queue, only the doubly linked list runqueue itself. A lower prio value means the task is more important and gets executed first.
The realtime policy was designed for realtime tasks. This policy implies that running tasks cannot be interrupted (i.e. preempted) by less important tasks (those with a numerically higher prio) or by lower priority policy tiers. Priority classes considered under the realtime policy by the scheduler are those marked SCHED_RR and SCHED_FIFO. [9] : ln 351, 1150 The scheduler treats realtime round robin (SCHED_RR) and realtime FIFO (SCHED_FIFO) differently. [9] : ln 3881–3934
The design laid out the first 100 static priority queues. [9] : ln 189
The task chosen for execution is the available task with the lowest prio value among the 100 queues, scheduled FIFO. [9] : ln 4146–4161
On forks, the process priority will be demoted to normal policy. [9] : ln 2708
On unprivileged use (i.e. non-root user) of sched_setscheduler called with a request for realtime policy class, the scheduler will demote the task to Isochronous policy. [9] : ln 350–359, 5023–5025
The Isochronous policy was designed for near realtime performance for non-root users. [9] : ln 325
The design laid out 1 priority queue that by default ran as pseudo-realtime tasks, but can be tuned as a degree of realtime. [9] : ln 1201, 346–348
The behavior of the policy allows a task to be demoted to normal policy [9] : ln 336 when it exceeds a tuneable resource handling percentage (70% by default [9] : ln 343, 432 ) of 5 seconds scaled to the number of online CPUs and the timer resolution, plus 1 tick. [9] : ln 343, 3844–3859, 1167, 338 [12] : ln 1678, 4770–4783, 734 The formula was altered in MuQSS due to the multi-runqueue design. As derived from that description:

T = (5 × F × n) + 1 (BFS)
T = (5 × F) + 1 (MuQSS, per runqueue)

where T is the total number of isochronous ticks, F is the timer frequency, and n is the number of online CPUs; a task is demoted when its isochronous tick count exceeds R% of T, where R is the tuneable resource handling percentage expressed as a whole number rather than a decimal. The timer frequency is set to 250 Hz by default and is editable in the kernel configuration, but is usually tuned to 100 Hz for servers and 1000 Hz for interactive desktops; 250 is the balanced value. Setting R to 100 made tasks behave as realtime, 0 made them not pseudo-realtime, and anything in between was pseudo-realtime. [9] : ln 346–348
The task with the earliest virtual deadline is chosen for execution, but when multiple Isochronous tasks exist, they are scheduled round robin, each allowed to run for the tuneable round robin value (6 ms by default) one after another with a fair, equal chance, without considering the nice level. [9] : ln 334
This behavior of the Isochronous policy is unique to only BFS and MuQSS and may not be implemented in other CPU schedulers. [9] : ln 324, 8484–8485 [12] : ln 1287–1288
The normal policy was designed for regular use and is the default policy. Newly created tasks are typically marked normal. [9] : ln 502
The design laid out one priority queue and tasks are chosen to be executed first based on earliest virtual deadline.
The idle priority was designed for background processes such as distributed programs and transcoders so that foreground processes or those above this scheduling policy can run uninterrupted. [9] : ln 363–368
The design laid out 1 priority queue and tasks can be promoted to normal policy automatically to prevent indefinite resource hold. [9] : ln 368
Among tasks residing in the Idle priority policy, the next task to execute is selected by the earliest virtual deadline. [9] : ln 4160–4161
Preemption can occur when a newly ready task with a higher priority policy (i.e. a more important prio) has an earlier virtual deadline than the currently running task, which will be descheduled and put at the back of the queue. [9] : ln 169–175 Descheduled means that its virtual deadline is updated. [9] : ln 165–166 A task's time gets refilled to the max round robin quantum when it has used up all its time. [9] : ln 4057–4062, 5856 If the scheduler finds a task at a higher prio with the earliest virtual deadline, that task will execute in place of the less important currently running task only if all logical CPUs (including hyperthreaded cores / SMT threads) are busy. The scheduler will delay preemption as long as possible if there are unused logical CPUs.
A task marked with the idle priority policy cannot preempt at all, not even other idle policy tasks; it instead relies on cooperative multitasking. [9] : ln 2341–2344
When the scheduler discovers a waking task on a non-unicore system, it must determine which logical CPU to run the task on. The scheduler most favors idle hyperthreaded cores (idle SMT threads) on the same CPU that the task last executed on, [9] : ln 261 then another idle core of a multicore CPU, [9] : ln 262 then other CPUs on the same NUMA node, [9] : ln 267, 263–266, 255–258 then busy hyperthreaded cores / SMT threads / logical CPUs to be preempted on the same NUMA node, [9] : ln 265–267 and finally the other (remote) NUMA node, [9] : ln 268–270 ranked on a preference list. [9] : ln 255–274 This special scan exists to minimize the latency overhead of migrating the task. [9] : ln 245, 268–272
The preemption order is similar: hyperthreaded cores / SMT units on the same multicore CPU first, then the other core in the multicore CPU, then other CPUs on the same NUMA node. [9] : ln 265–267 When scanning for a task to preempt on a remote NUMA node, the candidate is any busy thread with lower or equal prio, or a later virtual deadline, assuming that all logical CPUs (including hyperthreaded cores / SMT threads) in the machine are busy. [9] : ln 270 The scheduler has to scan for a suitable task with a lower (or possibly equal) priority policy (with a later virtual deadline if necessary) to preempt, and must avoid logical CPUs running a task with a higher priority policy, which it cannot preempt. Local preemption ranks higher than scanning for a remote idle NUMA unit. [9] : ln 265–269
When a task is involuntarily preempted while the CPU is slowed down by kernel-mediated CPU frequency scaling (the CPU frequency governor), the task is specially marked "sticky", except for tasks marked with the realtime policy. [9] : ln 2085 Being marked sticky indicates that the task still has unused time and is restricted to executing on the same CPU. [9] : ln 233–243 The task is marked sticky whenever the CPU scaling governor has scaled the CPU to a slower speed. [9] : ln 2082–2107, 8840–8848 An idled sticky task will either return to executing at full speed by chance or be rescheduled to execute on the best idle CPU that is not the CPU it last ran on. [9] : ln 2082–2086, 239–242, 2068–2079 It is not desirable to migrate the task elsewhere, but preferable to let it idle, because of the increased latency brought about by the overhead of migrating the task to another CPU or NUMA node. [9] : ln 228, 245 This sticky feature was removed in the last iteration of BFS (v0.512), corresponding to Kolivas' 4.8-ck1 patchset, and did not exist in MuQSS.
A privileged user can change the priority policy of a process with the schedtool program, [9] : ln 326, 373 or a program can do so itself. [9] : ln 336 The priority class can be manipulated at the code level with a syscall like sched_setscheduler, only available to root, [15] which schedtool uses. [16]
In a contemporary study, [5] the author compared BFS to CFS using the Linux kernel v3.6.2 and several performance-based endpoints. The purpose of this study was to evaluate the Completely Fair Scheduler (CFS) in the vanilla Linux kernel and BFS in the corresponding kernel patched with the ck1 patchset. Seven different machines were used to see whether differences exist and to what degree they scale, using performance-based metrics. The number of logical CPUs ranged from 1 to 16. These endpoints were never factors in the primary design goals of BFS. The results were encouraging.
Kernels patched with the ck1 patchset, including BFS, outperformed the vanilla kernel using CFS at nearly all the performance-based benchmarks tested, by 1% to 8%. [5] Further study with a larger test set could be conducted, but based on the small test set of 7 PCs evaluated, these increases in process queuing efficiency and speed are, on the whole, independent of CPU type (mono, dual, quad, hyperthreaded, etc.), CPU architecture (32-bit and 64-bit) and CPU multiplicity (mono or dual socket).
Moreover, on several "modern" CPUs, such as the Intel Core 2 Duo and Core i7, that represent common workstations and laptops, BFS consistently outperformed the CFS in the vanilla kernel at all benchmarks. Efficiency and speed gains were small to moderate.
BFS is the default scheduler for the following desktop Linux distributions:
Additionally, BFS has been added to an experimental branch of Google's Android development repository. [21] It was not included in the Froyo release after blind testing did not show an improved user experience. [22]
BFS has been retired in favour of MuQSS, known formally as the Multiple Queue Skiplist Scheduler, a rewritten implementation of the same concept. [23] [24] The primary author abandoned [25] work on MuQSS by the end of August 2021.
MuQSS uses a bidirectional, statically arrayed, 8-level skip list, and tasks are ordered by static priority [queues] (referring to the scheduling policy) and a virtual deadline. [12] : ln 519, 525, 537, 588, 608 8 levels were chosen to fit the array in a cacheline. [12] : ln 523 The doubly linked data structure design was chosen to speed up task removal: removing a task takes only O(1) with a doubly linked skip list, versus the original design by William Pugh, which takes O(n) worst case. [12] : ln 458
Task insertion is O(log n). [12] : ln 458 The lookup of the next task for execution is O(k), where k is the number of CPUs. [12] : ln 589–590, 603, 5125 The next task for execution is O(1) per runqueue, [12] : ln 5124 but the scheduler examines every other runqueue to maintain task fairness among CPUs, for latency or balancing (to maximize CPU usage and cache coherency on the same NUMA node over accesses across NUMA nodes), so it is ultimately O(k). [12] : ln 591–593, 497–501, 633–656 The maximum number of tasks it can handle is 64k tasks per runqueue per CPU. [12] : ln 521 It uses multiple task runqueues, in some configurations one runqueue per CPU, whereas its predecessor BFS used only one task runqueue for all CPUs.
Tasks are ordered as a gradient in the skip list such that realtime policy priority comes first and idle policy priority comes last. [12] : ln 2356–2358 Normal and idle priority policy tasks still get sorted by virtual deadline, which uses nice values. [12] : ln 2353 Realtime and Isochronous policy tasks run in FIFO order, ignoring nice values. [12] : ln 2350–2351 New tasks with the same key are placed in FIFO order, meaning that newer tasks are placed at the end of the list (i.e. the topmost node vertically), and tasks at the 0th level, at the front-bottom, are executed before those nearest the top vertically and those furthest from the head node. [12] : ln 2351–2352, 590 The key used for insertion sorting is either the static priority [12] : ln 2345, 2365 or the virtual deadline. [12] : ln 2363
The user can choose to share runqueues among multicore CPUs or to have one runqueue per logical CPU. [12] : ln 947–1006 The conjecture behind the shared-runqueue design was that it would reduce latency, at a tradeoff in throughput. [12] : ln 947–1006
A new behavior introduced by MuQSS was the use of a high resolution timer for sub-millisecond accuracy when timeslices are used up, resulting in the rescheduling of tasks. [12] : ln 618–630, 3829–3851, 3854–3865, 5316