Qsort

Last updated

qsort is a C standard library function that implements a sorting algorithm for arrays of arbitrary objects according to a user-provided comparison function. It is named after the "quicker sort" algorithm [1] (a quicksort variant due to R. S. Scowen), which was originally used to implement it in the Unix C library, although the C standard does not require it to implement quicksort. [2]

Contents

The ability to operate on different kinds of data ( polymorphism ) is achieved by taking a function pointer to a three-way comparison function, as well as a parameter that specifies the size of its individual input objects. The C standard requires the comparison function to implement a total order on the items in the input array. [3]

History

A qsort function appears in Version 2 Unix in 1972 as a library assembly language subroutine. Its interface is unlike the modern version, in that it can be pseudo-prototyped as qsort(void * start, void * end, unsigned length) – sorting contiguously-stored length-long byte strings from the range [start, end). [1] This, and the lack of a replaceable comparison function, makes it unsuitable to properly sort the system's little-endian integers, or any other data structures.

In Version 3 Unix, the interface is extended by calling compar(III), with an interface identical to modern-day memcmp. This function may be overridden by the user's program to implement any kind of ordering, in an equivalent fashion to the compar argument to standard qsort (though program-global, of course). [4]

Version 4 Unix adds a C implementation, with an interface equivalent to the standard. [5] It was rewritten in 1983 for the Berkeley Software Distribution. [2] The function was standardized in ANSI C (1989). The assembly implementation is removed in Version 6 Unix. [6]

In 1991, Bell Labs employees observed that AT&T and BSD versions of qsort would consume quadratic time for some simple inputs. Thus Jon Bentley and Douglas McIlroy engineered a new faster and more robust implementation. [2] McIlroy would later produce a more complex quadratic-time input, termed AntiQuicksort, in 1998. This function constructs adversary data on-the-fly. [7]

Example

The following piece of C code shows how to sort a list of integers using qsort.

#include<stdlib.h>/* Comparison function. Receives two generic (void) pointers to the items under comparison. */intcompare_ints(constvoid*p,constvoid*q){intx=*(constint*)p;inty=*(constint*)q;/* Avoid return x - y, which can cause undefined behaviour       because of signed integer overflow. */if(x<y)return-1;// Return -1 if you want ascending, 1 if you want descending order. elseif(x>y)return1;// Return 1 if you want ascending, -1 if you want descending order.return0;// All the logic is often alternatively written:return(x>y)-(x<y);}/* Sort an array of n integers, pointed to by a. */voidsort_ints(int*a,size_tn){qsort(a,n,sizeof(*a),compare_ints);}

Extensions

Since the comparison function of the original qsort only accepts two pointers, passing in additional parameters (e.g. producing a comparison function that compares by the two value's difference with another value) must be done using global variables. The issue was solved by the BSD and GNU Unix-like systems by an introducing qsort_r function, which allows for an additional parameter to be passed to the comparison function. The two versions of qsort_r have different argument orders. C11 Annex K defines a qsort_s essentially identical to GNU's qsort_r. The macOS and FreeBSD libcs also contain qsort_b, a variant that uses blocks, an analogue to closures, as an alternate solution to the same problem. [8]

Related Research Articles

C is a general-purpose computer programming language. It was created in the 1970s by Dennis Ritchie, and remains very widely used and influential. By design, C's features cleanly reflect the capabilities of the targeted CPUs. It has found lasting use in operating systems, device drivers, and protocol stacks, but its use in application software has been decreasing. C is commonly used on computer architectures that range from the largest supercomputers to the smallest microcontrollers and embedded systems.

Berkeley sockets is an application programming interface (API) for Internet sockets and Unix domain sockets, used for inter-process communication (IPC). It is commonly implemented as a library of linkable modules. It originated with the 4.2BSD Unix operating system, which was released in 1983.

Lex is a computer program that generates lexical analyzers. It is commonly used with the yacc parser generator and is the standard lexical analyzer generator on many Unix and Unix-like systems. An equivalent tool is specified as part of the POSIX standard.

The C programming language provides many standard library functions for file input and output. These functions make up the bulk of the C standard library header <stdio.h>. The functionality descends from a "portable I/O package" written by Mike Lesk at Bell Labs in the early 1970s, and officially became part of the Unix operating system in Version 7.

In mathematics and computer science, a higher-order function (HOF) is a function that does at least one of the following:

In computer science, a type signature or type annotation defines the inputs and outputs of a function, subroutine or method. A type signature includes the number, types, and order of the function's arguments. One important use of a type signature is for function overload resolution, where one particular definition of a function to be called is selected among many overloaded forms.

Flex is a free and open-source software alternative to lex. It is a computer program that generates lexical analyzers . It is frequently used as the lex implementation together with Berkeley Yacc parser generator on BSD-derived operating systems, or together with GNU bison in *BSD ports and in Linux distributions. Unlike Bison, flex is not part of the GNU Project and is not released under the GNU General Public License, although a manual for Flex was produced and published by the Free Software Foundation.

C dynamic memory allocation refers to performing manual memory management for dynamic memory allocation in the C programming language via a group of functions in the C standard library, namely malloc, realloc, calloc, aligned_alloc and free.

<span class="mw-page-title-main">C syntax</span> Set of rules defining correctly structured programs

The syntax of the C programming language is the set of rules governing writing of software in C. It is designed to allow for programs that are extremely terse, have a close relationship with the resulting object code, and yet provide relatively high-level data abstraction. C was the first widely successful high-level language for portable operating-system development.

In computer programming, a function object is a construct allowing an object to be invoked or called as if it were an ordinary function, usually with the same syntax. In some languages, particularly C++, function objects are often called functors.

bc, for basic calculator, is "an arbitrary-precision calculator language" with syntax similar to the C programming language. bc is typically used as either a mathematical scripting language or as an interactive mathematical shell.

In some programming languages, const is a type qualifier, which indicates that the data is read-only. While this can be used to declare constants, const in the C family of languages differs from similar constructs in other languages in that it is part of the type, and thus has complicated behavior when combined with pointers, references, composite data types, and type-checking. In other languages, the data is not in a single memory location, but copied at compile time for each use. Languages which use it include C, C++, D, JavaScript, Julia, and Rust.

The computer programming languages C and Pascal have similar times of origin, influences, and purposes. Both were used to design their own compilers early in their lifetimes. The original Pascal definition appeared in 1969 and a first compiler in 1970. The first version of C appeared in 1972.

In C programming, the functions getaddrinfo and getnameinfo convert domain names, hostnames, and IP addresses between human-readable text representations and structured binary formats for the operating system's networking API. Both functions are contained in the POSIX standard application programming interface (API).

<span class="mw-page-title-main">Quicksort</span> Divide and conquer sorting algorithm

Quicksort is an efficient, general-purpose sorting algorithm. Quicksort was developed by British computer scientist Tony Hoare in 1959 and published in 1961. It is still a commonly used algorithm for sorting. Overall, it is slightly faster than merge sort and heapsort for randomized data, particularly on larger distributions.

scanf, short for scan formatted, is a C standard library function that reads and parses text from standard input.

A class in C++ is a user-defined type or data structure declared with any of the keywords class, struct or union that has data and functions as its members whose access is governed by the three access specifiers private, protected or public. By default access to members of a C++ class declared with the keyword class is private. The private members are not accessible outside the class; they can be accessed only through member functions of the class. The public members form an interface to the class and are accessible outside the class.

sort is a generic function in the C++ Standard Library for doing comparison sorting. The function originated in the Standard Template Library (STL).

In computer programming, an anonymous function is a function definition that is not bound to an identifier. Anonymous functions are often arguments being passed to higher-order functions or used for constructing the result of a higher-order function that needs to return a function. If the function is only used once, or a limited number of times, an anonymous function may be syntactically lighter than using a named function. Anonymous functions are ubiquitous in functional programming languages and other languages with first-class functions, where they fulfil the same role for the function type as literals do for other data types.

Getopt is a C library function used to parse command-line options of the Unix/POSIX style. It is a part of the POSIX specification, and is universal to Unix-like systems. It is also the name of a Unix program for parsing command line arguments in shell scripts.

References

  1. 1 2 "UNIX Programmer's Manual, Second Edition" (PDF). Bell Telephone Laboratories. June 12, 1972. p. 193 via The Unix Heritage Society.
  2. 1 2 3 Bentley, Jon L.; McIlroy, M. Douglas (1993). "Engineering a sort function". Software: Practice and Experience. 23 (11): 1249–1265. CiteSeerX   10.1.1.14.8162 . doi:10.1002/spe.4380231105. S2CID   8822797.
  3. ISO/IEC 9899:201x, Programming Languages—C (draft). §7.22.5. November 16, 2010.
  4. "UNIX Programmer's Manual, Third Edition". Bell Telephone Laboratories. February 1973. p. qsort(III) via The Unix Heritage Society.
  5. "UNIX Programmer's Manual, Fourth Edition". Bell Telephone Laboratories. November 1973. p. qsort(III) via The Unix Heritage Society.
  6. "qsort(III), from UNIX Programmer's Manual, Sixth Edition". Unix Archive.
  7. McIlroy, M. D. (10 April 1999). "A killer adversary for quicksort" (PDF). Software: Practice and Experience. 29 (4): 341–344. doi:10.1002/(SICI)1097-024X(19990410)29:4<341::AID-SPE237>3.0.CO;2-R. S2CID   35935409.
  8. qsort_r(3)    FreeBSD Library Functions Manual