One-pass algorithm

Last updated December 13, 2023

In computing, a one-pass algorithm or single-pass algorithm is a streaming algorithm that reads its input exactly once.^[1] It processes items in order without unbounded buffering: at each step it reads a block into an input buffer, processes it, and moves the result into an output buffer.^[2] A one-pass algorithm generally requires O(n) time (see big O notation) and less than O(n) storage (typically O(1)), where n is the size of the input.^[3] An example of a one-pass algorithm is Sondik's one-pass algorithm for partially observable Markov decision processes.^[4]
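The block-at-a-time pattern described above can be sketched as follows. This is only an illustration: the function name `copy_upper`, the upper-casing step, and the default block size are assumptions chosen for the example, not part of any cited source.

```python
import io

def copy_upper(src, dst, block_size=4096):
    """Read src exactly once in fixed-size blocks and write upper-cased text to dst.

    Memory use is bounded by block_size, independent of the input length.
    """
    total = 0
    while True:
        block = src.read(block_size)  # input buffer: one block at a time
        if not block:                 # an empty read signals end of input
            break
        dst.write(block.upper())      # processed block goes to the output
        total += len(block)
    return total

# Usage with in-memory text streams standing in for real files.
src = io.StringIO("one pass over the input")
dst = io.StringIO()
chars_read = copy_upper(src, dst, block_size=8)
```

Each block is discarded after it is written, so the input is never revisited.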

Example problems solvable by one-pass algorithms

Given any list as an input:

Count the number of elements.
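As a minimal sketch, counting needs a single integer of state. The name `count_items` is hypothetical, and the generator below stands in for any input that can be read only once.

```python
def count_items(items):
    """Count the elements of any iterable in one pass with O(1) extra memory."""
    count = 0
    for _ in items:
        count += 1
    return count

# A generator can be consumed only once, like a true stream.
stream = (x * x for x in range(1000))
total = count_items(stream)
```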

Given a list of numbers:

Find the k largest or smallest elements, k given in advance.
Find the sum, mean, variance and standard deviation of the elements of the list. See also Algorithms for calculating variance.
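Both problems above can be sketched in Python. The helper names `k_largest` and `running_stats` are hypothetical. The first keeps a min-heap of at most k items. The second uses Welford's update, one of several known one-pass variance methods, and it is assumed here that the population variance (dividing by the count) is wanted.

```python
import heapq
import math

def k_largest(numbers, k):
    """Return the k largest values in descending order using O(k) memory."""
    if k <= 0:
        return []
    heap = []  # min-heap holding the k largest values seen so far
    for x in numbers:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)  # drop the smallest kept value
    return sorted(heap, reverse=True)

def running_stats(numbers):
    """Return (count, sum, mean, population variance, standard deviation)."""
    n = 0
    total = 0.0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in numbers:
        n += 1
        total += x
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n == 0:
        raise ValueError("empty input")
    variance = m2 / n
    return n, total, mean, variance, math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]
top3 = k_largest(iter(data), 3)
n, total, mean, variance, std = running_stats(iter(data))
```

For the k smallest elements, the same idea applies with the comparison reversed (for example, by pushing negated values).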

Given a list of symbols from an alphabet of k symbols, given in advance:

Count the number of times each symbol appears in the input.
Find the most or least frequent elements.
Sort the list according to some order on the symbols (possible since the number of symbols is limited).
Find the maximum gap between two appearances of a given symbol.
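The fixed-alphabet problems above share one idea: keep O(k) state, one slot per symbol. The sketch below uses hypothetical helper names, assumes the alphabet is passed in the desired sort order, and interprets "gap" as the distance in positions between consecutive appearances of the symbol.

```python
def symbol_counts(symbols, alphabet):
    """Count occurrences of each alphabet symbol in one pass."""
    counts = {s: 0 for s in alphabet}
    for s in symbols:
        counts[s] += 1
    return counts

def counting_sort(symbols, alphabet):
    """Yield the symbols in alphabet order, using only the k counters."""
    counts = symbol_counts(symbols, alphabet)
    for s in alphabet:
        for _ in range(counts[s]):
            yield s

def max_gap(symbols, target):
    """Return the largest distance between consecutive appearances of target.

    Returns None if target appears fewer than two times.
    """
    last = None  # position of the most recent appearance
    best = None
    for i, s in enumerate(symbols):
        if s == target:
            if last is not None and (best is None or i - last > best):
                best = i - last
            last = i
    return best

text = "abracadabra"
counts = symbol_counts(iter(text), "abcdr")
most_frequent = max(counts, key=counts.get)
sorted_text = "".join(counting_sort(iter(text), "abcdr"))
gap_a = max_gap(iter(text), "a")
```

The most or least frequent symbol is read off the finished counter table, which costs O(k) time after the pass.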

Example problems not solvable by one-pass algorithms

Given any list as an input:

Find the nth element from the end (or report that the list has fewer than n elements).
Find the middle element of the list. However, this is solvable with two passes: Pass 1 counts the elements and pass 2 picks out the middle one.
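The two-pass approach can be sketched as follows. It assumes the input can be read again from the start, modeled here by a hypothetical zero-argument callable `make_stream` that returns a fresh iterator on each call. For even lengths, the lower of the two middle positions is chosen, which is an arbitrary assumption.

```python
def middle_element(make_stream):
    """Return the middle element using two passes and O(1) extra memory."""
    length = sum(1 for _ in make_stream())  # pass 1: count the elements
    if length == 0:
        return None
    target = (length - 1) // 2              # lower middle for even lengths
    for i, item in enumerate(make_stream()):  # pass 2: pick out the element
        if i == target:
            return item

def nth_from_end(make_stream, n):
    """Return the nth element from the end (n = 1 is the last), or None."""
    length = sum(1 for _ in make_stream())  # pass 1: count the elements
    if n < 1 or n > length:
        return None                         # fewer than n elements
    target = length - n
    for i, item in enumerate(make_stream()):  # pass 2: pick out the element
        if i == target:
            return item

mid = middle_element(lambda: iter([10, 20, 30, 40, 50]))
second_last = nth_from_end(lambda: iter("abcde"), 2)
```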

Given a list of numbers:

Find the median.
Find the modes (this is not the same as finding the most frequent symbol from a limited alphabet).
Sort the list.
Count the number of items greater than or less than the mean. However, this can be done in constant memory with two passes: Pass 1 finds the average and pass 2 does the counting.

The two-pass algorithms above are still streaming algorithms but not one-pass algorithms.
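As an illustration of the constant-memory, two-pass count relative to the mean, the sketch below again assumes a hypothetical `make_stream` callable that returns a fresh iterator over the same input each time it is called.

```python
def count_around_mean(make_stream):
    """Return (items below the mean, items above the mean) in two passes."""
    n = 0
    total = 0
    for x in make_stream():  # pass 1: find the average
        n += 1
        total += x
    if n == 0:
        return 0, 0
    mean = total / n
    below = above = 0
    for x in make_stream():  # pass 2: do the counting
        if x > mean:
            above += 1
        elif x < mean:
            below += 1
    return below, above

below, above = count_around_mean(lambda: iter([1, 2, 3, 10]))
```

Only a handful of scalars are kept between the passes, so memory stays constant regardless of input size.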

References

1. Schweikardt, Nicole. "One-Pass Algorithm" (PDF). Retrieved 2021-07-01.
2. Pollett, Chris (2005-03-14). "One and Two Pass Algorithms" (PDF). Retrieved 2021-07-01.
3. Schweikardt, Nicole (2009). "One-Pass Algorithm". In Liu, Ling; Özsu, M. Tamer (eds.), Encyclopedia of Database Systems. Boston, MA: Springer US. pp. 1948–1949. doi:10.1007/978-0-387-39940-9_253. ISBN 978-0-387-39940-9. Retrieved 2021-04-13.
4. "Sondik's One-Pass Algorithm". www.pomdp.org.

