Rasmus Pagh

Last updated
Rasmus Pagh
Born6 February 1975 (1975-02-06) (age 48)
Denmark
Alma mater
Awards
  • WWW Best Paper Award (2014)
  • ESA Test-of-Time Award (2020)
Scientific career
Fields
Institutions
Thesis Hashing, randomness and dictionaries
Doctoral advisor Peter Bro Miltersen
Website rasmuspagh.net

Rasmus Pagh is a Danish computer scientist and a professor of computer science at the University of Copenhagen. His main work is in algorithms and data structures, and he is particularly known for the cuckoo hashing algorithm and for co-founding the Basic Algorithms Research Center, BARC, in Copenhagen.

Contents

Early life and education

Rasmus Pagh was born in Copenhagen, [1] but soon after his family moved to Esbjerg in western Denmark. He went to high school at Rødkilde Amtsgymnasium where he participated in the "JP Forsker" science competition, and in the "Georg Mohr" mathematics competition. After graduating in 1994, he went to study mathematics and computer science at Aarhus University. In 1998 he started his PhD with Peter Bro Miltersen and started writing articles about hashing and efficient dictionaries, culminating in his work on cuckoo hashing. Soon after his thesis defence was in the fall of 2002 he became an assistant professor at the recently founded IT University of Copenhagen.

Career

In 2007, Rasmus founded the Scalable Query Evaluation for Reliable Databases (SQERD) project. The project aimed at applying modern algorithmic techniques to problems arising in database management systems in connection with the evaluation of queries. From 2011-2015, he ran the MaDaMS project, which partnered with Demetra A/S, Aarhus University and Apptus AB at finding more efficient approaches to data mining. [2]

Rasmus Pagh was made full professor at ITU with his Inaugural Lecture [3] in 2013. In 2014, he received an ERC Consolidator Grant for a project on Scalable Similarity Search. [4] [5] The project resulted in many new algorithms, including a way to prevent false negatives in high dimensional search. [6] In 2017 Pagh co-founded the Basic Algorithms Research Center, BARC, in Copenhagen [7] with Mikkel Thorup, Thore Husfeldt and Stephen Alstrup. Soon thereafter he took a sabbatical to join the Simons Institute at University of California, Berkeley [8] and become a Google visiting scholar. [9]

In 2019, Rasmus Pagh became an Associate Editor of the SIAM Journal on Computing . [10]

In 2020, Rasmus Pagh received the European Symposium on Algorithms Test-of-Time award for his 2001 work on cuckoo hashing with Flemming Friche Rodler. [11]

See also

Related Research Articles

<span class="mw-page-title-main">Hash table</span> Associative array for storing key-value pairs

In computing, a hash table, also known as a hash map, is a data structure that implements an associative array, also called a dictionary, which is an abstract data type that maps keys to values. A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored.

<span class="mw-page-title-main">Perfect hash function</span> Hash function without any collisions

In computer science, a perfect hash functionh for a set S is a hash function that maps distinct elements in S to a set of m integers, with no collisions. In mathematical terms, it is an injective function.

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not – in other words, a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set, but not removed ; the more items added, the larger the probability of false positives.

In computer science, a fusion tree is a type of tree data structure that implements an associative array on w-bit integers on a finite universe, where each of the input integers has size less than 2w and is non-negative. When operating on a collection of n key–value pairs, it uses O(n) space and performs searches in O(logwn) time, which is asymptotically faster than a traditional self-balancing binary search tree, and also better than the van Emde Boas tree for large values of w. It achieves this speed by using certain constant-time operations that can be done on a machine word. Fusion trees were invented in 1990 by Michael Fredman and Dan Willard.

<span class="mw-page-title-main">Linear probing</span> Computer programming method for hashing

Linear probing is a scheme in computer programming for resolving collisions in hash tables, data structures for maintaining a collection of key–value pairs and looking up the value associated with a given key. It was invented in 1954 by Gene Amdahl, Elaine M. McGraw, and Arthur Samuel and first analyzed in 1963 by Donald Knuth.

In computer programming, primary clustering is a phenomenon that causes performance degradation in linear-probing hash tables. The phenomenon states that, as elements are added to a linear probing hash table, they have a tendency to cluster together into long runs. If the hash table is at a load factor of for some parameter , then the expected length of the run containing a given element is . This causes insertions and negative queries to take expected time in a linear-probing hash table.

<span class="mw-page-title-main">Jeffrey Vitter</span> American computer scientist

Jeffrey Scott Vitter is a U.S. computer scientist and academic administrator. Born in 1955 in New Orleans, Vitter has served in several senior higher education administration posts. He is a former chancellor of the University of Mississippi. He assumed the chancellor position on January 1, 2016. His formal investiture to the chancellorship took place on November 10, 2016, at the University of Mississippi's Oxford Campus.

<span class="mw-page-title-main">Cuckoo hashing</span> Data structure hashing scheme

Cuckoo hashing is a scheme in computer programming for resolving hash collisions of values of hash functions in a table, with worst-case constant lookup time. The name derives from the behavior of some species of cuckoo, where the cuckoo chick pushes the other eggs or young out of the nest when it hatches in a variation of the behavior referred to as brood parasitism; analogously, inserting a new key into a cuckoo hashing table may push an older key to a different location in the table.

Nearest neighbor search (NNS), as a form of proximity search, is the optimization problem of finding the point in a given set that is closest to a given point. Closeness is typically expressed in terms of a dissimilarity function: the less similar the objects, the larger the function values.

In computer science, a succinct data structure is a data structure which uses an amount of space that is "close" to the information-theoretic lower bound, but still allows for efficient query operations. The concept was originally introduced by Jacobson to encode bit vectors, (unlabeled) trees, and planar graphs. Unlike general lossless data compression algorithms, succinct data structures retain the ability to use them in-place, without decompressing them first. A related notion is that of a compressed data structure, insofar as the size of the stored or encoded data similarly depends upon the specific content of the data itself.

In computer science, locality-sensitive hashing (LSH) is a fuzzy hashing technique that hashes similar input items into the same "buckets" with high probability. Since similar items end up in the same buckets, this technique can be used for data clustering and nearest neighbor search. It differs from conventional hashing techniques in that hash collisions are maximized, not minimized. Alternatively, the technique can be seen as a way to reduce the dimensionality of high-dimensional data; high-dimensional input items can be reduced to low-dimensional versions while preserving relative distances between items.

Similarity search is the most general term used for a range of mechanisms which share the principle of searching spaces of objects where the only available comparator is the similarity between any pair of objects. This is becoming increasingly important in an age of large information repositories where the objects contained do not possess any natural order, for example large collections of images, sounds and other sophisticated digital objects.

<span class="mw-page-title-main">Kurt Mehlhorn</span> German computer scientist (born 1949)

Kurt Mehlhorn is a German theoretical computer scientist. He has been a vice president of the Max Planck Society and is director of the Max Planck Institute for Computer Science.

WADS, the Algorithms and Data Structures Symposium, is an international academic conference in the field of computer science, focusing on algorithms and data structures. WADS is held every second year, usually in Canada and always in North America. It is held in alternation with its sister conference, the Scandinavian Symposium and Workshops on Algorithm Theory (SWAT), which is usually held in Scandinavia and always in Northern Europe. Historically, the proceedings of both conferences were published by Springer Verlag through their Lecture Notes in Computer Science series. Springer continues to publish WADS proceedings, but starting in 2016, SWAT proceedings are now published by Dagstuhl through their Leibniz International Proceedings in Informatics.

Michael David Mitzenmacher is an American computer scientist working in algorithms. He is Professor of Computer Science at the Harvard John A. Paulson School of Engineering and Applied Sciences and was area dean of computer science July 2010 to June 2013. He also runs My Biased Coin, a blog about theoretical computer science.

In computer science, a family of hash functions is said to be k-independent, k-wise independent or k-universal if selecting a function at random from the family guarantees that the hash codes of any designated k keys are independent random variables. Such families allow good average case performance in randomized algorithms or data structures, even if the input data is chosen by an adversary. The trade-offs between the degree of independence and the efficiency of evaluating the hash function are well studied, and many k-independent families have been proposed.

In computer science, tabulation hashing is a method for constructing universal families of hash functions by combining table lookup with exclusive or operations. It was first studied in the form of Zobrist hashing for computer games; later work by Carter and Wegman extended this method to arbitrary fixed-length keys. Generalizations of tabulation hashing have also been developed that can handle variable-length keys such as text strings.

Mikkel Thorup is a Danish computer scientist working at University of Copenhagen. He completed his undergraduate education at Technical University of Denmark and his doctoral studies at Oxford University in 1993. From 1993 to 1998, he was at University of Copenhagen and from 1998 to 2013 he was at AT&T Labs-Research in New Jersey. Since 2013 he has been at the University of Copenhagen as a Professor and Head of Center for Efficient Algorithms and Data Structures (EADS).

A cuckoo filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set, like a Bloom filter does. False positive matches are possible, but false negatives are not – in other words, a query returns either "possibly in set" or "definitely not in set". A cuckoo filter can also delete existing items, which is not supported by Bloom filters. In addition, for applications that store many items and target moderately low false positive rates, cuckoo filters can achieve lower space overhead than space-optimized Bloom filters.

In computer science, a retrieval data structure, also known as static function, is a space-efficient dictionary-like data type composed of a collection of pairs that allows the following operations:

References

  1. "About Rasmus". www.itu.dk.
  2. "MaDaMS". sites.google.com.
  3. https://www.itu.dk/people/pagh/papers/inaugural-lecture.pdf [ bare URL PDF ]
  4. "On a mission to save search engines". 2014-03-12.
  5. "Scalable Similarity Search".
  6. "Søgning efter sorte huller".
  7. "NYT københavnsk kraftcenter inden for algoritmeforskning". 2017-03-20.
  8. "Rasmus Pagh | Simons Institute for the Theory of Computing". 27 February 2018.
  9. "Rasmus Pagh (@RasmusPagh1) | Twitter". twitter.com.
  10. "SICOMP | Editorial Board | SIAM". www.siam.org.
  11. "ESA - European Symposium on Algorithms: ESA Test-of-Time Award 2020". European Symposia on Algorithms. Retrieved 2021-05-22.