Uzi Vishkin

Last updated
Uzi Vishkin
Born1953
Alma mater Hebrew University
Technion
Scientific career
Fields parallel algorithms
Institutions IBM Thomas J. Watson Research Center
New York University
Tel Aviv University
University of Maryland, College Park
Doctoral advisor Yossi Shiloach

Uzi Vishkin (born 1953) is a computer scientist at the University of Maryland, College Park, where he is Professor of Electrical and Computer Engineering at the University of Maryland Institute for Advanced Computer Studies (UMIACS). Uzi Vishkin is known for his work in the field of parallel computing. In 1996, he was inducted as a Fellow of the Association for Computing Machinery, with the following citation: "One of the pioneers of parallel algorithms research, Dr. Vishkin's seminal contributions played a leading role in forming and shaping what thinking in parallel has come to mean in the fundamental theory of Computer Science." [1]

Contents

Biography

Uzi Vishkin was born in Tel Aviv, Israel. He completed his B.Sc. (1974) and M.Sc. in Mathematics at the Hebrew University, before earning his D.Sc. in Computer Science at the Technion (1981). He then spent a year working at the IBM Thomas J. Watson Research Center in Yorktown Heights, New York. From 1982 to 1984, he worked at the department of computer science at New York University and remained affiliated with it till 1988. From 1984 until 1997 he worked in the computer science department of Tel Aviv University, serving as its chair from 1987 to 1988. Since 1988 he is with the University of Maryland, College Park.

PRAM-on-chip

A notable rudimentary abstraction—that any single instruction available for execution in a serial program executes immediately—made serial computing simple. A consequence of this abstraction is a step-by-step (inductive) explication of the instruction available next for execution. The rudimentary parallel abstraction behind the PRAM-on-chip concept, dubbed Immediate Concurrent Execution (ICE) in Vishkin (2011), is that indefinitely many instructions available for concurrent execution execute immediately. A consequence of ICE is a step-by-step (inductive) explication (also known as lock-step) of the instructions available next for concurrent execution. Moving beyond the serial von Neumann computer (the only successful general purpose platform to date), the aspiration of the PRAM-on-chip concept is that computer science will again be able to augment mathematical induction with a simple one-line computing abstraction. A chronological overview of the evolution of the PRAM-on-chip concept and its hardware and software prototyping follow. In the 1980s and 1990s, Uzi Vishkin co-authored several articles that helped building a theory of parallel algorithms in a mathematical model called parallel random access machine (PRAM), which is a generalization for parallel computing of the standard serial computing model random-access machine (RAM). The parallel machines needed for implementing the PRAM model have not yet been built at the time, and quite a few challenged the ability to ever build such machines. Concluding in 1997 [2] that the transistor count on chip as implied by Moore's Law will allow building a powerful parallel computer on a single silicon chip within a decade, he developed a PRAM-On-Chip vision that called for building a parallel computer on a single chip that allows programmers to develop their algorithms for the PRAM model. He went on to invent the explicit multi-threaded (XMT) computer architecture that enables implementation of this PRAM theory, and led his research team to completing in January 2007 a 64-processor computer [3] named Paraleap, [4] that demonstrates the overall concept. The XMT concept was presented in Vishkin et al. (1998), Naishlos et al. (2003), the XMT 64-processor computer in Wen & Vishkin (2008), in Vishkin (2011) and most recently in Ghanim, Vishkin & Barua (2018), where it was shown that lock-step parallel programming (using ICE) can achieve the same performance as the fastest hand-tuned multi-threaded code on XMT systems. Such inductive lock-step approach stands in contrast to multi-threaded programming approaches of other many core systems that are known for challenging programmers. The demonstration of XMT comprised several hardware and software components, as well as teaching PRAM algorithms in order to program the XMT Paraleap, using a language called XMTC. Since making parallel programming easy is one of the biggest challenges facing computer science today, the demonstration also sought to include teaching the basics of PRAM algorithms and XMTC programming to students ranging from high-school to graduate school.

Parallel algorithms

In the field of parallel algorithms, Uzi Vishkin co-authored the paper Shiloach & Vishkin (1982b) that contributed the work-time (WT) (sometimes called work-depth) framework for conceptualizing and describing parallel algorithms. The WT framework was adopted as the basic presentation framework in the parallel algorithms books JaJa (1992) and Keller, Kessler & Traeff (2001), as well as in the class notes Vishkin (2009). In the WT framework, a parallel algorithm is first described in terms of parallel rounds. For each round, the operations to be performed are characterized, but several issues can be suppressed. For example, the number of operations at each round need not be clear, processors need not be mentioned and any information that may help with the assignment of processors to jobs need not be accounted for. Second, the suppressed information is provided. The inclusion of the suppressed information is, in fact, guided by the proof of a scheduling theorem due to Brent (1974). The WT framework is useful since while it can greatly simplify the initial description of a parallel algorithm, inserting the details suppressed by that initial description is often not very difficult. Similarly, first casting an algorithm in the WT framework can be very helpful for programming it in XMTC. Vishkin (2011) explains the simple connection between the WT framework and the more rudimentary ICE abstraction noted above.

In the field of parallel and distributed algorithms, one of the seminal papers co-authored by Uzi Vishkin is Cole & Vishkin (1986). This work introduced an efficient parallel technique for graph coloring. The Cole–Vishkin algorithm finds a vertex colouring in an n-cycle in O(log* n) synchronous communication rounds. This algorithm is nowadays presented in many textbooks, including Introduction to Algorithms by Cormen et al., [5] and it forms the basis of many other distributed algorithms for graph colouring. [6]

Other contributions by Uzi Vishkin and various co-authors include parallel algorithms for list ranking, lowest common ancestor, spanning trees, and biconnected components.

Selected publications

Notes

  1. ACM: Fellows Award / Uzi Vishkin.
  2. Vishkin, Uzi. Spawn-join instruction set architecture for providing explicit multithreading. U.S. Patent 6,463,527. See also Vishkin et al. (1998).
  3. University of Maryland, press release, June 26, 2007: "Maryland Professor Creates Desktop Supercomputer" Archived 2009-12-14 at the Wayback Machine .
  4. University of Maryland, A. James Clark School of Engineering, press release, November 28, 2007: "Next Big "Leap" in Computing Technology Gets a Name".
  5. 1st ed., Section 30.5.
  6. See, e.g., Goldberg, Plotkin & Shannon (1988).

Related Research Articles

A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another. Distributed computing is a field of computer science that studies distributed systems.

Very long instruction word (VLIW) refers to instruction set architectures that are designed to exploit instruction level parallelism (ILP). A VLIW processor allows programs to explicitly specify instructions to execute in parallel, whereas conventional central processing units (CPUs) mostly allow programs to specify instructions to execute in sequence only. VLIW is intended to allow higher performance without the complexity inherent in some other designs.

<span class="mw-page-title-main">Parallel computing</span> Programming paradigm in which many processes are executed simultaneously

Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has long been employed in high-performance computing, but has gained broader interest due to the physical constraints preventing frequency scaling. As power consumption by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.

In computer science, a parallel algorithm, as opposed to a traditional serial algorithm, is an algorithm which can do multiple operations in a given time. It has been a tradition of computer science to describe serial algorithms in abstract machine models, often the one known as random-access machine. Similarly, many computer science researchers have used a so-called parallel random-access machine (PRAM) as a parallel abstract machine (shared-memory).

<span class="mw-page-title-main">Concurrency (computer science)</span> Ability to execute a task in a non-serial manner

In computer science, concurrency is the ability of different parts or units of a program, algorithm, or problem to be executed out-of-order or in partial order, without affecting the outcome. This allows for parallel execution of the concurrent units, which can significantly improve overall speed of the execution in multi-processor and multi-core systems. In more technical terms, concurrency refers to the decomposability of a program, algorithm, or problem into order-independent or partially-ordered components or units of computation.

In computer science, a parallel random-access machine is a shared-memory abstract machine. As its name indicates, the PRAM is intended as the parallel-computing analogy to the random-access machine (RAM). In the same way that the RAM is used by sequential-algorithm designers to model algorithmic performance, the PRAM is used by parallel-algorithm designers to model parallel algorithmic performance. Similar to the way in which the RAM model neglects practical issues, such as access time to cache memory versus main memory, the PRAM model neglects such issues as synchronization and communication, but provides any (problem-size-dependent) number of processors. Algorithm cost, for instance, is estimated using two parameters O(time) and O(time × processor_number).

<span class="mw-page-title-main">Hardware acceleration</span> Specialized computer hardware

Hardware acceleration is the use of computer hardware designed to perform specific functions more efficiently when compared to software running on a general-purpose central processing unit (CPU). Any transformation of data that can be calculated in software running on a generic CPU can also be calculated in custom-made hardware, or in some mix of both.

In computing, a parallel programming model is an abstraction of parallel computer architecture, with which it is convenient to express algorithms and their composition in programs. The value of a programming model can be judged on its generality: how well a range of different problems can be expressed for a variety of different architectures, and its performance: how efficiently the compiled programs can execute. The implementation of a parallel programming model can take the form of a library invoked from a sequential language, as an extension to an existing language, or as an entirely new language.

In parallel algorithms, the list ranking problem involves determining the position, or rank, of each item in a linked list. That is, the first item in the list should be assigned the number 1, the second item in the list should be assigned the number 2, etc. Although it is straightforward to solve this problem efficiently on a sequential computer, by traversing the list in order, it is more complicated to solve in parallel. As Anderson & Miller (1990) wrote, the problem was viewed as important in the parallel algorithms community both for its many applications and because solving it led to many important ideas that could be applied in parallel algorithms more generally.

In computer science, the prefix sum, cumulative sum, inclusive scan, or simply scan of a sequence of numbers x0, x1, x2, ... is a second sequence of numbers y0, y1, y2, ..., the sums of prefixes of the input sequence:

Thread Level Speculation (TLS), also known as Speculative Multi-threading, or Speculative Parallelization, is a technique to speculatively execute a section of computer code that is anticipated to be executed later in parallel with the normal execution on a separate independent thread. Such a speculative thread may need to make assumptions about the values of input variables. If these prove to be invalid, then the portions of the speculative thread that rely on these input variables will need to be discarded and squashed. If the assumptions are correct the program can complete in a shorter time provided the thread was able to be scheduled efficiently.

<span class="mw-page-title-main">Data parallelism</span> Parallelization across multiple processors in parallel computing environments

Data parallelism is parallelization across multiple processors in parallel computing environments. It focuses on distributing the data across different nodes, which operate on the data in parallel. It can be applied on regular data structures like arrays and matrices by working on each element in parallel. It contrasts to task parallelism as another form of parallelism.

XMTC is a shared-memory parallel programming language. It is an extension of the C programming language which strives to enable easy PRAM-like programming based on the explicit multi-threading paradigm. It is developed as part of the XMT PRAM-On-Chip vision by a research team at the University of Maryland, College Park, led by Dr. Uzi Vishkin.

<span class="mw-page-title-main">Kurt Mehlhorn</span> German computer scientist (born 1949)

Kurt Mehlhorn is a German theoretical computer scientist. He has been a vice president of the Max Planck Society and is director of the Max Planck Institute for Computer Science.

Explicit Multi-Threading (XMT) is a computer science paradigm for building and programming parallel computers designed around the parallel random-access machine (PRAM) parallel computational model. A more direct explanation of XMT starts with the rudimentary abstraction that made serial computing simple: that any single instruction available for execution in a serial program executes immediately. A consequence of this abstraction is a step-by-step (inductive) explication of the instruction available next for execution. The rudimentary parallel abstraction behind XMT, dubbed Immediate Concurrent Execution (ICE) in Vishkin (2011), is that indefinitely many instructions available for concurrent execution execute immediately. A consequence of ICE is a step-by-step (inductive) explication of the instructions available next for concurrent execution. Moving beyond the serial von Neumann computer, the aspiration of XMT is that computer science will again be able to augment mathematical induction with a simple one-line computing abstraction.

In computing, algorithmic skeletons, or parallelism patterns, are a high-level parallel programming model for parallel and distributed computing.

In parallel computing, work stealing is a scheduling strategy for multithreaded computer programs. It solves the problem of executing a dynamically multithreaded computation, one that can "spawn" new threads of execution, on a statically multithreaded computer, with a fixed number of processors. It does so efficiently in terms of execution time, memory usage, and inter-processor communication.

Pointer jumping or path doubling is a design technique for parallel algorithms that operate on pointer structures, such as linked lists and directed graphs. Pointer jumping allows an algorithm to follow paths with a time complexity that is logarithmic with respect to the length of the longest path. It does this by "jumping" to the end of the path computed by neighbors.

In computer science, the analysis of parallel algorithms is the process of finding the computational complexity of algorithms executed in parallel – the amount of time, storage, or other resources needed to execute them. In many respects, analysis of parallel algorithms is similar to the analysis of sequential algorithms, but is generally more involved because one must reason about the behavior of multiple cooperating threads of execution. One of the primary goals of parallel analysis is to understand how a parallel algorithm's use of resources changes as the number of processors is changed.

Richard J. Cole is a Silver Professor of Computer Science at the Courant Institute of Mathematical Sciences, New York University, and works on the Design and Analysis of Computer Algorithms.

References

This survey paper cites 16 papers co-authored by Vishkin

Cites 36 papers co-authored by Vishkin

This survey paper cites 20 papers co-authored by Vishkin

Cites 19 papers co-authored by Vishkin