Richard Vuduc

Richard Vuduc is an American computer scientist in the College of Computing at the Georgia Institute of Technology, known for his work on automatic performance tuning (autotuning) of sparse matrix kernels and on high-performance computing. A baseline version of the sparse matrix-vector kernel that several of the papers below optimize is sketched after the publication list.

Selected publications

  • Williams, Samuel; Oliker, Leonid; Vuduc, Richard; Shalf, John; Yelick, Katherine; Demmel, James (2007). "Optimization of sparse matrix-vector multiplication on emerging multicore platforms". Proceedings of the 2007 ACM/IEEE Conference on Supercomputing - SC '07. p. 1. doi:10.1145/1362622.1362674. ISBN 9781595937643. S2CID 1845814.
  • Vuduc, Richard; Demmel, James W.; Yelick, Katherine A. (2005). "OSKI: A library of automatically tuned sparse matrix kernels". Journal of Physics: Conference Series. 16 (1): 521. Bibcode:2005JPhCS..16..521V. doi:10.1088/1742-6596/16/1/071. ISSN 1742-6596.
  • Vuduc, Richard. "Model-driven autotuning of sparse matrix-vector multiply on GPUs". ACM SIGPLAN Notices.
  • Im, Eun-Jin; Yelick, Katherine; Vuduc, Richard (February 2004). "Sparsity: Optimization Framework for Sparse Matrix Kernels". Int. J. High Perform. Comput. Appl. 18 (1): 135–158. CiteSeerX 10.1.1.137.5844. doi:10.1177/1094342004041296. ISSN 1094-3420. S2CID 2447843.
  • Vuduc, Richard Wilson (2003). Automatic Performance Tuning of Sparse Matrix Kernels (Thesis). University of California, Berkeley.
  • Demmel, J.; Dongarra, J.; Eijkhout, V.; Fuentes, E.; Petitet, A.; Vuduc, R.; Whaley, R. C.; Yelick, K. (February 2005). "Self-Adapting Linear Algebra Algorithms and Software". Proceedings of the IEEE. 93 (2): 293–312. CiteSeerX 10.1.1.108.7568. doi:10.1109/JPROC.2004.840848. ISSN 0018-9219. S2CID 3065125.
  • Vuduc, Richard; Demmel, James W.; Yelick, Katherine A.; Kamil, Shoaib; Nishtala, Rajesh; Lee, Benjamin (2002). "Performance Optimizations and Bounds for Sparse Matrix-vector Multiply". Proceedings of the 2002 ACM/IEEE Conference on Supercomputing. SC '02. Los Alamitos, CA, USA: IEEE Computer Society Press. pp. 1–35.
  • Lashuk, Ilya; Chandramowlishwaran, Aparna; Langston, Harper; Nguyen, Tuan-Anh; Sampath, Rahul; Shringarpure, Aashay; Vuduc, Richard; Ying, Lexing; Zorin, Denis (May 2012). "A Massively Parallel Adaptive Fast Multipole Method on Heterogeneous Architectures". Communications of the ACM. 55 (5): 101–109. doi:10.1145/2160718.2160740. ISSN 0001-0782. S2CID 2272736.
  • Rahimian, Abtin; Lashuk, Ilya; Veerapaneni, Shravan; Chandramowlishwaran, Aparna; Malhotra, Dhairya; Moon, Logan; Sampath, Rahul; Shringarpure, Aashay; Vetter, Jeffrey; Vuduc, Richard; Zorin, Denis; Biros, George (2010). "Petascale Direct Numerical Simulation of Blood Flow on 200K Cores and Heterogeneous Architectures". 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis. pp. 1–11. doi:10.1109/SC.2010.42. ISBN 9781424475599. S2CID 5490197.
  • Sim, Jaewoong; Dasgupta, Aniruddha; Kim, Hyesoon; Vuduc, Richard (2012). "A performance analysis framework for identifying potential benefits in GPGPU applications". Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP '12. p. 11. CiteSeerX 10.1.1.226.3542. doi:10.1145/2145816.2145819. ISBN 9781450311601. S2CID 6817445.
  • Vuduc, Richard; Chandramowlishwaran, Aparna; Choi, Jee; Guney, Murat; Shringarpure, Aashay (2010). "On the Limits of GPU Acceleration". Proceedings of the 2nd USENIX Conference on Hot Topics in Parallelism. HotPar'10. Berkeley, CA, USA: USENIX Association. p. 13.
  • Vuduc, Richard W.; Moon, Hyun-Jin (2005). "Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure". High Performance Computing and Communications. Lecture Notes in Computer Science. Vol. 3726. Berlin, Heidelberg: Springer-Verlag. pp. 807–816. doi:10.1007/11557654_91. ISBN 978-3540290315.
  • Park, Sangmin; Vuduc, Richard W.; Harrold, Mary Jean (2010). "Falcon". Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - ICSE '10. Vol. 1. p. 245. doi:10.1145/1806799.1806838. ISBN 9781605587196. S2CID 8744239.
  • Vuduc, Richard; Demmel, James W.; Bilmes, Jeff A. (February 2004). "Statistical Models for Empirical Search-Based Performance Tuning". The International Journal of High Performance Computing Applications. 18 (1): 65–94. CiteSeerX 10.1.1.64.5699. doi:10.1177/1094342004041293. ISSN 1094-3420. S2CID 2563412.
  • Yi, Qing; Seymour, Keith; You, Haihang; Vuduc, Richard; Quinlan, Dan (2007). "POET: Parameterized Optimizations for Empirical Tuning". 2007 IEEE International Parallel and Distributed Processing Symposium.
  • Chandramowlishwaran, A.; Knobe, K.; Vuduc, R. (April 2010). "Performance evaluation of concurrent collections on high-performance multicore computing systems". 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS). pp. 1–12. CiteSeerX 10.1.1.169.5643. doi:10.1109/IPDPS.2010.5470404. ISBN 978-1-4244-6442-5. S2CID 1133093.
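
    Several of the papers above concern sparse matrix-vector multiply (SpMV). As a point of reference, here is a minimal baseline kernel for the common compressed sparse row (CSR) format; the array names (rowptr, colind, vals) are illustrative conventions, not identifiers from any cited library.

        #include <stddef.h>

        /* y = A*x for an m-row sparse matrix A stored in CSR format.
         * Row i's nonzeros occupy vals[rowptr[i] .. rowptr[i+1]-1],
         * with matching column indices in colind; rowptr has m+1 entries. */
        void spmv_csr(size_t m, const size_t *rowptr, const size_t *colind,
                      const double *vals, const double *x, double *y)
        {
            for (size_t i = 0; i < m; i++) {
                double sum = 0.0;
                for (size_t k = rowptr[i]; k < rowptr[i + 1]; k++)
                    sum += vals[k] * x[colind[k]];   /* indexed (gather) read of x */
                y[i] = sum;
            }
        }

    Autotuners such as OSKI and Sparsity start from a kernel of this shape and specialize it to the matrix and machine, for example by reorganizing nonzeros into small dense register blocks so the inner loop reuses entries of x.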
    Related Research Articles

    Supercomputer

    A supercomputer is a computer with a high level of performance as compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second (FLOPS) instead of million instructions per second (MIPS). Since 2017, supercomputers have existed which can perform over 10^17 FLOPS (a hundred quadrillion FLOPS, 100 petaFLOPS or 100 PFLOPS). For comparison, a desktop computer has performance in the range of hundreds of gigaFLOPS (10^11) to tens of teraFLOPS (10^13). Since November 2017, all of the world's fastest 500 supercomputers have run Linux-based operating systems. Additional research is being conducted in the United States, the European Union, Taiwan, Japan, and China to build faster, more powerful and technologically superior exascale supercomputers.
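
    Since performance here is quoted in FLOPS, a rough way to appreciate the scale is to time a known number of floating-point operations. The sketch below (plain C with POSIX timing, a toy microbenchmark rather than an accepted benchmark) estimates single-core performance, typically a few GFLOPS, which makes the gap to a 10^17-FLOPS machine concrete.

        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>

        /* Toy single-core FLOPS estimate: perform 2*n floating-point
         * operations (one multiply and one add per element) and divide
         * by the elapsed wall-clock time. */
        int main(void)
        {
            const size_t n = 50 * 1000 * 1000;
            double *x = malloc(n * sizeof *x), *y = malloc(n * sizeof *y);
            if (!x || !y) return 1;
            for (size_t i = 0; i < n; i++) { x[i] = 1.0; y[i] = 2.0; }

            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (size_t i = 0; i < n; i++)
                y[i] += 3.0 * x[i];                  /* 2 flops per iteration */
            clock_gettime(CLOCK_MONOTONIC, &t1);

            double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
            /* print a result so the compiler cannot discard the loop */
            printf("y[0] = %g, ~%.2f GFLOPS\n", y[0], 2.0 * n / sec / 1e9);
            free(x); free(y);
            return 0;
        }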

    General-purpose computing on graphics processing units is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the central processing unit (CPU). The use of multiple video cards in one computer, or large numbers of graphics chips, further parallelizes the already parallel nature of graphics processing.

    SUPER-UX was a version of the Unix operating system from NEC that was used on its SX series of supercomputers.

    Split-C is a parallel extension of the C programming language. The Split-C project website describes Split-C as:

    a parallel extension of the C programming language that supports efficient access to a global address space on current distributed memory multiprocessors. It retains the "small language" character of C and supports careful engineering and optimization of programs by providing a simple, predictable cost model.

    Thread Level Speculation (TLS), also known as Speculative Multi-threading, or Speculative Parallelization, is a technique to speculatively execute a section of computer code that is anticipated to be executed later in parallel with the normal execution on a separate independent thread. Such a speculative thread may need to make assumptions about the values of input variables. If these prove to be invalid, then the portions of the speculative thread that rely on these input variables will need to be discarded and squashed. If the assumptions are correct the program can complete in a shorter time provided the thread was able to be scheduled efficiently.
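
    As a concrete illustration of the commit-or-squash logic described above, the pthreads sketch below speculatively runs a second pipeline stage on a predicted input while the real input is still being computed. The function names (stage1, stage2) and the single hard-coded prediction are invented for this toy example; a real TLS system performs this in hardware or in a compiler-managed runtime rather than with explicit threads. Compile with -lpthread.

        #include <pthread.h>
        #include <stdio.h>

        static long stage1(void) { return 1000; }   /* the "earlier" computation */

        static long stage2(long v)                  /* the stage run speculatively */
        {
            long r = 0;
            for (long i = 0; i < v; i++) r += i;
            return r;
        }

        static long predicted = 1000;   /* assumed value of stage1's output */
        static long spec_result;        /* written only by the speculative thread */

        static void *speculate(void *arg)
        {
            (void)arg;
            spec_result = stage2(predicted);   /* run ahead on the assumption */
            return NULL;
        }

        int main(void)
        {
            pthread_t t;
            pthread_create(&t, NULL, speculate, NULL);  /* start speculating */

            long actual = stage1();     /* real input, computed concurrently */
            pthread_join(t, NULL);

            long result = (actual == predicted)
                        ? spec_result       /* prediction held: commit */
                        : stage2(actual);   /* mis-speculation: squash, redo */
            printf("result = %ld\n", result);
            return 0;
        }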

    In computer science, partitioned global address space (PGAS) is a parallel programming model. PGAS is typified by communication operations involving a global memory address space abstraction that is logically partitioned, where a portion is local to each process, thread, or processing element. The novelty of PGAS is that the portions of the shared memory space may have an affinity for a particular process, thereby exploiting locality of reference in order to improve performance. A PGAS memory model is featured in various parallel programming languages and libraries, including: Coarray Fortran, Unified Parallel C, Split-C, Fortress, Chapel, X10, UPC++, Coarray C++, Global Arrays, DASH and SHMEM. The PGAS paradigm is now an integrated part of the Fortran language, as of Fortran 2008 which standardized coarrays.
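
    The affinity idea can be made concrete with a toy address map (a sketch only, emulating in one process what runtimes like UPC implement across nodes): a global index resolves to an owning processing element plus an offset into that owner's local slice, and accesses to one's own slice avoid communication.

        #include <stdio.h>

        /* Toy model of a PGAS-style address map, not any real PGAS runtime:
         * a global array of N elements is block-partitioned across P
         * processing elements. Accesses where owner(g) == my_rank are
         * cheap local loads/stores; anything else would require a remote
         * put/get in a real system. */

        enum { N = 16, P = 4, BLOCK = N / P };

        static int owner(int gidx)  { return gidx / BLOCK; }  /* affinity */
        static int offset(int gidx) { return gidx % BLOCK; }

        int main(void)
        {
            double local[P][BLOCK] = {{0}};   /* stand-in for per-PE memory */
            int my_rank = 1;

            for (int g = 0; g < N; g++) {
                if (owner(g) == my_rank)
                    local[my_rank][offset(g)] = g;   /* local access */
                /* else: a real PGAS system would issue remote communication */
            }
            printf("PE %d owns global indices %d..%d\n",
                   my_rank, my_rank * BLOCK, (my_rank + 1) * BLOCK - 1);
            return 0;
        }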

    In mathematics, a graph partition is the reduction of a graph to a smaller graph by partitioning its set of nodes into mutually exclusive groups. Edges of the original graph that cross between the groups will produce edges in the partitioned graph. If the number of resulting edges is small compared to the original graph, then the partitioned graph may be better suited for analysis and problem-solving than the original. Finding a partition that simplifies graph analysis is a hard problem, but one that has applications to scientific computing, VLSI circuit design, and task scheduling in multiprocessor computers, among others. Recently, the graph partition problem has gained importance due to its application for clustering and detection of cliques in social, pathological and biological networks. For a survey on recent trends in computational methods and applications see Buluc et al. (2013). Two common examples of graph partitioning are minimum cut and maximum cut problems.
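
    For a minimal worked example of the quantity partitioners try to minimize, the sketch below counts the edge cut of a two-way partition of a small toy graph; the edge list and assignment are invented for illustration.

        #include <stdio.h>

        /* Count the edge cut of a two-way partition: given an edge list
         * and a part[] assignment for each vertex, the cut is the number
         * of edges whose endpoints land in different groups. Partitioners
         * minimize this while keeping the groups balanced. */
        int main(void)
        {
            int edges[][2] = { {0,1}, {1,2}, {2,3}, {3,0}, {1,3} };
            int part[4]    = { 0, 0, 1, 1 };   /* vertices {0,1} vs {2,3} */
            int nedges = sizeof edges / sizeof edges[0];

            int cut = 0;
            for (int e = 0; e < nedges; e++)
                if (part[edges[e][0]] != part[edges[e][1]])
                    cut++;
            printf("cut size = %d\n", cut);    /* edges 1-2, 3-0, 1-3 -> 3 */
            return 0;
        }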

    Concurrent Collections is a programming model for software frameworks to expose parallelism in applications. Concurrent Collections grew out of work on tagged stream processing in HP's TStreams project.

    Reverse computation is a software application of the concept of reversible computing.

    Gather/scatter is a type of memory addressing that at once collects (gathers) data from, or stores (scatters) data to, multiple arbitrary indices. Examples of its use include sparse linear algebra operations, sorting algorithms, fast Fourier transforms, and some computational graph theory problems. It is the vector equivalent of register indirect addressing, with gather involving indexed reads, and scatter, indexed writes. Vector processors have hardware support for gather and scatter operations, as do many input/output systems, allowing large data sets to be transferred to main memory more rapidly.
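
    Written as scalar loops in C, the two patterns look as follows; vector hardware provides instructions that perform each loop body for many indices at once. This is a generic sketch, not any particular instruction set's intrinsics.

        #include <stddef.h>

        /* Gather: indexed reads from arbitrary locations of src. */
        void gather(double *dst, const double *src, const size_t *idx, size_t n)
        {
            for (size_t i = 0; i < n; i++)
                dst[i] = src[idx[i]];      /* read from arbitrary indices */
        }

        /* Scatter: indexed writes to arbitrary locations of dst. */
        void scatter(double *dst, const double *src, const size_t *idx, size_t n)
        {
            for (size_t i = 0; i < n; i++)
                dst[idx[i]] = src[i];      /* write to arbitrary indices */
        }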

    SC, the International Conference for High Performance Computing, Networking, Storage and Analysis, is an annual conference established in 1988 by the Association for Computing Machinery and the IEEE Computer Society. In 2019, about 13,950 people participated overall; after the dip of the pandemic years, attendance had rebounded to 11,830 in-person and online participants by 2022. The not-for-profit conference is run by a committee of approximately 600 volunteers, who spend roughly three years organizing each conference.

    James Demmel

    James Weldon Demmel Jr. is an American mathematician and computer scientist, the Dr. Richard Carl Dehmel Distinguished Professor of Mathematics and Computer Science at the University of California, Berkeley.

    Tachyon (software)

    Tachyon is parallel/multiprocessor ray tracing software. It is a parallel ray tracing library for use on distributed memory parallel computers, shared memory computers, and clusters of workstations. Tachyon implements rendering features such as ambient occlusion lighting, depth-of-field focal blur, shadows, reflections, and others. It was originally developed for the Intel iPSC/860 by John Stone for his M.S. thesis at the University of Missouri-Rolla. Tachyon subsequently became a more functional and complete ray tracing engine, and it is now incorporated into a number of other open source software packages such as VMD and SageMath. Tachyon is released under a permissive license.

    The LINPACK Benchmarks are a measure of a system's floating-point computing power. Introduced by Jack Dongarra, they measure how fast a computer solves a dense n by n system of linear equations Ax = b, which is a common task in engineering.
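
    By convention, the benchmark credits an LU-based solve with 2/3·n^3 + 2·n^2 floating-point operations and divides that count by the measured time. The snippet below just evaluates this formula on made-up inputs to show the arithmetic; it is not a benchmark run.

        #include <stdio.h>

        /* LINPACK/HPL flop-count convention: solving a dense n-by-n
         * system via LU factorization is credited with 2/3*n^3 + 2*n^2
         * operations, so reported performance is that count divided by
         * wall-clock time. The inputs here are hypothetical. */
        int main(void)
        {
            double n = 100000.0;       /* hypothetical problem size */
            double seconds = 3600.0;   /* hypothetical solve time   */
            double flops = (2.0 / 3.0) * n * n * n + 2.0 * n * n;
            printf("%.2f TFLOPS\n", flops / seconds / 1e12);
            return 0;
        }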

    Kathryn S. McKinley

    Kathryn S. McKinley is an American computer scientist noted for her research on compilers, runtime systems, and computer architecture. She is also known for her leadership in broadening participation in computing. McKinley was co-chair of CRA-W from 2011 to 2014.

    In mathematics and statistics, random projection is a technique used to reduce the dimensionality of a set of points which lie in Euclidean space. According to theoretical results, random projection preserves distances well, but empirical results are sparse. Random projections have been applied to many natural language tasks under the name random indexing.
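
    A minimal sketch of the technique (using the common ±1/√k entry construction for the projection matrix; the dimensions and seed here are arbitrary): a d-dimensional point is mapped to k dimensions by multiplying it with a k-by-d random matrix.

        #include <math.h>
        #include <stdio.h>
        #include <stdlib.h>

        /* Random projection of one point: y = R * x, where R is k-by-d
         * with random +1/-1 entries scaled by 1/sqrt(k), one of the
         * standard Johnson-Lindenstrauss-style constructions. */

        enum { D = 8, K = 3 };

        int main(void)
        {
            double x[D], R[K][D], y[K];
            srand(42);                         /* fixed seed for reproducibility */
            for (int j = 0; j < D; j++) x[j] = (double)j;
            for (int i = 0; i < K; i++)
                for (int j = 0; j < D; j++)
                    R[i][j] = (rand() % 2 ? 1.0 : -1.0) / sqrt((double)K);

            for (int i = 0; i < K; i++) {      /* dense matrix-vector product */
                y[i] = 0.0;
                for (int j = 0; j < D; j++)
                    y[i] += R[i][j] * x[j];
            }
            for (int i = 0; i < K; i++) printf("y[%d] = %.3f\n", i, y[i]);
            return 0;
        }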

    Haesun Park

    Haesun Park is a South Korean American mathematician, and professor and chair of Computational Science and Engineering at the Georgia Institute of Technology. She is an IEEE Fellow, ACM Fellow, and Society for Industrial and Applied Mathematics Fellow. Park's main areas of research are numerical algorithms, data analysis, visual analytics, and parallel computing. She has co-authored over 100 articles in peer-reviewed journals and conferences.

    Ümit Çatalyürek

    Ümit V. Çatalyürek is a professor of computer science at the Georgia Institute of Technology, and an adjunct professor in the Department of Biomedical Informatics at the Ohio State University. He is known for his work on graph analytics, parallel algorithms for scientific applications, data-intensive computing, and large scale genomic and biomedical applications. He was the director of the High Performance Computing Lab at the Ohio State University. He was named a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) in 2016 for contributions to combinatorial scientific computing and parallel computing.

    Michael A. Bender is an American computer scientist, known for his work in cache-oblivious algorithms, lowest common ancestor data structures, scheduling, and pebble games. He is the David R. Smith Leading Scholar Professor of computer science at Stony Brook University, and a co-founder of the storage technology startup company Tokutek.

    The Center for Supercomputing Research and Development (CSRD) at the University of Illinois (UIUC) was a research center funded from 1984 to 1993. It built the shared memory Cedar computer system, which included four hardware multiprocessor clusters, as well as parallel system and applications software. It was distinguished from the four earlier UIUC Illiac systems by starting with commercial shared memory subsystems that were based on an earlier paper published by the CSRD founders. Thus CSRD was able to avoid many of the hardware design issues that slowed the Illiac series work. Over its 9 years of major funding, plus follow-on work by many of its participants, CSRD pioneered many of the shared memory architectural and software technologies on which much 21st-century computing is based.

    References

    1. "Richard Vuduc | College of Computing". www.cc.gatech.edu. Retrieved 2017-12-17.
    2. "Richard Vuduc - Google Scholar Citations". scholar.google.co.in. Retrieved 2017-12-17.
    3. Poletti, Therese. "Why the father of the self-driving car left Google". MarketWatch. Retrieved 2017-12-18.
    Nationality: American
    Awards: NSF CAREER Award; Gordon Bell Prize
    Alma mater: Cornell University; University of California, Berkeley