Torsten Hoefler

Torsten Hoefler
Alma mater: Indiana University; TU Chemnitz
Awards: ACM Fellow [1]; IEEE CS Sidney Fernbach Award [2]; IEEE Fellow [3] [4]; Latsis Prize of ETH Zürich [5]
Scientific career
Fields: High-Performance Computing; Computer Science
Institutions: ETH Zurich; Swiss National Supercomputing Centre; Microsoft; Cray; University of Illinois at Urbana-Champaign; Indiana University
Doctoral advisor: Andrew Lumsdaine

Torsten Hoefler is a Professor of Computer Science at ETH Zurich [6] and the Chief Architect for Machine Learning at the Swiss National Supercomputing Centre. [7] Previously, he led the Advanced Application and User Support team at the Blue Waters Directorate of the National Center for Supercomputing Applications, [8] and held an adjunct professor position at the Computer Science Department at the University of Illinois at Urbana-Champaign. [9] His expertise lies in large-scale parallel computing and high-performance computing systems, with a focus on applications in large-scale artificial intelligence and climate science.

Hoefler is an IEEE Fellow, [10] an ACM Fellow, [11] and a member of Academia Europaea, the Academy of Europe. [12] His Erdős number is two. [13]

He has given invited keynote lectures at major international conferences, including ACM's Federated Computing Research Conference, [14] IEEE Cluster, [15] HPC Asia, Supercomputing Asia, [16] and the International Symposium on Distributed Computing. [17]

Career

Hoefler received his Diplom in Computer Science from TU Chemnitz, where he received the best student award in 2005. [18] He has worked on high-performance computing systems since the beginning of his career. He continued his studies at Indiana University, the home of Open MPI, under the guidance of Prof. Andrew Lumsdaine. He received his PhD in Computer Science from Indiana University in 2008 and was later honored with the university's Young Alumni Award [19] as well as its Distinguished Alumni Award. [20]

He continued his work on the Message Passing Interface standard as a key member of the MPI Forum, [21] where he was responsible for the chapters on Collective Communication and Process Topologies and co-authored the chapter on One-Sided Communications. [22]

In 2010, he joined the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign (UIUC). As lead for application performance analysis and support, he supported the design and deployment of the Blue Waters supercomputer. [8] He also held a position as adjunct professor at UIUC's Computer Science department. He accepted a position as assistant professor at ETH Zurich in 2011, [23] received tenure in 2017, [24] and has been a full professor since 2020. [25]

Hoefler has held visiting researcher positions at the French Alternative Energies and Atomic Energy Commission in France, CINECA in Italy, and Argonne National Laboratory, Sandia National Laboratories, and Microsoft in the United States. [9] As a consultant, he supported Cray Inc. in the area of high-performance networking and Microsoft Corporation in the areas of quantum computing and large-scale artificial intelligence systems. He spent his sabbatical in 2019 at Microsoft, helping to establish various AI supercomputing efforts including the Maia 100 system. [26] [27] [28]

Hoefler has been a member of the ACM SIGHPC executive committee since its founding in 2011. [29]

He was elected IEEE Fellow for “contributions to large-scale parallel processing systems and supercomputers”, [10] ACM Fellow for “foundational contributions to High-Performance Computing and the application of HPC techniques to machine learning”, [11] and he received the IEEE Sidney Fernbach Award in 2022 for “application-aware design of HPC algorithms, systems and architectures, and transformative impact on scientific computing and industry”. [2]

Hoefler received the inaugural Jack Dongarra Early Career Award at the ISC High Performance conference in 2023. [30] [31] [32] He was appointed a senior fellow of the Abu Dhabi Investment Authority Labs in 2023. [33] [34]

Research impact

Hoefler is known for his contributions to the Message Passing Interface (MPI) standard. He served as an author of the chapters “Collective Communication” and “Process Topologies” in MPI-2.2 and of the chapters “Collective Communication”, “One-Sided Communications”, and “Process Topologies” in MPI-3. For the MPI-3 standardization, he chaired the Collective Communications and Topology working groups. [35]

He developed principles for the implementation of nonblocking collective operations and remote memory access that are widely used in MPI implementations such as Open MPI, MPICH, and their derivatives. [36] Nonblocking collective operations such as allreduce, allgather, and broadcast form the basis of modern AI training systems. [37]

After co-authoring a pioneering paper on parallel deep learning, [38] and during his sabbatical at Microsoft, he coined the term “3D parallelism” for an approach to modern artificial intelligence training that organizes data parallelism, pipeline parallelism, and operator parallelism into one consistent view. [39]

In his work on high-speed interconnects, he co-developed several award-winning network topologies [39] [40] [41] and contributed routing algorithms that are used in the OpenSM routing manager on InfiniBand computer clusters. [42]

On the application side, Hoefler focuses on improving the performance of climate simulations as a digital twin [43] [44] [45] and machine learning for climate simulations. [46] He has been a convener of the Berlin Summit in Earth Virtualization Engines [47] to develop strategies to enable global access to high-resolution climate simulations. [48] [49]

Scientific reproducibility

Hoefler has been vocal about improving the reproducibility of performance measurements in high-performance computing [50] and, later, in machine learning. The latter work appeared in IEEE Computer as a cover feature on research reproducibility. [51] As Technical Papers chair of the ACM/IEEE Supercomputing Conference (SC18), he introduced a revision-based review process to the conference to improve the quality of its publications. [52] His group received the SIGHPC Certificate of Appreciation for reproducible methods in the student cluster competition at the ACM/IEEE Supercomputing Conference (SC22). [53] His paper on HammingMesh received the SC22 Best Reproducibility Advancement Award. [54] [53] He also presented the opening keynote at the first ACM Conference on Reproducibility and Replicability. [55]

Awards and honors

Hoefler and his team received six best paper or best student paper awards between 2010 and 2023 at the ACM/IEEE Supercomputing Conference, the top conference in high-performance computing. [56] [57] [58] [59] Additional important awards are listed below.

2023

2022

2021

2020

2019

2015

2014

2013

2012

References

  1. "Global Computing Association Names 57 Fellows for Outstanding Contributions That Propel Technology Today". www.acm.org. Retrieved 17 February 2023.
  2. "Torsten Hoefler Receives IEEE CS Sidney Fernbach Award 2022". Retrieved 17 February 2023.
  3. "2022 Newly Elevated Fellows" (PDF). Institute of Electrical and Electronics Engineers (IEEE).
  4. "Adrian Perrig and Torsten Hoefler named IEEE Fellows". inf.ethz.ch. Retrieved 17 February 2023.
  5. "Turning life into a profession". ethz.ch. Retrieved 17 February 2023.
  6. "Prof. Dr. Torsten Hoefler". Departement Informatik. inf.ethz.ch. Zürich, Switzerland: ETH Zürich. Retrieved 22 June 2023.
  7. "ETH Professor Torsten Hoefler Joins CSCS as Chief Architect for Machine Learning".
  8. "Blue Waters staff, partners bring home awards from SC10". NCSA. Retrieved 17 February 2023.
  9. "Torsten Hoefler's CV" (PDF).
  10. "2022 Newly Elevated Fellows" (PDF). Institute of Electrical and Electronics Engineers (IEEE).
  11. "Global Computing Association Names 57 Fellows for Outstanding Contributions That Propel Technology Today". www.acm.org. Retrieved 22 June 2023.
  12. "Academy of Europe: Hoefler Torsten". www.ae-info.org. Retrieved 22 June 2023.
  13. "Erdos2, Version 2020, August 7, 2020". sites.google.com. Retrieved 11 February 2024.
  14. "Plenary Speakers". fcrc.acm.org. Retrieved 22 June 2023.
  15. "IEEE Cluster 2016". clustercomp.org. Retrieved 22 June 2023.
  16. "Keynote Speakers". Retrieved 11 February 2024.
  17. "Keynote Talks | International Symposium on DIStributed Computing (DISC) 2020". Retrieved 22 June 2023.
  18. "Leistung, die sich auszahlt". www.tu-chemnitz.de. Retrieved 22 June 2023.
  19. "Torsten Hoefler: University Honors and Awards: Indiana University". University Honors & Awards. Retrieved 22 June 2023.
  20. "Luddy School honors 2023 alumni award winners". IU News Archive. Retrieved 6 November 2023.
  21. "MPI Forum". www.mpi-forum.org. Retrieved 22 June 2023.
  22. "MPI 3.1 Specification" (PDF).
  23. "16 new professors at the ETH Zurich". www.ethlife.ethz.ch. Retrieved 22 June 2023.
  24. "18 professors appointed at ETH Zurich and EPFL". www.admin.ch. Retrieved 17 February 2023.
  25. "11 new professors appointed at ETH Zurich and EPFL". www.admin.ch. Retrieved 22 June 2023.
  26. US 11076210, Hoefler, Torsten; Heddes, Mattheus C. & Belk, Jonathan R., "Distributed processing architecture", published 2021-07-27, assigned to Microsoft Technology Licensing LLC
  27. US 11886938, Goel, Deepak; Heddes, Mattheus C. & Hoefler, Torsten et al., "Message communication between integrated computing devices", published 2021-03-11, assigned to Microsoft Technology Licensing LLC
  28. "With a systems approach to chips, Microsoft aims to tailor everything 'from silicon to service' to meet AI demand". Retrieved 11 February 2024.
  29. "Meeting Your Needs - Executive Committee". www.sighpc.org. Retrieved 22 June 2023.
  30. "Torsten Hoefler Earns First Jack Dongarra Early Career Award". HPCwire. Retrieved 22 June 2023.
  31. "Torsten Hoefler Earns First Jack Dongarra Early Career Award - Welcome to ISC High Performance 2023". www.isc-hpc.com. Retrieved 22 June 2023.
  32. "Torsten Hoefler Named First Winner of Jack Dongarra Early Career Award". 17 April 2023. Retrieved 22 June 2023.
  33. "ADIA Lab Appoints Senior Fellows". Retrieved 15 February 2024.
  34. "Fellows". Retrieved 15 February 2024.
  35. "MPI 3.0 Collective Communications and Topology Working Group". Retrieved 8 November 2023.
  36. "Implementation and Performance Analysis of Non-Blocking Collective Operations for MPI". Retrieved 8 November 2023.
  37. "Improving NCCL performance for cloud ML applications". Retrieved 8 November 2023.
  38. "Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis". doi:10.1145/3320060. S2CID 220247313. Retrieved 8 November 2023.
  39. "HammingMesh: a network topology for large-scale deep learning". Retrieved 8 November 2023.
  40. Besta, Maciej; Hoefler, Torsten (2014). "Slim Fly: A Cost Effective Low-Diameter Network Topology". SC14: International Conference for High Performance Computing, Networking, Storage and Analysis. pp. 348–359. arXiv:1912.08968. doi:10.1109/SC.2014.34. ISBN 978-1-4799-5500-8. S2CID 2149630. Retrieved 8 November 2023.
  41. Arimilli, Baba; Arimilli, Ravi; Chung, Vicente; Clark, Scott; Denzel, Wolfgang; Drerup, Ben; Hoefler, Torsten; Joyner, Jody; Lewis, Jerry; Li, Jian; Ni, Nan; Rajamony, Ram (2010). "The PERCS High-Performance Interconnect". 2010 18th IEEE Symposium on High Performance Interconnects. pp. 75–82. doi:10.1109/HOTI.2010.16. ISBN 978-1-4244-8547-5. S2CID 16627945. Retrieved 8 November 2023.
  42. Hoefler, Torsten; Schneider, Timo; Lumsdaine, Andrew (2009). "Optimized Routing for Large-Scale InfiniBand Networks". 2009 17th IEEE Symposium on High Performance Interconnects. pp. 103–111. doi:10.1109/HOTI.2009.9. S2CID 12742852. Retrieved 8 November 2023.
  43. "Convection-resolving climate modeling on future supercomputing platforms (crCLIM)". Retrieved 8 November 2023.
  44. "Scientists begin building highly accurate digital twin of our planet". Retrieved 8 November 2023.
  45. "The digital revolution of Earth-system science". Retrieved 8 November 2023.
  46. "Deep learning and a changing economy in weather and climate prediction". Retrieved 8 November 2023.
  47. "Participants". Retrieved 8 November 2023.
  48. "Earth Virtualization Engines (EVE)". Retrieved 8 November 2023.
  49. "Earth Virtualization Engines: A Technical Perspective". Retrieved 8 November 2023.
  50. "Scientific benchmarking of parallel computing systems: twelve ways to tell the masses when reporting performance results". doi:10.1145/2807591.2807644. S2CID 165618. Retrieved 8 November 2023.
  51. Hoefler, Torsten (2022). "Benchmarking Data Science: 12 Ways to Lie With Statistics and Performance on Parallel Computers". Computer. 55 (8): 49–56. doi:10.1109/MC.2022.3152681. S2CID 251294669. Retrieved 8 November 2023.
  52. "SC18 Papers Submissions Open Today with New Review Process". Retrieved 8 November 2023.
  53. "Congratulations to All of This Year's SC and Society Awardees". Retrieved 8 November 2023.
  54. "Luddy alum receives prestigious award for contributions to high performance computing". Retrieved 8 November 2023.
  55. "Featured Keynote Speakers". Retrieved 8 November 2023.
  56. "Blue Waters staff, partners bring home awards from SC10". Retrieved 11 February 2024.
  57. "SC13 Concludes with Awards for Outstanding Achievements in HPC". Retrieved 11 February 2024.
  58. "Supercomputing 2014 Recognizes Outstanding Achievements in HPC". Retrieved 11 February 2024.
  59. "Congratulations to the SC and Society Awardees for SC19 in Denver". Retrieved 11 February 2024.
  60. "Luddy alum receives prestigious award for contributions to high performance computing". Retrieved 17 February 2023.
  61. "HPCwire Unveils Honorees for Its 2021 People to Watch Feature". HPCwire. Retrieved 17 February 2023.
  62. "ERC Consolidator Grants 2020" (PDF).
  63. "Olga Sorkine-Hornung and Torsten Hoefler receive 2 ERC Consolidator Grants". inf.ethz.ch. Retrieved 17 February 2023.
  64. "Torsten Hoefler receives BenchCouncil Rising Star Award". inf.ethz.ch. Retrieved 17 February 2023.
  65. "ACM Names Recipients of 2019 Gordon Bell Prize". www.acm.org. Retrieved 17 February 2023.
  66. "Middle Award". www.ieee-tcsc.org. Retrieved 17 February 2023.
  67. "ERC Starting Grants 2015" (PDF).
  68. "Torsten Hoefler: University Honors and Awards: Indiana University". University Honors & Awards. Retrieved 17 February 2023.
  69. "Torsten Hoefler | IEEE Computer Society". Retrieved 17 February 2023.
  70. "2013 Faculty Award recipients" (PDF). Retrieved 22 June 2023.
  71. Black, Doug (21 February 2012). "Torsten Hoefler Wins 2012 SIAG/Supercomputing Junior Scientist Prize". High-Performance Computing News Analysis | insideHPC. Retrieved 17 February 2023.