Tachyon (software)

Tachyon
Original author(s): John E. Stone
Written in: C
Type: Ray tracing / 3D rendering software
License: BSD-3-Clause
Website: jedi.ks.uiuc.edu/~johns/tachyon/
Satellite tobacco mosaic virus molecular graphics produced in VMD and rendered using Tachyon. The scene is shown with a combination of direct lighting and ambient occlusion lighting to improve the visibility of pockets and cavities. The VMD axes are shown as an example of rendering of non-molecular geometry.

Tachyon rendering of a 1-billion-atom aerosolized SARS-CoV-2 virion (COVID-19).

Intel iPSC/860 32-node parallel computer running a Tachyon performance test, August 22, 1995.

Tachyon is a parallel/multiprocessor ray tracing library for use on distributed memory parallel computers, shared memory computers, and clusters of workstations. Tachyon implements rendering features such as ambient occlusion lighting, depth-of-field focal blur, shadows, reflections, and others. It was originally developed for the Intel iPSC/860 by John Stone for his M.S. thesis at the University of Missouri-Rolla. [1] Tachyon subsequently grew into a more functional and complete ray tracing engine, and it is now incorporated into a number of other open source software packages, such as VMD and SageMath. Tachyon is released under the permissive BSD-3-Clause license (included in the tarball).
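The standalone renderer reads scenes from a simple plain-text scene description file. The example below is a minimal sketch of such a file, based on the sample scenes shipped with the Tachyon distribution; exact keywords and defaults should be checked against the documentation and demo scenes in the source tree.

    BEGIN_SCENE
      RESOLUTION 512 512

      CAMERA
        ZOOM 1.0
        ASPECTRATIO 1.0
        ANTIALIASING 0
        RAYDEPTH 12
        CENTER 0.0 0.0 -5.0
        VIEWDIR 0.0 0.0 1.0
        UPDIR 0.0 1.0 0.0
      END_CAMERA

      BACKGROUND 0.0 0.0 0.0

      LIGHT CENTER 4.0 3.0 -2.0 RAD 0.2 COLOR 1.0 1.0 1.0

      SPHERE CENTER 0.0 0.0 2.0 RAD 1.0
        TEXTURE
          AMBIENT 0.1 DIFFUSE 0.8 SPECULAR 0.0 OPACITY 1.0
          COLOR 1.0 0.2 0.2
          TEXFUNC 0

    END_SCENE

A scene in this form would typically be rendered from the command line with an invocation along the lines of tachyon -aasamples 4 scene.dat -o scene.tga; the flag spellings here are given from memory and should be confirmed against the program's own usage output.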

Evolution and Features

Tachyon was originally developed for the Intel iPSC/860, a distributed memory parallel computer with a hypercube interconnect topology, built from the Intel i860, an early RISC CPU with VLIW features. Tachyon was originally written using Intel's proprietary NX message passing interface for the iPSC series, but it was ported to the earliest versions of MPI shortly thereafter, in 1995. Tachyon was then adapted to the Intel Paragon platform using the Paragon XP/S 150 MP at Oak Ridge National Laboratory, the first platform Tachyon supported that combined large-scale distributed memory message passing among nodes with shared memory multithreading within nodes. Adaptation of Tachyon to a variety of conventional Unix-based workstation platforms and early clusters followed, including a port to the IBM SP2. Tachyon was incorporated into the PARAFLOW CFD code to allow in-situ volume visualization of supersonic combustor flows computed on the Paragon XP/S at NASA Langley Research Center, providing a significant performance gain over the conventional post-processing visualization approaches that had been used previously. [2]

Beginning in 1999, support for Tachyon was incorporated into the molecular graphics program VMD, starting an ongoing period of co-development during which many new Tachyon features were added specifically for molecular graphics. Tachyon was used to render the winning image in the illustration category of the NSF 2004 Visualization Challenge. [3] In 2007, Tachyon added support for ambient occlusion lighting, one of the features that made it increasingly popular for molecular visualization in conjunction with VMD. VMD and Tachyon were gradually adapted to support routine visualization and analysis tasks on clusters, and later on large petascale supercomputers. Tachyon was used to produce figures, movies, and the Nature cover image of the atomic structure of the HIV-1 capsid solved by Zhao et al. in 2013, on the Blue Waters petascale supercomputer at NCSA, U. Illinois. [4] [5] Both CPU and GPU versions of Tachyon were used to render images of the SARS-CoV-2 virion, spike protein, and aerosolized virion in three separate ACM Gordon Bell COVID-19 research projects, including the winning project at Supercomputing 2020 [6] and two finalist projects at Supercomputing 2021.[citation needed]
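The message-passing style of parallel rendering described above can be sketched compactly. The C program below is not Tachyon's actual implementation: it is a minimal, self-contained illustration of an interleaved (cyclic) scanline decomposition of the kind commonly used by message-passing ray tracers of this era, with a stub gradient shader standing in for the real ray tracing core.

    #include <mpi.h>
    #include <stdlib.h>

    #define WIDTH  512
    #define HEIGHT 512

    /* Stub standing in for the real per-scanline ray tracing core;
       it shades one row with a gradient so the example is self-contained. */
    static void render_scanline(int y, unsigned char *row) {
        for (int x = 0; x < WIDTH; x++) {
            row[3 * x + 0] = (unsigned char)(255 * x / WIDTH);  /* R */
            row[3 * x + 1] = (unsigned char)(255 * y / HEIGHT); /* G */
            row[3 * x + 2] = 64;                                /* B */
        }
    }

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each rank renders rows y = rank, rank+size, rank+2*size, ...
           Interleaving rows balances load when scene complexity varies
           across the image. */
        unsigned char *local = calloc((size_t)WIDTH * HEIGHT, 3);
        for (int y = rank; y < HEIGHT; y += size)
            render_scanline(y, local + (size_t)y * WIDTH * 3);

        /* Reassemble the frame on rank 0: each row is written by exactly
           one rank and left zeroed by the others, so a sum reduction
           recovers the complete image. */
        unsigned char *image = (rank == 0)
            ? malloc((size_t)WIDTH * HEIGHT * 3) : NULL;
        MPI_Reduce(local, image, WIDTH * HEIGHT * 3, MPI_UNSIGNED_CHAR,
                   MPI_SUM, 0, MPI_COMM_WORLD);

        /* Rank 0 would now write 'image' to disk, e.g. as a .tga file. */
        free(local);
        free(image);
        MPI_Finalize();
        return 0;
    }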

Use in Parallel Computing Demonstrations, Training, and Benchmarking

Owing in part to its portability across a diverse range of platforms, Tachyon has been used as a test case in a variety of parallel computing and compiler research articles.

In 1999, John Stone assisted Bill Magro in adapting Tachyon to support early versions of the OpenMP directive-based parallel programming standard, using Kuck and Associates' KCC compiler. Tachyon was shown as a demo performing interactive ray tracing on DEC Alpha workstations using KCC and OpenMP.
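The directive-based approach amounts to annotating the renderer's outer loop over scanlines so that iterations are divided among threads. The C program below is a generic illustration of that idea, not the code that was actually demonstrated; trace_pixel() is a stub standing in for a real per-pixel ray tracing routine.

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    /* Stub for the per-pixel ray tracing routine; a real renderer would
       cast a ray through pixel (x, y) and shade the nearest hit here. */
    static float trace_pixel(int x, int y) {
        return (float)((x ^ y) & 0xff) / 255.0f;
    }

    static void render_frame(float *image, int width, int height) {
        /* Divide scanlines among threads; dynamic scheduling helps when
           some rows intersect far more geometry than others. */
        #pragma omp parallel for schedule(dynamic)
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                image[y * width + x] = trace_pixel(x, y);
    }

    int main(void) {
        enum { W = 256, H = 256 };
        float *img = malloc(sizeof(float) * W * H);
        render_frame(img, W, H);
        printf("rendered %dx%d using up to %d threads\n",
               W, H, omp_get_max_threads());
        free(img);
        return 0;
    }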

In 2000, Intel acquired Kuck and Associates Inc., [7] and Tachyon continued to be used as an OpenMP demonstration. Intel later used Tachyon to develop a variety of programming examples for its Threading Building Blocks (TBB) parallel programming system, and an adapted version of the program remains among the TBB examples to the present day. [8] [9]

In 2006, Tachyon was selected by the SPEC High-Performance Group (HPG) for inclusion in the SPEC MPI 2007 benchmark suite. [10] [11]

Beyond Tachyon's typical use as a tool for rendering high-quality images, and likely owing to its portability and its inclusion in SPEC MPI 2007, it has also served as a test case and point of comparison for a variety of research projects on parallel rendering and visualization, [12] [13] [14] [15] [16] [17] [18] [19] [20] cloud computing, [21] [22] [23] [24] [25] parallel computing, [26] [27] [28] compilers, [29] [30] [31] [32] runtime systems, [33] [34] computer architecture, [35] [36] [37] performance analysis tools, [38] [39] [40] and energy efficiency of HPC systems. [41] [42] [43]

See also

Supercomputer
Parallel computing
OpenMP
Visual Molecular Dynamics
Intel iPSC
Molecular modeling on GPUs

References

  1. Stone, John E. (January 1998). "An Efficient Library for Parallel Ray Tracing and Animation". Masters Theses.
  2. Stone, J.; Underwood, M. (1996). "Rendering of numerical flow simulations using MPI". Proceedings. Second MPI Developer's Conference. pp. 138–141. CiteSeerX   10.1.1.27.4822 . doi:10.1109/MPIDC.1996.534105. ISBN   978-0-8186-7533-1. S2CID   16846313.
  3. Emad Tajkhorshid; Klaus Schulten. "Water Permeation Through Aquaporins". Theoretical and Computational Biophysics Group, University of Illinois at Urbana-Champaign.
  4. Zhao, Gongpu; Perilla, Juan R.; Yufenyuy, Ernest L.; Meng, Xin; Chen, Bo; Ning, Jiying; Ahn, Jinwoo; Gronenborn, Angela M.; Schulten, Klaus (2013). "Mature HIV-1 capsid structure by cryo-electron microscopy and all-atom molecular dynamics". Nature. 497 (7451): 643–646. Bibcode:2013Natur.497..643Z. doi:10.1038/nature12162. PMC   3729984 . PMID   23719463.
  5. Stone, John E.; Isralewitz, Barry; Schulten, Klaus (2013). "Early experiences scaling VMD molecular visualization and analysis jobs on blue waters". 2013 Extreme Scaling Workshop (XSW 2013). pp. 43–50. CiteSeerX   10.1.1.396.3545 . doi:10.1109/XSW.2013.10. ISBN   978-1-4799-3691-5. S2CID   16329833.
  6. Casalino, Lorenzo; Dommer, Abigail C; Gaieb, Zied; Barros, Emilia P; Sztain, Terra; Ahn, Surl-Hee; Trifan, Anda; Brace, Alexander; Bogetti, Anthony T; Clyde, Austin; Ma, Heng (September 2021). "AI-driven multiscale simulations illuminate mechanisms of SARS-CoV-2 spike dynamics". The International Journal of High Performance Computing Applications. 35 (5): 432–451. doi:10.1177/10943420211006452. ISSN 1094-3420. PMC 8064023.
  7. "Intel To Acquire Kuck & Associates. Acquisition Expands Intel's Capabilities in Software Development Tools for Multiprocessor Computing" . Retrieved January 30, 2016.
  8. "Intel® Threading Building Blocks (Intel® TBB)" . Retrieved January 30, 2016.
  9. "Parallel for -Tachyon". Intel Corporation. 2009-03-09. Retrieved January 30, 2016.
  10. "122.tachyon SPEC MPI2007 Benchmark Description" . Retrieved January 30, 2016.
  11. Müller, Matthias S.; Van Waveren, Matthijs; Lieberman, Ron; Whitney, Brian; Saito, Hideki; Kumaran, Kalyan; Baron, John; Brantley, William C.; Parrott, Chris; Elken, Tom; Feng, Huiyu; Ponder, Carl (2009). "SPEC MPI2007—an application benchmark suite for parallel systems using MPI". Concurrency and Computation: Practice and Experience: n/a. doi:10.1002/cpe.1535. S2CID   5496204.
  12. Rosenberg, Robert O.; Lanzagorta, Marco O.; Chtchelkanova, Almadena; Khokhlov, Alexei (2000). "Parallel visualization of large data sets". In Erbacher, Robert F; Chen, Philip C; Roberts, Jonathan C; Wittenbrink, Craig M (eds.). Visual Data Exploration and Analysis VII. Vol. 3960. pp. 135–143. doi:10.1117/12.378889. S2CID   62573871.
  13. Lawlor, Orion Sky (2001). "Impostors for Parallel Interactive Computer Graphics" (PDF). M.S. thesis, University of Illinois at Urbana-Champaign. Retrieved January 30, 2016.
  14. Lawlor, Orion Sky; Page, Matthew; Genetti, Jon (2008). "MPIglut: Powerwall Programming Made Easier" (PDF). Retrieved January 30, 2016.
  15. McGuigan, Michael (2008-01-09). "Toward the Graphics Turing Scale on a Blue Gene Supercomputer". arXiv: 0801.1500 [cs.GR].
  16. "Lawlor, Orion Sky, and Joe Genetti. "Interactive volume rendering aurora on the GPU." (2011)" (PDF).
  17. Grottel, Sebastian; Krone, Michael; Scharnowski, Katrin; Ertl, Thomas (2012). "Object-space ambient occlusion for molecular dynamics". 2012 IEEE Pacific Visualization Symposium. pp. 209–216. doi:10.1109/PacificVis.2012.6183593. ISBN   978-1-4673-0866-3. S2CID   431332.
  18. Stone, John E.; Isralewitz, Barry; Schulten, Klaus (2013). "Early experiences scaling VMD molecular visualization and analysis jobs on blue waters". 2013 Extreme Scaling Workshop (XSW 2013). pp. 43–50. CiteSeerX   10.1.1.396.3545 . doi:10.1109/XSW.2013.10. ISBN   978-1-4799-3691-5. S2CID   16329833.
  19. Stone, John E.; Vandivort, Kirby L.; Schulten, Klaus (2013). "GPU-accelerated molecular visualization on petascale supercomputing platforms". Proceedings of the 8th International Workshop on Ultrascale Visualization - Ultra Vis '13. pp. 1–8. doi:10.1145/2535571.2535595. ISBN   9781450325004. S2CID   18633700.
  20. Sener, Melih; et al. "Visualization of Energy Conversion Processes in a Light Harvesting Organelle at Atomic Detail" (PDF). Retrieved January 30, 2016.
  21. Patchin, Philip; Lagar-Cavilla, H. Andrés; De Lara, Eyal; Brudno, Michael (2009). "Adding the easy button to the cloud with Snow Flock and MPI". Proceedings of the 3rd ACM Workshop on System-level Virtualization for High Performance Computing - HPCVirt '09. pp. 1–8. CiteSeerX   10.1.1.534.7880 . doi:10.1145/1519138.1519139. ISBN   9781605584652. S2CID   15380880.
  22. Neill, Richard; Carloni, Luca P.; Shabarshin, Alexander; Sigaev, Valeriy; Tcherepanov, Serguei (2011). "Embedded Processor Virtualization for Broadband Grid Computing". 2011 IEEE/ACM 12th International Conference on Grid Computing. pp. 145–156. CiteSeerX   10.1.1.421.5483 . doi:10.1109/Grid.2011.27. ISBN   978-1-4577-1904-2. S2CID   7760113.
  23. "A Workflow Engine for Computing Clouds, Daniel Franz, Jie Tao, Holger Marten, and Achim Streit. CLOUD COMPUTING 2011 : The Second International Conference on Cloud Computing, GRIDs, and Virtualization". 2011: 1–6. CiteSeerX   10.1.1.456.6480 .{{cite journal}}: Cite journal requires |journal= (help)
  24. Tao, Jie; et al. (2012). "An Implementation Approach for Inter-Cloud Service Combination" (PDF). International Journal on Advances in Software. 5 (1&2): 65–75.
  25. Neill, Richard W. (2013). Heterogeneous Cloud Systems Based on Broadband Embedded Computing (Thesis). Columbia University. doi:10.7916/d8hh6jg1.
  26. Manjikian, Naraig (2010). "Exploring Multiprocessor Design and Implementation Issues with In-Class Demonstrations". Proceedings of the Canadian Engineering Education Association. doi: 10.24908/pceea.v0i0.3110 . Retrieved January 30, 2016.
  27. Kim, Wooyoung; Voss, M. (2011-01-01). "Multicore Desktop Programming with Intel Threading Building Blocks". IEEE Software. 28 (1): 23–31. doi:10.1109/MS.2011.12. ISSN   0740-7459. S2CID   14305861.
  28. Tchiboukdjian, Marc; Carribault, Patrick; Perache, Marc (2012). "Hierarchical Local Storage: Exploiting Flexible User-Data Sharing Between MPI Tasks". 2012 IEEE 26th International Parallel and Distributed Processing Symposium. pp. 366–377. doi:10.1109/IPDPS.2012.42. ISBN   978-1-4673-0975-2. S2CID   15232185.
  29. Ghodrat, Mohammad Ali; Givargis, Tony; Nicolau, Alex (2008). "Control flow optimization in loops using interval analysis". Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems - CASES '08. p. 157. CiteSeerX   10.1.1.144.7693 . doi:10.1145/1450095.1450120. ISBN   9781605584690. S2CID   14310352.
  30. Guerin, Xavier (2010-05-12). An Efficient Embedded Software Development Approach for Multiprocessor System-on-Chips (PhD thesis). Institut National Polytechnique de Grenoble - INPG. Retrieved January 30, 2016.
  31. Milanez, Teo; Collange, Sylvain; Quintão Pereira, Fernando Magno; Meira Jr., Wagner; Ferreira, Renato (2014-10-01). "Thread scheduling and memory coalescing for dynamic vectorization of SPMD workloads". Parallel Computing. 40 (9): 548–558. doi:10.1016/j.parco.2014.03.006.
  32. Ojha, Davendar Kumar; Sikka, Geeta (2014-01-01). Satapathy, Suresh Chandra; Avadhani, P. S.; Udgata, Siba K.; Lakshminarayana, Sadasivuni (eds.). A Study on Vectorization Methods for Multicore SIMD Architecture Provided by Compilers. Advances in Intelligent Systems and Computing. Springer International Publishing. pp. 723–728. doi:10.1007/978-3-319-03107-1_79. ISBN   9783319031064.
  33. Kang, Mikyung; Kang, Dong-In; Lee, Seungwon; Lee, Jaedon (2013). "A system framework and API for run-time adaptable parallel software". Proceedings of the 2013 Research in Adaptive and Convergent Systems on - RACS '13. pp. 51–56. doi:10.1145/2513228.2513239. ISBN   9781450323482. S2CID   30376161.
  34. Biswas, Susmit; Supinski, Bronis R. de; Schulz, Martin; Franklin, Diana; Sherwood, Timothy; Chong, Frederic T. (2011). "Exploiting Data Similarity to Reduce Memory Footprints". 2011 IEEE International Parallel & Distributed Processing Symposium. pp. 152–163. CiteSeerX   10.1.1.294.6312 . doi:10.1109/IPDPS.2011.24. ISBN   978-1-61284-372-8. S2CID   14570159.
  35. Man-Lap Li; Sasanka, R.; Adve, S.V.; Yen-Kuang Chen; Debes, E. (2005). "The ALPbench benchmark suite for complex multimedia applications". IEEE International. 2005 Proceedings of the IEEE Workload Characterization Symposium, 2005. pp. 34–45. CiteSeerX   10.1.1.79.42 . doi:10.1109/IISWC.2005.1525999. ISBN   978-0-7803-9461-2. S2CID   7065621.
  36. Zhang, Jiaqi; Chen, Wenguang; Tian, Xinmin; Zheng, Weimin (2008). "Exploring the Emerging Applications for Transactional Memory". 2008 Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies. pp. 474–480. doi:10.1109/PDCAT.2008.77. ISBN   978-0-7695-3443-5. S2CID   9699030.
  37. "Almaless, Ghassan, and Franck Wajsburt. "On the scalability of image and signal processing parallel applications on emerging cc-NUMA many-cores." Design and Architectures for Signal and Image Processing (DASIP), 2012 Conference on. IEEE, 2012" (PDF).
  38. Szebenyi, Zoltán; Wolf, Felix; Wylie, Brian J.N. (2011). "Performance Analysis of Long-Running Applications". 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PHD Forum. pp. 2105–2108. doi:10.1109/IPDPS.2011.388. ISBN 978-1-61284-425-1. S2CID 14284392.
  39. Szebenyi, Zoltán; Wylie, Brian J. N.; Wolf, Felix (2008-06-27). Kounev, Samuel; Gorton, Ian; Sachs, Kai (eds.). SCALASCA Parallel Performance Analyses of SPEC MPI2007 Applications. Lecture Notes in Computer Science. Springer Berlin Heidelberg. pp. 99–123. CiteSeerX   10.1.1.167.5445 . doi:10.1007/978-3-540-69814-2_8. ISBN   9783540698135.
  40. Wagner, Michael; Knüpfer, Andreas; Nagel, Wolfgang E. (2013). "Hierarchical Memory Buffering Techniques for an In-Memory Event Tracing Extension to the Open Trace Format 2". 2013 42nd International Conference on Parallel Processing. pp. 970–976. doi:10.1109/ICPP.2013.115. ISBN 978-0-7695-5117-3. S2CID 14289974.
  41. Wonyoung Kim; Gupta, Meeta S.; Wei, Gu-Yeon; Brooks, David (2008). "System level analysis of fast, per-core DVFS using on-chip switching regulators". 2008 IEEE 14th International Symposium on High Performance Computer Architecture. pp. 123–134. CiteSeerX   10.1.1.320.879 . doi:10.1109/HPCA.2008.4658633. ISBN   978-1-4244-2070-4. S2CID   538731.
  42. Hackenberg, Daniel; Schöne, Robert; Molka, Daniel; Müller, Matthias S.; Knüpfer, Andreas (2010). "Quantifying power consumption variations of HPC systems using SPEC MPI benchmarks". Computer Science - Research and Development. 25 (3–4): 155–163. doi:10.1007/s00450-010-0118-0. S2CID   12354074.
  43. Ioannou, Nikolas; Kauschke, Michael; Gries, Matthias; Cintra, Marcelo (2011). "Phase-Based Application-Driven Hierarchical Power Management on the Single-chip Cloud Computer". 2011 International Conference on Parallel Architectures and Compilation Techniques. pp. 131–142. CiteSeerX   10.1.1.644.9076 . doi:10.1109/PACT.2011.19. ISBN   978-1-4577-1794-9. S2CID   11697039.