A massively parallel processor array, also known as a multi-purpose processor array (MPPA), is a type of integrated circuit containing a massively parallel array of hundreds or thousands of CPUs and RAM memory blocks. These processors pass work to one another through a reconfigurable interconnect of channels. By harnessing a large number of processors working in parallel, an MPPA chip can accomplish more computationally demanding tasks than conventional chips. MPPAs are based on a software parallel programming model for developing high-performance embedded system applications.
An MPPA is a MIMD (multiple instruction streams, multiple data) architecture whose memory is distributed and accessed locally rather than shared globally. Each processor is strictly encapsulated, accessing only its own code and memory. Point-to-point communication between processors is realized directly in the configurable interconnect.[1]
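To make the distributed-memory model concrete, the following Go sketch treats each goroutine as one encapsulated processor: it reads only its own slice of "local memory" and exchanges values with its neighbours solely over point-to-point channels, here wired as a simple chain that accumulates a global sum. It is an analogy under stated assumptions, not an MPPA toolchain API: the names `processor` and `chainSum` are hypothetical, Go channels stand in for hardware interconnect channels, and Go itself does not enforce the memory isolation that MPPA hardware provides.

```go
package main

import "fmt"

// processor models one encapsulated MPPA processor (illustrative only): it
// reads nothing but its own local memory and communicates solely through
// its input and output channels.
func processor(local []int, in <-chan int, out chan<- int) {
	sum := 0
	for _, v := range local {
		sum += v
	}
	upstream := <-in      // block until the left neighbour sends its running total
	out <- upstream + sum // pass the updated total to the right neighbour
}

// chainSum wires one processor per memory into a linear chain of
// point-to-point channels and returns the sum of every local memory.
func chainSum(memories [][]int) int {
	head := make(chan int, 1)
	in := head
	for _, mem := range memories {
		out := make(chan int, 1)
		go processor(mem, in, out)
		in = out // this processor's output feeds the next one's input
	}
	head <- 0   // seed the running total at the start of the chain
	return <-in // the last channel carries the global result
}

func main() {
	fmt.Println(chainSum([][]int{{1, 2}, {3, 4}, {5}, {6, 7, 8}})) // 36
}
```

Because no processor reads another's memory, the only coupling between them is the value passed along each channel, which mirrors how an MPPA replaces shared-memory synchronization with explicit messages.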
The MPPA's massive parallelism and its distributed-memory MIMD architecture distinguish it from multicore and manycore architectures, which have fewer processors and an SMP or other shared-memory architecture and are mainly intended for general-purpose computing. It is also distinguished from GPGPUs, which have SIMD architectures and are used for HPC applications.[2]
Programming
An MPPA application is developed by expressing it as a hierarchical block diagram or workflow, whose basic objects run in parallel, each on its own processor. Likewise, large data objects may be broken up and distributed into local memories with parallel access. Objects communicate over a parallel structure of dedicated channels. The objective is to maximize aggregate throughput while minimizing local latency, optimizing both performance and efficiency. An MPPA's model of computation is similar to a Kahn process network or communicating sequential processes (CSP).[3]
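The programming model described above can be approximated in Go, whose goroutines and channels are modelled on CSP. In this sketch (all function names are hypothetical and not part of any MPPA toolchain), a data object is split into chunks, each chunk feeds its own `source`→`scale` sub-pipeline, and the results are collected in a fixed order. Every stage uses only blocking reads on a single input channel, with no polling or selecting among inputs, which keeps the network determinate in the spirit of a Kahn process network.

```go
package main

import "fmt"

// Each function below is one object of a hypothetical block diagram. It runs
// in its own goroutine (standing in for its own processor) and talks to other
// objects only through channels; every receive blocks until a value arrives.

// source streams one chunk of a larger data object, then closes its output.
func source(chunk []int, out chan<- int) {
	for _, v := range chunk {
		out <- v
	}
	close(out)
}

// scale is a compute stage that multiplies every incoming value by k.
func scale(k int, in <-chan int, out chan<- int) {
	for v := range in {
		out <- v * k
	}
	close(out)
}

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// run splits data into contiguous chunks (one per worker, like distributing a
// large object across local memories), builds a source→scale sub-pipeline for
// each chunk, and merges the outputs in fixed worker order so the result does
// not depend on goroutine scheduling.
func run(data []int, k, workers int) []int {
	if workers < 1 {
		workers = 1
	}
	chunkLen := (len(data) + workers - 1) / workers
	outs := make([]chan int, workers)
	for w := range outs {
		lo := minInt(w*chunkLen, len(data))
		hi := minInt(lo+chunkLen, len(data))
		link := make(chan int)   // channel between source and scale
		outs[w] = make(chan int) // channel from scale to the merge step
		go source(data[lo:hi], link)
		go scale(k, link, outs[w])
	}
	var result []int
	for _, ch := range outs {
		for v := range ch { // drain worker w completely before worker w+1
			result = append(result, v)
		}
	}
	return result
}

func main() {
	fmt.Println(run([]int{1, 2, 3, 4, 5, 6, 7}, 10, 3)) // [10 20 30 40 50 60 70]
}
```

Merging with `select` over all worker channels would let results interleave in arrival order, which gives up the scheduling-independent output that a Kahn process network guarantees; reading each channel to completion in a fixed sequence preserves it at the cost of some waiting in the later sub-pipelines.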
The Chinese Sunway project developed its own 260-core SW26010 manycore chip for the TaihuLight supercomputer, which was the world's fastest supercomputer from June 2016 to June 2018.[16][17]
Anton 3 chips, designed by D. E. Shaw Research for molecular dynamics simulations, each contain 576 processor cores arranged as a 12×24 grid of tiles with a pair of cores per tile; a routed network links these tiles together and extends off-chip to other nodes in a full system.[18][19]
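The core count follows directly from the grid dimensions: 12 × 24 tiles × 2 cores per tile = 576 cores. The short Go sketch below checks that arithmetic and maps a flat core index to a tile position. Reading 12×24 as 12 rows by 24 columns, and numbering cores in row-major order, are assumptions made for illustration; the source does not say how Anton 3 actually identifies its cores.

```go
package main

import "fmt"

// Grid shape from the text: 12×24 tiles, two cores per tile. Treating 12 as
// the row count is an assumption made for this illustration.
const (
	rows         = 12
	cols         = 24
	coresPerTile = 2
	totalCores   = rows * cols * coresPerTile // 576
)

// coreLocation maps a flat core index in [0, totalCores) to its tile row,
// tile column, and slot within the tile. The row-major numbering is
// hypothetical; it is not Anton 3's documented core numbering.
func coreLocation(core int) (row, col, slot int) {
	if core < 0 || core >= totalCores {
		panic(fmt.Sprintf("core index %d out of range", core))
	}
	tile := core / coresPerTile
	return tile / cols, tile % cols, core % coresPerTile
}

func main() {
	fmt.Println(totalCores)        // 576
	fmt.Println(coreLocation(575)) // 11 23 1
}
```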
Mike Butts. "Multicore and Massively Parallel Platforms and Moore's Law Scalability". Proceedings of the Embedded Systems Conference - Silicon Valley, April 2008.
Mike Butts; Brad Budlong; Paul Wasson; Ed White (April 2008). Reconfigurable Work Farms on a Massively Parallel Processor Array. 2008 16th International Symposium on Field-Programmable Custom Computing Machines. IEEE Computer Society. doi:10.1109/FCCM.2008.6.
Yu, Zhiyi; Meeuwsen, Michael; Apperson, Ryan; Sattari, Omar; Lai, Michael; Webb, Jeremy; Work, Eric; Mohsenin, Tinoosh; Singh, Mandeep; Baas, Bevan (2006). An asynchronous array of simple processors for DSP applications. IEEE International Solid-State Circuits Conference (ISSCC’06). Vol. 49. pp. 428–429. doi:10.1109/ISSCC.2006.1696225.
Truong, Dean; Cheng, Wayne; Mohsenin, Tinoosh; Yu, Zhiyi; Jacobson, Toney; Landge, Gouri; Meeuwsen, Michael; et al. (2008). A 167-processor 65 nm computational platform with per-processor dynamic supply voltage and dynamic clock frequency scaling. Symposium on VLSI Circuits. pp. 22–23. doi:10.1109/VLSIC.2008.4585936.
Michael Bedford Taylor; Jason Kim; Jason Miller; David Wentzlaff; Fae Ghodrat; Ben Greenwald; Henry Hoffmann; Paul Johnson; Walter Lee; Arvind Saraf; Nathan Shnidman; Volker Strumpen; Saman Amarasinghe; Anant Agarwal (February 2003). "A 16-issue multiple-program-counter microprocessor with point-to-point scalar operand network". Proceedings of the IEEE International Solid-State Circuits Conference. doi:10.1109/ISSCC.2003.1234253.
Yu, Zhiyi; You, Kaidi; Xiao, Ruijin; Quan, Heng; Ou, Peng; Ying, Yan; Yang, Haofan; Zeng, Xiaoyang (2012). "An 800MHz 320mW 16-core processor with message-passing and shared-memory inter-core communication mechanisms". 2012 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC). IEEE. pp. 64–66. doi:10.1109/ISSCC.2012.6176931.
Ou, Peng; Zhang, Jiajie; Quan, Heng; Li, Yi; He, Maofei; Yu, Zheng; Yu, Xueqiu; et al. (2013). "A 65nm 39GOPS/W 24-core processor with 11Tb/s/W packet-controlled circuit-switched double-layer network-on-chip and heterogeneous execution array". 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC). IEEE. pp. 56–57. doi:10.1109/ISSCC.2013.6487635.
Shaw, David E.; Adams, Peter J.; Azaria, Asaph; Bank, Joseph A.; Batson, Brannon; Bell, Alistair; Bergdorf, Michael; Bhatt, Jhanvi; Butts, J. Adam; Correia, Timothy; Dirks, Robert M.; Dror, Ron O.; Eastwood, Michael P.; Edwards, Bruce; Even, Amos (2021-11-14). "Anton 3". Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. St. Louis, Missouri: ACM. pp. 1–11. doi:10.1145/3458817.3487397. ISBN 978-1-4503-8442-1. S2CID 239036976.
Adams, Peter J.; Batson, Brannon; Bell, Alistair; Bhatt, Jhanvi; Butts, J. Adam; Correia, Timothy; Edwards, Bruce; Feldmann, Peter; Fenton, Christopher H.; Forte, Anthony; Gagliardo, Joseph; Gill, Gennette; Gorlatova, Maria; Greskamp, Brian; Grossman, J.P. (2021-08-22). "The ΛNTON 3 ASIC: A Fire-Breathing Monster for Molecular Dynamics Simulations". 2021 IEEE Hot Chips 33 Symposium (HCS). Palo Alto, CA, USA: IEEE. pp. 1–22. doi:10.1109/HCS52781.2021.9567084. ISBN 978-1-6654-1397-8. S2CID 239039245.
This page is based on this Wikipedia article. Text is available under the CC BY-SA 4.0 license; additional terms may apply. Images, videos and audio are available under their respective licenses.