Cyclops64

The architecture for Cyclops64

Cyclops64 (formerly known as Blue Gene/C) is a cellular architecture in development by IBM. The Cyclops64 project aims to create the first "supercomputer on a chip".

Cellular architecture

A cellular architecture is a type of computer architecture prominent in parallel computing. Cellular architectures are relatively new, with IBM's Cell microprocessor being the first one to reach the market. Cellular architecture takes multi-core architecture design to its logical conclusion, by giving the programmer the ability to run large numbers of concurrent threads within a single processor. Each 'cell' is a compute node containing thread units, memory, and communication. Speed-up is achieved by exploiting thread-level parallelism inherent in many applications.

IBM American multinational technology and consulting corporation

International Business Machines Corporation (IBM) is an American multinational information technology company headquartered in Armonk, New York, with operations in over 170 countries. The company began in 1911, founded in Endicott, New York, as the Computing-Tabulating-Recording Company (CTR) and was renamed "International Business Machines" in 1924.

Supercomputer extremely powerful computer for its era

A supercomputer is a computer with a high level of performance compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second (FLOPS) instead of million instructions per second (MIPS). As of 2017, there are supercomputers which can perform up to nearly a hundred quadrillion FLOPS. Since November 2017, all of the world's fastest 500 supercomputers have run Linux-based operating systems. Additional research is being conducted in China, the United States, the European Union, Taiwan and Japan to build even faster, more powerful exascale supercomputers.

History

Cyclops64 is part of the Blue Gene effort to produce the next several generations of supercomputers. The projects were started in response to the announced construction of the Earth Simulator.

Earth Simulator highly parallel vector supercomputer system for running global climate models

The Earth Simulator (ES), developed by the Japanese government's initiative "Earth Simulator Project", was a highly parallel vector supercomputer system for running global climate models to evaluate the effects of global warming and problems in solid earth geophysics. The system was developed for Japan Aerospace Exploration Agency, Japan Atomic Energy Research Institute, and Japan Marine Science and Technology Center (JAMSTEC) in 1997. Construction started in October 1999, and the site officially opened on 11 March 2002. The project cost 60 billion yen.

Cyclops64 is a cooperative project between the United States Department of Energy (which is partially funding the project), the U.S. Department of Defense, industry (IBM in particular), and academia.

United States Department of Energy Cabinet-level department of the United States government

The United States Department of Energy (DOE) is a cabinet-level department of the United States Government concerned with the United States' policies regarding energy and safety in handling nuclear material. Its responsibilities include the nation's nuclear weapons program, nuclear reactor production for the United States Navy, energy conservation, energy-related research, radioactive waste disposal, and domestic energy production. It also directs research in genomics; the Human Genome Project originated in a DOE initiative. DOE sponsors more research in the physical sciences than any other U.S. federal agency, the majority of which is conducted through its system of National Laboratories. The agency is administered by the United States Secretary of Energy, and its headquarters are located in Southwest Washington, D.C., on Independence Avenue in the James V. Forrestal Building, named for James Forrestal, as well as in Germantown, Maryland.

United States Department of Defense United States federal executive department

The Department of Defense is an executive branch department of the federal government charged with coordinating and supervising all agencies and functions of the government concerned directly with national security and the United States Armed Forces. The department is the largest employer in the world, with nearly 1.3 million active duty servicemen and women as of 2016. Adding to its employees are over 826,000 National Guardsmen and Reservists from the four services, and over 732,000 civilians bringing the total to over 2.8 million employees. Headquartered at the Pentagon in Arlington, Virginia, just outside Washington, D.C., the DoD's stated mission is to provide "the military forces needed to deter war and ensure our nation's security".

The architecture was conceived by Seymour Cray Award winner Monty Denneau, who is currently leading the project.

The Seymour Cray Computer Engineering Award, also known as the Seymour Cray Award, is an award given by the IEEE Computer Society, to recognize significant and innovative contributions in the field of high-performance computing. The award honors scientists who exhibit the creativity demonstrated by Seymour Cray, founder of Cray Research, Inc., and an early pioneer of supercomputing. Cray was an American electrical engineer and supercomputer architect who designed a series of computers that were the fastest in the world for decades, and founded Cray Research which built many of these machines. Called "the father of supercomputing," Cray has been credited with creating the supercomputer industry. He played a key role in the invention and design of the UNIVAC 1103, a landmark high-speed computer and the first computer available for commercial use.

Monty Denneau Computer architect and mathematician

Monty M. Denneau is a computer architect and mathematician. Denneau was awarded the 2002 Seymour Cray Computer Engineering Award for "ingenious and sustained contributions to designs and implementations at the frontier of high performance computing leading to widely used industrial products."

Architecture overview

Each 64-bit Cyclops64 chip (processor) will run at 500 megahertz and contain 80 processors. Each processor will have two thread units and a floating point unit. A thread unit is an in-order 64-bit RISC core with 32 kB scratch pad memory, using a 60-instruction subset of the Power ISA instruction set. Five processors share a 32 kB instruction cache.
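The per-chip totals implied by these figures can be worked out with a short calculation. The sketch below uses only the numbers quoted above; the constant and function names are illustrative and do not come from any Cyclops64 tooling.

```python
# Per-chip figures quoted in the text above.
PROCESSORS_PER_CHIP = 80
THREAD_UNITS_PER_PROCESSOR = 2
FPUS_PER_PROCESSOR = 1
SCRATCHPAD_KB_PER_THREAD_UNIT = 32
PROCESSORS_PER_ICACHE = 5


def chip_resources():
    """Return derived per-chip totals as a dict (key names are illustrative)."""
    thread_units = PROCESSORS_PER_CHIP * THREAD_UNITS_PER_PROCESSOR
    return {
        "thread_units": thread_units,
        "fpus": PROCESSORS_PER_CHIP * FPUS_PER_PROCESSOR,
        "instruction_caches": PROCESSORS_PER_CHIP // PROCESSORS_PER_ICACHE,
        "scratchpad_kb": thread_units * SCRATCHPAD_KB_PER_THREAD_UNIT,
    }


if __name__ == "__main__":
    for name, value in chip_resources().items():
        print(f"{name}: {value}")
```

Under these figures a chip holds 160 thread units, 80 floating point units, 16 shared instruction caches, and 5,120 kB of scratch pad memory in total.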

Power ISA Computer instruction set architecture

The Power ISA is an instruction set architecture (ISA) developed by the OpenPOWER Foundation, led by IBM. It was originally developed by the defunct Power.org industry group. Power ISA is an evolution of the PowerPC ISA, created by the mergers of the core PowerPC ISA and the optional Book E for embedded applications. The merger of these two components in 2006 was led by Power.org founders IBM and Freescale Semiconductor. The ISA is divided into several categories and every component is defined as a part of a category; each category resides within a certain Book. Processors implement a set of these categories. Different classes of processors are required to implement certain categories, for example a server class processor includes the categories Base, Server, Floating-Point, 64-Bit, etc. All processors implement the Base category.

The processors will be connected by a 96-port, 7-stage, non-internally-blocking crossbar switch. They will communicate with each other via global interleaved memory (memory that can be read and written by all threads) in the SRAM.
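To illustrate what "interleaved" means here, the following sketch maps a global byte address to a memory bank and an offset within that bank, so that consecutive chunks of the address space land on different banks. The bank count and chunk size are hypothetical placeholders chosen for the example; the text above does not state the values Cyclops64 uses, and the function name `locate` is likewise an assumption.

```python
# Hypothetical parameters: neither value comes from the article.
NUM_BANKS = 16          # assumed number of memory banks
INTERLEAVE_BYTES = 64   # assumed size of each interleaved chunk


def locate(address, num_banks=NUM_BANKS, chunk=INTERLEAVE_BYTES):
    """Map a global byte address to (bank, offset within that bank)."""
    if address < 0:
        raise ValueError("address must be non-negative")
    # Which chunk the address falls in, and where inside that chunk.
    chunk_index, byte_in_chunk = divmod(address, chunk)
    # Chunks are dealt out to banks round-robin.
    bank = chunk_index % num_banks
    # Each full pass over all banks advances the per-bank offset by one chunk.
    offset = (chunk_index // num_banks) * chunk + byte_in_chunk
    return bank, offset


if __name__ == "__main__":
    for addr in (0, 70, 64 * 16):
        print(addr, locate(addr))
```

With round-robin placement of this kind, threads that walk through consecutive addresses spread their accesses across banks instead of queuing on one.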

In electronics, a crossbar switch is a collection of switches arranged in a matrix configuration. A crossbar switch has multiple input and output lines that form a crossed pattern of interconnecting lines between which a connection may be established by closing a switch located at each intersection, the elements of the matrix. Originally, a crossbar switch consisted literally of crossing metal bars that provided the input and output paths. Later implementations achieved the same switching topology in solid state semiconductor chips. The cross-point switch is one of the principal switch architectures, together with a rotary switch, memory switch, and a crossover switch.

The theoretical peak performance of a Cyclops64 chip is 80 gigaflops (this assumes a continuous stream of multiply–accumulate instructions, each of which is counted as two floating-point operations). A full system (consisting of 2 thread units per processor, 80 processors per chip, 1 chip per board, 48 boards per midplane, 3 midplanes per rack, and 96 (12 x 8) racks per system) would contain 13,824 C64 chips, consisting of 1,105,920 processors capable of running 2,211,840 concurrent threads.
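The figures in this paragraph can be reproduced with a few multiplications. The sketch below assumes that each processor's floating point unit completes one multiply-accumulate per cycle at the 500 MHz clock given earlier, which is the assumption needed to arrive at 80 gigaflops; all names are illustrative.

```python
CLOCK_HZ = 500_000_000           # 500 MHz chip clock
PROCESSORS_PER_CHIP = 80
THREAD_UNITS_PER_PROCESSOR = 2
FLOPS_PER_MAC = 2                # one multiply-accumulate counts as two operations
CHIPS_PER_BOARD = 1
BOARDS_PER_MIDPLANE = 48
MIDPLANES_PER_RACK = 3
RACKS_PER_SYSTEM = 12 * 8


def peak_gflops_per_chip():
    """Peak rate, assuming one multiply-accumulate per FPU per cycle."""
    return PROCESSORS_PER_CHIP * CLOCK_HZ * FLOPS_PER_MAC // 10**9


def system_totals():
    """Return (chips, processors, threads) for a full system."""
    chips = (CHIPS_PER_BOARD * BOARDS_PER_MIDPLANE
             * MIDPLANES_PER_RACK * RACKS_PER_SYSTEM)
    processors = chips * PROCESSORS_PER_CHIP
    threads = processors * THREAD_UNITS_PER_PROCESSOR
    return chips, processors, threads


if __name__ == "__main__":
    chips, processors, threads = system_totals()
    print(f"peak per chip: {peak_gflops_per_chip()} GFLOPS")
    print(f"chips={chips:,} processors={processors:,} threads={threads:,}")
    print(f"system peak: {chips * peak_gflops_per_chip():,} GFLOPS")
```

Multiplying the chip count by the per-chip peak gives 1,105,920 gigaflops, or roughly 1.1 petaflops, as the theoretical peak of a full system under the same assumption.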

Software

Cyclops64 exposes much of the underlying hardware to the programmer, allowing the programmer to write very high performance, finely tuned software. One negative consequence is that efficiently programming Cyclops64 is difficult. [citation needed]

The system is expected to support TiNy-Threads (a threading library developed at the University of Delaware) and POSIX Threads.

Design and fabrication

Verification testing and system software development are being done at the University of Delaware.

Related Research Articles

Central processing unit electronic circuitry within a computer that carries out the instructions of a computer program by performing the basic arithmetic, logical, control and input/output (I/O) operations specified by the instructions

A central processing unit (CPU), also called a central processor or main processor, is the electronic circuitry within a computer that carries out the instructions of a computer program by performing the basic arithmetic, logic, controlling, and input/output (I/O) operations specified by the instructions. The computer industry has used the term "central processing unit" at least since the early 1960s. Traditionally, the term "CPU" refers to a processor, more specifically to its processing unit and control unit (CU), distinguishing these core elements of a computer from external components such as main memory and I/O circuitry.

Kendall Square Research

Kendall Square Research (KSR) was a supercomputer company founded in 1986 and originally headquartered in Kendall Square in Cambridge, Massachusetts, near the Massachusetts Institute of Technology (MIT). It was co-founded by Steven Frank and Henry Burkhardt III; Burkhardt had formerly helped found Data General and Encore Computer and was one of the original team that designed the PDP-8. KSR produced two models of supercomputer, the KSR1 and KSR2.

Microcode is a computer hardware technique that imposes an interpreter between the CPU hardware and the programmer-visible instruction set architecture of the computer. As such, the microcode is a layer of hardware-level instructions that implement higher-level machine code instructions or internal state machine sequencing in many digital processing elements. Microcode is used in general-purpose central processing units, although in current desktop CPUs it is only a fallback path for cases that the faster hardwired control unit cannot handle.

SPARC RISC instruction set architecture

SPARC is a reduced instruction set computing (RISC) instruction set architecture (ISA) originally developed by Sun Microsystems. Its design was strongly influenced by the experimental Berkeley RISC system developed in the early 1980s. First released in 1987, SPARC was one of the most successful early commercial RISC systems, and its success led to the introduction of similar RISC designs from a number of vendors through the 1980s and 90s.

SIMD class of parallel computers in Flynn's taxonomy, with multiple processing elements that perform the same operation on multiple data points simultaneously

Single instruction, multiple data (SIMD) is a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously. Such machines exploit data level parallelism, but not concurrency: there are simultaneous (parallel) computations, but only a single process (instruction) at a given moment. SIMD is particularly applicable to common tasks such as adjusting the contrast in a digital image or adjusting the volume of digital audio. Most modern CPU designs include SIMD instructions to improve the performance of multimedia use. SIMD is not to be confused with SIMT, which utilizes threads.

The Intel i860 was a RISC microprocessor design introduced by Intel in 1989. It was one of Intel's first attempts at an entirely new, high-end instruction set architecture since the failed Intel iAPX 432 from the 1980s. It was released with considerable fanfare, slightly obscuring the earlier Intel i960, which was successful in some niches of embedded systems, and which many considered to be a better design. The i860 never achieved commercial success and the project was terminated in the mid-1990s.

IBM Blue Gene series of supercomputers by IBM

Blue Gene is an IBM project aimed at designing supercomputers that can reach operating speeds in the PFLOPS (petaFLOPS) range, with low power consumption.

In computer architecture, 64-bit computing is the use of processors that have datapath widths, integer size, and memory address widths of 64 bits. Also, 64-bit computer architectures for central processing units (CPUs) and arithmetic logic units (ALUs) are those that are based on processor registers, address buses, or data buses of that size. From the software perspective, 64-bit computing means the use of code with 64-bit virtual memory addresses. However, not all 64-bit instruction sets support full 64-bit virtual memory addresses; x86-64 and ARMv8, for example, support only 48 bits of virtual address, with the remaining 16 bits of the virtual address required to be all 0's or all 1's, and several 64-bit instruction sets support fewer than 64 bits of physical memory address.
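The 48-bit restriction can be expressed as a simple bit test. The sketch below follows the x86-64 "canonical address" rule, in which the upper bits must all be copies of bit 47, so bits 63 through 47 are either all 0 or all 1; other architectures differ in detail, and the function name `is_canonical` is an assumption made for this example.

```python
VIRTUAL_BITS = 48  # implemented virtual address width, as described above


def is_canonical(addr, virtual_bits=VIRTUAL_BITS):
    """Return True if a 64-bit address is a sign extension of its low bits."""
    if not 0 <= addr < 2**64:
        raise ValueError("addr must fit in 64 bits")
    # Keep bit (virtual_bits - 1) and everything above it.
    upper = addr >> (virtual_bits - 1)
    # Mask with one bit set for each of those upper positions.
    all_ones = (1 << (64 - virtual_bits + 1)) - 1
    return upper == 0 or upper == all_ones


if __name__ == "__main__":
    for a in (0x00007FFFFFFFFFFF, 0x0000800000000000, 0xFFFF800000000000):
        print(hex(a), is_canonical(a))
```

The highest address in the lower half (`0x00007FFFFFFFFFFF`) and the lowest in the upper half (`0xFFFF800000000000`) pass the test, while everything between them is rejected.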

The IBM RS64 is a family of microprocessors that were used in the late 1990s in IBM's RS/6000 and AS/400 servers.

Cell is a multi-core microprocessor microarchitecture that combines a general-purpose PowerPC core of modest performance with streamlined coprocessing elements which greatly accelerate multimedia and vector processing applications, as well as many other forms of dedicated computation.

The POWER1 is a multi-chip CPU developed and fabricated by IBM that implemented the POWER instruction set architecture (ISA). It was originally known as the RISC System/6000 CPU or, when in an abbreviated form, the RS/6000 CPU, before introduction of successors required the original name to be replaced with one that used the same naming scheme (POWERn) as its successors in order to differentiate it from the newer designs.

POWER7

POWER7 is a family of superscalar symmetric multiprocessors based on the Power ISA 2.06 instruction set architecture released in 2010 that succeeded the POWER6. POWER7 was developed by IBM at several sites including IBM's Rochester, MN; Austin, TX; Essex Junction, VT; T. J. Watson Research Center, NY; Bromont, QC and IBM Deutschland Research & Development GmbH, Böblingen, Germany laboratories. IBM announced servers based on POWER7 on 8 February 2010.

The IBM A2 is a massively multicore capable and multithreaded 64-bit Power ISA processor core designed by IBM using the Power ISA v.2.06 specification. Versions of processors based on the A2 core range from a 2.3 GHz version with 16 cores consuming 65 W to a less powerful, four-core version consuming 20 W at 1.4 GHz. Each A2 core is capable of four-way multithreading and has 16 KB+16 KB instruction and data cache per core. All core variants execute instructions in-order.

Adapteva is a fabless semiconductor company focusing on low power many core microprocessor design. The company was the second company to announce a design with 1,000 specialized processing cores on a single integrated circuit.

The IBM POWER ISA is a reduced instruction set computer (RISC) instruction set architecture (ISA) developed by IBM. The name is an acronym for Performance Optimization With Enhanced RISC.

IBM has a series of high performance microprocessors called POWER followed by a number designating generation, i.e. POWER1, POWER2, POWER3 and so forth up to the latest POWER9. These processors have been used by IBM in their RS/6000, AS/400, pSeries, iSeries, System p, System i and Power Systems line of servers and supercomputers. They have also been used in data storage devices by IBM and by other server manufacturers like Bull and Hitachi.

The z13 is a microprocessor made by IBM for their z13 mainframe computers, announced on January 14, 2015. It is manufactured at GlobalFoundries' East Fishkill, New York fabrication plant. IBM stated that it is the world's fastest microprocessor and is about 10% faster than its predecessor the zEC12 in general single-threaded computing, but significantly more when doing specialized tasks.

Fermi (supercomputer)

Fermi is a 2.097 petaFLOPS supercomputer located at CINECA.

The SW26010 is a 260-core manycore processor designed by the National High Performance Integrated Circuit Design Center in Shanghai. It implements the Sunway architecture, a 64-bit reduced instruction set computing (RISC) architecture designed in China. The SW26010 has four clusters of 64 Compute-Processing Elements (CPEs) which are arranged in an eight-by-eight array. The CPEs support single instruction, multiple data (SIMD) instructions, and are capable of performing eight double-precision floating-point operations per cycle. Each cluster is accompanied by a more conventional general-purpose core called the Management Processing Element (MPE) that provides supervisory functions. Each cluster has its own dedicated DDR3 SDRAM controller, and a memory bank with its own address space. The processor runs at a clock speed of 1.45 GHz.
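The 260-core figure and a rough peak rate follow directly from the numbers in this description. The sketch below counts only the CPEs toward floating-point throughput, because the text gives no per-cycle figure for the MPEs; the result is therefore a CPE-only estimate rather than an official peak, and all names are illustrative.

```python
CLUSTERS = 4
CPES_PER_CLUSTER = 64            # eight-by-eight array
MPES_PER_CLUSTER = 1
DP_FLOPS_PER_CPE_PER_CYCLE = 8
CLOCK_MHZ = 1450                 # 1.45 GHz


def total_cores():
    """CPEs plus one MPE per cluster."""
    return CLUSTERS * (CPES_PER_CLUSTER + MPES_PER_CLUSTER)


def cpe_peak_gflops():
    """Double-precision peak of the CPEs alone (MPE contribution excluded)."""
    mflops = (CLUSTERS * CPES_PER_CLUSTER
              * DP_FLOPS_PER_CPE_PER_CYCLE * CLOCK_MHZ)
    return mflops / 1000


if __name__ == "__main__":
    print(f"cores: {total_cores()}")
    print(f"CPE-only peak: {cpe_peak_gflops()} GFLOPS")
```

The CPE arrays alone account for about 2,970 gigaflops of double-precision throughput per chip under these figures.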