ST200 family

The ST200 is a family of very long instruction word (VLIW) processor cores based on technology jointly developed by Hewlett-Packard Laboratories and STMicroelectronics under the name Lx. The main application of the ST200 family is embedded media processing.

Very long instruction word (VLIW) refers to instruction set architectures designed to exploit instruction level parallelism (ILP). Whereas conventional central processing units mostly allow programs to specify instructions to execute in sequence only, a VLIW processor allows programs to explicitly specify instructions to execute in parallel. This design is intended to allow higher performance without the complexity inherent in some other designs.
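As a rough illustration of what it means for a program to specify parallel execution explicitly, the following Python sketch packs a straight-line sequence of operations into wide instructions ("bundles") ahead of time, the way a VLIW compiler would rather than the hardware. The function name `pack_bundles`, the `(dest, sources)` operation format, and the default width of 4 are hypothetical choices made for this example. The sketch assumes every register is written only once and every result is usable in the next bundle; it is not a description of any actual ST200 compiler.

```python
def pack_bundles(ops, width=4):
    """Greedily pack operations into bundles that issue together.

    ops   -- list of (dest, sources) tuples in program order (hypothetical format)
    width -- maximum number of operations per bundle
    Assumes each register is written once, so only read-after-write
    dependences need to be respected, and a one-bundle result latency.
    """
    bundles = []    # bundles[k] holds the indices of ops issued in bundle k
    ready_at = {}   # register name -> first bundle in which its value is usable
    for index, (dest, sources) in enumerate(ops):
        # An op cannot issue before all of its inputs are available.
        earliest = max((ready_at.get(src, 0) for src in sources), default=0)
        slot = earliest
        # Skip bundles that are already full.
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1
        while len(bundles) <= slot:
            bundles.append([])
        bundles[slot].append(index)
        ready_at[dest] = slot + 1
    return bundles


ops = [
    ("a", ["x", "y"]),   # 0: a = x op y
    ("b", ["x", "z"]),   # 1: b = x op z
    ("c", ["a", "b"]),   # 2: c = a op b  (needs results of 0 and 1)
    ("d", ["y", "z"]),   # 3: d = y op z  (independent of 0, 1, 2)
]
print(pack_bundles(ops))           # [[0, 1, 3], [2]]
print(pack_bundles(ops, width=2))  # [[0, 1], [2, 3]]
```

With a width of 4, the three independent operations share one bundle and only the dependent operation waits for the next one. A purely sequential machine would need four issue steps for the same code.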

Hewlett-Packard American information technology company

The Hewlett-Packard Company or Hewlett-Packard was an American multinational information technology company headquartered in Palo Alto, California. It developed and provided a wide variety of hardware components as well as software and related services to consumers, small- and medium-sized businesses (SMBs) and large enterprises, including customers in the government, health and education sectors.

STMicroelectronics French-Italian multinational electronics and semiconductor manufacturer headquartered in Geneva, Switzerland

STMicroelectronics is a French-Italian multinational electronics and semiconductor manufacturer headquartered in Geneva, Switzerland. It is commonly called ST, and it is Europe's largest semiconductor chip maker based on revenue. While STMicroelectronics corporate headquarters and the headquarters for EMEA region are based in Geneva, the holding company, STMicroelectronics N.V. is registered in Amsterdam, Netherlands.

Lx architecture

The Lx architecture is closer to the original VLIW architecture defined by the Trace processor series from Multiflow than to the EPIC architectures exemplified by the IA-64. More precisely, the Lx is a symmetric clustered architecture in which clusters communicate through explicit send and receive instructions. Each cluster executes up to 4 instructions per cycle, with a maximum of one control instruction (goto, jump, call, return), one memory instruction (load, store, pre-fetch), and two multiply instructions per cycle. All arithmetic instructions operate on integer values, with operands belonging either to the general register file (64 x 32-bit) or to the branch register file (8 x 1-bit). General register $r0 always reads as zero, while general register $r63 is the link register. To eliminate some conditional branches, the Lx architecture also provides partial predication support in the form of conditional selection instructions. There is no division instruction, but a divide step instruction is provided. All instructions are fully pipelined. The read-after-write (RAW) latencies are a single cycle, except for the load, multiply, and compare-to-branch latencies. The write-after-read (WAR) latencies are zero cycles and the write-after-write (WAW) latencies are a single cycle.
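Two of the rules above lend themselves to a small sketch: the per-cycle issue limits of a cluster, and the use of conditional selection to replace a branch. The Python below is only a model under stated assumptions. The names `is_valid_bundle`, `select`, `clamp_branchy`, and `clamp_selected`, and the operation class labels, are invented for this example and are not ST200 mnemonics or part of any ST200 toolchain.

```python
from collections import Counter

# Per-cycle issue limits for one cluster, as stated in the text above.
MAX_OPS_PER_CYCLE = 4
CLASS_LIMITS = {"control": 1, "memory": 1, "multiply": 2}


def is_valid_bundle(op_classes):
    """Check a list of operation classes against the issue limits.

    op_classes -- e.g. ["alu", "multiply", "memory"]; the labels are
                  hypothetical, chosen only for this sketch.
    """
    if len(op_classes) > MAX_OPS_PER_CYCLE:
        return False
    counts = Counter(op_classes)  # missing classes count as zero
    return all(counts[cls] <= limit for cls, limit in CLASS_LIMITS.items())


def select(cond, a, b):
    """Model of conditional selection: both inputs exist, cond picks one."""
    return a if cond else b


def clamp_branchy(x, hi):
    # Version with a conditional branch.
    if x > hi:
        return hi
    return x


def clamp_selected(x, hi):
    # Branch-free version: a comparison produces a 1-bit condition,
    # and a selection chooses between the two already-available values.
    cond = x > hi
    return select(cond, hi, x)


print(is_valid_bundle(["alu", "alu", "memory", "control"]))  # True
print(is_valid_bundle(["memory", "memory"]))                 # False
print(clamp_selected(300, 255), clamp_selected(7, 255))      # 255 7
```

The selection form computes both candidate values unconditionally, so it removes the branch only when doing so is cheap and free of side effects. That limitation is why the text describes the support as partial predication.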

Multiflow Computer, Inc., founded in April, 1984 near New Haven, Connecticut, USA, was a manufacturer and seller of minisupercomputer hardware and software embodying the VLIW design style. Multiflow, incorporated in Delaware, ended operations in March, 1990, after selling about 125 VLIW minisupercomputers in the United States, Europe, and Japan.

Explicitly parallel instruction computing (EPIC) is a term coined in 1997 by the HP–Intel alliance to describe a computing paradigm that researchers had been investigating since the early 1980s. This paradigm is also called Independence architectures. It was the basis for Intel and HP development of the Intel Itanium architecture, and HP later asserted that "EPIC" was merely an old term for the Itanium architecture. EPIC permits microprocessors to execute software instructions in parallel by using the compiler, rather than complex on-die circuitry, to control parallel instruction execution. This was intended to allow simple performance scaling without resorting to higher clock frequencies.

IA-64

IA-64 is the instruction set architecture (ISA) of the Itanium family of 64-bit Intel microprocessors. The basic ISA specification originated at Hewlett-Packard (HP), and was evolved and then implemented in a new processor microarchitecture by Intel with HP's continued partnership and expertise on the underlying EPIC design concepts. In order to establish what was their first new ISA in 20 years and bring an entirely new product line to market, Intel made a massive investment in product definition, design, software development tools, OS, software industry partnerships, and marketing. To support this effort Intel created the largest design team in their history and a new marketing and industry enabling team completely separate from x86. The first Itanium processor, codenamed Merced, was released in 2001.

The principal architects for the ST200 Lx implementation [1] were Paolo Faraboschi (HPL, architecture) and Fred Homewood (STM, microarchitecture). Key members of the architecture and microarchitecture team included Geoffrey Brown (HPL co-lead), Giuseppe Desoli (HP), Gary Vondran (HP), Trefor Southwell (ST), Tony Jarvis (ST), and Alex Starr (ST).

The architecture was a genuinely cross-company development, with the teams co-sited for the early part of the project, lasting some two years.

ST200 cores

The ST200 VLIW family currently comprises the ST210, ST220, and ST231 cores, which are single-cluster implementations of the Lx architecture. The differences among these cores are minimal.

Memory management unit hardware translating virtual addresses to physical address

A memory management unit (MMU), sometimes called paged memory management unit (PMMU), is a computer hardware unit having all memory references passed through itself, primarily performing the translation of virtual memory addresses to physical addresses. It is usually implemented as part of the central processing unit (CPU), but it also can be in the form of a separate integrated circuit.

In digital video, STMicroelectronics reported in 2009 that it had shipped over 40 million systems-on-chip (SoCs) containing a VLIW processor from the ST200 family. Because many of these SoCs contain multiple ST200 cores (the STi7200, for example, contains four ST231s), the number of VLIW processors shipped was in excess of 70 million. [2]

Compiling tools

The first ST210 compiler was the HP Lx compiler developed at HP Labs Cambridge, itself a descendant of the Multiflow Trace scheduling compiler, heavily modified by HP to target the embedded domain. Starting with the ST220, STMicroelectronics introduced compilers based on the Open64 technology. In these compilers, the Open64 release was improved by upgrading its GCC C and C++ front end from 2.96 to 3.x and later 4.x in order to achieve full C++ compliance. The GNU C extensions, including asm statements, have been fully implemented in Open64. As a result, the Linux kernel can be compiled for the ST200.

Trace scheduling is an optimization technique developed by Josh Fisher used in compilers for computer programs.

Open64 is a free, open-source, optimizing compiler for the Itanium and x86-64 microprocessor architectures. It derives from the SGI compilers for the MIPS R10000 processor, called MIPSPro. It was initially released in 2000 as GNU GPL software under the name Pro64. The following year, University of Delaware adopted the project and renamed the compiler to Open64. It now mostly serves as a research platform for compiler and computer architecture research groups. Open64 supports Fortran 77/95 and C/C++, as well as the shared memory programming model OpenMP. It can conduct high-quality interprocedural analysis, data-flow analysis, data dependence analysis, and array region analysis. Development has ceased, although other projects can use the project's source.

The GNU Compiler Collection (GCC) is a compiler system produced by the GNU Project supporting various programming languages. GCC is a key component of the GNU toolchain and the standard compiler for most Unix-like operating systems. The Free Software Foundation (FSF) distributes GCC under the GNU General Public License. GCC has played an important role in the growth of free software, as both a tool and an example.

The other ST200 compilation tools are straightforward ports of GNU as, GNU ld, and GDB.

References

  1. Paolo Faraboschi, Geoffrey Brown, Joseph A. Fisher, Giuseppe Desoli, and Fred (Mark Owen) Homewood, "Lx: A Technology Platform for Customizable VLIW Embedded Processing," in Proc. 27th Annu. Int. Symp. Computer Architecture, June 2000, pp. 203–213.
  2. Fisher, Faraboschi, and Young, "VLIW Processors: From Blue Sky to Best Buy," IEEE Solid-State Circuits Magazine, June 2009, pp. 10–17.