Joel McCormack

Last updated December 13, 2024

Joel McCormack is an American computer scientist who designed the NCR Corporation version of the p-code machine, which is a kind of stack machine popular in the 1970s as the preferred way to implement new computing architectures and languages such as Pascal and BCPL. The NCR design shares no common architecture with the Pascal MicroEngine designed by Western Digital but both were meant to execute the UCSD p-System.[1,2]

P-machine theory

Urs Ammann, a student of Niklaus Wirth, originally presented p-code in his PhD thesis (see Urs Ammann, On Code Generation in a Pascal Compiler, Software: Practice and Experience, Vol. 7, No. 3, 1977, pp. 391–423). The central idea is that a complex software system is coded for a fictitious, minimal computer, or virtual machine, and that this computer is then realized on specific real hardware by an interpreter program that is typically small, simple, and quickly developed. Pascal otherwise had to be reimplemented for every new computer being acquired, so Ammann proposed writing the system once for a virtual architecture. The successful academic implementation of Pascal was the UCSD p-System, developed by Kenneth Bowles, a professor at UCSD, who began the project of building a universal Pascal programming environment, based on the P-machine architecture, for the multitude of different computing platforms in use at that time. McCormack was part of a team of undergraduates working on the project.[3] He took this familiarity and experience with him to NCR.

P-machine design

NCR hired McCormack directly out of college. They had previously developed a bit-sliced hardware implementation of a p-code machine using AMD's AM2900 chipset. A myriad of timing and performance problems plagued the machine; McCormack proposed a redesign of the processor, which would have a microsequencer based on programmable logic. When McCormack left NCR to start Volition Systems he continued his work on the processor as a contractor.

This new CPU used horizontal microcode which radically enhanced parallelism within the microarchitecture. These wide, 80-bit microwords allowed the CPU to perform many operations in a single microcycle: the processor could do an arithmetic operation while also performing a memory read into the internal stack, or transfer the contents of a register while at the same time reading new data into the ALU. As a result, many of the simpler p-code operations only took one or two microinstructions; some operations were constructed with tight, single-microword loops.

Two bits per clock selected the cycle time for each instruction: 130, 150, or 175 nanoseconds, all generated with a delay line. Faster parts from AMD would also have allowed a 98 ns cycle time, but there was no correspondingly faster branch control unit. A separate prefetch/instruction formatting unit also used delay lines to generate asynchronous timing signals. This unit had a 32-bit buffer and could decode the next data in multiple formats: signed byte; unsigned byte; word; and a compressed "big" format, which encoded small numbers in 0..127 in a single byte and larger numbers in 128..32767 in two bytes.
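The "big" operand format described above can be sketched in a few lines of Python. This is only an illustration, not the prefetch unit's actual logic: the text gives the value ranges but not the bit layout, so the sketch assumes that a set high bit in the first byte marks the two-byte form and that the remaining 7 bits are the high-order part of the value. The function names decode_big and encode_big are hypothetical.

```python
def decode_big(code: bytes, pos: int) -> tuple[int, int]:
    """Decode one "big" operand starting at code[pos].

    Returns (value, next_pos).
    Assumption: a first byte with its high bit set marks the two-byte
    form, and its low 7 bits hold the high-order part of the value.
    """
    first = code[pos]
    if first < 0x80:
        # Values 0..127 fit in a single byte.
        return first, pos + 1
    second = code[pos + 1]
    # Values 128..32767: 15 bits spread over two bytes.
    return ((first & 0x7F) << 8) | second, pos + 2


def encode_big(value: int) -> bytes:
    """Encode a value in 0..32767 using the same assumed layout."""
    if not 0 <= value <= 32767:
        raise ValueError("big operand must be in 0..32767")
    if value < 128:
        return bytes([value])
    return bytes([0x80 | (value >> 8), value & 0xFF])
```

Under these assumptions a small constant such as 5 occupies one byte, while 128 and anything larger occupies two, which is why the format suits the small offsets that dominate compiled Pascal code.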

An on-board stack of 1024 16-bit words held temporary values, both scalars and sets. The stack addresses ran downwards, with the stack pointer decrementing before a write and incrementing after a read. A register in the AMD 2901's internal file held the top-of-stack value so as to accelerate simple operations. Integer addition took only a single instruction cycle; since one operand was always in the register file, only one fetch from stack memory was needed.
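The following Python model is a rough sketch of the stack discipline just described: a downward-growing memory with pre-decrement writes and post-increment reads, plus a register that caches the top-of-stack value. The class name CachedStack, the depth counter, and the read/write counters are hypothetical bookkeeping added for illustration; the real hardware had no such counters, and the sketch avoids spilling an empty register purely for tidiness.

```python
class CachedStack:
    """Toy model of a 1024-word stack with the top value held in a register."""

    SIZE = 1024      # words of on-board stack memory
    MASK = 0xFFFF    # 16-bit words

    def __init__(self):
        self.mem = [0] * self.SIZE
        self.sp = self.SIZE   # empty: points just past the highest word
        self.tos = 0          # register caching the top-of-stack value
        self.depth = 0        # hypothetical bookkeeping, not in hardware
        self.mem_reads = 0
        self.mem_writes = 0

    def push(self, value):
        if self.depth > 0:
            if self.sp == 0:
                raise OverflowError("stack memory full")
            # Spill the old top: decrement the pointer, then write.
            self.sp -= 1
            self.mem[self.sp] = self.tos
            self.mem_writes += 1
        self.tos = value & self.MASK
        self.depth += 1

    def pop(self):
        if self.depth == 0:
            raise IndexError("pop from empty stack")
        value = self.tos
        self.depth -= 1
        if self.depth > 0:
            # Refill the register: read, then increment the pointer.
            self.tos = self.mem[self.sp]
            self.sp += 1
            self.mem_reads += 1
        return value

    def add(self):
        """Integer add: one operand is already in the register."""
        if self.depth < 2:
            raise IndexError("add needs two operands")
        self.tos = (self.tos + self.mem[self.sp]) & self.MASK
        self.sp += 1
        self.mem_reads += 1
        self.depth -= 1
```

Pushing 2 and 3 and then calling add() leaves 5 in tos after a single read of stack memory, which mirrors the one-fetch addition described in the text.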

Each wide control word could either hold the address of the next microinstruction or cause the next p-machine instruction to be fetched. Thus, the microsequencer could jump almost arbitrarily within the control code. The first 256 microinstructions in memory corresponded to p-machine instructions, so the microassembler would place the first control word of each instruction's microprogram in its corresponding location. P-code instructions that took multiple microinstructions to execute could not start with a branch, as that field was already used to jump to the rest of the microprogram for the instruction.
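The placement rule above can be illustrated with a small layout pass in Python. This is a sketch under stated assumptions rather than a reconstruction of NCR's microassembler: the MicroInstr record, the layout function, and the opcode numbers used in the test data are all hypothetical, and conditional branches are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

DISPATCH_SLOTS = 256  # first 256 control-store words, one per p-code opcode


@dataclass
class MicroInstr:
    ops: str                         # the parallel operations, as text
    goto: Optional[str] = None       # explicit flow control, e.g. "fetch"
    next_addr: Optional[int] = None  # filled in by the layout pass


def layout(routines: Dict[int, List[MicroInstr]]) -> List[Optional[MicroInstr]]:
    """Place each routine's first word in its opcode slot, the rest after."""
    store: List[Optional[MicroInstr]] = [None] * DISPATCH_SLOTS
    for opcode, body in sorted(routines.items()):
        if not 0 <= opcode < DISPATCH_SLOTS:
            raise ValueError(f"opcode {opcode} out of range")
        if not body or body[-1].goto is None:
            raise ValueError(f"routine {opcode} must end with flow control")
        first, rest = body[0], body[1:]
        if rest and first.goto is not None:
            # The next-address field is needed to reach the continuation.
            raise ValueError(f"routine {opcode} may not start with a branch")
        store[opcode] = first
        if rest:
            first.next_addr = len(store)  # jump to the rest of the routine
            for instr in rest:
                store.append(instr)
                if instr.goto is None:
                    instr.next_addr = len(store)  # continue with next word
    return store
```

A single-word routine keeps its own branch, while a multi-word routine has its first word's next-address field claimed by the layout pass, which is exactly why such a routine cannot begin with flow control of its own.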

P-machine architecture

The CPU used the technique of keeping the top word of the stack in one of the AMD 2901 registers. This often resulted in one fewer microinstruction. The microcode listing in this section shows a few p-codes the way they ended up. In the listing, tos is a register, and q is a register. "|" means parallel activities in a single cycle. (The stack doesn't quite operate this way: it decrements before data is written to it, and increments after data is read.)

Since next-address control and next microcode location were in each wide microword, there was no penalty for any-order execution of the microcode. There was a table of 256 labels, and the microcode compiler moved the first instruction at each of those labels to the first 256 locations of microcode memory. The only restriction this placed upon the microcode was that if the p-code required more than one microinstruction, then the first microinstruction couldn't have any flow control specified (as it would be filled in with a "goto <rest of microcode for p-code>").

fetch   % Fetch and save in an AMD register the next byte opcode from
        % the prefetch unit, and go to that location in the microcode.
        q := ubyte | goto ubyte

SLDCI   % Short load constant integer (push opcode byte)
        % Push top-of-stack AMD register onto real stack, load
        % the top-of-stack register with the fetched opcode that got us here
        dec(sp) | stack := tos | tos := q | goto fetch

LDCI    % Load constant integer (push opcode word)
        % A lot like SLDCI, except fetch 2-byte word and "push" on stack
        dec(sp) | stack := tos | tos := word | goto fetch

SLDL1   % Short load local variable at offset 1
        % mpd0 is a pointer to local data at offset 0.  Write appropriate
        % data address into the byte-addressed memory-address-register
        mar := mpd0+2
        % Push tos, load new tos from memory
SLDX    dec(sp) | stack := tos | tos := memword | goto fetch

LDL     % Load local variable at offset specified by "big" operand
        r0 := big
        mar := mpd0 + r0 | goto sldx

INCR    % Increment top-of-stack by big operand
        tos := tos + big | goto fetch

ADI     % Add two words on top of stack
        tos := tos + stack | inc(sp) | goto fetch

EQUI    % Top two words of stack equal?
        test tos - stack | inc(sp)
        tos := 0 | if ~zero goto fetch
        tos := 1 | goto fetch
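To make the effect of these microroutines easier to follow, here is a minimal Python interpreter for five of the listed p-codes. It models only their visible behavior, not the microarchitecture: a plain list stands in for the hardware stack and its cached top-of-stack register. Several details are assumptions made for illustration: the opcode numbers, the treatment of every opcode below 128 as SLDCI (suggested by the listing's comment that the opcode itself becomes the constant), the low-byte-first order of the LDCI word, and the bit layout of the "big" operand.

```python
from typing import List

# Assumed opcode numbers, for illustration only.
ADI, INCR, EQUI, LDCI = 130, 162, 195, 199


def run(code: bytes) -> List[int]:
    """Execute a byte string of p-codes and return the final stack."""
    stack: List[int] = []
    pc = 0
    while pc < len(code):
        op = code[pc]          # "q := ubyte | goto ubyte"
        pc += 1
        if op < 128:
            # SLDCI: the opcode byte itself is the constant (assumption).
            stack.append(op)
        elif op == LDCI:
            # Assumed low-byte-first 16-bit word operand.
            stack.append(code[pc] | (code[pc + 1] << 8))
            pc += 2
        elif op == INCR:
            first = code[pc]
            pc += 1
            if first < 128:
                big = first
            else:
                # Assumed two-byte "big" layout: high 7 bits, then low 8.
                big = ((first & 0x7F) << 8) | code[pc]
                pc += 1
            stack[-1] = (stack[-1] + big) & 0xFFFF
        elif op == ADI:
            operand = stack.pop()
            stack[-1] = (stack[-1] + operand) & 0xFFFF
        elif op == EQUI:
            operand = stack.pop()
            stack[-1] = 1 if stack[-1] == operand else 0
        else:
            raise ValueError(f"unimplemented opcode {op}")
    return stack
```

With these assumed encodings, run(bytes([2, 3, ADI])) returns [5], and run(bytes([5, INCR, 1, 6, EQUI])) returns [1] because 5 + 1 equals 6.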

This architecture should be compared to the original P-code machine specification as proposed by Niklaus Wirth.

P-machine performance

The end result was a 9-by-11-inch CPU board that ran the UCSD p-System faster than anything else, by a wide margin. It was as much as 35 to 50 times faster than the LSI-11 interpreter, and 7 to 9 times faster than the Western Digital Pascal MicroEngine, which replaced the LSI-11 microcode with p-code microcode. It also ran faster than Niklaus Wirth's Lilith machine, though it lacked the Lilith's bit-mapped graphics capabilities, and at around the same speed as a VAX-11/750 running native code. (The VAX, however, was hampered by the poor code coming out of the Berkeley Pascal compiler, and was also a 32-bit machine.)

Education

University of California, San Diego: BA, 1978
University of California, San Diego: MS, 1979

Later employment

Publications

Joel McCormack, Robert McNamara. Efficient and Tiled Polygon Traversal Using Half-Plane Edge Functions, to appear as Research Report 2000/4, Compaq Western Research Laboratory, August 2000. [Superset of Workshop paper listed immediately below.]
Joel McCormack, Robert McNamara. Tiled Polygon Traversal Using Half-Plane Edge Functions, Proceedings of the 2000 EUROGRAPHICS/SIGGRAPH Workshop on Graphics Hardware, ACM Press, New York, August 2000, pp. 15–21.
Robert McNamara, Joel McCormack, Norman P. Jouppi. Prefiltered Antialiased Lines Using Half-Plane Distance Functions, Research Report 98/2, Compaq Western Research Laboratory, August 2000. [Superset of Workshop paper listed immediately below.]
Robert McNamara, Joel McCormack, Norman P. Jouppi. Prefiltered Antialiased Lines Using Half-Plane Distance Functions, Proceedings of the 2000 EUROGRAPHICS/SIGGRAPH Workshop on Graphics Hardware, ACM Press, New York, August 2000, pp. 77–85.
Joel McCormack, Keith I. Farkas, Ronald Perry, Norman P. Jouppi. Simple and Table Feline: Fast Elliptical Lines for Anisotropic Texture Mapping, Research Report 99/1, Compaq Western Research Laboratory, October 1999. [Superset of SIGGRAPH paper listed immediately below.]
Joel McCormack, Ronald Perry, Keith I. Farkas, Norman P. Jouppi. Feline: Fast Elliptical Lines for Anisotropic Texture Mapping, SIGGRAPH 99 Conference Proceedings, ACM Press, New York, August 1999, pp. 243–250.
Joel McCormack, Robert McNamara, Christopher Gianos, Larry Seiler, Norman P. Jouppi, Ken Correll, Todd Dutton, John Zurawski. Neon: A (Big) (Fast) Single-Chip 3D Workstation Graphics Accelerator, Research Report 98/1, Compaq Western Research Laboratory, Revised July 1999. [Superset of Workshop and IEEE Neon papers listed immediately below.]
Joel McCormack, Robert McNamara, Christopher Gianos, Larry Seiler, Norman P. Jouppi, Ken Correll, Todd Dutton, John Zurawski. Implementing Neon: A 256-bit Graphics Accelerator, IEEE Micro, Vol. 19, No. 2, March/April 1999, pp. 58–69.
Joel McCormack, Robert McNamara, Christopher Gianos, Larry Seiler, Norman P. Jouppi, Ken Correll. Neon: A Single-Chip 3D Workstation Graphics Accelerator, Proceedings of the 1998 EUROGRAPHICS/SIGGRAPH Workshop on Graphics Hardware, ACM Press, New York, August 1998, pp. 123–132. [Voted Best Paper/Presentation.]
Joel McCormack, Robert McNamara. A Smart Frame Buffer, Research Report 93/1, Digital Equipment Corporation, Western Research Laboratory, January 1993. [Superset of USENIX paper listed immediately below.]
Joel McCormack, Robert McNamara. A Sketch of the Smart Frame Buffer, Proceedings of the 1993 Winter USENIX Conference, USENIX Association, Berkeley, January 1993, pp. 169–179.
Joel McCormack. Writing Fast X Servers for Dumb Color Frame Buffers, Research Report 91/1, Digital Equipment Corporation, Western Research Laboratory, February 1991. [Superset of the Software: Practice and Experience paper listed immediately below.]
Joel McCormack. Writing Fast X Servers for Dumb Color Frame Buffers, Software: Practice and Experience, Vol 20(S2), John Wiley & Sons, Ltd., West Sussex, England, October 1990, pp. 83–108. [Translated and reprinted in the Japanese edition of UNIX Magazine, ASCII Corp., October 1991, pp. 76–96.]
Hania Gajewska, Mark S. Manasse, Joel McCormack. Why X is Not Our Ideal Window System, Software: Practice and Experience, Vol 20(S2), John Wiley & Sons, Ltd., West Sussex, England, October 1990, pp. 137–171.
Paul J. Asente and Ralph R. Swick, with Joel McCormack. X Window System Toolkit: The Complete Programmer's Guide and Specification, X Version 11, Release 4, Digital Press, Maynard, Massachusetts, 1990.
Joel McCormack, Paul Asente. An Overview of the X Toolkit, Proceedings of the ACM SIGGRAPH Symposium on User Interface Software, ACM Press, New York, October 1988, pp. 46–55.
Joel McCormack, Paul Asente. Using the X Toolkit, or, How to Write a Widget. Proceedings of the Summer 1988 USENIX Conference, USENIX Association, Berkeley, June 1988, pp. 1–14.
Joel McCormack. The Right Language for the Job. UNIX Review, REVIEW Publications Co., Renton, Washington, Vol. 3, No. 9, September 1985, pp. 22–32.
Joel McCormack, Richard Gleaves. Modula-2: A Worthy Successor to Pascal, BYTE, Byte Publications, Peterborough, New Hampshire, Vol. 8, No. 4, April 1983, pp. 385–395.

References

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
