A binary recompiler is a compiler that takes executable binary files as input, analyzes their structure, applies transformations and optimizations, and outputs new optimized executable binaries. [1]
The foundation to the concepts of binary recompilation were laid out by Gary Kildall [2] [3] [4] [5] [6] [7] [8] with the development of the optimizing assembly code translator XLT86 in 1981. [4] [9] [10] [11]
Gary Arlen Kildall was an American computer scientist and microcomputer entrepreneur.
The Intel 8080 ("eighty-eighty") is the second 8-bit microprocessor designed and manufactured by Intel. It first appeared in April 1974 and is an extended and enhanced variant of the earlier 8008 design, although without binary compatibility. The initial specified clock rate or frequency limit was 2 MHz, with common instructions using 4, 5, 7, 10, or 11 cycles. As a result, the processor is able to execute several hundred thousand instructions per second. Two faster variants, the 8080A-1 and 8080A-2, became available later with clock frequency limits of 3.125 MHz and 2.63 MHz respectively. The 8080 needs two support chips to function in most applications: the i8224 clock generator/driver and the i8228 bus controller. It is implemented in N-type metal-oxide-semiconductor logic (NMOS) using non-saturated enhancement mode transistors as loads thus demanding a +12 V and a −5 V voltage in addition to the main transistor–transistor logic (TTL) compatible +5 V.
The 8086 is a 16-bit microprocessor chip designed by Intel between early 1976 and June 8, 1978, when it was released. The Intel 8088, released July 1, 1979, is a slightly modified chip with an external 8-bit data bus, and is notable as the processor used in the original IBM PC design.
The Z80 is an 8-bit microprocessor introduced by Zilog as the startup company's first product. The Z80 was conceived by Federico Faggin in late 1974 and developed by him and his 11 employees starting in early 1975. The first working samples were delivered in March 1976, and it was officially introduced on the market in July 1976. With the revenue from the Z80, the company built its own chip factories and grew to over a thousand employees over the following two years.
CP/M, originally standing for Control Program/Monitor and later Control Program for Microcomputers, is a mass-market operating system created in 1974 for Intel 8080/85-based microcomputers by Gary Kildall of Digital Research, Inc. Initially confined to single-tasking on 8-bit processors and no more than 64 kilobytes of memory, later versions of CP/M added multi-user variations and were migrated to 16-bit processors.
Digital Research, Inc. was a company created by Gary Kildall to market and develop his CP/M operating system and related 8-bit, 16-bit and 32-bit systems like MP/M, Concurrent DOS, FlexOS, Multiuser DOS, DOS Plus, DR DOS and GEM. It was the first large software company in the microcomputer world. Digital Research was originally based in Pacific Grove, California, later in Monterey, California.
MP/M is a discontinued multi-user version of the CP/M operating system, created by Digital Research developer Tom Rolander in 1979. It allowed multiple users to connect to a single computer, each using a separate terminal.
86-DOS is a discontinued operating system developed and marketed by Seattle Computer Products (SCP) for its Intel 8086-based computer kit. Initially known as QDOS, the name was changed to 86-DOS once SCP started licensing the operating system in 1980.
In computing, binary translation is a form of binary recompilation where sequences of instructions are translated from a source instruction set to the target instruction set. In some cases such as instruction set simulation, the target instruction set may be the same as the source instruction set, providing testing and debugging features such as instruction trace, conditional breakpoints and hot spot detection.
The PL/M programming language (an acronym of Programming Language for Microcomputers) is a high-level language conceived and developed by Gary Kildall in 1973 for Hank Smith at Intel for its microprocessors.
A fat binary is a computer executable program or library which has been expanded with code native to multiple instruction sets which can consequently be run on multiple processor types. This results in a file larger than a normal one-architecture binary file, thus the name.
The IMSAI 8080 was an early microcomputer released in late 1975, based on the Intel 8080 and later 8085 and S-100 bus. It was a clone of its main competitor, the earlier MITS Altair 8800. The IMSAI is largely regarded as the first "clone" microcomputer. The IMSAI machine ran a highly modified version of the CP/M operating system called IMDOS. It was developed, manufactured and sold by IMS Associates, Inc.. In total, between 17,000 and 20,000 units were produced from 1975 to 1978.
Multiuser DOS is a real-time multi-user multi-tasking operating system for IBM PC-compatible microcomputers.
Digital Systems Inc., Seattle, USA, between 1966 and 1979 an accounting service and technology development company founded by John Q. Torode. The company was reorganized into the microcomputer design and development company Digital Microsystems, Inc. (DMS), Oakland, USA, founded in 1979. In 1984, it was sold to the new UK operation Digital Microsystems Ltd. (DML) and finally ended its US operations in 1986. Without Torode, Digital Microsystems Ltd.'s product HiNet was sold to Apricot Computers Plc in 1987. In 1986, Torode founded a new company, IC Designs, Inc., based partly on Theodore "Ted" H. Kehl's VLSI technology at the University of Washington (UW), which was bought by Cypress Semiconductor Corp. in 1993.
Relocation is the process of assigning load addresses for position-dependent code and data of a program and adjusting the code and data to reflect the assigned addresses. Prior to the advent of multiprocess systems, and still in many embedded systems, the addresses for objects were absolute starting at a known location, often zero. Since multiprocessing systems dynamically link and switch between programs it became necessary to be able to relocate objects using position-independent code. A linker usually performs relocation in conjunction with symbol resolution, the process of searching files and libraries to replace symbolic references or names of libraries with actual usable addresses in memory before running a program.
Intel hexadecimal object file format, Intel hex format or Intellec Hex is a file format that conveys binary information in ASCII text form. It is commonly used for programming microcontrollers, EPROMs, and other types of programmable logic devices and hardware emulators. In a typical application, a compiler or assembler converts a program's source code to machine code and outputs it into a HEX file. Some also use it as a container format holding packets of stream data. Common file extensions used for the resulting files are .HEX or .H86. The HEX file is then read by a programmer to write the machine code into a PROM or is transferred to the target system for loading and execution.
A source-to-source translator, source-to-source compiler, transcompiler, or transpiler is a type of translator that takes the source code of a program written in a programming language as its input and produces an equivalent source code in the same or a different programming language. A source-to-source translator converts between programming languages that operate at approximately the same level of abstraction, while a traditional compiler translates from a higher level programming language to a lower level programming language. For example, a source-to-source translator may perform a translation of a program from Python to JavaScript, while a traditional compiler translates from a language like C to assembler or Java to bytecode. An automatic parallelizing compiler will frequently take in a high level language program as an input and then transform the code and annotate it with parallel code annotations or language constructs.
CBASIC is a compiled version of the BASIC programming language written for the CP/M operating system by Gordon Eubanks in 1976–1977. It is an enhanced version of BASIC-E.
CPMulator is a program to emulate the CP/M operating system under x86 DOS. The program was developed in 1984 by Keystone Software Development. The company was owned and operated by Jay Sprenkle.
ISIS, short for Intel System Implementation Supervisor, is an operating system for early Intel microprocessors like the 8080. It was originally developed by Ken Burgett and Jim Stein under the management of Steve Hanna and Terry Opdendyk for the Intel Microprocessor Development System with two 8" floppy drives, starting in 1975, and later adopted as ISIS-II as the operating system for the PL/M compiler, assembler, link editor, and In-Circuit Emulator. The ISIS operating system was developed on an early prototype of the MDS 800 computer, the same type of hardware that Gary Kildall used to develop CP/M.
[…] "Unless you have a translating scheme that takes account of the peculiar idiosyncrasies of the target microprocessor, there is no way that an automatic translator can work," explains Daniel Davis, a programmer with Digital Research. "You'll end up with direct transliterations." […] In spite of all these limitations, progress has been made recently in the development of translators. Most notably, Digital Research has introduced its eight- to 16-bit assembly code translator. Based on research performed by Digital Research president Gary Kildall, the XLT86 appears to offer advances over previously available software translator technology. Like Sorcim's Trans and Intel's Convert 86, Kildall's package translates assembly-language code from an 8080 microprocessor to an 8086. However, Kildall has applied a global flow analysis technique that takes into account some of the major drawbacks of other translators. The procedure analyzes the register and flag usage in sections of 8080 code in order to eliminate nonessential code. According to Digital Research programmer Davis, the algorithm Kildall uses allows the translator to consider the context as it translates the program. Until now, one of the major problems with any translator program has been the inability of the software to do much more than transliteration. If Digital Research's new translator actually advances the technology to the point where context can be considered, then more software translators may proliferate in the microcomputer marketplace.
In March, 1995, the Software Publishers Association posthumously honored Gary for his contributions to the computer industry. They listed some of his accomplishments: […] In the 1980s, through DRI, he introduced a binary recompiler. […]
[…] Rolander: I mentioned earlier that Gary liked to approach a problem as an architect. […] And he would draw the most beautiful pictures of his data structures. […] And when he finished that […] and was convinced those data structures were now correct, he would go into just an unbelievable manic coding mode. He would just go for as many as 20 hours a day […] he was just gone during these periods of time. On a couple of those occasions, when he'd get something running the first time, which could be in the middle of night. And all you who have written software have seen that, for example, that the first time it comes up on the screen, you’ve got to tell somebody. My wife Lori will tell you that I had a couple of those calls in the middle of the night, LOGO was one example, XLT 86 was another, where he got it running the first time, and he had to have somebody see it. So it didn't matter what time it was, he'd call me, I'd have to come over and see it running. […](33 pages)
[…] XLT-86 is an analytical translator program written in PL/I-80. It reads the entire 8080 source program, assembles it to machine code, analyzes the register, memory and flag utilization, and emits an optimized 8086 assembly-language program. […] The program translation proceeds in a five-step process. First, the program is scanned and assembled to produce symbol values and locations. Second, the program structure is analyzed and decomposed into basic blocks. Third, the basic blocks are analyzed to determine program flow and resource usage. Forth, the block structure and register allocation data is gathered into a listing for the user. Fifth, the flow information and source program are used to produce the 8086 source program. […]
[…] Kildall: […] A year and a half ago I was probably spending 75% of my time on the business and 25% on programming. XLT-86 was a product I was working on at that time, and it took me nine months to do it. That would have been a three-month project if I had been able to concentrate on it. […]
[…] PC: What are some of the complexities involved in translating a program from 8080 to 8086 form? Kildall: Straight translations at the source program level you can do pretty much mechanically. For example, an 8080 "Add immediate 5" instruction turns into an "Add AL 5" on the 8086 — very straightforward translation of the op codes themselves. The complexity in mechanical translation comes from situations such as this: The 8080 instruction DAD H takes the HL register and adds DE to it. For the 8086 the equivalent instruction would be something like ADD DX BX, which is fine, no particular problem. You just say the DX register is the same as HL and BX the same as DE. The problem is that the 8086 instruction has a side effect of setting the zero flag, and the 8080 instruction does not. In mechanical translation you end up doing something like saving the flags, restoring the flags, doing some shifts and rotates, and so forth. These add about five or six extra instructions to get the same semantic effect. There are a lot of sequences in 8080 code that produce very strange sequences in 8086 code; they just don't map very well because of flag registers and things of that sort. The way we get software over is a thing called XLT-86. It's been out six months or so. PC: By "better" code do you mean smaller? Kildall: Twenty percent smaller than if you just took every op code and did a straight translation, saving the registers to preserve semantics. PC: How does the size of the translated program compare to the 8080 version? Kildall: If you take an 8080 program, move it over to 86 land and do an XLT-86 translation, you'll find that it is roughly 10 to 20 percent larger. With 16-bit machines it's more difficult to address everything; you get op codes that are a little bit bigger on the average. An interesting phenomenon is that one of the reasons you don't get a tremendous speed increase in the 16-bit world is because you're running more op codes over the data bus. […]