SQUOZE

SQUOZE (abbreviated as SQZ) is a memory-efficient representation of a combined source and relocatable object program file, with a symbol table, on punched cards. It was introduced in 1958 with the SCAT assembler [1] [2] on the SHARE Operating System (SOS) for the IBM 709. [3] [4] A program in this format was called a SQUOZE deck. [5] [6] [7] The format was also used on later machines, including the IBM 7090 and 7094.

Encoding

In the SQUOZE encoding, identifiers in the symbol table were represented in a 50-character alphabet, allowing a 36-bit machine word to hold six alphanumeric characters plus two flag bits. [6] [1] Storing each character in its own six-bit field would use all 36 bits, because six bits can distinguish 64 states even though only 50 are needed for the alphabet. Treating the six characters as a single base-50 number needs only 34 bits, since 50⁶ < 2³⁴, which frees two bits in every word.
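The bit counts behind this saving can be checked with a few lines of Python. This is only an illustrative sketch; the helper name `bits_needed` is hypothetical and does not come from the SOS manual.

```python
def bits_needed(alphabet_size: int, length: int) -> int:
    """Bits required to store any string of `length` symbols as one base-N number."""
    largest = alphabet_size ** length - 1  # largest value such a string can encode
    return largest.bit_length()


per_character = 6 * bits_needed(50, 1)  # six separate 6-bit fields
combined = bits_needed(50, 6)           # one six-digit base-50 number

print(per_character, combined)          # 36 34
assert 50 ** 6 < 2 ** 34
```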

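SQUOZE character codes [1]

Row headings give the most significant digits of each code; column headings give the least significant digits.

Dec  Oct  Bin  |  +0     +1   +2   +3   +4   +5   +6   +7    (Dec)
               |  0      1    2    3    4    5    6    7     (Oct)
               |  000    001  010  011  100  101  110  111   (Bin)
+0   0    000  |  space  0    1    2    3    4    5    6
+8   1    001  |  7      8    9    A    B    C    D    E
+16  2    010  |  F      G    H    I    J    K    L    M
+24  3    011  |  N      O    P    Q    R    S    T    U
+32  4    100  |  V      W    X    Y    Z    = #  / %  ) ⌑
+40  5    101  |  + &    -    - @  + &  -    *    /    $
+48  6    110  |  ,      .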
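The row and column layout of the table corresponds to splitting each code into a multiple of eight and a single octal digit. The sketch below covers only codes 0 to 36 (space, digits and letters); the punctuation codes are left out, and the names `ALPHABET` and `table_position` are illustrative.

```python
import string

# Codes 0-36 only: space, then 0-9, then A-Z (punctuation codes 37-49 are omitted).
ALPHABET = " " + string.digits + string.ascii_uppercase


def table_position(ch: str) -> tuple[int, int]:
    """Return (row offset, column) of a character in the code table."""
    code = ALPHABET.index(ch)
    row, column = divmod(code, 8)
    return row * 8, column


code = ALPHABET.index("S")
print(code, oct(code), format(code, "06b"))  # 29 0o35 011101
print(table_position("S"))                   # (24, 5)
```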

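Base 50 already saves one bit for every three characters, because 50³ = 125,000 is less than 2¹⁷ = 131,072, so three characters fit in 17 bits rather than 18. Each symbol was therefore encoded as two three-character chunks. The manual [1] gives the following formula for encoding six characters ABCDEF, where each letter stands for the base-50 code of one character:

(A × 50² + B × 50 + C) × 2¹⁷ + (D × 50² + E × 50 + F)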

For example, "SQUOZE" would normally occupy 36 bits as six 6-bit codes, 35 33 37 31 44 17 (base 8). Encoded as two 17-bit pieces, it fits in 34 bits: (0o220231 << 17) | 0o175473 == 0o110114575473.
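The two-chunk packing quoted from the manual in reference [1] can be sketched in Python as follows. This is a minimal illustration, not the SCAT implementation. The function names are hypothetical, only the codes for space, digits and letters (0 to 36) are included, short symbols are simply padded with trailing blanks, and the heading character and the two flag bits described in the manual are ignored.

```python
import string

# Codes 0-36: space, 0-9, A-Z. Punctuation codes 37-49 are left out of this sketch.
ALPHABET = " " + string.digits + string.ascii_uppercase


def pack3(chunk: str) -> int:
    """Encode three characters as one base-50 number (always below 2**17)."""
    a, b, c = (ALPHABET.index(ch) for ch in chunk)
    return a * 50**2 + b * 50 + c


def squoze_encode(symbol: str) -> int:
    """Pack up to six characters into 34 bits as two 17-bit halves."""
    padded = symbol.ljust(6)  # fill unused low-order positions with blanks
    if len(padded) != 6:
        raise ValueError("symbol must have at most six characters")
    return (pack3(padded[:3]) << 17) | pack3(padded[3:])


def squoze_decode(word: int) -> str:
    """Invert squoze_encode, returning all six characters."""
    chars = []
    for half in (word >> 17, word & (2**17 - 1)):
        a, rest = divmod(half, 50**2)
        b, c = divmod(rest, 50)
        chars += [ALPHABET[a], ALPHABET[b], ALPHABET[c]]
    return "".join(chars)


word = squoze_encode("SQUOZE")
print(oct(word))            # 0o110114575473
print(squoze_decode(word))  # SQUOZE
```

Splitting the symbol into two halves keeps every intermediate value below 2¹⁷, which is why each half can be computed and stored independently before the shift.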

A simpler example of the same logic is a three-digit BCD number, which takes up 12 bits. For instance, 987 is stored as 9 8 7 (base 16), or 1001 1000 0111 (base 2). Any such value could instead be stored directly in 10 bits, saving two bits: 987 is 3DB (base 16), or 11 1101 1011 (base 2).
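The same comparison can be reproduced in Python. This is a small sketch, and the helper name `to_bcd` is illustrative.

```python
def to_bcd(n: int) -> int:
    """Pack the decimal digits of n into 4-bit fields (packed BCD)."""
    result = 0
    for shift, digit in enumerate(reversed(str(n))):
        result |= int(digit) << (4 * shift)
    return result


bcd = to_bcd(987)
print(hex(bcd), format(bcd, "012b"))  # 0x987 100110000111
print(hex(987), format(987, "010b"))  # 0x3db 1111011011
assert (999).bit_length() == 10       # every three-digit value fits in 10 bits
```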

Etymology

"Squoze" is a facetious past participle of the verb 'to squeeze'. [5] [6]

The name SQUOZE was later borrowed for similar schemes used on DEC machines; [4] they had a 40-character alphabet (50 in octal) and were called DEC RADIX 50 and MOD40, [8] but sometimes nicknamed DEC Squoze.

References

  1. SHARE 709 System Committee, ed. (June 1961) [1959]. "Section 02: SCAT Language; Appendix 1: Table of Permissible Characters; Appendix 3: SQUOZE Deck Format - Chapter 8: Dictionary". SOS Reference Manual - SHARE System for the IBM 709 (PDF). New York, USA: SOS Group, International Business Machines Corporation. pp. 02.00.01 – 02.00.11, 12.03.08.01 – 12.03.08.02, 12.01.00.01. X28-1213. Distribution No. 1–5. Archived (PDF) from the original on 2020-06-18. Retrieved 2020-06-18. pp. 12.03.08.01 – 12.03.08.02: […] Bit Positions Used […] Bit 0 […] Bit 1 […] Bits 2–35 […] Base 50 representation of the symbol with heading character. […] The base 50 representation of a symbol is obtained as follows: […] a. If the symbol has fewer than five characters, it is headed (by blank if it is in an unheaded region). […] b. The symbol with it[s] heading character is left-justified and any unused low-order positions are filled with blanks. […] c. Each character in the symbol is replaced by it[s] base 50 equivalent. […] d. The result is then converted by the following: if the symbol, after each character is rep[l]aced by its base 50 equivalent, is ABCDEF, its base 50 representation is (A*50²+B*50+C)*2¹⁷+(D*50²+E*50+F). […]
  2. Salomon, David (February 1993) [1992]. Written at California State University, Northridge, California, USA. Chivers, Ian D. (ed.). Assemblers and Loaders (PDF). Ellis Horwood Series In Computers And Their Applications (1 ed.). Chichester, West Sussex, UK: Ellis Horwood Limited / Simon & Schuster International Group. ISBN 0-13-052564-2. Archived (PDF) from the original on 2020-03-23. Retrieved 2008-10-01. (xiv+294+4 pages)
  3. Jacob, Bruce; Ng, Spencer W.; Wang, David T.; Rodriguez, Samuel (2008). "Part I Chapter 3.1.3 On-Line Locality Optimizations: Dynamic Compression of Instructions and Data". Memory Systems: Cache, DRAM, Disk. The Morgan Kaufmann Series in Computer Architecture and Design. Morgan Kaufmann Publishers / Elsevier. p. 147. ISBN 978-0-12-379751-3. (900 pages)
  4. Jones, Douglas W. (2018). "Lecture 7, Object Codes, Loaders and Linkers - Final steps on the road to machine code". Operating Systems, Spring 2018. Part of the CS:3620 Operating Systems Collection. The University of Iowa, Department of Computer Science. Archived from the original on 2020-06-06. Retrieved 2020-06-06.
  5. Boehm, Elaine M.; Steel, Jr., Thomas B. (June 1958). Machine Implementation of Symbolic Programming - Summary of a Paper to be Presented at the Summer 1958 Meeting of the ACM. ACM '58: Preprints of papers presented at the 13th national meeting of the Association for Computing Machinery. pp. 17-1–17-3. doi:10.1145/610937.610953. Archived from the original on 2020-06-06. Retrieved 2020-06-06. (3 pages)
  6. Boehm, Elaine M.; Steel, Jr., Thomas B. (April 1959). "The SHARE 709 System: Machine Implementation of Symbolic Programming". Journal of the ACM. 6 (2): 134–140. doi:10.1145/320964.320968. S2CID 16545134. Archived from the original on 2020-06-04. Retrieved 2020-06-04. pp. 137–138: […] There is an interesting feature related to the encoding of symbols for inclusion in the dictionary. In the usual mode of expression, symbols may be constructed from a set of 50 characters. If encoding were character by character, six bits would be required for the representation of each such character. As a symbol may contain as many as six characters, a total of 36 bits would be required for the representation of each symbol. This might seem convenient, as the length of a 709 word is exactly 36 bits, but a moment's consideration shows that it is unfortunate as it would be desirable to have a bit or two available in the same word as the symbol representation, giving a clue to the nature of the symbol. These flagging bits can be obtained. Let each character possible represent a digit in a number system having a base of fifty. Now six character symbols may be read as natural numbers in a base fifty system. If these numbers are converted to the usual base two system, only 34 bits are required for the maximum number and a gain of two flag bits has been made. This has the incidental feature of decreasing the requisite number of bits for representing the entire code, but conversion time would outweigh the saving by a significant margin were it not for the peculiar length of the 709 word. Here is a clear illustration of the critical effect the precise specifications of the machine concerned hold over the details of an encoding schema. […] (7 pages)
  7. Shell, Donald L. (April 1959) [October 1958]. "The SHARE 709 System: A Cooperative Effort". Journal of the ACM. 6 (2): 123–127. doi:10.1145/320964.320966. S2CID 16476514. Archived from the original on 2020-06-17. Retrieved 2020-06-16. (5 pages)
  8. "8.10 .RAD50". PAL-11R Assembler - Programmer's Manual - Program Assembly Language and Relocatable Assembler for the Disk Operating System (2nd revised printing ed.). Maynard, Massachusetts, USA: Digital Equipment Corporation. May 1971 [February 1971]. p. 8-8. DEC-11-ASDB-D. Retrieved 2020-06-18. p. 8-8: […] PDP-11 systems programs often handle symbols in a specially coded form called RADIX 50 (this form is sometimes referred to as MOD40). This form allows 3 characters to be packed into 16 bits […]

Further reading