IBM Shoebox


The IBM Shoebox was a 1961 IBM computer that was able to perform mathematical functions and provide speech recognition. It was capable of recognizing 16 spoken words, including the digits from 0 through 9.


It was developed by William C. Dersch in the Advanced Systems Development Division Laboratory at IBM. [1]

History

It was displayed at the IBM Pavilion during the 1962 Seattle World's Fair. [1]

It was approximately the size and shape of a standard American shoebox. It had a display of ten small lamp lights labeled with the digits 0 through 9 and an attached microphone. Speaking the name of the digit into the microphone would cause the appropriate digit lamp to light. [1]

Inside the box were a power supply, three analog audio filters, and (presumably) diode-resistor logic circuitry. The design recognized each spoken digit name (“Zero”, “One”, “Two” … “Nine”) by its front, middle, and ending sounds (some words have no middle sound), classifying each sound as high-pitched, middle-pitched, or low-pitched. For example, “Five” is High-Middle-High and “Zero” is High-Middle-Low. The microphone fed the three audio filters (high-, middle-, and low-pass); their outputs latched a logic-based decoder, which switched on one of the ten lamps.[citation needed]
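The decoding scheme described above can be illustrated with a small software analogy. This is a minimal sketch under stated assumptions, not a model of the actual circuit, which used analog filters and diode logic rather than software. It assumes that each sound segment is labeled by whichever filter band (low, middle, or high) carries the most energy, and the numeric band energies are made up. The lookup table contains only the two patterns given in the text (“Five” and “Zero”); the other eight digits are deliberately left out because their patterns are not documented here.

```python
from typing import Optional, Sequence, Tuple

# Pitch classes, one per filter band.
HIGH, MIDDLE, LOW = "High", "Middle", "Low"

# Hypothetical decoder table: only the two patterns documented in the
# article are included; the other digits' patterns are unknown.
DIGIT_PATTERNS = {
    (HIGH, MIDDLE, HIGH): 5,  # "Five"
    (HIGH, MIDDLE, LOW): 0,   # "Zero"
}


def classify_segment(low: float, mid: float, high: float) -> str:
    """Label one sound segment by the band with the most energy (assumed rule)."""
    energies = {LOW: low, MIDDLE: mid, HIGH: high}
    return max(energies, key=energies.get)


def decode_word(segments: Sequence[Tuple[float, float, float]]) -> Optional[int]:
    """Map a word's front/middle/end (or front/end) segments to a digit, or None."""
    pattern = tuple(classify_segment(*seg) for seg in segments)
    return DIGIT_PATTERNS.get(pattern)


if __name__ == "__main__":
    # Made-up (low, mid, high) band energies for a "five"-like utterance.
    five = [(0.1, 0.2, 0.9), (0.2, 0.8, 0.1), (0.1, 0.3, 0.7)]
    print(decode_word(five))  # 5
```

A word whose pattern is not in the table (including two-segment words with no middle sound) decodes to `None`, which in this analogy corresponds to no lamp lighting.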

Early work in natural language processing, such as the IBM Shoebox, influenced later developments in speech recognition, including applications such as "voice dialing", "call routing", and "automated appliance control". [2]


References

  1. "IBM Shoebox". IBM. Archived from the original on January 19, 2005.
  2. Rost, Michael (December 3, 2015). Teaching and Researching Listening: Third Edition (Applied Linguistics in Action). Routledge. p. 92. ISBN 978-1138840386.