Voice modem command set

A voice modem is an analog telephone data modem with a built-in capability of transmitting and receiving voice recordings over the phone line. Voice modems are used for telephony and answering machine applications. Similar to the Hayes command set used for data modems, in which the host PC commands the modem via a series of commands known as AT commands, there exists a well-defined set of common voice AT commands that are somewhat consistent throughout the industry.

Implementation problems

Because voice mode is not the typical use for a modem, many modems on the market have poor or buggy support for their voice modes. Characteristics of a good voice modem depend greatly upon the intended application, and include:

Plus versus Hash

Each voice modem platform tends to support one of two sets of voice commands: one flavor of the command set uses a plus (+) sign, and the other uses a hash (#) sign.

Detecting voice mode

Support for voice mode can be detected on a modem by issuing the following command: AT+FCLASS=?

This command usually takes the plus sign regardless of whether the modem supports the "plus" or the "hash" command set, because the command (which stands for "fax class") is part of the industry-standard fax commands, which always use the plus.

A modem supporting voice will respond with a comma-delimited list of numbers that includes the number 8. A modem not supporting voice will respond with ERROR, or with a list of numbers not including 8. (Many modems report 0,1,2, indicating support for data (0) and for class 1 and class 2 fax; such a response indicates that voice support is not present.)

Modems supporting the "hash" command set usually respond to AT#CLS=? as well.
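The check described above can be sketched in a few lines of Python. This is a minimal sketch rather than a definitive parser: the function name is hypothetical, and the handling of an echoed command, a "+FCLASS:" prefix, and surrounding parentheses are assumptions about reply formats some modems may use. A modem that reports classes as a range (such as 0-8) would need extra handling.

```python
def supports_voice(reply: str) -> bool:
    """Return True if a reply to AT+FCLASS=? (or AT#CLS=?) lists class 8."""
    for line in reply.splitlines():
        line = line.strip()
        if line.upper().startswith("AT"):
            continue  # echoed command, not part of the answer
        if ":" in line:
            # Assumption: some modems prefix the list, e.g. "+FCLASS: 0,1,8"
            line = line.split(":", 1)[1]
        # Assumption: the list may be wrapped in parentheses
        tokens = [t.strip() for t in line.strip("() ").split(",")]
        if "8" in tokens:
            return True
    return False  # ERROR, OK only, or a list without 8
```

Tokens are compared as whole strings, so a value such as 18 or 1.0 is not mistaken for class 8.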

Entering voice mode

The command AT+FCLASS=8 or AT#CLS=8 puts the modem in voice mode. Most modems remain on-hook and respond with OK. Once this command has been accepted, most modems respond with Data Link Escape (DLE) messages instead of, or in addition to, normal modem responses. For example, instead of reporting a ringing phone line with the RING message, many modems send the ASCII DLE character followed by the letter R. The set of DLE events a modem reports depends on its chipset and is documented in the chipset's reference guide.

Querying the modem's capabilities

The command AT+VLS=? or AT#VLS=? usually returns a list of operating modes that are specific to each modem. Each of these numbered modes determines the telephone line's on-hook or off-hook status, as well as sound routing between each of the following:

Many chipsets offer a listing of all the possible combinations of modes even if the specific modem board doesn't support them all. That's because the board manufacturer is almost always different from the chipset maker, and the chipset comes pre-configured to support all possible hardware, even if not implemented on the circuit board.

Example of response to AT+VLS=? from a modem on the market in 2006:

AT+VLS=?
0,"",0000000000,0000000000,B084008000
1,"T",0B8418E000,0FE418E000,0B8419E000
2,"L",0884008000,0CE4008000,0884018000
3,"LT",0B8418E000,0FE418E000,0B8419E000
4,"S",0084008000,0484008000,3084018000
5,"ST",0B8418E000,0FE418E000,0B8419E000
6,"M",0084008000,04E4008000,3084008000
7,"MST",0B8418E000,0FE418E000,0B8419E000
8,"S1",0084008000,0484008000,3084018000
9,"S1T",0B8418E000,0FE418E000,0B8419E000
10,"MS1T",0B8418E000,0FE418E000,0B8419E000
11,"M1",0084008000,04E4008000,3084008000
13,"M1S1T",0B8418E000,0FE418E000,0B8419E000
14,"H",0084008000,04E4008000,3084018000
15,"HT",0B8418E000,0FE418E000,0B8419E000
16,"MS",0084008000,04E4008000,3084018000
17,"MS1",0084008000,04E4008000,3084018000
19,"M1S1",0084008000,04E4008000,3084018000
20,"t",0B8418E000,0FE418E000,BB8419E000

While every modem is different, usually mode 0 means on-hook (hung up) and mode 1 is sufficient to pick up the phone, record/playback audio, and detect DTMF (touch tones).
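A host program might turn such a reply into a lookup table before choosing a mode. The sketch below assumes the reply arrives one entry per line, with the mode number first and a quoted label second, as in the example above. The function name is hypothetical, and the three trailing hexadecimal fields are ignored.

```python
import csv


def parse_vls(reply: str) -> dict:
    """Map each mode number in an AT+VLS=? reply to its quoted label."""
    modes = {}
    # csv handles the quoted label field; non-entry lines are skipped below
    for row in csv.reader(reply.splitlines(), skipinitialspace=True):
        if len(row) >= 2 and row[0].strip().isdigit():
            modes[int(row[0])] = row[1].strip()
    return modes
```

Lines such as the echoed command or the final OK have no numeric first field and are ignored.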

The command AT+VSM=? or AT#VSM=? usually returns a list of audio data formats supported by the modem. Each format includes a name (such as PCM, ADPCM, μ-law, A-law), a number of bits per sample (usually 2, 3, 4, 8, or 16) and an audio sampling rate (usually 7,200, 8,000, or 11,025 Hertz). These are industry-standard audio codecs whose implementations are well published. The ADPCM standard is an exception. Modems claiming to support ADPCM almost always support Dialogic ADPCM, also known as "VOX", which is similar to but not compatible with other ADPCM implementations, including Interactive Multimedia Association (IMA) ADPCM and MS ADPCM (a Microsoft implementation used in WAV files). Modems may support these as well if a qualifier is listed; otherwise, by default, ADPCM means Dialogic.

Example response to AT+VSM=? from a modem on the market in 2006:

AT+VSM=?
1,"UNSIGNED PCM",8,0,8000,0,0
129,"IMA ADPCM",4,0,8000,0,0
130,"UNSIGNED PCM",8,0,8000,0,0
140,"2 Bit ADPCM",2,0,8000,
141,"4 Bit ADPCM",4,0,8000,0,0

The desired audio data format is selected using the same command but with a number instead of a question mark. It is used for both sending and receiving.
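A reply like the one above can be parsed into records before a format is chosen. The following is only a sketch: the field positions (format number, name, bits per sample, then the sampling rate in the fifth field) are inferred from the example reply, not taken from a specification, and the names VoiceFormat and parse_vsm are hypothetical. Modems that report a range of sampling rates would need a more careful parser.

```python
import csv
from typing import NamedTuple


class VoiceFormat(NamedTuple):
    ident: int      # number passed back to the modem to select the format
    name: str       # codec name as reported by the modem
    bits: int       # bits per sample
    rate_hz: int    # sampling rate in hertz


def parse_vsm(reply: str) -> list:
    """Extract the audio formats listed in an AT+VSM=? reply."""
    formats = []
    for row in csv.reader(reply.splitlines(), skipinitialspace=True):
        if len(row) < 5 or not row[0].strip().isdigit():
            continue  # echo, OK, blank line, or an unexpected layout
        try:
            formats.append(
                VoiceFormat(int(row[0]), row[1].strip(), int(row[2]), int(row[4]))
            )
        except ValueError:
            continue  # a field was not a plain integer; skip the entry
    return formats
```

Entries with missing trailing fields are still accepted as long as the first five fields are present.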

Answering calls

Calls are usually answered with the AT+VLS=n or AT#VLS=n command, where n is a number representing the modem's mode. For the vast majority of modems, this number is 1 to answer a telephone call and 0 to hang up; other numbers activate other functionality when present, such as speakerphone. Some modems answer in response to ATA (the standard data-mode answer command), but other modems interpret this as a command to answer in data mode rather than voice mode.

Transmitting audio data

To begin transmitting audio data, the host sends the command AT+VTX or AT#VTX. This results in a response from the modem of CONNECT or VCON. (Modems using the "plus" command set usually respond CONNECT, while those using the "hash" set respond VCON, which stands for voice connect.)
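Because the two flavors differ only in the prefix character (and in the name of the class command), a host program can build command strings from one table. The helper below is a hypothetical sketch based solely on the command names used in this article; the reply table reflects the "usual" behavior described above, not a guarantee.

```python
# Usual reply to the transmit command for each flavor, per the text above.
USUAL_TX_REPLY = {"+": "CONNECT", "#": "VCON"}


def voice_command(flavor: str, name: str, arg=None) -> str:
    """Build an AT voice command for the "plus" or "hash" command set."""
    if flavor not in ("+", "#"):
        raise ValueError("flavor must be '+' or '#'")
    # The class command is +FCLASS in the plus set and #CLS in the hash set.
    if name == "FCLASS" and flavor == "#":
        name = "CLS"
    command = f"AT{flavor}{name}"
    if arg is not None:
        command += f"={arg}"
    return command
```

For example, voice_command("#", "FCLASS", 8) yields AT#CLS=8, and voice_command("+", "VTX") yields AT+VTX.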

From then on, the modem interprets any data sent from the computer as wave audio data, using the codec selected by the AT+VSM or AT#VSM command.

The audio data is always sent to the modem slightly faster than the modem can play it, so that the modem can buffer a small portion and play it smoothly, with no clicks or pops caused by delays in the computer's operating system. For example, playing an 8 kHz audio file at 8-bit resolution produces 8,000 bytes per second, or 80,000 bits per second once a start bit and a stop bit are added to each byte, so the serial port must run at 115,200 bit/s or faster. (115,200 bit/s is the first setting of a typical computer serial port that is greater than 80,000.) Doubling the DLE bytes in the stream (described below) adds a little further overhead, so some spare bandwidth is required on top of the raw audio rate.
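The arithmetic above can be written out as a small calculation. This sketch assumes 10 bits on the wire per byte (8 data bits plus one start and one stop bit) and a list of commonly available PC serial port rates; both the list and the function names are illustrative assumptions.

```python
import math

# Assumed set of commonly available PC serial port rates, in bit/s.
STANDARD_RATES = (9600, 19200, 38400, 57600, 115200, 230400)


def required_bits_per_second(sample_rate_hz, bytes_per_sample=1, dle_fraction=0.0):
    """Wire rate needed for real-time audio, counting start and stop bits.

    dle_fraction is the share of audio bytes that equal DLE and are doubled.
    """
    payload_bytes = sample_rate_hz * bytes_per_sample * (1 + dle_fraction)
    return math.ceil(payload_bytes * 10)  # 8 data bits + start + stop


def minimum_port_rate(required_bps):
    """Return the first standard rate strictly greater than the requirement."""
    for rate in STANDARD_RATES:
        if rate > required_bps:
            return rate
    raise ValueError("no standard rate is fast enough")
```

With 8,000 one-byte samples per second the requirement is 80,000 bit/s, and the first standard rate above that is 115,200 bit/s.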

When the modem wants the computer to temporarily pause so the playback can catch up, it temporarily lowers the CTS (Clear to Send) signal on the RS-232 serial port. The modem re-raises the signal in time for the computer to resume sending audio data before the playback buffer becomes completely empty.

When the computer wants to signal the end of audio data, most modems expect to see an ASCII DLE character (0x10), followed by the ! character.

Because the DLE byte can and often does occur in normal audio data, it must be sent twice to the modem when it is to be interpreted as a byte of audio data.

Most modems also accept the sequence DLE + CAN (cancel) as a signal to cancel audio playback. The difference is that the modem aborts playback immediately instead of letting the data remaining in the playback buffer play to completion.
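The byte-level rules for playback can be sketched as follows. The helper names are hypothetical, and the end-of-audio and cancel sequences are the ones described above, which most (not all) modems are said to accept; the modem's reference guide remains the authority.

```python
DLE = 0x10  # ASCII Data Link Escape
CAN = 0x18  # ASCII Cancel

END_OF_AUDIO = bytes([DLE]) + b"!"     # let buffered audio play out, then stop
ABORT_PLAYBACK = bytes([DLE, CAN])     # stop now and discard buffered audio


def escape_audio(samples: bytes) -> bytes:
    """Double every DLE byte so the modem treats it as audio data."""
    return samples.replace(bytes([DLE]), bytes([DLE, DLE]))


def frame_playback(samples: bytes) -> bytes:
    """Escape a complete message and append the end-of-audio marker."""
    return escape_audio(samples) + END_OF_AUDIO
```

Escaping must happen before the terminator is appended; otherwise the terminator's own DLE would be doubled and read as audio.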

When the modem has finished playback, it responds OK.

Throttling playback

During playback, it is necessary to send the audio data at a rate that keeps the audio playing smoothly, but without sending it faster than the modem can handle it. It is also desirable to make sure the modem can always abort playback and discard any buffered audio in case a message is to be canceled. Message cancellation is expected by callers who already know the answers to voice prompts and provide their answer early (and who would become irritated at being forced to listen to a prompt they've already responded to).

There are several ways to keep the computer sending audio data to the modem at a rate to keep up with playback without overrunning the audio buffer.

The most straightforward is to use CTS flow control. The following caveats exist.

A second way to throttle playback involves polling a "tick" timer provided by the host computer's operating system and based on a hardware clock that's independent of the host's CPU load. This may or may not be available, and it depends entirely on the host operating system. However, when available, it is extremely reliable. It is reasonable to assume that the PC needs to stay ahead of the playback by a couple of hundred bytes and that the modem will buffer this. (The commands AT+VBQ or AT#VBQ on voice modems will often reveal the size of the buffer in bytes, and 1 to 2 kilobytes is a typical response.)
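A timer-based throttle can be sketched as below. The sketch assumes Python's time.monotonic as the tick timer, 8,000 audio bytes per second, and a 200-byte lead; the function names and the write callable (which would escape DLE bytes and write to the serial port) are hypothetical.

```python
import time


def bytes_to_send(elapsed_s, bytes_sent, bytes_per_second=8000, lead_bytes=200):
    """How many more audio bytes may be sent to stay lead_bytes ahead."""
    target = int(elapsed_s * bytes_per_second) + lead_bytes
    return max(0, target - bytes_sent)


def play(samples, write, bytes_per_second=8000, lead_bytes=200,
         clock=time.monotonic, sleep=time.sleep):
    """Feed samples to write() at playback speed, polling the clock."""
    start = clock()
    sent = 0
    while sent < len(samples):
        allowance = bytes_to_send(clock() - start, sent, bytes_per_second, lead_bytes)
        if allowance:
            chunk = samples[sent:sent + allowance]
            write(chunk)
            sent += len(chunk)
        else:
            sleep(0.005)  # wait a few milliseconds, then poll the clock again
```

Because at most lead_bytes of unplayed audio is ever outstanding, a cancel request leaves little buffered audio behind. The clock and sleep parameters exist only so the loop can be exercised with a simulated clock.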

A third way to throttle playback involves inserting dummy DLE messages into the output stream such that the audio data takes a known amount of time to transmit through the serial port, and the playback is essentially clocked by the UART in the serial port.

Dummy DLE stuffing relies on a few observations. In a typical scenario, one second of audio is 8,000 one-byte samples. A small percentage of the samples equal the DLE byte and must be doubled, so a typical second of audio might occupy 8,050 bytes. The trick is to insert enough meaningless DLE messages (that is, a DLE followed by a byte without any specific meaning, which the modem discards) to bring the total to exactly 11,520 bytes, which take exactly 1 second to transmit through a serial port locked at 115,200 bit/s. Interrupt latency on the host PC may cause slightly fewer than 11,520 bytes to be sent per second, but most voice modems buffer enough bytes before starting playback to tolerate a small skew. The PC can also be programmed to convert a second of audio into slightly fewer than 11,520 bytes, since all voice modems will buffer a small overrun without flow control as long as it is no more than a few hundred bytes.

Dummy DLE stuffing is unlikely to work with "Winmodems" that have no physical UART. It makes sense only with external serial modems that are physically clocked to a specific bit rate by a clock generator behind the external serial port.
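The stuffing scheme can be sketched as follows. The filler pair used here (DLE followed by 0x00) is purely a placeholder assumption: the second byte must be one that the specific modem is documented to ignore. The function name and the 80-sample block size are also illustrative. Filler is spread across the stream in small blocks, on the assumption that padding only at the end of each second would let audio arrive in bursts larger than the modem's buffer.

```python
DLE = 0x10
# Hypothetical filler message; replace 0x00 with a byte the modem ignores.
FILLER = bytes([DLE, 0x00])


def stuff_stream(samples: bytes, sample_rate=8000, port_bytes_per_second=11520,
                 block_samples=80, filler=FILLER) -> bytes:
    """Escape audio and pad it so it occupies the port for its playing time."""
    out = bytearray()
    for start in range(0, len(samples), block_samples):
        block = samples[start:start + block_samples]
        out += block.replace(bytes([DLE]), bytes([DLE, DLE]))
        # Total bytes the port can carry in the time this audio takes to play
        target = (start + len(block)) * port_bytes_per_second // sample_rate
        pairs = max(0, target - len(out)) // len(filler)
        out += filler * pairs
    return bytes(out)
```

The running total is tracked cumulatively, so rounding in one block is made up in the next and the output never exceeds the target byte count unless the escaped audio alone is already larger.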

Recording audio data

The method for recording audio data is the same, except that the command is AT+VRX or AT#VRX, and the modem transmits audio data while the computer receives it. RTS/CTS flow control is not used here: the computer must accept all the audio data it receives, and the modem automatically paces its transmission to match the audio sampling rate.

The modem keeps transmitting until the computer tells it to stop, usually by sending CTRL-C. The data is always terminated with DLE+!, and every DLE byte that occurs naturally in the audio is sent twice to distinguish it from a DLE message.

Before, during, and after recording, the modem may notify the computer host of specific events including, but not limited to, the following:

When the modem wants to tell the host about one of these events, it sends a DLE byte followed by a (usually) 1-byte message describing the event. The list of supported events varies by modem, but usually a digit (or * or #) means a touch-tone was pressed, and the letter "s" means silence was detected. Some modems report only one event for each touch-tone keypress, while others report a keypress repeatedly until the key is released and then send a special "key released" event.
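On the receiving side, the host has to separate audio bytes from DLE messages. The decoder below is a minimal sketch that assumes one-byte event codes and the DLE+! terminator described above; the function name is hypothetical, and real event codes should be taken from the modem's reference guide.

```python
DLE = 0x10


def decode_voice_stream(raw: bytes):
    """Split received bytes into (audio, events, finished)."""
    audio = bytearray()
    events = []
    finished = False
    i = 0
    while i < len(raw):
        byte = raw[i]
        if byte != DLE:
            audio.append(byte)       # ordinary audio sample
            i += 1
            continue
        if i + 1 >= len(raw):
            break                    # lone DLE at the end; its partner has not arrived yet
        code = raw[i + 1]
        i += 2
        if code == DLE:
            audio.append(DLE)        # doubled DLE is one byte of audio
        elif code == ord("!"):
            finished = True          # end of audio data
            break
        else:
            events.append(chr(code)) # e.g. "5" for a touch-tone, "s" for silence
    return bytes(audio), events, finished
```

A lone DLE at the end of the buffer is dropped in this sketch; a streaming reader would instead carry it over and prepend it to the next chunk read from the port.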

Terminating a voice call

Any of the following commands usually cause the modem to hang up and terminate a voice call: AT+VLS=0, AT#VLS=0, ATH, ATZ. Dropping the RS-232 DTR (data terminal ready) signal often accomplishes this as well. The modem remains in voice mode (except in the case of ATZ).

Voice modems do not automatically hang up, even when the caller on the other end does. They may report hangup, dial tone, or silence events, but it is up to the computer to act upon them. If the caller hangs up while the modem is recording and the computer does not react, the modem will keep recording everything else heard on the line, such as dial tones, telephone company error messages, and so forth.
