Voice modem command set
- Main article: Modem
A voice modem is an analog telephone data modem with a built-in capability to transmit and receive voice recordings over the phone line. Voice modems are used for telephony and answering machine applications. Just as data modems are controlled through the Hayes command set, in which the host PC drives the modem with a series of commands known as AT commands, voice modems are controlled through a set of common voice AT commands that is broadly, though not perfectly, consistent throughout the industry.
Plus versus Hash
Each voice modem platform tends to support one of two sets of voice commands: one flavor of the command set prefixes its commands with a + sign, and the other with a # sign.
Detecting voice mode
Voice mode can be detected on a modem by issuing the following command: AT+FCLASS=?
This command is usually supported whether a modem uses the "plus" or the "hash" command set, because it is also part of the fax command set, which always uses the plus sign.
A modem supporting voice will respond with a list of numbers that includes the number 8. A modem not supporting voice will respond with ERROR, or with a list of numbers not including 8. (Many modems report 0,1,2, indicating support for data (0) and for class 1 and class 2 fax.)
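The check described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete driver: the function name supports_voice is hypothetical, and it assumes the reply to AT+FCLASS=? has already been read from the serial port as text. Replies that express classes as ranges are not handled.

```python
import re


def supports_voice(response: str) -> bool:
    """Return True if a reply to AT+FCLASS=? lists class 8 (voice).

    `response` is assumed to be the raw text read back from the modem.
    """
    if "ERROR" in response.upper():
        return False
    # Extract whole class tokens such as "0", "1", "2.0" or "8",
    # so that "18" or "80" is never mistaken for class 8.
    classes = re.findall(r"\d+(?:\.\d+)?", response)
    return "8" in classes
```

For example, supports_voice("0,1,2,8\r\nOK") is true, while supports_voice("0,1,2\r\nOK") and supports_voice("ERROR") are both false.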
Modems supporting the "hash" command set usually respond to AT#CLS=? as well.
Entering voice mode
The command AT+FCLASS=8 or AT#CLS=8 will put the modem in voice mode. Most modems remain on-hook and respond with OK. Once this command has been accepted, most modems will respond with Data Link Escape (DLE) messages instead of, or in addition to, normal modem responses. For example, instead of reporting a ringing phone line with the RING message, many modems will send the ASCII DLE character (0x10) followed by the letter R. The set of DLE events a modem reports depends on its chipset and is documented in the chipset's reference guide.
Querying the modem's capabilities
The command AT+VLS=? or AT#VLS=? usually returns a list of operating modes that are specific to each modem. Each of these numbered modes determines all of the following:
- Whether the modem is off-hook or on-hook
- Whether the attached telephone handset can talk over the phone
- Whether audio can be played back or recorded
- Whether an attached speakerphone is enabled or disabled
While every modem is different, usually mode 0 means on-hook (hung up) and mode 1 is sufficient to pick up the phone, record/playback audio, and detect DTMF (touch tones).
The command AT+VSM=? or AT#VSM=? usually returns a list of audio data formats supported by the modem. Each format includes a name (such as PCM, ADPCM, μ-law, A-law), a number of bits per sample (usually 2, 3, 4, 8, or 16) and an audio sampling rate (usually 7200, 8000, or 11025 Hz). These are industry-standard audio codecs whose implementations are well published, with one notable exception: the modem's implementation of ADPCM often differs from the one commonly used on computers. Modems usually support Dialogic ADPCM, which is similar to, but not the same as, the MS ADPCM commonly used in WAV files.
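The bits per sample and the sampling rate together fix how much data the serial link must carry. The sketch below works through that arithmetic. The function names are hypothetical, and the serial estimate assumes the common 8N1 framing (10 bits on the wire per byte) and ignores the extra bytes added by DLE doubling.

```python
def bytes_per_second(bits_per_sample: int, sample_rate_hz: int) -> float:
    """Raw audio payload rate for a given format."""
    return bits_per_sample * sample_rate_hz / 8


def min_serial_bps(bits_per_sample: int, sample_rate_hz: int) -> float:
    """Lowest serial bit rate that keeps up, assuming 8N1 framing
    (1 start bit + 8 data bits + 1 stop bit = 10 bits per byte)."""
    return bytes_per_second(bits_per_sample, sample_rate_hz) * 10
```

Under these assumptions, 4-bit ADPCM at 8000 Hz needs 4000 bytes per second (40,000 bit/s on the wire), while 8-bit audio at 8000 Hz needs 8000 bytes per second (80,000 bit/s), which is more than a 57,600 bit/s port setting can carry.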
Answering calls
Calls are usually answered with the AT+VLS=n or AT#VLS=n command, where n is a number representing the modem's mode. Some modems answer in response to ATA, the standard data-mode answer command, but other modems interpret it as a command to answer in data mode rather than voice mode.
Transmitting audio data
To begin transmitting audio data, the host sends the command AT+VTX or AT#VTX. The modem responds with CONNECT or VCON. (Modems using the "plus" command set usually respond with CONNECT, while those using the "hash" set respond with VCON, which stands for voice connect.)
From then on, the modem interprets any data sent from the computer as wave audio data, using the codec selected by the AT+VSM or AT#VSM command.
Usually the audio data is sent to the modem faster than the modem can play it. When the modem wants the computer to pause so that playback can catch up, it temporarily lowers the CTS (Clear-To-Send) signal on the RS-232 serial port. The modem raises the signal again in time for the computer to resume sending audio data before the playback buffer becomes completely empty.
When the computer wants to signal the end of audio data, most modems expect to see an ASCII DLE character (0x10), followed by the ! character.
Because the DLE byte can and often does occur in normal audio data, it must be sent twice to the modem when it is to be interpreted as a byte of audio data.
Most modems also accept a sequence of DLE + CAN (cancel) as a signal to cancel audio playback. The distinction is that the modem is to understand that it is to immediately abort playback now, rather than let remaining data in the playback buffer run to completion. This is used by applications in response to an early touch-tone keypress where the caller has already made their selection and isn't interested in hearing a few more seconds of the message they've already responded to.
When the modem has finished playback, it responds with OK.
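The byte-level rules for playback described above can be sketched as follows. The helper names are hypothetical, and the end-of-data character defaults to ! as described in this article. It is left as a parameter because the exact terminator is chipset-specific (some chipset manuals document a different one, such as DLE followed by ETX) and should be checked against the modem's reference guide.

```python
DLE = b"\x10"  # ASCII Data Link Escape
CAN = b"\x18"  # ASCII Cancel

# DLE + CAN asks the modem to abort playback immediately.
ABORT_PLAYBACK = DLE + CAN


def shield_audio(audio: bytes) -> bytes:
    """Double every DLE byte so the modem treats it as audio data."""
    return audio.replace(DLE, DLE + DLE)


def end_of_playback(terminator: bytes = b"!") -> bytes:
    """Sequence that tells the modem no more audio data follows."""
    return DLE + terminator


def playback_stream(audio: bytes, terminator: bytes = b"!") -> bytes:
    """Complete byte stream to send after the modem answers CONNECT or VCON."""
    return shield_audio(audio) + end_of_playback(terminator)
```

A real application would write this stream to the serial port in chunks, with hardware flow control enabled so that the CTS signal can pause the transfer.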
Recording audio data
The method for recording audio data is the same, except that the command is AT+VRX or AT#VRX, and the modem transmits audio data while the computer receives it. RTS/CTS flow control is not used here: the computer must accept all the audio data it receives, and the modem automatically paces its transmission to match the audio sampling rate.
The modem never stops transmitting until the computer tells it to stop, which is usually with CTRL-C. The data is always terminated with DLE+!, and all DLE characters are doubled up.
Before, during, and after recording, the modem may notify the computer host of specific events including, but not limited to, the following:
- Touch-tone keypresses detected
- Silence detected
- Line polarity reversal detected (often meaning caller hung up)
- Dial tone detected
- Fax tone detected
When the modem wants to tell the host about one of these events, it sends a DLE byte followed by a (usually) 1-byte code describing the event. The list of supported events varies by modem, but usually a digit, * or # means that a touch-tone key was pressed, and the letter "s" means that silence was detected. Some modems report only one event for each touch-tone keypress, while others report a keypress repeatedly until the key is released and then send a special "key released" event.
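The receiving side has to separate three things in the incoming stream: doubled DLE bytes (audio), DLE followed by the terminator (end of data), and DLE followed by any other byte (an event). The sketch below shows one way to do that. The function name unshield is hypothetical, the terminator defaults to ! as described in this article, and event codes are returned as-is because their meaning is modem-specific.

```python
DLE = 0x10  # ASCII Data Link Escape


def unshield(stream: bytes, terminator: int = ord("!")):
    """Split a recorded stream into (audio, events, finished).

    audio    -- audio bytes with doubled DLEs collapsed to one
    events   -- list of 1-character event codes, in arrival order
    finished -- True if the end-of-data sequence was seen
    """
    audio = bytearray()
    events = []
    i = 0
    while i < len(stream):
        byte = stream[i]
        if byte != DLE:
            audio.append(byte)
            i += 1
            continue
        if i + 1 >= len(stream):
            # Lone DLE at the end of a chunk: a caller reading in chunks
            # would need to carry it over to the next read.
            break
        code = stream[i + 1]
        i += 2
        if code == DLE:
            audio.append(DLE)        # doubled DLE is one audio byte
        elif code == terminator:
            return bytes(audio), events, True
        else:
            events.append(chr(code))  # e.g. "5" or "s"
    return bytes(audio), events, False
```

An application would typically stop recording, or abort playback, as soon as the events list shows a touch-tone digit or a hangup indication.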
Terminating a voice call
Any of the following commands usually causes the modem to hang up and terminate a voice call: AT+VLS=0, AT#VLS=0, ATH, ATZ. Dropping the RS-232 DTR (Data Terminal Ready) signal often accomplishes this as well. The modem remains in voice mode (except in the case of ATZ).
Voice modems do not automatically hang up even when the caller on the other end does. They may report hangup, dial tone, or silence events, but it is up to the computer to act upon them. If the caller hangs up while the modem is recording and the computer does not react, the modem will continue recording everything else heard on the line, such as dial tones, telephone company error messages, and so forth.
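The two command flavors described in this article can be collected into a lookup table so that an application only needs to know which flavor a modem uses. This is an illustrative sketch: the action names and the command_line helper are hypothetical, the command spellings are those given in the sections above, and an individual modem may deviate from them.

```python
# Command spellings as described in this article, keyed by flavor.
COMMANDS = {
    "plus": {
        "enter_voice": "AT+FCLASS=8",
        "query_modes": "AT+VLS=?",
        "query_formats": "AT+VSM=?",
        "play": "AT+VTX",
        "record": "AT+VRX",
        "hang_up": "AT+VLS=0",
    },
    "hash": {
        "enter_voice": "AT#CLS=8",
        "query_modes": "AT#VLS=?",
        "query_formats": "AT#VSM=?",
        "play": "AT#VTX",
        "record": "AT#VRX",
        "hang_up": "AT#VLS=0",
    },
}


def command_line(flavor: str, action: str) -> bytes:
    """Return the bytes to send for an action, ending in a carriage return."""
    return (COMMANDS[flavor][action] + "\r").encode("ascii")
```

For example, command_line("hash", "enter_voice") yields the bytes for AT#CLS=8 followed by a carriage return.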
References
- AT command reference manual for Rockwell, Conexant, and Lucent chipsets. (Each chipset manufacturer produces a manual with this same title, followed by the name of the product to which it applies)
- Zoom Tech Support Documentation, AT Command References