Digital dictation

Digital dictation is a method of recording and editing the spoken word in real time. Recording is done with a digital recorder. Compared with analog tape-based dictation machines, digital recorders are lighter, have longer battery life and can record for much longer on the same media. The size of the resulting files depends on the manufacturer and on the format the user chooses. The most common file formats produced by digital recorders have the extensions WAV, WMA or MP3; dedicated dictation machines record in the DSS and DS2 formats.

The DSS (Digital Speech Standard) and DS2 file formats compress audio, making files smaller and more portable. Where sound quality is paramount and no transcription is needed, for example when a theatre play is recorded for broadcast, speech is recorded with techniques closer to high-fidelity music recording than to those discussed here.
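
Why the choice of format matters so much for file size comes down to simple arithmetic: uncompressed PCM audio costs sample rate × bits per sample × channels bits every second, while a compressed speech format is defined by a much lower fixed bit rate. The sketch below works this out for a 10-minute dictation. The 16 kbit/s figure is a hypothetical speech codec rate chosen for illustration, not the rate of DSS, DS2 or any particular recorder, and container headers are ignored.

```python
# Minimal sketch: estimate dictation file size from encoding parameters.
# The 16 kbit/s "speech codec" is a hypothetical figure for illustration only.


def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate in bit/s of uncompressed linear PCM audio."""
    return sample_rate_hz * bits_per_sample * channels


def file_size_bytes(bitrate_bps: int, seconds: float) -> int:
    """Approximate audio payload size, ignoring container headers and metadata."""
    return int(bitrate_bps * seconds // 8)


if __name__ == "__main__":
    minutes = 10
    seconds = minutes * 60
    formats = {
        "CD-quality PCM (44.1 kHz, 16-bit, stereo)": pcm_bitrate(44_100, 16, 2),
        "Speech-band PCM (8 kHz, 16-bit, mono)": pcm_bitrate(8_000, 16, 1),
        "Hypothetical speech codec": 16_000,
    }
    for name, bps in formats.items():
        size_mb = file_size_bytes(bps, seconds) / 1_000_000
        print(f"{name}: {bps / 1000:.1f} kbit/s -> {size_mb:.1f} MB for {minutes} min")
```

For ten minutes this gives roughly 105.8 MB, 9.6 MB and 1.2 MB respectively, which is why compressed dictation files are far easier to store and send than uncompressed recordings.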

Digital dictation offers several advantages over traditional cassette tape-based dictation:

  1. Random access: any point in a recording can be reached about as quickly as any other, whereas a tape must be wound through sequentially.
  2. Digital audio: the sound wave is stored as numerical samples rather than as an analog signal on tape. CD audio, for example, is sampled 44,100 times per second at a 16-bit sample depth.
  3. File transfer: recordings are ordinary computer files that can be sent from one computer system to another over a communication channel, typically by means of a file transfer protocol.
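
The random-access advantage can be illustrated with an uncompressed WAV file: every audio frame has the same size, so a time offset converts directly into a frame position and playback can jump there without reading anything before it. The sketch below uses Python's standard `wave` module on a synthetic in-memory recording; the 8 kHz, 16-bit mono settings and the 440 Hz test tone are assumptions chosen for illustration, not requirements of any dictation system.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 8_000  # assumed: 8 kHz, 16-bit mono, a common setting for speech


def make_test_recording(seconds: float) -> io.BytesIO:
    """Write a synthetic 440 Hz tone into an in-memory WAV file."""
    buf = io.BytesIO()
    n_frames = int(seconds * SAMPLE_RATE)
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16-bit samples
        w.setframerate(SAMPLE_RATE)
        samples = (
            int(10_000 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE))
            for i in range(n_frames)
        )
        w.writeframes(b"".join(struct.pack("<h", s) for s in samples))
    return buf


def read_at(buf: io.BytesIO, start_s: float, duration_s: float) -> bytes:
    """Jump straight to start_s and return duration_s of raw audio frames.

    Frames have a fixed size, so the frame index maps directly to a byte
    offset and the seek costs the same wherever start_s falls; a tape would
    have to be wound past everything before it.
    """
    buf.seek(0)  # wave reads the header from the current position
    with wave.open(buf, "rb") as r:
        r.setpos(int(start_s * r.getframerate()))
        return r.readframes(int(duration_s * r.getframerate()))
```

For example, `read_at(make_test_recording(5.0), 2.5, 1.0)` returns one second of audio starting 2.5 seconds into the recording. Compressed formats also support seeking, but usually by locating frame or block boundaries rather than by this direct calculation.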

Dictation audio can be recorded in various audio file formats. Most digital dictation systems use a lossy form of audio compression based on modelling of the vocal tract, which minimizes hard disk usage and network load as files are transferred between users. (WAV is not an audio encoding format but a container file format, so it has little or no bearing on the encoding rate (kbit/s), size or audio quality of the resulting file.)
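
The classic way to model the vocal tract in speech compression is linear predictive coding (LPC): each sample is predicted as a weighted sum of the previous few samples, and the weights approximate the resonances of the speaker's vocal tract. The sketch below estimates those weights with the Levinson-Durbin recursion. It is a generic illustration of the technique, not code from DSS, DS2 or any specific dictation codec; a real codec would also quantize the coefficients and encode the prediction residual.

```python
import random


def autocorrelation(x: list[float], max_lag: int) -> list[float]:
    """Biased autocorrelation estimates r[0..max_lag]."""
    n = len(x)
    return [
        sum(x[i] * x[i - lag] for i in range(lag, n)) / n
        for lag in range(max_lag + 1)
    ]


def lpc(x: list[float], order: int) -> list[float]:
    """Linear prediction coefficients via the Levinson-Durbin recursion.

    Returns c[0..order-1] such that x[n] is predicted as
    sum(c[k - 1] * x[n - k] for k in 1..order).
    """
    r = autocorrelation(x, order)
    a = [1.0] + [0.0] * order  # error filter: e[n] = sum(a[j] * x[n - j])
    err = r[0]                 # prediction error power at order 0
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return [-coef for coef in a[1:]]


if __name__ == "__main__":
    # Synthetic "speech-like" signal from a known 2nd-order resonator plus noise.
    random.seed(1)
    x = [0.0, 0.0]
    for _ in range(20_000):
        x.append(1.3 * x[-1] - 0.4 * x[-2] + random.gauss(0.0, 1.0))
    print(lpc(x, 2))  # should be close to [1.3, -0.4]
```

Broadly, a predictive coder can transmit a handful of coefficients per short frame plus a compact description of what the predictor missed, instead of every raw sample, which is where much of the bandwidth saving comes from.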

Digital dictation differs from speech recognition, in which a computer analyzes the audio with speech recognition algorithms in an attempt to transcribe it. In digital dictation, the audio is converted to text by a transcriber using digital transcription software, typically controlled by a foot switch that lets the transcriber PLAY, STOP, REWIND and BACKSPACE.
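
The foot-switch controls can be thought of as a small playback state machine. The sketch below is a toy model, not the behaviour of any particular transcription program: the class name, the 2-second BACKSPACE jump and the 4× rewind speed are hypothetical settings chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TranscriptionPlayer:
    """Toy model of a foot-switch-controlled transcription player.

    Positions are in seconds. backspace_s and the rewind speed are
    hypothetical settings, not values taken from real software.
    """
    duration_s: float
    backspace_s: float = 2.0
    position_s: float = 0.0
    playing: bool = False

    def play(self) -> None:
        self.playing = True

    def stop(self) -> None:
        self.playing = False

    def rewind(self, held_s: float, speed: float = 4.0) -> None:
        """Rewind for as long as the pedal is held, at `speed` times real time."""
        self.playing = False
        self.position_s = max(0.0, self.position_s - held_s * speed)

    def backspace(self) -> None:
        """Jump back a fixed amount so the typist can re-hear the last words."""
        self.position_s = max(0.0, self.position_s - self.backspace_s)

    def tick(self, elapsed_s: float) -> None:
        """Advance the playhead while playing; stop at the end of the file."""
        if self.playing:
            self.position_s = min(self.duration_s, self.position_s + elapsed_s)
            if self.position_s >= self.duration_s:
                self.playing = False
```

In use, pressing PLAY and letting ten seconds elapse moves the playhead to 10.0 s; a BACKSPACE tap then returns it to 8.0 s so the last phrase can be heard again.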

Some digital transcription kits now integrate with speech recognition software. This gives the typist the option either to type a document manually or to have it converted to text by software such as Dragon NaturallySpeaking.

Methods

Portable Recorder
Portable handheld digital recorders are the modern replacement for analog handhelds. They transfer recordings by docking with or plugging into a computer, and they eliminate the need for cassette tapes. Professional handheld models are available with slide-switch or push-button controls, fingerprint locking and barcode scanning.

Computer
Another common way to record digital dictation is with a computer dictation microphone. Several types are available, with broadly similar features and operation. The Olympus Direct Rec, Philips SpeechMike and Dictaphone Powermic are digital computer dictation microphones that also have push-button controls for operating dictation or speech recognition software. Dictation microphones connect to the computer through a USB port.

Call-in Dictation System
Call-in dictation systems let authors record dictation over the telephone. The author dials a phone number, enters a PIN and starts dictating; touch-tone controls allow the author to start, pause, play back and send the dictation audio file. Call-in systems usually include a pod that plugs into a phone line and can then be connected to a computer to store the recordings in compatible transcription or management software.
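
The call-in flow described above (authenticate with a PIN, then drive the recording with touch-tone keys) can be sketched as a small session state machine. The key assignments, PIN and state names below are hypothetical; real call-in systems define their own keypad layouts, and this sketch does no telephony or audio handling.

```python
from __future__ import annotations

# Hypothetical key assignments; real call-in systems define their own.
KEYMAP = {"1": "record", "2": "pause", "3": "playback", "9": "send"}
NEXT_STATE = {"record": "recording", "pause": "paused",
              "playback": "playing", "send": "sent"}


class CallInSession:
    """Toy model of one call to a dictation line."""

    def __init__(self, valid_pins: set[str]) -> None:
        self.valid_pins = valid_pins
        self.author_pin: str | None = None
        self.state = "awaiting_pin"
        self.events: list[str] = []

    def enter_pin(self, pin: str) -> bool:
        """Authenticate the caller; keys are ignored until this succeeds."""
        if self.state == "awaiting_pin" and pin in self.valid_pins:
            self.author_pin = pin
            self.state = "idle"
            return True
        return False

    def press(self, key: str) -> str:
        """Map one touch-tone key to a dictation command and update the state."""
        if self.state in ("awaiting_pin", "sent"):
            return "ignored"
        command = KEYMAP.get(key)
        if command is None:
            return "ignored"
        self.state = NEXT_STATE[command]
        self.events.append(command)
        return command
```

A typical call would be `enter_pin(...)`, then keys for record, pause and playback as needed, and finally the send key, after which the session accepts no further commands.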

Mobile Phone
Several digital dictation applications are available for mobile phones. With them, users can record, edit and send dictation files over networks such as the Internet, which keeps them connected to dictation workflows; wireless transfer of dictation files also shortens turnaround time.

Types of software

There are two types of digital dictation software:

  1. Standalone digital sound recording software: basic software that records the audio as a simple file. Most such applications are designed for individuals or a very small number of users, since they offer no network-efficient way of transferring audio files other than email, and they do not encrypt or password-protect the files.
  2. Digital dictation workflow software: advanced software for commercial organizations, in which a typist still plays back the audio but the audio file can be transferred securely and efficiently. The workflow element of these systems also lets users share audio files instantly, create virtual teams, outsource transcription securely and set up confidential send options or 'ethical walls'. Digital dictation workflow software is normally integrated with Active Directory and can be used alongside document, practice or case management systems. Typical users include law firms, healthcare organizations, accountancy firms and surveying firms.
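
The routing side of workflow software, including 'ethical walls', can be sketched in a few lines: a job goes only to typists in the author's virtual team who are not walled off from the job's matter. Everything below is a hypothetical illustration; the class names, the shortest-queue assignment rule and the in-memory dictionaries stand in for data that a real system would take from a directory service or practice management system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DictationJob:
    author: str
    matter: str      # the client matter or case the dictation belongs to
    audio_file: str


@dataclass
class Router:
    """Toy dictation workflow router with 'ethical walls'.

    teams maps an author to the typists in their virtual team; walls maps a
    matter to the users who must never receive its dictation.
    """
    teams: dict[str, list[str]]
    walls: dict[str, set[str]] = field(default_factory=dict)
    queues: dict[str, list[DictationJob]] = field(default_factory=dict)

    def eligible_typists(self, job: DictationJob) -> list[str]:
        blocked = self.walls.get(job.matter, set())
        return [t for t in self.teams.get(job.author, []) if t not in blocked]

    def submit(self, job: DictationJob) -> str | None:
        """Queue the job with the eligible typist who has the fewest jobs."""
        candidates = self.eligible_typists(job)
        if not candidates:
            return None  # nobody may receive it; a real system would escalate
        typist = min(candidates, key=lambda t: len(self.queues.get(t, [])))
        self.queues.setdefault(typist, []).append(job)
        return typist
```

The key design point is that the wall check happens before any assignment, so a walled-off typist never receives the audio, rather than receiving it and being told not to open it.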

Recordings can be made over the telephone, on a computer or with a handheld dictation device that is "docked" to a computer.
