Voice user interface

Last updated July 06, 2024

A voice-user interface (VUI) enables spoken human interaction with computers, using speech recognition to understand spoken commands and answer questions, and typically text to speech to play a reply. A voice command device is a device controlled with a voice user interface.

Voice user interfaces have been added to automobiles, home automation systems, computer operating systems, home appliances like washing machines and microwave ovens, and television remote controls. They are the primary way of interacting with virtual assistants on smartphones and smart speakers. Older automated attendants (which route phone calls to the correct extension) and interactive voice response systems (which conduct more complicated transactions over the phone) can respond to the pressing of keypad buttons via DTMF tones, but those with a full voice user interface allow callers to speak requests and responses without having to press any buttons.

Newer voice command devices are speaker-independent, so they can respond to multiple voices, regardless of accent or dialectal influences. They are also capable of responding to several commands at once, separating vocal messages, and providing appropriate feedback, accurately imitating a natural conversation.^[1]

Overview

A VUI is the interface to any speech application. Not long ago, controlling a machine by simply talking to it was possible only in science fiction, and until recently the area was regarded as artificial intelligence. However, advances in technologies like text-to-speech, speech-to-text, natural language processing, and cloud services contributed to the mass adoption of these types of interfaces. VUIs have become more commonplace, and people are taking advantage of the value that these hands-free, eyes-free interfaces provide in many situations.

VUIs need to respond to input reliably, or they will be rejected and often ridiculed by their users. Designing a good VUI requires interdisciplinary talents in computer science, linguistics and human factors psychology, all of which are skills that are expensive and hard to come by. Even with advanced development tools, constructing an effective VUI requires an in-depth understanding of both the tasks to be performed and the target audience that will use the final system. The closer the VUI matches the user's mental model of the task, the easier it will be to use with little or no training, resulting in both higher efficiency and higher user satisfaction.

A VUI designed for the general public should emphasize ease of use and provide a lot of help and guidance for first-time callers. In contrast, a VUI designed for a small group of power users (including field service workers) should focus more on productivity and less on help and guidance. Such applications should streamline the call flows, minimize prompts, eliminate unnecessary iterations and allow elaborate "mixed initiative dialogs", which enable callers to enter several pieces of information in a single utterance and in any order or combination. In short, speech applications have to be carefully crafted for the specific business process that is being automated.
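As a rough illustration of a mixed-initiative dialog, the sketch below fills several slots from a single utterance regardless of the order in which they are spoken, then prompts only for whatever is still missing. The funds-transfer slots, the regular expressions, and the prompt wording are all assumptions made for this example; a real system would work on recognizer output with a trained language-understanding component rather than regular expressions.

```python
import re

# Hypothetical slots for a funds-transfer task; patterns are illustrative only.
SLOT_PATTERNS = {
    "amount": re.compile(r"(\d+(?:\.\d+)?)\s*dollars"),
    "source": re.compile(r"\bfrom (?:my )?(\w+)"),
    "target": re.compile(r"\bto (?:my )?(\w+)"),
}

# Hypothetical prompts, asked only when a slot is still empty.
PROMPTS = {
    "amount": "How much would you like to transfer?",
    "source": "Which account should the money come from?",
    "target": "Which account should receive the money?",
}


def fill_slots(utterance, slots=None):
    """Extract any slots present in the utterance, in any order."""
    slots = dict(slots or {})
    text = utterance.lower()
    for name, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        if match and name not in slots:
            slots[name] = match.group(1)
    return slots


def next_prompt(slots):
    """Return the prompt for the first missing slot, or None when complete."""
    for name in SLOT_PATTERNS:
        if name not in slots:
            return PROMPTS[name]
    return None


if __name__ == "__main__":
    state = fill_slots("Move money to savings")
    print(state, "->", next_prompt(state))
    state = fill_slots("75 dollars from my checking", state)
    print(state, "->", next_prompt(state))
```

A power user can supply everything at once ("transfer 50 dollars from checking to savings") and receive no prompts at all, while a first-time caller who gives only part of the request is guided through the remaining slots one by one.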

Not all business processes lend themselves equally well to speech automation. In general, the more complex the inquiries and transactions are, the more challenging they will be to automate, and the more likely they will be to fail with the general public. In some scenarios, automation is simply not applicable, so live agent assistance is the only option. A legal advice hotline, for example, would be very difficult to automate. On the flip side, speech is well suited to handling quick and routine transactions, like changing the status of a work order, completing a time or expense entry, or transferring funds between accounts.

History

Early applications for VUI included voice-activated dialing of phones, either directly or through a (typically Bluetooth) headset or vehicle audio system.

In 2007, a CNN business article reported that voice command was over a billion dollar industry and that companies like Google and Apple were trying to create speech recognition features.^[2] In the years since the article was published, a variety of voice command devices have reached the market. Additionally, Android has included a text-to-speech engine called Pico TTS, and Apple released Siri. Voice command devices are becoming more widely available, and new ways of using the human voice continue to be developed. For example, in 2011 Business Week suggested that the remote control of the future would be the human voice; Xbox Live already allowed such features, and Steve Jobs had hinted at such a feature for a new Apple TV.^[3]

Voice command software products on computing devices

Both Apple Mac and Windows PC provide built-in speech recognition features for their latest operating systems.

Microsoft Windows

Two Microsoft operating systems, Windows 7 and Windows Vista, provide speech recognition capabilities. Microsoft integrated voice commands into their operating systems to provide a mechanism for people who want to limit their use of the mouse and keyboard, but still want to maintain or increase their overall productivity.^[4]

Windows Vista

With Windows Vista voice control, a user may dictate documents and emails in mainstream applications, start and switch between applications, control the operating system, format documents, save documents, edit files, efficiently correct errors, and fill out forms on the Web. The speech recognition software learns automatically every time a user uses it, and speech recognition is available in English (U.S.), English (U.K.), German (Germany), French (France), Spanish (Spain), Japanese, Chinese (Traditional), and Chinese (Simplified). In addition, the software comes with an interactive tutorial, which can be used to train both the user and the speech recognition engine.^[5]

Windows 7

In addition to all the features provided in Windows Vista, Windows 7 provides a wizard for setting up the microphone and a tutorial on how to use the feature.^[6]

Mac OS X

All Mac OS X computers come pre-installed with speech recognition software. The software is user-independent, and it allows a user to "navigate menus and enter keyboard shortcuts; speak checkbox names, radio button names, list items, and button names; and open, close, control, and switch among applications."^[7] However, the Apple website recommends that a user buy a commercial product called Dictate.^[7]

Commercial products

If a user is not satisfied with the built-in speech recognition software, or the user's operating system has no built-in speech recognition software, the user may experiment with a commercial product such as Braina Pro or Dragon NaturallySpeaking for Windows PCs,^[8] or Dictate, the name of the same software for Mac OS.^[9]

Voice command mobile devices

Any mobile device running Android OS, Microsoft Windows Phone, iOS 9 or later, or BlackBerry OS provides voice command capabilities. In addition to the built-in speech recognition software for each mobile phone's operating system, a user may download third-party voice command applications from each operating system's application store: Apple App Store, Google Play, Windows Phone Marketplace (initially Windows Marketplace for Mobile), or BlackBerry App World.

Android OS

Google has developed an open source operating system called Android, which allows a user to perform voice commands such as: send text messages, listen to music, get directions, call businesses, call contacts, send email, view a map, go to websites, write a note, and search Google.^[10] The speech recognition software is available for all devices since Android 2.2 "Froyo", but the settings must be set to English.^[10] Google allows the user to change the language, and on first use of the speech recognition feature the user is asked whether their voice data should be attached to their Google account. If a user opts into this service, Google can train the software to the user's voice.^[11]

Google introduced the Google Assistant with Android 7.0 "Nougat". It is considerably more capable than the earlier voice command feature.

Amazon.com has the Echo that uses Amazon's custom version of Android to provide a voice interface.

Microsoft Windows

Windows Phone is Microsoft's mobile operating system. On Windows Phone 7.5, the speech app is user-independent and can be used to: call someone from your contact list, call any phone number, redial the last number, send a text message, call your voice mail, open an application, read appointments, query phone status, and search the web.^[12]^[13] In addition, speech can be used during a phone call to press a number, turn the speaker phone on, or call someone, which puts the current call on hold.^[13]

Windows 10 introduces Cortana, a voice control system that replaces the formerly used voice control on Windows phones.

iOS

Apple added Voice Control to its family of iOS devices as a new feature of iPhone OS 3. The iPhone 4S, iPad 3, iPad Mini 1G, iPad Air, iPad Pro 1G, iPod Touch 5G and later all come with a more advanced voice assistant called Siri. Voice Control can still be enabled through the Settings menu of newer devices. Siri is a user-independent, built-in speech recognition feature that allows a user to issue voice commands. With the assistance of Siri, a user may issue commands to send a text message, check the weather, set a reminder, find information, schedule meetings, send an email, find a contact, set an alarm, get directions, track stocks, set a timer, and ask for examples of sample voice command queries.^[14] In addition, Siri works with Bluetooth and wired headphones.^[15]

Amazon Alexa

In 2014 Amazon introduced Alexa on its Echo smart speaker. Its main purpose was initially to serve as a smart speaker that the consumer could control by voice. It later gained the ability to control home appliances by voice, and many smart home devices, including light bulbs and thermostats, can now be controlled with Alexa. Through this smart home integration, a user can lock the house, control the temperature, and activate various devices. A user can also simply ask Alexa a question; in response, it searches for, finds, and recites the answer.^[16]

Speech recognition in cars

As car technology improves, more features will be added to cars, and these features could potentially distract a driver. Voice commands for cars, according to CNET, should allow a driver to issue commands without being distracted. CNET stated that Nuance was suggesting that in the future it would create software that resembled Siri, but for cars.^[17] Most speech recognition software on the market in 2011 had only about 50 to 60 voice commands, but Ford Sync had 10,000.^[17] However, CNET suggested that even 10,000 voice commands were not sufficient given the complexity and the variety of tasks a user may want to do while driving.^[17] Voice command for cars differs from voice command for mobile phones and computers because a driver may use the feature to look for nearby restaurants, gas stations, driving directions, road conditions, and the location of the nearest hotel.^[17] Currently, technology allows a driver to issue voice commands on both a portable GPS like a Garmin and a car manufacturer navigation system.^[18]

List of Voice Command Systems Provided By Motor Manufacturers:

Ford Sync
Lexus Voice Command
Chrysler UConnect
Honda Accord
GM IntelliLink
BMW
Mercedes
Pioneer
Harman
Hyundai

Non-verbal input

While most voice user interfaces are designed to support interaction through spoken human language, there have also been recent explorations in designing interfaces that take non-verbal human sounds as input.^[19]^[20] In these systems, the user controls the interface by emitting non-speech sounds such as humming, whistling, or blowing into a microphone.^[21]

One such example of a non-verbal voice user interface is Blendie,^[22]^[23] an interactive art installation created by Kelly Dobson. The piece comprised a classic 1950s-era blender which was retrofitted to respond to microphone input. To control the blender, the user must mimic the whirring mechanical sounds that a blender typically makes: the blender will spin slowly in response to a user's low-pitched growl, and increase in speed as the user makes higher-pitched vocal sounds.

Another example is VoiceDraw,^[24] a research system that enables digital drawing for individuals with limited motor abilities. VoiceDraw allows users to "paint" strokes on a digital canvas by modulating vowel sounds, which are mapped to brush directions. Modulating other paralinguistic features (e.g. the loudness of their voice) allows the user to control different features of the drawing, such as the thickness of the brush stroke.
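The kind of mapping described for VoiceDraw can be sketched in a few lines: a detected vowel selects a brush direction and the loudness sets the stroke thickness. This is only a sketch under stated assumptions, not VoiceDraw's implementation; the vowel-to-direction table, the step size, the width range, and the idea that vowel and loudness arrive already classified and normalized are all hypothetical.

```python
# Hypothetical vowel-to-direction table; the real system's mapping may differ.
# Directions are (dx, dy) in screen coordinates, where y grows downward.
VOWEL_DIRECTIONS = {
    "a": (0, -1),   # up
    "e": (1, 0),    # right
    "o": (0, 1),    # down
    "u": (-1, 0),   # left
}


def brush_step(position, vowel, loudness, speed=5.0, max_width=20.0):
    """Advance the brush one step and return (new_position, stroke_width).

    loudness is assumed to be normalized to the range 0.0 to 1.0;
    values outside that range are clamped.
    """
    dx, dy = VOWEL_DIRECTIONS.get(vowel, (0, 0))  # unknown sound: no movement
    x, y = position
    loudness = min(max(loudness, 0.0), 1.0)
    width = 1.0 + loudness * (max_width - 1.0)
    return (x + dx * speed, y + dy * speed), width


if __name__ == "__main__":
    pos = (100.0, 100.0)
    # A quiet "e" followed by a loud "a": move right thinly, then up thickly.
    for vowel, loudness in [("e", 0.1), ("a", 0.9)]:
        pos, width = brush_step(pos, vowel, loudness)
        print(pos, round(width, 2))
```

Keeping direction and thickness on separate vocal features lets the user vary both at once, which is the point of modulating paralinguistic features alongside the vowel.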

Other approaches include adopting non-verbal sounds to augment touch-based interfaces (e.g. on a mobile phone) to support new types of gestures that wouldn't be possible with finger input alone.^[21]

Design challenges

Voice interfaces pose a substantial number of challenges for usability. In contrast to graphical user interfaces (GUIs), best practices for voice interface design are still emergent.^[25]

Discoverability

With purely audio-based interaction, voice user interfaces tend to suffer from low discoverability:^[25] it is difficult for users to understand the scope of a system's capabilities. In order for the system to convey what is possible without a visual display, it would need to enumerate the available options, which can become tedious or infeasible. Low discoverability often results in users reporting confusion over what they are "allowed" to say, or a mismatch in expectations about the breadth of a system's understanding.^[26]^[27]

Transcription

While speech recognition technology has improved considerably in recent years, voice user interfaces still suffer from parsing or transcription errors in which a user's speech is not interpreted correctly.^[28] These errors tend to be especially prevalent when the speech content uses technical vocabulary (e.g. medical terminology) or unconventional spellings such as musical artist or song names.^[29]

Understanding

Effective system design to maximize conversational understanding remains an open area of research. Voice user interfaces that interpret and manage conversational state are challenging to design due to the inherent difficulty of integrating complex natural language processing tasks like coreference resolution, named-entity recognition, information retrieval, and dialog management.^[30] Most voice assistants today are capable of executing single commands very well but are limited in their ability to manage dialogue beyond a narrow task or a couple of turns in a conversation.^[31]

Future uses

Pocket-size devices, such as PDAs or mobile phones, currently rely on small buttons for user input. These are either built into the device or are part of a touch-screen interface, such as that of the Apple iPod Touch and iPhone. Extensive button-pressing on devices with such small buttons can be tedious and inaccurate, so an easy-to-use, accurate, and reliable VUI would potentially be a major breakthrough in the ease of their use. Such a VUI would also benefit users of laptop- and desktop-sized computers, as it would solve numerous problems currently associated with keyboard and mouse use, including repetitive-strain injuries such as carpal tunnel syndrome, the challenges that visually impaired users face in navigating and inputting text within digital interfaces,^[32] and slow typing speed on the part of inexperienced keyboard users. Moreover, keyboard use typically entails either sitting or standing stationary in front of the connected display; by contrast, a VUI would free the user to be far more mobile, as speech input eliminates the need to look at a keyboard.

Such developments could change the face of current machines and have far-reaching implications on how users interact with them. Hand-held devices would be designed with larger, easier-to-view screens, as no keyboard would be required. Touch-screen devices would no longer need to split the display between content and an on-screen keyboard, thus providing full-screen viewing of the content. Laptop computers could essentially be cut in half in terms of size, as the keyboard half would be eliminated and all internal components would be integrated behind the display, effectively resulting in a simple tablet computer. Desktop computers would consist of a CPU and screen, saving desktop space otherwise occupied by the keyboard and eliminating sliding keyboard rests built under the desk's surface. Television remote controls and keypads on dozens of other devices, from microwave ovens to photocopiers, could also be eliminated.

Numerous challenges would have to be overcome, however, for such developments to occur. First, the VUI would have to be sophisticated enough to distinguish between input, such as commands, and background conversation; otherwise, false input would be registered and the connected device would behave erratically. A standard prompt, such as the famous "Computer!" call by characters in science fiction TV shows and films such as Star Trek, could activate the VUI and prepare it to receive further input by the same speaker. Conceivably, the VUI could also include a human-like representation: a voice or even an on-screen character, for instance, that responds (e.g., "Yes, Vamshi?") and continues to communicate back and forth with the user in order to clarify the input received and ensure accuracy.
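The activation-prompt idea can be sketched as a small gate over already-transcribed text: speech is ignored until the prompt word is heard, after which a short listening window accepts a command. The class name, the window length, and the use of plain strings in place of recognizer output are assumptions for illustration, and the sketch does not attempt to check that the follow-up comes from the same speaker.

```python
class WakeWordGate:
    """Ignore transcribed speech unless it follows the activation prompt."""

    def __init__(self, wake_word="computer", window=1):
        self.wake_word = wake_word.lower()
        self.window = window   # utterances accepted after a bare prompt
        self.remaining = 0     # utterances still accepted in this window

    def hear(self, utterance):
        """Return a command string, or None if the utterance is ignored."""
        words = utterance.lower().strip(" !?.,").split()
        if words and words[0].strip("!?.,") == self.wake_word:
            command = " ".join(words[1:])
            if command:
                # Prompt and command arrived in one utterance.
                self.remaining = 0
                return command
            # Bare prompt: open the listening window.
            self.remaining = self.window
            return None
        if self.remaining > 0:
            # Inside the window: treat the utterance as a command.
            self.remaining -= 1
            return " ".join(words)
        # Background conversation: not addressed to the device.
        return None


if __name__ == "__main__":
    gate = WakeWordGate()
    for heard in ["Let's order pizza tonight", "Computer!",
                  "Turn on the lights", "And turn them off"]:
        print(repr(heard), "->", gate.hear(heard))
```

Closing the window after a fixed number of utterances is one simple way to keep later background conversation from being registered as input; a timeout would serve the same purpose.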

Second, the VUI would have to work in concert with highly sophisticated software in order to accurately process and find/retrieve information or carry out an action as per the particular user's preferences. For instance, if Samantha prefers information from a particular newspaper, and if she prefers that the information be summarized in point-form, she might say, "Computer, find me some information about the flooding in southern China last night"; in response, the VUI that is familiar with her preferences would "find" facts about "flooding" in "southern China" from that source, convert it into point-form, and deliver it to her on screen and/or in voice form, complete with a citation. Therefore, accurate speech-recognition software, along with some degree of artificial intelligence on the part of the machine associated with the VUI, would be required.

Privacy implications

Privacy concerns are raised by the fact that voice commands are available to the providers of voice-user interfaces in unencrypted form, and can thus be shared with third parties and be processed in an unauthorized or unexpected manner.^[33]^[34] In addition to the linguistic content of recorded speech, a user's manner of expression and voice characteristics can implicitly contain information about his or her biometric identity, personality traits, body shape, physical and mental health condition, sex, gender, moods and emotions, socioeconomic status and geographical origin.^[35]

References

↑ "Washing Machine Voice Control". Appliance Magazine.
↑ Borzo, Jeanette (8 February 2007). "Now You're Talking". CNN Money. Retrieved 25 April 2012.
↑ "Voice Control, the End of the TV Remote?". Bloomberg.com. Business Week. 9 December 2011. Archived from the original on December 8, 2011. Retrieved 1 May 2012.
↑ "Windows Vista Built In Speech". Windows Vista. Retrieved 25 April 2012.
↑ "Speech Operation On Vista". Microsoft.
↑ "Speech Recognition Set Up". Microsoft.
1 2 "Physical and Motor Skills". Apple.
↑ "DragonNaturallySpeaking PC". Nuance.
↑ "DragonNaturallySpeaking Mac". Nuance.
1 2 "Voice Actions".
↑ "Google Voice Search For Android Can Now Be "Trained" To Your Voice". 14 December 2010. Retrieved 24 April 2012.
↑ "Using Voice Command". Microsoft. Retrieved 24 April 2012.
1 2 "Using Voice Commands". Microsoft. Retrieved 27 April 2012.
↑ "Siri, The iPhone 3GS & 4, iPod 3 & 4, have voice control like an express Siri, it plays music, pauses music, suffle, Facetime, and calling Features". Apple. Retrieved 27 April 2012.
↑ "Siri FAQ". Apple.
↑ "How Amazon's Echo went from a smart speaker to the center of your home". Business Insider .
1 2 3 4 "Siri Like Voice". CNET.
↑ "Portable GPS With Voice". CNET.
↑ Blattner, Meera M.; Greenberg, Robert M. (1992). "Communicating and Learning Through Non-speech Audio". Multimedia Interface Design in Education. pp. 133–143. doi:10.1007/978-3-642-58126-7_9. ISBN 978-3-540-55046-4.
↑ Hereford, James; Winn, William (October 1994). "Non-Speech Sound in Human-Computer Interaction: A Review and Design Guidelines". Journal of Educational Computing Research. 11 (3): 211–233. doi:10.2190/mkd9-w05t-yj9y-81nm. ISSN 0735-6331. S2CID 61510202.
1 2 "Voice augmented manipulation | Proceedings of the 15th international conference on Human-computer interaction with mobile devices and services". dlnext.acm.org. doi:10.1145/2493190.2493244. S2CID 6251400 . Retrieved 2019-02-27.
↑ "Blendie | Proceedings of the 5th conference on Designing interactive systems: processes, practices, methods, and techniques". dlnext.acm.org. doi:10.1145/1013115.1013159 . Retrieved 2019-02-27.
↑ "Kelly Dobson: Blendie". web.media.mit.edu. Retrieved 2019-02-27.
↑ "Voicedraw | Proceedings of the 9th international ACM SIGACCESS conference on Computers and accessibility". dlnext.acm.org. doi:10.1145/1296843.1296850. S2CID 218338 . Retrieved 2019-02-27.
1 2 "Design guidelines for hands-free speech interaction | Proceedings of the 20th International Conference on Human-Computer Interaction with Mobile Devices and Services Adjunct". dlnext.acm.org. doi:10.1145/3236112.3236149. S2CID 52099112 . Retrieved 2019-02-27.
↑ "Designing SpeechActs | Proceedings of the SIGCHI Conference on Human Factors in Computing Systems". dlnext.acm.org. doi:10.1145/223904.223952. S2CID 9313029 . Retrieved 2019-02-27.
↑ "What can I say? | Proceedings of the 18th International Conference on Human-Computer Interaction with Mobile Devices and Services". doi: 10.1145/2935334.2935386 . S2CID 6246618.{{cite journal}}: Cite journal requires |journal= (help)
↑ "Patterns for How Users Overcome Obstacles in Voice User Interfaces | Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems". dlnext.acm.org. doi:10.1145/3173574.3173580. S2CID 5041672 . Retrieved 2019-02-27.
↑ ""Play PRBLMS" | Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems". dlnext.acm.org. doi:10.1145/3173574.3173870. S2CID 5050837 . Retrieved 2019-02-27.
↑ Galitsky, Boris (2019). Developing Enterprise Chatbots: Learning Linguistic Structures (1st ed.). Cham, Switzerland: Springer. pp. 13–24. doi:10.1007/978-3-030-04299-8. ISBN 978-3-030-04298-1. S2CID 102486666.
↑ Pearl, Cathy (2016-12-06). Designing Voice User Interfaces: Principles of Conversational Experiences (1st ed.). Sebastopol, CA: O'Reilly Media. pp. 16–19. ISBN 978-1-491-95541-3.
↑ Messaoudi, Mohamed Dhiaeddine; Menelas, Bob-Antoine J.; Mcheick, Hamid (2022-10-17). "Review of Navigation Assistive Tools and Technologies for the Visually Impaired". Sensors. 22 (20): 7888. Bibcode:2022Senso..22.7888M. doi:10.3390/s22207888. ISSN 1424-8220. PMC 9606951. PMID 36298237.
↑ "Apple, Google, and Amazon May Have Violated Your Privacy by Reviewing Digital Assistant Commands". Fortune. 2019-08-05. Retrieved 2020-05-13.
↑ Hern, Alex (2019-04-11). "Amazon staff listen to customers' Alexa recordings, report says". The Guardian. Retrieved 2020-05-21.
↑ Kröger, Jacob Leon; Lutz, Otto Hans-Martin; Raschke, Philip (2020). "Privacy Implications of Voice and Speech Analysis – Information Disclosure by Inference". Privacy and Identity Management. Data for Better Living: AI and Privacy. IFIP Advances in Information and Communication Technology. Vol. 576. pp. 242–258. doi:10.1007/978-3-030-42504-3_16. ISBN 978-3-030-42503-6. ISSN 1868-4238.

External links

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
