A dialogue system, or conversational agent (CA), is a computer system intended to converse with a human. Dialogue systems employ one or more of text, speech, graphics, haptics, gestures, and other modes of communication on both the input and output channels.
The elements of a dialogue system are not strictly defined, as the concept is still under research;[citation needed] however, dialogue systems are distinct from chatbots. [1] The typical GUI wizard engages in a sort of dialogue, but it includes very few of the common dialogue system components, and its dialogue state is trivial.
Following dialogue systems based solely on written text processing, which date back to the early 1960s, [2] the first speaking dialogue system was released by a DARPA project in the US in 1977. [3] After the end of this five-year project, several European projects produced the first dialogue systems able to speak multiple languages (including French, German, and Italian). [4] These early systems were used in the telecom industry to provide various phone services in specific domains, e.g. automated agenda and train timetable services.
The set of components included in a dialogue system, and how those components divide up responsibilities, differs from system to system. Central to any dialogue system is the dialogue manager, the component that manages the state of the dialogue and the dialogue strategy. A typical activity cycle in a dialogue system contains the following phases: [5]

1. The user speaks, and the input is converted to plain text by the system's input recognizer/decoder, which may include an automatic speech recognizer (ASR), gesture recognizer, or handwriting recognizer.
2. The text is analyzed by a natural language understanding (NLU) unit.
3. The semantic information is analyzed by the dialogue manager, which keeps the history and state of the dialogue and manages the general flow of the conversation.
4. The dialogue manager usually contacts one or more task managers that have knowledge of the specific task domain.
5. The dialogue manager produces output using an output generator, which may include a natural language generator and a gesture generator.
6. Finally, the output is rendered by an output renderer, which may include a text-to-speech (TTS) engine, a talking head, or a robot or avatar.
Dialogue systems that are based on a text-only interface (e.g. text-based chat) contain only stages 2–5.
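As a rough illustration of this cycle, the sketch below wires together stub components for recognition, understanding, dialogue management, generation, and rendering. Every function, class, and value in it is a hypothetical placeholder rather than part of any particular system.

```python
# Minimal sketch of one activity cycle in a spoken dialogue system;
# every component below is a hypothetical stub standing in for a real
# ASR, NLU, dialogue-management, NLG, or TTS module.

def recognize(audio: bytes) -> str:
    """Stage 1: convert speech to plain text (speech recognizer)."""
    return "book a table for two"                     # stubbed ASR output

def understand(text: str) -> dict:
    """Stage 2: map the text to a semantic representation (NLU)."""
    return {"intent": "book_table", "party_size": 2}

def decide(semantics: dict, state: dict) -> dict:
    """Stages 3-4: update the dialogue state and pick the next action."""
    state["history"].append(semantics)
    if "time" not in semantics:
        return {"act": "request", "slot": "time"}
    return {"act": "confirm", "slots": semantics}

def generate(action: dict) -> str:
    """Stage 5: turn the chosen action into natural language (NLG)."""
    if action["act"] == "request":
        return f"What {action['slot']} would you like?"
    return "Your booking is confirmed."

def render(text: str) -> None:
    """Stage 6: render the output, e.g. by sending it to a TTS engine."""
    print(text)

state = {"history": []}
render(generate(decide(understand(recognize(b"...")), state)))
# -> "What time would you like?"
```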
Dialogue systems fall into several categories, which can be distinguished along a few dimensions. Many of the categories overlap and the distinctions may not be well established.
"A Natural Dialogue System is a form of dialogue system that tries to improve usability and user satisfaction by imitating human behaviour" [6] (Berg, 2014). It addresses the features of a human-to-human dialogue (e.g. sub dialogues and topic changes) and aims to integrate them into dialogue systems for human-machine interaction. Often, (spoken) dialogue systems require the user to adapt to the system because the system is only able to understand a very limited vocabulary, is not able to react to topic changes, and does not allow the user to influence the dialogue flow. Mixed-initiative is a way to enable the user to have an active part in the dialogue instead of only answering questions. However, the mere existence of mixed-initiative is not sufficient to be classified as a natural dialogue system. Other important aspects include: [6]
Although most of these aspects are addressed by many different research projects, there is a lack of tools that support the development of dialogue systems addressing these topics. [7] VoiceXML focuses on interactive voice response systems and is the basis for many spoken dialogue systems in industry (e.g. customer support applications), and AIML is famous for the A.L.I.C.E. chatbot, but neither integrates linguistic features such as dialogue acts or language generation. The research prototype NADIA gives an idea of how to fill that gap, combining aspects such as natural language generation, adaptive formulation, and sub-dialogues.
Some authors measure a dialogue system's performance as the percentage of sentences that are completely right, comparing the system's interpretation of each sentence against a reference model (this measure is called Concept Sentence Accuracy [8] or Sentence Understanding [4]).
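Assuming a sentence counts as correct only if its entire interpretation matches a reference annotation, such a sentence-level measure could be computed along the lines of the following sketch; the frame representation shown is a hypothetical example.

```python
# Sketch of a sentence-level accuracy measure: a sentence counts as
# correct only if its whole interpretation matches the reference.
# The frame representation is a hypothetical example.

def sentence_accuracy(hypotheses: list[dict], references: list[dict]) -> float:
    """Percentage of sentences whose full interpretation is exactly right."""
    correct = sum(h == r for h, r in zip(hypotheses, references))
    return 100.0 * correct / len(references)

refs = [{"intent": "depart", "city": "Rome"}, {"intent": "arrive", "city": "Milan"}]
hyps = [{"intent": "depart", "city": "Rome"}, {"intent": "arrive", "city": "Turin"}]
print(sentence_accuracy(hyps, refs))   # -> 50.0
```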
Dialogue systems can support a broad range of applications in business enterprises, education, government, healthcare, and entertainment. [9]
In some cases, conversational agents can interact with users using artificial characters. These agents are then referred to as embodied agents.
The following table surveys current frameworks, languages, and technologies for defining dialogue systems.
Name & links | System type | Description | Affiliation[s] | Environment[s] | Comments |
---|---|---|---|---|---|
AIML | Chatterbot language | XML dialect for creating natural language software agents | Richard Wallace, Pandorabots, Inc. | ||
ChatScript | Chatterbot language | Language/Engine for creating natural language software agents | Bruce Wilcox | ||
CSLU Toolkit | A state-based speech interface prototyping environment | OGI School of Science and Engineering M. McTear Ron Cole | publications are from 1999. | ||
NLUI Server | Domain-independent toolkit | Complete multilingual framework for building natural language user interface systems | LinguaSys | out-of-box support of mixed-initiative dialogues | |
Olympus | Complete framework for implementing spoken dialogue systems | Carnegie Mellon University | |||
Nextnova | Multimodal Platform | Platform for developing multimodal software applications. Based on State Chart XML (SCXML) | Ponvia Technology, Inc. | ||
VXML Voice XML | Spoken dialogue | Multimodal dialogue markup language | Developed initially by AT&T, then administered by an industry consortium and finally a W3C specification | Example | primarily for telephony. |
SALT | markup language | Multimodal dialogue markup language | Microsoft | "has not reached the level of maturity of VoiceXML in the standards process". | |
Quack.com - QXML | Development Environment | Company bought by AOL | |||
OpenDial | Domain-independent toolkit | Hybrid symbolic/statistical framework for spoken dialogue systems, implemented in Java | University of Oslo | ||
NADIA | dialogue engine and dialogue modelling | Creating natural dialogues/dialogue systems. Supports dialogue acts, mixed initiative, NLG. Implemented in Java. | Markus M. Berg | create XML-based dialogue files, no need to specify grammars, publications are from 2014 |
ELIZA is an early natural language processing computer program developed from 1964 to 1967 at MIT by Joseph Weizenbaum. Created to explore communication between humans and machines, ELIZA simulated conversation by using a pattern matching and substitution methodology that gave users an illusion of understanding on the part of the program, but it had no representation that could be considered to really understand what was being said by either party. Whereas the ELIZA program itself was originally written in MAD-SLIP, the pattern matching directives that contained most of its language capability were provided in separate "scripts", represented in a Lisp-like notation. The most famous script, DOCTOR, simulated a psychotherapist of the Rogerian school and used rules, dictated in the script, to respond to user inputs with non-directional questions. As such, ELIZA was one of the first chatterbots and one of the first programs capable of attempting the Turing test.
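The pattern-matching-and-substitution idea can be sketched as follows. The two rules and the pronoun reflections are simplified illustrations in the spirit of the DOCTOR script, not Weizenbaum's original directives.

```python
# Simplified illustration of ELIZA-style pattern matching and
# substitution; the rules are toy examples in the spirit of the
# DOCTOR script, not the original MAD-SLIP implementation.

import re

REFLECTIONS = {"my": "your", "i": "you", "am": "are", "me": "you"}

RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
]

def reflect(fragment: str) -> str:
    """Swap first- for second-person words in the captured fragment."""
    return " ".join(REFLECTIONS.get(word.lower(), word) for word in fragment.split())

def respond(utterance: str) -> str:
    """Turn the user's statement into a non-directive question."""
    for pattern, template in RULES:
        if match := pattern.search(utterance):
            return template.format(reflect(match.group(1).rstrip(".")))
    return "Please tell me more."

print(respond("I am unhappy about my work."))
# -> "How long have you been unhappy about your work?"
```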
In the industrial design field of human–computer interaction, a user interface (UI) is the space where interactions between humans and machines occur. The goal of this interaction is to allow effective operation and control of the machine from the human end, while the machine simultaneously feeds back information that aids the operators' decision-making process. Examples of this broad concept of user interfaces include the interactive aspects of computer operating systems, hand tools, heavy machinery operator controls and process controls. The design considerations applicable when creating user interfaces are related to, or involve such disciplines as, ergonomics and psychology.
Natural language understanding (NLU) or natural language interpretation (NLI) is a subset of natural language processing in artificial intelligence that deals with machine reading comprehension. NLU is considered an AI-hard problem.
A chatbot is a software application or web interface that is designed to mimic human conversation through text or voice interactions. Modern chatbots are typically online and use generative artificial intelligence systems that are capable of maintaining a conversation with a user in natural language and simulating the way a human would behave as a conversational partner. Such chatbots often use deep learning and natural language processing, but simpler chatbots have existed for decades.
Interactive voice response (IVR) is a technology that allows telephone users to interact with a computer-operated telephone system through the use of voice and DTMF tone input from a keypad. In telephony, IVR allows customers to interact with a company's host system via a telephone keypad or by speech recognition, after which services can be inquired about through the IVR dialogue. IVR systems can respond with pre-recorded or dynamically generated audio to further direct users on how to proceed. IVR systems deployed in the network are sized to handle large call volumes and are also used for outbound calling, as IVR systems are more intelligent than many predictive dialer systems.
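A minimal sketch of the keypad side of such a system is given below, assuming a hypothetical two-department menu; a deployed IVR platform would drive this logic from telephony hardware and play pre-recorded or synthesized prompts rather than printing text.

```python
# Minimal sketch of a DTMF-driven IVR menu; the departments and
# prompts are hypothetical, and a real deployment would receive the
# keypad tones from telephony hardware rather than from input().

MENU = {
    "1": "Connecting you to billing.",
    "2": "Connecting you to technical support.",
}

def ivr_menu() -> str:
    prompt = "For billing press 1, for technical support press 2."
    print(prompt)                       # in practice: play recorded audio
    digit = input("DTMF digit: ").strip()
    # Unrecognized input falls back to an operator transfer.
    return MENU.get(digit, "Sorry, I did not get that. Transferring you to an operator.")

if __name__ == "__main__":
    print(ivr_menu())
```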
Natural language generation (NLG) is a software process that produces natural language output. A widely cited survey of NLG methods describes NLG as "the subfield of artificial intelligence and computational linguistics that is concerned with the construction of computer systems that can produce understandable texts in English or other human languages from some underlying non-linguistic representation of information".
Customer service is the assistance and advice provided by a company through phone, online chat, and e-mail to those who buy or use its products or services. Each industry requires different levels of customer service, but in the end, the idea of a well-performed service is that of increasing revenues. The perception of success of the customer service interactions is dependent on employees "who can adjust themselves to the personality of the customer". Customer service is often practiced in a way that reflects the strategies and values of a firm. Good quality customer service is usually measured through customer retention.
Multimodal interaction provides the user with multiple modes of interacting with a system. A multimodal interface provides several distinct tools for input and output of data.
A voice-user interface (VUI) enables spoken human interaction with computers, using speech recognition to understand spoken commands and answer questions, and typically text to speech to play a reply. A voice command device is a device controlled with a voice user interface.
In artificial intelligence, an embodied agent, also sometimes referred to as an interface agent, is an intelligent agent that interacts with the environment through a physical body within that environment. Agents that are represented graphically with a body, for example a human or a cartoon animal, are also called embodied agents, although they have only virtual, not physical, embodiment. A branch of artificial intelligence focuses on empowering such agents to interact autonomously with human beings and the environment. Mobile robots are one example of physically embodied agents; Ananova and Microsoft Agent are examples of graphically embodied agents. Embodied conversational agents are embodied agents that are capable of engaging in conversation with one another and with humans employing the same verbal and nonverbal means that humans do.
A dialog manager (DM) is a component of a dialog system (DS) responsible for the state and flow of the conversation. Usually, the input to the DM is the human utterance converted to some system-specific semantic representation by the natural language understanding (NLU) component, the DM maintains state variables such as the dialog history, and its output is a set of instructions to other parts of the dialog system, usually in a semantic representation.
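Assuming a simple frame-based state and hypothetical semantic frames, the DM's role can be sketched as follows; this is an illustration of the interface, not any particular dialog manager implementation.

```python
# Illustrative frame-based dialog manager: it consumes semantic frames
# from an NLU component, keeps the dialog state, and emits instructions
# for downstream components.  Frame and instruction formats are
# hypothetical.

class DialogManager:
    def __init__(self, required_slots):
        self.required_slots = required_slots
        self.state = {"history": [], "slots": {}}

    def update(self, nlu_frame: dict) -> dict:
        """One turn: fold the NLU frame into the state, return an instruction."""
        self.state["history"].append(nlu_frame)
        self.state["slots"].update(nlu_frame.get("slots", {}))
        missing = [s for s in self.required_slots if s not in self.state["slots"]]
        if missing:
            # Instruction for the language generator: ask for the next slot.
            return {"instruction": "ask", "slot": missing[0]}
        # Instruction for the back end: all information has been collected.
        return {"instruction": "execute", "slots": dict(self.state["slots"])}

dm = DialogManager(required_slots=["origin", "destination"])
print(dm.update({"intent": "book_trip", "slots": {"origin": "Oslo"}}))
# -> {'instruction': 'ask', 'slot': 'destination'}
print(dm.update({"slots": {"destination": "Bergen"}}))
# -> {'instruction': 'execute', 'slots': {'origin': 'Oslo', 'destination': 'Bergen'}}
```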
The Verbot (Verbal-Robot) was a popular chatbot program and artificial intelligence software development kit (SDK) for Windows and web.
A spoken dialog system (SDS) is a computer system able to converse with a human with voice. It has two essential components that do not exist in a written text dialog system: a speech recognizer and a text-to-speech module. It can be further distinguished from command and control speech systems that can respond to requests but do not attempt to maintain continuity over time.
AutoTutor is an intelligent tutoring system developed by researchers at the Institute for Intelligent Systems at the University of Memphis, including Arthur C. Graesser, that helps students learn Newtonian physics, computer literacy, and critical thinking topics through tutorial dialogue in natural language. AutoTutor differs from other popular intelligent tutoring systems, such as the Cognitive Tutor, in that it focuses on natural language dialogue. This means that the tutoring occurs in the form of an ongoing conversation, with human input presented using either voice or free text input. To handle this input, AutoTutor uses computational linguistics algorithms including latent semantic analysis, regular expression matching, and speech act classifiers. These complementary techniques focus on the general meaning of the input, precise phrasing or keywords, and the functional purpose of the expression, respectively. In addition to natural language input, AutoTutor can also accept ad hoc events such as mouse clicks, learner emotions inferred from emotion sensors, and estimates of prior knowledge from a student model. Based on these inputs, the computer tutor determines when to reply and which speech acts to reply with. This process is driven by a "script" that includes a set of dialog-specific production rules.
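To give a sense of how such complementary techniques can be combined (using simple keyword overlap as a crude stand-in for latent semantic analysis), a sketch might look like the following; the expectation text, regular expression, and threshold are invented for illustration.

```python
# Hedged illustration of combining complementary input-analysis
# techniques: a regular expression for precise phrasing, keyword
# overlap as a crude stand-in for latent semantic analysis, and a tiny
# rule-based speech-act classifier.  All rules, expectation texts, and
# thresholds are invented for illustration.

import re

EXPECTATION = "the net force equals mass times acceleration"
MISCONCEPTION = re.compile(r"\bheavier objects fall faster\b", re.IGNORECASE)

def keyword_overlap(answer: str, expectation: str) -> float:
    """Fraction of expectation words mentioned by the student."""
    answer_words = set(answer.lower().split())
    expectation_words = set(expectation.lower().split())
    return len(answer_words & expectation_words) / len(expectation_words)

def speech_act(answer: str) -> str:
    """Very rough speech-act classification of the student's turn."""
    return "question" if answer.strip().endswith("?") else "contribution"

def tutor_reply(answer: str) -> str:
    if speech_act(answer) == "question":
        return "Good question: what do you think Newton's second law says?"
    if MISCONCEPTION.search(answer):
        return "Careful: in a vacuum all objects fall at the same rate."
    if keyword_overlap(answer, EXPECTATION) > 0.5:
        return "Right, force is the product of mass and acceleration."
    return "Can you say more about how force, mass and acceleration relate?"

print(tutor_reply("force equals mass times acceleration"))
# -> "Right, force is the product of mass and acceleration."
```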
A natural-language user interface is a type of human–computer interface in which linguistic phenomena such as verbs, phrases, and clauses act as UI controls for creating, selecting, and modifying data in software applications.
A virtual assistant (VA) is a software agent that can perform a range of tasks or services for a user based on user input such as commands or questions, including verbal ones. Such technologies often incorporate chatbot capabilities to simulate human conversation, such as via online chat, to facilitate interaction with their users. The interaction may be via text, graphical interface, or voice - as some virtual assistants are able to interpret human speech and respond via synthesized voices.
The Rich Representation Language, often abbreviated as RRL, is a computer animation language specifically designed to facilitate the interaction of two or more animated characters. The research effort was funded by the European Commission as part of the NECA Project. The NECA framework within which RRL was developed was not oriented towards the animation of movies, but the creation of intelligent "virtual characters" that interact within a virtual world and hold conversations with emotional content, coupled with suitable facial expressions.
The following outline is provided as an overview of and topical guide to natural-language processing.
Amazon Lex is a service for building conversational interfaces into any application using voice and text. It powers the Amazon Alexa virtual assistant. In April 2017, the platform was released to the developer community, with Amazon suggesting that it could be used for conversational interfaces including the web, mobile apps, robots, toys, drones, and more. Amazon had already launched Alexa Voice Services, which developers can use to integrate Alexa into their own devices, such as smart speakers and alarm clocks; however, Lex does not require that end users interact with the Alexa assistant per se, but rather with any type of assistant or interface. As of February 2018, users can define a response for Amazon Lex chatbots directly from the AWS management console.
A conversational user interface (CUI) is a user interface for computers that emulates a conversation with a real human. Historically, computers have relied on text-based user interfaces and graphical user interfaces (GUIs) to translate the user's desired action into commands the computer understands. While GUIs are an effective mechanism for completing computing actions, they impose a learning curve on the user. CUIs instead give the user the opportunity to communicate with the computer in natural language rather than through syntax-specific commands.