Dialogue system


An automated online assistant on a website - an example where dialogue systems are major components

A dialogue system, or conversational agent (CA), is a computer system intended to converse with a human. Dialogue systems employ one or more of text, speech, graphics, haptics, gestures, and other modes for communication on both the input and output channels.


The elements of a dialogue system are not precisely defined, because the concept is still under research;[citation needed] however, dialogue systems are distinct from chatbots.[1] A typical GUI wizard engages in a sort of dialogue, but it includes very few of the common dialogue system components, and its dialogue state is trivial.

Background

Dialogue systems based only on written text processing date from the early 1960s.[2] The first spoken dialogue system was produced by a DARPA project in the US in 1977.[3] After the end of this five-year project, several European projects produced the first dialogue systems able to speak several languages (including French, German and Italian).[4] These early systems were used in the telecom industry to provide various telephone services in specific domains, such as automated agendas and train timetable services.

Components

The set of components included in a dialogue system, and the way those components divide up responsibilities, differ from system to system. Central to any dialogue system is the dialogue manager, the component that manages the state of the dialogue and the dialogue strategy. A typical activity cycle in a dialogue system contains the following phases:[5]

  1. The user speaks, and the input is converted to plain text by the system's input recogniser/decoder, which may include an automatic speech recogniser (ASR), a gesture recogniser, or a handwriting recogniser.
  2. The text is analysed by a natural language understanding (NLU) unit, which may include proper name identification, part-of-speech tagging, and a syntactic/semantic parser.
  3. The semantic information is analysed by the dialogue manager, which keeps the history and state of the dialogue and manages the general flow of the conversation.
  4. Usually, the dialogue manager contacts one or more task managers, which have knowledge of the specific task domain.
  5. The dialogue manager produces output using an output generator, which may include a natural language generator, a gesture generator, or a layout manager.
  6. Finally, the output is rendered using an output renderer, which may include a text-to-speech (TTS) engine, a talking head, or a robot or avatar.

Dialogue systems that are based on a text-only interface (e.g. text-based chat) contain only stages 2–5.
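The text-only stages 2–5 can be sketched as a minimal Python pipeline. Everything here is hypothetical and deliberately simplistic: the keyword-based understand() stands in for a real NLU unit, the hard-coded TIMETABLE stands in for a task manager's domain knowledge, and generate() fills fixed templates rather than performing genuine natural language generation.

```python
import re
from dataclasses import dataclass, field

# Hypothetical domain knowledge held by the task manager (a tiny train timetable).
TIMETABLE = {"milan": ["08:15", "10:40"], "rome": ["09:05"]}


def understand(text: str) -> dict:
    """Stage 2 (toy NLU): map raw text to a semantic frame via keyword rules."""
    words = re.findall(r"[a-z]+", text.lower())
    if words and words[0] in {"hi", "hello"}:
        return {"intent": "greet"}
    if "train" in words:
        after = words[words.index("train") + 1:]
        # Take the word following "to", if there is one.
        dest = after[after.index("to") + 1] if "to" in after[:-1] else None
        return {"intent": "train_times", "destination": dest}
    return {"intent": "unknown"}


def task_manager(destination: str) -> list:
    """Stage 4: look up domain knowledge on behalf of the dialogue manager."""
    return TIMETABLE.get(destination, [])


@dataclass
class DialogueManager:
    """Stage 3: keeps the dialogue history and decides the next system act."""
    history: list = field(default_factory=list)

    def handle(self, frame: dict) -> dict:
        self.history.append(frame)
        if frame["intent"] == "greet":
            return {"act": "greet"}
        if frame["intent"] == "train_times":
            if frame["destination"] is None:
                return {"act": "request", "slot": "destination"}
            times = task_manager(frame["destination"])
            return {"act": "inform", "destination": frame["destination"], "times": times}
        return {"act": "not_understood"}


def generate(act: dict) -> str:
    """Stage 5 (toy output generator): fill a fixed template for each act."""
    if act["act"] == "greet":
        return "Hello! Where would you like to travel?"
    if act["act"] == "request":
        return f"Which {act['slot']}?"
    if act["act"] == "inform":
        place = act["destination"].title()
        if act["times"]:
            return f"Trains to {place} leave at {', '.join(act['times'])}."
        return f"Sorry, I have no trains to {place}."
    return "Sorry, I did not understand."


def turn(dm: DialogueManager, text: str) -> str:
    """One text-only cycle: NLU, then dialogue manager, then output generation."""
    return generate(dm.handle(understand(text)))


dm = DialogueManager()
print(turn(dm, "When is the next train to Milan?"))  # Trains to Milan leave at 08:15, 10:40.
```

Keeping the dialogue manager separate from the task manager means the timetable lookup could be swapped for another domain without touching the conversation logic.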

Types of systems

Dialogue systems can be classified along several dimensions, such as the communication modality they use and which party holds the initiative in the dialogue. Many of these categories overlap, and the distinctions may not be well established.

Natural dialogue systems

"A Natural Dialogue System is a form of dialogue system that tries to improve usability and user satisfaction by imitating human behaviour" [6] (Berg, 2014). It addresses the features of human-to-human dialogue (e.g. sub-dialogues and topic changes) and aims to integrate them into dialogue systems for human-machine interaction. Dialogue systems, particularly spoken ones, often require the user to adapt to the system, because the system understands only a very limited vocabulary, cannot react to topic changes, and does not allow the user to influence the dialogue flow. Mixed initiative is one way to let the user take an active part in the dialogue instead of only answering questions. However, supporting mixed initiative is not by itself sufficient for a system to be classified as a natural dialogue system. Other important aspects include natural language generation and adaptive formulation.[6]

Although most of these aspects are addressed by various research projects, there is a lack of tools that support the development of dialogue systems addressing these topics.[7] Widely used languages such as VoiceXML, which focuses on interactive voice response systems and is the basis for many spoken dialogue systems in industry (e.g. customer support applications), and AIML, which is known for the A.L.I.C.E. chatbot, do not integrate linguistic features such as dialogue acts or language generation. NADIA, a research prototype, shows one way to fill that gap by combining some of the aforementioned aspects, such as natural language generation, adaptive formulation, and sub-dialogues.
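Mixed initiative can be illustrated with a small frame-based sketch. The slot names, prompts and keyword rules below are hypothetical, and this is not how NADIA or VoiceXML implement the feature: the system asks for the first missing slot (system initiative), but every slot the user mentions is accepted (user initiative), so the user can supply more information than was asked for.

```python
import re

SLOTS = ["origin", "destination", "day"]
DAYS = {"monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"}
PROMPTS = {
    "origin": "Where are you leaving from?",
    "destination": "Where do you want to go?",
    "day": "On which day?",
}


def extract(text: str) -> dict:
    """Hypothetical keyword extractor: returns every slot the user mentions,
    not only the one the system asked for."""
    words = re.findall(r"[a-z]+", text.lower())
    found = {}
    for i, w in enumerate(words[:-1]):
        if w == "from":
            found["origin"] = words[i + 1]
        elif w == "to":
            found["destination"] = words[i + 1]
    for w in words:
        if w in DAYS:
            found["day"] = w
    return found


class MixedInitiativeDialogue:
    def __init__(self):
        self.frame = {}        # slots filled so far
        self.expected = None   # slot the system last asked for

    def next_prompt(self) -> str:
        missing = [s for s in SLOTS if s not in self.frame]
        if not missing:
            self.expected = None
            f = self.frame
            return (f"Searching trains from {f['origin'].title()} "
                    f"to {f['destination'].title()} on {f['day'].title()}.")
        # System initiative: ask for the first missing slot.
        self.expected = missing[0]
        return PROMPTS[missing[0]]

    def user_turn(self, text: str) -> str:
        found = extract(text)
        if not found and self.expected:
            # A bare answer such as "Milan" fills the slot that was just asked for.
            words = re.findall(r"[a-z]+", text.lower())
            if words:
                found = {self.expected: words[-1]}
        # User initiative: accept all mentioned slots; newer values overwrite older ones.
        self.frame.update(found)
        return self.next_prompt()
```

Asked only for the origin, a user can answer "I need a train to Rome on Friday"; the system stores both extra slots and then asks only for what is still missing.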

Performance

Some authors measure a dialogue system's performance as the percentage of sentences that are understood completely correctly, determined by comparing the system's interpretation of each sentence with a reference model (this measure is called Concept Sentence Accuracy[8] or Sentence Understanding[4]).
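A sentence-level measure of this kind can be sketched as follows, under the assumption that each sentence's meaning is represented as a set of (concept, value) pairs and that a sentence counts only if its set matches the reference exactly. The exact definitions used in the cited works may differ, for example in how concepts are annotated.

```python
def concept_sentence_accuracy(hypotheses, references):
    """Share of sentences whose recognised concepts match the reference exactly.

    Each sentence is a set of (concept, value) pairs; a sentence counts as
    correct only if no concept is missing, wrong, or spurious.
    """
    if len(hypotheses) != len(references):
        raise ValueError("need exactly one hypothesis per reference sentence")
    if not references:
        raise ValueError("no sentences to score")
    correct = sum(set(h) == set(r) for h, r in zip(hypotheses, references))
    return correct / len(references)


references = [
    {("intent", "train_times"), ("destination", "milan")},
    {("intent", "greet")},
    {("intent", "train_times"), ("destination", "rome")},
    {("intent", "goodbye")},
]
hypotheses = [
    {("intent", "train_times"), ("destination", "milan")},
    {("intent", "greet")},
    {("intent", "train_times"), ("destination", "roma")},  # one wrong value
    {("intent", "goodbye")},
]
print(concept_sentence_accuracy(hypotheses, references))  # 0.75
```

Because the measure is all-or-nothing per sentence, the single wrong value in the third sentence costs a full quarter of the score, which makes it stricter than a concept-level accuracy.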

Applications

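Dialogue systems can support a broad range of applications in business enterprises, education, government, healthcare, and entertainment.[9] Examples include answering customers' questions about products and services on a company's website, help desks and technical support, guided selling, training and education, and genomic data extraction and analysis.[10]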

In some cases, conversational agents can interact with users using artificial characters. These agents are then referred to as embodied agents.

Toolkits and architectures

The following table surveys frameworks, languages and technologies for defining dialogue systems.

| Name | System type | Description | Affiliation(s) | Comments |
|---|---|---|---|---|
| AIML | Chatterbot language | XML dialect for creating natural language software agents | Richard Wallace, Pandorabots, Inc. | |
| ChatScript | Chatterbot language | Language/engine for creating natural language software agents | Bruce Wilcox | |
| CSLU Toolkit | | A state-based speech interface prototyping environment | OGI School of Science and Engineering; M. McTear; Ron Cole | Publications are from 1999. |
| NLUI Server | Domain-independent toolkit | Complete multilingual framework for building natural language user interface systems | LinguaSys | Out-of-the-box support for mixed-initiative dialogues |
| Olympus | | Complete framework for implementing spoken dialogue systems | Carnegie Mellon University | |
| Nextnova | Multimodal platform | Platform for developing multimodal software applications, based on State Chart XML (SCXML) | Ponvia Technology, Inc. | |
| VXML (VoiceXML) | Spoken dialogue | Multimodal dialogue markup language | Developed initially by AT&T, then administered by an industry consortium, and finally a W3C specification | Primarily for telephony |
| SALT | Markup language | Multimodal dialogue markup language | Microsoft | "has not reached the level of maturity of VoiceXML in the standards process" |
| Quack.com - QXML | Development environment | | | Company bought by AOL |
| OpenDial | Domain-independent toolkit | Hybrid symbolic/statistical framework for spoken dialogue systems, implemented in Java | University of Oslo | |
| NADIA | Dialogue engine and dialogue modelling | Creating natural dialogues/dialogue systems; supports dialogue acts, mixed initiative and NLG; implemented in Java | Markus M. Berg | Dialogues are defined in XML-based files with no need to specify grammars; publications are from 2014 |

See also


ELIZA: early natural language processing computer program

ELIZA is an early natural language processing computer program developed from 1964 to 1967 at MIT by Joseph Weizenbaum. Created to explore communication between humans and machines, ELIZA simulated conversation by using a pattern matching and substitution methodology that gave users an illusion of understanding on the part of the program, but had no representation that could be considered really understanding what was being said by either party. Whereas the ELIZA program itself was written (originally) in MAD-SLIP, the pattern matching directives that contained most of its language capability were provided in separate "scripts", represented in a lisp-like representation. The most famous script, DOCTOR, simulated a psychotherapist of the Rogerian school, and used rules, dictated in the script, to respond with non-directional questions to user inputs. As such, ELIZA was one of the first chatterbots and one of the first programs capable of attempting the Turing test.
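ELIZA's pattern matching and substitution approach can be sketched in a few lines of Python. The rules below are hypothetical and only loosely in the spirit of the DOCTOR script; they are not Weizenbaum's MAD-SLIP implementation or his script format. Each rule pairs a regular expression with a response template, and the captured fragment is "reflected" (first and second person swapped) before being substituted back.

```python
import re

# Pronoun "reflection": swap first and second person in the captured fragment.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my", "are": "am"}

# Hypothetical DOCTOR-style rules: (pattern, response template); first match wins.
RULES = [
    (re.compile(r"\bi need (.*)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.*)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (\w+)", re.IGNORECASE), "Tell me more about your {0}."),
]


def reflect(fragment: str) -> str:
    words = fragment.lower().rstrip(".!?").split()
    return " ".join(REFLECTIONS.get(w, w) for w in words)


def respond(utterance: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            # Substitute the reflected fragment into the response template.
            return template.format(*(reflect(g) for g in match.groups()))
    return "Please go on."  # default when no rule matches


print(respond("I am worried about my exams"))
# How long have you been worried about your exams?
```

Nothing in the sketch models meaning: the reply to "I am worried about my exams" is produced purely by matching and rewriting the surface string, which is exactly the illusion of understanding described above.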

User interface: means by which a user interacts with and controls a machine

In the industrial design field of human–computer interaction, a user interface (UI) is the space where interactions between humans and machines occur. The goal of this interaction is to allow effective operation and control of the machine from the human end, while the machine simultaneously feeds back information that aids the operators' decision-making process. Examples of this broad concept of user interfaces include the interactive aspects of computer operating systems, hand tools, heavy machinery operator controls and process controls. The design considerations applicable when creating user interfaces are related to, or involve such disciplines as, ergonomics and psychology.

Natural language understanding (NLU) or natural language interpretation (NLI) is a subset of natural language processing in artificial intelligence that deals with machine reading comprehension. NLU is considered an AI-hard problem.

Chatbot: program that simulates conversation

A chatbot is a software application or web interface that is designed to mimic human conversation through text or voice interactions. Modern chatbots are typically online and use generative artificial intelligence systems that are capable of maintaining a conversation with a user in natural language and simulating the way a human would behave as a conversational partner. Such chatbots often use deep learning and natural language processing, but simpler chatbots have existed for decades.

Interactive voice response (IVR) is a technology that allows telephone users to interact with a computer-operated telephone system through the use of voice and DTMF tones input with a keypad. In telephony, IVR allows customers to interact with a company's host system via a telephone keypad or by speech recognition, after which services can be inquired about through the IVR dialogue. IVR systems can respond with pre-recorded or dynamically generated audio to further direct users on how to proceed. IVR systems deployed in the network are sized to handle large call volumes and are also used for outbound calling, as IVR systems are more intelligent than many predictive dialer systems.
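The keypad side of an IVR dialogue is commonly organised as a menu tree. A minimal sketch, assuming a hypothetical four-node menu: it replays a string of DTMF digits and returns the prompts that would be played, and it does not cover speech recognition or audio output.

```python
# Hypothetical IVR menu tree: each node has a prompt and maps DTMF keys to child nodes.
MENU = {
    "main": {"prompt": "For billing press 1, for technical support press 2.",
             "keys": {"1": "billing", "2": "support"}},
    "billing": {"prompt": "For your balance press 1, to return press 0.",
                "keys": {"1": "balance", "0": "main"}},
    "support": {"prompt": "Please hold for an agent.", "keys": {"0": "main"}},
    "balance": {"prompt": "Your balance is being retrieved.", "keys": {"0": "main"}},
}


def run_ivr(dtmf_digits: str, start: str = "main") -> list:
    """Replay a sequence of DTMF key presses and return the prompts played."""
    node = start
    played = [MENU[node]["prompt"]]
    for digit in dtmf_digits:
        next_node = MENU[node]["keys"].get(digit)
        if next_node is None:
            # Unknown key: stay on the same node and repeat its prompt.
            played.append("Invalid choice. " + MENU[node]["prompt"])
            continue
        node = next_node
        played.append(MENU[node]["prompt"])
    return played


for prompt in run_ivr("11"):
    print(prompt)
```

Unlike the dialogue managers sketched for natural dialogue systems, this state is just the current menu node, so the caller cannot skip ahead or supply several choices at once.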

Natural language generation (NLG) is a software process that produces natural language output. A widely cited survey of NLG methods describes NLG as "the subfield of artificial intelligence and computational linguistics that is concerned with the construction of computer systems that can produce understandable texts in English or other human languages from some underlying non-linguistic representation of information".
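The simplest form of NLG fills fixed templates from structured data. A sketch, assuming a hypothetical record of train departures as the non-linguistic input; fuller NLG systems typically add steps such as content selection and sentence planning that this omits.

```python
def describe_departures(destination: str, departures: list) -> str:
    """Hypothetical template-based realiser: structured data in, one sentence out."""
    if not departures:
        return f"There are no trains to {destination} today."
    if len(departures) == 1:
        return f"The only train to {destination} leaves at {departures[0]}."
    # Simple aggregation: join all but the last time with commas, then "and".
    times = ", ".join(departures[:-1]) + " and " + departures[-1]
    return f"There are {len(departures)} trains to {destination}, leaving at {times}."


print(describe_departures("Milan", ["08:15", "10:40", "13:05"]))
# There are 3 trains to Milan, leaving at 08:15, 10:40 and 13:05.
```

Choosing a different template for zero, one or several departures is a small instance of adapting the wording to the data rather than repeating one fixed prompt.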

Customer service: provision of service to customers

Customer service is the assistance and advice provided by a company through phone, online chat, and e-mail to those who buy or use its products or services. Each industry requires different levels of customer service, but ultimately the idea of a well-performed service is to increase revenues. The perception of success of customer service interactions depends on employees "who can adjust themselves to the personality of the customer". Customer service is often practiced in a way that reflects the strategies and values of a firm. Good quality customer service is usually measured through customer retention.

Multimodal interaction provides the user with multiple modes of interacting with a system. A multimodal interface provides several distinct tools for input and output of data.

A voice-user interface (VUI) enables spoken human interaction with computers, using speech recognition to understand spoken commands and answer questions, and typically text to speech to play a reply. A voice command device is a device controlled with a voice user interface.

In artificial intelligence, an embodied agent, also sometimes referred to as an interface agent, is an intelligent agent that interacts with the environment through a physical body within that environment. Agents that are represented graphically with a body, for example a human or a cartoon animal, are also called embodied agents, although they have only virtual, not physical, embodiment. A branch of artificial intelligence focuses on empowering such agents to interact autonomously with human beings and the environment. Mobile robots are one example of physically embodied agents; Ananova and Microsoft Agent are examples of graphically embodied agents. Embodied conversational agents are embodied agents that are capable of engaging in conversation with one another and with humans employing the same verbal and nonverbal means that humans do.

A dialog manager (DM) is a component of a dialog system (DS), responsible for the state and flow of the conversation.

The Verbot (Verbal-Robot) was a popular chatbot program and artificial intelligence software development kit (SDK) for Windows and web.

A spoken dialog system (SDS) is a computer system able to converse with a human with voice. It has two essential components that do not exist in a written text dialog system: a speech recognizer and a text-to-speech module. It can be further distinguished from command and control speech systems that can respond to requests but do not attempt to maintain continuity over time.

AutoTutor is an intelligent tutoring system, developed by researchers at the Institute for Intelligent Systems at the University of Memphis including Arthur C. Graesser, that helps students learn Newtonian physics, computer literacy, and critical thinking topics through tutorial dialogue in natural language. AutoTutor differs from other popular intelligent tutoring systems, such as the Cognitive Tutor, in that it focuses on natural language dialog. This means that the tutoring occurs in the form of an ongoing conversation, with human input presented using either voice or free text input. To handle this input, AutoTutor uses computational linguistics algorithms including latent semantic analysis, regular expression matching, and speech act classifiers. These complementary techniques focus on the general meaning of the input, precise phrasing or keywords, and functional purpose of the expression, respectively. In addition to natural language input, AutoTutor can also accept ad hoc events such as mouse clicks, learner emotions inferred from emotion sensors, and estimates of prior knowledge from a student model. Based on these inputs, the computer tutor determines when to reply and what speech acts to reply with. This process is driven by a "script" that includes a set of dialog-specific production rules.

Natural-language user interface is a type of computer human interface where linguistic phenomena such as verbs, phrases and clauses act as UI controls for creating, selecting and modifying data in software applications.

Virtual assistant: software agent

A virtual assistant (VA) is a software agent that can perform a range of tasks or services for a user based on user input such as commands or questions, including verbal ones. Such technologies often incorporate chatbot capabilities to simulate human conversation, such as via online chat, to facilitate interaction with their users. The interaction may be via text, graphical interface, or voice - as some virtual assistants are able to interpret human speech and respond via synthesized voices.

The Rich Representation Language, often abbreviated as RRL, is a computer animation language specifically designed to facilitate the interaction of two or more animated characters. The research effort was funded by the European Commission as part of the NECA Project. The NECA framework within which RRL was developed was not oriented towards the animation of movies, but the creation of intelligent "virtual characters" that interact within a virtual world and hold conversations with emotional content, coupled with suitable facial expressions.

An outline of natural-language processing serves as an overview of and topical guide to the field.

Amazon Lex is a service for building conversational interfaces into any application using voice and text. It powers the Amazon Alexa virtual assistant. In April 2017, Amazon released the platform to the developer community, suggesting that it could be used for conversational interfaces including web, mobile apps, robots, toys, drones, and more. Amazon had already launched Alexa Voice Services, which developers can use to integrate Alexa into their own devices, such as smart speakers and alarm clocks; however, Lex does not require end users to interact with the Alexa assistant itself, but rather with any type of assistant or interface. Since February 2018, users have been able to define a response for Amazon Lex chatbots directly from the AWS Management Console.

A conversational user interface (CUI) is a user interface for computers that emulates a conversation with a real human. Historically, computers have relied on text-based user interfaces and graphical user interfaces (GUIs) to translate the user's desired action into commands the computer understands. While GUIs are an effective mechanism for completing computing actions, they impose a learning curve on the user. CUIs instead let the user communicate with the computer in their natural language rather than in syntax-specific commands.

References

  1. Klüwer, Tina. "From chatbots to dialog systems." Conversational agents and natural language interaction: Techniques and Effective Practices. IGI Global, 2011. 1–22.
  2. McTear, Michael, Zoraida Callejas, and David Griol, The conversational interface: Talking to smart devices, Springer, 2016.
  3. Giancarlo Pirani (ed), Advanced algorithms and architectures for speech understanding, Vol. 1. Springer Science & Business Media, 2013.
  4. Alberto Ciaramella, A prototype performance evaluation report, Sundial work package 8000 (1993).
  5. Jurafsky & Martin (2009), Speech and language processing. Pearson International Edition, ISBN 978-0-13-504196-3, Chapter 24.
  6. Berg, Markus M. (2014), Modelling of Natural Dialogues in the Context of Speech-based Information and Control Systems, Akademische Verlagsgesellschaft AKA, ISBN 978-3-89838-508-4.
  7. Berg, Markus M. (2015), "NADIA: A Simplified Approach Towards the Development of Natural Dialogue Systems", Natural Language Processing and Information Systems, Lecture Notes in Computer Science, vol. 9103, pp. 144–150, doi:10.1007/978-3-319-19581-0_12, ISBN 978-3-319-19580-3.
  8. Bangalore, Srinivas, and Michael Johnston. "Robust understanding in multimodal interfaces." Computational Linguistics 35.3 (2009): 345–397.
  9. Lester, J.; Branting, K.; Mott, B. (2004), "Conversational Agents", The Practical Handbook of Internet Computing, Chapman & Hall.
  10. Crovari; Pidò; Pinoli; Bernasconi; Canakoglu; Garzotto; Ceri (2021), "GeCoAgent: a conversational agent for empowering genomic data extraction and analysis", ACM Transactions on Computing for Healthcare, 3, ACM New York, NY: 1–29, doi:10.1145/3464383, hdl:11311/1192262, S2CID 245855725.
