| Developed by | MMI Working Group (World Wide Web Consortium) |
| --- | --- |
| Initial release | April 22, 2005 |
| Latest release | October 25, 2012 (W3C Recommendation) |
| Type of format | Recommendation |
| Standard | Latest version |
| Website | http://www.w3.org/2002/mmi/ |
Multimodal Architecture and Interfaces is an open standard developed by the World Wide Web Consortium since 2005 and published as a W3C Recommendation on October 25, 2012. The document is a technical report specifying a multimodal system architecture and its generic interfaces, intended to facilitate integration and multimodal interaction management in a computer system. It was developed by the W3C's Multimodal Interaction Working Group.
The Multimodal Architecture and Interfaces recommendation introduces a generic structure and a communication protocol to allow the modules in a multimodal system to communicate with each other.
This specification proposes an event-driven architecture as a general frame of reference focused on the control-flow data exchange. It can be used to determine the basic infrastructure needed to control an application's multimodal services.
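To make this control-flow exchange concrete, the following sketch, in Python, shows the kind of event structure such an architecture passes between modules. The field names mirror the attributes of the Life-Cycle Events shown later (requestID, source, target, context, status), but the class and helper are illustrative assumptions, not part of the specification.

```python
# Illustrative only: a minimal control-flow event mirroring the common
# fields of the MMI Life-Cycle Events (not an API from the specification).
from dataclasses import dataclass
from typing import Optional

@dataclass
class LifeCycleEvent:
    name: str                      # e.g. "startRequest", "startResponse"
    request_id: str                # correlates a request with its response
    source: str                    # URI of the emitting module
    target: str                    # URI of the receiving module
    context: Optional[str] = None  # interaction context shared by related events
    status: Optional[str] = None   # "success" / "failure" on responses

def make_response(request: LifeCycleEvent, status: str) -> LifeCycleEvent:
    """Build the response paired with a request, swapping source and target."""
    return LifeCycleEvent(
        name=request.name.replace("Request", "Response"),
        request_id=request.request_id,
        source=request.target,
        target=request.source,
        context=request.context,
        status=status,
    )
```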
The architecture is also proposed to facilitate the task of implementing several types of multimodal service providers on multiple devices: mobile devices and cell phones, home appliances, Internet of Things objects, television and home networks, enterprise applications, web applications, [1] "smart" cars, and medical devices and applications.
Multimodal Architecture and Interfaces is the specified description of a larger services infrastructure called the Runtime Framework, which provides the main functions a multimodal system may need. This framework is at a higher level of abstraction than the MMI architecture. [2] The MMI Runtime Framework comprises the runtime support and communication modules of the multimodal system, while the MMI Architecture is the description and specification of its main modules, their interfaces and their communication modes.
The Multimodal Architecture and Interfaces specification is based on the MVC design pattern, which proposes organizing the user interface structure into three parts: the Model, the View and the Controller. [3] This design pattern is also reflected in the Data-Flow-Presentation architecture from the Voice Browser Working Group. [4]
A particularity of this architecture is that although the presentation layer represented by the View has traditionally been associated with graphical interfaces, this recommendation generalizes the View to the broader context of multimodal interaction, where the user can use a combination of visual, auditory, biometric and/or tactile modalities.
The MMI Architecture recommendation distinguishes three types of components: the Interaction Manager (IM), the Data Component (DC) and the Modality Components (MC). This distinction is similar to the separation between the Controller, the Model and the presentation documents of the View in the MVC pattern.
Another characteristic is recursion. The modules are black boxes, and it is possible to encapsulate several components into a more complex component that communicates with an Interaction Manager at a higher level. In this way, the architecture follows the nested dolls principle. [5]
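As a rough illustration of this recursion, the hypothetical Python sketch below shows a composite component that exposes a single black-box entry point to the level above while internally routing events to the children it encapsulates; all names are invented for the example.

```python
# Hypothetical sketch of the nested dolls principle: from the outside this is
# one modality component; inside, it plays the interaction manager's role for
# the components it encapsulates.
from typing import Callable, Dict

Event = dict          # e.g. {"name": "startRequest", "target": "childA", ...}
Handler = Callable[[Event], None]

class CompositeComponent:
    def __init__(self, children: Dict[str, Handler]) -> None:
        self._children = children  # child components, keyed by target URI

    def handle_event(self, event: Event) -> None:
        # The higher-level Interaction Manager only ever sees this single
        # entry point; internal routing stays hidden inside the black box.
        self._children[event["target"]](event)
```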
The specification also covers the issues of a distributed implementation across multiple hardware resources in a network, as well as a centralized implementation with all the modules installed on a single piece of hardware. Information sharing between modules is loosely coupled. This promotes low dependence between modules, reduces the impact of changes in one module on other modules, and facilitates module reuse. As a result, the modules have little or no knowledge of the functioning of any other module; communication between modules is done through the exchange of messages following a precise communication protocol provided by the architecture's API. [6]
The Interaction Manager is a logical component responsible for all message exchanges between the components of the system and the multimodal Runtime Framework. It is a communication bus and also an event handler.
Each application can configure at least one Interaction Manager to define the required interaction logic. This controller is the core of the multimodal interaction.
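The following hypothetical Python sketch illustrates this bus-and-event-handler role, as well as the loose coupling described earlier: components never address each other directly but register with the Interaction Manager, which delivers each event by its target. The class and its methods are an assumption for illustration, not the specification's API.

```python
# Hypothetical sketch of the Interaction Manager as a communication bus:
# modality components register under a URI and receive only the events
# addressed to them, so no component needs to know about any other.
from typing import Callable, Dict

Event = dict  # e.g. {"name": "startRequest", "target": "myPlayerMC", ...}

class InteractionManager:
    def __init__(self) -> None:
        self._components: Dict[str, Callable[[Event], None]] = {}

    def register(self, uri: str, handler: Callable[[Event], None]) -> None:
        """Attach a modality component's event handler under its URI."""
        self._components[uri] = handler

    def route(self, event: Event) -> None:
        """Deliver an event to its target component."""
        handler = self._components.get(event["target"])
        if handler is None:
            raise KeyError(f"unknown target: {event['target']}")
        handler(event)
```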
The Modality Components are responsible for specific tasks, including handling inputs and outputs in various modalities, such as speech, writing, video, etc.
These are logical entities that handle the input and output of the different hardware devices (microphone, graphic tablet, keyboard) and software services (motion detection, biometric changes) associated with the multimodal system. For example (see figure below), a modality component A can be in charge of both speech recognition and audio input management at the same time. Another modality component B can manage the complementary command inputs on two different devices: a graphics tablet and a microphone. Two modality components C can separately manage two complementary inputs given by a single device: a camcorder. Finally, a modality component D can use an external recognition web service and be responsible only for the control of the communication exchanges needed for the recognition task.
In all four cases the system has a generic modality component for the detection of a voice command input, despite the differences in implementation. Any modality component can potentially wrap multiple features provided by multiple physical devices, and more than one modality component can be included in a single device. In this sense, the modality component is an abstraction of the same kind of input, handled and implemented differently in each case.
For this reason, the W3C recommendation currently does not describe in detail the structure or implementation of the modality components. It focuses only on the need for a communication interface with the Interaction Manager and for an implementation that follows a specific communication protocol: the Life-Cycle Events.
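In practice this means a modality component may be implemented in any way, provided it can answer the Life-Cycle Events. A minimal sketch of such an interface in Python might look as follows; the method names simply mirror the event names, and the class itself is an illustrative assumption.

```python
# Hypothetical interface a modality component could expose to the
# Interaction Manager: one handler per Life-Cycle Event pair it must answer.
from abc import ABC, abstractmethod

class ModalityComponent(ABC):
    @abstractmethod
    def prepare(self, context: str, content: str) -> None: ...

    @abstractmethod
    def start(self, context: str, data: dict) -> None: ...

    @abstractmethod
    def cancel(self, context: str, immediate: bool) -> None: ...

    @abstractmethod
    def pause(self, context: str) -> None: ...

    @abstractmethod
    def resume(self, context: str) -> None: ...

    def status(self) -> str:
        return "alive"  # default answer to a statusRequest
```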
The Data Component's primary role is to store the public data of the application that may be required by one or several modality components or by other modules (e.g., the Framework's session module).
The Data Component can be a module internal (A) or external (B) to the Interaction Manager (see figure), depending on the implementation chosen by each application. However, the Interaction Manager is the only module with direct access to the Data Component: only the Interaction Manager can view and edit the data and communicate with external servers if necessary. As a result, the Modality Components must use the Interaction Manager as an intermediary to access the public data of the multimodal application.
However, for the storage of private data, each Modality Component can implement its own Data Component. This private Data Component can also access external servers B (see figure) and keep the data that the Modality Component may require, for example, in the speech recognition task. This can be the case in an implementation following the nested dolls principle of the MMI architecture recommendation.
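The hypothetical Python sketch below illustrates this mediation: the public Data Component is reachable only through the Interaction Manager's accessors, while a component remains free to keep a private store of its own. All names are invented for the example.

```python
# Hypothetical sketch: the Interaction Manager is the sole gateway to the
# application's public data; modality components call its accessors instead
# of reaching the Data Component directly.
class DataComponent:
    def __init__(self) -> None:
        self._store: dict = {}

    def get(self, key: str):
        return self._store.get(key)

    def set(self, key: str, value) -> None:
        self._store[key] = value

class InteractionManagerWithData:
    def __init__(self) -> None:
        self._public = DataComponent()  # only the IM holds a reference

    def read_public(self, key: str):
        return self._public.get(key)

    def write_public(self, key: str, value) -> None:
        self._public.set(key, value)

# A modality component may still keep its own private Data Component,
# e.g. a local cache of recognition results, without going through the IM.
```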
In the MMI architecture, the communication protocol is asynchronous, bi-directional and based on the exchange of event notifications that are raised by the system following a user action or some internal activity.
This protocol defines the exchange mode and how to establish and end the communication between modules. In the case of this specification, it is reflected in the Life-Cycle Events: eight standard control events proposed for controlling devices and hardware services (such as a video player or a sound reproduction device) and two notifications proposed for monitoring the current status of the multimodal system.
The specification recommends eight standard Life-Cycle Events, specified as pairs of Request > Response exchanges: newContext, clearContext, prepare, start, cancel, pause, resume and status.
<mmi:mmixmlns:mmi="http://www.w3.org/2008/04/mmi-arch"version="1.0"><mmi:newContextRequestrequestID="myReq1"source="myPointerMC.php"target="myIM.php"data="myMCStatus.xml"/></mmi:mmi>
<mmi:mmixmlns:mmi="http://www.w3.org/2008/04/mmi-arch"version="1.0"><mmi:newContextResponserequestID="myReq1"source="myIM.php"target="myPointerMC.php"context="myContextID1"status="success"/></mmi:mmi>
<mmi:mmixmlns:mmi="http://www.w3.org/2008/04/mmi-arch"version="1.0"><mmi:clearContextRequestrequestID="myReq2"source="myPointerMC.php"target="myIM.php"context="myContextID1"/></mmi:mmi>
<mmi:mmixmlns:mmi="http://www.w3.org/2008/04/mmi-arch"version="1.0"><mmi:clearContextResponserequestID="myReq2"source="myPointerMC.php"target="myIM.php"context="myContextID1"status="success"/></mmi:mmi>
<mmi:mmixmlns:mmi="http://www.w3.org/2008/04/mmi-arch"version="1.0"xmlns:svg="http://www.w3.org/2000/svg"><mmi:prepareRequestrequestID="myReq3"source="myIM.php"target="myDisplayMC.php"context="myContextID2"><mmi:content><svg:svgwidth="100%"height="100%"version="1.1"><rectwidth="300"height="100"style="fill:rgb(0,0,255);stroke-width:1; stroke:rgb(0,0,0)"/></svg:svg></mmi:content></mmi:prepareRequest></mmi:mmi>
<mmi:mmixmlns:mmi="http://www.w3.org/2008/04/mmi-arch"version="1.0"><mmi:prepareResponserequestID="myReq3"source="myDisplayMC.php"target="myIM.php"status="success"context="myContextID2"></mmi:prepareResponse></mmi:mmi>
<mmi:mmixmlns:mmi="http://www.w3.org/2008/04/mmi-arch"version="1.0"><mmi:startRequestrequestID="myReq4"source="myIM.php"target="myPlayerMC.php"context="myContextID3"data="myPlayerParams.xml"><mmi:contentURLhref="myAnimation.swf"/></mmi:startRequest></mmi:mmi>
<mmi:mmixmlns:mmi="http://www.w3.org/2008/04/mmi-arch"version="1.0"><mmi:startResponserequestID="myReq4"source="myPlayerMC.php"target="myIM.php"status="success"context="myContextID3"></mmi:startResponse></mmi:mmi>
<mmi:mmixmlns:mmi="http://www.w3.org/2008/04/mmi-arch"version="1.0"><mmi:startResponserequestID="myReq4"source="myPlayerMC.php"target="myIM.php"status="failure"context="myContextID3"><mmi:statusInfo> NoContent </mmi:statusInfo></mmi:startResponse></mmi:mmi>
<mmi:mmixmlns:mmi="http://www.w3.org/2008/04/mmi-arch"version="1.0"><mmi:cancelRequestrequestID="myReq5"source="myIM.php"target="mySpokerMC.php"context="myContextID4"Immediate="true"/></mmi:mmi>
<mmi:mmixmlns:mmi="http://www.w3.org/2008/04/mmi-arch"version="1.0"><mmi:cancelResponserequestID="myReq5"source="mySpokerMC.php"target="myIM.php"context="myContextID4"status="success"data="myMCStatus.xml"/></mmi:mmi>
<mmi:mmixmlns:mmi="http://www.w3.org/2008/04/mmi-arch"version="1.0"><mmi:pauseRequestrequestID="myReq6"source="myIM.php"target="myWriterMC.php"context="myContextID5"immediate="false"/></mmi:mmi>
<mmi:mmixmlns:mmi="http://www.w3.org/2008/04/mmi-arch"version="1.0"><mmi:pauseResponserequestID="myReq6"source="myWriterMC.php"target="myIM.php"context="myContextID5"status="success"data="myMCStatus.xml"/></mmi:mmi>
<mmi:mmixmlns:mmi="http://www.w3.org/2008/04/mmi-arch"version="1.0"><mmi:resumeRequestrequestID="myReq7"source="myIM.php"target="myWriterMC.php"context="myContextID5"/></mmi:mmi>
<mmi:mmixmlns:mmi="http://www.w3.org/2008/04/mmi-arch"version="1.0"><mmi:resumeResponserequestID="myReq7"source="myWriterMC.php"target="myIM.php"context="myContextID5"status="success"/></mmi:mmi>
<mmi:mmixmlns:mmi="http://www.w3.org/2008/04/mmi-arch"version="1.0"><mmi:statusRequestrequestID="myReq8"source="myRecorderMC.php"target="myIM.php"requestAutomaticUpdate="true"/></mmi:mmi>
<mmi:mmixmlns:mmi="http://www.w3.org/2008/04/mmi-arch"version="1.0"><mmi:statusResponserequestID="myReq8"source="myIM.php"target="myRecorderMC.php"status="alive"AutomaticUpdate="true"/></mmi:mmi>
The specification also recommends two types of notifications, one of which, the Extension Notification, can contain control or command data for handling devices or hardware services. For this reason, this notification is regarded as an exception: a standard control event that is not described as a Request > Response pair (in a previous version of the specification it was called the Data Event). The two notifications are:
<mmi:mmixmlns:mmi="http://www.w3.org/2008/04/mmi-arch"version="1.0"><mmi:extensionNotificationrequestID="myReq9"source="myPlayerMC.php"target="myIM.php"context="myContextID6"name="playerNavigation"><applicationdata><dataid="menu"selected="true"/></applicationdata></mmi:extensionNotification></mmi:mmi>
<mmi:mmixmlns:mmi="http://www.w3.org/2008/04/mmi-arch"version="1.0"><mmi:doneNotificationrequestID="myReq10"source="myDetectorMC.php"target="myIM.php"context="myContextID7"status="success"><mmi:data><dataid="detectionList"><users><userid="u58"confidence=".85"/><userid="u32"confidence=".75"/><userid="u87"confidence=".60"/></users></data></mmi:data></mmi:doneNotification></mmi:mmi>