Semantic Interpretation for Speech Recognition


Semantic Interpretation for Speech Recognition (SISR) defines the syntax and semantics of the annotations that can be attached to grammar rules in the Speech Recognition Grammar Specification (SRGS). It has been a World Wide Web Consortium (W3C) Recommendation since 5 April 2007. [1]


Building on SRGS grammars, SISR lets voice browsers use ECMAScript to interpret complex grammars semantically and return the result to the application. For example, the utterance "I would like a Coca-Cola and three large pizzas with pepperoni and mushrooms." can be interpreted into an object that an application can process directly. The utterance could produce the following object, named order:

{drink:{liquid:"coke",drinksize:"medium"},pizza:{number:"3",pizzasize:"large",topping:["pepperoni","mushrooms"]}}
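
As a rough illustration of why a structured result is convenient for the application, the sketch below reads fields from the object above. The helper name describeOrder and the wording of the confirmation string are assumptions made for this example; they are not part of SISR or SRGS.

```javascript
// The interpretation result from the example, written as an ECMAScript object.
const order = {
  drink: { liquid: "coke", drinksize: "medium" },
  pizza: { number: "3", pizzasize: "large", topping: ["pepperoni", "mushrooms"] }
};

// Hypothetical application helper: build a confirmation phrase from the result.
function describeOrder(o) {
  const toppings = o.pizza.topping.join(" and ");
  return `${o.drink.drinksize} ${o.drink.liquid}, ` +
         `${o.pizza.number} ${o.pizza.pizzasize} pizzas with ${toppings}`;
}

console.log(describeOrder(order));
```

The application works with named properties such as drink.liquid rather than with the raw recognized words, so "coke", "Coca-Cola" and similar variants need no special handling at this level.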

This result is produced when the utterance is matched against the following grammar, which adds SISR markup to a standard SRGS grammar in XML format:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN"
                  "http://www.w3.org/TR/speech-grammar/grammar.dtd">
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2001/06/grammar
                             http://www.w3.org/TR/speech-grammar/grammar.xsd"
         version="1.0" mode="voice" tag-format="semantics/1.0" root="order">

   <rule id="order">
      I would like a
      <ruleref uri="#drink"/>
      <tag>out.drink = new Object(); out.drink.liquid = rules.drink.type;
           out.drink.drinksize = rules.drink.drinksize;</tag>
      and
      <ruleref uri="#pizza"/>
      <tag>out.pizza = rules.pizza;</tag>
   </rule>

   <rule id="kindofdrink">
      <one-of>
         <item>coke</item>
         <item>pepsi</item>
         <item>coca cola<tag>out = "coke";</tag></item>
      </one-of>
   </rule>

   <rule id="foodsize">
      <tag>out = "medium";</tag> <!-- "medium" is default if nothing said -->
      <item repeat="0-1">
         <one-of>
            <item>small<tag>out = "small";</tag></item>
            <item>medium</item>
            <item>large<tag>out = "large";</tag></item>
            <item>regular<tag>out = "medium";</tag></item>
         </one-of>
      </item>
   </rule>

   <!-- Construct Array of toppings, return Array -->
   <rule id="tops">
      <tag>out = new Array;</tag>
      <ruleref uri="#top"/>
      <tag>out.push(rules.top);</tag>
      <item repeat="1-">
         and
         <ruleref uri="#top"/>
         <tag>out.push(rules.top);</tag>
      </item>
   </rule>

   <rule id="top">
      <one-of>
         <item>anchovies</item>
         <item>pepperoni</item>
         <item>mushroom<tag>out = "mushrooms";</tag></item>
         <item>mushrooms</item>
      </one-of>
   </rule>

   <!-- Two properties (drinksize, type) on left hand side Rule Variable -->
   <rule id="drink">
      <ruleref uri="#foodsize"/>
      <ruleref uri="#kindofdrink"/>
      <tag>out.drinksize = rules.foodsize; out.type = rules.kindofdrink;</tag>
   </rule>

   <!-- Three properties on rules.pizza -->
   <rule id="pizza">
      <ruleref uri="#number"/>
      <ruleref uri="#foodsize"/>
      <tag>out.pizzasize = rules.foodsize; out.number = rules.number;</tag>
      pizzas with
      <ruleref uri="#tops"/>
      <tag>out.topping = rules.tops;</tag>
   </rule>

   <rule id="number">
      <one-of>
         <item>
            <tag>out = 1;</tag>
            <one-of>
               <item>a</item>
               <item>one</item>
            </one-of>
         </item>
         <item>two<tag>out = 2;</tag></item>
         <item>three<tag>out = 3;</tag></item>
      </one-of>
   </rule>
</grammar>
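
To make the flow of values through the tags easier to follow, the sketch below replays the tag scripts by hand for the example utterance. It is not a SISR processor and does no speech recognition or parsing: each function stands in for one rule, the arguments are the words assumed to have matched that rule, and the function names (evalOrder, evalDrink and so on) are invented for this illustration. It also assumes that an item without a tag yields its matched text as the rule value.

```javascript
// Each function returns the `out` value of one rule; `rules` holds the values
// of the rule references matched so far, mirroring the names used in the tags.

// kindofdrink: only "coca cola" has a tag; other items keep their text (assumed).
const evalKindofdrink = (text) => (text === "coca cola" ? "coke" : text);

// foodsize: the leading tag sets the default, later tags may overwrite it.
const evalFoodsize = (text) => {
  let out = "medium";
  if (text === "small") out = "small";
  if (text === "large") out = "large";
  if (text === "regular") out = "medium";
  return out; // `undefined` (nothing said) leaves the default in place
};

// top: "mushroom" is normalised to "mushrooms"; other items keep their text.
const evalTop = (text) => (text === "mushroom" ? "mushrooms" : text);

// tops: one push per matched reference to #top.
const evalTops = (texts) => {
  const out = new Array();
  for (const t of texts) out.push(evalTop(t));
  return out;
};

// number: every alternative assigns a numeric value.
const evalNumber = (text) => ({ a: 1, one: 1, two: 2, three: 3 })[text];

const evalDrink = (sizeText, kindText) => {
  const rules = {
    foodsize: evalFoodsize(sizeText),
    kindofdrink: evalKindofdrink(kindText)
  };
  const out = {};
  out.drinksize = rules.foodsize;
  out.type = rules.kindofdrink;
  return out;
};

const evalPizza = (numberText, sizeText, toppingTexts) => {
  const rules = { number: evalNumber(numberText), foodsize: evalFoodsize(sizeText) };
  const out = {};
  out.pizzasize = rules.foodsize;
  out.number = rules.number;
  rules.tops = evalTops(toppingTexts);
  out.topping = rules.tops;
  return out;
};

// Root rule, replayed for "a coca cola and three large pizzas with
// pepperoni and mushrooms" (no drink size was spoken).
const evalOrder = () => {
  const rules = {};
  const out = {};
  rules.drink = evalDrink(undefined, "coca cola");
  out.drink = new Object();
  out.drink.liquid = rules.drink.type;
  out.drink.drinksize = rules.drink.drinksize;
  rules.pizza = evalPizza("three", "large", ["pepperoni", "mushrooms"]);
  out.pizza = rules.pizza;
  return out;
};

const result = evalOrder();
console.log(JSON.stringify(result, null, 2));
```

In this replay the drink size falls back to "medium" because the optional size item matched nothing, and the drink's type property is renamed to liquid by the root rule. The pizza count comes out as the number 3, since the tag assigns a numeric literal, whereas the object listing earlier in the article writes it as the string "3".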


References