Speech Application Language Tags

Speech Application Language Tags (SALT) is an XML-based markup language that is used in HTML and XHTML pages to add voice recognition capabilities to web-based applications.

Description

SALT enables multimodal and telephony-enabled access to information, applications, and web services from PCs, telephones, tablet PCs, and wireless personal digital assistants (PDAs). It extends existing markup languages such as HTML, XHTML, and XML. Multimodal access lets users interact with an application in a variety of ways: they can enter data using speech, a keyboard, keypad, mouse, or stylus, and receive output as synthesized speech, audio, plain text, motion video, or graphics.
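
As a rough illustration of how SALT tags sit alongside ordinary page markup, the following Python sketch parses a small XHTML page containing SALT elements and lists them. The page itself is hypothetical: the element ids, the grammar file name cities.grxml, and the prompt text are invented for illustration, and the namespace URI is the one commonly shown in SALT 1.0 examples, treated here as an assumption. The helper name salt_elements is likewise illustrative and not part of any SALT toolkit.

```python
import xml.etree.ElementTree as ET

# Namespace URI commonly shown in SALT 1.0 examples (an assumption here).
SALT_NS = "http://www.saltforum.org/2002/SALT"

# Hypothetical page: ids, grammar file name, and prompt text are illustrative.
PAGE = """\
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:salt="http://www.saltforum.org/2002/SALT">
  <body>
    <input type="text" id="txtCity" />
    <salt:prompt id="askCity">Which city are you flying to?</salt:prompt>
    <salt:listen id="recoCity">
      <salt:grammar src="cities.grxml" />
      <salt:bind targetelement="txtCity" value="//city" />
    </salt:listen>
  </body>
</html>
"""


def salt_elements(page_source):
    """Return (local tag name, id) for every SALT element, in document order."""
    root = ET.fromstring(page_source)
    prefix = "{" + SALT_NS + "}"
    return [
        (el.tag[len(prefix):], el.get("id"))
        for el in root.iter()
        if el.tag.startswith(prefix)
    ]


if __name__ == "__main__":
    for name, element_id in salt_elements(PAGE):
        print(name, element_id)
```

In this sketch the prompt element carries the text to be spoken, the listen element wraps a grammar and a bind that copies the recognized value into the ordinary text input, and the visual markup is left unchanged, which is the sense in which SALT extends rather than replaces the host language.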

History

SALT was developed as a competitor to VoiceXML and was supported by the SALT Forum. The SALT Forum was founded on October 15, 2001, by Microsoft, along with Cisco Systems, Comverse, Intel, Philips Consumer Electronics, and SpeechWorks. [1] The SALT 1.0 specification was submitted to the W3C (World Wide Web Consortium) for review in August 2002. [2] However, the W3C continued developing its VoiceXML 2.0 standard, which reached the final "Recommendation" stage in March 2004. [3]

By 2006, Microsoft realized Speech Server had to support the W3C VoiceXML standard to remain competitive. Microsoft joined the VoiceXML Forum as a Promoter in April of that year. [4] Speech Server 2007 supports VoiceXML 2.0 and 2.1 in addition to SALT. In 2007, Microsoft purchased Tellme, one of the largest VoiceXML service providers.

By that point nearly every other SALT Forum company had committed to VoiceXML. [5] The last press release posted to the SALT Forum website dates from 2003, whereas the VoiceXML Forum remained active. "SALT [Speech Application Language Tags] is a direct competitor but has not reached the level of maturity of VoiceXML in the standards process," said Bill Meisel, principal at TMA Associates, a speech technology research firm, in 2004. [3]

Usage

The Microsoft Speech Server 2004 product supports SALT, while Microsoft Speech Server 2007 supports SALT in addition to VoiceXML 2.0 and 2.1. There is also a speech add-in for Internet Explorer that interprets SALT tags on web pages, available as part of the Microsoft Speech Application SDK.

Related Research Articles

HTML Hypertext Markup Language

The HyperText Markup Language, or HTML, is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaScript.

Markup language Modern system for annotating a document

In computer text processing, a markup language is a system for annotating a document in a way that is visually distinguishable from the content. It is used only to format the text, so that when the document is processed for display, the markup itself does not appear. The idea and terminology evolved from the "marking up" of paper manuscripts, traditionally done with a red pen or blue pencil on authors' manuscripts. Such "markup" typically includes both content corrections and typographic instructions, such as to make a heading larger or boldface.

Synchronized Multimedia Integration Language XML-based markup language for multimedia presentations

Synchronized Multimedia Integration Language (SMIL) is a World Wide Web Consortium recommended Extensible Markup Language (XML) markup language to describe multimedia presentations. It defines markup for timing, layout, animations, visual transitions, and media embedding, among other things. SMIL allows presenting media items such as text, images, video, audio, links to other SMIL presentations, and files from multiple web servers. SMIL markup is written in XML, and has similarities to HTML.

The Telephony Application Programming Interface (TAPI) is a Microsoft Windows API, which provides computer telephony integration and enables PCs running Microsoft Windows to use telephone services. Different versions of TAPI are available on different versions of Windows. TAPI allows applications to control telephony functions between a computer and telephone network for data, fax, and voice calls. It includes basic functions, such as dialing, answering, and hanging up a call. It also supports supplementary functions, such as hold, transfer, conference, and call park found in PBX, ISDN, and other telephone systems.

Wireless Markup Language Markup language intended for devices that implement the Wireless Application Protocol specification

Wireless Markup Language (WML), based on XML, is a now-obsolete markup language intended for devices that implement the Wireless Application Protocol (WAP) specification, such as mobile phones. It provides navigational support, data input, hyperlinks, text and image presentation, and forms, much like HTML. It preceded the use of other markup languages now used with WAP, such as HTML itself, and XHTML.

VoiceXML (VXML) is a digital document standard for specifying interactive media and voice dialogs between humans and computers. It is used for developing audio and voice response applications, such as banking systems and automated customer service portals. VoiceXML applications are developed and deployed in a manner analogous to how a web browser interprets and visually renders the Hypertext Markup Language (HTML) it receives from a web server. VoiceXML documents are interpreted by a voice browser and in common deployment architectures, users interact with voice browsers via the public switched telephone network (PSTN).

HTML-Kit is a proprietary HTML editor for Microsoft Windows made by chami.com. The application is a full-featured HTML editor designed to edit, format, validate, preview and publish web pages in HTML, XHTML and XML languages.

Call Control eXtensible Markup Language (CCXML) is an XML standard designed to provide asynchronous event-based telephony support to VoiceXML. Its current status is a W3C Proposed Recommendation, adopted May 10, 2011. Whereas VoiceXML is designed to provide a Voice User Interface to a voice browser, CCXML is designed to inform the voice browser how to handle the telephony control of the voice channel. The two XML applications are wholly separate and neither requires the other in order to be implemented; however, they have been designed with interoperability in mind.

In computing, quirks mode refers to a technique used by some web browsers for the sake of maintaining backward compatibility with web pages designed for old web browsers instead of strictly complying with W3C and IETF standards in standards mode.

The SASDK is Microsoft's Speech Application SDK. It is used to create telephony applications as well as multimodal web applications. It complies with the SALT XML standard, unlike Microsoft's earlier endeavors. The SASDK is used to create Web-based applications only. It can be used to create a single application with both a web interface and a telephony interface.

The Speech Application Programming Interface or SAPI is an API developed by Microsoft to allow the use of speech recognition and speech synthesis within Windows applications. To date, a number of versions of the API have been released, which have shipped either as part of a Speech SDK or as part of the Windows OS itself. Applications that use SAPI include Microsoft Office, Microsoft Agent and Microsoft Speech Server.

RDFa, or Resource Description Framework in Attributes, is a W3C Recommendation that adds a set of attribute-level extensions to HTML, XHTML and various XML-based document types for embedding rich metadata within Web documents. The Resource Description Framework (RDF) data-model mapping enables its use for embedding RDF subject-predicate-object expressions within XHTML documents. It also enables the extraction of RDF model triples by compliant user agents.

SCXML stands for State Chart XML: State Machine Notation for Control Abstraction. It is an XML-based markup language that provides a generic state-machine-based execution environment based on Harel statecharts.

The Multimodal Interaction Activity is an initiative from W3C aiming to provide means to support Multimodal interaction scenarios on the Web.

XHTML+Voice (X+V) is an XML language for describing multimodal user interfaces. The two essential modalities are visual and auditory. Visual interaction is defined, as in most current web pages, via XHTML. Auditory components are defined by a subset of VoiceXML. Interfacing the voice and visual components of X+V documents is accomplished through a combination of ECMAScript, JavaScript, and XML Events.

Extensible HyperText Markup Language (XHTML) is part of the family of XML markup languages. It mirrors or extends versions of the widely used HyperText Markup Language (HTML), the language in which Web pages are formulated.

CT Connect is a software product that allows computer applications to monitor and control telephone calls. This monitoring and control is called computer-telephone integration, or CTI. CT Connect implements CTI by providing server software that supports the CTI link protocols used by a range of telephone systems, and client software that provides an application programming interface (API) for telephony functions.

Voxeo Corporation was a technology company that specialized in providing development platforms for unified customer experience (self-service) and unified communications applications. Voxeo was headquartered in Orlando, Florida with main offices in Cologne, Germany; Beijing, China; London, UK and San Francisco, US.

XHTML+RDFa is an extended version of the XHTML markup language for supporting RDF through a collection of attributes and processing rules in the form of well-formed XML documents. XHTML+RDFa is one of the techniques used to develop Semantic Web content by embedding rich semantic markup. Version 1.1 of the language is a superset of XHTML 1.1, integrating the attributes according to RDFa Core 1.1. In other words, it is an RDFa support through XHTML Modularization.

References

  1. Cisco, Comverse, Intel, Microsoft, Philips and SpeechWorks Found Speech Application Language Tags Forum to Develop New Standard For Multimodal and Telephony-Enabled Applications and Services Archived 2012-07-12 at archive.today News@Cisco News release, October 15, 2001
  2. SALT Forum Submits Spec to W3C eWeek, Dennis Callaghan, August 13, 2002
  3. W3C recommends VoiceXML 2.0 Archived 2007-11-16 at the Wayback Machine InfoWorld, Ephraim Schwartz, March 17, 2004
  4. Microsoft Unveils Road Map for Speech Server 2007 Microsoft PressPass - Information for Journalists, April 5, 2006
  5. SALT Forum Companies Back VoiceXML