SpeechWeb

A SpeechWeb is a collection of hyperlinked speech applications, accessed remotely by speech browsers running on end-user devices. Links are activated through spoken commands.

The idea of surfing the web by voice dates back at least to the work of Hemphill and Thrift in 1995, [1] who developed a system in which HTML pages were downloaded and processed on client-side computers, enabling voice access to web page content and activation of hyperlinks through spoken commands.

Also in the mid-1990s, researchers at AT&T were discussing the development of a new markup language that would enable the web to be accessed through regular phones. From 1995 to 1999, AT&T, Lucent, Motorola, and IBM each developed their own phone and speech markup languages. These companies created the VoiceXML Forum and jointly designed the Voice Extensible Markup Language (VoiceXML, also abbreviated VXML), which was accepted by the W3C in 2000. VXML is typically used to create hyperlinked speech applications. [2] VXML pages include commands for prompting user speech input, invoking recognition grammars, outputting synthesized voice, iterating through blocks of code, calling local JavaScript, and hyperlinking to other remote VXML pages, which are downloaded in a manner similar to the linking of HTML pages in the conventional Web.

Around the same time as the emergence of VXML, a research group at the University of Windsor in Canada was developing an alternative approach, in which speech applications deployed on the web are accessed by client-side speech browsers. The browser provides the speech-recognition capability, which is tailored to each application by downloading an application-specific recognition grammar from the remote speech application's web site. Input that is recognized by the client-side browser is sent to the remote server, which processes it and returns a text result to the browser for output as synthesized voice. The term SpeechWeb was used in 1999 [3] to describe the collection of hyperlinked speech applications in this architecture. The first SpeechWeb browser was demonstrated at the AAAI Sixteenth National Conference on Artificial Intelligence. [4]
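The division of labour in this architecture can be illustrated with a minimal Python sketch. It is not the Windsor group's implementation: the class names, the example URLs, and the questions and answers are all hypothetical, the network is replaced by an in-memory dictionary, and exact string matching stands in for real speech recognition and synthesis. The sketch only shows the flow described above: the browser loads an application-specific grammar, recognizes input against it, follows spoken hyperlinks itself, and forwards everything else to the remote application for a text result.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SpeechApplication:
    """Hypothetical remote speech application (stands in for a web server)."""
    url: str
    grammar: set[str]                 # phrases this application accepts
    answers: dict[str, str]           # recognized phrase -> text result
    links: dict[str, str] = field(default_factory=dict)  # spoken command -> URL

    def process(self, phrase: str) -> str:
        """Server side: map a recognized phrase to a text result."""
        return self.answers.get(phrase, "Sorry, I cannot answer that.")


class SpeechBrowser:
    """Hypothetical client-side browser; string matching replaces recognition."""

    def __init__(self, web: dict[str, SpeechApplication], start_url: str):
        self.web = web  # assumption: a dict lookup stands in for an HTTP fetch
        self.load(start_url)

    def load(self, url: str) -> None:
        """'Download' the application-specific grammar from the remote site."""
        self.current = self.web[url]
        self.grammar = set(self.current.grammar) | set(self.current.links)

    def hear(self, utterance: str) -> str:
        """Recognize locally, then follow a link or query the remote server."""
        phrase = utterance.strip().lower()
        if phrase not in self.grammar:
            return "Not recognized."
        if phrase in self.current.links:
            self.load(self.current.links[phrase])
            return f"Now at {self.current.url}"
        # The returned text would be spoken by a synthesizer in a real browser.
        return self.current.process(phrase)


def build_demo_web() -> dict[str, SpeechApplication]:
    """Two made-up applications joined by one spoken hyperlink."""
    solar = SpeechApplication(
        url="http://example.org/solar",
        grammar={"how many moons orbit mars"},
        answers={"how many moons orbit mars": "Two."},
        links={"go to weather": "http://example.org/weather"},
    )
    weather = SpeechApplication(
        url="http://example.org/weather",
        grammar={"is it raining"},
        answers={"is it raining": "No."},
    )
    return {solar.url: solar, weather.url: weather}


if __name__ == "__main__":
    browser = SpeechBrowser(build_demo_web(), "http://example.org/solar")
    for spoken in ["How many moons orbit Mars", "is it raining",
                   "go to weather", "is it raining"]:
        print(spoken, "->", browser.hear(spoken))
```

Because the active grammar is replaced on every link activation, the same utterance can be rejected in one application and answered in the next, which is the sense in which recognition is tailored to the application.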

The term "speechweb" has also been used, since the 1990s, in a different context to describe a web-based network of information on speech, language and speech-language pathology, which was also intended as a meeting place for professionals and for people affected by communication disorders. The term "speechWeb" has been trademarked by the company PipeBeach, which is now owned by HP, and refers to a software product which bridges telephone networks and conventional web servers.

In 2005, it was recognized that very few voice applications were available to the public through the Internet, despite the maturity of VXML at that time. It was also observed that nearly all of the VXML applications that were available had been constructed by people working in commerce and industry. This was in stark contrast to the rapid growth of the conventional web, and to the extensive involvement of the public in creating regular web pages, only a few years after the development of HTML. This observation led to the call for a Public-Domain SpeechWeb [5] which is accessible to the public through existing web browsers (with speech plugins) and which contains hyperlinked speech applications that are created and deployed by the public in a manner analogous to the creation and deployment of HTML pages on the conventional web. A browser for the Public-Domain SpeechWeb was demonstrated at the 16th International World Wide Web Conference, held in Banff, Canada in 2007. [6] The browser is a small X+V page which is executed by the freely available Opera browser with the free IBM speech-recognition plugin.

Two research groups are developing software to facilitate the construction and deployment of SpeechWeb applications by non-experts.

References

  1. Hemphill, C. T. and Thrift, P. R. "Surfing the Web by Voice." Proceedings of the Third ACM International Multimedia Conference, San Francisco, 1995, pp. 215–222.
  2. Lucas, B. "VoiceXML for Web-based distributed conversational applications." Commun. ACM 43, 9, 2000, pp. 53–57.
  3. Frost, R. A. and Chitte, S. "A New Approach for Providing Natural-Language Speech Access to Large Knowledge Bases." Proc. of PACLING ’99, The Conference of the Pacific Association for Computational Linguistics, University of Waterloo, Ontario, Canada, 1999, pp. 82–90.
  4. Frost, R. A. "A Natural-Language Speech Interface Constructed Entirely as a Set of Executable Specifications." Proceedings of the Sixteenth National Conference on Artificial Intelligence and Eleventh Conference on Innovative Applications of Artificial Intelligence, Orlando, Florida, USA, 1999, pp. 908–909.
  5. Frost, R. A. "A call for a public-domain SpeechWeb." Commun. ACM 48, 11, 2005, pp. 45–49.
  6. Frost, R. A., Ma, X. and Shi, Y. "A browser for a public-domain SpeechWeb." World Wide Web Conference, Banff, Canada, 2007, pp. 1307–1308.