A SpeechWeb is a collection of hyperlinked speech applications, accessed remotely by speech browsers running on end-user devices. Links are activated through spoken commands.
The idea of surfing the web by voice dates back at least to the work of Hemphill and Thrift in 1995 [1], who developed a system in which HTML pages were downloaded and processed on client-side computers, enabling voice access to web page content and activation of hyperlinks through spoken commands.
Also in the mid-1990s, researchers at AT&T were discussing the development of a new markup language that would enable the web to be accessed through regular phones. From 1995 to 1999, AT&T, Lucent, Motorola, and IBM all developed their own versions of phone and speech markup languages. These companies created the VoiceXML Forum and jointly designed the Voice Markup Language, VXML, which was accepted by the W3C in 2000. VXML is typically used to create hyperlinked speech applications. [2] VXML pages include commands for prompting user speech input, invoking recognition grammars, outputting synthesized voice, iterating through blocks of code, calling local JavaScript, and hyperlinking to other remote VXML pages, which are downloaded in a manner similar to the linking of HTML pages in the conventional Web.
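As a rough illustration of the page elements listed above, the following Python sketch parses a small hand-written VoiceXML-style page and lists its prompts, grammar references, and hyperlinks. The page content, the URLs, the grammar file names, and the function name summarize_vxml are hypothetical and exist only for this example; the element names (form, field, prompt, grammar, link) follow VoiceXML 2.0, but the page is not taken from any real application.

```python
import xml.etree.ElementTree as ET

# Hypothetical VoiceXML-style page; URLs, file names, and wording are invented.
SAMPLE_PAGE = """<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <link next="http://example.org/weather.vxml">
    <grammar src="weather_link.grxml"/>
  </link>
  <form id="main">
    <field name="city">
      <prompt>Which city are you interested in?</prompt>
      <grammar src="cities.grxml"/>
    </field>
  </form>
</vxml>
"""

# Namespace prefix used in the path expressions below.
NS = {"v": "http://www.w3.org/2001/vxml"}


def summarize_vxml(text):
    """Return the prompts, grammar sources, and link targets found in a page."""
    root = ET.fromstring(text)
    prompts = [p.text.strip() for p in root.iterfind(".//v:prompt", NS) if p.text]
    grammars = [g.get("src") for g in root.iterfind(".//v:grammar", NS)]
    links = [link.get("next") for link in root.iterfind(".//v:link", NS)]
    return {"prompts": prompts, "grammars": grammars, "links": links}


if __name__ == "__main__":
    for kind, values in summarize_vxml(SAMPLE_PAGE).items():
        print(kind, values)
```

The sketch only inspects the page structure; a real voice browser would also run speech recognition against the referenced grammars and synthesize the prompts.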
Around the same time as the emergence of VXML, a research group at the University of Windsor in Canada was developing an alternative approach, in which speech applications deployed on the web are accessed by client-side speech browsers. The browser provides the speech-recognition capability, which is tailored to each application by downloading an application-specific recognition grammar from the remote speech application's web site. Input that is recognized by the client-side browser is sent to the remote server, which processes it and returns a text result to the browser for output as synthesized voice. The term SpeechWeb was used in 1999 [3] to describe the collection of hyperlinked speech applications in this architecture. The first SpeechWeb browser was demonstrated at the AAAI Sixteenth National Conference on Artificial Intelligence. [4]
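The division of labour in this architecture can be sketched in Python as follows. This is a minimal simulation under stated assumptions, not the Windsor group's implementation: speech recognition is replaced by exact matching of typed phrases against the application's grammar, synthesized output is replaced by a returned string, and the class names, the speechweb:// addresses, and the sample applications are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class SpeechApp:
    """Stand-in for one remote speech application (hypothetical)."""
    grammar: Set[str]                 # phrases the application accepts
    answer: Callable[[str], str]      # server-side processing of recognized input
    links: Dict[str, str] = field(default_factory=dict)  # spoken command -> address


class SpeechBrowser:
    """Stand-in for a client-side speech browser (hypothetical)."""

    def __init__(self, web, start_url):
        self.web = web          # maps an address to a SpeechApp
        self.url = start_url    # application currently being visited

    def hear(self, utterance):
        app = self.web[self.url]            # "download" the current app's grammar
        phrase = utterance.strip().lower()  # simulated recognition result
        if phrase in app.links:             # spoken command activates a hyperlink
            self.url = app.links[phrase]
            return f"now at {self.url}"
        if phrase in app.grammar:           # recognized input goes to the server
            return app.answer(phrase)       # text result, to be spoken to the user
        return "not recognized"


# Two invented applications that link to each other.
WEB = {
    "speechweb://solar": SpeechApp(
        grammar={"how many moons does mars have"},
        answer=lambda query: "two",
        links={"go to jokes": "speechweb://jokes"},
    ),
    "speechweb://jokes": SpeechApp(
        grammar={"tell me a joke"},
        answer=lambda query: "knock knock",
        links={"go to solar": "speechweb://solar"},
    ),
}

if __name__ == "__main__":
    browser = SpeechBrowser(WEB, "speechweb://solar")
    for said in ["How many moons does Mars have", "go to jokes", "tell me a joke"]:
        print(said, "->", browser.hear(said))
```

Because the grammar belongs to the application currently being visited, a phrase such as "tell me a joke" is rejected until the browser has followed the link to the application that accepts it.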
The term "speechweb" has also been used, since the 1990s, in a different context to describe a web-based network of information on speech, language, and speech-language pathology, which was also intended to provide a meeting place for professionals and for people affected by communication disorders. The term "speechWeb" has been trademarked by the company PipeBeach, which is now owned by HP, and refers to a software product that bridges telephone networks and conventional web servers.
In 2005, it was recognized that very few voice applications were available to the public through the Internet, despite the maturity of VXML at that time. It was also observed that nearly all VXML applications that were available had been constructed by people working in commerce and industry. This was in stark contrast to the huge growth of the conventional web, and the huge involvement of the public in the development of regular web pages, only a few years after the development of HTML. This observation led to the call for a Public-Domain SpeechWeb [5] that is accessible to the public through existing web browsers (with speech plugins) and that contains hyperlinked speech applications created and deployed by the public in a manner analogous to the creation and deployment of HTML pages on the conventional web. A browser for the Public-Domain SpeechWeb was demonstrated at the 16th International World Wide Web Conference, held in Banff, Canada in 2007. [6] The browser is a small X+V page that is executed by the freely available Opera browser with the free IBM speech-recognition plugin.
Two research groups are developing software to facilitate the construction and deployment of SpeechWeb applications by non-experts.
Hypertext Markup Language (HTML) is the standard markup language for documents designed to be displayed in a web browser. It defines the content and structure of web content. It is often assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaScript.
Hypertext is text displayed on a computer display or other electronic devices with references (hyperlinks) to other text that the reader can immediately access. Hypertext documents are interconnected by hyperlinks, which are typically activated by a mouse click, keypress set, or screen touch. Apart from text, the term "hypertext" is also sometimes used to describe tables, images, and other presentational content formats with integrated hyperlinks. Hypertext is one of the key underlying concepts of the World Wide Web, where Web pages are often written in the Hypertext Markup Language (HTML). As implemented on the Web, hypertext enables the easy-to-use publication of information over the Internet.
A markup language is a text-encoding system which specifies the structure and formatting of a document and potentially the relationships among its parts. Markup can control the display of a document or enrich its content to facilitate automated processing.
The Semantic Web, sometimes known as Web 3.0, is an extension of the World Wide Web through standards set by the World Wide Web Consortium (W3C). The goal of the Semantic Web is to make Internet data machine-readable.
A wiki is a form of hypertext publication on the internet which is collaboratively edited and managed by its audience directly through a web browser. A typical wiki contains multiple pages that can either be edited by the public or limited to use within an organization for maintaining its internal knowledge base.
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing.
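The systematic browsing described above is commonly organized as a breadth-first traversal of the link graph. The Python sketch below shows that idea on a tiny in-memory set of pages, so no network access is involved; the page names, the PAGES table, and the crawl function are hypothetical and stand in for real fetching and link extraction.

```python
from collections import deque

# Hypothetical link graph: each page name maps to the pages it links to.
PAGES = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}


def crawl(start, get_links, limit=100):
    """Visit pages breadth-first from start, visiting each page at most once."""
    seen = {start}            # pages already queued, to avoid revisiting
    order = []                # pages in the order they were visited
    queue = deque([start])
    while queue and len(order) < limit:
        page = queue.popleft()
        order.append(page)
        for target in get_links(page):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


if __name__ == "__main__":
    print(crawl("a", lambda page: PAGES.get(page, [])))
```

The seen set is what keeps the traversal from looping on cyclic links such as the one from page c back to page a, and the limit parameter bounds how many pages a single run visits.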
The World Wide Web is an information system that enables content sharing over the Internet through user-friendly ways meant to appeal to users beyond IT specialists and hobbyists. It allows documents and other web resources to be accessed over the Internet according to specific rules of the Hypertext Transfer Protocol (HTTP).
Wiki software is collaborative software that runs a wiki, which allows the users to create and collaboratively edit pages or entries via a web browser. A wiki system is usually a web application that runs on one or more web servers. The content, including previous revisions, is usually stored in either a file system or a database. Wikis are a type of web content management system, and the most commonly supported off-the-shelf software that web hosting facilities offer.
In computing, a hyperlink, or simply a link, is a digital reference to data that the user can follow or be guided to by clicking or tapping. A hyperlink points to a whole document or to a specific element within a document. Hypertext is text with hyperlinks. The text that is linked from is known as anchor text. A software system that is used for viewing and creating hypertext is a hypertext system, and to create a hyperlink is to hyperlink. A user following hyperlinks is said to navigate or browse the hypertext.
Wireless Markup Language (WML), based on XML, is an obsolete markup language intended for devices that implement the Wireless Application Protocol (WAP) specification, such as mobile phones. It provides navigational support, data input, hyperlinks, text and image presentation, and forms, much like HTML. It preceded the use of other markup languages used with WAP, such as XHTML and HTML itself, which achieved dominance as processing power in mobile devices increased.
In HTML and XHTML, an image map is a list of coordinates relating to a specific image, created in order to hyperlink areas of the image to different destinations. For example, a map of the world may have each country hyperlinked to further information about that country. The intention of an image map is to provide an easy way of linking various parts of an image without dividing the image into separate image files.
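The lookup that an image map implies, from a clicked point to a destination, can be sketched in Python as below. The sketch assumes rectangular areas given as (x1, y1, x2, y2) and circular areas given as (centre x, centre y, radius), mirroring the rect and circle shapes of HTML image maps; the AREAS table, its destinations, and the hit_test function are hypothetical.

```python
# Hypothetical areas: (shape, coordinates, destination).
AREAS = [
    ("rect", (0, 0, 100, 50), "/north"),
    ("circle", (150, 25, 20), "/island"),
]


def hit_test(areas, x, y):
    """Return the destination of the first area containing (x, y), or None."""
    for shape, coords, href in areas:
        if shape == "rect":
            x1, y1, x2, y2 = coords
            if x1 <= x <= x2 and y1 <= y <= y2:
                return href
        elif shape == "circle":
            cx, cy, r = coords
            # Compare squared distances to avoid a square root.
            if (x - cx) ** 2 + (y - cy) ** 2 <= r ** 2:
                return href
    return None


if __name__ == "__main__":
    for point in [(10, 10), (150, 30), (300, 300)]:
        print(point, "->", hit_test(AREAS, *point))
```

Areas are checked in order, so when two areas overlap the one listed first wins in this sketch.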
Interactive voice response (IVR) is a technology that allows telephone users to interact with a computer-operated telephone system through the use of voice and DTMF tones input with a keypad. In telephony, IVR allows customers to interact with a company's host system via a telephone keypad or by speech recognition, after which services can be inquired about through the IVR dialogue. IVR systems can respond with pre-recorded or dynamically generated audio to further direct users on how to proceed. IVR systems deployed in the network are sized to handle large call volumes and are also used for outbound calling, as IVR systems are more intelligent than many predictive dialer systems.
VoiceXML (VXML) is a digital document standard for specifying interactive media and voice dialogs between humans and computers. It is used for developing audio and voice response applications, such as banking systems and automated customer service portals. VoiceXML applications are developed and deployed in a manner analogous to how a web browser interprets and visually renders the Hypertext Markup Language (HTML) it receives from a web server. VoiceXML documents are interpreted by a voice browser, and in common deployment architectures users interact with voice browsers via the public switched telephone network (PSTN).
Hypermedia, an extension of hypertext, is a nonlinear medium of information that includes graphics, audio, video, plain text and hyperlinks. This designation contrasts with the broader term multimedia, which may include non-interactive linear presentations as well as hypermedia. The term was first used in a 1965 article written by Ted Nelson. Hypermedia is a type of multimedia that features interactive elements, such as hypertext, buttons, or interactive images and videos, allowing users to navigate and engage with content in a non-linear manner.
A voice browser is a software application that presents an interactive voice user interface to the user in a manner analogous to the functioning of a web browser interpreting Hypertext Markup Language (HTML). Dialog documents interpreted by a voice browser are often encoded in standards-based markup languages, such as Voice Dialog Extensible Markup Language (VoiceXML), a standard by the World Wide Web Consortium.
OpenLaszlo is a discontinued open-source platform for the development and delivery of rich web applications. It is released under the Open Source Initiative certified Common Public License (CPL).
The World Wide Web is a global information medium that users can access via computers connected to the Internet. The term is often mistakenly used as a synonym for the Internet, but the Web is a service that operates over the Internet, just as email and Usenet do. The history of the Internet and the history of hypertext date back significantly further than that of the World Wide Web.
The DASL Programming Language is a high-level, strongly typed programming language originally developed at Sun Microsystems Laboratories between 1999 and 2003 as part of the Ace Project. The goals of the project were to enable rapid development of web-based applications based on Sun's J2EE architecture, and to eliminate the steep learning curve of platform-specific details.
Wireless Application Protocol (WAP) is a now-obsolete technical standard for accessing information over a mobile cellular network. Introduced in 1999, WAP at launch allowed users with compatible mobile devices to browse content such as news, weather, and sports scores, which was provided by mobile network operators and specially designed for the limited capabilities of a mobile device. The Japanese i-mode system offered another major competing wireless data standard.