Java Speech Markup Language

Last updated

Java Speech API Markup Language (JSML) is an XML-based markup language for annotating text input to speech synthesizers. JSML is used within the Java Speech API. JSML is an XML application and conforms to the requirements of well-formed XML documents. Java Speech API Markup Language is referred to as JSpeech Markup Language when describing the W3C documentation of the standard. Java Speech API Markup Language and JSpeech Markup Language identical apart from the change in name, which is made to protect Sun trademarks.

JSML is primarily an XML text format used by Java applications to annotate text input to speech synthesizers. Elements of JSML provide speech synthesizer with detailed information on how to speak text in a naturalized fashion.

JSML defines elements which define a document's structure, the pronunciation of certain words and phrases, features of speech such as emphasis and intonation, etc. JSML is designed in the Java fashion to be simple to learn and use, to be portable across different synthesizers and computing platforms, and although designed for use within is also applicable to a wide range of languages. An example of how JSML is defined is set out below:

<jsml><divtype="paragraph">Thisblockabout<literal>JSML</literal>isconstructedas a<emphasis><literal>JSML</literal></emphasis>example.</div></jsml>

The W3C has developed a standard markup language called SSML, which is based on JSML but is not identical to it. This became a formal W3C recommendation in 2004.

Related Research Articles

<span class="mw-page-title-main">Document Object Model</span> Convention for representing and interacting with objects in HTML, XHTML, and XML documents

The Document Object Model (DOM) is a cross-platform and language-independent interface that treats an HTML or XML document as a tree structure wherein each node is an object representing a part of the document. The DOM represents a document with a logical tree. Each branch of the tree ends in a node, and each node contains objects. DOM methods allow programmatic access to the tree; with them one can change the structure, style or content of a document. Nodes can have event handlers attached to them. Once an event is triggered, the event handlers get executed.

<span class="mw-page-title-main">HTML</span> HyperText Markup Language

HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser. It defines the content and structure of web content. It is often assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaScript.

<span class="mw-page-title-main">Standard Generalized Markup Language</span> Markup language

The Standard Generalized Markup Language is a standard for defining generalized markup languages for documents. ISO 8879 Annex A.1 states that generalized markup is "based on two postulates":

<span class="mw-page-title-main">XML</span> Markup language by the W3C for encoding of data

Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium's XML 1.0 Specification of 1998 and several other related specifications—all of them free open standards—define XML.

Mathematical Markup Language (MathML) is a mathematical markup language, an application of XML for describing mathematical notations and capturing both its structure and content, and is one of a number of mathematical markup languages. Its aim is to natively integrate mathematical formulae into World Wide Web pages and other documents. It is part of HTML5 and standardised by ISO/IEC since 2015.

VoiceXML (VXML) is a digital document standard for specifying interactive media and voice dialogs between humans and computers. It is used for developing audio and voice response applications, such as banking systems and automated customer service portals. VoiceXML applications are developed and deployed in a manner analogous to how a web browser interprets and visually renders the Hypertext Markup Language (HTML) it receives from a web server. VoiceXML documents are interpreted by a voice browser and in common deployment architectures, users interact with voice browsers via the public switched telephone network (PSTN).

An HTML element is a type of HTML document component, one of several types of HTML nodes. The first used version of HTML was written by Tim Berners-Lee in 1993 and there have since been many versions of HTML. The current de facto standard is governed by the industry group WHATWG and is known as the HTML Living Standard.

A user interface markup language is a markup language that renders and describes graphical user interfaces and controls. Many of these markup languages are dialects of XML and are dependent upon a pre-existing scripting language engine, usually a JavaScript engine, for rendering of controls and extra scriptability.

Speech Synthesis Markup Language (SSML) is an XML-based markup language for speech synthesis applications. It is a recommendation of the W3C's Voice Browser Working Group. SSML is often embedded in VoiceXML scripts to drive interactive telephony systems. However, it also may be used alone, such as for creating audio books. For desktop applications, other markup languages are popular, including Apple's embedded speech commands, and Microsoft's SAPI Text to speech (TTS) markup, also an XML language. It is also used to produce sounds via Azure Cognitive Services' Text to Speech API or when writing third-party skills for Google Assistant or Amazon Alexa.

In HTML, <div> and <span> tags are elements used to define parts of a document, so that they are identifiable when a unique classification is necessary. Where other HTML elements such as <p> (paragraph), <em> (emphasis), and so on, accurately represent the semantics of the content, the additional use of <span> and <div> tags leads to better accessibility for readers and easier maintainability for authors. Where no existing HTML element is applicable, <span> and <div> can valuably represent parts of a document so that HTML attributes such as class, id, lang, or dir can be applied.

The Speech Application Programming Interface or SAPI is an API developed by Microsoft to allow the use of speech recognition and speech synthesis within Windows applications. To date, a number of versions of the API have been released, which have shipped either as part of a Speech SDK or as part of the Windows OS itself. Applications that use SAPI include Microsoft Office, Microsoft Agent and Microsoft Speech Server.

RDFa or Resource Description Framework in Attributes is a W3C Recommendation that adds a set of attribute-level extensions to HTML, XHTML and various XML-based document types for embedding rich metadata within Web documents. The Resource Description Framework (RDF) data-model mapping enables its use for embedding RDF subject-predicate-object expressions within XHTML documents. It also enables the extraction of RDF model triples by compliant user agents.

SCXML stands for State Chart XML: State Machine Notation for Control Abstraction. It is an XML-based markup language that provides a generic state-machine-based execution environment based on Harel statecharts.

<span class="mw-page-title-main">HTML5</span> Fifth and previous version of hypertext markup language

HTML5 is a markup language used for structuring and presenting hypertext documents on the World Wide Web. It was the fifth and final major HTML version that is now a retired World Wide Web Consortium (W3C) recommendation. The current specification is known as the HTML Living Standard. It is maintained by the Web Hypertext Application Technology Working Group (WHATWG), a consortium of the major browser vendors.

SABLE is an XML markup language used to annotate texts for speech synthesis. It defines tags that control how written words, numbers, and sentences are audibly reproduced by a computer. SABLE was developed as an informal joint project between Sun Microsystems, AT&T, Bell Labs, and the University of Edinburgh as an initiative to combine three previous speech synthesis markup languages SSML, STML, and JSML.

Extensible HyperText Markup Language (XHTML) is part of the family of XML markup languages which mirrors or extends versions of the widely used HyperText Markup Language (HTML), the language in which Web pages are formulated.

XQuery is a query and functional programming language that queries and transforms collections of structured and unstructured data, usually in the form of XML, text and with vendor-specific extensions for other data formats. The language is developed by the XML Query working group of the W3C. The work is closely coordinated with the development of XSLT by the XSL Working Group; the two groups share responsibility for XPath, which is a subset of XQuery.

The Web platform is a collection of technologies developed as open standards by the World Wide Web Consortium and other standardization bodies such as the Web Hypertext Application Technology Working Group, the Unicode Consortium, the Internet Engineering Task Force, and Ecma International. It is the umbrella term introduced by the World Wide Web Consortium, and in 2011 it was defined as "a platform for innovation, consolidation and cost efficiencies" by W3C CEO Jeff Jaffe. Being built on The evergreen Web has allowed for the addition of new capabilities while addressing security and privacy risks. Additionally, developers are enabled to build interoperable content on a cohesive platform.

<span class="mw-page-title-main">XML transformation language</span> Type of programming language

An XML transformation language is a programming language designed specifically to transform an input XML document into an output document which satisfies some specific goal.

References