Pronunciation Lexicon Specification

Last updated December 16, 2023

The Pronunciation Lexicon Specification (PLS) is a W3C Recommendation designed to enable interoperable specification of pronunciation information for both speech recognition and speech synthesis engines within voice browsing applications. The language is intended to be easy for developers to use while supporting the accurate specification of pronunciation information for international use.

The language allows one or more pronunciations for a word or phrase to be specified using a standard pronunciation alphabet or, if necessary, vendor-specific alphabets. Pronunciations are grouped together into a PLS document, which may be referenced from other markup languages such as the Speech Recognition Grammar Specification (SRGS) and the Speech Synthesis Markup Language (SSML).

Usage

Here is an example PLS document:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
      http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>judgment</grapheme>
    <grapheme>judgement</grapheme>
    <phoneme>ˈdʒʌdʒ.mənt</phoneme>
    <!-- IPA string is: "ˈdʒʌdʒ.mənt" -->
  </lexeme>
  <lexeme>
    <grapheme>fiancé</grapheme>
    <grapheme>fiance</grapheme>
    <phoneme>fiˈɒns.eɪ</phoneme>
    <!-- IPA string is: "fiˈɒns.eɪ" -->
    <phoneme>ˌfiː.ɑːnˈseɪ</phoneme>
    <!-- IPA string is: "ˌfiː.ɑːnˈseɪ" -->
  </lexeme>
</lexicon>

This lexicon could be used to improve text-to-speech (TTS) output, as shown in the following SSML 1.0 document:

<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0"
    xmlns="http://www.w3.org/2001/10/synthesis"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
      http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
    xml:lang="en-US">
  <lexicon uri="http://www.example.org/lexicon_defined_above.xml"/>
  <p>In the judgement of my fiancé, Las Vegas is the best place for a honeymoon.
  I replied that I preferred Venice and didn't think the Venetian casino was an
  acceptable compromise.</p>
</speak>

It could also be used to improve automatic speech recognition (ASR), as in the following SRGS 1.0 grammar:

<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0"
    xmlns="http://www.w3.org/2001/06/grammar"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2001/06/grammar
      http://www.w3.org/TR/speech-grammar/grammar.xsd"
    xml:lang="en-US" root="movies" mode="voice">
  <lexicon uri="http://www.example.org/lexicon_defined_above.xml"/>
  <rule id="movies" scope="public">
    <one-of>
      <item>Terminator 2: Judgment Day</item>
      <item>My Big Fat Obnoxious Fiance</item>
      <item>Pluto's Judgement Day</item>
    </one-of>
  </rule>
</grammar>

Common use cases

Multiple pronunciations for the same orthography

For ASR systems it is common to rely on multiple pronunciations of the same word or phrase in order to cope with variations of pronunciation within a language. In the Pronunciation Lexicon language, multiple pronunciations are represented by more than one <phoneme> (or <alias>) element within the same <lexeme> element.

In the following example the word "Newton" has two possible pronunciations.

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
      http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
    alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme>ˈnjuːtən</phoneme>
    <!-- IPA string is: "ˈnjuːtən" -->
    <phoneme>ˈnuːtən</phoneme>
    <!-- IPA string is: "ˈnuːtən" -->
  </lexeme>
</lexicon>
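
When one lexicon serves both ASR and TTS, it can help to state which of several pronunciations a synthesis processor should use. PLS 1.0 defines an optional prefer attribute on <phoneme> and <alias> for this. The sketch below reuses the "Newton" entry; marking the second transcription as preferred is an illustrative assumption, not something taken from the example above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Newton</grapheme>
    <!-- Both pronunciations remain available to a recognizer -->
    <phoneme>ˈnjuːtən</phoneme>
    <!-- Assumption for illustration: this variant is the one wanted for synthesis -->
    <phoneme prefer="true">ˈnuːtən</phoneme>
  </lexeme>
</lexicon>
```

Without a prefer attribute, the choice among pronunciations for synthesis is left to document order and the processor, so an explicit marker is mainly worthwhile when the intended TTS pronunciation is not listed first.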

Multiple orthographies

In some situations there are alternative textual representations for the same word or phrase. This can arise for a number of reasons; see Section 4.5 of PLS 1.0 for details. Because these are representations that have the same meaning (as opposed to homophones), it is recommended that they be represented using a single <lexeme> element that contains multiple <grapheme> elements.

Here are two simple examples of multiple orthographies: alternative spellings of an English word and multiple written forms of a Japanese word.

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
      http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
    alphabet="ipa" xml:lang="en-US">
  <!-- English entry showing how alternative spellings are handled -->
  <lexeme>
    <grapheme>colour</grapheme>
    <grapheme>color</grapheme>
    <phoneme>ˈkʌlər</phoneme>
    <!-- IPA string is: "ˈkʌlər" -->
  </lexeme>
</lexicon>

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
      http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
    alphabet="ipa" xml:lang="ja">
  <!-- Japanese entry showing how multiple writing systems are handled:
       romaji, kanji and hiragana orthographies -->
  <lexeme>
    <grapheme>nihongo</grapheme>
    <grapheme>日本語</grapheme>
    <grapheme>にほんご</grapheme>
    <phoneme>ɲihoŋɡo</phoneme>
    <!-- IPA string is: "ɲihoŋɡo" -->
  </lexeme>
</lexicon>

Homophones

Most languages have homophones, words with the same pronunciation but different meanings (and possibly different spellings), for instance "seed" and "cede". It is recommended that these be represented as different lexemes.

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
      http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>cede</grapheme>
    <phoneme>siːd</phoneme>
    <!-- IPA string is: "siːd" -->
  </lexeme>
  <lexeme>
    <grapheme>seed</grapheme>
    <phoneme>siːd</phoneme>
    <!-- IPA string is: "siːd" -->
  </lexeme>
</lexicon>

Homographs

Most languages have words with different meanings but the same spelling (and sometimes different pronunciations), called homographs. For example, in English the word bass (fish) and the word bass (in music) have identical spellings but different meanings and pronunciations. Although it is recommended that these words be represented using separate <lexeme> elements that are distinguished by different values of the role attribute (see Section 4.4 of PLS 1.0), if a pronunciation lexicon author does not want to distinguish between the two words they could simply be represented as alternative pronunciations within the same <lexeme> element. In the latter case the TTS processor will not be able to distinguish when to apply the first or the second transcription.

In this example the pronunciations of the homograph "bass" are shown.

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
      http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>bass</grapheme>
    <phoneme>bæs</phoneme>
    <!-- IPA string is: "bæs" -->
    <phoneme>beɪs</phoneme>
    <!-- IPA string is: "beɪs" -->
  </lexeme>
</lexicon>
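
The recommended alternative described above, separate <lexeme> elements distinguished by the role attribute, might look like the following sketch for "bass". The myrole prefix, its namespace URI, and the role names fish and music are hypothetical, chosen only for illustration; PLS requires role values to be qualified names but does not define these particular ones.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    xmlns:myrole="http://www.example.org/my_role_namespace"
    alphabet="ipa" xml:lang="en-US">
  <!-- Hypothetical role: the fish -->
  <lexeme role="myrole:fish">
    <grapheme>bass</grapheme>
    <phoneme>bæs</phoneme>
  </lexeme>
  <!-- Hypothetical role: the musical sense -->
  <lexeme role="myrole:music">
    <grapheme>bass</grapheme>
    <phoneme>beɪs</phoneme>
  </lexeme>
</lexicon>
```

Whether a processor can actually select between the two entries depends on it being able to determine the role of the word in context, which is outside the lexicon itself.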

Note that English contains numerous examples of noun-verb pairs that can be treated either as homographs or as alternative pronunciations, depending on author preference. Two examples are the noun/verb "refuse" and the noun/verb "address".

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
      http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
    xmlns:mypos="http://www.example.org/my_pos_namespace"
    alphabet="ipa" xml:lang="en-US">
  <lexeme role="mypos:verb">
    <grapheme>refuse</grapheme>
    <phoneme>rɪˈfjuːz</phoneme>
    <!-- IPA string is: "rɪˈfjuːz" -->
  </lexeme>
  <lexeme role="mypos:noun">
    <grapheme>refuse</grapheme>
    <phoneme>ˈrɛfjuːs</phoneme>
    <!-- IPA string is: "ˈrɛfjuːs" -->
  </lexeme>
</lexicon>

Pronunciation by orthography

For some words and phrases pronunciation can be expressed quickly and conveniently as a sequence of other orthographies. The developer is not required to have linguistic knowledge, but instead makes use of the pronunciations that are already expected to be available. To express pronunciations using other orthographies the <alias> element may be used.

This feature is particularly useful for acronym expansion.

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
      http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
    alphabet="ipa" xml:lang="en-US">
  <!-- Acronym expansion -->
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
  <!-- Number representation -->
  <lexeme>
    <grapheme>101</grapheme>
    <alias>one hundred and one</alias>
  </lexeme>
  <!-- Crude pronunciation mechanism -->
  <lexeme>
    <grapheme>Thailand</grapheme>
    <alias>tie land</alias>
  </lexeme>
  <!-- Crude pronunciation mechanism and acronym expansion -->
  <lexeme>
    <grapheme>BBC1</grapheme>
    <alias>be be sea one</alias>
  </lexeme>
</lexicon>

Status and future

PLS 1.0 reached the status of W3C Recommendation on 14 October 2008.

References

PLS Specification (W3C Recommendation)
