Pronunciation Lexicon Specification
The Pronunciation Lexicon Specification (PLS) is a W3C Recommendation designed to enable interoperable specification of pronunciation information for both speech recognition and speech synthesis engines within voice browsing applications. The language is intended to be easy for developers to use while supporting the accurate specification of pronunciation information for international use.
The language allows one or more pronunciations for a word or phrase to be specified using a standard pronunciation alphabet or, if necessary, vendor-specific alphabets. Pronunciations are grouped together into a PLS document, which may be referenced from other markup languages such as the Speech Recognition Grammar Specification (SRGS) and the Speech Synthesis Markup Language (SSML).
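As a minimal sketch of mixing alphabets, the fragment below gives one pronunciation in the lexicon's default alphabet and a second in a vendor-specific one. It assumes that an `alphabet` attribute on an individual `<phoneme>` overrides the lexicon-wide default and that vendor-defined names start with `x-`. The name `x-sampa` and its transcription are illustrative assumptions, and a given engine may not recognise them.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>judgment</grapheme>
    <!-- Uses the lexicon-wide default alphabet (IPA) -->
    <phoneme>ˈdʒʌdʒ.mənt</phoneme>
    <!-- Assumption: "x-sampa" is a hypothetical vendor-defined alphabet name;
         support depends on the speech engine -->
    <phoneme alphabet="x-sampa">"dZVdZ.m@nt</phoneme>
  </lexeme>
</lexicon>
```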
Usage
Here is an example PLS document:
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>judgment</grapheme>
    <grapheme>judgement</grapheme>
    <phoneme>ˈdʒʌdʒ.mənt</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>fiancé</grapheme>
    <grapheme>fiance</grapheme>
    <phoneme>fiˈɒns.eɪ</phoneme>
    <phoneme>ˌfiː.ɑːnˈseɪ</phoneme>
  </lexeme>
</lexicon>
This lexicon could be used to improve text-to-speech (TTS) output, as shown in the following SSML 1.0 document:
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0"
      xmlns="http://www.w3.org/2001/10/synthesis"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
        http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
      xml:lang="en-US">
  <lexicon uri="http://www.example.com/lexicon_defined_above.xml"/>
  <p>
    In the judgement of my fiancé, Las Vegas is the best place for a honeymoon.
    I replied that I preferred Venice and didn't think the Venetian casino was
    an acceptable compromise.
  </p>
</speak>
It could also be used to improve automatic speech recognition (ASR), as in the following SRGS 1.0 grammar:
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0"
      xmlns="http://www.w3.org/2001/06/grammar"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2001/06/grammar
        http://www.w3.org/TR/speech-grammar/grammar.xsd"
      xml:lang="en-US" root="movies" mode="voice">
  <lexicon uri="http://www.example.com/lexicon_defined_above.xml"/>
  <rule id="movies" scope="public">
    <one-of>
      <item>Terminator 2: Judgment Day</item>
      <item>My Big Fat Obnoxious Fiance</item>
      <item>Pluto's Judgement Day</item>
    </one-of>
  </rule>
</grammar>
Common Use Cases
Multiple pronunciations for the same orthography
For ASR systems it is common to rely on multiple pronunciations of the same word or phrase in order to cope with variations of pronunciation within a language. In the Pronunciation Lexicon language, multiple pronunciations are represented by more than one <phoneme> (or <alias>) element within the same <lexeme> element.
In the following example the word "Newton" has two possible pronunciations.
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
      alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme>ˈnjuːtən</phoneme>
    <phoneme>ˈnuːtən</phoneme>
  </lexeme>
</lexicon>
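The `<alias>` element mentioned above, and a way of ranking alternatives, can be sketched as follows. This assumes the `prefer` attribute may be set on a `<phoneme>` to mark the pronunciation a synthesis engine should favour, and that an `<alias>` expresses a pronunciation as substitute text rather than phonemes. Marking the first "Newton" variant as preferred is an arbitrary choice for illustration, as is the "W3C" entry.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Newton</grapheme>
    <!-- Illustrative choice: flag this variant as the one to use for synthesis -->
    <phoneme prefer="true">ˈnjuːtən</phoneme>
    <phoneme>ˈnuːtən</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>W3C</grapheme>
    <!-- The alias text is pronounced in place of the grapheme -->
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>
```

A recogniser would typically accept every listed pronunciation, whereas a synthesiser has to pick one, which is where a preference marker becomes useful.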
Multiple orthographies
In some situations there are alternative textual representations for the same word or phrase. This can arise for a number of reasons; see Section 4.5 of the PLS specification for details. Because these representations have the same meaning (as opposed to homophones), it is recommended that they be represented using a single <lexeme> element that contains multiple <grapheme> elements.
Here are two simple examples of multiple orthographies: alternative spellings of an English word and multiple written forms of a Japanese word.
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>colour</grapheme>
    <grapheme>color</grapheme>
    <phoneme>ˈkʌlər</phoneme>
  </lexeme>
</lexicon>
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
      alphabet="ipa" xml:lang="ja">
  <!-- Japanese entry showing how multiple writing systems are handled:
       romaji, kanji and hiragana orthographies -->
  <lexeme>
    <grapheme>nihongo</grapheme>
    <grapheme>日本語</grapheme>
    <grapheme>にほんご</grapheme>
    <phoneme>ɲihoŋo</phoneme>
  </lexeme>
</lexicon>
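By contrast, homophones are different words that happen to sound alike, so grouping them under one <lexeme> would wrongly suggest they are spellings of the same word. A minimal sketch, assuming the natural approach of giving each word its own <lexeme>; the words "cede" and "seed" and their transcription are chosen purely for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <!-- Two distinct words with the same pronunciation: one lexeme each -->
  <lexeme>
    <grapheme>cede</grapheme>
    <phoneme>siːd</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>seed</grapheme>
    <phoneme>siːd</phoneme>
  </lexeme>
</lexicon>
```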