Working with Speech Recognition Grammar

Introduction

A speech recognition grammar defines the syntactic structure of the words to be recognized.

The Julius speech recognition component provided by OpenHRI uses the W3C-SRGS format to define the speech recognition grammar.

In this section, we explain the W3C-SRGS format and introduce the tools provided by OpenHRI to help you author a grammar.

W3C-SRGS Grammar

W3C-SRGS (Speech Recognition Grammar Specification) is a standard for defining speech recognition grammars. It uses an XML format with the following tags.

Tags

lexicon: Specifies the URI of a W3C-PLS lexicon (see the next section). Optional.
rule: Defines a part of the grammar identified by an ID. The ID is used to reference the rule from other rules or to switch the active grammar recognized by the Julius speech recognition component.
item: Specifies a word or a sentence (space-separated words) to be recognized. The “repeat” attribute can be used to state how many times the item may occur.
one-of: Indicates that any one of the child items is acceptable.
ruleref: References the rule identified by the uri attribute.
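
To make the relationship between rule and ruleref concrete, the following is a minimal sketch in Python that lists, for each rule ID, the rules it references. It is not part of OpenHRI; the function name rule_references and the miniature grammar are assumptions made up for illustration, and only the standard library is used.

```python
import xml.etree.ElementTree as ET

# Namespace used by W3C-SRGS grammar elements.
SRGS_NS = "{http://www.w3.org/2001/06/grammar}"

# Hypothetical miniature grammar, used only for illustration.
GRAMMAR = """
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en" version="1.0" mode="voice" root="main">
  <rule id="main">
    <one-of>
      <item><ruleref uri="#greet"/></item>
      <item>bye</item>
    </one-of>
  </rule>
  <rule id="greet">
    <item>hello</item>
  </rule>
</grammar>
"""


def rule_references(grammar_xml):
    """Map each rule ID to the list of rule IDs it references via <ruleref>."""
    root = ET.fromstring(grammar_xml)
    references = {}
    for rule in root.iter(SRGS_NS + "rule"):
        # A local reference looks like uri="#greet"; strip the leading "#".
        targets = [
            ref.get("uri", "").lstrip("#")
            for ref in rule.iter(SRGS_NS + "ruleref")
        ]
        references[rule.get("id")] = targets
    return references


refs = rule_references(GRAMMAR)
print(refs)
```

A rule that appears with an empty list references no other rule. The sketch only handles local references of the form "#id".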

Example

<?xml version="1.0" encoding="UTF-8" ?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2001/06/grammar
                             http://www.w3.org/TR/speech-grammar/grammar.xsd"
         xml:lang="en"
         version="1.0" mode="voice" root="main">

  <lexicon uri="sample-lex-en.xml"/>

  <rule id="main">
    <one-of>
      <item><ruleref uri="#greet" /></item>
      <item><ruleref uri="#command" /></item>
    </one-of>
  </rule>

  <rule id="greet">
    <one-of>
      <item>hello</item>
      <item>good afternoon</item>
      <item>good evening</item>
      <item>good bye</item>
      <item>bye</item>
    </one-of>
  </rule>

  <rule id="command">
    <one-of>
      <item>pick</item>
      <item>give me</item>
    </one-of>
    <item repeat="0-1">the</item>
    <one-of>
      <item>apple</item>
      <item>cake</item>
      <item>remote</item>
    </one-of>
    <item repeat="0-1">please</item>
  </rule>

</grammar>
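
One way to check that a grammar accepts what you expect is to enumerate its sentences. The following Python sketch does this for a shortened copy of the "command" rule above. It is an illustration under stated assumptions, not an OpenHRI tool: the names expand and sentences are hypothetical, only bounded repeat values such as "0-1" or "2" are handled, and recursive rules or special rule references are not supported.

```python
import itertools
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/2001/06/grammar}"

# Shortened copy of the "command" rule, used only for illustration.
GRAMMAR = """
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en" version="1.0" mode="voice" root="command">
  <rule id="command">
    <one-of>
      <item>pick</item>
      <item>give me</item>
    </one-of>
    <item repeat="0-1">the</item>
    <one-of>
      <item>apple</item>
      <item>cake</item>
    </one-of>
  </rule>
</grammar>
"""


def expand(element, rules):
    """Return every word sequence (a tuple of words) that `element` accepts."""
    tag = element.tag
    if tag == NS + "one-of":
        # Alternatives: the union of what each child item accepts.
        return [seq for item in element for seq in expand(item, rules)]
    if tag == NS + "ruleref":
        # Local reference such as uri="#greet".
        return expand(rules[element.get("uri").lstrip("#")], rules)
    # <rule> and <item>: text and child elements form a sequence.
    parts = [[tuple((element.text or "").split())]]
    for child in element:
        parts.append(expand(child, rules))
        parts.append([tuple((child.tail or "").split())])
    sequences = [sum(combo, ()) for combo in itertools.product(*parts)]
    if tag == NS + "item" and element.get("repeat"):
        low, sep, high = element.get("repeat").partition("-")
        if sep and not high:
            raise ValueError("unbounded repeat cannot be enumerated")
        low, high = int(low), int(high or low)
        repeated = []
        for count in range(low, high + 1):
            # Concatenate `count` occurrences of the item.
            for combo in itertools.product(sequences, repeat=count):
                repeated.append(sum(combo, ()))
        sequences = repeated
    return sequences


def sentences(grammar_xml):
    """List the sentences accepted by the grammar's root rule."""
    root = ET.fromstring(grammar_xml)
    rules = {rule.get("id"): rule for rule in root.iter(NS + "rule")}
    return [" ".join(words) for words in expand(rules[root.get("root")], rules)]


for sentence in sentences(GRAMMAR):
    print(sentence)
```

For the shortened grammar this prints eight sentences, starting with "pick apple" and ending with "give me the cake". The optional "the" doubles the number of combinations.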

W3C-PLS Lexicon

W3C-PLS (Pronunciation Lexicon Specification) is a standard for defining the pronunciation lexicon used in speech recognition. It uses an XML format with the following tags.

Tags

lexeme: An entry that pairs a grapheme with its phoneme.
grapheme: Indicates how the word is written.
phoneme: Indicates how the word is pronounced.

Example

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
     xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
                         http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
     alphabet="x-ARPAbet" xml:lang="en">
  <lexeme>
    <grapheme>me</grapheme>
    <phoneme>{{x-ARPAbet|m iy}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>good</grapheme>
    <phoneme>{{x-ARPAbet|g uh d}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>remote</grapheme>
    <phoneme>{{x-ARPAbet|r ix m ow t}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>apple</grapheme>
    <phoneme>{{x-ARPAbet|ae p ax l}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>give</grapheme>
    <phoneme>{{x-ARPAbet|g ih v}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>please</grapheme>
    <phoneme>{{x-ARPAbet|p l iy z}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>evening</grapheme>
    <phoneme>{{x-ARPAbet|iy v n ix ng}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>afternoon</grapheme>
    <phoneme>{{x-ARPAbet|ae f t er n uw n}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>pick</grapheme>
    <phoneme>{{x-ARPAbet|p ih k}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>cake</grapheme>
    <phoneme>{{x-ARPAbet|k ey k}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>the</grapheme>
    <phoneme>{{x-ARPAbet|dh ax}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>bye</grapheme>
    <phoneme>{{x-ARPAbet|b ay}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>hello</grapheme>
    <phoneme>{{x-ARPAbet|hh ax l ow}}</phoneme>
  </lexeme>
</lexicon>
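
A lexicon like the one above can be read into a simple lookup table. The following Python sketch is a minimal illustration using only the standard library. The function name load_lexicon is hypothetical, the embedded lexicon is a two-entry excerpt of the example, and the phoneme text is kept exactly as written in the file.

```python
import xml.etree.ElementTree as ET

# Namespace used by W3C-PLS lexicon elements.
PLS_NS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"

# Two entries copied from the example lexicon, used only for illustration.
LEXICON = """
<lexicon version="1.0"
     xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
     alphabet="x-ARPAbet" xml:lang="en">
  <lexeme>
    <grapheme>me</grapheme>
    <phoneme>{{x-ARPAbet|m iy}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>good</grapheme>
    <phoneme>{{x-ARPAbet|g uh d}}</phoneme>
  </lexeme>
</lexicon>
"""


def load_lexicon(lexicon_xml):
    """Map each grapheme to the list of phoneme strings given for it."""
    root = ET.fromstring(lexicon_xml)
    entries = {}
    for lexeme in root.iter(PLS_NS + "lexeme"):
        phonemes = [
            (p.text or "").strip()
            for p in lexeme.findall(PLS_NS + "phoneme")
        ]
        # A lexeme may list more than one grapheme; each gets the same phonemes.
        for grapheme in lexeme.findall(PLS_NS + "grapheme"):
            word = (grapheme.text or "").strip()
            entries.setdefault(word, []).extend(phonemes)
    return entries


lexicon = load_lexicon(LEXICON)
print(lexicon["good"])
```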

Tools

Validation tool

You can validate a grammar in W3C-SRGS format with the “validatesrgs” tool.

Run the validation tool by entering the following command:

$ validatesrgs [grammarfile]

If the grammar is correct, you will get the following output:

$ validatesrgs sample-en.grxml
Validating SRGS file sample-en.grxml...
SRGS file is valid.
Validating PLS file sample-lex-en.xml...
PLS file is valid.

If the grammar is not correct, you will get error messages such as the following:

$ validatesrgs sample-invalid.grxml
Validating SRGS file sample-invalid.grxml...
[error] Invalid SRGS file.
Element '{http://www.w3.org/2001/06/grammar}one-of': Missing child element(s). Expected is ( {http://www.w3.org/2001/06/grammar}item )., line 12
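
The error above reports a one-of element that has no item child. As a rough illustration of that single condition, the following Python sketch finds the rules that contain such an element. It is an assumption-laden example, not how validatesrgs works, and it does not perform the full validation that validatesrgs reports. The function name empty_one_of_rules and the broken grammar are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/2001/06/grammar}"

# Hypothetical invalid grammar: the <one-of> has no <item> child.
BROKEN = """
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en" version="1.0" mode="voice" root="main">
  <rule id="main">
    <one-of>
    </one-of>
  </rule>
</grammar>
"""


def empty_one_of_rules(grammar_xml):
    """Return the IDs of rules containing a <one-of> with no <item> child."""
    root = ET.fromstring(grammar_xml)
    bad = []
    for rule in root.iter(NS + "rule"):
        for one_of in rule.iter(NS + "one-of"):
            if one_of.find(NS + "item") is None:
                bad.append(rule.get("id"))
                break  # one report per rule is enough
    return bad


broken_rules = empty_one_of_rules(BROKEN)
print(broken_rules)
```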

Visualization tool

OpenHRI also provides a more powerful tool for checking the structure of a W3C-SRGS grammar. The “juliustographviz” tool visualizes the grammar as a graph so that you can check its correctness.

To draw the graph, enter the following command:

$ srgstojulius sample-en.grxml | juliustographviz | dot -Txlib

For example, you will get the following output: [figure: graph of the grammar in sample-en.grxml]

Lexicon generation tool

After you have finished writing a W3C-SRGS grammar, you sometimes have to prepare a W3C-PLS lexicon. This is required when the grammar contains words with special readings.

OpenHRI provides a tool to automatically generate a W3C-PLS lexicon from a W3C-SRGS grammar.

The “srgstopls” tool can be used as follows:

$ srgstopls sample-en.grxml > sample-lex-en.xml

At the moment, this tool supports English lexicons (using the julius-voxforge dictionary) and Japanese lexicons (using the julius-runkit dictionary).

Note

Words that are not in the dictionary remain blank in the output XML file. You should always check the output XML and fill in the pronunciation manually for such words.
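
Checking a generated lexicon for blank entries can be partly automated. The following Python sketch lists the graphemes whose lexeme has no phoneme text. It assumes that "blank" means a missing or empty phoneme element, which may differ from what srgstopls actually writes. The function name missing_pronunciations and the sample word "zucchini" are hypothetical.

```python
import xml.etree.ElementTree as ET

PLS_NS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"

# Hypothetical generated lexicon: "zucchini" was not found in the dictionary.
GENERATED = """
<lexicon version="1.0"
     xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
     alphabet="x-ARPAbet" xml:lang="en">
  <lexeme>
    <grapheme>cake</grapheme>
    <phoneme>{{x-ARPAbet|k ey k}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>zucchini</grapheme>
    <phoneme></phoneme>
  </lexeme>
</lexicon>
"""


def missing_pronunciations(lexicon_xml):
    """Return graphemes whose lexeme has no non-empty <phoneme> (assumed meaning of blank)."""
    root = ET.fromstring(lexicon_xml)
    missing = []
    for lexeme in root.iter(PLS_NS + "lexeme"):
        phonemes = [
            (p.text or "").strip()
            for p in lexeme.findall(PLS_NS + "phoneme")
        ]
        if not any(phonemes):
            missing.extend(
                (g.text or "").strip()
                for g in lexeme.findall(PLS_NS + "grapheme")
            )
    return missing


todo = missing_pronunciations(GENERATED)
print(todo)
```

The words it reports are the ones to fill in by hand before using the lexicon.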