Working with Speech Recognition Grammar

Introduction

A speech recognition grammar defines the syntactic structure of the word sequences to be recognized.

The Julius speech recognition component provided by OpenHRI uses the W3C-SRGS format to define the speech recognition grammar.

This section explains the W3C-SRGS format and introduces the tools that OpenHRI provides to help you author grammars.

W3C-SRGS Grammar

W3C-SRGS (Speech Recognition Grammar Specification) is a standard for defining speech recognition grammars. It uses an XML format with the following tags to describe the grammar.

Tags

lexicon
Gives the URI of a W3C-PLS lexicon (see the next section). Optional.
rule
Defines a named part of the grammar, identified by its ID. The ID is used to reference the rule from other rules, or to switch the active grammar recognized by the Julius speech recognition component.
item
Indicates a word or a sentence (space-separated words) to be recognized. The “repeat” attribute specifies how many times the item may occur; for example, repeat="0-1" makes the item optional.
one-of
Indicates a set of alternatives: any one of the child items is acceptable.
ruleref
References the rule identified by the uri attribute, for example uri="#greet" for a rule defined in the same file.

Example

<?xml version="1.0" encoding="UTF-8" ?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2001/06/grammar
                             http://www.w3.org/TR/speech-grammar/grammar.xsd"
         xml:lang="en"
         version="1.0" mode="voice" root="main">

  <lexicon uri="sample-lex-en.xml"/>

  <rule id="main">
    <one-of>
      <item><ruleref uri="#greet" /></item>
      <item><ruleref uri="#command" /></item>
    </one-of>
  </rule>

  <rule id="greet">
    <one-of>
      <item>hello</item>
      <item>good afternoon</item>
      <item>good evening</item>
      <item>good bye</item>
      <item>bye</item>
    </one-of>
  </rule>

  <rule id="command">
    <one-of>
      <item>pick</item>
      <item>give me</item>
    </one-of>
    <item repeat="0-1">the</item>
    <one-of>
      <item>apple</item>
      <item>cake</item>
      <item>remote</item>
    </one-of>
    <item repeat="0-1">please</item>
  </rule>

</grammar>
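To see how the tags combine, the following is a minimal Python sketch (not part of OpenHRI) that enumerates every sentence accepted by a shortened copy of the grammar above. It assumes only local rule references of the form uri="#id", bounded repeat ranges, and no recursive rules; the names GRAMMAR and expand are hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/2001/06/grammar}"

# Shortened copy of the sample grammar (fewer words, no lexicon).
GRAMMAR = """<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en" version="1.0" mode="voice" root="main">
  <rule id="main">
    <one-of>
      <item><ruleref uri="#greet" /></item>
      <item><ruleref uri="#command" /></item>
    </one-of>
  </rule>
  <rule id="greet">
    <one-of>
      <item>hello</item>
      <item>bye</item>
    </one-of>
  </rule>
  <rule id="command">
    <one-of>
      <item>pick</item>
      <item>give me</item>
    </one-of>
    <item repeat="0-1">the</item>
    <one-of>
      <item>apple</item>
      <item>cake</item>
    </one-of>
  </rule>
</grammar>"""


def expand(node, rules):
    """Return the list of word tuples accepted by an SRGS element."""
    tag = node.tag.replace(NS, "")
    if tag == "ruleref":
        # Assumption: local references only, e.g. uri="#greet".
        return expand(rules[node.get("uri")[1:]], rules)
    if tag == "one-of":
        # Exactly one of the child items is spoken.
        return [seq for child in node for seq in expand(child, rules)]
    # rule or item: own text followed by the children, in document order.
    parts = [[tuple((node.text or "").split())]]
    for child in node:
        parts.append(expand(child, rules))
        parts.append([tuple((child.tail or "").split())])
    seqs = [sum(combo, ()) for combo in itertools.product(*parts)]
    repeat = node.get("repeat") if tag == "item" else None
    if repeat:
        low, sep, high = repeat.partition("-")
        if sep and not high:
            raise ValueError("open-ended repeat cannot be enumerated")
        counts = range(int(low), int(high or low) + 1)
        seqs = [sum(combo, ()) for n in counts
                for combo in itertools.product(seqs, repeat=n)]
    return seqs


root = ET.fromstring(GRAMMAR)
rules = {rule.get("id"): rule for rule in root.iter(NS + "rule")}
sentences = [" ".join(words) for words in expand(rules[root.get("root")], rules)]
for sentence in sentences:
    print(sentence)
```

The "command" rule contributes 2 x 2 x 2 = 8 sentences (two verbs, optional "the", two objects), so the shortened grammar accepts 10 sentences in total, including the two greetings.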

W3C-PLS Lexicon

W3C-PLS (Pronunciation Lexicon Specification) is a standard for defining pronunciation lexicons for speech recognition. It uses an XML format with the following tags to describe each word.

Tags

lexeme
An entry that pairs a grapheme with its phoneme.
grapheme
Indicates how the word is written.
phoneme
Indicates how the word is pronounced.

Example

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
     xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
                         http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
     alphabet="x-ARPAbet" xml:lang="en">
  <lexeme>
    <grapheme>me</grapheme>
    <phoneme>{{x-ARPAbet|m iy}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>good</grapheme>
    <phoneme>{{x-ARPAbet|g uh d}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>remote</grapheme>
    <phoneme>{{x-ARPAbet|r ix m ow t}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>apple</grapheme>
    <phoneme>{{x-ARPAbet|ae p ax l}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>give</grapheme>
    <phoneme>{{x-ARPAbet|g ih v}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>please</grapheme>
    <phoneme>{{x-ARPAbet|p l iy z}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>evening</grapheme>
    <phoneme>{{x-ARPAbet|iy v n ix ng}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>afternoon</grapheme>
    <phoneme>{{x-ARPAbet|ae f t er n uw n}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>pick</grapheme>
    <phoneme>{{x-ARPAbet|p ih k}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>cake</grapheme>
    <phoneme>{{x-ARPAbet|k ey k}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>the</grapheme>
    <phoneme>{{x-ARPAbet|dh ax}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>bye</grapheme>
    <phoneme>{{x-ARPAbet|b ay}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>hello</grapheme>
    <phoneme>{{x-ARPAbet|hh ax l ow}}</phoneme>
  </lexeme>
</lexicon>
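As an illustration of how the lexeme, grapheme, and phoneme tags relate, here is a minimal Python sketch (not part of OpenHRI) that reads a two-entry excerpt of the lexicon above into a lookup table. The names LEXICON and load_lexicon are hypothetical, and the phoneme text is stored verbatim without interpreting it.

```python
import xml.etree.ElementTree as ET

PLS_NS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"

# Two entries copied from the sample lexicon.
LEXICON = """<lexicon version="1.0"
     xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
     alphabet="x-ARPAbet" xml:lang="en">
  <lexeme>
    <grapheme>me</grapheme>
    <phoneme>{{x-ARPAbet|m iy}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>good</grapheme>
    <phoneme>{{x-ARPAbet|g uh d}}</phoneme>
  </lexeme>
</lexicon>"""


def load_lexicon(xml_text):
    """Map each written form to the list of its phoneme strings."""
    root = ET.fromstring(xml_text)
    table = {}
    for lexeme in root.iter(PLS_NS + "lexeme"):
        phonemes = [p.text or "" for p in lexeme.findall(PLS_NS + "phoneme")]
        for grapheme in lexeme.findall(PLS_NS + "grapheme"):
            table.setdefault(grapheme.text, []).extend(phonemes)
    return table


lexicon = load_lexicon(LEXICON)
for word, phonemes in lexicon.items():
    print(word, phonemes)
```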

Tools

Validation tool

You can validate a grammar in W3C-SRGS format with the “validatesrgs” tool.

To run the validation tool, enter the following command:

$ validatesrgs [grammarfile]

If the grammar is correct, you will get the following output:

$ validatesrgs sample-en.grxml
Validating SRGS file sample-en.grxml...
SRGS file is valid.
Validating PLS file sample-lex-en.xml...
PLS file is valid.

If the grammar is not correct, you will get error messages such as the following:

$ validatesrgs sample-invalid.grxml
Validating SRGS file sample-invalid.grxml...
[error] Invalid SRGS file.
Element '{http://www.w3.org/2001/06/grammar}one-of': Missing child element(s). Expected is ( {http://www.w3.org/2001/06/grammar}item )., line 12
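The kind of problem reported above can also be illustrated with a small standard-library Python sketch. This is not how “validatesrgs” is implemented; it only checks two structural conditions (every one-of has at least one item child, and every local ruleref points to a defined rule). The names check_grammar, GOOD, and BAD are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/2001/06/grammar}"

GOOD = """<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" root="main">
  <rule id="main">
    <one-of>
      <item>hello</item>
      <item>bye</item>
    </one-of>
  </rule>
</grammar>"""

BAD = """<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" root="main">
  <rule id="main">
    <one-of>
    </one-of>
    <ruleref uri="#missing" />
  </rule>
</grammar>"""


def check_grammar(xml_text):
    """Return a list of human-readable problems found in an SRGS document."""
    root = ET.fromstring(xml_text)
    problems = []
    rule_ids = {rule.get("id") for rule in root.iter(NS + "rule")}
    root_id = root.get("root")
    if root_id is not None and root_id not in rule_ids:
        problems.append("root rule %r is not defined" % root_id)
    for one_of in root.iter(NS + "one-of"):
        # A one-of element needs at least one direct item child.
        if one_of.find(NS + "item") is None:
            problems.append("<one-of> has no <item> child")
    for ref in root.iter(NS + "ruleref"):
        uri = ref.get("uri", "")
        # Only local references ("#id") can be resolved within one file.
        if uri.startswith("#") and uri[1:] not in rule_ids:
            problems.append("ruleref %r points to an undefined rule" % uri)
    return problems


good_problems = check_grammar(GOOD)
bad_problems = check_grammar(BAD)
print(good_problems)
print(bad_problems)
```

Unlike the real tool, this sketch does not validate against the SRGS schema and does not report line numbers, so it complements “validatesrgs” rather than replacing it.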

Visualization tool

OpenHRI also provides a more powerful tool for checking the structure of a W3C-SRGS grammar. The “juliustographviz” tool draws the grammar as a graph so that you can check its correctness visually.

To draw the graph, enter the following command:

$ srgstojulius sample-en.grxml | juliustographviz | dot -Txlib

For example, the sample grammar produces a graph like the following:

[Figure: sample-grammar-en.png]

Lexicon generation tool

After you have finished writing a W3C-SRGS grammar, you sometimes have to prepare a W3C-PLS lexicon. This step is required when the grammar contains words with special readings.

OpenHRI provides a tool that automatically generates a W3C-PLS lexicon from a W3C-SRGS grammar.

The “srgstopls” tool can be used as follows:

$ srgstopls sample-en.grxml > sample-lex-en.xml

At the moment, this tool supports English lexicons (using the julius-voxforge dictionary) and Japanese lexicons (using the julius-runkit dictionary).

Note

Words that are not in the dictionary are left blank in the output XML file. Always check the output XML and fill in the pronunciations of such words manually.
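Checking a generated lexicon for blank entries can be automated. The following Python sketch is an illustration only: it assumes that a blank entry is a lexeme whose phoneme element is empty, whitespace-only, or absent, which may not match the exact output of “srgstopls”. The function name words_missing_phonemes and the sample words are hypothetical.

```python
import xml.etree.ElementTree as ET

PLS_NS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"

# Hypothetical generated lexicon: "apple" was found in the dictionary,
# "openhri" was not and is assumed to have been left blank.
GENERATED = """<lexicon version="1.0"
     xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
     alphabet="x-ARPAbet" xml:lang="en">
  <lexeme>
    <grapheme>apple</grapheme>
    <phoneme>{{x-ARPAbet|ae p ax l}}</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>openhri</grapheme>
    <phoneme></phoneme>
  </lexeme>
</lexicon>"""


def words_missing_phonemes(xml_text):
    """Return the written forms whose lexeme has no usable phoneme text."""
    root = ET.fromstring(xml_text)
    missing = []
    for lexeme in root.iter(PLS_NS + "lexeme"):
        phonemes = lexeme.findall(PLS_NS + "phoneme")
        if not any((p.text or "").strip() for p in phonemes):
            missing.extend(g.text for g in lexeme.findall(PLS_NS + "grapheme"))
    return missing


missing_words = words_missing_phonemes(GENERATED)
print(missing_words)
```

For a real file such as sample-lex-en.xml, ET.parse("sample-lex-en.xml").getroot() can be used in place of ET.fromstring.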