<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52
     from ../festival.texi on 2 August 2001 -->

<TITLE>Festival Speech Synthesis System - 25  Tools</TITLE>
</HEAD>
<BODY bgcolor="#ffffff">
Go to the <A HREF="festival_1.html">first</A>, <A HREF="festival_24.html">previous</A>, <A HREF="festival_26.html">next</A>, <A HREF="festival_35.html">last</A> section, <A HREF="festival_toc.html">table of contents</A>.
<P><HR><P>


<H1><A NAME="SEC110" HREF="festival_toc.html#TOC110">25  Tools</A></H1>

<P>
<A NAME="IDX323"></A>
A number of basic data manipulation tools are supported by Festival.
These often make building new modules very easy and are already used
in many of the existing modules.  They typically offer a Scheme method
for entering data, and Scheme and C++ functions for evaluating it.

</P>



<H2><A NAME="SEC111" HREF="festival_toc.html#TOC111">25.1  Regular expressions</A></H2>

<P>
<A NAME="IDX324"></A>
<A NAME="IDX325"></A>
<A NAME="IDX326"></A>
Regular expressions are a formal method for describing a certain class
of mathematical languages.  They may be viewed as patterns which match
some set of strings.  They are very common in software tools such
as the UNIX shell, Perl, awk and Emacs.
Unfortunately the exact form of regular expressions often differs
slightly between applications, which can make them a little
tricky to use.

</P>
<P>
Festival supports regular expressions based mainly on the form used in
the GNU libg++ <CODE>Regex</CODE> class, though we have our own implementation
of it.  Our implementation (<CODE>EST_Regex</CODE>) is actually based on Henry
Spencer's <TT>`regex.c'</TT> as distributed with BSD 4.4.

</P>
<P>
Regular expressions are represented as character strings which
are interpreted as regular expressions by certain Scheme
and C++ functions.  Most characters in a regular expression are
treated as literals and match only that character, but a number
of others have special meaning.  Some characters may be escaped
with preceding backslashes to change them from operators to literals
(or sometimes from literals to operators).

</P>
<DL COMPACT>

<DT><CODE>.</CODE>
<DD>
Matches any character.  
<DT><CODE>$</CODE>
<DD>
matches end of string
<DT><CODE>^</CODE>
<DD>
matches beginning of string
<DT><CODE>X*</CODE>
<DD>
matches zero or more occurrences of X.  X may be a character, range
or parenthesized expression.
<DT><CODE>X+</CODE>
<DD>
matches one or more occurrences of X.  X may be a character, range
or parenthesized expression.
<DT><CODE>X?</CODE>
<DD>
matches zero or one occurrence of X.  X may be a character, range
or parenthesized expression.  
<DT><CODE>[...]</CODE>
<DD>
a range matches any one of the characters in the brackets.  The range 
operator "-" allows specification of ranges e.g. <CODE>a-z</CODE> for all
lower case characters.  If the first character of the range is 
<CODE>^</CODE> then it matches any character except those specified
in the range.  If you wish <CODE>-</CODE> to be in the range you must
put it first.
<DT><CODE>\\(...\\)</CODE>
<DD>
Treat contents of parentheses as single object allowing operators
<CODE>*</CODE>, <CODE>+</CODE>, <CODE>?</CODE> etc to operate on more than single characters.
<DT><CODE>X\\|Y</CODE>
<DD>
matches either X or Y.  X or Y may be single characters, ranges
or parenthesized expressions.
</DL>
<P>
Note that actually only one backslash is needed before a character to
escape it, but because these expressions are most often contained within
Scheme or C++ strings, the escape mechanism for those strings requires
that the backslash itself be escaped.  Hence you will most often be
required to type two backslashes.

</P>
<P>
Some examples may help in understanding the use of regular
expressions.
<DL COMPACT>

<DT><CODE>a.b</CODE>
<DD>
matches any three character string starting with an <CODE>a</CODE> and 
ending with a <CODE>b</CODE>.
<DT><CODE>.*a</CODE>
<DD>
matches any string ending in an <CODE>a</CODE>
<DT><CODE>.*a.*</CODE>
<DD>
matches any string containing an <CODE>a</CODE>
<DT><CODE>[A-Z].*</CODE>
<DD>
matches any string starting with a capital letter
<DT><CODE>[0-9]+</CODE>
<DD>
matches any string of digits
<DT><CODE>-?[0-9]+\\(\\.[0-9]+\\)?</CODE>
<DD>
matches any positive or negative real number.  Note the optional
preceding minus sign and the optional part containing the point and
following digits.  The point itself must be escaped as dot on its
own matches any character.
<DT><CODE>[^aeiouAEIOU]+</CODE>
<DD>
matches any non-empty string which doesn't contain a vowel
<DT><CODE>\\([Ss]at\\(urday\\)?\\)\\|\\([Ss]un\\(day\\)?\\)</CODE>
<DD>
matches Saturday and Sunday in various ways: abbreviated or in full,
with or without an initial capital
</DL>

<P>
The Scheme function <CODE>string-matches</CODE> takes a string and
a regular expression and returns <CODE>t</CODE> if the regular 
expression matches the whole string, as in the examples above, and
<CODE>nil</CODE> otherwise.

</P>
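<P>
As a minimal sketch of the points above, the following wraps one of the
example patterns in a function.  The name <CODE>sk_is_number</CODE> is
hypothetical and not part of Festival; only <CODE>string-matches</CODE>
is taken from the text.
</P>
<PRE>
;; Hypothetical helper, not part of Festival: test whether a string
;; looks like a real number, using the pattern from the examples above.
;; Each backslash is doubled because the pattern is a Scheme string.
(define (sk_is_number str)
  (string-matches str "-?[0-9]+\\(\\.[0-9]+\\)?"))

(sk_is_number "-12.5")        ;; minus sign, digits, a point, more digits
(sk_is_number "12a")          ;; trailing letter, so no match is expected
(string-matches "abc" "a.c")  ;; "." stands in for the "b"
</PRE>
<P>
Keeping a pattern inside a named function like this means the doubled
backslashes only have to be typed correctly once.
</P>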


<H2><A NAME="SEC112" HREF="festival_toc.html#TOC112">25.2  CART trees</A></H2>

<P>
<A NAME="IDX327"></A>
One of the basic tools available with Festival is a system for building
and using Classification and Regression Trees (<CITE>breiman84</CITE>).  This
standard statistical method can be used to predict both categorical and
continuous data from a set of feature vectors.

</P>
<P>
<A NAME="IDX328"></A>
The tree itself contains yes/no questions about features and ultimately
provides either a probability distribution, when predicting categorical
values (classification tree), or a mean and standard deviation when
predicting continuous values (regression tree).  Well defined techniques
can be used to construct an optimal tree from a set of training data.
The program <TT>`wagon'</TT>, developed in conjunction with Festival and
distributed with the speech tools, provides a basic but
increasingly powerful method for constructing trees.

</P>
<P>
A tree need not be automatically constructed.  CART trees have the
advantage over some other automatic training methods, such as neural
networks and linear regression, in that their output is more readable
and often understandable by humans.  Importantly, this makes it possible
to modify them.  CART trees may also be fully hand constructed.  This
is done, for example, in generating some duration models for languages
for which we do not yet have full databases to train from.

</P>
<P>
A CART tree has the following syntax:

<PRE>
    CART ::= QUESTION-NODE || ANSWER-NODE
    QUESTION-NODE ::= ( QUESTION YES-NODE NO-NODE )
    YES-NODE ::= CART
    NO-NODE ::= CART
    QUESTION ::= ( FEATURE in LIST )
    QUESTION ::= ( FEATURE is STRVALUE )
    QUESTION ::= ( FEATURE = NUMVALUE )
    QUESTION ::= ( FEATURE &#62; NUMVALUE )
    QUESTION ::= ( FEATURE &#60; NUMVALUE )
    QUESTION ::= ( FEATURE matches REGEX )
    ANSWER-NODE ::= CLASS-ANSWER || REGRESS-ANSWER
    CLASS-ANSWER ::= ( (VALUE0 PROB) (VALUE1 PROB) ... MOST-PROB-VALUE )
    REGRESS-ANSWER ::= ( ( STANDARD-DEVIATION MEAN ) )
</PRE>

<P>
Note that answer nodes are distinguished by their car not being atomic.

</P>
<P>
<A NAME="IDX329"></A>
A tree is interpreted with respect to a Stream_Item.
The <VAR>FEATURE</VAR> in a tree is a standard feature (see section <A HREF="festival_14.html#SEC54">14.6  Features</A>).

</P>
<P>
The following example tree is used in one of the Spanish voices
to predict variations from average durations.

<PRE>
(set! spanish_dur_tree
 '
   ((R:SylStructure.parent.R:Syllable.p.syl_break &#62; 1 ) ;; clause initial
    ((R:SylStructure.parent.stress is 1)
     ((1.5))
     ((1.2)))
    ((R:SylStructure.parent.syl_break &#62; 1)   ;; clause final
     ((R:SylStructure.parent.stress is 1)
      ((2.0))
      ((1.5)))
     ((R:SylStructure.parent.stress is 1)
      ((1.2))
      ((1.0))))))
</PRE>

<P>
It is applied to each item in the segment stream and yields a factor by
which the average duration is multiplied.

</P>
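<P>
How a tree of this shape is walked can be sketched in a few lines of
Scheme.  This is not Festival's own interpreter.  The <CODE>sk_</CODE>
names are hypothetical, features are looked up in an assoc list rather
than in a Stream_Item, <CODE>is</CODE> is compared with
<CODE>equal?</CODE>, and a node with nothing after its first element is
taken to be an answer node, as in the example tree; all of these are
assumptions of the sketch.
</P>
<PRE>
;; Sketch only: look a feature up in an assoc list such as
;; ((syl_break 2) (stress 1)) instead of a real item.
(define (sk_feat feats name)
  (car (cdr (assoc name feats))))

;; Evaluate one QUESTION of the form (FEATURE OP VALUE).
(define (sk_ask q feats)
  (let ((val (sk_feat feats (car q)))
        (op (car (cdr q)))
        (arg (car (cdr (cdr q)))))
    (cond
     ((eq? op 'is) (equal? val arg))
     ((eq? op 'in) (member val arg))
     ((eq? op '=) (= val arg))
     ((eq? op '&#62;) (&#62; val arg))
     ((eq? op '&#60;) (&#60; val arg))
     ((eq? op 'matches) (string-matches val arg))
     (t nil))))

;; Walk the tree: follow YES-NODE or NO-NODE until an answer is reached.
(define (sk_walk tree feats)
  (if (cdr tree)
      (if (sk_ask (car tree) feats)
          (sk_walk (car (cdr tree)) feats)
          (sk_walk (car (cdr (cdr tree))) feats))
      (car tree)))   ;; assumed answer node: a one-element list

;; A small made-up tree and feature set.
(set! sk_tree
 '((syl_break &#62; 1)
   ((stress is 1)
    ((1.5))
    ((1.2)))
   ((1.0))))

(sk_walk sk_tree '((syl_break 2) (stress 1)))
</PRE>
<P>
With these made-up features both questions are answered yes, so the walk
should end at the answer holding 1.5.
</P>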
<P>
<CODE>wagon</CODE> is constantly improving and with version 1.2 of the speech
tools may now be considered fairly stable for its basic operations.
Experimental features are described in the help it gives.  See the
Speech Tools manual for a more comprehensive discussion of using 
<TT>`wagon'</TT>.

</P>
<P>
The above tree format is similar to those produced by many
other systems, so it is reasonable to translate their formats into
one which Festival can use.

</P>


<H2><A NAME="SEC113" HREF="festival_toc.html#TOC113">25.3  Ngrams</A></H2>

<P>
<A NAME="IDX330"></A>
Bigrams, trigrams, and general ngrams are used in the part
of speech tagger and the phrase break predictor.  An Ngram
C++ class is defined in the speech tools library and some simple
facilities are added within Festival itself.

</P>
<P>
Ngrams may be built from files of tokens using the program
<CODE>ngram_build</CODE> which is part of the speech tools.  See
the speech tools documentation for details.

</P>
<P>
Within Festival ngrams may be named, loaded from files
and used when required.  The LISP function <CODE>ngram.load</CODE>
takes a name and a filename as arguments and loads the Ngram 
from that file.  For an example of its use once loaded see
<TT>`src/modules/base/pos.cc'</TT> or 
<TT>`src/modules/base/phrasify.cc'</TT>.

</P>
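<P>
For illustration, loading an ngram by name might look like the
following.  It uses <CODE>ngram.load</CODE>, the loader named in the
Viterbi decoder parameter list, with the name given before the filename;
that argument order and the path are assumptions, so check the
function's own documentation before relying on them.
</P>
<PRE>
;; Hypothetical path, for illustration only.  The name is the one
;; later passed to the decoder as ngramname.
(ngram.load "pos-tri-gram" "/path/to/pos-tri-gram.ngram")
</PRE>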


<H2><A NAME="SEC114" HREF="festival_toc.html#TOC114">25.4  Viterbi decoder</A></H2>

<P>
<A NAME="IDX331"></A>
Another common tool is a Viterbi decoder.  This C++ class is defined in
the speech tools library, in <TT>`speech_tools/include/EST_viterbi.h'</TT> and
<TT>`speech_tools/stats/EST_viterbi.cc'</TT>.  A Viterbi decoder
requires two functions at declaration time.  The first constructs
candidates at each stage, while the second combines paths.  A number of
options are available (which may change).

</P>
<P>
The prototypical example of use is in the part of speech tagger, which
uses standard Ngram models to predict probabilities of tags.
See <TT>`src/modules/base/pos.cc'</TT> for an example.

</P>
<P>
The Viterbi decoder can also be used through the Scheme function
<CODE>Gen_Viterbi</CODE>.  This function respects the parameters defined
in the variable <CODE>get_vit_params</CODE>.  As with other modules, this
parameter list is an assoc list of feature names and values.  The
parameters supported are:
<DL COMPACT>

<DT><CODE>Relation</CODE>
<DD>
The name of the relation the decoder is to be applied to.
<DT><CODE>cand_function</CODE>
<DD>
A function that is to be called for each item that will return
a list of candidates (with probabilities).
<DT><CODE>return_feat</CODE>
<DD>
The name of a feature that the best candidate is to be returned in
for each item in the named relation.
<DT><CODE>p_word</CODE>
<DD>
The previous word to the first item in the named relation (only used
when ngrams are the "language model").
<DT><CODE>pp_word</CODE>
<DD>
The previous previous word to the first item in the named relation 
(only used when ngrams are the "language model").
<DT><CODE>ngramname</CODE>
<DD>
The name of an ngram (loaded by <CODE>ngram.load</CODE>) to be used 
as a "language model".
<DT><CODE>wfstmname</CODE>
<DD>
The name of a WFST (loaded by <CODE>wfst.load</CODE>) to be used 
as a "language model".  This is ignored if an <CODE>ngramname</CODE> is also
specified.
<DT><CODE>debug</CODE>
<DD>
If specified more debug features are added to the items in the
relation.
<DT><CODE>gscale_p</CODE>
<DD>
Grammar scaling factor.
</DL>
<P>
Here is a short example to help make the use of this facility clearer.

</P>
<P>
There are two parts required for the Viterbi decoder: a set of
candidate observations and some "language model".  For the
math to work properly the candidate observations must carry reverse
probabilities (for each candidate, the probability of the observation
given the candidate, rather than the probability of the candidate
given the observation).  By Bayes' rule these can be calculated, up to
a factor that is the same for every candidate of a given observation,
as the probability of the candidate given the observation divided by
the probability of the candidate in isolation.  

</P>
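<P>
The division described above can be sketched as follows.  The function
name and the numbers are made up for illustration, and the sketch works
with plain probabilities held in assoc lists; it is not how Festival's
lexicons are actually built.
</P>
<PRE>
;; Hypothetical helper: turn P(candidate | observation) into a score
;; proportional to P(observation | candidate) by dividing each entry
;; by P(candidate), the probability of the candidate in isolation.
(define (sk_reverse_probs posteriors priors)
  (mapcar
   (lambda (c)
     (list (car c)
           (/ (car (cdr c))
              (car (cdr (assoc (car c) priors))))))
   posteriors))

;; Made-up numbers for one observed word.
(sk_reverse_probs
 '((jj 0.5) (vbd 0.25))     ;; P(tag | word)
 '((jj 0.25) (vbd 0.5)))    ;; P(tag)
;; jj: 0.5 / 0.25 = 2, vbd: 0.25 / 0.5 = 0.5
</PRE>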
<P>
For the sake of simplicity let us assume we have a lexicon that maps
each word to a distribution of part of speech tags with reverse
probabilities, and a tri-gram called <CODE>pos-tri-gram</CODE> over
sequences of part of speech tags.  First we must define the candidate
function

<PRE>
(define (pos_cand_function w)
 ;; select the appropriate lexicon
 (lex.select 'pos_lex)
 ;; return the list of cands with rprobs
 (cadr 
  (lex.lookup (item.name w) nil)))
</PRE>

<P>
The returned candidate list would look something like

<PRE>
( (jj -9.872) (vbd -6.284) (vbn -5.565) )
</PRE>

<P>
Our part of speech tagger function would look something
like this

<PRE>
(define (pos_tagger utt)
  (set! get_vit_params
        (list
         (list 'Relation "Word")
         (list 'return_feat 'pos_tag)
         (list 'p_word "punc")
         (list 'pp_word "nn")
         (list 'ngramname "pos-tri-gram")
         (list 'cand_function 'pos_cand_function)))
  (Gen_Viterbi utt)
  utt)
</PRE>

<P>
This will assign the optimal part of speech tags to each word in utt.

</P>


<H2><A NAME="SEC115" HREF="festival_toc.html#TOC115">25.5  Linear regression</A></H2>

<P>
<A NAME="IDX332"></A>
The linear regression module takes models built by some external
package and evaluates them from the features and weights they list.  A
model consists of a list of terms.  The first should be the atom
<CODE>Intercept</CODE> plus a value.  Each following term should consist
of a feature (see section <A HREF="festival_14.html#SEC54">14.6  Features</A>) followed by a weight.  An optional third
element may be a list of atomic values.  If the result of the feature is
a member of this list the feature's value is treated as 1, else it is 0.
This third element allows an efficient way to map categorical values
into numeric values.  For example, the first few parameters of the F0
prediction model in <TT>`lib/f2bf0lr.scm'</TT> are

<PRE>
(set! f2b_f0_lr_start
'(
   ( Intercept 160.584956 )
   ( Word.Token.EMPH 36.0 )
   ( pp.tobi_accent 10.081770 (H*) )
   ( pp.tobi_accent 3.358613 (!H*) )
   ( pp.tobi_accent 4.144342 (*? X*? H*!H* * L+H* L+!H*) )
   ( pp.tobi_accent -1.111794 (L*) )
   ...
))
</PRE>

<P>
Note the feature <CODE>pp.tobi_accent</CODE> returns an atom, and is hence
tested with the map groups specified as third arguments.

</P>
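<P>
How a model of this form could be evaluated is sketched below, assuming
the usual linear regression form: the intercept plus, for each term, the
weight multiplied by the feature's value.  The <CODE>sk_</CODE> names,
the small model and the assoc list of features are all hypothetical;
this is not Festival's own implementation.
</P>
<PRE>
;; Contribution of one term (FEATURE WEIGHT [VALUE-LIST]).
;; feats is an assoc list standing in for a real item's features.
(define (sk_lr_term term feats)
  (let ((val (car (cdr (assoc (car term) feats))))
        (weight (car (cdr term)))
        (cats (cdr (cdr term))))
    (if cats
        ;; categorical: value counts as 1 if in the list, else 0
        (if (member val (car cats)) weight 0)
        ;; numeric: weight times the feature's value
        (* weight val))))

;; Sum the contributions of all terms after the intercept.
(define (sk_lr_sum terms feats)
  (if terms
      (+ (sk_lr_term (car terms) feats)
         (sk_lr_sum (cdr terms) feats))
      0))

(define (sk_lr_predict model feats)
  (+ (car (cdr (car model)))          ;; the Intercept value
     (sk_lr_sum (cdr model) feats)))

;; A made-up model and feature set.
(set! sk_model
 '((Intercept 100.0)
   (stress 10.0)
   (accent 5.0 (H* L+H*))))

(sk_lr_predict sk_model '((stress 1) (accent H*)))
;; 100 + 10 * 1 + 5 = 115
</PRE>
<P>
Listing the same feature in several terms, each with its own value list
and weight, is what lets one categorical feature contribute a different
amount for each group of values.
</P>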
<P>
Models may be built from feature data (in the same format as used by
<TT>`wagon'</TT>) using the <TT>`ols'</TT> program distributed with the speech
tools library.

</P>
<P><HR><P>
</BODY>
</HTML>