<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52
     from ../festival.texi on 2 August 2001 -->

<TITLE>Festival Speech Synthesis System - 29  Examples</TITLE>
</HEAD>
<BODY bgcolor="#ffffff">
Go to the <A HREF="festival_1.html">first</A>, <A HREF="festival_28.html">previous</A>, <A HREF="festival_30.html">next</A>, <A HREF="festival_35.html">last</A> section, <A HREF="festival_toc.html">table of contents</A>.
<P><HR><P>


<H1><A NAME="SEC137" HREF="festival_toc.html#TOC137">29  Examples</A></H1>

<P>
This chapter contains some simple walkthrough examples of using
Festival in various ways, not just as a speech synthesizer.

</P>



<H2><A NAME="SEC138" HREF="festival_toc.html#TOC138">29.1  POS Example</A></H2>

<P>
<A NAME="IDX370"></A>
<A NAME="IDX371"></A>
This example shows how we can use part of the standard synthesis process
to tokenize and tag a file of text.  This section does not cover
training and setting up a part of speech tag set (See section <A HREF="festival_16.html#SEC62">16  POS tagging</A>),
only how to go about using the standard POS tagger on text.

</P>
<P>
This example also shows how to use Festival as a simple scripting
language, and how to modify various methods used during text to speech.

</P>
<P>
The file <TT>`examples/text2pos'</TT> contains an executable shell script
which reads arbitrary ASCII text from standard input and writes each
word with its part of speech (one per line) to standard output.

</P>
<P>
A Festival script, like any other UNIX script, must start with
the characters <CODE>#!</CODE> followed by the name of the <TT>`festival'</TT>
executable.  For scripts the option <CODE>-script</CODE> is also
required.  Thus our first line looks like

<PRE>
#!/usr/local/bin/festival -script
</PRE>

<P>
Note that the pathname may need to be different on your system.

</P>
<P>
Following this we have copious comments, to keep our lawyers happy,
before we get into the real script.

</P>
<P>
The basic idea is that the tts process segments text into
utterances, and those utterances are then passed to a list of functions
defined by the Scheme variable <CODE>tts_hooks</CODE>.  Normally this variable
contains a list of two functions, <CODE>utt.synth</CODE> and <CODE>utt.play</CODE>, which
synthesize and play the resulting waveform.  In this case, instead,
we wish to predict the part of speech of each word, and then print it out.

</P>
<P>
The first function we define replaces the normal synthesis
function <CODE>utt.synth</CODE>.  It runs the standard Festival utterance
modules used in the synthesis process, up to the point where POS is
predicted.  This function looks like

<PRE>
(define (find-pos utt)
"Main function for processing TTS utterances.  Predicts POS and
prints words with their POS"
  (Token utt)
  (POS utt)
)
</PRE>

<P>
The normal text-to-speech process first tokenizes the text, splitting it
into "sentences".  The utterance type of these is <CODE>Token</CODE>.  Then
we call the <CODE>Token</CODE> utterance module, which converts the tokens to
a stream of words.  Then we call the <CODE>POS</CODE> module to predict part
of speech tags for each word.  Normally we would call further modules,
ultimately generating a waveform, but in this case we need no further
processing.

</P>
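<P>
To see these stages one at a time, the modules can also be run by hand
from the interactive Festival prompt.  The following is only a sketch: it
assumes a default voice is loaded and that the utterance is built as a
<CODE>Text</CODE> utterance, so the <CODE>Initialize</CODE> and <CODE>Text</CODE>
modules are run first to produce the tokens.  The variable name
<CODE>utt1</CODE> and the sample sentence are purely illustrative.

</P>
<PRE>
;; Illustrative only: utt1 and the sentence are made-up names/data.
(set! utt1 (Utterance Text "The cat sat on the mat."))
(Initialize utt1)   ; prepare the utterance (assumed needed for Text type)
(Text utt1)         ; split the raw text into tokens
(Token utt1)        ; convert the tokens to a stream of words
(POS utt1)          ; predict a part of speech tag for each word
;; Inspect the result: name and pos for each item in the Word stream
(utt.features utt1 'Word '(name pos))
</PRE>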
<P>
The second function we define is one that will print out the words and
parts of speech

<PRE>
(define (output-pos utt)
"Output the word/pos for each word in utt"
 (mapcar
  (lambda (pair)
    (format t "%l/%l\n" (car pair) (car (cdr pair))))
  (utt.features utt 'Word '(name pos))))
</PRE>

<P>
This uses the <CODE>utt.features</CODE> function to extract features from the
items in a named stream of an utterance.  In this case we want the
<CODE>name</CODE> and <CODE>pos</CODE> features for each item in the <CODE>Word</CODE>
stream.  Then for each pair we print out the word's name, a slash and its
part of speech followed by a newline.

</P>
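<P>
The printing step can be tried on its own with a hand-written list in
place of the result of <CODE>utt.features</CODE>.  This sketch assumes that
<CODE>utt.features</CODE> returns one list of values per word, in the order
the features were requested.  The words and tags in
<CODE>sample-pairs</CODE> are invented for illustration and are not output
from the tagger.

</P>
<PRE>
;; Hypothetical data standing in for (utt.features utt 'Word '(name pos))
(define sample-pairs '(("the" "dt") ("cat" "nn") ("sat" "vbd")))

(mapcar
 (lambda (pair)
   ;; (car pair) is the word, (car (cdr pair)) is its tag
   (format t "%l/%l\n" (car pair) (car (cdr pair))))
 sample-pairs)
</PRE>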
<P>
Our next job is to redefine the functions to be called
during text to speech.  The variable <CODE>tts_hooks</CODE> is defined
in <TT>`lib/tts.scm'</TT>.  Here we set it to our two newly-defined
functions

<PRE>
(set! tts_hooks (list find-pos output-pos))
</PRE>
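
<P>
When experimenting interactively rather than in a one-off script, it can
be convenient to keep the original hook list so that normal synthesis can
be restored afterwards.  A minimal sketch, in which the variable name
<CODE>saved_tts_hooks</CODE> is our own invention and not part of Festival:

</P>
<PRE>
;; saved_tts_hooks is a hypothetical name, not part of Festival
(define saved_tts_hooks tts_hooks)            ; remember the current hooks
(set! tts_hooks (list find-pos output-pos))   ; tag and print instead
;; ... run the tts process on some text here ...
(set! tts_hooks saved_tts_hooks)              ; restore the saved hooks
</PRE>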

<P>
<A NAME="IDX372"></A>
<A NAME="IDX373"></A>
So that garbage collection messages do not appear on the screen,
we suppress them with the following
command

<PRE>
(gc-status nil)
</PRE>

<P>
The final stage is to start the tts process running on standard
input.  Because we have redefined what functions are to be run on
the utterances, it will no longer generate speech but just predict
part of speech and print it to standard output.

<PRE>
(tts_file "-")
</PRE>
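
<P>
Putting the pieces above together, the whole script might look like the
following sketch.  The interpreter path on the first line is the one used
earlier and may differ on your system.  The invocation shown in the
comment assumes the script has been saved as <TT>`text2pos'</TT> and made
executable; <TT>`myfile.txt'</TT> is a hypothetical input file.

</P>
<PRE>
#!/usr/local/bin/festival -script
;; Usage (illustrative):  cat myfile.txt | ./text2pos

(define (find-pos utt)
"Run the synthesis modules only as far as POS prediction"
  (Token utt)
  (POS utt)
)

(define (output-pos utt)
"Output the word/pos for each word in utt"
 (mapcar
  (lambda (pair)
    (format t "%l/%l\n" (car pair) (car (cdr pair))))
  (utt.features utt 'Word '(name pos))))

;; Replace the default synthesize-and-play hooks with our own
(set! tts_hooks (list find-pos output-pos))

;; Keep garbage collection messages off the screen
(gc-status nil)

;; Process standard input
(tts_file "-")
</PRE>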

</BODY>
</HTML>