<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52
     from ../festival.texi on 2 August 2001 -->

<TITLE>Festival Speech Synthesis System - 9  TTS</TITLE>
</HEAD>
<BODY bgcolor="#ffffff">
Go to the <A HREF="festival_1.html">first</A>, <A HREF="festival_8.html">previous</A>, <A HREF="festival_10.html">next</A>, <A HREF="festival_35.html">last</A> section, <A HREF="festival_toc.html">table of contents</A>.
<P><HR><P>


<H1><A NAME="SEC27" HREF="festival_toc.html#TOC27">9  TTS</A></H1>

<P>
Festival supports text to speech for raw text files.  If you
are not interested in using Festival in any other way except as
a black box for rendering text as speech, the following method
is probably what you want.

<PRE>
festival --tts myfile
</PRE>

<P>
This will say the contents of <TT>`myfile'</TT>.  Alternatively, text
may be submitted on standard input:

<PRE>
echo hello world | festival --tts
cat myfile | festival --tts
</PRE>

<P>
<A NAME="IDX103"></A>
Festival supports the notion of <EM>text modes</EM> where the text file
type may be identified, allowing Festival to process the file in an
appropriate way.  Currently only two types are considered stable:
<CODE>STML</CODE> and <CODE>raw</CODE>, but other types such as <CODE>email</CODE>,
<CODE>HTML</CODE>, <CODE>Latex</CODE>, etc. are being developed and are discussed below.
This follows the idea of buffer modes in Emacs where a file's type can
be utilized to best display the text.  Text mode may also be selected
based on a filename's extension.

</P>
<P>
Within the command interpreter the function <CODE>tts</CODE> is used
to render files as speech; it takes a filename and the text mode
as arguments.

</P>



<H2><A NAME="SEC28" HREF="festival_toc.html#TOC28">9.1  Utterance chunking</A></H2>

<P>
<A NAME="IDX104"></A>
<A NAME="IDX105"></A>
Text to speech works by first tokenizing the file and chunking the
tokens into utterances.  Utterance breaks are determined by the
decision tree held in the variable <CODE>eou_tree</CODE>; a default
version is given in <TT>`lib/tts.scm'</TT>.  Blank lines are
probably the most reliable indicator of a break, followed by certain
punctuation.  Because periods are used for both sentence breaks and
abbreviations, some further heuristics are needed to guess which use
is intended.  The following tree is currently used; it
works better than simply using punctuation.

<PRE>
(defvar eou_tree 
'((n.whitespace matches ".*\n.*\n\\(.\\|\n\\)*") ;; 2 or more newlines
  ((1))
  ((punc in ("?" ":" "!"))
   ((1))
   ((punc is ".")
    ;; This is to distinguish abbreviations vs periods
    ;; These are heuristics
    ((name matches "\\(.*\\..*\\|[A-Z][A-Za-z]?[A-Za-z]?\\|etc\\)")
     ((n.whitespace is " ")
      ((0))                  ;; if abbrev single space isn't enough for break
      ((n.name matches "[A-Z].*")
       ((1))
       ((0))))
     ((n.whitespace is " ")  ;; if it doesn't look like an abbreviation
      ((n.name matches "[A-Z].*")  ;; single space and non-cap is no break
       ((1))
       ((0)))
      ((1))))
    ((0)))))
</PRE>

<P>
The token items this is applied to will always (except in the
end of file case) include one following token, so look ahead is
possible.  The "n.", "p." and "p.p." prefixes allow access to the
surrounding token context.  The features <CODE>name</CODE>, <CODE>whitespace</CODE>
and <CODE>punc</CODE> allow access to the contents of the token itself.  At
present there is no way to access the lexicon from this tree, which
is unfortunate as it might be useful if certain abbreviations were
identified as such there.

</P>
<P>
Note these are heuristics written by hand, not trained from data,
though problems have been fixed as they have been observed in data.  The
above rules may make mistakes where abbreviations appear at the end of
lines, and when improper spacing and capitalization are used.  This is
probably worth changing for modes where more casual text appears, such
as email messages and USENET news messages.  A possible improvement
could be made by analysing a text to find out its basic threshold of
utterance break (i.e. if no sequences of a full stop, two spaces and a
capitalized word appear, and the text is of a reasonable length,
then look for other criteria for utterance breaks).

</P>
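<P>
As a rough illustration of what changing the criteria might look like,
the sketch below replaces <CODE>eou_tree</CODE> with a much simpler tree
that breaks on blank lines and on any token carrying final punctuation.
It uses only the features shown above and assumes that resetting the
variable (for example from a text mode's init function) is enough for the
new tree to be used.  It will wrongly break after abbreviations, so it is
meant to show the shape of a tree rather than to be a recommended
replacement.

<PRE>
;; Sketch only: a deliberately simple end-of-utterance tree
(set! eou_tree
 '((n.whitespace matches ".*\n.*\n\\(.\\|\n\\)*") ;; 2 or more newlines
   ((1))
   ((punc in ("?" "!" "."))   ;; any sentence-final punctuation
    ((1))
    ((0)))))
</PRE>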
<P>
Ultimately what we are trying to do is to chunk the text into utterances
that can be synthesized quickly and start to play them quickly to
minimise the time someone has to wait for the first sound when starting
synthesis.  Thus it would be better if this chunking were done on
<EM>prosodic phrases</EM> rather than chunks more similar to linguistic
sentences.  Prosodic phrases are bounded in size, while sentences are
not.

</P>


<H2><A NAME="SEC29" HREF="festival_toc.html#TOC29">9.2  Text modes</A></H2>

<P>
<A NAME="IDX106"></A>
We do not believe that all texts are of the same type.  Often information
about the general contents of a file will aid synthesis greatly.  For
example, in Latex files we do not want to hear "left brace, backslash e
m" before each emphasized word, nor do we necessarily want to hear
formatting commands.  Festival offers a basic method for specifying
customization rules depending on the <EM>mode</EM> of the text.  By mode
we are following the notion of modes in Emacs, and eventually will allow
customization at a similar level.

</P>
<P>
Modes are specified as the second argument to the function <CODE>tts</CODE>.
When using the Emacs interface to Festival the buffer mode is
automatically passed as the text mode.  If the mode is not supported a
warning message is printed and the raw text mode is used.

</P>
<P>
Our initial text mode implementation allows configuration both in C++
and in Scheme.  Obviously in C++ almost anything can be done but it is
not as easy to reconfigure without recompilation.  Here
we will discuss those modes which can be fully configured at 
run time.

</P>
<P>
A text mode may contain the following:
<DL COMPACT>

<DT><EM>filter</EM>
<DD>
A Unix shell program filter that processes the text file in some 
appropriate way.  For example for email it might remove uninteresting
headers and just output the subject, from line and the message body.
If not specified, an identity filter is used.
<DT><EM>init_func</EM>
<DD>
This (Scheme) function is called before any processing
is done.  It allows further set up of tokenization rules,
voices, etc.
<DT><EM>exit_func</EM>
<DD>
This (Scheme) function is called at the end of any processing,
allowing resetting of tokenization rules etc.
<DT><EM>analysis_mode</EM>
<DD>
If analysis mode is <CODE>xml</CODE> the file is read through the built in XML
parser <CODE>rxp</CODE>.  Alternatively if analysis mode is <CODE>xxml</CODE> the
filter should be an SGML normalising parser and the output is processed in
a way suitable for it.  Any other value is ignored.
</DL>
<P>
These mode specific parameters are specified in the a-list
held in <CODE>tts_text_modes</CODE>.

</P>
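<P>
Since <CODE>tts_text_modes</CODE> is an ordinary a-list, it can be
inspected with the standard list functions.  The following is a small
sketch: the name <CODE>show_text_mode</CODE> is hypothetical and not part
of Festival, and it assumes each entry has the form
<CODE>(NAME PARAMS)</CODE>, as in the email example in the next section.

<PRE>
;; Sketch: show_text_mode is a hypothetical helper, not part of Festival
(define (show_text_mode modename)
  "Print the parameters registered for MODENAME, if any."
  (let ((entry (assoc modename tts_text_modes)))
    (if entry
        (print (cadr entry))     ;; the parameter list of the mode
        (print "no such text mode defined"))))

(show_text_mode 'email)
</PRE>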
<P>
When using Festival in Emacs, the Emacs buffer mode is passed to
Festival as the text mode.

</P>
<P>
Note that the above mechanism is not really designed to be re-entrant;
this should be addressed in later versions.

</P>
<P>
<A NAME="IDX107"></A>
<A NAME="IDX108"></A>
Following the use of auto-selection of mode in Emacs, Festival can
auto-select the text mode based on the filename given when no explicit
mode is given.  The Lisp variable <CODE>auto-text-mode-alist</CODE> is a list
of dotted pairs of regular expression and mode name.  For example
to specify that the <CODE>email</CODE> mode is to be used for files ending
in <TT>`.email'</TT> we would add to the current <CODE>auto-text-mode-alist</CODE>
as follows

<PRE>
(set! auto-text-mode-alist
      (cons (cons "\\.email$" 'email) 
            auto-text-mode-alist))
</PRE>

<P>
If the function <CODE>tts</CODE> is called with a mode other than <CODE>nil</CODE>,
that mode overrides any specified by the <CODE>auto-text-mode-alist</CODE>.
The mode <CODE>fundamental</CODE> is the explicit "null" mode; it is used
when no mode is specified in the function <CODE>tts</CODE> and no match
is found in <CODE>auto-text-mode-alist</CODE>, or when the specified mode
is not found.

</P>
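<P>
The two calls below sketch these rules.  The file name is only a
placeholder, and the second call assumes, as described above, that naming
<CODE>fundamental</CODE> explicitly takes precedence over whatever
<CODE>auto-text-mode-alist</CODE> would have selected.

<PRE>
;; No explicit mode: the mode is chosen through auto-text-mode-alist
(tts "example.email" nil)

;; Explicit mode: overrides auto-text-mode-alist, so the file is
;; read without any mode specific processing
(tts "example.email" 'fundamental)
</PRE>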
<P>
By convention, if a requested text mode is not found in
<CODE>tts_text_modes</CODE> the file <TT>`MODENAME-mode'</TT> will be
<CODE>required</CODE>.  Therefore if you have the file
<TT>`MODENAME-mode.scm'</TT> in your library then it will be automatically
loaded on reference.  Modes may be quite large and it is not necessary
to have Festival load them all at start up time.

</P>
<P>
Because of the <CODE>auto-text-mode-alist</CODE> and the auto loading
of currently undefined text modes you can use Festival like

<PRE>
festival --tts example.email
</PRE>

<P>
Festival will automatically synthesize <TT>`example.email'</TT> in text
mode <CODE>email</CODE>.

</P>
<P>
<A NAME="IDX109"></A>
If you add your own personal text modes you should do the following.
Suppose you've written an HTML mode.  You have named it
<TT>`html-mode.scm'</TT> and put it in <TT>`/home/awb/lib/festival/'</TT>.  In
your <TT>`.festivalrc'</TT> first identify your personal Festival library
directory by adding it to <CODE>lib-path</CODE>.

<PRE>
(set! lib-path (cons "/home/awb/lib/festival/" lib-path))
</PRE>

<P>
Then add the definition to the <CODE>auto-text-mode-alist</CODE>
that file names ending <TT>`.html'</TT> or <TT>`.htm'</TT> should
be read in HTML mode.

<PRE>
(set! auto-text-mode-alist
      (cons (cons "\\.html?$" 'html) 
            auto-text-mode-alist))
</PRE>

<P>
Then you may synthesize an HTML file either from Scheme

<PRE>
(tts "example.html" nil)
</PRE>

<P>
Or from the shell command line

<PRE>
festival --tts example.html
</PRE>

<P>
Anyone familiar with modes in Emacs should recognise that the process of
adding a new text mode to Festival is very similar to adding a new
buffer mode to Emacs.

</P>
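<P>
For orientation, a personal mode file such as <TT>`html-mode.scm'</TT>
might be laid out roughly as follows.  This is a skeleton only: the
function names <CODE>html_init_func</CODE> and <CODE>html_exit_func</CODE>
are hypothetical, their bodies are left empty, and no filter is given so
the identity filter applies.  The closing <CODE>provide</CODE> is included
on the assumption that the file is loaded through <CODE>require</CODE>, as
described above.

<PRE>
;; html-mode.scm: skeleton of a personal text mode (sketch only)

(define (html_init_func)
  "Called on starting html text mode."
  nil)   ;; set up tokenization rules, voices etc. here

(define (html_exit_func)
  "Called on exit html text mode."
  nil)   ;; restore anything changed by html_init_func

;; Register the mode alongside any existing ones
(set! tts_text_modes
   (cons
    (list
      'html   ;; mode name
      (list   ;; html mode params
       (list 'init_func html_init_func)
       (list 'exit_func html_exit_func)))
    tts_text_modes))

(provide 'html-mode)
</PRE>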


<H2><A NAME="SEC30" HREF="festival_toc.html#TOC30">9.3  Example text mode</A></H2>

<P>
<A NAME="IDX110"></A>
Here is a short example of a tts mode for reading email messages.  It
is by no means complete but is a start at showing how you can customize
tts modes without writing new C++ code.

</P>
<P>
The first task is to define a filter that will take a saved mail
message and remove extraneous headers and just leave the from
line, subject and body of the message.  The filter program
is given a file name as its first argument and should output the
result on standard out.  For our purposes we will do this as
a shell script.

<PRE>
#!/bin/sh
#  Email filter for Festival tts mode
#  usage: email_filter mail_message &#62;tidied_mail_message
grep "^From: " "$1"
echo
grep "^Subject: " "$1"
echo
# delete up to first blank line (i.e. the header)
sed '1,/^$/ d' "$1"
</PRE>

<P>
Next we define the email init function, which will be called 
when we start this mode.  What we will do is save the current
token to words function and slot in our own new one.  We can
then restore the previous one when we exit.

<PRE>
(define (email_init_func)
 "Called on starting email text mode."
 (set! email_previous_t2w_func token_to_words)
 (set! english_token_to_words email_token_to_words)
 (set! token_to_words email_token_to_words))
</PRE>

<P>
Note that <EM>both</EM> <CODE>english_token_to_words</CODE> and
<CODE>token_to_words</CODE> should be set to ensure that our new
token to word function is still used when we change voices.

</P>
<P>
The corresponding end function puts the token to words function 
back.

<PRE>
(define (email_exit_func)
 "Called on exit email text mode."
 (set! english_token_to_words email_previous_t2w_func)
 (set! token_to_words email_previous_t2w_func))
</PRE>

<P>
Now we can define the email specific token to words function.  In this
example we deal with two specific cases.  The first is the common
form of email addresses, which we handle so that the angle brackets are
not pronounced.  The second is to recognise quoted text and immediately
change the speaker to the alternative speaker.

<PRE>
(define (email_token_to_words token name)
  "Email specific token to word rules."
  (cond
</PRE>

<P>
This first condition identifies the token as a bracketed email address,
removes the brackets and splits the token into the user name
and the host address (the parts before and after the <CODE>@</CODE>).
Note that we call the function <CODE>email_previous_t2w_func</CODE> on
each part so that they will be pronounced properly.  Because that
function returns a <EM>list</EM> of words we need to append the results
together.

<PRE>
   ((string-matches name "&#60;.*@.*&#62;")
     (append
      (email_previous_t2w_func token
       (string-after (string-before name "@") "&#60;"))
      (cons 
       "at"
       (email_previous_t2w_func token
        (string-before (string-after name "@") "&#62;")))))
</PRE>
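
<P>
To see what the two nested calls extract, they can be tried by hand in
the interpreter.  The variable <CODE>addr</CODE> and the address in it are
made up purely for illustration.

<PRE>
(set! addr "&#60;user@example.org&#62;")

;; text before the "@" with the leading "&#60;" dropped: "user"
(string-after (string-before addr "@") "&#60;")

;; text after the "@" with the trailing "&#62;" dropped: "example.org"
(string-before (string-after addr "@") "&#62;")
</PRE>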

<P>
Our next condition deals with identifying a greater than sign being used
as a quote marker.  When we detect this we select the alternative
speaker, even though it may already be selected.  We then return no
words so the quote marker is not spoken.  The following condition finds
greater than signs which are the first token on a line.

<PRE>
   ((and (string-matches name "&#62;")
         (string-matches (item.feat token "whitespace") 
                         "[ \t\n]*\n *"))
    (voice_don_diphone)
    nil ;; return nothing to say
   )
</PRE>

<P>
If it doesn't match any of these we can go ahead and use the builtin
token to words function.  Actually, we call the function that was set
before we entered this mode to ensure any other specific rules
still remain.  But before that we need to check if we've had a newline
which doesn't start with a greater than sign.  In that case we
switch back to the primary speaker.

<PRE>
   (t  ;; for all other cases
     (if (string-matches (item.feat token "whitespace") 
                         ".*\n[ \t\n]*")
         (voice_rab_diphone))
     (email_previous_t2w_func token name))))
</PRE>

<P>
<A NAME="IDX111"></A>
In addition to these we have to actually declare the text mode.
This we do by adding to any existing modes as follows.

<PRE>
(set! tts_text_modes
   (cons
    (list
      'email   ;; mode name
      (list         ;; email mode params
       (list 'init_func email_init_func)
       (list 'exit_func email_exit_func)
       '(filter "email_filter")))
    tts_text_modes))
</PRE>

<P>
This will now allow simple email messages to be dealt with in a mode
specific way.  

</P>
<P>
An example mail message is included in <TT>`examples/ex1.email'</TT>.  To
hear the result of the above text mode start Festival, load
in the email mode descriptions,  and call TTS on the example file.

<PRE>
(tts ".../examples/ex1.email" 'email)
</PRE>

<P>
The above falls well short of a real email mode but does illustrate
how one might go about building one.  It should be reiterated
that text modes are new in Festival and their most effective form
has not been discovered yet.  This will improve with time
and experience.

</P>
</BODY>
</HTML>