This is festival.info, produced by Makeinfo version 3.12h from
festival.texi.

   This file documents the `Festival' Speech Synthesis System, a general
text to speech system for making your computer talk and developing new
synthesis techniques.

   Copyright (C) 1996-2001 University of Edinburgh

   Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that
the entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the authors.


File: festival.info,  Node: Basic command line options,  Next: Simple command driven session,  Up: Quick start

Basic command line options
==========================

   Festival's basic calling method is as follows

     festival [options] file1 file2 ...

   Options may be any of the following

`-q'
     Start Festival without loading `init.scm' or the user's
     `.festivalrc'.

`-b'
`--batch'
     After processing any file arguments do not become interactive.

`-i'
`--interactive'
     After processing file arguments become interactive.  This option
     overrides any batch argument.

`--tts'
     Treat file arguments in text-to-speech mode, causing them to be
     rendered as speech rather than interpreted as commands.  When
     selected in interactive mode the command line edit functions are
     not available.

`--command'
     Treat file arguments in command mode.  This is the default.

`--language LANG'
     Set the default language to LANG.  Currently LANG may be one of
     `english', `spanish' or `welsh' (depending on what voices are
     actually available in your installation).

`--server'
     After loading any specified files go into server mode.  This is a
     mode where Festival waits for clients on a known port (the value
     of `server_port', default is 1314).  Connected clients may send
     commands (or text) to the server and expect waveforms back. *Note
     Server/client API::.  Note that server mode may be unsafe and may
     allow unauthorised access to your machine; be sure to read the
     security recommendations in *Note Server/client API::.

`--script scriptfile'
     Run scriptfile as a Festival script file.  This is similar to
     `--batch' but it encapsulates the command line arguments into the
     Scheme variables `argv' and `argc', so that Festival scripts may
     process their command line arguments just like any other program.
     It also does not load the basic initialisation files, as
     sometimes you may not want to do this.  If you want them loaded,
     you should copy the loading sequence from an example Festival
     script like `festival/examples/saytext'.

`--heap NUMBER'
     The Scheme heap (basic number of Lisp cells) is of a fixed size and
     cannot be dynamically increased at run time (this would complicate
     garbage collection).  The default size is 210000 which seems to be
     more than adequate for most work.  In some of our training
     experiments where very large list structures are required it is
     necessary to increase this.  Note there is a trade off between
     size of the heap and time it takes to garbage collect so making
     this unnecessarily big is not a good idea.  If you don't
     understand the above explanation you almost certainly don't need
     to use the option.

   In command mode, if the file name starts with a left parenthesis,
the name itself is read and evaluated as a Lisp command.  This is often
convenient when running in batch mode and a simple command is necessary
to start the whole thing off after loading in some other specific files.


File: festival.info,  Node: Simple command driven session,  Next: Getting some help,  Prev: Basic command line options,  Up: Quick start

Sample command driven session
=============================

   Here is a short session using Festival's command interpreter.

   Start Festival with no arguments
     $ festival
     Festival Speech Synthesis System 1.4.2:release July 2001
     Copyright (C) University of Edinburgh, 1996-2001. All rights reserved.
     For details type `(festival_warranty)'
     festival>

   Festival uses a command line editor based on editline for
terminal input so command line editing may be done with Emacs commands.
Festival also supports history as well as function, variable name, and
file name completion via the <TAB> key.

   Typing `help' will give you more information, that is `help' without
any parentheses.  (It is actually a variable name whose value is a
string containing help.)

   Festival offers what is called a read-eval-print loop, because it
reads an s-expression (atom or list), evaluates it and prints the
result.  As Festival includes the SIOD Scheme interpreter most standard
Scheme commands work
     festival> (car '(a d))
     a
     festival> (+ 34 52)
     86
   In addition to standard Scheme commands a number of commands
specific to speech synthesis are included.  Although, as we will see,
there are simpler methods for getting Festival to speak, here are the
basic underlying explicit functions used in synthesizing an utterance.

   Utterances can consist of various types (*Note Utterance types::),
but the simplest form is plain text.  We can create an utterance and
save it in a variable
     festival> (set! utt1 (Utterance Text "Hello world"))
     #<Utterance 1d08a0>
     festival>
   The (hex) number in the return value may be different for your
installation.  That is the print form for utterances.  Their internal
structure can be very large so only a token form is printed.

   Although this creates an utterance it doesn't do anything else.  To
get a waveform you must synthesize it.
     festival> (utt.synth utt1)
     #<Utterance 1d08a0>
     festival>
   This calls various modules, including tokenizing, duration,
intonation etc.  Which modules are called is defined with respect to
the type of the utterance, in this case `Text'.  It is possible to
individually call the modules by hand but you just wanted it to talk,
didn't you?  So
     festival> (utt.play utt1)
     #<Utterance 1d08a0>
     festival>
will send the synthesized waveform to your audio device.  You should
hear "Hello world" from your machine.

   To make this all easier a small function doing these three steps
exists.  `SayText' simply takes a string of text, synthesizes it and
sends it to the audio device.
     festival> (SayText "Good morning, welcome to Festival")
     #<Utterance 1d8fd0>
     festival>
   Of course as history and command line editing are supported <c-p> or
up-arrow will allow you to edit the above to whatever you wish.
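
   The three steps can also be written out by hand.  The following is a
minimal sketch of such a function; `my_say' is a hypothetical name, not
part of Festival, and the `eval' is there on the assumption that
`Utterance' does not evaluate its arguments, so the utterance form has
to be built as a list first.

```scheme
;; Hypothetical helper: build a Text utterance from a string held in a
;; variable, synthesize it, then play it.
(define (my_say text)
  (let ((utt (eval (list 'Utterance 'Text text))))
    (utt.synth utt)     ; run the synthesis modules for type Text
    (utt.play utt)))    ; send the waveform to the audio device

(my_say "Good morning, welcome to Festival")
```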

   Festival may also synthesize from files rather than simply text.
     festival> (tts "myfile" nil)
     nil
     festival>
   The end of file character <c-d> will exit from Festival and return
you to the shell.  Alternatively the command `quit' may be called (don't
forget the parentheses).

   Rather than starting the command interpreter, Festival may synthesize
files specified on the command line
     unix$ festival --tts myfile
     unix$

   Sometimes a simple waveform is required from text that is to be kept
and played at some later time.  The simplest way to do this with
festival is by using the `text2wave' program.  This is a festival
script that will take a file (or text from standard input) and produce
a single waveform.

   An example use is
     text2wave myfile.txt -o myfile.wav
   Options exist to specify the waveform file type, for example if Sun
audio format is required
     text2wave myfile.txt -otype snd -o myfile.wav
   Use `-h' on `text2wave' to see all options.


File: festival.info,  Node: Getting some help,  Prev: Simple command driven session,  Up: Quick start

Getting some help
=================

   If no audio is generated then you must check to see if audio is
properly initialized on your machine. *Note Audio output::.

   In the command interpreter <m-h> (meta-h) will give you help on the
current symbol before the cursor.  This will be a short description of
the function or variable, how to use it and what its arguments are.  A
listing of all such help strings appears at the end of this document.
<m-s> will synthesize and say the same information, but this extra
function is really just for show.

   The lisp function `manual' will send the appropriate command to an
already running Netscape browser process.  If `nil' is given as an
argument the browser will be directed to the table of contents of the
manual.  If a non-nil value is given it is assumed to be a section
title; that section is searched for and, if found, displayed.  For
example
     festival> (manual "Accessing an utterance")
   Another related function is `manual-sym' which given a symbol will
check its documentation string for a cross reference to a manual
section and request Netscape to display it.  This function is bound to
<m-m> and will display the appropriate section for the given symbol.

   Note also that the <TAB> key can be used to find out the names of
available commands, as can the function `Help' (remember the
parentheses).

   For more up to date information on Festival regularly check the
Festival Home Page at
     `http://www.cstr.ed.ac.uk/projects/festival.html'

   Further help is available by mailing questions to
     festival-help@cstr.ed.ac.uk
   Although we cannot guarantee the time required to answer you, we
will do our best to offer help.

   Bug reports should be submitted to
     festival-bug@cstr.ed.ac.uk

   If there is enough user traffic a general mailing list will be
created so all users may share comments and receive announcements.  In
the mean time watch the Festival Home Page for news.


File: festival.info,  Node: Scheme,  Next: TTS,  Prev: Quick start,  Up: Top

Scheme
******

   Many people seem daunted by the fact that Festival uses Scheme as its
scripting language and feel they can't use Festival because they don't
know Scheme.  However most of those same people use Emacs every day,
which also has (a much more complex) Lisp system underneath.  The number
of Scheme commands you actually need to know in Festival is really very
small and you can easily just find out as you go along.  Also people use
the Unix shell often but only know a small fraction of the actual
commands available in the shell (or even that there is a distinction
between shell builtin commands and user definable ones).  So take it
easy, you'll learn the commands you need fairly quickly.

* Menu:

* Scheme references::   Places to learn more about Scheme
* Scheme fundamentals:: Syntax and semantics
* Scheme Festival specifics::
* Scheme I/O::


File: festival.info,  Node: Scheme references,  Next: Scheme fundamentals,  Up: Scheme

Scheme references
=================

   If you wish to learn about Scheme in more detail I recommend the
book `abelson85'.

   The Emacs Lisp documentation is reasonable as it is comprehensive and
many of the underlying uses of Scheme in Festival were influenced by
Emacs.  Emacs Lisp however is not Scheme so there are some differences.

   Other Scheme tutorials and resources available on the Web are
   * The Revised Revised Revised Revised Scheme Report, the document
     defining the language, is available from
          `http://tinuviel.cs.wcu.edu/res/ldp/r4rs-html/r4rs_toc.html'

   * Scheme tutorials from the net:
        * `http://www.cs.uoregon.edu/classes/cis425/schemeTutorial.html'

   * the Scheme FAQ
        * `http://www.landfield.com/faqs/scheme-faq/part1/'


File: festival.info,  Node: Scheme fundamentals,  Next: Scheme Festival specifics,  Prev: Scheme references,  Up: Scheme

Scheme fundamentals
===================

   But you want more now, don't you, not just be referred to some other
book.  OK here goes.

   _Syntax_: an expression is an _atom_ or a _list_.  A list consists
of a left paren, a number of expressions and right paren.  Atoms can be
symbols, numbers, strings or other special types like functions, hash
tables, arrays, etc.

   _Semantics_:  All expressions can be evaluated.  Lists are evaluated
as function calls.  When evaluating a list all the members of the list
are evaluated first then the first item (a function) is called with the
remaining items in the list as arguments.  Atoms are evaluated
depending on their type: symbols are evaluated as variables returning
their values.  Numbers, strings, functions, etc. evaluate to themselves.

   Comments are started by a semicolon and run until end of line.

   And that's it.  There is nothing more to the language than that.
But just in case you can't follow the consequences of that, here are
some key examples.

     festival> (+ 2 3)
     5
     festival> (set! a 4)
     4
     festival> (* 3 a)
     12
     festival> (define (add a b) (+ a b))
     #<CLOSURE (a b) (+ a b)>
     festival> (add 3 4)
     7
     festival> (set! alist '(apples pears bananas))
     (apples pears bananas)
     festival> (car alist)
     apples
     festival> (cdr alist)
     (pears bananas)
     festival> (set! blist (cons 'oranges alist))
     (oranges apples pears bananas)
     festival> (append alist blist)
     (apples pears bananas oranges apples pears bananas)
     festival> (cons alist blist)
     ((apples pears bananas) oranges apples pears bananas)
     festival> (length alist)
     3
     festival> (length (append alist blist))
     7
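
   The difference between evaluating a symbol and quoting it, which
several of the examples above rely on, can be seen in a few more lines.
This is only an illustrative sketch; the values noted in the comments
are what the evaluation rules described above imply.

```scheme
;; A semicolon starts a comment that runs to the end of the line.
(set! a 4)        ; bind the symbol a to the number 4
a                 ; a symbol evaluates to its value: 4
'a                ; a quoted symbol is not evaluated: a
(list a 'a)       ; both arguments are evaluated first: (4 a)
(* (+ 1 2) a)     ; inner list first, then (* 3 4): 12
```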


File: festival.info,  Node: Scheme Festival specifics,  Next: Scheme I/O,  Prev: Scheme fundamentals,  Up: Scheme

Scheme Festival specifics
=========================

   There are a number of additions to SIOD that are Festival specific though
still part of the Lisp system rather than the synthesis functions per
se.

   By convention if the first statement of a function is a string, it
is treated as a documentation string.  The string will be printed when
help is requested for that function symbol.
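
   As a small sketch of this convention, the function below carries a
documentation string as its first statement.  `double' is a
hypothetical function used only for illustration.

```scheme
(define (double x)
  "(double X)
Return twice X.  A hypothetical function used only for illustration."
  (* 2 x))

(double 21)
```

   Requesting help on the symbol `double' should then show the string
rather than the function body.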

   In interactive mode if the function `:backtrace' is called (within
parentheses) the previous stack trace is displayed.  Calling
`:backtrace' with a numeric argument will display that particular stack
frame in full.  Note that any command other than `:backtrace' will
reset the trace.  You may optionally call
     (set_backtrace t)
which will cause a backtrace to be displayed whenever a Scheme error
occurs.  This can be put in your `.festivalrc' if you wish.  This is
especially useful when running Festival in non-interactive mode (batch
or script mode) so that more information is printed when an error
occurs.

   A _hook_ in Lisp terms is a position within some piece of code where
a user may specify their own customization.  The notion is used heavily
in Emacs.  In Festival there are a number of places where hooks are used.
A hook variable contains either a function or list of functions that
are to be applied at some point in the processing.  For example the
`after_synth_hooks' are applied after synthesis has been applied to
allow specific customization such as resampling or modification of the
gain of the synthesized waveform.  The Scheme function `apply_hooks'
takes a hook variable as argument and an object and applies the
function/list of functions in turn to the object.
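
   As a sketch of the gain modification mentioned above, a hook can be
installed as a list holding a single anonymous function.  The factor 2.0
is an arbitrary value chosen for illustration, and it is probably safest
for a hook to return the utterance it was given, as this one does
through `utt.wave.rescale'.

```scheme
;; Illustrative customization: rescale every synthesized waveform.
;; The factor 2.0 is an arbitrary example value.
(set! after_synth_hooks
      (list
       (lambda (utt)
         (utt.wave.rescale utt 2.0))))

;; The hooks can also be applied by hand to an already synthesized
;; utterance such as utt1:
;;   (apply_hooks after_synth_hooks utt1)
```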

   When an error occurs in either Scheme or within the C++ part of
Festival by default the system jumps to the top level, resets itself and
continues.  Note that errors are usually serious things, pointing to
bugs in parameters or code.  Every effort has been made to ensure that
the processing of text never causes errors in Festival.  However when
using Festival as a development system, errors in code often occur.

   Sometimes in writing Scheme code you know there is a potential for
an error but you wish to ignore that and continue on to the next thing
without exiting or stopping and returning to the top level.  For
example you are processing a number of utterances from a database and
some files containing the descriptions have errors in them but you want
your processing to continue through every utterance that can be
processed rather than stopping 5 minutes after you have gone home after
setting a big batch job for overnight.

   Festival's Scheme provides the function `unwind-protect' which
allows the catching of errors and then continuing normally.  For example
suppose you have the function `process_utt' which takes a filename and
does things which you know might cause an error.  You can write the
following to ensure you continue processing even if an error occurs.
     (unwind-protect
      (process_utt filename)
      (begin
        (format t "Error found in processing %s\n" filename)
        (format t "continuing\n")))
   The `unwind-protect' function takes two arguments.  The first is
evaluated and if no error occurs the value returned from that expression
is returned.  If an error does occur while evaluating the first
expression, the second expression is evaluated.  `unwind-protect' may
be used recursively.  Note that all files opened while evaluating the
first expression are closed if an error occurs.  All global variables
outside the scope of the `unwind-protect' will be left as they were set
up until the error.  Care should be taken in using this function but
its power is necessary to be able to write robust Scheme code.
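
   For the overnight batch job described earlier, the same form can be
wrapped around each file in turn.  The following is a sketch only:
`process_all' and the file names are hypothetical, and `process_utt' is
given a stand-in definition that simply loads a saved utterance, which
is assumed to signal an error when the file is missing or malformed.

```scheme
;; Stand-in for the `process_utt' of the text (an assumption).
(define (process_utt filename)
  (utt.load nil filename))

;; Process every file, reporting failures and carrying on.
(define (process_all filenames)
  (mapcar
   (lambda (filename)
     (unwind-protect
      (process_utt filename)
      (format t "Error found in processing %s, skipping\n" filename)))
   filenames))

(process_all '("utt001.utt" "utt002.utt" "utt003.utt"))
```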


File: festival.info,  Node: Scheme I/O,  Prev: Scheme Festival specifics,  Up: Scheme

Scheme I/O
==========

   Different Schemes may have quite different implementations of file
i/o functions so in this section we will describe the basic functions
in Festival SIOD regarding i/o.

   Simple printing to the screen may be achieved with the function
`print' which prints the given s-expression to the screen.  The printed
form is preceded by a new line.  This is often useful for debugging but
isn't really powerful enough for much else.

   Files may be opened and closed and referred to by file descriptors in
direct analogy to C's stdio library.  The SIOD functions `fopen' and
`fclose' work in exactly the same way as their equivalently named
partners in C.

   The `format' command follows the command of the same name in Emacs
and a number of other Lisps.  C programmers can think of it as
`fprintf'.  `format' takes a file descriptor, format string and
arguments to print.  The file descriptor may be one returned by the
Scheme function `fopen'; it may also be `t', which means the output
will be directed to standard out (cf. `printf').  A third possibility
is `nil', which will cause the output to be printed to a string which
is returned (cf. `sprintf').

   The format string closely follows the format strings in ANSI C, but
it is not the same.  Specifically the directives currently supported
are `%%', `%d', `%x', `%s', `%f', `%g' and `%c'.  All modifiers for
these are also supported.  In addition `%l' is provided for printing of
Scheme objects as objects.

   For example
     (format t "%03d %3.4f %s %l %l %l\n" 23 23 "abc" "abc" '(a b d) utt1)
   will produce
     023 23.0000 abc "abc" (a b d) #<Utterance 32f228>
   on standard output.
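
   The three kinds of destination described above can be sketched side
by side.  The file name `format_demo.txt' and the variable names are
hypothetical.

```scheme
;; 1. Standard out, like printf.
(format t "count is %d\n" 3)

;; 2. A string, like sprintf: nothing is printed, the text is returned.
(set! msg (format nil "count is %d" 3))

;; 3. A file opened with fopen, like fprintf.
(set! fd (fopen "format_demo.txt" "w"))
(format fd "count is %d\n" 3)
(fclose fd)
```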

   When large lisp expressions are printed they are difficult to read
because of the parentheses.  The function `pprintf' prints an
expression to a file descriptor (or `t' for standard out).  It prints
so the s-expression is nicely lined up and indented.  This is often
called pretty printing in Lisps.

   For reading input from terminal or file, there is currently no
equivalent to `scanf'.  Items may only be read as Scheme expressions.
The command
     (load FILENAME t)

   will load all s-expressions in `FILENAME' and return them,
unevaluated, as a list.  Without the second argument the `load' function
will load and evaluate each s-expression in the file.

   To read individual s-expressions use `readfp'.  For example
     (let ((fd (fopen trainfile "r"))
           (entry)
           (count 0))
         (while (not (equal? (set! entry (readfp fd)) (eof-val)))
          (if (string-equal (car entry) "home")
             (set! count (+ 1 count))))
         (fclose fd))

   To convert a symbol whose print name is a number to a number use
`parse-number'.  This is the equivalent to `atof' in C.

   Note that all i/o from Scheme input files is assumed to be
basically some form of Scheme data (though can be just numbers,
tokens).  For more elaborate analysis of incoming data it is possible
to use the text tokenization functions which offer a fully programmable
method of reading data.


File: festival.info,  Node: TTS,  Next: XML/SGML mark-up,  Prev: Scheme,  Up: Top

TTS
***

   Festival supports text to speech for raw text files.  If you are not
interested in using Festival in any other way except as a black box for
rendering text as speech, the following method is probably what you
want.
     festival --tts myfile
   This will say the contents of `myfile'.  Alternatively text may be
submitted on standard input
     echo hello world | festival --tts
     cat myfile | festival --tts

   Festival supports the notion of _text modes_ where the text file
type may be identified, allowing Festival to process the file in an
appropriate way.  Currently only two types are considered stable:
`STML' and `raw', but other types such as `email', `HTML', `Latex',
etc. are being developed and are discussed below.  This follows the idea of
buffer modes in Emacs where a file's type can be utilized to best
display the text.  Text mode may also be selected based on a filename's
extension.

   Within the command interpreter the function `tts' is used to render
files as speech; it takes a filename and the text mode as arguments.

* Menu:

* Utterance chunking::   From text to utterances
* Text modes::           Mode specific text analysis
* Example text mode::    An example mode for reading email


File: festival.info,  Node: Utterance chunking,  Next: Text modes,  Up: TTS

Utterance chunking
==================

   Text to speech works by first tokenizing the file and chunking the
tokens into utterances.  The definition of utterance breaks is
determined by the utterance tree in variable `eou_tree'.  A default
version is given in `lib/tts.scm'.  This uses a decision tree to
determine what signifies an utterance break.  Obviously blank lines are
probably the most reliable, followed by certain punctuation.  The
confusion of the use of periods for both sentence breaks and
abbreviations requires some more heuristics to best guess their
different use.  The following tree is currently used which works better
than simply using punctuation.
     (defvar eou_tree
     '((n.whitespace matches ".*\n.*\n\\(.\\|\n\\)*") ;; 2 or more newlines
       ((1))
       ((punc in ("?" ":" "!"))
        ((1))
        ((punc is ".")
         ;; This is to distinguish abbreviations vs periods
         ;; These are heuristics
         ((name matches "\\(.*\\..*\\|[A-Z][A-Za-z]?[A-Za-z]?\\|etc\\)")
          ((n.whitespace is " ")
           ((0))                  ;; if abbrev single space isn't enough for break
           ((n.name matches "[A-Z].*")
            ((1))
            ((0))))
          ((n.whitespace is " ")  ;; if it doesn't look like an abbreviation
           ((n.name matches "[A-Z].*")  ;; single space and non-cap is no break
            ((1))
            ((0)))
           ((1))))
         ((0)))))
   The token items this is applied to will always (except in the end of
file case) include one following token, so look ahead is possible.  The
"n." and "p." and "p.p." prefixes allow access to the surrounding token
context.  The features `name', `whitespace' and `punc' allow access to
the contents of the token itself.  At present there is no way to access
the lexicon from this tree which unfortunately might be useful if
certain abbreviations were identified as such there.

   Note these are heuristics and written by hand not trained from data,
though problems have been fixed as they have been observed in data.  The
above rules may make mistakes where abbreviations appear at end of
lines, and when improper spacing and capitalization is used.  This is
probably worth changing, for modes where more casual text appears, such
as email messages and USENET news messages.  A possible improvement
could be made by analysing a text to find out its basic threshold of
utterance break (i.e. if no sequences of full stop, two spaces and a
capitalized word appear and the text is of a reasonable length then
look for other criteria for utterance breaks).
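
   For comparison, the punctuation-only approach that the tree above
improves upon might be sketched as follows.  This is an illustration
built from the same question syntax, not a tree shipped with Festival.
`set!' is used rather than `defvar' on the assumption that `eou_tree'
already has a value by the time your code is loaded.

```scheme
;; Illustrative alternative: break on blank lines and on any of the
;; listed punctuation, with no abbreviation heuristics at all.
(set! eou_tree
 '((n.whitespace matches ".*\n.*\n\\(.\\|\n\\)*") ;; 2 or more newlines
   ((1))
   ((punc in ("?" ":" "!" "."))
    ((1))
    ((0)))))
```

   With such a tree an abbreviation like "etc." in the middle of a
sentence would end the utterance, which is the behaviour the heuristics
in the default tree are meant to avoid.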

   Ultimately what we are trying to do is to chunk the text into
utterances that can be synthesized quickly and start to play them
quickly to minimise the time someone has to wait for the first sound
when starting synthesis.  Thus it would be better if this chunking were
done on _prosodic phrases_ rather than chunks more similar to linguistic
sentences.  Prosodic phrases are bounded in size, while sentences are
not.


File: festival.info,  Node: Text modes,  Next: Example text mode,  Prev: Utterance chunking,  Up: TTS

Text modes
==========

   We do not believe that all texts are of the same type.  Often
information about the general contents of file will aid synthesis
greatly.  For example in Latex files we do not want to hear "left
brace, backslash e m" before each emphasized word, nor do we
necessarily want to hear formatting commands.  Festival offers a basic
method for specifying customization rules depending on the _mode_ of
the text.  By mode we are following the notion of modes in Emacs and
eventually will allow customization at a similar level.

   Modes are specified as the second argument to the function `tts'.
When using the Emacs interface to Festival the buffer mode is
automatically passed as the text mode.  If the mode is not supported a
warning message is printed and the raw text mode is used.

   Our initial text mode implementation allows configuration both in C++
and in Scheme.  Obviously in C++ almost anything can be done but it is
not as easy to reconfigure without recompilation.  Here we will discuss
those modes which can be fully configured at run time.

   A text mode may contain the following
_filter_
     A Unix shell program filter that processes the text file in some
     appropriate way.  For example for email it might remove
     uninteresting headers and just output the subject, from line and
     the message body.  If not specified, an identity filter is used.

_init_function_
     This (Scheme) function will be called before any processing will
     be done.  It allows further set up of tokenization rules and
     voices etc.

_exit_function_
     This (Scheme) function will be called at the end of any processing
     allowing resetting of tokenization rules etc.

_analysis_mode_
     If analysis mode is `xml' the file is read through the built in XML
     parser `rxp'.  Alternatively if analysis mode is `xxml' the filter
     should be an SGML normalising parser and the output is processed in
     a way suitable for it.  Any other value is ignored.

   These mode specific parameters are specified in the a-list held in
`tts_text_modes'.

   When using Festival in Emacs the Emacs buffer mode is passed to
Festival as the text mode.

   Note that the above mechanism is not really designed to be
re-entrant; this should be addressed in later versions.

   Following the use of auto-selection of mode in Emacs, Festival can
auto-select the text mode based on the filename given when no explicit
mode is given.  The Lisp variable `auto-text-mode-alist' is a list of
dotted pairs of regular expression and mode name.  For example to
specify that the `email' mode is to be used for files ending in
`.email' we would add to the current `auto-text-mode-alist' as follows
     (set! auto-text-mode-alist
           (cons (cons "\\.email$" 'email)
                 auto-text-mode-alist))
   If the function `tts' is called with a mode other than `nil' that
mode overrides any specified by the `auto-text-mode-alist'.  The mode
`fundamental' is the explicit "null" mode; it is used when no mode is
specified in the function `tts' and no match is found in
`auto-text-mode-alist', or when the specified mode is not found.

   By convention if a requested text mode is not found in
`tts_text_modes' the file `MODENAME-mode' will be `required'.
Therefore if you have the file `MODENAME-mode.scm' in your library then
it will be automatically loaded on reference.  Modes may be quite large
and it is not necessary to have Festival load them all at start up time.

   Because of the `auto-text-mode-alist' and the auto loading of
currently undefined text modes you can use Festival like
     festival --tts example.email
   Festival will automatically synthesize `example.email' in text mode
`email'.

   If you add your own personal text modes you should do the following.
Suppose you've written an HTML mode.  You have named it `html-mode.scm'
and put it in `/home/awb/lib/festival/'.  In your `.festivalrc' first
identify your personal Festival library directory by adding it to
`lib-path'.
     (set! lib-path (cons "/home/awb/lib/festival/" lib-path))
   Then add the definition to the `auto-text-mode-alist' that file
names ending `.html' or `.htm' should be read in HTML mode.
     (set! auto-text-mode-alist
           (cons (cons "\\.html?$" 'html)
                 auto-text-mode-alist))
   Then you may synthesize an HTML file either from Scheme
     (tts "example.html" nil)
Or from the shell command line
     festival --tts example.html
   Anyone familiar with modes in Emacs should recognise that the
process of adding a new text mode to Festival is very similar to adding
a new buffer mode to Emacs.


File: festival.info,  Node: Example text mode,  Prev: Text modes,  Up: TTS

Example text mode
=================

   Here is a short example of a tts mode for reading email messages.  It
is by no means complete but is a start at showing how you can customize
tts modes without writing new C++ code.

   The first task is to define a filter that will take a saved mail
message and remove extraneous headers and just leave the from line,
subject and body of the message.  The filter program is given a file
name as its first argument and should output the result on standard
out.  For our purposes we will do this as a shell script.
     #!/bin/sh
     #  Email filter for Festival tts mode
     #  usage: email_filter mail_message >tidied_mail_message
     grep "^From: " "$1"
     echo
     grep "^Subject: " "$1"
     echo
     # delete up to first blank line (i.e. the header)
     sed '1,/^$/ d' "$1"
   Next we define the email init function, which will be called when we
start this mode.  What we will do is save the current token to words
function and slot in our own new one.  We can then restore the previous
one when we exit.
     (define (email_init_func)
      "Called on starting email text mode."
      (set! email_previous_t2w_func token_to_words)
      (set! english_token_to_words email_token_to_words)
      (set! token_to_words email_token_to_words))
   Note that _both_ `english_token_to_words' and `token_to_words'
should be set to ensure that our new token to word function is still
used when we change voices.

   The corresponding end function puts the token to words function back.
     (define (email_exit_func)
      "Called on exit email text mode."
      (set! english_token_to_words email_previous_t2w_func)
      (set! token_to_words email_previous_t2w_func))
   Now we can define the email specific token to words function.  In
this example we deal with two specific cases.  First we deal with the
common form of email addresses so that the angle brackets are not
pronounced.  Second we recognise quoted text and immediately change
the speaker to the alternative speaker.
     (define (email_token_to_words token name)
       "Email specific token to word rules."
       (cond
   This first condition identifies the token as a bracketed email
address, removes the brackets and splits the token into the user name
and the host name.  Note that we call the previously installed function
`email_previous_t2w_func' on the user name and host name so that they
will be pronounced properly.  Because that function returns a _list_
of words we need to append the results together.
        ((string-matches name "<.*@.*>")
          (append
           (email_previous_t2w_func token
            (string-after (string-before name "@") "<"))
           (cons
            "at"
            (email_previous_t2w_func token
             (string-before (string-after name "@") ">")))))
   Our next condition deals with identifying a greater than sign being
used as a quote marker.  When we detect this we select the alternative
speaker, even though it may already be selected.  We then return no
words so the quote marker is not spoken.  The following condition finds
greater than signs which are the first token on a line.
        ((and (string-matches name ">")
              (string-matches (item.feat token "whitespace")
                              "[ \t\n]*\n *"))
         (voice_don_diphone)
         nil ;; return nothing to say
        )
   If it doesn't match any of these we can go ahead and use the builtin
token to words function.  Actually, we call the function that was set
before we entered this mode to ensure any other specific rules still
remain.  But before that we need to check if we've had a new line which
doesn't start with a greater than sign.  In that case we switch back to
the primary speaker.
        (t  ;; for all other cases
          (if (string-matches (item.feat token "whitespace")
                              ".*\n[ \t\n]*")
              (voice_rab_diphone))
          (email_previous_t2w_func token name))))
   In addition to these we have to actually declare the text mode.
This we do by adding to any existing modes as follows.
     (set! tts_text_modes
        (cons
         (list
           'email   ;; mode name
           (list         ;; email mode params
            (list 'init_func email_init_func)
            (list 'exit_func email_exit_func)
            '(filter "email_filter")))
         tts_text_modes))
   This will now allow simple email messages to be dealt with in a mode
specific way.

   An example mail message is included in `examples/ex1.email'.  To
hear the result of the above text mode start Festival, load in the
email mode descriptions, and call `tts' on the example file.
     (tts ".../examples/ex1.email" 'email)

   The above falls well short of a real email mode but does illustrate
how one might go about building one.  It should be reiterated that text
modes are new in Festival and their most effective form has not been
discovered yet.  This will improve with time and experience.


File: festival.info,  Node: XML/SGML mark-up,  Next: Emacs interface,  Prev: TTS,  Up: Top

XML/SGML mark-up
****************

   The idea of a general, synthesizer-independent mark-up language for
labelling text has been under discussion for some time.  Festival has
supported an SGML based markup language through multiple versions, most
recently STML (`sproat97').  This is based on the earlier SSML (Speech
Synthesis Markup Language) which was supported by previous versions of
Festival (`taylor96').  With this version of Festival we support
_Sable_, a similar mark-up language devised by a consortium from Bell
Labs, Sun Microsystems, AT&T and Edinburgh (`sable98').  Unlike the
previous versions which were SGML based, the implementation of Sable in
Festival is now XML based.  To the user the difference is negligible
but using XML makes processing of files easier and more standardized.
Also Festival now includes an XML parser, thus reducing the
dependencies in processing Sable text.

   Raw text has the problem that it cannot always easily be rendered as
speech in the way the author wishes.  Sable offers a well-defined way of
marking up text so that the synthesizer may render it appropriately.

   The definition of Sable is by no means settled and is still in
development.  In this release Festival offers people working on Sable
and other XML (and SGML) based markup languages a chance to quickly
experiment with prototypes by providing a DTD (document type
definition) and the mapping of the elements in the DTD to Festival
functions.  Although we have not yet (personally) investigated
facilities like cascading style sheets and generalized SGML
specification languages like DSSSL we believe the facilities offered by
Festival allow rapid prototyping of speech output markup languages.

   Primarily we see Sable markup text as a language that will be
generated by other programs, e.g. text generation systems, dialog
managers etc.  Therefore a standard, easy to parse format is required,
even if it seems overly verbose for human writers.

   For more information on Sable and access to the mailing list see
     `http://www.cstr.ed.ac.uk/projects/sable.html'

* Menu:

* Sable example::          an example of Sable with descriptions
* Supported Sable tags::   Currently supported Sable tags
* Adding Sable tags::      Adding new Sable tags
* XML/SGML requirements::  Software environment requirements for use
* Using Sable::            Rendering Sable files as speech


File: festival.info,  Node: Sable example,  Next: Supported Sable tags,  Up: XML/SGML mark-up

Sable example
=============

   Here is a simple example of Sable marked up text

     <?xml version="1.0"?>
     <!DOCTYPE SABLE PUBLIC "-//SABLE//DTD SABLE speech mark up//EN"
           "Sable.v0_2.dtd"
     []>
     <SABLE>
     <SPEAKER NAME="male1">
     
     The boy saw the girl in the park <BREAK/> with the telescope.
     The boy saw the girl <BREAK/> in the park with the telescope.
     
     Good morning <BREAK /> My name is Stuart, which is spelled
     <RATE SPEED="-40%">
     <SAYAS MODE="literal">stuart</SAYAS> </RATE>
     though some people pronounce it
     <PRON SUB="stoo art">stuart</PRON>.  My telephone number
     is <SAYAS MODE="literal">2787</SAYAS>.
     
     I used to work in <PRON SUB="Buckloo">Buccleuch</PRON> Place,
     but no one can pronounce that.
     
     By the way, my telephone number is actually
     <AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.2.au"/>
     <AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.7.au"/>
     <AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.8.au"/>
     <AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.7.au"/>.
     </SPEAKER>
     </SABLE>
   After the initial definition of the SABLE tags, through the file
`Sable.v0_2.dtd', which is distributed as part of Festival, the body is
given.  There are tags for identifying the language and the voice.
Explicit boundary markers may be given in text.  Also duration and
intonation control can be explicitly specified, as can new
pronunciations of words.  The last sentence specifies some external
audio files to play at that point.


File: festival.info,  Node: Supported Sable tags,  Next: Adding Sable tags,  Prev: Sable example,  Up: XML/SGML mark-up

Supported Sable tags
====================

   There is not yet a definitive set of tags but hopefully such a list
will form over the next few months.  As adding support for new tags is
often trivial the problem lies much more in defining what tags there
should be than in actually implementing them.  The following are based
on version 0.2 of Sable as described in
`http://www.cstr.ed.ac.uk/projects/sable_spec2.html', though some
aspects are not currently supported in this implementation.  Further
updates will be announced through the Sable mailing list.

`LANGUAGE'
     Allows the specification of the language through the `ID'
     attribute.  Valid values in Festival are `english', `en1',
     `spanish', `en', and others depending on your particular
     installation.  For example
          <LANGUAGE ID="english"> ... </LANGUAGE>
     If the language isn't supported by the particular installation of
     Festival "Some text in .." is said instead and the section is
     omitted.

`SPEAKER'
     Select a voice.  Accepts a parameter `NAME' which takes values
     `male1', `male2', `female1',  etc.  There is currently no
     definition of what happens when a voice is selected which the
     synthesizer doesn't support.  An example is
          <SPEAKER NAME="male1"> ... </SPEAKER>

`AUDIO'
     This allows the specification of an external waveform that is to
     be included.  There are attributes for specifying volume and
     whether the waveform is to be played in the background of the
     following text or not.  Festival as yet only supports insertion.
          My telephone number is
          <AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.2.au"/>
          <AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.7.au"/>
          <AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.8.au"/>
          <AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.7.au"/>.

`MARKER'
     This allows Festival to mark when a particular part of the text has
     been reached.  At present simply the value of the `MARK'
     attribute is printed.  This is done when that piece of text is
     analyzed, not when it is played.  To use this in any real
     application would require changes to this tag's implementation.
          Move the <MARKER MARK="mouse" /> mouse to the top.

`BREAK'
     Specifies a boundary at some `LEVEL'.  Strength may be values
     `Large', `Medium', `Small' or a number.  Note that this tag is an
     empty tag and must include the closing part within its own
     specification.
          <BREAK LEVEL="LARGE"/>

`DIV'
     This signals a division.  In Festival this causes an utterance
     break.  A `TYPE' attribute may be specified but it is ignored by
     Festival.

`PRON'
     Allows pronunciation of enclosed text to be explicitly given.  It
     supports the attributes `IPA' for an IPA specification (not
     currently supported by Festival); `SUB' text to be substituted
     which can be in some form of phonetic spelling, and `ORIGIN' where
     the linguistic origin of the enclosed text may be identified to
     assist in etymologically sensitive letter to sound rules.
          <PRON SUB="toe maa toe">tomato</PRON>

`SAYAS'
     Allows identification of the enclosed tokens/text.  The attribute
     `MODE' can take any of the following values: `literal', `date',
     `time', `phone', `net', `postal', `currency', `math', `fraction',
     `measure', `ordinal', `cardinal', or `name'.  Further specification
     of type for dates (MDY, DMY etc) may be specified through the
     `MODETYPE' attribute.
          As a test of marked-up numbers. Here we have
          a year <SAYAS MODE="date">1998</SAYAS>,
          an ordinal <SAYAS MODE="ordinal">1998</SAYAS>,
          a cardinal <SAYAS MODE="cardinal">1998</SAYAS>,
          a literal <SAYAS MODE="literal">1998</SAYAS>,
          and phone number <SAYAS MODE="phone">1998</SAYAS>.

`EMPH'
     Specifies that the enclosed text should be emphasized.  A `LEVEL'
     attribute may be specified but its value is currently ignored by
     Festival (besides, the emphasis Festival generates isn't very good
     anyway).
          The leaders of <EMPH>Denmark</EMPH> and <EMPH>India</EMPH> meet on
          Friday.

`PITCH'
     Allows the specification of pitch range, mid and base points.
          Without his penguin, <PITCH BASE="-20%"> which he left at home, </PITCH>
          he could not enter the restaurant.

`RATE'
     Allows the specification of speaking rate.
          The address is <RATE SPEED="-40%"> 10 Main Street </RATE>.

`VOLUME'
     Allows the specification of volume.  Note that in Festival this
     causes an utterance break before and after this tag.
          Please speak more <VOLUME LEVEL="loud">loudly</VOLUME>, except
          when I ask you to speak <VOLUME LEVEL="quiet">in a quiet voice</VOLUME>.

`ENGINE'
     This allows specification of engine specific commands.
          An example is <ENGINE ID="festival" DATA="our own festival speech
          synthesizer"> the festival speech synthesizer</ENGINE> or
          the Bell Labs speech synthesizer.

   These tags may change in name but they cover the aspects of speech
mark up that we wish to express.  Later additions and changes to these
are expected.

   See the files `festival/examples/example.sable' and
`festival/examples/example2.sable' for working examples.

   Note the definition of Sable is ongoing and there are likely to be
later, more complete implementations of Sable for Festival as
independent releases.  Consult
`http://www.cstr.ed.ac.uk/projects/sable.html' for the most recent
updates.


File: festival.info,  Node: Adding Sable tags,  Next: XML/SGML requirements,  Prev: Supported Sable tags,  Up: XML/SGML mark-up

Adding Sable tags
=================

   We do not yet claim that there is a fixed standard for Sable tags but
we wish to move towards such a standard.  In the meantime we have made
it easy in Festival to add support for new tags without, in general,
having to change any of the core functions.

   Two changes are necessary to add a new tag.  First, change the
definition in `lib/Sable.v0_2.dtd', so that Sable files may use it.
The second stage is to make Festival sensitive to that new tag.  The
example in `festival/lib/sable-mode.scm' shows how a new text mode may
be implemented for an XML/SGML-based markup language.  The basic point
is that an identified function will be called on finding a start tag or
end tag in the document.  It is the tag-function's job to synthesize
the given utterance if the tag signals an utterance boundary.  The
return value from the tag-function is the new status of the current
utterance.  This may be the utterance unchanged or, if the current
utterance has been synthesized, `nil', signalling that a new utterance
should be started.

   Note the hierarchical structure of the document is not available in
this method of tag-functions.  Any hierarchical state that must be
preserved has to be kept using explicit stacks in Scheme.  This is an
artifact of the crossing relationship between utterances and tags
(utterances may end within start and end tags), and of the desire to
have all specification in Scheme rather than C++.

   The tag-functions are defined in an elements list.  They are
identified with names such as "(SABLE" and ")SABLE" denoting start and
end tags respectively.  Two arguments are passed to these tag functions:
an assoc list of attributes and values as specified in the document, and
the current utterance.  If the tag denotes an utterance break, call
`xxml_synth' on `UTT' and return `nil'.  If a tag (start or end) is
found in the document and there is no corresponding tag-function it is
ignored.

   New features may be added to words with a start and end tag by
adding features to the global `xxml_word_features'.  Any features in
that variable will be added to each word.

   Note that this method may be used for both XML based languages and
SGML based markup languages (though an external normalizing SGML
parser is required in the SGML case).  The type (XML vs SGML) is
identified by the `analysis_type' parameter in the tts text mode
specification.


File: festival.info,  Node: XML/SGML requirements,  Next: Using Sable,  Prev: Adding Sable tags,  Up: XML/SGML mark-up

XML/SGML requirements
=====================

   Festival is distributed with `rxp', an XML parser developed by
Richard Tobin of the Language Technology Group, University of
Edinburgh.  Sable is set up as an XML text mode so no further
external programs are required to synthesize from Sable marked up text
(unlike previous releases).  Note that `rxp' is not a fully validating
parser and hence doesn't check some aspects of the file (tags within
tags).

   Festival still supports SGML based markup but in such cases requires
an external SGML normalizing parser.  We have tested `nsgmls-1.0',
which is part of the SGML tools set `sp-1.1.tar.gz' available from
`http://www.jclark.com/sp/index.html'.  This seems portable between
many platforms.


File: festival.info,  Node: Using Sable,  Prev: XML/SGML requirements,  Up: XML/SGML mark-up

Using Sable
===========

   Support in Festival for Sable is as a text mode.  In the command
mode use the following to process a Sable file
     (tts "file.sable" 'sable)

   Also the automatic selection of mode based on file type has been set
up such that files ending `.sable' will be automatically synthesized in
this mode.  Thus
     festival --tts fred.sable
   will render `fred.sable' as speech in Sable mode.

   Another way of using Sable is through the Emacs interface.  The
say-buffer command will send the Emacs buffer mode to Festival as its
tts-mode.  If the Emacs mode is stml or sgml the file is treated as a
Sable file.  *Note Emacs interface::

   Many people experimenting with Sable (and TTS in general) want all
the waveform output to be saved to be played at a later date.  The
simplest way to do this is using the `text2wave' script.  It respects
the auto mode selection so
     text2wave fred.sable -o fred.wav
   renders the file as a single waveform (done by concatenating the
waveforms for each utterance in the Sable file).
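   As a minimal sketch of this workflow, the script below writes a
small Sable file and renders it with `text2wave' when that script is
installed.  The file names, the working directory and the sample text
are assumptions made for illustration; the tags and the `text2wave'
call are the ones described in this chapter.  The `.sable' extension is
kept so that the automatic mode selection can apply.

```shell
#!/bin/sh
# Write a minimal Sable document; the text and file names are invented.
workdir=$(mktemp -d)
sable_file="$workdir/hello.sable"
cat > "$sable_file" <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE SABLE PUBLIC "-//SABLE//DTD SABLE speech mark up//EN"
      "Sable.v0_2.dtd"
[]>
<SABLE>
<SPEAKER NAME="male1">
Good morning <BREAK/> this is a
<RATE SPEED="-40%"> slower </RATE> test.
</SPEAKER>
</SABLE>
EOF

# Render to a single waveform only if text2wave is on the PATH.
if command -v text2wave >/dev/null 2>&1; then
    text2wave "$sable_file" -o "$workdir/hello.wav" ||
        echo "text2wave failed; check the Festival installation" >&2
else
    echo "text2wave not found; wrote $sable_file only"
fi
```

   The working directory is deliberately left in place so that the
Sable file, and the waveform if one was produced, can be inspected
afterwards.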

   If you wish the waveform for each utterance in a file to be saved
you can cause the tts process to save the waveforms during synthesis.
After a call to
     festival> (save_waves_during_tts)
   any future call to `tts' will cause the waveforms to be saved in a
file `tts_file_xxx.wav' where `xxx' is a number.  A call to
`(save_waves_during_tts_STOP)' will stop saving the waves.  A message
is printed when each waveform is saved; otherwise people forget about
this and wonder why their disk has filled up.

   This is done by inserting a function in `tts_hooks' which saves the
wave.  To do other things to each utterance during TTS (such as saving
the utterance structure), try redefining the function `save_tts_output'
(see `festival/lib/tts.scm').