This is festival.info, produced by Makeinfo version 3.12h from
festival.texi.

   This file documents the `Festival' Speech Synthesis System, a general
text to speech system for making your computer talk and developing new
synthesis techniques.

   Copyright (C) 1996-2001 University of Edinburgh

   Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that
the entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the authors.


File: festival.info,  Node: Building models,  Prev: Extracting features,  Up: Building models from databases

Building models
===============

   This section describes how to build models from data extracted from
databases as described in the previous section.  It uses the CART
building program, `wagon', which is available in the speech tools
distribution.  The data is also suitable for many other types of model
building techniques, such as linear regression or neural networks.

   Wagon is described in the speech tools manual, though we will cover
simple use here.  To use Wagon you need a datafile and a data
description file.

   A datafile consists of a number of vectors, one per line, each
containing the same number of fields.  This, not coincidentally, is
exactly the format produced by `dumpfeats', described in the previous
section.  The data description file describes the fields in the datafile
and their range.  Fields may be of any of the following types: class (a
list of symbols), floats, or ignored.  Wagon will build a classification
tree if the first field (the predictee) is of type class, or a
regression tree if the first field is a float.  An example data
description file would be
     (
     ( duration float )
     ( name # @ @@ a aa ai au b ch d dh e e@ ei f g h i i@ ii jh k l m n
         ng o oi oo ou p r s sh t th u u@ uh uu v w y z zh )
     ( n.name # @ @@ a aa ai au b ch d dh e e@ ei f g h i i@ ii jh k l m n
         ng o oi oo ou p r s sh t th u u@ uh uu v w y z zh )
     ( p.name # @ @@ a aa ai au b ch d dh e e@ ei f g h i i@ ii jh k l m n
         ng o oi oo ou p r s sh t th u u@ uh uu v w y z zh )
     ( R:SylStructure.parent.position_type 0 final initial mid single )
     ( pos_in_syl float )
     ( syl_initial 0 1 )
     ( syl_final 0 1 )
     ( R:SylStructure.parent.R:Syllable.p.syl_break 0 1 3 )
     ( R:SylStructure.parent.syl_break 0 1 3 4 )
     ( R:SylStructure.parent.R:Syllable.n.syl_break 0 1 3 4 )
     ( R:SylStructure.parent.R:Syllable.p.stress 0 1 )
     ( R:SylStructure.parent.stress 0 1 )
     ( R:SylStructure.parent.R:Syllable.n.stress 0 1 )
     )
   The script `speech_tools/bin/make_wagon_desc' goes some way to
helping.  Given a datafile and a file containing the field names, it
will construct an approximation of the description file.  This file
should still be edited, as `make_wagon_desc' treats all fields as type
class and you may want to change some of them to float.
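
   Since every vector must have the same number of fields as the
description file lists, it can be worth checking a datafile before
building a tree.  The following is a minimal standalone sketch of such
a check.  It is not part of Festival or the speech tools; the names
`count_fields' and `first_bad_line' are hypothetical, and it assumes
fields are separated by whitespace.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Count the whitespace-separated fields in one datafile line.
// (Assumption: fields never contain embedded whitespace.)
std::size_t count_fields(const std::string &line)
{
    std::istringstream in(line);
    std::string field;
    std::size_t n = 0;
    while (in >> field)
        ++n;
    return n;
}

// Return the 1-based number of the first line whose field count differs
// from `expected', or 0 if every line matches.
std::size_t first_bad_line(const std::vector<std::string> &lines,
                           std::size_t expected)
{
    assert(expected > 0);  // a description file lists at least one field
    for (std::size_t i = 0; i < lines.size(); ++i)
        if (count_fields(lines[i]) != expected)
            return i + 1;
    return 0;
}
```

   Here `expected' would be the number of entries in the description
file, for example 14 for the description shown above.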

   The data file must be a single file, although we created a number of
feature files by the process described in the previous section.  From a
list of file ids select, say, 80% of them, as training data and cat them
into a single datafile.  The remaining 20% may be catted together as
test data.
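
   The split itself can be sketched as follows.  This is only an
illustration under stated assumptions: the function `split_file_ids'
and the `IdList' type are hypothetical, the ids are taken in the order
given (no shuffling), and the training count is rounded down.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::string> IdList;

// Split a list of file ids into (training, test) lists.  The first
// `train_percent' percent of the ids, rounded down, become training
// data; the remainder become test data.
std::pair<IdList, IdList> split_file_ids(const IdList &ids,
                                         unsigned train_percent)
{
    assert(train_percent <= 100);
    // Integer arithmetic avoids floating point rounding surprises.
    std::size_t n_train = ids.size() * train_percent / 100;
    IdList train(ids.begin(), ids.begin() + n_train);
    IdList test(ids.begin() + n_train, ids.end());
    return std::make_pair(train, test);
}
```

   The feature files named by each list would then be concatenated
into the training and test datafiles respectively.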

   To build a tree use a command like
     wagon -desc DESCFILE -data TRAINFILE -test TESTFILE
   The minimum cluster size (default 50) may be reduced using the
command line option `-stop' plus a number.

   Varying the features and stop size may improve the results.

   Building the models and getting good figures is only one part of the
process.  You must integrate this model into Festival if it's going to
be of any use.  In the case of CART trees generated by Wagon, Festival
supports these directly.  In the case of CART trees predicting zscores,
or factors to modify duration averages, the trees can be used as is.

   Note there are other options to Wagon which may help build better
CART models.  Consult the chapter in the speech tools manual on Wagon
for more information.

   Other parts of the distributed system use CART trees, and linear
regression models that were trained using the processes described in
this chapter.  Some other parts of the distributed system use CART trees
which were written by hand and may be improved by properly applying
these processes.


File: festival.info,  Node: Programming,  Next: API,  Prev: Building models from databases,  Up: Top

Programming
***********

   This chapter covers aspects of programming within the Festival
environment, creating new modules, and modifying existing ones.  It
describes basic Classes available and gives some particular examples of
things you may wish to add.

* Menu:

* The source code::        A walkthrough of the source code
* Writing a new module::   Example access of an utterance


File: festival.info,  Node: The source code,  Next: Writing a new module,  Up: Programming

The source code
===============

   The ultimate authority on what happens in the system lies in the
source code itself.  No matter how hard we try, and how automatic we
make it, the source code will always be ahead of the documentation.
Thus if you are going to be using Festival in a serious way,
familiarity with the source is essential.

   The lowest level functions are catered for in the Edinburgh Speech
Tools, a separate library distributed with Festival.  The Edinburgh
Speech Tools Library offers the basic utterance structure, waveform file
access, and other various useful low-level functions which we share
between different speech systems in our work.  *Note Overview:
(speechtools)Top.

   The directory structure for the Festival distribution reflects the
conceptual split in the code.

`./bin/'
     The user-level executable binaries and scripts that are part of the
     festival system.  These are simple symbolic links to the binaries
     or, if the system is compiled with shared libraries, small
     wrap-around shell scripts that set `LD_LIBRARY_PATH' appropriately.

`./doc/'
     This contains the texinfo documentation for the whole system.  The
     `Makefile' constructs the info and/or html version as desired.
     Note that the `festival' binary itself is used to generate the
     lists of functions and variables used within the system, so must
     be compiled and in place to generate a new version of the
     documentation.

`./examples/'
     This contains various examples.  Some are explained within this
     manual, others are there just as examples.

`./lib/'
     The basic Scheme parts of the system, including `init.scm' the
     first file loaded by `festival' at start-up time.  Depending on
     your installation, this directory may also contain subdirectories
     containing lexicons, voices and databases.  This directory and its
     sub-directories are used by Festival at run-time.

`./lib/etc/'
     Executables for Festival's internal use.  A subdirectory containing
     at least the audio spooler will be automatically created (one for
     each different architecture the system is compiled on).  Scripts
     are added to this top level directory itself.

`./lib/voices/'
     By default this contains the voices used by Festival including
     their basic Scheme set up functions as well as the diphone
     databases.

`./lib/dicts/'
     This contains various lexicon files distributed as part of the
     system.

`./config/'
     This contains the basic `Makefile' configuration files for
     compiling the system (run-time configuration is handled by Scheme
     in the `lib/' directory).  The file `config/config' created as a
     copy of the standard `config/config-dist' is the installation
     specific configuration.  In most cases a simple copy of the
     distribution file will be sufficient.

`./src/'
     The main C++/C source for the system.

`./src/lib/'
     Where the `libFestival.a' is built.

`./src/include/'
     Where include files shared between various parts of the system
     live.  The file `festival.h' provides access to most of the parts
     of the system.

`./src/main/'
     Contains the top level C++ files for the actual executables.  This
     is the directory where the executable binary `festival' is created.

`./src/arch/'
     The main core of the Festival system.  At present everything is
     held in a single sub-directory `./src/arch/festival/'.  This
     contains the basic core of the synthesis system itself.  This
     directory contains lisp front ends to access the core utterance
     architecture and phonesets, basic tools like client/server
     support, ngram support, etc., and an audio spooler.

`./src/modules/'
     In contrast to the `arch/' directory this contains the non-core
     parts of the system.  A set of basic example modules is included
     with the standard distribution.  These are the parts that do the
     synthesis; the other parts are just there to make module writing
     easier.

`./src/modules/base/'
     This contains some basic simple modules that weren't quite big
     enough to deserve their own directory.  Most importantly it
     includes the `Initialize' module called by many synthesis methods
     which sets up an utterance structure and loads in initial values.
     This directory also contains phrasing, part of speech, and word
     (syllable and phone construction from words) modules.

`./src/modules/Lexicon/'
     This is not really a module in the true sense (the `Word' module
     is the main user of this).  This contains functions to construct,
     compile, and access lexicons (entries of words, part of speech and
     pronunciations).  This also contains a letter-to-sound rule system.

`./src/modules/Intonation/'
     This contains various intonation systems, from the very simple to
     quite complex parameter driven intonation systems.

`./src/modules/Duration/'
     This contains various duration prediction systems, from the very
     simple (fixed duration) to quite complex parameter driven duration
     systems.

`./src/modules/UniSyn/'
     A basic diphone synthesizer system, supporting a simple database
     format (which can be grouped into a more efficient binary
     representation).  It is multi-lingual, and allows multiple
     databases to be loaded at once.  It offers a choice of
     concatenation methods for diphones: residual excited LPC or PSOLA
     (TM) (which is not distributed).

`./src/modules/Text/'
     Various text analysis functions, particularly the tokenizer and
     utterance segmenter (from arbitrary files).  This directory also
     contains the support for text modes and SGML.

`./src/modules/donovan/'
     An LPC based diphone synthesizer.  Very small and neat.

`./src/modules/rxp/'
     The Festival/Scheme front end to an XML parser written by Richard
     Tobin from University of Edinburgh's Language Technology Group.
     rxp is now part of the speech tools rather than just Festival.

`./src/modules/parser'
     A simple interface to the Stochastic Context Free Grammar parser
     in the speech tools library.

`./src/modules/diphone'
     An optional module containing the previously used diphone synthesizer.

`./src/modules/clunits'
     A partial implementation of a cluster unit selection algorithm as
     described in `black97c'.

`./src/modules/Database rjc_synthesis'
     This consists of a new set of modules for doing waveform synthesis.
     They are intended to be unit size independent (e.g. diphone, phone,
     non-uniform unit).  Also selection, prosodic modification, joining
     and signal processing are separately defined.  Unfortunately this
     code has not really been exercised enough to be considered stable
     to be used in the default synthesis method, but those working on
     new synthesis techniques may be interested in integration using
     these new modules.  They may be updated before the next full
     release of Festival.

`./src/modules/*'
     Other optional directories may be contained here containing
     various research modules not yet part of the standard distribution.
     See below for descriptions of how to add modules to the basic
     system.

   One intended use of Festival is to offer a software system where new
modules may be easily tested in a stable environment.  We have tried to
make the addition of new modules easy, without requiring complex
modifications to the rest of the system.

   All of the basic modules should really be considered merely as
example modules.  Without much effort all of them could be improved.


File: festival.info,  Node: Writing a new module,  Prev: The source code,  Up: Programming

Writing a new module
====================

   This section gives a simple example of writing a new module, showing
the basic steps that must be done to create and add a new module that is
available for the rest of the system to use.  Note many things can be
done solely in Scheme now and really only low-level very intensive
things (like waveform synthesizers) need be coded in C++.

Example 1: adding new modules
-----------------------------

   The example here is a duration module which sets durations of phones
for a given list of averages.  To make this example more interesting,
all durations in accented syllables are increased by a factor of 1.5.
Note that this is just an example for the sake of one; this (and much
better techniques) could easily be done within the system as it is at
present using a hand-crafted CART tree.

   Our new module, called `Duration_Simple', can most easily be added
to the `./src/modules/Duration/' directory in a file `simdur.cc'.  You
can worry about the copyright notice, but after that you'll probably
need the following includes
     #include <festival.h>
   The module itself must be declared in a fixed form.  That is, it
receives a single LISP form (an utterance) as an argument and returns
that LISP form at the end.  Thus our definition will start
     LISP FT_Duration_Simple(LISP utt)
     {
   Next we need to declare an utterance structure and extract it from
the LISP form. We also make a few other variable declarations
         EST_Utterance *u = get_c_utt(utt);
         EST_Item *s;
         float end=0.0, dur;
         LISP ph_avgs,ldur;
   We cannot list the average durations for each phone in the source
code as we cannot tell which phoneset we are using (or what
modifications we want to make to durations between speakers).  Therefore
the phone and average duration information is held in a Scheme variable
for easy setting at run time.  To use the information in our C++ domain
we must get that value from the Scheme domain.  This is done with the
following statement.
         ph_avgs = siod_get_lval("phoneme_averages","no phoneme durations");
   The first argument to `siod_get_lval' is the Scheme name of a
variable which has been set to an assoc list of phone and average
duration before this module is called.  See the variable
`phone_durations' in `lib/mrpa_durs.scm' for the format.  The second
argument to `siod_get_lval' is an error message to be printed if the
variable `phoneme_averages' is not set.  If the second argument to
`siod_get_lval' is `NULL' then no error is given and if the variable is
unset this function simply returns the Scheme value `nil'.

   Now that we have the duration data we can go through each segment in
the utterance and add the duration.  The loop looks like
         for (s=u->relation("Segment")->head(); s != 0; s = next(s))
         {
   We can look up the average duration of the current segment name using
the function `siod_assoc_str'.  As arguments, it takes the segment name
`s->name()' and the assoc list of phones and duration.
             ldur = siod_assoc_str(s->name(),ph_avgs);
   Note the return value is actually a LISP pair (phone name and
duration), or `nil' if the phone isn't in the list.  Here we check if
the segment is in the list.  If it is not we print an error and set the
duration to 100 ms, if it is in the list the floating point number is
extracted from the LISP pair.
             if (ldur == NIL)
             {
                 cerr << "Phoneme: " << s->name() << " no duration "
                     << endl;
                 dur = 0.100;
             }
             else
                 dur = get_c_float(car(cdr(ldur)));
   If this phone is in an accented syllable we wish to increase its
duration by a factor of 1.5.  To find out if it is accented we use the
feature system to find the syllable this phone is part of and find out
if that syllable is accented.
             if (ffeature(s,"R:SylStructure.parent.accented") == 1)
                 dur *= 1.5;
   Now that we have the desired duration we increment the `end'
duration with our predicted duration for this segment and set the end
of the current segment.
             end += dur;
             s->fset("end",end);
         }
   Finally we return the utterance from the function.
         return utt;
     }
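
   The arithmetic performed by the loop above can be tried outside
Festival.  The following is a minimal standalone sketch, not the module
itself: the `Segment' struct and `assign_end_times' function are
hypothetical stand-ins for the `EST_Item' segments and the module, and
the Scheme assoc list is modelled here as a `std::map'.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Stand-in for a Segment item: a phone name, whether its syllable is
// accented, and the end time to be filled in.  (Hypothetical type; the
// real module works on EST_Item and the feature system.)
struct Segment {
    std::string name;
    bool accented;
    double end;
};

// Walk the segments in order, look up each average duration, fall back
// to 100 ms for unknown phones, lengthen accented ones by a factor of
// 1.5, and store the running total as each segment's end time.
void assign_end_times(std::vector<Segment> &segs,
                      const std::map<std::string, double> &averages)
{
    double end = 0.0;
    for (Segment &s : segs) {
        std::map<std::string, double>::const_iterator it =
            averages.find(s.name);
        double dur = (it == averages.end()) ? 0.100 : it->second;
        assert(dur > 0.0);  // averages are expected to be positive
        if (s.accented)
            dur *= 1.5;
        end += dur;
        s.end = end;
    }
}
```

   Note that `end' accumulates across the whole utterance, so each
segment's end time is the sum of all durations up to and including its
own.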
   Once a module is defined it must be declared to the system so it may
be called.  To do this one must call the function
`festival_def_utt_module' which takes a LISP name, the C++ function
name and a documentation string describing what the module does.  This
will automatically be available at run-time and added to the manual.
The call to this function should be added to the initialization function
in the directory you are adding the module to.  The function is called
`festival_DIRNAME_init()'.  If one doesn't exist you'll need to create
it.

   In `./src/modules/Duration/' the function `festival_Duration_init()'
is at the end of the file `dur_aux.cc'.  Thus we can add our new
module's declaration at the end of that function.  But first we must
declare the C++ function in that file.  Thus above that function we
would add
     LISP FT_Duration_Simple(LISP args);
   While at the end of the function `festival_Duration_init()' we would
add
        festival_def_utt_module("Duration_Simple",FT_Duration_Simple,
        "(Duration_Simple UTT)\n\
       Label all segments with average duration ... ");

   In order for our new file to be compiled we must add it to the
`Makefile' in that directory, to the `SRCS' variable.  Then when we
type `make' in `./src/' our new module will be properly linked in and
available for use.

   Of course we are not quite finished.  We still have to say when our
new duration module should be called.  When we set
        (Parameter.set 'Duration_Method Duration_Simple)
   for a voice, calls to the function `utt.synth' will use our new
duration module.

   Note in earlier versions of Festival it was necessary to modify the
duration calling function in `lib/duration.scm' but that is no longer
necessary.

Example 2: accessing the utterance
----------------------------------

   In this example we will make more direct use of the utterance
structure, showing the gory details of following relations in an
utterance.  This time we will create a module that will name all
syllables with a concatenation of the names of the segments they are
related to.

   As before we need the same standard includes
     #include "festival.h"
   Now the definition of the function
     LISP FT_Name_Syls(LISP utt)
     {
   As with the previous example we are called with an utterance LISP
object and will return the same.  The first task is to extract the
utterance object from the LISP object.
         EST_Utterance *u = get_c_utt(utt);
         EST_Item *syl,*seg;
   Now for each syllable in the utterance we want to find which segments
are related to it.
         for (syl=u->relation("Syllable")->head(); syl != 0; syl = next(syl))
         {
   Here we declare a variable to accumulate the names of the segments.
             EST_String sylname = "";
   Now we iterate through the `SylStructure' daughters of the syllable.
These will be the segments in that syllable.
             for (seg=daughter1(syl,"SylStructure"); seg; seg=next(seg))
                 sylname += seg->name();
   Finally we set the syllable's name to the concatenated name, and
loop to the next syllable.
             syl->set_name(sylname);
         }
   Finally we return the LISP form of the utterance.
         return utt;
     }

Example 3: adding new directories
---------------------------------

   In this example we will add a whole new subsystem.  This will often
be a common way for people to use Festival.  For example let us assume
we wish to add a formant waveform synthesizer (e.g. like that in the free
`rsynth' program).  In this case we will add a whole new sub-directory
to the modules directory.  Let us call it `rsynth/'.

   In the directory we need a `Makefile' of the standard form so we
should copy one from one of the other directories, e.g. `Intonation/'.
Standard methods are used to identify the source code files in a
`Makefile' so that the `.o' files are properly added to the library.
Following the other examples will ensure your code is integrated
properly.

   We'll just skip over the bit where you extract the information from
the utterance structure and synthesize the waveform (see
`donovan/donovan.cc' or `diphone/diphone.cc' for examples).

   To get Festival to use your new module you must tell it to compile
the directory's contents.  This is done in `festival/config/config'.
Add the line
     ALSO_INCLUDE += rsynth
   to the end of that file (there are similar ones mentioned).  Simply
adding the name of the directory here will add that as a new module and
the directory will be compiled.

   What you must provide in your code is a function
`festival_DIRNAME_init()' which will be called at initialization time.
In this function you should call any further initialization required and
define any new Lisp functions you wish to make available to the rest of
the system.  For example in the `rsynth' case we would define in some
file in `rsynth/'
     #include "festival.h"
     
     static LISP utt_rsynth(LISP utt)
     {
         EST_Utterance *u = get_c_utt(utt);
         // Do formant synthesis
         return utt;
     }
     
     void festival_rsynth_init()
     {
        proclaim_module("rsynth");
     
        festival_def_utt_module("Rsynth_Synth",utt_rsynth,
        "(Rsynth_Synth UTT)\n\
        A simple formant synthesizer");
     
        ...
     }
   Integration of the code in optional (and standard) directories is
done by automatically creating `src/modules/init_modules.cc' for the
list of standard directories plus those defined as `ALSO_INCLUDE'. A
call to a function called `festival_DIRNAME_init()' will be made.

   This mechanism is specifically designed so you can add modules to the
system without changing anything in the standard distribution.

Example 4: adding new LISP objects
----------------------------------

   This fourth example shows you how to add a new Object to Scheme and
add wraparounds to allow manipulation within the Scheme (and C++)
domain.

   Like example 3 we are assuming this is done in a new directory.
Suppose you have a new object called `Widget' that can transduce a
string into some other string (with some optional continuous parameter).
Thus, here we create a new file `widget.cc' like this

     #include "festival.h"
     #include "widget.h"  // definitions for the widget class
   In order to register the widgets as Lisp objects we actually need to
register them as `EST_Val''s as well.  Thus we now need
     VAL_REGISTER_CLASS(widget,Widget)
     SIOD_REGISTER_CLASS(widget,Widget)
   The first names given to these functions should be a short mnemonic
name for the object that will be used in the defining of a set of
access and construction functions.  It of course must be unique within
the whole system.  The second name is the name of the object itself.

   To understand its usage we can add a few simple widget manipulation
functions
     LISP widget_load(LISP filename)
     {
        EST_String fname = get_c_string(filename);
        Widget *w = new Widget;   // build a new widget
     
        if (w->load(fname) == 0)  // successful load
           return siod(w);
        else
        {
           cerr << "widget load: failed to load \"" << fname << "\"" << endl;
           festival_error();
        }
        return NIL;  // for compilers that get confused
     }
   Note that the function `siod' constructs a LISP object from a
`widget'; the class register macro defines that for you.  Also note
that when giving an object to a `LISP' object it then owns the object
and is responsible for deleting it when garbage collection occurs on
that `LISP' object.  Care should be taken that you don't put the same
object within different `LISP' objects.  The macro
`VAL_REGISTER_CLASS_NODEL' should be used if you do not want your
given object to be deleted by the LISP system (this may cause leaks).
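
   The ownership rule can be illustrated without Festival.  The
following is a toy sketch only: the `Cell' class is a hypothetical
stand-in for a `LISP' object holding a widget, its destructor stands in
for garbage collection, and the `Widget' here merely counts live
instances.  None of this is the real registration machinery.

```cpp
#include <cassert>

// Hypothetical stand-in for the Widget class; counts live instances.
struct Widget {
    static int live;
    Widget() { ++live; }
    ~Widget() { --live; }
};
int Widget::live = 0;

// Hypothetical stand-in for a LISP cell holding a Widget pointer.
// With owns == true the cell deletes the widget when it is destroyed,
// mirroring the default behaviour described above.  With owns == false
// it behaves like a no-delete registration and leaves deletion to the
// caller.
class Cell {
public:
    Cell(Widget *w, bool owns) : w_(w), owns_(owns) { assert(w != nullptr); }
    ~Cell() { if (owns_) delete w_; }
    // Copying would put one widget in two owning cells, which is
    // exactly the double-delete hazard the text warns about.
    Cell(const Cell &) = delete;
    Cell &operator=(const Cell &) = delete;
private:
    Widget *w_;
    bool owns_;
};
```

   An owning cell releases its widget when it goes away, whereas with
the no-delete form the caller must delete the widget itself or it will
leak.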

   If you want to refer to these functions in other files within your
modules you can use
     VAL_REGISTER_CLASS_DCLS(widget,Widget)
     SIOD_REGISTER_CLASS_DCLS(widget,Widget)
   in a common `.h' file.

   The following defines a function that takes a LISP object containing
a widget, applies some method and returns a string.
     LISP widget_apply(LISP lwidget, LISP string, LISP param)
     {
         Widget *w = widget(lwidget);
         EST_String s = get_c_string(string);
         float p = get_c_float(param);
         EST_String answer;
     
         answer = w->apply(s,p);
     
         return strintern(answer);
     }
   The function `widget', defined by the registration macros, takes a
`LISP' object and returns a pointer to the `widget' inside it.  If the
`LISP' object does not contain a `widget' an error will be thrown.

   Finally you wish to add these functions to the Lisp system
     void festival_widget_init()
     {
       init_subr_1("widget.load",widget_load,
         "(widget.load FILENAME)\n\
       Load in widget from FILENAME.");
       init_subr_3("widget.apply",widget_apply,
         "(widget.apply WIDGET INPUT VAL)\n\
       Returns widget applied to string INPUT with float VAL.");
     }

   In your `Makefile' for this directory you'll need to add the include
directory where `widget.h' is, if it is not contained within the
directory itself.  This is done through the make variable
`LOCAL_INCLUDES' as
     LOCAL_INCLUDES = -I/usr/local/widget/include
   And for the linker you'll need to identify where your widget library
is.  In your `festival/config/config' file at the end add
     COMPILERLIBS += -L/usr/local/widget/lib -lwidget


File: festival.info,  Node: API,  Next: Examples,  Prev: Programming,  Up: Top

API
***

   If you wish to use Festival within some other application there are
a number of possible interfaces.

* Menu:

* Scheme API::           Programming in Scheme
* Shell API::            From Unix shell
* Server/client API::    Festival as a speech synthesis server
* C/C++ API::            Through function calls from C++.
* C only API::           Small independent C client access
* Java and JSAPI::       Synthesizing from Java


File: festival.info,  Node: Scheme API,  Next: Shell API,  Up: API

Scheme API
==========

   Festival includes a full programming language, Scheme (a variant of
Lisp) as a powerful interface to its speech synthesis functions.  Often
this will be the easiest method of controlling Festival's
functionality.  Even when using other APIs they will ultimately depend
on the Scheme interpreter.

   Scheme commands (as s-expressions) may be simply written in files and
interpreted by Festival, either by specification as arguments on the
command line, in the interactive interpreter, or through standard input
as a pipe.  Suppose we have a file `hello.scm' containing

     ;; A short example file with Festival Scheme commands
     (voice_rab_diphone) ;; select Gordon
     (SayText "Hello there")
     (voice_don_diphone) ;; select Donovan
     (SayText "and hello from me")

   From the command interpreter we can execute the commands in this file
by loading them
     festival> (load "hello.scm")
     nil
   Or we can execute the commands in the file directly from the shell
command line
     unix$ festival -b hello.scm
   The `-b' option denotes batch operation meaning the file is loaded
and then Festival will exit, without starting the command interpreter.
Without this option `-b' Festival will load `hello.scm' and then accept
commands on standard input.  This can be convenient when some initial
set up is required for a session.

   Note one disadvantage of the batch method is that time is required
for Festival's initialisation every time it starts up.  Although this
will typically only be a few seconds, for saying short individual
expressions that lead-in time may be unacceptable.  Thus simply
executing the commands within an already running system is more
desirable, or using the server/client mode.

   Of course it's not just about strings of commands.  Because Scheme is
a fully functional language, functions, loops, variables, file access,
and arithmetic operations may all be carried out in your Scheme programs.
Also, access to Unix is available through the `system' function.  For
many applications directly programming them in Scheme is both the
easiest and the most efficient method.

   A number of example Festival scripts are included in `examples/',
including a program for saying the time, and one for telling you the
latest news (by accessing a page from the web).  Also see the detailed
discussion of a script example in *Note POS Example::.


File: festival.info,  Node: Shell API,  Next: Server/client API,  Prev: Scheme API,  Up: API

Shell API
=========

   The simplest use of Festival (though not the most powerful) is
simply using it to directly render text files as speech.  Suppose we
have a file `hello.txt' containing
     Hello world.  Isn't it excellent weather
     this morning.
   We can simply call Festival as
     unix$ festival --tts hello.txt
   Or for even simpler one-off phrases
     unix$ echo "hello " | festival --tts
   This is easy to use but you will need to wait for Festival to start
up and initialise its databases before it starts to render the text as
speech.  This may take several seconds on some machines.  A socket based
server mechanism is provided in Festival which will allow a single
server process to start up once and be used efficiently by multiple
client programs.

   Note also the use of Sable for marked up text, *note XML/SGML
mark-up::..  Sable allows various forms of additional information in
text, such as phrasing, emphasis, pronunciation, as well as changing
voices, and inclusion of external waveform files (i.e. random noises).
For many applications this will be the preferred interface method.
Other text modes too are available through the command line by using
`auto-text-mode-alist'.


File: festival.info,  Node: Server/client API,  Next: C/C++ API,  Prev: Shell API,  Up: API

Server/client API
=================

   Festival offers a BSD socket-based interface.  This allows Festival
to run as a server and allow client programs to access it.  Basically
the server offers a new command interpreter for each client that
attaches to it.  The server is forked for each client but this is much
faster than having to wait for a Festival process to start from
scratch.  Also the server can run on a bigger machine, offering much
faster synthesis.

   _Note: the Festival server is inherently insecure and may allow
arbitrary users access to your machine._

   Every effort has been made to minimise the risk of unauthorised
access through Festival and a number of levels of security are provided.
However, as with any program offering socket access, such as `httpd',
`sendmail' or `ftpd', there is a risk that unauthorised access is
possible.  I trust Festival's security enough to often run it on my own
machine and departmental servers, restricting access to within our
department.  Please read the information below before using the
Festival server so you understand the risks.

Server access control
---------------------

   The following access control is available for Festival when running
as a server.  When the server starts it will usually start by loading
in various commands specific for the task it is to be used for.  The
following variables are used to control access.
`server_port'
     A number identifying the inet socket port.  By default this is
     1314.  It may be changed as required.

`server_log_file'
     If nil, no logging takes place; if t, logging is printed to
     standard out; and if a file name, log messages are appended to
     that file.  All connections and attempted connections are logged
     with a time stamp and the name of the client.  All commands sent
     from the client are also logged (output and data input are not
     logged).

`server_deny_list'
     If non-nil it is used to identify which machines are not allowed
     access to the server.  This is a list of regular expressions.  If
     the host name of the client matches any of the regexes in this
     list the client is denied access.  This overrides all other access
     methods.  Remember that sometimes hosts are identified as numbers
     not as names.

`server_access_list'
     If this is non-nil only machines whose names match at least one of
     the regexes in this list may connect as clients.  Remember that
     sometimes hosts are identified as numbers not as names, so you
     should probably exclude the IP number of a machine as well as its
     name to be properly secure.

`server_passwd'
     If this is non-nil, the client must send this passwd to the server
     followed by a newline before access is given.  This is required
     even if the machine is included in the access list.  This is
     designed so servers for specific tasks may be set up with
     reasonable security.

`(set_server_safe_functions FUNCNAMELIST)'
     If called this can restrict which functions the client may call.
     This is the most restrictive form of access, and thoroughly
     recommended.  In this mode it would be normal to include only the
     specific functions the client can execute (i.e. the function to
     set up output, and a tts function).  For example a server could
     call the following at set up time, thus restricting calls to only
     those that `festival_client' `--ttw' uses.
          (set_server_safe_functions
                  '(tts_return_to_client tts_text tts_textall Parameter.set))

   It is strongly recommended that you run Festival in server mode as
userid `nobody' to limit the access the process will have.  Running it
in a chroot environment is more secure still.

   For example suppose we wish to allow access to all machines in the
CSTR domain except for `holmes.cstr.ed.ac.uk' and `adam.cstr.ed.ac.uk'.
This may be done by the following two commands
     (set! server_deny_list '("holmes\\.cstr\\.ed\\.ac\\.uk"
                              "adam\\.cstr\\.ed\\.ac\\.uk"))
     (set! server_access_list '("[^\\.]*\\.cstr\\.ed\\.ac\\.uk"))
   This is not complete, though, as when DNS is not working `holmes'
and `adam' will still be able to access the server (but if our DNS
isn't working we probably have more serious problems).  However the
above is secure in that only machines in the domain `cstr.ed.ac.uk' can
access the server, though there may be ways to make machines identify
themselves as being in that domain even when they are not.
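
   The way the deny list and the access list combine can be sketched
outside Festival.  The Python below is an illustration only, not
Festival's implementation: `host_allowed' is a hypothetical name, and
it assumes that a pattern must match the whole host name (hence
`re.fullmatch').  Festival's own regular expression syntax may differ
in detail from Python's, and the sketch does not model any built-in
default; pass an explicit access list to restrict access.

```python
import re


def host_allowed(host, deny_list=None, access_list=None):
    """Return True if HOST passes the deny list and the access list.

    Assumption: a pattern must match the whole host name, hence
    re.fullmatch; Festival's own matching may differ in detail.
    """
    # The deny list overrides all other access methods.
    if deny_list and any(re.fullmatch(p, host) for p in deny_list):
        return False
    # A non-empty access list admits only hosts matching a pattern.
    if access_list:
        return any(re.fullmatch(p, host) for p in access_list)
    return True


# The same patterns as the Scheme example; the doubled backslashes
# of the Scheme strings become single ones in Python raw strings.
DENY = [r"holmes\.cstr\.ed\.ac\.uk", r"adam\.cstr\.ed\.ac\.uk"]
ACCESS = [r"[^\.]*\.cstr\.ed\.ac\.uk"]
```

   With these lists `holmes.cstr.ed.ac.uk' is refused, any other
single-label host under `cstr.ed.ac.uk' is admitted, and a client that
is seen only as an IP number is refused because it matches nothing in
the access list.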

   By default Festival in server mode will only accept client
connections from `localhost'.

Client control
--------------

   An example client program called `festival_client' is included with
the system that provides a wide range of access methods to the server.
A number of options for the client are offered.

`--server'
     The name (or IP number) of the server host.  By default this is
     `localhost' (i.e. the same machine you run the client on).

`--port'
     The port number the Festival server is running on.  By default this
     is 1314.

`--output FILENAME'
     If a waveform is to be synchronously returned, it will be saved in
     FILENAME.   The `--ttw' option uses this as does the use of the
     Festival command `utt.send.wave.client'.  If an output waveform
     file is received by `festival_client' and no output file has been
     given the waveform is discarded with an error message.

`--passwd PASSWD'
     If a passwd is required by the server this should be stated on the
     client call.  PASSWD is sent, followed by a newline, before any
     other communication takes place.  If this isn't specified and a
     passwd is required, you must enter it first.  If the `--ttw'
     option is used, a passwd is required and none is specified, access
     will be denied.

`--prolog FILE'
     FILE is assumed to contain Festival commands and its contents
     are sent to the server after the passwd but before anything else.
     This is convenient to use in conjunction with `--ttw' which
     otherwise does not offer any way to send commands as well as the
     text to the server.

`--otype OUTPUTTYPE'
     If an output waveform file is to be used this specifies the output
     type of the file.  The default is `nist', but `ulaw', `riff' and
     others as supported by the Edinburgh Speech Tools Library are
     valid.  You may use raw too but note that Festival may return
     waveforms of various sampling rates depending on the sample rates
     of the databases it is using.  You can of course make Festival
     only return one particular sample rate, by using
     `after_synth_hooks'.  Note that the byte order will be the native
     order of the _client_ machine if the output format allows it.

`--ttw'
     Text to wave is an attempt to make `festival_client' useful in
     many simple applications.  Although you can connect to the server
     and send arbitrary Festival Scheme commands, this option
     automatically does what is probably what you want most often.
     When specified this option takes text from the specified file (or
     stdin), synthesizes it (in one go) and saves it in the specified
     output file.  It basically does the following
          (Parameter.set 'Wavefiletype '<output type>)
          (tts_textall "
          <file/stdin contents>
          ")
     Note that this is best used for small, single utterance texts as
     you have to wait for the whole text to be synthesized before it is
     returned.

`--aucommand COMMAND'
     Execute COMMAND on each waveform returned by the server.  The
     variable `FILE' will be set when COMMAND is executed.

`--async'
     To reduce the delay between the text being sent and the first
     sound being available to play, this option in conjunction with
     `--ttw' causes the text to be synthesized utterance by utterance
     and sent back as separate waveforms.  Using `--aucommand' each
     waveform may be played locally, and when `festival_client' is
     interrupted the sound will stop.  Getting the client to connect to
     an audio server elsewhere means the sound will not necessarily
     stop when the `festival_client' process is stopped.

`--withlisp'
     With each command sent to Festival a Lisp return value is sent
     back.  Lisp expressions may also be sent from the server to the
     client through the command `send_client'.  If this option is
     specified the Lisp expressions are printed to standard out,
     otherwise this information is discarded.

   A typical example use of `festival_client' is
     festival_client --async --ttw --aucommand 'na_play $FILE' fred.txt
   This will use `na_play' to play each waveform generated for the
utterances in `fred.txt'.  Note the _single_ quotes so that the `$' in
`$FILE' isn't expanded locally.
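
   When `festival_client' is launched from a program instead of a
shell, the options above can be assembled into an argument list.  The
Python below is a sketch under that assumption.  The function name
`festival_client_command' and its keyword names are hypothetical; only
the command line flags themselves come from the list above.

```python
def festival_client_command(text_file, server=None, port=None, output=None,
                            otype=None, passwd=None, prolog=None,
                            aucommand=None, use_async=False, ttw=True,
                            client="festival_client"):
    """Build an argument list for festival_client from the options above."""
    argv = [client]
    if server is not None:
        argv += ["--server", server]
    if port is not None:
        argv += ["--port", str(port)]
    if passwd is not None:
        argv += ["--passwd", passwd]
    if prolog is not None:
        argv += ["--prolog", prolog]
    if output is not None:
        argv += ["--output", output]
    if otype is not None:
        argv += ["--otype", otype]
    if use_async:
        argv.append("--async")
    if ttw:
        argv.append("--ttw")
    if aucommand is not None:
        # Kept as one argument, so $FILE reaches festival_client
        # unexpanded, just as the single quotes ensure in a shell.
        argv += ["--aucommand", aucommand]
    argv.append(text_file)
    return argv
```

   Calling it with `"fred.txt"', `use_async=True' and
`aucommand="na_play $FILE"' reproduces the typical command shown above.
Passing the resulting list to something like Python's `subprocess.run'
without a shell avoids the quoting issue altogether.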

   Note the server must be running before you can talk to it.  At
present Festival is not set up for automatic invocations through `inetd'
and `/etc/services'.  If you do that yourself, note that it is a
different type of interface as `inetd' assumes all communication goes
through standard in/out.

   Also note that each connection to the server starts a new session.
Variables are not persistent over multiple calls to the server so if any
initialization is required (e.g. loading of voices) it must be done
each time the client starts or more reasonably in the server when it is
started.

   A Perl festival client is also available in
`festival/examples/festival_client.pl'.

Server/client protocol
----------------------

   The client talks to the server using s-expressions (Lisp).  The
server will reply with a number of different chunks until either OK is
returned or ER (on error).  The communication is synchronous: each
client request can generate a number of waveform (WV) replies and/or
Lisp replies (LP) and is terminated with an OK (or ER).  Lisp is used
as it has its own inherent syntax that Festival can already parse.

   The following pseudo-code will help define the protocol as well as
show typical use

        fprintf(serverfd,"%s\n",s-expression);
        do
           ack = read three character acknowledgement
           if (ack == "WV\n")
              read a waveform
           else if (ack == "LP\n")
              read an s-expression
           else if (ack == "ER\n")
              an error occurred, break;
        while ack != "OK\n"
   The server can send a waveform in an utterance to the client through
the function `utt.send.wave.client'.  The server can send a Lisp
expression to the client through the function `send_client'.
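
   The acknowledgement loop can also be written as runnable code.  The
following Python sketch covers only the three-character codes described
above.  How a waveform or s-expression payload is delimited on the wire
is not covered here, so reading a payload is left to a caller-supplied
function; `read_replies' and `read_payload' are hypothetical names.

```python
def read_replies(stream, read_payload):
    """Collect the server's replies to one request until OK or ER.

    STREAM is a binary file-like object connected to the server.
    READ_PAYLOAD(stream, kind) is a caller-supplied (hypothetical)
    function that reads one waveform ("WV") or s-expression ("LP").
    Returns (status, chunks) where status is "OK" or "ER".
    """
    chunks = []
    while True:
        ack = stream.read(3)  # two letters plus a newline
        if ack == b"OK\n":
            return "OK", chunks
        if ack == b"ER\n":
            return "ER", chunks
        if ack in (b"WV\n", b"LP\n"):
            kind = ack[:2].decode("ascii")
            chunks.append((kind, read_payload(stream, kind)))
        else:
            # Includes an empty read, i.e. the server closed the link.
            raise IOError("unexpected acknowledgement: %r" % ack)
```

   A suitable stream can be obtained from a connected Python socket
with its `makefile("rwb")' method.  The request itself is the
s-expression followed by a newline, written to the same stream before
`read_replies' is called.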


File: festival.info,  Node: C/C++ API,  Next: C only API,  Prev: Server/client API,  Up: API

C/C++ API
=========

   As well as offering an interface through Scheme and the shell some
users may also wish to embed Festival within their own C++ programs.
A number of simple to use high level functions are available for such
uses.

   In order to use Festival you must include
`festival/src/include/festival.h' which in turn will include the
necessary other include files in `festival/src/include' and
`speech_tools/include'.  You should ensure these are included in the
include path for your program.  You will also need to link your
program with `festival/src/lib/libFestival.a',
`speech_tools/lib/libestools.a', `speech_tools/lib/libestbase.a' and
`speech_tools/lib/libeststring.a' as well as any other optional
libraries such as net audio.

   The main external functions available for C++ users of Festival are.
`void festival_initialize(int load_init_files,int heapsize);'
     This must be called before any other festival functions may be
     called.  It sets up the synthesizer system.  The first argument,
     if true, causes the system set up files to be loaded (which is
     normally what is necessary).  The second argument is the initial
     size of the Scheme heap; this should normally be 210000 unless you
     envisage processing very large Lisp structures.

`int festival_say_file(const EST_String &filename);'
     Say the contents of the given file.  Returns `TRUE' or `FALSE'
     depending on whether this was successful.

`int festival_say_text(const EST_String &text);'
     Say the contents of the given string.  Returns `TRUE' or `FALSE'
     depending on whether this was successful.

`int festival_load_file(const EST_String &filename);'
     Load the contents of the given file and evaluate its contents as
     Lisp commands.  Returns `TRUE' or `FALSE' depending on whether
     this was successful.

`int festival_eval_command(const EST_String &expr);'
     Read the given string as a Lisp command and evaluate it.  Returns
     `TRUE' or `FALSE' depending on whether this was successful.

`int festival_text_to_wave(const EST_String &text,EST_Wave &wave);'
     Synthesize the given string into the given wave.  Returns `TRUE' or
     `FALSE' depending on whether this was successful.

   Many other commands are also available but often the above will be
sufficient.

   Below is a simple top level program that uses the Festival functions
     #include "festival.h"
     
     int main(int argc, char **argv)
     {
         EST_Wave wave;
         int heap_size = 210000;  // default scheme heap size
         int load_init_files = 1; // we want the festival init files loaded
     
         festival_initialize(load_init_files,heap_size);
     
         // Say simple file
         festival_say_file("/etc/motd");
     
         festival_eval_command("(voice_ked_diphone)");
         // Say some text;
         festival_say_text("hello world");
     
         // Convert to a waveform
         festival_text_to_wave("hello world",wave);
         wave.save("/tmp/wave.wav","riff");
     
         // festival_say_file puts the system in async mode so we better
         // wait for the spooler to reach the last waveform before exiting
         // This isn't necessary if only festival_say_text is being used (and
         // your own wave playing stuff)
         festival_wait_for_spooler();
     
         return 0;
     }


File: festival.info,  Node: C only API,  Next: Java and JSAPI,  Prev: C/C++ API,  Up: API

C only API
==========

   A simpler C only interface example is given in
`festival/examples/festival_client.c'.  That interface talks to a
festival server.  The code does not require linking with any other EST
or Festival code so is much smaller and easier to include in other
programs.  The code is missing some functionality but not much,
considering how much smaller it is.


File: festival.info,  Node: Java and JSAPI,  Prev: C only API,  Up: API

Java and JSAPI
==============

   Initial support for talking to a Festival server from Java is
included from version 1.3.0 and initial JSAPI support is included from
1.4.0.  At present the JSAPI talks to a Festival server elsewhere
rather than as part of the Java process itself.

   A simple (pure) Java festival client is given in
`festival/src/modules/java/cstr/festival/Client.java' with a wraparound
script in `festival/bin/festival_client_java'.

   See the file `festival/src/modules/java/cstr/festival/jsapi/ReadMe'
for requirements and a small example of using the JSAPI interface.


File: festival.info,  Node: Examples,  Next: Problems,  Prev: API,  Up: Top

Examples
********

   This chapter contains some simple walkthrough examples of using
Festival in various ways, not just as a speech synthesizer.

* Menu:

* POS Example::          Using Festival as a part of speech tagger


File: festival.info,  Node: POS Example,  Up: Examples

POS Example
===========

   This example shows how we can use part of the standard synthesis
process to tokenize and tag a file of text.  This section does not cover
training and setting up a part of speech tag set (*Note POS tagging::),
only how to go about using the standard POS tagger on text.

   This example also shows how to use Festival as a simple scripting
language, and how to modify various methods used during text to speech.

   The file `examples/text2pos' contains an executable shell script
which will read arbitrary ascii text from standard input and produce
words and their part of speech (one per line) on standard output.

   A Festival script, like any other UNIX script, must start with the
characters `#!' followed by the name of the `festival' executable.
For scripts the option `-script' is also required.  Thus our first line
looks like
     #!/usr/local/bin/festival -script
   Note that the pathname may need to be different on your system.

   Following this we have copious comments, to keep our lawyers happy,
before we get into the real script.

   The basic idea we use is that the tts process segments text into
utterances, those utterances are then passed to a list of functions, as
defined by the Scheme variable `tts_hooks'.  Normally this variable
contains a list of two functions, `utt.synth' and `utt.play', which will
synthesize and play the resulting waveform.  In this case, instead, we
wish to predict the part of speech value, and then print it out.

   The first function we define basically replaces the normal synthesis
function `utt.synth'.  It runs the standard festival utterance modules
used in the synthesis process, up to the point where POS is predicted.
This function looks like
     (define (find-pos utt)
     "Main function for processing TTS utterances.  Predicts POS and
     prints words with their POS"
       (Token utt)
       (POS utt)
     )
   The normal text-to-speech process first tokenizes the text, splitting
it into "sentences".  The utterance type of these is `Token'.  Then we
call the `Token' utterance module, which converts the tokens to a
stream of words.  Then we call the `POS' module to predict part of
speech tags for each word.  Normally we would call other modules
ultimately generating a waveform but in this case we need no further
processing.

   The second function we define is one that will print out the words
and parts of speech
     (define (output-pos utt)
     "Output the word/pos for each word in utt"
      (mapcar
       (lambda (pair)
         (format t "%l/%l\n" (car pair) (car (cdr pair))))
       (utt.features utt 'Word '(name pos))))
   This uses the `utt.features' function to extract features from the
items in a named stream of an utterance.  In this case we want the
`name' and `pos' features for each item in the `Word' stream.  Then for
each pair we print out the word's name, a slash and its part of speech
followed by a newline.

   Our next job is to redefine the functions to be called during text
to speech.  The variable `tts_hooks' is defined in `lib/tts.scm'.  Here
we set it to our two newly-defined functions
     (set! tts_hooks (list find-pos output-pos))
   So that garbage collection messages do not appear on the screen we
turn them off with the following command
     (gc-status nil)
   The final stage is to start the tts process running on standard
input.  Because we have redefined what functions are to be run on the
utterances, it will no longer generate speech but just predict part of
speech and print it to standard output.
     (tts_file "-")
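
   A program that consumes the output of this script has to split each
line back into a word and its tag.  The Python below is a sketch of
that step, not part of Festival.  The function names are hypothetical,
and it assumes every output line has the form `word/pos' as produced by
`output-pos' above, with no slash inside the tag itself.

```python
def parse_pos_line(line):
    """Split one `word/pos` output line into a (word, pos) pair.

    The split is made at the last slash, on the assumption that a
    tag never contains a slash while a word might.
    """
    word, _, pos = line.rstrip("\n").rpartition("/")
    return word, pos


def parse_pos_output(text):
    """Parse the whole output of the script, skipping blank lines."""
    return [parse_pos_line(line)
            for line in text.splitlines() if line.strip()]
```

   Splitting at the last slash rather than the first keeps a token such
as `and/or' intact as the word.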