This is festival.info, produced by Makeinfo version 3.12h from
festival.texi.

   This file documents the `Festival' Speech Synthesis System, a general
text to speech system for making your computer talk and developing new
synthesis techniques.

   Copyright (C) 1996-2001 University of Edinburgh

   Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the authors.

File: festival.info, Node: Top, Up: (dir)

This file documents the _Festival Speech Synthesis System_ 1.4.2.  This
document contains many gaps and is still in the process of being
written.
* Menu:

* Abstract::                    initial comments
* Copying::                     How you can copy and share the code
* Acknowledgements::            List of contributors
* What is new::                 Enhancements since last public release
* Overview::                    Generalities and Philosophy
* Installation::                Compilation and Installation
* Quick start::                 Just tell me what to type
* Scheme::                      A quick introduction to Festival's scripting language

Text methods for interfacing to Festival

* TTS::                         Text to speech modes
* XML/SGML mark-up::            XML/SGML mark-up Language
* Emacs interface::             Using Festival within Emacs

Internal functions

* Phonesets::                   Defining and using phonesets
* Lexicons::                    Building and compiling Lexicons
* Utterances::                  Existing and defining new utterance types

Modules

* Text analysis::               Tokenizing text
* POS tagging::                 Part of speech tagging
* Phrase breaks::               Finding phrase breaks
* Intonation::                  Intonation modules
* Duration::                    Duration modules
* UniSyn synthesizer::          The UniSyn waveform synthesizer
* Diphone synthesizer::         Building and using diphone synthesizers
* Other synthesis methods::     other waveform synthesis methods
* Audio output::                Getting sound from Festival
* Voices::                      Adding new voices (and languages)
* Tools::                       CART, Ngrams etc
* Building models from databases::  Adding new modules and writing C++ code
* Programming::                 Programming in Festival (Lisp/C/C++)
* API::                         Using Festival in other programs
* Examples::                    Some simple (and not so simple) examples
* Problems::                    Reporting bugs.
* References::                  Other sources of information
* Feature functions::           List of builtin feature functions.
* Variable list::               Short descriptions of all variables
* Function list::               Short descriptions of all functions
* Index::                       Index of concepts.

File: festival.info, Node: Abstract, Next: Copying, Up: Top

Abstract
********

This document provides a user manual for the Festival Speech Synthesis
System, version 1.4.2.  Festival offers a general framework for
building speech synthesis systems, as well as including examples of
various modules.
As a whole it offers full text to speech through a number of APIs: from
shell level, through a Scheme command interpreter, as a C++ library,
and via an Emacs interface.  Festival is multi-lingual; we have
developed voices in many languages including English (UK and US),
Spanish and Welsh, though English is the most advanced.  The system is
written in C++, uses the Edinburgh Speech Tools for low level
architecture, and has a Scheme (SIOD) based command interpreter for
control.  Documentation is given in the FSF texinfo format, which can
generate a printed manual, info files and HTML.

The latest details and a full software distribution of the Festival
Speech Synthesis System are available through its home page, which may
be found at

   `http://www.cstr.ed.ac.uk/projects/festival.html'

File: festival.info, Node: Copying, Next: Acknowledgements, Prev: Abstract, Up: Top

Copying
*******

As we feel the core system has reached an acceptable level of maturity,
from 1.4.0 the basic system is released under a free licence, without
the commercial restrictions we imposed on early versions.  The basic
system has been placed under an X11 type licence which, as free
licences go, is pretty free.  No GPL code is included in Festival or
the speech tools themselves (though some auxiliary files are GPL'd,
e.g. the Emacs mode for Festival).  We have deliberately chosen a
licence that should be compatible with our commercial partners and our
free software users.

However, although the code is free, we still offer no warranties and no
maintenance.  We will continue to endeavour to fix bugs and answer
queries when we can, but are not in a position to guarantee it.  We
will consider maintenance contracts and consultancy if desired; please
contact us for details.

Also note that not all the voices and lexicons we distribute with
Festival are free.  In particular, the British English lexicon derived
from the Oxford Advanced Learners' Dictionary is free only for
non-commercial use (we will release an alternative soon).
Also the Spanish diphone voice we release is only free for
non-commercial use.

If you are using Festival or the speech tools in a commercial
environment, even though no licence is required, we would be grateful
if you let us know, as it helps justify ourselves to our various
sponsors.

The current copyright on the core system is:

     The Festival Speech Synthesis System: version 1.4.2
     Centre for Speech Technology Research
     University of Edinburgh, UK
     Copyright (c) 1996-2001
     All Rights Reserved.

     Permission is hereby granted, free of charge, to use and distribute
     this software and its documentation without restriction, including
     without limitation the rights to use, copy, modify, merge, publish,
     distribute, sublicense, and/or sell copies of this work, and to
     permit persons to whom this work is furnished to do so, subject to
     the following conditions:

      1. The code must retain the above copyright notice, this list of
         conditions and the following disclaimer.
      2. Any modifications must be clearly marked as such.
      3. Original authors' names are not deleted.
      4. The authors' names are not used to endorse or promote products
         derived from this software without specific prior written
         permission.

     THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK
     DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL
     IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT
     SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE
     FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
     WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
     AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
     OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
     SOFTWARE.

File: festival.info, Node: Acknowledgements, Next: What is new, Prev: Copying, Up: Top

Acknowledgements
****************

The code in this system was primarily written by Alan W Black, Paul
Taylor and Richard Caley.
Festival sits on top of the Edinburgh Speech Tools Library, and uses
much of its functionality.  Amy Isard wrote a synthesizer for her MSc
project in 1995, which first used the Edinburgh Speech Tools Library.
Although Festival doesn't contain any code from that system, her system
was used as a basic model.

Much of the design and philosophy of Festival has been built on the
experience both Paul and Alan gained from the development of various
previous synthesizers and software systems, especially CSTR's Osprey
and Polyglot systems `taylor91' and ATR's CHATR system `black94'.
However, it should be stated that Festival is fully developed at CSTR
and contains neither proprietary code nor ideas.

Festival contains a number of subsystems integrated from other sources
and we acknowledge those systems here.

SIOD
====

The Scheme interpreter (SIOD - Scheme In One Defun 3.0) was written by
George Carrett (gjc@mitech.com, gjc@paradigm.com) and offers a basic
small Scheme (Lisp) interpreter suitable for embedding in applications
such as Festival as a scripting language.  A number of changes and
improvements have been added in our development, but it still remains
essentially that basic system.  We are grateful to George and Paradigm
Associates Incorporated for providing such a useful and well-written
sub-system.

     Scheme In One Defun (SIOD)
     COPYRIGHT (c) 1988-1994 BY PARADIGM ASSOCIATES INCORPORATED,
     CAMBRIDGE, MASSACHUSETTS.  ALL RIGHTS RESERVED

     Permission to use, copy, modify, distribute and sell this software
     and its documentation for any purpose and without fee is hereby
     granted, provided that the above copyright notice appear in all
     copies and that both that copyright notice and this permission
     notice appear in supporting documentation, and that the name of
     Paradigm Associates Inc not be used in advertising or publicity
     pertaining to distribution of the software without specific,
     written prior permission.
     PARADIGM DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
     INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS,
     IN NO EVENT SHALL PARADIGM BE LIABLE FOR ANY SPECIAL, INDIRECT OR
     CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
     LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
     NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
     CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

editline
========

Because of conflicts between the copyright for GNU readline, for which
an optional interface was included in earlier versions, we have
replaced that interface with a complete command line editing system
based on `editline'.  `Editline' was posted to the USENET newsgroup
`comp.sources.misc' in 1992.  A number of modifications have been made
to make it more useful to us, but the original code (contained within
the standard speech tools distribution) and our modifications fall
under the following licence.

     Copyright 1992 Simmule Turner and Rich Salz.  All rights reserved.

     This software is not subject to any license of the American
     Telephone and Telegraph Company or of the Regents of the
     University of California.

     Permission is granted to anyone to use this software for any
     purpose on any computer system, and to alter it and redistribute
     it freely, subject to the following restrictions:

      1. The authors are not responsible for the consequences of use of
         this software, no matter how awful, even if they arise from
         flaws in it.
      2. The origin of this software must not be misrepresented, either
         by explicit claim or by omission.  Since few users ever read
         sources, credits must appear in the documentation.
      3. Altered versions must be plainly marked as such, and must not
         be misrepresented as being the original software.  Since few
         users ever read sources, credits must appear in the
         documentation.
      4. This notice may not be removed or altered.
Edinburgh Speech Tools Library
==============================

The Edinburgh Speech Tools Library lies at the core of Festival.
Although developed separately, much of the development of certain parts
of the Edinburgh Speech Tools has been directed by Festival's needs.
In turn those who have contributed to the Speech Tools make Festival a
more usable system.  *Note Acknowledgements:
(speechtools)Acknowledgements.

Online information about the Edinburgh Speech Tools library is
available through

   `http://www.cstr.ed.ac.uk/projects/speech_tools.html'

Others
======

Many others have provided actual code and support for Festival, for
which we are grateful.  Specifically:

   * Alistair Conkie: various low level code points and some design
     work, Spanish synthesis, the old diphone synthesis code.
   * Steve Isard: directorship and LPC diphone code, design of diphone
     schema.
   * EPSRC: who fund Alan Black and Paul Taylor.
   * Sun Microsystems Laboratories: for supporting the project and
     funding Richard.
   * AT&T Labs - Research: for supporting the project.
   * Paradigm Associates and George Carrett: for Scheme in one defun.
   * Mike Macon: improving the quality of the diphone synthesizer and
     LPC analysis.
   * Kurt Dusterhoff: Tilt intonation training and modelling.
   * Amy Isard: for her SSML project and related synthesizer.
   * Richard Tobin: for answering all those difficult questions, the
     socket code, and the XML parser.
   * Simmule Turner and Rich Salz: command line editor (editline).
   * Borja Etxebarria: help with the Spanish synthesis.
   * Briony Williams: Welsh synthesis.
   * Jacques H. de Villiers: `jacques@cse.ogi.edu' from CSLU at OGI,
     for the TCL interface, and other usability issues.
   * Kevin Lenzo: `lenzo@cs.cmu.edu' from CMU for the PERL interface.
   * Rob Clarke: for support under Linux.
   * Samuel Audet `guardia@cam.org': OS/2 support.
   * Mari Ostendorf: for providing access to the BU FM Radio corpus,
     from which some modules were trained.
   * Melvin Hunt: on whose work we based our residual LPC synthesis
     model.
   * Oxford Text Archive: for the computer users version of the Oxford
     Advanced Learners' Dictionary (redistributed with permission).
   * Reading University: for access to MARSEC, from which the phrase
     break model was trained.
   * LDC & Penn Tree Bank: from which the POS tagger was trained;
     redistribution of the models is with permission from the LDC.
   * Roger Burroughes and Kurt Dusterhoff: for letting us capture their
     voices.
   * ATR and Nick Campbell: for first getting Paul and Alan to work
     together and for the experience we gained.
   * FSF: for G++, make, ....
   * Center for Spoken Language Understanding: CSLU at OGI, particularly
     Ron Cole and Mike Macon, have acted as significant users for the
     system, giving significant feedback and allowing us to teach
     courses on Festival offering valuable real-use feedback.
   * Our beta testers: thanks to all the people who put up with previous
     versions of the system and reported bugs, both big and small.
     These comments are very important to the constant improvements in
     the system.  And thanks for your quick responses when I had
     specific requests.
   * And our users ...  Many people have downloaded earlier versions of
     the system.  Many have found problems with installation and use
     and have reported them to us.  Many of you have put up with
     multiple compilations trying to fix bugs remotely.  We thank you
     for putting up with us and are pleased you've taken the time to
     help us improve our system.  Many of you have come up with uses we
     hadn't thought of, which is always rewarding.  Even if you haven't
     actively responded, the fact that you use the system at all makes
     it worthwhile.

File: festival.info, Node: What is new, Next: Overview, Prev: Acknowledgements, Up: Top

What is new
***********

Compared to the previous major release (1.3.0, released August 1998),
1.4.0 is not functionally so different from its previous versions.
This release is primarily a consolidation release, fixing and tidying
up some of the lower level aspects of the system to allow better
modularity for some of our future planned modules.

   * Copyright change: the system is now free and has no commercial
     restriction.  Note that currently the US voices (ked and kal) are
     also now unrestricted.  The UK English voices depend on the Oxford
     Advanced Learners' Dictionary of Current English, which cannot be
     used for commercial purposes without permission from Oxford
     University Press.
   * Architecture tidy up: the interfaces to lower level parts of the
     system have been tidied up, deleting some of the older code that
     was supported for compatibility reasons.  There is now a much
     higher dependence on features, and easier (and safer) ways to
     register new objects as feature values and Scheme objects.  Scheme
     has been tidied up.  It is no longer "in one defun" but "in one
     directory".
   * New documentation system for speech tools: a new docbook based
     documentation system has been added to the speech tools.
     Festival's documentation will move over to this sometime soon too.
   * Initial JSAPI support: both JSAPI and JSML (somewhat similar to
     Sable) now have initial implementations.  They of course depend on
     Java support, which so far we have only (successfully)
     investigated under Solaris and Linux.
   * Generalization of statistical models: CART, ngrams, and WFSTs are
     now fully supported from Lisp and can be used with a generalized
     Viterbi function.  This makes adding quite complex statistical
     models easy without adding new C++.
   * Tilt intonation modelling: full support is now included for the
     Tilt intonation models, both training and use.
   * Documentation on Building New Voices in Festival: documentation,
     scripts etc.
     for building new voices and languages in the system, see
     `http://www.cstr.ed.ac.uk/projects/festival/docs/festvox/'

File: festival.info, Node: Overview, Next: Installation, Prev: What is new, Up: Top

Overview
********

Festival is designed as a speech synthesis system for at least three
levels of user.  First, those who simply want high quality speech from
arbitrary text with the minimum of effort.  Second, those who are
developing language systems and wish to include synthesis output.  In
this case, a certain amount of customization is desired, such as
different voices, specific phrasing, dialog types etc.  The third level
is in developing and testing new synthesis methods.

This manual is not designed as a tutorial on converting text to speech,
but as documentation of the processes and use of our system.  We do not
discuss the detailed algorithms involved in converting text to speech
or the relative merits of multiple methods, though we will often give
references to relevant papers when describing the use of each module.

For more general information about text to speech we recommend Dutoit's
`An introduction to Text-to-Speech Synthesis' `dutoit97'.  For more
detailed research issues in TTS see `sproat98' or `vansanten96'.

* Menu:

* Philosophy::                  Why we did it like it is
* Future::                      How much better it's going to get

File: festival.info, Node: Philosophy, Next: Future, Up: Overview

Philosophy
==========

One of the biggest problems in the development of speech synthesis, and
other areas of speech and language processing systems, is that there
are a lot of simple well-known techniques lying around which can help
you realise your goal.  But in order to improve some part of the whole
system it is necessary to have a whole system in which you can test and
improve your part.  Festival is intended as that whole system in which
you may simply work on your small part to improve the whole.
Without a system like Festival, before you could even start to test
your new module you would need to spend significant effort building a
whole system, or adapting an existing one.  Festival is specifically
designed to allow the addition of new modules, easily and efficiently,
so that development need not get bogged down in re-implementing the
wheel.

But there is another aspect of Festival which makes it more useful than
simply an environment for researching new synthesis techniques.  It is
a fully usable text-to-speech system, suitable for embedding in other
projects that require speech output.  The provision of a fully working
easy-to-use speech synthesizer, in addition to just a testing
environment, is good for two specific reasons.  First, it offers a
conduit for our research, in that our experiments can quickly and
directly benefit users of our synthesis system.  And secondly, in
ensuring we have a fully working usable system we can immediately see
what problems exist and where our research should be directed, rather
than where our whims take us.

These concepts are not unique to Festival.  ATR's CHATR system
(`black94') follows very much the same philosophy, and Festival
benefits from the experience gained in the development of that system.
Festival benefits from various pieces of previous work.  As well as
CHATR, CSTR's previous synthesizers, Osprey and the Polyglot projects,
influenced many design decisions.  Also we are influenced by more
general programs in considering software engineering issues, especially
GNU Octave and Emacs, on which the basic script model was based.

Unlike in some other speech and language systems, software engineering
is considered very important to the development of Festival.  Too often
research systems consist of random collections of hacky little scripts
and code.
No one person can confidently describe the algorithms such a system
performs, as parameters are scattered throughout the system, with
tricks and hacks making it impossible to really evaluate why the system
is good (or bad).  Such systems do not help the advancement of speech
technology, except perhaps in pointing at ideas that should be further
investigated.  If the algorithms and techniques cannot be described
externally from the program, such that they can be reimplemented by
others, what is the point of doing the work?

Festival offers a common framework where multiple techniques may be
implemented (by the same or different researchers) so that they may be
tested more fairly in the same environment.

As a final word, we'd like to make two short statements which both
achieve the same end, but unfortunately perhaps not for the same
reasons:

     Good software engineering makes good research easier

But the following seems to be true also:

     If you spend enough effort on something it can be shown to be
     better than its competitors.

File: festival.info, Node: Future, Prev: Philosophy, Up: Overview

Future
======

Festival is still very much in development.  Hopefully this state will
continue for a long time.  It is never possible to complete software;
there are always new things that can make it better.  However, as time
goes on Festival's core architecture will stabilise and few or no
changes will be made.  Other aspects of the system will gain greater
attention, such as waveform synthesis modules, intonation techniques,
text type dependent analysers etc.

Festival will improve, so don't expect it to be the same six months
from now.

A number of new modules and enhancements are already under
consideration at various stages of implementation.  The following is a
non-exhaustive list of what we may (or may not) add to Festival over
the next six months or so.

   * Selection-based synthesis: moving away from diphone technology to
     more generalized selection of units from a speech database.
   * New structure for linguistic content of utterances: using
     techniques from Metrical Phonology, we are building more
     structured representations of utterances, reflecting their
     linguistic significance better.  This will allow improvements in
     prosody and unit selection.
   * Non-prosodic prosodic control: for language generation systems and
     custom tasks where the speech to be synthesized is being generated
     by some program, more information about text structure will
     probably exist, such as phrasing, contrast, key items etc.  We are
     investigating the relationship of high-level tags to prosodic
     information through the Sole project
     `http://www.cstr.ed.ac.uk/projects/sole.html'
   * Dialect independent lexicons: currently for each new dialect we
     need a new lexicon.  We are investigating a form of lexical
     specification that is dialect independent and allows the core form
     to be mapped to different dialects.  This will make the generation
     of voices in different dialects much easier.

File: festival.info, Node: Installation, Next: Quick start, Prev: Overview, Up: Top

Installation
************

This section describes how to install Festival from source in a new
location and customize that installation.

* Menu:

* Requirements::                Software/Hardware requirements for Festival
* Configuration::               Setting up compilation
* Site initialization::         Settings for your particular site
* Checking an installation::    But does it work ...
* Y2K::                         Comment on Festival and year 2000

File: festival.info, Node: Requirements, Next: Configuration, Up: Installation

Requirements
============

In order to compile Festival you first need the following source
packages:

`festival-1.4.2.tar.gz'
     Festival Speech Synthesis System source.

`speech_tools-1.2.2.tar.gz'
     The Edinburgh Speech Tools Library.

`festlex_NAME.tar.gz'
     The lexicon distributions, which where possible include the
     lexicon input file as well as the compiled form, for your
     convenience.
     The lexicons have varying distribution policies, but are all free
     except OALD, which is only free for non-commercial use (we are
     working on a free replacement).  In some cases only a pointer to
     an ftp'able file plus a program to convert that file to the
     Festival format is included.

`festvox_NAME.tar.gz'
     You'll need a speech database.  A number are available (with
     varying distribution policies).  Each voice may have other
     dependencies, such as requiring particular lexicons.

`festdoc_1.4.2.tar.gz'
     Full postscript, info and html documentation for Festival and the
     Speech Tools.  The source of the documentation is available in the
     standard distributions, but for your convenience it has been
     pre-generated.

In addition to the Festival specific sources you will also need:

_A UNIX machine_
     Currently we have compiled and tested the system under Solaris
     (2.5(.1), 2.6, 2.7 and 2.8), SunOS (4.1.3), FreeBSD 3.x and 4.x,
     and Linux (Redhat 4.1, 5.0, 5.1, 5.2, 6.[012], 7.[01] and other
     Linux distributions), and it should work under OSF (Dec Alphas),
     SGI (Irix) and HPs (HPUX).  But any standard UNIX machine should
     be acceptable.  We have now successfully ported this version to
     Windows NT and Windows 95 (using the Cygnus GNU win32
     environment).  This is still a young port but seems to work.

_A C++ compiler_
     Note that C++ is not very portable, even between different
     versions of the compiler from the same vendor.  Although we've
     tried very hard to make the system portable, we know it is very
     unlikely to compile without change except with compilers that
     have already been tested.
     The currently tested systems are:

        * Sun Sparc Solaris 2.5, 2.5.1, 2.6, 2.7: GCC 2.7.2, egcs
          1.1.1, egcs 1.1.2, GCC 2.95.1
        * Sun Sparc SunOS 4.1.3: GCC 2.7.2
        * FreeBSD for Intel 3.x and 4.x: GCC 2.95.1, GCC 3.0
        * Linux for Intel (RedHat 4.1/5.0/5.1/5.2/6.0): GCC 2.7.2, GCC
          2.7.2/egcs-1.0.2, egcs 1.1.1, egcs-1.1.2, GCC 2.95.[123],
          GCC "2.96", GCC 3.0
        * Windows NT 4.0: GCC 2.7.2 plus egcs (from Cygnus GNU win32
          b19), Visual C++ PRO v5.0, Visual C++ v6.0

     Note that if GCC works on one version of Unix it usually works on
     others.  We have compiled both the speech tools and Festival under
     Windows NT 4.0 and Windows 95 using the GNU tools available from
     Cygnus, `ftp://ftp.cygnus.com/pub/gnu-win32/'.

_GNU make_
     Due to there being too many different `make' programs out there,
     we have tested the system using GNU make on all systems we use.
     Others may work but we know GNU make does.

_Audio hardware_
     You can use Festival without audio output hardware, but it doesn't
     sound very good (though admittedly you can hear fewer problems
     with it).  A number of audio systems are supported (directly
     inherited from the audio support in the Edinburgh Speech Tools
     Library): NCD's NAS (formerly called netaudio), a network
     transparent audio system (which can be found at
     `ftp://ftp.x.org/contrib/audio/nas/'); `/dev/audio' (at 8k ulaw
     and 8/16bit linear), found on Suns, Linux machines and FreeBSD;
     and a method allowing arbitrary UNIX commands.  *Note Audio
     output::.

Earlier versions of Festival mistakenly offered a command line editor
interface to the GNU package readline, but due to conflicts between the
GNU Public Licence and Festival's licence this interface was removed in
version 1.3.1.  Even Festival's new free licence would cause problems,
as readline support would restrict Festival linking with non-free code.
A new command line interface based on editline was provided that offers
similar functionality.  Editline remains a compilation option, as it is
probably not yet as portable as we would like it to be.
In addition to the above, in order to process the documentation you
will need `TeX', `dvips' (or similar), GNU's `makeinfo' (part of the
texinfo package) and `texi2html', which is available from
`http://wwwcn.cern.ch/dci/texi2html/'.  However, the document files are
also available pre-processed into postscript, DVI, info and html as
part of the distribution in `festdoc-1.4.X.tar.gz'.

Ensure you have a fully installed and working version of your C++
compiler.  Most of the problems people have had in installing Festival
have been due to incomplete or bad compiler installation.  If you don't
know whether anyone has used your C++ installation before, it is worth
checking that the following program compiles and runs.

     #include <iostream.h>
     int main (int argc, char **argv)
     {
         cout << "Hello world\n";
         return 0;
     }

Unpack all the source files in a new directory.  The directory will
then contain two subdirectories:

     speech_tools/
     festival/

File: festival.info, Node: Configuration, Next: Site initialization, Prev: Requirements, Up: Installation

Configuration
=============

First ensure you have a compiled version of the Edinburgh Speech Tools
Library.  See `speech_tools/INSTALL' for instructions.

The system now supports the standard GNU `configure' method for set up.
In most cases this will automatically configure festival for your
particular system.  In most cases you need only type

     gmake

and the system will configure itself and compile (note you need to have
compiled the Edinburgh Speech Tools `speech_tools-1.2.2' first).  In
some cases hand configuration is required.  All of the configuration
choices are held in the file `config/config'.

For the most part Festival configuration inherits the configuration
from your speech tools config file (`../speech_tools/config/config').
Additional optional modules may be added by adding them to the end of
your config file, e.g.
     ALSO_INCLUDE += clunits

Adding a new module here will treat it as a new directory in
`src/modules/' and compile it into the system, in the same way the
`OTHER_DIRS' feature was used in previous versions.

If the compilation directory is being accessed by NFS, or if you use an
automounter (e.g. amd), it is recommended to explicitly set the
variable `FESTIVAL_HOME' in `config/config'.  The command `pwd' is not
reliable when a directory may have multiple names.

There is a simple test suite with Festival, but it requires the three
basic voices and their respective lexicons to be installed before it
will work.  Thus you need to install

     festlex_CMU.tar.gz
     festlex_OALD.tar.gz
     festlex_POSLEX.tar.gz
     festvox_don.tar.gz
     festvox_kedlpc16k.tar.gz
     festvox_rablpc16k.tar.gz

If these are installed you can test the installation with

     gmake test

To simply make it run with a male US English voice it is sufficient to
install just

     festlex_CMU.tar.gz
     festlex_POSLEX.tar.gz
     festvox_kallpc16k.tar.gz

Note that the single most common reason for problems in compilation and
linking found amongst the beta testers was a bad installation of GNU
C++.  If you get many strange errors in G++ library header files or
link errors, it is worth checking that your system has the compiler,
header files and runtime libraries properly installed.  This may be
checked by compiling a simple program under C++ and also by finding out
whether anyone at your site has ever used the installation.  Most of
these installation problems are caused by upgrading to a newer version
of libg++ without removing the older version, so that a mixed version
of the `.h' files exists.

Although we have tried very hard to ensure that Festival compiles with
no warnings, this is not possible under some systems.  Under SunOS the
system include files do not declare a number of system provided
functions.  This is a bug in Sun's include files.  It will cause
warnings like "implicit definition of fprintf".  These are harmless.
Under Linux a warning at link time about reducing the size of some
symbols is often produced.  This is harmless.  There are also
occasional warnings about some socket system function having an
incorrect argument type; these too are harmless.

The speech tools and festival compile under Windows95 or Windows NT
with Visual C++ v5.0 using the Microsoft `nmake' make program.  We've
only done this with the Professional edition, but have no reason to
believe that it relies on anything not in the standard edition.

In accordance with VC++ conventions, object files are created with
extension .obj, executables with extension .exe and libraries with
extension .lib.  This may mean that both unix and Win32 versions can be
built in the same directory tree, but I wouldn't rely on it.

To do this you require nmake Makefiles for the system.  These can be
generated from the gnumake Makefiles, using the command

     gnumake VCMakefile

in the speech_tools and festival directories.  I have only done this
under unix; it's possible it would work under the cygnus gnuwin32
system.

If `make.depend' files exist (i.e. if you have done `gnumake depend' in
unix), equivalent `vc_make.depend' files will be created; if not, the
VCMakefiles will not contain dependency information for the `.cc'
files.  The result will be that you can compile the system once, but
changes will not cause the correct things to be rebuilt.

In order to compile from the DOS command line using Visual C++ you need
to have a collection of environment variables set.  In Windows NT there
is an installation option for Visual C++ which sets these globally.
Under Windows95, or if you don't ask for them to be set globally under
NT, you need to run

     vcvars32.bat

See the VC++ documentation for more details.

Once you have the source trees with VCMakefiles somewhere visible from
Windows, you need to copy `speech_tools\config\vc_config-dist' to
`speech_tools\config\vc_config' and edit it to suit your local
situation.
Then do the same with `festival\config\vc_config-dist'. The thing most likely to need changing is the definition of `FESTIVAL_HOME' in `festival\config\vc_config_make_rules', which needs to point to where you have put festival.

Now you can compile. cd to the speech_tools directory and do

     nmake /nologo /fVCMakefile

and the library, the programs in main and the test programs should be compiled. The tests can't be run automatically under Windows. A simple test to check that things are probably OK is:

     main\na_play testsuite\data\ch_wave.wav

which reads and plays a waveform. Next go into the festival directory and do

     nmake /nologo /fVCMakefile

to build festival. When it's finished, and assuming you have the voices and lexicons unpacked in the right place, festival should run just as under unix.

We should remind you that the NT/95 ports are still young and there may be problems that we've not found yet. We only recommend use of the speech tools and Festival under Windows if you have significant experience with C++ on those platforms.

Most of the modules in `src/modules' are actually optional and the system could be compiled without them. The basic set could be reduced further if certain facilities are not desired. Particularly: `donovan', which is only required if the donovan voice is used; `rxp', if no XML parsing is required (e.g. Sable); and `parser', if no stochastic parsing is required (this parser isn't used for any of our currently released voices). Actually even `UniSyn' and `UniSyn_diphone' could be removed if some external waveform synthesizer is being used (e.g. MBROLA) or some alternative one like `OGIresLPC'. Removing unused modules will make the festival binary smaller and (potentially) start up faster, but don't expect too much. You can delete these by changing the `BASE_DIRS' variable in `src/modules/Makefile'.
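As a sketch of such a change, a trimmed `src/modules/Makefile' might look like the fragment below. The exact set of directories listed in `BASE_DIRS' varies between releases, so the names here (other than those discussed above) are only placeholders:

```make
# src/modules/Makefile (sketch): `donovan', `rxp' and `parser'
# dropped from the build; keep whatever else your release lists.
BASE_DIRS = UniSyn UniSyn_diphone  # ... plus any other modules you keep
```

After editing, rebuild with `gmake' from the top level as usual.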
File: festival.info, Node: Site initialization, Next: Checking an installation, Prev: Configuration, Up: Installation

Site initialization
===================

Once compiled, Festival may be further customized for particular sites. At start up time Festival loads the file `init.scm' from its library directory. This file in turn loads other necessary files such as phoneset descriptions, duration parameters, intonation parameters, definitions of voices etc. It will also load the files `sitevars.scm' and `siteinit.scm' if they exist.

`sitevars.scm' is loaded after the basic Scheme library functions are loaded but before any of the festival related functions are loaded. This file is intended to set various path names before the various subsystems are loaded. Typically variables such as `lexdir' (the directory where the lexicons are held) and `voices_dir' (pointing to voice directories) should be reset here if necessary.

The default installation will try to find its lexicons and voices automatically based on the value of `load-path' (this is derived from `FESTIVAL_HOME' at compilation time or by using the `--libdir' option at run-time). If the voices and lexicons have been unpacked into subdirectories of the library directory (the default) then no site specific initialization of the above pathnames will be necessary.

The second site specific file is `siteinit.scm'. Typical examples of local initialization are as follows. The default audio output method is NCD's NAS system if that is supported, as that's what we normally use in CSTR. If it is not supported, any hardware specific mode is the default (e.g. sun16audio, freebsd16audio, linux16audio or mplayeraudio). But that default is just a setting in `init.scm'.
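Returning to `sitevars.scm' for a moment, a site that keeps its lexicons and voices outside the library directory might use something like the sketch below. The pathnames are purely illustrative, and whether `voices_dir' holds a single directory or a list may differ in your release; check `init.scm' for the form your installation expects:

```scheme
;; sitevars.scm -- loaded before the Festival subsystems
;; (illustrative pathnames; adjust to your installation)
(set! lexdir "/usr/local/share/festival/lib/dicts/")
(set! voices_dir (list "/usr/local/share/festival/lib/voices/"))
```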
If for example in your environment you wish the default audio output method to be 8k mulaw through `/dev/audio' you should add the following line to your `siteinit.scm' file

     (Parameter.set 'Audio_Method 'sunaudio)

Note the use of `Parameter.set' rather than `Parameter.def'; the second function will not reset the value if it is already set. Remember that you may use the audio methods `sun16audio', `linux16audio' or `freebsd16audio' only if `NATIVE_AUDIO' was selected in `speech_tools/config/config' and you are on such a machine. The Festival variable `*modules*' contains a list of all supported functions/modules in a particular installation, including audio support. Check the value of that variable if things aren't what you expect.

If you are installing on a machine whose audio is not directly supported by the speech tools library, an external command may be executed to play a waveform. The following example is for an imaginary machine that can play audio files through a program called `adplay' with arguments for sample rate and file type. When playing waveforms Festival, by default, outputs an unheadered waveform in native byte order. In this example you would set up the default audio playing mechanism in `siteinit.scm' as follows

     (Parameter.set 'Audio_Method 'Audio_Command)
     (Parameter.set 'Audio_Command "adplay -raw -r $SR $FILE")

For the `Audio_Command' method of playing waveforms Festival supports two additional audio parameters. `Audio_Required_Rate' allows you to use Festival's internal sample rate conversion function to convert to any desired rate. Note this may not be as good as playing the waveform at the sample rate it was originally created in, but as some hardware devices are restrictive in what sample rates they support, or have naive resampling functions, this could be the better option. The second additional audio parameter is `Audio_Required_Format', which can be used to specify the desired output format of the file.
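Continuing the imaginary `adplay' example, a device that only accepts 8kHz audio could be driven with `Audio_Required_Rate` as in the sketch below for `siteinit.scm'. Remember `adplay' and its flags are invented for illustration; only the parameter names are Festival's:

```scheme
;; siteinit.scm (sketch): force resampling before playing
(Parameter.set 'Audio_Method 'Audio_Command)
;; resample everything to 8kHz before handing it to the player
(Parameter.set 'Audio_Required_Rate 8000)
(Parameter.set 'Audio_Command "adplay -raw -r $SR $FILE")
```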
The default is unheadered raw, but this may be any of the values supported by the speech tools (including nist, esps, snd, riff, aiff, audlab, raw and, if you really want it, ascii).

For example suppose you run Festival on a remote machine, are not running any network audio system, and want Festival to copy files back to your local machine and simply cat them to `/dev/audio'. The following would do that (assuming permissions for rsh are allowed).

     (Parameter.set 'Audio_Method 'Audio_Command)
     ;; Make output file ulaw 8k (format ulaw implies 8k)
     (Parameter.set 'Audio_Required_Format 'ulaw)
     (Parameter.set 'Audio_Command
      "userhost=`echo $DISPLAY | sed 's/:.*$//'`; rcp $FILE $userhost:$FILE; \
       rsh $userhost \"cat $FILE >/dev/audio\" ; rsh $userhost \"rm $FILE\"")

Note there are limits on how complex a command you want to put in the `Audio_Command' string directly. It can get very confusing with respect to quoting. It is therefore recommended that once you get past a certain complexity you consider writing a simple shell script and calling it from the `Audio_Command' string.

A second typical customization is setting the default speaker. Speakers depend on many things but due to various licence (and resource) restrictions you may only have some diphone/nphone databases available in your installation. The function name that is the value of `voice_default' is called immediately after `siteinit.scm' is loaded, offering the opportunity for you to change it. In the standard distribution no change should be required. If you download all the distributed voices, `voice_rab_diphone' is the default voice. You may change this for a site by adding the following to `siteinit.scm', or per person by changing your `.festivalrc'. For example if you wish to change the default voice to the American one `voice_ked_diphone'

     (set! voice_default 'voice_ked_diphone)

Note the single quote, and note that unlike in early versions `voice_default' is not a function you can call directly.
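Following that advice, the remote-play command above could be moved into a small script of its own. The sketch below creates such a wrapper; the script name `play_remote.sh' is invented for illustration, and the body just repackages the rcp/rsh pipeline already shown:

```shell
# create a hypothetical wrapper script for Audio_Command to call;
# it plays a waveform on the machine named in $DISPLAY
cat > play_remote.sh <<'EOF'
#!/bin/sh
# $1 is the waveform file Festival substitutes for $FILE
file="$1"
userhost=`echo $DISPLAY | sed 's/:.*$//'`
rcp "$file" "$userhost:$file"
rsh "$userhost" "cat $file >/dev/audio; rm $file"
EOF
chmod +x play_remote.sh
```

With the script on your path, the corresponding `siteinit.scm' entry collapses to (Parameter.set 'Audio_Command "play_remote.sh $FILE"), with no quoting headaches.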
A second level of customization is on a per user basis. After loading `init.scm', which includes `sitevars.scm' and `siteinit.scm' for local installation, Festival loads the file `.festivalrc' from the user's home directory (if it exists). This file may contain arbitrary Festival commands.

File: festival.info, Node: Checking an installation, Next: Y2K, Prev: Site initialization, Up: Installation

Checking an installation
========================

Once compiled and site initialization is set up, you should test to see if Festival can speak or not. Start the system

     $ bin/festival
     Festival Speech Synthesis System 1.4.2:release July 2001
     Copyright (C) University of Edinburgh, 1996-2001. All rights reserved.
     For details type `(festival_warranty)'
     festival> ^D

If errors occur at this stage they are most likely to do with pathname problems. If any error messages are printed about non-existent files, check that those pathnames point to where you intended them to be. Most of the (default) pathnames are dependent on the basic library path. Ensure that is correct. To find out what it has been set to, start the system without loading the init files.

     $ bin/festival -q
     Festival Speech Synthesis System 1.4.2:release July 2001
     Copyright (C) University of Edinburgh, 1996-2001. All rights reserved.
     For details type `(festival_warranty)'
     festival> libdir
     "/projects/festival/lib/"
     festival> ^D

This should show the pathname you set in your `config/config'.

If the system starts with no errors, try to synthesize something

     festival> (SayText "hello world")

Some files are only accessed at synthesis time so this may show up other problem pathnames. If it talks, you're in business; if it doesn't, here are some possible problems.

If you get the error message

     Can't access NAS server

you have selected NAS as the audio output but have no server running on that machine, or your `DISPLAY' or `AUDIOSERVER' environment variable is not set properly for your output device.
Either set these properly or change the audio output device in `lib/siteinit.scm' as described above.

Ensure your audio device actually works the way you think it does. On Suns, the audio output device can be switched into a number of different output modes: speaker, jack, headphones. If this is set to the wrong one you may not hear the output. Use one of Sun's tools to change this (try `/usr/demo/SOUND/bin/soundtool'). Try to find an audio file independent of Festival and get it to play on your audio device. Once you have done that, ensure that the audio output method set in Festival matches it.

Once you have got it talking, test the audio spooling device.

     festival> (intro)

This plays a short introduction of two sentences, spooling the audio output.

Finally, exit from Festival (by end of file or `(quit)') and test the script mode with:

     $ examples/saytime

A test suite is included with Festival but it makes certain assumptions about which voices are installed. It assumes that `voice_rab_diphone' (`festvox_rabxxxx.tar.gz') is the default voice and that `voice_ked_diphone' and `voice_don_diphone' (`festvox_kedxxxx.tar.gz' and `festvox_don.tar.gz') are installed. Also local settings in your `festival/lib/siteinit.scm' may affect these tests. However, after installation it may be worth trying

     gnumake test

from the `festival/' directory. This will do various tests including basic utterance tests and tokenization tests. It also checks that voices are installed and that they don't interfere with each other. These tests are primarily regression tests for the developers of Festival, to ensure new enhancements don't mess up existing supported features. They are not designed to test whether an installation is successful, though if they run correctly it is most probable the installation has worked.
File: festival.info, Node: Y2K, Prev: Checking an installation, Up: Installation

Y2K
===

Festival comes with _no_ warranty, therefore we will not make any legal statement about the performance of the system. However a number of people have asked about Festival and Y2K compliance, and we have decided to make some comments on this.

Every effort has been made to ensure that Festival will continue running as before into the next millennium. However even if Festival itself has no problems, it is dependent on the operating system environment it is running in. During compilation, dates on files are important and the compilation process may not work if your machine cannot assign (reasonable) dates to new files. At run time there is less dependence on system dates and times. Specifically, times are used in the generation of random numbers (where only relative time is important) and as time stamps in log files when Festival runs in server mode, thus we feel it is unlikely there will be any problems.

However, as a speech synthesizer, Festival must make explicit decisions about the pronunciation of dates in the next two decades when people themselves have not yet made such decisions. Most people are still unsure how to read years written as '01, '04, '12, 00s, 10s, (cf. '86, 90s). It is interesting to note that while there is a convenient short name for the last decade of the twentieth century, the "nineties", there is no equivalent name for the first decade of the twenty-first century (or the second). In the meantime we have made reasonable decisions about such pronunciations. Once people have themselves become Y2K compliant and decided what to actually call these years, if their choices differ from how Festival pronounces them, we reserve the right to change how Festival speaks these dates to match their belated decisions.
However as we do not give out warranties about compliance, we will not be requiring our users to return signed Y2K compliance warranties about their own compliance either.

File: festival.info, Node: Quick start, Next: Scheme, Prev: Installation, Up: Top

Quick start
***********

This section is for those who just want to know the absolute basics to run the system.

Festival works in two fundamental modes, _command mode_ and _text-to-speech mode_ (tts-mode). In command mode, information (in files or through standard input) is treated as commands and is interpreted by a Scheme interpreter. In tts-mode, information (in files or through standard input) is treated as text to be rendered as speech. The default mode is command mode, though this may change in later versions.

* Menu:

* Basic command line options::
* Simple command driven session::
* Getting some help::
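As a concrete illustration of the two modes, the sketch below prepares one input file for each; the filenames are arbitrary, and the festival invocations are shown as comments since they require an installed system:

```shell
# a plain text file for tts-mode
echo "Hello world." > hello.txt
# a Scheme command file for command mode
echo '(SayText "hello world")' > hello.scm
# tts-mode:     festival --tts hello.txt
# command mode: festival --batch hello.scm
```

In the second invocation `--batch' makes Festival exit after loading the file rather than dropping into the interactive `festival>' prompt.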