This is festival.info, produced by Makeinfo version 3.12h from
festival.texi.

   This file documents the `Festival' Speech Synthesis System, a general
text to speech system for making your computer talk and developing new
synthesis techniques.

   Copyright (C) 1996-2001 University of Edinburgh

   Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the authors.

File: festival.info, Node: Top, Up: (dir)

This file documents the _Festival Speech Synthesis System_ 1.4.2.  This
document contains many gaps and is still in the process of being
written.
* Menu:

* Abstract::                    initial comments
* Copying::                     How you can copy and share the code
* Acknowledgements::            List of contributors
* What is new::                 Enhancements since last public release
* Overview::                    Generalities and Philosophy
* Installation::                Compilation and Installation
* Quick start::                 Just tell me what to type
* Scheme::                      A quick introduction to Festival's scripting language

Text methods for interfacing to Festival

* TTS::                         Text to speech modes
* XML/SGML mark-up::            XML/SGML mark-up Language
* Emacs interface::             Using Festival within Emacs

Internal functions

* Phonesets::                   Defining and using phonesets
* Lexicons::                    Building and compiling Lexicons
* Utterances::                  Existing and defining new utterance types

Modules

* Text analysis::               Tokenizing text
* POS tagging::                 Part of speech tagging
* Phrase breaks::               Finding phrase breaks
* Intonation::                  Intonation modules
* Duration::                    Duration modules
* UniSyn synthesizer::          The UniSyn waveform synthesizer
* Diphone synthesizer::         Building and using diphone synthesizers
* Other synthesis methods::     other waveform synthesis methods
* Audio output::                Getting sound from Festival
* Voices::                      Adding new voices (and languages)
* Tools::                       CART, Ngrams etc
* Building models from databases::  Adding new modules and writing C++ code
* Programming::                 Programming in Festival (Lisp/C/C++)
* API::                         Using Festival in other programs
* Examples::                    Some simple (and not so simple) examples
* Problems::                    Reporting bugs.
* References::                  Other sources of information
* Feature functions::           List of builtin feature functions.
* Variable list::               Short descriptions of all variables
* Function list::               Short descriptions of all functions
* Index::                       Index of concepts.

File: festival.info, Node: Abstract, Next: Copying, Up: Top

Abstract
********

This document provides a user manual for the Festival Speech Synthesis
System, version 1.4.2.  Festival offers a general framework for
building speech synthesis systems, as well as including examples of
various modules.
As a whole it offers full text to speech through a number of APIs: from
shell level, through a Scheme command interpreter, as a C++ library,
and via an Emacs interface.  Festival is multi-lingual; we have
developed voices in many languages including English (UK and US),
Spanish and Welsh, though English is the most advanced.  The system is
written in C++, uses the Edinburgh Speech Tools for low level
architecture, and has a Scheme (SIOD) based command interpreter for
control.  Documentation is given in the FSF texinfo format, which can
generate a printed manual, info files and HTML.

The latest details and a full software distribution of the Festival
Speech Synthesis System are available through its home page, which may
be found at

   `http://www.cstr.ed.ac.uk/projects/festival.html'

File: festival.info, Node: Copying, Next: Acknowledgements, Prev: Abstract, Up: Top

Copying
*******

As we feel the core system has reached an acceptable level of maturity,
from 1.4.0 the basic system is released under a free licence, without
the commercial restrictions we imposed on early versions.  The basic
system has been placed under an X11 type licence which, as free
licences go, is pretty free.  No GPL code is included in Festival or
the speech tools themselves (though some auxiliary files are GPL'd,
e.g. the Emacs mode for Festival).  We have deliberately chosen a
licence that should be compatible with our commercial partners and our
free software users.

However, although the code is free, we still offer no warranties and no
maintenance.  We will continue to endeavour to fix bugs and answer
queries when we can, but are not in a position to guarantee it.  We
will consider maintenance contracts and consultancy if desired; please
contact us for details.

Also note that not all the voices and lexicons we distribute with
Festival are free.  In particular, the British English lexicon derived
from the Oxford Advanced Learners' Dictionary is free only for
non-commercial use (we will release an alternative soon).
Also the Spanish diphone voice we release is only free for
non-commercial use.

If you are using Festival or the speech tools in a commercial
environment, even though no licence is required, we would be grateful
if you let us know, as it helps justify ourselves to our various
sponsors.

The current copyright on the core system is:

     The Festival Speech Synthesis System: version 1.4.2
     Centre for Speech Technology Research
     University of Edinburgh, UK
     Copyright (c) 1996-2001
     All Rights Reserved.

     Permission is hereby granted, free of charge, to use and distribute
     this software and its documentation without restriction, including
     without limitation the rights to use, copy, modify, merge, publish,
     distribute, sublicense, and/or sell copies of this work, and to
     permit persons to whom this work is furnished to do so, subject to
     the following conditions:

      1. The code must retain the above copyright notice, this list of
         conditions and the following disclaimer.
      2. Any modifications must be clearly marked as such.
      3. Original authors' names are not deleted.
      4. The authors' names are not used to endorse or promote products
         derived from this software without specific prior written
         permission.

     THE UNIVERSITY OF EDINBURGH AND THE CONTRIBUTORS TO THIS WORK
     DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL
     IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT
     SHALL THE UNIVERSITY OF EDINBURGH NOR THE CONTRIBUTORS BE LIABLE
     FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
     WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
     AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
     OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
     SOFTWARE.

File: festival.info, Node: Acknowledgements, Next: What is new, Prev: Copying, Up: Top

Acknowledgements
****************

The code in this system was primarily written by Alan W Black, Paul
Taylor and Richard Caley.
Festival sits on top of the Edinburgh Speech Tools Library, and uses
much of its functionality.  Amy Isard wrote a synthesizer for her MSc
project in 1995, which first used the Edinburgh Speech Tools Library.
Although Festival doesn't contain any code from that system, her system
was used as a basic model.

Much of the design and philosophy of Festival has been built on the
experience both Paul and Alan gained from the development of various
previous synthesizers and software systems, especially CSTR's Osprey
and Polyglot systems `taylor91' and ATR's CHATR system `black94'.
However, it should be stated that Festival is fully developed at CSTR
and contains neither proprietary code nor ideas.

Festival contains a number of subsystems integrated from other sources
and we acknowledge those systems here.

SIOD
====

The Scheme interpreter (SIOD - Scheme In One Defun 3.0) was written by
George Carrett (gjc@mitech.com, gjc@paradigm.com) and offers a basic
small Scheme (Lisp) interpreter suitable for embedding in applications
such as Festival as a scripting language.  A number of changes and
improvements have been added in our development, but it still remains
essentially that basic system.  We are grateful to George and Paradigm
Associates Incorporated for providing such a useful and well-written
sub-system.

     Scheme In One Defun (SIOD)
     COPYRIGHT (c) 1988-1994 BY PARADIGM ASSOCIATES INCORPORATED,
     CAMBRIDGE, MASSACHUSETTS.  ALL RIGHTS RESERVED

     Permission to use, copy, modify, distribute and sell this software
     and its documentation for any purpose and without fee is hereby
     granted, provided that the above copyright notice appear in all
     copies and that both that copyright notice and this permission
     notice appear in supporting documentation, and that the name of
     Paradigm Associates Inc not be used in advertising or publicity
     pertaining to distribution of the software without specific,
     written prior permission.
     PARADIGM DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
     INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS,
     IN NO EVENT SHALL PARADIGM BE LIABLE FOR ANY SPECIAL, INDIRECT OR
     CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
     LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
     NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
     CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

editline
========

Because of conflicts between the copyright for GNU readline, for which
an optional interface was included in earlier versions, we have
replaced that interface with a complete command line editing system
based on `editline'.  `Editline' was posted to the USENET newsgroup
`comp.sources.misc' in 1992.  A number of modifications have been made
to make it more useful to us, but the original code (contained within
the standard speech tools distribution) and our modifications fall
under the following licence.

     Copyright 1992 Simmule Turner and Rich Salz.  All rights reserved.

     This software is not subject to any license of the American
     Telephone and Telegraph Company or of the Regents of the
     University of California.

     Permission is granted to anyone to use this software for any
     purpose on any computer system, and to alter it and redistribute
     it freely, subject to the following restrictions:

      1. The authors are not responsible for the consequences of use of
         this software, no matter how awful, even if they arise from
         flaws in it.
      2. The origin of this software must not be misrepresented, either
         by explicit claim or by omission.  Since few users ever read
         sources, credits must appear in the documentation.
      3. Altered versions must be plainly marked as such, and must not
         be misrepresented as being the original software.  Since few
         users ever read sources, credits must appear in the
         documentation.
      4. This notice may not be removed or altered.
Edinburgh Speech Tools Library
==============================

The Edinburgh Speech Tools Library lies at the core of Festival.
Although developed separately, much of the development of certain parts
of the Edinburgh Speech Tools has been directed by Festival's needs.
In turn those who have contributed to the Speech Tools make Festival a
more usable system.  *Note Acknowledgements:
(speechtools)Acknowledgements.

Online information about the Edinburgh Speech Tools library is
available through

   `http://www.cstr.ed.ac.uk/projects/speech_tools.html'

Others
======

Many others have provided actual code and support for Festival, for
which we are grateful.  Specifically:

   * Alistair Conkie: various low level code points and some design
     work, Spanish synthesis, the old diphone synthesis code.
   * Steve Isard: directorship and LPC diphone code, design of diphone
     schema.
   * EPSRC: who fund Alan Black and Paul Taylor.
   * Sun Microsystems Laboratories: for supporting the project and
     funding Richard.
   * AT&T Labs - Research: for supporting the project.
   * Paradigm Associates and George Carrett: for Scheme in one defun.
   * Mike Macon: improving the quality of the diphone synthesizer and
     LPC analysis.
   * Kurt Dusterhoff: Tilt intonation training and modelling.
   * Amy Isard: for her SSML project and related synthesizer.
   * Richard Tobin: for answering all those difficult questions, the
     socket code, and the XML parser.
   * Simmule Turner and Rich Salz: command line editor (editline).
   * Borja Etxebarria: help with the Spanish synthesis.
   * Briony Williams: Welsh synthesis.
   * Jacques H. de Villiers: `jacques@cse.ogi.edu' from CSLU at OGI,
     for the TCL interface, and other usability issues.
   * Kevin Lenzo: `lenzo@cs.cmu.edu' from CMU for the PERL interface.
   * Rob Clarke: for support under Linux.
   * Samuel Audet `guardia@cam.org': OS/2 support.
   * Mari Ostendorf: for providing access to the BU FM Radio corpus,
     from which some modules were trained.
   * Melvin Hunt: on whose work we based our residual LPC synthesis
     model.
   * Oxford Text Archive: for the computer users version of the Oxford
     Advanced Learners' Dictionary (redistributed with permission).
   * Reading University: for access to MARSEC, from which the phrase
     break model was trained.
   * LDC & Penn Tree Bank: from which the POS tagger was trained;
     redistribution of the models is with permission from the LDC.
   * Roger Burroughes and Kurt Dusterhoff: for letting us capture their
     voices.
   * ATR and Nick Campbell: for first getting Paul and Alan to work
     together and for the experience we gained.
   * FSF: for G++, make, ....
   * Center for Spoken Language Understanding: CSLU at OGI, particularly
     Ron Cole and Mike Macon, have acted as significant users for the
     system, giving significant feedback and allowing us to teach
     courses on Festival offering valuable real-use feedback.
   * Our beta testers: thanks to all the people who put up with previous
     versions of the system and reported bugs, both big and small.
     These comments are very important to the constant improvements in
     the system.  And thanks for your quick responses when I had
     specific requests.
   * And our users ...  Many people have downloaded earlier versions of
     the system.  Many have found problems with installation and use
     and have reported them to us.  Many of you have put up with
     multiple compilations trying to fix bugs remotely.  We thank you
     for putting up with us and are pleased you've taken the time to
     help us improve our system.  Many of you have come up with uses we
     hadn't thought of, which is always rewarding.  Even if you haven't
     actively responded, the fact that you use the system at all makes
     it worthwhile.

File: festival.info, Node: What is new, Next: Overview, Prev: Acknowledgements, Up: Top

What is new
***********

Compared to the previous major release (1.3.0, released August 1998),
1.4.0 is not functionally so different from its previous versions.
This release is primarily a consolidation release, fixing and tidying
up some of the lower level aspects of the system to allow better
modularity for some of our future planned modules.

   * Copyright change: the system is now free and has no commercial
     restriction.  Note that currently the US voices (ked and kal) are
     also now unrestricted.  The UK English voices depend on the Oxford
     Advanced Learners' Dictionary of Current English, which cannot be
     used for commercial purposes without permission from Oxford
     University Press.
   * Architecture tidy up: the interfaces to lower level parts of the
     system have been tidied up, deleting some of the older code that
     was supported for compatibility reasons.  There is now a much
     higher dependence on features, and easier (and safer) ways to
     register new objects as feature values and Scheme objects.  Scheme
     has been tidied up.  It is no longer "in one defun" but "in one
     directory".
   * New documentation system for speech tools: a new docbook based
     documentation system has been added to the speech tools.
     Festival's documentation will move over to this sometime soon too.
   * Initial JSAPI support: both JSAPI and JSML (somewhat similar to
     Sable) now have initial implementations.  They of course depend on
     Java support, which so far we have only (successfully)
     investigated under Solaris and Linux.
   * Generalization of statistical models: CART, ngrams, and WFSTs are
     now fully supported from Lisp and can be used with a generalized
     Viterbi function.  This makes adding quite complex statistical
     models easy without adding new C++.
   * Tilt intonation modelling: full support is now included for the
     Tilt intonation models, both training and use.
   * Documentation on Building New Voices in Festival: documentation,
     scripts etc.
     for building new voices and languages in the system, see
     `http://www.cstr.ed.ac.uk/projects/festival/docs/festvox/'

File: festival.info, Node: Overview, Next: Installation, Prev: What is new, Up: Top

Overview
********

Festival is designed as a speech synthesis system for at least three
levels of user.  First, those who simply want high quality speech from
arbitrary text with the minimum of effort.  Second, those who are
developing language systems and wish to include synthesis output.  In
this case, a certain amount of customization is desired, such as
different voices, specific phrasing, dialog types etc.  The third level
is in developing and testing new synthesis methods.

This manual is not designed as a tutorial on converting text to speech,
but as documentation of the processes and use of our system.  We do not
discuss the detailed algorithms involved in converting text to speech
or the relative merits of multiple methods, though we will often give
references to relevant papers when describing the use of each module.

For more general information about text to speech we recommend Dutoit's
`An introduction to Text-to-Speech Synthesis' `dutoit97'.  For more
detailed research issues in TTS see `sproat98' or `vansanten96'.

* Menu:

* Philosophy::                  Why we did it like it is
* Future::                      How much better it's going to get

File: festival.info, Node: Philosophy, Next: Future, Up: Overview

Philosophy
==========

One of the biggest problems in the development of speech synthesis, and
other areas of speech and language processing systems, is that there
are a lot of simple well-known techniques lying around which can help
you realise your goal.  But in order to improve some part of the whole
system it is necessary to have a whole system in which you can test and
improve your part.  Festival is intended as that whole system in which
you may simply work on your small part to improve the whole.
Without a system like Festival, before you could even start to test
your new module you would need to spend significant effort building a
whole system, or adapting an existing one.  Festival is specifically
designed to allow the addition of new modules, easily and efficiently,
so that development need not get bogged down in re-implementing the
wheel.

But there is another aspect of Festival which makes it more useful than
simply an environment for researching new synthesis techniques.  It is
a fully usable text-to-speech system, suitable for embedding in other
projects that require speech output.  The provision of a fully working
easy-to-use speech synthesizer, in addition to just a testing
environment, is good for two specific reasons.  First, it offers a
conduit for our research, in that our experiments can quickly and
directly benefit users of our synthesis system.  And secondly, in
ensuring we have a fully working usable system we can immediately see
what problems exist and where our research should be directed, rather
than where our whims take us.

These concepts are not unique to Festival.  ATR's CHATR system
(`black94') follows very much the same philosophy, and Festival
benefits from the experience gained in the development of that system.
Festival benefits from various pieces of previous work.  As well as
CHATR, CSTR's previous synthesizers, Osprey and the Polyglot projects,
influenced many design decisions.  Also we are influenced by more
general programs in considering software engineering issues, especially
GNU Octave and Emacs, on which the basic script model was based.

Unlike in some other speech and language systems, software engineering
is considered very important to the development of Festival.  Too often
research systems consist of random collections of hacky little scripts
and code.
No one person can confidently describe the algorithms such a system
performs, as parameters are scattered throughout the system, with
tricks and hacks making it impossible to really evaluate why the system
is good (or bad).  Such systems do not help the advancement of speech
technology, except perhaps in pointing at ideas that should be further
investigated.  If the algorithms and techniques cannot be described
externally from the program, such that they can be reimplemented by
others, what is the point of doing the work?

Festival offers a common framework where multiple techniques may be
implemented (by the same or different researchers) so that they may be
tested more fairly in the same environment.

As a final word, we'd like to make two short statements which both
achieve the same end, but unfortunately perhaps not for the same
reasons:

     Good software engineering makes good research easier

But the following seems to be true also:

     If you spend enough effort on something it can be shown to be
     better than its competitors.

File: festival.info, Node: Future, Prev: Philosophy, Up: Overview

Future
======

Festival is still very much in development.  Hopefully this state will
continue for a long time.  It is never possible to complete software;
there are always new things that can make it better.  However, as time
goes on Festival's core architecture will stabilise and few or no
changes will be made.  Other aspects of the system will gain greater
attention, such as waveform synthesis modules, intonation techniques,
text type dependent analysers etc.

Festival will improve, so don't expect it to be the same six months
from now.

A number of new modules and enhancements are already under
consideration at various stages of implementation.  The following is a
non-exhaustive list of what we may (or may not) add to Festival over
the next six months or so.

   * Selection-based synthesis: moving away from diphone technology to
     more generalized selection of units from a speech database.
   * New structure for linguistic content of utterances: using
     techniques from Metrical Phonology, we are building more
     structured representations of utterances, reflecting their
     linguistic significance better.  This will allow improvements in
     prosody and unit selection.
   * Non-prosodic prosodic control: for language generation systems and
     custom tasks where the speech to be synthesized is being generated
     by some program, more information about text structure will
     probably exist, such as phrasing, contrast, key items etc.  We are
     investigating the relationship of high-level tags to prosodic
     information through the Sole project
     `http://www.cstr.ed.ac.uk/projects/sole.html'
   * Dialect independent lexicons: currently for each new dialect we
     need a new lexicon.  We are investigating a form of lexical
     specification that is dialect independent and allows the core form
     to be mapped to different dialects.  This will make the generation
     of voices in different dialects much easier.

File: festival.info, Node: Installation, Next: Quick start, Prev: Overview, Up: Top

Installation
************

This section describes how to install Festival from source in a new
location and customize that installation.

* Menu:

* Requirements::                Software/Hardware requirements for Festival
* Configuration::               Setting up compilation
* Site initialization::         Settings for your particular site
* Checking an installation::    But does it work ...
* Y2K::                         Comment on Festival and year 2000

File: festival.info, Node: Requirements, Next: Configuration, Up: Installation

Requirements
============

In order to compile Festival you first need the following source
packages:

`festival-1.4.2.tar.gz'
     Festival Speech Synthesis System source.

`speech_tools-1.2.2.tar.gz'
     The Edinburgh Speech Tools Library.

`festlex_NAME.tar.gz'
     The lexicon distributions, which where possible include the
     lexicon input file as well as the compiled form, for your
     convenience.
     The lexicons have varying distribution policies, but are all free
     except OALD, which is only free for non-commercial use (we are
     working on a free replacement).  In some cases only a pointer to
     an ftp'able file plus a program to convert that file to the
     Festival format is included.

`festvox_NAME.tar.gz'
     You'll need a speech database.  A number are available (with
     varying distribution policies).  Each voice may have other
     dependencies, such as requiring particular lexicons.

`festdoc_1.4.2.tar.gz'
     Full postscript, info and html documentation for Festival and the
     Speech Tools.  The source of the documentation is available in the
     standard distributions, but for your convenience it has been
     pre-generated.

In addition to the Festival specific sources you will also need:

_A UNIX machine_
     Currently we have compiled and tested the system under Solaris
     (2.5(.1), 2.6, 2.7 and 2.8), SunOS (4.1.3), FreeBSD 3.x and 4.x,
     and Linux (Redhat 4.1, 5.0, 5.1, 5.2, 6.[012], 7.[01] and other
     Linux distributions), and it should work under OSF (Dec Alphas),
     SGI (Irix) and HPs (HPUX).  But any standard UNIX machine should
     be acceptable.  We have now successfully ported this version to
     Windows NT and Windows 95 (using the Cygnus GNU win32
     environment).  This is still a young port but seems to work.

_A C++ compiler_
     Note that C++ is not very portable, even between different
     versions of the compiler from the same vendor.  Although we've
     tried very hard to make the system portable, we know it is very
     unlikely to compile without change except with compilers that
     have already been tested.
     The currently tested systems are:

        * Sun Sparc Solaris 2.5, 2.5.1, 2.6, 2.7: GCC 2.7.2, egcs
          1.1.1, egcs 1.1.2, GCC 2.95.1
        * Sun Sparc SunOS 4.1.3: GCC 2.7.2
        * FreeBSD for Intel 3.x and 4.x: GCC 2.95.1, GCC 3.0
        * Linux for Intel (RedHat 4.1/5.0/5.1/5.2/6.0): GCC 2.7.2, GCC
          2.7.2/egcs-1.0.2, egcs 1.1.1, egcs-1.1.2, GCC 2.95.[123],
          GCC "2.96", GCC 3.0
        * Windows NT 4.0: GCC 2.7.2 plus egcs (from Cygnus GNU win32
          b19), Visual C++ PRO v5.0, Visual C++ v6.0

     Note that if GCC works on one version of Unix it usually works on
     others.  We have compiled both the speech tools and Festival under
     Windows NT 4.0 and Windows 95 using the GNU tools available from
     Cygnus, `ftp://ftp.cygnus.com/pub/gnu-win32/'.

_GNU make_
     Due to there being too many different `make' programs out there,
     we have tested the system using GNU make on all systems we use.
     Others may work but we know GNU make does.

_Audio hardware_
     You can use Festival without audio output hardware, but it doesn't
     sound very good (though admittedly you can hear fewer problems
     with it).  A number of audio systems are supported (directly
     inherited from the audio support in the Edinburgh Speech Tools
     Library): NCD's NAS (formerly called netaudio), a network
     transparent audio system (which can be found at
     `ftp://ftp.x.org/contrib/audio/nas/'); `/dev/audio' (at 8k ulaw
     and 8/16bit linear), found on Suns, Linux machines and FreeBSD;
     and a method allowing arbitrary UNIX commands.  *Note Audio
     output::.

Earlier versions of Festival mistakenly offered a command line editor
interface to the GNU package readline, but due to conflicts between the
GNU Public Licence and Festival's licence this interface was removed in
version 1.3.1.  Even Festival's new free licence would cause problems,
as readline support would restrict Festival linking with non-free code.
A new command line interface based on editline was provided that offers
similar functionality.  Editline remains a compilation option, as it is
probably not yet as portable as we would like it to be.
In addition to the above, in order to process the documentation you
will need `TeX', `dvips' (or similar), GNU's `makeinfo' (part of the
texinfo package) and `texi2html', which is available from
`http://wwwcn.cern.ch/dci/texi2html/'.  However, the document files are
also available pre-processed into postscript, DVI, info and html as
part of the distribution in `festdoc-1.4.X.tar.gz'.

Ensure you have a fully installed and working version of your C++
compiler.  Most of the problems people have had in installing Festival
have been due to incomplete or bad compiler installation.  If you don't
know whether anyone has used your C++ installation before, it is worth
checking that the following program compiles and runs.

     #include <iostream.h>
     int main (int argc, char **argv)
     {
         cout << "Hello world\n";
         return 0;
     }

Unpack all the source files in a new directory.  The directory will
then contain two subdirectories:

     speech_tools/
     festival/

File: festival.info, Node: Configuration, Next: Site initialization, Prev: Requirements, Up: Installation

Configuration
=============

First ensure you have a compiled version of the Edinburgh Speech Tools
Library.  See `speech_tools/INSTALL' for instructions.

The system now supports the standard GNU `configure' method for set up.
In most cases this will automatically configure festival for your
particular system.  In most cases you need only type

     gmake

and the system will configure itself and compile (note you need to have
compiled the Edinburgh Speech Tools `speech_tools-1.2.2' first).  In
some cases hand configuration is required.  All of the configuration
choices are held in the file `config/config'.

For the most part Festival configuration inherits the configuration
from your speech tools config file (`../speech_tools/config/config').
Additional optional modules may be added by adding them to the end of
your config file, e.g.
     ALSO_INCLUDE += clunits

Adding a new module here will treat it as a new directory in
`src/modules/' and compile it into the system, in the same way the
`OTHER_DIRS' feature was used in previous versions.

If the compilation directory is being accessed by NFS, or if you use an
automounter (e.g. amd), it is recommended to explicitly set the
variable `FESTIVAL_HOME' in `config/config'.  The command `pwd' is not
reliable when a directory may have multiple names.

There is a simple test suite with Festival, but it requires the three
basic voices and their respective lexicons to be installed before it
will work.  Thus you need to install

     festlex_CMU.tar.gz
     festlex_OALD.tar.gz
     festlex_POSLEX.tar.gz
     festvox_don.tar.gz
     festvox_kedlpc16k.tar.gz
     festvox_rablpc16k.tar.gz

If these are installed you can test the installation with

     gmake test

To simply make it run with a male US English voice it is sufficient to
install just

     festlex_CMU.tar.gz
     festlex_POSLEX.tar.gz
     festvox_kallpc16k.tar.gz

Note that the single most common reason for problems in compilation and
linking found amongst the beta testers was a bad installation of GNU
C++.  If you get many strange errors in G++ library header files or
link errors, it is worth checking that your system has the compiler,
header files and runtime libraries properly installed.  This may be
checked by compiling a simple program under C++ and also by finding out
whether anyone at your site has ever used the installation.  Most of
these installation problems are caused by upgrading to a newer version
of libg++ without removing the older version, so that a mixed version
of the `.h' files exists.

Although we have tried very hard to ensure that Festival compiles with
no warnings, this is not possible under some systems.  Under SunOS the
system include files do not declare a number of system provided
functions.  This is a bug in Sun's include files.  It will cause
warnings like "implicit definition of fprintf".  These are harmless.
Under Linux a warning at link time about reducing the size of some
symbols is often produced.  This is harmless.  There are also
occasional warnings about some socket system function having an
incorrect argument type; these too are harmless.

The speech tools and festival compile under Windows95 or Windows NT
with Visual C++ v5.0 using the Microsoft `nmake' make program.  We've
only done this with the Professional edition, but have no reason to
believe that it relies on anything not in the standard edition.

In accordance with VC++ conventions, object files are created with
extension .obj, executables with extension .exe and libraries with
extension .lib.  This may mean that both unix and Win32 versions can be
built in the same directory tree, but I wouldn't rely on it.

To do this you require nmake Makefiles for the system.  These can be
generated from the gnumake Makefiles, using the command

     gnumake VCMakefile

in the speech_tools and festival directories.  I have only done this
under unix; it's possible it would work under the cygnus gnuwin32
system.

If `make.depend' files exist (i.e. if you have done `gnumake depend' in
unix), equivalent `vc_make.depend' files will be created; if not, the
VCMakefiles will not contain dependency information for the `.cc'
files.  The result will be that you can compile the system once, but
changes will not cause the correct things to be rebuilt.

In order to compile from the DOS command line using Visual C++ you need
to have a collection of environment variables set.  In Windows NT there
is an installation option for Visual C++ which sets these globally.
Under Windows95, or if you don't ask for them to be set globally under
NT, you need to run

     vcvars32.bat

See the VC++ documentation for more details.

Once you have the source trees with VCMakefiles somewhere visible from
Windows, you need to copy `speech_tools\config\vc_config-dist' to
`speech_tools\config\vc_config' and edit it to suit your local
situation.
Then do the same with `festival\config\vc_config-dist'. The thing most likely to need changing is the definition of `FESTIVAL_HOME' in `festival\config\vc_config_make_rules', which needs to point to where you have put festival.

Now you can compile. cd to the speech_tools directory and do

     nmake /nologo /fVCMakefile

and the library, the programs in main and the test programs should be compiled. The tests can't be run automatically under Windows. A simple test to check that things are probably OK is:

     main\na_play testsuite\data\ch_wave.wav

which reads and plays a waveform. Next go into the festival directory and do

     nmake /nologo /fVCMakefile

to build festival. When it's finished, and assuming you have the voices and lexicons unpacked in the right place, festival should run just as under unix.

We should remind you that the NT/95 ports are still young and there may be problems that we've not found yet. We only recommend use of the speech tools and Festival under Windows if you have significant experience with C++ on those platforms.

Most of the modules in `src/modules' are actually optional and the system could be compiled without them. The basic set could be reduced further if certain facilities are not desired. Particularly: `donovan', which is only required if the donovan voice is used; `rxp', if no XML parsing is required (e.g. Sable); and `parser', if no stochastic parsing is required (this parser isn't used for any of our currently released voices). Actually even `UniSyn' and `UniSyn_diphone' could be removed if some external waveform synthesizer is being used (e.g. MBROLA) or some alternative one like `OGIresLPC'. Removing unused modules will make the festival binary smaller and (potentially) start up faster, but don't expect too much. You can delete these by changing the `BASE_DIRS' variable in `src/modules/Makefile'.
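As a sketch of such a change, a trimmed `src/modules/Makefile' might look like the fragment below. The exact set of directories listed in `BASE_DIRS' varies between releases, so the names here (other than those discussed above) are only placeholders:

```make
# src/modules/Makefile (sketch): `donovan', `rxp' and `parser'
# dropped from the build; keep whatever else your release lists.
BASE_DIRS = UniSyn UniSyn_diphone  # ... plus any other modules you keep
```

After editing, rebuild with `gmake' from the top level as usual.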
File: festival.info, Node: Site initialization, Next: Checking an installation, Prev: Configuration, Up: Installation

Site initialization
===================

Once compiled, Festival may be further customized for particular sites. At start up time Festival loads the file `init.scm' from its library directory. This file in turn loads other necessary files such as phoneset descriptions, duration parameters, intonation parameters, definitions of voices etc. It will also load the files `sitevars.scm' and `siteinit.scm' if they exist.

`sitevars.scm' is loaded after the basic Scheme library functions are loaded but before any of the festival related functions are loaded. This file is intended to set various path names before the various subsystems are loaded. Typically variables such as `lexdir' (the directory where the lexicons are held) and `voices_dir' (pointing to voice directories) should be reset here if necessary.

The default installation will try to find its lexicons and voices automatically based on the value of `load-path' (this is derived from `FESTIVAL_HOME' at compilation time or by using the `--libdir' option at run-time). If the voices and lexicons have been unpacked into subdirectories of the library directory (the default) then no site specific initialization of the above pathnames will be necessary.

The second site specific file is `siteinit.scm'. Typical examples of local initialization are as follows. The default audio output method is NCD's NAS system if that is supported, as that's what we normally use in CSTR. If it is not supported, any hardware specific mode is the default (e.g. sun16audio, freebsd16audio, linux16audio or mplayeraudio). But that default is just a setting in `init.scm'.
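Returning to `sitevars.scm' for a moment, a site that keeps its lexicons and voices outside the library directory might use something like the sketch below. The pathnames are purely illustrative, and whether `voices_dir' holds a single directory or a list may differ in your release; check `init.scm' for the form your installation expects:

```scheme
;; sitevars.scm -- loaded before the Festival subsystems
;; (illustrative pathnames; adjust to your installation)
(set! lexdir "/usr/local/share/festival/lib/dicts/")
(set! voices_dir (list "/usr/local/share/festival/lib/voices/"))
```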
If for example in your environment you wish the default audio output method to be 8k mulaw through `/dev/audio' you should add the following line to your `siteinit.scm' file

     (Parameter.set 'Audio_Method 'sunaudio)

Note the use of `Parameter.set' rather than `Parameter.def'; the second function will not reset the value if it is already set. Remember that you may use the audio methods `sun16audio', `linux16audio' or `freebsd16audio' only if `NATIVE_AUDIO' was selected in `speech_tools/config/config' and you are on such a machine. The Festival variable `*modules*' contains a list of all supported functions/modules in a particular installation, including audio support. Check the value of that variable if things aren't what you expect.

If you are installing on a machine whose audio is not directly supported by the speech tools library, an external command may be executed to play a waveform. The following example is for an imaginary machine that can play audio files through a program called `adplay' with arguments for sample rate and file type. When playing waveforms Festival, by default, outputs an unheadered waveform in native byte order. In this example you would set up the default audio playing mechanism in `siteinit.scm' as follows

     (Parameter.set 'Audio_Method 'Audio_Command)
     (Parameter.set 'Audio_Command "adplay -raw -r $SR $FILE")

For the `Audio_Command' method of playing waveforms Festival supports two additional audio parameters. `Audio_Required_Rate' allows you to use Festival's internal sample rate conversion function to convert to any desired rate. Note this may not be as good as playing the waveform at the sample rate it was originally created in, but as some hardware devices are restrictive in what sample rates they support, or have naive resampling functions, this could be the better option. The second additional audio parameter is `Audio_Required_Format', which can be used to specify the desired output format of the file.
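Continuing the imaginary `adplay' example, a device that only accepts 8kHz audio could be driven with `Audio_Required_Rate` as in the sketch below for `siteinit.scm'. Remember `adplay' and its flags are invented for illustration; only the parameter names are Festival's:

```scheme
;; siteinit.scm (sketch): force resampling before playing
(Parameter.set 'Audio_Method 'Audio_Command)
;; resample everything to 8kHz before handing it to the player
(Parameter.set 'Audio_Required_Rate 8000)
(Parameter.set 'Audio_Command "adplay -raw -r $SR $FILE")
```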
The default is unheadered raw, but this may be any of the values supported by the speech tools (including nist, esps, snd, riff, aiff, audlab, raw and, if you really want it, ascii).

For example suppose you run Festival on a remote machine, are not running any network audio system, and want Festival to copy files back to your local machine and simply cat them to `/dev/audio'. The following would do that (assuming permissions for rsh are allowed).

     (Parameter.set 'Audio_Method 'Audio_Command)
     ;; Make output file ulaw 8k (format ulaw implies 8k)
     (Parameter.set 'Audio_Required_Format 'ulaw)
     (Parameter.set 'Audio_Command
      "userhost=`echo $DISPLAY | sed 's/:.*$//'`; rcp $FILE $userhost:$FILE; \
       rsh $userhost \"cat $FILE >/dev/audio\" ; rsh $userhost \"rm $FILE\"")

Note there are limits on how complex a command you want to put in the `Audio_Command' string directly. It can get very confusing with respect to quoting. It is therefore recommended that once you get past a certain complexity you consider writing a simple shell script and calling it from the `Audio_Command' string.

A second typical customization is setting the default speaker. Speakers depend on many things but due to various licence (and resource) restrictions you may only have some diphone/nphone databases available in your installation. The function name that is the value of `voice_default' is called immediately after `siteinit.scm' is loaded, offering the opportunity for you to change it. In the standard distribution no change should be required. If you download all the distributed voices, `voice_rab_diphone' is the default voice. You may change this for a site by adding the following to `siteinit.scm', or per person by changing your `.festivalrc'. For example if you wish to change the default voice to the American one `voice_ked_diphone'

     (set! voice_default 'voice_ked_diphone)

Note the single quote, and note that unlike in early versions `voice_default' is not a function you can call directly.
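Following that advice, the remote-play command above could be moved into a small script of its own. The sketch below creates such a wrapper; the script name `play_remote.sh' is invented for illustration, and the body just repackages the rcp/rsh pipeline already shown:

```shell
# create a hypothetical wrapper script for Audio_Command to call;
# it plays a waveform on the machine named in $DISPLAY
cat > play_remote.sh <<'EOF'
#!/bin/sh
# $1 is the waveform file Festival substitutes for $FILE
file="$1"
userhost=`echo $DISPLAY | sed 's/:.*$//'`
rcp "$file" "$userhost:$file"
rsh "$userhost" "cat $file >/dev/audio; rm $file"
EOF
chmod +x play_remote.sh
```

With the script on your path, the corresponding `siteinit.scm' entry collapses to (Parameter.set 'Audio_Command "play_remote.sh $FILE"), with no quoting headaches.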
A second level of customization is on a per user basis. After loading `init.scm', which includes `sitevars.scm' and `siteinit.scm' for local installation, Festival loads the file `.festivalrc' from the user's home directory (if it exists). This file may contain arbitrary Festival commands.

File: festival.info, Node: Checking an installation, Next: Y2K, Prev: Site initialization, Up: Installation

Checking an installation
========================

Once compiled and site initialization is set up, you should test to see if Festival can speak or not. Start the system

     $ bin/festival
     Festival Speech Synthesis System 1.4.2:release July 2001
     Copyright (C) University of Edinburgh, 1996-2001. All rights reserved.
     For details type `(festival_warranty)'
     festival> ^D

If errors occur at this stage they are most likely to do with pathname problems. If any error messages are printed about non-existent files, check that those pathnames point to where you intended them to be. Most of the (default) pathnames are dependent on the basic library path. Ensure that is correct. To find out what it has been set to, start the system without loading the init files.

     $ bin/festival -q
     Festival Speech Synthesis System 1.4.2:release July 2001
     Copyright (C) University of Edinburgh, 1996-2001. All rights reserved.
     For details type `(festival_warranty)'
     festival> libdir
     "/projects/festival/lib/"
     festival> ^D

This should show the pathname you set in your `config/config'.

If the system starts with no errors, try to synthesize something

     festival> (SayText "hello world")

Some files are only accessed at synthesis time so this may show up other problem pathnames. If it talks, you're in business; if it doesn't, here are some possible problems.

If you get the error message

     Can't access NAS server

you have selected NAS as the audio output but have no server running on that machine, or your `DISPLAY' or `AUDIOSERVER' environment variable is not set properly for your output device.
Either set these properly or change the audio output device in `lib/siteinit.scm' as described above.

Ensure your audio device actually works the way you think it does. On Suns, the audio output device can be switched into a number of different output modes: speaker, jack, headphones. If this is set to the wrong one you may not hear the output. Use one of Sun's tools to change this (try `/usr/demo/SOUND/bin/soundtool'). Try to find an audio file independent of Festival and get it to play on your audio device. Once you have done that, ensure that the audio output method set in Festival matches it.

Once you have got it talking, test the audio spooling device.

     festival> (intro)

This plays a short introduction of two sentences, spooling the audio output.

Finally, exit from Festival (by end of file or `(quit)') and test the script mode with:

     $ examples/saytime

A test suite is included with Festival but it makes certain assumptions about which voices are installed. It assumes that `voice_rab_diphone' (`festvox_rabxxxx.tar.gz') is the default voice and that `voice_ked_diphone' and `voice_don_diphone' (`festvox_kedxxxx.tar.gz' and `festvox_don.tar.gz') are installed. Also local settings in your `festival/lib/siteinit.scm' may affect these tests. However, after installation it may be worth trying

     gnumake test

from the `festival/' directory. This will do various tests including basic utterance tests and tokenization tests. It also checks that voices are installed and that they don't interfere with each other. These tests are primarily regression tests for the developers of Festival, to ensure new enhancements don't mess up existing supported features. They are not designed to test whether an installation is successful, though if they run correctly it is most probable the installation has worked.
File: festival.info, Node: Y2K, Prev: Checking an installation, Up: Installation

Y2K
===

Festival comes with _no_ warranty, therefore we will not make any legal statement about the performance of the system. However a number of people have asked about Festival and Y2K compliance, and we have decided to make some comments on this.

Every effort has been made to ensure that Festival will continue running as before into the next millennium. However even if Festival itself has no problems, it is dependent on the operating system environment it is running in. During compilation, dates on files are important and the compilation process may not work if your machine cannot assign (reasonable) dates to new files. At run time there is less dependence on system dates and times. Specifically, times are used in the generation of random numbers (where only relative time is important) and as time stamps in log files when Festival runs in server mode, thus we feel it is unlikely there will be any problems.

However, as a speech synthesizer, Festival must make explicit decisions about the pronunciation of dates in the next two decades when people themselves have not yet made such decisions. Most people are still unsure how to read years written as '01, '04, '12, 00s, 10s, (cf. '86, 90s). It is interesting to note that while there is a convenient short name for the last decade of the twentieth century, the "nineties", there is no equivalent name for the first decade of the twenty-first century (or the second). In the meantime we have made reasonable decisions about such pronunciations. Once people have themselves become Y2K compliant and decided what to actually call these years, if their choices differ from how Festival pronounces them, we reserve the right to change how Festival speaks these dates to match their belated decisions.
However as we do not give out warranties about compliance, we will not be requiring our users to return signed Y2K compliance warranties about their own compliance either.

File: festival.info, Node: Quick start, Next: Scheme, Prev: Installation, Up: Top

Quick start
***********

This section is for those who just want to know the absolute basics to run the system.

Festival works in two fundamental modes, _command mode_ and _text-to-speech mode_ (tts-mode). In command mode, information (in files or through standard input) is treated as commands and is interpreted by a Scheme interpreter. In tts-mode, information (in files or through standard input) is treated as text to be rendered as speech. The default mode is command mode, though this may change in later versions.

* Menu:

* Basic command line options::
* Simple command driven session::
* Getting some help::
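As a concrete illustration of the two modes, the sketch below prepares one input file for each; the filenames are arbitrary, and the festival invocations are shown as comments since they require an installed system:

```shell
# a plain text file for tts-mode
echo "Hello world." > hello.txt
# a Scheme command file for command mode
echo '(SayText "hello world")' > hello.scm
# tts-mode:     festival --tts hello.txt
# command mode: festival --batch hello.scm
```

In the second invocation `--batch' makes Festival exit after loading the file rather than dropping into the interactive `festival>' prompt.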