This is festival.info, produced by Makeinfo version 3.12h from festival.texi.

This file documents the `Festival' Speech Synthesis System, a general text-to-speech system for making your computer talk and for developing new synthesis techniques.

Copyright (C) 1996-2001 University of Edinburgh

Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the authors.

File: festival.info, Node: Basic command line options, Next: Simple command driven session, Up: Quick start

Basic command line options
==========================

Festival's basic calling method is

     festival [options] file1 file2 ...

Options may be any of the following:

`-q'
     Start Festival without loading `init.scm' or the user's `.festivalrc'.

`-b'
`--batch'
     After processing any file arguments, do not become interactive.

`-i'
`--interactive'
     After processing file arguments, become interactive. This option overrides any batch argument.

`--tts'
     Treat file arguments in text-to-speech mode, causing them to be rendered as speech rather than interpreted as commands. When this is selected in interactive mode, the command line edit functions are not available.

`--command'
     Treat file arguments in command mode. This is the default.

`--language LANG'
     Set the default language to LANG. Currently LANG may be one of `english', `spanish' or `welsh' (depending on what voices are actually available in your installation).

`--server'
     After loading any specified files, go into server mode. This is a mode where Festival waits for clients on a known port (the value of `server_port'; the default is 1314). Connected clients may send commands (or text) to the server and expect waveforms back. *Note Server/client API::. Note that server mode may be unsafe and allow unauthorised access to your machine; be sure to read the security recommendations in *Note Server/client API::.

`--script scriptfile'
     Run scriptfile as a Festival script file. This is similar to `--batch', but it encapsulates the command line arguments into the Scheme variables `argv' and `argc', so that Festival scripts may process their command line arguments just like any other program. It also does not load the basic initialisation files, as sometimes you may not want to do this. If you want them, you should copy the loading sequence from an example Festival script like `festival/examples/saytext'.

`--heap NUMBER'
     The Scheme heap (basic number of Lisp cells) is of a fixed size and cannot be dynamically increased at run time (this would complicate garbage collection). The default size is 210000, which seems to be more than adequate for most work. In some of our training experiments, where very large list structures are required, it is necessary to increase this. Note there is a trade-off between the size of the heap and the time it takes to garbage collect, so making this unnecessarily big is not a good idea. If you don't understand the above explanation, you almost certainly don't need to use the option.

In command mode, if a file name starts with a left parenthesis, the name itself is read and evaluated as a Lisp command. This is often convenient when running in batch mode and a simple command is needed to start the whole thing off after loading some other specific files.
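When Festival is driven from another program, the options above can be assembled programmatically. The following is a minimal Python sketch, not part of Festival: `festival_argv' and `run_festival' are hypothetical helpers that use only the options documented above, and the sketch assumes Festival processes its arguments from left to right. Passing the argument list directly, rather than through a shell, means a Lisp command such as `(SayText "hello")' needs no extra quoting.

```python
import subprocess
from typing import List, Optional, Sequence


def festival_argv(
    files: Sequence[str] = (),
    command: Optional[str] = None,
    batch: bool = False,
    tts: bool = False,
    heap: Optional[int] = None,
) -> List[str]:
    """Build the argument list for one festival invocation (hypothetical helper).

    files: files to load (command mode) or to speak (when tts is True).
    command: a Lisp expression such as '(SayText "hello")'; it is placed after
        the files so that, assuming left-to-right processing, it runs last.
    """
    argv = ["festival"]
    if heap is not None:
        argv += ["--heap", str(heap)]
    if batch:
        argv.append("--batch")
    if tts:
        argv.append("--tts")
    argv += list(files)
    if command is not None:
        # Only arguments that begin with "(" are evaluated, and only in command mode.
        if tts:
            raise ValueError("Lisp commands are evaluated only in command mode")
        if not command.startswith("("):
            raise ValueError("a Lisp command argument must start with '('")
        argv.append(command)  # a single argv element, so no shell quoting is needed
    return argv


def run_festival(argv: List[str]) -> int:
    """Run festival (which must be on PATH) and return its exit status."""
    return subprocess.run(argv).returncode
```

For example, calling `run_festival' on `festival_argv(["setup.scm"], command='(SayText "hello")', batch=True)' is equivalent to the shell command festival --batch setup.scm '(SayText "hello")'.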
File: festival.info, Node: Simple command driven session, Next: Getting some help, Prev: Basic command line options, Up: Quick start

Sample command driven session
=============================

Here is a short session using Festival's command interpreter.

Start Festival with no arguments

     $ festival
     Festival Speech Synthesis System 1.4.2:release July 2001
     Copyright (C) University of Edinburgh, 1996-2001. All rights reserved.
     For details type `(festival_warranty)'
     festival>

Festival uses a command line editor based on editline for terminal input, so command line editing may be done with Emacs commands. Festival also supports history as well as function, variable name, and file name completion via the <TAB> key.

Typing `help' will give you more information, that is `help' without any parentheses. (It is actually a variable name whose value is a string containing help.)

Festival offers what is called a read-eval-print loop, because it reads an s-expression (atom or list), evaluates it and prints the result. As Festival includes the SIOD Scheme interpreter, most standard Scheme commands work

     festival> (car '(a d))
     a
     festival> (+ 34 52)
     86

In addition to standard Scheme commands, a number of commands specific to speech synthesis are included. Although, as we will see, there are simpler methods for getting Festival to speak, here are the basic underlying explicit functions used in synthesizing an utterance.

Utterances can consist of various types (*Note Utterance types::), but the simplest form is plain text. We can create an utterance and save it in a variable

     festival> (set! utt1 (Utterance Text "Hello world"))
     #<Utterance 1d08a0>
     festival>

The (hex) number in the return value may be different for your installation. That is the print form for utterances. Their internal structure can be very large, so only a token form is printed.

Although this creates an utterance, it doesn't do anything else. To get a waveform you must synthesize it.

     festival> (utt.synth utt1)
     #<Utterance 1d08a0>
     festival>

This calls various modules, including tokenizing, duration, intonation, etc. Which modules are called is defined with respect to the type of the utterance, in this case `Text'. It is possible to call the modules individually by hand, but you just wanted it to talk, didn't you? So

     festival> (utt.play utt1)
     #<Utterance 1d08a0>
     festival>

will send the synthesized waveform to your audio device. You should hear "Hello world" from your machine.

To make this all easier, a small function doing these three steps exists. `SayText' simply takes a string of text, synthesizes it and sends it to the audio device.

     festival> (SayText "Good morning, welcome to Festival")
     #<Utterance 1d8fd0>
     festival>

Of course, as history and command line editing are supported, <c-p> or up-arrow will allow you to edit the above to whatever you wish.

Festival may also synthesize from files rather than simply text.

     festival> (tts "myfile" nil)
     nil
     festival>

The end-of-file character <c-d> will exit from Festival and return you to the shell; alternatively the command `quit' may be called (don't forget the parentheses).

Rather than starting the command interpreter, Festival may synthesize files specified on the command line

     unix$ festival --tts myfile
     unix$

Sometimes a simple waveform is required from text that is to be kept and played at some later time. The simplest way to do this with Festival is by using the `text2wave' program. This is a Festival script that will take a file (or text from standard input) and produce a single waveform.

An example use is

     text2wave myfile.txt -o myfile.wav

Options exist to specify the waveform file type; for example, if Sun audio format is required

     text2wave myfile.txt -otype snd -o myfile.wav

Use `-h' on `text2wave' to see all options.
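When many text files need converting, the `text2wave' invocations shown above can be generated in a loop. This Python sketch is an illustration only: `text2wave_commands' and `run_all' are hypothetical names, the sketch uses only the `-o' and `-otype' options shown above, and writing each waveform next to its text file with a `.wav' extension is an arbitrary choice.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def text2wave_commands(src_dir: str, otype: Optional[str] = None,
                       suffix: str = ".wav") -> List[List[str]]:
    """Return one text2wave command per *.txt file in src_dir, sorted by name.

    Each waveform is written next to its text file, with the extension
    replaced by `suffix`; `otype` (e.g. "snd") is passed on as -otype.
    """
    commands = []
    for txt in sorted(Path(src_dir).glob("*.txt")):
        cmd = ["text2wave", str(txt)]
        if otype is not None:
            cmd += ["-otype", otype]
        cmd += ["-o", str(txt.with_suffix(suffix))]
        commands.append(cmd)
    return commands


def run_all(commands: List[List[str]]) -> None:
    """Run each command in turn; check=True stops at the first failure.

    Requires the text2wave script to be on PATH.
    """
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Building the command lists separately from running them makes it easy to inspect (or print) what would be executed before starting a long batch job.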
File: festival.info, Node: Getting some help, Prev: Simple command driven session, Up: Quick start

Getting some help
=================

If no audio is generated, check whether audio is properly initialized on your machine. *Note Audio output::.

In the command interpreter, <m-h> (meta-h) will give you help on the current symbol before the cursor. This will be a short description of the function or variable, how to use it and what its arguments are. A listing of all such help strings appears at the end of this document. <m-s> will synthesize and say the same information, but this extra function is really just for show.

The Lisp function `manual' will send the appropriate command to an already running Netscape browser process. If `nil' is given as an argument, the browser will be directed to the table of contents of the manual. If a non-nil value is given, it is assumed to be a section title, and that section is searched for and, if found, displayed. For example

     festival> (manual "Accessing an utterance")

Another related function is `manual-sym', which, given a symbol, will check its documentation string for a cross reference to a manual section and request Netscape to display it. This function is bound to <m-m> and will display the appropriate section for the given symbol.

Note also that the <TAB> key can be used to find out the names of the available commands, as can the function `Help' (remember the parentheses).

For more up-to-date information on Festival, regularly check the Festival Home Page at

     `http://www.cstr.ed.ac.uk/projects/festival.html'

Further help is available by mailing questions to

     festival-help@cstr.ed.ac.uk

Although we cannot guarantee the time required to answer you, we will do our best to offer help.

Bug reports should be submitted to

     festival-bug@cstr.ed.ac.uk

If there is enough user traffic, a general mailing list will be created so all users may share comments and receive announcements. In the meantime, watch the Festival Home Page for news.
File: festival.info, Node: Scheme, Next: TTS, Prev: Quick start, Up: Top

Scheme
******

Many people seem daunted by the fact that Festival uses Scheme as its scripting language and feel they can't use Festival because they don't know Scheme. However, most of those same people use Emacs every day, which also has (a much more complex) Lisp system underneath. The number of Scheme commands you actually need to know in Festival is really very small, and you can easily just find out as you go along. Also, people use the Unix shell often but only know a small fraction of the actual commands available in the shell (or in fact that there even is a distinction between shell builtin commands and user definable ones). So take it easy; you'll learn the commands you need fairly quickly.

* Menu:

* Scheme references::           Places to learn more about Scheme
* Scheme fundamentals::         Syntax and semantics
* Scheme Festival specifics::
* Scheme I/O::

File: festival.info, Node: Scheme references, Next: Scheme fundamentals, Up: Scheme

Scheme references
=================

If you wish to learn about Scheme in more detail I recommend the book `abelson85'.

The Emacs Lisp documentation is reasonable, as it is comprehensive and many of the underlying uses of Scheme in Festival were influenced by Emacs. Emacs Lisp, however, is not Scheme, so there are some differences.

Other Scheme tutorials and resources available on the Web are

   * The Revised Revised Revised Revised Scheme Report, the document defining the language, is available from
     `http://tinuviel.cs.wcu.edu/res/ldp/r4rs-html/r4rs_toc.html'

   * a Scheme tutorial from the net:
        * `http://www.cs.uoregon.edu/classes/cis425/schemeTutorial.html'

   * the Scheme FAQ
        * `http://www.landfield.com/faqs/scheme-faq/part1/'

File: festival.info, Node: Scheme fundamentals, Next: Scheme Festival specifics, Prev: Scheme references, Up: Scheme

Scheme fundamentals
===================

But you want more now, don't you, not just to be referred to some other book. OK, here goes.
_Syntax_: an expression is an _atom_ or a _list_. A list consists of a left paren, a number of expressions and a right paren. Atoms can be symbols, numbers, strings or other special types like functions, hash tables, arrays, etc.

_Semantics_: All expressions can be evaluated. Lists are evaluated as function calls. When evaluating a list, all the members of the list are evaluated first, then the first item (a function) is called with the remaining items in the list as arguments. Atoms are evaluated depending on their type: symbols are evaluated as variables, returning their values. Numbers, strings, functions, etc. evaluate to themselves. A few special forms, such as `quote' (abbreviated with a leading quote character, as in `'(a d)'), `define', `set!' and `if', are the exception: they do not evaluate all of their arguments, which is why `(set! a 4)' below does not need `a' to have a value already.

Comments are started by a semicolon and run until the end of the line.

And that's it. There is nothing more to the language than that. But just in case you can't follow the consequences of that, here are some key examples.

     festival> (+ 2 3)
     5
     festival> (set! a 4)
     4
     festival> (* 3 a)
     12
     festival> (define (add a b) (+ a b))
     #<CLOSURE (a b) (+ a b)>
     festival> (add 3 4)
     7
     festival> (set! alist '(apples pears bananas))
     (apples pears bananas)
     festival> (car alist)
     apples
     festival> (cdr alist)
     (pears bananas)
     festival> (set! blist (cons 'oranges alist))
     (oranges apples pears bananas)
     festival> (append alist blist)
     (apples pears bananas oranges apples pears bananas)
     festival> (cons alist blist)
     ((apples pears bananas) oranges apples pears bananas)
     festival> (length alist)
     3
     festival> (length (append alist blist))
     7

File: festival.info, Node: Scheme Festival specifics, Next: Scheme I/O, Prev: Scheme fundamentals, Up: Scheme

Scheme Festival specifics
=========================

There are a number of additions to SIOD that are Festival specific, though still part of the Lisp system rather than the synthesis functions per se.

By convention, if the first statement of a function is a string, it is treated as a documentation string. The string will be printed when help is requested for that function symbol.
In interactive mode, if the function `:backtrace' is called (within parentheses), the previous stack trace is displayed. Calling `:backtrace' with a numeric argument will display that particular stack frame in full. Note that any command other than `:backtrace' will reset the trace. You may optionally call

     (set_backtrace t)

which will cause a backtrace to be displayed whenever a Scheme error occurs. This can be put in your `.festivalrc' if you wish. This is especially useful when running Festival in non-interactive mode (batch or script mode), so that more information is printed when an error occurs.

A _hook_ in Lisp terms is a position within some piece of code where a user may specify their own customization. The notion is used heavily in Emacs. In Festival there are a number of places where hooks are used. A hook variable contains either a function or a list of functions that are to be applied at some point in the processing. For example, the `after_synth_hooks' are applied after synthesis has been applied, to allow specific customization such as resampling or modification of the gain of the synthesized waveform. The Scheme function `apply_hooks' takes a hook variable and an object as arguments, and applies the function (or each function in the list, in turn) to the object.

When an error occurs in either Scheme or within the C++ part of Festival, by default the system jumps to the top level, resets itself and continues. Note that errors are usually serious things, pointing to bugs in parameters or code. Every effort has been made to ensure that the processing of text never causes errors in Festival. However, when using Festival as a development system, errors in code are common.

Sometimes in writing Scheme code you know there is a potential for an error, but you wish to ignore it and continue on to the next thing without exiting or stopping and returning to the top level.
For example, you are processing a number of utterances from a database and some files containing the descriptions have errors in them, but you want your processing to continue through every utterance that can be processed, rather than stopping 5 minutes after you've gone home, having set a big batch job running overnight.

Festival's Scheme provides the function `unwind-protect', which allows the catching of errors and then continuing normally. For example, suppose you have the function `process_utt', which takes a filename and does things which you know might cause an error. You can write the following to ensure you continue processing even if an error occurs.

     (unwind-protect
      (process_utt filename)
      (begin
       (format t "Error found in processing %s\n" filename)
       (format t "continuing\n")))

The `unwind-protect' function takes two arguments. The first is evaluated and, if no error occurs, the value returned from that expression is returned. If an error does occur while evaluating the first expression, the second expression is evaluated. Calls to `unwind-protect' may be nested. Note that all files opened while evaluating the first expression are closed if an error occurs. All global variables outside the scope of the `unwind-protect' will be left as they were set up until the error. Care should be taken in using this function, but its power is necessary to be able to write robust Scheme code.

File: festival.info, Node: Scheme I/O, Prev: Scheme Festival specifics, Up: Scheme

Scheme I/O
==========

Different Schemes may have quite different implementations of file i/o functions, so in this section we will describe the basic i/o functions in Festival's SIOD.

Simple printing to the screen may be achieved with the function `print', which prints the given s-expression to the screen. The printed form is preceded by a new line. This is often useful for debugging but isn't really powerful enough for much else.
Files may be opened and closed and referred to by file descriptors, in direct analogy to C's stdio library. The SIOD functions `fopen' and `fclose' work in exactly the same way as their equivalently named partners in C.

The `format' command follows the command of the same name in Emacs and a number of other Lisps. C programmers can think of it as `fprintf'. `format' takes a file descriptor, a format string and the arguments to print. The file descriptor may be one returned by the Scheme function `fopen'; it may also be `t', which means the output will be directed to standard out (cf. `printf'). A third possibility is `nil', which will cause the output to be printed to a string which is returned (cf. `sprintf').

The format string closely follows the format strings in ANSI C, but it is not the same. Specifically, the directives currently supported are `%%', `%d', `%x', `%s', `%f', `%g' and `%c'. All modifiers for these are also supported. In addition, `%l' is provided for printing Scheme objects as objects.

For example

     (format t "%03d %3.4f %s %l %l %l\n" 23 23 "abc" "abc" '(a b d) utt1)

will produce

     023 23.0000 abc "abc" (a b d) #<Utterance 32f228>

on standard output.

When large Lisp expressions are printed they are difficult to read because of the parentheses. The function `pprintf' prints an expression to a file descriptor (or `t' for standard out). It prints the s-expression so that it is nicely lined up and indented. This is often called pretty printing in Lisps.

For reading input from terminal or file, there is currently no equivalent to `scanf'. Items may only be read as Scheme expressions. The command

     (load FILENAME t)

will load all s-expressions in `FILENAME' and return them, unevaluated, as a list. Without the `t' argument, the `load' function will load and evaluate each s-expression in the file.

To read individual s-expressions use `readfp'. For example

     (let ((fd (fopen trainfile "r"))
           (entry)
           (count 0))
       ;; read one s-expression at a time until end of file
       (while (not (equal? (set! entry (readfp fd)) (eof-val)))
         (if (string-equal (car entry) "home")
             (set! count (+ 1 count))))
       (fclose fd)
       count)  ;; return the number of entries whose first item is "home"

To convert a symbol whose print name is a number to a number, use `parse-number'. This is the equivalent of `atof' in C.

Note that all i/o from Scheme input files is assumed to be basically some form of Scheme data (though it can be just numbers or tokens). For more elaborate analysis of incoming data it is possible to use the text tokenization functions, which offer a fully programmable method of reading data.

File: festival.info, Node: TTS, Next: XML/SGML mark-up, Prev: Scheme, Up: Top

TTS
***

Festival supports text to speech for raw text files. If you are not interested in using Festival in any other way except as a black box for rendering text as speech, the following method is probably what you want.

     festival --tts myfile

This will say the contents of `myfile'. Alternatively, text may be submitted on standard input

     echo hello world | festival --tts
     cat myfile | festival --tts

Festival supports the notion of _text modes_, where the text file type may be identified, allowing Festival to process the file in an appropriate way. Currently only two types are considered stable: `STML' and `raw', but other types such as `email', `HTML', `Latex', etc. are being developed and are discussed below. This follows the idea of buffer modes in Emacs, where a file's type can be utilized to best display the text. The text mode may also be selected based on a filename's extension.

Within the command interpreter the function `tts' is used to render files as speech; it takes a filename and the text mode as arguments.

* Menu:

* Utterance chunking::          From text to utterances
* Text modes::                  Mode specific text analysis
* Example text mode::           An example mode for reading email

File: festival.info, Node: Utterance chunking, Next: Text modes, Up: TTS

Utterance chunking
==================

Text to speech works by first tokenizing the file and chunking the tokens into utterances.
The definition of utterance breaks is determined by the decision tree in the variable `eou_tree'. A default version is given in `lib/tts.scm'. Obviously blank lines are probably the most reliable signal of an utterance break, followed by certain punctuation. Because periods are used both for sentence breaks and for abbreviations, some further heuristics are needed to best guess which use is intended. The following tree is currently used, and works better than simply using punctuation.

     (defvar eou_tree
     '((n.whitespace matches ".*\n.*\n\\(.\\|\n\\)*") ;; 2 or more newlines
       ((1))
       ((punc in ("?" ":" "!"))
        ((1))
        ((punc is ".")
         ;; This is to distinguish abbreviations vs periods
         ;; These are heuristics
         ((name matches "\\(.*\\..*\\|[A-Z][A-Za-z]?[A-Za-z]?\\|etc\\)")
          ((n.whitespace is " ")
           ((0))                  ;; if abbrev single space isn't enough for break
           ((n.name matches "[A-Z].*")
            ((1))
            ((0))))
          ((n.whitespace is " ")  ;; if it doesn't look like an abbreviation
           ((n.name matches "[A-Z].*")  ;; single space and non-cap is no break
            ((1))
            ((0)))
           ((1))))
         ((0))))))

The token items this is applied to will always (except in the end-of-file case) include one following token, so look-ahead is possible. The "n." and "p." and "p.p." prefixes allow access to the surrounding token context. The features `name', `whitespace' and `punc' allow access to the contents of the token itself. At present there is no way to access the lexicon from this tree, which is unfortunate, as it might be useful if certain abbreviations were identified as such there.

Note these are heuristics written by hand, not trained from data, though problems have been fixed as they have been observed in data. The above rules may make mistakes where abbreviations appear at the end of lines, and when improper spacing and capitalization are used. This is probably worth changing for modes where more casual text appears, such as email messages and USENET news messages.
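To see how the tree classifies particular token pairs, its logic can be traced outside Festival. The following is a minimal Python sketch, not Festival code: `Token', `eou_break' and the regex constants are hypothetical names, the Emacs-style regular expressions (`\\(', `\\|') have been translated into Python syntax, and the sketch assumes that Festival's `matches' operator must match the whole feature value. The `n.' features are read from the following token, whose `whitespace' is the whitespace that precedes it.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Python translations of the regexes in eou_tree. This sketch assumes that
# "matches" must match the whole string, so re.fullmatch is used throughout.
TWO_NEWLINES = re.compile(r".*\n.*\n(?:.|\n)*")
ABBREVIATION = re.compile(r".*\..*|[A-Z][A-Za-z]?[A-Za-z]?|etc")
CAPITALISED = re.compile(r"[A-Z].*")


@dataclass
class Token:
    name: str             # token text with punctuation stripped
    punc: str = ""        # trailing punctuation, e.g. "." or "?"
    whitespace: str = ""  # whitespace *before* this token


def eou_break(tok: Token, nxt: Optional[Token]) -> int:
    """Return 1 if an utterance break follows tok, 0 otherwise."""
    if nxt is None:
        return 1  # end of input: treated as a break in this sketch
    if TWO_NEWLINES.fullmatch(nxt.whitespace):  # 2 or more newlines
        return 1
    if tok.punc in ("?", ":", "!"):
        return 1
    if tok.punc != ".":
        return 0
    if ABBREVIATION.fullmatch(tok.name):
        # looks like an abbreviation: a single space is not enough for a break
        if nxt.whitespace == " ":
            return 0
        return 1 if CAPITALISED.fullmatch(nxt.name) else 0
    # ordinary word ending in a period
    if nxt.whitespace == " ":
        # single space and a non-capitalised next word is no break
        return 1 if CAPITALISED.fullmatch(nxt.name) else 0
    return 1
```

With this model, "Mr. Smith" (one space) gives 0, while "Mr.  Smith" (two spaces) gives 1, which illustrates the end-of-line and spacing mistakes mentioned above.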
A possible improvement could be made by analysing a text to find its basic threshold of utterance break (i.e. if no sequences of a full stop and two spaces followed by a capitalized word appear, and the text is of a reasonable length, then look for other criteria for utterance breaks).

Ultimately what we are trying to do is to chunk the text into utterances that can be synthesized quickly and start to play them quickly, to minimise the time someone has to wait for the first sound when starting synthesis. Thus it would be better if this chunking were done on _prosodic phrases_ rather than on chunks more similar to linguistic sentences. Prosodic phrases are bounded in size, while sentences are not.

File: festival.info, Node: Text modes, Next: Example text mode, Prev: Utterance chunking, Up: TTS

Text modes
==========

We do not believe that all texts are of the same type. Often information about the general contents of a file will aid synthesis greatly. For example, in LaTeX files we do not want to hear "left brace, backslash e m" before each emphasized word, nor do we necessarily want to hear formatting commands. Festival offers a basic method for specifying customization rules depending on the _mode_ of the text. In using the term mode we are following the notion of modes in Emacs, and eventually will allow customization at a similar level.

Modes are specified as the second argument to the function `tts'. When using the Emacs interface to Festival, the buffer mode is automatically passed as the text mode. If the mode is not supported, a warning message is printed and the raw text mode is used.

Our initial text mode implementation allows configuration both in C++ and in Scheme. Obviously in C++ almost anything can be done, but it is not as easy to reconfigure without recompilation. Here we will discuss those modes which can be fully configured at run time.

A text mode may contain the following

_filter_
     A Unix shell program filter that processes the text file in some appropriate way.
     For example, for email it might remove uninteresting headers and just output the subject, from line and the message body. If not specified, an identity filter is used.

_init_function_
     This (Scheme) function will be called before any processing is done. It allows further setup of tokenization rules, voices, etc.

_exit_function_
     This (Scheme) function will be called at the end of any processing, allowing resetting of tokenization rules, etc.

_analysis_mode_
     If the analysis mode is `xml', the file is read through the built-in XML parser `rxp'. Alternatively, if the analysis mode is `xxml', the filter should be an SGML normalising parser, and its output is processed in a way suitable for it. Any other value is ignored.

These mode-specific parameters are specified in the a-list held in `tts_text_modes'.

Note that the above mechanism is not really designed to be re-entrant; this should be addressed in later versions.

Following the use of auto-selection of mode in Emacs, Festival can auto-select the text mode based on the filename given when no explicit mode is given. The Lisp variable `auto-text-mode-alist' is a list of dotted pairs of regular expression and mode name. For example, to specify that the `email' mode is to be used for files ending in `.email', we would add to the current `auto-text-mode-alist' as follows

     (set! auto-text-mode-alist
           (cons (cons "\\.email$" 'email)
                 auto-text-mode-alist))

If the function `tts' is called with a mode other than `nil', that mode overrides any specified by the `auto-text-mode-alist'. The mode `fundamental' is the explicit "null" mode; it is used when no mode is specified in the function `tts' and no match is found in `auto-text-mode-alist', or when the specified mode is not found.

By convention, if a requested text mode is not found in `tts_text_modes', the file `MODENAME-mode' will be `required'.
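The selection rules just described can be summarised as a small function. This is a Python model rather than Festival's own code: `select_text_mode' and its parameters are hypothetical, `require' stands in for loading `MODENAME-mode', and the sketch assumes the regular expressions are searched for anywhere in the file name (as the `$'-anchored patterns above suggest) with the first matching entry winning, so entries consed onto the front of the list take priority.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

FUNDAMENTAL = "fundamental"  # the explicit "null" mode


def select_text_mode(
    filename: str,
    mode: Optional[str],
    auto_text_mode_alist: List[Tuple[str, str]],
    tts_text_modes: Dict[str, dict],
    require: Callable[[str], None],
) -> str:
    """Choose the text mode for filename following the rules described above.

    auto_text_mode_alist: (regex, mode) pairs, searched in order; first match wins.
    tts_text_modes: the known modes, keyed by name.
    require: stand-in for loading "MODENAME-mode"; it may add to tts_text_modes.
    """
    if mode is None:  # no explicit mode: consult the alist
        for pattern, candidate in auto_text_mode_alist:
            if re.search(pattern, filename):
                mode = candidate
                break
        else:
            return FUNDAMENTAL  # no match in auto_text_mode_alist
    if mode == FUNDAMENTAL:
        return FUNDAMENTAL
    if mode not in tts_text_modes:
        require(mode + "-mode")  # by convention, try to load MODENAME-mode
    return mode if mode in tts_text_modes else FUNDAMENTAL
```

Because loading happens only when a mode is first requested, a large collection of modes costs nothing until one of them is actually used.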
Therefore if you have the file `MODENAME-mode.scm' in your library, it will be loaded automatically when the mode is first referenced. Modes may be quite large, and it is not necessary to have Festival load them all at start-up time.

Because of the `auto-text-mode-alist' and the auto loading of currently undefined text modes, you can use Festival like

     festival --tts example.email

Festival will automatically synthesize `example.email' in text mode `email'.

If you add your own personal text modes you should do the following. Suppose you've written an HTML mode. You have named it `html-mode.scm' and put it in `/home/awb/lib/festival/'. In your `.festivalrc', first identify your personal Festival library directory by adding it to `lib-path'.

     (set! lib-path (cons "/home/awb/lib/festival/" lib-path))

Then add a definition to the `auto-text-mode-alist' stating that file names ending in `.html' or `.htm' should be read in HTML mode.

     (set! auto-text-mode-alist
           (cons (cons "\\.html?$" 'html)
                 auto-text-mode-alist))

Then you may synthesize an HTML file either from Scheme

     (tts "example.html" nil)

or from the shell command line

     festival --tts example.html

Anyone familiar with modes in Emacs should recognise that the process of adding a new text mode to Festival is very similar to adding a new buffer mode to Emacs.

File: festival.info, Node: Example text mode, Prev: Text modes, Up: TTS

Example text mode
=================

Here is a short example of a tts mode for reading email messages. It is by no means complete, but it is a start at showing how you can customize tts modes without writing new C++ code.

The first task is to define a filter that will take a saved mail message, remove extraneous headers and just leave the from line, subject and body of the message. The filter program is given a file name as its first argument and should output the result on standard out. For our purposes we will do this as a shell script.
     #!/bin/sh
     #  Email filter for Festival tts mode
     #  usage: email_filter mail_message >tidied_mail_message
     grep "^From: " "$1"
     echo
     grep "^Subject: " "$1"
     echo
     # delete up to first blank line (i.e. the header)
     sed '1,/^$/ d' "$1"

Next we define the email init function, which will be called when we start this mode. What we will do is save the current token to words function and slot in our own new one. We can then restore the previous one when we exit.

     (define (email_init_func)
      "Called on starting email text mode."
      (set! email_previous_t2w_func token_to_words)
      (set! english_token_to_words email_token_to_words)
      (set! token_to_words email_token_to_words))

Note that _both_ `english_token_to_words' and `token_to_words' should be set to ensure that our new token to words function is still used when we change voices.

The corresponding exit function puts the token to words function back.

     (define (email_exit_func)
      "Called on exit email text mode."
      (set! english_token_to_words email_previous_t2w_func)
      (set! token_to_words email_previous_t2w_func))

Now we can define the email-specific token to words function. In this example we deal with two specific cases. First we deal with the common form of email addresses, so that the angle brackets are not pronounced. The second is to recognise quoted text and immediately change the speaker to the alternative speaker.

     (define (email_token_to_words token name)
       "Email specific token to word rules."
       (cond

This first condition identifies the token as a bracketed email address, removes the brackets and splits the token into the user name and the host name. Note that we call the previous token to words function, `email_previous_t2w_func', on the user name and the host name so that they will be pronounced properly. Because that function returns a _list_ of words, we need to append the results together.
       ((string-matches name "<.*@.*>")
         (append
          (email_previous_t2w_func token
           (string-after (string-before name "@") "<"))
          (cons
           "at"
           (email_previous_t2w_func token
            (string-before (string-after name "@") ">")))))

Our next condition deals with identifying a greater than sign being used as a quote marker. When we detect this we select the alternative speaker, even though it may already be selected. We then return no words, so the quote marker is not spoken. The following condition finds greater than signs which are the first token on a line.

       ((and (string-matches name ">")
             (string-matches (item.feat token "whitespace")
                             "[ \t\n]*\n *"))
        (voice_don_diphone)
        nil ;; return nothing to say
       )

If it doesn't match any of these, we can go ahead and use the builtin token to words function. Actually, we call the function that was set before we entered this mode, to ensure any other specific rules still remain. But before that we need to check whether we've had a newline which doesn't start with a greater than sign. In that case we switch back to the primary speaker.

       (t  ;; for all other cases
         (if (string-matches (item.feat token "whitespace")
                             ".*\n[ \t\n]*")
             (voice_rab_diphone))
         (email_previous_t2w_func token name))))

In addition to these we have to actually declare the text mode. This we do by adding to any existing modes as follows.

     (set! tts_text_modes
        (cons
         (list
           'email   ;; mode name
           (list         ;; email mode params
            (list 'init_func email_init_func)
            (list 'exit_func email_exit_func)
            '(filter "email_filter")))
         tts_text_modes))

This will now allow simple email messages to be dealt with in a mode-specific way.

An example mail message is included in `examples/ex1.email'. To hear the result of the above text mode, start Festival, load in the email mode descriptions, and call TTS on the example file.

     (tts ".../examples/ex1.email" 'email)

The above is very far from a real email mode, but it does illustrate how one might go about building one.
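The two rules in `email_token_to_words' can be prototyped outside Festival before being written in Scheme. This Python sketch is an approximation, not the mode itself: it works on whole lines rather than Festival token items, treats a `>' separated by whitespace as the quote marker, and the function names and the "don"/"rab" labels (after `voice_don_diphone' and `voice_rab_diphone') are hypothetical.

```python
import re
from typing import List, Tuple

BRACKETED_ADDRESS = re.compile(r"<.*@.*>")  # whole-token match, as in the mode


def address_words(name: str) -> List[str]:
    """Split a token like '<user@host>' into [user, "at", host].

    Mirrors the string-before/string-after calls: user is the text between the
    first '<' and the first '@', host the text between that '@' and the next '>'.
    In the Scheme mode both parts are passed on to the previous token to words
    function; here they are returned unchanged.
    """
    if not BRACKETED_ADDRESS.fullmatch(name):
        return [name]
    before_at, after_at = name.split("@", 1)
    user = before_at.split("<", 1)[1]
    host = after_at.split(">", 1)[0]
    return [user, "at", host]


def assign_voices(lines: List[str]) -> List[Tuple[str, str]]:
    """Pair each non-empty body line with the speaker the mode would select.

    A line whose first token is ">" selects the alternative speaker ("don")
    and the marker itself is dropped; any other line selects the primary
    speaker ("rab").
    """
    result = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # blank line: nothing to say
        if tokens[0] == ">":
            result.append(("don", " ".join(tokens[1:])))
        else:
            result.append(("rab", " ".join(tokens)))
    return result
```

Running a few saved messages through a model like this is a quick way to check which lines would switch speaker before committing the rules to a text mode.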
It should be reiterated that text modes are new in Festival and their
most effective form has not been discovered yet.  This will improve with
time and experience.

File: festival.info,  Node: XML/SGML mark-up,  Next: Emacs interface,  Prev: TTS,  Up: Top

XML/SGML mark-up
****************

The idea of a general, synthesizer-independent mark-up language for
labelling text has been under discussion for some time.  Festival has
supported an SGML based markup language through multiple versions, most
recently STML (`sproat97').  This is based on the earlier SSML (Speech
Synthesis Markup Language) which was supported by previous versions of
Festival (`taylor96').  With this version of Festival we support
_Sable_, a similar mark-up language devised by a consortium from Bell
Labs, Sun Microsystems, AT&T and Edinburgh, `sable98'.  Unlike the
previous versions, which were SGML based, the implementation of Sable in
Festival is now XML based.  To the user the difference is negligible,
but using XML makes processing of files easier and more standardized.
Also Festival now includes an XML parser, thus reducing the dependencies
in processing Sable text.

Raw text has the problem that it cannot always easily be rendered as
speech in the way the author wishes.  Sable offers a well-defined way of
marking up text so that the synthesizer may render it appropriately.

The definition of Sable is by no means settled and is still in
development.  In this release Festival offers people working on Sable
and other XML (and SGML) based markup languages a chance to quickly
experiment with prototypes by providing a DTD (document type definition)
and the mapping of the elements in the DTD to Festival functions.
Although we have not yet (personally) investigated facilities like
cascading style sheets and generalized SGML specification languages like
DSSSL, we believe the facilities offered by Festival allow rapid
prototyping of speech output markup languages.
Primarily we see Sable markup text as a language that will be generated
by other programs, e.g. text generation systems, dialog managers etc.;
therefore a standard, easy-to-parse format is required, even if it seems
overly verbose for human writers.

For more information on Sable and access to the mailing list see
     `http://www.cstr.ed.ac.uk/projects/sable.html'

* Menu:

* Sable example::               an example of Sable with descriptions
* Supported Sable tags::        Currently supported Sable tags
* Adding Sable tags::           Adding new Sable tags
* XML/SGML requirements::       Software environment requirements for use
* Using Sable::                 Rendering Sable files as speech

File: festival.info,  Node: Sable example,  Next: Supported Sable tags,  Up: XML/SGML mark-up

Sable example
=============

Here is a simple example of Sable marked up text

     <?xml version="1.0"?>
     <!DOCTYPE SABLE PUBLIC "-//SABLE//DTD SABLE speech mark up//EN"
           "Sable.v0_2.dtd"
     []>
     <SABLE>
     <SPEAKER NAME="male1">

     The boy saw the girl in the park <BREAK/> with the telescope.
     The boy saw the girl <BREAK/> in the park with the telescope.

     Good morning <BREAK /> My name is Stuart, which is spelled
     <RATE SPEED="-40%">
     <SAYAS MODE="literal">stuart</SAYAS> </RATE>
     though some people pronounce it
     <PRON SUB="stoo art">stuart</PRON>.  My telephone number
     is <SAYAS MODE="literal">2787</SAYAS>.

     I used to work in <PRON SUB="Buckloo">Buccleuch</PRON> Place,
     but no one can pronounce that.

     By the way, my telephone number is actually
     <AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.2.au"/>
     <AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.7.au"/>
     <AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.8.au"/>
     <AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.7.au"/>.
     </SPEAKER>
     </SABLE>

After the initial definition of the SABLE tags, through the file
`Sable.v0_2.dtd', which is distributed as part of Festival, the body is
given.  There are tags for identifying the language and the voice.
Explicit boundary markers may be given in text.
Also duration and intonation control can be explicitly specified, as can
new pronunciations of words.  The last sentence specifies some external
filenames to play at that point.

File: festival.info,  Node: Supported Sable tags,  Next: Adding Sable tags,  Prev: Sable example,  Up: XML/SGML mark-up

Supported Sable tags
====================

There is not yet a definitive set of tags but hopefully such a list will
form over the next few months.  As adding support for new tags is often
trivial, the problem lies much more in defining what tags there should
be than in actually implementing them.  The following are based on
version 0.2 of Sable as described in
`http://www.cstr.ed.ac.uk/projects/sable_spec2.html', though some
aspects are not currently supported in this implementation.  Further
updates will be announced through the Sable mailing list.

`LANGUAGE'
     Allows the specification of the language through the `ID'
     attribute.  Valid values in Festival are `english', `en1',
     `spanish', `en', and others depending on your particular
     installation.  For example
          <LANGUAGE id="english"> ... </LANGUAGE>

     If the language isn't supported by the particular installation of
     Festival, "Some text in .." is said instead and the section is
     omitted.

`SPEAKER'
     Select a voice.  Accepts a parameter `NAME' which takes values
     `male1', `male2', `female1', etc.  There is currently no
     definition of what happens when a voice is selected which the
     synthesizer doesn't support.  An example is
          <SPEAKER name="male1"> ... </SPEAKER>

`AUDIO'
     This allows the specification of an external waveform that is to
     be included.  There are attributes for specifying volume and
     whether the waveform is to be played in the background of the
     following text or not.  Festival as yet only supports insertion.
          My telephone number is
          <AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.2.au"/>
          <AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.7.au"/>
          <AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.8.au"/>
          <AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.7.au"/>.

`MARKER'
     This allows Festival to mark when a particular part of the text has
     been reached.  At present, simply the value of the `MARK' attribute
     is printed.  This is done when that piece of text is analyzed, not
     when it is played.  Using this in any real application would
     require changes to this tag's implementation.
          Move the <MARKER MARK="mouse" /> mouse to the top.

`BREAK'
     Specifies a boundary at some `LEVEL'.  The level may be `Large',
     `Medium', `Small' or a number.  Note that this is an empty tag and
     must include the closing part within its own specification.
          <BREAK LEVEL="LARGE"/>

`DIV'
     This signals a division.  In Festival this causes an utterance
     break.  A `TYPE' attribute may be specified but it is ignored by
     Festival.

`PRON'
     Allows the pronunciation of the enclosed text to be explicitly
     given.  It supports the attributes `IPA' for an IPA specification
     (not currently supported by Festival); `SUB', text to be
     substituted, which can be in some form of phonetic spelling; and
     `ORIGIN', where the linguistic origin of the enclosed text may be
     identified to assist in etymologically sensitive letter to sound
     rules.
          <PRON SUB="toe maa toe">tomato</PRON>

`SAYAS'
     Allows identification of the enclosed tokens/text.  The attribute
     `MODE' can take any of the following values: `literal', `date',
     `time', `phone', `net', `postal', `currency', `math', `fraction',
     `measure', `ordinal', `cardinal', or `name'.  Further
     specification of type for dates (MDY, DMY etc) may be specified
     through the `MODETYPE' attribute.
          As a test of marked-up numbers.
          Here we have a year <SAYAS MODE="date">1998</SAYAS>,
          an ordinal <SAYAS MODE="ordinal">1998</SAYAS>,
          a cardinal <SAYAS MODE="cardinal">1998</SAYAS>,
          a literal <SAYAS MODE="literal">1998</SAYAS>,
          and phone number <SAYAS MODE="phone">1998</SAYAS>.

`EMPH'
     Specifies that the enclosed text should be emphasized.  A `LEVEL'
     attribute may be specified but its value is currently ignored by
     Festival (besides, the emphasis Festival generates isn't very good
     anyway).
          The leaders of <EMPH>Denmark</EMPH> and <EMPH>India</EMPH> meet on
          Friday.

`PITCH'
     Allows the specification of pitch range, mid and base points.
          Without his penguin, <PITCH BASE="-20%"> which he left at home, </PITCH>
          he could not enter the restaurant.

`RATE'
     Allows the specification of speaking rate.
          The address is <RATE SPEED="-40%"> 10 Main Street </RATE>.

`VOLUME'
     Allows the specification of volume.  Note that in Festival this
     causes an utterance break before and after this tag.
          Please speak more <VOLUME LEVEL="loud">loudly</VOLUME>, except
          when I ask you to speak <VOLUME LEVEL="quiet">in a quiet voice</VOLUME>.

`ENGINE'
     This allows specification of engine specific commands.
          An example is
          <ENGINE ID="festival" DATA="our own festival speech synthesizer">
          the festival speech synthesizer</ENGINE> or
          the Bell Labs speech synthesizer.

These tags may change in name but they cover the aspects of speech mark
up that we wish to express.  Later additions and changes to these are
expected.

See the files `festival/examples/example.sable' and
`festival/examples/example2.sable' for working examples.

Note the definition of Sable is ongoing and there are likely to be
later, more complete implementations of Sable for Festival as
independent releases; consult
`http://www.cstr.ed.ac.uk/projects/sable.html' for the most recent
updates.
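Since Sable is expected to be produced mostly by other programs, such a
generator must escape the XML special characters in the plain text it
emits, while passing its own tags through untouched.  The following is
a minimal shell sketch under stated assumptions: the function names
`sable_escape', `touchtone_audio' and `sable_doc' are hypothetical; the
document header and the default speaker `male1' are copied from the
example earlier in this chapter; and `touchtone_audio' assumes a file
`touchtone.D.au' exists under the same base URL for every digit D,
although only 2, 7 and 8 appear in the example.

```shell
#!/bin/sh
# Hypothetical helpers for a program that generates Sable text.

# Escape the XML special characters in plain text read from stdin.
# & is replaced first so the entities added for < and > are not
# themselves re-escaped.
sable_escape () {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Print one AUDIO element per digit of $1, following the touchtone.N.au
# naming used in the examples (assumed to exist for every digit).
# Non-digits such as spaces or dashes are skipped.
touchtone_audio () {
    base='http://www.cstr.ed.ac.uk/~awb/sounds'
    digits=$1
    while [ -n "$digits" ]; do
        rest=${digits#?}          # everything after the first character
        d=${digits%"$rest"}       # just the first character
        case $d in
            [0-9]) echo "<AUDIO SRC=\"$base/touchtone.$d.au\"/>" ;;
        esac
        digits=$rest
    done
}

# Wrap already marked-up text from stdin in a complete Sable document.
# $1 is the speaker name (default male1); it is not escaped, so it
# should be a plain name.
sable_doc () {
    echo '<?xml version="1.0"?>'
    echo '<!DOCTYPE SABLE PUBLIC "-//SABLE//DTD SABLE speech mark up//EN"'
    echo '      "Sable.v0_2.dtd"'
    echo '[]>'
    echo '<SABLE>'
    echo "<SPEAKER NAME=\"${1:-male1}\">"
    cat
    echo '</SPEAKER>'
    echo '</SABLE>'
}
```

Only the free text goes through `sable_escape'; the generated `AUDIO'
elements are joined afterwards so their angle brackets survive, and the
result can then be rendered as described under "Using Sable":

     { echo 'Call Tom & Jerry on' | sable_escape
       touchtone_audio 2787
       echo '.'
     } | sable_doc male1 > fred.sable
     festival --tts fred.sable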
File: festival.info,  Node: Adding Sable tags,  Next: XML/SGML requirements,  Prev: Supported Sable tags,  Up: XML/SGML mark-up

Adding Sable tags
=================

We do not yet claim that there is a fixed standard for Sable tags but we
wish to move towards such a standard.  In the meantime we have made it
easy in Festival to add support for new tags without, in general, having
to change any of the core functions.

Two changes are necessary to add a new tag.  First, change the
definition in `lib/Sable.v0_2.dtd', so that Sable files may use it.  The
second stage is to make Festival sensitive to that new tag.  The example
in `festival/lib/sable-mode.scm' shows how a new text mode may be
implemented for an XML/SGML-based markup language.  The basic point is
that an identified function will be called on finding a start tag or end
tag in the document.  It is the tag-function's job to synthesize the
given utterance if the tag signals an utterance boundary.  The return
value from the tag-function is the new status of the current utterance:
either the utterance unchanged, or `nil' if the current utterance has
been synthesized, signalling the start of a new utterance.

Note the hierarchical structure of the document is not available in
this method of tag-functions.  Any hierarchical state that must be
preserved has to be kept using explicit stacks in Scheme.  This is an
artifact of the cross relationship between utterances and tags
(utterances may end within start and end tags), and of the desire to
have all specification in Scheme rather than C++.

The tag-functions are defined in an elements list.  They are identified
with names such as "(SABLE" and ")SABLE", denoting start and end tags
respectively.  Two arguments are passed to these tag functions: an assoc
list of attributes and values as specified in the document, and the
current utterance.  If the tag denotes an utterance break, call
`xxml_synth' on `UTT' and return `nil'.
If a tag (start or end) is found in the document and there is no
corresponding tag-function, it is ignored.

New features may be added to words with a start and end tag by adding
features to the global `xxml_word_features'.  Any features in that
variable will be added to each word.

Note that this method may be used for both XML based languages and SGML
based markup languages (though an external normalizing SGML parser is
required in the SGML case).  The type (XML vs SGML) is identified by the
`analysis_type' parameter in the tts text mode specification.

File: festival.info,  Node: XML/SGML requirements,  Next: Using Sable,  Prev: Adding Sable tags,  Up: XML/SGML mark-up

XML/SGML requirements
=====================

Festival is distributed with `rxp', an XML parser developed by Richard
Tobin of the Language Technology Group, University of Edinburgh.  Sable
is set up as an XML text mode, so no further requirements or external
programs are needed to synthesize from Sable marked up text (unlike
previous releases).  Note that `rxp' is not a full validating parser and
hence doesn't check some aspects of the file (tags within tags).

Festival still supports SGML based markup, but in such cases it requires
an external SGML normalizing parser.  We have tested `nsgmls-1.0', which
is available as part of the SGML tools set `sp-1.1.tar.gz' from
`http://www.jclark.com/sp/index.html'.  This seems portable between many
platforms.

File: festival.info,  Node: Using Sable,  Prev: XML/SGML requirements,  Up: XML/SGML mark-up

Using Sable
===========

Support in Festival for Sable is as a text mode.  In command mode use
the following to process a Sable file

     (tts "file.sable" 'sable)

Also the automatic selection of mode based on file type has been set up
such that files ending in `.sable' will be automatically synthesized in
this mode.  Thus

     festival --tts fred.sable

will render `fred.sable' as speech in Sable mode.

Another way of using Sable is through the Emacs interface.
The say-buffer command will send the Emacs buffer to Festival, using the
buffer's Emacs mode as its tts-mode.  If the Emacs mode is stml or sgml
the file is treated as a Sable file.  *Note Emacs interface::

Many people experimenting with Sable (and TTS in general) want all the
waveform output to be saved to be played at a later date.  The simplest
way to do this is using the `text2wave' script.  It respects the auto
mode selection, so

     text2wave fred.sable -o fred.wav

Note this renders the file as a single waveform (done by concatenating
the waveforms for each utterance in the Sable file).

If you wish the waveform for each utterance in a file to be saved, you
can cause the tts process to save the waveforms during synthesis.  After
a call to

     festival> (save_waves_during_tts)

any future call to `tts' will cause the waveforms to be saved in a file
`tts_file_xxx.wav' where `xxx' is a number.  A call to
`(save_waves_during_tts_STOP)' will stop saving the waves.  A message is
printed when each waveform is saved; otherwise people forget about this
and wonder why their disk has filled up.

This is done by inserting a function in `tts_hooks' which saves the
wave.  To do other things to each utterance during TTS (such as saving
the utterance structure), try redefining the function `save_tts_output'
(see `festival/lib/tts.scm').
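When several Sable files should each become their own waveform,
`text2wave' can simply be run once per file.  The following is a
minimal sketch, assuming `text2wave' is on the `PATH' and is invoked
exactly as in the command above (input file, then `-o' and the output
name); the function name `sable_to_waves' is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: render each .sable file given as an argument to
# a .wav file alongside it, relying on text2wave's automatic selection
# of Sable mode from the .sable suffix.
sable_to_waves () {
    status=0
    for f in "$@"; do
        case $f in
            *.sable) out=${f%.sable}.wav ;;
            *) echo "skipping $f: not a .sable file" >&2; continue ;;
        esac
        if text2wave "$f" -o "$out"; then
            echo "$f -> $out"
        else
            echo "text2wave failed on $f" >&2
            status=1
        fi
    done
    return $status      # non-zero if any file failed to render
}
```

For example, `sable_to_waves *.sable' turns `fred.sable' into
`fred.wav'.  Unlike `(save_waves_during_tts)', this produces one
concatenated waveform per file rather than one per utterance.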