<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52
     from ../festival.texi on 2 August 2001 -->

<TITLE>Festival Speech Synthesis System - 28  API</TITLE>
</HEAD>
<BODY bgcolor="#ffffff">
Go to the <A HREF="festival_1.html">first</A>, <A HREF="festival_27.html">previous</A>, <A HREF="festival_29.html">next</A>, <A HREF="festival_35.html">last</A> section, <A HREF="festival_toc.html">table of contents</A>.
<P><HR><P>


<H1><A NAME="SEC127" HREF="festival_toc.html#TOC127">28  API</A></H1>

<P>
If you wish to use Festival within some other application there are
a number of possible interfaces.  

</P>



<H2><A NAME="SEC128" HREF="festival_toc.html#TOC128">28.1  Scheme API</A></H2>

<P>
<A NAME="IDX355"></A>
Festival includes a full programming language, Scheme (a variant of
Lisp), as a powerful interface to its speech synthesis functions.
Often this will be the easiest method of controlling Festival's
functionality.  Even the other APIs ultimately depend on the Scheme
interpreter.

</P>
<P>
Scheme commands (as s-expressions) may simply be written in files and
interpreted by Festival, whether the files are named as arguments on
the command line, loaded from the interactive interpreter, or piped
through standard input.  Suppose we have a file <TT>`hello.scm'</TT> containing

</P>

<PRE>
;; A short example file with Festival Scheme commands
(voice_rab_diphone) ;; select Gordon
(SayText "Hello there")
(voice_don_diphone) ;; select Donovan
(SayText "and hello from me")
</PRE>

<P>
From the command interpreter we can execute the commands in this file
by loading them

<PRE>
festival&#62; (load "hello.scm")
nil
</PRE>

<P>
Or we can execute the commands in the file directly from the
shell command line 

<PRE>
unix$ festival -b hello.scm
</PRE>

<P>
The <SAMP>`-b'</SAMP> option denotes batch operation: the file is loaded
and then Festival exits without starting the command interpreter.
Without the <SAMP>`-b'</SAMP> option Festival will load
<TT>`hello.scm'</TT> and then accept commands on standard input.  This can
be convenient when some initial set up is required for a session.

</P>
<P>
One disadvantage of the batch method is that Festival must initialise
itself every time it starts up.  Although this typically takes only a
few seconds, that lead-in time may be unacceptable when saying short
individual expressions.  In such cases it is preferable to execute the
commands within an already running system, or to use the server/client
mode.

</P>
<P>
Of course it's not just about strings of commands.  Because Scheme is a
fully functional language, functions, loops, variables, file access and
arithmetic operations may all be used in your Scheme programs.
Also, access to Unix is available through the <CODE>system</CODE>
function.  For many applications, programming them directly in Scheme is
both the easiest and the most efficient method.

</P>
<P>
<A NAME="IDX356"></A>
A number of example Festival scripts are included in <TT>`examples/'</TT>,
including a program for saying the time and one for telling you the latest
news (by accessing a page from the web).  Also see the
detailed discussion of a script example in section <A HREF="festival_29.html#SEC138">29.1  POS Example</A>.

</P>


<H2><A NAME="SEC129" HREF="festival_toc.html#TOC129">28.2  Shell API</A></H2>

<P>
<A NAME="IDX357"></A>
The simplest use of Festival (though not the most powerful) is
to render text files directly as speech.  Suppose
we have a file <TT>`hello.txt'</TT> containing

<PRE>
Hello world.  Isn't it excellent weather
this morning.
</PRE>

<P>
We can simply call Festival as

<PRE>
unix$ festival --tts hello.txt
</PRE>

<P>
Or for even simpler one-off phrases

<PRE>
unix$ echo "hello " | festival --tts
</PRE>

<P>
This is easy to use but you will need to wait for Festival to start up
and initialise its databases before it starts to render the text as
speech.  This may take several seconds on some machines.  A socket based
server mechanism is provided in Festival which will allow a single
server process to start up once and be used efficiently by multiple
client programs.

</P>
<P>
Note also the use of Sable for marked-up text; see section <A HREF="festival_10.html#SEC31">10  XML/SGML mark-up</A>.
Sable allows various forms of additional information in text, such as
phrasing, emphasis and pronunciation, as well as changing voices and
including external waveform files (e.g. random noises).  For many
applications this will be the preferred interface method.  Other text
modes are also available through the command line by using
<CODE>auto-text-mode-alist</CODE>.

</P>


<H2><A NAME="SEC130" HREF="festival_toc.html#TOC130">28.3  Server/client API</A></H2>

<P>
<A NAME="IDX358"></A>
Festival offers a BSD socket-based interface.  This allows
Festival to run as a server that client programs can access.
Basically the server offers a new command interpreter for
each client that attaches to it.  The server is forked for each
client but this is much faster than having to wait for a 
Festival process to start from scratch.  Also the server can
run on a bigger machine, offering much faster synthesis.

</P>
<P>
<EM>Note: the Festival server is inherently insecure and may allow
arbitrary users access to your machine.</EM>

</P>
<P>
Every effort has been made to minimise the risk of unauthorised access
through Festival and a number of levels of security are provided.
However, as with any program offering socket access, like <CODE>httpd</CODE>,
<CODE>sendmail</CODE> or <CODE>ftpd</CODE>, there is a risk that unauthorised access
is possible.  I trust Festival's security enough to often run it on my
own machine and departmental servers, restricting access to within our
department.  Please read the information below before using
the Festival server so you understand the risks.

</P>


<H3><A NAME="SEC131" HREF="festival_toc.html#TOC131">28.3.1  Server access control</A></H3>

<P>
<A NAME="IDX359"></A>
<A NAME="IDX360"></A>
The following access control is available for Festival when
running as a server.  When the server starts it will usually
start by loading in various commands specific for the task
it is to be used for.  The following variables are used
to control access.
<DL COMPACT>

<DT><CODE>server_port</CODE>
<DD>
A number identifying the inet socket port.  By default this
is 1314.  It may be changed as required.
<DT><CODE>server_log_file</CODE>
<DD>
If nil, no logging takes place; if t, logging is printed to standard out;
and if a file name, log messages are appended to that file. All
connections and attempted connections are logged with a time stamp
and the name of the client.  All commands sent from the client
are also logged (output and data input are not logged).
<DT><CODE>server_deny_list</CODE>
<DD>
If non-nil it is used to identify which machines are not allowed
access to the server.  This is a list of regular expressions.
If the host name of the client matches any of the regexes in this
list the client is denied access.   This overrides all other
access methods.  Remember that sometimes hosts are identified as
numbers not as names.
<DT><CODE>server_access_list</CODE>
<DD>
If this is non-nil only machines whose names match at least one of the
regexes in this list may connect as clients.  Remember that sometimes
hosts are identified as numbers not as names, so list the IP number of
a machine as well as its name if it should always be admitted, and
exclude a machine by IP number as well as by name to be properly secure.
<DT><CODE>server_passwd</CODE>
<DD>
If this is non-nil, the client must send this passwd to the server
followed by a newline before access is given.  This is required
even if the machine is included in the access list.  This is designed
so servers for specific tasks may be set up with reasonable security.
<DT><CODE>(set_server_safe_functions FUNCNAMELIST)</CODE>
<DD>
If called this can restrict which functions the client may call.  This
is the most restrictive form of access, and thoroughly recommended.  In
this mode it would be normal to include only the specific functions the
client can execute (i.e. the function to set up output, and a tts
function).  For example a server could call the following at
set up time, thus restricting calls to only those that
<TT>`festival_client'</TT> <CODE>--ttw</CODE> uses.

<PRE>
(set_server_safe_functions 
        '(tts_return_to_client tts_text tts_textall Parameter.set))
</PRE>

</DL>
<P>
It is strongly recommended that you run Festival in server mode as userid
<CODE>nobody</CODE> to limit the access the process will have.  Running it
in a chroot environment is more secure still.

</P>
<P>
For example suppose we wish to allow access to all machines in the CSTR
domain except for <CODE>holmes.cstr.ed.ac.uk</CODE> and
<CODE>adam.cstr.ed.ac.uk</CODE>.  This may be done by the following two
commands

<PRE>
(set! server_deny_list '("holmes\\.cstr\\.ed\\.ac\\.uk" 
                         "adam\\.cstr\\.ed\\.ac\\.uk"))
(set! server_access_list '("[^\\.]*\\.cstr\\.ed\\.ac\\.uk"))
</PRE>

<P>
This is not complete though as when DNS is not working <CODE>holmes</CODE> and
<CODE>adam</CODE> will still be able to access the server (but if our DNS
isn't working we probably have more serious problems).  However the
above is secure in that only machines in the domain <CODE>cstr.ed.ac.uk</CODE>
can access the server, though there may be ways to fix machines to
identify themselves as being in that domain even when they are not.

</P>
<P>
By default Festival in server mode will only accept client connections
from <CODE>localhost</CODE>.

</P>
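<P>
The way the deny and access lists combine can be sketched in a few lines
of C++.  This is only an illustration of the rules described above, not
Festival's own code: the names <CODE>matches_any</CODE> and
<CODE>client_allowed</CODE> are hypothetical, an empty vector stands in
for a nil list, each pattern is assumed to have to match the whole host
name, and <CODE>std::regex</CODE> is used in place of Festival's own
regular expression library.  The passwd check and the default
<CODE>localhost</CODE> restriction are left out.

</P>
<PRE>
#include &#60;regex&#62;
#include &#60;string&#62;
#include &#60;vector&#62;

// True if `host` matches at least one pattern in `patterns`.
// Assumption: a pattern has to match the whole host name.
static bool matches_any(const std::string &#38;host,
                        const std::vector&#60;std::string&#62; &#38;patterns)
{
    for (const std::string &#38;p : patterns)
        if (std::regex_match(host, std::regex(p)))
            return true;
    return false;
}

// Hypothetical helper: decide whether a client host may connect.
// An empty vector plays the role of a nil list.
bool client_allowed(const std::string &#38;host,
                    const std::vector&#60;std::string&#62; &#38;deny_list,
                    const std::vector&#60;std::string&#62; &#38;access_list)
{
    if (matches_any(host, deny_list))
        return false;                 // the deny list overrides everything
    if (!access_list.empty())
        return matches_any(host, access_list);
    return true;                      // no access list: no restriction here
}
</PRE>

<P>
With the two lists from the CSTR example, <CODE>client_allowed</CODE>
refuses <CODE>holmes.cstr.ed.ac.uk</CODE> because the deny list is checked
first, accepts other hosts directly under <CODE>cstr.ed.ac.uk</CODE>, and
refuses hosts outside that domain.

</P>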


<H3><A NAME="SEC132" HREF="festival_toc.html#TOC132">28.3.2  Client control</A></H3>

<P>
<A NAME="IDX361"></A>
<A NAME="IDX362"></A>
An example client program called <TT>`festival_client'</TT> is
included with the system that provides a wide range of access methods
to the server.  A number of options for the client are offered.

</P>
<DL COMPACT>

<DT><CODE>--server</CODE>
<DD>
The name (or IP number) of the server host.  By default this
is <TT>`localhost'</TT> (i.e. the same machine you run the client on).
<DT><CODE>--port</CODE>
<DD>
The port number the Festival server is running on.  By default this
is 1314.
<DT><CODE>--output FILENAME</CODE>
<DD>
If a waveform is to be synchronously returned, it will be saved in
<VAR>FILENAME</VAR>.   The <CODE>--ttw</CODE> option uses this as does the
use of the Festival command <CODE>utt.send.wave.client</CODE>.  If 
an output waveform file is received by <TT>`festival_client'</TT>
and no output file has been given the waveform is discarded with
an error message.
<DT><CODE>--passwd PASSWD</CODE>
<DD>
If a passwd is required by the server this should be stated
on the client call.  <VAR>PASSWD</VAR> is sent plus a newline
before any other communication takes place.  If this isn't
specified and a passwd is required, you must enter that first.
If the <CODE>--ttw</CODE> option is used, a passwd is required and
none is specified, access will be denied.
<DT><CODE>--prolog FILE</CODE>
<DD>
<VAR>FILE</VAR> is assumed to contain Festival commands and its contents
are sent to the server after the passwd but before anything else.  This
is convenient to use in conjunction with <CODE>--ttw</CODE> which otherwise
does not offer any way to send commands as well as the text to the
server.
<DT><CODE>--otype OUTPUTTYPE</CODE>
<DD>
If an output waveform file is to be used this specifies the output type
of the file.  The default is <CODE>nist</CODE>, but <CODE>ulaw</CODE>,
<CODE>riff</CODE> and others as supported by the Edinburgh
Speech Tools Library are valid.  You may use raw too but note that
Festival may return waveforms of various sampling rates depending on the
sample rates of the databases it is using.  You can of course make
Festival only return one particular sample rate, by using
<CODE>after_synth_hooks</CODE>.  Note that byte order will be that of the
<EM>client</EM> machine if the output format allows it.
<DT><CODE>--ttw</CODE>
<DD>
Text to wave is an attempt to make <CODE>festival_client</CODE> useful
in many simple applications.  Although you can connect to the server
and send arbitrary Festival Scheme commands, this option automatically
does what you probably want most often.  When specified
this option takes text from the specified file (or stdin),
synthesizes it (in one go) and saves it in the specified output
file.  It basically does the following

<PRE>
(Parameter.set 'Wavefiletype '&#60;output type&#62;)
(tts_textall "
&#60;file/stdin contents&#62;
")
</PRE>

Note that this is best used for small, single utterance texts as you
have to wait for the whole text to be synthesized before it is returned.
<DT><CODE>--aucommand COMMAND</CODE>
<DD>
Execute <VAR>COMMAND</VAR> on each waveform returned by the server.   The
variable <CODE>FILE</CODE> will be set when <VAR>COMMAND</VAR> is executed.
<DT><CODE>--async</CODE>
<DD>
To reduce the delay between the text being sent and the first sound
being available to play, this option in conjunction with <CODE>--ttw</CODE>
causes the text to be synthesized utterance by utterance and be sent back
in separate waveforms.  Using <CODE>--aucommand</CODE> each waveform may
be played locally, and when <TT>`festival_client'</TT> is interrupted
the sound will stop.  Getting the client to connect to an audio
server elsewhere means the sound will not necessarily stop when 
the <TT>`festival_client'</TT> process is stopped.
<DT><CODE>--withlisp</CODE>
<DD>
With each command being sent to Festival a Lisp return value is
sent.  Lisp expressions may also be sent from the server to the
client through the command <CODE>send_client</CODE>.  If this option
is specified the Lisp expressions are printed to standard out,
otherwise this information is discarded.
</DL>
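
<P>
The order in which these options take effect for <CODE>--ttw</CODE> can be
sketched as a function that assembles what a client writes to the server:
the passwd line, then the <CODE>--prolog</CODE> contents, then the two
commands shown under <CODE>--ttw</CODE>.  This is a minimal sketch and not
the code of <TT>`festival_client'</TT>; the names
<CODE>scheme_quote</CODE> and <CODE>build_ttw_session</CODE> are
hypothetical, and the escaping of double quotes and backslashes is an
assumption made so that the text stays a valid Scheme string.

</P>
<PRE>
#include &#60;string&#62;

// Escape text for use inside a Scheme string literal.
// Assumption: only '"' and '\' need a backslash in front of them.
std::string scheme_quote(const std::string &#38;text)
{
    std::string out;
    for (char c : text) {
        if (c == '"' || c == '\\')
            out += '\\';
        out += c;
    }
    return out;
}

// Hypothetical helper: build the bytes a --ttw style client would send.
std::string build_ttw_session(const std::string &#38;passwd,
                              const std::string &#38;prolog,
                              const std::string &#38;otype,
                              const std::string &#38;text)
{
    std::string out;
    if (!passwd.empty())
        out += passwd + "\n";         // passwd plus a newline goes first
    out += prolog;                    // then the --prolog file contents
    out += "(Parameter.set 'Wavefiletype '" + otype + ")\n";
    out += "(tts_textall \"\n" + scheme_quote(text) + "\n\")\n";
    return out;
}
</PRE>

<P>
An empty <CODE>passwd</CODE> or <CODE>prolog</CODE> simply contributes
nothing, so the same function covers servers with and without a passwd.

</P>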

<P>
A typical example use of <TT>`festival_client'</TT> is 

<PRE>
festival_client --async --ttw --aucommand 'na_play $FILE' fred.txt
</PRE>

<P>
This will use <TT>`na_play'</TT> to play each waveform generated for the
utterances in <TT>`fred.txt'</TT>.  Note the <EM>single</EM> quotes so that
the <CODE>$</CODE> in <CODE>$FILE</CODE> isn't expanded locally.

</P>
<P>
Note the server must be running before you can talk to it.  At present
Festival is not set up for automatic invocations through <TT>`inetd'</TT>
and <TT>`/etc/services'</TT>.  If you do that yourself, note
that it is a different type of interface as <TT>`inetd'</TT> assumes all
communication goes through standard in/out.

</P>
<P>
Also note that each connection to the server starts a new session.
Variables are not persistent over multiple calls to the server so if any
initialization is required (e.g. loading of voices) it must be done
each time the client starts or more reasonably in the server
when it is started.

</P>
<P>
<A NAME="IDX363"></A>
A Perl Festival client is also available in
<TT>`festival/examples/festival_client.pl'</TT>.

</P>


<H3><A NAME="SEC133" HREF="festival_toc.html#TOC133">28.3.3  Server/client protocol</A></H3>

<P>
<A NAME="IDX364"></A>
<A NAME="IDX365"></A>
The client talks to the server using s-expressions (Lisp).  The server
will reply with a number of different chunks until either OK is
returned or ER (on error).  The communication is synchronous: each
client request can generate a number of waveform (WV) replies and/or
Lisp replies (LP) and is terminated with an OK (or ER).  Lisp is used as it
has its own inherent syntax that Festival can already parse.

</P>
<P>
The following pseudo-code will help define the protocol
as well as show typical use

<PRE>

   fprintf(serverfd,"%s\n",s-expression);
   do
      ack = read three character acknowledgement
      if (ack == "WV\n")
         read a waveform
      else if (ack == "LP\n")
         read an s-expression
      else if (ack == "ER\n")
         an error occurred, break;
   while ack != "OK\n"
</PRE>

<P>
The server can send a waveform in an utterance to the client through the
function <CODE>utt.send.wave.client</CODE>.  The server can send a Lisp
expression to the client through the function <CODE>send_client</CODE>.

</P>
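
<P>
The acknowledgement loop in the pseudo-code might be written in C++
roughly as follows.  This is a sketch under stated assumptions rather than
the code used by <TT>`festival_client'</TT>: it reads from a
<CODE>std::istream</CODE> instead of a socket, the names
<CODE>Reply</CODE> and <CODE>read_replies</CODE> are hypothetical, and the
reading of waveform and s-expression payloads is delegated to
caller-supplied functions because their framing is not described in this
section.  Treating an unknown acknowledgement or an early end of stream as
<CODE>Broken</CODE> is also an assumption.

</P>
<PRE>
#include &#60;functional&#62;
#include &#60;istream&#62;
#include &#60;string&#62;

// Outcome of one request/response exchange.
enum class Reply { Ok, Error, Broken };

// Read acknowledgements from `in` until OK or ER arrives.
// `on_wave` and `on_lisp` are caller-supplied (hypothetical) readers
// that consume one waveform or one s-expression from the stream.
Reply read_replies(std::istream &#38;in,
                   const std::function&#60;void(std::istream &#38;)&#62; &#38;on_wave,
                   const std::function&#60;void(std::istream &#38;)&#62; &#38;on_lisp)
{
    char ack[3];
    while (in.read(ack, 3)) {
        const std::string code(ack, 3);
        if (code == "WV\n")
            on_wave(in);              // a waveform follows
        else if (code == "LP\n")
            on_lisp(in);              // an s-expression follows
        else if (code == "ER\n")
            return Reply::Error;      // the server reported an error
        else if (code == "OK\n")
            return Reply::Ok;         // the request is complete
        else
            return Reply::Broken;     // unknown acknowledgement
    }
    return Reply::Broken;             // stream ended before OK or ER
}
</PRE>

<P>
A caller would write the s-expression and a newline to the server first,
then call <CODE>read_replies</CODE> once per request.

</P>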



<H2><A NAME="SEC134" HREF="festival_toc.html#TOC134">28.4  C/C++ API</A></H2>

<P>
As well as using the interfaces offered through Scheme and the shell, some
users may also wish to embed Festival within their own C++ programs.
A number of simple-to-use high level functions are available for such
uses.  

</P>
<P>
In order to use Festival you must include
<TT>`festival/src/include/festival.h'</TT> which in turn will include the
necessary other include files in <TT>`festival/src/include'</TT> and
<TT>`speech_tools/include'</TT>.  You should ensure these are included in the
include path for your program.  Also you will need to link your
program with <TT>`festival/src/lib/libFestival.a'</TT>,
<TT>`speech_tools/lib/libestools.a'</TT>,
<TT>`speech_tools/lib/libestbase.a'</TT> and
<TT>`speech_tools/lib/libeststring.a'</TT> as well as any other optional
libraries such as net audio.

</P>
<P>
The main external functions available for C++ users of Festival
are:
<DL COMPACT>

<DT><CODE>void festival_initialize(int load_init_files,int heapsize);</CODE>
<DD>
This must be called before any other festival functions may be called.
It sets up the synthesizer system.  The first argument, if true,
causes the system set up files to be loaded (which is normally
what is necessary).  The second argument is the initial size of the
Scheme heap; this should normally be 210000 unless you envisage
processing very large Lisp structures.
<DT><CODE>int festival_say_file(const EST_String &#38;filename);</CODE>
<DD>
Say the contents of the given file.  Returns <CODE>TRUE</CODE> or <CODE>FALSE</CODE>
depending on whether this was successful.
<DT><CODE>int festival_say_text(const EST_String &#38;text);</CODE>
<DD>
Say the contents of the given string.  Returns <CODE>TRUE</CODE> or <CODE>FALSE</CODE>
depending on whether this was successful.
<DT><CODE>int festival_load_file(const EST_String &#38;filename);</CODE>
<DD>
Load the contents of the given file and evaluate its contents as
Lisp commands.  Returns <CODE>TRUE</CODE> or <CODE>FALSE</CODE>
depending on whether this was successful.
<DT><CODE>int festival_eval_command(const EST_String &#38;expr);</CODE>
<DD>
Read the given string as a Lisp command and evaluate it.  Returns
<CODE>TRUE</CODE> or <CODE>FALSE</CODE> depending on whether this was successful.
<DT><CODE>int festival_text_to_wave(const EST_String &#38;text,EST_Wave &#38;wave);</CODE>
<DD>
Synthesize the given string into the given wave.  Returns <CODE>TRUE</CODE> or
<CODE>FALSE</CODE> depending on whether this was successful.
</DL>
<P>
Many other commands are also available but often the above will be
sufficient.

</P>
<P>
Below is a simple top level program that uses the Festival
functions

<PRE>
#include "festival.h"

int main(int argc, char **argv)
{
    EST_Wave wave;
    int heap_size = 210000;  // default scheme heap size
    int load_init_files = 1; // we want the festival init files loaded

    festival_initialize(load_init_files,heap_size);

    // Say simple file
    festival_say_file("/etc/motd");

    festival_eval_command("(voice_ked_diphone)");
    // Say some text;
    festival_say_text("hello world");

    // Convert to a waveform
    festival_text_to_wave("hello world",wave);
    wave.save("/tmp/wave.wav","riff");

    // festival_say_file puts the system in async mode so we better
    // wait for the spooler to reach the last waveform before exiting
    // This isn't necessary if only festival_say_text is being used (and
    // your own wave playing stuff)
    festival_wait_for_spooler();

    return 0;
}
</PRE>



<H2><A NAME="SEC135" HREF="festival_toc.html#TOC135">28.5  C only API</A></H2>

<P>
<A NAME="IDX366"></A>
<A NAME="IDX367"></A>
A simpler C only interface example is given in
<TT>`festival/examples/festival_client.c'</TT>.  That interface talks to a
festival server.  The code does not require linking with any other EST
or Festival code so is much smaller and easier to include in other
programs.  The code is missing some functionality but not much, considering
how much smaller it is.

</P>


<H2><A NAME="SEC136" HREF="festival_toc.html#TOC136">28.6  Java and JSAPI</A></H2>

<P>
<A NAME="IDX368"></A>
<A NAME="IDX369"></A>
Initial support for talking to a Festival server from Java is included
from version 1.3.0 and initial JSAPI support is included from 1.4.0.
At present the JSAPI talks to a Festival server elsewhere rather than
as part of the Java process itself.

</P>
<P>
A simple (pure) Java Festival client is given in
<TT>`festival/src/modules/java/cstr/festival/Client.java'</TT> with a
wraparound script in <TT>`festival/bin/festival_client_java'</TT>.

</P>
<P>
See the file <TT>`festival/src/modules/java/cstr/festival/jsapi/ReadMe'</TT>
for requirements and a small example of using the JSAPI interface.

</P>
</BODY>
</HTML>