<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52
     from ../festival.texi on 2 August 2001 -->

<TITLE>Festival Speech Synthesis System - 10  XML/SGML mark-up</TITLE>
</HEAD>
<BODY bgcolor="#ffffff">
Go to the <A HREF="festival_1.html">first</A>, <A HREF="festival_9.html">previous</A>, <A HREF="festival_11.html">next</A>, <A HREF="festival_35.html">last</A> section, <A HREF="festival_toc.html">table of contents</A>.
<P><HR><P>


<H1><A NAME="SEC31" HREF="festival_toc.html#TOC31">10  XML/SGML mark-up</A></H1>

<P>
<A NAME="IDX112"></A>
<A NAME="IDX113"></A>
<A NAME="IDX114"></A>
<A NAME="IDX115"></A>
<A NAME="IDX116"></A>
<A NAME="IDX117"></A>
The idea of a general, synthesizer-independent mark-up language
for labelling text has been under discussion for some time.  Festival
has supported an SGML based markup language through multiple versions,
most recently STML (<CITE>sproat97</CITE>).  This is based on the earlier SSML
(Speech Synthesis Markup Language) which was supported by previous
versions of Festival (<CITE>taylor96</CITE>).  With this version of Festival
we support <EM>Sable</EM>, a similar mark-up language devised by a
consortium from Bell Labs, Sun Microsystems, AT&#38;T and Edinburgh
(<CITE>sable98</CITE>).  Unlike the previous versions, which were SGML based, the
implementation of Sable in Festival is now XML based.  To the user the
difference is negligible, but using XML makes processing of files easier
and more standardized.  Festival also now includes an XML parser, thus
reducing the dependencies in processing Sable text.

</P>
<P>
Raw text has the problem that it cannot always easily be rendered as
speech in the way the author wishes.  Sable offers a well-defined way of
marking up text so that the synthesizer may render it appropriately.

</P>
<P>
<A NAME="IDX118"></A>
<A NAME="IDX119"></A>
<A NAME="IDX120"></A>
The definition of Sable is by no means settled and is still in
development.  In this release Festival offers people working on Sable
and other XML (and SGML) based markup languages a chance to quickly
experiment with prototypes by providing a DTD (document type
definition) and the mapping of the elements in the DTD to Festival
functions.  Although we have not yet (personally) investigated facilities
like cascading style sheets and generalized SGML specification languages
like DSSSL, we believe the facilities offered by Festival allow rapid
prototyping of speech output markup languages.

</P>
<P>
Primarily we see Sable markup text as a language that will be generated by
other programs, e.g. text generation systems, dialog managers etc.
A standard, easy to parse format is therefore required, even if
it seems overly verbose for human writers.

</P>
<P>
For more information on Sable and access to the mailing list see

<PRE>
<A HREF="http://www.cstr.ed.ac.uk/projects/sable.html">http://www.cstr.ed.ac.uk/projects/sable.html</A>
</PRE>



<H2><A NAME="SEC32" HREF="festival_toc.html#TOC32">10.1  Sable example</A></H2>

<P>
Here is a simple example of Sable marked up text

</P>

<PRE>
&#60;?xml version="1.0"?&#62;
&#60;!DOCTYPE SABLE PUBLIC "-//SABLE//DTD SABLE speech mark up//EN" 
      "Sable.v0_2.dtd"
[]&#62;
&#60;SABLE&#62;
&#60;SPEAKER NAME="male1"&#62;

The boy saw the girl in the park &#60;BREAK/&#62; with the telescope.
The boy saw the girl &#60;BREAK/&#62; in the park with the telescope.

Good morning &#60;BREAK /&#62; My name is Stuart, which is spelled
&#60;RATE SPEED="-40%"&#62;
&#60;SAYAS MODE="literal"&#62;stuart&#60;/SAYAS&#62; &#60;/RATE&#62;
though some people pronounce it 
&#60;PRON SUB="stoo art"&#62;stuart&#60;/PRON&#62;.  My telephone number
is &#60;SAYAS MODE="literal"&#62;2787&#60;/SAYAS&#62;.

I used to work in &#60;PRON SUB="Buckloo"&#62;Buccleuch&#60;/PRON&#62; Place, 
but no one can pronounce that.

By the way, my telephone number is actually
&#60;AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.2.au"/&#62;
&#60;AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.7.au"/&#62;
&#60;AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.8.au"/&#62;
&#60;AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.7.au"/&#62;.
&#60;/SPEAKER&#62;
&#60;/SABLE&#62;
</PRE>

<P>
<A NAME="IDX121"></A>
<A NAME="IDX122"></A>
After the initial definition of the SABLE tags, through the file
<TT>`Sable.v0_2.dtd'</TT>, which is distributed as part of Festival, the
body is given.  There are tags for identifying the language and the
voice.  Explicit boundary markers may be given in the text.  Duration
and intonation control can also be explicitly specified, as can new
pronunciations of words.  The last sentence specifies some external
audio files to play at that point.

</P>


<H2><A NAME="SEC33" HREF="festival_toc.html#TOC33">10.2  Supported Sable tags</A></H2>

<P>
<A NAME="IDX123"></A>
There is not yet a definitive set of tags but hopefully such a list
will form over the next few months.  As adding support for new tags is
often trivial, the problem lies much more in defining what tags there
should be than in actually implementing them.  The following
are based on version 0.2 of Sable as described in 
<A HREF="http://www.cstr.ed.ac.uk/projects/sable_spec2.html">http://www.cstr.ed.ac.uk/projects/sable_spec2.html</A>, though
some aspects are not currently supported in this implementation.
Further updates will be announced through the Sable mailing list.

</P>
<DL COMPACT>

<DT><CODE>LANGUAGE</CODE>
<DD>
Allows the specification of the language through the <CODE>ID</CODE>
attribute.  Valid values in Festival are <CODE>english</CODE>,
<CODE>en1</CODE>, <CODE>spanish</CODE>, <CODE>en</CODE>, and others depending
on your particular installation.
For example

<PRE>
&#60;LANGUAGE ID="english"&#62; ... &#60;/LANGUAGE&#62;
</PRE>

If the language isn't supported by the particular installation of
Festival, "Some text in .." is said instead and the section is
omitted.
<DT><CODE>SPEAKER</CODE>
<DD>
Selects a voice.  Accepts an attribute <CODE>NAME</CODE> which takes values
<CODE>male1</CODE>, <CODE>male2</CODE>, <CODE>female1</CODE>,  etc.  There
is currently no definition of what happens when a voice is selected
that the synthesizer doesn't support.  An example is

<PRE>
&#60;SPEAKER NAME="male1"&#62; ... &#60;/SPEAKER&#62;
</PRE>

<DT><CODE>AUDIO</CODE>
<DD>
This allows the specification of an external waveform that is to
be included.  There are attributes for specifying volume and whether
the waveform is to be played in the background of the following
text or not.  Festival as yet only supports insertion.

<PRE>
My telephone number is 
&#60;AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.2.au"/&#62;
&#60;AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.7.au"/&#62;
&#60;AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.8.au"/&#62;
&#60;AUDIO SRC="http://www.cstr.ed.ac.uk/~awb/sounds/touchtone.7.au"/&#62;.
</PRE>

<DT><CODE>MARKER</CODE>
<DD>
This allows Festival to mark when a particular part of the text has
been reached.  At present the value of the <CODE>MARK</CODE>
attribute is simply printed.  This is done when that piece
of text is analyzed, not when it is played.  To use
this in any real application would require changes to this tag's
implementation.

<PRE>
Move the &#60;MARKER MARK="mouse" /&#62; mouse to the top.
</PRE>

<DT><CODE>BREAK</CODE>
<DD>
Specifies a boundary at some <CODE>LEVEL</CODE>.  The level may take the values
<CODE>Large</CODE>, <CODE>Medium</CODE>, <CODE>Small</CODE> or a number.  Note that
this tag is an empty tag and must include the closing part
within its own specification.

<PRE>
&#60;BREAK LEVEL="LARGE"/&#62;
</PRE>

<DT><CODE>DIV</CODE>
<DD>
This signals a division.  In Festival this causes an utterance
break.  A <CODE>TYPE</CODE> attribute may be specified but it is ignored
by Festival.
<DT><CODE>PRON</CODE>
<DD>
Allows the pronunciation of the enclosed text to be explicitly given.  It
supports the attributes <CODE>IPA</CODE> for an IPA specification (not
currently supported by Festival); <CODE>SUB</CODE> for text to be substituted,
which can be in some form of phonetic spelling; and <CODE>ORIGIN</CODE>, where
the linguistic origin of the enclosed text may be identified to assist
in etymologically sensitive letter to sound rules.

<PRE>
&#60;PRON SUB="toe maa toe"&#62;tomato&#60;/PRON&#62;
</PRE>

<DT><CODE>SAYAS</CODE>
<DD>
Allows identification of the type of the enclosed tokens/text.  The attribute
<CODE>MODE</CODE> can take any of the following values: <CODE>literal</CODE>,
<CODE>date</CODE>, <CODE>time</CODE>, <CODE>phone</CODE>, <CODE>net</CODE>, <CODE>postal</CODE>,
<CODE>currency</CODE>, <CODE>math</CODE>, <CODE>fraction</CODE>, <CODE>measure</CODE>,
<CODE>ordinal</CODE>, <CODE>cardinal</CODE>, or <CODE>name</CODE>.  Further specification
of type for dates (MDY, DMY etc) may be specified through the 
<CODE>MODETYPE</CODE> attribute.

<PRE>
As a test of marked-up numbers. Here we have 
a year &#60;SAYAS MODE="date"&#62;1998&#60;/SAYAS&#62;, 
an ordinal &#60;SAYAS MODE="ordinal"&#62;1998&#60;/SAYAS&#62;, 
a cardinal &#60;SAYAS MODE="cardinal"&#62;1998&#60;/SAYAS&#62;, 
a literal &#60;SAYAS MODE="literal"&#62;1998&#60;/SAYAS&#62;, 
and phone number &#60;SAYAS MODE="phone"&#62;1998&#60;/SAYAS&#62;.
</PRE>

<DT><CODE>EMPH</CODE>
<DD>
Specifies that the enclosed text should be emphasized.  A <CODE>LEVEL</CODE>
attribute may be specified but its value is currently 
ignored by Festival (besides, the emphasis Festival generates
isn't very good anyway).

<PRE>
The leaders of &#60;EMPH&#62;Denmark&#60;/EMPH&#62; and &#60;EMPH&#62;India&#60;/EMPH&#62; meet on
Friday.
</PRE>

<DT><CODE>PITCH</CODE>
<DD>
Allows the specification of pitch range, mid and base points.

<PRE>
Without his penguin, &#60;PITCH BASE="-20%"&#62; which he left at home, &#60;/PITCH&#62;
he could not enter the restaurant.
</PRE>

<DT><CODE>RATE</CODE>
<DD>
Allows the specification of speaking rate.

<PRE>
The address is &#60;RATE SPEED="-40%"&#62; 10 Main Street &#60;/RATE&#62;.
</PRE>

<DT><CODE>VOLUME</CODE>
<DD>
Allows the specification of volume.  Note that in Festival this
causes an utterance break before and after this tag.

<PRE>
Please speak more &#60;VOLUME LEVEL="loud"&#62;loudly&#60;/VOLUME&#62;, except
when I ask you to speak &#60;VOLUME LEVEL="quiet"&#62;in a quiet voice&#60;/VOLUME&#62;.
</PRE>

<DT><CODE>ENGINE</CODE>
<DD>
This allows the specification of engine-specific commands.

<PRE>
An example is &#60;ENGINE ID="festival" DATA="our own festival speech
synthesizer"&#62; the festival speech synthesizer&#60;/ENGINE&#62; or
the Bell Labs speech synthesizer.
</PRE>

</DL>

<P>
These tags may change in name but they cover the aspects of speech
mark up that we wish to express.  Later additions and changes to these
are expected.

</P>
<P>
See the files <TT>`festival/examples/example.sable'</TT> and
<TT>`festival/examples/example2.sable'</TT> for working examples.

</P>
<P>
Note that the definition of Sable is ongoing, and there are likely to be
later, more complete implementations of Sable for Festival as independent
releases.  Consult <TT>`http://www.cstr.ed.ac.uk/projects/sable.html'</TT> for
the most recent updates.

</P>


<H2><A NAME="SEC34" HREF="festival_toc.html#TOC34">10.3  Adding Sable tags</A></H2>

<P>
We do not yet claim that there is a fixed standard for Sable tags but
we wish to move towards such a standard.  In the meantime we have
made it easy in Festival to add support for new tags without, in 
general, having to change any of the core functions.

</P>
<P>
Two changes are necessary to add a new tag.  First, change the
definition in <TT>`lib/Sable.v0_2.dtd'</TT>, so that Sable files may use it.
Second, make Festival sensitive to that new tag.  The
example in <CODE>festival/lib/sable-mode.scm</CODE> shows how a new text mode
may be implemented for an XML/SGML-based markup language.  The basic
point is that an identified function will be called on finding a start
tag or end tag in the document.  It is the tag-function's job to
synthesize the given utterance if the tag signals an utterance boundary.
The return value from the tag-function is the new status of the current
utterance: it may remain unchanged, or, if the current utterance has
been synthesized, <CODE>nil</CODE> should be returned to signal a new
utterance.

</P>
<P>
Note that the hierarchical structure of the document is not available in this
method of tag-functions.  Any hierarchical state that must be preserved
has to be kept using explicit stacks in Scheme.  This is an artifact
of the crossing relationship between utterances and tags (an utterance may end
between a start tag and its end tag), and of the desire to have all specification in
Scheme rather than C++.

</P>
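<P>
As a minimal sketch of such an explicit stack, the following plain
Scheme saves a value when an element is entered and hands it back when
the element is left.  The names <CODE>my_state_stack</CODE>,
<CODE>my_push_state</CODE> and <CODE>my_pop_state</CODE> are hypothetical
and are not part of Festival.

</P>
<PRE>
;; Hypothetical helper names; not defined by Festival itself.
(define my_state_stack nil)

(define (my_push_state value)
  "Save VALUE on entering an element."
  (set! my_state_stack (cons value my_state_stack))
  value)

(define (my_pop_state)
  "Remove and return the most recently saved value, or nil if none."
  (let ((top (car my_state_stack)))
    (set! my_state_stack (cdr my_state_stack))
    top))
</PRE>

<P>
A start tag-function would typically push the setting currently in force
before changing it, and the matching end tag-function would restore the
setting from the popped value.

</P>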
<P>
The tag-functions are defined in an elements list.  They are identified
with names such as "(SABLE" and ")SABLE" denoting start and end tags
respectively.  Two arguments are passed to these tag functions: 
an assoc list of attributes and values as specified in the document,
and the current utterance.  If the tag denotes an utterance
break, call <CODE>xxml_synth</CODE> on <CODE>UTT</CODE> and return <CODE>nil</CODE>.
If a tag (start or end) is found in the document and there is no
corresponding tag-function it is ignored.

</P>
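<P>
The sketch below shows what a pair of entries might look like for a
hypothetical tag <CODE>NOTE</CODE> whose content is to be spoken as an
utterance of its own.  It assumes that the elements list in
<CODE>festival/lib/sable-mode.scm</CODE> is bound to a variable called
<CODE>sable_elements</CODE> and that the file can be loaded with
<CODE>require</CODE>; check your copy of that file before relying on
either assumption.  The <CODE>NOTE</CODE> element would also need to be
declared in the DTD.

</P>
<PRE>
(require 'sable-mode)   ; assumption: loads festival/lib/sable-mode.scm

;; NOTE is a hypothetical tag used only for illustration.
(set! sable_elements
      (append
       sable_elements
       '(("(NOTE" (ATTLIST UTT)
          (xxml_synth UTT)   ; speak the tokens collected so far
          nil)               ; nil signals the start of a new utterance
         (")NOTE" (ATTLIST UTT)
          (xxml_synth UTT)   ; speak the content of the tag
          nil))))
</PRE>
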
<P>
New features may be added to the words between a start and end tag by
adding features to the global <CODE>xxml_word_features</CODE>.  Any
features in that variable will be added to each word.

</P>
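<P>
As an illustration, the pair of tag-functions below marks every word
inside a hypothetical <CODE>QUOTE</CODE> element with a feature.  The tag
name, the feature name <CODE>in_quote</CODE> and the variables
<CODE>my_saved_word_features</CODE> and <CODE>my_quote_elements</CODE>
are assumptions made for this sketch.  Treating
<CODE>xxml_word_features</CODE> as a list of (name value) pairs follows
our reading of <CODE>festival/lib/sable-mode.scm</CODE> and should be
checked against your installation.

</P>
<PRE>
(define my_saved_word_features nil)   ; hypothetical; one nesting level only

(define my_quote_elements
  '(("(QUOTE" (ATTLIST UTT)
     ;; remember the features in force, then add the new one
     (set! my_saved_word_features xxml_word_features)
     (set! xxml_word_features
           (cons (list "in_quote" "1") xxml_word_features))
     UTT)                 ; no utterance break: return UTT unchanged
    (")QUOTE" (ATTLIST UTT)
     (set! xxml_word_features my_saved_word_features)
     UTT)))
</PRE>

<P>
These entries would still have to be added to the elements list in use,
and nested <CODE>QUOTE</CODE> elements would need a stack rather than a
single saved value.

</P>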
<P>
Note that this method may be used for both XML based languages and SGML
based markup languages (though an external normalizing SGML parser is
required in the SGML case).  The type (XML vs SGML) is identified
by the <CODE>analysis_type</CODE> parameter in the tts text mode specification.

</P>
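<P>
A rough sketch of a text mode specification for a hypothetical XML based
language follows.  Every name beginning with <CODE>mymarkup</CODE> is
invented for the example.  The use of <CODE>tts_text_modes</CODE>,
<CODE>init_func</CODE>, <CODE>exit_func</CODE> and
<CODE>xxml_elements</CODE>, and the value <CODE>xml</CODE> for
<CODE>analysis_type</CODE>, are assumptions based on how
<CODE>festival/lib/sable-mode.scm</CODE> appears to register itself;
consult that file for the authoritative form.

</P>
<PRE>
;; All names beginning with mymarkup are hypothetical.
(define mymarkup_elements
  '(("(DOC" (ATTLIST UTT)
     UTT)
    (")DOC" (ATTLIST UTT)
     (xxml_synth UTT)     ; speak whatever remains at the end of the file
     nil)))

(define mymarkup_previous_elements nil)

(define (mymarkup_init_func)
  "Install the tag-functions for this mode."
  (set! mymarkup_previous_elements xxml_elements)
  (set! xxml_elements mymarkup_elements))

(define (mymarkup_exit_func)
  "Restore whatever elements list was in use before."
  (set! xxml_elements mymarkup_previous_elements))

(set! tts_text_modes
      (cons
       (list
        'mymarkup
        (list
         (list 'init_func mymarkup_init_func)
         (list 'exit_func mymarkup_exit_func)
         '(analysis_type xml)))   ; assumed value selecting the XML parser
       tts_text_modes))
</PRE>

<P>
With such a definition loaded, a file could then be processed with
<CODE>(tts "file.xml" 'mymarkup)</CODE>.

</P>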


<H2><A NAME="SEC35" HREF="festival_toc.html#TOC35">10.4  XML/SGML requirements</A></H2>

<P>
<A NAME="IDX124"></A>
<A NAME="IDX125"></A>
Festival is distributed with <CODE>rxp</CODE>, an XML parser developed
by Richard Tobin of the Language Technology Group, University of
Edinburgh.  Sable is set up as an XML text mode so no
further external programs are required to synthesize
from Sable marked up text (unlike previous releases).  Note that <CODE>rxp</CODE>
is not a full validating parser and hence doesn't check some aspects
of the file (tags within tags).

</P>
<P>
<A NAME="IDX126"></A>
<A NAME="IDX127"></A>
Festival still supports SGML based markup but in such cases requires an
external SGML normalizing parser.  We have tested <TT>`nsgmls-1.0'</TT>,
which is part of the SGML tool set <TT>`sp-1.1.tar.gz'</TT>
available from <A HREF="http://www.jclark.com/sp/index.html">http://www.jclark.com/sp/index.html</A>.
This seems portable between many platforms.

</P>


<H2><A NAME="SEC36" HREF="festival_toc.html#TOC36">10.5  Using Sable</A></H2>

<P>
<A NAME="IDX128"></A>
<A NAME="IDX129"></A>
Support in Festival for Sable is as a text mode.  In command
mode use the following to process a Sable file

<PRE>
(tts "file.sable" 'sable)
</PRE>

<P>
Also the automatic selection of mode based on file type has been set up
such that files ending <TT>`.sable'</TT> will be automatically synthesized in
this mode.  Thus

<PRE>
festival --tts fred.sable
</PRE>

<P>
will render <TT>`fred.sable'</TT> as speech in Sable mode.

</P>
<P>
Another way of using Sable is through the Emacs interface.  The
say-buffer command will send the Emacs buffer mode to Festival as
its tts-mode.  If the Emacs mode is stml or sgml the file is treated
as a Sable file.  See section <A HREF="festival_11.html#SEC37">11  Emacs interface</A>.

</P>
<P>
<A NAME="IDX130"></A>
<A NAME="IDX131"></A>
Many people experimenting with Sable (and TTS in general) want all
the waveform output to be saved to be played at a later date.  The
simplest way to do this is using the <TT>`text2wave'</TT> script.  It
respects the automatic mode selection, so a Sable file can be converted with

<PRE>
text2wave fred.sable -o fred.wav
</PRE>

<P>
Note that this renders the file as a single waveform (done by concatenating
the waveforms for each utterance in the Sable file).  

</P>
<P>
If you wish the waveform for each utterance in a file to be saved, you can
cause the tts process to save the waveforms during synthesis.  To do so,
call

<PRE>
festival&#62; (save_waves_during_tts)
</PRE>

<P>
After this, any future call to <CODE>tts</CODE> will cause the waveforms to be saved in
files named <TT>`tts_file_xxx.wav'</TT> where <TT>`xxx'</TT> is a number.  A call to
<CODE>(save_waves_during_tts_STOP)</CODE> will stop saving the waves.  A
message is printed when each waveform is saved, otherwise people forget
about this and wonder why their disk has filled up.

</P>
<P>
This is done by inserting a function in <CODE>tts_hooks</CODE>
which saves the wave.  To do other things to each utterance during
TTS (such as saving the utterance structure), try redefining
the function <CODE>save_tts_output</CODE> (see <CODE>festival/lib/tts.scm</CODE>).

</P>
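<P>
A minimal sketch of such a redefinition, saving the utterance structure
alongside each waveform, might look as follows.  The counter
<CODE>my_tts_count</CODE> and the file naming are assumptions of this
example and not what the shipped function does.  It also assumes that a
hook function receives the utterance and should return it, so that any
later hook still sees it.

</P>
<PRE>
(define my_tts_count 0)   ; hypothetical counter used to name the files

(define (save_tts_output utt)
  "Save the waveform and the utterance structure of UTT."
  (set! my_tts_count (+ 1 my_tts_count))
  (utt.save.wave utt (format nil "tts_file_%d.wav" my_tts_count))
  (utt.save utt (format nil "tts_file_%d.utt" my_tts_count))
  utt)                    ; hand the utterance on to any later hook
</PRE>

<P>
It is probably safest to make the redefinition before calling
<CODE>(save_waves_during_tts)</CODE>, in case the hook list holds the
function value that was current at that time.

</P>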
</BODY>
</HTML>