Sophie

Sophie

distrib > Mageia > 5 > x86_64 > media > core-release > by-pkgid > f1f429f1e7336a64acc418038325ad85 > files > 18

flite-1.4-7.mga5.x86_64.rpm

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Created on October 18, 2014 by texi2html 5.0
texi2html was written by: 
            Lionel Cons <Lionel.Cons@cern.ch> (original author)
            Karl Berry  <karl@freefriends.org>
            Olaf Bachmann <obachman@mathematik.uni-kl.de>
            and many others.
Maintained by: Many creative people.
Send bugs and suggestions to <texi2html-bug@nongnu.org>
-->
<head>
<title>Flite: a small, fast speech synthesis engine: 5 Flite Design</title>

<meta name="description" content="Flite: a small, fast speech synthesis engine: 5 Flite Design">
<meta name="keywords" content="Flite: a small, fast speech synthesis engine: 5 Flite Design">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="texi2html 5.0">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smalllisp {margin-left: 3.2em}
pre.display {font-family: serif}
pre.format {font-family: serif}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: serif; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: serif; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nocodebreak {white-space:pre}
span.nolinebreak {white-space:pre}
span.roman {font-family:serif; font-weight:normal}
span.sansserif {font-family:sans-serif; font-weight:normal}
ul.no-bullet {list-style: none}
-->
</style>


</head>

<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">

<a name="Flite-Design"></a>
<table class="header" cellpadding="1" cellspacing="1" border="0">
<tr><td valign="middle" align="left">[<a href="flite_3.html#Installation" title="Beginning of this chapter or previous chapter"> &lt;&lt; </a>]</td>
<td valign="middle" align="left">[<a href="flite_3.html#Using-the-PalmOS" title="Previous section in reading order"> &lt; </a>]</td>
<td valign="middle" align="left">[ Up ]</td>
<td valign="middle" align="left">[<a href="#Background" title="Next section in reading order"> &gt; </a>]</td>
<td valign="middle" align="left">[<a href="flite_5.html#Structure" title="Next chapter"> &gt;&gt; </a>]</td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left">[<a href="flite.html#Abstract" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="flite_toc.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
<td valign="middle" align="left">[Index]</td>
<td valign="middle" align="left">[<a href="flite_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>
<a name="Flite-Design-1"></a>
<h1 class="chapter">5 Flite Design</h1>

<hr>
<a name="Background"></a>
<table class="header" cellpadding="1" cellspacing="1" border="0">
<tr><td valign="middle" align="left">[<a href="#Flite-Design" title="Beginning of this chapter or previous chapter"> &lt;&lt; </a>]</td>
<td valign="middle" align="left">[<a href="#Flite-Design" title="Previous section in reading order"> &lt; </a>]</td>
<td valign="middle" align="left">[<a href="#Flite-Design" title="Up section"> Up </a>]</td>
<td valign="middle" align="left">[<a href="#Key-Decisions" title="Next section in reading order"> &gt; </a>]</td>
<td valign="middle" align="left">[<a href="flite_5.html#Structure" title="Next chapter"> &gt;&gt; </a>]</td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left">[<a href="flite.html#Abstract" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="flite_toc.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
<td valign="middle" align="left">[Index]</td>
<td valign="middle" align="left">[<a href="flite_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>
<h2 class="section">5.1 Background</h2>

<p>Flite was primarily developed to address one of the most common
complaints about the Festival Speech Synthesis System.  Festival is
large and slow, even with the software bloat common amongst most
products and that that bloat has helped machines get faster, have more
memory and large disks, still Festival is criticized for its size.
</p>
<p>Although sometimes this complaint is unfair, it is valid and although
much work was done to ensure Festival can be trimmed and run fast it
still requires substantial resources per utterance to run.  After some
investigation to see if Festival itself could be trimmed down it became
clear because there was a core set of functions that were sufficient for
synthesis that a new implementation containing only those aspects that
were necessary would be easier than trimming down Festival itself.
</p>
<p>Given that a new implementation was being considered a number of
problems with Festival could also be addressed at the same time.
Festival is not thread-safe, and although it runs under Windows, in
server mode it relies on the Unix-centric view of fast forks with
copy-on-write shared memory for servicing clients.  This is a perfectly
safe and practical solution for Unix systems, but under Windows where
threads are the more common feature used for servicing multiple events
and forking is expensive, a non-thread safe program can&rsquo;t be used as
efficiently.
</p>
<p>Festival is written in C++ which was a good decision at the time and
perfectly suitable for a large program.  However what was discovered
over the years of development is that C++ is not a portable language.
Different C++ compilers are quite different and it takes significant
amount of work to ensure compatibility of the code base over multiple
compilers.  What makes this worse is that new versions of each compiler
are incompatible and changes are required.  At first this looked like we
were producing bad quality code but after 10 years it is clear that it is
also that the compilers are still maturing.  Thus it is clear that
Festival and the Edinburgh Speech Tools will continue to require
constant support as new versions of compilers are released.
</p>
<p>A second problem with C++ is the size and efficiency of the code
produced.  Proponents of C++ may rightly argue that Festival and the
Edinburgh Speech Tools aren&rsquo;t properly designed, but irrespective if
that is true or not, it is true that the size of the code is much larger
and slower than it need be for what it does.  Throughout the design
there is a constant trade-off between elegancy and efficiency which
unfortunately at times in Festival requires untidy solutions of
copying data out of objects processing it and copying back because
direct access (particularly in some signal processing routines)
is just too inefficient.
</p>
<p>Another major criticism of Festival is the use of Scheme as the
interpreter language.  Even though it is a simple to implement language
that is adequate for Festival&rsquo;s needs and can be easily included in the
distribution, people still hate it.  Often these people do learn to use
it and appreciate how run time configurability is very desirable and that
new voices may be added without recompilation.  Scheme does have garbage
collection which makes leaky programs much harder to write and as some
of the intended audience for developing in Festival will not be hard
core programmers a safe programming language seems very desirable.
</p>
<p>After taking into consideration all of the above it was decided to
develop Flite as a new system written in ANSI C.  C is much more
portable than C++ as well as offering much lower level control of the
size of the objects and data structure it uses.  
</p>
<p>Flite is not intended as a research and development platform for speech
synthesis, Festival is and will continue to be the best platform for
that.  Flite however is designed as a run-time engine when an
application needs to be delivered.  It specifically addresses two
communities.  First as a engine for small devices such as PDAs and
telephones where the memory and CPU power are limited and in some cases do
not even have a conventional operating system.
</p>
<p>The second community is for those running synthesis servers for many
clients.  Here although large fixed databases are acceptable, the size
of memory required per utterance and speed in which they can be
synthesized is crucial.
</p>
<p>However in spite of the decision to build a new synthesis engine we see
this as being tightly coupled into the existing free software synthesis
tools or Festival and the FestVox voice building suite.  Flite offers
a companion run-time engine.  Our intended mode of development is
to build new voices in FestVox and debug and tune them in Festival.
Then for deployment the FestVox format voice may be (semi-)automatically
compiled into a form that can be used by Flite.
</p>
<p>In case some people feel that development of a small run-time
synthesizer is not an appropriate thing to do within a University and is
more suited to commercial development, we have a few points which they
should be aware of that to our mind justify this work.  
</p>
<p>We have long felt that research in speech and language should have an
identifiable link to ultimate commercial use.  In providing a platform
that can be used in consumer products that falls within the same
framework as our research we can better understand what research issues
are actually important to the improvement our work.
</p>
<p>In considering small useful synthesizers it forces a more explicit
definition of what is necessary in a synthesizer and also how we can
trade size, flexibility and speed with the quality of synthesized
output.  Defining that relationship is a research issue.
</p>
<p>We are also advocates of speech technology within other research areas
and the ability to offer support on new platforms such as PDAs and
wearables allows for more interesting speech applications such as
speech-to-speech translation, robots, and interactive personal digital
assistants, that will prove new and interesting areas of research.
Thus having a platform that others around us can more easily integrate
into their research makes our work more satisfying.
</p>
<hr>
<a name="Key-Decisions"></a>
<table class="header" cellpadding="1" cellspacing="1" border="0">
<tr><td valign="middle" align="left">[<a href="#Flite-Design" title="Beginning of this chapter or previous chapter"> &lt;&lt; </a>]</td>
<td valign="middle" align="left">[<a href="#Background" title="Previous section in reading order"> &lt; </a>]</td>
<td valign="middle" align="left">[<a href="#Flite-Design" title="Up section"> Up </a>]</td>
<td valign="middle" align="left">[<a href="flite_5.html#Structure" title="Next section in reading order"> &gt; </a>]</td>
<td valign="middle" align="left">[<a href="flite_5.html#Structure" title="Next chapter"> &gt;&gt; </a>]</td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left">[<a href="flite.html#Abstract" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="flite_toc.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
<td valign="middle" align="left">[Index]</td>
<td valign="middle" align="left">[<a href="flite_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>
<h2 class="section">5.2 Key Decisions</h2>

<p>The basic architecture of Festival is good.  It is well proven.  Paul
Taylor, Alan W. Black and Richard Caley spent many hours debating low
level aspects of representation and structure that would both be
adequate for current theories but also allow for future theories too.
The heterogeneous relation graphs (HRG) are theoretically adequate,
computationally efficient and well proven.  Thus both because HRGs have
such a background and that Flite is to be compatible with voices and
models developed in Festival, Flite uses HRGs as its basic utterance
representation structure.
</p>
<p>Most of a synthesizer is in its data (lexicons, unit database etc),
the actual synthesis code it pretty small.  In Festival most of that
data exists in external files which are loaded on demand.  This is
obviously slow and memory expensive (you need both a copy on the data
on disk and in memory).  As one of the principal targets for Flite is
very small machines we wanted to allow that core data to be in ROM,
and be appropriately mapped into RAM without any explicit loading
(some OS&rsquo;s call this XIP &ndash; execute in place).  This can be done by
various memory mapping functions (in Unix its called mmap) and is the
core technique used in shared libraries (called DLLs in some parts of
the world).  Thus the data should be in a format that it can be
directly accessed.  If you are going to directly access data you need
to ensure the byte layout is appropriate for the architecture you are
running on, byte order and address width become crucial if you want to
avoid any extra conversion code at access time (like byte swapping).
</p>
<p>At first is was considered that synthesis data would be converted in
binary files which could be mmap&rsquo;ed into the runtime systems but
building appropriate binaries files for architectures is quite a job.
However the C compiler does this in a standard way.  Therefore the mode
of operation for data within Flite is to convert it to C code (actually
C structures) and use the C compiler to generate the appropriate binary
structures.
</p>
<p>Using the C compiler is a good portable solution but it as these
structures can be very big this can tax the C compiler somewhat.  Also
because this data is not going to change at run time it can all be
declared <code>const</code>.  Which means (in Unix) it will be in the text
segment and hence read only (this can be ROM on platforms which have
that distinction).  For structures to be const all their subparts
must also be const thus all relevant parts must be in the same file,
hence the unit databases files can be quite big.
</p>
<p>Of course, this all presumes that you have a C compiler robust enough to
compile these files, hardware smart enough to treat flash ROM as memory
rather than disk, or an operating system smart enough to demand-page
executables.  Certain &quot;popular&quot; operating systems and compilers fail in
at least one of these respects, and therefore we have provided the
flexibility to use memory-mapped file I/O on voice databases, where
available, or simply to load them all into memory.
</p>
<hr>
<table class="header" cellpadding="1" cellspacing="1" border="0">
<tr><td valign="middle" align="left">[<a href="#Flite-Design" title="Beginning of this chapter or previous chapter"> &lt;&lt; </a>]</td>
<td valign="middle" align="left">[<a href="flite_5.html#Structure" title="Next chapter"> &gt;&gt; </a>]</td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left"> &nbsp; </td>
<td valign="middle" align="left">[<a href="flite.html#Abstract" title="Cover (top) of document">Top</a>]</td>
<td valign="middle" align="left">[<a href="flite_toc.html#SEC_Contents" title="Table of contents">Contents</a>]</td>
<td valign="middle" align="left">[Index]</td>
<td valign="middle" align="left">[<a href="flite_abt.html#SEC_About" title="About (help)"> ? </a>]</td>
</tr></table>
<p>
 <font size="-1">
  This document was generated on <i>October 18, 2014</i> using <a href="http://www.nongnu.org/texi2html/"><i>texi2html 5.0</i></a>.
 </font>
 <br>

</p>
</body>
</html>