<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<!--Converted with LaTeX2HTML 98.1p1 release (March 2nd, 1998)
originally by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds
* revised and updated by:  Marcus Hennecke, Ross Moore, Herb Swan
* with significant contributions from:
  Jens Lippmann, Marek Rouchal, Martin Wilck and others -->
<HTML>
<HEAD>
<TITLE>Cluster Analysis</TITLE>
<META NAME="description" CONTENT="Cluster Analysis">
<META NAME="keywords" CONTENT="vol2">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<LINK REL="STYLESHEET" HREF="vol2.css">
<LINK REL="next" HREF="node214.html">
<LINK REL="previous" HREF="node212.html">
<LINK REL="up" HREF="node210.html">
</HEAD>
<BODY >
<!--Navigation Panel-->
<A NAME="tex2html4061"
 HREF="node214.html">
<IMG WIDTH="37" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="next"
 SRC="icons.gif/next_motif.gif"></A> 
<A NAME="tex2html4058"
 HREF="node210.html">
<IMG WIDTH="26" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="up"
 SRC="icons.gif/up_motif.gif"></A> 
<A NAME="tex2html4052"
 HREF="node212.html">
<IMG WIDTH="63" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="previous"
 SRC="icons.gif/previous_motif.gif"></A> 
<A NAME="tex2html4060"
 HREF="node1.html">
<IMG WIDTH="65" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="contents"
 SRC="icons.gif/contents_motif.gif"></A>  
<BR>
<B> Next:</B> <A NAME="tex2html4062"
 HREF="node214.html">Discriminant Analysis</A>
<B> Up:</B> <A NAME="tex2html4059"
 HREF="node210.html">Multivariate Analysis Methods</A>
<B> Previous:</B> <A NAME="tex2html4053"
 HREF="node212.html">Principal Components Analysis</A>
<BR>
<BR>
<!--End of Navigation Panel-->

<H1><A NAME="SECTION001530000000000000000">
Cluster Analysis</A>
</H1> 
The routines implemented are <TT>CLUSTER</TT> which has 8 options for
hierarchical clustering and <TT>PARTITION</TT> which carries out
non-hierarchical clustering. We will look at the hierarchical options
available first.

<P>
The automatic classification of the <I>n</I> row-objects of an <I>n</I> by <I>m</I> table generally produces output in one of two forms: the assignments
to clusters found for the <I>n</I> objects; or a series of clusterings of
the <I>n</I> objects, from the initial situation when each object may be
considered a singleton cluster to the other extreme when all objects
belong to one cluster.  The former is non-hierarchical clustering or
partitioning.

<P>
The latter is hierarchical clustering.  Brief consideration will show
that a sequence of <I>n</I>-1 agglomerations is needed to successively
merge the two closest objects and/or clusters at each stage, so that
we have a set of <I>n</I> (singleton) clusters, <I>n</I>-1 clusters, ...,
2&nbsp;clusters, 1&nbsp;cluster.  This is usually represented by a hierarchic
tree or a <I>dendrogram</I>, and a ``slice'' through the dendrogram
defines a partition of the objects. Unfortunately, no rigid guideline
can be indicated for deriving such a partition from a dendrogram
except that large increases in cluster criterion values (which scale
the dendrogram) can indicate a partition of interest.
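
<P>
The sequence of <I>n</I>-1 agglomerations, and the idea of slicing the
dendrogram at a large increase in criterion value, can be illustrated with a
minimal Python sketch. It is not the MIDAS implementation: the function names
<TT>agglomerate</TT> and <TT>slice_at_largest_jump</TT>, the sample points and
the use of the single link criterion are assumptions made for illustration only.

```python
from itertools import combinations
from math import dist

def link_level(points, a, b):
    """Single link criterion: distance between the closest members of a and b."""
    return min(dist(points[i], points[j]) for i in a for j in b)

def agglomerate(points):
    """Naive agglomeration; returns the n-1 merges as (a, b, level) tuples."""
    clusters = [(i,) for i in range(len(points))]  # n singleton clusters
    merges = []
    while len(clusters) > 1:
        # Merge the two closest clusters at each stage.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: link_level(points, *pair))
        merges.append((a, b, link_level(points, a, b)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return merges

def slice_at_largest_jump(points, merges):
    """Replay the merges up to the largest increase in criterion value."""
    levels = [level for _, _, level in merges]
    jumps = [levels[k + 1] - levels[k] for k in range(len(levels) - 1)]
    keep = jumps.index(max(jumps)) + 1  # number of merges before the jump
    clusters = {(i,) for i in range(len(points))}
    for a, b, _ in merges[:keep]:
        clusters -= {a, b}
        clusters.add(a + b)
    return sorted(sorted(c) for c in clusters)

# Hypothetical data: a tight group of three objects and a tight pair.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
merges = agglomerate(points)
partition = slice_at_largest_jump(points, merges)
```

For these five objects the sketch performs four merges, and the slice at the
largest jump separates the group of three from the pair. The largest jump is
only a heuristic, in line with the remark above that no rigid guideline exists.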

<P>
In carrying out the sequence of agglomerations, various criteria are
feasible for defining the newly-constituted cluster:
<DL COMPACT>
<DT><I>The minimum variance criterion</I>
<DD>(method <TT>MVAR</TT>) constructs clusters
    which are of minimal variance internally (i.e. compact) and maximal
    variance externally (i.e. isolated).  It is useful for synoptic
    clustering, and for all clustering work where another method cannot be
    explicitly justified. 
  <DT><I>The minimum variance hierarchy:</I>
<DD>All options, with the exception of <TT>MNVR</TT>, construct a set of
    Euclidean distances from the input set of <I>n</I> vectors.  Thus the internal
    storage required is large.  Option <TT>MNVR</TT> allows a minimum variance
    hierarchy (identical to option <TT>MVAR</TT>) to be obtained, without
    requiring storage of distances.  Computational time is slightly higher
    than for option <TT>MVAR</TT>. 
  <DT><I>The single link method</I>
<DD>(method <TT>SLNK</TT>) often gives a very skewed or 
   "chained" hierarchy.  It is therefore not useful for summarising data, but
    may indicate very anomalous or outlying objects: these will be among
    the last to be agglomerated in the hierarchy.   
  <DT><I>The complete link method</I>
<DD>(method <TT>CLNK</TT>) often does not differ 
    unduly from the minimum variance method, but its restrictive criterion is
    not suitable if the data is noisy. 
 <DT><I>The average link method</I>
<DD>(method <TT>ALNK</TT>) is a reasonable
    compromise between the (lax) single link method and the (rigid) complete
    link criterion.  All of these methods may be of interest if a graph
    representation of the results of the clustering is desired. 
  <DT><I>The weighted average link method</I>
<DD>(method <TT>WLNK</TT>) does not take the
    relative sizes of clusters into account in agglomerating them.  This, and
    the two following methods, are included for completeness and for
    consistency with other software packages, but are not recommended for
    general use. 
  <DT><I>The median method</I>
<DD>(method <TT>MEDN</TT>) replaces a cluster, on 
    agglomeration, with the median value.  It is not guaranteed that these 
    criterion values will vary monotonically, and this may present difficulty 
    with the interpretation of the dendrogram representation. 
  <DT><I>The centroid method</I>
<DD>(method <TT>CNTR</TT>) replaces a cluster, on 
     agglomeration, with the centroid value.  As in the case of the last 
     option, reversals or inversions in the hierarchy are possible. </DL>
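<P>
One common textbook way to express these agglomerative criteria is the
Lance-Williams update formula, which gives the dissimilarity between a cluster
<I>k</I> and the newly merged cluster from the dissimilarities that existed
before the merge. The Python sketch below is an illustration under stated
assumptions, not a description of how <TT>CLUSTER</TT> is coded: the function
name <TT>lance_williams</TT> is hypothetical, the method labels are reused from
the list above purely for readability, and the <TT>MVAR</TT>, <TT>MEDN</TT> and
<TT>CNTR</TT> coefficients assume squared Euclidean distances.

```python
def lance_williams(method, d_ki, d_kj, d_ij, n_i, n_j, n_k):
    """Dissimilarity between cluster k and the merger of clusters i and j.

    d_ki, d_kj, d_ij are the dissimilarities before the merge and
    n_i, n_j, n_k are the cluster sizes.
    """
    n_ij = n_i + n_j
    if method == "SLNK":      # single link: nearest members
        a_i, a_j, beta, gamma = 0.5, 0.5, 0.0, -0.5
    elif method == "CLNK":    # complete link: farthest members
        a_i, a_j, beta, gamma = 0.5, 0.5, 0.0, 0.5
    elif method == "ALNK":    # average link: weights follow cluster sizes
        a_i, a_j, beta, gamma = n_i / n_ij, n_j / n_ij, 0.0, 0.0
    elif method == "WLNK":    # weighted average link: sizes are ignored
        a_i, a_j, beta, gamma = 0.5, 0.5, 0.0, 0.0
    elif method == "MEDN":    # median method
        a_i, a_j, beta, gamma = 0.5, 0.5, -0.25, 0.0
    elif method == "CNTR":    # centroid method
        a_i, a_j = n_i / n_ij, n_j / n_ij
        beta, gamma = -n_i * n_j / n_ij ** 2, 0.0
    elif method == "MVAR":    # minimum variance (Ward) criterion
        total = n_ij + n_k
        a_i, a_j = (n_i + n_k) / total, (n_j + n_k) / total
        beta, gamma = -n_k / total, 0.0
    else:
        raise ValueError(f"unknown method: {method}")
    return a_i * d_ki + a_j * d_kj + beta * d_ij + gamma * abs(d_ki - d_kj)

# Hypothetical check: singleton objects at 0, 2 and 5 on a line, using
# squared distances. The centroid of {0, 2} is 1, so the squared distance
# to the object at 5 should be 16.
centroid_value = lance_williams("CNTR", 25.0, 9.0, 4.0, 1, 1, 1)
```

The negative coefficient on <TT>d_ij</TT> in the median and centroid methods
is what allows a later merge to occur at a lower criterion value than an
earlier one, which is the source of the reversals mentioned above.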
<P>
The Minimal Spanning Tree, which is closely related to the single link
method, has been used in such applications as interferogram analysis
and in galaxy clustering studies.  It is useful as a detector of
outlying data points (i.e. anomalous objects).
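
<P>
As a rough illustration of how a Minimal Spanning Tree can flag outlying
points, the following Python sketch builds the tree with Prim's algorithm and
reports the leaf end of the longest edge as an outlier candidate. The function
names, the sample points and the "longest edge" rule are assumptions chosen
for the example; they are not taken from the MIDAS routines.

```python
from math import dist

def minimal_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the n-1 tree edges as (length, i, j) in the order they are added.
    """
    # best[v] = (distance from v to its nearest tree vertex, that vertex)
    best = {v: (dist(points[0], points[v]), 0) for v in range(1, len(points))}
    edges = []
    while best:
        j = min(best, key=lambda v: best[v][0])  # closest vertex outside tree
        length, i = best.pop(j)
        edges.append((length, i, j))
        for v in best:  # vertex j may now be the nearest tree vertex for v
            d = dist(points[j], points[v])
            if d < best[v][0]:
                best[v] = (d, j)
    return edges

def outlier_candidates(edges):
    """Endpoints of the longest edge that are leaves of the tree."""
    degree = {}
    for _, i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    _, i, j = max(edges)
    return [v for v in (i, j) if degree[v] == 1]

# Hypothetical data: four objects on a unit square and one distant object.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (10.0, 10.0)]
edges = minimal_spanning_tree(points)
candidates = outlier_candidates(edges)
```

Here the distant object (index 4) is attached to the tree by a single long
edge, so it is the only candidate returned.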

<P>
Routine <TT>PARTITION</TT> offers two options.  For both, a
partition of minimum variance, given the number of clusters, is
sought.  Two iterative refinement algorithms (minimum distance or the
exchange method) constitute the options available.
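
<P>
A minimum distance refinement can be sketched as follows in Python: each
object is assigned to its nearest cluster centre, the centres are recomputed,
and the two steps repeat until the assignments stop changing. This is a
generic sketch under stated assumptions, not the <TT>PARTITION</TT> code; the
function names, the starting centres and the sample points are hypothetical,
and the exchange method is not shown.

```python
from math import dist

def min_distance_partition(points, centres, max_iter=100):
    """Iterative refinement for a fixed number of clusters, len(centres)."""
    centres = [tuple(c) for c in centres]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object joins its nearest centre.
        new_labels = [min(range(len(centres)),
                          key=lambda c: dist(p, centres[c]))
                      for p in points]
        if new_labels == labels:  # no object changed cluster: converged
            break
        labels = new_labels
        # Update step: each centre becomes the mean of its members.
        for c in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = tuple(sum(x) / len(members)
                                   for x in zip(*members))
    return labels, centres

def within_cluster_variance(points, labels, centres):
    """Sum of squared distances of the objects to their cluster centres."""
    return sum(dist(p, centres[lab]) ** 2 for p, lab in zip(points, labels))

# Hypothetical data and starting centres (the first two objects).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
labels, centres = min_distance_partition(points, points[:2])
criterion = within_cluster_variance(points, labels, centres)
```

Refinements of this kind generally reach a local rather than a global minimum
of the variance criterion, so the result can depend on the starting centres.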

<P>
<HR>
<ADDRESS>
<I>Petra Nass</I>
<BR><I>1999-06-15</I>
</ADDRESS>
</BODY>
</HTML>