<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> <!--Converted with LaTeX2HTML 98.1p1 release (March 2nd, 1998) originally by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds * revised and updated by: Marcus Hennecke, Ross Moore, Herb Swan * with significant contributions from: Jens Lippmann, Marek Rouchal, Martin Wilck and others --> <HTML> <HEAD> <TITLE>Principal Components Analysis</TITLE> <META NAME="description" CONTENT="Principal Components Analysis"> <META NAME="keywords" CONTENT="vol2"> <META NAME="resource-type" CONTENT="document"> <META NAME="distribution" CONTENT="global"> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> <LINK REL="STYLESHEET" HREF="vol2.css"> <LINK REL="next" HREF="node213.html"> <LINK REL="previous" HREF="node211.html"> <LINK REL="up" HREF="node210.html"> </HEAD> <BODY > <!--Navigation Panel--> <A NAME="tex2html4050" HREF="node213.html"> <IMG WIDTH="37" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="next" SRC="icons.gif/next_motif.gif"></A> <A NAME="tex2html4047" HREF="node210.html"> <IMG WIDTH="26" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="up" SRC="icons.gif/up_motif.gif"></A> <A NAME="tex2html4041" HREF="node211.html"> <IMG WIDTH="63" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="previous" SRC="icons.gif/previous_motif.gif"></A> <A NAME="tex2html4049" HREF="node1.html"> <IMG WIDTH="65" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="contents" SRC="icons.gif/contents_motif.gif"></A> <BR> <B> Next:</B> <A NAME="tex2html4051" HREF="node213.html">Cluster Analysis</A> <B> Up:</B> <A NAME="tex2html4048" HREF="node210.html">Multivariate Analysis Methods</A> <B> Previous:</B> <A NAME="tex2html4042" HREF="node211.html">Introduction</A> <BR> <BR> <!--End of Navigation Panel--> <H1><A NAME="SECTION001520000000000000000"> Principal Components Analysis</A> </H1> Among the objectives of Principal Components Analysis are the following. <P> <DL COMPACT> <DT>1. 
<DD>dimensionality reduction; <A NAME="10023"> </A> <P> <DT>2. <DD>determination of linear combinations of variables; <A NAME="10024"> </A> <P> <DT>3. <DD>feature selection: choosing the most useful variables; <A NAME="10025"> </A> <P> <DT>4. <DD>visualisation of multidimensional data; <P> <DT>5. <DD>identification of underlying variables; <P> <DT>6. <DD>identification of groups of objects or of outliers. </DL> <P> The tasks required of the analyst to carry these out are as follows: <P> <DL COMPACT> <DT>1. <DD>In the case of a table of dimensions <!-- MATH: $n \times m$ --> <IMG WIDTH="64" HEIGHT="39" ALIGN="MIDDLE" BORDER="0" SRC="img419.gif" ALT="$n \times m$">, each of the <I>n</I> rows or objects can be regarded as an <I>m</I>-dimensional vector. Finding a set of <!-- MATH: $m^\prime < m$ --> <IMG WIDTH="79" HEIGHT="45" ALIGN="MIDDLE" BORDER="0" SRC="img420.gif" ALT="$m^\prime < m$"> principal axes allows the objects to be adequately characterised on a smaller number of (artificial) variables. This is advantageous, firstly, as a prelude to further analysis, since the remaining <!-- MATH: $m-m^\prime$ --> <IMG WIDTH="76" HEIGHT="45" ALIGN="MIDDLE" BORDER="0" SRC="img421.gif" ALT="$m-m^\prime$"> dimensions may often be ignored as constituting noise; and, secondly, for storage economy (sufficient information from the initial table is now represented in a table with <!-- MATH: $m^\prime < m$ --> <IMG WIDTH="79" HEIGHT="45" ALIGN="MIDDLE" BORDER="0" SRC="img422.gif" ALT="$m^\prime < m$"> columns). Reduction of dimensionality is practicable if the first <IMG WIDTH="30" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" SRC="img423.gif" ALT="$m^\prime$"> new axes account for approximately 75% or more of the variance. There is no set threshold; the analyst must judge. The cumulative percentage of variance explained by the principal axes is consulted in order to make this choice. <P> <DT>2. <DD>If an eigenvalue is zero, the variance of the projections on the associated eigenvector is zero. 
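<P> This claim can be checked numerically. The following is a minimal sketch (synthetic data and NumPy only; none of the names or numbers come from the text) in which an exact linear dependence, y3 = 2*y1 - y2, is planted among three variables:

```python
import numpy as np

# Hypothetical data (illustration only): y3 is an exact linear
# combination of y1 and y2, so one eigenvalue of the covariance
# matrix must be zero.
rng = np.random.default_rng(0)
y1 = rng.normal(size=50)
y2 = rng.normal(size=50)
y3 = 2.0 * y1 - y2
X = np.column_stack([y1, y2, y3])

Xc = X - X.mean(axis=0)  # centre the data, so the "point" is the origin
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))  # ascending order

# The smallest eigenvalue is (numerically) zero ...
assert abs(eigvals[0]) < 1e-10

# ... hence the projections on its eigenvector have zero variance:
proj = Xc @ eigvecs[:, 0]
assert proj.var() < 1e-10

# The eigenvector's coefficients recover the dependence
# 2*y1 - y2 - y3 = 0 up to scale: normalising by the third
# coefficient gives (-2, 1, 1).
v = eigvecs[:, 0] / eigvecs[2, 0]
assert np.allclose(v, [-2.0, 1.0, 1.0])
```

<P> The eigenvector belonging to the zero eigenvalue thus supplies, through its coefficients, the linear combination among the variables.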
Hence all of the projections coincide in a single point on that eigenvector. If this point is additionally the origin (i.e. the data are centred), then the coefficients of the eigenvector define a linear combination among the variables. In fact, we can go a good deal further: by analysing second-order variables, defined from the given variables, quadratic dependencies can be straightforwardly sought. This means, for example, that in analysing three variables, <I>y</I><SUB>1</SUB>, <I>y</I><SUB>2</SUB>, and <I>y</I><SUB>3</SUB>, we would also input the variables <I>y</I><SUB>1</SUB><SUP>2</SUP>, <I>y</I><SUB>2</SUB><SUP>2</SUP>, <I>y</I><SUB>3</SUB><SUP>2</SUP>, <I>y</I><SUB>1</SUB><I>y</I><SUB>2</SUB>, <I>y</I><SUB>1</SUB><I>y</I><SUB>3</SUB>, and <I>y</I><SUB>2</SUB><I>y</I><SUB>3</SUB>. If the linear combination <BR><P></P> <DIV ALIGN="CENTER"> <!-- MATH: \begin{displaymath} y_1 = c_1 y_2^2 + c_2 y_1y_2 \end{displaymath} --> <I>y</I><SUB>1</SUB> = <I>c</I><SUB>1</SUB> <I>y</I><SUB>2</SUB><SUP>2</SUP> + <I>c</I><SUB>2</SUB> <I>y</I><SUB>1</SUB><I>y</I><SUB>2</SUB> </DIV> <BR CLEAR="ALL"> <P></P> exists, then we would find it. Similarly we could feed in the logarithms or other functions of the variables. <P> <DT>3. <DD>In feature selection we want to simplify the task of characterising each object by a set of attributes. Linear combinations among attributes must be found; highly correlated attributes (i.e. attributes located close together in the new space) allow some attributes to be removed from consideration; and the proximity of attributes to the new axes indicates the more relevant and important attributes. <P> <DT>4. <DD>In order to provide a convenient representation of multidimensional data, planar plots are necessary. An important consideration is the adequacy of the planar representation: the percentage of variance explained by the pair of axes defining the plane must be examined here. <P> <DT>5. <DD>PCA is often motivated by the search for latent variables. 
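<P> The quantities consulted in these judgements -- the cumulative percentage of variance used to choose the number of retained axes in task 1, and the loadings and projections examined when trying to label axes as latent variables -- can be sketched as follows. This is a hypothetical NumPy illustration on synthetic data; the names, data and thresholds are assumptions for the example, not from the text:

```python
import numpy as np

# Hypothetical n x m table (illustration only): two underlying
# latent variables mixed into six observed columns, plus weak noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.05 * rng.normal(size=(100, 6))

Xc = X - X.mean(axis=0)  # centre the data
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # descending order

# Cumulative percentage of variance explained by the first axes;
# the judgement-based ~75% threshold picks the number m' to retain.
cum_pct = 100.0 * np.cumsum(eigvals) / eigvals.sum()
m_prime = int(np.searchsorted(cum_pct, 75.0)) + 1

# Projections (scores) of the n objects on the retained axes, and
# the loadings inspected when labelling axes as latent variables.
scores = Xc @ eigvecs[:, :m_prime]
loadings = eigvecs[:, :m_prime]

# Two planted latent variables: two axes explain almost all variance.
assert cum_pct[1] > 95.0 and m_prime <= 2
```

<P> Objects at the extremities of a retained axis are those with the largest entries in the corresponding column of <TT>scores</TT>.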
Often it is relatively easy to label the first or second components, but this becomes increasingly difficult as less relevant axes are examined. The objects with the highest loadings or projections on the axes (i.e. those which are placed towards the extremities of the axes) are usually worth examining: the axis may be characterisable as a spectrum running from a small number of objects with high positive loadings to those with high negative loadings. <P> <DT>6. <DD>A visual inspection of a planar plot indicates which objects are grouped together, suggesting that they belong to the same family or result from the same process. Anomalous objects can also be detected, and in some cases it might be of interest to redo the analysis with these excluded because of the perturbation they introduce. </DL> <P> <HR> <ADDRESS> <I>Petra Nass</I> <BR><I>1999-06-15</I> </ADDRESS> </BODY> </HTML>