<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> <!--Converted with LaTeX2HTML 98.1p1 release (March 2nd, 1998) originally by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds * revised and updated by: Marcus Hennecke, Ross Moore, Herb Swan * with significant contributions from: Jens Lippmann, Marek Rouchal, Martin Wilck and others --> <HTML> <HEAD> <TITLE>Principal Components Analysis</TITLE> <META NAME="description" CONTENT="Principal Components Analysis"> <META NAME="keywords" CONTENT="vol2"> <META NAME="resource-type" CONTENT="document"> <META NAME="distribution" CONTENT="global"> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> <LINK REL="STYLESHEET" HREF="vol2.css"> <LINK REL="next" HREF="node213.html"> <LINK REL="previous" HREF="node211.html"> <LINK REL="up" HREF="node210.html"> </HEAD> <BODY > <!--Navigation Panel--> <A NAME="tex2html4050" HREF="node213.html"> <IMG WIDTH="37" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="next" SRC="icons.gif/next_motif.gif"></A> <A NAME="tex2html4047" HREF="node210.html"> <IMG WIDTH="26" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="up" SRC="icons.gif/up_motif.gif"></A> <A NAME="tex2html4041" HREF="node211.html"> <IMG WIDTH="63" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="previous" SRC="icons.gif/previous_motif.gif"></A> <A NAME="tex2html4049" HREF="node1.html"> <IMG WIDTH="65" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="contents" SRC="icons.gif/contents_motif.gif"></A> <BR> <B> Next:</B> <A NAME="tex2html4051" HREF="node213.html">Cluster Analysis</A> <B> Up:</B> <A NAME="tex2html4048" HREF="node210.html">Multivariate Analysis Methods</A> <B> Previous:</B> <A NAME="tex2html4042" HREF="node211.html">Introduction</A> <BR> <BR> <!--End of Navigation Panel--> <H1><A NAME="SECTION001520000000000000000"> Principal Components Analysis</A> </H1> Among the objectives of Principal Components Analysis are the following. <P> <DL COMPACT> <DT>1. 
<DD>dimensionality reduction; <A NAME="10023"> </A> <P> <DT>2. <DD>determination of linear combinations of variables; <A NAME="10024"> </A> <P> <DT>3. <DD>feature selection: choosing the most useful variables; <A NAME="10025"> </A> <P> <DT>4. <DD>visualisation of multidimensional data; <P> <DT>5. <DD>identification of underlying variables; <P> <DT>6. <DD>identification of groups of objects or of outliers. </DL> <P> The tasks required of the analyst to carry these out are as follows: <P> <DL COMPACT> <DT>1. <DD>In the case of a table of dimensions <!-- MATH: $n \times m$ --> <IMG WIDTH="64" HEIGHT="39" ALIGN="MIDDLE" BORDER="0" SRC="img419.gif" ALT="$n \times m$">, each of the <I>n</I> rows or objects can be regarded as an <I>m</I>-dimensional vector. Finding a set of <!-- MATH: $m^\prime < m$ --> <IMG WIDTH="79" HEIGHT="45" ALIGN="MIDDLE" BORDER="0" SRC="img420.gif" ALT="$m^\prime < m$"> principal axes allows the objects to be adequately characterised on a smaller number of (artificial) variables. This is advantageous, firstly, as a prelude to further analysis, since the remaining <!-- MATH: $m-m^\prime$ --> <IMG WIDTH="76" HEIGHT="45" ALIGN="MIDDLE" BORDER="0" SRC="img421.gif" ALT="$m-m^\prime$"> dimensions may often be ignored as constituting noise; and, secondly, for storage economy (sufficient information from the initial table is now represented in a table with <!-- MATH: $m^\prime < m$ --> <IMG WIDTH="79" HEIGHT="45" ALIGN="MIDDLE" BORDER="0" SRC="img422.gif" ALT="$m^\prime < m$"> columns). Reduction of dimensionality is practicable if the first <IMG WIDTH="30" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" SRC="img423.gif" ALT="$m^\prime$"> new axes account for approximately 75% or more of the variance. There is no set threshold; the analyst must judge. The cumulative percentage of variance explained by the principal axes is consulted in order to make this choice. <P> <DT>2. <DD>If an eigenvalue is zero, the variance of the projections on the associated eigenvector is zero. 
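<P> This claim can be checked numerically. The following is a minimal sketch (synthetic data and NumPy only; none of the names or numbers come from the text) in which an exact linear dependence, y3 = 2*y1 - y2, is planted among three variables:

```python
import numpy as np

# Hypothetical data (illustration only): y3 is an exact linear
# combination of y1 and y2, so one eigenvalue of the covariance
# matrix must be zero.
rng = np.random.default_rng(0)
y1 = rng.normal(size=50)
y2 = rng.normal(size=50)
y3 = 2.0 * y1 - y2
X = np.column_stack([y1, y2, y3])

Xc = X - X.mean(axis=0)  # centre the data, so the "point" is the origin
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))  # ascending order

# The smallest eigenvalue is (numerically) zero ...
assert abs(eigvals[0]) < 1e-10

# ... hence the projections on its eigenvector have zero variance:
proj = Xc @ eigvecs[:, 0]
assert proj.var() < 1e-10

# The eigenvector's coefficients recover the dependence
# 2*y1 - y2 - y3 = 0 up to scale: normalising by the third
# coefficient gives (-2, 1, 1).
v = eigvecs[:, 0] / eigvecs[2, 0]
assert np.allclose(v, [-2.0, 1.0, 1.0])
```

<P> The eigenvector belonging to the zero eigenvalue thus supplies, through its coefficients, the linear combination among the variables.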
Hence all of the projections coincide in a single point on that eigenvector. If this point is additionally the origin (i.e. the data are centred), then the coefficients of the eigenvector define a linear combination among the variables. In fact, we can go a good deal further: by analysing second-order variables, defined from the given variables, quadratic dependencies can be straightforwardly sought. This means, for example, that in analysing three variables, <I>y</I><SUB>1</SUB>, <I>y</I><SUB>2</SUB>, and <I>y</I><SUB>3</SUB>, we would also input the variables <I>y</I><SUB>1</SUB><SUP>2</SUP>, <I>y</I><SUB>2</SUB><SUP>2</SUP>, <I>y</I><SUB>3</SUB><SUP>2</SUP>, <I>y</I><SUB>1</SUB><I>y</I><SUB>2</SUB>, <I>y</I><SUB>1</SUB><I>y</I><SUB>3</SUB>, and <I>y</I><SUB>2</SUB><I>y</I><SUB>3</SUB>. If the linear combination <BR><P></P> <DIV ALIGN="CENTER"> <!-- MATH: \begin{displaymath} y_1 = c_1 y_2^2 + c_2 y_1y_2 \end{displaymath} --> <I>y</I><SUB>1</SUB> = <I>c</I><SUB>1</SUB> <I>y</I><SUB>2</SUB><SUP>2</SUP> + <I>c</I><SUB>2</SUB> <I>y</I><SUB>1</SUB><I>y</I><SUB>2</SUB> </DIV> <BR CLEAR="ALL"> <P></P> exists, then we would find it. Similarly we could feed in the logarithms or other functions of the variables. <P> <DT>3. <DD>In feature selection we want to simplify the task of characterising each object by a set of attributes. Linear combinations among attributes must be found; highly correlated attributes (i.e. attributes located close together in the new space) allow some attributes to be removed from consideration; and the proximity of attributes to the new axes indicates the more relevant and important attributes. <P> <DT>4. <DD>In order to provide a convenient representation of multidimensional data, planar plots are necessary. An important consideration is the adequacy of the planar representation: the percentage of variance explained by the pair of axes defining the plane must be examined here. <P> <DT>5. <DD>PCA is often motivated by the search for latent variables. 
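<P> The quantities consulted in these judgements -- the cumulative percentage of variance used to choose the number of retained axes in task 1, and the loadings and projections examined when trying to label axes as latent variables -- can be sketched as follows. This is a hypothetical NumPy illustration on synthetic data; the names, data and thresholds are assumptions for the example, not from the text:

```python
import numpy as np

# Hypothetical n x m table (illustration only): two underlying
# latent variables mixed into six observed columns, plus weak noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.05 * rng.normal(size=(100, 6))

Xc = X - X.mean(axis=0)  # centre the data
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # descending order

# Cumulative percentage of variance explained by the first axes;
# the judgement-based ~75% threshold picks the number m' to retain.
cum_pct = 100.0 * np.cumsum(eigvals) / eigvals.sum()
m_prime = int(np.searchsorted(cum_pct, 75.0)) + 1

# Projections (scores) of the n objects on the retained axes, and
# the loadings inspected when labelling axes as latent variables.
scores = Xc @ eigvecs[:, :m_prime]
loadings = eigvecs[:, :m_prime]

# Two planted latent variables: two axes explain almost all variance.
assert cum_pct[1] > 95.0 and m_prime <= 2
```

<P> Objects at the extremities of a retained axis are those with the largest entries in the corresponding column of <TT>scores</TT>.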
Often it is relatively easy to label the first or second components, but this becomes increasingly difficult as less relevant axes are examined. The objects with the highest loadings or projections on the axes (i.e. those which are placed towards the extremities of the axes) are usually worth examining: the axis may be characterisable as a spectrum running from a small number of objects with high positive loadings to those with high negative loadings. <P> <DT>6. <DD>A visual inspection of a planar plot indicates which objects are grouped together, suggesting that they belong to the same family or result from the same process. Anomalous objects can also be detected, and in some cases it might be of interest to redo the analysis with these excluded because of the perturbation they introduce. </DL> <P> <HR> <ADDRESS> <I>Petra Nass</I> <BR><I>1999-06-15</I> </ADDRESS> </BODY> </HTML>