<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<!--Converted with LaTeX2HTML 98.1p1 release (March 2nd, 1998)
originally by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds
* revised and updated by:  Marcus Hennecke, Ross Moore, Herb Swan
* with significant contributions from:
  Jens Lippmann, Marek Rouchal, Martin Wilck and others -->
<HTML>
<HEAD>
<TITLE>Correspondence Analysis</TITLE>
<META NAME="description" CONTENT="Correspondence Analysis">
<META NAME="keywords" CONTENT="vol2">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<LINK REL="STYLESHEET" HREF="vol2.css">
<LINK REL="next" HREF="node216.html">
<LINK REL="previous" HREF="node214.html">
<LINK REL="up" HREF="node210.html">
</HEAD>
<BODY >
<!--Navigation Panel-->
<A NAME="tex2html4083"
 HREF="node216.html">
<IMG WIDTH="37" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="next"
 SRC="icons.gif/next_motif.gif"></A> 
<A NAME="tex2html4080"
 HREF="node210.html">
<IMG WIDTH="26" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="up"
 SRC="icons.gif/up_motif.gif"></A> 
<A NAME="tex2html4074"
 HREF="node214.html">
<IMG WIDTH="63" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="previous"
 SRC="icons.gif/previous_motif.gif"></A> 
<A NAME="tex2html4082"
 HREF="node1.html">
<IMG WIDTH="65" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="contents"
 SRC="icons.gif/contents_motif.gif"></A>  
<BR>
<B> Next:</B> <A NAME="tex2html4084"
 HREF="node216.html">Related Table Commands</A>
<B> Up:</B> <A NAME="tex2html4081"
 HREF="node210.html">Multivariate Analysis Methods</A>
<B> Previous:</B> <A NAME="tex2html4075"
 HREF="node214.html">Discriminant Analysis</A>
<BR>
<BR>
<!--End of Navigation Panel-->

<H1><A NAME="SECTION001550000000000000000">
Correspondence Analysis</A>
</H1>
Correspondence Analysis may be described as a PCA in a different
metric (the <IMG
 WIDTH="30" HEIGHT="48" ALIGN="MIDDLE" BORDER="0"
 SRC="img426.gif"
 ALT="$\chi^2$">
metric replaces the usual Euclidean metric).
Mathematically, it also differs from PCA in that each point in
multidimensional space has a mass (or weight) associated with it
at its given location.  The percentage <I>inertia</I> explained by the axes
<A NAME="10071">&#160;</A>
takes the place of the percentage variance of PCA.  In Correspondence
Analysis these percentages can be so small that this figure of merit is
less important than it is in PCA.  The results of a Correspondence
Analysis are a good deal more difficult to interpret, but the technique
considerably expands the scope of a PCA-type analysis
through its ability to handle a wide range of data.

<P>
While PCA is particularly
suitable for quantitative data, Correspondence Analysis (CA) is recommended
for the following types of input data, each of which is looked at more
closely below: frequencies, contingency tables, probabilities, categorical
data, and mixed quantitative/qualitative data.
<A NAME="10072">&#160;</A>
<A NAME="10073">&#160;</A>
<A NAME="10074">&#160;</A>
<A NAME="10075">&#160;</A>
<A NAME="10076">&#160;</A>
<A NAME="10077">&#160;</A>
<A NAME="10078">&#160;</A>

<P>
In the case of <I>frequencies</I> (i.e. the <I>ij</I><SUP><I>th</I></SUP> table entry indicates
the frequency of occurrence of attribute <I>j</I> for object <I>i</I>) the row and
column "profiles" are of interest: it is the relative magnitudes that
matter, not the absolute ones.  A weighted Euclidean distance, 
termed the <IMG
 WIDTH="29" HEIGHT="48" ALIGN="MIDDLE" BORDER="0"
 SRC="img427.gif"
 ALT="$\chi^2$">
distance, is used.  For example, it gives a zero distance between the
<A NAME="10081">&#160;</A>
following 5-coordinate vectors, which have identical <I>profiles</I>
<A NAME="10083">&#160;</A>
of values: (2,7,0,3,1) and (8,28,0,12,4).  Probability-type values can be
constructed here by dividing each value in a vector by the sum of 
that vector's values.
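
<P>
The zero distance between the two example vectors can be checked with a
minimal sketch in Python.  It assumes the weighting usual in CA, in which
each squared profile difference is divided by the mass of its column; the
text above says only that the distance is a weighted Euclidean one.  The
function names and the third table row are invented for illustration.
<PRE>
```python
from math import sqrt

def profile(row):
    """Divide each value by the row total so the entries sum to 1."""
    total = sum(row)
    return [value / total for value in row]

def chi2_distance(row_a, row_b, column_masses):
    """Weighted Euclidean distance between two row profiles.

    Assumption: each squared difference is divided by the mass of its
    column. Columns with zero mass carry no information and are skipped.
    """
    pa, pb = profile(row_a), profile(row_b)
    return sqrt(sum((a - b) ** 2 / m
                    for a, b, m in zip(pa, pb, column_masses) if m != 0))

table = [[2, 7, 0, 3, 1],
         [8, 28, 0, 12, 4],
         [5, 1, 6, 2, 6]]   # third row is invented, for contrast only

grand_total = sum(map(sum, table))
column_masses = [sum(col) / grand_total for col in zip(*table)]

d01 = chi2_distance(table[0], table[1], column_masses)  # identical profiles
d02 = chi2_distance(table[0], table[2], column_masses)  # different profiles
print(d01, d02)
```
</PRE>
Here <code>d01</code> is zero because the first two rows differ only by a
factor of four, while <code>d02</code> is positive.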

<P>
A particular type of frequency-of-occurrence data is the <I>contingency
table</I>: a table crossing (usually two) sets of characteristics of
the population under study.  As an example, an 
<!-- MATH: $n \times m$ -->
<IMG
 WIDTH="64" HEIGHT="39" ALIGN="MIDDLE" BORDER="0"
 SRC="img429.gif"
 ALT="$n \times m$">
contingency 
table might 
give frequencies of the existence of <I>n</I> different metals in stars of 
<I>m</I> different ages.  CA allows the study of the two sets of variables
which constitute the rows and columns of the contingency table.  In its
usual variant, PCA would privilege either the rows or the columns by
standardizing; in a contingency table, however, rows and columns are
equally interesting.
The "standardizing" inherent in CA (a consequence of the <IMG
 WIDTH="30" HEIGHT="48" ALIGN="MIDDLE" BORDER="0"
 SRC="img430.gif"
 ALT="$\chi^2$">
distance) treats rows and columns in an identical manner.
One byproduct is that the row and column projections in the new space
may both be plotted on the same output graphics.  The lack of an analogous
direct relationship between row projections and column projections 
in PCA precludes doing this in the latter technique.
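
<P>
The symmetric treatment of rows and columns can be illustrated with a small
Python sketch.  It computes the total inertia of a contingency table, taken
here as the Pearson chi-square statistic divided by the grand total (a
standard definition, though not one spelled out in the text above), and
shows that transposing the table leaves the value unchanged.  The counts
and function names are invented for illustration.
<PRE>
```python
def total_inertia(table):
    """Total inertia: Pearson chi-square statistic / grand total."""
    n = sum(map(sum, table))
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if rows and columns were independent.
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2 / n

def transpose(table):
    """Swap the roles of rows and columns."""
    return [list(col) for col in zip(*table)]

# Invented counts: 3 metals (rows) by 4 stellar age classes (columns).
counts = [[20, 15, 10, 5],
          [10, 12, 14, 16],
          [5, 8, 12, 25]]

inertia_rows = total_inertia(counts)
inertia_cols = total_inertia(transpose(counts))
print(inertia_rows, inertia_cols)   # equal up to rounding
```
</PRE>
The two values agree up to floating-point rounding, so neither the rows nor
the columns are privileged by the weighting.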

<P>
<I>Categorical</I> data may be coded by "scoring" 1 (presence)
or 0 (absence) for each of the possible categories.  Such coding is
termed <I>complete disjunctive coding</I>. 
CA of an array of such complete disjunctive 
data is referred to as Multiple Correspondence Analysis (MCA).  Such a
coding of categorical data is, in fact, closely related to
contingency table type data.
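
<P>
Complete disjunctive coding, and its link to contingency tables, can be
sketched in Python as follows.  The two catalogue columns and all names are
invented for illustration.  The last step shows one way in which the
indicator coding relates to a contingency table: the cross-product of the
two indicator blocks counts the objects in each pair of categories.
<PRE>
```python
def disjunctive_coding(values):
    """Return (categories, indicator rows) for one categorical variable."""
    categories = sorted(set(values))
    rows = [[1 if value == cat else 0 for cat in categories]
            for value in values]
    return categories, rows

# Invented catalogue columns: spectral class and variability flag.
spectral = ["G", "K", "G", "M", "K", "G"]
variable = ["no", "yes", "no", "yes", "yes", "no"]

cats_s, z_s = disjunctive_coding(spectral)
cats_v, z_v = disjunctive_coding(variable)

# Complete disjunctive table: the indicator blocks placed side by side.
z = [rs + rv for rs, rv in zip(z_s, z_v)]

# Cross-product of the two blocks: the contingency table crossing the
# two variables (rows follow cats_s, columns follow cats_v).
contingency = [[sum(rs[a] * rv[b] for rs, rv in zip(z_s, z_v))
                for b in range(len(cats_v))]
               for a in range(len(cats_s))]
print(z[0], contingency)
```
</PRE>
Each row of <code>z</code> contains exactly one 1 per original variable, so
every row sums to the number of variables.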

<P>
Dealing with a complex astronomical catalogue may well give rise in practice to a
mixture of quantitative (real valued) and qualitative data.  One
possibility for the analysis of such data is to "discretize" the
quantitative values, and treat them thereafter as categorical.  In this
way a homogeneous set of variables, many more than the initially given
set, is analysed.
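
<P>
A minimal Python sketch of this discretization step is given below.  The
magnitudes, the cut points and the function names are assumptions chosen
only for illustration; how the cut points should be chosen is not addressed
here.
<PRE>
```python
from bisect import bisect_right

def discretize(values, cut_points):
    """Map each real value to the index of the interval it falls in.

    cut_points must be sorted; a value equal to a cut point is placed
    in the interval above it.
    """
    return [bisect_right(cut_points, value) for value in values]

def indicator_rows(codes, n_categories):
    """Complete disjunctive coding of integer category codes."""
    return [[1 if code == k else 0 for k in range(n_categories)]
            for code in codes]

# Invented magnitudes and cut points, for illustration only.
magnitudes = [4.2, 7.9, 11.3, 6.0, 9.5]
cut_points = [6.0, 10.0]   # three classes: bright, medium, faint

codes = discretize(magnitudes, cut_points)
coded = indicator_rows(codes, len(cut_points) + 1)
print(codes, coded)
```
</PRE>
One quantitative variable has become three 0/1 variables, which is why the
recoded set is larger than the set initially given.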

<P>
<HR>
<ADDRESS>
<I>Petra Nass</I>
<BR><I>1999-06-15</I>
</ADDRESS>
</BODY>
</HTML>