<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> <!--Converted with LaTeX2HTML 98.1p1 release (March 2nd, 1998) originally by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds * revised and updated by: Marcus Hennecke, Ross Moore, Herb Swan * with significant contributions from: Jens Lippmann, Marek Rouchal, Martin Wilck and others --> <HTML> <HEAD> <TITLE>Correspondence Analysis</TITLE> <META NAME="description" CONTENT="Correspondence Analysis"> <META NAME="keywords" CONTENT="vol2"> <META NAME="resource-type" CONTENT="document"> <META NAME="distribution" CONTENT="global"> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> <LINK REL="STYLESHEET" HREF="vol2.css"> <LINK REL="next" HREF="node216.html"> <LINK REL="previous" HREF="node214.html"> <LINK REL="up" HREF="node210.html"> <LINK REL="next" HREF="node216.html"> </HEAD> <BODY > <!--Navigation Panel--> <A NAME="tex2html4083" HREF="node216.html"> <IMG WIDTH="37" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="next" SRC="icons.gif/next_motif.gif"></A> <A NAME="tex2html4080" HREF="node210.html"> <IMG WIDTH="26" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="up" SRC="icons.gif/up_motif.gif"></A> <A NAME="tex2html4074" HREF="node214.html"> <IMG WIDTH="63" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="previous" SRC="icons.gif/previous_motif.gif"></A> <A NAME="tex2html4082" HREF="node1.html"> <IMG WIDTH="65" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="contents" SRC="icons.gif/contents_motif.gif"></A> <BR> <B> Next:</B> <A NAME="tex2html4084" HREF="node216.html">Related Table Commands</A> <B> Up:</B> <A NAME="tex2html4081" HREF="node210.html">Multivariate Analysis Methods</A> <B> Previous:</B> <A NAME="tex2html4075" HREF="node214.html">Discriminant Analysis</A> <BR> <BR> <!--End of Navigation Panel--> <H1><A NAME="SECTION001550000000000000000"> Correspondence Analysis</A> </H1> Correspondence Analysis may be described as a PCA in a different metric (the <IMG WIDTH="30" HEIGHT="48" 
ALIGN="MIDDLE" BORDER="0" SRC="img426.gif" ALT="$\chi^2$"> metric replaces the usual Euclidean metric). Mathematically, it also differs from PCA in that points in multidimensional space are considered to have a mass (or weight) associated with them at their given locations. The percentage <I>inertia</I> explained by axes <A NAME="10071"> </A> takes the place of the percentage variance of PCA; in Correspondence Analysis the values can be so small that such a figure of merit assumes less importance than it does in PCA. Results are a good deal more difficult to interpret in Correspondence Analysis than in PCA, but the technique considerably expands the scope of a PCA-type analysis because it can handle a wide range of data.
<P> While PCA is particularly suitable for quantitative data, Correspondence Analysis (CA) is recommended for the following types of input data, which are looked at more closely below: frequencies, contingency tables, probabilities, categorical data, and mixed quantitative/categorical data. <A NAME="10072"> </A> <A NAME="10073"> </A> <A NAME="10074"> </A> <A NAME="10075"> </A> <A NAME="10076"> </A> <A NAME="10077"> </A> <A NAME="10078"> </A>
<P> In the case of <I>frequencies</I> (i.e. the <I>ij</I><SUP><I>th</I></SUP> table entry indicates the frequency of occurrence of attribute <I>j</I> for object <I>i</I>) the row and column "profiles" are of interest. That is to say, it is the relative magnitudes that matter. Use of a weighted Euclidean distance, termed the <IMG WIDTH="29" HEIGHT="48" ALIGN="MIDDLE" BORDER="0" SRC="img427.gif" ALT="$\chi^2$"> distance, gives, for example, a zero distance between the <A NAME="10081"> </A> following 5-coordinate vectors, which have identical <I>profiles</I> <A NAME="10083"> </A> of values: (2,7,0,3,1) and (8,28,0,12,4). Probability-type values can be constructed here by dividing each value in a vector by the sum of that vector's values.
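<P> The profile construction and the zero distance quoted above can be checked with a minimal sketch. It is an illustration only, not part of any package described here: the function names <I>profile</I> and <I>chi2_distance</I> are hypothetical, the distance is taken to be a Euclidean distance between profiles weighted by the inverse column masses, the column masses are assumed to come from the two-row table formed by the two example vectors, and columns of zero mass are simply skipped.

```python
from math import sqrt


def profile(row):
    """Divide each value by the row total, giving the row profile."""
    total = sum(row)
    return [value / total for value in row]


def chi2_distance(row_a, row_b, column_masses):
    """Chi-squared distance between the profiles of two rows.

    Each squared difference is weighted by the inverse column mass.
    Columns of zero mass carry no information and are skipped
    (an assumption made for this sketch).
    """
    profile_a = profile(row_a)
    profile_b = profile(row_b)
    squared = sum(
        (a - b) ** 2 / mass
        for a, b, mass in zip(profile_a, profile_b, column_masses)
        if mass != 0
    )
    return sqrt(squared)


# The two example vectors from the text.
x = [2, 7, 0, 3, 1]
y = [8, 28, 0, 12, 4]

# Assumption: column masses come from the two-row table made of x and y.
grand_total = sum(x) + sum(y)
column_masses = [(a + b) / grand_total for a, b in zip(x, y)]

# Identical profiles give a zero distance whatever the column masses are.
distance = chi2_distance(x, y, column_masses)
print(profile(x))
print(profile(y))
print(distance)
```

Because <I>y</I> is four times <I>x</I>, the two printed profiles coincide and the distance is zero. Any other choice of non-zero column masses would give the same result for this pair.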
<P> A particular type of frequency-of-occurrence data is the <I>contingency table</I>, a table crossing (usually two) sets of characteristics of the population under study. As an example, an <!-- MATH: $n \times m$ --> <IMG WIDTH="64" HEIGHT="39" ALIGN="MIDDLE" BORDER="0" SRC="img429.gif" ALT="$n \times m$"> contingency table might give frequencies of the existence of <I>n</I> different metals in stars of <I>m</I> different ages. CA allows the study of the two sets of variables which constitute the rows and columns of the contingency table. In its usual variant, PCA would privilege either the rows or the columns by standardizing. If, however, we are dealing with a contingency table, both rows and columns are equally interesting. The "standardizing" inherent in CA (a consequence of the <IMG WIDTH="30" HEIGHT="48" ALIGN="MIDDLE" BORDER="0" SRC="img430.gif" ALT="$\chi^2$"> distance) treats rows and columns in an identical manner. One byproduct is that the row and column projections in the new space may both be plotted on the same output graphic (the lack of an analogous direct relationship between row projections and column projections in PCA precludes doing this in that technique).
<P> <I>Categorical</I> data may be coded by "scoring" 1 (presence) or 0 (absence) for each of the possible categories. Such coding is known as <I>complete disjunctive coding</I>. CA of an array of such complete disjunctive data is referred to as Multiple Correspondence Analysis (MCA); such a coding of categorical data is, in fact, closely related to contingency table type data.
<P> Dealing with a complex astronomical catalogue may well give rise in practice to a mixture of quantitative (real valued) and qualitative data. One possibility for the analysis of such data is to "discretize" the quantitative values, and treat them thereafter as categorical.
In this way a homogeneous set of variables, many more than the initially given set, is analysed. <P> <HR> <ADDRESS> <I>Petra Nass</I> <BR><I>1999-06-15</I> </ADDRESS> </BODY> </HTML>