<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<!--Converted with LaTeX2HTML 98.1p1 release (March 2nd, 1998)
originally by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds
* revised and updated by:  Marcus Hennecke, Ross Moore, Herb Swan
* with significant contributions from:
  Jens Lippmann, Marek Rouchal, Martin Wilck and others -->
<HTML>
<HEAD>
<TITLE>Robust fits and bad points</TITLE>
<META NAME="description" CONTENT="Robust fits and bad points">
<META NAME="keywords" CONTENT="vol2">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<LINK REL="STYLESHEET" HREF="vol2.css">
<LINK REL="next" HREF="node276.html">
<LINK REL="previous" HREF="node274.html">
<LINK REL="up" HREF="node274.html">
</HEAD>
<BODY >
<!--Navigation Panel-->
<A NAME="tex2html4885"
 HREF="node276.html">
<IMG WIDTH="37" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="next"
 SRC="icons.gif/next_motif.gif"></A> 
<A NAME="tex2html4882"
 HREF="node274.html">
<IMG WIDTH="26" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="up"
 SRC="icons.gif/up_motif.gif"></A> 
<A NAME="tex2html4876"
 HREF="node274.html">
<IMG WIDTH="63" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="previous"
 SRC="icons.gif/previous_motif.gif"></A> 
<A NAME="tex2html4884"
 HREF="node1.html">
<IMG WIDTH="65" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="contents"
 SRC="icons.gif/contents_motif.gif"></A>  
<BR>
<B> Next:</B> <A NAME="tex2html4886"
 HREF="node276.html">Subtraction of dark and</A>
<B> Up:</B> <A NAME="tex2html4883"
 HREF="node274.html">Reductions   at</A>
<B> Previous:</B> <A NAME="tex2html4877"
 HREF="node274.html">Reductions   at</A>
<BR>
<BR>
<!--End of Navigation Panel-->

<H3><A NAME="SECTION001853100000000000000">
Robust fits and bad points</A>
</H3>

<P>
Already, we must decide how to deal with discordant data.
As this is a subject of considerable importance, a short digression is in order.

<P>
Poincar&#233; attributed to Lippmann the remark that ``Everyone believes in the
normal law of errors: the mathematicians, because they think it is an
experimental fact; and the experimenters, because they suppose
it is a theorem of mathematics.''
However, it is neither an experimental fact nor a theorem of mathematics.

<P>
Experimentally, numerous investigations have shown that real errors are rarely
if ever normally distributed.
Nearly always, large errors are much more frequent than would be expected for a
normal distribution (see [<A
 HREF="node307.html#MT77">18</A>], pp.&nbsp;10 - 12,
and [<A
 HREF="node307.html#H86">12</A>], pp.&nbsp;20 - 31).
Menzies and Laing [<A
 HREF="node307.html#ML85">17</A>] show clear examples in photometric data.

<P>
Mathematically, the reason for this behavior
is well understood: although the Central Limit Theorem promises
a Gaussian distribution
<I>in the limit</I> as the number of comparable error sources
approaches infinity, the actual approach to this limit is agonizingly slow --
especially in the tails, where a small number of large individual contributors
dominate.
In fact, if there are <I>n</I> independent and identically distributed contributors,
the rate of convergence is no faster than <I>n</I><SUP>-1/2</SUP> [<A
 HREF="node307.html#H80">11</A>].
If we wanted to be sure that our distribution was Gaussian to an accuracy of
1%, we would need some 10<SUP>4</SUP> elemental contributions -- clearly, an
unrealistic requirement.
In practice, a few large error sources dominate the sum.
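
<P>
The slow approach to the Gaussian limit is easy to see numerically.  The
short sketch below is an added illustration (it is not part of the reduction
program, and assumes only Python with numpy): it sums <I>n</I> centered
exponential error terms, standardizes the sum, and compares the frequency of
deviations beyond three standard deviations with the Gaussian value of about
0.27%.
<PRE>
# Rough numerical sketch of slow CLT convergence in the tails.
# Each elementary "error" is a centered exponential (mean 0, variance 1);
# we sum n of them, standardize, and count deviations beyond 3 sigma.
import numpy as np

rng = np.random.default_rng(0)
GAUSS_TAIL = 0.0027            # P(|z| > 3) for a true Gaussian

def tail_ratio(n, trials=100_000):
    draws = rng.exponential(1.0, size=(trials, n)) - 1.0   # mean 0, var 1
    total = draws.sum(axis=1) / np.sqrt(n)                 # standardized sum
    return np.mean(np.abs(total) > 3.0) / GAUSS_TAIL

for n in (1, 4, 16, 64):
    print(f"n = {n:3d}   3-sigma tail frequency / Gaussian = {tail_ratio(n):.1f}")
</PRE>
Even with several dozen comparable error sources, the frequency of 3-sigma
deviations is still noticeably above the Gaussian prediction.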

<P>
Furthermore, the proportionality constant
in the convergence formula changes rapidly with distance from
the center of the distribution, so that
convergence is very slow in the tails.
This guarantees that the tails of real error distributions are always far from
Gaussian.

<P>
In the last 30 years, the implications of these deviations from ``normality''
for practical data analysis have become widely appreciated by
statisticians.
Traditionally, the excess of large errors was handled by applying the method of
least squares, after rejecting some subset of the data that appeared
suspiciously discordant.
There are several problems with this approach.

<P>
First, the decision whether to keep or reject a datum has an arbitrary
character.
A great deal of experience is needed to obtain reliable results.
But manual rejection may be impractical for large data sets; and
automated rejection rules are known to have inferior performance.
Second, rejection criteria based on some fixed number of standard deviations
result in no rejections at all when the number of degrees of freedom is small,
because a single aberrant point greatly inflates the estimated standard
deviation ([<A
 HREF="node307.html#H86">12</A>], pp.&nbsp;64 - 69).
The common ``3-<IMG
 WIDTH="19" HEIGHT="21" ALIGN="BOTTOM" BORDER="0"
 SRC="img556.gif"
 ALT="$\sigma$">'' rejection rule rejects nothing in samples smaller
than 11, no matter how large the biggest residual is; the inflation of the
estimated standard deviation by just one wild point outruns the largest
residual in smaller data sets.
There is no hope of rejecting a bad point this way in samples of 10 or smaller;
but one rarely measures the same star 10 times.
For the more typical sample sizes of 3 and 4, the largest possible residuals are
only 1.15 and 1.5 times the estimated standard deviation.
Third, including or rejecting a single point typically introduces
discontinuous changes in the estimated parameters that are comparable to their
estimated errors, so that the estimated values undergo relatively large jumps
in response to small changes in the data.
We would have more trust in estimators that are continuous functions of the
data.
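
<P>
The figures just quoted follow from an elementary bound (sometimes called
Samuelson's inequality): with the usual estimated standard deviation
<I>s</I>, computed with <I>n</I>-1 in the denominator, no residual in a
sample of <I>n</I> points can exceed (<I>n</I>-1)<I>n</I><SUP>-1/2</SUP>
times <I>s</I>.  The short check below is added for illustration and uses
plain Python only.
<PRE>
# Largest possible residual, in units of the estimated standard deviation s,
# for a sample of n points: (n - 1) / sqrt(n).
import math

for n in (3, 4, 10, 11):
    print(f"n = {n:2d}   largest possible residual = {(n - 1) / math.sqrt(n):.2f} * s")

# One wild point among otherwise identical data sits exactly on the bound:
data = [10.0, 10.0, 10.0, 25.0]                    # n = 4, one bad value
mean = sum(data) / len(data)
s = math.sqrt(sum((x - mean) ** 2 for x in data) / (len(data) - 1))
print(max(abs(x - mean) for x in data) / s)        # prints 1.5
</PRE>
The bound reaches 3 only at <I>n</I> = 11, which is why a 3-sigma rule can
reject nothing in smaller samples.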

<P>
Finally, the nature of most observed error distributions is not that
data are clearly either ``good'' or ``bad'', but that the few obviously wrong
points are accompanied by a much larger number of ``marginal'' cases.
Thus the problem of rejection is usually not clear-cut, and the data analyst is
left with doubts, no matter where the rejection threshold is set.
The reason for this situation is also well understood: most data are affected
by error sources that vary, so that the ``marginal'' cases represent data
gathered when the dominant error source was larger than average.
Such observations are not ``wrong'',
though they clearly deserve smaller weights than
those with smaller residuals.

<P>
In particular, we know that <I>photometric</I> data are afflicted with variable
errors.
For example, scintillation noise can vary by a factor of 2 on time scales of a
few minutes; and by an additional factor of <IMG
 WIDTH="55" HEIGHT="22" ALIGN="BOTTOM" BORDER="0"
 SRC="img557.gif"
 ALT="$sec \, Z$">
at a given air mass,
depending on whether one observes along or at right angles to the
upper-atmospheric wind vector.
Menzies and Laing&nbsp;[<A
 HREF="node307.html#ML85">17</A>] discuss other possible sources of error.
Therefore, we know we must deal with an error distribution that is
longer-tailed than a Gaussian.
Furthermore, both scintillation and photon noise are decidedly asymmetrical.
As these are the main sources of random error in photometric observations, we
can be sure that we <I>never</I> deal with truly Gaussian errors in photometry.

<P>
Unfortunately, the method of least squares, which is optimal for the Gaussian
distribution, loses a great deal of its statistical efficiency for even
slightly non-Gaussian errors.
(Statistical efficiency simply refers to the number of observations you need to
get a desired level of reliability.  If one estimator is twice as efficient as
another, it will give you the same information with half as many
observations.)
The classical example is Tukey's contaminated distribution.
Suppose all but some fraction <IMG
 WIDTH="15" HEIGHT="21" ALIGN="BOTTOM" BORDER="0"
 SRC="img558.gif"
 ALT="$\epsilon$">
of the data are drawn from a normal
distribution, and the remainder are drawn from another Gaussian that is three
times as broad.
Tukey&nbsp;[<A
 HREF="node307.html#T60">23</A>]
asked for the level of contamination <IMG
 WIDTH="14" HEIGHT="21" ALIGN="BOTTOM" BORDER="0"
 SRC="img559.gif"
 ALT="$\epsilon$">
that would make the
mean of the absolute values of the residuals (the so-called <I>average
deviation</I>, or A.D.) a more efficient estimator of the population width
than the standard deviation, which is the least-squares estimator of width.

<P>
Although the mean absolute deviation has only 88% of the
efficiency of the standard deviation for a pure Gaussian, Tukey found that less
than 0.2% contamination was enough to make the A.D. more efficient.
The reason is simply that least squares weights large errors according to the
squares of their magnitudes, which gives them an unreasonably large influence
on the results.
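
<P>
Tukey's result is easy to reproduce by simulation.  The sketch below is an
added illustration (assuming numpy; it is not Tukey's original calculation):
it draws samples from the contaminated distribution described above and
compares the relative scatter of the two scale estimators.  A smaller
squared coefficient of variation means fewer observations are needed for the
same precision.
<PRE>
# Tukey's contaminated Gaussian: a fraction eps of the data come from a
# Gaussian three times as broad.  Compare the average absolute deviation
# (A.D.) with the standard deviation as estimators of the population width.
import numpy as np

rng = np.random.default_rng(1)

def relative_variance(estimator, eps, n=100, reps=20_000):
    x = rng.normal(0.0, 1.0, size=(reps, n))
    x[rng.random((reps, n)) < eps] *= 3.0        # contaminated points
    est = estimator(x)
    return est.var() / est.mean() ** 2           # squared coeff. of variation

std = lambda x: x.std(axis=1, ddof=1)
ad  = lambda x: np.mean(np.abs(x - x.mean(axis=1, keepdims=True)), axis=1)

for eps in (0.0, 0.002, 0.01, 0.05):
    ratio = relative_variance(std, eps) / relative_variance(ad, eps)
    print(f"eps = {eps:.3f}   rel. var. of s.d. / rel. var. of A.D. = {ratio:.2f}")
</PRE>
A ratio above one means the A.D. is the more efficient of the two; the
crossover occurs at a remarkably small contamination fraction.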

<P>
Similar, though less spectacular, results exist for position estimators.
For example,
about 10% contamination is enough to make the median as efficient as
the mean (the least-squares estimator); while several ``robust''
estimators are some 40% more efficient than the mean at this level of
contamination.
Real data seem to be somewhat longer tailed than this, so the mean (i.e., least
squares) is typically even worse than this simple example suggests.
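
<P>
The same kind of experiment for the location estimators just mentioned
(again an added illustration, assuming numpy) compares the scatter of the
sample median with that of the sample mean under increasing contamination.
<PRE>
# Variance of the sample median relative to the sample mean, for data with
# a fraction eps drawn from a Gaussian three times as broad.
import numpy as np

rng = np.random.default_rng(2)

def var_ratio(eps, n=101, reps=20_000):
    x = rng.normal(0.0, 1.0, size=(reps, n))
    x[rng.random((reps, n)) < eps] *= 3.0
    return np.median(x, axis=1).var() / x.mean(axis=1).var()

for eps in (0.0, 0.05, 0.10, 0.20):
    print(f"eps = {eps:.2f}   var(median) / var(mean) = {var_ratio(eps):.2f}")
</PRE>
For clean Gaussian data the median needs roughly half again as many
observations as the mean, but by about 10% contamination the two are
comparable, and beyond that the median wins.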

<P>
Because convergence of the central limit theorem is much faster near the center
of the error distribution than in the tails, we can expect real error
distributions to be nearly Gaussian in the middle, and this is in fact
observed to be true.
A practical approach to data analysis is then to treat the bulk of the data
(in the middle of the distribution) as in least squares; but to reduce the
contribution of the points with large residuals, which would be rare in
a genuinely Gaussian distribution, in a smooth and continuous
fashion.

<P>
There is now a large literature on ``robust'' estimation -- that is, on methods
that are less critically dependent on detailed assumptions about the actual
error distribution than is least squares.
They can be regarded as re-weighted least squares, in which the weights of
data with moderate to large residuals are decreased smoothly to zero.
There are many ways to do this; all produce rather similar results.
The really ``wild points'' are completely rejected; the marginal cases
are allowed to participate in the solution, but with reduced weight.
The result is only a few per cent less efficient than least squares for
exactly Gaussian errors, and much better than least squares
-- typically, by a factor of the order of two -- for realistic error
distributions.
These methods
are also typically 10% or so more efficient than results obtained by
experienced data analysts using careful rejection methods ([<A
 HREF="node307.html#H86">12</A>], pp.&nbsp;67
- 69).
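
<P>
To make the idea of smoothly decreasing weights concrete, the following
sketch is a hypothetical illustration (not the weighting code used by the
reduction program): it shows two common weight functions, Huber's, which
levels off for large residuals, and a redescending one, Tukey's biweight
(discussed in the next paragraph), which falls smoothly to exactly zero so
that really wild points drop out of the solution.
<PRE>
# Weights assigned to a residual r (in units of the scale estimate) by two
# common robust schemes; ordinary least squares would give weight 1 to all.
import numpy as np

def huber_weight(r, k=1.345):
    # weight 1 inside +-k, then declining as k/|r| (never reaches zero)
    r = np.abs(r)
    return np.where(r <= k, 1.0, k / np.maximum(r, 1e-300))

def biweight(r, c=4.685):
    # smooth descent to zero; residuals beyond +-c get weight exactly zero
    u = np.clip(np.abs(r) / c, 0.0, 1.0)
    return (1.0 - u**2) ** 2

residuals = np.array([0.0, 0.5, 1.0, 2.0, 3.0, 5.0, 10.0])
print(np.round(huber_weight(residuals), 3))
print(np.round(biweight(residuals), 3))
</PRE>
The tuning constants 1.345 and 4.685 are the conventional choices that keep
the loss of efficiency for purely Gaussian errors to a few per cent.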

<P>
The particular method used here for reducing photometric data is known to
statisticians as ``Tukey's biweight''; it is easy to calculate, and produces
results of uniformly high efficiency for a range of realistic distributions.
To prevent iteration problems, the biweight is never started cold: the
procedure begins with an even more robust (but less efficient) estimator,
such as the median or its offspring, Tukey's robust line&nbsp;[<A
 HREF="node307.html#HMT">13</A>]; switches to Huber's M-estimator for
initial refinement until the scale is well established; and then iterates to
the final values using the biweight.
If you are unaware of the need to precede the biweight with an
estimator having a non-redescending influence function, don't worry:
this is known to be a numerically stable procedure.
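
<P>
A minimal sketch of that sequence for a single location parameter (added
here for illustration; it is not the routine actually used, and assumes
numpy only) might look like this:
<PRE>
# Start at the median with a MAD scale, refine with Huber's monotone
# M-estimator, then iterate to the final value with the redescending biweight.
import numpy as np

def robust_location(x, n_huber=5, n_biweight=20, k=1.345, c=4.685):
    x = np.asarray(x, dtype=float)
    loc = np.median(x)
    mad = np.median(np.abs(x - loc))
    scale = 1.4826 * mad if mad > 0 else x.std()   # Gaussian-consistent MAD

    def step(weight_fn):
        r = (x - loc) / scale                      # scaled residuals
        w = weight_fn(r)
        return np.sum(w * x) / np.sum(w)           # weighted mean

    for _ in range(n_huber):      # monotone weights: safe far from the answer
        loc = step(lambda r: np.where(np.abs(r) <= k, 1.0,
                                      k / np.maximum(np.abs(r), 1e-300)))
    for _ in range(n_biweight):   # redescending weights: wild points get zero
        loc = step(lambda r: (1.0 - np.clip(np.abs(r) / c, 0.0, 1.0) ** 2) ** 2)
    return loc

print(robust_location([10.1, 10.3, 9.9, 10.2, 10.0, 25.0]))  # outlier ignored
</PRE>
Starting from the median and refining with the monotone Huber weights before
switching to the redescending biweight is what makes the iteration
numerically stable, as noted above.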

<P>
As robust methods depend on ``majority logic'' to decide which data to
down-weight, they obviously require a certain amount of redundancy.
One cannot find even a single bad point unless there are at least three to
choose from (corresponding to the old rule about never going to sea with two
chronometers).
Therefore, it is better to obtain a large number of short integrations
than a smaller number of longer ones, provided that
the repetitions are sufficiently separated in time to be independent.
The planning program will help you get the necessary data.

<P>
In summary, photometric data are known to have decidedly non-Gaussian error
distributions; so we use methods designed to be nearly optimal for
these distributions, rather than the less reliable method of least squares.
These methods are closely related to least squares, but are much less
sensitive to the bigger-than-Gaussian tails of real error distributions.
From the point of view of the average user, the methods employed here are
simply a more effective refinement of the old method of rejecting outliers.

<P>
The advantage of using these well-established, modern
methods is a gain in efficiency of some tens of
per cent -- exactly equivalent to increasing the amount of observing time by
the same fraction.
It's about like getting an extra night per week of observing time.
This advantage is well worth having.

<P>
<HR>
<!--Navigation Panel-->
<A NAME="tex2html4885"
 HREF="node276.html">
<IMG WIDTH="37" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="next"
 SRC="icons.gif/next_motif.gif"></A> 
<A NAME="tex2html4882"
 HREF="node274.html">
<IMG WIDTH="26" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="up"
 SRC="icons.gif/up_motif.gif"></A> 
<A NAME="tex2html4876"
 HREF="node274.html">
<IMG WIDTH="63" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="previous"
 SRC="icons.gif/previous_motif.gif"></A> 
<A NAME="tex2html4884"
 HREF="node1.html">
<IMG WIDTH="65" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="contents"
 SRC="icons.gif/contents_motif.gif"></A>  
<BR>
<B> Next:</B> <A NAME="tex2html4886"
 HREF="node276.html">Subtraction of dark and</A>
<B> Up:</B> <A NAME="tex2html4883"
 HREF="node274.html">Reductions   at</A>
<B> Previous:</B> <A NAME="tex2html4877"
 HREF="node274.html">Reductions   at</A>
<!--End of Navigation Panel-->
<ADDRESS>
<I>Petra Nass</I>
<BR><I>1999-06-15</I>
</ADDRESS>
</BODY>
</HTML>