The problem with outliers is that they can skew statistical analysis. Many statistical techniques are sensitive to them: a single faulty data point can distort calculations of the mean and standard deviation. In the example given, we can eyeball a best-fit line somewhere around here:
But when we compute the best-fit line using a least-squares formula, we find that the single outlier, with its extreme deviation from the norm, skews the line dramatically:
Often, outliers are fairly easy to 'eyeball', but in some data sets deciding whether any point is an outlier is much harder. Computer programs also lack the sort of visual intuition we have: we can usually spot an outlier because it looks wrong, but a computer has no notion of what "looks" right or wrong.
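The distortion described above is easy to demonstrate numerically. The sketch below (the data values are illustrative, not from the figures in this tutorial) builds a small data set that follows y = 2x, corrupts one point, and compares the mean, standard deviation, and least-squares slope before and after:

```python
import numpy as np

# Illustrative data: y follows y = 2x exactly, then one point is corrupted.
x = np.arange(10, dtype=float)
y_clean = 2.0 * x
y_bad = y_clean.copy()
y_bad[9] = 100.0  # a single faulty data point

# The mean and standard deviation shift noticeably.
print(y_clean.mean(), y_clean.std())  # → 9.0 and a modest spread
print(y_bad.mean(), y_bad.std())      # mean jumps to 17.2, spread balloons

# The least-squares slope is pulled far from the true value of 2.
clean_slope, _ = np.polyfit(x, y_clean, 1)
bad_slope, _ = np.polyfit(x, y_bad, 1)
print(clean_slope, bad_slope)  # slope roughly triples
```

One corrupted point out of ten is enough to move the fitted slope from 2 to well over 6, which is exactly the "skewed best-fit line" effect shown in the figure.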
There are many mathematical tests for determining whether a set of data contains outliers, each with its own degree of robustness and accuracy under particular conditions. This tutorial focuses on Grubbs' test, but the methods and techniques shown here can be applied to various other tests as well.
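As a preview of where the tutorial is headed, here is a minimal sketch of one round of the two-sided Grubbs' test in Python (the function name and sample values are my own; SciPy is used only for the t-distribution quantile that the critical value depends on):

```python
import numpy as np
from scipy import stats

def grubbs_test(data, alpha=0.05):
    """One round of the two-sided Grubbs' test.

    Returns the index of the most extreme point if it is flagged as an
    outlier at significance level alpha, otherwise None.
    """
    data = np.asarray(data, dtype=float)
    n = len(data)
    mean = data.mean()
    sd = data.std(ddof=1)  # sample standard deviation

    # Grubbs' statistic: largest absolute deviation from the mean,
    # measured in sample standard deviations.
    deviations = np.abs(data - mean)
    idx = int(deviations.argmax())
    G = deviations[idx] / sd

    # Critical value derived from the t-distribution (two-sided form).
    t_crit = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    G_crit = ((n - 1) / np.sqrt(n)) * np.sqrt(t_crit**2 / (n - 2 + t_crit**2))

    return idx if G > G_crit else None

values = [9.1, 9.3, 8.9, 9.0, 9.2, 14.8]
print(grubbs_test(values))  # → 5, the index of the suspected outlier
```

Note that Grubbs' test assumes the data are approximately normally distributed and flags at most one outlier per pass; repeated passes are needed if more than one outlier may be present.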