Originally Posted by GrizzlyAdams
I was mostly away from the Network yesterday and didn't have time to give a proper lecture on sigma. Here beginneth the First Lesson.
Imagine you have a collection of things (like hammocks) with some measurable attribute (like length). You can compute the average length of this collection---add up all the lengths and divide by the number of hammocks. This is a useful number, but doesn't tell you anything about how different the lengths of the hammocks in the group may be. For instance, I could have a small group with 1 bridge hammock that is 7' long and 1 HH Safari that is 12' long. The average length is 9.5'. I could have a different group that has ten netless Claytor hammocks, each one being 9.5' long. The average length in that group is also 9.5'. So the average doesn't measure "dispersion", or "variance".
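The two groups above can be checked in a few lines of Python (the lengths are the ones from the example; `average` is just a helper name):

```python
# Two groups of hammock lengths (in feet): same average, very different spread.
group_a = [7, 12]        # one bridge hammock, one HH Safari
group_b = [9.5] * 10     # ten netless Claytor hammocks

def average(values):
    """Add up all the values and divide by how many there are."""
    return sum(values) / len(values)

print(average(group_a))  # 9.5
print(average(group_b))  # 9.5
```

Both groups come out at 9.5, which is exactly the point: the average alone says nothing about dispersion.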
Now if we've computed the average length of the group, we could then compute the average deviation from the average: for each hammock, subtract the group average from its length. Add up all these deviations and divide by the number of hammocks. This is a good and intuitive start, but it suffers from the fact that a deviation above the average is cancelled out in the sum by an equally large deviation below it. In fact, in the case of the 7' bridge hammock and 12' Safari, the deviation of the bridge would be 7 - 9.5 = -2.5, and the deviation of the Safari would be 12 - 9.5 = 2.5, with the average of -2.5 and 2.5 being zero.
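You can see the cancellation directly with the bridge-and-Safari numbers:

```python
lengths = [7, 12]
mean = sum(lengths) / len(lengths)         # 9.5
deviations = [x - mean for x in lengths]   # [-2.5, 2.5]
print(sum(deviations) / len(deviations))   # 0.0 -- signed deviations cancel out
```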
So measures of variance concern the average size of the deviations, not whether they are greater or less than the average. The size of the deviation of the bridge and the Safari from the average is 2.5 in both cases.
Now sigma is, as I pointed out before, what is called the "Standard Deviation". It captures the sense of the average size of the deviation, in a way that has a number of mystical mathematical properties; intuitiveness is not one of them, but it's quite powerful. The reason pollsters can predict with quite a lot of accuracy how an election is going to go based on only a very few samples from exit polls is due in part to the mathemagics behind an analysis of the variations in the responses from those who were polled.
For the record, the way you compute the standard deviation of a group of N values is as follows.
1. Compute the average value, call this A.
2. For every value in the group, compute the square of its difference with A. (Squaring captures the size, and erases the direction of the difference).
3. Compute the average squared difference, that is, add up all the squared differences as outlined in 2, and divide by N. Call this average V. This is known as "Variance", or sometimes the "Second Central Moment". Rattle that off at your next kegger and impress your friends.
4. The standard deviation is the square root of V.
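The four steps above can be sketched in Python (this divides by N, as in step 3; be aware that many statistics packages divide by N-1 instead, which gives a slightly larger answer for small groups):

```python
import math

def std_dev(values):
    """Standard deviation, following the four steps above (divides by N)."""
    n = len(values)
    a = sum(values) / n                               # 1. average A
    squared_diffs = [(x - a) ** 2 for x in values]    # 2. squared differences
    v = sum(squared_diffs) / n                        # 3. variance V
    return math.sqrt(v)                               # 4. square root of V

print(std_dev([7, 12]))      # 2.5 -- the bridge-and-Safari group
print(std_dev([9.5] * 10))   # 0.0 -- ten identical Claytors don't vary at all
```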
Just as the average alone doesn't tell you how different values in a group are from the average (hence, from each other), the standard deviation alone doesn't tell you how significant the standard deviation is. A group that has average value 9.5 with standard deviation of 2.5 (which happens to be the standard deviation of the lengths of the group with the bridge and the Safari) has more inherent variance than a group with average value 12 and a standard deviation of 2.5. So usually when one describes the variance within a group, one speaks of both the average and the standard deviation.
A probability distribution is an assignment of numbers to values that describes how likely each value is to turn up when one is randomly chosen from the group. Whereas when one takes the average of a group, each value in the group contributes equally to the average, when an average is taken using a probability distribution, the values that have larger probabilities are weighted more heavily in taking the average. Precisely, the average of a group that is described with a probability distribution (also known as the mean) is the sum of products, each product being a group value and its probability. The variance is the sum of products where each product is the squared difference of a value from the mean, times the probability of that value. The standard deviation is the square root of the variance.
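Here's the same calculation for a made-up distribution over hammock lengths (the probabilities are invented for illustration and must sum to 1):

```python
import math

# Hypothetical distribution: each length is paired with the probability
# of drawing a hammock of that length.
dist = {7: 0.25, 9.5: 0.5, 12: 0.25}

mean = sum(x * p for x, p in dist.items())                    # 9.5
variance = sum((x - mean) ** 2 * p for x, p in dist.items())  # 3.125
sigma = math.sqrt(variance)                                   # about 1.77
```

Note how the middle value, being twice as likely, pulls the variance down compared with the plain two-hammock group.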
So the standard deviation referred to in the Wikipedia article I linked to before is that of a probability distribution, and is used there to make assertions about how likely are the extreme values in a distribution.
To say that a value is 3 sigmas away from the mean is to say that the difference between that value and the mean (or average) is at least 3 times the standard deviation. With respect to a probability distribution, that means it is very unlikely. With respect to just a bunch of numbers in a group, it means that the number is quite unlike most of the others.
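A quick check of that idea, with made-up lengths (a tight cluster of hammocks around 9.5' and one suspiciously long candidate):

```python
import math

def sigmas_from_mean(value, values):
    """How many standard deviations `value` lies from the group average."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return abs(value - mean) / sigma

cluster = [9, 9.5, 10, 9.5, 9.5]
print(sigmas_from_mean(14, cluster))   # well over 3 sigmas: quite unlike the rest
print(sigmas_from_mean(9.5, cluster))  # 0 sigmas: right at the mean
```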
Here endeth the First Lesson.
Grizz