Standard Deviation
Formula for calculating Standard Deviation:
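Written out in symbols, the calculation worked through in the example below is the population form of the standard deviation. The symbols are a notational assumption rather than something taken from the text: x_i is each value, \bar{x} is the mean and N is the number of values.

```latex
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}
```

The example subtracts each value from the mean rather than the other way round, but once the difference is squared the result is the same.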
An Example
Most people are comfortable with working out an average. If your shopping bill goes up and down every week and someone asks you how much you spend, you know that if you were to take your last few shopping receipts and add them all together, then divide the overall total by the number of receipts, this will give you an average for your weekly shopping bill.

My Shopping in February

| Week  | Receipt Total (£) |
|-------|-------------------|
| 1     | 102               |
| 2     | 112               |
| 3     | 102               |
| 4     | 100               |
| Total | 416               |
| Mean  | 104               |
Average (mean) = total divided by number of weeks = £416 / 4 = £104
Alternatively, you might decide to take a shortcut and scan through the receipts to find out what you most often spend. This is also a sort of average - the one described in the previous paragraph is called a mean, whereas this one is called the mode (the most frequent amount spent).
Average (mode) = most frequent spend = £102
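As a quick check, the two averages can be worked out with Python's built-in statistics module. This is only a sketch: the list name `february` is made up for illustration and holds the four February receipt totals.

```python
from statistics import mean, mode

# February receipt totals in pounds (weeks 1 to 4)
february = [102, 112, 102, 100]

# Mean: add the receipts together and divide by how many there are
february_mean = mean(february)

# Mode: the amount that turns up most often
february_mode = mode(february)

print(february_mean)  # 104
print(february_mode)  # 102
```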
There is a third type of average called a median, but I'll leave explaining that for now and just tell you that the official name for an average is a measure of central tendency. As the name suggests, this is a way of describing your shopping by lumping it all together and coming up with a number that indicates about how much you usually spend. This would be handy if you wanted to compare how much you spend to how much other people spend, but it's not perfect.
The problem is that averages can be highly misleading. For instance, yesterday I spent £302 at Costco (never been there before, so it was the novelty of it I think), but I will still need to go and do a proper grocery shop at some point. Should I decide to look back at my shopping bills for March, this £302 is going to really throw the average, making it much higher than any other month, just because of one week's unusually high bill. This could mean that in the other 3 weeks I only spend about £100, but come out with an average of around £150 per week, whereas my sister could consistently spend £50 more a week than me and come out with the same average.
My Shopping (March) and Sister's Shopping (March)

| Week  | My Receipt Total (£) | Sister's Receipt Total (£) |
|-------|----------------------|----------------------------|
| 1     | 102                  | 151                        |
| 2     | 104                  | 152                        |
| 3     | 302                  | 153                        |
| 4     | 98                   | 150                        |
| Total | 606                  | 606                        |
| Mean  | 151.50               | 151.50                     |
This is where a second type of descriptive statistic comes in handy...
Standard Deviation is a type of what mathematical people call a 'measure of dispersion' (or spread).
Measures of dispersion look at how spread out the data is, and the simplest type is the range, which is calculated by subtracting the lowest bill from the highest. Thus, I could show that even though my sister and I both spent on average £151.50 per week, the range for my shopping was £204 (£302 - £98), whereas for hers it was only £3 (£153 - £150). This shows that her shopping bills are far more consistent than mine.
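The range comparison can be sketched in a few lines of Python. The names `mine`, `sister` and `spending_range` are invented for this illustration, and the figures are the March receipt totals from the table.

```python
# March receipt totals in pounds (weeks 1 to 4)
mine = [102, 104, 302, 98]
sister = [151, 152, 153, 150]


def spending_range(bills):
    """Range: the highest bill minus the lowest bill."""
    return max(bills) - min(bills)


# Both months have the same mean...
print(sum(mine) / len(mine), sum(sister) / len(sister))  # 151.5 151.5

# ...but very different ranges
print(spending_range(mine))    # 204
print(spending_range(sister))  # 3
```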
So far, we've looked at the mean, mode and range, all of which are a bit limited. The mean, as seen above, can be very unreliable if there are any anomalous scores (my £302 Costco binge). The mode and range only look at one or two scores, not all of them, so might not spot other odd things going on in the data.
Standard Deviation is the next step up in accurately describing the numbers in a form that is much easier to read than looking at a whole table full of numbers. It's easy enough in this shopping example to scan through 4 weeks' worth of bills, but if this was a whole year's worth of shopping bills, then that wouldn't be quite so practical.
All that standard deviation does is look at each of the shopping bills and figure out, as a general rule, how much they go against the norm (i.e. how much they deviate from the average).
The first step then has to be calculating the average - in this case the mean, which we know means adding all of the shopping bills together and dividing this by the number of bills we're dealing with.
My Shopping (March)

| Week  | Receipt Total (£) |
|-------|-------------------|
| 1     | 102               |
| 2     | 104               |
| 3     | 302               |
| 4     | 98                |
| Total | 606               |
| Mean  | 151.50            |
Now we need to look at how much difference there is between each bill and the average. This is likely to mean that some weeks I spent less and some weeks I spent more, so for some the difference will be a negative value and for some a positive value.
Week 1 £151.50 - £102 = £49.50
Week 2 £151.50 - £104 = £47.50
Week 3 £151.50 - £302 = -£150.50
Week 4 £151.50 - £98 = £53.50
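The same subtraction can be sketched in Python, taking each receipt away from the mean (average - receipt total). The names `mine`, `average` and `differences` are chosen just for this illustration.

```python
# My March receipt totals in pounds (weeks 1 to 4)
mine = [102, 104, 302, 98]

# Step 1: the mean
average = sum(mine) / len(mine)

# Step 2: the difference between the mean and each bill
differences = [average - bill for bill in mine]

for week, diff in enumerate(differences, start=1):
    print(f"Week {week}: {diff:.2f}")
```

Week 3 comes out negative because that bill was above the average; the other three are positive because those bills were below it.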
There is a basic rule in maths that multiplying two numbers with the same sign always gives a positive result. This is quite handy, because if I simply added up all the differences between the average and my various shopping receipts, the minuses would cancel out the pluses (here they add up to exactly zero). Also, it doesn't matter whether I spent more or less, only that I spent a different amount to the average.
So, if we multiply each difference by itself (square it), any negative number becomes positive. Squaring does make each difference look bigger than it actually is, but this doesn't matter, as it can easily be reversed. We now have all positive numbers and can add them up without them cancelling each other out. Later on we can 'unsquare' the result (or find the square root).
Week 1 £49.50 x £49.50 = £2,450.25
Week 2 £47.50 x £47.50 = £2,256.25
Week 3 -£150.50 x -£150.50 = £22,650.25
Week 4 £53.50 x £53.50 = £2,862.25
Total = £30,219
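A short Python sketch shows both halves of the argument: the raw differences cancel each other out, while the squared differences do not. The variable names are again only illustrative.

```python
# My March receipt totals in pounds (weeks 1 to 4)
mine = [102, 104, 302, 98]
average = sum(mine) / len(mine)
differences = [average - bill for bill in mine]

# Adding the raw differences: the minuses cancel out the pluses
print(sum(differences))  # 0.0

# Squaring each difference makes every value positive
squared = [diff * diff for diff in differences]
print(squared)  # [2450.25, 2256.25, 22650.25, 2862.25]

# Now the values can be added without cancelling
total = sum(squared)
print(total)  # 30219.0
```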
To sum up so far, I have:
- found the average (mean) for my weekly shopping
- found out the difference between each bill and the average (average - receipt total)
- for each bill, squared this difference to make sure that it is a positive number
- added all the squared differences together
Total = £30,219
Number of receipts = 4
Variance = Total / Number of receipts
Variance = 7,554.75
Now that I have the variance, I just need to reverse the squaring I did earlier, and what I am left with is the standard deviation.
SD = the square root of the variance
SD = the square root of 7,554.75
SD = £86.92
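Putting all the steps together, here is a minimal Python sketch of the whole calculation. The function name `standard_deviation` is made up for this example. It divides by the number of receipts, as in the walkthrough above, which matches `statistics.pstdev` (the population version); `statistics.stdev` divides by one less than the number of receipts and so gives a slightly larger figure.

```python
from math import sqrt
from statistics import pstdev


def standard_deviation(bills):
    """Standard deviation following the steps in the worked example."""
    average = sum(bills) / len(bills)                 # 1. find the mean
    differences = [average - bill for bill in bills]  # 2. difference from the mean
    squared = [diff * diff for diff in differences]   # 3. square each difference
    variance = sum(squared) / len(bills)              # 4. total / number of receipts
    return sqrt(variance)                             # 5. reverse the squaring


mine = [102, 104, 302, 98]
sister = [151, 152, 153, 150]

print(round(standard_deviation(mine), 2))    # 86.92
print(round(standard_deviation(sister), 2))  # 1.12

# Cross-check against the standard library's population standard deviation
print(round(pstdev(mine), 2))                # 86.92
```

Run on my sister's March bills, the same steps give a standard deviation of about £1.12, against £86.92 for mine, even though both means are £151.50.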